query performance: need help optimizing #495

spiicychknsandy · 2020-07-02T01:27:39Z

I know this is not an issue for this repo but I have looked in a lot of places and tried a lot of different things and nothing is working.

I need to bulk update deleted_at for records for a table with 3M rows and I need to look at 2 other tables that both have 5-10M records.

My query is

UPDATE public.study_data_events sde
SET deleted_at = current_timestamp
WHERE sde.id
IN (
  SELECT s.id
  FROM public.study_data_events s
  LEFT JOIN public.study_data_values sdv on sdv.study_data_event_id = s.id and sdv.deleted_at is NULL
  LEFT JOIN public.intervention_phase_events ipe on ipe.intervention_event_id = s.id and ipe.deleted_at is NULL
  WHERE sdv.id is NULL
  AND ipe.id is NULL
  AND s.deleted_at is NULL
);

The EXPLAIN looks like

Update on study_data_events sde  (cost=48822.30..48822.62 rows=1 width=132)
   ->  Nested Loop  (cost=48822.30..48822.62 rows=1 width=132)
         ->  HashAggregate  (cost=48821.87..48821.88 rows=1 width=22)
               Group Key: s.id
               ->  Nested Loop Left Join  (cost=36390.15..48821.87 rows=1 width=22)
                     Filter: (sdv.id IS NULL)
                     ->  Hash Right Join  (cost=36389.71..48817.69 rows=1 width=16)
                           Hash Cond: (ipe.intervention_event_id = s.id)
                           Filter: (ipe.id IS NULL)
                           ->  Seq Scan on intervention_phase_events ipe  (cost=0.00..5637.57 rows=270448 width=14)
                                 Filter: (deleted_at IS NULL)
                           ->  Hash  (cost=29704.09..29704.09 rows=384610 width=10)
                                 ->  Seq Scan on study_data_events s  (cost=0.00..29704.09 rows=384610 width=10)
                                       Filter: (deleted_at IS NULL)
                     ->  Index Scan using study_data_values_study_data_event_id_idx on study_data_values sdv  (cost=0.43..3.31 rows=87 width=14)
                           Index Cond: (study_data_event_id = s.id)
                           Filter: (deleted_at IS NULL)
         ->  Index Scan using study_data_events_pkey on study_data_events sde  (cost=0.42..0.72 rows=1 width=106)
               Index Cond: (id = s.id)

We have this seq scan

->  Seq Scan on study_data_events s  (cost=0.00..29704.09 rows=384610 width=10)
    Filter: (deleted_at IS NULL)`

that is very costly and I'm wondering what I can do about it.

I've tried adding a variety of indices on s.deleted_at none of which have helped. Has anyone needed to filter a large dataset on deleted_at IS NULL?

The text was updated successfully, but these errors were encountered:

rishi-jisr · 2022-04-14T19:30:24Z

@spiicychknsandy Did you find any solution for this ?

mathieujobin added the stale label Feb 2, 2022

mathieujobin added the question label Apr 15, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

query performance: need help optimizing #495

query performance: need help optimizing #495

spiicychknsandy commented Jul 2, 2020

rishi-jisr commented Apr 14, 2022

query performance: need help optimizing #495

query performance: need help optimizing #495

Comments

spiicychknsandy commented Jul 2, 2020

rishi-jisr commented Apr 14, 2022