Possible enhancement: Filters as lists of indexes #429

dale-wahl · 2024-04-25T07:31:43Z

I had a thought about how filters work. I think it would be relatively easy to implement filters as simple lists (or database tables) of item indexes from the parent dataset. Creating a filter would then create a dataset containing only the indexes of the filtered items in the parent. If a dataset is a filtered dataset, iterate_items would iterate through the parent dataset and yield only the items at those indexes.
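A minimal sketch of the iteration idea, assuming the parent dataset can be read as an ordered stream of items and the filter stores zero-based positions into it. The helper name iterate_filtered and its parameters are hypothetical and not part of 4CAT's existing API.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def iterate_filtered(parent_items: Iterable[T], kept_indexes: Iterable[int]) -> Iterator[T]:
    """Hypothetical helper: yield only the parent items whose position is kept.

    parent_items stands in for iterating the parent dataset; kept_indexes
    stands in for the index list stored by the filter dataset.
    """
    keep = set(kept_indexes)       # set lookup keeps the per-item check O(1)
    last = max(keep, default=-1)   # highest index we still need to reach
    for index, item in enumerate(parent_items):
        if index > last:
            break                  # nothing further in the parent can match
        if index in keep:
            yield item
```

Items come out in the parent's order regardless of the order in which the indexes were stored, and iteration stops as soon as the highest kept index has been passed.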

The indexes could be stored as an actual CSV/NDJSON dataset or in a database table. The idea is to save space by not duplicating data.
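As an illustration of the database-table option, the sketch below keeps one row per (filter, index) pair. sqlite3 stands in for whatever database would actually be used, and the table name filter_indexes and the functions save_filter and load_filter are hypothetical.

```python
import sqlite3
from typing import Iterable, List


def save_filter(conn: sqlite3.Connection, filter_key: str, indexes: Iterable[int]) -> None:
    """Store the kept item indexes for one filter dataset (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS filter_indexes ("
        " filter_key TEXT NOT NULL,"
        " item_index INTEGER NOT NULL,"
        " PRIMARY KEY (filter_key, item_index))"
    )
    # Duplicate indexes are ignored thanks to the primary key
    conn.executemany(
        "INSERT OR IGNORE INTO filter_indexes (filter_key, item_index) VALUES (?, ?)",
        ((filter_key, i) for i in indexes),
    )
    conn.commit()


def load_filter(conn: sqlite3.Connection, filter_key: str) -> List[int]:
    """Read the kept indexes back in parent order."""
    rows = conn.execute(
        "SELECT item_index FROM filter_indexes WHERE filter_key = ? ORDER BY item_index",
        (filter_key,),
    )
    return [row[0] for row in rows]
```

Only small integers are stored per filter, instead of a full copy of every kept item.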

Possible issue: deleting a parent dataset, but wanting to keep a filtered dataset.

We would also have to update how downloading datasets works in the frontend, since filtered datasets would no longer have a flat file.
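One way to handle downloads would be to build the flat file on demand from the parent plus the stored indexes. This is a sketch under the assumption that parent items are flat dicts with known field names; materialize_csv is a hypothetical name, and a real endpoint would likely stream rows rather than build the whole file in memory.

```python
import csv
import io
from typing import Dict, Iterable, List


def materialize_csv(
    parent_rows: Iterable[Dict[str, str]],
    kept_indexes: Iterable[int],
    fieldnames: List[str],
) -> str:
    """Hypothetical helper: render a filtered dataset as CSV text for download."""
    keep = set(kept_indexes)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    for index, row in enumerate(parent_rows):
        if index in keep:
            writer.writerow(row)  # only rows selected by the filter are written
    return buffer.getvalue()
```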

The migration script would be complicated, since indexes for the parent dataset are not stored and existing filter datasets may therefore need to be rerun. Alternatively, we could use something like the item ID instead of indexes and check for it. That makes iterate_items a more expensive calculation (is this ID in a long list?), though if the IDs were fetched from a database table it might not be so bad. It would also make the migration script easier.
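A sketch of the ID-based alternative, assuming every item carries a unique "id" field (the field name, ids_from_filtered and iterate_by_id are all hypothetical). Migration would collect the IDs from an existing filtered dataset; iteration then checks each parent item against a set, so the "is this ID in a long list" check costs O(1) per item rather than scanning the list.

```python
from typing import Any, Dict, Iterable, Iterator, Set


def ids_from_filtered(filtered_items: Iterable[Dict[str, Any]], id_field: str = "id") -> Set[Any]:
    """Migration step: collect IDs from an existing, fully stored filtered dataset."""
    return {item[id_field] for item in filtered_items}


def iterate_by_id(
    parent_items: Iterable[Dict[str, Any]],
    kept_ids: Set[Any],
    id_field: str = "id",
) -> Iterator[Dict[str, Any]]:
    """Yield parent items whose ID was kept by the filter (set membership check)."""
    for item in parent_items:
        if item.get(id_field) in kept_ids:
            yield item
```

Because the IDs can be read straight from the old filtered files, no filter would need to be rerun to recover positions in the parent.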

sal-uva · 2024-04-29T15:12:47Z

My two cents: this would indeed provide some technical and memory benefits, but it goes against 4CAT's 'traceability', where every research step is concrete and easily retrievable. We could make filtered datasets appear as if they were new, standalone datasets, but in my opinion the work required is not worth it.

dale-wahl added the enhancement (New feature or request) and questionable labels on Apr 25, 2024
