I had a thought regarding how filters work. I think it would be relatively easy to implement filters as simple lists/DB tables of item indexes from the parent dataset. Essentially, creating a filter would create a dataset containing only the indexes of the filtered items from the parent. Then iterate_items would, if the dataset is a filtered dataset, iterate through the parent dataset and yield only the items at those indexes.
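A minimal sketch of what that could look like; the Dataset class, its filtered_indexes attribute, and the file handling here are hypothetical stand-ins for the real 4CAT internals, just to illustrate the idea:

```python
import csv
import json
from pathlib import Path


class Dataset:
    """Very simplified stand-in for a 4CAT dataset (hypothetical, for illustration)."""

    def __init__(self, data_path=None, parent=None, filtered_indexes=None):
        self.data_path = Path(data_path) if data_path else None
        self.parent = parent
        # Hypothetical: a filtered dataset stores only the parent's item indexes,
        # e.g. loaded from a small ndjson file or a database table.
        self.filtered_indexes = set(filtered_indexes) if filtered_indexes else None

    def iterate_items(self):
        if self.parent and self.filtered_indexes is not None:
            # Filtered dataset: walk the parent and yield only the kept indexes,
            # so no item data is duplicated on disk.
            for index, item in enumerate(self.parent.iterate_items()):
                if index in self.filtered_indexes:
                    yield item
        else:
            # Regular dataset: read items from its own flat file.
            with self.data_path.open(encoding="utf-8") as infile:
                if self.data_path.suffix == ".ndjson":
                    for line in infile:
                        yield json.loads(line)
                else:
                    yield from csv.DictReader(infile)
```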
The indexes could be stored as an actual csv/ndjson dataset or in a database table. The idea being to save space by not duplicating data.
Possible issue: deleting a parent dataset, but wanting to keep a filtered dataset.
We would also have to update how downloading datasets works in the frontend, since we would no longer have a flat file for filtered datasets.
The migrate script would be complicated, since indexes for the parent dataset are not currently stored, so filter datasets might need to be rerun. Alternatively, we could use something like the item id and check for that instead of indexes. That makes iterate_items a more complex calculation (is this id in a long list?), though if we fetch from a database table it might not be so bad. This would make the migrate script easier.
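A rough sketch of what the id-based variant could look like, assuming each item has an `id` field and the kept ids live in a database table; the `filtered_items` table, its schema, and the function name are made up here:

```python
import sqlite3


def iterate_filtered_by_id(parent_dataset, db_path, filter_key):
    """
    Yield only the parent's items whose id is in the filter table.

    Hypothetical schema: a table filtered_items(filter_key, item_id) with one
    row per kept item. Loading the ids into a set up front keeps the per-item
    check cheap (set membership instead of one query per item).
    """
    connection = sqlite3.connect(db_path)
    rows = connection.execute(
        "SELECT item_id FROM filtered_items WHERE filter_key = ?", (filter_key,)
    )
    kept_ids = {row[0] for row in rows}
    connection.close()

    for item in parent_dataset.iterate_items():
        if item["id"] in kept_ids:
            yield item
```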
My two cents is that this would indeed provide some technical and memory benefits, but it goes against 4CAT's 'traceability', where every research step is a concrete and easily retrievable artifact. We could make it work so that filtered datasets still appear to be standalone datasets, but the work required is not worth it imo.