Possible enhancement: Filters as lists of indexes #429

Open
dale-wahl opened this issue Apr 25, 2024 · 1 comment
Labels
enhancement New feature or request questionable

Comments

@dale-wahl
Member

dale-wahl commented Apr 25, 2024

I had a thought regarding how filters work. I think it would be relatively easy to implement filters as simple lists or database tables of item indexes from the parent dataset. Essentially, creating a filter would create a dataset containing only the indexes of the filtered items in the parent. Then, if the dataset is a filtered dataset, iterate_items would iterate through the parent dataset and yield only the items at those indexes.
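A minimal sketch of the index-based idea, for illustration only. The function name `iterate_filtered`, the item dictionaries, and the sample data are hypothetical and not taken from the 4CAT codebase; it only shows that the filtered dataset would need to hold positions rather than copies of the items.

```python
from typing import Iterable, Iterator


def iterate_filtered(parent_items: Iterable[dict], kept_indexes: Iterable[int]) -> Iterator[dict]:
    """Yield only the parent items whose position is listed in kept_indexes."""
    kept = set(kept_indexes)  # a set gives constant-time membership checks
    for index, item in enumerate(parent_items):
        if index in kept:
            yield item


# Hypothetical parent dataset; a real one would be read from a CSV/NDJSON file.
parent = [
    {"id": "a", "body": "first"},
    {"id": "b", "body": "second"},
    {"id": "c", "body": "third"},
]

# The "filtered dataset" is just the list of indexes [0, 2].
filtered = list(iterate_filtered(parent, [0, 2]))
```

Note that this only stays correct as long as the parent's item order never changes, which is one reason the id-based alternative may be more robust.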

The indexes could be stored as an actual CSV/NDJSON dataset or in a database table. The idea is to save space by not duplicating data.

Possible issue: deleting a parent dataset, but wanting to keep a filtered dataset.

We would also have to update how downloading datasets works in the frontend, since we would no longer have a flat file for filtered datasets.
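One possible way to handle downloads without a flat file would be to generate the CSV on request from whatever iterator yields the filtered items. This is a rough sketch under that assumption; `stream_csv` and its parameters are hypothetical names, not part of 4CAT.

```python
import csv
import io
from typing import Iterable, Iterator, List


def stream_csv(items: Iterable[dict], fieldnames: List[str]) -> Iterator[str]:
    """Yield CSV text chunk by chunk: first the header, then one row per item."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    yield buffer.getvalue()
    for item in items:
        # Reset the buffer so each chunk contains only the newest row.
        buffer.seek(0)
        buffer.truncate(0)
        writer.writerow(item)
        yield buffer.getvalue()


chunks = list(stream_csv([{"id": "a", "body": "first"}], ["id", "body"]))
```

A web framework could pass such a generator to a streaming response, so the file is never fully materialised on disk.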

The migrate script would be complicated, since indexes for the parent dataset are not currently stored and filter datasets may therefore need to be rerun. Alternatively, we could use something like the item id and check for it instead of indexes. That makes iterate_items a more expensive calculation (is id in long_list), though if the ids are fetched from a db table it perhaps wouldn’t be so bad. This would make the migrate script easier.
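A sketch of the id-based variant with the ids kept in a database table. Everything here is an assumption made for illustration: the table name `filter_items`, the function names, and the use of an in-memory SQLite database as a stand-in for whatever database the project actually uses. Loading the ids into a set first keeps the "is id in long_list" check cheap.

```python
import sqlite3
from typing import Iterable, Iterator


def store_filter_ids(conn: sqlite3.Connection, filter_key: str, item_ids: Iterable[str]) -> None:
    """Save the ids of the items that passed the filter (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS filter_items ("
        "filter_key TEXT, item_id TEXT, PRIMARY KEY (filter_key, item_id))"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO filter_items VALUES (?, ?)",
        [(filter_key, item_id) for item_id in item_ids],
    )
    conn.commit()


def iterate_by_id(conn: sqlite3.Connection, filter_key: str, parent_items: Iterable[dict]) -> Iterator[dict]:
    """Yield parent items whose id is stored for this filter."""
    rows = conn.execute(
        "SELECT item_id FROM filter_items WHERE filter_key = ?", (filter_key,)
    )
    kept = {row[0] for row in rows}  # set lookup instead of scanning a long list
    for item in parent_items:
        if item["id"] in kept:
            yield item


conn = sqlite3.connect(":memory:")
store_filter_ids(conn, "filter-1", ["a", "c"])
parent = [{"id": "a"}, {"id": "b"}, {"id": "c"}]
result = list(iterate_by_id(conn, "filter-1", parent))
```

Because ids do not depend on item order, a migration could in principle rebuild such a table by reading the ids out of existing filtered files rather than rerunning the filters.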

@dale-wahl dale-wahl added enhancement New feature or request questionable labels Apr 25, 2024
@sal-uva
Collaborator

sal-uva commented Apr 29, 2024

My two cents is that this would indeed provide some technical and memory benefits, but it goes against 4CAT's 'traceability', where every research step is concrete and easily retrievable. We could make it work so that it seems as if filtered datasets are new ones, but the work required is not worth it imo.
