Add collections to field that can be updated atomically #271

Open
anjackson opened this issue Nov 2, 2021 · 4 comments

Comments

@anjackson
Contributor

anjackson commented Nov 2, 2021

Currently, collections are stored as strings in multivalued fields. This has a couple of problems. Firstly, the display strings should really be looked up in the UI, so we only need to store integer IDs for collections.

More importantly, the current model requires a full document re-index whenever the collections are updated. It would be better to store the collections in fields that meet the criteria for atomic, in-place updates (see In-Place Updates). This would allow collection membership to be updated without costly full re-indexing.

The main limitation is that these fields have to be single-valued. If URLs can only belong to one collection, or have a 'primary collection', then this works fine. But in general we want multiple collections, so as a workaround we can use dynamic fields, something like:

collection_1_id_i: 231
collection_2_id_i: 214
...

Then, at query time, we facet on all collection_*_id_i values (and likely have to enumerate and merge these facets client side?).
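As a rough sketch of the client-side merge, the snippet below sums the per-slot facet counts into one count per collection ID. It assumes a Solr-style facet response where each field maps to a flat `[value, count, value, count, ...]` list; the function name and the sample data are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical slot-field naming, following the collection_N_id_i pattern above.
SLOT_FIELD = re.compile(r"^collection_\d+_id_i$")


def merge_collection_facets(facet_fields):
    """Sum per-slot facet counts into a single count per collection ID.

    `facet_fields` is assumed to map field name -> flat list of
    [value, count, value, count, ...].
    """
    totals = Counter()
    for field, flat in facet_fields.items():
        if not SLOT_FIELD.match(field):
            continue  # ignore facets that are not collection slots
        for value, count in zip(flat[0::2], flat[1::2]):
            totals[int(value)] += count
    return totals


# Made-up response fragment, for illustration only.
example = {
    "collection_1_id_i": ["231", 10, "214", 3],
    "collection_2_id_i": ["214", 7],
    "content_type": ["text/html", 20],
}
merged = merge_collection_facets(example)
```

The merged counts are only correct if a document never holds the same collection ID in two slots, so the update side would need to guarantee that.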

This needs to be tested from the client end to check it's workable. I think we may have to enumerate all the facets separately, so in practice we'll have a limit of e.g. 6 collections an item can belong to?
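To make the "enumerate all the facets separately" idea concrete, here is a small sketch that builds the query parameters with one `facet.field` per slot. The limit of 6 is just the example figure from above, and the helper name is hypothetical; it assumes Solr-style request parameters.

```python
from urllib.parse import urlencode

MAX_SLOTS = 6  # example limit from the discussion, not a decided value


def collection_facet_params(query="*:*", max_slots=MAX_SLOTS):
    """Build request parameters that facet on every collection slot field."""
    params = [("q", query), ("rows", "0"), ("facet", "true")]
    # One facet.field parameter per slot, since the slots are separate fields.
    params += [
        ("facet.field", f"collection_{n}_id_i") for n in range(1, max_slots + 1)
    ]
    return params


query_string = urlencode(collection_facet_params())
```

Each extra slot adds another facet to compute per request, which is one reason to keep the limit small.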

EDIT: The rights field access_terms should also be an integer rather than a string, so this can be changed too. Same for any subject fields.

@anjackson anjackson changed the title Shift collections to field that can be updated atomically Add collections to field that can be updated atomically Nov 2, 2021
@tokee
Collaborator

tokee commented Nov 9, 2021

Updating is not trivial, as one needs to extract the collections for a document first so that the next free collection field can be determined. But I have no better idea than yours. By limiting the number of collections to 64, they could be stored in a single long, but that would require more front-end code to unpack, and the number of unique values when faceting is potentially enormous.
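The read-then-update step could look roughly like the sketch below: given a document's current fields, find the first unused slot and build an atomic update using Solr's `{"set": value}` modifier syntax. The `id` key, the slot limit, and the function name are assumptions for illustration.

```python
import re

# Hypothetical slot-field naming, following the collection_N_id_i pattern.
SLOT_FIELD = re.compile(r"^collection_(\d+)_id_i$")


def next_slot_update(doc, collection_id, max_slots=6):
    """Return an atomic-update document that adds `collection_id` to `doc`.

    Returns None if the document already belongs to the collection and
    raises ValueError if every slot is taken.
    """
    used = {}
    for field, value in doc.items():
        match = SLOT_FIELD.match(field)
        if match:
            used[int(match.group(1))] = value
    if collection_id in used.values():
        return None  # already a member, nothing to update
    for slot in range(1, max_slots + 1):
        if slot not in used:
            return {
                "id": doc["id"],
                f"collection_{slot}_id_i": {"set": collection_id},
            }
    raise ValueError("no free collection slot")


# Made-up document, for illustration only.
current = {"id": "doc-1", "collection_1_id_i": 231, "collection_2_id_i": 214}
update = next_slot_update(current, 99)
```

Because the slot is chosen from a prior read, two concurrent updates to the same document could pick the same slot, so some form of version check would probably be needed.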

@anjackson
Contributor Author

We can store them in a long, but I couldn't see a way to facet on bits? Maybe I missed something?

@tokee
Collaborator

tokee commented Nov 9, 2021

You can't facet on bits in longs (well, one could build a special processor for it, but that would be tedious to maintain). But you could post-process the facet result and do the tallying of the individual collections there. But again: I prefer your solution. I'm just thinking out loud here.
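For comparison, the post-processing approach might be sketched as follows: pack up to 64 collection bit positions into a signed 64-bit value, facet on that value as usual, then tally the counts per bit on the client. The mapping from collection IDs to bit positions 0..63 is assumed to exist elsewhere, and both function names are hypothetical.

```python
from collections import Counter

UNSIGNED_64 = (1 << 64) - 1


def pack_collections(bit_positions):
    """Pack bit positions (0..63) into a signed 64-bit integer."""
    mask = 0
    for bit in bit_positions:
        if not 0 <= bit < 64:
            raise ValueError(f"bit position out of range: {bit}")
        mask |= 1 << bit
    # Convert to the signed range a 64-bit long field would hold.
    return mask - (1 << 64) if mask >= (1 << 63) else mask


def tally_bits(mask_counts):
    """Turn facet counts per packed value into counts per bit position."""
    totals = Counter()
    for mask, count in mask_counts.items():
        mask &= UNSIGNED_64  # view negative longs as unsigned bit patterns
        bit = 0
        while mask:
            if mask & 1:
                totals[bit] += count
            mask >>= 1
            bit += 1
    return totals


# Made-up facet counts keyed by packed value, for illustration only.
facet_counts = {
    pack_collections([0, 2]): 5,
    pack_collections([2]): 3,
    pack_collections([63]): 2,
}
per_bit = tally_bits(facet_counts)
```

The tally is only exact if the facet returns every distinct packed value, which is where the potentially enormous number of unique values becomes a cost.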

@anjackson
Contributor Author

Ah gotcha. And you're right, the updating will be tricky.
