As a user of the objects API, I want to ensure my data is properly indexed and quick to retrieve #200

Open
alextreme opened this issue Jun 4, 2021 · 1 comment

@alextreme
Member

From: https://github.com/orgs/Gemeente-DenHaag/projects/3#card-62299949

Based on a review a couple of improvements would be possible, feel free to correct me if I'm wrong and note that we can have @stevenbal help out with these changes if necessary:

  1. Pagination has been implemented for "As developer, I want the API to be paginated" (#148), by Anna in PR "As developer, I want the objects API to include a reference to the objecttype" (#36). It has yet to be merged or released and will not be a backwards-compatible API change. Keep in mind that pagination might hamper bulk retrievals: with 500 objects per page, getting 1M objects will need 2000 API calls instead of 1, and those 2000 calls will be much harder to optimize performance-wise than one big call. I would suggest letting the API user toggle pagination, or allowing a MAX_INT pageSize, as the appropriate size depends on the use case.

  2. All objectrecords have a date and filtering seems date-enabled (last record / current record); however, I didn't notice a DB index on the date fields. One could be added, and you might want to order only on '-index' for the last record to avoid the date-ordering overhead altogether: https://github.com/maykinmedia/objects-api/blob/master/src/objects/core/models.py#L71

  3. The postgres JSONField could be switched to the built-in Django 3.1 db.models.JSONField https://docs.djangoproject.com/en/3.2/ref/models/fields/#django.db.models.JSONField but both use jsonb under the hood so it shouldn't matter too much.

  4. Specific indexes can be added on the fly to a jsonb field ( https://www.postgresql.org/docs/current/datatype-json.html ); however, this depends on the use case, and the optimal indexes will differ per object type.
    4.1) It might make sense to add indexes to certain fields which occur in many object-types.
    4.2) As an alternative it would be possible to extend the viewing and creating of jsonb indexes to an administrator of the objects-api. This would allow an administrator to tweak the performance for their use-case. In-depth knowledge of the objects stored and the API calls used would be essential to do this properly though, and the performance could also be negatively impacted if used incorrectly. I would not recommend exposing this functionality via an API.

  5. The geometry field is automatically indexed with a spatial index, so it doesn't need one set explicitly.
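
To make point 4 more concrete, here is a minimal sketch of the kind of SQL an administrator might generate for a jsonb field. The table name `core_objectrecord`, the column `data` and the key `status` are hypothetical placeholders, not taken from the objects-api schema; only the PostgreSQL index syntax itself is standard.

```python
import re

# Accept only plain identifiers so the generated SQL cannot be abused.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check(*names: str) -> None:
    for name in names:
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")


def jsonb_key_index_sql(table: str, column: str, key: str) -> str:
    """Expression index on one top-level key, for filters on that key's text value."""
    _check(table, column, key)
    index_name = f"{table}_{column}_{key}_idx"
    return (
        f"CREATE INDEX CONCURRENTLY IF NOT EXISTS {index_name} "
        f"ON {table} (({column} ->> '{key}'));"
    )


def jsonb_gin_index_sql(table: str, column: str) -> str:
    """GIN index on the whole column, for containment (@>) queries."""
    _check(table, column)
    index_name = f"{table}_{column}_gin_idx"
    return (
        f"CREATE INDEX CONCURRENTLY IF NOT EXISTS {index_name} "
        f"ON {table} USING GIN ({column} jsonb_path_ops);"
    )


# Hypothetical names, for illustration only.
print(jsonb_key_index_sql("core_objectrecord", "data", "status"))
print(jsonb_gin_index_sql("core_objectrecord", "data"))
```

An expression index only helps queries that filter on exactly that key, which is why the right set differs per object type. A GIN index is more general but larger and slower to update.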

Based on the above I would recommend 2+3 and would like to discuss 1+4 further. 2+3+4 can be implemented without API changes, so I would only want to pursue 1 in the short term to avoid the API change later on.
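
For point 2, a rough way to see the effect of the suggested indexes is to look at the query plan. The sketch below uses SQLite from the Python standard library purely as a stand-in for PostgreSQL, so the planner output will differ from production. The table and column names (`objectrecord`, `start_at`, `object_id`, `idx`) are assumptions for illustration, not the actual objects-api schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "idx" stands in for the record index field, since INDEX is an SQL keyword.
conn.execute(
    "CREATE TABLE objectrecord ("
    " id INTEGER PRIMARY KEY,"
    " object_id INTEGER NOT NULL,"
    " idx INTEGER NOT NULL,"
    " start_at TEXT NOT NULL,"
    " end_at TEXT)"
)
# Index for date filtering, and a composite index for "last record per object".
conn.execute("CREATE INDEX objectrecord_start_at_idx ON objectrecord (start_at)")
conn.execute("CREATE INDEX objectrecord_object_idx ON objectrecord (object_id, idx)")


def query_plan(sql: str, params: tuple = ()) -> str:
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


# Date filter: should be answered from the start_at index.
date_plan = query_plan(
    "SELECT id FROM objectrecord WHERE start_at <= ?", ("2021-06-04",)
)
# Last record: ordering on idx alone, no date ordering involved.
last_plan = query_plan(
    "SELECT id FROM objectrecord WHERE object_id = ? ORDER BY idx DESC LIMIT 1",
    (1,),
)
print(date_plan)
print(last_plan)
```

On PostgreSQL the equivalent check would be `EXPLAIN` on the queries the API actually issues, run before and after adding the indexes.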

@annashamray
Collaborator

@alextreme

Thanks for analyzing the API performance-wise!

  1. Let's assess the time-to-response for different page sizes in the performance tests before making any decision (Feature/pagination #153)
  2. Nice catch! No DB indexes were created to optimize performance, and indexes on the date fields are certainly needed. I hope we will find other fields to index when running the performance tests.
  3. No objection, but I thought that a DB-specific field would be better optimized for this particular DB.
  4. Afaik we don't have enough information yet about which data attributes would be used by many object types. I think this optimization is a bit premature, so let's collect some data from clients first.
  5. It looks like the geometry field representation can itself be a bottleneck, even without filtering on it. We'll see the results after the performance testing.
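
For the page-size assessment in point 1, the number of round trips a client needs can be sketched as below. `fetch_page` is a hypothetical callable standing in for one API request; its signature is an assumption, not the objects-api client interface.

```python
from typing import Callable, Iterator, List


def calls_needed(total_objects: int, page_size: int) -> int:
    """Number of paginated requests needed to retrieve every object."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return (total_objects + page_size - 1) // page_size  # integer ceiling


def fetch_all(
    fetch_page: Callable[[int, int], List[dict]], page_size: int
) -> Iterator[dict]:
    """Yield every object, requesting pages until a short page comes back."""
    page = 1
    while True:
        batch = fetch_page(page, page_size)
        yield from batch
        if len(batch) < page_size:
            break
        page += 1


# The figure from the issue: 1M objects at 500 per page.
print(calls_needed(1_000_000, 500))
```

Timing `fetch_all` against a test instance for a few page sizes would give the time-to-response comparison mentioned above.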
