URL list should be sorted before given to the user!? #351
Comments
@smancini we need to identify which layers it would make sense to provide sorted, and in what order. For example, I'm suggesting we now sort ACORN hourly files alphabetically (which is equivalent to chronologically in this case): aodn/geoserver-config#567
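Here is a minimal sketch of why alphabetical order can stand in for chronological order here. It assumes, hypothetically, that ACORN hourly file names share a common prefix followed by a fixed-width, zero-padded UTC timestamp. The file names below are illustrative and are not taken from the actual collection.

```python
from datetime import datetime

# Hypothetical file names, assumed to follow an
# IMOS_ACORN_V_<YYYYMMDDTHHMMSSZ>_<SITE>_... pattern (illustrative only).
urls = [
    "IMOS_ACORN_V_20140101T023000Z_BONC_FV00_1-hour-avg.nc",
    "IMOS_ACORN_V_20131231T233000Z_BONC_FV00_1-hour-avg.nc",
    "IMOS_ACORN_V_20140101T003000Z_BONC_FV00_1-hour-avg.nc",
]


def timestamp_of(name: str) -> datetime:
    # Under the assumed pattern, field 3 (0-based) of the
    # underscore-separated name is the timestamp.
    return datetime.strptime(name.split("_")[3], "%Y%m%dT%H%M%SZ")


alphabetical = sorted(urls)                     # plain string sort
chronological = sorted(urls, key=timestamp_of)  # sort by parsed time
print(alphabetical == chronological)
```

The equivalence only holds while every name keeps the same prefix and timestamp width. A name that varied before the timestamp would break it.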
I don't think we should be on a blitz to sort as many collections as possible while there are performance problems partially caused by trying to sort huge tables (see https://github.com/aodn/issues/issues/199). Having sorts behind synchronous web requests has the potential to cause many more performance problems. I'd suggest this be approached delicately from a systems perspective rather than treated as a "content enhancement", because sorting is not free. For example, the ACORN change immediately makes the query take three times as long to complete. That might be acceptable for that individual use case, but it is just one small example of how sorting can drastically worsen performance:
In any case, as I said, this is not a simple content enhancement and must be considered in the context of the current database performance/metrics/tuning initiative(s).
I don't think this case is quite comparable to https://github.com/aodn/issues/issues/199, because here we're not trying to sort a huge amount of data. We're sorting a list of file URLs, which is considerably smaller than the number of data points in a dataset. As your benchmark shows, it takes less than 2 s to retrieve the whole list of files from BONC, which equates to only 45000 rows. In addition, the number of files is not going to grow exponentially the way a data collection would (instruments are collecting data at a consistently growing rate), because we only get one file per hour, consistently. I think that for radar URLs it is pretty safe to sort, and doing so would significantly enhance the user experience.
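The growth argument can be put in numbers with a minimal sketch. It assumes one row per file, a constant rate of one file per hour, and BONC's current 45000 rows as the starting point. `projected_rows` is a hypothetical helper, not part of any AODN code.

```python
# Assumption: 365-day years (leap days ignored).
HOURS_PER_YEAR = 24 * 365


def projected_rows(current_rows: int, files_per_hour: float, years: float) -> int:
    """Hypothetical helper: linear growth, one row per file,
    a constant number of new files every hour."""
    return round(current_rows + files_per_hour * HOURS_PER_YEAR * years)


for years in (1, 5, 10):
    print(years, projected_rows(45_000, 1, years))
```

At one file per hour the table grows by a fixed 8760 rows a year, so it roughly triples in about a decade rather than growing exponentially.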
In terms of number of URLs, the radar collections are probably the biggest, with satellite data second: gsla_dm is 6000 files, growing at 1 file per day. Next are probably the moorings, because of the number of instruments deployed on each mooring: anmn_ts is 5700 files. anfog_dm is 255 files. Is it a performance problem to query and sort 45000 rows? 6000 rows? Note that this is the worst-case scenario, where the user requests the whole collection.
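This rough sketch addresses only the in-memory part of the question. It sorts synthetic URL strings in Python at the sizes quoted above. The URL shape and the `time_sort` helper are hypothetical. The timing says nothing about a database `ORDER BY`, which also involves query planning, sort memory and possible spills to disk.

```python
import random
import string
import time


def random_url(rng: random.Random) -> str:
    # Hypothetical URL shape: a fixed prefix plus a random 14-digit stamp.
    stamp = "".join(rng.choices(string.digits, k=14))
    return f"http://example.org/IMOS/ACORN/{stamp}.nc"


def time_sort(n: int, seed: int = 0) -> tuple[float, list[str]]:
    """Build n synthetic URLs, sort them in place, return (seconds, urls)."""
    rng = random.Random(seed)
    urls = [random_url(rng) for _ in range(n)]
    start = time.perf_counter()
    urls.sort()
    return time.perf_counter() - start, urls


sizes = [("radar (BONC)", 45_000), ("gsla_dm", 6_000),
         ("anmn_ts", 5_700), ("anfog_dm", 255)]
for name, n in sizes:
    elapsed, _ = time_sort(n)
    print(f"{name:>12}: {n:>6} rows sorted in {elapsed * 1000:.2f} ms")
```

Timings vary by machine. The point is only that sorting tens of thousands of short strings in memory is cheap. Whether the same holds inside the database depends on the sort memory configuration discussed in the thread.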
It's not necessarily a performance problem per se, but we do need to look at it in the context of the current performance tuning/metrics work. For example, introducing external disk sorts should be assessed against the sort memory configuration, etc. It seems like a small thing but has the potential to contribute to an overall poorly performing database. We just need to be very clear on the scope, size and growth rate of the tables where this sort of thing is enabled. That is why I suggest aligning it with the other database work that seeks to optimise and improve performance. Otherwise we end up in a situation where the left hand isn't talking to the right hand: either the optimisation work is undermined by overly zealous content enhancements, or content enhancements are undermined by overly zealous optimisation work!
@ggalibert can we close this? I don't think this is an important task, and, as mentioned in the comments, it is not easily achievable given the asynchronous system in place.
Migrated from here: aodn/aodn-portal#2686