v.out.ogr: faster export with many attributes #4741
With dbf as db driver, the time to export streets_wake is reduced from 2m22.4s to 2.5 sec.
Perhaps an obvious question, but why is this a separate tool from v.out.ogr? Why not incorporate it there?
Because it might substantially increase RAM consumption if the complete attribute table sorted by category values is kept in RAM (depends on the db driver). Therefore I am undecided if this should be a new module or an improvement to |
Exporting many attributes seems like a very common use case, so making it faster (even with more RAM used) sounds good to me.
Would it be possible to add this as an option to v.out.ogr, or make it the default and keep the old method as an option? That way, there remains an alternative if RAM becomes a limitation.
Thanks for the positive feedback! Regarding memory consumption, there is an increase of 0.5% (sqlite) and 1.5% (dbf) in RAM consumption when exporting streets_wake. This should in practice have no adverse effect. I have included the new, faster method directly in |
Since it would be used only for special cases (if ever), when the new way doesn't work, can't you use a full-word flag instead of a single letter?
After that (once the adaptation is finished), the v.out.ograttr files won't be needed in this PR anymore, right?
The |
I also have a general question regarding the approach. From the description, I understand that you do the equivalent of one query to get the attributes, then operate on each feature one by one.
If I were doing something purely in a database, I would use a transaction with a consistent snapshot, if available and appropriate, to make sure that what I'm iterating over from my cached copy is still valid when I reach the feature. Do any of these concepts apply here?
Co-authored-by: Edouard Choinière <[email protected]>
IIUC, the result of a select statement is a temporary table that will hopefully not be modified while iterating over the results. At least in case of our default db driver sqlite, another process should not be able to modify the original table on which the select statement was executed because the database should be locked. The |
Are there any HTML docs to update here before merging, or will the defaults be enough?
The new option |
I was just making sure, since I noticed that there wasn't any HTML file in the diff, while I remember that there was one in a previous state of the PR, when it was a separate module. If it wasn't a 1:1 copy of the docs, then the changes would have been lost here. That's all from me, thanks a lot for this enhancement!
This PR proposes a new version of v.out.ogr that is about 5x faster when exporting vector maps with many attributes. E.g. the time to export a copy of streets_wake from the NC dataset in a mapset that uses sqlite as db is reduced from 12.5 sec to 2.5 sec.
While v.out.ogr issues a sql select statement for each feature to be exported, the new version issues only a single sql select statement at the beginning to get attributes ordered by category value. Vector features are then traversed also by ordered category value and the corresponding attributes can be directly fetched from the result of the initial select statement.
This PR introduces a new alternative to v.out.ogr named v.out.ograttr, but another name could be used instead, or v.out.ogr could be replaced.