v.out.ogr: faster export with many attributes #4741

metzm · 2024-11-22T17:03:59Z

This PR proposes a new version of v.out.ogr that is about 5x faster when exporting vector maps with many attributes. For example, the time to export a copy of streets_wake from the NC dataset in a mapset that uses sqlite as the db is reduced from 12.5 sec to 2.5 sec.

Whereas v.out.ogr issues an SQL SELECT statement for each feature to be exported, the new version issues only a single SELECT statement at the beginning, which fetches all attributes ordered by category value. Vector features are then also traversed in category order, so the attributes for each feature can be fetched directly from the result of that initial SELECT statement.
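The idea is essentially a merge join on the category value. Below is a minimal, self-contained sketch of it, not the PR's code: `AttrRow`, `Feature` and `attach_attributes` are hypothetical stand-ins, the rows are assumed to arrive already sorted as `ORDER BY cat` would return them, and the features are sorted here with `qsort` purely for illustration. Because both sequences are in category order, the attribute cursor only ever moves forward, so one pass replaces one SELECT per feature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one row of the single "SELECT ... ORDER BY cat". */
typedef struct {
    int cat;
    const char *name;
} AttrRow;

/* Hypothetical stand-in for a vector feature carrying one category. */
typedef struct {
    int id;
    int cat;
    const AttrRow *attr; /* set by the merge; NULL if no row has this cat */
} Feature;

static int cmp_feature_cat(const void *a, const void *b)
{
    const Feature *fa = a, *fb = b;

    return (fa->cat > fb->cat) - (fa->cat < fb->cat);
}

/* Sort features by category (in place), then walk features and rows in
 * lockstep. rows must already be sorted by cat. Returns the number of
 * features that found a matching attribute row. */
static size_t attach_attributes(Feature *features, size_t nfeatures,
                                const AttrRow *rows, size_t nrows)
{
    size_t i, r = 0, matched = 0;

    qsort(features, nfeatures, sizeof(Feature), cmp_feature_cat);

    for (i = 0; i < nfeatures; i++) {
        /* skip rows whose category is smaller than this feature's */
        while (r < nrows && rows[r].cat < features[i].cat) {
            assert(r + 1 >= nrows || rows[r].cat <= rows[r + 1].cat);
            r++;
        }
        if (r < nrows && rows[r].cat == features[i].cat) {
            /* cursor is not advanced: later features may share this cat */
            features[i].attr = &rows[r];
            matched++;
        }
        else {
            features[i].attr = NULL;
        }
    }
    return matched;
}
```

Rows whose category has no feature are simply skipped, and several features sharing a category all point at the same row, which mirrors how one attribute record can be linked to several vector features.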

This PR introduces a new alternative to v.out.ogr named v.out.ograttr; alternatively, another name could be used, or v.out.ogr itself could be replaced.

metzm · 2024-11-22T18:24:31Z

With dbf as the db driver, the time to export streets_wake is reduced from 2m22.4s to 2.5 sec.
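Expressed as speedup ratios, the two reported timings give:

$$
\frac{12.5\ \mathrm{s}}{2.5\ \mathrm{s}} = 5 \ \text{(sqlite)}, \qquad
\frac{2\ \mathrm{min}\ 22.4\ \mathrm{s}}{2.5\ \mathrm{s}} = \frac{142.4\ \mathrm{s}}{2.5\ \mathrm{s}} \approx 57 \ \text{(dbf)}
$$

So with dbf the gain is roughly an order of magnitude larger than the ~5x seen with sqlite.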

petrasovaa · 2024-11-22T20:15:46Z

Perhaps an obvious question, but why is this a separate tool from v.out.ogr? Why not incorporate it there?

metzm · 2024-11-22T20:34:12Z

> Perhaps an obvious question, but why is this a separate tool from v.out.ogr? Why not incorporate it there?

Because keeping the complete attribute table, sorted by category value, in RAM might substantially increase memory consumption (depending on the db driver). Therefore I am undecided whether this should be a new module or an improvement to v.out.ogr. Opinions welcome! You would opt to update v.out.ogr, right?

petrasovaa · 2024-11-22T20:54:52Z

> > Perhaps an obvious question, but why is this a separate tool from v.out.ogr? Why not incorporate it there?
>
> Because keeping the complete attribute table, sorted by category value, in RAM might substantially increase memory consumption (depending on the db driver). Therefore I am undecided whether this should be a new module or an improvement to v.out.ogr. Opinions welcome! You would opt to update v.out.ogr, right?

Exporting many attributes seems like a very common use case, so making it faster (even with more RAM used) sounds good to me.

ecodiv · 2024-11-23T15:07:18Z

> > Perhaps an obvious question, but why is this a separate tool from v.out.ogr? Why not incorporate it there?
>
> Because keeping the complete attribute table, sorted by category value, in RAM might substantially increase memory consumption (depending on the db driver). Therefore I am undecided whether this should be a new module or an improvement to v.out.ogr. Opinions welcome! You would opt to update v.out.ogr, right?

Would it be possible to add this as an option to v.out.ogr, or to make it the default and keep the old method as an option? That way, an alternative would remain if RAM becomes a limitation.

metzm · 2024-11-23T17:48:22Z

Thanks for the positive feedback! Regarding memory consumption, exporting streets_wake increases RAM usage by 0.5% (sqlite) and 1.5% (dbf). In practice, this should have no adverse effect.

I have included the new, faster method directly in v.out.ogr. The old, slower method can be used with the new -o flag. v.out.ogr already has quite a few flags, so I struggled to come up with a flag letter that makes sense and is not yet taken. Suggestions welcome!

echoix · 2024-11-23T17:52:06Z

Since it would only be used in special cases (if ever), e.g. when the new method doesn't work, couldn't you use a full-word option instead of a single-letter flag?

echoix · 2024-11-23T17:53:24Z

Once that adaptation is finished, the v.out.ograttr files won't be needed in this PR anymore, right?

metzm · 2024-11-23T19:29:28Z

The -o flag has been replaced with a new option, method, whose allowed answers are slow and fast; the default is the new, fast method.
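A minimal sketch of how a two-valued `method` answer could select the export path, assuming hypothetical names (`ExportMethod`, `parse_method`) that are not taken from the PR. In GRASS the option parser normally rejects answers outside the allowed list and fills in the default itself; the sketch does both by hand only to stay self-contained.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical enum; the PR's internal names may differ. */
typedef enum { EXPORT_FAST, EXPORT_SLOW } ExportMethod;

/* Map the method= answer to an export path. A NULL answer stands for
 * "option not given", so the default (fast) applies. Returns 0 on success
 * and -1 for any answer other than "fast" or "slow". */
static int parse_method(const char *answer, ExportMethod *out)
{
    assert(out != NULL);

    if (answer == NULL || strcmp(answer, "fast") == 0) {
        *out = EXPORT_FAST; /* single ordered SELECT up front */
        return 0;
    }
    if (strcmp(answer, "slow") == 0) {
        *out = EXPORT_SLOW; /* one SELECT per exported feature */
        return 0;
    }
    return -1;
}
```

An enumerated option with a default keeps the old path reachable for the rare cases where RAM is tight, without spending another single-letter flag.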

echoix

I also have a general question about the approach. From the description, I understand that you do the equivalent of a query to get the attributes, then process each feature one by one.
If I were doing something purely in a database, I would use a transaction with a consistent snapshot, if available and appropriate, to make sure that what I'm iterating over from my cached copy is still valid when I reach the feature. Do any of these concepts apply here?


metzm · 2024-11-26T18:04:13Z

> I also have a general question about the approach. From the description, I understand that you do the equivalent of a query to get the attributes, then process each feature one by one. If I were doing something purely in a database, I would use a transaction with a consistent snapshot, if available and appropriate, to make sure that what I'm iterating over from my cached copy is still valid when I reach the feature. Do any of these concepts apply here?

IIUC, the result of a SELECT statement is a temporary table that will hopefully not be modified while iterating over the results. At least with our default db driver, sqlite, another process should not be able to modify the table on which the SELECT statement was executed, because the database should be locked. Iterating over the result of a SELECT statement with db_fetch(cursor, DB_NEXT, more) is also done by a number of other modules, with no known harm so far.
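To make the iteration pattern concrete, here is a self-contained sketch of a fetch-until-not-more loop. Everything in it (`SketchCursor`, `sketch_open`, `sketch_fetch`, `SKETCH_NEXT`) is hypothetical: it mimics the contract of the GRASS DBMI call `db_fetch(&cursor, DB_NEXT, &more)` without using that API, and it models the result as a private copy taken when the SELECT runs. Whether a real driver materializes the result like this or streams it under a lock, as assumed above for sqlite, is driver-specific.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_ROWS 64
#define SKETCH_NEXT 1 /* hypothetical stand-in for DB_NEXT */

/* Hypothetical cursor owning a private copy of the selected rows,
 * i.e. behaving like the "temporary table" described above. */
typedef struct {
    int cats[SKETCH_MAX_ROWS];
    size_t nrows;
    size_t pos;
} SketchCursor;

/* "Execute" the select: copy the source rows into the cursor. */
static int sketch_open(SketchCursor *c, const int *table, size_t n)
{
    assert(c != NULL && table != NULL);
    if (n > SKETCH_MAX_ROWS)
        return -1;
    memcpy(c->cats, table, n * sizeof(int));
    c->nrows = n;
    c->pos = 0;
    return 0;
}

/* Return one row per call; set *more to 0 once the result is exhausted. */
static int sketch_fetch(SketchCursor *c, int position, int *more, int *cat)
{
    if (position != SKETCH_NEXT)
        return -1; /* only sequential access is modelled */
    if (c->pos >= c->nrows) {
        *more = 0;
        return 0;
    }
    *cat = c->cats[c->pos++];
    *more = 1;
    return 0;
}

/* The usual loop shape: fetch, stop on error or !more, else use the row. */
static size_t sketch_read_all(SketchCursor *c, int *out, size_t cap)
{
    size_t n = 0;
    int more = 1, cat;

    while (n < cap) {
        if (sketch_fetch(c, SKETCH_NEXT, &more, &cat) != 0 || !more)
            break;
        out[n++] = cat;
    }
    return n;
}
```

Because this sketch copies the rows at open time, changing the source table afterwards does not affect what the loop sees. A streaming driver would instead rely on locking to get the same guarantee.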

echoix · 2024-11-27T17:40:25Z

Is there any HTML documentation to update before merging, or will the defaults be enough?

metzm · 2024-11-27T21:00:50Z

> Is there any HTML documentation to update before merging, or will the defaults be enough?

The new option method=fast is automatically included in the docs (I assume you are familiar with how module docs are generated). From a user's perspective, the output is identical; the only difference is that it is produced at least 5 times faster.

echoix · 2024-11-27T21:43:48Z

> > Is there any HTML documentation to update before merging, or will the defaults be enough?
>
> The new option method=fast is automatically included in the docs (I assume you are familiar with how module docs are generated). From a user's perspective, the output is identical; the only difference is that it is produced at least 5 times faster.

I was just making sure, since I noticed there was no HTML file in the diff, although I remember there being one in an earlier state of the PR, when it was a separate module. If that file wasn't a 1:1 copy of the existing docs, its changes would have been lost here.

That's all from me, thanks a lot for this enhancement!

v.out.ogr: faster export with many attributes

0ab586a

metzm added the enhancement, vector, and C labels Nov 22, 2024

metzm added this to the 8.5.0 milestone Nov 22, 2024

metzm requested a review from benducke November 22, 2024 17:03

github-actions bot added the Python, HTML, module, docs, and tests labels Nov 22, 2024

metzm changed the title ~~DRAFT: v.out.ogr: faster export with many attributes~~ v.out.ogr: faster export with many attributes Nov 22, 2024

metzm requested a review from nilason November 22, 2024 17:24

add new fast method directly to v.out.ogr

146c0ff

remove v.out.ograttr, replace flag with option

17b84e1

comment out unused function

249c810

echoix reviewed Nov 26, 2024


vector/v.out.ogr/export_areas_fast.c

vector/v.out.ogr/export_lines_fast.c

vector/v.out.ogr/export_lines_fast.c

metzm and others added 4 commits November 26, 2024 18:56

Update vector/v.out.ogr/export_areas_fast.c

dbb69cb

Co-authored-by: Edouard Choinière <[email protected]>

Update vector/v.out.ogr/export_lines_fast.c

abffbfd

Co-authored-by: Edouard Choinière <[email protected]>

Update vector/v.out.ogr/export_lines_fast.c

da5e499

Co-authored-by: Edouard Choinière <[email protected]>

translate error messages

ab9df91

echoix approved these changes Nov 27, 2024


metzm merged commit fac3981 into OSGeo:main Nov 27, 2024
26 checks passed

metzm deleted the v.out.ogr_attributes branch November 27, 2024 21:52
