Performance Issues #72

yeesian · 2015-09-24T18:10:14Z

I've found the streaming parser to be too slow, mostly because it's maintaining a complicated state machine, and generating lots of objects during the parsing.

I think it's better to just slurp in the data, first, before dealing with the logic within julia. Here's an example:

With this package:

julia> MAP_FILENAME = "tech_square.osm"
"tech_square.osm"

julia> @time nodes, hwys, builds, feats = getOSMData(MAP_FILENAME);
  1.001549 seconds (340.25 k allocations: 1.888 GB, 8.55% gc time)

Under my proposal (implemented here):

julia> @time nodes1 = parseNodes(MAP_FILENAME);
  0.032125 seconds (163.66 k allocations: 10.621 MB, 17.01% gc time)

julia> @time ways1 = parseWays(MAP_FILENAME);
  0.022908 seconds (141.77 k allocations: 8.979 MB)

The differences become significant when processing the following osm file in julia v0.4:

Before:

julia> @time nodes, hwys, builds, feats = getOSMData(MAP_FILENAME);
  2161.987085 seconds (570.21 M allocations: 3.354 TB, 13.85% gc time)

After:

julia> @time nodes = OpenStreetMapParser.parseNodes(MAP_FILENAME);
  49.350339 seconds (244.94 M allocations: 14.512 GB, 9.32% gc time)

julia> @time ways = OpenStreetMapParser.parseWays(MAP_FILENAME);
  50.805148 seconds (209.75 M allocations: 11.937 GB, 23.16% gc time)

The text was updated successfully, but these errors were encountered:

yeesian · 2015-09-24T18:17:02Z

I don't really know how to deal with it in this package though. I'm also leaning towards many other major changes (such as replacing Graphs with LightGraphs, modeling OSM elements more faithfully, and interoperability with DataFrames). Would you prefer for me to get the other package up to speed with most of the functionality here before dropping it in as a replacement, or to maintain a separate fork going forward?

garborg · 2015-09-24T19:18:58Z

That all sounds great to me -- I'd guess with the major switches, it could
be easier to implement and benchmark them in the pared down package at
first.

On Thursday, September 24, 2015, Yeesian Ng [email protected]
wrote:

I don't really know how to deal with it in this package though. I'm also
leaning towards many other major changes (such as replacing Graphs with
LightGraphs, modeling OSM elements more faithfully, and interoperability
with DataFrames). Would you be interested for me to get the other package
https://github.com/yeesian/OpenStreetMapParser.jl up to speed with most
of the functionality here before dropping it in as a replacement, or to
maintain a separate fork going forward?

—
Reply to this email directly or view it on GitHub
#72 (comment)
.

tedsteiner · 2015-09-24T20:59:25Z

I feel like I missed something big here. Are you planning to replace this package entirely?

If there is a faster parser available, why not just incorporate it here instead? I can't see anyone arguing against a faster parser that just reads the nodes or ways. The same goes for Graphs vs. LightGraphs -- I think everyone's only preference would be towards something that works well.

I guess my question is, why do you need to fork or start fresh?

garborg · 2015-09-24T21:31:27Z

I'm on the road, so I haven't followed the link, but it sounded to me like
a set of refactorings that would be easier to do (and Api easier to
experiment with) with a small core, and this is a bit of a sprawling
package. I assumed that would be temporary. Aside: the one item I still
wouldn't be in favor of is having data frames interop in the core
openstreetmap package.

On Thursday, September 24, 2015, Ted Steiner [email protected]
wrote:

I feel like I missed something big here. Are you planning to replace this
package entirely?

If there is a faster parser available, why not just incorporate it here
instead? I can't see anyone arguing against a faster parser that just reads
the nodes or ways. The same goes for Graphs vs. LightGraphs -- I think
everyone's only preference would be towards something that works well.

I guess my question is, why do you need to fork or start fresh?

—
Reply to this email directly or view it on GitHub
#72 (comment)
.

yeesian · 2015-09-24T22:12:40Z

Sorry I was in a seminar.

No, I don't have any intentions of replacing this package by registering a different package in METADATA. Any changes I'd like to see will still come in incremental PRs to this package.

But I did need to start afresh, for the reasons sean mentioned: to experiment with a smaller core, and see how changes (that are beyond the scope of a PR) would work out in practice. It will be really draining to do multiple major-but-incremental PRs (I have tried) that change both the parsers and the object models simultaneously, while (i) making sure everything else works, (ii) demonstrating why the changes might be necessary, and (iii) maintaining a tidy commit history.

yeesian · 2015-09-24T22:37:19Z

Aside: the one item I still wouldn't be in favor of is having data frames interop in the core openstreetmap package.

There won't be any dataframes operations in the core openstreetmap package. The OSM package/objects should be designed only to reflect the nature of OSM objects, rather than providing sophisticated data operations. The allusion to dataframes is only to allow users to export/convert OSM objects to dataframes (for functionality that OSM will not provide) if they so desire.

garborg · 2015-09-24T22:41:00Z

Oh, I misread. That makes sense.

On Thursday, September 24, 2015, Yeesian Ng [email protected]
wrote:

Aside: the one item I still wouldn't be in favor of is having data frames
interop in the core openstreetmap package.

There wouldn't be any dataframes operations in the core openstreetmap
package. The OSM package/objects should be designed only to reflect the
nature of OSM objects, rather than providing sophisticated data operations.
The allusion to dataframes is to allow users to export/convert OSM objects
to dataframes (for functionality that OSM will not provide) if they so
desire.

—
Reply to this email directly or view it on GitHub
#72 (comment)
.

tedsteiner · 2015-09-25T11:32:07Z

Cool, I'm all for it. Thanks for explaining, @yeesian! I was just a little thrown off by some of the text in the scope section your readme. If there's anything I can do to help make your life easier just let me know!

garborg · 2015-09-25T17:01:32Z

I'm back in front of a computer, and after reading @yeesian's readme, I agree that the package structure he outlines is what we should have in the long term.

Since I think @tedsteiner wants this package to remain a one-stop shop, which makes sense to have, I think that, once the design (apis and boundaries) settles (long term), making those individual packages offshoots / dependencies of this package, and registering those individual packages (for those that aren't served by always having graphing + plotting + ... dependencies in production) is the right goal.

tedsteiner · 2015-09-25T18:33:56Z

Sounds good to me.

sbromberger · 2015-11-02T04:48:26Z

Let me know how I can help with LightGraphs. Specifically, if there's functionality you need that doesn't exist, or you have questions relating to existing functionality, please feel free to open up an issue there so we can discuss the best way to proceed.

yeesian · 2015-11-02T04:55:23Z

So I've got stuff working (with LightGraphs) and running on OpenStreetMapParser, and am using that for my research at the moment. I'm really busy at the moment, and won't have time to start doing PRs to this package for awhile though

yeesian mentioned this issue Oct 21, 2015

Maybe remove the dependency on DataFrames? penntaylor/DBaseReader.jl#1

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance Issues #72

Performance Issues #72

yeesian commented Sep 24, 2015

yeesian commented Sep 24, 2015

garborg commented Sep 24, 2015

tedsteiner commented Sep 24, 2015

garborg commented Sep 24, 2015

yeesian commented Sep 24, 2015

yeesian commented Sep 24, 2015

garborg commented Sep 24, 2015

tedsteiner commented Sep 25, 2015

garborg commented Sep 25, 2015

tedsteiner commented Sep 25, 2015

sbromberger commented Nov 2, 2015

yeesian commented Nov 2, 2015

Performance Issues #72

Performance Issues #72

Comments

yeesian commented Sep 24, 2015

yeesian commented Sep 24, 2015

garborg commented Sep 24, 2015

tedsteiner commented Sep 24, 2015

garborg commented Sep 24, 2015

yeesian commented Sep 24, 2015

yeesian commented Sep 24, 2015

garborg commented Sep 24, 2015

tedsteiner commented Sep 25, 2015

garborg commented Sep 25, 2015

tedsteiner commented Sep 25, 2015

sbromberger commented Nov 2, 2015

yeesian commented Nov 2, 2015