
Exception dumping recent planets #25

Open
tomhughes opened this issue Mar 26, 2022 · 10 comments

Comments

@tomhughes
Contributor

Planet has failed two weeks in a row now, and this week I caught the output: it is throwing an exception while dumping the relations to the PBF output:

Writing relations...
EXCEPTION: writer_thread(2): pbf_writer.cpp(189): Throw in function void pbf_writer::pimpl::write_blob(const google::protobuf::MessageLite&, const string&)
Dynamic exception type: boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::runtime_error> >
std::exception::what: Unable to write block of type OSMData, uncompressed size 33630260 because it is larger than the maximum allowed 33554432.

. Trying to continue...
EXCEPTION: pbf_writer.cpp(189): Throw in function void pbf_writer::pimpl::write_blob(const google::protobuf::MessageLite&, const string&)
Dynamic exception type: boost::wrapexcept<std::runtime_error>
std::exception::what: Unable to write block of type OSMData, uncompressed size 33723897 because it is larger than the maximum allowed 33554432.

Not sure if this is an issue in protozero (maybe @joto can help? we could rebuild against a newer version?) or whether planet-dump-ng is feeding it things that are too large.

@tomhughes
Contributor Author

Oh scratch that - that's the pbf_writer.cpp in this repository, not the protozero one ;-)

@zerebubuth
Owner

zerebubuth commented Mar 26, 2022 via email

@tomhughes
Contributor Author

I've applied that change locally for now so hopefully next week will go better...

@tomhughes
Contributor Author

Looks like that worked and this week's dump is in the process of being published now.

@tomhughes
Contributor Author

Of course bumping to 1.2.4 undid that local hack and it failed again this week :-(

@tomhughes
Contributor Author

It failed again this week even with the modified limit...

@zerebubuth
Owner

Urgh. Crap. Sorry about that. I've pushed v1.2.5, which should have the lower limit plus also a reduced recheck time. I'm hoping that helps, and I'll try to repro locally again.

@tomhughes
Contributor Author

Thanks. I've deployed that now.

zerebubuth added a commit that referenced this issue Oct 5, 2022
Planet dump would fail intermittently complaining of block overflow (#25). The root cause of this was failing to encode [relation 6677259](https://www.openstreetmap.org/relation/6677259) in the history PBF.

PBF files are sequences of individual blocks, and every block must be smaller than 16MiB. To satisfy this, the PBF encoder checks the size of the encoded pblock on a regular basis, every `N` elements, on the assumption that the average element size multiplied by `N` stays below 16MiB. Unfortunately, that wasn't the case for relation 6677259: versions in its history have more than 25,000 members and take up about 100kiB each, so only 164 such versions are needed to overflow a block, and relation 6677259 has around 440.

This change implements an _approximate_ size counter for relations, which should help to make sure that the pgroup is flushed when it gets too large, even if it hasn't reached `N` elements yet. This hasn't seemed necessary yet for ways (they are limited to a fixed number of nodes each) or nodes, but extending to handle those - if necessary - should be straightforward.
@zerebubuth
Owner

Thanks for your patience! I think I found what was causing the issue: basically relation 6677259 is very large (some versions >25k members) and has relatively many versions (about 440), so this was enough to overflow the pblock between checks of the current size.

I think this might also explain the intermittent failure, as it's possible that the recheck might have happened in the "right" place and split the history into two blocks, or in the "wrong" place and tried to collect it all into one block.

I've implemented an approximate size counter for the relation pgroups, which acts as an additional trigger for a re-check of the pblock size. On my local machine, this allowed dumping the 2022-09-12 planet to completion.

I've pushed a new version: v1.2.6. Hopefully this works 🤞

@tomhughes
Contributor Author

Thanks. I've deployed that now.
