Improve error handling #20

Open
egorps opened this issue Dec 5, 2013 · 3 comments

Comments

@egorps

egorps commented Dec 5, 2013

https://github.com/spilgames/pyvertica/blob/master/pyvertica/batch.py#L92-L93

I'd like to add a parameter to the VerticaBatch constructor that provides a rescue action for everything remaining in the FIFO: if the rows are just sucked out and dropped, important data could be lost. I'm thinking the default could simply be "pass", but for my use case it would be nice to dump everything in the FIFO to a file on disk so it could be restored later.
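A minimal sketch of the proposed hook, assuming a plain callable that receives the drained rows. `BatchSketch`, `rescue_action`, `add_rows` and `abort` are hypothetical names for illustration only; this is not pyvertica's actual `VerticaBatch` API, and a `queue.Queue` stands in for the real FIFO.

```python
import queue
from typing import Callable, Iterable, List


def _noop_rescue(rows: List[str]) -> None:
    """Default rescue action: drop the rows (the "pass" behaviour)."""
    pass


class BatchSketch:
    """Hypothetical stand-in for a batch loader; NOT pyvertica's VerticaBatch."""

    def __init__(self, rescue_action: Callable[[List[str]], None] = _noop_rescue):
        # A queue.Queue stands in for the real FIFO feeding Vertica's COPY.
        self._fifo: "queue.Queue[str]" = queue.Queue()
        self._rescue_action = rescue_action

    def add_rows(self, rows: Iterable[str]) -> None:
        """Hypothetical producer side: buffer rows in the FIFO."""
        for row in rows:
            self._fifo.put(row)

    def abort(self) -> None:
        """Drain whatever is still buffered and hand it to the rescue action."""
        remaining: List[str] = []
        while True:
            try:
                remaining.append(self._fifo.get_nowait())
            except queue.Empty:
                break
        self._rescue_action(remaining)
```

With the default, `abort()` simply discards the leftover rows; passing something like `saved.extend` (or a function that writes to disk) keeps them instead. Handing over the whole list in one call lets the rescue action decide how to persist it, e.g. in a single file write.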

I'll work on this feature.

@lomignet
Contributor

lomignet commented Dec 9, 2013

That would be a good one. One thing to keep in mind is that, since we are talking about big data, filling up a partition is quite easy (this already happened to us when we first used the REJECTED DATA option of COPY).

@egorps
Author

egorps commented Dec 9, 2013

Hmm, good point. I think my use case will be around 100 entries per batch. Are you really loading that much data in one batch? How long does it take, and how much data are you talking about?

I was thinking about passing an optional parameter to the VerticaBatch constructor; the default value could be "pass".

@lomignet
Copy link
Contributor

We had a project where we were loading hundreds of gigabytes daily from a binary format, and we really felt the pain of big data (but it was really fun to work on). Sadly, that data source has been scrapped. Currently we load maybe 1 GB/hour, and we are talking about less than 5 minutes of loading, I believe, divided into 16 batches, and that includes all startup, checks and so on.

My guess is that a default of 'pass' is safe. Then, if data needs to be written to disk, it should be enough to catch the diskFullException (not sure off the top of my head what the real name is) and delete the written file in that case.
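A minimal sketch of such a disk-dumping rescue action. Python has no dedicated disk-full exception class: in Python 3 a full partition surfaces as an `OSError` whose `errno` is `errno.ENOSPC` (Python 2 raised `IOError`). The function name `dump_to_disk` and the one-row-per-line file layout are assumptions for illustration.

```python
import errno
import os
from typing import List


def dump_to_disk(rows: List[str], path: str) -> bool:
    """Hypothetical rescue action: write rows to path, one per line.

    Returns True if the dump completed, or False if it was abandoned because
    the partition filled up (the partial file is deleted). Any other I/O
    error is re-raised unchanged.
    """
    try:
        # Closing the file flushes buffers, which can also hit ENOSPC,
        # so the whole `with` block sits inside the try.
        with open(path, "w", encoding="utf-8") as fh:
            for row in rows:
                fh.write(row + "\n")
        return True
    except OSError as exc:
        if exc.errno != errno.ENOSPC:
            raise
        # Disk full: remove the truncated file so it is not mistaken for a
        # complete dump and does not keep occupying space.
        try:
            os.remove(path)
        except FileNotFoundError:
            pass
        return False
```

To use it as the optional constructor parameter discussed above, the target path can be bound first, e.g. `functools.partial(dump_to_disk, path="/var/tmp/rescued_rows.txt")` (the path is only an example).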


2 participants