Data upload prototype [Not to be merged] #930
Open
We've kicked around the idea of allowing users to upload their own dataset to PolicyBrain. This PR presents a prototype for allowing the user to do this. The purpose of doing this is:
This is definitely possible. The prototype reuses file upload capabilities similar to those already used for the reform and assumptions files. Here's the input page:
and the output page (using the Tax-Calculator version of the CPS file):
The goal is to avoid crashing the server with a very large file. The file objects provided by Django and Flask are very helpful in this regard, but the data has to be serialized when it is sent from Django to Flask and again from Flask to Celery. I was semi-successful in passing a file-like object from Django to Flask, but not from Flask to Celery. Celery only accepts task arguments that can be serialized with formats such as pickle, JSON, or msgpack, and I haven't found a good way to pass a file-like object to Celery without using pickle. I wound up just reading the data into memory and passing it around as a binary blob. This may be a viable approach, but we should be careful not to overwhelm the server.
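To make the "binary blob" approach a bit more concrete, here is a minimal sketch of one way to read an upload with a size cap and wrap it so it survives a JSON serializer. It is not the code in this PR. The names `read_capped`, `encode_payload`, `decode_payload`, `UploadTooLarge`, and the `MAX_UPLOAD_BYTES` value are all hypothetical, and the sketch only assumes the upload exposes a binary `read(size)` method, as ordinary Python file-like objects do.

```python
import base64
import io
import json

# Hypothetical cap; the right value depends on the server's available memory.
MAX_UPLOAD_BYTES = 50 * 1024 * 1024


class UploadTooLarge(Exception):
    """Raised when an upload exceeds the configured size cap."""


def read_capped(fileobj, max_bytes=MAX_UPLOAD_BYTES, chunk_size=64 * 1024):
    """Read a binary file-like object in chunks, aborting once max_bytes is exceeded."""
    chunks = []
    total = 0
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:  # empty bytes means end of file
            break
        total += len(chunk)
        if total > max_bytes:
            # Stop early instead of buffering the rest of an oversized file.
            raise UploadTooLarge(f"upload exceeds {max_bytes} bytes")
        chunks.append(chunk)
    return b"".join(chunks)


def encode_payload(blob):
    """Wrap raw bytes as base64 text so the result is JSON-serializable."""
    return {"data_b64": base64.b64encode(blob).decode("ascii")}


def decode_payload(payload):
    """Recover the raw bytes on the receiving side (e.g. inside a worker task)."""
    return base64.b64decode(payload["data_b64"])


if __name__ == "__main__":
    # Made-up sample data standing in for an uploaded CSV.
    upload = io.BytesIO(b"a,b\n1,2\n")
    blob = read_capped(upload, max_bytes=1024)
    message = json.dumps(encode_payload(blob))
    assert decode_payload(json.loads(message)) == blob
```

The cap bounds how much is ever held in memory at once, which speaks to the "don't overwhelm the server" concern. The trade-off is that base64 inflates the payload by roughly a third, so the cap should be set with that overhead in mind.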
[Also note that I ran all these tests locally. Hopefully, they hold up when deployed on servers. I plan to test this some time in the coming week.]