# Clean Dirty Documents w/ Autoencoders

This example model cleans text documents of anything that isn't text (i.e. noise): coffee stains, wear artifacts, etc. You can inspect the notebook used to train the model here.

Here's a collage of input texts and predictions.

*Figure 1 - The dirty documents are on the left side and the cleaned ones are on the right*

## Sample Prediction

Once this model is deployed, get the API endpoint by running `cortex get document-denoiser`.

Now let's take a sample image like this one.


Export the endpoint and the image's URL by running:

```shell
export ENDPOINT=<API endpoint>
export IMAGE_URL=https://i.imgur.com/JJLfFxB.png
```

Then run the following piped commands:

```shell
curl "${ENDPOINT}" -X POST -H "Content-Type: application/json" -d '{"url":"'${IMAGE_URL}'"}' |
sed 's/"//g' |
base64 -d > prediction.png
```

Once this has run, we'll see a `prediction.png` file saved to disk. This is the result:

*(cleaned output image)*

As can be seen, the text document has been cleaned of any noise. Success!
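The same request can be made programmatically. Below is a minimal Python sketch using only the standard library; the function names (`decode_prediction`, `clean_document`) are illustrative, and it assumes the endpoint returns the PNG as a JSON-quoted base64 string, exactly as the shell pipeline above does.

```python
import base64
import json
import urllib.request


def decode_prediction(body: str) -> bytes:
    # The API returns the PNG as a JSON-quoted base64 string;
    # strip the quotes (the `sed` step) and decode (the `base64 -d` step).
    return base64.b64decode(body.strip().strip('"'))


def clean_document(endpoint: str, image_url: str, out_path: str = "prediction.png") -> None:
    # POST the image URL as JSON, mirroring the curl command above.
    payload = json.dumps({"url": image_url}).encode()
    request = urllib.request.Request(
        endpoint, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        body = response.read().decode()
    with open(out_path, "wb") as f:
        f.write(decode_prediction(body))
```

For example, `clean_document(os.environ["ENDPOINT"], "https://i.imgur.com/JJLfFxB.png")` would save the cleaned image to `prediction.png`.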


Here's a short list of URLs of other text documents in image format that can be cleaned using this model. Export any of these links to the `IMAGE_URL` variable: