This example model cleans text documents of anything that isn't text (i.e. noise), such as coffee stains and wear artifacts. You can inspect the notebook that was used to train the model here.
Here's a collage of input texts and predictions.
Figure 1 - The dirty documents are on the left side and the cleaned ones are on the right
Once this model is deployed, get the API endpoint by running cortex get document-denoiser.
Now let's take a sample image like this one.
Export the endpoint and the image's URL by running:
export ENDPOINT=<API endpoint>
export IMAGE_URL=https://i.imgur.com/JJLfFxB.png
Then run the following piped commands:
curl "${ENDPOINT}" -X POST -H "Content-Type: application/json" -d '{"url":"'${IMAGE_URL}'"}' |
sed 's/"//g' |
base64 -d > prediction.png
Once this has run, we'll see a prediction.png file saved to the disk. This is the result.
As you can see, the text document has been cleaned of its noise. Success!
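The piped commands above do three things: they POST the image URL as JSON, strip the double quotes from the response, and base64-decode the rest into a PNG. Below is a minimal sketch that wraps those steps in two shell functions. The names decode_response and denoise are hypothetical helpers, not part of Cortex. The sketch assumes the API returns a quoted base64 string (as the original pipeline implies) and that your base64 accepts the --decode long option.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: read the API response from stdin, strip the
# surrounding double quotes, and decode the base64 payload to stdout.
decode_response() {
  tr -d '"' | base64 --decode
}

# Hypothetical helper: POST an image URL to the endpoint and write the
# decoded image to a file. Assumes ENDPOINT is already exported.
# --fail makes curl exit non-zero on HTTP errors instead of passing
# an error body on to the decoder.
denoise() {
  local image_url="$1" out_file="$2"
  curl -sS --fail "${ENDPOINT}" -X POST \
    -H "Content-Type: application/json" \
    -d "{\"url\":\"${image_url}\"}" |
    decode_response > "${out_file}"
}

# Usage (not run here, since it needs a deployed endpoint):
# denoise "${IMAGE_URL}" prediction.png
```

Keeping the decoding in its own function lets you check that step locally, without a deployed API, by piping a quoted base64 string into decode_response.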
Here's a short list of URLs of other text documents in image format that can be cleaned using this model. Export any one of these links to the IMAGE_URL variable: