In this project, I deployed the output from my subscriber-pipeline
project to a MongoDB database on DigitalOcean. The project included:
- creating a new MongoDB instance on DigitalOcean
- connecting to the cloud MongoDB using MongoDB Compass
- uploading the clean dataset as a NoSQL Collection
- validating the final dataset
The steps were:
- Create a MongoDB instance on DigitalOcean
- Connect to the MongoDB instance from MongoDB Compass
- Create a new database on the server
- Collect the output CSV from `subscriber-pipeline`
- Import the CSV as a NoSQL collection with the correct datatypes
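
I did the import through MongoDB Compass, but the same step could be scripted. The following is only a minimal sketch, not what I actually ran: it assumes `avg_salary` and `time_spent_hrs` are the numeric columns (they are the only numeric fields named in this README), and the `uri`, `db_name`, and `collection_name` parameters are hypothetical placeholders. It also needs the third-party `pymongo` package for the upload itself.

```python
import csv

# Assumption: these are the columns that should be stored as numbers.
# Every other column is kept as a string, exactly as it appears in the CSV.
NUMERIC_COLUMNS = {"avg_salary", "time_spent_hrs"}


def typed_documents(csv_file):
    """Yield one dict per CSV row, casting the numeric columns to float."""
    for row in csv.DictReader(csv_file):
        doc = {}
        for key, value in row.items():
            if key in NUMERIC_COLUMNS:
                # Empty or missing cells become None instead of raising.
                doc[key] = float(value) if value and value.strip() else None
            else:
                doc[key] = value
        yield doc


def import_csv(csv_path, uri, db_name, collection_name):
    """Insert the typed rows into MongoDB and return how many were written."""
    from pymongo import MongoClient  # third-party; imported only when used

    with open(csv_path, newline="") as f:
        docs = list(typed_documents(f))
    if not docs:
        return 0  # insert_many rejects an empty list
    client = MongoClient(uri)
    result = client[db_name][collection_name].insert_many(docs)
    return len(result.inserted_ids)
```

Casting before the insert keeps numeric comparisons such as `$gt` working on numbers rather than on strings.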
After importing the data, I performed the following validation checks:
- To confirm no data was lost, I compared row counts between the CSV and the MongoDB collection
- I inspected a few records visually and noticed that some whitespace had been introduced. I can use a `$trim` aggregation after import to remove this whitespace in MongoDB, though it would be better to investigate exactly where these artifacts were introduced in the full pipeline.
- I created the following analytics-oriented filters to inspect the results:
  - `{state: {$eq: " Colorado"}}` to see a specific state (notice the whitespace issue appearing here)
  - `{avg_salary: {$gt: 100000}}` to see customers from high-earning industries
  - `{time_spent_hrs: {$eq: 0}}` to see customers who cancelled before spending any time on the platform
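
The checks above could also be run from a script. This is a hedged sketch rather than the exact procedure I used: the helper names are made up for illustration, `collection` is assumed to be a `pymongo` collection object, and the `$trim` update relies on pipeline-style updates, which need MongoDB 4.2 or later.

```python
import csv

# The three Compass filters from the list above, written as Python dicts.
FILTERS = {
    "colorado": {"state": {"$eq": " Colorado"}},  # leading space is deliberate
    "high_earning": {"avg_salary": {"$gt": 100000}},
    "no_time_spent": {"time_spent_hrs": {"$eq": 0}},
}

# Update pipeline that strips leading and trailing whitespace from `state`.
TRIM_STATE = [{"$set": {"state": {"$trim": {"input": "$state"}}}}]


def count_csv_rows(csv_file):
    """Count data rows in the CSV (the header line is not counted)."""
    return sum(1 for _ in csv.DictReader(csv_file))


def row_count_matches(collection, csv_file):
    """Return True when the collection holds as many documents as the CSV has rows."""
    return collection.count_documents({}) == count_csv_rows(csv_file)


def run_filters(collection):
    """Return the number of matching documents for each filter."""
    return {name: collection.count_documents(query) for name, query in FILTERS.items()}


def trim_state(collection):
    """Apply $trim to every string-valued `state` and return the modified count."""
    result = collection.update_many({"state": {"$type": "string"}}, TRIM_STATE)
    return result.modified_count
```

Once `trim_state` has run, the Colorado filter would need the leading space removed to keep matching.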