This project manages Netflix library data through data cleaning and normalization. Once cleaned and normalized, the data is structured into tables and loaded into a PostgreSQL database, so that it is systematically organized and ready for further analysis or integration into other applications. The data can then be queried in two ways: through a FastAPI app or with a Bash terminal command.
- Review raw data in Jupyter Notebook.
- Create a database and establish the connection.
- Transform data.
- Load transformed data to database.
- Create relationships between tables and normalize the database.
- Load raw data to the database.
- Create Python APIs to access the data using SQL queries.
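The transform and normalize steps above can be illustrated with a minimal sketch. It assumes a raw export with `show_id`, `title`, and a comma-separated `listed_in` genre column; these column names and the sample rows are hypothetical and are not taken from the project's own code. The idea is to split the multi-valued column into a lookup table and a junction table so that each genre is stored once.

```python
import csv
import io

# Hypothetical sample shaped like a Netflix titles export; the column names
# (show_id, title, listed_in) are assumptions, not taken from the project.
RAW_CSV = """show_id,title,listed_in
s1, Example Show ,"Dramas, International Movies"
s2,Another Title,"Comedies, Dramas"
"""


def normalize_genres(raw_csv):
    """Split the multi-valued genre column into three related tables."""
    titles = []        # (show_id, title) rows
    genres = {}        # genre name -> genre_id
    title_genres = []  # (show_id, genre_id) junction rows

    for row in csv.DictReader(io.StringIO(raw_csv)):
        show_id = row["show_id"].strip()
        # Cleaning step: trim stray whitespace around the title.
        titles.append((show_id, row["title"].strip()))
        for name in row["listed_in"].split(","):
            name = name.strip()
            if not name:
                continue
            # Reuse the existing id for a known genre, otherwise assign the next one.
            genre_id = genres.setdefault(name, len(genres) + 1)
            title_genres.append((show_id, genre_id))

    genre_rows = [(genre_id, name) for name, genre_id in genres.items()]
    return titles, genre_rows, title_genres


titles, genre_rows, title_genres = normalize_genres(RAW_CSV)
```

Each of the three returned lists maps to one table: titles, genres, and a junction table linking them. The project itself may organize its tables differently.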
Follow these steps to set up and run the project:
- Clone the repository. In Bash, type
git clone https://github.com/TuringCollegeSubmissions/rgaldi-DE1.v2.4.1
- Change into the project directory:
cd rgaldi-DE1.v2.4.1
- Set up the database connection settings.
- Create a file named
db_conn_settings.py
in the root directory of the project.
- Inside this file, define the following variables with your database connection details:
DB_USERNAME = "your_database_username"
DB_PASSWORD = "your_database_password"
DB_HOST = "your_database_host"
DB_PORT = "your_database_port"
DB_NAME = "your_database_name"
- Make sure you have Python and pip installed.
- Then install the required dependencies:
pip install -r requirements.txt
- Run the
main.py
script to execute the project.
- Follow the instructions and messages printed to the console for further guidance.
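As an illustration of how the values in db_conn_settings.py might be combined, here is a minimal sketch that builds a `postgresql://` connection URL, a form accepted by libpq-based drivers and SQLAlchemy. The helper name `build_postgres_url` and the sample values are hypothetical, and the project may connect in a different way. Percent-encoding the credentials matters when a password contains characters such as `@`, `:`, or `/`.

```python
from urllib.parse import quote_plus

# Placeholder values mirroring db_conn_settings.py; replace with your own.
# 5432 is the default PostgreSQL port.
DB_USERNAME = "your_database_username"
DB_PASSWORD = "p@ss:word/123"
DB_HOST = "localhost"
DB_PORT = "5432"
DB_NAME = "netflix"


def build_postgres_url(username, password, host, port, name):
    """Return a postgresql:// URL with the credentials percent-encoded."""
    return (
        f"postgresql://{quote_plus(username)}:{quote_plus(password)}"
        f"@{host}:{port}/{name}"
    )


url = build_postgres_url(DB_USERNAME, DB_PASSWORD, DB_HOST, DB_PORT, DB_NAME)
print(url)
```

Keeping db_conn_settings.py out of version control (for example, by listing it in .gitignore) avoids committing credentials by accident.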
The FastAPI option returns the query result if the first word in the query is SELECT; otherwise, it returns the message "Query executed successfully."
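The SELECT rule described above can be sketched as follows. This is not the project's implementation: it uses the standard-library `sqlite3` module with an in-memory database as a stand-in for PostgreSQL so that it runs anywhere, and the function name `run_query` and the sample table are hypothetical.

```python
import sqlite3


def run_query(conn, query):
    """Return rows for SELECT statements, a status message otherwise."""
    words = query.strip().split()
    # Compare case-insensitively so "select" and "SELECT" behave the same.
    first_word = words[0].upper() if words else ""
    cursor = conn.execute(query)
    if first_word == "SELECT":
        return cursor.fetchall()
    # Persist changes made by non-SELECT statements.
    conn.commit()
    return "Query executed successfully."


conn = sqlite3.connect(":memory:")
created = run_query(conn, "CREATE TABLE titles (show_id TEXT, title TEXT);")
inserted = run_query(conn, "INSERT INTO titles VALUES ('s1', 'Example Show');")
rows = run_query(conn, "select title from titles;")
conn.close()
```

One consequence of checking only the first word is that, in this sketch, a query starting with WITH or with a leading comment is treated as a non-SELECT statement and returns the status message instead of rows.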
- Type
uvicorn db_api:app --reload
in Bash or a terminal.
- Go to
http://127.0.0.1:8000/docs#/
- Enter a query and press Execute.
- Press CTRL+C in the terminal to quit.
The Bash option returns the query result if the first word in the query is SELECT; otherwise, it returns the message "Query executed successfully."
In Bash or a terminal, enter
python -c "from sequal_queries import terminal_query; terminal_query('enter_query_here;')"
Replace enter_query_here with your query.