From 3217cc6dc5e486ef2ddee694beac823fc4dc1505 Mon Sep 17 00:00:00 2001 From: Crosslad Date: Tue, 29 Oct 2024 16:51:10 +0200 Subject: [PATCH] adding notebook template --- Notebook_template.ipynb | 2428 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 2428 insertions(+) create mode 100644 Notebook_template.ipynb diff --git a/Notebook_template.ipynb b/Notebook_template.ipynb new file mode 100644 index 0000000..abe1674 --- /dev/null +++ b/Notebook_template.ipynb @@ -0,0 +1,2428 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Final Python Project\n", + "\n", + "### Avocado Data Analysis\n", + "#### Done By: Team WFM\n", + "\n", + "© ExploreAI 2024\n", + "\n", + "---\n", + "\n", + "## Table of Contents\n", + "\n", + " Background Context\n", + "\n", + "1. Importing Packages\n", + "\n", + "2. Data Collection and Description\n", + "\n", + "3. Loading Data \n", + "\n", + "4. Data Cleaning and Filtering\n", + "\n", + "5. Exploratory Data Analysis (EDA)\n", + "\n", + "6. Conclusion and Future Work\n", + "\n", + "7. References" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + " \n", + "## **Background Context**\n", + "Back to Table of Contents\n", + "\n", + "* **Purpose:** This project focuses on analyzing avocado sales and pricing data to uncover trends and insights. The analysis includes data cleaning, filtering, and exploratory data analysis (EDA). 
Insights gained from this analysis can help in understanding the avocado market and predicting future trends.\n", + "* **Details:** Include information about the problem domain, the specific questions or challenges the project aims to address, and any relevant background information that sets the stage for the work.\n", + "---" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## **Importing Packages**\n", + "Back to Table of Contents\n", + "\n", + "* **Purpose:** Set up the Python environment with necessary libraries and tools.\n", + "* **Details:** List and import all the Python packages that will be used throughout the project such as Pandas for data manipulation, Matplotlib/Seaborn for visualization, scikit-learn for modeling, etc.\n", + "---" + ] + }, + { + "cell_type": "code", + "execution_count": 103, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Requirement already satisfied: anyio==4.2.0 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 1)) (4.2.0)\n", + "Requirement already satisfied: appnope==0.1.3 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 2)) (0.1.3)\n", + "Requirement already satisfied: argon2-cffi==21.3.0 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 3)) (21.3.0)\n", + "Requirement already satisfied: argon2-cffi-bindings==21.2.0 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 4)) (21.2.0)\n", + "Requirement already satisfied: asttokens==2.0.5 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 5)) (2.0.5)\n", + 
"Requirement already satisfied: async-lru==2.0.4 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 6)) (2.0.4)\n", + "Requirement already satisfied: attrs==23.1.0 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 7)) (23.1.0)\n", + "Requirement already satisfied: Babel==2.11.0 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 8)) (2.11.0)\n", + "Requirement already satisfied: beautifulsoup4==4.12.3 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 9)) (4.12.3)\n", + "Requirement already satisfied: bleach==4.1.0 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 10)) (4.1.0)\n", + "Requirement already satisfied: Brotli==1.0.9 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 11)) (1.0.9)\n", + "Requirement already satisfied: certifi==2024.8.30 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 12)) (2024.8.30)\n", + "Requirement already satisfied: cffi==1.16.0 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 13)) (1.16.0)\n", + "Requirement already satisfied: charset-normalizer==3.3.2 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 14)) (3.3.2)\n", + "Requirement already satisfied: comm==0.2.1 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 15)) 
(0.2.1)\n", + "Requirement already satisfied: contourpy==1.2.1 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 16)) (1.2.1)\n", + "Requirement already satisfied: cryptography==42.0.5 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 17)) (42.0.5)\n", + "Requirement already satisfied: cycler==0.12.1 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 18)) (0.12.1)\n", + "Requirement already satisfied: debugpy==1.6.7 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 19)) (1.6.7)\n", + "Requirement already satisfied: decorator==5.1.1 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 20)) (5.1.1)\n", + "Requirement already satisfied: defusedxml==0.7.1 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 21)) (0.7.1)\n", + "Requirement already satisfied: executing==0.8.3 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 22)) (0.8.3)\n", + "Requirement already satisfied: fastjsonschema==2.16.2 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 23)) (2.16.2)\n", + "Requirement already satisfied: fonttools==4.53.1 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 24)) (4.53.1)\n", + "Requirement already satisfied: greenlet==3.0.1 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r 
requirements.txt (line 25)) (3.0.1)\n", + "Requirement already satisfied: idna==3.7 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 26)) (3.7)\n", + "Requirement already satisfied: ipykernel==6.28.0 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 27)) (6.28.0)\n", + "Requirement already satisfied: ipython==8.25.0 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 28)) (8.25.0)\n", + "Requirement already satisfied: jedi==0.19.1 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 29)) (0.19.1)\n", + "Requirement already satisfied: Jinja2==3.1.4 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 30)) (3.1.4)\n", + "Requirement already satisfied: json5==0.9.6 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 31)) (0.9.6)\n", + "Requirement already satisfied: jsonschema==4.19.2 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 32)) (4.19.2)\n", + "Requirement already satisfied: jsonschema-specifications==2023.7.1 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 33)) (2023.7.1)\n", + "Requirement already satisfied: jupyter_client==8.6.0 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 34)) (8.6.0)\n", + "Requirement already satisfied: jupyter_core==5.7.2 in 
c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 35)) (5.7.2)\n", + "Requirement already satisfied: jupyter-events==0.10.0 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 36)) (0.10.0)\n", + "Requirement already satisfied: jupyter-lsp==2.2.0 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 37)) (2.2.0)\n", + "Requirement already satisfied: jupyter_server==2.14.1 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 38)) (2.14.1)\n", + "Requirement already satisfied: jupyter_server_terminals==0.4.4 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 39)) (0.4.4)\n", + "Requirement already satisfied: jupyterlab==4.0.11 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 40)) (4.0.11)\n", + "Requirement already satisfied: jupyterlab-pygments==0.1.2 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 41)) (0.1.2)\n", + "Requirement already satisfied: jupyterlab_server==2.25.1 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 42)) (2.25.1)\n", + "Requirement already satisfied: kiwisolver==1.4.5 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 43)) (1.4.5)\n", + "Requirement already satisfied: MarkupSafe==2.1.3 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 44)) 
(2.1.3)\n", + "Requirement already satisfied: matplotlib==3.9.2 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 45)) (3.9.2)\n", + "Requirement already satisfied: matplotlib-inline==0.1.6 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 46)) (0.1.6)\n", + "Requirement already satisfied: mistune==2.0.4 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 47)) (2.0.4)\n", + "Requirement already satisfied: nbclient==0.8.0 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 48)) (0.8.0)\n", + "Requirement already satisfied: nbconvert==7.10.0 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 49)) (7.10.0)\n", + "Requirement already satisfied: nbformat==5.9.2 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 50)) (5.9.2)\n", + "Requirement already satisfied: nest-asyncio==1.6.0 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 51)) (1.6.0)\n", + "Requirement already satisfied: notebook==7.0.8 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 52)) (7.0.8)\n", + "Requirement already satisfied: notebook_shim==0.2.3 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 53)) (0.2.3)\n", + "Requirement already satisfied: numpy==2.1.0 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r 
requirements.txt (line 54)) (2.1.0)\n", + "Requirement already satisfied: overrides==7.4.0 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 55)) (7.4.0)\n", + "Requirement already satisfied: packaging==24.1 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 56)) (24.1)\n", + "Requirement already satisfied: pandas==2.2.2 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 57)) (2.2.2)\n", + "Requirement already satisfied: pandocfilters==1.5.0 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 58)) (1.5.0)\n", + "Requirement already satisfied: parso==0.8.3 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 59)) (0.8.3)\n", + "Requirement already satisfied: pexpect==4.8.0 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 60)) (4.8.0)\n", + "Requirement already satisfied: pillow==10.4.0 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 61)) (10.4.0)\n", + "Requirement already satisfied: pip==24.2 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 62)) (24.2)\n", + "Requirement already satisfied: platformdirs==3.10.0 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 63)) (3.10.0)\n", + "Requirement already satisfied: prometheus-client==0.14.1 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages 
(from -r requirements.txt (line 64)) (0.14.1)\n", + "Requirement already satisfied: prompt-toolkit==3.0.43 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 65)) (3.0.43)\n", + "Requirement already satisfied: psutil==5.9.0 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 66)) (5.9.0)\n", + "Requirement already satisfied: ptyprocess==0.7.0 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 67)) (0.7.0)\n", + "Requirement already satisfied: pure-eval==0.2.2 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 68)) (0.2.2)\n", + "Requirement already satisfied: pycparser==2.21 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 69)) (2.21)\n", + "Requirement already satisfied: Pygments==2.15.1 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 70)) (2.15.1)\n", + "Requirement already satisfied: PyMySQL==1.0.2 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 71)) (1.0.2)\n", + "Requirement already satisfied: pyparsing==3.1.4 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 72)) (3.1.4)\n", + "Requirement already satisfied: PySocks==1.7.1 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 73)) (1.7.1)\n", + "Requirement already satisfied: python-dateutil==2.9.0.post0 in 
c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 74)) (2.9.0.post0)\n", + "Requirement already satisfied: python-json-logger==2.0.7 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 75)) (2.0.7)\n", + "Requirement already satisfied: pytz==2024.1 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 76)) (2024.1)\n", + "Requirement already satisfied: PyYAML==6.0.1 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 77)) (6.0.1)\n", + "Requirement already satisfied: pyzmq==25.1.2 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 78)) (25.1.2)\n", + "Requirement already satisfied: referencing==0.30.2 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 79)) (0.30.2)\n", + "Requirement already satisfied: requests==2.32.3 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 80)) (2.32.3)\n", + "Requirement already satisfied: rfc3339-validator==0.1.4 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 81)) (0.1.4)\n", + "Requirement already satisfied: rfc3986-validator==0.1.1 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 82)) (0.1.1)\n", + "Requirement already satisfied: rpds-py==0.10.6 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 83)) (0.10.6)\n", + "Requirement 
already satisfied: seaborn==0.13.2 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 84)) (0.13.2)\n", + "Requirement already satisfied: Send2Trash==1.8.2 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 85)) (1.8.2)\n", + "Requirement already satisfied: setuptools==72.1.0 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 86)) (72.1.0)\n", + "Requirement already satisfied: six==1.16.0 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 87)) (1.16.0)\n", + "Requirement already satisfied: sniffio==1.3.0 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 88)) (1.3.0)\n", + "Requirement already satisfied: soupsieve==2.5 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 89)) (2.5)\n", + "Requirement already satisfied: SQLAlchemy==2.0.30 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 90)) (2.0.30)\n", + "Requirement already satisfied: stack-data==0.2.0 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 91)) (0.2.0)\n", + "Requirement already satisfied: terminado==0.17.1 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 92)) (0.17.1)\n", + "Requirement already satisfied: tinycss2==1.2.1 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 93)) (1.2.1)\n", + 
"Requirement already satisfied: tornado==6.4.1 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 94)) (6.4.1)\n", + "Requirement already satisfied: traitlets==5.14.3 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 95)) (5.14.3)\n", + "Requirement already satisfied: typing_extensions==4.11.0 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 96)) (4.11.0)\n", + "Requirement already satisfied: tzdata==2024.1 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 97)) (2024.1)\n", + "Requirement already satisfied: urllib3==2.2.2 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 98)) (2.2.2)\n", + "Requirement already satisfied: wcwidth==0.2.5 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 99)) (0.2.5)\n", + "Requirement already satisfied: webencodings==0.5.1 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 100)) (0.5.1)\n", + "Requirement already satisfied: websocket-client==1.8.0 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 101)) (1.8.0)\n", + "Requirement already satisfied: wheel==0.43.0 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from -r requirements.txt (line 102)) (0.43.0)\n", + "Requirement already satisfied: colorama in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from ipython==8.25.0->-r 
requirements.txt (line 28)) (0.4.6)\n", + "Requirement already satisfied: pywin32>=300 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from jupyter_core==5.7.2->-r requirements.txt (line 35)) (305.1)\n", + "Requirement already satisfied: pywinpty>=2.0.1 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from jupyter_server==2.14.1->-r requirements.txt (line 38)) (2.0.10)\n", + "Requirement already satisfied: fqdn in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from jsonschema[format-nongpl]>=4.18.0->jupyter-events==0.10.0->-r requirements.txt (line 36)) (1.5.1)\n", + "Requirement already satisfied: isoduration in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from jsonschema[format-nongpl]>=4.18.0->jupyter-events==0.10.0->-r requirements.txt (line 36)) (20.11.0)\n", + "Requirement already satisfied: jsonpointer>1.13 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from jsonschema[format-nongpl]>=4.18.0->jupyter-events==0.10.0->-r requirements.txt (line 36)) (3.0.0)\n", + "Requirement already satisfied: uri-template in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from jsonschema[format-nongpl]>=4.18.0->jupyter-events==0.10.0->-r requirements.txt (line 36)) (1.3.0)\n", + "Requirement already satisfied: webcolors>=1.11 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from jsonschema[format-nongpl]>=4.18.0->jupyter-events==0.10.0->-r requirements.txt (line 36)) (24.8.0)\n", + "Requirement already satisfied: arrow>=0.15.0 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from isoduration->jsonschema[format-nongpl]>=4.18.0->jupyter-events==0.10.0->-r 
requirements.txt (line 36)) (1.3.0)\n", + "Requirement already satisfied: types-python-dateutil>=2.8.10 in c:\\users\\johansgr\\appdata\\local\\anaconda3\\envs\\creating_an_environment\\lib\\site-packages (from arrow>=0.15.0->isoduration->jsonschema[format-nongpl]>=4.18.0->jupyter-events==0.10.0->-r requirements.txt (line 36)) (2.9.0.20240906)\n", + "Note: you may need to restart the kernel to use updated packages.\n" + ] + } + ], + "source": [ + "%pip install -r requirements.txt" + ] + }, + { + "cell_type": "code", + "execution_count": 104, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd # Pandas for data manipulation, imported under the alias pd\n", + "from sqlalchemy import create_engine, text # SQL interface. If this fails, run %pip install sqlalchemy in another cell.\n", + "import matplotlib.pyplot as plt # Matplotlib for plotting\n", + "import seaborn as sns # Seaborn for statistical visualization\n", + "import re # Regular expressions, used in data cleaning" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "These packages cover data manipulation (Pandas), database access (SQLAlchemy), visualization (Matplotlib, Seaborn), and pattern matching for data cleaning (re)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## **Data Collection and Description**\n", + "Back to Table of Contents\n", + "\n", + "* The dataset used in this project is sourced from [Kaggle's Avocado Prices dataset](https://www.kaggle.com/neuromusic/avocado-prices). It contains data on avocado prices and sales volume from various regions in the U.S. between 2015 and 2018.\n", + " \n", + "* **Data Fields:**\n", + "- `Date`: The date of the observation.\n", + "- `AveragePrice`: The average price of a single avocado.\n", + "- `TotalVolume`: The total number of avocados sold.\n", + "- `plu4046`, `plu4225`, `plu4770`: The number of avocados sold with PLU codes 4046, 4225, and 4770.\n", + "- `TotalBags`, `SmallBags`, `LargeBags`, `XLargeBags`: The volume sold in bags, in total and by bag size.\n", + "- `type`: The type of avocado (conventional or organic).\n", + "- `region`: The geographical region." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "#Please use code cells to code in and do not forget to comment your code." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## **Loading Data**\n", + "Back to Table of Contents\n", + "\n", + "The data is loaded into a Pandas DataFrame for easy manipulation:" + ] + }, + { + "cell_type": "code", + "execution_count": 125, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Date AveragePrice TotalVolume plu4046 plu4225 plu4770 \\\n", + "0 2015-01-04 1.22 40873.28 2819.50 28287.42 49.90 \n", + "1 2015-01-04 1.79 1373.95 57.42 153.88 0.00 \n", + "2 2015-01-04 1.00 435021.49 364302.39 23821.16 82.15 \n", + "3 2015-01-04 1.76 3846.69 1500.15 938.35 0.00 \n", + "4 2015-01-04 1.08 788025.06 53987.31 552906.04 39995.03 \n", + "\n", + " TotalBags SmallBags LargeBags XLargeBags type \\\n", + "0 9716.46 9186.93 529.53 0.0 conventional \n", + "1 1162.65 1162.65 0.00 0.0 organic \n", + "2 46815.79 16707.15 30108.64 0.0 conventional \n", + "3 1408.19 1071.35 336.84 0.0 organic \n", + "4 141136.68 137146.07 3990.61 0.0 conventional \n", + "\n", + " region \n", + "0 Albany \n", + "1 Albany \n", + "2 Atlanta \n", + "3 Atlanta \n", + "4 BaltimoreWashington " + ] + }, + "execution_count": 125, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df = pd.read_csv(\"Avocado_data.csv\")\n", + "df.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This DataFrame will be used for all subsequent analysis." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 126, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "<class 'pandas.core.frame.DataFrame'>\n", + "RangeIndex: 53415 entries, 0 to 53414\n", + "Data columns (total 12 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 Date 53415 non-null object \n", + " 1 AveragePrice 53415 non-null float64\n", + " 2 TotalVolume 53415 non-null float64\n", + " 3 plu4046 53415 non-null float64\n", + " 4 plu4225 53415 non-null float64\n", + " 5 plu4770 53415 non-null float64\n", + " 6 TotalBags 53415 non-null float64\n", + " 7 SmallBags 41025 non-null float64\n", + " 8 LargeBags 41025 non-null float64\n", + " 9 XLargeBags 41025 non-null float64\n", + " 10 type 53415 non-null object \n", + " 11 region 53415 non-null object \n", + "dtypes: float64(9), object(3)\n", + "memory usage: 4.9+ MB\n" + ] + } + ], + "source": [ + "df.info() # Print a concise summary of the DataFrame and check for null entries." + ] + }, + { + "cell_type": "code", + "execution_count": 127, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " AveragePrice TotalVolume plu4046 plu4225 plu4770 \\\n", + "count 53415.000000 5.341500e+04 5.341500e+04 5.341500e+04 5.341500e+04 \n", + "mean 1.428910 8.694474e+05 2.982707e+05 2.222170e+05 2.053195e+04 \n", + "std 0.393116 3.545274e+06 1.307669e+06 9.554624e+05 1.040977e+05 \n", + "min 0.440000 8.456000e+01 0.000000e+00 0.000000e+00 0.000000e+00 \n", + "25% 1.119091 1.626465e+04 6.947250e+02 2.120800e+03 0.000000e+00 \n", + "50% 1.400000 1.203525e+05 1.458058e+04 1.751663e+04 9.005000e+01 \n", + "75% 1.690000 4.542380e+05 1.287924e+05 9.351560e+04 3.599735e+03 \n", + "max 3.440830 6.103446e+07 2.544720e+07 2.047057e+07 2.860025e+06 \n", + "\n", + " TotalBags SmallBags LargeBags XLargeBags \n", + "count 5.341500e+04 4.102500e+04 4.102500e+04 41025.000000 \n", + "mean 2.175083e+05 1.039222e+05 2.331316e+04 2731.811796 \n", + "std 8.676947e+05 5.692608e+05 1.496622e+05 22589.096454 \n", + "min 0.000000e+00 0.000000e+00 0.000000e+00 0.000000 \n", + "25% 7.846520e+03 0.000000e+00 0.000000e+00 0.000000 \n", + "50% 3.695310e+04 6.945800e+02 0.000000e+00 0.000000 \n", + "75% 1.110146e+05 3.795298e+04 2.814920e+03 0.000000 \n", + "max 1.629830e+07 1.256716e+07 4.324231e+06 679586.800000 " + ] + }, + "execution_count": 127, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.describe() # Using the describe function to view summary statistics on our data in order to have an overview." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "#Please use code cells to code in and do not forget to comment your code.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## **Data Cleaning and Filtering**\n", + "Back to Table of Contents\n", + "\n", + "* Data cleaning includes handling missing values, filtering irrelevant data, and creating new features if necessary.\n", + "---" + ] + }, + { + "cell_type": "code", + "execution_count": 128, + "metadata": {}, + "outputs": [], + "source": [ + "# Convert the DataFrame's column headings to PEP 8 compliant (snake_case) names.\n", + "import re # Regular expressions used by the substitutions below\n", + "\n", + "def pep8_compliant_column_names(df):\n", + "    \"\"\"\n", + "    Convert DataFrame column names to PEP 8 compliant names.\n", + "\n", + "    Parameters:\n", + "    df (pd.DataFrame): The DataFrame with column names to convert.\n", + "\n", + "    Returns:\n", + "    pd.DataFrame: The same DataFrame, with its column names updated in place.\n", + "    \"\"\"\n", + "    def convert_to_pep8(name):\n", + "        # Replace runs of whitespace with a single underscore\n", + "        name = re.sub(r'\\s+', '_', name)\n", + "        # Insert an underscore at each lowercase-to-uppercase boundary (e.g. AveragePrice -> Average_Price)\n", + "        name = re.sub(r'(?<=[a-z])(?=[A-Z])', '_', name)\n", + "        # Convert to lowercase\n", + "        name = name.lower()\n", + "        # Remove any non-alphanumeric characters except underscores\n", + "        name = re.sub(r'[^\\w_]', '', name)\n", + "        # Replace multiple underscores with a single underscore\n", + "        name = re.sub(r'_+', '_', name)\n", + "        return name\n", + "\n", + "    # Apply PEP 8 compliance to each column name\n", + "    new_columns = [convert_to_pep8(col) for col in df.columns]\n", + "\n", + "    # Set the new column names\n", + "    df.columns = new_columns\n", + "\n", + "    return df" + ] + }, + { + "cell_type": "code", + "execution_count": 129, +
"metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " date average_price total_volume plu4046 plu4225 plu4770 \\\n", + "0 2015-01-04 1.22 40873.28 2819.50 28287.42 49.90 \n", + "1 2015-01-04 1.79 1373.95 57.42 153.88 0.00 \n", + "2 2015-01-04 1.00 435021.49 364302.39 23821.16 82.15 \n", + "3 2015-01-04 1.76 3846.69 1500.15 938.35 0.00 \n", + "4 2015-01-04 1.08 788025.06 53987.31 552906.04 39995.03 \n", + "\n", + " total_bags small_bags large_bags xlarge_bags type \\\n", + "0 9716.46 9186.93 529.53 0.0 conventional \n", + "1 1162.65 1162.65 0.00 0.0 organic \n", + "2 46815.79 16707.15 30108.64 0.0 conventional \n", + "3 1408.19 1071.35 336.84 0.0 organic \n", + "4 141136.68 137146.07 3990.61 0.0 conventional \n", + "\n", + " region \n", + "0 Albany \n", + "1 Albany \n", + "2 Atlanta \n", + "3 Atlanta \n", + "4 BaltimoreWashington \n" + ] + } + ], + "source": [ + "# Rename the columns and preview the first rows to confirm the new headings\n", + "df = pep8_compliant_column_names(df)\n", + "print(df.head())" + ] + }, + { + "cell_type": "code", + "execution_count": 130, + "metadata": {}, + "outputs": [], + "source": [ + "# Map each CamelCase region name (a) to its underscore-separated form (b)\n", + "a = ['BaltimoreWashington', 'BuffaloRochester', 'CincinnatiDayton', 'DallasFtWorth',\n", + "     'GrandRapids', 'GreatLakes', 'HarrisburgScranton', 'HartfordSpringfield',\n", + "     'LasVegas', 'LosAngeles', 'NewOrleans', 'NewYork',\n", + "     'NorthernNewEngland', 'PhoenixTucson', 'RaleighGreensboro', 'RichmondNorfolk',\n", + "     'SanDiego', 'SanFrancisco', 'SouthCarolina', 'SouthCentral',\n", + "     'StLouis', 'TotalUS', 'WestTexNewMexico', 'BirminghamMontgomery',\n", + "     'PeoriaSpringfield', 'MiamiFtLauderdale']\n", + "b = ['Baltimore_Washington', 'Buffalo_Rochester', 'Cincinnati_Dayton', 'Dallas_Ft_Worth',\n", + "     'Grand_Rapids', 'Great_Lakes', 'Harrisburg_Scranton', 'Hartford_Springfield',\n", + "     'Las_Vegas', 'Los_Angeles', 'New_Orleans', 'New_York',\n", + "     'Northern_New_England', 'Phoenix_Tucson', 'Raleigh_Greensboro', 'Richmond_Norfolk',\n", + "     'San_Diego', 'San_Francisco', 'South_Carolina', 
'South_Central',\n", + "     'St_Louis', 'Total_US', 'West_Tex_New_Mexico', 'Birmingham_Montgomery',\n", + "     'Peoria_Springfield', 'Miami_Ft_Lauderdale']\n", + "# Replace the old region names with the new ones\n", + "df['region'] = df['region'].replace(a, b, regex = True)" + ] + }, + { + "cell_type": "code", + "execution_count": 131, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 131, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.isnull().sum() # Counting the missing values in each column" + ] + }, + { + "cell_type": "code", + "execution_count": 132, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array(['Albany', 'Atlanta', 'Baltimore_Washington', 'Boise', 'Boston',\n", + " 'Buffalo_Rochester', 'California', 'Charlotte', 'Chicago',\n", + " 'Cincinnati_Dayton', 'Columbus', 'Dallas_Ft_Worth', 'Denver',\n", + " 'Detroit', 'Grand_Rapids', 'Great_Lakes', 'Harrisburg_Scranton',\n", + " 'Hartford_Springfield', 'Houston', 'Indianapolis', 'Jacksonville',\n", + " 'Las_Vegas', 'Los_Angeles', 'Louisville', 'Miami', 'Midsouth',\n", + " 'Nashville', 'New_Orleans', 'New_York', 'Northeast',\n", + " 'Northern_New_England', 'Orlando', 'Philadelphia',\n", + " 'Phoenix_Tucson', 'Pittsburgh', 'Plains', 'Portland',\n", + " 'Raleigh_Greensboro', 'Richmond_Norfolk', 'Roanoke', 'Sacramento',\n", + " 'San_Diego', 'San_Francisco', 'Seattle', 'South_Carolina',\n", + " 'South_Central', 'Southeast', 'Spokane', 'St_Louis', 'Syracuse',\n", + " 'Tampa', 'Total_US', 'West', 'West_Tex_New_Mexico',\n", + " 'Birmingham_Montgomery', 'Peoria_Springfield', 'Providence',\n", + " 'Toledo', 'Wichita', 'Miami_Ft_Lauderdale'], dtype=object)" + ] + }, + "execution_count": 132, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df['region'].unique() # Using the unique method to find distinct values in the region column" + ] + }, + { + "cell_type": "code", + "execution_count": 133, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ +
"region\n", + "Albany 932\n", + "Atlanta 932\n", + "Baltimore_Washington 932\n", + "Boise 932\n", + "Boston 932\n", + "Buffalo_Rochester 932\n", + "California 932\n", + "Charlotte 932\n", + "Chicago 932\n", + "Cincinnati_Dayton 932\n", + "Columbus 932\n", + "Dallas_Ft_Worth 932\n", + "Denver 932\n", + "Detroit 932\n", + "Grand_Rapids 932\n", + "Great_Lakes 932\n", + "Harrisburg_Scranton 932\n", + "Hartford_Springfield 932\n", + "Houston 932\n", + "Indianapolis 932\n", + "Jacksonville 932\n", + "Las_Vegas 932\n", + "Los_Angeles 932\n", + "Louisville 932\n", + "Midsouth 932\n", + "Nashville 932\n", + "New_Orleans 932\n", + "New_York 932\n", + "South_Carolina 932\n", + "Northeast 932\n", + "Northern_New_England 932\n", + "Orlando 932\n", + "Philadelphia 932\n", + "Phoenix_Tucson 932\n", + "Pittsburgh 932\n", + "Plains 932\n", + "Portland 932\n", + "Raleigh_Greensboro 932\n", + "Richmond_Norfolk 932\n", + "Roanoke 932\n", + "Sacramento 932\n", + "San_Diego 932\n", + "San_Francisco 932\n", + "Seattle 932\n", + "West 932\n", + "South_Central 932\n", + "Southeast 932\n", + "Spokane 932\n", + "St_Louis 932\n", + "Syracuse 932\n", + "Tampa 932\n", + "Total_US 932\n", + "West_Tex_New_Mexico 929\n", + "Miami 722\n", + "Birmingham_Montgomery 618\n", + "Peoria_Springfield 618\n", + "Providence 618\n", + "Toledo 618\n", + "Wichita 618\n", + "Miami_Ft_Lauderdale 210\n", + "Name: count, dtype: int64" + ] + }, + "execution_count": 133, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df['region'].value_counts() # Using the value_counts method to find the frequency of each unique value in the region column " + ] + }, + { + "cell_type": "code", + "execution_count": 134, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "
" + ], + "text/plain": [ + " date average_price total_volume plu4046 plu4225 \\\n", + "region \n", + "Albany 932 932 932 932 932 \n", + "Atlanta 932 932 932 932 932 \n", + "Baltimore_Washington 932 932 932 932 932 \n", + "Birmingham_Montgomery 618 618 618 618 618 \n", + "Boise 932 932 932 932 932 \n", + "Boston 932 932 932 932 932 \n", + "Buffalo_Rochester 932 932 932 932 932 \n", + "California 932 932 932 932 932 \n", + "Charlotte 932 932 932 932 932 \n", + "Chicago 932 932 932 932 932 \n", + "Cincinnati_Dayton 932 932 932 932 932 \n", + "Columbus 932 932 932 932 932 \n", + "Dallas_Ft_Worth 932 932 932 932 932 \n", + "Denver 932 932 932 932 932 \n", + "Detroit 932 932 932 932 932 \n", + "Grand_Rapids 932 932 932 932 932 \n", + "Great_Lakes 932 932 932 932 932 \n", + "Harrisburg_Scranton 932 932 932 932 932 \n", + "Hartford_Springfield 932 932 932 932 932 \n", + "Houston 932 932 932 932 932 \n", + "Indianapolis 932 932 932 932 932 \n", + "Jacksonville 932 932 932 932 932 \n", + "Las_Vegas 932 932 932 932 932 \n", + "Los_Angeles 932 932 932 932 932 \n", + "Louisville 932 932 932 932 932 \n", + "Miami 722 722 722 722 722 \n", + "Miami_Ft_Lauderdale 210 210 210 210 210 \n", + "Midsouth 932 932 932 932 932 \n", + "Nashville 932 932 932 932 932 \n", + "New_Orleans 932 932 932 932 932 \n", + "New_York 932 932 932 932 932 \n", + "Northeast 932 932 932 932 932 \n", + "Northern_New_England 932 932 932 932 932 \n", + "Orlando 932 932 932 932 932 \n", + "Peoria_Springfield 618 618 618 618 618 \n", + "Philadelphia 932 932 932 932 932 \n", + "Phoenix_Tucson 932 932 932 932 932 \n", + "Pittsburgh 932 932 932 932 932 \n", + "Plains 932 932 932 932 932 \n", + "Portland 932 932 932 932 932 \n", + "Providence 618 618 618 618 618 \n", + "Raleigh_Greensboro 932 932 932 932 932 \n", + "Richmond_Norfolk 932 932 932 932 932 \n", + "Roanoke 932 932 932 932 932 \n", + "Sacramento 932 932 932 932 932 \n", + "San_Diego 932 932 932 932 932 \n", + "San_Francisco 932 932 932 932 932 \n", + 
"Seattle 932 932 932 932 932 \n", + "South_Carolina 932 932 932 932 932 \n", + "South_Central 932 932 932 932 932 \n", + "Southeast 932 932 932 932 932 \n", + "Spokane 932 932 932 932 932 \n", + "St_Louis 932 932 932 932 932 \n", + "Syracuse 932 932 932 932 932 \n", + "Tampa 932 932 932 932 932 \n", + "Toledo 618 618 618 618 618 \n", + "Total_US 932 932 932 932 932 \n", + "West 932 932 932 932 932 \n", + "West_Tex_New_Mexico 929 929 929 929 929 \n", + "Wichita 618 618 618 618 618 \n", + "\n", + " plu4770 total_bags small_bags large_bags \\\n", + "region \n", + "Albany 932 932 722 722 \n", + "Atlanta 932 932 722 722 \n", + "Baltimore_Washington 932 932 722 722 \n", + "Birmingham_Montgomery 618 618 408 408 \n", + "Boise 932 932 722 722 \n", + "Boston 932 932 722 722 \n", + "Buffalo_Rochester 932 932 722 722 \n", + "California 932 932 722 722 \n", + "Charlotte 932 932 722 722 \n", + "Chicago 932 932 722 722 \n", + "Cincinnati_Dayton 932 932 722 722 \n", + "Columbus 932 932 722 722 \n", + "Dallas_Ft_Worth 932 932 722 722 \n", + "Denver 932 932 722 722 \n", + "Detroit 932 932 722 722 \n", + "Grand_Rapids 932 932 722 722 \n", + "Great_Lakes 932 932 722 722 \n", + "Harrisburg_Scranton 932 932 722 722 \n", + "Hartford_Springfield 932 932 722 722 \n", + "Houston 932 932 722 722 \n", + "Indianapolis 932 932 722 722 \n", + "Jacksonville 932 932 722 722 \n", + "Las_Vegas 932 932 722 722 \n", + "Los_Angeles 932 932 722 722 \n", + "Louisville 932 932 722 722 \n", + "Miami 722 722 722 722 \n", + "Miami_Ft_Lauderdale 210 210 0 0 \n", + "Midsouth 932 932 722 722 \n", + "Nashville 932 932 722 722 \n", + "New_Orleans 932 932 722 722 \n", + "New_York 932 932 722 722 \n", + "Northeast 932 932 722 722 \n", + "Northern_New_England 932 932 722 722 \n", + "Orlando 932 932 722 722 \n", + "Peoria_Springfield 618 618 408 408 \n", + "Philadelphia 932 932 722 722 \n", + "Phoenix_Tucson 932 932 722 722 \n", + "Pittsburgh 932 932 722 722 \n", + "Plains 932 932 722 722 \n", + "Portland 932 932 722 
722 \n", + "Providence 618 618 408 408 \n", + "Raleigh_Greensboro 932 932 722 722 \n", + "Richmond_Norfolk 932 932 722 722 \n", + "Roanoke 932 932 722 722 \n", + "Sacramento 932 932 722 722 \n", + "San_Diego 932 932 722 722 \n", + "San_Francisco 932 932 722 722 \n", + "Seattle 932 932 722 722 \n", + "South_Carolina 932 932 722 722 \n", + "South_Central 932 932 722 722 \n", + "Southeast 932 932 722 722 \n", + "Spokane 932 932 722 722 \n", + "St_Louis 932 932 722 722 \n", + "Syracuse 932 932 722 722 \n", + "Tampa 932 932 722 722 \n", + "Toledo 618 618 408 408 \n", + "Total_US 932 932 722 722 \n", + "West 932 932 722 722 \n", + "West_Tex_New_Mexico 929 929 719 719 \n", + "Wichita 618 618 408 408 \n", + "\n", + " xlarge_bags type \n", + "region \n", + "Albany 722 932 \n", + "Atlanta 722 932 \n", + "Baltimore_Washington 722 932 \n", + "Birmingham_Montgomery 408 618 \n", + "Boise 722 932 \n", + "Boston 722 932 \n", + "Buffalo_Rochester 722 932 \n", + "California 722 932 \n", + "Charlotte 722 932 \n", + "Chicago 722 932 \n", + "Cincinnati_Dayton 722 932 \n", + "Columbus 722 932 \n", + "Dallas_Ft_Worth 722 932 \n", + "Denver 722 932 \n", + "Detroit 722 932 \n", + "Grand_Rapids 722 932 \n", + "Great_Lakes 722 932 \n", + "Harrisburg_Scranton 722 932 \n", + "Hartford_Springfield 722 932 \n", + "Houston 722 932 \n", + "Indianapolis 722 932 \n", + "Jacksonville 722 932 \n", + "Las_Vegas 722 932 \n", + "Los_Angeles 722 932 \n", + "Louisville 722 932 \n", + "Miami 722 722 \n", + "Miami_Ft_Lauderdale 0 210 \n", + "Midsouth 722 932 \n", + "Nashville 722 932 \n", + "New_Orleans 722 932 \n", + "New_York 722 932 \n", + "Northeast 722 932 \n", + "Northern_New_England 722 932 \n", + "Orlando 722 932 \n", + "Peoria_Springfield 408 618 \n", + "Philadelphia 722 932 \n", + "Phoenix_Tucson 722 932 \n", + "Pittsburgh 722 932 \n", + "Plains 722 932 \n", + "Portland 722 932 \n", + "Providence 408 618 \n", + "Raleigh_Greensboro 722 932 \n", + "Richmond_Norfolk 722 932 \n", + "Roanoke 722 932 
\n", + "Sacramento 722 932 \n", + "San_Diego 722 932 \n", + "San_Francisco 722 932 \n", + "Seattle 722 932 \n", + "South_Carolina 722 932 \n", + "South_Central 722 932 \n", + "Southeast 722 932 \n", + "Spokane 722 932 \n", + "St_Louis 722 932 \n", + "Syracuse 722 932 \n", + "Tampa 722 932 \n", + "Toledo 408 618 \n", + "Total_US 722 932 \n", + "West 722 932 \n", + "West_Tex_New_Mexico 719 929 \n", + "Wichita 408 618 " + ] + }, + "execution_count": 134, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.groupby('region').count() # Grouping by the region column and counting the values in each column to look for any anomalies" + ] + }, + { + "cell_type": "code", + "execution_count": 135, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "
" + ], + "text/plain": [ + " date average_price total_volume plu4046 plu4225 \\\n", + "12 2015-01-04 0.930000 5777334.90 2843648.26 2267755.26 \n", + "13 2015-01-04 1.240000 142349.77 107490.73 25711.96 \n", + "120 2015-01-11 0.920000 6024932.34 2889591.29 2485720.10 \n", + "121 2015-01-11 1.100000 158110.68 123712.51 25975.27 \n", + "228 2015-01-18 1.020000 5570915.26 2780859.66 2108450.36 \n", + "... ... ... ... ... ... \n", + "53127 2023-11-19 1.727136 271097.77 120937.09 8513.68 \n", + "53186 2023-11-26 1.304916 4200357.00 2391331.83 470082.13 \n", + "53245 2023-11-26 1.734881 268456.70 115377.24 7771.02 \n", + "53304 2023-12-03 1.066007 5845932.70 3868487.00 591584.26 \n", + "53363 2023-12-03 1.663419 290944.44 114223.14 8976.70 \n", + "\n", + " plu4770 total_bags small_bags large_bags xlarge_bags \\\n", + "12 137479.64 528451.74 477193.38 47882.56 3375.80 \n", + "13 2.93 9144.15 9144.15 0.00 0.00 \n", + "120 103573.42 546047.53 510560.41 31874.03 3613.09 \n", + "121 1.47 8421.43 8421.43 0.00 0.00 \n", + "228 121614.31 559990.93 520299.26 36501.18 3190.49 \n", + "... ... ... ... ... ... \n", + "53127 0.00 129489.38 NaN NaN NaN \n", + "53186 373369.38 834081.63 NaN NaN NaN \n", + "53245 0.00 135478.19 NaN NaN NaN \n", + "53304 340737.02 916388.81 NaN NaN NaN \n", + "53363 1.46 154905.12 NaN NaN NaN \n", + "\n", + " type region \n", + "12 conventional California \n", + "13 organic California \n", + "120 conventional California \n", + "121 organic California \n", + "228 conventional California \n", + "... ... ... 
\n", + "53127 organic California \n", + "53186 conventional California \n", + "53245 organic California \n", + "53304 conventional California \n", + "53363 organic California \n", + "\n", + "[932 rows x 12 columns]" + ] + }, + "execution_count": 135, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Keep only the rows for the California region\n", + "df_filtered = df[df['region'] == 'California']\n", + "df_filtered" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "With cleaning and filtering complete, the dataset is ready for analysis." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## **Exploratory Data Analysis (EDA)**\n", + "Back to Table of Contents\n", + "\n", + "* The EDA visualizes trends, distributions, and correlations within the dataset.\n", + "---\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "- **Price Trends Over Time:**" + ] + }, + { + "cell_type": "code", + "execution_count": 136, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Plot the average price over time, with one line per avocado type\n", + "sns.lineplot(data=df, x='date', y='average_price', hue='type')\n", + "plt.title('Avocado Price Trends Over Time')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "- **Regional Analysis:**" + ] + }, + { + "cell_type": "code", + "execution_count": 137, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Compare the distribution of average prices across regions\n", + "sns.boxplot(data=df, x='region', y='average_price')\n", + "plt.xticks(rotation=90)  # rotate region labels so they stay readable\n", + "plt.title('Avocado Prices by Region')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## **Conclusion and Future Work**\n", + "Back to Table of Contents\n", + "\n", + "* In this analysis, we have explored avocado price trends across different regions and over time. Future work could include building predictive models for avocado pricing, further analyzing the impact of seasonality, and expanding the dataset with more recent data to improve forecasting accuracy.\n", + "---\n" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [], + "source": [ + "#Please use code cells to code in and do not forget to comment your code."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Additional Sections to Consider\n", + "\n", + "* ### Appendix: \n", + "For any additional code, detailed tables, or extended data visualizations that are supplementary to the main content.\n", + "\n", + "* ### Contributors: \n", + "If this is a group project, list the contributors and their roles or contributions to the project.\n", + "Gary Munn Test 8 Sep" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.2" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +}
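The notebook's filtering cell selects the California rows with a boolean mask, and the EDA cells compare prices by `type`. The following is a minimal sketch of that pattern on a small in-memory frame. The sample values and the helper names `filter_region` and `mean_price_by_type` are assumptions made for illustration; they do not appear in the notebook, and only the column names (`date`, `average_price`, `type`, `region`) are taken from its output.

```python
import pandas as pd

# Hypothetical sample rows that mirror the notebook's column names.
# The values are made up for illustration and are not from the dataset.
sample = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2023-11-19", "2023-11-19", "2023-11-26", "2023-11-26", "2023-11-26"]
        ),
        "average_price": [1.10, 1.70, 1.30, 1.74, 1.25],
        "type": ["conventional", "organic", "conventional", "organic", "conventional"],
        "region": ["California", "California", "California", "California", "Albany"],
    }
)


def filter_region(df: pd.DataFrame, region: str) -> pd.DataFrame:
    """Return only the rows for one region, as a copy that is safe to modify."""
    return df.loc[df["region"] == region].copy()


def mean_price_by_type(df: pd.DataFrame) -> pd.Series:
    """Average the prices for each avocado type."""
    return df.groupby("type")["average_price"].mean()


california = filter_region(sample, "California")
summary = mean_price_by_type(california)
print(summary)
```

Returning a copy from the filter is a design choice in this sketch: it lets later cells add or change columns on the filtered frame without pandas warning about writing to a view of the original `df`.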