This repository contains materials for the introductory pandas workshop at the UC Berkeley D-Lab.
You will learn the most if you can edit and run the code yourself, so please have the Anaconda Distribution of Python 3.7, pandas, matplotlib, and Jupyter installed before the workshop starts. Follow the steps below to set up your environment:
- Click here to download the Anaconda Distribution for Python 3.7 (3.6 is also fine if you already have it installed). Scroll down to the "Anaconda Installers" section and click the "Graphical Installer" option for your operating system.
- If you are using Terminal (Mac) or GitBash (PC), you can install the necessary packages with pip by typing:
$ pip install pandas matplotlib jupyter
Windows users only: if you want the Bash shell that Mac users get in the "Terminal" application, click here to download GitBash, a Unix-style command-line environment for Windows.
Alternatively, you can install these packages by adding a cell to the top of your Jupyter Notebook and typing:
!pip install pandas matplotlib jupyter
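Once installation finishes, a quick check like the one below can confirm that pandas and matplotlib import correctly. This is a minimal sketch, not part of the workshop materials; run it in a notebook cell or a Python session. The version numbers printed depend on your own installation.

```python
# Quick sanity check: confirm the workshop packages import and report versions.
import sys

import matplotlib
import pandas as pd

print("Python:", sys.version.split()[0])
print("pandas:", pd.__version__)
print("matplotlib:", matplotlib.__version__)
```

If either import raises `ModuleNotFoundError`, the package was most likely installed into a different Python environment than the one running your notebook.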
Once the software is installed, download the workshop files contained in this repository:
- Click the green "Clone or Download" button
- Click "Download Zip"
- Extract this .zip file someplace familiar, such as your Desktop.
Or, if you are a Git user, you can simply clone this repository:
$ git clone git@github.com:dlab-berkeley/introduction-to-pandas.git
- Open the "Anaconda Navigator" application and click "Launch" under Jupyter Notebook
or
Navigate to the repository using Terminal or GitBash and type
$ cd introduction-to-pandas
then
$ jupyter notebook
or
$ python3 -m notebook
This will open the Jupyter Notebook dashboard in your web browser, where you can create a blank notebook to use as a scratch space if you wish. Open the file "introduction-to-pandas.ipynb" to access the tutorial.
For this workshop, we'll go through an example using European unemployment data. We'll load, view, and modify the data as well as calculate some descriptive statistics. The idea is to get a sense of what it would be like to use pandas as part of your workflow.
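To preview what loading, viewing, modifying, and summarizing data looks like in pandas, here is a minimal sketch. It assumes a tiny made-up table with hypothetical columns `country`, `year`, `month`, and `unemployment_rate`; the workshop's actual data file and column names may differ, so treat this only as an illustration of the calls involved (`pd.read_csv`, `head`, `dtypes`, `describe`, `groupby`).

```python
import io

import pandas as pd

# Hypothetical stand-in for the workshop's unemployment data; the real file
# and its column names may differ.
csv_text = """country,year,month,unemployment_rate
Austria,2010,1,5.2
Austria,2010,2,5.1
Belgium,2010,1,8.3
Belgium,2010,2,8.4
"""

# Load: read_csv accepts a file path or any file-like object.
unemployment = pd.read_csv(io.StringIO(csv_text))

# View: the first rows and the inferred column types.
print(unemployment.head())
print(unemployment.dtypes)

# Modify: add a derived column (the rate expressed as a fraction).
unemployment["unemployment_frac"] = unemployment["unemployment_rate"] / 100

# Descriptive statistics, overall and per country.
print(unemployment["unemployment_rate"].describe())
print(unemployment.groupby("country")["unemployment_rate"].mean())
```

With a real file you would pass its path to `pd.read_csv` instead of wrapping a string in `io.StringIO`.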
We plan to cover:
- pandas data structures
- loading data
- subsetting and filtering
- calculating summary statistics
- dealing with missing values
- merging data sets
- creating new variables
- basic plotting
- exporting data
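The sketch below touches most of the topics above on two small hypothetical tables (`rates` and `names` are invented for illustration and are not the workshop data). It shows a Series versus a DataFrame, column selection and boolean filtering, handling missing values with `isna`, `dropna`, and `fillna`, a left join with `merge`, a derived column, and CSV export with `to_csv`. Plotting is left to the workshop notebook.

```python
import io

import pandas as pd

# Hypothetical toy tables standing in for the workshop data.
rates = pd.DataFrame(
    {
        "country": ["AT", "AT", "BE", "BE", "DE"],
        "year": [2010, 2011, 2010, 2011, 2010],
        "rate": [4.8, None, 8.3, 7.2, 7.0],  # None becomes NaN (missing)
    }
)
names = pd.DataFrame(
    {"country": ["AT", "BE", "DE"], "name": ["Austria", "Belgium", "Germany"]}
)

# Data structures: a single column is a Series; the whole table is a DataFrame.
rate_series = rates["rate"]

# Subsetting and filtering: select columns, then keep rows matching a condition.
subset = rates[["country", "rate"]]
high = rates[rates["rate"] > 7.5]

# Missing values: count them, then either drop them or fill them.
n_missing = rates["rate"].isna().sum()
dropped = rates.dropna(subset=["rate"])
filled = rates.fillna({"rate": rates["rate"].mean()})

# Merging: attach full country names via the shared "country" key.
merged = filled.merge(names, on="country", how="left")

# New variables: a boolean flag derived from an existing column.
merged["above_mean"] = merged["rate"] > merged["rate"].mean()

# Exporting: write CSV text to an in-memory buffer instead of a file.
buffer = io.StringIO()
merged.to_csv(buffer, index=False)
csv_output = buffer.getvalue()
print(csv_output)
```

Filling with the column mean is only one choice; dropping incomplete rows (as in `dropped`) or filling within each country may suit real data better. To write an actual file, pass a filename such as `"output.csv"` (a hypothetical name) to `to_csv`.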
If you have trouble installing the software or otherwise cannot get the Jupyter Notebook to open, click this "launch binder" badge to start this session in your web browser.