A one-day course covering the basis of descriptive statistics with open data, including basic statistical measures such as mean, median, standard deviation, and variance. The course also covers correlation, linear regression, and introduces decision modeling using open data.
Employees of all levels who perform data analysis and communicate analytical findings in support city operations.
This course introduces participants to the use of statistics for understanding and communicating city data. Using Excel, participants will learn how to use measures like mean, median, mode, standard deviation, and variance interval to understand the content of city data for making operational decisions. Participants will also learn how to display statistical information in meaningful ways.
- Learn common statistical measures, including mean, median, mode, standard deviation, and variance
- Calculate correlation coefficients for bivariate data and apply the technique of simple regression analysis
- Demonstrate techniques used for forecasting
- Communicate data meaningfully to a broad audience using charts and graphs in Microsoft Excel
- Participants will be familiar with common statistical measures
- Participants will be able to calculate correlation coefficients for bivariate data and perform simple linear regression analysis
- Participants will be familiar with the basic techniques of forecasting
- Participants will be better able to communicate analysis using charts and graphs in Microsoft Excel
- Introduction and Welcome (9:00 - 9:15)
- Data collection exercise with students
- Collected data includes
- Agency
- Number of siblings
- Height
- Years of Service with NYC
- Overview of Statistics and Basic Statistical Measures (9:15 - 9:50)
- Why statistics?
- Measures of central tendency
- Exercise 1: Calculate mean and median in small groups and compare to class as a whole
- Break (9:50 - 10:00)
- Basic statistical measures (continued) (10:00 - 10:50)
- Measures of variability
- Exercise 2: Calculate measure of variability in small groups and compare to class as a whole
- Break (10:50 - 11:00)
- Calculating statistics on 311 Service Requests (11:00 - 12:00)
- Lunch (12:00 - 1:00)
- Calculating statistics on 311 Service Requests and Visualizing Statistical Data in Excel (1:00 - 1:50)
- Exercise 4: How long do pothole complaints stay open in New York City?
- Anscombe's Quartet
- Histograms
- Data distributions
- Normal
- Long-tail
- Bi-modal
- Creating histograms in Excel with Data Analysis Toolpak
- Scatterplots
- Break (1:50 - 2:00)
- Correlation (2:00 - 2:50)
- Correlation
- Calculating coefficient of correlation in Excel
- Exercise 5: Calculate the correlation between median income and recycling rate in New York City
- Coefficient of determination
- Calculating coefficient of determination in Excel
- Causality
- Prediction
- Break (2:50 - 3:00)
- Linear Regression (3:00 - 3:50)
- Formula for a line
- Calculating coefficients in Excel
- Interpreting linear regression coefficients
- Break (3:50 - 4:00)
- Building a Decision Model in Excel (4:00 - 4:50)
- Introduction to decision models
- Exercise 6: Build a basic decision model in Excel to maximize parking ticket revenue in NYC
- Wrap-up (4:50 - 5:00)
- Course Evaluations and Dismissal (5:00)
- Task to participants
- In a small group, calculate the mean and median for your group and compare with the class as a whole
- Report your findings to the class
- Desired outcomes
- Participants become familiar calculating mean and median in Excel
- Participants understand the value of statistics for comparison
- Participants practice communicating statistics
- Task to participants
- In a small group, calculate the measures of variability for your group and compare with the class as a whole
- Report your findings to the class
- Desired outcomes
- Participants become familiar calculating measures of variability in Excel
- Participants understand the value of statistics for comparison
- Participants practice communicating statistics
- Task to participants
- Prepare data in a guided exercise to calculate the time 311 Service Requests related to noise remain open
- Desired outcomes
- Participants are guided through the steps necessary to calculate the time a service request remains open
- Participants learn Excel functions and formulas if they have no previous experience
- Participants practice calculating statistics on a larger dataset than previous
- Participants communicate findings from statistical analysis
- Task to participants
- Prepare the data in another guided exercise to calculate the time 311 Service Requests related to pothole complaints remain open
- Filter and clean the data as necessary to obtain reliable results
- Compare the results of this analysis with the results from the previous exercise
- Desired outcomes
- Participants practice calculating statistics using the Data Analysis ToolPak in Excel
- Task to participants
- Calculate the correlation between median income and the recycling rate in New York City Community Districts
- Interpret the result based on the calculated coefficient of correlation
- Desired outcomes
- Participants practice calculating the coefficient of correlation on NYC data
- Participants practice communicating statistics
- Task to participants
- Follow along in a guided exercise to maximize parking ticket revenue in NYC by varying the assignment of parking ticket officers in the 5 boroughs on New York City using the Excel Solver add in
- Desired outcomes
- Participants are familiar with creating decision models in Excel
- Participants are able to communicate the outcome of decision models