While specific schedules vary from project to project, the summer roughly follows the structure below. See also the high-level summer plan for an outline of how the summer flows and the general concepts that shape the topics covered by tutorials and speakers, as well as what we'll ask fellows to present in weekly updates and deep dives.
- Before the Summer
- Learning About Projects and Partners
- Working on Your Project
- Presentations
- Wrap-up and Handovers
- Curriculum
Before you arrive, we send you the prerequisites so you can familiarize yourself with the tools you’ll use all summer and gain the background knowledge needed to follow the curriculum. You'll receive a list of software to install before the first day of orientation, programming languages to brush up on, and tools we suggest for managing your data workflow. We will also send you a tentative list of projects and ask you to respond with your preferences. Based on your preferences, the requirements of each project, and the balance of disciplines within each team, we assign teams of 3-4 fellows to each project.
Staff at the Center for Data Science and Public Policy (DSaPP) work hard year-round to recruit partners and scope projects. This is a lengthy, complicated process with plenty of logistical hurdles (think legal data sharing agreements and data transfer challenges), which means the list of project partners is usually not finalized until the fellowship begins. You will find out your project and team assignments in the second week of the fellowship.
We ask all of our project partners to come to Chicago during the first two weeks of the summer, because we want you to meet the people you’re working with face to face. During partner visits, you’ll spend a lot of time talking through the problem and the data with them, and they will give a presentation to the whole fellowship. The visits also give every fellow a chance to hear about all of the projects, and give partners a chance to meet one another and the other fellows.
After orientation, you will spend the first part of the summer getting to know your project partner and their unique challenges. While the projects have already been scoped, you will almost certainly need to refine that scope throughout the summer. For example, we may know your partner’s goal is to find violations of a particular law. Your team would then work with the partner to narrow that to: (1) locations at risk of violations in general; (2) locations at risk of multiple violations; or (3) locations with the most impactful violations.
We believe it is important for you to thoroughly understand the problem and the process that gives rise to the data before getting too entrenched in the data itself. A deep understanding of your partner and the problems they face is crucial to knowing what your variables really mean, and defining your outcome and evaluation metrics will depend heavily on this understanding as well. While we know you are eager to dig into the data, you will find that at least as much of your time goes to talking about the data as to manipulating it. This is a good thing.
You will also be working with real data, which is messy! You will encounter missing values and things that don’t seem to make sense. Talking to your partners and telling them what you see in the data they’ve given you will help you check your own understanding, reconcile inconsistencies (or at least stay aware of them), and determine whether the errors lie in the data itself or in your own preconceived notions.
Although we aim to have all of our partners' data ready well before the fellowship begins, there are inevitably transfer delays, and some data arrives only partially and continues to be augmented throughout (and sometimes after) the fellowship. As you explore, you'll find gaps in the information provided or identify potentially useful new data sources, and you will need to work with your partner to decide whether it’s possible to acquire the data you need in the time you have. That’s the reality of working with real-world partners and sensitive data.
Fellows drive the work of every project, learning their subject matter in depth, writing code, and collaborating with their project partners to develop something useful and usable. Over the course of the summer, your team will:
- Explore the (real) data your partners collect
- Design your project workflow based on what tools you'll use and how your team works together
- Identify user stories to make sure what you're creating has a real purpose
- Develop a machine learning pipeline to turn raw data into analysis that can inform decisions
- Build relevant models that reflect the subject you're analyzing as closely as possible
- Add features to your model based on subject matter expertise, available data, and exploratory analysis
- Evaluate model performance using the metrics that make sense for your project
- Create an interface for your partner to use your results (APIs, dashboard applications, etc.)
We believe that our work is only useful if we are able to communicate what we do and why it’s important to our partners, peers, and the general public. As such, an important piece of this training program is learning to present the work you do.
Each week, one member of your team will give a 2-3 minute update to the entire fellowship, covering your recent progress and findings, giving shoutouts to fellows or staff members who have helped you along the way, and describing anything you're stuck on and need help with. In addition, two teams each week (so each team presents twice over the summer) will give a longer, 20-minute "deep dive" presentation that covers the more technical components of their project and seeks feedback from other fellows and mentors.
At the end of the summer, your team will develop two polished presentations: one 5-minute presentation for the DataFest final event, and one 20-minute technical presentation for use at a local tech meetup at the end of the summer and for future presentations at conferences or your home institution.
In the last few weeks of the fellowship, each team will present at a local meetup. Each team will elect one member to deliver the short final presentation at DataFest; however, all team members should be comfortable delivering either presentation. Our communications staff will work with you on both presentations, brainstorming ways to present your work and giving feedback on your delivery.
For the work you do this summer to have real impact, much more needs to happen after the official end of the fellowship; some of it will be done by your project partner, and some by the Center for Data Science and Public Policy at the University of Chicago. You will need to hand the work over to your project partners so they can validate, implement, and extend it. To do this, you will have to document your work throughout the summer and wrap it up neatly at the end.
We ask that you prepare a poster to be displayed at DSSG events and potential conference poster sessions, a technical report, and an outline of a paper. These make it easier to keep collaborating once you and your teammates are no longer working at the same desk every day. We also ask that all of your code be runnable on a new machine and that it be documented well enough for someone else to understand and replicate your work.
Our number one goal is to train fellows to do data science for social good. Here is some insight into how we accomplish this throughout the summer.
To look through all of our curriculum materials, please see the curriculum section.
We expect that every incoming fellow has experience programming in Python, a basic working knowledge of statistics and social science, and an interest in doing social good. However, we understand that everyone comes from a different background, so to ensure that everyone is able to contribute as a productive member of the team and the fellowship, we start the first few weeks off with an intensive orientation, getting everyone "up to speed" with the basic skills and tools they'll need.
- Week One
- Prerequisites
- Software Setup
- Pipelines and Project Workflow
- Git and GitHub
- Making the Fellowship
- Skills You Need to do DSSG
- Command Line Tools
- Project Management, Partners, and Communications
- Data Exploration in Python
- Project Scoping Intro
- Week Two
- Week Three
- Reproducible ETL
- The Work We Do
- Record Linkage
- Databases
- Quantitative Social Science
Training continues throughout the summer in the form of "lunch and learns" (less formal lessons over lunch) and teachouts by staff or fellows with relevant specializations. Sometimes we ask for volunteers to teach a topic we think is important, such as data visualization or inference with observational data, and a few fellows collaborate to put together a lesson. Other times, a DSSGer will suggest a topic they have a pet interest in, or one they think will be relevant to one or more of the summer projects. Lunch and learns are scheduled twice a week throughout the summer, and some fellows choose to offer optional teachouts at the end of the workday.
Although we don't expect all twelve teams to work in unison, there is a general structure to the summer that guides how we pace the remaining curriculum. We try to schedule topics early enough that fellows can incorporate them into their projects, but not so early that they've forgotten what they learned by the time the knowledge becomes useful. Toward the end of the summer there are fewer required topics, which leaves more open slots for fellows to do teachouts.
- The Rest of the Summer
- Educational Data and Testing (Kevin Wilson)
- Social Good Business Models (Allison Weil and Paul van der Boor)
- Basic Web Scraping (Matt Bauman)
- Pipelines and Evaluation
- Feature Generation Workshop
- Test, Test, Test (Benedict Kuester)
- Beyond the Deep Learning Hype (Reza Borhani)
- Causal Inference with Observational Data (Dean Magee, Monica Alexander, Zhe Zhang, and Jackie Gutman)
- Model Evaluation
- Spatial Analysis Tools
- Operations Research (Jan Vlachy)
- Theory and Theorizing in the Social Sciences (Tom Davidson)
- Web Classification (Yaeli Cohen)
- Presentation Skills (Allison Weil)
- Data Visualization (Jon Keane, Monica Alexander, Diego Olano, Ned Yoxall)
- Natural Language Processing (Garren Gaut)
- Opening Closed Data (Jen Helsby)