Urban Informatics and Data Visualization (CYPLAN255)

Course repo for material related to CYPLAN 255 at UC Berkeley, Spring 2024

Getting started

Syllabus

CYPLAN255: Urban Informatics and Data Visualization
Department of City and Regional Planning, UC Berkeley
Spring 2024

Instructor Max Gardner / [email protected]
Office hours (Zoom): Mon/Tues 5-6 pm -- sign-up here
GSI Meiqing Li / [email protected]
Office hours (Zoom): Thurs 7:30-8:30 pm -- sign-up here
Office hours (IRL): Fri 12-1pm, Bauer Wurster Hall Room 222
Details Meeting times: Mon/Wed 6-7:30 pm
Meeting location: Zoom
Course website: https://bcourses.berkeley.edu/courses/1532651
Course GitHub repository: https://github.com/mxndrwgrdnr/UCB_CYPLAN255_2024
Prerequisites: CP201A, CP204C, or equivalent experience
Grading: out of 100 pts – attendance (10%) / assignments (15%) / project (75%)
Letter grade scale: 0-59% F, 60-69% D, 70-79% C, 80-89% B, 90-100% A

Overview

The goal of this course is to train students to analyze urban data, derive insights, and create effective visualizations using open source software tools and public data. The course will first introduce the fundamentals of programming in Python before moving on to a survey of data analysis/visualization tools and technologies. Sessions will include lectures and practice exercises. Assignments will reinforce the skills and topics being presented. A final project will provide an opportunity for students to use these skills to complete an end-to-end data analysis of their own design, the results of which will be published on GitHub and presented in class.

This is a "hands-on" course. It requires some tolerance for experimentation, self-directed trial and error, and an interest in learning to write code. If you are willing to roll up your sleeves and embrace some uncertainty, you'll learn the fundamentals of urban data analysis and visualization, and might discover an entirely new lens through which to study, plan, and design neighborhoods, cities, and regions.

Course Materials and Attendance

Attendance will be recorded on Zoom. Students are granted two excused absences over the course of the semester without affecting their attendance grade. Qualifying exceptions will be handled on a case-by-case basis.

All required readings will be provided via bCourses or hyperlinks on this electronic syllabus. Lecture slides, example code, and demos/exercises will all be made available via GitHub.

We'll write code in Jupyter notebooks using the Anaconda Python distribution plus some additional software libraries. In some cases you may want to use a Berkeley service called DataHub instead of your own computer – but in general we encourage you to get comfortable installing Python and Python tools on your own computer. You'll get far more comfortable with it that way, and know that whatever you learn, and whatever you install, you can take with you when the class is over. We will use only open source, free software in this course. You'll be surprised how far you can go with it.

You should plan to bring a laptop** to all class sessions.

**NOTE TO WINDOWS USERS: Most exercises and lectures will be OS-agnostic, but command-line tools will be demonstrated in a Unix-like terminal. Windows users can use the Windows command prompt if they choose, but instructor support will be limited. Instead, I recommend installing one of the following to gain access to a Unix-like shell: Windows Subsystem for Linux, Cygwin, Git Bash, or PyCharm.

Assignments (15 pts)

Students will develop skills gradually through assignments paced over the semester. These will typically involve writing some code and documenting it, using Jupyter Notebooks that can be shared and interactively run inside a web browser, and providing a writeup discussing the assignment and its results.

Assignments will be posted on the course GitHub repository, and students will need to pull them down from there. Assignments will generally be due one week from the day they are assigned, by 11:59 pm Pacific Time. Students will submit their completed assignments by opening a pull request on the course repo.

Assignments are designed to build a degree of mastery of skills and will be used as a means of ensuring that students are keeping up with the material and not falling behind. All assignments will be marked down 10% for each day late, so please submit on time.
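As a rough illustration of the grading arithmetic, the sketch below combines the late policy above with the point weights and letter scale listed under Details. It rests on several assumptions the syllabus does not spell out: the 10% deduction is taken from the earned score, any part of a day counts as a full day, a score never drops below zero, and letter cutoffs apply to the unrounded total. The function names are hypothetical and this is not an official grade calculator.

```python
import math

# Point weights from the syllabus: attendance 10, assignments 15, project 75.
WEIGHTS = {"attendance": 10, "assignments": 15, "project": 75}


def apply_late_penalty(score, hours_late, rate=0.10):
    """Return the score after a linear late penalty (assumed interpretation)."""
    if hours_late <= 0:
        return score
    # Assumption: any part of a day late counts as a full day.
    days_late = math.ceil(hours_late / 24)
    # Assumption: the penalty cannot push a score below zero.
    multiplier = max(0.0, 1.0 - rate * days_late)
    return score * multiplier


def course_total(fractions):
    """Combine per-component fractions earned (0.0 to 1.0) into points out of 100."""
    return sum(WEIGHTS[name] * fractions[name] for name in WEIGHTS)


def letter_grade(total):
    """Map a total out of 100 to a letter using the syllabus cutoffs."""
    for cutoff, letter in [(90, "A"), (80, "B"), (70, "C"), (60, "D")]:
        if total >= cutoff:
            return letter
    return "F"


# Example: an assignment scored 90 and submitted 30 hours late (2 days here).
late_score = apply_late_penalty(score=90.0, hours_late=30)
total = course_total({"attendance": 1.0, "assignments": 0.8, "project": 0.9})
print(late_score, total, letter_grade(total))
```

If the penalty is instead taken from the maximum possible score, only the last line of `apply_late_penalty` would change.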

Readings and Exercises (ungraded)

This course has readings associated with nearly every class meeting. These are suggested readings, unless otherwise specified. You will not be quizzed on them, and they may or may not be referenced in class. They are, however, strongly recommended. They have been thoughtfully compiled over the many years this course has been taught, and are designed to help you get the most out of this course and make your final projects a success.

In addition to the readings, the Course Schedule (see below) specifies several other exercises which, unless explicitly stated, will not be collected. They will, however, be used for the following: 1) to facilitate discussion/break-out groups in class; 2) to inspire final project ideas; and 3) to ensure that you make steady progress on your final projects throughout the course of the semester. In some cases there will be class time designated for working on these ungraded exercises, but not always. It is in your own best interest, and that of your fellow students, that you keep up with them.

Final Projects (75 pts)

Final projects will require harnessing the skills practiced in the exercises and developing a more independent work plan to accomplish an analysis of data. More details will be provided later in the semester.

Project components and due dates:

  1. Project proposal + initial analysis (10 pts)
    • Due Sunday, Mar 10
  2. Final presentation (20 pts)
    • Slides, etc. presented during the last week of class.
  3. github.io project page (45 pts)
    • Due Monday, May 6 (first day of Finals week)

Self-Reliance, Collaboration, and Academic Integrity

This course requires a lot of experimentation and trial-and-error. Google and StackOverflow will be your best friends! Google your questions, Google any error messages, and if you can't find an answer, talk to your classmates, and if you still can't sort it out, e-mail Max and Meiqing. When you e-mail us, tell us what you've searched and what you've discovered, and include screenshots, links, and error messages. 99% of the time, somebody else has encountered the exact issue you are having and has documented the solution.

That being said, you are welcome — in fact, encouraged — to work on the homework exercises and your semester project together with other students. Discussing code is a great way to understand it better, and can make tracking down bugs less frustrating. If you copy an entire substantive piece of code (i.e., several lines or more) from the internet or from another student, we ask that you indicate this in a code comment. Otherwise, we will expect everything you submit to be your own original work. Details of the U.C. Berkeley Academic Honor Code can be found here.

Campus Policies and Guidelines

https://teaching.berkeley.edu/campus-policies

Accommodations for Students with Disabilities

UC Berkeley is committed to creating a learning environment that meets the needs of its diverse student body. If you anticipate or experience any barriers to learning in this course, please feel welcome to discuss your concerns with me.

If you have a disability, or think you may have one, you can work with the Disabled Students' Program (DSP) to request an official accommodation. DSP is the campus office responsible for authorizing disability-related academic accommodations, in cooperation with the students themselves and their instructors. You can find more information about DSP, including contact information and the application process, here. If you have already been approved for accommodations through DSP, please meet with me so we can develop an implementation plan together.

Students who need academic accommodations or have questions about their accommodations should contact DSP, located at 260 César Chávez Student Center. Students may call 510-642-0518 (voice), 510-642-6376 (TTY), or email [email protected].

Department Climate Statement

The Department of City and Regional Planning in the College of Environmental Design is committed to an equitable and inclusive educational environment for all. As students, staff, and faculty, we strive to foster a community in which we celebrate our diversity and affirm the dignity of each person by respecting the identities, perspectives, and experiences of those with whom we work. As a member of the UC Berkeley community, the Department of City and Regional Planning is committed to a safe work environment for all.

The following campus-wide resources are available to assist with this effort:

Reading Material and Web Resources

The following books and websites may be helpful resources, and we will draw material from many of them during the semester. (All readings assigned for class will be available online or as PDFs in bCourses.) Each piece of software we'll use also has official documentation online.

  • Adhikari, Ani and John DeNero, Computational and Inferential Thinking, 2019 (https://inferentialthinking.com)

    • Online textbook developed for Berkeley's Foundations of Data Science class.
  • Downey, Allen, Think Python, 2nd Edition, O'Reilly Media, 2015 (https://greenteapress.com/wp/think-python-2e/)

    • Introduction to programming using Python. All the material is online.
  • Foster, Ian, et al., Big Data and Social Science, CRC Press, 2017

    • A practical guide to gathering data and working with it in various ways.
  • Lutz, Mark, Learning Python, 5th Edition, O'Reilly Media, 2013 (https://learning-python.com/about-lp5e.html)

    • Much more depth than you need for this class, but a great reference.
  • McKinney, Wes, Python for Data Analysis, 2nd Edition, O'Reilly Media, 2017

    • More depth about Pandas than Python Data Science Handbook, but less readable.
  • Pilgrim, Mark, Dive into Python 3, Apress, 2009 (https://diveintopython3.net)

    • Nice tutorials and reference for aspects of core Python syntax and programming concepts, but missing some topics that are in Think Python. All the material is online.
  • VanderPlas, Jake, Python Data Science Handbook, O'Reilly Media, 2016 (https://jakevdp.github.io/PythonDataScienceHandbook)

    • Excellent – working with data, making graphs and charts, machine learning. All the material is online.
  • Real Python (https://realpython.com) — Great Python tutorials on numerous topics.

  • Rey, Sergio, et al., Geographic Data Science with Python, 2020 (https://geographicdata.science/book/intro.html)

    • Great resource for geospatial data analysis in Python from the creators of PySAL.
  • Software Carpentry (https://software-carpentry.org/lessons)

    • Tutorials about scientific computing.
  • Stack Overflow (https://stackoverflow.com) — Best website for user-contributed coding Q&As.

Topics + Course Schedule

The topics covered by this course are organized into the following seven (7) modules:

  1. Fundamentals of Programming
  2. Intro to Data Analysis in Python
  3. Intro to Data Visualization
  4. Open Data and APIs
  5. Working with Geospatial Data
  6. Visualizing Geospatial Data
  7. Statistical Analysis + Machine Learning

MODULE 1: FUNDAMENTALS OF PROGRAMMING

  • Wed, Jan 17 -- Course Introduction: Overview of the course, expectations, prerequisites, learning objectives, assignments and projects.

  • Mon, Jan 22 -- Intro to the Command-line: Using a command-line interpreter; common syntax, programs, and arguments; accessing and navigating the file system; Python interpreters; conda environments; starting/stopping a Jupyter server; using Git; text editors

  • Wed, Jan 24 -- Git and GitHub: Principles of distributed version control; repositories; commits; branches; forks; making a GitHub pages website

    • Exercises

      • Create your own github.io website by following this helpful tutorial from the Data89 class at Cal. For advanced users, take it one step further with a slightly more advanced version here.
        • NOTE: although this is only listed as an "exercise" and not an "assignment", your final project will be submitted as a GitHub Pages website, so it would be wise to get started on this sooner rather than later.
    • Readings

  • Mon, Jan 29 -- Python at the Command-line: Anaconda distro; Python vs. IPython vs. Jupyter; virtual environments; intro to the Jupyter Notebook

    • Assignments

      • Assignment 1 released (due Sun, Feb 11)
    • Exercises

      • Re-read and work your way through "notebooks/lecture_03_intro_python_jupyter.ipynb"
      • Continue to work your way through the GitHub Pages website tutorial.
    • Readings

  • Wed, Jan 31 -- The Python Standard Library: Variables, expressions, and assignment; built-in functions and data types; the math module; working with strings, lists, and dicts.
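
To give a flavor of the topics in the last session of this module, here is a minimal sketch that uses variables, built-in functions, the math module, strings, lists, and dicts together. It is not taken from the course notebooks, and the block names and areas are invented purely for illustration.

```python
import math

# Made-up example data (not from the course materials): block areas in square meters.
block_areas = {"Block A": 8500.0, "Block B": 12250.0, "Block C": 6400.0}

# Built-in functions work directly on the dict and its values.
total_area = sum(block_areas.values())
largest = max(block_areas, key=block_areas.get)

# The math module: side length of a square with the same area as the largest block.
side = math.sqrt(block_areas[largest])

# Strings and lists: sort the names, then join them into one upper-case string.
names = sorted(block_areas)
summary = ", ".join(name.upper() for name in names)

print(f"{len(block_areas)} blocks, {total_area:.0f} m^2 in total")
print(f"Largest: {largest} (equivalent square side {side:.1f} m)")
print(summary)
```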


MODULE 2: INTRO TO DATA ANALYSIS IN PYTHON

  • Mon, Feb 5 -- Programming Logic: Procedural programming; control flow in Python (conditional logic, loops, functions)

  • Wed, Feb 7 -- Object-oriented Programming: Modules, classes, methods, and functions; namespaces and scopes; lambda functions and map() for iteration

    • Readings
    • Assignments
      • Assignment 2 released (due Tues, Feb 13)
  • Mon, Feb 12 -- Data Analysis in Python: NumPy arrays and matrices; Pandas Series and DataFrames; loading, displaying and exporting data; descriptive statistics; indexing and filtering

  • Wed, Feb 14 💘 -- More Pandas: Vectorized operations; merge, join, concatenate; group by and aggregations; cleaning and imputing missing data

    • Readings
    • Exercises
      • Spend 2-3 hours working through notebooks 7 and 8 on your own
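
The programming-logic topics from the first two sessions of this module (conditional logic, loops, functions, and lambda functions with map()) can be sketched in a few lines. This is an illustrative example only, not course material; the trip records and the `classify` function are hypothetical.

```python
# Made-up trip records (not from the course materials): (mode, distance in km).
trips = [("bike", 3.2), ("bus", 7.5), ("walk", 0.8), ("bike", 5.1)]


def classify(distance_km):
    """Conditional logic: label a trip by its length."""
    if distance_km < 1:
        return "short"
    elif distance_km < 5:
        return "medium"
    return "long"


# A loop that accumulates the total distance per mode in a dict.
totals = {}
for mode, distance in trips:
    totals[mode] = totals.get(mode, 0.0) + distance

# map() with a lambda applies a function to every item without an explicit loop.
labels = list(map(lambda trip: classify(trip[1]), trips))

print(totals)
print(labels)
```

Later in the module, Pandas replaces the manual loop with a group-by aggregation, but the underlying logic is the same.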

MODULE 3: INTRO TO DATA VISUALIZATION

  • Mon, Feb 19 -- NO CLASS (Presidents' Day)

    • Exercises
      • Spend 2-3 hours working through notebooks 7 and 8 on your own
  • Wed, Feb 21 -- Data Visualization Pt. I: Data viz. for good and evil; use Matplotlib and Seaborn to create static images; dimensionality of data; continuous vs. categorical data; univariate distributions

    • Exercises

      • Find three (3) examples of interesting data visualizations and describe in 2-3 sentences what makes each of them good, bad, or misleading. Be prepared to talk about them in class.
    • Readings

    • Assignments

      • Assignment 3 released (due Tues, Feb 27)
  • Mon, Feb 26 -- Data Visualization Pt. II: Interactive plots, widgets, and apps.

    • Readings

MODULE 4: OPEN DATA AND APIs

  • Wed, Feb 28 -- Intro to APIs: What's in an API; performing queries; authentication; Socrata

    • Assignments

      • Project proposal assignment (Assignment 4) released (due Sun, Mar 10)
    • Readings

  • Mon, Mar 4 -- APIs and Beyond: Geocoding; web scraping; parsing XML


MODULE 5: WORKING WITH GEOSPATIAL DATA


MODULE 6: VISUALIZING GEOSPATIAL DATA

  • Wed, Mar 20 -- Intro to Network Analysis: Graph theory; GTFS; Python tools for working with networks

    • Readings
      • Boeing, Geoff. "OSMnx: New methods for acquiring, constructing, analyzing, and visualizing complex street networks." Computers, Environment and Urban Systems 65 (2017): 126-139. https://doi.org/10.1016/j.compenvurbsys.2017.05.004
      • Blanchard SD, Waddell P. Assessment of Regional Transit Accessibility in the San Francisco Bay Area of California with UrbanAccess. Transportation Research Record. 2017;2654(1):45-54. https://doi.org/10.3141%2F2654-06
      • Foti, Fletcher, Paul Waddell, and Dennis Luxen. "A generalized computational framework for accessibility: from the pedestrian to the metropolitan scale." Proceedings of the 4th TRB Conference on Innovations in Travel Modeling. Transportation Research Board. 2012. http://onlinepubs.trb.org/onlinepubs/conferences/2012/4thITM/Papers-A/0117-000062.pdf
      • https://www.mapzen.com/blog/animating-transitland/
      • Li, Yang, and Wei "David" Fan. "Modeling and evaluating public transit equity and accessibility by integrating general transit feed specification data: Case study of the City of Charlotte." Journal of Transportation Engineering, Part A: Systems 146.10 (2020): 04020112. Available here.
  • Mon, Mar 25 -- NO CLASS (🏄 Spring Break 🏄)

  • Wed, Mar 27 -- NO CLASS (🏄 Spring Break 🏄)

  • Mon, Apr 1 -- Effective Communication of Spatial Data: Types of geospatial visualizations; color theory; common pitfalls of cartographic representation

  • Wed, Apr 3 -- Building Static Maps in Python: Survey of Python libraries for plotting geospatial data on a map

    • Readings
      • Norwood, Carla; Cumming, Gabriel (2012). Making Maps That Matter: Situating GIS within Community Conversations about Changing Landscapes. Cartographica: The International Journal for Geographic Information and Geovisualization, 47(1), 2–17. doi:10.3138/carto.47.1.2
      • Chapter 5 of Geographic Data Science with Python

MODULE 7: STATISTICAL ANALYSIS + MACHINE LEARNING


UC Berkeley sits on the territory of xučyun (Huichin), the ancestral and unceded land of the Chochenyo speaking Ohlone people, the successors of the sovereign Verona Band of Alameda County. This land was and continues to be of great importance to the Muwekma Ohlone Tribe and other familial descendants of the Verona Band.

We recognize that every member of the Berkeley community has, and continues to benefit from, the use and occupation of this land, since the institution's founding in 1868. Consistent with our values of community, inclusion and diversity, we have a responsibility to acknowledge and make visible the university's relationship to Native peoples.

It is vitally important that we not only recognize the history of the land on which we stand, but also, we recognize that the Muwekma Ohlone people are alive and flourishing members of the Berkeley and broader Bay Area communities today.

Read more on the Centers for Educational Justice & Community Engagement website.