Course offered Fall 2015 at GWSB for students in the MSBA program, taught by Dan Chudnov, with materials online at https://github.com/dchud/warehousing-course.
This course provides a practical grounding in relational databases with a focus on data warehousing and dimensional modeling, along with hands-on experience with these tools and other traditional and contemporary methods for managing and analyzing data at scale, such as the Unix command line and Apache Spark. We will focus on using these tools for the middle phases of data analysis: wrangling, exploring, and modeling, with an emphasis on delivering reproducible data analyses. This course is complementary to other foundational courses in the Business Analytics program; as such, topics and techniques from Statistics, Programming, Data Mining, and Optimization may appear as use cases but will not be a focal point for grading.
Students are asked to install an Ubuntu 14.04 virtual machine preloaded with the Data Science Toolbox, with Jupyter, MySQL, PostgreSQL, R, Julia, and Spark preconfigured. Installation instructions (via VirtualBox and Vagrant) are available here.
OS X or Linux users may be able to install the necessary pieces via Anaconda and, on OS X, Homebrew.
This work is in the public domain. See LICENSE for details.