ACCY 571: Statistical Analyses for Accountancy explores advanced concepts in data science by employing a practical approach, including machine learning; probabilistic programming; text, network, and graph analysis; and cloud computing.
Upon completion of this course, students will be expected to understand advanced data science concepts. Students will learn the practical aspects of applying machine and statistical learning in a variety of contexts, as well as different aspects of cloud computing. Specific concepts that will be covered include supervised and unsupervised learning, dimensional reduction, clustering, probabilistic programming, text mining, graph analysis, network analysis, Hadoop, NoSQL data stores, Spark, and streaming data analysis.
Technically, ACCY 570 is a prerequisite for this course; however, we integrate the lectures for these two courses in such a way that any required concepts are covered prior to the relevant material in ACCY 571. In addition, we expect students to have a strong interest in learning the basic skills necessary for being a data scientist, and to have access to a computer to participate in the course lectures and to complete the required course assignments.
Note: At present, we are using NCSA's Nebula OpenStack cloud computing system to run a course JupyterHub server. Each student is running a Dockerized version of the course software stack. This provides many advantages, including robustness against crashes, simplicity of deploying software updates, reduced requirements for students (only a modern web browser is needed; we have used tablets and smartphones), and simplified assignment submission. You can also run a Docker container locally, as in previous courses, but this approach is not recommended. In addition, if you work locally, you must ensure that you push local changes to your course cloud Docker container prior to the deadline, since assignments are automatically collected from your cloud-based Docker container.
Since this course meets face-to-face, please become familiar with the safety announcement provided by the University Police.
There are no required textbooks for this course. Instead, we will utilize Internet accessible websites, videos, and documentation as supplemental material to the lesson content. We also will include links, as relevant, to readings from books that are freely available to University of Illinois students, staff and faculty via the University's Safari subscription.
Academic honesty is essential to this course and the University. Any instance of academic dishonesty (including but not limited to cheating, plagiarism, falsification of data, and alteration of grades) will be documented in the student's academic file. In addition, at a minimum the particular assessment, exam, or assignment will be given a zero. Serious or repeated offenses may be punished more severely.
Guidelines for collaborative work: Discussing course material with your classmates is in general a good idea, but each student is expected to do his or her own work. On assignments, you may discuss the problems and concepts behind them, but you are responsible for your own answers. Please do not post code in the forums! Finally, on assessments and quizzes, your answers must of course be your own. For further info, see the Student Code, Part 4. Academic Integrity.
The instructional staff will use the Announcement Forum on the course Moodle to communicate important course information. Do not unsubscribe from this forum or you risk missing important news!
The preferred method for student communication in this course is to use the Q&A Forum on the course Moodle. The instructional staff monitors this forum and will respond in less than 24 hours (in general we will respond even faster than this, especially during normal business hours). Furthermore, your fellow students may be able to help even faster. We also encourage you to search this forum prior to making a new post since your question may have already been answered. You can search a forum on Moodle by using the Search forums tool that is located on the upper right corner of any Moodle forum.
If you have a question that is not answered in this syllabus or on the online course forums, you can email the instructional staff; however, this should be a last resort. If we feel the question is best answered on the Q&A Forum, we reserve the right to post your question and our answer on Moodle.
Scheduled office hours are listed below for all instructors. You can also communicate via the course forums and email.
Name | Day | Time | Location |
---|---|---|---|
Brunner | Wednesday | 1:30 pm - 2:30 pm | 226 Astronomy |
Kim | Thursday | 1:30 pm - 2:30 pm | 234 Astronomy |
Note: The following list of topics is tentative. We build the course during the semester for several reasons:
- This is a new course, covering dynamic content!
- This course integrates with ACCY 570, which is itself dynamic.
As a result, we feel it is imperative to be able to change the planned pace and material to benefit the majority of enrolled students. We are currently planning on mixing the ACCY 570 and ACCY 571 lecture spots to guarantee proper topic coverage. As a result, the first four weeks of both courses will be used to cover ACCY 570 material, the next three and a half weeks will be used to cover ACCY 571 material, and the remaining weeks will utilize the regular class periods to cover material relevant to the appropriate course.
Week | Topics |
---|---|
Week 1 | N/A |
Week 2 | N/A |
Week 3 | N/A |
Week 4 | N/A |
Week 5 | Introduction to Machine Learning |
Week 6 | Regression and Classification |
Week 7 | Clustering and Outlier Detection |
Week 8 | Dimensional Reduction |
Week 9 | Introduction to Text Mining |
Week 10 | Introduction to Social Media |
Week 11 | Advanced Text Mining |
Week 12 | Introduction to Network Analysis |
Week 13 | Introduction to Cloud Computing |
Week 14 | Cloud Computing: Streaming Data |
Week 15 | Introduction to Deep Learning |
Each week will provide learning objectives and an outline of the activities for that week with a list of all deadlines and corresponding point values for assignments.
Readings will consist of articles and excerpts from books and websites, Internet-accessible videos demonstrating a concept, and, in some cases, IPython Notebooks that can be viewed statically on the GitHub website or (the preferred approach) used interactively on the course JupyterHub server. You will be required to read and be familiar with the content of these documents. Readings are contextualized as part of the weekly lesson content and are located in the "Readings" section of each lesson.
Lessons will expand upon or clarify key concepts in the reading assignments, or supplement the readings. Part of each class period will be used to review the concepts in the relevant readings, after which students will be expected to pursue these concepts in more depth. This will include using the course JupyterHub server to complete specific activities, such as learning to program, using the Unix command line, or working with a relational database.
Occasionally, a lecture period will also utilize a Moodle assessment to spot-check understanding of important material. These assessments will form part of your class participation grade. These lesson assessments must all be completed in class.
Every week but the first and last will contain an assignment that will involve one or more computational tasks related to the focus for that given week. Your assignment will be automatically collected at the deadline from the course JupyterHub server. These assignments will be automatically graded for your instructor grade, and will also be randomly distributed for peer assessment. You will have up to five assignments to grade as part of peer assessment. You will receive thirty points for simply grading your peers' assignments. Your peer assessment score will be worth a maximum of forty points; we will drop the highest and lowest scores and average the three remaining scores.
To receive full credit from instructor grading, your assignment must be submitted prior to the deadline. There will be NO grace period; late assignments will not be accepted. The assignment deadline is 6:00 PM Central on the Tuesday following the relevant week.
NOTE: We will drop your lowest assignment score, but only if you performed peer assessment.
Weekly assignments will be assessed both by your course peers and by automated instructor grading. Of the maximum 150 points for each weekly assignment, 70 points derive from peer review and 80 points are assigned by automated instructor review. You will receive 30 points each week for simply viewing and grading your peers' assignments. Note that you can (and should) still grade your peers even if you miss an assignment submission. Peer review of an assignment must be completed by 6:00 PM Central on Friday of the following week (i.e., you submit your assignment on a Tuesday and you must peer assess other students' assignments by the Friday three days later). You will be assigned assignments to grade approximately one hour after the assignment deadline, that is, around 7:00 PM on the Tuesday evening of the deadline.
Item | Grade |
---|---|
Instructor Assessment | 80 points |
Peer Grading | 30 points |
Peer Assessments | 40 points |
Total | 150 points |
Note that we will only review clearly erroneous peer assessments (this means there needs to be a major problem). Review requests that are deemed insignificant are subject to an instructor-determined point reduction.
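As an illustration of the per-assignment arithmetic described above, here is a minimal Python sketch. The function names are hypothetical, and it assumes each peer score is already on the 0 to 40 point scale; the handling of fewer than three peer scores (a plain average) is also an assumption, since the syllabus only describes the case with five reviews. It is not the actual grading code used in the course.

```python
def peer_assessment_score(peer_scores):
    """Average the peer scores after dropping the highest and lowest.

    Assumption: with fewer than three scores nothing is dropped.
    """
    scores = sorted(peer_scores)
    if len(scores) >= 3:
        scores = scores[1:-1]  # drop the lowest and the highest score
    return sum(scores) / len(scores) if scores else 0.0


def assignment_total(instructor_points, peer_scores, graded_peers):
    """Combine the three components into a score out of 150 points."""
    peer_grading_points = 30 if graded_peers else 0  # credit for grading peers
    return (instructor_points
            + peer_grading_points
            + peer_assessment_score(peer_scores))


# Example: 72/80 from automated grading, five peer scores, peers graded.
total = assignment_total(72, [40, 35, 38, 20, 36], graded_peers=True)
print(round(total, 2))
```

In this example the scores 20 and 40 are dropped, the remaining three (35, 36, 38) are averaged, and the result is added to the instructor and peer grading points.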
This course will utilize two exams. The first exam will be in-class on Wednesday, October 12 from 11:00 am - 12:20 pm. The second exam will likely be held during Finals week; more specific information will be forthcoming.
Item | Grade Percentage |
---|---|
First Exam | 15% |
Second Exam | 15% |
Assignments | 60% |
Lecture Participation | 10% |
Note: We will drop your lowest assignment score, but only if you performed peer assessment.
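To make the weighting concrete, the following Python sketch combines the four components using the percentages in the table above. The function and variable names are hypothetical, all inputs are assumed to be percentages from 0 to 100, and the way the lowest assignment is dropped (only when there is more than one score) is an assumption rather than a statement of the official procedure.

```python
# Weights taken from the grade percentage table above.
WEIGHTS = {
    "first_exam": 0.15,
    "second_exam": 0.15,
    "assignments": 0.60,
    "participation": 0.10,
}


def assignment_percentage(scores, did_peer_assessment, max_points=150):
    """Return the assignment average as a percentage.

    The lowest score is dropped only if peer assessment was performed.
    """
    kept = sorted(scores)
    if did_peer_assessment and len(kept) > 1:
        kept = kept[1:]  # drop the single lowest score
    if not kept:
        return 0.0
    return 100 * sum(kept) / (max_points * len(kept))


def course_percentage(first_exam, second_exam, assignments, participation):
    """Weighted sum of the four components, each given as a percentage."""
    return (WEIGHTS["first_exam"] * first_exam
            + WEIGHTS["second_exam"] * second_exam
            + WEIGHTS["assignments"] * assignments
            + WEIGHTS["participation"] * participation)


hw_pct = assignment_percentage([150, 120, 90], did_peer_assessment=True)
final_pct = course_percentage(80, 90, hw_pct, 100)
print(hw_pct, round(final_pct, 2))
```

Here the score of 90 is dropped, so the assignment average is 270 out of 300 points, and the weighted course percentage follows from the four weights.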
Final grades will be curved, if necessary. The letter grade cutoffs will be set at the traditional 90%, 80%, and 70% limits, and a plus or minus will be added if you are within two points of the traditional cutoffs (so 98–100 is an A+ and 90–92 is an A-).
Percentage | Letter Grade |
---|---|
98-100 | A+ |
92-98 | A |
90-92 | A- |
88-90 | B+ |
82-88 | B |
80-82 | B- |
78-80 | C+ |
72-78 | C |
70-72 | C- |
68-70 | D+ |
62-68 | D |
60-62 | D- |
Below 60 | F |
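The table above can be read as a simple lookup, sketched below in Python. Because adjacent ranges share an endpoint (for example, 92 appears in both the A and A- rows), this sketch assumes that a boundary value earns the higher grade; that tie-breaking rule and the function name are assumptions, and any curve is ignored.

```python
# (lower bound, letter) pairs from the table, highest first.
CUTOFFS = [
    (98, "A+"), (92, "A"), (90, "A-"),
    (88, "B+"), (82, "B"), (80, "B-"),
    (78, "C+"), (72, "C"), (70, "C-"),
    (68, "D+"), (62, "D"), (60, "D-"),
]


def letter_grade(percentage):
    """Map a course percentage to a letter grade (lower bound inclusive)."""
    for lower_bound, letter in CUTOFFS:
        if percentage >= lower_bound:
            return letter
    return "F"


for pct in (99, 92, 89.5, 71, 59.9):
    print(pct, letter_grade(pct))
```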
There is a course Wiki hosted on the course GitHub repository. If you have a problem and obtain a solution (either through your own efforts or in partnership with an instructor), consider writing your problem and solution up as a FAQ post in the GitHub wiki. You get extra credit for doing this and also help your classmates!
To get credit for your wiki entry you must contact the course teaching assistant, Edward Kim. He will review your post and indicate how many points you will receive, and whether he would be willing to review an edited post for additional information. You can submit multiple Wiki entries.
The following table summarizes the typical weekly schedule, where the assignments are collected the Tuesday following the week when the assignments are released.
Task | Days into Week | Date/Time |
---|---|---|
Week Opens | 0 | Monday, 12:00 am |
Lecture 1 | 0 | Monday, 9:30-10:50 am |
Lecture 2 (when used for 570) | 0 | Monday, 11:00 am - 12:20 pm |
Assignment Released | 2 | Wednesday, 9:00 am |
Lecture 3 | 2 | Wednesday, 9:30-10:50 am |
Lecture 4 (when used for 570) | 2 | Wednesday, 11:00 am - 12:20 pm |
Assignment Collected | 8 | The following Tuesday, 6:00 pm |
Assignments distributed for Peer Assessment | 8 | The following Tuesday, 7:00 pm |
Peer Assessment Deadline | 11 | The following Friday, 6:00 pm |
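The "Days into Week" offsets in the table above can be turned into concrete dates with a few lines of Python. This is only a sketch of the typical week: the function name and dictionary are hypothetical, times are treated as naive local (Central) times, and weeks that contain an exam or a break do not follow this pattern (for example, HW#3 is released on Oct 5 but collected on Oct 18), so the full deadline table takes precedence.

```python
from datetime import datetime, timedelta

# task -> (days after the week's Monday, hour of day), from the table above.
OFFSETS = {
    "Assignment Released": (2, 9),
    "Assignment Collected": (8, 18),
    "Assignments distributed for Peer Assessment": (8, 19),
    "Peer Assessment Deadline": (11, 18),
}


def week_deadlines(week_monday):
    """Return the deadlines for the week starting on the given Monday."""
    start = datetime(week_monday.year, week_monday.month, week_monday.day)
    return {
        task: start + timedelta(days=days, hours=hour)
        for task, (days, hour) in OFFSETS.items()
    }


# Week of Monday, Sep 19, 2016 (the week in which HW#1 is released).
deadlines = week_deadlines(datetime(2016, 9, 19))
for task, when in deadlines.items():
    print(f"{task}: {when:%a %b %d, %Y %I:%M %p}")
```

For the week starting Sep 19, 2016, this reproduces the HW#1 dates: released Wednesday Sep 21, collected Tuesday Sep 27, and peer assessment due Friday Sep 30.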
The following table provides the full set of deadlines for ACCY 571.
Date | Item | Time |
---|---|---|
Wed. Sep 21, 2016 | HW#1 Out | 9:00 AM |
Tue. Sep 27, 2016 | HW#1 In | 6:00 PM |
Wed. Sep 28, 2016 | HW#2 Out | 9:00 AM |
Fri. Sep 30, 2016 | HW#1 Peer | 6:00 PM |
Tue. Oct 4, 2016 | HW#2 In | 6:00 PM |
Wed. Oct 5, 2016 | HW#3 Out | 9:00 AM |
Fri. Oct 7, 2016 | HW#2 Peer | 6:00 PM |
Wed. Oct 12, 2016 | Midterm: L15-L28 | 11:00 AM |
Tue. Oct 18, 2016 | HW#3 In | 6:00 PM |
Wed. Oct 19, 2016 | HW#4 Out | 9:00 AM |
Fri. Oct 21, 2016 | HW#3 Peer | 6:00 PM |
Tue. Oct 25, 2016 | HW#4 In | 6:00 PM |
Wed. Oct 26, 2016 | HW#5 Out | 9:00 AM |
Fri. Oct 28, 2016 | HW#4 Peer | 6:00 PM |
Tue. Nov 1, 2016 | HW#5 In | 6:00 PM |
Wed. Nov 2, 2016 | HW#6 Out | 9:00 AM |
Fri. Nov 4, 2016 | HW#5 Peer | 6:00 PM |
Tue. Nov 8, 2016 | HW#6 In | 6:00 PM |
Wed. Nov 9, 2016 | HW#7 Out | 9:00 AM |
Fri. Nov 11, 2016 | HW#6 Peer | 6:00 PM |
Tue. Nov 15, 2016 | HW#7 In | 6:00 PM |
Wed. Nov 16, 2016 | HW#8 Out | 9:00 AM |
Fri. Nov 18, 2016 | HW#7 Peer | 6:00 PM |
Tue. Nov 29, 2016 | HW#8 In | 6:00 PM |
Wed. Nov 30, 2016 | HW#9 Out | 9:00 AM |
Fri. Dec 2, 2016 | HW#8 Peer | 6:00 PM |
Tue. Dec 6, 2016 | HW#9 In | 6:00 PM |
Fri. Dec 9, 2016 | HW#9 Peer | 6:00 PM |