By Eetu Mäkelä, professor in Digital Humanities (Human Sciences–Computing Interaction) at the University of Helsinki.
{% hint style="warning" %} This content is not yet complete, in the sense that some sections have not yet been converted from their original lecture slide format into self-contained texts for self-study. Each such section has a header similar to this at the top noting its draft status, as well as a :construction_site: mark in the table of contents below. {% endhint %}
Intended audience: People of all levels in the humanities and interpretive social sciences (henceforth abbreviated as human sciences) who are interested in whether computational methods might help them in their own work.
Prerequisites: Absolutely none.
Aside: Why should you be interested in computational methods? Two reasons:
- they may allow you yourself to do your work more efficiently, and
- they may lead to completely new and powerful ways of addressing questions in your field
The probability of either of these happening very much depends on what you are interested in, but not in any way that can be briefly enumerated. Instead, this course aims to enable you to discover that for yourself.
This is an introductory course on applying modern data processing to complex social and historical data. As a signposting course, it describes the landscape of computational human sciences. The main learning goals are that after completing the course, a student will be able to:
- make informed decisions on which computational approaches will be of use to themselves, and
- understand, follow and discuss the development of computational approaches within their field in general
They will also have the necessary background to make use of more specific courses and learning resources to further their understanding in these directions. The course draws no boundaries between subfields of the humanities or social sciences. On the contrary, it argues that taking examples from different fields yields a deeper understanding of the possibilities afforded by computation. For more details, see the introduction.
In terms of smaller objectives, as part of the above, after the course:
- The student understands the multiple ways in which computational approaches benefit work within the human sciences.
- They are able to use ready tools to work with data.
- In addition, they have attained knowledge of the fundamental concepts of programming, through which they can start to expand their capabilities, should they so choose.
- The student also gains a basic understanding of the fundamental concepts of statistics, which both 1) act as a general framework with regard to which many statistical approaches encountered later can be positioned, and 2) act as a practical foundation from which to pursue further understanding.
- Further, the student gains a general literacy in advanced statistical and computer science methods applicable to computational human sciences, and learns when to apply them (as well as, crucially, when and how not to apply them).
- They also learn how open, reproducible research and publishing is done in practice.
- Finally, the student learns to apply all of the above in practice in a small concrete computational human sciences project.
This course is meant both for independent self-study (including reading up on only certain sections of the course) and for completing as either a contact learning course or a MOOC with a group of like-minded students. For material relating to particular instances of this latter mode of study, see here.
Workload-wise, the full course is rated at 5 ECTS, which officially translates to ~135 hours of study. However, ECTS workload ratings have always diverged from both reality and student expectations. In practice, I expect the load to be some 60-70 hours, or roughly half of the official norm. Generally, students seem to rate courses at this workload level as "moderate to heavyish" (because sometimes you can get 5 ECTS even for something like 25 hours, or ⅕ of the official norm, for example from just sitting in lectures for 14 × 1½ hours and then doing a couple of hours of work on top of that!).
( :construction_site: marks parts of the course not yet fully converted out of lecture slide format)
- Introduction: three approaches to methods for digital humanists
- Easy, ready-made tools for data acquisition, cleanup, visualisation and exploration
- Fundamentals of programming for data processing
- Data analysis method literacy
- Data :construction_site:
- Easy tools for acquiring, processing and exploring data :construction_site:
- Data processing: fundamental concepts of programming for humanists
- Data processing: regular expressions
- Data analysis: fundamental concepts of statistics
- Computational data analysis method literacy :construction_site:
- Open, reproducible research and publishing :construction_site:
- Digital humanities project
"At times the course felt like being hit by a bus, the way we were forced to figure out many things on our own. It did at times result in an awful lot of stress, but it actually was the best way to learn how to do these things and more importantly, how to find info on how different things work and should be done." - course feedback
There's a lot to take in during the course, and much of it may be unfamiliar and at first confusing. A major principle of the course is that you should not try to wholly understand everything in the first instance. While I have tried to keep the language and concepts as simple as I could, and to order them sensibly with regard to each other, often there was no way to arrange everything neatly into a linear learning progression.
For example, to really understand easy-to-use end-user tools, one needs to know how they relate to the possibilities of computational analyses in general, as well as to different types of data and different types of preprocessing of that data. Further, to properly contextualise them, one also needs to understand how their affordances differ from those available to users of programmatic analysis libraries. However, ready-to-use tools are still presented before programming, data transformations and computational analyses, because I feel that having tried them in practice provides a good springboard for understanding these more abstract and complex topics.
Thus, when going through the course and doing the assignments, try not to be bothered by not understanding everything on the first go. Instead, it is enough at each point to have even a vague general notion or gist of things, and to trust that it will all make sense in the end, once you've gone through all the subtopics.
- The course has Slack channels at dhintros.slack.com, used both for returning some assignments and for peer and teacher support. Please join the Slack as well as the channel for the instance of the course you're on (e.g. #cl4hss2024).
- For linking to quotes in their original context, the course uses hypothes.is. To be able to use this, you must join the CL4HSS group (as well as register in general if you don't already have an account). You also naturally need access to the sources (most commonly by accessing them from a university network or VPN; for Helsinki, for example, see this guide).
- If you use the material for self-study and it ends up being useful for you, I'd appreciate a note about this. Feel free to send it through Slack, e-mail, Twitter or wherever you find me.
The text of this course is licensed under a Creative Commons Attribution 4.0 International License. This means that you are free to use, embed, remix and further develop any part of this course for use in your own course or other material. The only requirement is that you give appropriate credit for this material, provide a link to the license, and indicate if changes were made (see the license for more details).
If you do make use of this material, I'd naturally also appreciate a ping, as well as the possibility to merge any improvements to this version, even if neither of those is actually required by the license.
For access to the source code of this GitBook, please see this GitHub repository.