* modify getting started docs * remove wip warning from infra section * add datapuller documentation * datapuller local development docs * polish
Showing 6 changed files with 52 additions and 11 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,15 @@ | ||
# Datapuller | ||
|
||
TODO | ||
Welcome to the `datapuller` section. | ||
|
||
## What is the `datapuller`? | ||
|
||
The `datapuller` is a modular collection of data-pulling scripts responsible for populating Berkeleytime's databases with course, class, section, grades, and enrollment data from the official university-provided APIs. The pullers are unified through a single entrypoint, which makes it easy to develop new pullers. The original proposal can be found [here](https://docs.google.com/document/d/1EdfI5Cmsk91LwZtUN0VSC5HEKy4RRuMhLhw8TRKRQrM/edit?tab=t.0#heading=h.c6lfrfjeglpv)[^1]. | ||
|
||
### Motivation | ||
|
||
Before the `datapuller`, all data updates were done through a single script run every day. The lack of modularity made it difficult to increase or decrease the update frequency of specific data types. For example, enrollment data changes rapidly during enrollment season, so it would be beneficial to update it more often than once a day. Course data, however, seldom changes, so it would be more efficient to update it less often. | ||
|
||
Thus, the `datapuller` was born: each puller is now a separate script, which gives us more control over when each one runs and increases the fault tolerance of each script. | ||
|
||
[^1]: Modifications to the initial proposal are not included in the document. However, the motivation remains relatively consistent. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
# Local & Remote Development | ||
|
||
## Local Development | ||
|
||
The nature of the `datapuller` separates it from the backend and frontend services. Thus, when testing locally, it is quicker and easier to build and run the `datapuller` separately from the backend/frontend stack. | ||
|
||
To run a specific puller, the `datapuller` image must first be built, then the specific puller must be passed as a command[^1]. In addition, a Mongo instance should be running in the same network, and the correct `MONGO_URI` must be set in `.env`. | ||
|
||
```sh | ||
# ./berkeleytime | ||
|
||
# Run a Mongo instance. The --name flag sets the hostname used in MONGO_URI. | ||
# Here, it would be mongodb://mongodb:27017/bt. | ||
docker run --name mongodb --network bt --detach "mongo:7.0.5" | ||
|
||
# Build the datapuller-dev image | ||
docker build --target datapuller-dev --tag "datapuller-dev" . | ||
|
||
# Run the desired puller. The default puller is main. | ||
docker run --volume ./.env:/datapuller/apps/datapuller/.env --network bt \ | ||
"datapuller-dev" "--puller=courses" | ||
``` | ||
|
||
The valid pullers are `courses`, `classes`, `sections`, `grade-distributions`, and `main`. | ||
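
The run step above can be wrapped in a small helper that rejects unknown puller names before starting a container. This is only a sketch: `is_valid_puller` and `run_puller` are hypothetical names that do not exist in the repository, and the list of pullers is copied from the sentence above, so it must be updated by hand if pullers are added.

```sh
#!/bin/sh
# Hypothetical helper, not part of the repository.
# The list mirrors the valid pullers named in this document.
VALID_PULLERS="courses classes sections grade-distributions main"

# Return 0 if $1 is one of the valid pullers, 1 otherwise.
is_valid_puller() {
  for p in $VALID_PULLERS; do
    if [ "$p" = "$1" ]; then
      return 0
    fi
  done
  return 1
}

# Run the given puller (default: main) with the same docker run
# invocation shown earlier. Assumes the datapuller-dev image is built
# and the bt network and Mongo container already exist.
run_puller() {
  puller="${1:-main}"
  if ! is_valid_puller "$puller"; then
    echo "unknown puller: $puller" >&2
    return 1
  fi
  docker run --volume ./.env:/datapuller/apps/datapuller/.env --network bt \
    "datapuller-dev" "--puller=$puller"
}
```

After sourcing the file, `run_puller sections` would start the `sections` puller, while a misspelled name fails immediately instead of starting a container.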
|
||
[^1]: Here, I reference the Docker world's terminology. In the Docker world, the `ENTRYPOINT` instruction denotes the executable, which is not replaced by arguments passed to `docker run` after the image name (it can only be changed with the `--entrypoint` flag). The `CMD` instruction denotes the default arguments, which can be overridden after the image is built. In the Kubernetes world, the `ENTRYPOINT` analogue is the `command` field, while the `CMD` equivalent is the `args` field. | ||
|
||
## Remote Development | ||
|
||
The development CI/CD pipeline marks all `datapuller` CronJobs as [suspended](https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/cron-job-v1/#CronJobSpec), preventing the `datapuller` jobs from being scheduled. To test a change, [manually run the desired puller](../infrastructure/runbooks.md#manually-run-datapuller). |
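
For orientation, a one-off Job can generally be created from a suspended CronJob with `kubectl create job --from`. The sketch below illustrates that general mechanism and is not a copy of the runbook, which remains the authoritative procedure. The function names and the CronJob and namespace values in the usage note are assumptions, not the project's real resource names.

```sh
#!/bin/sh
# Hypothetical helpers, not part of the repository.

# Build a unique Job name from a CronJob name and a suffix.
manual_job_name() {
  printf '%s-manual-%s\n' "$1" "$2"
}

# Create a one-off Job from CronJob $1 in namespace $2, then follow its logs.
run_cronjob_once() {
  cronjob="$1"
  namespace="$2"
  job="$(manual_job_name "$cronjob" "$(date +%s)")"
  kubectl create job --namespace "$namespace" --from="cronjob/$cronjob" "$job" &&
    kubectl logs --namespace "$namespace" --follow "job/$job"
}
```

With placeholder values, the call would look like `run_cronjob_once <cronjob-name> <namespace>`. The Job is created from the CronJob's job template, so it runs once even though the CronJob itself stays suspended.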