diff --git a/README.md b/README.md
index b5cffd4ed..812f98d33 100644
--- a/README.md
+++ b/README.md
@@ -64,12 +64,7 @@ cd Navigatum
### Data Processing
In case you do not want to work on the data processing, you can instead
-download the latest compiled files:
-
-```bash
-wget -P data/output https://nav.tum.de/cdn/api_data.json
-wget -P data/output https://nav.tum.de/cdn/search_data.json
-```
+download the latest compiled files by running the server.
Else you can follow the steps in the [data documentation](data/README.md).
@@ -84,9 +79,11 @@ docker compose -f docker-compose.local.yml up --build
```
> [!NOTE]
-> While most of the setup is simple, we need to download data (only Oberbayern is needed) for the initial setup. This takes 1-2 minutes.
-> Please first bring up a [postgis](https://postgis.net/) instance (for example via `docker compose -f docker-compose.local.yml up --build`) and then run:
->
+> While most of the setup is simple, we need to download data (only Oberbayern is needed) for the initial setup. This
+> takes 1-2 minutes.
+> Please first bring up a [postgis](https://postgis.net/) instance (for example
+> via `docker compose -f docker-compose.local.yml up --build`) and then run:
+>
> ```bash
> wget -O data.pbf https://download.geofabrik.de/europe/germany/bayern/oberbayern-latest.osm.pbf
> docker run -it -v $(pwd):/data -e PGPASSWORD=CHANGE_ME --network="host" iboates/osm2pgsql:latest osm2pgsql --create --slim --database postgres --user postgres --host 127.0.0.1 --port 5432 /data/data.pbf --hstore --hstore-add-index --hstore-column raw
diff --git a/data/README.md b/data/README.md
index 939099caa..b57606d4e 100644
--- a/data/README.md
+++ b/data/README.md
@@ -9,16 +9,21 @@ This folder contains:
The code to retrieve external data, as well as externally retrieved data is located under `external`.
> [!WARNING]
-> A lot of this code is more a work in progress than finished. Especially features such as POIs, custom maps or other data types such as events are drafted but not yet fully implemented.
+> A lot of this code is more a work in progress than finished.
+> Especially features such as POIs, custom maps or other data types such as events are drafted but not yet fully implemented.
>
-> New external data might break the scripts from time to time, as either rooms or buildings are removed, the external data has errors or we make assumptions here that turn out to be wrong.
+> New external data might break the scripts from time to time, because
+> - rooms or buildings are removed,
+> - the external data has errors,
+> - or we make assumptions here that turn out to be wrong.
## Getting started
### Prerequisites
-For getting started, there are some system dependencys which you will need.
-Please follow the [system dependencys docs](/resources/documentation/Dependencys.md) before trying to run this part of our project.
+For getting started, there are some system dependencies which you will need.
+Please follow the [system dependencies docs](/resources/documentation/Dependencys.md) before trying to run this part of
+our project.
### Dependencies
@@ -63,7 +68,8 @@ python3 tumonline.py
python3 compile.py
```
-The exported datasets will be stored in `output/` as JSON files.
+The exported datasets will be stored in `output/`
+as [JSON](https://www.json.org/json-de.html)/[Parquet](https://wikipedia.org/wiki/Apache_Parquet) files.
### Directory structure
@@ -92,18 +98,33 @@ data
```json
{
- "entry-id": {
- "id": "entry-id",
- "type": "room",
- ... data as specified in `data-format.yaml`
- },
- ... all other entries in the same form
+  "entry-id": {
+    "id": "entry-id",
+    "type": "room",
+    ... data as specified in `data-format.yaml`
+  },
+  ... all other entries in the same form
}
```
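For orientation, data in the shape documented above can be consumed like this. This is a minimal sketch: the top-level structure (entry ids as keys, each mapping to an object with `id` and `type`) is taken from the README, while the sample ids and the inline JSON string are purely illustrative stand-ins for a real file such as `output/api_data.json`.

```python
import json

# Illustrative sample in the shape documented above; a real run would
# instead load one of the JSON files from output/.
sample = """
{
  "mi": {"id": "mi", "type": "building"},
  "mi.01.234": {"id": "mi.01.234", "type": "room"}
}
"""

data = json.loads(sample)

# Each top-level key is the entry id and maps to the entry itself,
# so the key and the entry's "id" field always agree.
for entry_id, entry in data.items():
    assert entry["id"] == entry_id
    print(entry_id, entry["type"])
```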
## Compilation process
-The data compilation is made of indiviual processing steps, where each step adds new or modifies the current data. The basic structure of the data however stays the same from the beginning on and is specified in `data-format_*.yaml`.
+The data compilation consists of individual processing steps, where each step adds new data or modifies existing data.
+The basic structure of the data, however, stays the same from the beginning and is specified in `data-format_*.yaml`.
- **Step 00**: The first step reads the base root node, areas, buildings etc. from the
`sources/00_areatree` file and creates an object collection (python dictionary)
@@ -111,18 +132,18 @@ The data compilation is made of indiviual processing steps, where each step adds
- **Steps 01-29**: Within these steps, new rooms or POIs might be added, however no
new areas or buildings, since all areas and buildings have to be defined in the
_areatree_. After them, no new entries are being added to the data.
- - **Steps 0x**: Supplement the base data with extended custom data.
- - **Steps 1x**: Import rooms and building information from external sources
- - **Steps 2x**: Import POIs
+ - **Steps 0x**: Supplement the base data with extended custom data.
+ - **Steps 1x**: Import rooms and building information from external sources
+ - **Steps 2x**: Import POIs
- **Steps 30-89**: Later steps are intended to augment the entries with even more
information and to ensure a consistent format. After them, no new (external or custom)
information should be added to the data.
- - **Steps 3x**: Make data more coherent & structural stuff
- - **Steps 4x**: Coordinates and maps
- - **Steps 5x**: Add images
- - **Steps 6x**: -
- - **Steps 7x**: -
- - **Steps 8x**: Generate properties and sections (such as overview sections)
+ - **Steps 3x**: Make data more coherent & structural stuff
+ - **Steps 4x**: Coordinates and maps
+ - **Steps 5x**: Add images
+ - **Steps 6x**: -
+ - **Steps 7x**: -
+ - **Steps 8x**: Generate properties and sections (such as overview sections)
- **Steps 90-99**: Process and export for search.
- **Step 100**: Export final data (for use in the API). Some temporary data fields might be removed at this point.
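The staged pipeline above can be sketched as a sequence of functions applied to one shared object collection (a Python dictionary, as the README notes for step 00). This is a hypothetical simplification: the step names and the toy entries below are illustrative and do not correspond to the actual step modules.

```python
# Minimal sketch of the staged compilation: each step takes the data
# dictionary and returns it, possibly augmented. Step names are illustrative.

def read_areatree(data):
    # Step 00: seed the object collection from the areatree.
    data["root"] = {"id": "root", "type": "root"}
    return data

def add_rooms(data):
    # Steps 01-29: rooms or POIs may be added, but no new areas/buildings.
    data["room-1"] = {"id": "room-1", "type": "room"}
    return data

def add_coordinates(data):
    # Steps 30-89: augment existing entries; no new entries after this point.
    for entry in data.values():
        entry.setdefault("coords", None)
    return data

STEPS = [read_areatree, add_rooms, add_coordinates]

data = {}
for step in STEPS:
    data = step(data)

print(sorted(data))
```

The ordering constraint in the README (entries only created up to step 29, only augmented afterwards) is what makes such a linear fold over the dictionary safe.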
@@ -136,12 +157,16 @@ Details about the formatting are given at the head of the file.
## License
-The source data (i.e. all files located in `sources/` that are not images) is made available under the Open Database License: .
-Any rights in individual contents of the database are licensed under the Database Contents License: .
+The source data (i.e. all files located in `sources/` that are not images) is made available under the Open Database
+License: .
+Any rights in individual contents of the database are licensed under the Database Contents
+License: .
> [!WARNING]
-> The images in `sources/img/` are subject to their own licensing terms, which are stated in the file `sources/img/img-sources.yaml`.
-> The compiled database may contain contents from external sources (i.e. all files in `external/`) that do have different license terms.
+> The images in `sources/img/` are subject to their own licensing terms, which are stated in the
+> file `sources/img/img-sources.yaml`.
+> The compiled database may contain contents from external sources (i.e. all files in `external/`) that have
+> different license terms.
---