Data Engineer Interview Test

Tools and technologies

  • Meltano - a very convenient open source tool for building pipelines with:
    • Singer taps and targets - ready-to-use extract and load scripts
    • dbt - transform data with simple SELECT statements and Jinja templating
    • Apache Airflow - the one and only
  • Apache Superset - "open source Tableau" (also comes with a useful SQL editor for ad hoc queries)
  • PostgreSQL
  • Docker and docker-compose

ETL (ELT, really)

  1. The data for this exercise can be found in the data.zip file. Can you describe the file format?

They appear to be typical flat files delimited by pipe characters, with an extra pipe at the end of each line.
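
A quick way to confirm (the file name here is illustrative; substitute any file from data.zip):

$ awk -F'|' '{ print NF; exit }' customer.csv   # fields in the first line; one more than the column count, because of the trailing pipe
$ grep -cv '|$' customer.csv                    # lines NOT ending in a pipe; should print 0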

Super Bonus: generate your own data through the instructions in the encoded file bonus_etl_data_gen.txt. To get the bonus points, please encode the file with the instructions that were used to generate the files.

Done ✅. The file was encoded in Base64.
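
For reference, the round trip with the standard coreutils tool (the decoded file name is illustrative):

$ base64 -d bonus_etl_data_gen.txt > instructions.txt   # decode the generation instructions
$ base64 instructions.txt                               # re-encode to verify the round trip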

  2. Code your scripts to load the data into a database.

Meltano, building on Singer, makes this dead simple:

$ meltano add extractor tap-spreadsheets-anywhere
$ meltano add loader target-postgres --variant meltano
$ meltano elt tap-spreadsheets-anywhere target-postgres

Most of the work goes into the configuration.
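
As an illustration, the settings can be managed through the Meltano CLI (or edited directly in meltano.yml). Every value below - the tables entry, the delimiter option, the connection details - is an assumption, and setting names can differ between target-postgres variants:

$ # point the tap at the pipe-delimited files (path and pattern are placeholders)
$ meltano config tap-spreadsheets-anywhere set tables '[{"path": "file:///data", "name": "customer", "pattern": "customer.*", "format": "csv", "delimiter": "|", "key_properties": []}]'
$ # warehouse connection (placeholder values)
$ meltano config target-postgres set user postgres
$ meltano config target-postgres set dbname warehouse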

  3. Design a star schema model into which the data should flow.
  4. Build your process to load the data into the star schema.
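
A minimal sketch covering both points, assuming TPC-H-style source tables (pipe-delimited files with trailing pipes look like dbgen output); every table and column name here is an assumption:

$ psql warehouse <<'SQL'
-- dimensions
CREATE TABLE dim_customer (
    customer_key bigint PRIMARY KEY,
    name         text,
    acctbal      numeric
);
CREATE TABLE dim_date (
    date_key date PRIMARY KEY,
    year     integer,
    month    integer
);
-- fact table at line-item grain
CREATE TABLE fact_line_item (
    order_key      bigint,
    customer_key   bigint REFERENCES dim_customer,
    date_key       date   REFERENCES dim_date,
    quantity       numeric,
    extended_price numeric,
    discount       numeric
);
-- load sketch: populate the fact from the raw tables Meltano loaded
-- (the dimensions would be filled first, the same way)
INSERT INTO fact_line_item
SELECT o.o_orderkey, o.o_custkey, o.o_orderdate,
       l.l_quantity, l.l_extendedprice, l.l_discount
FROM raw.lineitem l
JOIN raw.orders   o ON o.o_orderkey = l.l_orderkey;
SQL

In practice the same SELECT would live in a dbt model, so the load is versioned and rerunnable.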

Bonus points:

  • add a field to classify the customer account balance into 3 groups
  • add revenue per line item
  • convert the dates to be distributed over the last 2 years
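
Hedged sketches of the three transforms, reusing the hypothetical names from the schema sketch above; the balance thresholds and the 730-day window are arbitrary choices:

$ psql warehouse <<'SQL'
-- 1. classify account balances into 3 groups
ALTER TABLE dim_customer ADD COLUMN acctbal_group text;
UPDATE dim_customer SET acctbal_group = CASE
    WHEN acctbal < 0    THEN 'negative'
    WHEN acctbal < 5000 THEN 'low'
    ELSE 'high'
END;
-- 2. revenue per line item (price net of discount)
ALTER TABLE fact_line_item ADD COLUMN revenue numeric;
UPDATE fact_line_item SET revenue = extended_price * (1 - discount);
-- 3. map every date into the last 2 years (730 days), deterministically;
--    dim_date would need to be rebuilt to match the shifted dates
UPDATE fact_line_item
SET date_key = CURRENT_DATE - ((CURRENT_DATE - date_key) % 730);
SQL
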
  5. How to schedule this process to run multiple times per day?
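
One way with this stack, since Meltano bundles Airflow as an orchestrator (the schedule name and cron expression are illustrative, and the flag syntax varies a bit across Meltano versions):

$ meltano schedule add elt-every-6h --extractor tap-spreadsheets-anywhere --loader target-postgres --interval '0 */6 * * *'
$ meltano add orchestrator airflow      # Airflow picks registered schedules up as DAGs
$ meltano invoke airflow scheduler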

Bonus: What to do if the data arrives via streaming, in random order and at random times?

  6. How to deploy this code?

Bonus: Can you make it run as a containerized process (Docker)?
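
A sketch of one containerized setup, assuming the Dockerfile bundle that Meltano ships (the bundle name and image tag are illustrative and vary by Meltano version):

$ meltano add files docker                 # pulls a project Dockerfile into the repo
$ docker build -t meltano-elt .
$ docker run --env-file .env meltano-elt elt tap-spreadsheets-anywhere target-postgres
$ # assumes the image entrypoint is the meltano CLI, so "elt ..." runs the same pipeline as above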