Guided tour - interactive tutorial complete version (#27)
* guided tour * QA based on the interactive tutorial
1 parent
69540bb
commit c1078f4
Showing
41 changed files
with
2,205 additions
and
1,007 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,224 @@ | ||
## Running a task locally | ||
|
||
In this section, we will learn how to run a task with handoff locally, | ||
using project 01_word_count. | ||
|
||
|
||
|
||
Each project directory contains: | ||
|
||
``` | ||
> ls -l 01_word_count | ||
``` | ||
``` | ||
files | ||
project.yml | ||
``` | ||
|
||
|
||
project.yml looks like: | ||
|
||
``` | ||
> cat 01_word_count/project.yml | ||
``` | ||
|
||
``` | ||
commands: | ||
- command: cat | ||
args: "./files/the_great_dictator_speech.txt" | ||
- command: wc | ||
args: "-w" | ||
envs: | ||
- key: TITLE | ||
value: "The Great Dictator" | ||
``` | ||
|
||
|
||
Here, | ||
|
||
- `commands` lists the commands and arguments. | ||
- `envs` lists the environment variables. | ||
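
As a rough illustration of what "setting environment variables from config" can mean, here is a minimal sketch that copies `envs` entries into an environment mapping. The function name `apply_envs` is hypothetical and the list is written out by hand rather than parsed from project.yml; this is not handoff's actual implementation.

```python
import os

# Hand-written copy of the `envs` section shown above (assumption: in practice
# this would come from parsing project.yml).
envs = [{"key": "TITLE", "value": "The Great Dictator"}]


def apply_envs(envs, environ=None):
    """Copy each key/value pair into the environment mapping and return it."""
    environ = os.environ if environ is None else environ
    for entry in envs:
        environ[entry["key"]] = str(entry["value"])
    return environ


# Use a plain dict here so the sketch does not touch the real environment.
env = apply_envs(envs, environ={})
print(env["TITLE"])
```

Passing no `environ` argument would write into `os.environ`, which is what child processes inherit.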
|
||
|
||
The example from 01_word_count runs a command line equivalent of: | ||
|
||
``` | ||
cat ./files/the_great_dictator_speech.txt | wc -w | ||
``` | ||
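
To make the mapping from `commands` to a pipe explicit, the following sketch renders a list of command entries as a shell pipeline string. The helper name `shell_equivalent` is an assumption made for illustration, and the entries are typed in by hand instead of being read from project.yml.

```python
import shlex

# Hand-written copy of the `commands` section of 01_word_count/project.yml.
commands = [
    {"command": "cat", "args": "./files/the_great_dictator_speech.txt"},
    {"command": "wc", "args": "-w"},
]


def shell_equivalent(commands):
    """Join each command with its args and connect the stages with pipes."""
    stages = []
    for entry in commands:
        # shlex.split tokenizes the strings the way a POSIX shell would.
        argv = shlex.split(entry["command"]) + shlex.split(entry.get("args", ""))
        stages.append(" ".join(shlex.quote(token) for token in argv))
    return " | ".join(stages)


print(shell_equivalent(commands))
```

Entries without `args` are handled by the empty-string default, so a command such as `python files/stats_collector.py` becomes a stage on its own.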
|
||
Now let's run it. Try entering the command below: | ||
|
||
``` | ||
> handoff --project 01_word_count --workspace workspace run local | ||
``` | ||
``` | ||
INFO - 2020-08-06 03:35:12,691 - handoff.config - Reading configurations from 01_word_count/project.yml | ||
INFO - 2020-08-06 03:35:12,693 - handoff.config - Setting environment variables from config. | ||
INFO - 2020-08-06 03:35:12,771 - botocore.credentials - Found credentials in shared credentials file: ~/.aws/credentials | ||
INFO - 2020-08-06 03:35:13,056 - handoff.config - You have the access to AWS resources. | ||
WARNING - 2020-08-06 03:35:13,123 - handoff.config - Environment variable HO_BUCKET is not set. Remote file read/write will fail. | ||
INFO - 2020-08-06 03:35:13,123 - handoff.config - Writing configuration files in the workspace configuration directory workspace/config | ||
INFO - 2020-08-06 03:35:13,123 - handoff.config - Copying files from the local project directory 01_word_count | ||
INFO - 2020-08-06 03:35:13,124 - handoff.config - Running run local in workspace directory | ||
INFO - 2020-08-06 03:35:13,124 - handoff.config - Job started at 2020-08-06 03:35:13.124542 | ||
INFO - 2020-08-06 03:35:13,130 - handoff.config - Job ended at 2020-08-06 03:35:13.130391 | ||
``` | ||
|
||
|
||
If you see the output that looks like: | ||
|
||
``` | ||
INFO - 2020-08-03 04:51:01,971 - handoff.config - Reading configurations from 01_word_count/project.yml | ||
... | ||
INFO - 2020-08-03 04:51:02,690 - handoff.config - Processed in 0:00:00.005756 | ||
``` | ||
|
||
|
||
Then great! You just ran the first local test. It created a workspace | ||
directory that looks like: | ||
|
||
``` | ||
> ls -l workspace | ||
``` | ||
``` | ||
artifacts | ||
config | ||
files | ||
``` | ||
|
||
And the word count is stored at workspace/artifacts/state. Here is the content: | ||
|
||
``` | ||
> cat workspace/artifacts/state | ||
``` | ||
|
||
``` | ||
644 | ||
``` | ||
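
For readers less familiar with `wc -w`, a close Python analogue is counting whitespace-separated tokens. This is only a sketch of the idea; `count_words` is a hypothetical helper and is not part of handoff or the example project.

```python
def count_words(text):
    """Count whitespace-separated tokens, similar to what `wc -w` reports."""
    return len(text.split())


sample = "I'm sorry, but I don't want to be an emperor."
print(count_words(sample))  # 10
```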
|
||
|
||
By the way, the example text is from Charlie Chaplin's awesome speech | ||
in the movie The Great Dictator. | ||
|
||
Here is a link to the famous speech scene. | ||
Check it out on YouTube: https://www.youtube.com/watch?v=J7GY1Xg6X20 | ||
|
||
|
||
|
||
And here are the first few paragraphs of the text: | ||
|
||
``` | ||
I’m sorry, but I don’t want to be an emperor. That’s not my business. I don’t want to rule or conquer anyone. I should like to help everyone - if possible - Jew, Gentile - black man - white. We all want to help one another. Human beings are like that. We want to live by each other’s happiness - not by each other’s misery. We don’t want to hate and despise one another. In this world there is room for everyone. And the good earth is rich and can provide for everyone. The way of life can be free and beautiful, but we have lost the way. | ||
Greed has poisoned men’s souls, has barricaded the world with hate, has goose-stepped us into misery and bloodshed. We have developed speed, but we have shut ourselves in. Machinery that gives abundance has left us in want. Our knowledge has made us cynical. Our cleverness, hard and unkind. We think too much and feel too little. More than machinery we need humanity. More than cleverness we need kindness and gentleness. Without these qualities, life will be violent and all will be lost…. | ||
``` | ||
|
||
|
||
Now to the second example. This time project.yml looks like: | ||
|
||
``` | ||
> cat 02_collect_stats/project.yml | ||
``` | ||
|
||
``` | ||
commands: | ||
- command: cat | ||
args: ./files/the_great_dictator_speech.txt | ||
- command: python files/stats_collector.py | ||
- command: wc | ||
args: -w | ||
``` | ||
|
||
|
||
...which is the shell equivalent of | ||
|
||
``` | ||
cat ./files/the_great_dictator_speech.txt | python ./files/stats_collector.py | wc -w | ||
``` | ||
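
A pipeline like this can be reproduced in Python by chaining processes so that each stage's stdout feeds the next stage's stdin. The sketch below uses only the standard `subprocess` module; `run_pipeline` is a hypothetical helper, and nothing here is meant to describe how handoff itself launches commands.

```python
import subprocess
import sys


def run_pipeline(argvs):
    """Run the given argv lists as a pipe chain and return the final stdout."""
    procs = []
    prev_stdout = None
    for argv in argvs:
        proc = subprocess.Popen(argv, stdin=prev_stdout, stdout=subprocess.PIPE)
        if prev_stdout is not None:
            # Close the parent's copy so the upstream stage sees a broken pipe
            # if the downstream stage exits early.
            prev_stdout.close()
        prev_stdout = proc.stdout
        procs.append(proc)
    output, _ = procs[-1].communicate()
    for proc in procs[:-1]:
        proc.wait()
    return output


# Two stand-in stages written in Python so the sketch does not depend on cat or wc.
produce = [sys.executable, "-c", "print('we think too much')"]
count = [sys.executable, "-c", "import sys; print(len(sys.stdin.read().split()))"]
print(int(run_pipeline([produce, count]).decode()))  # 4
```

Any middle stage, such as a pass-through script, slots into the list between the producer and the final consumer.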
|
||
The second command's script, stats_collector.py, lives in the | ||
02_collect_stats/files directory. It is a Python script that looks like: | ||
|
||
|
||
``` | ||
> cat 02_collect_stats/files/stats_collector.py | ||
``` | ||
|
||
``` | ||
#!/usr/bin/python | ||
import io, json, logging, sys, os | ||
 | ||
LOGGER = logging.getLogger() | ||
 | ||
def collect_stats(outfile): | ||
    """ | ||
    Read from stdin and count the lines. Output to a file after done. | ||
    """ | ||
    lines = io.TextIOWrapper(sys.stdin.buffer, encoding="utf-8") | ||
    output = {"rows_read": 0} | ||
    for line in lines: | ||
        try: | ||
            o = json.loads(line) | ||
            print(json.dumps(o)) | ||
            if o["type"].lower() == "record": | ||
                output["rows_read"] += 1 | ||
        except json.decoder.JSONDecodeError: | ||
            print(line) | ||
            output["rows_read"] += 1 | ||
    with open(outfile, "w") as f: | ||
        json.dump(output, f) | ||
        f.write("\n") | ||
 | ||
if __name__ == "__main__": | ||
    collect_stats("artifacts/collect_stats.json") | ||
``` | ||
|
||
The script reads from stdin and counts the lines while passing the raw input to stdout. | ||
The third command (wc -w) then processes the raw text and counts the number of words. | ||
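
The counting rule is easier to see on in-memory input. The sketch below follows the same branches as stats_collector.py but works on a list of strings and returns its results instead of printing and writing a file; `pass_through_and_count` is a hypothetical name used only for this illustration.

```python
import json


def pass_through_and_count(lines):
    """Return (echoed_lines, rows_read) using the same rules as the script."""
    echoed, rows_read = [], 0
    for line in lines:
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            # Not JSON (for example plain text): echo the raw line and count it.
            echoed.append(line)
            rows_read += 1
            continue
        # Valid JSON: echo it, but count it only when its type is "record".
        echoed.append(json.dumps(obj))
        if obj["type"].lower() == "record":
            rows_read += 1
    return echoed, rows_read


sample = ['{"type": "SCHEMA"}', '{"type": "RECORD", "record": {}}', "plain text"]
echoed, rows_read = pass_through_and_count(sample)
print(rows_read)  # 2
```

For plain text such as the speech, every line fails JSON parsing, so every line is counted and passed through unchanged.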
|
||
|
||
|
||
Now let's run it. Try entering the command below: | ||
|
||
``` | ||
> handoff --project 02_collect_stats --workspace workspace run local | ||
``` | ||
``` | ||
INFO - 2020-08-06 03:35:13,401 - handoff.config - Reading configurations from 02_collect_stats/project.yml | ||
INFO - 2020-08-06 03:35:13,402 - handoff.config - Setting environment variables from config. | ||
INFO - 2020-08-06 03:35:13,481 - botocore.credentials - Found credentials in shared credentials file: ~/.aws/credentials | ||
INFO - 2020-08-06 03:35:13,765 - handoff.config - You have the access to AWS resources. | ||
WARNING - 2020-08-06 03:35:13,830 - handoff.config - Environment variable HO_BUCKET is not set. Remote file read/write will fail. | ||
INFO - 2020-08-06 03:35:13,830 - handoff.config - Writing configuration files in the workspace configuration directory workspace/config | ||
INFO - 2020-08-06 03:35:13,830 - handoff.config - Copying files from the local project directory 02_collect_stats | ||
INFO - 2020-08-06 03:35:13,831 - handoff.config - Running run local in workspace directory | ||
INFO - 2020-08-06 03:35:13,831 - handoff.config - Job started at 2020-08-06 03:35:13.831683 | ||
INFO - 2020-08-06 03:35:13,881 - handoff.config - Job ended at 2020-08-06 03:35:13.881507 | ||
INFO - 2020-08-06 03:35:13,881 - handoff.config - Processed in 0:00:00.049824 | ||
``` | ||
|
||
Let's check out the output of the second command: | ||
|
||
|
||
``` | ||
> cat workspace/artifacts/collect_stats.json | ||
``` | ||
|
||
``` | ||
{"rows_read": 15} | ||
``` | ||
|
||
|
||
In the next section, we will try pulling the currency exchange rate data. | ||
You will also learn how to create Python virtual environments for each command | ||
and pip-install the commands. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,130 @@ | ||
## Virtual environment and install | ||
|
||
In this section, we will retrieve currency exchange rates and write them out to a CSV | ||
file. | ||
|
||
We will install singer.io (https://singer.io), a data collection framework, | ||
in a Python virtual environment. | ||
|
||
|
||
|
||
We will use the 03_exchange_rates project. project.yml looks like: | ||
|
||
``` | ||
> cat 03_exchange_rates/project.yml | ||
``` | ||
|
||
``` | ||
commands: | ||
- command: "tap-exchangeratesapi" | ||
args: "--config config/tap-config.json" | ||
venv: "proc_01" | ||
installs: | ||
- "pip install tap-exchangeratesapi" | ||
- command: "python files/stats_collector.py" | ||
venv: "proc_01" | ||
- command: "target-csv" | ||
args: "--config config/target-config.json" | ||
venv: "proc_02" | ||
installs: | ||
- "pip install target-csv" | ||
deploy: | ||
provider: "aws" | ||
platform: "fargate" | ||
envs: | ||
resource_group: "handoff-test" | ||
docker_image: "singer_exchange_rates_to_csv" | ||
task: "test-03-exchange-rates" | ||
``` | ||
|
||
|
||
...which is the shell equivalent of | ||
|
||
tap-exchangeratesapi | python files/stats_collector.py | target-csv | ||
|
||
|
||
|
||
Before we can run this, we need to install tap-exchangeratesapi and target-csv. | ||
The install instructions are listed under the `installs` key of each command in project.yml. | ||
|
||
Notice the `venv` entry for each command. handoff can create a Python virtual | ||
environment for each command to avoid conflicting dependencies among the | ||
commands. | ||
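
As a sketch of the idea, a per-command environment can be created with the standard `venv` module, and the install steps can be joined into one shell line that activates the environment first. The helper names `create_venv` and `install_command` are hypothetical; the shape of the joined command mirrors the line that appears in the install log, but this is not handoff's own code.

```python
import venv


def create_venv(path):
    """Create a virtual environment with pip available (standard library only)."""
    venv.create(path, with_pip=True)


def install_command(venv_name, installs):
    """Build one shell line: activate the venv, install wheel, then each entry."""
    steps = [f"source {venv_name}/bin/activate", "pip install wheel"] + list(installs)
    return " && ".join(steps)


# Only build and print the command here; creating the venv and running pip
# is left to the reader because it touches the file system and the network.
print(install_command("proc_01", ["pip install tap-exchangeratesapi"]))
```

Because `proc_01` and `proc_02` are separate environments, the tap and the target can pin different versions of the same dependency without clashing.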
|
||
To install everything, run this command: | ||
|
||
``` | ||
> handoff -p 03_exchange_rates -w workspace_03 workspace install | ||
``` | ||
``` | ||
INFO - 2020-08-06 03:35:14,158 - handoff.config - Reading configurations from 03_exchange_rates/project.yml | ||
INFO - 2020-08-06 03:35:14,240 - botocore.credentials - Found credentials in shared credentials file: ~/.aws/credentials | ||
INFO - 2020-08-06 03:35:14,524 - handoff.config - You have the access to AWS resources. | ||
INFO - 2020-08-06 03:35:14,524 - handoff.config - Platform: aws | ||
INFO - 2020-08-06 03:35:19,456 - handoff.config - Running /bin/bash -c "source proc_01/bin/activate && pip install wheel && pip install tap-exchangeratesapi" | ||
Requirement already satisfied: wheel in ./proc_01/lib/python3.6/site-packages (0.34.2) | ||
Processing /home/ubuntu/.cache/pip/wheels/1f/73/f9/xxxxxxxx0dba8423841c1404f319bb/tap_exchangeratesapi-0.1.1-cp36-none-any.whl | ||
Processing /home/ubuntu/.cache/pip/wheels/6e/07/1b/xxxxxxxx6d9ce55c05f67a69127e25/singer_python-5.3.3-cp36-none-any.whl | ||
Processing /home/ubuntu/.cache/pip/wheels/fc/d8/34/xxxxxxxx027b62dfcf922fdf8e396d/backoff-1.3.2-cp36-none-any.whl | ||
Collecting requests==2.21.0 | ||
. | ||
. | ||
. | ||
Collecting python-dateutil | ||
Using cached python_dateutil-2.8.1-py2.py3-none-any.whl (227 kB) | ||
Collecting pytzdata | ||
Using cached pytzdata-2020.1-py2.py3-none-any.whl (489 kB) | ||
Collecting pytz | ||
Using cached pytz-2020.1-py2.py3-none-any.whl (510 kB) | ||
Collecting six>=1.5 | ||
Using cached six-1.15.0-py2.py3-none-any.whl (10 kB) | ||
Installing collected packages: jsonschema, simplejson, pytz, tzlocal, six, python-dateutil, pytzdata, pendulum, singer-python, target-csv | ||
Successfully installed jsonschema-2.6.0 pendulum-1.2.0 python-dateutil-2.8.1 pytz-2020.1 pytzdata-2020.1 simplejson-3.11.1 singer-python-2.1.4 six-1.15.0 target-csv-0.3.0 tzlocal-2.1 | ||
``` | ||
|
||
Now let's run the task. Try entering this command below: | ||
|
||
``` | ||
> handoff -p 03_exchange_rates -w workspace_03 run local | ||
``` | ||
``` | ||
INFO - 2020-08-06 03:35:29,258 - handoff.config - Reading configurations from 03_exchange_rates/project.yml | ||
INFO - 2020-08-06 03:35:29,339 - botocore.credentials - Found credentials in shared credentials file: ~/.aws/credentials | ||
INFO - 2020-08-06 03:35:29,626 - handoff.config - You have the access to AWS resources. | ||
INFO - 2020-08-06 03:35:29,626 - handoff.config - Platform: aws | ||
INFO - 2020-08-06 03:35:29,626 - handoff.config - Setting environment variables from config. | ||
INFO - 2020-08-06 03:35:29,693 - handoff.config - Environment variable HO_BUCKET was set autoamtically as xxxxxxxxxxxx-handoff-test | ||
INFO - 2020-08-06 03:35:29,693 - handoff.config - Writing configuration files in the workspace configuration directory workspace_03/config | ||
INFO - 2020-08-06 03:35:29,694 - handoff.config - Copying files from the local project directory 03_exchange_rates | ||
INFO - 2020-08-06 03:35:29,695 - handoff.config - Running run local in workspace_03 directory | ||
INFO - 2020-08-06 03:35:29,695 - handoff.config - Job started at 2020-08-06 03:35:29.695732 | ||
. | ||
. | ||
. | ||
INFO - 2020-08-06 03:35:33,964 - handoff.config - Job ended at 2020-08-06 03:35:33.964206 | ||
INFO - 2020-08-06 03:35:33,964 - handoff.config - Processed in 0:00:04.268474 | ||
``` | ||
|
||
This process should have created a CSV file in the artifacts directory: | ||
|
||
``` | ||
exchange_rate-20200806T033530.csv | ||
``` | ||
|
||
...which looks like: | ||
|
||
``` | ||
CAD,HKD,ISK,PHP,DKK,HUF,CZK,GBP,RON,SEK,IDR,INR,BRL,RUB,HRK,JPY,THB,CHF,EUR,MYR,BGN,TRY,CNY,NOK,NZD,ZAR,USD,MXN,SGD,AUD,ILS,KRW,PLN,date | ||
0.0127290837,0.0725398406,1.3197211155,0.4630976096,0.0618218792,2.9357569721,0.2215388446,0.007434429,0.0401958831,0.0863047809,135.1005146082,0.7041915671,0.050374336,0.6657569721,0.0625373506,1.0,0.29312749,0.0088188911,0.0083001328,0.0399311089,0.0162333997,0.0642571381,0.0655312085,0.0889467131,0.0142670983,0.158440405,0.0093592297,0.2132744024,0.0130336985,0.0134852258,0.032375498,11.244189907,0.0371372842,2020-07-10T00:00:00Z | ||
0.0126573311,0.072330313,1.313014827,0.4612685338,0.061324547,2.9145799012,0.2195057661,0.007408402,0.0399036244,0.085529654,134.613509061,0.7019439868,0.049830313,0.6601894563,0.0620593081,1.0,0.2929324547,0.0088014827,0.0082372323,0.0397907743,0.0161103789,0.0641054366,0.0653286656,0.0878500824,0.0141894563,0.1562817133,0.0093319605,0.209931631,0.0129678748,0.013383855,0.0321466227,11.2139209226,0.0368682043,2020-07-13T00:00:00Z | ||
``` | ||
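
A file in this shape can be read with the standard `csv` module. The sketch below uses a trimmed copy of the two rows above (only the JPY, EUR, USD and date columns) and inverts a column to get a more familiar quote. The JPY column is 1.0 in both rows, which suggests the rates are quoted per 1 JPY; treat that as an assumption. `jpy_per_unit` is a hypothetical helper.

```python
import csv
import io

# Trimmed copy of the output above: three currency columns plus the date.
sample = """JPY,EUR,USD,date
1.0,0.0083001328,0.0093592297,2020-07-10T00:00:00Z
1.0,0.0082372323,0.0093319605,2020-07-13T00:00:00Z
"""


def jpy_per_unit(csv_text, currency):
    """Map each date to 1 / rate, i.e. JPY per one unit of the given currency."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["date"]: 1.0 / float(row[currency]) for row in reader}


rates = jpy_per_unit(sample, "USD")
for date, rate in rates.items():
    print(date, round(rate, 2))
```

To read the real artifact instead, open the CSV file and pass the file object to `csv.DictReader` in place of the `io.StringIO` wrapper.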
|
||
|
||
Now that we know how to run locally, we will gradually start thinking about how to deploy this in the cloud *serverlessly*. | ||
We will learn how to save the configurations to remote storage and fetch them back. | ||
Before doing that, we will cover how to set up an AWS account and profile in the next section. | ||
|