# Creating an AWS Lambda function for crime data ETL

1. Wrap the program in a `lambda_handler` function:

    ```python
    import pandas as pd
    from sqlalchemy import create_engine

    def lambda_handler(event, context):
        print('The crime program has started')

        # read in the incident data from DataSF
        df = pd.read_csv(
            'https://data.sfgov.org/api/views/wg3w-h783/rows.csv?accessType=DOWNLOAD',
            parse_dates=['Incident Datetime', 'Incident Date'],
            index_col=False,
        )
        print('data received')

        # preprocess the dataframe: keep the relevant columns, drop rows
        # with no coordinates, and rename columns to database-friendly names
        df_crime = df[['Incident Datetime', 'Incident Date', 'Incident ID',
                       'Incident Category', 'Incident Description', 'Latitude',
                       'Longitude', 'Police District', 'Analysis Neighborhood']]
        df_crime = df_crime[df_crime['Longitude'].notna()]
        df_crime = df_crime.rename(columns={
            'Incident Datetime': 'datetime',
            'Incident Date': 'date',
            'Incident ID': 'id',
            'Incident Category': 'category',
            'Incident Description': 'description',
            'Latitude': 'latitude',
            'Longitude': 'longitude',
            'Police District': 'pd_district',
            'Analysis Neighborhood': 'neighborhood',
        })

        # connect to PostgreSQL; keep the credentials in a config file
        # (see the attached example) rather than hard-coding them
        URL = 'postgresql://postgres:[email protected]:5432/crime'
        engine = create_engine(URL)

        # send the dataframe to PostgreSQL
        df_crime.to_sql('sf_crime', engine, if_exists='replace', index=False,
                        method='multi', chunksize=1000)

        print('Your database has been updated')
    ```
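    The connection string above is a placeholder. A minimal sketch of loading it from a config file instead, assuming a `config.ini` with a hypothetical `[postgresql]` section is bundled with the deployment package:

    ```python
    from configparser import ConfigParser

    def get_db_url(path='config.ini'):
        # hypothetical config layout: a [postgresql] section holding
        # user, password, host, port, and database keys
        parser = ConfigParser()
        parser.read(path)
        db = parser['postgresql']
        return 'postgresql://{user}:{password}@{host}:{port}/{database}'.format(**db)
    ```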

2. Use the managed AWS SDK for pandas (formerly AWS Data Wrangler) layer for access to pandas.

3. Make a custom layer for SQLAlchemy and psycopg2:

    - create a folder named `python`
    - pip install the needed libraries into that folder using the target argument:
        - `pip install aws-psycopg2 sqlalchemy -t python/`
    - zip the contents so that the `python` folder sits at the root of the archive:
        - `zip -r sql_lambda.zip python/`
4. Upload the zip file to AWS Lambda layers.

5. Add the custom layer to the function (a programmatic alternative is sketched below).
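    Layers can also be attached with boto3 instead of the console. A sketch, assuming a hypothetical function name and placeholder layer ARNs (look up the AWSSDKPandas managed-layer ARN for your region and runtime, and use the ARN returned when you published the custom layer):

    ```python
    import boto3

    lambda_client = boto3.client('lambda')

    # Layers replaces the function's entire layer list, so include every
    # layer the function needs; both ARNs below are placeholders
    lambda_client.update_function_configuration(
        FunctionName='crime_etl',
        Layers=[
            'arn:aws:lambda:us-west-1:336392948345:layer:AWSSDKPandas-Python39:5',
            'arn:aws:lambda:us-west-1:123456789012:layer:sql_lambda:1',
        ],
    )
    ```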

6. Run a test of the Lambda function (the handler can also be smoke-tested locally, as sketched below).
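    For a quick local smoke test before deploying, the handler can be called directly with an empty event; this is a sketch, not a substitute for a test invocation in the Lambda console:

    ```python
    # mimic a Lambda test invocation with an empty event and no context
    if __name__ == '__main__':
        lambda_handler({}, None)
    ```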

## Set up the Lambda function to run automatically every day at 4:05 AM

1. Once the Lambda function has been tested and works, go to Amazon EventBridge.
2. Create a rule using the cron expression option with the input `5 4 * * ? *` (EventBridge cron expressions are evaluated in UTC, so this fires at 04:05 UTC).
3. Select the Lambda function as the rule's target and create the rule.
4. The Lambda function will now run automatically every day at 4:05 AM UTC; the same setup via boto3 is sketched below.
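The same schedule can also be created programmatically. A minimal boto3 sketch, assuming a hypothetical rule name and placeholder function ARN:

```python
import boto3

events = boto3.client('events')
lambda_client = boto3.client('lambda')

RULE_NAME = 'daily-crime-etl'  # hypothetical rule name
FUNCTION_ARN = 'arn:aws:lambda:us-west-1:123456789012:function:crime_etl'  # placeholder

# cron(5 4 * * ? *) fires at 04:05 UTC every day
rule = events.put_rule(Name=RULE_NAME, ScheduleExpression='cron(5 4 * * ? *)')

# point the rule at the Lambda function
events.put_targets(Rule=RULE_NAME, Targets=[{'Id': 'crime-etl', 'Arn': FUNCTION_ARN}])

# grant EventBridge permission to invoke the function
lambda_client.add_permission(
    FunctionName='crime_etl',  # hypothetical function name
    StatementId='eventbridge-daily-invoke',
    Action='lambda:InvokeFunction',
    Principal='events.amazonaws.com',
    SourceArn=rule['RuleArn'],
)
```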
A sample query against the loaded table, computing the average number of incidents per hour of day:

```sql
-- cast to numeric so the division is not truncated to an integer
select count(*)::numeric / count(distinct date) as crimes,
       date_part('hour', datetime) as hour_of_day
from sf_crime
group by hour_of_day
order by hour_of_day;
```
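To pull the hourly profile back into Python, the same engine can run the query with pandas; a sketch assuming the database is reachable from where the code runs:

```python
import pandas as pd
from sqlalchemy import create_engine

# placeholder connection string; use the same config-driven URL as the ETL
engine = create_engine('postgresql://postgres:[email protected]:5432/crime')

query = """
select count(*)::numeric / count(distinct date) as crimes,
       date_part('hour', datetime) as hour_of_day
from sf_crime
group by hour_of_day
order by hour_of_day;
"""

hourly = pd.read_sql(query, engine)
print(hourly)
```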