-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathstop_and_search.py
executable file
·112 lines (94 loc) · 3.67 KB
/
stop_and_search.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
# -*- coding: utf-8 -*-
"""takehome.ipynb
Automatically generated by Colab.
Original file is located at
https://colab.research.google.com/drive/1hlfcHYkCxovW_gFSu7J44l79D2NyU28U
"""
""" The latest data is available through the public Police Data API
https
://data.police.uk/docs/method/stops-force/
Utilise the API documentation to fetch all historical search data relating to a force. This API
does not require authentication.
The example URL below only fetches 1-month of data:
force?force=avon-and-somerset&date=2023-01
Notes: Historical data only goes back to a certain time.
Your task is to complete following steps:
https://data.police.uk/api/stops
• Write a script in Python to programmatically pull down all stop and search data from the
API for the Metropolitan Police Service (i.e. just one force).
• Combine all data into a Pandas dataframe.
• Clean and format the data as you see appropriate.
• Write the data out to a csv file.
• Write a process that pulls the latest data at a given time each day and updates the existing
csv file.
• Prepare a short explanation of how you would store the data in a relational database if
required – think about the structure of the data and what schema you would apply.
• Suppose entries were updated or deleted in the API – how would your system handle
this? What changes to their API would you recommend to help make the update process
easier?
• Prepare an example of a SQL statement that creates a view to return an aggregate count
of crime type by date.
• Use docker to containerise your script to be deployed in any cloud service as a data
fetching service.
Other things to consider:
• If there are other technologies that you want to include in your solution, do feel free –
don’t feel
"""
import requests
import pandas as pd
import datetime
import os
import schedule
import time
FORCE_ID = 'metropolitan'
BASE_URL = 'https://data.police.uk/api/stops-force'
csv_file= 'met_data.csv'
def fetch_data(force_id, date):
url = f'{BASE_URL}?force={force_id}&date={date}'
response = requests.get(url)
if response.status_code == 200:
return response.json()
else:
return []
def fetch_historical_data(force_id):
data = []
current_date = datetime.date.today()
start_date = datetime.date(2024, 1, 1)
while start_date < current_date:
formatted_date = start_date.strftime('%Y-%m')
print(f'Fetching data for {formatted_date}')
monthly_data = fetch_data(force_id, formatted_date)
data.extend(monthly_data)
if start_date.month == 12:
start_date = datetime.date(start_date.year + 1, 1, 1)
else:
start_date = datetime.date(start_date.year, start_date.month + 1, 1)
return data
def clean_data(df):
df.columns = df.columns.str.lower().str.replace(' ', '_')
df['datetime'] = pd.to_datetime(df['datetime'], errors='coerce')
df.fillna('', inplace=True)
df = df.dropna(subset=['datetime'])
return df
def daily_update():
current_date = datetime.date.today()
formatted_date = current_date.strftime('%Y-%m')
new_data = fetch_data(FORCE_ID, formatted_date)
if new_data:
new_df = pd.DataFrame(new_data)
new_df_clean = clean_data(new_df)
new_df_clean.to_csv(csv_file, mode = 'a', header=False, index = False)
print(f'Appended new data for {formatted_date}')
else:
print('No new data found')
if __name__ == '__main__':
met_data = fetch_historical_data(FORCE_ID)
df = pd.DataFrame(met_data)
df.to_csv(csv_file, index=False)
print(f'Data saved to {csv_file}')
df_clean = clean_data(df)
df_clean.to_csv(csv_file, index=False)
schedule.every().day.at('00:00').do(daily_update)
while True:
schedule.run_pending()
time.sleep(60)