This Python script scrapes air quality data from a website and saves it to an Excel sheet.
Reading Websites and IDs: The script reads a list of sites and details such as State, City, Site_ID from an Excel sheet ("sites_all.xlsx"). This list can be modified to include different sites.
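As a rough illustration of this step, the sketch below shows one way the site list could be loaded with pandas. The helper names `site_records` and `load_sites` are hypothetical and are not taken from air_quality.py; only the column names State, City and Site_ID come from the description above.

```python
import pandas as pd

# Column names as described above; everything else here is an assumption.
REQUIRED_COLUMNS = ["State", "City", "Site_ID"]


def site_records(df):
    """Return one dict per site, keeping only the columns the scraper needs."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"sites sheet is missing columns: {missing}")
    # Drop rows where any of the required values is empty.
    return df[REQUIRED_COLUMNS].dropna().to_dict("records")


def load_sites(path="sites_all.xlsx"):
    """Read the sites sheet from disk (hypothetical helper).

    pandas relies on the openpyxl package to read .xlsx files.
    """
    return site_records(pd.read_excel(path))
```

Calling `load_sites()` in the directory that contains sites_all.xlsx would then give a list of dictionaries, one per site, that the rest of the script can loop over.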
For each site, the script generates an encoded query based on the from_date, to_date, state, city, station_id using the function create_encoded_query.
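The exact encoding used by create_encoded_query is not described here, so the following is only a minimal sketch under the assumption that the query is a JSON object encoded as base64. The function name `encode_query_sketch` and the JSON key names are hypothetical; the real function in air_quality.py may use different keys or a different encoding.

```python
import base64
import json


def encode_query_sketch(from_date, to_date, state, city, station_id):
    """Pack the query fields into JSON and base64-encode the result.

    Assumption: the site accepts a base64-encoded JSON payload. The key
    names below are placeholders, not the ones the real script uses.
    """
    query = {
        "from_date": from_date,
        "to_date": to_date,
        "state": state,
        "city": city,
        "station_id": station_id,
    }
    raw = json.dumps(query).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_query_sketch(encoded):
    """Reverse of encode_query_sketch, handy for checking what was sent."""
    return json.loads(base64.b64decode(encoded))
```

Decoding an encoded query with `decode_query_sketch` returns the original fields, which is a quick way to confirm that dates and station IDs were filled in as intended before any request is sent.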
The script retrieves HTML content from the website using the encoded queries in the function get_info.
This is perhaps the most important step for making sure the script actually runs.
Click on this link to access the page.
Enter the correct captcha, then right-click the page and choose Inspect to open the inspect panel.
Next, select the Network tab, click the ccr entry, and choose Headers.
Copy the value of the Cookie header and paste it into the headers in the script:
headers["Cookie"] = '#paste copied element here and DONT remove the single quotations'
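Because a missing or stale cookie is an easy mistake to make, a small check like the one below can catch it before any request is sent. This is a sketch only: `build_headers` is a hypothetical helper that is not part of air_quality.py, and it assumes the cookie is passed in as a plain string.

```python
def build_headers(cookie):
    """Return a headers dict containing the pasted Cookie value.

    Raises ValueError if the placeholder text was left in place or the
    value is empty, since the request would then fail on the site.
    """
    cookie = cookie.strip()
    if not cookie or cookie.startswith("#paste"):
        raise ValueError("Paste the Cookie value copied from the browser first.")
    return {"Cookie": cookie}
```

The returned dictionary can be merged into the script's existing `headers` with `headers.update(build_headers(my_cookie))`, where `my_cookie` is the string copied from the browser.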
The script extracts the relevant air quality data from the HTML content using the function printing_lines.
The script saves the extracted data to a new Excel sheet ("Final_AQI.xlsx").
This script requires the following Python libraries: pandas, requests, and certifi. You can install them using:
pip install pandas
pip install requests
pip install certifi
Download the following files:
air_quality.py: The Python script that performs the web scraping.
sites_all.xlsx: The Excel sheet containing the list of all sites.
Final_AQI.xlsx: The Excel sheet storing the final output.
Open a terminal and navigate to the directory containing the downloaded files. Run the following command:
python air_quality.py
The script will update the Excel sheet ("Final_AQI.xlsx") in the same directory. This sheet will contain the extracted air quality data for all specified sites.