
qualR: An R package to download São Paulo and Rio de Janeiro air pollution data


The goal of qualR is to facilitate downloading air pollutant and meteorological data from the CETESB QUALAR system for São Paulo and from the MonitorAr Program for Rio de Janeiro. This information is often used for air pollution data analysis and for air quality model evaluation.

qualR functions return complete data frames (missing hours are padded out with NA), with a date column in POSIXct format for temporal aggregation and for compatibility with the openair package. qualR helps air pollution research by producing ready-to-use datasets, by facilitating exploratory data analysis, and by fostering reproducibility.
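To illustrate what "padded out with NA" means, here is a minimal base R sketch. The toy data frame and its `date` and `o3` columns are hypothetical, and this is a conceptual illustration, not qualR's internal implementation: it builds the full hourly sequence and left-joins the observations onto it.

```r
# Hypothetical hourly ozone series with the 02:00 and 03:00 observations missing
obs <- data.frame(
  date = as.POSIXct(c("2020-01-01 00:00", "2020-01-01 01:00", "2020-01-01 04:00"),
                    format = "%Y-%m-%d %H:%M", tz = "UTC"),
  o3 = c(20, 18, 25)
)

# Full hourly sequence covering the observed period
all_hours <- data.frame(date = seq(min(obs$date), max(obs$date), by = "hour"))

# Left join on date: hours without an observation get NA
padded <- merge(all_hours, obs, by = "date", all.x = TRUE)
padded
```

The result has one row per hour, so hourly data frames from different stations or parameters line up row by row.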

Installation

You can install it directly using:

install.packages('qualR', repos = c('https://ropensci.r-universe.dev', 'https://cloud.r-project.org'))

How to use

qualR has the following functions:

  • cetesb_retrieve_param: Download a list of different parameters from one air quality station (AQS) from the CETESB QUALAR system.
  • cetesb_retrieve_pol: Download criteria pollutants from one AQS from the CETESB QUALAR system.
  • cetesb_retrieve_met: Download meteorological parameters from one AQS from the CETESB QUALAR system.
  • cetesb_retrieve_met_pol: Download meteorological parameters and criteria pollutants from one AQS from the CETESB QUALAR system.
  • monitor_ar_retrieve_param: Download a list of different parameters from the MonitorAr - Rio program.
  • monitor_ar_retrieve_pol: Download criteria pollutants from one AQS from the MonitorAr - Rio program.
  • monitor_ar_retrieve_met: Download meteorological parameters from one AQS from the MonitorAr - Rio program.
  • monitor_ar_retrieve_met_pol: Download meteorological parameters and criteria pollutants from one AQS from the MonitorAr - Rio program.

These functions return a data frame, with a date column in POSIXct, which allows you to use other packages for data analysis, such as openair.
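Because the date column is POSIXct, temporal aggregation also works in base R. A minimal sketch, using a hypothetical data frame with `date` and `o3` columns in place of a real qualR result, computes daily means while ignoring a missing hour:

```r
# Hypothetical 48 hours of data shaped like a qualR result: a POSIXct date plus one parameter
hourly <- data.frame(
  date = seq(as.POSIXct("2020-01-01 00:00", tz = "UTC"), by = "hour", length.out = 48),
  o3 = c(rep(10, 24), rep(30, 24))
)
hourly$o3[5] <- NA  # one missing hour

# Group by calendar day, using the same time zone as the date column
hourly$day <- as.Date(hourly$date, tz = "UTC")

# The formula method of aggregate() drops rows with NA by default (na.action = na.omit)
daily <- aggregate(o3 ~ day, data = hourly, FUN = mean)
daily
```

For more elaborate summaries (e.g. data capture thresholds), openair provides dedicated functions that work directly on this kind of data frame.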

To download data for São Paulo, you first need an account in the CETESB QUALAR system; you can sign up on the QUALAR website. MonitorAr doesn't require an account.

Then you need to know the AQS and parameter codes (i.e. the pollutant or meteorological variable) to use these functions. Currently, the cetesb_retrieve family of functions also accepts the parameter abbreviation (e.g. "O3" instead of 63) and the complete AQS name (e.g. "Pinheiros" instead of 99) as inputs. To look up these codes, check the following datasets:

library(qualR)

# To see all CETESB AQS names with their codes and lat lon
cetesb_aqs

# To see all CETESB AQS parameters with their codes and abbreviation
cetesb_param

# To see all MonitorAr-Rio AQS names with their codes and lat and lon
monitor_ar_aqs

# To see all MonitorAr-Rio parameters with their codes
monitor_ar_param
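If you want to translate between a station name and its code yourself, a lookup on the dataset is enough. The sketch below uses a small hypothetical stand-in for cetesb_aqs containing only the two stations used in this README (Pinheiros = 99, Ibirapuera = 83). The `code` column is used later in this README; the `name` column and the helper `aqs_code_from_name()` are assumptions for illustration.

```r
# Hypothetical stand-in for cetesb_aqs; the real dataset has more stations and columns
aqs_table <- data.frame(
  name = c("Ibirapuera", "Pinheiros"),
  code = c(83, 99)
)

# Hypothetical helper: return the code for an exact station name, or stop if not found
aqs_code_from_name <- function(station, table = aqs_table) {
  code <- table$code[table$name == station]
  if (length(code) != 1) stop("Station not found or ambiguous: ", station)
  code
}

aqs_code_from_name("Pinheiros")
```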

Using qualR to download CETESB data

Downloading multiple parameters from one AQS

If you want to download ozone data from the Pinheiros AQS from January 1st to January 7th, 2020, you can do:

library(qualR)

cetesb_aqs # To check Pinheiros aqs_code
cetesb_param # To check Ozone pol_code

my_user_name <- "[email protected]"
my_password <- "drowssap"
pin_code <- 99
start_date <- "01/01/2020"
end_date <- "07/01/2020"

pin_o3 <- cetesb_retrieve_param(my_user_name,
                              my_password,
                              "O3",
                              pin_code, # It could also be "Pinheiros"
                              start_date,
                              end_date)

(Note: the previous cetesb_retrieve function is now deprecated; use cetesb_retrieve_param instead.)

Maybe you just need a couple of parameters. For example, if you want to download ozone and wind speed and direction from Pinheiros AQS, you can do the following:

library(qualR)

cetesb_aqs # To check Pinheiros aqs_code

my_user_name <- "[email protected]"
my_password <- "drowssap"
start_date <- "01/01/2020"
end_date <- "07/01/2020"

cetesb_param # To check ozone, wind speed and wind direction abbreviations

pin_o3_ws_wd <- cetesb_retrieve_param(my_user_name,
                                    my_password,
                                    c("O3", "VV", "VD"),
                                    "Pinheiros",
                                    start_date,
                                    end_date)

Downloading criteria pollutants from one AQS

We use cetesb_retrieve_pol. This function already has the parameter codes for O3, NO, NO2, NOX, CO, PM10 and PM2.5, so it doesn't require pol_code, only aqs_code. CO is in ppm, NOX is in ppb, and the other pollutants are in μg/m3. In this example, we download all these pollutants from the Pinheiros AQS.
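If you need the μg/m3 values as mixing ratios (for example, to compare with model output in ppb), a common conversion assumes 25 °C and 1 atm, where one mole of an ideal gas occupies about 24.45 L. The sketch below applies that assumption; the helper name is hypothetical, and you should check which reference conditions apply to your data before using it.

```r
# Hypothetical helper: convert a mass concentration (ug/m3) to a mixing ratio (ppb),
# assuming 25 degC and 1 atm (molar volume of about 24.45 L/mol)
ugm3_to_ppb <- function(conc_ugm3, molar_mass) {
  conc_ugm3 * 24.45 / molar_mass
}

# Ozone has a molar mass of about 48.00 g/mol
o3_ppb <- ugm3_to_ppb(c(48, 96), molar_mass = 48.00)
o3_ppb
```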

library(qualR)

cetesb_aqs # To check Pinheiros aqs_code

my_user_name <- "[email protected]"
my_password <- "drowssap"
pin_code <- 99
start_date <- "01/01/2020"
end_date <- "07/01/2020"

pin_pol <- cetesb_retrieve_pol(my_user_name,
                             my_password,
                             pin_code, # It could also be "Pinheiros"
                             start_date,
                             end_date)

Downloading meteorological parameters from one AQS

We use cetesb_retrieve_met. This function already has the parameter codes for temperature (°C), relative humidity (%), wind speed (m/s), wind direction (°), and pressure (hPa), so it doesn't require pol_code, only aqs_code. In this example, we download all these parameters from the Pinheiros AQS. Remember that CETESB uses the values 777 and 888 in wind direction to indicate calm wind and no data; these values are kept in the final data frame.
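Since the 777 and 888 flags are not real directions, you will usually want to remove them before computing statistics or wind roses. A minimal sketch on a hypothetical data frame with a `wd` column; it assumes, following the order in the text above, that 777 marks calm wind and 888 marks missing data, and keeps the calm information in a separate column:

```r
# Hypothetical wind direction values (degrees) including the CETESB flags
met <- data.frame(
  date = seq(as.POSIXct("2020-01-01 00:00", tz = "UTC"), by = "hour", length.out = 5),
  wd = c(120, 777, 245, 888, 90)
)

# Assumption: 777 = calm wind, 888 = no data; record calm hours before overwriting
met$calm <- met$wd == 777

# Replace both flags with NA so they are not treated as directions
met$wd[met$wd %in% c(777, 888)] <- NA
met
```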

library(qualR)

cetesb_aqs # To check Pinheiros aqs_code

my_user_name <- "[email protected]"
my_password <- "drowssap"
pin_code <- 99
start_date <- "01/01/2020"
end_date <- "07/01/2020"

pin_met <- cetesb_retrieve_met(my_user_name,
                             my_password,
                             pin_code, # It could also be "Pinheiros"
                             start_date,
                             end_date)

Downloading meteorological parameters and criteria pollutants from one AQS

This is equivalent to running cetesb_retrieve_met and cetesb_retrieve_pol at the same time, and it returns all the data in one data frame.
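Conceptually, combining both outputs amounts to joining them on the shared date column. A sketch with two hypothetical toy data frames (the `tc` and `o3` column names are assumptions), not qualR's internal code:

```r
hours <- seq(as.POSIXct("2020-01-01 00:00", tz = "UTC"), by = "hour", length.out = 3)

# Hypothetical outputs of a meteorological and a pollutant download
met <- data.frame(date = hours, tc = c(21.5, 21.0, 20.4))
pol <- data.frame(date = hours, o3 = c(15, NA, 12))

# Full outer join on date; both inputs already share the same hourly sequence
met_pol <- merge(met, pol, by = "date", all = TRUE)
met_pol
```

Because qualR pads every result to a complete hourly sequence, the join neither drops nor duplicates hours.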

library(qualR)

cetesb_aqs # To check Pinheiros aqs_code

my_user_name <- "[email protected]"
my_password <- "drowssap"
pin_code <- 99
start_date <- "01/01/2020"
end_date <- "07/01/2020"

pin_all <- cetesb_retrieve_met_pol(my_user_name,
                                my_password,
                                pin_code,
                                start_date,
                                end_date)

Some other examples

To .csv

Now, we want to download all the information from the Ibirapuera AQS and then export it as a .csv file to be read by other software. qualR functions have a to_csv argument, which is FALSE by default. So, if you want to export the data to csv, you just need to set it to TRUE.

The csv file has the following file name: {aqs_name}_{pol}_{start_date}_{end_date}.csv. For the functions that retrieve more than one parameter, the file name is {aqs_name}_{TYPE}_{start_date}_{end_date}.csv, where TYPE is "POL", "MET", or "MET_POL".

library(qualR)

cetesb_aqs # To check Ibirapuera aqs_code

my_user_name <- "[email protected]"
my_password <- "drowssap"
ibi_code <- 83
start_date <- "01/01/2020"
end_date <- "07/01/2020"

ibi_all <- cetesb_retrieve_met_pol(my_user_name,
                                my_password,
                                ibi_code,
                                start_date,
                                end_date,
                                to_csv = TRUE)

In this case, we will get the file Ibirapuera_MET_POL_01-01-2020_07-01-2020.csv.
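If you later want to read the exported file back, you can rebuild its name from the same inputs. A sketch with a hypothetical helper; it assumes, as the example file name above suggests, that the "/" in dd/mm/yyyy dates is replaced by "-":

```r
# Hypothetical helper following {aqs_name}_{TYPE}_{start_date}_{end_date}.csv,
# with "/" in dd/mm/yyyy dates replaced by "-" so the name is a valid path
csv_name <- function(aqs_name, type, start_date, end_date) {
  paste0(aqs_name, "_", type, "_",
         gsub("/", "-", start_date, fixed = TRUE), "_",
         gsub("/", "-", end_date, fixed = TRUE), ".csv")
}

csv_name("Ibirapuera", "MET_POL", "01/01/2020", "07/01/2020")
# "Ibirapuera_MET_POL_01-01-2020_07-01-2020.csv"
```

The result can then be passed to read.csv() in another session.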

A variable from all CETESB AQS

Sometimes, to check the spatial distribution of air pollutants, you need to download a pollutant from all the AQS. In this example, we download a year of Ozone from all CETESB AQS.

library(qualR)

my_user_name <- "[email protected]"
my_password <- "drowssap"
o3_code <- 63
start_date <- "01/01/2019"
end_date <- "31/12/2019"

# all_o3 is a list with one data frame per AQS
all_o3 <- lapply(cetesb_aqs$code, cetesb_retrieve_param,
                 username = my_user_name,
                 password = my_password,
                 parameters = "O3",
                 start_date = start_date,
                 end_date = end_date)

# If you want to export everything to a single csv file
all_o3_csv <- do.call(rbind, all_o3)
write.table(all_o3_csv, "all_o3_csv.csv", sep = ",", row.names = FALSE)
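When looping over many stations, a single failed request stops the whole lapply() call. One way to guard against this is to wrap each download in tryCatch() and tag each result with its station code before stacking (useful if the returned data frames do not already identify the station). The sketch below uses a hypothetical fake_download() in place of cetesb_retrieve_param so it runs offline; station 2 fails on purpose.

```r
# Hypothetical stand-in for a download function; station 2 fails on purpose
fake_download <- function(code) {
  if (code == 2) stop("server error")
  data.frame(date = as.POSIXct("2019-01-01 01:00", tz = "UTC"), o3 = 10 * code)
}

codes <- c(1, 2, 3)

# Catch errors per station so one failure does not stop the loop,
# and tag each successful result with its station code
results <- lapply(codes, function(code) {
  tryCatch({
    out <- fake_download(code)
    out$aqs_code <- code
    out
  }, error = function(e) {
    message("Station ", code, " failed: ", conditionMessage(e))
    NULL
  })
})

# Drop failed stations (NULL entries), then stack the rest into one data frame
combined <- do.call(rbind, Filter(Negate(is.null), results))
combined
```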

AQS latitudes and longitudes

Maybe you need to make a map of the AQS you used in your study. The cetesb_aqs dataset now includes the latitude and longitude of each AQS in degrees:

library(qualR)

# To see all the AQS latitude and longitude
cetesb_aqs

Here are some examples to make some plots:

A better way to save your credentials

It is not safe to write your user name and password in your code, and it is even more dangerous when you share your scripts. For this reason, it is better (and safer) practice to store your credentials (i.e. user name and password) as environment variables.

An easy way to do this is with the usethis package. First, install it:

install.packages("usethis")

Then use the function edit_r_environ(). It will open a file called .Renviron, where you'll define your user name and password.

library(usethis)

edit_r_environ()

In the .Renviron file, define your credentials:

QUALAR_USER="[email protected]"
QUALAR_PASS="drowssap"

Save the file; the changes will take effect after you restart R. To retrieve the values, use Sys.getenv().
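Sys.getenv() returns an empty string when a variable is not set (for example, if R was not restarted after editing .Renviron), so a small check avoids sending empty credentials. The helper get_credential() below is hypothetical, and Sys.setenv() is used only to simulate what .Renviron would normally provide.

```r
# Hypothetical helper: read a credential and stop early if it is missing
get_credential <- function(var) {
  value <- Sys.getenv(var)  # returns "" when the variable is not set
  if (!nzchar(value)) {
    stop(var, " is not set; add it to .Renviron and restart R")
  }
  value
}

# Simulate a value that would normally be loaded from .Renviron at startup
Sys.setenv(QUALAR_USER = "example_user")
get_credential("QUALAR_USER")
```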

So now, if we replicate the previous example, Downloading multiple parameters from one AQS, it will look something like this:

library(qualR)

cetesb_aqs # To check Pinheiros aqs_code
cetesb_param # To check Ozone pol_code

o3_code <- 63
pin_code <- 99
start_date <- "01/01/2020"
end_date <- "07/01/2020"

pin_o3 <- cetesb_retrieve_param(Sys.getenv("QUALAR_USER"), # calling your user
                              Sys.getenv("QUALAR_PASS"),  # calling your password
                              o3_code,
                              pin_code,
                              start_date,
                              end_date)

This idea came from this awesome post.

Using qualR to download MonitorAr - Rio data

Downloading one parameter from one AQS

Here we will download ozone data from the Iraja AQS for all of February 2019 using the monitor_ar_retrieve_param function.

library(qualR)
monitor_ar_aqs # To check Iraja AQS code
monitor_ar_param # To check Ozone code

date_start <- "01/02/2019"
date_end <- "01/03/2019"
aqs_code <- "IR"
param <- "O3"

ir_o3 <- monitor_ar_retrieve_param(date_start, date_end, aqs_code, param)

Downloading multiple parameters from one AQS

monitor_ar_retrieve_param is similar to cetesb_retrieve_param, so it allows us to download multiple parameters. Here, we will download ozone, nitric oxide, nitrogen dioxide, wind speed, and wind direction.

library(qualR)
monitor_ar_aqs # To check Iraja AQS code
monitor_ar_param # To check parameter codes

date_start <- "01/02/2019"
date_end <- "01/03/2019"
aqs_code <- "IR"
params <- c("O3", "NO", "NO2", "Dir_Vento", "Vel_Vento")


ir_data <- monitor_ar_retrieve_param(date_start, date_end, aqs_code, params)
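In both MonitorAr examples the dates are written as dd/mm/yyyy, so "01/02/2019" is 1 February 2019, not 2 January. A quick base R check of the intended period:

```r
date_start <- "01/02/2019"
date_end <- "01/03/2019"

# Parse as day/month/year to confirm the intended period
period <- as.Date(c(date_start, date_end), format = "%d/%m/%Y")
period  # "2019-02-01" "2019-03-01"

# Number of days between the two dates (February 2019 has 28 days)
as.numeric(diff(period))
```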

Caveat emptor

  • The CETESB QUALAR system describes midnight as 24:00, and the first hour of each day is 1:00. qualR converts this to 00-23 hour notation; for that reason, you'll get NA at 00:00 of your first downloaded day. So, consider downloading one day before your study period.
  • To pad out missing dates with NA, qualR "tricks" the date information and assumes it is in UTC (when in reality it is in "America/Sao_Paulo" time). This avoids problems when merging data frames and with daylight saving time (DST). Beware of this when dealing with study periods that include DST. It is always a good idea to double-check by retrieving the suspicious date from the CETESB QUALAR system.
  • Take into account that in CETESB data, the hourly average is the mean up to the hour. That is, a concentration value for 22:00 is the mean from 21:01 to 22:00.
  • Consider the previous three points if you need to convert from local time to UTC.
  • Currently, MonitorAr only has data until March 2021.
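The first two points can be sketched in base R. This is an illustration under stated assumptions, not qualR's actual code: the raw day and hour strings are hypothetical, hours are assumed to be whole hours written as "HH:00", and the time zone step assumes the stored clock times are São Paulo local time labeled as UTC. It also shows why 00:00 of the first day is empty: that value would come from the 24:00 record of the previous day.

```r
# Hypothetical raw records: hours run from 01:00 to 24:00
raw <- data.frame(day = c("01/01/2020", "01/01/2020"),
                  hour = c("23:00", "24:00"))

# Parse the day, then add the hour as seconds; "24:00" rolls over to 00:00 of the next day
hour_num <- as.numeric(sub(":00", "", raw$hour, fixed = TRUE))
local_as_utc <- as.POSIXct(raw$day, format = "%d/%m/%Y", tz = "UTC") + hour_num * 3600
format(local_as_utc, "%Y-%m-%d %H:%M", tz = "UTC")
# "2020-01-01 23:00" "2020-01-02 00:00"

# These clock times are Sao Paulo local time carrying a UTC label.
# Reinterpret the same clock times in America/Sao_Paulo...
clock <- format(local_as_utc, "%Y-%m-%d %H:%M:%S", tz = "UTC")
local_time <- as.POSIXct(clock, tz = "America/Sao_Paulo")

# ...and express those instants in real UTC (3 hours later in January 2020, no DST)
format(local_time, "%Y-%m-%d %H:%M", tz = "UTC")
```

Relabeling the clock time (rather than just changing the tz attribute, which keeps the instant) is what moves the data from the "tricked" UTC to true UTC; around DST transitions some local clock times do not exist or repeat, which is why those periods deserve the extra check mentioned above.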

Code of Conduct

Please note that this package is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Acknowledgments

Thanks to CETESB and to the MonitorAr Program for making this atmospheric data public.

This work was supported by the Wellcome Trust [grant number 216087/Z/19/Z]. We acknowledge the programs CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior), CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico), and FAPESP (2016/18438-0 - Fundação de Amparo à Pesquisa do Estado de São Paulo).

Finally, we want to thank the LAPAT-IAG team for testing and helping to improve qualR.

Last but not least

I hope this package will help you with your research!