SynthFirm Input Data Generation

This folder includes the scripts used to generate SynthFirm inputs from raw data sources at various geographic resolutions.

Contact: Xiaodan Xu, Ph.D. ([email protected])

Updates (May 8, 2023): documented firm and employment data generation

Updates (May 11, 2023): documented CFS processing, supplier selection, I-O projection, and map generation

Updates (Aug 08, 2023): documented firm fleet generation

Updates (Oct 14, 2024): updated documentation to reflect V2 changes and improve readability

a. national-level input generation (regardless of study area)

step a.1 - processing CFS zone in cfs_to_faf_map_generation.R:

step a.2 - generate initial employment count per firm by firm size at national level in employment_count_by_firm.R:

  • Require CENSUS employment data by firm size (https://www.census.gov/programs-surveys/susb.html)
  • Produce average employment per firm by firm size group 'SynthFirm_parameters/employment_by_firm_size_gapfill.csv'
  • Produce average employment per firm by firm size group and industries 'SynthFirm_parameters/employment_by_firm_size_naics.csv'
  • Although IPF fitting means these two tables do not directly determine the synthetic firm output, they are needed to initialize firm synthesis and provide initial employment values.
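The averaging in step a.2 can be sketched as follows. This is a minimal, dependency-free illustration and not the logic of employment_count_by_firm.R itself; the record layout, size-group labels, and numbers are hypothetical.

```python
from collections import defaultdict

# Hypothetical SUSB-style records: (firm size group, number of firms, employment).
# The values are made up for illustration only.
records = [
    ("1-19", 100, 500),
    ("1-19", 50, 400),
    ("20-99", 10, 450),
]


def average_employment_per_firm(rows):
    """Return {size_group: total employment / total firms}."""
    firms = defaultdict(int)
    employment = defaultdict(int)
    for group, n_firms, n_emp in rows:
        firms[group] += n_firms
        employment[group] += n_emp
    # Skip groups with zero firms to avoid division by zero.
    return {g: employment[g] / firms[g] for g in firms if firms[g] > 0}


avg_emp = average_employment_per_firm(records)
```

The same aggregation, keyed on (size group, industry) instead of size group alone, would give the by-industry table.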

step a.3 - project BEA I-O table for baseline year in BEA_IO_processor.py:

step a.4 - collect commodity flow distributions from CFS data in CFS_distribution_generator.py:

  • Require CFS 2017 public use data as input (cfs-2017-puf-csv.csv): https://www.census.gov/data/datasets/2017/econ/cfs/historical-datasets.html
  • Require several predefined lookup tables:
    • industry and commodity lookup table (from MAG model): corresp_naics6_n6io_sctg_revised.csv
    • CFS and FAF zone lookup from FAF5 document: CFS_FAF_LOOKUP.csv (downloaded from FAF5 website under 'Related Information': https://faf.ornl.gov/faf5/)
    • commodity SCTG group definition: SCTG_Groups_revised.csv (defined by research team)
  • Produce the following 5 outputs:
    • Unit cost by commodity: SynthFirm_parameters/data_unitcost_cfs2017.csv
    • Unit cost by CFS zone and commodity: SynthFirm_parameters/data_unitcost_by_zone_cfs2017.csv
    • Shipment size with pre-defined percentile cut-off criteria: SynthFirm_parameters/max_load_per_shipment_{X}percent.csv, where {X} is the pre-defined percentile used to select the max load by SCTG
    • Production/consumption value allocation factor by FAF zone: SynthFirm_parameters/producer_value_fraction_by_faf.csv AND SynthFirm_parameters/consumer_value_fraction_by_faf.csv
    • Seasonal load allocation factor by SCTG group for national freight trip generation (future work): seasonal_allocation_factor_by_sctg.csv
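Two of the outputs above, unit cost by commodity and the percentile-based max load, can be sketched as below. This is an assumption-laden illustration rather than the code in CFS_distribution_generator.py: the record layout, the use of the survey weight factor for unit cost only, the nearest-rank percentile rule, and the units are all hypothetical choices.

```python
import math
from collections import defaultdict

# Hypothetical shipment records:
# (SCTG code, shipment value, shipment weight, survey expansion factor).
shipments = [
    ("01", 1000.0, 2000.0, 10.0),
    ("01", 3000.0, 6000.0, 10.0),
    ("02", 500.0, 100.0, 5.0),
]


def unit_cost_by_commodity(rows):
    """Weighted total value divided by weighted total weight, per SCTG."""
    value = defaultdict(float)
    weight = defaultdict(float)
    for sctg, val, wgt, factor in rows:
        value[sctg] += val * factor
        weight[sctg] += wgt * factor
    return {s: value[s] / weight[s] for s in value if weight[s] > 0}


def max_load_by_commodity(rows, percent):
    """Nearest-rank percentile of shipment weight per SCTG (unweighted)."""
    loads = defaultdict(list)
    for sctg, _val, wgt, _factor in rows:
        loads[sctg].append(wgt)
    result = {}
    for sctg, values in loads.items():
        values.sort()
        rank = max(1, math.ceil(percent / 100 * len(values)))
        result[sctg] = values[rank - 1]
    return result


unit_cost = unit_cost_by_commodity(shipments)
max_load_50 = max_load_by_commodity(shipments, 50)
max_load_100 = max_load_by_commodity(shipments, 100)
```

Grouping on (CFS zone, SCTG) instead of SCTG alone would give the by-zone variant of the unit cost table.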

step a.5 - estimate supplier selection model parameter in supplier_selection_model_estimation.py:

  • Require CFS 2017 imputed data as input (CFS2017_national_forML_short.csv)
  • Require unit cost estimation by zone and SCTG from step a.4 (SynthFirm_parameters/data_unitcost_by_zone_cfs2017.csv)
  • Require several predefined lookup tables:
    • CFS and FAF zone lookup from FAF5 document: CFS_FAF_LOOKUP.csv (downloaded from FAF5 website under 'Related Information': https://faf.ornl.gov/faf5/)
    • commodity SCTG group definition: SCTG_Groups_revised.csv (defined by research team)
  • Estimate a multinomial logit (MNL) model for supplier selection using CFS O-D data
  • Produce the following outputs:
    • Avg. routed distance matrix from CFS observation: CFS2017_routed_distance_matrix.csv
    • Estimated model parameters: supplier_selection_national_sctg{sctg_group_id}.html
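For readers unfamiliar with MNL, the sketch below shows how estimated parameters turn into supplier choice probabilities. The utility specification (unit cost and routed distance only) and the coefficient values are hypothetical and are not taken from supplier_selection_model_estimation.py.

```python
import math


def mnl_probabilities(utilities):
    """Multinomial logit choice probabilities: exp(V_i) / sum_j exp(V_j)."""
    v_max = max(utilities)  # subtract the max for numerical stability
    exp_v = [math.exp(v - v_max) for v in utilities]
    total = sum(exp_v)
    return [e / total for e in exp_v]


# Hypothetical coefficients; the estimated values live in the model output files.
BETA_UNIT_COST = -0.5
BETA_DISTANCE = -0.001

# Hypothetical candidate supplier zones: (unit cost, routed distance).
candidates = [(2.0, 100.0), (2.0, 1000.0), (4.0, 100.0)]

utilities = [BETA_UNIT_COST * cost + BETA_DISTANCE * dist for cost, dist in candidates]
probs = mnl_probabilities(utilities)
```

With negative coefficients on cost and distance, the cheapest and closest candidate receives the highest probability.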

step a.6 - generate demand forecast inputs in demand_forecast_input_generation.py:

  • Require FAF5 and SCTG group definition (SCTG_Groups_revised.csv)
  • Require CFS and FAF zone lookup from FAF5 document: CFS_FAF_LOOKUP.csv (downloaded from FAF5 website under 'Related Information': https://faf.ornl.gov/faf5/)
  • Generate total production and consumption by FAF zone, SCTG and forecast year:
    • SynthFirm_parameters/total_commodity_production_{forecast_year}.csv
    • SynthFirm_parameters/total_commodity_consumption_{forecast_year}.csv
  • Generate unit cost by year from FAF5
    • SynthFirm_parameters/data_unitcost_by_zone_faf{forecast_year}.csv
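The production and consumption totals can be read as two aggregations of the same origin-destination flows, one keyed on the origin zone and one on the destination zone. The sketch below illustrates that idea for a single forecast year; the record layout and values are hypothetical, not the FAF5 schema or the logic of demand_forecast_input_generation.py.

```python
from collections import defaultdict

# Hypothetical O-D flows for one forecast year:
# (origin FAF zone, destination FAF zone, SCTG code, tonnage).
flows = [
    (11, 12, "01", 5.0),
    (11, 19, "01", 3.0),
    (12, 11, "01", 2.0),
]


def totals_by_zone(rows):
    """Return (production, consumption), each keyed by (zone, SCTG)."""
    production = defaultdict(float)
    consumption = defaultdict(float)
    for orig, dest, sctg, tons in rows:
        production[(orig, sctg)] += tons   # shipped out of the origin zone
        consumption[(dest, sctg)] += tons  # received by the destination zone
    return dict(production), dict(consumption)


production, consumption = totals_by_zone(flows)
```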

b. study area input generation (need to specify study region)

step b.1 - collecting data from Census API in Census.R:

  • No input data required (need Census API key)
  • Produce employment count by census block group and industry ({state}_naics.csv)
  • Produce census block group shapefile within selected states ({state}_bg.geojson)

step b.2 - collect CBP employment data using FAFZone.R:

  • Require CFS to FAF crosswalk map from step a.1 (CFS2017_areas.geojson) and CFS zone to FAF zone lookup table (CFS_FAF_LOOKUP.csv, downloaded from FAF5 website under 'Related Information': https://faf.ornl.gov/faf5/) as input
  • Require CBP complete county file from census.gov (sample: https://www.census.gov/data/datasets/2017/econ/cbp/2017-cbp.html)
  • Produce county and FAF zone lookup table within study area ({region_name}_FAFCNTY.csv)
  • Produce county establishment file before imputation (data_emp_cbp.csv)

step b.3 - impute firms and employment for missing industries in FAFZones_no_cbp.R (using firm data outside CBP for gap filling):

  • Require U.S. Department of Labor (BLS) employment data for imputing firm count and employment of industries missing from CBP (https://www.bls.gov/cew/downloadable-data-files.htm)
  • Require outputs {region_name}_FAFCNTY.csv and data_emp_cbp.csv from step b.2
  • Produce imputed firm and employment output data_emp_cbp_imputed.csv
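One simple way to picture the gap filling is shown below: CBP records are kept as-is, and the supplementary source only contributes county-industry pairs that CBP lacks. This is a hypothetical simplification; the actual rules in FAFZones_no_cbp.R may differ, and the keys and values here are made up.

```python
def impute_missing_industries(cbp, supplement):
    """Keep CBP records; add supplement records only for keys absent from CBP.

    Both inputs map (county FIPS, industry code) -> (firm count, employment).
    """
    imputed = dict(cbp)
    for key, record in supplement.items():
        imputed.setdefault(key, record)
    return imputed


# Hypothetical inputs for illustration only.
cbp = {("06001", "31"): (120, 4500)}
supplement = {
    ("06001", "31"): (130, 4700),  # already in CBP, so ignored
    ("06001", "92"): (15, 900),    # missing from CBP, so added
}

imputed = impute_missing_industries(cbp, supplement)
```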

step b.4 - generate CBG tier and ranking for each industry in CBP_EMPRANKS.R:

  • Require FAF zone county lookup table ({region_name}_FAFCNTY.csv) from step b.2 - FAFZone.R
  • Require state employment data ({state}_naics.csv) from step b.1 - Census.R
  • Produce ranking of employment by industry within each MESOZONE (data_mesozone_emprankings.csv)
  • Produce mesozone - census GEOID lookup table (MESOZONE_GEOID_LOOKUP.csv)
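The ranking output can be pictured as ordering zones by employment within each industry. The sketch below is one plausible reading, not the procedure in CBP_EMPRANKS.R: the zone ids, industry codes, and tie-breaking rule are hypothetical, and the tier assignment is not covered.

```python
from collections import defaultdict


def rank_zones_by_industry(employment):
    """employment: {(zone, industry): count}.

    Returns {(zone, industry): rank}, where rank 1 is the zone with the
    highest employment in that industry.
    """
    by_industry = defaultdict(list)
    for (zone, industry), count in employment.items():
        by_industry[industry].append((zone, count))
    ranks = {}
    for industry, pairs in by_industry.items():
        # Sort by employment descending; break ties by zone id for determinism.
        pairs.sort(key=lambda pair: (-pair[1], pair[0]))
        for rank, (zone, _count) in enumerate(pairs, start=1):
            ranks[(zone, industry)] = rank
    return ranks


# Hypothetical employment counts by (mesozone, industry).
employment = {
    (1, "31"): 50,
    (2, "31"): 200,
    (3, "31"): 50,
    (1, "44"): 10,
}

ranks = rank_zones_by_industry(employment)
```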

step b.5 - generate shapefile of the study area and rest of the U.S. in FreightGIS.R:

  • Require the following zonal shapefile and lookup tables:
    • Require FAF5 zonal shapefile (FAF5Zones.geojson) from step a.1
    • Require state census block group shapefile ({state}_bg.geojson) from step b.1
    • Require FAF zone - county lookup file {region_name}_FAFCNTY.csv from step b.2
    • Require mesozone - census GEOID lookup file MESOZONE_GEOID_LOOKUP.csv from step b.4
  • Produce shapefile of model spatial resolution: {region_name}_freight.geojson
  • Produce centroids of modeled zones: {region_name}_freight_centroids.geojson

step b.6 - generate final mesoscopic zonal lookup table in zonal_data_processor.py:

  • Require the following zonal shapefile and lookup tables:
    • Require CFS and FAF zone lookup from FAF5 document: CFS_FAF_LOOKUP.csv from step a.5
    • Require FAF zone - county lookup file {region_name}_FAFCNTY.csv from step b.2
    • Require shapefile of model spatial resolution: {region_name}_freight.geojson from step b.5
  • Produce mesoscopic zonal lookup table: zonal_id_lookup_final.csv

c. firm fleet generation (using private data sources, under folder: fleet_generation)

step c.1 - processing IHS vehicle registration data from 2014 in vehicle_registration_data_processor.py:

  • Require the following inputs:
    • Require state-level vehicle registration data from NREL (POC: Alicia Birky: [email protected])
  • Produce private, for-hire and for-leasing fleet distribution (MD/HD) for selected state:
    • Private fleet: '{state}_private_fleet_size_distribution.csv'
    • For-hire fleet: '{state}_for_hire_fleet_size_distribution.csv'
    • For-leasing fleet: '{state}_for_lease_fleet_size_distribution.csv'
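A fleet size distribution of this kind can be built by binning each carrier's fleet size and normalizing the counts. The sketch below assumes hypothetical bin boundaries and input values; the registration data is private, and the actual bins in vehicle_registration_data_processor.py may differ.

```python
from collections import Counter

# Hypothetical fleet-size bins as inclusive (lower, upper) bounds;
# None marks an open-ended upper bound.
FLEET_BINS = [(1, 2), (3, 5), (6, 10), (11, None)]


def bin_label(fleet_size):
    """Return the label of the bin that contains fleet_size."""
    for lower, upper in FLEET_BINS:
        if fleet_size >= lower and (upper is None or fleet_size <= upper):
            return f"{lower}+" if upper is None else f"{lower}-{upper}"
    raise ValueError(f"fleet size must be >= 1, got {fleet_size}")


def fleet_size_distribution(fleet_sizes):
    """Return {bin label: fraction of carriers in that bin}."""
    counts = Counter(bin_label(size) for size in fleet_sizes)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Hypothetical number of MD/HD vehicles per private carrier.
private_fleets = [1, 2, 4, 12]
distribution = fleet_size_distribution(private_fleets)
```

Running the same function separately on private, for-hire, and for-lease carriers would give one distribution per fleet type.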

step c.2 - processing cargo type from 2022 FMCSA Carrier Census data in carrier_cargo_type_analysis.py:

step c.3 - processing state-by-state fleet projection from NREL TDA analysis in national_fleet_generation_TDA.py:

  • Require the following inputs:
    • Require state-level MD/HD split from NREL: 'MDHDbyState.csv'
    • Require national fleet projection from NREL TDA analysis: 'TDA_results/BEAMFreightSensitivity_HOPmod.xlsx'
    • Manually define list of scenarios from NREL TDA analysis
  • Produce fleet stock and distributions:
    • national vehicle stock by year: '{scenario}/TDA_vehicle_stock.csv'
    • state-level vehicle composition by year: '{scenario}/fleet_composition_by_state.csv'
    • EV powertrain distribution by vehicle type: '{scenario}/EV_availability.csv'
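One way to relate the national stock to the state-level composition is to split each vehicle class across states using state shares. The sketch below assumes that proportional allocation and uses made-up shares and counts; it is not the method or the data of national_fleet_generation_TDA.py.

```python
def allocate_stock_to_states(national_stock, state_shares):
    """Split national stock by vehicle class across states.

    national_stock: {vehicle class: national vehicle count}
    state_shares:   {state: {vehicle class: share of national stock}}
    """
    return {
        state: {
            vclass: national_stock[vclass] * share
            for vclass, share in shares.items()
        }
        for state, shares in state_shares.items()
    }


# Hypothetical inputs for one scenario and one year.
national_stock = {"MD": 1000.0, "HD": 2000.0}
state_shares = {
    "CA": {"MD": 0.25, "HD": 0.5},
    "TX": {"MD": 0.75, "HD": 0.5},
}

state_stock = allocate_stock_to_states(national_stock, state_shares)
```

When the shares for a vehicle class sum to 1 across states, the state totals add back up to the national stock.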