-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remake mart_ntd.dim_monthly_ntd_ridership_with_adjustments with new data #3519
Comments
As discussed during weekly meeting, this work is important and urgent. So I moving to our current iteration to work on it. |
estimates to bring data for new available months (July and August) [#3519]
The URL for the xlsx file changes every month, so we will need someone to get the new URL and update it.
For new month_year columns on new files, currently we just need to add them to the yml file. Depending on the path we use for the URL change, this is just part of the changes on PR... see any new column and add them. If we want it to be automatic, I could remove the columns definition, so any new columns would be imported. It would impact on the order of the columns that gets a little messy on External and Staging tables, also we cannot pre-define the typecast for each column the process created the typecast based on the data. @evansiroky @csuyat-dot Would love to hear if you have any thoughts or preferences. |
Cc @vevetron for comments too |
Some notes:
|
Thank you! I was going to add him and end up forgetting to tag him. |
- Also Format SQL files and reorder columns [#3519]
@csuyat-dot @vevetron I have some questions about The Intermediate tables are currently returning only records with the following modes:
The dim_monthly_ridership_with_adjustments is created from those intermediate tables, but it has the rules for modes:
QUESTION 1 For the mode QUESTION 2 The existing modes on QUESTION 3 There column year is used for anything? |
estimates to bring data for new available months (July and August) [#3519]
- Also Format SQL files and reorder columns [#3519]
@erikamov here are my thoughts, though im not too familiar with modes Q1: there was an old thread on Slack regarding Q2: im personally unsure about removing/keeping Q3: i dont use that |
Note for future references: External monthly ridership tables have some records with empty NTD_ID for modes: They are lines from the summary added to the bottom of each monthly data worksheet inside the So I am adding filters to staging tables to only return records where |
instead of every day. There is no need to run every day since it is a monthly file, but having the process running once a week give us a better window to add and process new files once it is released. [#3519]
estimates to bring data for new available months (July and August) [#3519]
- Also Format SQL files and reorder columns [#3519]
instead of every day. There is no need to run every day since it is a monthly file, but having the process running once a week give us a better window to add and process new files once it is released. [#3519]
estimates to bring data for new available months: July, August, and September [#3519]
Mart tables - Also Format SQL files and reorder columns - Remove irrelevant year column from dim model - Filter out null NTD_ids - Edit description on documentation [#3519]
Mart tables - Also Format SQL files and reorder columns - Remove irrelevant year column from dim model - Filter out null NTD_ids - Edit description on documentation [#3519]
instead of every day. There is no need to run every day since it is a monthly file, but having the process running once a week give us a better window to add and process new files once it is released. [#3519]
estimates to bring data for new available months: July, August, and September [#3519]
models. - Add documentation for models and columns - Add Test for NTD_ID [#3519]
the new process running on Airflow. [#3519]
and Mart tables - Also Format SQL files and reorder columns - Remove irrelevant year column from dim model - Update documentation for models and columns [#3519]
and Mart tables - Format SQL files and reorder columns - Remove irrelevant year column from dim model - Rename columns to follow the same pattern as the other new NTD models - Add test for not Null NTD_ID - Update documentation for models and columns [#3519]
estimates to bring data for new available months: July, August, and September [#3519]
models. - Add documentation for models and columns - Add Test for NTD_ID [#3519]
the new process running on Airflow. [#3519]
and Mart tables - Format SQL files and reorder columns - Remove irrelevant year column from dim model - Rename columns to follow the same pattern as the other new NTD models - Add test for not Null NTD_ID - Update documentation for models and columns [#3519]
estimates to bring data for new available months: July, August, and September [#3519]
models. - Add documentation for models and columns - Add Test for NTD_ID [#3519]
the new process running on Airflow. [#3519]
- Format SQL files and reorder columns - Remove irrelevant year column [#3519]
dim_monthly_ridership_with_adjustments (removed _ntd_). - Format SQL file and reorder columns - Remove irrelevant year column - Rename columns to follow the same pattern as the other new NTD models - Add surrogate key [#3519]
dim_monthly_ridership_with_adjustments (removed _ntd_). - Format SQL file and reorder columns - Remove irrelevant year column - Rename columns to follow the same pattern as the other new NTD models - Add surrogate key [#3519]
Remake completed:
Remaining work:
|
instead of every day. There is no need to run every day since it is a monthly file, but having the process running once a week give us a better window to add and process new files once it is released. [#3519]
Schedule changed and dropped tables. |
User story / feature request
As a data analyst, I want
mart_ntd.dim_monthly_ntd_ridership_with_adjustments
remade as a new table that uses data fromexternal_ntd__ridership.historical__complete_monthly_ridership_with_adjustments_and_estimates__upt
mart_ntd.dim_monthly_ntd_ridership_with_adjustments
data is used in the Monthly NTD Ridership Report. Previously, this table was manually updated when FTA publishes new data from the complete monthly ridership report. However, the FTA report is in a wide format (a new column is added every month).dim_monthly_ntd_ridership_with_adjustment
converted the FTA data to a long format which works well with the report site scripts.The new
external_ntd__ridership.historical__complete_monthly_ridership_with_adjustments_and_estimates__upt
ingest data in initial the wide format, which is not compatible with the report site scripts.Acceptance Criteria
A model as close as possible to initial dim_monthly_ntd_ridership_with_adjustments model where each rows is an agency's UPT for that month-year
Notes
The text was updated successfully, but these errors were encountered: