-
Notifications
You must be signed in to change notification settings - Fork 113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pep_sex_2024 changes made #1110
base: master
Are you sure you want to change the base?
Conversation
@kurus21 Can you remove input & output folder and confirm? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The File has been removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Kuru. Looks good.
skiprows=7, | ||
skipfooter=102, | ||
header=None) | ||
df.columns = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls use df.rename() instead of assuming column order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed over chat rename method doesn't gives an upper hand since that approach has also demands to assume the rows and cols position.
skipfooter=102, | ||
header=None) | ||
df.columns = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls use df.rename()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed over chat rename method doesn't gives an upper hand since that approach has also demands to assume the rows and cols position.
'White Total', 'White Male', 'White Female', 'NonWhite Total', | ||
'NonWhite Male', 'NonWhite Female' | ||
] | ||
df = df.drop(columns=[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
more readable to list columns of interest to be retained:
df.drop(columns=df.columns.difference(['Count_Person_Male', 'Count_Person_Female']), inplace=True)
Then it can be moved outside the if/else block
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It has been modified accordingly.
# adding geoid, year and measurement method | ||
df['Year'] = year | ||
df.insert(0, 'geo_ID', 'country/USA', True) | ||
df['Measurement_Method'] = 'dcAggregate/CensusPEPSurvey_PartialAggregate' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems common to both if and else and can be moved out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It has been modified accordingly.
for col in float_col.columns.values: | ||
df[col] = df[col].astype('int64') | ||
df[col] = df[col].astype("str").str.replace("-1", "") | ||
df.rename(columns={'SEX': 'Year'}, inplace=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the column 'SEX' being renamed to 'Year' here and in functions below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It has been renamed accordingly to match the data frame after modification.
'POPEST_FEM': 'Count_Person_Female', | ||
'YEAR': 'Year' | ||
}) | ||
df = df.drop(columns=[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may be easier to to do df.drop(columns=df.columns.difference([])..)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified
'Count_Person_Male', 'Count_Person_Female' | ||
] | ||
df = pd.read_excel(file_path, skiprows=5, skipfooter=7, header=None) | ||
df.columns = column_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls use df.rename()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As same as above
'July2022Female', | ||
'July2023Male', | ||
'July2023Female', | ||
'2023Total', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we generalize this to 2024 and future years?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It has been generalized for future years
"sc-est2023-syasex-": _state_2023, | ||
"sc-est2023-agesex-": _state_2023, | ||
"cc-est2023-agesex-": _county_2023, | ||
"cc-est2023-agesex-a": _county_2023 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we also extend to handle future years assuming the same format?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes modified accordingly
return df | ||
except Exception as e: | ||
logging.fatal(f"Error processing the file {file_path}: {e}") | ||
except Exception as e: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
multiple except block are there. Remove the duplicate ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
return df | ||
try: | ||
df = pd.read_csv(file_path, thousands=',', skiprows=4, header=None) | ||
df.columns = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider implementing a more dynamic approach to identify the required columns instead of hardcoding their order.?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have to assume the position of the columns [rows and cols] anyway. So it has been hard coded like other places.
Please be informed it has been handled with try and catch block anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kuru, could you add a comment to the script explaining the reason for fixing the column order? This will help future developers understand the rationale behind the change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Kuru, Work on the comments provided.
|
Please be informed that the comments has been updated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Kuru. Looks good.
No description provided.