pep_sex_2024 changes made #1110

Open
wants to merge 14 commits into
base: master

Conversation

kurus21
@kurus21 kurus21 commented Nov 6, 2024

No description provided.

@krishnaswamypradeep
@kurus21 Can you remove the input and output folders and confirm?

Remove this file

Author

The file has been removed.

@krishnaswamypradeep krishnaswamypradeep left a comment

Thanks Kuru. Looks good.

skiprows=7,
skipfooter=102,
header=None)
df.columns = [
Contributor

@ajaits ajaits Nov 15, 2024

Please use df.rename() instead of assuming column order.

Author

As discussed over chat, the rename method doesn't offer an advantage here, since that approach also requires assuming the row and column positions.
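
To make the trade-off in this thread concrete: when a file is read with header=None, pandas labels the columns 0, 1, 2, ..., so a df.rename() mapping is still keyed by position. The following is a minimal sketch under that assumption; the in-memory CSV and the 'Year' column are hypothetical stand-ins, not the PR's actual input.

```python
import io

import pandas as pd

# Hypothetical stand-in for a headerless input file.
raw = io.StringIO("2023,100,120\n2024,110,130\n")
df = pd.read_csv(raw, header=None)

# Positional assignment: every column must be named, in file order.
by_position = df.copy()
by_position.columns = ["Year", "Count_Person_Male", "Count_Person_Female"]

# rename(): only the listed columns change, but the keys are still the
# integer positions (0, 1, 2) that header=None produced.
by_rename = df.rename(columns={
    0: "Year",
    1: "Count_Person_Male",
    2: "Count_Person_Female",
})
```

Both frames end up with the same labels. rename() becomes less position-dependent only when the file's own header row is read and the mapping is keyed by the source column names.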

skipfooter=102,
header=None)
df.columns = [
Contributor

Please use df.rename()

Author

As discussed over chat, the rename method doesn't offer an advantage here, since that approach also requires assuming the row and column positions.

'White Total', 'White Male', 'White Female', 'NonWhite Total',
'NonWhite Male', 'NonWhite Female'
]
df = df.drop(columns=[
Contributor

@ajaits ajaits Nov 15, 2024

It is more readable to list the columns of interest to be retained:
df.drop(columns=df.columns.difference(['Count_Person_Male', 'Count_Person_Female']), inplace=True)

Then it can be moved outside the if/else block.
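
As a rough illustration of this suggestion, the sketch below keeps a short list of wanted columns and drops everything else. The frame and its extra columns are made up for the example; only the two Count_Person_* names come from the comment above.

```python
import pandas as pd

# Hypothetical frame; the real one is built from the Census input files.
df = pd.DataFrame({
    "Year": [2023],
    "White Total": [10],
    "Count_Person_Male": [4],
    "Count_Person_Female": [6],
})

keep = ["Count_Person_Male", "Count_Person_Female"]

# Index.difference returns the labels that are NOT in `keep`, so dropping
# them leaves exactly the columns of interest, in their original order.
df = df.drop(columns=df.columns.difference(keep))
remaining = list(df.columns)
```

Note that in this sketch the 'Year' column is dropped as well, so any column needed later in the pipeline would have to be added to the keep list.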

Author

It has been modified accordingly.

Comment on lines 158 to 161
# adding geoid, year and measurement method
df['Year'] = year
df.insert(0, 'geo_ID', 'country/USA', True)
df['Measurement_Method'] = 'dcAggregate/CensusPEPSurvey_PartialAggregate'
Contributor

@ajaits ajaits Nov 15, 2024

This seems common to both if and else and can be moved out.
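
One way to read this suggestion is to move the three shared assignments into a helper that runs once after the branch. The sketch below assumes a hypothetical branch condition and hypothetical source column names, since the actual if/else is not shown in this thread; only the three assignments and the 'POPEST_FEM' mapping are taken from the diff.

```python
import pandas as pd


def add_common_columns(df: pd.DataFrame, year: int) -> pd.DataFrame:
    """Add the geo ID, year and measurement method shared by both branches."""
    df = df.copy()
    df["Year"] = year
    df.insert(0, "geo_ID", "country/USA", allow_duplicates=True)
    df["Measurement_Method"] = "dcAggregate/CensusPEPSurvey_PartialAggregate"
    return df


def process(df: pd.DataFrame, year: int, legacy_layout: bool) -> pd.DataFrame:
    # Hypothetical branch: each layout only does its own renaming.
    if legacy_layout:
        df = df.rename(columns={"Female": "Count_Person_Female"})
    else:
        df = df.rename(columns={"POPEST_FEM": "Count_Person_Female"})
    # The shared steps now run once, after the if/else.
    return add_common_columns(df, year)


result = process(pd.DataFrame({"POPEST_FEM": [5]}), 2023, legacy_layout=False)
```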

Author

It has been modified accordingly.

for col in float_col.columns.values:
df[col] = df[col].astype('int64')
df[col] = df[col].astype("str").str.replace("-1", "")
df.rename(columns={'SEX': 'Year'}, inplace=True)
Contributor

@ajaits ajaits Nov 15, 2024

Why is the column 'SEX' being renamed to 'Year' here and in the functions below?

Author

It has been renamed accordingly to match the data frame after modification.

'POPEST_FEM': 'Count_Person_Female',
'YEAR': 'Year'
})
df = df.drop(columns=[
Contributor

It may be easier to do df.drop(columns=df.columns.difference([])..)

Author

Modified

'Count_Person_Male', 'Count_Person_Female'
]
df = pd.read_excel(file_path, skiprows=5, skipfooter=7, header=None)
df.columns = column_name
Contributor

Please use df.rename()

Author

Same as above.

'July2022Female',
'July2023Male',
'July2023Female',
'2023Total',
Contributor

Can we generalize this to 2024 and future years?

Author

It has been generalized for future years.
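
The thread does not show how the generalization was done. One possible sketch is to build the per-year names from a year range instead of listing them by hand. The helper name is hypothetical, and the 'July{year}Male' / 'July{year}Female' pattern and ordering are assumptions inferred from the fragment above; total columns are left out because their naming is only partly visible.

```python
from typing import List


def yearly_sex_columns(first_year: int, last_year: int) -> List[str]:
    """Build 'July<year>Male' / 'July<year>Female' names for each year."""
    columns: List[str] = []
    for year in range(first_year, last_year + 1):
        columns.append(f"July{year}Male")
        columns.append(f"July{year}Female")
    return columns


# Extending to 2024 is then a change of argument, not of the list itself.
columns_through_2024 = yearly_sex_columns(2022, 2024)
```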

"sc-est2023-syasex-": _state_2023,
"sc-est2023-agesex-": _state_2023,
"cc-est2023-agesex-": _county_2023,
"cc-est2023-agesex-a": _county_2023
Contributor

@ajaits ajaits Nov 15, 2024

Can we also extend this to handle future years, assuming the same format?

Author

Yes, modified accordingly.

return df
except Exception as e:
logging.fatal(f"Error processing the file {file_path}: {e}")
except Exception as e:

There are multiple except blocks. Remove the duplicate ones.

Author

Removed.
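
For context on why the duplicate can go: when a try statement has two identical `except Exception` clauses, only the first one can ever run, so the second is dead code. A minimal sketch of the single-handler shape follows; the function name and the None return are assumptions, while the read_csv arguments and the log message mirror fragments quoted in this review.

```python
import logging
from typing import Optional

import pandas as pd


def read_input(file_path: str) -> Optional[pd.DataFrame]:
    """Read one input file, logging any failure in a single handler."""
    try:
        df = pd.read_csv(file_path, thousands=",", skiprows=4, header=None)
        return df
    except Exception as e:
        # One handler covers every failure; a second identical clause
        # after this one would never be reached.
        logging.fatal(f"Error processing the file {file_path}: {e}")
        return None
```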

return df
try:
df = pd.read_csv(file_path, thousands=',', skiprows=4, header=None)
df.columns = [

Consider implementing a more dynamic approach to identify the required columns instead of hardcoding their order.

Author

We have to assume the positions of the rows and columns anyway, so the order has been hard-coded as in other places.
Please note that errors are handled with a try/except block.

Kuru, could you add a comment to the script explaining the reason for fixing the column order? This will help future developers understand the rationale behind the change.

@krishnaswamypradeep krishnaswamypradeep left a comment

Hi Kuru, work on the comments provided.

@kurus21
Author

kurus21 commented Nov 25, 2024

Hi Kuru, work on the comments provided.
Please be informed that the comments have been addressed.

@kurus21
Author

kurus21 commented Nov 25, 2024

Please be informed that the comments have been updated.

@krishnaswamypradeep krishnaswamypradeep left a comment

Thanks Kuru. Looks good.

4 participants