Commit

fix(main.py): drop columns with all NaN values to improve data quality and reduce file size

feat(main.py): add leading zeros to the prefecture number in the output file name for better sorting and readability
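
The sorting benefit of zero-padding can be sketched as follows. The `PREFECTURES` list here is a shortened, hypothetical stand-in for the one in main.py, and `output_name` is an illustrative helper that is not part of the commit.

```python
# Hypothetical stand-in for the PREFECTURES list used in main.py.
PREFECTURES = [f"pref{n}" for n in range(1, 13)]


def output_name(i, pad=True):
    """Build an output file name for the 1-based prefecture index i."""
    number = str(i).zfill(2) if pad else str(i)
    return f"{number}_{PREFECTURES[i - 1]}.csv"


unpadded = sorted(output_name(i, pad=False) for i in range(1, 13))
padded = sorted(output_name(i) for i in range(1, 13))

print(unpadded[:3])  # lexicographic order puts 10, 11, 12 first
print(padded[:3])    # zero-padded names sort in numeric order
```

With two-digit padding, a plain directory listing matches the numeric prefecture order as long as the index stays below 100.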
ryo-ma committed Feb 18, 2024
1 parent 672348b commit 4109991
Showing 1 changed file with 3 additions and 2 deletions.
5 changes: 3 additions & 2 deletions main.py
@@ -23,5 +23,6 @@ def delete_headers(df, line_number):
 dfs = [delete_headers(x, 2) for x in dfs[1:]]
 dfs.insert(0, first_df)
 merged_df = pd.concat(dfs).replace('\n', '', regex=True).replace('\r', '', regex=True).replace('\r\n', '', regex=True).replace('\n\r', '', regex=True)
-result_df = merged_df.dropna(subset=[0])
-result_df.to_csv(f"./output_files/{i}_{PREFECTURES[i-1]}.csv", header=False, index=False)
+result_df = merged_df.dropna(subset=[0]).dropna(axis=1)
+prefecture_number = str(i).zfill(2)
+result_df.to_csv(f"./output_files/{prefecture_number}_{PREFECTURES[i-1]}.csv", header=False, index=False)
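
Note on the `dropna(axis=1)` call: the commit message describes dropping columns whose values are all NaN, while pandas' default `how="any"` drops every column that contains at least one NaN. Whether the difference matters depends on the input data, which is not shown here. A minimal sketch on a toy frame (not the project's data) shows the two behaviours side by side:

```python
import numpy as np
import pandas as pd

# Toy frame: column 1 is partly NaN, column 2 is entirely NaN.
df = pd.DataFrame({
    0: ["a", "b", "c"],
    1: ["x", np.nan, "z"],
    2: [np.nan, np.nan, np.nan],
})

any_dropped = df.dropna(axis=1)             # default how="any"
all_dropped = df.dropna(axis=1, how="all")  # only fully empty columns

print(list(any_dropped.columns))  # column 1 is removed as well
print(list(all_dropped.columns))  # column 1 is kept
```

If the intent is to remove only fully empty columns, `how="all"` would be the closer match. This is an observation about pandas defaults, not a claim about what the author intended.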
