update csv to parquet to use multiprocessing #795

Merged
shorvath-noaa merged 1 commit into NOAA-OWP:master from parallel_csv_to_parquet
Jul 10, 2024

@JoshCu JoshCu commented Jul 3, 2024

Multiprocessing for Faster Input Reformatting

When testing routing for 6500 catchments and 24 timesteps, a large portion of the troute execution time is still spent reformatting ngen output. PR #714 partially addressed this by using an awk command. This PR adds multiprocessing to the CSV-to-parquet conversion to speed it up further.
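As a rough illustration of the approach (not the code in this PR), the sketch below parses per-nexus CSV files in a process pool. The file name pattern `nex-<id>_output.csv`, the three-column row layout, and the helper names `read_nexus_csv` and `read_all` are assumptions made for the example. The final step of combining the results and writing parquet is left out.

```python
import csv
import multiprocessing as mp
import tempfile
from pathlib import Path


def read_nexus_csv(path):
    """Parse one per-nexus CSV into (nexus_id, [flow, ...])."""
    # Assumed file name pattern: nex-<id>_output.csv
    nexus_id = Path(path).stem.split("-")[1].split("_")[0]
    with open(path, newline="") as f:
        # Assumed row layout: timestep index, timestamp, flow value
        flows = [float(row[2]) for row in csv.reader(f) if row]
    return nexus_id, flows


def read_all(paths, processes=None):
    """Parse many CSVs in parallel; processes=None uses os.cpu_count()."""
    with mp.Pool(processes=processes) as pool:
        return dict(pool.map(read_nexus_csv, paths))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i in range(8):
            p = Path(tmp) / f"nex-{i}_output.csv"
            p.write_text(
                f"0, 2024-07-03 00:00:00, {i}.5\n"
                f"1, 2024-07-03 01:00:00, {i}.75\n"
            )
            paths.append(str(p))
        # Hardcode 4 workers here; omit the argument to use every core.
        flows = read_all(paths, processes=4)
        print(flows["3"])  # prints [3.5, 3.75]
```

Each file is independent, so the work splits cleanly across workers. The only shared step is assembling the returned values in the parent process.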

Performance Comparison

Without Multiprocessing

************ TIMING SUMMARY ************
----------------------------------------
Network graph construction: 0.8 secs,  7.27%
Forcing array construction: 7.7 secs, 69.93%
Routing computations:      1.89 secs, 17.20%
Output writing:            0.61 secs,  5.56%
----------------------------------------

With Multiprocessing (56 cores; defaults to the CPU core count)

************ TIMING SUMMARY ************
----------------------------------------
Network graph construction: 0.84 secs, 19.68%
Forcing array construction: 0.98 secs, 22.94%
Routing computations:      1.82 secs, 42.92%
Output writing:            0.61 secs, 14.36%
----------------------------------------

With Multiprocessing (4 cores - hardcoded example)

************ TIMING SUMMARY ************
----------------------------------------
Network graph construction: 0.79 secs, 14.09%
Forcing array construction: 2.36 secs, 42.39%
Routing computations:       1.8 secs, 32.29%
Output writing:            0.62 secs, 11.15%
----------------------------------------
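To put the three summaries side by side, the short calculation below derives the speedup of the forcing array construction step from the times reported above. The `speedup` helper and the efficiency figure (speedup divided by core count) are additions for illustration. Each number comes from a single run, so treat the results as indicative only.

```python
def speedup(serial_secs, parallel_secs):
    """Ratio of serial to parallel wall-clock time."""
    return serial_secs / parallel_secs


# Forcing array construction times (secs) copied from the timing summaries
serial = 7.7
timings = {56: 0.98, 4: 2.36}

for cores, secs in timings.items():
    s = speedup(serial, secs)
    print(f"{cores} cores: {s:.2f}x speedup, {s / cores:.0%} parallel efficiency")
# prints:
# 56 cores: 7.86x speedup, 14% parallel efficiency
# 4 cores: 3.26x speedup, 82% parallel efficiency
```

Most of the gain is already reached with 4 cores. With 56 cores, forcing array construction drops below the routing computations, which become the largest remaining step.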

Notes

  • This works in ngiab with ngen run serially.
  • Testing this via ngen with MPI will actually reduce performance because of an issue being addressed in NOAA-OWP/ngen#846.
  • Building ngen from that PR should work with MPI.
  • An ngiab image for x86 with both the MPI patch and this troute patch applied is available at joshcu/ngiab_dev.

@shorvath-noaa shorvath-noaa merged commit ed8a105 into NOAA-OWP:master Jul 10, 2024
4 checks passed
@JoshCu JoshCu deleted the parallel_csv_to_parquet branch July 11, 2024 15:35