Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Benchmark performance of PyGMT functions #2910

Closed
weiji14 opened this issue Dec 24, 2023 · 7 comments · Fixed by #3631
Closed

Benchmark performance of PyGMT functions #2910

weiji14 opened this issue Dec 24, 2023 · 7 comments · Fixed by #3631
Assignees
Labels
maintenance Boring but important stuff for the core devs
Milestone

Comments

@weiji14
Copy link
Member

weiji14 commented Dec 24, 2023

Description of the desired feature

We are attempting some big refactoring steps in PyGMT to avoid the use of temporary intermediate files #2730, and will also be performing some updates to GMT 6.5.0 soon, so it would be good to track any performance improvements or regressions in terms of execution speed. Since #835, we have actually measured the execution time of the slowest tests (>0.2s) on CI for #584, but those execution times are not really tracked over time, and only hidden in the CI log files.

So, to better track performance over time, we are setting up a Continuous Benchmarking workflow in #2908. This is using pytest-codspeed, with the results logged to https://codspeed.io/GenericMappingTools/pygmt. The benchmarks is selective however, and will only be run on unit tests marked with @pytest.mark.benchmark.

Main question: Which tests do we decide to benchmark?

So which tests should we benchmark (and should we mark all those tests with @pytest.mark.benchmark in this PR or a follow-up one)?

I think we should focus on benchmarking low-level functions rather than modules' wrappers, since the low-level functions are heavily used in everywhere and most wrappers have very simple and similar code structures.

For example, most plotting modules have very simple code like the one in Figure.basemap

    kwargs = self._preprocess(**kwargs)                                         
    with Session() as lib:                                                      
        lib.call_module(module="basemap", args=build_arg_string(kwargs))

so we just need to benchmark one basemap test (e.g., test_basemap()) and don't need to benchmark other basemap's tests and other plotting methods (e.g., Figure.coast). Of course, there are few exceptions, for example, Figure.meca, Figure.plot, Figure.plot3d, Figure.text are among the complicated wrappers and should be benchmarked.

Similarly, for table-processing and grid-processing functions, benchmarking pygmt.select and pygmt.grdfill should be enough.

Originally posted by @seisman in #2908 (comment)

Other questions: When do we want to run the benchmarks? Should this be enabled on every Pull Request, or just Pull Requests that modify certain files perhaps? Edit: Answered in #2908 (comment)

Are you willing to help implement and maintain this feature?

Yes

@weiji14 weiji14 added question Further information is requested maintenance Boring but important stuff for the core devs labels Dec 24, 2023
@weiji14 weiji14 added this to the 0.11.0 milestone Dec 24, 2023
@weiji14 weiji14 self-assigned this Dec 24, 2023
@weiji14
Copy link
Member Author

weiji14 commented Dec 24, 2023

I think we should focus on benchmarking low-level functions rather than modules' wrappers, since the low-level functions are heavily used in everywhere and most wrappers have very simple and similar code structures.

Agree with benchmarking the low-level clib functions, but I have other thoughts about benchmarking the wrappers (see below).

so we just need to benchmark one basemap test (e.g., test_basemap()) and don't need to benchmark other basemap's tests and other plotting methods (e.g., Figure.coast). Of course, there are few exceptions, for example, Figure.meca, Figure.plot, Figure.plot3d, Figure.text are among the complicated wrappers and should be benchmarked.

Similarly, for table-processing and grid-processing functions, benchmarking pygmt.select and pygmt.grdfill should be enough.

Considering that we will be switching from GMT 6.4.0 to 6.5.0 soon, I almost think we should have at least one benchmark for each of the 60+ PyGMT modules listed in pygmt/src/__init__.py. That way, we can spot any potential performance regressions in GMT. The same could be said for any of the upstream libraries we depend on such as NumPy, Pandas, Xarray, GeoPandas, etc.

I assumed that the benchmarking would be slow in #2730 (comment) since they might need to run a benchmark multiple times, but actually after reading https://codspeed.io/blog/pinpoint-performance-regressions-with-ci-integrated-differential-profiling, codspeed only runs each benchmark once!

The action will run the benchmarks on a "virtualized" CPU with Valgrind. Instead of measuring the execution time, it measures the number of CPU cycles and memory accesses. Each benchmark is only ran once and it is enough to get consistent data.

So it might be ok to have 60+ benchmarks running at once. Plus pytest-codspeed can be combined with pytest-xdist (see https://docs.codspeed.io/benchmarks/python#running-benchmarks-in-parallel), so we could potentially speed up the benchmarking further.

@seisman
Copy link
Member

seisman commented Dec 25, 2023

Considering that we will be switching from GMT 6.4.0 to 6.5.0 soon, I almost think we should have at least one benchmark for each of the 60+ PyGMT modules listed in pygmt/src/__init__.py. That way, we can spot any potential performance regressions in GMT. The same could be said for any of the upstream libraries we depend on such as NumPy, Pandas, Xarray, GeoPandas, etc.

Or maybe we should just benchmark all tests except those for catching exceptions.

@weiji14
Copy link
Member Author

weiji14 commented Dec 25, 2023

Or maybe we should just benchmark all tests except those for catching exceptions.

Maybe not all all, but at least one (or two) for each PyGMT function for now?

@seisman
Copy link
Member

seisman commented Dec 25, 2023

I'm OK with that.

@seisman seisman added discussions Need more discussion before taking further actions and removed question Further information is requested labels Dec 25, 2023
@weiji14
Copy link
Member Author

weiji14 commented Dec 27, 2023

Ok, here's the list of test files:

  • test_accessor.py
  • test_basemap.py
  • test_binstats.py
  • test_blockmedian.py
  • test_blockm.py
  • test_clib_loading.py
  • test_clib_put_matrix.py
  • test_clib_put_strings.py
  • test_clib_put_vector.py
  • test_clib.py
  • test_clib_virtualfiles.py
  • test_coast.py
  • test_colorbar.py
  • test_config.py
  • test_contour.py
  • test_datasets_earth_age.py
  • test_datasets_earth_free_air_anomaly.py
  • test_datasets_earth_geoid.py
  • test_datasets_earth_magnetic_anomaly.py
  • test_datasets_earth_mask.py
  • test_datasets_earth_relief.py
  • test_datasets_earth_vertical_gravity_gradient.py
  • test_datasets_load_remote_datasets.py
  • test_datasets_samples.py
  • test_dimfilter.py
  • test_figure.py
  • test_filter1d.py

  • test_geopandas.py
  • test_grd2cpt.py
  • test_grd2xyz.py
  • test_grdclip.py
  • test_grdcontour.py
  • test_grdcut.py
  • test_grdfill.py
  • test_grdfilter.py
  • test_grdgradient.py
  • test_grdhisteq.py
  • test_grdimage_image.py
  • test_grdimage.py
  • test_grdinfo.py
  • test_grdlandmask.py
  • test_grdproject.py
  • test_grdsample.py
  • test_grdtrack.py
  • test_grdview.py
  • test_grdvolume.py
  • test_helpers.py
  • test_histogram.py
  • test_image.py
  • test_info.py
  • test_init.py - only tests pygmt.show_versions(), so ignored, see Mark unit tests with @pytest.mark.benchmark part 1 #2911 (comment)
  • test_inset.py
  • test_io.py
  • test_legend.py
  • test_logo.py
  • test_makecpt.py
  • test_meca.py
  • test_nearneighbor.py
  • test_plot3d.py
  • test_plot.py
  • test_project.py
  • test_psconvert.py
  • test_rose.py
  • test_select.py
  • test_session_management.py
  • test_solar.py
  • test_sph2grd.py
  • test_sphdistance.py
  • test_sphinterpolate.py
  • test_sphinx_gallery.py - requires sphinx_gallery to be installed, plus PyGMTScraper is non-user facing, so ignored, see Mark unit tests with @pytest.mark.benchmark part 1 #2911 (comment)
  • test_subplot.py
  • test_surface.py
  • test_ternary.py
  • test_text.py
  • test_tilemap.py
  • test_timestamp.py
  • test_triangulate.py
  • test_velo.py
  • test_which.py
  • test_wiggle.py
  • test_x2sys_cross.py
  • test_x2sys_init.py
  • test_xyz2grd.py

I'm gonna work on the second half (from test_geopandas to test_xyz2grd) in #2911 for now.

@weiji14
Copy link
Member Author

weiji14 commented Dec 27, 2023

Putting this here to remind ourselves to document guidelines for Continuous Benchmarking before closing this issue.

@weiji14 Do you want to add a section like "Continuous Benchmark" to this guide?

I did think about it when writing the docs in #2908, but wasn't sure what to include yet in terms of guidelines. Maybe after a few weeks/months when we've got some experience with benchmarking, then we can add some words in.

Originally posted by @weiji14 in #2916 (comment)

@seisman seisman removed the discussions Need more discussion before taking further actions label Jan 17, 2024
@seisman seisman modified the milestones: 0.11.0, 0.12.0 Feb 1, 2024
@weiji14
Copy link
Member Author

weiji14 commented Feb 6, 2024

At commit dedfa7a in #2908, and the follow-up patch in #2952, we set the benchmark workflow to only run on merge to main branch when '*.py' files or the '.github/workflows/benchmarks.yml' is modified. However, in #3040 (comment), I triggered the benchmark runs, and the flame graphs are not showing up (there is a message like No baseline data is yet available. Once your main branch has generated a first performance report, you will see the list of commits and their performance impact here), possibly because that PR cannot compare to a baseline run on the main branch (72131ae) which did not have a benchmark run.

Luckily, we can manually trigger a baseline benchmark run on the main branch! Steps are:

  1. Go to https://github.com/GenericMappingTools/pygmt/actions/workflows/benchmarks.yml
  2. Click on the 'Run Workflow' button on the right. There should be an option to trigger the workflow on the main branch (or any tag/ref):
    image
  3. Wait for the baseline benchmark run to finish (takes about 14min)
  4. Trigger the benchmark run on the PR branch by adding the run/benchmark tag.
  5. View the flame graphs at https://codspeed.io/GenericMappingTools/pygmt/branches/!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maintenance Boring but important stuff for the core devs
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants