Skip to content

Latest commit

 

History

History
76 lines (60 loc) · 3.02 KB

CHANGELOG.md

File metadata and controls

76 lines (60 loc) · 3.02 KB

Changelog

All notable changes to this project will be documented in this file. The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

Unreleased

Added

  • Add trace format validator
  • Added multiple trace filter classes and demos.
  • Added enhanced trace call stack graph implementation.
  • Added memory timeline view.
  • Added support for trace parser customization.
  • Added support for H100 traces.
  • (Experimental) Support to read PyTorch Execution Trace and correlate it with PyTorch Profiler Trace.
  • (Experimental) Added lightweight critical path analysis feature.
  • (Experimental) Critical path analysis features: event attribution and summary()
  • (Experimental) Critical path analysis fixes: fixing async memcpy and adding GPU to CPU event based synchronization.
  • (Experimental) Added save and restore feature for critical path graph.
  • Add nccl collective fields to parser config
  • Queue length analysis: Add feature to compute time blocked on a stream hitting max queue length.
  • Add kernel_backend to parser config for Triton / torch.compile() support.

Changed

  • Change test data path in unittests from relative path to real path to support running test within IDEs.
  • Add a workaround for overlapping events when using ns resolution traces (pytorch/pytorch#122425)
  • Better handling of CUDA sync events with steam = -1
  • Fix ijson metadata parser for some corner cases
  • Add an option for ns rounding and cover ijson loading with it.
  • Updated Trace() api to specify a list of files and auto figure out ranks.

Deprecated

  • Deprecated 'call_stack'; use 'trace_call_stack' and 'trace_call_graph' instead.

Removed

Fixed

  • Fixed issue #65 to handle floating point counter values in cupti_counter_analysis.
  • Fixes bug in Critical path analysis relating to listing out the edges on the critical path.
  • Updated critical path analysis with edge attribution.

[0.2.0] - 2023-05-22

Added

  • (Experimental) Added CUPTI Counter analyzer feature to parse kernel and operator level counter statistics.

Changed

  • Improved loading time by parallelizing reads in the create_rank_to_trace_dict function.

Fixed

  • Fix unit of time in get_gpu_kernel_breakdown API.
  • Optimized multiprocessing strategy to handle OOMs when process pool is too large.

[0.1.3] - 2023-02-09

Changed

  • Split requirements.txt into two files: requirements.txt and requirements-dev.txt. The former does not contain kaleido, coverage as they are required for development purposes only.

Removed

  • Coverage tests for graph visualization
  • Dependency on Matplotlib for Venn diagram generation

Fixed

  • LICENSE type in setup.py
  • Queue length summary handles missing data and negative values
  • Use .get for key lookup in dictionary
  • Typos in README.md

[0.1.2] - 2023-01-06

  • Initial release

[0.1.1] - 2023-01-06 [YANKED]

  • Test release

[0.1.0] - 2023-01-06 [YANKED]

  • Test release