Use CUDA Events for measuring elapsed time #143

Open
wants to merge 2 commits into
base: main

@staghado staghado commented Apr 20, 2024

Use CUDA events to measure elapsed time in the distributed trainer, in the all-pairs GPU-to-GPU throughput test, and in decode_text.
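
For readers unfamiliar with the pattern, here is a minimal sketch of timing a call with CUDA events in PyTorch. The helper name `timed_ms` and the CPU fallback are assumptions made for illustration and are not taken from this PR's diff; `torch.cuda.Event(enable_timing=True)`, `record()`, `synchronize()` and `elapsed_time()` are the standard PyTorch calls.

```python
import time

try:
    import torch
except ImportError:  # assumption: let the sketch run where PyTorch is absent
    torch = None


def cuda_available() -> bool:
    """True only if PyTorch is installed and a CUDA device is usable."""
    return torch is not None and torch.cuda.is_available()


def timed_ms(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed time in milliseconds).

    Hypothetical helper: uses CUDA events when a GPU is present,
    otherwise falls back to host wall-clock time.
    """
    if cuda_available():
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()              # enqueue a timestamp on the current stream
        result = fn(*args, **kwargs)
        end.record()                # enqueue a second timestamp after the work
        end.synchronize()           # wait until the end event has completed
        return result, start.elapsed_time(end)  # milliseconds, as a float

    # CPU fallback: plain wall-clock timing on the host.
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, (time.perf_counter() - t0) * 1000.0


if __name__ == "__main__":
    _, ms = timed_ms(sum, range(1_000_000))
    print(f"elapsed: {ms:.3f} ms")
```

The events are timestamps placed on the GPU stream, so the interval reflects when the queued GPU work ran. Host-side `time.time()` instead measures when the Python call returned, which only matches the GPU timeline if the device is synchronized first.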

I would like to add some tests before review but I am not sure what to test yet...

Issue: #88

staghado commented May 8, 2024

I ran a simple test with the tiny llama example (examples/train_tiny_llama.sh). Here are the results:

  • With time.time()
iteration: 14 / 15 | elapsed_time_per_iteration_ms: 14.8 | tokens_per_sec: 69.3K 
iteration: 15 / 15 | elapsed_time_per_iteration_ms: 15.1 | tokens_per_sec: 67.7K
  • With CUDA events with dist.barrier()
iteration: 14 / 15 | elapsed_time_per_iteration_ms: 14.3 | tokens_per_sec: 71.5K
iteration: 15 / 15 | elapsed_time_per_iteration_ms: 13.8 | tokens_per_sec: 74.1K

  • With CUDA events without dist.barrier()
iteration: 14 / 15 | elapsed_time_per_iteration_ms: 14.3 | tokens_per_sec: 71.6K
iteration: 15 / 15 | elapsed_time_per_iteration_ms: 13.8 | tokens_per_sec: 74.3K

These values fluctuate from one run to another, but time.time() seems to slightly overestimate the elapsed time, and dist.barrier() seems to have no effect when using CUDA events.
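
To put a number on "slightly", the sketch below averages the two logged iterations per configuration and compares them against the time.time() baseline. It uses only the figures quoted in the logs above; two iterations from a single run are far too few to draw firm conclusions, so treat the output as a rough indication. The variable names are illustrative, and the tokens-per-iteration value is inferred from the logged numbers rather than stated anywhere in this PR.

```python
# (elapsed_time_per_iteration_ms, tokens_per_sec) for iterations 14 and 15,
# copied from the logs above.
runs = {
    "time.time()": [(14.8, 69.3e3), (15.1, 67.7e3)],
    "CUDA events + dist.barrier()": [(14.3, 71.5e3), (13.8, 74.1e3)],
    "CUDA events, no dist.barrier()": [(14.3, 71.6e3), (13.8, 74.3e3)],
}


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Mean elapsed time per configuration, in milliseconds.
mean_ms = {name: mean([ms for ms, _ in rows]) for name, rows in runs.items()}

# How much larger the time.time() reading is, relative to itself, in percent.
baseline = mean_ms["time.time()"]
overestimate_pct = {
    name: 100.0 * (baseline - ms) / baseline for name, ms in mean_ms.items()
}

# Sanity check: tokens_per_sec * elapsed seconds should give a roughly
# constant number of tokens per iteration if the log lines are consistent.
implied_tokens = [
    tps * ms / 1000.0 for rows in runs.values() for ms, tps in rows
]

if __name__ == "__main__":
    for name, ms in mean_ms.items():
        print(f"{name}: {ms:.2f} ms ({overestimate_pct[name]:.1f}% below baseline)")
    print(f"implied tokens per iteration: "
          f"{min(implied_tokens):.0f} to {max(implied_tokens):.0f}")
```

On these six log lines, both CUDA-event configurations average 14.05 ms against 14.95 ms for time.time(), a gap of about 6%, and the implied tokens per iteration stay within a few tokens of each other across all three configurations.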

@staghado staghado changed the title [WIP] Use CUDA Events for measuring elapsed time Use CUDA Events for measuring elapsed time May 8, 2024
@staghado

@NouamaneTazi
