Faster CUDA prompt speeds #924

Closed
wants to merge 1 commit into from


EricLBuehler
Owner

I measure a +4% gain in prompt processing (PP) speed for Llama 3.2 3B (807 T/s -> 840 T/s, 42 prompt tokens).
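
For anyone checking the arithmetic behind that figure, here is a minimal sketch in Python. The helper names `relative_speedup` and `prompt_time_ms` are hypothetical and not part of this repository; only the 807 T/s, 840 T/s, and 42-token values come from the measurement above.

```python
def relative_speedup(before_tps: float, after_tps: float) -> float:
    """Return the throughput change relative to the baseline, in percent."""
    if before_tps <= 0:
        raise ValueError("baseline throughput must be positive")
    return (after_tps - before_tps) / before_tps * 100.0


def prompt_time_ms(tokens: int, tps: float) -> float:
    """Return the time in milliseconds to process `tokens` at `tps` tokens/s."""
    return tokens / tps * 1000.0


# Values quoted in the comment above (hypothetical helper names).
gain = relative_speedup(807.0, 840.0)
before_ms = prompt_time_ms(42, 807.0)
after_ms = prompt_time_ms(42, 840.0)

print(f"speedup: {gain:.1f}%")                         # speedup: 4.1%
print(f"prompt time before: {before_ms:.1f} ms")       # prompt time before: 52.0 ms
print(f"prompt time after: {after_ms:.1f} ms")         # prompt time after: 50.0 ms
```

With a 42-token prompt the absolute difference is only about 2 ms, so a longer prompt would likely give a more stable measurement of the same change.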

EricLBuehler deleted the cuda_attnmask_expand branch November 21, 2024 20:09

Code Metrics Report
===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 C Header                2           35           28            0            7
 Dockerfile              1           41           22           10            9
 Happy                   1          442          369            0           73
 JSON                   12          105          104            0            1
 Python                 53         2274         1949           63          262
 Shell                   1           57           22           18           17
 TOML                   18          583          520            2           61
 YAML                    2           21           19            2            0
-------------------------------------------------------------------------------
 Jupyter Notebooks       4            0            0            0            0
 |- Markdown             2           77           32           31           14
 |- Python               2          196          169            1           26
 (Total)                            273          201           32           40
-------------------------------------------------------------------------------
 Markdown               40         3009            0         2286          723
 |- BASH                 6          101           98            0            3
 |- JSON                 1           12           12            0            0
 |- Python               6          114          102            0           12
 |- Rust                10          361          306            0           55
 |- TOML                 2           75           63            0           12
 (Total)                           3672          581         2286          805
-------------------------------------------------------------------------------
 Rust                  280        84887        76163         1764         6960
 |- Markdown           136         1435           25         1306          104
 (Total)                          86322        76188         3070         7064
===============================================================================
 Total                 415        91454        79196         4145         8113
===============================================================================
  
