Precalculate Non-zero divisors for lak #89

Status: Open, 3 commits to be merged into base branch main
mcroomp
Collaborator

@mcroomp mcroomp commented Jun 11, 2024

Simplify division step in Lak calculations so that we don't need to check for divide by zero.
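To illustrate the idea for readers who have not looked at the diff, here is a minimal sketch, not the code from this PR. The names `precalc_nonzero_divisors`, `divide_checked`, and `divide_precalc` are hypothetical, and mapping a zero entry to 1 is an assumption made only for this example. The point is that the zero check happens once, when the table is built, rather than on every division.

```rust
/// Hypothetical helper: copy a quantization table, mapping any zero entry to 1
/// so that every value can be used as a divisor without a per-call zero check.
/// (Mapping zero to 1 is an assumption made for this sketch.)
fn precalc_nonzero_divisors(quant: &[u16; 64]) -> [i32; 64] {
    let mut divisors = [1i32; 64];
    for (d, &q) in divisors.iter_mut().zip(quant.iter()) {
        *d = i32::from(q).max(1);
    }
    divisors
}

/// Division with the zero check inside the hot path.
fn divide_checked(num: i32, quant: u16) -> i32 {
    if quant == 0 {
        0
    } else {
        num / i32::from(quant)
    }
}

/// Division against a precalculated divisor, which is >= 1 by construction.
fn divide_precalc(num: i32, divisor: i32) -> i32 {
    num / divisor
}
```

For every non-zero table entry the two division functions return the same result; they differ only in how a zero entry is treated, which is exactly the case the precalculated table is meant to rule out up front.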

@Melirius
Collaborator

This looks OK. Do you want to switch the division here to division-by-multiplication later?

@mcroomp
Collaborator Author

mcroomp commented Jun 13, 2024

Not sure the division-by-multiply will fly because of the i16 cast to determine the sign. It messes up a bunch of potential optimizations.
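For context, a generic sketch of division-by-multiplication with explicit sign handling is shown here. It is not taken from this repository: `reciprocal` and `div_by_mul` are hypothetical names, and the 32-bit fixed-point scale is an assumption chosen so the result is exact for an `i16` numerator and a non-zero `u16` divisor. The sign has to be split off and reapplied around the multiply, which is the kind of extra step that can get in the way of other optimizations.

```rust
/// Hypothetical helper: fixed-point reciprocal ceil(2^32 / d) for a non-zero divisor.
fn reciprocal(d: u16) -> u64 {
    assert!(d != 0, "divisor must be non-zero");
    ((1u64 << 32) + u64::from(d) - 1) / u64::from(d)
}

/// Truncating division of `n` by the divisor that `recip` was built from,
/// using a multiply and a shift instead of a hardware divide.
fn div_by_mul(n: i16, recip: u64) -> i32 {
    // Work on the magnitude (at most 32768) and restore the sign afterwards.
    let magnitude = u64::from(n.unsigned_abs());
    let q = ((magnitude * recip) >> 32) as i32;
    if n < 0 {
        -q
    } else {
        q
    }
}
```

The reciprocal would be computed once per divisor and reused, so the per-coefficient cost is one multiply, one shift, and the sign fix-up.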

@Melirius
Collaborator

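Unfortunately, it is slightly slower than current main (roughly 0.8% more cycles and 0.9% more elapsed time in the perf runs below, despite executing slightly fewer instructions):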

main 82d6547

2024-11-15T21:03:34.699Z INFO  [lepton_jpeg::structs::lepton_file_writer] compressing to Lepton format
2024-11-15T21:03:35.173Z INFO  [lepton_jpeg::structs::lepton_file_writer] Number of threads: 8
2024-11-15T21:03:36.610Z INFO  [lepton_jpeg::structs::lepton_file_writer] worker threads 9632ms of CPU time in 1436ms of wall time
2024-11-15T21:03:36.610Z INFO  [lepton_jpeg::structs::lepton_file_writer] decompressing to verify contents
2024-11-15T21:03:38.216Z INFO  [lepton_jpeg_util] compressed input 22171278, output 17324076 bytes (compression = 28.0%)
2024-11-15T21:03:38.216Z INFO  [lepton_jpeg_util] Main thread CPU: 3517ms, Worker thread CPU: 20710 ms, walltime: 3517 ms

 Performance counter stats for 'taskset -c 10 nice -n -20 target/release/lepton_jpeg_util images/img_52MP_7k.jpg images/img_52MP_7k2.lep':

       871 003 064      cache-references                                                        (41,51%)
        86 439 870      cache-misses                     #    9,92% of all cache refs           (41,52%)
    15 795 148 123      cycles                                                                  (41,72%)
       834 608 725      ic_fetch_stall.ic_stall_back_pressure                                        (41,87%)
     1 014 898 692      stalled-cycles-frontend          #    6,43% frontend cycles idle        (42,14%)
    39 107 914 014      instructions                     #    2,48  insn per cycle            
                                                  #    0,03  stalled cycles per insn     (42,43%)
     4 396 207 542      branch-instructions                                                     (42,50%)
       162 547 993      branch-misses                    #    3,70% of all branches             (42,32%)
     5 434 637 214      ic_fetch_stall.ic_stall_any                                             (42,14%)
        45 374 631      ic_fetch_stall.ic_stall_dq_empty                                        (41,99%)
        72 770 962      l2_cache_misses_from_ic_miss                                            (41,75%)
     2 149 017 778      l2_latency.l2_cycles_waiting_on_fills                                        (41,47%)
           184 005      faults                                                                
                 1      migrations                                                            

       3,551824055 seconds time elapsed

       3,224073000 seconds user
       0,324906000 seconds sys

this PR rebased on main:

2024-11-15T21:02:37.776Z INFO  [lepton_jpeg::structs::lepton_file_writer] compressing to Lepton format
2024-11-15T21:02:38.258Z INFO  [lepton_jpeg::structs::lepton_file_writer] Number of threads: 8
2024-11-15T21:02:39.715Z INFO  [lepton_jpeg::structs::lepton_file_writer] worker threads 9853ms of CPU time in 1456ms of wall time
2024-11-15T21:02:39.715Z INFO  [lepton_jpeg::structs::lepton_file_writer] decompressing to verify contents
2024-11-15T21:02:41.327Z INFO  [lepton_jpeg_util] compressed input 22171278, output 17324076 bytes (compression = 28.0%)
2024-11-15T21:02:41.327Z INFO  [lepton_jpeg_util] Main thread CPU: 3550ms, Worker thread CPU: 20941 ms, walltime: 3550 ms

 Performance counter stats for 'taskset -c 10 nice -n -20 target/release/lepton_jpeg_util images/img_52MP_7k.jpg images/img_52MP_7k2.lep':

       871 415 376      cache-references                                                        (41,78%)
        81 979 646      cache-misses                     #    9,41% of all cache refs           (41,60%)
    15 921 809 354      cycles                                                                  (41,64%)
       927 421 935      ic_fetch_stall.ic_stall_back_pressure                                        (41,85%)
     1 064 411 574      stalled-cycles-frontend          #    6,69% frontend cycles idle        (42,04%)
    38 932 713 731      instructions                     #    2,45  insn per cycle            
                                                  #    0,03  stalled cycles per insn     (42,18%)
     4 399 352 404      branch-instructions                                                     (42,15%)
       163 853 521      branch-misses                    #    3,72% of all branches             (42,07%)
     5 490 454 599      ic_fetch_stall.ic_stall_any                                             (42,14%)
        40 739 513      ic_fetch_stall.ic_stall_dq_empty                                        (42,01%)
        67 832 885      l2_cache_misses_from_ic_miss                                            (41,87%)
     2 175 383 646      l2_latency.l2_cycles_waiting_on_fills                                        (41,81%)
           184 003      faults                                                                
                 1      migrations                                                            

       3,585066645 seconds time elapsed

       3,248145000 seconds user
       0,333706000 seconds sys
