Game of Life Optimizations #181

Merged: 8 commits, Nov 16, 2024

Brandon502

Game of Life was struggling on large setups, so I changed the algorithm to get much faster update speeds at the cost of memory: 2 bytes per cell instead of 2 bits. Previously, on my 128x64 panel I could get around 35 fps at most, and usually much lower depending on how many cells were alive. Now it can exceed 100 fps, averaging around 80 fps with blur on and 100+ fps without blur. @troyhacks also tested a previous version of this and saw significant improvements on his 192x96 display.

CRC16 is no longer used and can maybe be removed. On large setups CRC had quite a few false positive triggers, so I stored previous status in the new struct since I had plenty of extra bits free.

color_blend doesn't always blend color1 completely to color2, not sure if this is intended, but I used this to fix it, since I needed it for speed improvements.

uint32_t blended = color_blend(cellColor, bgColor, blur);
if (blended == cellColor) blended = bgColor; // color_blend fix

To set the initial cells alive I noticed random16() seemed to produce long vertical lines pretty frequently. I switched to esp_random() for esp32s and it seems much better. This change can be reverted if needed.

If mirror or transposed is toggled a new game starts. Cell struct stores neighbor counts and if it is an edge cell. Mirror/Transpose break these values, you could either use more code to recalculate or just reset to a new game. I chose the latter.
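
To illustrate the idea of caching neighbor counts in the cell itself, here is a minimal, self-contained sketch. It is not the code from this PR: `toggleStatus` and `superDead` are names that appear in the PR, but the other field names, the bit layout, the always-wrapping borders, and the simplified `setCell(x, y, alive)` signature are assumptions made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical one-byte cell. toggleStatus and superDead are names used in
// this PR; the other field names and the exact bit layout are assumptions.
struct Cell {
  uint8_t alive        : 1; // current state
  uint8_t neighbors    : 4; // cached live-neighbor count, 0..8
  uint8_t edge         : 1; // border flag (not used in this sketch)
  uint8_t toggleStatus : 1; // state flips at the end of this generation
  uint8_t superDead    : 1; // effect-specific flag (not modeled here)
};

struct Grid {
  int cols, rows;
  std::vector<Cell> cells; // value-initialized: all bits start at zero

  Grid(int c, int r) : cols(c), rows(r), cells(static_cast<std::size_t>(c) * r) {}

  Cell &at(int x, int y) { return cells[static_cast<std::size_t>(y) * cols + x]; }

  // Flip a cell and push the change into the cached counts of its eight
  // neighbors (wrapping at the borders), so nothing is recounted later.
  void setCell(int x, int y, bool alive) {
    Cell &c = at(x, y);
    if (static_cast<bool>(c.alive) == alive) return;
    c.alive = alive;
    const int delta = alive ? 1 : -1;
    for (int dy = -1; dy <= 1; ++dy) {
      for (int dx = -1; dx <= 1; ++dx) {
        if (dx == 0 && dy == 0) continue;
        Cell &n = at((x + dx + cols) % cols, (y + dy + rows) % rows);
        n.neighbors = static_cast<uint8_t>(n.neighbors + delta);
      }
    }
  }

  // One generation in two loops: decide from the cached counts, then apply.
  void step() {
    for (Cell &c : cells) {
      const bool next = c.alive ? (c.neighbors == 2 || c.neighbors == 3)
                                : (c.neighbors == 3);
      c.toggleStatus = (next != static_cast<bool>(c.alive));
    }
    for (int y = 0; y < rows; ++y) {
      for (int x = 0; x < cols; ++x) {
        Cell &c = at(x, y);
        if (!c.toggleStatus) continue;
        c.toggleStatus = 0;
        setCell(x, y, !c.alive);
      }
    }
  }
};
```

The trade-off is the one described above: each state change costs eight small updates, but a generation no longer has to recount neighbors for every cell, which tends to pay off when most of the grid is unchanged from frame to frame.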

Few misc bug fixes. All features work the same or better than before now.

Comparison using esp32_4MB_V4_S:

20241109_162450_1.mp4

Uses more memory to achieve much higher framerates on large setups. Neighbor counts are stored instead of constantly recalculated. CRC is no longer used for repeat detection so false positives are no longer possible.
Use defined(ARDUINO_ARCH_ESP32)
getNeighborIndexes loop changes
offsets use int8_t
change prevRows/Cols to uint16
bool allColors = SEGMENT.check1;
bool overlayBG = SEGMENT.check2;
bool wrap = SEGMENT.check3;
bool bgBlendMode = SEGMENT.custom1 > 220 && !overlayBG; // if blur is high and not overlaying, use bg blend mode
- byte blur = bgBlendMode ? map2(SEGMENT.custom1 - 220, 0, 35, 255, 128) : map2(SEGMENT.custom1, 0, 255, 255, 0);
+ byte blur = overlayBG ? 255 : bgBlendMode ? map2(SEGMENT.custom1 - 220, 0, 35, 255, 128) : map2(SEGMENT.custom1, 0, 220, 255, 10);
Collaborator

@softhack007 softhack007 Nov 14, 2024

this might be the source of your problem with color_blend 🤔
map is allowed to "overshoot", so when you have an input outside the range, you'll also get an out-of-range result.
this might happen if overlayBG is true and custom1 > 220.

--> map2 produces something that's not in [0...255], and then casting to byte does "result & 0xFF".
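
A small sketch of the overshoot described here, assuming `map2` follows the usual Arduino-style `map()` formula (integer math, no clamping). The helper names `mapLinear` and `mapClamped` are made up for this example, and the real helper may behave differently.

```cpp
#include <cstdint>

// Arduino-style linear remap (integer math, no clamping). Assumed here to be
// what map2 does; the real helper may differ.
static long mapLinear(long x, long inMin, long inMax, long outMin, long outMax) {
  return (x - inMin) * (outMax - outMin) / (inMax - inMin) + outMin;
}

// Same remap, but the input is clamped first so the result stays in range.
static long mapClamped(long x, long inMin, long inMax, long outMin, long outMax) {
  if (x < inMin) x = inMin;
  if (x > inMax) x = inMax;
  return mapLinear(x, inMin, inMax, outMin, outMax);
}
```

With the ranges from the line under review, `mapLinear(255, 0, 220, 255, 10)` evaluates to -28, which becomes 228 once stored in a `byte`, while `mapClamped(255, 0, 220, 255, 10)` stays at 10.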

Author

I made a quick test to show the results using this code:

  static uint32_t prevColor1, prevColor2;
  static int prevBlend;
  uint32_t colorTest = SEGMENT.color_from_palette(0, false, PALETTE_SOLID_WRAP, 0);
  if (prevColor1 != colorTest || prevColor2 != bgColor || prevBlend != SEGMENT.custom1) {
    prevColor1 = colorTest;
    prevColor2 = bgColor;
    prevBlend = SEGMENT.custom1;
    printf("Blending C1: %d C2: %d Blend: %d\n", colorTest, bgColor, SEGMENT.custom1);
    uint32_t result = colorTest;
    for (int i = 0; i < 255; i++) {
      result = color_blend(result, prevColor2, SEGMENT.custom1);
      printf("Color %2d: %12d -> %d\n", i, result, bgColor);
      if (result == bgColor) break; 
    }
  }

Test blending blue into black:

Blending C1: 255 C2: 0 Blend: 128
Color  0:          126 -> 0
Color  1:           62 -> 0
Color  2:           30 -> 0
Color  3:           14 -> 0
Color  4:            6 -> 0
Color  5:            2 -> 0
Color  6:            0 -> 0

Test blending blue into white:

Blending C1: 255 C2: 16777215 Blend: 128
Color  0:      8355838 -> 16777215
Color  1:     12500733 -> 16777215
Color  2:     14540285 -> 16777215
Color  3:     15592957 -> 16777215
Color  4:     16119293 -> 16777215
Color  5:     16382461 -> 16777215
Color  6:     16514045 -> 16777215
Color  7:     16579837 -> 16777215
Color  8:     16579837 -> 16777215
Color  9:     16579837 -> 16777215
Color 10:     16579837 -> 16777215

Into white, it gets stuck on 16579837 and never fully blends into white. My fix was to check whether color1 changed after blending; if it didn't, snap directly to color2.
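
For reference, the stall can be reproduced outside WLED with a per-channel formula that matches the numbers in the test output above. This is a sketch under the assumption that the old `color_blend` computes `(c2 * blend + c1 * (255 - blend)) >> 8` per channel; the function names are made up for the example and the code is not copied from WLED.

```cpp
#include <cstdint>

// Per-channel blend that reproduces the numbers in the test output above.
// Assumed to mirror the old color_blend math; not copied from WLED.
static uint8_t blendChannelOld(uint8_t c1, uint8_t c2, uint8_t blend) {
  return static_cast<uint8_t>((c2 * blend + c1 * (255 - blend)) >> 8);
}

static uint32_t blendOld(uint32_t color1, uint32_t color2, uint8_t blend) {
  uint32_t out = 0;
  for (int shift = 0; shift < 32; shift += 8) { // B, G, R, W channels
    const uint8_t a = (color1 >> shift) & 0xFF;
    const uint8_t b = (color2 >> shift) & 0xFF;
    out |= static_cast<uint32_t>(blendChannelOld(a, b, blend)) << shift;
  }
  return out;
}

// Workaround from this PR: if blending made no progress, snap to the target.
static uint32_t blendWithSnap(uint32_t color1, uint32_t color2, uint8_t blend) {
  const uint32_t blended = blendOld(color1, color2, blend);
  return (blended == color1) ? color2 : blended;
}
```

The two weights add up to 255 but the sum is divided by 256 and truncated, so the result is biased downward. At 252 the channel computes (255 * 128 + 252 * 127) >> 8 = 252 again, which is why 16579837 (0xFCFCFD) never moves, while blending toward black keeps shrinking until it reaches 0.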

Collaborator

@Brandon502 indeed this looks unexpected.
I'll cross-check in the next days, maybe you found something.

Thanks 👍

Collaborator

@softhack007 softhack007 Nov 15, 2024

@Brandon502 looks like our color_blend - and the same code from upstream - has major rounding errors.

We seem to use an old version of the function that was taken from FastLED
https://github.com/FastLED/FastLED/blob/6d913bccd5a2cfd15a87115f26dd2ecdd7cab92f/src/colorutils.cpp#L287
"Old blend method which unfortunately had some rounding errors"

FastLed "blend8" was corrected in the meantime
https://github.com/FastLED/FastLED/blob/6d913bccd5a2cfd15a87115f26dd2ecdd7cab92f/src/lib8tion/math8.h#L592-L596

So one option would be to replace the homebrew code in color_blend with r3 = blend8(r1, r2, blend); (same for g, b, w).
I'll check tomorrow if using the blend8() function has any negative impact on performance.

Collaborator

@Brandon502 I have updated color_blend, basically copying the logic from FastLED blend8().

6ef0578

I've done some unit tests to confirm the new function is more accurate, and better than the old one for 8bit blends.

Could you repeat your own tests, and also check if the workaround in GOL is still needed?
(upstream still has the buggy blend function, so maybe just comment out your workaround if not needed any more in MM)

Author

@softhack007 The new blend function does seem more accurate, but it still doesn't always converge to color2, depending on the colors. So the fix is still needed.
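
One possible way to see why a snap can still be needed, sketched under a loud assumption: that the updated function uses blend8-style weights of the form `(a * (256 - amount) + b * (amount + 1)) >> 8`. This is not the MoonModules or FastLED source, and the helper names are made up for the example.

```cpp
#include <cstdint>

// blend8-style weights (assumption: the updated color_blend uses math of this
// shape; this is not the MoonModules or FastLED source).
static uint8_t blendChannelNew(uint8_t a, uint8_t b, uint8_t amount) {
  const uint32_t partial = a * (256u - amount) + b * (amount + 1u);
  return static_cast<uint8_t>(partial >> 8);
}

// Number of repeated blends until `from` equals `to`, or -1 if it stalls.
static int stepsToReach(uint8_t from, uint8_t to, uint8_t amount) {
  for (int i = 1; i <= 255; ++i) {
    const uint8_t next = blendChannelNew(from, to, amount);
    if (next == to) return i;
    if (next == from) return -1; // no progress: stuck short of the target
    from = next;
  }
  return -1;
}
```

Under that assumption, a channel at 252 reaches 255 in two steps at amount 128, but a channel heading for 100 stalls at 99 because the integer result is still truncated. Whether the real function behaves the same way depends on its exact rounding.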

Removed future status/neighbors. Uses 2 loops to set cells. Shifting from future to current no longer needed.
@softhack007
Collaborator

softhack007 commented Nov 16, 2024

@Brandon502 looks good - tested on 128x64 and the effect achieves 80-120 fps depending on number of active cells.
I think the 1-byte per cell version is good enough, as effect memory is very small by default.

Tested OK, neither clang-tidy nor cppcheck had any major comments on code quality.

Just tell me once you're happy with the changes, and I'll merge into mdev prior to our upcoming release.

Use superDead correctly with bgBlendMode
@Brandon502
Author

Yeah, the 1-byte version has pretty similar performance after optimizing it, so I agree it should be used. Just pushed one last change that speeds up games that use bgBlendMode.

The only bug currently is that when reverse or transpose (square grids) is toggled, all colors are lost. Not a huge deal, but depending on the background color it can blank the display until a new game starts.

@softhack007 softhack007 merged commit e4902b8 into MoonModules:mdev Nov 16, 2024
39 checks passed
@softhack007
Collaborator

softhack007 commented Nov 18, 2024

Hi @Brandon502,
we might need another solution for randomizing the initial grid generation ...

@troyhacks has seen delays that bring his new Art-Net driver out of sync when a new grid is generated.

https://github.com/MoonModules/WLED/blob/fb259d1bc6b77a3e6b664e4d12007c336ce94a32/wled00/FX.cpp#L5378-L5384


It looks like esp_random() is causing the slowdown - replacing it with the line for 8266 helps.

A few other options that come to my mind
a) use the libc functions srand() and rand() - they are faster than esp_random(): setAlive = (rand() & 0xFF) < 86;
b) keep using random16, however for every row (outer for loop) add some randomness with random16_add_entropy(esp_random() & 0xFFFF);

@troyhacks
Collaborator

troyhacks commented Nov 18, 2024

Yeah, it's mostly just delays in how long it takes to init the cells at startup.

This also seems reasonably fast:

uint32_t xorshift32_state = random(16);
const uint32_t threshold = (UINT32_MAX / 3);
for (unsigned y = 0; y < rows; ++y) for (unsigned x = 0; x < cols; ++x, ++cIndex) {
  #if defined(ARDUINO_ARCH_ESP32)
    // bool setAlive = esp_random() < 1374389534; // ~32%
    // bool setAlive = random16(100) < 32;
    // bool setAlive = (rand() & 0xFF) < 86;
    // bool setAlive = random8(3);
    uint32_t myrand = xorshift32_state; 
    myrand ^= myrand << 13;
    myrand ^= myrand >> 17;
    myrand ^= myrand << 5;
    xorshift32_state = myrand;
    bool setAlive = (myrand < threshold);
  #else
    bool setAlive = random16(100) < 32;
  #endif
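
One property worth keeping in mind with this approach: an xorshift generator never leaves the all-zero state, so a zero seed makes every draw 0 and every cell pass a `< threshold` test. Arduino's `random(max)` returns values in [0, max), so `random(16)` can return 0. The sketch below uses the same 13/17/5 shifts as the snippet above; `nonZeroSeed` and `countAlive` are hypothetical helpers added for illustration, and the fallback constant is arbitrary.

```cpp
#include <cstdint>

// One xorshift32 step, using the same 13/17/5 shifts as the snippet above.
static uint32_t xorshift32(uint32_t x) {
  x ^= x << 13;
  x ^= x >> 17;
  x ^= x << 5;
  return x;
}

// Hypothetical helper: make sure the generator never starts from zero.
static uint32_t nonZeroSeed(uint32_t seed) {
  return seed != 0 ? seed : 0x9E3779B9u; // arbitrary non-zero fallback
}

// Count how many of `cells` draws fall below the ~1/3 threshold.
static unsigned countAlive(uint32_t seed, unsigned cells) {
  const uint32_t threshold = UINT32_MAX / 3;
  uint32_t state = seed;
  unsigned alive = 0;
  for (unsigned i = 0; i < cells; ++i) {
    state = xorshift32(state);
    if (state < threshold) ++alive;
  }
  return alive;
}
```

Calling `countAlive(0, n)` returns `n`, a fully lit grid, whereas `countAlive(nonZeroSeed(0), n)` starts from a usable state. Seeding from a wider source than a 0..15 value would also give more variety between games.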

@troyhacks
Collaborator

Currently the pauses are enough to cause my external Art-Net controllers to lose sync for a moment... and also big pauses before it starts up in general:

Whatsapp.Video.2024-11-18.At.8.03.11.Am.mp4

@Brandon502
Author

@softhack007 @troyhacks I tested out all your options and they all seemed better than just using random16(). I put the two most similar to the original below, but any of your options are fine with me. The shifting method is neat, but when I was testing it, the alive chance sometimes seemed to shoot up to ~95%, lighting up most of the grid, which then dies instantly.

Add entropy:

    unsigned cIndex = 0;
    for (unsigned y = 0; y < rows; ++y) {
      #if defined(ARDUINO_ARCH_ESP32)
        random16_add_entropy(esp_random() & 0xFFFF);
      #endif
      for (unsigned x = 0; x < cols; ++x, ++cIndex) {
        if ((random16() & 0xFF) < 82) { // ~32%
          grid.setCell(cIndex, x, y, true, wrap);
          cells[cIndex].toggleStatus = 1; // Used to set initial color
        }
        else cells[cIndex].superDead = 1;
      }
    }

Use rand():

   unsigned cIndex = 0;
   for (unsigned y = 0; y < rows; ++y) for (unsigned x = 0; x < cols; ++x, ++cIndex) {
     #if defined(ARDUINO_ARCH_ESP32)
       bool setAlive = (rand() & 0xFF) < 82; // ~32%
     #else
       bool setAlive = (random16() & 0xFF) < 82; // ~32%
     #endif
     if (setAlive) {
       grid.setCell(cIndex, x, y, true, wrap);
       cells[cIndex].toggleStatus = 1; // Used to set initial color
     }
     else cells[cIndex].superDead = 1;
   }
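
The thresholds used in these snippets follow from simple arithmetic: for a uniform value in [0, range), `value < t` is true with probability t / range. A small sketch of that calculation, with hypothetical helper names:

```cpp
#include <cstdint>

// Threshold t such that `value < t` holds for roughly `probability` of the
// `range` equally likely values (range = 256 for `& 0xFF`, 2^32 for 32 bits).
static uint32_t thresholdFor(double probability, double range) {
  return static_cast<uint32_t>(probability * range + 0.5); // round to nearest
}

// Actual chance that `value < threshold` for a uniform value in [0, range).
static double chanceFor(uint32_t threshold, double range) {
  return threshold / range;
}
```

For example, `thresholdFor(0.32, 256.0)` gives 82, and 82 / 256 is about 32.0%, while the earlier suggestion of 86 corresponds to about 33.6%. This assumes the masked low byte of the generator is close to uniform.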

@softhack007
Collaborator

softhack007 commented Nov 18, 2024

@Brandon502 thanks for your quick response:-)

Actually, the "Add entropy" solution seems better to me, since it does not rely only on software pseudo-randomness.

@troyhacks it would be good to know whether both solutions keep your Art-Net hardware in sync.

@troyhacks
Collaborator

troyhacks commented Nov 19, 2024

Both were equally fast to my eye, with no issues with startup delay or Art-Net output.

Pushed the one @softhack007 picked via #189
