Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slow first start/initialization #340

Open
wickles opened this issue Jul 2, 2023 · 8 comments
Open

Slow first start/initialization #340

wickles opened this issue Jul 2, 2023 · 8 comments

Comments

@wickles
Copy link

wickles commented Jul 2, 2023

ble version: 0.4.0-devel4+f16c0d8
Bash version: 5.2.15(1)-release (aarch64-apple-darwin22.1.0)

My login shell is zsh but I've been playing with BLE by starting bash from zsh. On first starting bash with BLE it takes quite a long time to load the first time, like over a second. Then if I exit bash and start again it is quicker (due to caching?) but still noticeably slower than I would expect. When I start bash without my user config (without BLE) it is instantaneous.

I looked around and didn't find any documentation or discussion about initialization performance so I'm not sure if this is expected behavior. I'm also not exactly sure how to benchmark the startup time. Any ideas for benchmarking and improving the slow startup time? Might look at https://github.com/romkatv/zsh-bench for inspiration.

@akinomyoga
Copy link
Owner

akinomyoga commented Jul 2, 2023

The startup time depends on the device and also on the operating system. It is typically about 100-400 msec in a Linux environment. Based on the reports I have received so far, the performance in macOS seems to be slower than Linux, but I haven't actually tested it in macOS as I don't have it.

so I'm not sure if this is expected behavior.

This is expected. As you suspected, the startup for the first time is slower due to caching, specifically for the caching of the terminal sequences and keymaps. After the first-time initialization, it shouldn't take so much time, but it is still slower than plain Bash. If you request zero cost from ble.sh, it's impossible.

I'm also not exactly sure how to benchmark the startup time.

There are two phases in the initialization. The first one is the loading phase (source ble.sh), and the other is the attaching phase (ble-attach).

If you would like to see the initialization time measurements for the first phase, you can change the following line of the source file ble.pp and rebuild ble.sh by running make:

diff --git a/ble.pp b/ble.pp
index 00e85a32..b05b8d6d 100644
--- a/ble.pp
+++ b/ble.pp
@@ -1,7 +1,7 @@
 #!/bin/bash
 #%$> out/ble.sh
 #%[release = 0]
-#%[measure_load_time = 0]
+#%[measure_load_time = 1]
 #%[debug_keylogger = 1]
 #%[leakvar = ""]
 #%#----------------------------------------------------------------------------

If you would like to profile the second phase, you can modify ble-attach in bashrc in the following way:

# at the end of bashrc
if [[ ${BLE_VERSION-} ]]; then
  ble/debug/profiler/start a
  ble-attach
  ble/debug/profiler/stop
fi

Then, two files a.func.txt and a.line.txt will be created after starting a ble.sh session. For example, to see the most time-consuming functions, you can sort a.func.txt by column 6:

$ sort -rk 6 a.func.txt | head -20

It should be noted that the time measurements and profiler make the startup even slower because the processing of time measurements and profiling also takes time.

@akinomyoga
Copy link
Owner

To measure the time of the first-time startup, you can clear the cache before loading ble.sh using the following command

$ bash /path/to/ble.sh --clear-cache

@wickles
Copy link
Author

wickles commented Jul 3, 2023

Perfect, thank you. Here are my results.

Loading phase

Without cache.

❯ bash
ble.sh: 1688351168.895969 load start
ble.sh: 1688351168.932548 parsed
[Elapsed 0.007s; CPU U:0.003s S:0.004s (96.76%)] ble.pp/prologue (2 forks)
[Elapsed 0.002s; CPU U:0.001s S:0.001s (97.74%)] src/def.sh (0 forks)
ble/term.sh: updating tput cache for TERM=xterm-256color... done
[Elapsed 0.173s; CPU U:0.045s S:0.107s (87.92%)] src/util.sh (83 forks)
[Elapsed 0.007s; CPU U:0.005s S:0.002s (98.92%)] src/decode.sh (0 forks)
[Elapsed 0.003s; CPU U:0.002s S:0.001s (97.50%)] src/color.sh (0 forks)
[Elapsed 0.012s; CPU U:0.010s S:0.002s (99.21%)] src/canvas.sh (0 forks)
[Elapsed 0.004s; CPU U:0.002s S:0.002s (98.45%)] src/history.sh (0 forks)
[Elapsed 0.012s; CPU U:0.009s S:0.003s (98.67%)] src/edit.sh (0 forks)
[Elapsed 0.002s; CPU U:0.000s S:0.001s (95.09%)] lib/core-cmdspec-def.sh (0 forks)
[Elapsed 0.003s; CPU U:0.002s S:0.001s (96.47%)] lib/core-syntax-def.sh (0 forks)
[Elapsed 0.006s; CPU U:0.003s S:0.002s (97.63%)] lib/core-complete-def.sh (0 forks)
[Elapsed 0.001s; CPU U:0.000s S:0.001s (95.98%)] lib/core-debug-def.sh (0 forks)
[Elapsed 0.002s; CPU U:0.001s S:0.001s (96.05%)] contrib/integration/bash-preexec-def.sh (0 forks)
[Elapsed 0.002s; CPU U:0.000s S:0.001s (95.25%)] ble.pp/epilogue (0 forks)
[Elapsed 0.003s; CPU U:0.002s S:0.001s (96.79%)] blerc: '/Users/me/.blerc' (0 forks)
[Elapsed 0.256s; CPU U:0.097s S:0.136s (91.17%)] Total
ble.sh: 1688351169.191618 load end

With cache.

❯ bash
ble.sh: 1688351445.087609 load start
ble.sh: 1688351445.121589 parsed
[Elapsed 0.007s; CPU U:0.003s S:0.004s (97.49%)] ble.pp/prologue (2 forks)
[Elapsed 0.001s; CPU U:0.001s S:0.001s (98.80%)] src/def.sh (0 forks)
[Elapsed 0.020s; CPU U:0.011s S:0.008s (93.65%)] src/util.sh (2 forks)
[Elapsed 0.007s; CPU U:0.005s S:0.002s (98.89%)] src/decode.sh (0 forks)
[Elapsed 0.003s; CPU U:0.002s S:0.001s (97.28%)] src/color.sh (0 forks)
[Elapsed 0.012s; CPU U:0.010s S:0.002s (99.21%)] src/canvas.sh (0 forks)
[Elapsed 0.004s; CPU U:0.002s S:0.001s (98.20%)] src/history.sh (0 forks)
[Elapsed 0.012s; CPU U:0.008s S:0.003s (99.23%)] src/edit.sh (0 forks)
[Elapsed 0.002s; CPU U:0.000s S:0.001s (96.25%)] lib/core-cmdspec-def.sh (0 forks)
[Elapsed 0.003s; CPU U:0.002s S:0.001s (97.15%)] lib/core-syntax-def.sh (0 forks)
[Elapsed 0.005s; CPU U:0.003s S:0.002s (98.23%)] lib/core-complete-def.sh (0 forks)
[Elapsed 0.002s; CPU U:0.000s S:0.001s (95.41%)] lib/core-debug-def.sh (0 forks)
[Elapsed 0.003s; CPU U:0.001s S:0.001s (96.66%)] contrib/integration/bash-preexec-def.sh (0 forks)
[Elapsed 0.002s; CPU U:0.000s S:0.001s (96.13%)] ble.pp/epilogue (0 forks)
[Elapsed 0.003s; CPU U:0.002s S:0.001s (96.86%)] blerc: '/Users/me/.blerc' (0 forks)
[Elapsed 0.103s; CPU U:0.063s S:0.037s (97.27%)] Total
ble.sh: 1688351445.227942 load end

In this phase it takes ~0.15 sec longer to load while building the cache.

Attaching phase

Without cache.

❯ sort -rk 6 a.func.txt | head -20
#  count  subcall  allcall total_msec TOTAL%  self_msec SELF%  child_msec   max_msec   max_self  max_child FUNCNAME (SOURCE)
   16781    32282    50683    575.116 36.87%    136.847 8.774%    438.269     13.479      0.034     13.445 ble-bind (/Users/me/.local/share/blesh/ble.sh)
   17092     7281    11047    226.731 14.54%    129.166 8.282%     97.565      0.238      0.028      0.210 ble-decode-kbd (/Users/me/.local/share/blesh/ble.sh)
    9295     1350     1350     85.353 5.472%     74.375 4.769%     10.978      0.099      0.009      0.090 ble-decode-char/bind (/Users/me/.local/share/blesh/ble.sh)
    8092     8100    13648    199.840 12.81%     73.895 4.738%    125.945      0.633      0.013      0.620 ble/unicode/GraphemeCluster/match (/Users/me/.local/share/blesh/ble.sh)
    7196     4450     5410    117.787 7.552%     68.370 4.384%     49.417      0.259      0.011      0.248 ble/unicode/GraphemeCluster/s2break-right (/Users/me/.local/share/blesh/ble.sh)
    7836        0        0     66.983 4.295%     66.983 4.295%      0.000      1.978      1.978      0.000 ble/string#split-words (/Users/me/.local/share/blesh/ble.sh)
     143      108      108     82.233 5.272%     62.260 3.992%     19.973      8.705      8.705      0.000 ble/builtin/bind/read-user-settings/.collect (/Users/me/.local/share/blesh/ble.sh)
    5411     2152     2152     74.495 4.776%     48.863 3.133%     25.632      2.037      0.010      2.027 ble-decode-key/bind (/Users/me/.local/share/blesh/ble.sh)
    4276     9606    27998    310.190 19.89%     44.956 2.882%    265.234      1.652      0.025      1.627 ble/canvas/trace/.impl (/Users/me/.local/share/blesh/ble.sh)
    4107     2738     2738     62.063 3.979%     38.724 2.483%     23.339      0.069      0.015      0.054 ble/util/s2c (/Users/me/.local/share/blesh/ble.sh)
    1582        0        0     38.286 2.455%     38.286 2.455%      0.000      0.603      0.603      0.000 ble/util/print (/Users/me/.local/share/blesh/ble.sh)
      32        2        2     34.785 2.230%     34.757 2.228%      0.028      8.463      8.463      0.000 starship_precmd (/dev/fd/63)
    3584     5231    19942    202.441 12.98%     31.937 2.048%    170.504     33.156      0.018     33.138 ble/decode/cmap/.generate-binder-template (/Users/me/.local/share/blesh/ble.sh)
      12        4        4     31.361 2.011%     31.361 2.011%      0.000      8.978      8.978      0.000 ble/util/declare-print-definitions (/Users/me/.local/share/blesh/ble.sh)
     663      371      764    231.256 14.83%     28.536 1.830%    202.720     49.500      0.012     49.488 ble/util/assign (/Users/me/.local/share/blesh/ble.sh)
    3220       30       34     27.745 1.779%     27.401 1.757%      0.344      0.186      0.011      0.175 ble/util/is-unicode-output (/Users/me/.local/share/blesh/ble.sh)
    3347     3351     5923    105.864 6.788%     25.188 1.615%     80.676      4.475      0.014      4.461 ble/init:bind/generate-binder (/Users/me/.local/share/blesh/lib/init-bind.sh)
       4        0        0     24.514 1.572%     24.514 1.572%      0.000     12.313     12.313      0.000 ble/util/reset-keymap-of-editing-mode (/Users/me/.local/share/blesh/ble.sh)
    2924        0        0     22.404 1.436%     22.404 1.436%      0.000      0.098      0.098      0.000 ble/canvas/trace/.implicit-move (/Users/me/.local/share/blesh/ble.sh)

❯ awk '{sum+=$6;}END{print sum;}' a.func.txt
1559.69

With cache.

❯ sort -rk 6 ble-log-cache/a.func.txt | head -20
#  count  subcall  allcall total_msec TOTAL%  self_msec SELF%  child_msec   max_msec   max_self  max_child FUNCNAME (SOURCE)
    4046     4050     6824    102.070 32.30%     37.749 11.95%     64.321      0.599      0.013      0.586 ble/unicode/GraphemeCluster/match (/Users/me/.local/share/blesh/ble.sh)
    3598     2225     2705     60.113 19.02%     35.120 11.11%     24.993      0.256      0.010      0.246 ble/unicode/GraphemeCluster/s2break-right (/Users/me/.local/share/blesh/ble.sh)
    2138     4803    13999    159.874 50.59%     23.491 7.433%    136.383      1.652      0.026      1.626 ble/canvas/trace/.impl (/Users/me/.local/share/blesh/ble.sh)
      16        1        1     17.550 5.553%     17.539 5.550%      0.011      8.338      8.338      0.000 starship_precmd (/dev/fd/63)
    1462        0        0     11.762 3.722%     11.762 3.722%      0.000      0.067      0.067      0.000 ble/canvas/trace/.implicit-move (/Users/me/.local/share/blesh/ble.sh)
     320      180      369     49.102 15.54%     11.052 3.497%     38.050     19.685      0.011     19.674 ble/util/assign (/Users/me/.local/share/blesh/ble.sh)
       2        0        0     10.784 3.412%     10.784 3.412%      0.000     10.777     10.777      0.000 ble/util/reset-keymap-of-editing-mode (/Users/me/.local/share/blesh/ble.sh)
     639      150      899     21.270 6.731%     10.491 3.320%     10.779      4.682      0.023      4.659 source (/Users/me/.cache/blesh/0.4/decode.bind.50215.UTF-8.bind)
       3        0        0     10.402 3.292%     10.402 3.292%      0.000     10.402     10.402      0.000 ble/bin/awk (/Users/me/.local/share/blesh/ble.sh)
    1025        0        0      8.938 2.828%      8.938 2.828%      0.000      0.027      0.027      0.000 ble/unicode/GraphemeCluster/c2break (/Users/me/.local/share/blesh/ble.sh)
     726      484      484     11.974 3.789%      7.573 2.396%      4.401      0.079      0.053      0.026 ble/util/s2c (/Users/me/.local/share/blesh/ble.sh)
       3        0        0      6.769 2.142%      6.769 2.142%      0.000      2.595      2.595      0.000 ble/bin/stty (/Users/me/.local/share/blesh/ble.sh)
     725       15       17      6.683 2.115%      6.503 2.058%      0.180      0.190      0.010      0.180 ble/util/is-unicode-output (/Users/me/.local/share/blesh/ble.sh)
     244        0        0      5.723 1.811%      5.723 1.811%      0.000      0.077      0.077      0.000 ble/function#suppress-stderr:ble/util/isprint+ (/Users/me/.local/share/blesh/ble.sh)
     503      408     1011     20.052 6.345%      5.084 1.609%     14.968      1.769      0.029      1.740 ble/builtin/trap/.handler (/Users/me/.local/share/blesh/ble.sh)
      20        0        0      4.755 1.505%      4.755 1.505%      0.000      1.547      1.547      0.000 ble/base/xtrace/adjust (/Users/me/.local/share/blesh/ble.sh)
      55       48       48      7.333 2.320%      4.748 1.502%      2.585      1.627      1.627      0.000 ble/builtin/bind/read-user-settings/.collect (/Users/me/.local/share/blesh/ble.sh)
     476     1546     1546     17.383 5.501%      4.554 1.441%     12.829      0.196      0.012      0.184 ble/canvas/trace/.put-atomic.draw (/Users/me/.local/share/blesh/ble.sh)
       1        0        0      4.155 1.315%      4.155 1.315%      0.000      4.155      4.155      0.000 ble/bin/wc (/Users/me/.local/share/blesh/ble.sh)

❯ awk '{sum+=$6;}END{print sum;}' a.func.txt
316.021

In this phase many parts take much longer without the cache. I can give sum totals once I remember how to use awk.

(You may notice starship_precmd in here since I am using starship. I have tried without it and it did not make a noticeable difference.)

Not sure if there is anything actionable here for me. Maybe something will stand out to you.

@akinomyoga
Copy link
Owner

What is your request: to speed up the startup when there is an existing cache, or to speed up the cache creation? If it is the latter, I will not try to improve the performance of the cache creation because the cached part is what is hard to improve the performance. If the performance could have been improved, that part wouldn't have been cached but would have been processed directly.

@wickles
Copy link
Author

wickles commented Jul 5, 2023

I have added the sums for the binding part. In total my startup times are roughly ~1800ms without cache and ~420ms with cache. This is on an M1 macbook pro which I would think should be competitive with the best results you have seen.

The startup time depends on the device and also on the operating system. It is typically about 100-400 msec in a Linux environment. Based on the reports I have received so far, the performance in macOS seems to be slower than Linux, but I haven't actually tested it in macOS as I don't have it.

Is the 100-400 ms time you cite here for cached start? Either way it seems like these results are not really in line with expectations. I guess I am wondering if you notice any functions taking particularly long here that might be running especially poorly on mac for whatever reason.

@akinomyoga
Copy link
Owner

Sorry for the delayed reply.

I have added the sums for the binding part. In total my startup times are roughly ~1800ms without cache and ~420ms with cache. This is on an M1 macbook pro which I would think should be competitive with the best results you have seen.

I think this is the normal range of the delay, although I think 420ms is a bit slower than my experience in Linux but the order is the same.

Is the 100-400 ms time you cite here for cached start?

Yes.

I guess I am wondering if you notice any functions taking particularly long here that might be running especially poorly on mac for whatever reason.

I don't see any significant difference in a particular function. However, the prompt construction and layout seem to take a longer time as a whole with your setup. One possibility is that your starship prompt produces relatively long prompt strings.

@Darukutsu
Copy link

I don't understand from issue if startup should be bellow 1 sec or not for cached... What's the average startup for time bash?

these results are cached on Intel i7-8550U (4) @ 4.000GHz

 $  awk '{sum+=$6}END{print sum}' a.func.txt 
434.295
 $  awk '{sum+=$6}END{print sum}' a.line.txt 
929.907

@akinomyoga
Copy link
Owner

akinomyoga commented Dec 27, 2024

The time should in general depend on the system and the setup, but it should usually be less than 1 sec in my experience as far as you are on Linux. In my machines (which are already kind of seven years old), I have now measured the time with a plain setting:

  • 177ms (load) + 2ms (Bash) + 23ms (prompt) + 43ms (ready) = 245ms with Bash 5.2 / "Intel(R) Core(TM) i7-7820X CPU @ 3.60GHz"
  • 193ms (load) + 7ms (Bash) + 6ms (prompt) + 58ms (ready) = 264ms with Bash 5.3-alpha / "Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz"
  • edit(added): 321ms (load) + 23ms (Bash) + 56ms (prompt) + 221ms (ready) = 621ms with Bash 4.4 / "Intel(R) Xeon(R) Gold 6248R CPU @ 3.00GHz". This is not my computer, but a frontend node of a scientific supercomputing system.

In the above results, the times with labels show the detailed breakdown of the total time:

  • "load" is the total time for sourcing ble.sh
  • "Bash" is the time consumed by Bash (outside ble.sh), i.e., the time between the end of source ble.sh and the first call of PROMPT_COMMAND by Bash.
  • "prompt" is the time consumed until the first rendering of the prompt by ble.sh.
  • "ready" is the time until starting to accept user input. This time usually does not affect the user experience because it is not observable unless the user starts to input something before this time.

These are measured in pure systems. If you measure it inside a virtual machine or on top of an emulation layer, the initialization becomes slower even with the same CPU.


You should be reminded that the times in a.func.txt and a.line.txt contain the idle time, where ble.sh is waiting for the user's inputs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants