TRLC performance analysis and improvements #43

florianschanda · 2023-10-23T07:30:52Z

The worst offenders for tests-system/bulk are:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   31.589   31.589 trlc/trlc.py:21(<module>)
        1    0.000    0.000   31.571   31.571 trlc/trlc.py:490(main)
        1    0.000    0.000   31.558   31.558 trlc/trlc.py:422(process)
        1    0.000    0.000   27.343   27.343 trlc/trlc.py:364(parse_trlc_files)
       85    0.037    0.000   27.338    0.322 trlc/parser.py:1705(parse_trlc_file)
    63859    0.044    0.000   27.257    0.000 trlc/parser.py:1577(parse_trlc_entry)
    63859    0.887    0.000   27.084    0.000 trlc/parser.py:1530(parse_record_object_declaration)
  1217740    0.433    0.000   19.282    0.000 trlc/parser.py:167(match)
  1217930    0.643    0.000   18.853    0.000 trlc/parser.py:139(advance)
  1217930    3.994    0.000   18.210    0.000 trlc/lexer.py:332(token)
   319609    0.692    0.000   10.996    0.000 trlc/parser.py:1352(parse_value)
  6214992    3.212    0.000    4.665    0.000 trlc/lexer.py:215(is_alnum)
   581360    0.591    0.000    4.360    0.000 trlc/ast.py:3060(lookup_direct)
        1    0.017    0.017    4.212    4.212 trlc/trlc.py:391(resolve_record_references)
    63859    0.144    0.000    4.160    0.000 trlc/ast.py:2873(resolve_references)
   109334    0.077    0.000    4.030    0.000 trlc/ast.py:1063(resolve_references)
    74685    0.032    0.000    3.853    0.000 trlc/ast.py:904(resolve_references)
  9752278    3.774    0.000    3.774    0.000 trlc/lexer.py:238(advance)
       10    0.298    0.030    3.659    0.366 /usr/lib/python3.8/difflib.py:688(get_close_matches)
    94872    0.136    0.000    2.933    0.000 trlc/parser.py:328(parse_qualified_name)
   638740    1.912    0.000    2.813    0.000 /usr/lib/python3.8/difflib.py:647(quick_ratio)
  1217930    0.876    0.000    2.442    0.000 trlc/lexer.py:232(skip_whitespace)
    63859    0.096    0.000    2.229    0.000 trlc/ast.py:2816(__init__)
  1217844    1.020    0.000    1.905    0.000 trlc/lexer.py:71(__init__)
    63859    0.531    0.000    1.846    0.000 trlc/ast.py:2821(<dictcomp>)
  1021744    0.720    0.000    1.316    0.000 trlc/ast.py:546(__init__)
  1217844    0.765    0.000    1.242    0.000 trlc/lexer.py:162(__init__)
  1217844    0.798    0.000    1.188    0.000 trlc/lexer.py:200(is_alpha)

This is not unexpected:

token() is the worst offender with 18s (number crunching)
parse_trlc_files() takes around 9s once you remove the lexing (27.3s - 18.2s), which seems unavoidable
the rest of process() takes around 4s (31.6s - 27.3s), which is entirely due to resolve_record_references (unavoidable, this is work that needs to happen sooner or later)
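For reference, a listing in the same format as the one above (sorted by cumulative time) can be produced with the standard cProfile and pstats modules. This is only a minimal sketch: `profile_call` and `workload` are hypothetical names, and `workload` stands in for the real TRLC entry point, which is not reproduced here.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, limit=30):
    """Run func(*args) under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sorting by cumulative time puts the call chain that leads to the
    # hot spot at the top of the report.
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(limit)
    return result, buffer.getvalue()


def workload(n):
    # Hypothetical stand-in for the code being measured.
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    _, report = profile_call(workload, 100_000)
    print(report)
```

The cumtime column includes time spent in callees, which is why the per-function figures cannot simply be added up; tottime is the column that excludes callees.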

There are some immediate ideas:

is_alpha, is_alnum, and is_digit could be replaced by builtin string methods (but we need to take care of Unicode stuff, so it's not as easy as just using the builtins)
implement partial parsing (sound) #47
implement partial parsing (unsound) #48
token() could be optimised in some other way
token() could be replaced by a hand-written C lexer (but this adds portability concerns)
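To illustrate the Unicode caveat from the first idea: the builtin str methods accept far more than ASCII, so swapping them in directly would change which characters the lexer accepts. The sketch below assumes the lexer should only classify ASCII letters and digits; that is an assumption made for illustration, and these function bodies are not the actual TRLC implementation.

```python
import string

# Assumption: only ASCII letters and digits should be classified.
ASCII_ALPHA = frozenset(string.ascii_letters)
ASCII_DIGIT = frozenset(string.digits)
ASCII_ALNUM = ASCII_ALPHA | ASCII_DIGIT


def is_alpha(char):
    # Set membership is a single hash lookup per character.
    return char in ASCII_ALPHA


def is_digit(char):
    return char in ASCII_DIGIT


def is_alnum(char):
    return char in ASCII_ALNUM


if __name__ == "__main__":
    # The builtins say True for these non-ASCII characters,
    # the ASCII-only helpers say False.
    for char in ("é", "²", "٣"):
        print(repr(char),
              "builtin:", char.isalpha() or char.isdigit(),
              "ascii-only:", is_alnum(char))
```

Whether characters such as the underscore belong in any of these classes depends on the language definition and is deliberately left out here.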

There is one more issue that could manifest on Windows with large repos: if you have millions of files (most of which are not trlc files) then the initial traversal for register_dir could take a lot of time.
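One possible way to bound the traversal cost is to prune directories that cannot contain relevant files before descending into them. The following is a rough sketch and not how register_dir works: `find_trlc_files`, the skip list, and the extension tuple are all assumptions made for illustration.

```python
import os

# Assumption: the file extensions of interest.
TRLC_EXTENSIONS = (".rsl", ".trlc")
# Hypothetical skip list; a real tool would likely make this configurable.
SKIPPED_DIRS = {".git", "node_modules", "__pycache__"}


def find_trlc_files(root):
    """Return a sorted list of candidate files below root."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning in place stops os.walk from descending into the
        # skipped directories at all.
        dirnames[:] = [d for d in dirnames if d not in SKIPPED_DIRS]
        for name in filenames:
            if name.endswith(TRLC_EXTENSIONS):
                found.append(os.path.join(dirpath, name))
    return sorted(found)


if __name__ == "__main__":
    for path in find_trlc_files("."):
        print(path)
```

Pruning only helps when the bulk of the unrelated files sits under the skipped directories; otherwise every directory entry still has to be visited once.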


florianschanda added the "topic: core" label (Affects lexer/parser/infrastructure) Oct 23, 2023

florianschanda self-assigned this Oct 23, 2023

florianschanda added a commit that referenced this issue Oct 23, 2023

#43 Improve performance

6648472

Replace the char classification functions with more efficient, but equivalent, implementations. This reduces token() runtime from 18.2s to 15.1s, a 17% improvement.
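A change of this kind can be checked in two steps: confirm that the old and new functions agree on every input of interest, then time them. The sketch below shows the idea with two hypothetical versions of is_alpha; neither is taken from the commit, and the ASCII-only behaviour is an assumption.

```python
import timeit


def is_alpha_baseline(char):
    # Hypothetical "before" version: compares code points via ord().
    code = ord(char)
    return ord("a") <= code <= ord("z") or ord("A") <= code <= ord("Z")


def is_alpha_fast(char):
    # Hypothetical "after" version: compares the characters directly.
    return "a" <= char <= "z" or "A" <= char <= "Z"


def equivalent(limit=0x3000):
    """Check that both versions agree on every code point below limit."""
    return all(is_alpha_baseline(chr(cp)) == is_alpha_fast(chr(cp))
               for cp in range(limit))


if __name__ == "__main__":
    assert equivalent()
    sample = "requirement_0042 " * 1000
    for func in (is_alpha_baseline, is_alpha_fast):
        seconds = timeit.timeit(lambda: [func(c) for c in sample], number=20)
        print(f"{func.__name__}: {seconds:.3f}s")
```

Timings from a micro-benchmark like this vary by machine and interpreter version, so the profile of the full run remains the measurement that matters.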
