
Initial plugin #1

Merged 37 commits from initial-plugin into main on Nov 11, 2024

Conversation

dustinblack (Member) commented Oct 30, 2024

Changes introduced with this PR

Adding a new plugin for RTLA timerlat. Please review all the code, not just the changes.


By contributing to this repository, I agree to the contribution guidelines.

webbnh

This comment was marked as resolved.

@dustinblack dustinblack marked this pull request as ready for review October 31, 2024 12:50
@dustinblack dustinblack marked this pull request as draft October 31, 2024 12:51
@dustinblack dustinblack marked this pull request as ready for review November 5, 2024 12:57
@dustinblack dustinblack requested review from webbnh and a team November 5, 2024 12:57
@dustinblack dustinblack requested a review from dbutenhof November 5, 2024 17:52
Comment on lines 32 to 33
exit = Event()
finished_early = False


Given my recent experiences with plugins in odd scenarios, I think we should move these from class variables to instance variables. To do that, just move them into a constructor.
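
For example, a minimal sketch of that change (assuming the variables live on the step class, StartTimerlatStep, as the traceback later in this thread suggests):

from threading import Event

class StartTimerlatStep:
    def __init__(self):
        # Instance variables: each instance gets its own Event and flag,
        # rather than sharing one copy across all instances of the class.
        self.exit = Event()
        self.finished_early = False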


That's interesting. Definitely matches my preference ... use class variables only for things that must be shared across instances, which presumably these would not be. But, on the other hand, is it realistic for a single container to have multiple simultaneous instances of the plugin class?

        latency_hist.append(row_obj)
    else:
        stats_per_col.append(row_obj)
    if re.match(r"^ALL", line) and not found_all:


Reordering this to put not found_all first could make it more efficient by skipping the pattern matching after it's found.
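
For example, a sketch of the reordered check (Python's and short-circuits, so the regex is skipped once the flag is set; the body shown follows the surrounding code):

if not found_all and re.match(r"^ALL", line):
    found_all = True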

stats_per_col = []
found_all = False

for line in output.splitlines():


This section could use some comments. I have little idea of what it's doing, since I'm not familiar with the output format.

Comment on lines 131 to 145
if params.user_threads:
    return "success", TimerlatOutput(
        latency_hist,
        stats_per_col,
        latency_stats_schema.unserialize(total_irq_latency),
        latency_stats_schema.unserialize(total_thr_latency),
        latency_stats_schema.unserialize(total_usr_latency),
    )

return "success", TimerlatOutput(
    latency_hist,
    stats_per_col,
    latency_stats_schema.unserialize(total_irq_latency),
    latency_stats_schema.unserialize(total_thr_latency),
)


It may make sense to combine these into the same output and just use a ternary (conditional) expression on the value for total_usr_latency.
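
A sketch of the combined return using a conditional expression (assuming the schema accepts None for the user-thread statistics when user_threads is off):

return "success", TimerlatOutput(
    latency_hist,
    stats_per_col,
    latency_stats_schema.unserialize(total_irq_latency),
    latency_stats_schema.unserialize(total_thr_latency),
    # Only unserialize the user-thread stats when they were requested.
    latency_stats_schema.unserialize(total_usr_latency)
    if params.user_threads
    else None,
)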

Comment on lines +94 to +108
min: typing.Annotated[
    int,
    schema.name("minimum latency"),
    schema.description("Minimum latency value"),
] = None
avg: typing.Annotated[
    int,
    schema.name("average latency"),
    schema.description("Average latency value"),
] = None
max: typing.Annotated[
    int,
    schema.name("maximum latency"),
    schema.description("Maximum latency value"),
] = None


Should this mention the units?

dustinblack (Member, Author)


This made me realize that the inputs allow us to change the unit, so I added that as a separate object in the output. Since this can be either ms or us based on the input, I can't state the unit directly in the description except maybe as an either/or.
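
A sketch of such a field (name and wording hypothetical), following the same Annotated pattern as the other outputs:

time_unit: typing.Annotated[
    str,
    schema.name("time unit"),
    schema.description("Unit for the latency values (us or ms), per the input parameters"),
]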

@dataclass
class TimerlatOutput:
    latency_hist: typing.Annotated[
        typing.List[typing.Any],


This uses an Any type. What type is it outputting?

dustinblack (Member, Author)


The output is a list of key:value pairs where the values are always integers, but the number of pairs and the names of the keys are determined by the input parameters and/or the particular CPU architecture of the system. I'm not sure how to define an output object that can account for this.

dustinblack (Member, Author)


I've tried this and variations, but I just get exceptions:

typing.List[typing.Dict[str, int]],

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/arcaflow_plugin_sdk/schema.py", line 6794, in _resolve_list_annotation
    cls._resolve_abstract_type(
  File "/usr/local/lib/python3.9/site-packages/arcaflow_plugin_sdk/schema.py", line 6281, in _resolve_abstract_type
    result = cls._resolve(t, type_hints, path, scope)
  File "/usr/local/lib/python3.9/site-packages/arcaflow_plugin_sdk/schema.py", line 6342, in _resolve
    return cls._resolve_dict_annotation(t, type_hints, path, scope)
  File "/usr/local/lib/python3.9/site-packages/arcaflow_plugin_sdk/schema.py", line 6875, in _resolve_dict_annotation
    args[1], arg_hints[1], tuple(values_path), scope
IndexError: tuple index out of range

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/arcaflow_plugin_rtla/tests/test_arcaflow_plugin_rtla.py", line 3, in <module>
    import rtla_plugin
  File "/app/arcaflow_plugin_rtla/rtla_plugin.py", line 31, in <module>
    class StartTimerlatStep:
  File "/app/arcaflow_plugin_rtla/rtla_plugin.py", line 60, in StartTimerlatStep
    def run_timerlat(
  File "/usr/local/lib/python3.9/site-packages/arcaflow_plugin_sdk/plugin.py", line 143, in step_decorator
    scope = build_object_schema(outputs[response_id])
  File "/usr/local/lib/python3.9/site-packages/arcaflow_plugin_sdk/schema.py", line 7013, in build_object_schema
    r = _SchemaBuilder.resolve(t, scope)
  File "/usr/local/lib/python3.9/site-packages/arcaflow_plugin_sdk/schema.py", line 6271, in resolve
    return cls._resolve_abstract_type(t, t, tuple(path), scope)
  File "/usr/local/lib/python3.9/site-packages/arcaflow_plugin_sdk/schema.py", line 6281, in _resolve_abstract_type
    result = cls._resolve(t, type_hints, path, scope)
  File "/usr/local/lib/python3.9/site-packages/arcaflow_plugin_sdk/schema.py", line 6324, in _resolve
    return cls._resolve_type(t, type_hints, path, scope)
  File "/usr/local/lib/python3.9/site-packages/arcaflow_plugin_sdk/schema.py", line 6377, in _resolve_type
    return cls._resolve_class(t, type_hints, path, scope)
  File "/usr/local/lib/python3.9/site-packages/arcaflow_plugin_sdk/schema.py", line 6569, in _resolve_class
    name, final_field = cls._resolve_dataclass_field(
  File "/usr/local/lib/python3.9/site-packages/arcaflow_plugin_sdk/schema.py", line 6409, in _resolve_dataclass_field
    underlying_type = cls._resolve_field(t.type, type_hints, path, scope)
  File "/usr/local/lib/python3.9/site-packages/arcaflow_plugin_sdk/schema.py", line 6308, in _resolve_field
    result = cls._resolve(t, type_hints, path, scope)
  File "/usr/local/lib/python3.9/site-packages/arcaflow_plugin_sdk/schema.py", line 6346, in _resolve
    return cls._resolve_annotated(t, type_hints, path, scope)
  File "/usr/local/lib/python3.9/site-packages/arcaflow_plugin_sdk/schema.py", line 6728, in _resolve_annotated
    underlying_t = cls._resolve(args[0], args_hints[0], path, scope)
  File "/usr/local/lib/python3.9/site-packages/arcaflow_plugin_sdk/schema.py", line 6340, in _resolve
    return cls._resolve_list_annotation(t, type_hints, path, scope)
  File "/usr/local/lib/python3.9/site-packages/arcaflow_plugin_sdk/schema.py", line 6799, in _resolve_list_annotation
    raise SchemaBuildException(
arcaflow_plugin_sdk.schema.SchemaBuildException: Invalid schema definition for TimerlatOutput -> latency_hist -> typing.Annotated: Failed to create list type
Error: building at STEP "RUN python -m coverage run tests/test_${package}.py  && python -m coverage html -d /htmlcov --omit=/usr/local/*": while running runtime: exit status 1

dustinblack (Member, Author)


Here is a sample of what is currently returned. Note that the number of keys and their names (-001, -002, -003, etc.) are determined by the user inputs to cpus and user-threads and by the number of CPUs on the target system.

  latency_hist:
  - index: '0'
    irq-001: '1380'
    thr-001: '0'
    usr-001: '0'
    irq-002: '1849'
    thr-002: '0'
    usr-002: '0'
  - index: '1'
    irq-001: '1436'
    thr-001: '0'
    usr-001: '0'
    irq-002: '1119'
    thr-002: '2'
    usr-002: '0'

dustinblack (Member, Author)


With the SDK fix from arcalot/arcaflow-plugin-sdk-python#141, these are being redefined as lists of dicts (though some related issues with that are still being addressed).
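
That is, the annotation attempted above should now resolve (illustrative only; the exact field metadata may differ):

latency_hist: typing.Annotated[
    typing.List[typing.Dict[str, int]],
    schema.name("latency histogram"),
]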

webbnh

This comment was marked as resolved.

dustinblack and others added 6 commits November 8, 2024 12:51
Co-authored-by: Webb Scales <[email protected]>
Signed-off-by: Dustin Black <[email protected]>
Co-authored-by: Webb Scales <[email protected]>
Signed-off-by: Dustin Black <[email protected]>
Co-authored-by: Webb Scales <[email protected]>
Signed-off-by: Dustin Black <[email protected]>
Co-authored-by: Webb Scales <[email protected]>
Signed-off-by: Dustin Black <[email protected]>
@dustinblack dustinblack requested a review from webbnh November 8, 2024 12:46
dbutenhof left a comment


The sample output helps a lot, although the code indicates at least one "special case" that's not covered -- possibly by accident? (See detailed comments.)

Basically, I think the parsing is disorganized, hard to follow, and likely to be hard to maintain. But we'll probably never need to look at it again (and if someone ever does need to maintain it, it'll probably be you, and definitely won't be me 🤣) ... so if you're happy that this works, go for it.

webbnh (Contributor) left a comment


🎉

webbnh (Contributor) left a comment


See below for a coding suggestion.

Note that the Renovate bot should be stopping by about a half hour from now with the arcaflow-plugin-sdk update, if everything is working as hoped.

dbutenhof left a comment


I think that looks a lot cleaner, although getting rid of the output_lines indexing (e.g., with output_lines.pop(0)) might streamline it a bit by reducing extraneous code.

is_summary = re.compile(r"ALL")

output_lines = output.splitlines()
line_num = 0


Instead of dealing with line numbers as list offsets, you could consider using output_lines.pop(0) to remove the first entry on each iteration.

You'd need to handle an IndexError exception if you don't actually want to check len(output_lines) > 0 first... but unless you get output radically different from what you expect (in which case an unhandled exception is probably not the worst outcome), that's only an issue in "phase 3".
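
A sketch of the pop-based loop with that IndexError handling (the per-line handling is hypothetical):

while True:
    try:
        line = output_lines.pop(0)  # consume and discard the first remaining line
    except IndexError:
        break  # ran out of output lines
    handle(line)  # hypothetical per-line parsing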

dustinblack (Member, Author)


I didn't think about using .pop() and just discarding as I go, but that's probably a better solution than tracking the line number.

webbnh (Contributor)


pop() would be good, but I've recommended using an iterator instead, below.

dustinblack (Member, Author)


Alright, the idea of the .pop() made sense to me, but I'm getting really strange results every way that I try it. For what is gained, it's not worth me putting more time into it.

webbnh (Contributor) left a comment


Dustin, this looks great! I really like the phased parsing. Of course, I do have a couple of suggestions.

Comment on lines 152 to 153
      # Capture the column headers
-     elif re_isindex.match(line):
+     elif is_header.match(line):
webbnh (Contributor)


Perhaps efficiency is not paramount (and I don't even know how expensive regex parsing is), but given how simple this match is, you might consider using str.startswith() instead of match(). (Ditto for is_digit and is_summary.)
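
For example (a sketch; for the digit test, str.isdigit() on the token fits better than startswith()):

if line_list[0].startswith("ALL"):  # instead of is_summary.match(line_list[0])
    break
if line_list[0].isdigit():          # instead of is_digit.match(line_list[0])
    row_obj[col_headers[0]] = int(line_list[0])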

Comment on lines 144 to 185
+ output_lines = output.splitlines()
+ line_num = 0

- for line in output.splitlines():
-     if re_isunit.match(line):
-         time_unit = re_isunit.match(line).group(1)
+ # Phase 1: Get the headers
+ for line_num, line in enumerate(output_lines):
+     # Get the time unit (user-selectable)
+     if is_time_unit.match(line):
+         time_unit = is_time_unit.match(line).group(1)
      # Capture the column headers
-     elif re_isindex.match(line):
+     elif is_header.match(line):
          col_headers = line.lower().split()
-     # Stats names repeat, so flag when have passed ^ALL
-     elif re_isall.match(line):
-         found_all = True
-     # Either this is a histogram bucket row, or the first time we have seen
-     # a row beginning with a stat name
-     elif (re_isdigit.match(line)) or (
-         line.split()[0] in stats_names and not found_all
-     ):
+         line_num += 1
+         break

+ # Phase 2: Get the columnar data
+ for i in range(line_num, len(output_lines)):
+     line_list = output_lines[i].split()
+     row_obj = {}
+     # Collect histogram buckets and column latency statistics
+     if not is_summary.match(line_list[0]):
-         # Capture the columnar data
-         line_list = []
-         for element in line.split():
-             try:
-                 line_list.append(int(element))
-             except ValueError:
-                 line_list.append(element)
-         row_obj = dict(zip(col_headers, line_list))
-         if re_isdigit.match(line):
-             latency_hist.append(row_obj)
+         if not is_digit.match(line_list[0]):
+             # Stats index values are strings
+             row_obj[col_headers[0]] = line_list[0][:-1]
+             accumulator = stats_per_col
          else:
-             stats_per_col.append(row_obj)
-     # Since we've encountered the summary statistics (marked by the line
-     # starting with "ALL"), generate key:value pairs instead of columnar data.
-     elif found_all and line.split()[0] in stats_names:
-         label = line.split()[0][:-1]
-         if label != "over":
-             total_irq_latency[label] = line.split()[1]
-             total_thr_latency[label] = line.split()[2]
-             if params.user_threads:
-                 total_usr_latency[label] = line.split()[3]
+             # Histogram index values are integers
+             row_obj[col_headers[0]] = int(line_list[0])
+         row_obj = row_obj | dict(zip(col_headers[1:], map(int, line_list[1:])))
+         accumulator.append(row_obj)
+     else:
+         line_num = i + 1
+         break

+ # Phase 3: Get the stats summary as key:value pairs
+ for i in range(line_num, len(output_lines)):
+     line_list = output_lines[i].split()
+     label = line_list[0][:-1]
+     total_irq_latency[label] = line_list[1]
+     total_thr_latency[label] = line_list[2]
+     if params.user_threads:
+         total_usr_latency[label] = line_list[3]
webbnh (Contributor)


Since you only want to visit each line once, I suggest using an iterator instead of enumeration:

Suggested change
output_lines = iter(output.splitlines())

# Phase 1: Get the headers
for line in output_lines:
    # Get the time unit (user-selectable)
    if is_time_unit.match(line):
        time_unit = is_time_unit.match(line).group(1)
    # Capture the column headers
    elif is_header.match(line):
        col_headers = line.lower().split()
        break

# Phase 2: Collect histogram buckets and column latency statistics
for line in output_lines:
    line_list = line.split()
    # Collect statistics up until the summary section
    # (We don't process the summary header line itself, we just skip it here.)
    if is_summary.match(line_list[0]):
        break
    row_obj = dict(zip(col_headers[1:], map(int, line_list[1:])))
    if not is_digit.match(line_list[0]):
        # Stats index values are strings ending in a colon
        row_obj[col_headers[0]] = line_list[0][:-1]
        # When we hit the stats, switch to the other accumulator
        accumulator = stats_per_col
    else:
        # Histogram index values are integers
        row_obj[col_headers[0]] = int(line_list[0])
    accumulator.append(row_obj)

# Phase 3: Get the stats summary as key:value pairs
for line in output_lines:
    line_list = line.split()
    label = line_list[0][:-1]
    total_irq_latency[label] = line_list[1]
    total_thr_latency[label] = line_list[2]
    if params.user_threads:
        total_usr_latency[label] = line_list[3]

That is, using the iterator allows us to avoid enumerating the output lines and having to keep track of where we are; we process the lines in sequence, moving to the next phase like a state machine.

Also, I tweaked Phase 2. Moving the break up allows us to avoid the extra level of indentation/complexity. And using the result of the zip() call as the initializer for row_obj saves us from having to use the "update" operation to add it to row_obj later. And I added/reworked the comments slightly.

[Disclaimer: I didn't actually try running this code....]

dustinblack (Member, Author)


I'll play with this as an alternative to the .pop() suggested above by Dave, which was giving me fits.

@dustinblack dustinblack merged commit fc7c664 into main Nov 11, 2024
3 checks passed
@dustinblack dustinblack deleted the initial-plugin branch November 11, 2024 17:46