Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

converted elf_analysis plugin to new base class #1266

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jstucke
Copy link
Collaborator

@jstucke jstucke commented Sep 11, 2024

No description provided.

@jstucke jstucke requested a review from maringuu September 11, 2024 15:39
@jstucke jstucke self-assigned this Sep 11, 2024
@jstucke jstucke force-pushed the elf-analysis-plugin-v1 branch from c14b112 to b307c7e Compare September 12, 2024 11:32
@jstucke jstucke force-pushed the elf-analysis-plugin-v1 branch from b307c7e to afec839 Compare December 3, 2024 08:46
@codecov-commenter
Copy link

codecov-commenter commented Dec 3, 2024

Codecov Report

Attention: Patch coverage is 99.44134% with 1 line in your changes missing coverage. Please review.

Project coverage is 91.86%. Comparing base (ab95ac5) to head (5b1d342).
Report is 5 commits behind head on master.

Files with missing lines Patch % Lines
...alysis/hardware_analysis/code/hardware_analysis.py 66.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1266      +/-   ##
==========================================
- Coverage   92.48%   91.86%   -0.62%     
==========================================
  Files         379      376       -3     
  Lines       24115    21044    -3071     
==========================================
- Hits        22302    19332    -2970     
+ Misses       1813     1712     -101     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

src/plugins/analysis/elf_analysis/code/elf_analysis.py Outdated Show resolved Hide resolved
src/helperFunctions/hash.py Outdated Show resolved Hide resolved
src/plugins/analysis/elf_analysis/code/elf_analysis.py Outdated Show resolved Hide resolved
src/plugins/analysis/elf_analysis/code/elf_analysis.py Outdated Show resolved Hide resolved
@jstucke jstucke force-pushed the elf-analysis-plugin-v1 branch from 700a7d4 to 64db148 Compare December 31, 2024 13:24
@jstucke jstucke force-pushed the elf-analysis-plugin-v1 branch from 64db148 to cb519ee Compare January 8, 2025 16:54
@jstucke
Copy link
Collaborator Author

jstucke commented Jan 8, 2025

I made some additional changes:

  • "libraries" were missing from the result
  • fixed the _get_elf_analysis_libraries() method to work with the new result structure
  • deduplicated imports and exports (for whatever reason, they can sometimes contain duplicates in lief)

src/plugins/analysis/elf_analysis/code/elf_analysis.py Outdated Show resolved Hide resolved
Comment on lines +287 to +288
def _deduplicate_functions(lief_functions: Iterable[lief.Function]) -> list[tuple[int, str]]:
return sorted({(f.address, f.name) for f in lief_functions})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you file a bug report upstream and put a link to it in the code?
Then when someone stumbles upon this code again, they can see if it is still needed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually not sure whether or not this is intended. In the old plugin this wasn't an issue, since it also "deduplicated" the results using sets, so in the plugin this is a new bug

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just my two cents on how multiple symbols can have the same name.

The ELF specification is defined as part of the System V ABI [1].
According to it, multiple non-global symbols can have the same name in multiple object files [2].

exported_functions: List[ElfSymbol]
imported_functions: List[ElfSymbol]
libraries: List[str]
mod_info: Optional[List[str]] = Field(description='Key value pairs with module information.')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you reword the description and make more clear that this is related to Linux kernel modules (Isn't it?).
To me the description does not convey much information that is not conveyed by the type and name of the field.

Also, how do 'key value pairs' fit in an Optinal[List[str]] ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, a list of key value pairs doesn't make a lot of sense. I converted it to a dict.

@jstucke jstucke force-pushed the elf-analysis-plugin-v1 branch from cb519ee to ed406c1 Compare January 16, 2025 12:55
@jstucke jstucke requested a review from maringuu January 16, 2025 12:55
@jstucke jstucke force-pushed the elf-analysis-plugin-v1 branch from ed406c1 to 24c1c7e Compare January 16, 2025 13:05
and also removed the redundant normalize_lief_items method
@jstucke jstucke force-pushed the elf-analysis-plugin-v1 branch from 24c1c7e to 5b1d342 Compare January 16, 2025 13:06
Comment on lines +287 to +288
def _deduplicate_functions(lief_functions: Iterable[lief.Function]) -> list[tuple[int, str]]:
return sorted({(f.address, f.name) for f in lief_functions})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just my two cents on how multiple symbols can have the same name.

The ELF specification is defined as part of the System V ABI [1].
According to it, multiple non-global symbols can have the same name in multiple object files [2].

Comment on lines +54 to +71
entrypoint=header.entrypoint,
file_type=header.file_type.__name__,
header_size=header.header_size,
identity_abi_version=header.identity_abi_version,
identity_class=header.identity_class.__name__,
identity_data=header.identity_data.__name__,
identity_os_abi=header.identity_os_abi.__name__,
identity_version=header.identity_version.__name__,
machine_type=header.machine_type.__name__.lower(),
numberof_sections=header.numberof_sections,
numberof_segments=header.numberof_segments,
object_file_version=header.object_file_version.__name__,
processor_flag=header.processor_flag,
program_header_size=header.program_header_size,
program_headers_offset=header.program_header_offset,
section_header_size=header.section_header_size,
section_headers_offset=header.section_header_offset,
section_name_table_idx=header.section_name_table_idx,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a huge fan of using __name__ to convert the constants to human readable names.
The lief names are not documented somehwere accessible (are they?).
For example, the header.file_type as defined in the typing stubs does not have any documentation.
Furthermore, the lief names do not match the names in the official specification.

I'm not sure what the right way to proceed is here. From a GUI user perspective human readable names are important. From a API user perspective the numeric values would be preferable.

Also, why does this work at all? It does not work with standard enum.Enum's and is not defined in lief's typing stubs.

Copy link
Collaborator Author

@jstucke jstucke Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a huge fan of using name to convert the constants to human readable names.
The lief names are not documented somehwere accessible (are they?).

neither am I, but sadly this seems to be the only way to reproduce the old output. The alternative would be this: #1266 (comment)

I'm not sure what the right way to proceed is here. From a GUI user perspective human readable names are important. From a API user perspective the numeric values would be preferable.

since the barrier for a GUI user to work with numeric values is arguably higher, IMHO using strings is the better approach

Also, why does this work at all? It does not work with standard enum.Enum's and is not defined in lief's typing stubs.

It probably has something to do with the fact that lief is not actually python code, but rather a shared library (.so) with python bindings

@jstucke
Copy link
Collaborator Author

jstucke commented Feb 13, 2025

Just my two cents on how multiple symbols can have the same name.
The ELF specification is defined as part of the System V ABI [1].
According to it, multiple non-global symbols can have the same name in multiple object files [2].

I'm not sure this is the problem here. In cases of duplicate entries, every entry seems to have a duplicate (with the same offset). I'm still not sure why

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants