Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Garbage collection and disappearing ports issue #3214

Merged
merged 60 commits into from
Aug 8, 2024

Conversation

Donnype
Copy link
Contributor

@Donnype Donnype commented Jul 9, 2024

Changes

This, ideally, fixes the garbage collection issue with nmap and nmap-udp.

It also adds a rather involved migration script that should still be thoroughly tested somehow. There we take all origins that should be updated and add data from bytes to the new source_method field. (The name for that field is very much to be discussed, by the way.) Note that it has been added to the boefjes module because it has connections to both Bytes and Octopoes already and I guess I'm more comfortable there. I wouldn't mind if there's a solid argument for using Rocky instead.

Issue link

Closes #2875

QA notes

Follow the instructions in #2875 and hopefully the issue is now gone.

However, there is a much more intricate part to QA here: the migration script. We need to update the existing origins using Octopoes and Bytes at the same time. This means: getting a realistic working setup, building, and triggering the migration using Docker:

git checkout fix/disappearing-ports
git pull
make kat
docker compose exec boefje python -m tools.upgrade_v1_16_0

Now the issue should not be present anymore.

The logs ideally show no exceptions and end with total_failures=0. If not, it can be rerun multiple times in case a random network error appeared. I say this because we could potentially hit the complete database and hence the script had a high chance of failing some API calls in production systems. If there are consistent exceptions however, there is probably a bug.


UPDATE:

Benchmark without new filter:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.001    0.001    7.903    7.903 /app/boefjes/tests/integration/test_bench.py:19(test_migration)
        1    0.000    0.000    4.364    4.364 /app/boefjes/tools/upgrade_v1_16_0.py:39(upgrade)
        1    0.003    0.003    3.953    3.953 /app/boefjes/tools/upgrade_v1_16_0.py:71(migrate_org)
       30    0.001    0.000    3.193    0.106 /app/boefjes/tools/upgrade_v1_16_0.py:133(update_origin)
        1    0.000    0.000    2.362    2.362 /app/boefjes/tests/conftest.py:1(<module>)
        2    0.000    0.000    1.362    0.681 /app/boefjes/tests/conftest.py:173(octopoes_api_connector)
        1    0.000    0.000    1.314    1.314 /app/boefjes/boefjes/app.py:1(<module>)
      121    0.000    0.000    0.879    0.007 /app/boefjes/boefjes/clients/bytes_client.py:20(wrapper)
        2    0.000    0.000    0.773    0.386 /app/boefjes/boefjes/clients/bytes_client.py:48(login)
        2    0.000    0.000    0.773    0.386 /app/boefjes/boefjes/clients/bytes_client.py:62(_get_authentication_headers)

Benchmark with new filter:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.001    0.001    5.428    5.428 /app/boefjes/tests/integration/test_bench.py:19(test_migration)
        1    0.000    0.000    2.399    2.399 /app/boefjes/tests/conftest.py:1(<module>)
        1    0.000    0.000    1.427    1.427 /app/boefjes/tools/upgrade_v1_16_0.py:40(upgrade)
        2    0.000    0.000    1.410    0.705 /app/boefjes/tests/conftest.py:173(octopoes_api_connector)
        1    0.000    0.000    1.349    1.349 /app/boefjes/boefjes/app.py:1(<module>)
        1    0.000    0.000    1.012    1.012 /app/boefjes/tools/upgrade_v1_16_0.py:72(migrate_organisation)
        1    0.001    0.001    0.823    0.823 /app/boefjes/tools/upgrade_v1_16_0.py:147(collect_boefjes_per_normalizer)
        1    0.000    0.000    0.821    0.821 /app/boefjes/boefjes/dependencies/plugins.py:55(_get_all_without_enabled)
        1    0.000    0.000    0.806    0.806 /app/boefjes/boefjes/local_repository.py:28(get_all)
      100    0.000    0.000    0.802    0.008 /app/boefjes/boefjes/clients/bytes_client.py:20(wrapper)

Note that the migrate_org method first takes almost 4 seconds, and then this drops to 1 second, of which 800 ms is from collect_boefjes_per_normalizer which we only call once. Actually calling the new bulk endpoint takes only 28 ms for 30 origins. Still 30 origins per 200 ms for a 30% rate of to-be-updated origins (e.g. nmap) means we could do 150 origins per second in my setup.


Code Checklist

  • All the commits in this PR are properly PGP-signed and verified.
  • This PR only contains functionality relevant to the issue.
  • I have written unit tests for the changes or fixes I made.
  • I have checked the documentation and made changes where necessary.
  • I have performed a self-review of my code and refactored it to the best of my abilities.
  • For any non-trivial functionality, I have added integration and/or end-to-end tests.
  • I have included comments in the code to elaborate on what is not self-evident from the code itself, including references to issues and discussions online, or implicit behavior of an interface.

Checklist for code reviewers:

Copy-paste the checklist from the docs/source/templates folder into your comment.


Checklist for QA:

Copy-paste the checklist from the docs/source/templates folder into your comment.

@Donnype Donnype requested a review from a team as a code owner July 9, 2024 19:27
@Donnype Donnype self-assigned this Jul 9, 2024
@stephanie0x00
Copy link
Contributor

Checklist for QA:

  • I have checked out this branch, and successfully ran a fresh make reset.
  • I confirmed that there are no unintended functional regressions in this branch:
    • I have managed to pass the onboarding flow
    • Objects and Findings are created properly
    • Tasks are created and completed properly
  • I confirmed that the PR's advertised feature or hotfix works as intended.
  • I checked the logs for errors and/or warnings and made issues where necessary

What works:

Works like a champ! I've tried to get the finding to disappear again by enabling both nmap TCP, nmap Ports and nmap UDP, but with all those boefjes the finding remains as it is. Logs look clear and the finding for the IPs are also identified by the corresponding boefjes.

Can be merged when Review is done.

What doesn't work:

n/a

Bug or feature?:

n/a

@underdarknl
Copy link
Contributor

Looks like the test trips over a missing Finding. Namely port 3306 being a database port. Did we seed the database with a correct Ports config ooi? If not, the port will be present, but the BIT will not make a finding.

@Donnype
Copy link
Contributor Author

Donnype commented Jul 11, 2024

@underdarknl I don't think the config will have impact because the defaults should work. I think it's some race condition because it works locally but not in the CI..

Copy link
Contributor

@ammar92 ammar92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. I haven't gone into detail with the script, since it's likely a temporary tool and won't be needed once everyone has migrated. However, here are a few notes about the script to keep in mind:

  • Since it's more like a script rather than a CLI, using click here is redundant
  • We've recently introduced structlog, which should be preferred over Python's builtin logging
  • Instead of using literal numbers for HTTP status codes, you should use the constants provided in httpx.codes

@dekkers
Copy link
Contributor

dekkers commented Jul 12, 2024

@underdarknl I don't think the config will have impact because the defaults should work. I think it's some race condition because it works locally but not in the CI..

I think the race condition is in the test. recalculate_bits uses the current datetime for the valid time: https://github.com/minvws/nl-kat-coordination/blob/main/octopoes/octopoes/core/service.py#L606

So I think we also need to use a newer valid time in the test after calling recalculate_bits.

Copy link
Contributor

@dekkers dekkers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit worried that it will take a long time for bigger databases. Do we know how many origins per second it can do?

A more efficient way would be to do more in a single database query. For example get all the source methods in a single query (or batches of 1000 or something), then update the corresponding origins in XTDB with a single transaction. But we might not have the infrastructure / architecture that we can do that easily...

octopoes/tests/integration/test_api_connector.py Outdated Show resolved Hide resolved
boefjes/tools/upgrade_v1_16_0.py Outdated Show resolved Hide resolved
Signed-off-by: Donny Peeters <[email protected]>
@Donnype Donnype marked this pull request as draft July 16, 2024 14:28
@Donnype Donnype changed the title Fix Garbage collection and disappearing ports issue [Awaiting Performance fixes] Fix Garbage collection and disappearing ports issue Jul 16, 2024
@ammar92
Copy link
Contributor

ammar92 commented Jul 16, 2024

Checklist for QA:

  • I have checked out this branch, and successfully ran a fresh make reset.
  • I confirmed that there are no unintended functional regressions in this branch:
    • I have managed to pass the onboarding flow
    • Objects and Findings are created properly
    • Tasks are created and completed properly
  • I confirmed that the PR's advertised feature or hotfix works as intended.
  • I checked the logs for errors and/or warnings and made issues where necessary

What works:

  • Migration tool works; bug fixed; verified and tested together with @Donnype

@Donnype Donnype changed the title [Awaiting Performance fixes] Fix Garbage collection and disappearing ports issue [Awaiting Performance Updates] Fix Garbage collection and disappearing ports issue Jul 17, 2024
@Donnype Donnype changed the title [Awaiting Performance Updates] Fix Garbage collection and disappearing ports issue Fix Garbage collection and disappearing ports issue Jul 30, 2024
@Donnype Donnype marked this pull request as ready for review July 30, 2024 13:20
Copy link
Contributor

@dekkers dekkers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small things, but it looks good in general 👍

"D:SCHEDULER_API=http://placeholder:8002",
"D:BYTES_API=http://placeholder:8003",
"D:BYTES_USERNAME=placeholder",
"D:BYTES_PASSWORD=placeholder",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you run the tests in the containers, those variables will already be set, but the tests should use the variables as defined here. This means we shouldn't use D: here.

Copy link
Contributor Author

@Donnype Donnype Aug 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes it hard to perform integration tests where we want a different set of environment variables though. Isn't a containerized unit test a scenario for which we want to create a separate ci container with its own environment as well? Else we are going to have to figure out how to split sets of environment variables for these two scenarios (perhaps by overwriting these in the integration test setup by passing settings objects instead of the global accessing, although I'd say that makes more sense to do for the unit tests)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

boefjes/.ci/docker-compose.yml Outdated Show resolved Hide resolved
boefjes/.ci/docker-compose.yml Outdated Show resolved Hide resolved
boefjes/.ci/docker-compose.yml Outdated Show resolved Hide resolved
octopoes/tests/integration/test_api_connector.py Outdated Show resolved Hide resolved
octopoes/octopoes/api/router.py Outdated Show resolved Hide resolved
boefjes/tools/upgrade_v1_16_0.py Outdated Show resolved Hide resolved
boefjes/tools/upgrade_v1_16_0.py Show resolved Hide resolved
@stephanie0x00
Copy link
Contributor

Checklist for QA:

  • I have checked out this branch, and successfully ran a fresh make reset.
  • I confirmed that there are no unintended functional regressions in this branch:
    • I have managed to pass the onboarding flow
    • Objects and Findings are created properly
    • Tasks are created and completed properly
  • I confirmed that the PR's advertised feature or hotfix works as intended.
  • I checked the logs for errors and/or warnings and made issues where necessary

What works:

Migration scenario works! We tested from version 1.15.0 and then followed the steps as mentioned in the QA notes. Testing was done against mispo.es. After performing nmap tcp + udp scans on 1.15.0 the Open Port finding for port 3306 and 22 disappears. After rescheduling both tcp + udp scans on this branch, the Open Port finding for 3306 and 22 appear in the findings overview.

What doesn't work:

Nothing found.

Bug or feature?:

Nothing found.

@underdarknl underdarknl merged commit 3623dd1 into main Aug 8, 2024
28 checks passed
@underdarknl underdarknl deleted the fix/disappearing-ports branch August 8, 2024 08:31
jpbruinsslot added a commit that referenced this pull request Aug 8, 2024
* main:
  Basic audit trails via logging (#3317)
  Raw upload with Scan OOIS (#3169)
  Fix Garbage collection and disappearing ports issue (#3214)
  Updated `Django` and `opentelemetry` packages (#3324)
jpbruinsslot added a commit that referenced this pull request Aug 8, 2024
* feature/mula/refactor-queue:
  Fix poetry
  Updates according to code review
  Basic audit trails via logging (#3317)
  Raw upload with Scan OOIS (#3169)
  Fix Garbage collection and disappearing ports issue (#3214)
  Formatting
  Formatting
  Fix formatting
  Updated `Django` and `opentelemetry` packages (#3324)
  Restructure scheduler development scripts (#3293)
  Change report flow to POST requests (#3174)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Garbage collection issue with multiple nmap boefjes enabled (tcp + udp)
5 participants