Fix Garbage collection and disappearing ports issue #3214
Conversation
Signed-off-by: Donny Peeters <[email protected]>
Add an origin migration script
Checklist for QA:
What works: Works like a champ! I've tried to get the finding to disappear again by enabling nmap TCP, nmap Ports, and nmap UDP, but with all those boefjes the finding remains as it is. Logs look clean, and the findings for the IPs are also identified by the corresponding boefjes. Can be merged once review is done.
What doesn't work: n/a
Bug or feature?: n/a
Looks like the test trips over a missing Finding, namely port 3306 being a database port. Did we seed the database with a correct Ports config OOI? If not, the port will be present, but the BIT will not produce a finding.
@underdarknl I don't think the config has an impact, because the defaults should work. I think it's some race condition, because it works locally but not in CI.
Looks good to me. I haven't gone into detail with the script, since it's likely a temporary tool and won't be needed once everyone has migrated. However, here are a few notes about the script to keep in mind:
- Since it's more of a script than a CLI, using `click` here is redundant
- We've recently introduced `structlog`, which should be preferred over Python's builtin `logging`
- Instead of using literal numbers for HTTP status codes, you should use the constants provided in `httpx.codes`
I think the race condition is in the test: recalculate_bits uses the current datetime for the valid time (https://github.com/minvws/nl-kat-coordination/blob/main/octopoes/octopoes/core/service.py#L606), so I think we also need to use a newer valid time in the test after that call.
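A sketch of the suspected race and the proposed fix (all names illustrative, not the actual test code):

```python
from datetime import datetime, timezone

# The recalculation stamps its results with the current datetime, so a
# query pinned to the test's original valid_time can miss them. Taking a
# fresh valid_time after the recalculation avoids the race.
valid_time = datetime.now(timezone.utc)
# ... seed OOIs at valid_time and trigger the bit recalculation here ...
new_valid_time = datetime.now(timezone.utc)  # not earlier than the recalculation
# ... re-query findings at new_valid_time instead of valid_time ...
```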
I am a bit worried that it will take a long time for bigger databases. Do we know how many origins per second it can process?
A more efficient way would be to do more in a single database query. For example, get all the source methods in a single query (or in batches of 1000 or so), then update the corresponding origins in XTDB in a single transaction. But we might not have the infrastructure/architecture to do that easily...
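A rough sketch of the batching idea, assuming hypothetical `fetch_methods` (one bulk query for a batch) and `submit_txn` (one transaction per batch) callables; none of these names are from the actual codebase:

```python
from itertools import islice
from typing import Callable, Iterable, Iterator

def batched(iterable: Iterable, size: int) -> Iterator[list]:
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch

def migrate_origins(
    origins: list[dict],
    fetch_methods: Callable[[list], dict],
    submit_txn: Callable[[list], object],
    batch_size: int = 1000,
) -> int:
    updated = 0
    for batch in batched(origins, batch_size):
        # One bulk query for the whole batch instead of one call per origin.
        methods = fetch_methods([o["id"] for o in batch])
        txn = [{**o, "source_method": methods.get(o["id"])} for o in batch]
        submit_txn(txn)  # one transaction per batch
        updated += len(txn)
    return updated
```

The batch size trades memory for round trips; 1000 matches the figure floated in the comment above.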
Signed-off-by: Donny Peeters <[email protected]>
Co-authored-by: Jan Klopper <[email protected]>
Some small things, but it looks good in general 👍
"D:SCHEDULER_API=http://placeholder:8002",
"D:BYTES_API=http://placeholder:8003",
"D:BYTES_USERNAME=placeholder",
"D:BYTES_PASSWORD=placeholder",
If you run the tests in the containers, those variables will already be set, but the tests should use the variables as defined here. This means we shouldn't use the `D:` prefix here.
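For context, the `D:` prefix comes, as I understand it, from the pytest-env plugin, where it marks a value as a default that applies only when the variable isn't already set in the environment. A minimal sketch of that semantics (the helper name is hypothetical):

```python
import os

def apply_env(entries: list[str]) -> None:
    # Mirrors the pytest-env convention: a "D:" prefix means "default",
    # i.e. only set the variable if the environment doesn't already define
    # it; entries without the prefix overwrite unconditionally.
    for entry in entries:
        default_only = entry.startswith("D:")
        key, _, value = entry.removeprefix("D:").partition("=")
        if default_only:
            os.environ.setdefault(key, value)
        else:
            os.environ[key] = value
```

This is why container-set variables win over `D:`-prefixed defaults, which is exactly the behavior being debated here.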
That makes it hard to perform integration tests where we want a different set of environment variables, though. Isn't a containerized unit test a scenario for which we want to create a separate CI container with its own environment as well? Otherwise we are going to have to figure out how to split the sets of environment variables for these two scenarios (perhaps by overriding them in the integration test setup by passing settings objects instead of accessing the globals, although I'd say that makes more sense to do for the unit tests).
Checklist for QA:
What works: Migration scenario works! We tested from version 1.15.0 and then followed the steps as mentioned in the QA notes. Testing was done against mispo.es. After performing nmap TCP + UDP scans on 1.15.0, the Open Port findings for ports 3306 and 22 disappear. After rescheduling both TCP + UDP scans on this branch, the Open Port findings for 3306 and 22 appear in the findings overview.
What doesn't work: Nothing found.
Bug or feature?: Nothing found.
* feature/mula/refactor-queue:
  - Fix poetry
  - Updates according to code review
  - Basic audit trails via logging (#3317)
  - Raw upload with Scan OOIS (#3169)
  - Fix Garbage collection and disappearing ports issue (#3214)
  - Formatting
  - Formatting
  - Fix formatting
  - Updated `Django` and `opentelemetry` packages (#3324)
  - Restructure scheduler development scripts (#3293)
  - Change report flow to POST requests (#3174)
Changes
This, ideally, fixes the garbage collection issue with `nmap` and `nmap-udp`.
It also adds a rather involved migration script that should still be thoroughly tested somehow. There we take all origins that should be updated and add data from Bytes to the new `source_method` field. (The name for that field is very much up for discussion, by the way.) Note that it has been added to the `boefjes` module because it already has connections to both Bytes and Octopoes, and I guess I'm more comfortable there. I wouldn't mind if there's a solid argument for using Rocky instead.
Issue link
Closes #2875
QA notes
Follow the instructions in #2875 and hopefully the issue is now gone.
However, there is a much more intricate part to QA here: the migration script. We need to update the existing origins using Octopoes and Bytes at the same time. This means: getting a realistic working setup, building, and triggering the migration using Docker:
Now the issue should not be present anymore.
The logs ideally show no exceptions and end with `total_failures=0`. If not, the script can be rerun multiple times in case a random network error appeared. I say this because we could potentially hit the complete database, and hence the script has a high chance of failing some API calls on production systems. If there are consistent exceptions, however, there is probably a bug.
UPDATE:
Benchmark without new filter:
Benchmark with new filter:
Note that the `migrate_org` method first takes almost 4 seconds, and then this drops to 1 second, of which 800 ms comes from `collect_boefjes_per_normalizer`, which we only call once. Actually calling the new bulk endpoint takes only 28 ms for 30 origins. Still, 30 origins per 200 ms at a 30% rate of to-be-updated origins (e.g. nmap) means we could do 150 origins per second in my setup.
Code Checklist
Checklist for code reviewers:
Copy-paste the checklist from the docs/source/templates folder into your comment.
Checklist for QA:
Copy-paste the checklist from the docs/source/templates folder into your comment.