-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: csaf correlation - correlate vulnerability to purls and sboms #948
Conversation
2d45d4f
to
fd6d593
Compare
First stab at finding sboms and components affected by advisory when purls are not available. @JimFuller-RedHat I'll try next to use graph analysis to try to search for purl when appropriate information is available |
e3a636c
to
e79b51c
Compare
2d22088
to
4a23380
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel qualified to review some of the logic -- it's a big PR.
I'm a tad concerned about terminology. That the UX refers to "packages" and the backend to "purls" makes me nervous about introducing "component". Ideally, we could agree on a single term.
entity/src/product_status.rs
Outdated
pub base_purl_id: Option<Uuid>, | ||
pub component: Option<String>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the name much better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. I guess I wanted to keep it distinct enough from purls. In cyclonedx terminology it's components, spdx referf to packages. Maybe going with packages make sense in the end to be uniform in our code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer component as packages are overloaded - though good as long as they are the same wherever (package or component).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed them to package. I like the term component better as well, but let's discuss this on a whole code base level once we are ready to change that
.drop_column(ProductStatus::BasePurlId) | ||
.to_owned(), | ||
) | ||
.await?; | ||
|
||
manager | ||
.alter_table( | ||
Table::alter() | ||
.table(ProductStatus::Table) | ||
.add_column(ColumnDef::new(ProductStatus::Component).string()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we migrate existing data as well? Or is there no correlation between BasePurl's and Components?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's doable, but not that easy. I think the best way would be to reimport and as this was a short-lived "feature" it shouldn't be a big impact. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by "reimport"? If I have a database full of data, and I merge in these changes, how will my existing data be affected? If the UX doesn't break, I guess I don't care about migrating the data. 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about empty database and reimport documents. Let me see if I can come up with a way to avoid it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added data migration logic as well
let product_advisory_info = sbom | ||
.find_related(product_version::Entity) | ||
.join(JoinType::LeftJoin, product_version::Relation::Product.def()) | ||
.join(JoinType::LeftJoin, product::Relation::Cpe.def()) | ||
.join( | ||
JoinType::Join, | ||
trustify_entity::cpe::Relation::ProductStatus.def(), | ||
) | ||
.join(JoinType::Join, product_status::Relation::Status.def()) | ||
.join(JoinType::Join, product_status::Relation::Advisory.def()) | ||
.distinct_on([ | ||
(product_status::Entity, product_status::Column::ContextCpeId), | ||
(product_status::Entity, product_status::Column::StatusId), | ||
(product_status::Entity, product_status::Column::Component), | ||
( | ||
product_status::Entity, | ||
product_status::Column::VulnerabilityId, | ||
), | ||
]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All these joins feel like a smell to me, but I got no better suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you look existing search for purls, it's the similar join-armargedon :)
There's A LOT of room for improvement, but we need to start somewhere.
I can see data for vulnerabilities that is great!!!!!!!!!! The only thing I noticed is that if I hit http://localhost:8080/api/v1/sbom/urn:uuid:019320e4-4b70-7a90-a81b-8003340dc7bf/advisory then I can get packages without purl but only with name e.g. curl http://localhost:8080/api/v1/sbom/urn:uuid:019320e4-4b70-7a90-a81b-8003340dc7bf/advisory | jq [
{
"uuid": "urn:uuid:e07bc26b-0445-454c-b621-5ba663529bd7",
"identifier": "https://www.redhat.com/#CVE-2023-1664",
"issuer": {
"id": "91267f8b-91c9-4a50-99e5-73bd1140fc3e",
"name": "Red Hat Product Security",
"cpe_key": null,
"website": null
},
"published": "2023-03-27T00:00:00Z",
"modified": "2023-11-14T17:39:21Z",
"withdrawn": null,
"title": "keycloak: Untrusted Certificate Validation",
"labels": {
"type": "csaf",
"datasetFile": "csaf/2023/cve-2023-1664.json"
},
"status": [
{
"vulnerability_id": "CVE-2023-1664",
"status": "affected",
"context": {
"cpe": "cpe:/a:redhat:quarkus:2:*:*:*"
},
"packages": [
{
"id": "",
"name": "org.keycloak/keycloak-core",
"version": null,
"purl": [],
"cpe": []
}
]
}
]
}, I cut the previous JSON to only see the relevant data. |
@carlosthe19916 I wanted to mentioned this earlier today, but forgot :) Find the purl based on the component name is the next step and I filled the separate issue for it #994. But in general we can run into the case where we can't find it and I think we should display component name only in that case. |
WHERE base_purl_id IN ( | ||
SELECT id FROM base_purl | ||
WHERE id IN (SELECT DISTINCT base_purl_id FROM product_status WHERE base_purl_id IS NOT NULL) | ||
AND TYPE = 'generic' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for my own education ... why the TYPE='generic' filter is relevant ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is how previous logic stored those "fake" purls, so basically io.quarkus/quarkus-vertx-http
was stored as generic://io.quarkus/quarkus-vertx-http
. So we need to clean those up
let current_version = &cpe.version().clone().to_string(); | ||
let current_semver = lenient_semver::parse(current_version); | ||
|
||
if match (&version_semver, ¤t_semver) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you considered using existing version related functions (e.g. psql funcs) so we do not introduce a 2nd code path (using lenient_semver) - I believe all those functions can be used directly in rust/sea_orm (or maybe even better performing in the query itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did, but I was pushing what I was able to make work at this point with pure sql. I would leave that for the future improvement, I would love to have all this incorporated in the original query. And also make more improvements, once we have proper testing infra set up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good - might be worth inserting a TODO here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add it, but it definitely won't be forgotten :)
let ingest_results = ctx | ||
.ingest_documents([ | ||
"csaf/cve-2023-0044.json", | ||
"quarkus/v2/quarkus-bom-2.13.8.Final-redhat-00004.json", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Considering how much variation there is with sboms ... might be worthwhile to test a few different ones (quarkus is pretty opinionated).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's where #986 should come to play
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's at all possible to test the variations in a unit test, i'd prefer that to #986
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's definitely add more tests here as we go along and find the cases that we want to cover while improving the matching logic
feat: csaf correlation - correlate sbom to vulnerabilities chore: add basic test for product status correlation fix: broken sbom advisory logic fix: product status should only be valid for the current major stream fix: improve sbom product_status query fix: sbom status query to distinct on vulnerabilty id as well fix: improve the performance of vulnerability to sbom query feat: add sbom packages, with only names for now
e4e2618
to
743a420
Compare
743a420
to
15fb106
Compare
) | ||
.await?; | ||
|
||
// Transform data: populate `package` from `base_purl` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for going to the effort -- seems a tad gnarly. I confirmed the migration of my local data worked fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is :/ Thanks for checking it
This PR should be able to correlate VEX advisories from CSAF files that don't have product identifier helpers (purls) to correct sboms.
To test it, run a local instance of trustify
Ingest one of the datasets (ds1 and ds3 should both work)
http POST localhost:8080/api/v1/dataset @etc/datasets/ds3.zip
And inspect results for some of the documents, e.g. CVE-2023-0044 and quarkus-sbom
http GET localhost:8080/api/v1/vulnerability/CVE-2023-0044 | jq .advisories
(find uuid of quarkus-sbom and the provide it below)
http GET http://localhost:8080/api/v1/sbom/urn%3Auuid%3A01931b9e-8c5f-7681-8343-c8265a1257f1/advisory
This PR doesn't link found components to actual SBOM purls. That will be done in a separate effort.