Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: csaf correlation - correlate vulnerability to purls and sboms #948

Merged
merged 3 commits into from
Nov 13, 2024

Conversation

dejanb
Copy link
Contributor

@dejanb dejanb commented Oct 30, 2024

This PR should be able to correlate VEX advisories from CSAF files that don't have product identifier helpers (purls) to correct sboms.

To test it, run a local instance of trustify

Ingest one of the datasets (ds1 and ds3 should both work)

http POST localhost:8080/api/v1/dataset @etc/datasets/ds3.zip

And inspect results for some of the documents, e.g. CVE-2023-0044 and quarkus-sbom

http GET localhost:8080/api/v1/vulnerability/CVE-2023-0044 | jq .advisories

(find uuid of quarkus-sbom and the provide it below)

http GET http://localhost:8080/api/v1/sbom/urn%3Auuid%3A01931b9e-8c5f-7681-8343-c8265a1257f1/advisory
This PR doesn't link found components to actual SBOM purls. That will be done in a separate effort.

@dejanb dejanb force-pushed the sbom-correlation branch 3 times, most recently from 2d45d4f to fd6d593 Compare October 30, 2024 13:53
@dejanb
Copy link
Contributor Author

dejanb commented Oct 30, 2024

First stab at finding sboms and components affected by advisory when purls are not available.

@JimFuller-RedHat I'll try next to use graph analysis to try to search for purl when appropriate information is available

@dejanb dejanb force-pushed the sbom-correlation branch 2 times, most recently from e3a636c to e79b51c Compare November 4, 2024 12:29
@dejanb dejanb marked this pull request as draft November 4, 2024 14:02
@dejanb dejanb force-pushed the sbom-correlation branch 3 times, most recently from 2d22088 to 4a23380 Compare November 12, 2024 12:06
@dejanb dejanb marked this pull request as ready for review November 12, 2024 12:29
Copy link
Contributor

@jcrossley3 jcrossley3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel qualified to review some of the logic -- it's a big PR.

I'm a tad concerned about terminology. That the UX refers to "packages" and the backend to "purls" makes me nervous about introducing "component". Ideally, we could agree on a single term.

pub base_purl_id: Option<Uuid>,
pub component: Option<String>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the name much better

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. I guess I wanted to keep it distinct enough from purls. In cyclonedx terminology it's components, spdx referf to packages. Maybe going with packages make sense in the end to be uniform in our code.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer component as packages are overloaded - though good as long as they are the same wherever (package or component).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed them to package. I like the term component better as well, but let's discuss this on a whole code base level once we are ready to change that

Comment on lines 14 to 23
.drop_column(ProductStatus::BasePurlId)
.to_owned(),
)
.await?;

manager
.alter_table(
Table::alter()
.table(ProductStatus::Table)
.add_column(ColumnDef::new(ProductStatus::Component).string())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we migrate existing data as well? Or is there no correlation between BasePurl's and Components?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's doable, but not that easy. I think the best way would be to reimport and as this was a short-lived "feature" it shouldn't be a big impact. WDYT?

Copy link
Contributor

@jcrossley3 jcrossley3 Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by "reimport"? If I have a database full of data, and I merge in these changes, how will my existing data be affected? If the UX doesn't break, I guess I don't care about migrating the data. 🤷

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about empty database and reimport documents. Let me see if I can come up with a way to avoid it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added data migration logic as well

Comment on lines 83 to 101
let product_advisory_info = sbom
.find_related(product_version::Entity)
.join(JoinType::LeftJoin, product_version::Relation::Product.def())
.join(JoinType::LeftJoin, product::Relation::Cpe.def())
.join(
JoinType::Join,
trustify_entity::cpe::Relation::ProductStatus.def(),
)
.join(JoinType::Join, product_status::Relation::Status.def())
.join(JoinType::Join, product_status::Relation::Advisory.def())
.distinct_on([
(product_status::Entity, product_status::Column::ContextCpeId),
(product_status::Entity, product_status::Column::StatusId),
(product_status::Entity, product_status::Column::Component),
(
product_status::Entity,
product_status::Column::VulnerabilityId,
),
])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these joins feel like a smell to me, but I got no better suggestion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you look existing search for purls, it's the similar join-armargedon :)

https://github.com/trustification/trustify/blob/main/modules/fundamental/src/sbom/model/details.rs#L42

There's A LOT of room for improvement, but we need to start somewhere.

@carlosthe19916
Copy link
Member

I can see data for vulnerabilities that is great!!!!!!!!!!

image

The only thing I noticed is that if I hit http://localhost:8080/api/v1/sbom/urn:uuid:019320e4-4b70-7a90-a81b-8003340dc7bf/advisory then I can get packages without purl but only with name e.g.

curl http://localhost:8080/api/v1/sbom/urn:uuid:019320e4-4b70-7a90-a81b-8003340dc7bf/advisory | jq
[
    {
        "uuid": "urn:uuid:e07bc26b-0445-454c-b621-5ba663529bd7",
        "identifier": "https://www.redhat.com/#CVE-2023-1664",
        "issuer": {
            "id": "91267f8b-91c9-4a50-99e5-73bd1140fc3e",
            "name": "Red Hat Product Security",
            "cpe_key": null,
            "website": null
        },
        "published": "2023-03-27T00:00:00Z",
        "modified": "2023-11-14T17:39:21Z",
        "withdrawn": null,
        "title": "keycloak: Untrusted Certificate Validation",
        "labels": {
            "type": "csaf",
            "datasetFile": "csaf/2023/cve-2023-1664.json"
        },
        "status": [
            {
                "vulnerability_id": "CVE-2023-1664",
                "status": "affected",
                "context": {
                    "cpe": "cpe:/a:redhat:quarkus:2:*:*:*"
                },
                "packages": [
                    {
                        "id": "",
                        "name": "org.keycloak/keycloak-core",
                        "version": null,
                        "purl": [],
                        "cpe": []
                    }
                ]
            }
        ]
    },

I cut the previous JSON to only see the relevant data.
Having a package only with name and without a purl intended? Currently, at least on the UI, there must be a purl for it to be rendered otherwise it won't be rendered.

@dejanb
Copy link
Contributor Author

dejanb commented Nov 12, 2024

I cut the previous JSON to only see the relevant data. Having a package only with name and without a purl intended? Currently, at least on the UI, there must be a purl for it to be rendered otherwise it won't be rendered.

@carlosthe19916 I wanted to mentioned this earlier today, but forgot :)

Find the purl based on the component name is the next step and I filled the separate issue for it #994. But in general we can run into the case where we can't find it and I think we should display component name only in that case.

WHERE base_purl_id IN (
SELECT id FROM base_purl
WHERE id IN (SELECT DISTINCT base_purl_id FROM product_status WHERE base_purl_id IS NOT NULL)
AND TYPE = 'generic'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for my own education ... why the TYPE='generic' filter is relevant ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is how previous logic stored those "fake" purls, so basically io.quarkus/quarkus-vertx-http was stored as generic://io.quarkus/quarkus-vertx-http. So we need to clean those up

let current_version = &cpe.version().clone().to_string();
let current_semver = lenient_semver::parse(current_version);

if match (&version_semver, &current_semver) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered using existing version related functions (e.g. psql funcs) so we do not introduce a 2nd code path (using lenient_semver) - I believe all those functions can be used directly in rust/sea_orm (or maybe even better performing in the query itself.

Copy link
Contributor Author

@dejanb dejanb Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did, but I was pushing what I was able to make work at this point with pure sql. I would leave that for the future improvement, I would love to have all this incorporated in the original query. And also make more improvements, once we have proper testing infra set up.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good - might be worth inserting a TODO here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add it, but it definitely won't be forgotten :)

let ingest_results = ctx
.ingest_documents([
"csaf/cve-2023-0044.json",
"quarkus/v2/quarkus-bom-2.13.8.Final-redhat-00004.json",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering how much variation there is with sboms ... might be worthwhile to test a few different ones (quarkus is pretty opinionated).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's where #986 should come to play

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's at all possible to test the variations in a unit test, i'd prefer that to #986

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's definitely add more tests here as we go along and find the cases that we want to cover while improving the matching logic

feat: csaf correlation - correlate sbom to vulnerabilities

chore: add basic test for product status correlation

fix: broken sbom advisory logic

fix: product status should only be valid for the current major stream

fix: improve sbom product_status query

fix: sbom status query to distinct on vulnerabilty id as well

fix: improve the performance of vulnerability to sbom query

feat: add sbom packages, with only names for now
)
.await?;

// Transform data: populate `package` from `base_purl`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for going to the effort -- seems a tad gnarly. I confirmed the migration of my local data worked fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is :/ Thanks for checking it

@dejanb dejanb added this pull request to the merge queue Nov 13, 2024
Merged via the queue into main with commit 5436509 Nov 13, 2024
2 checks passed
@dejanb dejanb deleted the sbom-correlation branch November 13, 2024 15:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants