Excavate from RAW_TEXT events #1636

Merged

Conversation

domwhewell-sage
Contributor

This PR restores the if statement in the internal excavate module so it can run its rules on RAW_TEXT events. A test has also been added that downloads a PDF and extracts URLs from it.

I have changed the unstructured module because its discovery context included the full content of the RAW_TEXT event, which made output.csv quite large.
I have also added a replace() to the __str__ function of the Event class so newlines are not printed in debug logs.
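
A minimal sketch of the __str__ change described above, assuming each newline is replaced with a literal backslash-n. SketchEvent and its 40-character truncation are hypothetical and are not BBOT's actual Event class:

```python
class SketchEvent:
    """Hypothetical stand-in for an event; not BBOT's real Event class."""

    def __init__(self, event_type: str, data: str, max_len: int = 40):
        self.type = event_type
        self.data = data
        self.max_len = max_len  # assumed truncation length for log output

    def __str__(self) -> str:
        # Escape newlines so one event always occupies exactly one log line.
        flat = self.data.replace("\n", "\\n")
        if len(flat) > self.max_len:
            flat = flat[: self.max_len] + "..."
        return f'{self.type}("{flat}")'


event = SketchEvent("RAW_TEXT", "first line\nsecond line")
rendered = str(event)
```

With this shape, multi-line file content stays on a single line in debug output.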

@domwhewell-sage
Contributor Author

domwhewell-sage commented Aug 6, 2024

Excavate is correctly parsing events out of downloaded files. However, after running bbot -t dell.com -m github_org, git_clone, unstructured, httpx as a test, many events that should be outside of scope are getting captured, for example:

SOCIAL,"{'platform': 'github', 'profile_name': 'irisgve', 'url': 'https://github.com/irisgve'}",,github_org,4,"distance-4,github-org-member,spider-danger","Scan subtle_karen seeded with DNS_NAME: dell.com --> speculated ORG_STUB: dell --> github_org tried ""dell"" as GitHub profile and discovered SOCIAL: https://github.com/dell --> github_org listed repos for GitHub profile and discovered CODE_REPOSITORY: https://github.com/dell/thinos-electron --> git_clone downloaded git repo at https://github.com/dell/thinos-electron to FILESYSTEM: /home/user/.bbot/scans/subtle_karen/git_repos/thinos-electron --> unstructured discovered FILESYSTEM: {'path': '/home/user/.bbot/scans/subtle_karen/git_repos/thinos-electron/docs/tutorial/updates.md'} --> Extracted text from /home/user/.bbot/scans/subtle_karen/git_repos/thinos-electron/docs/tutorial/updates.md --> Excavate's URLExtractor emitted URL_UNVERIFIED https://github.com/atlassian/nucleus, because Parsed file content contains full URL --> social detected github SOCIAL at https://github.com/atlassian --> github_org listed members of GitHub organization and discovered SOCIAL: https://github.com/irisgve"

@domwhewell-sage domwhewell-sage marked this pull request as ready for review August 6, 2024 15:05
@TheTechromancer
Collaborator

TheTechromancer commented Aug 6, 2024

many events that should be outside of scope are getting captured

Yeah we should probably look at each event in that chain and figure out where either:

  1. An event's scope distance is not getting incremented when it should be, or else
  2. A module is accepting events that are too far out, and should have its scope_distance_modifier lowered.

@domwhewell-sage
Contributor Author

Ah, I have this in my config:

scope:
  report_distance: 1
  search_distance: 1

Which may be allowing SOCIAL to emit "https://github.com/atlassian"

Will try it without

@domwhewell-sage
Contributor Author

domwhewell-sage commented Aug 7, 2024

Yes, it was my configuration that allowed social to grab OOS GitHub profiles and have them consumed by other modules.
The reason I created that configuration was that unstructured wasn't scanning downloaded git repos:
Not accepting FILESYSTEM("{'path': '/root/.bbot/scans/dark_janet/git_repos/iDRAC-Redfish-Scripting'}", module=git_clone, tags={'git', 'folder', 'distance-1'}) because its scope_distance (1) exceeds the maximum allowed by the scan (0) + the module (0) == 0
So I have changed unstructured to set scope_distance_modifier = 1, so that it picks up downloaded files and raises them as raw text.
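
The arithmetic in that log message can be sketched as follows. This is only an illustration of the check as the message words it (scan distance plus module modifier); module_accepts and its parameter names are hypothetical, not BBOT's internal API:

```python
def module_accepts(event_distance: int, scan_search_distance: int, module_modifier: int) -> bool:
    """Return True if the event is close enough for the module to accept it.

    Hypothetical helper: mirrors "scope_distance exceeds the maximum allowed
    by the scan + the module" from the log line, nothing more.
    """
    max_allowed = scan_search_distance + module_modifier
    return event_distance <= max_allowed


# FILESYSTEM event at distance 1, scan search distance 0.
before_change = module_accepts(1, 0, 0)  # modifier 0: 1 > 0, rejected
after_change = module_accepts(1, 0, 1)   # modifier 1: 1 <= 1, accepted
```

Raising only the module's modifier keeps the scan-wide search distance at 0, so other modules do not start consuming distance-1 events.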

My re-run on dell.com was far more successful: just a few events that probably should be OOS, but nothing from unstructured.

# cat ~/.bbot/scans/acrophobic_annie/output.txt | grep -v dell
[SCAN]                  acrophobic_annie (SCAN:d0563c942297fca3993ca4880e153ae474a43d91)        TARGET  (in-scope, target)
[DNS_NAME]              mxb-00154901.gslb.pphosted.com  MX      (a-record, affiliate, distance-1, subdomain)
[DNS_NAME]              spf.has.pphosted.com    TXT     (affiliate, distance-1, ns-record, soa-record, subdomain)
[DNS_NAME]              mxa-00154901.gslb.pphosted.com  MX      (a-record, affiliate, distance-1, subdomain)
[STORAGE_BUCKET]        {"name": "github-cloud", "url": "https://github-cloud.s3.amazonaws.com/"}       httpx->cloud_amazon     (cloud-amazon, cloud-storage-bucket, distance-2)
[SOCIAL]                {"platform": "github", "profile_name": "customer-stories", "url": "https://github.com/customer-stories"}        httpx->excavate->social (distance-2, spider-danger, spider-max)
[SOCIAL]                {"platform": "github", "profile_name": "notifications", "url": "https://github.com/notifications"}      httpx->excavate->social (distance-2, spider-danger, spider-max)
[SOCIAL]                {"platform": "github", "profile_name": "security", "url": "https://github.com/security"}        httpx->excavate->social (distance-2, spider-danger, spider-max)
[SOCIAL]                {"platform": "github", "profile_name": "team", "url": "https://github.com/team"}        httpx->excavate->social (distance-2, spider-danger, spider-max)
[SOCIAL]                {"platform": "github", "profile_name": "features", "url": "https://github.com/features"}        httpx->excavate->social (distance-2, spider-danger, spider-max)
[SOCIAL]                {"platform": "github", "profile_name": "solutions", "url": "https://github.com/solutions"}      httpx->excavate->social (distance-2, spider-danger, spider-max)
[SOCIAL]                {"platform": "github", "profile_name": "readme", "url": "https://github.com/readme"}    httpx->excavate->social (distance-2, spider-danger, spider-max)
[SOCIAL]                {"platform": "github", "profile_name": "trending", "url": "https://github.com/trending"}        httpx->excavate->social (distance-2, spider-danger, spider-max)
[SOCIAL]                {"platform": "github", "profile_name": "enterprise", "url": "https://github.com/enterprise"}    httpx->excavate->social (distance-2, spider-danger, spider-max)
[SOCIAL]                {"platform": "github", "profile_name": "pricing", "url": "https://github.com/pricing"}  httpx->excavate->social (distance-2, spider-danger, spider-max)
[SOCIAL]                {"platform": "github", "profile_name": "topics", "url": "https://github.com/topics"}    httpx->excavate->social (distance-2, spider-danger, spider-max)
[SOCIAL]                {"platform": "github", "profile_name": "collections", "url": "https://github.com/collections"}  httpx->excavate->social (distance-2, spider-danger, spider-max)
[SOCIAL]                {"platform": "github", "profile_name": "login", "url": "https://github.com/login"}      httpx->excavate->social (distance-2, spider-danger, spider-max)
[DNS_NAME]              spf.has.pphosted.com    TXT     (affiliate, distance-1, subdomain)
[SOCIAL]                {"platform": "github", "profile_name": "sponsors", "url": "https://github.com/sponsors"}        httpx->excavate->social (distance-2, spider-danger, spider-max)
[SOCIAL]                {"platform": "github", "profile_name": "resources", "url": "https://github.com/resources"}      httpx->excavate->social (distance-2, spider-danger, spider-max)
[SOCIAL]                {"platform": "github", "profile_name": "premium-support", "url": "https://github.com/premium-support"}  httpx->excavate->social (distance-2, spider-danger, spider-max)
[DNS_NAME]              demdex.net      httpx->excavate (affiliate, ns-record, soa-record)
[DNS_NAME]              truste.com      httpx->excavate (a-record, affiliate, cloud-amazon, mx-record, ns-record, soa-record, txt-record)
[DNS_NAME]              ytimg.com       httpx->excavate (affiliate, mx-record, ns-record, soa-record)
[DNS_NAME]              cdn-prod.eu.securiti.ai httpx->excavate (a-record, aaaa-record, affiliate, cloud-amazon, cname-record, ns-record, soa-record)
[DNS_NAME]              iperceptions.com        httpx->excavate (a-record, affiliate, cloud-amazon, mx-record, ns-record, soa-record, txt-record)
[DNS_NAME]              universal.iper2.com     httpx->excavate (a-record, affiliate, cname-record, srv-record)
[DNS_NAME]              ensighten.com   httpx->excavate (a-record, affiliate, cloud-amazon, mx-record, ns-record, soa-record, txt-record)
[DNS_NAME]              adoberesources.net      httpx->excavate (affiliate, ns-record, soa-record)
[DNS_NAME]              cdnjs.cloudflare.com    httpx->excavate (a-record, aaaa-record, affiliate, cdn-cloudflare, ns-record, soa-record)
[DNS_NAME]              app.eu.securiti.ai      httpx->excavate (a-record, affiliate, cloud-amazon, cname-record)
[DNS_NAME]              www.youtube.com httpx->excavate (a-record, aaaa-record, affiliate, cname-record)
[DNS_NAME]              instruqt.com    httpx->excavate (a-record, affiliate, cloud-amazon, mx-record, ns-record, soa-record, txt-record)
[SOCIAL]                {"platform": "github", "profile_name": "orgs", "url": "https://github.com/orgs"}        httpx->excavate->social (distance-2, spider-danger, spider-max)
[SOCIAL]                {"platform": "github", "profile_name": "search", "url": "https://github.com/search"}    httpx->excavate->social (distance-2, spider-danger, spider-max)
[SOCIAL]                {"platform": "github", "profile_name": "signup", "url": "https://github.com/signup"}    httpx->excavate->social (distance-2, spider-danger, spider-max)
[SOCIAL]                {"platform": "github", "profile_name": "opensearch", "url": "https://github.com/opensearch"}    httpx->excavate->social (distance-2, spider-danger, spider-max)
[SOCIAL]                {"platform": "github", "profile_name": "manifest", "url": "https://github.com/manifest"}        httpx->excavate->social (distance-2, spider-danger, spider-max)
[DNS_NAME]              github-cloud.s3.amazonaws.com   httpx->excavate (affiliate)
[DNS_NAME]              github.githubassets.com httpx->excavate (affiliate)
[DNS_NAME]              www.githubstatus.com    httpx->excavate (affiliate)
[DNS_NAME]              productionresultssa15.blob.core.windows.net     httpx->excavate (a-record, affiliate, cname-record)
[DNS_NAME]              productionresultssa5.blob.core.windows.net      httpx->excavate (a-record, affiliate, cname-record)
[DNS_NAME]              github-production-user-asset-6210df.s3.amazonaws.com    httpx->excavate (a-record, affiliate, cname-record, ns-record, soa-record)
[DNS_NAME]              collector.github.com    httpx->excavate (affiliate)
[DNS_NAME]              productionresultssa13.blob.core.windows.net     httpx->excavate (a-record, affiliate, cname-record)
[DNS_NAME]              productionresultssa8.blob.core.windows.net      httpx->excavate (a-record, affiliate, cname-record)
[DNS_NAME]              objects-origin.githubusercontent.com    httpx->excavate (a-record, affiliate, cname-record)
[DNS_NAME]              productionresultssa4.blob.core.windows.net      httpx->excavate (a-record, affiliate, cname-record)
[DNS_NAME]              productionresultssa12.blob.core.windows.net     httpx->excavate (a-record, affiliate, cname-record)
[DNS_NAME]              github.com      httpx->excavate (affiliate, cdn-github)
[DNS_NAME]              github-production-repository-file-5c1aeb.s3.amazonaws.com       httpx->excavate (a-record, affiliate, cname-record, ns-record, soa-record)
[DNS_NAME]              alive.github.com        httpx->excavate (a-record, affiliate, cname-record)
[DNS_NAME]              productionresultssa6.blob.core.windows.net      httpx->excavate (a-record, affiliate, cname-record)
[DNS_NAME]              productionresultssa19.blob.core.windows.net     httpx->excavate (a-record, affiliate, cname-record)
[DNS_NAME]              productionresultssa18.blob.core.windows.net     httpx->excavate (a-record, affiliate, cname-record)
[DNS_NAME]              productionresultssa7.blob.core.windows.net      httpx->excavate (a-record, affiliate, cname-record)
[DNS_NAME]              raw.githubusercontent.com       httpx->excavate (a-record, aaaa-record, affiliate)
[DNS_NAME]              api.github.com  httpx->excavate (affiliate)
[DNS_NAME]              proxy.enterprise.githubcopilot.com      httpx->excavate (a-record, affiliate, cname-record)
[DNS_NAME]              github-production-upload-manifest-file-7fdce7.s3.amazonaws.com  httpx->excavate (a-record, affiliate, cname-record, ns-record, soa-record)
[DNS_NAME]              productionresultssa16.blob.core.windows.net     httpx->excavate (a-record, affiliate, cname-record)
[DNS_NAME]              gist.github.com httpx->excavate (a-record, affiliate, cname-record, mx-record, ns-record, soa-record, txt-record)
[DNS_NAME]              productionresultssa3.blob.core.windows.net      httpx->excavate (a-record, affiliate, cname-record)
[DNS_NAME]              productionresultssa0.blob.core.windows.net      httpx->excavate (a-record, affiliate, cname-record)
[DNS_NAME]              github-production-repository-image-32fea6.s3.amazonaws.com      httpx->excavate (a-record, affiliate, cname-record, ns-record, soa-record)
[DNS_NAME]              api.githubcopilot.com   httpx->excavate (a-record, affiliate, cname-record)
[DNS_NAME]              productionresultssa14.blob.core.windows.net     httpx->excavate (a-record, affiliate, cname-record)
[DNS_NAME]              uploads.github.com      httpx->excavate (a-record, affiliate, cname-record)
[DNS_NAME]              productionresultssa17.blob.core.windows.net     httpx->excavate (a-record, affiliate, cname-record)
[DNS_NAME]              insights.github.com     httpx->excavate (a-record, affiliate, cname-record)
[DNS_NAME]              rel.tunnels.api.visualstudio.com        httpx->excavate (affiliate, ns-record, soa-record)
[DNS_NAME]              productionresultssa1.blob.core.windows.net      httpx->excavate (a-record, affiliate, cname-record)
[DNS_NAME]              actions.githubusercontent.com   httpx->excavate (affiliate, ns-record, soa-record)
[DNS_NAME]              productionresultssa10.blob.core.windows.net     httpx->excavate (a-record, affiliate, cname-record)
[DNS_NAME]              productionresultssa2.blob.core.windows.net      httpx->excavate (a-record, affiliate, cname-record)
[DNS_NAME]              github-production-release-asset-2e65be.s3.amazonaws.com httpx->excavate (a-record, affiliate, cname-record, ns-record, soa-record)
[DNS_NAME]              productionresultssa11.blob.core.windows.net     httpx->excavate (a-record, affiliate, cname-record)
[DNS_NAME]              productionresultssa9.blob.core.windows.net      httpx->excavate (a-record, affiliate, cname-record)
[DNS_NAME]              copilot-proxy.githubusercontent.com     httpx->excavate (a-record, affiliate)
[DNS_NAME]              prod-lc-mqtt-tier2.sprinklr.com httpx->excavate (a-record, aaaa-record, affiliate, cloud-amazon, cname-record)
[DNS_NAME]              cdn.pendo.io    httpx->excavate (a-record, affiliate, cloud-google)
[DNS_NAME]              api.feedback.us.pendo.io        httpx->excavate (a-record, affiliate, cloud-google)
[DNS_NAME]              data.pendo.io   httpx->excavate (a-record, affiliate, cloud-google)
[DNS_NAME]              fonts.googleapis.com    httpx->excavate (a-record, aaaa-record, affiliate, cloud-google)
[DNS_NAME]              fonts.gstatic.com       httpx->excavate (a-record, aaaa-record, affiliate)
[DNS_NAME]              sprinklr.com    httpx->excavate (affiliate, cloud-amazon)
[DNS_NAME]              sandbox.embed.apollographql.com httpx->excavate (a-record, aaaa-record, affiliate, cdn-cloudflare, cname-record)
[DNS_NAME]              pendo-io-static.storage.googleapis.com  httpx->excavate (a-record, aaaa-record, affiliate, cloud-google, cloud-storage-bucket)
[DNS_NAME]              app.pendo.io    httpx->excavate (a-record, affiliate, cloud-google)
[STORAGE_BUCKET]        {"name": "pendo-io-static", "url": "https://pendo-io-static.storage.googleapis.com/"}   cloud_google    (affiliate, cloud-google, cloud-storage-bucket, distance-1)
[DNS_NAME]              consent.truste.com      httpx->excavate (a-record, affiliate, cloud-amazon)
[DNS_NAME]              apollo-server-landing-page.cdn.apollographql.com        httpx->excavate (a-record, affiliate, cloud-google)
[DNS_NAME]              nexus.ensighten.com     httpx->excavate (a-record, aaaa-record, affiliate, cloud-amazon, cname-record, ns-record, soa-record)
[DNS_NAME]              api.iperceptions.com    httpx->excavate (a-record, affiliate, cdn-github, cloud-azure, cname-record)
[DNS_NAME]              pendo-static-4809984318504960.storage.googleapis.com    httpx->excavate (a-record, aaaa-record, affiliate, cloud-google, cloud-storage-bucket)
[DNS_NAME]              consent.trustarc.com    httpx->excavate (a-record, affiliate, cloud-amazon)
[STORAGE_BUCKET]        {"name": "pendo-static-4809984318504960", "url": "https://pendo-static-4809984318504960.storage.googleapis.com/"}       cloud_google    (affiliate, cloud-google, cloud-storage-bucket, distance-1)
[DNS_NAME]              googleapis.com  httpx->excavate (affiliate, cloud-google)
[DNS_NAME]              everesttech.net httpx->excavate (a-record, affiliate, ns-record, soa-record)
[DNS_NAME]              google.com      httpx->excavate (a-record, aaaa-record, affiliate, mx-record, ns-record, soa-record, txt-record)
[DNS_NAME]              salesforceliveagent.com httpx->excavate (a-record, affiliate, cloud-amazon, ns-record, soa-record)
[DNS_NAME]              brightcove.net  httpx->excavate (affiliate)
[DNS_NAME]              dpm.demdex.net  httpx->excavate (a-record, affiliate, cloud-amazon, cname-record)
[IP_ADDRESS]            127.0.0.1       httpx->excavate (affiliate)
[DNS_NAME]              force.com       httpx->excavate (a-record, affiliate, cdn-akamai, mx-record, ns-record, soa-record, txt-record)
[DNS_NAME]              i.ytimg.com     httpx->excavate (a-record, aaaa-record, affiliate)
[DNS_NAME]              coveo.com       httpx->excavate (a-record, affiliate, cloud-amazon, mx-record, ns-record, soa-record, txt-record)
[DNS_NAME]              c.evidon.com    httpx->excavate (a-record, affiliate, cdn-akamai, cname-record)
[DNS_NAME]              boltdns.net     httpx->excavate (affiliate, ns-record, soa-record)
[DNS_NAME]              ggpht.com       httpx->excavate (affiliate, ns-record, soa-record)
[DNS_NAME]              bdstatic.com    httpx->excavate (a-record, affiliate, ns-record, soa-record)
[DNS_NAME]              brightcove.com  httpx->excavate (a-record, affiliate, mx-record, ns-record, soa-record, txt-record)
[DNS_NAME]              bootstrapcdn.com        httpx->excavate (a-record, affiliate, ns-record, soa-record)
[DNS_NAME]              brightcovecdn.com       httpx->excavate (affiliate, ns-record, soa-record)
[DNS_NAME]              my.salesforce.com       httpx->excavate (affiliate, ns-record, soa-record)
[DNS_NAME]              salesforce-sites.com    httpx->excavate (a-record, affiliate, cloud-amazon, ns-record, soa-record)
[DNS_NAME]              eu.securiti.ai  httpx->excavate (affiliate)
[DNS_NAME]              ns-261.awsdns-32.com    NS      (a-record, aaaa-record, affiliate, cloud-amazon, distance-1, subdomain)
[DNS_NAME]              ns-1527.awsdns-62.org   NS      (a-record, aaaa-record, affiliate, cloud-amazon, distance-1, subdomain)
[DNS_NAME]              ns-261.awsdns-32.com    SOA     (affiliate, cloud-amazon, distance-1, subdomain)
[DNS_NAME]              ns-653.awsdns-17.net    NS      (a-record, aaaa-record, affiliate, cloud-amazon, distance-1, subdomain)
[DNS_NAME]              b.mx.p53.neolane.net    MX      (a-record, affiliate, cloud-amazon, distance-1, subdomain)
[DNS_NAME]              ns-1856.awsdns-40.co.uk NS      (a-record, aaaa-record, affiliate, cloud-amazon, distance-1, subdomain)
[DNS_NAME]              a.mx.p53.neolane.net    MX      (a-record, affiliate, cloud-amazon, distance-1, subdomain)
[DNS_NAME]              evidon.com      httpx->excavate (affiliate, cloud-amazon)
[DNS_NAME]              screenmeet.com  httpx->excavate (a-record, affiliate, mx-record, ns-record, soa-record, txt-record)
[DNS_NAME]              iper2.com       httpx->excavate (affiliate)
[DNS_NAME]              akamaihd.net    httpx->excavate (affiliate, cdn-akamai, mx-record, ns-record, soa-record)
[DNS_NAME]              gstatic.com     httpx->excavate (affiliate)
[DNS_NAME]              l.betrad.com    httpx->excavate (a-record, affiliate, cloud-amazon, cname-record)
[DNS_NAME]              visualforce.com httpx->excavate (a-record, affiliate, cloud-amazon, ns-record, soa-record)
[SOCIAL]                {"platform": "facebook", "profile_name": "alienware", "url": "https://facebook.com/alienware"}  httpx->excavate->social (distance-2)
[SOCIAL]                {"platform": "instagram", "profile_name": "alienware", "url": "https://instagram.com/alienware"}        httpx->excavate->social (distance-2)
[DNS_NAME]              spf.has.pphosted.com    TXT     (affiliate, distance-1, subdomain)
[SOCIAL]                {"platform": "discord", "profile_name": "Alienware", "url": "https://discord.gg/Alienware"}     httpx->excavate->social (distance-2)
[DNS_NAME]              ns-149.awsdns-18.com    NS      (a-record, aaaa-record, affiliate, cloud-amazon, distance-1, subdomain)
[DNS_NAME]              ns-2024.awsdns-61.co.uk NS      (a-record, aaaa-record, affiliate, cloud-amazon, distance-1, subdomain)
[DNS_NAME]              ns-773.awsdns-32.net    NS      (a-record, aaaa-record, affiliate, cloud-amazon, distance-1, subdomain)
[DNS_NAME]              ns-1147.awsdns-15.org   NS      (a-record, aaaa-record, affiliate, cloud-amazon, distance-1, subdomain)
[DNS_NAME]              ns-1147.awsdns-15.org   SOA     (affiliate, cloud-amazon, distance-1, subdomain)
[DNS_NAME]              time.windows.com        CNAME   (a-record, affiliate, cloud-azure, cname-record, distance-1, subdomain)

I have found a few errors in excavate (caused by it consuming RAW_TEXT events, which have no URLs, instead of HTTP_RESPONSE events) and in unstructured (where it autodetects delimiters or fails to parse a file), which I will handle. Changing this PR back to draft while I resolve these.

@domwhewell-sage domwhewell-sage marked this pull request as draft August 7, 2024 10:46
@domwhewell-sage
Contributor Author

domwhewell-sage commented Aug 7, 2024

So I've added a few bits of data that should test most of the excavate rules when extracting from RAW_TEXT.

The tests are getting an error when emitting the FINDING events, e.g.

WARNING  bbot.modules.internal.excavate:base.py:1347 Error sanitizing event data "{'host': '', 'url': '', 'description': 'Parsed file content contains JSON Web Token (JWT) [eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c]'}" for type "FINDING": 2 validation errors for _data_validator
host
  Value error, Validation failed for ('',), {}: Invalid hostname: "" [type=value_error, input_value='', input_type=str]
    For further information visit https://errors.pydantic.dev/2.7/v/value_error
url
  Value error, Validation failed for ('',), {}: Validation failed for ('',), {}: Invalid URL: "" [type=value_error, input_value='', input_type=str]
    For further information visit https://errors.pydantic.dev/2.7/v/value_error

Thinking of the scan of dell.com: a file from a repo may contain a JWT that probably can't be traced back to a host or a URL to associate with the finding. So maybe a FINDING needs to validate a host & url OR a filepath attribute... But I'm open to suggestions.
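
As a rough sketch of that "host & url OR filepath" idea (not BBOT's actual pydantic validator; validate_finding and the path key are hypothetical names used only for illustration):

```python
from typing import Any, Dict


def validate_finding(data: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical validator: accept a FINDING located by host+url OR by path."""
    if not data.get("description"):
        raise ValueError("FINDING requires a description")

    has_network_location = bool(data.get("host")) and bool(data.get("url"))
    has_file_location = bool(data.get("path"))
    if not (has_network_location or has_file_location):
        raise ValueError("FINDING requires host and url, or a path")

    # Drop empty values so blank host/url strings never reach later checks.
    return {key: value for key, value in data.items() if value not in ("", None)}


cleaned = validate_finding(
    {"host": "", "url": "", "description": "JWT in file", "path": "docs/updates.md"}
)
```

Under this assumption, a secret found in a cloned file passes validation through its path alone, while a finding with no location at all is still rejected.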

@domwhewell-sage domwhewell-sage marked this pull request as ready for review August 7, 2024 18:57
@TheTechromancer
Collaborator

TheTechromancer commented Aug 7, 2024

Makes sense. Probably what we should do is, for event types like FINDING that require a host, if a host isn't specified, we just walk the chain of parents back until we hit the first host event, and use that one. We can raise an error if no host was found.

Always having a host helps in showing which git repo / website the file originally came from.

EDIT: this logic should probably go in DictHostEvent. I should have time to look at this sometime after DEFCON. And I agree we should add a path attribute to FINDING.
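
A minimal sketch of the parent-walking idea, assuming each event keeps a reference to its parent. SketchEvent and inherit_host are hypothetical names; this is not the DictHostEvent implementation:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchEvent:
    """Hypothetical event with only the fields this sketch needs."""
    type: str
    host: Optional[str] = None
    parent: Optional["SketchEvent"] = None


def inherit_host(event: SketchEvent) -> str:
    """Walk up the parent chain and return the first host found."""
    current = event.parent
    while current is not None:
        if current.host:
            return current.host
        current = current.parent
    raise ValueError(f"No host found in the parent chain of {event.type}")


# Chain loosely modelled on the discovery path discussed in this PR.
repo = SketchEvent("CODE_REPOSITORY", host="github.com")
folder = SketchEvent("FILESYSTEM", parent=repo)
text = SketchEvent("RAW_TEXT", parent=folder)
finding = SketchEvent("FINDING", parent=text)
found_host = inherit_host(finding)
```

Here the FINDING has no host of its own, so it takes the host of the nearest ancestor that has one; an event with no such ancestor raises an error, as suggested above.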

@liquidsec
Collaborator

Makes sense. Probably what we should do is, for event types like FINDING that require a host, if a host isn't specified, we just walk the chain of parents back until we hit the first host event, and use that one. We can raise an error if no host was found.

Always having a host helps in showing which git repo / website the file originally came from.

EDIT: this logic should probably go in DictHostEvent. I should have time to look at this sometime after DEFCON. And I agree we should add a path attribute to FINDING.

I totally agree with the first part about walking back up the parents looking for a host.

But I'm not getting the path/filepath thing.

Many findings wouldn't have anything like that, unless I'm misunderstanding?

@TheTechromancer
Collaborator

the path/filepath thing

This would be for secrets etc. found in a git repo. We'd want to capture which repo it was found in, but also which individual file.

@liquidsec
Collaborator

the path/filepath thing

This would be for secrets etc. found in a git repo. We'd want to capture which repo it was found in, but also which individual file.

Hmm, are we at the point where the "secrets"-based modules need their own unique event type?

@domwhewell-sage
Contributor Author

domwhewell-sage commented Aug 12, 2024

I don't think a new event would be necessary, but maybe an optional path / filepath attribute to a FINDING.

Yes, you would only really get a path attribute in the event.data dictionary of a FINDING that is a child of RAW_TEXT.

@domwhewell-sage domwhewell-sage marked this pull request as draft August 16, 2024 13:08
@domwhewell-sage domwhewell-sage marked this pull request as ready for review August 16, 2024 19:00
@domwhewell-sage
Contributor Author

Not sure if the event.host is correct? I've merged in the latest changes from #1656, but it doesn't seem to be getting the correct parent host:
2024-08-16T18:39:39.7445104Z DEBUG bbot.modules._scan_ingress:base.py:1235 FINDING("{'host': 'none', 'description': 'Parsed file content contains a possible seriali...", module=excavate, tags=set()) passed post-check

@TheTechromancer
Collaborator

TheTechromancer commented Aug 16, 2024

it doesn't seem to be getting the correct parent host

Hmm, I'll look into that. Also, I noticed there are several places where we're pulling url/path from parent events. I'm thinking we should also inherit those automatically. What do you think?

@TheTechromancer
Collaborator

TheTechromancer commented Aug 16, 2024

Okay @domwhewell-sage I pushed some fixes to #1666. This should save you from having to manually pull the url/path and hopefully fix the host issue.

EDIT: merge it into this branch and if it works well we can just merge both at once.


codecov bot commented Aug 17, 2024

Codecov Report

Attention: Patch coverage is 87.83784% with 18 lines in your changes missing coverage. Please review.

Project coverage is 93%. Comparing base (3172528) to head (a0a8f32).
Report is 34 commits behind head on dev.

Files                               Patch %   Lines
bbot/modules/internal/excavate.py   83%       12 Missing ⚠️
bbot/modules/unstructured.py        58%       3 Missing ⚠️
bbot/core/event/base.py             92%       2 Missing ⚠️
bbot/core/engine.py                 0%        1 Missing ⚠️
Additional details and impacted files
@@          Coverage Diff          @@
##             dev   #1636   +/-   ##
=====================================
+ Coverage     93%     93%   +1%     
=====================================
  Files        341     341           
  Lines      25926   25979   +53     
=====================================
+ Hits       23893   23950   +57     
+ Misses      2033    2029    -4     


@TheTechromancer TheTechromancer merged commit 5fa774a into blacklanternsecurity:dev Aug 17, 2024
8 checks passed

3 participants