GeoIP in http logs #40

Open
Iqi-Malick opened this issue Jun 15, 2022 · 5 comments

Comments

@Iqi-Malick

How can we implement the GeoIP feature for HTTP and DNS logs?

@philrz
Contributor

philrz commented Jun 27, 2022

@Iqi-Malick: Sorry for the delay in getting back to you. I noticed you'd asked the Zeek team about this over on their Slack at https://zeekorg.slack.com/archives/CSZBXF6TH/p1655289167790869. Have you managed to achieve what you were attempting, or do you still need help?

@Iqi-Malick
Author

> @Iqi-Malick: Sorry for the delay in getting back to you. I noticed you'd asked the Zeek team about this over on their Slack at https://zeekorg.slack.com/archives/CSZBXF6TH/p1655289167790869. Have you managed to achieve what you were attempting, or do you still need help?

The issue is not resolved yet. I still need help with it.

@philrz
Contributor

philrz commented Jul 6, 2022

@Iqi-Malick: I just spent some time hacking at trying to add these additional fields to the other logs like DNS and HTTP and couldn't get it to work. I'm sure someone with better Zeek scripting skills than me could figure it out. I'll leave this issue open in case someone else sees it and is able to contribute.

FWIW, rather than adding the redundant info in all the different log files, the way I'd try to approach this would be to grab the uid field from the logs for HTTP/DNS/etc. and then look up that same uid in the conn.log to get the geolocation data, since there's always going to be a conn entry that corresponds to each higher-level protocol message. Is there a reason why this approach is unworkable for your use case?
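To make the uid lookup idea concrete, here is a minimal Python sketch, not something from the geoip-conn package. It assumes the conn and HTTP/DNS records have already been parsed into dicts and that the geolocation fields are flattened with a `geo.` prefix; the function names and the sample uid are hypothetical.

```python
def index_geo_by_uid(conn_records):
    """Map each conn record's uid to its geolocation fields.

    Assumption: geolocation fields are stored under keys starting with "geo.".
    """
    return {
        rec["uid"]: {k: v for k, v in rec.items() if k.startswith("geo.")}
        for rec in conn_records
    }


def lookup_geo(app_record, geo_by_uid):
    """Return the geo fields for an app-layer record, or None if no conn entry matches."""
    return geo_by_uid.get(app_record["uid"])


# Hypothetical sample records for illustration only.
conn = [{"uid": "CExampleUid1", "proto": "tcp", "geo.resp.country_code": "US"}]
http = {"uid": "CExampleUid1", "host": "example.com"}

geo_by_uid = index_geo_by_uid(conn)
print(lookup_geo(http, geo_by_uid))
```

Building the index once and then looking up each higher-level record keeps the geolocation data in one place instead of duplicating it in every log.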

@Iqi-Malick
Author

Our system architecture is designed in such a way that connection logs alone are not enough, which is why I need a separate script to add GeoIP data to all the logs.

@philrz
Contributor

philrz commented Apr 4, 2024

While I still don't have a way to make the geoip-conn package add this detail to non-conn Zeek logs via Zeek scripting, I can point to a way zq could be used to accomplish this once the logs are generated. The approach leverages the fact that Zeek conn records and app-layer records like http share a uid value. Consider the Zed script add-geo-to-http.zed:

file http.log
| inner join (
  file conn.log
  | _path=="conn"
) on uid=uid geo:=geo

I'll walk through an example of putting it to use on the sample pcap wrccdc.2018-03-23.010014000000000.pcap.gz. Assuming that it's been downloaded and unzipped, first process it with a Zeek instance that has the geoip-conn package installed, then run the Zed script.

$ zeek -C -r wrccdc.2018-03-23.010014000000000.pcap local

$ zq -f zeek -o http-with-geo.log -I add-geo-to-http.zed 

That command line outputs the geolocation-enhanced HTTP records in a separate Zeek TSV file http-with-geo.log. Since the free GeoLite2 data has limited coverage, as usual the geolocation data will be missing for many of the entries. But we can create another zq command line to see an example of it having successfully found and used the info:

$ zq -Z 'geo.resp.country_code != null | tail 1' http-with-geo.log  
{
    _path: "http",
    ts: 2018-03-23T19:59:00.449926Z,
    uid: "Czv8eo3DjGuHWIA9He",
    id: {
        orig_h: 10.47.4.218,
        orig_p: 50017 (port=uint16),
        resp_h: 172.217.11.78,
        resp_p: 80 (port)
    },
    trans_depth: 1 (uint64),
    method: "POST",
    host: "clients1.google.com",
    uri: "/ocsp",
    referrer: null (string),
    version: "1.1",
    user_agent: "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:55.0) Gecko/20100101 Goanna/4.0 Firefox/55.0 Basilisk/20180321",
    origin: null (string),
    request_body_len: 75 (uint64),
    response_body_len: 463 (uint64),
    status_code: 200 (uint64),
    status_msg: "OK",
    info_code: null (uint64),
    info_msg: null (string),
    tags: |[]| (|[zenum=string]|),
    username: null (string),
    password: null (string),
    proxied: null (|[string]|),
    orig_fuids: [
        "FXVbqtTzjyUsLg5b1"
    ],
    orig_filenames: null ([string]),
    orig_mime_types: [
        "application/ocsp-request"
    ],
    resp_fuids: [
        "FNkgeP19NcT2E1f3L3"
    ],
    resp_filenames: null ([string]),
    resp_mime_types: [
        "application/ocsp-response"
    ],
    geo: {
        orig: {
            country_code: null (string),
            region: null (string),
            city: null (string),
            latitude: null (float64),
            longitude: null (float64)
        },
        resp: {
            country_code: "US",
            region: null (string),
            city: null (string),
            latitude: 37.751,
            longitude: -97.822
        }
    }
}
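For anyone who wants to do the same check outside zq, the following Python sketch reads Zeek TSV text and keeps only records whose responder country code is set, mirroring the `geo.resp.country_code != null | tail 1` filter above. It assumes the nested geo record is flattened into a dotted column name such as `geo.resp.country_code` in the TSV output and that unset values use Zeek's default `-` marker; the function names and sample rows are hypothetical.

```python
def read_zeek_tsv(text):
    """Parse Zeek TSV log text into a list of dicts keyed by the #fields names."""
    fields = None
    records = []
    for line in text.splitlines():
        if line.startswith("#fields"):
            # The first token is the "#fields" tag itself; the rest are column names.
            fields = line.split("\t")[1:]
        elif line and not line.startswith("#"):
            if fields is None:
                raise ValueError("data line seen before #fields header")
            records.append(dict(zip(fields, line.split("\t"))))
    return records


def with_resp_country(records, unset="-"):
    """Keep records whose responder country code is set (assumed column name)."""
    return [r for r in records if r.get("geo.resp.country_code", unset) != unset]


# Hypothetical two-row log for illustration; a real file has more columns.
sample = "\n".join([
    "#unset_field\t-",
    "#path\thttp",
    "\t".join(["#fields", "uid", "host", "geo.resp.country_code"]),
    "\t".join(["CExampleUid1", "internal.example", "-"]),
    "\t".join(["CExampleUid2", "example.com", "US"]),
])

hits = with_resp_country(read_zeek_tsv(sample))
print(hits[-1] if hits else "no geolocated responders")
```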

This same approach could be used to decorate other Zeek log types that also have the uid field, e.g., dns as mentioned in a previous comment.
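As a rough illustration of that reuse, this Python sketch generates the same Zed script text for any log file name, so a dns variant only differs in the first line. The helper name is hypothetical, and it assumes the target log sits next to conn.log in the working directory, as in the http example.

```python
def geo_join_script(log_name, conn_log="conn.log"):
    """Return Zed script text that joins log_name with conn records on uid.

    The template mirrors add-geo-to-http.zed; only the input file name changes.
    """
    return (
        f"file {log_name}\n"
        "| inner join (\n"
        f"  file {conn_log}\n"
        '  | _path=="conn"\n'
        ") on uid=uid geo:=geo\n"
    )


# Print the dns variant; it could be saved as, say, add-geo-to-dns.zed.
print(geo_join_script("dns.log"))
```

The resulting file could then be passed to zq with `-I` in the same way as the http script.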

If you have a Zeek installation that's continuously generating logs based on live traffic, I'd recommend running scripts like this as a post-log-rotation step since all the different Zeek log types needed to perform the join will be present after each rotation.

@Iqi-Malick: I'm not sure if you're still watching this issue and are interested in the enhancement, but if you (or anyone else who stumbles onto this issue) have questions about the zq approach, I'm happy to provide more guidance. I'll continue to hold this issue open so others can find it, whether to try the zq approach or, if they have the Zeek scripting skills, to enhance the Zeek script itself as originally requested.
