This repository contains the source code to create a container image containing `tcpdump` and `pcap-cli` to perform packet capture in Cloud Run multi-container deployments.
Captured packets are optionally translated to JSON and written into Cloud Logging.
During development, it is often useful to capture packets to troubleshoot specific or gnarly network-related issues.
This container image is meant to be used as a sidecar of the Cloud Run main –ingress– container in order to perform a packet capture using `tcpdump` within the same network namespace.
The sidecar approach decouples packet capturing from the main –ingress– container, which requires no modifications; additionally, sidecars use their own resources, so `tcpdump` does not compete with the main app's resource allocation.
NOTE: the main –ingress– container is the one to which all ingress traffic ( HTTP Requests ) is delivered; for Cloud Run services, this is typically your APP container.
- Structured Cloud Logging entries that provide easily digestible pcap info.
  - `ARP` analysis.
  - `ICMPv4` and `ICMPv6` analysis:
    - supported messages: `EchoRequest`, `EchoReply`, `TimeExceeded`, `DestinationUnreachable`, and `Redirect`.
  - `HTTP/1.1` or `HTTP/2` analysis:
    - Segmented by networking layer and `HTTP/1.1` with raw message.
    - Report errors at `HTTP/1.1` message and `HTTP/2` frames analysis.
  - Packet linking query analysis via flow ID ( 5-tuple ) and Cloud Trace ID.
- Exports pcap files to Google Cloud Storage (GCS)
  - Support `.json` and `.pcap` file formats with optional gzip compression
  - Graceful handling of `SIGTERM` to ensure all completed pcap files are flushed to GCS before container exits.
- Packet capture configurability: `tcpdump` filter, interface, snapshot length, pcap file rotation duration.
  - simplified `tcpdump` filter creation by defining: FQDN, ports and TCP flags.
  - Control for scheduling `tcpdump` executions via `CRON`.
- Ubuntu 22.04 official docker image
- `tcpdump` installed from Ubuntu's official repository to perform packet captures.
- `gopacket` to perform packet capturing and getting a handle on all captured packets.
- GCSFuse to mount the GCS Bucket used to export PCAP files.
- Go Supervisord to orchestrate startup processes execution.
- fsnotify to listen for filesystem events.
- gocron to schedule execution of `tcpdump`.
- Docker Engine and Docker CLI to build the sidecar container image.
- pcap-cli to perform packet capturing and translations to JSON.
The sidecar uses:
- `tcpdump`/`pcap-cli` to capture packets in both Wireshark-compatible format and `JSON`. All containers use the same network namespace, so this sidecar captures packets from all containers within the same instance.
- `pcap-cli` to translate packets into Cloud Logging compatible structured `JSON`. It also provides `HTTP/1.1` and `HTTP/2` analysis, including Trace context awareness ( `X-Cloud-Trace-Context`/`traceparent` ) to hydrate structured logging with trace information, which allows rich network data analysis using Cloud Trace.
- `tcpdumpw` to execute `tcpdump`/`pcap-cli` and generate PCAP files; optionally, it schedules `tcpdump`/`pcap-cli` executions.
- `pcap-fsnotify` to listen for newly created PCAP files, optionally compress PCAPs ( recommended ) and move them into the Cloud Storage mount point.
- GCSFuse to mount a Cloud Storage Bucket to move compressed PCAP files into.
PCAP files are moved from the sidecar's in-memory filesystem into the mounted Cloud Storage Bucket.
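The compress-and-move step can be pictured with the one-shot sketch below. It is only an illustration, not the sidecar's actual implementation: `pcap-fsnotify` reacts to filesystem events, whereas this sketch simply scans a directory once. The function name `export_pcaps` and both directory arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of the sidecar): compress every finished
# PCAP file found in the source directory and move the compressed file
# into the destination directory (for example, the GCSFuse mount point).
export_pcaps() {
  src_dir="$1"
  dst_dir="$2"
  for f in "$src_dir"/*.pcap; do
    # When no .pcap file exists the glob stays literal; skip it.
    [ -e "$f" ] || continue
    gzip -f "$f"             # creates "$f.gz" and removes "$f"
    mv "$f.gz" "$dst_dir"/   # move the compressed file to the destination
  done
}
```

A call such as `export_pcaps /tmp/pcaps /pcap` would leave only `.gz` files in the destination; the source path here is an assumed location, and `/pcap` is the documented default of `GCS_MOUNT`.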
The pcap sidecar has images that are compatible with both Cloud Run execution environments.
Important
- The gen1 images are compatible for BOTH gen1 and gen2 Cloud Run execution environments.
- The gen2 images are compatible for ONLY the gen2 Cloud Run execution environment.
This is because gen1 does not support the newest version of libpcap, whereas gen2 does.
- Cloud Run gen1 images:
us-central1-docker.pkg.dev/pcap-sidecar/pcap-sidecar/pcap-sidecar:latest
us-central1-docker.pkg.dev/pcap-sidecar/pcap-sidecar/pcap-sidecar:v#.#.#-gen1
- Cloud Run gen2 images:
us-central1-docker.pkg.dev/pcap-sidecar/pcap-sidecar/pcap-sidecar:newest
us-central1-docker.pkg.dev/pcap-sidecar/pcap-sidecar/pcap-sidecar:v#.#.#-gen2
- Define environment variables to be used during Cloud Run service deployment:
export SERVICE_NAME='...' # Cloud Run or App Engine Flex service name
export SERVICE_REGION='...' # GCP Region: https://cloud.google.com/about/locations
export SERVICE_ACCOUNT='...' # Cloud Run service's identity
export INGRESS_CONTAINER_NAME='...' # the name of the ingress container e.g.: `app`
export INGRESS_IMAGE_URI='...'
export INGRESS_PORT='...'
export TCPDUMP_SIDECAR_NAME='...' # the name of the pcap sidecar e.g.: `pcap-sidecar`
# public image compatible with both gen1 & gen2. Alternatively build your own
export TCPDUMP_IMAGE_URI='us-central1-docker.pkg.dev/pcap-sidecar/pcap-sidecar/pcap-sidecar:latest'
export PCAP_IFACE='eth' # prefix of the interface from which packets should be captured
export PCAP_GCS_BUCKET='...' # the name of the Cloud Storage Bucket to mount
export PCAP_FILTER='...' # the BPF filter to use; e.g.: `tcp port 443`
export PCAP_JSON_LOG=true # set to `true` for writing structured logs into Cloud Logging
- Deploy the Cloud Run service including the `tcpdump` sidecar:
Note
If adding the `tcpdump` sidecar to a preexisting single-container Cloud Run service, the gcloud command will fail.
You will need to instead make these updates via the Cloud Console or create a new Cloud Run service.
gcloud run deploy ${SERVICE_NAME} \
--project=${PROJECT_ID} \
--region=${SERVICE_REGION} \
--service-account=${SERVICE_ACCOUNT} \
--container=${INGRESS_CONTAINER_NAME} \
--image=${INGRESS_IMAGE_URI} \
--port=${INGRESS_PORT} \
--container=${TCPDUMP_SIDECAR_NAME} \
--image=${TCPDUMP_IMAGE_URI} \
--cpu=1 --memory=1G \
--set-env-vars="PCAP_IFACE=${PCAP_IFACE},PCAP_GCS_BUCKET=${PCAP_GCS_BUCKET},PCAP_FILTER=${PCAP_FILTER},PCAP_JSON_LOG=${PCAP_JSON_LOG}"
See the full list of available flags for `gcloud run deploy` at https://cloud.google.com/sdk/gcloud/reference/run/deploy
- All containers need to depend on the `tcpdump` sidecar, but this configuration is not available via gcloud because it requires configuring healthchecks for the sidecar container. To make all containers depend on the `tcpdump` sidecar, edit the Cloud Run service via the Cloud Console, make all other containers depend on the `tcpdump` sidecar, and add the following TCP startup probe healthcheck to the `tcpdump` sidecar:
startupProbe:
  timeoutSeconds: 1
  periodSeconds: 10
  failureThreshold: 10
  tcpSocket:
    port: 12345
You can optionally choose a different port by setting `PCAP_HC_PORT` as an env var of the `tcpdump` sidecar.
The `tcpdump` sidecar accepts the following environment variables:
- `PCAP_IFACE`: (STRING, required) a prefix for the interface to perform packet capturing on; e.g.: `eth`, `ens`...
  Notice that `PCAP_IFACE` is not the full interface name nor a regex or a pattern, but a prefix; so `eth0` becomes `eth`, and `ens4` becomes `ens`.
  For Cloud Run gen1 the value of this environment variable will always be `any`.
- `PCAP_GCS_BUCKET`: (STRING, required) the name of the Cloud Storage Bucket to be mounted and used to store PCAP files. Ensure that you grant the runtime service account the `roles/storage.admin` role so that it may create objects and read bucket metadata.
- `PCAP_L3_PROTOS`: (STRING, optional) comma separated list of network layer protocols; default value is `ipv4,ipv6`.
- `PCAP_L4_PROTOS`: (STRING, optional) comma separated list of transport layer protocols; default value is `tcp,udp`.
- `PCAP_IPV4`: (STRING, optional) comma separated list of IPv4 addresses or IPv4 networks using CIDR notation; default value is `DISABLED`. Example: `127.0.0.1,127.0.0.1/32`.
- `PCAP_IPV6`: (STRING, optional) comma separated list of IPv6 addresses or IPv6 networks using CIDR notation; default value is `DISABLED`. Example: `::1,::1/128`.
- `PCAP_HOSTS`: (STRING, optional) comma separated list of FQDNs (hosts) to capture traffic to/from; default value is `ALL`. Example: `metadata.google.internal,pubsub.googleapis.com`.
- `PCAP_PORTS`: (STRING, optional) comma separated list of transport layer addresses (UDP or TCP ports) to capture traffic to/from; default value is `ALL`. Example: `80,443`.
- `PCAP_TCP_FLAGS`: (STRING, optional) comma separated list of lowercase TCP flags that a segment must contain for it to be captured; default value is `ANY`. Example: `syn,rst`.
- `PCAP_SNAPSHOT_LENGTH`: (NUMBER, optional) bytes of data to capture from each packet, rather than the `tcpdump` default of 262144 bytes; default value is `65536`. For more details see https://www.tcpdump.org/manpages/tcpdump.1.html#:~:text=%2D%2D-,snapshot%2Dlength,-%3Dsnaplen
  The value of this environment variable must not be `0`, especially for Cloud Run gen1 where, if it is set to `0`, not even PDU headers will be available.
- `PCAP_ROTATE_SECS`: (NUMBER, optional) how often to rotate PCAP files created by `tcpdump`; default value is `60` seconds.
- `GCS_MOUNT`: (STRING, optional) where in the sidecar in-memory filesystem to mount the Cloud Storage Bucket; default value is `/pcap`.
- `PCAP_FILE_EXT`: (STRING, optional) extension to be used for PCAP files; default value is `pcap`.
- `PCAP_COMPRESS`: (BOOLEAN, optional) whether to compress PCAP files or not; default value is `true`.
- `PCAP_TCPDUMP`: (BOOLEAN, required) whether to use `tcpdump` or not ( `tcpdump` will generate pcap files; if not, `PCAP_JSON` must be enabled ) and push those `.pcap` files to GCS; default value is `true`.
- `PCAP_JSON`: (BOOLEAN, optional) whether to use `JSON` to dump packets or not into GCS; default value is `false`.
  `PCAP_TCPDUMP` and `PCAP_JSON` may both be `true` in order to generate both `.pcap` and `.json` PCAP files that are stored in GCS.
- `PCAP_JSON_LOG`: (BOOLEAN, optional) whether to write `JSON` translated packets into `stdout` ( `PCAP_JSON` may not be enabled ); default value is `false`.
  This is useful when `Wireshark` is not available, as it makes it possible to have all captured packets available in Cloud Logging.
- `PCAP_ORDERED`: (BOOLEAN, optional) when `PCAP_JSON` or `PCAP_JSON_LOG` are enabled, whether to print packets in captured order ( if set to `false`, packets will be written as fast as possible ); default value is `false`.
  In order to improve performance, packets are translated and written concurrently; when `PCAP_ORDERED` is enabled, only translations are performed concurrently. Enabling `PCAP_ORDERED` may cause packet capturing to be slower, so it is recommended to keep it disabled, as all translated packets have a `pcap.num` property to assert order.
- `PCAP_HC_PORT`: (NUMBER, optional) the TCP port that should be used to accept startup probes; connections will only be accepted when packet capturing is ready; default value is `12345`.
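Since `PCAP_IFACE` expects a prefix rather than a full interface name, the value can be derived by dropping the trailing digits of the interface name. The sketch below shows one way to do that; the helper name `iface_prefix` is hypothetical and not something the sidecar provides.

```shell
#!/bin/sh
# Hypothetical helper: derive a PCAP_IFACE prefix from a full interface
# name by removing its trailing digits (eth0 -> eth, ens4 -> ens).
iface_prefix() {
  printf '%s\n' "$1" | sed 's/[0-9]*$//'
}

PCAP_IFACE="$(iface_prefix eth0)"
export PCAP_IFACE
echo "PCAP_IFACE=${PCAP_IFACE}"   # prints "PCAP_IFACE=eth"
```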
More advanced use cases may benefit from scheduling `tcpdump` executions. Use the following environment variables to configure scheduling:
- `PCAP_FILTER`: (STRING, optional) standard `tcpdump` BPF filters to scope the packet capture to specific traffic; e.g.: `tcp`. Its default value is `DISABLED`.
  `PCAP_FILTER` is not available for Cloud Run gen1; use simple filters instead.
- `PCAP_USE_CRON`: (BOOLEAN, optional) whether to enable scheduling of `tcpdump` executions; default value is `false`.
- `PCAP_CRON_EXP`: (STRING, optional) `cron` expression used to configure scheduling of `tcpdump` executions.
  - NOTE: if `PCAP_USE_CRON` is set to `true`, then `PCAP_CRON_EXP` is required. See https://crontab.cronhub.io/ to get help with `crontab` expressions.
- `PCAP_TIMEZONE`: (STRING, optional) the Timezone ID used to configure scheduling of `tcpdump` executions using `PCAP_CRON_EXP`; default value is `UTC`.
- `PCAP_TIMEOUT_SECS`: (NUMBER, optional) seconds a `tcpdump` execution will last; default value is `0`: execution will not be stopped.
  NOTE: if `PCAP_USE_CRON` is set to `true`, you should set this value to less than the time in seconds between scheduled executions.
- `PCAP_COMPAT`: (BOOLEAN, optional) whether to run the PCAP sidecar in Cloud Run gen1 compatible mode; default value is `false`.
  When using `latest` or `gen1` container images, this environment variable will be automatically set to `true`.
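As an illustration of the scheduling variables above, the following sketch configures a 5 minute capture every 10 minutes and checks that the timeout is shorter than the interval between executions. All values are assumptions chosen for the example, including the five-field crontab syntax; `CRON_INTERVAL_SECS` is a hypothetical variable used only for the check and is not read by the sidecar.

```shell
#!/bin/sh
# Example values only; adjust to your own troubleshooting window.
export PCAP_USE_CRON=true
export PCAP_CRON_EXP='*/10 * * * *'   # assumed 5-field crontab syntax: every 10 minutes
export PCAP_TIMEZONE='UTC'
export PCAP_TIMEOUT_SECS=300          # each capture lasts 5 minutes

# Hypothetical helper variable: seconds between two scheduled executions
# for the expression above (10 minutes).
CRON_INTERVAL_SECS=600

# A capture should finish before the next one is scheduled.
if [ "$PCAP_TIMEOUT_SECS" -lt "$CRON_INTERVAL_SECS" ]; then
  echo "ok: capture finishes before the next scheduled execution"
else
  echo "error: PCAP_TIMEOUT_SECS must be lower than ${CRON_INTERVAL_SECS}" >&2
fi
```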
- The Cloud Storage Bucket mounted by the `tcpdump` sidecar is not accessible by the main –ingress– container.
- Processes running in the `tcpdump` sidecar are not visible to the main –ingress– container ( or any other container ); similarly, the `tcpdump` sidecar doesn't have visibility of processes running in other containers.
- All PCAP files will be stored within the Cloud Storage Bucket with the following "hierarchy": `PROJECT_ID`/`SERVICE_NAME`/`GCP_REGION`/`REVISION_NAME`/`INSTANCE_STARTUP_TIMESTAMP`/`INSTANCE_ID`.
  This hierarchy guarantees that PCAP files are easily indexable and hard to overwrite by multiple deployments/instances.
  It also simplifies deleting no longer needed PCAPs from specific deployments/instances.
- When defining `PCAP_ROTATE_SECS`, keep in mind that the current PCAP file is temporarily stored in the sidecar in-memory filesystem. This means that if your APP is network intensive:
  - The longer it takes to rotate the current PCAP file, the larger the current PCAP file will be, so...
  - Larger PCAP files will require more memory to temporarily store the current one before offloading it into the Cloud Storage Bucket.
- When defining `PCAP_SNAPSHOT_LENGTH`, keep in mind that a large value will result in larger PCAP files; additionally, you may not need to inspect the data, just the packet headers.
- Keep in mind that every Cloud Run instance will produce its own set of PCAP files, so for troubleshooting purposes, it is best to define a low Cloud Run maximum number of instances.
  It is equally important to define a well scoped BPF filter in order to capture only the required packets and skip everything else. The `tcpdump` flag `--snapshot-length` is also useful to limit the bytes of data to capture from each packet.
- Packet capturing is always on while the instance is available, so it is best to roll back to a non packet capturing revision and delete the packet-capturing one after you have captured all the required traffic.
- The full packet capture from a Cloud Run instance will be composed of multiple smaller ( optionally compressed ) PCAP files. Use a tool like mergecap to combine them into one.
- In order to be able to mount the Cloud Storage Bucket and store PCAP files, Cloud Run's identity must have proper roles/permissions.
- The `tcpdump` sidecar is intended to be used for troubleshooting purposes only. While the `tcpdump` sidecar has its own set of resources, storing bytes from PCAP files in Cloud Storage and logging packet translations into Cloud Logging introduces additional costs for both Storage and Networking.
  - Define a BPF filter to capture just the required packets, and nothing else; examples of bad filters for long running or data intensive tests: `tcp`, `tcp or udp`, `tcp port 443`, etc...
  - Set `PCAP_COMPRESS` to `true` to store compressed PCAP files and save storage bytes; additionally, use regional Buckets to minimize costs.
  - Whenever possible, use packet capturing scheduling to avoid running `tcpdump` 100% of instance lifetime.
  - When troubleshooting is complete, deploy a new Revision without the `tcpdump` sidecar to completely disable it.
- While it is true that Cloud Storage volume mounts are an available built-in feature of Cloud Run, GCSFuse is used instead to minimize the required configuration to deploy a Revision instrumented with the `tcpdump` sidecar.
  NOTE: this is also the reason why the base image for the `tcpdump` sidecar is `ubuntu:22.04` and not something lighter like `alpine`. GCSFuse pre-built packages are only available for Debian and RPM based distributions.
- While setting `PCAP_ORDERED` to `true` is a good alternative for low traffic scenarios, it is recommended to set it to `false` for most other cases, since the level of concurrency is reduced ( only translations remain concurrent ) in order to guarantee packet order.
  NOTE: packet order means the order in which the underlying engine ( `gopacket` ) delivers captured packets.
- Use scheduled packet capturing ( `PCAP_USE_CRON` and other advanced flags ) if you don't need to capture packets 100% of instance runtime, as it will reduce the number of PCAP files.
  NOTE: this sidecar is subject to Cloud Run CPU allocation configuration; so if the revision is configured to only allocate CPU during request processing, then CPU will also be throttled for the sidecar. This means that when CPU is only allocated during request processing, no packet capturing will happen outside request processing; the same applies to the export of PCAP files into Cloud Storage.
- The advanced configuration `PCAP_FILTER` is not currently supported for Cloud Run gen1; this means that in order to apply packet filtering you should use the simple filters: `PCAP_IPV4`, `PCAP_IPV6`, `PCAP_HOSTS`, `PCAP_PORTS`, `PCAP_TCP_FLAGS`, `PCAP_L3_PROTOS`, and `PCAP_L4_PROTOS`.
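To relate the simple filters to `tcpdump` syntax, the sketch below shows how values such as `PCAP_PORTS=80,443` and `PCAP_TCP_FLAGS=syn,rst` could be written as a BPF expression. This is for intuition only and is an assumption, not a description of how the sidecar builds its filters; the helper names are hypothetical, and the flags are combined as "any of the listed flags", which may differ from the sidecar's semantics.

```shell
#!/bin/sh
# Hypothetical helper: "80,443" -> "port 80 or port 443"
ports_to_bpf() {
  printf '%s\n' "$1" | sed -e 's/,/ or port /g' -e 's/^/port /'
}

# Hypothetical helper: "syn,rst" -> "tcp[tcpflags] & (tcp-syn|tcp-rst) != 0"
# (matches segments carrying at least one of the listed flags)
flags_to_bpf() {
  flags="$(printf '%s\n' "$1" | sed -e 's/,/|tcp-/g' -e 's/^/tcp-/')"
  printf 'tcp[tcpflags] & (%s) != 0\n' "$flags"
}

# Combine both conditions into a single expression.
echo "($(ports_to_bpf '80,443')) and ($(flags_to_bpf 'syn,rst'))"
```

On gen2, an expression like the one printed here could be passed through `PCAP_FILTER`; on gen1, the simple filter variables remain the only option.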
- Use Cloud Logging to look for the entry starting with: `[INFO] - PCAP files available at: gs://`...
  It may be useful to use the following filter:
resource.type = "cloud_run_revision"
resource.labels.service_name = "<cloud-run-service-name>"
resource.labels.location = "<cloud-run-service-region>"
"<cloud-run-revision-name>"
"PCAP files available at:"
  This entry contains the exact Cloud Storage path to be used to download all the PCAP files.
  Copy the full path including the prefix `gs://`, and assign it to the environment variable `GCS_PCAP_PATH`.
- Download all PCAP files using:
mkdir pcap_files
cd pcap_files
gcloud storage cp ${GCS_PCAP_PATH}/*.gz . # use `${GCS_PCAP_PATH}/*.pcap` if `PCAP_COMPRESS` was set to `false`
- If `PCAP_COMPRESS` was set to `true`, uncompress all the PCAP files: `gunzip ./*.gz`
- Merge all PCAP files into a single file:
  - for `.pcap` files: `mergecap -w full.pcap -F pcap ./*_part_*.pcap`
  - for `.json` files: `cat *_part_*.json | jq -crMs 'sort_by(.pcap.date)' > pcap.json`
  See `mergecap` docs: https://www.wireshark.org/docs/man-pages/mergecap.html
  See `jq` docs: https://jqlang.github.io/jq/manual/ ; JSON pcaps are particularly useful when Wireshark is not available.
- Define the `PROJECT_ID` environment variable; e.g.: `export PROJECT_ID='...'`.
- Clone this repository:
git clone --depth=1 --branch=main --single-branch https://github.com/gchux/cloud-run-tcpdump.git
Tip
If you prefer to let Cloud Build perform all the tasks, go directly to build using Cloud Build
- Move into the repository local directory: `cd cloud-run-tcpdump`.
Continue with one of the following alternatives:
Using a local environment or Cloud Shell
- Build and push the `tcpdump` sidecar container image:
export TCPDUMP_IMAGE_URI='...' # this is usually Artifact Registry e.g. '${_REPO_LOCATION}-docker.pkg.dev/${PROJECT_ID}/${_REPO_NAME}/${_IMAGE_NAME}'
export RUNTIME_ENVIRONMENT='...' # either 'cloud_run_gen1' or 'cloud_run_gen2'
./docker_build ${RUNTIME_ENVIRONMENT} ${TCPDUMP_IMAGE_URI}
Using Cloud Build
This approach assumes that Artifact Registry is available in `PROJECT_ID`.
- Define the following environment variables:
export REPO_LOCATION='...' # Artifact Registry Docker repository location e.g. us-central1
export REPO_NAME='...' # Artifact Registry Docker repository name
export IMAGE_NAME='...' # container image name; e.g.: `pcap-sidecar`
- Build and push the `tcpdump` sidecar container image using Cloud Build:
gcloud builds submit \
--project=${PROJECT_ID} \
--config=$(pwd)/cloudbuild.yaml \
--substitutions="_REPO_LOCATION=${REPO_LOCATION},_REPO_NAME=${REPO_NAME},_IMAGE_NAME=${IMAGE_NAME}" $(pwd)
See the full list of available flags for `gcloud builds submit`: https://cloud.google.com/sdk/gcloud/reference/builds/submit
- Enable debug mode on an App Engine Flexible instance: https://cloud.google.com/appengine/docs/flexible/debugging-an-instance#enabling_and_disabling_debug_mode
- Connect to the instance using SSH: https://cloud.google.com/appengine/docs/flexible/debugging-an-instance#connecting_to_the_instance
- Escalate privileges; execute: `sudo su`
- Create the following `env` file named `pcap.env`; use the following sample to define sidecar variables:
# $ touch pcap.env
PCAP_GAE=true
# the name of the Cloud Storage bucket used to store PCAP files
PCAP_GCS_BUCKET=the-gcs-bucket
# where to mount the Cloud Storage bucket within the container FS
GCS_MOUNT=/gae/pcap
# network interface prefix
PCAP_IFACE=eth
# BPF filter to scope packet capturing to specific network traffic
PCAP_FILTER=tcp or udp
PCAP_SNAPSHOT_LENGTH=0
# do not schedule packet capturing
PCAP_USE_CRON=false
PCAP_TIMEZONE=America/Los_Angeles
PCAP_TIMEOUT_SECS=60
PCAP_ROTATE_SECS=30
PCAP_TCPDUMP=true
PCAP_JSON=true
# NOT necessary, packet translations are streamed directly to Cloud Logging
PCAP_JSON_LOG=false
PCAP_ORDERED=false
- Create a directory to store the PCAP files in the host filesystem: `mkdir gae`
- Pull the sidecar container image: `docker --config=/etc/docker pull ${TCPDUMP_IMAGE_URI}`
- Run the sidecar to start capturing packets:
docker run --rm --name=pcap -it \
--cpus=1 --cpuset-cpus=1 \
--privileged --network=host \
--env-file=./pcap.env \
-v ./gae:/gae -v /var/log:/var/log \
-v /var/run/docker.sock:/docker.sock \
${TCPDUMP_IMAGE_URI} nsenter -t 1 -u -n -i /init \
>/var/log/app_engine/app/STDOUT_pcap.log \
2>/var/log/app_engine/app/STDERR_pcap.log
NOTE: for GAE Flex, it is strongly recommended to not use `PCAP_FILTER=tcp or udp` ( or even `tcp port 443` ), as packets are streamed into Cloud Logging using its gRPC API, which means that traffic is HTTP/2 over TCP. If you capture all TCP and UDP traffic, you'll also be capturing everything that is being exported into Cloud Logging, which causes a write amplification effect that will starve memory, as all your traffic will eventually be stored in the sidecar's memory.
This is not an officially supported Google product. This project is not eligible for the Google Open Source Software Vulnerability Rewards Program.