- BPF
- Introduction
- Practical use of BPF
- Observability
- BPF front-end tool
- BCC tools and programming
- bpftrace
- Security
- BPF verifier
- Networking
- XDP
- Work done during mentorship
- XDP tools
- Accepted patches in the Linux Kernel
- Ongoing
- Attempted but not accepted
- BPF current issue proposed by Daniel Borkmann
- How to use eBPF for safety
- Mentorship wrap up
- A virtual machine working efficiently with register-based CPUs
- The usage of per-application buffers that could filter packets without copying all the packet information.
- Observability
- BCC: The BPF Compiler Collection (BCC) is a higher-level tracing framework developed for BPF. The framework provides a C programming environment for writing kernel BPF code, and other languages (Python, Lua, C++) for the user-level interface.
- bcc tools: The BCC repository has more than 70 BPF tools for performance analysis. We will go through 12 of them.
- execsnoop
- opensnoop
- ext4slower
- biolatency
- biosnoop
- tcpconnect
- tcpretrans
- dcsnoop
- cachestat
- trace
- funccount
- stackcount: The tool counts stack traces that led to an event. The event can be a kernel- or user-level function, tracepoint, or USDT probe. stackcount answers queries such as:
- Why is this event called? What is the code path?
- What are all the different code paths that call this event, and what are their frequencies?
- bcc programming
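As a taste of BCC programming, below is a hello-world-style sketch (assuming BCC and its Python bindings are installed and the script is run as root, after the upstream hello_world example): the embedded C is compiled to BPF at runtime and, through the kprobe__ naming convention, attached automatically to the clone syscall.

```python
#!/usr/bin/env python3
# Minimal BCC program: prints a message each time a process calls clone().
from bcc import BPF

# The kprobe__ prefix tells BCC to attach this function as a kprobe
# on the kernel symbol that follows it (sys_clone).
BPF(text="""
int kprobe__sys_clone(void *ctx) {
    bpf_trace_printk("Hello, World!\\n");
    return 0;
}
""").trace_print()  # stream /sys/kernel/debug/tracing/trace_pipe
```

Running it and launching any command in another terminal prints one Hello, World! line per new process.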
- bpftrace: bpftrace is an open-source tracer built on BPF and BCC. It provides a high-level programming language that allows you to create powerful one-liners and short tools for performance analysis. An example that summarizes the vfs_read() return value (byte count or error value) as a histogram appears below.
- XDP: XDP (eXpress Data Path) is an eBPF-based high-performance data path used to send and receive network packets at high rates by bypassing most of the operating-system networking stack.
- Native XDP: the XDP BPF program is run directly out of the networking driver's early receive path. Most upstream drivers support this mode. To check whether a driver supports native XDP, run `# git grep -l XDP_SETUP_PROG drivers/` in the kernel source tree.
- Offloaded XDP: the XDP BPF program is offloaded directly into the NIC and executed there instead of on the host CPU. To check whether a driver supports offloaded XDP, run `# git grep -l XDP_SETUP_PROG_HW drivers/`.
- Generic XDP: a test mode for developers, in which the XDP program is loaded on a virtualised card (veth devices) without driver support.
- eBPF Verifier
- Directed Acyclic Graph (DAG) check: Here the verifier checks whether the program will terminate (acyclic), ensuring that the program has no backward branches, as it must form a directed acyclic graph; the program may still branch forward to the same point. A program with unreachable instructions will not be allowed to run.
- Simulation check: The verifier simulates the execution of every instruction in the program, starting from the first instruction and trying all possible paths the instructions can lead to, while observing the state changes of registers and the stack, making sure no invalid operations are performed.
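To illustrate the DAG rule: loops in BPF C historically had to be fully unrolled so that the emitted bytecode contains only forward branches (kernels since 5.3 also accept verifier-bounded loops). A minimal sketch using BCC's Python front end, with an illustrative probe on vfs_read():

```python
#!/usr/bin/env python3
# Sketch: a fixed-bound loop that survives the verifier's back-edge check.
from bcc import BPF

prog = r"""
int kprobe__vfs_read(void *ctx) {
    int sum = 0;
    #pragma unroll
    for (int i = 0; i < 8; i++) {  /* fixed bound: clang unrolls the loop,
                                      so the bytecode has no backward branch */
        sum += i;
    }
    bpf_trace_printk("sum=%d\n", sum);   /* for demonstration only */
    return 0;
}
"""
BPF(text=prog).trace_print()
```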
BPF current issue proposed by Daniel Borkmann
- Move samples/bpf to the BPF selftests folder to improve on test_prog BPF CI - currently ongoing, rewriting the Makefile
  - Requires reading up on Makefile writing
  - Duration: ongoing - one more week - expected to complete 26th May
  - Progress:
    - moved the files and made them compile correctly in the same directory
  - Challenges:
    - merging two Makefiles into one
    - adding more tests to BPF CI from samples/bpf
- Create a bcc tool for network statistics using XDP/BPF technology
- proposed tool to gather statistics per IP address connected to the network and, on a suspicious event, block that IP
- Duration: two weeks
- 1 week reading XDP documents and practising
- 1 week coding
- expected completion 5th June.
- Improving the eBPF verifier: work with Wenhui
  - Duration: one month - starts 6th June
- Any other issues assigned by the mentor
The Berkeley Packet Filter (BPF/eBPF) is a technology first developed in 1992. BPF brought two innovations in packet-filtering technology: a virtual machine that works efficiently with register-based CPUs, and per-application buffers that can filter packets without copying all of the packet information.
BPF then gained popularity after massively improving the performance of the packet-capture tools of the time (tcpdump).
In 2013, BPF was extended and optimised for modern machines. This version became known as eBPF, and it has been in constant development since. The number of registers in the BPF VM was increased from two 32-bit registers to ten 64-bit registers, making it possible to write more complex BPF programs. The extended version also added JIT support, which increased performance by four times.
This new version turned BPF into a general-purpose execution engine that can be used for a variety of use cases, ranging from security and networking to observability, to name but a few.
BPF can be difficult to define because of its wide range of use cases.
Alexei Starovoitov, the creator of the new version, defines BPF as simply an instruction set: a new language, an extension to C, or a safer C. Any programming language can be compiled into BPF, he added.
eBPF/BPF implements a dedicated virtual machine with a custom interpreter. User-space programs can attach BPF programs at various hook points within the kernel and perform a wide variety of tasks: tracing, monitoring, and debugging.
BPF can be considered a virtual machine: it has an in-kernel execution engine that processes its virtual instruction set.
The technology is actually composed of an instruction set, storage objects (maps), and helper functions.
Having started as a simple language for writing packet-filtering code for utilities like tcpdump [McCanne 92], BPF grew into a general-purpose execution engine that can be used for a variety of things, including the creation of advanced performance-analysis tools.
With BPF, we can run mini-programs on a wide variety of kernel and application events.
An eBPF program is attached to a designated code path in the kernel. Whenever that code path is traversed, any attached eBPF programs are executed.
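A short sketch (assuming BCC; the map and function names here are illustrative) ties these pieces together: a program is attached to the vfs_read() code path, a hash map serves as the storage object, and bpf_get_current_pid_tgid() is one of the helper functions:

```python
#!/usr/bin/env python3
# Count vfs_read() traversals per process using a BPF hash map.
from bcc import BPF
import time

b = BPF(text=r"""
BPF_HASH(reads_by_pid, u32, u64);                 /* storage object */

int trace_read(void *ctx) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;   /* helper function */
    u64 zero = 0;
    u64 *count = reads_by_pid.lookup_or_try_init(&pid, &zero);
    if (count)
        __sync_fetch_and_add(count, 1);
    return 0;
}
""")
# Executed every time the vfs_read() code path is traversed.
b.attach_kprobe(event="vfs_read", fn_name="trace_read")

time.sleep(5)
for pid, count in sorted(b["reads_by_pid"].items(), key=lambda kv: kv[1].value):
    print("PID %-6d vfs_read() calls: %d" % (pid.value, count.value))
```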
The main uses of BPF are networking, observability (tracing), and security.
In this introduction, we will focus on these main uses of the BPF subsystem.
Observability is the understanding of a system through observation, by use of tracing tools, sampling tools, and tools based on fixed counters. Such tools can be written using BPF.
BPF enables event-driven programming that provides observability (tracing): system-administration tools that give extra information not available from the common tools.
```
# execsnoop
PCOMM            PID    PPID   RET ARGS
dhcpcd-run-hook  29407  2642     0 /lib/dhcpcd/dhcpcd-run-hooks
sed              29410  29409    0 /bin/sed -n s/^domain //p wlan0.dhcp
cmp              29417  29407    0 /usr/bin/cmp -s /etc/resolv.conf ../resolv.conf.wlan0.ra
qemu-system-x86  29422  27546    0 /usr/bin/qemu-system-x86_64 -m 4096 -smp 8 ... -snapshot
```
execsnoop works by tracing the execve(2) system call, revealing processes that may be so short-lived that they are invisible to other tools such as ps(1).
```
# opensnoop -T
TIME(s)        PID    COMM             FD  ERR PATH
0.000000000    11552  baloo_file_extr  20    0 /home/jules/../linux/../unistd_32.h
0.000433000    11552  baloo_file_extr  20    0 /home/jules/../linux/../unistd_64.h
0.000764000    11552  baloo_file_extr  20    0 /home/jules/../linux/../unistd_x32.h
0.001084000    11552  baloo_file_extr  20    0 /home/jules/../linux/../syscalls_32.h
0.001391000    11552  baloo_file_extr  20    0 /home/jules/../linux/../unistd_32_ia32.h
0.001685000    11552  baloo_file_extr  20    0 /home/jules/../linux/../unistd_64_x32.h
0.079771000    3486   qemu-system-x86  23    0 /etc/resolv.conf
0.422395000    11858  Chrome_IOThread  389   0 /dev/shm/.com.google.Chrome.ct746O
```
This debugging tool prints a line of output for each open() system call and its variants. opensnoop can be used to troubleshoot software that fails by attempting to open files from a wrong path, as well as to determine where configuration and log files are kept.
```
# ext4slower
Tracing ext4 operations slower than 10 ms
TIME     COMM            PID   T BYTES    OFF_KB   LAT(ms) FILENAME
22:16:08 baloo_file_ext  4458  S 0        0         125.20 index
22:16:12 baloo_file_ext  4458  S 0        0         134.65 index
22:16:16 baloo_file_ext  4458  S 0        0         151.65 index
22:16:20 baloo_file_ext  4458  S 0        0         172.81 index
22:16:25 baloo_file_ext  4458  W 60678144 5098540    11.48 index
```
This tool traces common ext4 file system operations (reads, writes, opens, and syncs) and prints those that exceed a time threshold.
```
# biolatency
Tracing block device I/O... Hit Ctrl-C to end.
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 3        |                                        |
         8 -> 15         : 115      |**************                          |
        16 -> 31         : 49       |******                                  |
        32 -> 63         : 36       |****                                    |
        64 -> 127        : 1        |                                        |
       128 -> 255        : 286      |************************************    |
       256 -> 511        : 160      |********************                    |
       512 -> 1023       : 315      |****************************************|
      1024 -> 2047       : 21       |**                                      |
      2048 -> 4095       : 1        |                                        |
```
biolatency traces disk I/O latency and shows the result as a histogram. Latency here refers to the time taken from device issue to completion. The tool gives better performance information than iostat(1).
```
# biosnoop
TIME(s)     COMM            PID   DISK     T SECTOR               BYTES   LAT(ms)
0.000000    kworker/23:1    9126           R 18446744073709551615 0          0.61
1.774198    ThreadPoolFore  5270  nvme0n1  W 520198144            225280     0.48
1.774381    jbd2/nvme0n1p3  686   nvme0n1  W 490161296            65536      0.03
1.774609    ?               0              R 0                    0          0.21
1.774809    jbd2/nvme0n1p3  686   nvme0n1  W 490161424            4096       0.19
2.069546    kworker/23:1    9126           R 18446744073709551615 0          0.17
2.159061    ?               0              R 0                    0          0.24
2.159129    ThreadPoolFore  5270  nvme0n1  W 777702184            4096       0.01
2.159341    ?               0              R 0                    0          0.20
2.159387    ThreadPoolFore  5270  nvme0n1  W 15221256             8192       0.01
2.159598    ?               0              R 0                    0          0.20
2.159713    jbd2/nvme0n1p3  686   nvme0n1  W 490161432            53248      0.02
```
The tool prints a line of output for each disk I/O, with details including latency.
```
# tcpconnect
Tracing connect ... Hit Ctrl-C to end
PID    COMM          IP SADDR          DADDR            DPORT
4909   Chrome_Child  4  192.168.1.245  40.74.98.194     443
4909   Chrome_Child  4  192.168.1.245  40.74.98.194     443
5564   Chrome_Child  4  192.168.1.245  172.217.16.238   443
4909   Chrome_Child  4  192.168.1.245  52.97.208.18     443
5564   Chrome_Child  4  192.168.1.245  142.250.200.14   443
5564   Chrome_Child  4  192.168.1.245  35.206.151.171   443
4909   Chrome_Child  4  192.168.1.245  52.113.205.5     443
5564   Chrome_Child  4  192.168.1.245  34.131.36.146    443
4909   Chrome_Child  4  192.168.1.245  13.89.179.10     443
5564   Chrome_Child  4  192.168.1.245  142.250.179.229  443
```
tcpconnect prints one line of output for every active TCP connection (i.e., one initiated via connect()).
```
# tcpretrans
Tracing retransmits ... Hit Ctrl-C to end
TIME     PID   IP LADDR:LPORT          T> RADDR:RPORT          STATE
22:36:32 0     4  192.168.1.245:42072  R> 13.33.52.19:443      ESTABLISHED
22:39:50 0     4  192.168.1.245:59090  R> 142.250.179.229:443  ESTABLISHED
22:39:50 0     4  192.168.1.245:59070  R> 142.250.179.229:443  ESTABLISHED
22:39:51 1372  4  192.168.1.245:59090  R> 142.250.179.229:443  ESTABLISHED
22:39:51 1372  4  192.168.1.245:59092  R> 142.250.179.229:443  ESTABLISHED
```
This tool uses dynamic tracing of the kernel tcp_retransmit_skb() and tcp_send_loss_probe() functions to trace only TCP retransmits, showing address, port, PID, and TCP state information.
```
# dcsnoop
TIME(s)    PID    COMM             T FILE
8.893741   29295  sadc             M dev
8.893782   29295  sadc             M dev
8.893813   29295  sadc             M dev
8.894006   29295  sadc             M nfs
8.894028   29295  sadc             M nfsd
8.894041   29295  sadc             M sockstat
8.894053   29295  sadc             M softnet_stat
13.240580  3743   ThreadPoolForeg  M todelete_e3e954c5761fd557_0_1
13.240988  3557   Chrome_IOThread  M .org.chromium.Chromium.PPeyk5
13.243443  3743   ThreadPoolForeg  M e3e954c5761fd557_0
21.747442  29303  dhcpcd-run-hook  M resolv.conf.wlan0.ra
21.747854  29313  cmp              M maps
21.748626  29315  rm               M resolv.conf.wlan0.ra
21.750007  2470   dhcpcd           M if_inet6
```
The tool traces every dcache lookup and shows the process performing the lookup and the filename requested.
```
# cachestat
HITS   MISSES  DIRTIES HITRATIO  BUFFERS_MB  CACHED_MB
16     1       1       94.12%    1312        3249
0      0       0       0.00%     1312        3249
34     3       15      91.89%    1312        3249
0      0       0       0.00%     1312        3249
14     3       5       82.35%    1312        3249
407    0       80      100.00%   1312        3249
0      0       0       0.00%     1312        3249
0      0       0       0.00%     1312        3249
0      0       19      0.00%     1312        3249
0      0       0       0.00%     1312        3249
9743   0       136     100.00%   1312        3249
0      0       3       0.00%     1312        3249
0      0       0       0.00%     1312        3249
5      0       0       100.00%   1312        3249
0      0       0       0.00%     1312        3249
```
cachestat prints a one-line summary every second (or every custom interval) showing statistics from the file system cache.
```
# trace 'do_nanosleep(struct hrtimer_sleeper *t) "task: %x", t->task'
PID     TID     COMM             FUNC          -
3437    3489    teams            do_nanosleep  task: d4588000
2511    2815    pool-gsd-smartc  do_nanosleep  task: 6b248000
18685   18693   nautilus         do_nanosleep  task: 21b55200
112985  113009  vqueue:src       do_nanosleep  task: 328b0000
3437    3489    teams            do_nanosleep  task: d4588000
```
trace is a multi-tool for per-event tracing from many different sources: kprobes, uprobes, tracepoints, and USDT probes. It is used when looking at the arguments of a kernel- or user-level function call, the return value of a function, finding out whether a function is failing and how it is called, or inspecting the user- or kernel-level stack trace. The tool is suited to infrequently called events; for frequently occurring events, trace would produce so much output that it would cost significant overhead to instrument. To reduce overhead, it is advised to use a filter expression to print only the events of interest.
```
# funccount 'tcp_send*'
Tracing 16 functions for "b'tcp_send*'"... Hit Ctrl-C to end.
^C
FUNC                      COUNT
b'tcp_send_probe0'            5
b'tcp_send_active_reset'      8
b'tcp_send_loss_probe'        9
b'tcp_send_dupack'           18
b'tcp_send_fin'              58
b'tcp_send_mss'            2594
b'tcp_sendmsg_locked'      2595
b'tcp_sendmsg'             2595
b'tcp_send_delayed_ack'    2778
b'tcp_send_ack'            3723
Detaching...
```
funccount counts events, specifically function calls, answering whether a specific kernel- or user-level function is being called, and at what rate (calls per second).
Example stackcount output, counting the stacks that led to the sched:sched_switch tracepoint:
```
# sudo stackcount t:sched:sched_switch
  b'__sched_text_start'
  b'__sched_text_start'
  b'schedule'
  b'__down_write_common'
  b'down_write_killable'
  b'mmap_write_lock_killable'
  b'__vm_munmap'
  b'__x64_sys_munmap'
  b'do_syscall_64'
  b'entry_SYSCALL_64_after_hwframe'
    1
```
The bpftrace one-liner referenced earlier, summarizing the vfs_read() return value as a histogram:
```
$ sudo bpftrace -e 'kretprobe:vfs_read { @bytes = hist(retval); }'
Attaching 1 probe...
^C

@bytes:
(..., 0)              78 |                                                    |
[0]                   56 |                                                    |
[1]                  492 |@@@@@                                               |
[2, 4)                54 |                                                    |
[4, 8)              2395 |@@@@@@@@@@@@@@@@@@@@@@@@@@                          |
[8, 16)             1414 |@@@@@@@@@@@@@@@                                     |
[16, 32)              30 |                                                    |
[32, 64)             236 |@@                                                  |
[64, 128)             12 |                                                    |
[128, 256)             4 |                                                    |
[256, 512)             6 |                                                    |
[512, 1K)              6 |                                                    |
[1K, 2K)             311 |@@@                                                 |
[2K, 4K)               0 |                                                    |
[4K, 8K)               0 |                                                    |
[8K, 16K)              0 |                                                    |
[16K, 32K)          4656 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[32K, 64K)             0 |                                                    |
[64K, 128K)            1 |                                                    |
[128K, 256K)           0 |                                                    |
[256K, 512K)           1 |                                                    |
```
bpftrace being a programming language, a hello world is written as:
```
$ sudo bpftrace -e 'BEGIN { printf("Hello, World!\n"); }'
Attaching 1 probe...
Hello, World!
```
In XDP, a BPF hook is added early in the RX path of the kernel, enabling a user-supplied BPF program to decide the fate of each packet. With XDP, code is executed very early, as soon as a network packet arrives at the kernel.
Unsurprisingly, XDP programs are controlled through the bpf() syscall and loaded using the program type BPF_PROG_TYPE_XDP.
The execution of an XDP program can happen in one of three modes: native, offloaded, or generic, as described in the list above.
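As a minimal sketch of loading and attaching such a program (assumptions: BCC installed, an interface named eth0, and illustrative program/map names), the following counts every received packet and hands each one on to the normal stack:

```python
#!/usr/bin/env python3
# Attach a BPF_PROG_TYPE_XDP program that counts received packets.
from bcc import BPF
import ctypes as ct
import time

b = BPF(text=r"""
#define KBUILD_MODNAME "xdp_sketch"
#include <uapi/linux/bpf.h>

BPF_ARRAY(pkt_count, u64, 1);

int xdp_count(struct xdp_md *ctx) {
    u32 key = 0;
    u64 *val = pkt_count.lookup(&key);
    if (val)
        __sync_fetch_and_add(val, 1);
    return XDP_PASS;              /* let the packet continue up the stack */
}
""")

device = "eth0"                               # assumption: your NIC name
fn = b.load_func("xdp_count", BPF.XDP)
b.attach_xdp(device, fn, 0)                   # native mode where the driver supports it

try:
    while True:
        time.sleep(1)
        print("packets seen:", b["pkt_count"][ct.c_int(0)].value)
except KeyboardInterrupt:
    b.remove_xdp(device, 0)                   # detach on Ctrl-C
```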
The verifier is a mechanism that determines the safety of an eBPF program and only allows the execution of programs that pass its safety checks.
The checks are done in two steps:
The first step performs a depth-first search of the program's control-flow graph (CFG). The check enforces two eBPF rules:
a) No back-edges
b) No unreachable instructions
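To illustrate the simulation step: an XDP program must prove that every packet access stays within [data, data_end), or the load fails (typically with an "invalid access to packet" error). Below is a sketch using BCC, with an illustrative function name; note that loading the program (not merely compiling it) is what runs the in-kernel verifier:

```python
#!/usr/bin/env python3
# Demonstrate the bounds check the verifier's simulation step demands.
from bcc import BPF

prog = r"""
#define KBUILD_MODNAME "verifier_sketch"
#include <uapi/linux/bpf.h>
#include <linux/if_ether.h>

int xdp_bounds(struct xdp_md *ctx) {
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;
    struct ethhdr *eth = data;

    /* Without this check the simulated execution sees a potential
       out-of-bounds read and the program is rejected at load time. */
    if ((void *)(eth + 1) > data_end)
        return XDP_DROP;

    /* eth->h_proto may now be read safely. */
    if (eth->h_proto == htons(ETH_P_IP))
        bpf_trace_printk("IPv4 frame\n");
    return XDP_PASS;
}
"""

b = BPF(text=prog)
fn = b.load_func("xdp_bounds", BPF.XDP)   # loading runs the in-kernel verifier
print("xdp_bounds passed the verifier")
```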