A Practical Introduction to eBPF

Jul 12, 2024 by Zacharia Mansouri

https://cylab.be/blog/352/a-practical-introduction-to-ebpf

Have you ever wanted to enhance your favorite distribution’s kernel with debugging, tracing, networking, security or plenty of other features without going through the long approval, testing and integration process managed by the Linux community? The extended Berkeley Packet Filter (eBPF) is a Linux kernel feature that lets you load programs from user space (eBPF programs) and run them inside the kernel, safely and efficiently, on the in-kernel eBPF virtual machine. Let’s discover how to build such programs.

Why eBPF?

The typical way to add a feature to your distribution’s kernel is to propose, develop and submit it as a patch to the Linux community, where it goes through an extensive review and testing process. If the outcome is positive, you then need to wait for its integration into the mainline kernel, and finally for your distribution to ship a kernel that includes it. This whole process may take years. With eBPF, you can instead extend the running kernel directly, while a built-in verifier checks every program before it is loaded and rejects those that could crash or hang the kernel.

Note that eBPF extends classic BPF, which was designed only for filtering network packets.

Running eBPF programs with Python and C

eBPF programs are executed by an in-kernel virtual machine that runs bytecode for a small RISC-like instruction set. This bytecode is usually produced with the LLVM/Clang toolchain. Libraries for compiling, loading and managing eBPF programs exist in many programming languages, such as C/C++, Python, Go and Rust. For starters, let’s use Python and immediately dig into the code!
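To make "bytecode" a bit more concrete: every eBPF instruction is a fixed 8-byte word made of an 8-bit opcode, two 4-bit register numbers, a signed 16-bit offset and a signed 32-bit immediate. The Python sketch below (standard library only) hand-encodes the two instructions that a function consisting of `return 0;` compiles to, `r0 = 0` followed by `exit`. The opcode constants mirror those in the kernel headers linux/bpf.h and linux/bpf_common.h, `encode_insn` is a hypothetical helper, and a little-endian host such as x86-64 is assumed.

```python
import struct

# Opcode building blocks from the kernel's eBPF instruction set
# (include/uapi/linux/bpf.h and bpf_common.h).
BPF_ALU64 = 0x07  # instruction class: 64-bit arithmetic
BPF_JMP = 0x05    # instruction class: jumps (also used for exit)
BPF_MOV = 0xB0    # ALU operation: dst = src
BPF_K = 0x00      # source operand is the 32-bit immediate
BPF_EXIT = 0x90   # jump operation: return from the program


def encode_insn(code: int, dst: int = 0, src: int = 0, off: int = 0, imm: int = 0) -> bytes:
    """Pack one 8-byte eBPF instruction, assuming a little-endian host.

    Layout: 8-bit opcode, 4-bit dst register (low nibble),
    4-bit src register (high nibble), signed 16-bit offset,
    signed 32-bit immediate.
    """
    regs = (src << 4) | (dst & 0x0F)
    return struct.pack("<BBhi", code, regs, off, imm)


# "return 0;" becomes two instructions: r0 = 0, then exit.
program = (
    encode_insn(BPF_ALU64 | BPF_MOV | BPF_K, dst=0, imm=0)
    + encode_insn(BPF_JMP | BPF_EXIT)
)
print(program.hex(" "))
```

In practice you never write these bytes by hand: Clang emits them, and `llvm-objdump -d` can disassemble a compiled BPF object file.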

On Debian or Ubuntu, first install the BPF Compiler Collection (BCC) and its Python bindings:

```
sudo apt-get install bpfcc-tools python3-bpfcc
```

Then create a file (let’s call it hello-bpf.py) with the following code:

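```
#!/usr/bin/python3
from bcc import BPF

# The eBPF program itself, written in C; BCC compiles it at run time.
program = r"""
int hello(void *ctx) {
    bpf_trace_printk("Hello World!");
    return 0;
}"""

# Compile the C code.
b = BPF(text=program)
# Resolve the kernel function that implements execve on this kernel.
syscall = b.get_syscall_fnname("execve")
# Load hello() into the kernel and run it each time that function is entered.
b.attach_kprobe(event=syscall, fn_name="hello")

# Print lines from the kernel trace pipe until interrupted with Ctrl+C.
b.trace_print()
```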

The content of the BPF program is pretty straightforward:

```
int hello(void *ctx) {
    bpf_trace_printk("Hello World!");
    return 0;
}
```

A function named hello calls the eBPF helper function bpf_trace_printk to print a message and returns 0. The ctx parameter is the context: a pointer to data that the kernel passes to the program when the hook fires (for a kprobe, the CPU registers saved at the probed function), which we do not need here. Note that the eBPF program itself is written in C; Python is only used to compile, load and attach it.

bpf_trace_printk writes its output to the kernel trace buffer, which can be read from /sys/kernel/tracing/trace_pipe (or /sys/kernel/debug/tracing/trace_pipe on systems where tracefs is only mounted under debugfs); this is where the messages printed by the program appear. The trace_print method called at the end of hello-bpf.py continuously reads this file and prints each line to the terminal.
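Each line of trace_pipe carries more than the message: the task name and PID that triggered the event, the CPU, a flags column, a timestamp and the event name, followed by the text passed to bpf_trace_printk. BCC’s trace_fields method already splits lines into such a tuple; the sketch below shows the same idea in plain Python. The regular expression assumes the common line layout, which differs slightly across kernel versions (notably the flags column), `parse_trace_line` is a hypothetical helper, and the sample line is hand-written for illustration.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout of one trace_pipe line (it varies slightly between kernels):
#   <task>-<pid> [<cpu>] <flags> <timestamp>: <event>: <message>
TRACE_LINE = re.compile(
    r"^\s*(?P<task>.+?)-(?P<pid>\d+)\s+"
    r"\[(?P<cpu>\d+)\]\s+(?P<flags>\S+)\s+"
    r"(?P<ts>\d+\.\d+):\s+(?P<event>[^:]+):\s?(?P<msg>.*)$"
)


class TraceEvent(NamedTuple):
    task: str
    pid: int
    cpu: int
    flags: str
    timestamp: float
    msg: str


def parse_trace_line(line: str) -> Optional[TraceEvent]:
    """Split a trace_pipe line into fields; return None if it does not match."""
    m = TRACE_LINE.match(line.rstrip("\n"))
    if m is None:
        return None
    return TraceEvent(
        task=m["task"],
        pid=int(m["pid"]),
        cpu=int(m["cpu"]),
        flags=m["flags"],
        timestamp=float(m["ts"]),
        msg=m["msg"],
    )


# A hand-written example line, not captured from a real system.
sample = "            bash-12345   [001] d..31  9123.456789: bpf_trace_printk: Hello World!"
print(parse_trace_line(sample))
```

The non-greedy task pattern lets task names that themselves contain dashes (such as gnome-shell) be split correctly from the PID.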

But when is the hello function called? The script first creates an instance of the BPF class, imported from the bcc module installed earlier, which compiles the C code. This object also exposes helpers such as get_syscall_fnname, which returns the name of the kernel function that implements the execve syscall on the running kernel (for example __x64_sys_execve on recent x86-64 kernels, since the name varies across architectures and kernel versions). attach_kprobe then loads hello into the kernel and attaches it to that function through a kernel probe (kprobe), a mechanism that runs code whenever a given kernel function is entered. Since execve is called whenever a process executes a new program, every such execution triggers the BPF program, which prints "Hello World!".
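Why is get_syscall_fnname needed? Depending on the architecture and kernel version, the function implementing execve may be called sys_execve, __x64_sys_execve, __arm64_sys_execve and so on. One way to resolve it, sketched below in plain Python, is to look for a known syscall such as bpf under several candidate prefixes in the kernel symbol table (/proc/kallsyms) and reuse the prefix that matches. This is a simplified illustration of the idea, not BCC’s actual code; the prefix list and the function names are assumptions.

```python
from typing import Iterable, Iterator

# Candidate syscall prefixes (an assumed, non-exhaustive list).
SYSCALL_PREFIXES = ("sys_", "__x64_sys_", "__arm64_sys_", "__s390x_sys_", "__riscv_sys_")


def read_kallsyms(path: str = "/proc/kallsyms") -> Iterator[str]:
    """Yield symbol names from a kallsyms-style file (address, type, name)."""
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 3:
                yield parts[2]


def syscall_fnname(name: str, kernel_symbols: Iterable[str]) -> str:
    """Return the kernel function assumed to implement syscall `name`.

    The first prefix for which "<prefix>bpf" exists in the symbol table
    is assumed to apply to every syscall on this kernel.
    """
    symbols = set(kernel_symbols)
    for prefix in SYSCALL_PREFIXES:
        if prefix + "bpf" in symbols:
            return prefix + name
    raise LookupError("no known syscall prefix found in kernel symbols")
```

On a real system, `syscall_fnname("execve", read_kallsyms())` would return something like `__x64_sys_execve` on a recent x86-64 kernel.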

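Since eBPF programs run inside the kernel, you need root privileges to launch hello-bpf.py (for example with sudo python3 hello-bpf.py). While it runs, executing any command in another terminal should print a "Hello World!" line.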

eBPF hooks

Here are some of the hooks to which eBPF programs can be attached:

eBPF XDP (eXpress Data Path) and TC (Traffic Control) programs can be attached to a network interface to inspect, modify, drop or redirect its packets: XDP programs handle incoming packets at the earliest point of the receive path, while TC programs can handle both incoming and outgoing traffic;
eBPF LIRC (Linux Infrared Remote Control) programs can decode custom infrared remote-control protocols and report the decoded key presses as input events (disabled by default on most Linux distributions such as Ubuntu);
eBPF LSM (Linux Security Modules) programs attach to LSM hooks and can implement access control and audit policies;
eBPF kprobe and kretprobe programs can be attached to the entry and the return of (almost) any kernel function, including system call handlers, while tracepoint programs attach to static instrumentation points defined in the kernel source, such as syscall entry and exit.
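With libbpf-based tooling, such as the bpftool workflow later in this post, the hook type of a program is chosen by the ELF section name given to the SEC() macro. The small Python table below lists the conventional section names for the hooks above as a quick reference; `section_name` is a hypothetical helper, and the exact set of accepted names depends on your libbpf version (older releases, for example, used "classifier" rather than "tc").

```python
# Conventional libbpf section names for the hooks listed above.
# "{target}" stands for a kernel function, LSM hook or tracepoint name.
SECTION_TEMPLATES = {
    "xdp": "xdp",
    "tc": "tc",
    "lirc": "lirc_mode2",
    "lsm": "lsm/{target}",
    "kprobe": "kprobe/{target}",
    "kretprobe": "kretprobe/{target}",
    "tracepoint": "tracepoint/{target}",
}


def section_name(hook: str, target: str = "") -> str:
    """Build the SEC() name for a hook (illustrative helper, not a libbpf API)."""
    template = SECTION_TEMPLATES[hook]
    if "{target}" in template and not target:
        raise ValueError(f"hook {hook!r} needs a target, e.g. a function name")
    return template.format(target=target)


print(section_name("kprobe", "do_unlinkat"))
print(section_name("tracepoint", "syscalls/sys_enter_execve"))
```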

Running eBPF programs with bpftool and C

The Python BCC library provides convenient helpers, such as the BPF class and its get_syscall_fnname and attach_kprobe methods, that make loading and attaching BPF programs simple. Other tools exist, such as bpftool, a command-line utility for inspecting and managing BPF programs and maps that can also load programs into the kernel and attach them.

To illustrate the use of this tool, let’s create a program that is triggered each time a network packet arrives on a given interface. Here is its code, hello.bpf.c:

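```
#include <linux/bpf.h>        /* struct xdp_md, XDP_PASS */
#include <bpf/bpf_helpers.h>  /* SEC() and bpf_printk() from libbpf */

/* Place hello() in the "xdp" ELF section so loaders treat it as an XDP program. */
SEC("xdp")
int hello(struct xdp_md *ctx) {
    bpf_printk("Hello World");  /* written to the kernel trace pipe */
    return XDP_PASS;            /* let the packet continue up the stack */
}

/* bpf_printk() relies on a GPL-only helper, so the license must be GPL-compatible. */
char LICENSE[] SEC("license") = "Dual BSD/GPL";
```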

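A few notes about this program:

the bpf/bpf_helpers.h header comes from libbpf; on Debian or Ubuntu, install it with the libbpf-dev package, together with clang to compile the program:
```
sudo apt-get install libbpf-dev clang
```
SEC("xdp") places the hello function in an ELF section named xdp, which tells loaders such as libbpf and bpftool that it is an XDP program;
XDP_PASS is the return code that lets the packet continue to the normal network stack;
SEC("license") places the LICENSE string in the license section; the kernel checks it, and GPL-only helpers such as the one behind bpf_printk require a GPL-compatible license.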
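XDP_PASS is one of five verdicts defined by `enum xdp_action` in linux/bpf.h. The Python mirror below lists their numeric values, which can come in handy when decoding the return value of an XDP program in a test script; `XdpAction` and `describe` are hypothetical names, and the comments summarize the kernel’s documented behavior.

```python
from enum import IntEnum


class XdpAction(IntEnum):
    """Mirror of enum xdp_action in include/uapi/linux/bpf.h."""
    ABORTED = 0   # error path: packet dropped and a trace event raised
    DROP = 1      # silently drop the packet
    PASS = 2      # continue to the normal network stack
    TX = 3        # send the packet back out of the same interface
    REDIRECT = 4  # send it to another interface, CPU or AF_XDP socket


def describe(verdict: int) -> str:
    """Name an XDP return code, e.g. one observed while testing a program."""
    try:
        return f"XDP_{XdpAction(verdict).name}"
    except ValueError:
        return f"unknown verdict {verdict}"


print(describe(2))
```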

Here is the Makefile used to compile hello.bpf.c:

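```
TARGETS = hello

all: $(TARGETS)
.PHONY: all clean

# Each target (e.g. hello) only requires its BPF object file (hello.bpf.o).
$(TARGETS): %: %.bpf.o

# Compile C to BPF bytecode. The multiarch include directory provides
# <asm/types.h>, needed by <linux/bpf.h>; -g adds debug info, including BTF.
%.bpf.o: %.bpf.c
	clang \
	    -target bpf \
	    -I/usr/include/$(shell uname -m)-linux-gnu \
	    -g \
	    -O2 -o $@ -c $<

clean:
	- rm *.bpf.o
```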

Run make to produce the object file hello.bpf.o, which bpftool will then load into the kernel:

load the program and pin it in the BPF filesystem at /sys/fs/bpf/hello:
```
bpftool prog load hello.bpf.o /sys/fs/bpf/hello
```

ensure the program is loaded:
```
ls /sys/fs/bpf
```
that should show a hello file if the loading worked;

show the id of the program you loaded:
```
bpftool prog list
```

attach the program to a network interface (eth0 here; replace it with one of the interface names listed by ip link):
```
bpftool net attach xdp id <id_you_found_for_hello> dev eth0
```

show the result (on some systems the file is /sys/kernel/tracing/trace_pipe instead):
```
cat /sys/kernel/debug/tracing/trace_pipe
```

As with the Python script, you need root privileges to load and attach the program with bpftool.
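Copying the id by hand from `bpftool prog list` gets tedious in scripts. bpftool can also print JSON with its --json option, which is easy to filter. The sketch below, assuming each JSON entry carries at least "id" and "name" keys, looks programs up by name; `find_prog_ids` and `loaded_prog_ids` are hypothetical helpers, and the sample JSON is hand-written and trimmed.

```python
import json
import subprocess
from typing import List


def find_prog_ids(bpftool_json: str, name: str) -> List[int]:
    """Return the ids of loaded programs called `name`.

    Expects JSON such as the one printed by `bpftool prog list --json`:
    an array of objects (a single object is also accepted).
    """
    progs = json.loads(bpftool_json)
    if isinstance(progs, dict):  # a single program was printed
        progs = [progs]
    return [p["id"] for p in progs if p.get("name") == name]


def loaded_prog_ids(name: str) -> List[int]:
    """Run bpftool (requires root) and look up programs by name."""
    out = subprocess.run(
        ["bpftool", "prog", "list", "--json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return find_prog_ids(out, name)


# Hand-written, trimmed-down sample; real output has many more fields.
sample = '[{"id": 42, "type": "xdp", "name": "hello"}, {"id": 7, "type": "kprobe", "name": "other"}]'
print(find_prog_ids(sample, "hello"))
```

Keep in mind that the kernel stores at most 15 characters of a program name, so long function names appear truncated in bpftool’s output.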

Here is how to detach the program from the eth0 interface and then unload it:

```
bpftool net detach xdp dev eth0
# Removing the pin drops the last reference to the program, which unloads it
rm /sys/fs/bpf/hello
```

Here are a few more commands to get information about your BPF program:

```
bpftool prog show id <id_you_found_for_hello> --pretty
bpftool prog show name hello --pretty # replace hello with the name of your program
bpftool net list
```

A few words about the bpf syscall

The bpf syscall is used under the hood to load programs, attach them to events and so on. Let’s run the hello-bpf.py script created earlier and trace every bpf() syscall it makes:

```
sudo strace -e bpf python3 hello-bpf.py
```

It outputs (long arguments abridged):

```
bpf(BPF_BTF_LOAD, {btf="\237\353\1\0...}, 120) = 3
bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_KPROBE, ..., prog_name="hello", ..., 120) = 4
```

Here:

the first call to bpf (BPF_BTF_LOAD) loads BTF (BPF Type Format) data, which describes the types used by the program, into the kernel; its return value, 3, is a file descriptor referring to that data;
the second call to bpf (BPF_PROG_LOAD) loads the program itself into the kernel and returns the file descriptor 4; its many parameters include:
- prog_type: the program type, here a kprobe program,
- prog_name: the program name, here hello.
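The bytes strace printed for the btf argument, "\237\353\1\0", are octal escapes for 0x9f 0xeb 0x01 0x00: the start of the fixed-size BTF header. Read as a little-endian 16-bit integer, the first two bytes give the BTF magic number 0xeb9f, and the third byte is the format version (1). The Python sketch below decodes the 24-byte `struct btf_header` laid out as in linux/btf.h; it assumes a little-endian host such as x86-64, `parse_btf_header` is a hypothetical helper, and the type and string sections that follow the header are ignored.

```python
import struct
from typing import NamedTuple

BTF_MAGIC = 0xEB9F  # from include/uapi/linux/btf.h


class BtfHeader(NamedTuple):
    magic: int
    version: int
    flags: int
    hdr_len: int
    type_off: int
    type_len: int
    str_off: int
    str_len: int


def parse_btf_header(blob: bytes) -> BtfHeader:
    """Decode the fixed 24-byte struct btf_header (little-endian host assumed)."""
    hdr = BtfHeader(*struct.unpack_from("<HBBIIIII", blob))
    if hdr.magic != BTF_MAGIC:
        raise ValueError(f"not BTF data: magic {hdr.magic:#06x}")
    return hdr


# The first bytes strace printed, "\237\353\1\0", written as octal values.
first_bytes = bytes([0o237, 0o353, 0o1, 0o0])
print(f"{struct.unpack_from('<H', first_bytes)[0]:#06x}")
```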

The full list of bpf() commands is described in the bpf(2) man page and in the Linux kernel documentation.
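strace shows the first argument of bpf() symbolically, but the kernel receives an integer command number from `enum bpf_cmd` in linux/bpf.h. The Python sketch below lists the first values of that enum (the kernel defines more after BPF_BTF_LOAD) and parses strace lines like the two shown earlier into the command name, its number and the return value; `parse_strace_bpf` and `BpfCall` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# First values of enum bpf_cmd in include/uapi/linux/bpf.h, in order.
BPF_CMDS = [
    "BPF_MAP_CREATE", "BPF_MAP_LOOKUP_ELEM", "BPF_MAP_UPDATE_ELEM",
    "BPF_MAP_DELETE_ELEM", "BPF_MAP_GET_NEXT_KEY", "BPF_PROG_LOAD",
    "BPF_OBJ_PIN", "BPF_OBJ_GET", "BPF_PROG_ATTACH", "BPF_PROG_DETACH",
    "BPF_PROG_TEST_RUN", "BPF_PROG_GET_NEXT_ID", "BPF_MAP_GET_NEXT_ID",
    "BPF_PROG_GET_FD_BY_ID", "BPF_MAP_GET_FD_BY_ID", "BPF_OBJ_GET_INFO_BY_FD",
    "BPF_PROG_QUERY", "BPF_RAW_TRACEPOINT_OPEN", "BPF_BTF_LOAD",
]
CMD_NUMBER = {name: number for number, name in enumerate(BPF_CMDS)}

# Matches strace lines such as "bpf(BPF_PROG_LOAD, {...}, 120) = 4".
STRACE_BPF = re.compile(r"^bpf\((?P<cmd>BPF_\w+),.*\)\s*=\s*(?P<ret>-?\d+)")


class BpfCall(NamedTuple):
    cmd: str
    number: Optional[int]  # None if the command is not in BPF_CMDS
    ret: int               # a new file descriptor for the *_LOAD commands


def parse_strace_bpf(line: str) -> Optional[BpfCall]:
    """Extract the command and return value from one strace bpf() line."""
    m = STRACE_BPF.match(line.strip())
    if m is None:
        return None
    return BpfCall(m["cmd"], CMD_NUMBER.get(m["cmd"]), int(m["ret"]))


for line in [
    'bpf(BPF_BTF_LOAD, {btf="\\237\\353\\1\\0...}, 120) = 3',
    'bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_KPROBE, ..., prog_name="hello", ..., 120) = 4',
]:
    print(parse_strace_bpf(line))
```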

Conclusion

eBPF provides a powerful, efficient and flexible way to extend the Linux kernel’s capabilities without going through the lengthy upstream integration process. By running verified programs safely within the kernel, eBPF lets developers implement debugging, tracing, networking and security features. The programs themselves are typically written in C, while tools like BCC (from Python) and bpftool make it straightforward to load, attach and inspect them. This flexibility makes eBPF an invaluable resource for enhancing performance, improving security and gaining deeper insight into a system.


This blog post is licensed under CC BY-SA 4.0