Frequently asked questions about using BTF raw tracepoints in eBPF/libbpf programs

Preface

This article notes some common problems that come up when writing eBPF/libbpf programs that use BTF raw tracepoints (BPF_TRACE_RAW_TP programs).

Types of eBPF Programs

This article focuses on BTF raw tracepoint programs, which libbpf loads as BPF_PROG_TYPE_TRACING programs with the attach type BPF_TRACE_RAW_TP.

The difference between BTF raw tracepoints and raw tracepoints

The term BTF raw tracepoint refers to a BTF-powered (also called BTF-enabled) raw tracepoint, whose SEC name starts with tp_btf.

The main difference between a BTF raw tracepoint and a regular raw tracepoint is:

The BTF version can read kernel memory directly from within the eBPF program, because the verifier knows the BTF types of the pointers it receives. A regular raw tracepoint instead has to read kernel memory through a helper such as bpf_probe_read_kernel, or a macro built on top of it such as bpf_core_read/BPF_CORE_READ:

// regular raw tracepoint
struct task_struct *task = (struct task_struct *) bpf_get_current_task();
u32 ppid = BPF_CORE_READ(task, real_parent, tgid);

// BTF-enabled raw tracepoint
struct task_struct *task = (struct task_struct *) bpf_get_current_task_btf();
u32 ppid = task->real_parent->tgid;

What events can be monitored by BTF raw tracepoints

BTF raw tracepoints can monitor the same events as regular raw tracepoints, so they are not listed again here.

Format of the SEC name

The SEC name for a BTF raw tracepoint event has the form:

SEC("tp_btf/<name>")

// For example:
// SEC("tp_btf/sched_switch")
// SEC("tp_btf/sys_enter")
// SEC("tp_btf/sys_exit")

The value of <name> is the same as the <name> used in the SEC name of a regular raw tracepoint (raw_tp/<name> or raw_tracepoint/<name>).

How to determine the parameter types of a BTF raw tracepoint handler and get the corresponding tracepoint arguments

Every event has a corresponding typedef named btf_trace_<name> in vmlinux.h.

For example, the sys_enter event corresponds to the following definition:

typedef void (*btf_trace_sys_enter)(void *, struct pt_regs *, long int);

The corresponding ebpf function can be defined as follows:

SEC("tp_btf/sys_enter")
int btf_raw_tracepoint__sys_enter(u64 *ctx)
{
  // ...
  return 0;
}

Here ctx[0] corresponds to the first parameter after the leading void * in btf_trace_sys_enter above, i.e. struct pt_regs *, and ctx[1] corresponds to the second parameter, long int. The leading void * is not passed to the program. The two parameters have the same meaning as in a regular raw tracepoint, matching the kernel's TP_PROTO(struct pt_regs *regs, long id).

Accordingly, a sample program that captures fchmodat system calls using a BTF raw tracepoint looks like this:

SEC("tp_btf/sys_enter")
int btf_raw_tracepoint__sys_enter(u64 *ctx)
{
    long int syscall_id = (long int)ctx[1];
    if (syscall_id != 268)    // 268 is fchmodat on x86_64
        return 0;

    struct pt_regs *regs = (struct pt_regs *)ctx[0];
    // the rest of the code is the same as in the raw tracepoint sample
    // ...
    return 0;
}

BTW, in a BTF raw tracepoint program you can get the BTF-typed version of the current task via bpf_get_current_task_btf().

You can find the full example code on GitHub:

