Preface
This note covers some common questions about tracepoints that come up when writing eBPF/libbpf programs (such as BPF_PROG_TYPE_TRACEPOINT programs).
What events can be monitored by tracepoint
The events that tracepoint can monitor can be found by looking at the contents of the /sys/kernel/debug/tracing/available_events file.
The format of each line in the file is:
<category>:<name>
For example:
syscalls:sys_enter_execve
Format of SEC content
The SEC format for the tracepoint event is:
SEC("tracepoint/<category>/<name>")
// for example:
// SEC("tracepoint/syscalls/sys_enter_openat")
or:
SEC("tp/<category>/<name>")
// for example:
// SEC("tp/syscalls/sys_enter_openat")
The values of <category> and <name> are taken from the entries listed in the available_events file described earlier.
SEC("tp/xx/yy") and SEC("tracepoint/xx/yy") are equivalent; use whichever you prefer.
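The mapping from an available_events line to a section name is mechanical, so it can be sketched as a small user-space C helper. This is only an illustration: the function name tp_section_name is hypothetical and is not part of libbpf.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not a libbpf API): convert one line of
 * available_events, "<category>:<name>", into the section string
 * "tracepoint/<category>/<name>". A trailing newline is ignored.
 * Returns 0 on success, -1 on malformed input or a too-small buffer.
 */
static int tp_section_name(const char *event, char *out, size_t out_sz)
{
    const char *colon = strchr(event, ':');
    if (!colon || colon == event)
        return -1;

    /* Length of <name>, stopping at the end of the line. */
    size_t name_len = strcspn(colon + 1, "\r\n");
    if (name_len == 0)
        return -1;

    int n = snprintf(out, out_sz, "tracepoint/%.*s/%.*s",
                     (int)(colon - event), event,
                     (int)name_len, colon + 1);
    return (n < 0 || (size_t)n >= out_sz) ? -1 : 0;
}
```

Passing the line "syscalls:sys_enter_execve" fills the buffer with "tracepoint/syscalls/sys_enter_execve", which is the string to place inside SEC().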
How to determine the parameter type of the tracepoint event handler and get the corresponding kernel call parameters
Suppose that we want to monitor the fchmodat system call used by the chmod command via a tracepoint. How do we determine the parameter type of the event handler function in eBPF, and how do we read the arguments of the fchmodat system call, such as the name of the file being operated on and the permission mode being applied?
The first step is to determine the system call used by chmod, which is relatively simple and can be done in a variety of ways, such as through the strace command:
$ strace chmod 600 a.txt
...
fchmodat(AT_FDCWD, "a.txt", 0600) = 0
...
The second step is to find the tracepoint event that can be used for this system call:
$ sudo cat /sys/kernel/debug/tracing/available_events |grep fchmodat
syscalls:sys_exit_fchmodat
syscalls:sys_enter_fchmodat
As you can see, there are sys_enter_fchmodat and sys_exit_fchmodat events. We use the sys_enter_fchmodat event in the rest of this explanation.
The third step is to determine the parameter type of the handler function. The type is defined in the vmlinux.h file: sys_enter_xx events generally use struct trace_event_raw_sys_enter, sys_exit_xx events generally use struct trace_event_raw_sys_exit, and other events generally use struct trace_event_raw_<name>. If no struct with that exact name exists, look for a similarly named one, using trace_event_raw_sys_enter as a reference.
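The naming rule of thumb from this step can be written down as a short user-space C sketch. The helper name ctx_struct_name is hypothetical, and the rule is only the heuristic described above; the result should always be checked against vmlinux.h.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper encoding the rule of thumb from the text:
 *   sys_enter_xx -> trace_event_raw_sys_enter
 *   sys_exit_xx  -> trace_event_raw_sys_exit
 *   otherwise    -> trace_event_raw_<name> (a guess; verify in vmlinux.h)
 * Returns 0 on success, -1 if the output buffer is too small.
 */
static int ctx_struct_name(const char *name, char *out, size_t out_sz)
{
    int n;

    if (strncmp(name, "sys_enter_", 10) == 0)
        n = snprintf(out, out_sz, "trace_event_raw_sys_enter");
    else if (strncmp(name, "sys_exit_", 9) == 0)
        n = snprintf(out, out_sz, "trace_event_raw_sys_exit");
    else
        n = snprintf(out, out_sz, "trace_event_raw_%s", name);

    return (n < 0 || (size_t)n >= out_sz) ? -1 : 0;
}
```

The output is a candidate name to search for in vmlinux.h, not a guarantee that the struct exists.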
For sys_enter_fchmodat, we use the struct trace_event_raw_sys_enter:
struct trace_event_raw_sys_enter {
    struct trace_entry ent;
    long int id;
    long unsigned int args[6];
    char __data[0];
};
The args array stores the information we can get about the event. Exactly what it contains is what we determine in step four.
The fourth step is to determine what information the event itself provides. We know that the fchmodat system call takes a file name and a mode, but we do not yet know whether they are available to the eBPF program. To find out, look at the /sys/kernel/debug/tracing/events/<category>/<name>/format file. For the sys_enter_fchmodat event, the file /sys/kernel/debug/tracing/events/syscalls/sys_enter_fchmodat/format reads as follows:
$ sudo cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_fchmodat/format
name: sys_enter_fchmodat
ID: 647
format:
    field:unsigned short common_type; offset:0; size:2; signed:0;
    field:unsigned char common_flags; offset:2; size:1; signed:0;
    field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
    field:int common_pid; offset:4; size:4; signed:1;

    field:int __syscall_nr; offset:8; size:4; signed:1;
    field:int dfd; offset:16; size:8; signed:0;
    field:const char * filename; offset:24; size:8; signed:0;
    field:umode_t mode; offset:32; size:8; signed:0;

print fmt: "dfd: 0x%08lx, filename: 0x%08lx, mode: 0x%08lx", ((unsigned long)(REC->dfd)), ((unsigned long)(REC->filename)), ((unsigned long)(REC->mode))
The fields referenced in print fmt are the information available to the eBPF program. Here, the sys_enter_fchmodat event exposes dfd, filename and mode, which includes the file name and permission mode we wanted. The values of these fields are read from the args array of trace_event_raw_sys_enter: args[0] holds dfd, args[1] holds filename, and so on.
Once the available information has been determined, you can write the program. For example, an eBPF program for the sys_enter_fchmodat event above looks like this:
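The correspondence between a field in the format file and its index in args can be worked out from the offsets. The following user-space C sketch assumes the 64-bit layout shown in the format output above, where the arguments start at byte offset 16 and each slot is 8 bytes wide; the helper names field_offset and args_index are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Assumption: layout from the format output above (64-bit kernel). */
#define ARGS_BASE_OFFSET 16  /* offset of args[0], i.e. the dfd field */
#define ARG_SIZE 8           /* each args[] slot is 8 bytes */
#define ARGS_COUNT 6         /* args[6] in trace_event_raw_sys_enter */

/* Extract the "offset:N" value from one field line; -1 if absent. */
static int field_offset(const char *line)
{
    const char *p = strstr(line, "offset:");
    int off;

    if (!p || sscanf(p, "offset:%d", &off) != 1)
        return -1;
    return off;
}

/*
 * Map a field line to its index in args[], or -1 when the field is
 * not a syscall argument (common_* fields and __syscall_nr).
 */
static int args_index(const char *line)
{
    int off = field_offset(line);

    if (off < ARGS_BASE_OFFSET || (off - ARGS_BASE_OFFSET) % ARG_SIZE != 0)
        return -1;

    int idx = (off - ARGS_BASE_OFFSET) / ARG_SIZE;
    return idx < ARGS_COUNT ? idx : -1;
}
```

With the lines from the format file above, filename (offset 24) maps to index 1 and mode (offset 32) maps to index 2.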
SEC("tracepoint/syscalls/sys_enter_fchmodat")
int tracepoint__syscalls__sys_enter_fchmodat(struct trace_event_raw_sys_enter *ctx)
{
    // ...
    char *filename_ptr = (char *) BPF_CORE_READ(ctx, args[1]);
    bpf_core_read_user_str(&event->filename, sizeof(event->filename), filename_ptr);
    event->mode = BPF_CORE_READ(ctx, args[2]);
    // ...
}
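The snippet above fills event->filename and event->mode but elides the definition of the event type. As a rough illustration, the sketch below shows one possible layout and a user-space formatter for it. Both struct event (including the 256-byte filename buffer) and format_event are assumptions made for this example, not the layout used by the original program.

```c
#include <stdio.h>

/* Hypothetical event layout; the original snippet does not show it. */
struct event {
    char filename[256];  /* filled by bpf_core_read_user_str() above */
    unsigned int mode;   /* filled from args[2] above */
};

/*
 * Hypothetical user-space formatter: render the event as text, with
 * the permission bits in octal as chmod displays them.
 * Returns the value of snprintf().
 */
static int format_event(const struct event *e, char *out, size_t out_sz)
{
    return snprintf(out, out_sz, "filename=%s mode=%04o",
                    e->filename, e->mode & 07777);
}
```

For the strace example earlier, an event with filename "a.txt" and mode 0600 would be rendered as "filename=a.txt mode=0600".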
You can check out the full example code on GitHub: