libbpfgo example: get process info in eBPF program


Most eBPF-based applications need to know which process triggered an event when that event is handled in the eBPF program. This article documents how to obtain common process information in eBPF programs.

Get process information

In Linux, the task_struct structure holds process-related information, so we can read what we need from the task instance returned by bpf_get_current_task(): pid, ppid, process name, process namespace information, and so on.

In addition, bpf-helpers provides helper functions that assist in getting this information, such as bpf_get_current_task().

Get host-level pid information

First, the host-level pid. The "host-level" qualifier matters because a containerized process has two pids: the one seen on the host, and the one seen inside the container's own pid namespace.

The bpf_get_current_pid_tgid() helper provided by bpf-helpers (which wraps reads of task->tgid and task->pid) returns the host-level pid information. The upper 32 bits of its return value hold the tgid, which is what userspace calls the process id:

u32 host_pid = bpf_get_current_pid_tgid() >> 32;
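
The helper packs two values into a single 64-bit result: the tgid in the upper 32 bits and the kernel pid (the thread id, from userspace's point of view) in the lower 32 bits. The following is a minimal plain C sketch of that bit manipulation in isolation. The function names tgid_of and tid_of are made up for illustration, and the code runs in userspace rather than inside an eBPF program.

```c
#include <stdint.h>

/* Hypothetical helpers (not part of bpf-helpers): split a value shaped
 * like the return of bpf_get_current_pid_tgid() into its two halves. */

/* Upper 32 bits: tgid, i.e. the process id as seen by userspace. */
static inline uint32_t tgid_of(uint64_t pid_tgid)
{
    return (uint32_t)(pid_tgid >> 32);
}

/* Lower 32 bits: kernel pid, i.e. the thread id as seen by userspace. */
static inline uint32_t tid_of(uint64_t pid_tgid)
{
    return (uint32_t)pid_tgid;
}
```

For a thread with id 1240 belonging to process 1234, the packed value is ((uint64_t)1234 << 32) | 1240, and the two helpers return 1234 and 1240 respectively.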

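Along with the pid, you will usually also want the ppid, the pid of the parent process. The ppid has to be read from the task.

Direct pointer dereferences like the one below are only accepted by the verifier in BTF-aware program types; elsewhere, the same chain can be read with the BPF_CORE_READ macro from libbpf's bpf_core_read.h. First, get the parent's task through task->real_parent, then read its pid through tgid: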

struct task_struct *task = (struct task_struct *)bpf_get_current_task();
u32 host_ppid = task->real_parent->tgid;

Get the pid information at the userspace level

As mentioned above, when a container uses a separate pid namespace, the pid seen inside that namespace differs from the pid on the host, so we also need the userspace-level pid, that is, the pid the process sees for itself inside its namespace.

The starting point is task->nsproxy. The nsproxy structure is defined as follows:

/*
 * A structure to contain pointers to all per-process
 * namespaces - fs (mount), uts, network, sysvipc, etc.
 * The pid namespace is an exception -- it's accessed using
 * task_active_pid_ns.  The pid namespace here is the
 * namespace that children will use.
 * 'count' is the number of tasks holding a reference.
 * The count for each namespace, then, will be the number
 * of nsproxies pointing to it, not the number of tasks.
 * The nsproxy is shared by tasks which share all namespaces.
 * As soon as a single namespace is cloned or unshared, the
 * nsproxy is copied.
 */
struct nsproxy {
    atomic_t count;
    struct uts_namespace *uts_ns;
    struct ipc_namespace *ipc_ns;
    struct mnt_namespace *mnt_ns;
    struct pid_namespace *pid_ns_for_children;
    struct net           *net_ns;
    struct time_namespace *time_ns;
    struct time_namespace *time_ns_for_children;
    struct cgroup_namespace *cgroup_ns;
};

You can see that nsproxy contains various namespace information related to the process.

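Note that pid_ns_for_children is the namespace that new children will be created in. It normally matches the task's own pid namespace, but the two can differ after unshare(CLONE_NEWPID) or after setns() into another pid namespace. With that caveat, the userspace-level pid can be read as follows: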

unsigned int level = task->nsproxy->pid_ns_for_children->level;
u32 pid = task->group_leader->thread_pid->numbers[level].nr;

The method of getting the corresponding ppid is similar:

unsigned int p_level = task->real_parent->nsproxy->pid_ns_for_children->level;
u32 ppid = task->real_parent->group_leader->thread_pid->numbers[p_level].nr;
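
To make the numbers[level].nr lookup more concrete, here is a small userspace model of it. The mock_upid and mock_pid types, the MOCK_MAX_LEVELS limit, and the mock_pid_nr_at_level function are simplified stand-ins invented for this sketch; the real struct pid and struct upid live in the kernel's include/linux/pid.h and carry more fields.

```c
#include <stddef.h>

/* Illustrative limit for this sketch only; the kernel sizes the
 * numbers[] array dynamically. */
#define MOCK_MAX_LEVELS 4

/* Simplified stand-in for the kernel's struct upid. */
struct mock_upid {
    int nr; /* pid value as seen in one pid namespace */
};

/* Simplified stand-in for the kernel's struct pid. */
struct mock_pid {
    unsigned int level; /* depth of the deepest namespace this pid lives in */
    struct mock_upid numbers[MOCK_MAX_LEVELS]; /* index 0 is the host namespace */
};

/* Return the pid value visible at the given namespace level,
 * or 0 if the pid does not exist that deep. */
static inline int mock_pid_nr_at_level(const struct mock_pid *pid,
                                       unsigned int level)
{
    if (pid == NULL || level > pid->level || level >= MOCK_MAX_LEVELS)
        return 0;
    return pid->numbers[level].nr;
}
```

For a containerized process with host pid 4321 and pid 7 inside a container one level below the host, numbers[0].nr is 4321 and numbers[1].nr is 7, so a lookup at level 0 yields the host-level pid and a lookup at level 1 yields the pid seen inside the container.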

Get namespace information

As we saw earlier, nsproxy holds pointers to the process's namespaces, so namespace-related information can be read directly through it.

For example, get the id of the pid namespace:

u32 pid_ns_id = task->nsproxy->pid_ns_for_children->ns.inum;
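
The namespace id read from ns.inum is the same number that userspace sees as the inode of the matching file under /proc/<pid>/ns/ (pid_for_children is the counterpart of pid_ns_for_children, and pid is the task's active pid namespace). This gives a simple way to cross-check values reported by the eBPF program. The sketch below is a userspace helper for that check; the name ns_inum_from_path is hypothetical.

```c
#include <stdint.h>
#include <sys/stat.h>

/* Hypothetical helper: return the inode number of a namespace file such
 * as /proc/self/ns/pid, or 0 if the path cannot be stat'ed. */
static inline uint64_t ns_inum_from_path(const char *path)
{
    struct stat st;

    /* stat() follows the /proc symlink to the namespace object itself. */
    if (stat(path, &st) != 0)
        return 0;
    return (uint64_t)st.st_ino;
}
```

Calling ns_inum_from_path("/proc/self/ns/pid_for_children") from a process of interest and comparing the result with the pid_ns_id emitted by the eBPF program is one way to confirm that both sides refer to the same namespace.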

You can check out the full code on GitHub: