From: Y Song
Date: Tue, 22 May 2018 20:33:24 -0700
Subject: Re: [PATCH] [RFC] bpf: tracing: new helper bpf_get_current_cgroup_ino
To: Alexei Starovoitov
Cc: Alban Crequy, netdev, linux-kernel@vger.kernel.org, containers@lists.linux-foundation.org, cgroups@vger.kernel.org, Alban Crequy, tj@kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

I did some quick prototyping, and the interface above seems to work fine.
The kernel change:
===============
[yhs@localhost bpf-next]$ git diff
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 97446bbe2ca5..669b7383fddb 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1976,7 +1976,8 @@ union bpf_attr {
 	FN(fib_lookup),			\
 	FN(sock_hash_update),		\
 	FN(msg_redirect_hash),		\
-	FN(sk_redirect_hash),
+	FN(sk_redirect_hash),		\
+	FN(get_current_cgroup_id),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index ce2cbbff27e4..e11e3298f911 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -493,6 +493,21 @@ static const struct bpf_func_proto bpf_current_task_under_cgroup_proto = {
 	.arg2_type	= ARG_ANYTHING,
 };
 
+BPF_CALL_0(bpf_get_current_cgroup_id)
+{
+	struct cgroup *cgrp = task_dfl_cgroup(current);
+	if (!cgrp)
+		return -EINVAL;
+
+	return cgrp->kn->id.id;
+}
+
+static const struct bpf_func_proto bpf_get_current_cgroup_id_proto = {
+	.func		= bpf_get_current_cgroup_id,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+};
+
 BPF_CALL_3(bpf_probe_read_str, void *, dst, u32, size,
 	   const void *, unsafe_ptr)
 {
@@ -563,6 +578,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_get_prandom_u32_proto;
 	case BPF_FUNC_probe_read_str:
 		return &bpf_probe_read_str_proto;
+	case BPF_FUNC_get_current_cgroup_id:
+		return &bpf_get_current_cgroup_id_proto;
 	default:
 		return NULL;
 	}

The following program can be used to print out a cgroup id given a cgroup path.
[yhs@localhost cg]$ cat get_cgroup_id.c
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/stat.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
	int dirfd, err, flags, mount_id, fhsize;
	struct file_handle *fhp;
	char *pathname;

	if (argc != 2) {
		printf("usage: %s <cgroup_path>\n", argv[0]);
		return 1;
	}

	pathname = argv[1];
	dirfd = AT_FDCWD;
	flags = 0;

	fhsize = sizeof(*fhp);
	fhp = malloc(fhsize);
	if (!fhp)
		return 1;

	/* Probe call: with handle_bytes = 0 the kernel only reports the
	 * required size, so this call is expected to fail with EOVERFLOW. */
	fhp->handle_bytes = 0;
	err = name_to_handle_at(dirfd, pathname, fhp, &mount_id, flags);
	if (err >= 0 || errno != EOVERFLOW) {
		printf("error\n");
		return 1;
	}

	/* Grow the buffer to the size the kernel reported and fetch the handle. */
	fhsize = sizeof(struct file_handle) + fhp->handle_bytes;
	fhp = realloc(fhp, fhsize);
	if (!fhp)
		return 1;

	err = name_to_handle_at(dirfd, pathname, fhp, &mount_id, flags);
	if (err < 0) {
		perror("name_to_handle_at");
		return 1;
	}

	printf("dir = %s, mount_id = %d\n", pathname, mount_id);
	printf("handle_bytes = %u, handle_type = %d\n",
	       fhp->handle_bytes, fhp->handle_type);
	/* A cgroup2 (kernfs) file handle carries the 64-bit cgroup id. */
	if (fhp->handle_bytes != 8)
		return 1;
	printf("cgroup_id = 0x%llx\n", *(unsigned long long *)fhp->f_handle);

	return 0;
}
[yhs@localhost cg]$

Given a cgroup path, the user can get the cgroup_id and use it in their BPF program for filtering purposes.

I ran a simple program, t.c,

#include <unistd.h>

int main()
{
	while (1)
		sleep(1);
	return 0;
}

in the cgroup v2 directory /home/yhs/tmp/yhs:

none on /home/yhs/tmp type cgroup2 (rw,relatime,seclabel)

$ ./get_cgroup_id /home/yhs/tmp/yhs
dir = /home/yhs/tmp/yhs, mount_id = 124
handle_bytes = 8, handle_type = 1
cgroup_id = 0x1000006b2

The command below gets the cgroup_id from the kernel for the process compiled from t.c and running under /home/yhs/tmp/yhs:

$ sudo ./trace.py -p 4067 '__x64_sys_nanosleep "cgid = %llx", $cgid'
PID     TID     COMM            FUNC             -
4067    4067    a.out           __x64_sys_nanosleep cgid = 1000006b2
4067    4067    a.out           __x64_sys_nanosleep cgid = 1000006b2
4067    4067    a.out           __x64_sys_nanosleep cgid = 1000006b2
^C[yhs@localhost tools]$

The kernel and user space cgids match.

Will provide a formal patch later.
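To make the id above concrete: assuming the 4.17-era kernel layout union kernfs_node_id { struct { u32 ino; u32 generation; }; u64 id; } on a little-endian machine, the 64-bit cgroup id holds the generation in the upper 32 bits and the kernfs inode number in the lower 32 bits. So 0x1000006b2 should decode to ino 0x6b2 (1714) and generation 1. As far as I understand kernfs, ino numbers can be reused after the allocator wraps around, with the generation bumped, which is why an ino alone is not enough to identify a cgroup. The helper names below (cgid_ino, cgid_generation, print_cgid) are hypothetical and only illustrate the split.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helpers. Assumes the 4.17-era layout
 *   union kernfs_node_id { struct { u32 ino; u32 generation; }; u64 id; };
 * on little-endian, so ino occupies the low 32 bits of id. */
static uint32_t cgid_ino(uint64_t cgid)
{
	return (uint32_t)(cgid & 0xffffffffu);
}

static uint32_t cgid_generation(uint64_t cgid)
{
	return (uint32_t)(cgid >> 32);
}

/* Print the id together with its two halves. */
static void print_cgid(uint64_t cgid)
{
	printf("cgroup_id = 0x%llx -> ino = %u (0x%x), generation = %u\n",
	       (unsigned long long)cgid, cgid_ino(cgid), cgid_ino(cgid),
	       cgid_generation(cgid));
}
```

For example, print_cgid(0x1000006b2ULL) reports ino 1714 (0x6b2) and generation 1 for the id seen in the transcript.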
On Mon, May 21, 2018 at 5:24 PM, Y Song wrote:
> On Mon, May 21, 2018 at 9:26 AM, Alexei Starovoitov
> wrote:
>> On Sun, May 13, 2018 at 07:33:18PM +0200, Alban Crequy wrote:
>>>
>>> +BPF_CALL_2(bpf_get_current_cgroup_ino, u32, hierarchy, u64, flags)
>>> +{
>>> +	// TODO: pick the correct hierarchy instead of the mem controller
>>> +	struct cgroup *cgrp = task_cgroup(current, memory_cgrp_id);
>>> +
>>> +	if (unlikely(!cgrp))
>>> +		return -EINVAL;
>>> +	if (unlikely(hierarchy))
>>> +		return -EINVAL;
>>> +	if (unlikely(flags))
>>> +		return -EINVAL;
>>> +
>>> +	return cgrp->kn->id.ino;
>>
>> ino only is not enough to identify cgroup. It needs generation number too.
>> I don't quite see how hierarchy and flags can be used in the future.
>> Also why limit it to memcg?
>>
>> How about something like this instead:
>>
>> BPF_CALL_2(bpf_get_current_cgroup_id)
>> {
>> 	struct cgroup *cgrp = task_dfl_cgroup(current);
>>
>> 	return cgrp->kn->id.id;
>> }
>> The user space can use fhandle api to get the same 64-bit id.
>
> I think this should work. This will also be useful to bcc as user
> space can encode desired id
> in the bpf program and compared that id to the current cgroup id, so we can have
> cgroup level tracing (esp. stat collection) support. To cope with
> cgroup hierarchy, user can use
> cgroup-array based approach or explicitly compare against multiple cgroup id's.
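The quoted idea of explicitly comparing the current cgroup id against several ids resolved in user space can be modelled in plain C. This is a sketch only: cgid_allowed and MAX_CGIDS are hypothetical names, and a real BPF program would obtain the current id from the new helper instead of a parameter. The loop has a fixed upper bound because BPF verifiers of this era rejected loops, so a BPF version would need to be unrolled (for example with clang's #pragma unroll).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical compile-time cap on how many cgroup ids user space may
 * bake into the program. */
#define MAX_CGIDS 4

/* Return true if cur matches one of the first n entries of allowed.
 * The bound is the constant MAX_CGIDS (not n) so that a BPF port could be
 * fully unrolled; entries at index >= n are ignored. */
static bool cgid_allowed(uint64_t cur, const uint64_t allowed[MAX_CGIDS],
			 size_t n)
{
	for (size_t i = 0; i < MAX_CGIDS; i++) {
		if (i >= n)
			break;
		if (allowed[i] == cur)
			return true;
	}
	return false;
}
```

Exact comparison only matches the listed cgroups themselves, not their descendants. For hierarchy-aware filtering, the cgroup-array approach mentioned in the quote (a BPF_MAP_TYPE_CGROUP_ARRAY map queried with bpf_current_task_under_cgroup) is the better fit.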