From: Alban Crequy
Date: Fri, 25 May 2018 16:21:50 +0100
Subject: Re: [PATCH] [RFC] bpf: tracing: new helper bpf_get_current_cgroup_ino
To: Y Song
Cc: Alexei Starovoitov, netdev, LKML, Linux Containers, cgroups@vger.kernel.org, Tejun Heo, Iago López Galeiras
References: <20180513173318.21680-1-alban@kinvolk.io> <20180521162609.lpdrnozowmzdn57m@ast-mbp.dhcp.thefacebook.com>

On Wed, May 23, 2018 at 4:34 AM Y Song wrote:

> I did a quick prototyping and the above interface seems working fine.

Thanks! I gave your kernel patch & userspace program a try and it works for
me on cgroup-v2.

Also, I found out how to get my containers to use both cgroup-v1 and
cgroup-v2 (by enabling systemd's hybrid cgroup mode and docker's
'--exec-opt native.cgroupdriver=systemd' option). So I should be able to use
the BPF helper function without having to add support for all the cgroup-v1
hierarchies.

> The kernel change:
> ===============
>
> [yhs@localhost bpf-next]$ git diff
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 97446bbe2ca5..669b7383fddb 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -1976,7 +1976,8 @@ union bpf_attr {
>         FN(fib_lookup),                 \
>         FN(sock_hash_update),           \
>         FN(msg_redirect_hash),          \
> -       FN(sk_redirect_hash),
> +       FN(sk_redirect_hash),           \
> +       FN(get_current_cgroup_id),
>
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>   * function eBPF program intends to call
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index ce2cbbff27e4..e11e3298f911 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -493,6 +493,21 @@ static const struct bpf_func_proto bpf_current_task_under_cgroup_proto = {
>         .arg2_type      = ARG_ANYTHING,
>  };
>
> +BPF_CALL_0(bpf_get_current_cgroup_id)
> +{
> +       struct cgroup *cgrp = task_dfl_cgroup(current);
> +       if (!cgrp)
> +               return -EINVAL;
> +
> +       return cgrp->kn->id.id;
> +}
> +
> +static const struct bpf_func_proto bpf_get_current_cgroup_id_proto = {
> +       .func           = bpf_get_current_cgroup_id,
> +       .gpl_only       = false,
> +       .ret_type       = RET_INTEGER,
> +};
> +
>  BPF_CALL_3(bpf_probe_read_str, void *, dst, u32, size,
>            const void *, unsafe_ptr)
>  {
> @@ -563,6 +578,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>                 return &bpf_get_prandom_u32_proto;
>         case BPF_FUNC_probe_read_str:
>                 return &bpf_probe_read_str_proto;
> +       case BPF_FUNC_get_current_cgroup_id:
> +               return &bpf_get_current_cgroup_id_proto;
>         default:
>                 return NULL;
>         }
>
> The following program can be used to print out a cgroup id given a cgroup path.
> [yhs@localhost cg]$ cat get_cgroup_id.c
> #define _GNU_SOURCE
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
>
> int main(int argc, char **argv)
> {
>     int dirfd, err, flags, mount_id, fhsize;
>     struct file_handle *fhp;
>     char *pathname;
>
>     if (argc != 2) {
>         printf("usage: %s \n", argv[0]);
>         return 1;
>     }
>
>     pathname = argv[1];
>     dirfd = AT_FDCWD;
>     flags = 0;
>
>     fhsize = sizeof(*fhp);
>     fhp = malloc(fhsize);
>     if (!fhp)
>         return 1;
>     /* Start with an empty handle: the first call is expected to fail
>      * and report the required size in handle_bytes. */
>     fhp->handle_bytes = 0;
>
>     err = name_to_handle_at(dirfd, pathname, fhp, &mount_id, flags);
>     if (err >= 0) {
>         printf("error\n");
>         return 1;
>     }
>
>     /* Grow the buffer to the reported size and retry. */
>     fhsize = sizeof(struct file_handle) + fhp->handle_bytes;
>     fhp = realloc(fhp, fhsize);
>     if (!fhp)
>         return 1;
>
>     err = name_to_handle_at(dirfd, pathname, fhp, &mount_id, flags);
>     if (err < 0)
>         perror("name_to_handle_at");
>     else {
>         printf("dir = %s, mount_id = %d\n", pathname, mount_id);
>         printf("handle_bytes = %d, handle_type = %d\n", fhp->handle_bytes,
>                fhp->handle_type);
>         /* A cgroup handle is expected to be a single 64-bit id. */
>         if (fhp->handle_bytes != 8)
>             return 1;
>         printf("cgroup_id = 0x%llx\n", *(unsigned long long *)fhp->f_handle);
>     }
>     return 0;
> }
> [yhs@localhost cg]$
>
> Given a cgroup path, the user can get cgroup_id and use it in their bpf
> program for filtering purpose.
>
> I run a simple program t.c
>    int main() { while(1) sleep(1); return 0; }
> in the cgroup v2 directory /home/yhs/tmp/yhs
>    none on /home/yhs/tmp type cgroup2 (rw,relatime,seclabel)
>
> $ ./get_cgroup_id /home/yhs/tmp/yhs
> dir = /home/yhs/tmp/yhs, mount_id = 124
> handle_bytes = 8, handle_type = 1
> cgroup_id = 0x1000006b2
>
> // the below command to get cgroup_id from the kernel for the
> // process compiled with t.c and ran under /home/yhs/tmp/yhs:
> $ sudo ./trace.py -p 4067 '__x64_sys_nanosleep "cgid = %llx", $cgid'
> PID     TID     COMM    FUNC                 -
> 4067    4067    a.out   __x64_sys_nanosleep  cgid = 1000006b2
> 4067    4067    a.out   __x64_sys_nanosleep  cgid = 1000006b2
> 4067    4067    a.out   __x64_sys_nanosleep  cgid = 1000006b2
> ^C[yhs@localhost tools]$
>
> The kernel and user space cgid matches.
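
For reference, the 64-bit id printed above can be split into its two halves.
This is only a sketch, and it rests on an assumption that is not stated in
this thread: that the id is laid out as in the kernfs node id of kernels from
this period, with the 32-bit inode number in the low half and the 32-bit
generation in the high half (little-endian machine). The names
split_cgroup_id and cgid_parts are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper type: the two halves of a 64-bit cgroup id.
 * ASSUMPTION: low 32 bits = inode number, high 32 bits = generation. */
struct cgid_parts {
    uint32_t ino;
    uint32_t generation;
};

/* Split a 64-bit cgroup id into (ino, generation) under the layout
 * assumed above. */
static struct cgid_parts split_cgroup_id(uint64_t id)
{
    struct cgid_parts p;

    p.ino = (uint32_t)(id & 0xffffffffu);
    p.generation = (uint32_t)(id >> 32);
    return p;
}
```

Under that assumption, split_cgroup_id(0x1000006b2ULL) gives ino 0x6b2 and
generation 1, which illustrates why comparing only the inode number would
not be enough to tell two cgroups apart once inode numbers are reused.
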
> Will provide a formal patch later.
>
> On Mon, May 21, 2018 at 5:24 PM, Y Song wrote:
> > On Mon, May 21, 2018 at 9:26 AM, Alexei Starovoitov wrote:
> >> On Sun, May 13, 2018 at 07:33:18PM +0200, Alban Crequy wrote:
> >>>
> >>> +BPF_CALL_2(bpf_get_current_cgroup_ino, u32, hierarchy, u64, flags)
> >>> +{
> >>> +       // TODO: pick the correct hierarchy instead of the mem controller
> >>> +       struct cgroup *cgrp = task_cgroup(current, memory_cgrp_id);
> >>> +
> >>> +       if (unlikely(!cgrp))
> >>> +               return -EINVAL;
> >>> +       if (unlikely(hierarchy))
> >>> +               return -EINVAL;
> >>> +       if (unlikely(flags))
> >>> +               return -EINVAL;
> >>> +
> >>> +       return cgrp->kn->id.ino;
> >>
> >> ino only is not enough to identify cgroup. It needs generation number too.
> >> I don't quite see how hierarchy and flags can be used in the future.
> >> Also why limit it to memcg?
> >>
> >> How about something like this instead:
> >>
> >> BPF_CALL_2(bpf_get_current_cgroup_id)
> >> {
> >>         struct cgroup *cgrp = task_dfl_cgroup(current);
> >>
> >>         return cgrp->kn->id.id;
> >> }
> >> The user space can use fhandle api to get the same 64-bit id.
> >
> > I think this should work. This will also be useful to bcc as user
> > space can encode desired id in the bpf program and compared that id
> > to the current cgroup id, so we can have cgroup level tracing (esp.
> > stat collection) support. To cope with cgroup hierarchy, user can use
> > cgroup-array based approach or explicitly compare against multiple
> > cgroup id's.
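
The "explicitly compare against multiple cgroup id's" option could look
roughly like the following. This is a plain user-space sketch of the
comparison only, not BPF program code: in a real tracing program the current
id would come from the proposed bpf_get_current_cgroup_id() helper, and the
wanted ids would be obtained beforehand with something like get_cgroup_id.
The function name cgroup_id_matches is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return true if current_id equals any of the n ids in wanted.
 * A short linear scan is used on purpose: the list of cgroups to
 * trace is expected to be small. */
static bool cgroup_id_matches(uint64_t current_id,
                              const uint64_t *wanted, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (wanted[i] == current_id)
            return true;
    }
    return false;
}
```

With a wanted list containing 0x1000006b2 (the id from the run above), a task
in /home/yhs/tmp/yhs would match, while a task whose id differs in either
half of the 64-bit value would be filtered out.
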