From: Y Song
Date: Fri, 25 May 2018 09:28:22 -0700
Subject: Re: [PATCH] [RFC] bpf: tracing: new helper bpf_get_current_cgroup_ino
To: Alban Crequy
Cc: Alexei Starovoitov, netdev, LKML, Linux Containers, cgroups@vger.kernel.org, Tejun Heo, Iago López Galeiras
References: <20180513173318.21680-1-alban@kinvolk.io> <20180521162609.lpdrnozowmzdn57m@ast-mbp.dhcp.thefacebook.com>

On Fri, May 25, 2018 at 8:21 AM, Alban Crequy wrote:
> On Wed, May 23, 2018 at 4:34 AM Y Song wrote:
>
>> I did a quick prototyping and the above interface seems working fine.
>
> Thanks! I gave your kernel patch & userspace program a try and it works for
> me on cgroup-v2.
>
> Also, I found out how to get my containers to use both cgroup-v1 and
> cgroup-v2 (by enabling systemd's hybrid cgroup mode and docker's
> '--exec-opt native.cgroupdriver=systemd' option). So I should be able to
> use the BPF helper function without having to add support for all the
> cgroup-v1 hierarchies.

Great. Will submit a formal patch soon.

>
>> The kernel change:
>> ===============
>
>> [yhs@localhost bpf-next]$ git diff
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 97446bbe2ca5..669b7383fddb 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -1976,7 +1976,8 @@ union bpf_attr {
>>         FN(fib_lookup),                 \
>>         FN(sock_hash_update),           \
>>         FN(msg_redirect_hash),          \
>> -       FN(sk_redirect_hash),
>> +       FN(sk_redirect_hash),           \
>> +       FN(get_current_cgroup_id),
>
>>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>>   * function eBPF program intends to call
>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
>> index ce2cbbff27e4..e11e3298f911 100644
>> --- a/kernel/trace/bpf_trace.c
>> +++ b/kernel/trace/bpf_trace.c
>> @@ -493,6 +493,21 @@ static const struct bpf_func_proto bpf_current_task_under_cgroup_proto = {
>>         .arg2_type      = ARG_ANYTHING,
>>  };
>
>> +BPF_CALL_0(bpf_get_current_cgroup_id)
>> +{
>> +       struct cgroup *cgrp = task_dfl_cgroup(current);
>> +       if (!cgrp)
>> +               return -EINVAL;
>> +
>> +       return cgrp->kn->id.id;
>> +}
>> +
>> +static const struct bpf_func_proto bpf_get_current_cgroup_id_proto = {
>> +       .func           = bpf_get_current_cgroup_id,
>> +       .gpl_only       = false,
>> +       .ret_type       = RET_INTEGER,
>> +};
>> +
>>  BPF_CALL_3(bpf_probe_read_str, void *, dst, u32, size,
>>            const void *, unsafe_ptr)
>>  {
>> @@ -563,6 +578,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>>                 return &bpf_get_prandom_u32_proto;
>>         case BPF_FUNC_probe_read_str:
>>                 return &bpf_probe_read_str_proto;
>> +       case BPF_FUNC_get_current_cgroup_id:
>> +               return &bpf_get_current_cgroup_id_proto;
>>         default:
>>                 return NULL;
>>         }
>
>> The following program can be used to print out a cgroup id given a cgroup path.
>> [yhs@localhost cg]$ cat get_cgroup_id.c
>> #define _GNU_SOURCE
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <sys/types.h>
>> #include <sys/stat.h>
>> #include <fcntl.h>
>
>> int main(int argc, char **argv)
>> {
>>     int dirfd, err, flags, mount_id, fhsize;
>>     struct file_handle *fhp;
>>     char *pathname;
>
>>     if (argc != 2) {
>>         printf("usage: %s \n", argv[0]);
>>         return 1;
>>     }
>
>>     pathname = argv[1];
>>     dirfd = AT_FDCWD;
>>     flags = 0;
>
>>     fhsize = sizeof(*fhp);
>>     fhp = malloc(fhsize);
>>     if (!fhp)
>>         return 1;
>>     /* malloc() does not zero memory: start with handle_bytes = 0 so the
>>      * first call fails with EOVERFLOW and reports the required size */
>>     fhp->handle_bytes = 0;
>
>>     /* probe call: expected to fail while filling in fhp->handle_bytes */
>>     err = name_to_handle_at(dirfd, pathname, fhp, &mount_id, flags);
>>     if (err >= 0) {
>>         printf("error\n");
>>         return 1;
>>     }
>
>>     fhsize = sizeof(struct file_handle) + fhp->handle_bytes;
>>     fhp = realloc(fhp, fhsize);
>>     if (!fhp)
>>         return 1;
>
>>     /* real call: the buffer is now large enough for the handle */
>>     err = name_to_handle_at(dirfd, pathname, fhp, &mount_id, flags);
>>     if (err < 0)
>>         perror("name_to_handle_at");
>>     else {
>>         int i;
>
>>         printf("dir = %s, mount_id = %d\n", pathname, mount_id);
>>         printf("handle_bytes = %d, handle_type = %d\n", fhp->handle_bytes,
>>             fhp->handle_type);
>>         if (fhp->handle_bytes != 8)
>>             return 1;
>
>>         /* the 8-byte handle is the 64-bit cgroup id */
>>         printf("cgroup_id = 0x%llx\n", *(unsigned long long *)fhp->f_handle);
>>     }
>
>>     return 0;
>> }
>> [yhs@localhost cg]$
>
>> Given a cgroup path, the user can get cgroup_id and use it in their bpf
>> program for filtering purpose.
>
>> I run a simple program t.c
>>    int main() { while(1) sleep(1); return 0; }
>> in the cgroup v2 directory /home/yhs/tmp/yhs
>>    none on /home/yhs/tmp type cgroup2 (rw,relatime,seclabel)
>
>> $ ./get_cgroup_id /home/yhs/tmp/yhs
>> dir = /home/yhs/tmp/yhs, mount_id = 124
>> handle_bytes = 8, handle_type = 1
>> cgroup_id = 0x1000006b2
>
>> // the below command to get cgroup_id from the kernel for the
>> // process compiled with t.c and ran under /home/yhs/tmp/yhs:
>> $ sudo ./trace.py -p 4067 '__x64_sys_nanosleep "cgid = %llx", $cgid'
>> PID     TID     COMM            FUNC                 -
>> 4067    4067    a.out           __x64_sys_nanosleep  cgid = 1000006b2
>> 4067    4067    a.out           __x64_sys_nanosleep  cgid = 1000006b2
>> 4067    4067    a.out           __x64_sys_nanosleep  cgid = 1000006b2
>> ^C[yhs@localhost tools]$
>
>> The kernel and user space cgid matches. Will provide a
>> formal patch later.
>
>
>
>
>> On Mon, May 21, 2018 at 5:24 PM, Y Song wrote:
>> > On Mon, May 21, 2018 at 9:26 AM, Alexei Starovoitov wrote:
>> >> On Sun, May 13, 2018 at 07:33:18PM +0200, Alban Crequy wrote:
>> >>>
>> >>> +BPF_CALL_2(bpf_get_current_cgroup_ino, u32, hierarchy, u64, flags)
>> >>> +{
>> >>> +       // TODO: pick the correct hierarchy instead of the mem controller
>> >>> +       struct cgroup *cgrp = task_cgroup(current, memory_cgrp_id);
>> >>> +
>> >>> +       if (unlikely(!cgrp))
>> >>> +               return -EINVAL;
>> >>> +       if (unlikely(hierarchy))
>> >>> +               return -EINVAL;
>> >>> +       if (unlikely(flags))
>> >>> +               return -EINVAL;
>> >>> +
>> >>> +       return cgrp->kn->id.ino;
>> >>
>> >> ino only is not enough to identify cgroup. It needs generation number too.
>> >> I don't quite see how hierarchy and flags can be used in the future.
>> >> Also why limit it to memcg?
>> >>
>> >> How about something like this instead:
>> >>
>> >> BPF_CALL_2(bpf_get_current_cgroup_id)
>> >> {
>> >>         struct cgroup *cgrp = task_dfl_cgroup(current);
>> >>
>> >>         return cgrp->kn->id.id;
>> >> }
>> >> The user space can use fhandle api to get the same 64-bit id.
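
To make the "ino plus generation" point concrete, here is a minimal sketch of how the 64-bit id printed above could be split in user space. It assumes the `union kernfs_node_id` layout of kernels from that period as seen on a little-endian machine (inode number in the low 32 bits, generation in the high 32 bits); `split_cgroup_id` is a hypothetical name used only for illustration, not a kernel or libc function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: split a 64-bit cgroup id into its two halves.
 * Assumption: low 32 bits hold the inode number and high 32 bits hold
 * the generation (little-endian view of union kernfs_node_id).
 */
static void split_cgroup_id(uint64_t id, uint32_t *ino, uint32_t *generation)
{
	assert(ino != NULL);
	assert(generation != NULL);

	*ino = (uint32_t)(id & 0xffffffffu);	/* low half */
	*generation = (uint32_t)(id >> 32);	/* high half */
}
```

Under that assumption, the id 0x1000006b2 from the run above splits into inode 0x6b2 and generation 1, which is why comparing the inode alone could confuse a cgroup with an earlier one whose inode number was reused.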
>> >
>> > I think this should work. This will also be useful to bcc as user
>> > space can encode desired id
>> > in the bpf program and compared that id to the current cgroup id, so we can have
>> > cgroup level tracing (esp. stat collection) support. To cope with
>> > cgroup hierarchy, user can use
>> > cgroup-array based approach or explicitly compare against multiple cgroup id's.
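
The "explicitly compare against multiple cgroup id's" option can be sketched in plain C. This is only an illustration of the comparison logic, not code from the patch: `cgroup_id_wanted` is a hypothetical name, and in a real tracing program the current id would come from the proposed helper while the wanted ids would be encoded by user space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical filter: return true if current_id equals one of the
 * n ids in wanted[]. The full 64-bit value is compared, so both the
 * inode number and the generation have to match.
 */
static bool cgroup_id_wanted(uint64_t current_id,
			     const uint64_t *wanted, size_t n)
{
	assert(wanted != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (wanted[i] == current_id)
			return true;
	}
	return false;
}
```

A linear scan is reasonable for a handful of ids; for matching a whole subtree, the cgroup-array approach mentioned above is likely the better fit, since a list of ids does not follow the hierarchy.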