From: Y Song
Date: Mon, 14 May 2018 12:38:39 -0700
Subject: Re: [PATCH] [RFC] bpf: tracing: new helper bpf_get_current_cgroup_ino
To: Alban Crequy
Cc: netdev, linux-kernel@vger.kernel.org, containers@lists.linux-foundation.org, cgroups@vger.kernel.org, Alban Crequy
In-Reply-To: <20180513173318.21680-1-alban@kinvolk.io>
References: <20180513173318.21680-1-alban@kinvolk.io>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, May 13, 2018 at 10:33 AM, Alban Crequy wrote:
> From: Alban Crequy
>
> bpf_get_current_cgroup_ino() allows BPF trace programs to get the inode
> of the cgroup where the current process resides.
>
> My use case is to get statistics about syscalls done by a specific
> Kubernetes container. I have a tracepoint on raw_syscalls/sys_enter and
> a BPF map containing the cgroup inode that I want to trace. I use
> bpf_get_current_cgroup_ino() and I quickly return from the tracepoint if
> the inode is not in the BPF hash map.

Alternatively, the kernel already has the bpf_current_task_under_cgroup()
helper, which uses a cgroup array to store cgroup fds. If the current task
is in the hierarchy of a particular cgroup, the helper returns true.
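To make the hierarchy semantics concrete, here is a small userspace model in
plain C. It is only a sketch, not kernel or BPF code: struct model_cgroup,
task_in_exact_cgroup() and task_under_cgroup() are names invented for this
illustration, and the id field merely stands in for a cgroup inode. The first
function mimics an exact inode comparison; the second mimics a check that also
accepts any cgroup nested below the one being tested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for a cgroup: an id plus a parent link. */
struct model_cgroup {
	uint64_t id;                       /* plays the role of the cgroup inode */
	const struct model_cgroup *parent; /* NULL for the hierarchy root */
};

/* Exact match: true only if the task sits directly in the cgroup target_id. */
static bool task_in_exact_cgroup(const struct model_cgroup *task_cg,
				 uint64_t target_id)
{
	assert(task_cg != NULL);
	return task_cg->id == target_id;
}

/* Hierarchy match: true if ancestor is the task's cgroup or any parent of it. */
static bool task_under_cgroup(const struct model_cgroup *task_cg,
			      const struct model_cgroup *ancestor)
{
	assert(task_cg != NULL && ancestor != NULL);
	for (const struct model_cgroup *cg = task_cg; cg != NULL; cg = cg->parent) {
		if (cg == ancestor)
			return true;
	}
	return false;
}
```

With a chain root -> pod -> container, a task placed in container is accepted
by task_under_cgroup() when tested against pod, but rejected by
task_in_exact_cgroup() when given the pod's id.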

One difference between your helper and bpf_current_task_under_cgroup() is
that your helper tests against one particular cgroup, not including its
children, whereas bpf_current_task_under_cgroup() returns true even if the
task is in a nested cgroup.

Maybe this will work for you?

>
> Without this BPF helper, I would need to keep track of all pids in the
> container. The Netlink proc connector can be used to follow process
> creation and destruction but it is racy.
>
> This patch only looks at the memory cgroup, which was enough for me
> since each Kubernetes container is placed in a different mem cgroup.
> For a generic implementation, I'm not sure how to proceed: it seems I
> would need to use 'for_each_root(root)' (see example in
> proc_cgroup_show() from kernel/cgroup/cgroup.c) but I don't know if
> taking the cgroup mutex is possible in the BPF helper function. It might
> be ok in the tracepoint raw_syscalls/sys_enter but could the mutex
> already be taken in some other tracepoints?

A mutex is not allowed in a helper, since taking it can block.

>
> Signed-off-by: Alban Crequy
> ---
>  include/uapi/linux/bpf.h | 11 ++++++++++-
>  kernel/trace/bpf_trace.c | 25 +++++++++++++++++++++++++
>  2 files changed, 35 insertions(+), 1 deletion(-)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index c5ec89732a8d..38ac3959cdf3 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -755,6 +755,14 @@ union bpf_attr {
>   *     @addr: pointer to struct sockaddr to bind socket to
>   *     @addr_len: length of sockaddr structure
>   *     Return: 0 on success or negative error code
> + *
> + * u64 bpf_get_current_cgroup_ino(hierarchy, flags)
> + *     Get the cgroup{1,2} inode of current task under the specified hierarchy.
> + *     @hierarchy: cgroup hierarchy

I am not sure what value is supposed to specify the hierarchy here. A cgroup
directory fd?

> + *     @flags: reserved for future use
> + *     Return:
> + *       == 0 error

It looks like < 0 means error.

> + *        > 0 inode of the cgroup

>= 0 means good?

>   */
>  #define __BPF_FUNC_MAPPER(FN)          \
>         FN(unspec),                     \
> @@ -821,7 +829,8 @@ union bpf_attr {
>         FN(msg_apply_bytes),            \
>         FN(msg_cork_bytes),             \
>         FN(msg_pull_data),              \
> -       FN(bind),
> +       FN(bind),                       \
> +       FN(get_current_cgroup_ino),
>
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>   * function eBPF program intends to call
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 56ba0f2a01db..9bf92a786639 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -524,6 +524,29 @@ static const struct bpf_func_proto bpf_probe_read_str_proto = {
>         .arg3_type      = ARG_ANYTHING,
>  };
>
> +BPF_CALL_2(bpf_get_current_cgroup_ino, u32, hierarchy, u64, flags)
> +{
> +       // TODO: pick the correct hierarchy instead of the mem controller
> +       struct cgroup *cgrp = task_cgroup(current, memory_cgrp_id);
> +
> +       if (unlikely(!cgrp))
> +               return -EINVAL;
> +       if (unlikely(hierarchy))
> +               return -EINVAL;
> +       if (unlikely(flags))
> +               return -EINVAL;
> +
> +       return cgrp->kn->id.ino;
> +}
> +
> +static const struct bpf_func_proto bpf_get_current_cgroup_ino_proto = {
> +       .func           = bpf_get_current_cgroup_ino,
> +       .gpl_only       = false,
> +       .ret_type       = RET_INTEGER,
> +       .arg1_type      = ARG_DONTCARE,
> +       .arg2_type      = ARG_DONTCARE,
> +};
> +
>  static const struct bpf_func_proto *
>  tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>  {
> @@ -564,6 +587,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>                 return &bpf_get_prandom_u32_proto;
>         case BPF_FUNC_probe_read_str:
>                 return &bpf_probe_read_str_proto;
> +       case BPF_FUNC_get_current_cgroup_ino:
> +               return &bpf_get_current_cgroup_ino_proto;
>         default:
>                 return NULL;
>         }
> --
> 2.14.3
>
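
On the return value: the comment in bpf.h documents "== 0" as the error case,
while the helper body returns -EINVAL, and the helper is declared as returning
u64. A standalone C sketch of what a caller would then observe is below. It is
a userspace model only: model_get_cgroup_ino() and looks_like_error() are
hypothetical names, and the inode is passed in as a parameter because there is
no real cgroup to read it from.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a helper whose result travels in a u64. */
static uint64_t model_get_cgroup_ino(uint32_t hierarchy, uint64_t flags,
				     uint64_t ino)
{
	/* The patch documents 0 as "error", so the model takes nonzero inodes. */
	assert(ino > 0);
	if (hierarchy != 0 || flags != 0)
		return (uint64_t)-EINVAL; /* wraps to a huge value, never 0 */
	return ino;
}

/* A caller has to reinterpret the value as signed to see the errno. */
static bool looks_like_error(uint64_t ret)
{
	return (int64_t)ret < 0;
}
```

In this model a nonzero hierarchy yields (uint64_t)-EINVAL, which is
0xffffffffffffffea where EINVAL is 22, so a caller testing for "== 0" would
treat it as a valid inode. Either the documentation or the returned values
would need to change so that the two agree.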