From: Yafang Shao
Date: Thu, 6 Apr 2023 11:22:01 +0800
Subject: Re: [RFC PATCH bpf-next 00/13] bpf: Introduce BPF namespace
To: Alexei Starovoitov
Cc: Song Liu, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
    Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh,
    Stanislav Fomichev, Hao Luo, Jiri Olsa, bpf, LKML
References: <20230326092208.13613-1-laoar.shao@gmail.com>
    <20230402233740.haxb7lgfavcoe27f@dhcp-172-26-102-232.dhcp.thefacebook.com>
    <20230403225017.onl5pbp7h2ugclbk@dhcp-172-26-102-232.dhcp.thefacebook.com>
    <20230406020656.7v5ongxyon5fr4s7@dhcp-172-26-102-232.dhcp.thefacebook.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Apr 6, 2023 at 11:06 AM Alexei Starovoitov wrote:
>
> On Wed, Apr 5, 2023 at 7:55 PM Yafang Shao wrote:
> >
> > It seems that I didn't describe the issue clearly.
> > The container doesn't have CAP_SYS_ADMIN, but CAP_SYS_ADMIN is
> > required to run bpftool, so the bpftool running in the container
> > can't get the ID of bpf objects or convert IDs to FDs.
> > Is there something that I missed?
>
> Nothing. This is by design. bpftool needs sudo. That's all.
>

Hmm, what I'm trying to do is make bpftool run without sudo.

> > > >
> > > > --- a/kernel/bpf/syscall.c
> > > > +++ b/kernel/bpf/syscall.c
> > > > @@ -3705,9 +3705,6 @@ static int bpf_obj_get_next_id(const union bpf_attr *attr,
> > > >         if (CHECK_ATTR(BPF_OBJ_GET_NEXT_ID) || next_id >= INT_MAX)
> > > >                 return -EINVAL;
> > > >
> > > > -       if (!capable(CAP_SYS_ADMIN))
> > > > -               return -EPERM;
> > > > -
> > > >         next_id++;
> > > >         spin_lock_bh(lock);
> > > >         if (!idr_get_next(idr, &next_id))
> > > >
> > > > Because the container doesn't have CAP_SYS_ADMIN enabled, while they
> > > > only have CAP_BPF and other required CAPs.
> > > >
> > > > Another possible solution is that we run an agent in the host, and the
> > > > user in the container who wants to get the bpf objects info in his
> > > > container should send a request to this agent via unix domain socket.
> > > > That is what we are doing now in our production environment. That
> > > > said, each container has to run a client to get the bpf object fd.
> > >
> > > None of such hacks are necessary. People that debug bpf setups with bpftool
> > > can always sudo.
> > >
> > > > There are some downsides,
> > > > - It can't handle pinned bpf programs
> > > >   For pinned programs, the user can get them from the pinned files
> > > >   directly, so he can use bpftool in his case, only with some
> > > >   complaints.
> > > > - If the user attached the bpf prog, and then removed the pinned
> > > >   file, but didn't detach it.
> > > >   That happened. But this error case can't be handled.
> > > > - There may be other corner cases that it can't fit.
> > > >
> > > > There's a solution to improve it, but we also need to change the
> > > > kernel. That is, we can use the wasted space btf->name.
> > > >
> > > > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > > > index b7e5a55..59d73a3 100644
> > > > --- a/kernel/bpf/btf.c
> > > > +++ b/kernel/bpf/btf.c
> > > > @@ -5542,6 +5542,8 @@ static struct btf *btf_parse(bpfptr_t btf_data, u32 btf_data_size,
> > > >                 err = -ENOMEM;
> > > >                 goto errout;
> > > >         }
> > > > +       snprintf(btf->name, sizeof(btf->name), "%s-%d-%d", current->comm,
> > > > +                       current->pid, cgroup_id(task_cgroup(p, cpu_cgrp_id)));
> > >
> > > Unnecessary.
> > > comm, pid, cgroup can be printed by bpftool without changing the kernel.
> >
> > Some questions,
> > - What if the process exits after attaching the bpf prog and the prog
> >   is not auto-detachable?
> >   For example, the reuseport bpf prog is not auto-detachable. After
> >   pinning the reuseport bpf prog, a task can attach it through the pinned
> >   bpf file, but if the task forgets to detach it and the pinned file is
> >   removed, then it seems there's no way to figure out which task or
> >   cgroup this prog belongs to...
>
> you're saying that there is a bpf prog in the kernel without
> corresponding user space ?

No, it does correspond to user space. For example, it may correspond
to a socket fd, or a cgroup fd.

> Meaning no user space process has an FD
> that points to this prog or FD to a map that this prog is using?
> In such a case this is truly kernel bpf prog. It doesn't belong to cgroup.
>

Even if it is a kernel bpf prog, it was created by a process. The user
needs to know which one created it.

> > - Could you pls. explain in detail how to get comm, pid, or cgroup
> >   from a pinned bpffs file?
>
> pinned bpf prog and no user space holds FD to it?
> It's not part of any cgroup. Nothing to print.

As I explained above, even if nothing holds it, the user needs to know
where it came from. For example, if it is expected, which one
created it?

-- 
Regards
Yafang
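
For reference, a minimal sketch of how the prog-to-process mapping discussed above could be recovered from user space while some process still holds an FD to the prog. It assumes the "prog_id:" field that the kernel prints in /proc/<pid>/fdinfo/<fd> for BPF program FDs; the helper names parse_prog_id() and prog_id_of_fd() are hypothetical, and this is not necessarily how bpftool itself does it. Reading another task's fdinfo still needs suitable privileges, and when no process holds an FD the scan finds nothing, which is exactly the case questioned in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan the text of a /proc/<pid>/fdinfo/<fd> file
 * and return the value of its "prog_id:" line, or -1 if there is none
 * (i.e. the fd does not refer to a BPF program).
 */
long parse_prog_id(const char *fdinfo_text)
{
	const char *line = fdinfo_text;
	unsigned int id;

	assert(fdinfo_text != NULL);

	while (*line) {
		/* The format only matches when the line starts with "prog_id:". */
		if (sscanf(line, "prog_id: %u", &id) == 1)
			return (long)id;
		line = strchr(line, '\n');
		if (!line)
			break;
		line++;	/* step past the newline to the next line */
	}
	return -1;
}

/*
 * Hypothetical helper: read /proc/<pid>/fdinfo/<fd> and return the BPF
 * prog id behind that fd, or -1 if the file cannot be read or the fd is
 * not a BPF program.
 */
long prog_id_of_fd(int pid, int fd)
{
	char path[64], buf[4096];
	size_t n;
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%d/fdinfo/%d", pid, fd);
	f = fopen(path, "r");
	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return parse_prog_id(buf);
}
```

A caller would walk /proc/<pid>/fd for each task, call prog_id_of_fd() on every entry, and pair any match with /proc/<pid>/comm and /proc/<pid>/cgroup to print comm, pid and cgroup next to the prog id.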