From: Alexei Starovoitov
Date: Sat, 5 Feb 2022 20:29:10 -0800
Subject: Re: [PATCH RFC bpf-next v2 5/5] selftests/bpf: test for pinning for cgroup_view link
To: Hao Luo
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Shakeel Butt, Joe Burton, Stanislav Fomichev, bpf, LKML
References: <20220201205534.1962784-1-haoluo@google.com> <20220201205534.1962784-6-haoluo@google.com> <20220203180414.blk6ou3ccmod2qck@ast-mbp.dhcp.thefacebook.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Feb 4, 2022 at 10:27 AM Hao Luo wrote:
> >
> > > In our use case, we can't ask the users who create cgroups to do the
> > > pinning. Pinning requires root privilege.
> > > In our use case, we have
> > > non-root users who can create cgroup directories and still want to
> > > read bpf stats. They can't do pinning by themselves. This is why
> > > inheritance is a requirement for us. With inheritance, they only need
> > > to mkdir in cgroupfs and bpffs (unprivileged operations), no pinning
> > > operation is required. Patch 1-4 are needed to implement inheritance.
> > >
> > > It's also not a good idea in our use case to add a userspace
> > > privileged process to monitor cgroupfs operations and perform the
> > > pinning. It's more complex and has a higher maintenance cost and
> > > runtime overhead, compared to the solution of asking whoever makes
> > > cgroups to mkdir in bpffs. The other problem is: if there are nodes in
> > > the data center that don't have the userspace process deployed, the
> > > stats will be unavailable, which is a no-no for some of our users.
> >
> > The commit log says that there will be a daemon that does that
> > monitoring of cgroupfs. And that daemon needs to mkdir
> > directories in bpffs when a new cgroup is created, no?
> > The kernel is only doing inheritance of bpf progs into
> > new dirs. I think that daemon can pin as well.
> >
> > The cgroup creation is typically managed by an agent like systemd.
> > Sounds like you have your own agent that creates cgroups?
> > If so it has to be privileged and it can mkdir in bpffs and pin too ?
>
> Ah, yes, we have our own daemon to manage cgroups. That daemon creates
> the top-level cgroup for each job to run inside. However, the job can
> create its own cgroups inside the top-level cgroup, for fine grained
> resource control. This doesn't go through the daemon. The job-created
> cgroups don't have the pinned objects and this is a no-no for our
> users.

We can whitelist certain tracepoints to be sleepable and extend
tp_btf prog type to include everything from prog_type_syscall.
Such prog would attach to cgroup_mkdir and cgroup_release
and would call bpf_sys_bpf() helper to pin progs in new bpffs dirs.
We can allow prog_type_syscall to do mkdir in bpffs as well.

This feature could be useful for similar monitoring/introspection tasks.
We can write a program that would monitor bpf prog load/unload
and would pin an iterator prog that would show debug info about a prog.
Like cat /sys/fs/bpf/progs.debug shows a list of loaded progs.
With this feature we can implement:
ls /sys/fs/bpf/all_progs.debug/
and each loaded prog would have a corresponding file.
The file name would be a program name, for example.
cat /sys/fs/bpf/all_progs.debug/my_prog
would pretty print info about 'my_prog' bpf program.

This way the kernfs/cgroupfs specific logic from patches 1-4
will not be necessary.

wdyt?
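
To make the proposed flow concrete, here is a rough userspace simulation
in Python. It is only a sketch of the idea, not BPF code and not an
existing kernel API: two temporary directories stand in for cgroupfs and
bpffs, touching an empty file stands in for the pin that bpf_sys_bpf()
would perform, and the names on_cgroup_mkdir(), on_cgroup_release(),
mirror_path() and the pinned file name "stats" are all made up for
illustration.

```python
import tempfile
from pathlib import Path


def mirror_path(cgroup_dir: Path, cgroup_root: Path, bpffs_root: Path) -> Path:
    """Map a cgroup directory to the (hypothetical) bpffs directory mirroring it."""
    return bpffs_root / cgroup_dir.relative_to(cgroup_root)


def on_cgroup_mkdir(cgroup_dir, cgroup_root, bpffs_root, pin_names):
    """Stand-in for a prog attached to cgroup_mkdir: mkdir in bpffs, then pin."""
    target = mirror_path(cgroup_dir, cgroup_root, bpffs_root)
    target.mkdir(parents=True, exist_ok=True)
    for name in pin_names:
        # Placeholder for pinning a link/prog at this path.
        (target / name).touch()
    return target


def on_cgroup_release(cgroup_dir, cgroup_root, bpffs_root):
    """Stand-in for a prog attached to cgroup_release: drop pins, remove the dir."""
    target = mirror_path(cgroup_dir, cgroup_root, bpffs_root)
    for entry in target.iterdir():
        if entry.is_file():
            entry.unlink()
    target.rmdir()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        cgroup_root = Path(tmp) / "cgroup"
        bpffs_root = Path(tmp) / "bpf"
        cgroup_root.mkdir()
        bpffs_root.mkdir()

        # "job1" plays the daemon-created top-level cgroup, "worker" the
        # cgroup a job creates on its own without going through the daemon.
        top = cgroup_root / "job1"
        child = top / "worker"
        mirrored = []
        for cg in (top, child):
            cg.mkdir()
            mirrored.append(on_cgroup_mkdir(cg, cgroup_root, bpffs_root, ["stats"]))

        pinned = sorted(p.name for p in mirrored[1].iterdir())

        on_cgroup_release(child, cgroup_root, bpffs_root)
        child.rmdir()
        return pinned, mirrored[1].exists()


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is that the job-created cgroup gets its pinned
file without the job or the daemon doing anything: the handler reacts to
the mkdir itself, which is the role the sleepable tp_btf prog would play.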