From: Alexei Starovoitov
Date: Sat, 5 Mar 2022 15:47:17 -0800
Subject: Re: [PATCH bpf-next v1 1/9] bpf: Add mkdir, rmdir, unlink syscalls for prog_bpf_syscall
To: Hao Luo
Cc: Kumar Kartikeya Dwivedi, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Shakeel Butt, Joe Burton, Tejun Heo, Josh Don, Stanislav Fomichev, bpf, LKML
References: <20220225234339.2386398-1-haoluo@google.com> <20220225234339.2386398-2-haoluo@google.com> <20220227051821.fwrmeu7r6bab6tio@apollo.legion> <20220302193411.ieooguqoa6tpraoe@ast-mbp.dhcp.thefacebook.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Mar 4, 2022 at 10:37 AM Hao Luo wrote:
>
> I gave this question more thought.
> We don't need to bind mount the top
> bpffs into the container, instead, we may be able to overlay a bpffs
> directory into the container. Here is the workflow in my mind:

I don't quite follow what you mean by 'overlay' here.
Do you mean another bpffs mount, or a future overlayfs that supports bpffs?

> For each job, let's say A, the container runtime can create a
> directory in bpffs, for example
>
>   /sys/fs/bpf/jobs/A
>
> and then create the cgroup for A. The sleepable tracing prog will
> create the file:
>
>   /sys/fs/bpf/jobs/A/100/stats
>
> 100 is the created cgroup's id. Then the container runtime overlays
> the bpffs directory into container A in the same path:

Why the cgroup id? Wouldn't it be easier to use the same cgroup name
as in cgroupfs?

>   [A's container path]/sys/fs/bpf/jobs/A.
>
> A can see the stats at the path within its mount ns:
>
>   /sys/fs/bpf/jobs/A/100/stats
>
> When A creates cgroup, it is able to write to the top layer of the
> overlayed directory. So it is
>
>   /sys/fs/bpf/jobs/A/101/stats
>
> Some of my thoughts:
> 1. Compared to bind mount top bpffs into container, overlaying a
> directory avoids exposing other jobs' stats. This gives better
> isolation. I already have a patch for supporting laying bpffs over
> other fs, it's not too hard.

So it's an overlayfs combination of bpffs and something like ext4, right?
I thought you found out that overlayfs has to be the upper fs and
the lower fs shouldn't be modified underneath.
So if bpffs is a lower fs, the writes into it should go
through the upper overlayfs, right?

> 2. Once the container runtime has overlayed directory into the
> container, it has no need to create more cgroups for this job. It
> doesn't need to track the stats of job-created cgroups, which are
> mainly for inspection by the job itself. Even if it needs to collect
> the stats from those cgroups, it can read from the path in the
> container.
> 3. The overlay path in container doesn't have to be exactly the same
> as the path in root mount ns.
> In the sleepable tracing prog, we may
> select paths based on current process's ns. If we choose to do this,
> we can further avoid exposing cgroup id and job name to the container.

The benefits make sense.
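
To make the proposed layout concrete, here is a minimal sketch of the
directory scheme discussed above. It is only an illustration: an ordinary
temporary directory stands in for /sys/fs/bpf, no bpffs mount or overlay is
involved, and the helper names (stats_path, create_stats, visible_to_job)
are hypothetical, not kernel or libbpf APIs. The cgroup_key argument can be
a cgroup id (as in the proposal) or a cgroup name (as suggested in the
reply); the layout is the same either way.

```python
import tempfile
from pathlib import Path
from typing import List


def stats_path(bpffs_root: Path, job: str, cgroup_key: str) -> Path:
    """Build <root>/jobs/<job>/<cgroup_key>/stats (hypothetical helper)."""
    return bpffs_root / "jobs" / job / cgroup_key / "stats"


def create_stats(bpffs_root: Path, job: str, cgroup_key: str) -> Path:
    """Stand-in for the sleepable tracing prog creating the stats file."""
    path = stats_path(bpffs_root, job, cgroup_key)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.touch()
    return path


def visible_to_job(bpffs_root: Path, job: str) -> List[str]:
    """List stats files under jobs/<job> only, relative to that directory.

    This models exposing just the job's directory to the container
    instead of the whole tree.
    """
    job_dir = bpffs_root / "jobs" / job
    return sorted(p.relative_to(job_dir).as_posix() for p in job_dir.rglob("stats"))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)  # ordinary directory standing in for /sys/fs/bpf
    create_stats(root, "A", "100")  # cgroup created by the container runtime
    create_stats(root, "A", "101")  # cgroup created later by job A itself
    create_stats(root, "B", "200")  # another job's cgroup
    view_a = visible_to_job(root, "A")
    print(view_a)
```

With only jobs/A exposed, job A sees its own 100/stats and 101/stats but
nothing under jobs/B, which is the isolation argument in point 1.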