From: Yosry Ahmed
Date: Thu, 11 Aug 2022 07:09:31 -0700
Subject: Re: [PATCH bpf-next v7 4/8] bpf: Introduce cgroup iter
To: Hao Luo
Cc: Alexei Starovoitov, Andrii Nakryiko, Linux Kernel Mailing List, bpf,
    Cgroups, Networking, Daniel Borkmann, Martin KaFai Lau, Song Liu,
    Yonghong Song, Tejun Heo, Zefan Li, KP Singh, Johannes Weiner,
    Michal Hocko, Benjamin Tissoires, John Fastabend, Michal Koutny,
    Roman Gushchin, David Rientjes, Stanislav Fomichev, Shakeel Butt
References: <20220805214821.1058337-1-haoluo@google.com>
    <20220805214821.1058337-5-haoluo@google.com>
    <20220809162325.hwgvys5n3rivuz7a@MacBook-Pro-3.local.dhcp.thefacebook.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Aug 10, 2022 at 8:10 PM Hao Luo wrote:
>
> On Tue, Aug 9, 2022 at 11:38 AM Hao Luo wrote:
> >
> > On Tue, Aug 9, 2022 at 9:23 AM Alexei Starovoitov wrote:
> > >
> > > On Mon, Aug 08, 2022 at 05:56:57PM -0700, Hao Luo wrote:
> > > > On Mon, Aug 8, 2022 at 5:19 PM Andrii Nakryiko wrote:
> > > > >
> > > > > On Fri, Aug 5, 2022 at 2:49 PM Hao Luo wrote:
> > > > > >
> > > > > > Cgroup_iter is a type of bpf_iter. It walks over cgroups in four modes:
> > > > > >
> > > > > > - walking a cgroup's descendants in pre-order.
> > > > > > - walking a cgroup's descendants in post-order.
> > > > > > - walking a cgroup's ancestors.
> > > > > > - process only the given cgroup.
> > > >
> > > > [...]
> > > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > > > > index 59a217ca2dfd..4d758b2e70d6 100644
> > > > > > --- a/include/uapi/linux/bpf.h
> > > > > > +++ b/include/uapi/linux/bpf.h
> > > > > > @@ -87,10 +87,37 @@ struct bpf_cgroup_storage_key {
> > > > > >  	__u32	attach_type;	/* program attach type (enum bpf_attach_type) */
> > > > > >  };
> > > > > >
> > > > > > +enum bpf_iter_order {
> > > > > > +	BPF_ITER_ORDER_DEFAULT = 0,	/* default order. */
> > > > >
> > > > > why is this default order necessary? It just adds confusion (I had to
> > > > > look up source code to know what is default order). I might have
> > > > > missed some discussion, so if there is some very good reason, then
> > > > > please document this in commit message. But I'd rather not do some
> > > > > magical default order instead. We can set 0 to mean invalid and error
> > > > > out, or just do SELF as the very first value (and if user forgot to
> > > > > specify more fancy mode, they hopefully will quickly discover this in
> > > > > their testing).
> > > > >
> > > >
> > > > PRE/POST/UP are tree-specific orders.
> > > > SELF applies on all iters and
> > > > yields only a single object. How does task_iter express a non-self
> > > > order? By non-self, I mean something like "I don't care about the
> > > > order, just scan _all_ the objects". And this "don't care" order, IMO,
> > > > may be the common case. I don't think everyone cares about walking
> > > > order for tasks. The DEFAULT is intentionally put at the first value,
> > > > so that if users don't care about order, they don't have to specify
> > > > this field.
> > > >
> > > > If that sounds valid, maybe using "UNSPEC" instead of "DEFAULT" is better?
> > >
> > > I agree with Andrii.
> > > This:
> > > +	if (order == BPF_ITER_ORDER_DEFAULT)
> > > +		order = BPF_ITER_DESCENDANTS_PRE;
> > >
> > > looks like an arbitrary choice.
> > > imo
> > > BPF_ITER_DESCENDANTS_PRE = 0,
> > > would have been more obvious. No need to dig into definition of "default".
> > >
> > > UNSPEC = 0
> > > is fine too if we want user to always be conscious about the order
> > > and the kernel will error if that field is not initialized.
> > > That would be my preference, since it will match the rest of uapi/bpf.h
> > >
> >
> > Sounds good. In the next version, will use
> >
> > enum bpf_iter_order {
> > 	BPF_ITER_ORDER_UNSPEC = 0,
> > 	BPF_ITER_SELF_ONLY,		/* process only a single object. */
> > 	BPF_ITER_DESCENDANTS_PRE,	/* walk descendants in pre-order. */
> > 	BPF_ITER_DESCENDANTS_POST,	/* walk descendants in post-order. */
> > 	BPF_ITER_ANCESTORS_UP,		/* walk ancestors upward. */
> > };
> >
>
> Sigh, I find that having UNSPEC=0 and erroring out when seeing UNSPEC
> doesn't work. Basically, if we have a non-iter prog and a cgroup_iter
> prog written in the same source file, I can't use
> bpf_object__attach_skeleton to attach them. Because the default
> prog_attach_fn for iter initializes `order` to 0 (that is, UNSPEC),
> which is going to be rejected by the kernel.
> In order to make
> bpf_object__attach_skeleton work on cgroup_iter, I think I need to use
> the following
>
> enum bpf_iter_order {
> 	BPF_ITER_DESCENDANTS_PRE,	/* walk descendants in pre-order. */
> 	BPF_ITER_DESCENDANTS_POST,	/* walk descendants in post-order. */
> 	BPF_ITER_ANCESTORS_UP,		/* walk ancestors upward. */
> 	BPF_ITER_SELF_ONLY,		/* process only a single object. */
> };
>
> So that when calling bpf_object__attach_skeleton() on cgroup_iter, a
> link can be generated and the generated link defaults to pre-order
> walk on the whole hierarchy. Is there a better solution?
>

I think this can be handled by userspace? We can attach the
cgroup_iter separately first (and maybe we will need to set prog->link
as well) so that bpf_object__attach_skeleton() doesn't try to attach
it? I am following this pattern in the selftest in the final patch,
although I think I might be missing setting prog->link, so I am
wondering why there are no issues in that selftest which has the same
scenario that you are talking about.

I think such a pattern will need to be used anyway if the users need
to set any non-default arguments for the cgroup_iter prog (like the
selftest), right? The only case we are discussing here is the case
where the user wants to attach the cgroup_iter with all default
options (in which case the default order will fail).

I agree that it might be inconvenient if the default/uninitialized
options don't work for cgroup_iter, but Alexei pointed out that this
matches other bpf uapis.

My concern is that in the future we try to reuse enum bpf_iter_order
to set ordering for other iterators, and then the
default/uninitialized value (BPF_ITER_DESCENDANTS_PRE) doesn't make
sense for that iterator (e.g. not a tree). In this case, the same
problem that we are avoiding for cgroup_iter here will show up for
that iterator, and we can't easily change it at this point because
it's uapi.
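
To make the trade-off between the two enum layouts concrete, here is a
minimal standalone sketch in plain C. It is only a model, not the kernel
or libbpf code: the A_/B_ enum names, struct model_link_info,
check_order_a(), check_order_b() and default_link_info() are all
hypothetical names made up for illustration. Layout A mirrors the
UNSPEC = 0 proposal, layout B mirrors the PRE-first proposal, and
default_link_info() stands in for an attach path that leaves the
iterator-specific fields zeroed.

```c
#include <errno.h>
#include <string.h>

/* Layout A (hypothetical names): 0 is reserved and must be rejected. */
enum order_a {
	A_ORDER_UNSPEC = 0,
	A_SELF_ONLY,
	A_DESCENDANTS_PRE,
	A_DESCENDANTS_POST,
	A_ANCESTORS_UP,
};

/* Layout B (hypothetical names): pre-order walk is the zero value. */
enum order_b {
	B_DESCENDANTS_PRE = 0,
	B_DESCENDANTS_POST,
	B_ANCESTORS_UP,
	B_SELF_ONLY,
};

/* Hypothetical stand-in for the iterator link info passed at attach. */
struct model_link_info {
	unsigned int order;
};

/* Layout A check: accept only the explicitly listed orders. */
static int check_order_a(const struct model_link_info *info)
{
	switch (info->order) {
	case A_SELF_ONLY:
	case A_DESCENDANTS_PRE:
	case A_DESCENDANTS_POST:
	case A_ANCESTORS_UP:
		return 0;
	default:
		/* Covers A_ORDER_UNSPEC and any out-of-range value. */
		return -EINVAL;
	}
}

/* Layout B check: every value up to the last enumerator is valid. */
static int check_order_b(const struct model_link_info *info)
{
	return info->order <= B_SELF_ONLY ? 0 : -EINVAL;
}

/* Models a generic attach path that sets no iterator-specific fields. */
static struct model_link_info default_link_info(void)
{
	struct model_link_info info;

	memset(&info, 0, sizeof(info));
	return info;
}
```

With a zeroed info from default_link_info(), check_order_a() returns
-EINVAL while check_order_b() returns 0. That is the difference being
discussed: under layout A the user has to fill in an order explicitly
before attaching, while under layout B the zero value silently means
"pre-order walk", which may not be meaningful for a non-tree iterator.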

> > and explicitly list the values acceptable by cgroup_iter, error out if
> > UNSPEC is detected.
> >
> > Also, following Andrii's comments, will change BPF_ITER_SELF to
> > BPF_ITER_SELF_ONLY, which does seem a little bit explicit in
> > comparison.
> >
> > > I applied the first 3 patches to ease respin.
> >
> > Thanks! This helps!
> >
> > > Thanks!