Received: by 2002:ad5:4acb:0:0:0:0:0 with SMTP id n11csp3867517imw; Mon, 11 Jul 2022 18:15:48 -0700 (PDT) X-Google-Smtp-Source: AGRyM1u4aL9aIhBiP7GlR7tmmp6Y0AwimJc5JlEOj1xcVZiWBEOSCuh4jhRY9FPq2isGCrq5LMgB X-Received: by 2002:a17:907:ea5:b0:726:2c1c:312f with SMTP id ho37-20020a1709070ea500b007262c1c312fmr21161968ejc.248.1657588548350; Mon, 11 Jul 2022 18:15:48 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1657588548; cv=none; d=google.com; s=arc-20160816; b=CRgzf3kpmqScAeP7Y1uRG+dCyYMwaiaKS2l4InenwLwRBxfQsLMbYaJsG19uKdcocp b/kCS4DNTQYBe42wm/76fTi+E3n308n0vJVx9bKnDy3594qsfGz/aL0Rufbx6QrYNl5J /6eA2+81DXkK9vKuJ/KD+Vm4W684TQb7xybpnKEPEEYIXkeWx5pCIg2ENYUjNvDdHQmd xfrKTsk2MIi7IaVASuyah16X5xz2bpaPgeOuuAbTqqWwkWb5uEOOgPu1pji585Mhe5z2 V+xIUr2KCz5CBX5Urg6FymkzbreGsZTiVTnYLVeWD5xo0J4I+00GGA4TWrcODxML9ERJ MXGA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=68mkDNBCkeJpX+q116C9yWsW5XjBLSxGi+xt71TSQ28=; b=ZMlJhQR00oki5pYyRXi0pbTn+ik7CeIs6wQEf555R4ThV1wEvzE/WZWIzEJ323NpEh 1NuOxZGLkoiE4z2h5Xw6wZXJOenCMYMxma40WyMYnQqFibLoJ8rFSS4R+BEE16z5L7Vk dtRnVslwrZQlPH2PTp0cq1t1eyWXRe7erLftX+QnfkVsd8finUhkLlYHwG1++lvC+Sdk NyfrsIjykz/cVROu6cFAvmizhvzh5/suK8rGvfYw6XunpmbjKr2CMeBvTZXxbKylymab Dzae9pjcv3vv+p0LFdqqpfk71dnA3iK7GC2+fYaMO4MvY9joke5tZLY8j48lE9xhP95Y iHBQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=pXcUEGlI; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id js19-20020a17090797d300b00722ea4dd14csi13420619ejc.387.2022.07.11.18.15.24; Mon, 11 Jul 2022 18:15:48 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=pXcUEGlI; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229515AbiGLAma (ORCPT + 99 others); Mon, 11 Jul 2022 20:42:30 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46004 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229764AbiGLAm2 (ORCPT ); Mon, 11 Jul 2022 20:42:28 -0400 Received: from mail-wr1-x42f.google.com (mail-wr1-x42f.google.com [IPv6:2a00:1450:4864:20::42f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9D78F2DA9B for ; Mon, 11 Jul 2022 17:42:27 -0700 (PDT) Received: by mail-wr1-x42f.google.com with SMTP id h17so9105107wrx.0 for ; Mon, 11 Jul 2022 17:42:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=68mkDNBCkeJpX+q116C9yWsW5XjBLSxGi+xt71TSQ28=; b=pXcUEGlIhOHDWFHvDN9rRDLyJeDZML22bbC8eiEGLCy/l4q3ub47IwesxJKljkqfb8 nHo/r/cp9kyG8mZVZTNOAx4hzS3GaEjcvicCDoe02fE9FolqPAKFuY8Yz4hOx4lEEI+B vs5VuZWfB+WFXaL8jLSq4cjzTVtlNJI1bco7XI+5d7o+6Nrc35e3zlx/La0iy33o4aQB s4CD0WymQMmITSwJOIiWAIaFjB08hdaj5K8yBVWYxR1oYSeGBMLeY66fr2CPeR5SUtQz FqjKBS1ff2f/3D3pFF/1j0RLdvML2qQgm8bFuhL1eit5NdIkvBO5vNiChDWEJL0ialM5 ooNA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=68mkDNBCkeJpX+q116C9yWsW5XjBLSxGi+xt71TSQ28=; b=V9HP3xFAMBh99H38QTAlRtoj7AqSlrcqYC6yfTvOWKnzsT7Yi8P0b5OCDXk16uOKAu ijyyzjQqFPDjGiWHtlsUXgEG/I/UbpyaZZ+FGWA8ThPyCOL6aK6v++3OJTvBeELvH6Mr ey+CkttX11cFISi8i9Jq8eAtEy7/w65xppC7Ah0BaVbradWagn+IVSd7UatNETlwfmUl cNHGvaBUJQgvjkf8ik6eyXI++ztxXwT911PBM7JpuJPrPaZVhpRs30hCgBXX5NMm9XJI hWW1OtFzOd9vdy8cms701sieCfyo9ToCvyLABrGvpDDC8a3EmuZtnR8tPu08M100r5ME RumA== X-Gm-Message-State: AJIora8COg4d8oliOb8anTo/e45A1LwxM3gHV2w2PGYefIuaDlgFmfX6 g/gOTc//evGMuhQoELjGzRh7sra5Fv0B5PMgSRgS1A== X-Received: by 2002:a5d:59a8:0:b0:21d:8a9d:732b with SMTP id p8-20020a5d59a8000000b0021d8a9d732bmr19203014wrr.28.1657586546036; Mon, 11 Jul 2022 17:42:26 -0700 (PDT) MIME-Version: 1.0 References: <20220709000439.243271-1-yosryahmed@google.com> <20220709000439.243271-5-yosryahmed@google.com> <370cb480-a427-4d93-37d9-3c6acd73b967@fb.com> In-Reply-To: From: Hao Luo Date: Mon, 11 Jul 2022 17:42:14 -0700 Message-ID: Subject: Re: [PATCH bpf-next v3 4/8] bpf: Introduce cgroup iter To: Yonghong Song Cc: Yosry Ahmed , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Tejun Heo , Zefan Li , Johannes Weiner , Shuah Khan , Michal Hocko , KP Singh , Benjamin Tissoires , John Fastabend , =?UTF-8?Q?Michal_Koutn=C3=BD?= , Roman Gushchin , David Rientjes , Stanislav Fomichev , Greg Thelen , Shakeel Butt , linux-kernel@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org, cgroups@vger.kernel.org Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-17.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF, ENV_AND_HDR_SPF_MATCH,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS, T_SCC_BODY_TEXT_LINE,USER_IN_DEF_DKIM_WL,USER_IN_DEF_SPF_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Jul 11, 2022 at 4:20 PM Yonghong Song wrote: > > On 7/10/22 5:19 PM, Yonghong Song wrote: > > > > [...] > >> + > >> union bpf_iter_link_info { > >> struct { > >> __u32 map_fd; > >> } map; > >> + > >> + /* cgroup_iter walks either the live descendants of a cgroup > >> subtree, or the ancestors > >> + * of a given cgroup. > >> + */ > >> + struct { > >> + /* Cgroup file descriptor. This is root of the subtree if for > >> walking the > >> + * descendants; this is the starting cgroup if for walking > >> the ancestors. > > > > Adding comment that cgroup_fd 0 means starting from root cgroup? Sure. > > Also, if I understand correctly, cgroup v1 is also supported here, > > right? If this is the case, for cgroup v1 which root cgroup will be > > used for cgroup_fd? It would be good to clarify here too. > > IMO, the case of cgroup_fd = 0 combined with cgroup v1 should return errors. It's an invalid case. If anyone wants to use cgroup_iter on cgroup v1 hierarchy, they could explicitly open the subsystems' root directory and pass the fd. With that said, Yosry and I will test and confirm the behavior in this situation and clarify in the comment. Thanks for pointing this out. > >> + */ > >> + __u32 cgroup_fd; > >> + __u32 traversal_order; > >> + } cgroup; > >> }; > >> /* BPF syscall commands, see bpf(2) man-page for more details. */ > >> @@ -6134,6 +6151,10 @@ struct bpf_link_info { > >> struct { > >> __u32 map_id; > >> } map; > >> + struct { > >> + __u32 traversal_order; > >> + __aligned_u64 cgroup_id; > >> + } cgroup; > > > > We actually has a problem here although I don't have a solution yet. > > [...] > > > > There is a 4 byte hole after member 'target_name_len'. So map_id will > > have a offset 16 from the start of structure 'iter'. > > > > > > This will break uapi. We probably won't be able to change the existing > > uapi with adding a ':32' after member 'target_name_len'. I don't have > > a good solution yet, but any suggestion is welcome. > > > > Also, for '__aligned_u64 cgroup_id', '__u64 cgroup_id' is enough. > > '__aligned_u64' mostly used for pointers. > > Briefly discussed with Alexei, the following structure iter definition > should work. Later on, if we need to addition fields for other iter's, > for a single __u32, the field can be added to either the first or the > second union. If fields are more than __u32, they can be placed > in the second union. > > struct { > __aligned_u64 target_name; /* in/out: > target_name buffer ptr */ > __u32 target_name_len; /* in/out: > target_name buffer len */ > union { > struct { > __u32 map_id; > } map; > }; > union { > struct { > __u64 cgroup_id; > __u32 traversal_order; > } cgroup; > }; > } iter; > Thanks Yonghong for seeking the solution here. The solution looks good. I'm going to put your heads-up as comments there. One thing I'd like to confirm, when we query bpf_link_info for cgroup iter, do we also need to zero those fields for map_elem? > > > > > > >> }; > >> } iter; > >> struct { [...] > >> + > >> +static void *cgroup_iter_seq_start(struct seq_file *seq, loff_t *pos) > >> +{ > >> + struct cgroup_iter_priv *p = seq->private; > >> + > >> + mutex_lock(&cgroup_mutex); > >> + > >> + /* support only one session */ > >> + if (*pos > 0) > >> + return NULL; > > > > This might be okay. But want to check what is > > the practical upper limit for cgroups in a system > > and whether we may miss some cgroups. If this > > happens, it will be a surprise to the user. > > Ok. What's the max number of items supported in a single session? > >> + > >> + ++*pos; > >> + p->terminate = false; > >> + if (p->order == BPF_ITER_CGROUP_PRE) > >> + return css_next_descendant_pre(NULL, p->start_css); > >> + else if (p->order == BPF_ITER_CGROUP_POST) > >> + return css_next_descendant_post(NULL, p->start_css); > >> + else /* BPF_ITER_CGROUP_PARENT_UP */ > >> + return p->start_css; > >> +} > >> + > >> +static int __cgroup_iter_seq_show(struct seq_file *seq, > >> + struct cgroup_subsys_state *css, int in_stop); > >> + > >> +static void cgroup_iter_seq_stop(struct seq_file *seq, void *v) > >> +{ > >> + /* pass NULL to the prog for post-processing */ > >> + if (!v) > >> + __cgroup_iter_seq_show(seq, NULL, true); > >> + mutex_unlock(&cgroup_mutex); > >> +} > >> + > > [...]