From: Hao Luo
Date: Fri, 20 May 2022 10:30:30 -0700
Subject: Re: [PATCH bpf-next v1 3/5] bpf: Introduce cgroup iter
To: Tejun Heo
Cc: Yosry Ahmed, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
    Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh,
    Zefan Li, Johannes Weiner, Shuah Khan, Roman Gushchin, Michal Hocko,
    Stanislav Fomichev, David Rientjes, Greg Thelen, Shakeel Butt,
    Linux Kernel Mailing List, Networking, bpf, Cgroups
References: <20220520012133.1217211-1-yosryahmed@google.com>
    <20220520012133.1217211-4-yosryahmed@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Tejun,

On Fri, May 20, 2022 at 1:11 AM Tejun Heo wrote:
>
> Hello,
>
> On Fri, May 20,
> 2022 at 12:58:52AM -0700, Yosry Ahmed wrote:
> > On Fri, May 20, 2022 at 12:41 AM Tejun Heo wrote:
> > >
> > > On Fri, May 20, 2022 at 01:21:31AM +0000, Yosry Ahmed wrote:
> > > > From: Hao Luo
> > > >
> > > > Introduce a new type of iter prog: cgroup. Unlike other bpf_iter, this
> > > > iter doesn't iterate a set of kernel objects. Instead, it is supposed to
> > > > be parameterized by a cgroup id and prints only that cgroup. So one
> > > > needs to specify a target cgroup id when attaching this iter. The target
> > > > cgroup's state can be read out via a link of this iter.
> > > >
> > > > Signed-off-by: Hao Luo
> > > > Signed-off-by: Yosry Ahmed
> > >
> > > This could be me not understanding why it's structured this way but it keeps
> > > bothering me that this is adding a cgroup iterator which doesn't iterate
> > > cgroups. If all that's needed is extracting information from a specific
> > > cgroup, why does this need to be an iterator? e.g. why can't I use
> > > BPF_PROG_TEST_RUN which looks up the cgroup with the provided ID, flushes
> > > rstat, retrieves whatever information necessary and returns that as the
> > > result?
> >
> > I will let Hao and Yonghong reply here as they have a lot more
> > context, and they had previous discussions about cgroup_iter. I just
> > want to say that exposing the stats in a file is extremely convenient
> > for userspace apps. It becomes very similar to reading stats from
> > cgroupfs. It also makes migrating cgroup stats that we have
> > implemented in the kernel to BPF a lot easier.
>
> So, if it were up to me, I'd rather direct energy towards making retrieving
> information through TEST_RUN_PROG easier rather than clinging to making
> kernel output text. I get that text interface is familiar but it kinda
> sucks in many ways.
>

Tejun, could you explain more about the downside of text interfaces and
why TEST_RUN_PROG would address the problems in text output?
From the discussion we had last time, I understand that your concern was
the unstable interface if we introduce bpf files in cgroupfs, so we are
moving toward replicating the directory structure in bpffs. But I am not
sure about the issue of text format output.

> > AFAIK there are also discussions about using overlayfs to have links
> > to the bpffs files in cgroupfs, which makes it even better. So I would
> > really prefer keeping the approach we have here of reading stats
> > through a file from userspace. As for how we go about this (and why a
> > cgroup iterator doesn't iterate cgroups) I will leave this for Hao and
> > Yonghong to explain the rationale behind it. Ideally we can keep the
> > same functionality under a more descriptive name/type.
>
> My answer would be the same here. You guys seem dead set on making the
> kernel emulate cgroup1. I'm not gonna explicitly block that but would
> strongly suggest having a longer term view.
>

The reason why Yosry and I are still pushing in this direction is that
our user space applications rely heavily on extracting information from
text output for cgroups. Please understand that migrating them from the
traditional model to a new model is a bigger pain. But I agree that if
we have a better, concrete solution (for example, maybe TEST_RUN_PROG)
to convince them and help them migrate, I really would love to
contribute and work on it.

> If you *must* do the iterator, can you at least make it a proper iterator
> which supports seeking? AFAICS there's nothing fundamentally preventing bpf
> iterators from supporting seeking. Or is it that you need something which is
> pinned to a cgroup so that you can emulate the directory structure?
>

Yonghong may comment on adding seek for bpf_iter. I would love to
contribute if we are in need of that. Right now, we don't have a use
case that needs seek for bpf_iter, I think. My thought: for cgroups, we
can seek using cgroup id.
Maybe not all kernel objects are indexable, so seeking doesn't apply
there?

Hao
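
To make the seek-by-id idea above concrete, here is a minimal userspace sketch in Python. It is only an illustration of "seek using an id over indexable objects"; it does not model the real bpf_iter or cgroup kernel interfaces, and every name in it (IdSeekableIter, seek, read_one) is hypothetical.

```python
from bisect import bisect_left


class IdSeekableIter:
    """Toy iterator over objects indexed by a sortable integer id.

    Hypothetical illustration only: it does not model the real bpf_iter
    or cgroup kernel APIs.
    """

    def __init__(self, objects):
        # objects: mapping of id -> payload (e.g. a per-cgroup stats string)
        self._objects = dict(objects)
        self._ids = sorted(self._objects)
        self._pos = 0

    def seek(self, target_id):
        # Position the iterator at the first object whose id is >= target_id.
        self._pos = bisect_left(self._ids, target_id)

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self._ids):
            raise StopIteration
        obj_id = self._ids[self._pos]
        self._pos += 1
        return obj_id, self._objects[obj_id]


def read_one(objects, target_id):
    """Seek to target_id and return its payload, or None if it is absent."""
    it = IdSeekableIter(objects)
    it.seek(target_id)
    for obj_id, payload in it:
        # Only the first element after the seek is inspected.
        return payload if obj_id == target_id else None
    return None
```

In this sketch, reading a single object is just a seek followed by one step, and a full walk is the same iterator without the seek. It only works because the ids are sortable keys; objects without such an index would need a different scheme, which is the open question raised above.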