From: Hao Luo
Date: Fri, 20 May 2022 12:42:54 -0700
Subject: Re: [PATCH bpf-next v1 3/5] bpf: Introduce cgroup iter
To: Tejun Heo
Cc: Yonghong Song, Yosry Ahmed, Alexei Starovoitov, Daniel Borkmann,
    Andrii Nakryiko, Martin KaFai Lau, Song Liu, John Fastabend, KP Singh,
    Zefan Li, Johannes Weiner, Shuah Khan, Roman Gushchin, Michal Hocko,
    Stanislav Fomichev, David Rientjes, Greg Thelen, Shakeel Butt,
    Linux Kernel Mailing List, Networking, bpf, Cgroups
References: <20220520012133.1217211-1-yosryahmed@google.com>
    <20220520012133.1217211-4-yosryahmed@google.com>
    <73fd9853-5dab-8b59-24a0-74c0a6cae88e@fb.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Tejun and Yonghong,

On Fri, May 20, 2022 at 9:45 AM Tejun Heo wrote:
> On Fri, May 20, 2022 at 09:29:43AM -0700, Yonghong Song wrote:
> > Maybe you can have a bpf program signature like below:
> >
> > int BPF_PROG(dump_vmscan, struct bpf_iter_meta *meta, struct cgroup *cgrp,
> >              struct cgroup *parent_cgrp)
> >
> > parent_cgrp is NULL when cgrp is the root cgroup.
> >
> > I would like the bpf program should send the following information to
> > user space:
> >
>
> I don't think parent cgroup dir name would be sufficient to reconstruct the
> path given that multiple cgroups in different subtrees can have the same
> name. For live cgroups, userspace can find the path from id (or ino) without
> traversing anything by constructing the fhandle, open it open_by_handle_at()
> and then reading /proc/self/fd/$FD symlink -
> https://lkml.org/lkml/2020/12/2/1126. This isn't available for dead cgroups
> but I'm not sure how much that'd matter given that they aren't visible from
> userspace anyway.
>

Sending the cgroup id is better than sending the cgroup dir name, also
because, IIUC, the path obtained from a cgroup id depends on the
namespace of the userspace process. So if the dump file may be read by
processes within a container, it's better to have the output namespaced,
IMO.

> >
> >
> > This way, user space can easily construct the cgroup hierarchy stat like
> >                        cpu   mem   cpu pressure   mem pressure ...
> >    cgroup1             ...
> >      child1            ...
> >        grandchild1     ...
> >      child2            ...
> >    cgroup 2            ...
> >      child 3           ...
> >    ...                 ...
> >
> > the bpf iterator can have additional parameter like
> > cgroup_id = ... to only call bpf program once with that
> > cgroup_id if specified.

Yep, this should work. We just need to make the cgroup_id parameter
optional. If it is specified when creating bpf_iter_link, we print for
that cgroup only.
If it is not specified, we iterate over all cgroups. If I understand
correctly, this sounds doable.

> > The kernel part of cgroup_iter can call cgroup_rstat_flush()
> > before calling cgroup_iter bpf program.

Sounds good to me as well, but my knowledge of rstat_flush is limited.
Yosry can give this a try.

>
> Would it work to just pass in @cgrp and provide a group of helpers so that
> the program can do whatever it wanna do including looking up the full path
> and passing that to userspace?
>

My understanding is that yes, this is doable. If we need the full path
information of a cgroup, helpers or kfuncs are needed. Userspace needs
to specify the identity of the cgroup when creating the bpf_iter. This
identity could be a cgroup id or an fd, and it needs to be converted to
a cgroup object somewhere before being passed into the bpf program.
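
To make the optional cgroup_id semantics above concrete, here is a
rough user-space model of what I have in mind. It is not kernel code:
struct node, cgroup_iter_walk() and print_cgroup() are hypothetical
names made up for this sketch, and treating id 0 as "not specified" is
an assumption of the sketch only. The callback stands in for the bpf
program and receives the cgroup together with its parent, as in the
signature Yonghong suggested.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for struct cgroup: only what the sketch needs. */
struct node {
	uint64_t id;
	const char *name;
	struct node *parent;       /* NULL for the root */
	struct node *first_child;
	struct node *next_sibling;
};

/* Stand-in for the bpf program: sees the cgroup and its parent. */
typedef void (*dump_fn)(const struct node *cgrp, const struct node *parent,
			void *ctx);

/*
 * Pre-order walk of the subtree under root. target_id == 0 means "not
 * specified" (assumption of this sketch): call prog for every node.
 * Otherwise call prog once for the matching node and stop.
 * Returns how many times prog was called.
 */
int cgroup_iter_walk(const struct node *root, uint64_t target_id,
		     dump_fn prog, void *ctx)
{
	const struct node *pos = root;
	int calls = 0;

	assert(root != NULL && prog != NULL);

	while (pos) {
		if (target_id == 0 || pos->id == target_id) {
			prog(pos, pos->parent, ctx);
			calls++;
			if (target_id != 0)
				break;
		}
		/* Advance to the next node in pre-order. */
		if (pos->first_child) {
			pos = pos->first_child;
		} else {
			while (pos != root && !pos->next_sibling)
				pos = pos->parent;
			pos = (pos == root) ? NULL : pos->next_sibling;
		}
	}
	return calls;
}

/* Example "program": print one indented line per cgroup to ctx (a FILE *). */
void print_cgroup(const struct node *cgrp, const struct node *parent,
		  void *ctx)
{
	FILE *out = ctx;
	int depth = 0;

	for (const struct node *p = parent; p; p = p->parent)
		depth++;
	fprintf(out, "%*s%s id=%llu parent_id=%llu\n", depth * 2, "",
		cgrp->name, (unsigned long long)cgrp->id,
		(unsigned long long)(parent ? parent->id : 0));
}
```

Calling cgroup_iter_walk(&root, 0, print_cgroup, stdout) dumps the whole
hierarchy, while passing a non-zero id invokes the callback at most once.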
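
For the id-to-path lookup Tejun described (build the fhandle, call
open_by_handle_at(), then read the /proc/self/fd/$FD symlink), a minimal
user-space sketch could look like the following. The function name
cgroup_path_from_id() is made up here. The sketch assumes the 64-bit
cgroup id is the whole handle payload and that the handle type is 0xfe
(FILEID_KERNFS in the kernel's exportfs.h, as far as I know), that
mount_fd is open on the cgroup2 mount point, and that the caller has
the privileges open_by_handle_at() requires (CAP_DAC_READ_SEARCH).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for open_by_handle_at() and struct file_handle */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Assumption: kernfs handle type whose payload is the 64-bit node id. */
#define CGID_HANDLE_TYPE 0xfe

/*
 * Resolve a cgroup id to the path seen by the calling process.
 * mount_fd must be open on the cgroup2 mount point.
 * Returns 0 and fills buf on success, -1 on any failure.
 */
int cgroup_path_from_id(int mount_fd, uint64_t cgid, char *buf, size_t len)
{
	struct file_handle *fh;
	char link[64];
	ssize_t n;
	int fd;

	assert(buf != NULL && len > 0);

	/* The handle is the fixed header followed by the 8-byte id. */
	fh = malloc(sizeof(*fh) + sizeof(cgid));
	if (!fh)
		return -1;
	fh->handle_bytes = sizeof(cgid);
	fh->handle_type = CGID_HANDLE_TYPE;
	memcpy(fh->f_handle, &cgid, sizeof(cgid));

	fd = open_by_handle_at(mount_fd, fh, O_RDONLY);
	free(fh);
	if (fd < 0)
		return -1;

	/* The symlink target is the path in the caller's namespace. */
	snprintf(link, sizeof(link), "/proc/self/fd/%d", fd);
	n = readlink(link, buf, len - 1);
	close(fd);
	if (n < 0)
		return -1;
	buf[n] = '\0';
	return 0;
}
```

Because the path comes from the reader's own /proc/self/fd, the result
follows the namespace of the process doing the lookup, which is the
property I was referring to above.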