From: Yosry Ahmed
Date: Fri, 20 May 2022 14:18:42 -0700
Subject: Re: [PATCH bpf-next v1 3/5] bpf: Introduce cgroup iter
To: Hao Luo
Cc: Tejun Heo, Yonghong Song, Alexei Starovoitov, Daniel Borkmann,
    Andrii Nakryiko, Martin KaFai Lau, Song Liu, John Fastabend,
    KP Singh, Zefan Li, Johannes Weiner, Shuah Khan, Roman Gushchin,
    Michal Hocko, Stanislav Fomichev, David Rientjes, Greg Thelen,
    Shakeel Butt, Linux Kernel Mailing List, Networking, bpf, Cgroups
References: <20220520012133.1217211-1-yosryahmed@google.com>
    <20220520012133.1217211-4-yosryahmed@google.com>
    <73fd9853-5dab-8b59-24a0-74c0a6cae88e@fb.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, May 20, 2022 at 12:43 PM Hao Luo wrote:
>
> Hi Tejun and Yonghong,
>
> On Fri, May 20, 2022 at 9:45 AM Tejun Heo wrote:
> > On Fri, May 20, 2022 at 09:29:43AM -0700, Yonghong Song wrote:
> > > Maybe you can have a bpf program signature like below:
> > >
> > > int BPF_PROG(dump_vmscan, struct bpf_iter_meta *meta, struct cgroup *cgrp,
> > >              struct cgroup *parent_cgrp)
> > >
> > > parent_cgrp is NULL when cgrp is the root cgroup.
> > >
> > > I would like the bpf program should send the following information to
> > > user space:
> > >
> >
> > I don't think parent cgroup dir name would be sufficient to reconstruct the
> > path given that multiple cgroups in different subtrees can have the same
> > name. For live cgroups, userspace can find the path from id (or ino) without
> > traversing anything by constructing the fhandle, open it open_by_handle_at()
> > and then reading /proc/self/fd/$FD symlink -
> > https://lkml.org/lkml/2020/12/2/1126. This isn't available for dead cgroups
> > but I'm not sure how much that'd matter given that they aren't visible from
> > userspace anyway.
> >
>
> Sending cgroup id is better than cgroup dir name, also because IIUC
> the path obtained from cgroup id depends on the namespace of the
> userspace process. So if the dump file may be potentially read by
> processes within a container, it's better to have the output
> namespaced IMO.
>
> > >
> > > This way, user space can easily construct the cgroup hierarchy stat like
> > >                     cpu   mem   cpu pressure   mem pressure ...
> > >   cgroup1           ...
> > >     child1          ...
> > >       grandchild1   ...
> > >     child2          ...
> > >   cgroup 2          ...
> > >     child 3         ...
> > >   ...               ...
> > >
> > > the bpf iterator can have additional parameter like
> > > cgroup_id = ... to only call bpf program once with that
> > > cgroup_id if specified.
>
> Yep, this should work. We just need to make the cgroup_id parameter
> optional. If it is specified when creating bpf_iter_link, we print for
> that cgroup only. If it is not specified, we iterate over all cgroups.
> If I understand correctly, sounds doable.
>
> > > The kernel part of cgroup_iter can call cgroup_rstat_flush()
> > > before calling cgroup_iter bpf program.
>
> Sounds good to me as well. But my knowledge on rstat_flush is limited.
> Yosry can give this a try.
>
> >
> > Would it work to just pass in @cgrp and provide a group of helpers so that
> > the program can do whatever it wanna do including looking up the full path
> > and passing that to userspace?
> >
>
> My understanding is, yes, doable. If we need the full path information
> of a cgroup, helpers or kfuncs are needed.
>
> The userspace needs to specify the identity of the cgroup, when
> creating bpf_iter. This identity could be cgroup id or fd. This
> identity needs to be converted to cgroup object somewhere before
> passing into bpf program to use.

Let's sum up the discussion here; I feel like we are losing track of
the main problem. IIUC, the main concern is that cgroup_iter is not
really an iterator: it just dumps information for one cgroup. I like
the suggestion to make it iterate over all cgroups by default, with an
optional cgroup_id parameter that makes it "iterate" over only that one
cgroup.

IIUC, this cgroup_id parameter would be a link parameter, similar to
the current approach. Basically, we extend the current patch so that if
cgroup_id is not specified, the iterator gets called for all cgroups
instead of one. This fixes the problem for our use case and also keeps
cgroup_iter generic enough.

Is my understanding correct? If so, I don't see a need to flush rstat
in the kernel on behalf of cgroup_iter progs.
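
To make the proposed behavior concrete, here is a rough userspace toy
model of the dispatch, not kernel code. Everything in it is made up for
illustration: struct toy_cgroup, toy_cgroup_iter(), toy_sum_ids(), and
the convention that id 0 means "cgroup_id not specified". The pre-order
walk is also just an assumption; the thread has not settled on an order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical convention for this sketch only: 0 means "no cgroup_id given". */
#define TOY_CGROUP_ID_UNSET 0

/* Toy stand-in for struct cgroup; this is not the kernel structure. */
struct toy_cgroup {
	uint64_t id;
	struct toy_cgroup *parent;       /* NULL for the root */
	struct toy_cgroup *first_child;
	struct toy_cgroup *next_sibling;
};

/* Stand-in for the attached BPF program. */
typedef void (*toy_prog_fn)(const struct toy_cgroup *cgrp, void *ctx);

/*
 * Pre-order walk of the subtree rooted at @cgrp.
 * If @cgroup_id is unset, @prog is called for every cgroup.
 * Otherwise @prog is called only for the cgroup with that id.
 * Returns the number of times @prog was called.
 */
static size_t toy_cgroup_iter(const struct toy_cgroup *cgrp, uint64_t cgroup_id,
			      toy_prog_fn prog, void *ctx)
{
	const struct toy_cgroup *child;
	size_t calls = 0;

	assert(prog != NULL);
	if (!cgrp)
		return 0;

	if (cgroup_id == TOY_CGROUP_ID_UNSET || cgrp->id == cgroup_id) {
		prog(cgrp, ctx);
		calls++;
		if (cgroup_id != TOY_CGROUP_ID_UNSET)
			return calls;	/* single target found, stop here */
	}

	for (child = cgrp->first_child; child; child = child->next_sibling) {
		calls += toy_cgroup_iter(child, cgroup_id, prog, ctx);
		if (cgroup_id != TOY_CGROUP_ID_UNSET && calls)
			break;		/* target was in this subtree */
	}
	return calls;
}

/* Example "program": add each visited cgroup id to a running sum. */
static void toy_sum_ids(const struct toy_cgroup *cgrp, void *ctx)
{
	*(uint64_t *)ctx += cgrp->id;
}
```

With a four-node tree, leaving the id unset calls the program four
times, while passing a single id calls it once for that cgroup only.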