Received: by 2002:a05:7412:31a9:b0:e2:908c:2ebd with SMTP id et41csp4947431rdb; Fri, 15 Sep 2023 18:39:46 -0700 (PDT) X-Google-Smtp-Source: AGHT+IExvLNjHH13VwOjTcRnE6GivT56cO+83Fg8jBFoOuIDbyLDf73m96jRm+jAlrpAisXMN+NE X-Received: by 2002:a54:4718:0:b0:3a3:6cb2:d5bf with SMTP id k24-20020a544718000000b003a36cb2d5bfmr3597279oik.4.1694828386125; Fri, 15 Sep 2023 18:39:46 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1694828386; cv=none; d=google.com; s=arc-20160816; b=tBrd8d1c6WXgIEcd0cdektIGpw4ZUnXHwKwSvIzdhKljAjFroiXYoog80ukoWiqsZb 4nRWNE8Fs2tpk8tDkKIhTPqdyJkFOtMIuQZd617uHGSGtp7tprSugsgtvNCnWQx/cEQy mhq7JVj2IhSgrKocq9fQizf1yVVkkkOEfXTCSuVMPF96YgLsjEGTFb+ffNOLUWpJ9vBQ WvAg6NSvWT8vo5E8MbpseagVMKmMX5ny9zQwmYEeFshwbTBriScf2nYRCbMt1zX3kG5I 5x7CQ0xvVzqUBvHj1giTyDHCCR4mHs5x2glfupiuerNy1fwREIr5DQJd1G3jYtaASX3j 7kBw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:cc:to:subject :message-id:date:from:in-reply-to:references:mime-version :dkim-signature; bh=LlwYCvNiValFIdZJV91TP0CGm/bPmLsjvUp+RBWRPbI=; fh=luN1T+TZmJdWrv2MwLqOcKf0btUR/zj1LntAQkciNa8=; b=o/2EITvG9O18mG/uEReuv/4C3/OscjnsQlN9vcd4Gt0DFKv3OCgiQA4ASdBG00br3o QWmsnbHVlR7n4eb7R7DAGIxNvmFYr6MFXqHoYUWibisRRhh8tYWadxq2u15rK9xYlo40 TzGRl1kfrIxxlZC1NHGF+C9w6mKy4XseMWltFgACElJ0X8RtFc7nk1V6nvRR4g61nxH8 CR6S5NbYYiRfZ5o1FEiwppKm57V7SB010n5ZusDlkSUfeNGyMAPW6yYzwUUmI7oVAbwB g6AOKAzcq1WlO/mJNzE1bP8R9W+RJ2GaqhDUvQg/Bh9xMVBQYo5JkSmOmf7GeoaIpn57 kctw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20230601 header.b=Ea5YJO7H; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:4 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from howler.vger.email (howler.vger.email. [2620:137:e000::3:4]) by mx.google.com with ESMTPS id ck10-20020a056a00328a00b0068fea8a6169si4051285pfb.404.2023.09.15.18.39.45 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 15 Sep 2023 18:39:46 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:4 as permitted sender) client-ip=2620:137:e000::3:4; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20230601 header.b=Ea5YJO7H; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:4 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by howler.vger.email (Postfix) with ESMTP id C79BC80E226D; Fri, 15 Sep 2023 13:38:41 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at howler.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237376AbjIOUiL (ORCPT + 99 others); Fri, 15 Sep 2023 16:38:11 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:32800 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237312AbjIOUhz (ORCPT ); Fri, 15 Sep 2023 16:37:55 -0400 Received: from mail-ed1-x52a.google.com (mail-ed1-x52a.google.com [IPv6:2a00:1450:4864:20::52a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4F687B2; Fri, 15 Sep 2023 13:37:50 -0700 (PDT) Received: by mail-ed1-x52a.google.com with SMTP id 4fb4d7f45d1cf-52bcb8b199aso3080375a12.3; Fri, 15 Sep 2023 13:37:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1694810269; x=1695415069; darn=vger.kernel.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=LlwYCvNiValFIdZJV91TP0CGm/bPmLsjvUp+RBWRPbI=; b=Ea5YJO7HTY0qAZaAzgzFgo2z1mHeKtHIN1Q4N/3zxx1us++DuwChWakEmOOI/IBw2d jlBGJtL6EKd6kpQhKnZSzaLbhsGrjNguw3C06H8DRjB3LP0lYtcE3/Ae1pydJEQRC5jP 1PL07UWrJzhtC15AS5/Vh+G6+4mRMN1NvsgAovosUSEk4OQJY2qj0X6RC1uaL+1yxGBg XCPJZqvc9VhHWoyepM/Wj1XRPEohIL3cNkusODlJv2OkLu1KLY2j0jnKyr4UiLc2+PqZ qEoDRWVw2GSTOgVev4unR7BcCsvVSyY9TIgiIcPrc+YLRqZnD3FVYRXC0DUeN8o5YpuG nOnQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1694810269; x=1695415069; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=LlwYCvNiValFIdZJV91TP0CGm/bPmLsjvUp+RBWRPbI=; b=gFL3Jqx3TgOrPeanBUsMq/kk6+oOD9QPH7xC+KztRC4lcU7dgi5npUBAS3Y2vbvv1C xLGXwosMjpIQ6/9wV4DONE1Q1PAH2ylkomiPEByjqlPocIRpvY3xFgT0YiL+fDlujXyO G0LnZnuJ74l8N6/74XyRwp69XMvS5mSQf67vBxQWNg1YamlMaSbL4E6CEhtweDEOPIvl ngkBZcUheDKw7qR35KaCM9RaeBByikghLoaXBxdb3tAjaDRhEC1NZYQGuLEQ7jp/q1ud RQXUHMBHa4VA2nKVHeo0S2rO2LgaAMjf3aUzRtUjn2239msoPjpGFo8VbfA22NipZPrd xXbQ== X-Gm-Message-State: AOJu0YywiE98efYWqVQk8pwaUumWzgYje20Vg322nlatyK0rHWhLlhaZ lZjaEj/hTz6KmuatAWP9TBsYz/Ya6pjeV+mYuqc= X-Received: by 2002:aa7:caca:0:b0:523:387d:f5f1 with SMTP id l10-20020aa7caca000000b00523387df5f1mr2449282edt.24.1694810268479; Fri, 15 Sep 2023 13:37:48 -0700 (PDT) MIME-Version: 1.0 References: <20230912070149.969939-1-zhouchuyi@bytedance.com> <20230912070149.969939-4-zhouchuyi@bytedance.com> <30eadbff-8340-a721-362b-ff82de03cb9f@bytedance.com> In-Reply-To: <30eadbff-8340-a721-362b-ff82de03cb9f@bytedance.com> From: Andrii Nakryiko Date: Fri, 15 Sep 2023 13:37:36 -0700 Message-ID: Subject: Re: [PATCH bpf-next v2 3/6] bpf: Introduce process open coded iterator kfuncs To: Chuyi Zhou Cc: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, martin.lau@kernel.org, tj@kernel.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM, RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (howler.vger.email [0.0.0.0]); Fri, 15 Sep 2023 13:38:42 -0700 (PDT) On Fri, Sep 15, 2023 at 8:03=E2=80=AFAM Chuyi Zhou wrote: > > > > =E5=9C=A8 2023/9/15 07:26, Andrii Nakryiko =E5=86=99=E9=81=93: > > On Tue, Sep 12, 2023 at 12:02=E2=80=AFAM Chuyi Zhou wrote: > >> > >> This patch adds kfuncs bpf_iter_process_{new,next,destroy} which allow > >> creation and manipulation of struct bpf_iter_process in open-coded ite= rator > >> style. BPF programs can use these kfuncs or through bpf_for_each macro= to > >> iterate all processes in the system. > >> > >> Signed-off-by: Chuyi Zhou > >> --- > >> include/uapi/linux/bpf.h | 4 ++++ > >> kernel/bpf/helpers.c | 3 +++ > >> kernel/bpf/task_iter.c | 29 +++++++++++++++++++++++++++++ > >> tools/include/uapi/linux/bpf.h | 4 ++++ > >> tools/lib/bpf/bpf_helpers.h | 5 +++++ > >> 5 files changed, 45 insertions(+) > >> > >> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > >> index de02c0971428..befa55b52e29 100644 > >> --- a/include/uapi/linux/bpf.h > >> +++ b/include/uapi/linux/bpf.h > >> @@ -7322,4 +7322,8 @@ struct bpf_iter_css_task { > >> __u64 __opaque[1]; > >> } __attribute__((aligned(8))); > >> > >> +struct bpf_iter_process { > >> + __u64 __opaque[1]; > >> +} __attribute__((aligned(8))); > >> + > >> #endif /* _UAPI__LINUX_BPF_H__ */ > >> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c > >> index d6a16becfbb9..9b7d2c6f99d1 100644 > >> --- a/kernel/bpf/helpers.c > >> +++ b/kernel/bpf/helpers.c > >> @@ -2507,6 +2507,9 @@ BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER= _DESTROY) > >> BTF_ID_FLAGS(func, bpf_iter_css_task_new, KF_ITER_NEW) > >> BTF_ID_FLAGS(func, bpf_iter_css_task_next, KF_ITER_NEXT | KF_RET_NUL= L) > >> BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY) > >> +BTF_ID_FLAGS(func, bpf_iter_process_new, KF_ITER_NEW) > >> +BTF_ID_FLAGS(func, bpf_iter_process_next, KF_ITER_NEXT | KF_RET_NULL) > >> +BTF_ID_FLAGS(func, bpf_iter_process_destroy, KF_ITER_DESTROY) > >> BTF_ID_FLAGS(func, bpf_dynptr_adjust) > >> BTF_ID_FLAGS(func, bpf_dynptr_is_null) > >> BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly) > >> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c > >> index d8539cc05ffd..9d1927dc3a06 100644 > >> --- a/kernel/bpf/task_iter.c > >> +++ b/kernel/bpf/task_iter.c > >> @@ -851,6 +851,35 @@ __bpf_kfunc void bpf_iter_css_task_destroy(struct= bpf_iter_css_task *it) > >> kfree(kit->css_it); > >> } > >> > >> +struct bpf_iter_process_kern { > >> + struct task_struct *tsk; > >> +} __attribute__((aligned(8))); > >> + > > > > Few high level thoughts. I think it would be good to follow > > SEC("iter/task") naming and approach. Open-coded iterators in many > > ways are in-kernel counterpart to iterator programs, so keeping them > > close enough within reason is useful for knowledge transfer. > > > > SEC("iter/task") allows to: > > a) iterate all threads in the system > > b) iterate all threads for a given TGID > > c) it also allows to "iterate" a single thread or process, but that's > > a bit less relevant for in-kernel iterator, but we can still support > > them, why not? > > > > I'm not sure if it supports iterating all processes (as in group > > leaders of each task group) in the system, but if it's possible I > > think we should support it at least for open-coded iterator, seems > > like a very useful functionality. > > > > So to that end, let's design a small set of input arguments for > > bpf_iter_process_new() that would allow to specify this as flags + > > either (optional) struct task_struct * pointer to represent > > task/process or PID/TGID. > > > > Another concern from Alexei was the readability of the API of open-coded > in BPF Program[1]. > > bpf_for_each(task, curr) is straightforward. Users can easily understand > that this API does the same thing as 'for_each_process' in kernel. In general, users might have no idea about for_each_process macro in the kernel, so I don't find this particular argument very convincing. We can add a separate set of iterator kfuncs for every useful combination of conditions, of course, but it's a double-edged sword. Needing to use a different iterator just to specify a different direction of cgroup iteration (from the example you referred in [1]) also means that it's now harder to write some generic function that needs to do something for all cgroups matching some criteria where the order might be coming as an argument. Similarly for task iterators. It's not hard to imagine some processing that can be equivalently done per thread or per process in the system, or on each thread of the process, depending on some conditions or external configuration. Having to do three different bpf_for_each(task_xxx, task, ...) for this seems suboptimal. If the nature of the thing that is iterated over is the same, and it's just a different set of filters to specify which subset of those items should be iterated, I think it's better to try to stick to the same iterator with few simple arguments. IMO, of course, there is no objectively best approach. > > However, if we keep the approach of SEC("iter/task") > > enum ITER_ITEM { > ITER_TASK, > ITER_THREAD, > } > > __bpf_kfunc int bpf_iter_task_new(struct bpf_iter_process *it, struct > task_struct *group_task, enum ITER_ITEM type) > > the API have to chang: > > > bpf_for_each(task, curr, NULL, ITERATE_TASK) // iterate all process in > the system > bpf_for_each(task, curr, group_leader, ITERATE_THREAD) // iterate all > thread of group_leader > bpf_for_each(task, curr, NULL, ITERATE_THREAD) //iterate all threads of > all the process in the system > > Useres may guess what are this API actually doing.... I'd expect users to consult documentation before trying to use an unfamiliar cutting-edge functionality. So let's try to keep documentation clear and up to the point. Extra flag argument doesn't seem to be a big deal. > > So, I'm thinking if we can add a layer of abstraction to hide the > details from the users: > > #define bpf_for_each_process(task) \ > bpf_for_each(task, curr, NULL, ITERATE_TASK) > > > It would be nice if you could give me some better suggestions. No, please no. This macro wrapper is useless and just obfuscates what is going on. If users think it's helpful for them, they can trivially define it for their own code. > > Thanks! > > [1] > https://lore.kernel.org/lkml/CAADnVQLbDWUxFen-RS67C86sOE5DykEPD8xyihJ2RnG= 1WEnTQg@mail.gmail.com/