Received: by 2002:a05:7412:31a9:b0:e2:908c:2ebd with SMTP id et41csp2986165rdb; Tue, 12 Sep 2023 20:15:59 -0700 (PDT) X-Google-Smtp-Source: AGHT+IGXc4f4NJGF/QcZl/ExojChp25/Xu3pRJTfI2pdvUtHKbOLDgOo6gDV5LygyF/Bx3sERmo3 X-Received: by 2002:a17:90b:1904:b0:273:f087:1c8f with SMTP id mp4-20020a17090b190400b00273f0871c8fmr1050921pjb.11.1694574959273; Tue, 12 Sep 2023 20:15:59 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1694574959; cv=none; d=google.com; s=arc-20160816; b=ky2DJWB7QBnSmm8vmnP4l40MpAc7HQKicTDqkekfkel/LX/MOx/WKpmcfyIzuzKA41 AY302KslK7wTeMWfgYTlhUY2O3jQNHZrdVdAHGQtRDRV7gX0DAYhpdIx/vwcyX81X17U 8OIfDLYGgyqpwBsVpZrdx5C2x2/5/anM5tf6n3Ce7sRkt3G4YcI4fWFY75DFUtWQTPiH Qo4dfozy2dDjPWx5S7z5LfkEdBlb4yTFqsftX86w/XEPbVKlYcnR4ftYPoF6WF/AGAq/ vBiOIH0jLHsnkmfw5Ai5cQbpyjWNVmNWdgHL15ct+cRiGranrr4MhzLlTeqC8nfRmLkr DglQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:cc:to:subject :message-id:date:from:in-reply-to:references:mime-version :dkim-signature; bh=6IJfarNrrigVZ6GT5z4GlrFnGstC/W9B80yBZkpCLS4=; fh=lMgZAtsYlUMpZhngkQIo2qkZ9Y47MM2WK/Psb7v97mg=; b=fPwnvXsphDMzT+NitqZkkUYeMWA415byEuvVCCfWqKFUWe9dZpMpYSZdZnIocYLn7F 2qz4Js79/CZwN8E5y8DYhD4e+Nc/oZB1m7orzNwHAgukMzJ2CxBOE+DU4efXu67tWXCH UjB5I5Xc00YJQICqz0Cq8dB/XPjc2kxMVu1SdqJ3LeKj6qUbDxbapmlFYuDKnanLw4Uy a7gOvsKyush0q8uG1Tee68qiQ6N/NCQ7m9T8s1ISatD0ypOYLKsXGtUn/l36v0ZCxS3/ 6TaQRt3qmYRJiUOd4ONGI/H6b4TtZCuRbeFWVViy7b4/aBWJ3aRS/IuatVedMETzwzfR 5yFw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20221208 header.b=jG5zbqIC; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:8 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from fry.vger.email (fry.vger.email. [2620:137:e000::3:8]) by mx.google.com with ESMTPS id o16-20020a17090ac71000b00263a923c189si665616pjt.100.2023.09.12.20.15.58 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 Sep 2023 20:15:59 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:8 as permitted sender) client-ip=2620:137:e000::3:8; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20221208 header.b=jG5zbqIC; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:8 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by fry.vger.email (Postfix) with ESMTP id 5FEE18249F49; Tue, 12 Sep 2023 15:12:52 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at fry.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234813AbjILWMv (ORCPT + 99 others); Tue, 12 Sep 2023 18:12:51 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49580 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234111AbjILWMu (ORCPT ); Tue, 12 Sep 2023 18:12:50 -0400 Received: from mail-ej1-x633.google.com (mail-ej1-x633.google.com [IPv6:2a00:1450:4864:20::633]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7266210DF; Tue, 12 Sep 2023 15:12:46 -0700 (PDT) Received: by mail-ej1-x633.google.com with SMTP id a640c23a62f3a-9a648f9d8e3so784875566b.1; Tue, 12 Sep 2023 15:12:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1694556765; x=1695161565; darn=vger.kernel.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=6IJfarNrrigVZ6GT5z4GlrFnGstC/W9B80yBZkpCLS4=; b=jG5zbqICHDkZbcu7osSx3Aywz7K2dqp2HwIw3Lo2/CTjkUOsKm28EK0IH9XRwOJz1O VdNrJ8y7JKwO4NatztujZgobcEG7i8W4jzRy1SUnMIXDE4bv+SAj1qR9YKTtzLMyv8r9 vF5RbqVwtYuk14cSyMzVPOk9JRticIGJHS0ctKak1/+4xPpukiID6TBi8DZwkdDgqvSm M7zf5RN4p0Nb98lImA1svMJlA37QLSeajf4t7WKf0cOOZMHqlsHjfqemH3ai1npsRC5z CVi8l1keLeQPu7Q8vXvAI/jdwV8sHOmqLxG4wLho9o4VMdRUncsqt8ttoAhwLohu9dO1 nLNw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1694556765; x=1695161565; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=6IJfarNrrigVZ6GT5z4GlrFnGstC/W9B80yBZkpCLS4=; b=Uje/s1QbNbbBFUKHlIXa0ByGFEihlkEYLA6O71yq+KqhmShc/JabMnAU8e4dMZ2hNa 5S/IfMkpJCQIJ7e1uGvoa7GzLR2tpYpbj+PpfeP1l8snQwyJN3+Bcyx3QoZHEpT0oB0G huQFzfoBUzYyRSboqCGzOsoI4rzvwvw1gKARJele7KP83Mfsfgh7TlNT/a8VDXgsjEax OC7DIEDpgc9CLxGQKk07YmqEBxPtmKOBoLGf0aMuyVNDcIaRtPUDlQ2NT1fOFYJ62pFQ 7L+kDm48wOO34uc7B3QCujrOfLEWeQ31aEBbip01d1pUCvV7mks81m6oEgc33rkwXoqQ /6ww== X-Gm-Message-State: AOJu0YxVx1nFH6uksK4l8hfFBbpSVr31BTSX3D2cgVsk8ddzfyowFqnS Ef9O1Pu2oej0E/Qzka7crfPwxXtJ0oIiO0yLSuk= X-Received: by 2002:a17:907:2c4f:b0:9a1:fab3:ee43 with SMTP id hf15-20020a1709072c4f00b009a1fab3ee43mr436753ejc.0.1694556764468; Tue, 12 Sep 2023 15:12:44 -0700 (PDT) MIME-Version: 1.0 References: <20230827072057.1591929-1-zhouchuyi@bytedance.com> <20230827072057.1591929-3-zhouchuyi@bytedance.com> In-Reply-To: From: Andrii Nakryiko Date: Tue, 12 Sep 2023 15:12:33 -0700 Message-ID: Subject: Re: [RFC PATCH bpf-next 2/4] bpf: Introduce process open coded iterator kfuncs To: Alexei Starovoitov Cc: Chuyi Zhou , bpf , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , LKML Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (fry.vger.email [0.0.0.0]); Tue, 12 Sep 2023 15:12:52 -0700 (PDT) X-Spam-Status: No, score=-0.6 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on fry.vger.email On Wed, Sep 6, 2023 at 10:18=E2=80=AFAM Alexei Starovoitov wrote: > > On Wed, Sep 6, 2023 at 5:38=E2=80=AFAM Chuyi Zhou wrote: > > > > Hello, Alexei. > > > > =E5=9C=A8 2023/9/6 04:09, Alexei Starovoitov =E5=86=99=E9=81=93: > > > On Sun, Aug 27, 2023 at 12:21=E2=80=AFAM Chuyi Zhou wrote: > > >> > > >> This patch adds kfuncs bpf_iter_process_{new,next,destroy} which all= ow > > >> creation and manipulation of struct bpf_iter_process in open-coded i= terator > > >> style. BPF programs can use these kfuncs or through bpf_for_each mac= ro to > > >> iterate all processes in the system. > > >> > > >> Signed-off-by: Chuyi Zhou > > >> --- > > >> include/uapi/linux/bpf.h | 4 ++++ > > >> kernel/bpf/helpers.c | 3 +++ > > >> kernel/bpf/task_iter.c | 31 ++++++++++++++++++++++++++++++= + > > >> tools/include/uapi/linux/bpf.h | 4 ++++ > > >> tools/lib/bpf/bpf_helpers.h | 5 +++++ > > >> 5 files changed, 47 insertions(+) > > >> > > >> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > > >> index 2a6e9b99564b..cfbd527e3733 100644 > > >> --- a/include/uapi/linux/bpf.h > > >> +++ b/include/uapi/linux/bpf.h > > >> @@ -7199,4 +7199,8 @@ struct bpf_iter_css_task { > > >> __u64 __opaque[1]; > > >> } __attribute__((aligned(8))); > > >> > > >> +struct bpf_iter_process { > > >> + __u64 __opaque[1]; > > >> +} __attribute__((aligned(8))); > > >> + > > >> #endif /* _UAPI__LINUX_BPF_H__ */ > > >> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c > > >> index cf113ad24837..81a2005edc26 100644 > > >> --- a/kernel/bpf/helpers.c > > >> +++ b/kernel/bpf/helpers.c > > >> @@ -2458,6 +2458,9 @@ BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_IT= ER_DESTROY) > > >> BTF_ID_FLAGS(func, bpf_iter_css_task_new, KF_ITER_NEW) > > >> BTF_ID_FLAGS(func, bpf_iter_css_task_next, KF_ITER_NEXT | KF_RET_N= ULL) > > >> BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY) > > >> +BTF_ID_FLAGS(func, bpf_iter_process_new, KF_ITER_NEW) > > >> +BTF_ID_FLAGS(func, bpf_iter_process_next, KF_ITER_NEXT | KF_RET_NUL= L) > > >> +BTF_ID_FLAGS(func, bpf_iter_process_destroy, KF_ITER_DESTROY) > > >> BTF_ID_FLAGS(func, bpf_dynptr_adjust) > > >> BTF_ID_FLAGS(func, bpf_dynptr_is_null) > > >> BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly) > > >> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c > > >> index b1bdba40b684..a6717a76c1e0 100644 > > >> --- a/kernel/bpf/task_iter.c > > >> +++ b/kernel/bpf/task_iter.c > > >> @@ -862,6 +862,37 @@ __bpf_kfunc void bpf_iter_css_task_destroy(stru= ct bpf_iter_css_task *it) > > >> kfree(kit->css_it); > > >> } > > >> > > >> +struct bpf_iter_process_kern { > > >> + struct task_struct *tsk; > > >> +} __attribute__((aligned(8))); > > >> + > > >> +__bpf_kfunc int bpf_iter_process_new(struct bpf_iter_process *it) > > >> +{ > > >> + struct bpf_iter_process_kern *kit =3D (void *)it; > > >> + > > >> + BUILD_BUG_ON(sizeof(struct bpf_iter_process_kern) !=3D sizeo= f(struct bpf_iter_process)); > > >> + BUILD_BUG_ON(__alignof__(struct bpf_iter_process_kern) !=3D > > >> + __alignof__(struct bpf_iter_= process)); > > >> + > > >> + rcu_read_lock(); > > >> + kit->tsk =3D &init_task; > > >> + return 0; > > >> +} > > >> + > > >> +__bpf_kfunc struct task_struct *bpf_iter_process_next(struct bpf_it= er_process *it) > > >> +{ > > >> + struct bpf_iter_process_kern *kit =3D (void *)it; > > >> + > > >> + kit->tsk =3D next_task(kit->tsk); > > >> + > > >> + return kit->tsk =3D=3D &init_task ? NULL : kit->tsk; > > >> +} > > >> + > > >> +__bpf_kfunc void bpf_iter_process_destroy(struct bpf_iter_process *= it) > > >> +{ > > >> + rcu_read_unlock(); > > >> +} > > > > > > This iter can be used in all ctx-s which is nice, but let's > > > make the verifier enforce rcu_read_lock/unlock done by bpf prog > > > instead of doing in the ctor/dtor of iter, since > > > in sleepable progs the verifier won't recognize that body is RCU CS. > > > We'd need to teach the verifier to allow bpf_iter_process_new() > > > inside in_rcu_cs() and make sure there is no rcu_read_unlock > > > while BPF_ITER_STATE_ACTIVE. > > > bpf_iter_process_destroy() would become a nop. > > > > Thanks for your review! > > > > I think bpf_iter_process_{new, next, destroy} should be protected by > > bpf_rcu_read_lock/unlock explicitly whether the prog is sleepable or > > not, right? > > Correct. By explicit bpf_rcu_read_lock() in case of sleepable progs > or just by using them in normal bpf progs that have implicit rcu_read_loc= k() > done before calling into them. > > > I'm not very familiar with the BPF verifier, but I believe > > there is still a risk in directly calling these kfuns even if > > in_rcu_cs() is true. > > > > Maby what we actually need here is to enforce BPF verifier to check > > env->cur_state->active_rcu_lock is true when we want to call these kfun= cs. > > active_rcu_lock means explicit bpf_rcu_read_lock. > Currently we do allow bpf_rcu_read_lock in non-sleepable, but it's pointl= ess. > > Technically we can extend the check: > if (in_rbtree_lock_required_cb(env) && (rcu_lock || > rcu_unlock)) { > verbose(env, "Calling > bpf_rcu_read_{lock,unlock} in unnecessary rbtree callback\n"); > return -EACCES; > } > to discourage their use in all non-sleepable, but it will break some prog= s. > > I think it's ok to check in_rcu_cs() to allow bpf_iter_process_*(). > If bpf prog adds explicit and unnecessary bpf_rcu_read_lock() around > the iter ops it won't do any harm. > Just need to make sure that rcu unlock logic: > } else if (rcu_unlock) { > bpf_for_each_reg_in_vstate(env->cur_state, > state, reg, ({ > if (reg->type & MEM_RCU) { > reg->type &=3D ~(MEM_RCU | > PTR_MAYBE_NULL); > reg->type |=3D PTR_UNTRUSTED; > } > })); > clears iter state that depends on rcu. > > I thought about changing mark_stack_slots_iter() to do > st->type =3D PTR_TO_STACK | MEM_RCU; > so that the above clearing logic kicks in, > but it might be better to have something iter specific. > is_iter_reg_valid_init() should probably be changed to > make sure reg->type is not UNTRUSTED. > > Andrii, > do you have better suggestions? What if we just remember inside bpf_reg_state.iter state whether iterator needs to be RCU protected (it's just one bit if we don't allow nesting rcu_read_lock()/rcu_read_unlock(), or we'd need to remember RCU nestedness level), and then when validating iter_next and iter_destroy() kfuncs, check that we are still in RCU-protected region (if we have nestedness, then iter->rcu_nest_level <=3D cur_rcu_nest_level, if I understand correctly). And if not, provide a clear and nice message. That seems straightforward enough, but am I missing anything subtle?