Date: Fri, 18 Nov 2022 09:08:12 -0600
From: David Vernet
To: John Fastabend
Cc: bpf@vger.kernel.org, ast@kernel.org, andrii@kernel.org, daniel@iogearbox.net, martin.lau@linux.dev, memxor@gmail.com, yhs@fb.com, song@kernel.org, sdf@google.com, kpsingh@kernel.org, jolsa@kernel.org, haoluo@google.com, tj@kernel.org, kernel-team@fb.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH bpf-next v7 0/3] Support storing struct task_struct objects as kptrs

On Thu, Nov 17, 2022 at 10:04:27PM -0800, John Fastabend wrote:

[...]

> > > And last thing I was checking is because KF_SLEEPABLE is not set
> > > this should be blocked from running on sleepable progs which would
> > > break the call_rcu in the destructor. Maybe small nit, not sure
> > > its worth it but might be nice to annotate the helper description
> > > with a note, "will not work on sleepable progs" or something to
> > > that effect.
> >
> > KF_SLEEPABLE is used to indicate whether the kfunc _itself_ may sleep,
> > not whether the calling program can be sleepable. call_rcu() doesn't
> > block, so no need to mark the kfunc as KF_SLEEPABLE. The key is that if
> > a kfunc is sleepable, non-sleepable programs are not able to call it
> > (and this is enforced in the verifier).
>
> OK but should these helpers be allowed in sleepable progs? I think
> not. What stops this, (using your helpers):
>
>   cpu0                                     cpu1
>   ----
>   v = insert_lookup_task(task)
>   kptr = bpf_kptr_xchg(&v->task, NULL);
>   if (!kptr)
>     return 0;
>                                            map_delete_elem()
>                                              put_task()
>                                                rcu_call
>   do_something_might_sleep()
>                                            put_task_struct
>                                              ... free
>   kptr->[free'd memory]
>
> the insert_lookup_task will bump the refcnt on the acquire on map
> insert. But the lookup doesn't do anything to the refcnt and the
> map_delete_elem will delete it. We have a check for spin_lock
> types to stop them from being in sleepable progs. Did I miss a
> similar check for these?

So, in your example above, bpf_kptr_xchg(&v->task, NULL) will atomically
xchg the kptr out of the map, and so the map_delete_elem() call would fail
with (something like) -ENOENT. In general, the semantics are similar to
std::unique_ptr::swap() in C++.

FWIW, I think KF_KPTR_GET kfuncs are the more complex / racy kfuncs to
reason about. The reason is that we're passing a pointer to the map value
containing a kptr directly to the kfunc (in an attempt to acquire an
additional reference if a kptr was already present in the map), rather than
doing an xchg, which atomically gets us the unique pointer if nobody else
xchgs it in first. So with KF_KPTR_GET, someone else could come along and
delete the kptr from the map while the kfunc is trying to acquire that
additional reference.
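The contrast above, between xchg'ing the kptr out of the map and merely reading it in place, can be modeled in plain userspace C11. This is a minimal sketch, not kernel code: `model_task`, `model_map_value`, `model_kptr_xchg()`, `model_kptr_peek()`, `model_task_put()` and `model_map_delete()` are hypothetical stand-ins, and the delete path is assumed to release the kptr field by exchanging it with NULL and dropping a reference only if it found one. Whoever exchanges the pointer out first owns the reference; a plain read leaves it in the map for someone else to drop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct task_struct and its rcu_users count. */
struct model_task {
	atomic_int rcu_users;
};

/* Hypothetical stand-in for a map value with a single kptr field. */
struct model_map_value {
	_Atomic(struct model_task *) task;
};

/* Drop one reference; asserts we never drop one nobody holds. */
static void model_task_put(struct model_task *t)
{
	int old = atomic_fetch_sub(&t->rcu_users, 1);

	assert(old > 0);
}

/* Like bpf_kptr_xchg(): swap in new_ptr and return the old pointer. The
 * caller now owns whatever reference the map value was holding. */
static struct model_task *model_kptr_xchg(struct model_map_value *v,
					  struct model_task *new_ptr)
{
	return atomic_exchange(&v->task, new_ptr);
}

/* A KF_KPTR_GET-style access only reads the field in place: the map value
 * keeps its reference, so a concurrent delete can still drop it. */
static struct model_task *model_kptr_peek(struct model_map_value *v)
{
	return atomic_load(&v->task);
}

/* Assumed delete path: exchange the field with NULL and release the
 * reference only if one was still there. Returns false if the kptr had
 * already been taken out. */
static bool model_map_delete(struct model_map_value *v)
{
	struct model_task *old = atomic_exchange(&v->task, NULL);

	if (!old)
		return false;
	model_task_put(old);
	return true;
}
```

In the xchg case the delete finds NULL and leaves the program's reference alone; in the peek case the delete drops the map's reference while the reader still holds the raw pointer, which is exactly the window bpf_task_kptr_get() has to handle.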
With KF_KPTR_GET, the race looks something like this:

	cpu0                                    cpu1
	----
	v = insert_lookup_task(task)
	kptr = bpf_task_kptr_get(&v->task);
	                                        map_delete_elem()
	                                          put_task()
	                                            rcu_call
	                                        put_task_struct
	                                          ... free
	if (!kptr)
		/* In this race example, this path will be taken. */
		return 0;

The difference is that here, we're not doing an atomic xchg of the kptr
out of the map. Instead, we're passing a pointer to the map value
containing the kptr directly to bpf_task_kptr_get(), which itself tries to
acquire an additional reference on the task to return to the program as a
kptr. This is still safe, however, as bpf_task_kptr_get() uses RCU and
refcount_inc_not_zero() to ensure that it can't hit a UAF, and that it
won't return a dying task to the caller:

/**
 * bpf_task_kptr_get - Acquire a reference on a struct task_struct kptr.
 * @pp: A pointer to a task kptr on which a reference is being acquired.
 *
 * A task kptr acquired by this kfunc which is not subsequently stored in a
 * map must be released by calling bpf_task_release().
 */
__used noinline
struct task_struct *bpf_task_kptr_get(struct task_struct **pp)
{
	struct task_struct *p;

	rcu_read_lock();
	p = READ_ONCE(*pp);

	/* <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
	 * cpu1 could remove the element from the map here, and invoke
	 * put_task_struct_rcu_user(). We're in an RCU read region
	 * though, so the task won't be freed until, at the very
	 * earliest, the rcu_read_unlock() below.
	 * >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
	 */

	if (p && !refcount_inc_not_zero(&p->rcu_users))
		/* <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
		 * refcount_inc_not_zero() will return false, as cpu1
		 * deleted the element from the map and dropped its last
		 * refcount. So we just return NULL as the task will be
		 * deleted once an RCU gp has elapsed.
		 * >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
		 */
		p = NULL;
	rcu_read_unlock();

	return p;
}

Let me know if that makes sense.
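The "won't return a dying task" half of that argument can be sketched in userspace C11 as well. This is a hedged model, not the kernel's implementation: `model_refcount_inc_not_zero()` is an assumed CAS-loop rendering of refcount_inc_not_zero() semantics (the real refcount_t also saturates on overflow, which is omitted here), and `model_task_kptr_get()` is a hypothetical mirror of bpf_task_kptr_get() without RCU. Because the model never frees memory, it does not demonstrate the UAF protection that rcu_read_lock() provides; it only shows that a task whose rcu_users count has reached zero is never handed back.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct task_struct and its rcu_users count. */
struct model_task {
	atomic_int rcu_users;
};

/* cpu1's side: drop a reference (the kernel defers the actual free via RCU). */
static void model_task_put(struct model_task *t)
{
	int old = atomic_fetch_sub(&t->rcu_users, 1);

	assert(old > 0);
}

/* Increment *r only if it is non-zero; return whether we did. On CAS
 * failure, 'old' is reloaded with the current value and re-checked. */
static bool model_refcount_inc_not_zero(atomic_int *r)
{
	int old = atomic_load(r);

	do {
		if (old == 0)
			return false; /* last reference already dropped */
	} while (!atomic_compare_exchange_weak(r, &old, old + 1));
	return true;
}

/* Same shape as bpf_task_kptr_get() above, minus rcu_read_lock()/unlock(). */
static struct model_task *model_task_kptr_get(_Atomic(struct model_task *) *pp)
{
	struct model_task *p = atomic_load(pp);

	if (p && !model_refcount_inc_not_zero(&p->rcu_users))
		p = NULL;
	return p;
}
```

The slot still pointing at a task with a zero count stands in for cpu0 having read *pp just before cpu1 dropped the last reference; the CAS loop is what turns that window into a NULL return instead of a reference on a dying task.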
This stuff is tricky, and I plan to clearly / thoroughly add it to that kptr docs page once this patch set lands.