From: Daire Byrne
Date: Wed, 7 Dec 2022 09:58:10 +0000
Subject: Re: [PATCH v2] fscache: Fix oops due to race with cookie_lru and use_cookie
To: Dave Wysochanski
Cc: David Howells, Daire Byrne, Benjamin Maynard, linux-cachefs@redhat.com, linux-nfs@vger.kernel.org
In-Reply-To: <20221117142915.1366990-1-dwysocha@redhat.com>

I have also now tested this v2 patch and can confirm that it also
fixes the race in fscache that we were reliably able to reproduce with
our (re-export) workloads.

Tested-by: Daire Byrne

Daire

On Thu, 17 Nov 2022 at 14:30, Dave Wysochanski wrote:
>
> If a cookie expires from the LRU and the LRU_DISCARD flag is set,
> but the state machine has not run yet, it's possible another thread
> can call fscache_use_cookie and begin to use it.  When the
> cookie_worker finally runs, it will see the LRU_DISCARD flag set and
> transition the cookie->state to LRU_DISCARDING, which will then
> withdraw the cookie.  Once the cookie is withdrawn, the object is
> removed and the below oops will occur because the object associated
> with the cookie is now NULL.
>
> Fix the oops by clearing the LRU_DISCARD bit if another thread
> uses the cookie before the cookie_worker runs.
>
> BUG: kernel NULL pointer dereference, address: 0000000000000008
> ...
> CPU: 31 PID: 44773 Comm: kworker/u130:1 Tainted: G E 6.0.0-5.dneg.x86_64 #1
> Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 08/26/2022
> Workqueue: events_unbound netfs_rreq_write_to_cache_work [netfs]
> RIP: 0010:cachefiles_prepare_write+0x28/0x90 [cachefiles]
> ...
> Call Trace:
>  netfs_rreq_write_to_cache_work+0x11c/0x320 [netfs]
>  process_one_work+0x217/0x3e0
>  worker_thread+0x4a/0x3b0
>  ? process_one_work+0x3e0/0x3e0
>  kthread+0xd6/0x100
>  ? kthread_complete_and_exit+0x20/0x20
>  ret_from_fork+0x1f/0x30
>
> Fixes: 12bb21a29c19 ("fscache: Implement cookie user counting and resource pinning")
> Reported-by: Daire Byrne
> Signed-off-by: Dave Wysochanski
> ---
>  fs/fscache/cookie.c            | 8 ++++++++
>  include/trace/events/fscache.h | 2 ++
>  2 files changed, 10 insertions(+)
>
> diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
> index 451d8a077e12..bce2492186d0 100644
> --- a/fs/fscache/cookie.c
> +++ b/fs/fscache/cookie.c
> @@ -605,6 +605,14 @@ void __fscache_use_cookie(struct fscache_cookie *cookie, bool will_modify)
>  			set_bit(FSCACHE_COOKIE_DO_PREP_TO_WRITE, &cookie->flags);
>  			queue = true;
>  		}
> +		/*
> +		 * We could race with cookie_lru which may set LRU_DISCARD bit
> +		 * but has yet to run the cookie state machine.  If this happens
> +		 * and another thread tries to use the cookie, clear LRU_DISCARD
> +		 * so we don't end up withdrawing the cookie while in use.
> +		 */
> +		if (test_and_clear_bit(FSCACHE_COOKIE_DO_LRU_DISCARD, &cookie->flags))
> +			fscache_see_cookie(cookie, fscache_cookie_see_lru_discard_clear);
>  		break;
>
>  	case FSCACHE_COOKIE_STATE_FAILED:
> diff --git a/include/trace/events/fscache.h b/include/trace/events/fscache.h
> index c078c48a8e6d..a6190aa1b406 100644
> --- a/include/trace/events/fscache.h
> +++ b/include/trace/events/fscache.h
> @@ -66,6 +66,7 @@ enum fscache_cookie_trace {
>  	fscache_cookie_put_work,
>  	fscache_cookie_see_active,
>  	fscache_cookie_see_lru_discard,
> +	fscache_cookie_see_lru_discard_clear,
>  	fscache_cookie_see_lru_do_one,
>  	fscache_cookie_see_relinquish,
>  	fscache_cookie_see_withdraw,
> @@ -149,6 +150,7 @@ enum fscache_access_trace {
>  	EM(fscache_cookie_put_work, "PQ work ") \
>  	EM(fscache_cookie_see_active, "- activ") \
>  	EM(fscache_cookie_see_lru_discard, "- x-lru") \
> +	EM(fscache_cookie_see_lru_discard_clear,"- lrudc") \
>  	EM(fscache_cookie_see_lru_do_one, "- lrudo") \
>  	EM(fscache_cookie_see_relinquish, "- x-rlq") \
>  	EM(fscache_cookie_see_withdraw, "- x-wth") \
> --
> 2.31.1
>