From: "NeilBrown"
To: "Chuck Lever III"
Cc: "Jeff Layton", "Linux NFS Mailing List"
Subject: Re: nfsd: another possible delegation race
References: <166486048770.14457.133971372966856907@noble.neil.brown.name>,
Date: Wed, 05 Oct 2022 09:17:29 +1100
Message-id: <166492184913.14457.15445320504611194255@noble.neil.brown.name>
X-Mailing-List: linux-nfs@vger.kernel.org

On Wed, 05 Oct 2022, Chuck Lever III wrote:
>
> > On Oct 4, 2022, at 1:14 AM, NeilBrown wrote:
> >
> >
> > Hi,
> > I have a customer who experienced a crash in nfsd which appears to be
> > related to delegation return. I cannot completely rule-out
> > Commit 548ec0805c39 ("nfsd: fix use-after-free due to delegation race")
> > as the kernel being used didn't have that commit, but the symptoms are
> > quite different, and while exploring I found, I think, a different
> > race. This new race doesn't obviously address all the symptoms, but
> > maybe it does...
> >
> > The symptoms were:
> > 1/ WARN_ON(!unhash_delegation_locked(dp));
> >    in nfs4_laundromat complained (delegation wasn't hashed!)
> > 2/ refcount_t: saturated; leaking memory
> >    This came from the refcount_inc in revoke_delegation() called from
> >    nfs4_laundromat(), a few lines below the above warning
> > 3/ BUG: kernel NULL pointer dereference, address: 0000000000000028
> >    This is from the destroy_unhashed_deleg() call at the end of
> >    that same revoke_delegation() call, which calls
> >    nfs4_unlock_deleg_lease() and passes fp->fi_deleg_file, which is
> >    NULL (!!!), to vfs_setlease().
> > These three happened in a 200usec window.
> >
> > What I imagine might be happening is that the nfsd_break_deleg_cb()
> > callback is called after destroy_delegation() has unhashed the deleg,
> > but before destroy_unhashed_delegation() gets called.
> >
> > If nfsd_break_deleg_cb() is called before the unhash - and particularly
> > if nfsd_break_one_deleg()->nfsd4_run_cb() is called before, then the
> > unhash will disconnect the delegation from the recall list, and no
> > harm can be done.
> > Once vfs_setlease(F_UNLCK) is called, the callback can no longer be
> > called, so again no harm is possible.
> >
> > Between these two is a race window. The delegation can be put on the
> > recall list, but the deleg will be unhashed and put_deleg_file() will
> > have set fi_deleg_file to NULL - resulting in first WARNING and the
> > BUG.
>
> That seems plausible.
> I've been accepting defensive patches like
> what you proposed below, so I can queue that up for v6.2 as soon as
> you post an official version.
>
> It would help to know the kernel version where you encountered
> these symptoms, and to have a rough description of the workload;
> I assume you do not have a reliable reproducer. I'm wondering if
> there should be a bug report too (bugzilla.linux-nfs.org)?
>

Kernel version 5.3.18 plus various backported patches:
SLE15-SP3 from July 2021, so not ancient but not the most recent either.
I don't have any workload information.

bug 394 seems much the same, though details might be different.

>
> > I cannot see how the refcount_t warning can be generated ... so maybe
> > I've missed something.
>
> stid refcounting does not seem reliable in the current code base.
> It's possible that the overflow is a separate issue that simply
> appeared at the same time or due to the same conditions that
> triggered the BUG.

Maybe ... though refcount is quite noisy about anything suspicious....

Aha.. The refcount is at the start of the structure. When slub frees
an allocation, it puts it on a freelist with pointers at the start of the
allocation. So freeing an nfsd_deleg will corrupt the refcount.
So when refcount_inc() is called on a freed nfsd_deleg, we get the
"saturation" error, because pointers tend to look like negative
numbers.
One would expect to see sc_type corrupt too because it is within the
first 64 bits of the start of the struct, but it is set explicitly in
revoke_delegation() which happens after the struct is freed, and is
where the crash happens.

Oh good - I think I understand it all now.

Thanks :-)

NeilBrown
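
The refcount explanation above can be illustrated with a small user-space
sketch. This is not kernel code: the object layout (a 32-bit count at
offset 0, a type byte at offset 4), the helper names, the example pointer
value, and the rule "negative as a signed 32-bit value means saturated" are
all assumptions made for illustration. The real refcount_t and slub logic
differ in detail and between kernel versions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical object layout: a 32-bit reference count at offset 0 and a
 * type byte at offset 4, both inside the first 64 bits of the object. */
enum { REFCOUNT_OFF = 0, SC_TYPE_OFF = 4, OBJ_SIZE = 16 };

/* Store the low n bytes of v at p in little-endian order (as on x86-64). */
static void store_le(unsigned char *p, uint64_t v, size_t n)
{
    for (size_t i = 0; i < n; i++)
        p[i] = (unsigned char)(v >> (8 * i));
}

/* Load a 32-bit little-endian value from p. */
static uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Model freeing: the allocator links the object into its freelist by
 * writing a 64-bit pointer over the first 8 bytes of the object. */
static void fake_free(unsigned char *obj, uint64_t freelist_ptr)
{
    assert(obj != NULL);
    store_le(obj + REFCOUNT_OFF, freelist_ptr, sizeof(freelist_ptr));
}

/* Assumed saturation rule: a count whose top bit is set (negative when
 * read as a signed 32-bit value) is treated as saturated. */
static bool looks_saturated(uint32_t count)
{
    return (count & 0x80000000u) != 0;
}
```

With a live object whose count is 1, looks_saturated() is false. After
fake_free() with a pointer whose low 32 bits have the top bit set (for
example the made-up value 0xffff8880a1b2c3d4), the bytes that used to hold
the count read back as 0xa1b2c3d4, which the assumed rule reports as
saturated. The byte at SC_TYPE_OFF is overwritten too, which matches the
remark that sc_type would look corrupt if revoke_delegation() did not set
it again afterwards.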