Date: Thu, 8 Dec 2022 07:20:11 -0800
From: Jeremi Piotrowski
To: Jan Kara
Cc: Theodore Ts'o, Thorsten Leemhuis, Andreas Dilger,
    linux-ext4@vger.kernel.org, stable@vger.kernel.org, Thilo Fromm,
    Andreas Gruenbacher
Subject: Re: [PATCH] ext4: Fix deadlock due to mbcache entry corruption
Message-ID: <20221208152011.GA12315@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net>
References: <20221123193950.16758-1-jack@suse.cz>
    <20221201151021.GA18380@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net>
    <9c414060-989d-55bb-9a7b-0f33bf103c4f@leemhuis.info>
    <20221208091523.t6ka6tqtclcxnsrp@quack3>
In-Reply-To: <20221208091523.t6ka6tqtclcxnsrp@quack3>

On Thu, Dec 08, 2022 at 10:15:23AM +0100, Jan Kara wrote:
> Hi Ted!
>
> On Thu 08-12-22 00:55:55, Theodore Ts'o wrote:
> > One thing which is completely unclear to me is how this relates to the
> > claimed regression.  I understand that Jeremi and Thilo have asserted
> > that the hang goes away if a backport commit 51ae846cff5 ("ext4: fix
> > warning in ext4_iomap_begin as race between bmap and write") is not in
> > their 5.15 product tree.
>
> IMHO the bisection was flawed and somehow the test of a revert (which guys
> claimed to have done) must have been lucky and didn't trip the race. This
> is not that hard to imagine because firstly, the commit 51ae846cff5 got
> included in the same stable kernel release as ext4 xattr changes
> (65f8b80053 ("ext4: fix race when reusing xattr blocks") and related
> mbcache changes) which likely made the race more likely.
> Secondly, the
> mbcache state corruption is not that easy to hit because you need an
> interaction of slab reclaim on mbcache entry with ext4 xattr code adding
> reference to xattr block and just hitting the reference limit.
>

Yeah, sorry about that. There was never a bisect that led to 51ae846cff5; it
was just a guess. At that point we were unable to reproduce the issue
ourselves, so all we had was a report from a user stating that the issue
doesn't occur when they revert that commit in their own test build.

Once we were able to reproduce it ourselves, the actual bisect led to
65f8b80053, which, as Honza stated, made sure that the
corruption/inconsistency leads to a busy loop, which is harder to miss.

> > However, the stack traces point to a problem in the extended attribute
> > code, which has nothing to do with ext4_bmap(), and commit 51ae846cff5
> > only changes the ext4's bmap function --- which these days gets used
> > for the FIBMAP ioctl and very little else.
> >
> > Furthermore, the fix which Jan provided, and which apparently fixes
> > the user's problem, (a) doesn't touch the ext4_bmap function, and (b)
> > has a fixes tag for the patch:
> >
> >     Fixes: 6048c64b2609 ("mbcache: add reusable flag to cache entries")
> >
> > ... which is a commit which dates back to 2016, and the v4.6 kernel.  ?!?
>
> Yes. AFAICT the bitfield race in mbcache was introduced in this commit but
> somehow ext4 was using mbcache in a way that wasn't tripping the race.
> After 65f8b80053 ("ext4: fix race when reusing xattr blocks"), the race
> became much more likely and users started to notice...
>
> 								Honza
> --
> Jan Kara
> SUSE Labs, CR
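
For anyone following along who hasn't run into this class of bug: below is a
minimal userspace sketch of how a "bitfield race" of the kind Honza mentions
can silently drop a flag. It is not the mbcache code. The flag names and both
helper functions are hypothetical, the interleaving of the two updaters is
written out by hand so the outcome is deterministic, and C11 atomics merely
stand in for whatever primitive a real fix would use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical flag bits sharing one word; names are illustrative only. */
#define FLAG_REFERENCED (1u << 0)
#define FLAG_REUSABLE   (1u << 1)

/*
 * Plain read-modify-write, with two updaters (A and B) interleaved by
 * hand. Updating one bit of a shared word means load, modify, store;
 * if another store lands between the load and the store, it is lost.
 */
uint32_t plain_rmw_interleaved(void)
{
	uint32_t word = 0;

	uint32_t a_copy = word;     /* A: load the shared word         */
	uint32_t b_copy = word;     /* B: load the shared word         */
	b_copy |= FLAG_REUSABLE;    /* B: set its bit in its copy      */
	word = b_copy;              /* B: store                        */
	a_copy |= FLAG_REFERENCED;  /* A: set its bit in a stale copy  */
	word = a_copy;              /* A: store, overwriting B's bit   */

	return word;
}

/*
 * Same two updates as indivisible read-modify-write operations. Each
 * atomic_fetch_or acts on the current value of the word, so there is
 * no window in which a stale copy can be written back.
 */
uint32_t atomic_rmw(void)
{
	atomic_uint word;

	atomic_init(&word, 0u);
	atomic_fetch_or(&word, FLAG_REUSABLE);    /* B */
	atomic_fetch_or(&word, FLAG_REFERENCED);  /* A */

	return (uint32_t)atomic_load(&word);
}
```

In the first function only FLAG_REFERENCED survives, because A writes back a
copy taken before B's update. In the second, both flags end up set. The
second function runs sequentially here, so it only shows the shape of the
fix; the guarantee that matters is that the same holds when the two updates
come from different CPUs.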