Date: Tue, 2 Apr 2019 17:10:29 +0200
From: Jan Kara
To: Greg Kroah-Hartman
Cc: Jari Ruusu, "zhangyi (F)", Theodore Ts'o, Jan Kara, linux-kernel@vger.kernel.org
Subject: Re: ext3 file system livelock and file system corruption, 4.9.166 stable kernel
Message-ID: <20190402151029.GA25668@quack2.suse.cz>
References: <20190402103507.GA15511@kroah.com>
In-Reply-To: <20190402103507.GA15511@kroah.com>

On Tue 02-04-19 12:35:07, Greg Kroah-Hartman wrote:
> On Tue, Apr 02, 2019 at 01:08:45PM +0300, Jari Ruusu wrote:
> > To trigger this ext4 file system bug, you need a sparse file with
> > correct sparse pattern on old-school ext3 file system. I tried
> > more simpler ways to trigger this but those attempts did not
> > trigger the bug. I have provided compressed sparse file that
> > reliably triggers the bug. Size of compressed sparse file 1667256
> > bytes. Size of uncompressed sparse file 7369850880 bytes.
> > Following commands will demo the problem.
> >
> >  wget http://www.elisanet.fi/jariruusu/123/sparse-demo.data.xz
> >  xz -d sparse-demo.data.xz
> >  mkfs -t ext3 -b 4096 -e remount-ro -O "^dir_index" /dev/sdc1
> >  mount -t ext3 /dev/sdc1 /mnt
> >  cp -v --sparse=always sparse-demo.data /mnt/aa
> >  cp -v --sparse=always sparse-demo.data /mnt/bb
> >  umount /mnt
> >  mount -t ext3 /dev/sdc1 /mnt
> >  cp -v --sparse=always /mnt/bb /mnt/aa
> >
> > That last cp command reliably triggers the bug that livelocks and
> > after reset you have file system corruption to deal with. Deeply
> > unfunny.
> >
> > The bug is caused by
> > "ext4: brelse all indirect buffer in ext4_ind_remove_space()"
> > upstream commit 674a2b27234d1b7afcb0a9162e81b2e53aeef217, from
> > , who provided a follow-up patch
> > "ext4: cleanup bh release code in ext4_ind_remove_space()"
> > upstream commit 5e86bdda41534e17621d5a071b294943cae4376e. The
> > problem with that follow-up patch is that it is almost criminally
> > mislabeled. It should have said "fixes ext3 livelock and file
> > system corrupting bug" or something like that, so that Greg KH &
> > Co would have understood that it must be backported to stable
> > kernels too. Now the bug appears to be in all/most stable kernels
> > already.
> >
> > Below is the buggy patch that causes the problem. Look at those
> > new while loops. Once the while condition is true once, it is
> > ALWAYS true, so it livelocks.
> >
> > > --- a/fs/ext4/indirect.c
> > > +++ b/fs/ext4/indirect.c
> > > @@ -1385,10 +1385,14 @@ end_range:
> > >                                             partial->p + 1,
> > >                                             partial2->p,
> > >                                             (chain+n-1) - partial);
> > > -               BUFFER_TRACE(partial->bh, "call brelse");
> > > -               brelse(partial->bh);
> > > -               BUFFER_TRACE(partial2->bh, "call brelse");
> > > -               brelse(partial2->bh);
> > > +               while (partial > chain) {
> > > +                       BUFFER_TRACE(partial->bh, "call brelse");
> > > +                       brelse(partial->bh);
> > > +               }
> > > +               while (partial2 > chain2) {
> > > +                       BUFFER_TRACE(partial2->bh, "call brelse");
> > > +                       brelse(partial2->bh);
> > > +               }
> > >                 return 0;
> > >         }
> > >
> >
> > Greg & Co,
> > Please revert that above patch from stable kernels or backport the
> > follow-up patch that fixes the problem.
>
> So you need 5e86bdda4153 ("ext4: cleanup bh release code in
> ext4_ind_remove_space()") applied to all of the stable and LTS kernels
> at the moment (as that patch only showed up in 5.1-rc1)?
>
> If so, I need an ack from the ext4 developers/maintainer to do so.

Ack from me, and sorry for missing this brown paper bag bug during
review...

								Honza
-- 
Jan Kara
SUSE Labs, CR