Date: Tue, 2 Apr 2019 12:35:07 +0200
From: Greg Kroah-Hartman
To: Jari Ruusu
Cc: "zhangyi (F)", Theodore Ts'o, Jan Kara, linux-kernel@vger.kernel.org
Subject: Re: ext3 file system livelock and file system corruption, 4.9.166 stable kernel
Message-ID: <20190402103507.GA15511@kroah.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Apr 02, 2019 at 01:08:45PM +0300, Jari Ruusu wrote:
> To trigger this ext4 file system bug, you need a sparse file with
> correct sparse pattern on old-school ext3 file system. I tried
> more simpler ways to trigger this but those attempts did not
> trigger the bug. I have provided compressed sparse file that
> reliably triggers the bug. Size of compressed sparse file 1667256
> bytes. Size of uncompressed sparse file 7369850880 bytes.
> Following commands will demo the problem.
>
> wget http://www.elisanet.fi/jariruusu/123/sparse-demo.data.xz
> xz -d sparse-demo.data.xz
> mkfs -t ext3 -b 4096 -e remount-ro -O "^dir_index" /dev/sdc1
> mount -t ext3 /dev/sdc1 /mnt
> cp -v --sparse=always sparse-demo.data /mnt/aa
> cp -v --sparse=always sparse-demo.data /mnt/bb
> umount /mnt
> mount -t ext3 /dev/sdc1 /mnt
> cp -v --sparse=always /mnt/bb /mnt/aa
>
> That last cp command reliably triggers the bug that livelocks and
> after reset you have file system corruption to deal with. Deeply
> unfunny.
>
> The bug is caused by
> "ext4: brelse all indirect buffer in ext4_ind_remove_space()"
> upstream commit 674a2b27234d1b7afcb0a9162e81b2e53aeef217, from
> , who provided a follow-up patch
> "ext4: cleanup bh release code in ext4_ind_remove_space()"
> upstream commit 5e86bdda41534e17621d5a071b294943cae4376e. The
> problem with that follow-up patch is that it is almost criminally
> mislabeled. It should have said "fixes ext3 livelock and file
> system corrupting bug" or something like that, so that Greg KH &
> Co would have understood that it must be backported to stable
> kernels too. Now the bug appears to be in all/most stable kernels
> already.
>
> Below is the buggy patch that causes the problem. Look at those
> new while loops. Once the while condition is true once, it is
> ALWAYS true, so it livelocks.
>
> > --- a/fs/ext4/indirect.c
> > +++ b/fs/ext4/indirect.c
> > @@ -1385,10 +1385,14 @@ end_range:
> >  				   partial->p + 1,
> >  				   partial2->p,
> >  				   (chain+n-1) - partial);
> > -		BUFFER_TRACE(partial->bh, "call brelse");
> > -		brelse(partial->bh);
> > -		BUFFER_TRACE(partial2->bh, "call brelse");
> > -		brelse(partial2->bh);
> > +		while (partial > chain) {
> > +			BUFFER_TRACE(partial->bh, "call brelse");
> > +			brelse(partial->bh);
> > +		}
> > +		while (partial2 > chain2) {
> > +			BUFFER_TRACE(partial2->bh, "call brelse");
> > +			brelse(partial2->bh);
> > +		}
> >  		return 0;
> >  	}
> >
>
> Greg & Co,
> Please revert that above patch from stable kernels or backport the
> follow-up patch that fixes the problem.

So you need 5e86bdda4153 ("ext4: cleanup bh release code in
ext4_ind_remove_space()") applied to all of the stable and LTS kernels
at the moment (as that patch only showed up in 5.1-rc1)?

If so, I need an ack from the ext4 developers/maintainer to do so.

thanks,

greg k-h
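
For reference, a small user-space sketch of why the two while loops in the quoted hunk cannot terminate: nothing inside the loop body changes partial or chain, so the condition never flips. This is not kernel code. struct entry, release_no_step() and release_with_step() are hypothetical names made up for the illustration, the released counter stands in for brelse(), and max_iter is an artificial cap so the demo returns. Stepping the pointer back on each pass is only my assumption about what a terminating version of the loop needs; the follow-up commit itself is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

#define CHAIN_LEN 4

/* Hypothetical stand-in for one entry of the indirect-block chain. */
struct entry {
	int released;	/* counts simulated brelse() calls on this entry */
};

/*
 * Models the loops from the quoted hunk: partial is never moved, so once
 * "partial > chain" is true it stays true. max_iter exists only so that
 * this demo returns; the loop in the hunk has no such cap.
 */
static size_t release_no_step(struct entry *chain, size_t depth,
			      size_t max_iter)
{
	struct entry *partial;
	size_t iter = 0;

	assert(depth < CHAIN_LEN);
	partial = chain + depth;
	while (partial > chain && iter < max_iter) {
		partial->released++;	/* same entry released every pass */
		iter++;
	}
	return iter;
}

/*
 * Same loop, but the pointer is stepped back toward chain on each pass
 * (an assumption for illustration), so the loop ends after depth passes.
 */
static size_t release_with_step(struct entry *chain, size_t depth)
{
	struct entry *partial;
	size_t iter = 0;

	assert(depth < CHAIN_LEN);
	partial = chain + depth;
	while (partial > chain) {
		partial->released++;	/* each entry released exactly once */
		partial--;
		iter++;
	}
	return iter;
}
```

With depth 2 the first function runs until it hits whatever cap it is given and keeps releasing the same entry, while the second makes two passes and stops. The first behaviour also implies repeated release of one buffer, which may be related to the corruption seen after reset, though that is a guess on my part.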