2017-01-04 05:18:08

by Anton Blanchard

Subject: ext4 filesystem corruption with 4.10-rc2 on ppc64le

Hi,

I'm consistently seeing ext4 filesystem corruption using a mainline
kernel. It doesn't take much to trigger it - download a ppc64le Ubuntu
cloud image, boot it in KVM and run:

sudo apt-get update
sudo apt-get dist-upgrade
sudo reboot

And it never makes it back up, dying with rather severe filesystem
corruption.

I've narrowed it down to:

64e1c57fa474 ("ext4: Use clean_bdev_aliases() instead of iteration")
e64855c6cfaa ("fs: Add helper to clean bdev aliases under a bh and use it")
ce98321bf7d2 ("fs: Remove unmap_underlying_metadata")

Backing these patches out fixes the issue.

Anton


2017-01-04 06:02:42

by Chandan Rajendra

Subject: Re: ext4 filesystem corruption with 4.10-rc2 on ppc64le

On Wednesday, January 04, 2017 04:18:08 PM Anton Blanchard wrote:
> Hi,
>
> I'm consistently seeing ext4 filesystem corruption using a mainline
> kernel. It doesn't take much to trigger it - download a ppc64le Ubuntu
> cloud image, boot it in KVM and run:
>
> sudo apt-get update
> sudo apt-get dist-upgrade
> sudo reboot
>
> And it never makes it back up, dying with rather severe filesystem
> corruption.

Hi,

The patch at https://patchwork.kernel.org/patch/9488235/ should fix the
bug.

>
> I've narrowed it down to:
>
> 64e1c57fa474 ("ext4: Use clean_bdev_aliases() instead of iteration")
> e64855c6cfaa ("fs: Add helper to clean bdev aliases under a bh and use it")
> ce98321bf7d2 ("fs: Remove unmap_underlying_metadata")
>
> Backing these patches out fixes the issue.
>
> Anton

--
chandan

2017-01-04 07:34:00

by luigi burdo

Subject: Re: ext4 filesystem corruption with 4.10-rc2 on ppc64le

Hi,

This is also present on big-endian ppc, not just LE.

I found it on Ubuntu MATE 16.10 PPC with a 4.9-rc6 PPC64 kernel on P5020/P5040.


Thanks

Luigi


________________________________
From: Linuxppc-dev <[email protected]> on behalf of Anton Blanchard <[email protected]>
Sent: Wednesday, 4 January 2017 06:18
To: [email protected]; Michael Ellerman; Benjamin Herrenschmidt; Paul Mackerras; Stephen Rothwell; [email protected]
Cc: [email protected]; [email protected]; [email protected]; [email protected]
Subject: ext4 filesystem corruption with 4.10-rc2 on ppc64le

Hi,

I'm consistently seeing ext4 filesystem corruption using a mainline
kernel. It doesn't take much to trigger it - download a ppc64le Ubuntu
cloud image, boot it in KVM and run:

sudo apt-get update
sudo apt-get dist-upgrade
sudo reboot

And it never makes it back up, dying with rather severe filesystem
corruption.

I've narrowed it down to:

64e1c57fa474 ("ext4: Use clean_bdev_aliases() instead of iteration")
e64855c6cfaa ("fs: Add helper to clean bdev aliases under a bh and use it")
ce98321bf7d2 ("fs: Remove unmap_underlying_metadata")

Backing these patches out fixes the issue.

Anton

2017-01-04 15:09:28

by Jens Axboe

Subject: Re: ext4 filesystem corruption with 4.10-rc2 on ppc64le

On 01/03/2017 10:18 PM, Anton Blanchard wrote:
> Hi,
>
> I'm consistently seeing ext4 filesystem corruption using a mainline
> kernel. It doesn't take much to trigger it - download a ppc64le Ubuntu
> cloud image, boot it in KVM and run:
>
> sudo apt-get update
> sudo apt-get dist-upgrade
> sudo reboot
>
> And it never makes it back up, dying with rather severe filesystem
> corruption.
>
> I've narrowed it down to:
>
> 64e1c57fa474 ("ext4: Use clean_bdev_aliases() instead of iteration")
> e64855c6cfaa ("fs: Add helper to clean bdev aliases under a bh and use it")
> ce98321bf7d2 ("fs: Remove unmap_underlying_metadata")
>
> Backing these patches out fixes the issue.

The fix is going out today; I see Chandan already pointed you at it. For
the other reporter: it's not an LE vs BE thing, it's a fs blocksize <
page size problem.
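
A quick way to tell whether a given system falls in that class is to
compare the filesystem block size with the page size. The following is
only a minimal sketch, not something posted in this thread: the
classify helper is a hypothetical name, "/" is just an example mount
point, and it assumes GNU coreutils stat (whose %S format prints the
fundamental block size, which should match the ext4 block size).

```shell
#!/bin/sh
# classify BLOCK_SIZE PAGE_SIZE  (both in bytes; helper name is hypothetical)
# Reports whether the pair is in the "fs blocksize < page size" class.
classify() {
    if [ "$1" -lt "$2" ]; then
        echo "blocksize < pagesize: affected class"
    else
        echo "blocksize >= pagesize: not in the affected class"
    fi
}

# Query the running system. Assumes GNU coreutils stat; "/" is only an
# example path, substitute the mount point of the ext4 filesystem.
page_size=$(getconf PAGESIZE 2>/dev/null) || page_size=""
fs_block=$(stat -f -c %S / 2>/dev/null) || fs_block=""

# Only classify when both values could be read.
if [ -n "$page_size" ] && [ -n "$fs_block" ]; then
    echo "fs block size: $fs_block, page size: $page_size"
    classify "$fs_block" "$page_size"
fi
```

A 4k-block filesystem on a kernel with 64k pages (common on ppc64le)
and a 1k-block filesystem on a kernel with 4k pages both land in the
first branch, which is why the problem is not tied to endianness.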

--
Jens Axboe

2017-01-04 15:28:37

by Theodore Ts'o

Subject: Re: ext4 filesystem corruption with 4.10-rc2 on ppc64le

On Wed, Jan 04, 2017 at 11:32:42AM +0530, Chandan Rajendra wrote:
> On Wednesday, January 04, 2017 04:18:08 PM Anton Blanchard wrote:
> > I'm consistently seeing ext4 filesystem corruption using a mainline
> > kernel. It doesn't take much to trigger it - download a ppc64le Ubuntu
> > cloud image, boot it in KVM and run:
> >
> > sudo apt-get update
> > sudo apt-get dist-upgrade
> > sudo reboot
> >
> > And it never makes it back up, dying with rather severe filesystem
> > corruption.
>
> The patch at https://patchwork.kernel.org/patch/9488235/ should fix the
> bug.

It looks like this patch is already queued up on the "for-linus"
branch on the linux-block.git tree.

Chandan, thanks for pointing this out! I had missed your e-mail from
Christmas day, and it was on my todo list to figure out why I was
seeing lots of 1k block regressions on gce-xfstests post-merge window
that weren't showing up on the ext4.git tree before I sent my pull
request to Linus.

Jens, could you expedite a pull request to Linus? This is affecting
ext4 on 1k block file systems on x86/x86_64, so this is not a ppc-only
regression.

Anton or Chandan, could you do me a favor and verify whether or not
64k block sizes are working for you on ppcle on ext4 by running
xfstests? Light duty testing works for me but when I stress ext4 with
pagesize==blocksize on ppcle64 via xfstests, it blows up. I suspect
(but am not sure) it's due to (non-upstream) device driver issues, and
a verification that you can run xfstests on your ppcle64 systems using
standard upstream device drivers would be very helpful, since I don't
have easy console access on the machines I have access to at $WORK. :-(

And of course, if there are still blocksize==pagesize issues on ext4
on ppc64le, it would be good to know that too.

Many thanks!!
- Ted

P.S. And for those people who are doing storage work, let me put in a
plug for "gce-xfstests full". It's cheap and finds lots of problems
before I and others have to. And if the $1.50 USD is the problem, let
me know and I'll try to work something out. :-) :-)

2017-01-04 16:23:19

by Jens Axboe

Subject: Re: ext4 filesystem corruption with 4.10-rc2 on ppc64le

On 01/04/2017 08:28 AM, Theodore Ts'o wrote:
> On Wed, Jan 04, 2017 at 11:32:42AM +0530, Chandan Rajendra wrote:
>> On Wednesday, January 04, 2017 04:18:08 PM Anton Blanchard wrote:
>>> I'm consistently seeing ext4 filesystem corruption using a mainline
>>> kernel. It doesn't take much to trigger it - download a ppc64le Ubuntu
>>> cloud image, boot it in KVM and run:
>>>
>>> sudo apt-get update
>>> sudo apt-get dist-upgrade
>>> sudo reboot
>>>
>>> And it never makes it back up, dying with rather severe filesystem
>>> corruption.
>>
>> The patch at https://patchwork.kernel.org/patch/9488235/ should fix the
>> bug.
>
> It looks like this patch is already queued up on the "for-linus"
> branch on the linux-block.git tree.
>
> Chandan, thanks for pointing this out! I had missed your e-mail from
> Christmas day, and it was on my todo list to figure out why I was
> seeing lots of 1k block regressions on gce-xfstests post-merge window
> that wasn't showing up on the ext4.git tree before I sent my pull
> request to Linus.
>
> Jens, could you expedite a pull request to Linus? This is affecting
> ext4 on 1k block file systems on x86/x86_64, so this is not a ppc-only
> regression.

Yes, it'll go out this morning.

--
Jens Axboe


2017-01-04 18:09:48

by Linus Torvalds

Subject: Re: ext4 filesystem corruption with 4.10-rc2 on ppc64le

On Wed, Jan 4, 2017 at 8:23 AM, Jens Axboe <[email protected]> wrote:
> On 01/04/2017 08:28 AM, Theodore Ts'o wrote:
>>
>> Jens, could you expedite a pull request to Linus? This is affecting
>> ext4 on 1k block file systems on x86/x86_64, so this is not a ppc-only
>> regression.
>
> Yes, it'll go out this morning.

It's merged and out there in my tree now.

Linus

2017-01-05 10:44:10

by Anton Blanchard

Subject: Re: ext4 filesystem corruption with 4.10-rc2 on ppc64le

Hi Ted,

> Anton or Chandan, could you do me a favor and verify whether or not
> 64k block sizes are working for you on ppcle on ext4 by running
> xfstests? Light duty testing works for me but when I stress ext4 with
> pagesize==blocksize on ppcle64 via xfstests, it blows up. I suspect
> (but am not sure) it's due to (non-upstream) device driver issues, and
> a verification that you can run xfstests on your ppcle64 systems using
> standard upstream device drivers would be very helpful, since I don't
> have easy console access on the machines I have access to at
> $WORK. :-(

I fired off an xfstests run, and it looks good. There are 3 failures,
but they seem to be setup issues on my part. I also double-checked
that those same three failed on 4.8.

Chandan has been running the test suite regularly, and plans to do a
run against mainline too.

Anton

2017-01-09 04:10:29

by Chandan Rajendra

Subject: Re: ext4 filesystem corruption with 4.10-rc2 on ppc64le

On Wednesday, January 04, 2017 10:28:37 AM Theodore Ts'o wrote:
> On Wed, Jan 04, 2017 at 11:32:42AM +0530, Chandan Rajendra wrote:
> > On Wednesday, January 04, 2017 04:18:08 PM Anton Blanchard wrote:
> > > I'm consistently seeing ext4 filesystem corruption using a mainline
> > > kernel. It doesn't take much to trigger it - download a ppc64le Ubuntu
> > > cloud image, boot it in KVM and run:
> > >
> > > sudo apt-get update
> > > sudo apt-get dist-upgrade
> > > sudo reboot
> > >
> > > And it never makes it back up, dying with rather severe filesystem
> > > corruption.
> >
> > The patch at https://patchwork.kernel.org/patch/9488235/ should fix the
> > bug.
>
> It looks like this patch is already queued up on the "for-linus"
> branch on the linux-block.git tree.
>
> Chandan, thanks for pointing this out! I had missed your e-mail from
> Christmas day, and it was on my todo list to figure out why I was
> seeing lots of 1k block regressions on gce-xfstests post-merge window
> that wasn't showing up on the ext4.git tree before I sent my pull
> request to Linus.
>
> Jens, could you expedite a pull request to Linus? This is affecting
> ext4 on 1k block file systems on x86/x86_64, so this is not a ppc-only
> regression.
>
> Anton or Chandan, could you do me a favor and verify whether or not
> 64k block sizes are working for you on ppcle on ext4 by running
> xfstests? Light duty testing works for me but when I stress ext4 with
> pagesize==blocksize on ppcle64 via xfstests, it blows up. I suspect
> (but am not sure) it's due to (non-upstream) device driver issues, and
> a verification that you can run xfstests on your ppcle64 systems using
> standard upstream device drivers would be very helpful, since I don't
> have easy console access on the machines I have access to at $WORK. :-(

Hi Ted,

I found one regression w.r.t. 64k blocksize. I posted a patch
(http://marc.info/?l=linux-block&m=148388687722745&w=2) to fix the issue.

--
chandan