2011-09-02 21:00:54

by Christian Kujau

Subject: EXT4-fs (dm-1): Couldn't remount RDWR because of unprocessed orphan inode list

Hi,

for some time now, the following message keeps popping up in my logs:

> EXT4-fs (dm-1): Couldn't remount RDWR because of unprocessed orphan
> inode list. Please umount/remount instead

I don't know when it started. Maybe 2 months ago, I'd guess. The ext4 fs
is on top of a dm-crypt device, attached via firewire to a 1TB external
disk enclosure. The system (powerpc 32) is loosely following vanilla
kernels, currently running 3.1.0-rc4.

The filesystem is normally mounted r/o but remounted r/w every day to
receive backups, then remounted r/o again. Running e2fsck-1.41.12 with -n
on the r/o-mounted device gives the output below.

I unmounted the disk some weeks ago and ran e2fsck for real, which fixed
the errors. But now more errors seem to have occurred.

Anyone got an idea why this keeps happening? Bad memory? Bad cables? Disk?
No other hardware related errors are in the logs and the box is otherwise
quite stable.

Thanks,
Christian.

-------------------------------------------------
# fsck.ext4 -vnf /dev/mapper/wdc0
e2fsck 1.41.12 (17-May-2010)
Warning! /dev/mapper/wdc0 is mounted.
Pass 1: Checking inodes, blocks, and sizes
Inodes that were part of a corrupted orphan linked list found. Fix? no

Inode 16385 was part of the orphaned inode list. IGNORED.
Deleted inode 16439 has zero dtime. Fix? no

Inode 2260993 was part of the orphaned inode list. IGNORED.
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences: -(9298--9299) -8921121
Fix? no

Inode bitmap differences: -16385 -16439 -2260993
Fix? no

Directories count wrong for group #2 (72, counted=70).
Fix? no

Directories count wrong for group #276 (76, counted=75).
Fix? no

/dev/mapper/wdc0: ********** WARNING: Filesystem still has errors **********


562145 inodes used (0.92%)
3952 non-contiguous files (0.7%)
2544 non-contiguous directories (0.5%)
# of inodes with ind/dind/tind blocks: 0/0/0
Extent depth histogram: 555449/932
215226375 blocks used (88.14%)
0 bad blocks
5 large files

353453 regular files
202351 directories
0 character device files
0 block device files
7 fifos
2491335 links
6235 symbolic links (5657 fast symbolic links)
87 sockets
--------
3053468 files

--
BOFH excuse #19:

floating point processor overflow


2011-09-06 16:17:23

by Eric Sandeen

Subject: Re: EXT4-fs (dm-1): Couldn't remount RDWR because of unprocessed orphan inode list

On 9/2/11 4:00 PM, Christian Kujau wrote:
> Hi,
>
> for some time now, the following message keeps popping up in my logs:
>
> > EXT4-fs (dm-1): Couldn't remount RDWR because of unprocessed orphan
> > inode list. Please umount/remount instead
>
> I don't know when it started. Maybe 2 months ago, I'd guess. The ext4 fs
> is on top of a dm-crypt device, attached via firewire to a 1TB external
> disk enclosure. The system (powerpc 32) is loosely following vanilla
> kernels, currently running 3.1.0-rc4.
>
> The filesystem is normally mounted r/o but remounted r/w every day to
> receive backups, then remounted r/o again. Running e2fsck-1.41.12 with -n
> on the r/o-mounted devices gives the output below.
>
> I unmounted the disk some weeks ago and ran e2fsck for real, which fixed
> the errors. But now more errors seem to have occurred.
>
> Anyone got an idea why this keeps happening? Bad memory? Bad cables? Disk?
> No other hardware related errors are in the logs and the box is otherwise
> quite stable.

It's probably not a bug or flaw; orphan inodes can occur for legitimate
reasons (fs goes down while someone is holding open an unlinked file),
and then they must be cleaned up. If orphan inode processing was skipped
for some reason on the original mount, you can get this error.

Did you happen to also get a message like this on the original mount?

        if (bdev_read_only(sb->s_bdev)) {
                ext4_msg(sb, KERN_ERR, "write access "
                        "unavailable, skipping orphan cleanup");
                return;
        }

?
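
A quick way to answer that question from the logs might look like the sketch below. It only counts the two message texts quoted in this thread in whatever kernel log text is fed to it; the function name is made up for illustration, and older messages may already have rotated out of the log.

```shell
#!/bin/sh
# Sketch: report which of the two ext4 orphan-related messages show up in
# kernel log text read from stdin. The function name is hypothetical; the
# two patterns are the message texts quoted in this thread.
orphan_log_summary() {
    awk '
        /skipping orphan cleanup/       { skipped++ }
        /unprocessed orphan inode list/ { refused++ }
        END {
            printf "skipped_cleanup=%d refused_remount=%d\n", skipped, refused
        }
    '
}
```

Typical use would be `dmesg | orphan_log_summary`, or the same with a saved kernel log file on stdin.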

See also commit:

commit ead6596b9e776ac32d82f7d1931d7638e6d4a7bd
Author: Eric Sandeen <[email protected]>
Date: Sat Feb 10 01:46:08 2007 -0800

[PATCH] ext4: refuse ro to rw remount of fs with orphan inodes

In the rare case where we have skipped orphan inode processing due to a
readonly block device, and the block device subsequently changes back to
read-write, disallow a remount,rw transition of the filesystem when we have an
unprocessed orphan inodes as this would corrupt the list.

Ideally we should process the orphan inode list during the remount, but that's
trickier, and this plugs the hole for now.

Signed-off-by: Eric Sandeen <[email protected]>
Cc: "Stephen C. Tweedie" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 486a641..463b52b 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -2419,6 +2419,22 @@ static int ext4_remount (struct super_block * sb, int * flags, char * data)
                                err = -EROFS;
                                goto restore_opts;
                        }
+
+                       /*
+                        * If we have an unprocessed orphan list hanging
+                        * around from a previously readonly bdev mount,
+                        * require a full umount/remount for now.
+                        */
+                       if (es->s_last_orphan) {
+                               printk(KERN_WARNING "EXT4-fs: %s: couldn't "
+                                      "remount RDWR because of unprocessed "
+                                      "orphan inode list. Please "
+                                      "umount/remount instead.\n",
+                                      sb->s_id);
+                               err = -EINVAL;
+                               goto restore_opts;
+                       }
+
                        /*
                         * Mounting a RDONLY partition read-write, so reread
                         * and store the current valid flag. (It may have
-Eric

> Thanks,
> Christian.
>
> -------------------------------------------------
> # fsck.ext4 -vnf /dev/mapper/wdc0
> e2fsck 1.41.12 (17-May-2010)
> Warning! /dev/mapper/wdc0 is mounted.
> Pass 1: Checking inodes, blocks, and sizes
> Inodes that were part of a corrupted orphan linked list found. Fix? no
>
> Inode 16385 was part of the orphaned inode list. IGNORED.
> Deleted inode 16439 has zero dtime. Fix? no
>
> Inode 2260993 was part of the orphaned inode list. IGNORED.
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> Block bitmap differences: -(9298--9299) -8921121
> Fix? no
>
> Inode bitmap differences: -16385 -16439 -2260993
> Fix? no
>
> Directories count wrong for group #2 (72, counted=70).
> Fix? no
>
> Directories count wrong for group #276 (76, counted=75).
> Fix? no
>
> /dev/mapper/wdc0: ********** WARNING: Filesystem still has errors **********
>
>
> 562145 inodes used (0.92%)
> 3952 non-contiguous files (0.7%)
> 2544 non-contiguous directories (0.5%)
> # of inodes with ind/dind/tind blocks: 0/0/0
> Extent depth histogram: 555449/932
> 215226375 blocks used (88.14%)
> 0 bad blocks
> 5 large files
>
> 353453 regular files
> 202351 directories
> 0 character device files
> 0 block device files
> 7 fifos
> 2491335 links
> 6235 symbolic links (5657 fast symbolic links)
> 87 sockets
> --------
> 3053468 files
>


2011-09-06 16:37:49

by Christian Kujau

Subject: Re: EXT4-fs (dm-1): Couldn't remount RDWR because of unprocessed orphan inode list

On Tue, 6 Sep 2011 at 11:17, Eric Sandeen wrote:
> It's probably not a bug or flaw; orphan inodes can occur for legitimate
> reasons (fs goes down while someone is holding open an unlinked file),

The filesystem is being constantly accessed by an application, holding at
least one file open (readonly). And then there is this mechanism trying to
remount the filesystem rw and then ro again every day. I guess this equals
the scenario of "fs goes down (remount!) while someone is holding open a
file"?

> Did you happen to also get a message like this on the original mount?
> ext4_msg(sb, KERN_ERR, "write access "
> "unavailable, skipping orphan cleanup");

I think I've seen this message before, but I'm not sure where and it's
not in the logs of this particular system.

> See also commit:
>
> commit ead6596b9e776ac32d82f7d1931d7638e6d4a7bd
> Author: Eric Sandeen <[email protected]>
> Date: Sat Feb 10 01:46:08 2007 -0800
>
> [PATCH] ext4: refuse ro to rw remount of fs with orphan inodes

Yes, I've seen this commit when I was searching where this message came
from. And I think I understand now why this is happening, but
still...if I may ask: can't this be handled more elegantly? Do other
filesystems have the same problem?

Right now the procedure is to pause the application, stop the nfs exports,
unmount, fsck, mount, start nfs exports and resume the application. And
every few days/weeks this has to be repeated, "just because" these daily
remounts occur (which are the main reason for this, I suppose).

Thanks for replying,
Christian.
--
BOFH excuse #190:

Proprietary Information.

2011-09-06 17:29:48

by Eric Sandeen

Subject: Re: EXT4-fs (dm-1): Couldn't remount RDWR because of unprocessed orphan inode list

On 9/6/11 11:37 AM, Christian Kujau wrote:
> On Tue, 6 Sep 2011 at 11:17, Eric Sandeen wrote:
>> It's probably not a bug or flaw; orphan inodes can occur for legitimate
>> reasons (fs goes down while someone is holding open an unlinked file),
>
> The filesystem is being constantly accessed by an application, holding at
> least one file open (readonly). And then there is this mechanism trying to
> remount the filesystem rw and then ro again every day. I guess this equals
> the scenario of "fs goes down (remount!) while someone is holding open a
> file"?

well, no - "goes down" means "crashed or lost power"

>> Did you happen to also get a message like this on the original mount?
>> ext4_msg(sb, KERN_ERR, "write access "
>> "unavailable, skipping orphan cleanup");
>
> I think I've seen this message before, but I'm not sure where and it's
> not in the logs of this particular system.
>
>> See also commit:
>>
>> commit ead6596b9e776ac32d82f7d1931d7638e6d4a7bd
>> Author: Eric Sandeen <[email protected]>
>> Date: Sat Feb 10 01:46:08 2007 -0800
>>
>> [PATCH] ext4: refuse ro to rw remount of fs with orphan inodes
>
> Yes, I've seen this commit when I was searching where this message came
> from. And I think I understand now why this is happening, but
> still...if I may ask: can't this be handled more elegantly? Do other
> filesystems have the same problem?

well, as the commit said, it'd be nice to handle it in remount, yes... :(

> Right now the procedure is to pause the application, stop the nfs exports,
> unmount, fsck, mount, start nfs exports and resume the application. And
> every few days/weeks this has to be repeated, "just because" these daily
> remounts occur (which are the main reason for this, I suppose).

well, seems like you need to get to the root cause of the unprocessed
orphan inodes.

I don't yet have my post-vacation thinking cap back on... does cycling
rw/ro/rw/ro with open & unlinked files cause an orphan inode situation?

-Eric

> Thanks for replying,
> Christian.


2011-09-06 18:14:54

by Christian Kujau

Subject: Re: EXT4-fs (dm-1): Couldn't remount RDWR because of unprocessed orphan inode list

On Tue, 6 Sep 2011 at 11:44, Eric Sandeen wrote:
> > remount the filesystem rw and then ro again every day. I guess this equals
> > the scenario of "fs goes down (remount!) while someone is holding open a
> > file"?
>
> well, no - "goes down" means "crashed or lost power"

Hm, the machine and its storage are online all the time and the messages
occur in between downtimes.

> well, as the commit said, it'd be nice to handle it in remount, yes... :(

If my daily remounts are causing this, it's unfortunate. But it's nice to
know that now. It would be more worrying if something else were slowly
corrupting the fs.

> well, seems like you need to get to the root cause of the unprocessed
> orphan inodes.
>
> I don't yet have my post-vacation thinking cap back on... does cycling
> rw/ro/rw/ro with open & unlinked files cause an orphan inode situation?

This is almost all I do on this fs. The whole process is:

1) fs is ro most of the time, while a remote application accesses it via
   a readonly nfs mount.
2) once a day the fs gets remounted rw (the remote application does not
   know this and is still accessing the fs via the same ro-nfs mount)
3) backups are being pushed to the fs (via rsync, using hardlinks a lot)
4) fs is remounted ro again
5) at some point the remote application notices that the nfs mount went
   stale and has to remount its readonly nfs-mount
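
Steps 2) to 4) could be sketched roughly as below. This is not the actual backup job: the mount point, the source path and the use of rsync's --link-dest for the hardlinks are assumptions, and the function names are made up. With DRY_RUN=1 (the default here) the commands are only printed.

```shell
#!/bin/sh
# Sketch of the daily rw/backup/ro cycle. MNT, SRC and the rsync layout are
# placeholders, not the real configuration from this thread.
MNT=${MNT:-/mnt/backup}
SRC=${SRC:-/data/}
DRY_RUN=${DRY_RUN:-1}   # 1 = print the commands instead of running them

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

daily_cycle() {
    run mount -o remount,rw "$MNT" || return 1                      # step 2
    run rsync -a --link-dest="$MNT/previous" "$SRC" "$MNT/current"  # step 3
    rc=$?
    # Flush dirty data before going read-only. This is only a userspace
    # precaution, not a fix for a race inside the kernel.
    run sync
    run mount -o remount,ro "$MNT" || return 1                      # step 4
    return "$rc"
}
```

Calling `daily_cycle` with DRY_RUN=0 as root would run the real commands; the return code is that of rsync unless one of the remounts fails.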

Thanks,
Christian.
--
BOFH excuse #93:

Feature not yet implemented

2011-09-10 01:11:29

by Christian Kujau

Subject: Re: EXT4-fs (dm-1): Couldn't remount RDWR because of unprocessed orphan inode list

On Thu, 8 Sep 2011 at 20:51, Jan Kara wrote:
> There's race where VFS remount code can race with unlink and result will
> be unlinked file in orphan list on read-only filesystem. Christian seems to
> be hitting this race. Miklos Szeredi has patches
> (http://lkml.indiana.edu/hypermail/linux/kernel/1108.3/00169.html) to
> mostly close this hole but they're waiting for Al to find time to look at
> them / merge them AFAIK.

While these patches are still pending review, are they "dangerous" to
apply? If not, I'd like to volunteer as a tester :-)

Christian.
--
BOFH excuse #180:

ether leak

2011-09-10 20:04:16

by Jan Kara

Subject: Re: EXT4-fs (dm-1): Couldn't remount RDWR because of unprocessed orphan inode list

On Fri 09-09-11 18:11:26, Christian Kujau wrote:
> On Thu, 8 Sep 2011 at 20:51, Jan Kara wrote:
> > There's race where VFS remount code can race with unlink and result will
> > be unlinked file in orphan list on read-only filesystem. Christian seems to
> > be hitting this race. Miklos Szeredi has patches
> > (http://lkml.indiana.edu/hypermail/linux/kernel/1108.3/00169.html) to
> > mostly close this hole but they're waiting for Al to find time to look at
> > them / merge them AFAIK.
>
> While these patches are still pending review, are they "dangerous" to
> apply? If not, I'd like to volunteer as a tester :-)
As far as I saw them, they should be pretty safe. So feel free to test
them.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2011-09-10 20:33:45

by Jan Kara

Subject: Re: EXT4-fs (dm-1): Couldn't remount RDWR because of unprocessed orphan inode list

On Tue 06-09-11 11:44:45, Eric Sandeen wrote:
> On 9/6/11 11:37 AM, Christian Kujau wrote:
> > On Tue, 6 Sep 2011 at 11:17, Eric Sandeen wrote:
> >> It's probably not a bug or flaw; orphan inodes can occur for legitimate
> >> reasons (fs goes down while someone is holding open an unlinked file),
> >
> > The filesystem is being constantly accessed by an application, holding at
> > least one file open (readonly). And then there is this mechanism trying to
> > remount the filesystem rw and then ro again every day. I guess this equals
> > the scenario of "fs goes down (remount!) while someone is holding open a
> > file"?
>
> well, no - "goes down" means "crashed or lost power"
>
> >> Did you happen to also get a message like this on the original mount?
> >> ext4_msg(sb, KERN_ERR, "write access "
> >> "unavailable, skipping orphan cleanup");
> >
> > I think I've seen this message before, but I'm not sure where and it's
> > not in the logs of this particular system.
> >
> >> See also commit:
> >>
> >> commit ead6596b9e776ac32d82f7d1931d7638e6d4a7bd
> >> Author: Eric Sandeen <[email protected]>
> >> Date: Sat Feb 10 01:46:08 2007 -0800
> >>
> >> [PATCH] ext4: refuse ro to rw remount of fs with orphan inodes
> >
> > Yes, I've seen this commit when I was searching where this message came
> > from. And I think I understand now why this is happening, but
> > still...if I may ask: can't this be handled more elegantly? Do other
> > filesystems have the same problem?
>
> well, as the commit said, it'd be nice to handle it in remount, yes... :(
>
> > Right now the procedure is to pause the application, stop the nfs exports,
> > unmount, fsck, mount, start nfs exports and resume the application. And
> > every few days/weeks this has to be repeated, "just because" these daily
> > remounts occur (which are the main reason for this, I suppose).
>
> well, seems like you need to get to the root cause of the unprocessed
> orphan inodes.
>
> I don't yet have my post-vacation thinking cap back on... does cycling
> rw/ro/rw/ro with open & unlinked files cause an orphan inode situation?
  There's a race where the VFS remount code can race with unlink, and the
result will be an unlinked file in the orphan list on a read-only
filesystem. Christian seems to be hitting this race. Miklos Szeredi has
patches (http://lkml.indiana.edu/hypermail/linux/kernel/1108.3/00169.html)
to mostly close this hole, but they're waiting for Al to find time to look
at them / merge them AFAIK.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2011-09-13 04:52:29

by Christian Kujau

Subject: Re: EXT4-fs (dm-1): Couldn't remount RDWR because of unprocessed orphan inode list

On Sat, 10 Sep 2011 at 22:04, Jan Kara wrote:
> On Fri 09-09-11 18:11:26, Christian Kujau wrote:
> > On Thu, 8 Sep 2011 at 20:51, Jan Kara wrote:
> > > There's race where VFS remount code can race with unlink and result will
> > > be unlinked file in orphan list on read-only filesystem. Christian seems to
> > > be hitting this race. Miklos Szeredi has patches
> > > (http://lkml.indiana.edu/hypermail/linux/kernel/1108.3/00169.html) to
> > > mostly close this hole but they're waiting for Al to find time to look at
> > > them / merge them AFAIK.
> >
> > While these patches are still pending review, are they "dangerous" to
> > apply? If not, I'd like to volunteer as a tester :-)
> As far as I saw them, they should be pretty safe. So feel free to test
> them.

I've applied them to -rc5. It might take a few days until the message
occurs. Or, until "nothing happens", since I have the patches applied :-)

Meanwhile I'm trying to reproduce this issue on an x86 machine, but
haven't succeeded yet.

Thanks,
Christian.
--
BOFH excuse #133:

It's not plugged in.

2011-09-16 03:49:23

by Christian Kujau

Subject: Re: EXT4-fs (dm-1): Couldn't remount RDWR because of unprocessed orphan inode list

On Mon, 12 Sep 2011 at 21:52, Christian Kujau wrote:
> On Sat, 10 Sep 2011 at 22:04, Jan Kara wrote:
> > On Fri 09-09-11 18:11:26, Christian Kujau wrote:
> > > On Thu, 8 Sep 2011 at 20:51, Jan Kara wrote:
> > > > There's race where VFS remount code can race with unlink and result will
> > > > be unlinked file in orphan list on read-only filesystem. Christian seems to
> > > > be hitting this race. Miklos Szeredi has patches
> > > > (http://lkml.indiana.edu/hypermail/linux/kernel/1108.3/00169.html) to
> > > > mostly close this hole but they're waiting for Al to find time to look at
> > > > them / merge them AFAIK.
> > >
> > > While these patches are still pending review, are they "dangerous" to
> > > apply? If not, I'd like to volunteer as a tester :-)
> > As far as I saw them, they should be pretty safe. So feel free to test
> > them.
>
> I've applied them to -rc5. It might take a few days until the message
> occurs. Or, until "nothing happens", since I have the patches applied :-)

With Miklos' patches applied to -rc5, this happened again just now :-(

> Meanwhile I'm trying to reproduce this issue on an x86 machine, but
> haven't succeeded yet.

After ~3k remounts while constantly reading from the filesystem in
question[0], I still was NOT able to reproduce this on an x86 VM :(

Any ideas?

Thanks,
Christian.

[0] http://nerdbynature.de/bits/3.1-rc4/ext4/
--
BOFH excuse #403:

Sysadmin didn't hear pager go off due to loud music from bar-room speakers.

2011-09-16 12:04:56

by Amir Goldstein

Subject: Re: EXT4-fs (dm-1): Couldn't remount RDWR because of unprocessed orphan inode list

On Fri, Sep 16, 2011 at 6:49 AM, Christian Kujau <[email protected]> wrote:
> On Mon, 12 Sep 2011 at 21:52, Christian Kujau wrote:
>> On Sat, 10 Sep 2011 at 22:04, Jan Kara wrote:
>> > On Fri 09-09-11 18:11:26, Christian Kujau wrote:
>> > > On Thu, 8 Sep 2011 at 20:51, Jan Kara wrote:
>> > > > There's race where VFS remount code can race with unlink and result will
>> > > > be unlinked file in orphan list on read-only filesystem. Christian seems to
>> > > > be hitting this race. Miklos Szeredi has patches
>> > > > (http://lkml.indiana.edu/hypermail/linux/kernel/1108.3/00169.html) to
>> > > > mostly close this hole but they're waiting for Al to find time to look at
>> > > > them / merge them AFAIK.
>> > >
>> > > While these patches are still pending review, are they "dangerous" to
>> > > apply? If not, I'd like to volunteer as a tester :-)
>> >   As far as I saw them, they should be pretty safe. So feel free to test
>> > them.
>>
>> I've applied them to -rc5. It might take a few days until the message
>> occurs. Or, until "nothing happens", since I have the patches applied :-)
>
> With Miklos' patches applied to -rc5, this happened again just now :-(
>
>> Meanwhile I'm trying to reproduce this issue on an x86 machine, but
>> haven't succeeded yet.
>
> After ~3k remounts while constantly reading from the filesystem in
> question[0], I still was NOT able to reproduce this on an x86 VM :(
>
> Any ideas?
>

This is just a shot in the dark, but are you using Ubuntu on your
production machine by any chance?
The reason I am asking is because I have been getting failures to
umount fs, while running xfstests on ext4
with Ubuntu for a long time and nobody else seems to share this problem.

I always suspected Ubuntu has some service that keeps open handles on
mounted fs, but I never got
to examine this.

Amir.

2011-09-16 12:17:41

by Christian Kujau

Subject: Re: EXT4-fs (dm-1): Couldn't remount RDWR because of unprocessed orphan inode list

On Fri, 16 Sep 2011 at 15:04, Amir Goldstein wrote:
> This is just a shot in the dark, but are you using Ubuntu on your
> production machine by any chance?

No, I'm using Debian/stable on the "production" machine (the powerpc32
box, where the error occurs). I was trying to reproduce this in a x86
Ubuntu/10.04 VM, but could not.

> The reason I am asking is because I have been getting failures to
> umount fs, while running xfstests on ext4
> with Ubuntu for a long time and nobody else seems to share this problem.

Is there a bug open for that?

> I always suspected Ubuntu has some service that keeps open handles on
> mounted fs, but I never got to examine this.

Yeah...their "server" version needs major surgery to disable all the
bells and whistles before it becomes usable.

Christian.
--
BOFH excuse #92:

Stale file handle (next time use Tupperware(tm)!)

2011-09-16 12:36:09

by Amir Goldstein

Subject: Re: EXT4-fs (dm-1): Couldn't remount RDWR because of unprocessed orphan inode list

On Fri, Sep 16, 2011 at 3:17 PM, Christian Kujau <[email protected]> wrote:
> On Fri, 16 Sep 2011 at 15:04, Amir Goldstein wrote:
>> This is just a shot in the dark, but are you using Ubuntu on your
>> production machine by any chance?
>
> No, I'm using Debian/stable on the "production" machine (the powerpc32
> box, where the error occurs). I was trying to reproduce this in a x86
> Ubuntu/10.04 VM, but could not.

Actually, now I recall that Yongqiang did say he saw the same problem on Debian,
but I may be wrong.

>
>> The reason I am asking is because I have been getting failures to
>> umount fs, while running xfstests on ext4
>> with Ubuntu for a long time and nobody else seems to share this problem.
>
> Is there a bug open for that?
>

No, I couldn't find any trace of bug reports on this behavior, so I wrote it off
as a "misconfiguration" of my server.
I did write to xfs list to ask if anyone else has seen this problem.
You could try to run xfstests on your server and see if the problem is
reproducible.

>> I always suspected Ubuntu has some service that keeps open handles on
>> mounted fs, but I never got to examine this.
>
> Yeah...their "server" version needs major surgery to disable all the
> bells and whistles before it's becoming usable.
>

And I installed the "desktop" version, so where does this leave me...

Amir.

2011-10-05 18:03:41

by Jan Kara

Subject: Re: EXT4-fs (dm-1): Couldn't remount RDWR because of unprocessed orphan inode list

On Thu 15-09-11 20:49:19, Christian Kujau wrote:
> On Mon, 12 Sep 2011 at 21:52, Christian Kujau wrote:
> > On Sat, 10 Sep 2011 at 22:04, Jan Kara wrote:
> > > On Fri 09-09-11 18:11:26, Christian Kujau wrote:
> > > > On Thu, 8 Sep 2011 at 20:51, Jan Kara wrote:
> > > > > There's race where VFS remount code can race with unlink and result will
> > > > > be unlinked file in orphan list on read-only filesystem. Christian seems to
> > > > > be hitting this race. Miklos Szeredi has patches
> > > > > (http://lkml.indiana.edu/hypermail/linux/kernel/1108.3/00169.html) to
> > > > > mostly close this hole but they're waiting for Al to find time to look at
> > > > > them / merge them AFAIK.
> > > >
> > > > While these patches are still pending review, are they "dangerous" to
> > > > apply? If not, I'd like to volunteer as a tester :-)
> > > As far as I saw them, they should be pretty safe. So feel free to test
> > > them.
> >
> > I've applied them to -rc5. It might take a few days until the message
> > occurs. Or, until "nothing happens", since I have the patches applied :-)
>
> With Miklos' patches applied to -rc5, this happened again just now :-(
Thanks for careful testing! Hmm, since you are able to reproduce on ppc
but not on x86 there might be some memory ordering bug in Miklos' patches
or it's simply because of different timing. Miklos, care to debug this
further?

> > Meanwhile I'm trying to reproduce this issue on an x86 machine, but
> > haven't succeeded yet.
>
> After ~3k remounts while constantly reading from the filesystem in
> question[0], I still was NOT able to reproduce this on an x86 VM :(
>
> Any ideas?


Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2011-10-06 01:34:39

by Christian Kujau

Subject: Re: EXT4-fs (dm-1): Couldn't remount RDWR because of unprocessed orphan inode list

On Wed, 5 Oct 2011 at 20:03, Jan Kara wrote:
>> With Miklos' patches applied to -rc5, this happened again just now :-(
>>
> Thanks for careful testing! Hmm, since you are able to reproduce on ppc
> but not on x86 there might be some memory ordering bug in Miklos' patches
> or it's simply because of different timing. Miklos, care to debug this
> further?

Just to be clear: I'm still not entirely sure how to reproduce this at
will. I *assumed* that the daily remount-rw-and-ro-again routine left
some inodes in limbo and eventually led to those "unprocessed orphan
inodes". With that in mind I tried to reproduce this with the help of a
test-script (test-remount.sh, [0]) - but the message did not occur while
the script was running.

I ran the script again today on the said powerpc machine on a
loop-mounted 500MB ext4 partition. But even after 100 iterations no
such message occurred.

So maybe it's caused by something else or my test-script just doesn't get
the scenario right and there's something subtle to this whole
remounting-business I haven't figured out yet, leading to those orphan
inodes.

I'm at 3.1.0-rc9 now and will wait until the errors occur again.

Christian.

[0] nerdbynature.de/bits/3.1-rc4/ext4/
--
BOFH excuse #423:

It's not RFC-822 compliant.

2011-10-06 10:10:46

by Toshiyuki Okajima

Subject: Re: EXT4-fs (dm-1): Couldn't remount RDWR because of unprocessed orphan inode list

(2011/10/06 10:34), Christian Kujau wrote:
> On Wed, 5 Oct 2011 at 20:03, Jan Kara wrote:
>>> With Miklos' patches applied to -rc5, this happened again just now :-(
>>>
>> Thanks for careful testing! Hmm, since you are able to reproduce on ppc
>> but not on x86 there might be some memory ordering bug in Miklos' patches
>> or it's simply because of different timing. Miklos, care to debug this
>> further?
>
> Just to be clear: I'm still not entirely sure how to reproduce this at
> will. I *assumed* that the daily remount-rw-and-ro-again routine left
> some inodes in limbo and eventually led to those "unprocessed orphan
> inodes". With that in mind I tried to reproduce this with the help of a
> test-script (test-remount.sh, [0]) - but the message did not occur while
> the script was running.
>
> I ran the script again today on the said powerpc machine on a
> loop-mounted 500MB ext4 partition. But even after 100 iterations no
> such message occurred.
>
> So maybe it's caused by something else or my test-script just doesn't get
> the scenario right and there's something subtle to this whole
> remounting-business I haven't figured out yet, leading to those orphan
> inodes.
>
> I'm at 3.1.0-rc9 now and will wait until the errors occur again.
>
> Christian.
>
> [0] nerdbynature.de/bits/3.1-rc4/ext4/

With Miklos' patches applied to -rc8, I could trigger
"Couldn't remount RDWR because of unprocessed orphan inode list"
on my x86_64 machine with my reproducer.

This happens because the actual removal starts outside the range between
mnt_want_write() and mnt_drop_write(), even though do_unlinkat() and
do_rmdir() call mnt_want_write() and mnt_drop_write() to prevent the
filesystem from being remounted read-only.

My reproducer is as follows:
-----------------------------------------------------------------------------
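[1] go.sh
#!/bin/bash
# Uses bash-only syntax (C-style for loops, (( )) arithmetic).

# Create a sparse image of roughly 1 GB, put ext4 on it and mount it.
dd if=/dev/zero of=/tmp/img bs=1k count=1 seek=1000k > /dev/null 2>&1
/sbin/mkfs.ext4 -Fq /tmp/img
mount -o loop /tmp/img /mnt

# Create and delete files in the background while the remounts happen.
./writer.sh /mnt &

LOOP=1000000000
for ((i=0; i<LOOP; i++));
do
    echo "[$i]"
    # Alternate between read-only and read-write once per second.
    if ((i%2 == 0));
    then
        mount -o ro,remount,loop /mnt
    else
        mount -o rw,remount,loop /mnt
    fi
    sleep 1
done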

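[2] writer.sh
#!/bin/bash
# Uses bash-only syntax (C-style for loops, (( )) arithmetic).

dir=$1
for ((i=0;i<10000000;i++));
do
    # Start 64 writers in parallel, 8 KB per file.
    for ((j=0;j<64;j++));
    do
        filename="$dir/file$((i*64 + j))"
        dd if=/dev/zero of="$filename" bs=1k count=8 > /dev/null 2>&1 &
    done
    # Remove the same 64 files in parallel, racing with the writers
    # and with the remounts done by go.sh.
    for ((j=0;j<64;j++));
    do
        filename="$dir/file$((i*64 + j))"
        rm -f "$filename" > /dev/null 2>&1 &
    done
    wait
    # Every 100 rounds, clean up files that survived the race.
    if ((i%100 == 0 && i > 0));
    then
        rm -f "$dir"/file*
    fi
done
exit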

[step to run]
# ./go.sh
-----------------------------------------------------------------------------

Therefore, we need a mechanism to prevent a filesystem from being remounted
read-only until the actual removal finishes.

------------------------------------------------------------------------
[example fix]
do_unlinkat() {
    ...
    mnt_want_write()
    vfs_unlink()
    if (inode && inode->i_nlink == 0) {             //
        atomic_inc(&inode->i_sb->s_unlink_count);   //
        inode->i_deleting++;                        //
    }                                               //
    mnt_drop_write()
    ...
    iput()  // usually, the actual removal starts here
    ...
}

destroy_inode() {
    ...
    if (inode->i_deleting)
        atomic_dec(&inode->i_sb->s_unlink_count);
    ...
}

do_remount_sb() {
    ...
    else if (!fs_may_remount_ro(sb) || atomic_read(&sb->s_unlink_count))
        return -EBUSY;
    ...
}
------------------------------------------------------------------------
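
If a check like this were merged, a remount to read-only could presumably fail with EBUSY while unlinked inodes are still being torn down, so a caller such as a backup script would have to retry. A minimal userspace sketch of that, assuming mount(8) signals the failure through a non-zero exit status (the helper name and its arguments are made up):

```shell
#!/bin/sh
# Sketch: retry MAX DELAY CMD...
# Runs CMD until it succeeds or MAX attempts have been used up, sleeping
# DELAY seconds between attempts. The helper name is hypothetical.
retry() {
    _max=$1
    _delay=$2
    shift 2
    _try=1
    while :; do
        "$@" && return 0
        [ "$_try" -ge "$_max" ] && return 1
        _try=$((_try + 1))
        sleep "$_delay"
    done
}
```

For example, `retry 5 2 mount -o ro,remount /mnt` would give pending removals up to five attempts, two seconds apart, before giving up.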

Besides, my reproducer also detects the following message:
"Ext4-fs (xxx): ext4_da_writepages: jbd2_start: xxx pages, ino xx: err -30"

This is because ext4_remount() cannot guarantee that all ext4 filesystem
data has been written out, due to the delayed allocation feature.
(ext4_da_writepages() fails after ext4_remount() sets MS_RDONLY in
sb->s_flags.)

Therefore, we must write all delayed allocation buffers out before
ext4_remount() sets MS_RDONLY in sb->s_flags.

------------------------------------------------------------------------
[example fix] // This requires Miklos' patches.

ext4_remount() {
    ...
    if (*flags & MS_RDONLY) {
        err = dquot_suspend(sb, -1);
        if (err < 0)
            goto restore_opts;

        sync_filesystem(sb); // write all delayed buffers out
        sb->s_flags |= MS_RDONLY;
        ...
    }
    ...
}
------------------------------------------------------------------------

Best Regards,
Toshiyuki Okajima


2011-10-11 08:45:05

by Miklos Szeredi

Subject: Re: EXT4-fs (dm-1): Couldn't remount RDWR because of unprocessed orphan inode list

On Thu, 2011-10-06 at 19:12 +0900, Toshiyuki Okajima wrote:

> With Miklos' patches applied to -rc8, I could trigger
> "Couldn't remount RDWR because of unprocessed orphan inode list"
> on my x86_64 machine with my reproducer.
>
> This happens because the actual removal starts outside the range between
> mnt_want_write() and mnt_drop_write(), even though do_unlinkat() and
> do_rmdir() call mnt_want_write() and mnt_drop_write() to prevent the
> filesystem from being remounted read-only.


Thanks for the reproducer.

I'm looking at this now...

Miklos