2010-08-04 18:03:29

by Theodore Ts'o

Subject: Crash after umount'ing a disconnected disk (Re: extfs reliability)

Ping?

Have you had a chance to check whether this patch solves the problem
you reported when yanking out the last iSCSI or FC link to a hard
drive and then umounting the disk afterwards?

If you could try it out, I would really appreciate it.

- Ted


On Thu, Jul 29, 2010 at 02:58:49PM -0400, Ted Ts'o wrote:
> OK, I've looked at your kernel messages, and it looks like the problem
> comes from this:
>
>         /* Debugging code just in case the in-memory inode orphan list
>          * isn't empty.  The on-disk one can be non-empty if we've
>          * detected an error and taken the fs readonly, but the
>          * in-memory list had better be clean by this point. */
>         if (!list_empty(&sbi->s_orphan))
>                 dump_orphan_list(sb, sbi);
>         J_ASSERT(list_empty(&sbi->s_orphan));    <====
>
> This is a "should never happen" situation, and we crash so we can
> figure out how we got there. For production kernels, it would
> arguably be better to print a message and do a WARN_ON(1), rather
> than force a crash from a BUG_ON (which is what J_ASSERT is defined
> to use).
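
For illustration, the non-fatal variant Ted describes might look
roughly like this (a sketch only, not the actual ext4 change;
ext4_msg() and the surrounding ext4_put_super() context are assumed
from the ext4 code of that era):

        /* Sketch: complain loudly and leave a stack trace via
         * WARN_ON(), but keep going instead of crashing the box
         * through J_ASSERT()/BUG_ON(). */
        if (!list_empty(&sbi->s_orphan)) {
                dump_orphan_list(sb, sbi);
                ext4_msg(sb, KERN_ERR,
                         "in-memory orphan list non-empty at unmount");
                WARN_ON(1);
        }
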
>
> Looking at your messages and the ext4_delete_inode() warning, I think
> I know what caused it. Can you try this patch (attached below) and
> see if it fixes things for you?
>
> > I already reported such issues some time ago, but my reports were
> > not too much welcomed, so I gave up. Anyway, anybody can easily do
> > my tests at any time.
>
> My apologies. I've gone through the linux-ext4 mailing list logs, and
> I can't find any mention of this problem from any username @vlnb.net.
> I'm not sure where you reported it, and I'm sorry we dropped your bug
> report. All I can say is that we do the best that we can, and our
> team is relatively small and short-handed.
>
> - Ted
>
> From a190d0386e601d58db6d2a6cbf00dc1c17d02136 Mon Sep 17 00:00:00 2001
> From: Theodore Ts'o <[email protected]>
> Date: Thu, 29 Jul 2010 14:54:48 -0400
> Subject: [PATCH] patch explicitly-drop-inode-from-orphan-list-on-ext4_delete_inode-failure
>
> ---
> fs/ext4/inode.c | 1 +
> 1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index a52d5af..533b607 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -221,6 +221,7 @@ void ext4_delete_inode(struct inode *inode)
>  			     "couldn't extend journal (err %d)", err);
>  		stop_handle:
>  			ext4_journal_stop(handle);
> +			ext4_orphan_del(NULL, inode);
>  			goto no_delete;
>  		}
>  	}
> --
> 1.7.0.4
>
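
For context, here is roughly where that one-liner lands in
fs/ext4/inode.c of that era (a sketch reconstructed from the diff
context, not a verbatim copy of the file):

        if (!ext4_handle_has_enough_credits(handle, 3)) {
                err = ext4_journal_extend(handle, 3);
                if (err > 0)
                        err = ext4_journal_restart(handle, 3);
                if (err != 0) {
                        ext4_warning(inode->i_sb,
                                     "couldn't extend journal (err %d)", err);
                stop_handle:
                        ext4_journal_stop(handle);
                        /* The added call: without it, the inode stays
                         * on the in-memory orphan list and trips the
                         * J_ASSERT in ext4_put_super() at unmount. */
                        ext4_orphan_del(NULL, inode);
                        goto no_delete;
                }
        }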


2010-08-04 18:24:44

by Vladislav Bolkhovitin

Subject: Re: Crash after umount'ing a disconnected disk (Re: extfs reliability)

Ted Ts'o, on 08/04/2010 10:03 PM wrote:
> Ping?
>
> Have you had a chance to check whether this patch solves the problem
> you reported when yanking out the last iSCSI or FC link to a hard
> drive and then umounting the disk afterwards?
>
> If you could try it out, I would really appreciate it.

Sorry, I'm having trouble compiling that kernel. With my regular
config, which I have used for ages with Fedora, the Ubuntu early user
space crashes. But I'll try to overcome it tomorrow.

Thanks,
Vlad

2010-08-05 19:30:02

by Vladislav Bolkhovitin

Subject: Re: Crash after umount'ing a disconnected disk (Re: extfs reliability)

Ted Ts'o, on 08/04/2010 10:03 PM wrote:
> Ping?
>
> Have you had a chance to check whether this patch solves the problem
> you reported when yanking out the last iSCSI or FC link to a hard
> drive and then umounting the disk afterwards?

Looks like it works. I was able to reach that branch (see AAA in the
attached log) and it was handled well.

I've also got other issues (see the attached log file):

1. A bunch of detected hung tasks with call traces.

2. The "JBD: recovery failed" error I reported before.

The log is basically self-explanatory. I did 2 runs.

Thanks,
Vlad


Attachments:
messages.bz2 (17.62 kB)

2010-08-05 21:18:02

by Theodore Ts'o

Subject: Re: Crash after umount'ing a disconnected disk (Re: extfs reliability)

On Thu, Aug 05, 2010 at 11:29:59PM +0400, Vladislav Bolkhovitin wrote:
> >Have you had a chance to check whether this patch solves the problem
> >you reported when yanking out the last iSCSI or FC link to a hard
> >drive and then umounting the disk afterwards?
>
> Looks like it works. I was able to reach that branch (see AAA in the
> attached log) and it was handled well.

OK, great!

> I've also got other issues (see the attached log file):
>
> 1. A bunch of detected hung tasks with call traces.
>

Is this unique to ext4? It looks like a problem where we're either
(a) not getting an I/O error from the block device before we hit the
hung task timeout (which might be the right thing, if the link
eventually comes back --- what I've seen is that there's no clear
consensus on how long the last FC or iSCSI link should be down before
we give up on an I/O operation), or (b) for some reason not noticing
the I/O error and waiting forever. I believe (a) is more likely
here, but it's possible it's (b). Do you eventually get file system
I/O errors that abort the journal transaction? You should...

> 2. "JBD: recovery failed" I reported before.

I've searched my mail archives, and I'm not sure what you're talking
about here. Maybe this was in an e-mail that you sent that perhaps
got lost?

- Ted

2010-08-06 13:23:50

by Vladislav Bolkhovitin

Subject: Re: Crash after umount'ing a disconnected disk (Re: extfs reliability)

Ted Ts'o, on 08/06/2010 01:17 AM wrote:
> On Thu, Aug 05, 2010 at 11:29:59PM +0400, Vladislav Bolkhovitin wrote:
>>> Have you had a chance to check whether this patch solves the problem
>>> you reported when yanking out the last iSCSI or FC link to a hard
>>> drive and then umounting the disk afterwards?
>>
>> Looks like it works. I was able to reach that branch (see AAA in the
>> attached log) and it was handled well.
>
> OK, great!
>
>> I've also got other issues (see the attached log file):
>>
>> 1. A bunch of detected hung tasks with call traces.
>>
>
> Is this unique to ext4? It looks like a problem where we're either
> (a) not getting an I/O error from the block device before we hit the
> hung task timeout (which might be the right thing, if the link
> eventually comes back --- what I've seen is that there's no clear
> consensus on how long the last FC or iSCSI link should be down before
> we give up on an I/O operation), or (b) for some reason not noticing
> the I/O error and waiting forever. I believe (a) is more likely
> here, but it's possible it's (b). Do you eventually get file system
> I/O errors that abort the journal transaction? You should...

Yes, as you can see in the previously attached log.

>> 2. "JBD: recovery failed" I reported before.
>
> I've searched my mail archives, and I'm not sure what you're talking
> about here. Maybe this was in an e-mail that you sent that perhaps
> got lost?

It's next to the message on which you originally replied. It was about
ext3, but this time I saw it with ext4.

Thanks,
Vlad

2010-08-06 18:10:45

by Theodore Ts'o

Subject: Re: Crash after umount'ing a disconnected disk (Re: extfs reliability)

On Fri, Aug 06, 2010 at 05:23:46PM +0400, Vladislav Bolkhovitin wrote:
> >>
> >>1. A bunch of detected hung tasks with call traces.
> >>
> >
> >Do you eventually get file
> >system I/O errors that abort the journal transaction? You should...
>
> Yes, as you can see in the previously attached log.

OK, so what are you complaining about? This is *normal*. You can
change how long the kernel will wait before printing the warnings, by
adjusting /proc/sys/kernel/hung_task_timeout_secs, but the fact that
you're getting the detected hung tasks is the system performing as
designed.
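
A minimal sketch of adjusting that knob programmatically, equivalent
to "echo 600 > /proc/sys/kernel/hung_task_timeout_secs" (needs root;
the sysctl path is the one named above, and 600 seconds is just an
example value):

        #include <stdio.h>

        int main(void)
        {
                /* Writing 0 here disables the hung task warnings
                 * entirely; any other value is the new timeout in
                 * seconds. */
                FILE *f = fopen("/proc/sys/kernel/hung_task_timeout_secs",
                                "w");

                if (!f) {
                        perror("fopen");
                        return 1;
                }
                fprintf(f, "600\n");
                if (fclose(f)) {        /* the write lands at close */
                        perror("fclose");
                        return 1;
                }
                return 0;
        }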

> >>2. "JBD: recovery failed" I reported before.
> >
> >I've searched my mail archives, and I'm not sure what you're talking
> >about here. Maybe this was in an e-mail that you sent that perhaps
> >got lost?
>
> It's next to the message on which you originally replied. It was
> about ext3, but this time I saw it with ext4.

Can you resend, and with a new and specific subject line that is
helpful for finding it, and just that one message? Sorry, but I get
literally hundreds, and some days over a thousand, e-mails a day, and
that doesn't include e-mails which get caught in spam traps....

- Ted

2010-08-09 18:45:49

by Vladislav Bolkhovitin

Subject: Re: Crash after umount'ing a disconnected disk and JBD: recovery failed (Re: extfs reliability)

Ted Ts'o, on 08/06/2010 10:10 PM wrote:
> On Fri, Aug 06, 2010 at 05:23:46PM +0400, Vladislav Bolkhovitin wrote:
>>>>
>>>> 1. A bunch of detected hung tasks with call traces.
>>>>
>>>
>>> Do you eventually get file
>>> system I/O errors that abort the journal transaction? You should...
>>
>> Yes, as you can see in the previously attached log.
>
> OK, so what are you complaining about? This is *normal*. You can
> change how long the kernel will wait before printing the warnings, by
> adjusting /proc/sys/kernel/hung_task_timeout_secs, but the fact that
> you're getting the detected hung tasks is the system performing as
> designed.

Well, I'm not complaining, I'm reporting.

I can't say where the problem is. And I really would *not* say that
triggering the hung task detector is normal. A correct timeout
should be set by default, not require manual user intervention.

>>>> 2. "JBD: recovery failed" I reported before.
>>>
>>> I've searched my mail archives, and I'm not sure what you're talking
>>> about here. Maybe this was in an e-mail that you sent that perhaps
>>> got lost?
>>
>> It's next to the message on which you originally replied. It was
>> about ext3, but this time I saw it with ext4.
>
> Can you resend, and with a new and specific subject line that is
> helpful for finding it, and just that one message?

See http://lkml.org/lkml/2010/7/29/222 and
http://lkml.org/lkml/2010/7/29/325.

Vlad

2010-08-09 19:32:47

by Theodore Ts'o

Subject: Re: Crash after umount'ing a disconnected disk and JBD: recovery failed (Re: extfs reliability)

On Mon, Aug 09, 2010 at 10:45:52PM +0400, Vladislav Bolkhovitin wrote:
>
> Well, I'm not complaining, I'm reporting.
>
> I can't say where the problem is. And I really would *not* say that
> triggering the hung task detector is normal. A correct timeout
> should be set by default, not require manual user intervention.

The root cause of your issues is that very few people use disks that
can randomly appear and disappear as their links come and go. So it
doesn't get much testing, and in the case of USB, for example, if you
pull the USB stick out, the pending I/O's error out immediately. The
hung task detector has no idea that the iSCSI and FC drivers will not
immediately error out the I/O's, but will instead wait some amount of
time. You could say the iSCSI and FC drivers should change the hung
task timeout when they happen to be in use, but maybe the sysadmin
_wants_ the hung task timeout to be a smaller value. In any case,
it's not my code, and if you want to complain to the folks who
maintain the iSCSI driver, feel free.

> >>It's next to the message on which you originally replied. It was
> >>about ext3, but this time I saw it with ext4.
> >
> >Can you resend, and with a new and specific subject line that is
> >helpful for finding it, and just that one message?
>
> See http://lkml.org/lkml/2010/7/29/222 and
> http://lkml.org/lkml/2010/7/29/325.

My bet is that the iSCSI driver and/or the buffer cache doesn't do
the right thing with data in the buffer cache that didn't actually
make it out to the disk (when the I/O finally timed out), so there is
stale data in the buffer cache that doesn't reflect what is on the
disk.

I suspect that if you run this command (source attached) on the block
device after you umount the disk and recover the disk, but before you
mount the disk again, the journal recovery should no longer fail. Can
you try this experiment? If we see that this solves the problem, then
we can force a buffer cache flush at mount time, so that it happens
automatically.

- Ted

/*
 * flushb.c --- This routine flushes the disk buffers for a disk
 *
 * Copyright 1997, 2000, by Theodore Ts'o.
 *
 * WARNING: use of flushb on some older 2.2 kernels on a heavily loaded
 * system will corrupt filesystems.  This program is not really useful
 * beyond for benchmarking scripts.
 *
 * %Begin-Header%
 * This file may be redistributed under the terms of the GNU Public
 * License.
 * %End-Header%
 */

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mount.h>
#include "../misc/nls-enable.h"

/* For Linux, define BLKFLSBUF if necessary */
#if (!defined(BLKFLSBUF) && defined(__linux__))
#define BLKFLSBUF       _IO(0x12,97)    /* flush buffer cache */
#endif

const char *progname;

static void usage(void)
{
        fprintf(stderr, _("Usage: %s disk\n"), progname);
        exit(1);
}

int main(int argc, char **argv)
{
        int fd;

        progname = argv[0];
        if (argc != 2)
                usage();

        fd = open(argv[1], O_RDONLY, 0);
        if (fd < 0) {
                perror("open");
                exit(1);
        }
        /*
         * Note: to reread the partition table, use the ioctl
         * BLKRRPART instead of BLKFLSBUF.
         */
#ifdef BLKFLSBUF
        if (ioctl(fd, BLKFLSBUF, 0) < 0) {
                perror("ioctl BLKFLSBUF");
                exit(1);
        }
        return 0;
#else
        fprintf(stderr,
                _("BLKFLSBUF ioctl not supported!  Can't flush buffers.\n"));
        return 1;
#endif
}


2010-08-13 19:04:54

by Vladislav Bolkhovitin

Subject: Re: Crash after umount'ing a disconnected disk and JBD: recovery failed (Re: extfs reliability)

Ted Ts'o, on 08/09/2010 11:32 PM wrote:
>>>> It's next to the message on which you originally replied. It was
>>>> about ext3, but this time I saw it with ext4.
>>>
>>> Can you resend, and with a new and specific subject line that is
>>> helpful for finding it, and just that one message?
>>
>> See http://lkml.org/lkml/2010/7/29/222 and
>> http://lkml.org/lkml/2010/7/29/325.
>
> My bet is that the iSCSI driver and/or the buffer cache doesn't do
> the right thing with data in the buffer cache that didn't actually
> make it out to the disk (when the I/O finally timed out), so there is
> stale data in the buffer cache that doesn't reflect what is on the
> disk.
>
> I suspect that if you run this command (source attached) on the block
> device after you umount the disk and recover the disk, but before you
> mount the disk again, the journal recovery should no longer fail. Can
> you try this experiment? If we see that this solves the problem, then
> we can force a buffer cache flush at mount time, so that it happens
> automatically.

I ran the program just before the mount and it changed nothing:

[36630.781663] e1000: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
# ./a.out /dev/sdb
# mount -t ext4 /dev/sdb /mnt
[36640.487208] JBD: recovery failed
[36640.500639] EXT4-fs (sdb): error loading journal
# mount -t ext4 /dev/sdb /mnt
[36721.642852] EXT4-fs (sdb): ext4_orphan_cleanup: deleting unreferenced inode 128135
[36721.669780] EXT4-fs (sdb): ext4_orphan_cleanup: deleting unreferenced inode 128136
[36721.696432] EXT4-fs (sdb): 2 orphan inodes deleted
[36721.709978] EXT4-fs (sdb): recovery complete
[36721.730531] EXT4-fs (sdb): mounted filesystem with ordered data mode

Vlad