2007-09-12 07:49:55

by Maurice Volaski

Subject: Can LVM block I/O and hang a system?

A previously working system has started hanging, and it seems to be
stuck on I/O in processes that use ext3 partitions on top of LVM.
The system is AMD 64-bit running Gentoo. Kernel is Gentoo 2.6.22-r3
and LVM lvm2-2.02.27. Here is the disk setup:

Boot disk, attached to motherboard via SATA
1) some partitions accessed via ext3 -> hardware partition.
2) some partitions accessed via ext3 -> drbd, which is version 8.0.5,
-> hardware partition.

External SATA-SCSI RAID, attached via an LSI Logic card,
3) one partition accessed via ext3 -> drbd -> hardware partition.
4) some partitions accessed via ext3 -> LVM -> drbd -> hardware partition.

On repeated reboots, #1 boots fine, and I can fsck #2 with no problem. I
can also fsck #3, but the fsck processes on #4, which are all trying
to recover the journals, just seem to not do anything. There is no
evidence of I/O and there are no errors reported anywhere. The frozen
fsck processes cannot even be killed and the system ignores the
shutdown command.

That the hanging fsck processes are all occurring on just the LVM
partitions seems to imply that LVM is responsible.

drbd had been disconnected from its peer during this time, and when I
reconnected it, it had no trouble syncing to the peer. That system,
which should basically be identical, however, has no trouble
running fsck everywhere. I'm not sure, though, if that lets LVM off
the hook.
--

Maurice Volaski, [email protected]
Computing Support, Rose F. Kennedy Center
Albert Einstein College of Medicine of Yeshiva University


2007-09-15 09:40:24

by Andrew Morton

Subject: Re: Can LVM block I/O and hang a system?

On Wed, 12 Sep 2007 03:09:14 -0400 Maurice Volaski <[email protected]> wrote:

> A previously working system has started hanging, and it seems to be
> stuck on I/O in processes that use ext3 partitions on top of LVM.
> The system is AMD 64-bit running Gentoo. Kernel is Gentoo 2.6.22-r3
> and LVM lvm2-2.02.27. Here is the disk setup:
>
> Boot disk, attached to motherboard via SATA
> 1) some partitions accessed via ext3 -> hardware partition.
> 2) some partitions accessed via ext3 -> drbd, which is version 8.0.5,
> -> hardware partition.
>
> External SATA-SCSI RAID, attached via an LSI Logic card,
> 3) one partition accessed via ext3 -> drbd -> hardware partition.
> 4) some partitions accessed via ext3 -> LVM -> drbd -> hardware partition.
>
> On repeated reboots, #1 boots fine, and I can fsck #2 with no problem. I
> can also fsck #3, but the fsck processes on #4, which are all trying
> to recover the journals, just seem to not do anything. There is no
> evidence of I/O and there are no errors reported anywhere. The frozen
> fsck processes cannot even be killed and the system ignores the
> shutdown command.
>
> That the hanging fsck processes are all occurring on just the LVM
> partitions seems to imply that LVM is responsible.
>
> drbd had been disconnected from its peer during this time, and when I
> reconnected it, it had no trouble syncing to the peer. That system,
> which should basically be identical, however, has no trouble
> running fsck everywhere. I'm not sure, though, if that lets LVM off
> the hook.

Next time it hangs, please do

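dmesg -c                        # print, then clear, the kernel ring buffer
echo t > /proc/sysrq-trigger    # dump the state of every task to the kernel log
dmesg -s 1000000 > foo          # save the dump, reading with a 1000000-byte buffer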

and send foo.
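
Before taking the full task dump, it may also help to confirm which
processes are wedged. Tasks that cannot be killed are usually in
uninterruptible sleep (state D). The following is only a rough sketch:
list_d_state is a made-up helper name, and it assumes a Linux /proc in
which every /proc/<pid>/status file has Name:, State: and Pid: lines.

```shell
# Hypothetical helper: print "PID NAME" for every task in state D
# (uninterruptible sleep). The optional argument is the proc root,
# which defaults to /proc; it exists mainly so the function can be
# tried against a copy or a mock directory.
list_d_state() {
    procroot="${1:-/proc}"
    for status in "$procroot"/[0-9]*/status; do
        # Skip unmatched globs and tasks that exited mid-scan.
        [ -r "$status" ] || continue
        # Only the first word of Name: is kept, which is enough here.
        awk '/^Name:/  { name = $2 }
             /^State:/ { state = $2 }
             /^Pid:/   { pid = $2 }
             END { if (state == "D") print pid, name }' "$status" 2>/dev/null
    done
    return 0
}
```

Calling list_d_state with no argument scans the live /proc. If the
hung fsck processes show up in its output, the sysrq-t dump should then
show where in the kernel each of them is blocked.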