Subject: possible ext4 race situation freezing linux

Kernel version: 2.6.28.7
e2fsprogs version: 1.41.4
arch: x86_64 (amd)

I ran this test four times with the same result each time: Linux
freezes and becomes unresponsive, and the only solution is to reset the
box. I do not know whether the problem is in the USB device subsystem
or is a possible ext4 race condition.

This is the scenario:
1. an LVM volume group (VGSTORE) built on USB mass storage devices
2. a USB mass storage device, named ANDREAS, that is not under LVM control
3. a cpio operation running from the ANDREAS device to a VGSTORE.LV
filesystem (LV1)
4. a cpio operation from a VGSTORE.LV filesystem (LV2) to LV1
5. an rm operation on LV1

Nothing is recorded in /var/log/messages.

I know this is not much information, but I'm happy to repeat the tests
and turn on any debug options needed to track down the problem.

-- Berendsen




2009-02-24 14:17:57

by Theodore Ts'o

Subject: Re: possible ext4 race situation freezing linux

On Wed, Feb 25, 2009 at 01:22:22AM +1300, Andreas Friedrich Berendsen wrote:
> Kernel version: 2.6.28.7
> e2fsprogs version: 1.41.4
> arch: x86_64 (amd)
>
> I ran this test four times with the same result each time: Linux
> freezes and becomes unresponsive, and the only solution is to reset the
> box. I do not know whether the problem is in the USB device subsystem
> or is a possible ext4 race condition.

Can you use alt-sysrq to get some stack traces or a register dump out,
so we can see where the kernel is hanging?

- Ted

2009-02-24 15:38:24

by Eric Sandeen

Subject: Re: possible ext4 race situation freezing linux

Theodore Tso wrote:
> On Wed, Feb 25, 2009 at 01:22:22AM +1300, Andreas Friedrich Berendsen wrote:
>> Kernel version: 2.6.28.7
>> e2fsprogs version: 1.41.4
>> arch: x86_64 (amd)
>>
>> I ran this test four times with the same result each time: Linux
>> freezes and becomes unresponsive, and the only solution is to reset the
>> box. I do not know whether the problem is in the USB device subsystem
>> or is a possible ext4 race condition.
>
> Can you use alt-sysrq to get some stack traces or a register dump out,
> so we can see where the kernel is hanging?
>
> - Ted

alt-sysrq-w, which dumps the tasks stuck in uninterruptible (blocked)
state, would be a good place to start (just in case you're not
familiar w/ the sysrq keys).
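
If the keyboard combination is awkward to use (for example while X is running), the same dump can usually be requested by writing the key letter to /proc/sysrq-trigger as root, as long as a shell still responds. The following is only a minimal sketch, assuming a kernel built with CONFIG_MAGIC_SYSRQ; the function name sysrq_send and the SYSRQ_TRIGGER override variable are made up for illustration and are not part of any existing tool.

```shell
# Sketch: request a SysRq action without the keyboard.
# SYSRQ_TRIGGER is a hypothetical override so the function can be
# tried against an ordinary file; by default the real kernel
# interface is used.
sysrq_send() {
    key=$1
    trigger=${SYSRQ_TRIGGER:-/proc/sysrq-trigger}
    # The kernel acts on the first character written to the trigger file.
    printf '%s' "$key" > "$trigger"
}

# Example (as root):
#   echo 1 > /proc/sys/kernel/sysrq   # allow all SysRq functions from the keyboard
#   sysrq_send w                      # dump tasks in uninterruptible (blocked) state
#   dmesg | tail -n 100               # the dump lands in the kernel log
```

On a box that is already completely frozen this will not help, since no shell can run; in that case the keyboard or a serial break is the only way in.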

It'd also be great to test w/ 2.6.29, as a deadlock was fixed there
recently (it's on its way to .28.x too AFAIK)

Thanks,

-Eric

Subject: Re: possible ext4 race situation freezing linux

Tso & Eric,

I'll try again to reproduce the error, but when the system freezes I
have the X interface running. And, indeed, I'm not familiar with the
behaviour of sysrq. Which sysrq key should I use when the problem
happens? Will the system switch to the text interface? If there is no
disk activity (all LEDs are off), where will the information be recorded?

-- Berendsen

--
__________________________________________
Andreas Friedrich Berendsen
SCA OCP MSCA A+ Linux+ Network+ HpMASE


2009-02-24 17:30:40

by Theodore Ts'o

Subject: Re: possible ext4 race situation freezing linux

On Wed, Feb 25, 2009 at 06:03:33AM +1300, Andreas Friedrich Berendsen wrote:
> Tso & Eric,
>
> I'll try again to reproduce the error, but when the system freezes I
> have the X interface running. And, indeed, I'm not familiar with the
> behaviour of sysrq. Which sysrq key should I use when the problem
> happens? Will the system switch to the text interface? If there is no
> disk activity (all LEDs are off), where will the information be recorded?

Well, given that you can reproduce it fairly reliably, can't you just
switch to a text console before triggering your reproduction case? You
can use different VT consoles, switching between them using Alt-F2,
Alt-F3, Alt-F4, etc., instead of using different terminal windows.

Depending on how badly the system is wedged, it may not be possible to
record the information to disk. Usually what folks will do is use a
digital camera and record snapshots from the text console, or, if they
have a serial console set up, they can record output on another
machine. A serial console has the advantage that you can reliably
capture the entire sysrq output (you send a serial break character
instead of using the sysrq key) and it also works even if you have X
running. As such, it's the preferred method, but it's a bit of a pain
for most people to set up, and some modern laptops no longer have 8250
serial ports, so we make do with what we have.
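
As a rough illustration of the serial console setup, the machine under test needs console= parameters on its kernel command line, and the second machine just needs a terminal program that logs what arrives on its serial port. The sketch below only builds the parameter string; the port name ttyS0, the 115200 baud rate, and the helper name serial_console_args are assumptions for the example, not something taken from the reporter's system.

```shell
# Sketch: build the kernel boot parameters for a serial console.
# Assumptions: first serial port (ttyS0), 115200 baud, 8 data bits,
# no parity. Adjust to the hardware actually present.
serial_console_args() {
    port=${1:-ttyS0}
    baud=${2:-115200}
    # Kernel messages go to every console= listed; the last one also
    # becomes /dev/console.
    printf 'console=tty0 console=%s,%sn8\n' "$port" "$baud"
}

# Print the parameters to append to the kernel line in the boot
# loader configuration of the machine under test.
serial_console_args ttyS0 115200
# prints: console=tty0 console=ttyS0,115200n8
```

Keeping console=tty0 in the list means the messages still show up on the text console as well, so the digital camera fallback keeps working.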

Regards,

- Ted

Subject: Re: possible ext4 race situation freezing linux

Ts'o,

The problem still exists. Now, when executing 'fsck.ext4 -C 0 -F -y -v',
I get a list of inodes, and at a certain point the system freezes.

Attached is the SysRq output, as requested.

Cheers
-- Berendsen



Attachments:
messages.gz (28.75 kB)

2009-03-04 06:56:51

by Eric Sandeen

Subject: Re: possible ext4 race situation freezing linux

Andreas Friedrich Berendsen wrote:
> Ts'o,
>
> The problem still exists. Now, when executing 'fsck.ext4 -C 0 -F -y -v',
> I get a list of inodes, and at a certain point the system freezes.

So now it's freezing when ext4 isn't even mounted but simply being
fsck'd? This may point to a generic storage problem.

> Attached is the SysRq output, as requested.

$ zcat messages.gz | grep -i sysrq
$

The sysrq output didn't seem to make it to that log.
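
A slightly more telling variant of the check above counts the matching lines instead of just listing them, so an empty result is unambiguous. This is only a sketch; the helper name sysrq_line_count is made up here, and the match is case-insensitive because the exact prefix the kernel prints differs between versions.

```shell
# Sketch: count lines mentioning SysRq in a gzip-compressed log.
sysrq_line_count() {
    # grep -c prints 0 and exits non-zero when nothing matches, so
    # "|| true" keeps the function usable under "set -e".
    gzip -dc -- "$1" | grep -ci 'sysrq' || true
}

# Example:
#   sysrq_line_count messages.gz
```

A result of 0 means the key presses never reached the log, either because syslogd could no longer write to disk or because the SysRq keys were not enabled.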

-Eric

Subject: Re: possible ext4 race situation freezing linux

Steps:

1. Original FS with problems.
2. Running fsck with -y was problematic because at a certain point a
segmentation fault occurred.
3. Ran fsck manually, answering 'y' to all questions except the one
that caused the segmentation fault.
4. Moved as many files as possible from the FS to a new LV in the same VG.
5. A new fsck run worked.
6. resize2fs+lvreduce to reduce the FS size and free up more space in the VG.
7. Moved more files to a new LV in the same VG.
8. A new fsck run worked.
9. resize2fs to prepare for a new lvreduce. Power failure after almost
24 hours of running.
10. After the system rebooted, the FS could be mounted and seemed to be OK.
Ran find, find+grep, cp, and other tools to check file
accessibility. No messages in /var/log/messages.
11. New fsck run. System froze.
12. As requested, used Alt+PrintScreen+(dlmpqtvw).
13. Used Alt+PrintScreen+(resuib) to restart the system.
14. System restarted.
15. Copied /var/log/messages to /tmp/messages and removed the lines before
and after the Alt+PrintScreen commands.
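
The trimming in step 15 can be sketched roughly as below. This is an illustration only, not the exact procedure used: the helper name log_from_first_sysrq is made up, it assumes the kernel tags its SysRq messages with the word "SysRq" in some capitalisation, and it only drops the lines before the first such message, leaving the tail to be trimmed by hand.

```shell
# Sketch: print a log from the first line mentioning SysRq onward.
log_from_first_sysrq() {
    # Once a SysRq line has been seen, print it and everything after it.
    awk '/[Ss]ys[Rr]q/ { seen = 1 } seen' "$1"
}

# Example:
#   log_from_first_sysrq /var/log/messages > /tmp/messages
```

If the function prints nothing, no SysRq message was logged at all and the extract does not contain the requested dumps.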

Problem can be reproduced as many times as needed.

Do you want me to execute any procedures to collect data?

Extract from /tmp/messages:

Mar 4 19:19:28 storage kernel: ------------[ cut here ]------------
Mar 4 19:19:28 storage kernel: WARNING: at arch/x86/mm/ioremap.c:226 __ioremap_caller+0xc7/0x299()
Mar 4 19:19:28 storage kernel: Modules linked in: ck804xrom(+) i2c_core mtd chipreg map_funcs joydev usb_storage ata_generic pata_acpi pata_amd [last unloaded: scsi_wait_scan]
Mar 4 19:19:28 storage kernel: Pid: 881, comm: modprobe Not tainted 2.6.28.7.afb.fc10.4.x86_amd64 #1
Mar 4 19:19:28 storage kernel: Call Trace:
Mar 4 19:19:28 storage kernel: [<ffffffff8104516d>] warn_on_slowpath+0x58/0x7d
Mar 4 19:19:28 storage kernel: [<ffffffff810458a5>] ? release_console_sem+0x1c6/0x1fb
Mar 4 19:19:28 storage kernel: [<ffffffff81347775>] ? printk+0x3c/0x3f
Mar 4 19:19:28 storage kernel: [<ffffffff81028787>] ? default_spin_lock_flags+0x9/0xe
Mar 4 19:19:28 storage kernel: [<ffffffffa006c000>] ? init_ck804xrom+0x0/0x556 [ck804xrom]
Mar 4 19:19:28 storage kernel: [<ffffffff8102e4a6>] __ioremap_caller+0xc7/0x299
Mar 4 19:19:28 storage kernel: [<ffffffffa006c25c>] ? init_ck804xrom+0x25c/0x556 [ck804xrom]
Mar 4 19:19:28 storage kernel: [<ffffffffa006c000>] ? init_ck804xrom+0x0/0x556 [ck804xrom]
Mar 4 19:19:28 storage kernel: [<ffffffff8102e74d>] ioremap_nocache+0x12/0x14
Mar 4 19:19:28 storage kernel: [<ffffffffa006c25c>] init_ck804xrom+0x25c/0x556 [ck804xrom]
Mar 4 19:19:28 storage kernel: [<ffffffff810ae835>] ? vfree+0x29/0x2b
Mar 4 19:19:28 storage kernel: [<ffffffff810699e5>] ? load_module+0x1803/0x197a
Mar 4 19:19:28 storage kernel: [<ffffffffa006c000>] ? init_ck804xrom+0x0/0x556 [ck804xrom]
Mar 4 19:19:28 storage kernel: [<ffffffff8100a058>] do_one_initcall+0x58/0x145
Mar 4 19:19:28 storage kernel: [<ffffffff810c612c>] ? do_sync_read+0xe7/0x12d
Mar 4 19:19:28 storage kernel: [<ffffffff81069ce2>] sys_init_module+0xa9/0x1b6
Mar 4 19:19:28 storage kernel: [<ffffffff8101104a>] system_call_fastpath+0x16/0x1b
Mar 4 19:19:28 storage kernel: ---[ end trace 66d1cdaa6433edb1 ]---




Subject: Re: possible ext4 race situation freezing linux

After downloading and installing kernel 2.6.29-rc7, fsck is working.
It has found a few (451) files with multiply-claimed blocks.
fsck has been running for the last 12 hours and looks like it is doing
the job.

-----Original Message-----
From: Andreas Friedrich Berendsen <[email protected]>
To: Eric Sandeen <[email protected]>
Cc: Theodore Tso <[email protected]>, linux-ext4
<[email protected]>
Subject: Re: possible ext4 race situation freezing linux
Date: Wed, 04 Mar 2009 21:03:35 +1300
