2008-02-15 16:45:23

by Laurent CORBES

Subject: [BUG] OOPS 2.6.24.2 raid5 write with ioatdma

Hi all,

I got a raid5 oops when trying to write on a raid 5 array, with ioatdma loaded
and without DCA activated in bios:

------------[ cut here ]------------
kernel BUG at crypto/async_tx/async_xor.c:185!
invalid opcode: 0000 [#2] SMP
Modules linked in: dm_snapshot dm_mirror dm_mod thermal parport_pc parport button processor

Pid: 1135, comm: md11_raid5 Tainted: G D (2.6.24.2-sj-std-p4-smp #2)
EIP: 0060:[<c020713b>] EFLAGS: 00010202 CPU: 2
EIP is at async_xor+0x31b/0x320
EAX: f7556f5c EBX: f7556f5c ECX: f77433f8 EDX: c039d410
ESI: 00000001 EDI: 00000001 EBP: 00000000 ESP: f756dcfc
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process md11_raid5 (pid: 1135, ti=f756c000 task=f7032db0 task.ti=f756c000)
Stack: 00000001 00000000 c02070fc 00000400 00000000 f756dd70 c16eda20 00000000
00000000 3770d000 00000001 0000001a 00000000 00000001 f7556f5c f77433f8
c039d410 00000002 f61fa940 f756dd70 f756ddc0 c039d841 00000001 00001000
Call Trace:
[<c02070fc>] async_xor+0x2dc/0x320
[<c039d410>] ops_complete_write+0x0/0x60
[<c039d841>] ops_run_postxor+0xd1/0x160
[<c039d410>] ops_complete_write+0x0/0x60
[<c039c569>] async_copy_data+0x79/0x140
[<c039e9d1>] handle_stripe5+0x1021/0x1570
[<c02e882d>] scsi_alloc_sgtable+0x7d/0x1d0
[<c02e89d2>] scsi_init_io+0x52/0xd0
[<c02e8497>] scsi_get_cmd_from_req+0x27/0x40
[<c03adb20>] md_thread+0x0/0xe0
[<c039ff08>] handle_stripe+0x28/0xef0
[<c03adb20>] md_thread+0x0/0xe0
[<c03ac154>] md_check_recovery+0x24/0x4e0
[<c043009b>] schedule+0x1fb/0x7c0
[<c03adb20>] md_thread+0x0/0xe0
[<c03a114d>] raid5d+0x37d/0x400
[<c0127837>] lock_timer_base+0x27/0x60
[<c01278ce>] del_timer_sync+0xe/0x20
[<c04308a1>] schedule_timeout+0x51/0xc0
[<c03adb20>] md_thread+0x0/0xe0
[<c03adb20>] md_thread+0x0/0xe0
[<c03adb43>] md_thread+0x23/0xe0
[<c0131900>] autoremove_wake_function+0x0/0x40
[<c03adb20>] md_thread+0x0/0xe0
[<c0131652>] kthread+0x42/0x70
[<c0131610>] kthread+0x0/0x70
[<c01037b7>] kernel_thread_helper+0x7/0x10
=======================
Code: fe ff ff 8b 5c 24 64 c7 43 04 01 00 00 00 e9 63 fe ff ff 0f 0b eb fe c7 44 24 04 a9 41 44 c0 c7 04 24 9c e7 4b c0 e8 15 7d f1 ff <0f> 0b eb fe 90 55 57 89 cf 56 53 89 d3 83 ec 20 ba 05 00 00 00
EIP: [<c020713b>] async_xor+0x31b/0x320 SS:ESP 0068:f756dcfc
---[ end trace 091e56cc9ca29fd6 ]---

It seems that async_tx cannot process the xor (perhaps it tried to use ioatdma
and failed?).

When I enable DCA in system bios I cannot boot, the ioatdma subsystem failed to
initialize, it stalled at:
ioatdma: ioat_dma_test_callback(00008086)

Here is the mdstat:

Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath]
md1 : active raid1 sdd2[3] sdc2[2] sdb2[1] sda2[0]
979840 blocks [4/4] [UUUU]

md2 : active raid1 sdh2[3] sdg2[2] sdf2[1] sde2[0]
979840 blocks [4/4] [UUUU]

md10 : active raid5 sdh3[7] sdg3[6] sdf3[5] sde3[4] sdd3[3] sdc3[2] sdb3[1] sda3[0]
6830121984 blocks level 5, 256k chunk, algorithm 2 [8/8] [UUUUUUUU]
bitmap: 0/233 pages [0KB], 2048KB chunk

md3 : active raid1 sdl2[3] sdk2[2] sdj2[1] sdi2[0]
979840 blocks [4/4] [UUUU]

md4 : active raid1 sdp2[3] sdo2[2] sdn2[1] sdm2[0]
979840 blocks [4/4] [UUUU]

md11 : active raid5 sdp3[7] sdo3[6] sdn3[5] sdm3[4] sdl3[3] sdk3[2] sdj3[1] sdi3[0]
6830121984 blocks level 5, 256k chunk, algorithm 2 [8/8] [UUUUUUUU]
bitmap: 0/233 pages [0KB], 2048KB chunk

md0 : active raid1 sdb1[1] sda1[0]
48064 blocks [2/2] [UU]

unused devices: <none>

Here is the lspci:
00:00.0 Host bridge: Intel Corporation 5000P Chipset Memory Controller Hub (rev b1)
00:02.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x8 Port 2-3 (rev b1)
00:04.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x8 Port 4-5 (rev b1)
00:06.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x8 Port 6-7 (rev b1)
00:08.0 System peripheral: Intel Corporation 5000 Series Chipset DMA Engine (rev b1)
00:10.0 Host bridge: Intel Corporation 5000 Series Chipset Error Reporting Registers (rev b1)
00:10.1 Host bridge: Intel Corporation 5000 Series Chipset Error Reporting Registers (rev b1)
00:10.2 Host bridge: Intel Corporation 5000 Series Chipset Error Reporting Registers (rev b1)
00:11.0 Host bridge: Intel Corporation 5000 Series Chipset Reserved Registers (rev b1)
00:13.0 Host bridge: Intel Corporation 5000 Series Chipset Reserved Registers (rev b1)
00:15.0 Host bridge: Intel Corporation 5000 Series Chipset FBD Registers (rev b1)
00:16.0 Host bridge: Intel Corporation 5000 Series Chipset FBD Registers (rev b1)
00:1c.0 PCI bridge: Intel Corporation 631xESB/632xESB/3100 Chipset PCI Express Root Port 1 (rev 09)
00:1d.0 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #1 (rev 09)
00:1d.1 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #2 (rev 09)
00:1d.2 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #3 (rev 09)
00:1d.7 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset EHCI USB2 Controller (rev 09)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d9)
00:1f.0 ISA bridge: Intel Corporation 631xESB/632xESB/3100 Chipset LPC Interface Controller (rev 09)
00:1f.1 IDE interface: Intel Corporation 631xESB/632xESB IDE Controller (rev 09)
00:1f.3 SMBus: Intel Corporation 631xESB/632xESB/3100 Chipset SMBus Controller (rev 09)
01:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Upstream Port (rev 01)
01:00.3 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express to PCI-X Bridge (rev 01)
02:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E1 (rev 01)
02:02.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E3 (rev 01)
03:00.0 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge A (rev 09)
03:00.2 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge B (rev 09)
06:00.0 Ethernet controller: Intel Corporation 631xESB/632xESB DPT LAN Controller Copper (rev 01)
06:00.1 Ethernet controller: Intel Corporation 631xESB/632xESB DPT LAN Controller Copper (rev 01)
09:00.0 PCI bridge: Intel Corporation 80333 Segment-A PCI Express-to-PCI Express Bridge
09:00.2 PCI bridge: Intel Corporation 80333 Segment-B PCI Express-to-PCI Express Bridge
0a:0e.0 RAID bus controller: Areca Technology Corp. ARC-1260 16-Port PCI-Express to SATA RAID Controller
0d:01.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)

The full dmesg is attached.

Thanks.
--
Laurent Corbes - [email protected]
+33 (0)1 4996 6325
Smartjog SA - http://www.smartjog.com/


Attachments:
dmesg (42.51 kB)

2008-02-15 19:17:58

by Dan Williams

Subject: Re: [BUG] OOPS 2.6.24.2 raid5 write with ioatdma

On Fri, Feb 15, 2008 at 9:19 AM, Laurent CORBES
<[email protected]> wrote:
> Hi all,
>
> I got a raid5 oops when trying to write on a raid 5 array, with ioatdma loaded
> and without DCA activated in bios:
>

At first glance I believe the attached patch may fix the issue, I'll
try to reproduce this locally.

Regards,
Dan


Attachments:
ioat-init-ack.patch (861.00 B)

2008-02-15 19:43:36

by Shannon Nelson

Subject: Re: [BUG] OOPS 2.6.24.2 raid5 write with ioatdma

On Fri, Feb 15, 2008 at 8:19 AM, Laurent CORBES
<[email protected]> wrote:
> Hi all,
>
> I got a raid5 oops when trying to write on a raid 5 array, with ioatdma loaded
> and without DCA activated in bios:
>
[...]

Dan's quick patch is likely the right answer.

>
> When I enable DCA in system bios I cannot boot, the ioatdma subsystem failed to
> initialize, it stalled at:
> ioatdma: ioat_dma_test_callback(00008086)

I don't see this message in the dmesg, and I'm not sure why you think
ioatdma failed to initialize. If DCA is enabled, you should not see any
further messages from ioatdma after the ioat_self_test_callback().

sln
--
==============================================
Mr. Shannon Nelson Parents can't afford to be squeamish.