2009-09-09 10:14:41

by Thomas Fjellstrom

Subject: 2.6.31-rc8 libata WARNING?

OK, after some more tests I am fairly certain that my disk is not at fault.
Testing just ONE of my two WD Green drives, with no md RAID or filesystem on
top, I get the following:

[ 329.394283] ------------[ cut here ]------------
[ 329.394361] WARNING: at drivers/ata/libata-core.c:5129
ata_qc_issue+0x10a/0x347 [libata]()
[ 329.394367] Hardware name: GA-MA790FXT-UD5P
[ 329.394371] Modules linked in: powernow_k8 cpufreq_conservative
cpufreq_stats cpufreq_userspace cpufreq_powersave kvm_amd kvm nfsd exportfs
nfs lockd fscache nfs_acl auth_rpcgss sunrpc bridge stp it87 hwmon_vid adt7473
firewire_sbp2 loop md_mod snd_hda_codec_realtek snd_hda_intel snd_hda_codec
snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm amd64_edac_mod snd_seq_midi
snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device edac_core snd
i2c_piix4 soundcore i2c_core snd_page_alloc evdev parport_pc parport button
wmi processor ext3 jbd mbcache dm_mod sg usbhid sr_mod hid cdrom pata_jmicron
firewire_ohci firewire_core ata_generic sd_mod crc_t10dif ohci_hcd crc_itu_t
atiixp ide_pci_generic ide_core ahci mvsas ehci_hcd libsas libata
scsi_transport_sas scsi_mod r8169 mii floppy thermal fan thermal_sys [last
unloaded: scsi_wait_scan]
[ 329.394488] Pid: 3103, comm: hddtemp Not tainted 2.6.31-rc8 #1
[ 329.394493] Call Trace:
[ 329.394540] [<ffffffffa007cf90>] ? ata_qc_issue+0x10a/0x347 [libata]
[ 329.394583] [<ffffffffa007cf90>] ? ata_qc_issue+0x10a/0x347 [libata]
[ 329.394596] [<ffffffff8104aca0>] ? warn_slowpath_common+0x77/0xa3
[ 329.394639] [<ffffffffa00810ce>] ? ata_scsi_pass_thru+0x0/0x240 [libata]
[ 329.394680] [<ffffffffa007cf90>] ? ata_qc_issue+0x10a/0x347 [libata]
[ 329.394726] [<ffffffffa00401d6>] ? scsi_get_command+0x75/0x97 [scsi_mod]
[ 329.394768] [<ffffffffa00810ce>] ? ata_scsi_pass_thru+0x0/0x240 [libata]
[ 329.394807] [<ffffffffa003f7aa>] ? scsi_done+0x0/0xc [scsi_mod]
[ 329.394850] [<ffffffffa00824d5>] ? __ata_scsi_queuecmd+0x185/0x1dc
[libata]
[ 329.394889] [<ffffffffa003f7aa>] ? scsi_done+0x0/0xc [scsi_mod]
[ 329.394911] [<ffffffffa00abc8e>] ? sas_queuecommand+0x83/0x25d [libsas]
[ 329.394949] [<ffffffffa003fa7c>] ? scsi_dispatch_cmd+0x1c0/0x23c
[scsi_mod]
[ 329.394988] [<ffffffffa0044ff0>] ? scsi_request_fn+0x3a5/0x506 [scsi_mod]
[ 329.394999] [<ffffffff810546e0>] ? del_timer+0x59/0x62
[ 329.395009] [<ffffffff81163b08>] ? blk_execute_rq_nowait+0x65/0x89
[ 329.395024] [<ffffffffa016764f>] ? sg_common_write+0x489/0x4ab [sg]
[ 329.395034] [<ffffffff8115deee>] ? __freed_request+0x26/0x83
[ 329.395048] [<ffffffffa01681da>] ? sg_new_write+0x23e/0x269 [sg]
[ 329.395062] [<ffffffffa0168473>] ? sg_ioctl+0x26e/0xb63 [sg]
[ 329.395072] [<ffffffff81100ef8>] ? inotify_d_instantiate+0x12/0x39
[ 329.395081] [<ffffffff8105eee6>] ? autoremove_wake_function+0x0/0x2e
[ 329.395090] [<ffffffff810d8097>] ? fd_install+0x2e/0x5a
[ 329.395097] [<ffffffff810e5207>] ? vfs_ioctl+0x56/0x6c
[ 329.395104] [<ffffffff810e56ca>] ? do_vfs_ioctl+0x437/0x475
[ 329.395111] [<ffffffff810e5759>] ? sys_ioctl+0x51/0x70
[ 329.395121] [<ffffffff81010a02>] ? system_call_fastpath+0x16/0x1b
[ 329.395127] ---[ end trace 2e6f5d9886b0398e ]---


That happens after only a couple of minutes of: dd if=/dev/sdc of=/dev/null

This is with the WD drive that wasn't previously showing up in any dmesg
logs. I assume that if I let the dd test run, I will keep seeing more errors
until the entire libata subsystem causes the SATA driver to keel over and
die.

I'm going to let it run for a while to see what happens.

--
Thomas Fjellstrom


2009-09-09 16:30:15

by Thomas Fjellstrom

Subject: 2.6.31-rc9 kernel BUG (was: Re: 2.6.31-rc8 libata WARNING?)

On Wed September 9 2009, Thomas Fjellstrom wrote:
> Ok, After doing some more tests, I can be fairly certain that my disk is
> not at fault, testing just ONE of my 2 WD Green drives, not using any md
> raid, or filesystem on top I get the following:
>
> [ 329.394283] ------------[ cut here ]------------
> [ 329.394361] WARNING: at drivers/ata/libata-core.c:5129
> ata_qc_issue+0x10a/0x347 [libata]()
> [ 329.394367] Hardware name: GA-MA790FXT-UD5P
> [ 329.394371] Modules linked in: powernow_k8 cpufreq_conservative
> cpufreq_stats cpufreq_userspace cpufreq_powersave kvm_amd kvm nfsd exportfs
> nfs lockd fscache nfs_acl auth_rpcgss sunrpc bridge stp it87 hwmon_vid
> adt7473 firewire_sbp2 loop md_mod snd_hda_codec_realtek snd_hda_intel
> snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm amd64_edac_mod
> snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer
> snd_seq_device edac_core snd i2c_piix4 soundcore i2c_core snd_page_alloc
> evdev parport_pc parport button wmi processor ext3 jbd mbcache dm_mod sg
> usbhid sr_mod hid cdrom pata_jmicron firewire_ohci firewire_core
> ata_generic sd_mod crc_t10dif ohci_hcd crc_itu_t atiixp ide_pci_generic
> ide_core ahci mvsas ehci_hcd libsas libata
> scsi_transport_sas scsi_mod r8169 mii floppy thermal fan thermal_sys [last
> unloaded: scsi_wait_scan]
> [ 329.394488] Pid: 3103, comm: hddtemp Not tainted 2.6.31-rc8 #1
> [ 329.394493] Call Trace:
> [ 329.394540] [<ffffffffa007cf90>] ? ata_qc_issue+0x10a/0x347 [libata]
> [ 329.394583] [<ffffffffa007cf90>] ? ata_qc_issue+0x10a/0x347 [libata]
> [ 329.394596] [<ffffffff8104aca0>] ? warn_slowpath_common+0x77/0xa3
> [ 329.394639] [<ffffffffa00810ce>] ? ata_scsi_pass_thru+0x0/0x240 [libata]
> [ 329.394680] [<ffffffffa007cf90>] ? ata_qc_issue+0x10a/0x347 [libata]
> [ 329.394726] [<ffffffffa00401d6>] ? scsi_get_command+0x75/0x97 [scsi_mod]
> [ 329.394768] [<ffffffffa00810ce>] ? ata_scsi_pass_thru+0x0/0x240 [libata]
> [ 329.394807] [<ffffffffa003f7aa>] ? scsi_done+0x0/0xc [scsi_mod]
> [ 329.394850] [<ffffffffa00824d5>] ? __ata_scsi_queuecmd+0x185/0x1dc [libata]
> [ 329.394889] [<ffffffffa003f7aa>] ? scsi_done+0x0/0xc [scsi_mod]
> [ 329.394911] [<ffffffffa00abc8e>] ? sas_queuecommand+0x83/0x25d [libsas]
> [ 329.394949] [<ffffffffa003fa7c>] ? scsi_dispatch_cmd+0x1c0/0x23c [scsi_mod]
> [ 329.394988] [<ffffffffa0044ff0>] ? scsi_request_fn+0x3a5/0x506 [scsi_mod]
> [ 329.394999] [<ffffffff810546e0>] ? del_timer+0x59/0x62
> [ 329.395009] [<ffffffff81163b08>] ? blk_execute_rq_nowait+0x65/0x89
> [ 329.395024] [<ffffffffa016764f>] ? sg_common_write+0x489/0x4ab [sg]
> [ 329.395034] [<ffffffff8115deee>] ? __freed_request+0x26/0x83
> [ 329.395048] [<ffffffffa01681da>] ? sg_new_write+0x23e/0x269 [sg]
> [ 329.395062] [<ffffffffa0168473>] ? sg_ioctl+0x26e/0xb63 [sg]
> [ 329.395072] [<ffffffff81100ef8>] ? inotify_d_instantiate+0x12/0x39
> [ 329.395081] [<ffffffff8105eee6>] ? autoremove_wake_function+0x0/0x2e
> [ 329.395090] [<ffffffff810d8097>] ? fd_install+0x2e/0x5a
> [ 329.395097] [<ffffffff810e5207>] ? vfs_ioctl+0x56/0x6c
> [ 329.395104] [<ffffffff810e56ca>] ? do_vfs_ioctl+0x437/0x475
> [ 329.395111] [<ffffffff810e5759>] ? sys_ioctl+0x51/0x70
> [ 329.395121] [<ffffffff81010a02>] ? system_call_fastpath+0x16/0x1b
> [ 329.395127] ---[ end trace 2e6f5d9886b0398e ]---
>
>
> And that happens after only a couple minutes of: dd if=/dev/sdc
> of=/dev/null
>
> And this is with the WD that wasn't previously showing up in any dmesg
> logs. I'm assuming if I let the dd test run, I will continue to see more
> errors until the entire libata subsystem causes the sata driver to kneel
> over and die.
>
> I'm going to let it run for a while to see what happens.
>

No errors on that disk, other than the one above, and it's more of a warning.
However, I just rebooted to add some extra drives, thinking everything was
working a little better now that I've updated to 2.6.31-rc9, and I'm treated
to the following two messages right after boot (and a system lockup to boot):

kernel: [ 971.033138] ------------[ cut here ]------------
kernel: [ 971.033211] WARNING: at drivers/ata/libata-core.c:4913
__ata_qc_complete+0x5a/0xe1 [libata]()
kernel: [ 971.033217] Hardware name: GA-MA790FXT-UD5P
kernel: [ 971.033221] Modules linked in: powernow_k8 cpufreq_conservative
cpufreq_stats cpufreq_userspace cpufreq_powersave kvm_amd kvm nfsd exportfs
nfs lockd fscache nfs_acl auth_rpcgss sunrpc bridge stp it87 hwmon_vid adt7473
firewire_sbp2 loop md_mod snd_hda_codec_realtek snd_hda_intel snd_hda_codec
snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_midi snd_rawmidi
snd_seq_midi_event snd_seq snd_timer snd_seq_device snd amd64_edac_mod
edac_core i2c_piix4 soundcore snd_page_alloc i2c_core evdev wmi parport_pc
button parport processor ext3 jbd mbcache dm_mod sg sr_mod cdrom sd_mod
crc_t10dif usbhid ata_generic ide_pci_generic hid mvsas firewire_ohci libsas
firewire_core crc_itu_t scsi_transport_sas r8169 atiixp ide_core floppy ahci
mii ohci_hcd libata ehci_hcd scsi_mod thermal fan thermal_sys [last unloaded:
scsi_wait_scan]
kernel: [ 971.033337] Pid: 0, comm: swapper Not tainted 2.6.31-rc9 #2
kernel: [ 971.033342] Call Trace:
kernel: [ 971.033346] <IRQ> [<ffffffffa00562ca>] ?
__ata_qc_complete+0x5a/0xe1 [libata]
kernel: [ 971.033434] [<ffffffffa00562ca>] ? __ata_qc_complete+0x5a/0xe1
[libata]
kernel: [ 971.033446] [<ffffffff8104aca0>] ? warn_slowpath_common+0x77/0xa3
kernel: [ 971.033455] [<ffffffff81038d06>] ? enqueue_task+0x5c/0x65
kernel: [ 971.033496] [<ffffffffa00562ca>] ? __ata_qc_complete+0x5a/0xe1
[libata]
kernel: [ 971.033519] [<ffffffffa00f7b59>] ? sas_ata_task_done+0x178/0x210
[libsas]
kernel: [ 971.033528] [<ffffffff8115ead1>] ? blk_run_queue+0x21/0x35
kernel: [ 971.033548] [<ffffffffa010e2ce>] ? mvs_slot_complete+0x3df/0x41b
[mvsas]
kernel: [ 971.033565] [<ffffffffa010e39c>] ? mvs_int_rx+0x92/0x101 [mvsas]
kernel: [ 971.033583] [<ffffffffa01112ba>] ? mvs_int_full+0x25/0x88 [mvsas]
kernel: [ 971.033600] [<ffffffffa011134e>] ? mvs_64xx_isr+0x31/0x40 [mvsas]
kernel: [ 971.033617] [<ffffffffa010d0e5>] ? mvs_interrupt+0x61/0x78 [mvsas]
kernel: [ 971.033625] [<ffffffff8108aaac>] ? handle_IRQ_event+0x58/0x135
kernel: [ 971.033633] [<ffffffff8108c1a1>] ? handle_fasteoi_irq+0x7d/0xb5
kernel: [ 971.033642] [<ffffffff8101388d>] ? handle_irq+0x17/0x1d
kernel: [ 971.033649] [<ffffffff81012ee5>] ? do_IRQ+0x57/0xb6
kernel: [ 971.033656] [<ffffffff81011413>] ? ret_from_intr+0x0/0x11
kernel: [ 971.033660] <EOI> [<ffffffff8102b520>] ? native_safe_halt+0x2/0x3
kernel: [ 971.033676] [<ffffffff81017c61>] ? default_idle+0x40/0x68
kernel: [ 971.033684] [<ffffffff810684d0>] ? clockevents_notify+0x2b/0x7c
kernel: [ 971.033692] [<ffffffff8101805e>] ? c1e_idle+0xd3/0xfb
kernel: [ 971.033700] [<ffffffff8100fd9b>] ? cpu_idle+0x50/0x91
kernel: [ 971.033706] ---[ end trace bb4a1fceddfa8284 ]---
kernel: [ 998.728950] ------------[ cut here ]------------
kernel: [ 998.728961] kernel BUG at mm/slab.c:2974!
kernel: [ 998.728967] invalid opcode: 0000 [#1] SMP
kernel: [ 998.728974] last sysfs file:
/sys/devices/platform/it87.552/temp1_input
kernel: [ 998.728979] CPU 2
kernel: [ 998.728983] Modules linked in: powernow_k8 cpufreq_conservative
cpufreq_stats cpufreq_userspace cpufreq_powersave kvm_amd kvm nfsd exportfs
nfs lockd fscache nfs_acl auth_rpcgss sunrpc bridge stp it87 hwmon_vid adt7473
firewire_sbp2 loop md_mod snd_hda_codec_realtek snd_hda_intel snd_hda_codec
snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_midi snd_rawmidi
snd_seq_midi_event snd_seq snd_timer snd_seq_device snd amd64_edac_mod
edac_core i2c_piix4 soundcore snd_page_alloc i2c_core evdev wmi parport_pc
button parport processor ext3 jbd mbcache dm_mod sg sr_mod cdrom sd_mod
crc_t10dif usbhid ata_generic ide_pci_generic hid mvsas firewire_ohci libsas
firewire_core crc_itu_t scsi_transport_sas r8169 atiixp ide_core floppy ahci
mii ohci_hcd libata ehci_hcd scsi_mod thermal fan thermal_sys [last unloaded:
scsi_wait_scan]
kernel: [ 998.729105] Pid: 8278, comm: hddtemp Tainted: G W 2.6.31-rc9 #2
GA-MA790FXT-UD5P
kernel: [ 998.729111] RIP: 0010:[<ffffffff810d4c17>] [<ffffffff810d4c17>]
cache_alloc_refill+0xf6/0x1f9
kernel: [ 998.729128] RSP: 0018:ffff88012e1dfab8 EFLAGS: 00010086
kernel: [ 998.729134] RAX: 00000000fffffffe RBX: ffff88012b90cc40 RCX:
0000000000000000
kernel: [ 998.729140] RDX: 0000000000000000 RSI: ffff880109597140 RDI:
ffff88012b90cc50
kernel: [ 998.729145] RBP: ffff88012b911a00 R08: ffff88012b90cc60 R09:
0000000000000086
kernel: [ 998.729151] R10: 00007fff9c05cc30 R11: 0000000100000002 R12:
0000000000000010
kernel: [ 998.729156] R13: ffff88012b9366c0 R14: 0000000000049220 R15:
0000000000000000
kernel: [ 998.729163] FS: 00007f07cfd826f0(0000) GS:ffff88002805c000(0000)
knlGS:00000000f76fbbb0
kernel: [ 998.729169] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: [ 998.729174] CR2: 00000000084ec298 CR3: 000000012e1d9000 CR4:
00000000000006e0
kernel: [ 998.729179] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
kernel: [ 998.729185] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
kernel: [ 998.729191] Process hddtemp (pid: 8278, threadinfo
ffff88012e1de000, task ffff88012d8dd100)
kernel: [ 998.729196] Stack:
kernel: [ 998.729199] ffff880105c64918 ffff88012b9366c0 ffff88012ac97a00
0000000000008020
kernel: [ 998.729207] <0> 0000000000000002 0000000000008020 ffff88012bc8e000
ffffffff810d4fb7
kernel: [ 998.729216] <0> 0000000000000000 ffff88012ac97a00 ffff88012aed40d0
ffffffffa005c0ce
kernel: [ 998.729225] Call Trace:
kernel: [ 998.729236] [<ffffffff810d4fb7>] ? kmem_cache_alloc+0xe9/0x175
kernel: [ 998.729287] [<ffffffffa005c0ce>] ? ata_scsi_pass_thru+0x0/0x240
[libata]
kernel: [ 998.729311] [<ffffffffa00f7055>] ? sas_alloc_task+0x14/0x62
[libsas]
kernel: [ 998.729331] [<ffffffffa00f77ff>] ? sas_ata_qc_issue+0x3b/0x21d
[libsas]
kernel: [ 998.729373] [<ffffffffa005c0ce>] ? ata_scsi_pass_thru+0x0/0x240
[libata]
kernel: [ 998.729415] [<ffffffffa0058184>] ? ata_qc_issue+0x2fe/0x347
[libata]
kernel: [ 998.729456] [<ffffffffa001b1d6>] ? scsi_get_command+0x75/0x97
[scsi_mod]
kernel: [ 998.729498] [<ffffffffa005c0ce>] ? ata_scsi_pass_thru+0x0/0x240
[libata]
kernel: [ 998.729536] [<ffffffffa001a7aa>] ? scsi_done+0x0/0xc [scsi_mod]
kernel: [ 998.729578] [<ffffffffa005d4d5>] ? __ata_scsi_queuecmd+0x185/0x1dc
[libata]
kernel: [ 998.729615] [<ffffffffa001a7aa>] ? scsi_done+0x0/0xc [scsi_mod]
kernel: [ 998.729635] [<ffffffffa00f6c8e>] ? sas_queuecommand+0x83/0x25d
[libsas]
kernel: [ 998.729673] [<ffffffffa001aa7c>] ? scsi_dispatch_cmd+0x1c0/0x23c
[scsi_mod]
kernel: [ 998.729712] [<ffffffffa001fff0>] ? scsi_request_fn+0x3a5/0x506
[scsi_mod]
kernel: [ 998.729723] [<ffffffff810546e0>] ? del_timer+0x59/0x62
kernel: [ 998.729733] [<ffffffff81163b70>] ? blk_execute_rq_nowait+0x65/0x89
kernel: [ 998.729749] [<ffffffffa016964f>] ? sg_common_write+0x489/0x4ab
[sg]
kernel: [ 998.729759] [<ffffffff8115df56>] ? __freed_request+0x26/0x83
kernel: [ 998.729773] [<ffffffffa016a1da>] ? sg_new_write+0x23e/0x269 [sg]
kernel: [ 998.729786] [<ffffffffa016a473>] ? sg_ioctl+0x26e/0xb63 [sg]
kernel: [ 998.729796] [<ffffffff81100f38>] ? inotify_d_instantiate+0x12/0x39
kernel: [ 998.729805] [<ffffffff8105eee6>] ?
autoremove_wake_function+0x0/0x2e
kernel: [ 998.729813] [<ffffffff810d80bf>] ? fd_install+0x2e/0x5a
kernel: [ 998.729820] [<ffffffff810e5247>] ? vfs_ioctl+0x56/0x6c
kernel: [ 998.729827] [<ffffffff810e570a>] ? do_vfs_ioctl+0x437/0x475
kernel: [ 998.729834] [<ffffffff810e5799>] ? sys_ioctl+0x51/0x70
kernel: [ 998.729844] [<ffffffff81010a02>] ? system_call_fastpath+0x16/0x1b
kernel: [ 998.729848] Code: 00 00 00 48 8b 33 48 39 de 75 14 48 8b 73 20 c7
43 60 01 00 00 00 4c 39 c6 0f 84 a4 00 00 00 8b 46 20 41 3b 85 18 10 00 00 72
31 <0f> 0b eb fe ff c0 8b 4d 00 41 8b 95 0c 10 00 00 89 46 20 8b 46
kernel: [ 998.729913] RIP [<ffffffff810d4c17>] cache_alloc_refill+0xf6/0x1f9
kernel: [ 998.729922] RSP <ffff88012e1dfab8>
kernel: [ 998.729928] ---[ end trace bb4a1fceddfa8285 ]---

The added hard drives are connected to a Supermicro AOC-SASLP-MV8, which is
based on a Marvell MV64460/64461/64462 chipset, which uses the sata_mv driver.
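
For comparing logs across reboots, it can help to pull out just the report
headers (which check fired, and at which source line). The following is a
minimal sketch, assuming the log text has been saved from dmesg or syslog;
the function name find_kernel_reports and the regular expression are
illustrative choices, not part of any kernel tooling.

```python
import re

# Matches the two kinds of report header that appear in logs like the above:
#   WARNING: at drivers/ata/libata-core.c:4913
#   kernel BUG at mm/slab.c:2974!
REPORT_RE = re.compile(r"(WARNING: at|kernel BUG at) (\S+?):(\d+)")


def find_kernel_reports(log_text):
    """Return a list of (kind, source_file, line_number) tuples, in log order."""
    reports = []
    for match in REPORT_RE.finditer(log_text):
        kind = "WARNING" if match.group(1).startswith("WARNING") else "BUG"
        reports.append((kind, match.group(2), int(match.group(3))))
    return reports


if __name__ == "__main__":
    sample = (
        "kernel: [ 971.033211] WARNING: at drivers/ata/libata-core.c:4913\n"
        "__ata_qc_complete+0x5a/0xe1 [libata]()\n"
        "kernel: [ 998.728961] kernel BUG at mm/slab.c:2974!\n"
    )
    for report in find_kernel_reports(sample):
        print(report)
```

The function name that follows a WARNING header is often wrapped onto the
next line by the mail client, so this sketch deliberately captures only the
file and line number.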

--
Thomas Fjellstrom

2009-09-09 18:52:23

by Jeff Garzik

Subject: Re: 2.6.31-rc9 kernel BUG and mvsas

On 09/09/2009 12:30 PM, Thomas Fjellstrom wrote:
> No errors on that disk. Other than the one above, and its more of a warning.
> However, I just rebooted to add some extra drives, thinking everything was
> working a little better now that I've updated to 2.6.31-rc9, I'm treated to
> the following two messages right after boot (and a system lockup to boot):
>
> kernel: [ 971.033138] ------------[ cut here ]------------
> kernel: [ 971.033211] WARNING: at drivers/ata/libata-core.c:4913
> __ata_qc_complete+0x5a/0xe1 [libata]()
> kernel: [ 971.033217] Hardware name: GA-MA790FXT-UD5P
> kernel: [ 971.033221] Modules linked in: powernow_k8 cpufreq_conservative
> cpufreq_stats cpufreq_userspace cpufreq_powersave kvm_amd kvm nfsd exportfs
> nfs lockd fscache nfs_acl auth_rpcgss sunrpc bridge stp it87 hwmon_vid adt7473
> firewire_sbp2 loop md_mod snd_hda_codec_realtek snd_hda_intel snd_hda_codec
> snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_midi snd_rawmidi
> snd_seq_midi_event snd_seq snd_timer snd_seq_device snd amd64_edac_mod
> edac_core i2c_piix4 soundcore snd_page_alloc i2c_core evdev wmi parport_pc
> button parport processor ext3 jbd mbcache dm_mod sg sr_mod cdrom sd_mod
> crc_t10dif usbhid ata_generic ide_pci_generic hid mvsas firewire_ohci libsas
> firewire_core crc_itu_t scsi_transport_sas r8169 atiixp ide_core floppy ahci
> mii ohci_hcd libata ehci_hcd scsi_mod thermal fan thermal_sys [last unloaded:
> scsi_wait_scan]
> kernel: [ 971.033337] Pid: 0, comm: swapper Not tainted 2.6.31-rc9 #2
> kernel: [ 971.033342] Call Trace:
> kernel: [ 971.033346]<IRQ> [<ffffffffa00562ca>] ?
> __ata_qc_complete+0x5a/0xe1 [libata]
> kernel: [ 971.033434] [<ffffffffa00562ca>] ? __ata_qc_complete+0x5a/0xe1
> [libata]
> kernel: [ 971.033446] [<ffffffff8104aca0>] ? warn_slowpath_common+0x77/0xa3
> kernel: [ 971.033455] [<ffffffff81038d06>] ? enqueue_task+0x5c/0x65
> kernel: [ 971.033496] [<ffffffffa00562ca>] ? __ata_qc_complete+0x5a/0xe1
> [libata]
> kernel: [ 971.033519] [<ffffffffa00f7b59>] ? sas_ata_task_done+0x178/0x210
> [libsas]
> kernel: [ 971.033528] [<ffffffff8115ead1>] ? blk_run_queue+0x21/0x35
> kernel: [ 971.033548] [<ffffffffa010e2ce>] ? mvs_slot_complete+0x3df/0x41b
> [mvsas]
> kernel: [ 971.033565] [<ffffffffa010e39c>] ? mvs_int_rx+0x92/0x101 [mvsas]
> kernel: [ 971.033583] [<ffffffffa01112ba>] ? mvs_int_full+0x25/0x88 [mvsas]
> kernel: [ 971.033600] [<ffffffffa011134e>] ? mvs_64xx_isr+0x31/0x40 [mvsas]
> kernel: [ 971.033617] [<ffffffffa010d0e5>] ? mvs_interrupt+0x61/0x78 [mvsas]
> kernel: [ 971.033625] [<ffffffff8108aaac>] ? handle_IRQ_event+0x58/0x135
> kernel: [ 971.033633] [<ffffffff8108c1a1>] ? handle_fasteoi_irq+0x7d/0xb5
> kernel: [ 971.033642] [<ffffffff8101388d>] ? handle_irq+0x17/0x1d

That warning is triggered when an ata_queued_cmd is passed to completion
without the ATA_QCFLAG_ACTIVE flag being set (which indicates the qc was
started with some activity).

That possibly indicates the low-level driver (or libsas) was passing an
already-completed cmd to libata.
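
To illustrate the invariant described above, here is a toy user-space model
of the issue/complete lifecycle. It is only a sketch and not the libata
source: the class QueuedCmd, the functions qc_issue and qc_complete, and the
bit value used for ATA_QCFLAG_ACTIVE are assumptions made for this example.

```python
ATA_QCFLAG_ACTIVE = 1 << 0  # bit value chosen for this model only


class QueuedCmd:
    """Hypothetical stand-in for an ata_queued_cmd: a tag plus flag bits."""

    def __init__(self, tag):
        self.tag = tag
        self.flags = 0


def qc_issue(qc):
    # Issuing a command marks it active.
    qc.flags |= ATA_QCFLAG_ACTIVE


def qc_complete(qc):
    """Complete a command; return False if it was not active (the warning case)."""
    if not (qc.flags & ATA_QCFLAG_ACTIVE):
        print(f"WARNING: qc tag {qc.tag} completed while not active")
        return False
    # A clean completion clears the active bit, so a second completion of the
    # same command trips the check above.
    qc.flags &= ~ATA_QCFLAG_ACTIVE
    return True


if __name__ == "__main__":
    qc = QueuedCmd(tag=3)
    qc_issue(qc)
    qc_complete(qc)  # clean completion
    qc_complete(qc)  # second completion of the same command: warning
```

In this model, a lower layer that hands the same command to completion twice
produces exactly the kind of condition the warning reports.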


> The added hard drives are connected to a Supermicro AOC-SASLP-MV8, which is
> based on a marvel MV64460/64461/64462 chipset, which uses the sata_mv driver.

Surely you mean 'mvsas' driver?

Jeff


2009-09-09 19:10:31

by Thomas Fjellstrom

Subject: Re: 2.6.31-rc9 kernel BUG and mvsas

On Wed September 9 2009, Jeff Garzik wrote:
> On 09/09/2009 12:30 PM, Thomas Fjellstrom wrote:
> > No errors on that disk. Other than the one above, and its more of a
> > warning. However, I just rebooted to add some extra drives, thinking
> > everything was working a little better now that I've updated to
> > 2.6.31-rc9, I'm treated to the following two messages right after boot
> > (and a system lockup to boot):
> >
> > kernel: [ 971.033138] ------------[ cut here ]------------
> > kernel: [ 971.033211] WARNING: at drivers/ata/libata-core.c:4913
> > __ata_qc_complete+0x5a/0xe1 [libata]()
> > kernel: [ 971.033217] Hardware name: GA-MA790FXT-UD5P
> > kernel: [ 971.033221] Modules linked in: powernow_k8
> > cpufreq_conservative cpufreq_stats cpufreq_userspace cpufreq_powersave
> > kvm_amd kvm nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss sunrpc
> > bridge stp it87 hwmon_vid adt7473 firewire_sbp2 loop md_mod
> > snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss
> > snd_mixer_oss snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq
> > snd_timer snd_seq_device snd amd64_edac_mod edac_core i2c_piix4 soundcore
> > snd_page_alloc i2c_core evdev wmi parport_pc button parport processor
> > ext3 jbd mbcache dm_mod sg sr_mod cdrom sd_mod crc_t10dif usbhid
> > ata_generic ide_pci_generic hid mvsas firewire_ohci libsas firewire_core
> > crc_itu_t scsi_transport_sas r8169 atiixp ide_core floppy ahci mii
> > ohci_hcd libata ehci_hcd scsi_mod thermal fan thermal_sys [last unloaded:
> > scsi_wait_scan]
> > kernel: [ 971.033337] Pid: 0, comm: swapper Not tainted 2.6.31-rc9 #2
> > kernel: [ 971.033342] Call Trace:
> > kernel: [ 971.033346]<IRQ> [<ffffffffa00562ca>] ?
> > __ata_qc_complete+0x5a/0xe1 [libata]
> > kernel: [ 971.033434] [<ffffffffa00562ca>] ?
> > __ata_qc_complete+0x5a/0xe1 [libata]
> > kernel: [ 971.033446] [<ffffffff8104aca0>] ? warn_slowpath_common+0x77/0xa3
> > kernel: [ 971.033455] [<ffffffff81038d06>] ? enqueue_task+0x5c/0x65
> > kernel: [ 971.033496] [<ffffffffa00562ca>] ? __ata_qc_complete+0x5a/0xe1 [libata]
> > kernel: [ 971.033519] [<ffffffffa00f7b59>] ? sas_ata_task_done+0x178/0x210 [libsas]
> > kernel: [ 971.033528] [<ffffffff8115ead1>] ? blk_run_queue+0x21/0x35
> > kernel: [ 971.033548] [<ffffffffa010e2ce>] ? mvs_slot_complete+0x3df/0x41b [mvsas]
> > kernel: [ 971.033565] [<ffffffffa010e39c>] ? mvs_int_rx+0x92/0x101 [mvsas]
> > kernel: [ 971.033583] [<ffffffffa01112ba>] ? mvs_int_full+0x25/0x88 [mvsas]
> > kernel: [ 971.033600] [<ffffffffa011134e>] ? mvs_64xx_isr+0x31/0x40 [mvsas]
> > kernel: [ 971.033617] [<ffffffffa010d0e5>] ? mvs_interrupt+0x61/0x78 [mvsas]
> > kernel: [ 971.033625] [<ffffffff8108aaac>] ? handle_IRQ_event+0x58/0x135
> > kernel: [ 971.033633] [<ffffffff8108c1a1>] ? handle_fasteoi_irq+0x7d/0xb5
> > kernel: [ 971.033642] [<ffffffff8101388d>] ? handle_irq+0x17/0x1d
>
> That warning is triggered when an ata_queued_cmd is passed to completion
> without the ATA_QCFLAG_ACTIVE flag being set (which indicates the qc was
> started with some activity).
>
> That possibly indicates the low-level driver (or libsas) was passing an
> already-completed cmd to libata.
>
> > The added hard drives are connected to a Supermicro AOC-SASLP-MV8, which
> > is based on a marvel MV64460/64461/64462 chipset, which uses the sata_mv
> > driver.
>
> Surely you mean 'mvsas' driver?

Yes, sorry, I did mean mvsas.

I am more concerned about the actual oops/BUG than the warning, though,
unless the problem causing the warning is also causing the oops.

> Jeff
>


--
Thomas Fjellstrom

2009-09-09 20:07:12

by Thomas Fjellstrom

Subject: Re: 2.6.31-rc9 kernel BUG and mvsas

On Wed September 9 2009, Thomas Fjellstrom wrote:
> On Wed September 9 2009, Jeff Garzik wrote:
> > On 09/09/2009 12:30 PM, Thomas Fjellstrom wrote:
> > > No errors on that disk. Other than the one above, and its more of a
> > > warning. However, I just rebooted to add some extra drives, thinking
> > > everything was working a little better now that I've updated to
> > > 2.6.31-rc9, I'm treated to the following two messages right after boot
> > > (and a system lockup to boot):
> > >
> > > kernel: [ 971.033138] ------------[ cut here ]------------
> > > kernel: [ 971.033211] WARNING: at drivers/ata/libata-core.c:4913
> > > __ata_qc_complete+0x5a/0xe1 [libata]()
> > > kernel: [ 971.033217] Hardware name: GA-MA790FXT-UD5P
> > > kernel: [ 971.033221] Modules linked in: powernow_k8
> > > cpufreq_conservative cpufreq_stats cpufreq_userspace cpufreq_powersave
> > > kvm_amd kvm nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss sunrpc
> > > bridge stp it87 hwmon_vid adt7473 firewire_sbp2 loop md_mod
> > > snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss
> > > snd_mixer_oss snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event
> > > snd_seq snd_timer snd_seq_device snd amd64_edac_mod edac_core i2c_piix4
> > > soundcore snd_page_alloc i2c_core evdev wmi parport_pc button parport
> > > processor ext3 jbd mbcache dm_mod sg sr_mod cdrom sd_mod crc_t10dif
> > > usbhid ata_generic ide_pci_generic hid mvsas firewire_ohci libsas
> > > firewire_core crc_itu_t scsi_transport_sas r8169 atiixp ide_core floppy
> > > ahci mii ohci_hcd libata ehci_hcd scsi_mod thermal fan thermal_sys
> > > [last unloaded: scsi_wait_scan]
> > > kernel: [ 971.033337] Pid: 0, comm: swapper Not tainted 2.6.31-rc9 #2
> > > kernel: [ 971.033342] Call Trace:
> > > kernel: [ 971.033346]<IRQ> [<ffffffffa00562ca>] ?
> > > __ata_qc_complete+0x5a/0xe1 [libata]
> > > kernel: [ 971.033434] [<ffffffffa00562ca>] ?
> > > __ata_qc_complete+0x5a/0xe1 [libata]
> > > kernel: [ 971.033446] [<ffffffff8104aca0>] ? warn_slowpath_common+0x77/0xa3
> > > kernel: [ 971.033455] [<ffffffff81038d06>] ? enqueue_task+0x5c/0x65
> > > kernel: [ 971.033496] [<ffffffffa00562ca>] ? __ata_qc_complete+0x5a/0xe1 [libata]
> > > kernel: [ 971.033519] [<ffffffffa00f7b59>] ? sas_ata_task_done+0x178/0x210 [libsas]
> > > kernel: [ 971.033528] [<ffffffff8115ead1>] ? blk_run_queue+0x21/0x35
> > > kernel: [ 971.033548] [<ffffffffa010e2ce>] ? mvs_slot_complete+0x3df/0x41b [mvsas]
> > > kernel: [ 971.033565] [<ffffffffa010e39c>] ? mvs_int_rx+0x92/0x101 [mvsas]
> > > kernel: [ 971.033583] [<ffffffffa01112ba>] ? mvs_int_full+0x25/0x88 [mvsas]
> > > kernel: [ 971.033600] [<ffffffffa011134e>] ? mvs_64xx_isr+0x31/0x40 [mvsas]
> > > kernel: [ 971.033617] [<ffffffffa010d0e5>] ? mvs_interrupt+0x61/0x78 [mvsas]
> > > kernel: [ 971.033625] [<ffffffff8108aaac>] ? handle_IRQ_event+0x58/0x135
> > > kernel: [ 971.033633] [<ffffffff8108c1a1>] ? handle_fasteoi_irq+0x7d/0xb5
> > > kernel: [ 971.033642] [<ffffffff8101388d>] ? handle_irq+0x17/0x1d
> >
> > That warning is triggered when an ata_queued_cmd is passed to completion
> > without the ATA_QCFLAG_ACTIVE flag being set (which indicates the qc was
> > started with some activity).
> >
> > That possibly indicates the low-level driver (or libsas) was passing an
> > already-completed cmd to libata.
> >
> > > The added hard drives are connected to a Supermicro AOC-SASLP-MV8,
> > > which is based on a marvel MV64460/64461/64462 chipset, which uses the
> > > sata_mv driver.
> >
> > Surely you mean 'mvsas' driver?
>
> Yes, sorry I did mean mvsas.
>
> I am more concerned about the actual oops/BUG rather than the warning
> though. Unless the problem causing the warning is also causing the oops.
>
> > Jeff
>

Thanks for taking a look so far. But I'm having more and more trouble with
this card as the days go by:

[ 464.792214] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
[ 464.792222] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
[ 494.816170] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
[ 494.816179] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
[ 494.816192] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
[ 494.816197] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
[ 525.817335] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
[ 525.817343] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
[ 525.817358] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
[ 525.817363] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
[ 556.816148] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
[ 556.816157] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
[ 556.816170] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
[ 556.816175] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
[ 587.816171] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
[ 587.816179] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
[ 587.816193] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
[ 587.816199] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
[ 600.616255] INFO: task mount:4395 blocked for more than 120 seconds.
[ 600.616263] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[ 600.616270] mount D 0000000000000000 0 4395 4229 0x00000000
[ 600.616281] ffff88012fb6f780 0000000000000082 ffff8800b808ddc8
ffff8800b80f3d60
[ 600.616290] ffff880128923c90 0000000000014800 000000000000f800
ffff88012dd91840
[ 600.616299] ffff88012dd91b38 0000000300000008 0000000000000000
ffff88010c5bfa88
[ 600.616308] Call Trace:
[ 600.616324] [<ffffffff81017015>] ? read_tsc+0xa/0x20
[ 600.616335] [<ffffffff810adc63>] ? __pagevec_free+0x29/0x3b
[ 600.616343] [<ffffffff810661e9>] ? getnstimeofday+0x55/0xaf
[ 600.616351] [<ffffffff810a8e69>] ? sync_page+0x0/0x46
[ 600.616361] [<ffffffff812dc8cb>] ? io_schedule+0x63/0xa5
[ 600.616368] [<ffffffff810a8eaa>] ? sync_page+0x41/0x46
[ 600.616376] [<ffffffff812dcade>] ? __wait_on_bit_lock+0x3f/0x84
[ 600.616383] [<ffffffff810a8e55>] ? __lock_page+0x5d/0x63
[ 600.616391] [<ffffffff8105ef14>] ? wake_bit_function+0x0/0x23
[ 600.616401] [<ffffffff810b0ad4>] ? pagevec_lookup+0x17/0x1e
[ 600.616408] [<ffffffff810b1bbd>] ? truncate_inode_pages_range+0x288/0x318
[ 600.616418] [<ffffffff810fc8cb>] ? set_blocksize+0xc2/0xd2
[ 600.616426] [<ffffffff810fc8f2>] ? sb_set_blocksize+0x17/0x43
[ 600.616477] [<ffffffffa04aa8e0>] ? ext4_fill_super+0x1cc/0x2060 [ext4]
[ 600.616486] [<ffffffff8117593b>] ? snprintf+0x44/0x4c
[ 600.616493] [<ffffffff810fb5b7>] ? check_disk_change+0x22/0x52
[ 600.616501] [<ffffffff812dd680>] ? __down_write_nested+0x15/0xab
[ 600.616524] [<ffffffff810dbfe6>] ? get_sb_bdev+0x111/0x159
[ 600.616571] [<ffffffffa04aa714>] ? ext4_fill_super+0x0/0x2060 [ext4]
[ 600.616582] [<ffffffff810dbc44>] ? vfs_kern_mount+0x95/0x111
[ 600.616593] [<ffffffff810dbd13>] ? do_kern_mount+0x43/0xe2
[ 600.616607] [<ffffffff810ef49d>] ? do_mount+0x767/0x7d6
[ 600.616620] [<ffffffff810ef591>] ? sys_mount+0x85/0xc8
[ 600.616633] [<ffffffff81010a02>] ? system_call_fastpath+0x16/0x1b
[ 618.816175] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
[ 618.816184] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
[ 618.816196] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
[ 618.816201] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
[ 618.816241] sd 1:0:3:0: [sdf] Unhandled error code
[ 618.816247] sd 1:0:3:0: [sdf] Result: hostbyte=DID_OK
driverbyte=DRIVER_TIMEOUT
[ 618.816255] end_request: I/O error, dev sdf, sector 160
[ 618.816263] Buffer I/O error on device md2, logical block 20
[ 618.816274] Buffer I/O error on device md2, logical block 21
[ 618.816280] Buffer I/O error on device md2, logical block 22
[ 618.816285] Buffer I/O error on device md2, logical block 23
[ 618.816291] Buffer I/O error on device md2, logical block 24
[ 618.816298] Buffer I/O error on device md2, logical block 25
[ 618.816307] Buffer I/O error on device md2, logical block 26
[ 618.816316] Buffer I/O error on device md2, logical block 27
[ 618.816324] Buffer I/O error on device md2, logical block 28
[ 618.816331] Buffer I/O error on device md2, logical block 29
[ 649.780185] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
[ 649.780194] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
[ 649.780208] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
[ 649.780214] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
[ 649.780293] ------------[ cut here ]------------
[ 649.780366] WARNING: at drivers/ata/libata-core.c:5129
ata_qc_issue+0x10a/0x347 [libata]()
[ 649.780373] Hardware name: GA-MA790FXT-UD5P
[ 649.780377] Modules linked in: ext4 jbd2 crc16 raid0 powernow_k8
cpufreq_conservative cpufreq_stats cpufreq_userspace cpufreq_powersave kvm_amd
kvm nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss sunrpc bridge stp it87
hwmon_vid adt7473 firewire_sbp2 loop md_mod snd_hda_codec_realtek
snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm
snd_seq_midi snd_rawmidi snd_seq_midi_event amd64_edac_mod snd_seq edac_core
snd_timer snd_seq_device snd soundcore i2c_piix4 snd_page_alloc i2c_core evdev
parport_pc wmi parport button processor ext3 jbd mbcache dm_mod usbhid hid sg
sr_mod cdrom sd_mod crc_t10dif ata_generic ide_pci_generic firewire_ohci
firewire_core ohci_hcd crc_itu_t atiixp ide_core mvsas ehci_hcd ahci libsas
libata scsi_transport_sas scsi_mod r8169 mii floppy thermal fan thermal_sys
[last unloaded: scsi_wait_scan]
[ 649.780499] Pid: 3185, comm: hddtemp Not tainted 2.6.31-rc9 #2
[ 649.780504] Call Trace:
[ 649.780551] [<ffffffffa0074f90>] ? ata_qc_issue+0x10a/0x347 [libata]
[ 649.780593] [<ffffffffa0074f90>] ? ata_qc_issue+0x10a/0x347 [libata]
[ 649.780607] [<ffffffff8104aca0>] ? warn_slowpath_common+0x77/0xa3
[ 649.780649] [<ffffffffa00790ce>] ? ata_scsi_pass_thru+0x0/0x240 [libata]
[ 649.780690] [<ffffffffa0074f90>] ? ata_qc_issue+0x10a/0x347 [libata]
[ 649.780733] [<ffffffffa00401d6>] ? scsi_get_command+0x75/0x97 [scsi_mod]
[ 649.780776] [<ffffffffa00790ce>] ? ata_scsi_pass_thru+0x0/0x240 [libata]
[ 649.780815] [<ffffffffa003f7aa>] ? scsi_done+0x0/0xc [scsi_mod]
[ 649.780858] [<ffffffffa007a4d5>] ? __ata_scsi_queuecmd+0x185/0x1dc
[libata]
[ 649.780896] [<ffffffffa003f7aa>] ? scsi_done+0x0/0xc [scsi_mod]
[ 649.780918] [<ffffffffa00a3c8e>] ? sas_queuecommand+0x83/0x25d [libsas]
[ 649.780956] [<ffffffffa003fa7c>] ? scsi_dispatch_cmd+0x1c0/0x23c
[scsi_mod]
[ 649.780996] [<ffffffffa0044ff0>] ? scsi_request_fn+0x3a5/0x506 [scsi_mod]
[ 649.781006] [<ffffffff810546e0>] ? del_timer+0x59/0x62
[ 649.781016] [<ffffffff81163b70>] ? blk_execute_rq_nowait+0x65/0x89
[ 649.781032] [<ffffffffa014164f>] ? sg_common_write+0x489/0x4ab [sg]
[ 649.781042] [<ffffffff8115df56>] ? __freed_request+0x26/0x83
[ 649.781056] [<ffffffffa01421da>] ? sg_new_write+0x23e/0x269 [sg]
[ 649.781070] [<ffffffffa0142473>] ? sg_ioctl+0x26e/0xb63 [sg]
[ 649.781080] [<ffffffff81100f38>] ? inotify_d_instantiate+0x12/0x39
[ 649.781088] [<ffffffff8105eee6>] ? autoremove_wake_function+0x0/0x2e
[ 649.781098] [<ffffffff810d80bf>] ? fd_install+0x2e/0x5a
[ 649.781105] [<ffffffff810e5247>] ? vfs_ioctl+0x56/0x6c
[ 649.781111] [<ffffffff810e570a>] ? do_vfs_ioctl+0x437/0x475
[ 649.781118] [<ffffffff810e5799>] ? sys_ioctl+0x51/0x70
[ 649.781128] [<ffffffff81010a02>] ? system_call_fastpath+0x16/0x1b
[ 649.781134] ---[ end trace 9005373b1b9c6eb7 ]---
[ 680.781672] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
[ 680.781681] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
[ 711.781672] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
[ 711.781681] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
[ 741.816069] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
[ 741.816078] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
[ 741.816090] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
[ 741.816096] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
[ 772.820160] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
[ 772.820168] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
[ 772.820181] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
[ 772.820186] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
[ 803.816212] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
[ 803.816220] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
[ 803.816233] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
[ 803.816239] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
[ 803.816264] sd 1:0:3:0: [sdf] Unhandled error code
[ 803.816270] sd 1:0:3:0: [sdf] Result: hostbyte=DID_OK
driverbyte=DRIVER_TIMEOUT
[ 803.816278] end_request: I/O error, dev sdf, sector 512
[ 803.816284] __ratelimit: 70 callbacks suppressed
[ 803.816290] Buffer I/O error on device md2, logical block 64

That's after bringing up a raid0 array I built a few days ago on 4 perfectly
good (Seagate Barracuda 1TB 7200.12) disks, without the bad disk plugged in. I
try to mount it, and the driver hangs. Anything trying to access any of the 4
disks hangs as well.
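
For readers unfamiliar with the "Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT" line in the log above, here is a minimal Python sketch of how a 2.6-era SCSI result word splits into its four bytes. The code tables are a partial copy of the values in include/scsi/scsi.h from kernels of that period; the function name `decode_result` and the sample value 0x06000000 are assumptions for illustration (the log does not print the raw word, so the status and msg bytes are assumed to be zero). A DID_OK host byte combined with DRIVER_TIMEOUT suggests the command timed out in the SCSI mid-layer rather than being rejected by the transport, though the log alone does not prove that.

```python
# Partial tables of the 2.6-era codes from include/scsi/scsi.h.
HOST_BYTES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
}
DRIVER_BYTES = {
    0x00: "DRIVER_OK",
    0x01: "DRIVER_BUSY",
    0x02: "DRIVER_SOFT",
    0x03: "DRIVER_MEDIA",
    0x04: "DRIVER_ERROR",
    0x05: "DRIVER_INVALID",
    0x06: "DRIVER_TIMEOUT",
    0x07: "DRIVER_HARD",
    0x08: "DRIVER_SENSE",
}


def decode_result(result):
    """Split a 32-bit SCSI result word into its four byte fields.

    Layout: bits 0-7 status, 8-15 msg, 16-23 host, 24-31 driver.
    Codes missing from the partial tables are shown as hex.
    """
    host = (result >> 16) & 0xFF
    driver = (result >> 24) & 0xFF
    return {
        "status": result & 0xFF,
        "msg": (result >> 8) & 0xFF,
        "host": HOST_BYTES.get(host, hex(host)),
        "driver": DRIVER_BYTES.get(driver, hex(driver)),
    }


# Hypothetical raw value that would print as the pair seen in the log.
decoded = decode_result(0x06000000)
print("hostbyte=%s driverbyte=%s" % (decoded["host"], decoded["driver"]))
```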

I know this array worked a few days ago. The biggest change I've made since
then was upgrading from -rc8 (or -rc5, not sure if I mounted the array under
-rc8) to -rc9.

--
Thomas Fjellstrom

2009-09-10 23:55:53

by Thomas Fjellstrom

Subject: Re: 2.6.31-rc9 kernel BUG and mvsas

On Wed September 9 2009, Thomas Fjellstrom wrote:
> On Wed September 9 2009, Thomas Fjellstrom wrote:
> > On Wed September 9 2009, Jeff Garzik wrote:
> > > On 09/09/2009 12:30 PM, Thomas Fjellstrom wrote:
> > > > No errors on that disk, other than the one above, and it's more of a
> > > > warning. However, I just rebooted to add some extra drives, thinking
> > > > everything was working a little better now that I've updated to
> > > > 2.6.31-rc9, and I'm treated to the following two messages right
> > > > after boot (and a system lockup to boot):
> > > >
> > > > kernel: [ 971.033138] ------------[ cut here ]------------
> > > > kernel: [ 971.033211] WARNING: at drivers/ata/libata-core.c:4913
> > > > __ata_qc_complete+0x5a/0xe1 [libata]()
> > > > kernel: [ 971.033217] Hardware name: GA-MA790FXT-UD5P
> > > > kernel: [ 971.033221] Modules linked in: powernow_k8
> > > > cpufreq_conservative cpufreq_stats cpufreq_userspace
> > > > cpufreq_powersave kvm_amd kvm nfsd exportfs nfs lockd fscache nfs_acl
> > > > auth_rpcgss sunrpc bridge stp it87 hwmon_vid adt7473 firewire_sbp2
> > > > loop md_mod
> > > > snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep
> > > > snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_midi snd_rawmidi
> > > > snd_seq_midi_event snd_seq snd_timer snd_seq_device snd
> > > > amd64_edac_mod edac_core i2c_piix4 soundcore snd_page_alloc i2c_core
> > > > evdev wmi parport_pc button parport processor ext3 jbd mbcache dm_mod
> > > > sg sr_mod cdrom sd_mod crc_t10dif usbhid ata_generic ide_pci_generic
> > > > hid mvsas firewire_ohci libsas firewire_core crc_itu_t
> > > > scsi_transport_sas r8169 atiixp ide_core floppy ahci mii ohci_hcd
> > > > libata ehci_hcd scsi_mod thermal fan thermal_sys [last unloaded:
> > > > scsi_wait_scan]
> > > > kernel: [ 971.033337] Pid: 0, comm: swapper Not tainted 2.6.31-rc9 #2
> > > > kernel: [ 971.033342] Call Trace:
> > > > kernel: [ 971.033346] <IRQ> [<ffffffffa00562ca>] ? __ata_qc_complete+0x5a/0xe1 [libata]
> > > > kernel: [ 971.033434] [<ffffffffa00562ca>] ? __ata_qc_complete+0x5a/0xe1 [libata]
> > > > kernel: [ 971.033446] [<ffffffff8104aca0>] ? warn_slowpath_common+0x77/0xa3
> > > > kernel: [ 971.033455] [<ffffffff81038d06>] ? enqueue_task+0x5c/0x65
> > > > kernel: [ 971.033496] [<ffffffffa00562ca>] ? __ata_qc_complete+0x5a/0xe1 [libata]
> > > > kernel: [ 971.033519] [<ffffffffa00f7b59>] ? sas_ata_task_done+0x178/0x210 [libsas]
> > > > kernel: [ 971.033528] [<ffffffff8115ead1>] ? blk_run_queue+0x21/0x35
> > > > kernel: [ 971.033548] [<ffffffffa010e2ce>] ? mvs_slot_complete+0x3df/0x41b [mvsas]
> > > > kernel: [ 971.033565] [<ffffffffa010e39c>] ? mvs_int_rx+0x92/0x101 [mvsas]
> > > > kernel: [ 971.033583] [<ffffffffa01112ba>] ? mvs_int_full+0x25/0x88 [mvsas]
> > > > kernel: [ 971.033600] [<ffffffffa011134e>] ? mvs_64xx_isr+0x31/0x40 [mvsas]
> > > > kernel: [ 971.033617] [<ffffffffa010d0e5>] ? mvs_interrupt+0x61/0x78 [mvsas]
> > > > kernel: [ 971.033625] [<ffffffff8108aaac>] ? handle_IRQ_event+0x58/0x135
> > > > kernel: [ 971.033633] [<ffffffff8108c1a1>] ? handle_fasteoi_irq+0x7d/0xb5
> > > > kernel: [ 971.033642] [<ffffffff8101388d>] ? handle_irq+0x17/0x1d
> > >
> > > That warning is triggered when an ata_queued_cmd is passed to
> > > completion without the ATA_QCFLAG_ACTIVE flag being set (which
> > > indicates the qc was started with some activity).
> > >
> > > That possibly indicates the low-level driver (or libsas) was passing an
> > > already-completed cmd to libata.
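
The check described in the quoted explanation can be illustrated with a toy Python model. This is not libata code: the class `QueuedCmd`, the functions `issue` and `complete`, and the `warn_log` list are hypothetical names, and the flag's bit value is chosen only for illustration. The sketch shows why completing the same command twice would trip a check of this kind.

```python
ATA_QCFLAG_ACTIVE = 1 << 0  # illustrative bit value for the "active" flag


class QueuedCmd:
    """Toy stand-in for a queued command; only tracks a flags word."""

    def __init__(self, tag):
        self.tag = tag
        self.flags = 0


warn_log = []  # collects messages where the real code would WARN


def issue(qc):
    """Mark the command as started."""
    qc.flags |= ATA_QCFLAG_ACTIVE


def complete(qc):
    """Complete the command, warning if it was not active.

    Returns True for a normal completion and False when the command
    was completed without the active flag set.
    """
    if not qc.flags & ATA_QCFLAG_ACTIVE:
        warn_log.append("qc %d completed while not active" % qc.tag)
        return False
    qc.flags &= ~ATA_QCFLAG_ACTIVE
    return True


qc = QueuedCmd(tag=0)
issue(qc)
print(complete(qc))  # first completion: command was active
print(complete(qc))  # second completion of the same command
print(warn_log)
```

In this model the second `complete` call is the analogue of a low-level driver handing an already-completed command back for completion.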
> > >
> > > > The added hard drives are connected to a Supermicro AOC-SASLP-MV8,
> > > > which is based on a Marvell MV64460/64461/64462 chipset, which uses
> > > > the sata_mv driver.
> > >
> > > Surely you mean 'mvsas' driver?
> >
> > Yes, sorry I did mean mvsas.
> >
> > I am more concerned about the actual oops/BUG rather than the warning
> > though. Unless the problem causing the warning is also causing the oops.
> >
> > > Jeff
>
> Thanks for taking a look so far. But I'm having more and more trouble with
> this card as the days go by:
>
> [ 464.792214] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> [ 464.792222] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> [ 494.816170] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> [ 494.816179] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> [ 494.816192] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> [ 494.816197] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> [ 525.817335] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> [ 525.817343] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> [ 525.817358] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> [ 525.817363] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> [ 556.816148] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> [ 556.816157] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> [ 556.816170] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> [ 556.816175] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> [ 587.816171] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> [ 587.816179] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> [ 587.816193] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> [ 587.816199] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> [ 600.616255] INFO: task mount:4395 blocked for more than 120 seconds.
> [ 600.616263] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 600.616270] mount D 0000000000000000 0 4395 4229 0x00000000
> [ 600.616281] ffff88012fb6f780 0000000000000082 ffff8800b808ddc8 ffff8800b80f3d60
> [ 600.616290] ffff880128923c90 0000000000014800 000000000000f800 ffff88012dd91840
> [ 600.616299] ffff88012dd91b38 0000000300000008 0000000000000000 ffff88010c5bfa88
> [ 600.616308] Call Trace:
> [ 600.616324] [<ffffffff81017015>] ? read_tsc+0xa/0x20
> [ 600.616335] [<ffffffff810adc63>] ? __pagevec_free+0x29/0x3b
> [ 600.616343] [<ffffffff810661e9>] ? getnstimeofday+0x55/0xaf
> [ 600.616351] [<ffffffff810a8e69>] ? sync_page+0x0/0x46
> [ 600.616361] [<ffffffff812dc8cb>] ? io_schedule+0x63/0xa5
> [ 600.616368] [<ffffffff810a8eaa>] ? sync_page+0x41/0x46
> [ 600.616376] [<ffffffff812dcade>] ? __wait_on_bit_lock+0x3f/0x84
> [ 600.616383] [<ffffffff810a8e55>] ? __lock_page+0x5d/0x63
> [ 600.616391] [<ffffffff8105ef14>] ? wake_bit_function+0x0/0x23
> [ 600.616401] [<ffffffff810b0ad4>] ? pagevec_lookup+0x17/0x1e
> [ 600.616408] [<ffffffff810b1bbd>] ? truncate_inode_pages_range+0x288/0x318
> [ 600.616418] [<ffffffff810fc8cb>] ? set_blocksize+0xc2/0xd2
> [ 600.616426] [<ffffffff810fc8f2>] ? sb_set_blocksize+0x17/0x43
> [ 600.616477] [<ffffffffa04aa8e0>] ? ext4_fill_super+0x1cc/0x2060 [ext4]
> [ 600.616486] [<ffffffff8117593b>] ? snprintf+0x44/0x4c
> [ 600.616493] [<ffffffff810fb5b7>] ? check_disk_change+0x22/0x52
> [ 600.616501] [<ffffffff812dd680>] ? __down_write_nested+0x15/0xab
> [ 600.616524] [<ffffffff810dbfe6>] ? get_sb_bdev+0x111/0x159
> [ 600.616571] [<ffffffffa04aa714>] ? ext4_fill_super+0x0/0x2060 [ext4]
> [ 600.616582] [<ffffffff810dbc44>] ? vfs_kern_mount+0x95/0x111
> [ 600.616593] [<ffffffff810dbd13>] ? do_kern_mount+0x43/0xe2
> [ 600.616607] [<ffffffff810ef49d>] ? do_mount+0x767/0x7d6
> [ 600.616620] [<ffffffff810ef591>] ? sys_mount+0x85/0xc8
> [ 600.616633] [<ffffffff81010a02>] ? system_call_fastpath+0x16/0x1b
> [ 618.816175] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> [ 618.816184] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> [ 618.816196] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> [ 618.816201] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> [ 618.816241] sd 1:0:3:0: [sdf] Unhandled error code
> [ 618.816247] sd 1:0:3:0: [sdf] Result: hostbyte=DID_OK
> driverbyte=DRIVER_TIMEOUT
> [ 618.816255] end_request: I/O error, dev sdf, sector 160
> [ 618.816263] Buffer I/O error on device md2, logical block 20
> [ 618.816274] Buffer I/O error on device md2, logical block 21
> [ 618.816280] Buffer I/O error on device md2, logical block 22
> [ 618.816285] Buffer I/O error on device md2, logical block 23
> [ 618.816291] Buffer I/O error on device md2, logical block 24
> [ 618.816298] Buffer I/O error on device md2, logical block 25
> [ 618.816307] Buffer I/O error on device md2, logical block 26
> [ 618.816316] Buffer I/O error on device md2, logical block 27
> [ 618.816324] Buffer I/O error on device md2, logical block 28
> [ 618.816331] Buffer I/O error on device md2, logical block 29
> [ 649.780185] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> [ 649.780194] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> [ 649.780208] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> [ 649.780214] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> [ 649.780293] ------------[ cut here ]------------
> [ 649.780366] WARNING: at drivers/ata/libata-core.c:5129
> ata_qc_issue+0x10a/0x347 [libata]()
> [ 649.780373] Hardware name: GA-MA790FXT-UD5P
> [ 649.780377] Modules linked in: ext4 jbd2 crc16 raid0 powernow_k8
> cpufreq_conservative cpufreq_stats cpufreq_userspace cpufreq_powersave
> kvm_amd kvm nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss sunrpc
> bridge stp it87 hwmon_vid adt7473 firewire_sbp2 loop md_mod
> snd_hda_codec_realtek
> snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm
> snd_seq_midi snd_rawmidi snd_seq_midi_event amd64_edac_mod snd_seq
> edac_core snd_timer snd_seq_device snd soundcore i2c_piix4 snd_page_alloc
> i2c_core evdev parport_pc wmi parport button processor ext3 jbd mbcache
> dm_mod usbhid hid sg sr_mod cdrom sd_mod crc_t10dif ata_generic
> ide_pci_generic firewire_ohci firewire_core ohci_hcd crc_itu_t atiixp
> ide_core mvsas ehci_hcd ahci libsas libata scsi_transport_sas scsi_mod
> r8169 mii floppy thermal fan thermal_sys [last unloaded: scsi_wait_scan]
> [ 649.780499] Pid: 3185, comm: hddtemp Not tainted 2.6.31-rc9 #2
> [ 649.780504] Call Trace:
> [ 649.780551] [<ffffffffa0074f90>] ? ata_qc_issue+0x10a/0x347 [libata]
> [ 649.780593] [<ffffffffa0074f90>] ? ata_qc_issue+0x10a/0x347 [libata]
> [ 649.780607] [<ffffffff8104aca0>] ? warn_slowpath_common+0x77/0xa3
> [ 649.780649] [<ffffffffa00790ce>] ? ata_scsi_pass_thru+0x0/0x240 [libata]
> [ 649.780690] [<ffffffffa0074f90>] ? ata_qc_issue+0x10a/0x347 [libata]
> [ 649.780733] [<ffffffffa00401d6>] ? scsi_get_command+0x75/0x97 [scsi_mod]
> [ 649.780776] [<ffffffffa00790ce>] ? ata_scsi_pass_thru+0x0/0x240 [libata]
> [ 649.780815] [<ffffffffa003f7aa>] ? scsi_done+0x0/0xc [scsi_mod]
> [ 649.780858] [<ffffffffa007a4d5>] ? __ata_scsi_queuecmd+0x185/0x1dc [libata]
> [ 649.780896] [<ffffffffa003f7aa>] ? scsi_done+0x0/0xc [scsi_mod]
> [ 649.780918] [<ffffffffa00a3c8e>] ? sas_queuecommand+0x83/0x25d [libsas]
> [ 649.780956] [<ffffffffa003fa7c>] ? scsi_dispatch_cmd+0x1c0/0x23c [scsi_mod]
> [ 649.780996] [<ffffffffa0044ff0>] ? scsi_request_fn+0x3a5/0x506 [scsi_mod]
> [ 649.781006] [<ffffffff810546e0>] ? del_timer+0x59/0x62
> [ 649.781016] [<ffffffff81163b70>] ? blk_execute_rq_nowait+0x65/0x89
> [ 649.781032] [<ffffffffa014164f>] ? sg_common_write+0x489/0x4ab [sg]
> [ 649.781042] [<ffffffff8115df56>] ? __freed_request+0x26/0x83
> [ 649.781056] [<ffffffffa01421da>] ? sg_new_write+0x23e/0x269 [sg]
> [ 649.781070] [<ffffffffa0142473>] ? sg_ioctl+0x26e/0xb63 [sg]
> [ 649.781080] [<ffffffff81100f38>] ? inotify_d_instantiate+0x12/0x39
> [ 649.781088] [<ffffffff8105eee6>] ? autoremove_wake_function+0x0/0x2e
> [ 649.781098] [<ffffffff810d80bf>] ? fd_install+0x2e/0x5a
> [ 649.781105] [<ffffffff810e5247>] ? vfs_ioctl+0x56/0x6c
> [ 649.781111] [<ffffffff810e570a>] ? do_vfs_ioctl+0x437/0x475
> [ 649.781118] [<ffffffff810e5799>] ? sys_ioctl+0x51/0x70
> [ 649.781128] [<ffffffff81010a02>] ? system_call_fastpath+0x16/0x1b
> [ 649.781134] ---[ end trace 9005373b1b9c6eb7 ]---
> [ 680.781672] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> [ 680.781681] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> [ 711.781672] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> [ 711.781681] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> [ 741.816069] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> [ 741.816078] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> [ 741.816090] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> [ 741.816096] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> [ 772.820160] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> [ 772.820168] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> [ 772.820181] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> [ 772.820186] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> [ 803.816212] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> [ 803.816220] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> [ 803.816233] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> [ 803.816239] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> [ 803.816264] sd 1:0:3:0: [sdf] Unhandled error code
> [ 803.816270] sd 1:0:3:0: [sdf] Result: hostbyte=DID_OK
> driverbyte=DRIVER_TIMEOUT
> [ 803.816278] end_request: I/O error, dev sdf, sector 512
> [ 803.816284] __ratelimit: 70 callbacks suppressed
> [ 803.816290] Buffer I/O error on device md2, logical block 64
>
> That's after bringing up a raid0 array I built a few days ago on 4
> perfectly good (Seagate Barracuda 1TB 7200.12) disks, without the bad disk
> plugged in. I try to mount it, and the driver hangs. Anything trying to
> access any of the 4 disks hangs as well.
>
> I know this array worked a few days ago. The biggest change I've made
> since then was upgrading from -rc8 (or -rc5, not sure if I mounted the
> array under -rc8) to -rc9.
>

Hi, just wondering if anyone has had a chance to look at this, or if there
are some patches I should try out. If you need me to do some testing, I'd be
glad to, thanks :)

--
Thomas Fjellstrom

2009-09-11 16:04:06

by Thomas Fjellstrom

Subject: Re: 2.6.31-rc9 kernel BUG and mvsas

On Thu September 10 2009, Thomas Fjellstrom wrote:
> On Wed September 9 2009, Thomas Fjellstrom wrote:
> > On Wed September 9 2009, Thomas Fjellstrom wrote:
> > > On Wed September 9 2009, Jeff Garzik wrote:
> > > > On 09/09/2009 12:30 PM, Thomas Fjellstrom wrote:
> > > > > > No errors on that disk, other than the one above, and it's more
> > > > > > of a warning. However, I just rebooted to add some extra drives,
> > > > > > thinking everything was working a little better now that I've
> > > > > > updated to 2.6.31-rc9, and I'm treated to the following two
> > > > > > messages right after boot (and a system lockup to boot):
> > > > >
> > > > > kernel: [ 971.033138] ------------[ cut here ]------------
> > > > > kernel: [ 971.033211] WARNING: at drivers/ata/libata-core.c:4913
> > > > > __ata_qc_complete+0x5a/0xe1 [libata]()
> > > > > kernel: [ 971.033217] Hardware name: GA-MA790FXT-UD5P
> > > > > kernel: [ 971.033221] Modules linked in: powernow_k8
> > > > > cpufreq_conservative cpufreq_stats cpufreq_userspace
> > > > > cpufreq_powersave kvm_amd kvm nfsd exportfs nfs lockd fscache
> > > > > nfs_acl auth_rpcgss sunrpc bridge stp it87 hwmon_vid adt7473
> > > > > firewire_sbp2 loop md_mod
> > > > > snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep
> > > > > snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_midi snd_rawmidi
> > > > > snd_seq_midi_event snd_seq snd_timer snd_seq_device snd
> > > > > amd64_edac_mod edac_core i2c_piix4 soundcore snd_page_alloc
> > > > > i2c_core evdev wmi parport_pc button parport processor ext3 jbd
> > > > > mbcache dm_mod sg sr_mod cdrom sd_mod crc_t10dif usbhid ata_generic
> > > > > ide_pci_generic hid mvsas firewire_ohci libsas firewire_core
> > > > > crc_itu_t
> > > > > scsi_transport_sas r8169 atiixp ide_core floppy ahci mii ohci_hcd
> > > > > libata ehci_hcd scsi_mod thermal fan thermal_sys [last unloaded:
> > > > > scsi_wait_scan]
> > > > > > kernel: [ 971.033337] Pid: 0, comm: swapper Not tainted 2.6.31-rc9 #2
> > > > > > kernel: [ 971.033342] Call Trace:
> > > > > > kernel: [ 971.033346] <IRQ> [<ffffffffa00562ca>] ? __ata_qc_complete+0x5a/0xe1 [libata]
> > > > > > kernel: [ 971.033434] [<ffffffffa00562ca>] ? __ata_qc_complete+0x5a/0xe1 [libata]
> > > > > > kernel: [ 971.033446] [<ffffffff8104aca0>] ? warn_slowpath_common+0x77/0xa3
> > > > > > kernel: [ 971.033455] [<ffffffff81038d06>] ? enqueue_task+0x5c/0x65
> > > > > > kernel: [ 971.033496] [<ffffffffa00562ca>] ? __ata_qc_complete+0x5a/0xe1 [libata]
> > > > > > kernel: [ 971.033519] [<ffffffffa00f7b59>] ? sas_ata_task_done+0x178/0x210 [libsas]
> > > > > > kernel: [ 971.033528] [<ffffffff8115ead1>] ? blk_run_queue+0x21/0x35
> > > > > > kernel: [ 971.033548] [<ffffffffa010e2ce>] ? mvs_slot_complete+0x3df/0x41b [mvsas]
> > > > > > kernel: [ 971.033565] [<ffffffffa010e39c>] ? mvs_int_rx+0x92/0x101 [mvsas]
> > > > > > kernel: [ 971.033583] [<ffffffffa01112ba>] ? mvs_int_full+0x25/0x88 [mvsas]
> > > > > > kernel: [ 971.033600] [<ffffffffa011134e>] ? mvs_64xx_isr+0x31/0x40 [mvsas]
> > > > > > kernel: [ 971.033617] [<ffffffffa010d0e5>] ? mvs_interrupt+0x61/0x78 [mvsas]
> > > > > > kernel: [ 971.033625] [<ffffffff8108aaac>] ? handle_IRQ_event+0x58/0x135
> > > > > > kernel: [ 971.033633] [<ffffffff8108c1a1>] ? handle_fasteoi_irq+0x7d/0xb5
> > > > > > kernel: [ 971.033642] [<ffffffff8101388d>] ? handle_irq+0x17/0x1d
> > > >
> > > > That warning is triggered when an ata_queued_cmd is passed to
> > > > completion without the ATA_QCFLAG_ACTIVE flag being set (which
> > > > indicates the qc was started with some activity).
> > > >
> > > > That possibly indicates the low-level driver (or libsas) was passing
> > > > an already-completed cmd to libata.
> > > >
> > > > > > The added hard drives are connected to a Supermicro AOC-SASLP-MV8,
> > > > > > which is based on a Marvell MV64460/64461/64462 chipset, which uses
> > > > > > the sata_mv driver.
> > > >
> > > > Surely you mean 'mvsas' driver?
> > >
> > > Yes, sorry I did mean mvsas.
> > >
> > > I am more concerned about the actual oops/BUG rather than the warning
> > > though. Unless the problem causing the warning is also causing the
> > > oops.
> > >
> > > > Jeff
> >
> > Thanks for taking a look so far. But I'm having more and more trouble
> > with this card as the days go by:
> >
> > [ 464.792214] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > [ 464.792222] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > [ 494.816170] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > [ 494.816179] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > [ 494.816192] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > [ 494.816197] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > [ 525.817335] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > [ 525.817343] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > [ 525.817358] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > [ 525.817363] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > [ 556.816148] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > [ 556.816157] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > [ 556.816170] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > [ 556.816175] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > [ 587.816171] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > [ 587.816179] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > [ 587.816193] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > [ 587.816199] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > [ 600.616255] INFO: task mount:4395 blocked for more than 120 seconds.
> > [ 600.616263] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [ 600.616270] mount D 0000000000000000 0 4395 4229 0x00000000
> > [ 600.616281] ffff88012fb6f780 0000000000000082 ffff8800b808ddc8 ffff8800b80f3d60
> > [ 600.616290] ffff880128923c90 0000000000014800 000000000000f800 ffff88012dd91840
> > [ 600.616299] ffff88012dd91b38 0000000300000008 0000000000000000 ffff88010c5bfa88
> > [ 600.616308] Call Trace:
> > [ 600.616324] [<ffffffff81017015>] ? read_tsc+0xa/0x20
> > [ 600.616335] [<ffffffff810adc63>] ? __pagevec_free+0x29/0x3b
> > [ 600.616343] [<ffffffff810661e9>] ? getnstimeofday+0x55/0xaf
> > [ 600.616351] [<ffffffff810a8e69>] ? sync_page+0x0/0x46
> > [ 600.616361] [<ffffffff812dc8cb>] ? io_schedule+0x63/0xa5
> > [ 600.616368] [<ffffffff810a8eaa>] ? sync_page+0x41/0x46
> > [ 600.616376] [<ffffffff812dcade>] ? __wait_on_bit_lock+0x3f/0x84
> > [ 600.616383] [<ffffffff810a8e55>] ? __lock_page+0x5d/0x63
> > [ 600.616391] [<ffffffff8105ef14>] ? wake_bit_function+0x0/0x23
> > [ 600.616401] [<ffffffff810b0ad4>] ? pagevec_lookup+0x17/0x1e
> > [ 600.616408] [<ffffffff810b1bbd>] ? truncate_inode_pages_range+0x288/0x318
> > [ 600.616418] [<ffffffff810fc8cb>] ? set_blocksize+0xc2/0xd2
> > [ 600.616426] [<ffffffff810fc8f2>] ? sb_set_blocksize+0x17/0x43
> > [ 600.616477] [<ffffffffa04aa8e0>] ? ext4_fill_super+0x1cc/0x2060 [ext4]
> > [ 600.616486] [<ffffffff8117593b>] ? snprintf+0x44/0x4c
> > [ 600.616493] [<ffffffff810fb5b7>] ? check_disk_change+0x22/0x52
> > [ 600.616501] [<ffffffff812dd680>] ? __down_write_nested+0x15/0xab
> > [ 600.616524] [<ffffffff810dbfe6>] ? get_sb_bdev+0x111/0x159
> > [ 600.616571] [<ffffffffa04aa714>] ? ext4_fill_super+0x0/0x2060 [ext4]
> > [ 600.616582] [<ffffffff810dbc44>] ? vfs_kern_mount+0x95/0x111
> > [ 600.616593] [<ffffffff810dbd13>] ? do_kern_mount+0x43/0xe2
> > [ 600.616607] [<ffffffff810ef49d>] ? do_mount+0x767/0x7d6
> > [ 600.616620] [<ffffffff810ef591>] ? sys_mount+0x85/0xc8
> > [ 600.616633] [<ffffffff81010a02>] ? system_call_fastpath+0x16/0x1b
> > [ 618.816175] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > [ 618.816184] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > [ 618.816196] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > [ 618.816201] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > [ 618.816241] sd 1:0:3:0: [sdf] Unhandled error code
> > [ 618.816247] sd 1:0:3:0: [sdf] Result: hostbyte=DID_OK
> > driverbyte=DRIVER_TIMEOUT
> > [ 618.816255] end_request: I/O error, dev sdf, sector 160
> > [ 618.816263] Buffer I/O error on device md2, logical block 20
> > [ 618.816274] Buffer I/O error on device md2, logical block 21
> > [ 618.816280] Buffer I/O error on device md2, logical block 22
> > [ 618.816285] Buffer I/O error on device md2, logical block 23
> > [ 618.816291] Buffer I/O error on device md2, logical block 24
> > [ 618.816298] Buffer I/O error on device md2, logical block 25
> > [ 618.816307] Buffer I/O error on device md2, logical block 26
> > [ 618.816316] Buffer I/O error on device md2, logical block 27
> > [ 618.816324] Buffer I/O error on device md2, logical block 28
> > [ 618.816331] Buffer I/O error on device md2, logical block 29
> > [ 649.780185] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > [ 649.780194] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > [ 649.780208] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > [ 649.780214] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > [ 649.780293] ------------[ cut here ]------------
> > [ 649.780366] WARNING: at drivers/ata/libata-core.c:5129
> > ata_qc_issue+0x10a/0x347 [libata]()
> > [ 649.780373] Hardware name: GA-MA790FXT-UD5P
> > [ 649.780377] Modules linked in: ext4 jbd2 crc16 raid0 powernow_k8
> > cpufreq_conservative cpufreq_stats cpufreq_userspace cpufreq_powersave
> > kvm_amd kvm nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss sunrpc
> > bridge stp it87 hwmon_vid adt7473 firewire_sbp2 loop md_mod
> > snd_hda_codec_realtek
> > snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm
> > snd_seq_midi snd_rawmidi snd_seq_midi_event amd64_edac_mod snd_seq
> > edac_core snd_timer snd_seq_device snd soundcore i2c_piix4
> > snd_page_alloc i2c_core evdev parport_pc wmi parport button processor
> > ext3 jbd mbcache dm_mod usbhid hid sg sr_mod cdrom sd_mod crc_t10dif
> > ata_generic ide_pci_generic firewire_ohci firewire_core ohci_hcd
> > crc_itu_t atiixp ide_core mvsas ehci_hcd ahci libsas libata
> > scsi_transport_sas scsi_mod r8169 mii floppy thermal fan thermal_sys
> > [last unloaded: scsi_wait_scan]
> > [ 649.780499] Pid: 3185, comm: hddtemp Not tainted 2.6.31-rc9 #2
> > [ 649.780504] Call Trace:
> > [ 649.780551] [<ffffffffa0074f90>] ? ata_qc_issue+0x10a/0x347 [libata]
> > [ 649.780593] [<ffffffffa0074f90>] ? ata_qc_issue+0x10a/0x347 [libata]
> > [ 649.780607] [<ffffffff8104aca0>] ? warn_slowpath_common+0x77/0xa3
> > [ 649.780649] [<ffffffffa00790ce>] ? ata_scsi_pass_thru+0x0/0x240 [libata]
> > [ 649.780690] [<ffffffffa0074f90>] ? ata_qc_issue+0x10a/0x347 [libata]
> > [ 649.780733] [<ffffffffa00401d6>] ? scsi_get_command+0x75/0x97 [scsi_mod]
> > [ 649.780776] [<ffffffffa00790ce>] ? ata_scsi_pass_thru+0x0/0x240 [libata]
> > [ 649.780815] [<ffffffffa003f7aa>] ? scsi_done+0x0/0xc [scsi_mod]
> > [ 649.780858] [<ffffffffa007a4d5>] ? __ata_scsi_queuecmd+0x185/0x1dc [libata]
> > [ 649.780896] [<ffffffffa003f7aa>] ? scsi_done+0x0/0xc [scsi_mod]
> > [ 649.780918] [<ffffffffa00a3c8e>] ? sas_queuecommand+0x83/0x25d [libsas]
> > [ 649.780956] [<ffffffffa003fa7c>] ? scsi_dispatch_cmd+0x1c0/0x23c [scsi_mod]
> > [ 649.780996] [<ffffffffa0044ff0>] ? scsi_request_fn+0x3a5/0x506 [scsi_mod]
> > [ 649.781006] [<ffffffff810546e0>] ? del_timer+0x59/0x62
> > [ 649.781016] [<ffffffff81163b70>] ? blk_execute_rq_nowait+0x65/0x89
> > [ 649.781032] [<ffffffffa014164f>] ? sg_common_write+0x489/0x4ab [sg]
> > [ 649.781042] [<ffffffff8115df56>] ? __freed_request+0x26/0x83
> > [ 649.781056] [<ffffffffa01421da>] ? sg_new_write+0x23e/0x269 [sg]
> > [ 649.781070] [<ffffffffa0142473>] ? sg_ioctl+0x26e/0xb63 [sg]
> > [ 649.781080] [<ffffffff81100f38>] ? inotify_d_instantiate+0x12/0x39
> > [ 649.781088] [<ffffffff8105eee6>] ? autoremove_wake_function+0x0/0x2e
> > [ 649.781098] [<ffffffff810d80bf>] ? fd_install+0x2e/0x5a
> > [ 649.781105] [<ffffffff810e5247>] ? vfs_ioctl+0x56/0x6c
> > [ 649.781111] [<ffffffff810e570a>] ? do_vfs_ioctl+0x437/0x475
> > [ 649.781118] [<ffffffff810e5799>] ? sys_ioctl+0x51/0x70
> > [ 649.781128] [<ffffffff81010a02>] ? system_call_fastpath+0x16/0x1b
> > [ 649.781134] ---[ end trace 9005373b1b9c6eb7 ]---
> > [ 680.781672] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > [ 680.781681] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > [ 711.781672] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > [ 711.781681] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > [ 741.816069] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > [ 741.816078] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > [ 741.816090] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > [ 741.816096] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > [ 772.820160] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > [ 772.820168] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > [ 772.820181] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > [ 772.820186] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > [ 803.816212] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > [ 803.816220] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > [ 803.816233] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > [ 803.816239] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > [ 803.816264] sd 1:0:3:0: [sdf] Unhandled error code
> > [ 803.816270] sd 1:0:3:0: [sdf] Result: hostbyte=DID_OK
> > driverbyte=DRIVER_TIMEOUT
> > [ 803.816278] end_request: I/O error, dev sdf, sector 512
> > [ 803.816284] __ratelimit: 70 callbacks suppressed
> > [ 803.816290] Buffer I/O error on device md2, logical block 64
> >
> > That's after bringing up a raid0 array I build a few days ago on 4
> > perfectly good (Seagate Baracuda 1TB 7200.12) disks, without the bad
> > disk plugged in. I try to mount it, and the driver hangs. Anything trying
> > to access any of the 4 disks hangs as well.
> >
> > I know this array worked a few days ago. The most major change I've made
> > was upgrade from -rc8 (or -rc5, not sure if I mounted the array under
> > -rc8) to -rc9.
>
> Hi, just wondering if anyone has had a chance to look at this, or if
> there's some patches I should try out, or if you need me to do some
> testing, I'd be glad to, thanks :)
>

To make things more confusing: the suspected bad drive is not connected, yet
I have been getting the following messages in dmesg for the past day:

mvsas 0000:04:00.0: mvsas exec failed[-132]!
ata7: no sense translation for status: 0x00
ata7: translated ATA stat/err 0x00/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata7: status=0x00 { }
mvsas 0000:04:00.0: mvsas exec failed[-132]!
ata8: no sense translation for status: 0x00
ata8: translated ATA stat/err 0x00/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata8: status=0x00 { }
mvsas 0000:04:00.0: mvsas exec failed[-132]!
ata9: no sense translation for status: 0x00
ata9: translated ATA stat/err 0x00/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata9: status=0x00 { }
mvsas 0000:04:00.0: mvsas exec failed[-132]!
ata10: no sense translation for status: 0x00
ata10: translated ATA stat/err 0x00/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata10: status=0x00 { }

There are 4 disks connected in total: ports 0 and 1 are currently empty, and
ports 2, 3, 4 and 5 hold essentially identical 1TB Seagate disks (as identical
as you can get: 1TB 7200.12, same firmware).

Running smartctl on the disks only returns errors, and hddtemp reports "drive
is sleeping".
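
For anyone reading the "translated ATA stat/err" lines above: as far as I know,
the SK value 0xb is the SCSI sense key ABORTED COMMAND. Below is a rough sketch
of pulling the SK/ASC/ASCQ triple out of such a line and naming the sense key.
The helper names (sense_key_name, parse_sense_translation) are made up for
illustration, the line format is assumed from the dmesg output above, and the
names for keys 0xc and 0xf are left generic because they vary between SPC
revisions. This is not how libata itself does the translation.

```c
#include <stdio.h>

/* Map a 4-bit SCSI sense key to its name (hypothetical helper). */
static const char *sense_key_name(unsigned sk)
{
    switch (sk & 0xf) {
    case 0x0: return "NO SENSE";
    case 0x1: return "RECOVERED ERROR";
    case 0x2: return "NOT READY";
    case 0x3: return "MEDIUM ERROR";
    case 0x4: return "HARDWARE ERROR";
    case 0x5: return "ILLEGAL REQUEST";
    case 0x6: return "UNIT ATTENTION";
    case 0x7: return "DATA PROTECT";
    case 0x8: return "BLANK CHECK";
    case 0x9: return "VENDOR SPECIFIC";
    case 0xa: return "COPY ABORTED";
    case 0xb: return "ABORTED COMMAND";
    case 0xd: return "VOLUME OVERFLOW";
    case 0xe: return "MISCOMPARE";
    default:  return "(obsolete or reserved)"; /* 0xc and 0xf */
    }
}

/*
 * Parse a line shaped like
 *   "ata7: translated ATA stat/err 0x00/00 to SCSI SK/ASC/ASCQ 0xb/00/00"
 * Returns 0 and fills sk/asc/ascq on success, -1 if the line does not match.
 */
static int parse_sense_translation(const char *line, unsigned *sk,
                                   unsigned *asc, unsigned *ascq)
{
    unsigned stat, err; /* ATA status and error, parsed but not returned */
    int n = sscanf(line,
                   "%*[^:]: translated ATA stat/err %x/%x"
                   " to SCSI SK/ASC/ASCQ %x/%x/%x",
                   &stat, &err, sk, asc, ascq);
    return n == 5 ? 0 : -1;
}
```

Feeding the ata7 line from the log through parse_sense_translation() and then
sense_key_name() gives the ABORTED COMMAND name; the other lines in the log
(such as "status=0x00 { }") do not match and are rejected.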

--
Thomas Fjellstrom
[email protected]

2009-09-13 00:43:18

by Jeff Garzik

Subject: Re: 2.6.31-rc9 kernel BUG and mvsas

On 09/10/2009 07:55 PM, Thomas Fjellstrom wrote:
> On Wed September 9 2009, Thomas Fjellstrom wrote:
>> On Wed September 9 2009, Thomas Fjellstrom wrote:
>>> On Wed September 9 2009, Jeff Garzik wrote:
>>>> On 09/09/2009 12:30 PM, Thomas Fjellstrom wrote:
>>>>> No errors on that disk. Other than the one above, and its more of a
>>>>> warning. However, I just rebooted to add some extra drives, thinking
>>>>> everything was working a little better now that I've updated to
>>>>> 2.6.31-rc9, I'm treated to the following two messages right after
>>>>> boot (and a system lockup to boot):
>>>>>
>>>>> kernel: [ 971.033138] ------------[ cut here ]------------
>>>>> kernel: [ 971.033211] WARNING: at drivers/ata/libata-core.c:4913
>>>>> __ata_qc_complete+0x5a/0xe1 [libata]()
>>>>> kernel: [ 971.033217] Hardware name: GA-MA790FXT-UD5P
>>>>> kernel: [ 971.033221] Modules linked in: powernow_k8
>>>>> cpufreq_conservative cpufreq_stats cpufreq_userspace
>>>>> cpufreq_powersave kvm_amd kvm nfsd exportfs nfs lockd fscache nfs_acl
>>>>> auth_rpcgss sunrpc bridge stp it87 hwmon_vid adt7473 firewire_sbp2
>>>>> loop md_mod
>>>>> snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep
>>>>> snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_midi snd_rawmidi
>>>>> snd_seq_midi_event snd_seq snd_timer snd_seq_device snd
>>>>> amd64_edac_mod edac_core i2c_piix4 soundcore snd_page_alloc i2c_core
>>>>> evdev wmi parport_pc button parport processor ext3 jbd mbcache dm_mod
>>>>> sg sr_mod cdrom sd_mod crc_t10dif usbhid ata_generic ide_pci_generic
>>>>> hid mvsas firewire_ohci libsas firewire_core crc_itu_t
>>>>> scsi_transport_sas r8169 atiixp ide_core floppy ahci mii ohci_hcd
>>>>> libata ehci_hcd scsi_mod thermal fan thermal_sys [last unloaded:
>>>>> scsi_wait_scan]
>>>>> kernel: [ 971.033337] Pid: 0, comm: swapper Not tainted 2.6.31-rc9
>>>>> #2 kernel: [ 971.033342] Call Trace:
>>>>> kernel: [ 971.033346]<IRQ> [<ffffffffa00562ca>] ?
>>>>> __ata_qc_complete+0x5a/0xe1 [libata]
>>>>> kernel: [ 971.033434] [<ffffffffa00562ca>] ?
>>>>> __ata_qc_complete+0x5a/0xe1 [libata]
>>>>> kernel: [ 971.033446] [<ffffffff8104aca0>] ?
>>>>> warn_slowpath_common+0x77/0xa3 kernel: [ 971.033455]
>>>>> [<ffffffff81038d06>] ? enqueue_task+0x5c/0x65 kernel: [ 971.033496]
>>>>> [<ffffffffa00562ca>] ? __ata_qc_complete+0x5a/0xe1 [libata]
>>>>> kernel: [ 971.033519] [<ffffffffa00f7b59>] ?
>>>>> sas_ata_task_done+0x178/0x210 [libsas]
>>>>> kernel: [ 971.033528] [<ffffffff8115ead1>] ?
>>>>> blk_run_queue+0x21/0x35 kernel: [ 971.033548] [<ffffffffa010e2ce>]
>>>>> ?
>>>>> mvs_slot_complete+0x3df/0x41b [mvsas]
>>>>> kernel: [ 971.033565] [<ffffffffa010e39c>] ? mvs_int_rx+0x92/0x101
>>>>> [mvsas] kernel: [ 971.033583] [<ffffffffa01112ba>] ?
>>>>> mvs_int_full+0x25/0x88 [mvsas] kernel: [ 971.033600]
>>>>> [<ffffffffa011134e>] ? mvs_64xx_isr+0x31/0x40 [mvsas] kernel: [
>>>>> 971.033617] [<ffffffffa010d0e5>] ? mvs_interrupt+0x61/0x78 [mvsas]
>>>>> kernel: [ 971.033625] [<ffffffff8108aaac>] ?
>>>>> handle_IRQ_event+0x58/0x135 kernel: [ 971.033633]
>>>>> [<ffffffff8108c1a1>] ? handle_fasteoi_irq+0x7d/0xb5 kernel: [
>>>>> 971.033642]
>>>>> [<ffffffff8101388d>] ? handle_irq+0x17/0x1d
>>>>
>>>> That warning is triggered when an ata_queued_cmd is passed to
>>>> completion without the ATA_QCFLAG_ACTIVE flag being set (which
>>>> indicates the qc was started with some activity).
>>>>
>>>> That possibly indicates the low-level driver (or libsas) was passing an
>>>> already-completed cmd to libata.
>>>>
>>>>> The added hard drives are connected to a Supermicro AOC-SASLP-MV8,
>>>>> which is based on a marvel MV64460/64461/64462 chipset, which uses
>>>>> the sata_mv driver.
>>>>
>>>> Surely you mean 'mvsas' driver?
>>>
>>> Yes, sorry I did mean mvsas.
>>>
>>> I am more concerned about the actual oops/BUG rather than the warning
>>> though. Unless the problem causing the warning is also causing the oops.
>>>
>>>> Jeff
>>
>> Thanks for taking a look so far. But I'm having more and more trouble with
>> this card as the days go by:
>>
>> [ 464.792214] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
>> [ 464.792222] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
>> [ 494.816170] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
>> [ 494.816179] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
>> [ 494.816192] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
>> [ 494.816197] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
>> [ 525.817335] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
>> [ 525.817343] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
>> [ 525.817358] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
>> [ 525.817363] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
>> [ 556.816148] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
>> [ 556.816157] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
>> [ 556.816170] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
>> [ 556.816175] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
>> [ 587.816171] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
>> [ 587.816179] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
>> [ 587.816193] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
>> [ 587.816199] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
>> [ 600.616255] INFO: task mount:4395 blocked for more than 120 seconds.
>> [ 600.616263] "echo 0> /proc/sys/kernel/hung_task_timeout_secs" disables
>> this message.
>> [ 600.616270] mount D 0000000000000000 0 4395 4229
>> 0x00000000 [ 600.616281] ffff88012fb6f780 0000000000000082
>> ffff8800b808ddc8 ffff8800b80f3d60
>> [ 600.616290] ffff880128923c90 0000000000014800 000000000000f800
>> ffff88012dd91840
>> [ 600.616299] ffff88012dd91b38 0000000300000008 0000000000000000
>> ffff88010c5bfa88
>> [ 600.616308] Call Trace:
>> [ 600.616324] [<ffffffff81017015>] ? read_tsc+0xa/0x20
>> [ 600.616335] [<ffffffff810adc63>] ? __pagevec_free+0x29/0x3b
>> [ 600.616343] [<ffffffff810661e9>] ? getnstimeofday+0x55/0xaf
>> [ 600.616351] [<ffffffff810a8e69>] ? sync_page+0x0/0x46
>> [ 600.616361] [<ffffffff812dc8cb>] ? io_schedule+0x63/0xa5
>> [ 600.616368] [<ffffffff810a8eaa>] ? sync_page+0x41/0x46
>> [ 600.616376] [<ffffffff812dcade>] ? __wait_on_bit_lock+0x3f/0x84
>> [ 600.616383] [<ffffffff810a8e55>] ? __lock_page+0x5d/0x63
>> [ 600.616391] [<ffffffff8105ef14>] ? wake_bit_function+0x0/0x23
>> [ 600.616401] [<ffffffff810b0ad4>] ? pagevec_lookup+0x17/0x1e
>> [ 600.616408] [<ffffffff810b1bbd>] ?
>> truncate_inode_pages_range+0x288/0x318 [ 600.616418]
>> [<ffffffff810fc8cb>] ? set_blocksize+0xc2/0xd2
>> [ 600.616426] [<ffffffff810fc8f2>] ? sb_set_blocksize+0x17/0x43
>> [ 600.616477] [<ffffffffa04aa8e0>] ? ext4_fill_super+0x1cc/0x2060 [ext4]
>> [ 600.616486] [<ffffffff8117593b>] ? snprintf+0x44/0x4c
>> [ 600.616493] [<ffffffff810fb5b7>] ? check_disk_change+0x22/0x52
>> [ 600.616501] [<ffffffff812dd680>] ? __down_write_nested+0x15/0xab
>> [ 600.616524] [<ffffffff810dbfe6>] ? get_sb_bdev+0x111/0x159
>> [ 600.616571] [<ffffffffa04aa714>] ? ext4_fill_super+0x0/0x2060 [ext4]
>> [ 600.616582] [<ffffffff810dbc44>] ? vfs_kern_mount+0x95/0x111
>> [ 600.616593] [<ffffffff810dbd13>] ? do_kern_mount+0x43/0xe2
>> [ 600.616607] [<ffffffff810ef49d>] ? do_mount+0x767/0x7d6
>> [ 600.616620] [<ffffffff810ef591>] ? sys_mount+0x85/0xc8
>> [ 600.616633] [<ffffffff81010a02>] ? system_call_fastpath+0x16/0x1b
>> [ 618.816175] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
>> [ 618.816184] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
>> [ 618.816196] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
>> [ 618.816201] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
>> [ 618.816241] sd 1:0:3:0: [sdf] Unhandled error code
>> [ 618.816247] sd 1:0:3:0: [sdf] Result: hostbyte=DID_OK
>> driverbyte=DRIVER_TIMEOUT
>> [ 618.816255] end_request: I/O error, dev sdf, sector 160
>> [ 618.816263] Buffer I/O error on device md2, logical block 20
>> [ 618.816274] Buffer I/O error on device md2, logical block 21
>> [ 618.816280] Buffer I/O error on device md2, logical block 22
>> [ 618.816285] Buffer I/O error on device md2, logical block 23
>> [ 618.816291] Buffer I/O error on device md2, logical block 24
>> [ 618.816298] Buffer I/O error on device md2, logical block 25
>> [ 618.816307] Buffer I/O error on device md2, logical block 26
>> [ 618.816316] Buffer I/O error on device md2, logical block 27
>> [ 618.816324] Buffer I/O error on device md2, logical block 28
>> [ 618.816331] Buffer I/O error on device md2, logical block 29
>> [ 649.780185] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
>> [ 649.780194] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
>> [ 649.780208] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
>> [ 649.780214] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
>> [ 649.780293] ------------[ cut here ]------------
>> [ 649.780366] WARNING: at drivers/ata/libata-core.c:5129
>> ata_qc_issue+0x10a/0x347 [libata]()
>> [ 649.780373] Hardware name: GA-MA790FXT-UD5P
>> [ 649.780377] Modules linked in: ext4 jbd2 crc16 raid0 powernow_k8
>> cpufreq_conservative cpufreq_stats cpufreq_userspace cpufreq_powersave
>> kvm_amd kvm nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss sunrpc
>> bridge stp it87 hwmon_vid adt7473 firewire_sbp2 loop md_mod
>> snd_hda_codec_realtek
>> snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm
>> snd_seq_midi snd_rawmidi snd_seq_midi_event amd64_edac_mod snd_seq
>> edac_core snd_timer snd_seq_device snd soundcore i2c_piix4 snd_page_alloc
>> i2c_core evdev parport_pc wmi parport button processor ext3 jbd mbcache
>> dm_mod usbhid hid sg sr_mod cdrom sd_mod crc_t10dif ata_generic
>> ide_pci_generic firewire_ohci firewire_core ohci_hcd crc_itu_t atiixp
>> ide_core mvsas ehci_hcd ahci libsas libata scsi_transport_sas scsi_mod
>> r8169 mii floppy thermal fan thermal_sys [last unloaded: scsi_wait_scan]
>> [ 649.780499] Pid: 3185, comm: hddtemp Not tainted 2.6.31-rc9 #2
>> [ 649.780504] Call Trace:
>> [ 649.780551] [<ffffffffa0074f90>] ? ata_qc_issue+0x10a/0x347 [libata]
>> [ 649.780593] [<ffffffffa0074f90>] ? ata_qc_issue+0x10a/0x347 [libata]
>> [ 649.780607] [<ffffffff8104aca0>] ? warn_slowpath_common+0x77/0xa3
>> [ 649.780649] [<ffffffffa00790ce>] ? ata_scsi_pass_thru+0x0/0x240
>> [libata] [ 649.780690] [<ffffffffa0074f90>] ? ata_qc_issue+0x10a/0x347
>> [libata] [ 649.780733] [<ffffffffa00401d6>] ? scsi_get_command+0x75/0x97
>> [scsi_mod] [ 649.780776] [<ffffffffa00790ce>] ?
>> ata_scsi_pass_thru+0x0/0x240 [libata] [ 649.780815] [<ffffffffa003f7aa>]
>> ? scsi_done+0x0/0xc [scsi_mod] [ 649.780858] [<ffffffffa007a4d5>] ?
>> __ata_scsi_queuecmd+0x185/0x1dc [libata]
>> [ 649.780896] [<ffffffffa003f7aa>] ? scsi_done+0x0/0xc [scsi_mod]
>> [ 649.780918] [<ffffffffa00a3c8e>] ? sas_queuecommand+0x83/0x25d [libsas]
>> [ 649.780956] [<ffffffffa003fa7c>] ? scsi_dispatch_cmd+0x1c0/0x23c
>> [scsi_mod]
>> [ 649.780996] [<ffffffffa0044ff0>] ? scsi_request_fn+0x3a5/0x506
>> [scsi_mod] [ 649.781006] [<ffffffff810546e0>] ? del_timer+0x59/0x62
>> [ 649.781016] [<ffffffff81163b70>] ? blk_execute_rq_nowait+0x65/0x89
>> [ 649.781032] [<ffffffffa014164f>] ? sg_common_write+0x489/0x4ab [sg]
>> [ 649.781042] [<ffffffff8115df56>] ? __freed_request+0x26/0x83
>> [ 649.781056] [<ffffffffa01421da>] ? sg_new_write+0x23e/0x269 [sg]
>> [ 649.781070] [<ffffffffa0142473>] ? sg_ioctl+0x26e/0xb63 [sg]
>> [ 649.781080] [<ffffffff81100f38>] ? inotify_d_instantiate+0x12/0x39
>> [ 649.781088] [<ffffffff8105eee6>] ? autoremove_wake_function+0x0/0x2e
>> [ 649.781098] [<ffffffff810d80bf>] ? fd_install+0x2e/0x5a
>> [ 649.781105] [<ffffffff810e5247>] ? vfs_ioctl+0x56/0x6c
>> [ 649.781111] [<ffffffff810e570a>] ? do_vfs_ioctl+0x437/0x475
>> [ 649.781118] [<ffffffff810e5799>] ? sys_ioctl+0x51/0x70
>> [ 649.781128] [<ffffffff81010a02>] ? system_call_fastpath+0x16/0x1b
>> [ 649.781134] ---[ end trace 9005373b1b9c6eb7 ]---
>> [ 680.781672] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
>> [ 680.781681] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
>> [ 711.781672] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
>> [ 711.781681] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
>> [ 741.816069] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
>> [ 741.816078] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
>> [ 741.816090] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
>> [ 741.816096] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
>> [ 772.820160] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
>> [ 772.820168] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
>> [ 772.820181] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
>> [ 772.820186] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
>> [ 803.816212] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
>> [ 803.816220] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
>> [ 803.816233] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
>> [ 803.816239] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
>> [ 803.816264] sd 1:0:3:0: [sdf] Unhandled error code
>> [ 803.816270] sd 1:0:3:0: [sdf] Result: hostbyte=DID_OK
>> driverbyte=DRIVER_TIMEOUT
>> [ 803.816278] end_request: I/O error, dev sdf, sector 512
>> [ 803.816284] __ratelimit: 70 callbacks suppressed
>> [ 803.816290] Buffer I/O error on device md2, logical block 64
>>
>> That's after bringing up a raid0 array I build a few days ago on 4
>> perfectly good (Seagate Baracuda 1TB 7200.12) disks, without the bad disk
>> plugged in. I try to mount it, and the driver hangs. Anything trying to
>> access any of the 4 disks hangs as well.
>>
>> I know this array worked a few days ago. The most major change I've made
>> was upgrade from -rc8 (or -rc5, not sure if I mounted the array under
>> -rc8) to -rc9.
>>
>
> Hi, just wondering if anyone has had a chance to look at this, or if there's
> some patches I should try out, or if you need me to do some testing, I'd be
> glad to, thanks :)

I was hoping that some VM folks would jump in. The BUG in question is, from
mm/slab.c:

/*
 * The slab was either on partial or free list so
 * there must be at least one object available for
 * allocation.
 */
BUG_ON(slabp->inuse >= cachep->num);

So I wonder if that is a double-free, indicating a bug in
SCSI/libsas/mvsas, or a VM problem of some sort.

Was free memory low on that machine, at that point, perchance?
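
To illustrate the double-free hypothesis above, here is a toy model of
per-slab accounting. It is only a sketch: struct toy_slab, TOY_NUM, TOY_END
and the toy_* functions are invented for this example and are much simpler
than the real mm/slab.c code. The assumption is an index-linked free list plus
an inuse counter; freeing the same object twice leaves a self-loop in the free
list, so the slab still looks allocatable after every object has been counted
as in use, which is the state the BUG_ON rejects.

```c
#define TOY_NUM 4     /* objects per slab (stands in for cachep->num) */
#define TOY_END (-1)  /* end-of-free-list marker */

/* Toy stand-in for a slab descriptor; not the kernel's struct slab. */
struct toy_slab {
    int inuse;          /* objects currently handed out */
    int free;           /* index of the first free object, or TOY_END */
    int next[TOY_NUM];  /* next[i]: free object that follows object i */
};

static void toy_init(struct toy_slab *s)
{
    s->inuse = 0;
    s->free = 0;
    for (int i = 0; i < TOY_NUM; i++)
        s->next[i] = (i + 1 < TOY_NUM) ? i + 1 : TOY_END;
}

/* Pop the head of the free list; the caller must ensure it is non-empty. */
static int toy_alloc(struct toy_slab *s)
{
    int obj = s->free;
    s->free = s->next[obj];
    s->inuse++;
    return obj;
}

/* Push obj onto the free list; nothing here detects a repeated free. */
static void toy_free(struct toy_slab *s, int obj)
{
    s->next[obj] = s->free;
    s->free = obj;
    s->inuse--;
}

/* Free list non-empty although every object is counted as in use. */
static int toy_bug_condition(const struct toy_slab *s)
{
    return s->free != TOY_END && s->inuse >= TOY_NUM;
}

/* Returns 1 if a double free drives the slab into the bug condition. */
static int toy_double_free_demo(void)
{
    struct toy_slab s;

    toy_init(&s);
    for (int i = 0; i < TOY_NUM; i++)
        toy_alloc(&s);   /* slab full: inuse == 4, free list empty */
    toy_free(&s, 0);     /* legitimate free: inuse == 3 */
    toy_free(&s, 0);     /* double free: inuse == 2, next[0] points at 0 */
    toy_alloc(&s);       /* hands out object 0, inuse == 3 */
    toy_alloc(&s);       /* hands out object 0 again, inuse == 4 */
    return toy_bug_condition(&s);
}
```

In this model a single alloc/free pair never reaches the bug condition; only
the repeated free of object 0 does. That is consistent with a driver-side
double free, but it does not rule out other causes.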

Jeff


2009-09-13 04:57:01

by Thomas Fjellstrom

Subject: Re: 2.6.31-rc9 kernel BUG and mvsas

On Sat September 12 2009, Jeff Garzik wrote:
> On 09/10/2009 07:55 PM, Thomas Fjellstrom wrote:
> > On Wed September 9 2009, Thomas Fjellstrom wrote:
> >> On Wed September 9 2009, Thomas Fjellstrom wrote:
> >>> On Wed September 9 2009, Jeff Garzik wrote:
> >>>> On 09/09/2009 12:30 PM, Thomas Fjellstrom wrote:
> >>>>> No errors on that disk. Other than the one above, and its more of a
> >>>>> warning. However, I just rebooted to add some extra drives, thinking
> >>>>> everything was working a little better now that I've updated to
> >>>>> 2.6.31-rc9, I'm treated to the following two messages right after
> >>>>> boot (and a system lockup to boot):
> >>>>>
> >>>>> kernel: [ 971.033138] ------------[ cut here ]------------
> >>>>> kernel: [ 971.033211] WARNING: at drivers/ata/libata-core.c:4913
> >>>>> __ata_qc_complete+0x5a/0xe1 [libata]()
> >>>>> kernel: [ 971.033217] Hardware name: GA-MA790FXT-UD5P
> >>>>> kernel: [ 971.033221] Modules linked in: powernow_k8
> >>>>> cpufreq_conservative cpufreq_stats cpufreq_userspace
> >>>>> cpufreq_powersave kvm_amd kvm nfsd exportfs nfs lockd fscache nfs_acl
> >>>>> auth_rpcgss sunrpc bridge stp it87 hwmon_vid adt7473 firewire_sbp2
> >>>>> loop md_mod
> >>>>> snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep
> >>>>> snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_midi snd_rawmidi
> >>>>> snd_seq_midi_event snd_seq snd_timer snd_seq_device snd
> >>>>> amd64_edac_mod edac_core i2c_piix4 soundcore snd_page_alloc i2c_core
> >>>>> evdev wmi parport_pc button parport processor ext3 jbd mbcache dm_mod
> >>>>> sg sr_mod cdrom sd_mod crc_t10dif usbhid ata_generic ide_pci_generic
> >>>>> hid mvsas firewire_ohci libsas firewire_core crc_itu_t
> >>>>> scsi_transport_sas r8169 atiixp ide_core floppy ahci mii ohci_hcd
> >>>>> libata ehci_hcd scsi_mod thermal fan thermal_sys [last unloaded:
> >>>>> scsi_wait_scan]
> >>>>> kernel: [ 971.033337] Pid: 0, comm: swapper Not tainted 2.6.31-rc9
> >>>>> #2 kernel: [ 971.033342] Call Trace:
> >>>>> kernel: [ 971.033346]<IRQ> [<ffffffffa00562ca>] ?
> >>>>> __ata_qc_complete+0x5a/0xe1 [libata]
> >>>>> kernel: [ 971.033434] [<ffffffffa00562ca>] ?
> >>>>> __ata_qc_complete+0x5a/0xe1 [libata]
> >>>>> kernel: [ 971.033446] [<ffffffff8104aca0>] ?
> >>>>> warn_slowpath_common+0x77/0xa3 kernel: [ 971.033455]
> >>>>> [<ffffffff81038d06>] ? enqueue_task+0x5c/0x65 kernel: [ 971.033496]
> >>>>> [<ffffffffa00562ca>] ? __ata_qc_complete+0x5a/0xe1 [libata]
> >>>>> kernel: [ 971.033519] [<ffffffffa00f7b59>] ?
> >>>>> sas_ata_task_done+0x178/0x210 [libsas]
> >>>>> kernel: [ 971.033528] [<ffffffff8115ead1>] ?
> >>>>> blk_run_queue+0x21/0x35 kernel: [ 971.033548] [<ffffffffa010e2ce>]
> >>>>> ?
> >>>>> mvs_slot_complete+0x3df/0x41b [mvsas]
> >>>>> kernel: [ 971.033565] [<ffffffffa010e39c>] ? mvs_int_rx+0x92/0x101
> >>>>> [mvsas] kernel: [ 971.033583] [<ffffffffa01112ba>] ?
> >>>>> mvs_int_full+0x25/0x88 [mvsas] kernel: [ 971.033600]
> >>>>> [<ffffffffa011134e>] ? mvs_64xx_isr+0x31/0x40 [mvsas] kernel: [
> >>>>> 971.033617] [<ffffffffa010d0e5>] ? mvs_interrupt+0x61/0x78 [mvsas]
> >>>>> kernel: [ 971.033625] [<ffffffff8108aaac>] ?
> >>>>> handle_IRQ_event+0x58/0x135 kernel: [ 971.033633]
> >>>>> [<ffffffff8108c1a1>] ? handle_fasteoi_irq+0x7d/0xb5 kernel: [
> >>>>> 971.033642]
> >>>>> [<ffffffff8101388d>] ? handle_irq+0x17/0x1d
> >>>>
> >>>> That warning is triggered when an ata_queued_cmd is passed to
> >>>> completion without the ATA_QCFLAG_ACTIVE flag being set (which
> >>>> indicates the qc was started with some activity).
> >>>>
> >>>> That possibly indicates the low-level driver (or libsas) was passing
> >>>> an already-completed cmd to libata.
> >>>>
> >>>>> The added hard drives are connected to a Supermicro AOC-SASLP-MV8,
> >>>>> which is based on a marvel MV64460/64461/64462 chipset, which uses
> >>>>> the sata_mv driver.
> >>>>
> >>>> Surely you mean 'mvsas' driver?
> >>>
> >>> Yes, sorry I did mean mvsas.
> >>>
> >>> I am more concerned about the actual oops/BUG rather than the warning
> >>> though. Unless the problem causing the warning is also causing the
> >>> oops.
> >>>
> >>>> Jeff
> >>
> >> Thanks for taking a look so far. But I'm having more and more trouble
> >> with this card as the days go by:
> >>
> >> [ 464.792214] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> >> [ 464.792222] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> >> [ 494.816170] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> >> [ 494.816179] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> >> [ 494.816192] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> >> [ 494.816197] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> >> [ 525.817335] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> >> [ 525.817343] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> >> [ 525.817358] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> >> [ 525.817363] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> >> [ 556.816148] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> >> [ 556.816157] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> >> [ 556.816170] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> >> [ 556.816175] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> >> [ 587.816171] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> >> [ 587.816179] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> >> [ 587.816193] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> >> [ 587.816199] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> >> [ 600.616255] INFO: task mount:4395 blocked for more than 120 seconds.
> >> [ 600.616263] "echo 0> /proc/sys/kernel/hung_task_timeout_secs"
> >> disables this message.
> >> [ 600.616270] mount D 0000000000000000 0 4395 4229
> >> 0x00000000 [ 600.616281] ffff88012fb6f780 0000000000000082
> >> ffff8800b808ddc8 ffff8800b80f3d60
> >> [ 600.616290] ffff880128923c90 0000000000014800 000000000000f800
> >> ffff88012dd91840
> >> [ 600.616299] ffff88012dd91b38 0000000300000008 0000000000000000
> >> ffff88010c5bfa88
> >> [ 600.616308] Call Trace:
> >> [ 600.616324] [<ffffffff81017015>] ? read_tsc+0xa/0x20
> >> [ 600.616335] [<ffffffff810adc63>] ? __pagevec_free+0x29/0x3b
> >> [ 600.616343] [<ffffffff810661e9>] ? getnstimeofday+0x55/0xaf
> >> [ 600.616351] [<ffffffff810a8e69>] ? sync_page+0x0/0x46
> >> [ 600.616361] [<ffffffff812dc8cb>] ? io_schedule+0x63/0xa5
> >> [ 600.616368] [<ffffffff810a8eaa>] ? sync_page+0x41/0x46
> >> [ 600.616376] [<ffffffff812dcade>] ? __wait_on_bit_lock+0x3f/0x84
> >> [ 600.616383] [<ffffffff810a8e55>] ? __lock_page+0x5d/0x63
> >> [ 600.616391] [<ffffffff8105ef14>] ? wake_bit_function+0x0/0x23
> >> [ 600.616401] [<ffffffff810b0ad4>] ? pagevec_lookup+0x17/0x1e
> >> [ 600.616408] [<ffffffff810b1bbd>] ?
> >> truncate_inode_pages_range+0x288/0x318 [ 600.616418]
> >> [<ffffffff810fc8cb>] ? set_blocksize+0xc2/0xd2
> >> [ 600.616426] [<ffffffff810fc8f2>] ? sb_set_blocksize+0x17/0x43
> >> [ 600.616477] [<ffffffffa04aa8e0>] ? ext4_fill_super+0x1cc/0x2060
> >> [ext4] [ 600.616486] [<ffffffff8117593b>] ? snprintf+0x44/0x4c
> >> [ 600.616493] [<ffffffff810fb5b7>] ? check_disk_change+0x22/0x52
> >> [ 600.616501] [<ffffffff812dd680>] ? __down_write_nested+0x15/0xab
> >> [ 600.616524] [<ffffffff810dbfe6>] ? get_sb_bdev+0x111/0x159
> >> [ 600.616571] [<ffffffffa04aa714>] ? ext4_fill_super+0x0/0x2060 [ext4]
> >> [ 600.616582] [<ffffffff810dbc44>] ? vfs_kern_mount+0x95/0x111
> >> [ 600.616593] [<ffffffff810dbd13>] ? do_kern_mount+0x43/0xe2
> >> [ 600.616607] [<ffffffff810ef49d>] ? do_mount+0x767/0x7d6
> >> [ 600.616620] [<ffffffff810ef591>] ? sys_mount+0x85/0xc8
> >> [ 600.616633] [<ffffffff81010a02>] ? system_call_fastpath+0x16/0x1b
> >> [ 618.816175] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> >> [ 618.816184] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> >> [ 618.816196] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> >> [ 618.816201] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> >> [ 618.816241] sd 1:0:3:0: [sdf] Unhandled error code
> >> [ 618.816247] sd 1:0:3:0: [sdf] Result: hostbyte=DID_OK
> >> driverbyte=DRIVER_TIMEOUT
> >> [ 618.816255] end_request: I/O error, dev sdf, sector 160
> >> [ 618.816263] Buffer I/O error on device md2, logical block 20
> >> [ 618.816274] Buffer I/O error on device md2, logical block 21
> >> [ 618.816280] Buffer I/O error on device md2, logical block 22
> >> [ 618.816285] Buffer I/O error on device md2, logical block 23
> >> [ 618.816291] Buffer I/O error on device md2, logical block 24
> >> [ 618.816298] Buffer I/O error on device md2, logical block 25
> >> [ 618.816307] Buffer I/O error on device md2, logical block 26
> >> [ 618.816316] Buffer I/O error on device md2, logical block 27
> >> [ 618.816324] Buffer I/O error on device md2, logical block 28
> >> [ 618.816331] Buffer I/O error on device md2, logical block 29
> >> [ 649.780185] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> >> [ 649.780194] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> >> [ 649.780208] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> >> [ 649.780214] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> >> [ 649.780293] ------------[ cut here ]------------
> >> [ 649.780366] WARNING: at drivers/ata/libata-core.c:5129
> >> ata_qc_issue+0x10a/0x347 [libata]()
> >> [ 649.780373] Hardware name: GA-MA790FXT-UD5P
> >> [ 649.780377] Modules linked in: ext4 jbd2 crc16 raid0 powernow_k8
> >> cpufreq_conservative cpufreq_stats cpufreq_userspace cpufreq_powersave
> >> kvm_amd kvm nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss sunrpc
> >> bridge stp it87 hwmon_vid adt7473 firewire_sbp2 loop md_mod
> >> snd_hda_codec_realtek
> >> snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm
> >> snd_seq_midi snd_rawmidi snd_seq_midi_event amd64_edac_mod snd_seq
> >> edac_core snd_timer snd_seq_device snd soundcore i2c_piix4
> >> snd_page_alloc i2c_core evdev parport_pc wmi parport button processor
> >> ext3 jbd mbcache dm_mod usbhid hid sg sr_mod cdrom sd_mod crc_t10dif
> >> ata_generic ide_pci_generic firewire_ohci firewire_core ohci_hcd
> >> crc_itu_t atiixp ide_core mvsas ehci_hcd ahci libsas libata
> >> scsi_transport_sas scsi_mod r8169 mii floppy thermal fan thermal_sys
> >> [last unloaded: scsi_wait_scan] [ 649.780499] Pid: 3185, comm: hddtemp
> >> Not tainted 2.6.31-rc9 #2 [ 649.780504] Call Trace:
> >> [ 649.780551] [<ffffffffa0074f90>] ? ata_qc_issue+0x10a/0x347 [libata]
> >> [ 649.780593] [<ffffffffa0074f90>] ? ata_qc_issue+0x10a/0x347 [libata]
> >> [ 649.780607] [<ffffffff8104aca0>] ? warn_slowpath_common+0x77/0xa3
> >> [ 649.780649] [<ffffffffa00790ce>] ? ata_scsi_pass_thru+0x0/0x240
> >> [libata] [ 649.780690] [<ffffffffa0074f90>] ?
> >> ata_qc_issue+0x10a/0x347 [libata] [ 649.780733] [<ffffffffa00401d6>] ?
> >> scsi_get_command+0x75/0x97 [scsi_mod] [ 649.780776]
> >> [<ffffffffa00790ce>] ?
> >> ata_scsi_pass_thru+0x0/0x240 [libata] [ 649.780815]
> >> [<ffffffffa003f7aa>] ? scsi_done+0x0/0xc [scsi_mod] [ 649.780858]
> >> [<ffffffffa007a4d5>] ? __ata_scsi_queuecmd+0x185/0x1dc [libata]
> >> [ 649.780896] [<ffffffffa003f7aa>] ? scsi_done+0x0/0xc [scsi_mod]
> >> [ 649.780918] [<ffffffffa00a3c8e>] ? sas_queuecommand+0x83/0x25d
> >> [libsas] [ 649.780956] [<ffffffffa003fa7c>] ?
> >> scsi_dispatch_cmd+0x1c0/0x23c [scsi_mod]
> >> [ 649.780996] [<ffffffffa0044ff0>] ? scsi_request_fn+0x3a5/0x506
> >> [scsi_mod] [ 649.781006] [<ffffffff810546e0>] ? del_timer+0x59/0x62
> >> [ 649.781016] [<ffffffff81163b70>] ? blk_execute_rq_nowait+0x65/0x89
> >> [ 649.781032] [<ffffffffa014164f>] ? sg_common_write+0x489/0x4ab [sg]
> >> [ 649.781042] [<ffffffff8115df56>] ? __freed_request+0x26/0x83
> >> [ 649.781056] [<ffffffffa01421da>] ? sg_new_write+0x23e/0x269 [sg]
> >> [ 649.781070] [<ffffffffa0142473>] ? sg_ioctl+0x26e/0xb63 [sg]
> >> [ 649.781080] [<ffffffff81100f38>] ? inotify_d_instantiate+0x12/0x39
> >> [ 649.781088] [<ffffffff8105eee6>] ? autoremove_wake_function+0x0/0x2e
> >> [ 649.781098] [<ffffffff810d80bf>] ? fd_install+0x2e/0x5a
> >> [ 649.781105] [<ffffffff810e5247>] ? vfs_ioctl+0x56/0x6c
> >> [ 649.781111] [<ffffffff810e570a>] ? do_vfs_ioctl+0x437/0x475
> >> [ 649.781118] [<ffffffff810e5799>] ? sys_ioctl+0x51/0x70
> >> [ 649.781128] [<ffffffff81010a02>] ? system_call_fastpath+0x16/0x1b
> >> [ 649.781134] ---[ end trace 9005373b1b9c6eb7 ]---
> >> [ 680.781672] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> >> [ 680.781681] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> >> [ 711.781672] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> >> [ 711.781681] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> >> [ 741.816069] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> >> [ 741.816078] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> >> [ 741.816090] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> >> [ 741.816096] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> >> [ 772.820160] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> >> [ 772.820168] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> >> [ 772.820181] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> >> [ 772.820186] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> >> [ 803.816212] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> >> [ 803.816220] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> >> [ 803.816233] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> >> [ 803.816239] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> >> [ 803.816264] sd 1:0:3:0: [sdf] Unhandled error code
> >> [ 803.816270] sd 1:0:3:0: [sdf] Result: hostbyte=DID_OK
> >> driverbyte=DRIVER_TIMEOUT
> >> [ 803.816278] end_request: I/O error, dev sdf, sector 512
> >> [ 803.816284] __ratelimit: 70 callbacks suppressed
> >> [ 803.816290] Buffer I/O error on device md2, logical block 64
> >>
> >> That's after bringing up a raid0 array I build a few days ago on 4
> >> perfectly good (Seagate Baracuda 1TB 7200.12) disks, without the bad
> >> disk plugged in. I try to mount it, and the driver hangs. Anything
> >> trying to access any of the 4 disks hangs as well.
> >>
> >> I know this array worked a few days ago. The most major change I've made
> >> was upgrade from -rc8 (or -rc5, not sure if I mounted the array under
> >> -rc8) to -rc9.
> >
> > Hi, just wondering if anyone has had a chance to look at this, or if
> > there's some patches I should try out, or if you need me to do some
> > testing, I'd be glad to, thanks :)
>
> I was hoping that some VM would jump in. The BUG in question is, from
> mm/slab.c:
>
> /*
> * The slab was either on partial or free list so
> * there must be at least one object available for
> * allocation.
> */
> BUG_ON(slabp->inuse >= cachep->num);
>
> So I wonder if that is a double-free, indicating a bug in
> SCSI/libsas/mvsas, or a VM problem of some sort.
>
> Was free memory low on that machine, at that point, perchance?

It's remotely possible due to leaks in programs, but even Konqueror and Firefox
don't eat up memory _that_ fast, so it's very unlikely. This machine has 4G of
RAM.
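
One way to answer the free-memory question with numbers rather than a guess is to snapshot /proc/meminfo around the time of the failure. The sketch below is illustrative only: the helper names are made up, and counting Buffers and Cached as reclaimable is a simplifying assumption.

```python
def parse_meminfo(text):
    """Map /proc/meminfo field names to their numeric values (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def free_fraction(fields):
    """Fraction of RAM that is free or cheaply reclaimable.

    Assumption: Buffers and Cached are treated as reclaimable memory.
    """
    free_kb = fields["MemFree"] + fields.get("Buffers", 0) + fields.get("Cached", 0)
    return free_kb / fields["MemTotal"]
```

On a live system this could be called as free_fraction(parse_meminfo(open("/proc/meminfo").read())).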

> Jeff


--
Thomas Fjellstrom
[email protected]

2009-09-15 16:20:57

by Thomas Fjellstrom

Subject: Re: 2.6.31-rc9 kernel BUG and mvsas

On Sat September 12 2009, Thomas Fjellstrom wrote:
> On Sat September 12 2009, Jeff Garzik wrote:
> > On 09/10/2009 07:55 PM, Thomas Fjellstrom wrote:
> > > On Wed September 9 2009, Thomas Fjellstrom wrote:
> > >> On Wed September 9 2009, Thomas Fjellstrom wrote:
> > >>> On Wed September 9 2009, Jeff Garzik wrote:
> > >>>> On 09/09/2009 12:30 PM, Thomas Fjellstrom wrote:
> > >>>>> No errors on that disk. Other than the one above, and it's more of a
> > >>>>> warning. However, I just rebooted to add some extra drives,
> > >>>>> thinking everything was working a little better now that I've
> > >>>>> updated to 2.6.31-rc9, I'm treated to the following two messages
> > >>>>> right after boot (and a system lockup to boot):
> > >>>>>
> > >>>>> kernel: [ 971.033138] ------------[ cut here ]------------
> > >>>>> kernel: [ 971.033211] WARNING: at drivers/ata/libata-core.c:4913
> > >>>>> __ata_qc_complete+0x5a/0xe1 [libata]()
> > >>>>> kernel: [ 971.033217] Hardware name: GA-MA790FXT-UD5P
> > >>>>> kernel: [ 971.033221] Modules linked in: powernow_k8
> > >>>>> cpufreq_conservative cpufreq_stats cpufreq_userspace
> > >>>>> cpufreq_powersave kvm_amd kvm nfsd exportfs nfs lockd fscache
> > >>>>> nfs_acl auth_rpcgss sunrpc bridge stp it87 hwmon_vid adt7473
> > >>>>> firewire_sbp2 loop md_mod
> > >>>>> snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep
> > >>>>> snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_midi snd_rawmidi
> > >>>>> snd_seq_midi_event snd_seq snd_timer snd_seq_device snd
> > >>>>> amd64_edac_mod edac_core i2c_piix4 soundcore snd_page_alloc
> > >>>>> i2c_core evdev wmi parport_pc button parport processor ext3 jbd
> > >>>>> mbcache dm_mod sg sr_mod cdrom sd_mod crc_t10dif usbhid ata_generic
> > >>>>> ide_pci_generic hid mvsas firewire_ohci libsas firewire_core
> > >>>>> crc_itu_t
> > >>>>> scsi_transport_sas r8169 atiixp ide_core floppy ahci mii ohci_hcd
> > >>>>> libata ehci_hcd scsi_mod thermal fan thermal_sys [last unloaded:
> > >>>>> scsi_wait_scan]
> > >>>>> kernel: [ 971.033337] Pid: 0, comm: swapper Not tainted 2.6.31-rc9 #2
> > >>>>> kernel: [ 971.033342] Call Trace:
> > >>>>> kernel: [ 971.033346] <IRQ> [<ffffffffa00562ca>] ? __ata_qc_complete+0x5a/0xe1 [libata]
> > >>>>> kernel: [ 971.033434] [<ffffffffa00562ca>] ? __ata_qc_complete+0x5a/0xe1 [libata]
> > >>>>> kernel: [ 971.033446] [<ffffffff8104aca0>] ? warn_slowpath_common+0x77/0xa3
> > >>>>> kernel: [ 971.033455] [<ffffffff81038d06>] ? enqueue_task+0x5c/0x65
> > >>>>> kernel: [ 971.033496] [<ffffffffa00562ca>] ? __ata_qc_complete+0x5a/0xe1 [libata]
> > >>>>> kernel: [ 971.033519] [<ffffffffa00f7b59>] ? sas_ata_task_done+0x178/0x210 [libsas]
> > >>>>> kernel: [ 971.033528] [<ffffffff8115ead1>] ? blk_run_queue+0x21/0x35
> > >>>>> kernel: [ 971.033548] [<ffffffffa010e2ce>] ? mvs_slot_complete+0x3df/0x41b [mvsas]
> > >>>>> kernel: [ 971.033565] [<ffffffffa010e39c>] ? mvs_int_rx+0x92/0x101 [mvsas]
> > >>>>> kernel: [ 971.033583] [<ffffffffa01112ba>] ? mvs_int_full+0x25/0x88 [mvsas]
> > >>>>> kernel: [ 971.033600] [<ffffffffa011134e>] ? mvs_64xx_isr+0x31/0x40 [mvsas]
> > >>>>> kernel: [ 971.033617] [<ffffffffa010d0e5>] ? mvs_interrupt+0x61/0x78 [mvsas]
> > >>>>> kernel: [ 971.033625] [<ffffffff8108aaac>] ? handle_IRQ_event+0x58/0x135
> > >>>>> kernel: [ 971.033633] [<ffffffff8108c1a1>] ? handle_fasteoi_irq+0x7d/0xb5
> > >>>>> kernel: [ 971.033642] [<ffffffff8101388d>] ? handle_irq+0x17/0x1d
> > >>>>
> > >>>> That warning is triggered when an ata_queued_cmd is passed to
> > >>>> completion without the ATA_QCFLAG_ACTIVE flag being set (which
> > >>>> indicates the qc was started with some activity).
> > >>>>
> > >>>> That possibly indicates the low-level driver (or libsas) was passing
> > >>>> an already-completed cmd to libata.
> > >>>>
> > >>>>> The added hard drives are connected to a Supermicro AOC-SASLP-MV8,
> > >>>>> which is based on a Marvell MV64460/64461/64462 chipset, which uses
> > >>>>> the sata_mv driver.
> > >>>>
> > >>>> Surely you mean 'mvsas' driver?
> > >>>
> > >>> Yes, sorry I did mean mvsas.
> > >>>
> > >>> I am more concerned about the actual oops/BUG rather than the warning
> > >>> though. Unless the problem causing the warning is also causing the
> > >>> oops.
> > >>>
> > >>>> Jeff
> > >>
> > >> Thanks for taking a look so far. But I'm having more and more trouble
> > >> with this card as the days go by:
> > >>
> > >> [ 464.792214] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > >> [ 464.792222] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > >> [ 494.816170] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > >> [ 494.816179] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > >> [ 494.816192] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > >> [ 494.816197] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > >> [ 525.817335] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > >> [ 525.817343] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > >> [ 525.817358] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > >> [ 525.817363] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > >> [ 556.816148] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > >> [ 556.816157] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > >> [ 556.816170] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > >> [ 556.816175] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > >> [ 587.816171] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > >> [ 587.816179] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > >> [ 587.816193] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > >> [ 587.816199] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > >> [ 600.616255] INFO: task mount:4395 blocked for more than 120 seconds.
> > >> [ 600.616263] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > >> [ 600.616270] mount D 0000000000000000 0 4395 4229 0x00000000
> > >> [ 600.616281] ffff88012fb6f780 0000000000000082 ffff8800b808ddc8 ffff8800b80f3d60
> > >> [ 600.616290] ffff880128923c90 0000000000014800 000000000000f800 ffff88012dd91840
> > >> [ 600.616299] ffff88012dd91b38 0000000300000008 0000000000000000 ffff88010c5bfa88
> > >> [ 600.616308] Call Trace:
> > >> [ 600.616324] [<ffffffff81017015>] ? read_tsc+0xa/0x20
> > >> [ 600.616335] [<ffffffff810adc63>] ? __pagevec_free+0x29/0x3b
> > >> [ 600.616343] [<ffffffff810661e9>] ? getnstimeofday+0x55/0xaf
> > >> [ 600.616351] [<ffffffff810a8e69>] ? sync_page+0x0/0x46
> > >> [ 600.616361] [<ffffffff812dc8cb>] ? io_schedule+0x63/0xa5
> > >> [ 600.616368] [<ffffffff810a8eaa>] ? sync_page+0x41/0x46
> > >> [ 600.616376] [<ffffffff812dcade>] ? __wait_on_bit_lock+0x3f/0x84
> > >> [ 600.616383] [<ffffffff810a8e55>] ? __lock_page+0x5d/0x63
> > >> [ 600.616391] [<ffffffff8105ef14>] ? wake_bit_function+0x0/0x23
> > >> [ 600.616401] [<ffffffff810b0ad4>] ? pagevec_lookup+0x17/0x1e
> > >> [ 600.616408] [<ffffffff810b1bbd>] ? truncate_inode_pages_range+0x288/0x318
> > >> [ 600.616418] [<ffffffff810fc8cb>] ? set_blocksize+0xc2/0xd2
> > >> [ 600.616426] [<ffffffff810fc8f2>] ? sb_set_blocksize+0x17/0x43
> > >> [ 600.616477] [<ffffffffa04aa8e0>] ? ext4_fill_super+0x1cc/0x2060 [ext4]
> > >> [ 600.616486] [<ffffffff8117593b>] ? snprintf+0x44/0x4c
> > >> [ 600.616493] [<ffffffff810fb5b7>] ? check_disk_change+0x22/0x52
> > >> [ 600.616501] [<ffffffff812dd680>] ? __down_write_nested+0x15/0xab
> > >> [ 600.616524] [<ffffffff810dbfe6>] ? get_sb_bdev+0x111/0x159
> > >> [ 600.616571] [<ffffffffa04aa714>] ? ext4_fill_super+0x0/0x2060 [ext4]
> > >> [ 600.616582] [<ffffffff810dbc44>] ? vfs_kern_mount+0x95/0x111
> > >> [ 600.616593] [<ffffffff810dbd13>] ? do_kern_mount+0x43/0xe2
> > >> [ 600.616607] [<ffffffff810ef49d>] ? do_mount+0x767/0x7d6
> > >> [ 600.616620] [<ffffffff810ef591>] ? sys_mount+0x85/0xc8
> > >> [ 600.616633] [<ffffffff81010a02>] ? system_call_fastpath+0x16/0x1b
> > >> [ 618.816175] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > >> [ 618.816184] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > >> [ 618.816196] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > >> [ 618.816201] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > >> [ 618.816241] sd 1:0:3:0: [sdf] Unhandled error code
> > >> [ 618.816247] sd 1:0:3:0: [sdf] Result: hostbyte=DID_OK
> > >> driverbyte=DRIVER_TIMEOUT
> > >> [ 618.816255] end_request: I/O error, dev sdf, sector 160
> > >> [ 618.816263] Buffer I/O error on device md2, logical block 20
> > >> [ 618.816274] Buffer I/O error on device md2, logical block 21
> > >> [ 618.816280] Buffer I/O error on device md2, logical block 22
> > >> [ 618.816285] Buffer I/O error on device md2, logical block 23
> > >> [ 618.816291] Buffer I/O error on device md2, logical block 24
> > >> [ 618.816298] Buffer I/O error on device md2, logical block 25
> > >> [ 618.816307] Buffer I/O error on device md2, logical block 26
> > >> [ 618.816316] Buffer I/O error on device md2, logical block 27
> > >> [ 618.816324] Buffer I/O error on device md2, logical block 28
> > >> [ 618.816331] Buffer I/O error on device md2, logical block 29
> > >> [ 649.780185] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > >> [ 649.780194] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > >> [ 649.780208] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > >> [ 649.780214] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > >> [ 649.780293] ------------[ cut here ]------------
> > >> [ 649.780366] WARNING: at drivers/ata/libata-core.c:5129
> > >> ata_qc_issue+0x10a/0x347 [libata]()
> > >> [ 649.780373] Hardware name: GA-MA790FXT-UD5P
> > >> [ 649.780377] Modules linked in: ext4 jbd2 crc16 raid0 powernow_k8
> > >> cpufreq_conservative cpufreq_stats cpufreq_userspace cpufreq_powersave
> > >> kvm_amd kvm nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss
> > >> sunrpc bridge stp it87 hwmon_vid adt7473 firewire_sbp2 loop md_mod
> > >> snd_hda_codec_realtek
> > >> snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss
> > >> snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event amd64_edac_mod
> > >> snd_seq edac_core snd_timer snd_seq_device snd soundcore i2c_piix4
> > >> snd_page_alloc i2c_core evdev parport_pc wmi parport button processor
> > >> ext3 jbd mbcache dm_mod usbhid hid sg sr_mod cdrom sd_mod crc_t10dif
> > >> ata_generic ide_pci_generic firewire_ohci firewire_core ohci_hcd
> > >> crc_itu_t atiixp ide_core mvsas ehci_hcd ahci libsas libata
> > >> scsi_transport_sas scsi_mod r8169 mii floppy thermal fan thermal_sys
> > >> [last unloaded: scsi_wait_scan]
> > >> [ 649.780499] Pid: 3185, comm: hddtemp Not tainted 2.6.31-rc9 #2
> > >> [ 649.780504] Call Trace:
> > >> [ 649.780551] [<ffffffffa0074f90>] ? ata_qc_issue+0x10a/0x347 [libata]
> > >> [ 649.780593] [<ffffffffa0074f90>] ? ata_qc_issue+0x10a/0x347 [libata]
> > >> [ 649.780607] [<ffffffff8104aca0>] ? warn_slowpath_common+0x77/0xa3
> > >> [ 649.780649] [<ffffffffa00790ce>] ? ata_scsi_pass_thru+0x0/0x240 [libata]
> > >> [ 649.780690] [<ffffffffa0074f90>] ? ata_qc_issue+0x10a/0x347 [libata]
> > >> [ 649.780733] [<ffffffffa00401d6>] ? scsi_get_command+0x75/0x97 [scsi_mod]
> > >> [ 649.780776] [<ffffffffa00790ce>] ? ata_scsi_pass_thru+0x0/0x240 [libata]
> > >> [ 649.780815] [<ffffffffa003f7aa>] ? scsi_done+0x0/0xc [scsi_mod]
> > >> [ 649.780858] [<ffffffffa007a4d5>] ? __ata_scsi_queuecmd+0x185/0x1dc [libata]
> > >> [ 649.780896] [<ffffffffa003f7aa>] ? scsi_done+0x0/0xc [scsi_mod]
> > >> [ 649.780918] [<ffffffffa00a3c8e>] ? sas_queuecommand+0x83/0x25d [libsas]
> > >> [ 649.780956] [<ffffffffa003fa7c>] ? scsi_dispatch_cmd+0x1c0/0x23c [scsi_mod]
> > >> [ 649.780996] [<ffffffffa0044ff0>] ? scsi_request_fn+0x3a5/0x506 [scsi_mod]
> > >> [ 649.781006] [<ffffffff810546e0>] ? del_timer+0x59/0x62
> > >> [ 649.781016] [<ffffffff81163b70>] ? blk_execute_rq_nowait+0x65/0x89
> > >> [ 649.781032] [<ffffffffa014164f>] ? sg_common_write+0x489/0x4ab [sg]
> > >> [ 649.781042] [<ffffffff8115df56>] ? __freed_request+0x26/0x83
> > >> [ 649.781056] [<ffffffffa01421da>] ? sg_new_write+0x23e/0x269 [sg]
> > >> [ 649.781070] [<ffffffffa0142473>] ? sg_ioctl+0x26e/0xb63 [sg]
> > >> [ 649.781080] [<ffffffff81100f38>] ? inotify_d_instantiate+0x12/0x39
> > >> [ 649.781088] [<ffffffff8105eee6>] ? autoremove_wake_function+0x0/0x2e
> > >> [ 649.781098] [<ffffffff810d80bf>] ? fd_install+0x2e/0x5a
> > >> [ 649.781105] [<ffffffff810e5247>] ? vfs_ioctl+0x56/0x6c
> > >> [ 649.781111] [<ffffffff810e570a>] ? do_vfs_ioctl+0x437/0x475
> > >> [ 649.781118] [<ffffffff810e5799>] ? sys_ioctl+0x51/0x70
> > >> [ 649.781128] [<ffffffff81010a02>] ? system_call_fastpath+0x16/0x1b
> > >> [ 649.781134] ---[ end trace 9005373b1b9c6eb7 ]---
> > >> [ 680.781672] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > >> [ 680.781681] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > >> [ 711.781672] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > >> [ 711.781681] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > >> [ 741.816069] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > >> [ 741.816078] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > >> [ 741.816090] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > >> [ 741.816096] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > >> [ 772.820160] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > >> [ 772.820168] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > >> [ 772.820181] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > >> [ 772.820186] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > >> [ 803.816212] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > >> [ 803.816220] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > >> [ 803.816233] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > >> [ 803.816239] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > >> [ 803.816264] sd 1:0:3:0: [sdf] Unhandled error code
> > >> [ 803.816270] sd 1:0:3:0: [sdf] Result: hostbyte=DID_OK
> > >> driverbyte=DRIVER_TIMEOUT
> > >> [ 803.816278] end_request: I/O error, dev sdf, sector 512
> > >> [ 803.816284] __ratelimit: 70 callbacks suppressed
> > >> [ 803.816290] Buffer I/O error on device md2, logical block 64
> > >>
> > >> That's after bringing up a raid0 array I built a few days ago on 4
> > >> perfectly good (Seagate Barracuda 1TB 7200.12) disks, without the bad
> > >> disk plugged in. I try to mount it, and the driver hangs. Anything
> > >> trying to access any of the 4 disks hangs as well.
> > >>
> > >> I know this array worked a few days ago. The biggest change I've
> > >> made was upgrading from -rc8 (or -rc5, not sure if I mounted the array
> > >> under -rc8) to -rc9.
> > >
> > > Hi, just wondering if anyone has had a chance to look at this, or if
> > > there are some patches I should try out, or if you need me to do some
> > > testing, I'd be glad to, thanks :)
> >
> > I was hoping that some VM developer would jump in. The BUG in question is, from
> > mm/slab.c:
> >
> > /*
> > * The slab was either on partial or free list so
> > * there must be at least one object available for
> > * allocation.
> > */
> > BUG_ON(slabp->inuse >= cachep->num);
> >
> > So I wonder if that is a double-free, indicating a bug in
> > SCSI/libsas/mvsas, or a VM problem of some sort.
> >
> > Was free memory low on that machine, at that point, perchance?
>
> It's remotely possible due to leaks in programs, but even Konqueror and
> Firefox don't eat up memory _that_ fast, so it's very unlikely. This
> machine has 4G of RAM.
>
> > Jeff
> >
> >

OK, I've figured out one of my problems: the WD drive[s] don't much like
hddtemp and smartmon running all the time. I disabled them and ran a long
read test on the drives on a different controller, and they work fine.
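
A long read test of this kind can be sketched roughly as follows. This is only an illustration and not the tool used for the test above, which is not named here; the function name and the device path in the usage note are hypothetical. It reads a file or block device from start to end in fixed-size chunks, so any I/O error from the drive surfaces as an OSError.

```python
def sequential_read(path, chunk_size=1024 * 1024):
    """Read `path` from start to end and return the number of bytes read.

    An I/O error reported by the kernel is raised as OSError by read().
    """
    total = 0
    # buffering=0 gives a raw file object, so each read() maps to one syscall
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes means end of file/device
                break
            total += len(chunk)
    return total
```

For a whole-disk test this would be run as root against the device node, for example sequential_read("/dev/sdX"), where sdX is a placeholder for the drive being tested.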

Things are not so rosy otherwise, however: trying to do anything with the
Marvell-based SAS card causes errors.

I've attached the dmesg log from a fresh boot of 2.6.31-git4 (with the edac
amd patch applied).
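
For skimming a log of this size, the repeated mvsas messages can be tallied with a few lines of Python. This is a rough sketch, not part of any kernel tooling: the function name is made up, and the pattern is modelled only on the message format seen earlier in this thread ("drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5").

```python
import re
from collections import Counter

# Matches lines such as:
# [ 772.820181] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
MVSAS_LINE = re.compile(
    r"\[\s*\d+\.\d+\]\s+drivers/scsi/mvsas/mv_sas\.c\s+"
    r"\d+:(?P<func>\w+):rc=\s*(?P<rc>-?\d+)"
)


def summarize_mvsas(log_text):
    """Count mvsas debug messages per (function name, return code)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = MVSAS_LINE.search(line)
        if match:
            counts[(match.group("func"), int(match.group("rc")))] += 1
    return counts
```

Running it over the attachment, e.g. summarize_mvsas(open("mvsas.dmesg").read()), gives one count per function and return code instead of hundreds of near-identical lines.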

I suspect the card somehow isn't initializing things right, as the "Activity"
lights on my hotswap bays are on for the two Seagate Barracuda drives; not
fully lit, but on nonetheless.

--
Thomas Fjellstrom
[email protected]


Attachments:
mvsas.dmesg (75.44 kB)

2009-09-17 15:40:04

by Thomas Fjellstrom

Subject: 2.6.31-git4 mvsas DRIVER_TIMEOUT

On Tue September 15 2009, Thomas Fjellstrom wrote:
> On Sat September 12 2009, Thomas Fjellstrom wrote:
> > On Sat September 12 2009, Jeff Garzik wrote:
> > > On 09/10/2009 07:55 PM, Thomas Fjellstrom wrote:
> > > > On Wed September 9 2009, Thomas Fjellstrom wrote:
> > > >> On Wed September 9 2009, Thomas Fjellstrom wrote:
> > > >>> On Wed September 9 2009, Jeff Garzik wrote:
> > > >>>> On 09/09/2009 12:30 PM, Thomas Fjellstrom wrote:
> > > >>>>> No errors on that disk. Other than the one above, and it's more of
> > > >>>>> a warning. However, I just rebooted to add some extra drives,
> > > >>>>> thinking everything was working a little better now that I've
> > > >>>>> updated to 2.6.31-rc9, I'm treated to the following two messages
> > > >>>>> right after boot (and a system lockup to boot):
> > > >>>>>
> > > >>>>> kernel: [ 971.033138] ------------[ cut here ]------------
> > > >>>>> kernel: [ 971.033211] WARNING: at drivers/ata/libata-core.c:4913
> > > >>>>> __ata_qc_complete+0x5a/0xe1 [libata]()
> > > >>>>> kernel: [ 971.033217] Hardware name: GA-MA790FXT-UD5P
> > > >>>>> kernel: [ 971.033221] Modules linked in: powernow_k8
> > > >>>>> cpufreq_conservative cpufreq_stats cpufreq_userspace
> > > >>>>> cpufreq_powersave kvm_amd kvm nfsd exportfs nfs lockd fscache
> > > >>>>> nfs_acl auth_rpcgss sunrpc bridge stp it87 hwmon_vid adt7473
> > > >>>>> firewire_sbp2 loop md_mod
> > > >>>>> snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep
> > > >>>>> snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_midi snd_rawmidi
> > > >>>>> snd_seq_midi_event snd_seq snd_timer snd_seq_device snd
> > > >>>>> amd64_edac_mod edac_core i2c_piix4 soundcore snd_page_alloc
> > > >>>>> i2c_core evdev wmi parport_pc button parport processor ext3 jbd
> > > >>>>> mbcache dm_mod sg sr_mod cdrom sd_mod crc_t10dif usbhid
> > > >>>>> ata_generic ide_pci_generic hid mvsas firewire_ohci libsas
> > > >>>>> firewire_core crc_itu_t
> > > >>>>> scsi_transport_sas r8169 atiixp ide_core floppy ahci mii ohci_hcd
> > > >>>>> libata ehci_hcd scsi_mod thermal fan thermal_sys [last unloaded:
> > > >>>>> scsi_wait_scan]
> > > >>>>> kernel: [ 971.033337] Pid: 0, comm: swapper Not tainted 2.6.31-rc9 #2
> > > >>>>> kernel: [ 971.033342] Call Trace:
> > > >>>>> kernel: [ 971.033346] <IRQ> [<ffffffffa00562ca>] ? __ata_qc_complete+0x5a/0xe1 [libata]
> > > >>>>> kernel: [ 971.033434] [<ffffffffa00562ca>] ? __ata_qc_complete+0x5a/0xe1 [libata]
> > > >>>>> kernel: [ 971.033446] [<ffffffff8104aca0>] ? warn_slowpath_common+0x77/0xa3
> > > >>>>> kernel: [ 971.033455] [<ffffffff81038d06>] ? enqueue_task+0x5c/0x65
> > > >>>>> kernel: [ 971.033496] [<ffffffffa00562ca>] ? __ata_qc_complete+0x5a/0xe1 [libata]
> > > >>>>> kernel: [ 971.033519] [<ffffffffa00f7b59>] ? sas_ata_task_done+0x178/0x210 [libsas]
> > > >>>>> kernel: [ 971.033528] [<ffffffff8115ead1>] ? blk_run_queue+0x21/0x35
> > > >>>>> kernel: [ 971.033548] [<ffffffffa010e2ce>] ? mvs_slot_complete+0x3df/0x41b [mvsas]
> > > >>>>> kernel: [ 971.033565] [<ffffffffa010e39c>] ? mvs_int_rx+0x92/0x101 [mvsas]
> > > >>>>> kernel: [ 971.033583] [<ffffffffa01112ba>] ? mvs_int_full+0x25/0x88 [mvsas]
> > > >>>>> kernel: [ 971.033600] [<ffffffffa011134e>] ? mvs_64xx_isr+0x31/0x40 [mvsas]
> > > >>>>> kernel: [ 971.033617] [<ffffffffa010d0e5>] ? mvs_interrupt+0x61/0x78 [mvsas]
> > > >>>>> kernel: [ 971.033625] [<ffffffff8108aaac>] ? handle_IRQ_event+0x58/0x135
> > > >>>>> kernel: [ 971.033633] [<ffffffff8108c1a1>] ? handle_fasteoi_irq+0x7d/0xb5
> > > >>>>> kernel: [ 971.033642] [<ffffffff8101388d>] ? handle_irq+0x17/0x1d
> > > >>>>
> > > >>>> That warning is triggered when an ata_queued_cmd is passed to
> > > >>>> completion without the ATA_QCFLAG_ACTIVE flag being set (which
> > > >>>> indicates the qc was started with some activity).
> > > >>>>
> > > >>>> That possibly indicates the low-level driver (or libsas) was
> > > >>>> passing an already-completed cmd to libata.
> > > >>>>
> > > >>>>> The added hard drives are connected to a Supermicro
> > > >>>>> AOC-SASLP-MV8, which is based on a Marvell MV64460/64461/64462
> > > >>>>> chipset, which uses the sata_mv driver.
> > > >>>>
> > > >>>> Surely you mean 'mvsas' driver?
> > > >>>
> > > >>> Yes, sorry I did mean mvsas.
> > > >>>
> > > >>> I am more concerned about the actual oops/BUG rather than the
> > > >>> warning though. Unless the problem causing the warning is also
> > > >>> causing the oops.
> > > >>>
> > > >>>> Jeff
> > > >>
> > > >> Thanks for taking a look so far. But I'm having more and more
> > > >> trouble with this card as the days go by:
> > > >>
> > > >> [ 464.792214] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > > >> [ 464.792222] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > > >> [ 494.816170] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > > >> [ 494.816179] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > > >> [ 494.816192] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > > >> [ 494.816197] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > > >> [ 525.817335] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > > >> [ 525.817343] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > > >> [ 525.817358] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > > >> [ 525.817363] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > > >> [ 556.816148] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > > >> [ 556.816157] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > > >> [ 556.816170] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > > >> [ 556.816175] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > > >> [ 587.816171] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > > >> [ 587.816179] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > > >> [ 587.816193] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > > >> [ 587.816199] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > > >> [ 600.616255] INFO: task mount:4395 blocked for more than 120 seconds.
> > > >> [ 600.616263] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > >> [ 600.616270] mount D 0000000000000000 0 4395 4229 0x00000000
> > > >> [ 600.616281] ffff88012fb6f780 0000000000000082 ffff8800b808ddc8 ffff8800b80f3d60
> > > >> [ 600.616290] ffff880128923c90 0000000000014800 000000000000f800 ffff88012dd91840
> > > >> [ 600.616299] ffff88012dd91b38 0000000300000008 0000000000000000 ffff88010c5bfa88
> > > >> [ 600.616308] Call Trace:
> > > >> [ 600.616324] [<ffffffff81017015>] ? read_tsc+0xa/0x20
> > > >> [ 600.616335] [<ffffffff810adc63>] ? __pagevec_free+0x29/0x3b
> > > >> [ 600.616343] [<ffffffff810661e9>] ? getnstimeofday+0x55/0xaf
> > > >> [ 600.616351] [<ffffffff810a8e69>] ? sync_page+0x0/0x46
> > > >> [ 600.616361] [<ffffffff812dc8cb>] ? io_schedule+0x63/0xa5
> > > >> [ 600.616368] [<ffffffff810a8eaa>] ? sync_page+0x41/0x46
> > > >> [ 600.616376] [<ffffffff812dcade>] ? __wait_on_bit_lock+0x3f/0x84
> > > >> [ 600.616383] [<ffffffff810a8e55>] ? __lock_page+0x5d/0x63
> > > >> [ 600.616391] [<ffffffff8105ef14>] ? wake_bit_function+0x0/0x23
> > > >> [ 600.616401] [<ffffffff810b0ad4>] ? pagevec_lookup+0x17/0x1e
> > > >> [ 600.616408] [<ffffffff810b1bbd>] ? truncate_inode_pages_range+0x288/0x318
> > > >> [ 600.616418] [<ffffffff810fc8cb>] ? set_blocksize+0xc2/0xd2
> > > >> [ 600.616426] [<ffffffff810fc8f2>] ? sb_set_blocksize+0x17/0x43
> > > >> [ 600.616477] [<ffffffffa04aa8e0>] ? ext4_fill_super+0x1cc/0x2060 [ext4]
> > > >> [ 600.616486] [<ffffffff8117593b>] ? snprintf+0x44/0x4c
> > > >> [ 600.616493] [<ffffffff810fb5b7>] ? check_disk_change+0x22/0x52
> > > >> [ 600.616501] [<ffffffff812dd680>] ? __down_write_nested+0x15/0xab
> > > >> [ 600.616524] [<ffffffff810dbfe6>] ? get_sb_bdev+0x111/0x159
> > > >> [ 600.616571] [<ffffffffa04aa714>] ? ext4_fill_super+0x0/0x2060 [ext4]
> > > >> [ 600.616582] [<ffffffff810dbc44>] ? vfs_kern_mount+0x95/0x111
> > > >> [ 600.616593] [<ffffffff810dbd13>] ? do_kern_mount+0x43/0xe2
> > > >> [ 600.616607] [<ffffffff810ef49d>] ? do_mount+0x767/0x7d6
> > > >> [ 600.616620] [<ffffffff810ef591>] ? sys_mount+0x85/0xc8
> > > >> [ 600.616633] [<ffffffff81010a02>] ? system_call_fastpath+0x16/0x1b
> > > >> [ 618.816175] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > > >> [ 618.816184] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > > >> [ 618.816196] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > > >> [ 618.816201] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > > >> [ 618.816241] sd 1:0:3:0: [sdf] Unhandled error code
> > > >> [ 618.816247] sd 1:0:3:0: [sdf] Result: hostbyte=DID_OK
> > > >> driverbyte=DRIVER_TIMEOUT
> > > >> [ 618.816255] end_request: I/O error, dev sdf, sector 160
> > > >> [ 618.816263] Buffer I/O error on device md2, logical block 20
> > > >> [ 618.816274] Buffer I/O error on device md2, logical block 21
> > > >> [ 618.816280] Buffer I/O error on device md2, logical block 22
> > > >> [ 618.816285] Buffer I/O error on device md2, logical block 23
> > > >> [ 618.816291] Buffer I/O error on device md2, logical block 24
> > > >> [ 618.816298] Buffer I/O error on device md2, logical block 25
> > > >> [ 618.816307] Buffer I/O error on device md2, logical block 26
> > > >> [ 618.816316] Buffer I/O error on device md2, logical block 27
> > > >> [ 618.816324] Buffer I/O error on device md2, logical block 28
> > > >> [ 618.816331] Buffer I/O error on device md2, logical block 29
> > > >> [ 649.780185] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > > >> [ 649.780194] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > > >> [ 649.780208] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > > >> [ 649.780214] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > > >> [ 649.780293] ------------[ cut here ]------------
> > > >> [ 649.780366] WARNING: at drivers/ata/libata-core.c:5129
> > > >> ata_qc_issue+0x10a/0x347 [libata]()
> > > >> [ 649.780373] Hardware name: GA-MA790FXT-UD5P
> > > >> [ 649.780377] Modules linked in: ext4 jbd2 crc16 raid0 powernow_k8
> > > >> cpufreq_conservative cpufreq_stats cpufreq_userspace
> > > >> cpufreq_powersave kvm_amd kvm nfsd exportfs nfs lockd fscache
> > > >> nfs_acl auth_rpcgss sunrpc bridge stp it87 hwmon_vid adt7473
> > > >> firewire_sbp2 loop md_mod snd_hda_codec_realtek
> > > >> snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss
> > > >> snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event amd64_edac_mod
> > > >> snd_seq edac_core snd_timer snd_seq_device snd soundcore i2c_piix4
> > > >> snd_page_alloc i2c_core evdev parport_pc wmi parport button
> > > >> processor ext3 jbd mbcache dm_mod usbhid hid sg sr_mod cdrom sd_mod
> > > >> crc_t10dif ata_generic ide_pci_generic firewire_ohci firewire_core
> > > >> ohci_hcd crc_itu_t atiixp ide_core mvsas ehci_hcd ahci libsas libata
> > > >> scsi_transport_sas scsi_mod r8169 mii floppy thermal fan thermal_sys
> > > >> [last unloaded: scsi_wait_scan]
> > > >> [ 649.780499] Pid: 3185, comm: hddtemp Not tainted 2.6.31-rc9 #2
> > > >> [ 649.780504] Call Trace:
> > > >> [ 649.780551] [<ffffffffa0074f90>] ? ata_qc_issue+0x10a/0x347 [libata]
> > > >> [ 649.780593] [<ffffffffa0074f90>] ? ata_qc_issue+0x10a/0x347 [libata]
> > > >> [ 649.780607] [<ffffffff8104aca0>] ? warn_slowpath_common+0x77/0xa3
> > > >> [ 649.780649] [<ffffffffa00790ce>] ? ata_scsi_pass_thru+0x0/0x240 [libata]
> > > >> [ 649.780690] [<ffffffffa0074f90>] ? ata_qc_issue+0x10a/0x347 [libata]
> > > >> [ 649.780733] [<ffffffffa00401d6>] ? scsi_get_command+0x75/0x97 [scsi_mod]
> > > >> [ 649.780776] [<ffffffffa00790ce>] ? ata_scsi_pass_thru+0x0/0x240 [libata]
> > > >> [ 649.780815] [<ffffffffa003f7aa>] ? scsi_done+0x0/0xc [scsi_mod]
> > > >> [ 649.780858] [<ffffffffa007a4d5>] ? __ata_scsi_queuecmd+0x185/0x1dc [libata]
> > > >> [ 649.780896] [<ffffffffa003f7aa>] ? scsi_done+0x0/0xc [scsi_mod]
> > > >> [ 649.780918] [<ffffffffa00a3c8e>] ? sas_queuecommand+0x83/0x25d [libsas]
> > > >> [ 649.780956] [<ffffffffa003fa7c>] ? scsi_dispatch_cmd+0x1c0/0x23c [scsi_mod]
> > > >> [ 649.780996] [<ffffffffa0044ff0>] ? scsi_request_fn+0x3a5/0x506 [scsi_mod]
> > > >> [ 649.781006] [<ffffffff810546e0>] ? del_timer+0x59/0x62
> > > >> [ 649.781016] [<ffffffff81163b70>] ? blk_execute_rq_nowait+0x65/0x89
> > > >> [ 649.781032] [<ffffffffa014164f>] ? sg_common_write+0x489/0x4ab [sg]
> > > >> [ 649.781042] [<ffffffff8115df56>] ? __freed_request+0x26/0x83
> > > >> [ 649.781056] [<ffffffffa01421da>] ? sg_new_write+0x23e/0x269 [sg]
> > > >> [ 649.781070] [<ffffffffa0142473>] ? sg_ioctl+0x26e/0xb63 [sg]
> > > >> [ 649.781080] [<ffffffff81100f38>] ? inotify_d_instantiate+0x12/0x39
> > > >> [ 649.781088] [<ffffffff8105eee6>] ? autoremove_wake_function+0x0/0x2e
> > > >> [ 649.781098] [<ffffffff810d80bf>] ? fd_install+0x2e/0x5a
> > > >> [ 649.781105] [<ffffffff810e5247>] ? vfs_ioctl+0x56/0x6c
> > > >> [ 649.781111] [<ffffffff810e570a>] ? do_vfs_ioctl+0x437/0x475
> > > >> [ 649.781118] [<ffffffff810e5799>] ? sys_ioctl+0x51/0x70
> > > >> [ 649.781128] [<ffffffff81010a02>] ? system_call_fastpath+0x16/0x1b
> > > >> [ 649.781134] ---[ end trace 9005373b1b9c6eb7 ]---
> > > >> [ 680.781672] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > > >> [ 680.781681] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > > >> [ 711.781672] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > > >> [ 711.781681] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > > >> [ 741.816069] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > > >> [ 741.816078] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > > >> [ 741.816090] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > > >> [ 741.816096] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > > >> [ 772.820160] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > > >> [ 772.820168] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > > >> [ 772.820181] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > > >> [ 772.820186] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > > >> [ 803.816212] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > > >> [ 803.816220] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > > >> [ 803.816233] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > > >> [ 803.816239] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > > >> [ 803.816264] sd 1:0:3:0: [sdf] Unhandled error code
> > > >> [ 803.816270] sd 1:0:3:0: [sdf] Result: hostbyte=DID_OK
> > > >> driverbyte=DRIVER_TIMEOUT
> > > >> [ 803.816278] end_request: I/O error, dev sdf, sector 512
> > > >> [ 803.816284] __ratelimit: 70 callbacks suppressed
> > > >> [ 803.816290] Buffer I/O error on device md2, logical block 64
> > > >>
> > > >> That's after bringing up a raid0 array I built a few days ago on 4
> > > >> perfectly good (Seagate Barracuda 1TB 7200.12) disks, without the
> > > >> bad disk plugged in. I try to mount it, and the driver hangs.
> > > >> Anything trying to access any of the 4 disks hangs as well.
> > > >>
> > > >> I know this array worked a few days ago. The most major change I've
> > > >> made was upgrade from -rc8 (or -rc5, not sure if I mounted the array
> > > >> under -rc8) to -rc9.
> > > >
> > > > Hi, just wondering if anyone has had a chance to look at this, or if
> > > > there's some patches I should try out, or if you need me to do some
> > > > testing, I'd be glad to, thanks :)
> > >
> > > I was hoping that some VM would jump in. The BUG in question is, from
> > > mm/slab.c:
> > >
> > > /*
> > > * The slab was either on partial or free list so
> > > * there must be at least one object available for
> > > * allocation.
> > > */
> > > BUG_ON(slabp->inuse >= cachep->num);
> > >
> > > So I wonder if that is a double-free, indicating a bug in
> > > SCSI/libsas/mvsas, or a VM problem of some sort.
> > >
> > > Was free memory low on that machine, at that point, perchance?
> >
> > It's remotely possible due to leaks in programs, but even konqueror and
> > firefox don't eat up memory _that_ fast, so it's very unlikely. This
> > machine has 4G of RAM.
> >
> > > Jeff
>
> Ok, I've managed to figure out one of my problems, the WD drive[s] don't
> much like hddtemp and smartmon running all the time. I disabled them and
> ran a long read test on the drives on a different controller, and they
> work fine.
>
> Things however are not so rosy, trying to do anything with the Marvell
> based SAS card causes errors.
>
> I've attached the dmesg log from a fresh boot of 2.6.31-git4 (with the edac
> amd patch applied).
>
> I suspect somehow the card isn't initializing things right, as the
> "Activity" lights on my hotswap bays are on for the two Seagate Barracuda
> drives: not on full, but on nonetheless.
>

I've tried yet again to get this stuff working. The two WD drives are now happily located on my mainboard's SATA ports, so they can't interfere with anything (in case they are overly picky or slow about various things; they are WD Green drives, after all). I've also made sure to disable hddtemp and smartd.

The WD Green array works great, near 200MB/s throughput on my mainboard's controller.

But the mvsas controller continues to kill itself minutes after I try to do something with it. It starts with a bunch of these messages:

[ 1270.698287] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
[ 1270.698296] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5

then the kernel sd/sata code gets upset and returns a bunch of these:

[ 1424.705240] sd 0:0:1:0: [sdf] Unhandled error code
[ 1424.705247] sd 0:0:1:0: [sdf] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
[ 1424.705256] sd 0:0:1:0: [sdf] CDB: Read(10): 28 00 00 00 10 00 00 03 50 00
[ 1424.705278] end_request: I/O error, dev sdf, sector 4096
[ 1424.705287] Buffer I/O error on device md1, logical block 2176

And so on, until everything just gives up and the controller locks up completely. Hot-swapping disks won't get them back, and shutting down fails when the kernel attempts to shut down LVM volumes (even though I don't have any) and synchronize the drives' SCSI cache.
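
As a sanity check on the log above, the Read(10) CDB can be decoded by hand: in the standard READ(10) layout, byte 0 is the opcode (0x28), bytes 2-5 hold the big-endian LBA, and bytes 7-8 hold the transfer length in blocks. A minimal Python sketch (the helper name decode_read10 is made up for illustration and is not part of any tool mentioned here):

```python
def decode_read10(cdb_hex: str) -> dict:
    """Decode a SCSI READ(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    return {
        # Bytes 2..5: logical block address, big-endian.
        "lba": int.from_bytes(cdb[2:6], "big"),
        # Bytes 7..8: transfer length in logical blocks, big-endian.
        "blocks": int.from_bytes(cdb[7:9], "big"),
    }


# CDB bytes copied from the "[sdf] CDB: Read(10)" log line.
info = decode_read10("28 00 00 00 10 00 00 03 50 00")
print(info)  # {'lba': 4096, 'blocks': 848}
```

The decoded LBA of 4096 matches the "sector 4096" in the end_request line, which suggests the command that timed out was an ordinary 848-block read near the start of the disk.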

I've attached my full dmesg from this boot. Maybe it'll help.

--
Thomas Fjellstrom
[email protected]


Attachments:
mvsas.dmesg2 (73.81 kB)

2009-11-04 01:23:24

by Andre Tomt

Subject: Re: 2.6.31-rc9 kernel BUG and mvsas

Jeff Garzik wrote:
> On 09/09/2009 12:30 PM, Thomas Fjellstrom wrote:
>> No errors on that disk. Other than the one above, and its more of a
>> warning.
>> However, I just rebooted to add some extra drives, thinking everything
>> was
>> working a little better now that I've updated to 2.6.31-rc9, I'm
>> treated to
>> the following two messages right after boot (and a system lockup to
>> boot):

Jeff & Co,

Just chiming in with a "me too" comment.

Identical issues with the AOC-SASLP-MV8 card with 8 Seagate Barracuda
7200.11 1.5TB drives using the mvsas driver. Creating a md raid array
seems to trigger it nearly instantly, however other loads do not trigger
it as fast (if at all; only tested briefly with parallel dd).

Not sure if it's related in any way, but I also noticed that with
parallel dd read on all ports, the card/driver tops out at 600MB/s even
though the card is in a northbridge connected slot running in x4 mode.
Only the first 4-5 ports run at nearly full speed, while the rest barely
gets to read any data at all. To illustrate, it looks sort of like this
in iostat -m:
sda 122
sdb 122
sdc 120
sdd 121
sde 100
sdf 10
sdg 5
sdh 5
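
A quick tally of those per-device figures, as a rough Python sketch (the numbers are just the approximate MB/s values listed above, in order; nothing here is measured anew):

```python
# Approximate per-device read rates in MB/s, in the order listed above.
rates_mb_s = [122, 122, 120, 121, 100, 10, 5, 5]

total = sum(rates_mb_s)
first_four = sum(rates_mb_s[:4])
first_four_share = round(100 * first_four / total)

print(f"total: {total} MB/s")
print(f"first four devices: {first_four} MB/s ({first_four_share}% of total)")
```

That adds up to roughly 605 MB/s, in line with the ~600MB/s ceiling, with the first four devices accounting for about 80% of it.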

Starting to think this driver has some issues with concurrency.. The
other controllers have no issues saturating all their ports, even if
they are on a south bridge connected slot.

Same disks/setup works just fine on AHCI and sata_mv controllers on the
same computer, easily pushes 8-900MB/s with md software raid5/6 (Yipes!)

2009-11-04 01:05:07

by Andre Tomt

Subject: Re: 2.6.31-rc9 kernel BUG and mvsas

Andre Tomt wrote:
> Jeff Garzik wrote:
>> On 09/09/2009 12:30 PM, Thomas Fjellstrom wrote:
>>> No errors on that disk. Other than the one above, and its more of a
>>> warning.
>>> However, I just rebooted to add some extra drives, thinking
>>> everything was
>>> working a little better now that I've updated to 2.6.31-rc9, I'm
>>> treated to
>>> the following two messages right after boot (and a system lockup to
>>> boot):
>
> Jeff & Co,
>
> Just chiming in with a "me too" comment.
>
> Identical issues with the AOC-SASLP-MV8 card with 8 Seagate Barracuda
> 7200.11 1.5TB drives using the mvsas driver. Creating a md raid array
> seems to trigger it nearly instantly, however other loads do not trigger
> it as fast (if at all; only tested briefly with parallel dd).
>
> Not sure if it's related in any way, but I also noticed that with
> parallel dd read on all ports, the card/driver tops out at 600MB/s even
> though the card is in a northbridge connected slot running in x4 mode.
> Only the first 4-5 ports run at nearly full speed, while the rest barely
> gets to read any data at all. To illustrate, it looks sort of like this
> in iostat -m:
> sda 122
> sdb 122
> sdc 120
> sdd 121
> sde 100
> sdf 10
> sdg 5
> sdh 5
>
> Starting to think this driver has some issues with concurrency.. The
> other controllers have no issues saturating all their ports, even if
> they are on a south bridge connected slot.

Err, saturating all the disks. I'm not expecting 300MB/s*8, even though
it would be very cool 8)

>
> Same disks/setup works just fine on AHCI and sata_mv controllers on the
> same computer, easily pushes 8-900MB/s with md software raid5/6 (Yipes!)
>

2009-12-03 21:17:53

by Mikael Abrahamsson

Subject: Re: 2.6.31-rc9 kernel BUG and mvsas

On Wed, 4 Nov 2009, Andre Tomt wrote:

> Identical issues with the AOC-SASLP-MV8 card with 8 Seagate Barracuda
> 7200.11 1.5TB drives using the mvsas driver. Creating a md raid array
> seems to trigger it nearly instantly, however other loads do not trigger
> it as fast (if at all; only tested briefly with parallel dd).

Would just like to chime in with a "me too" on this. I've tried both the
Ubuntu 9.10 kernel (2.6.31) and their 10.04 pre-alpha (I guess it's
2.6.32-rc8), and mvsas + AOC-SASLP-MV8 is pretty much unusable. If I start
md creation it stalls immediately and gets nowhere; last time I tried it,
md claimed it had written 160 blocks and then it stalled (all drive LEDs on
the controller are constantly lit).

If I boot without drives and then hot-plug the drives, it immediately
oopses.

If there is any code or fixes I can try, this is a lab machine, so I can
easily do whatever is needed with it...

--
Mikael Abrahamsson email: [email protected]

2009-12-03 21:36:40

by Kristleifur Daðason

Subject: Re: 2.6.31-rc9 kernel BUG and mvsas

On Thu, Dec 3, 2009 at 9:17 PM, Mikael Abrahamsson <[email protected]> wrote:
>
> On Wed, 4 Nov 2009, Andre Tomt wrote:
>
>> Identical issues with the AOC-SASLP-MV8 card with 8 Seagate Barracuda
>> 7200.11 1.5TB drives using the mvsas driver. Creating a md raid array seems
>> to trigger it nearly instantly, however other loads do not trigger it
>> as fast (if at all; only tested briefly with parallel dd).
>
> Would just like to chime in with a "me too" on this. I've tried both ubuntu
> 9.10 kernel (2.6.31) and their 10.04 pre-alpha (guess it's 2.6.32-rc8) and
> mvsas + AOC-SASLP-MV8 is pretty much unusable. If I start md creation it
> stalls immediately and gets nowhere, last time I tried it md claimed it had
> written 160 blocks then it stalled (all drive LEDs on the controller is
> constantly lit).
>
> If I boot without drives and then hot-plug the drives, it immediately
> oops:es.
>
> Any code/fixes I can try, this is a lab machine so I can easily do whatever
> needed with it...
>

Hi,

I had a bit of trouble finding some patches that were released by Andy Yan at
Marvell on November 9th, so here are the links:

1/7: http://kerneltrap.org/mailarchive/linux-scsi/2009/11/9/6558753
2/7: http://kerneltrap.org/mailarchive/linux-scsi/2009/11/9/6558763
3/7: http://kerneltrap.org/mailarchive/linux-scsi/2009/11/9/6558773
4/7: http://kerneltrap.org/mailarchive/linux-scsi/2009/11/9/6558783
5/7: http://kerneltrap.org/mailarchive/linux-scsi/2009/11/9/6558793
6/7: http://kerneltrap.org/mailarchive/linux-scsi/2009/11/9/6558803
7/7: http://kerneltrap.org/mailarchive/linux-scsi/2009/11/9/6558813

I haven't tried these myself, but I believe that Mr. Fjellstrom tried them,
with success.

Note to the lists: the patches weren't properly constructed for inclusion in
the kernel. If the outstanding issues could be fixed, this would be a boon to
the open-source community, as these cards are very good value for money. They
have been recommended for use on OpenSolaris by one of Stanford's ZFS
admins, which suggests their quality is very much acceptable.

-- Kristleifur

2009-12-04 19:00:13

by Mikael Abrahamsson

Subject: Re: 2.6.31-rc9 kernel BUG and mvsas

On Thu, 3 Dec 2009, Kristleifur Daðason wrote:

> On Thu, Dec 3, 2009 at 9:17 PM, Mikael Abrahamsson <[email protected]> wrote:
>>
>> On Wed, 4 Nov 2009, Andre Tomt wrote:
>>
>>> Identical issues with the AOC-SASLP-MV8 card with 8 Seagate Barracuda
>>> 7200.11 1.5TB drives using the mvsas driver. Creating a md raid array seems
>>> to trigger it nearly instantly, however other loads do not trigger it
>>> as fast (if at all; only tested briefly with parallel dd).
>>
>> Would just like to chime in with a "me too" on this. I've tried both ubuntu
>> 9.10 kernel (2.6.31) and their 10.04 pre-alpha (guess it's 2.6.32-rc8) and
>> mvsas + AOC-SASLP-MV8 is pretty much unusable. If I start md creation it
>> stalls immediately and gets nowhere, last time I tried it md claimed it had
>> written 160 blocks then it stalled (all drive LEDs on the controller is
>> constantly lit).
>>
>> If I boot without drives and then hot-plug the drives, it immediately
>> oops:es.
>>
>> Any code/fixes I can try, this is a lab machine so I can easily do whatever
>> needed with it...
>>
>
> Hi,
>
> I had a bit of trouble finding some patches that were released by Andy Yan at
> Marvell on November 9th, so here is a link:
>
> 1/7: http://kerneltrap.org/mailarchive/linux-scsi/2009/11/9/6558753

They applied cleanly to the Ubuntu 10.04 2.6.32 kernel source package, but
when compiling I received an error about "sas_change_queue_depth" having
the wrong number of arguments:

- sas_change_queue_depth(sdev, MVS_QUEUE_SIZE);
+ sas_change_queue_depth(sdev, MVS_QUEUE_SIZE, SCSI_QDEPTH_DEFAULT);

So I removed that change and then things compiled, booted properly, but
just froze when I plugged in the disks (including some garbled graphics on
my X screen). That patch set doesn't seem to be ok.

--
Mikael Abrahamsson email: [email protected]

2009-12-04 19:09:06

by Kristleifur Daðason

Subject: Re: 2.6.31-rc9 kernel BUG and mvsas

On Fri, Dec 4, 2009 at 7:00 PM, Mikael Abrahamsson <[email protected]> wrote:
>
> On Thu, 3 Dec 2009, Kristleifur Daðason wrote:
>
>> On Thu, Dec 3, 2009 at 9:17 PM, Mikael Abrahamsson <[email protected]> wrote:
>>>
>>> On Wed, 4 Nov 2009, Andre Tomt wrote:
>>>
>>>> Identical issues with the AOC-SASLP-MV8 card with 8 Seagate Barracuda
>>>> 7200.11 1.5TB drives using the mvsas driver. Creating a md raid array seems
>>>> to trigger it nearly instantly, however other loads do not trigger it
>>>> as fast (if at all; only tested briefly with parallel dd).
>>>
>>> Would just like to chime in with a "me too" on this. I've tried both ubuntu
>>> 9.10 kernel (2.6.31) and their 10.04 pre-alpha (guess it's 2.6.32-rc8) and
>>> mvsas + AOC-SASLP-MV8 is pretty much unusable. If I start md creation it
>>> stalls immediately and gets nowhere, last time I tried it md claimed it had
>>> written 160 blocks then it stalled (all drive LEDs on the controller is
>>> constantly lit).
>>>
>>> If I boot without drives and then hot-plug the drives, it immediately
>>> oops:es.
>>>
>>> Any code/fixes I can try, this is a lab machine so I can easily do whatever
>>> needed with it...
>>>
>>
>> Hi,
>>
>> I had a bit of trouble finding some patches that were released by Andy Yan at
>> Marvell on November 9th, so here is a link:
>>
>> 1/7: http://kerneltrap.org/mailarchive/linux-scsi/2009/11/9/6558753
>
> They applied cleanly to the Ubuntu 10.04 2.6.32 kernel source package, but when compiling I received an error about "sas_change_queue_depth" having the wrong number of arguments:
>
> -               sas_change_queue_depth(sdev, MVS_QUEUE_SIZE);
> +               sas_change_queue_depth(sdev, MVS_QUEUE_SIZE, SCSI_QDEPTH_DEFAULT);
>
> So I removed that change and then things compiled, booted properly, but just froze when I plugged in the disks (including some garbled graphics on my X screen). That patch set doesn't seem to be ok.

(This is resent as my first email was HTML formatted and rejected by
the list. Edited and amended also.)

Cheers for trying them out! For the record, I think they were made
against 2.6.31, perhaps the original 2.6.31 rather than later 2.6.31.x
releases.

And also, I recommend using the mainline kernel for Ubuntu, especially
when trying these patches. It's available here:
http://kernel.ubuntu.com/~kernel-ppa/mainline/ - including source
packages.

-- Kristleifur

2009-12-05 10:54:17

by Mikael Abrahamsson

Subject: Re: 2.6.31-rc9 kernel BUG and mvsas

On Fri, 4 Dec 2009, Kristleifur Daðason wrote:

> Cheers for trying them out! For the record, I think they were made
> against 2.6.31, perhaps the original 2.6.31 rather than later 2.6.31.x
> releases.

It doesn't apply cleanly against vanilla 2.6.31. It does, however, apply
cleanly against vanilla 2.6.32, but with the following compile error:

drivers/scsi/mvsas/mv_sas.c: In function ‘mvs_slave_configure’:
drivers/scsi/mvsas/mv_sas.c:417: error: ‘SCSI_QDEPTH_DEFAULT’ undeclared (first use in this function)
drivers/scsi/mvsas/mv_sas.c:417: error: (Each undeclared identifier is reported only once
drivers/scsi/mvsas/mv_sas.c:417: error: for each function it appears in.)
drivers/scsi/mvsas/mv_sas.c:417: error: too many arguments to function ‘sas_change_queue_depth’
make[4]: *** [drivers/scsi/mvsas/mv_sas.o] Error 1

When removing this extra option, it compiles with vanilla 2.6.32 just
fine. It crashes the same way as the Ubuntu kernel when hot-plugging a disk
as well (this time I did it without gdm/X running and it oopses, but
unfortunately the top part of the oops scrolled off). It's 100%
reproducible anyway, so basically the controller is a no-go in 2.6.31.6
and 2.6.32, both vanilla and Ubuntu versions, both without and with
the Nov 09 mvsas patch (at least the original version of the patch as
posted). The failure scenarios differ, though: with the stock kernel the
controller seems to get stuck while writing and nothing more happens, but at
least the machine doesn't crash and burn like it does with the
mvsas patch.

--
Mikael Abrahamsson email: [email protected]

2009-12-10 19:25:20

by Mikael Abrahamsson

Subject: Re: 2.6.31-rc9 kernel BUG and mvsas

On Sat, 5 Dec 2009, Mikael Abrahamsson wrote:

> When removing this extra option, it compiles with vanilla 2.6.32 just
> fine. It crashes the same way as the ubuntu kernel when hot-plugging a
> disk as well (this time I did it without gdm/x running and it oopses,
> but unfortunately the highest part of the oops scrolled off). It's 100%
> reproducible anyway, so basically the controller is a no-go in 2.6.31.6
> and 2.6.32 both vanilla and ubuntu versions, both without and including
> the nov09 mvsas patch (at least the original version of the patch as
> posted). Different failure scenarios though... With the stock kernel the
> controller seems to get stuck in writing and nothing more happens but at
> least the machine doesn't crash and burn like it does with the
> mvsas-patch.

Would it make sense for me to log this problem in bugzilla? There seems to
be a similar issue with the same hardware:

<http://bugzilla.kernel.org/show_bug.cgi?id=14534>

Did these fixes ever go upstream? Are they scheduled to go into the
stable releases of 2.6.32? Are the patches mentioned in there the
ones I already tried that didn't solve my problem?

--
Mikael Abrahamsson email: [email protected]

2009-12-10 21:04:32

by Thomas Fjellstrom

Subject: Re: 2.6.31-rc9 kernel BUG and mvsas

On Thu December 10 2009, Mikael Abrahamsson wrote:
> On Sat, 5 Dec 2009, Mikael Abrahamsson wrote:
> > When removing this extra option, it compiles with vanilla 2.6.32 just
> > fine. It crashes the same way as the ubuntu kernel when hot-plugging a
> > disk as well (this time I did it without gdm/x running and it oopses,
> > but unfortunately the highest part of the oops scrolled off). It's 100%
> > reproducible anyway, so basically the controller is a no-go in 2.6.31.6
> > and 2.6.32 both vanilla and ubuntu versions, both without and including
> > the nov09 mvsas patch (at least the original version of the patch as
> > posted). Different failure scenarios though... With the stock kernel
> > the controller seems to get stuck in writing and nothing more happens
> > but at least the machine doesn't crash and burn like it does with the
> > mvsas-patch.
>
> Would it make sense for me to log this problem in bugzilla? There seems
> to be a similar issue with equal hw:
>
> <http://bugzilla.kernel.org/show_bug.cgi?id=14534>
>
> did these fixes ever go upstream? Are they scheduled to go into the
> stable rebuilds of 2.6.32 ? The patches mentioned in there, are these the
> ones I already tried that didn't solve my problem?
>

I'm really hoping Andy Yan or someone else on the project can get some time
to polish up the last patch set posted to linux-scsi. There were a few
comments regarding them, but no one has responded to them yet.

--
Thomas Fjellstrom
[email protected]