Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932258AbVLUCRF (ORCPT ); Tue, 20 Dec 2005 21:17:05 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S932260AbVLUCRE (ORCPT ); Tue, 20 Dec 2005 21:17:04 -0500 Received: from L8R.net ([216.58.41.32]:19408 "EHLO l8r.net") by vger.kernel.org with ESMTP id S932258AbVLUCRC (ORCPT ); Tue, 20 Dec 2005 21:17:02 -0500 Date: Tue, 20 Dec 2005 21:17:01 -0500 From: Brad Barnett To: linux-kernel@vger.kernel.org Subject: Re: ahci + software raid (intel E7221, supermicro P8SCT) causes kernel BUG at drivers/scsi/scsi.c:295 Message-ID: <20051220211701.303fae75@be.back.l8r.net> In-Reply-To: <20051215112155.57a2a0bb@be.back.l8r.net> References: <20051215112155.57a2a0bb@be.back.l8r.net> X-Mailer: Sylpheed-Claws 1.0.5 (GTK+ 1.2.10; x86_64-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5078 Lines: 129 I've narrowed this down to a faulty SATA drive. It performed at 1/4th the speed of my other drives, and eventually would disintegrate and remain non-responsive for long periods of time. Still, what happened below is probably not the desired behaviour in response to a dead drive. :/ On Thu, 15 Dec 2005 11:21:55 -0500 Brad Barnett wrote: > > Is there any additional information anyone would like? Any suggestions > on this bug? I currently have a reproducible problem, on a system I can > monkey with for the next week or two. > > After that, it will have to move into production, and I won't be able to > do much with it. :/ > > Thanks > > > ------------------------------- > > > Supermicro P8SCT: > > http://supermicro.com/products/motherboard/P4/E7221/P8SCT.cfm > > (uses intel E7221/ahci for SATA controller). Hot-swap raid cage. Three > SATA ports in use. Also have two MegaRAID SATA 150-6 onboard, one in > PCI-X slot. > > This box will run stable, but at inconsistent times lock up. After > hooking up serial console, we discovered the problem. > > FYI, we have three different motherboards from Supermicro, and they all > exhibit the same issue... it is unlikely a fault with our specific > hardware. > > It is easiest to trigger this bug (not certain if this is the only way) > when running software raid, two drives, in stripe mode. Many times this > configuration will cause no isses, other times I get this: > > ata2: handling error/timeout > ata2: port reset, p_is 0 is 0 pis 0 cmd 4017 tf d0 ss 113 se 0 > ata2: status=0x50 { DriveReady SeekComplete } > sdb: Current: sense key=0x0 > ASC=0x0 ASCQ=0x0 > Assertion failed! qc->flags & > ATA_QCFLAG_ACTIVE,drivers/scsi/libata-core.c,ata_qc_complete,line=351 > 3 > ------------[ cut here ]------------ > kernel BUG at drivers/scsi/scsi.c:295! > invalid operand: 0000 [#1] > Modules linked in: > CPU: 0 > EIP: 0060:[] Not tainted VLI > EFLAGS: 00010046 (2.6.15-rc2) > EIP is at scsi_put_command+0x8b/0x95 > eax: f7f69d90 ebx: f7f98680 ecx: f7f9868c edx: f7f9868c > esi: f7f8c000 edi: 00000296 ebp: f7f69c00 esp: f7ff7c0c > ds: 007b es: 007b ss: 0068 > Process scsi_eh_3 (pid: 858, threadinfo=f7ff6000 task=f7ea9030) > Stack: f7f69df8 c0225720 f7f98680 f7f69d90 f7f62dfc f7f62dfc c02d9cdf > f7f98680 efadda7c f7f98680 00000282 c02d9e05 f7f98680 00000001 > 00000000 efadda7c f7f98680 00000000 f7f98680 c02da0df f7f98680 > 00000001 00000000 00000001 Call Trace: > [] kobject_get+0x1a/0x24 > [] scsi_next_command+0x2f/0x4f > [] scsi_end_request+0xc4/0xd6 > [] scsi_io_completion+0x19a/0x41e > [] sd_rw_intr+0xc6/0x26d > [] scsi_finish_command+0x82/0xa2 > [] ata_scsi_qc_complete+0x47/0x8d > [] ata_qc_complete+0x40/0xc6 > [] ahci_interrupt+0x107/0x1e3 > [] activate_task+0x6d/0x80 > [] handle_IRQ_event+0x2e/0x64 > [] __do_IRQ+0x53/0xa5 > [] update_process_times+0x7b/0x106 > [] do_IRQ+0x19/0x24 > [] common_interrupt+0x1a/0x20 > [] __do_softirq+0x2f/0x8a > [] do_softirq+0x26/0x28 > [] do_IRQ+0x1e/0x24 > [] common_interrupt+0x1a/0x20 > [] scsi_request_fn+0x1a9/0x2c2 > [] blk_run_queue+0x3a/0x3c > [] scsi_next_command+0x37/0x4f > [] scsi_end_request+0xc4/0xd6 > [] scsi_io_completion+0x19a/0x41e > [] sd_rw_intr+0xc6/0x26d > [] scsi_finish_command+0x82/0xa2 > [] ata_scsi_qc_complete+0x47/0x8d > [] ata_qc_complete+0x40/0xc6 > [] ahci_eng_timeout+0x85/0xb0 > [] scsi_error_handler+0x0/0x99 > [] ata_scsi_error+0x1a/0x31 > [] scsi_error_handler+0x5f/0x99 > [] kthread+0xb1/0xb7 > [] kthread+0x0/0xb7 > [] kernel_thread_helper+0x5/0xb > Code: 5c 24 08 8b 74 24 0c 89 44 24 1c 8b 7c 24 10 8b 6c 24 14 83 c4 18 > e9 cd b1 fc ff 89 43 0c 89 48 04 31 db 89 51 04 89 4e 14 eb b7 <0f> 0b > 27 01 3f ce 47 c0 eb 95 57 > <0>Kernel panic - not syncing: Fatal exception in interrupt > > > > > Put another way, I can setup, prep and create a software raid.. format > it.. and start to move data onto it. I can even use it for quite some > time. However, once the above happens once, I can not use that software > raid again until I wipe out the raid and start fresh. Merely booting > with the raid in causes the above problem on bootup... > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" > in the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/