Message-ID: <64bb37e0709101159v47f586aby7f078ef1db5cbc39@mail.gmail.com>
Date: Mon, 10 Sep 2007 20:59:49 +0200
From: "Torsten Kaiser"
To: "Andrew Morton"
Subject: Re: 2.6.23-rc4-mm1
Cc: "Andy Whitcroft", linux-kernel@vger.kernel.org, mel@csn.ul.ie, "Jens Axboe", linux-scsi@vger.kernel.org
In-Reply-To: <20070910111926.9c942358.akpm@linux-foundation.org>
References: <20070831215822.26e1432b.akpm@linux-foundation.org> <20070910174926.GC30335@shadowen.org> <20070910111926.9c942358.akpm@linux-foundation.org>

On 9/10/07, Andrew Morton wrote:
> On Mon, 10 Sep 2007 18:49:26 +0100 Andy Whitcroft wrote:
>
> > I have a couple of old NUMA-Q systems which are unable to read their
> > boot disks with 2.6.23-rc4-mm1. The disks appear to be recognised and
> > even the partition tables read correctly, and then they go pop:

I reported a similar problem on Sep 1, but have received no response so far.
The system boots, reads the partition tables, starts the RAID and then
kicks one drive out because of errors.

> > qla1280: QLA1040 found on PCI bus 0, dev 10
> > Clocksource tsc unstable (delta = 99922590 ns)
> > Time: jiffies clocksource has been installed.
> > scsi(0:0): Resetting SCSI BUS
> > scsi0 : QLogic QLA1040 PCI to SCSI Host Adapter
> > Firmware version: 7.65.06, Driver version 3.26
> > scsi 0:0:0:0: Direct-Access IBM DGHS18X 0360 PQ: 0 ANSI: 3
> > scsi(0:0:0:0): Sync: period 10, offset 12, Wide
> > scsi 0:0:1:0: Direct-Access IBM OEM DCHS09X 5454 PQ: 0 ANSI: 2
> > scsi(0:0:1:0): Sync: period 10, offset 12, Wide
> > scsi 0:0:2:0: Direct-Access IBM OEM DCHS09X 5454 PQ: 0 ANSI: 2
> > scsi(0:0:2:0): Sync: period 10, offset 12, Wide
> > scsi 0:0:3:0: Direct-Access IBM OEM DCHS09X 5454 PQ: 0 ANSI: 2
> > scsi(0:0:3:0): Sync: period 10, offset 12, Wide
> > scsi 0:0:4:0: Direct-Access IBM OEM DCHS09X 5454 PQ: 0 ANSI: 2
> > scsi(0:0:4:0): Sync: period 10, offset 12, Wide
> > st: Version 20070203, fixed bufsize 32768, s/g segs 256
> > sd 0:0:0:0: [sda] 35843670 512-byte hardware sectors (18352 MB)
> > sd 0:0:0:0: [sda] Write Protect is off
> > sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, supports DPO and FUA
> > sd 0:0:0:0: [sda] 35843670 512-byte hardware sectors (18352 MB)
> > sd 0:0:0:0: [sda] Write Protect is off
> > sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, supports DPO and FUA
> > sda: sda1
[snip]
> > sd 0:0:0:0: [sda] Result: hostbyte=0x07 driverbyte=0x00
> > end_request: I/O error, dev sda, sector 63
> > Buffer I/O error on device sda1, logical block 0
> > Buffer I/O error on device sda1, logical block 1
> > Buffer I/O error on device sda1, logical block 2
> > Buffer I/O error on device sda1, logical block 3
> > mount: fs type devfs not supported by kernel
> > ext3: No journal on filesystem on sda1
> > umount: devfs: not mounted
> > sd 0:0:0:0: [sda] Result: hostbyte=0x07 driverbyte=0x00
> > end_request: I/O error, dev sda, sector 28010831
> > sd 0:0:0:0: [sda] Result: hostbyte=0x07 driverbyte=0x00
> > end_request: I/O error, dev sda, sector 31080815

From my log:
[ 3.890000] scsi0 : sata_sil24
[ 3.900000] scsi1 : sata_sil24
[ 3.900000] ata1: SATA max UDMA/100 host m128@0xefeffc00 port 0xefef8000 irq 16
[ 3.920000] ata2: SATA max UDMA/100 host m128@0xefeffc00 port 0xefefa000 irq 16
[ 4.300000] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 4.360000] ata1.00: ATA-7: MAXTOR STM3320820AS, 3.AAE, max UDMA/133
[ 4.370000] ata1.00: 625142448 sectors, multi 0: LBA48 NCQ (depth 31/32)
[ 4.430000] ata1.00: configured for UDMA/100
[ 4.500000] ieee1394: Node added: ID:BUS[0-00:1023] GUID[0010dc00005cc354]
[ 4.500000] ieee1394: Host added: ID:BUS[0-01:1023] GUID[0011d80000c4c261]
[ 4.790000] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 4.850000] ata2.00: ATA-7: MAXTOR STM3320820AS, 3.AAE, max UDMA/133
[ 4.860000] ata2.00: 625142448 sectors, multi 0: LBA48 NCQ (depth 31/32)
[ 4.920000] ata2.00: configured for UDMA/100
[ 4.930000] scsi 0:0:0:0: Direct-Access ATA MAXTOR STM332082 3.AA PQ: 0 ANSI: 5
[ 4.960000] sd 0:0:0:0: [sda] 625142448 512-byte hardware sectors (320073 MB)
[ 4.980000] sd 0:0:0:0: [sda] Write Protect is off
[ 4.990000] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 4.990000] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 5.020000] sd 0:0:0:0: [sda] 625142448 512-byte hardware sectors (320073 MB)
[ 5.040000] sd 0:0:0:0: [sda] Write Protect is off
[ 5.050000] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 5.050000] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 5.080000] sda: sda1 sda2
[ 5.110000] sd 0:0:0:0: [sda] Attached SCSI disk
[ 5.120000] scsi 1:0:0:0: Direct-Access ATA MAXTOR STM332082 3.AA PQ: 0 ANSI: 5
[ 5.140000] sd 1:0:0:0: [sdb] 625142448 512-byte hardware sectors (320073 MB)
[ 5.170000] sd 1:0:0:0: [sdb] Write Protect is off
[ 5.180000] sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[ 5.180000] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 5.210000] sd 1:0:0:0: [sdb] 625142448 512-byte hardware sectors (320073 MB)
[ 5.230000] sd 1:0:0:0: [sdb] Write Protect is off
[ 5.240000] sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[ 5.240000] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 5.270000] sdb: sdb1 sdb2
[ 5.300000] sd 1:0:0:0: [sdb] Attached SCSI disk
[more normal boot messages, 3-disk RAID5 starts]
[ 63.420000] ata2.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[ 63.420000] ata2.00: cmd 61/08:00:09:d6:42/00:00:25:00:00/40 tag 0 cdb 0x0 data 4096 out
[ 63.420000] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 63.420000] ata2.00: status: {DRDY }
[ 63.420000] ata2: hard resetting link
[ 65.720000] ata2: softreset failed (port not ready)
[ 65.720000] ata2: reset failed (errno=-5), retrying in 8 secs
[ 73.420000] ata2: hard resetting link
[ 75.720000] ata2: softreset failed (port not ready)
[ 75.720000] ata2: reset failed (errno=-5), retrying in 8 secs
[ 83.420000] ata2: hard resetting link
[ 85.720000] ata2: softreset failed (port not ready)
[ 85.720000] ata2: reset failed (errno=-5), retrying in 33 secs
[snip, disk gets kicked]
[ 120.780000] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[ 120.780000] end_request: I/O error, dev sdb, sector 19550927
[ 120.780000] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[ 120.780000] end_request: I/O error, dev sdb, sector 19550935
[ 120.780000] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[ 120.780000] end_request: I/O error, dev sdb, sector 19550943
[ 120.780000] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK

More similar error messages are in my earlier LKML mail.
After sdb was removed from the array, the system worked normally with
only two drives. But on the next boot it kicked the second sata_sil24
disk out of the array as well, killing the array.

Torsten