Hi folks!
There always have been problems with the SCSI subsystem of my server (dual
PIII 450) since I upgraded to 2.4. Exchanging the disk didn't help. I get
errors from SCSI about once a week, last time the disks were unaccessable
(sysrq-sync/umount printed on the console, but didn't touch the disks).
This time I was lucky enough to capture the error message, but I am not
skilled enough to decode its meaning:
scsi1:A:2: no active SCB for reconnecting target - issuing BUS DEVICE
RESET
SAVED_SCSIID == 0x27, SAVED_LUN == 0x4, ARG_1 == 0x20 ACCUM = 0x0
SEQ_FLAGS == 0x0, SCBPTR == 0x1, BTT == 0xff, SINDEX == 0x31
SCSIID == 0x0, SCB_SCSIID == 0x0, SCB_LUN == 0x0, SCB_TAG == 0xff,
SCB_CONTROL == 0x0
SCSIBUSL == 0x20, SCSISIGI == 0xe6
SXFRCTL0 == 0x88
SEQCTL == 0x10
scsi1: Dumping Card State in Message-in phase, at SEQADDR 0x1a6
SCSISEQ = 0x12, SBLKCTL = 0xa
DFCNTRL = 0x0, DFSTATUS = 0x89
LASTPHASE = 0xe0, SCSISIGI = 0xe6, SXFRCTL0 = 0x88
SSTAT0 = 0x2, SSTAT1 = 0x3
STACK == 0x17b, 0x165, 0x0, 0x13f
SCB count = 20
Kernel NEXTQSCB = 6
Card NEXTQSCB = 6
QINFIFO entries:
Waiting Queue entries:
Disconnected Queue entries: 9:1 11:15 6:9 4:7 2:19 7:0 13:8 3:14 12:11
15:13
QOUTFIFO entries:
Sequencer Free SCB List: 1 14 10 0 8 5 16 17 18 19 20 21 22 23 24 25 26 27
28 29 30 31
Pending list: 1 15 9 7 19 0 8 14 11 13
Kernel Free SCB list: 2 10 4 3 12 5 18 17 16
DevQ(0:0:0): 0 waiting
DevQ(0:1:0): 0 waiting
DevQ(0:2:0): 0 waiting
scsi1: Bus Device Reset on A:2. 8 SCBs aborted
SCSI disk error : host 1 channel 0 id 2 lun 0 return code = 8000000
Info fld=0x400001f4 (nonstd), Current sd08:61: sense key None
I/O error: dev 08:61, sector 10181264
SCSI disk error : host 1 channel 0 id 2 lun 0 return code = 8000000
Info fld=0x400001f4 (nonstd), Current sd08:61: sense key None
I/O error: dev 08:61, sector 10181016
SCSI disk error : host 1 channel 0 id 2 lun 0 return code = 8000000
Info fld=0x400001f4 (nonstd), Current sd08:61: sense key None
I/O error: dev 08:61, sector 10180768
SCSI disk error : host 1 channel 0 id 2 lun 0 return code = 8000000
Info fld=0x400001f4 (nonstd), Current sd08:61: sense key None
I/O error: dev 08:61, sector 10180520
SCSI disk error : host 1 channel 0 id 2 lun 0 return code = 8000000
Info fld=0x400001f4 (nonstd), Current sd08:61: sense key None
I/O error: dev 08:61, sector 10180272
SCSI disk error : host 1 channel 0 id 2 lun 0 return code = 8000000
Info fld=0x400001f4 (nonstd), Current sd08:61: sense key None
I/O error: dev 08:61, sector 10180024
SCSI disk error : host 1 channel 0 id 2 lun 0 return code = 8000000
Info fld=0x400001f4 (nonstd), Current sd08:61: sense key None
I/O error: dev 08:61, sector 10179776
The upper layer (scp to a reiserfs partition) apparently didn't see an
error, but this I don't know for sure. Here's the output of
/proc/scsi/aic7xxx/1:
=================================================================
Adaptec AIC7xxx driver version: 6.1.13
aic7890/91: Ultra2 Wide Channel A, SCSI Id=7, 32/255 SCBs
Channel A Target 0 Negotiation Settings
User: 40.000MB/s transfers (20.000MHz, offset 255, 16bit)
Goal: 40.000MB/s transfers (20.000MHz, offset 15, 16bit)
Curr: 40.000MB/s transfers (20.000MHz, offset 15, 16bit)
Channel A Target 0 Lun 0 Settings
Commands Queued 1842691
Commands Active 0
Command Openings 253
Max Tagged Openings 253
Device Queue Frozen Count 0
Channel A Target 1 Negotiation Settings
User: 40.000MB/s transfers (20.000MHz, offset 255, 16bit)
Goal: 40.000MB/s transfers (20.000MHz, offset 63, 16bit)
Curr: 40.000MB/s transfers (20.000MHz, offset 63, 16bit)
Channel A Target 1 Lun 0 Settings
Commands Queued 2184
Commands Active 0
Command Openings 253
Max Tagged Openings 253
Device Queue Frozen Count 0
Channel A Target 2 Negotiation Settings
User: 40.000MB/s transfers (20.000MHz, offset 255, 16bit)
Goal: 40.000MB/s transfers (20.000MHz, offset 15, 16bit)
Curr: 40.000MB/s transfers (20.000MHz, offset 15, 16bit)
Channel A Target 2 Lun 0 Settings
Commands Queued 272085
Commands Active 0
Command Openings 253
Max Tagged Openings 253
Device Queue Frozen Count 0
===============================================================
The other IDs are unused.
Now my question? What actually did happen to cause this error? Is it
hardware-related? _May_ it be caused by faulty RAM? Could this go away by
switching to aic7xxx_old or another version of aic7xxx?
Kernel is 2.4.6 vanilla, compiled with gcc3.0.1.
Thanks for your answers,
Roland
+-----------------------------------------------------+
| Tel.: 089/32649332 0561/873744 |
| in Radeberger Weg 8 Am Fasanenhof 16 |
| 85748 Garching 34125 Kassel |
+---------------------------+-------------------------+
| TU Muenchen | |
| Physik-Department E18 | Raum 3558 |
| James-Franck-Str. | Telefon 089/289-12592 |
| 85747 Garching | |
+---------------------------+-------------------------+
| May the Source be with you! |
+-----------------------------------------------------+
Definitly hardware related, one of your hdds is beginning to die. Backup
is the next step?
On Tue, 2001-10-02 at 12:44, Roland Kuhn wrote:
> I/O error: dev 08:61, sector 10180768
> SCSI disk error : host 1 channel 0 id 2 lun 0 return code = 8000000
> Info fld=0x400001f4 (nonstd), Current sd08:61: sense key None
> I/O error: dev 08:61, sector 10180520
> SCSI disk error : host 1 channel 0 id 2 lun 0 return code = 8000000
> Info fld=0x400001f4 (nonstd), Current sd08:61: sense key None
> I/O error: dev 08:61, sector 10180272
> SCSI disk error : host 1 channel 0 id 2 lun 0 return code = 8000000
> Info fld=0x400001f4 (nonstd), Current sd08:61: sense key None
> I/O error: dev 08:61, sector 10180024
> SCSI disk error : host 1 channel 0 id 2 lun 0 return code = 8000000
> Info fld=0x400001f4 (nonstd), Current sd08:61: sense key None
> I/O error: dev 08:61, sector 10179776
Hi!
On 2 Oct 2001, Stephane Dudzinski wrote:
> Definitly hardware related, one of your hdds is beginning to die. Backup
> is the next step?
>
> On Tue, 2001-10-02 at 12:44, Roland Kuhn wrote:
> > I/O error: dev 08:61, sector 10180768
> > SCSI disk error : host 1 channel 0 id 2 lun 0 return code = 8000000
> > Info fld=0x400001f4 (nonstd), Current sd08:61: sense key None
> > I/O error: dev 08:61, sector 10180520
> > SCSI disk error : host 1 channel 0 id 2 lun 0 return code = 8000000
> > Info fld=0x400001f4 (nonstd), Current sd08:61: sense key None
> > I/O error: dev 08:61, sector 10180272
> > SCSI disk error : host 1 channel 0 id 2 lun 0 return code = 8000000
> > Info fld=0x400001f4 (nonstd), Current sd08:61: sense key None
> > I/O error: dev 08:61, sector 10180024
> > SCSI disk error : host 1 channel 0 id 2 lun 0 return code = 8000000
> > Info fld=0x400001f4 (nonstd), Current sd08:61: sense key None
> > I/O error: dev 08:61, sector 10179776
>
I forgot to tell you that this also happened to a brand-new Cheetah after
5 hours of bonnie...
Generally, there has to be at least some moderate load to be going on on
the bus. This time it was a tape backup of ID 0 running in parallel with
the scp on ID 2.
Ciao,
Roland
+-----------------------------------------------------+
| Tel.: 089/32649332 0561/873744 |
| in Radeberger Weg 8 Am Fasanenhof 16 |
| 85748 Garching 34125 Kassel |
+---------------------------+-------------------------+
| TU Muenchen | |
| Physik-Department E18 | Raum 3558 |
| James-Franck-Str. | Telefon 089/289-12592 |
| 85747 Garching | |
+---------------------------+-------------------------+
| May the Source be with you! |
+-----------------------------------------------------+