2004-03-15 23:04:44

by Chris Moore

[permalink] [raw]
Subject: problem with sbp2 / ieee1394 in kernel 2.6.3

I have a Maxtor 5000XT external ieee1394/USB2 disk drive, which I
connect to my PC's ieee1394 port. It works well with Linux under
both kernels 2.4.x and 2.6.3 for a while. I see messages like these
in syslog (under 2.6.3):

Mar 15 13:21:55 chrislap kernel: Vendor: Maxtor Model: 5000XT Rev: 0100
Mar 15 13:21:55 chrislap kernel: Type: Direct-Access ANSI SCSI revision: 06
Mar 15 13:21:55 chrislap kernel: SCSI device sda: 490232832 512-byte hdwr sectors (250999 MB)
Mar 15 13:21:55 chrislap kernel: sda: asking for cache data failed
Mar 15 13:21:55 chrislap kernel: sda: assuming drive cache: write through
Mar 15 13:21:55 chrislap kernel: /dev/scsi/host0/bus0/target0/lun0: p1 p2 p3 p4 < p5 p6 >
Mar 15 13:21:55 chrislap kernel: Attached scsi disk sda at scsi0, channel 0, id 0, lun 0

Unfortunately, at some point problems always occur. Sometimes the
drive stops working very quickly, other times I can transfer tens of
gigabytes of data to and from the drive before anything goes wrong
with it, but eventually it always breaks, under both 2.4.x and 2.6.3.

When it breaks I see messages like these in the syslog:

Mar 15 23:32:26 chrislap kernel: ieee1394: sbp2: aborting sbp2 command
Mar 15 23:32:26 chrislap kernel: Read (10) 00 04 d0 26 6f 00 00 b0 00
Mar 15 23:32:36 chrislap kernel: ieee1394: sbp2: aborting sbp2 command
Mar 15 23:32:36 chrislap kernel: Test Unit Ready 00 00 00 00 00
Mar 15 23:32:36 chrislap kernel: ieee1394: sbp2: reset requested
Mar 15 23:32:36 chrislap kernel: ieee1394: sbp2: Generating sbp2 fetch agent reset
Mar 15 23:32:46 chrislap kernel: ieee1394: sbp2: aborting sbp2 command
Mar 15 23:32:46 chrislap kernel: Test Unit Ready 00 00 00 00 00
Mar 15 23:32:46 chrislap kernel: ieee1394: sbp2: reset requested
Mar 15 23:32:46 chrislap kernel: ieee1394: sbp2: Generating sbp2 fetch agent reset
Mar 15 23:33:06 chrislap kernel: ieee1394: sbp2: aborting sbp2 command
Mar 15 23:33:06 chrislap kernel: Test Unit Ready 00 00 00 00 00
Mar 15 23:33:06 chrislap kernel: ieee1394: sbp2: reset requested
Mar 15 23:33:06 chrislap kernel: ieee1394: sbp2: Generating sbp2 fetch agent reset
Mar 15 23:33:26 chrislap kernel: ieee1394: sbp2: aborting sbp2 command
Mar 15 23:33:26 chrislap kernel: Test Unit Ready 00 00 00 00 00
Mar 15 23:33:26 chrislap kernel: scsi: Device offlined - not ready after error recovery: host 0 channel 0 id 0 lun 0
Mar 15 23:33:26 chrislap kernel: SCSI error : <0 0 0 0> return code = 0x50000
Mar 15 23:33:26 chrislap kernel: end_request: I/O error, dev sda, sector 80750191
Mar 15 23:33:26 chrislap kernel: scsi0 (0:0): rejecting I/O to offline device
Mar 15 23:33:26 chrislap kernel: Buffer I/O error on device sda2, logical block 110592
Mar 15 23:33:26 chrislap kernel: lost page write due to I/O error on sda2
Mar 15 23:33:26 chrislap kernel: scsi0 (0:0): rejecting I/O to offline device
Mar 15 23:33:26 chrislap kernel: scsi0 (0:0): rejecting I/O to offline device
Mar 15 23:33:26 chrislap kernel: Buffer I/O error on device sda2, logical block 3547
Mar 15 23:33:26 chrislap kernel: lost page write due to I/O error on sda2
Mar 15 23:33:26 chrislap kernel: Aborting journal on device sda2.
Mar 15 23:33:26 chrislap kernel: scsi0 (0:0): rejecting I/O to offline device
Mar 15 23:33:26 chrislap kernel: ext3_abort called.
Mar 15 23:33:26 chrislap kernel: EXT3-fs abort (device sda2): ext3_journal_start: Detected aborted journal
Mar 15 23:33:26 chrislap kernel: Remounting filesystem read-only
Mar 15 23:33:26 chrislap kernel: scsi0 (0:0): rejecting I/O to offline device
Mar 15 23:33:26 chrislap kernel: EXT3-fs error (device sda2): ext3_get_inode_loc: unable to read inode block - inode=327681, block=655362

When this happens I umount the firewire filesystems and then try to
"modprobe -r -v sbp2", but the modprobe command always hangs. I find
I cannot access the drive again until I reboot.

Please, if there's anything more you need to know, or anything I can
do to help diagnose this problem, just reply to this message. I would
love to see it fixed.

Chris.


2004-03-16 03:07:43

by bob

[permalink] [raw]
Subject: Re: problem with sbp2 / ieee1394 in kernel 2.6.3

Hi. I occasionally had problems like you describe 'sbp2 aborting
request', but on my system sbp2 would always recover. All I would see
were a delay in data access, and an entry in /var/log/messages. There
were a lot of changes to ieee1394 between 2.6.3 and 2.6.4, and haven't
seen sbp2 fail since (two or three patches) before 2.6.4. Try the 2.6.4
kernel (or newer) and see if that helps.

Bob

2004-03-18 18:32:20

by Edd Dumbill

[permalink] [raw]
Subject: Re: problem with sbp2 / ieee1394 in kernel 2.6.3

On Mon, 2004-03-15 at 20:07 -0700, Bob Gill wrote:
> Hi. I occasionally had problems like you describe 'sbp2 aborting
> request', but on my system sbp2 would always recover. All I would see
> were a delay in data access, and an entry in /var/log/messages. There
> were a lot of changes to ieee1394 between 2.6.3 and 2.6.4, and haven't
> seen sbp2 fail since (two or three patches) before 2.6.4. Try the 2.6.4
> kernel (or newer) and see if that helps.

I'm also seeing the same errors on 2.6.4 (XP 2800+, nForce2 chipset)
with a Lacie external firewire drive. The drive used to work OK on my
2.4.18 powerpc box.


Mar 17 13:17:49 starway kernel: sbp2: $Rev: 1170 $ Ben Collins
<[email protected]>
Mar 17 13:17:49 starway kernel: scsi2 : SCSI emulation for IEEE-1394
SBP-2 Devices
Mar 17 13:17:49 starway kernel: ieee1394: sbp2: Logged into SBP-2 device
Mar 17 13:17:49 starway kernel: ieee1394: sbp2: Node 0-00:1023: Max
speed [S400] - Max payload [2048]
Mar 17 13:17:49 starway kernel: Vendor: IC35L080 Model: AVVA07-0
Rev: VA4O
Mar 17 13:17:49 starway kernel: Type: Direct-Access
ANSI SCSI revision: 06
Mar 17 13:17:49 starway kernel: SCSI device sdc: 160836480 512-byte hdwr
sectors (82348 MB)
Mar 17 13:17:49 starway kernel: sdc: asking for cache data failed
Mar 17 13:17:49 starway kernel: sdc: assuming drive cache: write through
Mar 17 13:17:49 starway kernel: sdc: sdc1
Mar 17 13:17:49 starway kernel: Attached scsi disk sdc at scsi2, channel
0, id 0
Mar 17 13:17:49 starway kernel: Attached scsi generic sg2 at scsi2,
channel 0, id 0, lun 0, type 0

....

Mar 17 15:38:22 starway kernel: ieee1394: sbp2: aborting sbp2 command
Mar 17 15:38:22 starway kernel: Read (10) 00 06 fe f4 ff 00 00 08 00
Mar 17 15:38:22 starway kernel: ieee1394: sbp2: aborting sbp2 command
Mar 17 15:38:22 starway kernel: Read (10) 00 06 fe f5 0f 00 00 08 00
Mar 17 15:38:22 starway kernel: ieee1394: sbp2: aborting sbp2 command
Mar 17 15:38:22 starway kernel: Read (10) 00 06 fe f5 1f 00 00 08 00
Mar 17 15:38:22 starway kernel: ieee1394: sbp2: aborting sbp2 command
Mar 17 15:38:22 starway kernel: Read (10) 00 06 fe f5 2f 00 00 08 00
Mar 17 15:38:22 starway kernel: ieee1394: sbp2: aborting sbp2 command
Mar 17 15:38:22 starway kernel: Read (10) 00 06 fe f5 3f 00 00 08 00
Mar 17 15:38:22 starway kernel: ieee1394: sbp2: aborting sbp2 command
Mar 17 15:38:22 starway kernel: Read (10) 00 06 fe f5 4f 00 00 08 00
Mar 17 15:38:22 starway kernel: ieee1394: sbp2: aborting sbp2 command
Mar 17 15:38:22 starway kernel: Read (10) 00 06 fe f5 5f 00 00 08 00
Mar 17 15:38:22 starway kernel: ieee1394: sbp2: aborting sbp2 command

-- Edd



2004-03-19 07:15:25

by Chris Moore

[permalink] [raw]
Subject: Re: problem with sbp2 / ieee1394 in kernel 2.6.3

Bob Gill, Mon Mar 15 2004 - 22:10:19 EST:

> There were a lot of changes to ieee1394 between 2.6.3 and 2.6.4, and
> haven't seen sbp2 fail since (two or three patches) before
> 2.6.4. Try the 2.6.4 kernel (or newer) and see if that helps.

I just compiled 2.6.4. It fails just the same as in 2.4.x and 2.6.3,
but this time I got a message I hadn't noticed before - the first
line, the "unsolicited response" message is new to me:

Mar 19 07:48:10 chrislap kernel: ieee1394: unsolicited response packet received - no tlabel match
Mar 19 07:48:51 chrislap kernel: ieee1394: sbp2: aborting sbp2 command
Mar 19 07:48:51 chrislap kernel: Read (10) 00 06 b5 82 bf 00 00 f8 00
Mar 19 07:49:01 chrislap kernel: ieee1394: sbp2: aborting sbp2 command
Mar 19 07:49:01 chrislap kernel: Test Unit Ready 00 00 00 00 00
Mar 19 07:49:01 chrislap kernel: ieee1394: sbp2: aborting sbp2 command
Mar 19 07:49:01 chrislap kernel: Read (10) 00 06 b5 83 b7 00 00 08 00
Mar 19 07:49:11 chrislap kernel: ieee1394: sbp2: aborting sbp2 command
Mar 19 07:49:11 chrislap kernel: Test Unit Ready 00 00 00 00 00
Mar 19 07:49:11 chrislap kernel: ieee1394: sbp2: reset requested
Mar 19 07:49:11 chrislap kernel: ieee1394: sbp2: Generating sbp2 fetch agent reset
Mar 19 07:49:21 chrislap kernel: ieee1394: sbp2: aborting sbp2 command
Mar 19 07:49:21 chrislap kernel: Test Unit Ready 00 00 00 00 00
Mar 19 07:49:21 chrislap kernel: ieee1394: sbp2: reset requested
Mar 19 07:49:21 chrislap kernel: ieee1394: sbp2: Generating sbp2 fetch agent reset
Mar 19 07:49:41 chrislap kernel: ieee1394: sbp2: aborting sbp2 command
Mar 19 07:49:41 chrislap kernel: Test Unit Ready 00 00 00 00 00
Mar 19 07:49:51 chrislap kernel: ieee1394: sbp2: aborting sbp2 command
Mar 19 07:49:51 chrislap kernel: Test Unit Ready 00 00 00 00 00
Mar 19 07:49:51 chrislap kernel: ieee1394: sbp2: reset requested
[...]

I umounted the drive - got some errors, but it umounted:

[root@chrislap firewire]# umount g
messages:Mar 19 07:55:12 chrislap kernel: scsi1 (1:0): rejecting I/O to offline device
messages:Mar 19 07:55:12 chrislap kernel: Buffer I/O error on device sda2, logical block 524
messages:Mar 19 07:55:12 chrislap kernel: lost page write due to I/O error on sda2
[root@chrislap firewire]#

but then modprobe hung:

[root@chrislap firewire]# modprobe -r -v sbp2
unloading the sbp2 module:
rmmod /lib/modules/2.6.4/kernel/drivers/ieee1394/sbp2.ko
^--note no root prompt came back

So what should I do next? Incidentally, I don't suppose it is of any
interest, but the drive has a USB 2 interface too. My PC doesn't
have a USB 2 port, but it does have USB 1. I've managed to read the
whole disk using the USB 1 interface (albeit VERY slowly). Perhaps
that's interesting...

Chris.