2004-06-29 18:16:52

by Matthias Urlichs

Subject: 2.6.7-mm4: regression: ieee1394: sbp2: null pointer dereference

2.6.7-mm4 oopses when confronted with an unresponsive ieee1394 disk.

-mm4:
kernel: ieee1394: sbp2: Error logging into SBP-2 device - login timed-out
kernel: sbp2: probe of 00a0b80a0000144f-0 failed with error -16
kernel: ieee1394: sbp2: Error reconnecting to SBP-2 device - reconnect timed-out
kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000013
kernel: printing eip:
kernel: fb2c55a3
kernel: *pde = 00000000
kernel: Oops: 0002 [#1]
kernel: PREEMPT SMP
kernel: Modules linked in: saa7115 saa7127 raw1394 sbp2 dv1394 eth1394 ivtv tun radeonfb agpgart btaudio tuner tvaudio msp3400 bttv video_buf i2c_algo_bit v4l2_common btcx_risc i2c_core videodev psmouse snd_pcm_oss snd_mixer_oss snd_seq_midi snd_seq_oss snd_seq_midi_event snd_seq snd_ens1370 snd_ak4531_codec snd_via82xx snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore ide_cd cdrom ohci1394 ieee1394
via_rhine mii crc32 rtc sata_via libata reiserfs sym53c8xx scsi_transport_spi sd_mod scsi_mod via82cxxx piix ide_disk ide_core ext3 jbd mbcache
kernel: CPU: 0
kernel: EIP: 0060:[<fb2c55a3>] Not tainted VLI
kernel: EFLAGS: 00010282 (2.6.7-mm4-1.16)
kernel: EIP is at sbp2_logout_device+0x13/0x140 [sbp2]
kernel: eax: 00000013 ebx: c1a61b60 ecx: c02d0dd8 edx: 00004a00
kernel: esi: 00000020 edi: c1a5f8f4 ebp: f689df64 esp: f689df4c
kernel: ds: 007b es: 007b ss: 0068
kernel: Process knodemgrd_0 (pid: 867, threadinfo=f689c000 task=f7013210)
kernel: Stack: f7a69878 0000c0ff 00e0e537 c1a61b60 c1a61b60 c1a5f800 f689df74 fb2c49f1
kernel: f7fc5600 f7fc5600 f689df90 f8f7a5b5 f0000234 fb31b000 f7fc5600 f75f71f8
kernel: 00000002 f689dfa8 f8f7a697 f7fc5640 f7fc56fc f75f71f8 00000002 f689dfc0
kernel: Call Trace:
kernel: [<c010801a>] show_stack+0x7a/0x90
kernel: [<c01081a2>] show_registers+0x152/0x1c0
kernel: [<c0108376>] die+0xb6/0x180
kernel: [<c011a8f5>] do_page_fault+0x1e5/0x583
kernel: [<c0107ca9>] error_code+0x2d/0x38
kernel: [<fb2c49f1>] sbp2_update+0x21/0x80 [sbp2]
kernel: [<f8f7a5b5>] nodemgr_update_pdrv+0x95/0x100 [ieee1394]
kernel: [<f8f7a697>] nodemgr_probe_ne+0x77/0x90 [ieee1394]
kernel: [<f8f7a70f>] nodemgr_node_probe+0x5f/0xa0 [ieee1394]
kernel: [<f8f7aab6>] nodemgr_host_thread+0x176/0x1a0 [ieee1394]
kernel: [<c01052c5>] kernel_thread_helper+0x5/0x10
kernel: Code: 25 fe ff ff c7 04 24 c0 75 2c fb eb 81 8d 74 26 00 8d bc 27 00 00 00 00 55 89 e5 56 53 83 ec 10 89 c3 8b b0 b0 00 00 00 8b 40 30 <c7> 00 00 00 00 00 8b 43 30 c7 40 04 00 00 00 00 8b 43 30 c7 40
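
For readers decoding the "Oops: 0002" line above: on i386 the number is the page-fault error code, printed in hex. A minimal sketch of how its low three bits are read follows. The struct and function names are hypothetical and exist only for illustration; they are not kernel symbols.

```c
#include <stdbool.h>

/* Hypothetical helper type: the three low bits of the i386
 * page-fault error code shown after "Oops:". */
struct fault_info {
    bool protection; /* bit 0: 1 = protection violation, 0 = page not present */
    bool write;      /* bit 1: 1 = write access, 0 = read access */
    bool user;       /* bit 2: 1 = fault in user mode, 0 = kernel mode */
};

/* Split an error code such as 0x0002 into its flag bits. */
static struct fault_info decode_fault(unsigned int code)
{
    struct fault_info info;

    info.protection = (code & 0x1) != 0;
    info.write      = (code & 0x2) != 0;
    info.user       = (code & 0x4) != 0;
    return info;
}
```

Applied to 0x0002, this yields a kernel-mode write to a not-present page, which fits a store through a near-NULL pointer.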

Without -mm4:

kernel: ieee1394: sbp2: Error logging into SBP-2 device - login timed-out
kernel: sbp2: probe of 00a0b80a0000144f-0 failed with error -16
kernel: ieee1394: raw1394: /dev/raw1394 device initialized
kernel: ieee1394: Error parsing configrom for node 0-01:1023
kernel: ieee1394: Error parsing configrom for node 0-03:1023
kernel: ieee1394: Node suspended: ID:BUS[0-03:1023] GUID[006037444d4c4353]
kernel: scsi6 : SCSI emulation for IEEE-1394 SBP-2 Devices
kernel: ieee1394: sbp2: Error logging into SBP-2 device - login failed
kernel: sbp2: probe of 00a0b80a0000144f-0 failed with error -16

--
Matthias Urlichs | {M:U} IT Design @ m-u-it.de


2004-06-30 04:01:24

by Matthias Urlichs

Subject: Re: 2.6.7-mm4: regression: ieee1394: sbp2: null pointer dereference

Hi, Matthias Urlichs wrote:

> 2.6.7-mm4 oopses when confronted with an unresponsive ieee1394 disk.

(Andrew helpfully forwarded this to 1394-dev. Thanks.)

Further tests show that the problem just shows up more reliably (if that's
the word...) under -mm4. However, I just got the error on plain 2.6.7.

--
Matthias Urlichs

2004-06-30 16:11:17

by Ben Collins

Subject: Re: 2.6.7-mm4: regression: ieee1394: sbp2: null pointer dereference

On Wed, Jun 30, 2004 at 06:01:10AM +0200, Matthias Urlichs wrote:
> Hi, Matthias Urlichs wrote:
>
> > 2.6.7-mm4 oopses when confronted with an unresponsive ieee1394 disk.
>
> (Andrew helpfully forwarded this to 1394-dev. Thanks.)
>
> Further tests show that the problem just shows up more reliably (if that's
> the word...) under -mm4. However, I just got the error on plain 2.6.7.

This oops traces back into the scsi stack, right? The spaghetti of
trying to get things to work right with the scsi stack is getting to be
a pain. I guess USB doesn't have too many problems since it does a
scsi-host per device, but that's not as easy with sbp2 and 1394, since a
single sbp2 device can have multiple LUNs, and it's just easier to
treat that as one scsi host.

I can't reproduce it, but I'll try to get into the logic of sbp2 device
removal again to see if I can find out where and why this is occurring.

--
Debian - http://www.debian.org/
Linux 1394 - http://www.linux1394.org/
Subversion - http://subversion.tigris.org/
WatchGuard - http://www.watchguard.com/

2004-06-30 18:48:45

by Matthias Urlichs

Subject: Re: 2.6.7-mm4: regression: ieee1394: sbp2: null pointer dereference

Hi,

Ben Collins:
> This oops traces back into the scsi stack, right?

Umm ... not that I know of. It basically says

sbp2_logout_device+0x13/0x140 [sbp2]
kernel: [<fb2c49f1>] sbp2_update+0x21/0x80 [sbp2]

No SCSI anywhere (that I can see).

sbp2_update+0x21 points to the instruction after a call to
sbp2_logout_device(), so that matches up. GDB says the error is here:

sbp2.c:1323: scsi_id->logout_orb->reserved1 = 0x0;

which probably means that either scsi_id or logout_orb is NULL.
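
A minimal sketch of the two cases named above, using simplified stand-ins for the driver's structures. The struct and function names are hypothetical and are not the actual sbp2.c definitions. It shows a logout path that refuses to touch the ORB when either pointer is NULL, instead of storing through it.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the logout ORB. */
struct logout_orb_sketch {
    uint32_t reserved1;
    uint32_t reserved2;
};

/* Hypothetical, simplified stand-in for the per-device state. */
struct scsi_id_sketch {
    struct logout_orb_sketch *logout_orb;
};

/* Returns 0 on success, -EINVAL if either pointer is NULL. */
static int logout_device_sketch(struct scsi_id_sketch *scsi_id)
{
    if (scsi_id == NULL || scsi_id->logout_orb == NULL)
        return -EINVAL;

    /* Same kind of store as the line GDB points at. */
    scsi_id->logout_orb->reserved1 = 0x0;
    scsi_id->logout_orb->reserved2 = 0x0;
    return 0;
}
```

Note that in the register dump eax holds 00000013, the same value as the faulting address, while ebx (c1a61b60) looks like an ordinary kernel pointer. If that reading is right, the pointer loaded from the structure was a small non-NULL garbage value, so a plain NULL check like the one sketched here would not catch it; stale or already-freed device state after the failed probe would be one possible explanation.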

> I guess USB doesn't have too many problems since it does a
> scsi-host per device, but that's not as easy with sbp2 and 1394, since a
> single sbp2 device can have multiple LUNs, and it's just easier to
> treat that as one scsi host.
>
So can USB; I have a card reader here which registers a different LUN
per card interface.

> I can't reproduce it, but I'll try to get into the logic of sbp2 device
> removal again to see if I can find out where and why this is occurring.

FWIW, here's the device info (from gscanbus):

SelfID Info
-----------
Physical ID: 2
Link active: Yes
Gap Count: 10
PHY Speed: S400
PHY Delay: <=144ns
IRM Capable: No
Power Class: None
Port 0: Connected to parent node
Port 1: Connected to child node
Init. reset: No

CSR ROM Info
------------
GUID: 0x00A0B80A0000144F
Node Capabilities: 0x000083C0
Vendor ID: 0x0000A0B8
Unit Spec ID: 0x0000609E
Unit SW Version: 0x00010483
Model ID: 0x00000000
Nr. Textual Leafes: 2

Vendor: SYMBIOS LOGIC INC.
Textual Leafes:
LSI Logic
LSI 501 rev B3

AV/C Subunits
-------------
N/A



--
Matthias Urlichs | {M:U} IT Design @ m-u-it.de