2003-03-13 05:46:36

by Torrey Hoffman

Subject: Oops in firewire (2.4.21-pre5 with 2.4.21-pre4 firewire driver)

I heard that the firewire merge in 2.4.21-pre5 was messed up, so I
replaced the -pre5 drivers/ieee1394 with the one from -pre4.

I got an oops while loading the driver. I will continue to experiment
with recent kernels, and try to find a bitkeeper snapshot with the
latest firewire fixes. Any suggestions are welcome.

(I am experimenting with recent kernels because these modules cause oopses
and hangs with the latest Red Hat kernel as well. However, the hardware
works fine with older Red Hat kernels.)

Anyway, the oops is decoded below. The system is an up-to-date Red Hat 8,
except for the kernel.

ohci1394_0: Unexpected PCI resource length of 1000!
ohci1394_0: OHCI-1394 1.0 (PCI): IRQ=[9] MMIO=[e9000000-e90007ff] Max Packet=[2048]
ieee1394: SelfID completion called outside of bus reset!
ieee1394: Device added: Node[00:1023] GUID[0004830000002cb3] [Oxford ]
ieee1394: Host added: Node[01:1023] GUID[0030dd8000505e29] [Linux OHCI-1394]
Unable to handle kernel NULL pointer dereference at virtual address 0000002c
printing eip:
c016e639
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<c016e639>] Not tainted
EFLAGS: 00013246
eax: 00000000 ebx: d1e518c0 ecx: d1eaf550 edx: d1eaf550
esi: 00000000 edi: 00000000 ebp: 00000000 esp: d2547e60
ds: 0018 es: 0018 ss: 0018
Process kjournald (pid: 157, stackpage=d2547000)
Stack: c9656b00 d273f380 d1eaf550 d1e4dc60 c016c92d d1eaf550 c9656b00 00000004
000006e4 00000000 d273f3f4 00000000 000003dc c95ebc24 00000000 d1e4dc60
d1eaf3d0 000006e4 ce8ed480 c9656a40 ce8ed5a0 ce8ed540 ce8ed4e0 ce8ed480
Call Trace: [<c016c92d>] [<c01176fc>] [<c016f62a>] [<c016f4c0>] [<c010744e>]
[<c016f4e0>]

Code: 3b 5e 2c 74 29 89 34 24 89 5c 24 04 e8 56 01 00 00 8b 5c 24
<6>ieee1394: sbp2: Logged into SBP-2 device
ieee1394: sbp2: Node[00:1023]: Max speed [S400] - Max payload [2048]
scsi1 : IEEE-1394 SBP-2 protocol driver (host: ohci1394)
$Rev: 707 $ James Goodwin <[email protected]>
SBP-2 module load options:
- Max speed supported: S400
- Max sectors per I/O supported: 255
- Max outstanding commands supported: 8
- Max outstanding commands per lun supported: 1
- Serialized I/O (debug): no
- Exclusive login: yes
Vendor: WDC WD12 Model: 00JB-00CRA1 Rev:
Type: Direct-Access ANSI SCSI revision: 06
Attached scsi disk sda at scsi1, channel 0, id 0, lun 0
SCSI device sda: 234441648 512-byte hdwr sectors (120034 MB)
sda: sda1

>>EIP; c016e639 <__journal_remove_checkpoint+39/90> <=====

>>ebx; d1e518c0 <_end+11ae7e88/144df628>
>>ecx; d1eaf550 <_end+11b45b18/144df628>
>>edx; d1eaf550 <_end+11b45b18/144df628>
>>esp; d2547e60 <_end+121de428/144df628>

Trace; c016c92d <journal_commit_transaction+6dd/1180>
Trace; c01176fc <schedule+21c/360>
Trace; c016f62a <kjournald+14a/1d0>
Trace; c016f4c0 <commit_timeout+0/10>
Trace; c010744e <kernel_thread+2e/40>
Trace; c016f4e0 <kjournald+0/1d0>

Code; c016e639 <__journal_remove_checkpoint+39/90>
00000000 <_EIP>:
Code; c016e639 <__journal_remove_checkpoint+39/90> <=====
0: 3b 5e 2c cmp 0x2c(%esi),%ebx <=====
Code; c016e63c <__journal_remove_checkpoint+3c/90>
3: 74 29 je 2e <_EIP+0x2e>
Code; c016e63e <__journal_remove_checkpoint+3e/90>
5: 89 34 24 mov %esi,(%esp,1)
Code; c016e641 <__journal_remove_checkpoint+41/90>
8: 89 5c 24 04 mov %ebx,0x4(%esp,1)
Code; c016e645 <__journal_remove_checkpoint+45/90>
c: e8 56 01 00 00 call 167 <_EIP+0x167>
Code; c016e64a <__journal_remove_checkpoint+4a/90>
11: 8b 5c 24 00 mov 0x0(%esp,1),%ebx


Torrey Hoffman
[email protected]


2003-03-13 06:01:47

by Ben Collins

Subject: Re: Oops in firewire (2.4.21-pre5 with 2.4.21-pre4 firewire driver)

On Wed, Mar 12, 2003 at 05:06:23PM -0800, Torrey Hoffman wrote:
> I heard that the firewire merge in 2.4.21-pre5 was messed up, so I
> replaced the -pre5 drivers/ieee1394 with the one from -pre4.

I'd suggest trying the latest BK cset patch (which fixes -pre5 and
also fixes some things in general).

> I got an oops while loading the driver. I will continue to experiment
> with recent kernels, and try to find a bitkeeper snapshot with the
> latest firewire fixes. Any suggestions are welcome.

> >>EIP; c016e639 <__journal_remove_checkpoint+39/90> <=====

This happened in the kjournald thread context. I'm not sure it is
ieee1394 related, but it is suspicious that it happened in the middle of
handling an ieee1394 bus reset.

Is this reproducible when loading the ohci1394 driver? If so, does it
occur when you turn off hotplug (IOW, don't load sbp2 driver) or if the
sbp2 device is not attached?

--
Debian - http://www.debian.org/
Linux 1394 - http://www.linux1394.org/
Subversion - http://subversion.tigris.org/
Deqo - http://www.deqo.com/

2003-03-13 17:50:07

by Torrey Hoffman

Subject: Re: Oops in firewire (2.4.21-pre5 with 2.4.21-pre4 firewire driver)

On Wed, 2003-03-12 at 22:11, Ben Collins wrote:
> On Wed, Mar 12, 2003 at 05:06:23PM -0800, Torrey Hoffman wrote:
[ohci1394 / sbp2 problems]

> I'd suggest trying the latest BK cset patch (which fixes -pre5 and
> also fixes some things in general).

Thanks for the response.

Last night I (finally) installed bitkeeper, pulled the latest 2.4 tree,
and gave it a try. It seems to have solved the problem on my single CPU
machine. I will try my SMP machine tonight and see how things go there.

I run reiserfs on my firewire drives but ext3 on some other partitions.
These oopses have often occurred when doing rsyncs between the reiserfs
on firewire and an ext3 or reiserfs partition on a regular disk or raid5
setup.

On my SMP machine this morning (using Red Hat's 2.4.18-18smp kernel) I
had a similar oops with references to kjournald under a heavy firewire
load. The machine didn't die, and after the bus resets completed, the
rsync from the firewire drive continued.

These oopses have been very reproducible while loading ohci1394, and
sometimes while transferring data after loading. They don't occur if
the sbp2 device is not attached. I have hacked my rc.sysinit script to
always load the drivers, since Red Hat's autodetection stuff there quit
working around 2.4.18-17, and as long as the device isn't attached
2.4.18-24 boots fine and loads the drivers.

Up until installing 2.4-bk last night, I normally booted Red Hat's
2.4.18-24 kernel, except when I needed to use firewire. (2.4.18-24 doesn't
work at all for me under firewire, and 2.4.18-18 "mostly" works.)

Anyway, I will upgrade all my machines to the latest -bk snapshot and
will be back with more bug reports if I see any glitches...

Hopefully Red Hat will update their official kernel with the firewire
fixes. And fix their rc.sysinit script too, while they are at it. (No,
I haven't submitted a bugzilla report yet, will do so if -bk fixes
things for me...)

Thanks again,

Torrey Hoffman




2003-03-14 03:32:12

by Torrey Hoffman

Subject: Re: Oops in firewire (2.4.21-pre5 with 2.4.21-pre4 firewire driver)

On Wed, 2003-03-12 at 22:11, Ben Collins wrote:
[about firewire problems]
> I'd suggest trying the latest BK cset patch (which fixes -pre5 and
> also fixes some things in general).

Although I said things were working, it turns out firewire is not
bug-free quite yet.

Here's another oops to look at.

The kernel was a bitkeeper snapshot from last night (March 12), no other
patches or modifications, and the hardware is a single CPU Pentium III,
a Maxtor-brand firewire controller, with an ATA-6 supporting
IDE-to-Firewire bridge hooked up to it.

This occurred when rc.sysinit loaded ohci1394. sbp2 is automatically
loaded by the hotplug driver, so that may have actually been the source
of the problem.

The system continued to boot after the oops (and in fact it's running
now as I write this...)

ksymoops 2.4.5 on i686 2.4.21-bk-0312. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.21-bk-0312/ (default)
-m /boot/System.map-2.4.21-bk-0312 (default)

Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.

Unable to handle kernel NULL pointer dereference at virtual address 00000000
d3848227
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<d3848227>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010246
eax: 00000000 ebx: 00000000 ecx: 00000001 edx: d24b2000
esi: d24b2000 edi: 00000002 ebp: 00000002 esp: ce0c9fa8
ds: 0018 es: 0018 ss: 0018
Process knodemgrd (pid: 211, stackpage=ce0c9000)
Stack: 00000001 00000000 d24b20f4 0000ffc2 d38482d7 d24b2000 00000002 00000002
c1361da0 c1361db8 d384f1f8 d24b2000 d384836a d24b2000 00000018 00000f00
d2461e50 d24b2000 c010744e c1361da0 d3848300 c1361da4
Call Trace: [<d38482d7>] [<d384f1f8>] [<d384836a>] [<c010744e>] [<d3848300>]
Code: 8b 1b 3d 08 f2 84 d3 75 f0 58 5b 5e 5f c3 39 78 20 74 eb 89


>>EIP; d3848227 <[ieee1394]nodemgr_node_probe_cleanup+27/50> <=====

>>edx; d24b2000 <_end+1218d108/134fe168>
>>esi; d24b2000 <_end+1218d108/134fe168>
>>esp; ce0c9fa8 <_end+dda50b0/134fe168>

Trace; d38482d7 <[ieee1394]nodemgr_node_probe+87/b0>
Trace; d384f1f8 <[ieee1394]nodemgr_serialize+0/10>
Trace; d384836a <[ieee1394]nodemgr_host_thread+6a/b0>
Trace; c010744e <kernel_thread+2e/40>
Trace; d3848300 <[ieee1394]nodemgr_host_thread+0/b0>

Code; d3848227 <[ieee1394]nodemgr_node_probe_cleanup+27/50>
00000000 <_EIP>:
Code; d3848227 <[ieee1394]nodemgr_node_probe_cleanup+27/50> <=====
0: 8b 1b mov (%ebx),%ebx <=====
Code; d3848229 <[ieee1394]nodemgr_node_probe_cleanup+29/50>
2: 3d 08 f2 84 d3 cmp $0xd384f208,%eax
Code; d384822e <[ieee1394]nodemgr_node_probe_cleanup+2e/50>
7: 75 f0 jne fffffff9 <_EIP+0xfffffff9>
Code; d3848230 <[ieee1394]nodemgr_node_probe_cleanup+30/50>
9: 58 pop %eax
Code; d3848231 <[ieee1394]nodemgr_node_probe_cleanup+31/50>
a: 5b pop %ebx
Code; d3848232 <[ieee1394]nodemgr_node_probe_cleanup+32/50>
b: 5e pop %esi
Code; d3848233 <[ieee1394]nodemgr_node_probe_cleanup+33/50>
c: 5f pop %edi
Code; d3848234 <[ieee1394]nodemgr_node_probe_cleanup+34/50>
d: c3 ret
Code; d3848235 <[ieee1394]nodemgr_node_probe_cleanup+35/50>
e: 39 78 20 cmp %edi,0x20(%eax)
Code; d3848238 <[ieee1394]nodemgr_node_probe_cleanup+38/50>
11: 74 eb je fffffffe <_EIP+0xfffffffe>
Code; d384823a <[ieee1394]nodemgr_node_probe_cleanup+3a/50>
13: 89 00 mov %eax,(%eax)


1 warning issued. Results may not be reliable.
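
To see at a glance which frames are inside the ieee1394 module, the
"Trace;" lines can be split into fields. This is a minimal sketch that
assumes the line layout is exactly what appears in the output above; the
regular expression and the parse_trace helper are made up for
illustration and are not taken from ksymoops.

```python
import re

# Pattern inferred from the "Trace;" lines in this report, for example:
#   Trace; d38482d7 <[ieee1394]nodemgr_node_probe+87/b0>
# The "[module]" prefix is absent for symbols in the core kernel.
TRACE_LINE = re.compile(
    r"Trace; (?P<addr>[0-9a-f]{8}) "
    r"<(?:\[(?P<module>[^\]]+)\])?"
    r"(?P<symbol>\w+)\+(?P<offset>[0-9a-f]+)/(?P<size>[0-9a-f]+)>"
)

SAMPLE = """\
Trace; d38482d7 <[ieee1394]nodemgr_node_probe+87/b0>
Trace; d384f1f8 <[ieee1394]nodemgr_serialize+0/10>
Trace; d384836a <[ieee1394]nodemgr_host_thread+6a/b0>
Trace; c010744e <kernel_thread+2e/40>
Trace; d3848300 <[ieee1394]nodemgr_host_thread+0/b0>
"""

def parse_trace(text):
    """Return one dict per recognised 'Trace;' line; other lines are skipped."""
    frames = []
    for line in text.splitlines():
        match = TRACE_LINE.fullmatch(line.strip())
        if match is None:
            continue
        frames.append({
            "addr": int(match["addr"], 16),
            "module": match["module"],  # None for core-kernel symbols
            "symbol": match["symbol"],
            "offset": int(match["offset"], 16),
            "size": int(match["size"], 16),
        })
    return frames

frames = parse_trace(SAMPLE)
for frame in frames:
    where = frame["module"] or "kernel"
    print(f'{frame["addr"]:08x} {where:10} {frame["symbol"]}+{frame["offset"]:#x}')
```

For this trace, every frame except kernel_thread resolves to a symbol in
ieee1394, which is consistent with the oops being in the node manager
rather than in the filesystem code.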