2004-03-07 06:39:37

by Dmitry Torokhov

Subject: OOPS when copying data from local to an external drive (ieee1394)

Hi,

I started getting oopses when copying data from a local IDE drive to an
external Firewire drive. Not always, but quite often. The kernel is a bk
pull from a day before 2.6.4-rc2 was released; I do not see any ieee1394
updates since.

Unfortunately the oops was not saved in the logs, so here is what I managed
to write down:

Oops: 00002 [#1]
PREEMPT
CPU: 0
EIP: 0060 [<c0243d087>] Tainted: P
EFLAGS: 00010047
EIP is at hpsb_packet_sent+0x86/0x90
eax: 00100100 ebx: dfd74000 ecx: dd6edfb0 edx: 00200200
esi: 00000001 edi: dd6cdf60 ebp: c03e3ee0 esp: c03c3edc
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0; threadinfo=c03c2000, task=c034a800)
....
Call trace:
[<c025306e>] dma_trm_tasklet+0xae/0x1b0
recalc_task_prio+0xb4/0x1f0
tasklet_action
do_softirq
do_IRQ
common_interrupt
acpi_process_idle
default_idle
rest_init
default_init
rest_init
cpu_idle
start_kernel
unknown_bootparam

Code: ...
Kernel panic: Fatal exception in interrupt
In interrupt handler - not synching
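
For reading the "Oops:" line above: on x86 the number printed there is the page-fault error code, whose low three bits say whether the page was present, whether the access was a write, and whether the CPU was in user mode. The following is a minimal sketch of decoding it; the struct and function names are hypothetical and not taken from the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical holder for the three low bits of an x86 page-fault error code. */
struct oops_fault {
    bool protection; /* bit 0: false = page not present, true = protection violation */
    bool write;      /* bit 1: false = read access, true = write access */
    bool user_mode;  /* bit 2: false = fault in kernel mode, true = in user mode */
};

/* Decode the value printed after "Oops:" (parsed as hexadecimal). */
static struct oops_fault decode_oops_code(unsigned long code)
{
    struct oops_fault f;

    f.protection = (code & 0x1UL) != 0;
    f.write      = (code & 0x2UL) != 0;
    f.user_mode  = (code & 0x4UL) != 0;
    return f;
}
```

Under this reading, a code of 2 means a kernel-mode write to a page that is not present.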


This OOPS is with NVIDIA module loaded but I have seen exactly the
same trace without the module loaded.

--
Dmitry


2004-03-08 06:50:53

by Zwane Mwaikambo

Subject: Re: OOPS when copying data from local to an external drive (ieee1394)

On Sun, 7 Mar 2004, Dmitry Torokhov wrote:

> I started getting oopses when copying data from a local IDE drive to an
> external Firewire drive. Not always, but quite often. The kernel is a bk
> pull from a day before 2.6.4-rc2 was released; I do not see any ieee1394
> updates since.
>
> Unfortunately the oops was not saved in the logs, so here is what I managed
> to write down:

> Oops: 00002 [#1]
> PREEMPT
> CPU: 0
> EIP: 0060 [<c0243d087>] Tainted: P
> EFLAGS: 00010047
> EIP is at hpsb_packet_sent+0x86/0x90
> eax: 00100100 ebx: dfd74000 ecx: dd6edfb0 edx: 00200200

A spot of linked list corruption.
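
The values in eax and edx match LIST_POISON1 (0x00100100) and LIST_POISON2 (0x00200200), which list_del() in 2.6 kernels writes into an entry's next and prev pointers after unlinking it. That is consistent with list_del() being called on a packet that had already been removed from its list, although the partial oops alone does not prove it. Below is a minimal userspace sketch of the poisoning scheme, not the kernel code itself; list_entry_is_poisoned() and list_del_checked() are hypothetical helpers added for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Poison values as defined in include/linux/list.h of 2.6-era kernels. */
#define LIST_POISON1 ((struct list_head *)(uintptr_t)0x00100100)
#define LIST_POISON2 ((struct list_head *)(uintptr_t)0x00200200)

/* Make head an empty circular list. */
static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert entry right after head. */
static void list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Hypothetical helper: true if entry has already been unlinked. */
static bool list_entry_is_poisoned(const struct list_head *entry)
{
    return entry->next == LIST_POISON1 || entry->prev == LIST_POISON2;
}

/*
 * Unlink and poison entry as list_del() does, but return false instead of
 * dereferencing the poison pointers when entry was already deleted.
 * The plain kernel list_del() has no such check, so a second call writes
 * through LIST_POISON1 and faults.
 */
static bool list_del_checked(struct list_head *entry)
{
    if (list_entry_is_poisoned(entry))
        return false;

    entry->next->prev = entry->prev;
    entry->prev->next = entry->next;
    entry->next = LIST_POISON1;
    entry->prev = LIST_POISON2;
    return true;
}
```

In this sketch the second delete of the same entry is reported rather than executed; in the kernel the same sequence would fault on the write through the poisoned next pointer.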

> esi: 00000001 edi: dd6cdf60 ebp: c03e3ee0 esp: c03c3edc
> ds: 007b es: 007b ss: 0068
> Process swapper (pid: 0; threadinfo=c03c2000, task=c034a800)
> ....
> Call trace:
> [<c025306e>] dma_trm_tasklet+0xae/0x1b0

Does this patch help any?

Index: linux-2.6.4-rc1-mm2/drivers/ieee1394/ieee1394_core.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.4-rc1-mm2/drivers/ieee1394/ieee1394_core.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 ieee1394_core.c
--- linux-2.6.4-rc1-mm2/drivers/ieee1394/ieee1394_core.c 4 Mar 2004 04:12:44 -0000 1.1.1.1
+++ linux-2.6.4-rc1-mm2/drivers/ieee1394/ieee1394_core.c 8 Mar 2004 06:47:04 -0000
@@ -403,6 +403,8 @@ void hpsb_selfid_complete(struct hpsb_ho
 void hpsb_packet_sent(struct hpsb_host *host, struct hpsb_packet *packet,
 		      int ackcode)
 {
+	unsigned long flags;
+
 	packet->ack_code = ackcode;

 	if (packet->no_waiter) {
@@ -413,7 +415,9 @@ void hpsb_packet_sent(struct hpsb_host *

 	if (ackcode != ACK_PENDING || !packet->expect_response) {
 		atomic_dec(&packet->refcnt);
+		spin_lock_irqsave(&host->pending_pkt_lock, flags);
 		list_del(&packet->list);
+		spin_unlock_irqrestore(&host->pending_pkt_lock, flags);
 		packet->state = hpsb_complete;
 		queue_packet_complete(packet);
 		return;

2004-03-09 07:12:08

by Dmitry Torokhov

Subject: Re: OOPS when copying data from local to an external drive (ieee1394)

On Monday 08 March 2004 01:50 am, Zwane Mwaikambo wrote:
> On Sun, 7 Mar 2004, Dmitry Torokhov wrote:
>
> > I started getting oopses when copying data from a local IDE drive to an
> > external Firewire drive. Not always, but quite often. The kernel is a bk
> > pull from a day before 2.6.4-rc2 was released; I do not see any ieee1394
> > updates since.
> >
> > Unfortunately the oops was not saved in the logs, so here is what I managed
> > to write down:
>
> > Oops: 00002 [#1]
> > PREEMPT
> > CPU: 0
> > EIP: 0060 [<c0243d087>] Tainted: P
> > EFLAGS: 00010047
> > EIP is at hpsb_packet_sent+0x86/0x90
> > eax: 00100100 ebx: dfd74000 ecx: dd6edfb0 edx: 00200200
>
> A spot of linked list corruption.
>
> > esi: 00000001 edi: dd6cdf60 ebp: c03e3ee0 esp: c03c3edc
> > ds: 007b es: 007b ss: 0068
> > Process swapper (pid: 0; threadinfo=c03c2000, task=c034a800)
> > ....
> > Call trace:
> > [<c025306e>] dma_trm_tasklet+0xae/0x1b0
>
> Does this patch help any?
>

Unfortunately I am still getting oopses with exactly the same call trace.
On top of that I am now seeing the following in the logs:

Mar 9 01:41:21 core kernel: ieee1394: sbp2: aborting sbp2 command
Mar 9 01:41:21 core kernel: Write (10) 00 11 27 de 17 00 00 f8 00
Mar 9 01:41:21 core kernel: ieee1394: sbp2: aborting sbp2 command
Mar 9 01:41:21 core kernel: Write (10) 00 11 27 df 0f 00 00 f8 00
Mar 9 01:41:21 core kernel: ieee1394: sbp2: aborting sbp2 command
Mar 9 01:41:21 core kernel: Write (10) 00 11 27 e0 07 00 00 f8 00
Mar 9 01:41:21 core kernel: ieee1394: sbp2: aborting sbp2 command
Mar 9 01:41:21 core kernel: Write (10) 00 11 27 e0 ff 00 00 f8 00
Mar 9 01:41:21 core kernel: ieee1394: sbp2: aborting sbp2 command
Mar 9 01:41:21 core kernel: Write (10) 00 11 27 e1 f7 00 00 f8 00
Mar 9 01:41:21 core kernel: ieee1394: sbp2: aborting sbp2 command
Mar 9 01:41:21 core kernel: Write (10) 00 11 27 e3 e7 00 00 f8 00
Mar 9 01:41:21 core kernel: ieee1394: sbp2: aborting sbp2 command
Mar 9 01:41:22 core kernel: Write (10) 00 11 27 e4 df 00 00 f8 00
Mar 9 01:41:23 core kernel: ieee1394: sbp2: aborting sbp2 command
Mar 9 01:41:23 core kernel: Write (10) 00 11 27 e5 d7 00 00 f8 00
Mar 9 01:41:26 core kernel: ieee1394: sbp2: sbp2util_node_write_no_wait failed
Mar 9 01:41:28 core last message repeated 8 times
Mar 9 01:41:56 core kernel: ieee1394: sbp2: aborting sbp2 command
Mar 9 01:41:56 core kernel: Write (10) 00 11 2a f8 ff 00 00 f8 00
Mar 9 01:41:56 core kernel: ieee1394: sbp2: aborting sbp2 command
Mar 9 01:41:56 core kernel: Write (10) 00 11 2a f9 f7 00 00 f8 00
Mar 9 01:41:56 core kernel: ieee1394: sbp2: aborting sbp2 command
Mar 9 01:41:56 core kernel: Write (10) 00 11 2a fa ef 00 00 f8 00
Mar 9 01:41:56 core kernel: ieee1394: sbp2: aborting sbp2 command
Mar 9 01:41:56 core kernel: Write (10) 00 11 2a fb e7 00 00 f8 00
Mar 9 01:41:56 core kernel: ieee1394: sbp2: aborting sbp2 command
Mar 9 01:41:56 core kernel: Write (10) 00 11 2a fc df 00 00 f8 00
Mar 9 01:41:56 core kernel: ieee1394: sbp2: aborting sbp2 command
Mar 9 01:41:56 core kernel: Write (10) 00 11 2a fe cf 00 00 f8 00

I did not have these messages before. The kernel was pulled today
from bkbits plus your patch (and some of my patches but they only
affect input drivers).

--
Dmitry

2004-03-09 15:16:59

by Zwane Mwaikambo

Subject: Re: OOPS when copying data from local to an external drive (ieee1394)

On Tue, 9 Mar 2004, Dmitry Torokhov wrote:

> > Does this patch help any?
> >
>
> Unfortunately I am still getting oopses with exactly the same call trace.
> On top of that I am now seeing the following in the logs:

Thanks for testing it. The messages below look like they may be due to
something else.

> I did not have these messages before. The kernel was pulled today
> from bkbits plus your patch (and some of my patches but they only
> affect input drivers).

Just to reconfirm, could you back out my patch from that and retry?

Thanks,
Zwane

2004-03-09 15:55:06

by Ben Collins

Subject: Re: OOPS when copying data from local to an external drive (ieee1394)

On Tue, Mar 09, 2004 at 10:16:40AM -0500, Zwane Mwaikambo wrote:
> On Tue, 9 Mar 2004, Dmitry Torokhov wrote:
>
> > > Does this patch help any?
> > >
> >
> > Unfortunately I am still getting oopses with exactly the same call trace.
> > On top of that I am now seeing the following in the logs:
>
> Thanks for testing it. The messages below look like they may be due to
> something else.

No, that's exactly from your patch. The locking your patch added seems
to be wrong. I'm looking into the issue already.


--
Debian - http://www.debian.org/
Linux 1394 - http://www.linux1394.org/
Subversion - http://subversion.tigris.org/
WatchGuard - http://www.watchguard.com/