2005-10-18 21:02:57

by Mark Knecht

Subject: scsi_eh / 1394 bug - -rt7

Hi,
I'm seeing this each time I plug in a 1394 hard drive:

Attached scsi disk sdc at scsi6, channel 0, id 0, lun 0
ieee1394: Node changed: 0-01:1023 -> 0-00:1023
ieee1394: Node changed: 0-02:1023 -> 0-01:1023
ieee1394: Reconnected to SBP-2 device
ieee1394: Node 0-00:1023: Max speed [S400] - Max payload [2048]
ieee1394: Node suspended: ID:BUS[0-00:1023] GUID[0050c501e00b31ec]
prev->state: 2 != TASK_RUNNING??
scsi_eh_6/20286[CPU#0]: BUG in __schedule at kernel/sched.c:3328

Call Trace:<ffffffff801322b1>{__WARN_ON+97} <ffffffff803f8870>{__schedule+608}
<ffffffff8013434f>{do_exit+1007}
<ffffffff80147300>{keventd_create_kthread+0}
<ffffffff8010e5ed>{child_rip+15}
<ffffffff80147300>{keventd_create_kthread+0}
<ffffffff801471f0>{kthread+0} <ffffffff8010e5de>{child_rip+0}

ieee1394: Node resumed: ID:BUS[0-00:1023] GUID[0050c501e00b31ec]
ieee1394: Node changed: 0-00:1023 -> 0-01:1023
ieee1394: Node changed: 0-01:1023 -> 0-02:1023
ieee1394: Reconnected to SBP-2 device
ieee1394: Node 0-01:1023: Max speed [S400] - Max payload [2048]
scsi7 : SCSI emulation for IEEE-1394 SBP-2 Devices
lightning linux #

Note: This drive is currently partitioned using the 'Apple Partition
Scheme' and cannot be mounted. (At least by the likes of me!!) Anyway,
more info forthcoming if I can determine what you need.

Thanks,
Mark


2005-10-19 03:44:22

by Lee Revell

Subject: Re: scsi_eh / 1394 bug - -rt7

On Tue, 2005-10-18 at 14:02 -0700, Mark Knecht wrote:
> Hi,
> I'm seeing this each time I plug in a 1394 hard drive:
>
> Attached scsi disk sdc at scsi6, channel 0, id 0, lun 0
> ieee1394: Node changed: 0-01:1023 -> 0-00:1023
> ieee1394: Node changed: 0-02:1023 -> 0-01:1023
> ieee1394: Reconnected to SBP-2 device
> ieee1394: Node 0-00:1023: Max speed [S400] - Max payload [2048]
> ieee1394: Node suspended: ID:BUS[0-00:1023] GUID[0050c501e00b31ec]
> prev->state: 2 != TASK_RUNNING??
> scsi_eh_6/20286[CPU#0]: BUG in __schedule at kernel/sched.c:3328

I hit this exact same bug while at a client site today, with an external
USB drive.

Lee

2005-10-19 04:20:15

by Lee Revell

Subject: Re: scsi_eh / 1394 bug - -rt7

On Tue, 2005-10-18 at 23:43 -0400, Lee Revell wrote:
> On Tue, 2005-10-18 at 14:02 -0700, Mark Knecht wrote:
> > Hi,
> > I'm seeing this each time I plug in a 1394 hard drive:
> >
> > Attached scsi disk sdc at scsi6, channel 0, id 0, lun 0
> > ieee1394: Node changed: 0-01:1023 -> 0-00:1023
> > ieee1394: Node changed: 0-02:1023 -> 0-01:1023
> > ieee1394: Reconnected to SBP-2 device
> > ieee1394: Node 0-00:1023: Max speed [S400] - Max payload [2048]
> > ieee1394: Node suspended: ID:BUS[0-00:1023] GUID[0050c501e00b31ec]
> > prev->state: 2 != TASK_RUNNING??
> > scsi_eh_6/20286[CPU#0]: BUG in __schedule at kernel/sched.c:3328
>
> I hit this exact same bug while at a client site today, with an external
> USB drive.

And again on my home machine running -rt1 with my digital camera!

Attached scsi removable disk sda at scsi0, channel 0, id 0, lun 0
usb 2-1: USB disconnect, address 2
prev->state: 2 != TASK_RUNNING??
scsi_eh_0/12648[CPU#0]: BUG in __schedule at kernel/sched.c:3326
[<c01048b9>] dump_stack+0x19/0x20 (20)
[<c011e766>] __WARN_ON+0x46/0x80 (12)
[<c02c0bf7>] __schedule+0x547/0x790 (84)
[<c012057a>] do_exit+0x26a/0x430 (28)
[<c010147b>] kernel_thread_helper+0xb/0x10 (1020129312)

Lee

2005-10-19 07:11:28

by Steven Rostedt

Subject: Re: scsi_eh / 1394 bug - -rt7



On Wed, 19 Oct 2005, Lee Revell wrote:

> On Tue, 2005-10-18 at 23:43 -0400, Lee Revell wrote:
> > On Tue, 2005-10-18 at 14:02 -0700, Mark Knecht wrote:
> > > Hi,
> > > I'm seeing this each time I plug in a 1394 hard drive:
> > >
> > > Attached scsi disk sdc at scsi6, channel 0, id 0, lun 0
> > > ieee1394: Node changed: 0-01:1023 -> 0-00:1023
> > > ieee1394: Node changed: 0-02:1023 -> 0-01:1023
> > > ieee1394: Reconnected to SBP-2 device
> > > ieee1394: Node 0-00:1023: Max speed [S400] - Max payload [2048]
> > > ieee1394: Node suspended: ID:BUS[0-00:1023] GUID[0050c501e00b31ec]
> > > prev->state: 2 != TASK_RUNNING??
> > > scsi_eh_6/20286[CPU#0]: BUG in __schedule at kernel/sched.c:3328
> >
> > I hit this exact same bug while at a client site today, with an external
> > USB drive.
>
> And again on my home machine running -rt1 with my digital camera!
>
> Attached scsi removable disk sda at scsi0, channel 0, id 0, lun 0
> usb 2-1: USB disconnect, address 2
> prev->state: 2 != TASK_RUNNING??
> scsi_eh_0/12648[CPU#0]: BUG in __schedule at kernel/sched.c:3326
> [<c01048b9>] dump_stack+0x19/0x20 (20)
> [<c011e766>] __WARN_ON+0x46/0x80 (12)
> [<c02c0bf7>] __schedule+0x547/0x790 (84)
> [<c012057a>] do_exit+0x26a/0x430 (28)
> [<c010147b>] kernel_thread_helper+0xb/0x10 (1020129312)
>

This is also a problem in the upstream kernel. It's just that RT catches
it! Here's the patch. I'll also write one for the upstream kernel,
although this patch would probably work there as well. But I'll make it
official.

Ingo,

Here's the patch. The problem is similar to the pcmcia bug. It seems
that the loop usually exits in the TASK_INTERRUPTIBLE state.


Lee and Mark,

Can you two try it as well and confirm that you can't get the
bug anymore?

Thanks,


-- Steve


Index: linux-2.6.14-rc4-rt9/drivers/scsi/scsi_error.c
===================================================================
--- linux-2.6.14-rc4-rt9.orig/drivers/scsi/scsi_error.c 2005-10-19 02:54:49.000000000 -0400
+++ linux-2.6.14-rc4-rt9/drivers/scsi/scsi_error.c 2005-10-19 02:57:06.000000000 -0400
@@ -1645,6 +1645,12 @@
 		set_current_state(TASK_INTERRUPTIBLE);
 	}

+	/*
+	 * There's a good chance that the loop will exit in the
+	 * TASK_INTERRUPTIBLE state.
+	 */
+	__set_current_state(TASK_RUNNING);
+
 	SCSI_LOG_ERROR_RECOVERY(1, printk("Error handler scsi_eh_%d"
 					  " exiting\n",shost->host_no));

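The failure mode Steve describes can be sketched in userspace C. This is only a rough model, not kernel code: the enum values, `set_state()`, `fake_schedule()`, `run_handler()` and its `passes` parameter are all hypothetical stand-ins, and the loop shape is inferred from the patch context (the loop body ends by setting TASK_INTERRUPTIBLE). It shows that a thread falling out of such a loop leaves with a non-running state unless the state is reset first.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel task states; the numeric values
 * here are illustrative, not the kernel's. */
enum task_state { TASK_RUNNING, TASK_INTERRUPTIBLE };

/* Simulated state of the "current" task. */
static enum task_state current_state = TASK_RUNNING;

/* Stand-in for set_current_state() / __set_current_state(). */
static void set_state(enum task_state s)
{
	current_state = s;
}

/* Stand-in for schedule(): the sleeper is woken and is runnable again. */
static void fake_schedule(void)
{
	current_state = TASK_RUNNING;
}

/*
 * Model of the handler loop's shape. `passes` (hypothetical) is how many
 * times the loop body runs before the thread is asked to stop. Returns the
 * state the thread is in when it falls out of the loop and exits.
 */
static enum task_state run_handler(int passes, bool apply_fix)
{
	current_state = TASK_RUNNING;

	for (int i = 0; i < passes; i++) {
		fake_schedule();               /* sleep until there is work */
		/* ... error recovery would happen here ... */
		set_state(TASK_INTERRUPTIBLE); /* last statement of the loop body */
	}

	if (apply_fix)
		set_state(TASK_RUNNING);       /* the one-line fix from the patch */

	return current_state;
}
```

In this model, `run_handler(3, false)` returns `TASK_INTERRUPTIBLE` (the unpatched exit path), while `run_handler(3, true)` returns `TASK_RUNNING`.
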
2005-10-19 07:55:15

by Steven Rostedt

Subject: [PATCH] scsi_error thread exits in TASK_INTERRUPTIBLE state.


Found in the -rt patch set. The SCSI error handler thread (scsi_eh_N) is
likely to still be in the TASK_INTERRUPTIBLE state when it exits. This
patch sets the state back to TASK_RUNNING before the thread exits.

-- Steve

Signed-off-by: Steven Rostedt <[email protected]>

Index: linux-2.6.14-rc4/drivers/scsi/scsi_error.c
===================================================================
--- linux-2.6.14-rc4.orig/drivers/scsi/scsi_error.c 2005-10-19 03:37:55.000000000 -0400
+++ linux-2.6.14-rc4/drivers/scsi/scsi_error.c 2005-10-19 03:38:59.000000000 -0400
@@ -1645,6 +1645,12 @@
 		set_current_state(TASK_INTERRUPTIBLE);
 	}

+	/*
+	 * There's a good chance that the loop will exit in the
+	 * TASK_INTERRUPTIBLE state.
+	 */
+	__set_current_state(TASK_RUNNING);
+
 	SCSI_LOG_ERROR_RECOVERY(1, printk("Error handler scsi_eh_%d"
 					  " exiting\n",shost->host_no));


2005-10-19 11:23:00

by Ingo Molnar

Subject: Re: scsi_eh / 1394 bug - -rt7


* Steven Rostedt <[email protected]> wrote:

> > Attached scsi removable disk sda at scsi0, channel 0, id 0, lun 0
> > usb 2-1: USB disconnect, address 2
> > prev->state: 2 != TASK_RUNNING??
> > scsi_eh_0/12648[CPU#0]: BUG in __schedule at kernel/sched.c:3326
> > [<c01048b9>] dump_stack+0x19/0x20 (20)
> > [<c011e766>] __WARN_ON+0x46/0x80 (12)
> > [<c02c0bf7>] __schedule+0x547/0x790 (84)
> > [<c012057a>] do_exit+0x26a/0x430 (28)
> > [<c010147b>] kernel_thread_helper+0xb/0x10 (1020129312)
> >
>
> This is also a problem in the upstream kernel. It's just that RT
> catches it! Here's the patch. I'll also write one for the upstream
> kernel, although this patch would probably work there as well. But
> I'll make it official.
>
> Ingo,
>
> Here's the patch. The problem is similar to the pcmcia bug. It seems
> that the loop usually exits in the TASK_INTERRUPTIBLE state.

thanks, applied and released in 2.6.14-rc4-rt10.

Ingo

2005-10-19 11:31:42

by Christoph Hellwig

Subject: Re: [PATCH] scsi_error thread exits in TASK_INTERRUPTIBLE state.

> + /*
> + * There's a good chance that the loop will exit in the
> + * TASK_INTERRUPTIBLE state.
> + */
> + __set_current_state(TASK_RUNNING);

no need to comment the obvious.

2005-10-19 11:52:23

by Steven Rostedt

Subject: Re: [PATCH] scsi_error thread exits in TASK_INTERRUPTIBLE state.


On Wed, 19 Oct 2005, Christoph Hellwig wrote:

> > + /*
> > + * There's a good chance that the loop will exit in the
> > + * TASK_INTERRUPTIBLE state.
> > + */
> > + __set_current_state(TASK_RUNNING);
>
> no need to comment the obvious.
>

So, should I resend the patch without the comment?

-- Steve

2005-10-19 11:54:43

by Christoph Hellwig

Subject: Re: [PATCH] scsi_error thread exits in TASK_INTERRUPTIBLE state.

On Wed, Oct 19, 2005 at 07:51:47AM -0400, Steven Rostedt wrote:
> So, should I resend the patch without the comment?

yes, please ;-)

2005-10-19 11:56:59

by Ingo Molnar

Subject: Re: [PATCH] scsi_error thread exits in TASK_INTERRUPTIBLE state.


* Steven Rostedt <[email protected]> wrote:

> On Wed, 19 Oct 2005, Christoph Hellwig wrote:
>
> > > + /*
> > > + * There's a good chance that the loop will exit in the
> > > + * TASK_INTERRUPTIBLE state.
> > > + */
> > > + __set_current_state(TASK_RUNNING);
> >
> > no need to comment the obvious.
>
> So, should I resend the patch without the comment?

i guess so. OTOH, if it was so obvious, why did it stay unfixed for so
long ;-)

Ingo

2005-10-19 11:59:57

by Christoph Hellwig

Subject: Re: [PATCH] scsi_error thread exits in TASK_INTERRUPTIBLE state.

On Wed, Oct 19, 2005 at 01:56:53PM +0200, Ingo Molnar wrote:
> > So, should I resend the patch without the comment?
>
> i guess so. OTOH, if it was so obvious, why did it stay unfixed for so
> long ;-)

a) the code is pretty new
b) this isn't a serious problem for kernels without your preempt-rt patch

2005-10-19 12:22:48

by Steven Rostedt

Subject: Re: [PATCH] scsi_error thread exits in TASK_INTERRUPTIBLE state.


Once again here's the patch.

Andrew, can you drop my previous one in favor of this one?

I'm one of that rare breed of programmers who over-document. At least I
didn't comment "x equals a plus b" for "x = a + b", but I'm sure Christoph
would say that I was pretty close ;-)

Description:

Found in the -rt patch set. The SCSI error handler thread (scsi_eh_N) is
likely to still be in the TASK_INTERRUPTIBLE state when it exits. This
patch sets the state back to TASK_RUNNING before the thread exits.

Ingo,

I take it that you don't care about the comment for RT.

-- Steve

Signed-off-by: Steven Rostedt <[email protected]>

Index: linux-2.6.14-rc4/drivers/scsi/scsi_error.c
===================================================================
--- linux-2.6.14-rc4.orig/drivers/scsi/scsi_error.c 2005-10-19 03:37:55.000000000 -0400
+++ linux-2.6.14-rc4/drivers/scsi/scsi_error.c 2005-10-19 08:10:23.000000000 -0400
@@ -1645,6 +1645,8 @@
 		set_current_state(TASK_INTERRUPTIBLE);
 	}

+	__set_current_state(TASK_RUNNING);
+
 	SCSI_LOG_ERROR_RECOVERY(1, printk("Error handler scsi_eh_%d"
 					  " exiting\n",shost->host_no));

2005-10-19 15:23:48

by Mark Knecht

Subject: Re: scsi_eh / 1394 bug - -rt7

On 10/19/05, Steven Rostedt <[email protected]> wrote:
>
>
> On Wed, 19 Oct 2005, Lee Revell wrote:
>
> > On Tue, 2005-10-18 at 23:43 -0400, Lee Revell wrote:
> > > On Tue, 2005-10-18 at 14:02 -0700, Mark Knecht wrote:
> > > > Hi,
> > > > I'm seeing this each time I plug in a 1394 hard drive:
> > > >
> > > > Attached scsi disk sdc at scsi6, channel 0, id 0, lun 0
> > > > ieee1394: Node changed: 0-01:1023 -> 0-00:1023
> > > > ieee1394: Node changed: 0-02:1023 -> 0-01:1023
> > > > ieee1394: Reconnected to SBP-2 device
> > > > ieee1394: Node 0-00:1023: Max speed [S400] - Max payload [2048]
> > > > ieee1394: Node suspended: ID:BUS[0-00:1023] GUID[0050c501e00b31ec]
> > > > prev->state: 2 != TASK_RUNNING??
> > > > scsi_eh_6/20286[CPU#0]: BUG in __schedule at kernel/sched.c:3328
> > >
> > > I hit this exact same bug while at a client site today, with an external
> > > USB drive.
> >
> > And again on my home machine running -rt1 with my digital camera!
> >
> > Attached scsi removable disk sda at scsi0, channel 0, id 0, lun 0
> > usb 2-1: USB disconnect, address 2
> > prev->state: 2 != TASK_RUNNING??
> > scsi_eh_0/12648[CPU#0]: BUG in __schedule at kernel/sched.c:3326
> > [<c01048b9>] dump_stack+0x19/0x20 (20)
> > [<c011e766>] __WARN_ON+0x46/0x80 (12)
> > [<c02c0bf7>] __schedule+0x547/0x790 (84)
> > [<c012057a>] do_exit+0x26a/0x430 (28)
> > [<c010147b>] kernel_thread_helper+0xb/0x10 (1020129312)
> >
>
> This is also a problem in the upstream kernel. It's just that RT catches
> it! Here's the patch. I'll also write one for the upstream kernel,
> although this patch would probably work there as well. But I'll make it
> official.
>
> Ingo,
>
> Here's the patch. The problem is similar to the pcmcia bug. It seems
> that the loop usually exits in the TASK_INTERRUPTIBLE state.
>
>
> Lee and Mark,
>
> Can you two try it as well and confirm that you can't get the
> bug anymore?
>
> Thanks,
>
>
> -- Steve
>
>
> Index: linux-2.6.14-rc4-rt9/drivers/scsi/scsi_error.c
> ===================================================================
> --- linux-2.6.14-rc4-rt9.orig/drivers/scsi/scsi_error.c 2005-10-19 02:54:49.000000000 -0400
> +++ linux-2.6.14-rc4-rt9/drivers/scsi/scsi_error.c 2005-10-19 02:57:06.000000000 -0400
> @@ -1645,6 +1645,12 @@
> set_current_state(TASK_INTERRUPTIBLE);
> }
>
> + /*
> + * There's a good chance that the loop will exit in the
> + * TASK_INTERRUPTIBLE state.
> + */
> + __set_current_state(TASK_RUNNING);
> +
> SCSI_LOG_ERROR_RECOVERY(1, printk("Error handler scsi_eh_%d"
> " exiting\n",shost->host_no));
>
>

Steve,
Initial indications are that this fixes it. I've attached and
removed a 1394 drive a few times. I'll watch it through the day,
either in -rt7-patched or -rt10 when I get it built correctly.

Thanks very much!

Cheers,
Mark