2004-09-29 00:50:44

by Alan

[permalink] [raw]
Subject: Core scsi layer crashes in 2.6.8.1

I've been doing some stress testing for Fedora Core 3 and lets say the
SCSI layer came apart under stress.

First problem was a device with 256 byte sector sizes. When it probes I
get a chain of errors. If I then try and mount it then it hangs the
mount forever. If you remove the USB scsi device to try and unjam it you
get errors logged about Invalid State 256 in USB reset and it doesn't
recover.

Second problem is with the scsi handling logic for errors. If you rmmod
a scsi driver while it is error handling you get a chain of errors
starting with

Illegal transition Cancel->Offline
Badness is scsi_device_set_state
path:
scsi_device_set_state
scsi_unjam_host
scsi_error_handler

This is followed by a series of further errors including kobject errors
and oopses. Then the machine dies.

The set up is fairly simple. Its a a disk and two scsi cd multichangers
configured so that I can also badly terminate them. In that situation
identify works but other commands tend to fail which allows good error
stressing.



2004-09-29 14:15:01

by James Bottomley

[permalink] [raw]
Subject: Re: Core scsi layer crashes in 2.6.8.1

On Tue, 2004-09-28 at 16:03, Alan Cox wrote:
> Second problem is with the scsi handling logic for errors. If you rmmod
> a scsi driver while it is error handling you get a chain of errors
> starting with
>
> Illegal transition Cancel->Offline
> Badness is scsi_device_set_state
> path:
> scsi_device_set_state
> scsi_unjam_host
> scsi_error_handler
>

These state transition warnings are currently expected in this code
(they're basically verbose warnings).

What was the oops?

I have a theory that we should be taking a device reference before
waking up the error handler, otherwise host removal can race with error
handling.

James


2004-09-29 14:27:50

by Alan

[permalink] [raw]
Subject: Re: Core scsi layer crashes in 2.6.8.1

On Mer, 2004-09-29 at 15:11, James Bottomley wrote:
> What was the oops?
>
> I have a theory that we should be taking a device reference before
> waking up the error handler, otherwise host removal can race with error
> handling.

The sequence I scribbled down from the console was

Illegal state transition Cancel->Offline
Badness in scsi_device_set_state
scsi_device_set_state
scsi_unjam_host
scsi_error_handler

badness in kref_get
kobject_get, get_device, scsi_request_fn
blk_insert_request, scsi_queue_insert
scsi_eh_flush_done_q, scsi_unjam_host
scsi_error_handler

OOPS scsi_device_dev_release
device_release
kobject_cleanup
kobject_release
kref_put
scsi_request_fn

Alan

2004-09-29 14:32:07

by Matthew Wilcox

[permalink] [raw]
Subject: Re: Core scsi layer crashes in 2.6.8.1

On Wed, Sep 29, 2004 at 10:11:59AM -0400, James Bottomley wrote:
> I have a theory that we should be taking a device reference before
> waking up the error handler, otherwise host removal can race with error
> handling.

That would certainly mesh with what's happening with Anton's sick ppc64
box ...

--
"Next the statesmen will invent cheap lies, putting the blame upon
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince
himself that the war is just, and will thank God for the better sleep
he enjoys after this process of grotesque self-deception." -- Mark Twain

2004-09-29 14:39:55

by James Bottomley

[permalink] [raw]
Subject: Re: Core scsi layer crashes in 2.6.8.1

On Wed, 2004-09-29 at 09:24, Alan Cox wrote:
> badness in kref_get
> kobject_get, get_device, scsi_request_fn
> blk_insert_request, scsi_queue_insert
> scsi_eh_flush_done_q, scsi_unjam_host
> scsi_error_handler

Yes .. this is the significant one ... we're trying to get a reference
to a device that has already been released

> OOPS scsi_device_dev_release
> device_release
> kobject_cleanup
> kobject_release
> kref_put
> scsi_request_fn

James


2004-10-05 11:52:59

by Anton Blanchard

[permalink] [raw]
Subject: Re: Core scsi layer crashes in 2.6.8.1


Hi James,

> These state transition warnings are currently expected in this code
> (they're basically verbose warnings).
>
> What was the oops?
>
> I have a theory that we should be taking a device reference before
> waking up the error handler, otherwise host removal can race with error
> handling.

Did this get sorted out? Here is an oops from a few week old BK tree.
FYI I just noticed I have disabled host reset in the sym2 driver (it
was locking up at the time and I never went back to work out why).
However, even with a host reset this could happen right?

Below we get a WARN_ON then an oops (the bit starting with NIP, the
address we tried to access was 0x100510.

Anton

sym0: <1010-66> rev 0x1 at pci 0004:03:01.0 irq 87
sym.0004:03:01.0: No NVRAM, ID 7, Fast-80, LVD, parity checking
xics_enable_irq 47 buid 4 gqirm 255
sym.0004:03:01.0: SCSI BUS has been reset.
scsi0 : sym-2.1.18j
Using anticipatory io scheduler
sym.0004:03:01.0:10: FAST-40 WIDE SCSI 80.0 MB/s ST (25.0 ns, offset 31)
Vendor: IBM Model: IC35L036UCDY10-0 Rev: S25M
Type: Direct-Access ANSI SCSI revision: 03
sym.0004:03:01.0:10:0: tagged command queuing enabled, command queue depth 16.
scsi(0:0:10:0): Beginning Domain Validation
sym.0004:03:01.0:10: asynchronous.
sym.0004:03:01.0:10: wide asynchronous.
sym.0004:03:01.0:10: FAST-80 WIDE SCSI 160.0 MB/s DT (12.5 ns, offset 31)
scsi(0:0:10:0): Ending Domain Validation
sym.0004:03:01.0:11:0:phase change 2-7 6@01050368 resid=5.
sym.0004:03:01.0:11:0:phase change 2-3 6@01050368 resid=5.
sym.0004:03:01.0:11: FAST-40 WIDE SCSI 80.0 MB/s ST (25.0 ns, offset 31)
sym.0004:03:01.0:11:control msgout: c.
sym.0004:03:01.0: TARGET 11 has been reset.
sym.0004:03:01.0:11:0: ABORT operation started.
sym.0004:03:01.0:11:0: ABORT operation complete.
sym.0004:03:01.0:11:0: DEVICE RESET operation started.
sym.0004:03:01.0:11:0: DEVICE RESET operation complete.
sym.0004:03:01.0:11:control msgout: c.
sym.0004:03:01.0: TARGET 11 has been reset.
sym.0004:03:01.0:11:0: ABORT operation started.
sym.0004:03:01.0:11:0: ABORT operation complete.
sym.0004:03:01.0:11:0: BUS RESET operation started.
sym.0004:03:01.0:11:0: BUS RESET operation complete.
sym.0004:03:01.0: SCSI BUS reset detected.
sym.0004:03:01.0: SCSI BUS has been reset.
scsi: Device offlined - not ready after error recovery: host 0 channel 0 id 11 lun 0
Badness in kref_get at lib/kref.c:32
Call Trace:
[c0000025fe1b3bd0] [c0000030262bf4b0] 0xc0000030262bf4b0 (unreliable)
[c0000025fe1b3c50] [c00000000021f5b8] .get_device+0x20/0x3c
[c0000025fe1b3cc0] [c000000000294c60] .scsi_device_get+0x38/0xe4
[c0000025fe1b3d40] [c000000000294e30] .__scsi_iterate_devices+0x60/0xfc
[c0000025fe1b3de0] [c000000000299bf8] .scsi_run_host_queues+0x34/0x58
[c0000025fe1b3e60] [c0000000002989f8] .scsi_error_handler+0x268/0xaa0
[c0000025fe1b3f90] [c000000000017aac] .kernel_thread+0x4c/0x68
sym.0004:03:01.0:11:control msgout: c.
NIP: C000000000294C48 XER: 0000000020000000 LR: C000000000294E30
REGS: c0000025fe1b3a40 TRAP: 0300 Not tainted (2.6.9-rc2-bml)
MSR: 9000000000001032 EE: 0 PR: 0 FP: 0 ME: 1 IR/DR: 11
DAR: 0000000000100510, DSISR: 0000000040000000
TASK: c000001dfe1b72c0[1467] 'scsi_eh_0' THREAD: c0000025fe1b0000 CPU: 3
GPR00: FFFFFFFFFFFFFFFA C0000025FE1B3CC0 C0000000007297B8 00000000001000F0
GPR04: C000000FFE185000 0000000000000001 0000000000000000 0000000000000000
GPR08: 0000000000000000 0000000000100100 C000000000B96228 9000000000009032
GPR12: 0000000024FFFF22 C000000000542700 0000000000000000 0000000000000000
GPR16: 0000000000000000 C00000000040D188 C000000000587058 C0000025FE1B3ED0
GPR20: 00000000000000FC C00000000040D188 C000000000587058 C0000025FE1B3F00
GPR24: C0000025FE1B3EF0 0000040100000000 C000002FFE128D30 C000000FFE185000
GPR28: 9000000000009032 C0000007FFFCF800 00000000001002D8 00000000001000F0
NIP [c000000000294c48] .scsi_device_get+0x20/0xe4
LR [c000000000294e30] .__scsi_iterate_devices+0x60/0xfc
Call Trace:
[c0000025fe1b3cc0] [c000000000294da8] .scsi_device_put+0x9c/0xc4 (unreliable)
[c0000025fe1b3d40] [c000000000294e30] .__scsi_iterate_devices+0x60/0xfc
[c0000025fe1b3de0] [c000000000299bf8] .scsi_run_host_queues+0x34/0x58
[c0000025fe1b3e60] [c0000000002989f8] .scsi_error_handler+0x268/0xaa0
[c0000025fe1b3f90] [c000000000017aac] .kernel_thread+0x4c/0x68

2004-10-05 14:00:17

by James Bottomley

[permalink] [raw]
Subject: Re: Core scsi layer crashes in 2.6.8.1

On Tue, 2004-10-05 at 06:49, Anton Blanchard wrote:
>
> Hi James,
>
> > These state transition warnings are currently expected in this code
> > (they're basically verbose warnings).
> >
> > What was the oops?
> >
> > I have a theory that we should be taking a device reference before
> > waking up the error handler, otherwise host removal can race with error
> > handling.
>
> Did this get sorted out? Here is an oops from a few week old BK tree.
> FYI I just noticed I have disabled host reset in the sym2 driver (it
> was locking up at the time and I never went back to work out why).
> However, even with a host reset this could happen right?

Well, the theoretical hole is fixed ... If you test the current tree
we'll find out if this is indeed your problem.

James


2004-10-05 14:46:12

by Mark Lord

[permalink] [raw]
Subject: Re: Core scsi layer crashes in 2.6.8.1

There seem to be other holes/races in this and related code.

The QStor driver implements hot insertion/removal of drives.

One thing it has to cope with at present is, after notifying
the mid-layer that a drive has been removed, the mid-layer calls
back with a synchronize-cache command for that drive..

Cheers
--
Mark Lord
(hdparm keeper & the original "Linux IDE Guy")

2004-10-05 14:56:15

by James Bottomley

[permalink] [raw]
Subject: Re: Core scsi layer crashes in 2.6.8.1

On Tue, 2004-10-05 at 09:44, Mark Lord wrote:
> There seem to be other holes/races in this and related code.
>
> The QStor driver implements hot insertion/removal of drives.
>
> One thing it has to cope with at present is, after notifying
> the mid-layer that a drive has been removed, the mid-layer calls
> back with a synchronize-cache command for that drive..

This is expected behaviour. For orderly removal an cache sync command
must be sent to drives with a writeback cache before they're powered
down. For forced ejection, the driver has to error the command.

James


2004-10-05 15:51:32

by Mark Lord

[permalink] [raw]
Subject: Re: Core scsi layer crashes in 2.6.8.1

James Bottomley wrote:
>
> This is expected behaviour. For orderly removal an cache sync command
> must be sent to drives with a writeback cache before they're powered
> down. For forced ejection, the driver has to error the command.

Yup, that's how it has to be done at present.

Another weirdness I ran into at one point, was that the mid-layer
could be made to segfault if a LLD asked it to remove a drive that
had previously been set "offline" -- it complains about an illegal
state transition during the removal, and then dies. This sequence
no longer occurs in the QStor driver, but it might resurface soon
as more drivers begin to support hot insertion/removal.

Cheers
--
Mark Lord
(hdparm keeper & the original "Linux IDE Guy")

2004-10-05 15:55:53

by James Bottomley

[permalink] [raw]
Subject: Re: Core scsi layer crashes in 2.6.8.1

On Tue, 2004-10-05 at 10:49, Oliver Neukum wrote:
> Then let the driver tell the upper layers whether the device is still
> connected or not.

Do we have to go over this again?

It would add quite a bit of complexity to the reference counted
aynchronous model to try and force synchronicity between queuecommand
and scsi_remove_host in the mid-layer. Therefore it's much easier to
let the LLD decide what to do with the command.

James


2004-10-05 16:00:34

by Oliver Neukum

[permalink] [raw]
Subject: Re: Core scsi layer crashes in 2.6.8.1

Am Dienstag, 5. Oktober 2004 16:56 schrieb James Bottomley:
> On Tue, 2004-10-05 at 09:44, Mark Lord wrote:
> > There seem to be other holes/races in this and related code.
> >
> > The QStor driver implements hot insertion/removal of drives.
> >
> > One thing it has to cope with at present is, after notifying
> > the mid-layer that a drive has been removed, the mid-layer calls
> > back with a synchronize-cache command for that drive..
>
> This is expected behaviour. For orderly removal an cache sync command
> must be sent to drives with a writeback cache before they're powered
> down. For forced ejection, the driver has to error the command.

Then let the driver tell the upper layers whether the device is still
connected or not.

Regards
Oliver

2004-10-05 16:08:46

by James Bottomley

[permalink] [raw]
Subject: Re: Core scsi layer crashes in 2.6.8.1

On Tue, 2004-10-05 at 11:01, Oliver Neukum wrote:
> Why is it in any way difficult to decide whether to issue a command in the
> first place? The command is generated upon being notified by the lower layer.
> There is no issue of synchronisation here. It is simply stupid to give
> commands that are bound to fail, if the information is already available.

a) we don't know that they are ... for notified ejection they will
succeed.

b) The scsi bus is a scanned model ... drivers must be prepared to
accept commands for non-existent devices. How does the removal case
differ from the never present case?

James


2004-10-05 16:19:02

by Oliver Neukum

[permalink] [raw]
Subject: Re: Core scsi layer crashes in 2.6.8.1

Am Dienstag, 5. Oktober 2004 17:54 schrieb James Bottomley:
> On Tue, 2004-10-05 at 10:49, Oliver Neukum wrote:
> > Then let the driver tell the upper layers whether the device is still
> > connected or not.
>
> Do we have to go over this again?
>
> It would add quite a bit of complexity to the reference counted
> aynchronous model to try and force synchronicity between queuecommand
> and scsi_remove_host in the mid-layer. Therefore it's much easier to
> let the LLD decide what to do with the command.

Why is it in any way difficult to decide whether to issue a command in the
first place? The command is generated upon being notified by the lower layer.
There is no issue of synchronisation here. It is simply stupid to give
commands that are bound to fail, if the information is already available.

Regards
Oliver

2004-10-05 16:20:47

by James Bottomley

[permalink] [raw]
Subject: Re: Core scsi layer crashes in 2.6.8.1

On Tue, 2004-10-05 at 10:57, Mark Lord wrote:
> James Bottomley wrote:
> >
> > It would add quite a bit of complexity to the reference counted
> > aynchronous model to try and force synchronicity between queuecommand
> > and scsi_remove_host in the mid-layer. Therefore it's much easier to
> > let the LLD decide what to do with the command.
>
> Presumably the same is also true for scsi_remove_device() ?

Yes.

James


2004-10-05 16:30:11

by Mark Lord

[permalink] [raw]
Subject: Re: Core scsi layer crashes in 2.6.8.1

James Bottomley wrote:
>
> It would add quite a bit of complexity to the reference counted
> aynchronous model to try and force synchronicity between queuecommand
> and scsi_remove_host in the mid-layer. Therefore it's much easier to
> let the LLD decide what to do with the command.

Presumably the same is also true for scsi_remove_device() ?

Cheers
--
Mark Lord
(hdparm keeper & the original "Linux IDE Guy")

2004-10-05 16:30:03

by Oliver Neukum

[permalink] [raw]
Subject: Re: Core scsi layer crashes in 2.6.8.1

Am Dienstag, 5. Oktober 2004 18:07 schrieb James Bottomley:
> On Tue, 2004-10-05 at 11:01, Oliver Neukum wrote:
> > Why is it in any way difficult to decide whether to issue a command in the
> > first place? The command is generated upon being notified by the lower layer.
> > There is no issue of synchronisation here. It is simply stupid to give
> > commands that are bound to fail, if the information is already available.
>
> a) we don't know that they are ... for notified ejection they will
> succeed.

Yes, that is why I proposed that you let the lower levels tell you whether
the devices in questions are still operative or not.

> b) The scsi bus is a scanned model ... drivers must be prepared to
> accept commands for non-existent devices. How does the removal case
> differ from the never present case?

It doesn't. But that doesn't explain why you want to issue the command
in all cases, even if we coule easily tell you whether it makes sense or
not? It makes no sense to me to throw away information you already have.

Regards
Oliver

2004-10-05 16:45:29

by James Bottomley

[permalink] [raw]
Subject: Re: Core scsi layer crashes in 2.6.8.1

On Tue, 2004-10-05 at 11:26, Oliver Neukum wrote:
> > b) The scsi bus is a scanned model ... drivers must be prepared to
> > accept commands for non-existent devices. How does the removal case
> > differ from the never present case?
>
> It doesn't. But that doesn't explain why you want to issue the command
> in all cases, even if we coule easily tell you whether it makes sense or
> not? It makes no sense to me to throw away information you already have.

I'm lazy ... I don't see any point in going to a huge engineering effort
to avoid behaviour that the driver must cope correctly with anyway. If
it isn't broken, don't fix it.

James


2004-10-05 17:00:08

by Jeff Garzik

[permalink] [raw]
Subject: Re: Core scsi layer crashes in 2.6.8.1

On Tue, Oct 05, 2004 at 11:57:40AM -0400, Mark Lord wrote:
> James Bottomley wrote:
> >
> >It would add quite a bit of complexity to the reference counted
> >aynchronous model to try and force synchronicity between queuecommand
> >and scsi_remove_host in the mid-layer. Therefore it's much easier to
> >let the LLD decide what to do with the command.
>
> Presumably the same is also true for scsi_remove_device() ?

What I do in my local hotplug code is assume that the SCSI layer will be
stupid and send commands after I call scsi_remove_device(), for an
indeterminant but short period of time.

Jeff