I see an Oops in the SCSI code, caused by the fact that sdkp is NULL
in sd_shutdown. "How can that be?", you will ask - dev->driver_data was set
in sd_probe. But in my case sd_probe never finished. An insmod usb-storage
hangs forever, or at least for more than six hours, giving ample opportunity
to observe this race between sd_probe and sd_shutdown.
(Of course sd_probe hangs in sd_revalidate disk.)
Perhaps the obvious test is a good idea.
Locking seems meaningless - sd_probe will never finish.
Andries
[Probably the init of usb_storage should start probing the devices in separate
threads, in parallel, and return immediately.]
The obvious patch (with whitespace damage)
diff -u --recursive --new-file -X /linux/dontdiff a/drivers/scsi/sd.c b/drivers/scsi/sd.c
--- a/drivers/scsi/sd.c Mon Jul 28 05:39:31 2003
+++ b/drivers/scsi/sd.c Tue Aug 12 01:24:51 2003
@@ -1351,10 +1351,14 @@
static void sd_shutdown(struct device *dev)
{
struct scsi_device *sdp = to_scsi_device(dev);
- struct scsi_disk *sdkp = dev_get_drvdata(dev);
+ struct scsi_disk *sdkp;
struct scsi_request *sreq;
int retries, res;
+ sdkp = dev_get_drvdata(dev);
+ if (!sdkp)
+ return; /* this can happen */
+
if (!sdp->online || !sdkp->WCE)
return;
At +0200 12:28 AM 8/12/2003, Andries Brouwer wrote (in part):
>The obvious patch (with whitespace damage)
>
>diff -u --recursive --new-file -X /linux/dontdiff a/drivers/scsi/sd.c
>b/drivers/scsi/sd.c
>--- a/drivers/scsi/sd.c Mon Jul 28 05:39:31 2003
>+++ b/drivers/scsi/sd.c Tue Aug 12 01:24:51 2003
>@@ -1351,10 +1351,14 @@
> static void sd_shutdown(struct device *dev)
> {
> struct scsi_device *sdp = to_scsi_device(dev);
>- struct scsi_disk *sdkp = dev_get_drvdata(dev);
>+ struct scsi_disk *sdkp;
> struct scsi_request *sreq;
> int retries, res;
>
>+ sdkp = dev_get_drvdata(dev);
>+ if (!sdkp)
>+ return; /* this can happen */
>+
> if (!sdp->online || !sdkp->WCE)
> return;
Looking only at the above code snippet, I'd suggest something more like:
~~~~~~~~~~~~~~~~~~
diff -u --recursive --new-file -X /linux/dontdiff a/drivers/scsi/sd.c
b/drivers/scsi/sd.c
--- a/drivers/scsi/sd.c Mon Jul 28 05:39:31 2003
+++ b/drivers/scsi/sd.c Tue Aug 12 01:24:51 2003
@@ -1351,10 +1351,14 @@
static void sd_shutdown(struct device *dev)
{
struct scsi_device *sdp = to_scsi_device(dev);
struct scsi_disk *sdkp = dev_get_drvdata(dev);
struct scsi_request *sreq;
int retries, res;
- if (!sdp->online || !sdkp->WCE)
+ if (!sdp || !sdp->online || !sdkp || !sdkp->WCE)
return;
~~~~~~~~~~~~~~~~~~
If sdp can *never* be NULL *and* this is a performance-critical code-path
then perhaps it makes sense to leave off the "!sdp || " which I added to
the logic, but it seems a very small cost to pay for the insurance checking
in sd_shutdown().
--
Jeff Woods <[email protected]>
On Mon, Aug 11, 2003 at 06:13:50PM -0700, Jeff Woods wrote:
> Looking only at the above code snippet, I'd suggest something more like:
> + if (!sdp ||
This is not meaningful.
A general kind of convention is that a pointer will be NULL either
by mistake, when it is uninitialized, or on purpose, when no object
is present or no action (other than the default) needs to be performed.
But that general idea is broken by container_of(), which just subtracts
a constant. So, one should check before subtracting that the pointer
is non-NULL. Checking afterwards is meaningless.
At +0200 04:49 AM 8/12/2003, Andries Brouwer wrote:
>On Mon, Aug 11, 2003 at 06:13:50PM -0700, Jeff Woods wrote:
>
>>Looking only at the above code snippet, I'd suggest something more like:
>
>>+ if (!sdp ||
>
>This is not meaningful.
How is it not meaningful? The next action in the expression is to
dereference the pointer and if it has a NULL value then I expect the
dereference to fail. [But I am a complete newbie with respect to Linux
kernel and driver code so perhaps my understanding is in error. If so,
please enlighten me.]
>A general kind of convention is that a pointer will be NULL either by
>mistake, when it is uninitialized, or on purpose, when no object is
>present or no action (other than the default) needs to be performed.
Of course. But in this case, the next action the code will attempt is to
dereference that pointer which I expect would fail if it's NULL. If you're
telling me the pointer cannot be NULL, then fine [which I tried to indicate
was a possibility in my first email in this thread], but if the pointer
might *ever* be NULL (and dereferencing a NULL pointer in this context is
as bad as it usually is) then there is no point in proceeding and returning
from the function immediately seems like a reasonable thing to do in case
of a shutdown function. (I can see possible value in additionally
reporting an error or warning somehow if feasible, but that's not germane
to whether checking the pointers for NULL is a prudent action.
>But that general idea is broken by container_of(), which just subtracts a
>constant. So, one should check before subtracting that the pointer is
>non-NULL. Checking afterwards is meaningless.
As I tried to indicate in the opening statement I have not looked at any
other code than what you included in the patch diff beginning this thread
so I don't see any reference to anything that indicates some function
called container_of() [which sounds like it might be some C++ routine...
and I was under the impression this code is C rather than C++]. The diff
includes the beginning of the function including initialization of both the
sdp and sdkp pointers. One bug the patch fixes is the implicit dereference
of sdkp in the original version of the "if" statement I suggest be
modified. My version of the patch (1) reduces the number of lines changed,
(2) results in fewer lines, (3) improves the transparency of the code, and
(4) additionally checks for a (perhaps unlikely or even improbable)
potential unanticipated runtime error, all of which makes me believe it is
an improvement.
--
Jeff Woods <[email protected]>
On Tue, Aug 12, 2003 at 12:28:44AM +0200, Andries Brouwer wrote:
> I see an Oops in the SCSI code, caused by the fact that sdkp is NULL
> in sd_shutdown. "How can that be?", you will ask - dev->driver_data was set
> in sd_probe. But in my case sd_probe never finished. An insmod usb-storage
> hangs forever, or at least for more than six hours, giving ample opportunity
> to observe this race between sd_probe and sd_shutdown.
> (Of course sd_probe hangs in sd_revalidate disk.)
Well, this same problem could show upb in any other driver. Could
you instead send a patch to Pat that the driver model never calls
the shutdown method for a driver that hasn't finished ->probe?
On Tue, Aug 12, 2003 at 07:53:53AM +0100, Christoph Hellwig wrote:
> > I see an Oops in the SCSI code, caused by the fact that sdkp is NULL
> > in sd_shutdown. "How can that be?", you will ask - dev->driver_data was set
> > in sd_probe. But in my case sd_probe never finished. An insmod usb-storage
> > hangs forever, or at least for more than six hours, giving ample opportunity
> > to observe this race between sd_probe and sd_shutdown.
> > (Of course sd_probe hangs in sd_revalidate disk.)
>
> Well, this same problem could show upb in any other driver. Could
> you instead send a patch to Pat that the driver model never calls
> the shutdown method for a driver that hasn't finished ->probe?
Yes, that is the next stage. But it takes a few hours instead of a
few seconds.
On Mon, Aug 11, 2003 at 11:35:26PM -0700, Jeff Woods wrote:
> At +0200 04:49 AM 8/12/2003, Andries Brouwer wrote:
> >On Mon, Aug 11, 2003 at 06:13:50PM -0700, Jeff Woods wrote:
> >
> >>Looking only at the above code snippet, I'd suggest something more like:
> >
> >>+ if (!sdp ||
> >
> >This is not meaningful.
>
> How is it not meaningful? The next action in the expression is to
> dereference the pointer and if it has a NULL value then I expect the
> dereference to fail. [But I am a complete newbie with respect to Linux
> kernel and driver code so perhaps my understanding is in error. If so,
> please enlighten me.]
sdp can not be NULL in this case. That is why it is not meaningful to
try to check for it.
thanks,
greg k-h
On Tue, Aug 12, 2003 at 07:53:53AM +0100, Christoph Hellwig wrote:
> On Tue, Aug 12, 2003 at 12:28:44AM +0200, Andries Brouwer wrote:
> > I see an Oops in the SCSI code, caused by the fact that sdkp is NULL
> > in sd_shutdown. "How can that be?", you will ask - dev->driver_data was set
> > in sd_probe. But in my case sd_probe never finished. An insmod usb-storage
> > hangs forever, or at least for more than six hours, giving ample opportunity
> > to observe this race between sd_probe and sd_shutdown.
> > (Of course sd_probe hangs in sd_revalidate disk.)
>
> Well, this same problem could show upb in any other driver. Could
> you instead send a patch to Pat that the driver model never calls
> the shutdown method for a driver that hasn't finished ->probe?
I think it already will not do that due to taking the bus->subsys.rwsem
before calling either probe() or remove().
thanks,
greg k-h
Greg KH [[email protected]] wrote:
> On Tue, Aug 12, 2003 at 07:53:53AM +0100, Christoph Hellwig wrote:
> > On Tue, Aug 12, 2003 at 12:28:44AM +0200, Andries Brouwer wrote:
> > > I see an Oops in the SCSI code, caused by the fact that sdkp is NULL
> > > in sd_shutdown. "How can that be?", you will ask - dev->driver_data was set
> > > in sd_probe. But in my case sd_probe never finished. An insmod usb-storage
> > > hangs forever, or at least for more than six hours, giving ample opportunity
> > > to observe this race between sd_probe and sd_shutdown.
> > > (Of course sd_probe hangs in sd_revalidate disk.)
> >
> > Well, this same problem could show upb in any other driver. Could
> > you instead send a patch to Pat that the driver model never calls
> > the shutdown method for a driver that hasn't finished ->probe?
>
> I think it already will not do that due to taking the bus->subsys.rwsem
> before calling either probe() or remove().
>
Is the shutdown being called directly? The shutdown call is protected by
a different rwsem. Depending on the call graph setting dev->driver on
return of probe may provide a solution. I have not looked at all probe
routines to understand if this would cause any bad side effects.
Andries,
Can you send the oops output?
-andmike
--
Michael Anderson
[email protected]
On Tue, Aug 12, 2003 at 05:24:50PM -0700, Mike Anderson wrote:
> > > Well, this same problem could show upb in any other driver. Could
> > > you instead send a patch to Pat that the driver model never calls
> > > the shutdown method for a driver that hasn't finished ->probe?
> >
> > I think it already will not do that due to taking the bus->subsys.rwsem
> > before calling either probe() or remove().
>
> Is the shutdown being called directly? The shutdown call is protected by
> a different rwsem. Depending on the call graph setting dev->driver on
> return of probe may provide a solution.
Yes, that is precisely what I had considered doing.
> I have not looked at all probe
> routines to understand if this would cause any bad side effects.
>
> Andries,
> Can you send the oops output?
top of stack was reported as (process reboot):
sd_shutdown + 0x22/0x110 NULL deref (namely, sdkp)
i8042_notify_sys
device_shutdown
sys_reboot
do_clock_nanosleep