2007-11-29 20:25:24

by Mark Lord

[permalink] [raw]
Subject: Re: [linux-usb-devel] [PATCH] base/class.c: prevent ooops due to insert/remove race (v3)

(resending .. somebody trimmed the CC: list earlier)

Greg KH wrote:
>>>> Mark Lord wrote:
>>>>> ..
>>>>>
>>>>> While doing insert/remove (quickly) tests on USB,
>>>>> I managed to trigger an Oops on 2.6.23.8 on a call
>>>>> to strlen() in make_class_name().
...

> I'll hold off on adding this patch for now.
..

Why?

Bugs that result in Oops in core code (class.c) can bite
just about any subsystem that does hotplug, and should get
prompt attention. Or so one might think.

* * * *

Patch reproduced below for full CC:

While doing insert/remove (quickly) tests on USB, I managed to trigger
an Oops on 2.6.23.1 on the call to strlen() in make_class_name().

This patch prevents the oops, but still keeps the bug visible.

There is still the larger problem of the overall race
that caused this in the first place, but much of the rest
of the code in class.c appears to also do NULL checks to
avoid Oops'ing, so this continues the tradition.

Signed-off-by: Mark Lord <[email protected]> ---

Patch applies to both 2.6.24 and 2.6.23.

--- old/drivers/base/class.c 2007-11-29 10:51:43.000000000 -0500
+++ linux/drivers/base/class.c 2007-11-29 13:00:15.000000000 -0500
@@ -352,9 +352,22 @@
char *make_class_name(const char *name, struct kobject *kobj)
{
char *class_name;
+ const char *kname;
int size;

- size = strlen(name) + strlen(kobject_name(kobj)) + 2;
+ /* Rapidly inserting/removing a USB device (others?)
+ * can trigger an Oops on the strlen() call.
+ * Cause unknown yet, so prevent the Oops
+ * but don't mask the issue.
+ */
+ kname = kobject_name(kobj);
+ if (!kname || !name) {
+ printk(KERN_ERR "make_class_name: name=%p kname=%p\n",
+ name, kname);
+ BUG_ON(1);
+ return NULL;
+ }
+ size = strlen(name) + strlen(kname) + 2;

class_name = kmalloc(size, GFP_KERNEL);
if (!class_name)


2007-11-29 20:33:37

by Greg KH

[permalink] [raw]
Subject: Re: [linux-usb-devel] [PATCH] base/class.c: prevent ooops due to insert/remove race (v3)

On Thu, Nov 29, 2007 at 03:25:04PM -0500, Mark Lord wrote:
> (resending .. somebody trimmed the CC: list earlier)
>
> Greg KH wrote:
>>>>> Mark Lord wrote:
>>>>>> ..
>>>>>>
>>>>>> While doing insert/remove (quickly) tests on USB,
>>>>>> I managed to trigger an Oops on 2.6.23.8 on a call
>>>>>> to strlen() in make_class_name().
> ...
>
>> I'll hold off on adding this patch for now.
> ..
>
> Why?
>
> Bugs that result in Oops in core code (class.c) can bite
> just about any subsystem that does hotplug, and should get
> prompt attention. Or so one might think.

And they have, the 2.6.24 kernel should have the correct fix for this
problem, right? The fact that you oopsed out in this function enabled
people to find and fix the problem already. Adding a BUG_ON() does the
same exact thing :)

So again, the problem is in the higher up scsi layer, and that is where
the problem should already be fixed. Don't add code to lower layers to
paper over bugs elsewhere.

thanks,

greg k-h

2007-11-29 20:46:55

by Mark Lord

[permalink] [raw]
Subject: Re: [linux-usb-devel] [PATCH] base/class.c: prevent ooops due to insert/remove race (v3)

Greg KH wrote:
> On Thu, Nov 29, 2007 at 03:25:04PM -0500, Mark Lord wrote:
>> (resending .. somebody trimmed the CC: list earlier)
>>
>> Greg KH wrote:
>>>>>> Mark Lord wrote:
>>>>>>> ..
>>>>>>>
>>>>>>> While doing insert/remove (quickly) tests on USB,
>>>>>>> I managed to trigger an Oops on 2.6.23.8 on a call
>>>>>>> to strlen() in make_class_name().
>> ...
>>
>>> I'll hold off on adding this patch for now.
>> ..
>>
>> Why?
>>
>> Bugs that result in Oops in core code (class.c) can bite
>> just about any subsystem that does hotplug, and should get
>> prompt attention. Or so one might think.
>
> And they have, the 2.6.24 kernel should have the correct fix for this
> problem, right? The fact that you oopsed out in this function enabled
> people to find and fix the problem already. Adding a BUG_ON() does the
> same exact thing :)
..

Well, actually BUG_ON() allows the system to continue running
while still not masking the issue. But close enough.


> So again, the problem is in the higher up scsi layer, and that is where
> the problem should already be fixed.
..

Ahhh.. so you figure the Oops should also have been fixed
as part of the 2.6.24 SCSI fixes ? That's what I was missing here.

Thanks.

2007-11-29 21:54:52

by Greg KH

[permalink] [raw]
Subject: Re: [linux-usb-devel] [PATCH] base/class.c: prevent ooops due to insert/remove race (v3)

On Thu, Nov 29, 2007 at 03:46:42PM -0500, Mark Lord wrote:
> Greg KH wrote:
>> So again, the problem is in the higher up scsi layer, and that is where
>> the problem should already be fixed.
> ..
>
> Ahhh.. so you figure the Oops should also have been fixed
> as part of the 2.6.24 SCSI fixes ? That's what I was missing here.

Yes, that is what Alan originally stated...

2007-11-29 22:08:33

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-usb-devel] [PATCH] base/class.c: prevent ooops due to insert/remove race (v3)

On Thu, 29 Nov 2007, Mark Lord wrote:

> > So again, the problem is in the higher up scsi layer, and that is where
> > the problem should already be fixed.
> ..
>
> Ahhh.. so you figure the Oops should also have been fixed
> as part of the 2.6.24 SCSI fixes ? That's what I was missing here.

Yes indeed. I wish I could point you to the exact patch containing the
fix, but the git software seems to have lost track of it (it's combined
in with a large number of other patches with no obvious way to separate
it out). It's also available in the various mailing list archives, but
I don't have a pointer to it and there's no reasonable way to search
for it.

The patch in question was written by Matthew Wilcox; it added code to
the SCSI async-scanning routines to utilize the scan_mutex. IMO it
should have been applied to 2.6.23 but it wasn't.

Alan Stern

2007-11-29 22:29:33

by Mark Lord

[permalink] [raw]
Subject: Re: [linux-usb-devel] [PATCH] base/class.c: prevent ooops due to insert/remove race (v3)

Alan Stern wrote:
> On Thu, 29 Nov 2007, Mark Lord wrote:
>
>>> So again, the problem is in the higher up scsi layer, and that is where
>>> the problem should already be fixed.
>> ..
>>
>> Ahhh.. so you figure the Oops should also have been fixed
>> as part of the 2.6.24 SCSI fixes ? That's what I was missing here.
>
> Yes indeed. I wish I could point you to the exact patch containing the
> fix, but the git software seems to have lost track of it (it's combined
> in with a large number of other patches with no obvious way to separate
> it out). It's also available in the various mailing list archives, but
> I don't have a pointer to it and there's no reasonable way to search
> for it.
>
> The patch in question was written by Matthew Wilcox; it added code to
> the SCSI async-scanning routines to utilize the scan_mutex. IMO it
> should have been applied to 2.6.23 but it wasn't.
..

Ahh. Well, thanks for the *great* followup, Alan!

Cheers

2007-11-29 22:37:20

by Greg KH

[permalink] [raw]
Subject: Re: [linux-usb-devel] [PATCH] base/class.c: prevent ooops due to insert/remove race (v3)

On Thu, Nov 29, 2007 at 05:07:58PM -0500, Alan Stern wrote:
> On Thu, 29 Nov 2007, Mark Lord wrote:
>
> > > So again, the problem is in the higher up scsi layer, and that is where
> > > the problem should already be fixed.
> > ..
> >
> > Ahhh.. so you figure the Oops should also have been fixed
> > as part of the 2.6.24 SCSI fixes ? That's what I was missing here.
>
> Yes indeed. I wish I could point you to the exact patch containing the
> fix, but the git software seems to have lost track of it (it's combined
> in with a large number of other patches with no obvious way to separate
> it out). It's also available in the various mailing list archives, but
> I don't have a pointer to it and there's no reasonable way to search
> for it.
>
> The patch in question was written by Matthew Wilcox; it added code to
> the SCSI async-scanning routines to utilize the scan_mutex. IMO it
> should have been applied to 2.6.23 but it wasn't.

If someone can dig it out, and send it to [email protected], I'd be
gladd to add it to the older kernel trees...

thanks,

greg k-h

2007-11-29 22:43:42

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-usb-devel] [PATCH] base/class.c: prevent ooops due to insert/remove race (v3)

On Thu, 29 Nov 2007, Mark Lord wrote:

> Alan Stern wrote:
> > On Thu, 29 Nov 2007, Mark Lord wrote:
> >
> >>> So again, the problem is in the higher up scsi layer, and that is where
> >>> the problem should already be fixed.
> >> ..
> >>
> >> Ahhh.. so you figure the Oops should also have been fixed
> >> as part of the 2.6.24 SCSI fixes ? That's what I was missing here.
> >
> > Yes indeed. I wish I could point you to the exact patch containing the
> > fix, but the git software seems to have lost track of it (it's combined
> > in with a large number of other patches with no obvious way to separate
> > it out). It's also available in the various mailing list archives, but
> > I don't have a pointer to it and there's no reasonable way to search
> > for it.
> >
> > The patch in question was written by Matthew Wilcox; it added code to
> > the SCSI async-scanning routines to utilize the scan_mutex. IMO it
> > should have been applied to 2.6.23 but it wasn't.

Wait -- I found it:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=6b7f123f378743d739377871c0cbfbaf28c7d25a

Try applying that to 2.6.23 (it should merge with minimal problems)
and do your stress testing again. Note also the date the patch was
submitted: June 26.

> Ahh. Well, thanks for the *great* followup, Alan!

You're welcome.

Alan Stern

2007-11-29 23:10:22

by Linus Torvalds

[permalink] [raw]
Subject: Re: [linux-usb-devel] [PATCH] base/class.c: prevent ooops due to insert/remove race (v3)



On Thu, 29 Nov 2007, Alan Stern wrote:
>
> Yes indeed. I wish I could point you to the exact patch containing the
> fix, but the git software seems to have lost track of it (it's combined
> in with a large number of other patches with no obvious way to separate
> it out). It's also available in the various mailing list archives, but
> I don't have a pointer to it and there's no reasonable way to search
> for it.
>
> The patch in question was written by Matthew Wilcox; it added code to
> the SCSI async-scanning routines to utilize the scan_mutex. IMO it
> should have been applied to 2.6.23 but it wasn't.

Heh. It definitely hasn't gotten lost by "the git software". In fact, with
the kinds of hints you already gave, git makes it really _trivial_ to find
it.

Here's what you do:

git log v2.6.23.. --author=Wilcox

and then just search for "scan_mutex", in the hope that Matthew wrote a
nice commit message. And yes, he did, so in less than a blink you get:

commit 6b7f123f378743d739377871c0cbfbaf28c7d25a
Author: Matthew Wilcox <[email protected]>
Date: Tue Jun 26 15:18:51 2007 -0600

[SCSI] Fix async scanning double-add problems

Stress-testing and some thought has revealed some places where
asynchronous scanning needs some more attention to locking.

- Since async_scan is a bit, we need to hold the host_lock while
modifying it to prevent races against other CPUs modifying the word
that bit is in. This is probably a theoretical race for the moment,
but other patches may change that.
- The async_scan bit means not only that this host is being scanned
asynchronously, but that all the devices attached to this host are not
yet added to sysfs. So we must ensure that this bit is always in sync.
I've chosen to do this with the scan_mutex since it's already acquired
in most of the right places.
...

which I assume is the commit you're talking about.

Linus

2007-11-30 02:58:45

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-usb-devel] [PATCH] base/class.c: prevent ooops due to insert/remove race (v3)

On Thu, 29 Nov 2007, Linus Torvalds wrote:

> Heh. It definitely hasn't gotten lost by "the git software".

No, it sure hasn't. In fact it was staring me right in the face and I
didn't realize it.

> In fact, with
> the kinds of hints you already gave, git makes it really _trivial_ to find
> it.
>
> Here's what you do:
>
> git log v2.6.23.. --author=Wilcox
>
> and then just search for "scan_mutex", in the hope that Matthew wrote a
> nice commit message. And yes, he did, so in less than a blink you get:
>
> commit 6b7f123f378743d739377871c0cbfbaf28c7d25a
> Author: Matthew Wilcox <[email protected]>
> Date: Tue Jun 26 15:18:51 2007 -0600
>
> [SCSI] Fix async scanning double-add problems
>
> Stress-testing and some thought has revealed some places where
> asynchronous scanning needs some more attention to locking.
>
> - Since async_scan is a bit, we need to hold the host_lock while
> modifying it to prevent races against other CPUs modifying the word
> that bit is in. This is probably a theoretical race for the moment,
> but other patches may change that.
> - The async_scan bit means not only that this host is being scanned
> asynchronously, but that all the devices attached to this host are not
> yet added to sysfs. So we must ensure that this bit is always in sync.
> I've chosen to do this with the scan_mutex since it's already acquired
> in most of the right places.
> ...
>
> which I assume is the commit you're talking about.

Yep, that's the one.

Alan Stern