2007-09-11 03:24:23

by Pete Zaitcev

[permalink] [raw]
Subject: Oops in make_class_name in 2.6.22.1 on Fedora

Hi, Greg (and others):

I do not seem to be able to find an answer, sorry.
Do you happen to remember if this was fixed after 2.6.22.1:

localhost kernel: EIP is at make_class_name+0x27/0x7a
localhost kernel: [<c05586f1>] class_device_del+0x97/0x119
localhost kernel: [<c055877b>] class_device_unregister+0x8/0x10
localhost kernel: [<e08735f4>] __scsi_remove_device+0x1d/0x60 [scsi_mod]
localhost kernel: [<e08711ae>] scsi_forget_host+0x2d/0x4a [scsi_mod]
localhost kernel: [<e086c49c>] scsi_remove_host+0x65/0xd5 [scsi_mod]
localhost kernel: [<e0e76775>] storage_disconnect+0xe/0x16 [usb_storage]
localhost kernel: [<c056f8b8>] usb_unbind_interface+0x44/0x85
localhost kernel: [<c0557df7>] __device_release_driver+0x6e/0x8b

Obviously a known bug but all I see is users reporting it. In my case
it's this:
https://bugzilla.redhat.com/show_bug.cgi?id=253424
I saw Alan giving it a try here:
http://lkml.org/lkml/2007/7/12/259

Yours,
-- Pete


2007-09-11 06:39:46

by Greg KH

[permalink] [raw]
Subject: Re: Oops in make_class_name in 2.6.22.1 on Fedora

On Mon, Sep 10, 2007 at 08:26:47PM -0700, Pete Zaitcev wrote:
> Hi, Greg (and others):
>
> I do not seem to be able to find an answer, sorry.
> Do you happen to remember if this was fixed after 2.6.22.1:
>
> localhost kernel: EIP is at make_class_name+0x27/0x7a
> localhost kernel: [<c05586f1>] class_device_del+0x97/0x119
> localhost kernel: [<c055877b>] class_device_unregister+0x8/0x10
> localhost kernel: [<e08735f4>] __scsi_remove_device+0x1d/0x60 [scsi_mod]
> localhost kernel: [<e08711ae>] scsi_forget_host+0x2d/0x4a [scsi_mod]
> localhost kernel: [<e086c49c>] scsi_remove_host+0x65/0xd5 [scsi_mod]
> localhost kernel: [<e0e76775>] storage_disconnect+0xe/0x16 [usb_storage]
> localhost kernel: [<c056f8b8>] usb_unbind_interface+0x44/0x85
> localhost kernel: [<c0557df7>] __device_release_driver+0x6e/0x8b
>
> Obviously a known bug but all I see is users reporting it. In my case
> it's this:
> https://bugzilla.redhat.com/show_bug.cgi?id=253424
> I saw Alan giving it a try here:
> http://lkml.org/lkml/2007/7/12/259

I think this was a scsi problem that has been fixed, but don't really
remember the exact commit.

Alan, any ideas?

It would probably be good to get this into the next -stable if we can
figure out which patch did fix it.

thanks,

greg k-h

2007-09-11 09:18:17

by Cornelia Huck

[permalink] [raw]
Subject: Re: Oops in make_class_name in 2.6.22.1 on Fedora

On Mon, 10 Sep 2007 23:12:26 -0700,
Greg KH <[email protected]> wrote:

> On Mon, Sep 10, 2007 at 08:26:47PM -0700, Pete Zaitcev wrote:
> > Hi, Greg (and others):
> >
> > I do not seem to be able to find an answer, sorry.
> > Do you happen to remember if this was fixed after 2.6.22.1:
> >
> > localhost kernel: EIP is at make_class_name+0x27/0x7a
> > localhost kernel: [<c05586f1>] class_device_del+0x97/0x119
> > localhost kernel: [<c055877b>] class_device_unregister+0x8/0x10
> > localhost kernel: [<e08735f4>] __scsi_remove_device+0x1d/0x60 [scsi_mod]
> > localhost kernel: [<e08711ae>] scsi_forget_host+0x2d/0x4a [scsi_mod]
> > localhost kernel: [<e086c49c>] scsi_remove_host+0x65/0xd5 [scsi_mod]
> > localhost kernel: [<e0e76775>] storage_disconnect+0xe/0x16 [usb_storage]
> > localhost kernel: [<c056f8b8>] usb_unbind_interface+0x44/0x85
> > localhost kernel: [<c0557df7>] __device_release_driver+0x6e/0x8b
> >
> > Obviously a known bug but all I see is users reporting it. In my case
> > it's this:
> > https://bugzilla.redhat.com/show_bug.cgi?id=253424
> > I saw Alan giving it a try here:
> > http://lkml.org/lkml/2007/7/12/259
>
> I think this was a scsi problem that has been fixed, but don't really
> remember the exact commit.

It may be something like "removing a device before it has been
completely initialized" (class_device_unregister() racing against
class_device_register()). (Perhaps you can see if that is the case by
turning on verbose driver core and kobject debugging messages.)

2007-09-11 15:11:18

by Alan Stern

[permalink] [raw]
Subject: Re: Oops in make_class_name in 2.6.22.1 on Fedora

On Mon, 10 Sep 2007, Greg KH wrote:

> On Mon, Sep 10, 2007 at 08:26:47PM -0700, Pete Zaitcev wrote:
> > Hi, Greg (and others):
> >
> > I do not seem to be able to find an answer, sorry.
> > Do you happen to remember if this was fixed after 2.6.22.1:
> >
> > localhost kernel: EIP is at make_class_name+0x27/0x7a
> > localhost kernel: [<c05586f1>] class_device_del+0x97/0x119
> > localhost kernel: [<c055877b>] class_device_unregister+0x8/0x10
> > localhost kernel: [<e08735f4>] __scsi_remove_device+0x1d/0x60 [scsi_mod]
> > localhost kernel: [<e08711ae>] scsi_forget_host+0x2d/0x4a [scsi_mod]
> > localhost kernel: [<e086c49c>] scsi_remove_host+0x65/0xd5 [scsi_mod]
> > localhost kernel: [<e0e76775>] storage_disconnect+0xe/0x16 [usb_storage]
> > localhost kernel: [<c056f8b8>] usb_unbind_interface+0x44/0x85
> > localhost kernel: [<c0557df7>] __device_release_driver+0x6e/0x8b
> >
> > Obviously a known bug but all I see is users reporting it. In my case
> > it's this:
> > https://bugzilla.redhat.com/show_bug.cgi?id=253424
> > I saw Alan giving it a try here:
> > http://lkml.org/lkml/2007/7/12/259
>
> I think this was a scsi problem that has been fixed, but don't really
> remember the exact commit.

Yes. There are some bugs in the SCSI async-scanning code, and there is
a patch to fix them:

http://git.kernel.org/?p=linux/kernel/git/jejb/scsi-misc-2.6.git;a=commit;h=a93a091df8232fad60867d41fbc3be855a0b78f2

It is scheduled for 2.6.24. For now, people can work around the bug by
disabling CONFIG_SCSI_SCAN_ASYNC.

> Alan, any ideas?
>
> It would probably be good to get this into the next -stable if we can
> figure out which patch did fix it.

I have urged both James Bottomley and Andrew Morton to apply that patch
as soon as possible. Neither of them paid any attention.

Alan Stern

2007-10-22 22:07:14

by Chuck Ebbert

[permalink] [raw]
Subject: Re: Oops in make_class_name in 2.6.22.1 on Fedora

On 09/11/2007 11:11 AM, Alan Stern wrote:
>>>
>>> I do not seem to be able to find an answer, sorry.
>>> Do you happen to remember if this was fixed after 2.6.22.1:
>>>
>>> localhost kernel: EIP is at make_class_name+0x27/0x7a
>>> localhost kernel: [<c05586f1>] class_device_del+0x97/0x119
>>> localhost kernel: [<c055877b>] class_device_unregister+0x8/0x10
>>> localhost kernel: [<e08735f4>] __scsi_remove_device+0x1d/0x60 [scsi_mod]
>>> localhost kernel: [<e08711ae>] scsi_forget_host+0x2d/0x4a [scsi_mod]
>>> localhost kernel: [<e086c49c>] scsi_remove_host+0x65/0xd5 [scsi_mod]
>>> localhost kernel: [<e0e76775>] storage_disconnect+0xe/0x16 [usb_storage]
>>> localhost kernel: [<c056f8b8>] usb_unbind_interface+0x44/0x85
>>> localhost kernel: [<c0557df7>] __device_release_driver+0x6e/0x8b
>>>
>>> Obviously a known bug but all I see is users reporting it. In my case
>>> it's this:
>>> https://bugzilla.redhat.com/show_bug.cgi?id=253424
>>> I saw Alan giving it a try here:
>>> http://lkml.org/lkml/2007/7/12/259
>> I think this was a scsi problem that has been fixed, but don't really
>> remember the exact commit.
>
> Yes. There are some bugs in the SCSI async-scanning code, and there is
> a patch to fix them:
>
> http://git.kernel.org/?p=linux/kernel/git/jejb/scsi-misc-2.6.git;a=commit;h=a93a091df8232fad60867d41fbc3be855a0b78f2
>
> It is scheduled for 2.6.24. For now, people can work around the bug by
> disabling CONFIG_SCSI_SCAN_ASYNC.
>

The bug still happens with that patch applied and CONFIG_SCSI_SCAN_ASYNC
enabled. In the marked line, class_dev->class->name is NULL and this causes
the oops in make_class_name().


static void remove_deprecated_class_device_links(struct class_device *class_dev)
{
char *class_name;

if (!class_dev->dev)
return;

===>> class_name = make_class_name(class_dev->class->name, &class_dev->kobj);
if (class_name)
sysfs_remove_link(&class_dev->dev->kobj, class_name);
kfree(class_name);
}

2007-10-30 16:50:59

by Alan Stern

[permalink] [raw]
Subject: Re: Oops in make_class_name in 2.6.22.1 on Fedora

On Mon, 22 Oct 2007, Chuck Ebbert wrote:

> On 09/11/2007 11:11 AM, Alan Stern wrote:
> >>>
> >>> I do not seem to be able to find an answer, sorry.
> >>> Do you happen to remember if this was fixed after 2.6.22.1:
> >>>
> >>> localhost kernel: EIP is at make_class_name+0x27/0x7a
> >>> localhost kernel: [<c05586f1>] class_device_del+0x97/0x119
> >>> localhost kernel: [<c055877b>] class_device_unregister+0x8/0x10
> >>> localhost kernel: [<e08735f4>] __scsi_remove_device+0x1d/0x60 [scsi_mod]
> >>> localhost kernel: [<e08711ae>] scsi_forget_host+0x2d/0x4a [scsi_mod]
> >>> localhost kernel: [<e086c49c>] scsi_remove_host+0x65/0xd5 [scsi_mod]
> >>> localhost kernel: [<e0e76775>] storage_disconnect+0xe/0x16 [usb_storage]
> >>> localhost kernel: [<c056f8b8>] usb_unbind_interface+0x44/0x85
> >>> localhost kernel: [<c0557df7>] __device_release_driver+0x6e/0x8b
> >>>
> >>> Obviously a known bug but all I see is users reporting it. In my case
> >>> it's this:
> >>> https://bugzilla.redhat.com/show_bug.cgi?id=253424
> >>> I saw Alan giving it a try here:
> >>> http://lkml.org/lkml/2007/7/12/259
> >> I think this was a scsi problem that has been fixed, but don't really
> >> remember the exact commit.
> >
> > Yes. There are some bugs in the SCSI async-scanning code, and there is
> > a patch to fix them:
> >
> > http://git.kernel.org/?p=linux/kernel/git/jejb/scsi-misc-2.6.git;a=commit;h=a93a091df8232fad60867d41fbc3be855a0b78f2
> >
> > It is scheduled for 2.6.24. For now, people can work around the bug by
> > disabling CONFIG_SCSI_SCAN_ASYNC.
> >
>
> The bug still happens with that patch applied and CONFIG_SCSI_SCAN_ASYNC
> enabled. In the marked line, class_dev->class->name is NULL and this causes
> the oops in make_class_name().

Can anybody provide a dmesg log showing the error on a system with
CONFIG_USB_DEBUG and CONFIG_USB_STORAGE_DEBUG enabled?

Does it still occur with 2.6.24-rc1?

Alan Stern