2006-03-05 23:50:47

by Andrew Morton

[permalink] [raw]
Subject: Re: Fw: Re: oops in choose_configuration()


Might as well cc the list on this, so there's a record...

For several days I've been getting repeatable oopses in the -mm kernel.
They occur once per ~30 boots, during initscripts.

Two days of bisection searching shows that when this scheduler patch

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.16-rc5/2.6.16-rc5-mm2/broken-out/sched-alter_uninterruptible_sleep_interactivity.patch

is included, the oopses start occurring.

I don't think the scheduler patch is the bug - I guess it just alters
timing. The pipe code and the exec code are repeatedly implicated in the
oopses, and it's noteworthy that
sched-make_task_noninteractive_use_sleep_type.patch, which affects the pipe
wakeup handling does _not_ need to be present for this bug to manifest.

I tested current -linus with just these patches applied:

sched-fix-task-interactivity-calculation
sched-implement-smpnice
sched-smpnice-apply-review-suggestions
sched-smpnice-fix-average-load-per-run-queue-calculations
sched-cleanup_task_activated
sched-alter_uninterruptible_sleep_interactivity
sched-make_task_noninteractive_use_sleep_type
sched-dont_decrease_idle_sleep_avg
sched-include_noninteractive_sleep_in_idle_detect
sched-remove-on-runqueue-requeueing

and mainline oopses too. So either there's a bug in mainline, or there's a
bug in one of these:

sched-fix-task-interactivity-calculation
sched-implement-smpnice
sched-smpnice-apply-review-suggestions
sched-smpnice-fix-average-load-per-run-queue-calculations
sched-cleanup_task_activated
sched-alter_uninterruptible_sleep_interactivity


The usual pattern of the oopses is that something has been overwritten with
some data which has been used as an environment string by the uevent code.

Example:

SELinux: initialized (dev rootfs, type rootfs), uses genfs_contexts
SELinux: initialized (dev sysfs, type sysfs), uses genfs_contexts
SELinux: initialized (dev usbfs, type usbfs), uses genfs_contexts
Unable to handle kernel paging request at virtual address 44535955
printing eip:
*pde = 00000000
Oops: 0000 [#1]
Modules linked in: nvram ipw2200 ohci1394 ide_cd cdrom ieee1394 uhci_hcd sg ehci_hcd ieee80211 ieee80211_crypt joydev snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event i2c_i801 snd_seq i2c_core snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore piix hw_random snd_page_alloc generic ext3 jbd ide_disk ide_core
CPU: 0
EIP: 0060:[<c0242dd3>] Not tainted VLI
EFLAGS: 00010246 (2.6.16-rc5-mm2 #298)
EIP is at choose_configuration+0x58/0x13a
eax: 000000d2 ebx: f7724c00 ecx: 44535950 edx: 000001f4
esi: f77e3600 edi: 00000001 ebp: 00000000 esp: c1d7def8
ds: 007b es: 007b ss: 0068
Process khubd (pid: 114, threadinfo=c1d7d000 task=c1d7ca90)
Stack: <0>00000000 0000d000 f7724c00 f7724c00 00000000 00000001 c0242f68 00000000
f7724c00 c1d410c0 c02440b8 00000000 0000000a f7c71214 f7055000 00000010
c1d410c0 00000001 f7055000 00000001 c024450f 00000001 00000000 f7c71200
Call Trace:
[<c0242f68>] usb_new_device+0xb3/0x129
[<c02440b8>] hub_port_connect_change+0x1ad/0x343
[<c024450f>] hub_events+0x2c1/0x3f1
[<c024463f>] hub_thread+0x0/0xe6
[<c0244649>] hub_thread+0xa/0xe6
[<c0129bb3>] autoremove_wake_function+0x0/0x2d
[<c0129bb3>] autoremove_wake_function+0x0/0x2d
[<c01298a8>] kthread+0x9c/0xa1
[<c012980c>] kthread+0x0/0xa1
[<c01012e1>] kernel_thread_helper+0x5/0xb
Code: 00 58 39 fd c7 04 24 00 00 00 00 7d 53 0f b6 46 08 8b 8e 90 00 00 00 0f b7 93 18 02 00 00 01 c0 83 c1 08 39 d0 7f 2e 85 ed 75 0a <80> 79 05 02 0f 84 bf 00 00 00 80 bb 7c 01 00 00 ff 74 0a 80 79
<7>ieee1394: Host added: ID:BUS[0-00:1023] GUID[0800460300efc55d]

Here, the faulting address is ascii "UYSD". Ignore the "D" because it was
offset a bit by pointer indexing. That YSD appears in
drivers/base/core.c:dev_uevent().


Previously I've seen corruption of the local linux_binprm storage in
do_execve(). Often I've seen oopses like this:


audit(1141477276.672:3): policy loaded auid=4294967295
SELinux: initialized (dev usbfs, type usbfs), uses genfs_contexts
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000004
printing eip:
*pde = 00000000
Oops: 0002 [#1]
last sysfs file: /block/hda/removable
Modules linked in: ide_cd cdrom ipw2200 ohci1394 ieee1394 ehci_hcd uhci_hcd ieee80211 sg ieee80211_crypt joydev snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm i2c_i801 snd_timer snd i2c_core piix soundcore hw_random snd_page_alloc generic ext3 jbd ide_disk ide_core
CPU: 0
EIP: 0060:[<c0129011>] Not tainted VLI
EFLAGS: 00010046 (2.6.16-rc5-mm2 #287)
EIP is at add_wait_queue+0xd/0x1d
eax: f76dc000 ebx: 00000000 ecx: f768a018 edx: f768a00c
esi: 00000246 edi: f7f6b200 ebp: f7106764 esp: f7763ea0
ds: 007b es: 007b ss: 0068
Process udevd (pid: 1005, threadinfo=f7763000 task=f7729ab0)
Stack: <0>f76dc000 f7763f0c c015e6a7 f7f6b200 00000000 00000008 00000145 c0163c6e
00000000 00000000 00000000 00000003 00000008 00000000 00000000 00000008
f705bf6c f705bf68 f705bf64 f705bf74 f705bf70 f705bf6c 00000003 f7763f0c
Call Trace:
<c015e6a7> pipe_poll+0x24/0x94 <c0163c6e> do_select+0x2d1/0x2d5
<c016385f> __pollwait+0x0/0x94 <c0163e49> core_sys_select+0x1c4/0x2f8
<c0164007> sys_select+0x8a/0x18f <c0102a9b> sysenter_past_esp+0x54/0x75
Code: 11 c0 e8 af e1 ff ff eb c2 89 d8 e8 28 dd fe ff eb af 8d 06 e8 55 72 19 00 eb a6 90 90 90 56 53 83 22 fe 9c 5e fa 8b 18 8d 4a 0c <89> 4b 04 89 5a 0c 89 41 04 89 08 56 9d 5b 5e c3 56 53 83 0a 01
<1>BUG: unable to handle kernel paging request at virtual address 464c457f
printing eip:
c0117418
*pde = 3e427067
Oops: 0000 [#2]
last sysfs file: /class/firmware/0000:06:0b.0/loading
Modules linked in: ide_cd cdrom ipw2200 ohci1394 ieee1394 ehci_hcd uhci_hcd ieee80211 sg ieee80211_crypt joydev snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm i2c_i801 snd_timer snd i2c_core piix soundcore hw_random snd_page_alloc generic ext3 jbd ide_disk ide_core
CPU: 0
EIP: 0060:[<c0117418>] Not tainted VLI
EFLAGS: 00010092 (2.6.16-rc5-mm2 #287)
EIP is at __wake_up_common+0x12/0x57
eax: 464c457f ebx: 00000202 ecx: 00000001 edx: 00000001
esi: 00000001 edi: f76dc000 ebp: f7eb1f1c esp: f7eb1f04
ds: 007b es: 007b ss: 0068
Process cat (pid: 1013, threadinfo=f7eb1000 task=f7ebe570)
Stack: <0>00000006 c0339118 00000001 00000202 c17e9f40 00000006 f7eb1f30 c011746e
00000000 00000000 f76dc000 00000001 c015e4e8 00000000 f76dc000 f76dc010
00000000 00000001 00000006 f76dc000 f7106764 f7eb1f78 f7f6b740 00000006
Call Trace:
<c011746e> __wake_up+0x11/0x1a <c015e4e8> pipe_writev+0x273/0x38a
<c015e61b> pipe_write+0x1c/0x20 <c0154386> vfs_write+0xb1/0x151
<c01544c6> sys_write+0x3c/0x64 <c0102a9b> sysenter_past_esp+0x54/0x75
Code: c0 89 f0 e8 df f6 ff ff e9 f4 fd ff ff 55 89 e5 8b 40 04 5d e9 95 f8 ff ff 55 89 e5 57 89 c7 56 89 ce 53 83 ec 0c 89 55 f0 8b 00 <8b> 10 39 f8 89 55 ec 74 34 8d 58 f4 8b 55 f0 8b 40 f4 ff 75 0c
<1>BUG: unable to handle kernel paging request at virtual address 6f732e6f
printing eip:
c0163033
*pde = 00000000
Oops: 0000 [#3]


Note that in oops #2, the faulting address is ascii "\7fELF". We think
that kernel_read() has read an elf file on top of *(inode->i_pipe) in
pipe_writev. Note that i_pipe and linux_binprm both use the size-512 slab,
and the offsets into that memory is 0x00 in both cases.

But the management of do_execve():bprm is super-simple.


Here's another pipe-got-overwritten-by-ELF oops:

audit(1141524105.698:3): policy loaded auid=4294967295
SELinux: initialized (dev usbfs, type usbfs), uses genfs_contexts
BUG: unable to handle kernel paging request at virtual address 464c457f
printing eip:
*pde = 3eeb0067
Oops: 0000 [#1]
last sysfs file: /devices/pci0000:00/0000:00:1d.7/usb5/5-0:1.0/modalias
Modules linked in: ipw2200 ohci1394 ieee1394 ehci_hcd uhci_hcd sg ieee80211 ieee80211_crypt joydev snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm i2c_i801 snd_timer piix snd i2c_core soundcore hw_random snd_page_alloc generic ext3 jbd ide_disk ide_core
CPU: 0
EIP: 0060:[<c011741a>] Not tainted VLI
EFLAGS: 00010092 (2.6.16-rc5-mm2 #293)
EIP is at __wake_up_common+0x12/0x57
eax: 464c457f ebx: 00000202 ecx: 00000001 edx: 00000001
esi: 00000001 edi: f76fde00 ebp: f7f7af1c esp: f7f7af04
ds: 007b es: 007b ss: 0068
Process sh (pid: 994, threadinfo=f7f7a000 task=f776a030)
Stack: <0>0000000f c0338128 00000001 00000202 c17d5b60 0000000f f7f7af30 c0117470
00000000 00000000 f76fde00 00000001 c015e00c 00000000 f76fde00 f76fde10
00000000 00000001 0000000f f76fde00 f706b320 f7f7af78 f7702900 0000000f
Call Trace:
<c0117470> __wake_up+0x11/0x1a <c015e00c> pipe_writev+0x273/0x38a
<c015e13f> pipe_write+0x1c/0x20 <c0153eae> vfs_write+0xb1/0x151
<c0153fee> sys_write+0x3c/0x64 <c0102a9b> sysenter_past_esp+0x54/0x75
Code: c0 89 f0 e8 d9 f6 ff ff e9 f4 fd ff ff 55 89 e5 8b 40 04 5d e9 95 f8 ff ff 55 89 e5 57 89 c7 56 89 ce 53 83 ec 0c 89 55 f0 8b 00 <8b> 10 39 f8 89 55 ec 74 34 8d 58 f4 8b 55 f0 8b 40 f4 ff 75 0c
<1>BUG: unable to handle kernel paging request at virtual address 0064777f


It takes maybe an hour of rebooting to reproduce this. Kernel is uniproc,
voluntary preempt
(http://www.zip.com.au/~akpm/linux/patches/stuff/config-sony).

The next stage in brute-forcing is to start bisecting Linus's tree,
applying the sched patches at each stage. At a few hours per step I don't
relish that.

Ingo, could you please take a real close look at the above six sched
patches, see if you can spot anything in there which might cause this?

I have a feeling that this bug has been there for a while - I have seen a
couple of funny oopses booting this machine over the past couple of months.
Those sched patches have been in -mm for a couple of months too.

Enabling CONFIG_SLAB_DEBUG or CONFIG_DEBUG_PAGEALLOC seems to make the bug
go away (naturally). I'll retest that now.


2006-03-06 03:28:06

by Linus Torvalds

[permalink] [raw]
Subject: Re: Fw: Re: oops in choose_configuration()


Andrew,
I'm worried about the fact that kmalloc() doesn't get redlining.

The lifetime rules for the pipe_info thing and the bprm seems totally
obvious, and we'd get slab errors if somebody was doing something strange
there anyway.

So I'd be more inclined to blame a buffer overflow on a kmalloc, and the
obvious target is the "add_uevent_var()" thing, since all/many of the
corruptions seem to come from uevent environment variable strings.

Because kmalloc() doesn't do redlining, we'd never see an overflow as an
error, and it would just overwrite the next block. Now, they are in
different slab blocks (the uevent strign allocation is a 2048-byte
allocation), but maybe it flows over into the next page entirely..

Now, it all uses "vnsprintf()" which _should_ be safe, but that in turn
uses pointer comparisons, so maybe gcc screws that up. Who knows. Gcc has
been known to use signed comparisons on pointers and other brokenness. And
we could just have screwed something up (not updating "len" when we update
the buffer start etc etc)

Anyway, this trivial patch will check for buffer length consistency and
overflow by just putting a magic value at the end of the buffer, and
checking it. Maybe.

I don't see anything wrong there, and booting with this patch doesn't
trigger anything for me, but it's simple enough to be worth checking out.

Linus
---
diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
index 086a0c6..366214a 100644
--- a/lib/kobject_uevent.c
+++ b/lib/kobject_uevent.c
@@ -51,6 +51,9 @@ static char *action_to_string(enum kobje
}
}

+#define UEV_MAGIC (0xc0edbabeu)
+#define uev_magic(buffer, len) ((unsigned int *) ((len) + (void *)(buffer)))[-1]
+
/**
* kobject_uevent - notify userspace by ending an uevent
*
@@ -107,6 +110,8 @@ void kobject_uevent(struct kobject *kobj
if (!buffer)
goto exit;

+ uev_magic(buffer, BUFFER_SIZE) = UEV_MAGIC;
+
/* complete object path */
devpath = kobject_get_path(kobj, GFP_KERNEL);
if (!devpath)
@@ -223,6 +228,10 @@ int add_uevent_var(char **envp, int num_
{
va_list args;

+ BUG_ON(buffer_size < 4);
+ BUG_ON(buffer_size > 2048);
+ BUG_ON(uev_magic(buffer, buffer_size) != UEV_MAGIC);
+
/*
* We check against num_envp - 1 to make sure there is at
* least one slot left after we return, since kobject_uevent()

2006-03-06 05:00:43

by Linus Torvalds

[permalink] [raw]
Subject: Re: Fw: Re: oops in choose_configuration()



On Sun, 5 Mar 2006, Andrew Morton wrote:
>
> For several days I've been getting repeatable oopses in the -mm kernel.
> They occur once per ~30 boots, during initscripts.

Actually, having thought about this some more, I wonder if the bug isn't a
hell of a lot simpler than we've given it credit for.

I think you're running with CONFIG_PREEMPT_VOLUNTARY, right?

And looking more closely, that thing is BROKEN. DaveJ - do Fedora kernels
also enable that thing?

Ingo: as far as I can see, CONFIG_PREEMPT_VOLUNTARY is totally and utterly
broken during bootup. It does:

# define might_resched() cond_resched()

and then we have

# define might_sleep() do { might_resched(); } while (0)

and but the fact is, we _know_ that "might_sleep()" is broken during early
bootup. We know this, because when we ahev __might_sleep() enabled to
warn about cases where we must not sleep, we've had those tests disabled
during early boot for a long time, in order to avoid irritating and nasty
known "sleeping function called from invalid context" messages:

...
if ((in_atomic() || irqs_disabled()) &&
system_state == SYSTEM_RUNNING && !oops_in_progress) {
if (time_before(jiffies, prev_jiffy + HZ) && prev_jiffy)
...

Note in particular the "system_state == SYSTEM_RUNNING". It's there for a
reason. Namely that we know that we do things that aren't valid during
early bootup, and that we call functions that might sleep while we have
interrupts disabled, for example.

HOWEVER, the "cond_resched()" does not take that into account at all, and
will happily conditionally reschedule things at early bootup before we
have set system_state to SYSTEM_RUNNING.

In other words, unless I've totally lost it, I think that
CONFIG_PREEMPT_VOLUNTARY currently makes us re-schedule at points in the
early boot that we _know_ are unsafe. We happen to not hit it very often,
because (a) some of the time it doesn't matter and (b) when it matters, we
seldom have "need_resched()" returning true, but I would not be at all
surprised if Andrew's problems are because the scheduler heuristics make
it happen when it shouldn't.

And the end result? I don't know. But we've traditionally run _all_ of the
early boot ignoring the "might_sleep()" warnings, up until the point where
we unlock the kernel lock, long after things like kmem_cache_init().

So I would not be surprised, for example, if we had kmem_cache_init()
doing bad things because it got interrupts enabled at a point where it
shouldn't, because it went through the scheduler.

I dunno. I can't actually see what would corrupt anything, but the point
is that we definitely do scheduling in places that have gotten absolutely
_zero_ coverage, because we turned off the checks on purpose during early
boot because the checks gave false positives.

And CONFIG_PREEMPT_VOLUNTARY turns those false positives into potential
rescheduling events.

Maybe I'm crazy. But it looks really really broken to me.

Andrew, if I'm right, then this ugly patch should make a difference.

Is there something else I've missed?

Linus

----
diff --git a/kernel/sched.c b/kernel/sched.c
index 12d291b..3454bb8 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4028,6 +4028,8 @@ static inline void __cond_resched(void)
*/
if (unlikely(preempt_count()))
return;
+ if (unlikely(system_state != SYSTEM_RUNNING))
+ return;
do {
add_preempt_count(PREEMPT_ACTIVE);
schedule();

2006-03-06 07:47:16

by Mike Galbraith

[permalink] [raw]
Subject: Re: Fw: Re: oops in choose_configuration()

On Sun, 2006-03-05 at 21:00 -0800, Linus Torvalds wrote:

>
> Is there something else I've missed?

Maybe. Does this add anything to the picture? During boot,
recalc_task_prio() is called with now < p->timestamp. This causes quite
a stir. If you WARN_ON(now < p->timestamp) or printk, you'll have a
dead box due to hundreds of gripes as things churn. Adding...

if (unlikely(now < p->timestamp))
__sleep_time = 0ULL;

...turns it into exactly one gripe.

-Mike

2006-03-06 08:50:25

by Andrew Morton

[permalink] [raw]
Subject: Re: Fw: Re: oops in choose_configuration()

Linus Torvalds <[email protected]> wrote:
>
>
> Andrew,
> I'm worried about the fact that kmalloc() doesn't get redlining.
>
> The lifetime rules for the pipe_info thing and the bprm seems totally
> obvious, and we'd get slab errors if somebody was doing something strange
> there anyway.
>
> So I'd be more inclined to blame a buffer overflow on a kmalloc, and the
> obvious target is the "add_uevent_var()" thing, since all/many of the
> corruptions seem to come from uevent environment variable strings.
>
> Because kmalloc() doesn't do redlining, we'd never see an overflow as an
> error, and it would just overwrite the next block. Now, they are in
> different slab blocks (the uevent strign allocation is a 2048-byte
> allocation), but maybe it flows over into the next page entirely..
>
> Now, it all uses "vnsprintf()" which _should_ be safe, but that in turn
> uses pointer comparisons, so maybe gcc screws that up. Who knows. Gcc has
> been known to use signed comparisons on pointers and other brokenness. And
> we could just have screwed something up (not updating "len" when we update
> the buffer start etc etc)

Maybe. Something which is as deterministic as that would trip
CONFIG_DEBUG_PAGEALLOC though.

> Anyway, this trivial patch will check for buffer length consistency and
> overflow by just putting a magic value at the end of the buffer, and
> checking it. Maybe.
>
> I don't see anything wrong there, and booting with this patch doesn't
> trigger anything for me, but it's simple enough to be worth checking out.

Yup. The wind changed direction and I lost the ability to reproduce it for
a few hours. Going back to exactly the rc5-mm2 lineup did the trick. Your
debug patch didn't trigger.



SELinux: initialized (dev rootfs, type rootfs), uses genfs_contexts
SELinux: initialized (dev sysfs, type sysfs), uses genfs_contexts
audit(1141604160.744:3): policy loaded auid=4294967295
SELinux: initialized (dev usbfs, type usbfs), uses genfs_contexts
BUG: unable to handle kernel paging request at virtual address 44535955
printing eip:
*pde = 00000000
Oops: 0000 [#1]
last sysfs file: /class/firmware/0000:06:0b.0/loading
Modules linked in: ide_cd cdrom ipw2200 ohci1394 ieee1394 ehci_hcd ieee80211 uhci_hcd sg joydev ieee80211_crypt snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm hw_random i2c_i801 snd_timer piix snd i2c_core soundcore snd_page_alloc generic ext3 jbd ide_disk ide_core
CPU: 0
EIP: 0060:[<c02437ed>] Not tainted VLI
EFLAGS: 00010246 (2.6.16-rc5-mm2 #302)
EIP is at choose_configuration+0x58/0x13a
eax: 000000d2 ebx: f70aac00 ecx: 44535950 edx: 000001f4
esi: f76eca00 edi: 00000001 ebp: 00000000 esp: f7f3def8
ds: 007b es: 007b ss: 0068
Process khubd (pid: 114, threadinfo=f7f3d000 task=f7f3cab0)
Stack: <0>00000000 0000d000 f70aac00 f70aac00 00000000 00000001 c0243a5a 00000000
f70aac00 c1caa3c0 c0244ba1 00000000 0000000a f76ec814 f7029000 00000011
c1caa3c0 00000001 f7029000 00000001 c0244ff8 00000003 00000000 f76ec800
Call Trace:
<c0243a5a> usb_new_device+0x15a/0x1d0 <c0244ba1> hub_port_connect_change+0x1ad/0x343
<c0244ff8> hub_events+0x2c1/0x3f1 <c0245128> hub_thread+0x0/0xe6
<c0245132> hub_thread+0xa/0xe6 <c0128dfb> autoremove_wake_function+0x0/0x2d
<c0128dfb> autoremove_wake_function+0x0/0x2d <c0128b0c> kthread+0x9c/0xa1
<c0128a70> kthread+0x0/0xa1 <c01012ed> kernel_thread_helper+0x5/0xb
Code: 00 58 39 fd c7 04 24 00 00 00 00 7d 53 0f b6 46 08 8b 8e 90 00 00 00 0f b7 93 18 02 00 00 01 c0 83 c1 08 39 d0 7f 2e 85 ed 75 0a <80> 79 05 02 0f 84 bf 00 00 00 80 bb 7c 01 00 00 ff 74 0a 80 79
<6>Non-volatile memory driver v1.2
BUG: unable to handle kernel paging request at virtual address 6963702f
printing eip:
c01bc9b0
*pde = 00000000
Oops: 0000 [#2]
last sysfs file: /class/firmware/0000:06:0b.0/loading
Modules linked in: nvram ide_cd cdrom ipw2200 ohci1394 ieee1394 ehci_hcd ieee80211 uhci_hcd sg joydev ieee80211_crypt snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm hw_random i2c_i801 snd_timer piix snd i2c_core soundcore snd_page_alloc generic ext3 jbd ide_disk ide_core
CPU: 0
EIP: 0060:[<c01bc9b0>] Not tainted VLI
EFLAGS: 00010246 (2.6.16-rc5-mm2 #302)
EIP is at kobject_uevent+0x28d/0x37c
eax: 00000000 ebx: f76f76c0 ecx: ffffffff edx: f7176efc
esi: 0000002a edi: 6963702f ebp: f77542c0 esp: f7176f04
ds: 007b es: 007b ss: 0068
Process knodemgrd_0 (pid: 1025, threadinfo=f7176000 task=f7788ab0)
Stack: <0>f70f1000 c034da80 00000002 f70f1051 f8cf6e4d f7097640 c02e93d8 f76e9568
00000246 00000000 f7758a94 f76e9568 f76e9568 f76e9560 f76e9568 f8d28c40
c021e752 00000000 f70473a0 00000000 f76e9400 f76e9440 f76e9560 f763c000
Call Trace:
<c021e752> class_device_add+0x14e/0x201 <f8cf1efe> nodemgr_create_node+0x104/0x17b [ieee1394]
<f8cf2b1a> nodemgr_node_scan_one+0x160/0x16e [ieee1394] <f8cf2b76> nodemgr_node_scan+0x4e/0x50 [ieee1394]
<f8cf3277> nodemgr_host_thread+0xdc/0x14c [ieee1394] <f8cf319b> nodemgr_host_thread+0x0/0x14c [ieee1394]
<c01012ed> kernel_thread_helper+0x5/0xb
Code: 24 1c 68 01 1d 2f c0 57 e8 6e 25 00 00 c7 44 24 18 02 00 00 00 83 c4 10 83 7d 08 00 74 4f 8b 44 24 08 83 c9 ff 8b 7c 85 00 31 c0 <f2> ae f7 d1 49 83 7b 64 00 8d 71 01 8b bb 9c 00 00 00 75 61 01
<6>ACPI: AC Adapter [ACAD] (on-line)
ACPI: Battery Slot [BAT1] (battery present)
ACPI: Lid Switch [LID0]


As is usual with this particular manifestation, we died here:

if (i == 0 && desc->bInterfaceClass == USB_CLASS_COMM


(gdb) p sizeof(struct usb_interface_descriptor)
$3 = 9
(gdb) p sizeof(struct usb_host_config)
$4 = 280
(gdb) p sizeof(struct usb_interface_cache)
$5 = 8
(gdb) p sizeof(struct usb_host_interface)
$6 = 28
(gdb) offsetof usb_interface_descriptor bInterfaceClass
5 0x5

And that's quite different. `struct usb_interface_cache' is only 8 bytes,
and the scribble is (as far as we know), not at offset zero. Unless there
are a large number of `usb_host_interface's in `struct
usb_interface_cache's variable-sized array, we're not using the size-512
slab here.

<looks at choose_configuration()>

struct usb_interface_descriptor *desc =
&c->intf_cache[0]->altsetting->desc;

Seems funny.

struct usb_interface_cache {
unsigned num_altsetting; /* number of alternate settings */
struct kref ref; /* reference counter */

/* variable-length array of alternate settings for this interface,
* stored in no particular order */
struct usb_host_interface altsetting[0];
};

A clearer way of coding that assignment would have been

struct usb_interface_descriptor *desc =
&c->intf_cache[0]->altsetting[0].desc;

a) How come we're only considering the zeroth slot in that array in here?

b) How do we know that there's actually anything _there_? The length of
that variable-sized array doesn't seem to have been stored anywhere
obvious by usb_parse_configuration() and choose_configuration() doesn't
check. What happens if the length was zero?

drivers/usb/core/config.c hasn't changed since October. If it was this
easy, we'd have hit it before now.

Greg, who knows this code? Can we triple-check it please?



Oh dear, there it goes.


SELinux: initialized (dev sysfs, type sysfs), uses genfs_contexts
audit(1141605301.698:3): policy loaded auid=4294967295
SELinux: initialized (dev usbfs, type usbfs), uses genfs_contexts
------------[ cut here ]------------
kernel BUG at kernel/timer.c:213!
invalid opcode: 0000 [#1]
last sysfs file: /class/firmware/0000:06:0b.0/loading
Modules linked in: ide_cd cdrom ipw2200 ohci1394 ieee1394 ieee80211 ehci_hcd uhci_hcd ieee80211_crypt sg joydev snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm i2c_i801 hw_random piix snd_timer i2c_core snd soundcore snd_page_alloc generic ext3 jbd ide_disk ide_core
CPU: 0
EIP: 0060:[<c0120d68>] Not tainted VLI
EFLAGS: 00010246 (2.6.16-rc5-mm2 #302)
EIP is at __mod_timer+0x5e/0x68
eax: f70a0898 ebx: f70a0898 ecx: 00000000 edx: fffef455
esi: fffef455 edi: 00000000 ebp: f7fadc48 esp: f77e4e10
ds: 007b es: 007b ss: 0068
Process modprobe (pid: 613, threadinfo=f77e4000 task=f77d8570)
Stack: <0>00000000 f70a0840 f8d616e4 00000000 c022125f f706e240 f70faa0c f8d54328
f8d616e4 f8d54328 c0221323 00000001 f8d426d8 f70faa0c 00000000 00000003
f8d42a75 f8d54313 f70faa0c f70faa10 00000000 f7fadcb0 f8d4df95 f70faa0c
Call Trace:
<c022125f> _request_firmware+0x8f/0x14c <c0221323> request_firmware+0x7/0x9
<f8d426d8> ipw_get_fw+0x16/0xa0 [ipw2200] <f8d42a75> ipw_load+0x232/0x799 [ipw2200]
<f8d4df95> ipw_up+0x48/0x358 [ipw2200] <f8d4e550> ipw_net_init+0x1c/0x3a [ipw2200]
<c0271593> register_netdevice+0x4c/0x2da <c0187433> create_files+0x23/0x44
<c0271869> register_netdev+0x48/0x68 <f8d4e8b7> ipw_pci_probe+0x349/0x401 [ipw2200]
<c0117484> __wake_up_locked+0x11/0x13 <c021db21> __driver_attach+0x0/0x6b
<c01c693c> pci_call_probe+0xa/0xc <c01c6972> __pci_device_probe+0x34/0x41
<c01c699d> pci_device_probe+0x1e/0x32 <c021da87> driver_probe_device+0x49/0x8c
<c021db77> __driver_attach+0x56/0x6b <c021d2e2> bus_for_each_dev+0x44/0x59
<c021db9d> driver_attach+0x11/0x13 <c021db21> __driver_attach+0x0/0x6b
<c021d68e> bus_add_driver+0x57/0x81 <c021df2c> driver_register+0x70/0x75
<c012f3bb> sys_init_module+0x66/0x15f <c01c6b28> __pci_register_driver+0x37/0x49
<c011a909> printk+0xe/0x11 <f8c93015> ipw_init+0x15/0x6a [ipw2200]
<f8c93026> ipw_init+0x26/0x6a [ipw2200] <c012f49c> sys_init_module+0x147/0x15f
<c0102a9b> sysenter_past_esp+0x54/0x75
Code: 20 00 a1 84 08 49 c0 39 c1 74 07 39 19 74 18 89 43 14 89 73 08 89 da e8 fb fe ff ff ff 34 24 9d 5a 89 f8 5b 5e 5f c3 89 c8 eb e7 <0f> 0b d5 00 c4 44 2e c0 eb a8 53 31 d2 8b 0d 84 08 49 c0 83 38
BUG: modprobe/613, lock held at task exit time!
[c0357220] {rtnl_mutex}
.. held by: modprobe: 613 [f77d8570, 113]
... acquired at: register_netdev+0xb/0x68

2006-03-06 09:15:42

by Ingo Molnar

[permalink] [raw]
Subject: Re: Fw: Re: oops in choose_configuration()


* Linus Torvalds <[email protected]> wrote:

> @@ -4028,6 +4028,8 @@ static inline void __cond_resched(void)
> */
> if (unlikely(preempt_count()))
> return;
> + if (unlikely(system_state != SYSTEM_RUNNING))
> + return;

we used to have this, but it was frowned upon during the initial lkml
review of -VP so i fixed all the known 'early bootup scheduling
assumptions' and removed this condition .. but that was a losing battle
i suspect. In any case, i fully agree with your fix. We also need this
fix for v2.6.16.

Acked-by: Ingo Molnar <[email protected]>

Ingo

2006-03-06 10:33:37

by Andrew Morton

[permalink] [raw]
Subject: Re: Fw: Re: oops in choose_configuration()

Linus Torvalds <[email protected]> wrote:
>
>
>
> On Sun, 5 Mar 2006, Andrew Morton wrote:
> >
> > For several days I've been getting repeatable oopses in the -mm kernel.
> > They occur once per ~30 boots, during initscripts.
>
> Actually, having thought about this some more, I wonder if the bug isn't a
> hell of a lot simpler than we've given it credit for.
>
> I think you're running with CONFIG_PREEMPT_VOLUNTARY, right?

Yes.

> ...
> And CONFIG_PREEMPT_VOLUNTARY turns those false positives into potential
> rescheduling events.

Yes. A similar problem exists with CONFIG_PREEMPT_BKL - we're doing down()
inside lock_kenrel() in situations where we shouldn't be sleeping yet.
That's one of the reasons why might_sleep() is suppressed early in boot.
We assume that the down() won't be contented early in boot. We were
looking at fixing all that up a month or so back, but the patches started
getting pretty ugly.

Still. If either lock_kernel() or cond_resched() _do_ sleep, we'll
hopefully get scheduling-while-atomic warnings.

> index 12d291b..3454bb8 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -4028,6 +4028,8 @@ static inline void __cond_resched(void)
> */
> if (unlikely(preempt_count()))
> return;
> + if (unlikely(system_state != SYSTEM_RUNNING))
> + return;
> do {
> add_preempt_count(PREEMPT_ACTIVE);
> schedule();

It died after 35 reboots.

SELinux: initialized (dev rootfs, type rootfs), uses genfs_contexts
SELinux: initialized (dev sysfs, type sysfs), uses genfs_contexts
audit(1141611448.702:3): policy loaded auid=4294967295
SELinux: initialized (dev usbfs, type usbfs), uses genfs_contexts
BUG: unable to handle kernel paging request at virtual address 464c4583
printing eip:
*pde = 3f1ea067
Oops: 0002 [#1]
last sysfs file: /devices/pci0000:00/0000:00:1d.2/usb3/idVendor
Modules linked in: nvram snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq ide_cd cdrom snd_seq_device snd_pcm_oss ipw2200 snd_mixer_oss ieee80211 ieee80211_crypt ohci1394 snd_pcm ieee1394 uhci_hcd sg ehci_hcd joydev snd_timer snd soundcore i2c_i801 piix i2c_core hw_random snd_page_alloc generic ext3 jbd ide_disk ide_core
CPU: 0
EIP: 0060:[<c0128d66>] Not tainted VLI
EFLAGS: 00010046 (2.6.16-rc5-mm2 #303)
EIP is at prepare_to_wait+0x19/0x3b
eax: 464c457f ebx: f776cf08 ecx: 00000246 edx: f776cf14
esi: f76fde00 edi: 00000041 ebp: f71378d0 esp: f776ced0
ds: 007b es: 007b ss: 0068
Process loadkeys (pid: 1119, threadinfo=f776c000 task=f7eaa030)
Stack: <0>f7137940 f776cefc f776cf1c c015db02 f7137940 f776cee8 00000000 f7eaa030
c0128e0b f776cf14 f776cf14 00000246 c02c0f02 c015dc4b 00000000 f7eaa030
c0128e0b f776cf14 f776cf14 c01a0a63 f776cf34 00000000 f76fde00 f7cbb240
Call Trace:
<c015db02> pipe_wait+0x66/0x90 <c0128e0b> autoremove_wake_function+0x0/0x2d
<c02c0f02> __mutex_lock_slowpath+0x1e1/0x339 <c015dc4b> pipe_readv+0x58/0x25a
<c0128e0b> autoremove_wake_function+0x0/0x2d <c01a0a63> file_has_perm+0x87/0x8a
<c015dd36> pipe_readv+0x143/0x25a <c015de69> pipe_read+0x1c/0x20
<c0153d36> vfs_read+0xae/0x14e <c015405e> sys_read+0x3c/0x64
<c0102a9b> sysenter_past_esp+0x54/0x75
Code: 43 04 00 02 20 00 c7 42 0c 00 01 10 00 56 9d 5b 5e c3 57 89 cf 56 89 c6 53 89 d3 83 22 fe 9c 59 fa 8d 52 0c 39 53 0c 75 0d 8b 00 <89> 50 04 89 43 0c 89 72 04 89 16 83 7b 04 00 74 0b b8 00 f0 ff


That's "\7fELF" again, plus 4 bytes.

Note that this time it's the sleeper that went splat - usually it's the
waker.


I don't think that a bisection search of mainline (adding the sched patches
each time) will be fruitful really. Just breathing on this thing makes the
crashes go away.

2006-03-06 11:04:35

by Jesper Juhl

[permalink] [raw]
Subject: Re: Fw: Re: oops in choose_configuration()

On 3/6/06, Andrew Morton <[email protected]> wrote:
>
> Might as well cc the list on this, so there's a record...
>
> For several days I've been getting repeatable oopses in the -mm kernel.
> They occur once per ~30 boots, during initscripts.
>
[snip]

Hi Andrew,

I send a mail to the list yesterday (with you and a few other people
on cc) about Slab corruption in 2.6.16-5c5-mm2.
You have a problem with memory corruption and I'm seeing slab
corruption messages in dmesg...

Just wanted to point it out in this thread in case they are related.

--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

2006-03-06 11:17:34

by Andrew Morton

[permalink] [raw]
Subject: Re: Fw: Re: oops in choose_configuration()

"Jesper Juhl" <[email protected]> wrote:
>
> On 3/6/06, Andrew Morton <[email protected]> wrote:
> >
> > Might as well cc the list on this, so there's a record...
> >
> > For several days I've been getting repeatable oopses in the -mm kernel.
> > They occur once per ~30 boots, during initscripts.
> >
> [snip]
>
> Hi Andrew,
>
> I send a mail to the list yesterday (with you and a few other people
> on cc) about Slab corruption in 2.6.16-5c5-mm2.
> You have a problem with memory corruption and I'm seeing slab
> corruption messages in dmesg...
>
> Just wanted to point it out in this thread in case they are related.
>

Yes, there have been several memory-scribblish reports. Until this one's
found I'm not sure there's much we can do with them.

I'll do an overnight run with CONFIG_DEBUG_SLAB, see if that catches
anything. Although AFAICT it just fixes the bug.

Beyond that, I'm rather stuck. I'd probably have to drop a pile of sched
patches, see if that improves things for people. But if we cannot spot a
bug in those patches then it's probably in mainline and it will bite
people in generally random and rare ways.

2006-03-06 15:59:28

by Dave Jones

[permalink] [raw]
Subject: Re: Fw: Re: oops in choose_configuration()

On Mon, Mar 06, 2006 at 03:15:50AM -0800, Andrew Morton wrote:

> Yes, there have been several memory-scribblish reports. Until this one's
> found I'm not sure there's much we can do with them.
>
> I'll do an overnight run with CONFIG_DEBUG_SLAB, see if that catches
> anything. Although AFAICT it just fixes the bug.
>
> Beyond that, I'm rather stuck. I'd probably have to drop a pile of sched
> patches, see if that improves things for people. But if we cannot spot a
> bug in those patches then it's probably in mainline and it will bite
> people in generally random and rare ways.

Not sure if it was you or Linus I private-mailed about this, but
to reiterate - circa rc4-git2, the Fedora rawhide kernel stopped booting
for a bunch of people, all with 686-SMP boxes. I saw it myself too,
it hung just after the 'write protecting kernel rodata'.

It totally puzzled me for a day. The following day, I rebased to rc4-git4,
and the problem "went away". Nothing in the changesets merged could
explain the hangs I saw.

A few days ago, the exact same bug resurfaced, and like before, the
following day, it went into hiding again.

I'm terrified the release kernel I build for FC5 is going to be a
stinker because of this, but every time I tried to instrument this,
the bugger disappeared.

I'm not sure this is even remotely the same bug you're chasing,
but the unpredictability of really nasty bugs like these give
me the creeps.

Dave

--
http://www.codemonkey.org.uk

2006-03-07 05:50:50

by Mike Galbraith

[permalink] [raw]
Subject: Re: Fw: Re: oops in choose_configuration()

On Mon, 2006-03-06 at 08:47 +0100, Mike Galbraith wrote:
> On Sun, 2006-03-05 at 21:00 -0800, Linus Torvalds wrote:
>
> >
> > Is there something else I've missed?
>
> Maybe. Does this add anything to the picture? During boot,
> recalc_task_prio() is called with now < p->timestamp. This causes quite
> a stir. If you WARN_ON(now < p->timestamp) or printk, you'll have a
> dead box due to hundreds of gripes as things churn. Adding...
>
> if (unlikely(now < p->timestamp))
> __sleep_time = 0ULL;
>
> ...turns it into exactly one gripe.

Nope. Further research shows that this is just a speed-step problem.
Still, the scheduler needs to protect itself, because even with all of
the speed-step stuff enabled, these continue to occurr even after
boot-up if you switch to a low power setting, which screws up the
scheduler.

-Mike

2006-03-07 05:56:29

by Nick Piggin

[permalink] [raw]
Subject: Re: Fw: Re: oops in choose_configuration()

Mike Galbraith wrote:
> On Mon, 2006-03-06 at 08:47 +0100, Mike Galbraith wrote:
>
>>On Sun, 2006-03-05 at 21:00 -0800, Linus Torvalds wrote:
>>
>>
>>>Is there something else I've missed?
>>
>>Maybe. Does this add anything to the picture? During boot,
>>recalc_task_prio() is called with now < p->timestamp. This causes quite
>>a stir. If you WARN_ON(now < p->timestamp) or printk, you'll have a
>>dead box due to hundreds of gripes as things churn. Adding...
>>
>>if (unlikely(now < p->timestamp))
>> __sleep_time = 0ULL;
>>
>>...turns it into exactly one gripe.
>
>
> Nope. Further research shows that this is just a speed-step problem.
> Still, the scheduler needs to protect itself, because even with all of
> the speed-step stuff enabled, these continue to occurr even after
> boot-up if you switch to a low power setting, which screws up the
> scheduler.
>

sched_clock needs to be fixed, rather than scheduler.

--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com

2006-03-08 01:31:16

by Greg KH

[permalink] [raw]
Subject: Re: Fw: Re: oops in choose_configuration()

On Mon, Mar 06, 2006 at 12:48:36AM -0800, Andrew Morton wrote:
> Linus Torvalds <[email protected]> wrote:
> >
> >
> > Andrew,
> > I'm worried about the fact that kmalloc() doesn't get redlining.
> >
> > The lifetime rules for the pipe_info thing and the bprm seems totally
> > obvious, and we'd get slab errors if somebody was doing something strange
> > there anyway.
> >
> > So I'd be more inclined to blame a buffer overflow on a kmalloc, and the
> > obvious target is the "add_uevent_var()" thing, since all/many of the
> > corruptions seem to come from uevent environment variable strings.
> >
> > Because kmalloc() doesn't do redlining, we'd never see an overflow as an
> > error, and it would just overwrite the next block. Now, they are in
> > different slab blocks (the uevent strign allocation is a 2048-byte
> > allocation), but maybe it flows over into the next page entirely..
> >
> > Now, it all uses "vnsprintf()" which _should_ be safe, but that in turn
> > uses pointer comparisons, so maybe gcc screws that up. Who knows. Gcc has
> > been known to use signed comparisons on pointers and other brokenness. And
> > we could just have screwed something up (not updating "len" when we update
> > the buffer start etc etc)
>
> Maybe. Something which is as deterministic as that would trip
> CONFIG_DEBUG_PAGEALLOC though.
>
> > Anyway, this trivial patch will check for buffer length consistency and
> > overflow by just putting a magic value at the end of the buffer, and
> > checking it. Maybe.
> >
> > I don't see anything wrong there, and booting with this patch doesn't
> > trigger anything for me, but it's simple enough to be worth checking out.
>
> Yup. The wind changed direction and I lost the ability to reproduce it for
> a few hours. Going back to exactly the rc5-mm2 lineup did the trick. Your
> debug patch didn't trigger.
>
>
>
> SELinux: initialized (dev rootfs, type rootfs), uses genfs_contexts
> SELinux: initialized (dev sysfs, type sysfs), uses genfs_contexts
> audit(1141604160.744:3): policy loaded auid=4294967295
> SELinux: initialized (dev usbfs, type usbfs), uses genfs_contexts
> BUG: unable to handle kernel paging request at virtual address 44535955
> printing eip:
> *pde = 00000000
> Oops: 0000 [#1]
> last sysfs file: /class/firmware/0000:06:0b.0/loading
> Modules linked in: ide_cd cdrom ipw2200 ohci1394 ieee1394 ehci_hcd ieee80211 uhci_hcd sg joydev ieee80211_crypt snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm hw_random i2c_i801 snd_timer piix snd i2c_core soundcore snd_page_alloc generic ext3 jbd ide_disk ide_core
> CPU: 0
> EIP: 0060:[<c02437ed>] Not tainted VLI
> EFLAGS: 00010246 (2.6.16-rc5-mm2 #302)
> EIP is at choose_configuration+0x58/0x13a
> eax: 000000d2 ebx: f70aac00 ecx: 44535950 edx: 000001f4
> esi: f76eca00 edi: 00000001 ebp: 00000000 esp: f7f3def8
> ds: 007b es: 007b ss: 0068
> Process khubd (pid: 114, threadinfo=f7f3d000 task=f7f3cab0)
> Stack: <0>00000000 0000d000 f70aac00 f70aac00 00000000 00000001 c0243a5a 00000000
> f70aac00 c1caa3c0 c0244ba1 00000000 0000000a f76ec814 f7029000 00000011
> c1caa3c0 00000001 f7029000 00000001 c0244ff8 00000003 00000000 f76ec800
> Call Trace:
> <c0243a5a> usb_new_device+0x15a/0x1d0 <c0244ba1> hub_port_connect_change+0x1ad/0x343
> <c0244ff8> hub_events+0x2c1/0x3f1 <c0245128> hub_thread+0x0/0xe6
> <c0245132> hub_thread+0xa/0xe6 <c0128dfb> autoremove_wake_function+0x0/0x2d
> <c0128dfb> autoremove_wake_function+0x0/0x2d <c0128b0c> kthread+0x9c/0xa1
> <c0128a70> kthread+0x0/0xa1 <c01012ed> kernel_thread_helper+0x5/0xb
> Code: 00 58 39 fd c7 04 24 00 00 00 00 7d 53 0f b6 46 08 8b 8e 90 00 00 00 0f b7 93 18 02 00 00 01 c0 83 c1 08 39 d0 7f 2e 85 ed 75 0a <80> 79 05 02 0f 84 bf 00 00 00 80 bb 7c 01 00 00 ff 74 0a 80 79
> <6>Non-volatile memory driver v1.2
> BUG: unable to handle kernel paging request at virtual address 6963702f
> printing eip:
> c01bc9b0
> *pde = 00000000
> Oops: 0000 [#2]
> last sysfs file: /class/firmware/0000:06:0b.0/loading
> Modules linked in: nvram ide_cd cdrom ipw2200 ohci1394 ieee1394 ehci_hcd ieee80211 uhci_hcd sg joydev ieee80211_crypt snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm hw_random i2c_i801 snd_timer piix snd i2c_core soundcore snd_page_alloc generic ext3 jbd ide_disk ide_core
> CPU: 0
> EIP: 0060:[<c01bc9b0>] Not tainted VLI
> EFLAGS: 00010246 (2.6.16-rc5-mm2 #302)
> EIP is at kobject_uevent+0x28d/0x37c
> eax: 00000000 ebx: f76f76c0 ecx: ffffffff edx: f7176efc
> esi: 0000002a edi: 6963702f ebp: f77542c0 esp: f7176f04
> ds: 007b es: 007b ss: 0068
> Process knodemgrd_0 (pid: 1025, threadinfo=f7176000 task=f7788ab0)
> Stack: <0>f70f1000 c034da80 00000002 f70f1051 f8cf6e4d f7097640 c02e93d8 f76e9568
> 00000246 00000000 f7758a94 f76e9568 f76e9568 f76e9560 f76e9568 f8d28c40
> c021e752 00000000 f70473a0 00000000 f76e9400 f76e9440 f76e9560 f763c000
> Call Trace:
> <c021e752> class_device_add+0x14e/0x201 <f8cf1efe> nodemgr_create_node+0x104/0x17b [ieee1394]
> <f8cf2b1a> nodemgr_node_scan_one+0x160/0x16e [ieee1394] <f8cf2b76> nodemgr_node_scan+0x4e/0x50 [ieee1394]
> <f8cf3277> nodemgr_host_thread+0xdc/0x14c [ieee1394] <f8cf319b> nodemgr_host_thread+0x0/0x14c [ieee1394]
> <c01012ed> kernel_thread_helper+0x5/0xb
> Code: 24 1c 68 01 1d 2f c0 57 e8 6e 25 00 00 c7 44 24 18 02 00 00 00 83 c4 10 83 7d 08 00 74 4f 8b 44 24 08 83 c9 ff 8b 7c 85 00 31 c0 <f2> ae f7 d1 49 83 7b 64 00 8d 71 01 8b bb 9c 00 00 00 75 61 01
> <6>ACPI: AC Adapter [ACAD] (on-line)
> ACPI: Battery Slot [BAT1] (battery present)
> ACPI: Lid Switch [LID0]
>
>
> As is usual with this particular manifestation, we died here:
>
> if (i == 0 && desc->bInterfaceClass == USB_CLASS_COMM
>
>
> (gdb) p sizeof(struct usb_interface_descriptor)
> $3 = 9
> (gdb) p sizeof(struct usb_host_config)
> $4 = 280
> (gdb) p sizeof(struct usb_interface_cache)
> $5 = 8
> (gdb) p sizeof(struct usb_host_interface)
> $6 = 28
> (gdb) offsetof usb_interface_descriptor bInterfaceClass
> 5 0x5
>
> And that's quite different. `struct usb_interface_cache' is only 8 bytes,
> and the scribble is (as far as we know), not at offset zero. Unless there
> are a large number of `usb_host_interface's in `struct
> usb_interface_cache's variable-sized array, we're not using the size-512
> slab here.
>
> <looks at choose_configuration()>
>
> struct usb_interface_descriptor *desc =
> &c->intf_cache[0]->altsetting->desc;
>
> Seems funny.
>
> struct usb_interface_cache {
> unsigned num_altsetting; /* number of alternate settings */
> struct kref ref; /* reference counter */
>
> /* variable-length array of alternate settings for this interface,
> * stored in no particular order */
> struct usb_host_interface altsetting[0];
> };
>
> A clearer way of coding that assignment would have been
>
> struct usb_interface_descriptor *desc =
> &c->intf_cache[0]->altsetting[0].desc;

Agreed.

> a) How come we're only considering the zeroth slot in that array in here?

We start out with the first interface setting, as we always know we have
one of them as per the USB spec (I think, anyone from linux-usb-devel
want to verify this?)

> b) How do we know that there's actually anything _there_? The length of
> that variable-sized array doesn't seem to have been stored anywhere
> obvious by usb_parse_configuration() and choose_configuration() doesn't
> check. What happens if the length was zero?

I don't think it is allowed to be, as all USB devices have to have at
least 1 interface.

> drivers/usb/core/config.c hasn't changed since October. If it was this
> easy, we'd have hit it before now.
>
> Greg, who knows this code? Can we triple-check it please?

I'll walk through it again, it's some nasty crap, I agree. Time to dig
out the USB spec...

Anyone else on linux-usb-devel want to verify it too?

thanks,

greg k-h

2006-03-08 01:51:30

by Andrew Morton

[permalink] [raw]
Subject: Re: Fw: Re: oops in choose_configuration()

Greg KH <[email protected]> wrote:
>
> > a) How come we're only considering the zeroth slot in that array in here?
>
> We start out with the first interface setting, as we always know we have
> one of them as per the USB spec (I think, anyone from linux-usb-devel
> want to verify this?)

I think that code has to be OK - it's longstanding stuff and slab
debugging would have caught these sorts of things in a jiffy.

2006-03-08 15:30:50

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-usb-devel] Re: Fw: Re: oops in choose_configuration()

On Tue, 7 Mar 2006, Greg KH wrote:

> > As is usual with this particular manifestation, we died here:
> >
> > if (i == 0 && desc->bInterfaceClass == USB_CLASS_COMM
> >
> >
> > (gdb) p sizeof(struct usb_interface_descriptor)
> > $3 = 9
> > (gdb) p sizeof(struct usb_host_config)
> > $4 = 280
> > (gdb) p sizeof(struct usb_interface_cache)
> > $5 = 8
> > (gdb) p sizeof(struct usb_host_interface)
> > $6 = 28
> > (gdb) offsetof usb_interface_descriptor bInterfaceClass
> > 5 0x5
> >
> > And that's quite different. `struct usb_interface_cache' is only 8 bytes,
> > and the scribble is (as far as we know), not at offset zero. Unless there
> > are a large number of `usb_host_interface's in `struct
> > usb_interface_cache's variable-sized array, we're not using the size-512
> > slab here.
> >
> > <looks at choose_configuration()>
> >
> > struct usb_interface_descriptor *desc =
> > &c->intf_cache[0]->altsetting->desc;
> >
> > Seems funny.
> >
> > struct usb_interface_cache {
> > unsigned num_altsetting; /* number of alternate settings */
> > struct kref ref; /* reference counter */
> >
> > /* variable-length array of alternate settings for this interface,
> > * stored in no particular order */
> > struct usb_host_interface altsetting[0];
> > };
> >
> > A clearer way of coding that assignment would have been
> >
> > struct usb_interface_descriptor *desc =
> > &c->intf_cache[0]->altsetting[0].desc;
>
> Agreed.

The C language doesn't make it easy to express the difference between a
pointer, an array, and a pointer to an array, especially when following
a series of links like this one. I wasn't sure which way of writing it
was best... but if you want to change it, that's fine with me.

> > a) How come we're only considering the zeroth slot in that array in here?
>
> We start out with the first interface setting, as we always know we have
> one of them as per the USB spec (I think, anyone from linux-usb-devel
> want to verify this?)

In this case it wouldn't make any difference, since all the altsettings
for a particular interface are supposed to have the same bInterfaceClass,
bInterfaceSubClass, and bInterfaceProtocol. Although I don't think the
USB spec actually says this anywhere, it would be pretty strange if they
didn't.

The bMaxPower value could be different for different altsettings. I
suppose the code could go through and check all the altsettings... but the
routine and the heuristics it implements are already complicated enough.
Remember, none of this is critical code. It's a form of kernel policy
that has been separated out into its own subroutine, as you can see from
the routine's name.

> > b) How do we know that there's actually anything _there_? The length of
> > that variable-sized array doesn't seem to have been stored anywhere
> > obvious by usb_parse_configuration() and choose_configuration() doesn't
> > check. What happens if the length was zero?
>
> I don't think it is allowed to be, as all USB devices have to have at
> least 1 interface.

The code in usb_parse_configuration() guarantees that the number of
entries in the altsettings array is at least 1, because it sets nalts[n]
to 1 initially and never decreases it. The whole idea of an interface
without altsettings makes no sense, bacause interfaces come into being
when an interface descriptor (which contains an altsetting index) is
parsed.

The number of entries in the variable-sized array isn't actually stored
anywhere. Instead, the code in usb_parse_interface() uses struct
usb_interface_cache's num_altsetting member to count the number of entries
in the array that actually get used, which might be less than the number
that were allocated if the device's descriptors are screwy.

> > drivers/usb/core/config.c hasn't changed since October. If it was this
> > easy, we'd have hit it before now.
> >
> > Greg, who knows this code? Can we triple-check it please?
>
> I'll walk through it again, it's some nasty crap, I agree. Time to dig
> out the USB spec...
>
> Anyone else on linux-usb-devel want to verify it too?

I wrote it originally, and unless someone has changed something while I
wasn't looking it should still be okay. If anybody feels like going
through it in detail... please do; more eyeballs never hurt. I'm always
available to answer questions about the code.

Also, as has been mentioned, this stuff has been working perfectly for
quite a while so far as we know. It seems striking that many of the oops
events Andrew experienced appear to be connected to the uevent stuff in
sysfs, whereas some of the oops reports had no apparent connection with
usb_choose_configuration. Maybe what's going on involves lots of small
memory allocations in both the USB code and the sysfs code, all around the
same size (hence from the same arenas) and all happening at the same time
(when a new device is discovered).

Perhaps it might help to add some diagnostic stuff to
usb_choose_configuration(), triggered by something like this:

(((unsigned long) desc) & 0xffffff00) == 0x44535900

When that happens, you could print out the values of all the relevant
pointers in the definition of desc. Also print out the surrounding values
in memory to see what got overwritten and what the new contents are.

Alan Stern

2006-03-08 18:33:26

by David Brownell

[permalink] [raw]
Subject: Re: [linux-usb-devel] Re: Fw: Re: oops in choose_configuration()

I wonder if that SCSI fix (restoring a wrongly deleted mem clear) helps
get rid of this oops too?


On Wednesday 08 March 2006 7:30 am, Alan Stern wrote:
> > > a) How come we're only considering the zeroth slot in that array in here?
> >
> > We start out with the first interface setting, as we always know we have
> > one of them as per the USB spec (I think, anyone from linux-usb-devel
> > want to verify this?)
>
> In this case it wouldn't make any difference, since all the altsettings
> for a particular interface are supposed to have the same bInterfaceClass,
> bInterfaceSubClass, and bInterfaceProtocol. Although I don't think the
> USB spec actually says this anywhere..

I'd have stopped at "wouldn't make any difference"; the kernel must make
some initial choice, but userspace is free to revise it. Agreed it would
be odd if altsettings had different class/subclass/protocol, but I don't
see any good reason to make that illegal.


> The bMaxPower value could be different for different altsettings.

Erm, no; that's a per-configuration thing, not a per-altsetting thing.
It's checking the config descriptor, not the interface descriptor,
for that particular concern.


> > > b) How do we know that there's actually anything _there_? The length of
> > > that variable-sized array doesn't seem to have been stored anywhere
> > > obvious by usb_parse_configuration() and choose_configuration() doesn't
> > > check. What happens if the length was zero?
> >
> > I don't think it is allowed to be, as all USB devices have to have at
> > least 1 interface.

I think that's not true, and it would be worth verifying that it's not
a no-interfaces device even if the USB spec required it. It's trivial
to create device firmware that advertises no-interfaces, and those should
never be able to make Linux hiccup (much less oops).


> The code in usb_parse_configuration() guarantees that the number of
> entries in the altsettings array is at least 1, because it sets nalts[n]
> to 1 initially and never decreases it. The whole idea of an interface
> without altsettings makes no sense...

Right; there's always at least one setting. Calling them "alt" settings
can be confusing; any one of them could be the "main" setting.

- Dave

2006-03-08 20:09:09

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-usb-devel] Re: Fw: Re: oops in choose_configuration()

On Wed, 8 Mar 2006, David Brownell wrote:

> > In this case it wouldn't make any difference, since all the altsettings
> > for a particular interface are supposed to have the same bInterfaceClass,
> > bInterfaceSubClass, and bInterfaceProtocol. Although I don't think the
> > USB spec actually says this anywhere..
>
> I'd have stopped at "wouldn't make any difference"; the kernel must make
> some initial choice, but userspace is free to revise it. Agreed it would
> be odd if altsettings had different class/subclass/protocol, but I don't
> see any good reason to make that illegal.

Agreed. And like I said before, this is only a heuristic.

> > The bMaxPower value could be different for different altsettings.
>
> Erm, no; that's a per-configuration thing, not a per-altsetting thing.
> It's checking the config descriptor, not the interface descriptor,
> for that particular concern.

Whoops, yes. I misread the code. All the more reason not to worry about
any but the first altsetting.

> > > > b) How do we know that there's actually anything _there_? The length of
> > > > that variable-sized array doesn't seem to have been stored anywhere
> > > > obvious by usb_parse_configuration() and choose_configuration() doesn't
> > > > check. What happens if the length was zero?
> > >
> > > I don't think it is allowed to be, as all USB devices have to have at
> > > least 1 interface.
>
> I think that's not true, and it would be worth verifying that it's not
> a no-interfaces device even if the USB spec required it. It's trivial
> to create device firmware that advertises no-interfaces, and those should
> never be able to make Linux hiccup (much less oops).

Ha! "should never be able" indeed. It turns out the code doesn't like it
if a configuration has no interfaces. How embarassing...

Andrew, if you tell us what's in your /proc/bus/usb/devices we'll see
whether that was the real problem. In any case, a patch follows.

Alan Stern

2006-03-08 20:14:11

by Alan Stern

[permalink] [raw]
Subject: [PATCH] usbcore: Don't assume a USB configuration includes any interfaces

In a couple of places, usbcore assumes that a USB device configuration
will have a nonzero number of interfaces. Having no interfaces may or may
not be allowed by the USB spec; in any event we shouldn't die if we
encounter such a thing. This patch (as662) removes the assumptions.

Signed-off-by: Alan Stern <[email protected]>

---

Index: usb-2.6/drivers/usb/core/hub.c
===================================================================
--- usb-2.6.orig/drivers/usb/core/hub.c
+++ usb-2.6/drivers/usb/core/hub.c
@@ -1179,8 +1179,11 @@ static int choose_configuration(struct u
c = udev->config;
num_configs = udev->descriptor.bNumConfigurations;
for (i = 0; i < num_configs; (i++, c++)) {
- struct usb_interface_descriptor *desc =
- &c->intf_cache[0]->altsetting->desc;
+ struct usb_interface_descriptor *desc = NULL;
+
+ /* It's possible that a config has no interfaces! */
+ if (c->desc.bNumInterfaces > 0)
+ desc = &c->intf_cache[0]->altsetting->desc;

/*
* HP's USB bus-powered keyboard has only one configuration
@@ -1215,7 +1218,8 @@ static int choose_configuration(struct u
/* If the first config's first interface is COMM/2/0xff
* (MSFT RNDIS), rule it out unless Linux has host-side
* RNDIS support. */
- if (i == 0 && desc->bInterfaceClass == USB_CLASS_COMM
+ if (i == 0 && desc
+ && desc->bInterfaceClass == USB_CLASS_COMM
&& desc->bInterfaceSubClass == 2
&& desc->bInterfaceProtocol == 0xff) {
#ifndef CONFIG_USB_NET_RNDIS
@@ -1231,8 +1235,8 @@ static int choose_configuration(struct u
* than a vendor-specific driver. */
else if (udev->descriptor.bDeviceClass !=
USB_CLASS_VENDOR_SPEC &&
- desc->bInterfaceClass !=
- USB_CLASS_VENDOR_SPEC) {
+ (!desc || desc->bInterfaceClass !=
+ USB_CLASS_VENDOR_SPEC)) {
best = c;
break;
}
@@ -3022,7 +3026,7 @@ int usb_reset_device(struct usb_device *
parent_hub = hdev_to_hub(parent_hdev);

/* If we're resetting an active hub, take some special actions */
- if (udev->actconfig &&
+ if (udev->actconfig && udev->actconfig->desc.bNumInterfaces > 0 &&
udev->actconfig->interface[0]->dev.driver ==
&hub_driver.driver &&
(hub = hdev_to_hub(udev)) != NULL) {

2006-03-08 20:16:07

by Andrew Morton

[permalink] [raw]
Subject: Re: [linux-usb-devel] Re: Fw: Re: oops in choose_configuration()

Alan Stern <[email protected]> wrote:
>
> Andrew, if you tell us what's in your /proc/bus/usb/devices we'll see
> whether that was the real problem.

Below.

> In any case, a patch follows.

ooh, I like patches.

This crash manifests in several ways. Pretty much any debugging patches
seem to make it hide.



T: Bus=05 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#= 1 Spd=12 MxCh= 2
B: Alloc= 0/900 us ( 0%), #Int= 0, #Iso= 0
D: Ver= 1.10 Cls=09(hub ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=0000 ProdID=0000 Rev= 2.06
S: Manufacturer=Linux 2.6.16-rc5-mm2 uhci_hcd
S: Product=UHCI Host Controller
S: SerialNumber=0000:00:1d.3
C:* #Ifs= 1 Cfg#= 1 Atr=c0 MxPwr= 0mA
I: If#= 0 Alt= 0 #EPs= 1 Cls=09(hub ) Sub=00 Prot=00 Driver=hub
E: Ad=81(I) Atr=03(Int.) MxPS= 2 Ivl=255ms

T: Bus=04 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#= 1 Spd=12 MxCh= 2
B: Alloc= 0/900 us ( 0%), #Int= 0, #Iso= 0
D: Ver= 1.10 Cls=09(hub ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=0000 ProdID=0000 Rev= 2.06
S: Manufacturer=Linux 2.6.16-rc5-mm2 uhci_hcd
S: Product=UHCI Host Controller
S: SerialNumber=0000:00:1d.2
C:* #Ifs= 1 Cfg#= 1 Atr=c0 MxPwr= 0mA
I: If#= 0 Alt= 0 #EPs= 1 Cls=09(hub ) Sub=00 Prot=00 Driver=hub
E: Ad=81(I) Atr=03(Int.) MxPS= 2 Ivl=255ms

T: Bus=04 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 2 Spd=12 MxCh= 0
D: Ver= 2.00 Cls=e0(unk. ) Sub=01 Prot=01 MxPS=64 #Cfgs= 1
P: Vendor=044e ProdID=300c Rev=19.15
S: Manufacturer=ALPS
S: Product=UGX
C:* #Ifs= 3 Cfg#= 1 Atr=80 MxPwr=100mA
I: If#= 0 Alt= 0 #EPs= 3 Cls=e0(unk. ) Sub=01 Prot=01 Driver=(none)
E: Ad=81(I) Atr=03(Int.) MxPS= 16 Ivl=1ms
E: Ad=02(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms
E: Ad=82(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms
I: If#= 1 Alt= 0 #EPs= 2 Cls=e0(unk. ) Sub=01 Prot=01 Driver=(none)
E: Ad=03(O) Atr=01(Isoc) MxPS= 0 Ivl=1ms
E: Ad=83(I) Atr=01(Isoc) MxPS= 0 Ivl=1ms
I: If#= 1 Alt= 1 #EPs= 2 Cls=e0(unk. ) Sub=01 Prot=01 Driver=(none)
E: Ad=03(O) Atr=01(Isoc) MxPS= 9 Ivl=1ms
E: Ad=83(I) Atr=01(Isoc) MxPS= 9 Ivl=1ms
I: If#= 1 Alt= 2 #EPs= 2 Cls=e0(unk. ) Sub=01 Prot=01 Driver=(none)
E: Ad=03(O) Atr=01(Isoc) MxPS= 17 Ivl=1ms
E: Ad=83(I) Atr=01(Isoc) MxPS= 17 Ivl=1ms
I: If#= 1 Alt= 3 #EPs= 2 Cls=e0(unk. ) Sub=01 Prot=01 Driver=(none)
E: Ad=03(O) Atr=01(Isoc) MxPS= 25 Ivl=1ms
E: Ad=83(I) Atr=01(Isoc) MxPS= 25 Ivl=1ms
I: If#= 1 Alt= 4 #EPs= 2 Cls=e0(unk. ) Sub=01 Prot=01 Driver=(none)
E: Ad=03(O) Atr=01(Isoc) MxPS= 33 Ivl=1ms
E: Ad=83(I) Atr=01(Isoc) MxPS= 33 Ivl=1ms
I: If#= 1 Alt= 5 #EPs= 2 Cls=e0(unk. ) Sub=01 Prot=01 Driver=(none)
E: Ad=03(O) Atr=01(Isoc) MxPS= 49 Ivl=1ms
E: Ad=83(I) Atr=01(Isoc) MxPS= 49 Ivl=1ms
I: If#= 2 Alt= 0 #EPs= 0 Cls=fe(app. ) Sub=01 Prot=00 Driver=(none)

T: Bus=03 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#= 1 Spd=12 MxCh= 2
B: Alloc=106/900 us (12%), #Int= 1, #Iso= 0
D: Ver= 1.10 Cls=09(hub ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=0000 ProdID=0000 Rev= 2.06
S: Manufacturer=Linux 2.6.16-rc5-mm2 uhci_hcd
S: Product=UHCI Host Controller
S: SerialNumber=0000:00:1d.1
C:* #Ifs= 1 Cfg#= 1 Atr=c0 MxPwr= 0mA
I: If#= 0 Alt= 0 #EPs= 1 Cls=09(hub ) Sub=00 Prot=00 Driver=hub
E: Ad=81(I) Atr=03(Int.) MxPS= 2 Ivl=255ms

T: Bus=03 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 2 Spd=1.5 MxCh= 0
D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 8 #Cfgs= 1
P: Vendor=045e ProdID=00e1 Rev= 0.07
S: Manufacturer=Microsoft
C:* #Ifs= 1 Cfg#= 1 Atr=a0 MxPwr=100mA
I: If#= 0 Alt= 0 #EPs= 1 Cls=03(HID ) Sub=01 Prot=02 Driver=usbhid
E: Ad=81(I) Atr=03(Int.) MxPS= 6 Ivl=10ms

T: Bus=02 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#= 1 Spd=12 MxCh= 2
B: Alloc= 0/900 us ( 0%), #Int= 0, #Iso= 0
D: Ver= 1.10 Cls=09(hub ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=0000 ProdID=0000 Rev= 2.06
S: Manufacturer=Linux 2.6.16-rc5-mm2 uhci_hcd
S: Product=UHCI Host Controller
S: SerialNumber=0000:00:1d.0
C:* #Ifs= 1 Cfg#= 1 Atr=c0 MxPwr= 0mA
I: If#= 0 Alt= 0 #EPs= 1 Cls=09(hub ) Sub=00 Prot=00 Driver=hub
E: Ad=81(I) Atr=03(Int.) MxPS= 2 Ivl=255ms

T: Bus=01 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#= 1 Spd=480 MxCh= 8
B: Alloc= 0/800 us ( 0%), #Int= 0, #Iso= 0
D: Ver= 2.00 Cls=09(hub ) Sub=00 Prot=01 MxPS=64 #Cfgs= 1
P: Vendor=0000 ProdID=0000 Rev= 2.06
S: Manufacturer=Linux 2.6.16-rc5-mm2 ehci_hcd
S: Product=EHCI Host Controller
S: SerialNumber=0000:00:1d.7
C:* #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr= 0mA
I: If#= 0 Alt= 0 #EPs= 1 Cls=09(hub ) Sub=00 Prot=00 Driver=hub
E: Ad=81(I) Atr=03(Int.) MxPS= 2 Ivl=256ms

2006-03-08 20:51:18

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-usb-devel] Re: Fw: Re: oops in choose_configuration()

On Wed, 8 Mar 2006, Andrew Morton wrote:

> Alan Stern <[email protected]> wrote:
> >
> > Andrew, if you tell us what's in your /proc/bus/usb/devices we'll see
> > whether that was the real problem.
>
> Below.

Unforunately, the data shows that all your devices' configurations do
indeed have at least one interface. (Unless your ALPS or Microsoft HID
device just happens to report differently once every few thousand
times...) So scratch that theory. Besides, it doesn't explain the
crashes you got that were not connected to usb_choose_configuration.

> > In any case, a patch follows.
>
> ooh, I like patches.

We should keep the patch. After all, some weird device might need it
eventually.

> This crash manifests in several ways. Pretty much any debugging patches
> seem to make it hide.

Aren't those your favorite kind of bugs? Maybe we should just add a
permanent debugging patch, with the expectation that it will never trigger
and therefore will solve the bug. :-)

What about those scheduler changes you found through the bisection search?
Any word on that?

Alan Stern

2006-03-08 20:56:22

by Andrew Morton

[permalink] [raw]
Subject: Re: [linux-usb-devel] Re: Fw: Re: oops in choose_configuration()

Alan Stern <[email protected]> wrote:
>
> What about those scheduler changes you found through the bisection search?
> Any word on that?

Ingo's gone over them pretty closely. The current theory is that the CPU
scheduler change alters timing sufficiently for the bug to bite.

The machine passes memtest86.

Ingo's suspecting stack corruption. Do you know whether USB anywhere does
DMA into automatically-allocated storage (ie: kernel stacks)?

Am about to reboot into a stack-corruption-detector patch from Ingo.

2006-03-08 21:55:04

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-usb-devel] Re: Fw: Re: oops in choose_configuration()

On Wed, 8 Mar 2006, Andrew Morton wrote:

> Alan Stern <[email protected]> wrote:
> >
> > What about those scheduler changes you found through the bisection search?
> > Any word on that?
>
> Ingo's gone over them pretty closely. The current theory is that the CPU
> scheduler change alters timing sufficiently for the bug to bite.
>
> The machine passes memtest86.
>
> Ingo's suspecting stack corruption. Do you know whether USB anywhere does
> DMA into automatically-allocated storage (ie: kernel stacks)?

We try to avoid doing that, but such things have been known to creep into
the sources from time to time. We fix them whenever they surface. I'm
pretty sure that usbcore and usb-storage are clean in this respect, and
probably usbhid is also (I haven't gone through it to check personally;
presumably others have). The only drivers listed in your
/proc/bus/usb/devices were hub and usbhid, and the ALPS UGX device didn't
have any drivers bound.

Have you tried running your test with the USB devices unplugged? That
won't prevent usb_choose_configuration from getting called (since it's
used for the virtual root hubs exported by the host controller drivers)
but it should make everything more deterministic.

Alan Stern