2011-04-04 12:20:43

by Oncaphillis

Subject: Kernel bug message and missing data on libusb_interrupt_transfer

Hi,

We are experiencing sporadic kernel bug messages and total
kernel freezes during USB communication via libusb-1.0.6.

We are currently on kernel 2.6.30.10 but have also seen
it on older and newer kernels (I think the newest was
a 2.6.36.x).

The communication should work as follows:

- Send a 2 byte sequence to endpoint #4
selecting the register to which the command should be sent

- The device should answer with the same two byte sequence
on endpoint #8

- Send a 2 byte command sequence to endpoint #4

- The device should acknowledge the command by returning
the same two bytes on endpoint #8

- The device may also initiate inbound data transfer on endpoint #8
to inform about status changes.

We send the register selection and command data via

libusb_bulk_transfer() with a timeout of 10000 ms

and read the reply via libusb_interrupt_transfer()
with a timeout of 100 ms, as specified for our device.
We also periodically read on endpoint #8 to get the
status changes.
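
For readers following along, the exchange described above can be sketched roughly as below. This is not our actual code. The `transport` hooks, `exchange2()`, `write_register()` and the fake device are hypothetical names made up for illustration; in a real program the hooks would wrap libusb_bulk_transfer() (endpoint #4) and libusb_interrupt_transfer() (endpoint #8). Skipping non-matching inbound packets and the bound on read attempts are assumptions of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical transport hooks. In a real program they would wrap
 * libusb_bulk_transfer() (OUT, endpoint #4, 10000 ms timeout) and
 * libusb_interrupt_transfer() (IN, endpoint #8, 100 ms timeout). */
typedef int (*send_fn)(void *ctx, const unsigned char *buf, int len);
typedef int (*recv_fn)(void *ctx, unsigned char *buf, int len, int *got);

struct transport {
    send_fn send;   /* returns 0 on success */
    recv_fn recv;   /* returns 0 on success, nonzero on timeout or error */
    void *ctx;
};

enum { XCHG_OK = 0, XCHG_SEND_FAILED = -1, XCHG_NO_ECHO = -2 };

/* Send a 2 byte sequence and wait for the device to echo it back.
 * Assumption: inbound packets that do not match (for example status
 * notifications) are skipped here; a real program would pass them on
 * to a status handler instead of dropping them. */
int exchange2(struct transport *t, const unsigned char seq[2], int max_reads)
{
    unsigned char reply[2];
    int got, i;

    assert(t != NULL && t->send != NULL && t->recv != NULL);

    if (t->send(t->ctx, seq, 2) != 0)
        return XCHG_SEND_FAILED;

    for (i = 0; i < max_reads; i++) {
        got = 0;
        if (t->recv(t->ctx, reply, 2, &got) != 0)
            continue;                   /* timeout: try again */
        if (got == 2 && memcmp(reply, seq, 2) == 0)
            return XCHG_OK;             /* echo matches what was sent */
    }
    return XCHG_NO_ECHO;
}

/* Register selection followed by the command, each acknowledged by an echo. */
int write_register(struct transport *t, const unsigned char reg[2],
                   const unsigned char cmd[2], int max_reads)
{
    int rc = exchange2(t, reg, max_reads);
    return rc != XCHG_OK ? rc : exchange2(t, cmd, max_reads);
}

/* Fake device, for demonstration only: it echoes the last sequence sent,
 * after failing the first `drop` reads to imitate timeouts. */
struct fake_dev {
    unsigned char last[2];
    int drop;
    int pending;
};

int fake_send(void *ctx, const unsigned char *buf, int len)
{
    struct fake_dev *d = ctx;
    if (len != 2)
        return 1;
    memcpy(d->last, buf, 2);
    d->pending = 1;
    return 0;
}

int fake_recv(void *ctx, unsigned char *buf, int len, int *got)
{
    struct fake_dev *d = ctx;
    if (d->drop > 0) {          /* simulated timeout */
        d->drop--;
        return 1;
    }
    if (!d->pending || len < 2)
        return 1;
    memcpy(buf, d->last, 2);
    *got = 2;
    d->pending = 0;
    return 0;
}
```

Keeping the transport behind two function pointers lets the handshake logic be exercised without hardware; swapping in the real libusb calls should not change `exchange2()` itself.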

Sometimes we run into the following situation:

- We send the 2 byte register selection sequence via libusb_bulk_transfer().
- We try to read the response via libusb_interrupt_transfer() but run into
a timeout.
- If we look at the USB communication with a USB analyzer, we actually see
the inbound transfer of two bytes and the acknowledgement by the kernel,
but this data never ends up as a valid result of
libusb_interrupt_transfer().
- Since we got a timeout, we retry the read up to 30 times. During this
polling we see the following kernel bug message in dmesg. Sometimes the
kernel freezes completely.

<snip>
------------[ cut here ]------------
kernel BUG at mm/slub.c:2808!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:1d.7/usb2/2-3/bConfigurationValue
CPU 3
Modules linked in:
Pid: 4314, comm: rrdupdate Not tainted 2.6.30.10 #2 To Be Filled By O.E.M.
RIP: 0010:[<ffffffff8028d2f8>] [<ffffffff8028d2f8>] kfree+0x7c/0xdb
RSP: 0000:ffff880077de1d38 EFLAGS: 00010246
RAX: 4000000000000000 RBX: ffff88007a07a772 RCX: ffff88007acea7e0
RDX: ffffe20000000000 RSI: ffffe20001ab1ab0 RDI: ffff88007a07a772
RBP: ffff880077de1d58 R08: 0000000000000000 R09: 0000000000000008
R10: 00000000f7f5e000 R11: 00000000f7f5d5b8 R12: ffff88007c081c80
R13: ffffffff802c9cd8 R14: 00000000f7f5e000 R15: ffff880077de1f58
FS: 0000000000000000(0000) GS:ffff88000105b000(0000) knlGS:0000000000000000
CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 00000000f7f5d5b8 CR3: 0000000079940000 CR4: 00000000000406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process rrdupdate (pid: 4314, threadinfo ffff880077de0000, task ffff8800790d8b40)
Stack:
00000000f7f5e000 0000000000000000 ffff88007c081c80 ffff880077d77800
ffff880077de1e48 ffffffff802c9cd8 0000000000000080 ffff880077d77800
0000000000000001 00000000f7f3d000 ffff880077df2000 ffff880077df2000
Call Trace:
[<ffffffff802c9cd8>] load_elf_binary+0xfda/0x1862
[<ffffffff802c0b6f>] ? compat_copy_strings+0x1b8/0x1ca
[<ffffffff802958fe>] search_binary_handler+0xb0/0x23f
[<ffffffff802c0dc7>] compat_do_execve+0x246/0x36f
[<ffffffff8022593b>] sys32_execve+0x3e/0x5c
[<ffffffff80225765>] ia32_ptregs_common+0x25/0x4c
Code: ba 00 00 00 00 00 e2 ff ff 48 c1 e8 0c 48 6b f0 38 48 01 d6 66 83
3e 00 79 04 48 8b 76 10 48 8b 06 84 c0 78 14 66 a9 00 c0 75 04 <0f> 0b
eb fe 48 89 f7 e8 35 13 fe ff eb 48 48 8b 4d 08 48 8b 7e
RIP [<ffffffff8028d2f8>] kfree+0x7c/0xdb
RSP <ffff880077de1d38>
---[ end trace ba800619f794f281 ]---

</snip>


I've placed a PDF with the annotated USB log at

http://www.oncaphillis.net/usb.pdf


Thank you

O.


2011-04-04 13:06:07

by Xiaofan Chen

Subject: Re: [Libusb-devel] Kernel bug message and missing data on libusb_interrupt_transfer

On Mon, Apr 4, 2011 at 8:14 PM, Oncaphillis <[email protected]> wrote:
> Hi,
>
> We are experiencing sporadic kernel bug messages and total
> kernel freezes on usb communication via libusb-1.0.6.

First things first: why not try the latest release version, which
is libusb-1.0.8?

Or better yet, try the latest git from the libusb-stuge branch,
which has quite a few fixes post 1.0.8.
http://git.libusb.org/?p=libusb-stuge.git;a=summary;js=1


> We are currently under kernel 2.6.30.10 but have seen
> it also under older and newer kernel (I think the newest was
> a 2.6.36.x).
>



--
Xiaofan

2011-04-04 13:31:41

by Oncaphillis

Subject: Re: [Libusb-devel] Kernel bug message and missing data on libusb_interrupt_transfer

Thanks for the reply.

I'm in an environment where we are
reluctant to switch between versions. I had a hard time
arguing for the 0.1.x to 1.0.x transition. We saw the same bug (or at
least kernel freezes) in 0.1.x. I'll give 1.0.8 a try and report back.
Using a git pull isn't an option for us for political reasons.

Thanks

O.



2011-04-05 09:29:53

by Oncaphillis

Subject: Re: [Libusb-devel] Kernel bug message and missing data on libusb_interrupt_transfer

On 04/04/2011 03:31 PM, Oncaphillis wrote:
> Thanks for the reply.
>
> I'm in an environment where we are
> reluctant to switch between versions. I had a hard time to
> argue for 0.1.x to 1.0.x transition. We saw the same bug (or at least
> kernel freezes in 0.1.x) I'll give 1.0.8 a try and report back.
> Using a git pull isn't an option for us for political reasons.
>
> Thanks
>
> O.
>
>
> On 04/04/2011 03:06 PM, Xiaofan Chen wrote:
>> On Mon, Apr 4, 2011 at 8:14 PM, Oncaphillis<[email protected]> wrote:
>>> Hi,
>>>
>>> We are experiencing sporadic kernel bug messages and total
>>> kernel freezes on usb communication via libusb-1.0.6.
>> First thing first, why not try the latest release version which
>> is libusb-1.08?
>>
>> Or better yet, try the latest git from libusb-stuge branch
>> which has quite some fixes post 1.08.
>> http://git.libusb.org/?p=libusb-stuge.git;a=summary;js=1
>>
Ok -- tried libusb 1.0.8.

Now I get

<snip>
------------[ cut here ]------------
kernel BUG at mm/slub.c:2808!
invalid opcode: 0000 [#2] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:1d.7/usb2/2-3/bConfigurationValue
CPU 1
Modules linked in:
Pid: 14293, comm: E25Stress Tainted: G D 2.6.30.10 #2 To Be Filled By O.E.M.
RIP: 0010:[<ffffffff8028d2f8>] [<ffffffff8028d2f8>] kfree+0x7c/0xdb
RSP: 0000:ffff88007847fb98 EFLAGS: 00010246
RAX: 4000000000000000 RBX: ffff880077c4a772 RCX: 0000000009f110a0
RDX: ffffe20000000000 RSI: ffffe20001a33030 RDI: ffff880077c4a772
RBP: ffff88007847fbb8 R08: ffff8800650dda40 R09: 00000000ffe1caa8
R10: ffff88007847e000 R11: 0000000000000000 R12: ffff88007b43fe40
R13: ffffffff80523871 R14: 0000000009f11080 R15: 00000000ffe1ca90
FS: 0000000000000000(0000) GS:ffff880001029000(0063) knlGS:00000000558356d0
CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 0000000055577000 CR3: 000000007842c000 CR4: 00000000000406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process E25Stress (pid: 14293, threadinfo ffff88007847e000, task ffff88007b46cec0)
Stack:
ffffffff8029ee80 ffff88007b389e40 ffff88007b43fe40 ffff88007b389e40
ffff88007847fbd8 ffffffff80523871 0000000000000000 ffff87fffffffffd
ffff88007847fc18 ffffffff80523eee 0000000000200200 00000000ffffffff
Call Trace:
[<ffffffff8029ee80>] ? __pollwait+0x0/0xbf
[<ffffffff80523871>] free_async+0x22/0x47
[<ffffffff80523eee>] processcompl_compat+0xe3/0x108
[<ffffffff8052611f>] usbdev_ioctl+0x114f/0x157d
[<ffffffff8029a046>] ? __link_path_walk+0x130/0xc32
[<ffffffff802299e6>] ? dequeue_task_fair+0x13e/0x14d
[<ffffffff80209b30>] ? __switch_to+0x13d/0x269
[<ffffffff8029ce8e>] vfs_ioctl+0x6a/0x82
[<ffffffff8029d2be>] do_vfs_ioctl+0x418/0x459
[<ffffffff80246a12>] ? hrtimer_try_to_cancel+0x67/0x72
[<ffffffff8029d341>] sys_ioctl+0x42/0x65
[<ffffffff802c2cfd>] do_ioctl32_pointer+0xb/0xd
[<ffffffff802c4790>] compat_sys_ioctl+0x2fa/0x34a
[<ffffffff8029ddac>] ? poll_select_set_timeout+0x61/0x7c
[<ffffffff80225602>] ia32_sysret+0x0/0xa
Code: ba 00 00 00 00 00 e2 ff ff 48 c1 e8 0c 48 6b f0 38 48 01 d6 66 83
3e 00 79 04 48 8b 76 10 48 8b 06 84 c0 78 14 66 a9 00 c0 75 04 <0f> 0b
eb fe 48 89 f7 e8 35 13 fe ff eb 48 48 8b 4d 08 48 8b 7e
RIP [<ffffffff8028d2f8>] kfree+0x7c/0xdb
RSP <ffff88007847fb98>
---[ end trace 840ffcb16d410ea4 ]---
</snip>

Surprisingly, the kernel is now reported as tainted although there isn't
any module inserted.
The timing seems to be the same as can be seen at
http://oncaphillis.net/usb.pdf
but now we actually read data; it's just not the #72A7 sequence we are
expecting.

Thanks
O.


> _______________________________________________
> Libusb-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/libusb-devel

2011-04-05 14:13:54

by Alan Stern

Subject: Re: [Libusb-devel] Kernel bug message and missing data on libusb_interrupt_transfer

On Tue, 5 Apr 2011, Oncaphillis wrote:

> Now I get
>
> <snip>
> ------------[ cut here ]------------
> kernel BUG at mm/slub.c:2808!
> invalid opcode: 0000 [#2] SMP

Can you duplicate this using either a 2.6.37 or 2.6.38 kernel?

Alan Stern

2011-04-05 14:21:44

by Oncaphillis

Subject: Re: [Libusb-devel] Kernel bug message and missing data on libusb_interrupt_transfer

On 04/05/2011 04:13 PM, Alan Stern wrote:
> On Tue, 5 Apr 2011, Oncaphillis wrote:
>
>> Now I get
>>
>> <snip>
>> ------------[ cut here ]------------
>> kernel BUG at mm/slub.c:2808!
>> invalid opcode: 0000 [#2] SMP
> Can you duplicate this using either a 2.6.37 or 2.6.38 kernel?
>
> Alan Stern
>
The highest kernel version we've tried (as far as I remember) was
2.6.36. Have there been issues which might have been solved between 36
and 3(7|8)?

I'll give it a try.

O.


2011-04-05 14:42:26

by Alan Stern

Subject: Re: [Libusb-devel] Kernel bug message and missing data on libusb_interrupt_transfer

On Tue, 5 Apr 2011, Oncaphillis wrote:

> On 04/05/2011 04:13 PM, Alan Stern wrote:
> > On Tue, 5 Apr 2011, Oncaphillis wrote:
> >
> >> Now I get
> >>
> >> <snip>
> >> ------------[ cut here ]------------
> >> kernel BUG at mm/slub.c:2808!
> >> invalid opcode: 0000 [#2] SMP
> > Can you duplicate this using either a 2.6.37 or 2.6.38 kernel?
> >
> > Alan Stern
> >
> The highest kernel version we've tried (as far as i remember) was the
> 2.6.36. Have there been issues which might have been solved betwen 36
> and 3(7|8) ?

I don't know. But the core kernel developers always want to hear about
problems reported against the most recent version possible. If you
could test 2.6.39-rc1 (or -rc2, which should be released in a day or
so), that would be even better.

> I'll give it a try.

Thanks.

Alan Stern

2011-04-05 14:55:01

by Segher Boessenkool

Subject: Re: [Libusb-devel] Kernel bug message and missing data on libusb_interrupt_transfer

>> I'm in an environment where we are
>> reluctant to switch between versions. I had a hard time to

> Pid: 14293, comm: E25Stress Tainted: G D 2.6.30.10 #2 To Be
> Filled By O.E.M.
> RIP: 0010:[<ffffffff8028d2f8>] [<ffffffff8028d2f8>] kfree+0x7c/0xdb

> Suprisingly now the kernel is reported as tainted although there isn't
> any module inserted the

That is not true; "G" means you have loaded a (GPL-compatible) module.
"D" means the kernel died. It died because it tried to dealloc memory
using a bad pointer, perhaps a null pointer.
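
As a rough illustration of how the "Tainted:" field in an oops line can be read, here is a small sketch. The helper names `taint_flags()` and `taint_letter_meaning()` are hypothetical, and the letter meanings are paraphrased on the assumption that they match the kernel's tainted-kernels documentation, where 'G' is printed in the slot that would otherwise show 'P'. Only the letters relevant to this thread are covered.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Meanings of a few taint letters (assumed from the kernel's
 * tainted-kernels documentation). Other letters exist; this sketch
 * reports them as unknown by returning NULL. */
const char *taint_letter_meaning(char c)
{
    switch (c) {
    case 'G': return "no proprietary module loaded";
    case 'P': return "proprietary module loaded";
    case 'D': return "kernel died recently (earlier oops or BUG)";
    default:  return NULL;
    }
}

/* Return a pointer to the flag letters that follow "Tainted: " in an
 * oops line, or NULL if the marker is absent (as in "Not tainted"). */
const char *taint_flags(const char *line)
{
    static const char marker[] = "Tainted: ";
    const char *p;

    assert(line != NULL);
    p = strstr(line, marker);
    return p != NULL ? p + sizeof marker - 1 : NULL;
}
```

Applied to the two reports in this thread, the first line ("Not tainted") yields no flags, while the second yields a string starting with "G D". The second report is numbered [#2], so the 'D' there is presumably left over from the first BUG.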

Your kernel is two years old, you should try a current kernel.


Segher

2011-04-05 15:15:42

by Oncaphillis

Subject: Re: [Libusb-devel] Kernel bug message and missing data on libusb_interrupt_transfer

On 04/05/2011 04:54 PM, Segher Boessenkool wrote:
>>> I'm in an environment where we are
>>> reluctant to switch between versions. I had a hard time to
>> Pid: 14293, comm: E25Stress Tainted: G D 2.6.30.10 #2 To Be
>> Filled By O.E.M.
>> RIP: 0010:[<ffffffff8028d2f8>] [<ffffffff8028d2f8>] kfree+0x7c/0xdb
>> Suprisingly now the kernel is reported as tainted although there isn't
>> any module inserted the
> That is not true; "G" means you have loaded a (GPL-compatible) module.
> "D" means the kernel died. It died because it tried to dealloc memory
> using a bad pointer, perhaps a null pointer.
>
> Your kernel is two years old, you should try a current kernel.
>
>

Thanks. I didn't know anything about the 'G D' convention, although
/proc/modules is empty (I don't have lsmod on the target system).

We have already tried 2.6.36.x and got the same error.
Alan Stern recommended testing 2.6.3(8|9-rc1), which I'm currently doing.

O.

> Segher
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2011-04-06 10:37:30

by Oncaphillis

Subject: Re: [Libusb-devel] Kernel bug message and missing data on libusb_interrupt_transfer

On 04/05/2011 04:42 PM, Alan Stern wrote:
> On Tue, 5 Apr 2011, Oncaphillis wrote:
>
>> On 04/05/2011 04:13 PM, Alan Stern wrote:
>>> On Tue, 5 Apr 2011, Oncaphillis wrote:
>>>
>>>> Now I get
>>>>
>>>> <snip>
>>>> ------------[ cut here ]------------
>>>> kernel BUG at mm/slub.c:2808!
>>>> invalid opcode: 0000 [#2] SMP
>>> Can you duplicate this using either a 2.6.37 or 2.6.38 kernel?
>>>
>>> Alan Stern
>>>
>> The highest kernel version we've tried (as far as i remember) was the
>> 2.6.36. Have there been issues which might have been solved betwen 36
>> and 3(7|8) ?
> I don't know. But the core kernel developers always want to hear about
> problems reported against the most recent version possible. If you
> could test 2.6.39-rc1 (or -rc2, which should be released in a day or
> so), that would be even better.
Ok -- kernel 2.6.38.2 together with libusb-1.0.8 seems to behave well.
This was only a minimal overnight test, not the full functionality we
are trying to implement, but it looks promising.

Thank you

O.


2011-04-08 09:07:12

by Oncaphillis

Subject: Re: [Libusb-devel] Kernel bug message and missing data on libusb_interrupt_transfer

On 04/06/2011 12:37 PM, Oncaphillis wrote:
> On 04/05/2011 04:42 PM, Alan Stern wrote:
>> On Tue, 5 Apr 2011, Oncaphillis wrote:
>>
>>> On 04/05/2011 04:13 PM, Alan Stern wrote:
>>>> On Tue, 5 Apr 2011, Oncaphillis wrote:
>>>>
>>>>> Now I get
>>>>>
>>>>> <snip>
>>>>> ------------[ cut here ]------------
>>>>> kernel BUG at mm/slub.c:2808!
>>>>> invalid opcode: 0000 [#2] SMP
>>>> Can you duplicate this using either a 2.6.37 or 2.6.38 kernel?
>>>>
>>>> Alan Stern
>>>>
>>> The highest kernel version we've tried (as far as i remember) was the
>>> 2.6.36. Have there been issues which might have been solved betwen 36
>>> and 3(7|8) ?
>> I don't know. But the core kernel developers always want to hear about
>> problems reported against the most recent version possible. If you
>> could test 2.6.39-rc1 (or -rc2, which should be released in a day or
>> so), that would be even better.
> Ok -- kernel 2.6.38.2 together with libusb-1.0.8 seems to behave well.
> This was only a minimal overnight test. Not the full functionality we
> are trying to implement. But it looks promising.
>
> Thank you
>
> O.
>
Just a short update:

It doesn't seem to have anything to do with the libusb-1.0.x version but
with the --(enable|disable)-timerfd option of libusb's configure. If
libusb was configured with --disable-timerfd, which was the case for our
1.0.6 libusb, I get the kernel bug message in slub.c and eventually a
kernel freeze. This holds true for libusb-1.0.8 and kernel 2.6.38.2.

O.


2011-04-08 10:09:39

by Xiaofan Chen

Subject: Re: [Libusb-devel] Kernel bug message and missing data on libusb_interrupt_transfer

On Fri, Apr 8, 2011 at 5:07 PM, Oncaphillis <[email protected]> wrote:
> Just a short update,
>
> It doesn't seem to have anything to do with the libsub-1.0.x) version but
> with the --(enable|disable)-timerfd option of libusb configure. If libusb
> was configured with --disable-timerfd, which was the case for our 1.0.6 libusb
> I get the kernel bug message in in slub.c module and eventually a
> kernel freeze. This holds true for libusb-1.0.8 and kernel 2.6.38.2

What if you try libusb-stuge git tree? It has some fixes for the Linux side.

--
Xiaofan

2011-04-08 14:37:50

by Alan Stern

Subject: Re: [Libusb-devel] Kernel bug message and missing data on libusb_interrupt_transfer

On Fri, 8 Apr 2011, Oncaphillis wrote:

> Just a short update,
>
> It doesn't seem to have anything to do with the libsub-1.0.x) version but
> with the --(enable|disable)-timerfd option of libusb configure. If
> libusb was
> configured with --disable-timerfd, which was the case for our 1.0.6 libusb
> I get the kernel bug message in in slub.c module and eventually a
> kernel freeze. This holds true for libusb-1.0.8 and kernel 2.6.38.2

Can you post an example of the bug message (together with an
explanation of what you did to cause it) to LKML? Change the Subject:
line to something highly visible, like "Kernel BUG in 2.6.38.2 slub.c".

Alan Stern