2009-06-22 12:05:23

by Marek Szyprowski

[permalink] [raw]
Subject: PROBLEM: kernel oops with g_serial USB gadget on 2.6.30

Hello,

I would like to ask if someone has successfully used g_serial USB gadget driver with kernel 2.6.29 or 2.6.30? I'm developing a low level hardware driver for USB gadgets on ARM S3C6410 platform. This driver is working quite fine (I've used it a lot with g_ether CDC/RNDIS ethernet gadget driver). During my development I've found the following bug in g_serial driver:

On Linux host:

# cat /dev/ttyACM0

On device:

# cat >/dev/ttyGS0
[ 1897.350000] BUG: sleeping function called from invalid context at kernel/mutex.c:94
[ 1897.360000] in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper
[ 1897.370000] [<c0024afc>] (unwind_backtrace+0x0/0xdc) from [<c0267218>] (mutex_lock+0x14/0x30)
[ 1897.380000] [<c0267218>] (mutex_lock+0x14/0x30) from [<c014ceac>] (echo_char_raw+0x1c/0x48)
[ 1897.390000] [<c014ceac>] (echo_char_raw+0x1c/0x48) from [<c014ecdc>] (n_tty_receive_buf+0x9dc/0xec4)
[ 1897.390000] [<c014ecdc>] (n_tty_receive_buf+0x9dc/0xec4) from [<c01513bc>] (flush_to_ldisc+0x104/0x1b0)
[ 1897.400000] [<c01513bc>] (flush_to_ldisc+0x104/0x1b0) from [<bf001ce8>] (gs_rx_push+0x118/0x1cc [g_serial])
[ 1897.410000] [<bf001ce8>] (gs_rx_push+0x118/0x1cc [g_serial]) from [<c003c7c0>] (tasklet_action+0x78/0xc8)
[ 1897.420000] [<c003c7c0>] (tasklet_action+0x78/0xc8) from [<c003cc54>] (__do_softirq+0x6c/0xf4)
[ 1897.430000] [<c003cc54>] (__do_softirq+0x6c/0xf4) from [<c003cd20>] (irq_exit+0x44/0x5c)
[ 1897.440000] [<c003cd20>] (irq_exit+0x44/0x5c) from [<c001e054>] (_text+0x54/0x6c)
[ 1897.450000] [<c001e054>] (_text+0x54/0x6c) from [<c001ea28>] (__irq_svc+0x48/0x9c)
[ 1897.450000] Exception stack(0xc0349f70 to 0xc0349fb8)
[ 1897.460000] 9f60: 00000001 00000000 f4100000 00000021
[ 1897.470000] 9f80: c002032c c0348000 c002032c c036c8c4 5001aab8 410fb766 5001aa84 00000000
[ 1897.480000] 9fa0: c0349fc0 c0349fb8 c0020394 c0020398 20000013 ffffffff
[ 1897.480000] [<c001ea28>] (__irq_svc+0x48/0x9c) from [<c0020394>] (default_idle+0x68/0x7c)
[ 1897.490000] [<c0020394>] (default_idle+0x68/0x7c) from [<c0020300>] (cpu_idle+0x30/0x5c)
[ 1897.500000] [<c0020300>] (cpu_idle+0x30/0x5c) from [<c0008a68>] (start_kernel+0x1f8/0x244)
[ 1897.510000] [<c0008a68>] (start_kernel+0x1f8/0x244) from [<50008034>] (0x50008034)
[ 1897.520000] ------------[ cut here ]------------
[ 1897.520000] WARNING: at kernel/mutex.c:207 __mutex_lock_slowpath+0x6c/0x2c8()
[ 1897.530000] Modules linked in: g_serial
[ 1897.530000] [<c0024afc>] (unwind_backtrace+0x0/0xdc) from [<c0037ed4>] (warn_slowpath_common+0x4c/0x80)
[ 1897.540000] [<c0037ed4>] (warn_slowpath_common+0x4c/0x80) from [<c026687c>] (__mutex_lock_slowpath+0x6c/0x2c8)
[ 1897.550000] [<c026687c>] (__mutex_lock_slowpath+0x6c/0x2c8) from [<c0267220>] (mutex_lock+0x1c/0x30)
[ 1897.560000] [<c0267220>] (mutex_lock+0x1c/0x30) from [<c014ceac>] (echo_char_raw+0x1c/0x48)
[ 1897.570000] [<c014ceac>] (echo_char_raw+0x1c/0x48) from [<c014ecdc>] (n_tty_receive_buf+0x9dc/0xec4)
[ 1897.580000] [<c014ecdc>] (n_tty_receive_buf+0x9dc/0xec4) from [<c01513bc>] (flush_to_ldisc+0x104/0x1b0)
[ 1897.590000] [<c01513bc>] (flush_to_ldisc+0x104/0x1b0) from [<bf001ce8>] (gs_rx_push+0x118/0x1cc [g_serial])
[ 1897.600000] [<bf001ce8>] (gs_rx_push+0x118/0x1cc [g_serial]) from [<c003c7c0>] (tasklet_action+0x78/0xc8)
[ 1897.610000] [<c003c7c0>] (tasklet_action+0x78/0xc8) from [<c003cc54>] (__do_softirq+0x6c/0xf4)
[ 1897.620000] [<c003cc54>] (__do_softirq+0x6c/0xf4) from [<c003cd20>] (irq_exit+0x44/0x5c)
[ 1897.620000] [<c003cd20>] (irq_exit+0x44/0x5c) from [<c001e054>] (_text+0x54/0x6c)
[ 1897.630000] [<c001e054>] (_text+0x54/0x6c) from [<c001ea28>] (__irq_svc+0x48/0x9c)
[ 1897.640000] Exception stack(0xc0349f70 to 0xc0349fb8)
[ 1897.640000] 9f60: 00000001 00000000 f4100000 00000021
[ 1897.650000] 9f80: c002032c c0348000 c002032c c036c8c4 5001aab8 410fb766 5001aa84 00000000
[ 1897.660000] 9fa0: c0349fc0 c0349fb8 c0020394 c0020398 20000013 ffffffff
[ 1897.670000] [<c001ea28>] (__irq_svc+0x48/0x9c) from [<c0020394>] (default_idle+0x68/0x7c)
[ 1897.680000] [<c0020394>] (default_idle+0x68/0x7c) from [<c0020300>] (cpu_idle+0x30/0x5c)
[ 1897.680000] [<c0020300>] (cpu_idle+0x30/0x5c) from [<c0008a68>] (start_kernel+0x1f8/0x244)
[ 1897.690000] [<c0008a68>] (start_kernel+0x1f8/0x244) from [<50008034>] (0x50008034)
[ 1897.700000] ---[ end trace 6355fd65cb30f602 ]---

At USB protocol level everything looks fine (I've used in-kernel usb logger in Linux host machine). This looks like a bug somewhere in tty handling code. Is this problem known?

Best regards
--
Marek Szyprowski
Samsung Poland R&D Center


2009-06-22 12:34:35

by Peter Korsgaard

[permalink] [raw]
Subject: Re: PROBLEM: kernel oops with g_serial USB gadget on 2.6.30

>>>>> "Marek" == Marek Szyprowski <[email protected]> writes:

Marek> Hello,

Marek> I would like to ask if someone has successfully used g_serial
Marek> USB gadget driver with kernel 2.6.29 or 2.6.30? I'm developing
Marek> a low level hardware driver for USB gadgets on ARM S3C6410
Marek> platform. This driver is working quite fine (I've used it a
Marek> lot with g_ether CDC/RNDIS ethernet gadget driver). During my
Marek> development I've found the following bug in g_serial driver:

You are aware that Ben Dooks has written an UDC driver for the OTG
controller on the s3c6410 which is now in mainline, right?

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=5b7d70c6dbf2db786395cbd21750a1a4ce222f84

I've used the g_serial driver on 2.6.29 without problems (not on
s3c6410 though).

--
Bye, Peter Korsgaard

2009-06-22 13:56:36

by Marek Szyprowski

[permalink] [raw]
Subject: RE: PROBLEM: kernel oops with g_serial USB gadget on 2.6.30

Hello,

On Monday, June 22, 2009 2:34 PM, Peter Korsgaard wrote:

> Marek> I would like to ask if someone has successfully used g_serial
> Marek> USB gadget driver with kernel 2.6.29 or 2.6.30? I'm developing
> Marek> a low level hardware driver for USB gadgets on ARM S3C6410
> Marek> platform. This driver is working quite fine (I've used it a
> Marek> lot with g_ether CDC/RNDIS ethernet gadget driver). During my
> Marek> development I've found the following bug in g_serial driver:
>
> You are aware that Ben Dooks has written an UDC driver for the OTG
> controller on the s3c6410 which is now in mainline, right?
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-
> 2.6.git;a=commit;h=5b7d70c6dbf2db786395cbd21750a1a4ce222f84

Yes, I am aware. That driver does not work well for me (but this is the
other case). However I did a test with his driver and the result was also
a kernel oops:

[ 162.520000] BUG: sleeping function called from invalid context at
kernel/mutex.c:94
[ 162.530000] in_atomic(): 1, irqs_disabled(): 0, pid: 504, name: cat
[ 162.540000] [<c0024afc>] (unwind_backtrace+0x0/0xdc) from [<c0268a28>]
(mutex_lock+0x14/0x30)
[ 162.540000] [<c0268a28>] (mutex_lock+0x14/0x30) from [<c014ceb4>]
(echo_set_canon_col+0x1c/0x44)
[ 162.550000] [<c014ceb4>] (echo_set_canon_col+0x1c/0x44) from [<c014eeb0>]
(n_tty_receive_buf+0xbf0/0xec4)
[ 162.560000] [<c014eeb0>] (n_tty_receive_buf+0xbf0/0xec4) from
[<c015137c>] (flush_to_ldisc+0x104/0x1b0)
[ 162.570000] [<c015137c>] (flush_to_ldisc+0x104/0x1b0) from [<bf002238>]
(gs_rx_push+0x118/0x1cc [g_serial])
[ 162.580000] [<bf002238>] (gs_rx_push+0x118/0x1cc [g_serial]) from
[<c003c780>] (tasklet_action+0x78/0xc8)
[ 162.590000] [<c003c780>] (tasklet_action+0x78/0xc8) from [<c003cc14>]
(__do_softirq+0x6c/0xf4)
[ 162.600000] [<c003cc14>] (__do_softirq+0x6c/0xf4) from [<c003cce0>]
(irq_exit+0x44/0x5c)
[ 162.610000] [<c003cce0>] (irq_exit+0x44/0x5c) from [<c001e054>]
(_text+0x54/0x6c)
[ 162.610000] [<c001e054>] (_text+0x54/0x6c) from [<c001ea28>]
(__irq_svc+0x48/0x9c)
[ 162.620000] Exception stack(0xc6567e78 to 0xc6567ec0)
[ 162.630000] 7e60:
c6c883c0 00000013
[ 162.640000] 7e80: ffffffff 00000001 00000013 00000004 c64df800 00000013
c6566000 c659c0a0
[ 162.640000] 7ea0: c6526c30 c64df800 c6567e80 c6567ec0 c02699bc c02699d0
60000013 ffffffff
[ 162.650000] [<c001ea28>] (__irq_svc+0x48/0x9c) from [<c02699bc>]
(_spin_unlock_irqrestore+0xc/0x44)
[ 162.660000] [<c02699bc>] (_spin_unlock_irqrestore+0xc/0x44) from
[<bf000d24>] (gs_write+0x5c/0x68 [g_serial])
[ 162.670000] [<bf000d24>] (gs_write+0x5c/0x68 [g_serial]) from
[<c014d870>] (n_tty_write+0x240/0x39c)
[ 162.680000] [<c014d870>] (n_tty_write+0x240/0x39c) from [<c014b078>]
(tty_write+0x180/0x21c)
[ 162.690000] [<c014b078>] (tty_write+0x180/0x21c) from [<c008c4d4>]
(vfs_write+0xac/0x158)
[ 162.700000] [<c008c4d4>] (vfs_write+0xac/0x158) from [<c008c638>]
(sys_write+0x40/0x6c)
[ 162.700000] [<c008c638>] (sys_write+0x40/0x6c) from [<c001ede0>]
(ret_fast_syscall+0x0/0x2c)
[ 162.710000] ------------[ cut here ]------------
[ 162.720000] WARNING: at kernel/mutex.c:207
__mutex_lock_slowpath+0x6c/0x2c8()
[ 162.720000] Modules linked in: g_serial
[ 162.730000] [<c0024afc>] (unwind_backtrace+0x0/0xdc) from [<c0037e94>]
(warn_slowpath_common+0x4c/0x80)
[ 162.740000] [<c0037e94>] (warn_slowpath_common+0x4c/0x80) from
[<c026808c>] (__mutex_lock_slowpath+0x6c/0x2c8)
[ 162.750000] [<c026808c>] (__mutex_lock_slowpath+0x6c/0x2c8) from
[<c0268a30>] (mutex_lock+0x1c/0x30)
[ 162.760000] [<c0268a30>] (mutex_lock+0x1c/0x30) from [<c014ceb4>]
(echo_set_canon_col+0x1c/0x44)
[ 162.770000] [<c014ceb4>] (echo_set_canon_col+0x1c/0x44) from [<c014eeb0>]
(n_tty_receive_buf+0xbf0/0xec4)
[ 162.780000] [<c014eeb0>] (n_tty_receive_buf+0xbf0/0xec4) from
[<c015137c>] (flush_to_ldisc+0x104/0x1b0)
[ 162.780000] [<c015137c>] (flush_to_ldisc+0x104/0x1b0) from [<bf002238>]
(gs_rx_push+0x118/0x1cc [g_serial])
[ 162.790000] [<bf002238>] (gs_rx_push+0x118/0x1cc [g_serial]) from
[<c003c780>] (tasklet_action+0x78/0xc8)
[ 162.800000] [<c003c780>] (tasklet_action+0x78/0xc8) from [<c003cc14>]
(__do_softirq+0x6c/0xf4)
[ 162.810000] [<c003cc14>] (__do_softirq+0x6c/0xf4) from [<c003cce0>]
(irq_exit+0x44/0x5c)
[ 162.820000] [<c003cce0>] (irq_exit+0x44/0x5c) from [<c001e054>]
(_text+0x54/0x6c)
[ 162.830000] [<c001e054>] (_text+0x54/0x6c) from [<c001ea28>]
(__irq_svc+0x48/0x9c)
[ 162.840000] Exception stack(0xc6567e78 to 0xc6567ec0)
[ 162.840000] 7e60:
c6c883c0 00000013
[ 162.850000] 7e80: ffffffff 00000001 00000013 00000004 c64df800 00000013
c6566000 c659c0a0
[ 162.860000] 7ea0: c6526c30 c64df800 c6567e80 c6567ec0 c02699bc c02699d0
60000013 ffffffff
[ 162.870000] [<c001ea28>] (__irq_svc+0x48/0x9c) from [<c02699bc>]
(_spin_unlock_irqrestore+0xc/0x44)
[ 162.870000] [<c02699bc>] (_spin_unlock_irqrestore+0xc/0x44) from
[<bf000d24>] (gs_write+0x5c/0x68 [g_serial])
[ 162.880000] [<bf000d24>] (gs_write+0x5c/0x68 [g_serial]) from
[<c014d870>] (n_tty_write+0x240/0x39c)
[ 162.890000] [<c014d870>] (n_tty_write+0x240/0x39c) from [<c014b078>]
(tty_write+0x180/0x21c)
[ 162.900000] [<c014b078>] (tty_write+0x180/0x21c) from [<c008c4d4>]
(vfs_write+0xac/0x158)
[ 162.910000] [<c008c4d4>] (vfs_write+0xac/0x158) from [<c008c638>]
(sys_write+0x40/0x6c)
[ 162.920000] [<c008c638>] (sys_write+0x40/0x6c) from [<c001ede0>]
(ret_fast_syscall+0x0/0x2c)
[ 162.930000] ---[ end trace aa3ca3a4bfa49b54 ]---
[ 162.930000] BUG: scheduling while atomic: cat/504/0x00000104
[ 162.940000] Modules linked in: g_serial
[ 162.940000]
[ 162.940000] Pid: 504, comm: cat
[ 162.950000] CPU: 0 Tainted: G W (2.6.30 #334)
[ 162.950000] PC is at _spin_unlock_irqrestore+0x20/0x44
[ 162.960000] LR is at _spin_unlock_irqrestore+0xc/0x44
[ 162.960000] pc : [<c02699d0>] lr : [<c02699bc>] psr: 60000013
[ 162.960000] sp : c6567ec0 ip : c6567e80 fp : c64df800
[ 162.970000] r10: c6526c30 r9 : c659c0a0 r8 : c6566000
[ 162.980000] r7 : 00000013 r6 : c64df800 r5 : 00000004 r4 : 00000013
[ 162.990000] r3 : 00000001 r2 : ffffffff r1 : 00000013 r0 : c6c883c0
[ 162.990000] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment
user
[ 163.000000] Control: 00c5387d Table: 565d0008 DAC: 00000015

When I disabled Preemptive kernel and disabled debugs in Kernel Hacking
section I only got the following message:

[ 69.810000] BUG: scheduling while atomic: cat/499/0x00000100
[ 69.820000] Modules linked in: g_serial
[ 69.820000]
[ 69.820000] Pid: 499, comm: cat
[ 69.830000] CPU: 0 Not tainted (2.6.30 #336)
[ 69.830000] PC is at gs_write+0x50/0x58 [g_serial]
[ 69.840000] LR is at check_preempt_wakeup+0x24/0x114
[ 69.840000] pc : [<bf000a48>] lr : [<c002fc5c>] psr: 60000013
[ 69.840000] sp : c6553ed0 ip : c64698e0 fp : c6469c00
[ 69.850000] r10: c6469b80 r9 : c6ffd300 r8 : c6552000
[ 69.860000] r7 : 00000005 r6 : 00000004 r5 : 00000013 r4 : c6ffd980
[ 69.860000] r3 : 00000004 r2 : 00000000 r1 : 00000001 r0 : 00000000
[ 69.870000] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment
user
[ 69.880000] Control: 00c5387d Table: 56580008 DAC: 00000015

> I've used the g_serial driver on 2.6.29 without problems (not on
> s3c6410 though).

On what hardware you use the g_serial driver? It is ARM-based? I
understand that this might be also related to the way that low level
hardware gadget driver is implemented, but I really have no idea how
to hunt for this bug.

Best regards
--
Marek Szyprowski
Samsung Poland R&D Center


2009-06-22 14:06:59

by Peter Korsgaard

[permalink] [raw]
Subject: Re: PROBLEM: kernel oops with g_serial USB gadget on 2.6.30

>>>>> "Marek" == Marek Szyprowski <[email protected]> writes:

Hi,

Marek> Yes, I am aware. That driver does not work well for me (but
Marek> this is the other case). However I did a test with his driver
Marek> and the result was also a kernel oops:

Ok :/

Any plans on merging your stuff with Ben's driver?

>> I've used the g_serial driver on 2.6.29 without problems (not on
>> s3c6410 though).

Marek> On what hardware you use the g_serial driver? It is ARM-based?
Marek> I understand that this might be also related to the way that
Marek> low level hardware gadget driver is implemented, but I really
Marek> have no idea how to hunt for this bug.

No, it was on PowerPC (mpc8347).

--
Bye, Peter Korsgaard

2009-06-22 14:07:16

by Alan Stern

[permalink] [raw]
Subject: RE: PROBLEM: kernel oops with g_serial USB gadget on 2.6.30

On Mon, 22 Jun 2009, Marek Szyprowski wrote:

> Hello,
>
> On Monday, June 22, 2009 2:34 PM, Peter Korsgaard wrote:
>
> > Marek> I would like to ask if someone has successfully used g_serial
> > Marek> USB gadget driver with kernel 2.6.29 or 2.6.30? I'm developing
> > Marek> a low level hardware driver for USB gadgets on ARM S3C6410
> > Marek> platform. This driver is working quite fine (I've used it a
> > Marek> lot with g_ether CDC/RNDIS ethernet gadget driver). During my
> > Marek> development I've found the following bug in g_serial driver:
> >
> > You are aware that Ben Dooks has written an UDC driver for the OTG
> > controller on the s3c6410 which is now in mainline, right?
> >
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-
> > 2.6.git;a=commit;h=5b7d70c6dbf2db786395cbd21750a1a4ce222f84
>
> Yes, I am aware. That driver does not work well for me (but this is the
> other case). However I did a test with his driver and the result was also
> a kernel oops:
>
> [ 162.520000] BUG: sleeping function called from invalid context at
> kernel/mutex.c:94
> [ 162.530000] in_atomic(): 1, irqs_disabled(): 0, pid: 504, name: cat
> [ 162.540000] [<c0024afc>] (unwind_backtrace+0x0/0xdc) from [<c0268a28>]
> (mutex_lock+0x14/0x30)
> [ 162.540000] [<c0268a28>] (mutex_lock+0x14/0x30) from [<c014ceb4>]
> (echo_set_canon_col+0x1c/0x44)
> [ 162.550000] [<c014ceb4>] (echo_set_canon_col+0x1c/0x44) from [<c014eeb0>]
> (n_tty_receive_buf+0xbf0/0xec4)
> [ 162.560000] [<c014eeb0>] (n_tty_receive_buf+0xbf0/0xec4) from
> [<c015137c>] (flush_to_ldisc+0x104/0x1b0)
> [ 162.570000] [<c015137c>] (flush_to_ldisc+0x104/0x1b0) from [<bf002238>]
> (gs_rx_push+0x118/0x1cc [g_serial])
> [ 162.580000] [<bf002238>] (gs_rx_push+0x118/0x1cc [g_serial]) from
> [<c003c780>] (tasklet_action+0x78/0xc8)
> [ 162.590000] [<c003c780>] (tasklet_action+0x78/0xc8) from [<c003cc14>]
> (__do_softirq+0x6c/0xf4)
> [ 162.600000] [<c003cc14>] (__do_softirq+0x6c/0xf4) from [<c003cce0>]
> (irq_exit+0x44/0x5c)
> [ 162.610000] [<c003cce0>] (irq_exit+0x44/0x5c) from [<c001e054>]
> (_text+0x54/0x6c)
> [ 162.610000] [<c001e054>] (_text+0x54/0x6c) from [<c001ea28>]
> (__irq_svc+0x48/0x9c)
> [ 162.620000] Exception stack(0xc6567e78 to 0xc6567ec0)
> [ 162.630000] 7e60:
> c6c883c0 00000013
> [ 162.640000] 7e80: ffffffff 00000001 00000013 00000004 c64df800 00000013
> c6566000 c659c0a0
> [ 162.640000] 7ea0: c6526c30 c64df800 c6567e80 c6567ec0 c02699bc c02699d0
> 60000013 ffffffff
> [ 162.650000] [<c001ea28>] (__irq_svc+0x48/0x9c) from [<c02699bc>]
> (_spin_unlock_irqrestore+0xc/0x44)
> [ 162.660000] [<c02699bc>] (_spin_unlock_irqrestore+0xc/0x44) from
> [<bf000d24>] (gs_write+0x5c/0x68 [g_serial])
> [ 162.670000] [<bf000d24>] (gs_write+0x5c/0x68 [g_serial]) from
> [<c014d870>] (n_tty_write+0x240/0x39c)
> [ 162.680000] [<c014d870>] (n_tty_write+0x240/0x39c) from [<c014b078>]
> (tty_write+0x180/0x21c)
> [ 162.690000] [<c014b078>] (tty_write+0x180/0x21c) from [<c008c4d4>]
> (vfs_write+0xac/0x158)
> [ 162.700000] [<c008c4d4>] (vfs_write+0xac/0x158) from [<c008c638>]
> (sys_write+0x40/0x6c)
> [ 162.700000] [<c008c638>] (sys_write+0x40/0x6c) from [<c001ede0>]
> (ret_fast_syscall+0x0/0x2c)
> [ 162.710000] ------------[ cut here ]------------
> [ 162.720000] WARNING: at kernel/mutex.c:207
> __mutex_lock_slowpath+0x6c/0x2c8()
> [ 162.720000] Modules linked in: g_serial
> [ 162.730000] [<c0024afc>] (unwind_backtrace+0x0/0xdc) from [<c0037e94>]
> (warn_slowpath_common+0x4c/0x80)
> [ 162.740000] [<c0037e94>] (warn_slowpath_common+0x4c/0x80) from
> [<c026808c>] (__mutex_lock_slowpath+0x6c/0x2c8)
> [ 162.750000] [<c026808c>] (__mutex_lock_slowpath+0x6c/0x2c8) from
> [<c0268a30>] (mutex_lock+0x1c/0x30)
> [ 162.760000] [<c0268a30>] (mutex_lock+0x1c/0x30) from [<c014ceb4>]
> (echo_set_canon_col+0x1c/0x44)
> [ 162.770000] [<c014ceb4>] (echo_set_canon_col+0x1c/0x44) from [<c014eeb0>]
> (n_tty_receive_buf+0xbf0/0xec4)
> [ 162.780000] [<c014eeb0>] (n_tty_receive_buf+0xbf0/0xec4) from
> [<c015137c>] (flush_to_ldisc+0x104/0x1b0)
> [ 162.780000] [<c015137c>] (flush_to_ldisc+0x104/0x1b0) from [<bf002238>]
> (gs_rx_push+0x118/0x1cc [g_serial])
> [ 162.790000] [<bf002238>] (gs_rx_push+0x118/0x1cc [g_serial]) from
> [<c003c780>] (tasklet_action+0x78/0xc8)
> [ 162.800000] [<c003c780>] (tasklet_action+0x78/0xc8) from [<c003cc14>]
> (__do_softirq+0x6c/0xf4)
> [ 162.810000] [<c003cc14>] (__do_softirq+0x6c/0xf4) from [<c003cce0>]
> (irq_exit+0x44/0x5c)
> [ 162.820000] [<c003cce0>] (irq_exit+0x44/0x5c) from [<c001e054>]
> (_text+0x54/0x6c)
> [ 162.830000] [<c001e054>] (_text+0x54/0x6c) from [<c001ea28>]
> (__irq_svc+0x48/0x9c)
> [ 162.840000] Exception stack(0xc6567e78 to 0xc6567ec0)
> [ 162.840000] 7e60:
> c6c883c0 00000013
> [ 162.850000] 7e80: ffffffff 00000001 00000013 00000004 c64df800 00000013
> c6566000 c659c0a0
> [ 162.860000] 7ea0: c6526c30 c64df800 c6567e80 c6567ec0 c02699bc c02699d0
> 60000013 ffffffff
> [ 162.870000] [<c001ea28>] (__irq_svc+0x48/0x9c) from [<c02699bc>]
> (_spin_unlock_irqrestore+0xc/0x44)
> [ 162.870000] [<c02699bc>] (_spin_unlock_irqrestore+0xc/0x44) from
> [<bf000d24>] (gs_write+0x5c/0x68 [g_serial])
> [ 162.880000] [<bf000d24>] (gs_write+0x5c/0x68 [g_serial]) from
> [<c014d870>] (n_tty_write+0x240/0x39c)
> [ 162.890000] [<c014d870>] (n_tty_write+0x240/0x39c) from [<c014b078>]
> (tty_write+0x180/0x21c)
> [ 162.900000] [<c014b078>] (tty_write+0x180/0x21c) from [<c008c4d4>]
> (vfs_write+0xac/0x158)
> [ 162.910000] [<c008c4d4>] (vfs_write+0xac/0x158) from [<c008c638>]
> (sys_write+0x40/0x6c)
> [ 162.920000] [<c008c638>] (sys_write+0x40/0x6c) from [<c001ede0>]
> (ret_fast_syscall+0x0/0x2c)
> [ 162.930000] ---[ end trace aa3ca3a4bfa49b54 ]---
> [ 162.930000] BUG: scheduling while atomic: cat/504/0x00000104
> [ 162.940000] Modules linked in: g_serial
> [ 162.940000]
> [ 162.940000] Pid: 504, comm: cat
> [ 162.950000] CPU: 0 Tainted: G W (2.6.30 #334)
> [ 162.950000] PC is at _spin_unlock_irqrestore+0x20/0x44
> [ 162.960000] LR is at _spin_unlock_irqrestore+0xc/0x44
> [ 162.960000] pc : [<c02699d0>] lr : [<c02699bc>] psr: 60000013
> [ 162.960000] sp : c6567ec0 ip : c6567e80 fp : c64df800
> [ 162.970000] r10: c6526c30 r9 : c659c0a0 r8 : c6566000
> [ 162.980000] r7 : 00000013 r6 : c64df800 r5 : 00000004 r4 : 00000013
> [ 162.990000] r3 : 00000001 r2 : ffffffff r1 : 00000013 r0 : c6c883c0
> [ 162.990000] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment
> user
> [ 163.000000] Control: 00c5387d Table: 565d0008 DAC: 00000015
>
> When I disabled Preemptive kernel and disabled debugs in Kernel Hacking
> section I only got the following message:
>
> [ 69.810000] BUG: scheduling while atomic: cat/499/0x00000100
> [ 69.820000] Modules linked in: g_serial
> [ 69.820000]
> [ 69.820000] Pid: 499, comm: cat
> [ 69.830000] CPU: 0 Not tainted (2.6.30 #336)
> [ 69.830000] PC is at gs_write+0x50/0x58 [g_serial]
> [ 69.840000] LR is at check_preempt_wakeup+0x24/0x114
> [ 69.840000] pc : [<bf000a48>] lr : [<c002fc5c>] psr: 60000013
> [ 69.840000] sp : c6553ed0 ip : c64698e0 fp : c6469c00
> [ 69.850000] r10: c6469b80 r9 : c6ffd300 r8 : c6552000
> [ 69.860000] r7 : 00000005 r6 : 00000004 r5 : 00000013 r4 : c6ffd980
> [ 69.860000] r3 : 00000004 r2 : 00000000 r1 : 00000001 r0 : 00000000
> [ 69.870000] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment
> user
> [ 69.880000] Control: 00c5387d Table: 56580008 DAC: 00000015
>
> > I've used the g_serial driver on 2.6.29 without problems (not on
> > s3c6410 though).
>
> On what hardware you use the g_serial driver? It is ARM-based? I
> understand that this might be also related to the way that low level
> hardware gadget driver is implemented, but I really have no idea how
> to hunt for this bug.

This is just a guess... But there's a good possibility that the oops
was caused by recent changes to the serial layer which have not been
propagated through to the g_serial driver.

Alan Stern

2009-06-22 14:13:22

by Marek Szyprowski

[permalink] [raw]
Subject: RE: PROBLEM: kernel oops with g_serial USB gadget on 2.6.30

Hello,

On Monday, June 22, 2009 4:07 PM, Alan Stern wrote:

> > > Marek> I would like to ask if someone has successfully used
> g_serial
> > > Marek> USB gadget driver with kernel 2.6.29 or 2.6.30? I'm
> developing
> > > Marek> a low level hardware driver for USB gadgets on ARM S3C6410
> > > Marek> platform. This driver is working quite fine (I've used it a
> > > Marek> lot with g_ether CDC/RNDIS ethernet gadget driver). During
> my
> > > Marek> development I've found the following bug in g_serial
> driver:
> > >
> > > You are aware that Ben Dooks has written an UDC driver for the OTG
> > > controller on the s3c6410 which is now in mainline, right?
> > >
> > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-
> > > 2.6.git;a=commit;h=5b7d70c6dbf2db786395cbd21750a1a4ce222f84
> >
> > Yes, I am aware. That driver does not work well for me (but this is
> the
> > other case). However I did a test with his driver and the result was
> also
> > a kernel oops:
> >
> > [...]
> >
> > > I've used the g_serial driver on 2.6.29 without problems (not on
> > > s3c6410 though).
> >
> > On what hardware you use the g_serial driver? It is ARM-based? I
> > understand that this might be also related to the way that low level
> > hardware gadget driver is implemented, but I really have no idea how
> > to hunt for this bug.
>
> This is just a guess... But there's a good possibility that the oops
> was caused by recent changes to the serial layer which have not been
> propagated through to the g_serial driver.

How recent these changes are? I did a test on another ARM-based Linux
platform with old 2.6.28 kernel and the result was exactly the same as
above...

Best regards
--
Marek Szyprowski
Samsung Poland R&D Center

2009-06-22 15:02:33

by Alan Stern

[permalink] [raw]
Subject: RE: PROBLEM: kernel oops with g_serial USB gadget on 2.6.30

On Mon, 22 Jun 2009, Marek Szyprowski wrote:

> > This is just a guess... But there's a good possibility that the oops
> > was caused by recent changes to the serial layer which have not been
> > propagated through to the g_serial driver.
>
> How recent these changes are? I did a test on another ARM-based Linux
> platform with old 2.6.28 kernel and the result was exactly the same as
> above...

I think the changes are new in 2.6.30. So they aren't the reason for
you oops, after all.

Alan Stern

2009-06-23 03:26:59

by David Brownell

[permalink] [raw]
Subject: Re: PROBLEM: kernel oops with g_serial USB gadget on 2.6.30

On Monday 22 June 2009, Marek Szyprowski wrote:
>
> > This is just a guess... ?But there's a good possibility that the oops
> > was caused by recent changes to the serial layer which have not been
> > propagated through to the g_serial driver.
>
> How recent these changes are? I did a test on another ARM-based Linux
> platform with old 2.6.28 kernel and the result was exactly the same as
> above...

Just for the record, the reworked g_serial code merged in 2.6.27
and was mostly developed on 2.6.25 and 2.6.26 ... and it included
a lot of stress testing. No such mutex_lock() in_irq() bug showed
up at that time. And that was running with all practical kernel
debug options, so it should have showed up if it were that easy.

I do however recall turning up several regressions in how "sparse"
lock checking behaved. As in, it broke when faced with common
idioms like needing to temporarily drop a lock deep in a call stack.

Now, the serial layer has been getting a *LONG* overdue incremental
overhaul since before that started. So there's been plenty of time
for incompatible changes to sneak in; I believe Alan Cox focuses on
host side things, out of defensive necessity.

Like, oh, changing a spinlock to a mutex. You might change the
low_latency setting and review how that's now supposed to behave.

- Dave

2009-06-23 06:42:53

by Marek Szyprowski

[permalink] [raw]
Subject: RE: PROBLEM: kernel oops with g_serial USB gadget on 2.6.30

Hello,

On Tuesday, June 23, 2009 5:27 AM, David Brownell wrote:

> > > This is just a guess... ?But there's a good possibility that the
> oops
> > > was caused by recent changes to the serial layer which have not
> been
> > > propagated through to the g_serial driver.
> >
> > How recent these changes are? I did a test on another ARM-based Linux
> > platform with old 2.6.28 kernel and the result was exactly the same
> as
> > above...
>
> Just for the record, the reworked g_serial code merged in 2.6.27
> and was mostly developed on 2.6.25 and 2.6.26 ... and it included
> a lot of stress testing. No such mutex_lock() in_irq() bug showed
> up at that time. And that was running with all practical kernel
> debug options, so it should have showed up if it were that easy.
>
> I do however recall turning up several regressions in how "sparse"
> lock checking behaved. As in, it broke when faced with common
> idioms like needing to temporarily drop a lock deep in a call stack.
>
> Now, the serial layer has been getting a *LONG* overdue incremental
> overhaul since before that started. So there's been plenty of time
> for incompatible changes to sneak in; I believe Alan Cox focuses on
> host side things, out of defensive necessity.
>
> Like, oh, changing a spinlock to a mutex. You might change the
> low_latency setting and review how that's now supposed to behave.

In in gs_open() function in drivers/usb/gadget/u_serial.c, after the
line 780 I've added:

tty->low_latency = 1;

This helped a bit, but after a few serial transfers I got the same
bug again.

With mutex debugging and preemptive kernel turned off I got the
following crash:

# ls >/dev/ttyGS0
[ 80.070000] BUG: scheduling while atomic: ls/497/0x00000100
[ 80.070000] Modules linked in:
[ 80.070000]
[ 80.070000] Pid: 497, comm: ls
[ 80.070000] CPU: 0 Not tainted (2.6.30 #340)
[ 80.080000] PC is at n_tty_write+0x1fc/0x39c
[ 80.080000] LR is at gs_write_room+0x38/0x54
[ 80.090000] pc : [<c013d7d4>] lr : [<c0184fb0>] psr: 60000013
[ 80.090000] sp : c6573ee0 ip : 00000020 fp : c6505400
[ 80.100000] r10: c6505380 r9 : c6f13f00 r8 : c6572000
[ 80.100000] r7 : 0000005a r6 : c6505400 r5 : c6505000 r4 : 00001fff
[ 80.110000] r3 : 000000a0 r2 : 00000015 r1 : c6e6eb40 r0 : 0000005a
[ 80.120000] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment
user
[ 80.120000] Control: 00c5387d Table: 56580008 DAC: 00000015

With preemptive kernel and mutex debuging:

[ 26.800000] BUG: sleeping function called from invalid context at
kernel/mutex.c:280
[ 26.800000] in_atomic(): 1, irqs_disabled(): 0, pid: 500, name: ls
[ 26.810000] no locks held by ls/500.
[ 26.810000] [<c0025c30>] (unwind_backtrace+0x0/0xdc) from [<c0272570>]
(mutex_lock_nested+0x28/0x304)
[ 26.820000] [<c0272570>] (mutex_lock_nested+0x28/0x304) from [<c0153b0c>]
(echo_set_canon_col+0x20/0x48)
[ 26.830000] [<c0153b0c>] (echo_set_canon_col+0x20/0x48) from [<c0155b2c>]
(n_tty_receive_buf+0xbf8/0xecc)
[ 26.840000] [<c0155b2c>] (n_tty_receive_buf+0xbf8/0xecc) from
[<c0158054>] (flush_to_ldisc+0x104/0x1b0)
[ 26.850000] [<c0158054>] (flush_to_ldisc+0x104/0x1b0) from [<c019e0b8>]
(gs_rx_push+0x118/0x1cc)
[ 26.860000] [<c019e0b8>] (gs_rx_push+0x118/0x1cc) from [<c003d794>]
(tasklet_action+0x78/0xc8)
[ 26.870000] [<c003d794>] (tasklet_action+0x78/0xc8) from [<c003dc28>]
(__do_softirq+0x6c/0xf4)
[ 26.880000] [<c003dc28>] (__do_softirq+0x6c/0xf4) from [<c003dcf4>]
(irq_exit+0x44/0x5c)
[ 26.890000] [<c003dcf4>] (irq_exit+0x44/0x5c) from [<c001f054>]
(_text+0x54/0x6c)
[ 26.890000] [<c001f054>] (_text+0x54/0x6c) from [<c001fc04>]
(__irq_usr+0x44/0xa0)
[ 26.900000] Exception stack(0xc65a9fb0 to 0xc65a9ff8)
[ 26.910000] 9fa0: 4022e68c 000007b2
40215e98 00000001
[ 26.910000] 9fc0: 4022e68c 4022fef8 4022d000 00002f28 00000001 00001688
beb16a34 00000028
[ 26.920000] 9fe0: 4022d000 beb1693c 401904e0 4018f620 20000010 ffffffff
[ 26.930000] ------------[ cut here ]------------
[ 26.930000] WARNING: at kernel/mutex.c:207 mutex_lock_nested+0xa4/0x304()
[ 26.940000] Modules linked in:
[ 26.940000] [<c0025c30>] (unwind_backtrace+0x0/0xdc) from [<c0038e4c>]
(warn_slowpath_common+0x4c/0x80)
[ 26.950000] [<c0038e4c>] (warn_slowpath_common+0x4c/0x80) from
[<c02725ec>] (mutex_lock_nested+0xa4/0x304)
[ 26.960000] [<c02725ec>] (mutex_lock_nested+0xa4/0x304) from [<c0153b0c>]
(echo_set_canon_col+0x20/0x48)
[ 26.970000] [<c0153b0c>] (echo_set_canon_col+0x20/0x48) from [<c0155b2c>]
(n_tty_receive_buf+0xbf8/0xecc)
[ 26.980000] [<c0155b2c>] (n_tty_receive_buf+0xbf8/0xecc) from
[<c0158054>] (flush_to_ldisc+0x104/0x1b0)
[ 26.990000] [<c0158054>] (flush_to_ldisc+0x104/0x1b0) from [<c019e0b8>]
(gs_rx_push+0x118/0x1cc)
[ 27.000000] [<c019e0b8>] (gs_rx_push+0x118/0x1cc) from [<c003d794>]
(tasklet_action+0x78/0xc8)
[ 27.010000] [<c003d794>] (tasklet_action+0x78/0xc8) from [<c003dc28>]
(__do_softirq+0x6c/0xf4)
[ 27.020000] [<c003dc28>] (__do_softirq+0x6c/0xf4) from [<c003dcf4>]
(irq_exit+0x44/0x5c)
[ 27.030000] [<c003dcf4>] (irq_exit+0x44/0x5c) from [<c001f054>]
(_text+0x54/0x6c)
[ 27.030000] [<c001f054>] (_text+0x54/0x6c) from [<c001fc04>]
(__irq_usr+0x44/0xa0)
[ 27.040000] Exception stack(0xc65a9fb0 to 0xc65a9ff8)
[ 27.050000] 9fa0: 4022e68c 000007b2
40215e98 00000001
[ 27.050000] 9fc0: 4022e68c 4022fef8 4022d000 00002f28 00000001 00001688
beb16a34 00000028
[ 27.060000] 9fe0: 4022d000 beb1693c 401904e0 4018f620 20000010 ffffffff
[ 27.070000] ---[ end trace 675aa93c2a01b9b1 ]---

Best regards
--
Marek Szyprowski
Samsung Poland R&D Center

2009-06-23 07:22:46

by David Brownell

[permalink] [raw]
Subject: Re: PROBLEM: kernel oops with g_serial USB gadget on 2.6.30

On Monday 22 June 2009, Marek Szyprowski wrote:
>
> > Like, oh, changing a spinlock to a mutex. ? You might change the
> > low_latency setting and review how that's now supposed to behave.
>
> In in gs_open() function in drivers/usb/gadget/u_serial.c, after the
> line 780 I've added:
>
> tty->low_latency = 1;

That's already how it's set. Try instead *REMOVING* the line
which sets that flag.


> This helped a bit, but after a few serial transfers I got the same
> bug again.

Of course. You didn't change anything.

2009-06-23 08:38:49

by Marek Szyprowski

[permalink] [raw]
Subject: RE: PROBLEM: kernel oops with g_serial USB gadget on 2.6.30

Hello,

On Tuesday, June 23, 2009 9:23 AM, David Brownell wrote:

> > > Like, oh, changing a spinlock to a mutex. ? You might change the
> > > low_latency setting and review how that's now supposed to behave.
> >
> > In in gs_open() function in drivers/usb/gadget/u_serial.c, after the
> > line 780 I've added:
> >
> > tty->low_latency = 1;
>
> That's already how it's set. Try instead *REMOVING* the line
> which sets that flag.

My fault. I didn't notice that a few lines lower the low_latency flag is
alredy set and I thought I should set it. After removing that, g_serial
started working. :) Thanks a lot.

I noticed that there is also a comment:
/* low_latency means ldiscs work in tasklet context, without
* needing a workqueue schedule ... easier to keep up.
*/

So in low latency mode calls are made from a tasklet. This is not true in
my case, as the S3C OTG UDC driver does all its job in interrupts. This
way also a (usb) packet_done callback is done from an interrupt. I expect
that g_serial driver interacts with tty layer in that packet_done callback,
so this is the source of the problems. I noticed that some other UDC
drivers also does all its job from an interrupt, so they also might be
affected. How this bug should be properly resolved?

Best regards
--
Marek Szyprowski
Samsung Poland R&D Center

2009-06-23 09:20:46

by Alan

[permalink] [raw]
Subject: Re: PROBLEM: kernel oops with g_serial USB gadget on 2.6.30

> that g_serial driver interacts with tty layer in that packet_done callback,
> so this is the source of the problems. I noticed that some other UDC
> drivers also does all its job from an interrupt, so they also might be
> affected. How this bug should be properly resolved?

Either by not setting ->low_latency or by running the data paths from a
non IRQ context.

Basically: don't set tty->low_latency if you are handling the processing
from the IRQ path. The only case low_latency is useful is handling data
from a non-IRQ path where you have latency concerns for tx/rx switching.

2009-06-23 09:34:54

by Marek Szyprowski

[permalink] [raw]
Subject: RE: PROBLEM: kernel oops with g_serial USB gadget on 2.6.30

Hello,

On Tuesday, June 23, 2009 11:21 AM, Alan Cox wrote:

> > that g_serial driver interacts with tty layer in that packet_done
> callback,
> > so this is the source of the problems. I noticed that some other UDC
> > drivers also does all its job from an interrupt, so they also might
> be
> > affected. How this bug should be properly resolved?
>
> Either by not setting ->low_latency or by running the data paths from a
> non IRQ context.
>
> Basically: don't set tty->low_latency if you are handling the
> processing
> from the IRQ path. The only case low_latency is useful is handling data
> from a non-IRQ path where you have latency concerns for tx/rx
> switching.

I know that, but I wonder how this should be handled in g_serial gadget
driver, which might be working on top of both types of low level
drivers (doing callback from irq or tasklet).

Are there any drawbacks of disabling low_latency mode if callbacks are
done from tasklet not from interrupt? If no then the low latency mode
should be disabled in g_serial driver.

Best regards
--
Marek Szyprowski
Samsung Poland R&D Center

2009-06-23 09:53:43

by Alan

[permalink] [raw]
Subject: Re: PROBLEM: kernel oops with g_serial USB gadget on 2.6.30

> Are there any drawbacks of disabling low_latency mode if callbacks are
> done from tasklet not from interrupt?

Not really no

Alan

2009-06-23 15:01:55

by Marek Szyprowski

[permalink] [raw]
Subject: RE: PROBLEM: kernel oops with g_serial USB gadget on 2.6.30

Hello,

On Tuesday, June 23, 2009 11:55 AM, Alan Cox wrote:

> > Are there any drawbacks of disabling low_latency mode if callbacks
> are
> > done from tasklet not from interrupt?
>
> Not really no

David: do you want to update a g_serial driver and disable the low latency
mode, or should I post a patch that does it? Or do you have other plan to
solve this issue? I tested the g_serial driver with low latency mode
disabled and it works quite fine here on ARM S3C6410.

Best regards
--
Marek Szyprowski
Samsung Poland R&D Center

2009-06-23 16:47:51

by David Brownell

[permalink] [raw]
Subject: Re: PROBLEM: kernel oops with g_serial USB gadget on 2.6.30

On Tuesday 23 June 2009, Marek Szyprowski wrote:
> David: do you want to update a g_serial driver and disable the low latency
> mode, or should I post a patch that does it? Or do you have other plan to
> solve this issue? I tested the g_serial driver with low latency mode
> disabled and it works quite fine here on ARM S3C6410.

I'd like you to submit that.

- Dave

2009-06-23 16:52:49

by David Brownell

[permalink] [raw]
Subject: Re: PROBLEM: kernel oops with g_serial USB gadget on 2.6.30

On Tuesday 23 June 2009, Marek Szyprowski wrote:
> /* low_latency means ldiscs work in tasklet context, without
> ?* needing a workqueue schedule ... easier to keep up.
> ?*/
>
> So in low latency mode calls are made from a tasklet.

... and that has, at some point since 2.6.26 or so, become a
problem that caused oopsing.


> This is not true in
> my case, as the S3C OTG UDC driver does all its job in interrupts. This
> way also a (usb) packet_done callback is done from an interrupt. I expect
> that g_serial driver interacts with tty layer in that packet_done callback,
> so this is the source of the problems. I noticed that some other UDC
> drivers also does all its job from an interrupt, so they also might be
> affected. How this bug should be properly resolved?

Change the u_serial.c code so that this newish tty behavior
stops causing problems: don't set low_latency.

But also try and sort through any consequences of that, and
don't forget to update the comments which talk about how the
low_latency setting is affecting code flow.

- Dave

2009-06-24 07:08:43

by Marek Szyprowski

[permalink] [raw]
Subject: RE: PROBLEM: kernel oops with g_serial USB gadget on 2.6.30

Hello,

On Tuesday, June 23, 2009 6:53 PM David Brownell wrote:

> > /* low_latency means ldiscs work in tasklet context, without
> > ?* needing a workqueue schedule ... easier to keep up.
> > ?*/
> >
> > So in low latency mode calls are made from a tasklet.
>
> ... and that has, at some point since 2.6.26 or so, become a
> problem that caused oopsing.
>
> > This is not true in
> > my case, as the S3C OTG UDC driver does all its job in interrupts. This
> > way also a (usb) packet_done callback is done from an interrupt. I expect
> > that g_serial driver interacts with tty layer in that packet_done callback,
> > so this is the source of the problems. I noticed that some other UDC
> > drivers also does all its job from an interrupt, so they also might be
> > affected. How this bug should be properly resolved?
>
> Change the u_serial.c code so that this newish tty behavior
> stops causing problems: don't set low_latency.
>
> But also try and sort through any consequences of that, and
> don't forget to update the comments which talk about how the
> low_latency setting is affecting code flow.

I did some additional tests and found another bug. When I enabled debug in my
low level udc driver then I can easily trigger the following bug:

[ 55.630000] Unable to handle kernel NULL pointer dereference at virtual address 00000014
[ 55.630000] pgd = c0004000
[ 55.630000] [00000014] *pgd=00000000
[ 55.630000] Internal error: Oops: 17 [#1] PREEMPT
[ 55.630000] Modules linked in:
[ 55.630000] CPU: 0 Not tainted (2.6.30 #355)
[ 55.630000] PC is at __lock_acquire+0xa0/0xa6c
[ 55.630000] LR is at lock_acquire+0x58/0x6c
[ 55.630000] pc : [<c005786c>] lr : [<c0058290>] psr: 60000093
[ 55.630000] sp : c6c39da8 ip : c6c22080 fp : c6546524
[ 55.630000] r10: c6546128 r9 : 00000010 r8 : 0000001f
[ 55.630000] r7 : 00000000 r6 : c6c38000 r5 : 60000093 r4 : 00000001
[ 55.630000] r3 : 60000093 r2 : 00000000 r1 : 00000000 r0 : 00000000
[ 55.630000] Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment kernel
[ 55.630000] Control: 00c5387d Table: 565c4008 DAC: 00000017
[ 55.630000] Process events/0 (pid: 5, stack limit = 0xc6c38268)
[ 55.630000] Stack: (0xc6c39da8 to 0xc6c3a000)
[ 55.630000] 9da0: 00000000 00000000 00000000 00000000 c6c22080 00000000
[ 55.630000] 9dc0: 60000013 00000080 60000093 c6c38000 00000000 0000001f c6546168 c6546128
[ 55.630000] 9de0: c6546524 c0058290 00000001 00000080 00000000 c019d608 c6546524 00000000
[ 55.630000] 9e00: 00000000 c019d608 a0000013 c0274e74 00000001 00000000 c019d608 22222222
[ 55.630000] 9e20: 22222222 00000000 0000000a 00001500 c6546000 c019d608 c6546000 0000000a
[ 55.630000] 9e40: 00001500 c0156058 c6546524 c01538e0 c6546000 c65464e4 c0155a38 00000000
[ 55.630000] 9e60: 0000000a 00001500 c6546000 0000001f c6546168 c6546128 c654656c c0155a40
[ 55.630000] 9e80: c65c241d c65c253a 00000000 00000003 c65461d8 c65461bc c6546124 c654624c
[ 55.630000] 9ea0: c6546108 c654621c c6546524 0000001c c6c22080 0000000c 00000000 00000000
[ 55.630000] 9ec0: 00000000 00000000 c6c22080 00000000 00000000 00000000 c6c22080 c6c38000
[ 55.630000] 9ee0: 00000000 00000080 60000093 c6c38000 00000000 00000000 c6546168 c6546128
[ 55.630000] 9f00: c0158070 c0058290 00000001 00000000 c6546168 c01580a0 a0000013 00000000
[ 55.630000] 9f20: c6546168 c6546128 c0158070 c0274e7c c6546168 c65c241d c65c251d c65c2400
[ 55.630000] 9f40: c6546000 0000001f c6546168 c6546128 c6546108 c0158174 c0158070 c6d5dc80
[ 55.630000] 9f60: 00000001 c654612c c6c38000 c6c1c180 c6546128 00000000 c6c39fb4 00000001
[ 55.630000] 9f80: c0158070 c00498e0 00000001 00000000 c0049874 c035f870 c6c1c1a4 c6c1c19c
[ 55.630000] 9fa0: 00000000 c6c22080 c004d7e0 c6c39fac c6c39fac c076e76c 00000000 c0314cba
[ 55.630000] 9fc0: c6c1c180 c6c38000 c6c1c180 c0049710 00000000 00000000 00000000 00000000
[ 55.630000] 9fe0: 00000000 c004d55c 00000000 00000000 00000000 c003c03c c0335f20 ffffffff
[ 55.630000] [<c005786c>] (__lock_acquire+0xa0/0xa6c) from [<c0058290>] (lock_acquire+0x58/0x6c)
[ 55.630000] [<c0058290>] (lock_acquire+0x58/0x6c) from [<c0274e74>] (_spin_lock_irqsave+0x44/0x58)
[ 55.630000] [<c0274e74>] (_spin_lock_irqsave+0x44/0x58) from [<c019d608>] (gs_write_room+0x10/0x58)
[ 55.630000] [<c019d608>] (gs_write_room+0x10/0x58) from [<c0156058>] (tty_write_room+0x20/0x28)
[ 55.630000] [<c0156058>] (tty_write_room+0x20/0x28) from [<c01538e0>] (process_echoes+0x4c/0x288)
[ 55.630000] [<c01538e0>] (process_echoes+0x4c/0x288) from [<c0155a40>] (n_tty_receive_buf+0x9ec/0xecc)
[ 55.630000] [<c0155a40>] (n_tty_receive_buf+0x9ec/0xecc) from [<c0158174>] (flush_to_ldisc+0x104/0x1b0)
[ 55.630000] [<c0158174>] (flush_to_ldisc+0x104/0x1b0) from [<c00498e0>] (worker_thread+0x1d0/0x2cc)
[ 55.630000] [<c00498e0>] (worker_thread+0x1d0/0x2cc) from [<c004d55c>] (kthread+0x58/0x90)
[ 55.630000] [<c004d55c>] (kthread+0x58/0x90) from [<c003c03c>] (do_exit+0x0/0x5d0)
[ 55.630000] [<c003c03c>] (do_exit+0x0/0x5d0) from [<c6c26180>] (0xc6c26180)
[ 55.630000] Code: ea000076 e59d100c e3510000 1a000002 (e5994004)
[ 55.640000] ---[ end trace f9a4499d9482c504 ]---
[ 55.650000] note: events/0[5] exited with preempt_count 1

The low latency mode is disabled. I've tested different ways of gathering
the debug output from my lowlevel udc driver: in first attempt it was
printk(KERN_INFO ...) then I switched to use lowlevel printascii() to remove
possible interaction with console/tty framework at all. In both cases I got
the above NULL pointer exception. It only worked fine when I switched to
printk(KERN_DEBUG ...) with debug level above KERN_DEBUG, so all the
messages were only buffered in dmesg buffer. This looks like a timing issue
in tty framework...

Best regards
--
Marek Szyprowski
Samsung Poland R&D Center

2009-06-24 08:41:54

by Alan

[permalink] [raw]
Subject: Re: PROBLEM: kernel oops with g_serial USB gadget on 2.6.30

> [ 55.630000] [<c0274e74>] (_spin_lock_irqsave+0x44/0x58) from [<c019d608>] (gs_write_room+0x10/0x58)
> [ 55.630000] [<c019d608>] (gs_write_room+0x10/0x58) from [<c0156058>] (tty_write_room+0x20/0x28)
> [ 55.630000] [<c0156058>] (tty_write_room+0x20/0x28) from [<c01538e0>] (process_echoes+0x4c/0x288)
> [ 55.630000] [<c01538e0>] (process_echoes+0x4c/0x288) from [<c0155a40>] (n_tty_receive_buf+0x9ec/0xecc)
> [ 55.630000] [<c0155a40>] (n_tty_receive_buf+0x9ec/0xecc) from [<c0158174>] (flush_to_ldisc+0x104/0x1b0)
> [ 55.630000] [<c0158174>] (flush_to_ldisc+0x104/0x1b0) from [<c00498e0>] (worker_thread+0x1d0/0x2cc)
> [ 55.630000] [<c00498e0>] (worker_thread+0x1d0/0x2cc) from [<c004d55c>] (kthread+0x58/0x90)
> [ 55.630000] [<c004d55c>] (kthread+0x58/0x90) from [<c003c03c>] (do_exit+0x0/0x5d0)
> [ 55.630000] [<c003c03c>] (do_exit+0x0/0x5d0) from [<c6c26180>] (0xc6c26180)
> [ 55.630000] Code: ea000076 e59d100c e3510000 1a000002 (e5994004)
> [ 55.640000] ---[ end trace f9a4499d9482c504 ]---
> [ 55.650000] note: events/0[5] exited with preempt_count 1
>
> The low latency mode is disabled. I've tested different ways of gathering
> the debug output from my lowlevel udc driver: in first attempt it was
> printk(KERN_INFO ...) then I switched to use lowlevel printascii() to remove
> possible interaction with console/tty framework at all. In both cases I got
> the above NULL pointer exception. It only worked fine when I switched to
> printk(KERN_DEBUG ...) with debug level above KERN_DEBUG, so all the
> messages were only buffered in dmesg buffer. This looks like a timing issue
> in tty framework...

Looks to me like the gs_write_room function takes a lock already held
when the driver calls tty_flip_buffer_push() - which will and can call
back into the driver (eg for echo processing)

2009-06-24 08:49:53

by David Brownell

[permalink] [raw]
Subject: Re: PROBLEM: kernel oops with g_serial USB gadget on 2.6.30

On Wednesday 24 June 2009, Marek Szyprowski wrote:
> I did some additional tests and found another bug. When I enabled debug in my
> low level udc driver then I can easily trigger the following bug:
>
> [ ? 55.630000] Unable to handle kernel NULL pointer dereference at virtual address 00000014
> [ ? 55.630000] pgd = c0004000
> [ ? 55.630000] [00000014] *pgd=00000000
> [ ? 55.630000] Internal error: Oops: 17 [#1] PREEMPT
> [ ? 55.630000] Modules linked in:
> [ ? 55.630000] CPU: 0 ? ?Not tainted ?(2.6.30 #355)
> [ ? 55.630000] PC is at __lock_acquire+0xa0/0xa6c
> [ ? 55.630000] LR is at lock_acquire+0x58/0x6c
> [ ? 55.630000] ...
> [ ? 55.630000] [<c005786c>] (__lock_acquire+0xa0/0xa6c) from [<c0058290>] (lock_acquire+0x58/0x6c)
> [ ? 55.630000] [<c0058290>] (lock_acquire+0x58/0x6c) from [<c0274e74>] (_spin_lock_irqsave+0x44/0x58)
> [ ? 55.630000] [<c0274e74>] (_spin_lock_irqsave+0x44/0x58) from [<c019d608>] (gs_write_room+0x10/0x58)
> [ ? 55.630000] [<c019d608>] (gs_write_room+0x10/0x58) from [<c0156058>] (tty_write_room+0x20/0x28)

So it's looking like tty->driver_data is somehow NULL. That's
never supposed to happen. Did gs_open() fail or something?



> [ ? 55.630000] [<c0156058>] (tty_write_room+0x20/0x28) from [<c01538e0>] (process_echoes+0x4c/0x288)
> [ ? 55.630000] [<c01538e0>] (process_echoes+0x4c/0x288) from [<c0155a40>] (n_tty_receive_buf+0x9ec/0xecc)
> [ ? 55.630000] [<c0155a40>] (n_tty_receive_buf+0x9ec/0xecc) from [<c0158174>] (flush_to_ldisc+0x104/0x1b0)
> [ ? 55.630000] [<c0158174>] (flush_to_ldisc+0x104/0x1b0) from [<c00498e0>] (worker_thread+0x1d0/0x2cc)
> [ ? 55.630000] [<c00498e0>] (worker_thread+0x1d0/0x2cc) from [<c004d55c>] (kthread+0x58/0x90)
> [ ? 55.630000] [<c004d55c>] (kthread+0x58/0x90) from [<c003c03c>] (do_exit+0x0/0x5d0)
> [ ? 55.630000] [<c003c03c>] (do_exit+0x0/0x5d0) from [<c6c26180>] (0xc6c26180)
> [ ? 55.630000] Code: ea000076 e59d100c e3510000 1a000002 (e5994004)
> [ ? 55.640000] ---[ end trace f9a4499d9482c504 ]---
> [ ? 55.650000] note: events/0[5] exited with preempt_count 1
>
> The low latency mode is disabled. I've tested different ways of gathering
> the debug output from my lowlevel udc driver: in first attempt it was
> printk(KERN_INFO ...) then I switched to use lowlevel printascii() to remove
> possible interaction with console/tty framework at all. In both cases I got
> the above NULL pointer exception. It only worked fine when I switched to
> printk(KERN_DEBUG ...) with debug level above KERN_DEBUG, so all the
> messages were only buffered in dmesg buffer. This looks like a timing issue
> in tty framework...
> ?

2009-06-24 08:54:30

by David Brownell

[permalink] [raw]
Subject: Re: PROBLEM: kernel oops with g_serial USB gadget on 2.6.30

On Wednesday 24 June 2009, Alan Cox wrote:
> Looks to me like the gs_write_room function takes a lock already held
> when the driver calls tty_flip_buffer_push() - which will and can call
> back into the driver (eg for echo processing)

That doesn't seem to explain whatever null pointer was being
traversed though.

A lock-already-held problem would self-deadlock, not oops.

2009-06-24 10:32:50

by Marek Szyprowski

[permalink] [raw]
Subject: RE: PROBLEM: kernel oops with g_serial USB gadget on 2.6.30

Hello,

On Wednesday, June 24, 2009 10:50 AM, David Brownell wrote:

> > I did some additional tests and found another bug. When I enabled debug in my
> > low level udc driver then I can easily trigger the following bug:
> >
> > [ ? 55.630000] Unable to handle kernel NULL pointer dereference at virtual address 00000014
> > [ ? 55.630000] pgd = c0004000
> > [ ? 55.630000] [00000014] *pgd=00000000
> > [ ? 55.630000] Internal error: Oops: 17 [#1] PREEMPT
> > [ ? 55.630000] Modules linked in:
> > [ ? 55.630000] CPU: 0 ? ?Not tainted ?(2.6.30 #355)
> > [ ? 55.630000] PC is at __lock_acquire+0xa0/0xa6c
> > [ ? 55.630000] LR is at lock_acquire+0x58/0x6c
> > [ ? 55.630000] ...
> > [ ? 55.630000] [<c005786c>] (__lock_acquire+0xa0/0xa6c) from [<c0058290>] (lock_acquire+0x58/0x6c)
> > [ ? 55.630000] [<c0058290>] (lock_acquire+0x58/0x6c) from [<c0274e74>]
> (_spin_lock_irqsave+0x44/0x58)
> > [ ? 55.630000] [<c0274e74>] (_spin_lock_irqsave+0x44/0x58) from [<c019d608>]
> (gs_write_room+0x10/0x58)
> > [ ? 55.630000] [<c019d608>] (gs_write_room+0x10/0x58) from [<c0156058>] (tty_write_room+0x20/0x28)
>
> So it's looking like tty->driver_data is somehow NULL. That's
> never supposed to happen. Did gs_open() fail or something?

I've triggered this bug by running a 'getty -L 115200 ttyGS0 vt100' and trying
to login on that usb console and then pressing 'enter' key for longer time.
getty might abort after a few failed logins, so the /dev/ttyGS0 file might be
closed before all characters that need to be echoed were processed (usb udc debug
messages really slows it down). This is however a pure guessing, I know nothing
about tty framework and internals.

Best regards
--
Marek Szyprowski
Samsung Poland R&D Center


2009-06-24 23:43:55

by David Brownell

[permalink] [raw]
Subject: Re: PROBLEM: kernel oops with g_serial USB gadget on 2.6.30

On Wednesday 24 June 2009, Marek Szyprowski wrote:
>
> > So it's looking like tty->driver_data is somehow NULL. ?That's
> > never supposed to happen. ?Did gs_open() fail or something?
>
> I've triggered this bug by running a 'getty -L 115200 ttyGS0 vt100' and trying
> to login on that usb console and then pressing 'enter' key for longer time.
> getty might abort after a few failed logins, so the /dev/ttyGS0 file might be
> closed before all characters that need to be echoed were processed (usb udc debug
> messages really slows it down). This is however a pure guessing, I know nothing
> about tty framework and internals.

Hmm, that would mean the shutdown processing isn't behaving
right. Plausible, it's tricky to test.

Patches accepted. ;)