LinuxLists.cc - N_PPP_SYNC ldisc BUG: sleeping function called from invalid context

2009-09-30 14:50:28

Subject: N_PPP_SYNC ldisc BUG: sleeping function called from invalid context

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

While testing pppd with capiplugin on kernel 2.6.31 (plus my CAPI and
Gigaset patches) I came across this BUG message. (Please ignore the
Tainted flag, it's from the Nvidia driver and doesn't influence what's
following.)

BUG: sleeping function called from invalid context at kernel/mutex.c:280
in_atomic(): 1, irqs_disabled(): 1, pid: 10, name: events/0
INFO: lockdep is turned off.
irq event stamp: 2034
hardirqs last enabled at (2033): [<c0347b07>]
_spin_unlock_irqrestore+0x3c/0x6c
hardirqs last disabled at (2034): [<c03475d0>] _spin_lock_irq+0x11/0x3f
softirqs last enabled at (2008): [<c012f598>] __do_softirq+0x114/0x11c
softirqs last disabled at (1971): [<c010593a>] do_softirq+0x69/0xc7
Pid: 10, comm: events/0 Tainted: P 2.6.31-vanilla #1
Call Trace:
[<c0124a3c>] __might_sleep+0x10e/0x116
[<c0346bdd>] mutex_lock_nested+0x18/0x31
[<c026d39b>] tty_unthrottle+0x10/0x38
[<f8dcc31f>] ppp_sync_receive+0x168/0x170 [ppp_synctty]
[<f8fbb9ce>] handle_minor_recv+0x187/0x1cd [capi]
[<f8fbc19b>] capi_recv_message+0x1d9/0x24e [capi]
[<fa5be461>] recv_handler+0x56/0x6f [kernelcapi]
[<c0138cba>] worker_thread+0x14a/0x21d
[<c0138c9e>] ? worker_thread+0x12e/0x21d
[<fa5be40b>] ? recv_handler+0x0/0x6f [kernelcapi]
[<c013b375>] ? autoremove_wake_function+0x0/0x30
[<c013b201>] kthread+0x64/0x69
[<c0138b70>] ? worker_thread+0x0/0x21d
[<c013b19d>] ? kthread+0x0/0x69
[<c0103f33>] kernel_thread_helper+0x7/0x10

Turns out the ppp_sync_receive() function (drivers/net/ppp_synctty.c
line 385ff.) has a comment in front stating:

/*
* This can now be called from hard interrupt level as well
* as soft interrupt level or mainline.
*/

but calls tty_unthrottle() which in turn calls mutex_lock() which of
course can sleep.

That tty_unthrottle() call was already removed once by

commit a6540f731d506d9e82444cf0020e716613d4c46c
Author: Alan Cox <[email protected]>
Date: Mon Jun 22 18:42:29 2009 +0100

ppp: Fix throttling bugs

but re-added by

commit 4a21b8cb3550f19f838f7c48345fbbf6a0e8536b
Author: Linus Torvalds <[email protected]>
Date: Thu Jul 16 09:14:23 2009 -0700

Revert "ppp: Fix throttling bugs"

This reverts commit a6540f731d506d9e82444cf0020e716613d4c46c, as
requested by Alan:

"... as it was wrong, the pty code is now fixed and the fact this
isn't reverted is breaking pptp setups."

Opinions?

Thanks,
Tilman

- --
Tilman Schmidt E-Mail: [email protected]
Bonn, Germany
Diese Nachricht besteht zu 100% aus wiederverwerteten Bits.
Ungeöffnet mindestens haltbar bis: (siehe Rückseite)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.4 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFKw3AyQ3+did9BuFsRAiZXAKCGaos/qZNTlStEP4SE90PA0ZoMAQCdFtvf
U9chE7at35y8c6CGGS1IGg0=
=Vpq4
-----END PGP SIGNATURE-----

2009-09-30 16:45:47

by Alan

[permalink] [raw]

Subject: Re: N_PPP_SYNC ldisc BUG: sleeping function called from invalid context

> [<c026d39b>] tty_unthrottle+0x10/0x38
> [<f8dcc31f>] ppp_sync_receive+0x168/0x170 [ppp_synctty]
> [<f8fbb9ce>] handle_minor_recv+0x187/0x1cd [capi]
> [<f8fbc19b>] capi_recv_message+0x1d9/0x24e [capi]

Really need to see the rest of the call trace to be sure

> Turns out the ppp_sync_receive() function (drivers/net/ppp_synctty.c
> line 385ff.) has a comment in front stating:
>
> /*
> * This can now be called from hard interrupt level as well
> * as soft interrupt level or mainline.
> */

Which is wrong. The flip_buffer_push -> rx processing path should never
be called from IRQ context and that was fixed for various drivers that
mis-set tty->low_latency, as well as in the PPP rework. The PPP case is
actually unrelated in many was.

> Opinions?

See how we got into that code direct from an IRQ path. The expectation of
the tty logic is that it gets processed from work queues either
specifically in driver or via tty_flip_buffer_push when tty->low_latency
= 0

2009-09-30 18:55:14

by Tilman Schmidt

[permalink] [raw]

Subject: Re: N_PPP_SYNC ldisc BUG: sleeping function called from invalid context

Alan Cox schrieb:
>> [<c026d39b>] tty_unthrottle+0x10/0x38
>> [<f8dcc31f>] ppp_sync_receive+0x168/0x170 [ppp_synctty]
>> [<f8fbb9ce>] handle_minor_recv+0x187/0x1cd [capi]
>> [<f8fbc19b>] capi_recv_message+0x1d9/0x24e [capi]
>
> Really need to see the rest of the call trace to be sure

There wasn't more than what I posted. I had six of them, they looked all
identical, and all of them ended after the kernel_thread_helper line.

>> Turns out the ppp_sync_receive() function (drivers/net/ppp_synctty.c
>> line 385ff.) has a comment in front stating:
>>
>> /*
>> * This can now be called from hard interrupt level as well
>> * as soft interrupt level or mainline.
>> */
>
> Which is wrong. The flip_buffer_push -> rx processing path should never
> be called from IRQ context and that was fixed for various drivers that
> mis-set tty->low_latency, as well as in the PPP rework. The PPP case is
> actually unrelated in many was.

Might be worth correcting that text then before is misleads someone.

>> Opinions?
>
> See how we got into that code direct from an IRQ path. The expectation of
> the tty logic is that it gets processed from work queues either
> specifically in driver or via tty_flip_buffer_push when tty->low_latency
> = 0

I'm at a loss here. According to all the backtraces:

- ppp_sync_receive() was called, as the LD's receive_buf method,
via handle_recv_skb() [drivers/isdn/capi/capi.c line 504, inlined]
from handle_minor_recv() [drivers/isdn/capi/capi.c line 519]

- handle_minor_recv() was called from capi_recv_message()
[drivers/isdn/capi/capi.c line 656]

- capi_recv_message() was called, as the CAPI application's
recv_message method, from recv_handler()
[drivers/isdn/capi/kcapi.c line 268]

- recv_handler() is never called directly. It's only scheduled
via the work queue ap->recv_work from capi_ctr_handle_message()
[drivers/isdn/capi/kcapi.c line 349]

Even if we don't trust the backtraces, there's not much room for
another activation path. So for all I know, the expectation of the
tty logic should have been met. The call was indeed processed from
a work queue.

Why then does mutex_lock() complain?

--
Tilman Schmidt E-Mail: [email protected]
Bonn, Germany
Diese Nachricht besteht zu 100% aus wiederverwerteten Bits.
Unge?ffnet mindestens haltbar bis: (siehe R?ckseite)

Attachments:

signature.asc (254.00 B)
OpenPGP digital signature

2009-09-30 20:29:21

by Jarek Poplawski

[permalink] [raw]

Subject: Re: N_PPP_SYNC ldisc BUG: sleeping function called from invalid context

2009-09-30 21:13:29

by Jarek Poplawski

[permalink] [raw]

Subject: Re: N_PPP_SYNC ldisc BUG: sleeping function called from invalid context

Tilman Schmidt wrote, On 09/30/2009 08:55 PM:

> Alan Cox schrieb:
>>> [<c026d39b>] tty_unthrottle+0x10/0x38
>>> [<f8dcc31f>] ppp_sync_receive+0x168/0x170 [ppp_synctty]
>>> [<f8fbb9ce>] handle_minor_recv+0x187/0x1cd [capi]
>>> [<f8fbc19b>] capi_recv_message+0x1d9/0x24e [capi]
>> Really need to see the rest of the call trace to be sure
>
> There wasn't more than what I posted. I had six of them, they looked all
> identical, and all of them ended after the kernel_thread_helper line.
>
>>> Turns out the ppp_sync_receive() function (drivers/net/ppp_synctty.c
>>> line 385ff.) has a comment in front stating:
>>>
>>> /*
>>> * This can now be called from hard interrupt level as well
>>> * as soft interrupt level or mainline.
>>> */
>> Which is wrong. The flip_buffer_push -> rx processing path should never
>> be called from IRQ context and that was fixed for various drivers that
>> mis-set tty->low_latency, as well as in the PPP rework. The PPP case is
>> actually unrelated in many was.
>
> Might be worth correcting that text then before is misleads someone.
>
>>> Opinions?
>> See how we got into that code direct from an IRQ path. The expectation of
>> the tty logic is that it gets processed from work queues either
>> specifically in driver or via tty_flip_buffer_push when tty->low_latency
>> = 0
>
> I'm at a loss here. According to all the backtraces:
>
> - ppp_sync_receive() was called, as the LD's receive_buf method,
> via handle_recv_skb() [drivers/isdn/capi/capi.c line 504, inlined]
> from handle_minor_recv() [drivers/isdn/capi/capi.c line 519]
>
> - handle_minor_recv() was called from capi_recv_message()
> [drivers/isdn/capi/capi.c line 656]
>
> - capi_recv_message() was called, as the CAPI application's
> recv_message method, from recv_handler()
> [drivers/isdn/capi/kcapi.c line 268]
>
> - recv_handler() is never called directly. It's only scheduled
> via the work queue ap->recv_work from capi_ctr_handle_message()
> [drivers/isdn/capi/kcapi.c line 349]
>
> Even if we don't trust the backtraces, there's not much room for
> another activation path. So for all I know, the expectation of the
> tty logic should have been met. The call was indeed processed from
> a work queue.
>
> Why then does mutex_lock() complain?

Hmm... capi_recv_message() calls handle_minor_recv() under
spin_lock_irqsave(), doesn't it?

Jarek P.

2009-09-30 22:00:24

by Tilman Schmidt

[permalink] [raw]

Subject: Re: N_PPP_SYNC ldisc BUG: sleeping function called from invalid context

Jarek Poplawski schrieb:
> Tilman Schmidt wrote, On 09/30/2009 08:55 PM:
> ...
>> Why then does mutex_lock() complain?
>
>
> Maybe it doesn't matter here, but this:
>
>> INFO: lockdep is turned off.
> suggests there was some lockdep issue/warning earlier.

That's just because some <deleted> decided that tainted kernels don't deserve lockdep.

--
Tilman Schmidt E-Mail: [email protected]
Bonn, Germany
Diese Nachricht besteht zu 100% aus wiederverwerteten Bits.
Unge?ffnet mindestens haltbar bis: (siehe R?ckseite)

Attachments:

signature.asc (254.00 B)
OpenPGP digital signature

2009-09-30 23:16:14

by Tilman Schmidt

[permalink] [raw]

Subject: capi.c calls receive_buf with interrupts disabled (was: N_PPP_SYNC ldisc BUG: sleeping function called from invalid context)

Jarek Poplawski schrieb:
> Tilman Schmidt wrote, On 09/30/2009 08:55 PM:
[...]
>> - ppp_sync_receive() was called, as the LD's receive_buf method,
>> via handle_recv_skb() [drivers/isdn/capi/capi.c line 504, inlined]
>> from handle_minor_recv() [drivers/isdn/capi/capi.c line 519]
>>
>> - handle_minor_recv() was called from capi_recv_message()
>> [drivers/isdn/capi/capi.c line 656]
>>
>> - capi_recv_message() was called, as the CAPI application's
>> recv_message method, from recv_handler()
>> [drivers/isdn/capi/kcapi.c line 268]
>>
>> - recv_handler() is never called directly. It's only scheduled
>> via the work queue ap->recv_work from capi_ctr_handle_message()
>> [drivers/isdn/capi/kcapi.c line 349]
>>
>> Even if we don't trust the backtraces, there's not much room for
>> another activation path. So for all I know, the expectation of the
>> tty logic should have been met. The call was indeed processed from
>> a work queue.
>>
>> Why then does mutex_lock() complain?
>
> Hmm... capi_recv_message() calls handle_minor_recv() under
> spin_lock_irqsave(), doesn't it?

Well spotted. Indeed it does. That explains it, of course.

The spinlock in question was added by:

commit 053b47ff249b9e0a634dae807f81465205e7c228
Author: Michael Buesch <[email protected]>
Date: Mon Feb 12 00:53:26 2007 -0800

[PATCH] Workaround CAPI subsystem locking issue

I think the following patch should go into the kernel, until the ISDN/CAPI
guys create the real fix for this issue.

The issue is a concurrency issue with some internal CAPI data structure
which can crash the kernel.

On my FritzCard DSL with the AVM driver it crashes about once a day without
this workaround patch. With this workaround patch it's rock-stable (at
least on UP, but I don't see why this shouldn't work on SMP as well. But
maybe I'm missing something.)

This workaround is kind of a sledgehammer which inserts a global lock to
wrap around all the critical sections. Of course, this is a scalability
issue, if you have many ISDN/CAPI cards. But it prevents a crash. So I
vote for this fix to get merged, until people come up with a better
solution. Better have a stable kernel that's less scalable, than a
crashing and useless kernel.

So let's cc the author of that patch, and also the good people on the
isdn4linux developer mailing list ...

Any ideas for a fix?

Thanks,
Tilman

--
Tilman Schmidt E-Mail: [email protected]
Bonn, Germany
Diese Nachricht besteht zu 100% aus wiederverwerteten Bits.
Unge?ffnet mindestens haltbar bis: (siehe R?ckseite)

Attachments:

signature.asc (254.00 B)
OpenPGP digital signature