2008-01-03 22:58:24

by Christian Kujau

Subject: 2.6.24-rc6: possible recursive locking detected

hi,

a few minutes after upgrading from -rc5 to -rc6 I got:

[ 1310.670986] =============================================
[ 1310.671690] [ INFO: possible recursive locking detected ]
[ 1310.672097] 2.6.24-rc6 #1
[ 1310.672421] ---------------------------------------------
[ 1310.672828] FahCore_a0.exe/3692 is trying to acquire lock:
[ 1310.673238] (&q->lock){++..}, at: [<c011544b>] __wake_up+0x1b/0x50
[ 1310.673869]
[ 1310.673870] but task is already holding lock:
[ 1310.674567] (&q->lock){++..}, at: [<c011544b>] __wake_up+0x1b/0x50
[ 1310.675267]
[ 1310.675268] other info that might help us debug this:
[ 1310.675952] 5 locks held by FahCore_a0.exe/3692:
[ 1310.676334] #0: (rcu_read_lock){..--}, at: [<c038b620>] net_rx_action+0x60/0x1b0
[ 1310.677251] #1: (rcu_read_lock){..--}, at: [<c0388d60>] netif_receive_skb+0x100/0x470
[ 1310.677924] #2: (rcu_read_lock){..--}, at: [<c03a7fb2>] ip_local_deliver_finish+0x32/0x210
[ 1310.678460] #3: (clock-AF_INET){-.-?}, at: [<c038164e>] sock_def_readable+0x1e/0x80
[ 1310.679250] #4: (&q->lock){++..}, at: [<c011544b>] __wake_up+0x1b/0x50
[ 1310.680151]
[ 1310.680152] stack backtrace:
[ 1310.680772] Pid: 3692, comm: FahCore_a0.exe Not tainted 2.6.24-rc6 #1
[ 1310.681209] [<c01038aa>] show_trace_log_lvl+0x1a/0x30
[ 1310.681659] [<c0104322>] show_trace+0x12/0x20
[ 1310.682085] [<c0104cba>] dump_stack+0x6a/0x70
[ 1310.682512] [<c0138ec1>] __lock_acquire+0x971/0x10c0
[ 1310.682961] [<c013966e>] lock_acquire+0x5e/0x80
[ 1310.683392] [<c0419b78>] _spin_lock_irqsave+0x38/0x50
[ 1310.683914] [<c011544b>] __wake_up+0x1b/0x50
[ 1310.684337] [<c018e2ba>] ep_poll_safewake+0x9a/0xc0
[ 1310.684822] [<c018f11b>] ep_poll_callback+0x8b/0xe0
[ 1310.685265] [<c0114418>] __wake_up_common+0x48/0x70
[ 1310.685712] [<c0115467>] __wake_up+0x37/0x50
[ 1310.686136] [<c03816aa>] sock_def_readable+0x7a/0x80
[ 1310.686579] [<c0381c2b>] sock_queue_rcv_skb+0xeb/0x150
[ 1310.687028] [<c03c7d99>] udp_queue_rcv_skb+0x139/0x2a0
[ 1310.687554] [<c03c81f1>] __udp4_lib_rcv+0x2f1/0x7e0
[ 1310.687996] [<c03c86f2>] udp_rcv+0x12/0x20
[ 1310.688415] [<c03a80a5>] ip_local_deliver_finish+0x125/0x210
[ 1310.688881] [<c03a84ed>] ip_local_deliver+0x2d/0x90
[ 1310.689323] [<c03a7d6b>] ip_rcv_finish+0xeb/0x300
[ 1310.689760] [<c03a8425>] ip_rcv+0x195/0x230
[ 1310.690182] [<c0388fdc>] netif_receive_skb+0x37c/0x470
[ 1310.690632] [<c038ba39>] process_backlog+0x69/0xc0
[ 1310.691175] [<c038b6f7>] net_rx_action+0x137/0x1b0
[ 1310.691681] [<c011e5c2>] __do_softirq+0x52/0xb0
[ 1310.692006] [<c0104e94>] do_softirq+0x94/0xe0
[ 1310.692301] =======================


This is a single CPU machine, and the box was quite busy due to disk I/O
(load 6-8). The machine continues to run and all is well now. Even the
application mentioned above (FahCore_a0.exe) is running fine
("Folding@Home", CPU-bound). The binary is located on a JFS filesystem,
which was also under heavy I/O. Can someone tell me why the backtrace
shows so much net* stuff? There was not much net I/O...

more details and .config: http://nerdbynature.de/bits/2.6.24-rc6

Thanks,
Christian.
--
BOFH excuse #312:

incompatible bit-registration operators


2008-01-03 23:04:58

by Rafael J. Wysocki

Subject: Re: 2.6.24-rc6: possible recursive locking detected

[Added some CCs]

On Thursday, 3 of January 2008, Christian Kujau wrote:
> hi,
>
> a few minutes after upgrading from -rc5 to -rc6 I got:
>
> [ 1310.670986] =============================================
> [ 1310.671690] [ INFO: possible recursive locking detected ]
> [ 1310.672097] 2.6.24-rc6 #1
> [ 1310.672421] ---------------------------------------------
> [ 1310.672828] FahCore_a0.exe/3692 is trying to acquire lock:
> [ 1310.673238] (&q->lock){++..}, at: [<c011544b>] __wake_up+0x1b/0x50
> [ 1310.673869]
> [ 1310.673870] but task is already holding lock:
> [ 1310.674567] (&q->lock){++..}, at: [<c011544b>] __wake_up+0x1b/0x50
> [ 1310.675267]
> [ 1310.675268] other info that might help us debug this:
> [ 1310.675952] 5 locks held by FahCore_a0.exe/3692:
> [ 1310.676334] #0: (rcu_read_lock){..--}, at: [<c038b620>] net_rx_action+0x60/0x1b0
> [ 1310.677251] #1: (rcu_read_lock){..--}, at: [<c0388d60>] netif_receive_skb+0x100/0x470
> [ 1310.677924] #2: (rcu_read_lock){..--}, at: [<c03a7fb2>] ip_local_deliver_finish+0x32/0x210
> [ 1310.678460] #3: (clock-AF_INET){-.-?}, at: [<c038164e>] sock_def_readable+0x1e/0x80
> [ 1310.679250] #4: (&q->lock){++..}, at: [<c011544b>] __wake_up+0x1b/0x50
> [ 1310.680151]
> [ 1310.680152] stack backtrace:
> [ 1310.680772] Pid: 3692, comm: FahCore_a0.exe Not tainted 2.6.24-rc6 #1
> [ 1310.681209] [<c01038aa>] show_trace_log_lvl+0x1a/0x30
> [ 1310.681659] [<c0104322>] show_trace+0x12/0x20
> [ 1310.682085] [<c0104cba>] dump_stack+0x6a/0x70
> [ 1310.682512] [<c0138ec1>] __lock_acquire+0x971/0x10c0
> [ 1310.682961] [<c013966e>] lock_acquire+0x5e/0x80
> [ 1310.683392] [<c0419b78>] _spin_lock_irqsave+0x38/0x50
> [ 1310.683914] [<c011544b>] __wake_up+0x1b/0x50
> [ 1310.684337] [<c018e2ba>] ep_poll_safewake+0x9a/0xc0
> [ 1310.684822] [<c018f11b>] ep_poll_callback+0x8b/0xe0
> [ 1310.685265] [<c0114418>] __wake_up_common+0x48/0x70
> [ 1310.685712] [<c0115467>] __wake_up+0x37/0x50
> [ 1310.686136] [<c03816aa>] sock_def_readable+0x7a/0x80
> [ 1310.686579] [<c0381c2b>] sock_queue_rcv_skb+0xeb/0x150
> [ 1310.687028] [<c03c7d99>] udp_queue_rcv_skb+0x139/0x2a0
> [ 1310.687554] [<c03c81f1>] __udp4_lib_rcv+0x2f1/0x7e0
> [ 1310.687996] [<c03c86f2>] udp_rcv+0x12/0x20
> [ 1310.688415] [<c03a80a5>] ip_local_deliver_finish+0x125/0x210
> [ 1310.688881] [<c03a84ed>] ip_local_deliver+0x2d/0x90
> [ 1310.689323] [<c03a7d6b>] ip_rcv_finish+0xeb/0x300
> [ 1310.689760] [<c03a8425>] ip_rcv+0x195/0x230
> [ 1310.690182] [<c0388fdc>] netif_receive_skb+0x37c/0x470
> [ 1310.690632] [<c038ba39>] process_backlog+0x69/0xc0
> [ 1310.691175] [<c038b6f7>] net_rx_action+0x137/0x1b0
> [ 1310.691681] [<c011e5c2>] __do_softirq+0x52/0xb0
> [ 1310.692006] [<c0104e94>] do_softirq+0x94/0xe0
> [ 1310.692301] =======================
>
>
> This is a single CPU machine, and the box was quite busy due to disk I/O
> (load 6-8). The machine continues to run and all is well now. Even the
> application mentioned above (FahCore_a0.exe) is running fine
> ("Folding@Home", cpu bound). The binary is located on an jfs filesystem,
> which was also under heavy I/O. Can someone tell me why the backtrace
> shows so much net* stuff? There was not much net I/O...
>
> more details and .config: http://nerdbynature.de/bits/2.6.24-rc6
>
> Thanks,
> Christian.

2008-01-04 08:31:30

by Ingo Molnar

Subject: Re: 2.6.24-rc6: possible recursive locking detected



[ Added even more CCs :-) Seems eventpoll & net related. ]

* Rafael J. Wysocki <[email protected]> wrote:

> [Added some CCs]
>
> On Thursday, 3 of January 2008, Christian Kujau wrote:
> > hi,
> >
> > a few minutes after upgrading from -rc5 to -rc6 I got:
> >
> > [ 1310.670986] =============================================
> > [ 1310.671690] [ INFO: possible recursive locking detected ]
> > [ 1310.672097] 2.6.24-rc6 #1
> > [ 1310.672421] ---------------------------------------------
> > [ 1310.672828] FahCore_a0.exe/3692 is trying to acquire lock:
> > [ 1310.673238] (&q->lock){++..}, at: [<c011544b>] __wake_up+0x1b/0x50
> > [ 1310.673869]
> > [ 1310.673870] but task is already holding lock:
> > [ 1310.674567] (&q->lock){++..}, at: [<c011544b>] __wake_up+0x1b/0x50
> > [ 1310.675267]
> > [ 1310.675268] other info that might help us debug this:
> > [ 1310.675952] 5 locks held by FahCore_a0.exe/3692:
> > [ 1310.676334] #0: (rcu_read_lock){..--}, at: [<c038b620>] net_rx_action+0x60/0x1b0
> > [ 1310.677251] #1: (rcu_read_lock){..--}, at: [<c0388d60>] netif_receive_skb+0x100/0x470
> > [ 1310.677924] #2: (rcu_read_lock){..--}, at: [<c03a7fb2>] ip_local_deliver_finish+0x32/0x210
> > [ 1310.678460] #3: (clock-AF_INET){-.-?}, at: [<c038164e>] sock_def_readable+0x1e/0x80
> > [ 1310.679250] #4: (&q->lock){++..}, at: [<c011544b>] __wake_up+0x1b/0x50
> > [ 1310.680151]
> > [ 1310.680152] stack backtrace:
> > [ 1310.680772] Pid: 3692, comm: FahCore_a0.exe Not tainted 2.6.24-rc6 #1
> > [ 1310.681209] [<c01038aa>] show_trace_log_lvl+0x1a/0x30
> > [ 1310.681659] [<c0104322>] show_trace+0x12/0x20
> > [ 1310.682085] [<c0104cba>] dump_stack+0x6a/0x70
> > [ 1310.682512] [<c0138ec1>] __lock_acquire+0x971/0x10c0
> > [ 1310.682961] [<c013966e>] lock_acquire+0x5e/0x80
> > [ 1310.683392] [<c0419b78>] _spin_lock_irqsave+0x38/0x50
> > [ 1310.683914] [<c011544b>] __wake_up+0x1b/0x50
> > [ 1310.684337] [<c018e2ba>] ep_poll_safewake+0x9a/0xc0
> > [ 1310.684822] [<c018f11b>] ep_poll_callback+0x8b/0xe0
> > [ 1310.685265] [<c0114418>] __wake_up_common+0x48/0x70
> > [ 1310.685712] [<c0115467>] __wake_up+0x37/0x50
> > [ 1310.686136] [<c03816aa>] sock_def_readable+0x7a/0x80
> > [ 1310.686579] [<c0381c2b>] sock_queue_rcv_skb+0xeb/0x150
> > [ 1310.687028] [<c03c7d99>] udp_queue_rcv_skb+0x139/0x2a0
> > [ 1310.687554] [<c03c81f1>] __udp4_lib_rcv+0x2f1/0x7e0
> > [ 1310.687996] [<c03c86f2>] udp_rcv+0x12/0x20
> > [ 1310.688415] [<c03a80a5>] ip_local_deliver_finish+0x125/0x210
> > [ 1310.688881] [<c03a84ed>] ip_local_deliver+0x2d/0x90
> > [ 1310.689323] [<c03a7d6b>] ip_rcv_finish+0xeb/0x300
> > [ 1310.689760] [<c03a8425>] ip_rcv+0x195/0x230
> > [ 1310.690182] [<c0388fdc>] netif_receive_skb+0x37c/0x470
> > [ 1310.690632] [<c038ba39>] process_backlog+0x69/0xc0
> > [ 1310.691175] [<c038b6f7>] net_rx_action+0x137/0x1b0
> > [ 1310.691681] [<c011e5c2>] __do_softirq+0x52/0xb0
> > [ 1310.692006] [<c0104e94>] do_softirq+0x94/0xe0
> > [ 1310.692301] =======================
> >
> >
> > This is a single CPU machine, and the box was quite busy due to disk I/O
> > (load 6-8). The machine continues to run and all is well now. Even the
> > application mentioned above (FahCore_a0.exe) is running fine
> > ("Folding@Home", cpu bound). The binary is located on an jfs filesystem,
> > which was also under heavy I/O. Can someone tell me why the backtrace
> > shows so much net* stuff? There was not much net I/O...
> >
> > more details and .config: http://nerdbynature.de/bits/2.6.24-rc6
> >
> > Thanks,
> > Christian.

2008-01-05 07:12:44

by Herbert Xu

Subject: Re: 2.6.24-rc6: possible recursive locking detected

On Fri, Jan 04, 2008 at 09:30:49AM +0100, Ingo Molnar wrote:
>
> > > [ 1310.670986] =============================================
> > > [ 1310.671690] [ INFO: possible recursive locking detected ]
> > > [ 1310.672097] 2.6.24-rc6 #1
> > > [ 1310.672421] ---------------------------------------------
> > > [ 1310.672828] FahCore_a0.exe/3692 is trying to acquire lock:
> > > [ 1310.673238] (&q->lock){++..}, at: [<c011544b>] __wake_up+0x1b/0x50
> > > [ 1310.673869]
> > > [ 1310.673870] but task is already holding lock:
> > > [ 1310.674567] (&q->lock){++..}, at: [<c011544b>] __wake_up+0x1b/0x50
> > > [ 1310.675267]
> > > [ 1310.675268] other info that might help us debug this:
> > > [ 1310.675952] 5 locks held by FahCore_a0.exe/3692:
> > > [ 1310.676334] #0: (rcu_read_lock){..--}, at: [<c038b620>] net_rx_action+0x60/0x1b0
> > > [ 1310.677251] #1: (rcu_read_lock){..--}, at: [<c0388d60>] netif_receive_skb+0x100/0x470
> > > [ 1310.677924] #2: (rcu_read_lock){..--}, at: [<c03a7fb2>] ip_local_deliver_finish+0x32/0x210
> > > [ 1310.678460] #3: (clock-AF_INET){-.-?}, at: [<c038164e>] sock_def_readable+0x1e/0x80
> > > [ 1310.679250] #4: (&q->lock){++..}, at: [<c011544b>] __wake_up+0x1b/0x50

The net part might just be a red herring, since the problem is that
__wake_up is somehow reentering itself.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2008-01-05 16:54:10

by Peter Zijlstra

Subject: Re: 2.6.24-rc6: possible recursive locking detected


On Sat, 2008-01-05 at 18:12 +1100, Herbert Xu wrote:
> On Fri, Jan 04, 2008 at 09:30:49AM +0100, Ingo Molnar wrote:
> >
> > > > [ 1310.670986] =============================================
> > > > [ 1310.671690] [ INFO: possible recursive locking detected ]
> > > > [ 1310.672097] 2.6.24-rc6 #1
> > > > [ 1310.672421] ---------------------------------------------
> > > > [ 1310.672828] FahCore_a0.exe/3692 is trying to acquire lock:
> > > > [ 1310.673238] (&q->lock){++..}, at: [<c011544b>] __wake_up+0x1b/0x50
> > > > [ 1310.673869]
> > > > [ 1310.673870] but task is already holding lock:
> > > > [ 1310.674567] (&q->lock){++..}, at: [<c011544b>] __wake_up+0x1b/0x50
> > > > [ 1310.675267]
> > > > [ 1310.675268] other info that might help us debug this:
> > > > [ 1310.675952] 5 locks held by FahCore_a0.exe/3692:
> > > > [ 1310.676334] #0: (rcu_read_lock){..--}, at: [<c038b620>] net_rx_action+0x60/0x1b0
> > > > [ 1310.677251] #1: (rcu_read_lock){..--}, at: [<c0388d60>] netif_receive_skb+0x100/0x470
> > > > [ 1310.677924] #2: (rcu_read_lock){..--}, at: [<c03a7fb2>] ip_local_deliver_finish+0x32/0x210
> > > > [ 1310.678460] #3: (clock-AF_INET){-.-?}, at: [<c038164e>] sock_def_readable+0x1e/0x80
> > > > [ 1310.679250] #4: (&q->lock){++..}, at: [<c011544b>] __wake_up+0x1b/0x50
>
> The net part might just be a red herring, since the problem is that
> __wake_up is somehow reentering itself.

/*
* Perform a safe wake up of the poll wait list. The problem is that
* with the new callback'd wake up system, it is possible that the
* poll callback is reentered from inside the call to wake_up() done
* on the poll wait queue head. The rule is that we cannot reenter the
* wake up code from the same task more than EP_MAX_POLLWAKE_NESTS times,
* and we cannot reenter the same wait queue head at all. This will
* enable to have a hierarchy of epoll file descriptor of no more than
* EP_MAX_POLLWAKE_NESTS deep. We need the irq version of the spin lock
* because this one gets called by the poll callback, that in turn is called
* from inside a wake_up(), that might be called from irq context.
*/

Seems to suggest that the epoll code can indeed recurse into wakeup.

Davide, Johannes, any ideas?

2008-01-05 17:01:57

by Peter Zijlstra

Subject: Re: 2.6.24-rc6: possible recursive locking detected


On Sat, 2008-01-05 at 17:53 +0100, Peter Zijlstra wrote:
> On Sat, 2008-01-05 at 18:12 +1100, Herbert Xu wrote:
> > On Fri, Jan 04, 2008 at 09:30:49AM +0100, Ingo Molnar wrote:
> > >
> > > > > [ 1310.670986] =============================================
> > > > > [ 1310.671690] [ INFO: possible recursive locking detected ]
> > > > > [ 1310.672097] 2.6.24-rc6 #1
> > > > > [ 1310.672421] ---------------------------------------------
> > > > > [ 1310.672828] FahCore_a0.exe/3692 is trying to acquire lock:
> > > > > [ 1310.673238] (&q->lock){++..}, at: [<c011544b>] __wake_up+0x1b/0x50
> > > > > [ 1310.673869]
> > > > > [ 1310.673870] but task is already holding lock:
> > > > > [ 1310.674567] (&q->lock){++..}, at: [<c011544b>] __wake_up+0x1b/0x50
> > > > > [ 1310.675267]
> > > > > [ 1310.675268] other info that might help us debug this:
> > > > > [ 1310.675952] 5 locks held by FahCore_a0.exe/3692:
> > > > > [ 1310.676334] #0: (rcu_read_lock){..--}, at: [<c038b620>] net_rx_action+0x60/0x1b0
> > > > > [ 1310.677251] #1: (rcu_read_lock){..--}, at: [<c0388d60>] netif_receive_skb+0x100/0x470
> > > > > [ 1310.677924] #2: (rcu_read_lock){..--}, at: [<c03a7fb2>] ip_local_deliver_finish+0x32/0x210
> > > > > [ 1310.678460] #3: (clock-AF_INET){-.-?}, at: [<c038164e>] sock_def_readable+0x1e/0x80
> > > > > [ 1310.679250] #4: (&q->lock){++..}, at: [<c011544b>] __wake_up+0x1b/0x50
> >
> > The net part might just be a red herring, since the problem is that
> > __wake_up is somehow reentering itself.
>
> /*
> * Perform a safe wake up of the poll wait list. The problem is that
> * with the new callback'd wake up system, it is possible that the
> * poll callback is reentered from inside the call to wake_up() done
> * on the poll wait queue head. The rule is that we cannot reenter the
> * wake up code from the same task more than EP_MAX_POLLWAKE_NESTS times,
> * and we cannot reenter the same wait queue head at all. This will
> * enable to have a hierarchy of epoll file descriptor of no more than
> * EP_MAX_POLLWAKE_NESTS deep. We need the irq version of the spin lock
> * because this one gets called by the poll callback, that in turn is called
> * from inside a wake_up(), that might be called from irq context.
> */
>
> Seems to suggest that the epoll code can indeed recurse into wakeup.
>
> Davide, Johannes, any ideas?

Since EP_MAX_POLLWAKE_NESTS < MAX_LOCKDEP_SUBCLASSES we could perhaps do
something like:

wake_up_nested(..., wake_nests);

although I'm not quite sure that is correct, my understanding of this
code is still fragile at best.
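[ Editor's note: for concreteness, a sketch of how such a wake_up_nested()
could look, built on the existing spin_lock_irqsave_nested() lockdep
primitive. The name and exact form here are speculative, not a patch: ]

```c
/*
 * Sketch only -- names and details are speculative.
 * spin_lock_irqsave_nested() passes a lockdep subclass, so each
 * nesting level of q->lock is tracked as a distinct class and
 * nested-but-different waitqueue locks no longer look like
 * self-recursion to lockdep.
 */
#define wake_up_nested(q, subclass)					\
	do {								\
		unsigned long flags;					\
									\
		spin_lock_irqsave_nested(&(q)->lock, flags, subclass);	\
		wake_up_locked(q);					\
		spin_unlock_irqrestore(&(q)->lock, flags);		\
	} while (0)
```

ep_poll_safewake() could then pass its current nesting depth (bounded by
EP_MAX_POLLWAKE_NESTS) as the subclass.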

2008-01-05 21:37:20

by Davide Libenzi

Subject: Re: 2.6.24-rc6: possible recursive locking detected

On Sat, 5 Jan 2008, Peter Zijlstra wrote:

>
> On Sat, 2008-01-05 at 17:53 +0100, Peter Zijlstra wrote:
> > On Sat, 2008-01-05 at 18:12 +1100, Herbert Xu wrote:
> > > On Fri, Jan 04, 2008 at 09:30:49AM +0100, Ingo Molnar wrote:
> > > >
> > > > > > [ 1310.670986] =============================================
> > > > > > [ 1310.671690] [ INFO: possible recursive locking detected ]
> > > > > > [ 1310.672097] 2.6.24-rc6 #1
> > > > > > [ 1310.672421] ---------------------------------------------
> > > > > > [ 1310.672828] FahCore_a0.exe/3692 is trying to acquire lock:
> > > > > > [ 1310.673238] (&q->lock){++..}, at: [<c011544b>] __wake_up+0x1b/0x50
> > > > > > [ 1310.673869]
> > > > > > [ 1310.673870] but task is already holding lock:
> > > > > > [ 1310.674567] (&q->lock){++..}, at: [<c011544b>] __wake_up+0x1b/0x50
> > > > > > [ 1310.675267]
> > > > > > [ 1310.675268] other info that might help us debug this:
> > > > > > [ 1310.675952] 5 locks held by FahCore_a0.exe/3692:
> > > > > > [ 1310.676334] #0: (rcu_read_lock){..--}, at: [<c038b620>] net_rx_action+0x60/0x1b0
> > > > > > [ 1310.677251] #1: (rcu_read_lock){..--}, at: [<c0388d60>] netif_receive_skb+0x100/0x470
> > > > > > [ 1310.677924] #2: (rcu_read_lock){..--}, at: [<c03a7fb2>] ip_local_deliver_finish+0x32/0x210
> > > > > > [ 1310.678460] #3: (clock-AF_INET){-.-?}, at: [<c038164e>] sock_def_readable+0x1e/0x80
> > > > > > [ 1310.679250] #4: (&q->lock){++..}, at: [<c011544b>] __wake_up+0x1b/0x50
> > >
> > > The net part might just be a red herring, since the problem is that
> > > __wake_up is somehow reentering itself.
> >
> > /*
> > * Perform a safe wake up of the poll wait list. The problem is that
> > * with the new callback'd wake up system, it is possible that the
> > * poll callback is reentered from inside the call to wake_up() done
> > * on the poll wait queue head. The rule is that we cannot reenter the
> > * wake up code from the same task more than EP_MAX_POLLWAKE_NESTS times,
> > * and we cannot reenter the same wait queue head at all. This will
> > * enable to have a hierarchy of epoll file descriptor of no more than
> > * EP_MAX_POLLWAKE_NESTS deep. We need the irq version of the spin lock
> > * because this one gets called by the poll callback, that in turn is called
> > * from inside a wake_up(), that might be called from irq context.
> > */
> >
> > Seems to suggest that the epoll code can indeed recurse into wakeup.
> >
> > Davide, Johannes, any ideas?
>
> Since EP_MAX_POLLWAKE_NESTS < MAX_LOCKDEP_SUBCLASSES we could perhaps do
> something like:
>
> wake_up_nested(..., wake_nests);
>
> although I'm not quite sure that is correct, my understanding of this
> code is still fragile at best.

I remember I talked with Arjan about this some time ago. Basically, since 1)
you can drop an epoll fd inside another epoll fd and 2) callback-based wakeups
are used, you can see a wake_up() from inside another wake_up(), but they
will never refer to the same lock instance.
Think about:

dfd = socket(...);
efd1 = epoll_create();
efd2 = epoll_create();
epoll_ctl(efd1, EPOLL_CTL_ADD, dfd, ...);
epoll_ctl(efd2, EPOLL_CTL_ADD, efd1, ...);

When a packet arrives at the device underneath "dfd", the net code will
issue a wake_up() on its poll wake list. Epoll (efd1) has installed a
callback wakeup entry on that queue, and the wake_up() performed by the
"dfd" net code will end up in ep_poll_callback(). At this point epoll
(efd1) notices that it may have some event ready, so it needs to wake up
the waiters on its poll wait list (efd2). So it calls ep_poll_safewake()
that ends up in another wake_up(), after having checked the recursion
constraints. Those are: no more than EP_MAX_POLLWAKE_NESTS nesting levels, to
avoid stack blasting, and never hit the same wait queue twice, to avoid loops like:

epoll_ctl(efd2, EPOLL_CTL_ADD, efd1, ...);
epoll_ctl(efd3, EPOLL_CTL_ADD, efd2, ...);
epoll_ctl(efd4, EPOLL_CTL_ADD, efd3, ...);
epoll_ctl(efd1, EPOLL_CTL_ADD, efd4, ...);

The code "if (tncur->wq == wq || ..." prevents re-entering the same
queue/lock.
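[ Editor's note: the dfd/efd1/efd2 scenario above can be driven from
userspace; here is a minimal sketch, using a Unix socketpair as a stand-in
for the network device under "dfd" -- that substitution is for illustration
only: ]

```c
#include <assert.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns the number of events the outer epoll fd (efd2) reports after
 * data arrives on dfd.  The readiness notification travels
 * dfd -> efd1 -> efd2, i.e. one wake_up() nested inside another on the
 * kernel side (error handling reduced to asserts for brevity). */
static int nested_epoll_events(void)
{
	int sv[2];
	assert(socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) == 0);

	int dfd  = sv[0];
	int efd1 = epoll_create(1);
	int efd2 = epoll_create(1);
	assert(efd1 >= 0 && efd2 >= 0);

	struct epoll_event ev = { .events = EPOLLIN };
	ev.data.fd = dfd;
	assert(epoll_ctl(efd1, EPOLL_CTL_ADD, dfd, &ev) == 0);

	/* an epoll fd dropped inside another epoll fd */
	ev.data.fd = efd1;
	assert(epoll_ctl(efd2, EPOLL_CTL_ADD, efd1, &ev) == 0);

	/* make dfd readable: the kernel wakes efd1's poll list, and
	 * efd1's callback in turn wakes efd2's poll list */
	assert(write(sv[1], "x", 1) == 1);

	struct epoll_event out;
	return epoll_wait(efd2, &out, 1, 1000);
}
```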
I don't know how the lockdep code works, so I can't say about
wake_up_nested(). Although I have a feeling it is not enough in this case.
A solution may be to move the call to ep_poll_safewake() (that'd become a
simple wake_up()) inside a tasklet or whatever is today trendy for delayed
work. But this kinda scares me to be honest, since epoll already has a
bunch of places where it could be asynchronously hit (plus any performance
regression will need to be verified).



- Davide

2008-01-06 00:20:42

by Christian Kujau

Subject: Re: 2.6.24-rc6: possible recursive locking detected

On Sat, 5 Jan 2008, Davide Libenzi wrote:
> A solution may be to move the call to ep_poll_safewake() (that'd become a
> simple wake_up()) inside a tasklet or whatever is today trendy for delayed
> work. But his kinda scares me to be honest, since epoll has already a
> bunch of places where it could be asynchronously hit (plus performance
> regression will need to be verified).

Although I'm not able to reproduce this one right now, I'm happy to test
any patches you guys come up with.

Thanks for your time,
Christian.
--
BOFH excuse #78:

Yes, yes, its called a design limitation

2008-01-06 21:45:11

by Cyrill Gorcunov

Subject: Re: 2.6.24-rc6: possible recursive locking detected

[Davide Libenzi - Sat, Jan 05, 2008 at 01:35:25PM -0800]
| On Sat, 5 Jan 2008, Peter Zijlstra wrote:
|
[...snip...]
| I remember I talked with Arjan about this time ago. Basically, since 1)
| you can drop an epoll fd inside another epoll fd 2) callback-based wakeups
| are used, you can see a wake_up() from inside another wake_up(), but they
| will never refer to the same lock instance.
| Think about:
|
| dfd = socket(...);
| efd1 = epoll_create();
| efd2 = epoll_create();
| epoll_ctl(efd1, EPOLL_CTL_ADD, dfd, ...);
| epoll_ctl(efd2, EPOLL_CTL_ADD, efd1, ...);
|
| When a packet arrives to the device underneath "dfd", the net code will
| issue a wake_up() on its poll wake list. Epoll (efd1) has installed a
| callback wakeup entry on that queue, and the wake_up() performed by the
| "dfd" net code will end up in ep_poll_callback(). At this point epoll
| (efd1) notices that it may have some event ready, so it needs to wake up
| the waiters on its poll wait list (efd2). So it calls ep_poll_safewake()
| that ends up in another wake_up(), after having checked about the
| recursion constraints. That are, no more than EP_MAX_POLLWAKE_NESTS, to
| avoid stack blasting. Never hit the same queue, to avoid loops like:
|
| epoll_ctl(efd2, EPOLL_CTL_ADD, efd1, ...);
| epoll_ctl(efd3, EPOLL_CTL_ADD, efd2, ...);
| epoll_ctl(efd4, EPOLL_CTL_ADD, efd3, ...);
| epoll_ctl(efd1, EPOLL_CTL_ADD, efd4, ...);
|
| The code "if (tncur->wq == wq || ..." prevents re-entering the same
| queue/lock.
| I don't know how the lockdep code works, so I can't say about
| wake_up_nested(). Although I have a feeling is not enough in this case.
| A solution may be to move the call to ep_poll_safewake() (that'd become a
| simple wake_up()) inside a tasklet or whatever is today trendy for delayed
| work. But his kinda scares me to be honest, since epoll has already a
| bunch of places where it could be asynchronously hit (plus performance
| regression will need to be verified).
|
|
|
| - Davide
|
|

It's quite possible that I'm wrong, but just interested...
why in ep_poll_safewake() the assignment

struct list_head *lsthead = &psw->wake_task_list;

is not protected by a spinlock?

- Cyrill -

2008-01-06 21:54:22

by Cyrill Gorcunov

Subject: Re: 2.6.24-rc6: possible recursive locking detected

[Cyrill Gorcunov - Mon, Jan 07, 2008 at 12:44:42AM +0300]
| [Davide Libenzi - Sat, Jan 05, 2008 at 01:35:25PM -0800]
| | On Sat, 5 Jan 2008, Peter Zijlstra wrote:
| |
| [...snip...]
| | I remember I talked with Arjan about this time ago. Basically, since 1)
| | you can drop an epoll fd inside another epoll fd 2) callback-based wakeups
| | are used, you can see a wake_up() from inside another wake_up(), but they
| | will never refer to the same lock instance.
| | Think about:
| |
| | dfd = socket(...);
| | efd1 = epoll_create();
| | efd2 = epoll_create();
| | epoll_ctl(efd1, EPOLL_CTL_ADD, dfd, ...);
| | epoll_ctl(efd2, EPOLL_CTL_ADD, efd1, ...);
| |
| | When a packet arrives to the device underneath "dfd", the net code will
| | issue a wake_up() on its poll wake list. Epoll (efd1) has installed a
| | callback wakeup entry on that queue, and the wake_up() performed by the
| | "dfd" net code will end up in ep_poll_callback(). At this point epoll
| | (efd1) notices that it may have some event ready, so it needs to wake up
| | the waiters on its poll wait list (efd2). So it calls ep_poll_safewake()
| | that ends up in another wake_up(), after having checked about the
| | recursion constraints. That are, no more than EP_MAX_POLLWAKE_NESTS, to
| | avoid stack blasting. Never hit the same queue, to avoid loops like:
| |
| | epoll_ctl(efd2, EPOLL_CTL_ADD, efd1, ...);
| | epoll_ctl(efd3, EPOLL_CTL_ADD, efd2, ...);
| | epoll_ctl(efd4, EPOLL_CTL_ADD, efd3, ...);
| | epoll_ctl(efd1, EPOLL_CTL_ADD, efd4, ...);
| |
| | The code "if (tncur->wq == wq || ..." prevents re-entering the same
| | queue/lock.
| | I don't know how the lockdep code works, so I can't say about
| | wake_up_nested(). Although I have a feeling is not enough in this case.
| | A solution may be to move the call to ep_poll_safewake() (that'd become a
| | simple wake_up()) inside a tasklet or whatever is today trendy for delayed
| | work. But his kinda scares me to be honest, since epoll has already a
| | bunch of places where it could be asynchronously hit (plus performance
| | regression will need to be verified).
| |
| |
| |
| | - Davide
| |
| |
|
| it's quite possible that i'm wrong but just interested...
| why in ep_poll_safewake() the assignment
|
| struct list_head *lsthead = &psw->wake_task_list;
|
| is not protected by spinlock?
|
| - Cyrill -

it was a completely stupid question... please drop ;)

- Cyrill -

2008-01-07 17:21:14

by Oleg Nesterov

Subject: Re: 2.6.24-rc6: possible recursive locking detected

On 01/05, Peter Zijlstra wrote:
>
> Since EP_MAX_POLLWAKE_NESTS < MAX_LOCKDEP_SUBCLASSES we could perhaps do
> something like:
>
> wake_up_nested(..., wake_nests);

I think this would be the most correct change. But I wonder if it is possible
to do something more generic (but otoh more stupid/hackish and less safe).

Consider this "just for illustration" patch,

--- t/kernel/lockdep.c 2007-11-09 12:57:31.000000000 +0300
+++ t/kernel/lockdep.c 2008-01-07 19:43:50.000000000 +0300
@@ -1266,10 +1266,13 @@ check_deadlock(struct task_struct *curr,
struct held_lock *prev;
int i;

- for (i = 0; i < curr->lockdep_depth; i++) {
+ for (i = curr->lockdep_depth; --i >= 0; ) {
prev = curr->held_locks + i;
if (prev->class != next->class)
continue;
+
+ if (prev->trylock == -1)
+ return 2;
/*
* Allow read-after-read recursion of the same
* lock class (i.e. read_lock(lock)+read_lock(lock)):
-------------------------------------------------------------------------

Now,

// trylock == -1
#define spin_mark_nested(l) \
lock_acquire(&(l)->dep_map, 0, -1, 0, 2, _THIS_IP_)
#define spin_unmark_nested(l) \
lock_release(&(l)->dep_map, 1, _THIS_IP_)

and ep_poll_safewake() can do:

/* Do really wake up now */
spin_mark_nested(&wq->lock);
wake_up(wq);
spin_unmark_nested(&wq->lock);

Possible?

Oleg.

2008-01-07 17:47:26

by Oleg Nesterov

Subject: Re: 2.6.24-rc6: possible recursive locking detected

On 01/07, Oleg Nesterov wrote:
>
> Consider this "just for illustration" patch,
>
> --- t/kernel/lockdep.c 2007-11-09 12:57:31.000000000 +0300
> +++ t/kernel/lockdep.c 2008-01-07 19:43:50.000000000 +0300
> @@ -1266,10 +1266,13 @@ check_deadlock(struct task_struct *curr,
> struct held_lock *prev;
> int i;
>
> - for (i = 0; i < curr->lockdep_depth; i++) {
> + for (i = curr->lockdep_depth; --i >= 0; ) {
> prev = curr->held_locks + i;
> if (prev->class != next->class)
> continue;
> +
> + if (prev->trylock == -1)
> + return 2;
> /*
> * Allow read-after-read recursion of the same
> * lock class (i.e. read_lock(lock)+read_lock(lock)):
> -------------------------------------------------------------------------
>
> Now,
>
> // trylock == -1
> #define spin_mark_nested(l) \
> lock_acquire(&(l)->dep_map, 0, -1, 0, 2, _THIS_IP_)
> #define spin_unmark_nested(l) \
> lock_release(&(l)->dep_map, 1, _THIS_IP_)
>
> and ep_poll_safewake() can do:
>
> /* Do really wake up now */
> spin_mark_nested(&wq->lock);
> wake_up(wq);
> spin_unmark_nested(&wq->lock);

I tested the patch above with the following code,

wait_queue_head_t w1, w2, w3;

init_waitqueue_head(&w1);
init_waitqueue_head(&w2);
init_waitqueue_head(&w3);

local_irq_disable();
spin_lock(&w1.lock);

spin_mark_nested(&w2.lock);
spin_lock(&w2.lock);

spin_mark_nested(&w3.lock);
wake_up(&w3);
spin_unmark_nested(&w3.lock);

spin_unlock(&w2.lock);
spin_unmark_nested(&w2.lock);

spin_unlock(&w1.lock);
local_irq_enable();

seems to work. What do you think?

Oleg.

2008-01-07 21:36:07

by Davide Libenzi

Subject: Re: 2.6.24-rc6: possible recursive locking detected

On Sun, 6 Jan 2008, Christian Kujau wrote:

> On Sat, 5 Jan 2008, Davide Libenzi wrote:
> > A solution may be to move the call to ep_poll_safewake() (that'd become a
> > simple wake_up()) inside a tasklet or whatever is today trendy for delayed
> > work. But his kinda scares me to be honest, since epoll has already a
> > bunch of places where it could be asynchronously hit (plus performance
> > regression will need to be verified).
>
> Although I'm not able to reproduce this one right now, I'm happy to test any
> patches you guys come up with.

There's no need to reproduce it. It's there, it's among us ;)



- Davide

2008-01-13 16:32:25

by Peter Zijlstra

Subject: Re: 2.6.24-rc6: possible recursive locking detected


On Mon, 2008-01-07 at 20:49 +0300, Oleg Nesterov wrote:
> On 01/07, Oleg Nesterov wrote:
> >
> > Consider this "just for illustration" patch,
> >
> > --- t/kernel/lockdep.c 2007-11-09 12:57:31.000000000 +0300
> > +++ t/kernel/lockdep.c 2008-01-07 19:43:50.000000000 +0300
> > @@ -1266,10 +1266,13 @@ check_deadlock(struct task_struct *curr,
> > struct held_lock *prev;
> > int i;
> >
> > - for (i = 0; i < curr->lockdep_depth; i++) {
> > + for (i = curr->lockdep_depth; --i >= 0; ) {
> > prev = curr->held_locks + i;
> > if (prev->class != next->class)
> > continue;
> > +
> > + if (prev->trylock == -1)
> > + return 2;
> > /*
> > * Allow read-after-read recursion of the same
> > * lock class (i.e. read_lock(lock)+read_lock(lock)):
> > -------------------------------------------------------------------------
> >
> > Now,
> >
> > // trylock == -1
> > #define spin_mark_nested(l) \
> > lock_acquire(&(l)->dep_map, 0, -1, 0, 2, _THIS_IP_)
> > #define spin_unmark_nested(l) \
> > lock_release(&(l)->dep_map, 1, _THIS_IP_)
> >
> > and ep_poll_safewake() can do:
> >
> > /* Do really wake up now */
> > spin_mark_nested(&wq->lock);
> > wake_up(wq);
> > spin_unmark_nested(&wq->lock);
>
> I tested the patch above with the following code,
>
> wait_queue_head_t w1, w2, w3;
>
> init_waitqueue_head(&w1);
> init_waitqueue_head(&w2);
> init_waitqueue_head(&w3);
>
> local_irq_disable();
> spin_lock(&w1.lock);
>
> spin_mark_nested(&w2.lock);
> spin_lock(&w2.lock);
>
> spin_mark_nested(&w3.lock);
> wake_up(&w3);
> spin_unmark_nested(&w3.lock);
>
> spin_unlock(&w2.lock);
> spin_unmark_nested(&w2.lock);
>
> spin_unlock(&w1.lock);
> local_irq_enable();
>
> seems to work. What do you think?

I've been pondering this for a while, and some days I really like it,
some days I don't.

The problem I have with it is that it becomes very easy to falsely
annotate problems away - it's a very powerful annotation. That said, it's
almost powerful enough to annotate the device semaphore/mutex problem.

I think I'll do wake_up_nested() for now and keep this around.

Thanks for this very nice idea though.

2008-01-14 21:28:47

by Oleg Nesterov

[permalink] [raw]
Subject: Re: 2.6.24-rc6: possible recursive locking detected

On 01/13, Peter Zijlstra wrote:
>
> On Mon, 2008-01-07 at 20:49 +0300, Oleg Nesterov wrote:
>
> > On 01/07, Oleg Nesterov wrote:
> > >
> > > Consider this "just for illustration" patch,
> > >
> > > --- t/kernel/lockdep.c 2007-11-09 12:57:31.000000000 +0300
> > > +++ t/kernel/lockdep.c 2008-01-07 19:43:50.000000000 +0300
> > > @@ -1266,10 +1266,13 @@ check_deadlock(struct task_struct *curr,
> > > struct held_lock *prev;
> > > int i;
> > >
> > > - for (i = 0; i < curr->lockdep_depth; i++) {
> > > + for (i = curr->lockdep_depth; --i >= 0; ) {
> > > prev = curr->held_locks + i;
> > > if (prev->class != next->class)
> > > continue;
> > > +
> > > + if (prev->trylock == -1)
> > > + return 2;
> > > /*
> > > * Allow read-after-read recursion of the same
> > > * lock class (i.e. read_lock(lock)+read_lock(lock)):
> > > -------------------------------------------------------------------------
> > >
> > > Now,
> > >
> > > // trylock == -1
> > > #define spin_mark_nested(l) \
> > > lock_acquire(&(l)->dep_map, 0, -1, 0, 2, _THIS_IP_)
> > > #define spin_unmark_nested(l) \
> > > lock_release(&(l)->dep_map, 1, _THIS_IP_)
> > >
> > > and ep_poll_safewake() can do:
> > >
> > > /* Do really wake up now */
> > > spin_mark_nested(&wq->lock);
> > > wake_up(wq);
> > > spin_unmark_nested(&wq->lock);
> >
> > seems to work. What do you think?
>
> I've been pondering this for a while, and some days I really like it,
> some days I don't.
>
> The problem I have with it is that it becomes very easy to falsely
> annotate problems away - it's a very powerful annotation.

Also, I don't like the overloading of ->trylock, this is really hackish.

> I think I'll do wake_up_nested() for now and keep this around.

Agreed.

Perhaps it is a bit easier to use spin_lock_nested() + __wake_up_common()
directly (we have a lot of wake_up_xxx helpers), but this is up to you.


Offtopic question. Why do we have so much lockdep stuff in timer.c and hrtimer.c?
We never lock 2 bases at the same time, except in migrate_timers(). We can kill
double_spin_lock() and base_lock_keys[] and just use spin_lock_nested() in
migrate_timers(), no?
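
[Editor's note: Oleg's suggestion would amount to something like the following
kernel-style sketch. It is illustrative pseudocode only - the body of
migrate_timers() is elided, and names like tvec_base_t follow the 2.6-era
timer code; SINGLE_DEPTH_NESTING is the standard lockdep subclass for
one-off nesting of two locks from the same class.]

	static void migrate_timers(int cpu)
	{
		tvec_base_t *old_base = per_cpu(tvec_bases, cpu);
		tvec_base_t *new_base = get_cpu_var(tvec_bases);

		/* Two base locks of the same class are taken together
		 * only here, so a one-off subclass annotation replaces
		 * double_spin_lock() and the per-CPU base_lock_keys[]. */
		spin_lock_irq(&new_base->lock);
		spin_lock_nested(&old_base->lock, SINGLE_DEPTH_NESTING);

		/* ... move pending timers from old_base to new_base ... */

		spin_unlock(&old_base->lock);
		spin_unlock_irq(&new_base->lock);
	}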

Oleg.

2008-01-30 10:34:50

by Peter Zijlstra

[permalink] [raw]
Subject: hrtimers and lockdep (was: Re: 2.6.24-rc6: possible recursive locking detected)

( trimmed CC list )

Sorry for the delay, this message seems to have gotten lost in my
inbox :-/

On Tue, 2008-01-15 at 00:27 +0300, Oleg Nesterov wrote:
> On 01/13, Peter Zijlstra wrote:

> Offtopic question. Why do we have so much lockdep stuff in timer.c and hrtimer.c?
> We never lock 2 bases at the same time, except in migrate_timers(). We can kill
> double_spin_lock() and base_lock_keys[] and just use spin_lock_nested() in
> migrate_timers(), no?

Let's ask Thomas... :-)

2008-01-30 17:37:08

by Thomas Gleixner

[permalink] [raw]
Subject: Re: hrtimers and lockdep (was: Re: 2.6.24-rc6: possible recursive locking detected)

On Wed, 30 Jan 2008, Peter Zijlstra wrote:

> ( trimmed CC list )
>
> Sorry for the delay, this message seems to have gotten lost in my
> inbox :-/
>
> On Tue, 2008-01-15 at 00:27 +0300, Oleg Nesterov wrote:
> > On 01/13, Peter Zijlstra wrote:
>
> > Offtopic question. Why do we have so much lockdep stuff in timer.c and hrtimer.c?
> > We never lock 2 bases at the same time, except in migrate_timers(). We can kill
> > double_spin_lock() and base_lock_keys[] and just use spin_lock_nested() in
> > migrate_timers(), no?
>
> Let's ask Thomas... :-)

No objections as long as it all works :)

tglx