2014-10-20 20:31:29

by Kevin Fenzi

Subject: localed stuck in recent 3.18 git in copy_net_ns?

Greetings.

I'm seeing suspend/resume failures with recent 3.18 git kernels.

Full dmesg at: http://paste.fedoraproject.org/143615/83287914/

The possibly interesting parts:

[ 78.373144] PM: Syncing filesystems ... done.
[ 78.411180] PM: Preparing system for mem sleep
[ 78.411995] Freezing user space processes ...
[ 98.429955] Freezing of tasks failed after 20.001 seconds (1 tasks refusing to freeze, wq_busy=0):
[ 98.429971] (-localed) D ffff88025f214c80 0 1866 1 0x00000084
[ 98.429975] ffff88024e777df8 0000000000000086 ffff88009b4444b0 0000000000014c80
[ 98.429978] ffff88024e777fd8 0000000000014c80 ffff880250ffb110 ffff88009b4444b0
[ 98.429981] 0000000000000000 ffffffff81cec1a0 ffffffff81cec1a4 ffff88009b4444b0
[ 98.429983] Call Trace:
[ 98.429991] [<ffffffff8175d619>] schedule_preempt_disabled+0x29/0x70
[ 98.429994] [<ffffffff8175f433>] __mutex_lock_slowpath+0xb3/0x120
[ 98.429997] [<ffffffff8175f4c3>] mutex_lock+0x23/0x40
[ 98.430001] [<ffffffff8163e325>] copy_net_ns+0x75/0x140
[ 98.430005] [<ffffffff810b8c2d>] create_new_namespaces+0xfd/0x1a0
[ 98.430008] [<ffffffff810b8e5a>] unshare_nsproxy_namespaces+0x5a/0xc0
[ 98.430012] [<ffffffff81098813>] SyS_unshare+0x193/0x340
[ 98.430015] [<ffffffff817617a9>] system_call_fastpath+0x12/0x17

[ 98.430032] Restarting tasks ... done.
[ 98.480361] PM: Syncing filesystems ... done.
[ 98.571645] PM: Preparing system for freeze sleep
[ 98.571779] Freezing user space processes ...
[ 118.592086] Freezing of tasks failed after 20.003 seconds (1 tasks refusing to freeze, wq_busy=0):
[ 118.592102] (-localed) D ffff88025f214c80 0 1866 1 0x00000084
[ 118.592106] ffff88024e777df8 0000000000000086 ffff88009b4444b0 0000000000014c80
[ 118.592109] ffff88024e777fd8 0000000000014c80 ffff880250ffb110 ffff88009b4444b0
[ 118.592111] 0000000000000000 ffffffff81cec1a0 ffffffff81cec1a4 ffff88009b4444b0
[ 118.592114] Call Trace:
[ 118.592121] [<ffffffff8175d619>] schedule_preempt_disabled+0x29/0x70
[ 118.592125] [<ffffffff8175f433>] __mutex_lock_slowpath+0xb3/0x120
[ 118.592127] [<ffffffff8175f4c3>] mutex_lock+0x23/0x40
[ 118.592132] [<ffffffff8163e325>] copy_net_ns+0x75/0x140
[ 118.592136] [<ffffffff810b8c2d>] create_new_namespaces+0xfd/0x1a0
[ 118.592139] [<ffffffff810b8e5a>] unshare_nsproxy_namespaces+0x5a/0xc0
[ 118.592143] [<ffffffff81098813>] SyS_unshare+0x193/0x340
[ 118.592146] [<ffffffff817617a9>] system_call_fastpath+0x12/0x17

[ 118.592163] Restarting tasks ... done.

root 6 0.0 0.0 0 0 ? D 13:49 0:00 [kworker/u16:0]
root 1876 0.0 0.0 41460 5784 ? Ds 13:49 0:00 (-localed)

I'll try and bisect this, but perhaps it rings bells already for folks.

kevin



2014-10-20 20:43:38

by Dave Jones

Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

On Mon, Oct 20, 2014 at 02:15:15PM -0600, Kevin Fenzi wrote:

> I'm seeing suspend/resume failures with recent 3.18 git kernels.
>
> Full dmesg at: http://paste.fedoraproject.org/143615/83287914/
>
> The possibly interesting parts:
>
> [ 78.373144] PM: Syncing filesystems ... done.
> [ 78.411180] PM: Preparing system for mem sleep
> [ 78.411995] Freezing user space processes ...
> [ 98.429955] Freezing of tasks failed after 20.001 seconds (1 tasks refusing to freeze, wq_busy=0):
> [ 98.429971] (-localed) D ffff88025f214c80 0 1866 1 0x00000084
> [ 98.429975] ffff88024e777df8 0000000000000086 ffff88009b4444b0 0000000000014c80
> [ 98.429978] ffff88024e777fd8 0000000000014c80 ffff880250ffb110 ffff88009b4444b0
> [ 98.429981] 0000000000000000 ffffffff81cec1a0 ffffffff81cec1a4 ffff88009b4444b0
> [ 98.429983] Call Trace:
> [ 98.429991] [<ffffffff8175d619>] schedule_preempt_disabled+0x29/0x70
> [ 98.429994] [<ffffffff8175f433>] __mutex_lock_slowpath+0xb3/0x120
> [ 98.429997] [<ffffffff8175f4c3>] mutex_lock+0x23/0x40
> [ 98.430001] [<ffffffff8163e325>] copy_net_ns+0x75/0x140
> [ 98.430005] [<ffffffff810b8c2d>] create_new_namespaces+0xfd/0x1a0
> [ 98.430008] [<ffffffff810b8e5a>] unshare_nsproxy_namespaces+0x5a/0xc0
> [ 98.430012] [<ffffffff81098813>] SyS_unshare+0x193/0x340
> [ 98.430015] [<ffffffff817617a9>] system_call_fastpath+0x12/0x17

I've seen similar soft lockup traces from the sys_unshare path when running my
fuzz tester. It seems that if you create enough network namespaces,
it can take a huge amount of time for them to be iterated.
(Running trinity with '-c unshare' you can see the slow down happen. In
some cases, it takes so long that the watchdog process kills it --
though the SIGKILL won't get delivered until the unshare() completes)

Any idea what this machine had been doing prior to this that may have
involved creating lots of namespaces?

Dave

2014-10-20 20:54:04

by Kevin Fenzi

Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

On Mon, 20 Oct 2014 16:43:26 -0400
Dave Jones <[email protected]> wrote:

> I've seen similar soft lockup traces from the sys_unshare path when
> running my fuzz tester. It seems that if you create enough network
> namespaces, it can take a huge amount of time for them to be iterated.
> (Running trinity with '-c unshare' you can see the slow down happen.
> In some cases, it takes so long that the watchdog process kills it --
> though the SIGKILL won't get delivered until the unshare() completes)
>
> Any idea what this machine had been doing prior to this that may have
> involved creating lots of namespaces ?

That was right after boot. ;)

This is my main rawhide running laptop.

A 'ip netns list' shows nothing.

kevin



2014-10-21 21:12:31

by Kevin Fenzi

Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

On Mon, 20 Oct 2014 14:53:59 -0600
Kevin Fenzi <[email protected]> wrote:

> On Mon, 20 Oct 2014 16:43:26 -0400
> Dave Jones <[email protected]> wrote:
>
> > I've seen similar soft lockup traces from the sys_unshare path when
> > running my fuzz tester. It seems that if you create enough network
> > namespaces, it can take a huge amount of time for them to be
> > iterated. (Running trinity with '-c unshare' you can see the slow
> > down happen. In some cases, it takes so long that the watchdog
> > process kills it -- though the SIGKILL won't get delivered until
> > the unshare() completes)
> >
> > Any idea what this machine had been doing prior to this that may
> > have involved creating lots of namespaces ?
>
> That was right after boot. ;)
>
> This is my main rawhide running laptop.
>
> A 'ip netns list' shows nothing.

Some more information:

The problem started between:

v3.17-7872-g5ff0b9e1a1da and v3.17-8307-gf1d0d14120a8

(I can try and do a bisect, but have to head out on a trip tomorrow)

In all the kernels with the problem, there is a kworker process in D.

sysrq-t says:
Showing all locks held in the system:
Oct 21 15:06:31 voldemort.scrye.com kernel: 4 locks held by kworker/u16:0/6:
Oct 21 15:06:31 voldemort.scrye.com kernel: #0: ("%s""netns"){.+.+.+}, at: [<ffffffff810ccbff>] process_one_work+0x17f/0x850
Oct 21 15:06:31 voldemort.scrye.com kernel: #1: (net_cleanup_work){+.+.+.}, at: [<ffffffff810ccbff>] process_one_work+0x17f/0x850
Oct 21 15:06:31 voldemort.scrye.com kernel: #2: (net_mutex){+.+.+.}, at: [<ffffffff817069fc>] cleanup_net+0x8c/0x1f0
Oct 21 15:06:31 voldemort.scrye.com kernel: #3: (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff8112a395>] _rcu_barrier+0x35/0x200

On the first run, any of the systemd units that use PrivateNetwork run
OK, but they are also set to time out after a minute. On successive
runs they hang in D as well.

kevin



2014-10-22 17:12:29

by Josh Boyer

Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

On Tue, Oct 21, 2014 at 5:12 PM, Kevin Fenzi <[email protected]> wrote:
> On Mon, 20 Oct 2014 14:53:59 -0600
> Kevin Fenzi <[email protected]> wrote:
>
>> On Mon, 20 Oct 2014 16:43:26 -0400
>> Dave Jones <[email protected]> wrote:
>>
>> > I've seen similar soft lockup traces from the sys_unshare path when
>> > running my fuzz tester. It seems that if you create enough network
>> > namespaces, it can take a huge amount of time for them to be
>> > iterated. (Running trinity with '-c unshare' you can see the slow
>> > down happen. In some cases, it takes so long that the watchdog
>> > process kills it -- though the SIGKILL won't get delivered until
>> > the unshare() completes)
>> >
>> > Any idea what this machine had been doing prior to this that may
>> > have involved creating lots of namespaces ?
>>
>> That was right after boot. ;)
>>
>> This is my main rawhide running laptop.
>>
>> A 'ip netns list' shows nothing.
>
> Some more information:
>
> The problem started between:
>
> v3.17-7872-g5ff0b9e1a1da and v3.17-8307-gf1d0d14120a8
>
> (I can try and do a bisect, but have to head out on a trip tomorrow)
>
> In all the kernels with the problem, there is a kworker process in D.
>
> sysrq-t says:
> Showing all locks held in the system:
> Oct 21 15:06:31 voldemort.scrye.com kernel: 4 locks held by kworker/u16:0/6:
> Oct 21 15:06:31 voldemort.scrye.com kernel: #0: ("%s""netns"){.+.+.+}, at: [<ffffffff810ccbff>] process_one_work+0x17f/0x850
> Oct 21 15:06:31 voldemort.scrye.com kernel: #1: (net_cleanup_work){+.+.+.}, at: [<ffffffff810ccbff>] process_one_work+0x17f/0x850
> Oct 21 15:06:31 voldemort.scrye.com kernel: #2: (net_mutex){+.+.+.}, at: [<ffffffff817069fc>] cleanup_net+0x8c/0x1f0
> Oct 21 15:06:31 voldemort.scrye.com kernel: #3:
> (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff8112a395>]
> _rcu_barrier+0x35/0x200
>
> On first running any of the systemd units that use PrivateNetwork, then
> run ok, but they are also set to timeout after a minute. On sucessive
> runs they hang in D also.

Someone else is seeing this when they try and modprobe ppp_generic:

[ 240.599195] INFO: task kworker/u16:5:100 blocked for more than 120 seconds.
[ 240.599338] Not tainted 3.18.0-0.rc1.git2.1.fc22.x86_64 #1
[ 240.599446] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 240.599583] kworker/u16:5 D ffff8802202db480 12400 100 2 0x00000000
[ 240.599744] Workqueue: netns cleanup_net
[ 240.599823] ffff8802202eb9e8 0000000000000096 ffff8802202db480 00000000001d5f00
[ 240.600066] ffff8802202ebfd8 00000000001d5f00 ffff8800368c3480 ffff8802202db480
[ 240.600228] ffffffff81ee2690 7fffffffffffffff ffffffff81ee2698 ffffffff81ee2690
[ 240.600386] Call Trace:
[ 240.600445] [<ffffffff8185e239>] schedule+0x29/0x70
[ 240.600541] [<ffffffff8186345c>] schedule_timeout+0x26c/0x410
[ 240.600651] [<ffffffff81865ef7>] ? retint_restore_args+0x13/0x13
[ 240.600765] [<ffffffff818644e4>] ? _raw_spin_unlock_irq+0x34/0x50
[ 240.600879] [<ffffffff8185fc6c>] wait_for_completion+0x10c/0x150
[ 240.601025] [<ffffffff810e53e0>] ? wake_up_state+0x20/0x20
[ 240.601133] [<ffffffff8112a749>] _rcu_barrier+0x159/0x200
[ 240.601237] [<ffffffff8112a845>] rcu_barrier+0x15/0x20
[ 240.601335] [<ffffffff81718ebf>] netdev_run_todo+0x6f/0x310
[ 240.601442] [<ffffffff8170da85>] ? rollback_registered_many+0x265/0x2e0
[ 240.601564] [<ffffffff81725f2e>] rtnl_unlock+0xe/0x10
[ 240.601660] [<ffffffff8170f8e6>] default_device_exit_batch+0x156/0x180
[ 240.601781] [<ffffffff810fd8a0>] ? abort_exclusive_wait+0xb0/0xb0
[ 240.601895] [<ffffffff81707993>] ops_exit_list.isra.1+0x53/0x60
[ 240.602028] [<ffffffff81708540>] cleanup_net+0x100/0x1f0
[ 240.602131] [<ffffffff810ccfa8>] process_one_work+0x218/0x850
[ 240.602241] [<ffffffff810ccf0f>] ? process_one_work+0x17f/0x850
[ 240.602350] [<ffffffff810cd6c7>] ? worker_thread+0xe7/0x4a0
[ 240.602454] [<ffffffff810cd64b>] worker_thread+0x6b/0x4a0
[ 240.602555] [<ffffffff810cd5e0>] ? process_one_work+0x850/0x850
[ 240.602665] [<ffffffff810d399b>] kthread+0x10b/0x130
[ 240.602762] [<ffffffff81028cc9>] ? sched_clock+0x9/0x10
[ 240.602862] [<ffffffff810d3890>] ? kthread_create_on_node+0x250/0x250
[ 240.603004] [<ffffffff818651fc>] ret_from_fork+0x7c/0xb0
[ 240.603106] [<ffffffff810d3890>] ? kthread_create_on_node+0x250/0x250
[ 240.603224] 4 locks held by kworker/u16:5/100:
[ 240.603304] #0: ("%s""netns"){.+.+.+}, at: [<ffffffff810ccf0f>] process_one_work+0x17f/0x850
[ 240.603495] #1: (net_cleanup_work){+.+.+.}, at: [<ffffffff810ccf0f>] process_one_work+0x17f/0x850
[ 240.603691] #2: (net_mutex){+.+.+.}, at: [<ffffffff817084cc>] cleanup_net+0x8c/0x1f0
[ 240.603869] #3: (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff8112a625>] _rcu_barrier+0x35/0x200
[ 240.604211] INFO: task modprobe:1387 blocked for more than 120 seconds.
[ 240.604329] Not tainted 3.18.0-0.rc1.git2.1.fc22.x86_64 #1
[ 240.604434] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 240.604570] modprobe D ffff8800cb4f1a40 13112 1387 1386 0x00000080
[ 240.604719] ffff8800cafbbbe8 0000000000000096 ffff8800cb4f1a40 00000000001d5f00
[ 240.604878] ffff8800cafbbfd8 00000000001d5f00 ffff880223280000 ffff8800cb4f1a40
[ 240.605068] ffff8800cb4f1a40 ffffffff81f8fb48 0000000000000246 ffff8800cb4f1a40
[ 240.605228] Call Trace:
[ 240.605283] [<ffffffff8185e7e1>] schedule_preempt_disabled+0x31/0x80
[ 240.605400] [<ffffffff81860033>] mutex_lock_nested+0x183/0x440
[ 240.605510] [<ffffffff8170835f>] ? register_pernet_subsys+0x1f/0x50
[ 240.605626] [<ffffffff8170835f>] ? register_pernet_subsys+0x1f/0x50
[ 240.605757] [<ffffffffa0701000>] ? 0xffffffffa0701000
[ 240.605854] [<ffffffff8170835f>] register_pernet_subsys+0x1f/0x50
[ 240.606005] [<ffffffffa0701048>] br_init+0x48/0xd3 [bridge]
[ 240.606112] [<ffffffff81002148>] do_one_initcall+0xd8/0x210
[ 240.606224] [<ffffffff81153c02>] load_module+0x20c2/0x2870
[ 240.606327] [<ffffffff8114ebe0>] ? store_uevent+0x70/0x70
[ 240.606433] [<ffffffff8110ac26>] ? lock_release_non_nested+0x3c6/0x3d0
[ 240.606557] [<ffffffff81154497>] SyS_init_module+0xe7/0x140
[ 240.606664] [<ffffffff818652a9>] system_call_fastpath+0x12/0x17
[ 240.606773] 1 lock held by modprobe/1387:
[ 240.606845] #0: (net_mutex){+.+.+.}, at: [<ffffffff8170835f>] register_pernet_subsys+0x1f/0x50
[ 240.607114] INFO: task modprobe:1466 blocked for more than 120 seconds.
[ 240.607231] Not tainted 3.18.0-0.rc1.git2.1.fc22.x86_64 #1
[ 240.607337] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 240.607473] modprobe D ffff88020fbab480 13096 1466 1399 0x00000084
[ 240.607622] ffff88020d1bbbe8 0000000000000096 ffff88020fbab480 00000000001d5f00
[ 240.607791] ffff88020d1bbfd8 00000000001d5f00 ffffffff81e1b580 ffff88020fbab480
[ 240.607949] ffff88020fbab480 ffffffff81f8fb48 0000000000000246 ffff88020fbab480
[ 240.608138] Call Trace:
[ 240.608193] [<ffffffff8185e7e1>] schedule_preempt_disabled+0x31/0x80
[ 240.608316] [<ffffffff81860033>] mutex_lock_nested+0x183/0x440
[ 240.608425] [<ffffffff817083ad>] ? register_pernet_device+0x1d/0x70
[ 240.608542] [<ffffffff817083ad>] ? register_pernet_device+0x1d/0x70
[ 240.608662] [<ffffffffa071d000>] ? 0xffffffffa071d000
[ 240.608759] [<ffffffff817083ad>] register_pernet_device+0x1d/0x70
[ 240.608881] [<ffffffffa071d020>] ppp_init+0x20/0x1000 [ppp_generic]
[ 240.609021] [<ffffffff81002148>] do_one_initcall+0xd8/0x210
[ 240.609131] [<ffffffff81153c02>] load_module+0x20c2/0x2870
[ 240.609235] [<ffffffff8114ebe0>] ? store_uevent+0x70/0x70
[ 240.609339] [<ffffffff8110ac26>] ? lock_release_non_nested+0x3c6/0x3d0
[ 240.609462] [<ffffffff81154497>] SyS_init_module+0xe7/0x140
[ 240.609568] [<ffffffff818652a9>] system_call_fastpath+0x12/0x17
[ 240.609677] 1 lock held by modprobe/1466:
[ 240.609749] #0: (net_mutex){+.+.+.}, at: [<ffffffff817083ad>] register_pernet_device+0x1d/0x70

Looks like contention on net_mutex or something, but I honestly have
no idea yet. I can't recreate it myself at the moment or I would
bisect.

Has nobody else run into this with the pre-3.18 kernels? Fedora isn't
carrying any patches in this area.

josh

2014-10-22 17:37:58

by Cong Wang

Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

(Adding Paul and Eric in Cc)

I am not aware of any relevant change in net/core/dev.c here,
so I guess it's a bug in rcu_barrier().

Thanks.

On Wed, Oct 22, 2014 at 10:12 AM, Josh Boyer <[email protected]> wrote:
>
> Someone else is seeing this when they try and modprobe ppp_generic:
>
> [ 240.599195] INFO: task kworker/u16:5:100 blocked for more than 120 seconds.
> [ 240.599338] Not tainted 3.18.0-0.rc1.git2.1.fc22.x86_64 #1
> [ 240.599446] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [ 240.599583] kworker/u16:5 D ffff8802202db480 12400 100 2 0x00000000
> [ 240.599744] Workqueue: netns cleanup_net
> [ 240.599823] ffff8802202eb9e8 0000000000000096 ffff8802202db480
> 00000000001d5f00
> [ 240.600066] ffff8802202ebfd8 00000000001d5f00 ffff8800368c3480
> ffff8802202db480
> [ 240.600228] ffffffff81ee2690 7fffffffffffffff ffffffff81ee2698
> ffffffff81ee2690
> [ 240.600386] Call Trace:
> [ 240.600445] [<ffffffff8185e239>] schedule+0x29/0x70
> [ 240.600541] [<ffffffff8186345c>] schedule_timeout+0x26c/0x410
> [ 240.600651] [<ffffffff81865ef7>] ? retint_restore_args+0x13/0x13
> [ 240.600765] [<ffffffff818644e4>] ? _raw_spin_unlock_irq+0x34/0x50
> [ 240.600879] [<ffffffff8185fc6c>] wait_for_completion+0x10c/0x150
> [ 240.601025] [<ffffffff810e53e0>] ? wake_up_state+0x20/0x20
> [ 240.601133] [<ffffffff8112a749>] _rcu_barrier+0x159/0x200
> [ 240.601237] [<ffffffff8112a845>] rcu_barrier+0x15/0x20
> [ 240.601335] [<ffffffff81718ebf>] netdev_run_todo+0x6f/0x310
> [ 240.601442] [<ffffffff8170da85>] ? rollback_registered_many+0x265/0x2e0
> [ 240.601564] [<ffffffff81725f2e>] rtnl_unlock+0xe/0x10
> [ 240.601660] [<ffffffff8170f8e6>] default_device_exit_batch+0x156/0x180
> [ 240.601781] [<ffffffff810fd8a0>] ? abort_exclusive_wait+0xb0/0xb0
> [ 240.601895] [<ffffffff81707993>] ops_exit_list.isra.1+0x53/0x60
> [ 240.602028] [<ffffffff81708540>] cleanup_net+0x100/0x1f0
> [ 240.602131] [<ffffffff810ccfa8>] process_one_work+0x218/0x850
> [ 240.602241] [<ffffffff810ccf0f>] ? process_one_work+0x17f/0x850
> [ 240.602350] [<ffffffff810cd6c7>] ? worker_thread+0xe7/0x4a0
> [ 240.602454] [<ffffffff810cd64b>] worker_thread+0x6b/0x4a0
> [ 240.602555] [<ffffffff810cd5e0>] ? process_one_work+0x850/0x850
> [ 240.602665] [<ffffffff810d399b>] kthread+0x10b/0x130
> [ 240.602762] [<ffffffff81028cc9>] ? sched_clock+0x9/0x10
> [ 240.602862] [<ffffffff810d3890>] ? kthread_create_on_node+0x250/0x250
> [ 240.603004] [<ffffffff818651fc>] ret_from_fork+0x7c/0xb0
> [ 240.603106] [<ffffffff810d3890>] ? kthread_create_on_node+0x250/0x250
> [ 240.603224] 4 locks held by kworker/u16:5/100:
> [ 240.603304] #0: ("%s""netns"){.+.+.+}, at: [<ffffffff810ccf0f>]
> process_one_work+0x17f/0x850
> [ 240.603495] #1: (net_cleanup_work){+.+.+.}, at:
> [<ffffffff810ccf0f>] process_one_work+0x17f/0x850
> [ 240.603691] #2: (net_mutex){+.+.+.}, at: [<ffffffff817084cc>]
> cleanup_net+0x8c/0x1f0
> [ 240.603869] #3: (rcu_sched_state.barrier_mutex){+.+...}, at:
> [<ffffffff8112a625>] _rcu_barrier+0x35/0x200
> [ 240.604211] INFO: task modprobe:1387 blocked for more than 120 seconds.
> [ 240.604329] Not tainted 3.18.0-0.rc1.git2.1.fc22.x86_64 #1
> [ 240.604434] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [ 240.604570] modprobe D ffff8800cb4f1a40 13112 1387 1386 0x00000080
> [ 240.604719] ffff8800cafbbbe8 0000000000000096 ffff8800cb4f1a40
> 00000000001d5f00
> [ 240.604878] ffff8800cafbbfd8 00000000001d5f00 ffff880223280000
> ffff8800cb4f1a40
> [ 240.605068] ffff8800cb4f1a40 ffffffff81f8fb48 0000000000000246
> ffff8800cb4f1a40
> [ 240.605228] Call Trace:
> [ 240.605283] [<ffffffff8185e7e1>] schedule_preempt_disabled+0x31/0x80
> [ 240.605400] [<ffffffff81860033>] mutex_lock_nested+0x183/0x440
> [ 240.605510] [<ffffffff8170835f>] ? register_pernet_subsys+0x1f/0x50
> [ 240.605626] [<ffffffff8170835f>] ? register_pernet_subsys+0x1f/0x50
> [ 240.605757] [<ffffffffa0701000>] ? 0xffffffffa0701000
> [ 240.605854] [<ffffffff8170835f>] register_pernet_subsys+0x1f/0x50
> [ 240.606005] [<ffffffffa0701048>] br_init+0x48/0xd3 [bridge]
> [ 240.606112] [<ffffffff81002148>] do_one_initcall+0xd8/0x210
> [ 240.606224] [<ffffffff81153c02>] load_module+0x20c2/0x2870
> [ 240.606327] [<ffffffff8114ebe0>] ? store_uevent+0x70/0x70
> [ 240.606433] [<ffffffff8110ac26>] ? lock_release_non_nested+0x3c6/0x3d0
> [ 240.606557] [<ffffffff81154497>] SyS_init_module+0xe7/0x140
> [ 240.606664] [<ffffffff818652a9>] system_call_fastpath+0x12/0x17
> [ 240.606773] 1 lock held by modprobe/1387:
> [ 240.606845] #0: (net_mutex){+.+.+.}, at: [<ffffffff8170835f>]
> register_pernet_subsys+0x1f/0x50
> [ 240.607114] INFO: task modprobe:1466 blocked for more than 120 seconds.
> [ 240.607231] Not tainted 3.18.0-0.rc1.git2.1.fc22.x86_64 #1
> [ 240.607337] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [ 240.607473] modprobe D ffff88020fbab480 13096 1466 1399 0x00000084
> [ 240.607622] ffff88020d1bbbe8 0000000000000096 ffff88020fbab480
> 00000000001d5f00
> [ 240.607791] ffff88020d1bbfd8 00000000001d5f00 ffffffff81e1b580
> ffff88020fbab480
> [ 240.607949] ffff88020fbab480 ffffffff81f8fb48 0000000000000246
> ffff88020fbab480
> [ 240.608138] Call Trace:
> [ 240.608193] [<ffffffff8185e7e1>] schedule_preempt_disabled+0x31/0x80
> [ 240.608316] [<ffffffff81860033>] mutex_lock_nested+0x183/0x440
> [ 240.608425] [<ffffffff817083ad>] ? register_pernet_device+0x1d/0x70
> [ 240.608542] [<ffffffff817083ad>] ? register_pernet_device+0x1d/0x70
> [ 240.608662] [<ffffffffa071d000>] ? 0xffffffffa071d000
> [ 240.608759] [<ffffffff817083ad>] register_pernet_device+0x1d/0x70
> [ 240.608881] [<ffffffffa071d020>] ppp_init+0x20/0x1000 [ppp_generic]
> [ 240.609021] [<ffffffff81002148>] do_one_initcall+0xd8/0x210
> [ 240.609131] [<ffffffff81153c02>] load_module+0x20c2/0x2870
> [ 240.609235] [<ffffffff8114ebe0>] ? store_uevent+0x70/0x70
> [ 240.609339] [<ffffffff8110ac26>] ? lock_release_non_nested+0x3c6/0x3d0
> [ 240.609462] [<ffffffff81154497>] SyS_init_module+0xe7/0x140
> [ 240.609568] [<ffffffff818652a9>] system_call_fastpath+0x12/0x17
> [ 240.609677] 1 lock held by modprobe/1466:
> [ 240.609749] #0: (net_mutex){+.+.+.}, at: [<ffffffff817083ad>]
> register_pernet_device+0x1d/0x70
>
> Looks like contention on net_mutex or something, but I honestly have
> no idea yet. I can't recreate it myself at the moment or I would
> bisect.
>
> Has nobody else run into this with the pre-3.18 kernels? Fedora isn't
> carrying any patches in this area.
>
> josh
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2014-10-22 17:49:40

by Josh Boyer

Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

On Wed, Oct 22, 2014 at 1:37 PM, Cong Wang <[email protected]> wrote:
> (Adding Paul and Eric in Cc)
>
> I am not aware of any change in net/core/dev.c related here,
> so I guess it's a bug in rcu_barrier().

Possibly. The person who reported the issue below said it showed up
between Linux v3.17-7872-g5ff0b9e1a1da and Linux
v3.17-8307-gf1d0d14120a8 for them, which is a slightly older window
than the one Kevin reported. I haven't had a chance to dig
through the commits yet.

josh

> On Wed, Oct 22, 2014 at 10:12 AM, Josh Boyer <[email protected]> wrote:
>>
>> Someone else is seeing this when they try and modprobe ppp_generic:
>>
>> [ 240.599195] INFO: task kworker/u16:5:100 blocked for more than 120 seconds.
>> [ 240.599338] Not tainted 3.18.0-0.rc1.git2.1.fc22.x86_64 #1
>> [ 240.599446] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>> disables this message.
>> [ 240.599583] kworker/u16:5 D ffff8802202db480 12400 100 2 0x00000000
>> [ 240.599744] Workqueue: netns cleanup_net
>> [ 240.599823] ffff8802202eb9e8 0000000000000096 ffff8802202db480
>> 00000000001d5f00
>> [ 240.600066] ffff8802202ebfd8 00000000001d5f00 ffff8800368c3480
>> ffff8802202db480
>> [ 240.600228] ffffffff81ee2690 7fffffffffffffff ffffffff81ee2698
>> ffffffff81ee2690
>> [ 240.600386] Call Trace:
>> [ 240.600445] [<ffffffff8185e239>] schedule+0x29/0x70
>> [ 240.600541] [<ffffffff8186345c>] schedule_timeout+0x26c/0x410
>> [ 240.600651] [<ffffffff81865ef7>] ? retint_restore_args+0x13/0x13
>> [ 240.600765] [<ffffffff818644e4>] ? _raw_spin_unlock_irq+0x34/0x50
>> [ 240.600879] [<ffffffff8185fc6c>] wait_for_completion+0x10c/0x150
>> [ 240.601025] [<ffffffff810e53e0>] ? wake_up_state+0x20/0x20
>> [ 240.601133] [<ffffffff8112a749>] _rcu_barrier+0x159/0x200
>> [ 240.601237] [<ffffffff8112a845>] rcu_barrier+0x15/0x20
>> [ 240.601335] [<ffffffff81718ebf>] netdev_run_todo+0x6f/0x310
>> [ 240.601442] [<ffffffff8170da85>] ? rollback_registered_many+0x265/0x2e0
>> [ 240.601564] [<ffffffff81725f2e>] rtnl_unlock+0xe/0x10
>> [ 240.601660] [<ffffffff8170f8e6>] default_device_exit_batch+0x156/0x180
>> [ 240.601781] [<ffffffff810fd8a0>] ? abort_exclusive_wait+0xb0/0xb0
>> [ 240.601895] [<ffffffff81707993>] ops_exit_list.isra.1+0x53/0x60
>> [ 240.602028] [<ffffffff81708540>] cleanup_net+0x100/0x1f0
>> [ 240.602131] [<ffffffff810ccfa8>] process_one_work+0x218/0x850
>> [ 240.602241] [<ffffffff810ccf0f>] ? process_one_work+0x17f/0x850
>> [ 240.602350] [<ffffffff810cd6c7>] ? worker_thread+0xe7/0x4a0
>> [ 240.602454] [<ffffffff810cd64b>] worker_thread+0x6b/0x4a0
>> [ 240.602555] [<ffffffff810cd5e0>] ? process_one_work+0x850/0x850
>> [ 240.602665] [<ffffffff810d399b>] kthread+0x10b/0x130
>> [ 240.602762] [<ffffffff81028cc9>] ? sched_clock+0x9/0x10
>> [ 240.602862] [<ffffffff810d3890>] ? kthread_create_on_node+0x250/0x250
>> [ 240.603004] [<ffffffff818651fc>] ret_from_fork+0x7c/0xb0
>> [ 240.603106] [<ffffffff810d3890>] ? kthread_create_on_node+0x250/0x250
>> [ 240.603224] 4 locks held by kworker/u16:5/100:
>> [ 240.603304] #0: ("%s""netns"){.+.+.+}, at: [<ffffffff810ccf0f>]
>> process_one_work+0x17f/0x850
>> [ 240.603495] #1: (net_cleanup_work){+.+.+.}, at:
>> [<ffffffff810ccf0f>] process_one_work+0x17f/0x850
>> [ 240.603691] #2: (net_mutex){+.+.+.}, at: [<ffffffff817084cc>]
>> cleanup_net+0x8c/0x1f0
>> [ 240.603869] #3: (rcu_sched_state.barrier_mutex){+.+...}, at:
>> [<ffffffff8112a625>] _rcu_barrier+0x35/0x200
>> [ 240.604211] INFO: task modprobe:1387 blocked for more than 120 seconds.
>> [ 240.604329] Not tainted 3.18.0-0.rc1.git2.1.fc22.x86_64 #1
>> [ 240.604434] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>> disables this message.
>> [ 240.604570] modprobe D ffff8800cb4f1a40 13112 1387 1386 0x00000080
>> [ 240.604719] ffff8800cafbbbe8 0000000000000096 ffff8800cb4f1a40
>> 00000000001d5f00
>> [ 240.604878] ffff8800cafbbfd8 00000000001d5f00 ffff880223280000
>> ffff8800cb4f1a40
>> [ 240.605068] ffff8800cb4f1a40 ffffffff81f8fb48 0000000000000246
>> ffff8800cb4f1a40
>> [ 240.605228] Call Trace:
>> [ 240.605283] [<ffffffff8185e7e1>] schedule_preempt_disabled+0x31/0x80
>> [ 240.605400] [<ffffffff81860033>] mutex_lock_nested+0x183/0x440
>> [ 240.605510] [<ffffffff8170835f>] ? register_pernet_subsys+0x1f/0x50
>> [ 240.605626] [<ffffffff8170835f>] ? register_pernet_subsys+0x1f/0x50
>> [ 240.605757] [<ffffffffa0701000>] ? 0xffffffffa0701000
>> [ 240.605854] [<ffffffff8170835f>] register_pernet_subsys+0x1f/0x50
>> [ 240.606005] [<ffffffffa0701048>] br_init+0x48/0xd3 [bridge]
>> [ 240.606112] [<ffffffff81002148>] do_one_initcall+0xd8/0x210
>> [ 240.606224] [<ffffffff81153c02>] load_module+0x20c2/0x2870
>> [ 240.606327] [<ffffffff8114ebe0>] ? store_uevent+0x70/0x70
>> [ 240.606433] [<ffffffff8110ac26>] ? lock_release_non_nested+0x3c6/0x3d0
>> [ 240.606557] [<ffffffff81154497>] SyS_init_module+0xe7/0x140
>> [ 240.606664] [<ffffffff818652a9>] system_call_fastpath+0x12/0x17
>> [ 240.606773] 1 lock held by modprobe/1387:
>> [ 240.606845] #0: (net_mutex){+.+.+.}, at: [<ffffffff8170835f>]
>> register_pernet_subsys+0x1f/0x50
>> [ 240.607114] INFO: task modprobe:1466 blocked for more than 120 seconds.
>> [ 240.607231] Not tainted 3.18.0-0.rc1.git2.1.fc22.x86_64 #1
>> [ 240.607337] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>> disables this message.
>> [ 240.607473] modprobe D ffff88020fbab480 13096 1466 1399 0x00000084
>> [ 240.607622] ffff88020d1bbbe8 0000000000000096 ffff88020fbab480
>> 00000000001d5f00
>> [ 240.607791] ffff88020d1bbfd8 00000000001d5f00 ffffffff81e1b580
>> ffff88020fbab480
>> [ 240.607949] ffff88020fbab480 ffffffff81f8fb48 0000000000000246
>> ffff88020fbab480
>> [ 240.608138] Call Trace:
>> [ 240.608193] [<ffffffff8185e7e1>] schedule_preempt_disabled+0x31/0x80
>> [ 240.608316] [<ffffffff81860033>] mutex_lock_nested+0x183/0x440
>> [ 240.608425] [<ffffffff817083ad>] ? register_pernet_device+0x1d/0x70
>> [ 240.608542] [<ffffffff817083ad>] ? register_pernet_device+0x1d/0x70
>> [ 240.608662] [<ffffffffa071d000>] ? 0xffffffffa071d000
>> [ 240.608759] [<ffffffff817083ad>] register_pernet_device+0x1d/0x70
>> [ 240.608881] [<ffffffffa071d020>] ppp_init+0x20/0x1000 [ppp_generic]
>> [ 240.609021] [<ffffffff81002148>] do_one_initcall+0xd8/0x210
>> [ 240.609131] [<ffffffff81153c02>] load_module+0x20c2/0x2870
>> [ 240.609235] [<ffffffff8114ebe0>] ? store_uevent+0x70/0x70
>> [ 240.609339] [<ffffffff8110ac26>] ? lock_release_non_nested+0x3c6/0x3d0
>> [ 240.609462] [<ffffffff81154497>] SyS_init_module+0xe7/0x140
>> [ 240.609568] [<ffffffff818652a9>] system_call_fastpath+0x12/0x17
>> [ 240.609677] 1 lock held by modprobe/1466:
>> [ 240.609749] #0: (net_mutex){+.+.+.}, at: [<ffffffff817083ad>]
>> register_pernet_device+0x1d/0x70
>>
>> Looks like contention on net_mutex or something, but I honestly have
>> no idea yet. I can't recreate it myself at the moment or I would
>> bisect.
>>
>> Has nobody else run into this with the pre-3.18 kernels? Fedora isn't
>> carrying any patches in this area.
>>
>> josh

2014-10-22 17:54:16

by Eric W. Biederman

Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

Cong Wang <[email protected]> writes:

> (Adding Paul and Eric in Cc)
>
>
> On Wed, Oct 22, 2014 at 10:12 AM, Josh Boyer <[email protected]> wrote:
>>
>> Someone else is seeing this when they try and modprobe ppp_generic:
>>
>> [ 240.599195] INFO: task kworker/u16:5:100 blocked for more than 120 seconds.
>> [ 240.599338] Not tainted 3.18.0-0.rc1.git2.1.fc22.x86_64 #1
>> [ 240.599446] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>> disables this message.
>> [ 240.599583] kworker/u16:5 D ffff8802202db480 12400 100 2 0x00000000
>> [ 240.599744] Workqueue: netns cleanup_net
>> [ 240.599823] ffff8802202eb9e8 0000000000000096 ffff8802202db480
>> 00000000001d5f00
>> [ 240.600066] ffff8802202ebfd8 00000000001d5f00 ffff8800368c3480
>> ffff8802202db480
>> [ 240.600228] ffffffff81ee2690 7fffffffffffffff ffffffff81ee2698
>> ffffffff81ee2690
>> [ 240.600386] Call Trace:
>> [ 240.600445] [<ffffffff8185e239>] schedule+0x29/0x70
>> [ 240.600541] [<ffffffff8186345c>] schedule_timeout+0x26c/0x410
>> [ 240.600651] [<ffffffff81865ef7>] ? retint_restore_args+0x13/0x13
>> [ 240.600765] [<ffffffff818644e4>] ? _raw_spin_unlock_irq+0x34/0x50
>> [ 240.600879] [<ffffffff8185fc6c>] wait_for_completion+0x10c/0x150
>> [ 240.601025] [<ffffffff810e53e0>] ? wake_up_state+0x20/0x20
>> [ 240.601133] [<ffffffff8112a749>] _rcu_barrier+0x159/0x200
>> [ 240.601237] [<ffffffff8112a845>] rcu_barrier+0x15/0x20
>> [ 240.601335] [<ffffffff81718ebf>] netdev_run_todo+0x6f/0x310
>> [ 240.601442] [<ffffffff8170da85>] ? rollback_registered_many+0x265/0x2e0
>> [ 240.601564] [<ffffffff81725f2e>] rtnl_unlock+0xe/0x10
>> [ 240.601660] [<ffffffff8170f8e6>] default_device_exit_batch+0x156/0x180
>> [ 240.601781] [<ffffffff810fd8a0>] ? abort_exclusive_wait+0xb0/0xb0
>> [ 240.601895] [<ffffffff81707993>] ops_exit_list.isra.1+0x53/0x60
>> [ 240.602028] [<ffffffff81708540>] cleanup_net+0x100/0x1f0
>> [ 240.602131] [<ffffffff810ccfa8>] process_one_work+0x218/0x850
>> [ 240.602241] [<ffffffff810ccf0f>] ? process_one_work+0x17f/0x850
>> [ 240.602350] [<ffffffff810cd6c7>] ? worker_thread+0xe7/0x4a0
>> [ 240.602454] [<ffffffff810cd64b>] worker_thread+0x6b/0x4a0
>> [ 240.602555] [<ffffffff810cd5e0>] ? process_one_work+0x850/0x850
>> [ 240.602665] [<ffffffff810d399b>] kthread+0x10b/0x130
>> [ 240.602762] [<ffffffff81028cc9>] ? sched_clock+0x9/0x10
>> [ 240.602862] [<ffffffff810d3890>] ? kthread_create_on_node+0x250/0x250
>> [ 240.603004] [<ffffffff818651fc>] ret_from_fork+0x7c/0xb0
>> [ 240.603106] [<ffffffff810d3890>] ? kthread_create_on_node+0x250/0x250
>> [ 240.603224] 4 locks held by kworker/u16:5/100:
>> [ 240.603304] #0: ("%s""netns"){.+.+.+}, at: [<ffffffff810ccf0f>]
>> process_one_work+0x17f/0x850
>> [ 240.603495] #1: (net_cleanup_work){+.+.+.}, at:
>> [<ffffffff810ccf0f>] process_one_work+0x17f/0x850
>> [ 240.603691] #2: (net_mutex){+.+.+.}, at: [<ffffffff817084cc>]
>> cleanup_net+0x8c/0x1f0
>> [ 240.603869] #3: (rcu_sched_state.barrier_mutex){+.+...}, at:
>> [<ffffffff8112a625>] _rcu_barrier+0x35/0x200
>> [ 240.604211] INFO: task modprobe:1387 blocked for more than 120 seconds.
>> [ 240.604329] Not tainted 3.18.0-0.rc1.git2.1.fc22.x86_64 #1
>> [ 240.604434] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>> disables this message.
>> [ 240.604570] modprobe D ffff8800cb4f1a40 13112 1387 1386 0x00000080
>> [ 240.604719] ffff8800cafbbbe8 0000000000000096 ffff8800cb4f1a40
>> 00000000001d5f00
>> [ 240.604878] ffff8800cafbbfd8 00000000001d5f00 ffff880223280000
>> ffff8800cb4f1a40
>> [ 240.605068] ffff8800cb4f1a40 ffffffff81f8fb48 0000000000000246
>> ffff8800cb4f1a40
>> [ 240.605228] Call Trace:
>> [ 240.605283] [<ffffffff8185e7e1>] schedule_preempt_disabled+0x31/0x80
>> [ 240.605400] [<ffffffff81860033>] mutex_lock_nested+0x183/0x440
>> [ 240.605510] [<ffffffff8170835f>] ? register_pernet_subsys+0x1f/0x50
>> [ 240.605626] [<ffffffff8170835f>] ? register_pernet_subsys+0x1f/0x50
>> [ 240.605757] [<ffffffffa0701000>] ? 0xffffffffa0701000
>> [ 240.605854] [<ffffffff8170835f>] register_pernet_subsys+0x1f/0x50
>> [ 240.606005] [<ffffffffa0701048>] br_init+0x48/0xd3 [bridge]
>> [ 240.606112] [<ffffffff81002148>] do_one_initcall+0xd8/0x210
>> [ 240.606224] [<ffffffff81153c02>] load_module+0x20c2/0x2870
>> [ 240.606327] [<ffffffff8114ebe0>] ? store_uevent+0x70/0x70
>> [ 240.606433] [<ffffffff8110ac26>] ? lock_release_non_nested+0x3c6/0x3d0
>> [ 240.606557] [<ffffffff81154497>] SyS_init_module+0xe7/0x140
>> [ 240.606664] [<ffffffff818652a9>] system_call_fastpath+0x12/0x17
>> [ 240.606773] 1 lock held by modprobe/1387:
>> [ 240.606845] #0: (net_mutex){+.+.+.}, at: [<ffffffff8170835f>]
>> register_pernet_subsys+0x1f/0x50
>> [ 240.607114] INFO: task modprobe:1466 blocked for more than 120 seconds.
>> [ 240.607231] Not tainted 3.18.0-0.rc1.git2.1.fc22.x86_64 #1
>> [ 240.607337] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>> disables this message.
>> [ 240.607473] modprobe D ffff88020fbab480 13096 1466 1399 0x00000084
>> [ 240.607622] ffff88020d1bbbe8 0000000000000096 ffff88020fbab480
>> 00000000001d5f00
>> [ 240.607791] ffff88020d1bbfd8 00000000001d5f00 ffffffff81e1b580
>> ffff88020fbab480
>> [ 240.607949] ffff88020fbab480 ffffffff81f8fb48 0000000000000246
>> ffff88020fbab480
>> [ 240.608138] Call Trace:
>> [ 240.608193] [<ffffffff8185e7e1>] schedule_preempt_disabled+0x31/0x80
>> [ 240.608316] [<ffffffff81860033>] mutex_lock_nested+0x183/0x440
>> [ 240.608425] [<ffffffff817083ad>] ? register_pernet_device+0x1d/0x70
>> [ 240.608542] [<ffffffff817083ad>] ? register_pernet_device+0x1d/0x70
>> [ 240.608662] [<ffffffffa071d000>] ? 0xffffffffa071d000
>> [ 240.608759] [<ffffffff817083ad>] register_pernet_device+0x1d/0x70
>> [ 240.608881] [<ffffffffa071d020>] ppp_init+0x20/0x1000 [ppp_generic]
>> [ 240.609021] [<ffffffff81002148>] do_one_initcall+0xd8/0x210
>> [ 240.609131] [<ffffffff81153c02>] load_module+0x20c2/0x2870
>> [ 240.609235] [<ffffffff8114ebe0>] ? store_uevent+0x70/0x70
>> [ 240.609339] [<ffffffff8110ac26>] ? lock_release_non_nested+0x3c6/0x3d0
>> [ 240.609462] [<ffffffff81154497>] SyS_init_module+0xe7/0x140
>> [ 240.609568] [<ffffffff818652a9>] system_call_fastpath+0x12/0x17
>> [ 240.609677] 1 lock held by modprobe/1466:
>> [ 240.609749] #0: (net_mutex){+.+.+.}, at: [<ffffffff817083ad>]
>> register_pernet_device+0x1d/0x70
>>
>> Looks like contention on net_mutex or something, but I honestly have
>> no idea yet. I can't recreate it myself at the moment or I would
>> bisect.
>>
>> Has nobody else run into this with the pre-3.18 kernels? Fedora isn't
>> carrying any patches in this area.

> I am not aware of any change in net/core/dev.c related here,
> so I guess it's a bug in rcu_barrier().

From the limited trace data I see in this email, I have to agree.

It looks like rcu_barrier is, for some reason, taking forever
while the rtnl_lock is held in cleanup_net. Because the
rtnl_lock is held, modprobe of the ppp driver is getting stuck.

Is it possible we have an AB-BA deadlock between the rtnl_lock
and RCU, combined with something the module loading code assumes?
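
Eric's question amounts to asking whether the lock/wait dependencies close into a cycle. A rough illustration follows (plain user-space Python, not kernel code; the edge list is an assumption read off the lockdep output above, and the back-edge from rcu_barrier to net_mutex is the hypothetical dependency being asked about):

```python
# Build a wait-for graph from the hung-task report and look for a cycle.
# Edges mean "holder of A is waiting for B"; an AB-BA deadlock shows up
# as a cycle in this graph.

def find_cycle(edges):
    """Return one cycle in the directed graph as a list of nodes, or None."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)

    def dfs(node, path, seen):
        if node in path:                      # back-edge onto current path: cycle
            return path[path.index(node):] + [node]
        if node in seen:
            return None
        seen.add(node)
        for nxt in graph.get(node, []):
            cycle = dfs(nxt, path + [node], seen)
            if cycle:
                return cycle
        return None

    seen = set()
    for start in graph:
        cycle = dfs(start, [], seen)
        if cycle:
            return cycle
    return None

# cleanup_net holds net_mutex and waits in rcu_barrier();
# modprobe (register_pernet_subsys/register_pernet_device) waits on net_mutex.
# The last edge is the hypothetical dependency of RCU back on net_mutex:
edges = [
    ("net_mutex", "rcu_barrier"),   # cleanup_net: holds net_mutex, waits for RCU
    ("modprobe", "net_mutex"),      # module init: waits for net_mutex
    ("rcu_barrier", "net_mutex"),   # hypothetical back-edge in question
]
print(find_cycle(edges))  # a cycle through net_mutex and rcu_barrier
```

If the back-edge were real, the cycle above would be the deadlock; without it, the graph is a simple chain and modprobe is merely delayed, not deadlocked.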

Eric

2014-10-22 17:59:15

by Paul E. McKenney

[permalink] [raw]
Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

On Wed, Oct 22, 2014 at 10:37:53AM -0700, Cong Wang wrote:
> (Adding Paul and Eric in Cc)
>
> I am not aware of any change in net/core/dev.c related here,
> so I guess it's a bug in rcu_barrier().
>
> Thanks.

Do commits 789cbbeca4e (workqueue: Add quiescent state between work items)
and 3e28e3772 (workqueue: Use cond_resched_rcu_qs macro) help with this?

Thanx, Paul

> On Wed, Oct 22, 2014 at 10:12 AM, Josh Boyer <[email protected]> wrote:
> >
> > Someone else is seeing this when they try and modprobe ppp_generic:
> >
> > [ hung-task traces and remainder of quoted message snipped; quoted in full earlier in the thread ]
>

2014-10-22 18:03:14

by Josh Boyer

[permalink] [raw]
Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

On Wed, Oct 22, 2014 at 1:59 PM, Paul E. McKenney
<[email protected]> wrote:
> On Wed, Oct 22, 2014 at 10:37:53AM -0700, Cong Wang wrote:
>> (Adding Paul and Eric in Cc)
>>
>> I am not aware of any change in net/core/dev.c related here,
>> so I guess it's a bug in rcu_barrier().
>>
>> Thanks.
>
> Do commits 789cbbeca4e (workqueue: Add quiescent state between work items)
> and 3e28e3772 (workqueue: Use cond_resched_rcu_qs macro) help with this?

I don't believe so. The output below is from a post-3.18-rc1 kernel
(Linux v3.18-rc1-221-gc3351dfabf5c, to be exact), and both of those
commits are included in it, if I'm reading the git output correctly.
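
For what it's worth, one way to confirm that a commit is contained in a given build is git's ancestry test (this assumes a local clone of the kernel tree; the commit IDs are the ones Paul mentioned, and c3351dfabf5c is the head of the build above):

```shell
# Exits 0 (and prints) only if the named commit is an ancestor of the
# build's head commit, i.e. included in v3.18-rc1-221-gc3351dfabf5c.
git merge-base --is-ancestor 789cbbeca4e c3351dfabf5c && \
    echo "789cbbeca4e is included"
git merge-base --is-ancestor 3e28e3772 c3351dfabf5c && \
    echo "3e28e3772 is included"
```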

josh

>> On Wed, Oct 22, 2014 at 10:12 AM, Josh Boyer <[email protected]> wrote:
>> >
>> > Someone else is seeing this when they try and modprobe ppp_generic:
>> >
>> > [ hung-task traces and remainder of quoted message snipped; quoted in full earlier in the thread ]
>>
>

2014-10-22 18:11:42

by Paul E. McKenney

[permalink] [raw]
Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

On Wed, Oct 22, 2014 at 12:53:24PM -0500, Eric W. Biederman wrote:
> Cong Wang <[email protected]> writes:
>
> > (Adding Paul and Eric in Cc)
> >
> >
> > On Wed, Oct 22, 2014 at 10:12 AM, Josh Boyer <[email protected]> wrote:
> >>
> >> Someone else is seeing this when they try and modprobe ppp_generic:
> >>
> >> [ hung-task traces and remainder of quoted message snipped; quoted in full earlier in the thread ]
>
> > I am not aware of any change in net/core/dev.c related here,
> > so I guess it's a bug in rcu_barrier().
>
> From the limited trace data I see in this email, I have to agree.
>
> It looks like rcu_barrier is, for some reason, taking forever
> while the rtnl_lock is held in cleanup_net. Because the
> rtnl_lock is held, modprobe of the ppp driver is getting stuck.
>
> Is it possible we have an AB-BA deadlock between the rtnl_lock
> and RCU, combined with something the module loading code assumes?

I am not aware of RCU ever acquiring rtnl_lock, not directly, anyway.

Thanx, Paul

2014-10-22 18:26:30

by Eric W. Biederman

[permalink] [raw]
Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

"Paul E. McKenney" <[email protected]> writes:

> On Wed, Oct 22, 2014 at 12:53:24PM -0500, Eric W. Biederman wrote:
>> Cong Wang <[email protected]> writes:
>>
>> > (Adding Paul and Eric in Cc)
>> >
>> >
>> > On Wed, Oct 22, 2014 at 10:12 AM, Josh Boyer <[email protected]> wrote:
>> >>
>> >> Someone else is seeing this when they try and modprobe ppp_generic:
>> >>
>> >> [ 240.599195] INFO: task kworker/u16:5:100 blocked for more than 120 seconds.
>> >> [ 240.599338] Not tainted 3.18.0-0.rc1.git2.1.fc22.x86_64 #1
>> >> [ 240.599446] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>> >> disables this message.
>> >> [ 240.599583] kworker/u16:5 D ffff8802202db480 12400 100 2 0x00000000
>> >> [ 240.599744] Workqueue: netns cleanup_net
>> >> [ 240.599823] ffff8802202eb9e8 0000000000000096 ffff8802202db480
>> >> 00000000001d5f00
>> >> [ 240.600066] ffff8802202ebfd8 00000000001d5f00 ffff8800368c3480
>> >> ffff8802202db480
>> >> [ 240.600228] ffffffff81ee2690 7fffffffffffffff ffffffff81ee2698
>> >> ffffffff81ee2690
>> >> [ 240.600386] Call Trace:
>> >> [ 240.600445] [<ffffffff8185e239>] schedule+0x29/0x70
>> >> [ 240.600541] [<ffffffff8186345c>] schedule_timeout+0x26c/0x410
>> >> [ 240.600651] [<ffffffff81865ef7>] ? retint_restore_args+0x13/0x13
>> >> [ 240.600765] [<ffffffff818644e4>] ? _raw_spin_unlock_irq+0x34/0x50
>> >> [ 240.600879] [<ffffffff8185fc6c>] wait_for_completion+0x10c/0x150
>> >> [ 240.601025] [<ffffffff810e53e0>] ? wake_up_state+0x20/0x20
>> >> [ 240.601133] [<ffffffff8112a749>] _rcu_barrier+0x159/0x200
>> >> [ 240.601237] [<ffffffff8112a845>] rcu_barrier+0x15/0x20
>> >> [ 240.601335] [<ffffffff81718ebf>] netdev_run_todo+0x6f/0x310
>> >> [ 240.601442] [<ffffffff8170da85>] ? rollback_registered_many+0x265/0x2e0
>> >> [ 240.601564] [<ffffffff81725f2e>] rtnl_unlock+0xe/0x10
>> >> [ 240.601660] [<ffffffff8170f8e6>] default_device_exit_batch+0x156/0x180
>> >> [ 240.601781] [<ffffffff810fd8a0>] ? abort_exclusive_wait+0xb0/0xb0
>> >> [ 240.601895] [<ffffffff81707993>] ops_exit_list.isra.1+0x53/0x60
>> >> [ 240.602028] [<ffffffff81708540>] cleanup_net+0x100/0x1f0
>> >> [ 240.602131] [<ffffffff810ccfa8>] process_one_work+0x218/0x850
>> >> [ 240.602241] [<ffffffff810ccf0f>] ? process_one_work+0x17f/0x850
>> >> [ 240.602350] [<ffffffff810cd6c7>] ? worker_thread+0xe7/0x4a0
>> >> [ 240.602454] [<ffffffff810cd64b>] worker_thread+0x6b/0x4a0
>> >> [ 240.602555] [<ffffffff810cd5e0>] ? process_one_work+0x850/0x850
>> >> [ 240.602665] [<ffffffff810d399b>] kthread+0x10b/0x130
>> >> [ 240.602762] [<ffffffff81028cc9>] ? sched_clock+0x9/0x10
>> >> [ 240.602862] [<ffffffff810d3890>] ? kthread_create_on_node+0x250/0x250
>> >> [ 240.603004] [<ffffffff818651fc>] ret_from_fork+0x7c/0xb0
>> >> [ 240.603106] [<ffffffff810d3890>] ? kthread_create_on_node+0x250/0x250
>> >> [ 240.603224] 4 locks held by kworker/u16:5/100:
>> >> [ 240.603304] #0: ("%s""netns"){.+.+.+}, at: [<ffffffff810ccf0f>]
>> >> process_one_work+0x17f/0x850
>> >> [ 240.603495] #1: (net_cleanup_work){+.+.+.}, at:
>> >> [<ffffffff810ccf0f>] process_one_work+0x17f/0x850
>> >> [ 240.603691] #2: (net_mutex){+.+.+.}, at: [<ffffffff817084cc>]
>> >> cleanup_net+0x8c/0x1f0
>> >> [ 240.603869] #3: (rcu_sched_state.barrier_mutex){+.+...}, at:
>> >> [<ffffffff8112a625>] _rcu_barrier+0x35/0x200
>> >> [ 240.604211] INFO: task modprobe:1387 blocked for more than 120 seconds.
>> >> [ 240.604329] Not tainted 3.18.0-0.rc1.git2.1.fc22.x86_64 #1
>> >> [ 240.604434] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>> >> disables this message.
>> >> [ 240.604570] modprobe D ffff8800cb4f1a40 13112 1387 1386 0x00000080
>> >> [ 240.604719] ffff8800cafbbbe8 0000000000000096 ffff8800cb4f1a40
>> >> 00000000001d5f00
>> >> [ 240.604878] ffff8800cafbbfd8 00000000001d5f00 ffff880223280000
>> >> ffff8800cb4f1a40
>> >> [ 240.605068] ffff8800cb4f1a40 ffffffff81f8fb48 0000000000000246
>> >> ffff8800cb4f1a40
>> >> [ 240.605228] Call Trace:
>> >> [ 240.605283] [<ffffffff8185e7e1>] schedule_preempt_disabled+0x31/0x80
>> >> [ 240.605400] [<ffffffff81860033>] mutex_lock_nested+0x183/0x440
>> >> [ 240.605510] [<ffffffff8170835f>] ? register_pernet_subsys+0x1f/0x50
>> >> [ 240.605626] [<ffffffff8170835f>] ? register_pernet_subsys+0x1f/0x50
>> >> [ 240.605757] [<ffffffffa0701000>] ? 0xffffffffa0701000
>> >> [ 240.605854] [<ffffffff8170835f>] register_pernet_subsys+0x1f/0x50
>> >> [ 240.606005] [<ffffffffa0701048>] br_init+0x48/0xd3 [bridge]
>> >> [ 240.606112] [<ffffffff81002148>] do_one_initcall+0xd8/0x210
>> >> [ 240.606224] [<ffffffff81153c02>] load_module+0x20c2/0x2870
>> >> [ 240.606327] [<ffffffff8114ebe0>] ? store_uevent+0x70/0x70
>> >> [ 240.606433] [<ffffffff8110ac26>] ? lock_release_non_nested+0x3c6/0x3d0
>> >> [ 240.606557] [<ffffffff81154497>] SyS_init_module+0xe7/0x140
>> >> [ 240.606664] [<ffffffff818652a9>] system_call_fastpath+0x12/0x17
>> >> [ 240.606773] 1 lock held by modprobe/1387:
>> >> [ 240.606845] #0: (net_mutex){+.+.+.}, at: [<ffffffff8170835f>]
>> >> register_pernet_subsys+0x1f/0x50
>> >> [ 240.607114] INFO: task modprobe:1466 blocked for more than 120 seconds.
>> >> [ 240.607231] Not tainted 3.18.0-0.rc1.git2.1.fc22.x86_64 #1
>> >> [ 240.607337] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>> >> disables this message.
>> >> [ 240.607473] modprobe D ffff88020fbab480 13096 1466 1399 0x00000084
>> >> [ 240.607622] ffff88020d1bbbe8 0000000000000096 ffff88020fbab480
>> >> 00000000001d5f00
>> >> [ 240.607791] ffff88020d1bbfd8 00000000001d5f00 ffffffff81e1b580
>> >> ffff88020fbab480
>> >> [ 240.607949] ffff88020fbab480 ffffffff81f8fb48 0000000000000246
>> >> ffff88020fbab480
>> >> [ 240.608138] Call Trace:
>> >> [ 240.608193] [<ffffffff8185e7e1>] schedule_preempt_disabled+0x31/0x80
>> >> [ 240.608316] [<ffffffff81860033>] mutex_lock_nested+0x183/0x440
>> >> [ 240.608425] [<ffffffff817083ad>] ? register_pernet_device+0x1d/0x70
>> >> [ 240.608542] [<ffffffff817083ad>] ? register_pernet_device+0x1d/0x70
>> >> [ 240.608662] [<ffffffffa071d000>] ? 0xffffffffa071d000
>> >> [ 240.608759] [<ffffffff817083ad>] register_pernet_device+0x1d/0x70
>> >> [ 240.608881] [<ffffffffa071d020>] ppp_init+0x20/0x1000 [ppp_generic]
>> >> [ 240.609021] [<ffffffff81002148>] do_one_initcall+0xd8/0x210
>> >> [ 240.609131] [<ffffffff81153c02>] load_module+0x20c2/0x2870
>> >> [ 240.609235] [<ffffffff8114ebe0>] ? store_uevent+0x70/0x70
>> >> [ 240.609339] [<ffffffff8110ac26>] ? lock_release_non_nested+0x3c6/0x3d0
>> >> [ 240.609462] [<ffffffff81154497>] SyS_init_module+0xe7/0x140
>> >> [ 240.609568] [<ffffffff818652a9>] system_call_fastpath+0x12/0x17
>> >> [ 240.609677] 1 lock held by modprobe/1466:
>> >> [ 240.609749] #0: (net_mutex){+.+.+.}, at: [<ffffffff817083ad>]
>> >> register_pernet_device+0x1d/0x70
>> >>
>> >> Looks like contention on net_mutex or something, but I honestly have
>> >> no idea yet. I can't recreate it myself at the moment or I would
>> >> bisect.
>> >>
>> >> Has nobody else run into this with the pre-3.18 kernels? Fedora isn't
>> >> carrying any patches in this area.
>>
>> > I am not aware of any change in net/core/dev.c related here,
>> > so I guess it's a bug in rcu_barrier().
>>
>> From the limited trace data I see in this email I have to agree.
>>
>> It looks like for some reason rcu_barrier is taking forever
>> while the rtnl_lock is held in cleanup_net. Because the
>> rtnl_lock is held, modprobe of the ppp driver is getting stuck.
>>
>> Is it possible we have an AB-BA deadlock between the rtnl_lock
>> and RCU, with something the module loading code assumes?
>
> I am not aware of RCU ever acquiring rtnl_lock, not directly, anyway.

Does the module loading code do something strange with rcu? Perhaps
blocking an rcu grace period until the module loading completes?

If the module loading somehow blocks an rcu grace period, that would
create an AB-BA deadlock, because loading the ppp module grabs the
rtnl_lock, and elsewhere we have the rtnl_lock waiting for an rcu grace
period.

I would think trying and failing to get the rtnl_lock would sleep and
thus let any rcu grace period happen but shrug.

It looks like something is holding up the rcu grace period, and causing
this. Although it is possible that something is causing cleanup_net
to run slowly and we are just seeing that slowness show up in
rcu_barrier as that is one of the slower bits. With a single trace I
can't definitively say that the rcu barrier is getting stuck, but it
certainly looks that way.

Eric

2014-10-22 18:55:19

by Paul E. McKenney

Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

On Wed, Oct 22, 2014 at 01:25:37PM -0500, Eric W. Biederman wrote:
> "Paul E. McKenney" <[email protected]> writes:
>
> > On Wed, Oct 22, 2014 at 12:53:24PM -0500, Eric W. Biederman wrote:
> >> Cong Wang <[email protected]> writes:
> >>
> >> > (Adding Paul and Eric in Cc)
> >> >
> >> >
> >> > On Wed, Oct 22, 2014 at 10:12 AM, Josh Boyer <[email protected]> wrote:
> >> >>
> >> >> Someone else is seeing this when they try and modprobe ppp_generic:
> >> >>
> >> >> [ ... hung-task backtraces snipped; quoted in full earlier in the thread ... ]
> >> >>
> >> >> Looks like contention on net_mutex or something, but I honestly have
> >> >> no idea yet. I can't recreate it myself at the moment or I would
> >> >> bisect.
> >> >>
> >> >> Has nobody else run into this with the pre-3.18 kernels? Fedora isn't
> >> >> carrying any patches in this area.
> >>
> >> > I am not aware of any change in net/core/dev.c related here,
> >> > so I guess it's a bug in rcu_barrier().
> >>
> >> From the limited trace data I see in this email I have to agree.
> >>
> >> It looks like for some reason rcu_barrier is taking forever
> >> while the rtnl_lock is held in cleanup_net. Because the
> >> rtnl_lock is held, modprobe of the ppp driver is getting stuck.
> >>
> >> Is it possible we have an AB-BA deadlock between the rtnl_lock
> >> and RCU, with something the module loading code assumes?
> >
> > I am not aware of RCU ever acquiring rtnl_lock, not directly, anyway.
>
> Does the module loading code do something strange with rcu? Perhaps
> blocking an rcu grace period until the module loading completes?
>
> If the module loading somehow blocks an rcu grace period, that would
> create an AB-BA deadlock, because loading the ppp module grabs the
> rtnl_lock, and elsewhere we have the rtnl_lock waiting for an rcu grace
> period.
>
> I would think trying and failing to get the rtnl_lock would sleep and
> thus let any rcu grace period happen but shrug.
>
> It looks like something is holding up the rcu grace period, and causing
> this. Although it is possible that something is causing cleanup_net
> to run slowly and we are just seeing that slowness show up in
> rcu_barrier as that is one of the slower bits. With a single trace I
> can't definitively say that the rcu barrier is getting stuck, but it
> certainly looks that way.

Don't get me wrong -- the fact that this kthread appears to have
blocked within rcu_barrier() for 120 seconds means that something is
most definitely wrong here. I am surprised that there are no RCU CPU
stall warnings, but perhaps the blockage is in the callback execution
rather than grace-period completion. Or something is preventing this
kthread from starting up after the wake-up callback executes. Or...

Is this thing reproducible?

Thanx, Paul

2014-10-22 19:33:38

by Josh Boyer

Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

On Wed, Oct 22, 2014 at 2:55 PM, Paul E. McKenney
<[email protected]> wrote:
> On Wed, Oct 22, 2014 at 01:25:37PM -0500, Eric W. Biederman wrote:
>> "Paul E. McKenney" <[email protected]> writes:
>>
>> > On Wed, Oct 22, 2014 at 12:53:24PM -0500, Eric W. Biederman wrote:
>> >> Cong Wang <[email protected]> writes:
>> >>
>> >> > (Adding Paul and Eric in Cc)
>> >> >
>> >> >
>> >> > On Wed, Oct 22, 2014 at 10:12 AM, Josh Boyer <[email protected]> wrote:
>> >> >>
>> >> >> Someone else is seeing this when they try and modprobe ppp_generic:
>> >> >>
>> >> >> [ ... hung-task backtraces snipped; quoted in full earlier in the thread ... ]
>> >> >>
>> >> >> Looks like contention on net_mutex or something, but I honestly have
>> >> >> no idea yet. I can't recreate it myself at the moment or I would
>> >> >> bisect.
>> >> >>
>> >> >> Has nobody else run into this with the pre-3.18 kernels? Fedora isn't
>> >> >> carrying any patches in this area.
>> >>
>> >> > I am not aware of any change in net/core/dev.c related here,
>> >> > so I guess it's a bug in rcu_barrier().
>> >>
>> >> From the limited trace data I see in this email I have to agree.
>> >>
>> >> It looks like for some reason rcu_barrier is taking forever
>> >> while the rtnl_lock is held in cleanup_net. Because the
>> >> rtnl_lock is held, modprobe of the ppp driver is getting stuck.
>> >>
>> >> Is it possible we have an AB-BA deadlock between the rtnl_lock
>> >> and RCU, with something the module loading code assumes?
>> >
>> > I am not aware of RCU ever acquiring rtnl_lock, not directly, anyway.
>>
>> Does the module loading code do something strange with rcu? Perhaps
>> blocking an rcu grace period until the module loading completes?
>>
>> If the module loading somehow blocks an rcu grace period, that would
>> create an AB-BA deadlock, because loading the ppp module grabs the
>> rtnl_lock, and elsewhere we have the rtnl_lock waiting for an rcu grace
>> period.
>>
>> I would think trying and failing to get the rtnl_lock would sleep and
>> thus let any rcu grace period happen but shrug.
>>
>> It looks like something is holding up the rcu grace period, and causing
>> this. Although it is possible that something is causing cleanup_net
>> to run slowly and we are just seeing that slowness show up in
>> rcu_barrier as that is one of the slower bits. With a single trace I
>> can't definitively say that the rcu barrier is getting stuck, but it
>> certainly looks that way.
>
> Don't get me wrong -- the fact that this kthread appears to have
> blocked within rcu_barrier() for 120 seconds means that something is
> most definitely wrong here. I am surprised that there are no RCU CPU
> stall warnings, but perhaps the blockage is in the callback execution
> rather than grace-period completion. Or something is preventing this
> kthread from starting up after the wake-up callback executes. Or...
>
> Is this thing reproducible?

I've added Yanko on CC, who reported the backtrace above and can
recreate it reliably. Apparently reverting the RCU merge commit
(d6dd50e) and rebuilding the latest after that does not show the
issue. I'll let Yanko explain more and answer any questions you have.

josh

2014-10-22 22:47:17

by Yanko Kaneti

Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

On Wed-10/22/14-2014 15:33, Josh Boyer wrote:
> On Wed, Oct 22, 2014 at 2:55 PM, Paul E. McKenney
> <[email protected]> wrote:
> > On Wed, Oct 22, 2014 at 01:25:37PM -0500, Eric W. Biederman wrote:
> >> "Paul E. McKenney" <[email protected]> writes:
> >>
> >> > On Wed, Oct 22, 2014 at 12:53:24PM -0500, Eric W. Biederman wrote:
> >> >> Cong Wang <[email protected]> writes:
> >> >>
> >> >> > (Adding Paul and Eric in Cc)
> >> >> >
> >> >> >
> >> >> > On Wed, Oct 22, 2014 at 10:12 AM, Josh Boyer <[email protected]> wrote:
> >> >> >>
> >> >> >> Someone else is seeing this when they try and modprobe ppp_generic:
> >> >> >>
> >> >> >> [ ... hung-task backtraces snipped; quoted in full earlier in the thread ... ]
> >> >> >>
> >> >> >> Looks like contention on net_mutex or something, but I honestly have
> >> >> >> no idea yet. I can't recreate it myself at the moment or I would
> >> >> >> bisect.
> >> >> >>
> >> >> >> Has nobody else run into this with the pre-3.18 kernels? Fedora isn't
> >> >> >> carrying any patches in this area.
> >> >>
> >> >> > I am not aware of any change in net/core/dev.c related here,
> >> >> > so I guess it's a bug in rcu_barrier().
> >> >>
> >> >> From the limited trace data I see in this email I have to agree.
> >> >>
> >> >> It looks like for some reason rcu_barrier is taking forever
> >> >> while the rtnl_lock is held in cleanup_net. Because the
> >> >> rtnl_lock is held, modprobe of the ppp driver is getting stuck.
> >> >>
> >> >> Is it possible we have an AB-BA deadlock between the rtnl_lock
> >> >> and RCU, with something the module loading code assumes?
> >> >
> >> > I am not aware of RCU ever acquiring rtnl_lock, not directly, anyway.
> >>
> >> Does the module loading code do something strange with rcu? Perhaps
> >> blocking an rcu grace period until the module loading completes?
> >>
> >> If the module loading somehow blocks an rcu grace period that would
> >> create an AB deadlock because loading the ppp module grabs the
> >> rtnl_lock. And elsewhere we have the rtnl_lock waiting for an rcu grace
> >> period.
> >>
> >> I would think trying and failing to get the rtnl_lock would sleep and
> >> thus let any rcu grace period happen but shrug.
> >>
> >> It looks like something is holding up the rcu grace period, and causing
> >> this. Although it is possible that something is causing cleanup_net
> >> to run slowly and we are just seeing that slowness show up in
> >> rcu_barrier as that is one of the slower bits. With a single trace I
> >> can't definitively say that the rcu_barrier is getting stuck, but it
> >> certainly looks that way.
> >
> > Don't get me wrong -- the fact that this kthread appears to have
> > blocked within rcu_barrier() for 120 seconds means that something is
> > most definitely wrong here. I am surprised that there are no RCU CPU
> > stall warnings, but perhaps the blockage is in the callback execution
> > rather than grace-period completion. Or something is preventing this
> > kthread from starting up after the wake-up callback executes. Or...
> >
> > Is this thing reproducible?
>
> I've added Yanko on CC, who reported the backtrace above and can
> recreate it reliably. Apparently reverting the RCU merge commit
> (d6dd50e) and rebuilding the latest after that does not show the
> issue. I'll let Yanko explain more and answer any questions you have.

- It is reproducible
- I've done another build here to double-check, and it's definitely the RCU merge
that's causing it.

Don't think I'll be able to dig deeper, but I can do testing if needed.

--Yanko

2014-10-22 23:24:29

by Paul E. McKenney

Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

On Thu, Oct 23, 2014 at 01:40:32AM +0300, Yanko Kaneti wrote:
> On Wed-10/22/14-2014 15:33, Josh Boyer wrote:
> > On Wed, Oct 22, 2014 at 2:55 PM, Paul E. McKenney
> > <[email protected]> wrote:

[ . . . ]

> > > Don't get me wrong -- the fact that this kthread appears to have
> > > blocked within rcu_barrier() for 120 seconds means that something is
> > > most definitely wrong here. I am surprised that there are no RCU CPU
> > > stall warnings, but perhaps the blockage is in the callback execution
> > > rather than grace-period completion. Or something is preventing this
> > > kthread from starting up after the wake-up callback executes. Or...
> > >
> > > Is this thing reproducible?
> >
> > I've added Yanko on CC, who reported the backtrace above and can
> > recreate it reliably. Apparently reverting the RCU merge commit
> > (d6dd50e) and rebuilding the latest after that does not show the
> > issue. I'll let Yanko explain more and answer any questions you have.
>
> - It is reproducible
> - I've done another build here to double check and its definitely the rcu merge
> that's causing it.
>
> Don't think I'll be able to dig deeper, but I can do testing if needed.

Please! Does the following patch help?

Thanx, Paul

------------------------------------------------------------------------

rcu: More on deadlock between CPU hotplug and expedited grace periods

Commit dd56af42bd82 (rcu: Eliminate deadlock between CPU hotplug and
expedited grace periods) was incomplete. Although it did eliminate
deadlocks involving synchronize_sched_expedited()'s acquisition of
cpu_hotplug.lock via get_online_cpus(), it did nothing about the similar
deadlock involving acquisition of this same lock via put_online_cpus().
This deadlock became apparent with testing involving hibernation.

This commit therefore changes put_online_cpus() acquisition of this lock
to be conditional, and increments a new cpu_hotplug.puts_pending field
in case of acquisition failure. Then cpu_hotplug_begin() checks for this
new field being non-zero, and applies any changes to cpu_hotplug.refcount.

Reported-by: Jiri Kosina <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Tested-by: Jiri Kosina <[email protected]>

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 356450f09c1f..90a3d017b90c 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -64,6 +64,8 @@ static struct {
 	 * an ongoing cpu hotplug operation.
 	 */
 	int refcount;
+	/* And allows lockless put_online_cpus(). */
+	atomic_t puts_pending;
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 	struct lockdep_map dep_map;
@@ -113,7 +115,11 @@ void put_online_cpus(void)
 {
 	if (cpu_hotplug.active_writer == current)
 		return;
-	mutex_lock(&cpu_hotplug.lock);
+	if (!mutex_trylock(&cpu_hotplug.lock)) {
+		atomic_inc(&cpu_hotplug.puts_pending);
+		cpuhp_lock_release();
+		return;
+	}
 
 	if (WARN_ON(!cpu_hotplug.refcount))
 		cpu_hotplug.refcount++; /* try to fix things up */
@@ -155,6 +161,12 @@ void cpu_hotplug_begin(void)
 	cpuhp_lock_acquire();
 	for (;;) {
 		mutex_lock(&cpu_hotplug.lock);
+		if (atomic_read(&cpu_hotplug.puts_pending)) {
+			int delta;
+
+			delta = atomic_xchg(&cpu_hotplug.puts_pending, 0);
+			cpu_hotplug.refcount -= delta;
+		}
 		if (likely(!cpu_hotplug.refcount))
 			break;
 		__set_current_state(TASK_UNINTERRUPTIBLE);

2014-10-23 06:16:11

by Yanko Kaneti

Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

On Wed, 2014-10-22 at 16:24 -0700, Paul E. McKenney wrote:
> On Thu, Oct 23, 2014 at 01:40:32AM +0300, Yanko Kaneti wrote:
> > On Wed-10/22/14-2014 15:33, Josh Boyer wrote:
> > > On Wed, Oct 22, 2014 at 2:55 PM, Paul E. McKenney
> > > <[email protected]> wrote:
>
> [ . . . ]
>
> > > > Don't get me wrong -- the fact that this kthread appears to
> > > > have
> > > > blocked within rcu_barrier() for 120 seconds means that
> > > > something is
> > > > most definitely wrong here. I am surprised that there are no
> > > > RCU CPU
> > > > stall warnings, but perhaps the blockage is in the callback
> > > > execution
> > > > rather than grace-period completion. Or something is
> > > > preventing this
> > > > kthread from starting up after the wake-up callback executes.
> > > > Or...
> > > >
> > > > Is this thing reproducible?
> > >
> > > I've added Yanko on CC, who reported the backtrace above and can
> > > recreate it reliably. Apparently reverting the RCU merge commit
> > > (d6dd50e) and rebuilding the latest after that does not show the
> > > issue. I'll let Yanko explain more and answer any questions you
> > > have.
> >
> > - It is reproducible
> > - I've done another build here to double check and its definitely
> > the rcu merge
> > that's causing it.
> >
> > Don't think I'll be able to dig deeper, but I can do testing if
> > needed.
>
> Please! Does the following patch help?

Nope, doesn't seem to make a difference to the modprobe ppp_generic test.


INFO: task kworker/u16:6:101 blocked for more than 120 seconds.
Not tainted 3.18.0-0.rc1.git2.3.fc22.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
kworker/u16:6 D ffff88022067cec0 11680 101 2 0x00000000
Workqueue: netns cleanup_net
ffff8802206939e8 0000000000000096 ffff88022067cec0 00000000001d5f00
ffff880220693fd8 00000000001d5f00 ffff880223263480 ffff88022067cec0
ffffffff82c51d60 7fffffffffffffff ffffffff81ee2698 ffffffff81ee2690
Call Trace:
[<ffffffff8185e289>] schedule+0x29/0x70
[<ffffffff818634ac>] schedule_timeout+0x26c/0x410
[<ffffffff81028c4a>] ? native_sched_clock+0x2a/0xa0
[<ffffffff81107afc>] ? mark_held_locks+0x7c/0xb0
[<ffffffff81864530>] ? _raw_spin_unlock_irq+0x30/0x50
[<ffffffff81107c8d>] ? trace_hardirqs_on_caller+0x15d/0x200
[<ffffffff8185fcbc>] wait_for_completion+0x10c/0x150
[<ffffffff810e5430>] ? wake_up_state+0x20/0x20
[<ffffffff8112a799>] _rcu_barrier+0x159/0x200
[<ffffffff8112a895>] rcu_barrier+0x15/0x20
[<ffffffff81718f0f>] netdev_run_todo+0x6f/0x310
[<ffffffff8170dad5>] ? rollback_registered_many+0x265/0x2e0
[<ffffffff81725f7e>] rtnl_unlock+0xe/0x10
[<ffffffff8170f936>] default_device_exit_batch+0x156/0x180
[<ffffffff810fd8f0>] ? abort_exclusive_wait+0xb0/0xb0
[<ffffffff817079e3>] ops_exit_list.isra.1+0x53/0x60
[<ffffffff81708590>] cleanup_net+0x100/0x1f0
[<ffffffff810ccff8>] process_one_work+0x218/0x850
[<ffffffff810ccf5f>] ? process_one_work+0x17f/0x850
[<ffffffff810cd717>] ? worker_thread+0xe7/0x4a0
[<ffffffff810cd69b>] worker_thread+0x6b/0x4a0
[<ffffffff810cd630>] ? process_one_work+0x850/0x850
[<ffffffff810d39eb>] kthread+0x10b/0x130
[<ffffffff81028cc9>] ? sched_clock+0x9/0x10
[<ffffffff810d38e0>] ? kthread_create_on_node+0x250/0x250
[<ffffffff8186527c>] ret_from_fork+0x7c/0xb0
[<ffffffff810d38e0>] ? kthread_create_on_node+0x250/0x250
4 locks held by kworker/u16:6/101:
#0: ("%s""netns"){.+.+.+}, at: [<ffffffff810ccf5f>] process_one_work+0x17f/0x850
#1: (net_cleanup_work){+.+.+.}, at: [<ffffffff810ccf5f>] process_one_work+0x17f/0x850
#2: (net_mutex){+.+.+.}, at: [<ffffffff8170851c>] cleanup_net+0x8c/0x1f0
#3: (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff8112a675>] _rcu_barrier+0x35/0x200
INFO: task modprobe:1139 blocked for more than 120 seconds.
Not tainted 3.18.0-0.rc1.git2.3.fc22.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
modprobe D ffff880213ac1a40 13112 1139 1138 0x00000080
ffff880036ab3be8 0000000000000096 ffff880213ac1a40 00000000001d5f00
ffff880036ab3fd8 00000000001d5f00 ffff880223264ec0 ffff880213ac1a40
ffff880213ac1a40 ffffffff81f8fb48 0000000000000246 ffff880213ac1a40
Call Trace:
[<ffffffff8185e831>] schedule_preempt_disabled+0x31/0x80
[<ffffffff81860083>] mutex_lock_nested+0x183/0x440
[<ffffffff817083af>] ? register_pernet_subsys+0x1f/0x50
[<ffffffff817083af>] ? register_pernet_subsys+0x1f/0x50
[<ffffffffa06f3000>] ? 0xffffffffa06f3000
[<ffffffff817083af>] register_pernet_subsys+0x1f/0x50
[<ffffffffa06f3048>] br_init+0x48/0xd3 [bridge]
[<ffffffff81002148>] do_one_initcall+0xd8/0x210
[<ffffffff81153c52>] load_module+0x20c2/0x2870
[<ffffffff8114ec30>] ? store_uevent+0x70/0x70
[<ffffffff8110ac76>] ? lock_release_non_nested+0x3c6/0x3d0
[<ffffffff811544e7>] SyS_init_module+0xe7/0x140
[<ffffffff81865329>] system_call_fastpath+0x12/0x17
1 lock held by modprobe/1139:
#0: (net_mutex){+.+.+.}, at: [<ffffffff817083af>]
register_pernet_subsys+0x1f/0x50
INFO: task modprobe:1209 blocked for more than 120 seconds.
Not tainted 3.18.0-0.rc1.git2.3.fc22.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
modprobe D ffff8800c5324ec0 13368 1209 1151 0x00000080
ffff88020d14bbe8 0000000000000096 ffff8800c5324ec0 00000000001d5f00
ffff88020d14bfd8 00000000001d5f00 ffff880223280000 ffff8800c5324ec0
ffff8800c5324ec0 ffffffff81f8fb48 0000000000000246 ffff8800c5324ec0
Call Trace:
[<ffffffff8185e831>] schedule_preempt_disabled+0x31/0x80
[<ffffffff81860083>] mutex_lock_nested+0x183/0x440
[<ffffffff817083fd>] ? register_pernet_device+0x1d/0x70
[<ffffffff817083fd>] ? register_pernet_device+0x1d/0x70
[<ffffffffa070f000>] ? 0xffffffffa070f000
[<ffffffff817083fd>] register_pernet_device+0x1d/0x70
[<ffffffffa070f020>] ppp_init+0x20/0x1000 [ppp_generic]
[<ffffffff81002148>] do_one_initcall+0xd8/0x210
[<ffffffff81153c52>] load_module+0x20c2/0x2870
[<ffffffff8114ec30>] ? store_uevent+0x70/0x70
[<ffffffff8110ac76>] ? lock_release_non_nested+0x3c6/0x3d0
[<ffffffff811544e7>] SyS_init_module+0xe7/0x140
[<ffffffff81865329>] system_call_fastpath+0x12/0x17
1 lock held by modprobe/1209:
#0: (net_mutex){+.+.+.}, at: [<ffffffff817083fd>] register_pernet_device+0x1d/0x70


> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> rcu: More on deadlock between CPU hotplug and expedited grace periods
>
> Commit dd56af42bd82 (rcu: Eliminate deadlock between CPU hotplug and
> expedited grace periods) was incomplete. Although it did eliminate
> deadlocks involving synchronize_sched_expedited()'s acquisition of
> cpu_hotplug.lock via get_online_cpus(), it did nothing about the
> similar
> deadlock involving acquisition of this same lock via
> put_online_cpus().
> This deadlock became apparent with testing involving hibernation.
>
> This commit therefore changes put_online_cpus() acquisition of this
> lock
> to be conditional, and increments a new cpu_hotplug.puts_pending
> field
> in case of acquisition failure. Then cpu_hotplug_begin() checks for
> this
> new field being non-zero, and applies any changes to
> cpu_hotplug.refcount.
>
> Reported-by: Jiri Kosina <[email protected]>
> Signed-off-by: Paul E. McKenney <[email protected]>
> Tested-by: Jiri Kosina <[email protected]>
>
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 356450f09c1f..90a3d017b90c 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -64,6 +64,8 @@ static struct {
> * an ongoing cpu hotplug operation.
> */
> int refcount;
> + /* And allows lockless put_online_cpus(). */
> + atomic_t puts_pending;
>
> #ifdef CONFIG_DEBUG_LOCK_ALLOC
> struct lockdep_map dep_map;
> @@ -113,7 +115,11 @@ void put_online_cpus(void)
> {
> if (cpu_hotplug.active_writer == current)
> return;
> - mutex_lock(&cpu_hotplug.lock);
> + if (!mutex_trylock(&cpu_hotplug.lock)) {
> + atomic_inc(&cpu_hotplug.puts_pending);
> + cpuhp_lock_release();
> + return;
> + }
>
> if (WARN_ON(!cpu_hotplug.refcount))
> cpu_hotplug.refcount++; /* try to fix things up */
> @@ -155,6 +161,12 @@ void cpu_hotplug_begin(void)
> cpuhp_lock_acquire();
> for (;;) {
> mutex_lock(&cpu_hotplug.lock);
> + if (atomic_read(&cpu_hotplug.puts_pending)) {
> + int delta;
> +
> + delta = atomic_xchg(&cpu_hotplug.puts_pending, 0);
> + cpu_hotplug.refcount -= delta;
> + }
> if (likely(!cpu_hotplug.refcount))
> break;
> __set_current_state(TASK_UNINTERRUPTIBLE);
>
>

2014-10-23 12:31:47

by Paul E. McKenney

Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

On Thu, Oct 23, 2014 at 09:09:26AM +0300, Yanko Kaneti wrote:
> On Wed, 2014-10-22 at 16:24 -0700, Paul E. McKenney wrote:
> > On Thu, Oct 23, 2014 at 01:40:32AM +0300, Yanko Kaneti wrote:
> > > On Wed-10/22/14-2014 15:33, Josh Boyer wrote:
> > > > On Wed, Oct 22, 2014 at 2:55 PM, Paul E. McKenney
> > > > <[email protected]> wrote:
> >
> > [ . . . ]
> >
> > > > > Don't get me wrong -- the fact that this kthread appears to
> > > > > have
> > > > > blocked within rcu_barrier() for 120 seconds means that
> > > > > something is
> > > > > most definitely wrong here. I am surprised that there are no
> > > > > RCU CPU
> > > > > stall warnings, but perhaps the blockage is in the callback
> > > > > execution
> > > > > rather than grace-period completion. Or something is
> > > > > preventing this
> > > > > kthread from starting up after the wake-up callback executes.
> > > > > Or...
> > > > >
> > > > > Is this thing reproducible?
> > > >
> > > > I've added Yanko on CC, who reported the backtrace above and can
> > > > recreate it reliably. Apparently reverting the RCU merge commit
> > > > (d6dd50e) and rebuilding the latest after that does not show the
> > > > issue. I'll let Yanko explain more and answer any questions you
> > > > have.
> > >
> > > - It is reproducible
> > > - I've done another build here to double check and its definitely
> > > the rcu merge
> > > that's causing it.
> > >
> > > Don't think I'll be able to dig deeper, but I can do testing if
> > > needed.
> >
> > Please! Does the following patch help?
>
> Nope, doesn't seem to make a difference to the modprobe ppp_generic
> test

Well, I was hoping. I will take a closer look at the RCU merge commit
and see what suggests itself. I am likely to ask you to revert specific
commits, if that works for you.

Thanx, Paul

> INFO: task kworker/u16:6:101 blocked for more than 120 seconds.
> Not tainted 3.18.0-0.rc1.git2.3.fc22.x86_64 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
> message.
> kworker/u16:6 D ffff88022067cec0 11680 101 2 0x00000000
> Workqueue: netns cleanup_net
> ffff8802206939e8 0000000000000096 ffff88022067cec0 00000000001d5f00
> ffff880220693fd8 00000000001d5f00 ffff880223263480 ffff88022067cec0
> ffffffff82c51d60 7fffffffffffffff ffffffff81ee2698 ffffffff81ee2690
> Call Trace:
> [<ffffffff8185e289>] schedule+0x29/0x70
> [<ffffffff818634ac>] schedule_timeout+0x26c/0x410
> [<ffffffff81028c4a>] ? native_sched_clock+0x2a/0xa0
> [<ffffffff81107afc>] ? mark_held_locks+0x7c/0xb0
> [<ffffffff81864530>] ? _raw_spin_unlock_irq+0x30/0x50
> [<ffffffff81107c8d>] ? trace_hardirqs_on_caller+0x15d/0x200
> [<ffffffff8185fcbc>] wait_for_completion+0x10c/0x150
> [<ffffffff810e5430>] ? wake_up_state+0x20/0x20
> [<ffffffff8112a799>] _rcu_barrier+0x159/0x200
> [<ffffffff8112a895>] rcu_barrier+0x15/0x20
> [<ffffffff81718f0f>] netdev_run_todo+0x6f/0x310
> [<ffffffff8170dad5>] ? rollback_registered_many+0x265/0x2e0
> [<ffffffff81725f7e>] rtnl_unlock+0xe/0x10
> [<ffffffff8170f936>] default_device_exit_batch+0x156/0x180
> [<ffffffff810fd8f0>] ? abort_exclusive_wait+0xb0/0xb0
> [<ffffffff817079e3>] ops_exit_list.isra.1+0x53/0x60
> [<ffffffff81708590>] cleanup_net+0x100/0x1f0
> [<ffffffff810ccff8>] process_one_work+0x218/0x850
> [<ffffffff810ccf5f>] ? process_one_work+0x17f/0x850
> [<ffffffff810cd717>] ? worker_thread+0xe7/0x4a0
> [<ffffffff810cd69b>] worker_thread+0x6b/0x4a0
> [<ffffffff810cd630>] ? process_one_work+0x850/0x850
> [<ffffffff810d39eb>] kthread+0x10b/0x130
> [<ffffffff81028cc9>] ? sched_clock+0x9/0x10
> [<ffffffff810d38e0>] ? kthread_create_on_node+0x250/0x250
> [<ffffffff8186527c>] ret_from_fork+0x7c/0xb0
> [<ffffffff810d38e0>] ? kthread_create_on_node+0x250/0x250
> 4 locks held by kworker/u16:6/101:
> #0: ("%s""netns"){.+.+.+}, at: [<ffffffff810ccf5f>] process_one_work+0x17f/0x850
> #1: (net_cleanup_work){+.+.+.}, at: [<ffffffff810ccf5f>] process_one_work+0x17f/0x850
> #2: (net_mutex){+.+.+.}, at: [<ffffffff8170851c>] cleanup_net+0x8c/0x1f0
> #3: (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff8112a675>] _rcu_barrier+0x35/0x200
> INFO: task modprobe:1139 blocked for more than 120 seconds.
> Not tainted 3.18.0-0.rc1.git2.3.fc22.x86_64 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
> message.
> modprobe D ffff880213ac1a40 13112 1139 1138 0x00000080
> ffff880036ab3be8 0000000000000096 ffff880213ac1a40 00000000001d5f00
> ffff880036ab3fd8 00000000001d5f00 ffff880223264ec0 ffff880213ac1a40
> ffff880213ac1a40 ffffffff81f8fb48 0000000000000246 ffff880213ac1a40
> Call Trace:
> [<ffffffff8185e831>] schedule_preempt_disabled+0x31/0x80
> [<ffffffff81860083>] mutex_lock_nested+0x183/0x440
> [<ffffffff817083af>] ? register_pernet_subsys+0x1f/0x50
> [<ffffffff817083af>] ? register_pernet_subsys+0x1f/0x50
> [<ffffffffa06f3000>] ? 0xffffffffa06f3000
> [<ffffffff817083af>] register_pernet_subsys+0x1f/0x50
> [<ffffffffa06f3048>] br_init+0x48/0xd3 [bridge]
> [<ffffffff81002148>] do_one_initcall+0xd8/0x210
> [<ffffffff81153c52>] load_module+0x20c2/0x2870
> [<ffffffff8114ec30>] ? store_uevent+0x70/0x70
> [<ffffffff8110ac76>] ? lock_release_non_nested+0x3c6/0x3d0
> [<ffffffff811544e7>] SyS_init_module+0xe7/0x140
> [<ffffffff81865329>] system_call_fastpath+0x12/0x17
> 1 lock held by modprobe/1139:
> #0: (net_mutex){+.+.+.}, at: [<ffffffff817083af>]
> register_pernet_subsys+0x1f/0x50
> INFO: task modprobe:1209 blocked for more than 120 seconds.
> Not tainted 3.18.0-0.rc1.git2.3.fc22.x86_64 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
> message.
> modprobe D ffff8800c5324ec0 13368 1209 1151 0x00000080
> ffff88020d14bbe8 0000000000000096 ffff8800c5324ec0 00000000001d5f00
> ffff88020d14bfd8 00000000001d5f00 ffff880223280000 ffff8800c5324ec0
> ffff8800c5324ec0 ffffffff81f8fb48 0000000000000246 ffff8800c5324ec0
> Call Trace:
> [<ffffffff8185e831>] schedule_preempt_disabled+0x31/0x80
> [<ffffffff81860083>] mutex_lock_nested+0x183/0x440
> [<ffffffff817083fd>] ? register_pernet_device+0x1d/0x70
> [<ffffffff817083fd>] ? register_pernet_device+0x1d/0x70
> [<ffffffffa070f000>] ? 0xffffffffa070f000
> [<ffffffff817083fd>] register_pernet_device+0x1d/0x70
> [<ffffffffa070f020>] ppp_init+0x20/0x1000 [ppp_generic]
> [<ffffffff81002148>] do_one_initcall+0xd8/0x210
> [<ffffffff81153c52>] load_module+0x20c2/0x2870
> [<ffffffff8114ec30>] ? store_uevent+0x70/0x70
> [<ffffffff8110ac76>] ? lock_release_non_nested+0x3c6/0x3d0
> [<ffffffff811544e7>] SyS_init_module+0xe7/0x140
> [<ffffffff81865329>] system_call_fastpath+0x12/0x17
> 1 lock held by modprobe/1209:
> #0: (net_mutex){+.+.+.}, at: [<ffffffff817083fd>] register_pernet_device+0x1d/0x70
>
>
> > Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > rcu: More on deadlock between CPU hotplug and expedited grace periods
> >
> > Commit dd56af42bd82 (rcu: Eliminate deadlock between CPU hotplug and
> > expedited grace periods) was incomplete. Although it did eliminate
> > deadlocks involving synchronize_sched_expedited()'s acquisition of
> > cpu_hotplug.lock via get_online_cpus(), it did nothing about the
> > similar
> > deadlock involving acquisition of this same lock via
> > put_online_cpus().
> > This deadlock became apparent with testing involving hibernation.
> >
> > This commit therefore changes put_online_cpus() acquisition of this
> > lock
> > to be conditional, and increments a new cpu_hotplug.puts_pending
> > field
> > in case of acquisition failure. Then cpu_hotplug_begin() checks for
> > this
> > new field being non-zero, and applies any changes to
> > cpu_hotplug.refcount.
> >
> > Reported-by: Jiri Kosina <[email protected]>
> > Signed-off-by: Paul E. McKenney <[email protected]>
> > Tested-by: Jiri Kosina <[email protected]>
> >
> > diff --git a/kernel/cpu.c b/kernel/cpu.c
> > index 356450f09c1f..90a3d017b90c 100644
> > --- a/kernel/cpu.c
> > +++ b/kernel/cpu.c
> > @@ -64,6 +64,8 @@ static struct {
> > * an ongoing cpu hotplug operation.
> > */
> > int refcount;
> > + /* And allows lockless put_online_cpus(). */
> > + atomic_t puts_pending;
> >
> > #ifdef CONFIG_DEBUG_LOCK_ALLOC
> > struct lockdep_map dep_map;
> > @@ -113,7 +115,11 @@ void put_online_cpus(void)
> > {
> > if (cpu_hotplug.active_writer == current)
> > return;
> > - mutex_lock(&cpu_hotplug.lock);
> > + if (!mutex_trylock(&cpu_hotplug.lock)) {
> > + atomic_inc(&cpu_hotplug.puts_pending);
> > + cpuhp_lock_release();
> > + return;
> > + }
> >
> > if (WARN_ON(!cpu_hotplug.refcount))
> > cpu_hotplug.refcount++; /* try to fix things up */
> > @@ -155,6 +161,12 @@ void cpu_hotplug_begin(void)
> > cpuhp_lock_acquire();
> > for (;;) {
> > mutex_lock(&cpu_hotplug.lock);
> > + if (atomic_read(&cpu_hotplug.puts_pending)) {
> > + int delta;
> > +
> > + delta = atomic_xchg(&cpu_hotplug.puts_pending, 0);
> > + cpu_hotplug.refcount -= delta;
> > + }
> > if (likely(!cpu_hotplug.refcount))
> > break;
> > __set_current_state(TASK_UNINTERRUPTIBLE);
> >
> >
>

2014-10-23 15:37:35

by Paul E. McKenney

Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

On Thu, Oct 23, 2014 at 05:27:50AM -0700, Paul E. McKenney wrote:
> On Thu, Oct 23, 2014 at 09:09:26AM +0300, Yanko Kaneti wrote:
> > On Wed, 2014-10-22 at 16:24 -0700, Paul E. McKenney wrote:
> > > On Thu, Oct 23, 2014 at 01:40:32AM +0300, Yanko Kaneti wrote:
> > > > On Wed-10/22/14-2014 15:33, Josh Boyer wrote:
> > > > > On Wed, Oct 22, 2014 at 2:55 PM, Paul E. McKenney
> > > > > <[email protected]> wrote:
> > >
> > > [ . . . ]
> > >
> > > > > > Don't get me wrong -- the fact that this kthread appears to
> > > > > > have
> > > > > > blocked within rcu_barrier() for 120 seconds means that
> > > > > > something is
> > > > > > most definitely wrong here. I am surprised that there are no
> > > > > > RCU CPU
> > > > > > stall warnings, but perhaps the blockage is in the callback
> > > > > > execution
> > > > > > rather than grace-period completion. Or something is
> > > > > > preventing this
> > > > > > kthread from starting up after the wake-up callback executes.
> > > > > > Or...
> > > > > >
> > > > > > Is this thing reproducible?
> > > > >
> > > > > I've added Yanko on CC, who reported the backtrace above and can
> > > > > recreate it reliably. Apparently reverting the RCU merge commit
> > > > > (d6dd50e) and rebuilding the latest after that does not show the
> > > > > issue. I'll let Yanko explain more and answer any questions you
> > > > > have.
> > > >
> > > > - It is reproducible
> > > > - I've done another build here to double check and its definitely
> > > > the rcu merge
> > > > that's causing it.
> > > >
> > > > Don't think I'll be able to dig deeper, but I can do testing if
> > > > needed.
> > >
> > > Please! Does the following patch help?
> >
> > Nope, doesn't seem to make a difference to the modprobe ppp_generic
> > test
>
> Well, I was hoping. I will take a closer look at the RCU merge commit
> and see what suggests itself. I am likely to ask you to revert specific
> commits, if that works for you.

Well, rather than reverting commits, could you please try testing the
following commits?

11ed7f934cb8 (rcu: Make nocb leader kthreads process pending callbacks after spawning)

73a860cd58a1 (rcu: Replace flush_signals() with WARN_ON(signal_pending()))

c847f14217d5 (rcu: Avoid misordering in nocb_leader_wait())

For whatever it is worth, I am guessing this one.

a53dd6a65668 (rcutorture: Add RCU-tasks tests to default rcutorture list)

If any of the above fail, this one should also fail.

Also, could you please send along your .config?

Thanx, Paul
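The per-commit testing Paul asks for amounts to checking out each candidate commit, rebuilding, and rerunning the reproducer. A minimal sketch of that workflow (a throwaway repo stands in for Linus's tree here; the kernel build/boot/modprobe step is only a placeholder comment):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email tester@example.com
git config user.name tester
echo v1 > file; git add file; git commit -qm 'candidate 1'
echo v2 > file; git commit -qam 'candidate 2'

# Walk the candidate commits, newest first, testing each in turn.
for c in $(git rev-list --all); do
    git checkout -q "$c"
    # Real workflow: make -j"$(nproc)", install, reboot, then:
    #   modprobe ppp_generic    # does it hang for 120s?
    echo "tested $c: $(cat file)"
done
```

Testing forward along the topic branches, rather than reverting, narrows the culprit without generating conflicts in unrelated code.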

2014-10-23 16:32:45

by Paul E. McKenney

Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

On Thu, Oct 23, 2014 at 12:11:26PM -0400, Josh Boyer wrote:
> On Oct 23, 2014 11:37 AM, "Paul E. McKenney" <[email protected]>
> wrote:
> >
> > On Thu, Oct 23, 2014 at 05:27:50AM -0700, Paul E. McKenney wrote:
> > > On Thu, Oct 23, 2014 at 09:09:26AM +0300, Yanko Kaneti wrote:
> > > > On Wed, 2014-10-22 at 16:24 -0700, Paul E. McKenney wrote:
> > > > > On Thu, Oct 23, 2014 at 01:40:32AM +0300, Yanko Kaneti wrote:
> > > > > > On Wed-10/22/14-2014 15:33, Josh Boyer wrote:
> > > > > > > On Wed, Oct 22, 2014 at 2:55 PM, Paul E. McKenney
> > > > > > > <[email protected]> wrote:
> > > > >
> > > > > [ . . . ]
> > > > >
> > > > > > > > Don't get me wrong -- the fact that this kthread appears to
> > > > > > > > have
> > > > > > > > blocked within rcu_barrier() for 120 seconds means that
> > > > > > > > something is
> > > > > > > > most definitely wrong here. I am surprised that there are no
> > > > > > > > RCU CPU
> > > > > > > > stall warnings, but perhaps the blockage is in the callback
> > > > > > > > execution
> > > > > > > > rather than grace-period completion. Or something is
> > > > > > > > preventing this
> > > > > > > > kthread from starting up after the wake-up callback executes.
> > > > > > > > Or...
> > > > > > > >
> > > > > > > > Is this thing reproducible?
> > > > > > >
> > > > > > > I've added Yanko on CC, who reported the backtrace above and can
> > > > > > > recreate it reliably. Apparently reverting the RCU merge commit
> > > > > > > (d6dd50e) and rebuilding the latest after that does not show the
> > > > > > > issue. I'll let Yanko explain more and answer any questions you
> > > > > > > have.
> > > > > >
> > > > > > - It is reproducible
> > > > > > - I've done another build here to double check and its definitely
> > > > > > the rcu merge
> > > > > > that's causing it.
> > > > > >
> > > > > > Don't think I'll be able to dig deeper, but I can do testing if
> > > > > > needed.
> > > > >
> > > > > Please! Does the following patch help?
> > > >
> > > > Nope, doesn't seem to make a difference to the modprobe ppp_generic
> > > > test
> > >
> > > Well, I was hoping. I will take a closer look at the RCU merge commit
> > > and see what suggests itself. I am likely to ask you to revert specific
> > > commits, if that works for you.
> >
> > Well, rather than reverting commits, could you please try testing the
> > following commits?
> >
> > 11ed7f934cb8 (rcu: Make nocb leader kthreads process pending callbacks
> after spawning)
> >
> > 73a860cd58a1 (rcu: Replace flush_signals() with WARN_ON(signal_pending()))
> >
> > c847f14217d5 (rcu: Avoid misordering in nocb_leader_wait())
> >
> > For whatever it is worth, I am guessing this one.
> >
> > a53dd6a65668 (rcutorture: Add RCU-tasks tests to default rcutorture list)
> >
> > If any of the above fail, this one should also fail.
> >
> > Also, could you please send along your .config?
>
> Which tree are those in?

They are all in Linus's tree. They are topic branches of the RCU merge
commit (d6dd50e), and the test results will hopefully give me more of a
clue where to look. As would the .config file. ;-)

Thanx, Paul

2014-10-23 19:52:10

by Yanko Kaneti

Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

On Thu-10/23/14-2014 08:33, Paul E. McKenney wrote:
> On Thu, Oct 23, 2014 at 05:27:50AM -0700, Paul E. McKenney wrote:
> > On Thu, Oct 23, 2014 at 09:09:26AM +0300, Yanko Kaneti wrote:
> > > On Wed, 2014-10-22 at 16:24 -0700, Paul E. McKenney wrote:
> > > > On Thu, Oct 23, 2014 at 01:40:32AM +0300, Yanko Kaneti wrote:
> > > > > On Wed-10/22/14-2014 15:33, Josh Boyer wrote:
> > > > > > On Wed, Oct 22, 2014 at 2:55 PM, Paul E. McKenney
> > > > > > <[email protected]> wrote:
> > > >
> > > > [ . . . ]
> > > >
> > > > > > > Don't get me wrong -- the fact that this kthread appears to
> > > > > > > have
> > > > > > > blocked within rcu_barrier() for 120 seconds means that
> > > > > > > something is
> > > > > > > most definitely wrong here. I am surprised that there are no
> > > > > > > RCU CPU
> > > > > > > stall warnings, but perhaps the blockage is in the callback
> > > > > > > execution
> > > > > > > rather than grace-period completion. Or something is
> > > > > > > preventing this
> > > > > > > kthread from starting up after the wake-up callback executes.
> > > > > > > Or...
> > > > > > >
> > > > > > > Is this thing reproducible?
> > > > > >
> > > > > > I've added Yanko on CC, who reported the backtrace above and can
> > > > > > recreate it reliably. Apparently reverting the RCU merge commit
> > > > > > (d6dd50e) and rebuilding the latest after that does not show the
> > > > > > issue. I'll let Yanko explain more and answer any questions you
> > > > > > have.
> > > > >
> > > > > - It is reproducible
> > > > > - I've done another build here to double check and its definitely
> > > > > the rcu merge
> > > > > that's causing it.
> > > > >
> > > > > Don't think I'll be able to dig deeper, but I can do testing if
> > > > > needed.
> > > >
> > > > Please! Does the following patch help?
> > >
> > > Nope, doesn't seem to make a difference to the modprobe ppp_generic
> > > test
> >
> > Well, I was hoping. I will take a closer look at the RCU merge commit
> > and see what suggests itself. I am likely to ask you to revert specific
> > commits, if that works for you.
>
> Well, rather than reverting commits, could you please try testing the
> following commits?
>
> 11ed7f934cb8 (rcu: Make nocb leader kthreads process pending callbacks after spawning)
>
> 73a860cd58a1 (rcu: Replace flush_signals() with WARN_ON(signal_pending()))
>
> c847f14217d5 (rcu: Avoid misordering in nocb_leader_wait())
>
> For whatever it is worth, I am guessing this one.

Indeed, c847f14217d5 it is.

Much to my embarrassment, I just noticed that in addition to the RCU merge,
triggering the bug "requires" my specific Fedora Rawhide network setup. Booting
in single mode and doing modprobe ppp_generic is fine. The bug appears when
starting with my regular Fedora network setup, which in my case includes
3 Ethernet adapters and a libvirt bridge+NAT setup.

Hope that helps.

I am attaching the config.


Attachments:
(No filename) (2.88 kB)
3.18.0-rc0.config (147.55 kB)

2014-10-23 20:09:06

by Paul E. McKenney

Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

On Thu, Oct 23, 2014 at 10:51:59PM +0300, Yanko Kaneti wrote:
> On Thu-10/23/14-2014 08:33, Paul E. McKenney wrote:
> > On Thu, Oct 23, 2014 at 05:27:50AM -0700, Paul E. McKenney wrote:
> > > On Thu, Oct 23, 2014 at 09:09:26AM +0300, Yanko Kaneti wrote:
> > > > On Wed, 2014-10-22 at 16:24 -0700, Paul E. McKenney wrote:
> > > > > On Thu, Oct 23, 2014 at 01:40:32AM +0300, Yanko Kaneti wrote:
> > > > > > On Wed-10/22/14-2014 15:33, Josh Boyer wrote:
> > > > > > > On Wed, Oct 22, 2014 at 2:55 PM, Paul E. McKenney
> > > > > > > <[email protected]> wrote:
> > > > >
> > > > > [ . . . ]
> > > > >
> > > > > > > > Don't get me wrong -- the fact that this kthread appears to
> > > > > > > > have
> > > > > > > > blocked within rcu_barrier() for 120 seconds means that
> > > > > > > > something is
> > > > > > > > most definitely wrong here. I am surprised that there are no
> > > > > > > > RCU CPU
> > > > > > > > stall warnings, but perhaps the blockage is in the callback
> > > > > > > > execution
> > > > > > > > rather than grace-period completion. Or something is
> > > > > > > > preventing this
> > > > > > > > kthread from starting up after the wake-up callback executes.
> > > > > > > > Or...
> > > > > > > >
> > > > > > > > Is this thing reproducible?
> > > > > > >
> > > > > > > I've added Yanko on CC, who reported the backtrace above and can
> > > > > > > recreate it reliably. Apparently reverting the RCU merge commit
> > > > > > > (d6dd50e) and rebuilding the latest after that does not show the
> > > > > > > issue. I'll let Yanko explain more and answer any questions you
> > > > > > > have.
> > > > > >
> > > > > > - It is reproducible
> > > > > > - I've done another build here to double check and its definitely
> > > > > > the rcu merge
> > > > > > that's causing it.
> > > > > >
> > > > > > Don't think I'll be able to dig deeper, but I can do testing if
> > > > > > needed.
> > > > >
> > > > > Please! Does the following patch help?
> > > >
> > > > Nope, doesn't seem to make a difference to the modprobe ppp_generic
> > > > test
> > >
> > > Well, I was hoping. I will take a closer look at the RCU merge commit
> > > and see what suggests itself. I am likely to ask you to revert specific
> > > commits, if that works for you.
> >
> > Well, rather than reverting commits, could you please try testing the
> > following commits?
> >
> > 11ed7f934cb8 (rcu: Make nocb leader kthreads process pending callbacks after spawning)
> >
> > 73a860cd58a1 (rcu: Replace flush_signals() with WARN_ON(signal_pending()))
> >
> > c847f14217d5 (rcu: Avoid misordering in nocb_leader_wait())
> >
> > For whatever it is worth, I am guessing this one.
>
> Indeed, c847f14217d5 it is.
>
> Much to my embarrasment I just noticed that in addition to the
> rcu merge, triggering the bug "requires" my specific Fedora rawhide network
> setup. Booting in single mode and modprobe ppp_generic is fine. The bug
> appears when starting with my regular fedora network setup, which in my case
> includes 3 ethernet adapters and a libvirt birdge+nat setup.
>
> Hope that helps.
>
> I am attaching the config.

It does help a lot, thank you!!!

The following patch is a bit of a shot in the dark, and assumes that
commit 1772947bd012 (rcu: Handle NOCB callbacks from irq-disabled idle
code) introduced the problem. Does this patch fix things up?

Thanx, Paul

------------------------------------------------------------------------

rcu: Kick rcuo kthreads after their CPU goes offline

If a no-CBs CPU were to post an RCU callback with interrupts disabled
after it entered the idle loop for the last time, there might be no
deferred wakeup for the corresponding rcuo kthreads. This commit
therefore adds a set of calls to do_nocb_deferred_wakeup() after the
CPU has gone completely offline.

Signed-off-by: Paul E. McKenney <[email protected]>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 84b41b3c6ebd..4f3d25a58786 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3493,8 +3493,10 @@ static int rcu_cpu_notify(struct notifier_block *self,
case CPU_DEAD_FROZEN:
case CPU_UP_CANCELED:
case CPU_UP_CANCELED_FROZEN:
- for_each_rcu_flavor(rsp)
+ for_each_rcu_flavor(rsp) {
rcu_cleanup_dead_cpu(cpu, rsp);
+ do_nocb_deferred_wakeup(this_cpu_ptr(rsp->rda));
+ }
break;
default:
break;

2014-10-23 21:45:45

by Yanko Kaneti

[permalink] [raw]
Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?


On Thu, 2014-10-23 at 13:05 -0700, Paul E. McKenney wrote:
> On Thu, Oct 23, 2014 at 10:51:59PM +0300, Yanko Kaneti wrote:
> > On Thu-10/23/14-2014 08:33, Paul E. McKenney wrote:
> > > On Thu, Oct 23, 2014 at 05:27:50AM -0700, Paul E. McKenney wrote:
> > > > On Thu, Oct 23, 2014 at 09:09:26AM +0300, Yanko Kaneti wrote:
> > > > > On Wed, 2014-10-22 at 16:24 -0700, Paul E. McKenney wrote:
> > > > > > On Thu, Oct 23, 2014 at 01:40:32AM +0300, Yanko Kaneti
> > > > > > wrote:
> > > > > > > On Wed-10/22/14-2014 15:33, Josh Boyer wrote:
> > > > > > > > On Wed, Oct 22, 2014 at 2:55 PM, Paul E. McKenney
> > > > > > > > <[email protected]> wrote:
> > > > > >
> > > > > > [ . . . ]
> > > > > >
> > > > > > > > > Don't get me wrong -- the fact that this kthread
> > > > > > > > > appears to
> > > > > > > > > have
> > > > > > > > > blocked within rcu_barrier() for 120 seconds means
> > > > > > > > > that
> > > > > > > > > something is
> > > > > > > > > most definitely wrong here. I am surprised that
> > > > > > > > > there are no
> > > > > > > > > RCU CPU
> > > > > > > > > stall warnings, but perhaps the blockage is in the
> > > > > > > > > callback
> > > > > > > > > execution
> > > > > > > > > rather than grace-period completion. Or something is
> > > > > > > > > preventing this
> > > > > > > > > kthread from starting up after the wake-up callback
> > > > > > > > > executes.
> > > > > > > > > Or...
> > > > > > > > >
> > > > > > > > > Is this thing reproducible?
> > > > > > > >
> > > > > > > > I've added Yanko on CC, who reported the backtrace
> > > > > > > > above and can
> > > > > > > > recreate it reliably. Apparently reverting the RCU
> > > > > > > > merge commit
> > > > > > > > (d6dd50e) and rebuilding the latest after that does
> > > > > > > > not show the
> > > > > > > > issue. I'll let Yanko explain more and answer any
> > > > > > > > questions you
> > > > > > > > have.
> > > > > > >
> > > > > > > - It is reproducible
> > > > > > > - I've done another build here to double check and its
> > > > > > > definitely
> > > > > > > the rcu merge
> > > > > > > that's causing it.
> > > > > > >
> > > > > > > Don't think I'll be able to dig deeper, but I can do
> > > > > > > testing if
> > > > > > > needed.
> > > > > >
> > > > > > Please! Does the following patch help?
> > > > >
> > > > > Nope, doesn't seem to make a difference to the modprobe
> > > > > ppp_generic
> > > > > test
> > > >
> > > > Well, I was hoping. I will take a closer look at the RCU
> > > > merge commit
> > > > and see what suggests itself. I am likely to ask you to
> > > > revert specific
> > > > commits, if that works for you.
> > >
> > > Well, rather than reverting commits, could you please try
> > > testing the
> > > following commits?
> > >
> > > 11ed7f934cb8 (rcu: Make nocb leader kthreads process pending
> > > callbacks after spawning)
> > >
> > > 73a860cd58a1 (rcu: Replace flush_signals() with
> > > WARN_ON(signal_pending()))
> > >
> > > c847f14217d5 (rcu: Avoid misordering in nocb_leader_wait())
> > >
> > > For whatever it is worth, I am guessing this one.
> >
> > Indeed, c847f14217d5 it is.
> >
> > Much to my embarrasment I just noticed that in addition to the
> > rcu merge, triggering the bug "requires" my specific Fedora
> > rawhide network
> > setup. Booting in single mode and modprobe ppp_generic is fine.
> > The bug
> > appears when starting with my regular fedora network setup, which
> > in my case
> > includes 3 ethernet adapters and a libvirt birdge+nat setup.
> >
> > Hope that helps.
> >
> > I am attaching the config.
>
> It does help a lot, thank you!!!
>
> The following patch is a bit of a shot in the dark, and assumes that
> commit 1772947bd012 (rcu: Handle NOCB callbacks from irq-disabled
> idle
> code) introduced the problem. Does this patch fix things up?

Unfortunately not. This is the tip of Linus's tree plus the patch:


INFO: task kworker/u16:6:96 blocked for more than 120 seconds.
Not tainted 3.18.0-rc1+ #4
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u16:6 D ffff8800ca84cec0 11168 96 2 0x00000000
Workqueue: netns cleanup_net
ffff8802218339e8 0000000000000096 ffff8800ca84cec0 00000000001d5f00
ffff880221833fd8 00000000001d5f00 ffff880223264ec0 ffff8800ca84cec0
ffffffff82c52040 7fffffffffffffff ffffffff81ee2658 ffffffff81ee2650
Call Trace:
[<ffffffff8185b8e9>] schedule+0x29/0x70
[<ffffffff81860b0c>] schedule_timeout+0x26c/0x410
[<ffffffff81028bea>] ? native_sched_clock+0x2a/0xa0
[<ffffffff8110759c>] ? mark_held_locks+0x7c/0xb0
[<ffffffff81861b90>] ? _raw_spin_unlock_irq+0x30/0x50
[<ffffffff8110772d>] ? trace_hardirqs_on_caller+0x15d/0x200
[<ffffffff8185d31c>] wait_for_completion+0x10c/0x150
[<ffffffff810e4ed0>] ? wake_up_state+0x20/0x20
[<ffffffff8112a219>] _rcu_barrier+0x159/0x200
[<ffffffff8112a315>] rcu_barrier+0x15/0x20
[<ffffffff8171657f>] netdev_run_todo+0x6f/0x310
[<ffffffff8170b145>] ? rollback_registered_many+0x265/0x2e0
[<ffffffff817235ee>] rtnl_unlock+0xe/0x10
[<ffffffff8170cfa6>] default_device_exit_batch+0x156/0x180
[<ffffffff810fd390>] ? abort_exclusive_wait+0xb0/0xb0
[<ffffffff81705053>] ops_exit_list.isra.1+0x53/0x60
[<ffffffff81705c00>] cleanup_net+0x100/0x1f0
[<ffffffff810cca98>] process_one_work+0x218/0x850
[<ffffffff810cc9ff>] ? process_one_work+0x17f/0x850
[<ffffffff810cd1b7>] ? worker_thread+0xe7/0x4a0
[<ffffffff810cd13b>] worker_thread+0x6b/0x4a0
[<ffffffff810cd0d0>] ? process_one_work+0x850/0x850
[<ffffffff810d348b>] kthread+0x10b/0x130
[<ffffffff81028c69>] ? sched_clock+0x9/0x10
[<ffffffff810d3380>] ? kthread_create_on_node+0x250/0x250
[<ffffffff818628bc>] ret_from_fork+0x7c/0xb0
[<ffffffff810d3380>] ? kthread_create_on_node+0x250/0x250
4 locks held by kworker/u16:6/96:
#0: ("%s""netns"){.+.+.+}, at: [<ffffffff810cc9ff>] process_one_work+0x17f/0x850
#1: (net_cleanup_work){+.+.+.}, at: [<ffffffff810cc9ff>] process_one_work+0x17f/0x850
#2: (net_mutex){+.+.+.}, at: [<ffffffff81705b8c>] cleanup_net+0x8c/0x1f0
#3: (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff8112a0f5>] _rcu_barrier+0x35/0x200
INFO: task modprobe:1045 blocked for more than 120 seconds.
Not tainted 3.18.0-rc1+ #4
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
modprobe D ffff880218343480 12920 1045 1044 0x00000080
ffff880218353bf8 0000000000000096 ffff880218343480 00000000001d5f00
ffff880218353fd8 00000000001d5f00 ffffffff81e1b580 ffff880218343480
ffff880218343480 ffffffff81f8f748 0000000000000246 ffff880218343480
Call Trace:
[<ffffffff8185be91>] schedule_preempt_disabled+0x31/0x80
[<ffffffff8185d6e3>] mutex_lock_nested+0x183/0x440
[<ffffffff81705a1f>] ? register_pernet_subsys+0x1f/0x50
[<ffffffff81705a1f>] ? register_pernet_subsys+0x1f/0x50
[<ffffffffa0673000>] ? 0xffffffffa0673000
[<ffffffff81705a1f>] register_pernet_subsys+0x1f/0x50
[<ffffffffa0673048>] br_init+0x48/0xd3 [bridge]
[<ffffffff81002148>] do_one_initcall+0xd8/0x210
[<ffffffff81153052>] load_module+0x20c2/0x2870
[<ffffffff8114e030>] ? store_uevent+0x70/0x70
[<ffffffff81278717>] ? kernel_read+0x57/0x90
[<ffffffff811539e6>] SyS_finit_module+0xa6/0xe0
[<ffffffff81862969>] system_call_fastpath+0x12/0x17
1 lock held by modprobe/1045:
#0: (net_mutex){+.+.+.}, at: [<ffffffff81705a1f>] register_pernet_subsys+0x1f/0x50


> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> rcu: Kick rcuo kthreads after their CPU goes offline
>
> If a no-CBs CPU were to post an RCU callback with interrupts disabled
> after it entered the idle loop for the last time, there might be no
> deferred wakeup for the corresponding rcuo kthreads. This commit
> therefore adds a set of calls to do_nocb_deferred_wakeup() after the
> CPU has gone completely offline.
>
> Signed-off-by: Paul E. McKenney <[email protected]>
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 84b41b3c6ebd..4f3d25a58786 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3493,8 +3493,10 @@ static int rcu_cpu_notify(struct notifier_block *self,
> case CPU_DEAD_FROZEN:
> case CPU_UP_CANCELED:
> case CPU_UP_CANCELED_FROZEN:
> - for_each_rcu_flavor(rsp)
> + for_each_rcu_flavor(rsp) {
> rcu_cleanup_dead_cpu(cpu, rsp);
> + do_nocb_deferred_wakeup(this_cpu_ptr(rsp->rda));
> + }
> break;
> default:
> break;
>
>

2014-10-23 22:08:00

by Paul E. McKenney

[permalink] [raw]
Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

On Fri, Oct 24, 2014 at 12:45:40AM +0300, Yanko Kaneti wrote:
>
> On Thu, 2014-10-23 at 13:05 -0700, Paul E. McKenney wrote:
> > On Thu, Oct 23, 2014 at 10:51:59PM +0300, Yanko Kaneti wrote:
> > > On Thu-10/23/14-2014 08:33, Paul E. McKenney wrote:
> > > > On Thu, Oct 23, 2014 at 05:27:50AM -0700, Paul E. McKenney wrote:
> > > > > On Thu, Oct 23, 2014 at 09:09:26AM +0300, Yanko Kaneti wrote:
> > > > > > On Wed, 2014-10-22 at 16:24 -0700, Paul E. McKenney wrote:
> > > > > > > On Thu, Oct 23, 2014 at 01:40:32AM +0300, Yanko Kaneti
> > > > > > > wrote:
> > > > > > > > On Wed-10/22/14-2014 15:33, Josh Boyer wrote:
> > > > > > > > > On Wed, Oct 22, 2014 at 2:55 PM, Paul E. McKenney
> > > > > > > > > <[email protected]> wrote:
> > > > > > >
> > > > > > > [ . . . ]
> > > > > > >
> > > > > > > > > > Don't get me wrong -- the fact that this kthread
> > > > > > > > > > appears to
> > > > > > > > > > have
> > > > > > > > > > blocked within rcu_barrier() for 120 seconds means
> > > > > > > > > > that
> > > > > > > > > > something is
> > > > > > > > > > most definitely wrong here. I am surprised that
> > > > > > > > > > there are no
> > > > > > > > > > RCU CPU
> > > > > > > > > > stall warnings, but perhaps the blockage is in the
> > > > > > > > > > callback
> > > > > > > > > > execution
> > > > > > > > > > rather than grace-period completion. Or something is
> > > > > > > > > > preventing this
> > > > > > > > > > kthread from starting up after the wake-up callback
> > > > > > > > > > executes.
> > > > > > > > > > Or...
> > > > > > > > > >
> > > > > > > > > > Is this thing reproducible?
> > > > > > > > >
> > > > > > > > > I've added Yanko on CC, who reported the backtrace
> > > > > > > > > above and can
> > > > > > > > > recreate it reliably. Apparently reverting the RCU
> > > > > > > > > merge commit
> > > > > > > > > (d6dd50e) and rebuilding the latest after that does
> > > > > > > > > not show the
> > > > > > > > > issue. I'll let Yanko explain more and answer any
> > > > > > > > > questions you
> > > > > > > > > have.
> > > > > > > >
> > > > > > > > - It is reproducible
> > > > > > > > - I've done another build here to double check and its
> > > > > > > > definitely
> > > > > > > > the rcu merge
> > > > > > > > that's causing it.
> > > > > > > >
> > > > > > > > Don't think I'll be able to dig deeper, but I can do
> > > > > > > > testing if
> > > > > > > > needed.
> > > > > > >
> > > > > > > Please! Does the following patch help?
> > > > > >
> > > > > > Nope, doesn't seem to make a difference to the modprobe
> > > > > > ppp_generic
> > > > > > test
> > > > >
> > > > > Well, I was hoping. I will take a closer look at the RCU
> > > > > merge commit
> > > > > and see what suggests itself. I am likely to ask you to
> > > > > revert specific
> > > > > commits, if that works for you.
> > > >
> > > > Well, rather than reverting commits, could you please try
> > > > testing the
> > > > following commits?
> > > >
> > > > 11ed7f934cb8 (rcu: Make nocb leader kthreads process pending
> > > > callbacks after spawning)
> > > >
> > > > 73a860cd58a1 (rcu: Replace flush_signals() with
> > > > WARN_ON(signal_pending()))
> > > >
> > > > c847f14217d5 (rcu: Avoid misordering in nocb_leader_wait())
> > > >
> > > > For whatever it is worth, I am guessing this one.
> > >
> > > Indeed, c847f14217d5 it is.
> > >
> > > Much to my embarrasment I just noticed that in addition to the
> > > rcu merge, triggering the bug "requires" my specific Fedora
> > > rawhide network
> > > setup. Booting in single mode and modprobe ppp_generic is fine.
> > > The bug
> > > appears when starting with my regular fedora network setup, which
> > > in my case
> > > includes 3 ethernet adapters and a libvirt birdge+nat setup.
> > >
> > > Hope that helps.
> > >
> > > I am attaching the config.
> >
> > It does help a lot, thank you!!!
> >
> > The following patch is a bit of a shot in the dark, and assumes that
> > commit 1772947bd012 (rcu: Handle NOCB callbacks from irq-disabled
> > idle
> > code) introduced the problem. Does this patch fix things up?
>
> Unfortunately not. This is the tip of Linus's tree plus the patch:

OK. Can't have everything, I guess.

> INFO: task kworker/u16:6:96 blocked for more than 120 seconds.
> Not tainted 3.18.0-rc1+ #4
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kworker/u16:6 D ffff8800ca84cec0 11168 96 2 0x00000000
> Workqueue: netns cleanup_net
> ffff8802218339e8 0000000000000096 ffff8800ca84cec0 00000000001d5f00
> ffff880221833fd8 00000000001d5f00 ffff880223264ec0 ffff8800ca84cec0
> ffffffff82c52040 7fffffffffffffff ffffffff81ee2658 ffffffff81ee2650
> Call Trace:
> [<ffffffff8185b8e9>] schedule+0x29/0x70
> [<ffffffff81860b0c>] schedule_timeout+0x26c/0x410
> [<ffffffff81028bea>] ? native_sched_clock+0x2a/0xa0
> [<ffffffff8110759c>] ? mark_held_locks+0x7c/0xb0
> [<ffffffff81861b90>] ? _raw_spin_unlock_irq+0x30/0x50
> [<ffffffff8110772d>] ? trace_hardirqs_on_caller+0x15d/0x200
> [<ffffffff8185d31c>] wait_for_completion+0x10c/0x150
> [<ffffffff810e4ed0>] ? wake_up_state+0x20/0x20
> [<ffffffff8112a219>] _rcu_barrier+0x159/0x200
> [<ffffffff8112a315>] rcu_barrier+0x15/0x20
> [<ffffffff8171657f>] netdev_run_todo+0x6f/0x310
> [<ffffffff8170b145>] ? rollback_registered_many+0x265/0x2e0
> [<ffffffff817235ee>] rtnl_unlock+0xe/0x10
> [<ffffffff8170cfa6>] default_device_exit_batch+0x156/0x180
> [<ffffffff810fd390>] ? abort_exclusive_wait+0xb0/0xb0
> [<ffffffff81705053>] ops_exit_list.isra.1+0x53/0x60
> [<ffffffff81705c00>] cleanup_net+0x100/0x1f0
> [<ffffffff810cca98>] process_one_work+0x218/0x850
> [<ffffffff810cc9ff>] ? process_one_work+0x17f/0x850
> [<ffffffff810cd1b7>] ? worker_thread+0xe7/0x4a0
> [<ffffffff810cd13b>] worker_thread+0x6b/0x4a0
> [<ffffffff810cd0d0>] ? process_one_work+0x850/0x850
> [<ffffffff810d348b>] kthread+0x10b/0x130
> [<ffffffff81028c69>] ? sched_clock+0x9/0x10
> [<ffffffff810d3380>] ? kthread_create_on_node+0x250/0x250
> [<ffffffff818628bc>] ret_from_fork+0x7c/0xb0
> [<ffffffff810d3380>] ? kthread_create_on_node+0x250/0x250
> 4 locks held by kworker/u16:6/96:
> #0: ("%s""netns"){.+.+.+}, at: [<ffffffff810cc9ff>] process_one_work+0x17f/0x850
> #1: (net_cleanup_work){+.+.+.}, at: [<ffffffff810cc9ff>] process_one_work+0x17f/0x850
> #2: (net_mutex){+.+.+.}, at: [<ffffffff81705b8c>] cleanup_net+0x8c/0x1f0
> #3: (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff8112a0f5>] _rcu_barrier+0x35/0x200
> INFO: task modprobe:1045 blocked for more than 120 seconds.
> Not tainted 3.18.0-rc1+ #4
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> modprobe D ffff880218343480 12920 1045 1044 0x00000080
> ffff880218353bf8 0000000000000096 ffff880218343480 00000000001d5f00
> ffff880218353fd8 00000000001d5f00 ffffffff81e1b580 ffff880218343480
> ffff880218343480 ffffffff81f8f748 0000000000000246 ffff880218343480
> Call Trace:
> [<ffffffff8185be91>] schedule_preempt_disabled+0x31/0x80
> [<ffffffff8185d6e3>] mutex_lock_nested+0x183/0x440
> [<ffffffff81705a1f>] ? register_pernet_subsys+0x1f/0x50
> [<ffffffff81705a1f>] ? register_pernet_subsys+0x1f/0x50
> [<ffffffffa0673000>] ? 0xffffffffa0673000
> [<ffffffff81705a1f>] register_pernet_subsys+0x1f/0x50
> [<ffffffffa0673048>] br_init+0x48/0xd3 [bridge]
> [<ffffffff81002148>] do_one_initcall+0xd8/0x210
> [<ffffffff81153052>] load_module+0x20c2/0x2870
> [<ffffffff8114e030>] ? store_uevent+0x70/0x70
> [<ffffffff81278717>] ? kernel_read+0x57/0x90
> [<ffffffff811539e6>] SyS_finit_module+0xa6/0xe0
> [<ffffffff81862969>] system_call_fastpath+0x12/0x17
> 1 lock held by modprobe/1045:
> #0: (net_mutex){+.+.+.}, at: [<ffffffff81705a1f>] register_pernet_subsys+0x1f/0x50

Presumably the kworker/u16:6 completed, then modprobe hung?

If not, I have some very hard questions about why net_mutex can be
held by two tasks concurrently, given that it does not appear to be a
reader-writer lock...

Either way, my patch assumed that 39953dfd4007 (rcu: Avoid misordering in
__call_rcu_nocb_enqueue()) would work and that 1772947bd012 (rcu: Handle
NOCB callbacks from irq-disabled idle code) would fail. Is that the case?
If not, could you please bisect the commits between 11ed7f934cb8 (rcu:
Make nocb leader kthreads process pending callbacks after spawning)
and c847f14217d5 (rcu: Avoid misordering in nocb_leader_wait())?

Thanx, Paul
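[Editor's note: the bisection Paul asks for above amounts to checking out each candidate commit, building, booting, and running the modprobe reproducer. The sketch below only enumerates the three commits named in the thread and prints the test order; the actual build/boot/test step is left as a comment, since it depends entirely on the local setup.]

```shell
#!/bin/sh
# Illustrative sketch only: walk the suspect commits named in the thread,
# oldest to newest. The real per-commit test (build, install, reboot, run
# "modprobe ppp_generic" with the full network setup) is a placeholder
# comment because it cannot be scripted generically.
set -e

candidates="11ed7f934cb8 73a860cd58a1 c847f14217d5"

for c in $candidates; do
    echo "would test commit $c"
    # Real run would be roughly:
    #   git checkout "$c" && make -j"$(nproc)" && <install, reboot, reproduce>
done
```

For a longer range, `git bisect start c847f14217d5 11ed7f934cb8` would let git pick the midpoints instead of testing each commit by hand.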

2014-10-24 04:48:48

by Jay Vosburgh

[permalink] [raw]
Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

Paul E. McKenney <[email protected]> wrote:

>On Fri, Oct 24, 2014 at 12:45:40AM +0300, Yanko Kaneti wrote:
>>
>> On Thu, 2014-10-23 at 13:05 -0700, Paul E. McKenney wrote:
>> > On Thu, Oct 23, 2014 at 10:51:59PM +0300, Yanko Kaneti wrote:
>> > > On Thu-10/23/14-2014 08:33, Paul E. McKenney wrote:
>> > > > On Thu, Oct 23, 2014 at 05:27:50AM -0700, Paul E. McKenney wrote:
>> > > > > On Thu, Oct 23, 2014 at 09:09:26AM +0300, Yanko Kaneti wrote:
>> > > > > > On Wed, 2014-10-22 at 16:24 -0700, Paul E. McKenney wrote:
>> > > > > > > On Thu, Oct 23, 2014 at 01:40:32AM +0300, Yanko Kaneti
>> > > > > > > wrote:
>> > > > > > > > On Wed-10/22/14-2014 15:33, Josh Boyer wrote:
>> > > > > > > > > On Wed, Oct 22, 2014 at 2:55 PM, Paul E. McKenney
>> > > > > > > > > <[email protected]> wrote:
>> > > > > > >
>> > > > > > > [ . . . ]
>> > > > > > >
>> > > > > > > > > > Don't get me wrong -- the fact that this kthread
>> > > > > > > > > > appears to
>> > > > > > > > > > have
>> > > > > > > > > > blocked within rcu_barrier() for 120 seconds means
>> > > > > > > > > > that
>> > > > > > > > > > something is
>> > > > > > > > > > most definitely wrong here. I am surprised that
>> > > > > > > > > > there are no
>> > > > > > > > > > RCU CPU
>> > > > > > > > > > stall warnings, but perhaps the blockage is in the
>> > > > > > > > > > callback
>> > > > > > > > > > execution
>> > > > > > > > > > rather than grace-period completion. Or something is
>> > > > > > > > > > preventing this
>> > > > > > > > > > kthread from starting up after the wake-up callback
>> > > > > > > > > > executes.
>> > > > > > > > > > Or...
>> > > > > > > > > >
>> > > > > > > > > > Is this thing reproducible?
>> > > > > > > > >
>> > > > > > > > > I've added Yanko on CC, who reported the backtrace
>> > > > > > > > > above and can
>> > > > > > > > > recreate it reliably. Apparently reverting the RCU
>> > > > > > > > > merge commit
>> > > > > > > > > (d6dd50e) and rebuilding the latest after that does
>> > > > > > > > > not show the
>> > > > > > > > > issue. I'll let Yanko explain more and answer any
>> > > > > > > > > questions you
>> > > > > > > > > have.
>> > > > > > > >
>> > > > > > > > - It is reproducible
>> > > > > > > > - I've done another build here to double check and its
>> > > > > > > > definitely
>> > > > > > > > the rcu merge
>> > > > > > > > that's causing it.
>> > > > > > > >
>> > > > > > > > Don't think I'll be able to dig deeper, but I can do
>> > > > > > > > testing if
>> > > > > > > > needed.
>> > > > > > >
>> > > > > > > Please! Does the following patch help?
>> > > > > >
>> > > > > > Nope, doesn't seem to make a difference to the modprobe
>> > > > > > ppp_generic
>> > > > > > test
>> > > > >
>> > > > > Well, I was hoping. I will take a closer look at the RCU
>> > > > > merge commit
>> > > > > and see what suggests itself. I am likely to ask you to
>> > > > > revert specific
>> > > > > commits, if that works for you.
>> > > >
>> > > > Well, rather than reverting commits, could you please try
>> > > > testing the
>> > > > following commits?
>> > > >
>> > > > 11ed7f934cb8 (rcu: Make nocb leader kthreads process pending
>> > > > callbacks after spawning)
>> > > >
>> > > > 73a860cd58a1 (rcu: Replace flush_signals() with
>> > > > WARN_ON(signal_pending()))
>> > > >
>> > > > c847f14217d5 (rcu: Avoid misordering in nocb_leader_wait())
>> > > >
>> > > > For whatever it is worth, I am guessing this one.
>> > >
>> > > Indeed, c847f14217d5 it is.
>> > >
>> > > Much to my embarrasment I just noticed that in addition to the
>> > > rcu merge, triggering the bug "requires" my specific Fedora
>> > > rawhide network
>> > > setup. Booting in single mode and modprobe ppp_generic is fine.
>> > > The bug
>> > > appears when starting with my regular fedora network setup, which
>> > > in my case
>> > > includes 3 ethernet adapters and a libvirt birdge+nat setup.
>> > >
>> > > Hope that helps.
>> > >
>> > > I am attaching the config.
>> >
>> > It does help a lot, thank you!!!
>> >
>> > The following patch is a bit of a shot in the dark, and assumes that
>> > commit 1772947bd012 (rcu: Handle NOCB callbacks from irq-disabled
>> > idle
>> > code) introduced the problem. Does this patch fix things up?
>>
>> Unfortunately not. This is the tip of Linus's tree plus the patch:
>
>OK. Can't have everything, I guess.
>
>> INFO: task kworker/u16:6:96 blocked for more than 120 seconds.
>> Not tainted 3.18.0-rc1+ #4
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> kworker/u16:6 D ffff8800ca84cec0 11168 96 2 0x00000000
>> Workqueue: netns cleanup_net
>> ffff8802218339e8 0000000000000096 ffff8800ca84cec0 00000000001d5f00
>> ffff880221833fd8 00000000001d5f00 ffff880223264ec0 ffff8800ca84cec0
>> ffffffff82c52040 7fffffffffffffff ffffffff81ee2658 ffffffff81ee2650
>> Call Trace:
>> [<ffffffff8185b8e9>] schedule+0x29/0x70
>> [<ffffffff81860b0c>] schedule_timeout+0x26c/0x410
>> [<ffffffff81028bea>] ? native_sched_clock+0x2a/0xa0
>> [<ffffffff8110759c>] ? mark_held_locks+0x7c/0xb0
>> [<ffffffff81861b90>] ? _raw_spin_unlock_irq+0x30/0x50
>> [<ffffffff8110772d>] ? trace_hardirqs_on_caller+0x15d/0x200
>> [<ffffffff8185d31c>] wait_for_completion+0x10c/0x150
>> [<ffffffff810e4ed0>] ? wake_up_state+0x20/0x20
>> [<ffffffff8112a219>] _rcu_barrier+0x159/0x200
>> [<ffffffff8112a315>] rcu_barrier+0x15/0x20
>> [<ffffffff8171657f>] netdev_run_todo+0x6f/0x310
>> [<ffffffff8170b145>] ? rollback_registered_many+0x265/0x2e0
>> [<ffffffff817235ee>] rtnl_unlock+0xe/0x10
>> [<ffffffff8170cfa6>] default_device_exit_batch+0x156/0x180
>> [<ffffffff810fd390>] ? abort_exclusive_wait+0xb0/0xb0
>> [<ffffffff81705053>] ops_exit_list.isra.1+0x53/0x60
>> [<ffffffff81705c00>] cleanup_net+0x100/0x1f0
>> [<ffffffff810cca98>] process_one_work+0x218/0x850
>> [<ffffffff810cc9ff>] ? process_one_work+0x17f/0x850
>> [<ffffffff810cd1b7>] ? worker_thread+0xe7/0x4a0
>> [<ffffffff810cd13b>] worker_thread+0x6b/0x4a0
>> [<ffffffff810cd0d0>] ? process_one_work+0x850/0x850
>> [<ffffffff810d348b>] kthread+0x10b/0x130
>> [<ffffffff81028c69>] ? sched_clock+0x9/0x10
>> [<ffffffff810d3380>] ? kthread_create_on_node+0x250/0x250
>> [<ffffffff818628bc>] ret_from_fork+0x7c/0xb0
>> [<ffffffff810d3380>] ? kthread_create_on_node+0x250/0x250
>> 4 locks held by kworker/u16:6/96:
>> #0: ("%s""netns"){.+.+.+}, at: [<ffffffff810cc9ff>] process_one_work+0x17f/0x850
>> #1: (net_cleanup_work){+.+.+.}, at: [<ffffffff810cc9ff>] process_one_work+0x17f/0x850
>> #2: (net_mutex){+.+.+.}, at: [<ffffffff81705b8c>] cleanup_net+0x8c/0x1f0
>> #3: (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff8112a0f5>] _rcu_barrier+0x35/0x200
>> INFO: task modprobe:1045 blocked for more than 120 seconds.
>> Not tainted 3.18.0-rc1+ #4
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> modprobe D ffff880218343480 12920 1045 1044 0x00000080
>> ffff880218353bf8 0000000000000096 ffff880218343480 00000000001d5f00
>> ffff880218353fd8 00000000001d5f00 ffffffff81e1b580 ffff880218343480
>> ffff880218343480 ffffffff81f8f748 0000000000000246 ffff880218343480
>> Call Trace:
>> [<ffffffff8185be91>] schedule_preempt_disabled+0x31/0x80
>> [<ffffffff8185d6e3>] mutex_lock_nested+0x183/0x440
>> [<ffffffff81705a1f>] ? register_pernet_subsys+0x1f/0x50
>> [<ffffffff81705a1f>] ? register_pernet_subsys+0x1f/0x50
>> [<ffffffffa0673000>] ? 0xffffffffa0673000
>> [<ffffffff81705a1f>] register_pernet_subsys+0x1f/0x50
>> [<ffffffffa0673048>] br_init+0x48/0xd3 [bridge]
>> [<ffffffff81002148>] do_one_initcall+0xd8/0x210
>> [<ffffffff81153052>] load_module+0x20c2/0x2870
>> [<ffffffff8114e030>] ? store_uevent+0x70/0x70
>> [<ffffffff81278717>] ? kernel_read+0x57/0x90
>> [<ffffffff811539e6>] SyS_finit_module+0xa6/0xe0
>> [<ffffffff81862969>] system_call_fastpath+0x12/0x17
>> 1 lock held by modprobe/1045:
>> #0: (net_mutex){+.+.+.}, at: [<ffffffff81705a1f>] register_pernet_subsys+0x1f/0x50
>
>Presumably the kworker/u16:6 completed, then modprobe hung?
>
>If not, I have some very hard questions about why net_mutex can be
>held by two tasks concurrently, given that it does not appear to be a
>reader-writer lock...
>
>Either way, my patch assumed that 39953dfd4007 (rcu: Avoid misordering in
>__call_rcu_nocb_enqueue()) would work and that 1772947bd012 (rcu: Handle
>NOCB callbacks from irq-disabled idle code) would fail. Is that the case?
>If not, could you please bisect the commits between 11ed7f934cb8 (rcu:
>Make nocb leader kthreads process pending callbacks after spawning)
>and c847f14217d5 (rcu: Avoid misordering in nocb_leader_wait())?

Just a note to add that I am also reliably inducing what appears
to be this issue on a current -net tree, when configuring openvswitch
via script. I am available to test patches or bisect tomorrow (Friday)
US time if needed.

The stack is as follows:

[ 1320.492020] INFO: task ovs-vswitchd:1303 blocked for more than 120 seconds.
[ 1320.498965] Not tainted 3.17.0-testola+ #1
[ 1320.503570] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1320.511374] ovs-vswitchd D ffff88013fc14600 0 1303 1302 0x00000004
[ 1320.511378] ffff8801388d77d8 0000000000000002 ffff880031144b00 ffff8801388d7fd8
[ 1320.511382] 0000000000014600 0000000000014600 ffff8800b092e400 ffff880031144b00
[ 1320.511385] ffff8800b1126000 ffffffff81c58ad0 ffffffff81c58ad8 7fffffffffffffff
[ 1320.511389] Call Trace:
[ 1320.511396] [<ffffffff81739db9>] schedule+0x29/0x70
[ 1320.511399] [<ffffffff8173cd8c>] schedule_timeout+0x1dc/0x260
[ 1320.511404] [<ffffffff8109698d>] ? check_preempt_curr+0x8d/0xa0
[ 1320.511407] [<ffffffff810969bd>] ? ttwu_do_wakeup+0x1d/0xd0
[ 1320.511410] [<ffffffff8173aab6>] wait_for_completion+0xa6/0x160
[ 1320.511413] [<ffffffff81099980>] ? wake_up_state+0x20/0x20
[ 1320.511417] [<ffffffff810cdb57>] _rcu_barrier+0x157/0x200
[ 1320.511419] [<ffffffff810cdc55>] rcu_barrier+0x15/0x20
[ 1320.511423] [<ffffffff8163a780>] netdev_run_todo+0x60/0x300
[ 1320.511427] [<ffffffff8164515e>] rtnl_unlock+0xe/0x10
[ 1320.511435] [<ffffffffa01aecc5>] internal_dev_destroy+0x55/0x80 [openvswitch]
[ 1320.511440] [<ffffffffa01ae622>] ovs_vport_del+0x32/0x40 [openvswitch]
[ 1320.511444] [<ffffffffa01a7dd0>] ovs_dp_detach_port+0x30/0x40 [openvswitch]
[ 1320.511448] [<ffffffffa01a7ea5>] ovs_vport_cmd_del+0xc5/0x110 [openvswitch]
[ 1320.511452] [<ffffffff816675b5>] genl_family_rcv_msg+0x1a5/0x3c0
[ 1320.511455] [<ffffffff816677d0>] ? genl_family_rcv_msg+0x3c0/0x3c0
[ 1320.511458] [<ffffffff81667861>] genl_rcv_msg+0x91/0xd0
[ 1320.511461] [<ffffffff816658d1>] netlink_rcv_skb+0xc1/0xe0
[ 1320.511463] [<ffffffff81665dfc>] genl_rcv+0x2c/0x40
[ 1320.511466] [<ffffffff81664e66>] netlink_unicast+0xf6/0x200
[ 1320.511468] [<ffffffff8166528d>] netlink_sendmsg+0x31d/0x780
[ 1320.511472] [<ffffffff81662274>] ? netlink_rcv_wake+0x44/0x60
[ 1320.511475] [<ffffffff816632e3>] ? netlink_recvmsg+0x1d3/0x3e0
[ 1320.511479] [<ffffffff8161c463>] sock_sendmsg+0x93/0xd0
[ 1320.511484] [<ffffffff81332d00>] ? apparmor_file_alloc_security+0x20/0x40
[ 1320.511487] [<ffffffff8162a697>] ? verify_iovec+0x47/0xd0
[ 1320.511491] [<ffffffff8161cc79>] ___sys_sendmsg+0x399/0x3b0
[ 1320.511495] [<ffffffff81254e02>] ? kernfs_seq_stop_active+0x32/0x40
[ 1320.511499] [<ffffffff8101c385>] ? native_sched_clock+0x35/0x90
[ 1320.511502] [<ffffffff8101c385>] ? native_sched_clock+0x35/0x90
[ 1320.511505] [<ffffffff8101c3e9>] ? sched_clock+0x9/0x10
[ 1320.511509] [<ffffffff81122d5c>] ? acct_account_cputime+0x1c/0x20
[ 1320.511512] [<ffffffff8109ce6b>] ? account_user_time+0x8b/0xa0
[ 1320.511516] [<ffffffff811fc135>] ? __fget_light+0x25/0x70
[ 1320.511519] [<ffffffff8161d372>] __sys_sendmsg+0x42/0x80
[ 1320.511521] [<ffffffff8161d3c2>] SyS_sendmsg+0x12/0x20
[ 1320.511525] [<ffffffff8173e6a4>] tracesys_phase2+0xd8/0xdd

-J

---
-Jay Vosburgh, [email protected]

2014-10-24 09:09:05

by Yanko Kaneti

[permalink] [raw]
Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

On Thu-10/23/14-2014 15:04, Paul E. McKenney wrote:
> On Fri, Oct 24, 2014 at 12:45:40AM +0300, Yanko Kaneti wrote:
> >
> > On Thu, 2014-10-23 at 13:05 -0700, Paul E. McKenney wrote:
> > > On Thu, Oct 23, 2014 at 10:51:59PM +0300, Yanko Kaneti wrote:
> > > > On Thu-10/23/14-2014 08:33, Paul E. McKenney wrote:
> > > > > On Thu, Oct 23, 2014 at 05:27:50AM -0700, Paul E. McKenney wrote:
> > > > > > On Thu, Oct 23, 2014 at 09:09:26AM +0300, Yanko Kaneti wrote:
> > > > > > > On Wed, 2014-10-22 at 16:24 -0700, Paul E. McKenney wrote:
> > > > > > > > On Thu, Oct 23, 2014 at 01:40:32AM +0300, Yanko Kaneti
> > > > > > > > wrote:
> > > > > > > > > On Wed-10/22/14-2014 15:33, Josh Boyer wrote:
> > > > > > > > > > On Wed, Oct 22, 2014 at 2:55 PM, Paul E. McKenney
> > > > > > > > > > <[email protected]> wrote:
> > > > > > > >
> > > > > > > > [ . . . ]
> > > > > > > >
> > > > > > > > > > > Don't get me wrong -- the fact that this kthread
> > > > > > > > > > > appears to
> > > > > > > > > > > have
> > > > > > > > > > > blocked within rcu_barrier() for 120 seconds means
> > > > > > > > > > > that
> > > > > > > > > > > something is
> > > > > > > > > > > most definitely wrong here. I am surprised that
> > > > > > > > > > > there are no
> > > > > > > > > > > RCU CPU
> > > > > > > > > > > stall warnings, but perhaps the blockage is in the
> > > > > > > > > > > callback
> > > > > > > > > > > execution
> > > > > > > > > > > rather than grace-period completion. Or something is
> > > > > > > > > > > preventing this
> > > > > > > > > > > kthread from starting up after the wake-up callback
> > > > > > > > > > > executes.
> > > > > > > > > > > Or...
> > > > > > > > > > >
> > > > > > > > > > > Is this thing reproducible?
> > > > > > > > > >
> > > > > > > > > > I've added Yanko on CC, who reported the backtrace
> > > > > > > > > > above and can
> > > > > > > > > > recreate it reliably. Apparently reverting the RCU
> > > > > > > > > > merge commit
> > > > > > > > > > (d6dd50e) and rebuilding the latest after that does
> > > > > > > > > > not show the
> > > > > > > > > > issue. I'll let Yanko explain more and answer any
> > > > > > > > > > questions you
> > > > > > > > > > have.
> > > > > > > > >
> > > > > > > > > - It is reproducible
> > > > > > > > > - I've done another build here to double check and it's
> > > > > > > > > definitely
> > > > > > > > > the rcu merge
> > > > > > > > > that's causing it.
> > > > > > > > >
> > > > > > > > > Don't think I'll be able to dig deeper, but I can do
> > > > > > > > > testing if
> > > > > > > > > needed.
> > > > > > > >
> > > > > > > > Please! Does the following patch help?
> > > > > > >
> > > > > > > Nope, doesn't seem to make a difference to the modprobe
> > > > > > > ppp_generic
> > > > > > > test
> > > > > >
> > > > > > Well, I was hoping. I will take a closer look at the RCU
> > > > > > merge commit
> > > > > > and see what suggests itself. I am likely to ask you to
> > > > > > revert specific
> > > > > > commits, if that works for you.
> > > > >
> > > > > Well, rather than reverting commits, could you please try
> > > > > testing the
> > > > > following commits?
> > > > >
> > > > > 11ed7f934cb8 (rcu: Make nocb leader kthreads process pending
> > > > > callbacks after spawning)
> > > > >
> > > > > 73a860cd58a1 (rcu: Replace flush_signals() with
> > > > > WARN_ON(signal_pending()))
> > > > >
> > > > > c847f14217d5 (rcu: Avoid misordering in nocb_leader_wait())
> > > > >
> > > > > For whatever it is worth, I am guessing this one.
> > > >
> > > > Indeed, c847f14217d5 it is.
> > > >
> > > > Much to my embarrassment I just noticed that in addition to the
> > > > rcu merge, triggering the bug "requires" my specific Fedora
> > > > rawhide network
> > > > setup. Booting in single mode and modprobe ppp_generic is fine.
> > > > The bug
> > > > appears when starting with my regular fedora network setup, which
> > > > in my case
> > > > includes 3 ethernet adapters and a libvirt bridge+nat setup.
> > > >
> > > > Hope that helps.
> > > >
> > > > I am attaching the config.
> > >
> > > It does help a lot, thank you!!!
> > >
> > > The following patch is a bit of a shot in the dark, and assumes that
> > > commit 1772947bd012 (rcu: Handle NOCB callbacks from irq-disabled
> > > idle
> > > code) introduced the problem. Does this patch fix things up?
> >
> > Unfortunately not. This is linus-tip + patch
>
> OK. Can't have everything, I guess.
>
> > INFO: task kworker/u16:6:96 blocked for more than 120 seconds.
> > Not tainted 3.18.0-rc1+ #4
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > kworker/u16:6 D ffff8800ca84cec0 11168 96 2 0x00000000
> > Workqueue: netns cleanup_net
> > ffff8802218339e8 0000000000000096 ffff8800ca84cec0 00000000001d5f00
> > ffff880221833fd8 00000000001d5f00 ffff880223264ec0 ffff8800ca84cec0
> > ffffffff82c52040 7fffffffffffffff ffffffff81ee2658 ffffffff81ee2650
> > Call Trace:
> > [<ffffffff8185b8e9>] schedule+0x29/0x70
> > [<ffffffff81860b0c>] schedule_timeout+0x26c/0x410
> > [<ffffffff81028bea>] ? native_sched_clock+0x2a/0xa0
> > [<ffffffff8110759c>] ? mark_held_locks+0x7c/0xb0
> > [<ffffffff81861b90>] ? _raw_spin_unlock_irq+0x30/0x50
> > [<ffffffff8110772d>] ? trace_hardirqs_on_caller+0x15d/0x200
> > [<ffffffff8185d31c>] wait_for_completion+0x10c/0x150
> > [<ffffffff810e4ed0>] ? wake_up_state+0x20/0x20
> > [<ffffffff8112a219>] _rcu_barrier+0x159/0x200
> > [<ffffffff8112a315>] rcu_barrier+0x15/0x20
> > [<ffffffff8171657f>] netdev_run_todo+0x6f/0x310
> > [<ffffffff8170b145>] ? rollback_registered_many+0x265/0x2e0
> > [<ffffffff817235ee>] rtnl_unlock+0xe/0x10
> > [<ffffffff8170cfa6>] default_device_exit_batch+0x156/0x180
> > [<ffffffff810fd390>] ? abort_exclusive_wait+0xb0/0xb0
> > [<ffffffff81705053>] ops_exit_list.isra.1+0x53/0x60
> > [<ffffffff81705c00>] cleanup_net+0x100/0x1f0
> > [<ffffffff810cca98>] process_one_work+0x218/0x850
> > [<ffffffff810cc9ff>] ? process_one_work+0x17f/0x850
> > [<ffffffff810cd1b7>] ? worker_thread+0xe7/0x4a0
> > [<ffffffff810cd13b>] worker_thread+0x6b/0x4a0
> > [<ffffffff810cd0d0>] ? process_one_work+0x850/0x850
> > [<ffffffff810d348b>] kthread+0x10b/0x130
> > [<ffffffff81028c69>] ? sched_clock+0x9/0x10
> > [<ffffffff810d3380>] ? kthread_create_on_node+0x250/0x250
> > [<ffffffff818628bc>] ret_from_fork+0x7c/0xb0
> > [<ffffffff810d3380>] ? kthread_create_on_node+0x250/0x250
> > 4 locks held by kworker/u16:6/96:
> > #0: ("%s""netns"){.+.+.+}, at: [<ffffffff810cc9ff>] process_one_work+0x17f/0x850
> > #1: (net_cleanup_work){+.+.+.}, at: [<ffffffff810cc9ff>] process_one_work+0x17f/0x850
> > #2: (net_mutex){+.+.+.}, at: [<ffffffff81705b8c>] cleanup_net+0x8c/0x1f0
> > #3: (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff8112a0f5>] _rcu_barrier+0x35/0x200
> > INFO: task modprobe:1045 blocked for more than 120 seconds.
> > Not tainted 3.18.0-rc1+ #4
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > modprobe D ffff880218343480 12920 1045 1044 0x00000080
> > ffff880218353bf8 0000000000000096 ffff880218343480 00000000001d5f00
> > ffff880218353fd8 00000000001d5f00 ffffffff81e1b580 ffff880218343480
> > ffff880218343480 ffffffff81f8f748 0000000000000246 ffff880218343480
> > Call Trace:
> > [<ffffffff8185be91>] schedule_preempt_disabled+0x31/0x80
> > [<ffffffff8185d6e3>] mutex_lock_nested+0x183/0x440
> > [<ffffffff81705a1f>] ? register_pernet_subsys+0x1f/0x50
> > [<ffffffff81705a1f>] ? register_pernet_subsys+0x1f/0x50
> > [<ffffffffa0673000>] ? 0xffffffffa0673000
> > [<ffffffff81705a1f>] register_pernet_subsys+0x1f/0x50
> > [<ffffffffa0673048>] br_init+0x48/0xd3 [bridge]
> > [<ffffffff81002148>] do_one_initcall+0xd8/0x210
> > [<ffffffff81153052>] load_module+0x20c2/0x2870
> > [<ffffffff8114e030>] ? store_uevent+0x70/0x70
> > [<ffffffff81278717>] ? kernel_read+0x57/0x90
> > [<ffffffff811539e6>] SyS_finit_module+0xa6/0xe0
> > [<ffffffff81862969>] system_call_fastpath+0x12/0x17
> > 1 lock held by modprobe/1045:
> > #0: (net_mutex){+.+.+.}, at: [<ffffffff81705a1f>] register_pernet_subsys+0x1f/0x50
>
> Presumably the kworker/u16:6 completed, then modprobe hung?
>
> If not, I have some very hard questions about why net_mutex can be
> held by two tasks concurrently, given that it does not appear to be a
> reader-writer lock...
>
> Either way, my patch assumed that 39953dfd4007 (rcu: Avoid misordering in
> __call_rcu_nocb_enqueue()) would work and that 1772947bd012 (rcu: Handle
> NOCB callbacks from irq-disabled idle code) would fail. Is that the case?
> If not, could you please bisect the commits between 11ed7f934cb8 (rcu:
> Make nocb leader kthreads process pending callbacks after spawning)
> and c847f14217d5 (rcu: Avoid misordering in nocb_leader_wait())?
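
For reference, the bisection being requested follows the standard `git bisect` loop. Below is a self-contained toy run of that workflow; in the real kernel tree the good/bad endpoints would be 11ed7f934cb8 and c847f14217d5, and the per-step "test" would be building and booting each candidate kernel to see whether the rcu_barrier() hang reproduces.

```shell
#!/bin/sh
# Toy git-bisect run in a throwaway repository.  The "bug" is a file whose
# contents flip from "ok" to "bug" at the culprit commit.
set -e
repo=$(mktemp -d); cd "$repo"
git init -q .
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo ok > state;  git add state; git commit -qm "good endpoint"
good=$(git rev-parse HEAD)
echo bug > state; git commit -qam "introduces the bug"
culprit=$(git rev-parse HEAD)
echo extra > other; git add other; git commit -qm "bad endpoint"

git bisect start HEAD "$good" >/dev/null   # bad endpoint, then good endpoint
while :; do
    # The test step: in the kernel case this is a build+boot+reproduce cycle.
    if grep -q ok state; then res=good; else res=bad; fi
    out=$(git bisect "$res")
    case "$out" in *"is the first bad commit"*) break ;; esac
done
printf '%s\n' "$out" | head -n 1            # names the culprit commit
git bisect reset >/dev/null
```

When the test can be scripted, `git bisect run <script>` automates the same loop.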

Ok, unless I've messed up something major, bisecting points to:

35ce7f29a44a rcu: Create rcuo kthreads only for onlined CPUs

Does that make any sense?


Another thing I noticed is that in the failure mode the libvirtd bridge never
actually shows up. So maybe ppp is just the first module load that bumps
into whatever libvirtd is failing to do to set those up.

I truly hope this is not something with a random timing dependency...

--Yanko

2014-10-24 14:54:26

by Paul E. McKenney

[permalink] [raw]
Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

On Thu, Oct 23, 2014 at 09:48:34PM -0700, Jay Vosburgh wrote:
> Paul E. McKenney <[email protected]> wrote:
>
> >On Fri, Oct 24, 2014 at 12:45:40AM +0300, Yanko Kaneti wrote:
> >>
> >> On Thu, 2014-10-23 at 13:05 -0700, Paul E. McKenney wrote:
> >> > On Thu, Oct 23, 2014 at 10:51:59PM +0300, Yanko Kaneti wrote:
> >> > > On Thu-10/23/14-2014 08:33, Paul E. McKenney wrote:
> >> > > > On Thu, Oct 23, 2014 at 05:27:50AM -0700, Paul E. McKenney wrote:
> >> > > > > On Thu, Oct 23, 2014 at 09:09:26AM +0300, Yanko Kaneti wrote:
> >> > > > > > On Wed, 2014-10-22 at 16:24 -0700, Paul E. McKenney wrote:
> >> > > > > > > On Thu, Oct 23, 2014 at 01:40:32AM +0300, Yanko Kaneti
> >> > > > > > > wrote:
> >> > > > > > > > On Wed-10/22/14-2014 15:33, Josh Boyer wrote:
> >> > > > > > > > > On Wed, Oct 22, 2014 at 2:55 PM, Paul E. McKenney
> >> > > > > > > > > <[email protected]> wrote:
> >> > > > > > >
> >> > > > > > > [ . . . ]
> >> > > > > > >
> >> > > > > > > > > > Don't get me wrong -- the fact that this kthread
> >> > > > > > > > > > appears to
> >> > > > > > > > > > have
> >> > > > > > > > > > blocked within rcu_barrier() for 120 seconds means
> >> > > > > > > > > > that
> >> > > > > > > > > > something is
> >> > > > > > > > > > most definitely wrong here. I am surprised that
> >> > > > > > > > > > there are no
> >> > > > > > > > > > RCU CPU
> >> > > > > > > > > > stall warnings, but perhaps the blockage is in the
> >> > > > > > > > > > callback
> >> > > > > > > > > > execution
> >> > > > > > > > > > rather than grace-period completion. Or something is
> >> > > > > > > > > > preventing this
> >> > > > > > > > > > kthread from starting up after the wake-up callback
> >> > > > > > > > > > executes.
> >> > > > > > > > > > Or...
> >> > > > > > > > > >
> >> > > > > > > > > > Is this thing reproducible?
> >> > > > > > > > >
> >> > > > > > > > > I've added Yanko on CC, who reported the backtrace
> >> > > > > > > > > above and can
> >> > > > > > > > > recreate it reliably. Apparently reverting the RCU
> >> > > > > > > > > merge commit
> >> > > > > > > > > (d6dd50e) and rebuilding the latest after that does
> >> > > > > > > > > not show the
> >> > > > > > > > > issue. I'll let Yanko explain more and answer any
> >> > > > > > > > > questions you
> >> > > > > > > > > have.
> >> > > > > > > >
> >> > > > > > > > - It is reproducible
> >> > > > > > > > - I've done another build here to double check and it's
> >> > > > > > > > definitely
> >> > > > > > > > the rcu merge
> >> > > > > > > > that's causing it.
> >> > > > > > > >
> >> > > > > > > > Don't think I'll be able to dig deeper, but I can do
> >> > > > > > > > testing if
> >> > > > > > > > needed.
> >> > > > > > >
> >> > > > > > > Please! Does the following patch help?
> >> > > > > >
> >> > > > > > Nope, doesn't seem to make a difference to the modprobe
> >> > > > > > ppp_generic
> >> > > > > > test
> >> > > > >
> >> > > > > Well, I was hoping. I will take a closer look at the RCU
> >> > > > > merge commit
> >> > > > > and see what suggests itself. I am likely to ask you to
> >> > > > > revert specific
> >> > > > > commits, if that works for you.
> >> > > >
> >> > > > Well, rather than reverting commits, could you please try
> >> > > > testing the
> >> > > > following commits?
> >> > > >
> >> > > > 11ed7f934cb8 (rcu: Make nocb leader kthreads process pending
> >> > > > callbacks after spawning)
> >> > > >
> >> > > > 73a860cd58a1 (rcu: Replace flush_signals() with
> >> > > > WARN_ON(signal_pending()))
> >> > > >
> >> > > > c847f14217d5 (rcu: Avoid misordering in nocb_leader_wait())
> >> > > >
> >> > > > For whatever it is worth, I am guessing this one.
> >> > >
> >> > > Indeed, c847f14217d5 it is.
> >> > >
> >> > > > > Much to my embarrassment I just noticed that in addition to the
> >> > > rcu merge, triggering the bug "requires" my specific Fedora
> >> > > rawhide network
> >> > > setup. Booting in single mode and modprobe ppp_generic is fine.
> >> > > The bug
> >> > > appears when starting with my regular fedora network setup, which
> >> > > in my case
> >> > > > > includes 3 ethernet adapters and a libvirt bridge+nat setup.
> >> > >
> >> > > Hope that helps.
> >> > >
> >> > > I am attaching the config.
> >> >
> >> > It does help a lot, thank you!!!
> >> >
> >> > The following patch is a bit of a shot in the dark, and assumes that
> >> > commit 1772947bd012 (rcu: Handle NOCB callbacks from irq-disabled
> >> > idle
> >> > code) introduced the problem. Does this patch fix things up?
> >>
> >> Unfortunately not. This is linus-tip + patch
> >
> >OK. Can't have everything, I guess.
> >
> >> INFO: task kworker/u16:6:96 blocked for more than 120 seconds.
> >> Not tainted 3.18.0-rc1+ #4
> >> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >> kworker/u16:6 D ffff8800ca84cec0 11168 96 2 0x00000000
> >> Workqueue: netns cleanup_net
> >> ffff8802218339e8 0000000000000096 ffff8800ca84cec0 00000000001d5f00
> >> ffff880221833fd8 00000000001d5f00 ffff880223264ec0 ffff8800ca84cec0
> >> ffffffff82c52040 7fffffffffffffff ffffffff81ee2658 ffffffff81ee2650
> >> Call Trace:
> >> [<ffffffff8185b8e9>] schedule+0x29/0x70
> >> [<ffffffff81860b0c>] schedule_timeout+0x26c/0x410
> >> [<ffffffff81028bea>] ? native_sched_clock+0x2a/0xa0
> >> [<ffffffff8110759c>] ? mark_held_locks+0x7c/0xb0
> >> [<ffffffff81861b90>] ? _raw_spin_unlock_irq+0x30/0x50
> >> [<ffffffff8110772d>] ? trace_hardirqs_on_caller+0x15d/0x200
> >> [<ffffffff8185d31c>] wait_for_completion+0x10c/0x150
> >> [<ffffffff810e4ed0>] ? wake_up_state+0x20/0x20
> >> [<ffffffff8112a219>] _rcu_barrier+0x159/0x200
> >> [<ffffffff8112a315>] rcu_barrier+0x15/0x20
> >> [<ffffffff8171657f>] netdev_run_todo+0x6f/0x310
> >> [<ffffffff8170b145>] ? rollback_registered_many+0x265/0x2e0
> >> [<ffffffff817235ee>] rtnl_unlock+0xe/0x10
> >> [<ffffffff8170cfa6>] default_device_exit_batch+0x156/0x180
> >> [<ffffffff810fd390>] ? abort_exclusive_wait+0xb0/0xb0
> >> [<ffffffff81705053>] ops_exit_list.isra.1+0x53/0x60
> >> [<ffffffff81705c00>] cleanup_net+0x100/0x1f0
> >> [<ffffffff810cca98>] process_one_work+0x218/0x850
> >> [<ffffffff810cc9ff>] ? process_one_work+0x17f/0x850
> >> [<ffffffff810cd1b7>] ? worker_thread+0xe7/0x4a0
> >> [<ffffffff810cd13b>] worker_thread+0x6b/0x4a0
> >> [<ffffffff810cd0d0>] ? process_one_work+0x850/0x850
> >> [<ffffffff810d348b>] kthread+0x10b/0x130
> >> [<ffffffff81028c69>] ? sched_clock+0x9/0x10
> >> [<ffffffff810d3380>] ? kthread_create_on_node+0x250/0x250
> >> [<ffffffff818628bc>] ret_from_fork+0x7c/0xb0
> >> [<ffffffff810d3380>] ? kthread_create_on_node+0x250/0x250
> >> 4 locks held by kworker/u16:6/96:
> >> #0: ("%s""netns"){.+.+.+}, at: [<ffffffff810cc9ff>] process_one_work+0x17f/0x850
> >> #1: (net_cleanup_work){+.+.+.}, at: [<ffffffff810cc9ff>] process_one_work+0x17f/0x850
> >> #2: (net_mutex){+.+.+.}, at: [<ffffffff81705b8c>] cleanup_net+0x8c/0x1f0
> >> #3: (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff8112a0f5>] _rcu_barrier+0x35/0x200
> >> INFO: task modprobe:1045 blocked for more than 120 seconds.
> >> Not tainted 3.18.0-rc1+ #4
> >> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >> modprobe D ffff880218343480 12920 1045 1044 0x00000080
> >> ffff880218353bf8 0000000000000096 ffff880218343480 00000000001d5f00
> >> ffff880218353fd8 00000000001d5f00 ffffffff81e1b580 ffff880218343480
> >> ffff880218343480 ffffffff81f8f748 0000000000000246 ffff880218343480
> >> Call Trace:
> >> [<ffffffff8185be91>] schedule_preempt_disabled+0x31/0x80
> >> [<ffffffff8185d6e3>] mutex_lock_nested+0x183/0x440
> >> [<ffffffff81705a1f>] ? register_pernet_subsys+0x1f/0x50
> >> [<ffffffff81705a1f>] ? register_pernet_subsys+0x1f/0x50
> >> [<ffffffffa0673000>] ? 0xffffffffa0673000
> >> [<ffffffff81705a1f>] register_pernet_subsys+0x1f/0x50
> >> [<ffffffffa0673048>] br_init+0x48/0xd3 [bridge]
> >> [<ffffffff81002148>] do_one_initcall+0xd8/0x210
> >> [<ffffffff81153052>] load_module+0x20c2/0x2870
> >> [<ffffffff8114e030>] ? store_uevent+0x70/0x70
> >> [<ffffffff81278717>] ? kernel_read+0x57/0x90
> >> [<ffffffff811539e6>] SyS_finit_module+0xa6/0xe0
> >> [<ffffffff81862969>] system_call_fastpath+0x12/0x17
> >> 1 lock held by modprobe/1045:
> >> #0: (net_mutex){+.+.+.}, at: [<ffffffff81705a1f>] register_pernet_subsys+0x1f/0x50
> >
> >Presumably the kworker/u16:6 completed, then modprobe hung?
> >
> >If not, I have some very hard questions about why net_mutex can be
> >held by two tasks concurrently, given that it does not appear to be a
> >reader-writer lock...
> >
> >Either way, my patch assumed that 39953dfd4007 (rcu: Avoid misordering in
> >__call_rcu_nocb_enqueue()) would work and that 1772947bd012 (rcu: Handle
> >NOCB callbacks from irq-disabled idle code) would fail. Is that the case?
> >If not, could you please bisect the commits between 11ed7f934cb8 (rcu:
> >Make nocb leader kthreads process pending callbacks after spawning)
> >and c847f14217d5 (rcu: Avoid misordering in nocb_leader_wait())?
>
> Just a note to add that I am also reliably inducing what appears
> to be this issue on a current -net tree, when configuring openvswitch
> via script. I am available to test patches or bisect tomorrow (Friday)
> US time if needed.

Thank you, Jay! Could you please check to see if reverting this commit
fixes things for you?

35ce7f29a44a rcu: Create rcuo kthreads only for onlined CPUs

Reverting is not a long-term fix, as this commit is itself a bug fix,
but it would be good to check whether you are seeing the same thing that
Yanko is. ;-)
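
For anyone following along, the revert-and-retest step is mechanically simple. Here is a toy sketch of it in a throwaway repository; in the real tree the argument would be 35ce7f29a44a, and the retest would be a kernel rebuild and reboot.

```shell
#!/bin/sh
# Toy revert-and-retest: one baseline commit, one suspect commit, then a
# revert of the suspect to see whether the baseline behaviour comes back.
set -e
repo=$(mktemp -d); cd "$repo"
git init -q .
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo ok > state;  git add state; git commit -qm "baseline"
echo bug > state; git commit -qam "suspect commit"
suspect=$(git rev-parse HEAD)

# Undo just the suspect commit on top of the current tree.
git revert --no-edit "$suspect" >/dev/null

cat state   # the "retest": the file is back to its baseline contents
```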

Thanx, Paul

> The stack is as follows:
>
> [ 1320.492020] INFO: task ovs-vswitchd:1303 blocked for more than 120 seconds.
> [ 1320.498965] Not tainted 3.17.0-testola+ #1
> [ 1320.503570] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1320.511374] ovs-vswitchd D ffff88013fc14600 0 1303 1302 0x00000004
> [ 1320.511378] ffff8801388d77d8 0000000000000002 ffff880031144b00 ffff8801388d7fd8
> [ 1320.511382] 0000000000014600 0000000000014600 ffff8800b092e400 ffff880031144b00
> [ 1320.511385] ffff8800b1126000 ffffffff81c58ad0 ffffffff81c58ad8 7fffffffffffffff
> [ 1320.511389] Call Trace:
> [ 1320.511396] [<ffffffff81739db9>] schedule+0x29/0x70
> [ 1320.511399] [<ffffffff8173cd8c>] schedule_timeout+0x1dc/0x260
> [ 1320.511404] [<ffffffff8109698d>] ? check_preempt_curr+0x8d/0xa0
> [ 1320.511407] [<ffffffff810969bd>] ? ttwu_do_wakeup+0x1d/0xd0
> [ 1320.511410] [<ffffffff8173aab6>] wait_for_completion+0xa6/0x160
> [ 1320.511413] [<ffffffff81099980>] ? wake_up_state+0x20/0x20
> [ 1320.511417] [<ffffffff810cdb57>] _rcu_barrier+0x157/0x200
> [ 1320.511419] [<ffffffff810cdc55>] rcu_barrier+0x15/0x20
> [ 1320.511423] [<ffffffff8163a780>] netdev_run_todo+0x60/0x300
> [ 1320.511427] [<ffffffff8164515e>] rtnl_unlock+0xe/0x10
> [ 1320.511435] [<ffffffffa01aecc5>] internal_dev_destroy+0x55/0x80 [openvswitch]
> [ 1320.511440] [<ffffffffa01ae622>] ovs_vport_del+0x32/0x40 [openvswitch]
> [ 1320.511444] [<ffffffffa01a7dd0>] ovs_dp_detach_port+0x30/0x40 [openvswitch]
> [ 1320.511448] [<ffffffffa01a7ea5>] ovs_vport_cmd_del+0xc5/0x110 [openvswitch]
> [ 1320.511452] [<ffffffff816675b5>] genl_family_rcv_msg+0x1a5/0x3c0
> [ 1320.511455] [<ffffffff816677d0>] ? genl_family_rcv_msg+0x3c0/0x3c0
> [ 1320.511458] [<ffffffff81667861>] genl_rcv_msg+0x91/0xd0
> [ 1320.511461] [<ffffffff816658d1>] netlink_rcv_skb+0xc1/0xe0
> [ 1320.511463] [<ffffffff81665dfc>] genl_rcv+0x2c/0x40
> [ 1320.511466] [<ffffffff81664e66>] netlink_unicast+0xf6/0x200
> [ 1320.511468] [<ffffffff8166528d>] netlink_sendmsg+0x31d/0x780
> [ 1320.511472] [<ffffffff81662274>] ? netlink_rcv_wake+0x44/0x60
> [ 1320.511475] [<ffffffff816632e3>] ? netlink_recvmsg+0x1d3/0x3e0
> [ 1320.511479] [<ffffffff8161c463>] sock_sendmsg+0x93/0xd0
> [ 1320.511484] [<ffffffff81332d00>] ? apparmor_file_alloc_security+0x20/0x40
> [ 1320.511487] [<ffffffff8162a697>] ? verify_iovec+0x47/0xd0
> [ 1320.511491] [<ffffffff8161cc79>] ___sys_sendmsg+0x399/0x3b0
> [ 1320.511495] [<ffffffff81254e02>] ? kernfs_seq_stop_active+0x32/0x40
> [ 1320.511499] [<ffffffff8101c385>] ? native_sched_clock+0x35/0x90
> [ 1320.511502] [<ffffffff8101c385>] ? native_sched_clock+0x35/0x90
> [ 1320.511505] [<ffffffff8101c3e9>] ? sched_clock+0x9/0x10
> [ 1320.511509] [<ffffffff81122d5c>] ? acct_account_cputime+0x1c/0x20
> [ 1320.511512] [<ffffffff8109ce6b>] ? account_user_time+0x8b/0xa0
> [ 1320.511516] [<ffffffff811fc135>] ? __fget_light+0x25/0x70
> [ 1320.511519] [<ffffffff8161d372>] __sys_sendmsg+0x42/0x80
> [ 1320.511521] [<ffffffff8161d3c2>] SyS_sendmsg+0x12/0x20
> [ 1320.511525] [<ffffffff8173e6a4>] tracesys_phase2+0xd8/0xdd
>
> -J
>
> ---
> -Jay Vosburgh, [email protected]
>

2014-10-24 15:44:04

by Paul E. McKenney

[permalink] [raw]
Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

On Fri, Oct 24, 2014 at 12:08:57PM +0300, Yanko Kaneti wrote:
> On Thu-10/23/14-2014 15:04, Paul E. McKenney wrote:
> > On Fri, Oct 24, 2014 at 12:45:40AM +0300, Yanko Kaneti wrote:
> > >
> > > On Thu, 2014-10-23 at 13:05 -0700, Paul E. McKenney wrote:
> > > > On Thu, Oct 23, 2014 at 10:51:59PM +0300, Yanko Kaneti wrote:

[ . . . ]

> > > > > Indeed, c847f14217d5 it is.
> > > > >
> > > > > Much to my embarrassment I just noticed that in addition to the
> > > > > rcu merge, triggering the bug "requires" my specific Fedora
> > > > > rawhide network
> > > > > setup. Booting in single mode and modprobe ppp_generic is fine.
> > > > > The bug
> > > > > appears when starting with my regular fedora network setup, which
> > > > > in my case
> > > > > includes 3 ethernet adapters and a libvirt bridge+nat setup.
> > > > >
> > > > > Hope that helps.
> > > > >
> > > > > I am attaching the config.
> > > >
> > > > It does help a lot, thank you!!!
> > > >
> > > > The following patch is a bit of a shot in the dark, and assumes that
> > > > commit 1772947bd012 (rcu: Handle NOCB callbacks from irq-disabled
> > > > idle
> > > > code) introduced the problem. Does this patch fix things up?
> > >
> > > Unfortunately not. This is linus-tip + patch
> >
> > OK. Can't have everything, I guess.
> >
> > > INFO: task kworker/u16:6:96 blocked for more than 120 seconds.
> > > Not tainted 3.18.0-rc1+ #4
> > > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > kworker/u16:6 D ffff8800ca84cec0 11168 96 2 0x00000000
> > > Workqueue: netns cleanup_net
> > > ffff8802218339e8 0000000000000096 ffff8800ca84cec0 00000000001d5f00
> > > ffff880221833fd8 00000000001d5f00 ffff880223264ec0 ffff8800ca84cec0
> > > ffffffff82c52040 7fffffffffffffff ffffffff81ee2658 ffffffff81ee2650
> > > Call Trace:
> > > [<ffffffff8185b8e9>] schedule+0x29/0x70
> > > [<ffffffff81860b0c>] schedule_timeout+0x26c/0x410
> > > [<ffffffff81028bea>] ? native_sched_clock+0x2a/0xa0
> > > [<ffffffff8110759c>] ? mark_held_locks+0x7c/0xb0
> > > [<ffffffff81861b90>] ? _raw_spin_unlock_irq+0x30/0x50
> > > [<ffffffff8110772d>] ? trace_hardirqs_on_caller+0x15d/0x200
> > > [<ffffffff8185d31c>] wait_for_completion+0x10c/0x150
> > > [<ffffffff810e4ed0>] ? wake_up_state+0x20/0x20
> > > [<ffffffff8112a219>] _rcu_barrier+0x159/0x200
> > > [<ffffffff8112a315>] rcu_barrier+0x15/0x20
> > > [<ffffffff8171657f>] netdev_run_todo+0x6f/0x310
> > > [<ffffffff8170b145>] ? rollback_registered_many+0x265/0x2e0
> > > [<ffffffff817235ee>] rtnl_unlock+0xe/0x10
> > > [<ffffffff8170cfa6>] default_device_exit_batch+0x156/0x180
> > > [<ffffffff810fd390>] ? abort_exclusive_wait+0xb0/0xb0
> > > [<ffffffff81705053>] ops_exit_list.isra.1+0x53/0x60
> > > [<ffffffff81705c00>] cleanup_net+0x100/0x1f0
> > > [<ffffffff810cca98>] process_one_work+0x218/0x850
> > > [<ffffffff810cc9ff>] ? process_one_work+0x17f/0x850
> > > [<ffffffff810cd1b7>] ? worker_thread+0xe7/0x4a0
> > > [<ffffffff810cd13b>] worker_thread+0x6b/0x4a0
> > > [<ffffffff810cd0d0>] ? process_one_work+0x850/0x850
> > > [<ffffffff810d348b>] kthread+0x10b/0x130
> > > [<ffffffff81028c69>] ? sched_clock+0x9/0x10
> > > [<ffffffff810d3380>] ? kthread_create_on_node+0x250/0x250
> > > [<ffffffff818628bc>] ret_from_fork+0x7c/0xb0
> > > [<ffffffff810d3380>] ? kthread_create_on_node+0x250/0x250
> > > 4 locks held by kworker/u16:6/96:
> > > #0: ("%s""netns"){.+.+.+}, at: [<ffffffff810cc9ff>] process_one_work+0x17f/0x850
> > > #1: (net_cleanup_work){+.+.+.}, at: [<ffffffff810cc9ff>] process_one_work+0x17f/0x850
> > > #2: (net_mutex){+.+.+.}, at: [<ffffffff81705b8c>] cleanup_net+0x8c/0x1f0
> > > #3: (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff8112a0f5>] _rcu_barrier+0x35/0x200
> > > INFO: task modprobe:1045 blocked for more than 120 seconds.
> > > Not tainted 3.18.0-rc1+ #4
> > > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > modprobe D ffff880218343480 12920 1045 1044 0x00000080
> > > ffff880218353bf8 0000000000000096 ffff880218343480 00000000001d5f00
> > > ffff880218353fd8 00000000001d5f00 ffffffff81e1b580 ffff880218343480
> > > ffff880218343480 ffffffff81f8f748 0000000000000246 ffff880218343480
> > > Call Trace:
> > > [<ffffffff8185be91>] schedule_preempt_disabled+0x31/0x80
> > > [<ffffffff8185d6e3>] mutex_lock_nested+0x183/0x440
> > > [<ffffffff81705a1f>] ? register_pernet_subsys+0x1f/0x50
> > > [<ffffffff81705a1f>] ? register_pernet_subsys+0x1f/0x50
> > > [<ffffffffa0673000>] ? 0xffffffffa0673000
> > > [<ffffffff81705a1f>] register_pernet_subsys+0x1f/0x50
> > > [<ffffffffa0673048>] br_init+0x48/0xd3 [bridge]
> > > [<ffffffff81002148>] do_one_initcall+0xd8/0x210
> > > [<ffffffff81153052>] load_module+0x20c2/0x2870
> > > [<ffffffff8114e030>] ? store_uevent+0x70/0x70
> > > [<ffffffff81278717>] ? kernel_read+0x57/0x90
> > > [<ffffffff811539e6>] SyS_finit_module+0xa6/0xe0
> > > [<ffffffff81862969>] system_call_fastpath+0x12/0x17
> > > 1 lock held by modprobe/1045:
> > > #0: (net_mutex){+.+.+.}, at: [<ffffffff81705a1f>] register_pernet_subsys+0x1f/0x50
> >
> > Presumably the kworker/u16:6 completed, then modprobe hung?
> >
> > If not, I have some very hard questions about why net_mutex can be
> > held by two tasks concurrently, given that it does not appear to be a
> > reader-writer lock...
> >
> > Either way, my patch assumed that 39953dfd4007 (rcu: Avoid misordering in
> > __call_rcu_nocb_enqueue()) would work and that 1772947bd012 (rcu: Handle
> > NOCB callbacks from irq-disabled idle code) would fail. Is that the case?
> > If not, could you please bisect the commits between 11ed7f934cb8 (rcu:
> > Make nocb leader kthreads process pending callbacks after spawning)
> > and c847f14217d5 (rcu: Avoid misordering in nocb_leader_wait())?
>
> Ok, unless I've messed up something major, bisecting points to:
>
> 35ce7f29a44a rcu: Create rcuo kthreads only for onlined CPUs
>
> Does that make any sense?

Good question. ;-)

Are any of your online CPUs missing rcuo kthreads? There should be
kthreads named rcuos/0, rcuos/1, rcuos/2, and so on for each online CPU.
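
One quick way to check that (a rough sketch, assuming a Linux box where the NOCB kthreads follow the rcuos/N naming above) is to expand the online-CPU list from sysfs and look for a matching kthread per CPU:

```shell
#!/bin/sh
# Expand a CPU list string such as "0-2,4" into individual CPU numbers.
expand_cpus() {
    echo "$1" | tr ',' '\n' | while IFS=- read -r lo hi; do
        seq "$lo" "${hi:-$lo}"
    done
}

# Compare the online CPUs against the rcuos kthreads that actually exist.
if [ -r /sys/devices/system/cpu/online ]; then
    for cpu in $(expand_cpus "$(cat /sys/devices/system/cpu/online)"); do
        ps -e -o comm= | grep -qx "rcuos/$cpu" || echo "rcuos/$cpu missing"
    done
fi
```

On a machine exhibiting the bug, any "missing" line would point at an online CPU without its rcuos kthread.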

> Another thing I noticed is that in the failure mode the libvirtd bridge never
> actually shows up. So maybe ppp is just the first module load that bumps
> into whatever libvirtd is failing to do to set those up.
>
> I truly hope this is not something with a random timing dependency...

Me too. ;-)

Thanx, Paul

> --Yanko
>

2014-10-24 16:29:47

by Yanko Kaneti

[permalink] [raw]
Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

On Fri-10/24/14-2014 08:40, Paul E. McKenney wrote:
> On Fri, Oct 24, 2014 at 12:08:57PM +0300, Yanko Kaneti wrote:
> > On Thu-10/23/14-2014 15:04, Paul E. McKenney wrote:
> > > On Fri, Oct 24, 2014 at 12:45:40AM +0300, Yanko Kaneti wrote:
> > > >
> > > > On Thu, 2014-10-23 at 13:05 -0700, Paul E. McKenney wrote:
> > > > > On Thu, Oct 23, 2014 at 10:51:59PM +0300, Yanko Kaneti wrote:
>
> [ . . . ]
>
> > > > > > Indeed, c847f14217d5 it is.
> > > > > >
> > > > > > Much to my embarrassment I just noticed that in addition to the
> > > > > > rcu merge, triggering the bug "requires" my specific Fedora
> > > > > > rawhide network
> > > > > > setup. Booting in single mode and modprobe ppp_generic is fine.
> > > > > > The bug
> > > > > > appears when starting with my regular fedora network setup, which
> > > > > > in my case
> > > > > > includes 3 ethernet adapters and a libvirt bridge+nat setup.
> > > > > >
> > > > > > Hope that helps.
> > > > > >
> > > > > > I am attaching the config.
> > > > >
> > > > > It does help a lot, thank you!!!
> > > > >
> > > > > The following patch is a bit of a shot in the dark, and assumes that
> > > > > commit 1772947bd012 (rcu: Handle NOCB callbacks from irq-disabled
> > > > > idle
> > > > > code) introduced the problem. Does this patch fix things up?
> > > >
> > > > Unfortunately not. This is linus-tip + patch
> > >
> > > OK. Can't have everything, I guess.
> > >
> > > > INFO: task kworker/u16:6:96 blocked for more than 120 seconds.
> > > > Not tainted 3.18.0-rc1+ #4
> > > > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > > kworker/u16:6 D ffff8800ca84cec0 11168 96 2 0x00000000
> > > > Workqueue: netns cleanup_net
> > > > ffff8802218339e8 0000000000000096 ffff8800ca84cec0 00000000001d5f00
> > > > ffff880221833fd8 00000000001d5f00 ffff880223264ec0 ffff8800ca84cec0
> > > > ffffffff82c52040 7fffffffffffffff ffffffff81ee2658 ffffffff81ee2650
> > > > Call Trace:
> > > > [<ffffffff8185b8e9>] schedule+0x29/0x70
> > > > [<ffffffff81860b0c>] schedule_timeout+0x26c/0x410
> > > > [<ffffffff81028bea>] ? native_sched_clock+0x2a/0xa0
> > > > [<ffffffff8110759c>] ? mark_held_locks+0x7c/0xb0
> > > > [<ffffffff81861b90>] ? _raw_spin_unlock_irq+0x30/0x50
> > > > [<ffffffff8110772d>] ? trace_hardirqs_on_caller+0x15d/0x200
> > > > [<ffffffff8185d31c>] wait_for_completion+0x10c/0x150
> > > > [<ffffffff810e4ed0>] ? wake_up_state+0x20/0x20
> > > > [<ffffffff8112a219>] _rcu_barrier+0x159/0x200
> > > > [<ffffffff8112a315>] rcu_barrier+0x15/0x20
> > > > [<ffffffff8171657f>] netdev_run_todo+0x6f/0x310
> > > > [<ffffffff8170b145>] ? rollback_registered_many+0x265/0x2e0
> > > > [<ffffffff817235ee>] rtnl_unlock+0xe/0x10
> > > > [<ffffffff8170cfa6>] default_device_exit_batch+0x156/0x180
> > > > [<ffffffff810fd390>] ? abort_exclusive_wait+0xb0/0xb0
> > > > [<ffffffff81705053>] ops_exit_list.isra.1+0x53/0x60
> > > > [<ffffffff81705c00>] cleanup_net+0x100/0x1f0
> > > > [<ffffffff810cca98>] process_one_work+0x218/0x850
> > > > [<ffffffff810cc9ff>] ? process_one_work+0x17f/0x850
> > > > [<ffffffff810cd1b7>] ? worker_thread+0xe7/0x4a0
> > > > [<ffffffff810cd13b>] worker_thread+0x6b/0x4a0
> > > > [<ffffffff810cd0d0>] ? process_one_work+0x850/0x850
> > > > [<ffffffff810d348b>] kthread+0x10b/0x130
> > > > [<ffffffff81028c69>] ? sched_clock+0x9/0x10
> > > > [<ffffffff810d3380>] ? kthread_create_on_node+0x250/0x250
> > > > [<ffffffff818628bc>] ret_from_fork+0x7c/0xb0
> > > > [<ffffffff810d3380>] ? kthread_create_on_node+0x250/0x250
> > > > 4 locks held by kworker/u16:6/96:
> > > > #0: ("%s""netns"){.+.+.+}, at: [<ffffffff810cc9ff>] process_one_work+0x17f/0x850
> > > > #1: (net_cleanup_work){+.+.+.}, at: [<ffffffff810cc9ff>] process_one_work+0x17f/0x850
> > > > #2: (net_mutex){+.+.+.}, at: [<ffffffff81705b8c>] cleanup_net+0x8c/0x1f0
> > > > #3: (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff8112a0f5>] _rcu_barrier+0x35/0x200
> > > > INFO: task modprobe:1045 blocked for more than 120 seconds.
> > > > Not tainted 3.18.0-rc1+ #4
> > > > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > > modprobe D ffff880218343480 12920 1045 1044 0x00000080
> > > > ffff880218353bf8 0000000000000096 ffff880218343480 00000000001d5f00
> > > > ffff880218353fd8 00000000001d5f00 ffffffff81e1b580 ffff880218343480
> > > > ffff880218343480 ffffffff81f8f748 0000000000000246 ffff880218343480
> > > > Call Trace:
> > > > [<ffffffff8185be91>] schedule_preempt_disabled+0x31/0x80
> > > > [<ffffffff8185d6e3>] mutex_lock_nested+0x183/0x440
> > > > [<ffffffff81705a1f>] ? register_pernet_subsys+0x1f/0x50
> > > > [<ffffffff81705a1f>] ? register_pernet_subsys+0x1f/0x50
> > > > [<ffffffffa0673000>] ? 0xffffffffa0673000
> > > > [<ffffffff81705a1f>] register_pernet_subsys+0x1f/0x50
> > > > [<ffffffffa0673048>] br_init+0x48/0xd3 [bridge]
> > > > [<ffffffff81002148>] do_one_initcall+0xd8/0x210
> > > > [<ffffffff81153052>] load_module+0x20c2/0x2870
> > > > [<ffffffff8114e030>] ? store_uevent+0x70/0x70
> > > > [<ffffffff81278717>] ? kernel_read+0x57/0x90
> > > > [<ffffffff811539e6>] SyS_finit_module+0xa6/0xe0
> > > > [<ffffffff81862969>] system_call_fastpath+0x12/0x17
> > > > 1 lock held by modprobe/1045:
> > > > #0: (net_mutex){+.+.+.}, at: [<ffffffff81705a1f>] register_pernet_subsys+0x1f/0x50
> > >
> > > Presumably the kworker/u16:6 completed, then modprobe hung?
> > >
> > > If not, I have some very hard questions about why net_mutex can be
> > > held by two tasks concurrently, given that it does not appear to be a
> > > reader-writer lock...
> > >
> > > Either way, my patch assumed that 39953dfd4007 (rcu: Avoid misordering in
> > > __call_rcu_nocb_enqueue()) would work and that 1772947bd012 (rcu: Handle
> > > NOCB callbacks from irq-disabled idle code) would fail. Is that the case?
> > > If not, could you please bisect the commits between 11ed7f934cb8 (rcu:
> > > Make nocb leader kthreads process pending callbacks after spawning)
> > > and c847f14217d5 (rcu: Avoid misordering in nocb_leader_wait())?
> >
> > Ok, unless I've messsed up something major, bisecting points to:
> >
> > 35ce7f29a44a rcu: Create rcuo kthreads only for onlined CPUs
> >
> > Makes any sense ?
>
> Good question. ;-)
>
> Are any of your online CPUs missing rcuo kthreads? There should be
> kthreads named rcuos/0, rcuos/1, rcuos/2, and so on for each online CPU.

It's a Phenom II X6. With 3.17, and with linux-tip plus 35ce7f29a44a reverted,
there are 8 rcuos kthreads, the modprobe ppp_generic testcase reliably works,
and libvirt also manages to set up its bridge.

With plain linux-tip, there are 6 rcuos kthreads, but the failure is as
reliable as before.
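For anyone following along, a count like the one reported here can be checked by scanning /proc for kthreads whose comm starts with "rcuos" (a sketch; on a kernel without offloaded callbacks, or inside a container, the count will simply be 0):

```shell
# Count rcuos kthreads; one per CPU is expected (per possible CPU before
# commit 35ce7f29a44a, per online CPU after it).
n=$(grep -l '^rcuos' /proc/[0-9]*/comm 2>/dev/null | wc -l)
echo "rcuos kthreads: $n"
```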

Awaiting instructions :)

2014-10-24 16:58:50

by Paul E. McKenney

Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

On Fri, Oct 24, 2014 at 07:29:43PM +0300, Yanko Kaneti wrote:
> On Fri-10/24/14-2014 08:40, Paul E. McKenney wrote:
> > On Fri, Oct 24, 2014 at 12:08:57PM +0300, Yanko Kaneti wrote:
> > > On Thu-10/23/14-2014 15:04, Paul E. McKenney wrote:
> > > > On Fri, Oct 24, 2014 at 12:45:40AM +0300, Yanko Kaneti wrote:
> > > > >
> > > > > On Thu, 2014-10-23 at 13:05 -0700, Paul E. McKenney wrote:
> > > > > > On Thu, Oct 23, 2014 at 10:51:59PM +0300, Yanko Kaneti wrote:

[ . . . ]

> > > Ok, unless I've messsed up something major, bisecting points to:
> > >
> > > 35ce7f29a44a rcu: Create rcuo kthreads only for onlined CPUs
> > >
> > > Makes any sense ?
> >
> > Good question. ;-)
> >
> > Are any of your online CPUs missing rcuo kthreads? There should be
> > kthreads named rcuos/0, rcuos/1, rcuos/2, and so on for each online CPU.
>
> Its a Phenom II X6. With 3.17 and linux-tip with 35ce7f29a44a reverted, the rcuos are 8
> and the modprobe ppp_generic testcase reliably works, libvirt also manages
> to setup its bridge.
>
> Just with linux-tip , the rcuos are 6 but the failure is as reliable as
> before.

Thank you, very interesting. Which 6 of the rcuos are present?

> Awating instructions: :)

Well, I thought I understood the problem until you found that only 6 of
the expected 8 rcuos are present with linux-tip without the revert. ;-)

I am putting together a patch for the part of the problem that I think
I understand, of course, but it would help a lot to know which two of
the rcuos are missing. ;-)

Thanx, Paul

2014-10-24 17:09:34

by Yanko Kaneti

Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

On Fri-10/24/14-2014 09:54, Paul E. McKenney wrote:
> On Fri, Oct 24, 2014 at 07:29:43PM +0300, Yanko Kaneti wrote:
> > On Fri-10/24/14-2014 08:40, Paul E. McKenney wrote:
> > > On Fri, Oct 24, 2014 at 12:08:57PM +0300, Yanko Kaneti wrote:
> > > > On Thu-10/23/14-2014 15:04, Paul E. McKenney wrote:
> > > > > On Fri, Oct 24, 2014 at 12:45:40AM +0300, Yanko Kaneti wrote:
> > > > > >
> > > > > > On Thu, 2014-10-23 at 13:05 -0700, Paul E. McKenney wrote:
> > > > > > > On Thu, Oct 23, 2014 at 10:51:59PM +0300, Yanko Kaneti wrote:
>
> [ . . . ]
>
> > > > Ok, unless I've messsed up something major, bisecting points to:
> > > >
> > > > 35ce7f29a44a rcu: Create rcuo kthreads only for onlined CPUs
> > > >
> > > > Makes any sense ?
> > >
> > > Good question. ;-)
> > >
> > > Are any of your online CPUs missing rcuo kthreads? There should be
> > > kthreads named rcuos/0, rcuos/1, rcuos/2, and so on for each online CPU.
> >
> > Its a Phenom II X6. With 3.17 and linux-tip with 35ce7f29a44a reverted, the rcuos are 8
> > and the modprobe ppp_generic testcase reliably works, libvirt also manages
> > to setup its bridge.
> >
> > Just with linux-tip , the rcuos are 6 but the failure is as reliable as
> > before.


> Thank you, very interesting. Which 6 of the rcuos are present?

Well, the rcuos are 0 to 5, which sounds right for a 6-core CPU like this
Phenom II.


> > Awating instructions: :)
>
> Well, I thought I understood the problem until you found that only 6 of
> the expected 8 rcuos are present with linux-tip without the revert. ;-)
>
> I am putting together a patch for the part of the problem that I think
> I understand, of course, but it would help a lot to know which two of
> the rcuos are missing. ;-)
>

Ready to test

--Yanko

2014-10-24 17:24:06

by Paul E. McKenney

Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

On Fri, Oct 24, 2014 at 08:09:31PM +0300, Yanko Kaneti wrote:
> On Fri-10/24/14-2014 09:54, Paul E. McKenney wrote:
> > On Fri, Oct 24, 2014 at 07:29:43PM +0300, Yanko Kaneti wrote:
> > > On Fri-10/24/14-2014 08:40, Paul E. McKenney wrote:
> > > > On Fri, Oct 24, 2014 at 12:08:57PM +0300, Yanko Kaneti wrote:
> > > > > On Thu-10/23/14-2014 15:04, Paul E. McKenney wrote:
> > > > > > On Fri, Oct 24, 2014 at 12:45:40AM +0300, Yanko Kaneti wrote:
> > > > > > >
> > > > > > > On Thu, 2014-10-23 at 13:05 -0700, Paul E. McKenney wrote:
> > > > > > > > On Thu, Oct 23, 2014 at 10:51:59PM +0300, Yanko Kaneti wrote:
> >
> > [ . . . ]
> >
> > > > > Ok, unless I've messsed up something major, bisecting points to:
> > > > >
> > > > > 35ce7f29a44a rcu: Create rcuo kthreads only for onlined CPUs
> > > > >
> > > > > Makes any sense ?
> > > >
> > > > Good question. ;-)
> > > >
> > > > Are any of your online CPUs missing rcuo kthreads? There should be
> > > > kthreads named rcuos/0, rcuos/1, rcuos/2, and so on for each online CPU.
> > >
> > > Its a Phenom II X6. With 3.17 and linux-tip with 35ce7f29a44a reverted, the rcuos are 8
> > > and the modprobe ppp_generic testcase reliably works, libvirt also manages
> > > to setup its bridge.
> > >
> > > Just with linux-tip , the rcuos are 6 but the failure is as reliable as
> > > before.
>
> > Thank you, very interesting. Which 6 of the rcuos are present?
>
> Well, the rcuos are 0 to 5. Which sounds right for a 6 core CPU like this
> Phenom II.

Ah, you get 8 without the patch because it creates them for potential
CPUs as well as real ones. OK, got it.

> > > Awating instructions: :)
> >
> > Well, I thought I understood the problem until you found that only 6 of
> > the expected 8 rcuos are present with linux-tip without the revert. ;-)
> >
> > I am putting together a patch for the part of the problem that I think
> > I understand, of course, but it would help a lot to know which two of
> > the rcuos are missing. ;-)
>
> Ready to test

Well, if you are feeling aggressive, give the following patch a spin.
I am doing sanity tests on it in the meantime.

Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 29fb23f33c18..927c17b081c7 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2546,9 +2546,13 @@ static void rcu_spawn_one_nocb_kthread(struct rcu_state *rsp, int cpu)
rdp->nocb_leader = rdp_spawn;
if (rdp_last && rdp != rdp_spawn)
rdp_last->nocb_next_follower = rdp;
- rdp_last = rdp;
- rdp = rdp->nocb_next_follower;
- rdp_last->nocb_next_follower = NULL;
+ if (rdp == rdp_spawn) {
+ rdp = rdp->nocb_next_follower;
+ } else {
+ rdp_last = rdp;
+ rdp = rdp->nocb_next_follower;
+ rdp_last->nocb_next_follower = NULL;
+ }
} while (rdp);
rdp_spawn->nocb_next_follower = rdp_old_leader;
}

2014-10-24 17:35:29

by Yanko Kaneti

Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

On Fri-10/24/14-2014 10:20, Paul E. McKenney wrote:
> On Fri, Oct 24, 2014 at 08:09:31PM +0300, Yanko Kaneti wrote:
> > On Fri-10/24/14-2014 09:54, Paul E. McKenney wrote:
> > > On Fri, Oct 24, 2014 at 07:29:43PM +0300, Yanko Kaneti wrote:
> > > > On Fri-10/24/14-2014 08:40, Paul E. McKenney wrote:
> > > > > On Fri, Oct 24, 2014 at 12:08:57PM +0300, Yanko Kaneti wrote:
> > > > > > On Thu-10/23/14-2014 15:04, Paul E. McKenney wrote:
> > > > > > > On Fri, Oct 24, 2014 at 12:45:40AM +0300, Yanko Kaneti wrote:
> > > > > > > >
> > > > > > > > On Thu, 2014-10-23 at 13:05 -0700, Paul E. McKenney wrote:
> > > > > > > > > On Thu, Oct 23, 2014 at 10:51:59PM +0300, Yanko Kaneti wrote:
> > >
> > > [ . . . ]
> > >
> > > > > > Ok, unless I've messsed up something major, bisecting points to:
> > > > > >
> > > > > > 35ce7f29a44a rcu: Create rcuo kthreads only for onlined CPUs
> > > > > >
> > > > > > Makes any sense ?
> > > > >
> > > > > Good question. ;-)
> > > > >
> > > > > Are any of your online CPUs missing rcuo kthreads? There should be
> > > > > kthreads named rcuos/0, rcuos/1, rcuos/2, and so on for each online CPU.
> > > >
> > > > Its a Phenom II X6. With 3.17 and linux-tip with 35ce7f29a44a reverted, the rcuos are 8
> > > > and the modprobe ppp_generic testcase reliably works, libvirt also manages
> > > > to setup its bridge.
> > > >
> > > > Just with linux-tip , the rcuos are 6 but the failure is as reliable as
> > > > before.
> >
> > > Thank you, very interesting. Which 6 of the rcuos are present?
> >
> > Well, the rcuos are 0 to 5. Which sounds right for a 6 core CPU like this
> > Phenom II.
>
> Ah, you get 8 without the patch because it creates them for potential
> CPUs as well as real ones. OK, got it.
>
> > > > Awating instructions: :)
> > >
> > > Well, I thought I understood the problem until you found that only 6 of
> > > the expected 8 rcuos are present with linux-tip without the revert. ;-)
> > >
> > > I am putting together a patch for the part of the problem that I think
> > > I understand, of course, but it would help a lot to know which two of
> > > the rcuos are missing. ;-)
> >
> > Ready to test
>
> Well, if you are feeling aggressive, give the following patch a spin.
> I am doing sanity tests on it in the meantime.

Doesn't seem to make a difference here


> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 29fb23f33c18..927c17b081c7 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -2546,9 +2546,13 @@ static void rcu_spawn_one_nocb_kthread(struct rcu_state *rsp, int cpu)
> rdp->nocb_leader = rdp_spawn;
> if (rdp_last && rdp != rdp_spawn)
> rdp_last->nocb_next_follower = rdp;
> - rdp_last = rdp;
> - rdp = rdp->nocb_next_follower;
> - rdp_last->nocb_next_follower = NULL;
> + if (rdp == rdp_spawn) {
> + rdp = rdp->nocb_next_follower;
> + } else {
> + rdp_last = rdp;
> + rdp = rdp->nocb_next_follower;
> + rdp_last->nocb_next_follower = NULL;
> + }
> } while (rdp);
> rdp_spawn->nocb_next_follower = rdp_old_leader;
> }
>

2014-10-24 18:20:24

by Jay Vosburgh

Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

Paul E. McKenney <[email protected]> wrote:

>On Thu, Oct 23, 2014 at 09:48:34PM -0700, Jay Vosburgh wrote:
>> Paul E. McKenney <[email protected]> wrote:
[...]
>> >Either way, my patch assumed that 39953dfd4007 (rcu: Avoid misordering in
>> >__call_rcu_nocb_enqueue()) would work and that 1772947bd012 (rcu: Handle
>> >NOCB callbacks from irq-disabled idle code) would fail. Is that the case?
>> >If not, could you please bisect the commits between 11ed7f934cb8 (rcu:
>> >Make nocb leader kthreads process pending callbacks after spawning)
>> >and c847f14217d5 (rcu: Avoid misordering in nocb_leader_wait())?
>>
>> Just a note to add that I am also reliably inducing what appears
>> to be this issue on a current -net tree, when configuring openvswitch
>> via script. I am available to test patches or bisect tomorrow (Friday)
>> US time if needed.
>
>Thank you, Jay! Could you please check to see if reverting this commit
>fixes things for you?
>
>35ce7f29a44a rcu: Create rcuo kthreads only for onlined CPUs
>
>Reverting is not a long-term fix, as this commit is itself a bug fix,
>but would be good to check to see if you are seeing the same thing that
>Yanko is. ;-)

Just to confirm what Yanko found, reverting this commit makes
the problem go away for me.

-J

---
-Jay Vosburgh, [email protected]

2014-10-24 18:36:24

by Paul E. McKenney

Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

On Fri, Oct 24, 2014 at 08:35:26PM +0300, Yanko Kaneti wrote:
> On Fri-10/24/14-2014 10:20, Paul E. McKenney wrote:
> > On Fri, Oct 24, 2014 at 08:09:31PM +0300, Yanko Kaneti wrote:
> > > On Fri-10/24/14-2014 09:54, Paul E. McKenney wrote:
> > > > On Fri, Oct 24, 2014 at 07:29:43PM +0300, Yanko Kaneti wrote:
> > > > > On Fri-10/24/14-2014 08:40, Paul E. McKenney wrote:
> > > > > > On Fri, Oct 24, 2014 at 12:08:57PM +0300, Yanko Kaneti wrote:
> > > > > > > On Thu-10/23/14-2014 15:04, Paul E. McKenney wrote:
> > > > > > > > On Fri, Oct 24, 2014 at 12:45:40AM +0300, Yanko Kaneti wrote:
> > > > > > > > >
> > > > > > > > > On Thu, 2014-10-23 at 13:05 -0700, Paul E. McKenney wrote:
> > > > > > > > > > On Thu, Oct 23, 2014 at 10:51:59PM +0300, Yanko Kaneti wrote:
> > > >
> > > > [ . . . ]
> > > >
> > > > > > > Ok, unless I've messsed up something major, bisecting points to:
> > > > > > >
> > > > > > > 35ce7f29a44a rcu: Create rcuo kthreads only for onlined CPUs
> > > > > > >
> > > > > > > Makes any sense ?
> > > > > >
> > > > > > Good question. ;-)
> > > > > >
> > > > > > Are any of your online CPUs missing rcuo kthreads? There should be
> > > > > > kthreads named rcuos/0, rcuos/1, rcuos/2, and so on for each online CPU.
> > > > >
> > > > > Its a Phenom II X6. With 3.17 and linux-tip with 35ce7f29a44a reverted, the rcuos are 8
> > > > > and the modprobe ppp_generic testcase reliably works, libvirt also manages
> > > > > to setup its bridge.
> > > > >
> > > > > Just with linux-tip , the rcuos are 6 but the failure is as reliable as
> > > > > before.
> > >
> > > > Thank you, very interesting. Which 6 of the rcuos are present?
> > >
> > > Well, the rcuos are 0 to 5. Which sounds right for a 6 core CPU like this
> > > Phenom II.
> >
> > Ah, you get 8 without the patch because it creates them for potential
> > CPUs as well as real ones. OK, got it.
> >
> > > > > Awating instructions: :)
> > > >
> > > > Well, I thought I understood the problem until you found that only 6 of
> > > > the expected 8 rcuos are present with linux-tip without the revert. ;-)
> > > >
> > > > I am putting together a patch for the part of the problem that I think
> > > > I understand, of course, but it would help a lot to know which two of
> > > > the rcuos are missing. ;-)
> > >
> > > Ready to test
> >
> > Well, if you are feeling aggressive, give the following patch a spin.
> > I am doing sanity tests on it in the meantime.
>
> Doesn't seem to make a difference here

OK, inspection isn't cutting it, so time for tracing. Does the system
respond to user input? If so, please enable rcu:rcu_barrier ftrace before
the problem occurs, then dump the trace buffer after the problem occurs.
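Carrying this out might look something like the following sketch (assumes tracefs mounted at /sys/kernel/debug/tracing, root privileges, and CONFIG_RCU_TRACE=y; falls back to a message otherwise):

```shell
# Enable the rcu:rcu_barrier tracepoint, reproduce the hang, then save
# the trace buffer. The tracefs path is an assumption; adjust as needed.
TRACING=${TRACING:-/sys/kernel/debug/tracing}
if [ -w "$TRACING/events/rcu/rcu_barrier/enable" ]; then
    echo 1 > "$TRACING/events/rcu/rcu_barrier/enable"
    modprobe ppp_generic                 # the reproducer from this thread
    cat "$TRACING/trace" > /tmp/rcu_barrier.trace
    echo 0 > "$TRACING/events/rcu/rcu_barrier/enable"
    echo "trace saved to /tmp/rcu_barrier.trace"
else
    echo "tracefs not writable here; run as root with CONFIG_RCU_TRACE=y"
fi
```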

Thanx, Paul

> > ------------------------------------------------------------------------
> >
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index 29fb23f33c18..927c17b081c7 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -2546,9 +2546,13 @@ static void rcu_spawn_one_nocb_kthread(struct rcu_state *rsp, int cpu)
> > rdp->nocb_leader = rdp_spawn;
> > if (rdp_last && rdp != rdp_spawn)
> > rdp_last->nocb_next_follower = rdp;
> > - rdp_last = rdp;
> > - rdp = rdp->nocb_next_follower;
> > - rdp_last->nocb_next_follower = NULL;
> > + if (rdp == rdp_spawn) {
> > + rdp = rdp->nocb_next_follower;
> > + } else {
> > + rdp_last = rdp;
> > + rdp = rdp->nocb_next_follower;
> > + rdp_last->nocb_next_follower = NULL;
> > + }
> > } while (rdp);
> > rdp_spawn->nocb_next_follower = rdp_old_leader;
> > }
> >
>

2014-10-24 18:37:19

by Paul E. McKenney

Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

On Fri, Oct 24, 2014 at 11:20:11AM -0700, Jay Vosburgh wrote:
> Paul E. McKenney <[email protected]> wrote:
>
> >On Thu, Oct 23, 2014 at 09:48:34PM -0700, Jay Vosburgh wrote:
> >> Paul E. McKenney <[email protected]> wrote:
> [...]
> >> >Either way, my patch assumed that 39953dfd4007 (rcu: Avoid misordering in
> >> >__call_rcu_nocb_enqueue()) would work and that 1772947bd012 (rcu: Handle
> >> >NOCB callbacks from irq-disabled idle code) would fail. Is that the case?
> >> >If not, could you please bisect the commits between 11ed7f934cb8 (rcu:
> >> >Make nocb leader kthreads process pending callbacks after spawning)
> >> >and c847f14217d5 (rcu: Avoid misordering in nocb_leader_wait())?
> >>
> >> Just a note to add that I am also reliably inducing what appears
> >> to be this issue on a current -net tree, when configuring openvswitch
> >> via script. I am available to test patches or bisect tomorrow (Friday)
> >> US time if needed.
> >
> >Thank you, Jay! Could you please check to see if reverting this commit
> >fixes things for you?
> >
> >35ce7f29a44a rcu: Create rcuo kthreads only for onlined CPUs
> >
> >Reverting is not a long-term fix, as this commit is itself a bug fix,
> >but would be good to check to see if you are seeing the same thing that
> >Yanko is. ;-)
>
> Just to confirm what Yanko found, reverting this commit makes
> the problem go away for me.

Thank you!

I take it that the patches that don't help Yanko also don't help you?

Thanx, Paul

2014-10-24 18:50:00

by Jay Vosburgh

Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

Paul E. McKenney <[email protected]> wrote:

>On Fri, Oct 24, 2014 at 08:35:26PM +0300, Yanko Kaneti wrote:
>> On Fri-10/24/14-2014 10:20, Paul E. McKenney wrote:
>> > On Fri, Oct 24, 2014 at 08:09:31PM +0300, Yanko Kaneti wrote:
>> > > On Fri-10/24/14-2014 09:54, Paul E. McKenney wrote:
>> > > > On Fri, Oct 24, 2014 at 07:29:43PM +0300, Yanko Kaneti wrote:
>> > > > > On Fri-10/24/14-2014 08:40, Paul E. McKenney wrote:
>> > > > > > On Fri, Oct 24, 2014 at 12:08:57PM +0300, Yanko Kaneti wrote:
>> > > > > > > On Thu-10/23/14-2014 15:04, Paul E. McKenney wrote:
>> > > > > > > > On Fri, Oct 24, 2014 at 12:45:40AM +0300, Yanko Kaneti wrote:
>> > > > > > > > >
>> > > > > > > > > On Thu, 2014-10-23 at 13:05 -0700, Paul E. McKenney wrote:
>> > > > > > > > > > On Thu, Oct 23, 2014 at 10:51:59PM +0300, Yanko Kaneti wrote:
>> > > >
>> > > > [ . . . ]
>> > > >
>> > > > > > > Ok, unless I've messsed up something major, bisecting points to:
>> > > > > > >
>> > > > > > > 35ce7f29a44a rcu: Create rcuo kthreads only for onlined CPUs
>> > > > > > >
>> > > > > > > Makes any sense ?
>> > > > > >
>> > > > > > Good question. ;-)
>> > > > > >
>> > > > > > Are any of your online CPUs missing rcuo kthreads? There should be
>> > > > > > kthreads named rcuos/0, rcuos/1, rcuos/2, and so on for each online CPU.
>> > > > >
>> > > > > Its a Phenom II X6. With 3.17 and linux-tip with 35ce7f29a44a reverted, the rcuos are 8
>> > > > > and the modprobe ppp_generic testcase reliably works, libvirt also manages
>> > > > > to setup its bridge.
>> > > > >
>> > > > > Just with linux-tip , the rcuos are 6 but the failure is as reliable as
>> > > > > before.
>> > >
>> > > > Thank you, very interesting. Which 6 of the rcuos are present?
>> > >
>> > > Well, the rcuos are 0 to 5. Which sounds right for a 6 core CPU like this
>> > > Phenom II.
>> >
>> > Ah, you get 8 without the patch because it creates them for potential
>> > CPUs as well as real ones. OK, got it.
>> >
>> > > > > Awating instructions: :)
>> > > >
>> > > > Well, I thought I understood the problem until you found that only 6 of
>> > > > the expected 8 rcuos are present with linux-tip without the revert. ;-)
>> > > >
>> > > > I am putting together a patch for the part of the problem that I think
>> > > > I understand, of course, but it would help a lot to know which two of
>> > > > the rcuos are missing. ;-)
>> > >
>> > > Ready to test
>> >
>> > Well, if you are feeling aggressive, give the following patch a spin.
>> > I am doing sanity tests on it in the meantime.
>>
>> Doesn't seem to make a difference here
>
>OK, inspection isn't cutting it, so time for tracing. Does the system
>respond to user input? If so, please enable rcu:rcu_barrier ftrace before
>the problem occurs, then dump the trace buffer after the problem occurs.

My system is up and responsive when the problem occurs, so this
shouldn't be a problem.

Do you want the ftrace with your patch below, or unmodified tip
of tree?

-J


> Thanx, Paul
>
>> > ------------------------------------------------------------------------
>> >
>> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
>> > index 29fb23f33c18..927c17b081c7 100644
>> > --- a/kernel/rcu/tree_plugin.h
>> > +++ b/kernel/rcu/tree_plugin.h
>> > @@ -2546,9 +2546,13 @@ static void rcu_spawn_one_nocb_kthread(struct rcu_state *rsp, int cpu)
>> > rdp->nocb_leader = rdp_spawn;
>> > if (rdp_last && rdp != rdp_spawn)
>> > rdp_last->nocb_next_follower = rdp;
>> > - rdp_last = rdp;
>> > - rdp = rdp->nocb_next_follower;
>> > - rdp_last->nocb_next_follower = NULL;
>> > + if (rdp == rdp_spawn) {
>> > + rdp = rdp->nocb_next_follower;
>> > + } else {
>> > + rdp_last = rdp;
>> > + rdp = rdp->nocb_next_follower;
>> > + rdp_last->nocb_next_follower = NULL;
>> > + }
>> > } while (rdp);
>> > rdp_spawn->nocb_next_follower = rdp_old_leader;
>> > }
>> >

---
-Jay Vosburgh, [email protected]

2014-10-24 19:01:48

by Paul E. McKenney

Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

On Fri, Oct 24, 2014 at 11:49:48AM -0700, Jay Vosburgh wrote:
> Paul E. McKenney <[email protected]> wrote:
>
> >On Fri, Oct 24, 2014 at 08:35:26PM +0300, Yanko Kaneti wrote:
> >> On Fri-10/24/14-2014 10:20, Paul E. McKenney wrote:
> >> > On Fri, Oct 24, 2014 at 08:09:31PM +0300, Yanko Kaneti wrote:
> >> > > On Fri-10/24/14-2014 09:54, Paul E. McKenney wrote:
> >> > > > On Fri, Oct 24, 2014 at 07:29:43PM +0300, Yanko Kaneti wrote:
> >> > > > > On Fri-10/24/14-2014 08:40, Paul E. McKenney wrote:
> >> > > > > > On Fri, Oct 24, 2014 at 12:08:57PM +0300, Yanko Kaneti wrote:
> >> > > > > > > On Thu-10/23/14-2014 15:04, Paul E. McKenney wrote:
> >> > > > > > > > On Fri, Oct 24, 2014 at 12:45:40AM +0300, Yanko Kaneti wrote:
> >> > > > > > > > >
> >> > > > > > > > > On Thu, 2014-10-23 at 13:05 -0700, Paul E. McKenney wrote:
> >> > > > > > > > > > On Thu, Oct 23, 2014 at 10:51:59PM +0300, Yanko Kaneti wrote:
> >> > > >
> >> > > > [ . . . ]
> >> > > >
> >> > > > > > > Ok, unless I've messsed up something major, bisecting points to:
> >> > > > > > >
> >> > > > > > > 35ce7f29a44a rcu: Create rcuo kthreads only for onlined CPUs
> >> > > > > > >
> >> > > > > > > Makes any sense ?
> >> > > > > >
> >> > > > > > Good question. ;-)
> >> > > > > >
> >> > > > > > Are any of your online CPUs missing rcuo kthreads? There should be
> >> > > > > > kthreads named rcuos/0, rcuos/1, rcuos/2, and so on for each online CPU.
> >> > > > >
> >> > > > > Its a Phenom II X6. With 3.17 and linux-tip with 35ce7f29a44a reverted, the rcuos are 8
> >> > > > > and the modprobe ppp_generic testcase reliably works, libvirt also manages
> >> > > > > to setup its bridge.
> >> > > > >
> >> > > > > Just with linux-tip , the rcuos are 6 but the failure is as reliable as
> >> > > > > before.
> >> > >
> >> > > > Thank you, very interesting. Which 6 of the rcuos are present?
> >> > >
> >> > > Well, the rcuos are 0 to 5. Which sounds right for a 6 core CPU like this
> >> > > Phenom II.
> >> >
> >> > Ah, you get 8 without the patch because it creates them for potential
> >> > CPUs as well as real ones. OK, got it.
> >> >
> >> > > > > Awating instructions: :)
> >> > > >
> >> > > > Well, I thought I understood the problem until you found that only 6 of
> >> > > > the expected 8 rcuos are present with linux-tip without the revert. ;-)
> >> > > >
> >> > > > I am putting together a patch for the part of the problem that I think
> >> > > > I understand, of course, but it would help a lot to know which two of
> >> > > > the rcuos are missing. ;-)
> >> > >
> >> > > Ready to test
> >> >
> >> > Well, if you are feeling aggressive, give the following patch a spin.
> >> > I am doing sanity tests on it in the meantime.
> >>
> >> Doesn't seem to make a difference here
> >
> >OK, inspection isn't cutting it, so time for tracing. Does the system
> >respond to user input? If so, please enable rcu:rcu_barrier ftrace before
> >the problem occurs, then dump the trace buffer after the problem occurs.
>
> My system is up and responsive when the problem occurs, so this
> shouldn't be a problem.

Nice! ;-)

> Do you want the ftrace with your patch below, or unmodified tip
> of tree?

Let's please start with the patch.

Thanx, Paul

> -J
>
>
> > Thanx, Paul
> >
> >> > ------------------------------------------------------------------------
> >> >
> >> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> >> > index 29fb23f33c18..927c17b081c7 100644
> >> > --- a/kernel/rcu/tree_plugin.h
> >> > +++ b/kernel/rcu/tree_plugin.h
> >> > @@ -2546,9 +2546,13 @@ static void rcu_spawn_one_nocb_kthread(struct rcu_state *rsp, int cpu)
> >> > rdp->nocb_leader = rdp_spawn;
> >> > if (rdp_last && rdp != rdp_spawn)
> >> > rdp_last->nocb_next_follower = rdp;
> >> > - rdp_last = rdp;
> >> > - rdp = rdp->nocb_next_follower;
> >> > - rdp_last->nocb_next_follower = NULL;
> >> > + if (rdp == rdp_spawn) {
> >> > + rdp = rdp->nocb_next_follower;
> >> > + } else {
> >> > + rdp_last = rdp;
> >> > + rdp = rdp->nocb_next_follower;
> >> > + rdp_last->nocb_next_follower = NULL;
> >> > + }
> >> > } while (rdp);
> >> > rdp_spawn->nocb_next_follower = rdp_old_leader;
> >> > }
> >> >
>
> ---
> -Jay Vosburgh, [email protected]
>

2014-10-24 20:19:02

by Paul E. McKenney

Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

On Fri, Oct 24, 2014 at 11:57:53AM -0700, Paul E. McKenney wrote:
> On Fri, Oct 24, 2014 at 11:49:48AM -0700, Jay Vosburgh wrote:
> > Paul E. McKenney <[email protected]> wrote:
> >
> > >On Fri, Oct 24, 2014 at 08:35:26PM +0300, Yanko Kaneti wrote:
> > >> On Fri-10/24/14-2014 10:20, Paul E. McKenney wrote:
> > >> > On Fri, Oct 24, 2014 at 08:09:31PM +0300, Yanko Kaneti wrote:
> > >> > > On Fri-10/24/14-2014 09:54, Paul E. McKenney wrote:
> > >> > > > On Fri, Oct 24, 2014 at 07:29:43PM +0300, Yanko Kaneti wrote:
> > >> > > > > On Fri-10/24/14-2014 08:40, Paul E. McKenney wrote:
> > >> > > > > > On Fri, Oct 24, 2014 at 12:08:57PM +0300, Yanko Kaneti wrote:
> > >> > > > > > > On Thu-10/23/14-2014 15:04, Paul E. McKenney wrote:
> > >> > > > > > > > On Fri, Oct 24, 2014 at 12:45:40AM +0300, Yanko Kaneti wrote:
> > >> > > > > > > > >
> > >> > > > > > > > > On Thu, 2014-10-23 at 13:05 -0700, Paul E. McKenney wrote:
> > >> > > > > > > > > > On Thu, Oct 23, 2014 at 10:51:59PM +0300, Yanko Kaneti wrote:
> > >> > > >
> > >> > > > [ . . . ]
> > >> > > >
> > >> > > > > > > Ok, unless I've messsed up something major, bisecting points to:
> > >> > > > > > >
> > >> > > > > > > 35ce7f29a44a rcu: Create rcuo kthreads only for onlined CPUs
> > >> > > > > > >
> > >> > > > > > > Makes any sense ?
> > >> > > > > >
> > >> > > > > > Good question. ;-)
> > >> > > > > >
> > >> > > > > > Are any of your online CPUs missing rcuo kthreads? There should be
> > >> > > > > > kthreads named rcuos/0, rcuos/1, rcuos/2, and so on for each online CPU.
> > >> > > > >
> > >> > > > > Its a Phenom II X6. With 3.17 and linux-tip with 35ce7f29a44a reverted, the rcuos are 8
> > >> > > > > and the modprobe ppp_generic testcase reliably works, libvirt also manages
> > >> > > > > to setup its bridge.
> > >> > > > >
> > >> > > > > Just with linux-tip , the rcuos are 6 but the failure is as reliable as
> > >> > > > > before.
> > >> > >
> > >> > > > Thank you, very interesting. Which 6 of the rcuos are present?
> > >> > >
> > >> > > Well, the rcuos are 0 to 5. Which sounds right for a 6 core CPU like this
> > >> > > Phenom II.
> > >> >
> > >> > Ah, you get 8 without the patch because it creates them for potential
> > >> > CPUs as well as real ones. OK, got it.
> > >> >
> > >> > > > > Awating instructions: :)
> > >> > > >
> > >> > > > Well, I thought I understood the problem until you found that only 6 of
> > >> > > > the expected 8 rcuos are present with linux-tip without the revert. ;-)
> > >> > > >
> > >> > > > I am putting together a patch for the part of the problem that I think
> > >> > > > I understand, of course, but it would help a lot to know which two of
> > >> > > > the rcuos are missing. ;-)
> > >> > >
> > >> > > Ready to test
> > >> >
> > >> > Well, if you are feeling aggressive, give the following patch a spin.
> > >> > I am doing sanity tests on it in the meantime.
> > >>
> > >> Doesn't seem to make a difference here
> > >
> > >OK, inspection isn't cutting it, so time for tracing. Does the system
> > >respond to user input? If so, please enable rcu:rcu_barrier ftrace before
> > >the problem occurs, then dump the trace buffer after the problem occurs.
> >
> > My system is up and responsive when the problem occurs, so this
> > shouldn't be a problem.
>
> Nice! ;-)
>
> > Do you want the ftrace with your patch below, or unmodified tip
> > of tree?
>
> Let's please start with the patch.

And I should hasten to add that you need to set CONFIG_RCU_TRACE=y
for these tracepoints to be enabled.
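For anyone following along, enabling the tracepoint and dumping the buffer can be done from tracefs along these lines (a sketch only; the mount point and the need to run as root are assumptions about the test machine):

```shell
# Enable the rcu:rcu_barrier tracepoint before reproducing the problem,
# then dump the trace buffer afterwards.  Assumes tracefs is mounted at
# the usual /sys/kernel/debug/tracing location and that we are root.
T=/sys/kernel/debug/tracing
if [ -w "$T/events/rcu/rcu_barrier/enable" ]; then
    echo 1 > "$T/events/rcu/rcu_barrier/enable"   # start tracing rcu_barrier
    # ... reproduce the hang here (e.g. modprobe ppp_generic) ...
    cat "$T/trace"                                # dump the buffer afterwards
    echo 0 > "$T/events/rcu/rcu_barrier/enable"   # disable again
else
    echo "tracefs not writable; mount debugfs and/or run as root" >&2
fi
```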

Thanx, Paul

> > -J
> >
> >
> > > Thanx, Paul
> > >
> > >> > ------------------------------------------------------------------------
> > >> >
> > >> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > >> > index 29fb23f33c18..927c17b081c7 100644
> > >> > --- a/kernel/rcu/tree_plugin.h
> > >> > +++ b/kernel/rcu/tree_plugin.h
> > >> > @@ -2546,9 +2546,13 @@ static void rcu_spawn_one_nocb_kthread(struct rcu_state *rsp, int cpu)
> > >> > rdp->nocb_leader = rdp_spawn;
> > >> > if (rdp_last && rdp != rdp_spawn)
> > >> > rdp_last->nocb_next_follower = rdp;
> > >> > - rdp_last = rdp;
> > >> > - rdp = rdp->nocb_next_follower;
> > >> > - rdp_last->nocb_next_follower = NULL;
> > >> > + if (rdp == rdp_spawn) {
> > >> > + rdp = rdp->nocb_next_follower;
> > >> > + } else {
> > >> > + rdp_last = rdp;
> > >> > + rdp = rdp->nocb_next_follower;
> > >> > + rdp_last->nocb_next_follower = NULL;
> > >> > + }
> > >> > } while (rdp);
> > >> > rdp_spawn->nocb_next_follower = rdp_old_leader;
> > >> > }
> > >> >
> >
> > ---
> > -Jay Vosburgh, [email protected]
> >

2014-10-24 21:26:03

by Yanko Kaneti

Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

On Fri-10/24/14-2014 11:32, Paul E. McKenney wrote:
> On Fri, Oct 24, 2014 at 08:35:26PM +0300, Yanko Kaneti wrote:
> > On Fri-10/24/14-2014 10:20, Paul E. McKenney wrote:
> > > On Fri, Oct 24, 2014 at 08:09:31PM +0300, Yanko Kaneti wrote:
> > > > On Fri-10/24/14-2014 09:54, Paul E. McKenney wrote:
> > > > > On Fri, Oct 24, 2014 at 07:29:43PM +0300, Yanko Kaneti wrote:
> > > > > > On Fri-10/24/14-2014 08:40, Paul E. McKenney wrote:
> > > > > > > On Fri, Oct 24, 2014 at 12:08:57PM +0300, Yanko Kaneti wrote:
> > > > > > > > On Thu-10/23/14-2014 15:04, Paul E. McKenney wrote:
> > > > > > > > > On Fri, Oct 24, 2014 at 12:45:40AM +0300, Yanko Kaneti wrote:
> > > > > > > > > >
> > > > > > > > > > On Thu, 2014-10-23 at 13:05 -0700, Paul E. McKenney wrote:
> > > > > > > > > > > On Thu, Oct 23, 2014 at 10:51:59PM +0300, Yanko Kaneti wrote:
> > > > >
> > > > > [ . . . ]
> > > > >
> > > > > > > > Ok, unless I've messed up something major, bisecting points to:
> > > > > > > >
> > > > > > > > 35ce7f29a44a rcu: Create rcuo kthreads only for onlined CPUs
> > > > > > > >
> > > > > > > > Does that make any sense?
> > > > > > >
> > > > > > > Good question. ;-)
> > > > > > >
> > > > > > > Are any of your online CPUs missing rcuo kthreads? There should be
> > > > > > > kthreads named rcuos/0, rcuos/1, rcuos/2, and so on for each online CPU.
> > > > > >
> > > > > > It's a Phenom II X6. With 3.17, and with linux-tip with 35ce7f29a44a reverted,
> > > > > > there are 8 rcuos kthreads, the modprobe ppp_generic test case reliably works,
> > > > > > and libvirt also manages to set up its bridge.
> > > > > >
> > > > > > With plain linux-tip there are 6 rcuos, but the failure is as reliable as
> > > > > > before.
> > > >
> > > > > Thank you, very interesting. Which 6 of the rcuos are present?
> > > >
> > > > Well, the rcuos are 0 to 5. Which sounds right for a 6 core CPU like this
> > > > Phenom II.
> > >
> > > Ah, you get 8 without the patch because it creates them for potential
> > > CPUs as well as real ones. OK, got it.
> > >
> > > > > > Awaiting instructions :)
> > > > >
> > > > > Well, I thought I understood the problem until you found that only 6 of
> > > > > the expected 8 rcuos are present with linux-tip without the revert. ;-)
> > > > >
> > > > > I am putting together a patch for the part of the problem that I think
> > > > > I understand, of course, but it would help a lot to know which two of
> > > > > the rcuos are missing. ;-)
> > > >
> > > > Ready to test
> > >
> > > Well, if you are feeling aggressive, give the following patch a spin.
> > > I am doing sanity tests on it in the meantime.
> >
> > Doesn't seem to make a difference here
>
> OK, inspection isn't cutting it, so time for tracing. Does the system
> respond to user input? If so, please enable rcu:rcu_barrier ftrace before
> the problem occurs, then dump the trace buffer after the problem occurs.

Sorry for being unresponsive here, but I know next to nothing about tracing
or most things about the kernel, so I have some catching up to do.

In the meantime some layman observations while I tried to find what exactly
triggers the problem.
- Even in runlevel 1 I can reliably trigger the problem by starting libvirtd
- libvirtd seems to be very active in using all sorts of kernel facilities
that are modules on Fedora, so it seems to cause many simultaneous kworker
calls to modprobe
- there are 8 kworker/u16 from 0 to 7
- one of these kworkers always deadlocks, while there appear to be two
kworker/u16:6 - the seventh

6 vs. 8, as in 6 rcuos where before there were always 8

Just observations from someone who still doesn't know what the u16
kworkers are...
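For reference, the counts above can be double-checked from userspace; a sketch (thread names as they appear in ps output on this system):

```shell
# Count the rcuos callback-offload kthreads and look for duplicated
# kworker/u16 names, matching the observations above.
rcuos_count=$(ps -eo comm= | grep -c '^rcuos/' || true)
echo "rcuos kthreads: $rcuos_count"
# Any kworker/u16 name that appears more than once shows up here:
ps -eo comm= | grep '^kworker/u16' | sort | uniq -d
```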

-- Yanko




> Thanx, Paul
>
> > > ------------------------------------------------------------------------
> > >
> > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > index 29fb23f33c18..927c17b081c7 100644
> > > --- a/kernel/rcu/tree_plugin.h
> > > +++ b/kernel/rcu/tree_plugin.h
> > > @@ -2546,9 +2546,13 @@ static void rcu_spawn_one_nocb_kthread(struct rcu_state *rsp, int cpu)
> > > rdp->nocb_leader = rdp_spawn;
> > > if (rdp_last && rdp != rdp_spawn)
> > > rdp_last->nocb_next_follower = rdp;
> > > - rdp_last = rdp;
> > > - rdp = rdp->nocb_next_follower;
> > > - rdp_last->nocb_next_follower = NULL;
> > > + if (rdp == rdp_spawn) {
> > > + rdp = rdp->nocb_next_follower;
> > > + } else {
> > > + rdp_last = rdp;
> > > + rdp = rdp->nocb_next_follower;
> > > + rdp_last->nocb_next_follower = NULL;
> > > + }
> > > } while (rdp);
> > > rdp_spawn->nocb_next_follower = rdp_old_leader;
> > > }
> > >
> >
>

2014-10-24 21:53:21

by Paul E. McKenney

Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

On Sat, Oct 25, 2014 at 12:25:57AM +0300, Yanko Kaneti wrote:
> On Fri-10/24/14-2014 11:32, Paul E. McKenney wrote:
> > On Fri, Oct 24, 2014 at 08:35:26PM +0300, Yanko Kaneti wrote:
> > > On Fri-10/24/14-2014 10:20, Paul E. McKenney wrote:

[ . . . ]

> > > > Well, if you are feeling aggressive, give the following patch a spin.
> > > > I am doing sanity tests on it in the meantime.
> > >
> > > Doesn't seem to make a difference here
> >
> > OK, inspection isn't cutting it, so time for tracing. Does the system
> > respond to user input? If so, please enable rcu:rcu_barrier ftrace before
> > the problem occurs, then dump the trace buffer after the problem occurs.
>
> Sorry for being unresponsive here, but I know next to nothing about tracing
> or most things about the kernel, so I have some catching up to do.
>
> In the meantime some layman observations while I tried to find what exactly
> triggers the problem.
> - Even in runlevel 1 I can reliably trigger the problem by starting libvirtd
> - libvirtd seems to be very active in using all sorts of kernel facilities
> that are modules on Fedora, so it seems to cause many simultaneous kworker
> calls to modprobe
> - there are 8 kworker/u16 from 0 to 7
> - one of these kworkers always deadlocks, while there appear to be two
> kworker/u16:6 - the seventh

Adding Tejun on CC in case this duplication of kworker/u16:6 is important.

> 6 vs. 8, as in 6 rcuos where before there were always 8
>
> Just observations from someone who still doesn't know what the u16
> kworkers are...

Could you please run the following diagnostic patch? This will help
me see if I have managed to miswire the rcuo kthreads. It should
print some information at task-hang time.

Thanx, Paul

------------------------------------------------------------------------

rcu: Dump no-CBs CPU state at task-hung time

Strictly diagnostic commit for rcu_barrier() hang. Not for inclusion.

Signed-off-by: Paul E. McKenney <[email protected]>

diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index 0e5366200154..34048140577b 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -157,4 +157,8 @@ static inline bool rcu_is_watching(void)

#endif /* #else defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) */

+static inline void rcu_show_nocb_setup(void)
+{
+}
+
#endif /* __LINUX_RCUTINY_H */
diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index 52953790dcca..0b813bdb971b 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -97,4 +97,6 @@ extern int rcu_scheduler_active __read_mostly;

bool rcu_is_watching(void);

+void rcu_show_nocb_setup(void);
+
#endif /* __LINUX_RCUTREE_H */
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 06db12434d72..e6e4d0f6b063 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -118,6 +118,7 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
" disables this message.\n");
sched_show_task(t);
debug_show_held_locks(t);
+ rcu_show_nocb_setup();

touch_nmi_watchdog();

diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 240fa9094f83..6b373e79ce0e 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -1513,6 +1513,7 @@ rcu_torture_cleanup(void)
{
int i;

+ rcu_show_nocb_setup();
rcutorture_record_test_transition();
if (torture_cleanup_begin()) {
if (cur_ops->cb_barrier != NULL)
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 927c17b081c7..285b3f6fb229 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2699,6 +2699,31 @@ static bool init_nocb_callback_list(struct rcu_data *rdp)

#endif /* #else #ifdef CONFIG_RCU_NOCB_CPU */

+void rcu_show_nocb_setup(void)
+{
+#ifdef CONFIG_RCU_NOCB_CPU
+ int cpu;
+ struct rcu_data *rdp;
+ struct rcu_state *rsp;
+
+ for_each_rcu_flavor(rsp) {
+ pr_alert("rcu_show_nocb_setup(): %s nocb state:\n", rsp->name);
+ for_each_possible_cpu(cpu) {
+ if (!rcu_is_nocb_cpu(cpu))
+ continue;
+ rdp = per_cpu_ptr(rsp->rda, cpu);
+ pr_alert("%3d: %p l:%p n:%p %c%c%c\n",
+ cpu,
+ rdp, rdp->nocb_leader, rdp->nocb_next_follower,
+ ".N"[!!rdp->nocb_head],
+ ".G"[!!rdp->nocb_gp_head],
+ ".F"[!!rdp->nocb_follower_head]);
+ }
+ }
+#endif /* #ifdef CONFIG_RCU_NOCB_CPU */
+}
+EXPORT_SYMBOL_GPL(rcu_show_nocb_setup);
+
/*
* An adaptive-ticks CPU can potentially execute in kernel mode for an
* arbitrarily long period of time with the scheduling-clock tick turned

2014-10-24 22:02:22

by Jay Vosburgh

Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

Paul E. McKenney <[email protected]> wrote:

>On Sat, Oct 25, 2014 at 12:25:57AM +0300, Yanko Kaneti wrote:
>> On Fri-10/24/14-2014 11:32, Paul E. McKenney wrote:
>> > On Fri, Oct 24, 2014 at 08:35:26PM +0300, Yanko Kaneti wrote:
>> > > On Fri-10/24/14-2014 10:20, Paul E. McKenney wrote:
>
>[ . . . ]
>
>> > > > Well, if you are feeling aggressive, give the following patch a spin.
>> > > > I am doing sanity tests on it in the meantime.
>> > >
>> > > Doesn't seem to make a difference here
>> >
>> > OK, inspection isn't cutting it, so time for tracing. Does the system
>> > respond to user input? If so, please enable rcu:rcu_barrier ftrace before
>> > the problem occurs, then dump the trace buffer after the problem occurs.
>>
>> Sorry for being unresponsive here, but I know next to nothing about tracing
>> or most things about the kernel, so I have some catching up to do.
>>
>> In the meantime some layman observations while I tried to find what exactly
>> triggers the problem.
>> - Even in runlevel 1 I can reliably trigger the problem by starting libvirtd
>> - libvirtd seems to be very active in using all sorts of kernel facilities
>> that are modules on Fedora, so it seems to cause many simultaneous kworker
>> calls to modprobe
>> - there are 8 kworker/u16 from 0 to 7
>> - one of these kworkers always deadlocks, while there appear to be two
>> kworker/u16:6 - the seventh
>
>Adding Tejun on CC in case this duplication of kworker/u16:6 is important.
>
>> 6 vs. 8, as in 6 rcuos where before there were always 8
>>
>> Just observations from someone who still doesn't know what the u16
>> kworkers are...
>
>Could you please run the following diagnostic patch? This will help
>me see if I have managed to miswire the rcuo kthreads. It should
>print some information at task-hang time.

I can give this a spin after the ftrace (now that I've got
CONFIG_RCU_TRACE turned on).

I've got an ftrace capture from unmodified -net, it looks like
this:

ovs-vswitchd-902 [000] .... 471.778441: rcu_barrier: rcu_sched Begin cpu -1 remaining 0 # 0
ovs-vswitchd-902 [000] .... 471.778452: rcu_barrier: rcu_sched Check cpu -1 remaining 0 # 0
ovs-vswitchd-902 [000] .... 471.778452: rcu_barrier: rcu_sched Inc1 cpu -1 remaining 0 # 1
ovs-vswitchd-902 [000] .... 471.778453: rcu_barrier: rcu_sched OnlineNoCB cpu 0 remaining 1 # 1
ovs-vswitchd-902 [000] .... 471.778453: rcu_barrier: rcu_sched OnlineNoCB cpu 1 remaining 2 # 1
ovs-vswitchd-902 [000] .... 471.778453: rcu_barrier: rcu_sched OnlineNoCB cpu 2 remaining 3 # 1
ovs-vswitchd-902 [000] .... 471.778454: rcu_barrier: rcu_sched OnlineNoCB cpu 3 remaining 4 # 1
ovs-vswitchd-902 [000] .... 471.778454: rcu_barrier: rcu_sched Inc2 cpu -1 remaining 4 # 2
rcuos/0-9 [000] ..s. 471.793150: rcu_barrier: rcu_sched CB cpu -1 remaining 3 # 2
rcuos/1-18 [001] ..s. 471.793308: rcu_barrier: rcu_sched CB cpu -1 remaining 2 # 2

I let it sit through several "hung task" cycles but that was all
there was for rcu:rcu_barrier.

I should have ftrace with the patch as soon as the kernel is
done building, then I can try the below patch (I'll start it building
now).

-J




> Thanx, Paul
>
>------------------------------------------------------------------------
>
>rcu: Dump no-CBs CPU state at task-hung time
>
>Strictly diagnostic commit for rcu_barrier() hang. Not for inclusion.
>
>Signed-off-by: Paul E. McKenney <[email protected]>
>
>diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
>index 0e5366200154..34048140577b 100644
>--- a/include/linux/rcutiny.h
>+++ b/include/linux/rcutiny.h
>@@ -157,4 +157,8 @@ static inline bool rcu_is_watching(void)
>
> #endif /* #else defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) */
>
>+static inline void rcu_show_nocb_setup(void)
>+{
>+}
>+
> #endif /* __LINUX_RCUTINY_H */
>diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
>index 52953790dcca..0b813bdb971b 100644
>--- a/include/linux/rcutree.h
>+++ b/include/linux/rcutree.h
>@@ -97,4 +97,6 @@ extern int rcu_scheduler_active __read_mostly;
>
> bool rcu_is_watching(void);
>
>+void rcu_show_nocb_setup(void);
>+
> #endif /* __LINUX_RCUTREE_H */
>diff --git a/kernel/hung_task.c b/kernel/hung_task.c
>index 06db12434d72..e6e4d0f6b063 100644
>--- a/kernel/hung_task.c
>+++ b/kernel/hung_task.c
>@@ -118,6 +118,7 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
> " disables this message.\n");
> sched_show_task(t);
> debug_show_held_locks(t);
>+ rcu_show_nocb_setup();
>
> touch_nmi_watchdog();
>
>diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
>index 240fa9094f83..6b373e79ce0e 100644
>--- a/kernel/rcu/rcutorture.c
>+++ b/kernel/rcu/rcutorture.c
>@@ -1513,6 +1513,7 @@ rcu_torture_cleanup(void)
> {
> int i;
>
>+ rcu_show_nocb_setup();
> rcutorture_record_test_transition();
> if (torture_cleanup_begin()) {
> if (cur_ops->cb_barrier != NULL)
>diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
>index 927c17b081c7..285b3f6fb229 100644
>--- a/kernel/rcu/tree_plugin.h
>+++ b/kernel/rcu/tree_plugin.h
>@@ -2699,6 +2699,31 @@ static bool init_nocb_callback_list(struct rcu_data *rdp)
>
> #endif /* #else #ifdef CONFIG_RCU_NOCB_CPU */
>
>+void rcu_show_nocb_setup(void)
>+{
>+#ifdef CONFIG_RCU_NOCB_CPU
>+ int cpu;
>+ struct rcu_data *rdp;
>+ struct rcu_state *rsp;
>+
>+ for_each_rcu_flavor(rsp) {
>+ pr_alert("rcu_show_nocb_setup(): %s nocb state:\n", rsp->name);
>+ for_each_possible_cpu(cpu) {
>+ if (!rcu_is_nocb_cpu(cpu))
>+ continue;
>+ rdp = per_cpu_ptr(rsp->rda, cpu);
>+ pr_alert("%3d: %p l:%p n:%p %c%c%c\n",
>+ cpu,
>+ rdp, rdp->nocb_leader, rdp->nocb_next_follower,
>+ ".N"[!!rdp->nocb_head],
>+ ".G"[!!rdp->nocb_gp_head],
>+ ".F"[!!rdp->nocb_follower_head]);
>+ }
>+ }
>+#endif /* #ifdef CONFIG_RCU_NOCB_CPU */
>+}
>+EXPORT_SYMBOL_GPL(rcu_show_nocb_setup);
>+
> /*
> * An adaptive-ticks CPU can potentially execute in kernel mode for an
> * arbitrarily long period of time with the scheduling-clock tick turned
>

---
-Jay Vosburgh, [email protected]

2014-10-24 22:19:54

by Paul E. McKenney

Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

On Fri, Oct 24, 2014 at 03:02:04PM -0700, Jay Vosburgh wrote:
> Paul E. McKenney <[email protected]> wrote:
>
> >On Sat, Oct 25, 2014 at 12:25:57AM +0300, Yanko Kaneti wrote:
> >> On Fri-10/24/14-2014 11:32, Paul E. McKenney wrote:
> >> > On Fri, Oct 24, 2014 at 08:35:26PM +0300, Yanko Kaneti wrote:
> >> > > On Fri-10/24/14-2014 10:20, Paul E. McKenney wrote:
> >
> >[ . . . ]
> >
> >> > > > Well, if you are feeling aggressive, give the following patch a spin.
> >> > > > I am doing sanity tests on it in the meantime.
> >> > >
> >> > > Doesn't seem to make a difference here
> >> >
> >> > OK, inspection isn't cutting it, so time for tracing. Does the system
> >> > respond to user input? If so, please enable rcu:rcu_barrier ftrace before
> >> > the problem occurs, then dump the trace buffer after the problem occurs.
> >>
> >> Sorry for being unresponsive here, but I know next to nothing about tracing
> >> or most things about the kernel, so I have some catching up to do.
> >>
> >> In the meantime some layman observations while I tried to find what exactly
> >> triggers the problem.
> >> - Even in runlevel 1 I can reliably trigger the problem by starting libvirtd
> >> - libvirtd seems to be very active in using all sorts of kernel facilities
> >> that are modules on Fedora, so it seems to cause many simultaneous kworker
> >> calls to modprobe
> >> - there are 8 kworker/u16 from 0 to 7
> >> - one of these kworkers always deadlocks, while there appear to be two
> >> kworker/u16:6 - the seventh
> >
> >Adding Tejun on CC in case this duplication of kworker/u16:6 is important.
> >
> >> 6 vs. 8, as in 6 rcuos where before there were always 8
> >>
> >> Just observations from someone who still doesn't know what the u16
> >> kworkers are...
> >
> >Could you please run the following diagnostic patch? This will help
> >me see if I have managed to miswire the rcuo kthreads. It should
> >print some information at task-hang time.
>
> I can give this a spin after the ftrace (now that I've got
> CONFIG_RCU_TRACE turned on).
>
> I've got an ftrace capture from unmodified -net, it looks like
> this:
>
> ovs-vswitchd-902 [000] .... 471.778441: rcu_barrier: rcu_sched Begin cpu -1 remaining 0 # 0
> ovs-vswitchd-902 [000] .... 471.778452: rcu_barrier: rcu_sched Check cpu -1 remaining 0 # 0
> ovs-vswitchd-902 [000] .... 471.778452: rcu_barrier: rcu_sched Inc1 cpu -1 remaining 0 # 1
> ovs-vswitchd-902 [000] .... 471.778453: rcu_barrier: rcu_sched OnlineNoCB cpu 0 remaining 1 # 1
> ovs-vswitchd-902 [000] .... 471.778453: rcu_barrier: rcu_sched OnlineNoCB cpu 1 remaining 2 # 1
> ovs-vswitchd-902 [000] .... 471.778453: rcu_barrier: rcu_sched OnlineNoCB cpu 2 remaining 3 # 1
> ovs-vswitchd-902 [000] .... 471.778454: rcu_barrier: rcu_sched OnlineNoCB cpu 3 remaining 4 # 1

OK, so it looks like your system has four CPUs, and rcu_barrier() placed
callbacks on them all.

> ovs-vswitchd-902 [000] .... 471.778454: rcu_barrier: rcu_sched Inc2 cpu -1 remaining 4 # 2

The above removes the extra count used to avoid races between posting new
callbacks and completion of previously posted callbacks.

> rcuos/0-9 [000] ..s. 471.793150: rcu_barrier: rcu_sched CB cpu -1 remaining 3 # 2
> rcuos/1-18 [001] ..s. 471.793308: rcu_barrier: rcu_sched CB cpu -1 remaining 2 # 2

Two of the four callbacks fired, but the other two appear to be AWOL.
And rcu_barrier() won't return until they all fire.
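The counter walk-through above can be sketched as a toy model, with the numbers taken from the trace (this illustrates the completion count only, not the kernel's actual data structures):

```shell
# Toy model of rcu_barrier()'s completion count as seen in the trace:
# Inc1 takes an extra reference, each OnlineNoCB entry posts one callback,
# Inc2 drops the extra reference, and each CB entry is one callback firing.
# rcu_barrier() returns only when the count reaches zero.
remaining=0
remaining=$((remaining + 1))          # Inc1: extra count against races
for cpu in 0 1 2 3; do                # OnlineNoCB: callbacks on CPUs 0-3
    remaining=$((remaining + 1))
done
remaining=$((remaining - 1))          # Inc2: drop the extra count
for kthread in rcuos/0 rcuos/1; do    # CB: only two callbacks ever fire
    remaining=$((remaining - 1))
done
echo "remaining=$remaining"           # stuck at 2, so the barrier never completes
```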

> I let it sit through several "hung task" cycles but that was all
> there was for rcu:rcu_barrier.
>
> I should have ftrace with the patch as soon as the kernel is
> done building, then I can try the below patch (I'll start it building
> now).

Sounds very good, looking forward to hearing of the results.

Thanx, Paul

2014-10-24 22:34:23

by Jay Vosburgh

Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

Paul E. McKenney <[email protected]> wrote:

>On Sat, Oct 25, 2014 at 12:25:57AM +0300, Yanko Kaneti wrote:
>> On Fri-10/24/14-2014 11:32, Paul E. McKenney wrote:
>> > On Fri, Oct 24, 2014 at 08:35:26PM +0300, Yanko Kaneti wrote:
>> > > On Fri-10/24/14-2014 10:20, Paul E. McKenney wrote:
>
>[ . . . ]
>
>> > > > Well, if you are feeling aggressive, give the following patch a spin.
>> > > > I am doing sanity tests on it in the meantime.
>> > >
>> > > Doesn't seem to make a difference here
>> >
>> > OK, inspection isn't cutting it, so time for tracing. Does the system
>> > respond to user input? If so, please enable rcu:rcu_barrier ftrace before
>> > the problem occurs, then dump the trace buffer after the problem occurs.
>>
>> Sorry for being unresponsive here, but I know next to nothing about tracing
>> or most things about the kernel, so I have some catching up to do.
>>
>> In the meantime some layman observations while I tried to find what exactly
>> triggers the problem.
>> - Even in runlevel 1 I can reliably trigger the problem by starting libvirtd
>> - libvirtd seems to be very active in using all sorts of kernel facilities
>> that are modules on Fedora, so it seems to cause many simultaneous kworker
>> calls to modprobe
>> - there are 8 kworker/u16 from 0 to 7
>> - one of these kworkers always deadlocks, while there appear to be two
>> kworker/u16:6 - the seventh
>
>Adding Tejun on CC in case this duplication of kworker/u16:6 is important.
>
>> 6 vs. 8, as in 6 rcuos where before there were always 8
>>
>> Just observations from someone who still doesn't know what the u16
>> kworkers are...
>
>Could you please run the following diagnostic patch? This will help
>me see if I have managed to miswire the rcuo kthreads. It should
>print some information at task-hang time.

Here's the output of the patch; I let it sit through two hang
cycles.

-J


[ 240.348020] INFO: task ovs-vswitchd:902 blocked for more than 120 seconds.
[ 240.354878] Not tainted 3.17.0-testola+ #4
[ 240.359481] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 240.367285] ovs-vswitchd D ffff88013fc94600 0 902 901 0x00000004
[ 240.367290] ffff8800ab20f7b8 0000000000000002 ffff8800b3304b00 ffff8800ab20ffd8
[ 240.367293] 0000000000014600 0000000000014600 ffff8800b0810000 ffff8800b3304b00
[ 240.367296] ffff8800b3304b00 ffffffff81c59850 ffffffff81c59858 7fffffffffffffff
[ 240.367300] Call Trace:
[ 240.367307] [<ffffffff81722b99>] schedule+0x29/0x70
[ 240.367310] [<ffffffff81725b6c>] schedule_timeout+0x1dc/0x260
[ 240.367313] [<ffffffff81722f69>] ? _cond_resched+0x29/0x40
[ 240.367316] [<ffffffff81723818>] ? wait_for_completion+0x28/0x160
[ 240.367321] [<ffffffff811081a7>] ? queue_stop_cpus_work+0xc7/0xe0
[ 240.367324] [<ffffffff81723896>] wait_for_completion+0xa6/0x160
[ 240.367328] [<ffffffff81099980>] ? wake_up_state+0x20/0x20
[ 240.367331] [<ffffffff810d0ecc>] _rcu_barrier+0x20c/0x480
[ 240.367334] [<ffffffff810d1195>] rcu_barrier+0x15/0x20
[ 240.367338] [<ffffffff81625010>] netdev_run_todo+0x60/0x300
[ 240.367341] [<ffffffff8162f9ee>] rtnl_unlock+0xe/0x10
[ 240.367349] [<ffffffffa01ffcc5>] internal_dev_destroy+0x55/0x80 [openvswitch]
[ 240.367354] [<ffffffffa01ff622>] ovs_vport_del+0x32/0x40 [openvswitch]
[ 240.367358] [<ffffffffa01f8dd0>] ovs_dp_detach_port+0x30/0x40 [openvswitch]
[ 240.367363] [<ffffffffa01f8ea5>] ovs_vport_cmd_del+0xc5/0x110 [openvswitch]
[ 240.367367] [<ffffffff81651d75>] genl_family_rcv_msg+0x1a5/0x3c0
[ 240.367370] [<ffffffff81651f90>] ? genl_family_rcv_msg+0x3c0/0x3c0
[ 240.367372] [<ffffffff81652021>] genl_rcv_msg+0x91/0xd0
[ 240.367376] [<ffffffff81650091>] netlink_rcv_skb+0xc1/0xe0
[ 240.367378] [<ffffffff816505bc>] genl_rcv+0x2c/0x40
[ 240.367381] [<ffffffff8164f626>] netlink_unicast+0xf6/0x200
[ 240.367383] [<ffffffff8164fa4d>] netlink_sendmsg+0x31d/0x780
[ 240.367387] [<ffffffff8164ca74>] ? netlink_rcv_wake+0x44/0x60
[ 240.367391] [<ffffffff81606a53>] sock_sendmsg+0x93/0xd0
[ 240.367395] [<ffffffff81337700>] ? apparmor_capable+0x60/0x60
[ 240.367399] [<ffffffff81614f27>] ? verify_iovec+0x47/0xd0
[ 240.367402] [<ffffffff81606e79>] ___sys_sendmsg+0x399/0x3b0
[ 240.367406] [<ffffffff812598a2>] ? kernfs_seq_stop_active+0x32/0x40
[ 240.367410] [<ffffffff8101c385>] ? native_sched_clock+0x35/0x90
[ 240.367413] [<ffffffff8101c385>] ? native_sched_clock+0x35/0x90
[ 240.367416] [<ffffffff8101c3e9>] ? sched_clock+0x9/0x10
[ 240.367420] [<ffffffff811277fc>] ? acct_account_cputime+0x1c/0x20
[ 240.367424] [<ffffffff8109ce6b>] ? account_user_time+0x8b/0xa0
[ 240.367428] [<ffffffff81200bd5>] ? __fget_light+0x25/0x70
[ 240.367431] [<ffffffff81607c02>] __sys_sendmsg+0x42/0x80
[ 240.367433] [<ffffffff81607c52>] SyS_sendmsg+0x12/0x20
[ 240.367436] [<ffffffff81727464>] tracesys_phase2+0xd8/0xdd
[ 240.367439] rcu_show_nocb_setup(): rcu_sched nocb state:
[ 240.372734] 0: ffff88013fc0e600 l:ffff88013fc0e600 n:ffff88013fc8e600 .G.
[ 240.379673] 1: ffff88013fc8e600 l:ffff88013fc0e600 n: (null) .G.
[ 240.386611] 2: ffff88013fd0e600 l:ffff88013fd0e600 n:ffff88013fd8e600 N..
[ 240.393550] 3: ffff88013fd8e600 l:ffff88013fd0e600 n: (null) N..
[ 240.400489] rcu_show_nocb_setup(): rcu_bh nocb state:
[ 240.405525] 0: ffff88013fc0e3c0 l:ffff88013fc0e3c0 n:ffff88013fc8e3c0 ...
[ 240.412463] 1: ffff88013fc8e3c0 l:ffff88013fc0e3c0 n: (null) ...
[ 240.419401] 2: ffff88013fd0e3c0 l:ffff88013fd0e3c0 n:ffff88013fd8e3c0 ...
[ 240.426339] 3: ffff88013fd8e3c0 l:ffff88013fd0e3c0 n: (null) ...
[ 360.432020] INFO: task ovs-vswitchd:902 blocked for more than 120 seconds.
[ 360.438881] Not tainted 3.17.0-testola+ #4
[ 360.443484] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 360.451289] ovs-vswitchd D ffff88013fc94600 0 902 901 0x00000004
[ 360.451293] ffff8800ab20f7b8 0000000000000002 ffff8800b3304b00 ffff8800ab20ffd8
[ 360.451297] 0000000000014600 0000000000014600 ffff8800b0810000 ffff8800b3304b00
[ 360.451300] ffff8800b3304b00 ffffffff81c59850 ffffffff81c59858 7fffffffffffffff
[ 360.451303] Call Trace:
[ 360.451311] [<ffffffff81722b99>] schedule+0x29/0x70
[ 360.451314] [<ffffffff81725b6c>] schedule_timeout+0x1dc/0x260
[ 360.451317] [<ffffffff81722f69>] ? _cond_resched+0x29/0x40
[ 360.451320] [<ffffffff81723818>] ? wait_for_completion+0x28/0x160
[ 360.451325] [<ffffffff811081a7>] ? queue_stop_cpus_work+0xc7/0xe0
[ 360.451327] [<ffffffff81723896>] wait_for_completion+0xa6/0x160
[ 360.451331] [<ffffffff81099980>] ? wake_up_state+0x20/0x20
[ 360.451335] [<ffffffff810d0ecc>] _rcu_barrier+0x20c/0x480
[ 360.451338] [<ffffffff810d1195>] rcu_barrier+0x15/0x20
[ 360.451342] [<ffffffff81625010>] netdev_run_todo+0x60/0x300
[ 360.451345] [<ffffffff8162f9ee>] rtnl_unlock+0xe/0x10
[ 360.451353] [<ffffffffa01ffcc5>] internal_dev_destroy+0x55/0x80 [openvswitch]
[ 360.451358] [<ffffffffa01ff622>] ovs_vport_del+0x32/0x40 [openvswitch]
[ 360.451362] [<ffffffffa01f8dd0>] ovs_dp_detach_port+0x30/0x40 [openvswitch]
[ 360.451366] [<ffffffffa01f8ea5>] ovs_vport_cmd_del+0xc5/0x110 [openvswitch]
[ 360.451370] [<ffffffff81651d75>] genl_family_rcv_msg+0x1a5/0x3c0
[ 360.451373] [<ffffffff81651f90>] ? genl_family_rcv_msg+0x3c0/0x3c0
[ 360.451376] [<ffffffff81652021>] genl_rcv_msg+0x91/0xd0
[ 360.451379] [<ffffffff81650091>] netlink_rcv_skb+0xc1/0xe0
[ 360.451381] [<ffffffff816505bc>] genl_rcv+0x2c/0x40
[ 360.451384] [<ffffffff8164f626>] netlink_unicast+0xf6/0x200
[ 360.451387] [<ffffffff8164fa4d>] netlink_sendmsg+0x31d/0x780
[ 360.451390] [<ffffffff8164ca74>] ? netlink_rcv_wake+0x44/0x60
[ 360.451394] [<ffffffff81606a53>] sock_sendmsg+0x93/0xd0
[ 360.451399] [<ffffffff81337700>] ? apparmor_capable+0x60/0x60
[ 360.451402] [<ffffffff81614f27>] ? verify_iovec+0x47/0xd0
[ 360.451406] [<ffffffff81606e79>] ___sys_sendmsg+0x399/0x3b0
[ 360.451410] [<ffffffff812598a2>] ? kernfs_seq_stop_active+0x32/0x40
[ 360.451414] [<ffffffff8101c385>] ? native_sched_clock+0x35/0x90
[ 360.451417] [<ffffffff8101c385>] ? native_sched_clock+0x35/0x90
[ 360.451419] [<ffffffff8101c3e9>] ? sched_clock+0x9/0x10
[ 360.451424] [<ffffffff811277fc>] ? acct_account_cputime+0x1c/0x20
[ 360.451427] [<ffffffff8109ce6b>] ? account_user_time+0x8b/0xa0
[ 360.451431] [<ffffffff81200bd5>] ? __fget_light+0x25/0x70
[ 360.451434] [<ffffffff81607c02>] __sys_sendmsg+0x42/0x80
[ 360.451437] [<ffffffff81607c52>] SyS_sendmsg+0x12/0x20
[ 360.451440] [<ffffffff81727464>] tracesys_phase2+0xd8/0xdd
[ 360.451442] rcu_show_nocb_setup(): rcu_sched nocb state:
[ 360.456737] 0: ffff88013fc0e600 l:ffff88013fc0e600 n:ffff88013fc8e600 ...
[ 360.463676] 1: ffff88013fc8e600 l:ffff88013fc0e600 n: (null) ...
[ 360.470614] 2: ffff88013fd0e600 l:ffff88013fd0e600 n:ffff88013fd8e600 N..
[ 360.477554] 3: ffff88013fd8e600 l:ffff88013fd0e600 n: (null) N..
[ 360.484494] rcu_show_nocb_setup(): rcu_bh nocb state:
[ 360.489529] 0: ffff88013fc0e3c0 l:ffff88013fc0e3c0 n:ffff88013fc8e3c0 ...
[ 360.496469] 1: ffff88013fc8e3c0 l:ffff88013fc0e3c0 n: (null) .G.
[ 360.503407] 2: ffff88013fd0e3c0 l:ffff88013fd0e3c0 n:ffff88013fd8e3c0 ...
[ 360.510346] 3: ffff88013fd8e3c0 l:ffff88013fd0e3c0 n: (null) ...

---
-Jay Vosburgh, [email protected]

2014-10-24 22:41:44

by Jay Vosburgh

Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

Paul E. McKenney <[email protected]> wrote:

>On Fri, Oct 24, 2014 at 03:02:04PM -0700, Jay Vosburgh wrote:
>> Paul E. McKenney <[email protected]> wrote:
>>
[...]
>> I've got an ftrace capture from unmodified -net, it looks like
>> this:
>>
>> ovs-vswitchd-902 [000] .... 471.778441: rcu_barrier: rcu_sched Begin cpu -1 remaining 0 # 0
>> ovs-vswitchd-902 [000] .... 471.778452: rcu_barrier: rcu_sched Check cpu -1 remaining 0 # 0
>> ovs-vswitchd-902 [000] .... 471.778452: rcu_barrier: rcu_sched Inc1 cpu -1 remaining 0 # 1
>> ovs-vswitchd-902 [000] .... 471.778453: rcu_barrier: rcu_sched OnlineNoCB cpu 0 remaining 1 # 1
>> ovs-vswitchd-902 [000] .... 471.778453: rcu_barrier: rcu_sched OnlineNoCB cpu 1 remaining 2 # 1
>> ovs-vswitchd-902 [000] .... 471.778453: rcu_barrier: rcu_sched OnlineNoCB cpu 2 remaining 3 # 1
>> ovs-vswitchd-902 [000] .... 471.778454: rcu_barrier: rcu_sched OnlineNoCB cpu 3 remaining 4 # 1
>
>OK, so it looks like your system has four CPUs, and rcu_barrier() placed
>callbacks on them all.

No, the system has only two CPUs. It's an Intel Core 2 Duo
E8400, and /proc/cpuinfo agrees that there are only 2. There is a
potentially relevant-sounding message early in dmesg that says:

[ 0.000000] smpboot: Allowing 4 CPUs, 2 hotplug CPUs

>> ovs-vswitchd-902 [000] .... 471.778454: rcu_barrier: rcu_sched Inc2 cpu -1 remaining 4 # 2
>
>The above removes the extra count used to avoid races between posting new
>callbacks and completion of previously posted callbacks.
>
>> rcuos/0-9 [000] ..s. 471.793150: rcu_barrier: rcu_sched CB cpu -1 remaining 3 # 2
>> rcuos/1-18 [001] ..s. 471.793308: rcu_barrier: rcu_sched CB cpu -1 remaining 2 # 2
>
>Two of the four callbacks fired, but the other two appear to be AWOL.
>And rcu_barrier() won't return until they all fire.
>
>> I let it sit through several "hung task" cycles but that was all
>> there was for rcu:rcu_barrier.
>>
>> I should have ftrace with the patch as soon as the kernel is
>> done building, then I can try the below patch (I'll start it building
>> now).
>
>Sounds very good, looking forward to hearing of the results.

Going to bounce it for ftrace now, but the cpu count mismatch
seemed important enough to mention separately.

-J

---
-Jay Vosburgh, [email protected]
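[Editorial note: the counting scheme visible in the rcu_barrier trace above (Inc1 takes an extra count before callbacks are posted, each OnlineNoCB posting adds one, each CB firing drops one, Inc2 drops the extra count) can be illustrated with a userspace analogy. This is only a sketch of the pattern, not kernel code; the class and method names are invented for illustration:]

```python
import threading

class CallbackBarrier:
    """Userspace analogy of the rcu_barrier() count: the counter starts
    at 1 (an extra count) so that callbacks firing early, while others
    are still being posted, cannot drop the count to zero prematurely."""

    def __init__(self):
        self._lock = threading.Lock()
        self._count = 1                   # the extra count ("Inc1")
        self._done = threading.Event()

    def post_callback(self):              # "OnlineNoCB": one more pending
        with self._lock:
            self._count += 1

    def callback_fired(self):             # "CB": a posted callback ran
        self._dec()

    def finish_posting(self):             # "Inc2": drop the extra count
        self._dec()

    def _dec(self):
        with self._lock:
            self._count -= 1
            if self._count == 0:
                self._done.set()

    def wait(self, timeout=None):         # blocks until count reaches zero
        return self._done.wait(timeout)
```

With four callbacks posted but only two ever firing, as in the trace above, wait() never returns, which is exactly the observed hang.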

2014-10-24 23:03:24

by Paul E. McKenney

[permalink] [raw]
Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

On Fri, Oct 24, 2014 at 03:34:07PM -0700, Jay Vosburgh wrote:
> Paul E. McKenney <[email protected]> wrote:
>
> >On Sat, Oct 25, 2014 at 12:25:57AM +0300, Yanko Kaneti wrote:
> >> On Fri-10/24/14-2014 11:32, Paul E. McKenney wrote:
> >> > On Fri, Oct 24, 2014 at 08:35:26PM +0300, Yanko Kaneti wrote:
> >> > > On Fri-10/24/14-2014 10:20, Paul E. McKenney wrote:
> >
> >[ . . . ]
> >
> >> > > > Well, if you are feeling aggressive, give the following patch a spin.
> >> > > > I am doing sanity tests on it in the meantime.
> >> > >
> >> > > Doesn't seem to make a difference here
> >> >
> >> > OK, inspection isn't cutting it, so time for tracing. Does the system
> >> > respond to user input? If so, please enable rcu:rcu_barrier ftrace before
> >> > the problem occurs, then dump the trace buffer after the problem occurs.
> >>
> >> Sorry for being unresponsive here, but I know next to nothing about tracing
> >> or most things about the kernel, so I have some catching up to do.
> >>
> >> In the meantime some layman observations while I tried to find what exactly
> >> triggers the problem.
> >> - Even in runlevel 1 I can reliably trigger the problem by starting libvirtd
> >> - libvirtd seems to be very active in using all sorts of kernel facilities
> >> that are modules on fedora so it seems to cause many simultaneous kworker
> >> calls to modprobe
> >> - there are 8 kworker/u16 from 0 to 7
> >> - one of these kworkers always deadlocks, while there appear to be two
> >> kworker/u16:6 - the seventh
> >
> >Adding Tejun on CC in case this duplication of kworker/u16:6 is important.
> >
> >> 6 vs 8, as in there are now 6 rcuos kthreads where before there were always 8
> >>
> >> Just observations from someone who still doesn't know what the u16
> >> kworkers are..
> >
> >Could you please run the following diagnostic patch? This will help
> >me see if I have managed to miswire the rcuo kthreads. It should
> >print some information at task-hang time.
>
> Here's the output of the patch; I let it sit through two hang
> cycles.
>
> -J
>
>
> [ 240.348020] INFO: task ovs-vswitchd:902 blocked for more than 120 seconds.
> [ 240.354878] Not tainted 3.17.0-testola+ #4
> [ 240.359481] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 240.367285] ovs-vswitchd D ffff88013fc94600 0 902 901 0x00000004
> [ 240.367290] ffff8800ab20f7b8 0000000000000002 ffff8800b3304b00 ffff8800ab20ffd8
> [ 240.367293] 0000000000014600 0000000000014600 ffff8800b0810000 ffff8800b3304b00
> [ 240.367296] ffff8800b3304b00 ffffffff81c59850 ffffffff81c59858 7fffffffffffffff
> [ 240.367300] Call Trace:
> [ 240.367307] [<ffffffff81722b99>] schedule+0x29/0x70
> [ 240.367310] [<ffffffff81725b6c>] schedule_timeout+0x1dc/0x260
> [ 240.367313] [<ffffffff81722f69>] ? _cond_resched+0x29/0x40
> [ 240.367316] [<ffffffff81723818>] ? wait_for_completion+0x28/0x160
> [ 240.367321] [<ffffffff811081a7>] ? queue_stop_cpus_work+0xc7/0xe0
> [ 240.367324] [<ffffffff81723896>] wait_for_completion+0xa6/0x160
> [ 240.367328] [<ffffffff81099980>] ? wake_up_state+0x20/0x20
> [ 240.367331] [<ffffffff810d0ecc>] _rcu_barrier+0x20c/0x480
> [ 240.367334] [<ffffffff810d1195>] rcu_barrier+0x15/0x20
> [ 240.367338] [<ffffffff81625010>] netdev_run_todo+0x60/0x300
> [ 240.367341] [<ffffffff8162f9ee>] rtnl_unlock+0xe/0x10
> [ 240.367349] [<ffffffffa01ffcc5>] internal_dev_destroy+0x55/0x80 [openvswitch]
> [ 240.367354] [<ffffffffa01ff622>] ovs_vport_del+0x32/0x40 [openvswitch]
> [ 240.367358] [<ffffffffa01f8dd0>] ovs_dp_detach_port+0x30/0x40 [openvswitch]
> [ 240.367363] [<ffffffffa01f8ea5>] ovs_vport_cmd_del+0xc5/0x110 [openvswitch]
> [ 240.367367] [<ffffffff81651d75>] genl_family_rcv_msg+0x1a5/0x3c0
> [ 240.367370] [<ffffffff81651f90>] ? genl_family_rcv_msg+0x3c0/0x3c0
> [ 240.367372] [<ffffffff81652021>] genl_rcv_msg+0x91/0xd0
> [ 240.367376] [<ffffffff81650091>] netlink_rcv_skb+0xc1/0xe0
> [ 240.367378] [<ffffffff816505bc>] genl_rcv+0x2c/0x40
> [ 240.367381] [<ffffffff8164f626>] netlink_unicast+0xf6/0x200
> [ 240.367383] [<ffffffff8164fa4d>] netlink_sendmsg+0x31d/0x780
> [ 240.367387] [<ffffffff8164ca74>] ? netlink_rcv_wake+0x44/0x60
> [ 240.367391] [<ffffffff81606a53>] sock_sendmsg+0x93/0xd0
> [ 240.367395] [<ffffffff81337700>] ? apparmor_capable+0x60/0x60
> [ 240.367399] [<ffffffff81614f27>] ? verify_iovec+0x47/0xd0
> [ 240.367402] [<ffffffff81606e79>] ___sys_sendmsg+0x399/0x3b0
> [ 240.367406] [<ffffffff812598a2>] ? kernfs_seq_stop_active+0x32/0x40
> [ 240.367410] [<ffffffff8101c385>] ? native_sched_clock+0x35/0x90
> [ 240.367413] [<ffffffff8101c385>] ? native_sched_clock+0x35/0x90
> [ 240.367416] [<ffffffff8101c3e9>] ? sched_clock+0x9/0x10
> [ 240.367420] [<ffffffff811277fc>] ? acct_account_cputime+0x1c/0x20
> [ 240.367424] [<ffffffff8109ce6b>] ? account_user_time+0x8b/0xa0
> [ 240.367428] [<ffffffff81200bd5>] ? __fget_light+0x25/0x70
> [ 240.367431] [<ffffffff81607c02>] __sys_sendmsg+0x42/0x80
> [ 240.367433] [<ffffffff81607c52>] SyS_sendmsg+0x12/0x20
> [ 240.367436] [<ffffffff81727464>] tracesys_phase2+0xd8/0xdd
> [ 240.367439] rcu_show_nocb_setup(): rcu_sched nocb state:
> [ 240.372734] 0: ffff88013fc0e600 l:ffff88013fc0e600 n:ffff88013fc8e600 .G.
> [ 240.379673] 1: ffff88013fc8e600 l:ffff88013fc0e600 n: (null) .G.
> [ 240.386611] 2: ffff88013fd0e600 l:ffff88013fd0e600 n:ffff88013fd8e600 N..
> [ 240.393550] 3: ffff88013fd8e600 l:ffff88013fd0e600 n: (null) N..
> [ 240.400489] rcu_show_nocb_setup(): rcu_bh nocb state:
> [ 240.405525] 0: ffff88013fc0e3c0 l:ffff88013fc0e3c0 n:ffff88013fc8e3c0 ...
> [ 240.412463] 1: ffff88013fc8e3c0 l:ffff88013fc0e3c0 n: (null) ...
> [ 240.419401] 2: ffff88013fd0e3c0 l:ffff88013fd0e3c0 n:ffff88013fd8e3c0 ...
> [ 240.426339] 3: ffff88013fd8e3c0 l:ffff88013fd0e3c0 n: (null) ...
> [ 360.432020] INFO: task ovs-vswitchd:902 blocked for more than 120 seconds.
> [ 360.438881] Not tainted 3.17.0-testola+ #4
> [ 360.443484] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 360.451289] ovs-vswitchd D ffff88013fc94600 0 902 901 0x00000004
> [ 360.451293] ffff8800ab20f7b8 0000000000000002 ffff8800b3304b00 ffff8800ab20ffd8
> [ 360.451297] 0000000000014600 0000000000014600 ffff8800b0810000 ffff8800b3304b00
> [ 360.451300] ffff8800b3304b00 ffffffff81c59850 ffffffff81c59858 7fffffffffffffff
> [ 360.451303] Call Trace:
> [ 360.451311] [<ffffffff81722b99>] schedule+0x29/0x70
> [ 360.451314] [<ffffffff81725b6c>] schedule_timeout+0x1dc/0x260
> [ 360.451317] [<ffffffff81722f69>] ? _cond_resched+0x29/0x40
> [ 360.451320] [<ffffffff81723818>] ? wait_for_completion+0x28/0x160
> [ 360.451325] [<ffffffff811081a7>] ? queue_stop_cpus_work+0xc7/0xe0
> [ 360.451327] [<ffffffff81723896>] wait_for_completion+0xa6/0x160
> [ 360.451331] [<ffffffff81099980>] ? wake_up_state+0x20/0x20
> [ 360.451335] [<ffffffff810d0ecc>] _rcu_barrier+0x20c/0x480
> [ 360.451338] [<ffffffff810d1195>] rcu_barrier+0x15/0x20
> [ 360.451342] [<ffffffff81625010>] netdev_run_todo+0x60/0x300
> [ 360.451345] [<ffffffff8162f9ee>] rtnl_unlock+0xe/0x10
> [ 360.451353] [<ffffffffa01ffcc5>] internal_dev_destroy+0x55/0x80 [openvswitch]
> [ 360.451358] [<ffffffffa01ff622>] ovs_vport_del+0x32/0x40 [openvswitch]
> [ 360.451362] [<ffffffffa01f8dd0>] ovs_dp_detach_port+0x30/0x40 [openvswitch]
> [ 360.451366] [<ffffffffa01f8ea5>] ovs_vport_cmd_del+0xc5/0x110 [openvswitch]
> [ 360.451370] [<ffffffff81651d75>] genl_family_rcv_msg+0x1a5/0x3c0
> [ 360.451373] [<ffffffff81651f90>] ? genl_family_rcv_msg+0x3c0/0x3c0
> [ 360.451376] [<ffffffff81652021>] genl_rcv_msg+0x91/0xd0
> [ 360.451379] [<ffffffff81650091>] netlink_rcv_skb+0xc1/0xe0
> [ 360.451381] [<ffffffff816505bc>] genl_rcv+0x2c/0x40
> [ 360.451384] [<ffffffff8164f626>] netlink_unicast+0xf6/0x200
> [ 360.451387] [<ffffffff8164fa4d>] netlink_sendmsg+0x31d/0x780
> [ 360.451390] [<ffffffff8164ca74>] ? netlink_rcv_wake+0x44/0x60
> [ 360.451394] [<ffffffff81606a53>] sock_sendmsg+0x93/0xd0
> [ 360.451399] [<ffffffff81337700>] ? apparmor_capable+0x60/0x60
> [ 360.451402] [<ffffffff81614f27>] ? verify_iovec+0x47/0xd0
> [ 360.451406] [<ffffffff81606e79>] ___sys_sendmsg+0x399/0x3b0
> [ 360.451410] [<ffffffff812598a2>] ? kernfs_seq_stop_active+0x32/0x40
> [ 360.451414] [<ffffffff8101c385>] ? native_sched_clock+0x35/0x90
> [ 360.451417] [<ffffffff8101c385>] ? native_sched_clock+0x35/0x90
> [ 360.451419] [<ffffffff8101c3e9>] ? sched_clock+0x9/0x10
> [ 360.451424] [<ffffffff811277fc>] ? acct_account_cputime+0x1c/0x20
> [ 360.451427] [<ffffffff8109ce6b>] ? account_user_time+0x8b/0xa0
> [ 360.451431] [<ffffffff81200bd5>] ? __fget_light+0x25/0x70
> [ 360.451434] [<ffffffff81607c02>] __sys_sendmsg+0x42/0x80
> [ 360.451437] [<ffffffff81607c52>] SyS_sendmsg+0x12/0x20
> [ 360.451440] [<ffffffff81727464>] tracesys_phase2+0xd8/0xdd
> [ 360.451442] rcu_show_nocb_setup(): rcu_sched nocb state:
> [ 360.456737] 0: ffff88013fc0e600 l:ffff88013fc0e600 n:ffff88013fc8e600 ...
> [ 360.463676] 1: ffff88013fc8e600 l:ffff88013fc0e600 n: (null) ...
> [ 360.470614] 2: ffff88013fd0e600 l:ffff88013fd0e600 n:ffff88013fd8e600 N..
> [ 360.477554] 3: ffff88013fd8e600 l:ffff88013fd0e600 n: (null) N..

Hmmm... It sure looks like we have some callbacks stuck here. I clearly
need to take a hard look at the sleep/wakeup code.

Thank you for running this!!!

Thanx, Paul

> [ 360.484494] rcu_show_nocb_setup(): rcu_bh nocb state:
> [ 360.489529] 0: ffff88013fc0e3c0 l:ffff88013fc0e3c0 n:ffff88013fc8e3c0 ...
> [ 360.496469] 1: ffff88013fc8e3c0 l:ffff88013fc0e3c0 n: (null) .G.
> [ 360.503407] 2: ffff88013fd0e3c0 l:ffff88013fd0e3c0 n:ffff88013fd8e3c0 ...
> [ 360.510346] 3: ffff88013fd8e3c0 l:ffff88013fd0e3c0 n: (null) ...
>
> ---
> -Jay Vosburgh, [email protected]

2014-10-24 23:09:17

by Paul E. McKenney

[permalink] [raw]
Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

On Fri, Oct 24, 2014 at 03:59:31PM -0700, Paul E. McKenney wrote:
> On Fri, Oct 24, 2014 at 03:34:07PM -0700, Jay Vosburgh wrote:
> > Paul E. McKenney <[email protected]> wrote:
> >
> > >On Sat, Oct 25, 2014 at 12:25:57AM +0300, Yanko Kaneti wrote:
> > >> On Fri-10/24/14-2014 11:32, Paul E. McKenney wrote:
> > >> > On Fri, Oct 24, 2014 at 08:35:26PM +0300, Yanko Kaneti wrote:
> > >> > > On Fri-10/24/14-2014 10:20, Paul E. McKenney wrote:
> > >
> > >[ . . . ]
> > >
> > >> > > > Well, if you are feeling aggressive, give the following patch a spin.
> > >> > > > I am doing sanity tests on it in the meantime.
> > >> > >
> > >> > > Doesn't seem to make a difference here
> > >> >
> > >> > OK, inspection isn't cutting it, so time for tracing. Does the system
> > >> > respond to user input? If so, please enable rcu:rcu_barrier ftrace before
> > >> > the problem occurs, then dump the trace buffer after the problem occurs.
> > >>
> > >> Sorry for being unresponsive here, but I know next to nothing about tracing
> > >> or most things about the kernel, so I have some catching up to do.
> > >>
> > >> In the meantime some layman observations while I tried to find what exactly
> > >> triggers the problem.
> > >> - Even in runlevel 1 I can reliably trigger the problem by starting libvirtd
> > >> - libvirtd seems to be very active in using all sorts of kernel facilities
> > >> that are modules on fedora so it seems to cause many simultaneous kworker
> > >> calls to modprobe
> > >> - there are 8 kworker/u16 from 0 to 7
> > >> - one of these kworkers always deadlocks, while there appear to be two
> > >> kworker/u16:6 - the seventh
> > >
> > >Adding Tejun on CC in case this duplication of kworker/u16:6 is important.
> > >
> > >> 6 vs 8, as in there are now 6 rcuos kthreads where before there were always 8
> > >>
> > >> Just observations from someone who still doesn't know what the u16
> > >> kworkers are..
> > >
> > >Could you please run the following diagnostic patch? This will help
> > >me see if I have managed to miswire the rcuo kthreads. It should
> > >print some information at task-hang time.
> >
> > Here's the output of the patch; I let it sit through two hang
> > cycles.
> >
> > -J
> >
> >
> > [ 240.348020] INFO: task ovs-vswitchd:902 blocked for more than 120 seconds.
> > [ 240.354878] Not tainted 3.17.0-testola+ #4
> > [ 240.359481] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [ 240.367285] ovs-vswitchd D ffff88013fc94600 0 902 901 0x00000004
> > [ 240.367290] ffff8800ab20f7b8 0000000000000002 ffff8800b3304b00 ffff8800ab20ffd8
> > [ 240.367293] 0000000000014600 0000000000014600 ffff8800b0810000 ffff8800b3304b00
> > [ 240.367296] ffff8800b3304b00 ffffffff81c59850 ffffffff81c59858 7fffffffffffffff
> > [ 240.367300] Call Trace:
> > [ 240.367307] [<ffffffff81722b99>] schedule+0x29/0x70
> > [ 240.367310] [<ffffffff81725b6c>] schedule_timeout+0x1dc/0x260
> > [ 240.367313] [<ffffffff81722f69>] ? _cond_resched+0x29/0x40
> > [ 240.367316] [<ffffffff81723818>] ? wait_for_completion+0x28/0x160
> > [ 240.367321] [<ffffffff811081a7>] ? queue_stop_cpus_work+0xc7/0xe0
> > [ 240.367324] [<ffffffff81723896>] wait_for_completion+0xa6/0x160
> > [ 240.367328] [<ffffffff81099980>] ? wake_up_state+0x20/0x20
> > [ 240.367331] [<ffffffff810d0ecc>] _rcu_barrier+0x20c/0x480
> > [ 240.367334] [<ffffffff810d1195>] rcu_barrier+0x15/0x20
> > [ 240.367338] [<ffffffff81625010>] netdev_run_todo+0x60/0x300
> > [ 240.367341] [<ffffffff8162f9ee>] rtnl_unlock+0xe/0x10
> > [ 240.367349] [<ffffffffa01ffcc5>] internal_dev_destroy+0x55/0x80 [openvswitch]
> > [ 240.367354] [<ffffffffa01ff622>] ovs_vport_del+0x32/0x40 [openvswitch]
> > [ 240.367358] [<ffffffffa01f8dd0>] ovs_dp_detach_port+0x30/0x40 [openvswitch]
> > [ 240.367363] [<ffffffffa01f8ea5>] ovs_vport_cmd_del+0xc5/0x110 [openvswitch]
> > [ 240.367367] [<ffffffff81651d75>] genl_family_rcv_msg+0x1a5/0x3c0
> > [ 240.367370] [<ffffffff81651f90>] ? genl_family_rcv_msg+0x3c0/0x3c0
> > [ 240.367372] [<ffffffff81652021>] genl_rcv_msg+0x91/0xd0
> > [ 240.367376] [<ffffffff81650091>] netlink_rcv_skb+0xc1/0xe0
> > [ 240.367378] [<ffffffff816505bc>] genl_rcv+0x2c/0x40
> > [ 240.367381] [<ffffffff8164f626>] netlink_unicast+0xf6/0x200
> > [ 240.367383] [<ffffffff8164fa4d>] netlink_sendmsg+0x31d/0x780
> > [ 240.367387] [<ffffffff8164ca74>] ? netlink_rcv_wake+0x44/0x60
> > [ 240.367391] [<ffffffff81606a53>] sock_sendmsg+0x93/0xd0
> > [ 240.367395] [<ffffffff81337700>] ? apparmor_capable+0x60/0x60
> > [ 240.367399] [<ffffffff81614f27>] ? verify_iovec+0x47/0xd0
> > [ 240.367402] [<ffffffff81606e79>] ___sys_sendmsg+0x399/0x3b0
> > [ 240.367406] [<ffffffff812598a2>] ? kernfs_seq_stop_active+0x32/0x40
> > [ 240.367410] [<ffffffff8101c385>] ? native_sched_clock+0x35/0x90
> > [ 240.367413] [<ffffffff8101c385>] ? native_sched_clock+0x35/0x90
> > [ 240.367416] [<ffffffff8101c3e9>] ? sched_clock+0x9/0x10
> > [ 240.367420] [<ffffffff811277fc>] ? acct_account_cputime+0x1c/0x20
> > [ 240.367424] [<ffffffff8109ce6b>] ? account_user_time+0x8b/0xa0
> > [ 240.367428] [<ffffffff81200bd5>] ? __fget_light+0x25/0x70
> > [ 240.367431] [<ffffffff81607c02>] __sys_sendmsg+0x42/0x80
> > [ 240.367433] [<ffffffff81607c52>] SyS_sendmsg+0x12/0x20
> > [ 240.367436] [<ffffffff81727464>] tracesys_phase2+0xd8/0xdd
> > [ 240.367439] rcu_show_nocb_setup(): rcu_sched nocb state:
> > [ 240.372734] 0: ffff88013fc0e600 l:ffff88013fc0e600 n:ffff88013fc8e600 .G.
> > [ 240.379673] 1: ffff88013fc8e600 l:ffff88013fc0e600 n: (null) .G.
> > [ 240.386611] 2: ffff88013fd0e600 l:ffff88013fd0e600 n:ffff88013fd8e600 N..
> > [ 240.393550] 3: ffff88013fd8e600 l:ffff88013fd0e600 n: (null) N..
> > [ 240.400489] rcu_show_nocb_setup(): rcu_bh nocb state:
> > [ 240.405525] 0: ffff88013fc0e3c0 l:ffff88013fc0e3c0 n:ffff88013fc8e3c0 ...
> > [ 240.412463] 1: ffff88013fc8e3c0 l:ffff88013fc0e3c0 n: (null) ...
> > [ 240.419401] 2: ffff88013fd0e3c0 l:ffff88013fd0e3c0 n:ffff88013fd8e3c0 ...
> > [ 240.426339] 3: ffff88013fd8e3c0 l:ffff88013fd0e3c0 n: (null) ...
> > [ 360.432020] INFO: task ovs-vswitchd:902 blocked for more than 120 seconds.
> > [ 360.438881] Not tainted 3.17.0-testola+ #4
> > [ 360.443484] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [ 360.451289] ovs-vswitchd D ffff88013fc94600 0 902 901 0x00000004
> > [ 360.451293] ffff8800ab20f7b8 0000000000000002 ffff8800b3304b00 ffff8800ab20ffd8
> > [ 360.451297] 0000000000014600 0000000000014600 ffff8800b0810000 ffff8800b3304b00
> > [ 360.451300] ffff8800b3304b00 ffffffff81c59850 ffffffff81c59858 7fffffffffffffff
> > [ 360.451303] Call Trace:
> > [ 360.451311] [<ffffffff81722b99>] schedule+0x29/0x70
> > [ 360.451314] [<ffffffff81725b6c>] schedule_timeout+0x1dc/0x260
> > [ 360.451317] [<ffffffff81722f69>] ? _cond_resched+0x29/0x40
> > [ 360.451320] [<ffffffff81723818>] ? wait_for_completion+0x28/0x160
> > [ 360.451325] [<ffffffff811081a7>] ? queue_stop_cpus_work+0xc7/0xe0
> > [ 360.451327] [<ffffffff81723896>] wait_for_completion+0xa6/0x160
> > [ 360.451331] [<ffffffff81099980>] ? wake_up_state+0x20/0x20
> > [ 360.451335] [<ffffffff810d0ecc>] _rcu_barrier+0x20c/0x480
> > [ 360.451338] [<ffffffff810d1195>] rcu_barrier+0x15/0x20
> > [ 360.451342] [<ffffffff81625010>] netdev_run_todo+0x60/0x300
> > [ 360.451345] [<ffffffff8162f9ee>] rtnl_unlock+0xe/0x10
> > [ 360.451353] [<ffffffffa01ffcc5>] internal_dev_destroy+0x55/0x80 [openvswitch]
> > [ 360.451358] [<ffffffffa01ff622>] ovs_vport_del+0x32/0x40 [openvswitch]
> > [ 360.451362] [<ffffffffa01f8dd0>] ovs_dp_detach_port+0x30/0x40 [openvswitch]
> > [ 360.451366] [<ffffffffa01f8ea5>] ovs_vport_cmd_del+0xc5/0x110 [openvswitch]
> > [ 360.451370] [<ffffffff81651d75>] genl_family_rcv_msg+0x1a5/0x3c0
> > [ 360.451373] [<ffffffff81651f90>] ? genl_family_rcv_msg+0x3c0/0x3c0
> > [ 360.451376] [<ffffffff81652021>] genl_rcv_msg+0x91/0xd0
> > [ 360.451379] [<ffffffff81650091>] netlink_rcv_skb+0xc1/0xe0
> > [ 360.451381] [<ffffffff816505bc>] genl_rcv+0x2c/0x40
> > [ 360.451384] [<ffffffff8164f626>] netlink_unicast+0xf6/0x200
> > [ 360.451387] [<ffffffff8164fa4d>] netlink_sendmsg+0x31d/0x780
> > [ 360.451390] [<ffffffff8164ca74>] ? netlink_rcv_wake+0x44/0x60
> > [ 360.451394] [<ffffffff81606a53>] sock_sendmsg+0x93/0xd0
> > [ 360.451399] [<ffffffff81337700>] ? apparmor_capable+0x60/0x60
> > [ 360.451402] [<ffffffff81614f27>] ? verify_iovec+0x47/0xd0
> > [ 360.451406] [<ffffffff81606e79>] ___sys_sendmsg+0x399/0x3b0
> > [ 360.451410] [<ffffffff812598a2>] ? kernfs_seq_stop_active+0x32/0x40
> > [ 360.451414] [<ffffffff8101c385>] ? native_sched_clock+0x35/0x90
> > [ 360.451417] [<ffffffff8101c385>] ? native_sched_clock+0x35/0x90
> > [ 360.451419] [<ffffffff8101c3e9>] ? sched_clock+0x9/0x10
> > [ 360.451424] [<ffffffff811277fc>] ? acct_account_cputime+0x1c/0x20
> > [ 360.451427] [<ffffffff8109ce6b>] ? account_user_time+0x8b/0xa0
> > [ 360.451431] [<ffffffff81200bd5>] ? __fget_light+0x25/0x70
> > [ 360.451434] [<ffffffff81607c02>] __sys_sendmsg+0x42/0x80
> > [ 360.451437] [<ffffffff81607c52>] SyS_sendmsg+0x12/0x20
> > [ 360.451440] [<ffffffff81727464>] tracesys_phase2+0xd8/0xdd
> > [ 360.451442] rcu_show_nocb_setup(): rcu_sched nocb state:
> > [ 360.456737] 0: ffff88013fc0e600 l:ffff88013fc0e600 n:ffff88013fc8e600 ...
> > [ 360.463676] 1: ffff88013fc8e600 l:ffff88013fc0e600 n: (null) ...
> > [ 360.470614] 2: ffff88013fd0e600 l:ffff88013fd0e600 n:ffff88013fd8e600 N..
> > [ 360.477554] 3: ffff88013fd8e600 l:ffff88013fd0e600 n: (null) N..
>
> Hmmm... It sure looks like we have some callbacks stuck here. I clearly
> need to take a hard look at the sleep/wakeup code.
>
> Thank you for running this!!!

Could you please try the following patch? If no joy, could you please
add rcu:rcu_nocb_wake to the list of ftrace events?

Thanx, Paul

------------------------------------------------------------------------

rcu: Kick rcuo kthreads after their CPU goes offline

If a no-CBs CPU were to post an RCU callback with interrupts disabled
after it entered the idle loop for the last time, there might be no
deferred wakeup for the corresponding rcuo kthreads. This commit
therefore adds a set of calls to do_nocb_deferred_wakeup() after the
CPU has gone completely offline.

Signed-off-by: Paul E. McKenney <[email protected]>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 84b41b3c6ebd..f6880052b917 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3493,8 +3493,10 @@ static int rcu_cpu_notify(struct notifier_block *self,
case CPU_DEAD_FROZEN:
case CPU_UP_CANCELED:
case CPU_UP_CANCELED_FROZEN:
- for_each_rcu_flavor(rsp)
+ for_each_rcu_flavor(rsp) {
rcu_cleanup_dead_cpu(cpu, rsp);
+ do_nocb_deferred_wakeup(per_cpu_ptr(rsp->rda, cpu));
+ }
break;
default:
break;

2014-10-25 00:21:06

by Jay Vosburgh

[permalink] [raw]
Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

Paul E. McKenney <[email protected]> wrote:

>On Fri, Oct 24, 2014 at 03:59:31PM -0700, Paul E. McKenney wrote:
[...]
>> Hmmm... It sure looks like we have some callbacks stuck here. I clearly
>> need to take a hard look at the sleep/wakeup code.
>>
>> Thank you for running this!!!
>
>Could you please try the following patch? If no joy, could you please
>add rcu:rcu_nocb_wake to the list of ftrace events?

I tried the patch, it did not change the behavior.

I enabled the rcu:rcu_barrier and rcu:rcu_nocb_wake tracepoints
and ran it again (with this patch and the first patch from earlier
today); the trace output is a bit on the large side so I put it and the
dmesg log at:

http://people.canonical.com/~jvosburgh/nocb-wake-dmesg.txt

http://people.canonical.com/~jvosburgh/nocb-wake-trace.txt
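
[Editorial note: enabling those tracepoints amounts to writing "1" into the per-event enable files under tracefs. A minimal sketch follows; the helper names are invented, the tracefs directory is parameterized for testability, and /sys/kernel/debug/tracing is assumed to be the usual mount point (root required on a real system):]

```python
from pathlib import Path

def enable_rcu_tracepoints(tracing_dir, events=("rcu_barrier", "rcu_nocb_wake")):
    """Enable the given rcu:* tracepoints under a tracefs directory."""
    for event in events:
        enable = Path(tracing_dir) / "events" / "rcu" / event / "enable"
        enable.write_text("1")

def dump_trace(tracing_dir):
    """Return the current contents of the trace buffer."""
    return (Path(tracing_dir) / "trace").read_text()

# On a real system (as root):
#   enable_rcu_tracepoints("/sys/kernel/debug/tracing")
#   ... reproduce the hang ...
#   print(dump_trace("/sys/kernel/debug/tracing"))
```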

-J


> Thanx, Paul
>
>------------------------------------------------------------------------
>
>rcu: Kick rcuo kthreads after their CPU goes offline
>
>If a no-CBs CPU were to post an RCU callback with interrupts disabled
>after it entered the idle loop for the last time, there might be no
>deferred wakeup for the corresponding rcuo kthreads. This commit
>therefore adds a set of calls to do_nocb_deferred_wakeup() after the
>CPU has gone completely offline.
>
>Signed-off-by: Paul E. McKenney <[email protected]>
>
>diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>index 84b41b3c6ebd..f6880052b917 100644
>--- a/kernel/rcu/tree.c
>+++ b/kernel/rcu/tree.c
>@@ -3493,8 +3493,10 @@ static int rcu_cpu_notify(struct notifier_block *self,
> case CPU_DEAD_FROZEN:
> case CPU_UP_CANCELED:
> case CPU_UP_CANCELED_FROZEN:
>- for_each_rcu_flavor(rsp)
>+ for_each_rcu_flavor(rsp) {
> rcu_cleanup_dead_cpu(cpu, rsp);
>+ do_nocb_deferred_wakeup(per_cpu_ptr(rsp->rda, cpu));
>+ }
> break;
> default:
> break;
>

---
-Jay Vosburgh, [email protected]

2014-10-25 02:07:20

by Paul E. McKenney

[permalink] [raw]
Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

On Fri, Oct 24, 2014 at 05:20:48PM -0700, Jay Vosburgh wrote:
> Paul E. McKenney <[email protected]> wrote:
>
> >On Fri, Oct 24, 2014 at 03:59:31PM -0700, Paul E. McKenney wrote:
> [...]
> >> Hmmm... It sure looks like we have some callbacks stuck here. I clearly
> >> need to take a hard look at the sleep/wakeup code.
> >>
> >> Thank you for running this!!!
> >
> >Could you please try the following patch? If no joy, could you please
> >add rcu:rcu_nocb_wake to the list of ftrace events?
>
> I tried the patch, it did not change the behavior.
>
> I enabled the rcu:rcu_barrier and rcu:rcu_nocb_wake tracepoints
> and ran it again (with this patch and the first patch from earlier
> today); the trace output is a bit on the large side so I put it and the
> dmesg log at:
>
> http://people.canonical.com/~jvosburgh/nocb-wake-dmesg.txt
>
> http://people.canonical.com/~jvosburgh/nocb-wake-trace.txt

Thank you again!

Very strange part of the trace. The only signs of CPUs 2 and 3 are:

ovs-vswitchd-902 [000] .... 109.896840: rcu_barrier: rcu_sched Begin cpu -1 remaining 0 # 0
ovs-vswitchd-902 [000] .... 109.896840: rcu_barrier: rcu_sched Check cpu -1 remaining 0 # 0
ovs-vswitchd-902 [000] .... 109.896841: rcu_barrier: rcu_sched Inc1 cpu -1 remaining 0 # 1
ovs-vswitchd-902 [000] .... 109.896841: rcu_barrier: rcu_sched OnlineNoCB cpu 0 remaining 1 # 1
ovs-vswitchd-902 [000] d... 109.896841: rcu_nocb_wake: rcu_sched 0 WakeNot
ovs-vswitchd-902 [000] .... 109.896841: rcu_barrier: rcu_sched OnlineNoCB cpu 1 remaining 2 # 1
ovs-vswitchd-902 [000] d... 109.896841: rcu_nocb_wake: rcu_sched 1 WakeNot
ovs-vswitchd-902 [000] .... 109.896842: rcu_barrier: rcu_sched OnlineNoCB cpu 2 remaining 3 # 1
ovs-vswitchd-902 [000] d... 109.896842: rcu_nocb_wake: rcu_sched 2 WakeNotPoll
ovs-vswitchd-902 [000] .... 109.896842: rcu_barrier: rcu_sched OnlineNoCB cpu 3 remaining 4 # 1
ovs-vswitchd-902 [000] d... 109.896842: rcu_nocb_wake: rcu_sched 3 WakeNotPoll
ovs-vswitchd-902 [000] .... 109.896843: rcu_barrier: rcu_sched Inc2 cpu -1 remaining 4 # 2

The pair of WakeNotPoll trace entries says that at that point, RCU believed
that CPU 2's and CPU 3's rcuo kthreads did not exist. :-/

More diagnostics in order...

Thanx, Paul

2014-10-25 18:22:23

by Paul E. McKenney

[permalink] [raw]
Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

On Sat, Oct 25, 2014 at 09:38:16AM -0700, Jay Vosburgh wrote:
> Paul E. McKenney <[email protected]> wrote:
>
> >On Fri, Oct 24, 2014 at 09:33:33PM -0700, Jay Vosburgh wrote:
> >> Looking at the dmesg, the early boot messages seem to be
> >> confused as to how many CPUs there are, e.g.,
> >>
> >> [ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
> >> [ 0.000000] Hierarchical RCU implementation.
> >> [ 0.000000] RCU debugfs-based tracing is enabled.
> >> [ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
> >> [ 0.000000] RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=4.
> >> [ 0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
> >> [ 0.000000] NR_IRQS:16640 nr_irqs:456 0
> >> [ 0.000000] Offload RCU callbacks from all CPUs
> >> [ 0.000000] Offload RCU callbacks from CPUs: 0-3.
> >>
> >> but later shows 2:
> >>
> >> [ 0.233703] x86: Booting SMP configuration:
> >> [ 0.236003] .... node #0, CPUs: #1
> >> [ 0.255528] x86: Booted up 1 node, 2 CPUs
> >>
> >> In any event, the E8400 is a 2 core CPU with no hyperthreading.
> >
> >Well, this might explain some of the difficulties. If RCU decides to wait
> >on CPUs that don't exist, we will of course get a hang. And rcu_barrier()
> >was definitely expecting four CPUs.
> >
> >So what happens if you boot with maxcpus=2? (Or build with
> >CONFIG_NR_CPUS=2.) I suspect that this might avoid the hang. If so,
> >I might have some ideas for a real fix.
>
> Booting with maxcpus=2 makes no difference (the dmesg output is
> the same).
>
> Rebuilding with CONFIG_NR_CPUS=2 makes the problem go away, and
> dmesg has different CPU information at boot:
>
> [ 0.000000] smpboot: 4 Processors exceeds NR_CPUS limit of 2
> [ 0.000000] smpboot: Allowing 2 CPUs, 0 hotplug CPUs
> [...]
> [ 0.000000] setup_percpu: NR_CPUS:2 nr_cpumask_bits:2 nr_cpu_ids:2 nr_node_ids:1
> [...]
> [ 0.000000] Hierarchical RCU implementation.
> [ 0.000000] RCU debugfs-based tracing is enabled.
> [ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
> [ 0.000000] NR_IRQS:4352 nr_irqs:440 0
> [ 0.000000] Offload RCU callbacks from all CPUs
> [ 0.000000] Offload RCU callbacks from CPUs: 0-1.

Thank you -- this confirms my suspicions on the fix, though I must admit
to being surprised that maxcpus made no difference.

Thanx, Paul
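
[Editorial note: the possible-versus-online CPU mismatch driving this hang can be inspected from userspace. A hedged sketch, with the function name invented and the standard sysfs location assumed (parameterized here so it is testable):]

```python
from pathlib import Path

def cpu_masks(sysfs_cpu="/sys/devices/system/cpu"):
    """Return the kernel's 'possible' and 'online' CPU mask strings.

    'possible' sizes nr_cpu_ids, and hence how many CPUs RCU prepares
    for at boot; 'online' is what is actually running.  A mismatch such
    as possible=0-3 vs online=0-1 corresponds to the
    'smpboot: Allowing 4 CPUs, 2 hotplug CPUs' dmesg line seen above.
    """
    base = Path(sysfs_cpu)
    possible = (base / "possible").read_text().strip()
    online = (base / "online").read_text().strip()
    return possible, online
```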

2014-10-25 18:40:58

by Jay Vosburgh

[permalink] [raw]
Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

Paul E. McKenney <[email protected]> wrote:

>On Fri, Oct 24, 2014 at 09:33:33PM -0700, Jay Vosburgh wrote:
>> Looking at the dmesg, the early boot messages seem to be
>> confused as to how many CPUs there are, e.g.,
>>
>> [ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
>> [ 0.000000] Hierarchical RCU implementation.
>> [ 0.000000] RCU debugfs-based tracing is enabled.
>> [ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
>> [ 0.000000] RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=4.
>> [ 0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
>> [ 0.000000] NR_IRQS:16640 nr_irqs:456 0
>> [ 0.000000] Offload RCU callbacks from all CPUs
>> [ 0.000000] Offload RCU callbacks from CPUs: 0-3.
>>
>> but later shows 2:
>>
>> [ 0.233703] x86: Booting SMP configuration:
>> [ 0.236003] .... node #0, CPUs: #1
>> [ 0.255528] x86: Booted up 1 node, 2 CPUs
>>
>> In any event, the E8400 is a 2 core CPU with no hyperthreading.
>
>Well, this might explain some of the difficulties. If RCU decides to wait
>on CPUs that don't exist, we will of course get a hang. And rcu_barrier()
>was definitely expecting four CPUs.
>
>So what happens if you boot with maxcpus=2? (Or build with
>CONFIG_NR_CPUS=2.) I suspect that this might avoid the hang. If so,
>I might have some ideas for a real fix.

Booting with maxcpus=2 makes no difference (the dmesg output is
the same).

Rebuilding with CONFIG_NR_CPUS=2 makes the problem go away, and
dmesg has different CPU information at boot:

[ 0.000000] smpboot: 4 Processors exceeds NR_CPUS limit of 2
[ 0.000000] smpboot: Allowing 2 CPUs, 0 hotplug CPUs
[...]
[ 0.000000] setup_percpu: NR_CPUS:2 nr_cpumask_bits:2 nr_cpu_ids:2 nr_node_ids:1
[...]
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] RCU debugfs-based tracing is enabled.
[ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
[ 0.000000] NR_IRQS:4352 nr_irqs:440 0
[ 0.000000] Offload RCU callbacks from all CPUs
[ 0.000000] Offload RCU callbacks from CPUs: 0-1.

-J

---
-Jay Vosburgh, [email protected]

2014-10-25 18:41:21

by Jay Vosburgh

[permalink] [raw]
Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

Paul E. McKenney <[email protected]> wrote:

>On Fri, Oct 24, 2014 at 05:20:48PM -0700, Jay Vosburgh wrote:
>> Paul E. McKenney <[email protected]> wrote:
>>
>> >On Fri, Oct 24, 2014 at 03:59:31PM -0700, Paul E. McKenney wrote:
>> [...]
>> >> Hmmm... It sure looks like we have some callbacks stuck here. I clearly
>> >> need to take a hard look at the sleep/wakeup code.
>> >>
>> >> Thank you for running this!!!
>> >
>> >Could you please try the following patch? If no joy, could you please
>> >add rcu:rcu_nocb_wake to the list of ftrace events?
>>
>> I tried the patch, it did not change the behavior.
>>
>> I enabled the rcu:rcu_barrier and rcu:rcu_nocb_wake tracepoints
>> and ran it again (with this patch and the first patch from earlier
>> today); the trace output is a bit on the large side so I put it and the
>> dmesg log at:
>>
>> http://people.canonical.com/~jvosburgh/nocb-wake-dmesg.txt
>>
>> http://people.canonical.com/~jvosburgh/nocb-wake-trace.txt
>
>Thank you again!
>
>Very strange part of the trace. The only signs of CPUs 2 and 3 are:
>
> ovs-vswitchd-902 [000] .... 109.896840: rcu_barrier: rcu_sched Begin cpu -1 remaining 0 # 0
> ovs-vswitchd-902 [000] .... 109.896840: rcu_barrier: rcu_sched Check cpu -1 remaining 0 # 0
> ovs-vswitchd-902 [000] .... 109.896841: rcu_barrier: rcu_sched Inc1 cpu -1 remaining 0 # 1
> ovs-vswitchd-902 [000] .... 109.896841: rcu_barrier: rcu_sched OnlineNoCB cpu 0 remaining 1 # 1
> ovs-vswitchd-902 [000] d... 109.896841: rcu_nocb_wake: rcu_sched 0 WakeNot
> ovs-vswitchd-902 [000] .... 109.896841: rcu_barrier: rcu_sched OnlineNoCB cpu 1 remaining 2 # 1
> ovs-vswitchd-902 [000] d... 109.896841: rcu_nocb_wake: rcu_sched 1 WakeNot
> ovs-vswitchd-902 [000] .... 109.896842: rcu_barrier: rcu_sched OnlineNoCB cpu 2 remaining 3 # 1
> ovs-vswitchd-902 [000] d... 109.896842: rcu_nocb_wake: rcu_sched 2 WakeNotPoll
> ovs-vswitchd-902 [000] .... 109.896842: rcu_barrier: rcu_sched OnlineNoCB cpu 3 remaining 4 # 1
> ovs-vswitchd-902 [000] d... 109.896842: rcu_nocb_wake: rcu_sched 3 WakeNotPoll
> ovs-vswitchd-902 [000] .... 109.896843: rcu_barrier: rcu_sched Inc2 cpu -1 remaining 4 # 2
>
>The pair of WakeNotPoll trace entries says that at that point, RCU believed
>that the CPU 2's and CPU 3's rcuo kthreads did not exist. :-/

On the test system I'm using, CPUs 2 and 3 really do not exist;
it is a 2 CPU system (Intel Core 2 Duo E8400). I mentioned this in an
earlier message, but perhaps you missed it in the flurry.

Looking at the dmesg, the early boot messages seem to be
confused as to how many CPUs there are, e.g.,

[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] RCU debugfs-based tracing is enabled.
[ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
[ 0.000000] RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=4.
[ 0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
[ 0.000000] NR_IRQS:16640 nr_irqs:456 0
[ 0.000000] Offload RCU callbacks from all CPUs
[ 0.000000] Offload RCU callbacks from CPUs: 0-3.

but later shows 2:

[ 0.233703] x86: Booting SMP configuration:
[ 0.236003] .... node #0, CPUs: #1
[ 0.255528] x86: Booted up 1 node, 2 CPUs

In any event, the E8400 is a 2 core CPU with no hyperthreading.

-J

---
-Jay Vosburgh, [email protected]

2014-10-25 19:07:06

by Paul E. McKenney

Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

On Sat, Oct 25, 2014 at 03:09:36PM +0300, Yanko Kaneti wrote:
> On Fri-10/24/14-2014 14:49, Paul E. McKenney wrote:
> > On Sat, Oct 25, 2014 at 12:25:57AM +0300, Yanko Kaneti wrote:
> > > On Fri-10/24/14-2014 11:32, Paul E. McKenney wrote:
> > > > On Fri, Oct 24, 2014 at 08:35:26PM +0300, Yanko Kaneti wrote:
> > > > > On Fri-10/24/14-2014 10:20, Paul E. McKenney wrote:
> >
> > [ . . . ]
> >
> > > > > > Well, if you are feeling aggressive, give the following patch a spin.
> > > > > > I am doing sanity tests on it in the meantime.
> > > > >
> > > > > Doesn't seem to make a difference here
> > > >
> > > > OK, inspection isn't cutting it, so time for tracing. Does the system
> > > > respond to user input? If so, please enable rcu:rcu_barrier ftrace before
> > > > the problem occurs, then dump the trace buffer after the problem occurs.
> > >
> > > Sorry for being unresponsive here, but I know next to nothing about tracing
> > > or most things about the kernel, so I have some catching up to do.
> > >
> > > In the meantime some layman observations while I tried to find what exactly
> > > triggers the problem.
> > > - Even in runlevel 1 I can reliably trigger the problem by starting libvirtd
> > > - libvirtd seems to be very active in using all sorts of kernel facilities
> > > that are modules on fedora so it seems to cause many simultaneous kworker
> > > calls to modprobe
> > > - there are 8 kworker/u16 from 0 to 7
> > > - one of these kworkers always deadlocks, while there appear to be two
> > > kworker/u16:6 - the seventh
> >
> > Adding Tejun on CC in case this duplication of kworker/u16:6 is important.
> >
> > > 6 vs 8, as in 6 rcuos where before there were always 8
> > >
> > > Just observations from someone who still doesn't know what the u16
> > > kworkers are..
> >
> > Could you please run the following diagnostic patch? This will help
> > me see if I have managed to miswire the rcuo kthreads. It should
> > print some information at task-hang time.
>
> So here is the output with today's Linux tip and the diagnostic patch.
> This is the case with just starting libvirtd in runlevel 1.

Thank you for testing this!

> Also a snapshot of the kworker/u16 threads
>
> 6 ? S 0:00 \_ [kworker/u16:0]
> 553 ? S 0:00 | \_ [kworker/u16:0]
> 554 ? D 0:00 | \_ /sbin/modprobe -q -- bridge
> 78 ? S 0:00 \_ [kworker/u16:1]
> 92 ? S 0:00 \_ [kworker/u16:2]
> 93 ? S 0:00 \_ [kworker/u16:3]
> 94 ? S 0:00 \_ [kworker/u16:4]
> 95 ? S 0:00 \_ [kworker/u16:5]
> 96 ? D 0:00 \_ [kworker/u16:6]
> 105 ? S 0:00 \_ [kworker/u16:7]
> 108 ? S 0:00 \_ [kworker/u16:8]

You had six CPUs, IIRC, so the last two kworker/u16 kthreads are surplus
to requirements. Not sure if they are causing any trouble, though.

> INFO: task kworker/u16:6:96 blocked for more than 120 seconds.
> Not tainted 3.18.0-rc1+ #16
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kworker/u16:6 D ffff8800ca9ecec0 11552 96 2 0x00000000
> Workqueue: netns cleanup_net
> ffff880221fff9c8 0000000000000096 ffff8800ca9ecec0 00000000001d5f00
> ffff880221ffffd8 00000000001d5f00 ffff880223260000 ffff8800ca9ecec0
> ffffffff82c44010 7fffffffffffffff ffffffff81ee3798 ffffffff81ee3790
> Call Trace:
> [<ffffffff81866219>] schedule+0x29/0x70
> [<ffffffff8186b43c>] schedule_timeout+0x26c/0x410
> [<ffffffff81028bea>] ? native_sched_clock+0x2a/0xa0
> [<ffffffff8110748c>] ? mark_held_locks+0x7c/0xb0
> [<ffffffff8186c4c0>] ? _raw_spin_unlock_irq+0x30/0x50
> [<ffffffff8110761d>] ? trace_hardirqs_on_caller+0x15d/0x200
> [<ffffffff81867c4c>] wait_for_completion+0x10c/0x150
> [<ffffffff810e4dc0>] ? wake_up_state+0x20/0x20
> [<ffffffff81133627>] _rcu_barrier+0x677/0xcd0
> [<ffffffff81133cd5>] rcu_barrier+0x15/0x20
> [<ffffffff81720edf>] netdev_run_todo+0x6f/0x310
> [<ffffffff81715aa5>] ? rollback_registered_many+0x265/0x2e0
> [<ffffffff8172df4e>] rtnl_unlock+0xe/0x10
> [<ffffffff81717906>] default_device_exit_batch+0x156/0x180
> [<ffffffff810fd280>] ? abort_exclusive_wait+0xb0/0xb0
> [<ffffffff8170f9b3>] ops_exit_list.isra.1+0x53/0x60
> [<ffffffff81710560>] cleanup_net+0x100/0x1f0
> [<ffffffff810cc988>] process_one_work+0x218/0x850
> [<ffffffff810cc8ef>] ? process_one_work+0x17f/0x850
> [<ffffffff810cd0a7>] ? worker_thread+0xe7/0x4a0
> [<ffffffff810cd02b>] worker_thread+0x6b/0x4a0
> [<ffffffff810ccfc0>] ? process_one_work+0x850/0x850
> [<ffffffff810d337b>] kthread+0x10b/0x130
> [<ffffffff81028c69>] ? sched_clock+0x9/0x10
> [<ffffffff810d3270>] ? kthread_create_on_node+0x250/0x250
> [<ffffffff8186d1fc>] ret_from_fork+0x7c/0xb0
> [<ffffffff810d3270>] ? kthread_create_on_node+0x250/0x250
> 4 locks held by kworker/u16:6/96:
> #0: ("%s""netns"){.+.+.+}, at: [<ffffffff810cc8ef>]
> #process_one_work+0x17f/0x850
> #1: (net_cleanup_work){+.+.+.}, at: [<ffffffff810cc8ef>]
> #process_one_work+0x17f/0x850
> #2: (net_mutex){+.+.+.}, at: [<ffffffff817104ec>] cleanup_net+0x8c/0x1f0
> #3: (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff81133025>]
> #_rcu_barrier+0x75/0xcd0
> rcu_show_nocb_setup(): rcu_sched nocb state:
> 0: ffff8802267ced40 l:ffff8802267ced40 n:ffff8802269ced40 .G.
> 1: ffff8802269ced40 l:ffff8802267ced40 n: (null) ...
> 2: ffff880226bced40 l:ffff880226bced40 n:ffff880226dced40 .G.
> 3: ffff880226dced40 l:ffff880226bced40 n: (null) N..
> 4: ffff880226fced40 l:ffff880226fced40 n:ffff8802271ced40 .G.
> 5: ffff8802271ced40 l:ffff880226fced40 n: (null) ...
> 6: ffff8802273ced40 l:ffff8802273ced40 n:ffff8802275ced40 N..
> 7: ffff8802275ced40 l:ffff8802273ced40 n: (null) N..

And this looks like rcu_barrier() has posted callbacks for the
non-existent CPUs 6 and 7, similar to what Jay was seeing.

I am working on a fix -- chasing down corner cases.

Thanx, Paul

> rcu_show_nocb_setup(): rcu_bh nocb state:
> 0: ffff8802267ceac0 l:ffff8802267ceac0 n:ffff8802269ceac0 ...
> 1: ffff8802269ceac0 l:ffff8802267ceac0 n: (null) ...
> 2: ffff880226bceac0 l:ffff880226bceac0 n:ffff880226dceac0 ...
> 3: ffff880226dceac0 l:ffff880226bceac0 n: (null) ...
> 4: ffff880226fceac0 l:ffff880226fceac0 n:ffff8802271ceac0 ...
> 5: ffff8802271ceac0 l:ffff880226fceac0 n: (null) ...
> 6: ffff8802273ceac0 l:ffff8802273ceac0 n:ffff8802275ceac0 ...
> 7: ffff8802275ceac0 l:ffff8802273ceac0 n: (null) ...
> INFO: task modprobe:554 blocked for more than 120 seconds.
> Not tainted 3.18.0-rc1+ #16
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> modprobe D ffff8800c85dcec0 12456 554 553 0x00000000
> ffff8802178afbf8 0000000000000096 ffff8800c85dcec0 00000000001d5f00
> ffff8802178affd8 00000000001d5f00 ffffffff81e1b580 ffff8800c85dcec0
> ffff8800c85dcec0 ffffffff81f90c08 0000000000000246 ffff8800c85dcec0
> Call Trace:
> [<ffffffff818667c1>] schedule_preempt_disabled+0x31/0x80
> [<ffffffff81868013>] mutex_lock_nested+0x183/0x440
> [<ffffffff8171037f>] ? register_pernet_subsys+0x1f/0x50
> [<ffffffff8171037f>] ? register_pernet_subsys+0x1f/0x50
> [<ffffffffa0619000>] ? 0xffffffffa0619000
> [<ffffffff8171037f>] register_pernet_subsys+0x1f/0x50
> [<ffffffffa0619048>] br_init+0x48/0xd3 [bridge]
> [<ffffffff81002148>] do_one_initcall+0xd8/0x210
> [<ffffffff8115bc22>] load_module+0x20c2/0x2870
> [<ffffffff81156c00>] ? store_uevent+0x70/0x70
> [<ffffffff81281327>] ? kernel_read+0x57/0x90
> [<ffffffff8115c5b6>] SyS_finit_module+0xa6/0xe0
> [<ffffffff8186d2d5>] ? sysret_check+0x22/0x5d
> [<ffffffff8186d2a9>] system_call_fastpath+0x12/0x17
> 1 lock held by modprobe/554:
> #0: (net_mutex){+.+.+.}, at: [<ffffffff8171037f>] register_pernet_subsys+0x1f/0x50
> rcu_show_nocb_setup(): rcu_sched nocb state:
> 0: ffff8802267ced40 l:ffff8802267ced40 n:ffff8802269ced40 .G.
> 1: ffff8802269ced40 l:ffff8802267ced40 n: (null) ...
> 2: ffff880226bced40 l:ffff880226bced40 n:ffff880226dced40 .G.
> 3: ffff880226dced40 l:ffff880226bced40 n: (null) N..
> 4: ffff880226fced40 l:ffff880226fced40 n:ffff8802271ced40 .G.
> 5: ffff8802271ced40 l:ffff880226fced40 n: (null) ...
> 6: ffff8802273ced40 l:ffff8802273ced40 n:ffff8802275ced40 N..
> 7: ffff8802275ced40 l:ffff8802273ced40 n: (null) N..
> rcu_show_nocb_setup(): rcu_bh nocb state:
> 0: ffff8802267ceac0 l:ffff8802267ceac0 n:ffff8802269ceac0 ...
> 1: ffff8802269ceac0 l:ffff8802267ceac0 n: (null) ...
> 2: ffff880226bceac0 l:ffff880226bceac0 n:ffff880226dceac0 ...
> 3: ffff880226dceac0 l:ffff880226bceac0 n: (null) ...
> 4: ffff880226fceac0 l:ffff880226fceac0 n:ffff8802271ceac0 ...
> 5: ffff8802271ceac0 l:ffff880226fceac0 n: (null) ...
> 6: ffff8802273ceac0 l:ffff8802273ceac0 n:ffff8802275ceac0 ...
> 7: ffff8802275ceac0 l:ffff8802273ceac0 n: (null) ...
>
>
>
> > Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > rcu: Dump no-CBs CPU state at task-hung time
> >
> > Strictly diagnostic commit for rcu_barrier() hang. Not for inclusion.
> >
> > Signed-off-by: Paul E. McKenney <[email protected]>
> >
> > diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
> > index 0e5366200154..34048140577b 100644
> > --- a/include/linux/rcutiny.h
> > +++ b/include/linux/rcutiny.h
> > @@ -157,4 +157,8 @@ static inline bool rcu_is_watching(void)
> >
> > #endif /* #else defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) */
> >
> > +static inline void rcu_show_nocb_setup(void)
> > +{
> > +}
> > +
> > #endif /* __LINUX_RCUTINY_H */
> > diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
> > index 52953790dcca..0b813bdb971b 100644
> > --- a/include/linux/rcutree.h
> > +++ b/include/linux/rcutree.h
> > @@ -97,4 +97,6 @@ extern int rcu_scheduler_active __read_mostly;
> >
> > bool rcu_is_watching(void);
> >
> > +void rcu_show_nocb_setup(void);
> > +
> > #endif /* __LINUX_RCUTREE_H */
> > diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> > index 06db12434d72..e6e4d0f6b063 100644
> > --- a/kernel/hung_task.c
> > +++ b/kernel/hung_task.c
> > @@ -118,6 +118,7 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
> > " disables this message.\n");
> > sched_show_task(t);
> > debug_show_held_locks(t);
> > + rcu_show_nocb_setup();
> >
> > touch_nmi_watchdog();
> >
> > diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
> > index 240fa9094f83..6b373e79ce0e 100644
> > --- a/kernel/rcu/rcutorture.c
> > +++ b/kernel/rcu/rcutorture.c
> > @@ -1513,6 +1513,7 @@ rcu_torture_cleanup(void)
> > {
> > int i;
> >
> > + rcu_show_nocb_setup();
> > rcutorture_record_test_transition();
> > if (torture_cleanup_begin()) {
> > if (cur_ops->cb_barrier != NULL)
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index 927c17b081c7..285b3f6fb229 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -2699,6 +2699,31 @@ static bool init_nocb_callback_list(struct rcu_data *rdp)
> >
> > #endif /* #else #ifdef CONFIG_RCU_NOCB_CPU */
> >
> > +void rcu_show_nocb_setup(void)
> > +{
> > +#ifdef CONFIG_RCU_NOCB_CPU
> > + int cpu;
> > + struct rcu_data *rdp;
> > + struct rcu_state *rsp;
> > +
> > + for_each_rcu_flavor(rsp) {
> > + pr_alert("rcu_show_nocb_setup(): %s nocb state:\n", rsp->name);
> > + for_each_possible_cpu(cpu) {
> > + if (!rcu_is_nocb_cpu(cpu))
> > + continue;
> > + rdp = per_cpu_ptr(rsp->rda, cpu);
> > + pr_alert("%3d: %p l:%p n:%p %c%c%c\n",
> > + cpu,
> > + rdp, rdp->nocb_leader, rdp->nocb_next_follower,
> > + ".N"[!!rdp->nocb_head],
> > + ".G"[!!rdp->nocb_gp_head],
> > + ".F"[!!rdp->nocb_follower_head]);
> > + }
> > + }
> > +#endif /* #ifdef CONFIG_RCU_NOCB_CPU */
> > +}
> > +EXPORT_SYMBOL_GPL(rcu_show_nocb_setup);
> > +
> > /*
> > * An adaptive-ticks CPU can potentially execute in kernel mode for an
> > * arbitrarily long period of time with the scheduling-clock tick turned
> >
>

2014-10-25 19:16:24

by Yanko Kaneti

Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

On Fri-10/24/14-2014 14:49, Paul E. McKenney wrote:
> On Sat, Oct 25, 2014 at 12:25:57AM +0300, Yanko Kaneti wrote:
> > On Fri-10/24/14-2014 11:32, Paul E. McKenney wrote:
> > > On Fri, Oct 24, 2014 at 08:35:26PM +0300, Yanko Kaneti wrote:
> > > > On Fri-10/24/14-2014 10:20, Paul E. McKenney wrote:
>
> [ . . . ]
>
> > > > > Well, if you are feeling aggressive, give the following patch a spin.
> > > > > I am doing sanity tests on it in the meantime.
> > > >
> > > > Doesn't seem to make a difference here
> > >
> > > OK, inspection isn't cutting it, so time for tracing. Does the system
> > > respond to user input? If so, please enable rcu:rcu_barrier ftrace before
> > > the problem occurs, then dump the trace buffer after the problem occurs.
> >
> > Sorry for being unresponsive here, but I know next to nothing about tracing
> > or most things about the kernel, so I have some catching up to do.
> >
> > In the meantime some layman observations while I tried to find what exactly
> > triggers the problem.
> > - Even in runlevel 1 I can reliably trigger the problem by starting libvirtd
> > - libvirtd seems to be very active in using all sorts of kernel facilities
> > that are modules on fedora so it seems to cause many simultaneous kworker
> > calls to modprobe
> > - there are 8 kworker/u16 from 0 to 7
> > - one of these kworkers always deadlocks, while there appear to be two
> > kworker/u16:6 - the seventh
>
> Adding Tejun on CC in case this duplication of kworker/u16:6 is important.
>
> > 6 vs 8, as in 6 rcuos where before there were always 8
> >
> > Just observations from someone who still doesn't know what the u16
> > kworkers are..
>
> Could you please run the following diagnostic patch? This will help
> me see if I have managed to miswire the rcuo kthreads. It should
> print some information at task-hang time.

So here is the output with today's Linux tip and the diagnostic patch.
This is the case with just starting libvirtd in runlevel 1.
Also a snapshot of the kworker/u16 threads:

6 ? S 0:00 \_ [kworker/u16:0]
553 ? S 0:00 | \_ [kworker/u16:0]
554 ? D 0:00 | \_ /sbin/modprobe -q -- bridge
78 ? S 0:00 \_ [kworker/u16:1]
92 ? S 0:00 \_ [kworker/u16:2]
93 ? S 0:00 \_ [kworker/u16:3]
94 ? S 0:00 \_ [kworker/u16:4]
95 ? S 0:00 \_ [kworker/u16:5]
96 ? D 0:00 \_ [kworker/u16:6]
105 ? S 0:00 \_ [kworker/u16:7]
108 ? S 0:00 \_ [kworker/u16:8]


INFO: task kworker/u16:6:96 blocked for more than 120 seconds.
Not tainted 3.18.0-rc1+ #16
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u16:6 D ffff8800ca9ecec0 11552 96 2 0x00000000
Workqueue: netns cleanup_net
ffff880221fff9c8 0000000000000096 ffff8800ca9ecec0 00000000001d5f00
ffff880221ffffd8 00000000001d5f00 ffff880223260000 ffff8800ca9ecec0
ffffffff82c44010 7fffffffffffffff ffffffff81ee3798 ffffffff81ee3790
Call Trace:
[<ffffffff81866219>] schedule+0x29/0x70
[<ffffffff8186b43c>] schedule_timeout+0x26c/0x410
[<ffffffff81028bea>] ? native_sched_clock+0x2a/0xa0
[<ffffffff8110748c>] ? mark_held_locks+0x7c/0xb0
[<ffffffff8186c4c0>] ? _raw_spin_unlock_irq+0x30/0x50
[<ffffffff8110761d>] ? trace_hardirqs_on_caller+0x15d/0x200
[<ffffffff81867c4c>] wait_for_completion+0x10c/0x150
[<ffffffff810e4dc0>] ? wake_up_state+0x20/0x20
[<ffffffff81133627>] _rcu_barrier+0x677/0xcd0
[<ffffffff81133cd5>] rcu_barrier+0x15/0x20
[<ffffffff81720edf>] netdev_run_todo+0x6f/0x310
[<ffffffff81715aa5>] ? rollback_registered_many+0x265/0x2e0
[<ffffffff8172df4e>] rtnl_unlock+0xe/0x10
[<ffffffff81717906>] default_device_exit_batch+0x156/0x180
[<ffffffff810fd280>] ? abort_exclusive_wait+0xb0/0xb0
[<ffffffff8170f9b3>] ops_exit_list.isra.1+0x53/0x60
[<ffffffff81710560>] cleanup_net+0x100/0x1f0
[<ffffffff810cc988>] process_one_work+0x218/0x850
[<ffffffff810cc8ef>] ? process_one_work+0x17f/0x850
[<ffffffff810cd0a7>] ? worker_thread+0xe7/0x4a0
[<ffffffff810cd02b>] worker_thread+0x6b/0x4a0
[<ffffffff810ccfc0>] ? process_one_work+0x850/0x850
[<ffffffff810d337b>] kthread+0x10b/0x130
[<ffffffff81028c69>] ? sched_clock+0x9/0x10
[<ffffffff810d3270>] ? kthread_create_on_node+0x250/0x250
[<ffffffff8186d1fc>] ret_from_fork+0x7c/0xb0
[<ffffffff810d3270>] ? kthread_create_on_node+0x250/0x250
4 locks held by kworker/u16:6/96:
#0: ("%s""netns"){.+.+.+}, at: [<ffffffff810cc8ef>]
#process_one_work+0x17f/0x850
#1: (net_cleanup_work){+.+.+.}, at: [<ffffffff810cc8ef>]
#process_one_work+0x17f/0x850
#2: (net_mutex){+.+.+.}, at: [<ffffffff817104ec>] cleanup_net+0x8c/0x1f0
#3: (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff81133025>]
#_rcu_barrier+0x75/0xcd0
rcu_show_nocb_setup(): rcu_sched nocb state:
0: ffff8802267ced40 l:ffff8802267ced40 n:ffff8802269ced40 .G.
1: ffff8802269ced40 l:ffff8802267ced40 n: (null) ...
2: ffff880226bced40 l:ffff880226bced40 n:ffff880226dced40 .G.
3: ffff880226dced40 l:ffff880226bced40 n: (null) N..
4: ffff880226fced40 l:ffff880226fced40 n:ffff8802271ced40 .G.
5: ffff8802271ced40 l:ffff880226fced40 n: (null) ...
6: ffff8802273ced40 l:ffff8802273ced40 n:ffff8802275ced40 N..
7: ffff8802275ced40 l:ffff8802273ced40 n: (null) N..
rcu_show_nocb_setup(): rcu_bh nocb state:
0: ffff8802267ceac0 l:ffff8802267ceac0 n:ffff8802269ceac0 ...
1: ffff8802269ceac0 l:ffff8802267ceac0 n: (null) ...
2: ffff880226bceac0 l:ffff880226bceac0 n:ffff880226dceac0 ...
3: ffff880226dceac0 l:ffff880226bceac0 n: (null) ...
4: ffff880226fceac0 l:ffff880226fceac0 n:ffff8802271ceac0 ...
5: ffff8802271ceac0 l:ffff880226fceac0 n: (null) ...
6: ffff8802273ceac0 l:ffff8802273ceac0 n:ffff8802275ceac0 ...
7: ffff8802275ceac0 l:ffff8802273ceac0 n: (null) ...
INFO: task modprobe:554 blocked for more than 120 seconds.
Not tainted 3.18.0-rc1+ #16
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
modprobe D ffff8800c85dcec0 12456 554 553 0x00000000
ffff8802178afbf8 0000000000000096 ffff8800c85dcec0 00000000001d5f00
ffff8802178affd8 00000000001d5f00 ffffffff81e1b580 ffff8800c85dcec0
ffff8800c85dcec0 ffffffff81f90c08 0000000000000246 ffff8800c85dcec0
Call Trace:
[<ffffffff818667c1>] schedule_preempt_disabled+0x31/0x80
[<ffffffff81868013>] mutex_lock_nested+0x183/0x440
[<ffffffff8171037f>] ? register_pernet_subsys+0x1f/0x50
[<ffffffff8171037f>] ? register_pernet_subsys+0x1f/0x50
[<ffffffffa0619000>] ? 0xffffffffa0619000
[<ffffffff8171037f>] register_pernet_subsys+0x1f/0x50
[<ffffffffa0619048>] br_init+0x48/0xd3 [bridge]
[<ffffffff81002148>] do_one_initcall+0xd8/0x210
[<ffffffff8115bc22>] load_module+0x20c2/0x2870
[<ffffffff81156c00>] ? store_uevent+0x70/0x70
[<ffffffff81281327>] ? kernel_read+0x57/0x90
[<ffffffff8115c5b6>] SyS_finit_module+0xa6/0xe0
[<ffffffff8186d2d5>] ? sysret_check+0x22/0x5d
[<ffffffff8186d2a9>] system_call_fastpath+0x12/0x17
1 lock held by modprobe/554:
#0: (net_mutex){+.+.+.}, at: [<ffffffff8171037f>] register_pernet_subsys+0x1f/0x50
rcu_show_nocb_setup(): rcu_sched nocb state:
0: ffff8802267ced40 l:ffff8802267ced40 n:ffff8802269ced40 .G.
1: ffff8802269ced40 l:ffff8802267ced40 n: (null) ...
2: ffff880226bced40 l:ffff880226bced40 n:ffff880226dced40 .G.
3: ffff880226dced40 l:ffff880226bced40 n: (null) N..
4: ffff880226fced40 l:ffff880226fced40 n:ffff8802271ced40 .G.
5: ffff8802271ced40 l:ffff880226fced40 n: (null) ...
6: ffff8802273ced40 l:ffff8802273ced40 n:ffff8802275ced40 N..
7: ffff8802275ced40 l:ffff8802273ced40 n: (null) N..
rcu_show_nocb_setup(): rcu_bh nocb state:
0: ffff8802267ceac0 l:ffff8802267ceac0 n:ffff8802269ceac0 ...
1: ffff8802269ceac0 l:ffff8802267ceac0 n: (null) ...
2: ffff880226bceac0 l:ffff880226bceac0 n:ffff880226dceac0 ...
3: ffff880226dceac0 l:ffff880226bceac0 n: (null) ...
4: ffff880226fceac0 l:ffff880226fceac0 n:ffff8802271ceac0 ...
5: ffff8802271ceac0 l:ffff880226fceac0 n: (null) ...
6: ffff8802273ceac0 l:ffff8802273ceac0 n:ffff8802275ceac0 ...
7: ffff8802275ceac0 l:ffff8802273ceac0 n: (null) ...



> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> rcu: Dump no-CBs CPU state at task-hung time
>
> Strictly diagnostic commit for rcu_barrier() hang. Not for inclusion.
>
> Signed-off-by: Paul E. McKenney <[email protected]>
>
> diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
> index 0e5366200154..34048140577b 100644
> --- a/include/linux/rcutiny.h
> +++ b/include/linux/rcutiny.h
> @@ -157,4 +157,8 @@ static inline bool rcu_is_watching(void)
>
> #endif /* #else defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) */
>
> +static inline void rcu_show_nocb_setup(void)
> +{
> +}
> +
> #endif /* __LINUX_RCUTINY_H */
> diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
> index 52953790dcca..0b813bdb971b 100644
> --- a/include/linux/rcutree.h
> +++ b/include/linux/rcutree.h
> @@ -97,4 +97,6 @@ extern int rcu_scheduler_active __read_mostly;
>
> bool rcu_is_watching(void);
>
> +void rcu_show_nocb_setup(void);
> +
> #endif /* __LINUX_RCUTREE_H */
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index 06db12434d72..e6e4d0f6b063 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -118,6 +118,7 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
> " disables this message.\n");
> sched_show_task(t);
> debug_show_held_locks(t);
> + rcu_show_nocb_setup();
>
> touch_nmi_watchdog();
>
> diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
> index 240fa9094f83..6b373e79ce0e 100644
> --- a/kernel/rcu/rcutorture.c
> +++ b/kernel/rcu/rcutorture.c
> @@ -1513,6 +1513,7 @@ rcu_torture_cleanup(void)
> {
> int i;
>
> + rcu_show_nocb_setup();
> rcutorture_record_test_transition();
> if (torture_cleanup_begin()) {
> if (cur_ops->cb_barrier != NULL)
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 927c17b081c7..285b3f6fb229 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -2699,6 +2699,31 @@ static bool init_nocb_callback_list(struct rcu_data *rdp)
>
> #endif /* #else #ifdef CONFIG_RCU_NOCB_CPU */
>
> +void rcu_show_nocb_setup(void)
> +{
> +#ifdef CONFIG_RCU_NOCB_CPU
> + int cpu;
> + struct rcu_data *rdp;
> + struct rcu_state *rsp;
> +
> + for_each_rcu_flavor(rsp) {
> + pr_alert("rcu_show_nocb_setup(): %s nocb state:\n", rsp->name);
> + for_each_possible_cpu(cpu) {
> + if (!rcu_is_nocb_cpu(cpu))
> + continue;
> + rdp = per_cpu_ptr(rsp->rda, cpu);
> + pr_alert("%3d: %p l:%p n:%p %c%c%c\n",
> + cpu,
> + rdp, rdp->nocb_leader, rdp->nocb_next_follower,
> + ".N"[!!rdp->nocb_head],
> + ".G"[!!rdp->nocb_gp_head],
> + ".F"[!!rdp->nocb_follower_head]);
> + }
> + }
> +#endif /* #ifdef CONFIG_RCU_NOCB_CPU */
> +}
> +EXPORT_SYMBOL_GPL(rcu_show_nocb_setup);
> +
> /*
> * An adaptive-ticks CPU can potentially execute in kernel mode for an
> * arbitrarily long period of time with the scheduling-clock tick turned
>

2014-10-25 21:12:58

by Paul E. McKenney

Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

On Fri, Oct 24, 2014 at 09:33:33PM -0700, Jay Vosburgh wrote:
> Paul E. McKenney <[email protected]> wrote:
>
> >On Fri, Oct 24, 2014 at 05:20:48PM -0700, Jay Vosburgh wrote:
> >> Paul E. McKenney <[email protected]> wrote:
> >>
> >> >On Fri, Oct 24, 2014 at 03:59:31PM -0700, Paul E. McKenney wrote:
> >> [...]
> >> >> Hmmm... It sure looks like we have some callbacks stuck here. I clearly
> >> >> need to take a hard look at the sleep/wakeup code.
> >> >>
> >> >> Thank you for running this!!!
> >> >
> >> >Could you please try the following patch? If no joy, could you please
> >> >add rcu:rcu_nocb_wake to the list of ftrace events?
> >>
> >> I tried the patch, it did not change the behavior.
> >>
> >> I enabled the rcu:rcu_barrier and rcu:rcu_nocb_wake tracepoints
> >> and ran it again (with this patch and the first patch from earlier
> >> today); the trace output is a bit on the large side so I put it and the
> >> dmesg log at:
> >>
> >> http://people.canonical.com/~jvosburgh/nocb-wake-dmesg.txt
> >>
> >> http://people.canonical.com/~jvosburgh/nocb-wake-trace.txt
> >
> >Thank you again!
> >
> >Very strange part of the trace. The only sign of CPU 2 and 3 are:
> >
> > ovs-vswitchd-902 [000] .... 109.896840: rcu_barrier: rcu_sched Begin cpu -1 remaining 0 # 0
> > ovs-vswitchd-902 [000] .... 109.896840: rcu_barrier: rcu_sched Check cpu -1 remaining 0 # 0
> > ovs-vswitchd-902 [000] .... 109.896841: rcu_barrier: rcu_sched Inc1 cpu -1 remaining 0 # 1
> > ovs-vswitchd-902 [000] .... 109.896841: rcu_barrier: rcu_sched OnlineNoCB cpu 0 remaining 1 # 1
> > ovs-vswitchd-902 [000] d... 109.896841: rcu_nocb_wake: rcu_sched 0 WakeNot
> > ovs-vswitchd-902 [000] .... 109.896841: rcu_barrier: rcu_sched OnlineNoCB cpu 1 remaining 2 # 1
> > ovs-vswitchd-902 [000] d... 109.896841: rcu_nocb_wake: rcu_sched 1 WakeNot
> > ovs-vswitchd-902 [000] .... 109.896842: rcu_barrier: rcu_sched OnlineNoCB cpu 2 remaining 3 # 1
> > ovs-vswitchd-902 [000] d... 109.896842: rcu_nocb_wake: rcu_sched 2 WakeNotPoll
> > ovs-vswitchd-902 [000] .... 109.896842: rcu_barrier: rcu_sched OnlineNoCB cpu 3 remaining 4 # 1
> > ovs-vswitchd-902 [000] d... 109.896842: rcu_nocb_wake: rcu_sched 3 WakeNotPoll
> > ovs-vswitchd-902 [000] .... 109.896843: rcu_barrier: rcu_sched Inc2 cpu -1 remaining 4 # 2
> >
> >The pair of WakeNotPoll trace entries says that at that point, RCU believed
> >that the CPU 2's and CPU 3's rcuo kthreads did not exist. :-/
>
> On the test system I'm using, CPUs 2 and 3 really do not exist;
> it is a 2 CPU system (Intel Core 2 Duo E8400). I mentioned this in an
> earlier message, but perhaps you missed it in the flurry.

Or forgot it. Either way, thank you for reminding me.

> Looking at the dmesg, the early boot messages seem to be
> confused as to how many CPUs there are, e.g.,
>
> [ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
> [ 0.000000] Hierarchical RCU implementation.
> [ 0.000000] RCU debugfs-based tracing is enabled.
> [ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
> [ 0.000000] RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=4.
> [ 0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
> [ 0.000000] NR_IRQS:16640 nr_irqs:456 0
> [ 0.000000] Offload RCU callbacks from all CPUs
> [ 0.000000] Offload RCU callbacks from CPUs: 0-3.
>
> but later shows 2:
>
> [ 0.233703] x86: Booting SMP configuration:
> [ 0.236003] .... node #0, CPUs: #1
> [ 0.255528] x86: Booted up 1 node, 2 CPUs
>
> In any event, the E8400 is a 2 core CPU with no hyperthreading.

Well, this might explain some of the difficulties. If RCU decides to wait
on CPUs that don't exist, we will of course get a hang. And rcu_barrier()
was definitely expecting four CPUs.

So what happens if you boot with maxcpus=2? (Or build with
CONFIG_NR_CPUS=2.) I suspect that this might avoid the hang. If so,
I might have some ideas for a real fix.

Thanx, Paul

2014-10-27 17:49:34

by Paul E. McKenney

Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

On Sat, Oct 25, 2014 at 11:18:27AM -0700, Paul E. McKenney wrote:
> On Sat, Oct 25, 2014 at 09:38:16AM -0700, Jay Vosburgh wrote:
> > Paul E. McKenney <[email protected]> wrote:
> >
> > >On Fri, Oct 24, 2014 at 09:33:33PM -0700, Jay Vosburgh wrote:
> > >> Looking at the dmesg, the early boot messages seem to be
> > >> confused as to how many CPUs there are, e.g.,
> > >>
> > >> [ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
> > >> [ 0.000000] Hierarchical RCU implementation.
> > >> [ 0.000000] RCU debugfs-based tracing is enabled.
> > >> [ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
> > >> [ 0.000000] RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=4.
> > >> [ 0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
> > >> [ 0.000000] NR_IRQS:16640 nr_irqs:456 0
> > >> [ 0.000000] Offload RCU callbacks from all CPUs
> > >> [ 0.000000] Offload RCU callbacks from CPUs: 0-3.
> > >>
> > >> but later shows 2:
> > >>
> > >> [ 0.233703] x86: Booting SMP configuration:
> > >> [ 0.236003] .... node #0, CPUs: #1
> > >> [ 0.255528] x86: Booted up 1 node, 2 CPUs
> > >>
> > >> In any event, the E8400 is a 2 core CPU with no hyperthreading.
> > >
> > >Well, this might explain some of the difficulties. If RCU decides to wait
> > >on CPUs that don't exist, we will of course get a hang. And rcu_barrier()
> > >was definitely expecting four CPUs.
> > >
> > >So what happens if you boot with maxcpus=2? (Or build with
> > >CONFIG_NR_CPUS=2.) I suspect that this might avoid the hang. If so,
> > >I might have some ideas for a real fix.
> >
> > Booting with maxcpus=2 makes no difference (the dmesg output is
> > the same).
> >
> > Rebuilding with CONFIG_NR_CPUS=2 makes the problem go away, and
> > dmesg has different CPU information at boot:
> >
> > [ 0.000000] smpboot: 4 Processors exceeds NR_CPUS limit of 2
> > [ 0.000000] smpboot: Allowing 2 CPUs, 0 hotplug CPUs
> > [...]
> > [ 0.000000] setup_percpu: NR_CPUS:2 nr_cpumask_bits:2 nr_cpu_ids:2 nr_node_ids:1
> > [...]
> > [ 0.000000] Hierarchical RCU implementation.
> > [ 0.000000] RCU debugfs-based tracing is enabled.
> > [ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
> > [ 0.000000] NR_IRQS:4352 nr_irqs:440 0
> > [ 0.000000] Offload RCU callbacks from all CPUs
> > [ 0.000000] Offload RCU callbacks from CPUs: 0-1.
>
> Thank you -- this confirms my suspicions on the fix, though I must admit
> to being surprised that maxcpus made no difference.

And here is an alleged fix, lightly tested at this end. Does this patch
help?

Thanx, Paul

------------------------------------------------------------------------

rcu: Make rcu_barrier() understand about missing rcuo kthreads

Commit 35ce7f29a44a (rcu: Create rcuo kthreads only for onlined CPUs)
avoids creating rcuo kthreads for CPUs that never come online. This
fixes a bug in many instances of firmware: Instead of lying about their
age, these systems instead lie about the number of CPUs that they have.
Before commit 35ce7f29a44a, this could result in huge numbers of useless
rcuo kthreads being created.

It appears that experience indicates that I should have told the
people suffering from this problem to fix their broken firmware, but
I instead produced what turned out to be a partial fix. The missing
piece supplied by this commit makes sure that rcu_barrier() knows not to
post callbacks for no-CBs CPUs that have not yet come online, because
otherwise rcu_barrier() will hang on systems having firmware that lies
about the number of CPUs.

It is tempting to simply have rcu_barrier() refuse to post a callback on
any no-CBs CPU that does not have an rcuo kthread. This unfortunately
does not work, because rcu_barrier() is required to wait for all pending
callbacks, even those that cannot possibly be invoked, and even if
waiting for them hangs the system.

Given that posting a callback to a no-CBs CPU that does not yet have an
rcuo kthread can hang rcu_barrier(), it is tempting to report an error
in this case. Unfortunately, this would result in false positives at
boot time, when it is perfectly legal to post callbacks to the boot CPU
before the scheduler has started, in other words, before it is legal
to invoke rcu_barrier().

So this commit instead has rcu_barrier() avoid posting callbacks to
CPUs having neither rcuo kthread nor pending callbacks, and has it
complain bitterly if it finds CPUs having no rcuo kthread but some
pending callbacks. And when rcu_barrier() does find CPUs having no rcuo
kthread but pending callbacks, as noted earlier, it has no choice but
to hang indefinitely.

Reported-by: Yanko Kaneti <[email protected]>
Reported-by: Jay Vosburgh <[email protected]>
Reported-by: Meelis Roos <[email protected]>
Reported-by: Eric B Munson <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>

diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index aa8e5eea3ab4..c78e88ce5ea3 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -660,18 +660,18 @@ TRACE_EVENT(rcu_torture_read,
/*
* Tracepoint for _rcu_barrier() execution. The string "s" describes
* the _rcu_barrier phase:
- * "Begin": rcu_barrier_callback() started.
- * "Check": rcu_barrier_callback() checking for piggybacking.
- * "EarlyExit": rcu_barrier_callback() piggybacked, thus early exit.
- * "Inc1": rcu_barrier_callback() piggyback check counter incremented.
- * "Offline": rcu_barrier_callback() found offline CPU
- * "OnlineNoCB": rcu_barrier_callback() found online no-CBs CPU.
- * "OnlineQ": rcu_barrier_callback() found online CPU with callbacks.
- * "OnlineNQ": rcu_barrier_callback() found online CPU, no callbacks.
+ * "Begin": _rcu_barrier() started.
+ * "Check": _rcu_barrier() checking for piggybacking.
+ * "EarlyExit": _rcu_barrier() piggybacked, thus early exit.
+ * "Inc1": _rcu_barrier() piggyback check counter incremented.
+ * "OfflineNoCB": _rcu_barrier() found callback on never-online CPU
+ * "OnlineNoCB": _rcu_barrier() found online no-CBs CPU.
+ * "OnlineQ": _rcu_barrier() found online CPU with callbacks.
+ * "OnlineNQ": _rcu_barrier() found online CPU, no callbacks.
* "IRQ": An rcu_barrier_callback() callback posted on remote CPU.
* "CB": An rcu_barrier_callback() invoked a callback, not the last.
* "LastCB": An rcu_barrier_callback() invoked the last callback.
- * "Inc2": rcu_barrier_callback() piggyback check counter incremented.
+ * "Inc2": _rcu_barrier() piggyback check counter incremented.
* The "cpu" argument is the CPU or -1 if meaningless, the "cnt" argument
* is the count of remaining callbacks, and "done" is the piggybacking count.
*/
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index f6880052b917..7680fc275036 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3312,11 +3312,16 @@ static void _rcu_barrier(struct rcu_state *rsp)
continue;
rdp = per_cpu_ptr(rsp->rda, cpu);
if (rcu_is_nocb_cpu(cpu)) {
- _rcu_barrier_trace(rsp, "OnlineNoCB", cpu,
- rsp->n_barrier_done);
- atomic_inc(&rsp->barrier_cpu_count);
- __call_rcu(&rdp->barrier_head, rcu_barrier_callback,
- rsp, cpu, 0);
+ if (!rcu_nocb_cpu_needs_barrier(rsp, cpu)) {
+ _rcu_barrier_trace(rsp, "OfflineNoCB", cpu,
+ rsp->n_barrier_done);
+ } else {
+ _rcu_barrier_trace(rsp, "OnlineNoCB", cpu,
+ rsp->n_barrier_done);
+ atomic_inc(&rsp->barrier_cpu_count);
+ __call_rcu(&rdp->barrier_head,
+ rcu_barrier_callback, rsp, cpu, 0);
+ }
} else if (ACCESS_ONCE(rdp->qlen)) {
_rcu_barrier_trace(rsp, "OnlineQ", cpu,
rsp->n_barrier_done);
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 4beab3d2328c..8e7b1843896e 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -587,6 +587,7 @@ static void print_cpu_stall_info(struct rcu_state *rsp, int cpu);
static void print_cpu_stall_info_end(void);
static void zero_cpu_stall_ticks(struct rcu_data *rdp);
static void increment_cpu_stall_ticks(void);
+static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu);
static void rcu_nocb_gp_set(struct rcu_node *rnp, int nrq);
static void rcu_nocb_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp);
static void rcu_init_one_nocb(struct rcu_node *rnp);
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 927c17b081c7..68c5b23b7173 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2050,6 +2050,33 @@ static void wake_nocb_leader(struct rcu_data *rdp, bool force)
}

/*
+ * Does the specified CPU need an RCU callback for the specified flavor
+ * of rcu_barrier()?
+ */
+static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu)
+{
+ struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
+ struct rcu_head *rhp;
+
+ /* No-CBs CPUs might have callbacks on any of three lists. */
+ rhp = ACCESS_ONCE(rdp->nocb_head);
+ if (!rhp)
+ rhp = ACCESS_ONCE(rdp->nocb_gp_head);
+ if (!rhp)
+ rhp = ACCESS_ONCE(rdp->nocb_follower_head);
+
+ /* Having no rcuo kthread but CBs after scheduler starts is bad! */
+ if (!ACCESS_ONCE(rdp->nocb_kthread) && rhp) {
+ /* RCU callback enqueued before CPU first came online??? */
+ pr_err("RCU: Never-onlined no-CBs CPU %d has CB %p\n",
+ cpu, rhp->func);
+ WARN_ON_ONCE(1);
+ }
+
+ return !!rhp;
+}
+
+/*
* Enqueue the specified string of rcu_head structures onto the specified
* CPU's no-CBs lists. The CPU is specified by rdp, the head of the
* string by rhp, and the tail of the string by rhtp. The non-lazy/lazy
@@ -2646,6 +2673,11 @@ static bool init_nocb_callback_list(struct rcu_data *rdp)

#else /* #ifdef CONFIG_RCU_NOCB_CPU */

+static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu)
+{
+	return false; /* No-CBs CPUs cannot exist in this configuration. */
+}
+
static void rcu_nocb_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
{
}

2014-10-27 20:43:38

by Jay Vosburgh

[permalink] [raw]
Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

Paul E. McKenney <[email protected]> wrote:

>On Sat, Oct 25, 2014 at 11:18:27AM -0700, Paul E. McKenney wrote:
>> On Sat, Oct 25, 2014 at 09:38:16AM -0700, Jay Vosburgh wrote:
>> > Paul E. McKenney <[email protected]> wrote:
>> >
>> > >On Fri, Oct 24, 2014 at 09:33:33PM -0700, Jay Vosburgh wrote:
>> > >> Looking at the dmesg, the early boot messages seem to be
>> > >> confused as to how many CPUs there are, e.g.,
>> > >>
>> > >> [ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
>> > >> [ 0.000000] Hierarchical RCU implementation.
>> > >> [ 0.000000] RCU debugfs-based tracing is enabled.
>> > >> [ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
>> > >> [ 0.000000] RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=4.
>> > >> [ 0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
>> > >> [ 0.000000] NR_IRQS:16640 nr_irqs:456 0
>> > >> [ 0.000000] Offload RCU callbacks from all CPUs
>> > >> [ 0.000000] Offload RCU callbacks from CPUs: 0-3.
>> > >>
>> > >> but later shows 2:
>> > >>
>> > >> [ 0.233703] x86: Booting SMP configuration:
>> > >> [ 0.236003] .... node #0, CPUs: #1
>> > >> [ 0.255528] x86: Booted up 1 node, 2 CPUs
>> > >>
>> > >> In any event, the E8400 is a 2 core CPU with no hyperthreading.
>> > >
>> > >Well, this might explain some of the difficulties. If RCU decides to wait
>> > >on CPUs that don't exist, we will of course get a hang. And rcu_barrier()
>> > >was definitely expecting four CPUs.
>> > >
>> > >So what happens if you boot with maxcpus=2? (Or build with
>> > >CONFIG_NR_CPUS=2.) I suspect that this might avoid the hang. If so,
>> > >I might have some ideas for a real fix.
>> >
>> > Booting with maxcpus=2 makes no difference (the dmesg output is
>> > the same).
>> >
>> > Rebuilding with CONFIG_NR_CPUS=2 makes the problem go away, and
>> > dmesg has different CPU information at boot:
>> >
>> > [ 0.000000] smpboot: 4 Processors exceeds NR_CPUS limit of 2
>> > [ 0.000000] smpboot: Allowing 2 CPUs, 0 hotplug CPUs
>> > [...]
>> > [ 0.000000] setup_percpu: NR_CPUS:2 nr_cpumask_bits:2 nr_cpu_ids:2 nr_node_ids:1
>> > [...]
>> > [ 0.000000] Hierarchical RCU implementation.
>> > [ 0.000000] RCU debugfs-based tracing is enabled.
>> > [ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
>> > [ 0.000000] NR_IRQS:4352 nr_irqs:440 0
>> > [ 0.000000] Offload RCU callbacks from all CPUs
>> > [ 0.000000] Offload RCU callbacks from CPUs: 0-1.
>>
>> Thank you -- this confirms my suspicions on the fix, though I must admit
>> to being surprised that maxcpus made no difference.
>
>And here is an alleged fix, lightly tested at this end. Does this patch
>help?

This patch appears to make the problem go away; I've run about
10 iterations. I applied this patch to the same -net tree I was using
previously (-net as of Oct 22), with all other test patches removed.

FWIW, dmesg is unchanged, and still shows messages like:

[ 0.000000] Offload RCU callbacks from CPUs: 0-3.

Tested-by: Jay Vosburgh <[email protected]>

-J

> Thanx, Paul
>
>[ quoted patch trimmed; identical to the patch posted above ]

---
-Jay Vosburgh, [email protected]

2014-10-27 21:11:32

by Paul E. McKenney

[permalink] [raw]
Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

On Mon, Oct 27, 2014 at 01:43:21PM -0700, Jay Vosburgh wrote:
> [ earlier discussion trimmed; quoted in full above ]
> >
> >And here is an alleged fix, lightly tested at this end. Does this patch
> >help?
>
> This patch appears to make the problem go away; I've run about
> 10 iterations. I applied this patch to the same -net tree I was using
> previously (-net as of Oct 22), with all other test patches removed.

So I finally produced a patch that helps! It was bound to happen sooner
or later, I guess. ;-)

> FWIW, dmesg is unchanged, and still shows messages like:
>
> [ 0.000000] Offload RCU callbacks from CPUs: 0-3.

Yep, at that point in boot, RCU has no way of knowing that the firmware
is lying to it about the number of CPUs. ;-)

> Tested-by: Jay Vosburgh <[email protected]>

Thank you for your testing efforts!!!

Thanx, Paul

> -J
> >
> >[ quoted patch trimmed; identical to the patch posted above ]
>
> ---
> -Jay Vosburgh, [email protected]
>

2014-10-28 08:12:50

by Yanko Kaneti

[permalink] [raw]
Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

On Mon, Oct 27, 2014 at 10:45, Paul E. McKenney wrote:
> [ earlier discussion trimmed; quoted in full above ]
>
> And here is an alleged fix, lightly tested at this end. Does this patch
> help?

Tested this on top of rc2 (as found in Fedora, and failing without the patch)
with all my modprobe scenarios and it seems to have fixed it.

Thanks
-Yanko


> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> rcu: Make rcu_barrier() understand about missing rcuo kthreads
>
> Commit 35ce7f29a44a (rcu: Create rcuo kthreads only for onlined CPUs)
> avoids creating rcuo kthreads for CPUs that never come online. This
> fixes a bug in many instances of firmware: Instead of lying about their
> age, these systems instead lie about the number of CPUs that they have.
> Before commit 35ce7f29a44a, this could result in huge numbers of useless
> rcuo kthreads being created.
>
> It appears that experience indicates that I should have told the
> people suffering from this problem to fix their broken firmware, but
> I instead produced what turned out to be a partial fix. The missing
> piece supplied by this commit makes sure that rcu_barrier() knows not to
> post callbacks for no-CBs CPUs that have not yet come online, because
> otherwise rcu_barrier() will hang on systems having firmware that lies
> about the number of CPUs.
>
> It is tempting to simply have rcu_barrier() refuse to post a callback on
> any no-CBs CPU that does not have an rcuo kthread. This unfortunately
> does not work because rcu_barrier() is required to wait for all pending
> callbacks. It is therefore required to wait even for those callbacks
> that cannot possibly be invoked, even if doing so hangs the system.
>
> Given that posting a callback to a no-CBs CPU that does not yet have an
> rcuo kthread can hang rcu_barrier(), it is tempting to report an error
> in this case. Unfortunately, this will result in false positives at
> boot time, when it is perfectly legal to post callbacks to the boot CPU
> before the scheduler has started, in other words, before it is legal
> to invoke rcu_barrier().
>
> So this commit instead has rcu_barrier() avoid posting callbacks to
> CPUs having neither rcuo kthread nor pending callbacks, and has it
> complain bitterly if it finds CPUs having no rcuo kthread but some
> pending callbacks. And when rcu_barrier() does find CPUs having no rcuo
> kthread but pending callbacks, as noted earlier, it has no choice but
> to hang indefinitely.
>
> Reported-by: Yanko Kaneti <[email protected]>
> Reported-by: Jay Vosburgh <[email protected]>
> Reported-by: Meelis Roos <[email protected]>
> Reported-by: Eric B Munson <[email protected]>
> Signed-off-by: Paul E. McKenney <[email protected]>
>
> diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
> index aa8e5eea3ab4..c78e88ce5ea3 100644
> --- a/include/trace/events/rcu.h
> +++ b/include/trace/events/rcu.h
> @@ -660,18 +660,18 @@ TRACE_EVENT(rcu_torture_read,
> /*
> * Tracepoint for _rcu_barrier() execution. The string "s" describes
> * the _rcu_barrier phase:
> - * "Begin": rcu_barrier_callback() started.
> - * "Check": rcu_barrier_callback() checking for piggybacking.
> - * "EarlyExit": rcu_barrier_callback() piggybacked, thus early exit.
> - * "Inc1": rcu_barrier_callback() piggyback check counter incremented.
> - * "Offline": rcu_barrier_callback() found offline CPU
> - * "OnlineNoCB": rcu_barrier_callback() found online no-CBs CPU.
> - * "OnlineQ": rcu_barrier_callback() found online CPU with callbacks.
> - * "OnlineNQ": rcu_barrier_callback() found online CPU, no callbacks.
> + * "Begin": _rcu_barrier() started.
> + * "Check": _rcu_barrier() checking for piggybacking.
> + * "EarlyExit": _rcu_barrier() piggybacked, thus early exit.
> + * "Inc1": _rcu_barrier() piggyback check counter incremented.
> + * "OfflineNoCB": _rcu_barrier() found callback on never-online CPU
> + * "OnlineNoCB": _rcu_barrier() found online no-CBs CPU.
> + * "OnlineQ": _rcu_barrier() found online CPU with callbacks.
> + * "OnlineNQ": _rcu_barrier() found online CPU, no callbacks.
> * "IRQ": An rcu_barrier_callback() callback posted on remote CPU.
> * "CB": An rcu_barrier_callback() invoked a callback, not the last.
> * "LastCB": An rcu_barrier_callback() invoked the last callback.
> - * "Inc2": rcu_barrier_callback() piggyback check counter incremented.
> + * "Inc2": _rcu_barrier() piggyback check counter incremented.
> * The "cpu" argument is the CPU or -1 if meaningless, the "cnt" argument
> * is the count of remaining callbacks, and "done" is the piggybacking count.
> */
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index f6880052b917..7680fc275036 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3312,11 +3312,16 @@ static void _rcu_barrier(struct rcu_state *rsp)
> continue;
> rdp = per_cpu_ptr(rsp->rda, cpu);
> if (rcu_is_nocb_cpu(cpu)) {
> - _rcu_barrier_trace(rsp, "OnlineNoCB", cpu,
> - rsp->n_barrier_done);
> - atomic_inc(&rsp->barrier_cpu_count);
> - __call_rcu(&rdp->barrier_head, rcu_barrier_callback,
> - rsp, cpu, 0);
> + if (!rcu_nocb_cpu_needs_barrier(rsp, cpu)) {
> + _rcu_barrier_trace(rsp, "OfflineNoCB", cpu,
> + rsp->n_barrier_done);
> + } else {
> + _rcu_barrier_trace(rsp, "OnlineNoCB", cpu,
> + rsp->n_barrier_done);
> + atomic_inc(&rsp->barrier_cpu_count);
> + __call_rcu(&rdp->barrier_head,
> + rcu_barrier_callback, rsp, cpu, 0);
> + }
> } else if (ACCESS_ONCE(rdp->qlen)) {
> _rcu_barrier_trace(rsp, "OnlineQ", cpu,
> rsp->n_barrier_done);
> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index 4beab3d2328c..8e7b1843896e 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -587,6 +587,7 @@ static void print_cpu_stall_info(struct rcu_state *rsp, int cpu);
> static void print_cpu_stall_info_end(void);
> static void zero_cpu_stall_ticks(struct rcu_data *rdp);
> static void increment_cpu_stall_ticks(void);
> +static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu);
> static void rcu_nocb_gp_set(struct rcu_node *rnp, int nrq);
> static void rcu_nocb_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp);
> static void rcu_init_one_nocb(struct rcu_node *rnp);
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 927c17b081c7..68c5b23b7173 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -2050,6 +2050,33 @@ static void wake_nocb_leader(struct rcu_data *rdp, bool force)
> }
>
> /*
> + * Does the specified CPU need an RCU callback for the specified flavor
> + * of rcu_barrier()?
> + */
> +static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu)
> +{
> + struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
> + struct rcu_head *rhp;
> +
> + /* No-CBs CPUs might have callbacks on any of three lists. */
> + rhp = ACCESS_ONCE(rdp->nocb_head);
> + if (!rhp)
> + rhp = ACCESS_ONCE(rdp->nocb_gp_head);
> + if (!rhp)
> + rhp = ACCESS_ONCE(rdp->nocb_follower_head);
> +
> + /* Having no rcuo kthread but CBs after scheduler starts is bad! */
> + if (!ACCESS_ONCE(rdp->nocb_kthread) && rhp) {
> + /* RCU callback enqueued before CPU first came online??? */
> + pr_err("RCU: Never-onlined no-CBs CPU %d has CB %p\n",
> + cpu, rhp->func);
> + WARN_ON_ONCE(1);
> + }
> +
> + return !!rhp;
> +}
> +
> +/*
> * Enqueue the specified string of rcu_head structures onto the specified
> * CPU's no-CBs lists. The CPU is specified by rdp, the head of the
> * string by rhp, and the tail of the string by rhtp. The non-lazy/lazy
> @@ -2646,6 +2673,12 @@ static bool init_nocb_callback_list(struct rcu_data *rdp)
>
> #else /* #ifdef CONFIG_RCU_NOCB_CPU */
>
> +static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu)
> +{
> + WARN_ON_ONCE(1); /* Should be dead code. */
> + return false;
> +}
> +
> static void rcu_nocb_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
> {
> }
>

2014-10-28 12:55:01

by Paul E. McKenney

Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

On Tue, Oct 28, 2014 at 10:12:43AM +0200, Yanko Kaneti wrote:
> On Mon-10/27/14-2014 10:45, Paul E. McKenney wrote:
> > > [ earlier discussion trimmed; quoted in full earlier in the thread ]
> >
> > And here is an alleged fix, lightly tested at this end. Does this patch
> > help?
>
> Tested this on top of rc2 (as found in Fedora, and failing without the patch)
> with all my modprobe scenarios and it seems to have fixed it.

Very good! May I apply your Tested-by?

Thanx, Paul

> Thanks
> -Yanko
>
>
> > [ patch trimmed; quoted in full earlier in the thread ]

2014-10-28 13:00:07

by Yanko Kaneti

Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

On Tue-10/28/14-2014 05:50, Paul E. McKenney wrote:
> On Tue, Oct 28, 2014 at 10:12:43AM +0200, Yanko Kaneti wrote:
> > On Mon-10/27/14-2014 10:45, Paul E. McKenney wrote:
> > > > [ earlier discussion trimmed; quoted in full earlier in the thread ]
> > >
> > > And here is an alleged fix, lightly tested at this end. Does this patch
> > > help?
> >
> > Tested this on top of rc2 (as found in Fedora, and failing without the patch)
> > with all my modprobe scenarios and it seems to have fixed it.
>
> Very good! May I apply your Tested-by?

Sure. Sorry I didn't include this earlier.

Tested-by: Yanko Kaneti <[email protected]>

> Thanx, Paul
>
> > Thanks
> > -Yanko
> >
> >
> > > [ patch trimmed; quoted in full earlier in the thread ]

2014-10-28 15:54:33

by Kevin Fenzi

[permalink] [raw]
Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

Just FYI, this solves the original issue for me as well. ;)

Thanks for all the work in tracking it down...

Tested-by: Kevin Fenzi <[email protected]>

kevin




2014-10-28 16:19:16

by Paul E. McKenney

Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

On Tue, Oct 28, 2014 at 09:54:28AM -0600, Kevin Fenzi wrote:
> Just FYI, this solves the orig issue for me as well. ;)
>
> Thanks for all the work in tracking it down...
>
> Tested-by: Kevin Fenzi <[email protected]>

And thank you for testing as well!

Thanx, Paul