2020-03-15 03:38:45

by syzbot

Subject: linux-next test error: WARNING: suspicious RCU usage in ovs_ct_exit

Hello,

syzbot found the following crash on:

HEAD commit: 2e602db7 Add linux-next specific files for 20200313
git tree: linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=16669919e00000
kernel config: https://syzkaller.appspot.com/x/.config?x=cf2879fc1055b886
dashboard link: https://syzkaller.appspot.com/bug?extid=7ef50afd3a211f879112
compiler: gcc (GCC) 9.0.0 20181231 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: [email protected]

=============================
WARNING: suspicious RCU usage
5.6.0-rc5-next-20200313-syzkaller #0 Not tainted
-----------------------------
net/openvswitch/conntrack.c:1898 RCU-list traversed in non-reader section!!

other info that might help us debug this:


rcu_scheduler_active = 2, debug_locks = 1
3 locks held by kworker/u4:3/127:
#0: ffff8880a9771d28 ((wq_completion)netns){+.+.}, at: __write_once_size include/linux/compiler.h:250 [inline]
#0: ffff8880a9771d28 ((wq_completion)netns){+.+.}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
#0: ffff8880a9771d28 ((wq_completion)netns){+.+.}, at: atomic64_set include/asm-generic/atomic-instrumented.h:856 [inline]
#0: ffff8880a9771d28 ((wq_completion)netns){+.+.}, at: atomic_long_set include/asm-generic/atomic-long.h:41 [inline]
#0: ffff8880a9771d28 ((wq_completion)netns){+.+.}, at: set_work_data kernel/workqueue.c:615 [inline]
#0: ffff8880a9771d28 ((wq_completion)netns){+.+.}, at: set_work_pool_and_clear_pending kernel/workqueue.c:642 [inline]
#0: ffff8880a9771d28 ((wq_completion)netns){+.+.}, at: process_one_work+0x82a/0x1690 kernel/workqueue.c:2237
#1: ffffc900013a7dd0 (net_cleanup_work){+.+.}, at: process_one_work+0x85e/0x1690 kernel/workqueue.c:2241
#2: ffffffff8a54df08 (pernet_ops_rwsem){++++}, at: cleanup_net+0x9b/0xa50 net/core/net_namespace.c:551

stack backtrace:
CPU: 0 PID: 127 Comm: kworker/u4:3 Not tainted 5.6.0-rc5-next-20200313-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: netns cleanup_net
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x188/0x20d lib/dump_stack.c:118
ovs_ct_limit_exit net/openvswitch/conntrack.c:1898 [inline]
ovs_ct_exit+0x3db/0x558 net/openvswitch/conntrack.c:2295
ovs_exit_net+0x1df/0xba0 net/openvswitch/datapath.c:2469
ops_exit_list.isra.0+0xa8/0x150 net/core/net_namespace.c:172
cleanup_net+0x511/0xa50 net/core/net_namespace.c:589
process_one_work+0x94b/0x1690 kernel/workqueue.c:2266
worker_thread+0x96/0xe20 kernel/workqueue.c:2412
kthread+0x357/0x430 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
tipc: TX() has been purged, node left!

=============================
WARNING: suspicious RCU usage
5.6.0-rc5-next-20200313-syzkaller #0 Not tainted
-----------------------------
net/ipv4/ipmr.c:1757 RCU-list traversed in non-reader section!!

other info that might help us debug this:


rcu_scheduler_active = 2, debug_locks = 1
4 locks held by kworker/u4:3/127:
#0: ffff8880a9771d28 ((wq_completion)netns){+.+.}, at: __write_once_size include/linux/compiler.h:250 [inline]
#0: ffff8880a9771d28 ((wq_completion)netns){+.+.}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
#0: ffff8880a9771d28 ((wq_completion)netns){+.+.}, at: atomic64_set include/asm-generic/atomic-instrumented.h:856 [inline]
#0: ffff8880a9771d28 ((wq_completion)netns){+.+.}, at: atomic_long_set include/asm-generic/atomic-long.h:41 [inline]
#0: ffff8880a9771d28 ((wq_completion)netns){+.+.}, at: set_work_data kernel/workqueue.c:615 [inline]
#0: ffff8880a9771d28 ((wq_completion)netns){+.+.}, at: set_work_pool_and_clear_pending kernel/workqueue.c:642 [inline]
#0: ffff8880a9771d28 ((wq_completion)netns){+.+.}, at: process_one_work+0x82a/0x1690 kernel/workqueue.c:2237
#1: ffffc900013a7dd0 (net_cleanup_work){+.+.}, at: process_one_work+0x85e/0x1690 kernel/workqueue.c:2241
#2: ffffffff8a54df08 (pernet_ops_rwsem){++++}, at: cleanup_net+0x9b/0xa50 net/core/net_namespace.c:551
#3: ffffffff8a559c80 (rtnl_mutex){+.+.}, at: ip6gre_exit_batch_net+0x88/0x700 net/ipv6/ip6_gre.c:1602

stack backtrace:
CPU: 1 PID: 127 Comm: kworker/u4:3 Not tainted 5.6.0-rc5-next-20200313-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: netns cleanup_net
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x188/0x20d lib/dump_stack.c:118
ipmr_device_event+0x240/0x2b0 net/ipv4/ipmr.c:1757
notifier_call_chain+0xc0/0x230 kernel/notifier.c:83
call_netdevice_notifiers_info net/core/dev.c:1948 [inline]
call_netdevice_notifiers_info+0xb5/0x130 net/core/dev.c:1933
call_netdevice_notifiers_extack net/core/dev.c:1960 [inline]
call_netdevice_notifiers net/core/dev.c:1974 [inline]
rollback_registered_many+0x75c/0xe70 net/core/dev.c:8810
unregister_netdevice_many.part.0+0x16/0x1e0 net/core/dev.c:9966
unregister_netdevice_many+0x36/0x50 net/core/dev.c:9965
ip6gre_exit_batch_net+0x4e8/0x700 net/ipv6/ip6_gre.c:1605
ops_exit_list.isra.0+0x103/0x150 net/core/net_namespace.c:175
cleanup_net+0x511/0xa50 net/core/net_namespace.c:589
process_one_work+0x94b/0x1690 kernel/workqueue.c:2266
worker_thread+0x96/0xe20 kernel/workqueue.c:2412
kthread+0x357/0x430 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.


2020-04-18 07:04:33

by Dmitry Vyukov

Subject: Re: linux-next test error: WARNING: suspicious RCU usage in ovs_ct_exit

On Sat, Mar 14, 2020 at 8:57 AM syzbot
<[email protected]> wrote:
>
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit: 2e602db7 Add linux-next specific files for 20200313
> git tree: linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=16669919e00000
> kernel config: https://syzkaller.appspot.com/x/.config?x=cf2879fc1055b886
> dashboard link: https://syzkaller.appspot.com/bug?extid=7ef50afd3a211f879112
> compiler: gcc (GCC) 9.0.0 20181231 (experimental)
>
> Unfortunately, I don't have any reproducer for this crash yet.
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: [email protected]

+linux-next, Stephen for currently open linux-next build/boot failure

> =============================
> WARNING: suspicious RCU usage
> 5.6.0-rc5-next-20200313-syzkaller #0 Not tainted
> -----------------------------
> net/openvswitch/conntrack.c:1898 RCU-list traversed in non-reader section!!
>
> other info that might help us debug this:
>
>
> rcu_scheduler_active = 2, debug_locks = 1
> 3 locks held by kworker/u4:3/127:
> #0: ffff8880a9771d28 ((wq_completion)netns){+.+.}, at: __write_once_size include/linux/compiler.h:250 [inline]
> #0: ffff8880a9771d28 ((wq_completion)netns){+.+.}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
> #0: ffff8880a9771d28 ((wq_completion)netns){+.+.}, at: atomic64_set include/asm-generic/atomic-instrumented.h:856 [inline]
> #0: ffff8880a9771d28 ((wq_completion)netns){+.+.}, at: atomic_long_set include/asm-generic/atomic-long.h:41 [inline]
> #0: ffff8880a9771d28 ((wq_completion)netns){+.+.}, at: set_work_data kernel/workqueue.c:615 [inline]
> #0: ffff8880a9771d28 ((wq_completion)netns){+.+.}, at: set_work_pool_and_clear_pending kernel/workqueue.c:642 [inline]
> #0: ffff8880a9771d28 ((wq_completion)netns){+.+.}, at: process_one_work+0x82a/0x1690 kernel/workqueue.c:2237
> #1: ffffc900013a7dd0 (net_cleanup_work){+.+.}, at: process_one_work+0x85e/0x1690 kernel/workqueue.c:2241
> #2: ffffffff8a54df08 (pernet_ops_rwsem){++++}, at: cleanup_net+0x9b/0xa50 net/core/net_namespace.c:551
>
> stack backtrace:
> CPU: 0 PID: 127 Comm: kworker/u4:3 Not tainted 5.6.0-rc5-next-20200313-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Workqueue: netns cleanup_net
> Call Trace:
> __dump_stack lib/dump_stack.c:77 [inline]
> dump_stack+0x188/0x20d lib/dump_stack.c:118
> ovs_ct_limit_exit net/openvswitch/conntrack.c:1898 [inline]
> ovs_ct_exit+0x3db/0x558 net/openvswitch/conntrack.c:2295
> ovs_exit_net+0x1df/0xba0 net/openvswitch/datapath.c:2469
> ops_exit_list.isra.0+0xa8/0x150 net/core/net_namespace.c:172
> cleanup_net+0x511/0xa50 net/core/net_namespace.c:589
> process_one_work+0x94b/0x1690 kernel/workqueue.c:2266
> worker_thread+0x96/0xe20 kernel/workqueue.c:2412
> kthread+0x357/0x430 kernel/kthread.c:255
> ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
> tipc: TX() has been purged, node left!
>
> =============================
> WARNING: suspicious RCU usage
> 5.6.0-rc5-next-20200313-syzkaller #0 Not tainted
> -----------------------------
> net/ipv4/ipmr.c:1757 RCU-list traversed in non-reader section!!
>
> other info that might help us debug this:
>
>
> rcu_scheduler_active = 2, debug_locks = 1
> 4 locks held by kworker/u4:3/127:
> #0: ffff8880a9771d28 ((wq_completion)netns){+.+.}, at: __write_once_size include/linux/compiler.h:250 [inline]
> #0: ffff8880a9771d28 ((wq_completion)netns){+.+.}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
> #0: ffff8880a9771d28 ((wq_completion)netns){+.+.}, at: atomic64_set include/asm-generic/atomic-instrumented.h:856 [inline]
> #0: ffff8880a9771d28 ((wq_completion)netns){+.+.}, at: atomic_long_set include/asm-generic/atomic-long.h:41 [inline]
> #0: ffff8880a9771d28 ((wq_completion)netns){+.+.}, at: set_work_data kernel/workqueue.c:615 [inline]
> #0: ffff8880a9771d28 ((wq_completion)netns){+.+.}, at: set_work_pool_and_clear_pending kernel/workqueue.c:642 [inline]
> #0: ffff8880a9771d28 ((wq_completion)netns){+.+.}, at: process_one_work+0x82a/0x1690 kernel/workqueue.c:2237
> #1: ffffc900013a7dd0 (net_cleanup_work){+.+.}, at: process_one_work+0x85e/0x1690 kernel/workqueue.c:2241
> #2: ffffffff8a54df08 (pernet_ops_rwsem){++++}, at: cleanup_net+0x9b/0xa50 net/core/net_namespace.c:551
> #3: ffffffff8a559c80 (rtnl_mutex){+.+.}, at: ip6gre_exit_batch_net+0x88/0x700 net/ipv6/ip6_gre.c:1602
>
> stack backtrace:
> CPU: 1 PID: 127 Comm: kworker/u4:3 Not tainted 5.6.0-rc5-next-20200313-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Workqueue: netns cleanup_net
> Call Trace:
> __dump_stack lib/dump_stack.c:77 [inline]
> dump_stack+0x188/0x20d lib/dump_stack.c:118
> ipmr_device_event+0x240/0x2b0 net/ipv4/ipmr.c:1757
> notifier_call_chain+0xc0/0x230 kernel/notifier.c:83
> call_netdevice_notifiers_info net/core/dev.c:1948 [inline]
> call_netdevice_notifiers_info+0xb5/0x130 net/core/dev.c:1933
> call_netdevice_notifiers_extack net/core/dev.c:1960 [inline]
> call_netdevice_notifiers net/core/dev.c:1974 [inline]
> rollback_registered_many+0x75c/0xe70 net/core/dev.c:8810
> unregister_netdevice_many.part.0+0x16/0x1e0 net/core/dev.c:9966
> unregister_netdevice_many+0x36/0x50 net/core/dev.c:9965
> ip6gre_exit_batch_net+0x4e8/0x700 net/ipv6/ip6_gre.c:1605
> ops_exit_list.isra.0+0x103/0x150 net/core/net_namespace.c:175
> cleanup_net+0x511/0xa50 net/core/net_namespace.c:589
> process_one_work+0x94b/0x1690 kernel/workqueue.c:2266
> worker_thread+0x96/0xe20 kernel/workqueue.c:2412
> kthread+0x357/0x430 kernel/kthread.c:255
> ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
>
>
> ---
> This bug is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at [email protected].
>
> syzbot will keep track of this bug report. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>

2020-04-19 08:45:26

by Tonghao Zhang

Subject: [PATCH] net: openvswitch: ovs_ct_exit to be done under ovs_lock

From: Tonghao Zhang <[email protected]>

syzbot wrote:
| =============================
| WARNING: suspicious RCU usage
| 5.7.0-rc1+ #45 Not tainted
| -----------------------------
| net/openvswitch/conntrack.c:1898 RCU-list traversed in non-reader section!!
|
| other info that might help us debug this:
| rcu_scheduler_active = 2, debug_locks = 1
| ...
|
| stack backtrace:
| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
| Workqueue: netns cleanup_net
| Call Trace:
| ...
| ovs_ct_exit
| ovs_exit_net
| ops_exit_list.isra.7
| cleanup_net
| process_one_work
| worker_thread

To avoid that warning, invoke the ovs_ct_exit under ovs_lock and add
lockdep_ovsl_is_held as optional lockdep expression.

Link: https://lore.kernel.org/lkml/[email protected]
Fixes: 11efd5cb04a1 ("openvswitch: Support conntrack zone limit")
Cc: Pravin B Shelar <[email protected]>
Cc: Yi-Hung Wei <[email protected]>
Reported-by: [email protected]
Signed-off-by: Tonghao Zhang <[email protected]>
---
 net/openvswitch/conntrack.c | 3 ++-
 net/openvswitch/datapath.c  | 4 +++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index e726159cfcfa..4340f25fe390 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -1895,7 +1895,8 @@ static void ovs_ct_limit_exit(struct net *net, struct ovs_net *ovs_net)
 		struct hlist_head *head = &info->limits[i];
 		struct ovs_ct_limit *ct_limit;

-		hlist_for_each_entry_rcu(ct_limit, head, hlist_node)
+		hlist_for_each_entry_rcu(ct_limit, head, hlist_node,
+					 lockdep_ovsl_is_held())
 			kfree_rcu(ct_limit, rcu);
 	}
 	kfree(ovs_net->ct_limit_info->limits);
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index d8ae541d22a8..94b024534987 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -2466,8 +2466,10 @@ static void __net_exit ovs_exit_net(struct net *dnet)
 	struct net *net;
 	LIST_HEAD(head);

-	ovs_ct_exit(dnet);
 	ovs_lock();
+
+	ovs_ct_exit(dnet);
+
 	list_for_each_entry_safe(dp, dp_next, &ovs_net->dps, list_node)
 		__dp_destroy(dp);

--
2.23.0
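
For readers unfamiliar with the check behind this warning, here is a minimal user-space sketch in C of the logic the patch relies on. It is not kernel code: every `model_*` name, the `struct node` list, and the global counters are hypothetical stand-ins invented for illustration. It assumes that the RCU list iterator warns only when its optional lockdep condition is false and the caller is also outside an RCU read-side section, which is what the warning text and the patch description suggest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical user-space stand-ins for kernel state (not kernel APIs). */
int rcu_read_depth;    /* models rcu_read_lock() nesting depth */
bool ovs_mutex_held;   /* models whether ovs_lock() is currently held */
int warnings;          /* number of "suspicious RCU usage" reports so far */

void model_rcu_read_lock(void) { rcu_read_depth++; }

void model_rcu_read_unlock(void)
{
	assert(rcu_read_depth > 0);
	rcu_read_depth--;
}

void model_ovs_lock(void) { ovs_mutex_held = true; }
void model_ovs_unlock(void) { ovs_mutex_held = false; }

/* Stand-in for lockdep_ovsl_is_held(): true while the OVS lock is held. */
bool model_lockdep_ovsl_is_held(void) { return ovs_mutex_held; }

/* Assumed check: complain unless cond holds or we are in a reader section. */
void model_list_check_rcu(bool cond)
{
	if (!cond && rcu_read_depth == 0) {
		warnings++;
		fprintf(stderr, "RCU-list traversed in non-reader section!!\n");
	}
}

/* Hypothetical list node standing in for struct ovs_ct_limit. */
struct node {
	int zone;
	struct node *next;
};

/* Walk the list; cond plays the role of the optional lockdep expression. */
int model_count_entries(const struct node *head, bool cond)
{
	int n = 0;

	model_list_check_rcu(cond);
	for (; head; head = head->next)
		n++;
	return n;
}

/* Ordering before the patch: traverse first, take the lock afterwards. */
int model_exit_before_patch(const struct node *head)
{
	int n = model_count_entries(head, false);

	model_ovs_lock();
	model_ovs_unlock();
	return n;
}

/* Ordering after the patch: take the lock, then traverse with the condition. */
int model_exit_after_patch(const struct node *head)
{
	int n;

	model_ovs_lock();
	n = model_count_entries(head, model_lockdep_ovsl_is_held());
	model_ovs_unlock();
	return n;
}
```

In this model, `model_exit_before_patch()` bumps `warnings` once because neither condition is met, while `model_exit_after_patch()` does not, mirroring the reordering in `ovs_exit_net()` plus the extra macro argument in `ovs_ct_limit_exit()`. A traversal wrapped in `model_rcu_read_lock()`/`model_rcu_read_unlock()` is also silent, even with a false condition.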

2020-04-19 17:40:04

by Pravin Shelar

Subject: Re: [PATCH] net: openvswitch: ovs_ct_exit to be done under ovs_lock

On Sun, Apr 19, 2020 at 1:44 AM <[email protected]> wrote:
>
> From: Tonghao Zhang <[email protected]>
>
> syzbot wrote:
> | =============================
> | WARNING: suspicious RCU usage
> | 5.7.0-rc1+ #45 Not tainted
> | -----------------------------
> | net/openvswitch/conntrack.c:1898 RCU-list traversed in non-reader section!!
> |
> | other info that might help us debug this:
> | rcu_scheduler_active = 2, debug_locks = 1
> | ...
> |
> | stack backtrace:
> | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
> | Workqueue: netns cleanup_net
> | Call Trace:
> | ...
> | ovs_ct_exit
> | ovs_exit_net
> | ops_exit_list.isra.7
> | cleanup_net
> | process_one_work
> | worker_thread
>
> To avoid that warning, invoke the ovs_ct_exit under ovs_lock and add
> lockdep_ovsl_is_held as optional lockdep expression.
>
> Link: https://lore.kernel.org/lkml/[email protected]
> Fixes: 11efd5cb04a1 ("openvswitch: Support conntrack zone limit")
> Cc: Pravin B Shelar <[email protected]>
> Cc: Yi-Hung Wei <[email protected]>
> Reported-by: [email protected]
> Signed-off-by: Tonghao Zhang <[email protected]>

Acked-by: Pravin B Shelar <[email protected]>

Thanks.

2020-04-20 02:01:59

by Qian Cai

Subject: Re: linux-next test error: WARNING: suspicious RCU usage in ovs_ct_exit



> On Apr 18, 2020, at 3:02 AM, Dmitry Vyukov <[email protected]> wrote:
>
> On Sat, Mar 14, 2020 at 8:57 AM syzbot
> <[email protected]> wrote:
>>
>> Hello,
>>
>> syzbot found the following crash on:
>>
>> HEAD commit: 2e602db7 Add linux-next specific files for 20200313
>> git tree: linux-next
>> console output: https://syzkaller.appspot.com/x/log.txt?x=16669919e00000
>> kernel config: https://syzkaller.appspot.com/x/.config?x=cf2879fc1055b886
>> dashboard link: https://syzkaller.appspot.com/bug?extid=7ef50afd3a211f879112
>> compiler: gcc (GCC) 9.0.0 20181231 (experimental)
>>
>> Unfortunately, I don't have any reproducer for this crash yet.
>>
>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: [email protected]
>
> +linux-next, Stephen for currently open linux-next build/boot failure
>
>> =============================
>> WARNING: suspicious RCU usage
>> 5.6.0-rc5-next-20200313-syzkaller #0 Not tainted
>> -----------------------------
>> net/openvswitch/conntrack.c:1898 RCU-list traversed in non-reader section!!

Those should be fixed by,

https://lore.kernel.org/netdev/[email protected]/


>> =============================
>> WARNING: suspicious RCU usage
>> 5.6.0-rc5-next-20200313-syzkaller #0 Not tainted
>> -----------------------------
>> net/ipv4/ipmr.c:1757 RCU-list traversed in non-reader section!!

and,

https://lore.kernel.org/netdev/[email protected]/

It looks like both are waiting for David to pick them up.

2020-04-20 18:05:11

by David Miller

Subject: Re: [PATCH] net: openvswitch: ovs_ct_exit to be done under ovs_lock

From: [email protected]
Date: Fri, 17 Apr 2020 02:57:31 +0800

> From: Tonghao Zhang <[email protected]>
>
> syzbot wrote:
...
> To avoid that warning, invoke the ovs_ct_exit under ovs_lock and add
> lockdep_ovsl_is_held as optional lockdep expression.
>
> Link: https://lore.kernel.org/lkml/[email protected]
> Fixes: 11efd5cb04a1 ("openvswitch: Support conntrack zone limit")
> Cc: Pravin B Shelar <[email protected]>
> Cc: Yi-Hung Wei <[email protected]>
> Reported-by: [email protected]
> Signed-off-by: Tonghao Zhang <[email protected]>

Applied and queued up for -stable, thanks.