2023-07-20 15:58:50

by syzbot

Subject: [syzbot] [bpf?] WARNING: ODEBUG bug in tcx_uninstall

Hello,

syzbot found the following issue on:

HEAD commit: 03b123debcbc tcp: tcp_enter_quickack_mode() should be static
git tree: net-next
console+strace: https://syzkaller.appspot.com/x/log.txt?x=17ac9ffaa80000
kernel config: https://syzkaller.appspot.com/x/.config?x=32e3dcc11fd0d297
dashboard link: https://syzkaller.appspot.com/bug?extid=14736e249bce46091c18
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=133f36c6a80000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11a8e73aa80000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/348462fb61fa/disk-03b123de.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/33375730f77f/vmlinux-03b123de.xz
kernel image: https://storage.googleapis.com/syzbot-assets/b6882fbac041/bzImage-03b123de.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]

------------[ cut here ]------------
ODEBUG: activate active (active state 1) object: ffff88801529b000 object type: rcu_head hint: 0x0
WARNING: CPU: 0 PID: 57 at lib/debugobjects.c:514 debug_print_object+0x19e/0x2a0 lib/debugobjects.c:514
Modules linked in:
CPU: 0 PID: 57 Comm: kworker/u4:4 Not tainted 6.5.0-rc1-syzkaller-00458-g03b123debcbc #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/03/2023
Workqueue: netns cleanup_net
RIP: 0010:debug_print_object+0x19e/0x2a0 lib/debugobjects.c:514
Code: 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 75 49 48 8b 14 dd c0 20 c8 8a 41 56 4c 89 e6 48 c7 c7 20 14 c8 8a e8 b2 fa 28 fd <0f> 0b 58 83 05 5c 8b 87 0a 01 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e
RSP: 0018:ffffc90001587828 EFLAGS: 00010286
RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000
RDX: ffff888016ee5940 RSI: ffffffff814d4986 RDI: 0000000000000001
RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: ffffffff8ac81a80
R13: ffffffff8a6df720 R14: 0000000000000000 R15: ffff88802a6b65c8
FS: 0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000000c776000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
debug_object_activate+0x32b/0x490 lib/debugobjects.c:733
debug_rcu_head_queue kernel/rcu/rcu.h:226 [inline]
kvfree_call_rcu+0x30/0xbe0 kernel/rcu/tree.c:3359
tcx_entry_free include/net/tcx.h:96 [inline]
tcx_uninstall+0x2fd/0x630 kernel/bpf/tcx.c:115
dev_tcx_uninstall include/net/tcx.h:174 [inline]
unregister_netdevice_many_notify+0x5e7/0x1a20 net/core/dev.c:10899
ip6gre_exit_batch_net+0x3ea/0x580 net/ipv6/ip6_gre.c:1642
ops_exit_list+0x125/0x170 net/core/net_namespace.c:175
cleanup_net+0x505/0xb20 net/core/net_namespace.c:614
process_one_work+0xaa2/0x16f0 kernel/workqueue.c:2597
worker_thread+0x687/0x1110 kernel/workqueue.c:2748
kthread+0x33a/0x430 kernel/kthread.c:389
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
</TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the bug is already fixed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to change bug's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the bug is a duplicate of another bug, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup


2023-07-21 01:13:27

by Daniel Borkmann

Subject: Re: [syzbot] [bpf?] WARNING: ODEBUG bug in tcx_uninstall

On 7/20/23 5:06 PM, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: 03b123debcbc tcp: tcp_enter_quickack_mode() should be static
> git tree: net-next
> console+strace: https://syzkaller.appspot.com/x/log.txt?x=17ac9ffaa80000
> kernel config: https://syzkaller.appspot.com/x/.config?x=32e3dcc11fd0d297
> dashboard link: https://syzkaller.appspot.com/bug?extid=14736e249bce46091c18
> compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=133f36c6a80000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11a8e73aa80000
>
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/348462fb61fa/disk-03b123de.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/33375730f77f/vmlinux-03b123de.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/b6882fbac041/bzImage-03b123de.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: [email protected]

Thanks, I'll take a look this evening.

2023-07-22 03:44:12

by syzbot

Subject: Re: [syzbot] [bpf?] WARNING: ODEBUG bug in tcx_uninstall

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: [email protected]

Tested on:

commit: 03b123de tcp: tcp_enter_quickack_mode() should be static
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git
console output: https://syzkaller.appspot.com/x/log.txt?x=11086ae6a80000
kernel config: https://syzkaller.appspot.com/x/.config?x=32e3dcc11fd0d297
dashboard link: https://syzkaller.appspot.com/bug?extid=14736e249bce46091c18
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=10eb6176a80000

Note: testing is done by a robot and is best-effort only.

2023-07-24 10:09:07

by syzbot

Subject: Re: [syzbot] [bpf?] WARNING: ODEBUG bug in tcx_uninstall

syzbot has bisected this issue to:

commit e420bed025071a623d2720a92bc2245c84757ecb
Author: Daniel Borkmann <[email protected]>
Date: Wed Jul 19 14:08:52 2023 +0000

bpf: Add fd-based tcx multi-prog infra with link support

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=14c60c6aa80000
start commit: 03b123debcbc tcp: tcp_enter_quickack_mode() should be static
git tree: net-next
final oops: https://syzkaller.appspot.com/x/report.txt?x=16c60c6aa80000
console output: https://syzkaller.appspot.com/x/log.txt?x=12c60c6aa80000
kernel config: https://syzkaller.appspot.com/x/.config?x=32e3dcc11fd0d297
dashboard link: https://syzkaller.appspot.com/bug?extid=14736e249bce46091c18
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=133f36c6a80000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11a8e73aa80000

Reported-by: [email protected]
Fixes: e420bed02507 ("bpf: Add fd-based tcx multi-prog infra with link support")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

2023-07-26 08:10:40

by Leon Romanovsky

Subject: Re: [syzbot] [bpf?] WARNING: ODEBUG bug in tcx_uninstall

On Fri, Jul 21, 2023 at 02:52:14AM +0200, Daniel Borkmann wrote:
> On 7/20/23 5:06 PM, syzbot wrote:
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit: 03b123debcbc tcp: tcp_enter_quickack_mode() should be static
> > git tree: net-next
> > console+strace: https://syzkaller.appspot.com/x/log.txt?x=17ac9ffaa80000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=32e3dcc11fd0d297
> > dashboard link: https://syzkaller.appspot.com/bug?extid=14736e249bce46091c18
> > compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=133f36c6a80000
> > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11a8e73aa80000
> >
> > Downloadable assets:
> > disk image: https://storage.googleapis.com/syzbot-assets/348462fb61fa/disk-03b123de.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/33375730f77f/vmlinux-03b123de.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/b6882fbac041/bzImage-03b123de.xz
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: [email protected]
>
> Thanks, I'll take a look this evening.

Did anybody post a fix for that?

We are experiencing the following kernel panic in netdev commit
b57e0d48b300 (net-next/main) Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue

[ 935.131864] ------------[ cut here ]------------
[ 935.133223] WARNING: CPU: 7 PID: 32248 at kernel/bpf/tcx.c:114 tcx_uninstall+0x158/0x1a0
[ 935.135408] Modules linked in: act_tunnel_key vxlan act_mirred act_skbedit cls_matchall nfnetlink_cttimeout act_gact cls_flower sch_ingress bonding mlx5_vfio_pci vfio_pci vfio_pci_core vfio_iommu_type1 vfio mlx5_ib mlx5_core ip6_gre ib_ipoib nf_tables rdma_ucm ib_uverbs geneve ip_gre gre ip6_tunnel tunnel6 ipip tunnel4 ib_umad iptable_raw openvswitch nsh xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcrdma ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm rpcsec_gss_krb5 auth_rpcgss oid_registry ib_core overlay zram zsmalloc fuse [last unloaded: ib_uverbs]
[ 935.141679] CPU: 7 PID: 32248 Comm: devlink Not tainted 6.5.0-rc2_net_next_mlx5_89edf40 #1
[ 935.142577] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 935.143758] RIP: 0010:tcx_uninstall+0x158/0x1a0
[ 935.144297] Code: f0 fe ff ff ba 4c 00 00 00 48 c7 c6 e7 33 27 82 48 c7 c7 c0 32 27 82 c6 05 5c 0a 3d 01 01 e8 4f a7 e8 ff 0f 0b e9 ca fe ff ff <0f> 0b eb 9d 44 0f b6 35 45 0a 3d 01 41 80 fe 01 0f 87 f6 8b 91 00
[ 935.146192] RSP: 0018:ffff8881eb853928 EFLAGS: 00010202
[ 935.146789] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
[ 935.147548] RDX: ffff8881066f2808 RSI: ffff888106c40000 RDI: 0000000000000000
[ 935.148328] RBP: ffff8881066f2808 R08: 0000000000000001 R09: 00000000000003fe
[ 935.149081] R10: 0000000000000001 R11: 00000000fa83b2da R12: 0000000000000001
[ 935.149847] R13: ffff8881066f2808 R14: dead000000000122 R15: dead000000000100
[ 935.150490] FS: 00007fa48de38800(0000) GS:ffff88852cb80000(0000) knlGS:0000000000000000
[ 935.151213] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 935.151753] CR2: 00007fa48dd7b1e0 CR3: 0000000125f06002 CR4: 0000000000370ea0
[ 935.152378] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 935.153000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 935.153620] Call Trace:
[ 935.153918] <TASK>
[ 935.154194] ? __warn+0x79/0x120
[ 935.154545] ? tcx_uninstall+0x158/0x1a0
[ 935.154936] ? report_bug+0x17c/0x190
[ 935.155320] ? handle_bug+0x3c/0x60
[ 935.155686] ? exc_invalid_op+0x14/0x70
[ 935.156088] ? asm_exc_invalid_op+0x16/0x20
[ 935.156499] ? tcx_uninstall+0x158/0x1a0
[ 935.156896] ? tcx_uninstall+0x57/0x1a0
[ 935.157287] unregister_netdevice_many_notify+0x32f/0x960
[ 935.157790] unregister_netdevice_queue+0x8d/0xe0
[ 935.158236] unregister_netdev+0x18/0x20
[ 935.158630] mlx5e_vport_rep_unload+0x30/0x90 [mlx5_core]
[ 935.159221] esw_offloads_unload_rep+0x24/0x40 [mlx5_core]
[ 935.159777] mlx5_eswitch_unload_vf_vports+0x7a/0xc0 [mlx5_core]
[ 935.160358] mlx5_eswitch_disable_pf_vf_vports+0x15/0xa0 [mlx5_core]
[ 935.160953] esw_offloads_disable+0xe/0x60 [mlx5_core]
[ 935.161469] mlx5_eswitch_disable_locked+0x15a/0x180 [mlx5_core]
[ 935.162058] mlx5_devlink_eswitch_mode_set+0xad/0x380 [mlx5_core]
[ 935.162637] ? devlink_get_from_attrs_lock+0x9e/0x110
[ 935.163108] devlink_nl_cmd_eswitch_set_doit+0x60/0xe0
[ 935.163579] genl_family_rcv_msg_doit.isra.0+0xc2/0x110
[ 935.164086] genl_rcv_msg+0x17d/0x2b0
[ 935.164460] ? devlink_get_from_attrs_lock+0x110/0x110
[ 935.164936] ? devlink_nl_cmd_eswitch_get_doit+0x290/0x290
[ 935.165436] ? devlink_pernet_pre_exit+0xf0/0xf0
[ 935.165880] ? genl_family_rcv_msg_doit.isra.0+0x110/0x110
[ 935.166381] netlink_rcv_skb+0x54/0x100
[ 935.166769] genl_rcv+0x24/0x40
[ 935.167109] netlink_unicast+0x1f6/0x2c0
[ 935.167496] netlink_sendmsg+0x239/0x4b0
[ 935.167901] sock_sendmsg+0x38/0x60
[ 935.168265] ? _copy_from_user+0x2a/0x60
[ 935.168654] __sys_sendto+0x110/0x160
[ 935.169023] ? handle_mm_fault+0xe4/0x270
[ 935.169422] ? do_user_addr_fault+0x270/0x620
[ 935.169849] __x64_sys_sendto+0x20/0x30
[ 935.170232] do_syscall_64+0x3d/0x90
[ 935.170600] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 935.171075] RIP: 0033:0x7fa48dd1340a
[ 935.171438] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 935.172982] RSP: 002b:00007ffcd3a8a498 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 935.173683] RAX: ffffffffffffffda RBX: 00000000008eeb00 RCX: 00007fa48dd1340a
[ 935.174296] RDX: 0000000000000038 RSI: 00000000008eeb00 RDI: 0000000000000003
[ 935.174913] RBP: 00000000008ee910 R08: 00007fa48df37200 R09: 000000000000000c
[ 935.175532] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 935.176166] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
[ 935.176790] </TASK>
[ 935.177060] ---[ end trace 0000000000000000 ]---

Thanks

2023-07-26 15:58:58

by Jakub Kicinski

Subject: Re: [syzbot] [bpf?] WARNING: ODEBUG bug in tcx_uninstall

On Wed, 26 Jul 2023 10:12:54 +0300 Leon Romanovsky wrote:
> > Thanks, I'll take a look this evening.
>
> Did anybody post a fix for that?
>
> We are experiencing the following kernel panic in netdev commit
> b57e0d48b300 (net-next/main) Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue

Not that I know, looks like this is with Daniel's previous fix already
present, and syzbot is hitting it, too :(

2023-07-26 17:48:36

by Leon Romanovsky

Subject: Re: [syzbot] [bpf?] WARNING: ODEBUG bug in tcx_uninstall

On Wed, Jul 26, 2023 at 08:23:12AM -0700, Jakub Kicinski wrote:
> On Wed, 26 Jul 2023 10:12:54 +0300 Leon Romanovsky wrote:
> > > Thanks, I'll take a look this evening.
> >
> > Did anybody post a fix for that?
> >
> > We are experiencing the following kernel panic in netdev commit
> > b57e0d48b300 (net-next/main) Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
>
> Not that I know, looks like this is with Daniel's previous fix already
> present, and syzbot is hitting it, too :(

My naive workaround which restored our regression runs is:

diff --git a/kernel/bpf/tcx.c b/kernel/bpf/tcx.c
index 69a272712b29..10c9ab830702 100644
--- a/kernel/bpf/tcx.c
+++ b/kernel/bpf/tcx.c
@@ -111,6 +111,7 @@ void tcx_uninstall(struct net_device *dev, bool ingress)
                         bpf_prog_put(tuple.prog);
                 tcx_skeys_dec(ingress);
         }
-       WARN_ON_ONCE(tcx_entry(entry)->miniq_active);
+       tcx_miniq_set_active(entry, false);
         tcx_entry_free(entry);
  }


2023-07-26 19:30:42

by Martin KaFai Lau

Subject: Re: [syzbot] [bpf?] WARNING: ODEBUG bug in tcx_uninstall

On 7/26/23 10:01 AM, Leon Romanovsky wrote:
> On Wed, Jul 26, 2023 at 08:23:12AM -0700, Jakub Kicinski wrote:
>> On Wed, 26 Jul 2023 10:12:54 +0300 Leon Romanovsky wrote:
>>>> Thanks, I'll take a look this evening.
>>>
>>> Did anybody post a fix for that?
>>>
>>> We are experiencing the following kernel panic in netdev commit
>>> b57e0d48b300 (net-next/main) Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
>>
>> Not that I know, looks like this is with Daniel's previous fix already
>> present, and syzbot is hitting it, too :(
>
> My naive workaround which restored our regression runs is:
>
> diff --git a/kernel/bpf/tcx.c b/kernel/bpf/tcx.c
> index 69a272712b29..10c9ab830702 100644
> --- a/kernel/bpf/tcx.c
> +++ b/kernel/bpf/tcx.c
> @@ -111,6 +111,7 @@ void tcx_uninstall(struct net_device *dev, bool ingress)
>                          bpf_prog_put(tuple.prog);
>                  tcx_skeys_dec(ingress);
>          }
> -       WARN_ON_ONCE(tcx_entry(entry)->miniq_active);
> +       tcx_miniq_set_active(entry, false);

Thanks for the report. I will look into it.

>          tcx_entry_free(entry);
>   }
>


2023-07-27 00:05:34

by Martin KaFai Lau

Subject: Re: [syzbot] [bpf?] WARNING: ODEBUG bug in tcx_uninstall

On 7/26/23 11:16 AM, Martin KaFai Lau wrote:
> On 7/26/23 10:01 AM, Leon Romanovsky wrote:
>> On Wed, Jul 26, 2023 at 08:23:12AM -0700, Jakub Kicinski wrote:
>>> On Wed, 26 Jul 2023 10:12:54 +0300 Leon Romanovsky wrote:
>>>>> Thanks, I'll take a look this evening.
>>>>
>>>> Did anybody post a fix for that?
>>>>
>>>> We are experiencing the following kernel panic in netdev commit
>>>> b57e0d48b300 (net-next/main) Merge branch '100GbE' of
>>>> git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
>>>
>>> Not that I know, looks like this is with Daniel's previous fix already
>>> present, and syzbot is hitting it, too :(
>>
>> My naive workaround which restored our regression runs is:
>>
>> diff --git a/kernel/bpf/tcx.c b/kernel/bpf/tcx.c
>> index 69a272712b29..10c9ab830702 100644
>> --- a/kernel/bpf/tcx.c
>> +++ b/kernel/bpf/tcx.c
>> @@ -111,6 +111,7 @@ void tcx_uninstall(struct net_device *dev, bool ingress)
>>                          bpf_prog_put(tuple.prog);
>>                  tcx_skeys_dec(ingress);
>>          }
>> -       WARN_ON_ONCE(tcx_entry(entry)->miniq_active);
>> +       tcx_miniq_set_active(entry, false);
>
> Thanks for the report. I will look into it.

I don't see how that may be triggered for now after Daniel's recent fix in
commit dc644b540a2d ("tcx: Fix splat in ingress_destroy upon tcx_entry_free").
Do you have a small reproducible case? Thanks.

>
>>          tcx_entry_free(entry);
>>   }
>>
>
>


2023-07-27 06:24:24

by Leon Romanovsky

Subject: Re: [syzbot] [bpf?] WARNING: ODEBUG bug in tcx_uninstall

On Wed, Jul 26, 2023 at 04:33:40PM -0700, Martin KaFai Lau wrote:
> On 7/26/23 11:16 AM, Martin KaFai Lau wrote:
> > On 7/26/23 10:01 AM, Leon Romanovsky wrote:
> > > On Wed, Jul 26, 2023 at 08:23:12AM -0700, Jakub Kicinski wrote:
> > > > On Wed, 26 Jul 2023 10:12:54 +0300 Leon Romanovsky wrote:
> > > > > > Thanks, I'll take a look this evening.
> > > > >
> > > > > Did anybody post a fix for that?
> > > > >
> > > > > We are experiencing the following kernel panic in netdev commit
> > > > > b57e0d48b300 (net-next/main) Merge branch '100GbE' of
> > > > > git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
> > > >
> > > > Not that I know, looks like this is with Daniel's previous fix already
> > > > present, and syzbot is hitting it, too :(
> > >
> > > My naive workaround which restored our regression runs is:
> > >
> > > diff --git a/kernel/bpf/tcx.c b/kernel/bpf/tcx.c
> > > index 69a272712b29..10c9ab830702 100644
> > > --- a/kernel/bpf/tcx.c
> > > +++ b/kernel/bpf/tcx.c
> > > @@ -111,6 +111,7 @@ void tcx_uninstall(struct net_device *dev, bool ingress)
> > > ???????????????????????? bpf_prog_put(tuple.prog);
> > > ???????????????? tcx_skeys_dec(ingress);
> > > ???????? }
> > > -?????? WARN_ON_ONCE(tcx_entry(entry)->miniq_active);
> > > +?????? tcx_miniq_set_active(entry, false);
> >
> > Thanks for the report. I will look into it.
>
> I don't see how that may be triggered for now after Daniel's recent fix in
> commit dc644b540a2d ("tcx: Fix splat in ingress_destroy upon
> tcx_entry_free").

Both our regression and syzbot have this fix in the trees.


> Do you have a small reproducible case? Thanks.

Unfortunately no.

Thanks

>
> >
> > > ???????? tcx_entry_free(entry);
> > > ? }
> > >
> >
> >
>

2023-07-28 00:25:39

by Daniel Borkmann

Subject: Re: [syzbot] [bpf?] WARNING: ODEBUG bug in tcx_uninstall

On 7/27/23 7:41 AM, Leon Romanovsky wrote:
> On Wed, Jul 26, 2023 at 04:33:40PM -0700, Martin KaFai Lau wrote:
>> On 7/26/23 11:16 AM, Martin KaFai Lau wrote:
>>> On 7/26/23 10:01 AM, Leon Romanovsky wrote:
>>>> On Wed, Jul 26, 2023 at 08:23:12AM -0700, Jakub Kicinski wrote:
>>>>> On Wed, 26 Jul 2023 10:12:54 +0300 Leon Romanovsky wrote:
>>>>>>> Thanks, I'll take a look this evening.
>>>>>>
>>>>>> Did anybody post a fix for that?
>>>>>>
>>>>>> We are experiencing the following kernel panic in netdev commit
>>>>>> b57e0d48b300 (net-next/main) Merge branch '100GbE' of
>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
>>>>>
>>>>> Not that I know, looks like this is with Daniel's previous fix already
>>>>> present, and syzbot is hitting it, too :(
>>>>
>>>> My naive workaround which restored our regression runs is:
>>>>
>>>> diff --git a/kernel/bpf/tcx.c b/kernel/bpf/tcx.c
>>>> index 69a272712b29..10c9ab830702 100644
>>>> --- a/kernel/bpf/tcx.c
>>>> +++ b/kernel/bpf/tcx.c
>>>> @@ -111,6 +111,7 @@ void tcx_uninstall(struct net_device *dev, bool ingress)
>>>>                          bpf_prog_put(tuple.prog);
>>>>                  tcx_skeys_dec(ingress);
>>>>          }
>>>> -       WARN_ON_ONCE(tcx_entry(entry)->miniq_active);
>>>> +       tcx_miniq_set_active(entry, false);
>>>
>>> Thanks for the report. I will look into it.
>>
>> I don't see how that may be triggered for now after Daniel's recent fix in
>> commit dc644b540a2d ("tcx: Fix splat in ingress_destroy upon
>> tcx_entry_free").
>
> Both our regression and syzbot have this fix in the trees.
>
>> Do you have a small reproducible case? Thanks.
>
> Unfortunately no.

Thanks for the report, we found the root cause and will send a fix in the next
day or two.

Best,
Daniel