2021-10-13 11:42:01

by syzbot

Subject: [syzbot] BUG: corrupted list in netif_napi_add

Hello,

syzbot found the following issue on:

HEAD commit: 683f29b781ae Add linux-next specific files for 20211008
git tree: linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=1525a614b00000
kernel config: https://syzkaller.appspot.com/x/.config?x=673b3589d970c
dashboard link: https://syzkaller.appspot.com/bug?extid=62e474dd92a35e3060d8
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=17c98e98b00000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]

IPv6: ADDRCONF(NETDEV_CHANGE): vcan0: link becomes ready
list_add double add: new=ffff888023417160, prev=ffff88807de3a050, next=ffff888023417160.
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:29!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 9490 Comm: syz-executor.1 Not tainted 5.15.0-rc4-next-20211008-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:__list_add_valid.cold+0x26/0x3c lib/list_debug.c:29
Code: b1 24 c3 fa 4c 89 e1 48 c7 c7 60 56 04 8a e8 f2 8c f1 ff 0f 0b 48 89 f2 4c 89 e1 48 89 ee 48 c7 c7 a0 57 04 8a e8 db 8c f1 ff <0f> 0b 48 89 f1 48 c7 c7 20 57 04 8a 4c 89 e6 e8 c7 8c f1 ff 0f 0b
RSP: 0018:ffffc90002c26a48 EFLAGS: 00010286
RAX: 0000000000000058 RBX: 0000000000000040 RCX: 0000000000000000
RDX: ffff888023263a00 RSI: ffffffff815e0d78 RDI: fffff52000584d3b
RBP: ffff888023417160 R08: 0000000000000058 R09: 0000000000000000
R10: ffffffff815dab5e R11: 0000000000000000 R12: ffff888023417160
R13: ffff888023417000 R14: ffff888023417160 R15: ffff888023417160
FS: 00007f841e9e8700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000000601bd000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__list_add_rcu include/linux/rculist.h:79 [inline]
list_add_rcu include/linux/rculist.h:106 [inline]
netif_napi_add+0x3fd/0x9c0 net/core/dev.c:6889
veth_enable_xdp_range+0x1b1/0x300 drivers/net/veth.c:1009
veth_enable_xdp+0x2a5/0x620 drivers/net/veth.c:1063
veth_xdp_set drivers/net/veth.c:1483 [inline]
veth_xdp+0x4d4/0x780 drivers/net/veth.c:1523
bond_xdp_set drivers/net/bonding/bond_main.c:5217 [inline]
bond_xdp+0x325/0x920 drivers/net/bonding/bond_main.c:5263
dev_xdp_install+0xd5/0x270 net/core/dev.c:9365
dev_xdp_attach+0x83d/0x1010 net/core/dev.c:9513
dev_change_xdp_fd+0x246/0x300 net/core/dev.c:9753
do_setlink+0x2fb4/0x3970 net/core/rtnetlink.c:2931
rtnl_group_changelink net/core/rtnetlink.c:3242 [inline]
__rtnl_newlink+0xc06/0x1750 net/core/rtnetlink.c:3396
rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3506
rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5572
netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2491
netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1345
netlink_sendmsg+0x86d/0xda0 net/netlink/af_netlink.c:1916
sock_sendmsg_nosec net/socket.c:704 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:724
____sys_sendmsg+0x6e8/0x810 net/socket.c:2409
___sys_sendmsg+0xf3/0x170 net/socket.c:2463
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2492
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f841f2718d9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f841e9e8188 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f841f375f60 RCX: 00007f841f2718d9
RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000003
RBP: 00007f841f2cbcb4 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffc8978d37f R14: 00007f841e9e8300 R15: 0000000000022000
</TASK>
Modules linked in:
---[ end trace 7281cadbc8534f23 ]---
RIP: 0010:__list_add_valid.cold+0x26/0x3c lib/list_debug.c:29
Code: b1 24 c3 fa 4c 89 e1 48 c7 c7 60 56 04 8a e8 f2 8c f1 ff 0f 0b 48 89 f2 4c 89 e1 48 89 ee 48 c7 c7 a0 57 04 8a e8 db 8c f1 ff <0f> 0b 48 89 f1 48 c7 c7 20 57 04 8a 4c 89 e6 e8 c7 8c f1 ff 0f 0b
RSP: 0018:ffffc90002c26a48 EFLAGS: 00010286
RAX: 0000000000000058 RBX: 0000000000000040 RCX: 0000000000000000
RDX: ffff888023263a00 RSI: ffffffff815e0d78 RDI: fffff52000584d3b
RBP: ffff888023417160 R08: 0000000000000058 R09: 0000000000000000
R10: ffffffff815dab5e R11: 0000000000000000 R12: ffff888023417160
R13: ffff888023417000 R14: ffff888023417160 R15: ffff888023417160
FS: 00007f841e9e8700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000000601bd000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches
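
For readers unfamiliar with the check that fired: the "list_add double add" message comes from the list debugging code, which refuses to link an entry that is already one of the two neighbours it would be inserted between. In the report above new and next are the same address, which would be consistent with the same list entry being added to the head of the same list twice. The following is only a minimal userspace sketch of that idea, not the kernel source; the names list_add_valid, list_add_checked and double_add_is_rejected are made up for illustration, and the sketch returns false where the kernel would hit BUG().

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal doubly linked list node, modelled on the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/*
 * Illustrative sanity checks before linking "entry" between prev and next.
 * Returns false instead of crashing, unlike the real debug code.
 */
static bool list_add_valid(const struct list_head *entry,
			   const struct list_head *prev,
			   const struct list_head *next)
{
	if (next->prev != prev)			/* neighbours no longer agree */
		return false;
	if (prev->next != next)
		return false;
	if (entry == prev || entry == next)	/* the "double add" case */
		return false;
	return true;
}

/* Add "entry" right after "head", but only if the checks pass. */
static bool list_add_checked(struct list_head *entry, struct list_head *head)
{
	struct list_head *prev = head;
	struct list_head *next = head->next;

	if (!list_add_valid(entry, prev, next))
		return false;

	next->prev = entry;
	entry->next = next;
	entry->prev = prev;
	prev->next = entry;
	assert(head->next == entry);
	return true;
}

/* Hypothetical demo: add the same entry twice, expect the second add to fail. */
static bool double_add_is_rejected(void)
{
	struct list_head napi_list, napi;

	init_list_head(&napi_list);
	if (!list_add_checked(&napi, &napi_list))
		return false;			/* first add must succeed */
	return !list_add_checked(&napi, &napi_list);
}
```

In the second call the entry is already head->next, so entry == next holds and the add is rejected, which mirrors the new == next pattern in the log line. Whether the real bug is a genuine second netif_napi_add() on the same napi struct or some other corruption is not something this sketch can tell.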


2021-10-13 13:38:36

by Daniel Borkmann

Subject: Re: [syzbot] BUG: corrupted list in netif_napi_add

On 10/13/21 1:40 PM, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:

[ +Paolo/Toke wrt veth/XDP, +Jussi wrt bond/XDP, please take a look, thanks! ]


2021-10-13 14:42:28

by Paolo Abeni

Subject: Re: [syzbot] BUG: corrupted list in netif_napi_add

On Wed, 2021-10-13 at 04:40 -0700, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>

For the record: my wild guess is that this is related to:

https://syzkaller.appspot.com/bug?extid=67f89551088ea1a6850e

(hopefully they share the same root cause)

I spent some time investigating the latter, with no real clue. This has
a repro, so I'll ask syzbot to provide more info with debug patches.

Cheers,

Paolo

2021-10-14 13:54:55

by Paolo Abeni

Subject: Re: [syzbot] BUG: corrupted list in netif_napi_add

On Wed, 2021-10-13 at 15:35 +0200, Daniel Borkmann wrote:
> On 10/13/21 1:40 PM, syzbot wrote:
> > Hello,
> >
> > syzbot found the following issue on:
>
> [ +Paolo/Toke wrt veth/XDP, +Jussi wrt bond/XDP, please take a look, thanks! ]

For the record: Toke and I are actively investigating this issue and
the other recent related one. So far we have not found anything
relevant.

The only note is that the reproducer is not very reliable - I
could not reproduce the issue locally, and multiple syzbot runs on the
same code give different results. Anyhow, so far the issue has only been
observed on a specific 'next' commit which is currently "not reachable"
from any branch. I'm wondering if the issue was caused by some
inconsistent state of that tree.

Cheers,

Paolo

2021-10-18 14:09:21

by Vlad Buslov

Subject: Re: [syzbot] BUG: corrupted list in netif_napi_add


On Thu 14 Oct 2021 at 16:50, Paolo Abeni <[email protected]> wrote:
> On Wed, 2021-10-13 at 15:35 +0200, Daniel Borkmann wrote:
>> On 10/13/21 1:40 PM, syzbot wrote:
>> > Hello,
>> >
>> > syzbot found the following issue on:
>>
>> [ +Paolo/Toke wrt veth/XDP, +Jussi wrt bond/XDP, please take a look, thanks! ]
>
> For the record: Toke and I are actively investigating this issue and
> the other recent related one. So far we have not found anything
> relevant.
>
> The only note is that the reproducer is not very reliable - I
> could not reproduce the issue locally, and multiple syzbot runs on the
> same code give different results. Anyhow, so far the issue has only been
> observed on a specific 'next' commit which is currently "not reachable"
> from any branch. I'm wondering if the issue was caused by some
> inconsistent state of that tree.

Hi,

We got a use-after-free with a very similar trace [0] during our nightly
regression run. The issue happens when the ip link up/down state is
flipped several times in a loop, and it doesn't reproduce for me
manually. The fact that it didn't reproduce after I ran the test ten
times suggests that it is either very hard to reproduce or that it is
the result of some interaction between several tests in our suite.

[0]:

[ 3187.779569] mlx5_core 0000:08:00.0 enp8s0f0: Link up
[ 3187.890694] ==================================================================
[ 3187.892518] BUG: KASAN: use-after-free in __list_add_valid+0xc3/0xf0
[ 3187.894132] Read of size 8 at addr ffff8881150b3fb8 by task ip/119618
[ 3187.895683]
[ 3187.896209] CPU: 0 PID: 119618 Comm: ip Not tainted 5.15.0-rc5_for_upstream_debug_2021_10_17_12_06 #1
[ 3187.898445] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 3187.901075] Call Trace:
[ 3187.901858] dump_stack_lvl+0x57/0x7d
[ 3187.902899] print_address_description.constprop.0+0x1f/0x140
[ 3187.904346] ? __list_add_valid+0xc3/0xf0
[ 3187.905439] ? __list_add_valid+0xc3/0xf0
[ 3187.906565] kasan_report.cold+0x83/0xdf
[ 3187.907619] ? __list_add_valid+0xc3/0xf0
[ 3187.908693] __list_add_valid+0xc3/0xf0
[ 3187.909765] netif_napi_add+0x399/0x9a0
[ 3187.910794] ? kmalloc_order_trace+0x6a/0x120
[ 3187.911944] mlx5e_open_channels+0x91b/0x2e10 [mlx5_core]
[ 3187.913872] ? rwlock_bug.part.0+0x90/0x90
[ 3187.914959] ? mlx5e_close_cq+0x80/0x80 [mlx5_core]
[ 3187.916584] ? mutex_is_locked+0x13/0x50
[ 3187.917703] mlx5e_open_locked+0x6a/0x1f0 [mlx5_core]
[ 3187.919368] mlx5e_open+0x35/0xb0 [mlx5_core]
[ 3187.920863] __dev_open+0x22f/0x420
[ 3187.921852] ? dev_set_rx_mode+0x80/0x80
[ 3187.922920] ? __mlx5_eswitch_set_vport_vlan+0x290/0x290 [mlx5_core]
[ 3187.924866] ? __local_bh_enable_ip+0xa2/0x100
[ 3187.926148] ? trace_hardirqs_on+0x32/0x120
[ 3187.927270] __dev_change_flags+0x451/0x670
[ 3187.928387] ? dev_set_allmulti+0x10/0x10
[ 3187.929480] ? rtnl_fill_vfinfo+0x936/0xdb0
[ 3187.930592] dev_change_flags+0x8b/0x150
[ 3187.931651] do_setlink+0x820/0x2d60
[ 3187.932631] ? rtnetlink_put_metrics+0x490/0x490
[ 3187.933852] ? lock_release+0x460/0x750
[ 3187.934881] ? kvm_async_pf_task_wake+0x410/0x410
[ 3187.936122] ? lock_downgrade+0x6e0/0x6e0
[ 3187.937203] ? do_raw_spin_unlock+0x54/0x220
[ 3187.938351] ? memset+0x20/0x40
[ 3187.939246] ? __nla_validate_parse+0xb2/0x22c0
[ 3187.940426] ? do_raw_spin_lock+0x126/0x270
[ 3187.941568] ? push_cpu_stop+0x830/0x830
[ 3187.942638] ? rwlock_bug.part.0+0x90/0x90
[ 3187.943733] ? devlink_compat_switch_id_get+0xbb/0x100
[ 3187.945065] ? nla_get_range_signed+0x540/0x540
[ 3187.946272] ? memcpy+0x39/0x60
[ 3187.947162] ? memset+0x20/0x40
[ 3187.948058] ? memset+0x20/0x40
[ 3187.948943] __rtnl_newlink+0xac0/0x1370
[ 3187.950038] ? entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3187.951380] ? rtnl_setlink+0x330/0x330
[ 3187.952417] ? deref_stack_reg+0x160/0x160
[ 3187.953534] ? deref_stack_reg+0xe6/0x160
[ 3187.954619] ? rcu_read_lock_sched_held+0x12/0x70
[ 3187.955848] ? lock_release+0x460/0x750
[ 3187.956886] ? is_bpf_text_address+0x54/0x110
[ 3187.958047] ? lock_downgrade+0x6e0/0x6e0
[ 3187.959133] ? entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3187.960469] ? deref_stack_reg+0x160/0x160
[ 3187.961592] ? is_bpf_text_address+0x73/0x110
[ 3187.962759] ? kernel_text_address+0xda/0x100
[ 3187.963920] ? __kernel_text_address+0xe/0x30
[ 3187.965069] ? unwind_get_return_address+0x56/0xa0
[ 3187.966334] ? __thaw_task+0x70/0x70
[ 3187.967320] ? arch_stack_walk+0x98/0xf0
[ 3187.968405] ? lock_downgrade+0x6e0/0x6e0
[ 3187.969510] ? trace_hardirqs_on+0x32/0x120
[ 3187.970644] ? rcu_read_lock_sched_held+0x12/0x70
[ 3187.971883] rtnl_newlink+0x5f/0x90
[ 3187.972866] rtnetlink_rcv_msg+0x32b/0x950
[ 3187.973968] ? deref_stack_reg+0x160/0x160
[ 3187.975088] ? rtnl_fdb_dump+0x830/0x830
[ 3187.976160] ? rcu_read_lock_sched_held+0x12/0x70
[ 3187.977393] ? lock_acquire+0x38d/0x4c0
[ 3187.978443] ? rcu_read_lock_sched_held+0x12/0x70
[ 3187.979685] ? lock_acquire+0x38d/0x4c0
[ 3187.980733] netlink_rcv_skb+0x11d/0x340
[ 3187.981812] ? rtnl_fdb_dump+0x830/0x830
[ 3187.982862] ? rcu_read_lock_sched_held+0x12/0x70
[ 3187.984105] ? netlink_ack+0x930/0x930
[ 3187.985136] ? netlink_deliver_tap+0x140/0xb10
[ 3187.986316] ? netlink_deliver_tap+0x14c/0xb10
[ 3187.987495] ? _copy_from_iter+0x282/0xbe0
[ 3187.988597] netlink_unicast+0x433/0x700
[ 3187.989693] ? netlink_attachskb+0x740/0x740
[ 3187.990819] ? __alloc_skb+0x117/0x2c0
[ 3187.991855] netlink_sendmsg+0x707/0xbf0
[ 3187.992921] ? netlink_unicast+0x700/0x700
[ 3187.994024] ? netlink_unicast+0x700/0x700
[ 3187.995121] sock_sendmsg+0xb0/0xe0
[ 3187.996091] ____sys_sendmsg+0x4fa/0x6d0
[ 3187.997163] ? iovec_from_user+0x136/0x280
[ 3187.998276] ? kernel_sendmsg+0x30/0x30
[ 3188.012806] ? __import_iovec+0x51/0x610
[ 3188.013858] ___sys_sendmsg+0x12e/0x1b0
[ 3188.014875] ? do_recvmmsg+0x500/0x500
[ 3188.015877] ? get_max_files+0x10/0x10
[ 3188.016866] ? kasan_record_aux_stack+0xab/0xc0
[ 3188.018108] ? call_rcu+0x87/0xd40
[ 3188.019041] ? task_work_run+0xc5/0x160
[ 3188.020044] ? exit_to_user_mode_prepare+0x1d9/0x1e0
[ 3188.021271] ? syscall_exit_to_user_mode+0x19/0x50
[ 3188.022563] ? do_syscall_64+0x4a/0x90
[ 3188.023559] ? entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3188.024858] ? rcu_read_lock_sched_held+0x12/0x70
[ 3188.026121] ? lock_release+0x460/0x750
[ 3188.027174] ? mntput_no_expire+0x113/0xb40
[ 3188.028302] ? lock_downgrade+0x6e0/0x6e0
[ 3188.029398] ? rwlock_bug.part.0+0x90/0x90
[ 3188.030555] ? rcu_read_lock_sched_held+0x12/0x70
[ 3188.031812] ? mntput_no_expire+0x132/0xb40
[ 3188.032940] ? __fget_light+0x51/0x220
[ 3188.033986] __sys_sendmsg+0xa4/0x120
[ 3188.034992] ? __sys_sendmsg_sock+0x20/0x20
[ 3188.036115] ? call_rcu+0x543/0xd40
[ 3188.037084] ? syscall_enter_from_user_mode+0x1d/0x50
[ 3188.038406] ? trace_hardirqs_on+0x32/0x120
[ 3188.039515] do_syscall_64+0x3d/0x90
[ 3188.040502] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3188.041896] RIP: 0033:0x7f904ec94c17
[ 3188.042891] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
[ 3188.047412] RSP: 002b:00007ffc1a6c4a98 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 3188.049361] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f904ec94c17
[ 3188.051121] RDX: 0000000000000000 RSI: 00007ffc1a6c4b00 RDI: 0000000000000003
[ 3188.052881] RBP: 00000000616c5eef R08: 0000000000000001 R09: 00007f904ed55a40
[ 3188.054645] R10: fffffffffffff3d6 R11: 0000000000000246 R12: 0000000000000001
[ 3188.056403] R13: 00007ffc1a6c51b0 R14: 00007ffc1a6c6c87 R15: 000000000048f520
[ 3188.058189]
[ 3188.058732] The buggy address belongs to the page:
[ 3188.059996] page:000000003ccb70fc refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1150b3
[ 3188.062378] flags: 0x8000000000000000(zone=2)
[ 3188.063551] raw: 8000000000000000 0000000000000000 dead000000000122 0000000000000000
[ 3188.065548] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 3188.067518] page dumped because: kasan: bad access detected
[ 3188.068930]
[ 3188.069481] Memory state around the buggy address:
[ 3188.070730] ffff8881150b3e80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 3188.072618] ffff8881150b3f00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 3188.074508] >ffff8881150b3f80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 3188.076378] ^
[ 3188.077711] ffff8881150b4000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 3188.079584] ffff8881150b4080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 3188.081470] ==================================================================
[ 3188.083406] ==================================================================
[ 3188.085280] BUG: KASAN: use-after-free in netif_napi_add+0x8b7/0x9a0
[ 3188.086952] Write of size 8 at addr ffff8881150b3fb8 by task ip/119618
[ 3188.089181]
[ 3188.089987] CPU: 0 PID: 119618 Comm: ip Tainted: G B 5.15.0-rc5_for_upstream_debug_2021_10_17_12_06 #1
[ 3188.092659] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 3188.095481] Call Trace:
[ 3188.096222] dump_stack_lvl+0x57/0x7d
[ 3188.097238] print_address_description.constprop.0+0x1f/0x140
[ 3188.098764] ? netif_napi_add+0x8b7/0x9a0
[ 3188.099862] ? netif_napi_add+0x8b7/0x9a0
[ 3188.100940] kasan_report.cold+0x83/0xdf
[ 3188.102041] ? netif_napi_add+0x8b7/0x9a0
[ 3188.103140] netif_napi_add+0x8b7/0x9a0
[ 3188.104180] ? kmalloc_order_trace+0x6a/0x120
[ 3188.105336] mlx5e_open_channels+0x91b/0x2e10 [mlx5_core]
[ 3188.107145] ? rwlock_bug.part.0+0x90/0x90
[ 3188.108238] ? mlx5e_close_cq+0x80/0x80 [mlx5_core]
[ 3188.109882] ? mutex_is_locked+0x13/0x50
[ 3188.110985] mlx5e_open_locked+0x6a/0x1f0 [mlx5_core]
[ 3188.112644] mlx5e_open+0x35/0xb0 [mlx5_core]
[ 3188.114215] __dev_open+0x22f/0x420
[ 3188.115186] ? dev_set_rx_mode+0x80/0x80
[ 3188.116247] ? __mlx5_eswitch_set_vport_vlan+0x290/0x290 [mlx5_core]
[ 3188.118252] ? __local_bh_enable_ip+0xa2/0x100
[ 3188.119438] ? trace_hardirqs_on+0x32/0x120
[ 3188.120554] __dev_change_flags+0x451/0x670
[ 3188.121705] ? dev_set_allmulti+0x10/0x10
[ 3188.122828] ? rtnl_fill_vfinfo+0x936/0xdb0
[ 3188.123943] dev_change_flags+0x8b/0x150
[ 3188.124995] do_setlink+0x820/0x2d60
[ 3188.126023] ? rtnetlink_put_metrics+0x490/0x490
[ 3188.127233] ? lock_release+0x460/0x750
[ 3188.128269] ? kvm_async_pf_task_wake+0x410/0x410
[ 3188.129502] ? lock_downgrade+0x6e0/0x6e0
[ 3188.130620] ? do_raw_spin_unlock+0x54/0x220
[ 3188.131781] ? memset+0x20/0x40
[ 3188.132663] ? __nla_validate_parse+0xb2/0x22c0
[ 3188.133894] ? do_raw_spin_lock+0x126/0x270
[ 3188.135066] ? push_cpu_stop+0x830/0x830
[ 3188.136136] ? rwlock_bug.part.0+0x90/0x90
[ 3188.137230] ? devlink_compat_switch_id_get+0xbb/0x100
[ 3188.138585] ? nla_get_range_signed+0x540/0x540
[ 3188.139780] ? memcpy+0x39/0x60
[ 3188.140683] ? memset+0x20/0x40
[ 3188.141580] ? memset+0x20/0x40
[ 3188.142517] __rtnl_newlink+0xac0/0x1370
[ 3188.143579] ? entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3188.144914] ? rtnl_setlink+0x330/0x330
[ 3188.145974] ? deref_stack_reg+0x160/0x160
[ 3188.147078] ? deref_stack_reg+0xe6/0x160
[ 3188.148157] ? rcu_read_lock_sched_held+0x12/0x70
[ 3188.149378] ? lock_release+0x460/0x750
[ 3188.150490] ? is_bpf_text_address+0x54/0x110
[ 3188.151648] ? lock_downgrade+0x6e0/0x6e0
[ 3188.152725] ? entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3188.154075] ? deref_stack_reg+0x160/0x160
[ 3188.155176] ? is_bpf_text_address+0x73/0x110
[ 3188.156353] ? kernel_text_address+0xda/0x100
[ 3188.157510] ? __kernel_text_address+0xe/0x30
[ 3188.158707] ? unwind_get_return_address+0x56/0xa0
[ 3188.159992] ? __thaw_task+0x70/0x70
[ 3188.160979] ? arch_stack_walk+0x98/0xf0
[ 3188.162072] ? lock_downgrade+0x6e0/0x6e0
[ 3188.163167] ? trace_hardirqs_on+0x32/0x120
[ 3188.164295] ? rcu_read_lock_sched_held+0x12/0x70
[ 3188.165546] rtnl_newlink+0x5f/0x90
[ 3188.166558] rtnetlink_rcv_msg+0x32b/0x950
[ 3188.167677] ? deref_stack_reg+0x160/0x160
[ 3188.168782] ? rtnl_fdb_dump+0x830/0x830
[ 3188.169857] ? rcu_read_lock_sched_held+0x12/0x70
[ 3188.171089] ? lock_acquire+0x38d/0x4c0
[ 3188.172131] ? rcu_read_lock_sched_held+0x12/0x70
[ 3188.173367] ? lock_acquire+0x38d/0x4c0
[ 3188.174472] netlink_rcv_skb+0x11d/0x340
[ 3188.175531] ? rtnl_fdb_dump+0x830/0x830
[ 3188.176592] ? rcu_read_lock_sched_held+0x12/0x70
[ 3188.177824] ? netlink_ack+0x930/0x930
[ 3188.178848] ? netlink_deliver_tap+0x140/0xb10
[ 3188.180013] ? netlink_deliver_tap+0x14c/0xb10
[ 3188.181188] ? _copy_from_iter+0x282/0xbe0
[ 3188.182351] netlink_unicast+0x433/0x700
[ 3188.183418] ? netlink_attachskb+0x740/0x740
[ 3188.184552] ? __alloc_skb+0x117/0x2c0
[ 3188.185606] netlink_sendmsg+0x707/0xbf0
[ 3188.186672] ? netlink_unicast+0x700/0x700
[ 3188.187783] ? netlink_unicast+0x700/0x700
[ 3188.188882] sock_sendmsg+0xb0/0xe0
[ 3188.189862] ____sys_sendmsg+0x4fa/0x6d0
[ 3188.190971] ? iovec_from_user+0x136/0x280
[ 3188.192074] ? kernel_sendmsg+0x30/0x30
[ 3188.193130] ? __import_iovec+0x51/0x610
[ 3188.194225] ___sys_sendmsg+0x12e/0x1b0
[ 3188.195267] ? do_recvmmsg+0x500/0x500
[ 3188.196301] ? get_max_files+0x10/0x10
[ 3188.197333] ? kasan_record_aux_stack+0xab/0xc0
[ 3188.198558] ? call_rcu+0x87/0xd40
[ 3188.199519] ? task_work_run+0xc5/0x160
[ 3188.200557] ? exit_to_user_mode_prepare+0x1d9/0x1e0
[ 3188.201872] ? syscall_exit_to_user_mode+0x19/0x50
[ 3188.203134] ? do_syscall_64+0x4a/0x90
[ 3188.204152] ? entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3188.205511] ? rcu_read_lock_sched_held+0x12/0x70
[ 3188.206782] ? lock_release+0x460/0x750
[ 3188.207870] ? mntput_no_expire+0x113/0xb40
[ 3188.209025] ? lock_downgrade+0x6e0/0x6e0
[ 3188.210272] ? rwlock_bug.part.0+0x90/0x90
[ 3188.211864] ? rcu_read_lock_sched_held+0x12/0x70
[ 3188.213644] ? mntput_no_expire+0x132/0xb40
[ 3188.215253] ? __fget_light+0x51/0x220
[ 3188.216535] __sys_sendmsg+0xa4/0x120
[ 3188.217574] ? __sys_sendmsg_sock+0x20/0x20
[ 3188.218707] ? call_rcu+0x543/0xd40
[ 3188.219679] ? syscall_enter_from_user_mode+0x1d/0x50
[ 3188.221004] ? trace_hardirqs_on+0x32/0x120
[ 3188.235475] do_syscall_64+0x3d/0x90
[ 3188.236463] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3188.237744] RIP: 0033:0x7f904ec94c17
[ 3188.238693] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
[ 3188.242968] RSP: 002b:00007ffc1a6c4a98 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 3188.244834] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f904ec94c17
[ 3188.246604] RDX: 0000000000000000 RSI: 00007ffc1a6c4b00 RDI: 0000000000000003
[ 3188.248362] RBP: 00000000616c5eef R08: 0000000000000001 R09: 00007f904ed55a40
[ 3188.250140] R10: fffffffffffff3d6 R11: 0000000000000246 R12: 0000000000000001
[ 3188.251889] R13: 00007ffc1a6c51b0 R14: 00007ffc1a6c6c87 R15: 000000000048f520
[ 3188.253667]
[ 3188.254215] The buggy address belongs to the page:
[ 3188.255460] page:000000003ccb70fc refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1150b3
[ 3188.257812] flags: 0x8000000000000000(zone=2)
[ 3188.258985] raw: 8000000000000000 0000000000000000 dead000000000122 0000000000000000
[ 3188.260971] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 3188.262993] page dumped because: kasan: bad access detected
[ 3188.264413]
[ 3188.264943] Memory state around the buggy address:
[ 3188.266203] ffff8881150b3e80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 3188.268082] ffff8881150b3f00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 3188.269957] >ffff8881150b3f80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 3188.271818] ^
[ 3188.273122] ffff8881150b4000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 3188.275000] ffff8881150b4080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 3188.276862] ==================================================================
[ 3188.371511] mlx5_core 0000:08:00.0 enp8s0f0: Link up
[ 3188.376126] IPv6: ADDRCONF(NETDEV_CHANGE): enp8s0f0: link becomes ready
[ 3188.430532] ==================================================================
[ 3188.432378] BUG: KASAN: use-after-free in __list_del_entry_valid+0x14b/0x180
[ 3188.434254] Read of size 8 at addr ffff8881150b3fb8 by task ip/119619
[ 3188.435826]
[ 3188.436365] CPU: 3 PID: 119619 Comm: ip Tainted: G B 5.15.0-rc5_for_upstream_debug_2021_10_17_12_06 #1
[ 3188.439688] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 3188.442423] Call Trace:
[ 3188.443172] dump_stack_lvl+0x57/0x7d
[ 3188.444186] print_address_description.constprop.0+0x1f/0x140
[ 3188.445703] ? __list_del_entry_valid+0x14b/0x180
[ 3188.447004] ? __list_del_entry_valid+0x14b/0x180
[ 3188.448255] kasan_report.cold+0x83/0xdf
[ 3188.449323] ? __list_del_entry_valid+0x14b/0x180
[ 3188.450670] __list_del_entry_valid+0x14b/0x180
[ 3188.451887] ? _raw_spin_unlock+0x1f/0x30
[ 3188.452969] __netif_napi_del.part.0+0xec/0x4a0
[ 3188.454453] mlx5e_close_channel+0x7d/0xd0 [mlx5_core]
[ 3188.456988] mlx5e_close_channels+0xf9/0x200 [mlx5_core]
[ 3188.459599] mlx5e_close_locked+0x101/0x130 [mlx5_core]
[ 3188.462156] mlx5e_close+0xad/0x100 [mlx5_core]
[ 3188.463961] __dev_close_many+0x18e/0x2b0
[ 3188.465045] ? list_netdevice+0x3a0/0x3a0
[ 3188.466187] ? __mlx5_eswitch_set_vport_vlan+0x290/0x290 [mlx5_core]
[ 3188.468156] ? __local_bh_enable_ip+0xa2/0x100
[ 3188.469333] ? trace_hardirqs_on+0x32/0x120
[ 3188.470496] __dev_change_flags+0x254/0x670
[ 3188.471605] ? dev_set_allmulti+0x10/0x10
[ 3188.472692] ? rtnl_fill_vfinfo+0x936/0xdb0
[ 3188.473854] dev_change_flags+0x8b/0x150
[ 3188.474965] do_setlink+0x820/0x2d60
[ 3188.475950] ? rtnetlink_put_metrics+0x490/0x490
[ 3188.477165] ? lock_release+0x460/0x750
[ 3188.478306] ? kvm_async_pf_task_wake+0x410/0x410
[ 3188.479542] ? lock_downgrade+0x6e0/0x6e0
[ 3188.480615] ? do_raw_spin_unlock+0x54/0x220
[ 3188.481790] ? memset+0x20/0x40
[ 3188.482963] ? __nla_validate_parse+0xb2/0x22c0
[ 3188.484167] ? do_raw_spin_lock+0x126/0x270
[ 3188.485281] ? push_cpu_stop+0x830/0x830
[ 3188.486457] ? rwlock_bug.part.0+0x90/0x90
[ 3188.487557] ? devlink_compat_switch_id_get+0xbb/0x100
[ 3188.488894] ? nla_get_range_signed+0x540/0x540
[ 3188.490168] ? memcpy+0x39/0x60
[ 3188.491083] ? memset+0x20/0x40
[ 3188.491966] ? memset+0x20/0x40
[ 3188.492855] __rtnl_newlink+0xac0/0x1370
[ 3188.493987] ? entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3188.495384] ? rtnl_setlink+0x330/0x330
[ 3188.496446] ? deref_stack_reg+0x160/0x160
[ 3188.497551] ? deref_stack_reg+0xe6/0x160
[ 3188.498713] ? rcu_read_lock_sched_held+0x12/0x70
[ 3188.499929] ? lock_release+0x460/0x750
[ 3188.501232] ? is_bpf_text_address+0x54/0x110
[ 3188.502735] ? lock_downgrade+0x6e0/0x6e0
[ 3188.503831] ? entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3188.505157] ? deref_stack_reg+0x160/0x160
[ 3188.506298] ? is_bpf_text_address+0x73/0x110
[ 3188.507459] ? kernel_text_address+0xda/0x100
[ 3188.508615] ? __kernel_text_address+0xe/0x30
[ 3188.509776] ? unwind_get_return_address+0x56/0xa0
[ 3188.511047] ? __thaw_task+0x70/0x70
[ 3188.512033] ? arch_stack_walk+0x98/0xf0
[ 3188.513059] ? lock_downgrade+0x6e0/0x6e0
[ 3188.514191] ? trace_hardirqs_on+0x32/0x120
[ 3188.515303] ? rcu_read_lock_sched_held+0x12/0x70
[ 3188.516524] rtnl_newlink+0x5f/0x90
[ 3188.517513] rtnetlink_rcv_msg+0x32b/0x950
[ 3188.518652] ? deref_stack_reg+0x160/0x160
[ 3188.519761] ? rtnl_fdb_dump+0x830/0x830
[ 3188.520816] ? rcu_read_lock_sched_held+0x12/0x70
[ 3188.522119] ? lock_acquire+0x38d/0x4c0
[ 3188.523211] ? rcu_read_lock_sched_held+0x12/0x70
[ 3188.524435] ? lock_acquire+0x38d/0x4c0
[ 3188.525498] netlink_rcv_skb+0x11d/0x340
[ 3188.526649] ? rtnl_fdb_dump+0x830/0x830
[ 3188.527722] ? rcu_read_lock_sched_held+0x12/0x70
[ 3188.528949] ? netlink_ack+0x930/0x930
[ 3188.530055] ? netlink_deliver_tap+0x140/0xb10
[ 3188.531347] ? netlink_deliver_tap+0x14c/0xb10
[ 3188.532549] ? _copy_from_iter+0x282/0xbe0
[ 3188.533711] netlink_unicast+0x433/0x700
[ 3188.534845] ? netlink_attachskb+0x740/0x740
[ 3188.535987] ? __alloc_skb+0x117/0x2c0
[ 3188.537006] netlink_sendmsg+0x707/0xbf0
[ 3188.538150] ? netlink_unicast+0x700/0x700
[ 3188.539337] ? netlink_unicast+0x700/0x700
[ 3188.540448] sock_sendmsg+0xb0/0xe0
[ 3188.541424] ____sys_sendmsg+0x4fa/0x6d0
[ 3188.542743] ? iovec_from_user+0x136/0x280
[ 3188.543932] ? kernel_sendmsg+0x30/0x30
[ 3188.544963] ? __import_iovec+0x51/0x610
[ 3188.546063] ___sys_sendmsg+0x12e/0x1b0
[ 3188.547189] ? do_recvmmsg+0x500/0x500
[ 3188.548209] ? get_max_files+0x10/0x10
[ 3188.549226] ? kasan_record_aux_stack+0xab/0xc0
[ 3188.550547] ? call_rcu+0x87/0xd40
[ 3188.551509] ? task_work_run+0xc5/0x160
[ 3188.552546] ? exit_to_user_mode_prepare+0x1d9/0x1e0
[ 3188.553896] ? syscall_exit_to_user_mode+0x19/0x50
[ 3188.555195] ? do_syscall_64+0x4a/0x90
[ 3188.556206] ? entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3188.557634] ? rcu_read_lock_sched_held+0x12/0x70
[ 3188.558903] ? lock_release+0x460/0x750
[ 3188.559948] ? mntput_no_expire+0x113/0xb40
[ 3188.561059] ? lock_downgrade+0x6e0/0x6e0
[ 3188.562231] ? rwlock_bug.part.0+0x90/0x90
[ 3188.563338] ? rcu_read_lock_sched_held+0x12/0x70
[ 3188.564583] ? mntput_no_expire+0x132/0xb40
[ 3188.565731] ? __fget_light+0x51/0x220
[ 3188.566858] __sys_sendmsg+0xa4/0x120
[ 3188.567878] ? __sys_sendmsg_sock+0x20/0x20
[ 3188.568995] ? call_rcu+0x543/0xd40
[ 3188.570047] ? syscall_enter_from_user_mode+0x1d/0x50
[ 3188.571387] ? trace_hardirqs_on+0x32/0x120
[ 3188.572502] do_syscall_64+0x3d/0x90
[ 3188.573491] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3188.574916] RIP: 0033:0x7fc68ffd4c17
[ 3188.575900] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
[ 3188.580625] RSP: 002b:00007ffd26634f18 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 3188.582945] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc68ffd4c17
[ 3188.585684] RDX: 0000000000000000 RSI: 00007ffd26634f80 RDI: 0000000000000003
[ 3188.587965] RBP: 00000000616c5eef R08: 0000000000000001 R09: 00007fc690095a40
[ 3188.589788] R10: fffffffffffff3d6 R11: 0000000000000246 R12: 0000000000000001
[ 3188.591618] R13: 00007ffd26635630 R14: 00007ffd26635c85 R15: 000000000048f520
[ 3188.593365]
[ 3188.593953] The buggy address belongs to the page:
[ 3188.595288] page:000000003ccb70fc refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1150b3
[ 3188.597966] flags: 0x8000000000000000(zone=2)
[ 3188.599643] raw: 8000000000000000 0000000000000000 dead000000000122 0000000000000000
[ 3188.601766] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 3188.603786] page dumped because: kasan: bad access detected
[ 3188.622507]
[ 3188.623291] Memory state around the buggy address:
[ 3188.625031] ffff8881150b3e80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 3188.627617] ffff8881150b3f00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 3188.630275] >ffff8881150b3f80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 3188.632956] ^
[ 3188.634838] ffff8881150b4000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 3188.637544] ffff8881150b4080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 3188.640221] ==================================================================

[...]

2021-10-18 15:44:21

by Jakub Kicinski

Subject: Re: [syzbot] BUG: corrupted list in netif_napi_add

On Mon, 18 Oct 2021 17:04:19 +0300 Vlad Buslov wrote:
> We got a use-after-free with very similar trace [0] during nightly
> regression. The issue happens when ip link up/down state is flipped
> several times in loop and doesn't reproduce for me manually. The fact
> that it didn't reproduce for me after running test ten times suggests
> that it is either very hard to reproduce or that it is a result of some
> interaction between several tests in our suite.
>
> [0]:
>
> [ 3187.779569] mlx5_core 0000:08:00.0 enp8s0f0: Link up
> [ 3187.890694] ==================================================================
> [ 3187.892518] BUG: KASAN: use-after-free in __list_add_valid+0xc3/0xf0
> [ 3187.894132] Read of size 8 at addr ffff8881150b3fb8 by task ip/119618

Hm, not sure how similar it is. This one looks like channel was freed
without deleting NAPI. Do you have list debug enabled?

2021-10-18 16:15:23

by Vlad Buslov

Subject: Re: [syzbot] BUG: corrupted list in netif_napi_add

On Mon 18 Oct 2021 at 18:42, Jakub Kicinski <[email protected]> wrote:
> On Mon, 18 Oct 2021 17:04:19 +0300 Vlad Buslov wrote:
>> We got a use-after-free with very similar trace [0] during nightly
>> regression. The issue happens when ip link up/down state is flipped
>> several times in loop and doesn't reproduce for me manually. The fact
>> that it didn't reproduce for me after running test ten times suggests
>> that it is either very hard to reproduce or that it is a result of some
>> interaction between several tests in our suite.
>>
>> [0]:
>>
>> [ 3187.779569] mlx5_core 0000:08:00.0 enp8s0f0: Link up
>> [ 3187.890694] ==================================================================
>> [ 3187.892518] BUG: KASAN: use-after-free in __list_add_valid+0xc3/0xf0
>> [ 3187.894132] Read of size 8 at addr ffff8881150b3fb8 by task ip/119618
>
> Hm, not sure how similar it is. This one looks like channel was freed
> without deleting NAPI. Do you have list debug enabled?

Yes, CONFIG_DEBUG_LIST is enabled.
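
For readers following along, here is a rough user-space sketch of the kind of sanity check list debugging performs before an insert. It is a simplified model, not the kernel source: `struct list_node`, `list_add_valid()` and `list_add_checked()` are hypothetical names, and where the kernel would report corruption this sketch just returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a doubly linked list node. */
struct list_node {
	struct list_node *next, *prev;
};

/* An empty list is a head that points at itself in both directions. */
static void list_init(struct list_node *head)
{
	assert(head != NULL);
	head->next = head;
	head->prev = head;
}

/*
 * Simplified model of the validation done before linking `entry`
 * between `prev` and `next`. Returns false where a debug build
 * would complain about list corruption.
 */
static bool list_add_valid(const struct list_node *entry,
			   const struct list_node *prev,
			   const struct list_node *next)
{
	if (next->prev != prev)		/* neighbours no longer agree */
		return false;
	if (prev->next != next)
		return false;
	if (entry == prev || entry == next)	/* the "double add" case */
		return false;
	return true;
}

/* Insert `entry` right after `head`, refusing if validation fails. */
static bool list_add_checked(struct list_node *entry, struct list_node *head)
{
	struct list_node *next;

	assert(entry != NULL && head != NULL);
	next = head->next;
	if (!list_add_valid(entry, head, next))
		return false;

	next->prev = entry;
	entry->next = next;
	entry->prev = head;
	head->next = entry;
	return true;
}
```

Adding the same node twice in a row trips the third check, because the node is already `head->next`. That is the shape of the syzbot "double add" message, where `new` and `next` are the same address.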

2021-10-18 17:50:04

by Toke Høiland-Jørgensen

Subject: Re: [syzbot] BUG: corrupted list in netif_napi_add

Jakub Kicinski <[email protected]> writes:

> On Mon, 18 Oct 2021 17:04:19 +0300 Vlad Buslov wrote:
>> We got a use-after-free with very similar trace [0] during nightly
>> regression. The issue happens when ip link up/down state is flipped
>> several times in loop and doesn't reproduce for me manually. The fact
>> that it didn't reproduce for me after running test ten times suggests
>> that it is either very hard to reproduce or that it is a result of some
>> interaction between several tests in our suite.
>>
>> [0]:
>>
>> [ 3187.779569] mlx5_core 0000:08:00.0 enp8s0f0: Link up
>> [ 3187.890694] ==================================================================
>> [ 3187.892518] BUG: KASAN: use-after-free in __list_add_valid+0xc3/0xf0
>> [ 3187.894132] Read of size 8 at addr ffff8881150b3fb8 by task ip/119618
>
> Hm, not sure how similar it is. This one looks like channel was freed
> without deleting NAPI. Do you have list debug enabled?

Well, the other report[0] also kinda looks like the NAPI thread keeps
running after it should have been disabled, so maybe they are in fact
related?

-Toke

[0] https://lore.kernel.org/r/[email protected]

2021-10-18 17:59:52

by Jakub Kicinski

Subject: Re: [syzbot] BUG: corrupted list in netif_napi_add

On Mon, 18 Oct 2021 19:40:40 +0200 Toke Høiland-Jørgensen wrote:
> Jakub Kicinski <[email protected]> writes:
>
> > On Mon, 18 Oct 2021 17:04:19 +0300 Vlad Buslov wrote:
> >> We got a use-after-free with very similar trace [0] during nightly
> >> regression. The issue happens when ip link up/down state is flipped
> >> several times in loop and doesn't reproduce for me manually. The fact
> >> that it didn't reproduce for me after running test ten times suggests
> >> that it is either very hard to reproduce or that it is a result of some
> >> interaction between several tests in our suite.
> >>
> >> [0]:
> >>
> >> [ 3187.779569] mlx5_core 0000:08:00.0 enp8s0f0: Link up
> >> [ 3187.890694] ==================================================================
> >> [ 3187.892518] BUG: KASAN: use-after-free in __list_add_valid+0xc3/0xf0
> >> [ 3187.894132] Read of size 8 at addr ffff8881150b3fb8 by task ip/119618
> >
> > Hm, not sure how similar it is. This one looks like channel was freed
> > without deleting NAPI. Do you have list debug enabled?
>
> Well, the other report[0] also kinda looks like the NAPI thread keeps
> running after it should have been disabled, so maybe they are in fact
> related?
>
> [0] https://lore.kernel.org/r/[email protected]

Could be, if napi->state gets corrupted it may lose NAPI_STATE_LISTED.

719c57197010 ("net: make napi_disable() symmetric with enable")
3765996e4f0b ("napi: fix race inside napi_enable")
are the only things that come to mind, but they look fine to me.
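
To make the NAPI_STATE_LISTED point concrete, below is a small user-space model of an add/del pair guarded by a "listed" bit. It assumes, as I understand the core code, that the add path does a test-and-set on that bit and the del path a test-and-clear. All `demo_*` names are hypothetical, and the bit operations here are plain non-atomic C rather than the kernel's atomic helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag standing in for a "this object is on the list" bit. */
enum { DEMO_STATE_LISTED = 1 };

struct demo_napi {
	unsigned long state;
	int times_linked;	/* how many times we are currently linked */
};

/* Non-atomic model of test-and-set: returns the previous bit value. */
static bool demo_test_and_set(unsigned long *word, unsigned long mask)
{
	bool old = (*word & mask) != 0;

	*word |= mask;
	return old;
}

/* Non-atomic model of test-and-clear: returns the previous bit value. */
static bool demo_test_and_clear(unsigned long *word, unsigned long mask)
{
	bool old = (*word & mask) != 0;

	*word &= ~mask;
	return old;
}

/* Add path: link only if the object was not already marked as listed. */
static bool demo_napi_add(struct demo_napi *n)
{
	assert(n != NULL);
	if (demo_test_and_set(&n->state, DEMO_STATE_LISTED))
		return false;
	n->times_linked++;
	return true;
}

/* Del path: unlink only if the object was marked as listed. */
static bool demo_napi_del(struct demo_napi *n)
{
	assert(n != NULL);
	if (!demo_test_and_clear(&n->state, DEMO_STATE_LISTED))
		return false;
	n->times_linked--;
	return true;
}
```

In this model, if something zeroes `state` while the object is still linked, the next add passes the guard and links it a second time, and a single del afterwards still leaves one stale link behind. Whether that is what happens in either report is not established here.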

2021-10-18 23:38:28

by Saeed Mahameed

Subject: Re: [syzbot] BUG: corrupted list in netif_napi_add

On Mon, 2021-10-18 at 19:12 +0300, Vlad Buslov wrote:
> On Mon 18 Oct 2021 at 18:42, Jakub Kicinski <[email protected]> wrote:
> > On Mon, 18 Oct 2021 17:04:19 +0300 Vlad Buslov wrote:
> > > We got a use-after-free with very similar trace [0] during nightly
> > > regression. The issue happens when ip link up/down state is flipped
> > > several times in loop and doesn't reproduce for me manually. The fact
> > > that it didn't reproduce for me after running test ten times suggests
> > > that it is either very hard to reproduce or that it is a result of some
> > > interaction between several tests in our suite.
> > >
> > > [0]:
> > >
> > > [ 3187.779569] mlx5_core 0000:08:00.0 enp8s0f0: Link up
> > > [ 3187.890694] ==================================================================
> > > [ 3187.892518] BUG: KASAN: use-after-free in __list_add_valid+0xc3/0xf0
> > > [ 3187.894132] Read of size 8 at addr ffff8881150b3fb8 by task ip/119618
> >
> > Hm, not sure how similar it is. This one looks like channel was freed
> > without deleting NAPI. Do you have list debug enabled?
>
> Yes, CONFIG_DEBUG_LIST is enabled.
>
Do you have core dumps?
Let's enable kernel.panic_on_oops with core dumps and look at it the
next time we see this. I really don't think mlx5 is leaking.

2021-10-19 10:15:02

by Jussi Maki

Subject: Re: [syzbot] BUG: corrupted list in netif_napi_add

On Thu, Oct 14, 2021 at 3:50 PM Paolo Abeni <[email protected]> wrote:
>
> On Wed, 2021-10-13 at 15:35 +0200, Daniel Borkmann wrote:
> > On 10/13/21 1:40 PM, syzbot wrote:
> > > Hello,
> > >
> > > syzbot found the following issue on:
> >
> > [ +Paolo/Toke wrt veth/XDP, +Jussi wrt bond/XDP, please take a look, thanks! ]
>
> For the record: Toke and I are actively investigating this issue and
> the other recent related one. So far we could not find anything
> relevant.
>
> The only note is that the reproducer is not extremely reliable - I
> could not reproduce locally, and multiple syzbot runs on the same code
> give different results. Anyhow, so far the issue was only observable
> on a specific 'next' commit which is currently "not reachable" from any
> branch. I'm wondering if the issue was caused by some inconsistent
> state of such tree.
>

Hey,

I took a look at the bond/XDP related bits and couldn't find anything
obvious there. And for what it's worth, I was running the syzbot repro
under bpf-next tree (223f903e9c8) in the bpf vmtest.sh environment for
30 minutes without hitting this. An inconsistent tree might be a
plausible cause.

2021-10-25 07:05:08

by Dmitry Vyukov

Subject: Re: [syzbot] BUG: corrupted list in netif_napi_add

On Sat, 23 Oct 2021 at 15:45, Hillf Danton <[email protected]> wrote:
>
> On Mon, 18 Oct 2021 17:04:19 +0300 Vlad Buslov wrote:
> >On Thu 14 Oct 2021 at 16:50, Paolo Abeni <[email protected]> wrote:
> >> On Wed, 2021-10-13 at 15:35 +0200, Daniel Borkmann wrote:
> >>> On 10/13/21 1:40 PM, syzbot wrote:
> >>> > Hello,
> >>> >
> >>> > syzbot found the following issue on:
> >>>
> >>> [ +Paolo/Toke wrt veth/XDP, +Jussi wrt bond/XDP, please take a look, thanks! ]
> >>
> >> For the record: Toke and I are actively investigating this issue and
> >> the other recent related one. So far we could not find anything
> >> relevant.
> >>
> >> The only note is that the reproducer is not extremely reliable - I
> >> could not reproduce locally, and multiple syzbot runs on the same code
> >> give different results. Anyhow, so far the issue was only observable
> >> on a specific 'next' commit which is currently "not reachable" from any
> >> branch. I'm wondering if the issue was caused by some inconsistent
> >> state of such tree.
> >
> >Hi,
> >
> >We got a use-after-free with very similar trace [0] during nightly
> >regression. The issue happens when ip link up/down state is flipped
> >several times in loop and doesn't reproduce for me manually. The fact
> >that it didn't reproduce for me after running test ten times suggests
> >that it is either very hard to reproduce or that it is a result of some
> >interaction between several tests in our suite.
> >
> >[0]:
> >
> >[ 3187.779569] mlx5_core 0000:08:00.0 enp8s0f0: Link up
> > [ 3187.890694] ==================================================================
> > [ 3187.892518] BUG: KASAN: use-after-free in __list_add_valid+0xc3/0xf0
> > [ 3187.894132] Read of size 8 at addr ffff8881150b3fb8 by task ip/119618
> > [ 3187.895683]
> > [ 3187.896209] CPU: 0 PID: 119618 Comm: ip Not tainted 5.15.0-rc5_for_upstream_debug_2021_10_17_12_06 #1
> > [ 3187.898445] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> > [ 3187.901075] Call Trace:
> > [ 3187.901858] dump_stack_lvl+0x57/0x7d
> > [ 3187.902899] print_address_description.constprop.0+0x1f/0x140
> > [ 3187.904346] ? __list_add_valid+0xc3/0xf0
> > [ 3187.905439] ? __list_add_valid+0xc3/0xf0
> > [ 3187.906565] kasan_report.cold+0x83/0xdf
> > [ 3187.907619] ? __list_add_valid+0xc3/0xf0
> > [ 3187.908693] __list_add_valid+0xc3/0xf0
> > [ 3187.909765] netif_napi_add+0x399/0x9a0
> > [ 3187.910794] ? kmalloc_order_trace+0x6a/0x120
> > [ 3187.911944] mlx5e_open_channels+0x91b/0x2e10 [mlx5_core]
> > [ 3187.913872] ? rwlock_bug.part.0+0x90/0x90
> > [ 3187.914959] ? mlx5e_close_cq+0x80/0x80 [mlx5_core]
> > [ 3187.916584] ? mutex_is_locked+0x13/0x50
> > [ 3187.917703] mlx5e_open_locked+0x6a/0x1f0 [mlx5_core]
> > [ 3187.919368] mlx5e_open+0x35/0xb0 [mlx5_core]
> > [ 3187.920863] __dev_open+0x22f/0x420
> > [ 3187.921852] ? dev_set_rx_mode+0x80/0x80
> > [ 3187.922920] ? __mlx5_eswitch_set_vport_vlan+0x290/0x290 [mlx5_core]
> > [ 3187.924866] ? __local_bh_enable_ip+0xa2/0x100
> > [ 3187.926148] ? trace_hardirqs_on+0x32/0x120
> > [ 3187.927270] __dev_change_flags+0x451/0x670
> > [ 3187.928387] ? dev_set_allmulti+0x10/0x10
> > [ 3187.929480] ? rtnl_fill_vfinfo+0x936/0xdb0
> > [ 3187.930592] dev_change_flags+0x8b/0x150
> > [ 3187.931651] do_setlink+0x820/0x2d60
> > [ 3187.932631] ? rtnetlink_put_metrics+0x490/0x490
> > [ 3187.933852] ? lock_release+0x460/0x750
> > [ 3187.934881] ? kvm_async_pf_task_wake+0x410/0x410
> > [ 3187.936122] ? lock_downgrade+0x6e0/0x6e0
> > [ 3187.937203] ? do_raw_spin_unlock+0x54/0x220
> > [ 3187.938351] ? memset+0x20/0x40
> > [ 3187.939246] ? __nla_validate_parse+0xb2/0x22c0
> > [ 3187.940426] ? do_raw_spin_lock+0x126/0x270
> > [ 3187.941568] ? push_cpu_stop+0x830/0x830
> > [ 3187.942638] ? rwlock_bug.part.0+0x90/0x90
> > [ 3187.943733] ? devlink_compat_switch_id_get+0xbb/0x100
> > [ 3187.945065] ? nla_get_range_signed+0x540/0x540
> > [ 3187.946272] ? memcpy+0x39/0x60
> > [ 3187.947162] ? memset+0x20/0x40
> > [ 3187.948058] ? memset+0x20/0x40
> > [ 3187.948943] __rtnl_newlink+0xac0/0x1370
> > [ 3187.950038] ? entry_SYSCALL_64_after_hwframe+0x44/0xae
> > [ 3187.951380] ? rtnl_setlink+0x330/0x330
> > [ 3187.952417] ? deref_stack_reg+0x160/0x160
> > [ 3187.953534] ? deref_stack_reg+0xe6/0x160
> > [ 3187.954619] ? rcu_read_lock_sched_held+0x12/0x70
> > [ 3187.955848] ? lock_release+0x460/0x750
> > [ 3187.956886] ? is_bpf_text_address+0x54/0x110
> > [ 3187.958047] ? lock_downgrade+0x6e0/0x6e0
> > [ 3187.959133] ? entry_SYSCALL_64_after_hwframe+0x44/0xae
> > [ 3187.960469] ? deref_stack_reg+0x160/0x160
> > [ 3187.961592] ? is_bpf_text_address+0x73/0x110
> > [ 3187.962759] ? kernel_text_address+0xda/0x100
> > [ 3187.963920] ? __kernel_text_address+0xe/0x30
> > [ 3187.965069] ? unwind_get_return_address+0x56/0xa0
> > [ 3187.966334] ? __thaw_task+0x70/0x70
> > [ 3187.967320] ? arch_stack_walk+0x98/0xf0
> > [ 3187.968405] ? lock_downgrade+0x6e0/0x6e0
> > [ 3187.969510] ? trace_hardirqs_on+0x32/0x120
> > [ 3187.970644] ? rcu_read_lock_sched_held+0x12/0x70
> > [ 3187.971883] rtnl_newlink+0x5f/0x90
> > [ 3187.972866] rtnetlink_rcv_msg+0x32b/0x950
> > [ 3187.973968] ? deref_stack_reg+0x160/0x160
> > [ 3187.975088] ? rtnl_fdb_dump+0x830/0x830
> > [ 3187.976160] ? rcu_read_lock_sched_held+0x12/0x70
> > [ 3187.977393] ? lock_acquire+0x38d/0x4c0
> > [ 3187.978443] ? rcu_read_lock_sched_held+0x12/0x70
> > [ 3187.979685] ? lock_acquire+0x38d/0x4c0
> > [ 3187.980733] netlink_rcv_skb+0x11d/0x340
> > [ 3187.981812] ? rtnl_fdb_dump+0x830/0x830
> > [ 3187.982862] ? rcu_read_lock_sched_held+0x12/0x70
> > [ 3187.984105] ? netlink_ack+0x930/0x930
> > [ 3187.985136] ? netlink_deliver_tap+0x140/0xb10
> > [ 3187.986316] ? netlink_deliver_tap+0x14c/0xb10
> > [ 3187.987495] ? _copy_from_iter+0x282/0xbe0
> > [ 3187.988597] netlink_unicast+0x433/0x700
> > [ 3187.989693] ? netlink_attachskb+0x740/0x740
> > [ 3187.990819] ? __alloc_skb+0x117/0x2c0
> > [ 3187.991855] netlink_sendmsg+0x707/0xbf0
> > [ 3187.992921] ? netlink_unicast+0x700/0x700
> > [ 3187.994024] ? netlink_unicast+0x700/0x700
> > [ 3187.995121] sock_sendmsg+0xb0/0xe0
> > [ 3187.996091] ____sys_sendmsg+0x4fa/0x6d0
> > [ 3187.997163] ? iovec_from_user+0x136/0x280
> > [ 3187.998276] ? kernel_sendmsg+0x30/0x30
> > [ 3188.012806] ? __import_iovec+0x51/0x610
> > [ 3188.013858] ___sys_sendmsg+0x12e/0x1b0
> > [ 3188.014875] ? do_recvmmsg+0x500/0x500
> > [ 3188.015877] ? get_max_files+0x10/0x10
> > [ 3188.016866] ? kasan_record_aux_stack+0xab/0xc0
> > [ 3188.018108] ? call_rcu+0x87/0xd40
> > [ 3188.019041] ? task_work_run+0xc5/0x160
> > [ 3188.020044] ? exit_to_user_mode_prepare+0x1d9/0x1e0
> > [ 3188.021271] ? syscall_exit_to_user_mode+0x19/0x50
> > [ 3188.022563] ? do_syscall_64+0x4a/0x90
> > [ 3188.023559] ? entry_SYSCALL_64_after_hwframe+0x44/0xae
> > [ 3188.024858] ? rcu_read_lock_sched_held+0x12/0x70
> > [ 3188.026121] ? lock_release+0x460/0x750
> > [ 3188.027174] ? mntput_no_expire+0x113/0xb40
> > [ 3188.028302] ? lock_downgrade+0x6e0/0x6e0
> > [ 3188.029398] ? rwlock_bug.part.0+0x90/0x90
> > [ 3188.030555] ? rcu_read_lock_sched_held+0x12/0x70
> > [ 3188.031812] ? mntput_no_expire+0x132/0xb40
> > [ 3188.032940] ? __fget_light+0x51/0x220
> > [ 3188.033986] __sys_sendmsg+0xa4/0x120
> > [ 3188.034992] ? __sys_sendmsg_sock+0x20/0x20
> > [ 3188.036115] ? call_rcu+0x543/0xd40
> > [ 3188.037084] ? syscall_enter_from_user_mode+0x1d/0x50
> > [ 3188.038406] ? trace_hardirqs_on+0x32/0x120
> > [ 3188.039515] do_syscall_64+0x3d/0x90
> > [ 3188.040502] entry_SYSCALL_64_after_hwframe+0x44/0xae
> > [ 3188.041896] RIP: 0033:0x7f904ec94c17
> > [ 3188.042891] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
> > [ 3188.047412] RSP: 002b:00007ffc1a6c4a98 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
> > [ 3188.049361] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f904ec94c17
> > [ 3188.051121] RDX: 0000000000000000 RSI: 00007ffc1a6c4b00 RDI: 0000000000000003
> > [ 3188.052881] RBP: 00000000616c5eef R08: 0000000000000001 R09: 00007f904ed55a40
> > [ 3188.054645] R10: fffffffffffff3d6 R11: 0000000000000246 R12: 0000000000000001
> > [ 3188.056403] R13: 00007ffc1a6c51b0 R14: 00007ffc1a6c6c87 R15: 000000000048f520
> > [ 3188.058189]
> > [ 3188.058732] The buggy address belongs to the page:
> > [ 3188.059996] page:000000003ccb70fc refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1150b3
> > [ 3188.062378] flags: 0x8000000000000000(zone=2)
> > [ 3188.063551] raw: 8000000000000000 0000000000000000 dead000000000122 0000000000000000
> > [ 3188.065548] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
> > [ 3188.067518] page dumped because: kasan: bad access detected
> > [ 3188.068930]
> > [ 3188.069481] Memory state around the buggy address:
> > [ 3188.070730] ffff8881150b3e80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > [ 3188.072618] ffff8881150b3f00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > [ 3188.074508] >ffff8881150b3f80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > [ 3188.076378] ^
> > [ 3188.077711] ffff8881150b4000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > [ 3188.079584] ffff8881150b4080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > [ 3188.081470] ==================================================================
> > [ 3188.083406] ==================================================================
> > [ 3188.085280] BUG: KASAN: use-after-free in netif_napi_add+0x8b7/0x9a0
> > [ 3188.086952] Write of size 8 at addr ffff8881150b3fb8 by task ip/119618
> > [ 3188.089181]
> > [ 3188.089987] CPU: 0 PID: 119618 Comm: ip Tainted: G B 5.15.0-rc5_for_upstream_debug_2021_10_17_12_06 #1
> > [ 3188.092659] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> > [ 3188.095481] Call Trace:
> > [ 3188.096222] dump_stack_lvl+0x57/0x7d
> > [ 3188.097238] print_address_description.constprop.0+0x1f/0x140
> > [ 3188.098764] ? netif_napi_add+0x8b7/0x9a0
> > [ 3188.099862] ? netif_napi_add+0x8b7/0x9a0
> > [ 3188.100940] kasan_report.cold+0x83/0xdf
> > [ 3188.102041] ? netif_napi_add+0x8b7/0x9a0
> > [ 3188.103140] netif_napi_add+0x8b7/0x9a0
> > [ 3188.104180] ? kmalloc_order_trace+0x6a/0x120
> > [ 3188.105336] mlx5e_open_channels+0x91b/0x2e10 [mlx5_core]
> > [ 3188.107145] ? rwlock_bug.part.0+0x90/0x90
> > [ 3188.108238] ? mlx5e_close_cq+0x80/0x80 [mlx5_core]
> > [ 3188.109882] ? mutex_is_locked+0x13/0x50
> > [ 3188.110985] mlx5e_open_locked+0x6a/0x1f0 [mlx5_core]
> > [ 3188.112644] mlx5e_open+0x35/0xb0 [mlx5_core]
> > [ 3188.114215] __dev_open+0x22f/0x420
> > [ 3188.115186] ? dev_set_rx_mode+0x80/0x80
> > [ 3188.116247] ? __mlx5_eswitch_set_vport_vlan+0x290/0x290 [mlx5_core]
> > [ 3188.118252] ? __local_bh_enable_ip+0xa2/0x100
> > [ 3188.119438] ? trace_hardirqs_on+0x32/0x120
> > [ 3188.120554] __dev_change_flags+0x451/0x670
> > [ 3188.121705] ? dev_set_allmulti+0x10/0x10
> > [ 3188.122828] ? rtnl_fill_vfinfo+0x936/0xdb0
> > [ 3188.123943] dev_change_flags+0x8b/0x150
> > [ 3188.124995] do_setlink+0x820/0x2d60
> > [ 3188.126023] ? rtnetlink_put_metrics+0x490/0x490
> > [ 3188.127233] ? lock_release+0x460/0x750
> > [ 3188.128269] ? kvm_async_pf_task_wake+0x410/0x410
> > [ 3188.129502] ? lock_downgrade+0x6e0/0x6e0
> > [ 3188.130620] ? do_raw_spin_unlock+0x54/0x220
> > [ 3188.131781] ? memset+0x20/0x40
> > [ 3188.132663] ? __nla_validate_parse+0xb2/0x22c0
> > [ 3188.133894] ? do_raw_spin_lock+0x126/0x270
> > [ 3188.135066] ? push_cpu_stop+0x830/0x830
> > [ 3188.136136] ? rwlock_bug.part.0+0x90/0x90
> > [ 3188.137230] ? devlink_compat_switch_id_get+0xbb/0x100
> > [ 3188.138585] ? nla_get_range_signed+0x540/0x540
> > [ 3188.139780] ? memcpy+0x39/0x60
> > [ 3188.140683] ? memset+0x20/0x40
> > [ 3188.141580] ? memset+0x20/0x40
> > [ 3188.142517] __rtnl_newlink+0xac0/0x1370
> > [ 3188.143579] ? entry_SYSCALL_64_after_hwframe+0x44/0xae
> > [ 3188.144914] ? rtnl_setlink+0x330/0x330
> > [ 3188.145974] ? deref_stack_reg+0x160/0x160
> > [ 3188.147078] ? deref_stack_reg+0xe6/0x160
> > [ 3188.148157] ? rcu_read_lock_sched_held+0x12/0x70
> > [ 3188.149378] ? lock_release+0x460/0x750
> > [ 3188.150490] ? is_bpf_text_address+0x54/0x110
> > [ 3188.151648] ? lock_downgrade+0x6e0/0x6e0
> > [ 3188.152725] ? entry_SYSCALL_64_after_hwframe+0x44/0xae
> > [ 3188.154075] ? deref_stack_reg+0x160/0x160
> > [ 3188.155176] ? is_bpf_text_address+0x73/0x110
> > [ 3188.156353] ? kernel_text_address+0xda/0x100
> > [ 3188.157510] ? __kernel_text_address+0xe/0x30
> > [ 3188.158707] ? unwind_get_return_address+0x56/0xa0
> > [ 3188.159992] ? __thaw_task+0x70/0x70
> > [ 3188.160979] ? arch_stack_walk+0x98/0xf0
> > [ 3188.162072] ? lock_downgrade+0x6e0/0x6e0
> > [ 3188.163167] ? trace_hardirqs_on+0x32/0x120
> > [ 3188.164295] ? rcu_read_lock_sched_held+0x12/0x70
> > [ 3188.165546] rtnl_newlink+0x5f/0x90
> > [ 3188.166558] rtnetlink_rcv_msg+0x32b/0x950
> > [ 3188.167677] ? deref_stack_reg+0x160/0x160
> > [ 3188.168782] ? rtnl_fdb_dump+0x830/0x830
> > [ 3188.169857] ? rcu_read_lock_sched_held+0x12/0x70
> > [ 3188.171089] ? lock_acquire+0x38d/0x4c0
> > [ 3188.172131] ? rcu_read_lock_sched_held+0x12/0x70
> > [ 3188.173367] ? lock_acquire+0x38d/0x4c0
> > [ 3188.174472] netlink_rcv_skb+0x11d/0x340
> > [ 3188.175531] ? rtnl_fdb_dump+0x830/0x830
> > [ 3188.176592] ? rcu_read_lock_sched_held+0x12/0x70
> > [ 3188.177824] ? netlink_ack+0x930/0x930
> > [ 3188.178848] ? netlink_deliver_tap+0x140/0xb10
> > [ 3188.180013] ? netlink_deliver_tap+0x14c/0xb10
> > [ 3188.181188] ? _copy_from_iter+0x282/0xbe0
> > [ 3188.182351] netlink_unicast+0x433/0x700
> > [ 3188.183418] ? netlink_attachskb+0x740/0x740
> > [ 3188.184552] ? __alloc_skb+0x117/0x2c0
> > [ 3188.185606] netlink_sendmsg+0x707/0xbf0
> > [ 3188.186672] ? netlink_unicast+0x700/0x700
> > [ 3188.187783] ? netlink_unicast+0x700/0x700
> > [ 3188.188882] sock_sendmsg+0xb0/0xe0
> > [ 3188.189862] ____sys_sendmsg+0x4fa/0x6d0
> > [ 3188.190971] ? iovec_from_user+0x136/0x280
> > [ 3188.192074] ? kernel_sendmsg+0x30/0x30
> > [ 3188.193130] ? __import_iovec+0x51/0x610
> > [ 3188.194225] ___sys_sendmsg+0x12e/0x1b0
> > [ 3188.195267] ? do_recvmmsg+0x500/0x500
> > [ 3188.196301] ? get_max_files+0x10/0x10
> > [ 3188.197333] ? kasan_record_aux_stack+0xab/0xc0
> > [ 3188.198558] ? call_rcu+0x87/0xd40
> > [ 3188.199519] ? task_work_run+0xc5/0x160
> > [ 3188.200557] ? exit_to_user_mode_prepare+0x1d9/0x1e0
> > [ 3188.201872] ? syscall_exit_to_user_mode+0x19/0x50
> > [ 3188.203134] ? do_syscall_64+0x4a/0x90
> > [ 3188.204152] ? entry_SYSCALL_64_after_hwframe+0x44/0xae
> > [ 3188.205511] ? rcu_read_lock_sched_held+0x12/0x70
> > [ 3188.206782] ? lock_release+0x460/0x750
> > [ 3188.207870] ? mntput_no_expire+0x113/0xb40
> > [ 3188.209025] ? lock_downgrade+0x6e0/0x6e0
> > [ 3188.210272] ? rwlock_bug.part.0+0x90/0x90
> > [ 3188.211864] ? rcu_read_lock_sched_held+0x12/0x70
> > [ 3188.213644] ? mntput_no_expire+0x132/0xb40
> > [ 3188.215253] ? __fget_light+0x51/0x220
> > [ 3188.216535] __sys_sendmsg+0xa4/0x120
> > [ 3188.217574] ? __sys_sendmsg_sock+0x20/0x20
> > [ 3188.218707] ? call_rcu+0x543/0xd40
> > [ 3188.219679] ? syscall_enter_from_user_mode+0x1d/0x50
> > [ 3188.221004] ? trace_hardirqs_on+0x32/0x120
> > [ 3188.235475] do_syscall_64+0x3d/0x90
> > [ 3188.236463] entry_SYSCALL_64_after_hwframe+0x44/0xae
> > [ 3188.237744] RIP: 0033:0x7f904ec94c17
> > [ 3188.238693] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
> > [ 3188.242968] RSP: 002b:00007ffc1a6c4a98 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
> > [ 3188.244834] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f904ec94c17
> > [ 3188.246604] RDX: 0000000000000000 RSI: 00007ffc1a6c4b00 RDI: 0000000000000003
> > [ 3188.248362] RBP: 00000000616c5eef R08: 0000000000000001 R09: 00007f904ed55a40
> > [ 3188.250140] R10: fffffffffffff3d6 R11: 0000000000000246 R12: 0000000000000001
> > [ 3188.251889] R13: 00007ffc1a6c51b0 R14: 00007ffc1a6c6c87 R15: 000000000048f520
> > [ 3188.253667]
> > [ 3188.254215] The buggy address belongs to the page:
> > [ 3188.255460] page:000000003ccb70fc refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1150b3
> > [ 3188.257812] flags: 0x8000000000000000(zone=2)
> > [ 3188.258985] raw: 8000000000000000 0000000000000000 dead000000000122 0000000000000000
> > [ 3188.260971] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
> > [ 3188.262993] page dumped because: kasan: bad access detected
> > [ 3188.264413]
> > [ 3188.264943] Memory state around the buggy address:
> > [ 3188.266203] ffff8881150b3e80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > [ 3188.268082] ffff8881150b3f00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > [ 3188.269957] >ffff8881150b3f80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > [ 3188.271818] ^
> > [ 3188.273122] ffff8881150b4000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > [ 3188.275000] ffff8881150b4080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > [ 3188.276862] ==================================================================
> > [ 3188.371511] mlx5_core 0000:08:00.0 enp8s0f0: Link up
> > [ 3188.376126] IPv6: ADDRCONF(NETDEV_CHANGE): enp8s0f0: link becomes ready
> > [ 3188.430532] ==================================================================
> > [ 3188.432378] BUG: KASAN: use-after-free in __list_del_entry_valid+0x14b/0x180
> > [ 3188.434254] Read of size 8 at addr ffff8881150b3fb8 by task ip/119619
> > [ 3188.435826]
> > [ 3188.436365] CPU: 3 PID: 119619 Comm: ip Tainted: G B 5.15.0-rc5_for_upstream_debug_2021_10_17_12_06 #1
> > [ 3188.439688] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> > [ 3188.442423] Call Trace:
> > [ 3188.443172] dump_stack_lvl+0x57/0x7d
> > [ 3188.444186] print_address_description.constprop.0+0x1f/0x140
> > [ 3188.445703] ? __list_del_entry_valid+0x14b/0x180
> > [ 3188.447004] ? __list_del_entry_valid+0x14b/0x180
> > [ 3188.448255] kasan_report.cold+0x83/0xdf
> > [ 3188.449323] ? __list_del_entry_valid+0x14b/0x180
> > [ 3188.450670] __list_del_entry_valid+0x14b/0x180
> > [ 3188.451887] ? _raw_spin_unlock+0x1f/0x30
> > [ 3188.452969] __netif_napi_del.part.0+0xec/0x4a0
> > [ 3188.454453] mlx5e_close_channel+0x7d/0xd0 [mlx5_core]
> > [ 3188.456988] mlx5e_close_channels+0xf9/0x200 [mlx5_core]
> > [ 3188.459599] mlx5e_close_locked+0x101/0x130 [mlx5_core]
> > [ 3188.462156] mlx5e_close+0xad/0x100 [mlx5_core]
> > [ 3188.463961] __dev_close_many+0x18e/0x2b0
> > [ 3188.465045] ? list_netdevice+0x3a0/0x3a0
> > [ 3188.466187] ? __mlx5_eswitch_set_vport_vlan+0x290/0x290 [mlx5_core]
> > [ 3188.468156] ? __local_bh_enable_ip+0xa2/0x100
> > [ 3188.469333] ? trace_hardirqs_on+0x32/0x120
> > [ 3188.470496] __dev_change_flags+0x254/0x670
> > [ 3188.471605] ? dev_set_allmulti+0x10/0x10
> > [ 3188.472692] ? rtnl_fill_vfinfo+0x936/0xdb0
> > [ 3188.473854] dev_change_flags+0x8b/0x150
> > [ 3188.474965] do_setlink+0x820/0x2d60
> > [ 3188.475950] ? rtnetlink_put_metrics+0x490/0x490
> > [ 3188.477165] ? lock_release+0x460/0x750
> > [ 3188.478306] ? kvm_async_pf_task_wake+0x410/0x410
> > [ 3188.479542] ? lock_downgrade+0x6e0/0x6e0
> > [ 3188.480615] ? do_raw_spin_unlock+0x54/0x220
> > [ 3188.481790] ? memset+0x20/0x40
> > [ 3188.482963] ? __nla_validate_parse+0xb2/0x22c0
> > [ 3188.484167] ? do_raw_spin_lock+0x126/0x270
> > [ 3188.485281] ? push_cpu_stop+0x830/0x830
> > [ 3188.486457] ? rwlock_bug.part.0+0x90/0x90
> > [ 3188.487557] ? devlink_compat_switch_id_get+0xbb/0x100
> > [ 3188.488894] ? nla_get_range_signed+0x540/0x540
> > [ 3188.490168] ? memcpy+0x39/0x60
> > [ 3188.491083] ? memset+0x20/0x40
> > [ 3188.491966] ? memset+0x20/0x40
> > [ 3188.492855] __rtnl_newlink+0xac0/0x1370
> > [ 3188.493987] ? entry_SYSCALL_64_after_hwframe+0x44/0xae
> > [ 3188.495384] ? rtnl_setlink+0x330/0x330
> > [ 3188.496446] ? deref_stack_reg+0x160/0x160
> > [ 3188.497551] ? deref_stack_reg+0xe6/0x160
> > [ 3188.498713] ? rcu_read_lock_sched_held+0x12/0x70
> > [ 3188.499929] ? lock_release+0x460/0x750
> > [ 3188.501232] ? is_bpf_text_address+0x54/0x110
> > [ 3188.502735] ? lock_downgrade+0x6e0/0x6e0
> > [ 3188.503831] ? entry_SYSCALL_64_after_hwframe+0x44/0xae
> > [ 3188.505157] ? deref_stack_reg+0x160/0x160
> > [ 3188.506298] ? is_bpf_text_address+0x73/0x110
> > [ 3188.507459] ? kernel_text_address+0xda/0x100
> > [ 3188.508615] ? __kernel_text_address+0xe/0x30
> > [ 3188.509776] ? unwind_get_return_address+0x56/0xa0
> > [ 3188.511047] ? __thaw_task+0x70/0x70
> > [ 3188.512033] ? arch_stack_walk+0x98/0xf0
> > [ 3188.513059] ? lock_downgrade+0x6e0/0x6e0
> > [ 3188.514191] ? trace_hardirqs_on+0x32/0x120
> > [ 3188.515303] ? rcu_read_lock_sched_held+0x12/0x70
> > [ 3188.516524] rtnl_newlink+0x5f/0x90
> > [ 3188.517513] rtnetlink_rcv_msg+0x32b/0x950
> > [ 3188.518652] ? deref_stack_reg+0x160/0x160
> > [ 3188.519761] ? rtnl_fdb_dump+0x830/0x830
> > [ 3188.520816] ? rcu_read_lock_sched_held+0x12/0x70
> > [ 3188.522119] ? lock_acquire+0x38d/0x4c0
> > [ 3188.523211] ? rcu_read_lock_sched_held+0x12/0x70
> > [ 3188.524435] ? lock_acquire+0x38d/0x4c0
> > [ 3188.525498] netlink_rcv_skb+0x11d/0x340
> > [ 3188.526649] ? rtnl_fdb_dump+0x830/0x830
> > [ 3188.527722] ? rcu_read_lock_sched_held+0x12/0x70
> > [ 3188.528949] ? netlink_ack+0x930/0x930
> > [ 3188.530055] ? netlink_deliver_tap+0x140/0xb10
> > [ 3188.531347] ? netlink_deliver_tap+0x14c/0xb10
> > [ 3188.532549] ? _copy_from_iter+0x282/0xbe0
> > [ 3188.533711] netlink_unicast+0x433/0x700
> > [ 3188.534845] ? netlink_attachskb+0x740/0x740
> > [ 3188.535987] ? __alloc_skb+0x117/0x2c0
> > [ 3188.537006] netlink_sendmsg+0x707/0xbf0
> > [ 3188.538150] ? netlink_unicast+0x700/0x700
> > [ 3188.539337] ? netlink_unicast+0x700/0x700
> > [ 3188.540448] sock_sendmsg+0xb0/0xe0
> > [ 3188.541424] ____sys_sendmsg+0x4fa/0x6d0
> > [ 3188.542743] ? iovec_from_user+0x136/0x280
> > [ 3188.543932] ? kernel_sendmsg+0x30/0x30
> > [ 3188.544963] ? __import_iovec+0x51/0x610
> > [ 3188.546063] ___sys_sendmsg+0x12e/0x1b0
> > [ 3188.547189] ? do_recvmmsg+0x500/0x500
> > [ 3188.548209] ? get_max_files+0x10/0x10
> > [ 3188.549226] ? kasan_record_aux_stack+0xab/0xc0
> > [ 3188.550547] ? call_rcu+0x87/0xd40
> > [ 3188.551509] ? task_work_run+0xc5/0x160
> > [ 3188.552546] ? exit_to_user_mode_prepare+0x1d9/0x1e0
> > [ 3188.553896] ? syscall_exit_to_user_mode+0x19/0x50
> > [ 3188.555195] ? do_syscall_64+0x4a/0x90
> > [ 3188.556206] ? entry_SYSCALL_64_after_hwframe+0x44/0xae
> > [ 3188.557634] ? rcu_read_lock_sched_held+0x12/0x70
> > [ 3188.558903] ? lock_release+0x460/0x750
> > [ 3188.559948] ? mntput_no_expire+0x113/0xb40
> > [ 3188.561059] ? lock_downgrade+0x6e0/0x6e0
> > [ 3188.562231] ? rwlock_bug.part.0+0x90/0x90
> > [ 3188.563338] ? rcu_read_lock_sched_held+0x12/0x70
> > [ 3188.564583] ? mntput_no_expire+0x132/0xb40
> > [ 3188.565731] ? __fget_light+0x51/0x220
> > [ 3188.566858] __sys_sendmsg+0xa4/0x120
> > [ 3188.567878] ? __sys_sendmsg_sock+0x20/0x20
> > [ 3188.568995] ? call_rcu+0x543/0xd40
> > [ 3188.570047] ? syscall_enter_from_user_mode+0x1d/0x50
> > [ 3188.571387] ? trace_hardirqs_on+0x32/0x120
> > [ 3188.572502] do_syscall_64+0x3d/0x90
> > [ 3188.573491] entry_SYSCALL_64_after_hwframe+0x44/0xae
> > [ 3188.574916] RIP: 0033:0x7fc68ffd4c17
> > [ 3188.575900] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
> > [ 3188.580625] RSP: 002b:00007ffd26634f18 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
> > [ 3188.582945] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc68ffd4c17
> > [ 3188.585684] RDX: 0000000000000000 RSI: 00007ffd26634f80 RDI: 0000000000000003
> > [ 3188.587965] RBP: 00000000616c5eef R08: 0000000000000001 R09: 00007fc690095a40
> > [ 3188.589788] R10: fffffffffffff3d6 R11: 0000000000000246 R12: 0000000000000001
> > [ 3188.591618] R13: 00007ffd26635630 R14: 00007ffd26635c85 R15: 000000000048f520
> > [ 3188.593365]
> > [ 3188.593953] The buggy address belongs to the page:
> > [ 3188.595288] page:000000003ccb70fc refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1150b3
> > [ 3188.597966] flags: 0x8000000000000000(zone=2)
> > [ 3188.599643] raw: 8000000000000000 0000000000000000 dead000000000122 0000000000000000
> > [ 3188.601766] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
> > [ 3188.603786] page dumped because: kasan: bad access detected
> > [ 3188.622507]
> > [ 3188.623291] Memory state around the buggy address:
> > [ 3188.625031] ffff8881150b3e80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > [ 3188.627617] ffff8881150b3f00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > [ 3188.630275] >ffff8881150b3f80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > [ 3188.632956] ^
> > [ 3188.634838] ffff8881150b4000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > [ 3188.637544] ffff8881150b4080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > [ 3188.640221] ==================================================================
> >
> >[...]
> >
> > [ 3188.574916] RIP: 0033:0x7fc68ffd4c17
> > [ 3188.237744] RIP: 0033:0x7f904ec94c17
>
> Dmitry, what addresses are these RIPs pointing to?

This report did not come from syzkaller/syzbot, so we need to ask Vlad.
Even for syzkaller/syzbot reports I wouldn't be able to answer such a
question. But I guess it's just code that executes the sendmsg syscall
instruction in user space. What aspect of that code are you interested in?
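
The guess can be checked against the quoted report itself. Below is a minimal sketch, not part of the original report, that decodes the "Code:" line next to the user-space RIP. It relies on two x86-64 facts: the bytes `0f 05` encode the SYSCALL instruction, and `b8 imm32` encodes `mov eax, imm32`. The helper names are hypothetical and only the one instruction pattern seen in this trace is handled.

```python
# "Code:" line copied from the quoted report (user-space RIP 0x7fc68ffd4c17).
CODE_LINE = (
    "0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa "
    "64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 "
    "f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10"
)


def bytes_before_rip(code_line):
    """Return the bytes printed before the <..> marker, i.e. those preceding RIP."""
    before, _, _ = code_line.partition("<")
    return bytes(int(tok, 16) for tok in before.split())


def syscall_number_at_rip(code_line):
    """If RIP directly follows `mov eax, imm32; syscall`, return imm32, else None."""
    prefix = bytes_before_rip(code_line)
    # Last two bytes must be SYSCALL (0f 05), preceded by b8 + 4-byte immediate.
    if len(prefix) >= 7 and prefix[-2:] == b"\x0f\x05" and prefix[-7] == 0xB8:
        return int.from_bytes(prefix[-6:-2], "little")
    return None


print(syscall_number_at_rip(CODE_LINE))  # prints 46
```

46 (0x2e) is the sendmsg syscall number on x86-64 and matches the `ORIG_RAX: 000000000000002e` value in the same register dump, so the RIP is the return address right after the syscall instruction in the user-space sendmsg wrapper.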

2021-12-14 07:52:10

by syzbot

Subject: Re: [syzbot] BUG: corrupted list in netif_napi_add

syzbot suspects this issue was fixed by commit:

commit 0315a075f1343966ea2d9a085666a88a69ea6a3d
Author: Alexander Lobakin <[email protected]>
Date: Wed Nov 10 19:56:05 2021 +0000

net: fix premature exit from NAPI state polling in napi_disable()

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=138dffbeb00000
start commit: 911e3a46fb38 net: phy: Fix unsigned comparison with less t..
git tree: net-next
kernel config: https://syzkaller.appspot.com/x/.config?x=d36d2402e8523638
dashboard link: https://syzkaller.appspot.com/bug?extid=62e474dd92a35e3060d8
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=141592f2b00000

If the result looks correct, please mark the issue as fixed by replying with:

#syz fix: net: fix premature exit from NAPI state polling in napi_disable()

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

2021-12-14 17:57:12

by Dmitry Vyukov

Subject: Re: [syzbot] BUG: corrupted list in netif_napi_add

On Tue, 14 Dec 2021 at 08:52, syzbot
<[email protected]> wrote:
>
> syzbot suspects this issue was fixed by commit:
>
> commit 0315a075f1343966ea2d9a085666a88a69ea6a3d
> Author: Alexander Lobakin <[email protected]>
> Date: Wed Nov 10 19:56:05 2021 +0000
>
> net: fix premature exit from NAPI state polling in napi_disable()
>
> bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=138dffbeb00000
> start commit: 911e3a46fb38 net: phy: Fix unsigned comparison with less t..
> git tree: net-next
> kernel config: https://syzkaller.appspot.com/x/.config?x=d36d2402e8523638
> dashboard link: https://syzkaller.appspot.com/bug?extid=62e474dd92a35e3060d8
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=141592f2b00000
>
> If the result looks correct, please mark the issue as fixed by replying with:
>
> #syz fix: net: fix premature exit from NAPI state polling in napi_disable()
>
> For information about bisection process see: https://goo.gl/tpsmEJ#bisection


Looks reasonable based on the subsystem:

#syz fix: net: fix premature exit from NAPI state polling in napi_disable()