Hello,
syzbot found the following crash on:
HEAD commit: 1236568ee3cb Merge tag 'gpio-v4.18-3' of git://git.kernel...
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=135f47e0400000
kernel config: https://syzkaller.appspot.com/x/.config?x=152cb8ccd35b1f70
dashboard link: https://syzkaller.appspot.com/bug?extid=30209ea299c09d8785c9
compiler: gcc (GCC) 8.0.1 20180413 (experimental)
Unfortunately, I don't have any reproducer for this crash yet.
IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: [email protected]
Invalid argument reading file caps for /root/syz-executor5
Invalid argument reading file caps for /root/syz-executor5
Invalid argument reading file caps for /root/syz-executor5
Invalid argument reading file caps for /root/syz-executor5
Invalid argument reading file caps for /root/syz-executor5
unregister_netdevice: waiting for ip6tnl0 to become free. Usage count = 7
Invalid argument reading file caps for /root/syz-executor5
Invalid argument reading file caps for /root/syz-executor5
Invalid argument reading file caps for /root/syz-executor5
Invalid argument reading file caps for /root/syz-executor5
Invalid argument reading file caps for /root/syz-executor5
---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].
syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with
syzbot.
syzbot has found a reproducer for the following crash on:
HEAD commit: 31130a16d459 Merge tag 'for-linus-4.19-rc1-tag' of git://g..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1116b46c400000
kernel config: https://syzkaller.appspot.com/x/.config?x=e8d52931cda051de
dashboard link: https://syzkaller.appspot.com/bug?extid=30209ea299c09d8785c9
compiler: gcc (GCC) 8.0.1 20180413 (experimental)
syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=11617322400000
IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: [email protected]
IPv6: ADDRCONF(NETDEV_CHANGE): veth1_to_bridge: link becomes ready
bridge0: port 2(bridge_slave_1) entered blocking state
bridge0: port 2(bridge_slave_1) entered forwarding state
bridge0: port 1(bridge_slave_0) entered blocking state
bridge0: port 1(bridge_slave_0) entered forwarding state
unregister_netdevice: waiting for lo to become free. Usage count = 1
8021q: adding VLAN 0 to HW filter on device bond0
IPv6: ADDRCONF(NETDEV_UP): veth0: link is not ready
IPv6: ADDRCONF(NETDEV_UP): veth1: link is not ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth1: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
8021q: adding VLAN 0 to HW filter on device team0
IPVS: stopping master sync thread 6855 ...
IPVS: sync thread started: state = MASTER, mcast_ifn = syz_tun, syncid = 0,
id = 0
IPVS: sync thread started: state = MASTER, mcast_ifn = syz_tun, syncid = 0,
id = 0
IPVS: stopping master sync thread 6859 ...
IPVS: ftp: loaded support on port[0] = 21
bridge0: port 1(bridge_slave_0) entered blocking state
bridge0: port 1(bridge_slave_0) entered disabled state
device bridge_slave_0 entered promiscuous mode
bridge0: port 2(bridge_slave_1) entered blocking state
bridge0: port 2(bridge_slave_1) entered disabled state
device bridge_slave_1 entered promiscuous mode
IPv6: ADDRCONF(NETDEV_UP): veth0_to_bridge: link is not ready
IPv6: ADDRCONF(NETDEV_UP): veth1_to_bridge: link is not ready
bond0: Enslaving bond_slave_0 as an active interface with an up link
bond0: Enslaving bond_slave_1 as an active interface with an up link
IPv6: ADDRCONF(NETDEV_UP): team_slave_0: link is not ready
team0: Port device team_slave_0 added
IPv6: ADDRCONF(NETDEV_UP): team_slave_1: link is not ready
team0: Port device team_slave_1 added
IPv6: ADDRCONF(NETDEV_CHANGE): team_slave_0: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): team_slave_1: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth0_to_bridge: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth1_to_bridge: link becomes ready
bridge0: port 2(bridge_slave_1) entered blocking state
bridge0: port 2(bridge_slave_1) entered forwarding state
bridge0: port 1(bridge_slave_0) entered blocking state
bridge0: port 1(bridge_slave_0) entered forwarding state
8021q: adding VLAN 0 to HW filter on device bond0
IPv6: ADDRCONF(NETDEV_UP): veth0: link is not ready
IPv6: ADDRCONF(NETDEV_UP): veth1: link is not ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth1: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
8021q: adding VLAN 0 to HW filter on device team0
IPVS: stopping master sync thread 7118 ...
IPVS: sync thread started: state = MASTER, mcast_ifn = syz_tun, syncid = 0,
id = 0
IPVS: sync thread started: state = MASTER, mcast_ifn = syz_tun, syncid = 0,
id = 0
IPVS: stopping master sync thread 7122 ...
IPVS: ftp: loaded support on port[0] = 21
bridge0: port 1(bridge_slave_0) entered blocking state
bridge0: port 1(bridge_slave_0) entered disabled state
device bridge_slave_0 entered promiscuous mode
bridge0: port 2(bridge_slave_1) entered blocking state
bridge0: port 2(bridge_slave_1) entered disabled state
device bridge_slave_1 entered promiscuous mode
IPv6: ADDRCONF(NETDEV_UP): veth0_to_bridge: link is not ready
IPv6: ADDRCONF(NETDEV_UP): veth1_to_bridge: link is not ready
bond0: Enslaving bond_slave_0 as an active interface with an up link
bond0: Enslaving bond_slave_1 as an active interface with an up link
IPv6: ADDRCONF(NETDEV_UP): team_slave_0: link is not ready
team0: Port device team_slave_0 added
IPv6: ADDRCONF(NETDEV_UP): team_slave_1: link is not ready
team0: Port device team_slave_1 added
IPv6: ADDRCONF(NETDEV_CHANGE): team_slave_0: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): team_slave_1: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth0_to_bridge: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth1_to_bridge: link becomes ready
bridge0: port 2(bridge_slave_1) entered blocking state
bridge0: port 2(bridge_slave_1) entered forwarding state
bridge0: port 1(bridge_slave_0) entered blocking state
bridge0: port 1(bridge_slave_0) entered forwarding state
8021q: adding VLAN 0 to HW filter on device bond0
IPv6: ADDRCONF(NETDEV_UP): veth0: link is not ready
IPv6: ADDRCONF(NETDEV_UP): veth1: link is not ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth1: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
8021q: adding VLAN 0 to HW filter on device team0
IPVS: stopping master sync thread 7381 ...
IPVS: sync thread started: state = MASTER, mcast_ifn = syz_tun, syncid = 0,
id = 0
IPVS: sync thread started: state = MASTER, mcast_ifn = syz_tun, syncid = 0,
id = 0
IPVS: stopping master sync thread 7385 ...
IPVS: ftp: loaded support on port[0] = 21
bridge0: port 1(bridge_slave_0) entered blocking state
bridge0: port 1(bridge_slave_0) entered disabled state
device bridge_slave_0 entered promiscuous mode
bridge0: port 2(bridge_slave_1) entered blocking state
bridge0: port 2(bridge_slave_1) entered disabled state
device bridge_slave_1 entered promiscuous mode
IPv6: ADDRCONF(NETDEV_UP): veth0_to_bridge: link is not ready
IPv6: ADDRCONF(NETDEV_UP): veth1_to_bridge: link is not ready
bond0: Enslaving bond_slave_0 as an active interface with an up link
bond0: Enslaving bond_slave_1 as an active interface with an up link
IPv6: ADDRCONF(NETDEV_UP): team_slave_0: link is not ready
team0: Port device team_slave_0 added
IPv6: ADDRCONF(NETDEV_UP): team_slave_1: link is not ready
team0: Port device team_slave_1 added
IPv6: ADDRCONF(NETDEV_CHANGE): team_slave_0: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): team_slave_1: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth0_to_bridge: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth1_to_bridge: link becomes ready
bridge0: port 2(bridge_slave_1) entered blocking state
bridge0: port 2(bridge_slave_1) entered forwarding state
bridge0: port 1(bridge_slave_0) entered blocking state
bridge0: port 1(bridge_slave_0) entered forwarding state
8021q: adding VLAN 0 to HW filter on device bond0
IPv6: ADDRCONF(NETDEV_UP): veth0: link is not ready
IPv6: ADDRCONF(NETDEV_UP): veth1: link is not ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth1: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
8021q: adding VLAN 0 to HW filter on device team0
IPVS: stopping master sync thread 7644 ...
IPVS: sync thread started: state = MASTER, mcast_ifn = syz_tun, syncid = 0,
id = 0
IPVS: sync thread started: state = MASTER, mcast_ifn = syz_tun, syncid = 0,
id = 0
IPVS: stopping master sync thread 7648 ...
IPVS: ftp: loaded support on port[0] = 21
bridge0: port 1(bridge_slave_0) entered blocking state
bridge0: port 1(bridge_slave_0) entered disabled state
device bridge_slave_0 entered promiscuous mode
bridge0: port 2(bridge_slave_1) entered blocking state
bridge0: port 2(bridge_slave_1) entered disabled state
device bridge_slave_1 entered promiscuous mode
IPv6: ADDRCONF(NETDEV_UP): veth0_to_bridge: link is not ready
IPv6: ADDRCONF(NETDEV_UP): veth1_to_bridge: link is not ready
bond0: Enslaving bond_slave_0 as an active interface with an up link
bond0: Enslaving bond_slave_1 as an active interface with an up link
IPv6: ADDRCONF(NETDEV_UP): team_slave_0: link is not ready
team0: Port device team_slave_0 added
IPv6: ADDRCONF(NETDEV_UP): team_slave_1: link is not ready
team0: Port device team_slave_1 added
IPv6: ADDRCONF(NETDEV_CHANGE): team_slave_0: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): team_slave_1: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth0_to_bridge: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth1_to_bridge: link becomes ready
bridge0: port 2(bridge_slave_1) entered blocking state
bridge0: port 2(bridge_slave_1) entered forwarding state
bridge0: port 1(bridge_slave_0) entered blocking state
bridge0: port 1(bridge_slave_0) entered forwarding state
8021q: adding VLAN 0 to HW filter on device bond0
IPv6: ADDRCONF(NETDEV_UP): veth0: link is not ready
On Wed, Aug 15, 2018 at 1:28 PM, syzbot
<[email protected]> wrote:
> syzbot has found a reproducer for the following crash on:
>
> HEAD commit: 31130a16d459 Merge tag 'for-linus-4.19-rc1-tag' of git://g..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=1116b46c400000
> kernel config: https://syzkaller.appspot.com/x/.config?x=e8d52931cda051de
> dashboard link: https://syzkaller.appspot.com/bug?extid=30209ea299c09d8785c9
> compiler: gcc (GCC) 8.0.1 20180413 (experimental)
> syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=11617322400000
+netdev and Dan
There were more reproducers here:
https://groups.google.com/forum/#!msg/syzkaller/-06_laheMF0/MxCjIiHkBwAJ
and here:
https://groups.google.com/forum/#!msg/syzkaller/-06_laheMF0/4wfWs6ATBwAJ
and in the previous incarnation of the bug:
https://syzkaller.appspot.com/bug?id=1a97a5bd119fd97995f752819fd87840ab9479a9
syzbot has found a reproducer for the following crash on:
HEAD commit: d7857ae43dcc Add linux-next specific files for 20180817
git tree: linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=13c72fce400000
kernel config: https://syzkaller.appspot.com/x/.config?x=4b10cd1ea76bb092
dashboard link: https://syzkaller.appspot.com/bug?extid=30209ea299c09d8785c9
compiler: gcc (GCC) 8.0.1 20180413 (experimental)
syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=15df679a400000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15242741400000
IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: [email protected]
IPVS: stopping master sync thread 4657 ...
IPVS: stopping master sync thread 4663 ...
IPVS: sync thread started: state = MASTER, mcast_ifn = syz_tun, syncid = 0,
id = 0
IPVS: sync thread started: state = MASTER, mcast_ifn = syz_tun, syncid = 0,
id = 0
IPVS: stopping master sync thread 4664 ...
unregister_netdevice: waiting for lo to become free. Usage count = 1
Hello,
On Sun, 19 Aug 2018, syzbot wrote:
> syzbot has found a reproducer for the following crash on:
>
> HEAD commit: d7857ae43dcc Add linux-next specific files for 20180817
> git tree: linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=13c72fce400000
> kernel config: https://syzkaller.appspot.com/x/.config?x=4b10cd1ea76bb092
> dashboard link: https://syzkaller.appspot.com/bug?extid=30209ea299c09d8785c9
> compiler: gcc (GCC) 8.0.1 20180413 (experimental)
> syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=15df679a400000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15242741400000
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: [email protected]
>
> IPVS: stopping master sync thread 4657 ...
> IPVS: stopping master sync thread 4663 ...
> IPVS: sync thread started: state = MASTER, mcast_ifn = syz_tun, syncid = 0, id = 0
> IPVS: sync thread started: state = MASTER, mcast_ifn = syz_tun, syncid = 0, id = 0
> IPVS: stopping master sync thread 4664 ...
> unregister_netdevice: waiting for lo to become free. Usage count = 1
Well, only IPVS and tun in the game? But IPVS does not
take any dev references for sync threads. Can it be a problem
in tun? For example, a side effect of dst_cache_reset?
Maybe dst_release is called too late? Here is what should happen
on unregistration:
- NETDEV_UNREGISTER event: rt_flush_dev replaces dst->dev with lo
  but the dst is not released
- ndo_uninit/ip_tunnel_uninit: dst_cache_reset is called, which
  does nothing!?! Maybe a dst_release call is needed here.
- no more references are expected here ...
- netdev_run_todo -> netdev_wait_allrefs: loop here due to refcnt!=0
- dev->priv_destructor (ip_tunnel_dev_free) calls dst_cache_destroy
  where dst_release is used, but it is not reached because we loop in
  netdev_wait_allrefs above
- dst_cache_destroy: really call dst_release
In fact, after calling rt_flush_dev and replacing the
dst->dev we should reach dev->priv_destructor (ip_tunnel_dev_free)
for the tun device, where dst_release for lo should be called. But maybe
something prevents it, exit batching?
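The sequence above can be sketched as a small userspace model. This is
only an illustration, not kernel code: the toy_* types and functions are
hypothetical stand-ins for net_device, a cached dst, dev_hold/dev_put,
rt_flush_dev and dst_release, and the two boolean switches are assumptions
that express the ordering question (is lo waited on before the owner's
destructor runs, and is the dst released already in uninit?).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct net_device: only the usage count is modelled. */
struct toy_dev {
	const char *name;
	int refcnt;
};

/* Toy stand-in for a cached dst: it pins exactly one device. */
struct toy_dst {
	struct toy_dev *dev;
	bool released;
};

static void toy_dev_hold(struct toy_dev *dev)
{
	dev->refcnt++;
}

static void toy_dev_put(struct toy_dev *dev)
{
	assert(dev->refcnt > 0);
	dev->refcnt--;
}

static void toy_dst_init(struct toy_dst *dst, struct toy_dev *dev)
{
	dst->dev = dev;
	dst->released = false;
	toy_dev_hold(dev);
}

/* NETDEV_UNREGISTER step: re-point the dst at lo, moving the reference. */
static void toy_flush_dev(struct toy_dst *dst, struct toy_dev *dev,
			  struct toy_dev *lo)
{
	if (dst->released || dst->dev != dev)
		return;
	dst->dev = lo;
	toy_dev_hold(lo);
	toy_dev_put(dev);
}

/* Models dst_release() dropping the device reference held by the dst. */
static void toy_dst_release(struct toy_dst *dst)
{
	if (dst->released)
		return;
	toy_dev_put(dst->dev);
	dst->released = true;
}

/*
 * Returns lo's usage count at the moment the todo loop would wait on lo.
 * A nonzero result means the wait could not finish in this model.
 */
int toy_lo_refcnt_when_waited(bool lo_waited_first, bool release_in_uninit)
{
	struct toy_dev lo = { "lo", 0 };
	struct toy_dev dev = { "dev", 0 };
	struct toy_dst cached;

	toy_dst_init(&cached, &dev);

	toy_flush_dev(&cached, &dev, &lo);	/* reference moves to lo */
	if (release_in_uninit)
		toy_dst_release(&cached);	/* hypothetical: release in uninit */

	if (!lo_waited_first)
		toy_dst_release(&cached);	/* owner's destructor ran first */

	return lo.refcnt;
}
```

In this model, toy_lo_refcnt_when_waited(true, false) leaves one reference
on lo, while releasing in uninit or running the destructor first leaves
none. Whether the real teardown order matches the first case is exactly the
open question.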
Regards
--
Julian Anastasov <[email protected]>
On Mon, Aug 20, 2018 at 6:00 AM Julian Anastasov <[email protected]> wrote:
>
>
> Hello,
>
> On Sun, 19 Aug 2018, syzbot wrote:
>
> > syzbot has found a reproducer for the following crash on:
> >
> > HEAD commit: d7857ae43dcc Add linux-next specific files for 20180817
> > git tree: linux-next
> > console output: https://syzkaller.appspot.com/x/log.txt?x=13c72fce400000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=4b10cd1ea76bb092
> > dashboard link: https://syzkaller.appspot.com/bug?extid=30209ea299c09d8785c9
> > compiler: gcc (GCC) 8.0.1 20180413 (experimental)
> > syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=15df679a400000
> > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15242741400000
> >
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: [email protected]
> >
> > IPVS: stopping master sync thread 4657 ...
> > IPVS: stopping master sync thread 4663 ...
> > IPVS: sync thread started: state = MASTER, mcast_ifn = syz_tun, syncid = 0, id = 0
> > IPVS: sync thread started: state = MASTER, mcast_ifn = syz_tun, syncid = 0, id = 0
> > IPVS: stopping master sync thread 4664 ...
> > unregister_netdevice: waiting for lo to become free. Usage count = 1
>
> Well, only IPVS and tun in the game? But IPVS does not
> take any dev references for sync threads. Can it be a problem
> in tun? For example, a side effects from dst_cache_reset?
> May be dst_release is called too late? Here is what should happen
> on unregistration:
There are multiple similar bugs grouped together under this report. Perhaps
they are different, perhaps they are the same bug; it is too early to say.
For the one I looked into, dst_cache doesn't matter, because the xmit
path doesn't use the tunnel dst_cache at all, and it involves the ip6tnl0
FB device, unlike this one, which involves a tun device.
>
> - NETDEV_UNREGISTER event: rt_flush_dev changes dst->dev with lo
> but dst is not released
>
> - ndo_uninit/ip_tunnel_uninit: dst_cache_reset is called which
> does nothing!?! May be dst_release call is needed here.
I think this makes sense. At least prior to the introduction of the general
dst_cache, the dst refcnt was released in ndo_uninit() too, so it
is reasonable to move dst_cache_destroy() to ndo_uninit().
>
> - no more references are expected here ...
>
> - netdev_run_todo -> netdev_wait_allrefs: loop here due to refcnt!=0
>
> - dev->priv_destructor (ip_tunnel_dev_free) calls dst_cache_destroy
> where dst_release is used but it is not reached because we loop in
> netdev_wait_allrefs above
>
> - dst_cache_destroy: really call dst_release
>
> In fact, after calling rt_flush_dev and replacing the
> dst->dev we should reach dev->priv_destructor (ip_tunnel_dev_free)
> for tun device where dst_release for lo should be called. But may be
> something prevents it, exit batching?
I can't see that anything in the netns exit batch is special here.
For the one I looked into, it seems some fib6_info is not released for
some reason. It seems to be the one created by addrconf_prefix_route(),
which is supposed to be released by fib6_clean_tree() I think, but that
never happens.
Thanks.
Hello,
On Mon, 20 Aug 2018, Cong Wang wrote:
> For the one I look into, dst_cache doesn't matter, because the xmit
> path doesn't even use tunnel dst_cache at all, and it is ip6tnl0 FB
> device, unlike this one which is tun device.
Oops, of course, it is dev tun and not ip tun...
> For the one I look into, it seems some fib6_info is not released for
> some reason. It seems to be the one created by addrconf_prefix_route(),
> which is supposed to be released by fib6_clean_tree() I think, but it
> never happens.
Maybe it is not a direct reference to dev but one
that is moved to loopback, like from a dst or a route... The repro.c
creates permanent neighbours and addresses.
Regards
--
Julian Anastasov <[email protected]>
On 2018/08/20 21:55, Julian Anastasov wrote:
> Well, only IPVS and tun in the game? But IPVS does not
> take any dev references for sync threads. Can it be a problem
> in tun? For example, a side effects from dst_cache_reset?
> May be dst_release is called too late? Here is what should happen
> on unregistration:
>
> - NETDEV_UNREGISTER event: rt_flush_dev changes dst->dev with lo
> but dst is not released
>
> - ndo_uninit/ip_tunnel_uninit: dst_cache_reset is called which
> does nothing!?! May be dst_release call is needed here.
>
> - no more references are expected here ...
>
> - netdev_run_todo -> netdev_wait_allrefs: loop here due to refcnt!=0
>
> - dev->priv_destructor (ip_tunnel_dev_free) calls dst_cache_destroy
> where dst_release is used but it is not reached because we loop in
> netdev_wait_allrefs above
>
> - dst_cache_destroy: really call dst_release
>
> In fact, after calling rt_flush_dev and replacing the
> dst->dev we should reach dev->priv_destructor (ip_tunnel_dev_free)
> for tun device where dst_release for lo should be called. But may be
> something prevents it, exit batching?
I traced this using the debug printk() patch shown below.
----------------------------------------
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 26f69cf763f4..25f7acacf457 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3726,6 +3726,11 @@ void netdev_run_todo(void);
  */
 static inline void dev_put(struct net_device *dev)
 {
+	if (!strcmp(dev->name, "lo")) {
+		int count = netdev_refcnt_read(dev);
+		printk("dev_put(%p): %u->%u\n", dev, count, count - 1);
+		dump_stack();
+	}
 	this_cpu_dec(*dev->pcpu_refcnt);
 }
@@ -3737,6 +3742,11 @@ static inline void dev_put(struct net_device *dev)
  */
 static inline void dev_hold(struct net_device *dev)
 {
+	if (!strcmp(dev->name, "lo")) {
+		int count = netdev_refcnt_read(dev);
+		printk("dev_hold(%p): %u->%u\n", dev, count, count + 1);
+		dump_stack();
+	}
 	this_cpu_inc(*dev->pcpu_refcnt);
 }
diff --git a/net/core/dev.c b/net/core/dev.c
index fdcff29df915..53ff4385c8f7 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -8897,9 +8897,9 @@ static void netdev_wait_allrefs(struct net_device *dev)
 		refcnt = netdev_refcnt_read(dev);
-		if (time_after(jiffies, warning_time + 10 * HZ)) {
-			pr_emerg("unregister_netdevice: waiting for %s to become free. Usage count = %d\n",
-				 dev->name, refcnt);
+		if (time_after(jiffies, warning_time + HZ)) {
+			pr_emerg("unregister_netdevice: waiting for %s to become free. Usage count = %d%s (%p)\n",
+				 dev->name, refcnt, netdev_reg_state(dev), dev);
 			warning_time = jiffies;
 		}
 	}
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index a5da63e5faa2..4c5baca105ed 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1503,6 +1503,7 @@ void rt_flush_dev(struct net_device *dev)
 		list_for_each_entry(rt, &ul->head, rt_uncached) {
 			if (rt->dst.dev != dev)
 				continue;
+			printk("rt_flush_dev(%p)->(%p)\n", dev, net->loopback_dev);
 			rt->dst.dev = net->loopback_dev;
 			dev_hold(rt->dst.dev);
 			dev_put(dev);
@@ -2560,6 +2561,7 @@ struct dst_entry *ipv4_blackhole_route(struct net *net, struct dst_entry *dst_or
 		new->input = dst_discard;
 		new->output = dst_discard_out;
+		printk("ipv4_blackhole_route(%p)->(%p)\n", new->dev, net->loopback_dev);
 		new->dev = net->loopback_dev;
 		if (new->dev)
 			dev_hold(new->dev);
----------------------------------------
When a new loopback device in a new network namespace is created using unshare(),
nothing is wrong.
[ 60.873014][ T7306] dev_hold(00000000d9f4ea20): 0->1
[ 60.873019][ T7306] CPU: 4 PID: 7306 Comm: a.out Not tainted 5.1.0-rc5+ #177
[ 60.873021][ T7306] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
[ 60.873022][ T7306] Call Trace:
[ 60.873031][ T7306] dump_stack+0xaa/0xd8
[ 60.873037][ T7306] net_rx_queue_update_kobjects+0x1f5/0x200
[ 60.873042][ T7306] netdev_register_kobject+0xf8/0x1a0
[ 60.873047][ T7306] register_netdevice+0x4cc/0x650
[ 60.873052][ T7306] register_netdev+0x23/0x40
[ 60.873056][ T7306] loopback_net_init+0x50/0xc0
[ 60.873059][ T7306] ? loopback_dev_init+0xa0/0xa0
[ 60.873064][ T7306] ops_init+0x4f/0x140
[ 60.873068][ T7306] setup_net+0xe7/0x250
[ 60.873072][ T7306] copy_net_ns+0xee/0x1e0
[ 60.873077][ T7306] create_new_namespaces+0x141/0x2a0
[ 60.873081][ T7306] unshare_nsproxy_namespaces+0x7e/0xf0
[ 60.873086][ T7306] ksys_unshare+0x268/0x4b0
[ 60.873090][ T7306] __x64_sys_unshare+0x16/0x20
[ 60.873095][ T7306] do_syscall_64+0x7c/0x180
[ 60.873099][ T7306] entry_SYSCALL_64_after_hwframe+0x44/0xa9
But when some device in a network namespace calls rt_flush_dev(),
it gets a usage count on loopback device in that network namespace.
[ 71.388104][ T7620] rt_flush_dev(00000000cd35e96a)->(00000000d9f4ea20)
[ 71.391757][ T7620] dev_hold(00000000d9f4ea20): 7->8
[ 71.394725][ T7620] CPU: 4 PID: 7620 Comm: a.out Not tainted 5.1.0-rc5+ #177
[ 71.398094][ T7620] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
[ 71.403711][ T7620] Call Trace:
[ 71.405912][ T7620] dump_stack+0xaa/0xd8
[ 71.408252][ T7620] rt_flush_dev+0x177/0x1b0
[ 71.410802][ T7620] fib_netdev_event+0x150/0x1b0
[ 71.413270][ T7620] notifier_call_chain+0x47/0xd0
[ 71.415849][ T7620] raw_notifier_call_chain+0x2d/0x40
[ 71.418491][ T7620] ? tun_show_group+0x90/0x90
[ 71.421108][ T7620] call_netdevice_notifiers_info+0x32/0x70
[ 71.423854][ T7620] rollback_registered_many+0x421/0x680
[ 71.426583][ T7620] rollback_registered+0x68/0xb0
[ 71.429244][ T7620] unregister_netdevice_queue+0xa5/0x100
[ 71.432191][ T7620] __tun_detach+0x576/0x590
[ 71.435533][ T7620] tun_chr_close+0x41/0x80
[ 71.437957][ T7620] ? __tun_detach+0x590/0x590
[ 71.440500][ T7620] __fput+0xeb/0x2d0
[ 71.442816][ T7620] ____fput+0x15/0x20
[ 71.445090][ T7620] task_work_run+0xa9/0xd0
[ 71.447467][ T7620] do_exit+0x37a/0xf40
[ 71.449623][ T7620] do_group_exit+0x57/0xe0
[ 71.451826][ T7620] get_signal+0x114/0x950
[ 71.453989][ T7620] do_signal+0x2f/0x700
[ 71.456126][ T7620] ? handle_mm_fault+0x1a8/0x360
[ 71.458323][ T7620] ? __x64_sys_futex+0x179/0x210
[ 71.460620][ T7620] exit_to_usermode_loop+0x159/0x180
[ 71.462956][ T7620] do_syscall_64+0x15d/0x180
[ 71.465110][ T7620] entry_SYSCALL_64_after_hwframe+0x44/0xa9
The usage count printed by "unregister_netdevice: waiting for lo to become free."
seems to match the number of rt_flush_dev() traces shown above. (The complete log is at
http://I-love.SAKURA.ne.jp/tmp/serial-20190415.txt.xz .)
Although netdev_wait_allrefs() periodically calls
call_netdevice_notifiers(NETDEV_UNREGISTER, dev) in order to try to drop
the usage count, the loop
list_for_each_entry(rt, &ul->head, rt_uncached) {
if (rt->dst.dev != dev)
continue;
rt->dst.dev = net->loopback_dev;
dev_hold(rt->dst.dev);
dev_put(dev);
}
in rt_flush_dev() becomes a no-op because dev == net->loopback_dev: each
iteration takes one reference on lo and immediately drops one, so the usage
count can never go down. That is, netdev_wait_allrefs() on a loopback device
cannot make forward progress.
[ 95.502947][ T4478] unregister_netdevice: waiting for lo to become free. Usage count = 28 (unregistered) (00000000d9f4ea20)
[ 95.509108][ T4478] rt_flush_dev(00000000d9f4ea20)->(00000000d9f4ea20)
[ 95.512598][ T4478] dev_hold(00000000d9f4ea20): 28->29
[ 95.517241][ T4478] CPU: 5 PID: 4478 Comm: kworker/u128:28 Not tainted 5.1.0-rc5+ #177
[ 95.522984][ T4478] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
[ 95.532898][ T4478] Workqueue: netns cleanup_net
[ 95.537134][ T4478] Call Trace:
[ 95.539751][ T4478] dump_stack+0xaa/0xd8
[ 95.543642][ T4478] rt_flush_dev+0x177/0x1b0
[ 95.546835][ T4478] fib_netdev_event+0x150/0x1b0
[ 95.550609][ T4478] notifier_call_chain+0x47/0xd0
[ 95.557990][ T4478] raw_notifier_call_chain+0x2d/0x40
[ 95.563900][ T4478] call_netdevice_notifiers_info+0x32/0x70
[ 95.568655][ T4478] netdev_run_todo+0x197/0x410
[ 95.572554][ T4478] rtnl_unlock+0xe/0x10
[ 95.576836][ T4478] default_device_exit_batch+0x1ab/0x1d0
[ 95.579650][ T4478] ? do_wait_intr_irq+0xb0/0xb0
[ 95.582236][ T4478] ? unregister_netdevice_many+0x30/0x30
[ 95.584766][ T4478] ? dev_change_net_namespace+0x4e0/0x4e0
[ 95.587397][ T4478] ops_exit_list.isra.6+0x75/0x90
[ 95.590063][ T4478] cleanup_net+0x20d/0x380
[ 95.592373][ T4478] process_one_work+0x202/0x4f0
[ 95.595017][ T4478] worker_thread+0x3c/0x4b0
[ 95.597520][ T4478] kthread+0x139/0x160
[ 95.599822][ T4478] ? process_one_work+0x4f0/0x4f0
[ 95.602351][ T4478] ? kthread_destroy_worker+0x70/0x70
[ 95.604901][ T4478] ret_from_fork+0x35/0x40
[ 95.607249][ T4478] dev_put(00000000d9f4ea20): 29->28
[ 95.609935][ T4478] CPU: 5 PID: 4478 Comm: kworker/u128:28 Not tainted 5.1.0-rc5+ #177
[ 95.613282][ T4478] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
[ 95.618699][ T4478] Workqueue: netns cleanup_net
[ 95.621345][ T4478] Call Trace:
[ 95.623381][ T4478] dump_stack+0xaa/0xd8
[ 95.625787][ T4478] rt_flush_dev+0x19f/0x1b0
[ 95.628284][ T4478] fib_netdev_event+0x150/0x1b0
[ 95.630827][ T4478] notifier_call_chain+0x47/0xd0
[ 95.633153][ T4478] raw_notifier_call_chain+0x2d/0x40
[ 95.635702][ T4478] call_netdevice_notifiers_info+0x32/0x70
[ 95.638372][ T4478] netdev_run_todo+0x197/0x410
[ 95.640641][ T4478] rtnl_unlock+0xe/0x10
[ 95.642828][ T4478] default_device_exit_batch+0x1ab/0x1d0
[ 95.645230][ T4478] ? do_wait_intr_irq+0xb0/0xb0
[ 95.647613][ T4478] ? unregister_netdevice_many+0x30/0x30
[ 95.650148][ T4478] ? dev_change_net_namespace+0x4e0/0x4e0
[ 95.652594][ T4478] ops_exit_list.isra.6+0x75/0x90
[ 95.654975][ T4478] cleanup_net+0x20d/0x380
[ 95.657140][ T4478] process_one_work+0x202/0x4f0
[ 95.659829][ T4478] worker_thread+0x3c/0x4b0
[ 95.662098][ T4478] kthread+0x139/0x160
[ 95.664119][ T4478] ? process_one_work+0x4f0/0x4f0
[ 95.666467][ T4478] ? kthread_destroy_worker+0x70/0x70
[ 95.668916][ T4478] ret_from_fork+0x35/0x40
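The hold/put pair in the trace above can be reproduced with a small userspace model. This is only a sketch: the two structs are cut-down stand-ins for the kernel types, and flush_routes() and lo_refcnt_after_teardown() are hypothetical names that mirror the shape of the loop in rt_flush_dev(), not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; only the fields used here. */
struct net_device { const char *name; int refcnt; };
struct rtable { struct net_device *dev; };

static void dev_hold(struct net_device *dev) { dev->refcnt++; }

static void dev_put(struct net_device *dev)
{
	assert(dev->refcnt > 0);
	dev->refcnt--;
}

/*
 * Same shape as the loop in rt_flush_dev(): every route that points at
 * dev is moved to loopback, taking one reference and dropping one.
 */
static void flush_routes(struct rtable *routes, size_t n,
			 struct net_device *dev, struct net_device *loopback)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (routes[i].dev != dev)
			continue;
		routes[i].dev = loopback;
		dev_hold(routes[i].dev);
		dev_put(dev);
	}
}

/* Returns lo's usage count after tun0 and then lo itself are flushed. */
static int lo_refcnt_after_teardown(void)
{
	struct net_device lo = { "lo", 0 };
	struct net_device tun = { "tun0", 0 };
	struct rtable routes[3];
	size_t i;

	/* Three uncached routes, each holding one reference on tun0. */
	for (i = 0; i < 3; i++) {
		routes[i].dev = &tun;
		dev_hold(&tun);
	}

	flush_routes(routes, 3, &tun, &lo);	/* references move to lo */
	flush_routes(routes, 3, &lo, &lo);	/* hold + put on lo: no change */

	return lo.refcnt;
}
```

In this model the second flush leaves lo's count exactly where the first one put it, however many times it is repeated, which is the behaviour the "28->29" / "29->28" lines show.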
If we do something like
list_for_each_entry(rt, &ul->head, rt_uncached) {
if (rt->dst.dev != dev)
continue;
- rt->dst.dev = net->loopback_dev;
+ if (dev == net->loopback_dev)
+ rt->dst.dev = init_net.loopback_dev;
+ else
+ rt->dst.dev = net->loopback_dev;
dev_hold(rt->dst.dev);
dev_put(dev);
}
in rt_flush_dev(), I guess that this problem will go away. I don't
know which device should be used instead of init_net.loopback_dev.
But I guess that we need to somehow avoid taking a usage count on
a loopback device (e.g. by using a dummy device which is not in the
middle of unregistration) when we want to unregister that loopback
device.
On 4/15/19 7:36 AM, Tetsuo Handa wrote:
> I traced using debug printk() patch shown below.
>
I find tracepoints (see attached patch) and perf are easier to use to
debug device refcnt problems.
For example, limit the stack you have to deal with via sysctl -w
kernel.perf_event_max_stack=16, and add a filter (e.g., --filter 'name
== "lo"') to limit collection to a specific device.
Hello, David S. Miller.
I have a question regarding rt_flush_dev(), introduced by commit caacf05e5ad1abf0
("ipv4: Properly purge netdev references on uncached routes."), which went into
Linux 3.6-rc1. That commit started replacing "a device to unregister" with
"a loopback device in that namespace", but there is no description of why that
commit chose "a loopback device in that namespace". If the device to unregister
is itself "a loopback device in that namespace", rt_flush_dev() becomes a no-op
because dev == net->loopback_dev from the beginning. Apart from the problem that
the usage count keeps increasing because dev_put(rt->dst.dev) is not called after
rt->dst.dev was replaced with a loopback device, replacing "a device to unregister"
with "a loopback device in init namespace" (as shown below) avoids this problem.
----------
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index a5da63e..aff6a44 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1492,7 +1492,6 @@ static void ipv4_dst_destroy(struct dst_entry *dst)
void rt_flush_dev(struct net_device *dev)
{
- struct net *net = dev_net(dev);
struct rtable *rt;
int cpu;
@@ -1503,7 +1502,7 @@ void rt_flush_dev(struct net_device *dev)
list_for_each_entry(rt, &ul->head, rt_uncached) {
if (rt->dst.dev != dev)
continue;
- rt->dst.dev = net->loopback_dev;
+ rt->dst.dev = init_net.loopback_dev;
dev_hold(rt->dst.dev);
dev_put(dev);
}
----------
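As a rough illustration of the bookkeeping in the patch above (a userspace sketch only, not kernel code, and not a claim that this is the right fix): the structs are cut-down stand-ins, and flush_routes(), struct counts and teardown_with_init_fallback() are hypothetical names. The model starts with two routes on a tun device and one directly on the namespace's lo, then flushes both devices towards a separate "init" loopback device.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; only the fields used here. */
struct net_device { const char *name; int refcnt; };
struct rtable { struct net_device *dev; };

static void dev_hold(struct net_device *dev) { dev->refcnt++; }

static void dev_put(struct net_device *dev)
{
	assert(dev->refcnt > 0);
	dev->refcnt--;
}

/* The rt_flush_dev() loop with the redirect target passed in explicitly. */
static void flush_routes(struct rtable *routes, size_t n,
			 struct net_device *dev, struct net_device *target)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (routes[i].dev != dev)
			continue;
		routes[i].dev = target;
		dev_hold(routes[i].dev);
		dev_put(dev);
	}
}

struct counts { int ns_lo; int init_lo; };

/* Flush tun0 and then the namespace's lo towards the init namespace's lo. */
static struct counts teardown_with_init_fallback(void)
{
	struct net_device init_lo = { "lo (init_net)", 0 };
	struct net_device ns_lo = { "lo", 0 };
	struct net_device tun = { "tun0", 0 };
	struct rtable routes[3] = { { &tun }, { &tun }, { &ns_lo } };
	struct counts c;
	size_t i;

	/* Each route holds one reference on the device it points at. */
	for (i = 0; i < 3; i++)
		dev_hold(routes[i].dev);

	flush_routes(routes, 3, &tun, &init_lo);
	flush_routes(routes, 3, &ns_lo, &init_lo);

	c.ns_lo = ns_lo.refcnt;
	c.init_lo = init_lo.refcnt;
	return c;
}
```

In the model the namespace's lo ends at zero, so its unregistration could proceed, while the three references now sit on the init loopback device until the routes themselves are released.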
On 2019/04/15 22:36, Tetsuo Handa wrote:
> On 2018/08/20 21:55, Julian Anastasov wrote:
>> Well, only IPVS and tun in the game? But IPVS does not
>> take any dev references for sync threads. Can it be a problem
>> in tun? For example, a side effects from dst_cache_reset?
>> May be dst_release is called too late? Here is what should happen
>> on unregistration:
>>
>> - NETDEV_UNREGISTER event: rt_flush_dev changes dst->dev with lo
>> but dst is not released
>>
>> - ndo_uninit/ip_tunnel_uninit: dst_cache_reset is called which
>> does nothing!?! May be dst_release call is needed here.
>>
>> - no more references are expected here ...
>>
>> - netdev_run_todo -> netdev_wait_allrefs: loop here due to refcnt!=0
>>
>> - dev->priv_destructor (ip_tunnel_dev_free) calls dst_cache_destroy
>> where dst_release is used but it is not reached because we loop in
>> netdev_wait_allrefs above
>>
>> - dst_cache_destroy: really call dst_release
>>
>> In fact, after calling rt_flush_dev and replacing the
>> dst->dev we should reach dev->priv_destructor (ip_tunnel_dev_free)
>> for tun device where dst_release for lo should be called. But may be
>> something prevents it, exit batching?
>
> I traced using debug printk() patch shown below.
>
> ----------------------------------------
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 26f69cf763f4..25f7acacf457 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -3726,6 +3726,11 @@ void netdev_run_todo(void);
> */
> static inline void dev_put(struct net_device *dev)
> {
> + if (!strcmp(dev->name, "lo")) {
> + int count = netdev_refcnt_read(dev);
> + printk("dev_put(%p): %u->%u\n", dev, count, count - 1);
> + dump_stack();
> + }
> this_cpu_dec(*dev->pcpu_refcnt);
> }
>
> @@ -3737,6 +3742,11 @@ static inline void dev_put(struct net_device *dev)
> */
> static inline void dev_hold(struct net_device *dev)
> {
> + if (!strcmp(dev->name, "lo")) {
> + int count = netdev_refcnt_read(dev);
> + printk("dev_hold(%p): %u->%u\n", dev, count, count + 1);
> + dump_stack();
> + }
> this_cpu_inc(*dev->pcpu_refcnt);
> }
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index fdcff29df915..53ff4385c8f7 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -8897,9 +8897,9 @@ static void netdev_wait_allrefs(struct net_device *dev)
>
> refcnt = netdev_refcnt_read(dev);
>
> - if (time_after(jiffies, warning_time + 10 * HZ)) {
> - pr_emerg("unregister_netdevice: waiting for %s to become free. Usage count = %d\n",
> - dev->name, refcnt);
> + if (time_after(jiffies, warning_time + HZ)) {
> + pr_emerg("unregister_netdevice: waiting for %s to become free. Usage count = %d%s (%p)\n",
> + dev->name, refcnt, netdev_reg_state(dev), dev);
> warning_time = jiffies;
> }
> }
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index a5da63e5faa2..4c5baca105ed 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -1503,6 +1503,7 @@ void rt_flush_dev(struct net_device *dev)
> list_for_each_entry(rt, &ul->head, rt_uncached) {
> if (rt->dst.dev != dev)
> continue;
> + printk("rt_flush_dev(%p)->(%p)\n", dev, net->loopback_dev);
> rt->dst.dev = net->loopback_dev;
> dev_hold(rt->dst.dev);
> dev_put(dev);
> @@ -2560,6 +2561,7 @@ struct dst_entry *ipv4_blackhole_route(struct net *net, struct dst_entry *dst_or
> new->input = dst_discard;
> new->output = dst_discard_out;
>
> + printk("ipv4_blackhole_route(%p)->(%p)\n", new->dev, net->loopback_dev);
> new->dev = net->loopback_dev;
> if (new->dev)
> dev_hold(new->dev);
> ----------------------------------------
Hi David,
I looked at patchwork. This patch hasn't been accepted. Is there a plan
to resubmit? It is very useful. I had to debug refcnt issues multiple
times for my employer.
Thanks,
Stephen.
On Mon, Apr 15, 2019 at 09:35:01AM -0600, David Ahern wrote:
> On 4/15/19 7:36 AM, Tetsuo Handa wrote:
> > I traced using debug printk() patch shown below.
> >
>
> I find tracepoints (see attached patch) and perf are easier to use to
> debug device refcnt problems.
>
> For example, limit the stack you have to deal with via sysctl -w
> kernel.perf_event_max_stack=16, and add a filter (e.g., --filter 'name
> == "lo"') to limit collection to a specific device.
> From 068b1b8362ec5fd1b9dffdbd6e84474ada2eb829 Mon Sep 17 00:00:00 2001
> From: David Ahern <[email protected]>
> Date: Thu, 11 Feb 2016 02:40:12 -0800
> Subject: [PATCH] Add tracepoints to dev_hold and dev_put
>
> Signed-off-by: David Ahern <[email protected]>
> ---
> include/linux/netdevice.h | 6 ++++++
> include/trace/events/net.h | 38 ++++++++++++++++++++++++++++++++++++++
> net/core/dev.c | 21 +++++++++++++++++++++
> 3 files changed, 65 insertions(+)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 219f53c30cb3..7ef6fc672dfb 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -3193,6 +3193,7 @@ extern int netdev_budget;
> /* Called by rtnetlink.c:rtnl_unlock() */
> void netdev_run_todo(void);
>
> +#if 0
> /**
> * dev_put - release reference to device
> * @dev: network device
> @@ -3214,6 +3215,11 @@ static inline void dev_hold(struct net_device *dev)
> {
> this_cpu_inc(*dev->pcpu_refcnt);
> }
> +#else
> +void dev_put(struct net_device *dev);
> +void dev_hold(struct net_device *dev);
> +
> +#endif
>
> /* Carrier loss detection, dial on demand. The functions netif_carrier_on
> * and _off may be called from IRQ context, but it is caller
> diff --git a/include/trace/events/net.h b/include/trace/events/net.h
> index 49cc7c3de252..9ed73dfe9d09 100644
> --- a/include/trace/events/net.h
> +++ b/include/trace/events/net.h
> @@ -236,6 +236,44 @@ DEFINE_EVENT(net_dev_rx_verbose_template, netif_rx_ni_entry,
> TP_ARGS(skb)
> );
>
> +TRACE_EVENT(dev_put,
> +
> + TP_PROTO(struct net_device *dev),
> +
> + TP_ARGS(dev),
> +
> + TP_STRUCT__entry(
> + __string( name, dev->name )
> + __field( int, refcnt )
> + ),
> +
> + TP_fast_assign(
> + __assign_str(name, dev->name);
> + __entry->refcnt = netdev_refcnt_read(dev);
> + ),
> +
> + TP_printk("dev=%s refcnt %d", __get_str(name), __entry->refcnt)
> +);
> +
> +TRACE_EVENT(dev_hold,
> +
> + TP_PROTO(struct net_device *dev),
> +
> + TP_ARGS(dev),
> +
> + TP_STRUCT__entry(
> + __string( name, dev->name )
> + __field( int, refcnt )
> + ),
> +
> + TP_fast_assign(
> + __assign_str(name, dev->name);
> + __entry->refcnt = netdev_refcnt_read(dev);
> + ),
> +
> + TP_printk("dev=%s refcnt %d", __get_str(name), __entry->refcnt)
> +);
> +
> #endif /* _TRACE_NET_H */
>
> /* This part must be outside protection */
> diff --git a/net/core/dev.c b/net/core/dev.c
> index f1284835b8c9..99ac067afd18 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -8117,3 +8117,24 @@ static int __init net_dev_init(void)
> }
>
> subsys_initcall(net_dev_init);
> +
> +
> +void dev_put(struct net_device *dev)
> +{
> + this_cpu_dec(*dev->pcpu_refcnt);
> + trace_dev_put(dev);
> +}
> +EXPORT_SYMBOL(dev_put);
> +
> +/**
> + * dev_hold - get reference to device
> + * @dev: network device
> + *
> + * Hold reference to device to keep it from being freed.
> + */
> +void dev_hold(struct net_device *dev)
> +{
> + this_cpu_inc(*dev->pcpu_refcnt);
> + trace_dev_hold(dev);
> +}
> +EXPORT_SYMBOL(dev_hold);
> --
> 2.1.4
>
On 4/21/19 2:41 PM, Stephen Suryaputra wrote:
> Hi David,
>
> I looked at patchwork. This patch hasn't been accepted. Is there a plan
> to resubmit? It is very useful. I had to debug refcnt issues multiple
> times for my employer.
I think the inlined versions of dev_put and dev_hold are better for
performance.
I could submit it with a DEBUG config to enable. It has been invaluable
to me over the past 3+ years debugging refcount problems.
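A config-gated variant might look roughly like the userspace sketch below. NETDEV_REFCNT_DEBUG is a hypothetical option name, the struct is a cut-down stand-in, and fprintf() stands in for the tracepoint calls; when the option is off, the helpers stay inline as they are today.

```c
#include <assert.h>
#include <stdio.h>

/* Userspace stand-in for the kernel type; only the fields used here. */
struct net_device { const char *name; int refcnt; };

#ifdef NETDEV_REFCNT_DEBUG	/* hypothetical option name */
/* Debug build: out-of-line helpers that report every change. */
static void dev_hold(struct net_device *dev)
{
	dev->refcnt++;
	fprintf(stderr, "dev_hold dev=%s refcnt %d\n", dev->name, dev->refcnt);
}

static void dev_put(struct net_device *dev)
{
	assert(dev->refcnt > 0);
	dev->refcnt--;
	fprintf(stderr, "dev_put dev=%s refcnt %d\n", dev->name, dev->refcnt);
}
#else
/* Default build: the cheap inline versions, no reporting. */
static inline void dev_hold(struct net_device *dev)
{
	dev->refcnt++;
}

static inline void dev_put(struct net_device *dev)
{
	assert(dev->refcnt > 0);
	dev->refcnt--;
}
#endif

/* Two holds and one put: one reference is left outstanding. */
static int hold_twice_put_once(const char *name)
{
	struct net_device dev = { name, 0 };

	dev_hold(&dev);
	dev_hold(&dev);
	dev_put(&dev);
	return dev.refcnt;
}
```

Callers are identical in both builds; only compiling with -DNETDEV_REFCNT_DEBUG adds the per-call reporting.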
On 04/22/2019 07:58 AM, David Ahern wrote:
> On 4/21/19 2:41 PM, Stephen Suryaputra wrote:
>> Hi David,
>>
>> I looked at patchwork. This patch hasn't been accepted. Is there a plan
>> to resubmit? It is very useful. I had to debug refcnt issues multiple
>> times for my employer.
>
> I think the inlined versions of dev_put and dev_hold are better for
> performance.
>
> I could submit it with a DEBUG config to enable. It has been invaluable
> to me over the past 3+ years debugging refcount problems.
>
Sounds like a good plan to me :)
On 04/22/2019 09:04 AM, Eric Dumazet wrote:
>
>
> On 04/22/2019 07:58 AM, David Ahern wrote:
>> On 4/21/19 2:41 PM, Stephen Suryaputra wrote:
>>> Hi David,
>>>
>>> I looked at patchwork. This patch hasn't been accepted. Is there a plan
>>> to resubmit? It is very useful. I had to debug refcnt issues multiple
>>> times for my employer.
>>
>> I think the inlined versions of dev_put and dev_hold are better for
>> performance.
However I do not really see dev_put()/dev_hold() being in very hot code paths.
>>
>> I could submit it with a DEBUG config to enable. It has been invaluable
>> to me over the past 3+ years debugging refcount problems.
>>
>
> Sounds a good plan to me :)
>
Another idea would be to keep track of all dev_hold() calls by allocating a unique cookie
and providing the cookie at the corresponding dev_put().
The cookie would also keep track of the stack trace.
This would allow the missing release to be found automatically, but at extra cost of course.
This infrastructure would of course be a debug option at kernel build.
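A very rough userspace sketch of the cookie idea follows, under loud assumptions: dev_hold_tracked(), dev_put_tracked(), first_leak() and the fixed-size table are all hypothetical, and a "where" string stands in for the recorded stack trace.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TRACKED 64

/* One slot per outstanding reference; "where" stands in for a stack trace. */
struct ref_cookie { int in_use; const char *where; };

struct net_device {
	const char *name;
	int refcnt;
	struct ref_cookie cookies[MAX_TRACKED];
};

/* Take a reference and return its cookie, or -1 if the table is full. */
static int dev_hold_tracked(struct net_device *dev, const char *where)
{
	int i;

	for (i = 0; i < MAX_TRACKED; i++) {
		if (dev->cookies[i].in_use)
			continue;
		dev->cookies[i].in_use = 1;
		dev->cookies[i].where = where;
		dev->refcnt++;
		return i;
	}
	return -1;
}

/* Drop the reference identified by cookie; a stale cookie trips the assert. */
static void dev_put_tracked(struct net_device *dev, int cookie)
{
	assert(cookie >= 0 && cookie < MAX_TRACKED);
	assert(dev->cookies[cookie].in_use);
	dev->cookies[cookie].in_use = 0;
	dev->refcnt--;
}

/* Cookie of the first reference never released, or -1 if there is none. */
static int first_leak(const struct net_device *dev)
{
	int i;

	for (i = 0; i < MAX_TRACKED; i++)
		if (dev->cookies[i].in_use)
			return i;
	return -1;
}

/* Two holds, only the second one released: the first cookie is the leak. */
static int demo_first_leak(void)
{
	struct net_device lo = { .name = "lo" };
	int a = dev_hold_tracked(&lo, "rt_flush_dev");
	int b = dev_hold_tracked(&lo, "ipv4_blackhole_route");

	(void)a;	/* deliberately never released */
	dev_put_tracked(&lo, b);
	return first_leak(&lo);
}
```

When the usage count refuses to drop, walking the table and printing cookies[i].where for each slot still in use would name the call sites that never released their reference, at the cost of one table entry per hold.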
This bug is the top crasher for syzbot, and thus we want to fix it. I need your
response regarding commit caacf05e5ad1abf0 ("ipv4: Properly purge netdev
references on uncached routes."): why did you choose "a loopback device in that
namespace"?
On 2019/04/16 23:00, Tetsuo Handa wrote:
> Hello, David S. Miller.
>
> I have a question regarding rt_flush_dev() introduced by commit caacf05e5ad1abf0
> ("ipv4: Properly purge netdev references on uncached routes.") which went to
> Linux 3.6-rc1. That commit started replacing "a device to unregister" with
> "a loopback device in that namespace", but there is no description why that
> commit chose "a loopback device in that namespace". If a device to unregister
> is "a loopback device in that namespace" itself, rt_flush_dev() becomes a no-op
> because dev == net->loopback_dev from the beginning. Apart from a problem that
> usage count keeps increasing because dev_put(rt->dst.dev) is not called after
> rt->dst.dev was replaced with a loopback device, replacing "a device to unregister"
> with "a loopback device in init namespace" (like shown below) avoids this problem.
>
> ----------
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index a5da63e..aff6a44 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -1492,7 +1492,6 @@ static void ipv4_dst_destroy(struct dst_entry *dst)
>
> void rt_flush_dev(struct net_device *dev)
> {
> - struct net *net = dev_net(dev);
> struct rtable *rt;
> int cpu;
>
> @@ -1503,7 +1502,7 @@ void rt_flush_dev(struct net_device *dev)
> list_for_each_entry(rt, &ul->head, rt_uncached) {
> if (rt->dst.dev != dev)
> continue;
> - rt->dst.dev = net->loopback_dev;
> + rt->dst.dev = init_net.loopback_dev;
> dev_hold(rt->dst.dev);
> dev_put(dev);
> }
> ----------
>
> On 2019/04/15 22:36, Tetsuo Handa wrote:
>> On 2018/08/20 21:55, Julian Anastasov wrote:
>>> Well, only IPVS and tun in the game? But IPVS does not
>>> take any dev references for sync threads. Can it be a problem
>>> in tun? For example, a side effects from dst_cache_reset?
>>> May be dst_release is called too late? Here is what should happen
>>> on unregistration:
>>>
>>> - NETDEV_UNREGISTER event: rt_flush_dev changes dst->dev with lo
>>> but dst is not released
>>>
>>> - ndo_uninit/ip_tunnel_uninit: dst_cache_reset is called which
>>> does nothing!?! May be dst_release call is needed here.
>>>
>>> - no more references are expected here ...
>>>
>>> - netdev_run_todo -> netdev_wait_allrefs: loop here due to refcnt!=0
>>>
>>> - dev->priv_destructor (ip_tunnel_dev_free) calls dst_cache_destroy
>>> where dst_release is used but it is not reached because we loop in
>>> netdev_wait_allrefs above
>>>
>>> - dst_cache_destroy: really call dst_release
>>>
>>> In fact, after calling rt_flush_dev and replacing the
>>> dst->dev we should reach dev->priv_destructor (ip_tunnel_dev_free)
>>> for tun device where dst_release for lo should be called. But may be
>>> something prevents it, exit batching?
>>
>> I traced using debug printk() patch shown below.
>>
>> ----------------------------------------
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index 26f69cf763f4..25f7acacf457 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -3726,6 +3726,11 @@ void netdev_run_todo(void);
>> */
>> static inline void dev_put(struct net_device *dev)
>> {
>> + if (!strcmp(dev->name, "lo")) {
>> + int count = netdev_refcnt_read(dev);
>> + printk("dev_put(%p): %u->%u\n", dev, count, count - 1);
>> + dump_stack();
>> + }
>> this_cpu_dec(*dev->pcpu_refcnt);
>> }
>>
>> @@ -3737,6 +3742,11 @@ static inline void dev_put(struct net_device *dev)
>> */
>> static inline void dev_hold(struct net_device *dev)
>> {
>> + if (!strcmp(dev->name, "lo")) {
>> + int count = netdev_refcnt_read(dev);
>> + printk("dev_hold(%p): %u->%u\n", dev, count, count + 1);
>> + dump_stack();
>> + }
>> this_cpu_inc(*dev->pcpu_refcnt);
>> }
>>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index fdcff29df915..53ff4385c8f7 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -8897,9 +8897,9 @@ static void netdev_wait_allrefs(struct net_device *dev)
>>
>> refcnt = netdev_refcnt_read(dev);
>>
>> - if (time_after(jiffies, warning_time + 10 * HZ)) {
>> - pr_emerg("unregister_netdevice: waiting for %s to become free. Usage count = %d\n",
>> - dev->name, refcnt);
>> + if (time_after(jiffies, warning_time + HZ)) {
>> + pr_emerg("unregister_netdevice: waiting for %s to become free. Usage count = %d%s (%p)\n",
>> + dev->name, refcnt, netdev_reg_state(dev), dev);
>> warning_time = jiffies;
>> }
>> }
>> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
>> index a5da63e5faa2..4c5baca105ed 100644
>> --- a/net/ipv4/route.c
>> +++ b/net/ipv4/route.c
>> @@ -1503,6 +1503,7 @@ void rt_flush_dev(struct net_device *dev)
>> list_for_each_entry(rt, &ul->head, rt_uncached) {
>> if (rt->dst.dev != dev)
>> continue;
>> + printk("rt_flush_dev(%p)->(%p)\n", dev, net->loopback_dev);
>> rt->dst.dev = net->loopback_dev;
>> dev_hold(rt->dst.dev);
>> dev_put(dev);
>> @@ -2560,6 +2561,7 @@ struct dst_entry *ipv4_blackhole_route(struct net *net, struct dst_entry *dst_or
>> new->input = dst_discard;
>> new->output = dst_discard_out;
>>
>> + printk("ipv4_blackhole_route(%p)->(%p)\n", new->dev, net->loopback_dev);
>> new->dev = net->loopback_dev;
>> if (new->dev)
>> dev_hold(new->dev);
>> ----------------------------------------
>>
>> When a new loopback device in a new network namespace is created using unshare(),
>> nothing is wrong.
>>
>> [ 60.873014][ T7306] dev_hold(00000000d9f4ea20): 0->1
>> [ 60.873019][ T7306] CPU: 4 PID: 7306 Comm: a.out Not tainted 5.1.0-rc5+ #177
>> [ 60.873021][ T7306] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
>> [ 60.873022][ T7306] Call Trace:
>> [ 60.873031][ T7306] dump_stack+0xaa/0xd8
>> [ 60.873037][ T7306] net_rx_queue_update_kobjects+0x1f5/0x200
>> [ 60.873042][ T7306] netdev_register_kobject+0xf8/0x1a0
>> [ 60.873047][ T7306] register_netdevice+0x4cc/0x650
>> [ 60.873052][ T7306] register_netdev+0x23/0x40
>> [ 60.873056][ T7306] loopback_net_init+0x50/0xc0
>> [ 60.873059][ T7306] ? loopback_dev_init+0xa0/0xa0
>> [ 60.873064][ T7306] ops_init+0x4f/0x140
>> [ 60.873068][ T7306] setup_net+0xe7/0x250
>> [ 60.873072][ T7306] copy_net_ns+0xee/0x1e0
>> [ 60.873077][ T7306] create_new_namespaces+0x141/0x2a0
>> [ 60.873081][ T7306] unshare_nsproxy_namespaces+0x7e/0xf0
>> [ 60.873086][ T7306] ksys_unshare+0x268/0x4b0
>> [ 60.873090][ T7306] __x64_sys_unshare+0x16/0x20
>> [ 60.873095][ T7306] do_syscall_64+0x7c/0x180
>> [ 60.873099][ T7306] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>
>> But when some device in a network namespace calls rt_flush_dev(),
>> it gets a usage count on loopback device in that network namespace.
>>
>> [ 71.388104][ T7620] rt_flush_dev(00000000cd35e96a)->(00000000d9f4ea20)
>> [ 71.391757][ T7620] dev_hold(00000000d9f4ea20): 7->8
>> [ 71.394725][ T7620] CPU: 4 PID: 7620 Comm: a.out Not tainted 5.1.0-rc5+ #177
>> [ 71.398094][ T7620] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
>> [ 71.403711][ T7620] Call Trace:
>> [ 71.405912][ T7620] dump_stack+0xaa/0xd8
>> [ 71.408252][ T7620] rt_flush_dev+0x177/0x1b0
>> [ 71.410802][ T7620] fib_netdev_event+0x150/0x1b0
>> [ 71.413270][ T7620] notifier_call_chain+0x47/0xd0
>> [ 71.415849][ T7620] raw_notifier_call_chain+0x2d/0x40
>> [ 71.418491][ T7620] ? tun_show_group+0x90/0x90
>> [ 71.421108][ T7620] call_netdevice_notifiers_info+0x32/0x70
>> [ 71.423854][ T7620] rollback_registered_many+0x421/0x680
>> [ 71.426583][ T7620] rollback_registered+0x68/0xb0
>> [ 71.429244][ T7620] unregister_netdevice_queue+0xa5/0x100
>> [ 71.432191][ T7620] __tun_detach+0x576/0x590
>> [ 71.435533][ T7620] tun_chr_close+0x41/0x80
>> [ 71.437957][ T7620] ? __tun_detach+0x590/0x590
>> [ 71.440500][ T7620] __fput+0xeb/0x2d0
>> [ 71.442816][ T7620] ____fput+0x15/0x20
>> [ 71.445090][ T7620] task_work_run+0xa9/0xd0
>> [ 71.447467][ T7620] do_exit+0x37a/0xf40
>> [ 71.449623][ T7620] do_group_exit+0x57/0xe0
>> [ 71.451826][ T7620] get_signal+0x114/0x950
>> [ 71.453989][ T7620] do_signal+0x2f/0x700
>> [ 71.456126][ T7620] ? handle_mm_fault+0x1a8/0x360
>> [ 71.458323][ T7620] ? __x64_sys_futex+0x179/0x210
>> [ 71.460620][ T7620] exit_to_usermode_loop+0x159/0x180
>> [ 71.462956][ T7620] do_syscall_64+0x15d/0x180
>> [ 71.465110][ T7620] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>
>> The usage count printed by "unregister_netdevice: waiting for lo to become free."
>> seems to match the number of rt_flush_dev() traces shown above. (Complete log is at
>> http://I-love.SAKURA.ne.jp/tmp/serial-20190415.txt.xz .)
>>
>> Although netdev_wait_allrefs() is periodically calling
>> call_netdevice_notifiers(NETDEV_UNREGISTER, dev) in order to try to drop
>> the usage count,
>>
>> list_for_each_entry(rt, &ul->head, rt_uncached) {
>> if (rt->dst.dev != dev)
>> continue;
>> rt->dst.dev = net->loopback_dev;
>> dev_hold(rt->dst.dev);
>> dev_put(dev);
>> }
>>
>> in rt_flush_dev() becomes a no-op because dev == net->loopback_dev, and
>> therefore can never drop the usage count. That is, netdev_wait_allrefs()
>> on a loopback device cannot make forward progress.
>>
>> [ 95.502947][ T4478] unregister_netdevice: waiting for lo to become free. Usage count = 28 (unregistered) (00000000d9f4ea20)
>> [ 95.509108][ T4478] rt_flush_dev(00000000d9f4ea20)->(00000000d9f4ea20)
>> [ 95.512598][ T4478] dev_hold(00000000d9f4ea20): 28->29
>> [ 95.517241][ T4478] CPU: 5 PID: 4478 Comm: kworker/u128:28 Not tainted 5.1.0-rc5+ #177
>> [ 95.522984][ T4478] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
>> [ 95.532898][ T4478] Workqueue: netns cleanup_net
>> [ 95.537134][ T4478] Call Trace:
>> [ 95.539751][ T4478] dump_stack+0xaa/0xd8
>> [ 95.543642][ T4478] rt_flush_dev+0x177/0x1b0
>> [ 95.546835][ T4478] fib_netdev_event+0x150/0x1b0
>> [ 95.550609][ T4478] notifier_call_chain+0x47/0xd0
>> [ 95.557990][ T4478] raw_notifier_call_chain+0x2d/0x40
>> [ 95.563900][ T4478] call_netdevice_notifiers_info+0x32/0x70
>> [ 95.568655][ T4478] netdev_run_todo+0x197/0x410
>> [ 95.572554][ T4478] rtnl_unlock+0xe/0x10
>> [ 95.576836][ T4478] default_device_exit_batch+0x1ab/0x1d0
>> [ 95.579650][ T4478] ? do_wait_intr_irq+0xb0/0xb0
>> [ 95.582236][ T4478] ? unregister_netdevice_many+0x30/0x30
>> [ 95.584766][ T4478] ? dev_change_net_namespace+0x4e0/0x4e0
>> [ 95.587397][ T4478] ops_exit_list.isra.6+0x75/0x90
>> [ 95.590063][ T4478] cleanup_net+0x20d/0x380
>> [ 95.592373][ T4478] process_one_work+0x202/0x4f0
>> [ 95.595017][ T4478] worker_thread+0x3c/0x4b0
>> [ 95.597520][ T4478] kthread+0x139/0x160
>> [ 95.599822][ T4478] ? process_one_work+0x4f0/0x4f0
>> [ 95.602351][ T4478] ? kthread_destroy_worker+0x70/0x70
>> [ 95.604901][ T4478] ret_from_fork+0x35/0x40
>> [ 95.607249][ T4478] dev_put(00000000d9f4ea20): 29->28
>> [ 95.609935][ T4478] CPU: 5 PID: 4478 Comm: kworker/u128:28 Not tainted 5.1.0-rc5+ #177
>> [ 95.613282][ T4478] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
>> [ 95.618699][ T4478] Workqueue: netns cleanup_net
>> [ 95.621345][ T4478] Call Trace:
>> [ 95.623381][ T4478] dump_stack+0xaa/0xd8
>> [ 95.625787][ T4478] rt_flush_dev+0x19f/0x1b0
>> [ 95.628284][ T4478] fib_netdev_event+0x150/0x1b0
>> [ 95.630827][ T4478] notifier_call_chain+0x47/0xd0
>> [ 95.633153][ T4478] raw_notifier_call_chain+0x2d/0x40
>> [ 95.635702][ T4478] call_netdevice_notifiers_info+0x32/0x70
>> [ 95.638372][ T4478] netdev_run_todo+0x197/0x410
>> [ 95.640641][ T4478] rtnl_unlock+0xe/0x10
>> [ 95.642828][ T4478] default_device_exit_batch+0x1ab/0x1d0
>> [ 95.645230][ T4478] ? do_wait_intr_irq+0xb0/0xb0
>> [ 95.647613][ T4478] ? unregister_netdevice_many+0x30/0x30
>> [ 95.650148][ T4478] ? dev_change_net_namespace+0x4e0/0x4e0
>> [ 95.652594][ T4478] ops_exit_list.isra.6+0x75/0x90
>> [ 95.654975][ T4478] cleanup_net+0x20d/0x380
>> [ 95.657140][ T4478] process_one_work+0x202/0x4f0
>> [ 95.659829][ T4478] worker_thread+0x3c/0x4b0
>> [ 95.662098][ T4478] kthread+0x139/0x160
>> [ 95.664119][ T4478] ? process_one_work+0x4f0/0x4f0
>> [ 95.666467][ T4478] ? kthread_destroy_worker+0x70/0x70
>> [ 95.668916][ T4478] ret_from_fork+0x35/0x40
>>
>> If we do something like
>>
>> list_for_each_entry(rt, &ul->head, rt_uncached) {
>> if (rt->dst.dev != dev)
>> continue;
>> - rt->dst.dev = net->loopback_dev;
>> + if (dev == net->loopback_dev)
>> + rt->dst.dev = init_net.loopback_dev;
>> + else
>> + rt->dst.dev = net->loopback_dev;
>> dev_hold(rt->dst.dev);
>> dev_put(dev);
>> }
>>
>> at rt_flush_dev(), I guess that this problem will go away. I don't
>> know which device should be used instead of init_net.loopback_dev.
>> But I guess that we need to somehow avoid taking a usage count on
>> a loopback device (e.g. by using a dummy device which is not being
>> unregistered) when we want to unregister that loopback
>> device.
>>
>
On 4/26/19 7:43 AM, Tetsuo Handa wrote:
> This bug is the top crasher for syzbot and thus we want to fix it. I need your
> response on why commit caacf05e5ad1abf0 ("ipv4: Properly purge netdev
> references on uncached routes.") chose "a loopback device in that
> namespace".
>
> On 2019/04/16 23:00, Tetsuo Handa wrote:
>> Hello, David S. Miller.
>>
>> I have a question regarding rt_flush_dev() introduced by commit caacf05e5ad1abf0
>> ("ipv4: Properly purge netdev references on uncached routes.") which went to
>> Linux 3.6-rc1. That commit started replacing "a device to unregister" with
>> "a loopback device in that namespace", but there is no description why that
>> commit chose "a loopback device in that namespace". If a device to unregister
>> is "a loopback device in that namespace" itself, rt_flush_dev() becomes a no-op
>> because dev == net->loopback_dev from the beginning. Apart from a problem that
>> usage count keeps increasing because dev_put(rt->dst.dev) is not called after
>> rt->dst.dev was replaced with a loopback device, replacing "a device to unregister"
>> with "a loopback device in init namespace" (like shown below) avoids this problem.
>>
Moving resource use to the init namespace is not really solving the core
problem. It would be better to understand what changes are needed to the
shutdown sequence of a namespace to ensure proper cleanup.
In this case why are dst entries not getting cleaned up? This one is
referring to entries on the uncached list. What is using the dst entry
and why isn't it getting released?
On 2019/04/28 2:16, David Ahern wrote:
> On 4/26/19 7:43 AM, Tetsuo Handa wrote:
>> This bug is the top crasher for syzbot and thus we want to fix it. I need your
>> response on why commit caacf05e5ad1abf0 ("ipv4: Properly purge netdev
>> references on uncached routes.") chose "a loopback device in that
>> namespace".
>>
>> On 2019/04/16 23:00, Tetsuo Handa wrote:
>>> Hello, David S. Miller.
>>>
>>> I have a question regarding rt_flush_dev() introduced by commit caacf05e5ad1abf0
>>> ("ipv4: Properly purge netdev references on uncached routes.") which went to
>>> Linux 3.6-rc1. That commit started replacing "a device to unregister" with
>>> "a loopback device in that namespace", but there is no description why that
>>> commit chose "a loopback device in that namespace". If a device to unregister
>>> is "a loopback device in that namespace" itself, rt_flush_dev() becomes a no-op
>>> because dev == net->loopback_dev from the beginning. Apart from a problem that
>>> usage count keeps increasing because dev_put(rt->dst.dev) is not called after
>>> rt->dst.dev was replaced with a loopback device, replacing "a device to unregister"
>>> with "a loopback device in init namespace" (like shown below) avoids this problem.
>>>
>
> Moving resource use to the init namespace is not really solving the core
> problem. It would be better to understand what changes are needed to the
> shutdown sequence of a namespace to ensure proper cleanup.
I know.
>
> In this case why are dst entries not getting cleaned up? This one is
> referring to entries on the uncached list. What is using the dst entry
> and why isn't it getting released?
>
I'm waiting for davem to explain why it is safe to move the dst entry from
"a device to unregister" to "a loopback device in that namespace".
I'm also waiting for an explanation of how the dst entry which was moved to
"a loopback device in that namespace" is released (i.e. what the
expected shutdown sequence is).
On 4/27/19 3:33 PM, Tetsuo Handa wrote:
>
> I'm waiting for davem to explain why it is safe to move the dst entry from
> "a device to unregister" to "a loopback device in that namespace".
> I'm also waiting for an explanation of how the dst entry which was moved to
> "a loopback device in that namespace" is released (i.e. what the
> expected shutdown sequence is).
The most probable explanation is that we make sure the loopback device
is the last one to be dismantled at netns deletion,
and this would obviously happen after all dst have been released.
On 2019/04/28 8:52, Eric Dumazet wrote:
> On 4/27/19 3:33 PM, Tetsuo Handa wrote:
>>
>> I'm waiting for davem to explain why it is safe to move the dst entry from
>> "a device to unregister" to "a loopback device in that namespace".
>> I'm also waiting for an explanation of how the dst entry which was moved to
>> "a loopback device in that namespace" is released (i.e. what the
>> expected shutdown sequence is).
>
> The most probable explanation is that we make sure the loopback device
> is the last one to be dismantled at netns deletion,
> and this would obviously happen after all dst have been released.
>
rt_flush_dev() becomes a no-op if "dev" == "a loopback device in that
namespace". And according to debug printk(), rt_flush_dev() is called
on "a loopback device in that namespace" itself.
If "a loopback device in that namespace" is the last "one" (== "a network
device in that namespace" ?), which shutdown sequence should have called
dev_put("a loopback device in that namespace") before unregistration of
"a loopback device in that namespace" starts?
Since I'm not a netdev person, I would appreciate it if you could explain
that shutdown sequence using a flow chart.
On 4/27/19 9:22 PM, Tetsuo Handa wrote:
> On 2019/04/28 8:52, Eric Dumazet wrote:
>> On 4/27/19 3:33 PM, Tetsuo Handa wrote:
>>>
>>> I'm waiting for davem to explain why it is safe to move the dst entry from
>>> "a device to unregister" to "a loopback device in that namespace".
>>> I'm also waiting for an explanation of how the dst entry which was moved to
>>> "a loopback device in that namespace" is released (i.e. what the
>>> expected shutdown sequence is).
>>
>> The most probable explanation is that we make sure the loopback device
>> is the last one to be dismantled at netns deletion,
>> and this would obviously happen after all dst have been released.
>>
>
> rt_flush_dev() becomes a no-op if "dev" == "a loopback device in that
> namespace". And according to debug printk(), rt_flush_dev() is called
> on "a loopback device in that namespace" itself.
>
This is the design, yes. We cannot let a dst hold a pointer to garbage memory
(since the netdev is going to be freed very soon).
dst entries can be long-lived objects; netdevs (other than loopback) are not.
> If "a loopback device in that namespace" is the last "one" (== "a network
> device in that namespace" ?), which shutdown sequence should have called
> dev_put("a loopback device in that namespace") before unregistration of
> "a loopback device in that namespace" starts?
You'll have to study all the netdev notifiers to answer this question.
There are many of them, and each has a priority so that they run in a given order.
>
> Since I'm not a netdev person, I would appreciate it if you could explain
> that shutdown sequence using a flow chart.
I am a netdev person, but I have no time to explain this at this moment.
On 4/27/19 10:22 PM, Tetsuo Handa wrote:
> On 2019/04/28 8:52, Eric Dumazet wrote:
>> On 4/27/19 3:33 PM, Tetsuo Handa wrote:
>>>
>>> I'm waiting for davem to explain why it is safe to move the dst entry from
>>> "a device to unregister" to "a loopback device in that namespace".
>>> I'm also waiting for an explanation of how the dst entry which was moved to
>>> "a loopback device in that namespace" is released (i.e. what the
>>> expected shutdown sequence is).
>>
>> The most probable explanation is that we make sure the loopback device
>> is the last one to be dismantled at netns deletion,
>> and this would obviously happen after all dst have been released.
>>
>
> rt_flush_dev() becomes a no-op if "dev" == "a loopback device in that
> namespace". And according to debug printk(), rt_flush_dev() is called
> on "a loopback device in that namespace" itself.
>
> If "a loopback device in that namespace" is the last "one" (== "a network
> device in that namespace" ?), which shutdown sequence should have called
> dev_put("a loopback device in that namespace") before unregistration of
> "a loopback device in that namespace" starts?
>
> Since I'm not a netdev person, I would appreciate it if you could explain
> that shutdown sequence using a flow chart.
>
The attached patch adds a tracepoint to notifier_call_chain. If you have
KALLSYMS enabled it will show the order of the function handlers:
perf record -e notifier:* -a -g &
ip netns del <NAME>
<wait a few seconds>
fg
<ctrl-c on perf-record>
perf script
On 4/29/19 12:34 PM, David Ahern wrote:
> On 4/27/19 10:22 PM, Tetsuo Handa wrote:
>> On 2019/04/28 8:52, Eric Dumazet wrote:
>>> On 4/27/19 3:33 PM, Tetsuo Handa wrote:
>>>>
>>>> I'm waiting for davem to explain why it is safe to move the dst entry from
>>>> "a device to unregister" to "a loopback device in that namespace".
>>>> I'm also waiting for an explanation of how the dst entry which was moved to
>>>> "a loopback device in that namespace" is released (i.e. what the
>>>> expected shutdown sequence is).
>>>
>>> The most probable explanation is that we make sure the loopback device
>>> is the last one to be dismantled at netns deletion,
>>> and this would obviously happen after all dst have been released.
>>>
>>
>> rt_flush_dev() becomes a no-op if "dev" == "a loopback device in that
>> namespace". And according to debug printk(), rt_flush_dev() is called
>> on "a loopback device in that namespace" itself.
>>
>> If "a loopback device in that namespace" is the last "one" (== "a network
>> device in that namespace" ?), which shutdown sequence should have called
>> dev_put("a loopback device in that namespace") before unregistration of
>> "a loopback device in that namespace" starts?
>>
>> Since I'm not a netdev person, I would appreciate it if you could explain
>> that shutdown sequence using a flow chart.
>>
>
> The attached patch adds a tracepoint to notifier_call_chain. If you have
> KALLSYMS enabled it will show the order of the function handlers:
>
> perf record -e notifier:* -a -g &
>
> ip netns del <NAME>
> <wait a few seconds>
>
> fg
> <ctrl-c on perf-record>
>
> perf script
>
with the header file this time.
On 2019/04/30 3:43, David Ahern wrote:
>> The attached patch adds a tracepoint to notifier_call_chain. If you have
>> KALLSYMS enabled it will show the order of the function handlers:
>>
>> perf record -e notifier:* -a -g &
>>
>> ip netns del <NAME>
>> <wait a few seconds>
>>
>> fg
>> <ctrl-c on perf-record>
>>
>> perf script
>>
>
> with the header file this time.
>
What is the intent of your patch? I can see that many notifiers are called. But
how does this help identify which event is responsible for dropping the refcount?
a.out 6898 [003] 54.809503: notifier:notifier_call_chain: val 17 fcn ffffffff822de9b0 name ip_vs_dst_event
a.out 6898 [003] 54.809512: notifier:notifier_call_chain: val 17 fcn ffffffff821a1060 name rtnetlink_event
a.out 6898 [003] 54.809516: notifier:notifier_call_chain: val 17 fcn ffffffff812c8830 name dev_map_notification
a.out 6898 [003] 54.809520: notifier:notifier_call_chain: val 17 fcn ffffffff81f89a00 name netdevice_event
a.out 6898 [003] 54.809523: notifier:notifier_call_chain: val 17 fcn ffffffff821b8c70 name fib_rules_event
a.out 6898 [003] 54.809525: notifier:notifier_call_chain: val 17 fcn ffffffff821c11e0 name netprio_device_event
a.out 6898 [003] 54.809530: notifier:notifier_call_chain: val 17 fcn ffffffff826012b0 name wext_netdev_notifier_call
a.out 6898 [003] 54.809533: notifier:notifier_call_chain: val 17 fcn ffffffff8261b980 name netdev_notify
a.out 6898 [003] 54.809536: notifier:notifier_call_chain: val 17 fcn ffffffff826a4e10 name netlbl_unlhsh_netdev_handler
a.out 6898 [003] 54.809538: notifier:notifier_call_chain: val 17 fcn ffffffff826d5900 name cfg802154_netdev_notifier_call
a.out 6898 [003] 54.809542: notifier:notifier_call_chain: val 17 fcn ffffffff826e1fa0 name netdev_notify
a.out 6898 [003] 54.809546: notifier:notifier_call_chain: val 17 fcn ffffffff82331910 name arp_netdev_event
a.out 6898 [003] 54.809548: notifier:notifier_call_chain: val 17 fcn ffffffff823389c0 name inetdev_event
a.out 6898 [003] 54.809551: notifier:notifier_call_chain: val 17 fcn ffffffff82343370 name fib_netdev_event
a.out 6898 [003] 54.809555: notifier:notifier_call_chain: val 17 fcn ffffffff82399140 name xfrm_dev_event
a.out 6898 [003] 54.809557: notifier:notifier_call_chain: val 17 fcn ffffffff8233dec0 name igmp_netdev_event
a.out 6898 [003] 54.809560: notifier:notifier_call_chain: val 17 fcn ffffffff82353c00 name ipmr_device_event
a.out 6898 [003] 54.809564: notifier:notifier_call_chain: val 17 fcn ffffffff825a9440 name cfg80211_netdev_notifier_call
a.out 6898 [003] 54.809569: notifier:notifier_call_chain: val 17 fcn ffffffff8168a060 name sel_netif_netdev_notifier_handler
a.out 6898 [003] 54.809573: notifier:notifier_call_chain: val 17 fcn ffffffff81baec70 name bond_netdev_event
a.out 6898 [003] 54.809576: notifier:notifier_call_chain: val 17 fcn ffffffff81bbb230 name ipvlan_device_event
a.out 6898 [003] 54.809579: notifier:notifier_call_chain: val 17 fcn ffffffff81bbc890 name ipvtap_device_event
a.out 6898 [003] 54.809581: notifier:notifier_call_chain: val 17 fcn ffffffff81bbe550 name macsec_notify
a.out 6898 [003] 54.809583: notifier:notifier_call_chain: val 17 fcn ffffffff81bc4240 name macvlan_device_event
a.out 6898 [003] 54.809585: notifier:notifier_call_chain: val 17 fcn ffffffff81bc56a0 name macvtap_device_event
a.out 6898 [003] 54.809589: notifier:notifier_call_chain: val 17 fcn ffffffff81bd3370 name team_device_event
a.out 6898 [003] 54.809591: notifier:notifier_call_chain: val 17 fcn ffffffff81bdb890 name tun_device_event
a.out 6898 [003] 54.809593: notifier:notifier_call_chain: val 17 fcn ffffffff81bf5bf0 name vrf_device_event
a.out 6898 [003] 54.809596: notifier:notifier_call_chain: val 17 fcn ffffffff81c8cb80 name bpq_device_event
a.out 6898 [003] 54.809599: notifier:notifier_call_chain: val 17 fcn ffffffff81c96e40 name pppoe_device_event
a.out 6898 [003] 54.809601: notifier:notifier_call_chain: val 17 fcn ffffffff81c9b380 name hdlc_device_event
a.out 6898 [003] 54.809603: notifier:notifier_call_chain: val 17 fcn ffffffff81ca06a0 name dlci_dev_event
a.out 6898 [003] 54.809605: notifier:notifier_call_chain: val 17 fcn ffffffff81ca12c0 name lapbeth_device_event
a.out 6898 [003] 54.809608: notifier:notifier_call_chain: val 17 fcn ffffffff81fa90f0 name cma_netdev_callback
a.out 6898 [003] 54.809611: notifier:notifier_call_chain: val 17 fcn ffffffff81fee6a0 name ipoib_netdev_event
a.out 6898 [003] 54.809613: notifier:notifier_call_chain: val 17 fcn ffffffff821c0e20 name dropmon_net_event
a.out 6898 [003] 54.809615: notifier:notifier_call_chain: val 17 fcn ffffffff821c5d80 name failover_event
a.out 6898 [003] 54.809618: notifier:notifier_call_chain: val 17 fcn ffffffff821e3a60 name mirred_device_event
a.out 6898 [003] 54.809620: notifier:notifier_call_chain: val 17 fcn ffffffff8223a930 name nfqnl_rcv_dev_event
a.out 6898 [003] 54.809623: notifier:notifier_call_chain: val 17 fcn ffffffff82273bd0 name nf_tables_netdev_event
a.out 6898 [003] 54.809625: notifier:notifier_call_chain: val 17 fcn ffffffff822667c0 name nf_tables_flowtable_event
a.out 6898 [003] 54.809628: notifier:notifier_call_chain: val 17 fcn ffffffff8227ef40 name flow_offload_netdev_event
a.out 6898 [003] 54.809630: notifier:notifier_call_chain: val 17 fcn ffffffff82260e40 name masq_device_event
a.out 6898 [003] 54.809632: notifier:notifier_call_chain: val 17 fcn ffffffff82290560 name tee_netdev_event
a.out 6898 [003] 54.809635: notifier:notifier_call_chain: val 17 fcn ffffffff8236dcd0 name clusterip_netdev_event
a.out 6898 [003] 54.809637: notifier:notifier_call_chain: val 17 fcn ffffffff82385210 name tls_dev_event
a.out 6898 [003] 54.809641: notifier:notifier_call_chain: val 17 fcn ffffffff823e99b0 name ip6mr_device_event
a.out 6898 [003] 54.809644: notifier:notifier_call_chain: val 17 fcn ffffffff823b9300 name addrconf_notify
a.out 6898 [003] 54.809647: notifier:notifier_call_chain: val 17 fcn ffffffff823d9200 name ipv6_mc_netdev_event
a.out 6898 [003] 54.809650: notifier:notifier_call_chain: val 17 fcn ffffffff82419160 name packet_notifier
a.out 6898 [003] 54.809653: notifier:notifier_call_chain: val 17 fcn ffffffff824255f0 name br_device_event
a.out 6898 [003] 54.809656: notifier:notifier_call_chain: val 17 fcn ffffffff8243e6e0 name brnf_device_event
a.out 6898 [003] 54.809658: notifier:notifier_call_chain: val 17 fcn ffffffff8244c220 name dsa_slave_netdevice_event
a.out 6898 [003] 54.809660: notifier:notifier_call_chain: val 17 fcn ffffffff8244df10 name x25_device_event
a.out 6898 [003] 54.809662: notifier:notifier_call_chain: val 17 fcn ffffffff824562a0 name nr_device_event
a.out 6898 [003] 54.809665: notifier:notifier_call_chain: val 17 fcn ffffffff8245bd20 name rose_device_event
a.out 6898 [003] 54.809667: notifier:notifier_call_chain: val 17 fcn ffffffff82466e40 name ax25_device_event
a.out 6898 [003] 54.809669: notifier:notifier_call_chain: val 17 fcn ffffffff8246b8a0 name can_notifier
a.out 6898 [003] 54.809672: notifier:notifier_call_chain: val 17 fcn ffffffff82470860 name cgw_notifier
a.out 6898 [003] 54.809674: notifier:notifier_call_chain: val 17 fcn ffffffff824c08b0 name device_event
a.out 6898 [003] 54.809677: notifier:notifier_call_chain: val 17 fcn ffffffff825355a0 name clip_device_event
a.out 6898 [003] 54.809680: notifier:notifier_call_chain: val 17 fcn ffffffff825481c0 name vlan_device_event
a.out 6898 [003] 54.809682: notifier:notifier_call_chain: val 17 fcn ffffffff8267bde0 name tipc_l2_device_event
a.out 6898 [003] 54.809684: notifier:notifier_call_chain: val 17 fcn ffffffff826ab0e0 name smc_pnet_netdev_event
a.out 6898 [003] 54.809687: notifier:notifier_call_chain: val 17 fcn ffffffff826c3220 name caif_device_notify
a.out 6898 [003] 54.809689: notifier:notifier_call_chain: val 17 fcn ffffffff826cb4b0 name cfusbl_device_notify
a.out 6898 [003] 54.809691: notifier:notifier_call_chain: val 17 fcn ffffffff826cf280 name lowpan_event
a.out 6898 [003] 54.809694: notifier:notifier_call_chain: val 17 fcn ffffffff826def50 name lowpan_device_event
a.out 6898 [003] 54.809696: notifier:notifier_call_chain: val 17 fcn ffffffff827178d0 name batadv_hard_if_event
a.out 6898 [003] 54.809700: notifier:notifier_call_chain: val 17 fcn ffffffff8274a0c0 name dp_device_event
a.out 6898 [003] 54.809704: notifier:notifier_call_chain: val 17 fcn ffffffff827631d0 name mpls_dev_notify
a.out 6898 [003] 54.809706: notifier:notifier_call_chain: val 17 fcn ffffffff82765180 name hsr_netdev_notify
a.out 6898 [003] 54.809708: notifier:notifier_call_chain: val 17 fcn ffffffff81bc6cc0 name netconsole_netdev_event
a.out 6898 [003] 54.809711: notifier:notifier_call_chain: val 17 fcn ffffffff81bec840 name vxlan_netdevice_event
a.out 6898 [003] 54.809713: notifier:notifier_call_chain: val 17 fcn ffffffff81bf0c60 name geneve_netdevice_event
a.out 6898 [003] 54.809717: notifier:notifier_call_chain: val 17 fcn ffffffff82025700 name rxe_notify
a.out 6898 [003] 54.809719: notifier:notifier_call_chain: val 17 fcn ffffffff823cc0d0 name ndisc_netdev_event
a.out 6898 [003] 54.809722: notifier:notifier_call_chain: val 17 fcn ffffffff823bde70 name ip6_route_dev_notify
a.out 6898 [003] 54.809803: notifier:notifier_call_chain: val 5 fcn ffffffff822de9b0 name ip_vs_dst_event
a.out 6898 [003] 54.809805: notifier:notifier_call_chain: val 5 fcn ffffffff821a1060 name rtnetlink_event
a.out 6898 [003] 54.809807: notifier:notifier_call_chain: val 5 fcn ffffffff812c8830 name dev_map_notification
a.out 6898 [003] 54.809810: notifier:notifier_call_chain: val 5 fcn ffffffff81f89a00 name netdevice_event
a.out 6898 [003] 54.809812: notifier:notifier_call_chain: val 5 fcn ffffffff821b8c70 name fib_rules_event
a.out 6898 [003] 54.809815: notifier:notifier_call_chain: val 5 fcn ffffffff821c11e0 name netprio_device_event
a.out 6898 [003] 54.809817: notifier:notifier_call_chain: val 5 fcn ffffffff826012b0 name wext_netdev_notifier_call
a.out 6898 [003] 54.809819: notifier:notifier_call_chain: val 5 fcn ffffffff8261b980 name netdev_notify
a.out 6898 [003] 54.809821: notifier:notifier_call_chain: val 5 fcn ffffffff826a4e10 name netlbl_unlhsh_netdev_handler
a.out 6898 [003] 54.809823: notifier:notifier_call_chain: val 5 fcn ffffffff826d5900 name cfg802154_netdev_notifier_call
a.out 6898 [003] 54.809825: notifier:notifier_call_chain: val 5 fcn ffffffff826e1fa0 name netdev_notify
a.out 6898 [003] 54.809828: notifier:notifier_call_chain: val 5 fcn ffffffff82331910 name arp_netdev_event
a.out 6898 [003] 54.809829: notifier:notifier_call_chain: val 5 fcn ffffffff823389c0 name inetdev_event
a.out 6898 [003] 54.809851: notifier:notifier_call_chain: val 5 fcn ffffffff82343370 name fib_netdev_event
a.out 6898 [003] 54.809854: notifier:notifier_call_chain: val 5 fcn ffffffff82399140 name xfrm_dev_event
a.out 6898 [003] 54.809856: notifier:notifier_call_chain: val 5 fcn ffffffff8233dec0 name igmp_netdev_event
a.out 6898 [003] 54.809858: notifier:notifier_call_chain: val 5 fcn ffffffff82353c00 name ipmr_device_event
a.out 6898 [003] 54.809860: notifier:notifier_call_chain: val 5 fcn ffffffff825a9440 name cfg80211_netdev_notifier_call
a.out 6898 [003] 54.809863: notifier:notifier_call_chain: val 5 fcn ffffffff8168a060 name sel_netif_netdev_notifier_handler
a.out 6898 [003] 54.809866: notifier:notifier_call_chain: val 5 fcn ffffffff81baec70 name bond_netdev_event
a.out 6898 [003] 54.809868: notifier:notifier_call_chain: val 5 fcn ffffffff81bbb230 name ipvlan_device_event
a.out 6898 [003] 54.809870: notifier:notifier_call_chain: val 5 fcn ffffffff81bbc890 name ipvtap_device_event
a.out 6898 [003] 54.809872: notifier:notifier_call_chain: val 5 fcn ffffffff81bbe550 name macsec_notify
a.out 6898 [003] 54.809874: notifier:notifier_call_chain: val 5 fcn ffffffff81bc4240 name macvlan_device_event
a.out 6898 [003] 54.809875: notifier:notifier_call_chain: val 5 fcn ffffffff81bc56a0 name macvtap_device_event
a.out 6898 [003] 54.809878: notifier:notifier_call_chain: val 5 fcn ffffffff81bd3370 name team_device_event
a.out 6898 [003] 54.809880: notifier:notifier_call_chain: val 5 fcn ffffffff81bdb890 name tun_device_event
a.out 6898 [003] 54.809882: notifier:notifier_call_chain: val 5 fcn ffffffff81bf5bf0 name vrf_device_event
a.out 6898 [003] 54.809884: notifier:notifier_call_chain: val 5 fcn ffffffff81c8cb80 name bpq_device_event
a.out 6898 [003] 54.809886: notifier:notifier_call_chain: val 5 fcn ffffffff81c96e40 name pppoe_device_event
a.out 6898 [003] 54.809888: notifier:notifier_call_chain: val 5 fcn ffffffff81c9b380 name hdlc_device_event
a.out 6898 [003] 54.809890: notifier:notifier_call_chain: val 5 fcn ffffffff81ca06a0 name dlci_dev_event
a.out 6898 [003] 54.809892: notifier:notifier_call_chain: val 5 fcn ffffffff81ca12c0 name lapbeth_device_event
a.out 6898 [003] 54.809894: notifier:notifier_call_chain: val 5 fcn ffffffff81fa90f0 name cma_netdev_callback
a.out 6898 [003] 54.809896: notifier:notifier_call_chain: val 5 fcn ffffffff81fee6a0 name ipoib_netdev_event
a.out 6898 [003] 54.809898: notifier:notifier_call_chain: val 5 fcn ffffffff821c0e20 name dropmon_net_event
a.out 6898 [003] 54.809900: notifier:notifier_call_chain: val 5 fcn ffffffff821c5d80 name failover_event
a.out 6898 [003] 54.809902: notifier:notifier_call_chain: val 5 fcn ffffffff821e3a60 name mirred_device_event
a.out 6898 [003] 54.809904: notifier:notifier_call_chain: val 5 fcn ffffffff8223a930 name nfqnl_rcv_dev_event
a.out 6898 [003] 54.809906: notifier:notifier_call_chain: val 5 fcn ffffffff82273bd0 name nf_tables_netdev_event
a.out 6898 [003] 54.809908: notifier:notifier_call_chain: val 5 fcn ffffffff822667c0 name nf_tables_flowtable_event
a.out 6898 [003] 54.809911: notifier:notifier_call_chain: val 5 fcn ffffffff8227ef40 name flow_offload_netdev_event
a.out 6898 [003] 54.809913: notifier:notifier_call_chain: val 5 fcn ffffffff82260e40 name masq_device_event
a.out 6898 [003] 54.809914: notifier:notifier_call_chain: val 5 fcn ffffffff82290560 name tee_netdev_event
a.out 6898 [003] 54.809917: notifier:notifier_call_chain: val 5 fcn ffffffff8236dcd0 name clusterip_netdev_event
a.out 6898 [003] 54.809919: notifier:notifier_call_chain: val 5 fcn ffffffff82385210 name tls_dev_event
a.out 6898 [003] 54.809921: notifier:notifier_call_chain: val 5 fcn ffffffff823e99b0 name ip6mr_device_event
a.out 6898 [003] 54.809923: notifier:notifier_call_chain: val 5 fcn ffffffff823b9300 name addrconf_notify
a.out 6898 [003] 54.809956: notifier:notifier_call_chain: val 5 fcn ffffffff823d9200 name ipv6_mc_netdev_event
a.out 6898 [003] 54.809959: notifier:notifier_call_chain: val 5 fcn ffffffff82419160 name packet_notifier
a.out 6898 [003] 54.809961: notifier:notifier_call_chain: val 5 fcn ffffffff824255f0 name br_device_event
a.out 6898 [003] 54.809964: notifier:notifier_call_chain: val 5 fcn ffffffff8243e6e0 name brnf_device_event
a.out 6898 [003] 54.809966: notifier:notifier_call_chain: val 5 fcn ffffffff8244c220 name dsa_slave_netdevice_event
a.out 6898 [003] 54.809968: notifier:notifier_call_chain: val 5 fcn ffffffff8244df10 name x25_device_event
a.out 6898 [003] 54.809970: notifier:notifier_call_chain: val 5 fcn ffffffff824562a0 name nr_device_event
a.out 6898 [003] 54.809972: notifier:notifier_call_chain: val 5 fcn ffffffff8245bd20 name rose_device_event
a.out 6898 [003] 54.809974: notifier:notifier_call_chain: val 5 fcn ffffffff82466e40 name ax25_device_event
a.out 6898 [003] 54.809976: notifier:notifier_call_chain: val 5 fcn ffffffff8246b8a0 name can_notifier
a.out 6898 [003] 54.809978: notifier:notifier_call_chain: val 5 fcn ffffffff82470860 name cgw_notifier
a.out 6898 [003] 54.809980: notifier:notifier_call_chain: val 5 fcn ffffffff824c08b0 name device_event
a.out 6898 [003] 54.809982: notifier:notifier_call_chain: val 5 fcn ffffffff825355a0 name clip_device_event
a.out 6898 [003] 54.809984: notifier:notifier_call_chain: val 5 fcn ffffffff825481c0 name vlan_device_event
a.out 6898 [003] 54.809986: notifier:notifier_call_chain: val 5 fcn ffffffff8267bde0 name tipc_l2_device_event
a.out 6898 [003] 54.809988: notifier:notifier_call_chain: val 5 fcn ffffffff826ab0e0 name smc_pnet_netdev_event
a.out 6898 [003] 54.809991: notifier:notifier_call_chain: val 5 fcn ffffffff826c3220 name caif_device_notify
a.out 6898 [003] 54.809993: notifier:notifier_call_chain: val 5 fcn ffffffff826cb4b0 name cfusbl_device_notify
a.out 6898 [003] 54.809995: notifier:notifier_call_chain: val 5 fcn ffffffff826cf280 name lowpan_event
a.out 6898 [003] 54.809997: notifier:notifier_call_chain: val 5 fcn ffffffff826def50 name lowpan_device_event
a.out 6898 [003] 54.809999: notifier:notifier_call_chain: val 5 fcn ffffffff827178d0 name batadv_hard_if_event
a.out 6898 [003] 54.810001: notifier:notifier_call_chain: val 5 fcn ffffffff8274a0c0 name dp_device_event
a.out 6898 [003] 54.810003: notifier:notifier_call_chain: val 5 fcn ffffffff827631d0 name mpls_dev_notify
a.out 6898 [003] 54.810012: notifier:notifier_call_chain: val 5 fcn ffffffff82765180 name hsr_netdev_notify
a.out 6898 [003] 54.810014: notifier:notifier_call_chain: val 5 fcn ffffffff81bc6cc0 name netconsole_netdev_event
a.out 6898 [003] 54.810017: notifier:notifier_call_chain: val 5 fcn ffffffff81bec840 name vxlan_netdevice_event
a.out 6898 [003] 54.810020: notifier:notifier_call_chain: val 5 fcn ffffffff81bf0c60 name geneve_netdevice_event
a.out 6898 [003] 54.810022: notifier:notifier_call_chain: val 5 fcn ffffffff82025700 name rxe_notify
a.out 6898 [003] 54.810024: notifier:notifier_call_chain: val 5 fcn ffffffff823cc0d0 name ndisc_netdev_event
a.out 6898 [003] 54.810027: notifier:notifier_call_chain: val 5 fcn ffffffff823bde70 name ip6_route_dev_notify
(...snipped...)
a.out 17698 [002] 76.174214: notifier:notifier_call_chain: val 6 fcn ffffffff824c08b0 name device_event
a.out 17698 [002] 76.174215: notifier:notifier_call_chain: val 6 fcn ffffffff825355a0 name clip_device_event
a.out 17698 [002] 76.174217: notifier:notifier_call_chain: val 6 fcn ffffffff825481c0 name vlan_device_event
a.out 17698 [002] 76.174219: notifier:notifier_call_chain: val 6 fcn ffffffff8267bde0 name tipc_l2_device_event
a.out 17698 [002] 76.174221: notifier:notifier_call_chain: val 6 fcn ffffffff826ab0e0 name smc_pnet_netdev_event
a.out 17698 [002] 76.174224: notifier:notifier_call_chain: val 6 fcn ffffffff826c3220 name caif_device_notify
a.out 17698 [002] 76.174226: notifier:notifier_call_chain: val 6 fcn ffffffff826cb4b0 name cfusbl_device_notify
a.out 17698 [002] 76.174227: notifier:notifier_call_chain: val 6 fcn ffffffff826cf280 name lowpan_event
a.out 17698 [002] 76.174229: notifier:notifier_call_chain: val 6 fcn ffffffff826def50 name lowpan_device_event
a.out 17698 [002] 76.174231: notifier:notifier_call_chain: val 6 fcn ffffffff827178d0 name batadv_hard_if_event
a.out 17698 [002] 76.174257: notifier:notifier_call_chain: val 6 fcn ffffffff8274a0c0 name dp_device_event
a.out 17698 [002] 76.174259: notifier:notifier_call_chain: val 6 fcn ffffffff827631d0 name mpls_dev_notify
a.out 17698 [002] 76.174264: notifier:notifier_call_chain: val 6 fcn ffffffff82765180 name hsr_netdev_notify
a.out 17698 [002] 76.174266: notifier:notifier_call_chain: val 6 fcn ffffffff81bc6cc0 name netconsole_netdev_event
a.out 17698 [002] 76.174268: notifier:notifier_call_chain: val 6 fcn ffffffff81bec840 name vxlan_netdevice_event
a.out 17698 [002] 76.174270: notifier:notifier_call_chain: val 6 fcn ffffffff81bf0c60 name geneve_netdevice_event
a.out 17698 [002] 76.174273: notifier:notifier_call_chain: val 6 fcn ffffffff82025700 name rxe_notify
a.out 17698 [002] 76.174274: notifier:notifier_call_chain: val 6 fcn ffffffff823cc0d0 name ndisc_netdev_event
a.out 17698 [002] 76.174276: notifier:notifier_call_chain: val 6 fcn ffffffff823bde70 name ip6_route_dev_notify
kworker/u128:32 4366 [001] 76.297089: notifier:notifier_call_chain: val 6 fcn ffffffff822de9b0 name ip_vs_dst_event
kworker/u128:32 4366 [001] 76.297095: notifier:notifier_call_chain: val 6 fcn ffffffff821a1060 name rtnetlink_event
kworker/u128:32 4366 [001] 76.297098: notifier:notifier_call_chain: val 6 fcn ffffffff812c8830 name dev_map_notification
kworker/u128:32 4366 [001] 76.297101: notifier:notifier_call_chain: val 6 fcn ffffffff81f89a00 name netdevice_event
kworker/u128:32 4366 [001] 76.297104: notifier:notifier_call_chain: val 6 fcn ffffffff821b8c70 name fib_rules_event
kworker/u128:32 4366 [001] 76.297107: notifier:notifier_call_chain: val 6 fcn ffffffff821c11e0 name netprio_device_event
kworker/u128:32 4366 [001] 76.297111: notifier:notifier_call_chain: val 6 fcn ffffffff826012b0 name wext_netdev_notifier_call
kworker/u128:32 4366 [001] 76.297115: notifier:notifier_call_chain: val 6 fcn ffffffff8261b980 name netdev_notify
kworker/u128:32 4366 [001] 76.297117: notifier:notifier_call_chain: val 6 fcn ffffffff826a4e10 name netlbl_unlhsh_netdev_handler
kworker/u128:32 4366 [001] 76.297119: notifier:notifier_call_chain: val 6 fcn ffffffff826d5900 name cfg802154_netdev_notifier_call
kworker/u128:32 4366 [001] 76.297123: notifier:notifier_call_chain: val 6 fcn ffffffff826e1fa0 name netdev_notify
kworker/u128:32 4366 [001] 76.297126: notifier:notifier_call_chain: val 6 fcn ffffffff82331910 name arp_netdev_event
kworker/u128:32 4366 [001] 76.297128: notifier:notifier_call_chain: val 6 fcn ffffffff823389c0 name inetdev_event
kworker/u128:32 4366 [001] 76.297146: notifier:notifier_call_chain: val 6 fcn ffffffff82343370 name fib_netdev_event
kworker/u128:32 4366 [001] 76.297221: notifier:notifier_call_chain: val 6 fcn ffffffff82399140 name xfrm_dev_event
(...snipped...)
kworker/u128:32 4366 [002] 86.837244: notifier:notifier_call_chain: val 6 fcn ffffffff822de9b0 name ip_vs_dst_event
kworker/u128:32 4366 [002] 86.837260: notifier:notifier_call_chain: val 6 fcn ffffffff821a1060 name rtnetlink_event
kworker/u128:32 4366 [002] 86.837269: notifier:notifier_call_chain: val 6 fcn ffffffff812c8830 name dev_map_notification
kworker/u128:32 4366 [002] 86.837278: notifier:notifier_call_chain: val 6 fcn ffffffff81f89a00 name netdevice_event
kworker/u128:32 4366 [002] 86.837284: notifier:notifier_call_chain: val 6 fcn ffffffff821b8c70 name fib_rules_event
kworker/u128:32 4366 [002] 86.837293: notifier:notifier_call_chain: val 6 fcn ffffffff821c11e0 name netprio_device_event
kworker/u128:32 4366 [002] 86.837305: notifier:notifier_call_chain: val 6 fcn ffffffff826012b0 name wext_netdev_notifier_call
kworker/u128:32 4366 [002] 86.837315: notifier:notifier_call_chain: val 6 fcn ffffffff8261b980 name netdev_notify
kworker/u128:32 4366 [002] 86.837321: notifier:notifier_call_chain: val 6 fcn ffffffff826a4e10 name netlbl_unlhsh_netdev_handler
kworker/u128:32 4366 [002] 86.837327: notifier:notifier_call_chain: val 6 fcn ffffffff826d5900 name cfg802154_netdev_notifier_call
kworker/u128:32 4366 [002] 86.837335: notifier:notifier_call_chain: val 6 fcn ffffffff826e1fa0 name netdev_notify
kworker/u128:32 4366 [002] 86.837344: notifier:notifier_call_chain: val 6 fcn ffffffff82331910 name arp_netdev_event
kworker/u128:32 4366 [002] 86.837348: notifier:notifier_call_chain: val 6 fcn ffffffff823389c0 name inetdev_event
kworker/u128:32 4366 [002] 86.837355: notifier:notifier_call_chain: val 6 fcn ffffffff82343370 name fib_netdev_event
kworker/u128:32 4366 [002] 86.837736: notifier:notifier_call_chain: val 6 fcn ffffffff82399140 name xfrm_dev_event
kworker/u128:32 4366 [002] 86.837743: notifier:notifier_call_chain: val 6 fcn ffffffff8233dec0 name igmp_netdev_event
kworker/u128:32 4366 [002] 86.837750: notifier:notifier_call_chain: val 6 fcn ffffffff82353c00 name ipmr_device_event
kworker/u128:32 4366 [002] 86.837761: notifier:notifier_call_chain: val 6 fcn ffffffff825a9440 name cfg80211_netdev_notifier_call
kworker/u128:32 4366 [002] 86.837772: notifier:notifier_call_chain: val 6 fcn ffffffff8168a060 name sel_netif_netdev_notifier_handler
kworker/u128:32 4366 [002] 86.837782: notifier:notifier_call_chain: val 6 fcn ffffffff81baec70 name bond_netdev_event
kworker/u128:32 4366 [002] 86.837789: notifier:notifier_call_chain: val 6 fcn ffffffff81bbb230 name ipvlan_device_event
kworker/u128:32 4366 [002] 86.837796: notifier:notifier_call_chain: val 6 fcn ffffffff81bbc890 name ipvtap_device_event
kworker/u128:32 4366 [002] 86.837799: notifier:notifier_call_chain: val 6 fcn ffffffff81bbe550 name macsec_notify
kworker/u128:32 4366 [002] 86.837804: notifier:notifier_call_chain: val 6 fcn ffffffff81bc4240 name macvlan_device_event
kworker/u128:32 4366 [002] 86.837808: notifier:notifier_call_chain: val 6 fcn ffffffff81bc56a0 name macvtap_device_event
kworker/u128:32 4366 [002] 86.837816: notifier:notifier_call_chain: val 6 fcn ffffffff81bd3370 name team_device_event
kworker/u128:32 4366 [002] 86.837821: notifier:notifier_call_chain: val 6 fcn ffffffff81bdb890 name tun_device_event
kworker/u128:32 4366 [002] 86.837826: notifier:notifier_call_chain: val 6 fcn ffffffff81bf5bf0 name vrf_device_event
kworker/u128:32 4366 [002] 86.837834: notifier:notifier_call_chain: val 6 fcn ffffffff81c8cb80 name bpq_device_event
kworker/u128:32 4366 [002] 86.837841: notifier:notifier_call_chain: val 6 fcn ffffffff81c96e40 name pppoe_device_event
kworker/u128:32 4366 [002] 86.837845: notifier:notifier_call_chain: val 6 fcn ffffffff81c9b380 name hdlc_device_event
kworker/u128:32 4366 [002] 86.837850: notifier:notifier_call_chain: val 6 fcn ffffffff81ca06a0 name dlci_dev_event
kworker/u128:32 4366 [002] 86.837855: notifier:notifier_call_chain: val 6 fcn ffffffff81ca12c0 name lapbeth_device_event
kworker/u128:32 4366 [002] 86.837862: notifier:notifier_call_chain: val 6 fcn ffffffff81fa90f0 name cma_netdev_callback
kworker/u128:32 4366 [002] 86.837868: notifier:notifier_call_chain: val 6 fcn ffffffff81fee6a0 name ipoib_netdev_event
kworker/u128:32 4366 [002] 86.837875: notifier:notifier_call_chain: val 6 fcn ffffffff821c0e20 name dropmon_net_event
kworker/u128:32 4366 [002] 86.837896: notifier:notifier_call_chain: val 6 fcn ffffffff821c5d80 name failover_event
kworker/u128:32 4366 [002] 86.837903: notifier:notifier_call_chain: val 6 fcn ffffffff821e3a60 name mirred_device_event
kworker/u128:32 4366 [002] 86.837909: notifier:notifier_call_chain: val 6 fcn ffffffff8223a930 name nfqnl_rcv_dev_event
kworker/u128:32 4366 [002] 86.837918: notifier:notifier_call_chain: val 6 fcn ffffffff82273bd0 name nf_tables_netdev_event
kworker/u128:32 4366 [002] 86.837924: notifier:notifier_call_chain: val 6 fcn ffffffff822667c0 name nf_tables_flowtable_event
kworker/u128:32 4366 [002] 86.837932: notifier:notifier_call_chain: val 6 fcn ffffffff8227ef40 name flow_offload_netdev_event
kworker/u128:32 4366 [002] 86.837936: notifier:notifier_call_chain: val 6 fcn ffffffff82260e40 name masq_device_event
kworker/u128:32 4366 [002] 86.837941: notifier:notifier_call_chain: val 6 fcn ffffffff82290560 name tee_netdev_event
kworker/u128:32 4366 [002] 86.837950: notifier:notifier_call_chain: val 6 fcn ffffffff8236dcd0 name clusterip_netdev_event
kworker/u128:32 4366 [002] 86.837955: notifier:notifier_call_chain: val 6 fcn ffffffff82385210 name tls_dev_event
kworker/u128:32 4366 [002] 86.837964: notifier:notifier_call_chain: val 6 fcn ffffffff823e99b0 name ip6mr_device_event
kworker/u128:32 4366 [002] 86.837972: notifier:notifier_call_chain: val 6 fcn ffffffff823b9300 name addrconf_notify
kworker/u128:32 4366 [002] 86.837989: notifier:notifier_call_chain: val 6 fcn ffffffff823d9200 name ipv6_mc_netdev_event
kworker/u128:32 4366 [002] 86.837997: notifier:notifier_call_chain: val 6 fcn ffffffff82419160 name packet_notifier
kworker/u128:32 4366 [002] 86.838003: notifier:notifier_call_chain: val 6 fcn ffffffff824255f0 name br_device_event
kworker/u128:32 4366 [002] 86.838012: notifier:notifier_call_chain: val 6 fcn ffffffff8243e6e0 name brnf_device_event
kworker/u128:32 4366 [002] 86.838017: notifier:notifier_call_chain: val 6 fcn ffffffff8244c220 name dsa_slave_netdevice_event
kworker/u128:32 4366 [002] 86.838023: notifier:notifier_call_chain: val 6 fcn ffffffff8244df10 name x25_device_event
kworker/u128:32 4366 [002] 86.838028: notifier:notifier_call_chain: val 6 fcn ffffffff824562a0 name nr_device_event
kworker/u128:32 4366 [002] 86.838033: notifier:notifier_call_chain: val 6 fcn ffffffff8245bd20 name rose_device_event
kworker/u128:32 4366 [002] 86.838039: notifier:notifier_call_chain: val 6 fcn ffffffff82466e40 name ax25_device_event
kworker/u128:32 4366 [002] 86.838044: notifier:notifier_call_chain: val 6 fcn ffffffff8246b8a0 name can_notifier
kworker/u128:32 4366 [002] 86.838050: notifier:notifier_call_chain: val 6 fcn ffffffff82470860 name cgw_notifier
kworker/u128:32 4366 [002] 86.838056: notifier:notifier_call_chain: val 6 fcn ffffffff824c08b0 name device_event
kworker/u128:32 4366 [002] 86.838062: notifier:notifier_call_chain: val 6 fcn ffffffff825355a0 name clip_device_event
kworker/u128:32 4366 [002] 86.838067: notifier:notifier_call_chain: val 6 fcn ffffffff825481c0 name vlan_device_event
kworker/u128:32 4366 [002] 86.838073: notifier:notifier_call_chain: val 6 fcn ffffffff8267bde0 name tipc_l2_device_event
kworker/u128:32 4366 [002] 86.838079: notifier:notifier_call_chain: val 6 fcn ffffffff826ab0e0 name smc_pnet_netdev_event
kworker/u128:32 4366 [002] 86.838158: notifier:notifier_call_chain: val 6 fcn ffffffff826c3220 name caif_device_notify
kworker/u128:32 4366 [002] 86.838169: notifier:notifier_call_chain: val 6 fcn ffffffff826cb4b0 name cfusbl_device_notify
kworker/u128:32 4366 [002] 86.838174: notifier:notifier_call_chain: val 6 fcn ffffffff826cf280 name lowpan_event
kworker/u128:32 4366 [002] 86.838180: notifier:notifier_call_chain: val 6 fcn ffffffff826def50 name lowpan_device_event
kworker/u128:32 4366 [002] 86.838186: notifier:notifier_call_chain: val 6 fcn ffffffff827178d0 name batadv_hard_if_event
kworker/u128:32 4366 [002] 86.838199: notifier:notifier_call_chain: val 6 fcn ffffffff8274a0c0 name dp_device_event
kworker/u128:32 4366 [002] 86.838207: notifier:notifier_call_chain: val 6 fcn ffffffff827631d0 name mpls_dev_notify
kworker/u128:32 4366 [002] 86.838212: notifier:notifier_call_chain: val 6 fcn ffffffff82765180 name hsr_netdev_notify
kworker/u128:32 4366 [002] 86.838219: notifier:notifier_call_chain: val 6 fcn ffffffff81bc6cc0 name netconsole_netdev_event
kworker/u128:32 4366 [002] 86.838226: notifier:notifier_call_chain: val 6 fcn ffffffff81bec840 name vxlan_netdevice_event
kworker/u128:32 4366 [002] 86.838235: notifier:notifier_call_chain: val 6 fcn ffffffff81bf0c60 name geneve_netdevice_event
kworker/u128:32 4366 [002] 86.838244: notifier:notifier_call_chain: val 6 fcn ffffffff82025700 name rxe_notify
kworker/u128:32 4366 [002] 86.838250: notifier:notifier_call_chain: val 6 fcn ffffffff823cc0d0 name ndisc_netdev_event
kworker/u128:32 4366 [002] 86.838256: notifier:notifier_call_chain: val 6 fcn ffffffff823bde70 name ip6_route_dev_notify
kworker/u128:32 4366 [002] 87.985339: notifier:notifier_call_chain: val 6 fcn ffffffff822de9b0 name ip_vs_dst_event
kworker/u128:32 4366 [002] 87.985354: notifier:notifier_call_chain: val 6 fcn ffffffff821a1060 name rtnetlink_event
kworker/u128:32 4366 [002] 87.985362: notifier:notifier_call_chain: val 6 fcn ffffffff812c8830 name dev_map_notification
kworker/u128:32 4366 [002] 87.985370: notifier:notifier_call_chain: val 6 fcn ffffffff81f89a00 name netdevice_event
kworker/u128:32 4366 [002] 87.985375: notifier:notifier_call_chain: val 6 fcn ffffffff821b8c70 name fib_rules_event
kworker/u128:32 4366 [002] 87.985385: notifier:notifier_call_chain: val 6 fcn ffffffff821c11e0 name netprio_device_event
kworker/u128:32 4366 [002] 87.985396: notifier:notifier_call_chain: val 6 fcn ffffffff826012b0 name wext_netdev_notifier_call
kworker/u128:32 4366 [002] 87.985405: notifier:notifier_call_chain: val 6 fcn ffffffff8261b980 name netdev_notify
kworker/u128:32 4366 [002] 87.985411: notifier:notifier_call_chain: val 6 fcn ffffffff826a4e10 name netlbl_unlhsh_netdev_handler
kworker/u128:32 4366 [002] 87.985416: notifier:notifier_call_chain: val 6 fcn ffffffff826d5900 name cfg802154_netdev_notifier_call
kworker/u128:32 4366 [002] 87.985424: notifier:notifier_call_chain: val 6 fcn ffffffff826e1fa0 name netdev_notify
kworker/u128:32 4366 [002] 87.985433: notifier:notifier_call_chain: val 6 fcn ffffffff82331910 name arp_netdev_event
kworker/u128:32 4366 [002] 87.985437: notifier:notifier_call_chain: val 6 fcn ffffffff823389c0 name inetdev_event
kworker/u128:32 4366 [002] 87.985443: notifier:notifier_call_chain: val 6 fcn ffffffff82343370 name fib_netdev_event
kworker/u128:32 4366 [002] 87.985727: notifier:notifier_call_chain: val 6 fcn ffffffff82399140 name xfrm_dev_event
kworker/u128:32 4366 [002] 87.985733: notifier:notifier_call_chain: val 6 fcn ffffffff8233dec0 name igmp_netdev_event
kworker/u128:32 4366 [002] 87.985740: notifier:notifier_call_chain: val 6 fcn ffffffff82353c00 name ipmr_device_event
kworker/u128:32 4366 [002] 87.985751: notifier:notifier_call_chain: val 6 fcn ffffffff825a9440 name cfg80211_netdev_notifier_call
kworker/u128:32 4366 [002] 87.985762: notifier:notifier_call_chain: val 6 fcn ffffffff8168a060 name sel_netif_netdev_notifier_handler
kworker/u128:32 4366 [002] 87.985772: notifier:notifier_call_chain: val 6 fcn ffffffff81baec70 name bond_netdev_event
kworker/u128:32 4366 [002] 87.985780: notifier:notifier_call_chain: val 6 fcn ffffffff81bbb230 name ipvlan_device_event
kworker/u128:32 4366 [002] 87.985786: notifier:notifier_call_chain: val 6 fcn ffffffff81bbc890 name ipvtap_device_event
kworker/u128:32 4366 [002] 87.985789: notifier:notifier_call_chain: val 6 fcn ffffffff81bbe550 name macsec_notify
kworker/u128:32 4366 [002] 87.985794: notifier:notifier_call_chain: val 6 fcn ffffffff81bc4240 name macvlan_device_event
kworker/u128:32 4366 [002] 87.985799: notifier:notifier_call_chain: val 6 fcn ffffffff81bc56a0 name macvtap_device_event
kworker/u128:32 4366 [002] 87.985807: notifier:notifier_call_chain: val 6 fcn ffffffff81bd3370 name team_device_event
kworker/u128:32 4366 [002] 87.985812: notifier:notifier_call_chain: val 6 fcn ffffffff81bdb890 name tun_device_event
kworker/u128:32 4366 [002] 87.985817: notifier:notifier_call_chain: val 6 fcn ffffffff81bf5bf0 name vrf_device_event
kworker/u128:32 4366 [002] 87.985824: notifier:notifier_call_chain: val 6 fcn ffffffff81c8cb80 name bpq_device_event
kworker/u128:32 4366 [002] 87.985830: notifier:notifier_call_chain: val 6 fcn ffffffff81c96e40 name pppoe_device_event
kworker/u128:32 4366 [002] 87.985835: notifier:notifier_call_chain: val 6 fcn ffffffff81c9b380 name hdlc_device_event
kworker/u128:32 4366 [002] 87.985840: notifier:notifier_call_chain: val 6 fcn ffffffff81ca06a0 name dlci_dev_event
kworker/u128:32 4366 [002] 87.985845: notifier:notifier_call_chain: val 6 fcn ffffffff81ca12c0 name lapbeth_device_event
kworker/u128:32 4366 [002] 87.985852: notifier:notifier_call_chain: val 6 fcn ffffffff81fa90f0 name cma_netdev_callback
kworker/u128:32 4366 [002] 87.985857: notifier:notifier_call_chain: val 6 fcn ffffffff81fee6a0 name ipoib_netdev_event
kworker/u128:32 4366 [002] 87.985863: notifier:notifier_call_chain: val 6 fcn ffffffff821c0e20 name dropmon_net_event
kworker/u128:32 4366 [002] 87.985883: notifier:notifier_call_chain: val 6 fcn ffffffff821c5d80 name failover_event
kworker/u128:32 4366 [002] 87.985890: notifier:notifier_call_chain: val 6 fcn ffffffff821e3a60 name mirred_device_event
kworker/u128:32 4366 [002] 87.985896: notifier:notifier_call_chain: val 6 fcn ffffffff8223a930 name nfqnl_rcv_dev_event
kworker/u128:32 4366 [002] 87.985905: notifier:notifier_call_chain: val 6 fcn ffffffff82273bd0 name nf_tables_netdev_event
kworker/u128:32 4366 [002] 87.985911: notifier:notifier_call_chain: val 6 fcn ffffffff822667c0 name nf_tables_flowtable_event
kworker/u128:32 4366 [002] 87.985919: notifier:notifier_call_chain: val 6 fcn ffffffff8227ef40 name flow_offload_netdev_event
kworker/u128:32 4366 [002] 87.985923: notifier:notifier_call_chain: val 6 fcn ffffffff82260e40 name masq_device_event
kworker/u128:32 4366 [002] 87.985929: notifier:notifier_call_chain: val 6 fcn ffffffff82290560 name tee_netdev_event
kworker/u128:32 4366 [002] 87.985938: notifier:notifier_call_chain: val 6 fcn ffffffff8236dcd0 name clusterip_netdev_event
kworker/u128:32 4366 [002] 87.985943: notifier:notifier_call_chain: val 6 fcn ffffffff82385210 name tls_dev_event
kworker/u128:32 4366 [002] 87.985952: notifier:notifier_call_chain: val 6 fcn ffffffff823e99b0 name ip6mr_device_event
kworker/u128:32 4366 [002] 87.985959: notifier:notifier_call_chain: val 6 fcn ffffffff823b9300 name addrconf_notify
kworker/u128:32 4366 [002] 87.985977: notifier:notifier_call_chain: val 6 fcn ffffffff823d9200 name ipv6_mc_netdev_event
kworker/u128:32 4366 [002] 87.985984: notifier:notifier_call_chain: val 6 fcn ffffffff82419160 name packet_notifier
kworker/u128:32 4366 [002] 87.985990: notifier:notifier_call_chain: val 6 fcn ffffffff824255f0 name br_device_event
kworker/u128:32 4366 [002] 87.985999: notifier:notifier_call_chain: val 6 fcn ffffffff8243e6e0 name brnf_device_event
kworker/u128:32 4366 [002] 87.986005: notifier:notifier_call_chain: val 6 fcn ffffffff8244c220 name dsa_slave_netdevice_event
kworker/u128:32 4366 [002] 87.986011: notifier:notifier_call_chain: val 6 fcn ffffffff8244df10 name x25_device_event
kworker/u128:32 4366 [002] 87.986016: notifier:notifier_call_chain: val 6 fcn ffffffff824562a0 name nr_device_event
kworker/u128:32 4366 [002] 87.986021: notifier:notifier_call_chain: val 6 fcn ffffffff8245bd20 name rose_device_event
kworker/u128:32 4366 [002] 87.986027: notifier:notifier_call_chain: val 6 fcn ffffffff82466e40 name ax25_device_event
kworker/u128:32 4366 [002] 87.986032: notifier:notifier_call_chain: val 6 fcn ffffffff8246b8a0 name can_notifier
kworker/u128:32 4366 [002] 87.986037: notifier:notifier_call_chain: val 6 fcn ffffffff82470860 name cgw_notifier
kworker/u128:32 4366 [002] 87.986043: notifier:notifier_call_chain: val 6 fcn ffffffff824c08b0 name device_event
kworker/u128:32 4366 [002] 87.986049: notifier:notifier_call_chain: val 6 fcn ffffffff825355a0 name clip_device_event
kworker/u128:32 4366 [002] 87.986055: notifier:notifier_call_chain: val 6 fcn ffffffff825481c0 name vlan_device_event
kworker/u128:32 4366 [002] 87.986061: notifier:notifier_call_chain: val 6 fcn ffffffff8267bde0 name tipc_l2_device_event
kworker/u128:32 4366 [002] 87.986067: notifier:notifier_call_chain: val 6 fcn ffffffff826ab0e0 name smc_pnet_netdev_event
kworker/u128:32 4366 [002] 87.986076: notifier:notifier_call_chain: val 6 fcn ffffffff826c3220 name caif_device_notify
kworker/u128:32 4366 [002] 87.986081: notifier:notifier_call_chain: val 6 fcn ffffffff826cb4b0 name cfusbl_device_notify
kworker/u128:32 4366 [002] 87.986087: notifier:notifier_call_chain: val 6 fcn ffffffff826cf280 name lowpan_event
kworker/u128:32 4366 [002] 87.986092: notifier:notifier_call_chain: val 6 fcn ffffffff826def50 name lowpan_device_event
kworker/u128:32 4366 [002] 87.986097: notifier:notifier_call_chain: val 6 fcn ffffffff827178d0 name batadv_hard_if_event
kworker/u128:32 4366 [002] 87.986109: notifier:notifier_call_chain: val 6 fcn ffffffff8274a0c0 name dp_device_event
kworker/u128:32 4366 [002] 87.986117: notifier:notifier_call_chain: val 6 fcn ffffffff827631d0 name mpls_dev_notify
kworker/u128:32 4366 [002] 87.986123: notifier:notifier_call_chain: val 6 fcn ffffffff82765180 name hsr_netdev_notify
kworker/u128:32 4366 [002] 87.986128: notifier:notifier_call_chain: val 6 fcn ffffffff81bc6cc0 name netconsole_netdev_event
kworker/u128:32 4366 [002] 87.986136: notifier:notifier_call_chain: val 6 fcn ffffffff81bec840 name vxlan_netdevice_event
kworker/u128:32 4366 [002] 87.986144: notifier:notifier_call_chain: val 6 fcn ffffffff81bf0c60 name geneve_netdevice_event
kworker/u128:32 4366 [002] 87.986153: notifier:notifier_call_chain: val 6 fcn ffffffff82025700 name rxe_notify
kworker/u128:32 4366 [002] 87.986159: notifier:notifier_call_chain: val 6 fcn ffffffff823cc0d0 name ndisc_netdev_event
kworker/u128:32 4366 [002] 87.986165: notifier:notifier_call_chain: val 6 fcn ffffffff823bde70 name ip6_route_dev_notify
On 5/1/19 7:38 AM, Tetsuo Handa wrote:
> On 2019/04/30 3:43, David Ahern wrote:
>>> The attached patch adds a tracepoint to notifier_call_chain. If you have
>>> KALLSYMS enabled it will show the order of the function handlers:
>>>
>>> perf record -e notifier:* -a -g &
>>>
>>> ip netns del <NAME>
>>> <wait a few seconds>
>>>
>>> fg
>>> <ctrl-c on perf-record>
>>>
>>> perf script
>>>
>>
>> with the header file this time.
>>
>
> What is the intent of your patch? I can see that many notifiers are called. But
> how does this help identify which event is responsible for dropping the refcount?
>
In a previous response you stated: "Since I'm not a netdev person, I
appreciate if you can explain that shutdown sequence using a flow chart."
The notifier sequence tells you the order of cleanup handlers and what
happens when a namespace is destroyed.
The dev_hold / dev_put tracepoint helps find the refcnt leak but
requires some time analyzing the output to match up hold / put stack traces.
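When reading the perf script output, it can help to translate the "val" column into an event name and to pull out the handler name. The following is a minimal sketch in C, not part of the attached patch. It assumes the line comes from the netdev notifier chain and that the values follow enum netdev_cmd in include/linux/netdevice.h (1 = NETDEV_UP through 6 = NETDEV_UNREGISTER). The helper names netdev_event_name() and parse_notifier_line() are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Names for the first few NETDEV_* values. Assumed to match
 * enum netdev_cmd; other notifier chains use different meanings.
 */
const char *netdev_event_name(unsigned long val)
{
	static const char *const names[] = {
		NULL, "NETDEV_UP", "NETDEV_DOWN", "NETDEV_REBOOT",
		"NETDEV_CHANGE", "NETDEV_REGISTER", "NETDEV_UNREGISTER",
	};

	if (val == 0 || val >= sizeof(names) / sizeof(names[0]))
		return "unknown";
	return names[val];
}

/*
 * Extract "val N" and "name HANDLER" from one perf script line.
 * Returns 1 on success, 0 if the line does not have that shape.
 */
int parse_notifier_line(const char *line, unsigned long *val,
			char *handler, size_t len)
{
	const char *v = strstr(line, " val ");
	const char *n = strstr(line, " name ");
	size_t i = 0;

	if (!v || !n || len == 0)
		return 0;
	if (sscanf(v, " val %lu", val) != 1)
		return 0;

	n += strlen(" name ");
	while (n[i] && n[i] != ' ' && n[i] != '\n' && i + 1 < len) {
		handler[i] = n[i];
		i++;
	}
	handler[i] = '\0';
	return i > 0;
}
```

Under that assumption, the "val 5" lines from a.out would be NETDEV_REGISTER and the "val 6" lines would be NETDEV_UNREGISTER.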
On 2019/05/01 23:52, David Ahern wrote:
> On 5/1/19 7:38 AM, Tetsuo Handa wrote:
>> On 2019/04/30 3:43, David Ahern wrote:
>>>> The attached patch adds a tracepoint to notifier_call_chain. If you have
>>>> KALLSYMS enabled it will show the order of the function handlers:
>>>>
>>>> perf record -e notifier:* -a -g &
>>>>
>>>> ip netns del <NAME>
>>>> <wait a few seconds>
>>>>
>>>> fg
>>>> <ctrl-c on perf-record>
>>>>
>>>> perf script
>>>>
>>>
>>> with the header file this time.
>>>
>>
>> What is the intent of your patch? I can see that many notifiers are called. But
>> how does this help identify which event is responsible for dropping the refcount?
>>
>
> In a previous response you stated: "Since I'm not a netdev person, I
> appreciate if you can explain that shutdown sequence using a flow chart."
Yes, I said that. But
>
> The notifier sequence tells you the order of cleanup handlers and what
> happens when a namespace is destroyed.
>
> The dev_hold / dev_put tracepoint helps find the refcnt leak but
> requires some time analyzing the output to match up hold / put stack traces.
>
I already observed that fib_netdev_event() calls rt_flush_dev(), which becomes a no-op
once the refcount of a dev has been moved to the loopback device in that namespace.
I think that there is no event which can drop the references moved to the loopback device in that namespace.
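The no-op described above can be reproduced with a small user-space model. This is only a sketch: struct toy_dev, struct toy_route and toy_rt_flush_dev() are hypothetical stand-ins, and the real function also takes a per-CPU lock and walks a kernel list.

```c
#include <stddef.h>

/* Toy stand-ins for struct net_device and struct rtable (hypothetical). */
struct toy_dev {
	int refcnt;
};

struct toy_route {
	struct toy_dev *dev;
};

/*
 * Same shape as the loop body in rt_flush_dev(): every route that still
 * points at `dev` is re-pointed at `loopback`, and its reference moves
 * with it. When dev == loopback the hold and the put cancel out.
 */
void toy_rt_flush_dev(struct toy_route *routes, size_t n,
		      struct toy_dev *dev, struct toy_dev *loopback)
{
	for (size_t i = 0; i < n; i++) {
		if (routes[i].dev != dev)
			continue;
		routes[i].dev = loopback;
		loopback->refcnt++;	/* dev_hold(rt->dst.dev) */
		dev->refcnt--;		/* dev_put(dev) */
	}
}
```

Flushing a tun-like device moves its route reference to the loopback model. Flushing the loopback model afterwards leaves its count unchanged, which matches the 34->35 and 35->34 pair in the log.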
[ 71.388104][ T7620] rt_flush_dev(00000000cd35e96a)->(00000000d9f4ea20)
[ 71.391757][ T7620] dev_hold(00000000d9f4ea20): 7->8
[ 71.394725][ T7620] CPU: 4 PID: 7620 Comm: a.out Not tainted 5.1.0-rc5+ #177
[ 71.398094][ T7620] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
[ 71.403711][ T7620] Call Trace:
[ 71.405912][ T7620] dump_stack+0xaa/0xd8
[ 71.408252][ T7620] rt_flush_dev+0x177/0x1b0
[ 71.410802][ T7620] fib_netdev_event+0x150/0x1b0
[ 71.413270][ T7620] notifier_call_chain+0x47/0xd0
[ 71.415849][ T7620] raw_notifier_call_chain+0x2d/0x40
[ 71.418491][ T7620] ? tun_show_group+0x90/0x90
[ 71.421108][ T7620] call_netdevice_notifiers_info+0x32/0x70
[ 71.423854][ T7620] rollback_registered_many+0x421/0x680
[ 71.426583][ T7620] rollback_registered+0x68/0xb0
[ 71.429244][ T7620] unregister_netdevice_queue+0xa5/0x100
[ 71.432191][ T7620] __tun_detach+0x576/0x590
[ 71.435533][ T7620] tun_chr_close+0x41/0x80
[ 71.437957][ T7620] ? __tun_detach+0x590/0x590
[ 71.440500][ T7620] __fput+0xeb/0x2d0
[ 71.442816][ T7620] ____fput+0x15/0x20
[ 71.445090][ T7620] task_work_run+0xa9/0xd0
[ 71.447467][ T7620] do_exit+0x37a/0xf40
[ 71.449623][ T7620] do_group_exit+0x57/0xe0
[ 71.451826][ T7620] get_signal+0x114/0x950
[ 71.453989][ T7620] do_signal+0x2f/0x700
[ 71.456126][ T7620] ? handle_mm_fault+0x1a8/0x360
[ 71.458323][ T7620] ? __x64_sys_futex+0x179/0x210
[ 71.460620][ T7620] exit_to_usermode_loop+0x159/0x180
[ 71.462956][ T7620] do_syscall_64+0x15d/0x180
[ 71.465110][ T7620] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 89.873350][ T4478] rt_flush_dev(00000000d9f4ea20)->(00000000d9f4ea20)
[ 89.876311][ T4478] dev_hold(00000000d9f4ea20): 34->35
[ 89.878712][ T4478] CPU: 2 PID: 4478 Comm: kworker/u128:28 Not tainted 5.1.0-rc5+ #177
[ 89.881981][ T4478] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
[ 89.887273][ T4478] Workqueue: netns cleanup_net
[ 89.889712][ T4478] Call Trace:
[ 89.891737][ T4478] dump_stack+0xaa/0xd8
[ 89.894127][ T4478] rt_flush_dev+0x177/0x1b0
[ 89.896477][ T4478] fib_netdev_event+0x150/0x1b0
[ 89.898810][ T4478] notifier_call_chain+0x47/0xd0
[ 89.901348][ T4478] raw_notifier_call_chain+0x2d/0x40
[ 89.903974][ T4478] call_netdevice_notifiers_info+0x32/0x70
[ 89.906450][ T4478] rollback_registered_many+0x421/0x680
[ 89.909125][ T4478] unregister_netdevice_many.part.119+0x17/0x90
[ 89.911833][ T4478] default_device_exit_batch+0x1a1/0x1d0
[ 89.914287][ T4478] ? do_wait_intr_irq+0xb0/0xb0
[ 89.916720][ T4478] ? unregister_netdevice_many+0x30/0x30
[ 89.919258][ T4478] ? dev_change_net_namespace+0x4e0/0x4e0
[ 89.921759][ T4478] ops_exit_list.isra.6+0x75/0x90
[ 89.924396][ T4478] cleanup_net+0x20d/0x380
[ 89.926632][ T4478] process_one_work+0x202/0x4f0
[ 89.929045][ T4478] worker_thread+0x3c/0x4b0
[ 89.931398][ T4478] kthread+0x139/0x160
[ 89.933448][ T4478] ? process_one_work+0x4f0/0x4f0
[ 89.935887][ T4478] ? kthread_destroy_worker+0x70/0x70
[ 89.938243][ T4478] ret_from_fork+0x35/0x40
[ 89.940530][ T4478] dev_put(00000000d9f4ea20): 35->34
[ 89.943031][ T4478] CPU: 2 PID: 4478 Comm: kworker/u128:28 Not tainted 5.1.0-rc5+ #177
[ 89.946064][ T4478] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
[ 89.951324][ T4478] Workqueue: netns cleanup_net
[ 89.953646][ T4478] Call Trace:
[ 89.955652][ T4478] dump_stack+0xaa/0xd8
[ 89.957857][ T4478] rt_flush_dev+0x19f/0x1b0
[ 89.960187][ T4478] fib_netdev_event+0x150/0x1b0
[ 89.962706][ T4478] notifier_call_chain+0x47/0xd0
[ 89.965044][ T4478] raw_notifier_call_chain+0x2d/0x40
[ 89.967503][ T4478] call_netdevice_notifiers_info+0x32/0x70
[ 89.970139][ T4478] rollback_registered_many+0x421/0x680
[ 89.972618][ T4478] unregister_netdevice_many.part.119+0x17/0x90
[ 89.975364][ T4478] default_device_exit_batch+0x1a1/0x1d0
[ 89.977827][ T4478] ? do_wait_intr_irq+0xb0/0xb0
[ 89.980066][ T4478] ? unregister_netdevice_many+0x30/0x30
[ 89.982761][ T4478] ? dev_change_net_namespace+0x4e0/0x4e0
[ 89.985231][ T4478] ops_exit_list.isra.6+0x75/0x90
[ 89.987756][ T4478] cleanup_net+0x20d/0x380
[ 89.990090][ T4478] process_one_work+0x202/0x4f0
[ 89.992384][ T4478] worker_thread+0x3c/0x4b0
[ 89.994702][ T4478] kthread+0x139/0x160
[ 89.996749][ T4478] ? process_one_work+0x4f0/0x4f0
[ 89.999116][ T4478] ? kthread_destroy_worker+0x70/0x70
[ 90.001580][ T4478] ret_from_fork+0x35/0x40
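The hold and put lines in a log like the one above can be tallied mechanically instead of by eye. This is a minimal sketch, assuming the "dev_hold(<id>): a->b" and "dev_put(<id>): a->b" format printed by the debug patch used here; refcnt_delta() is a hypothetical helper, and it only counts lines without matching up stack traces.

```c
#include <stdio.h>
#include <string.h>

/*
 * Net change in references for one device id within a log buffer:
 * +1 for every "dev_hold(<id>):" line, -1 for every "dev_put(<id>):" line.
 * A positive result means more holds than puts were logged.
 */
int refcnt_delta(const char *log, const char *dev_id)
{
	char hold[64], put[64];
	const char *p;
	int delta = 0;

	snprintf(hold, sizeof(hold), "dev_hold(%s):", dev_id);
	snprintf(put, sizeof(put), "dev_put(%s):", dev_id);

	for (p = log; (p = strstr(p, hold)) != NULL; p += strlen(hold))
		delta++;
	for (p = log; (p = strstr(p, put)) != NULL; p += strlen(put))
		delta--;

	return delta;
}
```

For the three hold/put lines quoted above, the delta for 00000000d9f4ea20 is +1: the hold at 71.39 has no matching put.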
syzbot is hitting an infinite loop when a loopback device in a namespace is
unregistered [1]. This is because rt_flush_dev() moves the refcount of
"any device to unregister" to "a loopback device in that namespace", but
nobody can drop the refcount moved from non-loopback devices when the
loopback device in that namespace is unregistered.
This behavior was introduced by commit caacf05e5ad1abf0 ("ipv4: Properly
purge netdev references on uncached routes."), but there is no description of
why we have to temporarily move the refcount to "a loopback device in that
namespace" or why it is safe to do so, given that rt_flush_dev() becomes a no-op
when "a loopback device in that namespace" is about to be unregistered.
Since I don't know the reason, this patch breaks the infinite loop by
deleting the uncached route (which eventually drops the refcount via
dst_destroy()) when "a loopback device in that namespace" is unregistered
rather than when "non-loopback devices in that namespace" are unregistered.
[1] https://syzkaller.appspot.com/bug?id=bae9a2236bfede42cf3d219e6bf6740c583568a4
Signed-off-by: Tetsuo Handa <[email protected]>
Reported-by: syzbot <[email protected]>
---
net/ipv4/route.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 6fdf1c195d8e..7e865c11d4f3 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1522,15 +1522,21 @@ void rt_flush_dev(struct net_device *dev)
 {
 	struct net *net = dev_net(dev);
 	struct rtable *rt;
+	struct rtable *tmp;
 	int cpu;
 
 	for_each_possible_cpu(cpu) {
 		struct uncached_list *ul = &per_cpu(rt_uncached_list, cpu);
 
 		spin_lock_bh(&ul->lock);
-		list_for_each_entry(rt, &ul->head, rt_uncached) {
+		list_for_each_entry_safe(rt, tmp, &ul->head, rt_uncached) {
 			if (rt->dst.dev != dev)
 				continue;
+			if (dev == net->loopback_dev) {
+				list_del_init(&rt->rt_uncached);
+				ip_rt_put(rt);
+				continue;
+			}
 			rt->dst.dev = net->loopback_dev;
 			dev_hold(rt->dst.dev);
 			dev_put(dev);
--
2.17.1
On 5/4/19 10:52 AM, Tetsuo Handa wrote:
> syzbot is hitting an infinite loop when a loopback device in a namespace is
> unregistered [1]. This is because rt_flush_dev() moves the refcount of
> "any device to unregister" to "a loopback device in that namespace", but
> nobody can drop the refcount moved from non-loopback devices when the
> loopback device in that namespace is unregistered.
>
> This behavior was introduced by commit caacf05e5ad1abf0 ("ipv4: Properly
> purge netdev references on uncached routes."), but there is no description of
> why we have to temporarily move the refcount to "a loopback device in that
> namespace" or why it is safe to do so, given that rt_flush_dev() becomes a no-op
> when "a loopback device in that namespace" is about to be unregistered.
>
> Since I don't know the reason, this patch breaks the infinite loop by
> deleting the uncached route (which eventually drops the refcount via
> dst_destroy()) when "a loopback device in that namespace" is unregistered
> rather than when "non-loopback devices in that namespace" are unregistered.
Well, you have not fixed a bug, you simply made sure that whatever CPU is using the
routes you forcibly deleted is going to crash the host very soon (use-after-frees have
undefined behavior, but KASAN should crash most of the time).
Please do not send patches like that with a huge CC list; keep networking patches
on the netdev mailing list.
Mahesh has an alternative patch, adding a fake device that cannot be dismantled,
to make sure we fully intercept skbs sent through a dead route, instead of relying
on loopback dropping them later at some point.
On 2019/05/05 0:56, Eric Dumazet wrote:
> Well, you have not fixed a bug, you simply made sure that whatever cpu is using the
> routes you forcibly deleted is going to crash the host very soon (use-after-frees have
> undefined behavior, but KASAN should crash most of the times)
I confirmed that this patch survives "#syz test:" before submitting.
But you know that this patch is deleting the route entry too early. OK.
>
> Please do not send patches like that with a huge CC list, keep networking patches
> to netdev mailing list.
If netdev people had started working on this "minutely crashing bug" earlier,
I would not have written a patch...
>
> Mahesh has an alternative patch, adding a fake device that can not be dismantled
> to make sure we fully intercept skbs sent through a dead route, instead of relying
> on loopback dropping them later at some point.
So, the reason to temporarily move the refcount is to allow enough time
for the route entry to stop being used. But moving the refcount to a
loopback device in a namespace was wrong. Is this understanding correct?
Compared to moving the refcount to the loopback device in the init namespace,
the fake device can somehow drop the refcount moved via rt_flush_dev(), can't it?
Anyway, I'll wait for Mahesh.
On 5/4/19 1:09 PM, Tetsuo Handa wrote:
> On 2019/05/05 0:56, Eric Dumazet wrote:
>> Well, you have not fixed a bug, you simply made sure that whatever cpu is using the
>> routes you forcibly deleted is going to crash the host very soon (use-after-frees have
>> undefined behavior, but KASAN should crash most of the times)
>
> I confirmed that this patch survives "#syz test:" before submitting.
> But you know that this patch is deleting the route entry too early. OK.
>
>>
>> Please do not send patches like that with a huge CC list, keep networking patches
>> to netdev mailing list.
>
> If netdev people started working on this "minutely crashing bug" earlier,
> I would not have written a patch...
Just so you know, we are working on bug fixes, and this is best effort.
The fact that _you_ want to fix a particular bug (out of hundreds) does not
mean we need to stop everything and work full time on that bug.
And here the root cause of the problem is elsewhere. A dst is leaking somewhere,
which prevents the netns dismantle.
We had many dst leaks in the past, and new bugs keep adding more.
>
>>
>> Mahesh has an alternative patch, adding a fake device that can not be dismantled
>> to make sure we fully intercept skbs sent through a dead route, instead of relying
>> on loopback dropping them later at some point.
>
> So, the reason to temporarily move the refcount is to give enough period
> so that the route entry is no longer used. But moving the refcount to a
> loopback device in a namespace was wrong. Is this understanding correct?
I believe you need to spend more time studying the networking code yourself,
and add tracing if you believe this could be useful to you and others.
>
> Compared to moving the refcount to the loopback device in the init namespace,
> the fake device can somehow drop the refcount moved via rt_flush_dev(), can't it?
>
The fake device won't ever disappear.
> Anyway, I'll wait for Mahesh.
>
Hello,
On Sat, 4 May 2019, Tetsuo Handa wrote:
> syzbot is hitting infinite loop when a loopback device in a namespace is
> unregistered [1]. This is because rt_flush_dev() is moving the refcount of
> "any device to unregister" to "a loopback device in that namespace" but
> nobody can drop the refcount moved from non loopback devices when the
> loopback device in that namespace is unregistered.
>
> This behavior was introduced by commit caacf05e5ad1abf0 ("ipv4: Properly
> purge netdev references on uncached routes.") but there is no description
> why we have to temporarily move the refcount to "a loopback device in that
> namespace" and why it is safe to do so, for rt_flush_dev() becomes a no-op
> when "a loopback device in that namespace" is about to be unregistered.
>
> Since I don't know the reason, this patch breaks the infinite loop by
> deleting the uncached route (which eventually drops the refcount via
> dst_destroy()) when "a loopback device in that namespace" is unregistered
> rather than when "non-loopback devices in that namespace" is unregistered.
There is one simple rule: code that holds device references should
catch device events and put the references. This should happen
no later than the NETDEV_UNREGISTER event.
But there are users, such as dsts, that have a longer life because
other objects can hold dsts: sockets, for example.
Sockets can hold a dst as a result of route resolution (sk_dst_cache) and
reuse it as long as it is valid. And many sockets can point to the same dst.
Obviously, a socket can be idle while the device disappears. We do not
propagate device events to every socket. What we do is mark the dst
entry as invalid and drop its dev reference. So, the problem is
noticed on the next send, when the cached dst is revalidated. As the dst
entry is marked as obsolete, it will be dropped.
What you see in rt_flush_dev() is that the IPv4 route subsystem
drops its dev references at the right time. Why net->loopback_dev?
Because we prefer that leaked dsts prevent the netns from being released, so
that we have more info about where the problem is. If we accounted the leaks
to the init netns, nobody would notice them. These are leaks that missed
many cleanup steps: device events and even net-exit events. If you
see "lo" in error messages, it is more likely a dst leak, i.e.
we got a dst reference but dst_release() was not called before netns
cleanup. In that case you need to track not the dev references but the
dst references. If another device is shown, it is not a dst leak
but some dev leak (dev_put() not called).
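The accounting described above can be sketched as a small userspace toy. This is only an illustration under stated assumptions: the toy_* types and helpers are hypothetical stand-ins rather than kernel APIs, and there is no locking, per-CPU list, or RCU. It shows the re-pointing step of rt_flush_dev() moving one reference from the unregistering device to the namespace loopback, the same step being a no-op when the loopback itself is flushed, and the reference staying on "lo" until the dst is finally released.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; these are not the kernel's types. */
struct toy_dev { const char *name; int refcnt; };
struct toy_dst { struct toy_dev *dev; };

static void toy_dev_hold(struct toy_dev *d) { d->refcnt++; }
static void toy_dev_put(struct toy_dev *d) { d->refcnt--; }

/* Attach a dst to a device, taking one device reference. */
static void toy_dst_init(struct toy_dst *dst, struct toy_dev *dev)
{
	dst->dev = dev;
	toy_dev_hold(dev);
}

/* Models the re-pointing step of rt_flush_dev(): the dst stops pinning
 * the device being unregistered and pins the loopback device instead. */
static void toy_flush_dev(struct toy_dst *dst, struct toy_dev *dev,
			  struct toy_dev *lo)
{
	if (dst->dev != dev)
		return;
	dst->dev = lo;
	toy_dev_hold(dst->dev);
	toy_dev_put(dev);
}

/* Models the final release of the dst: drop whatever device it pins. */
static void toy_dst_release(struct toy_dst *dst)
{
	toy_dev_put(dst->dev);
	dst->dev = NULL;
}

/* Returns lo's refcount after eth0 and then lo have been flushed.
 * A nonzero result models the "waiting for lo to become free" state. */
static int toy_leak_demo(int release_dst)
{
	struct toy_dev lo = { "lo", 0 };
	struct toy_dev eth0 = { "eth0", 0 };
	struct toy_dst dst;

	toy_dst_init(&dst, &eth0);
	toy_flush_dev(&dst, &eth0, &lo);	/* reference moves eth0 -> lo */
	assert(eth0.refcnt == 0);
	toy_flush_dev(&dst, &lo, &lo);		/* no-op: hold and put cancel */
	if (release_dst)
		toy_dst_release(&dst);
	return lo.refcnt;
}
```

In this toy, toy_leak_demo(0) models a leaked dst (the reference stays on "lo"), while toy_leak_demo(1) models the missing release being performed before netns cleanup.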
Regards
--
Julian Anastasov <[email protected]>
Hello.
I noticed that syzbot is reporting that the refcount incremented by the bpf(BPF_MAP_UPDATE_ELEM)
syscall is not decremented when unregister_netdevice() is called. Is this a BPF bug?
Kernel: 9e208aa06c2109b45eec6be049a8e47034748c20 on linux.git
Config: https://syzkaller.appspot.com/text?tag=KernelConfig&x=73c2aace7604ab7
Reproducer: https://syzkaller.appspot.com/text?tag=ReproC&x=1215afaf600000
Debug printk patch:
----------------------------------------
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 9eda1c31d1f7..542a47fe6998 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3732,10 +3732,7 @@ void netdev_run_todo(void);
  *
  * Release reference to device to allow it to be freed.
  */
-static inline void dev_put(struct net_device *dev)
-{
-	this_cpu_dec(*dev->pcpu_refcnt);
-}
+extern void dev_put(struct net_device *dev);
 
 /**
  * dev_hold - get reference to device
@@ -3743,10 +3740,7 @@ static inline void dev_put(struct net_device *dev)
  *
  * Hold reference to device to keep it from being freed.
  */
-static inline void dev_hold(struct net_device *dev)
-{
-	this_cpu_inc(*dev->pcpu_refcnt);
-}
+extern void dev_hold(struct net_device *dev);
 
 /* Carrier loss detection, dial on demand. The functions netif_carrier_on
  * and _off may be called from IRQ context, but it is caller
diff --git a/net/core/dev.c b/net/core/dev.c
index bf3ed413abaf..21f82aa92fad 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -8968,8 +8968,8 @@ static void netdev_wait_allrefs(struct net_device *dev)
 		refcnt = netdev_refcnt_read(dev);
 
 		if (refcnt && time_after(jiffies, warning_time + 10 * HZ)) {
-			pr_emerg("unregister_netdevice: waiting for %s to become free. Usage count = %d\n",
-				 dev->name, refcnt);
+			pr_emerg("unregister_netdevice: waiting for %s to become free. Usage count = %d %px\n",
+				 dev->name, refcnt, dev);
 			warning_time = jiffies;
 		}
 	}
@@ -9930,3 +9930,24 @@ static int __init net_dev_init(void)
 }
 
 subsys_initcall(net_dev_init);
+
+
+void dev_put(struct net_device *dev)
+{
+	this_cpu_dec(*dev->pcpu_refcnt);
+	if (!strcmp(dev->name, "bridge_slave_0")) {
+		printk("dev_put: %px %d", dev, netdev_refcnt_read(dev));
+		dump_stack();
+	}
+}
+EXPORT_SYMBOL(dev_put);
+
+void dev_hold(struct net_device *dev)
+{
+	if (!strcmp(dev->name, "bridge_slave_0")) {
+		printk("dev_hold: %px %d", dev, netdev_refcnt_read(dev));
+		dump_stack();
+	}
+	this_cpu_inc(*dev->pcpu_refcnt);
+}
+EXPORT_SYMBOL(dev_hold);
----------------------------------------
----------------------------------------
Oct 11 14:33:06 ubuntu kernel: [ 114.251175][ T8866] dev_hold: ffff888091fd2000 100
Oct 11 14:33:06 ubuntu kernel: [ 114.251185][ T8866] CPU: 3 PID: 8866 Comm: a.out Not tainted 5.4.0-rc2+ #217
Oct 11 14:33:06 ubuntu kernel: [ 114.251199][ T8866] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
Oct 11 14:33:06 ubuntu kernel: [ 114.251208][ T8866] Call Trace:
Oct 11 14:33:06 ubuntu kernel: [ 114.251232][ T8866] dump_stack+0x154/0x1c5
Oct 11 14:33:06 ubuntu kernel: [ 114.251253][ T8866] dev_hold+0x73/0x80
Oct 11 14:33:06 ubuntu kernel: [ 114.251267][ T8866] dev_get_by_index+0x1b3/0x2d0
Oct 11 14:33:06 ubuntu kernel: [ 114.251280][ T8866] __dev_map_alloc_node+0x1c7/0x360
Oct 11 14:33:06 ubuntu kernel: [ 114.251299][ T8866] dev_map_hash_update_elem+0x485/0x670
Oct 11 14:33:06 ubuntu kernel: [ 114.251320][ T8866] __do_sys_bpf+0x35d6/0x38c0
Oct 11 14:33:06 ubuntu kernel: [ 114.251337][ T8866] ? bpf_prog_load+0x1470/0x1470
Oct 11 14:33:06 ubuntu kernel: [ 114.251351][ T8866] ? do_wp_page+0x3c8/0x1310
Oct 11 14:33:06 ubuntu kernel: [ 114.251364][ T8866] ? finish_mkwrite_fault+0x300/0x300
Oct 11 14:33:06 ubuntu kernel: [ 114.251381][ T8866] ? find_held_lock+0x35/0x1e0
Oct 11 14:33:06 ubuntu kernel: [ 114.251397][ T8866] ? __do_page_fault+0x504/0xb60
Oct 11 14:33:06 ubuntu kernel: [ 114.251413][ T8866] ? lock_downgrade+0x900/0x900
Oct 11 14:33:06 ubuntu kernel: [ 114.251426][ T8866] ? __pmd_alloc+0x410/0x410
Oct 11 14:33:06 ubuntu kernel: [ 114.251446][ T8866] ? __kasan_check_write+0x14/0x20
Oct 11 14:33:06 ubuntu kernel: [ 114.251457][ T8866] ? up_read+0x1b6/0x7a0
Oct 11 14:33:06 ubuntu kernel: [ 114.251471][ T8866] ? down_read_nested+0x480/0x480
Oct 11 14:33:06 ubuntu kernel: [ 114.251494][ T8866] ? do_syscall_64+0x26/0x6a0
Oct 11 14:33:06 ubuntu kernel: [ 114.251507][ T8866] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
Oct 11 14:33:06 ubuntu kernel: [ 114.251515][ T8866] ? do_syscall_64+0x26/0x6a0
Oct 11 14:33:06 ubuntu kernel: [ 114.251528][ T8866] __x64_sys_bpf+0x73/0xb0
Oct 11 14:33:06 ubuntu kernel: [ 114.251541][ T8866] do_syscall_64+0xde/0x6a0
Oct 11 14:33:06 ubuntu kernel: [ 114.251559][ T8866] entry_SYSCALL_64_after_hwframe+0x49/0xbe
(...snipped...)
Oct 11 14:33:10 ubuntu kernel: [ 117.459637][ T9584] dev_hold: ffff888091fd2000 200
Oct 11 14:33:10 ubuntu kernel: [ 117.459644][ T9584] CPU: 4 PID: 9584 Comm: a.out Not tainted 5.4.0-rc2+ #217
Oct 11 14:33:10 ubuntu kernel: [ 117.459652][ T9584] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
Oct 11 14:33:10 ubuntu kernel: [ 117.459656][ T9584] Call Trace:
Oct 11 14:33:10 ubuntu kernel: [ 117.459669][ T9584] dump_stack+0x154/0x1c5
Oct 11 14:33:10 ubuntu kernel: [ 117.459682][ T9584] dev_hold+0x73/0x80
Oct 11 14:33:10 ubuntu kernel: [ 117.459695][ T9584] dev_get_by_index+0x1b3/0x2d0
Oct 11 14:33:10 ubuntu kernel: [ 117.459706][ T9584] __dev_map_alloc_node+0x1c7/0x360
Oct 11 14:33:10 ubuntu kernel: [ 117.459720][ T9584] dev_map_hash_update_elem+0x485/0x670
Oct 11 14:33:10 ubuntu kernel: [ 117.459749][ T9584] __do_sys_bpf+0x35d6/0x38c0
Oct 11 14:33:10 ubuntu kernel: [ 117.459762][ T9584] ? bpf_prog_load+0x1470/0x1470
Oct 11 14:33:10 ubuntu kernel: [ 117.459769][ T9584] ? do_wp_page+0x3c8/0x1310
Oct 11 14:33:10 ubuntu kernel: [ 117.459778][ T9584] ? finish_mkwrite_fault+0x300/0x300
Oct 11 14:33:10 ubuntu kernel: [ 117.459787][ T9584] ? find_held_lock+0x35/0x1e0
Oct 11 14:33:10 ubuntu kernel: [ 117.459797][ T9584] ? __do_page_fault+0x504/0xb60
Oct 11 14:33:10 ubuntu kernel: [ 117.459807][ T9584] ? lock_downgrade+0x900/0x900
Oct 11 14:33:10 ubuntu kernel: [ 117.459814][ T9584] ? __pmd_alloc+0x410/0x410
Oct 11 14:33:10 ubuntu kernel: [ 117.459828][ T9584] ? __kasan_check_write+0x14/0x20
Oct 11 14:33:10 ubuntu kernel: [ 117.459835][ T9584] ? up_read+0x1b6/0x7a0
Oct 11 14:33:10 ubuntu kernel: [ 117.459846][ T9584] ? down_read_nested+0x480/0x480
Oct 11 14:33:10 ubuntu kernel: [ 117.459862][ T9584] ? do_syscall_64+0x26/0x6a0
Oct 11 14:33:10 ubuntu kernel: [ 117.459871][ T9584] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
Oct 11 14:33:10 ubuntu kernel: [ 117.459878][ T9584] ? do_syscall_64+0x26/0x6a0
Oct 11 14:33:10 ubuntu kernel: [ 117.459891][ T9584] __x64_sys_bpf+0x73/0xb0
Oct 11 14:33:10 ubuntu kernel: [ 117.459901][ T9584] do_syscall_64+0xde/0x6a0
Oct 11 14:33:10 ubuntu kernel: [ 117.459911][ T9584] entry_SYSCALL_64_after_hwframe+0x49/0xbe
(...snipped...)
Oct 11 14:33:26 ubuntu kernel: [ 134.146838][T13860] dev_hold: ffff888091fd2000 850
Oct 11 14:33:26 ubuntu kernel: [ 134.146847][T13860] CPU: 4 PID: 13860 Comm: a.out Not tainted 5.4.0-rc2+ #217
Oct 11 14:33:26 ubuntu kernel: [ 134.146853][T13860] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
Oct 11 14:33:26 ubuntu kernel: [ 134.146859][T13860] Call Trace:
Oct 11 14:33:26 ubuntu kernel: [ 134.146872][T13860] dump_stack+0x154/0x1c5
Oct 11 14:33:26 ubuntu kernel: [ 134.146885][T13860] dev_hold+0x73/0x80
Oct 11 14:33:26 ubuntu kernel: [ 134.146893][T13860] dev_get_by_index+0x1b3/0x2d0
Oct 11 14:33:26 ubuntu kernel: [ 134.146903][T13860] __dev_map_alloc_node+0x1c7/0x360
Oct 11 14:33:26 ubuntu kernel: [ 134.146918][T13860] dev_map_hash_update_elem+0x485/0x670
Oct 11 14:33:26 ubuntu kernel: [ 134.146932][T13860] __do_sys_bpf+0x35d6/0x38c0
Oct 11 14:33:26 ubuntu kernel: [ 134.146944][T13860] ? bpf_prog_load+0x1470/0x1470
Oct 11 14:33:26 ubuntu kernel: [ 134.146953][T13860] ? do_wp_page+0x3c8/0x1310
Oct 11 14:33:26 ubuntu kernel: [ 134.146964][T13860] ? finish_mkwrite_fault+0x300/0x300
Oct 11 14:33:26 ubuntu kernel: [ 134.146975][T13860] ? find_held_lock+0x35/0x1e0
Oct 11 14:33:26 ubuntu kernel: [ 134.146985][T13860] ? __do_page_fault+0x504/0xb60
Oct 11 14:33:26 ubuntu kernel: [ 134.146994][T13860] ? lock_downgrade+0x900/0x900
Oct 11 14:33:26 ubuntu kernel: [ 134.147002][T13860] ? __pmd_alloc+0x410/0x410
Oct 11 14:33:26 ubuntu kernel: [ 134.147017][T13860] ? __kasan_check_write+0x14/0x20
Oct 11 14:33:26 ubuntu kernel: [ 134.147024][T13860] ? up_read+0x1b6/0x7a0
Oct 11 14:33:26 ubuntu kernel: [ 134.147033][T13860] ? down_read_nested+0x480/0x480
Oct 11 14:33:26 ubuntu kernel: [ 134.147048][T13860] ? do_syscall_64+0x26/0x6a0
Oct 11 14:33:26 ubuntu kernel: [ 134.147056][T13860] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
Oct 11 14:33:26 ubuntu kernel: [ 134.147063][T13860] ? do_syscall_64+0x26/0x6a0
Oct 11 14:33:26 ubuntu kernel: [ 134.147074][T13860] __x64_sys_bpf+0x73/0xb0
Oct 11 14:33:26 ubuntu kernel: [ 134.147084][T13860] do_syscall_64+0xde/0x6a0
Oct 11 14:33:26 ubuntu kernel: [ 134.147095][T13860] entry_SYSCALL_64_after_hwframe+0x49/0xbe
(...snipped...)
Oct 11 14:33:41 ubuntu kernel: [ 148.384539][ T4514] unregister_netdevice: waiting for bridge_slave_0 to become free. Usage count = 850 ffff888091fd2000
----------------------------------------
On Fri, Oct 11, 2019 at 3:15 AM Tetsuo Handa
<[email protected]> wrote:
>
> Hello.
>
> I noticed that syzbot is reporting that refcount incremented by bpf(BPF_MAP_UPDATE_ELEM)
> syscall is not decremented when unregister_netdevice() is called. Is this a BPF bug?
Jesper, Toke,
please take a look.
Alexei Starovoitov <[email protected]> writes:
> On Fri, Oct 11, 2019 at 3:15 AM Tetsuo Handa
> <[email protected]> wrote:
>>
>> Hello.
>>
>> I noticed that syzbot is reporting that refcount incremented by bpf(BPF_MAP_UPDATE_ELEM)
>> syscall is not decremented when unregister_netdevice() is called. Is this a BPF bug?
>
> Jesper, Toke,
> please take a look.
Yeah, that unregister notification handler definitely looks broken for
hashmaps; I'll send a patch :)
-Toke
Hello.
syzbot is still reporting that bpf(BPF_MAP_UPDATE_ELEM) causes
unregister_netdevice() to hang. It seems that commit 546ac1ffb70d25b5
("bpf: add devmap, a map for storing net device references") assigned
dtab->netdev_map[i] in dev_map_update_elem() but commit 6f9d451ab1a33728
("xdp: Add devmap_hash map type for looking up devices by hashed index")
forgot to assign dtab->netdev_map[idx] in __dev_map_hash_update_elem()
when dev is newly allocated by __dev_map_alloc_node(). As far as syzbot and
I have tested, https://syzkaller.appspot.com/x/patch.diff?x=140dd206e00000
avoids the problem, but I don't know whether this is the right place to
assign it. Please check and fix.
Tetsuo Handa <[email protected]> writes:
> Hello.
>
> syzbot is still reporting that bpf(BPF_MAP_UPDATE_ELEM) causes
> unregister_netdevice() to hang. It seems that commit 546ac1ffb70d25b5
> ("bpf: add devmap, a map for storing net device references") assigned
> dtab->netdev_map[i] at dev_map_update_elem() but commit 6f9d451ab1a33728
> ("xdp: Add devmap_hash map type for looking up devices by hashed index")
> forgot to assign dtab->netdev_map[idx] at __dev_map_hash_update_elem()
> when dev is newly allocated by __dev_map_alloc_node(). As far as I and
> syzbot tested, https://syzkaller.appspot.com/x/patch.diff?x=140dd206e00000
> can avoid the problem, but I don't know whether this is right location to
> assign it. Please check and fix.
Hi Tetsuo
Sorry for missing this email last week :(
I think the issue is not a missing update of dtab->netdev_map (that is
not used at all for DEVMAP_HASH), but rather that dev_map_free() is not
cleaning up properly for DEVMAP_HASH types. Could you please check if
the patch below helps?
-Toke
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 3867864cdc2f..42ccfcb38424 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -74,7 +74,7 @@ struct bpf_dtab_netdev {
 
 struct bpf_dtab {
 	struct bpf_map map;
-	struct bpf_dtab_netdev **netdev_map;
+	struct bpf_dtab_netdev **netdev_map; /* DEVMAP type only */
 	struct list_head __percpu *flush_list;
 	struct list_head list;
 
@@ -101,6 +101,12 @@ static struct hlist_head *dev_map_create_hash(unsigned int entries)
 	return hash;
 }
 
+static inline struct hlist_head *dev_map_index_hash(struct bpf_dtab *dtab,
+						    int idx)
+{
+	return &dtab->dev_index_head[idx & (dtab->n_buckets - 1)];
+}
+
 static int dev_map_init_map(struct bpf_dtab *dtab, union bpf_attr *attr)
 {
 	int err, cpu;
@@ -143,24 +149,22 @@ static int dev_map_init_map(struct bpf_dtab *dtab, union bpf_attr *attr)
 	for_each_possible_cpu(cpu)
 		INIT_LIST_HEAD(per_cpu_ptr(dtab->flush_list, cpu));
 
-	dtab->netdev_map = bpf_map_area_alloc(dtab->map.max_entries *
-					      sizeof(struct bpf_dtab_netdev *),
-					      dtab->map.numa_node);
-	if (!dtab->netdev_map)
-		goto free_percpu;
-
 	if (attr->map_type == BPF_MAP_TYPE_DEVMAP_HASH) {
 		dtab->dev_index_head = dev_map_create_hash(dtab->n_buckets);
 		if (!dtab->dev_index_head)
-			goto free_map_area;
+			goto free_percpu;
 
 		spin_lock_init(&dtab->index_lock);
+	} else {
+		dtab->netdev_map = bpf_map_area_alloc(dtab->map.max_entries *
+						      sizeof(struct bpf_dtab_netdev *),
+						      dtab->map.numa_node);
+		if (!dtab->netdev_map)
+			goto free_percpu;
 	}
 
 	return 0;
 
-free_map_area:
-	bpf_map_area_free(dtab->netdev_map);
 free_percpu:
 	free_percpu(dtab->flush_list);
 free_charge:
@@ -228,21 +232,40 @@ static void dev_map_free(struct bpf_map *map)
 		cond_resched();
 	}
 
-	for (i = 0; i < dtab->map.max_entries; i++) {
-		struct bpf_dtab_netdev *dev;
+	if (dtab->map.map_type == BPF_MAP_TYPE_DEVMAP_HASH) {
+		for (i = 0; i < dtab->n_buckets; i++) {
+			struct bpf_dtab_netdev *dev;
+			struct hlist_head *head;
+			struct hlist_node *next;
 
-		dev = dtab->netdev_map[i];
-		if (!dev)
-			continue;
+			head = dev_map_index_hash(dtab, i);
 
-		free_percpu(dev->bulkq);
-		dev_put(dev->dev);
-		kfree(dev);
+			hlist_for_each_entry_safe(dev, next, head, index_hlist) {
+				hlist_del_rcu(&dev->index_hlist);
+				free_percpu(dev->bulkq);
+				dev_put(dev->dev);
+				kfree(dev);
+			}
+		}
+
+		kfree(dtab->dev_index_head);
+	} else {
+		for (i = 0; i < dtab->map.max_entries; i++) {
+			struct bpf_dtab_netdev *dev;
+
+			dev = dtab->netdev_map[i];
+			if (!dev)
+				continue;
+
+			free_percpu(dev->bulkq);
+			dev_put(dev->dev);
+			kfree(dev);
+		}
+
+		bpf_map_area_free(dtab->netdev_map);
 	}
 
 	free_percpu(dtab->flush_list);
-	bpf_map_area_free(dtab->netdev_map);
-	kfree(dtab->dev_index_head);
 	kfree(dtab);
 }
 
@@ -263,12 +286,6 @@ static int dev_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
 	return 0;
 }
 
-static inline struct hlist_head *dev_map_index_hash(struct bpf_dtab *dtab,
-						    int idx)
-{
-	return &dtab->dev_index_head[idx & (dtab->n_buckets - 1)];
-}
-
 struct bpf_dtab_netdev *__dev_map_hash_lookup_elem(struct bpf_map *map, u32 key)
 {
 	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
Hello people involved in commit a3e23f719f5c4a38 ("net-sysfs: call dev_hold if kobject_init_and_add success")
and commit b8eb718348b8fb30 ("net-sysfs: Fix reference count leak in rx|netdev_queue_add_kobject").
syzbot is reporting that unregister_netdevice() hangs due to underflowing
device refcount when kobject_init_and_add() failed due to -ENOMEM.
----------
11:25:02 executing program 3 (fault-call:5 fault-nth:2):
r0 = openat$tun(0xffffffffffffff9c, &(0x7f0000000100)='/dev/net/tun\x00', 0x0, 0x0)
ioctl$TUNSETIFF(r0, 0x400454ca, &(0x7f0000000000)={'vet\x00\x00\x00\x00\x00\x00\x00\x00\x00\xbdh\x00', 0x43732e5398416f1a})
ioctl$TUNSETQUEUE(r0, 0x400454d9, &(0x7f00000000c0)={'\x00', 0x400})
r1 = openat$tun(0xffffffffffffff9c, &(0x7f0000000080)='/dev/net/tun\x00', 0x0, 0x0)
ioctl$TUNSETIFF(r1, 0x400454ca, &(0x7f0000000000)={'vet\x00\x00\x00\x00\x00\x00\x00\x00\x00\xbdh\x00', 0x43732e5398416f1a})
ioctl$TUNSETQUEUE(r0, 0x400454d9, &(0x7f0000000040)={'lo\x00', 0x200})
----------
----------
[ 60.043899] IPVS: ftp: loaded support on port[0] = 21
[ 60.275782] chnl_net:caif_netlink_parms(): no params data found
[ 60.305039] bridge0: port 1(bridge_slave_0) entered blocking state
[ 60.305551] bridge0: port 1(bridge_slave_0) entered disabled state
[ 60.306366] device bridge_slave_0 entered promiscuous mode
[ 60.311776] bridge0: port 2(bridge_slave_1) entered blocking state
[ 60.312032] bridge0: port 2(bridge_slave_1) entered disabled state
[ 60.312858] device bridge_slave_1 entered promiscuous mode
[ 60.336705] bond0: (slave bond_slave_0): Enslaving as an active interface with an up link
[ 60.338321] bond0: (slave bond_slave_1): Enslaving as an active interface with an up link
[ 60.357851] team0: Port device team_slave_0 added
[ 60.359250] team0: Port device team_slave_1 added
[ 60.522829] device hsr_slave_0 entered promiscuous mode
[ 60.651798] device hsr_slave_1 entered promiscuous mode
[ 60.790287] netdevsim netdevsim0 netdevsim0: renamed from eth0
[ 60.854953] netdevsim netdevsim0 netdevsim1: renamed from eth1
[ 60.911733] netdevsim netdevsim0 netdevsim2: renamed from eth2
[ 60.974063] netdevsim netdevsim0 netdevsim3: renamed from eth3
[ 61.109590] bridge0: port 2(bridge_slave_1) entered blocking state
[ 61.109922] bridge0: port 2(bridge_slave_1) entered forwarding state
[ 61.110384] bridge0: port 1(bridge_slave_0) entered blocking state
[ 61.110556] bridge0: port 1(bridge_slave_0) entered forwarding state
[ 61.151643] 8021q: adding VLAN 0 to HW filter on device bond0
[ 61.156692] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
[ 61.164101] bridge0: port 1(bridge_slave_0) entered disabled state
[ 61.190521] bridge0: port 2(bridge_slave_1) entered disabled state
[ 61.230466] IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
[ 61.283759] 8021q: adding VLAN 0 to HW filter on device team0
[ 61.364323] IPv6: ADDRCONF(NETDEV_CHANGE): bridge_slave_0: link becomes ready
[ 61.366383] bridge0: port 1(bridge_slave_0) entered blocking state
[ 61.366568] bridge0: port 1(bridge_slave_0) entered forwarding state
[ 61.367033] IPv6: ADDRCONF(NETDEV_CHANGE): bridge_slave_1: link becomes ready
[ 61.367556] bridge0: port 2(bridge_slave_1) entered blocking state
[ 61.367727] bridge0: port 2(bridge_slave_1) entered forwarding state
[ 61.372342] IPv6: ADDRCONF(NETDEV_CHANGE): team_slave_0: link becomes ready
[ 61.377760] IPv6: ADDRCONF(NETDEV_CHANGE): team_slave_1: link becomes ready
[ 61.381755] IPv6: ADDRCONF(NETDEV_CHANGE): hsr_slave_0: link becomes ready
[ 61.383474] IPv6: ADDRCONF(NETDEV_CHANGE): hsr_slave_1: link becomes ready
[ 61.386511] IPv6: ADDRCONF(NETDEV_CHANGE): hsr0: link becomes ready
[ 61.405483] 8021q: adding VLAN 0 to HW filter on device batadv0
[ 61.408968] IPv6: ADDRCONF(NETDEV_CHANGE): team0: link becomes ready
[ 61.412478] IPv6: ADDRCONF(NETDEV_CHANGE): vxcan1: link becomes ready
[ 61.414712] IPv6: ADDRCONF(NETDEV_CHANGE): vxcan0: link becomes ready
[ 61.466051] FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 1
[ 61.468544] CPU: 6 PID: 6365 Comm: syz-executor Not tainted 5.4.0+ #223
[ 61.469778] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/29/2019
[ 61.473052] Call Trace:
[ 61.475277] dump_stack+0x163/0x1d5
[ 61.476597] should_fail+0x655/0x740
[ 61.477847] ? fault_create_debugfs_attr+0x170/0x170
[ 61.479192] ? ___might_sleep+0x1de/0x500
[ 61.480520] __should_failslab+0xde/0x130
[ 61.481834] should_failslab+0x9/0x14
[ 61.483145] kmem_cache_alloc+0x28e/0x770
[ 61.484448] ? memcpy+0x45/0x50
[ 61.486397] ? kstrdup+0x59/0x70
[ 61.488702] __kernfs_new_node+0xde/0x6f0
[ 61.490999] ? kernfs_dop_revalidate+0x380/0x380
[ 61.493038] ? __put_user_ns+0x60/0x60
[ 61.495621] ? mark_lock+0x11f/0xc60
[ 61.497768] ? put_dec+0xc0/0xc0
[ 61.499926] kernfs_new_node+0x97/0x110
[ 61.501971] kernfs_create_dir_ns+0x4d/0x150
[ 61.504019] sysfs_create_dir_ns+0x13b/0x2a0
[ 61.505973] ? sysfs_create_mount_point+0xb0/0xb0
[ 61.507978] ? rwlock_bug.part.2+0x90/0x90
[ 61.509917] ? lock_acquire+0x19f/0x3f0
[ 61.511893] ? __kasan_check_read+0x11/0x20
[ 61.513887] ? do_raw_spin_unlock+0x54/0x260
[ 61.515829] kobject_add_internal+0x223/0x9a0
[ 61.517825] kobject_init_and_add+0xff/0x170
[ 61.519811] ? kset_create_and_add+0x180/0x180
[ 61.521849] ? lock_acquire+0x19f/0x3f0
[ 61.523865] ? rtnl_lock+0x17/0x20
[ 61.525787] netdev_queue_update_kobjects+0xeb/0x370
[ 61.527816] netif_set_real_num_tx_queues+0x188/0x740
[ 61.530219] ? mutex_lock_io_nested+0x14b0/0x14b0
[ 61.532255] tun_attach+0x4bd/0x1250
[ 61.534231] ? lock_acquire+0x19f/0x3f0
[ 61.536130] __tun_chr_ioctl+0x6fd/0x3b50
[ 61.537960] ? tun_flow_update+0xba0/0xba0
[ 61.539709] ? __kasan_check_read+0x11/0x20
[ 61.541376] ? mark_lock+0x11f/0xc60
[ 61.543057] ? _kstrtoull+0x11c/0x1c0
[ 61.544725] ? __kasan_check_read+0x11/0x20
[ 61.546556] ? __lock_acquire+0xc5c/0x3b30
[ 61.548282] ? __kasan_check_read+0x11/0x20
[ 61.550033] ? mark_lock+0x11f/0xc60
[ 61.551749] ? __kasan_check_read+0x11/0x20
[ 61.553502] ? __lock_acquire+0xc5c/0x3b30
[ 61.555247] ? __fget+0x31c/0x4d0
[ 61.556976] ? tun_chr_compat_ioctl+0x50/0x50
[ 61.558743] tun_chr_ioctl+0x2a/0x40
[ 61.560255] ? tun_chr_ioctl+0x2a/0x40
[ 61.562490] do_vfs_ioctl+0x1a2/0x1150
[ 61.564138] ? rcu_read_lock_held+0x9c/0xb0
[ 61.565830] ? ioctl_preallocate+0x1e0/0x1e0
[ 61.567495] ? __fget+0x33e/0x4d0
[ 61.569168] ? do_dup2+0x4d0/0x4d0
[ 61.570928] ? fput_many+0xe6/0x150
[ 61.572618] ? fput+0x1a/0x20
[ 61.574382] ? security_file_ioctl+0x81/0xb0
[ 61.576049] ksys_ioctl+0x94/0xb0
[ 61.577783] __x64_sys_ioctl+0x73/0xb0
[ 61.579526] do_syscall_64+0xde/0x6c0
[ 61.581302] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 61.583010] RIP: 0033:0x45a729
[ 61.584734] Code: bd b1 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 8b b1 fb ff c3 66 2e 0f 1f 84 00 00 00 00
[ 61.590407] RSP: 002b:00007f25d540ec88 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 61.592488] RAX: ffffffffffffffda RBX: 000000000071bf00 RCX: 000000000045a729
[ 61.594552] RDX: 0000000020000040 RSI: 00000000400454d9 RDI: 0000000000000003
[ 61.596829] RBP: 00007f25d540eca0 R08: 0000000000000000 R09: 0000000000000000
[ 61.598540] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f25d540f6d4
[ 61.600278] R13: 00000000004ac5a5 R14: 00000000006ee8a0 R15: 0000000000000005
[ 61.655323] kobject_add_internal failed for tx-1 (error: -12 parent: queues)
[ 71.760970] unregister_netdevice: waiting for vet to become free. Usage count = -1
[ 82.028434] unregister_netdevice: waiting for vet to become free. Usage count = -1
[ 92.140031] unregister_netdevice: waiting for vet to become free. Usage count = -1
----------
The worrisome part is that tun_attach() calls tun_set_real_num_queues() as its last step,
but tun_set_real_num_queues() does not handle a netif_set_real_num_tx_queues() failure.
That is, tun_attach() returns success even if netdev_queue_update_kobjects(), called from
netif_set_real_num_tx_queues(), failed.
static void tun_set_real_num_queues(struct tun_struct *tun)
{
netif_set_real_num_tx_queues(tun->dev, tun->numqueues);
netif_set_real_num_rx_queues(tun->dev, tun->numqueues);
}
I guess that ignoring that failure causes the clean-up function to drop a refcount
which was never taken by the initialization function. Applying the diff below seems to avoid
this problem. Please check.
----------
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index ae3bcb1540ec..562d06c274aa 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1459,14 +1459,14 @@ static int netdev_queue_add_kobject(struct net_device *dev, int index)
struct kobject *kobj = &queue->kobj;
int error = 0;
+ dev_hold(queue->dev);
+
kobj->kset = dev->queues_kset;
error = kobject_init_and_add(kobj, &netdev_queue_ktype, NULL,
"tx-%u", index);
if (error)
goto err;
- dev_hold(queue->dev);
-
#ifdef CONFIG_BQL
error = sysfs_create_group(kobj, &dql_group);
if (error)
----------
----------
[ 64.482925] IPVS: ftp: loaded support on port[0] = 21
[ 64.684701] chnl_net:caif_netlink_parms(): no params data found
[ 64.757440] bridge0: port 1(bridge_slave_0) entered blocking state
[ 64.757596] bridge0: port 1(bridge_slave_0) entered disabled state
[ 64.760043] device bridge_slave_0 entered promiscuous mode
[ 64.761799] bridge0: port 2(bridge_slave_1) entered blocking state
[ 64.762025] bridge0: port 2(bridge_slave_1) entered disabled state
[ 64.793334] device bridge_slave_1 entered promiscuous mode
[ 64.818373] bond0: (slave bond_slave_0): Enslaving as an active interface with an up link
[ 64.822950] bond0: (slave bond_slave_1): Enslaving as an active interface with an up link
[ 64.843403] team0: Port device team_slave_0 added
[ 64.844859] team0: Port device team_slave_1 added
[ 64.933830] device hsr_slave_0 entered promiscuous mode
[ 64.972990] device hsr_slave_1 entered promiscuous mode
[ 65.048057] netdevsim netdevsim0 netdevsim0: renamed from eth0
[ 65.113612] netdevsim netdevsim0 netdevsim1: renamed from eth1
[ 65.191758] netdevsim netdevsim0 netdevsim2: renamed from eth2
[ 65.262611] netdevsim netdevsim0 netdevsim3: renamed from eth3
[ 65.339507] bridge0: port 2(bridge_slave_1) entered blocking state
[ 65.339821] bridge0: port 2(bridge_slave_1) entered forwarding state
[ 65.340340] bridge0: port 1(bridge_slave_0) entered blocking state
[ 65.340514] bridge0: port 1(bridge_slave_0) entered forwarding state
[ 65.486729] 8021q: adding VLAN 0 to HW filter on device bond0
[ 65.545043] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
[ 65.567266] bridge0: port 1(bridge_slave_0) entered disabled state
[ 65.592695] bridge0: port 2(bridge_slave_1) entered disabled state
[ 65.631471] IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
[ 65.656557] 8021q: adding VLAN 0 to HW filter on device team0
[ 65.660768] IPv6: ADDRCONF(NETDEV_CHANGE): veth0_to_bridge: link becomes ready
[ 65.661411] IPv6: ADDRCONF(NETDEV_CHANGE): bridge_slave_0: link becomes ready
[ 65.661949] bridge0: port 1(bridge_slave_0) entered blocking state
[ 65.662127] bridge0: port 1(bridge_slave_0) entered forwarding state
[ 65.693984] IPv6: ADDRCONF(NETDEV_CHANGE): veth1_to_bridge: link becomes ready
[ 65.697297] IPv6: ADDRCONF(NETDEV_CHANGE): bridge_slave_1: link becomes ready
[ 65.701488] bridge0: port 2(bridge_slave_1) entered blocking state
[ 65.703895] bridge0: port 2(bridge_slave_1) entered forwarding state
[ 65.706370] IPv6: ADDRCONF(NETDEV_CHANGE): veth0_to_bond: link becomes ready
[ 65.724296] hsr0: Slave A (hsr_slave_0) is not up; please bring it up to get a fully working HSR network
[ 65.728690] hsr0: Slave B (hsr_slave_1) is not up; please bring it up to get a fully working HSR network
[ 65.777203] IPv6: ADDRCONF(NETDEV_CHANGE): veth1_to_bond: link becomes ready
[ 65.782476] IPv6: ADDRCONF(NETDEV_CHANGE): veth0_to_team: link becomes ready
[ 65.785989] IPv6: ADDRCONF(NETDEV_CHANGE): team_slave_0: link becomes ready
[ 65.789006] IPv6: ADDRCONF(NETDEV_CHANGE): veth1_to_team: link becomes ready
[ 65.796973] IPv6: ADDRCONF(NETDEV_CHANGE): team_slave_1: link becomes ready
[ 65.801313] IPv6: ADDRCONF(NETDEV_CHANGE): veth0_to_hsr: link becomes ready
[ 65.804748] IPv6: ADDRCONF(NETDEV_CHANGE): hsr_slave_0: link becomes ready
[ 65.807939] IPv6: ADDRCONF(NETDEV_CHANGE): veth1_to_hsr: link becomes ready
[ 65.811874] IPv6: ADDRCONF(NETDEV_CHANGE): hsr_slave_1: link becomes ready
[ 65.814814] IPv6: ADDRCONF(NETDEV_CHANGE): team0: link becomes ready
[ 65.824311] IPv6: ADDRCONF(NETDEV_CHANGE): hsr0: link becomes ready
[ 65.835888] 8021q: adding VLAN 0 to HW filter on device batadv0
[ 65.840711] IPv6: ADDRCONF(NETDEV_CHANGE): vxcan1: link becomes ready
[ 65.843289] IPv6: ADDRCONF(NETDEV_CHANGE): vxcan0: link becomes ready
[ 66.055083] FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 1
[ 66.058933] CPU: 1 PID: 6375 Comm: syz-executor Not tainted 5.4.0+ #224
[ 66.060904] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/29/2019
[ 66.065868] Call Trace:
[ 66.068993] dump_stack+0x163/0x1d5
[ 66.071611] should_fail+0x655/0x740
[ 66.074082] ? fault_create_debugfs_attr+0x170/0x170
[ 66.076772] ? ___might_sleep+0x1de/0x500
[ 66.079324] __should_failslab+0xde/0x130
[ 66.082540] should_failslab+0x9/0x14
[ 66.084806] kmem_cache_alloc+0x28e/0x770
[ 66.087262] ? memcpy+0x45/0x50
[ 66.089712] ? kstrdup+0x59/0x70
[ 66.092133] __kernfs_new_node+0xde/0x6f0
[ 66.094512] ? kernfs_dop_revalidate+0x380/0x380
[ 66.096934] ? __put_user_ns+0x60/0x60
[ 66.099303] ? mark_lock+0x11f/0xc60
[ 66.101585] ? put_dec+0xc0/0xc0
[ 66.104388] kernfs_new_node+0x97/0x110
[ 66.106537] kernfs_create_dir_ns+0x4d/0x150
[ 66.108622] sysfs_create_dir_ns+0x13b/0x2a0
[ 66.110584] ? sysfs_create_mount_point+0xb0/0xb0
[ 66.112769] ? rwlock_bug.part.2+0x90/0x90
[ 66.114866] ? lock_acquire+0x19f/0x3f0
[ 66.116925] ? __kasan_check_read+0x11/0x20
[ 66.118888] ? do_raw_spin_unlock+0x54/0x260
[ 66.120909] kobject_add_internal+0x223/0x9a0
[ 66.122835] kobject_init_and_add+0xff/0x170
[ 66.124860] ? kset_create_and_add+0x180/0x180
[ 66.126944] ? lock_acquire+0x19f/0x3f0
[ 66.129143] ? rtnl_lock+0x17/0x20
[ 66.131087] netdev_queue_update_kobjects+0x118/0x390
[ 66.133028] ? __kasan_check_read+0x11/0x20
[ 66.134944] netif_set_real_num_tx_queues+0x188/0x740
[ 66.137012] ? mutex_lock_io_nested+0x14b0/0x14b0
[ 66.139043] tun_attach+0x4bd/0x1250
[ 66.140950] ? lock_acquire+0x19f/0x3f0
[ 66.142870] __tun_chr_ioctl+0x6fd/0x3b50
[ 66.144724] ? tun_flow_update+0xba0/0xba0
[ 66.146532] ? __kasan_check_read+0x11/0x20
[ 66.148821] ? mark_lock+0x11f/0xc60
[ 66.150635] ? _kstrtoull+0x11c/0x1c0
[ 66.152399] ? __kasan_check_read+0x11/0x20
[ 66.154056] ? __lock_acquire+0xc5c/0x3b30
[ 66.155701] ? __kasan_check_read+0x11/0x20
[ 66.157438] ? mark_lock+0x11f/0xc60
[ 66.159265] ? __kasan_check_read+0x11/0x20
[ 66.161032] ? __lock_acquire+0xc5c/0x3b30
[ 66.162885] ? __fget+0x31c/0x4d0
[ 66.164619] ? tun_chr_compat_ioctl+0x50/0x50
[ 66.166104] tun_chr_ioctl+0x2a/0x40
[ 66.167538] ? tun_chr_ioctl+0x2a/0x40
[ 66.169199] do_vfs_ioctl+0x1a2/0x1150
[ 66.170969] ? rcu_read_lock_held+0x9c/0xb0
[ 66.172735] ? ioctl_preallocate+0x1e0/0x1e0
[ 66.174487] ? __fget+0x33e/0x4d0
[ 66.176170] ? do_dup2+0x4d0/0x4d0
[ 66.177866] ? fput_many+0xe6/0x150
[ 66.179520] ? fput+0x1a/0x20
[ 66.181176] ? security_file_ioctl+0x81/0xb0
[ 66.182909] ksys_ioctl+0x94/0xb0
[ 66.184452] __x64_sys_ioctl+0x73/0xb0
[ 66.186169] do_syscall_64+0xde/0x6c0
[ 66.187908] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 66.189678] RIP: 0033:0x45a729
[ 66.191328] Code: bd b1 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 8b b1 fb ff c3 66 2e 0f 1f 84 00 00 00 00
[ 66.196839] RSP: 002b:00007f08f9987c88 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 66.198849] RAX: ffffffffffffffda RBX: 000000000071bf00 RCX: 000000000045a729
[ 66.200876] RDX: 0000000020000040 RSI: 00000000400454d9 RDI: 0000000000000003
[ 66.202947] RBP: 00007f08f9987ca0 R08: 0000000000000000 R09: 0000000000000000
[ 66.204491] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f08f99886d4
[ 66.205688] R13: 00000000004ac5a5 R14: 00000000006ee8a0 R15: 0000000000000005
[ 66.277453] kobject_add_internal failed for tx-1 (error: -12 parent: queues)
[ 67.499501] syz-executor (6368) used greatest stack depth: 23704 bytes left
[ 70.237146] tipc: TX() has been purged, node left!
[ 75.794671] device bridge_slave_1 left promiscuous mode
[ 75.797354] bridge0: port 2(bridge_slave_1) entered disabled state
[ 75.861863] device bridge_slave_0 left promiscuous mode
[ 75.864646] bridge0: port 1(bridge_slave_0) entered disabled state
[ 79.752910] device hsr_slave_0 left promiscuous mode
[ 79.841067] device hsr_slave_1 left promiscuous mode
[ 79.939879] team0 (unregistering): Port device team_slave_1 removed
[ 79.963767] team0 (unregistering): Port device team_slave_0 removed
[ 79.985156] bond0 (unregistering): (slave bond_slave_1): Releasing backup interface
[ 80.090348] bond0 (unregistering): (slave bond_slave_0): Releasing backup interface
[ 80.352617] bond0 (unregistering): Released all slaves
----------
On Thu, Nov 28, 2019 at 10:56 AM Tetsuo Handa
<[email protected]> wrote:
>
> Hello people involved in commit a3e23f719f5c4a38 ("net-sysfs: call dev_hold if kobject_init_and_add success")
> and commit b8eb718348b8fb30 ("net-sysfs: Fix reference count leak in rx|netdev_queue_add_kobject").
>
> syzbot is reporting that unregister_netdevice() hangs due to underflowing
> device refcount when kobject_init_and_add() failed due to -ENOMEM.
>
Tetsuo, would you happen to have a C reproducer program that creates
the trace you reported?
On this syzbot page, I could not quickly find one whose date matches
when our change was included:
https://syzkaller.appspot.com/bug?id=bae9a2236bfede42cf3d219e6bf6740c583568a4
Jouni, can you please follow up on this report?
Thanks.
Lukas
Tetsuo Handa <[email protected]> writes:
> Hello people involved in commit a3e23f719f5c4a38 ("net-sysfs: call dev_hold if kobject_init_and_add success")
> and commit b8eb718348b8fb30 ("net-sysfs: Fix reference count leak in rx|netdev_queue_add_kobject").
>
> syzbot is reporting that unregister_netdevice() hangs due to underflowing
> device refcount when kobject_init_and_add() failed due to -ENOMEM.
>
> ----------
> 11:25:02 executing program 3 (fault-call:5 fault-nth:2):
> r0 = openat$tun(0xffffffffffffff9c, &(0x7f0000000100)='/dev/net/tun\x00', 0x0, 0x0)
> ioctl$TUNSETIFF(r0, 0x400454ca, &(0x7f0000000000)={'vet\x00\x00\x00\x00\x00\x00\x00\x00\x00\xbdh\x00', 0x43732e5398416f1a})
> ioctl$TUNSETQUEUE(r0, 0x400454d9, &(0x7f00000000c0)={'\x00', 0x400})
> r1 = openat$tun(0xffffffffffffff9c, &(0x7f0000000080)='/dev/net/tun\x00', 0x0, 0x0)
> ioctl$TUNSETIFF(r1, 0x400454ca, &(0x7f0000000000)={'vet\x00\x00\x00\x00\x00\x00\x00\x00\x00\xbdh\x00', 0x43732e5398416f1a})
> ioctl$TUNSETQUEUE(r0, 0x400454d9, &(0x7f0000000040)={'lo\x00', 0x200})
> ----------
>
> ----------
> [ 60.043899] IPVS: ftp: loaded support on port[0] = 21
> [ 60.275782] chnl_net:caif_netlink_parms(): no params data found
> [ 60.305039] bridge0: port 1(bridge_slave_0) entered blocking state
> [ 60.305551] bridge0: port 1(bridge_slave_0) entered disabled state
> [ 60.306366] device bridge_slave_0 entered promiscuous mode
> [ 60.311776] bridge0: port 2(bridge_slave_1) entered blocking state
> [ 60.312032] bridge0: port 2(bridge_slave_1) entered disabled state
> [ 60.312858] device bridge_slave_1 entered promiscuous mode
> [ 60.336705] bond0: (slave bond_slave_0): Enslaving as an active interface with an up link
> [ 60.338321] bond0: (slave bond_slave_1): Enslaving as an active interface with an up link
> [ 60.357851] team0: Port device team_slave_0 added
> [ 60.359250] team0: Port device team_slave_1 added
> [ 60.522829] device hsr_slave_0 entered promiscuous mode
> [ 60.651798] device hsr_slave_1 entered promiscuous mode
> [ 60.790287] netdevsim netdevsim0 netdevsim0: renamed from eth0
> [ 60.854953] netdevsim netdevsim0 netdevsim1: renamed from eth1
> [ 60.911733] netdevsim netdevsim0 netdevsim2: renamed from eth2
> [ 60.974063] netdevsim netdevsim0 netdevsim3: renamed from eth3
> [ 61.109590] bridge0: port 2(bridge_slave_1) entered blocking state
> [ 61.109922] bridge0: port 2(bridge_slave_1) entered forwarding state
> [ 61.110384] bridge0: port 1(bridge_slave_0) entered blocking state
> [ 61.110556] bridge0: port 1(bridge_slave_0) entered forwarding state
> [ 61.151643] 8021q: adding VLAN 0 to HW filter on device bond0
> [ 61.156692] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
> [ 61.164101] bridge0: port 1(bridge_slave_0) entered disabled state
> [ 61.190521] bridge0: port 2(bridge_slave_1) entered disabled state
> [ 61.230466] IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
> [ 61.283759] 8021q: adding VLAN 0 to HW filter on device team0
> [ 61.364323] IPv6: ADDRCONF(NETDEV_CHANGE): bridge_slave_0: link becomes ready
> [ 61.366383] bridge0: port 1(bridge_slave_0) entered blocking state
> [ 61.366568] bridge0: port 1(bridge_slave_0) entered forwarding state
> [ 61.367033] IPv6: ADDRCONF(NETDEV_CHANGE): bridge_slave_1: link becomes ready
> [ 61.367556] bridge0: port 2(bridge_slave_1) entered blocking state
> [ 61.367727] bridge0: port 2(bridge_slave_1) entered forwarding state
> [ 61.372342] IPv6: ADDRCONF(NETDEV_CHANGE): team_slave_0: link becomes ready
> [ 61.377760] IPv6: ADDRCONF(NETDEV_CHANGE): team_slave_1: link becomes ready
> [ 61.381755] IPv6: ADDRCONF(NETDEV_CHANGE): hsr_slave_0: link becomes ready
> [ 61.383474] IPv6: ADDRCONF(NETDEV_CHANGE): hsr_slave_1: link becomes ready
> [ 61.386511] IPv6: ADDRCONF(NETDEV_CHANGE): hsr0: link becomes ready
> [ 61.405483] 8021q: adding VLAN 0 to HW filter on device batadv0
> [ 61.408968] IPv6: ADDRCONF(NETDEV_CHANGE): team0: link becomes ready
> [ 61.412478] IPv6: ADDRCONF(NETDEV_CHANGE): vxcan1: link becomes ready
> [ 61.414712] IPv6: ADDRCONF(NETDEV_CHANGE): vxcan0: link becomes ready
> [ 61.466051] FAULT_INJECTION: forcing a failure.
> name failslab, interval 1, probability 0, space 0, times 1
> [ 61.468544] CPU: 6 PID: 6365 Comm: syz-executor Not tainted 5.4.0+ #223
> [ 61.469778] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/29/2019
> [ 61.473052] Call Trace:
> [ 61.475277] dump_stack+0x163/0x1d5
> [ 61.476597] should_fail+0x655/0x740
> [ 61.477847] ? fault_create_debugfs_attr+0x170/0x170
> [ 61.479192] ? ___might_sleep+0x1de/0x500
> [ 61.480520] __should_failslab+0xde/0x130
> [ 61.481834] should_failslab+0x9/0x14
> [ 61.483145] kmem_cache_alloc+0x28e/0x770
> [ 61.484448] ? memcpy+0x45/0x50
> [ 61.486397] ? kstrdup+0x59/0x70
> [ 61.488702] __kernfs_new_node+0xde/0x6f0
> [ 61.490999] ? kernfs_dop_revalidate+0x380/0x380
> [ 61.493038] ? __put_user_ns+0x60/0x60
> [ 61.495621] ? mark_lock+0x11f/0xc60
> [ 61.497768] ? put_dec+0xc0/0xc0
> [ 61.499926] kernfs_new_node+0x97/0x110
> [ 61.501971] kernfs_create_dir_ns+0x4d/0x150
> [ 61.504019] sysfs_create_dir_ns+0x13b/0x2a0
> [ 61.505973] ? sysfs_create_mount_point+0xb0/0xb0
> [ 61.507978] ? rwlock_bug.part.2+0x90/0x90
> [ 61.509917] ? lock_acquire+0x19f/0x3f0
> [ 61.511893] ? __kasan_check_read+0x11/0x20
> [ 61.513887] ? do_raw_spin_unlock+0x54/0x260
> [ 61.515829] kobject_add_internal+0x223/0x9a0
> [ 61.517825] kobject_init_and_add+0xff/0x170
> [ 61.519811] ? kset_create_and_add+0x180/0x180
> [ 61.521849] ? lock_acquire+0x19f/0x3f0
> [ 61.523865] ? rtnl_lock+0x17/0x20
> [ 61.525787] netdev_queue_update_kobjects+0xeb/0x370
> [ 61.527816] netif_set_real_num_tx_queues+0x188/0x740
> [ 61.530219] ? mutex_lock_io_nested+0x14b0/0x14b0
> [ 61.532255] tun_attach+0x4bd/0x1250
> [ 61.534231] ? lock_acquire+0x19f/0x3f0
> [ 61.536130] __tun_chr_ioctl+0x6fd/0x3b50
> [ 61.537960] ? tun_flow_update+0xba0/0xba0
> [ 61.539709] ? __kasan_check_read+0x11/0x20
> [ 61.541376] ? mark_lock+0x11f/0xc60
> [ 61.543057] ? _kstrtoull+0x11c/0x1c0
> [ 61.544725] ? __kasan_check_read+0x11/0x20
> [ 61.546556] ? __lock_acquire+0xc5c/0x3b30
> [ 61.548282] ? __kasan_check_read+0x11/0x20
> [ 61.550033] ? mark_lock+0x11f/0xc60
> [ 61.551749] ? __kasan_check_read+0x11/0x20
> [ 61.553502] ? __lock_acquire+0xc5c/0x3b30
> [ 61.555247] ? __fget+0x31c/0x4d0
> [ 61.556976] ? tun_chr_compat_ioctl+0x50/0x50
> [ 61.558743] tun_chr_ioctl+0x2a/0x40
> [ 61.560255] ? tun_chr_ioctl+0x2a/0x40
> [ 61.562490] do_vfs_ioctl+0x1a2/0x1150
> [ 61.564138] ? rcu_read_lock_held+0x9c/0xb0
> [ 61.565830] ? ioctl_preallocate+0x1e0/0x1e0
> [ 61.567495] ? __fget+0x33e/0x4d0
> [ 61.569168] ? do_dup2+0x4d0/0x4d0
> [ 61.570928] ? fput_many+0xe6/0x150
> [ 61.572618] ? fput+0x1a/0x20
> [ 61.574382] ? security_file_ioctl+0x81/0xb0
> [ 61.576049] ksys_ioctl+0x94/0xb0
> [ 61.577783] __x64_sys_ioctl+0x73/0xb0
> [ 61.579526] do_syscall_64+0xde/0x6c0
> [ 61.581302] entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [ 61.583010] RIP: 0033:0x45a729
> [ 61.584734] Code: bd b1 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 8b b1 fb ff c3 66 2e 0f 1f 84 00 00 00 00
> [ 61.590407] RSP: 002b:00007f25d540ec88 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [ 61.592488] RAX: ffffffffffffffda RBX: 000000000071bf00 RCX: 000000000045a729
> [ 61.594552] RDX: 0000000020000040 RSI: 00000000400454d9 RDI: 0000000000000003
> [ 61.596829] RBP: 00007f25d540eca0 R08: 0000000000000000 R09: 0000000000000000
> [ 61.598540] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f25d540f6d4
> [ 61.600278] R13: 00000000004ac5a5 R14: 00000000006ee8a0 R15: 0000000000000005
> [ 61.655323] kobject_add_internal failed for tx-1 (error: -12 parent: queues)
> [ 71.760970] unregister_netdevice: waiting for vet to become free. Usage count = -1
> [ 82.028434] unregister_netdevice: waiting for vet to become free. Usage count = -1
> [ 92.140031] unregister_netdevice: waiting for vet to become free. Usage count = -1
> ----------
>
> Worrisome part is that tun_attach() calls tun_set_real_num_queues() at the end of tun_attach()
> but tun_set_real_num_queues() is not handling netif_set_real_num_tx_queues() failure.
> That is, tun_attach() is returning success even if netdev_queue_update_kobjects() from
> netif_set_real_num_tx_queues() failed.
>
> static void tun_set_real_num_queues(struct tun_struct *tun)
> {
> netif_set_real_num_tx_queues(tun->dev, tun->numqueues);
> netif_set_real_num_rx_queues(tun->dev, tun->numqueues);
> }
>
> And I guess that ignoring that failure causes clean-up function to drop a refcount
> which was not held by initialization function. Applying below diff seems to avoid
> this problem. Please check.
>
> ----------
> diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
> index ae3bcb1540ec..562d06c274aa 100644
> --- a/net/core/net-sysfs.c
> +++ b/net/core/net-sysfs.c
> @@ -1459,14 +1459,14 @@ static int netdev_queue_add_kobject(struct net_device *dev, int index)
> struct kobject *kobj = &queue->kobj;
> int error = 0;
>
> + dev_hold(queue->dev);
> +
> kobj->kset = dev->queues_kset;
> error = kobject_init_and_add(kobj, &netdev_queue_ktype, NULL,
> "tx-%u", index);
> if (error)
> goto err;
>
> - dev_hold(queue->dev);
> -
> #ifdef CONFIG_BQL
> error = sysfs_create_group(kobj, &dql_group);
> if (error)
I think this is a workaround for the issue you pointed out above. I
guess we need to implement proper error handling for
netif_set_real_num_tx_queues to avoid this regression. I was a bit concerned
that something relies on netdev_queue_add_kobject and
rx_queue_add_kobject not freeing the reference. This was
it... A reproducer here would help.
> ----------
>
> ----------
> [ 64.482925] IPVS: ftp: loaded support on port[0] = 21
> [ 64.684701] chnl_net:caif_netlink_parms(): no params data found
> [ 64.757440] bridge0: port 1(bridge_slave_0) entered blocking state
> [ 64.757596] bridge0: port 1(bridge_slave_0) entered disabled state
> [ 64.760043] device bridge_slave_0 entered promiscuous mode
> [ 64.761799] bridge0: port 2(bridge_slave_1) entered blocking state
> [ 64.762025] bridge0: port 2(bridge_slave_1) entered disabled state
> [ 64.793334] device bridge_slave_1 entered promiscuous mode
> [ 64.818373] bond0: (slave bond_slave_0): Enslaving as an active interface with an up link
> [ 64.822950] bond0: (slave bond_slave_1): Enslaving as an active interface with an up link
> [ 64.843403] team0: Port device team_slave_0 added
> [ 64.844859] team0: Port device team_slave_1 added
> [ 64.933830] device hsr_slave_0 entered promiscuous mode
> [ 64.972990] device hsr_slave_1 entered promiscuous mode
> [ 65.048057] netdevsim netdevsim0 netdevsim0: renamed from eth0
> [ 65.113612] netdevsim netdevsim0 netdevsim1: renamed from eth1
> [ 65.191758] netdevsim netdevsim0 netdevsim2: renamed from eth2
> [ 65.262611] netdevsim netdevsim0 netdevsim3: renamed from eth3
> [ 65.339507] bridge0: port 2(bridge_slave_1) entered blocking state
> [ 65.339821] bridge0: port 2(bridge_slave_1) entered forwarding state
> [ 65.340340] bridge0: port 1(bridge_slave_0) entered blocking state
> [ 65.340514] bridge0: port 1(bridge_slave_0) entered forwarding state
> [ 65.486729] 8021q: adding VLAN 0 to HW filter on device bond0
> [ 65.545043] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
> [ 65.567266] bridge0: port 1(bridge_slave_0) entered disabled state
> [ 65.592695] bridge0: port 2(bridge_slave_1) entered disabled state
> [ 65.631471] IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
> [ 65.656557] 8021q: adding VLAN 0 to HW filter on device team0
> [ 65.660768] IPv6: ADDRCONF(NETDEV_CHANGE): veth0_to_bridge: link becomes ready
> [ 65.661411] IPv6: ADDRCONF(NETDEV_CHANGE): bridge_slave_0: link becomes ready
> [ 65.661949] bridge0: port 1(bridge_slave_0) entered blocking state
> [ 65.662127] bridge0: port 1(bridge_slave_0) entered forwarding state
> [ 65.693984] IPv6: ADDRCONF(NETDEV_CHANGE): veth1_to_bridge: link becomes ready
> [ 65.697297] IPv6: ADDRCONF(NETDEV_CHANGE): bridge_slave_1: link becomes ready
> [ 65.701488] bridge0: port 2(bridge_slave_1) entered blocking state
> [ 65.703895] bridge0: port 2(bridge_slave_1) entered forwarding state
> [ 65.706370] IPv6: ADDRCONF(NETDEV_CHANGE): veth0_to_bond: link becomes ready
> [ 65.724296] hsr0: Slave A (hsr_slave_0) is not up; please bring it up to get a fully working HSR network
> [ 65.728690] hsr0: Slave B (hsr_slave_1) is not up; please bring it up to get a fully working HSR network
> [ 65.777203] IPv6: ADDRCONF(NETDEV_CHANGE): veth1_to_bond: link becomes ready
> [ 65.782476] IPv6: ADDRCONF(NETDEV_CHANGE): veth0_to_team: link becomes ready
> [ 65.785989] IPv6: ADDRCONF(NETDEV_CHANGE): team_slave_0: link becomes ready
> [ 65.789006] IPv6: ADDRCONF(NETDEV_CHANGE): veth1_to_team: link becomes ready
> [ 65.796973] IPv6: ADDRCONF(NETDEV_CHANGE): team_slave_1: link becomes ready
> [ 65.801313] IPv6: ADDRCONF(NETDEV_CHANGE): veth0_to_hsr: link becomes ready
> [ 65.804748] IPv6: ADDRCONF(NETDEV_CHANGE): hsr_slave_0: link becomes ready
> [ 65.807939] IPv6: ADDRCONF(NETDEV_CHANGE): veth1_to_hsr: link becomes ready
> [ 65.811874] IPv6: ADDRCONF(NETDEV_CHANGE): hsr_slave_1: link becomes ready
> [ 65.814814] IPv6: ADDRCONF(NETDEV_CHANGE): team0: link becomes ready
> [ 65.824311] IPv6: ADDRCONF(NETDEV_CHANGE): hsr0: link becomes ready
> [ 65.835888] 8021q: adding VLAN 0 to HW filter on device batadv0
> [ 65.840711] IPv6: ADDRCONF(NETDEV_CHANGE): vxcan1: link becomes ready
> [ 65.843289] IPv6: ADDRCONF(NETDEV_CHANGE): vxcan0: link becomes ready
> [ 66.055083] FAULT_INJECTION: forcing a failure.
> name failslab, interval 1, probability 0, space 0, times 1
> [ 66.058933] CPU: 1 PID: 6375 Comm: syz-executor Not tainted 5.4.0+ #224
> [ 66.060904] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/29/2019
> [ 66.065868] Call Trace:
> [ 66.068993] dump_stack+0x163/0x1d5
> [ 66.071611] should_fail+0x655/0x740
> [ 66.074082] ? fault_create_debugfs_attr+0x170/0x170
> [ 66.076772] ? ___might_sleep+0x1de/0x500
> [ 66.079324] __should_failslab+0xde/0x130
> [ 66.082540] should_failslab+0x9/0x14
> [ 66.084806] kmem_cache_alloc+0x28e/0x770
> [ 66.087262] ? memcpy+0x45/0x50
> [ 66.089712] ? kstrdup+0x59/0x70
> [ 66.092133] __kernfs_new_node+0xde/0x6f0
> [ 66.094512] ? kernfs_dop_revalidate+0x380/0x380
> [ 66.096934] ? __put_user_ns+0x60/0x60
> [ 66.099303] ? mark_lock+0x11f/0xc60
> [ 66.101585] ? put_dec+0xc0/0xc0
> [ 66.104388] kernfs_new_node+0x97/0x110
> [ 66.106537] kernfs_create_dir_ns+0x4d/0x150
> [ 66.108622] sysfs_create_dir_ns+0x13b/0x2a0
> [ 66.110584] ? sysfs_create_mount_point+0xb0/0xb0
> [ 66.112769] ? rwlock_bug.part.2+0x90/0x90
> [ 66.114866] ? lock_acquire+0x19f/0x3f0
> [ 66.116925] ? __kasan_check_read+0x11/0x20
> [ 66.118888] ? do_raw_spin_unlock+0x54/0x260
> [ 66.120909] kobject_add_internal+0x223/0x9a0
> [ 66.122835] kobject_init_and_add+0xff/0x170
> [ 66.124860] ? kset_create_and_add+0x180/0x180
> [ 66.126944] ? lock_acquire+0x19f/0x3f0
> [ 66.129143] ? rtnl_lock+0x17/0x20
> [ 66.131087] netdev_queue_update_kobjects+0x118/0x390
> [ 66.133028] ? __kasan_check_read+0x11/0x20
> [ 66.134944] netif_set_real_num_tx_queues+0x188/0x740
> [ 66.137012] ? mutex_lock_io_nested+0x14b0/0x14b0
> [ 66.139043] tun_attach+0x4bd/0x1250
> [ 66.140950] ? lock_acquire+0x19f/0x3f0
> [ 66.142870] __tun_chr_ioctl+0x6fd/0x3b50
> [ 66.144724] ? tun_flow_update+0xba0/0xba0
> [ 66.146532] ? __kasan_check_read+0x11/0x20
> [ 66.148821] ? mark_lock+0x11f/0xc60
> [ 66.150635] ? _kstrtoull+0x11c/0x1c0
> [ 66.152399] ? __kasan_check_read+0x11/0x20
> [ 66.154056] ? __lock_acquire+0xc5c/0x3b30
> [ 66.155701] ? __kasan_check_read+0x11/0x20
> [ 66.157438] ? mark_lock+0x11f/0xc60
> [ 66.159265] ? __kasan_check_read+0x11/0x20
> [ 66.161032] ? __lock_acquire+0xc5c/0x3b30
> [ 66.162885] ? __fget+0x31c/0x4d0
> [ 66.164619] ? tun_chr_compat_ioctl+0x50/0x50
> [ 66.166104] tun_chr_ioctl+0x2a/0x40
> [ 66.167538] ? tun_chr_ioctl+0x2a/0x40
> [ 66.169199] do_vfs_ioctl+0x1a2/0x1150
> [ 66.170969] ? rcu_read_lock_held+0x9c/0xb0
> [ 66.172735] ? ioctl_preallocate+0x1e0/0x1e0
> [ 66.174487] ? __fget+0x33e/0x4d0
> [ 66.176170] ? do_dup2+0x4d0/0x4d0
> [ 66.177866] ? fput_many+0xe6/0x150
> [ 66.179520] ? fput+0x1a/0x20
> [ 66.181176] ? security_file_ioctl+0x81/0xb0
> [ 66.182909] ksys_ioctl+0x94/0xb0
> [ 66.184452] __x64_sys_ioctl+0x73/0xb0
> [ 66.186169] do_syscall_64+0xde/0x6c0
> [ 66.187908] entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [ 66.189678] RIP: 0033:0x45a729
> [ 66.191328] Code: bd b1 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 8b b1 fb ff c3 66 2e 0f 1f 84 00 00 00 00
> [ 66.196839] RSP: 002b:00007f08f9987c88 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [ 66.198849] RAX: ffffffffffffffda RBX: 000000000071bf00 RCX: 000000000045a729
> [ 66.200876] RDX: 0000000020000040 RSI: 00000000400454d9 RDI: 0000000000000003
> [ 66.202947] RBP: 00007f08f9987ca0 R08: 0000000000000000 R09: 0000000000000000
> [ 66.204491] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f08f99886d4
> [ 66.205688] R13: 00000000004ac5a5 R14: 00000000006ee8a0 R15: 0000000000000005
> [ 66.277453] kobject_add_internal failed for tx-1 (error: -12 parent: queues)
> [ 67.499501] syz-executor (6368) used greatest stack depth: 23704 bytes left
> [ 70.237146] tipc: TX() has been purged, node left!
> [ 75.794671] device bridge_slave_1 left promiscuous mode
> [ 75.797354] bridge0: port 2(bridge_slave_1) entered disabled state
> [ 75.861863] device bridge_slave_0 left promiscuous mode
> [ 75.864646] bridge0: port 1(bridge_slave_0) entered disabled state
> [ 79.752910] device hsr_slave_0 left promiscuous mode
> [ 79.841067] device hsr_slave_1 left promiscuous mode
> [ 79.939879] team0 (unregistering): Port device team_slave_1 removed
> [ 79.963767] team0 (unregistering): Port device team_slave_0 removed
> [ 79.985156] bond0 (unregistering): (slave bond_slave_1): Releasing backup interface
> [ 80.090348] bond0 (unregistering): (slave bond_slave_0): Releasing backup interface
> [ 80.352617] bond0 (unregistering): Released all slaves
> ----------
Tetsuo Handa <[email protected]> writes:
> [ 61.584734] Code: bd b1 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 8b b1 fb ff c3 66 2e 0f 1f 84 00 00 00 00
> [ 61.590407] RSP: 002b:00007f25d540ec88 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [ 61.592488] RAX: ffffffffffffffda RBX: 000000000071bf00 RCX: 000000000045a729
> [ 61.594552] RDX: 0000000020000040 RSI: 00000000400454d9 RDI: 0000000000000003
> [ 61.596829] RBP: 00007f25d540eca0 R08: 0000000000000000 R09: 0000000000000000
> [ 61.598540] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f25d540f6d4
> [ 61.600278] R13: 00000000004ac5a5 R14: 00000000006ee8a0 R15: 0000000000000005
> [ 61.655323] kobject_add_internal failed for tx-1 (error: -12 parent: queues)
> [ 71.760970] unregister_netdevice: waiting for vet to become free. Usage count = -1
> [ 82.028434] unregister_netdevice: waiting for vet to become free. Usage count = -1
> [ 92.140031] unregister_netdevice: waiting for vet to become free. Usage count = -1
> ----------
>
> The worrisome part is that tun_attach() calls tun_set_real_num_queues() as its last step,
> but tun_set_real_num_queues() does not handle netif_set_real_num_tx_queues() failure.
> That is, tun_attach() returns success even if netdev_queue_update_kobjects() called from
> netif_set_real_num_tx_queues() failed.
>
> static void tun_set_real_num_queues(struct tun_struct *tun)
> {
> 	netif_set_real_num_tx_queues(tun->dev, tun->numqueues);
> 	netif_set_real_num_rx_queues(tun->dev, tun->numqueues);
> }
>
> And I guess that ignoring that failure causes the clean-up function to drop a refcount
> which was not taken by the initialization function. Applying the diff below seems to avoid
> this problem. Please check.
>
> ----------
> diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
> index ae3bcb1540ec..562d06c274aa 100644
> --- a/net/core/net-sysfs.c
> +++ b/net/core/net-sysfs.c
> @@ -1459,14 +1459,14 @@ static int netdev_queue_add_kobject(struct net_device *dev, int index)
> struct kobject *kobj = &queue->kobj;
> int error = 0;
>
> + dev_hold(queue->dev);
> +
> kobj->kset = dev->queues_kset;
> error = kobject_init_and_add(kobj, &netdev_queue_ktype, NULL,
> "tx-%u", index);
> if (error)
> goto err;
>
> - dev_hold(queue->dev);
> -
> #ifdef CONFIG_BQL
> error = sysfs_create_group(kobj, &dql_group);
> if (error)
Now, after reproducing the issue, I think this is actually the proper fix
for it. It's not related to the missing error handling in
tun_set_real_num_queues as I commented earlier. Can you prepare a patch
for this?
BR,
Jouni Högander
On 2019/12/05 19:00, Jouni Högander wrote:
>> diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
>> index ae3bcb1540ec..562d06c274aa 100644
>> --- a/net/core/net-sysfs.c
>> +++ b/net/core/net-sysfs.c
>> @@ -1459,14 +1459,14 @@ static int netdev_queue_add_kobject(struct net_device *dev, int index)
>> struct kobject *kobj = &queue->kobj;
>> int error = 0;
>>
>> + dev_hold(queue->dev);
>> +
>> kobj->kset = dev->queues_kset;
>> error = kobject_init_and_add(kobj, &netdev_queue_ktype, NULL,
>> "tx-%u", index);
>> if (error)
>> goto err;
>>
>> - dev_hold(queue->dev);
>> -
>> #ifdef CONFIG_BQL
>> error = sysfs_create_group(kobj, &dql_group);
>> if (error)
>
> Now, after reproducing the issue, I think this is actually the proper fix
> for it. It's not related to the missing error handling in
> tun_set_real_num_queues as I commented earlier. Can you prepare a patch
> for this?
You can write the patch; I don't know about commit a3e23f719f5c4a38
("net-sysfs: call dev_hold if kobject_init_and_add success").
I was wondering how the caller can tell whether to drop the refcount,
since the caller cannot know which call (kobject_init_and_add() or
sysfs_create_group()) returned an error. Therefore, always taking the
refcount seems to be the proper fix...
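
To illustrate the reasoning above, here is a minimal userspace sketch in C.
It is a toy model, not kernel code: all names (toy_dev, toy_queue,
toy_queue_add, toy_refcnt_after_add) are hypothetical, and it assumes, as
guessed earlier in this thread, that a failing kobject_init_and_add() ends
up running a release callback that unconditionally drops the device
reference.

```c
#include <stdbool.h>

/* Toy stand-ins for the device and its tx queue; not the kernel's types. */
struct toy_dev { int refcnt; };
struct toy_queue { struct toy_dev *dev; };

static void toy_dev_hold(struct toy_dev *d) { d->refcnt++; }
static void toy_dev_put(struct toy_dev *d)  { d->refcnt--; }

/* Assumption: the release callback always drops one device reference. */
static void toy_queue_release(struct toy_queue *q)
{
	toy_dev_put(q->dev);
}

/*
 * Toy model of the add path.
 * hold_first: take the reference before the "add" step (the proposed order).
 * add_fails:  simulate the "add" step failing, as with fault injection.
 */
static int toy_queue_add(struct toy_queue *q, bool hold_first, bool add_fails)
{
	if (hold_first)
		toy_dev_hold(q->dev);

	if (add_fails) {
		/* Assumed clean-up: the release callback runs on failure. */
		toy_queue_release(q);
		return -12; /* same error value as in the log above */
	}

	if (!hold_first)
		toy_dev_hold(q->dev);

	return 0;
}

/* Returns the device refcount left behind by a single add attempt. */
int toy_refcnt_after_add(bool hold_first, bool add_fails)
{
	struct toy_dev dev = { .refcnt = 0 };
	struct toy_queue q = { .dev = &dev };

	(void)toy_queue_add(&q, hold_first, add_fails);
	return dev.refcnt;
}
```

In this model, toy_refcnt_after_add(false, true) leaves the count at -1,
which matches the "Usage count = -1" messages in the log, while
toy_refcnt_after_add(true, true) leaves it balanced at 0. The success path
ends at 1 in both orders, so taking the reference first only changes the
failure case.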