2022-11-18 11:58:32

by syzbot

Subject: [syzbot] unregister_netdevice: waiting for DEV to become free (7)

Hello,

syzbot found the following issue on:

HEAD commit: 9c8774e629a1 net: eql: Use kzalloc instead of kmalloc/memset
git tree: net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=17bf6cc8f00000
kernel config: https://syzkaller.appspot.com/x/.config?x=9eb259db6b1893cf
dashboard link: https://syzkaller.appspot.com/bug?extid=5e70d01ee8985ae62a3b
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1136d592f00000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1193ae64f00000

Bisection is inconclusive: the issue happens on the oldest tested release.

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=167c33a2f00000
final oops: https://syzkaller.appspot.com/x/report.txt?x=157c33a2f00000
console output: https://syzkaller.appspot.com/x/log.txt?x=117c33a2f00000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]

iwpm_register_pid: Unable to send a nlmsg (client = 2)
infiniband syj1: RDMA CMA: cma_listen_on_dev, error -98
unregister_netdevice: waiting for vlan0 to become free. Usage count = 2


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches


2022-11-18 14:31:58

by Dmitry Vyukov

Subject: Re: [syzbot] unregister_netdevice: waiting for DEV to become free (7)

On Fri, 18 Nov 2022 at 12:39, syzbot
<[email protected]> wrote:
>
> [...]
> iwpm_register_pid: Unable to send a nlmsg (client = 2)
> infiniband syj1: RDMA CMA: cma_listen_on_dev, error -98
> unregister_netdevice: waiting for vlan0 to become free. Usage count = 2

+RDMA maintainers

There are 4 reproducers and all contain:

r0 = socket$nl_rdma(0x10, 0x3, 0x14)
sendmsg$RDMA_NLDEV_CMD_NEWLINK(...)
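For reference, the syzkaller call `socket$nl_rdma(0x10, 0x3, 0x14)` is an ordinary netlink socket on the RDMA bus: in the Linux UAPI headers 0x10 is AF_NETLINK, 0x3 is SOCK_RAW and 0x14 is NETLINK_RDMA. A minimal userspace sketch spelling it with named constants (the helper `open_rdma_netlink` is hypothetical, and building the NEWLINK request itself is not shown):

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>    /* socket(), AF_NETLINK, SOCK_RAW */
#include <linux/netlink.h> /* NETLINK_RDMA */
#include <unistd.h>        /* close() */

/* The raw numbers in socket$nl_rdma(0x10, 0x3, 0x14), checked at compile time. */
static_assert(AF_NETLINK == 0x10, "first argument is AF_NETLINK");
static_assert(SOCK_RAW == 0x3, "second argument is SOCK_RAW");
static_assert(NETLINK_RDMA == 0x14, "third argument is NETLINK_RDMA");

/* Open the same kind of socket the reproducers use. Returns an fd, or
 * -errno if the kernel refuses (e.g. no RDMA netlink support available). */
static int open_rdma_netlink(void)
{
	int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_RDMA);

	return fd >= 0 ? fd : -errno;
}
```

The RDMA_NLDEV_CMD_NEWLINK message sent on such a socket is the request that asks the kernel to create a software RDMA device (rxe or siw) on top of an existing netdev, presumably vlan0 in this report.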

Also the preceding print looks related (a bug in the error handling
path there?):

infiniband syj1: RDMA CMA: cma_listen_on_dev, error -98
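The `error -98` in that print follows the usual kernel convention of returning a negative errno; on x86-64 (and most Linux architectures) 98 is EADDRINUSE, "Address already in use". A tiny userspace helper for decoding such values (hypothetical name; the message text assumes glibc):

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Kernel functions return -errno on failure and log lines often print that
 * value unchanged, e.g. "error -98". Flip the sign and ask libc for the
 * description. */
static const char *describe_kernel_ret(int ret)
{
	assert(ret < 0);         /* only error returns are meaningful here */
	return strerror(-ret);   /* glibc: -98 -> "Address already in use" */
}
```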


2022-11-22 03:04:12

by Jason Gunthorpe

Subject: Re: [syzbot] unregister_netdevice: waiting for DEV to become free (7)

On Fri, Nov 18, 2022 at 02:28:53PM +0100, Dmitry Vyukov wrote:
> On Fri, 18 Nov 2022 at 12:39, syzbot
> <[email protected]> wrote:
> > [...]
> > iwpm_register_pid: Unable to send a nlmsg (client = 2)
> > infiniband syj1: RDMA CMA: cma_listen_on_dev, error -98
> > unregister_netdevice: waiting for vlan0 to become free. Usage count = 2
>
> +RDMA maintainers
>
> There are 4 reproducers and all contain:
>
> r0 = socket$nl_rdma(0x10, 0x3, 0x14)
> sendmsg$RDMA_NLDEV_CMD_NEWLINK(...)
>
> Also the preceding print looks related (a bug in the error handling
> path there?):
>
> infiniband syj1: RDMA CMA: cma_listen_on_dev, error -98

I'm pretty sure it is an rxe bug

ib_device_set_netdev() will hold the netdev until the caller destroys
the ib_device

rxe calls it during rxe_register_device() because the user asked for a
stacked ib_device on top of the netdev

Presumably rxe needs to have a notifier to also self destroy the rxe
device if the underlying net device is to be destroyed?
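The lifetime problem described here can be shown with a small userspace toy model. All names below are hypothetical stand-ins, not kernel APIs: `toy_ibdev_set_netdev()` plays the role of ib_device_set_netdev() taking a reference, and `toy_unregister_notifier()` plays the role of a NETDEV_UNREGISTER notifier that tears down the stacked device. The model assumes the netdev counts as free once only its own registration reference (count 1) is left, so a stacked device that never lets go shows up as "Usage count = 2":

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct net_device: only the usage count matters here. */
struct toy_netdev {
	int refcnt; /* 1 == only the registration's own reference is left */
};

/* Toy stand-in for an rxe/siw ib_device stacked on a netdev. */
struct toy_ibdev {
	struct toy_netdev *ndev; /* non-NULL while we pin the netdev */
};

static void toy_hold(struct toy_netdev *nd)
{
	nd->refcnt++;
}

static void toy_put(struct toy_netdev *nd)
{
	assert(nd->refcnt > 0);
	nd->refcnt--;
}

/* Like ib_device_set_netdev(): keep the netdev pinned until the ibdev dies. */
static void toy_ibdev_set_netdev(struct toy_ibdev *ib, struct toy_netdev *nd)
{
	toy_hold(nd);
	ib->ndev = nd;
}

/* Destroying the ibdev is the only place the pinned reference is dropped. */
static void toy_ibdev_destroy(struct toy_ibdev *ib)
{
	if (ib->ndev) {
		toy_put(ib->ndev);
		ib->ndev = NULL;
	}
}

/* Like a NETDEV_UNREGISTER notifier: tear down the ibdev stacked on nd. */
static void toy_unregister_notifier(struct toy_ibdev *ib, struct toy_netdev *nd)
{
	if (ib->ndev == nd)
		toy_ibdev_destroy(ib);
}

/* Unregister nd and return the usage count left for the wait loop. */
static int toy_unregister(struct toy_netdev *nd, struct toy_ibdev *ib,
			  bool have_notifier)
{
	if (have_notifier)
		toy_unregister_notifier(ib, nd);
	return nd->refcnt;
}
```

Without the notifier the count stays at 2 and the wait loop would spin forever, which matches the hang in the report; with it the count drops back to 1.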

Can someone from rxe check into this?

Jason

2022-11-23 10:07:57

by Guoqing Jiang

Subject: Re: [syzbot] unregister_netdevice: waiting for DEV to become free (7)



On 11/22/22 11:28 AM, wangyufen wrote:
>
> On 2022/11/22 10:13, Jason Gunthorpe wrote:
>> On Fri, Nov 18, 2022 at 02:28:53PM +0100, Dmitry Vyukov wrote:
>>> On Fri, 18 Nov 2022 at 12:39, syzbot
>>> <[email protected]> wrote:
>>>> [...]
>>>> iwpm_register_pid: Unable to send a nlmsg (client = 2)
>>>> infiniband syj1: RDMA CMA: cma_listen_on_dev, error -98
>>>> unregister_netdevice: waiting for vlan0 to become free. Usage count = 2
>>>
>>> +RDMA maintainers
>>>
>>> There are 4 reproducers and all contain:
>>>
>>> r0 = socket$nl_rdma(0x10, 0x3, 0x14)
>>> sendmsg$RDMA_NLDEV_CMD_NEWLINK(...)
>>>
>>> Also the preceding print looks related (a bug in the error handling
>>> path there?):
>>>
>>> infiniband syj1: RDMA CMA: cma_listen_on_dev, error -98
>>
>> I'm pretty sure it is an rxe bug
>>
>> ib_device_set_netdev() will hold the netdev until the caller destroys
>> the ib_device
>>
>> rxe calls it during rxe_register_device() because the user asked for a
>> stacked ib_device on top of the netdev
>>
>> Presumably rxe needs to have a notifier to also self destroy the rxe
>> device if the underlying net device is to be destroyed?
>>
>> Can someone from rxe check into this?
>
> The following patch may fix the issue:
>
> --- a/drivers/infiniband/core/cma.c
> +++ b/drivers/infiniband/core/cma.c
> @@ -4049,6 +4049,9 @@ int rdma_listen(struct rdma_cm_id *id, int backlog)
>         return 0;
>  err:
>         id_priv->backlog = 0;
> +       if (id_priv->cma_dev)
> +               cma_release_dev(id_priv);
> +
>         /*
>          * All the failure paths that lead here will not allow the req_handler's
>          * to have run.
>

But it is the caller's responsibility to destroy it since commit
dd37d2f59eb8.
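A toy sketch of the ownership rule being described, assuming (as stated above, not re-checked against dd37d2f59eb8) that a failed listen leaves the id attached to its device and that the caller's destroy drops that reference. All names are hypothetical, not the cma.c structures; `toy_release_dev()` clears the pointer, so the NULL check in `toy_destroy_id()` releases at most once:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-ins; not the structures in cma.c. */
struct toy_dev {
	int refcnt;
};

struct toy_id {
	struct toy_dev *dev; /* set while the id holds a device reference */
};

/* Bind: attach the id to a device and take a reference on it. */
static void toy_bind(struct toy_id *id, struct toy_dev *dev)
{
	dev->refcnt++;
	id->dev = dev;
}

/* Drop the id's device reference and forget the device. */
static void toy_release_dev(struct toy_id *id)
{
	assert(id->dev && id->dev->refcnt > 0);
	id->dev->refcnt--;
	id->dev = NULL;
}

/* Listen: bind, then possibly fail (fail_errno != 0). The error path does
 * not undo the bind; the caller is expected to destroy the id. */
static int toy_listen(struct toy_id *id, struct toy_dev *dev, int fail_errno)
{
	toy_bind(id, dev);
	if (fail_errno)
		return -fail_errno;
	return 0;
}

/* Caller-side cleanup: release whatever the id still holds. */
static void toy_destroy_id(struct toy_id *id)
{
	if (id->dev)
		toy_release_dev(id);
}
```

In this model the reference is balanced only because the caller destroys the id after the failure; an extra release inside the error path would be redundant here, and without the NULL guard it would drop the count twice.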

> The causes are as follows:
>
> rdma_listen()
>   rdma_bind_addr()
>     cma_acquire_dev_by_src_ip()
>       cma_attach_to_dev()
>         _cma_attach_to_dev()
>           cma_dev_get()

Thanks for the analysis.

And the two callers of cma_listen_on_dev seem to behave differently
when it comes to handling failure.

1. cma_listen_on_all, which calls both
            list_del_init(&to_destroy->device_item)
    and
            rdma_destroy_id(&to_destroy->id)

2. cma_add_one invokes cma_process_remove to delete to_destroy.
cma_process_remove calls both list_del_init(&id_priv->listen_item)
and list_del_init(&id_priv->device_item), but it doesn't call
rdma_destroy_id(&dev_id_priv->id), which is also different from
_cma_cancel_listens.

I am wondering if this is needed.

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index cc2222b85c88..48e283d1389b 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -5231,6 +5231,7 @@ static void cma_process_remove(struct cma_device *cma_dev)
                cma_id_get(id_priv);
                mutex_unlock(&lock);

+               rdma_destroy_id(&dev_id_priv->id);
                cma_send_device_removal_put(id_priv);

                mutex_lock(&lock);
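For readers less familiar with the list helpers named above: list_del_init() unlinks an entry and then re-initialises it to point at itself, so list_empty() on the entry is true afterwards and unlinking it a second time is harmless. A userspace re-implementation of just those semantics (modelled on include/linux/list.h, not copied from it):

```c
#include <assert.h>
#include <stdbool.h>

/* Circular doubly linked list node, in the style of the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* An empty list (or an unlinked entry) points at itself. */
static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Insert n just before head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink e from whatever list it is on, then make it self-linked again. */
static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	init_list_head(e);
}
```

Because a self-linked entry's neighbours are itself, a second list_del_init() only rewrites the entry's own pointers, which is why two teardown paths can both unlink the same item without corrupting the list.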

Thanks,
Guoqing

2022-11-24 00:40:55

by Jason Gunthorpe

Subject: Re: [syzbot] unregister_netdevice: waiting for DEV to become free (7)

On Wed, Nov 23, 2022 at 05:45:53PM +0800, Guoqing Jiang wrote:
> But it is the caller's responsibility to destroy it since commit
> dd37d2f59eb8.
>
> > The causes are as follows:
> >
> > rdma_listen()
> >   rdma_bind_addr()
> >     cma_acquire_dev_by_src_ip()
> >       cma_attach_to_dev()
> >         _cma_attach_to_dev()
> >           cma_dev_get()
>
> Thanks for the analysis.
>
> And for the two callers of cma_listen_on_dev, looks they have
> different behaviors with regard to handling failure.

Yes, the CM is not the problem, and that print from it is unrelated

I patched in netdevice_tracker and got this:

[ 237.475070][ T7541] unregister_netdevice: waiting for vlan0 to become free. Usage count = 2
[ 237.477311][ T7541] leaked reference.
[ 237.478378][ T7541] ib_device_set_netdev+0x266/0x730
[ 237.479848][ T7541] siw_newlink+0x4e0/0xfd0
[ 237.481100][ T7541] nldev_newlink+0x35c/0x5c0
[ 237.482121][ T7541] rdma_nl_rcv_msg+0x36d/0x690
[ 237.483312][ T7541] rdma_nl_rcv+0x2ee/0x430
[ 237.484483][ T7541] netlink_unicast+0x543/0x7f0
[ 237.485746][ T7541] netlink_sendmsg+0x918/0xe20
[ 237.486866][ T7541] sock_sendmsg+0xcf/0x120
[ 237.488006][ T7541] ____sys_sendmsg+0x70d/0x8b0
[ 237.489294][ T7541] ___sys_sendmsg+0x11d/0x1b0
[ 237.490404][ T7541] __sys_sendmsg+0xfa/0x1d0
[ 237.491451][ T7541] do_syscall_64+0x35/0xb0
[ 237.492566][ T7541] entry_SYSCALL_64_after_hwframe+0x63/0xcd

Which seems to confirm my original prediction, except this is siw, not
rxe.

Maybe rxe was the wrong guess, or maybe it is troubled too in other
reports?
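Why the tracker pinpoints the culprit: with CONFIG_NET_DEV_REFCNT_TRACKER enabled, each tracked hold records where the reference was taken, and references still outstanding while the device waits to be freed are printed as "leaked reference." followed by the saved stack, as in the log above. A toy userspace version of that bookkeeping (all names hypothetical; a string tag stands in for the stack trace):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TRK_MAX_REFS 8

/* One slot per outstanding reference; NULL means the slot is free. */
struct trk_tracker {
	const char *holder[TRK_MAX_REFS];
};

/* Record a reference taken by `who`; the slot index is the handle. */
static int trk_hold(struct trk_tracker *t, const char *who)
{
	for (int i = 0; i < TRK_MAX_REFS; i++) {
		if (!t->holder[i]) {
			t->holder[i] = who;
			return i;
		}
	}
	return -1; /* out of slots */
}

/* Release the reference identified by `handle`. */
static void trk_put(struct trk_tracker *t, int handle)
{
	assert(handle >= 0 && handle < TRK_MAX_REFS && t->holder[handle]);
	t->holder[handle] = NULL;
}

/* Print every reference still held and return how many there are. */
static int trk_report_leaks(const struct trk_tracker *t, FILE *out)
{
	int n = 0;

	for (int i = 0; i < TRK_MAX_REFS; i++) {
		if (t->holder[i]) {
			fprintf(out, "leaked reference.\n  %s\n", t->holder[i]);
			n++;
		}
	}
	return n;
}
```

Releases erase their own record, so whatever survives to the report is exactly the set of holders that never let go, here the ib_device_set_netdev() caller.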

Jason