2024-01-02 03:53:18

by Shifeng Li

[permalink] [raw]
Subject: [PATCH] RDMA/device: Fix a race between mad_client and cm_client init

The mad_client will be initialized in enable_device_and_get(), while the
devices_rwsem will be downgraded to a read semaphore. There is a window
that leads to the failed initialization for cm_client, since it can not
get matched mad port from ib_mad_port_list, and the matched mad port will
be added to the list after that.

mad_client | cm_client
------------------|--------------------------------------------------------
ib_register_device|
enable_device_and_get
down_write(&devices_rwsem)
xa_set_mark(&devices, DEVICE_REGISTERED)
downgrade_write(&devices_rwsem)
|
|ib_cm_init
|ib_register_client(&cm_client)
|down_read(&devices_rwsem)
|xa_for_each_marked (&devices, DEVICE_REGISTERED)
|add_client_context
|cm_add_one
|ib_register_mad_agent
|ib_get_mad_port
|__ib_get_mad_port
|list_for_each_entry(entry, &ib_mad_port_list, port_list)
|return NULL
|up_read(&devices_rwsem)
|
add_client_context|
ib_mad_init_device|
ib_mad_port_open |
list_add_tail(&port_priv->port_list, &ib_mad_port_list)
up_read(&devices_rwsem)
|

Fix it by using the devices_rwsem write semaphore to protect the mad_client
init flow in enable_device_and_get().

Fixes: d0899892edd0 ("RDMA/device: Provide APIs from the core code to help unregistration")
Cc: Shifeng Li <[email protected]>
Signed-off-by: Shifeng Li <[email protected]>
---
drivers/infiniband/core/device.c | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 67bcea7a153c..85782786993d 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -1315,12 +1315,6 @@ static int enable_device_and_get(struct ib_device *device)
down_write(&devices_rwsem);
xa_set_mark(&devices, device->index, DEVICE_REGISTERED);

- /*
- * By using downgrade_write() we ensure that no other thread can clear
- * DEVICE_REGISTERED while we are completing the client setup.
- */
- downgrade_write(&devices_rwsem);
-
if (device->ops.enable_driver) {
ret = device->ops.enable_driver(device);
if (ret)
@@ -1337,7 +1331,7 @@ static int enable_device_and_get(struct ib_device *device)
if (!ret)
ret = add_compat_devs(device);
out:
- up_read(&devices_rwsem);
+ up_write(&devices_rwsem);
return ret;
}

--
2.25.1



2024-01-02 08:58:30

by Leon Romanovsky

[permalink] [raw]
Subject: Re: [PATCH] RDMA/device: Fix a race between mad_client and cm_client init

On Mon, Jan 01, 2024 at 07:43:35PM -0800, Shifeng Li wrote:
> The mad_client will be initialized in enable_device_and_get(), while the
> devices_rwsem will be downgraded to a read semaphore. There is a window
> that leads to the failed initialization for cm_client, since it can not
> get matched mad port from ib_mad_port_list, and the matched mad port will
> be added to the list after that.
>
> mad_client | cm_client
> ------------------|--------------------------------------------------------
> ib_register_device|
> enable_device_and_get
> down_write(&devices_rwsem)
> xa_set_mark(&devices, DEVICE_REGISTERED)
> downgrade_write(&devices_rwsem)
> |
> |ib_cm_init
> |ib_register_client(&cm_client)
> |down_read(&devices_rwsem)
> |xa_for_each_marked (&devices, DEVICE_REGISTERED)
> |add_client_context
> |cm_add_one
> |ib_register_mad_agent
> |ib_get_mad_port
> |__ib_get_mad_port
> |list_for_each_entry(entry, &ib_mad_port_list, port_list)
> |return NULL
> |up_read(&devices_rwsem)
> |
> add_client_context|
> ib_mad_init_device|
> ib_mad_port_open |
> list_add_tail(&port_priv->port_list, &ib_mad_port_list)
> up_read(&devices_rwsem)
> |

How is this stack possible?

ib_register_device() is called by drivers and happens much later than ib_cm_init().

Thanks

>
> Fix it by using the devices_rwsem write semaphore to protect the mad_client
> init flow in enable_device_and_get().
>
> Fixes: d0899892edd0 ("RDMA/device: Provide APIs from the core code to help unregistration")
> Cc: Shifeng Li <[email protected]>
> Signed-off-by: Shifeng Li <[email protected]>
> ---
> drivers/infiniband/core/device.c | 8 +-------
> 1 file changed, 1 insertion(+), 7 deletions(-)
>
> diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
> index 67bcea7a153c..85782786993d 100644
> --- a/drivers/infiniband/core/device.c
> +++ b/drivers/infiniband/core/device.c
> @@ -1315,12 +1315,6 @@ static int enable_device_and_get(struct ib_device *device)
> down_write(&devices_rwsem);
> xa_set_mark(&devices, device->index, DEVICE_REGISTERED);
>
> - /*
> - * By using downgrade_write() we ensure that no other thread can clear
> - * DEVICE_REGISTERED while we are completing the client setup.
> - */
> - downgrade_write(&devices_rwsem);
> -
> if (device->ops.enable_driver) {
> ret = device->ops.enable_driver(device);
> if (ret)
> @@ -1337,7 +1331,7 @@ static int enable_device_and_get(struct ib_device *device)
> if (!ret)
> ret = add_compat_devs(device);
> out:
> - up_read(&devices_rwsem);
> + up_write(&devices_rwsem);
> return ret;
> }
>
> --
> 2.25.1
>
>

2024-01-02 11:53:13

by Shifeng Li

[permalink] [raw]
Subject: Re: [PATCH] RDMA/device: Fix a race between mad_client and cm_client init

On 2024/1/2 16:58, Leon Romanovsky wrote:
> On Mon, Jan 01, 2024 at 07:43:35PM -0800, Shifeng Li wrote:
>> The mad_client will be initialized in enable_device_and_get(), while the
>> devices_rwsem will be downgraded to a read semaphore. There is a window
>> that leads to the failed initialization for cm_client, since it can not
>> get matched mad port from ib_mad_port_list, and the matched mad port will
>> be added to the list after that.
>>
>> mad_client | cm_client
>> ------------------|--------------------------------------------------------
>> ib_register_device|
>> enable_device_and_get
>> down_write(&devices_rwsem)
>> xa_set_mark(&devices, DEVICE_REGISTERED)
>> downgrade_write(&devices_rwsem)
>> |
>> |ib_cm_init
>> |ib_register_client(&cm_client)
>> |down_read(&devices_rwsem)
>> |xa_for_each_marked (&devices, DEVICE_REGISTERED)
>> |add_client_context
>> |cm_add_one
>> |ib_register_mad_agent
>> |ib_get_mad_port
>> |__ib_get_mad_port
>> |list_for_each_entry(entry, &ib_mad_port_list, port_list)
>> |return NULL
>> |up_read(&devices_rwsem)
>> |
>> add_client_context|
>> ib_mad_init_device|
>> ib_mad_port_open |
>> list_add_tail(&port_priv->port_list, &ib_mad_port_list)
>> up_read(&devices_rwsem)
>> |
>
> How is this stack possible?
>
> ib_register_device() is called by drivers and happens much later than ib_cm_init().
>

I've caught the stack and err log as follows, ib_mad_init_device() called after
cm_add_one().

[ 98.281786] CPU: 18 PID: 30079 Comm: modprobe Kdump: loaded
[ 98.281787] Call Trace:
[ 98.281790] dump_stack+0x71/0xab
[ 98.281805] ib_register_mad_agent+0x1c6/0x27f0 [ib_core]
[ 98.281840] cm_add_one+0x4b0/0x9d0 [ib_cm]
[ 98.281865] add_client_context+0x2b9/0x380 [ib_core]
[ 98.281890] ib_register_client+0x22a/0x2a0 [ib_core]
[ 98.281908] __init_backport+0x12f/0x1000 [ib_cm]
[ 98.281912] do_one_initcall+0x87/0x2e2
[ 98.281926] do_init_module+0x1c3/0x5f7
[ 98.281928] load_module+0x4fe0/0x68d0
[ 98.281945] __do_sys_finit_module+0x14d/0x180
[ 98.281952] do_syscall_64+0xa0/0x370
[ 98.281957] entry_SYSCALL_64_after_hwframe+0x65/0xca
[ 98.281958] RIP: 0033:0x7f51d3a4373d
[ 98.281961] RSP: 002b:00007ffceb925938 EFLAGS: 00000202 ORIG_RAX: 0000000000000139
[ 98.281964] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f51d3a4373d
[ 98.281965] RDX: 0000000000000000 RSI: 0000000001636b70 RDI: 0000000000000003
[ 98.281966] RBP: 00007ffceb925950 R08: 0000000000000000 R09: 0000000001636b70
[ 98.281967] R10: 0000000000000003 R11: 0000000000000202 R12: 0000000000402410
[ 98.281968] R13: 00007ffceb926e60 R14: 0000000000000000 R15: 0000000000000000

[ 98.281977] infiniband mlx5_1: ib_register_mad_agent: Invalid port 1

[ 98.349040] CPU: 38 PID: 29896 Comm: modprobe Kdump: loaded
[ 98.349043] Call Trace:
[ 98.349053] dump_stack+0x71/0xab
[ 98.349076] ib_register_mad_agent+0x1c6/0x27f0 [ib_core]
[ 98.349149] ib_agent_port_open+0xe2/0x2d0 [ib_core]
[ 98.349164] ib_mad_init_device+0x818/0x1d70 [ib_core]
[ 98.349197] add_client_context+0x2b9/0x380 [ib_core]
[ 98.349221] enable_device_and_get+0x1ab/0x340 [ib_core]
[ 98.349292] ib_register_device+0xcbf/0xfd0 [ib_core]
[ 98.349355] __mlx5_ib_add+0x44/0xf0 [mlx5_ib]
[ 98.349404] mlx5_add_device+0xc3/0x280 [mlx5_core]
[ 98.349434] mlx5_register_interface+0x109/0x190 [mlx5_core]
[ 98.349442] do_one_initcall+0x87/0x2e2
[ 98.349460] do_init_module+0x1c3/0x5f7
[ 98.349462] load_module+0x4fe0/0x68d0
[ 98.349481] __do_sys_init_module+0x1f9/0x220
[ 98.349486] do_syscall_64+0xa0/0x370
[ 98.349492] entry_SYSCALL_64_after_hwframe+0x65/0xca
[ 98.349494] RIP: 0033:0x7fbc327faa7a
[ 98.349500] RSP: 002b:00007fff890df7f8 EFLAGS: 00000202 ORIG_RAX: 00000000000000af
[ 98.349502] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fbc327faa7a
[ 98.349503] RDX: 00000000004250c0 RSI: 00000000001b0230 RDI: 00007fbc31919010
[ 98.349505] RBP: 00007fff890df860 R08: 00000000025045b0 R09: 0000000000000000
[ 98.349506] R10: 00000000001b0230 R11: 0000000000000202 R12: 0000000000402410
[ 98.349507] R13: 00007fff890e0cf0 R14: 0000000000000000 R15: 0000000000000000

Thanks

> Thanks
>
>>
>> Fix it by using the devices_rwsem write semaphore to protect the mad_client
>> init flow in enable_device_and_get().
>>
>> Fixes: d0899892edd0 ("RDMA/device: Provide APIs from the core code to help unregistration")
>> Cc: Shifeng Li <[email protected]>
>> Signed-off-by: Shifeng Li <[email protected]>
>> ---
>> drivers/infiniband/core/device.c | 8 +-------
>> 1 file changed, 1 insertion(+), 7 deletions(-)
>>
>> diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
>> index 67bcea7a153c..85782786993d 100644
>> --- a/drivers/infiniband/core/device.c
>> +++ b/drivers/infiniband/core/device.c
>> @@ -1315,12 +1315,6 @@ static int enable_device_and_get(struct ib_device *device)
>> down_write(&devices_rwsem);
>> xa_set_mark(&devices, device->index, DEVICE_REGISTERED);
>>
>> - /*
>> - * By using downgrade_write() we ensure that no other thread can clear
>> - * DEVICE_REGISTERED while we are completing the client setup.
>> - */
>> - downgrade_write(&devices_rwsem);
>> -
>> if (device->ops.enable_driver) {
>> ret = device->ops.enable_driver(device);
>> if (ret)
>> @@ -1337,7 +1331,7 @@ static int enable_device_and_get(struct ib_device *device)
>> if (!ret)
>> ret = add_compat_devs(device);
>> out:
>> - up_read(&devices_rwsem);
>> + up_write(&devices_rwsem);
>> return ret;
>> }
>>
>> --
>> 2.25.1
>>
>>
>


2024-01-02 11:58:20

by Leon Romanovsky

[permalink] [raw]
Subject: Re: [PATCH] RDMA/device: Fix a race between mad_client and cm_client init

On Tue, Jan 02, 2024 at 07:33:41PM +0800, Shifeng Li wrote:
> On 2024/1/2 16:58, Leon Romanovsky wrote:
> > On Mon, Jan 01, 2024 at 07:43:35PM -0800, Shifeng Li wrote:
> > > The mad_client will be initialized in enable_device_and_get(), while the
> > > devices_rwsem will be downgraded to a read semaphore. There is a window
> > > that leads to the failed initialization for cm_client, since it can not
> > > get matched mad port from ib_mad_port_list, and the matched mad port will
> > > be added to the list after that.
> > >
> > > mad_client | cm_client
> > > ------------------|--------------------------------------------------------
> > > ib_register_device|
> > > enable_device_and_get
> > > down_write(&devices_rwsem)
> > > xa_set_mark(&devices, DEVICE_REGISTERED)
> > > downgrade_write(&devices_rwsem)
> > > |
> > > |ib_cm_init
> > > |ib_register_client(&cm_client)
> > > |down_read(&devices_rwsem)
> > > |xa_for_each_marked (&devices, DEVICE_REGISTERED)
> > > |add_client_context
> > > |cm_add_one
> > > |ib_register_mad_agent
> > > |ib_get_mad_port
> > > |__ib_get_mad_port
> > > |list_for_each_entry(entry, &ib_mad_port_list, port_list)
> > > |return NULL
> > > |up_read(&devices_rwsem)
> > > |
> > > add_client_context|
> > > ib_mad_init_device|
> > > ib_mad_port_open |
> > > list_add_tail(&port_priv->port_list, &ib_mad_port_list)
> > > up_read(&devices_rwsem)
> > > |
> >
> > How is this stack possible?
> >
> > ib_register_device() is called by drivers and happens much later than ib_cm_init().
> >
>
> I've caught the stack and err log as follows, ib_mad_init_device() called after


ib_mad_init_device != ib_register_device

You mentioned ib_register_device() in your callstack, while you caught ib_mad_init_device().

Thanks

> cm_add_one().
>
> [ 98.281786] CPU: 18 PID: 30079 Comm: modprobe Kdump: loaded
> [ 98.281787] Call Trace:
> [ 98.281790] dump_stack+0x71/0xab
> [ 98.281805] ib_register_mad_agent+0x1c6/0x27f0 [ib_core]
> [ 98.281840] cm_add_one+0x4b0/0x9d0 [ib_cm]
> [ 98.281865] add_client_context+0x2b9/0x380 [ib_core]
> [ 98.281890] ib_register_client+0x22a/0x2a0 [ib_core]
> [ 98.281908] __init_backport+0x12f/0x1000 [ib_cm]
> [ 98.281912] do_one_initcall+0x87/0x2e2
> [ 98.281926] do_init_module+0x1c3/0x5f7
> [ 98.281928] load_module+0x4fe0/0x68d0
> [ 98.281945] __do_sys_finit_module+0x14d/0x180
> [ 98.281952] do_syscall_64+0xa0/0x370
> [ 98.281957] entry_SYSCALL_64_after_hwframe+0x65/0xca
> [ 98.281958] RIP: 0033:0x7f51d3a4373d
> [ 98.281961] RSP: 002b:00007ffceb925938 EFLAGS: 00000202 ORIG_RAX: 0000000000000139
> [ 98.281964] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f51d3a4373d
> [ 98.281965] RDX: 0000000000000000 RSI: 0000000001636b70 RDI: 0000000000000003
> [ 98.281966] RBP: 00007ffceb925950 R08: 0000000000000000 R09: 0000000001636b70
> [ 98.281967] R10: 0000000000000003 R11: 0000000000000202 R12: 0000000000402410
> [ 98.281968] R13: 00007ffceb926e60 R14: 0000000000000000 R15: 0000000000000000
>
> [ 98.281977] infiniband mlx5_1: ib_register_mad_agent: Invalid port 1
>
> [ 98.349040] CPU: 38 PID: 29896 Comm: modprobe Kdump: loaded
> [ 98.349043] Call Trace:
> [ 98.349053] dump_stack+0x71/0xab
> [ 98.349076] ib_register_mad_agent+0x1c6/0x27f0 [ib_core]
> [ 98.349149] ib_agent_port_open+0xe2/0x2d0 [ib_core]
> [ 98.349164] ib_mad_init_device+0x818/0x1d70 [ib_core]
> [ 98.349197] add_client_context+0x2b9/0x380 [ib_core]
> [ 98.349221] enable_device_and_get+0x1ab/0x340 [ib_core]
> [ 98.349292] ib_register_device+0xcbf/0xfd0 [ib_core]
> [ 98.349355] __mlx5_ib_add+0x44/0xf0 [mlx5_ib]
> [ 98.349404] mlx5_add_device+0xc3/0x280 [mlx5_core]
> [ 98.349434] mlx5_register_interface+0x109/0x190 [mlx5_core]
> [ 98.349442] do_one_initcall+0x87/0x2e2
> [ 98.349460] do_init_module+0x1c3/0x5f7
> [ 98.349462] load_module+0x4fe0/0x68d0
> [ 98.349481] __do_sys_init_module+0x1f9/0x220
> [ 98.349486] do_syscall_64+0xa0/0x370
> [ 98.349492] entry_SYSCALL_64_after_hwframe+0x65/0xca
> [ 98.349494] RIP: 0033:0x7fbc327faa7a
> [ 98.349500] RSP: 002b:00007fff890df7f8 EFLAGS: 00000202 ORIG_RAX: 00000000000000af
> [ 98.349502] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fbc327faa7a
> [ 98.349503] RDX: 00000000004250c0 RSI: 00000000001b0230 RDI: 00007fbc31919010
> [ 98.349505] RBP: 00007fff890df860 R08: 00000000025045b0 R09: 0000000000000000
> [ 98.349506] R10: 00000000001b0230 R11: 0000000000000202 R12: 0000000000402410
> [ 98.349507] R13: 00007fff890e0cf0 R14: 0000000000000000 R15: 0000000000000000
>
> Thanks
>
> > Thanks
> >
> > >
> > > Fix it by using the devices_rwsem write semaphore to protect the mad_client
> > > init flow in enable_device_and_get().
> > >
> > > Fixes: d0899892edd0 ("RDMA/device: Provide APIs from the core code to help unregistration")
> > > Cc: Shifeng Li <[email protected]>
> > > Signed-off-by: Shifeng Li <[email protected]>
> > > ---
> > > drivers/infiniband/core/device.c | 8 +-------
> > > 1 file changed, 1 insertion(+), 7 deletions(-)
> > >
> > > diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
> > > index 67bcea7a153c..85782786993d 100644
> > > --- a/drivers/infiniband/core/device.c
> > > +++ b/drivers/infiniband/core/device.c
> > > @@ -1315,12 +1315,6 @@ static int enable_device_and_get(struct ib_device *device)
> > > down_write(&devices_rwsem);
> > > xa_set_mark(&devices, device->index, DEVICE_REGISTERED);
> > > - /*
> > > - * By using downgrade_write() we ensure that no other thread can clear
> > > - * DEVICE_REGISTERED while we are completing the client setup.
> > > - */
> > > - downgrade_write(&devices_rwsem);
> > > -
> > > if (device->ops.enable_driver) {
> > > ret = device->ops.enable_driver(device);
> > > if (ret)
> > > @@ -1337,7 +1331,7 @@ static int enable_device_and_get(struct ib_device *device)
> > > if (!ret)
> > > ret = add_compat_devs(device);
> > > out:
> > > - up_read(&devices_rwsem);
> > > + up_write(&devices_rwsem);
> > > return ret;
> > > }
> > > --
> > > 2.25.1
> > >
> > >
> >
>
>

2024-01-02 12:31:28

by Shifeng Li

[permalink] [raw]
Subject: Re: [PATCH] RDMA/device: Fix a race between mad_client and cm_client init

On 2024/1/2 19:58, Leon Romanovsky wrote:
> On Tue, Jan 02, 2024 at 07:33:41PM +0800, Shifeng Li wrote:
>> On 2024/1/2 16:58, Leon Romanovsky wrote:
>>> On Mon, Jan 01, 2024 at 07:43:35PM -0800, Shifeng Li wrote:
>>>> The mad_client will be initialized in enable_device_and_get(), while the
>>>> devices_rwsem will be downgraded to a read semaphore. There is a window
>>>> that leads to the failed initialization for cm_client, since it can not
>>>> get matched mad port from ib_mad_port_list, and the matched mad port will
>>>> be added to the list after that.
>>>>
>>>> mad_client | cm_client
>>>> ------------------|--------------------------------------------------------
>>>> ib_register_device|
>>>> enable_device_and_get
>>>> down_write(&devices_rwsem)
>>>> xa_set_mark(&devices, DEVICE_REGISTERED)
>>>> downgrade_write(&devices_rwsem)
>>>> |
>>>> |ib_cm_init
>>>> |ib_register_client(&cm_client)
>>>> |down_read(&devices_rwsem)
>>>> |xa_for_each_marked (&devices, DEVICE_REGISTERED)
>>>> |add_client_context
>>>> |cm_add_one
>>>> |ib_register_mad_agent
>>>> |ib_get_mad_port
>>>> |__ib_get_mad_port
>>>> |list_for_each_entry(entry, &ib_mad_port_list, port_list)
>>>> |return NULL
>>>> |up_read(&devices_rwsem)
>>>> |
>>>> add_client_context|
>>>> ib_mad_init_device|
>>>> ib_mad_port_open |
>>>> list_add_tail(&port_priv->port_list, &ib_mad_port_list)
>>>> up_read(&devices_rwsem)
>>>> |
>>>
>>> How is this stack possible?
>>>
>>> ib_register_device() is called by drivers and happens much later than ib_cm_init().
>>>
>>
>> I've caught the stack and err log as follows, ib_mad_init_device() called after
>
>
> ib_mad_init_device != ib_register_device
>
> You mentioned ib_register_device() in your callstack, while you caught ib_mad_init_device().
>

ib_register_device() is called by mlx5_ib driver, and then ib_register_device()
calls ib_mad_init_device().

Thanks

> Thanks
>
>> cm_add_one().
>>
>> [ 98.281786] CPU: 18 PID: 30079 Comm: modprobe Kdump: loaded
>> [ 98.281787] Call Trace:
>> [ 98.281790] dump_stack+0x71/0xab
>> [ 98.281805] ib_register_mad_agent+0x1c6/0x27f0 [ib_core]
>> [ 98.281840] cm_add_one+0x4b0/0x9d0 [ib_cm]
>> [ 98.281865] add_client_context+0x2b9/0x380 [ib_core]
>> [ 98.281890] ib_register_client+0x22a/0x2a0 [ib_core]
>> [ 98.281908] __init_backport+0x12f/0x1000 [ib_cm]
>> [ 98.281912] do_one_initcall+0x87/0x2e2
>> [ 98.281926] do_init_module+0x1c3/0x5f7
>> [ 98.281928] load_module+0x4fe0/0x68d0
>> [ 98.281945] __do_sys_finit_module+0x14d/0x180
>> [ 98.281952] do_syscall_64+0xa0/0x370
>> [ 98.281957] entry_SYSCALL_64_after_hwframe+0x65/0xca
>> [ 98.281958] RIP: 0033:0x7f51d3a4373d
>> [ 98.281961] RSP: 002b:00007ffceb925938 EFLAGS: 00000202 ORIG_RAX: 0000000000000139
>> [ 98.281964] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f51d3a4373d
>> [ 98.281965] RDX: 0000000000000000 RSI: 0000000001636b70 RDI: 0000000000000003
>> [ 98.281966] RBP: 00007ffceb925950 R08: 0000000000000000 R09: 0000000001636b70
>> [ 98.281967] R10: 0000000000000003 R11: 0000000000000202 R12: 0000000000402410
>> [ 98.281968] R13: 00007ffceb926e60 R14: 0000000000000000 R15: 0000000000000000
>>
>> [ 98.281977] infiniband mlx5_1: ib_register_mad_agent: Invalid port 1
>>
>> [ 98.349040] CPU: 38 PID: 29896 Comm: modprobe Kdump: loaded
>> [ 98.349043] Call Trace:
>> [ 98.349053] dump_stack+0x71/0xab
>> [ 98.349076] ib_register_mad_agent+0x1c6/0x27f0 [ib_core]
>> [ 98.349149] ib_agent_port_open+0xe2/0x2d0 [ib_core]
>> [ 98.349164] ib_mad_init_device+0x818/0x1d70 [ib_core]
>> [ 98.349197] add_client_context+0x2b9/0x380 [ib_core]
>> [ 98.349221] enable_device_and_get+0x1ab/0x340 [ib_core]
>> [ 98.349292] ib_register_device+0xcbf/0xfd0 [ib_core]
>> [ 98.349355] __mlx5_ib_add+0x44/0xf0 [mlx5_ib]
>> [ 98.349404] mlx5_add_device+0xc3/0x280 [mlx5_core]
>> [ 98.349434] mlx5_register_interface+0x109/0x190 [mlx5_core]
>> [ 98.349442] do_one_initcall+0x87/0x2e2
>> [ 98.349460] do_init_module+0x1c3/0x5f7
>> [ 98.349462] load_module+0x4fe0/0x68d0
>> [ 98.349481] __do_sys_init_module+0x1f9/0x220
>> [ 98.349486] do_syscall_64+0xa0/0x370
>> [ 98.349492] entry_SYSCALL_64_after_hwframe+0x65/0xca
>> [ 98.349494] RIP: 0033:0x7fbc327faa7a
>> [ 98.349500] RSP: 002b:00007fff890df7f8 EFLAGS: 00000202 ORIG_RAX: 00000000000000af
>> [ 98.349502] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fbc327faa7a
>> [ 98.349503] RDX: 00000000004250c0 RSI: 00000000001b0230 RDI: 00007fbc31919010
>> [ 98.349505] RBP: 00007fff890df860 R08: 00000000025045b0 R09: 0000000000000000
>> [ 98.349506] R10: 00000000001b0230 R11: 0000000000000202 R12: 0000000000402410
>> [ 98.349507] R13: 00007fff890e0cf0 R14: 0000000000000000 R15: 0000000000000000
>>
>> Thanks
>>
>>> Thanks
>>>
>>>>
>>>> Fix it by using the devices_rwsem write semaphore to protect the mad_client
>>>> init flow in enable_device_and_get().
>>>>
>>>> Fixes: d0899892edd0 ("RDMA/device: Provide APIs from the core code to help unregistration")
>>>> Cc: Shifeng Li <[email protected]>
>>>> Signed-off-by: Shifeng Li <[email protected]>
>>>> ---
>>>> drivers/infiniband/core/device.c | 8 +-------
>>>> 1 file changed, 1 insertion(+), 7 deletions(-)
>>>>
>>>> diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
>>>> index 67bcea7a153c..85782786993d 100644
>>>> --- a/drivers/infiniband/core/device.c
>>>> +++ b/drivers/infiniband/core/device.c
>>>> @@ -1315,12 +1315,6 @@ static int enable_device_and_get(struct ib_device *device)
>>>> down_write(&devices_rwsem);
>>>> xa_set_mark(&devices, device->index, DEVICE_REGISTERED);
>>>> - /*
>>>> - * By using downgrade_write() we ensure that no other thread can clear
>>>> - * DEVICE_REGISTERED while we are completing the client setup.
>>>> - */
>>>> - downgrade_write(&devices_rwsem);
>>>> -
>>>> if (device->ops.enable_driver) {
>>>> ret = device->ops.enable_driver(device);
>>>> if (ret)
>>>> @@ -1337,7 +1331,7 @@ static int enable_device_and_get(struct ib_device *device)
>>>> if (!ret)
>>>> ret = add_compat_devs(device);
>>>> out:
>>>> - up_read(&devices_rwsem);
>>>> + up_write(&devices_rwsem);
>>>> return ret;
>>>> }
>>>> --
>>>> 2.25.1
>>>>
>>>>
>>>
>>
>>
>


2024-01-03 18:48:20

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] RDMA/device: Fix a race between mad_client and cm_client init

On Mon, Jan 01, 2024 at 07:43:35PM -0800, Shifeng Li wrote:
> diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
> index 67bcea7a153c..85782786993d 100644
> --- a/drivers/infiniband/core/device.c
> +++ b/drivers/infiniband/core/device.c
> @@ -1315,12 +1315,6 @@ static int enable_device_and_get(struct ib_device *device)
> down_write(&devices_rwsem);
> xa_set_mark(&devices, device->index, DEVICE_REGISTERED);
>
> - /*
> - * By using downgrade_write() we ensure that no other thread can clear
> - * DEVICE_REGISTERED while we are completing the client setup.
> - */
> - downgrade_write(&devices_rwsem);
> -
> if (device->ops.enable_driver) {
> ret = device->ops.enable_driver(device);
> if (ret)
> @@ -1337,7 +1331,7 @@ static int enable_device_and_get(struct ib_device *device)
> if (!ret)
> ret = add_compat_devs(device);
> out:
> - up_read(&devices_rwsem);
> + up_write(&devices_rwsem);
> return ret;
> }

I don't think messing with the devices_rwsem here is a great idea, it
would be better to address this on the clients_rwsem side like:

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 67bcea7a153c6a..b956c9f8e62d34 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -1730,7 +1730,7 @@ static int assign_client_id(struct ib_client *client)
{
int ret;

- down_write(&clients_rwsem);
+ lockdep_assert_held(&clients_rwsem);
/*
* The add/remove callbacks must be called in FIFO/LIFO order. To
* achieve this we assign client_ids so they are sorted in
@@ -1739,14 +1739,11 @@ static int assign_client_id(struct ib_client *client)
client->client_id = highest_client_id;
ret = xa_insert(&clients, client->client_id, client, GFP_KERNEL);
if (ret)
- goto out;
+ return ret;

highest_client_id++;
xa_set_mark(&clients, client->client_id, CLIENT_REGISTERED);
-
-out:
- up_write(&clients_rwsem);
- return ret;
+ return 0;
}

static void remove_client_id(struct ib_client *client)
@@ -1776,25 +1773,31 @@ int ib_register_client(struct ib_client *client)
{
struct ib_device *device;
unsigned long index;
+ bool need_unreg = false;
int ret;

refcount_set(&client->uses, 1);
init_completion(&client->uses_zero);
- ret = assign_client_id(client);
- if (ret)
- return ret;

down_read(&devices_rwsem);
+ down_write(&clients_rwsem);
+ ret = assign_client_id(client);
+ if (ret)
+ goto out;
+
+ need_unreg = true;
xa_for_each_marked (&devices, index, device, DEVICE_REGISTERED) {
ret = add_client_context(device, client);
- if (ret) {
- up_read(&devices_rwsem);
- ib_unregister_client(client);
- return ret;
- }
+ if (ret)
+ goto out;
}
+ ret = 0;
+out:
+ up_write(&clients_rwsem);
up_read(&devices_rwsem);
- return 0;
+ if (need_unreg && ret)
+ ib_unregister_client(client);
+ return ret;
}
EXPORT_SYMBOL(ib_register_client);


2024-01-04 12:38:20

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] RDMA/device: Fix a race between mad_client and cm_client init

On Thu, Jan 04, 2024 at 02:48:14PM +0800, Shifeng Li wrote:

> The root cause is that mad_client and cm_client may init concurrently
> when devices_rwsem write semaphore is downgraded in enable_device_and_get() like:

That can't be true, the module loader infrastructue ensures those two
things are sequential.

You are trying to say that the post-client fixup stuff will still see
the DEVICE_REGISTERED before it reaches the clients_rwsem lock?

That probably just says the clients_rwsem should be obtained before
changing the DEVICE_STATE too :\

Jason

2024-01-05 08:30:53

by Shifeng Li

[permalink] [raw]
Subject: Re: [PATCH] RDMA/device: Fix a race between mad_client and cm_client init

On 2024/1/4 20:37, Jason Gunthorpe wrote:
> On Thu, Jan 04, 2024 at 02:48:14PM +0800, Shifeng Li wrote:
>
>> The root cause is that mad_client and cm_client may init concurrently
>> when devices_rwsem write semaphore is downgraded in enable_device_and_get() like:
>
> That can't be true, the module loader infrastructue ensures those two
> things are sequential.
>

I'm a bit confused how the module loader infrastructue ensures that mad_client.add() and
cm_client.add() are sequential. Could you explain in more detail please?

We know that the ib_cm driver and mlx5_ib driver can load concurrently.

Thanks.

> You are trying to say that the post-client fixup stuff will still see
> the DEVICE_REGISTERED before it reaches the clients_rwsem lock?
>
> That probably just says the clients_rwsem should be obtained before
> changing the DEVICE_STATE too :\
>
> Jason
>


2024-01-05 14:19:59

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] RDMA/device: Fix a race between mad_client and cm_client init

On Fri, Jan 05, 2024 at 04:15:18PM +0800, Shifeng Li wrote:
> On 2024/1/4 20:37, Jason Gunthorpe wrote:
> > On Thu, Jan 04, 2024 at 02:48:14PM +0800, Shifeng Li wrote:
> >
> > > The root cause is that mad_client and cm_client may init concurrently
> > > when devices_rwsem write semaphore is downgraded in enable_device_and_get() like:
> >
> > That can't be true, the module loader infrastructue ensures those two
> > things are sequential.
> >
>
> I'm a bit confused how the module loader infrastructue ensures that mad_client.add() and
> cm_client.add() are sequential. Could you explain in more detail
> please?

ib_cm has a symbol dependency on ib_mad, so the module loader will not
allow ib_cm to start running until all its symbol dependencies have
completed loading.

> We know that the ib_cm driver and mlx5_ib driver can load concurrently.

Yes, this is possible

Jason

2024-01-06 02:21:05

by Ding Hui

[permalink] [raw]
Subject: Re: [PATCH] RDMA/device: Fix a race between mad_client and cm_client init

On 2024/1/4 20:37, Jason Gunthorpe wrote:
> On Thu, Jan 04, 2024 at 02:48:14PM +0800, Shifeng Li wrote:
>
>> The root cause is that mad_client and cm_client may init concurrently
>> when devices_rwsem write semaphore is downgraded in enable_device_and_get() like:
>
> That can't be true, the module loader infrastructue ensures those two
> things are sequential.
>

Please consider the sequence again and notice that:

1. We agree that dependencies ensure mad_client be registered before cm_client.
2. But the mad_client.add() is not invoked in ib_register_client(), since
there is no DEVICE_REGISTERED device at that time.
Instead, it will be delayed until the device driver init (e.g. mlx5_core)
in enable_device_and_get().
3. The ib_cm and mlx5_core can be loaded concurrently, after setting DEVICE_REGISTERED
and downgrade_write(&devices_rwsem) in enable_device_and_get(), there is a chance
that cm_client.add() can be invoked before mad_client.add().


T1(ib_core init) | T2(device driver init) | T3(ib_cm init)
---------------------------------------------------------------------------------------------------
ib_register_client mad_client
assign_client_id
add clients CLIENT_REGISTERED
(with clients_rwsem write)
down_read(&devices_rwsem);
xa_for_each_marked (&devices, DEVICE_REGISTERED)
nop # no devices
up_read(&devices_rwsem);

ib_register_device
enable_device_and_get
down_write(&devices_rwsem);
set DEVICE_REGISTERED
downgrade_write(&devices_rwsem);
ib_register_client cm_client
assign_client_id
add clients CLIENT_REGISTERED
(with clients_rwsem write)
down_read(&devices_rwsem);
xa_for_each_marked (&devices, DEVICE_REGISTERED)
add_client_context
down_write(&device->client_data_rwsem);
get CLIENT_DATA_REGISTERED
downgrade_write(&device->client_data_rwsem);
cm_client.add
cm_add_one
ib_register_mad_agent
ib_get_mad_port
__ib_get_mad_port return NULL!
set CLIENT_DATA_REGISTERED
up_read(&device->client_data_rwsem);
up_read(&devices_rwsem);
down_read(&clients_rwsem);
xa_for_each_marked (&clients, CLIENT_REGISTERED)
add_client_context [mad]
mad_client.add
add_client_context [cm]
nop # already CLIENT_DATA_REGISTERED
up_read(&clients_rwsem);
up_read(&devices_rwsem);

> You are trying to say that the post-client fixup stuff will still see
> the DEVICE_REGISTERED before it reaches the clients_rwsem lock?
>
> That probably just says the clients_rwsem should be obtained before
> changing the DEVICE_STATE too :\
>


--
Thanks,
- Ding Hui


2024-01-15 13:47:22

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] RDMA/device: Fix a race between mad_client and cm_client init

On Sat, Jan 06, 2024 at 10:12:17AM +0800, Ding Hui wrote:
> On 2024/1/4 20:37, Jason Gunthorpe wrote:
> > On Thu, Jan 04, 2024 at 02:48:14PM +0800, Shifeng Li wrote:
> >
> > > The root cause is that mad_client and cm_client may init concurrently
> > > when devices_rwsem write semaphore is downgraded in enable_device_and_get() like:
> >
> > That can't be true, the module loader infrastructue ensures those two
> > things are sequential.
> >
>
> Please consider the sequence again and notice that:
>
> 1. We agree that dependencies ensure mad_client be registered before cm_client.
> 2. But the mad_client.add() is not invoked in ib_register_client(), since
> there is no DEVICE_REGISTERED device at that time.
> Instead, it will be delayed until the device driver init (e.g. mlx5_core)
> in enable_device_and_get().
> 3. The ib_cm and mlx5_core can be loaded concurrently, after setting DEVICE_REGISTERED
> and downgrade_write(&devices_rwsem) in enable_device_and_get(), there is a chance
> that cm_client.add() can be invoked before mad_client.add().
>
>
> T1(ib_core init) | T2(device driver init) | T3(ib_cm init)
> ---------------------------------------------------------------------------------------------------
> ib_register_client mad_client
> assign_client_id
> add clients CLIENT_REGISTERED
> (with clients_rwsem write)
> down_read(&devices_rwsem);
> xa_for_each_marked (&devices, DEVICE_REGISTERED)
> nop # no devices
> up_read(&devices_rwsem);
>
> ib_register_device
> enable_device_and_get
> down_write(&devices_rwsem);
> set DEVICE_REGISTERED
> downgrade_write(&devices_rwsem);
> ib_register_client cm_client
> assign_client_id
> add clients CLIENT_REGISTERED
> (with clients_rwsem write)
> down_read(&devices_rwsem);
> xa_for_each_marked (&devices, DEVICE_REGISTERED)
> add_client_context
> down_write(&device->client_data_rwsem);
> get CLIENT_DATA_REGISTERED
> downgrade_write(&device->client_data_rwsem);
> cm_client.add
> cm_add_one
> ib_register_mad_agent
> ib_get_mad_port
> __ib_get_mad_port return NULL!
> set CLIENT_DATA_REGISTERED
> up_read(&device->client_data_rwsem);
> up_read(&devices_rwsem);
> down_read(&clients_rwsem);
> xa_for_each_marked (&clients, CLIENT_REGISTERED)
> add_client_context [mad]
> mad_client.add
> add_client_context [cm]
> nop # already CLIENT_DATA_REGISTERED
> up_read(&clients_rwsem);
> up_read(&devices_rwsem);

Take the draft I sent previously and use down_write(&devices_rwsem) in
ib_register_client()

Jason

2024-01-15 16:23:22

by Shifeng Li

[permalink] [raw]
Subject: Re: [PATCH] RDMA/device: Fix a race between mad_client and cm_client init

On 2024/1/5 22:19, Jason Gunthorpe wrote:
> On Fri, Jan 05, 2024 at 04:15:18PM +0800, Shifeng Li wrote:
>> On 2024/1/4 20:37, Jason Gunthorpe wrote:
>>> On Thu, Jan 04, 2024 at 02:48:14PM +0800, Shifeng Li wrote:
>>>
>>>> The root cause is that mad_client and cm_client may init concurrently
>>>> when devices_rwsem write semaphore is downgraded in enable_device_and_get() like:
>>>
>>> That can't be true, the module loader infrastructue ensures those two
>>> things are sequential.
>>>
>>
>> I'm a bit confused how the module loader infrastructue ensures that mad_client.add() and
>> cm_client.add() are sequential. Could you explain in more detail
>> please?
>
> ib_cm has a symbol dependency on ib_mad, so the module loader will not
> allow ib_cm to start running until all its symbol dependencies have
> completed loading.

I have found a method to reproduce the issue as follow:

1. modprobe -r ib_cm; modprobe -r ib_core; modprobe -r mlx5_ib;
2. Compile and replace ib_core with following patch;
3. modprobe ib_cm;
4. modprobe mlx5_ib;

diff -Narup ./drivers/infiniband/core/device.c ./drivers/infiniband/core/device.c.reproduce
--- ./drivers/infiniband/core/device.c 2024-01-15 11:14:08.063094430 +0800
+++ ./drivers/infiniband/core/device.c.reproduce 2024-01-15 11:16:22.577096953 +0800
@@ -64,6 +64,8 @@ struct workqueue_struct *ib_wq;
EXPORT_SYMBOL_GPL(ib_wq);
static struct workqueue_struct *ib_unreg_wq;

+int ib_cm_run_flag = 0;
+
/*
* Each of the three rwsem locks (devices, clients, client_data) protects the
* xarray of the same name. Specifically it allows the caller to assert that
@@ -1371,6 +1373,9 @@ static int enable_device_and_get(struct
*/
downgrade_write(&devices_rwsem);

+ ib_cm_run_flag = 1;
+ msleep(30000);
+
if (device->ops.enable_driver) {
ret = device->ops.enable_driver(device);
if (ret)
@@ -1843,6 +1848,12 @@ int ib_register_client(struct ib_client
if (ret)
return ret;

+ if (strncmp(client->name, "cm", strlen("cm")) == 0) {
+ while (!ib_cm_run_flag) {
+ cond_resched();
+ }
+ }
+
down_read(&devices_rwsem);
xa_for_each_marked (&devices, index, device, DEVICE_REGISTERED) {
ret = add_client_context(device, client);

>
>> We know that the ib_cm driver and mlx5_ib driver can load concurrently.
>
> Yes, this is possible
>
> Jason
>


2024-01-26 12:24:43

by Shifeng Li

[permalink] [raw]
Subject: Re: [PATCH] RDMA/device: Fix a race between mad_client and cm_client init

On 2024/1/16 10:12, Ding Hui wrote:
> On 2024/1/15 21:47, Jason Gunthorpe wrote:
>> On Sat, Jan 06, 2024 at 10:12:17AM +0800, Ding Hui wrote:
>>> On 2024/1/4 20:37, Jason Gunthorpe wrote:
>>>> On Thu, Jan 04, 2024 at 02:48:14PM +0800, Shifeng Li wrote:
>>>>
>>>>> The root cause is that mad_client and cm_client may init concurrently
>>>>> when devices_rwsem write semaphore is downgraded in enable_device_and_get() like:
>>>>
>>>> That can't be true, the module loader infrastructue ensures those two
>>>> things are sequential.
>>>>
>>>
>>> Please consider the sequence again and notice that:
>>>
>>> 1. We agree that dependencies ensure mad_client be registered before cm_client.
>>> 2. But the mad_client.add() is not invoked in ib_register_client(), since
>>>     there is no DEVICE_REGISTERED device at that time.
>>>     Instead, it will be delayed until the device driver init (e.g. mlx5_core)
>>>     in enable_device_and_get().
>>> 3. The ib_cm and mlx5_core can be loaded concurrently, after setting DEVICE_REGISTERED
>>>     and downgrade_write(&devices_rwsem) in enable_device_and_get(), there is a chance
>>>     that cm_client.add() can be invoked before mad_client.add().
>>>
>>>
>>>          T1(ib_core init)      |      T2(device driver init)        |        T3(ib_cm init)
>>> ---------------------------------------------------------------------------------------------------
>>> ib_register_client mad_client
>>>    assign_client_id
>>>      add clients CLIENT_REGISTERED
>>>      (with clients_rwsem write)
>>>    down_read(&devices_rwsem);
>>>    xa_for_each_marked (&devices, DEVICE_REGISTERED)
>>>      nop # no devices
>>>    up_read(&devices_rwsem);
>>>
>>>                                ib_register_device
>>>                                  enable_device_and_get
>>>                                    down_write(&devices_rwsem);
>>>                                    set DEVICE_REGISTERED
>>>                                    downgrade_write(&devices_rwsem);
>>>                                                                      ib_register_client cm_client
>>>                                                                        assign_client_id
>>>                                                                          add clients CLIENT_REGISTERED
>>>                                                                          (with clients_rwsem write)
>>>                                                                        down_read(&devices_rwsem);
>>>                                                                        xa_for_each_marked (&devices, DEVICE_REGISTERED)
>>>                                                                          add_client_context
>>>                                                                            down_write(&device->client_data_rwsem);
>>>                                                                            get CLIENT_DATA_REGISTERED
>>>                                                                            downgrade_write(&device->client_data_rwsem);
>>>                                                                            cm_client.add
>>>                                                                              cm_add_one
>>>                                                                                ib_register_mad_agent
>>>                                                                                  ib_get_mad_port
>>>                                                                                    __ib_get_mad_port return NULL!
>>>                                                                            set CLIENT_DATA_REGISTERED
>>>                                                                            up_read(&device->client_data_rwsem);
>>>                                                                        up_read(&devices_rwsem);
>>>                                  down_read(&clients_rwsem);
>>>                                  xa_for_each_marked (&clients, CLIENT_REGISTERED)
>>>                                    add_client_context [mad]
>>>                                      mad_client.add
>>>                                    add_client_context [cm]
>>>                                      nop # already CLIENT_DATA_REGISTERED
>>>                                  up_read(&clients_rwsem);
>>>                                  up_read(&devices_rwsem);
>>
>> Take the draft I sent previously and use down_write(&devices_rwsem) in
>> ib_register_client()
>>
>
> I believe this modification is effective, rather than expanding the clients_rwsem range,
> the key point is down_write(&devices_rwsem), which prevents ib_register_client() from
> being executed in the gap of ib_register_device().
>
> However, this may cause a little confusion, as ib_register_client() does not modify
> anything related to devices, but it is protected by a write lock.
>

Hi Jason,

Do you have any differing opinions about above?



2024-02-01 00:04:22

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] RDMA/device: Fix a race between mad_client and cm_client init

On Fri, Jan 26, 2024 at 10:25:01AM +0800, Shifeng Li wrote:
> > I believe this modification is effective, rather than expanding the clients_rwsem range,
> > the key point is down_write(&devices_rwsem), which prevents ib_register_client() from
> > being executed in the gap of ib_register_device().
> >
> > However, this may cause a little confusion, as ib_register_client() does not modify
> > anything related to devices, but it is protected by a write lock.
> >
> Do you have any differing opinions about above?

I'd rather see the client side have a comment explaining the confusing
lock then disturb the locking on the devices side..

I think the write/read lock scheme there was done for a reason I just
haven't had time to recall exactly why..

Please send a patch with this arrangement

Jason