Greeting,
FYI, we noticed the following commit (built with gcc-9):
commit: c804af2c1d3152c0cf877eeb50d60c2d49ac0cf0 ("IB/srpt: use new shared CQ mechanism")
https://git.kernel.org/cgit/linux/kernel/git/rdma/rdma.git for-next
in testcase: blktests
with following parameters:
test: srp-group1
ucode: 0x21
on test machine: 4 threads Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz with 4G memory
caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
If you fix the issue, kindly add following tag
Reported-by: kernel test robot <[email protected]>
user :notice: [ 44.688140] 2020-08-01 16:10:22 ./check srp/001 srp/002 srp/003 srp/004 srp/005 srp/006 srp/007 srp/008 srp/009 srp/010 srp/011 srp/012 srp/013 srp/015
user :notice: [ 44.706657] srp/001 (Create and remove LUNs)
user :notice: [ 44.718405] srp/001 (Create and remove LUNs) [passed]
user :notice: [ 44.729902] runtime ... 1.972s
user :notice: [ 99.038748] IPMI BMC is not supported on this machine, skip bmc-watchdog setup!
user :notice: [ 3699.039790] Sat Aug 1 17:11:22 UTC 2020 detected soft_timeout
user :notice: [ 3699.060341] kill 960 /usr/bin/time -v -o /tmp/lkp/blktests.time /lkp/lkp/src/tests/blktests
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml
Thanks,
Rong Chen
On 2020-08-01 23:09, kernel test robot wrote:
> Greeting,
>
> FYI, we noticed the following commit (built with gcc-9):
>
> commit: c804af2c1d3152c0cf877eeb50d60c2d49ac0cf0 ("IB/srpt: use new shared CQ mechanism")
> https://git.kernel.org/cgit/linux/kernel/git/rdma/rdma.git for-next
>
>
> in testcase: blktests
> with following parameters:
>
> test: srp-group1
> ucode: 0x21
>
>
>
> on test machine: 4 threads Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz with 4G memory
>
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
>
>
>
>
> If you fix the issue, kindly add following tag
> Reported-by: kernel test robot <[email protected]>
>
>
> user :notice: [ 44.688140] 2020-08-01 16:10:22 ./check srp/001 srp/002 srp/003 srp/004 srp/005 srp/006 srp/007 srp/008 srp/009 srp/010 srp/011 srp/012 srp/013 srp/015
> user :notice: [ 44.706657] srp/001 (Create and remove LUNs)
> user :notice: [ 44.718405] srp/001 (Create and remove LUNs) [passed]
> user :notice: [ 44.729902] runtime ... 1.972s
> user :notice: [ 99.038748] IPMI BMC is not supported on this machine, skip bmc-watchdog setup!
> user :notice: [ 3699.039790] Sat Aug 1 17:11:22 UTC 2020 detected soft_timeout
> user :notice: [ 3699.060341] kill 960 /usr/bin/time -v -o /tmp/lkp/blktests.time /lkp/lkp/src/tests/blktests
Yamin and Max, can you take a look at this? The SRP tests from the
blktests repository pass reliably with kernel version v5.7 and before.
With label next-20200731 from linux-next however that test triggers the
following hang:
sd 8:0:0:0: [sda] Synchronizing SCSI cache
rdma_rxe: not configured on eth0
rdma_rxe: not configured on lo
INFO: task modprobe:1894 blocked for more than 122 seconds.
Not tainted 5.8.0-rc7-next-20200731-dbg+ #3
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
modprobe D27624 1894 1081 0x00004000
Call Trace:
__schedule+0x4ee/0x1170
schedule+0x7f/0x170
schedule_timeout+0x453/0x6f0
wait_for_completion+0x126/0x1b0
disable_device+0x12a/0x1c0 [ib_core]
__ib_unregister_device+0x64/0x100 [ib_core]
ib_unregister_driver+0x11c/0x180 [ib_core]
rxe_module_exit+0x1e/0x36 [rdma_rxe]
__x64_sys_delete_module+0x22a/0x310
do_syscall_64+0x36/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f8d0e0c8a3b
Code: Bad RIP value.
RSP: 002b:00007ffe8238f798 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
RAX: ffffffffffffffda RBX: 000055a2c7d9be80 RCX: 00007f8d0e0c8a3b
RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000055a2c7d9bee8
RBP: 000055a2c7d9be80 R08: 0000000000000000 R09: 0000000000000000
R10: 00007f8d0e144ac0 R11: 0000000000000206 R12: 000055a2c7d9bee8
R13: 0000000000000000 R14: 000055a2c7d9bee8 R15: 000055a2c7d9be80
Showing all locks held in the system:
1 lock held by khungtaskd/53:
#0: ffffffff82895880 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x37/0x20f
1 lock held by modprobe/1894:
#0: ffff8881c5e8c660 (&device->unregistration_lock){+.+.}-{3:3}, at: __ib_unregister_device+0x23/0x100 [ib_core]
=============================================
INFO: task modprobe:1894 blocked for more than 245 seconds.
Not tainted 5.8.0-rc7-next-20200731-dbg+ #3
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
modprobe D27624 1894 1081 0x00004000
Call Trace:
__schedule+0x4ee/0x1170
schedule+0x7f/0x170
schedule_timeout+0x453/0x6f0
wait_for_completion+0x126/0x1b0
disable_device+0x12a/0x1c0 [ib_core]
__ib_unregister_device+0x64/0x100 [ib_core]
ib_unregister_driver+0x11c/0x180 [ib_core]
rxe_module_exit+0x1e/0x36 [rdma_rxe]
__x64_sys_delete_module+0x22a/0x310
do_syscall_64+0x36/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f8d0e0c8a3b
Code: Bad RIP value.
RSP: 002b:00007ffe8238f798 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
RAX: ffffffffffffffda RBX: 000055a2c7d9be80 RCX: 00007f8d0e0c8a3b
RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000055a2c7d9bee8
RBP: 000055a2c7d9be80 R08: 0000000000000000 R09: 0000000000000000
R10: 00007f8d0e144ac0 R11: 0000000000000206 R12: 000055a2c7d9bee8
R13: 0000000000000000 R14: 000055a2c7d9bee8 R15: 000055a2c7d9be80
Showing all locks held in the system:
1 lock held by khungtaskd/53:
#0: ffffffff82895880 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x37/0x20f
no locks held by systemd-journal/241.
1 lock held by modprobe/1894:
#0: ffff8881c5e8c660 (&device->unregistration_lock){+.+.}-{3:3}, at: __ib_unregister_device+0x23/0x100 [ib_core]
=============================================
On 8/2/2020 6:05 PM, Bart Van Assche wrote:
> On 2020-08-01 23:09, kernel test robot wrote:
>> Greeting,
>>
>> FYI, we noticed the following commit (built with gcc-9):
>>
>> commit: c804af2c1d3152c0cf877eeb50d60c2d49ac0cf0 ("IB/srpt: use new shared CQ mechanism")
>> https://git.kernel.org/cgit/linux/kernel/git/rdma/rdma.git for-next
>>
>>
>> in testcase: blktests
>> with following parameters:
>>
>> test: srp-group1
>> ucode: 0x21
>>
>>
>>
>> on test machine: 4 threads Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz with 4G memory
>>
>> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
>>
>>
>>
>>
>> If you fix the issue, kindly add following tag
>> Reported-by: kernel test robot <[email protected]>
>>
>>
>> user :notice: [ 44.688140] 2020-08-01 16:10:22 ./check srp/001 srp/002 srp/003 srp/004 srp/005 srp/006 srp/007 srp/008 srp/009 srp/010 srp/011 srp/012 srp/013 srp/015
>> user :notice: [ 44.706657] srp/001 (Create and remove LUNs)
>> user :notice: [ 44.718405] srp/001 (Create and remove LUNs) [passed]
>> user :notice: [ 44.729902] runtime ... 1.972s
>> user :notice: [ 99.038748] IPMI BMC is not supported on this machine, skip bmc-watchdog setup!
>> user :notice: [ 3699.039790] Sat Aug 1 17:11:22 UTC 2020 detected soft_timeout
>> user :notice: [ 3699.060341] kill 960 /usr/bin/time -v -o /tmp/lkp/blktests.time /lkp/lkp/src/tests/blktests
> Yamin and Max, can you take a look at this? The SRP tests from the
> blktests repository pass reliably with kernel version v5.7 and before.
> With label next-20200731 from linux-next however that test triggers the
> following hang:
I will look into it.
>
> sd 8:0:0:0: [sda] Synchronizing SCSI cache
> rdma_rxe: not configured on eth0
> rdma_rxe: not configured on lo
> INFO: task modprobe:1894 blocked for more than 122 seconds.
> Not tainted 5.8.0-rc7-next-20200731-dbg+ #3
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> modprobe D27624 1894 1081 0x00004000
> Call Trace:
> __schedule+0x4ee/0x1170
> schedule+0x7f/0x170
> schedule_timeout+0x453/0x6f0
> wait_for_completion+0x126/0x1b0
> disable_device+0x12a/0x1c0 [ib_core]
> __ib_unregister_device+0x64/0x100 [ib_core]
> ib_unregister_driver+0x11c/0x180 [ib_core]
> rxe_module_exit+0x1e/0x36 [rdma_rxe]
> __x64_sys_delete_module+0x22a/0x310
> do_syscall_64+0x36/0x80
> entry_SYSCALL_64_after_hwframe+0x44/0xa9
> RIP: 0033:0x7f8d0e0c8a3b
> Code: Bad RIP value.
> RSP: 002b:00007ffe8238f798 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
> RAX: ffffffffffffffda RBX: 000055a2c7d9be80 RCX: 00007f8d0e0c8a3b
> RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000055a2c7d9bee8
> RBP: 000055a2c7d9be80 R08: 0000000000000000 R09: 0000000000000000
> R10: 00007f8d0e144ac0 R11: 0000000000000206 R12: 000055a2c7d9bee8
> R13: 0000000000000000 R14: 000055a2c7d9bee8 R15: 000055a2c7d9be80
>
> Showing all locks held in the system:
> 1 lock held by khungtaskd/53:
> #0: ffffffff82895880 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x37/0x20f
> 1 lock held by modprobe/1894:
> #0: ffff8881c5e8c660 (&device->unregistration_lock){+.+.}-{3:3}, at: __ib_unregister_device+0x23/0x100 [ib_core]
>
> =============================================
> INFO: task modprobe:1894 blocked for more than 245 seconds.
> Not tainted 5.8.0-rc7-next-20200731-dbg+ #3
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> modprobe D27624 1894 1081 0x00004000
> Call Trace:
> __schedule+0x4ee/0x1170
> schedule+0x7f/0x170
> schedule_timeout+0x453/0x6f0
> wait_for_completion+0x126/0x1b0
> disable_device+0x12a/0x1c0 [ib_core]
> __ib_unregister_device+0x64/0x100 [ib_core]
> ib_unregister_driver+0x11c/0x180 [ib_core]
> rxe_module_exit+0x1e/0x36 [rdma_rxe]
> __x64_sys_delete_module+0x22a/0x310
> do_syscall_64+0x36/0x80
> entry_SYSCALL_64_after_hwframe+0x44/0xa9
> RIP: 0033:0x7f8d0e0c8a3b
> Code: Bad RIP value.
> RSP: 002b:00007ffe8238f798 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
> RAX: ffffffffffffffda RBX: 000055a2c7d9be80 RCX: 00007f8d0e0c8a3b
> RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000055a2c7d9bee8
> RBP: 000055a2c7d9be80 R08: 0000000000000000 R09: 0000000000000000
> R10: 00007f8d0e144ac0 R11: 0000000000000206 R12: 000055a2c7d9bee8
> R13: 0000000000000000 R14: 000055a2c7d9bee8 R15: 000055a2c7d9be80
>
> Showing all locks held in the system:
> 1 lock held by khungtaskd/53:
> #0: ffffffff82895880 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x37/0x20f
> no locks held by systemd-journal/241.
> 1 lock held by modprobe/1894:
> #0: ffff8881c5e8c660 (&device->unregistration_lock){+.+.}-{3:3}, at: __ib_unregister_device+0x23/0x100 [ib_core]
>
> =============================================
>>> Greeting,
>>>
>>> FYI, we noticed the following commit (built with gcc-9):
>>>
>>> commit: c804af2c1d3152c0cf877eeb50d60c2d49ac0cf0 ("IB/srpt: use new
>>> shared CQ mechanism")
>>> https://git.kernel.org/cgit/linux/kernel/git/rdma/rdma.git for-next
>>>
>>>
>>> in testcase: blktests
>>> with following parameters:
>>>
>>> ????test: srp-group1
>>> ????ucode: 0x21
>>>
>>>
>>>
>>> on test machine: 4 threads Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz
>>> with 4G memory
>>>
>>> caused below changes (please refer to attached dmesg/kmsg for entire
>>> log/backtrace):
>>>
>>>
>>>
>>>
>>> If you fix the issue, kindly add following tag
>>> Reported-by: kernel test robot <[email protected]>
>>>
>>>
>>> user? :notice: [?? 44.688140] 2020-08-01 16:10:22 ./check srp/001
>>> srp/002 srp/003 srp/004 srp/005 srp/006 srp/007 srp/008 srp/009
>>> srp/010 srp/011 srp/012 srp/013 srp/015
>>> user? :notice: [?? 44.706657] srp/001 (Create and remove LUNs)
>>> user? :notice: [?? 44.718405] srp/001 (Create and remove
>>> LUNs)???????????????????????????? [passed]
>>> user? :notice: [?? 44.729902]???? runtime? ...? 1.972s
>>> user? :notice: [?? 99.038748] IPMI BMC is not supported on this
>>> machine, skip bmc-watchdog setup!
>>> user? :notice: [ 3699.039790] Sat Aug? 1 17:11:22 UTC 2020 detected
>>> soft_timeout
>>> user? :notice: [ 3699.060341] kill 960 /usr/bin/time -v -o
>>> /tmp/lkp/blktests.time /lkp/lkp/src/tests/blktests
>> Yamin and Max, can you take a look at this? The SRP tests from the
>> blktests repository pass reliably with kernel version v5.7 and before.
>> With label next-20200731 from linux-next however that test triggers the
>> following hang:
>
> I will look into it.
FWIW, I ran into this as well with nvme-rdma, but it also reproduces
when I revert the shared CQ patch from nvme-rdma. Another data point
is that my tests passes with siw.
On 2020-08-03 00:27, Sagi Grimberg wrote:
>
>>>> Greeting,
>>>>
>>>> FYI, we noticed the following commit (built with gcc-9):
>>>>
>>>> commit: c804af2c1d3152c0cf877eeb50d60c2d49ac0cf0 ("IB/srpt: use new shared CQ mechanism")
>>>> https://git.kernel.org/cgit/linux/kernel/git/rdma/rdma.git for-next
>>>>
>>>>
>>>> in testcase: blktests
>>>> with following parameters:
>>>>
>>>> ????test: srp-group1
>>>> ????ucode: 0x21
>>>>
>>>>
>>>>
>>>> on test machine: 4 threads Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz with 4G memory
>>>>
>>>> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
>>>>
>>>>
>>>>
>>>>
>>>> If you fix the issue, kindly add following tag
>>>> Reported-by: kernel test robot <[email protected]>
>>>>
>>>>
>>>> user? :notice: [?? 44.688140] 2020-08-01 16:10:22 ./check srp/001 srp/002 srp/003 srp/004 srp/005 srp/006 srp/007 srp/008 srp/009 srp/010 srp/011 srp/012 srp/013 srp/015
>>>> user? :notice: [?? 44.706657] srp/001 (Create and remove LUNs)
>>>> user? :notice: [?? 44.718405] srp/001 (Create and remove LUNs)???????????????????????????? [passed]
>>>> user? :notice: [?? 44.729902]???? runtime? ...? 1.972s
>>>> user? :notice: [?? 99.038748] IPMI BMC is not supported on this machine, skip bmc-watchdog setup!
>>>> user? :notice: [ 3699.039790] Sat Aug? 1 17:11:22 UTC 2020 detected soft_timeout
>>>> user? :notice: [ 3699.060341] kill 960 /usr/bin/time -v -o /tmp/lkp/blktests.time /lkp/lkp/src/tests/blktests
>>> Yamin and Max, can you take a look at this? The SRP tests from the
>>> blktests repository pass reliably with kernel version v5.7 and before.
>>> With label next-20200731 from linux-next however that test triggers the
>>> following hang:
>>
>> I will look into it.
>
> FWIW, I ran into this as well with nvme-rdma, but it also reproduces
> when I revert the shared CQ patch from nvme-rdma. Another data point
> is that my tests passes with siw.
Hi Jason,
The patch below is sufficient to unbreak blktests. I think that the
deadlock while unloading rdma_rxe happens because the RDMA core waits for
all ib_dev references to be dropped before dealloc_driver is called.
The rdma_rxe dealloc_driver implementation drops an ib_dev reference. The
dealloc_driver callback was introduced by commit d0899892edd0
("RDMA/device: Provide APIs from the core code to help unregistration").
Do you agree that this regression has been introduced by commits
d0899892edd0 and c367074b6c37 ("RDMA/rxe: Use driver_unregister and new
unregistration API")?
Thanks,
Bart.
---
drivers/infiniband/core/device.c | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index dca2842a7872..5192f305b253 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -1287,13 +1287,8 @@ static void disable_device(struct ib_device *device)
/* Pairs with refcount_set in enable_device */
ib_device_put(device);
- wait_for_completion(&device->unreg_completion);
- /*
- * compat devices must be removed after device refcount drops to zero.
- * Otherwise init_net() may add more compatdevs after removing compat
- * devices and before device is disabled.
- */
+ /* To do: prevent init_net() to add more compat_devs. */
remove_compat_devs(device);
}
On Sun, Aug 23, 2020 at 02:18:41PM -0700, Bart Van Assche wrote:
> The patch below is sufficient to unbreak blktests. I think that the
> deadlock while unloading rdma_rxe happens because the RDMA core waits for
> all ib_dev references to be dropped before dealloc_driver is called.
Which is required, yes
> The rdma_rxe dealloc_driver implementation drops an ib_dev
> reference.
Where does it do that? I didn't notice it?
Jason
On 2020-08-24 06:30, Jason Gunthorpe wrote:
> On Sun, Aug 23, 2020 at 02:18:41PM -0700, Bart Van Assche wrote:>
>> The patch below is sufficient to unbreak blktests. I think that the
>> deadlock while unloading rdma_rxe happens because the RDMA core waits for
>> all ib_dev references to be dropped before dealloc_driver is called.
>
> Which is required, yes
>
>> The rdma_rxe dealloc_driver implementation drops an ib_dev
>> reference.
>
> Where does it do that? I didn't notice it?
That last statement was wrong.
Anyway, with the following debugging patch applied:
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index c36b4d2b61e0..b976dd30f727 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -332,6 +332,8 @@ struct ib_device *ib_device_get_by_index(const struct net *net, u32 index)
*/
void ib_device_put(struct ib_device *device)
{
+ WARN(device->warn_on_refcount_drop, "%s refcnt = %d\n",
+ dev_name(&device->dev), refcount_read(&device->refcount));
if (refcount_dec_and_test(&device->refcount))
complete(&device->unreg_completion);
}
@@ -1287,7 +1289,10 @@ static void disable_device(struct ib_device *device)
/* Pairs with refcount_set in enable_device */
ib_device_put(device);
+ device->warn_on_refcount_drop = true;
+#if 0
wait_for_completion(&device->unreg_completion);
+#endif
/*
* compat devices must be removed after device refcount drops to zero.
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index c868609a4ffa..2d050e3ee55a 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2704,6 +2704,7 @@ struct ib_device {
* registered and cannot be unregistered.
*/
refcount_t refcount;
+ bool warn_on_refcount_drop;
struct completion unreg_completion;
struct work_struct unregistration_work;
The following appeared:
WARNING: CPU: 5 PID: 1760 at drivers/infiniband/core/device.c:335 ib_device_put+0xf2/0x100 [ib_core]
Call Trace:
rxe_elem_release+0x76/0x90 [rdma_rxe]
rxe_destroy_cq+0x4f/0x70 [rdma_rxe]
ib_free_cq_user+0x12b/0x2b0 [ib_core]
ib_cq_pool_destroy+0xa8/0x140 [ib_core]
__ib_unregister_device+0x9c/0x1c0 [ib_core]
ib_unregister_driver+0x181/0x1a0 [ib_core]
rxe_module_exit+0x31/0x50 [rdma_rxe]
__x64_sys_delete_module+0x22a/0x310
do_syscall_64+0x36/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Do you agree that the above proves that the hang-on-unload is a
regression that has been introduced by the cq-pool patches? Is the patch
below a good way to fix this?
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -1454,8 +1454,8 @@ static void __ib_unregister_device(struct ib_device *ib_dev)
if (!refcount_read(&ib_dev->refcount))
goto out;
- disable_device(ib_dev);
ib_cq_pool_destroy(ib_dev);
+ disable_device(ib_dev);
/* Expedite removing unregistered pointers from the hash table */
free_netdevs(ib_dev);
Thanks,
Bart.
On Sun, Sep 06, 2020 at 07:58:12PM -0700, Bart Van Assche wrote:
> The following appeared:
>
> WARNING: CPU: 5 PID: 1760 at drivers/infiniband/core/device.c:335 ib_device_put+0xf2/0x100 [ib_core]
>
> Call Trace:
> rxe_elem_release+0x76/0x90 [rdma_rxe]
> rxe_destroy_cq+0x4f/0x70 [rdma_rxe]
> ib_free_cq_user+0x12b/0x2b0 [ib_core]
> ib_cq_pool_destroy+0xa8/0x140 [ib_core]
> __ib_unregister_device+0x9c/0x1c0 [ib_core]
> ib_unregister_driver+0x181/0x1a0 [ib_core]
> rxe_module_exit+0x31/0x50 [rdma_rxe]
> __x64_sys_delete_module+0x22a/0x310
> do_syscall_64+0x36/0x80
> entry_SYSCALL_64_after_hwframe+0x44/0xa9
Oh interesting..
> Do you agree that the above proves that the hang-on-unload is a
> regression that has been introduced by the cq-pool patches? Is the patch
> below a good way to fix this?
> +++ b/drivers/infiniband/core/device.c
> @@ -1454,8 +1454,8 @@ static void __ib_unregister_device(struct ib_device *ib_dev)
> if (!refcount_read(&ib_dev->refcount))
> goto out;
>
> - disable_device(ib_dev);
> ib_cq_pool_destroy(ib_dev);
> + disable_device(ib_dev);
>
> /* Expedite removing unregistered pointers from the hash table */
> free_netdevs(ib_dev);
Hum. Not quite..
disable_device() ensures that all ib_clients have disconnected from
the device, up until the clients have disconnected the cq_pool must
remain operational.
It is reasonable to consider the cq_pool as a built-in client, so I
would suggest moving it to right around the time the dynamic clients
are handled. Something like this:
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index c36b4d2b61e0c0..e3651dacad1da6 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -1285,6 +1285,8 @@ static void disable_device(struct ib_device *device)
remove_client_context(device, cid);
}
+ ib_cq_pool_destroy(ib_dev);
+
/* Pairs with refcount_set in enable_device */
ib_device_put(device);
wait_for_completion(&device->unreg_completion);
@@ -1328,6 +1330,8 @@ static int enable_device_and_get(struct ib_device *device)
goto out;
}
+ ib_cq_pool_init(device);
+
down_read(&clients_rwsem);
xa_for_each_marked (&clients, index, client, CLIENT_REGISTERED) {
ret = add_client_context(device, client);
@@ -1400,7 +1404,6 @@ int ib_register_device(struct ib_device *device, const char *name)
goto dev_cleanup;
}
- ib_cq_pool_init(device);
ret = enable_device_and_get(device);
dev_set_uevent_suppress(&device->dev, false);
/* Mark for userspace that device is ready */
@@ -1455,7 +1458,6 @@ static void __ib_unregister_device(struct ib_device *ib_dev)
goto out;
disable_device(ib_dev);
- ib_cq_pool_destroy(ib_dev);
/* Expedite removing unregistered pointers from the hash table */
free_netdevs(ib_dev);
On 2020-09-08 11:22, Jason Gunthorpe wrote:
> It is reasonable to consider the cq_pool as a built-in client, so I
> would suggest moving it to right around the time the dynamic clients
> are handled. Something like this:
>
> diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
> index c36b4d2b61e0c0..e3651dacad1da6 100644
> --- a/drivers/infiniband/core/device.c
> +++ b/drivers/infiniband/core/device.c
> @@ -1285,6 +1285,8 @@ static void disable_device(struct ib_device *device)
> remove_client_context(device, cid);
> }
>
> + ib_cq_pool_destroy(ib_dev);
> +
> /* Pairs with refcount_set in enable_device */
> ib_device_put(device);
> wait_for_completion(&device->unreg_completion);
> @@ -1328,6 +1330,8 @@ static int enable_device_and_get(struct ib_device *device)
> goto out;
> }
>
> + ib_cq_pool_init(device);
> +
> down_read(&clients_rwsem);
> xa_for_each_marked (&clients, index, client, CLIENT_REGISTERED) {
> ret = add_client_context(device, client);
> @@ -1400,7 +1404,6 @@ int ib_register_device(struct ib_device *device, const char *name)
> goto dev_cleanup;
> }
>
> - ib_cq_pool_init(device);
> ret = enable_device_and_get(device);
> dev_set_uevent_suppress(&device->dev, false);
> /* Mark for userspace that device is ready */
> @@ -1455,7 +1458,6 @@ static void __ib_unregister_device(struct ib_device *ib_dev)
> goto out;
>
> disable_device(ib_dev);
> - ib_cq_pool_destroy(ib_dev);
>
> /* Expedite removing unregistered pointers from the hash table */
> free_netdevs(ib_dev);
The above patch didn't compile, but the patch below does and makes the hang
disappear. So feel free to add the following to the patch below:
Tested-by: Bart Van Assche <[email protected]>
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index c36b4d2b61e0..23ee65a9185f 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -1285,6 +1285,8 @@ static void disable_device(struct ib_device *device)
remove_client_context(device, cid);
}
+ ib_cq_pool_destroy(device);
+
/* Pairs with refcount_set in enable_device */
ib_device_put(device);
wait_for_completion(&device->unreg_completion);
@@ -1328,6 +1330,8 @@ static int enable_device_and_get(struct ib_device *device)
goto out;
}
+ ib_cq_pool_init(device);
+
down_read(&clients_rwsem);
xa_for_each_marked (&clients, index, client, CLIENT_REGISTERED) {
ret = add_client_context(device, client);
@@ -1400,7 +1404,6 @@ int ib_register_device(struct ib_device *device, const char *name)
goto dev_cleanup;
}
- ib_cq_pool_init(device);
ret = enable_device_and_get(device);
dev_set_uevent_suppress(&device->dev, false);
/* Mark for userspace that device is ready */
@@ -1455,7 +1458,6 @@ static void __ib_unregister_device(struct ib_device *ib_dev)
goto out;
disable_device(ib_dev);
- ib_cq_pool_destroy(ib_dev);
/* Expedite removing unregistered pointers from the hash table */
free_netdevs(ib_dev);
On 2020-09-08 19:01, Bart Van Assche wrote:
> The above patch didn't compile, but the patch below does and makes the hang
> disappear. So feel free to add the following to the patch below:
>
> Tested-by: Bart Van Assche <[email protected]>
>
> diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
> index c36b4d2b61e0..23ee65a9185f 100644
> --- a/drivers/infiniband/core/device.c
> +++ b/drivers/infiniband/core/device.c
> @@ -1285,6 +1285,8 @@ static void disable_device(struct ib_device *device)
> remove_client_context(device, cid);
> }
>
> + ib_cq_pool_destroy(device);
> +
> /* Pairs with refcount_set in enable_device */
> ib_device_put(device);
> wait_for_completion(&device->unreg_completion);
> @@ -1328,6 +1330,8 @@ static int enable_device_and_get(struct ib_device *device)
> goto out;
> }
>
> + ib_cq_pool_init(device);
> +
> down_read(&clients_rwsem);
> xa_for_each_marked (&clients, index, client, CLIENT_REGISTERED) {
> ret = add_client_context(device, client);
> @@ -1400,7 +1404,6 @@ int ib_register_device(struct ib_device *device, const char *name)
> goto dev_cleanup;
> }
>
> - ib_cq_pool_init(device);
> ret = enable_device_and_get(device);
> dev_set_uevent_suppress(&device->dev, false);
> /* Mark for userspace that device is ready */
> @@ -1455,7 +1458,6 @@ static void __ib_unregister_device(struct ib_device *ib_dev)
> goto out;
>
> disable_device(ib_dev);
> - ib_cq_pool_destroy(ib_dev);
>
> /* Expedite removing unregistered pointers from the hash table */
> free_netdevs(ib_dev);
Hi Jason,
Please let me know how you want to proceed with this patch.
Thanks,
Bart.
Tested-by: Yi Zhang <[email protected]>
This patch fixed the issue I filed[1] which use rdma_rxe for nvme-rdma
testing.
[1]
http://lists.infradead.org/pipermail/linux-nvme/2020-August/018988.html
Thanks
Yi
On 9/12/20 6:00 AM, Bart Van Assche wrote:
> On 2020-09-08 19:01, Bart Van Assche wrote:
>> The above patch didn't compile, but the patch below does and makes the hang
>> disappear. So feel free to add the following to the patch below:
>>
>> Tested-by: Bart Van Assche <[email protected]>
>>
>> diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
>> index c36b4d2b61e0..23ee65a9185f 100644
>> --- a/drivers/infiniband/core/device.c
>> +++ b/drivers/infiniband/core/device.c
>> @@ -1285,6 +1285,8 @@ static void disable_device(struct ib_device *device)
>> remove_client_context(device, cid);
>> }
>>
>> + ib_cq_pool_destroy(device);
>> +
>> /* Pairs with refcount_set in enable_device */
>> ib_device_put(device);
>> wait_for_completion(&device->unreg_completion);
>> @@ -1328,6 +1330,8 @@ static int enable_device_and_get(struct ib_device *device)
>> goto out;
>> }
>>
>> + ib_cq_pool_init(device);
>> +
>> down_read(&clients_rwsem);
>> xa_for_each_marked (&clients, index, client, CLIENT_REGISTERED) {
>> ret = add_client_context(device, client);
>> @@ -1400,7 +1404,6 @@ int ib_register_device(struct ib_device *device, const char *name)
>> goto dev_cleanup;
>> }
>>
>> - ib_cq_pool_init(device);
>> ret = enable_device_and_get(device);
>> dev_set_uevent_suppress(&device->dev, false);
>> /* Mark for userspace that device is ready */
>> @@ -1455,7 +1458,6 @@ static void __ib_unregister_device(struct ib_device *ib_dev)
>> goto out;
>>
>> disable_device(ib_dev);
>> - ib_cq_pool_destroy(ib_dev);
>>
>> /* Expedite removing unregistered pointers from the hash table */
>> free_netdevs(ib_dev);
> Hi Jason,
>
> Please let me know how you want to proceed with this patch.
>
> Thanks,
>
> Bart.
>
On Fri, Sep 11, 2020 at 03:00:30PM -0700, Bart Van Assche wrote:
> On 2020-09-08 19:01, Bart Van Assche wrote:
> > The above patch didn't compile, but the patch below does and makes the hang
> > disappear. So feel free to add the following to the patch below:
> >
> > Tested-by: Bart Van Assche <[email protected]>
> >
> > diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
> > index c36b4d2b61e0..23ee65a9185f 100644
> > +++ b/drivers/infiniband/core/device.c
> > @@ -1285,6 +1285,8 @@ static void disable_device(struct ib_device *device)
> > remove_client_context(device, cid);
> > }
> >
> > + ib_cq_pool_destroy(device);
> > +
> > /* Pairs with refcount_set in enable_device */
> > ib_device_put(device);
> > wait_for_completion(&device->unreg_completion);
> > @@ -1328,6 +1330,8 @@ static int enable_device_and_get(struct ib_device *device)
> > goto out;
> > }
> >
> > + ib_cq_pool_init(device);
> > +
> > down_read(&clients_rwsem);
> > xa_for_each_marked (&clients, index, client, CLIENT_REGISTERED) {
> > ret = add_client_context(device, client);
> > @@ -1400,7 +1404,6 @@ int ib_register_device(struct ib_device *device, const char *name)
> > goto dev_cleanup;
> > }
> >
> > - ib_cq_pool_init(device);
> > ret = enable_device_and_get(device);
> > dev_set_uevent_suppress(&device->dev, false);
> > /* Mark for userspace that device is ready */
> > @@ -1455,7 +1458,6 @@ static void __ib_unregister_device(struct ib_device *ib_dev)
> > goto out;
> >
> > disable_device(ib_dev);
> > - ib_cq_pool_destroy(ib_dev);
> >
> > /* Expedite removing unregistered pointers from the hash table */
> > free_netdevs(ib_dev);
>
> Hi Jason,
>
> Please let me know how you want to proceed with this patch.
Applied to for-rc
Thanks,
Jason