2022-08-20 22:00:27

by Chuck Lever III

Subject: check_flush_dependency WARNING on NFS/RDMA mount

Hi-

This warning just popped on a stuck NFS/RDMA mount (the Ethernet switch
port VLAN settings were not correct):

Aug 20 17:12:05 bazille.1015granger.net kernel: workqueue: WQ_MEM_RECLAIM xprtiod:xprt_rdma_connect_worker [rpcrdma] is flushing !WQ_MEM_RECLAI>
Aug 20 17:12:05 bazille.1015granger.net kernel: WARNING: CPU: 0 PID: 100 at kernel/workqueue.c:2628 check_flush_dependency+0xbf/0xca

Aug 20 17:12:05 bazille.1015granger.net kernel: Workqueue: xprtiod xprt_rdma_connect_worker [rpcrdma]

Aug 20 17:12:05 bazille.1015granger.net kernel: Call Trace:
Aug 20 17:12:05 bazille.1015granger.net kernel: <TASK>
Aug 20 17:12:05 bazille.1015granger.net kernel: __flush_work.isra.0+0xaf/0x188
Aug 20 17:12:05 bazille.1015granger.net kernel: ? _raw_spin_lock_irqsave+0x2c/0x37
Aug 20 17:12:05 bazille.1015granger.net kernel: ? lock_timer_base+0x38/0x5f
Aug 20 17:12:05 bazille.1015granger.net kernel: __cancel_work_timer+0xea/0x13d
Aug 20 17:12:05 bazille.1015granger.net kernel: ? preempt_latency_start+0x2b/0x46
Aug 20 17:12:05 bazille.1015granger.net kernel: rdma_addr_cancel+0x70/0x81 [ib_core]
Aug 20 17:12:05 bazille.1015granger.net kernel: _destroy_id+0x1a/0x246 [rdma_cm]
Aug 20 17:12:05 bazille.1015granger.net kernel: rpcrdma_xprt_connect+0x115/0x5ae [rpcrdma]
Aug 20 17:12:05 bazille.1015granger.net kernel: ? _raw_spin_unlock+0x14/0x29
Aug 20 17:12:05 bazille.1015granger.net kernel: ? raw_spin_rq_unlock_irq+0x5/0x10
Aug 20 17:12:05 bazille.1015granger.net kernel: ? finish_task_switch.isra.0+0x171/0x249
Aug 20 17:12:05 bazille.1015granger.net kernel: xprt_rdma_connect_worker+0x3b/0xc7 [rpcrdma]
Aug 20 17:12:05 bazille.1015granger.net kernel: process_one_work+0x1d8/0x2d4
Aug 20 17:12:05 bazille.1015granger.net kernel: worker_thread+0x18b/0x24f
Aug 20 17:12:05 bazille.1015granger.net kernel: ? rescuer_thread+0x280/0x280
Aug 20 17:12:05 bazille.1015granger.net kernel: kthread+0xf4/0xfc
Aug 20 17:12:05 bazille.1015granger.net kernel: ? kthread_complete_and_exit+0x1b/0x1b
Aug 20 17:12:05 bazille.1015granger.net kernel: ret_from_fork+0x22/0x30
Aug 20 17:12:05 bazille.1015granger.net kernel: </TASK>

At a guess, the recent changes to the WQ_MEM_RECLAIM settings in the
RPC xprt code did not get carried over to rpcrdma...? I'd appreciate
some guidance, and I can then write and test a fix for this.


--
Chuck Lever




2022-08-20 22:25:54

by Trond Myklebust

Subject: Re: check_flush_dependency WARNING on NFS/RDMA mount

On Sat, 2022-08-20 at 21:55 +0000, Chuck Lever III wrote:
> Hi-
>
> This warning just popped on a stuck NFS/RDMA mount (the Ethernet switch
> port VLAN settings were not correct):
>
> Aug 20 17:12:05 bazille.1015granger.net kernel: workqueue: WQ_MEM_RECLAIM xprtiod:xprt_rdma_connect_worker [rpcrdma] is flushing !WQ_MEM_RECLAI>
> Aug 20 17:12:05 bazille.1015granger.net kernel: WARNING: CPU: 0 PID: 100 at kernel/workqueue.c:2628 check_flush_dependency+0xbf/0xca
>
> Aug 20 17:12:05 bazille.1015granger.net kernel: Workqueue: xprtiod xprt_rdma_connect_worker [rpcrdma]
>
> Aug 20 17:12:05 bazille.1015granger.net kernel: Call Trace:
> Aug 20 17:12:05 bazille.1015granger.net kernel: <TASK>
> Aug 20 17:12:05 bazille.1015granger.net kernel: __flush_work.isra.0+0xaf/0x188
> Aug 20 17:12:05 bazille.1015granger.net kernel: ? _raw_spin_lock_irqsave+0x2c/0x37
> Aug 20 17:12:05 bazille.1015granger.net kernel: ? lock_timer_base+0x38/0x5f
> Aug 20 17:12:05 bazille.1015granger.net kernel: __cancel_work_timer+0xea/0x13d
> Aug 20 17:12:05 bazille.1015granger.net kernel: ? preempt_latency_start+0x2b/0x46
> Aug 20 17:12:05 bazille.1015granger.net kernel: rdma_addr_cancel+0x70/0x81 [ib_core]
> Aug 20 17:12:05 bazille.1015granger.net kernel: _destroy_id+0x1a/0x246 [rdma_cm]
> Aug 20 17:12:05 bazille.1015granger.net kernel: rpcrdma_xprt_connect+0x115/0x5ae [rpcrdma]
> Aug 20 17:12:05 bazille.1015granger.net kernel: ? _raw_spin_unlock+0x14/0x29
> Aug 20 17:12:05 bazille.1015granger.net kernel: ? raw_spin_rq_unlock_irq+0x5/0x10
> Aug 20 17:12:05 bazille.1015granger.net kernel: ? finish_task_switch.isra.0+0x171/0x249
> Aug 20 17:12:05 bazille.1015granger.net kernel: xprt_rdma_connect_worker+0x3b/0xc7 [rpcrdma]
> Aug 20 17:12:05 bazille.1015granger.net kernel: process_one_work+0x1d8/0x2d4
> Aug 20 17:12:05 bazille.1015granger.net kernel: worker_thread+0x18b/0x24f
> Aug 20 17:12:05 bazille.1015granger.net kernel: ? rescuer_thread+0x280/0x280
> Aug 20 17:12:05 bazille.1015granger.net kernel: kthread+0xf4/0xfc
> Aug 20 17:12:05 bazille.1015granger.net kernel: ? kthread_complete_and_exit+0x1b/0x1b
> Aug 20 17:12:05 bazille.1015granger.net kernel: ret_from_fork+0x22/0x30
> Aug 20 17:12:05 bazille.1015granger.net kernel: </TASK>
>
> At a guess, the recent changes to the WQ_MEM_RECLAIM settings in the
> RPC xprt code did not get carried over to rpcrdma...? I'd appreciate
> some guidance, and I can then write and test a fix for this.
>

Looks like you're trying to cancel work on a non-memory reclaim
workqueue from a job running on a queue that is flagged as a memory
reclaim workqueue. That's a priority inversion problem.
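
For reference, the queue running the connect worker is created with
WQ_MEM_RECLAIM on the sunrpc side (from my reading of the current code,
so double-check the call site):

	/* net/sunrpc/sched.c: rpciod_start() */
	wq = alloc_workqueue("xprtiod", WQ_UNBOUND | WQ_MEM_RECLAIM, 0);

check_flush_dependency() complains because only WQ_MEM_RECLAIM queues
get a rescuer thread, so a work item on such a queue is guaranteed
forward progress under memory pressure only if everything it waits on
is also WQ_MEM_RECLAIM.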


Basically, you need to change

int addr_init(void)
{
	addr_wq = alloc_ordered_workqueue("ib_addr", 0);
	if (!addr_wq)
		return -ENOMEM;

	register_netevent_notifier(&nb);

	return 0;
}

and flag addr_wq as being a WQ_MEM_RECLAIM queue.
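
That would be a one-line change along these lines (untested sketch;
alloc_ordered_workqueue() takes the same WQ_* flags as alloc_workqueue()):

	addr_wq = alloc_ordered_workqueue("ib_addr", WQ_MEM_RECLAIM);

That way the ib_addr queue gets a rescuer thread and can safely be
flushed from memory-reclaim context.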

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]