2022-02-14 09:53:28

by Tetsuo Handa

Subject: Re: [syzbot] possible deadlock in worker_thread

On 2022/02/14 8:06, Bart Van Assche wrote:
> On 2/12/22 09:14, Tetsuo Handa wrote:
>> How can reviewing all flush_workqueue(system_long_wq) calls help?
>
> It is allowed to queue blocking actions on system_long_wq.

Correct.

> flush_workqueue(system_long_wq) can make a lower layer (e.g. ib_srp)
> wait on a blocking action from a higher layer (e.g. the loop driver)
> and thereby cause a deadlock.

Correct.

> Hence my proposal to review all flush_workqueue(system_long_wq) calls.

Maybe I'm misunderstanding what "review" means here.

My proposal is to "rewrite" any module which needs to call flush_workqueue()
on system-wide workqueues, or which calls flush_work()/flush_*_work() on work
items queued on system-wide workqueues.

That is, for example, "rewrite" the ib_srp module so that it does not call
flush_workqueue(system_long_wq):

+ srp_tl_err_wq = alloc_workqueue("srp_tl_err_wq", 0, 0);

- queue_work(system_long_wq, &target->tl_err_work);
+ queue_work(srp_tl_err_wq, &target->tl_err_work);

- flush_workqueue(system_long_wq);
+ flush_workqueue(srp_tl_err_wq);

+ destroy_workqueue(srp_tl_err_wq);

Then, we can call WARN_ON() if e.g. flush_workqueue() is called on system-wide workqueues.
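
To illustrate the idea, here is a minimal userspace sketch in C. It is only a toy model, not kernel code: struct toy_wq, its system_wide flag, toy_queue_work(), toy_flush_workqueue() and toy_warn_count are all hypothetical names made up for this example, and the fprintf() merely stands in for WARN_ON(). It shows that flushing a shared queue also runs (and would therefore wait for) work queued by unrelated users, while flushing a module-private queue only touches the module's own work.

```c
#include <stdbool.h>
#include <stdio.h>

#define MAX_PENDING 8

typedef void (*work_fn)(void);

/* Toy stand-in for a workqueue; NOT the kernel's struct workqueue_struct. */
struct toy_wq {
	const char *name;
	bool system_wide;		/* hypothetical flag marking shared queues */
	work_fn pending[MAX_PENDING];
	int npending;
};

int toy_warn_count;			/* number of times the check has fired */

/* Append a work item; returns false if the toy queue is full. */
bool toy_queue_work(struct toy_wq *wq, work_fn fn)
{
	if (wq->npending >= MAX_PENDING)
		return false;
	wq->pending[wq->npending++] = fn;
	return true;
}

/*
 * Run every pending item, no matter who queued it, and return how many ran.
 * Complains first when the queue is a shared one (stand-in for WARN_ON()).
 */
int toy_flush_workqueue(struct toy_wq *wq)
{
	int ran = 0;

	if (wq->system_wide) {
		toy_warn_count++;
		fprintf(stderr, "WARNING: flush of system-wide queue %s\n",
			wq->name);
	}
	for (int i = 0; i < wq->npending; i++) {
		wq->pending[i]();
		ran++;
	}
	wq->npending = 0;
	return ran;
}

/* Counters showing whose work a flush actually ran. */
int srp_runs, loop_runs;

void srp_tl_err_work(void) { srp_runs++; }	/* lower-layer work */
void loop_blocking_work(void) { loop_runs++; }	/* higher-layer work */

/* One shared queue and one module-private queue, as in the diff above. */
struct toy_wq toy_system_long_wq = { .name = "system_long_wq", .system_wide = true };
struct toy_wq toy_srp_tl_err_wq = { .name = "srp_tl_err_wq", .system_wide = false };
```

In this model, queueing both loop_blocking_work and srp_tl_err_work on toy_system_long_wq and then flushing it runs both items and bumps toy_warn_count, whereas flushing toy_srp_tl_err_wq runs only the SRP item and stays silent.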