From: Lai Jiangshan
Date: Fri, 29 Jul 2022 10:38:36 +0800
Subject: Re: [PATCH] workqueue: don't skip lockdep wq dependency in cancel_work_sync()
To: Tetsuo Handa
Cc: Tejun Heo, Johannes Berg, Hillf Danton, LKML
On Thu, Jul 28, 2022 at 8:23 PM Tetsuo Handa wrote:
>
> Like Hillf Danton mentioned
>
>   syzbot should have been able to catch cancel_work_sync() in work context
>   by checking lockdep_map in __flush_work() for both flush and cancel.
>
> in [1], being unable to report an obvious deadlock scenario shown below is
> broken. From locking dependency perspective, sync version of cancel request
> should behave as if flush request, for it waits for completion of work if
> that work has already started execution.
>
> ----------
> #include <linux/module.h>
> #include <linux/sched.h>
> static DEFINE_MUTEX(mutex);
> static void work_fn(struct work_struct *work)
> {
>         schedule_timeout_uninterruptible(HZ / 5);
>         mutex_lock(&mutex);
>         mutex_unlock(&mutex);
> }
> static DECLARE_WORK(work, work_fn);
> static int __init test_init(void)
> {
>         schedule_work(&work);
>         schedule_timeout_uninterruptible(HZ / 10);
>         mutex_lock(&mutex);
>         cancel_work_sync(&work);
>         mutex_unlock(&mutex);
>         return -EINVAL;
> }
> module_init(test_init);
> MODULE_LICENSE("GPL");
> ----------
>
> Link: https://lkml.kernel.org/r/20220504044800.4966-1-hdanton@sina.com [1]
> Reported-by: Hillf Danton
> Fixes: d6e89786bed977f3 ("workqueue: skip lockdep wq dependency in cancel_work_sync()")
> Cc: Johannes Berg
> Signed-off-by: Tetsuo Handa
> ---
>  kernel/workqueue.c | 45 ++++++++++++++++++---------------------------
>  1 file changed, 18 insertions(+), 27 deletions(-)
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 1ea50f6be843..e6df688f84db 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -3000,8 +3000,7 @@ void drain_workqueue(struct workqueue_struct *wq)
>  }
>  EXPORT_SYMBOL_GPL(drain_workqueue);
>
> -static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr,
> -                            bool from_cancel)
> +static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr)
>  {
>         struct worker *worker = NULL;
>         struct worker_pool *pool;
> @@ -3043,8 +3042,7 @@ static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr,
>          * workqueues the deadlock happens when the rescuer stalls, blocking
>          * forward progress.
>          */
> -       if (!from_cancel &&
> -           (pwq->wq->saved_max_active == 1 || pwq->wq->rescuer)) {
> +       if (pwq->wq->saved_max_active == 1 || pwq->wq->rescuer) {
>                 lock_map_acquire(&pwq->wq->lockdep_map);
>                 lock_map_release(&pwq->wq->lockdep_map);
>         }
> @@ -3056,7 +3054,18 @@ static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr,
>         return false;
>  }
>
> -static bool __flush_work(struct work_struct *work, bool from_cancel)
> +/**
> + * flush_work - wait for a work to finish executing the last queueing instance
> + * @work: the work to flush
> + *
> + * Wait until @work has finished execution. @work is guaranteed to be idle
> + * on return if it hasn't been requeued since flush started.
> + *
> + * Return:
> + * %true if flush_work() waited for the work to finish execution,
> + * %false if it was already idle.
> + */
> +bool flush_work(struct work_struct *work)
>  {
>         struct wq_barrier barr;
>
> @@ -3066,12 +3075,10 @@ static bool __flush_work(struct work_struct *work, bool from_cancel)
>         if (WARN_ON(!work->func))
>                 return false;
>
> -       if (!from_cancel) {
> -               lock_map_acquire(&work->lockdep_map);
> -               lock_map_release(&work->lockdep_map);
> -       }
> +       lock_map_acquire(&work->lockdep_map);
> +       lock_map_release(&work->lockdep_map);

IIUC, I think the change of these 5 lines of code (-3+2) is enough to
fix the problem described in the changelog.
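Roughly like this, I mean (just a sketch cut down from the hunks quoted
above, untested; the hunk offsets are illustrative and the trailing
context line is from my reading of current mainline):

--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3069,9 +3069,7 @@ static bool __flush_work(struct work_struct *work, bool from_cancel)
 	if (WARN_ON(!work->func))
 		return false;
 
-	if (!from_cancel) {
-		lock_map_acquire(&work->lockdep_map);
-		lock_map_release(&work->lockdep_map);
-	}
+	lock_map_acquire(&work->lockdep_map);
+	lock_map_release(&work->lockdep_map);
 
 	if (start_flush_work(work, &barr, from_cancel)) {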
If so, could you make a minimal patch?

I believe the problem that commit d6e89786bed977f3 ("workqueue: skip
lockdep wq dependency in cancel_work_sync()") fixes is real, so it is
not a good idea to revert it.

P.S. Commit fd1a5b04dfb8 ("workqueue: Remove now redundant lock
acquisitions wrt. workqueue flushes") removed this lockdep check, and
commit 87915adc3f0a ("workqueue: re-add lockdep dependencies for
flushing") added it back for the non-canceling cases only. It seems
commit fd1a5b04dfb8 is the culprit and 87915adc3f0a didn't fix all of
its problems. So it is better to complete 87915adc3f0a by making
__flush_work() do lock_map_acquire(&work->lockdep_map) for both the
canceling and non-canceling cases.
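P.P.S. For reference, my reading of how the unconditional annotation
would catch the test module from the changelog (this relies on
process_one_work() holding the work item's lockdep map around the
callback, which mainline does):

	test_init()                     worker thread
	-----------                     -------------
	mutex_lock(&mutex);
	                                process_one_work():
	                                  lock_map_acquire(&work->lockdep_map);
	                                  work_fn():
	                                    mutex_lock(&mutex);  /* blocks */
	cancel_work_sync(&work);
	  __flush_work():
	    lock_map_acquire(&work->lockdep_map);

The last acquire records mutex -> work->lockdep_map while the worker
has already recorded work->lockdep_map -> mutex, so lockdep reports the
inversion instead of the test module only hanging.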