Message-ID: <21b9c1ac-64b7-7f4b-1e62-bf2f021fffcd@I-love.SAKURA.ne.jp>
Date: Thu, 28 Jul 2022 21:23:25 +0900
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: Tejun Heo, Lai Jiangshan, Johannes Berg, Hillf Danton
Cc: LKML <linux-kernel@vger.kernel.org>
Subject: [PATCH] workqueue: don't skip lockdep wq dependency in cancel_work_sync()

As Hillf Danton mentioned in [1],

  syzbot should have been able to catch cancel_work_sync() in work context
  by checking lockdep_map in __flush_work() for both flush and cancel.

Being unable to report the obvious deadlock scenario shown below is a
regression. From a locking dependency perspective, the sync version of a
cancel request should behave like a flush request, because it waits for the
work to complete if that work has already started executing.

----------
#include <linux/module.h>
#include <linux/sched.h>

static DEFINE_MUTEX(mutex);

static void work_fn(struct work_struct *work)
{
	schedule_timeout_uninterruptible(HZ / 5);

	mutex_lock(&mutex);
	mutex_unlock(&mutex);
}

static DECLARE_WORK(work, work_fn);

static int __init test_init(void)
{
	schedule_work(&work);
	schedule_timeout_uninterruptible(HZ / 10);
	mutex_lock(&mutex);
	cancel_work_sync(&work);
	mutex_unlock(&mutex);
	return -EINVAL;
}

module_init(test_init);
MODULE_LICENSE("GPL");
----------
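For contrast, a minimal sketch (not part of this patch) of the deadlock-free
ordering: the caller simply releases the lock before issuing the sync cancel,
so work_fn() can take the mutex and complete. It reuses the declarations from
the module above; test_init_fixed is a hypothetical name used only for
illustration.

----------
static int __init test_init_fixed(void)
{
	schedule_work(&work);
	schedule_timeout_uninterruptible(HZ / 10);
	mutex_lock(&mutex);
	mutex_unlock(&mutex);
	/* No lock is held across the sync cancel: work_fn() can acquire
	 * "mutex" and finish, so waiting for it cannot deadlock. */
	cancel_work_sync(&work);
	return -EINVAL;
}
----------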
Link: https://lkml.kernel.org/r/20220504044800.4966-1-hdanton@sina.com [1]
Reported-by: Hillf Danton <hdanton@sina.com>
Fixes: d6e89786bed977f3 ("workqueue: skip lockdep wq dependency in cancel_work_sync()")
Cc: Johannes Berg
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 kernel/workqueue.c | 45 ++++++++++++++++++---------------------------
 1 file changed, 18 insertions(+), 27 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 1ea50f6be843..e6df688f84db 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3000,8 +3000,7 @@ void drain_workqueue(struct workqueue_struct *wq)
 }
 EXPORT_SYMBOL_GPL(drain_workqueue);
 
-static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr,
-			     bool from_cancel)
+static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr)
 {
 	struct worker *worker = NULL;
 	struct worker_pool *pool;
@@ -3043,8 +3042,7 @@ static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr,
 	 * workqueues the deadlock happens when the rescuer stalls, blocking
 	 * forward progress.
 	 */
-	if (!from_cancel &&
-	    (pwq->wq->saved_max_active == 1 || pwq->wq->rescuer)) {
+	if (pwq->wq->saved_max_active == 1 || pwq->wq->rescuer) {
 		lock_map_acquire(&pwq->wq->lockdep_map);
 		lock_map_release(&pwq->wq->lockdep_map);
 	}
@@ -3056,7 +3054,18 @@ static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr,
 	return false;
 }
 
-static bool __flush_work(struct work_struct *work, bool from_cancel)
+/**
+ * flush_work - wait for a work to finish executing the last queueing instance
+ * @work: the work to flush
+ *
+ * Wait until @work has finished execution. @work is guaranteed to be idle
+ * on return if it hasn't been requeued since flush started.
+ *
+ * Return:
+ * %true if flush_work() waited for the work to finish execution,
+ * %false if it was already idle.
+ */
+bool flush_work(struct work_struct *work)
 {
 	struct wq_barrier barr;
 
@@ -3066,12 +3075,10 @@ static bool __flush_work(struct work_struct *work, bool from_cancel)
 	if (WARN_ON(!work->func))
 		return false;
 
-	if (!from_cancel) {
-		lock_map_acquire(&work->lockdep_map);
-		lock_map_release(&work->lockdep_map);
-	}
+	lock_map_acquire(&work->lockdep_map);
+	lock_map_release(&work->lockdep_map);
 
-	if (start_flush_work(work, &barr, from_cancel)) {
+	if (start_flush_work(work, &barr)) {
 		wait_for_completion(&barr.done);
 		destroy_work_on_stack(&barr.work);
 		return true;
@@ -3079,22 +3086,6 @@ static bool __flush_work(struct work_struct *work, bool from_cancel)
 		return false;
 	}
 }
-
-/**
- * flush_work - wait for a work to finish executing the last queueing instance
- * @work: the work to flush
- *
- * Wait until @work has finished execution. @work is guaranteed to be idle
- * on return if it hasn't been requeued since flush started.
- *
- * Return:
- * %true if flush_work() waited for the work to finish execution,
- * %false if it was already idle.
- */
-bool flush_work(struct work_struct *work)
-{
-	return __flush_work(work, false);
-}
 EXPORT_SYMBOL_GPL(flush_work);
 
 struct cwt_wait {
@@ -3159,7 +3150,7 @@ static bool __cancel_work_timer(struct work_struct *work, bool is_dwork)
 	 * isn't executing.
 	 */
 	if (wq_online)
-		__flush_work(work, true);
+		flush_work(work);
 
 	clear_work_data(work);
-- 
2.18.4
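A reviewer-side sketch of why the restored annotation matters (the "(work)"
notation below stands for work->lockdep_map; this is an illustration, not
actual lockdep output): flush_work() now acquires and releases
work->lockdep_map even on the cancel path, and process_one_work() does the
same around work_fn(), so both edges of the cycle in the reproducer get
recorded:

----------
	mutex_lock(&mutex);
	cancel_work_sync(&work);	/* caller edge: mutex -> (work) */
	mutex_unlock(&mutex);

	/* meanwhile in the worker, inside work_fn(): */
	mutex_lock(&mutex);		/* worker edge: (work) -> mutex */
	mutex_unlock(&mutex);
----------

With both edges present, lockdep can flag the inversion even when
cancel_work_sync() wins the race and the work item never starts executing.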