Received: by 2002:a05:6a10:9afc:0:0:0:0 with SMTP id t28csp2979802pxm; Mon, 28 Feb 2022 09:38:33 -0800 (PST) X-Google-Smtp-Source: ABdhPJwaU3CAaJEmy8Nb4H3cwT8MLcFRJDGxjnKEncSZdYEFtGC7bOIcrsV0PruJhUo9mTu9ewDx X-Received: by 2002:a17:90a:e38b:b0:1bc:a1d6:1b31 with SMTP id b11-20020a17090ae38b00b001bca1d61b31mr17585269pjz.29.1646069913416; Mon, 28 Feb 2022 09:38:33 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1646069913; cv=none; d=google.com; s=arc-20160816; b=uJJp+PQ+SZMKTV3N80RJrM8xMTZKce+uS18aY3TXUiQb8zl73597+O02dSFONaxdCL 2fcpPwng6HAhYZzPZd4KAmNYqs9iqC6BBx6P159jNO2ZmxMVeDL6s+aDyNKtT41pwwCb mXE16EfInTa32c/8VYvVYOWxHvz91ZX8OgeDYsWuN72n2cJgcivC5QjUcpyqHFb5ffFj 85nK+7llmKe5Zr6cJEL8o5oqUtrsGJauqW/MEPVc15p1/oii+FD2nxWHRNwQi7rHVHNM JJx60IBB48ml2sELsyKbv5exf4Z7TvzRn5kMJZ+xshBncuJJZUtLnAYNB0/4t83zjki/ FWfg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:from :references:cc:to:content-language:subject:user-agent:mime-version :date:message-id; bh=W9UwP6My9SHrInTIN5nNaWWzKJZemCuajMhbE3xWJZA=; b=zU7gGexZIvesQ61z5ugxFHCxGaY9kXNBfyCHgdQKCT/zVmcDkNnq6AcL0WcDWe0GSF C5J20CIg1ulrRuknL2onuVScTRkE4hjUJKPWoOy7NRsE90uk6nI4eJQWEN+mKZhtjQe3 jzlFfO+bybOk1iNPTLfrQUEsN0mJ8LA+Seu1TqLQfQSPNflXokRgWD0FMdsR9euW0o7f A8S7jTmvRy3bW7o5vFwADNtJKB2XSiRMV0pP1dZ0Cku7hbXBf1yTddTl5OK0CVCTXqLG /7DJtC69+VGl3/bERNgZF3aACZvL8if1jUZEO0on9eQcYcxw7qBX2aCBZMHxX361PSDQ NyFQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id u7-20020a634547000000b003428174afe9si10047718pgk.318.2022.02.28.09.38.17; Mon, 28 Feb 2022 09:38:33 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232896AbiB1OEs (ORCPT + 99 others); Mon, 28 Feb 2022 09:04:48 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45230 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232334AbiB1OEp (ORCPT ); Mon, 28 Feb 2022 09:04:45 -0500 Received: from www262.sakura.ne.jp (www262.sakura.ne.jp [202.181.97.72]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BEE8256225 for ; Mon, 28 Feb 2022 06:04:05 -0800 (PST) Received: from fsav120.sakura.ne.jp (fsav120.sakura.ne.jp [27.133.134.247]) by www262.sakura.ne.jp (8.15.2/8.15.2) with ESMTP id 21SE3rFv077141; Mon, 28 Feb 2022 23:03:53 +0900 (JST) (envelope-from penguin-kernel@I-love.SAKURA.ne.jp) Received: from www262.sakura.ne.jp (202.181.97.72) by fsav120.sakura.ne.jp (F-Secure/fsigk_smtp/550/fsav120.sakura.ne.jp); Mon, 28 Feb 2022 23:03:53 +0900 (JST) X-Virus-Status: clean(F-Secure/fsigk_smtp/550/fsav120.sakura.ne.jp) Received: from [192.168.1.9] (M106072142033.v4.enabler.ne.jp [106.72.142.33]) (authenticated bits=0) by www262.sakura.ne.jp (8.15.2/8.15.2) with ESMTPSA id 21SE3lRI077118 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NO); Mon, 28 Feb 2022 23:03:53 +0900 (JST) (envelope-from penguin-kernel@I-love.SAKURA.ne.jp) Message-ID: <4165db50-1365-549a-eb77-6122c78d4814@I-love.SAKURA.ne.jp> Date: Mon, 28 Feb 2022 23:03:47 +0900 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:91.0) Gecko/20100101 Thunderbird/91.6.1 Subject: [PATCH v3] workqueue: Warn flushing of kernel-global workqueues Content-Language: en-US To: Tejun Heo Cc: 0day robot , LKML , lkp@lists.01.org, kernel test robot References: <20220221083358.GC835@xsang-OptiPlex-9020> <3a20c799-c18e-dd3a-3161-fee6bca1491e@I-love.SAKURA.ne.jp> <16a33a65-3c67-ef66-ccc8-9c4fffb0ae5a@I-love.SAKURA.ne.jp> <9a883d72-ea7d-1936-93e6-5c2a290509d4@I-love.SAKURA.ne.jp> From: Tetsuo Handa In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_HELO_NONE, SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2022/02/24 7:29, Tejun Heo wrote: > On Thu, Feb 24, 2022 at 07:26:30AM +0900, Tetsuo Handa wrote: >>> The patch seems pretty wrong. What's problematic is system workqueue flushes >>> (which flushes the entire workqueue), not work item flushes. >> >> Why? My understanding is that >> >> flushing a workqueue waits for completion of all work items in that workqueue >> >> flushing a work item waits for for completion of that work item using >> a workqueue specified as of queue_work() >> >> and >> >> if a work item in some workqueue is blocked by other work in that workqueue >> (e.g. max_active limit, work items on that workqueue and locks they need), >> it has a risk of deadlock >> >> . Then, how can flushing a work item using system-wide workqueues be free of deadlock risk? >> Isn't it just "unlikely to deadlock" rather than "impossible to deadlock"? > > If we're jamming system_wq with a combination of work items which need more > than max_active to make forward progress, we're stuck regardless of flushes. > What's needed at that point is increasing max_active (or something along > that line). Then, what about this? If this looks OK, I'll test this patch using linux-next via my tree. From 75ccc165bfcea368728d8c0f7efcc8b4f904f98d Mon Sep 17 00:00:00 2001 From: Tetsuo Handa Date: Mon, 28 Feb 2022 22:49:16 +0900 Subject: [PATCH v3] workqueue: Warn flushing of kernel-global workqueues Since flush operation synchronously waits for completion, flushing kernel-global WQs (e.g. system_wq) might introduce possibility of deadlock due to unexpected locking dependency. Tejun Heo commented that it makes no sense at all to call flush_workqueue() on the shared WQs as the caller has no idea what it's gonna end up waiting for. Although there is flush_scheduled_work() which flushes system_wq WQ with "Think twice before calling this function! It's very easy to get into trouble if you don't take great care." warning message, syzbot found a circular locking dependency caused by flushing system_long_wq WQ [1]. Therefore, let's change the direction to that developers had better use their local WQs if flush_workqueue() is inevitable. To give developers time to update their modules, for now just emit a warning message with ratelimit when flush_workqueue() is called on kernel-global WQs. We will eventually convert this warning message into WARN_ON() and kill flush_scheduled_work(). This patch introduces __WQ_NO_FLUSH flag for emitting warning. Don't set this flag when creating your local WQs while updating your module, for destroy_workqueue() will involve flush operation. Theoretically, flushing specific work item using flush_work() queued on kernel-global WQs (which are !WQ_MEM_RECLAIM) has possibility of deadlock. But this patch does not emit warning when flush_work() is called on work items queued on kernel-global WQs, based on assumption that we can create kworker threads as needed and we won't hit max_active limit. Link: https://syzkaller.appspot.com/bug?extid=831661966588c802aae9 [1] Signed-off-by: Tetsuo Handa --- Changes in v3: Don't check flush_work() attempt. Use a private WQ_ flag. Changes in v2: Removed #ifdef CONFIG_PROVE_LOCKING=y check. Also check flush_work() attempt. Shorten warning message. Introduced a public WQ_ flag, which is initially meant for use by only system-wide WQs, but allows private WQs used by built-in modules to use this flag for detecting unexpected flush attempts if they want. include/linux/workqueue.h | 15 +++------------ kernel/workqueue.c | 36 +++++++++++++++++++++++++++++------- 2 files changed, 32 insertions(+), 19 deletions(-) diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h index 7fee9b6cfede..7b13fae377e2 100644 --- a/include/linux/workqueue.h +++ b/include/linux/workqueue.h @@ -339,6 +339,7 @@ enum { __WQ_ORDERED = 1 << 17, /* internal: workqueue is ordered */ __WQ_LEGACY = 1 << 18, /* internal: create*_workqueue() */ __WQ_ORDERED_EXPLICIT = 1 << 19, /* internal: alloc_ordered_workqueue() */ + __WQ_NO_FLUSH = 1 << 20, /* internal: warn flush_workqueue() */ WQ_MAX_ACTIVE = 512, /* I like 512, better ideas? */ WQ_MAX_UNBOUND_PER_CPU = 4, /* 4 * #cpus for unbound wq */ @@ -569,18 +570,8 @@ static inline bool schedule_work(struct work_struct *work) * Forces execution of the kernel-global workqueue and blocks until its * completion. * - * Think twice before calling this function! It's very easy to get into - * trouble if you don't take great care. Either of the following situations - * will lead to deadlock: - * - * One of the work items currently on the workqueue needs to acquire - * a lock held by your code or its caller. - * - * Your code is running in the context of a work routine. - * - * They will be detected by lockdep when they occur, but the first might not - * occur very often. It depends on what work items are on the workqueue and - * what locks they need, which you have no control over. + * Please stop calling this function. If you need to flush kernel-global + * workqueue, please use your local workqueue. * * In most situations flushing the entire workqueue is overkill; you merely * need to know that a particular work item isn't queued and isn't running. diff --git a/kernel/workqueue.c b/kernel/workqueue.c index 33f1106b4f99..bc271579704f 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -2805,6 +2805,25 @@ static bool flush_workqueue_prep_pwqs(struct workqueue_struct *wq, return wait; } +static void warn_flush_attempt(struct workqueue_struct *wq) +{ + /* + * Since there are known in-tree modules which will emit this warning, + * for now don't use WARN_ON() in order not to break kernel testing. + * + * Print whole traces with ratelimit, in order to make sure that + * this warning is not overlooked while this warning does not flood + * console and kernel log buffer. + */ + static DEFINE_RATELIMIT_STATE(flush_warn_rs, 600 * HZ, 1); + + ratelimit_set_flags(&flush_warn_rs, RATELIMIT_MSG_ON_RELEASE); + if (!__ratelimit(&flush_warn_rs)) + return; + pr_warn("Please do not flush %s WQ.\n", wq->name); + dump_stack(); +} + /** * flush_workqueue - ensure that any scheduled work has run to completion. * @wq: workqueue to flush @@ -2824,6 +2843,9 @@ void flush_workqueue(struct workqueue_struct *wq) if (WARN_ON(!wq_online)) return; + if (unlikely(wq->flags & __WQ_NO_FLUSH)) + warn_flush_attempt(wq); + lock_map_acquire(&wq->lockdep_map); lock_map_release(&wq->lockdep_map); @@ -6054,17 +6076,17 @@ void __init workqueue_init_early(void) ordered_wq_attrs[i] = attrs; } - system_wq = alloc_workqueue("events", 0, 0); - system_highpri_wq = alloc_workqueue("events_highpri", WQ_HIGHPRI, 0); - system_long_wq = alloc_workqueue("events_long", 0, 0); - system_unbound_wq = alloc_workqueue("events_unbound", WQ_UNBOUND, + system_wq = alloc_workqueue("events", __WQ_NO_FLUSH, 0); + system_highpri_wq = alloc_workqueue("events_highpri", __WQ_NO_FLUSH | WQ_HIGHPRI, 0); + system_long_wq = alloc_workqueue("events_long", __WQ_NO_FLUSH, 0); + system_unbound_wq = alloc_workqueue("events_unbound", __WQ_NO_FLUSH | WQ_UNBOUND, WQ_UNBOUND_MAX_ACTIVE); system_freezable_wq = alloc_workqueue("events_freezable", - WQ_FREEZABLE, 0); + __WQ_NO_FLUSH | WQ_FREEZABLE, 0); system_power_efficient_wq = alloc_workqueue("events_power_efficient", - WQ_POWER_EFFICIENT, 0); + __WQ_NO_FLUSH | WQ_POWER_EFFICIENT, 0); system_freezable_power_efficient_wq = alloc_workqueue("events_freezable_power_efficient", - WQ_FREEZABLE | WQ_POWER_EFFICIENT, + __WQ_NO_FLUSH | WQ_FREEZABLE | WQ_POWER_EFFICIENT, 0); BUG_ON(!system_wq || !system_highpri_wq || !system_long_wq || !system_unbound_wq || !system_freezable_wq || -- 2.32.0