From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Shinichiro Kawasaki, Jens Axboe
Subject: [PATCH 5.10 094/599] block: limit request dispatch loop duration
Date: Tue, 5 Apr 2022 09:26:28 +0200
Message-Id: <20220405070301.627744791@linuxfoundation.org>
In-Reply-To: <20220405070258.802373272@linuxfoundation.org>
References: <20220405070258.802373272@linuxfoundation.org>

From: Shin'ichiro Kawasaki

commit 572299f03afd676dd4e20669cdaf5ed0fe1379d4 upstream.

When IO requests are made continuously and the target block device
handles requests faster than request arrival, the request dispatch loop
keeps on repeating to dispatch the arriving requests very long time, more
than a minute. Since the loop runs as a workqueue worker task, the very
long loop duration triggers workqueue watchdog timeout and BUG [1].

To avoid the very long loop duration, break the loop periodically. When
opportunity to dispatch requests still exists, check need_resched(). If
need_resched() returns true, the dispatch loop already consumed its time
slice, then reschedule the dispatch work and break the loop.
With heavy IO load, need_resched() does not return true for 20~30
seconds. To cover such case, check time spent in the dispatch loop with
jiffies. If more than 1 second is spent, reschedule the dispatch work and
break the loop.

[1]

[  609.691437] BUG: workqueue lockup - pool cpus=10 node=1 flags=0x0 nice=-20 stuck for 35s!
[  609.701820] Showing busy workqueues and worker pools:
[  609.707915] workqueue events: flags=0x0
[  609.712615]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[  609.712626]     pending: drm_fb_helper_damage_work [drm_kms_helper]
[  609.712687] workqueue events_freezable: flags=0x4
[  609.732943]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[  609.732952]     pending: pci_pme_list_scan
[  609.732968] workqueue events_power_efficient: flags=0x80
[  609.751947]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[  609.751955]     pending: neigh_managed_work
[  609.752018] workqueue kblockd: flags=0x18
[  609.769480]   pwq 21: cpus=10 node=1 flags=0x0 nice=-20 active=3/256 refcnt=4
[  609.769488]     in-flight: 1020:blk_mq_run_work_fn
[  609.769498]     pending: blk_mq_timeout_work, blk_mq_run_work_fn
[  609.769744] pool 21: cpus=10 node=1 flags=0x0 nice=-20 hung=35s workers=2 idle: 67
[  639.899730] BUG: workqueue lockup - pool cpus=10 node=1 flags=0x0 nice=-20 stuck for 66s!
[  639.909513] Showing busy workqueues and worker pools:
[  639.915404] workqueue events: flags=0x0
[  639.920197]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[  639.920215]     pending: drm_fb_helper_damage_work [drm_kms_helper]
[  639.920365] workqueue kblockd: flags=0x18
[  639.939932]   pwq 21: cpus=10 node=1 flags=0x0 nice=-20 active=3/256 refcnt=4
[  639.939942]     in-flight: 1020:blk_mq_run_work_fn
[  639.939955]     pending: blk_mq_timeout_work, blk_mq_run_work_fn
[  639.940212] pool 21: cpus=10 node=1 flags=0x0 nice=-20 hung=66s workers=2 idle: 67

Fixes: 6e6fcbc27e778 ("blk-mq: support batching dispatch in case of io")
Signed-off-by: Shin'ichiro Kawasaki
Cc: stable@vger.kernel.org # v5.10+
Link: https://lore.kernel.org/linux-block/20220310091649.zypaem5lkyfadymg@shindev/
Link: https://lore.kernel.org/r/20220318022641.133484-1-shinichiro.kawasaki@wdc.com
Signed-off-by: Jens Axboe
Signed-off-by: Greg Kroah-Hartman
---
 block/blk-mq-sched.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -194,11 +194,18 @@ static int __blk_mq_do_dispatch_sched(st
 
 static int blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 {
+	unsigned long end = jiffies + HZ;
 	int ret;
 
 	do {
 		ret = __blk_mq_do_dispatch_sched(hctx);
-	} while (ret == 1);
+		if (ret != 1)
+			break;
+		if (need_resched() || time_is_before_jiffies(end)) {
+			blk_mq_delay_run_hw_queue(hctx, 0);
+			break;
+		}
+	} while (1);
 
 	return ret;
 }
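
For readers who want to see the bounded-loop idea outside the kernel, here is a
minimal user-space sketch of the pattern the patch applies. It is not kernel
code and makes several assumptions: struct sim_queue, SIM_HZ,
sim_dispatch_one() and sim_do_dispatch() are hypothetical names invented for
this illustration; the simulated tick counter stands in for jiffies; and
incrementing a rerun counter stands in for blk_mq_delay_run_hw_queue(). The
need_resched() check is left out because it has no direct user-space
equivalent.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; none of these are kernel APIs. */
#define SIM_HZ 100UL	/* assumed number of ticks per simulated second */

struct sim_queue {
	size_t pending;		/* requests still waiting to be dispatched */
	unsigned long now;	/* simulated tick counter, stands in for jiffies */
	unsigned long reruns;	/* how often the work was handed back for a rerun */
};

/* Dispatch one request. Returns 1 if a request was dispatched, 0 if none was left. */
static int sim_dispatch_one(struct sim_queue *q)
{
	if (q->pending == 0)
		return 0;
	q->pending--;
	q->now++;		/* assume each dispatch costs one tick */
	return 1;
}

/*
 * Bounded dispatch loop: keep dispatching while work remains, but give up
 * once more than one simulated second has passed since entry and record
 * that the work must be run again later.
 */
static int sim_do_dispatch(struct sim_queue *q)
{
	unsigned long end;
	int ret;

	assert(q != NULL);
	end = q->now + SIM_HZ;

	for (;;) {
		ret = sim_dispatch_one(q);
		if (ret != 1)
			break;
		/* Wraparound-safe "now is after end" comparison on the tick counter. */
		if ((long)(end - q->now) < 0) {
			q->reruns++;	/* stands in for rescheduling the dispatch work */
			break;
		}
	}

	return ret;
}
```

Under these assumptions, a queue that starts with 250 pending requests is
drained in three calls to sim_do_dispatch(): the first two each stop after 101
dispatches and request a rerun, and the third dispatches the remaining 48 and
returns 0.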