From: Ming Lei
To: Jens Axboe, linux-block@vger.kernel.org, Mike Snitzer, dm-devel@redhat.com
Cc: Christoph Hellwig, Bart Van Assche, linux-kernel@vger.kernel.org, Omar Sandoval, Ming Lei
Subject: [RFC PATCH] blk-mq: fixup RESTART when queue becomes idle
Date: Thu, 18 Jan 2018 10:41:24 +0800
Message-Id: <20180118024124.8079-1-ming.lei@redhat.com>

BLK_STS_RESOURCE can be returned from a driver when any resource runs
out, and that resource may not be related to tags, such as
kmalloc(GFP_ATOMIC). When the queue is idle under this kind of
BLK_STS_RESOURCE, restart can't work any more because no in-flight
request is left to complete and rerun the queue, so an IO hang may
result.

Most drivers may call kmalloc(GFP_ATOMIC) in the IO path, and almost all
of them return BLK_STS_RESOURCE in this situation. For dm-mpath it may
be triggered a bit more easily, since the request pool of the underlying
queue may be exhausted much more easily. In reality it is still not easy
to trigger: I ran all kinds of tests on dm-mpath/scsi-debug with all
kinds of scsi_debug parameters and couldn't trigger this issue at all.
It was finally triggered in Bart's SRP test, which seems made by
genius, :-)

This patch deals with this situation by running the queue again when the
queue is found idle in the timeout handler.

Signed-off-by: Ming Lei
---

Another approach is to do the check after BLK_STS_RESOURCE is returned
from .queue_rq() and BLK_MQ_S_SCHED_RESTART is set. That way may
introduce a bit of cost in the hot path. It was actually V1 of this
patch, please see the following link:

	https://github.com/ming1/linux/commit/68a66900f3647ea6751aab2848b1e5eef508feaa

Or are there other, better ways?

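To make the hang concrete, here is a minimal userspace sketch of the
scenario. It is a toy model, not kernel code: struct toy_hctx and the
toy_*() helpers are hypothetical names invented for illustration, and
the model assumes that a restart is only ever triggered from the
completion path, which is what the description above relies on.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in the kernel. */
struct toy_hctx {
	bool sched_restart;	/* stands in for BLK_MQ_S_SCHED_RESTART */
	int in_flight;		/* requests currently owned by the driver */
	int pending;		/* requests still waiting for dispatch */
	bool resource_ok;	/* false models a failing kmalloc(GFP_ATOMIC) */
};

/*
 * Dispatch pending requests. Returns false when the modelled driver
 * reports a resource shortage (the BLK_STS_RESOURCE case) and the
 * queue is left waiting for a restart.
 */
static bool toy_run_queue(struct toy_hctx *h)
{
	while (h->pending > 0) {
		if (!h->resource_ok) {
			h->sched_restart = true;
			return false;
		}
		h->pending--;
		h->in_flight++;
	}
	return true;
}

/* Completion path: in this model, the only place a restart happens. */
static void toy_complete_one(struct toy_hctx *h)
{
	assert(h->in_flight > 0);
	h->in_flight--;
	if (h->sched_restart) {
		h->sched_restart = false;
		toy_run_queue(h);
	}
}

/*
 * Models the check done from the timeout handler: if RESTART is still
 * set and nothing is in flight, no completion will ever rerun the
 * queue, so rerun it here. Returns true if a rerun was needed.
 */
static bool toy_fixup_restart(struct toy_hctx *h)
{
	if (h->sched_restart && h->in_flight == 0) {
		toy_run_queue(h);
		return true;
	}
	return false;
}
```

Starting from pending = 2, in_flight = 0 and resource_ok = false,
toy_run_queue() sets the restart flag and dispatches nothing. Since
nothing is in flight, toy_complete_one() can never be called, so the
queue stays stalled even after resource_ok becomes true, until
toy_fixup_restart() reruns it.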
 block/blk-mq.c | 83 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 82 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 6e3f77829dcc..4d4af8d712da 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -896,6 +896,85 @@ static void blk_mq_terminate_expired(struct blk_mq_hw_ctx *hctx,
 		blk_mq_rq_timed_out(rq, reserved);
 }
 
+struct hctx_busy_data {
+	struct blk_mq_hw_ctx *hctx;
+	bool reserved;
+	bool busy;
+};
+
+static bool check_busy_hctx(struct sbitmap *sb, unsigned int bitnr, void *data)
+{
+	struct hctx_busy_data *busy_data = data;
+	struct blk_mq_hw_ctx *hctx = busy_data->hctx;
+	struct request *rq;
+
+	if (busy_data->reserved)
+		bitnr += hctx->tags->nr_reserved_tags;
+
+	rq = hctx->tags->static_rqs[bitnr];
+	if (blk_mq_rq_state(rq) == MQ_RQ_IN_FLIGHT) {
+		busy_data->busy = true;
+		return false;
+	}
+	return true;
+}
+
+/* Check if there is any in-flight request */
+static bool blk_mq_hctx_is_busy(struct blk_mq_hw_ctx *hctx)
+{
+	struct hctx_busy_data data = {
+		.hctx = hctx,
+		.busy = false,
+		.reserved = true,
+	};
+
+	sbitmap_for_each_set(&hctx->tags->breserved_tags.sb,
+			check_busy_hctx, &data);
+	if (data.busy)
+		return true;
+
+	data.reserved = false;
+	sbitmap_for_each_set(&hctx->tags->bitmap_tags.sb,
+			check_busy_hctx, &data);
+	if (data.busy)
+		return true;
+
+	return false;
+}
+
+static void blk_mq_fixup_restart(struct blk_mq_hw_ctx *hctx)
+{
+	if (test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state)) {
+		bool busy;
+
+		/*
+		 * If this hctx is still marked as RESTART, and there
+		 * isn't any in-flight requests, we have to run queue
+		 * here to prevent IO from hanging.
+		 *
+		 * BLK_STS_RESOURCE can be returned from driver when any
+		 * resource is running out of. And the resource may not
+		 * be related with tags, such as kmalloc(GFP_ATOMIC), when
+		 * queue is idle under this kind of BLK_STS_RESOURCE, restart
+		 * can't work any more, then IO hang may be caused.
+		 *
+		 * The counter-pair of the following barrier is the one
+		 * in blk_mq_put_driver_tag() after returning BLK_STS_RESOURCE
+		 * from ->queue_rq().
+		 */
+		smp_mb();
+
+		busy = blk_mq_hctx_is_busy(hctx);
+		if (!busy) {
+			printk(KERN_WARNING "blk-mq: fixup RESTART\n");
+			printk(KERN_WARNING "\t If this message is shown"
+			       " a bit often, please report the issue to"
+			       " linux-block@vger.kernel.org\n");
+			blk_mq_run_hw_queue(hctx, true);
+		}
+	}
+}
+
 static void blk_mq_timeout_work(struct work_struct *work)
 {
 	struct request_queue *q =
@@ -966,8 +1045,10 @@ static void blk_mq_timeout_work(struct work_struct *work)
 		 */
 		queue_for_each_hw_ctx(q, hctx, i) {
 			/* the hctx may be unmapped, so check it here */
-			if (blk_mq_hw_queue_mapped(hctx))
+			if (blk_mq_hw_queue_mapped(hctx)) {
 				blk_mq_tag_idle(hctx);
+				blk_mq_fixup_restart(hctx);
+			}
 		}
 	}
 	blk_queue_exit(q);
-- 
2.9.5