Date: Thu, 18 Jan 2018 12:03:53 -0500
From: Mike Snitzer
To: Bart Van Assche
Cc: Ming Lei, Jens Axboe, linux-block@vger.kernel.org, dm-devel@redhat.com,
    Christoph Hellwig, Bart Van Assche, linux-kernel@vger.kernel.org,
    Omar Sandoval
Subject: Re: [RFC PATCH] blk-mq: fixup RESTART when queue becomes idle
Message-ID: <20180118170353.GB19734@redhat.com>
References: <20180118024124.8079-1-ming.lei@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jan 18 2018 at 11:50am -0500,
Bart Van Assche wrote:

> On 01/17/18 18:41, Ming Lei wrote:
> >BLK_STS_RESOURCE can be returned from driver when any resource
> >is running out of. And the resource may not be related with tags,
> >such as kmalloc(GFP_ATOMIC), when queue is idle under this kind of
> >BLK_STS_RESOURCE, restart can't work any more, then IO hang may
> >be caused.
> >
> >Most of drivers may call kmalloc(GFP_ATOMIC) in IO path, and almost
> >all returns BLK_STS_RESOURCE under this situation. But for dm-mpath,
> >it may be triggered a bit easier since the request pool of underlying
> >queue may be consumed up much easier. But in reality, it is still not
> >easy to trigger it. I run all kinds of test on dm-mpath/scsi-debug
> >with all kinds of scsi_debug parameters, can't trigger this issue
> >at all. But finally it is triggered in Bart's SRP test, which seems
> >made by genius, :-)
> >
> >[ ... ]
> >
> > static void blk_mq_timeout_work(struct work_struct *work)
> > {
> > 	struct request_queue *q =
> >@@ -966,8 +1045,10 @@ static void blk_mq_timeout_work(struct work_struct *work)
> > 		 */
> > 		queue_for_each_hw_ctx(q, hctx, i) {
> > 			/* the hctx may be unmapped, so check it here */
> >-			if (blk_mq_hw_queue_mapped(hctx))
> >+			if (blk_mq_hw_queue_mapped(hctx)) {
> > 				blk_mq_tag_idle(hctx);
> >+				blk_mq_fixup_restart(hctx);
> >+			}
> > 		}
> > 	}
> > 	blk_queue_exit(q);
>
> Hello Ming,
>
> My comments about the above are as follows:
> - It can take up to q->rq_timeout jiffies after a .queue_rq()
>   implementation returned BLK_STS_RESOURCE before blk_mq_timeout_work()
>   gets called. However, it can happen that only a few milliseconds after
>   .queue_rq() returned BLK_STS_RESOURCE that the condition that caused
>   it to return BLK_STS_RESOURCE gets cleared.
>   So the above approach can result in long delays during which it
>   will seem like the queue got stuck. Additionally, I think that the
>   block driver should decide how long it takes before a queue is rerun
>   and not the block layer core.

So configure q->rq_timeout to be shorter?  It is configurable through
blk_mq_tag_set's 'timeout' member and apparently defaults to 30 * HZ.

That is the problem with timeouts: there is generally no value that
fits every case.

> - The lockup that I reported only occurs with the dm driver but not any
>   other block driver. So why to modify the block layer core since this
>   can be fixed by modifying the dm driver?

It is hard to know that only DM's blk-mq is impacted.  DM is the only
blk-mq driver that you're testing like this (and that is also able to
handle faults, etc).

> - A much simpler fix and a fix that is known to work exists, namely
>   inserting a blk_mq_delay_run_hw_queue() call in the dm driver.

Because your "much simpler" fix actively hurts performance, as detailed
in this commit header:
https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.16&id=ec3eaf9a673106f66606896aed6ddd20180b02ec

I'm not going to take your bandaid fix given it very much seems to be
papering over a real blk-mq issue.

Mike
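
To put rough numbers on the timeout trade-off discussed above, here is a
minimal user-space sketch, not kernel code. It assumes only what the thread
states: a stalled queue may wait up to q->rq_timeout jiffies for
blk_mq_timeout_work(), and an unset tag-set timeout falls back to 30 * HZ.
The function names and the explicit hz parameter are hypothetical and exist
only for this illustration.

```c
#include <assert.h>

/*
 * User-space model only. A tag-set timeout of 0 is treated as "unset",
 * in which case the 30 * HZ default mentioned in the thread applies.
 * All names here are hypothetical, chosen for illustration.
 */

/* Effective request timeout in jiffies for a given tag-set timeout. */
unsigned int effective_rq_timeout(unsigned int set_timeout, unsigned int hz)
{
    return set_timeout ? set_timeout : 30u * hz;
}

/* Convert jiffies to milliseconds for a tick rate of hz ticks per second. */
unsigned int jiffies_to_ms(unsigned int jiffies, unsigned int hz)
{
    assert(hz != 0); /* a zero tick rate would divide by zero */
    return (unsigned int)((unsigned long long)jiffies * 1000u / hz);
}

/*
 * Upper bound, in milliseconds, on how long a queue could sit idle if
 * the timeout work were the only thing that reran it.
 */
unsigned int worst_case_rerun_delay_ms(unsigned int set_timeout, unsigned int hz)
{
    return jiffies_to_ms(effective_rq_timeout(set_timeout, hz), hz);
}
```

With the default in place the bound is 30 seconds regardless of the tick
rate, while a driver that set its timeout to 5 * HZ would see a bound of
5 seconds. Either way the bound is far above the "few milliseconds" case
raised in the quoted comment, which is why a single timeout value is hard
to pick.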