Date: Thu, 18 Jan 2018 13:30:39 -0500
From: Mike Snitzer
To: Bart Van Assche
Cc: dm-devel@redhat.com, hch@infradead.org, linux-kernel@vger.kernel.org,
    linux-block@vger.kernel.org, osandov@fb.com, ming.lei@redhat.com,
    axboe@kernel.dk
Subject: Re: [RFC PATCH] blk-mq: fixup RESTART when queue becomes idle
Message-ID: <20180118183039.GA20121@redhat.com>
References: <20180118024124.8079-1-ming.lei@redhat.com>
    <20180118170353.GB19734@redhat.com> <1516296056.2676.23.camel@wdc.com>
In-Reply-To: <1516296056.2676.23.camel@wdc.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jan 18 2018 at 12:20pm -0500,
Bart Van Assche wrote:

> On Thu, 2018-01-18 at 12:03 -0500, Mike Snitzer wrote:
> > On Thu, Jan 18 2018 at 11:50am -0500,
> > Bart Van Assche wrote:
> > > My comments about the above are as follows:
> > > - It can take up to q->rq_timeout jiffies after a .queue_rq()
> > >   implementation returned BLK_STS_RESOURCE before blk_mq_timeout_work()
> > >   gets called. However, it can happen that only a few milliseconds after
> > >   .queue_rq() returned BLK_STS_RESOURCE that the condition that caused
> > >   it to return BLK_STS_RESOURCE gets cleared. So the above approach can
> > >   result in long delays during which it will seem like the queue got
> > >   stuck. Additionally, I think that the block driver should decide how
> > >   long it takes before a queue is rerun and not the block layer core.
> >
> > So configure q->rq_timeout to be shorter?  Which is configurable though
> > blk_mq_tag_set's 'timeout' member.  It apparently defaults to 30 * HZ.
> >
> > That is the problem with timeouts, there is generally no one size fits
> > all.
>
> Sorry but I think that would be wrong. The delay after which a queue is rerun
> should not be coupled to the request timeout. These two should be independent.

That's fair.  Not saying I think that is a fix anyway.

> > > - The lockup that I reported only occurs with the dm driver but not any
> > >   other block driver. So why to modify the block layer core since this
> > >   can be fixed by modifying the dm driver?
> >
> > Hard to know it is only DM's blk-mq that is impacted.  That is the only
> > blk-mq driver that you're testing like this (that is also able to handle
> > faults, etc).
>
> That's not correct. I'm also testing the SCSI core, which is one of the most
> complicated block drivers.

OK, but SCSI mq is part of the problem here.  It is a snowflake that
has more exotic reasons for returning BLK_STS_RESOURCE.

> > > - A much simpler fix and a fix that is known to work exists, namely
> > >   inserting a blk_mq_delay_run_hw_queue() call in the dm driver.
> >
> > Because your "much simpler" fix actively hurts performance, as is
> > detailed in this header:
> > https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.16&id=ec3eaf9a673106f66606896aed6ddd20180b02ec
>
> We are close to the start of the merge window so I think it's better to fall
> back to an old approach that is known to work than to keep a new approach
> that is known not to work. Additionally, the performance issue you referred
> to only affects IOPS and bandwidth more than 1% with the lpfc driver and that
> is because the queue depth it supports is much lower than for other SCSI HBAs,
> namely 3 instead of 64.

1%!?  Where are you getting that number?  Ming has detailed more
significant performance gains than 1%, and not just on lpfc (though you
keep seizing on lpfc because of the low queue_depth of 3).

This is all very tiresome.  I'm _really_ not interested in this debate
any more.  The specific case that causes the stall needs to be identified
and a real fix needs to be developed.  Ming is doing a lot of that hard
work.  Please contribute or at least stop pleading for your hack to be
reintroduced.

If at the end of the 4.16 release we still don't have a handle on the
stall you're seeing I'll revisit this and likely revert to blindly
kicking the queue after an arbitrary delay.  But I'm willing to let this
issue get more time without papering over it.

Mike
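
For a rough sense of the delays argued about above, the arithmetic can be
sketched as a userspace toy model. This is not kernel code and not part of
any patch in this thread. The helper names are hypothetical, and the 100 ms
rerun delay is an assumed example value; only the 30 * HZ (30 second)
default timeout comes from the discussion. The model assumes the failed
dispatch happens at t = 0 and the queue is rerun at fixed intervals after
that.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model, illustration only: .queue_rq() fails at t = 0 and the queue
 * is rerun every period_ms milliseconds afterwards. Returns the time of
 * the first rerun that happens at or after clear_ms, the moment the
 * condition behind BLK_STS_RESOURCE goes away.
 * (Hypothetical helper, not a kernel function.)
 */
static uint64_t next_rerun_ms(uint64_t clear_ms, uint64_t period_ms)
{
    assert(period_ms > 0);

    /* Number of whole periods needed to reach clear_ms, rounded up. */
    uint64_t periods = (clear_ms + period_ms - 1) / period_ms;

    /* The first rerun cannot happen before one full period has elapsed. */
    if (periods == 0)
        periods = 1;

    return periods * period_ms;
}

/*
 * How long the queue sits idle after the blocking condition cleared.
 * (Hypothetical helper, not a kernel function.)
 */
static uint64_t stall_ms(uint64_t clear_ms, uint64_t period_ms)
{
    return next_rerun_ms(clear_ms, period_ms) - clear_ms;
}
```

With the condition clearing 5 ms after the failed dispatch, a rerun tied to
a 30000 ms request timeout gives stall_ms(5, 30000) = 29995 ms, while an
assumed fixed 100 ms delayed rerun gives stall_ms(5, 100) = 95 ms. The model
says nothing about the throughput cost of rerunning the queue more often,
which is the other half of the disagreement.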