Date: Fri, 19 Jan 2018 10:32:13 +0800
From: Ming Lei
To: Jens Axboe
Cc: Bart Van Assche, "snitzer@redhat.com", "dm-devel@redhat.com",
    "hch@infradead.org", "linux-kernel@vger.kernel.org",
    "linux-block@vger.kernel.org", "osandov@fb.com"
Subject: Re: [RFC PATCH] blk-mq: fixup RESTART when queue becomes idle
Message-ID: <20180119023212.GA25413@ming.t460p>
References: <20180118024124.8079-1-ming.lei@redhat.com>
    <20180118170353.GB19734@redhat.com> <1516296056.2676.23.camel@wdc.com>
    <20180118183039.GA20121@redhat.com> <1516301278.2676.35.camel@wdc.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jan 18, 2018 at 01:11:01PM -0700, Jens Axboe wrote:
> On 1/18/18 11:47 AM, Bart Van Assche wrote:
> >> This is all very tiresome.
> >
> > Yes, this is tiresome. It is very annoying to me that others keep
> > introducing so many regressions in such important parts of the kernel.
> > It is also annoying to me that I get blamed if I report a regression
> > instead of seeing that the regression gets fixed.
>
> I agree, it sucks that any change there introduces the regression. I'm
> fine with doing the delay insert again until a new patch is proven to be
> better.

That way is still buggy, as I explained, since rerunning the queue before
adding the request to hctx->dispatch_list isn't correct. Who can make sure
the request is visible when __blk_mq_run_hw_queue() is called?

Not to mention that this approach will cause a performance regression again.

>
> From the original topic of this email, we have conditions that can cause
> the driver to not be able to submit an IO. A set of those conditions can
> only happen if IO is in flight, and those cases we have covered just
> fine. Another set can potentially trigger without IO being in flight.
> These are cases where a non-device resource is unavailable at the time
> of submission. This might be iommu running out of space, for instance,
> or it might be a memory allocation of some sort. For these cases, we
> don't get any notification when the shortage clears. All we can do is
> ensure that we restart operations at some point in the future. We're SOL
> at that point, but we have to ensure that we make forward progress.

Right, it is a generic issue, not a DM-specific one: almost all drivers
call kmalloc(GFP_ATOMIC) in the IO path.

IMO, there is enough time to figure out a generic solution before the
4.16 release.

>
> That last set of conditions better not be a common occurrence, since
> performance is down the toilet at that point. I don't want to introduce
> hot path code to rectify it. Have the driver return if that happens in a
> way that is DIFFERENT from needing a normal restart. The driver knows if
> this is a resource that will become available when IO completes on this
> device or not. If we get that return, we have a generic run-again delay.

Currently, most of the time neither NVMe nor SCSI returns BLK_STS_RESOURCE;
it should be only DM that returns BLK_STS_RESOURCE so often.

>
> This basically becomes the same as doing the delay queue thing from DM,
> but just in a generic fashion.

Yes, that is right.

-- 
Ming
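
For illustration only, the scheme discussed in this message (the driver
reports a shortage in a way that is different from a normal restart, and
the core answers with a generic run-again delay) can be sketched in
userspace C. This is not kernel code and not a proposed patch. Every
name below (the sketch_* types, SKETCH_STS_NODEV_RESOURCE, and the 3 ms
delay) is hypothetical; only BLK_STS_RESOURCE is named in the thread. The
fallback for an idle queue is also an assumption, taken from the subject
line.

```c
/* Userspace sketch; all names are hypothetical, not blk-mq API. */

enum sketch_status {
    SKETCH_STS_OK,
    /* Models BLK_STS_RESOURCE: the resource is freed when IO that is
     * already in flight on this device completes. */
    SKETCH_STS_RESOURCE,
    /* Hypothetical "different" return: a non-device resource (iommu
     * space, a memory allocation) is short, and no completion will
     * tell us when the shortage clears. */
    SKETCH_STS_NODEV_RESOURCE
};

enum sketch_action {
    SKETCH_DONE,                /* request was submitted */
    SKETCH_WAIT_FOR_COMPLETION, /* a completion will restart the queue */
    SKETCH_RERUN_AFTER_DELAY    /* rerun the queue after a fixed delay */
};

/* Arbitrary illustrative value, not taken from the thread. */
#define SKETCH_RERUN_DELAY_MS 3

struct sketch_queue {
    unsigned int inflight; /* requests currently owned by the device */
};

/* Decide how forward progress is guaranteed after a dispatch attempt. */
enum sketch_action sketch_after_dispatch(const struct sketch_queue *q,
                                         enum sketch_status st)
{
    if (st == SKETCH_STS_OK)
        return SKETCH_DONE;

    /* A normal restart only works if some completion is still coming. */
    if (st == SKETCH_STS_RESOURCE && q->inflight > 0)
        return SKETCH_WAIT_FOR_COMPLETION;

    /* Either the driver said no completion will help, or the queue is
     * idle (assumed fallback): nothing will wake us, so rerun later. */
    return SKETCH_RERUN_AFTER_DELAY;
}
```

The point of the sketch is that the delayed rerun stays off the hot path:
it is taken only for the distinct return value, or when the queue has
nothing in flight that could trigger a restart.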