From: Baolin Wang
Date: Wed, 22 Apr 2020 17:21:01 +0800
Subject: Re: [RESEND RFC PATCH 2/8] block: Allow sending a batch of requests from the scheduler to hardware
To: Ming Lei
Cc: axboe@kernel.dk, Paolo Valente, Ulf Hansson, Adrian Hunter, Arnd Bergmann, Linus Walleij, Orson Zhai, Chunyan Zhang, linux-mmc, linux-block, LKML
References: <20200318100123.GA27531@ming.t460p>
 <20200323034432.GA27507@ming.t460p>
 <20200323072640.GA4767@ming.t460p>
 <20200323082830.GB5616@ming.t460p>
 <20200323095806.GD5616@ming.t460p>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Ming,

On Fri, Mar 27, 2020 at 4:30 PM Baolin Wang wrote:
>
> Hi Ming,
>
> On Tue, Mar 24, 2020 at 4:29 PM Baolin Wang wrote:
> >
> > Hi Ming,
> >
> > On Mon, Mar 23, 2020 at 5:58 PM Ming Lei wrote:
> > >
> > > On Mon, Mar 23, 2020 at 05:13:27PM +0800, Baolin Wang wrote:
> > > > On Mon, Mar 23, 2020 at 4:29 PM Ming Lei wrote:
> > > > >
> > > > > On Mon, Mar 23, 2020 at 04:22:38PM +0800, Baolin Wang wrote:
> > > > > > On Mon, Mar 23, 2020 at 3:27 PM Ming Lei wrote:
> > > > > > >
> > > > > > > On Mon, Mar 23, 2020 at 01:36:34PM +0800, Baolin Wang wrote:
> > > > > > > > On Mon, Mar 23, 2020 at 11:44 AM
Ming Lei wrote:
> > > > > > > > >
> > > > > > > > > On Fri, Mar 20, 2020 at 06:27:41PM +0800, Baolin Wang wrote:
> > > > > > > > > > Hi Ming,
> > > > > > > > > >
> > > > > > > > > > On Wed, Mar 18, 2020 at 6:26 PM Baolin Wang wrote:
> > > > > > > > > > >
> > > > > > > > > > > Hi Ming,
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Mar 18, 2020 at 6:01 PM Ming Lei wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Mar 16, 2020 at 06:01:19PM +0800, Baolin Wang wrote:
> > > > > > > > > > > > > As we know, some SD/MMC host controllers can support packed requests,
> > > > > > > > > > > > > which means we can package several requests to the host controller at one
> > > > > > > > > > > > > time to improve performance. So the hardware driver expects that blk-mq
> > > > > > > > > > > > > can dispatch a batch of requests at one time, and the driver can use bd.last
> > > > > > > > > > > > > to indicate whether it is the last request in the batch, to help combine
> > > > > > > > > > > > > requests as much as possible.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thus we should add a batch requests setting from the block driver to tell
> > > > > > > > > > > > > the scheduler how many requests can be dispatched in a batch, as well
> > > > > > > > > > > > > as changing the scheduler to dispatch more than one request if the
> > > > > > > > > > > > > maximum batch requests number is set.
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > I feel this batch dispatch style is more complicated, and some other
> > > > > > > > > > > > drivers (virtio blk/scsi) may still benefit if we can pass the real 'last' flag in
> > > > > > > > > > > > .queue_rq().
> > > > > > > > > > > >
> > > > > > > > > > > > So what about the following way, extending .commit_rqs() to this usage?
> > > > > > > > > > > > And you can do whatever batch processing in .commit_rqs(), which will be
> > > > > > > > > > > > guaranteed to be called if BLK_MQ_F_FORCE_COMMIT_RQS is set by the driver.
> > > > > > > > > > >
> > > > > > > > > > > I really appreciate your good suggestion, which is much simpler than mine.
> > > > > > > > > > > It seems to solve my problem, and I will try it on my platform to see
> > > > > > > > > > > if it works and give you feedback. Thanks again.
> > > > > > > > > >
> > > > > > > > > > I tried your approach on my platform, but met some problems, see below.
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> > > > > > > > > > > > index 856356b1619e..cd2bbe56f83f 100644
> > > > > > > > > > > > --- a/block/blk-mq-sched.c
> > > > > > > > > > > > +++ b/block/blk-mq-sched.c
> > > > > > > > > > > > @@ -85,11 +85,12 @@ void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > >   * its queue by itself in its completion handler, so we don't need to
> > > > > > > > > > > >   * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
> > > > > > > > > > > >   */
> > > > > > > > > > > > -static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > > +static bool blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > >  {
> > > > > > > > > > > >      struct request_queue *q = hctx->queue;
> > > > > > > > > > > >      struct elevator_queue *e = q->elevator;
> > > > > > > > > > > >      LIST_HEAD(rq_list);
> > > > > > > > > > > > +    bool ret = false;
> > > > > > > > > > > >
> > > > > > > > > > > >      do {
> > > > > > > > > > > >          struct request *rq;
> > > > > > > > > > > > @@ -112,7 +113,10 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > >           * in blk_mq_dispatch_rq_list().
> > > > > > > > > > > >           */
> > > > > > > > > > > >          list_add(&rq->queuelist, &rq_list);
> > > > > > > > > > > > -    } while (blk_mq_dispatch_rq_list(q, &rq_list, true));
> > > > > > > > > > > > +        ret = blk_mq_dispatch_rq_list(q, &rq_list, true);
> > > > > > > > > > > > +    } while (ret);
> > > > > > > > > > > > +
> > > > > > > > > > > > +    return ret;
> > > > > > > > > > > >  }
> > > > > > > > > > > >
> > > > > > > > > > > >  static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
> > > > > > > > > > > > @@ -131,11 +135,12 @@ static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
> > > > > > > > > > > >   * its queue by itself in its completion handler, so we don't need to
> > > > > > > > > > > >   * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
> > > > > > > > > > > >   */
> > > > > > > > > > > > -static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > > +static bool blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > >  {
> > > > > > > > > > > >      struct request_queue *q = hctx->queue;
> > > > > > > > > > > >      LIST_HEAD(rq_list);
> > > > > > > > > > > >      struct blk_mq_ctx *ctx = READ_ONCE(hctx->dispatch_from);
> > > > > > > > > > > > +    bool ret = false;
> > > > > > > > > > > >
> > > > > > > > > > > >      do {
> > > > > > > > > > > >          struct request *rq;
> > > > > > > > > > > > @@ -161,10 +166,12 @@ static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > >
> > > > > > > > > > > >          /* round robin for fair dispatch */
> > > > > > > > > > > >          ctx = blk_mq_next_ctx(hctx, rq->mq_ctx);
> > > > > > > > > > > > -
> > > > > > > > > > > > -    } while (blk_mq_dispatch_rq_list(q, &rq_list, true));
> > > > > > > > > > > > +        ret = blk_mq_dispatch_rq_list(q, &rq_list, true);
> > > > > > > > > > > > +    } while (ret);
> > > > > > > > > > > >
> > > > > > > > > > > >      WRITE_ONCE(hctx->dispatch_from, ctx);
> > > > > > > > > > > > +
> > > > > > > > > > > > +    return ret;
> > > > > > > > > > > >  }
> > > > > > > > > > > >
> > > > > > > > > > > >  void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > > @@ -173,6 +180,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > >      struct elevator_queue *e = q->elevator;
> > > > > > > > > > > >      const bool has_sched_dispatch = e && e->type->ops.dispatch_request;
> > > > > > > > > > > >      LIST_HEAD(rq_list);
> > > > > > > > > > > > +    bool dispatch_ret;
> > > > > > > > > > > >
> > > > > > > > > > > >      /* RCU or SRCU read lock is needed before checking quiesced flag */
> > > > > > > > > > > >      if (unlikely(blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)))
> > > > > > > > > > > > @@ -206,20 +214,26 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > >       */
> > > > > > > > > > > >      if (!list_empty(&rq_list)) {
> > > > > > > > > > > >          blk_mq_sched_mark_restart_hctx(hctx);
> > > > > > > > > > > > -        if (blk_mq_dispatch_rq_list(q, &rq_list, false)) {
> > > > > > > > > > > > +        dispatch_ret = blk_mq_dispatch_rq_list(q, &rq_list, false);
> > > > > > > > > > > > +        if (dispatch_ret) {
> > > > > > > > > > > >              if (has_sched_dispatch)
> > > > > > > > > > > > -                blk_mq_do_dispatch_sched(hctx);
> > > > > > > > > > > > +                dispatch_ret = blk_mq_do_dispatch_sched(hctx);
> > > > > > > > > >
> > > > > > > > > > If we dispatched a request successfully by blk_mq_dispatch_rq_list(),
> > > > > > > > > > we now have dispatch_ret = true. Then we will try to dispatch more
> > > > > > > > > > requests from the scheduler by blk_mq_do_dispatch_sched(), but if there are
> > > > > > > > > > no more requests in the scheduler, we will get dispatch_ret = false. In
> > > > > > > > >
> > > > > > > > > 'dispatch_ret' always holds the result of the last blk_mq_do_dispatch_sched().
> > > > > > > > > When any one request has been dispatched successfully, 'dispatch_ret'
> > > > > > > > > is true. A new request is always added to the list before calling
> > > > > > > > > blk_mq_do_dispatch_sched(), so once blk_mq_do_dispatch_sched() returns
> > > > > > > > > false, it means that .commit_rqs() has been called.
> > > > > > > >
> > > > > > > > Not really, if there are no requests in the IO scheduler, we will break the loop
> > > > > > > > in blk_mq_do_dispatch_sched() and return false without calling
> > > > > > > > .commit_rqs().
> > > > > > >
> > > > > > > If there isn't any request to dispatch, false is returned. Otherwise,
> > > > > > > always return the return value of the last 'blk_mq_dispatch_rq_list'.
> > > > > > >
> > > > > > > >
> > > > > > > > So in this case, blk_mq_do_dispatch_sched() will return 'false', which
> > > > > > > > overwrites the return value of 'true' from blk_mq_dispatch_rq_list(),
> > > > > > > > and .commit_rqs() is not called. Then the IO processing will be stuck.
> > > > > > >
> > > > > > > See below.
> > > > > > >
> > > > > > > >
> > > > > > > > static bool blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > > > > > > > {
> > > > > > > >     struct request_queue *q = hctx->queue;
> > > > > > > >     struct elevator_queue *e = q->elevator;
> > > > > > > >     LIST_HEAD(rq_list);
> > > > > > > >     bool ret = false;
> > > > > > >
> > > > > > > The above initialization is just done once.
> > > > > > >
> > > > > > > >
> > > > > > > >     do {
> > > > > > > >         struct request *rq;
> > > > > > > >
> > > > > > > >         if (e->type->ops.has_work && !e->type->ops.has_work(hctx))
> > > > > > > >             break;
> > > > > > > >
> > > > > > > >         .......
> > > > > > >         ret = blk_mq_dispatch_rq_list(q, list, ...);
> > > > > > >
> > > > > > > list includes one request, so blk_mq_dispatch_rq_list() won't return
> > > > > > > false in case of no request in list.
> > > > > > >
> > > > > > > >     } while (ret);
> > > > > > > >
> > > > > > > >     return ret;
> > > > > > >
> > > > > > > 'ret' is always updated by the return value of the last blk_mq_dispatch_rq_list()
> > > > > > > if at least one request is dispatched. So if it becomes false, the loop
> > > > > > > breaks, which means .commit_rqs() has been called because 'list' does
> > > > > > > include one request for blk_mq_dispatch_rq_list(). Otherwise, true is
> > > > > > > still returned.
> > > > > >
> > > > > > Sorry for my confusing description, let me try again to describe the problem.
> > > > > > When I tried to mount the block device, the IO got stuck with your
> > > > > > patch, and I did some debugging. I found we missed calling
> > > > > > commit_rqs() in one case:
> > > > > >
> > > > > > void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > > > > @@ -173,6 +180,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > > > >      struct elevator_queue *e = q->elevator;
> > > > > >      const bool has_sched_dispatch = e && e->type->ops.dispatch_request;
> > > > > >      LIST_HEAD(rq_list);
> > > > > > +    bool dispatch_ret;
> > > > > >
> > > > > >      /* RCU or SRCU read lock is needed before checking quiesced flag */
> > > > > >      if (unlikely(blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)))
> > > > > > @@ -206,20 +214,26 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > > > >       */
> > > > > >      if (!list_empty(&rq_list)) {
> > > > > >          blk_mq_sched_mark_restart_hctx(hctx);
> > > > > > -        if (blk_mq_dispatch_rq_list(q, &rq_list, false)) {
> > > > > > +        dispatch_ret = blk_mq_dispatch_rq_list(q, &rq_list, false);
> > > > > >
> > > > > > Suppose we dispatch one request to the block driver, and it returns 'true' here.
> > > > > >
> > > > > > +        if (dispatch_ret) {
> > > > > >              if (has_sched_dispatch)
> > > > > > -                blk_mq_do_dispatch_sched(hctx);
> > > > > > +                dispatch_ret = blk_mq_do_dispatch_sched(hctx);
> > > > > >
> > > > > > Then we will continue to try to dispatch more requests from the IO
> > > > > > scheduler, but if there are no requests in the IO scheduler now, it will
> > > > > > return 'false' here, and set dispatch_ret to false.
> > > > > >
> > > > > >              else
> > > > > > -                blk_mq_do_dispatch_ctx(hctx);
> > > > > > +                dispatch_ret = blk_mq_do_dispatch_ctx(hctx);
> > > > >
> > > > > OK, this one is an issue, but it can be fixed simply by not updating
> > > > > 'dispatch_ret' for the following dispatch, something like the below
> > > > > way:
> > > > >
> > > > >     if (dispatch_ret) {
> > > > >         if (has_sched_dispatch)
> > > > >             blk_mq_do_dispatch_sched(hctx);
> > > > >         else
> > > > >             blk_mq_do_dispatch_ctx(hctx);
> > > > >     }
> > > >
> > > > Yes, this can work.
> > > >
> > > > But I found your patch will drop some performance compared with my
> > > > method in patch 1/2. My method can fetch several requests from the IO
> > > > scheduler and dispatch them to the block driver at one time, but with your
> > > > patch we still need to dispatch requests one by one, which will drop some
> > > > performance I think.
> > > > What do you think? Thanks.
> > >
> > > Please run your test and see if a performance drop can be observed.
> >
> > From my testing (using the same fio configuration as in the cover letter), I
> > found your method drops some performance, as the data below shows.
> >
> > My original patches:
> > Sequential read: 229.6MiB/s
> > Random read: 180.8MiB/s
> > Sequential write: 172MiB/s
> > Random write: 169.2MiB/s
> >
> > Your patches:
> > Sequential read: 209MiB/s
> > Random read: 177MiB/s
> > Sequential write: 148MiB/s
> > Random write: 147MiB/s
>
> After some optimization and more testing, I did not find any
> performance issue with your patch compared with my old method. Sorry
> for the noise in my last email.
>
> So will you send out a formal patch? If yes, please add my Tested-by
> tag. Thanks for your help.
> Tested-by: Baolin Wang

Can I take this patch into my patch set with you as the author? Or would
you prefer to send it out yourself? Thanks.

-- 
Baolin Wang
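
For readers following the thread, the missed .commit_rqs() case discussed above can be reproduced with a small user-space model. This is only a sketch, not kernel code: every toy_* name is hypothetical, each dispatch is assumed to succeed, and it assumes the final commit is gated on dispatch_ret still being true, which the quoted hunks do not show.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a hardware queue; every name here is hypothetical. */
struct toy_hctx {
    int sched_pending; /* requests still held by the I/O scheduler */
    int commits;       /* how many times the simulated commit hook ran */
};

/* Models blk_mq_dispatch_rq_list() for one request; always succeeds here. */
static bool toy_dispatch_one(struct toy_hctx *h)
{
    (void)h;
    return true;
}

/*
 * Models blk_mq_do_dispatch_sched(): returns false when the scheduler had
 * nothing queued, otherwise the result of the last dispatch.
 */
static bool toy_do_dispatch_sched(struct toy_hctx *h)
{
    bool ret = false;

    assert(h->sched_pending >= 0);
    while (h->sched_pending > 0) {
        h->sched_pending--;
        ret = toy_dispatch_one(h);
        if (!ret)
            break;
    }
    return ret;
}

/*
 * overwrite == true models the first version of the patch, which stores the
 * scheduler result in dispatch_ret; overwrite == false models the fix.
 * ASSUMPTION: the final commit is gated on dispatch_ret still being true.
 */
static int toy_dispatch_requests(struct toy_hctx *h, bool overwrite)
{
    bool dispatch_ret = toy_dispatch_one(h); /* leftover request went out */

    if (dispatch_ret) {
        bool more = toy_do_dispatch_sched(h);

        if (overwrite)
            dispatch_ret = more;
    }
    if (dispatch_ret)
        h->commits++;
    return h->commits;
}
```

With an empty scheduler, the overwrite variant leaves the commit count at zero even though one request was dispatched, which corresponds to the stuck IO described above; keeping the first result, as in the suggested fix, commits once.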