Date: Wed, 30 May 2018 09:40:55 -0700
From: Omar Sandoval
To: Jianchao Wang
Cc: axboe@kernel.dk, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH V3 2/2] block: kyber: make kyber more friendly with merging
Message-ID: <20180530164055.GA25342@vader>
References: <1527665168-1965-1-git-send-email-jianchao.w.wang@oracle.com>
 <1527665168-1965-3-git-send-email-jianchao.w.wang@oracle.com>
In-Reply-To: <1527665168-1965-3-git-send-email-jianchao.w.wang@oracle.com>
User-Agent: Mutt/1.10.0 (2018-05-17)
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, May 30, 2018 at 03:26:08PM +0800, Jianchao Wang wrote:
> Currently, kyber is very unfriendly to merging. Kyber depends on the
> ctx rq_list to do merging; however, most of the time it will not
> leave any requests in the ctx rq_list. This is because even if the
> tokens of one domain are used up, kyber will still try to dispatch
> requests from the other domains and flush the rq_list there.
>
> To improve this, we set up a kyber_ctx_queue (kcq), which is similar
> to ctx but has one rq_list per domain, and build the same mapping
> between kcq and khd as between ctx and hctx. Then we can merge,
> insert, and dispatch for the different domains separately. At the
> same time, the rq_list of a kcq is only flushed when a domain token
> is acquired successfully. So if one domain's tokens are used up, the
> requests can be left on that domain's rq_list and maybe merged with
> following io.
>
> Following is my test result on a machine with 8 cores and an NVMe
> card (INTEL SSDPEKKR128G7):
>
> fio size=256m ioengine=libaio iodepth=64 direct=1 numjobs=8
> seq/random
> +------+----------+-----------+------------+-----------------+---------+
> |patch?| bw(MB/s) |   iops    | slat(usec) |   clat(usec)    |  merge  |
> +------+----------+-----------+------------+-----------------+---------+
> | w/o  | 606/612  | 151k/153k | 6.89/7.03  | 3349.21/3305.40 | 0/0     |
> +------+----------+-----------+------------+-----------------+---------+
> | w/   | 1083/616 | 277k/154k | 4.93/6.95  | 1830.62/3279.95 | 223k/3k |
> +------+----------+-----------+------------+-----------------+---------+
> With numjobs set to 16, the bw and iops reach 1662MB/s and 425k on my
> platform.

Looks good, and it survived blktests plus a few other stress tests.
Thanks!

Reviewed-by: Omar Sandoval

> Signed-off-by: Jianchao Wang
> Tested-by: Holger Hoffstätte
> ---
>  block/kyber-iosched.c | 190 +++++++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 158 insertions(+), 32 deletions(-)
>
> diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
> index 0d6d25e3..a1613655 100644
> --- a/block/kyber-iosched.c
> +++ b/block/kyber-iosched.c
> @@ -72,6 +72,19 @@ static const unsigned int kyber_batch_size[] = {
>  	[KYBER_OTHER] = 8,
>  };
>  
> +/*
> + * The same mapping exists between ctx & hctx and between kcq & khd;
> + * we use request->mq_ctx->index_hw to index the kcq in khd.
> + */
> +struct kyber_ctx_queue {
> +	/*
> +	 * Used to ensure operations on rq_list and kcq_map to be an atomic one.
> +	 * Also protect the rqs on rq_list when merge.
> +	 */
> +	spinlock_t lock;
> +	struct list_head rq_list[KYBER_NUM_DOMAINS];
> +} ____cacheline_aligned_in_smp;
> +
>  struct kyber_queue_data {
>  	struct request_queue *q;
>  
> @@ -99,6 +112,8 @@ struct kyber_hctx_data {
>  	struct list_head rqs[KYBER_NUM_DOMAINS];
>  	unsigned int cur_domain;
>  	unsigned int batching;
> +	struct kyber_ctx_queue *kcqs;
> +	struct sbitmap kcq_map[KYBER_NUM_DOMAINS];
>  	wait_queue_entry_t domain_wait[KYBER_NUM_DOMAINS];
>  	struct sbq_wait_state *domain_ws[KYBER_NUM_DOMAINS];
>  	atomic_t wait_index[KYBER_NUM_DOMAINS];
> @@ -107,10 +122,8 @@ struct kyber_hctx_data {
>  static int kyber_domain_wake(wait_queue_entry_t *wait, unsigned mode, int flags,
>  			     void *key);
>  
> -static int rq_sched_domain(const struct request *rq)
> +static unsigned int kyber_sched_domain(unsigned int op)
>  {
> -	unsigned int op = rq->cmd_flags;
> -
>  	if ((op & REQ_OP_MASK) == REQ_OP_READ)
>  		return KYBER_READ;
>  	else if ((op & REQ_OP_MASK) == REQ_OP_WRITE && op_is_sync(op))
> @@ -284,6 +297,11 @@ static unsigned int kyber_sched_tags_shift(struct kyber_queue_data *kqd)
>  	return kqd->q->queue_hw_ctx[0]->sched_tags->bitmap_tags.sb.shift;
>  }
>  
> +static int kyber_bucket_fn(const struct request *rq)
> +{
> +	return kyber_sched_domain(rq->cmd_flags);
> +}
> +
>  static struct kyber_queue_data *kyber_queue_data_alloc(struct request_queue *q)
>  {
>  	struct kyber_queue_data *kqd;
> @@ -297,7 +315,7 @@ static struct kyber_queue_data *kyber_queue_data_alloc(struct request_queue *q)
>  		goto err;
>  	kqd->q = q;
>  
> -	kqd->cb = blk_stat_alloc_callback(kyber_stat_timer_fn, rq_sched_domain,
> +	kqd->cb = blk_stat_alloc_callback(kyber_stat_timer_fn, kyber_bucket_fn,
>  					  KYBER_NUM_DOMAINS, kqd);
>  	if (!kqd->cb)
>  		goto err_kqd;
> @@ -376,6 +394,15 @@ static void kyber_exit_sched(struct elevator_queue *e)
>  	kfree(kqd);
>  }
>  
> +static void kyber_ctx_queue_init(struct kyber_ctx_queue *kcq)
> +{
> +	unsigned int i;
> +
> +	spin_lock_init(&kcq->lock);
> +	for (i = 0; i < KYBER_NUM_DOMAINS; i++)
> +		INIT_LIST_HEAD(&kcq->rq_list[i]);
> +}
> +
>  static int kyber_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)
>  {
>  	struct kyber_hctx_data *khd;
> @@ -385,6 +412,24 @@ static int kyber_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)
>  	if (!khd)
>  		return -ENOMEM;
>  
> +	khd->kcqs = kmalloc_array_node(hctx->nr_ctx,
> +				       sizeof(struct kyber_ctx_queue),
> +				       GFP_KERNEL, hctx->numa_node);
> +	if (!khd->kcqs)
> +		goto err_khd;
> +
> +	for (i = 0; i < hctx->nr_ctx; i++)
> +		kyber_ctx_queue_init(&khd->kcqs[i]);
> +
> +	for (i = 0; i < KYBER_NUM_DOMAINS; i++) {
> +		if (sbitmap_init_node(&khd->kcq_map[i], hctx->nr_ctx,
> +				      ilog2(8), GFP_KERNEL, hctx->numa_node)) {
> +			while (--i >= 0)
> +				sbitmap_free(&khd->kcq_map[i]);
> +			goto err_kcqs;
> +		}
> +	}
> +
>  	spin_lock_init(&khd->lock);
>  
>  	for (i = 0; i < KYBER_NUM_DOMAINS; i++) {
> @@ -402,10 +447,22 @@ static int kyber_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)
>  	hctx->sched_data = khd;
>  
>  	return 0;
> +
> +err_kcqs:
> +	kfree(khd->kcqs);
> +err_khd:
> +	kfree(khd);
> +	return -ENOMEM;
>  }
>  
>  static void kyber_exit_hctx(struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)
>  {
> +	struct kyber_hctx_data *khd = hctx->sched_data;
> +	int i;
> +
> +	for (i = 0; i < KYBER_NUM_DOMAINS; i++)
> +		sbitmap_free(&khd->kcq_map[i]);
> +	kfree(khd->kcqs);
>  	kfree(hctx->sched_data);
>  }
>  
> @@ -427,7 +484,7 @@ static void rq_clear_domain_token(struct kyber_queue_data *kqd,
>  
>  	nr = rq_get_domain_token(rq);
>  	if (nr != -1) {
> -		sched_domain = rq_sched_domain(rq);
> +		sched_domain = kyber_sched_domain(rq->cmd_flags);
>  		sbitmap_queue_clear(&kqd->domain_tokens[sched_domain], nr,
>  				    rq->mq_ctx->cpu);
>  	}
> @@ -446,11 +503,51 @@ static void kyber_limit_depth(unsigned int op, struct blk_mq_alloc_data *data)
>  	}
>  }
>  
> +static bool kyber_bio_merge(struct blk_mq_hw_ctx *hctx, struct bio *bio)
> +{
> +	struct kyber_hctx_data *khd = hctx->sched_data;
> +	struct blk_mq_ctx *ctx = blk_mq_get_ctx(hctx->queue);
> +	struct kyber_ctx_queue *kcq = &khd->kcqs[ctx->index_hw];
> +	unsigned int sched_domain = kyber_sched_domain(bio->bi_opf);
> +	struct list_head *rq_list = &kcq->rq_list[sched_domain];
> +	bool merged;
> +
> +	spin_lock(&kcq->lock);
> +	merged = blk_mq_bio_list_merge(hctx->queue, rq_list, bio);
> +	spin_unlock(&kcq->lock);
> +	blk_mq_put_ctx(ctx);
> +
> +	return merged;
> +}
> +
>  static void kyber_prepare_request(struct request *rq, struct bio *bio)
>  {
>  	rq_set_domain_token(rq, -1);
>  }
>  
> +static void kyber_insert_requests(struct blk_mq_hw_ctx *hctx,
> +				  struct list_head *rq_list, bool at_head)
> +{
> +	struct kyber_hctx_data *khd = hctx->sched_data;
> +	struct request *rq, *next;
> +
> +	list_for_each_entry_safe(rq, next, rq_list, queuelist) {
> +		unsigned int sched_domain = kyber_sched_domain(rq->cmd_flags);
> +		struct kyber_ctx_queue *kcq = &khd->kcqs[rq->mq_ctx->index_hw];
> +		struct list_head *head = &kcq->rq_list[sched_domain];
> +
> +		spin_lock(&kcq->lock);
> +		if (at_head)
> +			list_move(&rq->queuelist, head);
> +		else
> +			list_move_tail(&rq->queuelist, head);
> +		sbitmap_set_bit(&khd->kcq_map[sched_domain],
> +				rq->mq_ctx->index_hw);
> +		blk_mq_sched_request_inserted(rq);
> +		spin_unlock(&kcq->lock);
> +	}
> +}
> +
>  static void kyber_finish_request(struct request *rq)
>  {
>  	struct kyber_queue_data *kqd = rq->q->elevator->elevator_data;
> @@ -469,7 +566,7 @@ static void kyber_completed_request(struct request *rq)
>  	 * Check if this request met our latency goal. If not, quickly gather
>  	 * some statistics and start throttling.
>  	 */
> -	sched_domain = rq_sched_domain(rq);
> +	sched_domain = kyber_sched_domain(rq->cmd_flags);
>  	switch (sched_domain) {
>  	case KYBER_READ:
>  		target = kqd->read_lat_nsec;
> @@ -495,19 +592,38 @@ static void kyber_completed_request(struct request *rq)
>  		blk_stat_activate_msecs(kqd->cb, 10);
>  }
>  
> -static void kyber_flush_busy_ctxs(struct kyber_hctx_data *khd,
> -				  struct blk_mq_hw_ctx *hctx)
> +struct flush_kcq_data {
> +	struct kyber_hctx_data *khd;
> +	unsigned int sched_domain;
> +	struct list_head *list;
> +};
> +
> +static bool flush_busy_kcq(struct sbitmap *sb, unsigned int bitnr, void *data)
>  {
> -	LIST_HEAD(rq_list);
> -	struct request *rq, *next;
> +	struct flush_kcq_data *flush_data = data;
> +	struct kyber_ctx_queue *kcq = &flush_data->khd->kcqs[bitnr];
>  
> -	blk_mq_flush_busy_ctxs(hctx, &rq_list);
> -	list_for_each_entry_safe(rq, next, &rq_list, queuelist) {
> -		unsigned int sched_domain;
> +	spin_lock(&kcq->lock);
> +	list_splice_tail_init(&kcq->rq_list[flush_data->sched_domain],
> +			      flush_data->list);
> +	sbitmap_clear_bit(sb, bitnr);
> +	spin_unlock(&kcq->lock);
>  
> -		sched_domain = rq_sched_domain(rq);
> -		list_move_tail(&rq->queuelist, &khd->rqs[sched_domain]);
> -	}
> +	return true;
> +}
> +
> +static void kyber_flush_busy_kcqs(struct kyber_hctx_data *khd,
> +				  unsigned int sched_domain,
> +				  struct list_head *list)
> +{
> +	struct flush_kcq_data data = {
> +		.khd = khd,
> +		.sched_domain = sched_domain,
> +		.list = list,
> +	};
> +
> +	sbitmap_for_each_set(&khd->kcq_map[sched_domain],
> +			     flush_busy_kcq, &data);
>  }
>  
>  static int kyber_domain_wake(wait_queue_entry_t *wait, unsigned mode, int flags,
> @@ -570,26 +686,23 @@ static int kyber_get_domain_token(struct kyber_queue_data *kqd,
>  static struct request *
>  kyber_dispatch_cur_domain(struct kyber_queue_data *kqd,
>  			  struct kyber_hctx_data *khd,
> -			  struct blk_mq_hw_ctx *hctx,
> -			  bool *flushed)
> +			  struct blk_mq_hw_ctx *hctx)
>  {
>  	struct list_head *rqs;
>  	struct request *rq;
>  	int nr;
>  
>  	rqs = &khd->rqs[khd->cur_domain];
> -	rq = list_first_entry_or_null(rqs, struct request, queuelist);
>  
>  	/*
> -	 * If there wasn't already a pending request and we haven't flushed the
> -	 * software queues yet, flush the software queues and check again.
> +	 * If we already have a flushed request, then we just need to get a
> +	 * token for it. Otherwise, if there are pending requests in the kcqs,
> +	 * flush the kcqs, but only if we can get a token. If not, we should
> +	 * leave the requests in the kcqs so that they can be merged. Note that
> +	 * khd->lock serializes the flushes, so if we observed any bit set in
> +	 * the kcq_map, we will always get a request.
>  	 */
> -	if (!rq && !*flushed) {
> -		kyber_flush_busy_ctxs(khd, hctx);
> -		*flushed = true;
> -		rq = list_first_entry_or_null(rqs, struct request, queuelist);
> -	}
> -
> +	rq = list_first_entry_or_null(rqs, struct request, queuelist);
>  	if (rq) {
>  		nr = kyber_get_domain_token(kqd, khd, hctx);
>  		if (nr >= 0) {
> @@ -598,6 +711,16 @@ kyber_dispatch_cur_domain(struct kyber_queue_data *kqd,
>  			list_del_init(&rq->queuelist);
>  			return rq;
>  		}
> +	} else if (sbitmap_any_bit_set(&khd->kcq_map[khd->cur_domain])) {
> +		nr = kyber_get_domain_token(kqd, khd, hctx);
> +		if (nr >= 0) {
> +			kyber_flush_busy_kcqs(khd, khd->cur_domain, rqs);
> +			rq = list_first_entry(rqs, struct request, queuelist);
> +			khd->batching++;
> +			rq_set_domain_token(rq, nr);
> +			list_del_init(&rq->queuelist);
> +			return rq;
> +		}
>  	}
>  
>  	/* There were either no pending requests or no tokens. */
> @@ -608,7 +731,6 @@ static struct request *kyber_dispatch_request(struct blk_mq_hw_ctx *hctx)
>  {
>  	struct kyber_queue_data *kqd = hctx->queue->elevator->elevator_data;
>  	struct kyber_hctx_data *khd = hctx->sched_data;
> -	bool flushed = false;
>  	struct request *rq;
>  	int i;
>  
> @@ -619,7 +741,7 @@ static struct request *kyber_dispatch_request(struct blk_mq_hw_ctx *hctx)
>  	 * from the batch.
>  	 */
>  	if (khd->batching < kyber_batch_size[khd->cur_domain]) {
> -		rq = kyber_dispatch_cur_domain(kqd, khd, hctx, &flushed);
> +		rq = kyber_dispatch_cur_domain(kqd, khd, hctx);
>  		if (rq)
>  			goto out;
>  	}
> @@ -640,7 +762,7 @@ static struct request *kyber_dispatch_request(struct blk_mq_hw_ctx *hctx)
>  		else
>  			khd->cur_domain++;
>  
> -		rq = kyber_dispatch_cur_domain(kqd, khd, hctx, &flushed);
> +		rq = kyber_dispatch_cur_domain(kqd, khd, hctx);
>  		if (rq)
>  			goto out;
>  	}
> @@ -657,10 +779,12 @@ static bool kyber_has_work(struct blk_mq_hw_ctx *hctx)
>  	int i;
>  
>  	for (i = 0; i < KYBER_NUM_DOMAINS; i++) {
> -		if (!list_empty_careful(&khd->rqs[i]))
> +		if (!list_empty_careful(&khd->rqs[i]) ||
> +		    sbitmap_any_bit_set(&khd->kcq_map[i]))
>  			return true;
>  	}
> -	return sbitmap_any_bit_set(&hctx->ctx_map);
> +
> +	return false;
> }
>  
>  #define KYBER_LAT_SHOW_STORE(op)					\
> @@ -831,7 +955,9 @@ static struct elevator_type kyber_sched = {
>  	.init_hctx = kyber_init_hctx,
>  	.exit_hctx = kyber_exit_hctx,
>  	.limit_depth = kyber_limit_depth,
> +	.bio_merge = kyber_bio_merge,
>  	.prepare_request = kyber_prepare_request,
> +	.insert_requests = kyber_insert_requests,
>  	.finish_request = kyber_finish_request,
>  	.requeue_request = kyber_finish_request,
>  	.completed_request = kyber_completed_request,
> --
> 2.7.4