From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Chunguang Xu, Tejun Heo,
 Jens Axboe, Sasha Levin
Subject: [PATCH 5.13 019/300] blk-throtl: optimize IOPS throttle for large IO scenarios
Date: Mon, 13 Sep 2021 15:11:20 +0200
Message-Id: <20210913131109.949234366@linuxfoundation.org>
In-Reply-To: <20210913131109.253835823@linuxfoundation.org>
References: <20210913131109.253835823@linuxfoundation.org>
User-Agent: quilt/0.66
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Precedence: bulk
List-ID: X-Mailing-List: linux-kernel@vger.kernel.org

From: Chunguang Xu

[ Upstream commit 4f1e9630afe6332de7286820fedd019f19eac057 ]

After patch 54efd50 (block: make generic_make_request handle arbitrarily
sized bios), the IOs submitted through io-throttle may be larger, and these
IOs may later be split into many smaller IOs. However, the IOPS throttle is
not aware of this change, which makes the IOPS accounting of large IOs
incomplete and results in disk-side IOPS that do not meet expectations.
We should fix this problem.

We can reproduce it by setting max_sectors_kb of the disk to 128, setting
blkio.write_iops_throttle to 100, running a dd instance inside the blkio
cgroup, and using iostat to watch the IOPS:

  dd if=/dev/zero of=/dev/sdb bs=1M count=1000 oflag=direct

As a result, without this change the average IOPS is 1995; with this
change it is 98.

Signed-off-by: Chunguang Xu
Acked-by: Tejun Heo
Link: https://lore.kernel.org/r/65869aaad05475797d63b4c3fed4f529febe3c26.1627876014.git.brookxu@tencent.com
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin
---
 block/blk-merge.c    |  2 ++
 block/blk-throttle.c | 32 ++++++++++++++++++++++++++++++++
 block/blk.h          |  2 ++
 3 files changed, 36 insertions(+)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index bcdff1879c34..410ea45027c9 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -348,6 +348,8 @@ void __blk_queue_split(struct bio **bio, unsigned int *nr_segs)
 		trace_block_split(split, (*bio)->bi_iter.bi_sector);
 		submit_bio_noacct(*bio);
 		*bio = split;
+
+		blk_throtl_charge_bio_split(*bio);
 	}
 }
 
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index b1b22d863bdf..55c49015e533 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -178,6 +178,9 @@ struct throtl_grp {
 	unsigned int bad_bio_cnt; /* bios exceeding latency threshold */
 	unsigned long bio_cnt_reset_time;
 
+	atomic_t io_split_cnt[2];
+	atomic_t last_io_split_cnt[2];
+
 	struct blkg_rwstat stat_bytes;
 	struct blkg_rwstat stat_ios;
 };
@@ -777,6 +780,8 @@ static inline void throtl_start_new_slice_with_credit(struct throtl_grp *tg,
 	tg->bytes_disp[rw] = 0;
 	tg->io_disp[rw] = 0;
 
+	atomic_set(&tg->io_split_cnt[rw], 0);
+
 	/*
 	 * Previous slice has expired. We must have trimmed it after last
 	 * bio dispatch. That means since start of last slice, we never used
@@ -799,6 +804,9 @@ static inline void throtl_start_new_slice(struct throtl_grp *tg, bool rw)
 	tg->io_disp[rw] = 0;
 	tg->slice_start[rw] = jiffies;
 	tg->slice_end[rw] = jiffies + tg->td->throtl_slice;
+
+	atomic_set(&tg->io_split_cnt[rw], 0);
+
 	throtl_log(&tg->service_queue,
 		   "[%c] new slice start=%lu end=%lu jiffies=%lu",
 		   rw == READ ? 'R' : 'W', tg->slice_start[rw],
@@ -1031,6 +1039,9 @@ static bool tg_may_dispatch(struct throtl_grp *tg, struct bio *bio,
 				jiffies + tg->td->throtl_slice);
 	}
 
+	if (iops_limit != UINT_MAX)
+		tg->io_disp[rw] += atomic_xchg(&tg->io_split_cnt[rw], 0);
+
 	if (tg_with_in_bps_limit(tg, bio, bps_limit, &bps_wait) &&
 	    tg_with_in_iops_limit(tg, bio, iops_limit, &iops_wait)) {
 		if (wait)
@@ -2052,12 +2063,14 @@ static void throtl_downgrade_check(struct throtl_grp *tg)
 	}
 
 	if (tg->iops[READ][LIMIT_LOW]) {
+		tg->last_io_disp[READ] += atomic_xchg(&tg->last_io_split_cnt[READ], 0);
 		iops = tg->last_io_disp[READ] * HZ / elapsed_time;
 		if (iops >= tg->iops[READ][LIMIT_LOW])
 			tg->last_low_overflow_time[READ] = now;
 	}
 
 	if (tg->iops[WRITE][LIMIT_LOW]) {
+		tg->last_io_disp[WRITE] += atomic_xchg(&tg->last_io_split_cnt[WRITE], 0);
 		iops = tg->last_io_disp[WRITE] * HZ / elapsed_time;
 		if (iops >= tg->iops[WRITE][LIMIT_LOW])
 			tg->last_low_overflow_time[WRITE] = now;
@@ -2176,6 +2189,25 @@ static inline void throtl_update_latency_buckets(struct throtl_data *td)
 }
 #endif
 
+void blk_throtl_charge_bio_split(struct bio *bio)
+{
+	struct blkcg_gq *blkg = bio->bi_blkg;
+	struct throtl_grp *parent = blkg_to_tg(blkg);
+	struct throtl_service_queue *parent_sq;
+	bool rw = bio_data_dir(bio);
+
+	do {
+		if (!parent->has_rules[rw])
+			break;
+
+		atomic_inc(&parent->io_split_cnt[rw]);
+		atomic_inc(&parent->last_io_split_cnt[rw]);
+
+		parent_sq = parent->service_queue.parent_sq;
+		parent = sq_to_tg(parent_sq);
+	} while (parent);
+}
+
 bool blk_throtl_bio(struct bio *bio)
 {
 	struct request_queue *q = bio->bi_bdev->bd_disk->queue;
diff --git a/block/blk.h b/block/blk.h
index 54d48987c21b..40b00d18bdb2 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -290,11 +290,13 @@ int create_task_io_context(struct task_struct *task, gfp_t gfp_mask, int node);
 extern int blk_throtl_init(struct request_queue *q);
 extern void blk_throtl_exit(struct request_queue *q);
 extern void blk_throtl_register_queue(struct request_queue *q);
+extern void blk_throtl_charge_bio_split(struct bio *bio);
 bool blk_throtl_bio(struct bio *bio);
 #else /* CONFIG_BLK_DEV_THROTTLING */
 static inline int blk_throtl_init(struct request_queue *q) { return 0; }
 static inline void blk_throtl_exit(struct request_queue *q) { }
 static inline void blk_throtl_register_queue(struct request_queue *q) { }
+static inline void blk_throtl_charge_bio_split(struct bio *bio) { }
 static inline bool blk_throtl_bio(struct bio *bio) { return false; }
 #endif /* CONFIG_BLK_DEV_THROTTLING */
 
 #ifdef CONFIG_BLK_DEV_THROTTLING_LOW
-- 
2.30.2