Received: by 2002:a05:6602:18e:0:0:0:0 with SMTP id m14csp1731849ioo; Mon, 23 May 2022 01:48:42 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzHQ7PjvgbX3mipy94eLVROjjNq1qqUaMVBY1DgHmWW8OOYMU5WTIsqq2XbBg2Ln8AumsOy X-Received: by 2002:a63:4666:0:b0:3fa:287f:b714 with SMTP id v38-20020a634666000000b003fa287fb714mr5859832pgk.398.1653295722087; Mon, 23 May 2022 01:48:42 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1653295722; cv=none; d=google.com; s=arc-20160816; b=sooNqH+I9pqgJ1di/y1CvlymyLPArjTcXH+xqzKMhuG/mLzjerjw/n9aDhNFDxByZh V5ztaYGmXQ67Z5e18JQvy6XLJaYpiqc9qRKsvR+dxGDhgBeMQM8z7J2d7PBoKal1bV6h r/rQLgxyEx1Lt1RaMAnC3Fkx6JSdzoI+Akt/MlrZ7ilAjXGsFJUJ/asuaCqehyyZgqn5 snVIVz0GOsM3kprkCECD406NJm0yJUJi2N9I8aPxq2LzR1/Qq+swhLAUDid1T+rO3KUg DpzaVGDuM4RuyznuxXsmKh7rsvLGsGiedcKwTT6oasfyl4duOZTXt4HOEgMV/WrtBPAq 6RSQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from; bh=RmNgF865qCRGsbLIrykuePSdFNUwYLIBuZmb+C3PgNE=; b=CxeeHIfh1oCjbmzu55R7VX67fLxr65nTczb7TEM4JVTVUojUsFDjYrKtGHQsbbsz9m UlJ3jkuHJKUHFvZD2G6HYYfyCFyrwb1F4SZmqTOqVmdgCiJvtXI/RSRB7zq+YhbCaeAh aTA088rghNxG5lm5ziD/tiGzlzFtlRMM350Y4XygpwztneFxc+TcuLdmcID5/obmunCK Dao0usoke4AvoFzsmm4d8r4h9j6uOxCLjKSULjbX5m2/j/PYBugHGKhoPHwpYZiGKxvM x2RzU3gaj0c5KcD6SlNnDCuw9ksaSBdUsLtkfRygVxkLuiwzumHzk1xSgmrPKiiSx0es akSQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=huawei.com Return-Path: Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net. [2620:137:e000::1:18]) by mx.google.com with ESMTPS id p5-20020a1709026b8500b0016143612a07si5475916plk.343.2022.05.23.01.48.41 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 23 May 2022 01:48:42 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) client-ip=2620:137:e000::1:18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=huawei.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 151E2E03F; Mon, 23 May 2022 01:13:36 -0700 (PDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231664AbiEWINV (ORCPT + 99 others); Mon, 23 May 2022 04:13:21 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44326 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231488AbiEWINE (ORCPT ); Mon, 23 May 2022 04:13:04 -0400 Received: from szxga03-in.huawei.com (szxga03-in.huawei.com [45.249.212.189]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D6B9C63CB; Mon, 23 May 2022 01:13:01 -0700 (PDT) Received: from kwepemi100004.china.huawei.com (unknown [172.30.72.53]) by szxga03-in.huawei.com (SkyGuard) with ESMTP id 4L697G374MzDqNs; Mon, 23 May 2022 16:12:58 +0800 (CST) Received: from kwepemm600009.china.huawei.com (7.193.23.164) by kwepemi100004.china.huawei.com (7.221.188.70) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.24; Mon, 23 May 2022 16:12:59 +0800 Received: from huawei.com (10.175.127.227) by kwepemm600009.china.huawei.com (7.193.23.164) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.24; Mon, 23 May 2022 16:12:59 +0800 From: Yu Kuai To: , , , CC: , , , , Subject: [PATCH -next v4 4/4] blk-throttle: fix io hung due to config updates Date: Mon, 23 May 2022 16:26:33 +0800 Message-ID: <20220523082633.2324980-5-yukuai3@huawei.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20220523082633.2324980-1-yukuai3@huawei.com> References: <20220523082633.2324980-1-yukuai3@huawei.com> MIME-Version: 1.0 Content-Transfer-Encoding: 7BIT Content-Type: text/plain; charset=US-ASCII X-Originating-IP: [10.175.127.227] X-ClientProxiedBy: dggems705-chm.china.huawei.com (10.3.19.182) To kwepemm600009.china.huawei.com (7.193.23.164) X-CFilter-Loop: Reflected X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,RDNS_NONE, SPF_HELO_NONE,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org If new configuration is submitted while a bio is throttled, then new waiting time is recalculated regardless that the bio might aready wait for some time: tg_conf_updated throtl_start_new_slice tg_update_disptime throtl_schedule_next_dispatch Then io hung can be triggered by always submmiting new configuration before the throttled bio is dispatched. Fix the problem by respecting the time that throttled bio aready waited. In order to do that, add new fields to record how many bytes/io already waited, and use it to calculate wait time for throttled bio under new configuration. Some simple test: 1) cd /sys/fs/cgroup/blkio/ echo $$ > cgroup.procs echo "8:0 2048" > blkio.throttle.write_bps_device { sleep 3 echo "8:0 1024" > blkio.throttle.write_bps_device } & sleep 1 dd if=/dev/zero of=/dev/sda bs=8k count=1 oflag=direct 2) cd /sys/fs/cgroup/blkio/ echo $$ > cgroup.procs echo "8:0 1024" > blkio.throttle.write_bps_device { sleep 5 echo "8:0 2048" > blkio.throttle.write_bps_device } & sleep 1 dd if=/dev/zero of=/dev/sda bs=8k count=1 oflag=direct test results: io finish time before this patch with this patch 1) 10s 6s 2) 8s 6s Signed-off-by: Yu Kuai --- block/blk-throttle.c | 49 ++++++++++++++++++++++++++++++++++++++------ block/blk-throttle.h | 4 ++++ 2 files changed, 47 insertions(+), 6 deletions(-) diff --git a/block/blk-throttle.c b/block/blk-throttle.c index ded0d30ef49e..612bd221783c 100644 --- a/block/blk-throttle.c +++ b/block/blk-throttle.c @@ -656,12 +656,17 @@ static inline void throtl_start_new_slice_with_credit(struct throtl_grp *tg, tg->slice_end[rw], jiffies); } -static inline void throtl_start_new_slice(struct throtl_grp *tg, bool rw) +static inline void throtl_start_new_slice(struct throtl_grp *tg, bool rw, + bool clear_skipped) { tg->bytes_disp[rw] = 0; tg->io_disp[rw] = 0; tg->slice_start[rw] = jiffies; tg->slice_end[rw] = jiffies + tg->td->throtl_slice; + if (clear_skipped) { + tg->bytes_skipped[rw] = 0; + tg->io_skipped[rw] = 0; + } throtl_log(&tg->service_queue, "[%c] new slice start=%lu end=%lu jiffies=%lu", @@ -784,6 +789,34 @@ static u64 calculate_bytes_allowed(u64 bps_limit, return mul_u64_u64_div_u64(bps_limit, (u64)jiffy_elapsed_rnd, (u64)HZ); } +static void __tg_update_skipped(struct throtl_grp *tg, bool rw) +{ + unsigned long jiffy_elapsed = jiffies - tg->slice_start[rw]; + u64 bps_limit = tg_bps_limit(tg, rw); + u32 iops_limit = tg_iops_limit(tg, rw); + + if (bps_limit != U64_MAX) + tg->bytes_skipped[rw] += + calculate_bytes_allowed(bps_limit, jiffy_elapsed) - + tg->bytes_disp[rw]; + if (iops_limit != UINT_MAX) + tg->io_skipped[rw] += + calculate_io_allowed(iops_limit, jiffy_elapsed) - + tg->io_disp[rw]; +} + +static void tg_update_skipped(struct throtl_grp *tg) +{ + if (tg->service_queue.nr_queued[READ]) + __tg_update_skipped(tg, READ); + if (tg->service_queue.nr_queued[WRITE]) + __tg_update_skipped(tg, WRITE); + + throtl_log(&tg->service_queue, "%s: %llu %llu %u %u\n", __func__, + tg->bytes_skipped[READ], tg->bytes_skipped[WRITE], + tg->io_skipped[READ], tg->io_skipped[WRITE]); +} + static bool tg_with_in_iops_limit(struct throtl_grp *tg, struct bio *bio, u32 iops_limit, unsigned long *wait) { @@ -801,7 +834,8 @@ static bool tg_with_in_iops_limit(struct throtl_grp *tg, struct bio *bio, /* Round up to the next throttle slice, wait time must be nonzero */ jiffy_elapsed_rnd = roundup(jiffy_elapsed + 1, tg->td->throtl_slice); - io_allowed = calculate_io_allowed(iops_limit, jiffy_elapsed_rnd); + io_allowed = calculate_io_allowed(iops_limit, jiffy_elapsed_rnd) + + tg->io_skipped[rw]; if (tg->io_disp[rw] + 1 <= io_allowed) { if (wait) *wait = 0; @@ -838,7 +872,8 @@ static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio, jiffy_elapsed_rnd = tg->td->throtl_slice; jiffy_elapsed_rnd = roundup(jiffy_elapsed_rnd, tg->td->throtl_slice); - bytes_allowed = calculate_bytes_allowed(bps_limit, jiffy_elapsed_rnd); + bytes_allowed = calculate_bytes_allowed(bps_limit, jiffy_elapsed_rnd) + + tg->bytes_skipped[rw]; if (tg->bytes_disp[rw] + bio_size <= bytes_allowed) { if (wait) *wait = 0; @@ -899,7 +934,7 @@ static bool tg_may_dispatch(struct throtl_grp *tg, struct bio *bio, * slice and it should be extended instead. */ if (throtl_slice_used(tg, rw) && !(tg->service_queue.nr_queued[rw])) - throtl_start_new_slice(tg, rw); + throtl_start_new_slice(tg, rw, true); else { if (time_before(tg->slice_end[rw], jiffies + tg->td->throtl_slice)) @@ -1328,8 +1363,8 @@ static void tg_conf_updated(struct throtl_grp *tg, bool global) * that a group's limit are dropped suddenly and we don't want to * account recently dispatched IO with new low rate. */ - throtl_start_new_slice(tg, READ); - throtl_start_new_slice(tg, WRITE); + throtl_start_new_slice(tg, READ, false); + throtl_start_new_slice(tg, WRITE, false); if (tg->flags & THROTL_TG_PENDING) { tg_update_disptime(tg); @@ -1357,6 +1392,7 @@ static ssize_t tg_set_conf(struct kernfs_open_file *of, v = U64_MAX; tg = blkg_to_tg(ctx.blkg); + tg_update_skipped(tg); if (is_u64) *(u64 *)((void *)tg + of_cft(of)->private) = v; @@ -1543,6 +1579,7 @@ static ssize_t tg_set_limit(struct kernfs_open_file *of, return ret; tg = blkg_to_tg(ctx.blkg); + tg_update_skipped(tg); v[0] = tg->bps_conf[READ][index]; v[1] = tg->bps_conf[WRITE][index]; diff --git a/block/blk-throttle.h b/block/blk-throttle.h index c1b602996127..845909c72f86 100644 --- a/block/blk-throttle.h +++ b/block/blk-throttle.h @@ -115,6 +115,10 @@ struct throtl_grp { uint64_t bytes_disp[2]; /* Number of bio's dispatched in current slice */ unsigned int io_disp[2]; + /* Number of bytes will be skipped in current slice */ + uint64_t bytes_skipped[2]; + /* Number of bio's will be skipped in current slice */ + unsigned int io_skipped[2]; unsigned long last_low_overflow_time[2]; -- 2.31.1