2022-08-29 02:13:28

by Yu Kuai

[permalink] [raw]
Subject: [PATCH v9 0/4] blk-throttle bugfix

From: Yu Kuai <[email protected]>

Changes in v9:
- renaming the flag BIO_THROTTLED to BIO_BPS_THROTTLED, and always
apply iops limit in path 1;
- add tag for patch 4
Changes in v8:
- use a new solution in patch 1
- move cleanups to a separate patchset
- rename bytes/io_skipped to carryover_bytes/ios in patch 4
Changes in v7:
- add patch 5 to improve handling of re-entered bio for bps limit
- as suggested by Tejun, add some comments
- sdd some Acked tag by Tejun
Changes in v6:
- rename parameter in patch 3
- add comments and reviewed tag for patch 4
Changes in v5:
- add comments in patch 4
- clear bytes/io_skipped in throtl_start_new_slice_with_credit() in
patch 4
- and cleanup patches 5-8
Changes in v4:
- add reviewed-by tag for patch 1
- add patch 2,3
- use a different way to fix io hung in patch 4
Changes in v3:
- fix a check in patch 1
- fix link err in patch 2 on 32-bit platform
- handle overflow in patch 2
Changes in v2:
- use a new solution suggested by Ming
- change the title of patch 1
- add patch 2

Patch 1 fix that blk-throttle can't work if multiple bios are throttle.
Patch 2 fix overflow while calculating wait time.
Patch 3,4 fix io hung due to configuration updates.

Previous version:
v1: https://lore.kernel.org/all/[email protected]/
v2: https://lore.kernel.org/all/[email protected]/
v3: https://lore.kernel.org/all/[email protected]/
v4: https://lore.kernel.org/all/[email protected]/
v5: https://lore.kernel.org/all/[email protected]/
v6: https://lore.kernel.org/all/[email protected]/
v7: https://lore.kernel.org/all/[email protected]/
v8: https://lore.kernel.org/all/[email protected]/

Yu Kuai (4):
blk-throttle: fix that io throttle can only work for single bio
blk-throttle: prevent overflow while calculating wait time
blk-throttle: factor out code to calculate ios/bytes_allowed
blk-throttle: fix io hung due to configuration updates

block/bio.c | 2 -
block/blk-throttle.c | 137 +++++++++++++++++++++++++-------------
block/blk-throttle.h | 11 ++-
include/linux/bio.h | 2 +-
include/linux/blk_types.h | 2 +-
5 files changed, 104 insertions(+), 50 deletions(-)

--
2.31.1


2022-08-29 02:33:46

by Yu Kuai

[permalink] [raw]
Subject: [PATCH v9 3/4] blk-throttle: factor out code to calculate ios/bytes_allowed

From: Yu Kuai <[email protected]>

No functional changes, new apis will be used in later patches to
calculate wait time for throttled bios when new configuration is
submitted.

Noted this patch also rename tg_with_in_iops/bps_limit() to
tg_within_iops/bps_limit().

Signed-off-by: Yu Kuai <[email protected]>
Acked-by: Tejun Heo <[email protected]>
---
block/blk-throttle.c | 59 ++++++++++++++++++++++++++------------------
1 file changed, 35 insertions(+), 24 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index d193a738a966..04c72d3283a5 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -757,33 +757,20 @@ static inline void throtl_trim_slice(struct throtl_grp *tg, bool rw)
tg->slice_start[rw], tg->slice_end[rw], jiffies);
}

-static bool tg_with_in_iops_limit(struct throtl_grp *tg, struct bio *bio,
- u32 iops_limit, unsigned long *wait)
+static unsigned int calculate_io_allowed(u32 iops_limit,
+ unsigned long jiffy_elapsed)
{
- bool rw = bio_data_dir(bio);
unsigned int io_allowed;
- unsigned long jiffy_elapsed, jiffy_wait, jiffy_elapsed_rnd;
u64 tmp;

- if (iops_limit == UINT_MAX) {
- if (wait)
- *wait = 0;
- return true;
- }
-
- jiffy_elapsed = jiffies - tg->slice_start[rw];
-
- /* Round up to the next throttle slice, wait time must be nonzero */
- jiffy_elapsed_rnd = roundup(jiffy_elapsed + 1, tg->td->throtl_slice);
-
/*
- * jiffy_elapsed_rnd should not be a big value as minimum iops can be
+ * jiffy_elapsed should not be a big value as minimum iops can be
* 1 then at max jiffy elapsed should be equivalent of 1 second as we
* will allow dispatch after 1 second and after that slice should
* have been trimmed.
*/

- tmp = (u64)iops_limit * jiffy_elapsed_rnd;
+ tmp = (u64)iops_limit * jiffy_elapsed;
do_div(tmp, HZ);

if (tmp > UINT_MAX)
@@ -791,6 +778,32 @@ static bool tg_with_in_iops_limit(struct throtl_grp *tg, struct bio *bio,
else
io_allowed = tmp;

+ return io_allowed;
+}
+
+static u64 calculate_bytes_allowed(u64 bps_limit, unsigned long jiffy_elapsed)
+{
+ return mul_u64_u64_div_u64(bps_limit, (u64)jiffy_elapsed, (u64)HZ);
+}
+
+static bool tg_within_iops_limit(struct throtl_grp *tg, struct bio *bio,
+ u32 iops_limit, unsigned long *wait)
+{
+ bool rw = bio_data_dir(bio);
+ unsigned int io_allowed;
+ unsigned long jiffy_elapsed, jiffy_wait, jiffy_elapsed_rnd;
+
+ if (iops_limit == UINT_MAX) {
+ if (wait)
+ *wait = 0;
+ return true;
+ }
+
+ jiffy_elapsed = jiffies - tg->slice_start[rw];
+
+ /* Round up to the next throttle slice, wait time must be nonzero */
+ jiffy_elapsed_rnd = roundup(jiffy_elapsed + 1, tg->td->throtl_slice);
+ io_allowed = calculate_io_allowed(iops_limit, jiffy_elapsed_rnd);
if (tg->io_disp[rw] + 1 <= io_allowed) {
if (wait)
*wait = 0;
@@ -805,8 +818,8 @@ static bool tg_with_in_iops_limit(struct throtl_grp *tg, struct bio *bio,
return false;
}

-static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio,
- u64 bps_limit, unsigned long *wait)
+static bool tg_within_bps_limit(struct throtl_grp *tg, struct bio *bio,
+ u64 bps_limit, unsigned long *wait)
{
bool rw = bio_data_dir(bio);
u64 bytes_allowed, extra_bytes;
@@ -827,9 +840,7 @@ static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio,
jiffy_elapsed_rnd = tg->td->throtl_slice;

jiffy_elapsed_rnd = roundup(jiffy_elapsed_rnd, tg->td->throtl_slice);
- bytes_allowed = mul_u64_u64_div_u64(bps_limit, (u64)jiffy_elapsed_rnd,
- (u64)HZ);
-
+ bytes_allowed = calculate_bytes_allowed(bps_limit, jiffy_elapsed_rnd);
if (tg->bytes_disp[rw] + bio_size <= bytes_allowed) {
if (wait)
*wait = 0;
@@ -898,8 +909,8 @@ static bool tg_may_dispatch(struct throtl_grp *tg, struct bio *bio,
jiffies + tg->td->throtl_slice);
}

- if (tg_with_in_bps_limit(tg, bio, bps_limit, &bps_wait) &&
- tg_with_in_iops_limit(tg, bio, iops_limit, &iops_wait)) {
+ if (tg_within_bps_limit(tg, bio, bps_limit, &bps_wait) &&
+ tg_within_iops_limit(tg, bio, iops_limit, &iops_wait)) {
if (wait)
*wait = 0;
return true;
--
2.31.1

2022-08-29 02:34:30

by Yu Kuai

[permalink] [raw]
Subject: [PATCH v9 2/4] blk-throttle: prevent overflow while calculating wait time

From: Yu Kuai <[email protected]>

There is a problem found by code review in tg_with_in_bps_limit() that
'bps_limit * jiffy_elapsed_rnd' might overflow. Fix the problem by
calling mul_u64_u64_div_u64() instead.

Signed-off-by: Yu Kuai <[email protected]>
Acked-by: Tejun Heo <[email protected]>
---
block/blk-throttle.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 7a4c815e0f23..d193a738a966 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -809,7 +809,7 @@ static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio,
u64 bps_limit, unsigned long *wait)
{
bool rw = bio_data_dir(bio);
- u64 bytes_allowed, extra_bytes, tmp;
+ u64 bytes_allowed, extra_bytes;
unsigned long jiffy_elapsed, jiffy_wait, jiffy_elapsed_rnd;
unsigned int bio_size = throtl_bio_data_size(bio);

@@ -827,10 +827,8 @@ static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio,
jiffy_elapsed_rnd = tg->td->throtl_slice;

jiffy_elapsed_rnd = roundup(jiffy_elapsed_rnd, tg->td->throtl_slice);
-
- tmp = bps_limit * jiffy_elapsed_rnd;
- do_div(tmp, HZ);
- bytes_allowed = tmp;
+ bytes_allowed = mul_u64_u64_div_u64(bps_limit, (u64)jiffy_elapsed_rnd,
+ (u64)HZ);

if (tg->bytes_disp[rw] + bio_size <= bytes_allowed) {
if (wait)
--
2.31.1

2022-08-29 02:44:51

by Yu Kuai

[permalink] [raw]
Subject: [PATCH v9 1/4] blk-throttle: fix that io throttle can only work for single bio

From: Yu Kuai <[email protected]>

Test scripts:
cd /sys/fs/cgroup/blkio/
echo "8:0 1024" > blkio.throttle.write_bps_device
echo $$ > cgroup.procs
dd if=/dev/zero of=/dev/sda bs=10k count=1 oflag=direct &
dd if=/dev/zero of=/dev/sda bs=10k count=1 oflag=direct &

Test result:
10240 bytes (10 kB, 10 KiB) copied, 10.0134 s, 1.0 kB/s
10240 bytes (10 kB, 10 KiB) copied, 10.0135 s, 1.0 kB/s

The problem is that the second bio is finished after 10s instead of 20s.

Root cause:
1) second bio will be flagged:

__blk_throtl_bio
while (true) {
...
if (sq->nr_queued[rw]) -> some bio is throttled already
break
};
bio_set_flag(bio, BIO_THROTTLED); -> flag the bio

2) flagged bio will be dispatched without waiting:

throtl_dispatch_tg
tg_may_dispatch
tg_with_in_bps_limit
if (bps_limit == U64_MAX || bio_flagged(bio, BIO_THROTTLED))
*wait = 0; -> wait time is zero
return true;

commit 9f5ede3c01f9 ("block: throttle split bio in case of iops limit")
support to count split bios for iops limit, thus it adds flagged bio
checking in tg_with_in_bps_limit() so that split bios will only count
once for bps limit, however, it introduce a new problem that io throttle
won't work if multiple bios are throttled.

In order to fix the problem, handle iops/bps limit in different ways:

1) for iops limit, there is no flag to record if the bio is throttled,
and iops is always applied.
2) for bps limit, original bio will be flagged with BIO_BPS_THROTTLED,
and io throttle will ignore bio with the flag.

Noted this patch also remove the code to set flag in __bio_clone(), it's
introduced in commit 111be8839817 ("block-throttle: avoid double
charge"), and author thinks split bio can be resubmited and throttled
again, which is wrong because split bio will continue to dispatch from
caller.

Fixes: 9f5ede3c01f9 ("block: throttle split bio in case of iops limit")
Cc: <[email protected]>
Signed-off-by: Yu Kuai <[email protected]>
---
block/bio.c | 2 --
block/blk-throttle.c | 20 ++++++--------------
block/blk-throttle.h | 2 +-
include/linux/bio.h | 2 +-
include/linux/blk_types.h | 2 +-
5 files changed, 9 insertions(+), 19 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 3d3a2678fea2..77e3b764a078 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -760,8 +760,6 @@ EXPORT_SYMBOL(bio_put);
static int __bio_clone(struct bio *bio, struct bio *bio_src, gfp_t gfp)
{
bio_set_flag(bio, BIO_CLONED);
- if (bio_flagged(bio_src, BIO_THROTTLED))
- bio_set_flag(bio, BIO_THROTTLED);
bio->bi_ioprio = bio_src->bi_ioprio;
bio->bi_iter = bio_src->bi_iter;

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 47142a1dd102..7a4c815e0f23 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -814,7 +814,7 @@ static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio,
unsigned int bio_size = throtl_bio_data_size(bio);

/* no need to throttle if this bio's bytes have been accounted */
- if (bps_limit == U64_MAX || bio_flagged(bio, BIO_THROTTLED)) {
+ if (bps_limit == U64_MAX || bio_flagged(bio, BIO_BPS_THROTTLED)) {
if (wait)
*wait = 0;
return true;
@@ -924,22 +924,13 @@ static void throtl_charge_bio(struct throtl_grp *tg, struct bio *bio)
unsigned int bio_size = throtl_bio_data_size(bio);

/* Charge the bio to the group */
- if (!bio_flagged(bio, BIO_THROTTLED)) {
+ if (!bio_flagged(bio, BIO_BPS_THROTTLED)) {
tg->bytes_disp[rw] += bio_size;
tg->last_bytes_disp[rw] += bio_size;
}

tg->io_disp[rw]++;
tg->last_io_disp[rw]++;
-
- /*
- * BIO_THROTTLED is used to prevent the same bio to be throttled
- * more than once as a throttled bio will go through blk-throtl the
- * second time when it eventually gets issued. Set it when a bio
- * is being charged to a tg.
- */
- if (!bio_flagged(bio, BIO_THROTTLED))
- bio_set_flag(bio, BIO_THROTTLED);
}

/**
@@ -1029,6 +1020,7 @@ static void tg_dispatch_one_bio(struct throtl_grp *tg, bool rw)
sq->nr_queued[rw]--;

throtl_charge_bio(tg, bio);
+ bio_set_flag(bio, BIO_BPS_THROTTLED);

/*
* If our parent is another tg, we just need to transfer @bio to
@@ -2162,8 +2154,10 @@ bool __blk_throtl_bio(struct bio *bio)
qn = &tg->qnode_on_parent[rw];
sq = sq->parent_sq;
tg = sq_to_tg(sq);
- if (!tg)
+ if (!tg) {
+ bio_set_flag(bio, BIO_BPS_THROTTLED);
goto out_unlock;
+ }
}

/* out-of-limit, queue to @tg */
@@ -2192,8 +2186,6 @@ bool __blk_throtl_bio(struct bio *bio)
}

out_unlock:
- bio_set_flag(bio, BIO_THROTTLED);
-
#ifdef CONFIG_BLK_DEV_THROTTLING_LOW
if (throttled || !td->track_bio_latency)
bio->bi_issue.value |= BIO_ISSUE_THROTL_SKIP_LATENCY;
diff --git a/block/blk-throttle.h b/block/blk-throttle.h
index c1b602996127..ee7299e6dea9 100644
--- a/block/blk-throttle.h
+++ b/block/blk-throttle.h
@@ -175,7 +175,7 @@ static inline bool blk_throtl_bio(struct bio *bio)
struct throtl_grp *tg = blkg_to_tg(bio->bi_blkg);

/* no need to throttle bps any more if the bio has been throttled */
- if (bio_flagged(bio, BIO_THROTTLED) &&
+ if (bio_flagged(bio, BIO_BPS_THROTTLED) &&
!(tg->flags & THROTL_TG_HAS_IOPS_LIMIT))
return false;

diff --git a/include/linux/bio.h b/include/linux/bio.h
index ca22b06700a9..2c5806997bbf 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -509,7 +509,7 @@ static inline void bio_set_dev(struct bio *bio, struct block_device *bdev)
{
bio_clear_flag(bio, BIO_REMAPPED);
if (bio->bi_bdev != bdev)
- bio_clear_flag(bio, BIO_THROTTLED);
+ bio_clear_flag(bio, BIO_BPS_THROTTLED);
bio->bi_bdev = bdev;
bio_associate_blkg(bio);
}
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 1ef99790f6ed..41afb4cfb9b0 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -325,7 +325,7 @@ enum {
BIO_QUIET, /* Make BIO Quiet */
BIO_CHAIN, /* chained bio, ->bi_remaining in effect */
BIO_REFFED, /* bio has elevated ->bi_cnt */
- BIO_THROTTLED, /* This bio has already been subjected to
+ BIO_BPS_THROTTLED, /* This bio has already been subjected to
* throttling rules. Don't do it again. */
BIO_TRACE_COMPLETION, /* bio_endio() should trace the final completion
* of this bio. */
--
2.31.1

2022-08-29 02:46:16

by Yu Kuai

[permalink] [raw]
Subject: [PATCH v9 4/4] blk-throttle: fix io hung due to configuration updates

From: Yu Kuai <[email protected]>

If new configuration is submitted while a bio is throttled, then new
waiting time is recalculated regardless that the bio might already wait
for some time:

tg_conf_updated
throtl_start_new_slice
tg_update_disptime
throtl_schedule_next_dispatch

Then io hung can be triggered by always submmiting new configuration
before the throttled bio is dispatched.

Fix the problem by respecting the time that throttled bio already waited.
In order to do that, add new fields to record how many bytes/io are
waited, and use it to calculate wait time for throttled bio under new
configuration.

Some simple test:
1)
cd /sys/fs/cgroup/blkio/
echo $$ > cgroup.procs
echo "8:0 2048" > blkio.throttle.write_bps_device
{
sleep 2
echo "8:0 1024" > blkio.throttle.write_bps_device
} &
dd if=/dev/zero of=/dev/sda bs=8k count=1 oflag=direct

2)
cd /sys/fs/cgroup/blkio/
echo $$ > cgroup.procs
echo "8:0 1024" > blkio.throttle.write_bps_device
{
sleep 4
echo "8:0 2048" > blkio.throttle.write_bps_device
} &
dd if=/dev/zero of=/dev/sda bs=8k count=1 oflag=direct

test results: io finish time
before this patch with this patch
1) 10s 6s
2) 8s 6s

Signed-off-by: Yu Kuai <[email protected]>
Reviewed-by: Michal Koutný <[email protected]>
Acked-by: Tejun Heo <[email protected]>
---
block/blk-throttle.c | 58 +++++++++++++++++++++++++++++++++++++++-----
block/blk-throttle.h | 9 +++++++
2 files changed, 61 insertions(+), 6 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 04c72d3283a5..7179b5cd42dc 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -642,6 +642,8 @@ static inline void throtl_start_new_slice_with_credit(struct throtl_grp *tg,
{
tg->bytes_disp[rw] = 0;
tg->io_disp[rw] = 0;
+ tg->carryover_bytes[rw] = 0;
+ tg->carryover_ios[rw] = 0;

/*
* Previous slice has expired. We must have trimmed it after last
@@ -659,12 +661,17 @@ static inline void throtl_start_new_slice_with_credit(struct throtl_grp *tg,
tg->slice_end[rw], jiffies);
}

-static inline void throtl_start_new_slice(struct throtl_grp *tg, bool rw)
+static inline void throtl_start_new_slice(struct throtl_grp *tg, bool rw,
+ bool clear_carryover)
{
tg->bytes_disp[rw] = 0;
tg->io_disp[rw] = 0;
tg->slice_start[rw] = jiffies;
tg->slice_end[rw] = jiffies + tg->td->throtl_slice;
+ if (clear_carryover) {
+ tg->carryover_bytes[rw] = 0;
+ tg->carryover_ios[rw] = 0;
+ }

throtl_log(&tg->service_queue,
"[%c] new slice start=%lu end=%lu jiffies=%lu",
@@ -786,6 +793,41 @@ static u64 calculate_bytes_allowed(u64 bps_limit, unsigned long jiffy_elapsed)
return mul_u64_u64_div_u64(bps_limit, (u64)jiffy_elapsed, (u64)HZ);
}

+static void __tg_update_carryover(struct throtl_grp *tg, bool rw)
+{
+ unsigned long jiffy_elapsed = jiffies - tg->slice_start[rw];
+ u64 bps_limit = tg_bps_limit(tg, rw);
+ u32 iops_limit = tg_iops_limit(tg, rw);
+
+ /*
+ * If config is updated while bios are still throttled, calculate and
+ * accumulate how many bytes/ios are waited across changes. And
+ * carryover_bytes/ios will be used to calculate new wait time under new
+ * configuration.
+ */
+ if (bps_limit != U64_MAX)
+ tg->carryover_bytes[rw] +=
+ calculate_bytes_allowed(bps_limit, jiffy_elapsed) -
+ tg->bytes_disp[rw];
+ if (iops_limit != UINT_MAX)
+ tg->carryover_ios[rw] +=
+ calculate_io_allowed(iops_limit, jiffy_elapsed) -
+ tg->io_disp[rw];
+}
+
+static void tg_update_carryover(struct throtl_grp *tg)
+{
+ if (tg->service_queue.nr_queued[READ])
+ __tg_update_carryover(tg, READ);
+ if (tg->service_queue.nr_queued[WRITE])
+ __tg_update_carryover(tg, WRITE);
+
+ /* see comments in struct throtl_grp for meaning of these fields. */
+ throtl_log(&tg->service_queue, "%s: %llu %llu %u %u\n", __func__,
+ tg->carryover_bytes[READ], tg->carryover_bytes[WRITE],
+ tg->carryover_ios[READ], tg->carryover_ios[WRITE]);
+}
+
static bool tg_within_iops_limit(struct throtl_grp *tg, struct bio *bio,
u32 iops_limit, unsigned long *wait)
{
@@ -803,7 +845,8 @@ static bool tg_within_iops_limit(struct throtl_grp *tg, struct bio *bio,

/* Round up to the next throttle slice, wait time must be nonzero */
jiffy_elapsed_rnd = roundup(jiffy_elapsed + 1, tg->td->throtl_slice);
- io_allowed = calculate_io_allowed(iops_limit, jiffy_elapsed_rnd);
+ io_allowed = calculate_io_allowed(iops_limit, jiffy_elapsed_rnd) +
+ tg->carryover_ios[rw];
if (tg->io_disp[rw] + 1 <= io_allowed) {
if (wait)
*wait = 0;
@@ -840,7 +883,8 @@ static bool tg_within_bps_limit(struct throtl_grp *tg, struct bio *bio,
jiffy_elapsed_rnd = tg->td->throtl_slice;

jiffy_elapsed_rnd = roundup(jiffy_elapsed_rnd, tg->td->throtl_slice);
- bytes_allowed = calculate_bytes_allowed(bps_limit, jiffy_elapsed_rnd);
+ bytes_allowed = calculate_bytes_allowed(bps_limit, jiffy_elapsed_rnd) +
+ tg->carryover_bytes[rw];
if (tg->bytes_disp[rw] + bio_size <= bytes_allowed) {
if (wait)
*wait = 0;
@@ -901,7 +945,7 @@ static bool tg_may_dispatch(struct throtl_grp *tg, struct bio *bio,
* slice and it should be extended instead.
*/
if (throtl_slice_used(tg, rw) && !(tg->service_queue.nr_queued[rw]))
- throtl_start_new_slice(tg, rw);
+ throtl_start_new_slice(tg, rw, true);
else {
if (time_before(tg->slice_end[rw],
jiffies + tg->td->throtl_slice))
@@ -1325,8 +1369,8 @@ static void tg_conf_updated(struct throtl_grp *tg, bool global)
* that a group's limit are dropped suddenly and we don't want to
* account recently dispatched IO with new low rate.
*/
- throtl_start_new_slice(tg, READ);
- throtl_start_new_slice(tg, WRITE);
+ throtl_start_new_slice(tg, READ, false);
+ throtl_start_new_slice(tg, WRITE, false);

if (tg->flags & THROTL_TG_PENDING) {
tg_update_disptime(tg);
@@ -1354,6 +1398,7 @@ static ssize_t tg_set_conf(struct kernfs_open_file *of,
v = U64_MAX;

tg = blkg_to_tg(ctx.blkg);
+ tg_update_carryover(tg);

if (is_u64)
*(u64 *)((void *)tg + of_cft(of)->private) = v;
@@ -1540,6 +1585,7 @@ static ssize_t tg_set_limit(struct kernfs_open_file *of,
return ret;

tg = blkg_to_tg(ctx.blkg);
+ tg_update_carryover(tg);

v[0] = tg->bps_conf[READ][index];
v[1] = tg->bps_conf[WRITE][index];
diff --git a/block/blk-throttle.h b/block/blk-throttle.h
index ee7299e6dea9..66b4292b1b92 100644
--- a/block/blk-throttle.h
+++ b/block/blk-throttle.h
@@ -121,6 +121,15 @@ struct throtl_grp {
uint64_t last_bytes_disp[2];
unsigned int last_io_disp[2];

+ /*
+ * The following two fields are updated when new configuration is
+ * submitted while some bios are still throttled, they record how many
+ * bytes/ios are waited already in previous configuration, and they will
+ * be used to calculate wait time under new configuration.
+ */
+ uint64_t carryover_bytes[2];
+ unsigned int carryover_ios[2];
+
unsigned long last_check_time;

unsigned long latency_target; /* us */
--
2.31.1

2022-08-29 05:26:00

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH v9 1/4] blk-throttle: fix that io throttle can only work for single bio

On Mon, Aug 29, 2022 at 10:22:37AM +0800, Yu Kuai wrote:
> From: Yu Kuai <[email protected]>
>
> Test scripts:
> cd /sys/fs/cgroup/blkio/
> echo "8:0 1024" > blkio.throttle.write_bps_device
> echo $$ > cgroup.procs
> dd if=/dev/zero of=/dev/sda bs=10k count=1 oflag=direct &
> dd if=/dev/zero of=/dev/sda bs=10k count=1 oflag=direct &
>
> Test result:
> 10240 bytes (10 kB, 10 KiB) copied, 10.0134 s, 1.0 kB/s
> 10240 bytes (10 kB, 10 KiB) copied, 10.0135 s, 1.0 kB/s
>
> The problem is that the second bio is finished after 10s instead of 20s.
>
> Root cause:
> 1) second bio will be flagged:
>
> __blk_throtl_bio
> while (true) {
> ...
> if (sq->nr_queued[rw]) -> some bio is throttled already
> break
> };
> bio_set_flag(bio, BIO_THROTTLED); -> flag the bio
>
> 2) flagged bio will be dispatched without waiting:
>
> throtl_dispatch_tg
> tg_may_dispatch
> tg_with_in_bps_limit
> if (bps_limit == U64_MAX || bio_flagged(bio, BIO_THROTTLED))
> *wait = 0; -> wait time is zero
> return true;
>
> commit 9f5ede3c01f9 ("block: throttle split bio in case of iops limit")
> support to count split bios for iops limit, thus it adds flagged bio
> checking in tg_with_in_bps_limit() so that split bios will only count
> once for bps limit, however, it introduce a new problem that io throttle
> won't work if multiple bios are throttled.
>
> In order to fix the problem, handle iops/bps limit in different ways:
>
> 1) for iops limit, there is no flag to record if the bio is throttled,
> and iops is always applied.
> 2) for bps limit, original bio will be flagged with BIO_BPS_THROTTLED,
> and io throttle will ignore bio with the flag.
>
> Noted this patch also remove the code to set flag in __bio_clone(), it's
> introduced in commit 111be8839817 ("block-throttle: avoid double
> charge"), and author thinks split bio can be resubmited and throttled
> again, which is wrong because split bio will continue to dispatch from
> caller.
>
> Fixes: 9f5ede3c01f9 ("block: throttle split bio in case of iops limit")
> Cc: <[email protected]>
> Signed-off-by: Yu Kuai <[email protected]>

Acked-by: Tejun Heo <[email protected]>

Thanks.

--
tejun

2022-08-31 11:49:11

by Yu Kuai

[permalink] [raw]
Subject: Re: [PATCH v9 0/4] blk-throttle bugfix

Hi, Jens!

在 2022/08/29 10:22, Yu Kuai 写道:
> From: Yu Kuai <[email protected]>
>
> Changes in v9:
> - renaming the flag BIO_THROTTLED to BIO_BPS_THROTTLED, and always
> apply iops limit in path 1;
> - add tag for patch 4
> Changes in v8:
> - use a new solution in patch 1
> - move cleanups to a separate patchset
> - rename bytes/io_skipped to carryover_bytes/ios in patch 4
> Changes in v7:
> - add patch 5 to improve handling of re-entered bio for bps limit
> - as suggested by Tejun, add some comments
> - sdd some Acked tag by Tejun
> Changes in v6:
> - rename parameter in patch 3
> - add comments and reviewed tag for patch 4
> Changes in v5:
> - add comments in patch 4
> - clear bytes/io_skipped in throtl_start_new_slice_with_credit() in
> patch 4
> - and cleanup patches 5-8
> Changes in v4:
> - add reviewed-by tag for patch 1
> - add patch 2,3
> - use a different way to fix io hung in patch 4
> Changes in v3:
> - fix a check in patch 1
> - fix link err in patch 2 on 32-bit platform
> - handle overflow in patch 2
> Changes in v2:
> - use a new solution suggested by Ming
> - change the title of patch 1
> - add patch 2
>
> Patch 1 fix that blk-throttle can't work if multiple bios are throttle.
> Patch 2 fix overflow while calculating wait time.
> Patch 3,4 fix io hung due to configuration updates.
>
> Previous version:
> v1: https://lore.kernel.org/all/[email protected]/
> v2: https://lore.kernel.org/all/[email protected]/
> v3: https://lore.kernel.org/all/[email protected]/
> v4: https://lore.kernel.org/all/[email protected]/
> v5: https://lore.kernel.org/all/[email protected]/
> v6: https://lore.kernel.org/all/[email protected]/
> v7: https://lore.kernel.org/all/[email protected]/
> v8: https://lore.kernel.org/all/[email protected]/
>
> Yu Kuai (4):
> blk-throttle: fix that io throttle can only work for single bio
> blk-throttle: prevent overflow while calculating wait time
> blk-throttle: factor out code to calculate ios/bytes_allowed
> blk-throttle: fix io hung due to configuration updates

Can you apply this patchset now?

Thanks,
Kuai
>
> block/bio.c | 2 -
> block/blk-throttle.c | 137 +++++++++++++++++++++++++-------------
> block/blk-throttle.h | 11 ++-
> include/linux/bio.h | 2 +-
> include/linux/blk_types.h | 2 +-
> 5 files changed, 104 insertions(+), 50 deletions(-)
>

2022-09-12 06:18:38

by Yu Kuai

[permalink] [raw]
Subject: Re: [PATCH v9 0/4] blk-throttle bugfix



在 2022/08/31 19:31, Yu Kuai 写道:
> Hi, Jens!
>
> 在 2022/08/29 10:22, Yu Kuai 写道:
>> From: Yu Kuai <[email protected]>
>>
>> Changes in v9:
>>   - renaming the flag BIO_THROTTLED to BIO_BPS_THROTTLED, and always
>>   apply iops limit in path 1;
>>   - add tag for patch 4
>> Changes in v8:
>>   - use a new solution in patch 1
>>   - move cleanups to a separate patchset
>>   - rename bytes/io_skipped to carryover_bytes/ios in patch 4
>> Changes in v7:
>>   - add patch 5 to improve handling of re-entered bio for bps limit
>>   - as suggested by Tejun, add some comments
>>   - sdd some Acked tag by Tejun
>> Changes in v6:
>>   - rename parameter in patch 3
>>   - add comments and reviewed tag for patch 4
>> Changes in v5:
>>   - add comments in patch 4
>>   - clear bytes/io_skipped in throtl_start_new_slice_with_credit() in
>>   patch 4
>>   - and cleanup patches 5-8
>> Changes in v4:
>>   - add reviewed-by tag for patch 1
>>   - add patch 2,3
>>   - use a different way to fix io hung in patch 4
>> Changes in v3:
>>   - fix a check in patch 1
>>   - fix link err in patch 2 on 32-bit platform
>>   - handle overflow in patch 2
>> Changes in v2:
>>   - use a new solution suggested by Ming
>>   - change the title of patch 1
>>   - add patch 2
>>
>> Patch 1 fix that blk-throttle can't work if multiple bios are throttle.
>> Patch 2 fix overflow while calculating wait time.
>> Patch 3,4 fix io hung due to configuration updates.
>>
>> Previous version:
>> v1:
>> https://lore.kernel.org/all/[email protected]/
>> v2:
>> https://lore.kernel.org/all/[email protected]/
>> v3:
>> https://lore.kernel.org/all/[email protected]/
>> v4:
>> https://lore.kernel.org/all/[email protected]/
>> v5:
>> https://lore.kernel.org/all/[email protected]/
>> v6:
>> https://lore.kernel.org/all/[email protected]/
>>
>> v7:
>> https://lore.kernel.org/all/[email protected]/
>>
>> v8:
>> https://lore.kernel.org/all/[email protected]/
>>
>>
>> Yu Kuai (4):
>>    blk-throttle: fix that io throttle can only work for single bio
>>    blk-throttle: prevent overflow while calculating wait time
>>    blk-throttle: factor out code to calculate ios/bytes_allowed
>>    blk-throttle: fix io hung due to configuration updates
>
> Can you apply this patchset now?

friendly ping...
>
> Thanks,
> Kuai
>>
>>   block/bio.c               |   2 -
>>   block/blk-throttle.c      | 137 +++++++++++++++++++++++++-------------
>>   block/blk-throttle.h      |  11 ++-
>>   include/linux/bio.h       |   2 +-
>>   include/linux/blk_types.h |   2 +-
>>   5 files changed, 104 insertions(+), 50 deletions(-)
>>
>
> .
>

2022-09-12 06:46:30

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH v9 0/4] blk-throttle bugfix

On Mon, 29 Aug 2022 10:22:36 +0800, Yu Kuai wrote:
> From: Yu Kuai <[email protected]>
>
> Changes in v9:
> - renaming the flag BIO_THROTTLED to BIO_BPS_THROTTLED, and always
> apply iops limit in path 1;
> - add tag for patch 4
> Changes in v8:
> - use a new solution in patch 1
> - move cleanups to a separate patchset
> - rename bytes/io_skipped to carryover_bytes/ios in patch 4
> Changes in v7:
> - add patch 5 to improve handling of re-entered bio for bps limit
> - as suggested by Tejun, add some comments
> - sdd some Acked tag by Tejun
> Changes in v6:
> - rename parameter in patch 3
> - add comments and reviewed tag for patch 4
> Changes in v5:
> - add comments in patch 4
> - clear bytes/io_skipped in throtl_start_new_slice_with_credit() in
> patch 4
> - and cleanup patches 5-8
> Changes in v4:
> - add reviewed-by tag for patch 1
> - add patch 2,3
> - use a different way to fix io hung in patch 4
> Changes in v3:
> - fix a check in patch 1
> - fix link err in patch 2 on 32-bit platform
> - handle overflow in patch 2
> Changes in v2:
> - use a new solution suggested by Ming
> - change the title of patch 1
> - add patch 2
>
> [...]

Applied, thanks!

[1/4] blk-throttle: fix that io throttle can only work for single bio
commit: 320fb0f91e55ba248d4bad106b408e59099cfa89
[2/4] blk-throttle: prevent overflow while calculating wait time
commit: 8d6bbaada2e0a65f9012ac4c2506460160e7237a
[3/4] blk-throttle: factor out code to calculate ios/bytes_allowed
commit: 681cd46fff8cd81e387747c7850f2e730d3e0b74
[4/4] blk-throttle: fix io hung due to configuration updates
commit: a880ae93e5b5bb5d8d5500077a391e3f5ec7715c

Best regards,
--
Jens Axboe