2020-09-18 00:46:23

by Tejun Heo

[permalink] [raw]
Subject: [PATCHSET for-5.10/block] iocost: improve debt forgiveness logic

Hello,

Debt reduction logic was recently added by dda1315f1853 ("blk-iocost: halve
debts if device stays idle"). While it was effective at avoiding
pathological cases where some iocgs were kept delayed while the device was
most idle, it wasn't very effective at addressing more complex conditions
and could leave low priority cgroups unnecessarily harshly throttled under
moderate load.

This patchset improves the debt forgiveness logic so that it's more
effective at reducing such unnecessary throttling. This patchset contains
the following five patches:

0001-iocost-factor-out-ioc_forgive_debts.patch
0002-iocost-replace-nr_shortages-cond-in-ioc_forgive_debt.patch
0003-iocost-recalculate-delay-after-debt-reduction.patch
0004-iocost-reimplement-debt-forgiveness-using-average-us.patch
0005-iocost-add-iocg_forgive_debt-tracepoint.patch

and is also available in the following git tree:

git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git iocost-debt-forgiveness

diffstat follows. Thanks.

block/blk-iocost.c | 141 +++++++++++++++++++++++++++++-------------
include/trace/events/iocost.h | 41 ++++++++++++
2 files changed, 141 insertions(+), 41 deletions(-)

--
tejun


2020-09-18 00:46:33

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 2/5] iocost: replace nr_shortages cond in ioc_forgive_debts() with busy_level one

Debt reduction was blocked if any iocg was short on budget in the past
period to avoid reducing debts while some iocgs are saturated. However, this
ends up unnecessarily blocking debt reduction due to temporary local
imbalances when the device is generally being underutilized, while also
failing to block when the underlying device is overwhelmed and the usage
becomes low from high latency.

Given that debt accumulation mostly happens with swapout bursts which can
significantly deteriorate the underlying device's latency response, the
current logic is not great.

Let's replace it with ioc->busy_level based condition so that we block debt
reduction when the underlying device is being saturated. ioc_forgive_debts()
call is moved after busy_level determination.

Signed-off-by: Tejun Heo <[email protected]>
---
block/blk-iocost.c | 13 +++++--------
1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index bbf30bb06c07..c0499c294da9 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -287,10 +287,7 @@ enum {
MIN_DELAY = 250,
MAX_DELAY = 250 * USEC_PER_MSEC,

- /*
- * Halve debts if total usage keeps staying under 25% w/o any shortages
- * for over 100ms.
- */
+ /* halve debts if total usage keeps staying under 25% for over 100ms */
DEBT_BUSY_USAGE_PCT = 25,
DEBT_REDUCTION_IDLE_DUR = 100 * USEC_PER_MSEC,

@@ -1990,9 +1987,9 @@ static void transfer_surpluses(struct list_head *surpluses, struct ioc_now *now)
* sufficiently idle for a while, the debts are halved.
*/
static void ioc_forgive_debts(struct ioc *ioc, u64 usage_us_sum, int nr_debtors,
- int nr_shortages, struct ioc_now *now)
+ struct ioc_now *now)
{
- if (nr_shortages ||
+ if (ioc->busy_level < 0 ||
div64_u64(100 * usage_us_sum, now->now - ioc->period_at) >=
DEBT_BUSY_USAGE_PCT)
ioc->debt_busy_at = now->now;
@@ -2205,8 +2202,6 @@ static void ioc_timer_fn(struct timer_list *timer)
list_for_each_entry_safe(iocg, tiocg, &surpluses, surplus_list)
list_del_init(&iocg->surplus_list);

- ioc_forgive_debts(ioc, usage_us_sum, nr_debtors, nr_shortages, &now);
-
/*
* If q is getting clogged or we're missing too much, we're issuing
* too much IO and should lower vtime rate. If we're not missing
@@ -2301,6 +2296,8 @@ static void ioc_timer_fn(struct timer_list *timer)

ioc_refresh_params(ioc, false);

+ ioc_forgive_debts(ioc, usage_us_sum, nr_debtors, &now);
+
/*
* This period is done. Move onto the next one. If nothing's
* going on with the device, stop the timer.
--
2.26.2

2020-09-18 00:46:38

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 1/5] iocost: factor out ioc_forgive_debts()

Debt reduction logic is going to be improved and expanded. Factor it out
into ioc_forgive_debts() and generalize the comment a bit. No functional
change.

Signed-off-by: Tejun Heo <[email protected]>
---
block/blk-iocost.c | 66 ++++++++++++++++++++++++----------------------
1 file changed, 35 insertions(+), 31 deletions(-)

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index ef9476fca1d8..bbf30bb06c07 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -1979,6 +1979,40 @@ static void transfer_surpluses(struct list_head *surpluses, struct ioc_now *now)
list_del_init(&iocg->walk_list);
}

+/*
+ * A low weight iocg can amass a large amount of debt, for example, when
+ * anonymous memory gets reclaimed aggressively. If the system has a lot of
+ * memory paired with a slow IO device, the debt can span multiple seconds or
+ * more. If there are no other subsequent IO issuers, the in-debt iocg may end
+ * up blocked paying its debt while the IO device is idle.
+ *
+ * The following protects against such cases. If the device has been
+ * sufficiently idle for a while, the debts are halved.
+ */
+static void ioc_forgive_debts(struct ioc *ioc, u64 usage_us_sum, int nr_debtors,
+ int nr_shortages, struct ioc_now *now)
+{
+ if (nr_shortages ||
+ div64_u64(100 * usage_us_sum, now->now - ioc->period_at) >=
+ DEBT_BUSY_USAGE_PCT)
+ ioc->debt_busy_at = now->now;
+
+ if (nr_debtors &&
+ now->now - ioc->debt_busy_at >= DEBT_REDUCTION_IDLE_DUR) {
+ struct ioc_gq *iocg;
+
+ list_for_each_entry(iocg, &ioc->active_iocgs, active_list) {
+ if (iocg->abs_vdebt) {
+ spin_lock(&iocg->waitq.lock);
+ iocg->abs_vdebt /= 2;
+ iocg_kick_waitq(iocg, true, now);
+ spin_unlock(&iocg->waitq.lock);
+ }
+ }
+ ioc->debt_busy_at = now->now;
+ }
+}
+
static void ioc_timer_fn(struct timer_list *timer)
{
struct ioc *ioc = container_of(timer, struct ioc, timer);
@@ -2171,37 +2205,7 @@ static void ioc_timer_fn(struct timer_list *timer)
list_for_each_entry_safe(iocg, tiocg, &surpluses, surplus_list)
list_del_init(&iocg->surplus_list);

- /*
- * A low weight iocg can amass a large amount of debt, for example, when
- * anonymous memory gets reclaimed aggressively. If the system has a lot
- * of memory paired with a slow IO device, the debt can span multiple
- * seconds or more. If there are no other subsequent IO issuers, the
- * in-debt iocg may end up blocked paying its debt while the IO device
- * is idle.
- *
- * The following protects against such pathological cases. If the device
- * has been sufficiently idle for a substantial amount of time, the
- * debts are halved. The criteria are on the conservative side as we
- * want to resolve the rare extreme cases without impacting regular
- * operation by forgiving debts too readily.
- */
- if (nr_shortages ||
- div64_u64(100 * usage_us_sum, now.now - ioc->period_at) >=
- DEBT_BUSY_USAGE_PCT)
- ioc->debt_busy_at = now.now;
-
- if (nr_debtors &&
- now.now - ioc->debt_busy_at >= DEBT_REDUCTION_IDLE_DUR) {
- list_for_each_entry(iocg, &ioc->active_iocgs, active_list) {
- if (iocg->abs_vdebt) {
- spin_lock(&iocg->waitq.lock);
- iocg->abs_vdebt /= 2;
- iocg_kick_waitq(iocg, true, &now);
- spin_unlock(&iocg->waitq.lock);
- }
- }
- ioc->debt_busy_at = now.now;
- }
+ ioc_forgive_debts(ioc, usage_us_sum, nr_debtors, nr_shortages, &now);

/*
* If q is getting clogged or we're missing too much, we're issuing
--
2.26.2

2020-09-18 00:47:44

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 5/5] iocost: add iocg_forgive_debt tracepoint

Signed-off-by: Tejun Heo <[email protected]>
---
block/blk-iocost.c | 12 ++++++++++
include/trace/events/iocost.h | 41 +++++++++++++++++++++++++++++++++++
2 files changed, 53 insertions(+)

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 9b1f94499432..328ae805e85f 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -2046,12 +2046,24 @@ static void ioc_forgive_debts(struct ioc *ioc, u64 usage_us_sum, int nr_debtors,
ioc->dfgv_period_rem = do_div(nr_cycles, DFGV_PERIOD);

list_for_each_entry(iocg, &ioc->active_iocgs, active_list) {
+ u64 __maybe_unused old_debt, __maybe_unused old_delay;
+
if (!iocg->abs_vdebt)
continue;
+
spin_lock(&iocg->waitq.lock);
+
+ old_debt = iocg->abs_vdebt;
+ old_delay = iocg->delay;
+
iocg->abs_vdebt >>= nr_cycles;
iocg->delay = 0; /* kick_waitq will recalc */
iocg_kick_waitq(iocg, true, now);
+
+ TRACE_IOCG_PATH(iocg_forgive_debt, iocg, now, usage_pct,
+ old_debt, iocg->abs_vdebt,
+ old_delay, iocg->delay);
+
spin_unlock(&iocg->waitq.lock);
}
}
diff --git a/include/trace/events/iocost.h b/include/trace/events/iocost.h
index b350860d2e71..0b6869980ba2 100644
--- a/include/trace/events/iocost.h
+++ b/include/trace/events/iocost.h
@@ -164,6 +164,47 @@ TRACE_EVENT(iocost_ioc_vrate_adj,
)
);

+TRACE_EVENT(iocost_iocg_forgive_debt,
+
+ TP_PROTO(struct ioc_gq *iocg, const char *path, struct ioc_now *now,
+ u32 usage_pct, u64 old_debt, u64 new_debt,
+ u64 old_delay, u64 new_delay),
+
+ TP_ARGS(iocg, path, now, usage_pct,
+ old_debt, new_debt, old_delay, new_delay),
+
+ TP_STRUCT__entry (
+ __string(devname, ioc_name(iocg->ioc))
+ __string(cgroup, path)
+ __field(u64, now)
+ __field(u64, vnow)
+ __field(u32, usage_pct)
+ __field(u64, old_debt)
+ __field(u64, new_debt)
+ __field(u64, old_delay)
+ __field(u64, new_delay)
+ ),
+
+ TP_fast_assign(
+ __assign_str(devname, ioc_name(iocg->ioc));
+ __assign_str(cgroup, path);
+ __entry->now = now->now;
+ __entry->vnow = now->vnow;
+ __entry->usage_pct = usage_pct;
+ __entry->old_debt = old_debt;
+ __entry->new_debt = new_debt;
+ __entry->old_delay = old_delay;
+ __entry->new_delay = new_delay;
+ ),
+
+ TP_printk("[%s:%s] now=%llu:%llu usage=%u debt=%llu->%llu delay=%llu->%llu",
+ __get_str(devname), __get_str(cgroup),
+ __entry->now, __entry->vnow, __entry->usage_pct,
+ __entry->old_debt, __entry->new_debt,
+ __entry->old_delay, __entry->new_delay
+ )
+);
+
#endif /* _TRACE_BLK_IOCOST_H */

/* This part must be outside protection */
--
2.26.2

2020-09-18 00:48:40

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 3/5] iocost: recalculate delay after debt reduction

Debt sets the initial delay duration which is decayed over time. The current
debt reduction halved the debt but didn't change the delay. It prevented
future debts from increasing delay but didn't do anything to lower the
existing delay, limiting the mechanism's ability to reduce unnecessary
idling.

Reset iocg->delay to 0 after debt reduction so that iocg_kick_waitq()
recalculates new delay value based on the reduced debt amount.

Signed-off-by: Tejun Heo <[email protected]>
---
block/blk-iocost.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index c0499c294da9..ffcb78126ab7 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -1984,7 +1984,8 @@ static void transfer_surpluses(struct list_head *surpluses, struct ioc_now *now)
* up blocked paying its debt while the IO device is idle.
*
* The following protects against such cases. If the device has been
- * sufficiently idle for a while, the debts are halved.
+ * sufficiently idle for a while, the debts are halved and delays are
+ * recalculated.
*/
static void ioc_forgive_debts(struct ioc *ioc, u64 usage_us_sum, int nr_debtors,
struct ioc_now *now)
@@ -2002,6 +2003,7 @@ static void ioc_forgive_debts(struct ioc *ioc, u64 usage_us_sum, int nr_debtors,
if (iocg->abs_vdebt) {
spin_lock(&iocg->waitq.lock);
iocg->abs_vdebt /= 2;
+ iocg->delay = 0; /* kick_waitq will recalc */
iocg_kick_waitq(iocg, true, now);
spin_unlock(&iocg->waitq.lock);
}
--
2.26.2

2020-09-18 00:49:21

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 4/5] iocost: reimplement debt forgiveness using average usage

Debt forgiveness logic was counting the number of consecutive !busy periods
as the trigger condition. While this usually works, it can easily be thrown
off by temporary fluctuations especially on configurations w/ short periods.

This patch reimplements debt forgiveness so that:

* Use the average usage over the forgiveness period instead of counting
consecutive periods.

* Debt is reduced at around the target rate (1/2 every 100ms) regardless of
ioc period duration.

* Usage threshold is raised to 50%. Combined with the preceding changes and
the switch to average usage, this makes debt forgivness a lot more
effective at reducing the amount of unnecessary idleness.

* Constants are renamed with DFGV_ prefix.

Signed-off-by: Tejun Heo <[email protected]>
---
block/blk-iocost.c | 94 ++++++++++++++++++++++++++++++++++------------
1 file changed, 69 insertions(+), 25 deletions(-)

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index ffcb78126ab7..9b1f94499432 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -287,9 +287,9 @@ enum {
MIN_DELAY = 250,
MAX_DELAY = 250 * USEC_PER_MSEC,

- /* halve debts if total usage keeps staying under 25% for over 100ms */
- DEBT_BUSY_USAGE_PCT = 25,
- DEBT_REDUCTION_IDLE_DUR = 100 * USEC_PER_MSEC,
+ /* halve debts if avg usage over 100ms is under 50% */
+ DFGV_USAGE_PCT = 50,
+ DFGV_PERIOD = 100 * USEC_PER_MSEC,

/* don't let cmds which take a very long time pin lagging for too long */
MAX_LAGGING_PERIODS = 10,
@@ -433,8 +433,10 @@ struct ioc {
bool weights_updated;
atomic_t hweight_gen; /* for lazy hweights */

- /* the last time debt cancel condition wasn't met */
- u64 debt_busy_at;
+ /* debt forgivness */
+ u64 dfgv_period_at;
+ u64 dfgv_period_rem;
+ u64 dfgv_usage_us_sum;

u64 autop_too_fast_at;
u64 autop_too_slow_at;
@@ -1251,7 +1253,8 @@ static bool iocg_activate(struct ioc_gq *iocg, struct ioc_now *now)

if (ioc->running == IOC_IDLE) {
ioc->running = IOC_RUNNING;
- ioc->debt_busy_at = now->now;
+ ioc->dfgv_period_at = now->now;
+ ioc->dfgv_period_rem = 0;
ioc_start_period(ioc, now);
}

@@ -1990,25 +1993,66 @@ static void transfer_surpluses(struct list_head *surpluses, struct ioc_now *now)
static void ioc_forgive_debts(struct ioc *ioc, u64 usage_us_sum, int nr_debtors,
struct ioc_now *now)
{
- if (ioc->busy_level < 0 ||
- div64_u64(100 * usage_us_sum, now->now - ioc->period_at) >=
- DEBT_BUSY_USAGE_PCT)
- ioc->debt_busy_at = now->now;
-
- if (nr_debtors &&
- now->now - ioc->debt_busy_at >= DEBT_REDUCTION_IDLE_DUR) {
- struct ioc_gq *iocg;
-
- list_for_each_entry(iocg, &ioc->active_iocgs, active_list) {
- if (iocg->abs_vdebt) {
- spin_lock(&iocg->waitq.lock);
- iocg->abs_vdebt /= 2;
- iocg->delay = 0; /* kick_waitq will recalc */
- iocg_kick_waitq(iocg, true, now);
- spin_unlock(&iocg->waitq.lock);
- }
- }
- ioc->debt_busy_at = now->now;
+ struct ioc_gq *iocg;
+ u64 dur, usage_pct, nr_cycles;
+
+ /* if no debtor, reset the cycle */
+ if (!nr_debtors) {
+ ioc->dfgv_period_at = now->now;
+ ioc->dfgv_period_rem = 0;
+ ioc->dfgv_usage_us_sum = 0;
+ return;
+ }
+
+ /*
+ * Debtors can pass through a lot of writes choking the device and we
+ * don't want to be forgiving debts while the device is struggling from
+ * write bursts. If we're missing latency targets, consider the device
+ * fully utilized.
+ */
+ if (ioc->busy_level > 0)
+ usage_us_sum = max_t(u64, usage_us_sum, ioc->period_us);
+
+ ioc->dfgv_usage_us_sum += usage_us_sum;
+ if (time_before64(now->now, ioc->dfgv_period_at + DFGV_PERIOD))
+ return;
+
+ /*
+ * At least DFGV_PERIOD has passed since the last period. Calculate the
+ * average usage and reset the period counters.
+ */
+ dur = now->now - ioc->dfgv_period_at;
+ usage_pct = div64_u64(100 * ioc->dfgv_usage_us_sum, dur);
+
+ ioc->dfgv_period_at = now->now;
+ ioc->dfgv_usage_us_sum = 0;
+
+ /* if was too busy, reset everything */
+ if (usage_pct > DFGV_USAGE_PCT) {
+ ioc->dfgv_period_rem = 0;
+ return;
+ }
+
+ /*
+ * Usage is lower than threshold. Let's forgive some debts. Debt
+ * forgiveness runs off of the usual ioc timer but its period usually
+ * doesn't match ioc's. Compensate the difference by performing the
+ * reduction as many times as would fit in the duration since the last
+ * run and carrying over the left-over duration in @ioc->dfgv_period_rem
+ * - if ioc period is 75% of DFGV_PERIOD, one out of three consecutive
+ * reductions is doubled.
+ */
+ nr_cycles = dur + ioc->dfgv_period_rem;
+ ioc->dfgv_period_rem = do_div(nr_cycles, DFGV_PERIOD);
+
+ list_for_each_entry(iocg, &ioc->active_iocgs, active_list) {
+ if (!iocg->abs_vdebt)
+ continue;
+ spin_lock(&iocg->waitq.lock);
+ iocg->abs_vdebt >>= nr_cycles;
+ iocg->delay = 0; /* kick_waitq will recalc */
+ iocg_kick_waitq(iocg, true, now);
+ spin_unlock(&iocg->waitq.lock);
}
}

--
2.26.2

2020-09-18 18:42:49

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 6/5] iocost: consider iocgs with active delays for debt forgiveness

An iocg may have 0 debt but non-zero delay. The current debt forgiveness
logic doesn't act on such iocgs. This can lead to unexpected behaviors - an
iocg with a little bit of debt will have its delay canceled through debt
forgiveness but one w/o any debt but active delay will have to wait out
until its delay decays out.

This patch updates the debt handling logic so that it treats delays the same
as debts. If either debt or delay is active, debt forgiveness logic kicks in
and acts on both the same way.

Also, avoid turning the debt and delay directly to zero as that can confuse
state transitions.

Signed-off-by: Tejun Heo <[email protected]>
---
Jens, a follow up patch to the series. The git tree is also updated.

Thanks.

block/blk-iocost.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)

--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -2048,7 +2048,7 @@ static void ioc_forgive_debts(struct ioc
list_for_each_entry(iocg, &ioc->active_iocgs, active_list) {
u64 __maybe_unused old_debt, __maybe_unused old_delay;

- if (!iocg->abs_vdebt)
+ if (!iocg->abs_vdebt && !iocg->delay)
continue;

spin_lock(&iocg->waitq.lock);
@@ -2056,8 +2056,11 @@ static void ioc_forgive_debts(struct ioc
old_debt = iocg->abs_vdebt;
old_delay = iocg->delay;

- iocg->abs_vdebt >>= nr_cycles;
- iocg->delay = 0; /* kick_waitq will recalc */
+ if (iocg->abs_vdebt)
+ iocg->abs_vdebt = iocg->abs_vdebt >> nr_cycles ?: 1;
+ if (iocg->delay)
+ iocg->delay = iocg->delay >> nr_cycles ?: 1;
+
iocg_kick_waitq(iocg, true, now);

TRACE_IOCG_PATH(iocg_forgive_debt, iocg, now, usage_pct,
@@ -2129,7 +2132,7 @@ static void ioc_timer_fn(struct timer_li
iocg->delay) {
/* might be oversleeping vtime / hweight changes, kick */
iocg_kick_waitq(iocg, true, &now);
- if (iocg->abs_vdebt)
+ if (iocg->abs_vdebt || iocg->delay)
nr_debtors++;
} else if (iocg_is_idle(iocg)) {
/* no waiter and idle, deactivate */

2020-09-25 14:37:33

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCHSET for-5.10/block] iocost: improve debt forgiveness logic

On 9/17/20 6:44 PM, Tejun Heo wrote:
> Hello,
>
> Debt reduction logic was recently added by dda1315f1853 ("blk-iocost: halve
> debts if device stays idle"). While it was effective at avoiding
> pathological cases where some iocgs were kept delayed while the device was
> most idle, it wasn't very effective at addressing more complex conditions
> and could leave low priority cgroups unnecessarily harshly throttled under
> moderate load.

Applied, thanks.

--
Jens Axboe