2019-06-03 04:46:15

by Liangyan

[permalink] [raw]
Subject: [PATCH] sched/fair: don't restart enqueued cfs quota slack timer

From: "liangyan.ply" <[email protected]>

start_cfs_slack_bandwidth() will restart the quota slack timer,
if it is called frequently, this timer will be restarted continuously
and may have no chance to expire to unthrottle cfs tasks.
This will cause that the throttled tasks can't be unthrottled in time
although they have remaining quota.

Signed-off-by: Liangyan <[email protected]>
---
kernel/sched/fair.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d90a64620072..fdb03c752f97 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4411,9 +4411,11 @@ static void start_cfs_slack_bandwidth(struct cfs_bandwidth *cfs_b)
if (runtime_refresh_within(cfs_b, min_left))
return;

- hrtimer_start(&cfs_b->slack_timer,
+ if (!hrtimer_active(&cfs_b->slack_timer)) {
+ hrtimer_start(&cfs_b->slack_timer,
ns_to_ktime(cfs_bandwidth_slack_period),
HRTIMER_MODE_REL);
+ }
}

/* we know any runtime found here is valid as update_curr() precedes return */
--
2.14.4.44.g2045bb6


2019-06-03 11:46:00

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: don't restart enqueued cfs quota slack timer

On Mon, Jun 03, 2019 at 12:43:09PM +0800, Liangyan wrote:
> From: "liangyan.ply" <[email protected]>
>
> start_cfs_slack_bandwidth() will restart the quota slack timer,
> if it is called frequently, this timer will be restarted continuously
> and may have no chance to expire to unthrottle cfs tasks.
> This will cause that the throttled tasks can't be unthrottled in time
> although they have remaining quota.

This looks very similar to the patch from Ben here:

https://lkml.kernel.org/r/[email protected]

>
> Signed-off-by: Liangyan <[email protected]>
> ---
> kernel/sched/fair.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index d90a64620072..fdb03c752f97 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4411,9 +4411,11 @@ static void start_cfs_slack_bandwidth(struct cfs_bandwidth *cfs_b)
> if (runtime_refresh_within(cfs_b, min_left))
> return;
>
> - hrtimer_start(&cfs_b->slack_timer,
> + if (!hrtimer_active(&cfs_b->slack_timer)) {
> + hrtimer_start(&cfs_b->slack_timer,
> ns_to_ktime(cfs_bandwidth_slack_period),
> HRTIMER_MODE_REL);
> + }
> }
>
> /* we know any runtime found here is valid as update_curr() precedes return */
> --
> 2.14.4.44.g2045bb6
>

2019-06-03 20:26:24

by Benjamin Segall

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: don't restart enqueued cfs quota slack timer

Peter Zijlstra <[email protected]> writes:

> On Mon, Jun 03, 2019 at 12:43:09PM +0800, Liangyan wrote:
>> From: "liangyan.ply" <[email protected]>
>>
>> start_cfs_slack_bandwidth() will restart the quota slack timer,
>> if it is called frequently, this timer will be restarted continuously
>> and may have no chance to expire to unthrottle cfs tasks.
>> This will cause that the throttled tasks can't be unthrottled in time
>> although they have remaining quota.
>
> This looks very similar to the patch from Ben here:
>
> https://lkml.kernel.org/r/[email protected]

Yeah, it should do the same sort of thing, but testing hrtimer_active
is racy (we could miss restarting the timer if it's just dropped the
cfsb lock but hasn't finished yet), so the extra accounting flag is
needed.

>
>>
>> Signed-off-by: Liangyan <[email protected]>
>> ---
>> kernel/sched/fair.c | 4 +++-
>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index d90a64620072..fdb03c752f97 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -4411,9 +4411,11 @@ static void start_cfs_slack_bandwidth(struct cfs_bandwidth *cfs_b)
>> if (runtime_refresh_within(cfs_b, min_left))
>> return;
>>
>> - hrtimer_start(&cfs_b->slack_timer,
>> + if (!hrtimer_active(&cfs_b->slack_timer)) {
>> + hrtimer_start(&cfs_b->slack_timer,
>> ns_to_ktime(cfs_bandwidth_slack_period),
>> HRTIMER_MODE_REL);
>> + }
>> }
>>
>> /* we know any runtime found here is valid as update_curr() precedes return */
>> --
>> 2.14.4.44.g2045bb6
>>