2021-06-29 12:52:20

by Odin Ugedal

Subject: [PATCH] sched/fair: Fix CFS bandwidth hrtimer expiry type

The time remaining until expiry of the refresh_timer can be negative.
Storing it in an unsigned 64-bit variable makes a negative value wrap
around to a very large one, so runtime_refresh_within() returns false
instead of true. These situations are rare, but they do happen.

This does not cause user-facing issues or errors, other than
possibly unthrottling cfs_rq's using runtime from the previous period(s),
making the CFS bandwidth enforcement less strict in those (special)
situations.

Signed-off-by: Odin Ugedal <[email protected]>
---
kernel/sched/fair.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 23663318fb81..62446c052efb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5108,7 +5108,7 @@ static const u64 cfs_bandwidth_slack_period = 5 * NSEC_PER_MSEC;
 static int runtime_refresh_within(struct cfs_bandwidth *cfs_b, u64 min_expire)
 {
 	struct hrtimer *refresh_timer = &cfs_b->period_timer;
-	u64 remaining;
+	s64 remaining;

 	/* if the call-back is running a quota refresh is already occurring */
 	if (hrtimer_callback_running(refresh_timer))
@@ -5116,7 +5116,7 @@ static int runtime_refresh_within(struct cfs_bandwidth *cfs_b, u64 min_expire)

 	/* is a quota refresh about to occur? */
 	remaining = ktime_to_ns(hrtimer_expires_remaining(refresh_timer));
-	if (remaining < min_expire)
+	if (remaining < (s64)min_expire)
 		return 1;

 	return 0;
--
2.32.0
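
The effect described in the commit message can be reproduced outside the kernel. The following is a minimal userspace sketch, not the kernel code itself: the helper names refresh_within_unsigned() and refresh_within_signed() are hypothetical, the u64/s64 typedefs stand in for the kernel types, and the remaining time is passed in directly rather than read from an hrtimer.

```c
#include <stdint.h>

typedef uint64_t u64;	/* userspace stand-ins for the kernel types */
typedef int64_t s64;

/*
 * Pre-patch comparison (hypothetical helper): the remaining time is stored
 * in a u64, so a negative value such as -1000 wraps to 2^64 - 1000.
 */
static int refresh_within_unsigned(s64 remaining_ns, u64 min_expire)
{
	u64 remaining = (u64)remaining_ns;

	return remaining < min_expire;
}

/*
 * Post-patch comparison (hypothetical helper): remaining stays signed and
 * min_expire is cast to s64. Without the cast, the usual arithmetic
 * conversions would turn the s64 operand back into an unsigned value.
 */
static int refresh_within_signed(s64 remaining_ns, u64 min_expire)
{
	s64 remaining = remaining_ns;

	return remaining < (s64)min_expire;
}
```

With an illustrative min_expire of 2000000 ns and a remaining time of -1000 ns (a timer that has already expired), refresh_within_unsigned() returns 0 while refresh_within_signed() returns 1. For non-negative remaining times the two variants agree.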


2021-06-29 20:58:12

by Benjamin Segall

Subject: Re: [PATCH] sched/fair: Fix CFS bandwidth hrtimer expiry type

Odin Ugedal <[email protected]> writes:

> The time remaining until expiry of the refresh_timer can be negative.
> Storing it in an unsigned 64-bit variable makes a negative value wrap
> around to a very large one, so runtime_refresh_within() returns false
> instead of true. These situations are rare, but they do happen.
>
> This does not cause user-facing issues or errors, other than
> possibly unthrottling cfs_rq's using runtime from the previous period(s),
> making the CFS bandwidth enforcement less strict in those (special)
> situations.

Yeah, extremely rare, not any real sort of problem when it does happen,
but no reason not to fix it and get the slight win in precision.

Reviewed-by: Ben Segall <[email protected]>

>
> Signed-off-by: Odin Ugedal <[email protected]>
> ---
> kernel/sched/fair.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 23663318fb81..62446c052efb 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5108,7 +5108,7 @@ static const u64 cfs_bandwidth_slack_period = 5 * NSEC_PER_MSEC;
>  static int runtime_refresh_within(struct cfs_bandwidth *cfs_b, u64 min_expire)
>  {
>  	struct hrtimer *refresh_timer = &cfs_b->period_timer;
> -	u64 remaining;
> +	s64 remaining;
>
>  	/* if the call-back is running a quota refresh is already occurring */
>  	if (hrtimer_callback_running(refresh_timer))
> @@ -5116,7 +5116,7 @@ static int runtime_refresh_within(struct cfs_bandwidth *cfs_b, u64 min_expire)
>
>  	/* is a quota refresh about to occur? */
>  	remaining = ktime_to_ns(hrtimer_expires_remaining(refresh_timer));
> -	if (remaining < min_expire)
> +	if (remaining < (s64)min_expire)
>  		return 1;
>
>  	return 0;

Subject: [tip: sched/urgent] sched/fair: Fix CFS bandwidth hrtimer expiry type

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID: 72d0ad7cb5bad265adb2014dbe46c4ccb11afaba
Gitweb: https://git.kernel.org/tip/72d0ad7cb5bad265adb2014dbe46c4ccb11afaba
Author: Odin Ugedal <[email protected]>
AuthorDate: Tue, 29 Jun 2021 14:14:52 +02:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Fri, 02 Jul 2021 15:58:24 +02:00

sched/fair: Fix CFS bandwidth hrtimer expiry type

The time remaining until expiry of the refresh_timer can be negative.
Storing it in an unsigned 64-bit variable makes a negative value wrap
around to a very large one, so runtime_refresh_within() returns false
instead of true. These situations are rare, but they do happen.

This does not cause user-facing issues or errors, other than
possibly unthrottling cfs_rq's using runtime from the previous period(s),
making the CFS bandwidth enforcement less strict in those (special)
situations.

Signed-off-by: Odin Ugedal <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Ben Segall <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
kernel/sched/fair.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1e263c9..1b15a19 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5054,7 +5054,7 @@ static const u64 cfs_bandwidth_slack_period = 5 * NSEC_PER_MSEC;
 static int runtime_refresh_within(struct cfs_bandwidth *cfs_b, u64 min_expire)
 {
 	struct hrtimer *refresh_timer = &cfs_b->period_timer;
-	u64 remaining;
+	s64 remaining;

 	/* if the call-back is running a quota refresh is already occurring */
 	if (hrtimer_callback_running(refresh_timer))
@@ -5062,7 +5062,7 @@ static int runtime_refresh_within(struct cfs_bandwidth *cfs_b, u64 min_expire)

 	/* is a quota refresh about to occur? */
 	remaining = ktime_to_ns(hrtimer_expires_remaining(refresh_timer));
-	if (remaining < min_expire)
+	if (remaining < (s64)min_expire)
 		return 1;

 	return 0;