Util-clamp places tasks in different buckets based on their clamp values
for performance reasons. However, the size of buckets is currently
computed using a rounding division, which can lead to an off-by-one
error in some configurations.
For instance, with 20 buckets, the bucket size will be 1024/20=51. A
task with a clamp of 1024 will be mapped to bucket id 1024/51=20. Sadly,
correct indexes are in the range [0,19], hence leading to an out-of-bounds
memory access.
Fix the math to compute the bucket size.
Fixes: 69842cba9ace ("sched/uclamp: Add CPU's clamp buckets refcounting")
Suggested-by: Qais Yousef <[email protected]>
Signed-off-by: Quentin Perret <[email protected]>
---
Changes in v2:
- replaced the DIV_ROUND_UP(a,b) with a/b+1 (Dietmar)
This was found thanks to the SCHED_WARN_ON() in uclamp_rq_dec_id() which
indicated a broken state while running with 20 buckets on Android.
Big thanks to Qais for the help with this one.
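For the record, a minimal userspace sketch of the broken arithmetic
(illustrative code, not the kernel implementation; the constants mirror
SCHED_CAPACITY_SCALE and a 20-bucket config):

#include <stdio.h>

#define CAPACITY 1024 /* SCHED_CAPACITY_SCALE */
#define BUCKETS  20   /* CONFIG_UCLAMP_BUCKETS_COUNT */
#define DELTA    ((CAPACITY + BUCKETS / 2) / BUCKETS) /* DIV_ROUND_CLOSEST */

int main(void)
{
	/* Prints "delta=51 bucket_id(1024)=20": one past the last valid index 19. */
	printf("delta=%d bucket_id(1024)=%d\n", DELTA, CAPACITY / DELTA);
	return 0;
}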
---
kernel/sched/core.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 98191218d891..c5fb230dc604 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -920,8 +920,7 @@ static struct uclamp_se uclamp_default[UCLAMP_CNT];
*/
DEFINE_STATIC_KEY_FALSE(sched_uclamp_used);
-/* Integer rounded range for each bucket */
-#define UCLAMP_BUCKET_DELTA DIV_ROUND_CLOSEST(SCHED_CAPACITY_SCALE, UCLAMP_BUCKETS)
+#define UCLAMP_BUCKET_DELTA (SCHED_CAPACITY_SCALE / UCLAMP_BUCKETS + 1)
#define for_each_clamp_id(clamp_id) \
for ((clamp_id) = 0; (clamp_id) < UCLAMP_CNT; (clamp_id)++)
--
2.31.1.498.g6c1eba8ee3d-goog
On Thu, 29 Apr 2021 at 17:27, Quentin Perret <[email protected]> wrote:
>
> Util-clamp places tasks in different buckets based on their clamp values
> for performance reasons. However, the size of buckets is currently
> computed using a rounding division, which can lead to an off-by-one
> error in some configurations.
>
> For instance, with 20 buckets, the bucket size will be 1024/20=51. A
> task with a clamp of 1024 will be mapped to bucket id 1024/51=20. Sadly,
> correct indexes are in the range [0,19], hence leading to an out-of-bounds
> memory access.
>
> Fix the math to compute the bucket size.
>
> Fixes: 69842cba9ace ("sched/uclamp: Add CPU's clamp buckets refcounting")
> Suggested-by: Qais Yousef <[email protected]>
> Signed-off-by: Quentin Perret <[email protected]>
>
> ---
>
> Changes in v2:
> - replaced the DIV_ROUND_UP(a,b) with a/b+1 (Dietmar)
Doesn't this create unfairness between buckets?
If we take your example above of 20 buckets, delta is now 52. Then we
expect the last bucket to cover the range [972-1024], but values lower
than 988 will go into bucket 18. And the more buckets you have, the
worse it gets.
Your problem comes from the fact that we use 1025 values instead of
1024. Wouldn't it be easier to have a special case for the
SCHED_CAPACITY_SCALE value?
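To make the skew concrete, here is a quick standalone sketch
(illustrative userspace code) of the boundaries near the top, with 20
buckets and hence delta = 1024/20 + 1 = 52:

#include <stdio.h>

int main(void)
{
	int delta = 1024 / 20 + 1; /* 52 */
	int i;

	/* Bucket i covers clamp values [i * delta, i * delta + delta - 1]. */
	for (i = 18; i < 20; i++)
		printf("bucket %d: [%d, %d]\n", i, i * delta, i * delta + delta - 1);
	return 0;
}

This prints [936, 987] for bucket 18 and [988, 1039] for bucket 19, so
values in [972, 987] land one bucket lower than an even split would put
them.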
>
> This was found thanks to the SCHED_WARN_ON() in uclamp_rq_dec_id() which
> indicated a broken state while running with 20 buckets on Android.
>
> Big thanks to Qais for the help with this one.
> ---
> kernel/sched/core.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 98191218d891..c5fb230dc604 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -920,8 +920,7 @@ static struct uclamp_se uclamp_default[UCLAMP_CNT];
> */
> DEFINE_STATIC_KEY_FALSE(sched_uclamp_used);
>
> -/* Integer rounded range for each bucket */
> -#define UCLAMP_BUCKET_DELTA DIV_ROUND_CLOSEST(SCHED_CAPACITY_SCALE, UCLAMP_BUCKETS)
> +#define UCLAMP_BUCKET_DELTA (SCHED_CAPACITY_SCALE / UCLAMP_BUCKETS + 1)
>
> #define for_each_clamp_id(clamp_id) \
> for ((clamp_id) = 0; (clamp_id) < UCLAMP_CNT; (clamp_id)++)
> --
> 2.31.1.498.g6c1eba8ee3d-goog
>
On Friday 30 Apr 2021 at 09:45:32 (+0200), Vincent Guittot wrote:
> On Thu, 29 Apr 2021 at 17:27, Quentin Perret <[email protected]> wrote:
> >
> > Util-clamp places tasks in different buckets based on their clamp values
> > for performance reasons. However, the size of buckets is currently
> > computed using a rounding division, which can lead to an off-by-one
> > error in some configurations.
> >
> > For instance, with 20 buckets, the bucket size will be 1024/20=51. A
> > task with a clamp of 1024 will be mapped to bucket id 1024/51=20. Sadly,
> > correct indexes are in the range [0,19], hence leading to an out-of-bounds
> > memory access.
> >
> > Fix the math to compute the bucket size.
> >
> > Fixes: 69842cba9ace ("sched/uclamp: Add CPU's clamp buckets refcounting")
> > Suggested-by: Qais Yousef <[email protected]>
> > Signed-off-by: Quentin Perret <[email protected]>
> >
> > ---
> >
> > Changes in v2:
> > - replaced the DIV_ROUND_UP(a,b) with a/b+1 (Dietmar)
>
> Doesn't this create unfairness between buckets?
>
> If we take your example above of 20 buckets, delta is now 52. Then we
> expect the last bucket to cover the range [972-1024], but values lower
> than 988 will go into bucket 18.
Well, that's just a limitation of integer arithmetic, isn't it?
> And the more buckets you have, the
> worse it gets.
Sure, but 20 is a hard limit, and if we ever need more than that then
maybe we should think about getting rid of the buckets altogether.
> Your problem comes from the fact that we use 1025 values instead of
> 1024.
I don't understand what you mean here. Right now we'll assign bucket id
20 for any clamp in the range [1020-1024] (1020/51 is already 20), so I
don't think we can special-case 1024.
> Wouldn't it be easier to have a special case for the
> SCHED_CAPACITY_SCALE value?
As per the above, I don't see how that'll work.
Thanks,
Quentin
On Fri, 30 Apr 2021 at 10:25, Quentin Perret <[email protected]> wrote:
>
> On Friday 30 Apr 2021 at 09:45:32 (+0200), Vincent Guittot wrote:
> > On Thu, 29 Apr 2021 at 17:27, Quentin Perret <[email protected]> wrote:
> > >
> > > Util-clamp places tasks in different buckets based on their clamp values
> > > for performance reasons. However, the size of buckets is currently
> > > computed using a rounding division, which can lead to an off-by-one
> > > error in some configurations.
> > >
> > > For instance, with 20 buckets, the bucket size will be 1024/20=51. A
> > > task with a clamp of 1024 will be mapped to bucket id 1024/51=20. Sadly,
> > > correct indexes are in the range [0,19], hence leading to an out-of-bounds
> > > memory access.
> > >
> > > Fix the math to compute the bucket size.
> > >
> > > Fixes: 69842cba9ace ("sched/uclamp: Add CPU's clamp buckets refcounting")
> > > Suggested-by: Qais Yousef <[email protected]>
> > > Signed-off-by: Quentin Perret <[email protected]>
> > >
> > > ---
> > >
> > > Changes in v2:
> > > - replaced the DIV_ROUND_UP(a,b) with a/b+1 (Dietmar)
> >
> > Doesn't this create unfairness between buckets?
> >
> > If we take your example above of 20 buckets, delta is now 52. Then we
> > expect the last bucket to cover the range [972-1024], but values lower
> > than 988 will go into bucket 18.
>
> Well, that's just a limitation of integer arithmetic, isn't it?
>
> > And the more buckets you have, the
> > worse it gets.
>
> Sure, but 20 is a hard limit, and if we ever need more than that then
> maybe we should think about getting rid of the buckets altogether.
>
> > Your problem comes from the fact that we use 1025 values instead of
> > 1024.
>
> I don't understand what you mean here. Right now we'll assign bucket id
> 20 for any clamp in the range [1020-1024] (1020/51 is already 20), so I
> don't think we can special-case 1024.
20 buckets is probably not the best example because of the rounding of
the division. With 16 buckets, each bucket should be exactly 64 steps
wide, except the last one, which gets 65 steps because of the value
1024. With your change, buckets will be 65 wide and the last one will
be only 49 wide.
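A small sketch (plain userspace C, for illustration) of where the last
bucket starts under the two formulas with 16 buckets:

#include <stdio.h>

int main(void)
{
	int even  = 1024 / 16;     /* 64: the even split */
	int plus1 = 1024 / 16 + 1; /* 65: the proposed formula */

	/* Start of the last bucket (index 15) under each delta. */
	printf("even:  last bucket [%d, 1024]\n", 15 * even);  /* [960, 1024] */
	printf("plus1: last bucket [%d, 1024]\n", 15 * plus1); /* [975, 1024] */
	return 0;
}

With the even split the last bucket spans the 64 steps from 960 to 1024;
with the +1 formula it shrinks to the 49 steps from 975 to 1024.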
>
> > Wouldn't it be easier to have a special case for the
> > SCHED_CAPACITY_SCALE value?
>
> As per the above, I don't see how that'll work.
>
> Thanks,
> Quentin
On Friday 30 Apr 2021 at 10:49:50 (+0200), Vincent Guittot wrote:
> 20 buckets is probably not the best example because of the rounding of
> the division. With 16 buckets, each bucket should be exactly 64 steps
> wide, except the last one, which gets 65 steps because of the value
> 1024. With your change, buckets will be 65 wide and the last one will
> be only 49 wide.
OK, so what do you think of this?
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c5fb230dc604..dceeb5821797 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -920,14 +920,14 @@ static struct uclamp_se uclamp_default[UCLAMP_CNT];
*/
DEFINE_STATIC_KEY_FALSE(sched_uclamp_used);
-#define UCLAMP_BUCKET_DELTA (SCHED_CAPACITY_SCALE / UCLAMP_BUCKETS + 1)
+#define UCLAMP_BUCKET_DELTA DIV_ROUND_CLOSEST(SCHED_CAPACITY_SCALE, UCLAMP_BUCKETS)
#define for_each_clamp_id(clamp_id) \
for ((clamp_id) = 0; (clamp_id) < UCLAMP_CNT; (clamp_id)++)
static inline unsigned int uclamp_bucket_id(unsigned int clamp_value)
{
- return clamp_value / UCLAMP_BUCKET_DELTA;
+ return min(clamp_value / UCLAMP_BUCKET_DELTA, UCLAMP_BUCKETS - 1);
}
static inline unsigned int uclamp_none(enum uclamp_id clamp_id)
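A quick sanity check of the proposed mapping (standalone sketch
mirroring the macros above, assuming 20 buckets):

#include <stdio.h>

#define SCALE   1024 /* SCHED_CAPACITY_SCALE */
#define BUCKETS 20   /* UCLAMP_BUCKETS */
#define DELTA   ((SCALE + BUCKETS / 2) / BUCKETS) /* DIV_ROUND_CLOSEST */

static unsigned int bucket_id(unsigned int clamp)
{
	unsigned int id = clamp / DELTA;

	/* Open-coded min(): keep the id within [0, BUCKETS - 1]. */
	return id < BUCKETS - 1 ? id : BUCKETS - 1;
}

int main(void)
{
	printf("bucket_id(1024) = %u\n", bucket_id(1024)); /* 19, no overflow */
	printf("bucket_id(969)  = %u\n", bucket_id(969));  /* 19, start of the last bucket */
	return 0;
}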
On Fri, 30 Apr 2021 at 11:40, Quentin Perret <[email protected]> wrote:
>
> On Friday 30 Apr 2021 at 10:49:50 (+0200), Vincent Guittot wrote:
> > 20 buckets is probably not the best example because of the rounding of
> > the division. With 16 buckets, each bucket should be exactly 64 steps
> > wide, except the last one, which gets 65 steps because of the value
> > 1024. With your change, buckets will be 65 wide and the last one will
> > be only 49 wide.
>
> OK, so what do you think of this?
Looks good to me
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index c5fb230dc604..dceeb5821797 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -920,14 +920,14 @@ static struct uclamp_se uclamp_default[UCLAMP_CNT];
> */
> DEFINE_STATIC_KEY_FALSE(sched_uclamp_used);
>
> -#define UCLAMP_BUCKET_DELTA (SCHED_CAPACITY_SCALE / UCLAMP_BUCKETS + 1)
> +#define UCLAMP_BUCKET_DELTA DIV_ROUND_CLOSEST(SCHED_CAPACITY_SCALE, UCLAMP_BUCKETS)
>
> #define for_each_clamp_id(clamp_id) \
> for ((clamp_id) = 0; (clamp_id) < UCLAMP_CNT; (clamp_id)++)
>
> static inline unsigned int uclamp_bucket_id(unsigned int clamp_value)
> {
> - return clamp_value / UCLAMP_BUCKET_DELTA;
> + return min(clamp_value / UCLAMP_BUCKET_DELTA, UCLAMP_BUCKETS - 1);
> }
>
> static inline unsigned int uclamp_none(enum uclamp_id clamp_id)
On 30/04/2021 14:03, Vincent Guittot wrote:
> On Fri, 30 Apr 2021 at 11:40, Quentin Perret <[email protected]> wrote:
>>
>> On Friday 30 Apr 2021 at 10:49:50 (+0200), Vincent Guittot wrote:
>>> 20 buckets is probably not the best example because of the rounding of
>>> the division. With 16 buckets, each bucket should be exactly 64 steps
>>> wide, except the last one, which gets 65 steps because of the value
>>> 1024. With your change, buckets will be 65 wide and the last one will
>>> be only 49 wide.
>>
>> OK, so what do you think of this?
>
> Looks good to me
+1
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index c5fb230dc604..dceeb5821797 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -920,14 +920,14 @@ static struct uclamp_se uclamp_default[UCLAMP_CNT];
>> */
>> DEFINE_STATIC_KEY_FALSE(sched_uclamp_used);
>>
>> -#define UCLAMP_BUCKET_DELTA (SCHED_CAPACITY_SCALE / UCLAMP_BUCKETS + 1)
>> +#define UCLAMP_BUCKET_DELTA DIV_ROUND_CLOSEST(SCHED_CAPACITY_SCALE, UCLAMP_BUCKETS)
>>
>> #define for_each_clamp_id(clamp_id) \
>> for ((clamp_id) = 0; (clamp_id) < UCLAMP_CNT; (clamp_id)++)
>>
>> static inline unsigned int uclamp_bucket_id(unsigned int clamp_value)
>> {
>> - return clamp_value / UCLAMP_BUCKET_DELTA;
>> + return min(clamp_value / UCLAMP_BUCKET_DELTA, UCLAMP_BUCKETS - 1);
IMHO, this asks for
min_t(unsigned int, clamp_value/UCLAMP_BUCKET_DELTA, UCLAMP_BUCKETS-1);
since the kernel's min() insists on both arguments having the same type.
>> }
>>
>> static inline unsigned int uclamp_none(enum uclamp_id clamp_id)
Looks like this will fix a lot of possible configs:
nbr buckets 1-4, 7-8, 10-12, 14-17, *20*, 26, 29-32 ...
We would still introduce larger last buckets, right?
Examples:

nbr_buckets   delta   last bucket size
20            51      +5  = 56
26            39      +11 = 50
29            35      +10 = 45
...
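These follow from delta = DIV_ROUND_CLOSEST(1024, n) with the last
bucket covering [(n - 1) * delta, 1024]; a throwaway sketch to reproduce
them:

#include <stdio.h>

static void last_bucket(int n)
{
	int delta = (1024 + n / 2) / n; /* DIV_ROUND_CLOSEST(1024, n) */
	int start = (n - 1) * delta;    /* first value of the last bucket */

	printf("n=%d: delta=%d, last bucket [%d, 1024]\n", n, delta, start);
}

int main(void)
{
	last_bucket(20); /* [969, 1024] */
	last_bucket(26); /* [975, 1024] */
	last_bucket(29); /* [980, 1024] */
	return 0;
}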
On Friday 30 Apr 2021 at 15:00:00 (+0200), Dietmar Eggemann wrote:
> On 30/04/2021 14:03, Vincent Guittot wrote:
> IMHO, this asks for
>
> min_t(unsigned int, clamp_value/UCLAMP_BUCKET_DELTA, UCLAMP_BUCKETS-1);
Yep, that's what I have locally.
> >> }
> >>
> >> static inline unsigned int uclamp_none(enum uclamp_id clamp_id)
>
> Looks like this will fix a lot of possible configs:
>
> nbr buckets 1-4, 7-8, 10-12, 14-17, *20*, 26, 29-32 ...
>
> We would still introduce larger last buckets, right?
Indeed. The only better alternative I could see was to 'spread' the
error across multiple buckets (e.g. make a few buckets a bit bigger
instead of having all of the error accumulated on the last one), but
I'm not sure it is worth the overhead.
Suggestions are welcome though.
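For what it's worth, one possible shape of that idea (purely
illustrative userspace code; this variant makes the first few buckets
one step larger so that the 1025 values split as evenly as possible):

#include <stdio.h>

#define SCALE   1025 /* number of clamp values: 0..1024 */
#define BUCKETS 20

static unsigned int bucket_id_spread(unsigned int clamp)
{
	unsigned int base = SCALE / BUCKETS;  /* 51 */
	unsigned int rem  = SCALE % BUCKETS;  /* 5: buckets that get base + 1 */
	unsigned int big  = rem * (base + 1); /* values covered by those buckets */

	return clamp < big ? clamp / (base + 1)
			   : rem + (clamp - big) / base;
}

int main(void)
{
	/* Ids stay in [0, 19] and no bucket is more than one value larger. */
	printf("bucket_id_spread(1024) = %u\n", bucket_id_spread(1024)); /* 19 */
	return 0;
}

The extra branch on the lookup path is the sort of overhead in question.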
> Examples:
>
> nbr_buckets   delta   last bucket size
>
> 20            51      +5  = 56
>
> 26            39      +11 = 50
>
> 29            35      +10 = 45
Yes, the error can become worse in a relative sense with a large number
of buckets, but again the max is 20, so we should be fine?
Thanks,
Quentin
On Fri, 30 Apr 2021 at 15:14, Quentin Perret <[email protected]> wrote:
>
> On Friday 30 Apr 2021 at 15:00:00 (+0200), Dietmar Eggemann wrote:
> > On 30/04/2021 14:03, Vincent Guittot wrote:
> > IMHO, this asks for
> >
> > min_t(unsigned int, clamp_value/UCLAMP_BUCKET_DELTA, UCLAMP_BUCKETS-1);
>
> Yep, that's what I have locally.
>
> > >> }
> > >>
> > >> static inline unsigned int uclamp_none(enum uclamp_id clamp_id)
> >
> > Looks like this will fix a lot of possible configs:
> >
> > nbr buckets 1-4, 7-8, 10-12, 14-17, *20*, 26, 29-32 ...
> >
> > We would still introduce larger last buckets, right?
>
> Indeed. The only better alternative I could see was to 'spread' the
> error across multiple buckets (e.g. make a few buckets a bit bigger
> instead of having all of the error accumulated on the last one), but
> I'm not sure it is worth the overhead.
I don't think it's worth the overhead.
>
> Suggestions are welcome though.
>
> > Examples:
> >
> > nbr_buckets   delta   last bucket size
> >
> > 20            51      +5  = 56
> >
> > 26            39      +11 = 50
> >
> > 29            35      +10 = 45
>
> Yes, the error can become worse in a relative sense with a large number
> of buckets, but again the max is 20, so we should be fine?
>
> Thanks,
> Quentin
On 30/04/2021 16:16, Vincent Guittot wrote:
> On Fri, 30 Apr 2021 at 15:14, Quentin Perret <[email protected]> wrote:
>>
>> On Friday 30 Apr 2021 at 15:00:00 (+0200), Dietmar Eggemann wrote:
>>> On 30/04/2021 14:03, Vincent Guittot wrote:
[...]
>>> Looks like this will fix a lot of possible configs:
>>>
>>> nbr buckets 1-4, 7-8, 10-12, 14-17, *20*, 26, 29-32 ...
>>>
>>> We would still introduce larger last buckets, right?
>>
>> Indeed. The only better alternative I could see was to 'spread' the
>> error across multiple buckets (e.g. make a few buckets a bit bigger
>> instead of having all of the error accumulated on the last one), but
>> I'm not sure it is worth the overhead.
>
> I don't think it's worth the overhead.
Me neither.
[...]