2021-07-01 17:19:46

by Vincent Guittot

[permalink] [raw]
Subject: [PATCH] sched/fair: Sync load_sum with load_avg after dequeue

Commit 9e077b52d86a ("sched/pelt: Check that *_avg are null when *_sum are")
reported some inconsistencies between *_avg and *_sum.

Commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
fixed some of them, but one remains when dequeuing load.

Sync the cfs_rq's load_sum with its load_avg after dequeuing the load of a
sched_entity.

Fixes: 9e077b52d86a ("sched/pelt: Check that *_avg are null when *_sum are")
Reported-by: Sachin Sant <[email protected]>
Signed-off-by: Vincent Guittot <[email protected]>
---

I have been able to trigger a WARN on my system even with the patch
listed above applied. This patch fixes it.
Sachin, could you test that it also fixes yours?
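
For illustration only, here is a small user-space model of how the two
independent subtractions can leave load_sum at zero while load_avg is not,
and how resyncing load_sum from load_avg avoids that. It is a sketch under
assumptions, not kernel code: struct avg_model, dequeue_old(), dequeue_new()
and run_model() are hypothetical names, the weight and load_sum values are
made up, and period_contrib is assumed to be 512 for every entity.

```c
#include <stdbool.h>
#include <stdint.h>

#define LOAD_AVG_MAX 47742	/* maximum PELT sum used by the kernel */

/* Hypothetical, reduced stand-in for the load part of struct sched_avg. */
struct avg_model {
	uint64_t load_sum;
	unsigned long load_avg;
};

/* Subtract without wrapping below zero, in the spirit of sub_positive(). */
static void sub_positive_u64(uint64_t *v, uint64_t d)
{
	*v = (*v > d) ? *v - d : 0;
}

static void sub_positive_ul(unsigned long *v, unsigned long d)
{
	*v = (*v > d) ? *v - d : 0;
}

/* Old behaviour: load_avg and load_sum are decremented independently. */
static void dequeue_old(struct avg_model *rq, unsigned long weight,
			const struct avg_model *se)
{
	sub_positive_ul(&rq->load_avg, se->load_avg);
	sub_positive_u64(&rq->load_sum, (uint64_t)weight * se->load_sum);
}

/* Patched behaviour: load_sum is rebuilt from the remaining load_avg. */
static void dequeue_new(struct avg_model *rq, const struct avg_model *se,
			uint32_t divider)
{
	sub_positive_ul(&rq->load_avg, se->load_avg);
	rq->load_sum = (uint64_t)rq->load_avg * divider;
}

/* Dequeue three identical entities and return the final runqueue state. */
static struct avg_model run_model(bool use_fix)
{
	const uint32_t divider = LOAD_AVG_MAX - 1024 + 512;	/* 47230 */
	const unsigned long weight = 1024;			/* made-up weight */
	struct avg_model se = { .load_sum = 69 };
	struct avg_model rq;
	int i;

	/* Each entity rounds down to load_avg == 1 on its own ... */
	se.load_avg = (unsigned long)(weight * se.load_sum / divider);
	/* ... but the runqueue average of the summed load rounds to 4. */
	rq.load_sum = 3 * (uint64_t)weight * se.load_sum;
	rq.load_avg = (unsigned long)(rq.load_sum / divider);

	for (i = 0; i < 3; i++) {
		if (use_fix)
			dequeue_new(&rq, &se, divider);
		else
			dequeue_old(&rq, weight, &se);
	}
	return rq;
}
```

In this model, run_model(false) ends with load_sum == 0 and load_avg == 1,
which is the kind of state the check added by 9e077b52d86a warns about, while
run_model(true) ends with both fields non-zero so they can decay together.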

kernel/sched/fair.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 11d22943753f..48fc7dfc2f66 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3037,8 +3037,9 @@ enqueue_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 static inline void
 dequeue_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
+	u32 divider = get_pelt_divider(&se->avg);
 	sub_positive(&cfs_rq->avg.load_avg, se->avg.load_avg);
-	sub_positive(&cfs_rq->avg.load_sum, se_weight(se) * se->avg.load_sum);
+	cfs_rq->avg.load_sum = cfs_rq->avg.load_avg * divider;
 }
 #else
 static inline void
--
2.17.1


2021-07-02 06:26:51

by Sachin Sant

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Sync load_sum with load_avg after dequeue



> On 01-Jul-2021, at 10:48 PM, Vincent Guittot <[email protected]> wrote:
>
> Commit 9e077b52d86a ("sched/pelt: Check that *_avg are null when *_sum are")
> reported some inconsistencies between *_avg and *_sum.
>
> Commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
> fixed some of them, but one remains when dequeuing load.
>
> Sync the cfs_rq's load_sum with its load_avg after dequeuing the load of a
> sched_entity.
>
> Fixes: 9e077b52d86a ("sched/pelt: Check that *_avg are null when *_sum are")
> Reported-by: Sachin Sant <[email protected]>
> Signed-off-by: Vincent Guittot <[email protected]>
> ---
>
> I have been able to trigger a WARN on my system even with the patch
> listed above. This patch fixes it.
> Sachin, could you test that it also fixes yours?
>

I ran various LTP stress tests, scheduler tests, and a kernel compile for about 5 hours.
I haven’t seen the warning during the testing.

Tested-by: Sachin Sant <[email protected]>

I have left the tests running and will let them run for a few more hours.

Thanks
-Sachin

> kernel/sched/fair.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 11d22943753f..48fc7dfc2f66 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3037,8 +3037,9 @@ enqueue_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
> static inline void
> dequeue_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
> {
> + u32 divider = get_pelt_divider(&se->avg);
> sub_positive(&cfs_rq->avg.load_avg, se->avg.load_avg);
> - sub_positive(&cfs_rq->avg.load_sum, se_weight(se) * se->avg.load_sum);
> + cfs_rq->avg.load_sum = cfs_rq->avg.load_avg * divider;
> }
> #else
> static inline void
> --
> 2.17.1
>

2021-07-02 08:09:18

by Vincent Guittot

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Sync load_sum with load_avg after dequeue

On Fri, 2 Jul 2021 at 08:16, Sachin Sant <[email protected]> wrote:
>
>
>
> > On 01-Jul-2021, at 10:48 PM, Vincent Guittot <[email protected]> wrote:
> >
> > Commit 9e077b52d86a ("sched/pelt: Check that *_avg are null when *_sum are")
> > reported some inconsistencies between *_avg and *_sum.
> >
> > Commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
> > fixed some of them, but one remains when dequeuing load.
> >
> > Sync the cfs_rq's load_sum with its load_avg after dequeuing the load of a
> > sched_entity.
> >
> > Fixes: 9e077b52d86a ("sched/pelt: Check that *_avg are null when *_sum are")
> > Reported-by: Sachin Sant <[email protected]>
> > Signed-off-by: Vincent Guittot <[email protected]>
> > ---
> >
> > I have been able to trigger a WARN on my system even with the patch
> > listed above. This patch fixes it.
> > Sachin, could you test that it also fixes yours?
> >
>
> I ran various LTP stress tests, scheduler tests, and a kernel compile for about 5 hours.
> I haven’t seen the warning during the testing.
>
> Tested-by: Sachin Sant <[email protected]>

Thanks

>
> I have left the tests running and will let them run for a few more hours.
>
> Thanks
> -Sachin
>
> > kernel/sched/fair.c | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 11d22943753f..48fc7dfc2f66 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -3037,8 +3037,9 @@ enqueue_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
> > static inline void
> > dequeue_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
> > {
> > + u32 divider = get_pelt_divider(&se->avg);
> > sub_positive(&cfs_rq->avg.load_avg, se->avg.load_avg);
> > - sub_positive(&cfs_rq->avg.load_sum, se_weight(se) * se->avg.load_sum);
> > + cfs_rq->avg.load_sum = cfs_rq->avg.load_avg * divider;
> > }
> > #else
> > static inline void
> > --
> > 2.17.1
> >
>

2021-07-02 09:04:17

by Odin Ugedal

[permalink] [raw]
Subject: [tip: sched/urgent] sched/fair: Sync load_sum with load_avg after dequeue

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID: ceb6ba45dc8074d2a1ec1117463dc94a20d4203d
Gitweb: https://git.kernel.org/tip/ceb6ba45dc8074d2a1ec1117463dc94a20d4203d
Author: Vincent Guittot <[email protected]>
AuthorDate: Thu, 01 Jul 2021 19:18:37 +02:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Fri, 02 Jul 2021 15:58:23 +02:00

sched/fair: Sync load_sum with load_avg after dequeue

Commit 9e077b52d86a ("sched/pelt: Check that *_avg are null when *_sum are")
reported some inconsistencies between *_avg and *_sum.

Commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
fixed some of them, but one remains when dequeuing load.

Sync the cfs_rq's load_sum with its load_avg after dequeuing the load of a
sched_entity.

Fixes: 9e077b52d86a ("sched/pelt: Check that *_avg are null when *_sum are")
Reported-by: Sachin Sant <[email protected]>
Signed-off-by: Vincent Guittot <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Odin Ugedal <[email protected]>
Tested-by: Sachin Sant <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
kernel/sched/fair.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 45edf61..1e263c9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3037,8 +3037,9 @@ enqueue_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 static inline void
 dequeue_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
+	u32 divider = get_pelt_divider(&se->avg);
 	sub_positive(&cfs_rq->avg.load_avg, se->avg.load_avg);
-	sub_positive(&cfs_rq->avg.load_sum, se_weight(se) * se->avg.load_sum);
+	cfs_rq->avg.load_sum = cfs_rq->avg.load_avg * divider;
 }
 #else
 static inline void
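
As a rough illustration of the property the merged change is meant to
preserve, the sketch below models the divider and the "sum is zero, so avg
must be zero" condition in user space. It is an assumption-laden model, not
the kernel implementation: struct pelt_model, pelt_divider_model(),
load_consistent() and sync_load_sum() are hypothetical names, and the divider
formula is assumed to be LOAD_AVG_MAX - 1024 + period_contrib.

```c
#include <stdbool.h>
#include <stdint.h>

#define LOAD_AVG_MAX 47742	/* maximum PELT sum used by the kernel */

/* Hypothetical, reduced stand-in for struct sched_avg. */
struct pelt_model {
	uint64_t load_sum;
	unsigned long load_avg;
	uint32_t period_contrib;	/* progress in the current 1024us period */
};

/* Modelled after get_pelt_divider(); the formula is an assumption here. */
static uint32_t pelt_divider_model(const struct pelt_model *a)
{
	return LOAD_AVG_MAX - 1024 + a->period_contrib;
}

/* Returns false only for the warned-about state: sum == 0 but avg != 0. */
static bool load_consistent(const struct pelt_model *a)
{
	return a->load_sum != 0 || a->load_avg == 0;
}

/* The sync step from the patch: rebuild load_sum from load_avg. */
static void sync_load_sum(struct pelt_model *a, uint32_t divider)
{
	a->load_sum = (uint64_t)a->load_avg * divider;
}
```

Because the modelled divider is never zero, a load_sum rebuilt this way can
only be zero when load_avg is zero, so load_consistent() holds after every
sync_load_sum() call.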