2020-05-13 14:00:52

by Vincent Guittot

Subject: [PATCH v3] sched/fair: fix unthrottle_cfs_rq for leaf_cfs_rq list

Although not exactly identical, unthrottle_cfs_rq() and enqueue_task_fair()
are quite close and follow the same sequence for enqueuing an entity in the
cfs hierarchy. Modify unthrottle_cfs_rq() to use the same pattern as
enqueue_task_fair(). This fixes a problem already faced with the latter and
adds an optimization in the last for_each_sched_entity loop.

Reported-by: Tao Zhou <[email protected]>
Reviewed-by: Phil Auld <[email protected]>
Signed-off-by: Vincent Guittot <[email protected]>
---

v3 changes:
- remove the unused enqueue variable

kernel/sched/fair.c | 42 ++++++++++++++++++++++++++++++------------
1 file changed, 30 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4e12ba882663..9a58874ef104 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4792,7 +4792,6 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 	struct rq *rq = rq_of(cfs_rq);
 	struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
 	struct sched_entity *se;
-	int enqueue = 1;
 	long task_delta, idle_task_delta;

 	se = cfs_rq->tg->se[cpu_of(rq)];
@@ -4816,26 +4815,44 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 	idle_task_delta = cfs_rq->idle_h_nr_running;
 	for_each_sched_entity(se) {
 		if (se->on_rq)
-			enqueue = 0;
+			break;
+		cfs_rq = cfs_rq_of(se);
+		enqueue_entity(cfs_rq, se, ENQUEUE_WAKEUP);

+		cfs_rq->h_nr_running += task_delta;
+		cfs_rq->idle_h_nr_running += idle_task_delta;
+
+		/* end evaluation on encountering a throttled cfs_rq */
+		if (cfs_rq_throttled(cfs_rq))
+			goto unthrottle_throttle;
+	}
+
+	for_each_sched_entity(se) {
 		cfs_rq = cfs_rq_of(se);
-		if (enqueue) {
-			enqueue_entity(cfs_rq, se, ENQUEUE_WAKEUP);
-		} else {
-			update_load_avg(cfs_rq, se, 0);
-			se_update_runnable(se);
-		}
+
+		update_load_avg(cfs_rq, se, UPDATE_TG);
+		se_update_runnable(se);

 		cfs_rq->h_nr_running += task_delta;
 		cfs_rq->idle_h_nr_running += idle_task_delta;

+
+		/* end evaluation on encountering a throttled cfs_rq */
 		if (cfs_rq_throttled(cfs_rq))
-			break;
+			goto unthrottle_throttle;
+
+		/*
+		 * One parent has been throttled and cfs_rq removed from the
+		 * list. Add it back to not break the leaf list.
+		 */
+		if (throttled_hierarchy(cfs_rq))
+			list_add_leaf_cfs_rq(cfs_rq);
 	}

-	if (!se)
-		add_nr_running(rq, task_delta);
+	/* At this point se is NULL and we are at root level*/
+	add_nr_running(rq, task_delta);

+unthrottle_throttle:
 	/*
 	 * The cfs_rq_throttled() breaks in the above iteration can result in
 	 * incomplete leaf list maintenance, resulting in triggering the
@@ -4844,7 +4861,8 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 	for_each_sched_entity(se) {
 		cfs_rq = cfs_rq_of(se);

-		list_add_leaf_cfs_rq(cfs_rq);
+		if (list_add_leaf_cfs_rq(cfs_rq))
+			break;
 	}

 	assert_list_leaf_cfs_rq(rq);
--
2.17.1
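
The control flow that the patch gives unthrottle_cfs_rq() can be sketched
outside the kernel as a toy model. This is a minimal sketch under stated
assumptions, not the kernel implementation: the toy_* types and
toy_unthrottle_walk() are hypothetical stand-ins, load tracking and the
leaf list are left out, and only the two-phase walk with its early exit at
a throttled level is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel types; every name here is hypothetical. */
struct toy_cfs_rq {
	long h_nr_running;	/* tasks queued at or below this level */
	bool throttled;		/* this level is itself throttled */
};

struct toy_se {
	bool on_rq;		/* already enqueued on its parent queue */
	struct toy_cfs_rq *rq;	/* queue this entity is (to be) enqueued on */
	struct toy_se *parent;	/* NULL above the root level */
};

/*
 * Walk from se towards the root, mirroring the two loops of the patch.
 * Returns true if the root was reached (the point where the patch calls
 * add_nr_running()), false if a throttled level stopped the walk (the
 * "goto unthrottle_throttle" case).
 */
static bool toy_unthrottle_walk(struct toy_se *se, long task_delta)
{
	/* Phase 1: enqueue entities until one is already on its queue. */
	for (; se; se = se->parent) {
		if (se->on_rq)
			break;
		se->on_rq = true;
		se->rq->h_nr_running += task_delta;
		if (se->rq->throttled)
			return false;
	}

	/* Phase 2: the remaining ancestors only need their counters updated. */
	for (; se; se = se->parent) {
		se->rq->h_nr_running += task_delta;
		if (se->rq->throttled)
			return false;
	}

	return true;	/* se is NULL here: root level reached */
}
```

Splitting the walk into two loops removes the need for a flag such as the
old "enqueue" variable: the position of se when the first loop stops already
tells the second loop where to continue.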


2020-05-13 20:52:50

by Benjamin Segall

Subject: Re: [PATCH v3] sched/fair: fix unthrottle_cfs_rq for leaf_cfs_rq list

Vincent Guittot <[email protected]> writes:

> Although not exactly identical, unthrottle_cfs_rq() and enqueue_task_fair()
> are quite close and follow the same sequence for enqueuing an entity in the
> cfs hierarchy. Modify unthrottle_cfs_rq() to use the same pattern as
> enqueue_task_fair(). This fixes a problem already faced with the latter and
> add an optimization in the last for_each_sched_entity loop.
>
> Reported-by Tao Zhou <[email protected]>
> Reviewed-by: Phil Auld <[email protected]>

Reviewed-by: Ben Segall <[email protected]>


Subject: [tip: sched/urgent] sched/fair: Fix unthrottle_cfs_rq() for leaf_cfs_rq list

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID: 39f23ce07b9355d05a64ae303ce20d1c4b92b957
Gitweb: https://git.kernel.org/tip/39f23ce07b9355d05a64ae303ce20d1c4b92b957
Author: Vincent Guittot <[email protected]>
AuthorDate: Wed, 13 May 2020 15:55:28 +02:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Tue, 19 May 2020 20:34:10 +02:00

sched/fair: Fix unthrottle_cfs_rq() for leaf_cfs_rq list

Although not exactly identical, unthrottle_cfs_rq() and enqueue_task_fair()
are quite close and follow the same sequence for enqueuing an entity in the
cfs hierarchy. Modify unthrottle_cfs_rq() to use the same pattern as
enqueue_task_fair(). This fixes a problem already faced with the latter and
adds an optimization in the last for_each_sched_entity loop.

Fixes: fe61468b2cb (sched/fair: Fix enqueue_task_fair warning)
Reported-by: Tao Zhou <[email protected]>
Signed-off-by: Vincent Guittot <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Phil Auld <[email protected]>
Reviewed-by: Ben Segall <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/sched/fair.c | 42 ++++++++++++++++++++++++++++++------------
1 file changed, 30 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c6d57c3..538ba5d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4774,7 +4774,6 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 	struct rq *rq = rq_of(cfs_rq);
 	struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
 	struct sched_entity *se;
-	int enqueue = 1;
 	long task_delta, idle_task_delta;

 	se = cfs_rq->tg->se[cpu_of(rq)];
@@ -4798,26 +4797,44 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 	idle_task_delta = cfs_rq->idle_h_nr_running;
 	for_each_sched_entity(se) {
 		if (se->on_rq)
-			enqueue = 0;
+			break;
+		cfs_rq = cfs_rq_of(se);
+		enqueue_entity(cfs_rq, se, ENQUEUE_WAKEUP);

+		cfs_rq->h_nr_running += task_delta;
+		cfs_rq->idle_h_nr_running += idle_task_delta;
+
+		/* end evaluation on encountering a throttled cfs_rq */
+		if (cfs_rq_throttled(cfs_rq))
+			goto unthrottle_throttle;
+	}
+
+	for_each_sched_entity(se) {
 		cfs_rq = cfs_rq_of(se);
-		if (enqueue) {
-			enqueue_entity(cfs_rq, se, ENQUEUE_WAKEUP);
-		} else {
-			update_load_avg(cfs_rq, se, 0);
-			se_update_runnable(se);
-		}
+
+		update_load_avg(cfs_rq, se, UPDATE_TG);
+		se_update_runnable(se);

 		cfs_rq->h_nr_running += task_delta;
 		cfs_rq->idle_h_nr_running += idle_task_delta;

+
+		/* end evaluation on encountering a throttled cfs_rq */
 		if (cfs_rq_throttled(cfs_rq))
-			break;
+			goto unthrottle_throttle;
+
+		/*
+		 * One parent has been throttled and cfs_rq removed from the
+		 * list. Add it back to not break the leaf list.
+		 */
+		if (throttled_hierarchy(cfs_rq))
+			list_add_leaf_cfs_rq(cfs_rq);
 	}

-	if (!se)
-		add_nr_running(rq, task_delta);
+	/* At this point se is NULL and we are at root level*/
+	add_nr_running(rq, task_delta);

+unthrottle_throttle:
 	/*
 	 * The cfs_rq_throttled() breaks in the above iteration can result in
 	 * incomplete leaf list maintenance, resulting in triggering the
@@ -4826,7 +4843,8 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 	for_each_sched_entity(se) {
 		cfs_rq = cfs_rq_of(se);

-		list_add_leaf_cfs_rq(cfs_rq);
+		if (list_add_leaf_cfs_rq(cfs_rq))
+			break;
 	}

 	assert_list_leaf_cfs_rq(rq);

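The optimization in the last for_each_sched_entity loop relies on
list_add_leaf_cfs_rq() returning a value that lets the caller stop early.
The sketch below illustrates that idea with a toy hierarchy. It assumes,
as a simplification, that a true return means "the branch is now connected
to the list, so no further ancestor needs visiting"; the toy_node type and
both functions are hypothetical, and the real function's additional list
bookkeeping is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cfs_rq that may sit on the leaf list. */
struct toy_node {
	bool on_list;			/* already linked into the leaf list */
	struct toy_node *parent;	/* NULL for the root */
};

/*
 * Toy version of the add operation: returns true when the walk towards
 * the root can stop, false when the parent still has to be visited.
 */
static bool toy_list_add_leaf(struct toy_node *n)
{
	if (n->on_list)
		return true;	/* assumed: ancestors were linked earlier */

	n->on_list = true;

	/* Linked next to a listed parent, or reached the root: done. */
	return !n->parent || n->parent->on_list;
}

/* Repair loop with the early exit; returns how many levels were visited. */
static int toy_repair_leaf_list(struct toy_node *n)
{
	int visited = 0;

	for (; n; n = n->parent) {
		visited++;
		if (toy_list_add_leaf(n))
			break;
	}

	return visited;
}
```

In a deep hierarchy whose upper levels are already on the list, the walk
stops after the first level that connects to them instead of always running
up to the root.
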
2020-11-18 22:59:46

by Guilherme Piccoli

Subject: Re: [PATCH v3] sched/fair: fix unthrottle_cfs_rq for leaf_cfs_rq list

Hi Vincent (and all CCed), I'm sorry to ping about such an "old" patch, but
we experienced a condition similar to the one this patch addresses. It was
on an older kernel (4.15.x), but when suggesting that the users move to an
updated 5.4.x kernel, we noticed that this patch is not there, although
similar ones are (like [0] and [1]).

So, I'd like to ask if there's any particular reason not to backport
this fix to stable kernels, especially the longterm 5.4. The main reason
behind the question is that the code is very complex for developers without
scheduler experience, and I'm afraid of suggesting such a backport to 5.4
and introducing hard-to-debug issues.

Let me know your thoughts Vincent (and all CCed), thanks in advance.
Cheers,


Guilherme


P.S. For those that deleted this thread from the email client, here's a
link:
https://lore.kernel.org/lkml/[email protected]/


[0]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fe61468b2cb

[1]
https://lore.kernel.org/lkml/[email protected]/
<- great thread BTW!

2020-11-18 23:36:28

by Tao Zhou

Subject: Re: [PATCH v3] sched/fair: fix unthrottle_cfs_rq for leaf_cfs_rq list

Hi Guilherme,

On Wed, Nov 18, 2020 at 07:56:38PM -0300, Guilherme G. Piccoli wrote:
> Hi Vincent (and all CCed), I'm sorry to ping about such "old" patch, but
> we experienced a similar condition to what this patch addresses; it's an
> older kernel (4.15.x) but when suggesting the users to move to an
> updated 5.4.x kernel, we noticed that this patch is not there, although
> similar ones are (like [0] and [1]).
>
> So, I'd like to ask if there's any particular reason to not backport
> this fix to stable kernels, specially the longterm 5.4. The main reason
> behind the question is that the code is very complex for non-experienced
> scheduler developers, and I'm afraid in suggesting such backport to 5.4
> and introduce complex-to-debug issues.
>
> Let me know your thoughts Vincent (and all CCed), thanks in advance.
> Cheers,
>
>
> Guilherme
>
>
> P.S. For those that deleted this thread from the email client, here's a
> link:
> https://lore.kernel.org/lkml/[email protected]/
>
>
> [0]
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fe61468b2cb
>
> [1]
> https://lore.kernel.org/lkml/[email protected]/
> <- great thread BTW!

Backporting this patch to 5.4 needs runnable_avg, but that had not been
introduced in 5.4 at that time (please correct me if I am wrong).


Thanks,
Tao

2020-11-18 23:53:39

by Tao Zhou

Subject: Re: [PATCH v3] sched/fair: fix unthrottle_cfs_rq for leaf_cfs_rq list

On Wed, Nov 18, 2020 at 07:56:38PM -0300, Guilherme G. Piccoli wrote:
> Hi Vincent (and all CCed), I'm sorry to ping about such "old" patch, but
> we experienced a similar condition to what this patch addresses; it's an
> older kernel (4.15.x) but when suggesting the users to move to an
> updated 5.4.x kernel, we noticed that this patch is not there, although
> similar ones are (like [0] and [1]).
>
> So, I'd like to ask if there's any particular reason to not backport
> this fix to stable kernels, specially the longterm 5.4. The main reason
> behind the question is that the code is very complex for non-experienced
> scheduler developers, and I'm afraid in suggesting such backport to 5.4
> and introduce complex-to-debug issues.
>
> Let me know your thoughts Vincent (and all CCed), thanks in advance.
> Cheers,
>
>
> Guilherme
>
>
> P.S. For those that deleted this thread from the email client, here's a
> link:
> https://lore.kernel.org/lkml/[email protected]/
>
>
> [0]
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fe61468b2cb
>
> [1]
> https://lore.kernel.org/lkml/[email protected]/
> <- great thread BTW!

"sched/fair: Fix unthrottle_cfs_rq() for leaf_cfs_rq list" failed to apply to
the 5.4-stable tree.

You could check the message above, but I do not have a link to it; I can't
find it by searching the LKML archive: https://lore.kernel.org/lkml/

BTW: '[email protected]' and '[email protected]' are both me.

Sorry for the confusion.

Thanks.

2020-11-19 00:42:12

by Tao Zhou

Subject: Re: [PATCH v3] sched/fair: fix unthrottle_cfs_rq for leaf_cfs_rq list

On Thu, Nov 19, 2020 at 07:50:15AM +0800, Tao Zhou wrote:
> On Wed, Nov 18, 2020 at 07:56:38PM -0300, Guilherme G. Piccoli wrote:
> > Hi Vincent (and all CCed), I'm sorry to ping about such "old" patch, but
> > we experienced a similar condition to what this patch addresses; it's an
> > older kernel (4.15.x) but when suggesting the users to move to an
> > updated 5.4.x kernel, we noticed that this patch is not there, although
> > similar ones are (like [0] and [1]).
> >
> > So, I'd like to ask if there's any particular reason to not backport
> > this fix to stable kernels, specially the longterm 5.4. The main reason
> > behind the question is that the code is very complex for non-experienced
> > scheduler developers, and I'm afraid in suggesting such backport to 5.4
> > and introduce complex-to-debug issues.
> >
> > Let me know your thoughts Vincent (and all CCed), thanks in advance.
> > Cheers,
> >
> >
> > Guilherme
> >
> >
> > P.S. For those that deleted this thread from the email client, here's a
> > link:
> > https://lore.kernel.org/lkml/[email protected]/
> >
> >
> > [0]
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fe61468b2cb
> >
> > [1]
> > https://lore.kernel.org/lkml/[email protected]/
> > <- great thread BTW!
>
> 'sched/fair: Fix unthrottle_cfs_rq() for leaf_cfs_rq list" failed to apply to
> 5.4-stable tree'
>
> You could check above. But I do not have the link about this. Can't search it
> on LKML web: https://lore.kernel.org/lkml/
>
> BTW: '[email protected]' and '[email protected]' all is myself.
>
> Sorry for the confusing..
>
> Thanks.

Sorry again, I forgot something. It is in the stable list archive.

Here it is:

https://lore.kernel.org/stable/[email protected]/

2020-11-19 08:38:58

by Vincent Guittot

Subject: Re: [PATCH v3] sched/fair: fix unthrottle_cfs_rq for leaf_cfs_rq list

On Thu, 19 Nov 2020 at 01:36, Tao Zhou <[email protected]> wrote:
>
> On Thu, Nov 19, 2020 at 07:50:15AM +0800, Tao Zhou wrote:
> > On Wed, Nov 18, 2020 at 07:56:38PM -0300, Guilherme G. Piccoli wrote:
> > > Hi Vincent (and all CCed), I'm sorry to ping about such "old" patch, but
> > > we experienced a similar condition to what this patch addresses; it's an
> > > older kernel (4.15.x) but when suggesting the users to move to an
> > > updated 5.4.x kernel, we noticed that this patch is not there, although
> > > similar ones are (like [0] and [1]).
> > >
> > > So, I'd like to ask if there's any particular reason to not backport
> > > this fix to stable kernels, specially the longterm 5.4. The main reason
> > > behind the question is that the code is very complex for non-experienced
> > > scheduler developers, and I'm afraid in suggesting such backport to 5.4
> > > and introduce complex-to-debug issues.
> > >
> > > Let me know your thoughts Vincent (and all CCed), thanks in advance.
> > > Cheers,
> > >
> > >
> > > Guilherme
> > >
> > >
> > > P.S. For those that deleted this thread from the email client, here's a
> > > link:
> > > https://lore.kernel.org/lkml/[email protected]/
> > >
> > >
> > > [0]
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fe61468b2cb
> > >
> > > [1]
> > > https://lore.kernel.org/lkml/[email protected]/
> > > <- great thread BTW!
> >
> > 'sched/fair: Fix unthrottle_cfs_rq() for leaf_cfs_rq list" failed to apply to
> > 5.4-stable tree'
> >
> > You could check above. But I do not have the link about this. Can't search it
> > on LKML web: https://lore.kernel.org/lkml/
> >
> > BTW: '[email protected]' and '[email protected]' all is myself.
> >
> > Sorry for the confusing..
> >
> > Thanks.
>
> Sorry again. I forget something. It is in the stable.
>
> Here it is:
>
> https://lore.kernel.org/stable/[email protected]/

I think it has never been applied to stable.
As you mentioned, the backport has been sent:
https://lore.kernel.org/stable/20200525172709.GB7427@vingu-book/

I received another email in September and pointed to the
backport: https://www.spinics.net/lists/stable/msg410445.html



2020-11-19 11:40:15

by Guilherme Piccoli

Subject: Re: [PATCH v3] sched/fair: fix unthrottle_cfs_rq for leaf_cfs_rq list



On 19/11/2020 05:36, Vincent Guittot wrote:
> On Thu, 19 Nov 2020 at 01:36, Tao Zhou <[email protected]> wrote:
>>
>> On Thu, Nov 19, 2020 at 07:50:15AM +0800, Tao Zhou wrote:
>>> On Wed, Nov 18, 2020 at 07:56:38PM -0300, Guilherme G. Piccoli wrote:
>>>> Hi Vincent (and all CCed), I'm sorry to ping about such "old" patch, but
>>>> we experienced a similar condition to what this patch addresses; it's an
>>>> older kernel (4.15.x) but when suggesting the users to move to an
>>>> updated 5.4.x kernel, we noticed that this patch is not there, although
>>>> similar ones are (like [0] and [1]).
>>>>
>>>> So, I'd like to ask if there's any particular reason to not backport
>>>> this fix to stable kernels, specially the longterm 5.4. The main reason
>>>> behind the question is that the code is very complex for non-experienced
>>>> scheduler developers, and I'm afraid in suggesting such backport to 5.4
>>>> and introduce complex-to-debug issues.
>>>>
>>>> Let me know your thoughts Vincent (and all CCed), thanks in advance.
>>>> Cheers,
>>>>
>>>>
>>>> Guilherme
>>>>
>>>>
>>>> P.S. For those that deleted this thread from the email client, here's a
>>>> link:
>>>> https://lore.kernel.org/lkml/[email protected]/
>>>>
>>>>
>>>> [0]
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fe61468b2cb
>>>>
>>>> [1]
>>>> https://lore.kernel.org/lkml/[email protected]/
>>>> <- great thread BTW!
>>>
>>> 'sched/fair: Fix unthrottle_cfs_rq() for leaf_cfs_rq list" failed to apply to
>>> 5.4-stable tree'
>>>
>>> You could check above. But I do not have the link about this. Can't search it
>>> on LKML web: https://lore.kernel.org/lkml/
>>>
>>> BTW: '[email protected]' and '[email protected]' all is myself.
>>>
>>> Sorry for the confusing..
>>>
>>> Thanks.
>>
>> Sorry again. I forget something. It is in the stable.
>>
>> Here it is:
>>
>> https://lore.kernel.org/stable/[email protected]/
>
> I think it has never been applied to stable.
> As you mentioned, the backport has been sent :
> https://lore.kernel.org/stable/20200525172709.GB7427@vingu-book/
>
> I received another emailed in September and pointed out to the
> backport : https://www.spinics.net/lists/stable/msg410445.html
>
>
>>

Thanks a lot, Tao and Vincent! Nice to know that you already worked on the
backport; it gives much more confidence when the author does that, heheh.

So, this should go to stable 5.4.y, but not 4.19.y IIUC?
Cheers,


Guilherme

2020-11-19 13:28:04

by Vincent Guittot

Subject: Re: [PATCH v3] sched/fair: fix unthrottle_cfs_rq for leaf_cfs_rq list

On Thu, 19 Nov 2020 at 12:36, Guilherme G. Piccoli
<[email protected]> wrote:
>
>
>
> On 19/11/2020 05:36, Vincent Guittot wrote:
> > On Thu, 19 Nov 2020 at 01:36, Tao Zhou <[email protected]> wrote:
> >>
> >> On Thu, Nov 19, 2020 at 07:50:15AM +0800, Tao Zhou wrote:
> >>> On Wed, Nov 18, 2020 at 07:56:38PM -0300, Guilherme G. Piccoli wrote:
> >>>> Hi Vincent (and all CCed), I'm sorry to ping about such "old" patch, but
> >>>> we experienced a similar condition to what this patch addresses; it's an
> >>>> older kernel (4.15.x) but when suggesting the users to move to an
> >>>> updated 5.4.x kernel, we noticed that this patch is not there, although
> >>>> similar ones are (like [0] and [1]).
> >>>>
> >>>> So, I'd like to ask if there's any particular reason to not backport
> >>>> this fix to stable kernels, specially the longterm 5.4. The main reason
> >>>> behind the question is that the code is very complex for non-experienced
> >>>> scheduler developers, and I'm afraid in suggesting such backport to 5.4
> >>>> and introduce complex-to-debug issues.
> >>>>
> >>>> Let me know your thoughts Vincent (and all CCed), thanks in advance.
> >>>> Cheers,
> >>>>
> >>>>
> >>>> Guilherme
> >>>>
> >>>>
> >>>> P.S. For those that deleted this thread from the email client, here's a
> >>>> link:
> >>>> https://lore.kernel.org/lkml/[email protected]/
> >>>>
> >>>>
> >>>> [0]
> >>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fe61468b2cb
> >>>>
> >>>> [1]
> >>>> https://lore.kernel.org/lkml/[email protected]/
> >>>> <- great thread BTW!
> >>>
> >>> 'sched/fair: Fix unthrottle_cfs_rq() for leaf_cfs_rq list" failed to apply to
> >>> 5.4-stable tree'
> >>>
> >>> You could check above. But I do not have the link about this. Can't search it
> >>> on LKML web: https://lore.kernel.org/lkml/
> >>>
> >>> BTW: '[email protected]' and '[email protected]' all is myself.
> >>>
> >>> Sorry for the confusing..
> >>>
> >>> Thanks.
> >>
> >> Sorry again. I forget something. It is in the stable.
> >>
> >> Here it is:
> >>
> >> https://lore.kernel.org/stable/[email protected]/
> >
> > I think it has never been applied to stable.
> > As you mentioned, the backport has been sent :
> > https://lore.kernel.org/stable/20200525172709.GB7427@vingu-book/
> >
> > I received another emailed in September and pointed out to the
> > backport : https://www.spinics.net/lists/stable/msg410445.html
> >
> >
> >>
>
> Thanks a lot Tao and Vincent! Nice to know that you already worked the
> backport, gives much more confidence when the author does that heheh
>
> So, this should go to stable 5.4.y, but not 4.19.y IIUC?

Yeah, they should be backported as far back as v5.1, but not earlier.

Regards,
Vincent

> Cheers,
>
>
> Guilherme

2020-11-19 14:11:55

by Guilherme Piccoli

Subject: Re: [PATCH v3] sched/fair: fix unthrottle_cfs_rq for leaf_cfs_rq list

Thank you Vincent, much appreciated! I'll respond in the patch thread;
hopefully we can get that included in 5.4.y.

Cheers,


Guilherme

2021-06-24 10:30:42

by Po-Hsu Lin

Subject: Re: [PATCH v3] sched/fair: fix unthrottle_cfs_rq for leaf_cfs_rq list

Hello Vincent,

Sorry to resurrect this thread again.
I was trying to backport this patch and the corresponding fixes to our
Ubuntu 4.15 kernel [1] to fix an issue reported by the LTP cfs_bandwidth01
test [2], and my colleague Guilherme told me there was once a discussion
about backporting it in this thread.

You mentioned here that this should not be backported to earlier stable
kernels. I am curious whether there is any specific reason for that. Too
risky, maybe?
Thanks!
PHLin

[1] https://lists.ubuntu.com/archives/kernel-team/2021-June/121571.html
[2] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/sched/cfs-scheduler/cfs_bandwidth01.c


2021-06-24 12:34:09

by Vincent Guittot

Subject: Re: [PATCH v3] sched/fair: fix unthrottle_cfs_rq for leaf_cfs_rq list

On Thu, 24 Jun 2021 at 12:29, Po-Hsu Lin <[email protected]> wrote:
>
> Hello Vincent,
>
> sorry to resurrect this thread again,
> I was trying to backport this patch and corresponding fixes to our
> Ubuntu 4.15 kernel [1] to fix an issue report by LTP cfs_bandwidth01
> test[2], my colleague Guilherme told me there once a discussion about
> backporting this on this thread.
>
> You mentioned here this should not be backported to earlier stable
> kernel, I am curious if there is any specific reason of it? Too risky
> maybe?

Yes, IIRC there are some dependencies on other patchsets that make
the backport complex and not straightforward.


> Thanks!
> PHLin
>
> [1] https://lists.ubuntu.com/archives/kernel-team/2021-June/121571.html
> [2] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/sched/cfs-scheduler/cfs_bandwidth01.c