Subject: RFC: documentation of the autogroup feature

Hello Mike and others,

The autogroup feature that you added in 2.6.38 remains poorly
documented, so I took a stab at adding some text to the sched(7)
manual page. There are still a few pieces to be fixed, and you
may also see some other pieces that should be added. Could I
ask you to take a look at the text below?

Cheers,

Michael

The autogroup feature
Since Linux 2.6.38, the kernel provides a feature known as
autogrouping to improve interactive desktop performance in the
face of multiprocess CPU-intensive workloads such as building
the Linux kernel with large numbers of parallel build processes
(i.e., the make(1) -j flag).

This feature operates in conjunction with the CFS scheduler and
requires a kernel that is configured with CONFIG_SCHED_AUTO‐
GROUP. On a running system, this feature is enabled or dis‐
abled via the file /proc/sys/kernel/sched_autogroup_enabled; a
value of 0 disables the feature, while a value of 1 enables it.
The default value in this file is 1, unless the kernel was
booted with the noautogroup parameter.
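
As a rough illustration of the paragraph above, the current setting could be inspected from a shell along the following lines. This is only a sketch: describe_autogroup_flag is a hypothetical helper name, and the file is expected to be absent on kernels built without CONFIG_SCHED_AUTOGROUP.

```shell
# Hypothetical helper: turn the flag value into a word.
describe_autogroup_flag() {
    case "$1" in
        0) echo "disabled" ;;
        1) echo "enabled" ;;
        *) echo "unknown" ;;
    esac
}

flag_file=/proc/sys/kernel/sched_autogroup_enabled
if [ -r "$flag_file" ]; then
    # Read the flag only when the kernel exposes it.
    echo "autogrouping is $(describe_autogroup_flag "$(cat "$flag_file")")"
else
    echo "autogrouping is unavailable (no $flag_file)"
fi
```

Writing 0 or 1 to the same file (which normally requires privilege) would then disable or enable the feature, as described above.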

When autogrouping is enabled, processes are automatically
placed into "task groups" for the purposes of scheduling. In
the current implementation, a new task group is created when a
new session is created via setsid(2), as happens, for example,
when a new terminal window is created. A task group is auto‐
matically destroyed when the last process in the group termi‐
nates.



┌─────────────────────────────────────────────────────┐
│FIXME │
├─────────────────────────────────────────────────────┤
│The following is a little vague. Does it need to be │
│made more precise? │
└─────────────────────────────────────────────────────┘
The CFS scheduler employs an algorithm that distributes the CPU
across task groups. As a result of this algorithm, the pro‐
cesses in task groups that contain multiple CPU-intensive pro‐
cesses are in effect disfavored by the scheduler.

A process's autogroup (task group) membership can be viewed via
the file /proc/[pid]/autogroup:

$ cat /proc/1/autogroup
/autogroup-1 nice 0

This file can also be used to modify the CPU bandwidth allo‐
cated to a task group. This is done by writing a number in the
"nice" range to the file to set the task group's nice value.
The allowed range is from +19 (low priority) to -20 (high pri‐
ority). Note that all values in this range cause a task group
to be further disfavored by the scheduler, with -20 resulting
in the scheduler mildly disfavoring the task group and +19
greatly disfavoring it.
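
The range check described above could be applied in a script before writing to the file. A minimal sketch, in which valid_autogroup_nice is a hypothetical helper name and a leading "+" is deliberately not accepted:

```shell
# Hypothetical helper: succeed if $1 is an integer in the range -20..19.
valid_autogroup_nice() {
    case "${1#-}" in
        ''|*[!0-9]*) return 1 ;;    # empty, or not purely digits
    esac
    [ "$1" -ge -20 ] && [ "$1" -le 19 ]
}

# Example of use (left commented out because it changes scheduling):
#   valid_autogroup_nice 10 && echo 10 > /proc/self/autogroup
```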


┌─────────────────────────────────────────────────────┐
│FIXME │
├─────────────────────────────────────────────────────┤
│Regarding the previous paragraph... My tests indi‐ │
│cate that writing *any* value to the autogroup file │
│causes the task group to get a lower priority. This │
│somewhat surprised me, since I assumed (based on the │
│parallel with the process nice(2) value) that nega‐ │
│tive values might boost the task group's priority │
│above a task group whose autogroup file had not been │
│touched. │
│ │
│Is this the expected behavior? I presume it is... │
│ │
│But then there's a small surprise in the interface. │
│Suppose that the value 0 is written to the autogroup │
│file, then this results in the task group being sig‐ │
│nificantly disfavored. But, the nice value *shown* │
│in the autogroup file will be the same as if the │
│file had not been modified. So, the user has no way │
│of discovering the difference. That seems odd. Am I │
│missing something? │
└─────────────────────────────────────────────────────┘



┌─────────────────────────────────────────────────────┐
│FIXME │
├─────────────────────────────────────────────────────┤
│Is the following correct? Does the statement need to │
│be more precise? (E.g., in precisely which circum‐ │
│stances does the use of cgroups override autogroup?) │
└─────────────────────────────────────────────────────┘
The use of the cgroups(7) CPU controller overrides the effect
of autogrouping.


┌─────────────────────────────────────────────────────┐
│FIXME │
├─────────────────────────────────────────────────────┤
│What needs to be said about autogroup and real-time │
│tasks? │
└─────────────────────────────────────────────────────┘


--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


2016-11-23 10:35:04

by Mike Galbraith

Subject: [patch] sched/autogroup: Fix 64bit kernel nice adjustment

On Tue, 2016-11-22 at 16:59 +0100, Michael Kerrisk (man-pages) wrote:

> ┌─────────────────────────────────────────────────────┐
> │FIXME │
> ├─────────────────────────────────────────────────────┤
> │Regarding the previous paragraph... My tests indi‐ │
> │cate that writing *any* value to the autogroup file │
> │causes the task group to get a lower priority. This │

Because autogroup didn't call the then meaningless scale_load()...


Autogroup nice level adjustment has been broken ever since load
resolution was increased for 64bit kernels. Use scale_load() to
scale group weight.

Signed-off-by: Mike Galbraith <[email protected]>
Reported-by: Michael Kerrisk <[email protected]>
Cc: [email protected]
---
kernel/sched/auto_group.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

--- a/kernel/sched/auto_group.c
+++ b/kernel/sched/auto_group.c
@@ -192,6 +192,7 @@ int proc_sched_autogroup_set_nice(struct
 {
 	static unsigned long next = INITIAL_JIFFIES;
 	struct autogroup *ag;
+	unsigned long shares;
 	int err;

 	if (nice < MIN_NICE || nice > MAX_NICE)
@@ -210,9 +211,10 @@ int proc_sched_autogroup_set_nice(struct

 	next = HZ / 10 + jiffies;
 	ag = autogroup_task_get(p);
+	shares = scale_load(sched_prio_to_weight[nice + 20]);

 	down_write(&ag->lock);
-	err = sched_group_set_shares(ag->tg, sched_prio_to_weight[nice + 20]);
+	err = sched_group_set_shares(ag->tg, shares);
 	if (!err)
 		ag->nice = nice;
 	up_write(&ag->lock);
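
For readers who want to see why the missing scale_load() mattered, here is a back-of-the-envelope sketch. It rests on assumptions that are not stated in this thread: that on 64-bit kernels with increased load resolution scale_load() shifts a weight left by 10 bits, that an untouched group has shares of scale_load(1024), and that sched_prio_to_weight[] holds 88761, 1024 and 15 for nice -20, 0 and +19. The function names are illustrative only.

```shell
# Assumed constants, not taken from the thread.
SHIFT=10              # assumed fixed-point shift applied by scale_load()
DEFAULT_WEIGHT=1024   # assumed weight for nice 0

# Model of scale_load() under the assumption above.
scale_load() { echo $(( $1 << SHIFT )); }

# How many times smaller than an untouched group's shares is $1?
times_smaller() {
    default_shares=$(scale_load "$DEFAULT_WEIGHT")
    echo $(( default_shares / $1 ))
}

# Before the fix the raw table value was used as the group's shares.
echo "unscaled nice -20: $(times_smaller 88761)x smaller"
echo "unscaled nice 0:   $(times_smaller 1024)x smaller"
echo "unscaled nice +19: $(times_smaller 15)x smaller"
# After the fix the table value is scaled first.
echo "scaled nice 0:     $(times_smaller "$(scale_load 1024)")x smaller"
```

Under these assumptions every unscaled value lands below the default shares, which would be consistent with the report that any write lowered the group's priority, with -20 doing so mildly and +19 greatly.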

2016-11-23 11:40:07

by Mike Galbraith

Subject: Re: RFC: documentation of the autogroup feature

On Tue, 2016-11-22 at 16:59 +0100, Michael Kerrisk (man-pages) wrote:

> ┌─────────────────────────────────────────────────────┐
> │FIXME │
> ├─────────────────────────────────────────────────────┤
> │The following is a little vague. Does it need to be │
> │made more precise? │
> └─────────────────────────────────────────────────────┘
> The CFS scheduler employs an algorithm that distributes the CPU
> across task groups. As a result of this algorithm, the pro‐
> cesses in task groups that contain multiple CPU-intensive pro‐
> cesses are in effect disfavored by the scheduler.

Mmmm, they're actually equalized (modulo smp fairness goop), but I see
what you mean.

> A process's autogroup (task group) membership can be viewed via
> the file /proc/[pid]/autogroup:
>
> $ cat /proc/1/autogroup
> /autogroup-1 nice 0
>
> This file can also be used to modify the CPU bandwidth allo‐
> cated to a task group. This is done by writing a number in the
> "nice" range to the file to set the task group's nice value.
> The allowed range is from +19 (low priority) to -20 (high pri‐
> ority). Note that all values in this range cause a task group
> to be further disfavored by the scheduler, with -20 resulting
> in the scheduler mildly disfavoring the task group and +19
> greatly disfavoring it.

Group nice levels work exactly the same as task nice levels, i.e.,
negative nice increases share, positive nice decreases it relative to
the default nice 0.
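
To make the relative shares concrete, the nice-to-weight mapping can be sketched in a few lines of shell. The table below is the sched_prio_to_weight[] array as recalled from kernel/sched/core.c, so treat the values as an assumption to be checked against your kernel source; the helper names are hypothetical, and the model considers only two competing groups on one CPU.

```shell
# Assumed copy of sched_prio_to_weight[]; the first entry is nice -20.
WEIGHTS="88761 71755 56483 46273 36291 29154 23254 18705 14949 11916
9548 7620 6100 4904 3906 3121 2501 1991 1586 1277
1024 820 655 526 423 335 272 215 172 137
110 87 70 56 45 36 29 23 18 15"

# Print the weight for a nice value in the range -20..19.
weight_for_nice() {
    i=-20
    for w in $WEIGHTS; do
        if [ "$i" -eq "$1" ]; then
            echo "$w"
            return 0
        fi
        i=$(( i + 1 ))
    done
    return 1    # nice value out of range
}

# Percentage of CPU (rounded down) that a group at nice $1 gets
# when it competes only with a group at nice $2.
share_percent() {
    wa=$(weight_for_nice "$1") || return 1
    wb=$(weight_for_nice "$2") || return 1
    echo $(( 100 * wa / (wa + wb) ))
}

echo "nice 0 vs nice 0:   $(share_percent 0 0)%"
echo "nice -20 vs nice 0: $(share_percent -20 0)%"
echo "nice 19 vs nice 0:  $(share_percent 19 0)%"
```

In this simplified model a group at nice -20 takes nearly all of the CPU from a nice 0 competitor, while a group at nice +19 is left with about one percent.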

> ┌─────────────────────────────────────────────────────┐
> │FIXME │
> ├─────────────────────────────────────────────────────┤
> │Regarding the previous paragraph... My tests indi‐ │
> │cate that writing *any* value to the autogroup file │
> │causes the task group to get a lower priority.

(patchlet.. I'd prefer to whack the knob, but like the on/off switch,
it may be in use, so I guess we're stuck with it)

> ┌─────────────────────────────────────────────────────┐
> │FIXME │
> ├─────────────────────────────────────────────────────┤
> │Is the following correct? Does the statement need to │
> │be more precise? (E.g., in precisely which circum‐ │
> │stances does the use of cgroups override autogroup?) │
> └─────────────────────────────────────────────────────┘
> The use of the cgroups(7) CPU controller overrides the effect
> of autogrouping.

Correct, autogroup defers to cgroups. Perhaps mention that moving a
task back to the root task group will result in the autogroup again
taking effect.

> ┌─────────────────────────────────────────────────────┐
> │FIXME │
> ├─────────────────────────────────────────────────────┤
> │What needs to be said about autogroup and real-time │
> │tasks? │
> └─────────────────────────────────────────────────────┘

That it does not group realtime tasks, they are auto-deflected to the
root task group.

-Mike

Subject: Re: [patch] sched/autogroup: Fix 64bit kernel nice adjustment

Hello Mike,

On 11/23/2016 11:33 AM, Mike Galbraith wrote:
> On Tue, 2016-11-22 at 16:59 +0100, Michael Kerrisk (man-pages) wrote:
>
>> ┌─────────────────────────────────────────────────────┐
>> │FIXME │
>> ├─────────────────────────────────────────────────────┤
>> │Regarding the previous paragraph... My tests indi‐ │
>> │cate that writing *any* value to the autogroup file │
>> │causes the task group to get a lower priority. This │
>
> Because autogroup didn't call the then meaningless scale_load()...

So, does that mean that this buglet kicked in starting (only) in
Linux 4.7 with commit 2159197d66770ec01f75c93fb11dc66df81fd45b?

> Autogroup nice level adjustment has been broken ever since load
> resolution was increased for 64bit kernels. Use scale_load() to
> scale group weight.

Tested-by: Michael Kerrisk <[email protected]>

Applied and tested against 4.9-rc6 on an Intel u7 (4 cores).
Test setup:

Terminal window 1: running 40 CPU burner jobs
Terminal window 2: running 40 CPU burner jobs
Terminal window 3: running 1 CPU burner job

Demonstrated that:
* Writing "0" to the autogroup file for TW1 now causes no change
  to the rate at which the processes on the terminal consume CPU.
* Writing -20 to the autogroup file for TW1 caused those processes
  to get the lion's share of CPU while TW2 and TW3 get a tiny amount.
* Writing -20 to the autogroup files for TW1 and TW3 allowed the
  process on TW3 to get as much CPU as it was getting when
  the autogroup nice values for both terminals were 0.

Thanks,

Michael

> Signed-off-by: Mike Galbraith <[email protected]>
> Reported-by: Michael Kerrisk <[email protected]>
> Cc: [email protected]
> ---
> kernel/sched/auto_group.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> --- a/kernel/sched/auto_group.c
> +++ b/kernel/sched/auto_group.c
> @@ -192,6 +192,7 @@ int proc_sched_autogroup_set_nice(struct
>  {
>  	static unsigned long next = INITIAL_JIFFIES;
>  	struct autogroup *ag;
> +	unsigned long shares;
>  	int err;
>
>  	if (nice < MIN_NICE || nice > MAX_NICE)
> @@ -210,9 +211,10 @@ int proc_sched_autogroup_set_nice(struct
>
>  	next = HZ / 10 + jiffies;
>  	ag = autogroup_task_get(p);
> +	shares = scale_load(sched_prio_to_weight[nice + 20]);
>
>  	down_write(&ag->lock);
> -	err = sched_group_set_shares(ag->tg, sched_prio_to_weight[nice + 20]);
> +	err = sched_group_set_shares(ag->tg, shares);
>  	if (!err)
>  		ag->nice = nice;
>  	up_write(&ag->lock);
>



Subject: Re: RFC: documentation of the autogroup feature

Hi Mike,

First off, I better say that I'm not at all intimate with the details
of the scheduler, so bear with me...

On 11/23/2016 12:39 PM, Mike Galbraith wrote:
> On Tue, 2016-11-22 at 16:59 +0100, Michael Kerrisk (man-pages) wrote:
>
>> ┌─────────────────────────────────────────────────────┐
>> │FIXME │
>> ├─────────────────────────────────────────────────────┤
>> │The following is a little vague. Does it need to be │
>> │made more precise? │
>> └─────────────────────────────────────────────────────┘
>> The CFS scheduler employs an algorithm that distributes the CPU
>> across task groups. As a result of this algorithm, the pro‐
>> cesses in task groups that contain multiple CPU-intensive pro‐
>> cesses are in effect disfavored by the scheduler.
>
> Mmmm, they're actually equalized (modulo smp fairness goop), but I see
> what you mean.

I couldn't quite grok that sentence. My problem is resolving "they".
Do you mean: "the CPU scheduler equalizes the distribution of
CPU cycles across task groups"?

>
>> A process's autogroup (task group) membership can be viewed via
>> the file /proc/[pid]/autogroup:
>>
>> $ cat /proc/1/autogroup
>> /autogroup-1 nice 0
>>
>> This file can also be used to modify the CPU bandwidth allo‐
>> cated to a task group. This is done by writing a number in the
>> "nice" range to the file to set the task group's nice value.
>> The allowed range is from +19 (low priority) to -20 (high pri‐
>> ority). Note that all values in this range cause a task group
>> to be further disfavored by the scheduler, with -20 resulting
>> in the scheduler mildly disfavoring the task group and +19
>> greatly disfavoring it.
>
> Group nice levels work exactly the same as task nice levels, i.e.,
> negative nice increases share, positive nice decreases it relative to
> the default nice 0.

Yes, got it now.

>> ┌─────────────────────────────────────────────────────┐
>> │FIXME │
>> ├─────────────────────────────────────────────────────┤
>> │Regarding the previous paragraph... My tests indi‐ │
>> │cate that writing *any* value to the autogroup file │
>> │causes the task group to get a lower priority.
>
> (patchlet..

Writing documentation finds bugs. Who knew? ;-)

> I'd prefer to whack the knob, but like the on/off switch,
> it may be in use, so I guess we're stuck with it)
>
>> ┌─────────────────────────────────────────────────────┐
>> │FIXME │
>> ├─────────────────────────────────────────────────────┤
>> │Is the following correct? Does the statement need to │
>> │be more precise? (E.g., in precisely which circum‐ │
>> │stances does the use of cgroups override autogroup?) │
>> └─────────────────────────────────────────────────────┘
>> The use of the cgroups(7) CPU controller overrides the effect
>> of autogrouping.
>
> Correct, autogroup defers to cgroups. Perhaps mention that moving a
> task back to the root task group will result in the autogroup again
> taking effect.

In what circumstances does a process get moved back to the root
task group?

Actually, can you define for me what the root task group is, and
why it exists? That may be worth some words in this man page.

>> ┌─────────────────────────────────────────────────────┐
>> │FIXME │
>> ├─────────────────────────────────────────────────────┤
>> │What needs to be said about autogroup and real-time │
>> │tasks? │
>> └─────────────────────────────────────────────────────┘
>
> That it does not group realtime tasks, they are auto-deflected to the
> root task group.

Okay. Thanks.

Cheers,

Michael



2016-11-23 14:12:51

by Mike Galbraith

Subject: Re: [patch] sched/autogroup: Fix 64bit kernel nice adjustment

On Wed, 2016-11-23 at 14:47 +0100, Michael Kerrisk (man-pages) wrote:
> Hello Mike,
>
> On 11/23/2016 11:33 AM, Mike Galbraith wrote:
> > On Tue, 2016-11-22 at 16:59 +0100, Michael Kerrisk (man-pages)
> > wrote:
> >
> > > ┌─────────────────────────────────────────────────────┐
> > > │FIXME │
> > > ├─────────────────────────────────────────────────────┤
> > > │Regarding the previous paragraph... My tests indi‐ │
> > > │cate that writing *any* value to the autogroup file │
> > > │causes the task group to get a lower priority. This │
> >
> > Because autogroup didn't call the then meaningless scale_load()...
>
> So, does that mean that this buglet kicked in starting (only) in
> Linux 4.7 with commit 2159197d66770ec01f75c93fb11dc66df81fd45b?

Yeah, that gave it teeth.

-Mike

Subject: Re: [patch] sched/autogroup: Fix 64bit kernel nice adjustment

On 11/23/2016 03:12 PM, Mike Galbraith wrote:
> On Wed, 2016-11-23 at 14:47 +0100, Michael Kerrisk (man-pages) wrote:
>> Hello Mike,
>>
>> On 11/23/2016 11:33 AM, Mike Galbraith wrote:
>>> On Tue, 2016-11-22 at 16:59 +0100, Michael Kerrisk (man-pages)
>>> wrote:
>>>
>>>> ┌─────────────────────────────────────────────────────┐
>>>> │FIXME │
>>>> ├─────────────────────────────────────────────────────┤
>>>> │Regarding the previous paragraph... My tests indi‐ │
>>>> │cate that writing *any* value to the autogroup file │
>>>> │causes the task group to get a lower priority. This │
>>>
>>> Because autogroup didn't call the then meaningless scale_load()...
>>
>> So, does that mean that this buglet kicked in starting (only) in
>> Linux 4.7 with commit 2159197d66770ec01f75c93fb11dc66df81fd45b?
>
> Yeah, that gave it teeth.

Thanks for the confirmation. Are you aiming to see the fix
merged for 4.9, or will this wait for 4.10?

Cheers,

Michael




2016-11-23 15:34:15

by Mike Galbraith

Subject: Re: RFC: documentation of the autogroup feature

On Wed, 2016-11-23 at 14:54 +0100, Michael Kerrisk (man-pages) wrote:
> Hi Mike,
>
> First off, I better say that I'm not at all intimate with the details
> of the scheduler, so bear with me...
>
> On 11/23/2016 12:39 PM, Mike Galbraith wrote:
> > On Tue, 2016-11-22 at 16:59 +0100, Michael Kerrisk (man-pages) wrote:
> >
> > > ┌─────────────────────────────────────────────────────┐
> > > │FIXME │
> > > ├─────────────────────────────────────────────────────┤
> > > │The following is a little vague. Does it need to be │
> > > │made more precise? │
> > > └─────────────────────────────────────────────────────┘
> > > The CFS scheduler employs an algorithm that distributes the CPU
> > > across task groups. As a result of this algorithm, the pro‐
> > > cesses in task groups that contain multiple CPU-intensive pro‐
> > > cesses are in effect disfavored by the scheduler.
> >
> > Mmmm, they're actually equalized (modulo smp fairness goop), but I see
> > what you mean.
>
> I couldn't quite grok that sentence. My problem is resolving "they".
> Do you mean: "the CPU scheduler equalizes the distribution of
> CPU cycles across task groups"?

Sort of. "They" are scheduler entities, runqueue (group) or task. The
scheduler equalizes entity vruntimes.
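
What "equalizing vruntimes" means for CPU distribution can be illustrated with a toy model. This is not how the kernel implements it: the sketch assumes that an entity's virtual runtime advances by (time run) * 1024 / weight, that the entity with the smaller vruntime always runs next, and it uses the illustrative weights 1024 and 512. All names are hypothetical.

```shell
NICE_0_WEIGHT=1024   # assumed reference weight
TICK_NS=1000000      # each simulated time slice is 1 ms

# simulate WEIGHT_A WEIGHT_B TICKS
# Prints how many ticks entities A and B each received.
simulate() {
    va=0; vb=0    # virtual runtimes
    ta=0; tb=0    # ticks actually received
    n=0
    while [ "$n" -lt "$3" ]; do
        if [ "$va" -le "$vb" ]; then
            # A is behind (or tied): run A and charge it scaled time.
            va=$(( va + TICK_NS * NICE_0_WEIGHT / $1 ))
            ta=$(( ta + 1 ))
        else
            vb=$(( vb + TICK_NS * NICE_0_WEIGHT / $2 ))
            tb=$(( tb + 1 ))
        fi
        n=$(( n + 1 ))
    done
    echo "$ta $tb"
}

echo "equal weights, 300 ticks: $(simulate 1024 1024 300)"
echo "2:1 weights, 300 ticks:   $(simulate 1024 512 300)"
```

In this model the entity with twice the weight accumulates vruntime half as fast, so keeping the two vruntimes level gives it twice the CPU time. The entity may be a single task or a whole group, which is the point made above.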

> > > │FIXME │
> > > ├─────────────────────────────────────────────────────┤
> > > │Is the following correct? Does the statement need to │
> > > │be more precise? (E.g., in precisely which circum‐ │
> > > │stances does the use of cgroups override autogroup?) │
> > > └─────────────────────────────────────────────────────┘
> > > The use of the cgroups(7) CPU controller overrides the effect
> > > of autogrouping.
> >
> > Correct, autogroup defers to cgroups. Perhaps mention that moving a
> > task back to the root task group will result in the autogroup again
> > taking effect.
>
> In what circumstances does a process get moved back to the root
> task group?

Userspace actions, tool or human fingers.


> Actually, can you define for me what the root task group is, and
> why it exists? That may be worth some words in this man page.

I don't think we need group scheduling details, there's plenty of
documentation elsewhere for those who want theory. Autogroup is for
those who don't want to have to care (which is also why it should
never have grown a nice knob).

-Mike

2016-11-23 15:55:49

by Mike Galbraith

Subject: Re: [patch] sched/autogroup: Fix 64bit kernel nice adjustment

On Wed, 2016-11-23 at 15:20 +0100, Michael Kerrisk (man-pages) wrote:

> Thanks for the confirmation. Are you aiming to see the fix
> merged for 4.9, or will this wait for 4.10?

Dunno, that's up to Peter/Ingo. It's unlikely that anyone other than
we two will notice a thing either way :)

-Mike

Subject: Re: RFC: documentation of the autogroup feature

Hi Mike,

On 11/23/2016 04:33 PM, Mike Galbraith wrote:
> On Wed, 2016-11-23 at 14:54 +0100, Michael Kerrisk (man-pages) wrote:
>> Hi Mike,
>>
>> First off, I better say that I'm not at all intimate with the details
>> of the scheduler, so bear with me...
>>
>> On 11/23/2016 12:39 PM, Mike Galbraith wrote:
>>> On Tue, 2016-11-22 at 16:59 +0100, Michael Kerrisk (man-pages) wrote:
>>>
>>>> ┌─────────────────────────────────────────────────────┐
>>>> │FIXME │
>>>> ├─────────────────────────────────────────────────────┤
>>>> │The following is a little vague. Does it need to be │
>>>> │made more precise? │
>>>> └─────────────────────────────────────────────────────┘
>>>> The CFS scheduler employs an algorithm that distributes the CPU
>>>> across task groups. As a result of this algorithm, the pro‐
>>>> cesses in task groups that contain multiple CPU-intensive pro‐
>>>> cesses are in effect disfavored by the scheduler.
>>>
>>> Mmmm, they're actually equalized (modulo smp fairness goop), but I see
>>> what you mean.
>>
>> I couldn't quite grok that sentence. My problem is resolving "they".
>> Do you mean: "the CPU scheduler equalizes the distribution of
>> CPU cycles across task groups"?
>
> Sort of. "They" are scheduler entities, runqueue (group) or task. The
> scheduler equalizes entity vruntimes.

Okay -- I'll see if I can come up with some wording there.

>
>>>> │FIXME │
>>>> ├─────────────────────────────────────────────────────┤
>>>> │Is the following correct? Does the statement need to │
>>>> │be more precise? (E.g., in precisely which circum‐ │
>>>> │stances does the use of cgroups override autogroup?) │
>>>> └─────────────────────────────────────────────────────┘
>>>> The use of the cgroups(7) CPU controller overrides the effect
>>>> of autogrouping.
>>>
>>> Correct, autogroup defers to cgroups. Perhaps mention that moving a
>>> task back to the root task group will result in the autogroup again
>>> taking effect.
>>
>> In what circumstances does a process get moved back to the root
>> task group?
>
> Userspace actions, tool or human fingers.

Could you say a little more, please? Which kernel-user-space
APIs/system calls/etc. cause this to happen?

>> Actually, can you define for me what the root task group is, and
>> why it exists? That may be worth some words in this man page.
>
> I don't think we need group scheduling details, there's plenty of
> documentation elsewhere for those who want theory.

Well, you suggested above

Perhaps mention that moving a task back to the root task
group will result in the autogroup again taking effect.

So, that would inevitably lead me and the reader of the man page
to ask: what's the root task group?

> Autogroup is for
> those who don't want to have to care (which is also why it should
> never have grown a nice knob).

Yes, I understand that much :-).

Cheers,

Michael



Subject: Re: RFC: documentation of the autogroup feature

> I don't think we need group scheduling details, there's plenty of
> documentation elsewhere for those who want theory.

Actually, which documentation were you referring to here?

Cheers,

Michael



2016-11-23 17:11:36

by Mike Galbraith

Subject: Re: RFC: documentation of the autogroup feature

On Wed, 2016-11-23 at 17:04 +0100, Michael Kerrisk (man-pages) wrote:

> > > In what circumstances does a process get moved back to the root
> > > task group?
> >
> > Userspace actions, tool or human fingers.
>
> Could you say a little more, please? Which kernel-user-space
> APIs/system calls/etc. cause this to happen?

Well, the system call would be write(), scribbling in the cgroups vfs
interface.. not all that helpful without ever more technical detail.
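
For what it is worth, the check and the write could look roughly like this from a shell. The sketch assumes a cgroups version 1 hierarchy with the cpu controller mounted at /sys/fs/cgroup/cpu (mount points vary, so that path is an assumption) and the "hierarchy-ID:controller-list:path" line format of /proc/[pid]/cgroup; cpu_cgroup_path is a hypothetical helper name.

```shell
# Hypothetical helper: read /proc/[pid]/cgroup content on stdin and print
# the cgroup path of the hierarchy that has the cpu controller attached.
cpu_cgroup_path() {
    while IFS=: read -r _id controllers path; do
        case ",$controllers," in
            *,cpu,*)
                echo "$path"
                return 0
                ;;
        esac
    done
    return 1    # no hierarchy with the cpu controller was listed
}

if [ -r /proc/self/cgroup ]; then
    if path=$(cpu_cgroup_path < /proc/self/cgroup); then
        # A path of "/" means the root group of that hierarchy.
        echo "cpu controller cgroup of this shell: $path"
    else
        echo "no cgroup v1 cpu hierarchy found for this shell"
    fi
fi

# Moving a process back to the root group is then a plain write
# (commented out: it needs privilege, and the path is an assumption):
#   echo "$pid" > /sys/fs/cgroup/cpu/cgroup.procs
```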

> > > Actually, can you define for me what the root task group is, and
> > > why it exists? That may be worth some words in this man page.
> >
> > I don't think we need group scheduling details, there's plenty of
> > documentation elsewhere for those who want theory.
>
> Well, you suggested above
>
> Perhaps mention that moving a task back to the root task
> group will result in the autogroup again taking effect.

Dang, evolution doesn't have an unsend button :)

-Mike

2016-11-23 17:19:25

by Mike Galbraith

Subject: Re: RFC: documentation of the autogroup feature

On Wed, 2016-11-23 at 17:05 +0100, Michael Kerrisk (man-pages) wrote:
> > I don't think we need group scheduling details, there's plenty of
> > documentation elsewhere for those who want theory.
>
> Actually, which documentation were you referring to here?

Documentation/scheduler/*

Subject: Re: RFC: documentation of the autogroup feature

On 11/23/2016 06:19 PM, Mike Galbraith wrote:
> On Wed, 2016-11-23 at 17:05 +0100, Michael Kerrisk (man-pages) wrote:
>>> I don't think we need group scheduling details, there's plenty of
>>> documentation elsewhere for those who want theory.
>>
>> Actually, which documentation were you referring to here?
>
> Documentation/scheduler/*

I think there's a lot less information in there than you think...
Certainly, I can't get any big picture from reading those docs.

Cheers

Michael



Subject: [tip:sched/urgent] sched/autogroup: Fix 64-bit kernel nice level adjustment

Commit-ID: 83929cce95251cc77e5659bf493bd424ae0e7a67
Gitweb: http://git.kernel.org/tip/83929cce95251cc77e5659bf493bd424ae0e7a67
Author: Mike Galbraith <[email protected]>
AuthorDate: Wed, 23 Nov 2016 11:33:37 +0100
Committer: Ingo Molnar <[email protected]>
CommitDate: Thu, 24 Nov 2016 05:45:02 +0100

sched/autogroup: Fix 64-bit kernel nice level adjustment

Michael Kerrisk reported:

> Regarding the previous paragraph... My tests indicate
> that writing *any* value to the autogroup [nice priority level]
> file causes the task group to get a lower priority.

Because autogroup didn't call the then meaningless scale_load()...

Autogroup nice level adjustment has been broken ever since load
resolution was increased for 64-bit kernels. Use scale_load() to
scale group weight.

Michael Kerrisk tested this patch to fix the problem:

> Applied and tested against 4.9-rc6 on an Intel u7 (4 cores).
> Test setup:
>
> Terminal window 1: running 40 CPU burner jobs
> Terminal window 2: running 40 CPU burner jobs
> Terminal window 3: running 1 CPU burner job
>
> Demonstrated that:
> * Writing "0" to the autogroup file for TW1 now causes no change
>   to the rate at which the processes on the terminal consume CPU.
> * Writing -20 to the autogroup file for TW1 caused those processes
>   to get the lion's share of CPU while TW2 and TW3 get a tiny amount.
> * Writing -20 to the autogroup files for TW1 and TW3 allowed the
>   process on TW3 to get as much CPU as it was getting when
>   the autogroup nice values for both terminals were 0.

Reported-by: Michael Kerrisk <[email protected]>
Tested-by: Michael Kerrisk <[email protected]>
Signed-off-by: Mike Galbraith <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: linux-man <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/sched/auto_group.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/auto_group.c b/kernel/sched/auto_group.c
index f1c8fd5..da39489 100644
--- a/kernel/sched/auto_group.c
+++ b/kernel/sched/auto_group.c
@@ -212,6 +212,7 @@ int proc_sched_autogroup_set_nice(struct task_struct *p, int nice)
 {
 	static unsigned long next = INITIAL_JIFFIES;
 	struct autogroup *ag;
+	unsigned long shares;
 	int err;

 	if (nice < MIN_NICE || nice > MAX_NICE)
@@ -230,9 +231,10 @@ int proc_sched_autogroup_set_nice(struct task_struct *p, int nice)

 	next = HZ / 10 + jiffies;
 	ag = autogroup_task_get(p);
+	shares = scale_load(sched_prio_to_weight[nice + 20]);

 	down_write(&ag->lock);
-	err = sched_group_set_shares(ag->tg, sched_prio_to_weight[nice + 20]);
+	err = sched_group_set_shares(ag->tg, shares);
 	if (!err)
 		ag->nice = nice;
 	up_write(&ag->lock);

Subject: RFC: documentation of the autogroup feature [v2]

Hi Mike,

I reworked the text on autogroups, and in the process learned
something/have another question. Could you tell me if anything
in the below needs fixing/improving, and also let me know about
the FIXME?

Thanks,

Michael

The autogroup feature
Since Linux 2.6.38, the kernel provides a feature known as
autogrouping to improve interactive desktop performance in the
face of multiprocess, CPU-intensive workloads such as building
the Linux kernel with large numbers of parallel build processes
(i.e., the make(1) -j flag).

This feature operates in conjunction with the CFS scheduler and
requires a kernel that is configured with CONFIG_SCHED_AUTO‐
GROUP. On a running system, this feature is enabled or dis‐
abled via the file /proc/sys/kernel/sched_autogroup_enabled; a
value of 0 disables the feature, while a value of 1 enables it.
The default value in this file is 1, unless the kernel was
booted with the noautogroup parameter.
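
As a minimal sketch of the check described above, the following reads the sysctl file and reports whether autogrouping is on. The helper name autogroup_enabled is made up for illustration, Python is used only for brevity, and the code assumes that an unreadable file means the kernel lacks CONFIG_SCHED_AUTOGROUP (or that /proc is not mounted):

```python
from pathlib import Path

# Path taken from the text above.
SYSCTL = "/proc/sys/kernel/sched_autogroup_enabled"


def autogroup_enabled(path=SYSCTL):
    """Return True/False for 1/0 in the file, or None if it cannot be read.

    Assumption: a missing or unreadable file means the kernel was built
    without CONFIG_SCHED_AUTOGROUP, or that /proc is not mounted.
    """
    try:
        text = Path(path).read_text()
    except OSError:
        return None
    return text.strip() == "1"


if __name__ == "__main__":
    state = autogroup_enabled()
    print({True: "enabled", False: "disabled", None: "unavailable"}[state])
```

Returning None rather than False keeps "feature not present" distinct from "feature switched off", which matters when deciding whether writing to the file could help.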

A new autogroup is created when a new session is created via
setsid(2); this happens, for example, when a new terminal
window is started. A new process created by fork(2)
inherits its parent's autogroup membership. Thus, all of the
processes in a session are members of the same autogroup. An
autogroup is automatically destroyed when the last process in
the group terminates.

When autogrouping is enabled, all of the members of an auto‐
group are placed in the same kernel scheduler "task group".
The CFS scheduler employs an algorithm that equalizes the dis‐
tribution of CPU cycles across task groups. The benefits of
this for interactive desktop performance can be described via
the following example.

Suppose that there are two autogroups competing for the same
CPU. The first group contains ten CPU-bound processes from a
kernel build started with make -j10. The other contains a sin‐
gle CPU-bound process: a video player. The effect of auto‐
grouping is that the two groups will each receive half of the
CPU cycles. That is, the video player will receive 50% of the
CPU cycles, rather just 9% of the cycles, which would likely
lead to degraded video playback. Or to put things another way:
an autogroup that contains a large number of CPU-bound pro‐
cesses does not end up overwhelming the CPU at the expense of
the other jobs on the system.

A process's autogroup (task group) membership can be viewed via
the file /proc/[pid]/autogroup:

$ cat /proc/1/autogroup
/autogroup-1 nice 0

This file can also be used to modify the CPU bandwidth allo‐
cated to an autogroup. This is done by writing a number in the
"nice" range to the file to set the autogroup's nice value.
The allowed range is from +19 (low priority) to -20 (high pri‐
ority), and the setting has the same effect as modifying the
nice level via setpriority(2). (For a discussion of the nice
value, see getpriority(2).)
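
A rough sketch of reading and writing this file from a program follows. The function names are hypothetical, and the line format is assumed from the single sample shown above ("/autogroup-1 nice 0"); other kernels may format the line differently:

```python
import re

# Format assumed from the sample output shown above: "/autogroup-1 nice 0".
_LINE = re.compile(r"^/autogroup-(\d+) nice (-?\d+)$")


def parse_autogroup(line):
    """Split one line of /proc/[pid]/autogroup into (group_id, nice)."""
    m = _LINE.match(line.strip())
    if m is None:
        raise ValueError(f"unexpected autogroup line: {line!r}")
    return int(m.group(1)), int(m.group(2))


def set_autogroup_nice(pid, nice):
    """Write a nice value to the autogroup file of process `pid`.

    The range check mirrors the +19..-20 range given in the text.
    Assumption: writing a negative value may require privilege.
    """
    if not -20 <= nice <= 19:
        raise ValueError("nice must be in the range -20..19")
    with open(f"/proc/{pid}/autogroup", "w") as f:
        f.write(str(nice))


if __name__ == "__main__":
    print(parse_autogroup("/autogroup-1 nice 0"))  # (1, 0)
```

Checking the range before opening the file means a bad value is reported without touching /proc at all.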


┌─────────────────────────────────────────────────────┐
│FIXME │
├─────────────────────────────────────────────────────┤
│How do the nice value of a process and the nice │
│value of an autogroup interact? Which has priority? │
│ │
│It *appears* that the autogroup nice value is used │
│for CPU distribution between task groups, and that │
│the process nice value has no effect there. (I.e., │
│suppose two autogroups each contain a CPU-bound │
│process, with one process having nice==0 and the │
│other having nice==19. It appears that they each │
│get 50% of the CPU.) It appears that the process │
│nice value has effect only with respect to schedul‐ │
│ing relative to other processes in the *same* auto‐ │
│group. Is this correct? │
└─────────────────────────────────────────────────────┘

The use of the cgroups(7) CPU controller overrides the effect
of autogrouping.

The autogroup feature does not group processes that are
scheduled under real-time and deadline policies. Those processes
are scheduled according to the rules described earlier.


--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

2016-11-25 12:55:49

by afzal mohammed

Subject: Re: RFC: documentation of the autogroup feature [v2]

Hi,

On Thu, Nov 24, 2016 at 10:41:29PM +0100, Michael Kerrisk (man-pages) wrote:

> Suppose that there are two autogroups competing for the same
> CPU. The first group contains ten CPU-bound processes from a
> kernel build started with make -j10. The other contains a sin‐
> gle CPU-bound process: a video player. The effect of auto‐
> grouping is that the two groups will each receive half of the
> CPU cycles. That is, the video player will receive 50% of the
> CPU cycles, rather just 9% of the cycles, which would likely
                     ^^^^
                     than ?

Regards
afzal

> lead to degraded video playback. Or to put things another way:
> an autogroup that contains a large number of CPU-bound pro‐
> cesses does not end up overwhelming the CPU at the expense of
> the other jobs on the system.

Subject: Re: RFC: documentation of the autogroup feature [v2]

On 11/25/2016 01:52 PM, Afzal Mohammed wrote:
> Hi,
>
> On Thu, Nov 24, 2016 at 10:41:29PM +0100, Michael Kerrisk (man-pages) wrote:
>
>> Suppose that there are two autogroups competing for the same
>> CPU. The first group contains ten CPU-bound processes from a
>> kernel build started with make -j10. The other contains a sin‐
>> gle CPU-bound process: a video player. The effect of auto‐
>> grouping is that the two groups will each receive half of the
>> CPU cycles. That is, the video player will receive 50% of the
>> CPU cycles, rather just 9% of the cycles, which would likely
>                     ^^^^
>                     than ?
>
> Regards
> afzal

Thanks, Afzal. Fixed!

Cheers,

Michael

>
>> lead to degraded video playback. Or to put things another way:
>> an autogroup that contains a large number of CPU-bound pro‐
>> cesses does not end up overwhelming the CPU at the expense of
>> the other jobs on the system.
>



2016-11-25 13:21:28

by Mike Galbraith

Subject: Re: RFC: documentation of the autogroup feature [v2]

On Thu, 2016-11-24 at 22:41 +0100, Michael Kerrisk (man-pages) wrote:

> Suppose that there are two autogroups competing for the same
> CPU. The first group contains ten CPU-bound processes from a
> kernel build started with make -j10. The other contains a sin‐
> gle CPU-bound process: a video player. The effect of auto‐
> grouping is that the two groups will each receive half of the
> CPU cycles. That is, the video player will receive 50% of the
> CPU cycles, rather just 9% of the cycles, which would likely
> lead to degraded video playback. Or to put things another way:
> an autogroup that contains a large number of CPU-bound pro‐
> cesses does not end up overwhelming the CPU at the expense of
> the other jobs on the system.

I'd say something more wishy-washy here, like cycles are distributed
fairly across groups and leave it at that, as your detailed example is
incorrect due to SMP fairness (which I don't like much because [very
unlikely] worst case scenario renders a box sized group incapable of
utilizing more than a single CPU total). For example, if a group of
NR_CPUS size competes with a singleton, load balancing will try to give
the singleton a full CPU of its very own. If groups intersect for
whatever reason on say my quad lappy, distribution is 80/20 in favor of
the singleton.

> ┌─────────────────────────────────────────────────────┐
> │FIXME │
> ├─────────────────────────────────────────────────────┤
> │How do the nice value of a process and the nice │
> │value of an autogroup interact? Which has priority? │
> │ │
> │It *appears* that the autogroup nice value is used │
> │for CPU distribution between task groups, and that │
> │the process nice value has no effect there. (I.e., │
> │suppose two autogroups each contain a CPU-bound │
> │process, with one process having nice==0 and the │
> │other having nice==19. It appears that they each │
> │get 50% of the CPU.) It appears that the process │
> │nice value has effect only with respect to schedul‐ │
> │ing relative to other processes in the *same* auto‐ │
> │group. Is this correct? │
> └─────────────────────────────────────────────────────┘

Yup, entity nice level affects distribution among peer entities.

-Mike

Subject: Re: RFC: documentation of the autogroup feature [v2]

Hi Mike,

On 11/25/2016 02:02 PM, Mike Galbraith wrote:
> On Thu, 2016-11-24 at 22:41 +0100, Michael Kerrisk (man-pages) wrote:
>
>> Suppose that there are two autogroups competing for the same
>> CPU. The first group contains ten CPU-bound processes from a
>> kernel build started with make -j10. The other contains a sin‐
>> gle CPU-bound process: a video player. The effect of auto‐
>> grouping is that the two groups will each receive half of the
>> CPU cycles. That is, the video player will receive 50% of the
>> CPU cycles, rather just 9% of the cycles, which would likely
>> lead to degraded video playback. Or to put things another way:
>> an autogroup that contains a large number of CPU-bound pro‐
>> cesses does not end up overwhelming the CPU at the expense of
>> the other jobs on the system.
>
> I'd say something more wishy-washy here, like cycles are distributed
> fairly across groups and leave it at that,

I see where you want to go, but the problem is that the word "fair"
will invoke different interpretations for different people. So, I
think one does need to be a little more concrete.

> as your detailed example is
> incorrect due to SMP fairness

Well, I was trying to exclude SMP from the discussion by saying
"competing for the same CPU". Here I was meaning that we involve
taskset(1) to confine everyone to the same CPU. Then, I think
my example is correct. (I did some light testing before writing
that text.) But I guess my meaning wasn't clear enough, and
it is a slightly contrived scenario anyway. I'll add some words
to clarify my example, and also add something to say that the
situation is more complex on an SMP system. Something like
the following:

Suppose that there are two autogroups competing for the same CPU
(i.e., presume either a single CPU system or the use of taskset(1)
to confine all the processes to the same CPU on an SMP system).
The first group contains ten CPU-bound processes from a kernel
build started with make -j10. The other contains a single CPU-
bound process: a video player. The effect of autogrouping is that
the two groups will each receive half of the CPU cycles. That is,
the video player will receive 50% of the CPU cycles, rather than
just 9% of the cycles, which would likely lead to degraded video
playback. The situation on an SMP system is more complex, but the
general effect is the same: the scheduler distributes CPU cycles
across task groups such that an autogroup that contains a large
number of CPU-bound processes does not end up hogging CPU cycles
at the expense of the other jobs on the system.
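
The 50% and 9% figures in that paragraph can be checked with a few lines of arithmetic. This is only a sketch under the paragraph's own assumptions (one CPU, every process CPU-bound, all nice values and group weights equal); the function names are invented for the example:

```python
def share_without_groups(n_build, n_player=1):
    """Share of the player when all processes compete as equal peers."""
    return n_player / (n_build + n_player)


def share_with_groups(n_groups=2):
    """Share of one autogroup when groups of equal weight split the CPU."""
    return 1 / n_groups


print(f"no autogroup: {share_without_groups(10):.1%}")  # 9.1%
print(f"autogroup:    {share_with_groups():.1%}")       # 50.0%
```

With ten build processes the player is one of eleven equal competitors, hence roughly 9%; with autogrouping it is one of two groups, hence 50%.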

> (which I don't like much because [very
> unlikely] worst case scenario renders a box sized group incapable of
> utilizing more than a single CPU total). For example, if a group of
> NR_CPUS size competes with a singleton, load balancing will try to give
> the singleton a full CPU of its very own. If groups intersect for
> whatever reason on say my quad lappy, distribution is 80/20 in favor of
> the singleton.

Thanks for the additional info. Good for educating me, but I think
you'll agree it's more than we need for the man page.

>> ┌─────────────────────────────────────────────────────┐
>> │FIXME │
>> ├─────────────────────────────────────────────────────┤
>> │How do the nice value of a process and the nice │
>> │value of an autogroup interact? Which has priority? │
>> │ │
>> │It *appears* that the autogroup nice value is used │
>> │for CPU distribution between task groups, and that │
>> │the process nice value has no effect there. (I.e., │
>> │suppose two autogroups each contain a CPU-bound │
>> │process, with one process having nice==0 and the │
>> │other having nice==19. It appears that they each │
>> │get 50% of the CPU.) It appears that the process │
>> │nice value has effect only with respect to schedul‐ │
>> │ing relative to other processes in the *same* auto‐ │
>> │group. Is this correct? │
>> └─────────────────────────────────────────────────────┘
>
> Yup, entity nice level affects distribution among peer entities.

Huh! I only just learned about this via my experiments while
investigating autogroups.

How long have things been like this? Always? (I don't think
so.) Since the arrival of CFS? Since the arrival of
autogrouping? (I'm guessing not.) Since some other point?
(When?)

It seems to me that this renders the traditional process
nice pretty much useless. (I bet I'm not the only one who'd
be surprised by the current behavior.)

Cheers,

Michael



Subject: Re: RFC: documentation of the autogroup feature [v2]

On 11/25/2016 04:04 PM, Michael Kerrisk (man-pages) wrote:
> Hi Mike,
>
> On 11/25/2016 02:02 PM, Mike Galbraith wrote:
>>> ┌─────────────────────────────────────────────────────┐
>>> │FIXME │
>>> ├─────────────────────────────────────────────────────┤
>>> │How do the nice value of a process and the nice │
>>> │value of an autogroup interact? Which has priority? │
>>> │ │
>>> │It *appears* that the autogroup nice value is used │
>>> │for CPU distribution between task groups, and that │
>>> │the process nice value has no effect there. (I.e., │
>>> │suppose two autogroups each contain a CPU-bound │
>>> │process, with one process having nice==0 and the │
>>> │other having nice==19. It appears that they each │
>>> │get 50% of the CPU.) It appears that the process │
>>> │nice value has effect only with respect to schedul‐ │
>>> │ing relative to other processes in the *same* auto‐ │
>>> │group. Is this correct? │
>>> └─────────────────────────────────────────────────────┘
>>
>> Yup, entity nice level affects distribution among peer entities.
>
> Huh! I only just learned about this via my experiments while
> investigating autogroups.
>
> How long have things been like this? Always? (I don't think
> so.) Since the arrival of CFS? Since the arrival of
> autogrouping? (I'm guessing not.) Since some other point?
> (When?)

Okay, things changed sometime after 2.6.31, at least.
(Just tested on an old box.) So, presumably with the arrival
of either CFS or autogrouping? Next comment certainly applies:

> It seems to me that this renders the traditional process
> nice pretty much useless. (I bet I'm not the only one who'd
> be surprised by the current behavior.)

Cheers,

Michael



2016-11-25 15:53:37

by Mike Galbraith

Subject: Re: RFC: documentation of the autogroup feature [v2]

On Fri, 2016-11-25 at 16:04 +0100, Michael Kerrisk (man-pages) wrote:

> > > ┌─────────────────────────────────────────────────────┐
> > > │FIXME │
> > > ├─────────────────────────────────────────────────────┤
> > > │How do the nice value of a process and the nice │
> > > │value of an autogroup interact? Which has priority? │
> > > │ │
> > > │It *appears* that the autogroup nice value is used │
> > > │for CPU distribution between task groups, and that │
> > > │the process nice value has no effect there. (I.e., │
> > > │suppose two autogroups each contain a CPU-bound │
> > > │process, with one process having nice==0 and the │
> > > │other having nice==19. It appears that they each │
> > > │get 50% of the CPU.) It appears that the process │
> > > │nice value has effect only with respect to schedul‐ │
> > > │ing relative to other processes in the *same* auto‐ │
> > > │group. Is this correct? │
> > > └─────────────────────────────────────────────────────┘
> >
> > Yup, entity nice level affects distribution among peer entities.
>
> Huh! I only just learned about this via my experiments while
> investigating autogroups.
>
> How long have things been like this? Always? (I don't think
> so.) Since the arrival of CFS? Since the arrival of
> autogrouping? (I'm guessing not.) Since some other point?
> (When?)

Always. Before CFS there just were no non-peers :)

> It seems to me that this renders the traditional process
> nice pretty much useless. (I bet I'm not the only one who'd
> be surprised by the current behavior.)

Yup, group scheduling is not a single edged sword, those don't exist.
Box wide nice loss is not the only thing that can bite you; fairness,
whether group or task oriented, cuts both ways.

-Mike

2016-11-25 16:06:04

by Peter Zijlstra

Subject: Re: RFC: documentation of the autogroup feature [v2]

On Fri, Nov 25, 2016 at 04:04:25PM +0100, Michael Kerrisk (man-pages) wrote:
> >> ┌─────────────────────────────────────────────────────┐
> >> │FIXME │
> >> ├─────────────────────────────────────────────────────┤
> >> │How do the nice value of a process and the nice │
> >> │value of an autogroup interact? Which has priority? │
> >> │ │
> >> │It *appears* that the autogroup nice value is used │
> >> │for CPU distribution between task groups, and that │
> >> │the process nice value has no effect there. (I.e., │
> >> │suppose two autogroups each contain a CPU-bound │
> >> │process, with one process having nice==0 and the │
> >> │other having nice==19. It appears that they each │
> >> │get 50% of the CPU.) It appears that the process │
> >> │nice value has effect only with respect to schedul‐ │
> >> │ing relative to other processes in the *same* auto‐ │
> >> │group. Is this correct? │
> >> └─────────────────────────────────────────────────────┘
> >
> > Yup, entity nice level affects distribution among peer entities.
>
> Huh! I only just learned about this via my experiments while
> investigating autogroups.
>
> How long have things been like this? Always? (I don't think
> so.) Since the arrival of CFS? Since the arrival of
> autogrouping? (I'm guessing not.) Since some other point?
> (When?)

Ever since cfs-cgroup, this is a fundamental design point of cgroups,
and has therefore always been the case for autogroups (as that is
nothing more than an application of the cgroup code).

> It seems to me that this renders the traditional process
> nice pretty much useless. (I bet I'm not the only one who'd
> be surprised by the current behavior.)

It's really rather fundamental to how the whole hierarchical thing
works.

CFS is a weighted fair queueing scheduler; this means each entity
receives:

                    w_i
        dt_i = dt --------
                  \Sum w_j


                CPU
          ______/ \______
         /    |     |    \
        A     B     C     D


So if each entity {A,B,C,D} has equal weight, then they will receive
equal time. Explicitly, for C you get:


                            w_C
        dt_C = dt -----------------------
                  (w_A + w_B + w_C + w_D)


Extending this to a hierarchy, we get:


                CPU
          ______/ \______
         /    |     |    \
        A     B     C     D
                   / \
                  E   F

Where C becomes a 'server' for entities {E,F}. The weight of C does not
depend on its child entities. This way the time of {E,F} becomes a
straight product of their ratio with C. That is; the whole thing
becomes, where l denotes the level in the hierarchy and i an
entity on that level:

                      l     w_g,i
        dt_l,i = dt \Prod ----------
                     g=0  \Sum w_g,j


Or more concretely, for E:

                            w_E
        dt_1,E = dt_0,C -----------
                        (w_E + w_F)

                              w_C               w_E
               = dt ----------------------- -----------
                    (w_A + w_B + w_C + w_D) (w_E + w_F)


And this 'trivially' extends to SMP, with the tricky bit being that the
sums over all entities end up being machine wide, instead of per CPU,
which is a real and royal pain for performance.


Note that this property, where the weight of the server entity is
independent from its child entities is a desired feature. Without that
it would be impossible to control the relative weights of groups, and
that is the sole parameter of the WFQ model.

It is also why Linus so likes autogroups, each session competes equally
amongst one another.
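
A small worked example may make the product formula concrete. The sketch below is not kernel code: the tree layout, the function name share and the weight 1024 are all assumptions chosen for illustration, and only the ratios between weights matter. It walks the {A,B,C,D} / {E,F} hierarchy from the diagram and multiplies the per-level fractions:

```python
from fractions import Fraction


def share(path, tree):
    """Fraction of CPU time for the entity reached by `path`.

    `tree` maps an entity name to (weight, children), where children is a
    dict of the same shape (empty for a task). At each level the entity
    gets weight / sum of sibling weights, and the levels multiply.
    """
    result = Fraction(1)
    level = tree
    for name in path:
        total = sum(w for w, _ in level.values())
        weight, children = level[name]
        result *= Fraction(weight, total)
        level = children
    return result


# The hierarchy from the diagram, with every weight set to 1024
# (an assumed value; only the ratios matter).
W = 1024
tree = {
    "A": (W, {}),
    "B": (W, {}),
    "C": (W, {"E": (W, {}), "F": (W, {})}),
    "D": (W, {}),
}

print(share(["C"], tree))       # 1/4
print(share(["C", "E"], tree))  # 1/8
```

Changing the weight of E or F moves time between E and F only; C's quarter of the CPU stays the same, which is the "nice only matters among peers" behaviour discussed in this thread.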

Subject: Re: RFC: documentation of the autogroup feature [v2]

On 11/25/2016 04:51 PM, Mike Galbraith wrote:
> On Fri, 2016-11-25 at 16:04 +0100, Michael Kerrisk (man-pages) wrote:
>
>>>> ┌─────────────────────────────────────────────────────┐
>>>> │FIXME │
>>>> ├─────────────────────────────────────────────────────┤
>>>> │How do the nice value of a process and the nice │
>>>> │value of an autogroup interact? Which has priority? │
>>>> │ │
>>>> │It *appears* that the autogroup nice value is used │
>>>> │for CPU distribution between task groups, and that │
>>>> │the process nice value has no effect there. (I.e., │
>>>> │suppose two autogroups each contain a CPU-bound │
>>>> │process, with one process having nice==0 and the │
>>>> │other having nice==19. It appears that they each │
>>>> │get 50% of the CPU.) It appears that the process │
>>>> │nice value has effect only with respect to schedul‐ │
>>>> │ing relative to other processes in the *same* auto‐ │
>>>> │group. Is this correct? │
>>>> └─────────────────────────────────────────────────────┘
>>>
>>> Yup, entity nice level affects distribution among peer entities.
>>
>> Huh! I only just learned about this via my experiments while
>> investigating autogroups.
>>
>> How long have things been like this? Always? (I don't think
>> so.) Since the arrival of CFS? Since the arrival of
>> autogrouping? (I'm guessing not.) Since some other point?
>> (When?)
>
> Always. Before CFS there just were no non-peers :)

Well that's one way of looking at it. So, the change
that I'm talking about came in 2.6.32 with CFS then?

>> It seems to me that this renders the traditional process
>> nice pretty much useless. (I bet I'm not the only one who'd
>> be surprised by the current behavior.)
>
> Yup, group scheduling is not a single edged sword, those don't exist.
> Box wide nice loss is not the only thing that can bite you; fairness,
> whether group or task oriented, cuts both ways.

Understood. But again I'll say, I bet a lot of old-time users
(and maybe many newer) would be surprised by the fact that
nice(1) / setpriority(2) have effectively been rendered no-ops
in many use cases. At the very least, it'd have been nice
if someone had sent a man pages patch or at least a note...

Cheers,

Michael




2016-11-25 16:14:29

by Peter Zijlstra

Subject: Re: RFC: documentation of the autogroup feature [v2]

On Fri, Nov 25, 2016 at 05:04:56PM +0100, Peter Zijlstra wrote:
> That is; the whole thing
> becomes, where l denotes the level in the hierarchy and i an
> entity on that level:
>
>                       l     w_g,i
>         dt_l,i = dt \Prod ----------
>                      g=0  \Sum w_g,j
>
>
> Or more concretely, for E:
>
>                             w_E
>         dt_1,E = dt_0,C -----------
>                         (w_E + w_F)
>
>                               w_C               w_E
>                = dt ----------------------- -----------
>                     (w_A + w_B + w_C + w_D) (w_E + w_F)
>

And this also immediately shows one of the 'problems' with it. Since we
don't have floating point in kernel, these fractions are evaluated with
fixed-point arithmetic. Traditionally (and on 32bit) we use 10bit fixed
point, recently we switched to 20bit for 64bit machines.

That change is what bit you on the nice testing.

But it also means that once we run out of fractional bits things go
wobbly. The fractions, as per the above, increase the deeper the group
hierarchy goes but are also affected by the number of CPUs in the system
(not immediately represented in that equation).

Not to mention that many scheduler operations become O(depth) in cost,
which also hurts. An obvious example being task selection, we pick a
runnable entity for each level, until the resulting entity has no
further children (iow. is a task).
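
To illustrate why the 10-bit to 20-bit switch mattered for the autogroup nice file, here is a back-of-the-envelope sketch. Everything numeric in it is an assumption recalled from kernel sources of that era rather than taken from this thread: a shift of 10 for scale_load() on 64-bit, the three sched_prio_to_weight[] entries shown, and a default group weight equal to the scaled nice-0 weight. It also ignores any clamping the kernel applies to shares:

```python
# Assumptions (not from this thread): SCHED_FIXEDPOINT_SHIFT is 10,
# scale_load() shifts left by that amount on 64-bit, and these are three
# entries of the kernel's sched_prio_to_weight[] table.
SCHED_FIXEDPOINT_SHIFT = 10
PRIO_TO_WEIGHT = {-20: 88761, 0: 1024, 19: 15}


def scale_load(w):
    """Assumed 64-bit behaviour: add 10 more fractional bits to a weight."""
    return w << SCHED_FIXEDPOINT_SHIFT


def group_share(shares, other_shares):
    """Share of one group competing with one other group (no clamping)."""
    return shares / (shares + other_shares)


default = scale_load(PRIO_TO_WEIGHT[0])  # assumed shares of an untouched group
buggy = PRIO_TO_WEIGHT[0]                # unscaled value, as before the patch
fixed = scale_load(PRIO_TO_WEIGHT[0])    # scaled value, as after the patch

print(f"before fix: {group_share(buggy, default):.4%}")  # 0.0976%
print(f"after fix:  {group_share(fixed, default):.4%}")  # 50.0000%
```

Under these assumptions, writing 0 to the file with the unscaled weight would leave the group about a thousand times lighter than an untouched one, which matches the symptom the patch earlier in the thread addresses.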

2016-11-25 16:18:25

by Peter Zijlstra

Subject: Re: RFC: documentation of the autogroup feature [v2]

On Fri, Nov 25, 2016 at 05:08:44PM +0100, Michael Kerrisk (man-pages) wrote:
> On 11/25/2016 04:51 PM, Mike Galbraith wrote:
> Well that's one way of looking at it. So, the change
> that I'm talking about came in 2.6.32 with CFS then?

cfs-cgroup landed later I think, and it was fairly wobbly in the first
few releases (as per usual I'd say for major features).

Subject: Re: RFC: documentation of the autogroup feature [v2]

Hi Peter,

On 11/25/2016 05:04 PM, Peter Zijlstra wrote:
> On Fri, Nov 25, 2016 at 04:04:25PM +0100, Michael Kerrisk (man-pages) wrote:
>>>> ┌─────────────────────────────────────────────────────┐
>>>> │FIXME │
>>>> ├─────────────────────────────────────────────────────┤
>>>> │How do the nice value of a process and the nice │
>>>> │value of an autogroup interact? Which has priority? │
>>>> │ │
>>>> │It *appears* that the autogroup nice value is used │
>>>> │for CPU distribution between task groups, and that │
>>>> │the process nice value has no effect there. (I.e., │
>>>> │suppose two autogroups each contain a CPU-bound │
>>>> │process, with one process having nice==0 and the │
>>>> │other having nice==19. It appears that they each │
>>>> │get 50% of the CPU.) It appears that the process │
>>>> │nice value has effect only with respect to schedul‐ │
>>>> │ing relative to other processes in the *same* auto‐ │
>>>> │group. Is this correct? │
>>>> └─────────────────────────────────────────────────────┘
>>>
>>> Yup, entity nice level affects distribution among peer entities.
>>
>> Huh! I only just learned about this via my experiments while
>> investigating autogroups.
>>
>> How long have things been like this? Always? (I don't think
>> so.) Since the arrival of CFS? Since the arrival of
>> autogrouping? (I'm guessing not.) Since some other point?
>> (When?)
>
> Ever since cfs-cgroup,

Okay. That begs the question still though.

> this is a fundamental design point of cgroups,
> and has therefore always been the case for autogroups (as that is
> nothing more than an application of the cgroup code).

Understood.

>> It seems to me that this renders the traditional process
>> nice pretty much useless. (I bet I'm not the only one who'd
>> be surprised by the current behavior.)
>
> It's really rather fundamental to how the whole hierarchical thing
> works.
>
> CFS is a weighted fair queueing scheduler; this means each entity
> receives:
>
>                     w_i
>         dt_i = dt --------
>                   \Sum w_j
>
>
>                 CPU
>           ______/ \______
>          /    |     |    \
>         A     B     C     D
>
>
> So if each entity {A,B,C,D} has equal weight, then they will receive
> equal time. Explicitly, for C you get:
>
>
>                             w_C
>         dt_C = dt -----------------------
>                   (w_A + w_B + w_C + w_D)
>
>
> Extending this to a hierarchy, we get:
>
>
>                 CPU
>           ______/ \______
>          /    |     |    \
>         A     B     C     D
>                    / \
>                   E   F
>
> Where C becomes a 'server' for entities {E,F}. The weight of C does not
> depend on its child entities. This way the time of {E,F} becomes a
> straight product of their ratio with C. That is; the whole thing
> becomes, where l denotes the level in the hierarchy and i an
> entity on that level:
>
>                       l     w_g,i
>         dt_l,i = dt \Prod ----------
>                      g=0  \Sum w_g,j
>
>
> Or more concretely, for E:
>
>                             w_E
>         dt_1,E = dt_0,C -----------
>                         (w_E + w_F)
>
>                               w_C               w_E
>                = dt ----------------------- -----------
>                     (w_A + w_B + w_C + w_D) (w_E + w_F)
>
>
> And this 'trivially' extends to SMP, with the tricky bit being that the
> sums over all entities end up being machine wide, instead of per CPU,
> which is a real and royal pain for performance.

Okay -- you're really quite the ASCII artist. And somehow,
I think you needed to compose the mail in LaTeX. But thanks
for the detail. It's helpful, for me at least.

> Note that this property, where the weight of the server entity is
> independent from its child entities is a desired feature. Without that
> it would be impossible to control the relative weights of groups, and
> that is the sole parameter of the WFQ model.
>
> It is also why Linus so likes autogroups, each session competes equally
> amongst one another.

I get it. But, the behavior changes for the process nice value are
undocumented, and they should be documented. I understand
what the behavior change was. But not yet when.

Cheers,

Michael


Subject: Re: RFC: documentation of the autogroup feature [v2]

On 11/25/2016 05:18 PM, Peter Zijlstra wrote:
> On Fri, Nov 25, 2016 at 05:08:44PM +0100, Michael Kerrisk (man-pages) wrote:
>> On 11/25/2016 04:51 PM, Mike Galbraith wrote:
>> Well that's one way of looking at it. So, the change
>> that I'm talking about came in 2.6.32 with CFS then?
>
> cfs-cgroup landed later I think, and it was fairly wobbly in the first
> few releases (as per usual I'd say for major features).

So I've been searching git logs and elsewhere, but didn't yet
find a likely commit(s). Any clues what I should be looking for?
I'd like this info, because while documenting the changes, I'd
also like to document when they occurred.

Cheers,

Michael




Subject: Re: RFC: documentation of the autogroup feature [v2]

Hi Peter,

On 11/25/2016 05:34 PM, Michael Kerrisk (man-pages) wrote:
> On 11/25/2016 05:18 PM, Peter Zijlstra wrote:
>> On Fri, Nov 25, 2016 at 05:08:44PM +0100, Michael Kerrisk (man-pages) wrote:
>>> On 11/25/2016 04:51 PM, Mike Galbraith wrote:
>>> Well that's one way of looking at it. So, the change
>>> that I'm talking about came in 2.6.32 with CFS then?
>>
>> cfs-cgroup landed later I think, and it was fairly wobbly in the first
>> few releases (as per usual I'd say for major features).
>
> So I've been searching git logs and elsewhere, but didn't yet
> find a likely commit(s). Any clues what I should be looking for?
> I'd like this info, because while documenting the changes, I'd
> also like to document when they occurred.

So, part of what I was struggling with was what you meant by cfs-cgroup.
Do you mean the CFS bandwidth control features added in Linux 3.2?

Cheers,

Michael



2016-11-25 21:49:52

by Peter Zijlstra

Subject: Re: RFC: documentation of the autogroup feature [v2]

On Fri, Nov 25, 2016 at 09:54:05PM +0100, Michael Kerrisk (man-pages) wrote:
> So, part of what I was struggling with was what you meant by cfs-cgroup.
> Do you mean the CFS bandwidth control features added in Linux 3.2?

Nope, /me digs around for a bit... around here I suppose:

68318b8e0b61 ("Hook up group scheduler with control groups")

68318b8e0b61 v2.6.24-rc1~151

But I really have no idea what that looked like.

In any case, for the case of autogroup, the behaviour has always been,
autogroups came quite late.

2016-11-25 22:48:32

by Peter Zijlstra

Subject: Re: RFC: documentation of the autogroup feature [v2]

On Fri, Nov 25, 2016 at 05:33:23PM +0100, Michael Kerrisk (man-pages) wrote:

> Okay -- you're really quite the ASCII artist. And somehow,
> I think you needed to compose the mail in LaTeX. But thanks
> for the detail. It's helpful, for me at least.

Hehe, it's been a while since I did LaTeX, so I'd probably make a mess of
it :-) Glad my ramblings made sense.

> > Note that this property, where the weight of the server entity is
> > independent from its child entities is a desired feature. Without that
> > it would be impossible to control the relative weights of groups, and
> > that is the sole parameter of the WFQ model.
> >
> > It is also why Linus so likes autogroups, each session competes equally
> > amongst one another.
>
> I get it. But, the behavior changes for the process nice value are
> undocumented, and they should be documented. I understand
> what the behavior change was. But not yet when.

Well, it's all undocumented -- I suppose you're about to go fix that :-)

But think of it differently, think of the group as a container, then the
behaviour inside the container is exactly as expected.

Subject: Re: RFC: documentation of the autogroup feature

Hi Mike,

On 11/23/2016 04:33 PM, Mike Galbraith wrote:
> On Wed, 2016-11-23 at 14:54 +0100, Michael Kerrisk (man-pages) wrote:
>> Hi Mike,

[...]

>> Actually, can you define for me what the root task group is, and
>> why it exists? That may be worth some words in this man page.
>
> I don't think we need group scheduling details, there's plenty of
> documentation elsewhere for those who want theory. Autogroup is for
> those who don't want to have to care (which is also why it should have
> never grown a nice knob).

Actually, the more I think about this, the more I think we *do*
need a few details on group scheduling. Otherwise, it's difficult
to explain to the user why nice(1) no longer works as traditionally
expected.

Here's my attempt to define the root task group:

* If autogrouping is disabled, then all processes in the root CPU
cgroup form a scheduling group (sometimes called the "root task
group").

Can you improve on this?

Cheers,

Michael


2016-11-28 01:46:45

by Mike Galbraith

Subject: Re: RFC: documentation of the autogroup feature

On Sun, 2016-11-27 at 22:13 +0100, Michael Kerrisk (man-pages) wrote:

> Here's my attempt to define the root task group:
>
> * If autogrouping is disabled, then all processes in the root CPU
> cgroup form a scheduling group (sometimes called the "root task
> group").
>
> Can you improve on this?

A task group is a set of percpu runqueues. The root task group is the
top level set in a hierarchy of such sets when group scheduling is
enabled, or the only set when group scheduling is not enabled. The
autogroup hierarchy has a depth of one, i.e., all autogroups are peers
whose common parent is the root task group.

-Mike

Subject: Re: RFC: documentation of the autogroup feature [v2]

Hi Peter,

On 11/25/2016 10:49 PM, Peter Zijlstra wrote:
> On Fri, Nov 25, 2016 at 09:54:05PM +0100, Michael Kerrisk (man-pages) wrote:
>> So, part of what I was struggling with was what you meant by cfs-cgroup.
>> Do you mean the CFS bandwidth control features added in Linux 3.2?
>
> Nope, /me digs around for a bit... around here I suppose:
>
> 68318b8e0b61 ("Hook up group scheduler with control groups")

Thanks. The pieces are starting to fall into place now.

> 68318b8e0b61 v2.6.24-rc1~151
>
> But I really have no idea what that looked like.
>
> In any case, for the case of autogroup, the behaviour has always been,
> autogroups came quite late.

This ("the behavior has always been") isn't quite true. Yes, group
scheduling has been around since Linux 2.6.24, but in terms of the
semantics of the thread nice value, there was no visible change
then, *unless* explicit action was taken to create cgroups.

The arrival of autogroups in Linux 2.6.38 was different.
With this feature enabled (which is the default), task
groups were implicitly created *without the user needing to
do anything*. Thus, [two terminal windows] == [two task groups]
and in those two terminal windows, nice(1) on a CPU-bound
command in one terminal did nothing in terms of improving
CPU access for CPU-bound tasks running in the other terminal
window.
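
To make the effect concrete, here is a minimal sketch of the arithmetic
for a single CPU running only CPU-bound tasks. It is a simplified model,
not the scheduler's actual code: it ignores SMP, load balancing, and
sleeping tasks. The function names are made up for illustration. The
three weights are taken from the kernel's nice-to-weight table; the
assumption that every autogroup has equal weight corresponds to all
autogroups being left at their default nice value of 0.

```python
# Load weights for a few nice values, as listed in the kernel's
# nice-to-weight table (only the entries used below are included).
NICE_TO_WEIGHT = {0: 1024, 10: 110, 19: 15}


def flat_shares(nice_values):
    """CPU share of each task when all tasks compete in a single group."""
    weights = [NICE_TO_WEIGHT[n] for n in nice_values]
    total = sum(weights)
    return [w / total for w in weights]


def grouped_shares(groups):
    """CPU share of each task when each inner list is its own task group.

    Assumption: all groups have the same weight, so the groups split the
    CPU equally and nice values matter only inside a group.
    """
    per_group = 1.0 / len(groups)
    return [[per_group * s for s in flat_shares(g)] for g in groups]


if __name__ == "__main__":
    # One CPU-bound task per terminal: nice 0 in one, nice 19 in the other.
    print("without autogrouping:", flat_shares([0, 19]))
    print("with autogrouping:   ", grouped_shares([[0], [19]]))
```

Under this model, the nice 19 task gets about 1.4% of the CPU when both
tasks share one group, but half of the CPU when each terminal is its own
autogroup. Within a single group (for example `grouped_shares([[0, 10],
[0]])`), the nice values still divide that group's share as expected.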

Put more succinctly: in Linux 2.6.38, autogrouping broke nice(1)
for many use cases.

Once I came to that simple summary it was easy to find multiple
reports of problems from users:

http://serverfault.com/questions/405092/nice-level-not-working-on-linux
http://superuser.com/questions/805599/nice-has-no-effect-in-linux-unless-the-same-shell-is-used
https://www.reddit.com/r/linux/comments/1c4jew/nice_has_no_effect/
http://stackoverflow.com/questions/10342470/process-niceness-priority-setting-has-no-effect-on-linux

Someone else quickly pointed out to me another such report:

https://bbs.archlinux.org/viewtopic.php?id=149553

And when I quickly surveyed a few more or less savvy Linux users
in one room, most understood what nice does, but none of them knew
about the behavior change wrought by autogroup.

I haven't looked at all of the mails in the old threads that
discussed the implementation of this feature, but so far none of
those that I saw mentioned this behavior change. It's unfortunate
that it never even got documented.

Cheers,

Michael


Subject: Re: RFC: documentation of the autogroup feature

[Resending because of bounces from the lists. (Somehow my mailer
messed up the MIME labeling)]

Hi Mike,

On 11/28/2016 02:46 AM, Mike Galbraith wrote:
> On Sun, 2016-11-27 at 22:13 +0100, Michael Kerrisk (man-pages) wrote:
>
>> Here's my attempt to define the root task group:
>>
>> * If autogrouping is disabled, then all processes in the root CPU
>> cgroup form a scheduling group (sometimes called the "root task
>> group").
>>
>> Can you improve on this?

The below is helpful, but...

> A task group is a set of percpu runqueues.

The explanation really needs to be in terms of what user-space
understands and sees. "Runqueues" are a kernel scheduler implementation
detail.

> The root task group is the
> top level set in a hierarchy of such sets when group scheduling is
> enabled, or the only set when group scheduling is not enabled. The
> autogroup hierarchy has a depth of one, i.e., all autogroups are peers
> whose common parent is the root task group.

Let's try and go further. How's this:

When scheduling non-real-time processes (i.e., those scheduled
under the SCHED_OTHER, SCHED_BATCH, and SCHED_IDLE policies), the
CFS scheduler employs a technique known as "group scheduling", if
the kernel was configured with the CONFIG_FAIR_GROUP_SCHED option
(which is typical).

Under group scheduling, threads are scheduled in "task groups".
Task groups have a hierarchical relationship, rooted under the
initial task group on the system, known as the "root task group".
Task groups are formed in the following circumstances:

* All of the threads in a CPU cgroup form a task group. The par‐
ent of this task group is the task group of the corresponding
parent cgroup.

* If autogrouping is enabled, then all of the threads that are
(implicitly) placed in an autogroup (i.e., the same session, as
created by setsid(2)) form a task group. Each new autogroup is
thus a separate task group. The root task group is the parent
of all such autogroups.

* If autogrouping is enabled, then the root task group consists
of all processes in the root CPU cgroup that were not otherwise
implicitly placed into a new autogroup.

* If autogrouping is disabled, then the root task group consists
of all processes in the root CPU cgroup.

* If group scheduling was disabled (i.e., the kernel was config‐
ured without CONFIG_FAIR_GROUP_SCHED), then all of the pro‐
cesses on the system are notionally placed in a single task
group.
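
As a rough illustration of how a user could check which of the above
cases applies to a given process, the sketch below reads the two /proc
files mentioned earlier in the draft. The helper names are hypothetical,
and the line format it parses ("/autogroup-N nice M") is an assumption
about what /proc/[pid]/autogroup contains; the code falls back to a
plain message if a file is missing or looks different.

```python
import re
from pathlib import Path

# Assumed format of /proc/[pid]/autogroup, e.g. "/autogroup-12 nice 0".
AUTOGROUP_LINE = re.compile(r"^/autogroup-(\d+) nice (-?\d+)$")


def parse_autogroup(text):
    """Parse an autogroup line into (group_id, nice), or None if no match."""
    match = AUTOGROUP_LINE.match(text.strip())
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def read_text_or_none(path):
    """Return the contents of a file, or None if it cannot be read."""
    try:
        return Path(path).read_text()
    except OSError:
        return None


def describe_autogroup(pid="self"):
    """Summarize the autogroup state and membership of a process."""
    enabled = read_text_or_none("/proc/sys/kernel/sched_autogroup_enabled")
    if enabled is None:
        return "autogroup files not found (kernel may lack CONFIG_SCHED_AUTOGROUP)"
    state = "enabled" if enabled.strip() == "1" else "disabled"
    parsed = parse_autogroup(read_text_or_none(f"/proc/{pid}/autogroup") or "")
    if parsed is None:
        return f"autogrouping {state}; membership could not be read"
    group_id, nice = parsed
    return f"autogrouping {state}; autogroup {group_id}, group nice {nice}"


if __name__ == "__main__":
    print(describe_autogroup())
```

Running this in two different terminal windows should report two
different autogroup numbers when autogrouping is enabled, since each
window normally starts its own session.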

[To be followed by a discussion of the nice value and task groups]

?

Cheers,

Michael



2016-11-29 11:47:38

by Peter Zijlstra

Subject: Re: RFC: documentation of the autogroup feature [v2]

On Tue, Nov 29, 2016 at 08:43:33AM +0100, Michael Kerrisk (man-pages) wrote:
> >
> > In any case, for the case of autogroup, the behaviour has always been,
> > autogroups came quite late.
>
> This ("the behavior has always been") isn't quite true. Yes, group
> scheduling has been around since Linux 2.6.24, but in terms of the
> semantics of the thread nice value, there was no visible change
> then, *unless* explicit action was taken to create cgroups.
>
> The arrival of autogroups in Linux 2.6.38 was different.
> With this feature enabled (which is the default), task

I don't think the SCHED_AUTOGROUP symbol is default y, most distros
might have default enabled it, but that's not something I can help.

> groups were implicitly created *without the user needing to
> do anything*. Thus, [two terminal windows] == [two task groups]
> and in those two terminal windows, nice(1) on a CPU-bound
> command in one terminal did nothing in terms of improving
> CPU access for CPU-bound tasks running in the other terminal
> window.
>
> Put more succinctly: in Linux 2.6.38, autogrouping broke nice(1)
> for many use cases.
>
> Once I came to that simple summary it was easy to find multiple
> reports of problems from users:
>
> http://serverfault.com/questions/405092/nice-level-not-working-on-linux
> http://superuser.com/questions/805599/nice-has-no-effect-in-linux-unless-the-same-shell-is-used
> https://www.reddit.com/r/linux/comments/1c4jew/nice_has_no_effect/
> http://stackoverflow.com/questions/10342470/process-niceness-priority-setting-has-no-effect-on-linux
>
> Someone else quickly pointed out to me another such report:
>
> https://bbs.archlinux.org/viewtopic.php?id=149553

Well, none of that ever got back to me, so again, nothing I could do
about that.

> And when I quickly surveyed a few more or less savvy Linux users
> in one room, most understood what nice does, but none of them knew
> about the behavior change wrought by autogroup.
>
> I haven't looked at all of the mails in the old threads that
> discussed the implementation of this feature, but so far none of
> those that I saw mentioned this behavior change. It's unfortunate
> that it never even got documented.

Well, when we added the feature people (most notably Linus) understood
what cgroups did. So no surprises for any of us.

Subject: Re: RFC: documentation of the autogroup feature [v2]

Hi Peter,

On 29 November 2016 at 12:46, Peter Zijlstra <[email protected]> wrote:
> On Tue, Nov 29, 2016 at 08:43:33AM +0100, Michael Kerrisk (man-pages) wrote:
>> >
>> > In any case, for the case of autogroup, the behaviour has always been,
>> > autogroups came quite late.
>>
>> This ("the behavior has always been") isn't quite true. Yes, group
>> scheduling has been around since Linux 2.6.24, but in terms of the
>> semantics of the thread nice value, there was no visible change
>> then, *unless* explicit action was taken to create cgroups.
>>
>> The arrival of autogroups in Linux 2.6.38 was different.
>> With this feature enabled (which is the default), task
>
> I don't think the SCHED_AUTOGROUP symbol is default y, most distros
> might have default enabled it, but that's not something I can help.

Actually, it looks to me like it is the default. But that isn't really
the point. Even if the default was off, it's the way of things that
distros will generally turn such features on by default, because some
users want them. That's a repeated and to-be-expected pattern.

>> groups were implicitly created *without the user needing to
>> do anything*. Thus, [two terminal windows] == [two task groups]
>> and in those two terminal windows, nice(1) on a CPU-bound
>> command in one terminal did nothing in terms of improving
>> CPU access for CPU-bound tasks running in the other terminal
>> window.
>>
>> Put more succinctly: in Linux 2.6.38, autogrouping broke nice(1)
>> for many use cases.
>>
>> Once I came to that simple summary it was easy to find multiple
>> reports of problems from users:
>>
>> http://serverfault.com/questions/405092/nice-level-not-working-on-linux
>> http://superuser.com/questions/805599/nice-has-no-effect-in-linux-unless-the-same-shell-is-used
>> https://www.reddit.com/r/linux/comments/1c4jew/nice_has_no_effect/
>> http://stackoverflow.com/questions/10342470/process-niceness-priority-setting-has-no-effect-on-linux
>>
>> Someone else quickly pointed out to me another such report:
>>
>> https://bbs.archlinux.org/viewtopic.php?id=149553
>
> Well, none of that ever got back to me, so again, nothing I could do
> about that.

I understand. It's just unfortunate that (as far as I can see) the
implications were not fully considered before making the change. Such
consideration often springs out of writing comprehensive
documentation, I find ;-).

>> And when I quickly surveyed a few more or less savvy Linux users
>> in one room, most understood what nice does, but none of them knew
>> about the behavior change wrought by autogroup.
>>
>> I haven't looked at all of the mails in the old threads that
>> discussed the implementation of this feature, but so far none of
>> those that I saw mentioned this behavior change. It's unfortunate
>> that it never even got documented.
>
> Well, when we added the feature people (most notably Linus) understood
> what cgroups did. So no surprises for any of us.

Sure, but cgroups is different. It requires explicit action by the
user (creating cgroups) to see the behavior.

With autogroups, the change kicks in on the desktop without the user
needing to do anything, and changes desktop behavior in a way that was
unexpected.

Cheers,

Michael



2016-11-29 13:48:20

by Mike Galbraith

Subject: Re: RFC: documentation of the autogroup feature

On Tue, 2016-11-29 at 10:10 +0100, Michael Kerrisk (man-pages) wrote:
> Let's try and go further. How's this:
>
> When scheduling non-real-time processes (i.e., those scheduled
> under the SCHED_OTHER, SCHED_BATCH, and SCHED_IDLE policies), the
> CFS scheduler employs a technique known as "group scheduling", if
> the kernel was configured with the CONFIG_FAIR_GROUP_SCHED option
> (which is typical).
>
> Under group scheduling, threads are scheduled in "task groups".
> Task groups have a hierarchical relationship, rooted under the
> initial task group on the system, known as the "root task group".
> Task groups are formed in the following circumstances:
>
> * All of the threads in a CPU cgroup form a task group. The parent
> of this task group is the task group of the corresponding
> parent cgroup.
>
> * If autogrouping is enabled, then all of the threads that are
> (implicitly) placed in an autogroup (i.e., the same session, as
> created by setsid(2)) form a task group. Each new autogroup is
> thus a separate task group. The root task group is the parent
> of all such autogroups.
>
> * If autogrouping is enabled, then the root task group consists
> of all processes in the root CPU cgroup that were not otherwise
> implicitly placed into a new autogroup.
>
> * If autogrouping is disabled, then the root task group consists
> of all processes in the root CPU cgroup.
>
> * If group scheduling was disabled (i.e., the kernel was configured
> without CONFIG_FAIR_GROUP_SCHED), then all of the processes
> on the system are notionally placed in a single task
> group.

Notionally works for me.

-Mike