Subject: cgroups(7): documenting cgroups v2 thread mode

Hello Tejun and all,

To date, the cgroups(7) manual page does not document thread mode
(added in Linux 4.14). Furthermore, the documentation in
Documentation/cgroup-v2.txt is, I think, a little thin.

I have attempted to address this by adding some extensive documentation
to the cgroups(7) manual page. This text is based on some reading
of Documentation/cgroup-v2.txt, reading of the kernel source, and
quite a lot of experimentation.

The plain-text version (for easy review) is shown below. I would be
happy to receive review comments/corrections/improvements on it.

In particular, Tejun and Peter, I would be very happy if you could
take some time to look at this text.

The branch containing the pending cgroups(7) changes can be found at:
https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/log/?h=draft_cgroup_updates

[[
CGROUPS V2 THREAD MODE
Among the restrictions imposed by cgroups v2 that were not
present in cgroups v1 are the following:

* No thread-granularity control: all of the threads of a
process must be in the same cgroup.

* No internal processes: a cgroup can't both have member pro‐
cesses and exercise controllers on child cgroups.

Both of these restrictions were added because the lack of these
restrictions had caused problems in cgroups v1. In particular,
the cgroups v1 ability to allow thread-level granularity for
cgroup membership made no sense for some controllers. (A
notable example was the memory controller: since threads share
an address space, it made no sense to split threads across dif‐
ferent memory cgroups.)

Notwithstanding the initial design decision in cgroups v2,
there were use cases for certain controllers, notably the cpu
controller, for which thread-level granularity of control was
meaningful and useful. To accommodate such use cases, Linux
4.14 added thread mode for cgroups v2.

Thread mode allows the following:

* The creation of threaded subtrees in which the threads of a
process may be spread across cgroups inside the tree. (A
threaded subtree may contain multiple multithreaded pro‐
cesses.)

* The concept of threaded controllers, which can distribute
resources across the cgroups in a threaded subtree.

* A relaxation of the "no internal processes" rule, so that,
within a threaded subtree, a cgroup can both contain member
threads and exercise resource control over child cgroups.

With the addition of thread mode, each nonroot cgroup now con‐
tains a new file, cgroup.type, that exposes, and in some cir‐
cumstances can be used to change, the "type" of a cgroup. This
file contains one of the following type values:

domain This is a normal v2 cgroup that provides process-granu‐
larity control. If a process is a member of this
cgroup, then all threads of the process are (by defini‐
tion) in the same cgroup. This is the default cgroup
type, and provides the same behavior that was provided
for cgroups in the initial cgroups v2 implementation.

threaded
This cgroup is a member of a threaded subtree. Threads
can be added to this cgroup, and controllers can be
enabled for the cgroup.

domain threaded
This is a domain cgroup that serves as the root of a
threaded subtree. This cgroup type is also known as
"threaded root".

domain invalid
This is a cgroup inside a threaded subtree that is in an
"invalid" state. Processes can't be added to the
cgroup, and controllers can't be enabled for the cgroup.
The only thing that can be done with this cgroup (other
than deleting it) is to convert it to a threaded cgroup
by writing the string "threaded" to the cgroup.type
file.
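
As an illustration of the type values listed above, here is a minimal Python sketch that reads the cgroup.type file of a cgroup. The function name is hypothetical, and treating a missing file as "this is the root cgroup" is an assumption of the sketch (the file is also absent on kernels without thread mode).

```python
import os

# The four values that cgroup.type can contain, as listed above.
KNOWN_TYPES = {"domain", "threaded", "domain threaded", "domain invalid"}

def read_cgroup_type(cgroup_dir):
    """Return the type string of the cgroup at cgroup_dir.

    Returns None if there is no cgroup.type file; this sketch assumes
    that means cgroup_dir is the root cgroup.
    """
    path = os.path.join(cgroup_dir, "cgroup.type")
    try:
        with open(path) as f:
            value = f.read().strip()
    except FileNotFoundError:
        return None
    if value not in KNOWN_TYPES:
        raise ValueError(f"unexpected cgroup type: {value!r}")
    return value
```

On a system with the v2 hierarchy mounted at /sys/fs/cgroup (an assumption), a call such as read_cgroup_type("/sys/fs/cgroup/y") would return one of the four strings.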

Threaded versus domain controllers
With the addition of thread mode, cgroups v2 now distinguishes
two types of resource controllers:

* Threaded controllers: these controllers support thread-gran‐
ularity for resource control and can be enabled inside
threaded subtrees, with the result that the corresponding
controller-interface files appear inside the cgroups in the
threaded subtree. As at Linux 4.15, the following con‐
trollers are threaded: cpu, perf_event, and pids.

* Domain controllers: these controllers support only process
granularity for resource control. From the perspective of a
domain controller, all threads of a process are always in
the same cgroup. Domain controllers can't be enabled inside
a threaded subtree.
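
To make the distinction concrete, the following small Python sketch splits a space-separated controller list (the format used by the cgroup.controllers file) into threaded and domain controllers. The set of threaded controllers is hard-coded from the list given above for Linux 4.15 and would need updating for other kernel versions; the function name is hypothetical.

```python
# Threaded controllers as of Linux 4.15, per the text above.
THREADED_CONTROLLERS = {"cpu", "perf_event", "pids"}

def split_controllers(controllers_line):
    """Split a space-separated controller list into (threaded, domain)."""
    names = controllers_line.split()
    threaded = [n for n in names if n in THREADED_CONTROLLERS]
    domain = [n for n in names if n not in THREADED_CONTROLLERS]
    return threaded, domain
```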

Creating a threaded subtree
There are two pathways that lead to the creation of a threaded
subtree. The first pathway proceeds as follows:

1. We write the string "threaded" to the cgroup.type file of a
cgroup y/z that currently has the type domain. This has the
following effects:

* The type of the cgroup y/z becomes threaded.

* The type of the parent cgroup, y, becomes domain
threaded. The parent cgroup is the root of a threaded
subtree (also known as the "threaded root").

* All other cgroups under y that were not already of type
threaded (because they were inside already existing
threaded subtrees under the new threaded root) are con‐
verted to type domain invalid. Any subsequently created
cgroups under y will also have the type domain invalid.

2. We write the string "threaded" to each of the domain invalid
cgroups under y, in order to convert them to the type
threaded. As a consequence of this step, all cgroups under
the threaded root now have the type threaded and the
threaded subtree is now fully usable. The requirement to
write "threaded" to each of these cgroups is somewhat cum‐
bersome, but allows for possible future extensions to the
thread-mode model.

┌─────────────────────────────────────────────────────┐
│FIXME │
├─────────────────────────────────────────────────────┤
│Re the preceding paragraphs... Are there other rea‐ │
│sons for the (cumbersome) requirement to write      │
│'threaded' to each of the cgroup.type files in the │
│threaded subtrees? Tejun Heo mentioned the follow‐ │
│ing: │
│ │
│ Consistency w/ the cgroups right under the root │
│ cgroup. Because they can be both domains and │
│ threadroots, we can't switch the children over │
│ to thread mode automatically. Doing that for │
│ cgroups further down in the hierarchy would be │
│ really inconsistent. │
│ │
│But, it's not clear to me how "Doing that for │
│cgroups further down in the hierarchy would be │
│really inconsistent", since in the current implemen‐ │
│tation, those same thread groups are converted to │
│"domain invalid" type. What am I missing? │
└─────────────────────────────────────────────────────┘

The second way of creating a threaded subtree is as follows:

1. In an existing cgroup, z, that currently has the type
domain, we (1) enable one or more threaded controllers and
(2) make a process a member of z. (These two steps can be
done in either order.) This has the following consequences:

* The type of z becomes domain threaded.

* All of the descendant cgroups of z that were not
already of type threaded are converted to type domain
invalid.

2. As before, we make the threaded subtree usable by writing
the string "threaded" to each of the domain invalid cgroups
under z, in order to convert them to the type threaded.

One of the consequences of the above pathways to creating a
threaded subtree is that the threaded root cgroup can be a par‐
ent only to threaded (and domain invalid) cgroups. The
threaded root cgroup can't be a parent of a domain cgroup, and
a threaded cgroup can't have a sibling that is a domain cgroup.
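
The type changes in the first pathway can be sketched as a purely in-memory Python model. This is not kernel code: it tracks only the type strings, checks none of the constraints on creating a threaded subtree, and all names in it are hypothetical.

```python
def start_threaded_subtree(types, child):
    """Model step 1: "threaded" is written to the domain cgroup `child`.

    `types` maps cgroup paths such as "y/z" to their type strings.
    """
    if "/" not in child:
        raise ValueError("model needs a child with a nonroot parent")
    parent = child.rsplit("/", 1)[0]
    if types[child] != "domain":
        raise ValueError("model covers only a child of type domain")
    types[child] = "threaded"
    types[parent] = "domain threaded"
    # Everything else under the new threaded root that is not already
    # threaded becomes domain invalid.
    for path in list(types):
        if path.startswith(parent + "/") and types[path] != "threaded":
            types[path] = "domain invalid"

def finish_threaded_subtree(types, root):
    """Model step 2: convert each domain invalid cgroup, top-down."""
    for path in sorted(types, key=lambda p: p.count("/")):
        if path.startswith(root + "/") and types[path] == "domain invalid":
            types[path] = "threaded"
```

For example, starting from four domain cgroups y, y/z, y/w, and y/w/v, calling start_threaded_subtree(types, "y/z") leaves y as domain threaded, y/z as threaded, and the other two as domain invalid until finish_threaded_subtree(types, "y") is applied.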

Using a threaded subtree
Within a threaded subtree, threaded controllers can be enabled
in each subgroup whose type has been changed to threaded; upon
doing so, the corresponding controller interface files appear
in the children of that cgroup.

A process can be moved into a threaded subtree by writing its
PID to the cgroup.procs file in one of the cgroups inside the
tree. This has the effect of making all of the threads in the
process members of the corresponding cgroup and makes the
process a member of the threaded subtree. The threads of the
process can then be spread across the threaded subtree by writ‐
ing their thread IDs (see gettid(2)) to the cgroup.threads
files in different cgroups inside the subtree. The threads of
a process must all reside in the same threaded subtree.
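
The sequence just described (move the process in, then spread its threads) might look as follows in Python. This is a sketch under stated assumptions: the helper names are hypothetical, the caller supplies the cgroup directory paths and thread IDs, and one ID is written per open of the file.

```python
import os

def move_process(cgroup_dir, pid):
    """Move all threads of the process into cgroup_dir via cgroup.procs."""
    with open(os.path.join(cgroup_dir, "cgroup.procs"), "w") as f:
        f.write(str(pid))

def move_thread(cgroup_dir, tid):
    """Move a single thread into cgroup_dir via cgroup.threads."""
    with open(os.path.join(cgroup_dir, "cgroup.threads"), "w") as f:
        f.write(str(tid))

def spread_threads(pid, entry_cgroup, tids_by_cgroup):
    """Place the process in entry_cgroup, then distribute its threads.

    tids_by_cgroup maps a cgroup directory inside the same threaded
    subtree to the list of thread IDs that should end up there.
    """
    move_process(entry_cgroup, pid)
    for cgroup_dir, tids in tids_by_cgroup.items():
        for tid in tids:
            move_thread(cgroup_dir, tid)
```

The thread IDs of a process can be found by listing the /proc/PID/task directory.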

The cgroup.threads file is present in each cgroup (including
domain cgroups) and can be read in order to discover the set of
threads that is present in the cgroup. The set of thread IDs
obtained when reading this file is not guaranteed to be ordered
or free of duplicates.

The cgroup.procs file in the threaded root shows the PIDs of
all processes that are members of the threaded subtree. The
cgroup.procs files in the other cgroups in the subtree are not
readable.

Domain controllers can't be enabled in a threaded subtree; no
controller-interface files appear inside the cgroups underneath
the threaded root. From the point of view of a domain con‐
troller, threaded subtrees are invisible: a multithreaded
process inside a threaded subtree appears to a domain con‐
troller as a process that resides in the threaded root cgroup.

Within a threaded subtree, the "no internal processes" rule
does not apply: a cgroup can both contain member processes (or
threads) and exercise controllers on child cgroups.

Rules for writing to cgroup.type and creating threaded subtrees
A number of rules apply when writing to the cgroup.type file:

* Only the string "threaded" may be written. In other words,
the only explicit transition that is possible is to convert
a domain cgroup to type threaded.

* The string "threaded" can be written only if the current
value in cgroup.type is one of the following:

· domain, to start the creation of a threaded subtree via
the first of the pathways described above;

· domain invalid, to convert one of the cgroups in a
threaded subtree into a usable (i.e., threaded) state;

· threaded, which has no effect (a "no-op").

* We can't write to a cgroup.type file if the parent's type is
domain invalid. In other words, the cgroups of a threaded
subtree must be converted to the threaded state in a top-
down manner.
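
The three rules in the list above can be restated as a small Python predicate. It encodes only those rules as written here, not the further constraints on creating a threaded subtree, and the function name is hypothetical.

```python
# Types from which writing "threaded" is accepted, per the list above.
WRITABLE_FROM = {"domain", "domain invalid", "threaded"}

def may_write_type(value, current_type, parent_type):
    """Return True if writing `value` to cgroup.type passes the rules.

    parent_type is the parent's type string, or None when the parent
    is the root cgroup (which has no cgroup.type file).
    """
    if value != "threaded":
        return False  # only the string "threaded" may be written
    if current_type not in WRITABLE_FROM:
        return False  # e.g., the cgroup is of type "domain threaded"
    if parent_type == "domain invalid":
        return False  # conversion must proceed top-down
    return True
```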

There are also various constraints that must be satisfied in
order to create a threaded subtree rooted at the cgroup x:

* There can be no member processes in the descendant cgroups
of x. (The cgroup x can itself have member processes.)

* No domain controllers may be enabled in x's cgroup.sub‐
tree_control file.

* The existing cgroups inside the threaded subtree must either
be of type domain or part of (unpopulated) threaded sub‐
trees.

If any of the above constraints is violated, then an attempt to
write "threaded" to a cgroup.type file fails with the error
ENOTSUP.

The "domain threaded" cgroup type
According to the pathways described above, the type of a cgroup
can change to domain threaded in either of the following cases:

* The string "threaded" is written to the cgroup.type file of a
child cgroup.

* A threaded controller is enabled inside the cgroup and a
process is made a member of the cgroup.

A domain threaded cgroup, x, can revert to the type domain if
the above conditions no longer hold true—that is, if all
threaded child cgroups of x are removed and either x no longer
has threaded controllers enabled or no longer has member pro‐
cesses.

When a domain threaded cgroup x reverts to the type domain:

* All domain invalid descendants of x that are not in lower-
level threaded subtrees revert to the type domain.

* The root cgroups in any lower-level threaded subtrees revert
to the type domain threaded.

Exceptions for the root cgroup
The root cgroup of the v2 hierarchy is treated exceptionally:
it can be the parent of both domain and threaded cgroups. If
the string "threaded" is written to the cgroup.type file of one
of the children of the root cgroup, then

* The type of that cgroup becomes threaded.

* The type of any descendants of that cgroup that are not part
of lower-level threaded subtrees changes to domain invalid.

Note that in this case, there is no cgroup whose type becomes
domain threaded. (Notionally, the root cgroup can be consid‐
ered as the threaded root for the cgroup whose type was changed
to threaded.)

The aim of this exceptional treatment for the root cgroup is to
allow a threaded cgroup that employs the cpu controller to be
placed as high as possible in the hierarchy, so as to minimize
the (small) cost of traversing the cgroup hierarchy.

The cgroups v2 "cpu" controller and realtime processes
As at Linux 4.15, the cgroups v2 cpu controller does not sup‐
port control of realtime processes, and the controller can be
enabled in the root cgroup only if all realtime threads are in
the root cgroup. (If there are realtime processes in nonroot
cgroups, then a write(2) of the string "+cpu" to the
cgroup.subtree_control file fails with the error EINVAL.)
However, on some systems, systemd(1) places certain realtime
processes in nonroot cgroups in the v2 hierarchy. On such
systems, these processes must first be moved to the root cgroup
before the cpu controller can be enabled.
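
A program that enables the cpu controller may want to detect this failure explicitly. The following Python sketch writes "+cpu" to a cgroup.subtree_control file and reports the EINVAL case separately from other errors; the function name is hypothetical, and the sketch does not itself move any realtime processes.

```python
import errno
import os

def enable_cpu_controller(cgroup_dir):
    """Try to enable the cpu controller for the children of cgroup_dir.

    Returns True on success and False if the write fails with EINVAL
    (which, per the text above, can indicate realtime processes in
    nonroot cgroups). Other errors are propagated to the caller.
    """
    path = os.path.join(cgroup_dir, "cgroup.subtree_control")
    try:
        with open(path, "w") as f:
            f.write("+cpu")
    except OSError as e:
        if e.errno == errno.EINVAL:
            return False
        raise
    return True
```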
]]

Cheers,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


2018-01-09 21:10:21

by Tejun Heo

Subject: Re: cgroups(7): documenting cgroups v2 thread mode

Hello,

On Tue, Jan 02, 2018 at 07:24:01PM +0100, Michael Kerrisk (man-pages) wrote:
> 2. We write the string "threaded" to each of the domain invalid
> cgroups under y, in order to convert them to the type
> threaded. As a consequence of this step, all threads under
> the threaded root now have the type threaded and the
> threaded subtree is now fully usable. The requirement to
> write "threaded" to each of these cgroups is somewhat cum‐
> bersome, but allows for possible future extensions to the
> thread-mode model.
>
> ┌─────────────────────────────────────────────────────┐
> │FIXME │
> ├─────────────────────────────────────────────────────┤
> │Re the preceding paragraphs... Are there other rea‐ │
> │sons for the (cumbersome) requirement to write      │
> │'threaded' to each of the cgroup.type files in the │
> │threaded subtrees? Tejun Heo mentioned the follow‐ │
> │ing: │
> │ │
> │ Consistency w/ the cgroups right under the root │
> │ cgroup. Because they can be both domains and │
> │ threadroots, we can't switch the children over │
> │ to thread mode automatically. Doing that for │
> │ cgroups further down in the hierarchy would be │
> │ really inconsistent. │
> │ │
> │But, it's not clear to me how "Doing that for │
> │cgroups further down in the hierarchy would be │
> │really inconsistent", since in the current implemen‐ │
> │tation, those same thread groups are converted to │
> │"domain invalid" type. What am I missing? │
> └─────────────────────────────────────────────────────┘

Yeah, I was confused with an earlier variant where we were marking
threaded domains instead of threaded roots. It's mostly about future
extensibility (especially as Waiman was proposing related changes
there) and not doing things automatically / recursively if possible.

Looks good to me.

Thanks.

--
tejun

Subject: Re: cgroups(7): documenting cgroups v2 thread mode

On 01/09/2018 10:10 PM, Tejun Heo wrote:
> Yeah, I was confused with an earlier variant where we were marking
> threaded domains instead of threaded roots. It's mostly about future
> extensibility (especially as Waiman was proposing related changes
> there) and not doing things automatically / recursively if possible.

Okay.

> Looks good to me.

Thanks for the review.

Cheers,

Michael


Subject: Re: cgroups(7): documenting cgroups v2 thread mode

On 01/09/2018 10:10 PM, Tejun Heo wrote:
> Yeah, I was confused with an earlier variant where we were marking
> threaded domains instead of threaded roots. It's mostly about future
> extensibility (especially as Waiman was proposing related changes
> there) and not doing things automatically / recursively if possible.
>
> Looks good to me.

One more thing. I added the following sentence to the text:

The cgroup.threads file is writable only for the cgroups inside a
threaded subtree.

Can you confirm that that is correct, please.

Cheers,

Michael


2018-01-10 14:47:14

by Tejun Heo

Subject: Re: cgroups(7): documenting cgroups v2 thread mode

Hello,

On Tue, Jan 09, 2018 at 11:54:03PM +0100, Michael Kerrisk (man-pages) wrote:
> One more thing. I added the following sentence to the text:
>
> The cgroup.threads file is writable only for the cgroups inside a
> threaded subtree.
>
> Can you confirm that that is correct, please.

The only extra restriction is that the domain cgroup must be the same
for the source and destination, which is true for the entire threaded
subtree (the threaded domain). As each domain cgroup is its own
unique domain, cgroup.threads in them would only allow migrating to
self which is a noop; otherwise, it'd return -EOPNOTSUPP.

Thanks.

--
tejun

Subject: Re: cgroups(7): documenting cgroups v2 thread mode

Hello Tejun,

On 01/10/2018 03:47 PM, Tejun Heo wrote:
> Hello,
>
> On Tue, Jan 09, 2018 at 11:54:03PM +0100, Michael Kerrisk (man-pages) wrote:
>> One more thing. I added the following sentence to the text:
>>
>> The cgroup.threads file is writable only for the cgroups inside a
>> threaded subtree.
>>
>> Can you confirm that that is correct, please.
>
> The only extra restriction is that the domain cgroup must be the same
> for the source and destination, which is true for the entire threaded
> subtree (the threaded domain). As each domain cgroup is its own
> unique domain, cgroup.threads in them would only allow migrating to
> self which is a noop; otherwise, it'd return -EOPNOTSUPP.

Ahh yes. Now I understand. I made the description of the containment
rules for cgroup.threads more explicit in the text:

As with writing to cgroup.procs, some containment rules apply when
writing to the cgroup.threads file:

* The writer must have write permission on the cgroup.threads
file in the destination cgroup.

* The writer must have write permission on the cgroup.procs file
in the common ancestor of the source and destination cgroups.
(In some cases, the common ancestor may be the source or desti‐
nation cgroup itself.)

* The source and destination cgroups must be in the same threaded
subtree. (Outside a threaded subtree, an attempt to move a
thread by writing its thread ID to the cgroup.threads in a dif‐
ferent domain cgroup fails with the error EOPNOTSUPP.)

Okay? (I realize that the last bullet point is a rather different way of
formulating your idea that "the only extra restriction is that the domain
cgroup must be the same for the source and destination". But I think the
reformulation is easier to understand, no?)
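
For what it's worth, the three rules above could be pre-checked from user space roughly as in the following Python sketch. It is only advisory (the kernel performs the real checks when the write happens), the function name is hypothetical, and it assumes the caller already knows the path of the threaded root.

```python
import os

def check_thread_move(src, dst, threaded_root):
    """Return a list of rules that a move from src to dst would violate.

    src, dst, and threaded_root are cgroup directory paths. An empty
    list means that none of the three rules is obviously violated.
    """
    src, dst, root = (os.path.abspath(p) for p in (src, dst, threaded_root))
    problems = []
    if not os.access(os.path.join(dst, "cgroup.threads"), os.W_OK):
        problems.append("no write permission on destination cgroup.threads")
    ancestor = os.path.commonpath([src, dst])
    if not os.access(os.path.join(ancestor, "cgroup.procs"), os.W_OK):
        problems.append("no write permission on common ancestor cgroup.procs")
    for cgroup in (src, dst):
        # A cgroup is inside the subtree if the threaded root is a
        # path prefix of it (or is the cgroup itself).
        if os.path.commonpath([cgroup, root]) != root:
            problems.append(f"{cgroup} is outside the threaded subtree")
    return problems
```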

Cheers,

Michael


2018-01-10 22:29:18

by Tejun Heo

Subject: Re: cgroups(7): documenting cgroups v2 thread mode

Hello,

On Wed, Jan 10, 2018 at 11:18:48PM +0100, Michael Kerrisk (man-pages) wrote:
> Ahh yes. Now I understand. I made the description of the containment
> rules for cgroup.threads more explicit in the text:
>
> As with writing to cgroup.procs, some containment rules apply when
> writing to the cgroup.threads file:
>
> * The writer must have write permission on the cgroup.threads
> file in the destination cgroup.
>
> * The writer must have write permission on the cgroup.procs file
> in the common ancestor of the source and destination cgroups.
> (In some cases, the common ancestor may be the source or desti‐
> nation cgroup itself.)
>
> * The source and destination cgroups must be in the same threaded
> subtree. (Outside a threaded subtree, an attempt to move a
> thread by writing its thread ID to the cgroup.threads in a dif‐
> ferent domain cgroup fails with the error EOPNOTSUPP.)
>
> Okay? (I realize that the last bullet point is a rather different way of
> formulating your idea that "the only extra restriction is that the domain
> cgroup must be the same for the source and destination". But I think the
> reformulation is easier to understand, no?)

It looks great to me. Me explaining that way is mostly from internal
/ conceptual POV. Yours is definitely more approachable.

Thanks.

--
tejun