Hello Tejun,
I've discovered what looks to be a bug in the reporting
of PIDs in the cgroup.procs file in the "domain threaded"
node at the root of a threaded subtree. The following
demo is on vanilla kernel 4.19.
Suppose we have the following multithreaded process:
$ ps -L 654
PID LWP TTY STAT TIME COMMAND
654 654 pts/12 Tl 0:00 ./cpu_multithread_burner 100
654 655 pts/12 Tl 0:01 ./cpu_multithread_burner 100
Now suppose we create a threaded subtree in the v2 hierarchy:
# cd /sys/fs/cgroup/unified/
# mkdir -p x/a/b
# echo 'threaded' > x/a/cgroup.type
# echo 'threaded' > x/a/b/cgroup.type
Then we move the multithreaded process into x/a in
the threaded subtree:
# echo 654 > x/a/cgroup.procs
Now we visualize the set-up using my visualization
program[1]:
# go run ~mtk/lsp/cgroups/view_v2_cgroups.go x
x [dt]
PIDs: {654}
a [t]
TIDs: {654 655-[654]}
b [t]
The above is as I expect.
Now, we move the thread group leader (it has to be the
thread group leader to show the bug) to a x/a/b, and again
use my visualization program:
# echo 654 > x/a/b/cgroup.threads
# go run ~mtk/lsp/cgroups/view_v2_cgroups.go x
x [dt]
PIDs: {654 655}
a [t]
TIDs: {655-[654]}
b [t]
TIDs: {654}
Note how the *thread ID* of the non-thread-group-leader
(655) is being reported in the x/cgroup.procs!
And just to verify that this is not a bug in my
visualization program:
# cat x/cgroup.procs
655
654
Your thoughts?
Thanks,
Michael
[1] https://www.spinics.net/lists/cgroups/msg20710.html
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
CSS_TASK_ITER_PROCS implements process-only iteration by making
css_task_iter_advance() skip tasks which aren't threadgroup leaders;
however, when an iteration is started css_task_iter_start() calls the
inner helper function css_task_iter_advance_css_set() instead of
css_task_iter_advance(). As the helper doesn't have the skip logic,
when the first task to visit is a non-leader thread, it doesn't get
skipped correctly as shown in the following example.
# ps -L 2030
PID LWP TTY STAT TIME COMMAND
2030 2030 pts/0 Sl+ 0:00 ./test-thread
2030 2031 pts/0 Sl+ 0:00 ./test-thread
# mkdir -p /sys/fs/cgroup/x/a/b
# echo threaded > /sys/fs/cgroup/x/a/cgroup.type
# echo threaded > /sys/fs/cgroup/x/a/b/cgroup.type
# echo 2030 > /sys/fs/cgroup/x/a/cgroup.procs
# cat /sys/fs/cgroup/x/a/cgroup.threads
2030
2031
# cat /sys/fs/cgroup/x/cgroup.procs
2030
# echo 2030 > /sys/fs/cgroup/x/a/b/cgroup.threads
# cat /sys/fs/cgroup/x/cgroup.procs
2031
2030
The last read of cgroup.procs is incorrectly showing non-leader 2031
in cgroup.procs output.
This can be fixed by updating css_task_iter_advance() to handle the
first advance and css_task_iters_tart() to call
css_task_iter_advance() instead of the inner helper. After the fix,
the same commands result in the following (correct) result:
# ps -L 2062
PID LWP TTY STAT TIME COMMAND
2062 2062 pts/0 Sl+ 0:00 ./test-thread
2062 2063 pts/0 Sl+ 0:00 ./test-thread
# mkdir -p /sys/fs/cgroup/x/a/b
# echo threaded > /sys/fs/cgroup/x/a/cgroup.type
# echo threaded > /sys/fs/cgroup/x/a/b/cgroup.type
# echo 2062 > /sys/fs/cgroup/x/a/cgroup.procs
# cat /sys/fs/cgroup/x/a/cgroup.threads
2062
2063
# cat /sys/fs/cgroup/x/cgroup.procs
2062
# echo 2062 > /sys/fs/cgroup/x/a/b/cgroup.threads
# cat /sys/fs/cgroup/x/cgroup.procs
2062
Signed-off-by: Tejun Heo <[email protected]>
Reported-by: "Michael Kerrisk (man-pages)" <[email protected]>
Fixes: 8cfd8147df67 ("cgroup: implement cgroup v2 thread support")
Cc: [email protected] # v4.14+
---
kernel/cgroup/cgroup.c | 29 +++++++++++++++++------------
1 file changed, 17 insertions(+), 12 deletions(-)
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 6aaf5dd5383b..1f84977fab47 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -4202,20 +4202,25 @@ static void css_task_iter_advance(struct css_task_iter *it)
lockdep_assert_held(&css_set_lock);
repeat:
- /*
- * Advance iterator to find next entry. cset->tasks is consumed
- * first and then ->mg_tasks. After ->mg_tasks, we move onto the
- * next cset.
- */
- next = it->task_pos->next;
+ if (it->task_pos) {
+ /*
+ * Advance iterator to find next entry. cset->tasks is
+ * consumed first and then ->mg_tasks. After ->mg_tasks,
+ * we move onto the next cset.
+ */
+ next = it->task_pos->next;
- if (next == it->tasks_head)
- next = it->mg_tasks_head->next;
+ if (next == it->tasks_head)
+ next = it->mg_tasks_head->next;
- if (next == it->mg_tasks_head)
+ if (next == it->mg_tasks_head)
+ css_task_iter_advance_css_set(it);
+ else
+ it->task_pos = next;
+ } else {
+ /* called from start, proceed to the first cset */
css_task_iter_advance_css_set(it);
- else
- it->task_pos = next;
+ }
/* if PROCS, skip over tasks which aren't group leaders */
if ((it->flags & CSS_TASK_ITER_PROCS) && it->task_pos &&
@@ -4255,7 +4260,7 @@ void css_task_iter_start(struct cgroup_subsys_state *css, unsigned int flags,
it->cset_head = it->cset_pos;
- css_task_iter_advance_css_set(it);
+ css_task_iter_advance(it);
spin_unlock_irq(&css_set_lock);
}
On Thu, Nov 08, 2018 at 12:15:15PM -0800, Tejun Heo wrote:
> CSS_TASK_ITER_PROCS implements process-only iteration by making
> css_task_iter_advance() skip tasks which aren't threadgroup leaders;
> however, when an iteration is started css_task_iter_start() calls the
> inner helper function css_task_iter_advance_css_set() instead of
> css_task_iter_advance(). As the helper doesn't have the skip logic,
> when the first task to visit is a non-leader thread, it doesn't get
> skipped correctly as shown in the following example.
Applied to cgroup/for-4.20-fixes.
Thanks.
--
tejun