2022-06-10 16:39:04

by Pavel Tikhomirov

Subject: [PATCH 0/2] Introduce CABA helper process tree

Please see "Add CABA tree to task_struct" for a deeper explanation, and
"tests: Add CABA selftest" for a small test and an actual case for which
we might need CABA.

The original problem of restoring a process tree with complex sessions
could probably be solved by allowing a session to be copied, as we
already do for process groups, but I'm not sure that would be secure
enough, or that another similar resource would not appear in the future.

CABA is useful beyond restoring processes in CRIU: in normal operation,
when processes detach, CABA helps to understand where in the process
tree they were originally started, e.g. from sshd, crond or something
else.

Hope my idea is not completely insane =)

CC: Eric Biederman <[email protected]>
CC: Kees Cook <[email protected]>
CC: Alexander Viro <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Peter Zijlstra <[email protected]>
CC: Juri Lelli <[email protected]>
CC: Vincent Guittot <[email protected]>
CC: Dietmar Eggemann <[email protected]>
CC: Steven Rostedt <[email protected]>
CC: Ben Segall <[email protected]>
CC: Mel Gorman <[email protected]>
CC: Daniel Bristot de Oliveira <[email protected]>
CC: Valentin Schneider <[email protected]>
CC: Andrew Morton <[email protected]>
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: [email protected]

Pavel Tikhomirov (2):
Add CABA tree to task_struct
tests: Add CABA selftest

arch/ia64/kernel/mca.c | 3 +
fs/exec.c | 1 +
fs/proc/array.c | 18 +
include/linux/sched.h | 7 +
init/init_task.c | 3 +
kernel/exit.c | 50 ++-
kernel/fork.c | 4 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/caba/.gitignore | 1 +
tools/testing/selftests/caba/Makefile | 7 +
tools/testing/selftests/caba/caba_test.c | 501 +++++++++++++++++++++++
tools/testing/selftests/caba/config | 1 +
12 files changed, 591 insertions(+), 6 deletions(-)
create mode 100644 tools/testing/selftests/caba/.gitignore
create mode 100644 tools/testing/selftests/caba/Makefile
create mode 100644 tools/testing/selftests/caba/caba_test.c
create mode 100644 tools/testing/selftests/caba/config

--
2.35.3


2022-06-10 16:47:57

by Pavel Tikhomirov

Subject: [PATCH 1/2] Add CABA tree to task_struct

In Linux, after a parent (father) process dies, its child processes are
moved (reparented) to a reaper process. Roughly speaking:

1) If the father has another thread that is still alive, that thread
becomes the reaper.

2) Else, if the father has an ancestor (with no pidns level change in
between) which has PR_SET_CHILD_SUBREAPER set, that ancestor becomes the
reaper.

3) Else, the init of the father's pidns becomes the reaper for the
father's children.
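
The three rules above can be sketched as a small userspace model in C.
This is only an illustration under simplifying assumptions: struct proc,
its fields and pick_reaper() are hypothetical names, not kernel code, and
the model ignores locking and the has_child_subreaper shortcut.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a process; not the kernel's task_struct. */
struct proc {
	struct proc *parent;     /* current parent, NULL for the root */
	struct proc *pidns_init; /* init of this process's pid namespace */
	int pidns_level;         /* nesting level of the pid namespace */
	int live_threads;        /* threads still alive; for the dying
				  * father, not counting the exiting one */
	bool is_subreaper;       /* models PR_SET_CHILD_SUBREAPER */
};

/* Pick the process that adopts the children of a dying 'father'. */
struct proc *pick_reaper(struct proc *father)
{
	struct proc *p;

	/* 1) Another thread of the father is still alive. */
	if (father->live_threads > 0)
		return father;

	/* 2) Closest live subreaper ancestor at the same pidns level. */
	for (p = father->parent;
	     p && p->pidns_level == father->pidns_level;
	     p = p->parent) {
		if (p->is_subreaper && p->live_threads > 0)
			return p;
	}

	/* 3) Fall back to the init of the father's pid namespace. */
	return father->pidns_init;
}
```

In this model a thread group is collapsed into one struct proc, so rule 1
returns the father itself rather than a specific sibling thread.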

The problem with this for CRIU is that, when CRIU dumps processes, it
does not know the order in which the processes and their resources were
created. Processes can have resources which a) can only be inherited
when we clone processes, b) can only be created by specific processes
and c) are shared between several processes (a process session is an
example of such a resource). For such resources, CRIU restore has to
re-invent an order of process creation that both produces the desired
process tree topology and lets all resources be inherited correctly.

When process reparenting involves child subreapers, processes can be
shuffled around the process tree so drastically that it is not obvious
how to restore everything correctly.

So this is what we came up with to help CRIU overcome this problem:

CABA = Closest Alive Born Ancestor
CABD = Closest Alive Born Descendant

We want to put processes into one more tree, the CABA tree. This tree
does not affect reparenting or process creation in any way; it only
provides new information to CRIU, so that it can understand where a
reparented child was reparented from. Even though the original father is
already dead, and probably the father's father too, we still have
information about the process which is still alive and was originally
the parent of the sequence of (already dead) processes which led to us:
the CABA.
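
As a rough illustration of how the CABA pointer evolves, here is a
minimal userspace model in C. It is a sketch, not the kernel code: the
names struct proc, proc_fork(), closest_alive() and proc_exit() are
hypothetical, the model scans a flat table instead of the cabds/cabd
lists used by the patch, and the root process is assumed never to exit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; the kernel patch uses list heads instead. */
struct proc {
	int pid;
	bool alive;
	struct proc *caba; /* closest alive born ancestor; root points to itself */
};

/* Record a new process: at birth the CABA is simply the parent. */
void proc_fork(struct proc *child, int pid, struct proc *parent)
{
	child->pid = pid;
	child->alive = true;
	child->caba = parent ? parent : child;
}

/* Walk up the CABA chain until an alive process (or the root) is found. */
struct proc *closest_alive(struct proc *p)
{
	while (!p->alive && p->caba != p)
		p = p->caba;
	return p;
}

/* Mark 'dead' as exited and hand its CABA descendants to its own CABA. */
void proc_exit(struct proc *table, size_t n, struct proc *dead)
{
	struct proc *new_caba;
	size_t i;

	dead->alive = false;
	new_caba = closest_alive(dead->caba);
	for (i = 0; i < n; i++) {
		if (table[i].alive && table[i].caba == dead)
			table[i].caba = new_caba;
	}
}
```

For the tree used in the selftest (1 forks 2, 2 forks 3, 3 forks 4,
4 forks 5 and 6, 6 forks 7), exiting 6 and then 3 in this model leaves
5 and 7 with CABA 4, and 4 with CABA 2, regardless of which reaper the
regular process tree picks.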

CC: Eric Biederman <[email protected]>
CC: Kees Cook <[email protected]>
CC: Alexander Viro <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Peter Zijlstra <[email protected]>
CC: Juri Lelli <[email protected]>
CC: Vincent Guittot <[email protected]>
CC: Dietmar Eggemann <[email protected]>
CC: Steven Rostedt <[email protected]>
CC: Ben Segall <[email protected]>
CC: Mel Gorman <[email protected]>
CC: Daniel Bristot de Oliveira <[email protected]>
CC: Valentin Schneider <[email protected]>
CC: Andrew Morton <[email protected]>
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: [email protected]

Signed-off-by: Pavel Tikhomirov <[email protected]>
---
arch/ia64/kernel/mca.c | 3 +++
fs/exec.c | 1 +
fs/proc/array.c | 18 +++++++++++++++
include/linux/sched.h | 7 ++++++
init/init_task.c | 3 +++
kernel/exit.c | 50 +++++++++++++++++++++++++++++++++++++-----
kernel/fork.c | 4 ++++
7 files changed, 80 insertions(+), 6 deletions(-)

diff --git a/arch/ia64/kernel/mca.c b/arch/ia64/kernel/mca.c
index c62a66710ad6..74bf75fef9df 100644
--- a/arch/ia64/kernel/mca.c
+++ b/arch/ia64/kernel/mca.c
@@ -1793,6 +1793,9 @@ format_mca_init_stack(void *mca_data, unsigned long offset,
p->parent = p->real_parent = p->group_leader = p;
INIT_LIST_HEAD(&p->children);
INIT_LIST_HEAD(&p->sibling);
+ p->caba = p->real_parent;
+ INIT_LIST_HEAD(&p->cabds);
+ INIT_LIST_HEAD(&p->cabd);
strncpy(p->comm, type, sizeof(p->comm)-1);
}

diff --git a/fs/exec.c b/fs/exec.c
index 0989fb8472a1..23e48db6c5b1 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1136,6 +1136,7 @@ static int de_thread(struct task_struct *tsk)

list_replace_rcu(&leader->tasks, &tsk->tasks);
list_replace_init(&leader->sibling, &tsk->sibling);
+ list_replace_init(&leader->cabd, &tsk->cabd);

tsk->group_leader = tsk;
leader->group_leader = tsk;
diff --git a/fs/proc/array.c b/fs/proc/array.c
index eb815759842c..6c43a8d64f65 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -151,11 +151,26 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
const struct cred *cred;
pid_t ppid, tpid = 0, tgid, ngid;
unsigned int max_fds = 0;
+ struct task_struct *caba;
+ struct pid *caba_pid;
+ int caba_level = 0;
+ pid_t caba_pids[MAX_PID_NS_LEVEL] = {};

rcu_read_lock();
ppid = pid_alive(p) ?
task_tgid_nr_ns(rcu_dereference(p->real_parent), ns) : 0;

+#ifdef CONFIG_PID_NS
+ caba = rcu_dereference(p->caba);
+ caba_pid = get_task_pid(caba, PIDTYPE_PID);
+ if (caba_pid) {
+ caba_level = caba_pid->level;
+ for (g = ns->level; g <= caba_level; g++)
+ caba_pids[g] = task_pid_nr_ns(caba, caba_pid->numbers[g].ns);
+ put_pid(caba_pid);
+ }
+#endif
+
tracer = ptrace_parent(p);
if (tracer)
tpid = task_pid_nr_ns(tracer, ns);
@@ -214,6 +229,9 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
seq_puts(m, "\nNSsid:");
for (g = ns->level; g <= pid->level; g++)
seq_put_decimal_ull(m, "\t", task_session_nr_ns(p, pid->numbers[g].ns));
+ seq_puts(m, "\nNScaba:");
+ for (g = ns->level; g <= caba_level; g++)
+ seq_put_decimal_ull(m, "\t", caba_pids[g]);
#endif
seq_putc(m, '\n');
}
diff --git a/include/linux/sched.h b/include/linux/sched.h
index c46f3a63b758..358af0cf8f73 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -973,6 +973,13 @@ struct task_struct {
struct list_head sibling;
struct task_struct *group_leader;

+ /* Closest Alive Born Ancestor process: */
+ struct task_struct __rcu *caba;
+
+ /* Closest Alive Born Descendants list: */
+ struct list_head cabds;
+ struct list_head cabd;
+
/*
* 'ptraced' is the list of tasks this task is using ptrace() on.
*
diff --git a/init/init_task.c b/init/init_task.c
index 73cc8f03511a..a0b206dd74ef 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -109,6 +109,9 @@ struct task_struct init_task
.children = LIST_HEAD_INIT(init_task.children),
.sibling = LIST_HEAD_INIT(init_task.sibling),
.group_leader = &init_task,
+ .caba = &init_task,
+ .cabds = LIST_HEAD_INIT(init_task.cabds),
+ .cabd = LIST_HEAD_INIT(init_task.cabd),
RCU_POINTER_INITIALIZER(real_cred, &init_cred),
RCU_POINTER_INITIALIZER(cred, &init_cred),
.comm = INIT_TASK_COMM,
diff --git a/kernel/exit.c b/kernel/exit.c
index f072959fcab7..5eae2ff93576 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -82,6 +82,7 @@ static void __unhash_process(struct task_struct *p, bool group_dead)

list_del_rcu(&p->tasks);
list_del_init(&p->sibling);
+ list_del_init(&p->cabd);
__this_cpu_dec(process_counts);
}
list_del_rcu(&p->thread_group);
@@ -562,11 +563,11 @@ static struct task_struct *find_child_reaper(struct task_struct *father,
* 3. give it to the init process (PID 1) in our pid namespace
*/
static struct task_struct *find_new_reaper(struct task_struct *father,
- struct task_struct *child_reaper)
+ struct task_struct *child_reaper,
+ struct task_struct *thread)
{
- struct task_struct *thread, *reaper;
+ struct task_struct *reaper;

- thread = find_alive_thread(father);
if (thread)
return thread;

@@ -620,6 +621,31 @@ static void reparent_leader(struct task_struct *father, struct task_struct *p,
kill_orphaned_pgrp(p, father);
}

+static struct task_struct *find_new_caba(struct task_struct *father,
+ struct task_struct *thread)
+{
+ struct task_struct *caba;
+
+ if (thread)
+ return thread;
+
+ caba = father->caba;
+ while (1) {
+ if (caba == &init_task)
+ break;
+ if (WARN_ON_ONCE(caba->caba == caba))
+ break;
+
+ thread = find_alive_thread(caba);
+ if (thread)
+ return thread;
+
+ caba = caba->caba;
+ }
+
+ return caba;
+}
+
/*
* This does two things:
*
@@ -631,17 +657,19 @@ static void reparent_leader(struct task_struct *father, struct task_struct *p,
static void forget_original_parent(struct task_struct *father,
struct list_head *dead)
{
- struct task_struct *p, *t, *reaper;
+ struct task_struct *p, *t, *reaper, *thread, *caba;

if (unlikely(!list_empty(&father->ptraced)))
exit_ptrace(father, dead);

/* Can drop and reacquire tasklist_lock */
reaper = find_child_reaper(father, dead);
+ thread = find_alive_thread(father);
+
if (list_empty(&father->children))
- return;
+ goto caba;

- reaper = find_new_reaper(father, reaper);
+ reaper = find_new_reaper(father, reaper, thread);
list_for_each_entry(p, &father->children, sibling) {
for_each_thread(p, t) {
RCU_INIT_POINTER(t->real_parent, reaper);
@@ -661,6 +689,16 @@ static void forget_original_parent(struct task_struct *father,
reparent_leader(father, p, dead);
}
list_splice_tail_init(&father->children, &reaper->children);
+caba:
+ if (list_empty(&father->cabds))
+ return;
+
+ caba = find_new_caba(father, thread);
+ list_for_each_entry(p, &father->cabds, cabd) {
+ for_each_thread(p, t)
+ RCU_INIT_POINTER(t->caba, caba);
+ }
+ list_splice_tail_init(&father->cabds, &caba->cabds);
}

/*
diff --git a/kernel/fork.c b/kernel/fork.c
index 9d44f2d46c69..e397122721ff 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2123,6 +2123,8 @@ static __latent_entropy struct task_struct *copy_process(
p->flags |= PF_FORKNOEXEC;
INIT_LIST_HEAD(&p->children);
INIT_LIST_HEAD(&p->sibling);
+ INIT_LIST_HEAD(&p->cabds);
+ INIT_LIST_HEAD(&p->cabd);
rcu_copy_process(p);
p->vfork_done = NULL;
spin_lock_init(&p->alloc_lock);
@@ -2386,6 +2388,7 @@ static __latent_entropy struct task_struct *copy_process(
p->parent_exec_id = current->self_exec_id;
p->exit_signal = args->exit_signal;
}
+ p->caba = p->real_parent;

klp_copy_process(p);

@@ -2437,6 +2440,7 @@ static __latent_entropy struct task_struct *copy_process(
p->signal->has_child_subreaper = p->real_parent->signal->has_child_subreaper ||
p->real_parent->signal->is_child_subreaper;
list_add_tail(&p->sibling, &p->real_parent->children);
+ list_add_tail(&p->cabd, &p->caba->cabds);
list_add_tail_rcu(&p->tasks, &init_task.tasks);
attach_pid(p, PIDTYPE_TGID);
attach_pid(p, PIDTYPE_PGID);
--
2.35.3

2022-06-10 17:05:05

by Pavel Tikhomirov

Subject: [PATCH 2/2] tests: Add CABA selftest

This test creates a "tricky" example process tree where the session
leaders of two sessions are children of the pid namespace init and also
have children of their own: the leader of session A has a child in
session B, and the leader of session B has a child in session A.

We check that the Closest Alive Born Ancestor tree is correct for this
case. The case illustrates how the CABA tree helps to understand the
order in which the sessions were created.
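
Patch 1/2 exposes the CABA as an "NScaba:" line in /proc/<pid>/status,
with one tab-separated pid per pid namespace level, starting at the
level of the procfs instance's pid namespace. The selftest only reads
the first value; a reader that wants all of them might use something
like the sketch below. parse_nscaba() is a hypothetical helper and is
not part of this series.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Parse a status line such as "NScaba:\t12\t1\n" into out[].
 * Returns the number of pids stored (at most max), or -1 if the line
 * is not an NScaba line.
 */
int parse_nscaba(const char *line, int *out, int max)
{
	static const char key[] = "NScaba:";
	const char *s;
	char *end;
	int n = 0;

	if (strncmp(line, key, sizeof(key) - 1) != 0)
		return -1;

	s = line + sizeof(key) - 1;
	while (n < max) {
		/* strtol() skips the leading tab before each value. */
		long v = strtol(s, &end, 10);

		if (end == s) /* no digits left on the line */
			break;
		out[n++] = (int)v;
		s = end;
	}
	return n;
}
```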

CC: Eric Biederman <[email protected]>
CC: Kees Cook <[email protected]>
CC: Alexander Viro <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Peter Zijlstra <[email protected]>
CC: Juri Lelli <[email protected]>
CC: Vincent Guittot <[email protected]>
CC: Dietmar Eggemann <[email protected]>
CC: Steven Rostedt <[email protected]>
CC: Ben Segall <[email protected]>
CC: Mel Gorman <[email protected]>
CC: Daniel Bristot de Oliveira <[email protected]>
CC: Valentin Schneider <[email protected]>
CC: Andrew Morton <[email protected]>
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: [email protected]

Signed-off-by: Pavel Tikhomirov <[email protected]>
---
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/caba/.gitignore | 1 +
tools/testing/selftests/caba/Makefile | 7 +
tools/testing/selftests/caba/caba_test.c | 501 +++++++++++++++++++++++
tools/testing/selftests/caba/config | 1 +
5 files changed, 511 insertions(+)
create mode 100644 tools/testing/selftests/caba/.gitignore
create mode 100644 tools/testing/selftests/caba/Makefile
create mode 100644 tools/testing/selftests/caba/caba_test.c
create mode 100644 tools/testing/selftests/caba/config

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index de11992dc577..e231bd93b4c4 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -3,6 +3,7 @@ TARGETS += alsa
TARGETS += arm64
TARGETS += bpf
TARGETS += breakpoints
+TARGETS += caba
TARGETS += capabilities
TARGETS += cgroup
TARGETS += clone3
diff --git a/tools/testing/selftests/caba/.gitignore b/tools/testing/selftests/caba/.gitignore
new file mode 100644
index 000000000000..aa2c55b774e2
--- /dev/null
+++ b/tools/testing/selftests/caba/.gitignore
@@ -0,0 +1 @@
+caba_test
diff --git a/tools/testing/selftests/caba/Makefile b/tools/testing/selftests/caba/Makefile
new file mode 100644
index 000000000000..4260145c3747
--- /dev/null
+++ b/tools/testing/selftests/caba/Makefile
@@ -0,0 +1,7 @@
+# SPDX-License-Identifier: GPL-2.0
+# Makefile for caba selftests.
+CFLAGS = -g -I../../../../usr/include/ -Wall -O2
+
+TEST_GEN_FILES += caba_test
+
+include ../lib.mk
diff --git a/tools/testing/selftests/caba/caba_test.c b/tools/testing/selftests/caba/caba_test.c
new file mode 100644
index 000000000000..7a2e3f0f39db
--- /dev/null
+++ b/tools/testing/selftests/caba/caba_test.c
@@ -0,0 +1,501 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sched.h>
+#include <fcntl.h>
+#include <limits.h>
+#include <sys/mman.h>
+#include <sys/wait.h>
+#include <sys/prctl.h>
+#include <sys/socket.h>
+#include <sys/mount.h>
+#include <sys/user.h>
+
+#include "../kselftest_harness.h"
+
+#ifndef CLONE_NEWPID
+#define CLONE_NEWPID 0x20000000 /* New pid namespace */
+#endif
+
+/* Attempt to de-conflict with the selftests tree. */
+#ifndef SKIP
+#define SKIP(s, ...) XFAIL(s, ##__VA_ARGS__)
+#endif
+
+struct process
+{
+ pid_t pid;
+ pid_t real;
+ pid_t caba;
+ int sks[2];
+ int dead;
+};
+
+struct process *processes;
+int nr_processes = 8;
+int current = 0;
+
+static void cleanup(void)
+{
+ kill(processes[0].pid, SIGKILL);
+ /* It's enough to kill pidns init for others to die */
+ kill(processes[1].pid, SIGKILL);
+}
+
+enum commands
+{
+ TEST_FORK,
+ TEST_WAIT,
+ TEST_SUBREAPER,
+ TEST_SETSID,
+ TEST_DIE,
+ /* unused */
+ TEST_GETSID,
+ TEST_SETNS,
+ TEST_SETPGID,
+ TEST_GETPGID,
+ TEST_GETPPID,
+};
+
+struct command
+{
+ enum commands cmd;
+ int arg1;
+ int arg2;
+};
+
+static void handle_command(void);
+
+static void mainloop(void)
+{
+ while (1)
+ handle_command();
+}
+
+#define CLONE_STACK_SIZE 4096
+#define __stack_aligned__ __attribute__((aligned(16)))
+/* All arguments should be above stack, because it grows down */
+struct clone_args {
+ char stack[CLONE_STACK_SIZE] __stack_aligned__;
+ char stack_ptr[0];
+ int id;
+};
+
+static int get_real_pid()
+{
+ char buf[11];
+ int ret;
+
+ ret = readlink("/proc/self", buf, sizeof(buf)-1);
+ if (ret <= 0) {
+ fprintf(stderr, "%d: readlink /proc/self :%m", current);
+ return -1;
+ }
+ buf[ret] = '\0';
+
+ processes[current].real = atoi(buf);
+ return 0;
+}
+
+static int clone_func(void *_arg)
+{
+ struct clone_args *args = (struct clone_args *) _arg;
+
+ current = args->id;
+
+ if (get_real_pid())
+ exit(1);
+
+ printf("%3d: Hello. My pid is %d\n", args->id, getpid());
+ mainloop();
+ exit(0);
+}
+
+static int make_child(int id, int flags)
+{
+ struct clone_args args;
+ pid_t cid;
+
+ args.id = id;
+
+ cid = clone(clone_func, args.stack_ptr,
+ flags | SIGCHLD, &args);
+
+ if (cid < 0)
+ fprintf(stderr, "clone(%d, %d) :%m", id, flags);
+
+ processes[id].pid = cid;
+
+ return cid;
+}
+
+static int open_proc(void)
+{
+ int fd;
+ char proc_mountpoint[] = "/tmp/.caba_test.proc.XXXXXX";
+
+ if (mkdtemp(proc_mountpoint) == NULL) {
+ fprintf(stderr, "mkdtemp failed %s :%m\n", proc_mountpoint);
+ return -1;
+ }
+
+ if (mount("proc", proc_mountpoint, "proc", MS_MGC_VAL | MS_NOSUID | MS_NOEXEC | MS_NODEV, NULL)) {
+ fprintf(stderr, "mount proc failed :%m\n");
+ rmdir(proc_mountpoint);
+ return -1;
+ }
+
+ fd = open(proc_mountpoint, O_RDONLY | O_DIRECTORY, 0);
+ if (fd < 0)
+ fprintf(stderr, "can't open proc :%m\n");
+
+ if (umount2(proc_mountpoint, MNT_DETACH)) {
+ fprintf(stderr, "can't umount proc :%m\n");
+ goto err_close;
+ }
+
+ if (rmdir(proc_mountpoint)) {
+ fprintf(stderr, "can't remove tmp dir :%m\n");
+ goto err_close;
+ }
+
+ return fd;
+err_close:
+ if (fd >= 0)
+ close(fd);
+ return -1;
+}
+
+static int open_pidns(int pid)
+{
+ int proc, fd;
+ char pidns_path[PATH_MAX];
+
+ proc = open_proc();
+ if (proc < 0) {
+ fprintf(stderr, "open proc\n");
+ return -1;
+ }
+
+ sprintf(pidns_path, "%d/ns/pid", pid);
+ fd = openat(proc, pidns_path, O_RDONLY);
+ if (fd == -1)
+ fprintf(stderr, "open pidns fd\n");
+
+ close(proc);
+ return fd;
+}
+
+static int setns_pid(int pid, int nstype)
+{
+ int pidns, ret;
+
+ pidns = open_pidns(pid);
+ if (pidns < 0)
+ return -1;
+
+ ret = setns(pidns, nstype);
+ if (ret == -1)
+ fprintf(stderr, "setns :%m\n");
+
+ close(pidns);
+ return ret;
+}
+
+static void handle_command(void)
+{
+ int sk = processes[current].sks[0], ret, status = 0;
+ struct command cmd;
+
+ ret = read(sk, &cmd, sizeof(cmd));
+ if (ret != sizeof(cmd)) {
+ fprintf(stderr, "Unable to get command :%m\n");
+ goto err;
+ }
+
+ switch (cmd.cmd) {
+ case TEST_FORK:
+ {
+ pid_t pid;
+
+ pid = make_child(cmd.arg1, cmd.arg2);
+ if (pid == -1) {
+ status = -1;
+ goto err;
+ }
+
+ printf("%3d: fork(%d, %x) = %d\n",
+ current, cmd.arg1, cmd.arg2, pid);
+ processes[cmd.arg1].pid = pid;
+ }
+ break;
+ case TEST_WAIT:
+ printf("%3d: wait(%d) = %d\n", current,
+ cmd.arg1, processes[cmd.arg1].pid);
+
+ if (waitpid(processes[cmd.arg1].pid, NULL, 0) == -1) {
+ fprintf(stderr, "waitpid(%d) :%m\n", processes[cmd.arg1].pid);
+ status = -1;
+ }
+ break;
+ case TEST_SUBREAPER:
+ printf("%3d: subreaper(%d)\n", current, cmd.arg1);
+ if (prctl(PR_SET_CHILD_SUBREAPER, cmd.arg1, 0, 0, 0) == -1) {
+ fprintf(stderr, "PR_SET_CHILD_SUBREAPER :%m\n");
+ status = -1;
+ }
+ break;
+ case TEST_SETSID:
+ printf("%3d: setsid()\n", current);
+ if(setsid() == -1) {
+ fprintf(stderr, "setsid :%m\n");
+ status = -1;
+ }
+ break;
+ case TEST_GETSID:
+ printf("%3d: getsid()\n", current);
+ status = getsid(getpid());
+ if(status == -1)
+ fprintf(stderr, "getsid :%m\n");
+ break;
+ case TEST_SETPGID:
+ printf("%3d: setpgid(%d, %d)\n", current, cmd.arg1, cmd.arg2);
+ if(setpgid(processes[cmd.arg1].pid, processes[cmd.arg2].pid) == -1) {
+ fprintf(stderr, "setpgid :%m\n");
+ status = -1;
+ }
+ break;
+ case TEST_GETPGID:
+ printf("%3d: getpgid()\n", current);
+ status = getpgid(0);
+ if(status == -1)
+ fprintf(stderr, "getpgid :%m\n");
+ break;
+ case TEST_GETPPID:
+ printf("%3d: getppid()\n", current);
+ status = getppid();
+ if(status == -1)
+ fprintf(stderr, "getppid :%m\n");
+ break;
+ case TEST_SETNS:
+ printf("%3d: setns(%d, %d) = %d\n", current,
+ cmd.arg1, cmd.arg2, processes[cmd.arg1].pid);
+ setns_pid(processes[cmd.arg1].pid, cmd.arg2);
+
+ break;
+ case TEST_DIE:
+ printf("%3d: die()\n", current);
+ processes[current].dead = 1;
+ shutdown(sk, SHUT_RDWR);
+ exit(0);
+ }
+
+ ret = write(sk, &status, sizeof(status));
+ if (ret != sizeof(status)) {
+ fprintf(stderr, "Unable to answer :%m\n");
+ goto err;
+ }
+
+ if (status < 0)
+ goto err;
+
+ return;
+err:
+ shutdown(sk, SHUT_RDWR);
+ exit(1);
+}
+
+static int send_command(int id, enum commands op, int arg1, int arg2)
+{
+ int sk = processes[id].sks[1], ret, status;
+ struct command cmd = {op, arg1, arg2};
+
+ if (op == TEST_FORK) {
+ if (processes[arg1].pid) {
+ fprintf(stderr, "%d is busy :%m\n", arg1);
+ return -1;
+ }
+ }
+
+ ret = write(sk, &cmd, sizeof(cmd));
+ if (ret != sizeof(cmd)) {
+ fprintf(stderr, "Unable to send command :%m\n");
+ goto err;
+ }
+
+ status = 0;
+ ret = read(sk, &status, sizeof(status));
+ if (ret != sizeof(status) && !(status == 0 && op == TEST_DIE)) {
+ fprintf(stderr, "Unable to get answer :%m\n");
+ goto err;
+ }
+
+ if (status != -1 && (op == TEST_GETSID || op == TEST_GETPGID || op == TEST_GETPPID))
+ return status;
+
+ if (status) {
+ fprintf(stderr, "The command(%d, %d, %d) failed :%m\n", op, arg1, arg2);
+ goto err;
+ }
+
+ return 0;
+err:
+ cleanup();
+ exit(1);
+}
+
+static int get_caba(int pid, int *caba) {
+ char buf[64], *str;
+ FILE *fp;
+ size_t n;
+
+ if (!pid)
+ snprintf(buf, sizeof(buf), "/proc/self/status");
+ else
+ snprintf(buf, sizeof(buf), "/proc/%d/status", pid);
+
+ fp = fopen(buf, "r");
+ if (!fp) {
+ perror("fopen");
+ return -1;
+ }
+
+ str = NULL;
+ while (getline(&str, &n, fp) != -1) {
+ if (strncmp(str, "NScaba:", 7) == 0) {
+ if (str[7] == '\0') {
+ *caba = 0;
+ } else {
+ if (sscanf(str+7, "%d", caba) != 1) {
+ perror("sscanf");
+ goto err;
+ }
+ }
+
+ fclose(fp);
+ free(str);
+ return 0;
+ }
+ }
+err:
+ free(str);
+ fclose(fp);
+ return -1;
+}
+
+static bool caba_supported(void)
+{
+ int caba;
+
+ return !get_caba(0, &caba);
+}
+
+FIXTURE(caba) {
+};
+
+FIXTURE_SETUP(caba)
+{
+ bool ret;
+
+ ret = caba_supported();
+ ASSERT_GE(ret, 0);
+ if (!ret)
+ SKIP(return, "CABA is not supported");
+}
+
+FIXTURE_TEARDOWN(caba)
+{
+ bool ret;
+
+ ret = caba_supported();
+ ASSERT_GE(ret, 0);
+ if (!ret)
+ SKIP(return, "CABA is not supported");
+
+ cleanup();
+}
+
+TEST_F(caba, complex_sessions)
+{
+ int ret, i, pid, caba;
+
+ ret = caba_supported();
+ ASSERT_GE(ret, 0);
+ if (!ret)
+ SKIP(return, "CABA is not supported");
+
+ processes = mmap(NULL, PAGE_SIZE, PROT_WRITE | PROT_READ, MAP_SHARED | MAP_ANONYMOUS, 0, 0); ASSERT_NE(processes, MAP_FAILED);
+ for (i = 0; i < nr_processes; i++) {
+ ret = socketpair(PF_UNIX, SOCK_STREAM, 0, processes[i].sks); ASSERT_EQ(ret, 0);
+
+ }
+
+ /*
+ * Create init:
+ * (pid, sid)
+ * (1, 1)
+ */
+ pid = make_child(0, 0); ASSERT_GT(pid, 0);
+ ret = send_command(0, TEST_FORK, 1, CLONE_NEWPID); ASSERT_EQ(ret, 0);
+ ret = send_command(1, TEST_SETSID, 0, 0); ASSERT_EQ(ret, 0);
+
+ /*
+ * Create sequence of processes from one session:
+ * (pid, sid)
+ * (1, 1)---(2, 2)---(3, 2)---(4, 2)---(5, 2)
+ */
+ ret = send_command(1, TEST_FORK, 2, 0); ASSERT_EQ(ret, 0);
+ ret = send_command(2, TEST_SETSID, 0, 0); ASSERT_EQ(ret, 0);
+ ret = send_command(2, TEST_FORK, 3, 0); ASSERT_EQ(ret, 0);
+ ret = send_command(3, TEST_FORK, 4, 0); ASSERT_EQ(ret, 0);
+ ret = send_command(4, TEST_FORK, 5, 0); ASSERT_EQ(ret, 0);
+ /*
+ * Create another session in the middle of first one:
+ * (pid, sid)
+ * (1, 1)---(2, 2)---(3, 2)---(4, 4)-+-(5, 2)
+ * `-(6, 4)---(7, 4)
+ */
+ ret = send_command(4, TEST_SETSID, 0, 0); ASSERT_EQ(ret, 0);
+ ret = send_command(4, TEST_FORK, 6, 0); ASSERT_EQ(ret, 0);
+ ret = send_command(6, TEST_FORK, 7, 0); ASSERT_EQ(ret, 0);
+
+ /*
+ * Kill 6 while having 2 as child-sub-reaper:
+ * (pid, sid)
+ * (1, 1)---(2, 2)---(3, 2)---(4, 4)-+-(5, 2)
+ * `-(7, 4)
+ */
+ ret = send_command(2, TEST_SUBREAPER, 1, 0); ASSERT_EQ(ret, 0);
+ ret = send_command(6, TEST_DIE, 0, 0); ASSERT_EQ(ret, 0);
+ ret = send_command(4, TEST_WAIT, 6, 0); ASSERT_EQ(ret, 0);
+ ret = send_command(2, TEST_SUBREAPER, 0, 0); ASSERT_EQ(ret, 0);
+
+ /*
+ * Kill 3:
+ * (pid, sid)
+ * (1, 1)-+-(2, 2)---(7, 4)
+ * `-(4, 4)---(5, 2)
+ * note: This is a "tricky" session tree example where it's not obvious
+ * whether sid 2 was created first or sid 4 when creating the tree.
+ */
+ ret = send_command(3, TEST_DIE, 0, 0); ASSERT_EQ(ret, 0);
+ ret = send_command(2, TEST_WAIT, 3, 0); ASSERT_EQ(ret, 0);
+
+ /*
+ * CABA tree for this would be:
+ * (pid, sid)
+ * (1, 1)---(2, 2)---(4, 4)-+-(5, 2)
+ * `-(7, 4)
+ * note: CABA allows us to understand that session 2 was created first.
+ */
+ ret = get_caba(processes[2].real, &caba); ASSERT_EQ(ret, 0); ASSERT_EQ(caba, processes[1].real);
+ ret = get_caba(processes[4].real, &caba); ASSERT_EQ(ret, 0); ASSERT_EQ(caba, processes[2].real);
+ ret = get_caba(processes[5].real, &caba); ASSERT_EQ(ret, 0); ASSERT_EQ(caba, processes[4].real);
+ ret = get_caba(processes[7].real, &caba); ASSERT_EQ(ret, 0); ASSERT_EQ(caba, processes[4].real);
+}
+
+TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/caba/config b/tools/testing/selftests/caba/config
new file mode 100644
index 000000000000..eae7bdaa3790
--- /dev/null
+++ b/tools/testing/selftests/caba/config
@@ -0,0 +1 @@
+CONFIG_PID_NS=y
--
2.35.3

2022-06-10 17:05:44

by Pavel Tikhomirov

Subject: Re: [PATCH 0/2] Introduce CABA helper process tree

CC: [email protected]

On 10.06.2022 19:32, Pavel Tikhomirov wrote:
> Please see "Add CABA tree to task_struct" for deeper explanation, and
> "tests: Add CABA selftest" for a small test and an actual case for which
> we might need CABA.
>
> Probably the original problem of restoring process tree with complex
> sessions can be resolved by allowing sessions copying, like we do for
> process group, but I'm not sure if that would be too secure to do it,
> and if there would not be another similar resource in future.
>
> We can use CABA not only for CRIU for restoring processes, in normal
> life when processes detach CABA will help to understand from which place
> in process tree they were originally started from sshd/crond or
> something else.
>
> Hope my idea is not completely insane =)
>
> CC: Eric Biederman <[email protected]>
> CC: Kees Cook <[email protected]>
> CC: Alexander Viro <[email protected]>
> CC: Ingo Molnar <[email protected]>
> CC: Peter Zijlstra <[email protected]>
> CC: Juri Lelli <[email protected]>
> CC: Vincent Guittot <[email protected]>
> CC: Dietmar Eggemann <[email protected]>
> CC: Steven Rostedt <[email protected]>
> CC: Ben Segall <[email protected]>
> CC: Mel Gorman <[email protected]>
> CC: Daniel Bristot de Oliveira <[email protected]>
> CC: Valentin Schneider <[email protected]>
> CC: Andrew Morton <[email protected]>
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
>
> Pavel Tikhomirov (2):
> Add CABA tree to task_struct
> tests: Add CABA selftest
>
> arch/ia64/kernel/mca.c | 3 +
> fs/exec.c | 1 +
> fs/proc/array.c | 18 +
> include/linux/sched.h | 7 +
> init/init_task.c | 3 +
> kernel/exit.c | 50 ++-
> kernel/fork.c | 4 +
> tools/testing/selftests/Makefile | 1 +
> tools/testing/selftests/caba/.gitignore | 1 +
> tools/testing/selftests/caba/Makefile | 7 +
> tools/testing/selftests/caba/caba_test.c | 501 +++++++++++++++++++++++
> tools/testing/selftests/caba/config | 1 +
> 12 files changed, 591 insertions(+), 6 deletions(-)
> create mode 100644 tools/testing/selftests/caba/.gitignore
> create mode 100644 tools/testing/selftests/caba/Makefile
> create mode 100644 tools/testing/selftests/caba/caba_test.c
> create mode 100644 tools/testing/selftests/caba/config
>

--
Best regards, Tikhomirov Pavel
Software Developer, Virtuozzo.

2022-06-10 21:14:01

by kernel test robot

Subject: Re: [PATCH 1/2] Add CABA tree to task_struct

Hi Pavel,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on shuah-kselftest/next]
[also build test WARNING on kees/for-next/execve tip/sched/core linus/master v5.19-rc1 next-20220610]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/intel-lab-lkp/linux/commits/Pavel-Tikhomirov/Introduce-CABA-helper-process-tree/20220611-003433
base: https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git next
config: i386-randconfig-a001 (https://download.01.org/0day-ci/archive/20220611/[email protected]/config)
compiler: gcc-11 (Debian 11.3.0-3) 11.3.0
reproduce (this is a W=1 build):
# https://github.com/intel-lab-lkp/linux/commit/0875a2bed5ff95643c487dfcc28a550db06ea418
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Pavel-Tikhomirov/Introduce-CABA-helper-process-tree/20220611-003433
git checkout 0875a2bed5ff95643c487dfcc28a550db06ea418
# save the config file
mkdir build_dir && cp config build_dir/.config
make W=1 O=build_dir ARCH=i386 SHELL=/bin/bash fs/proc/

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <[email protected]>

All warnings (new ones prefixed by >>):

fs/proc/array.c: In function 'task_state':
>> fs/proc/array.c:157:15: warning: unused variable 'caba_pids' [-Wunused-variable]
157 | pid_t caba_pids[MAX_PID_NS_LEVEL] = {};
| ^~~~~~~~~
>> fs/proc/array.c:156:13: warning: unused variable 'caba_level' [-Wunused-variable]
156 | int caba_level = 0;
| ^~~~~~~~~~
>> fs/proc/array.c:155:21: warning: unused variable 'caba_pid' [-Wunused-variable]
155 | struct pid *caba_pid;
| ^~~~~~~~
>> fs/proc/array.c:154:29: warning: unused variable 'caba' [-Wunused-variable]
154 | struct task_struct *caba;
| ^~~~


vim +/caba_pids +157 fs/proc/array.c

143
144 static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
145 struct pid *pid, struct task_struct *p)
146 {
147 struct user_namespace *user_ns = seq_user_ns(m);
148 struct group_info *group_info;
149 int g, umask = -1;
150 struct task_struct *tracer;
151 const struct cred *cred;
152 pid_t ppid, tpid = 0, tgid, ngid;
153 unsigned int max_fds = 0;
> 154 struct task_struct *caba;
> 155 struct pid *caba_pid;
> 156 int caba_level = 0;
> 157 pid_t caba_pids[MAX_PID_NS_LEVEL] = {};
158
159 rcu_read_lock();
160 ppid = pid_alive(p) ?
161 task_tgid_nr_ns(rcu_dereference(p->real_parent), ns) : 0;
162

--
0-DAY CI Kernel Test Service
https://01.org/lkp