[ I already sent this patch half a year ago or so, as an RFC. I didn't
really get any comments back then, however I am still interested in
seeing this patch in the kernel tree. So here I go again: please
comment! I have updated the patch to apply to the current upstream git
master. ]
Right now, if a process dies all its children are reparented to init.
This logic has good uses, i.e. for double forking when daemonizing.
However it also allows child processes to "escape" their parents, which
is a problem for software like session managers (such as gnome-session)
or other process supervisors.
This patch adds a simple flag for each process that marks it as an
"anchor" process for all its children and grandchildren. If a child of
such an anchor dies all its children will not be reparented to init, but
instead to this anchor, escaping this anchor process is not possible. A
task with this flag set hence acts is little "sub-init".
Anchors are fully recursive: if an anchor dies, all its children are
reparented to next higher anchor in the process tree.
This is orthogonal to PID namespaces. PID namespaces virtualize the
actual IDs in addition to introducing "sub-inits". This patch introduces
"sub-inits" inside the same PID namespace.
This patch is compile tested only. It's relatively trivial, and is
written in ignorance of the expected locking logic for accessing
task_struct->parent. This mail is primarily intended as a request for
comments. So please, I'd be happy about any comments!
Lennart
diff --git a/include/linux/prctl.h b/include/linux/prctl.h
index a3baeb2..e9b3dd1 100644
--- a/include/linux/prctl.h
+++ b/include/linux/prctl.h
@@ -102,4 +102,7 @@
#define PR_MCE_KILL_GET 34
+#define PR_SET_ANCHOR 35
+#define PR_GET_ANCHOR 36
+
#endif /* _LINUX_PRCTL_H */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index abdfacc..e9ab271 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1294,6 +1294,9 @@ struct task_struct {
* execve */
unsigned in_iowait:1;
+ /* When a child of one of our children dies, reparent it to me, instead
+ * of init. */
+ unsigned child_anchor:1;
/* Revert to default priority/policy when forking */
unsigned sched_reset_on_fork:1;
@@ -1306,6 +1309,7 @@ struct task_struct {
unsigned long stack_canary;
#endif
+
/*
* pointers to (original) parent process, youngest child, younger sibling,
* older sibling, respectively. (p->father can be replaced with
diff --git a/kernel/exit.c b/kernel/exit.c
index 546774a..416883e 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -704,7 +704,7 @@ static void exit_mm(struct task_struct * tsk)
static struct task_struct *find_new_reaper(struct task_struct *father)
{
struct pid_namespace *pid_ns = task_active_pid_ns(father);
- struct task_struct *thread;
+ struct task_struct *thread, *anchor;
thread = father;
while_each_thread(father, thread) {
@@ -715,6 +715,11 @@ static struct task_struct *find_new_reaper(struct task_struct *father)
return thread;
}
+ /* find the first ancestor which is marked child_anchor */
+ for (anchor = father->parent; anchor != &init_task; anchor = anchor->parent)
+ if (anchor->child_anchor)
+ return anchor;
+
if (unlikely(pid_ns->child_reaper == father)) {
write_unlock_irq(&tasklist_lock);
if (unlikely(pid_ns == &init_pid_ns))
diff --git a/kernel/fork.c b/kernel/fork.c
index 5b2959b..3d11673 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1265,6 +1265,8 @@ static struct task_struct *copy_process(unsigned long clone_flags,
p->parent_exec_id = current->self_exec_id;
}
+ p->child_anchor = 0;
+
spin_lock(¤t->sighand->siglock);
/*
diff --git a/kernel/sys.c b/kernel/sys.c
index 26a6b73..8a1dfb1 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1578,6 +1578,13 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
else
error = PR_MCE_KILL_DEFAULT;
break;
+ case PR_SET_ANCHOR:
+ me->child_anchor = !!arg2;
+ error = 0;
+ break;
+ case PR_GET_ANCHOR:
+ error = put_user(me->child_anchor, (int __user *) arg2);
+ break;
default:
error = -EINVAL;
break;
--
1.6.6
Lennart
--
Lennart Poettering Red Hat, Inc.
lennart [at] poettering [dot] net
http://0pointer.net/lennart/ GnuPG 0x1A015CC4
> [ I already sent this patch half a year ago or so, as an RFC. I didn't
> really get any comments back then, however I am still interested in
> seeing this patch in the kernel tree. So here I go again: please
> comment! I have updated the patch to apply to the current upstream git
> master. ]
>
> Right now, if a process dies all its children are reparented to init.
> This logic has good uses, i.e. for double forking when daemonizing.
> However it also allows child processes to "escape" their parents, which
> is a problem for software like session managers (such as gnome-session)
> or other process supervisors.
I think you need to explain why this patch improve gnome-session.
- What's happen on current gnome-session. and When?
- After the patch, Which behavior will be changed?
- Why do you think gnome-session can ignore old kernel?
- etc..
We don't have any input for judgement and advise.
>
> This patch adds a simple flag for each process that marks it as an
> "anchor" process for all its children and grandchildren. If a child of
> such an anchor dies all its children will not be reparented to init, but
> instead to this anchor, escaping this anchor process is not possible. A
> task with this flag set hence acts is little "sub-init".
>
> Anchors are fully recursive: if an anchor dies, all its children are
> reparented to next higher anchor in the process tree.
>
> This is orthogonal to PID namespaces. PID namespaces virtualize the
> actual IDs in addition to introducing "sub-inits". This patch introduces
> "sub-inits" inside the same PID namespace.
>
> This patch is compile tested only. It's relatively trivial, and is
> written in ignorance of the expected locking logic for accessing
> task_struct->parent. This mail is primarily intended as a request for
> comments. So please, I'd be happy about any comments!
On Wed, 03.02.10 17:24, KOSAKI Motohiro ([email protected]) wrote:
>
> > [ I already sent this patch half a year ago or so, as an RFC. I didn't
> > really get any comments back then, however I am still interested in
> > seeing this patch in the kernel tree. So here I go again: please
> > comment! I have updated the patch to apply to the current upstream git
> > master. ]
> >
> > Right now, if a process dies all its children are reparented to init.
> > This logic has good uses, i.e. for double forking when daemonizing.
> > However it also allows child processes to "escape" their parents, which
> > is a problem for software like session managers (such as gnome-session)
> > or other process supervisors.
>
> I think you need to explain why this patch improve gnome-session.
>
> - What's happen on current gnome-session. and When?
If a child of a supervisor daemon such as g-s does a double fork (and
unfortunately most existing user daemons do), then that supervisor
deaemon will be unable to monitor that child anymore, i.e. do
something when it dies, such as restarting it, or tearing the session
down, or doing something when it segfaults and so on.
Also, if g-s itself dies, clients that escaped it via double-forking
will stay around even if PR_DEATHSIG is used. With this patch applied
PR_DEATHSIG will work for them too because child processes cannot
escape their parents anymore if the parent wants that. And
getrusage(RUSAGE_CHILDREN) will start to return useful results in g-s
too.
Also, as a minor side-effect the output of "ps xawf" or similar tools
becomes much more useful since processes belonging to a session will
actually show up as children of g-s in the tree instead of as
unattached processes.
Right now, only init itself can do process supervising properly, since
it will be getting the SIGCHLD for those processes that escaped their
parents by double forking. With this patch I want to extend this
power to non-init supervisor daemons, such as g-s.
Also, this makes it easier to write and test init daemon because you
can run them as PID != 1 and still get very similar functionality
regarding SIGCHLD.
> - After the patch, Which behavior will be changed?
For normal processes, nothing. And for those which use this new
PR_SETACNHOR call the children won't be able to escape them anymore
via a double fork. Or as I already tried to explain:
> > This patch adds a simple flag for each process that marks it as an
> > "anchor" process for all its children and grandchildren. If a child of
> > such an anchor dies all its children will not be reparented to init, but
> > instead to this anchor, escaping this anchor process is not possible. A
> > task with this flag set hence acts as little "sub-init".
> - Why do you think gnome-session can ignore old kernel?
Did I say that?
On new kernels supervisor daemons can make use of this and children
won't be able to escape them. On old kernels they cannot and children
will continue to escape them. But uh, that should be fine. So on newer
kernels g-s can supervise all user daemons nicely, and on old kernels
we continue with the status quo. That should be fine.
Lennart
--
Lennart Poettering Red Hat, Inc.
lennart [at] poettering [dot] net
http://0pointer.net/lennart/ GnuPG 0x1A015CC4
On Tue, Feb 02, 2010 at 01:04:57PM +0100, Lennart Poettering wrote:
>
>This patch adds a simple flag for each process that marks it as an
>"anchor" process for all its children and grandchildren. If a child of
>such an anchor dies all its children will not be reparented to init, but
>instead to this anchor, escaping this anchor process is not possible. A
>task with this flag set hence acts is little "sub-init".
>
This will break the applictions which using 'getppid() == 1' to check
if its real parent is dead or not...
--
Live like a child, think like the god.
On Wed, 03.02.10 23:31, Am?rico Wang ([email protected]) wrote:
> On Tue, Feb 02, 2010 at 01:04:57PM +0100, Lennart Poettering wrote:
> >
> >This patch adds a simple flag for each process that marks it as an
> >"anchor" process for all its children and grandchildren. If a child of
> >such an anchor dies all its children will not be reparented to init, but
> >instead to this anchor, escaping this anchor process is not possible. A
> >task with this flag set hence acts is little "sub-init".
>
> This will break the applictions which using 'getppid() == 1' to check
> if its real parent is dead or not...
Usage of the PR_SETANCHOR flag is optional for a process. It won't
break anything unless enabled. So I don't really see a problem here.
Of course, when this flag is used the behaviour is different from what
traditional Unix says what happens with the children of a process when
it dies. But uh, that's the whole point and that's why this flag is
enabled optionally only.
Also, on a side note: code that checks if its parent process died most
likely should rewritten to use PR_DEATHSIG or something like that
anyway, so that it is notified about the parent dying instead of
polling for it manually.
Lennart
--
Lennart Poettering Red Hat, Inc.
lennart [at] poettering [dot] net
http://0pointer.net/lennart/ GnuPG 0x1A015CC4
On Tue, 2010-02-02 at 13:04 +0100, Lennart Poettering wrote:
> Right now, if a process dies all its children are reparented to init.
> This logic has good uses, i.e. for double forking when daemonizing.
> However it also allows child processes to "escape" their parents,
> which
> is a problem for software like session managers (such as
> gnome-session)
> or other process supervisors.
>
> This patch adds a simple flag for each process that marks it as an
> "anchor" process for all its children and grandchildren. If a child of
> such an anchor dies all its children will not be reparented to init,
> but instead to this anchor, escaping this anchor process is not possible.
> A task with this flag set hence acts is little "sub-init".
>
> Anchors are fully recursive: if an anchor dies, all its children are
> reparented to next higher anchor in the process tree.
>
> This is orthogonal to PID namespaces. PID namespaces virtualize the
> actual IDs in addition to introducing "sub-inits". This patch
> introduces
> "sub-inits" inside the same PID namespace.
Sounds good to me. And seems useful for all sorts of session tracking
and "prettifying ps". :)
It seems to work fine here. With a double-fork, the child gets the
intermediate-fork pid as the parent, and when this dies, it get
re-parented to the anchor pid instead of directly to pid 1. Only when
the anchor pid dies, it will be re-parented to pid 1.
Thanks,
Kay
$ ./sub-init 1
[26209] main: anchor=1
[26209] main: forked 'help' 26210
[26209] main: wait for 'help' to exit 26210
[26210] help: has parent 26209
[26210] help: forked 'child' 26211, sleep
[26211] child: has parent 26210, sleep
[26211] child: has parent 26210, sleep
[26210] help: exit
[26209] main: 'help' 26210 returned, sleep
[26211] child: has parent 26209, sleep
[26211] child: has parent 26209, sleep
[26209] main: exit
[26211] child: has parent 1, sleep
[26211] child: has parent 1, sleep
[26211] child: has parent 1, sleep
[26211] child: has parent 1, sleep
[26211] child: has parent 1, sleep
[26211] child: has parent 1, sleep
[26211] child: exit
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/prctl.h>
#include <sys/wait.h>
#define PR_SET_ANCHOR 35
#define PR_GET_ANCHOR 36
int main(int argc, char *argv[])
{
int is_anch;
pid_t pid;
if (argc > 1)
prctl(PR_SET_ANCHOR, 1);
prctl(PR_GET_ANCHOR, &is_anch);
printf("[%i] main: anchor=%i\n", getpid(), is_anch);
pid = fork();
if (pid == 0) {
pid_t pid2;
printf("[%i] help: has parent %i\n", getpid(), getppid());
pid2 = fork();
if (pid2 == 0) {
int i;
for (i = 0; i < 30; i += 3) {
printf("[%i] child: has parent %i, sleep\n", getpid(), getppid());
sleep(1);
}
printf("[%i] child: exit\n", getpid());
} else {
printf("[%i] help: forked 'child' %i, sleep\n", getpid(), pid2);
sleep(2);
printf("[%i] help: exit\n", getpid());
return 0;
}
} else {
printf("[%i] main: forked 'help' %i\n", getpid(), pid);
printf("[%i] main: wait for 'help' to exit %i\n", getpid(), pid);
waitpid(pid, NULL, 0);
printf("[%i] main: 'help' %i returned, sleep\n", getpid(), pid);
sleep(2);
printf("[%i] main: exit\n", getpid());
}
return 0;
}
On Thu, 2010-02-04 at 16:42 +0100, Kay Sievers wrote:
> Sounds good to me. And seems useful for all sorts of session tracking
> and "prettifying ps". :)
Here is the output of 'ps" with a wrapped gnome-session with the anchor
flag set. All the started programs stay childs of the session, instead
of becoming childs of init:
Thanks,
Kay
PID TTY STAT TIME COMMAND
2 ? S 0:00 [kthreadd]
3 ? S 0:00 \_ [migration/0]
4 ? S 0:00 \_ [ksoftirqd/0]
5 ? S 0:00 \_ [migration/1]
6 ? S 0:00 \_ [ksoftirqd/1]
7 ? S 0:00 \_ [events/0]
8 ? S 0:00 \_ [events/1]
9 ? S 0:00 \_ [khelper]
10 ? S 0:00 \_ [async/mgr]
11 ? S 0:00 \_ [sync_supers]
12 ? S 0:00 \_ [bdi-default]
13 ? S 0:00 \_ [kblockd/0]
14 ? S 0:00 \_ [kblockd/1]
15 ? S 0:00 \_ [kacpid]
16 ? S 0:00 \_ [kacpi_notify]
17 ? S 0:00 \_ [kacpi_hotplug]
18 ? S 0:00 \_ [ata/0]
19 ? S 0:00 \_ [ata/1]
20 ? S 0:00 \_ [ata_aux]
21 ? S 0:00 \_ [kseriod]
24 ? S 0:00 \_ [kondemand/0]
25 ? S 0:00 \_ [kondemand/1]
26 ? S 0:00 \_ [kswapd0]
27 ? S 0:00 \_ [aio/0]
28 ? S 0:00 \_ [aio/1]
29 ? S 0:00 \_ [crypto/0]
30 ? S 0:00 \_ [crypto/1]
33 ? S 0:00 \_ [scsi_eh_0]
34 ? S 0:00 \_ [scsi_eh_1]
35 ? S 0:00 \_ [scsi_eh_2]
36 ? S 0:00 \_ [scsi_eh_3]
41 ? S 0:00 \_ [kpsmoused]
43 ? S 0:00 \_ [jbd2/sda1-8]
44 ? S 0:00 \_ [ext4-dio-unwrit]
45 ? S 0:00 \_ [ext4-dio-unwrit]
233 ? S 0:00 \_ [ksuspend_usbd]
238 ? S 0:00 \_ [khubd]
272 ? S 0:00 \_ [cfg80211]
283 ? S 0:00 \_ [kvm-irqfd-clean]
324 ? S 0:00 \_ [ktpacpid]
339 ? S 0:00 \_ [iwlagn]
340 ? S 0:00 \_ [phy0]
364 ? S 0:00 \_ [i915]
425 ? S 0:00 \_ [hd-audio0]
471 ? S 0:00 \_ [flush-259:0]
489 ? S 0:00 \_ [usbhid_resumer]
502 ? S 0:00 \_ [scsi_eh_4]
503 ? S 0:00 \_ [usb-storage]
514 ? S 0:00 \_ [kauditd]
526 ? S 0:00 \_ [kstriped]
564 ? S 0:00 \_ [kjournald]
1 ? Ss 0:00 init [5]
96 ? S<s 0:00 /sbin/udevd --daemon
212 ? S< 0:00 \_ /sbin/udevd --daemon
213 ? S< 0:00 \_ /sbin/udevd --daemon
913 ? Ss 0:00 /sbin/acpid
920 ? Ss 0:00 /bin/dbus-daemon --system
1068 ? Ss 0:00 avahi-daemon: running [yio.local]
1086 ? Sl 0:00 /sbin/rsyslogd -c 4 -f /etc/rsyslog.conf
1091 ? Ssl 0:00 /usr/sbin/console-kit-daemon
1139 ? Ss 0:00 /usr/sbin/sshd -o PidFile=/var/run/sshd.init.pid
1234 ? Ssl 0:00 /usr/sbin/nscd
1252 ? Ss 0:00 /usr/sbin/cupsd -C /etc/cups/cupsd.conf
1255 ? S 0:00 /usr/sbin/gdm
1263 ? S 0:00 \_ /usr/lib/gdm/gdm-simple-slave --display-id /org/gnome/DisplayManager/Display1
1290 tty7 Ss+ 0:15 \_ /usr/bin/Xorg :0 -br -verbose -auth /var/run/gdm/auth-for-gdm-t73y8a/database -nolisten tcp vt7
1445 ? S 0:00 \_ /usr/lib/gdm/gdm-session-worker
1455 ? Ssl 0:00 \_ /usr/bin/gnome-session
1535 ? Ss 0:00 \_ /usr/bin/gpg-agent --sh --daemon --write-env-file /home/kay/.gnupg/agent.info /usr/bin/ssh-agent /bin/bash /etc/X11/xinit/xinitrc
1536 ? Ss 0:00 \_ /usr/bin/ssh-agent /bin/bash /etc/X11/xinit/xinitrc
1546 ? S 0:00 \_ dbus-launch --exit-with-session /usr/bin/gnome-session
1547 ? Ss 0:00 \_ /bin/dbus-daemon --fork --print-pid 5 --print-address 9 --session
1556 ? S 0:00 \_ /usr/lib/GConf/2/gconfd-2
1588 ? Sl 0:00 \_ gnome-keyring-daemon --start --components=pkcs11
1589 ? SLl 0:00 \_ gnome-keyring-daemon --start --components=secrets
1592 ? Sl 0:00 \_ gnome-keyring-daemon --start --components=ssh
1597 ? Ssl 0:01 \_ /usr/lib/gnome-settings-daemon/gnome-settings-daemon
1598 ? Ss 0:00 \_ seahorse-daemon
1604 ? S 0:00 \_ /usr/lib64/gvfs/gvfsd
1611 ? Ssl 0:00 \_ /usr/lib64/gvfs//gvfs-fuse-daemon /home/kay/.gvfs
1636 ? S 0:01 \_ /usr/bin/metacity
1642 ? Ssl 0:00 \_ /usr/bin/pulseaudio --start --log-target=syslog
1740 ? S 0:00 | \_ /usr/lib/pulse/gconf-helper
1649 ? S 0:01 \_ gnome-panel
1651 ? S 0:02 \_ nautilus
1653 ? Ssl 0:00 \_ /usr/lib/bonobo/bonobo-activation-server --ac-activate --ior-output-fd=18
1668 ? S 0:00 \_ python /usr/share/system-config-printer/applet.py
1669 ? S 0:03 \_ /usr/lib/gnome-main-menu/main-menu --oaf-activate-iid=OAFIID:GNOME_MainMenu_Factory --oaf-ior-fd=18
1672 ? S 0:00 \_ evolution-alarm-notify
1673 ? S 0:00 \_ /usr/lib/polkit-gnome/polkit-gnome-authentication-agent-1
1676 ? S 0:00 \_ gnome-power-manager
1678 ? S 0:00 \_ gnome-volume-control-applet
1681 ? S 0:01 \_ nm-applet --sm-disable
1684 ? S 0:00 \_ /usr/lib/gdu-notification-daemon
1687 ? S 0:00 \_ bluetooth-applet
1705 ? S 0:00 \_ /usr/lib/notification-daemon-1.0/notification-daemon
1712 ? S 0:00 \_ /usr/lib/evolution-data-server/e-calendar-factory
1714 ? Ss 0:00 \_ gnome-screensaver
1719 ? S 0:00 \_ /usr/lib/evolution-data-server/e-addressbook-factory
1726 ? S 0:00 \_ /usr/lib64/gvfs/gvfs-gdu-volume-monitor
1737 ? S 0:00 \_ /usr/lib64/gvfs/gvfs-gphoto2-volume-monitor
1745 ? S 0:00 \_ /usr/lib64/gvfs/gvfsd-trash --spawner :1.8 /org/gtk/gvfs/exec_spaw/0
1774 ? S 0:00 \_ /usr/lib64/gvfs/gvfsd-burn --spawner :1.8 /org/gtk/gvfs/exec_spaw/1
1786 ? S 0:00 \_ /usr/lib64/gvfs/gvfsd-metadata
1885 ? Sl 0:01 \_ /usr/bin/gnome-terminal -x /bin/sh -c cd '/home/kay/Desktop' && exec $SHELL
1927 ? S 0:00 | \_ gnome-pty-helper
1928 pts/1 Ss 0:00 | \_ /bin/bash
2124 pts/1 R+ 0:00 | \_ ps afx
1981 ? S 0:01 \_ pidgin
2014 ? SLl 0:06 \_ evolution
2065 ? S 0:00 \_ /bin/sh /usr/bin/firefox
2070 ? Rl 0:04 | \_ /usr/lib64/firefox/firefox
2111 ? S 0:01 \_ xchat
2112 ? S 0:00 | \_ xchat
2123 ? S 0:00 | \_ xchat
2117 ? S 0:00 \_ palimpsest
1356 ? Ss 0:00 /usr/lib/postfix/master
1377 ? S 0:00 \_ pickup -l -t fifo -u
1390 ? Ss 0:00 /usr/sbin/crond
1434 ? Ssl 0:00 /usr/sbin/NetworkManager
1695 ? S 0:00 \_ /sbin/dhclient -d -sf /usr/lib/NetworkManager/nm-dhcp-client.action -pf /var/run/dhclient-eth0.pid -lf /var/lib/dhcp/dhclient-73a36e75-368a-434c-b6c0-cfda0e3f1b50-eth0.lease -cf /var/run/nm-dhclient-eth0.conf eth0
1438 ? S 0:00 /usr/sbin/modem-manager
1441 ? S 0:00 /usr/sbin/wpa_supplicant -c /etc/wpa_supplicant/wpa_supplicant.conf -u -f /var/log/wpa_supplicant.log
1443 ? S 0:00 /usr/sbin/nm-system-settings --config /etc/NetworkManager/nm-system-settings.conf
1550 ? S 0:00 /usr/lib/DeviceKit-power/devkit-power-daemon
1644 ? SNl 0:00 /usr/lib/rtkit/rtkit-daemon
1648 ? S 0:06 /usr/lib/polkit-1/polkitd
1707 ? S 0:00 /usr/lib/DeviceKit-disks/devkit-disks-daemon
1708 ? S 0:00 \_ devkit-disks-daemon: polling /dev/sdb /dev/sdc
1807 tty1 Ss+ 0:00 /sbin/mingetty --noclear tty1
1808 tty2 Ss+ 0:00 /sbin/mingetty tty2
1809 tty3 Ss+ 0:00 /sbin/mingetty tty3
1810 tty4 Ss+ 0:00 /sbin/mingetty tty4
1811 tty5 Ss+ 0:00 /sbin/mingetty tty5
1812 tty6 Ss+ 0:00 /sbin/mingetty tty6
On Thu, Feb 4, 2010 at 1:49 AM, Lennart Poettering <[email protected]> wrote:
> On Wed, 03.02.10 23:31, Américo Wang ([email protected]) wrote:
>
>> On Tue, Feb 02, 2010 at 01:04:57PM +0100, Lennart Poettering wrote:
>> >
>> >This patch adds a simple flag for each process that marks it as an
>> >"anchor" process for all its children and grandchildren. If a child of
>> >such an anchor dies all its children will not be reparented to init, but
>> >instead to this anchor, escaping this anchor process is not possible. A
>> >task with this flag set hence acts is little "sub-init".
>>
>> This will break the applictions which using 'getppid() == 1' to check
>> if its real parent is dead or not...
>
> Usage of the PR_SETANCHOR flag is optional for a process. It won't
> break anything unless enabled. So I don't really see a problem here.
>
> Of course, when this flag is used the behaviour is different from what
> traditional Unix says what happens with the children of a process when
> it dies. But uh, that's the whole point and that's why this flag is
> enabled optionally only.
As for the example you mentioned, gnome-session, with your patch
applied, it will use this to set itself as "anchor", all the programs that
are started within it will be children of it. If one of these programs uses
"getppid() == 1" trick, it will break it.
>
> Also, on a side note: code that checks if its parent process died most
> likely should rewritten to use PR_DEATHSIG or something like that
> anyway, so that it is notified about the parent dying instead of
> polling for it manually.
>
I agree, but this is not a reason for you to break the compatiblity.
On Fri, Feb 5, 2010 at 10:54, Américo Wang <[email protected]> wrote:
>> Also, on a side note: code that checks if its parent process died most
>> likely should rewritten to use PR_DEATHSIG or something like that
>> anyway, so that it is notified about the parent dying instead of
>> polling for it manually.
>>
>
> I agree, but this is not a reason for you to break the compatiblity.
Any substantial comments instead?
Thanks,
Kay
On Tue, Feb 2, 2010 at 12:04 PM, Lennart Poettering
<[email protected]> wrote:
> Right now, if a process dies all its children are reparented to init.
> This logic has good uses, i.e. for double forking when daemonizing.
> However it also allows child processes to "escape" their parents, which
> is a problem for software like session managers (such as gnome-session)
> or other process supervisors.
>
> This patch adds a simple flag for each process that marks it as an
> "anchor" process for all its children and grandchildren. If a child of
> such an anchor dies all its children will not be reparented to init, but
> instead to this anchor, escaping this anchor process is not possible. A
> task with this flag set hence acts is little "sub-init".
>
Why can't you simply begin a new pid namespace with the session
manager or other process supervisor? That way the session
manager/process supervisor is for all intents and purposes an init
daemon, so shouldn't be surprised about getting SIGCHLD.
More to the point, it means that as far as the processes themselves
are concerned, they're being reparented to pid 1 just as they were
before, so you wouldn't be breaking any assumptions there either.
You could use the existing init daemon to create these pid namespaces
when it spawns the session manager.
Scott
On Mon, Dec 20, 2010 at 15:26, Scott James Remnant <[email protected]> wrote:
> On Tue, Feb 2, 2010 at 12:04 PM, Lennart Poettering
> <[email protected]> wrote:
>
>> Right now, if a process dies all its children are reparented to init.
>> This logic has good uses, i.e. for double forking when daemonizing.
>> However it also allows child processes to "escape" their parents, which
>> is a problem for software like session managers (such as gnome-session)
>> or other process supervisors.
>>
>> This patch adds a simple flag for each process that marks it as an
>> "anchor" process for all its children and grandchildren. If a child of
>> such an anchor dies all its children will not be reparented to init, but
>> instead to this anchor, escaping this anchor process is not possible. A
>> task with this flag set hence acts is little "sub-init".
>>
> Why can't you simply begin a new pid namespace with the session
> manager or other process supervisor?
We do not want to disconnect users from the system. Too much stuff
depends on that for good reasons.
This is only about a "user init" process, which is a much softer
concept which better fits into our current setups. It is not really
about disconnecting the user from the system, by putting him in a
"container".
The systems view from the management/administration perspective, with
users with their own pids, would really get far to complicated, I
think.
> That way the session
> manager/process supervisor is for all intents and purposes an init
> daemon, so shouldn't be surprised about getting SIGCHLD.
That shouldn't be a problem.
> More to the point, it means that as far as the processes themselves
> are concerned, they're being reparented to pid 1 just as they were
> before, so you wouldn't be breaking any assumptions there either.
For now, I don't think that this will break anything. Stuff that
really expects to have ppid() == 1 should be fixed anyway.
> You could use the existing init daemon to create these pid namespaces
> when it spawns the session manager.
We already use the existing init daemon for that. :)
This is mainly about 'prettifying ps'. The cgroups already provide us
with all the needed information, it would be just nice to localize
SIGCHLD handling to the "user init", where the signal belongs to.
Kay
On Mon, 20.12.10 14:26, Scott James Remnant ([email protected]) wrote:
> > This patch adds a simple flag for each process that marks it as an
> > "anchor" process for all its children and grandchildren. If a child of
> > such an anchor dies all its children will not be reparented to init, but
> > instead to this anchor, escaping this anchor process is not possible. A
> > task with this flag set hence acts is little "sub-init".
> >
> Why can't you simply begin a new pid namespace with the session
> manager or other process supervisor? That way the session
> manager/process supervisor is for all intents and purposes an init
> daemon, so shouldn't be surprised about getting SIGCHLD.
PID namespaces primarily provide an independent PID numbering scheme for
a subset of processes, i.e. so that identical may PIDs refer to different
processes depending on the namespace they are running in. As a side
effect this also provides init-like behaviour for processes that aren't
the original PID 1 of the operating system. For systemd we are only
interested in this side effect, but are not interested at all in the
renumbering of processes, and in fact would even really dislike if it
happened. That's why PR_SET_ANCHOR is useful: it gives us init-like
behaviour without renaming all processes.
Lennart
--
Lennart Poettering - Red Hat, Inc.
On Tue, Dec 21, 2010 at 9:56 AM, Lennart Poettering
<[email protected]> wrote:
> On Mon, 20.12.10 14:26, Scott James Remnant ([email protected]) wrote:
>
>> > This patch adds a simple flag for each process that marks it as an
>> > "anchor" process for all its children and grandchildren. If a child of
>> > such an anchor dies all its children will not be reparented to init, but
>> > instead to this anchor, escaping this anchor process is not possible. A
>> > task with this flag set hence acts is little "sub-init".
>> >
>> Why can't you simply begin a new pid namespace with the session
>> manager or other process supervisor? ?That way the session
>> manager/process supervisor is for all intents and purposes an init
>> daemon, so shouldn't be surprised about getting SIGCHLD.
>
> PID namespaces primarily provide an independent PID numbering scheme for
> a subset of processes, i.e. so that identical may PIDs refer to different
> processes depending on the namespace they are running in. As a side
> effect this also provides init-like behaviour for processes that aren't
> the original PID 1 of the operating system. For systemd we are only
> interested in this side effect, but are not interested at all in the
> renumbering of processes, and in fact would even really dislike if it
> happened. That's why PR_SET_ANCHOR is useful: it gives us init-like
> behaviour without renaming all processes.
>
Right, but I don't get why you need this behavior to supervise either
system or user processes. You already get all the functionality you
need to track processes via either cgroups or the proc connector (or a
combination of both).
So is this really just about making ps look pretty, as Kay says?
Scott
On Tue, 21.12.10 12:05, Scott James Remnant ([email protected]) wrote:
> > PID namespaces primarily provide an independent PID numbering scheme for
> > a subset of processes, i.e. so that identical may PIDs refer to different
> > processes depending on the namespace they are running in. As a side
> > effect this also provides init-like behaviour for processes that aren't
> > the original PID 1 of the operating system. For systemd we are only
> > interested in this side effect, but are not interested at all in the
> > renumbering of processes, and in fact would even really dislike if it
> > happened. That's why PR_SET_ANCHOR is useful: it gives us init-like
> > behaviour without renaming all processes.
> >
> Right, but I don't get why you need this behavior to supervise either
> system or user processes. You already get all the functionality you
> need to track processes via either cgroups or the proc connector (or a
> combination of both).
Well, we want a clean way to get access to the full siginfo_t of the
SIGCHLD for the main process of a service. the proc connector is awful
and cgroups does not pass siginfo_t's back to userspace, hence the
cleanest way to get this done properly and beautifully is to make the
session systemd a mini-init via PR_SET_ANCHOR, because then the per-user
systemd's and the per-system systemd can use the exact same code to
handle process managment.
> So is this really just about making ps look pretty, as Kay says?
That's a side effect, but for me it's mostly about getting a simple way
to get the SIGCHLDs, focussed on the children of the session manager and
with minimal wakeups.
Lennart
--
Lennart Poettering - Red Hat, Inc.
On Thu, Dec 23, 2010 at 3:44 PM, Lennart Poettering
<[email protected]> wrote:
> On Tue, 21.12.10 12:05, Scott James Remnant ([email protected]) wrote:
>
>> > PID namespaces primarily provide an independent PID numbering scheme for
>> > a subset of processes, i.e. so that identical may PIDs refer to different
>> > processes depending on the namespace they are running in. As a side
>> > effect this also provides init-like behaviour for processes that aren't
>> > the original PID 1 of the operating system. For systemd we are only
>> > interested in this side effect, but are not interested at all in the
>> > renumbering of processes, and in fact would even really dislike if it
>> > happened. That's why PR_SET_ANCHOR is useful: it gives us init-like
>> > behaviour without renaming all processes.
>> >
>> Right, but I don't get why you need this behavior to supervise either
>> system or user processes. ?You already get all the functionality you
>> need to track processes via either cgroups or the proc connector (or a
>> combination of both).
>
> Well, we want a clean way to get access to the full siginfo_t of the
> SIGCHLD for the main process of a service. the proc connector is awful
> and cgroups does not pass siginfo_t's back to userspace, hence the
> cleanest way to get this done properly and beautifully is to make the
> session systemd a mini-init via PR_SET_ANCHOR, because then the per-user
> systemd's and the per-system systemd can use the exact same code to
> handle process managment.
>
Ah, if you're after the siginfo_t this makes more sense.
You may or may not be interested in a patch I did a couple of years
ago, this also spliced into the same kind of code, but to notify init
via signal when it got given a new process via adoption. I assume
this would work with a PR_SET_ANCHOR'd mini-init too?
(Apologies if the attachment screws up, still getting used to gmail :p)
Scott