Cc Oleg and Roland and moving discussion to LKML.
Daniel Lezcano [[email protected]] wrote:
> Hi,
>
> I noticed a changed behaviour with the PR_SET_PDEATHSIG and SIGKILL
> between different kernel versions.
>
> With a kernel 2.6.27.21-78.2.41.fc9.x86_64, the SIGKILL signal is
> delivered to the child process when the parent dies but with a 2.6.31
> kernel version that don't happen.
>
> The program below shows the problem. I remember there was were some
> modifications about not killing the init process of the container from
> inside, but in this case, that happens _conceptually_ from outside.
> Keeping this feature is very important to be able to wipe out the
> container when the parent process of the container dies.
(Test case moved to attachment).
---
Container init must not be immune to signals from parent. But as pointed
out by Daniel Lezcano:
https://lists.linux-foundation.org/pipermail/containers/2009-October/021121.html
container-init is currently immune to signals from parent, if sent via
->pdeath_signal. This is because the siginfo for ->pdeath_signal is set to
SEND_SIG_NOINFO which is considered special.
This quick patch passes in siginfo explicitly (just like we do when sending
SIGCHLD to parent) and seems to fix the problem. Not though sure if
->pdeath_signal needs to be 'is_si_special()'.
Changelog [v2]:
- [Oleg Nesterov] Add missing initializer, ->si_code = SI_USER
- [Sukadev Bhattiprolu] Use 'tgid' of parent instead of 'pid'.
---
kernel/exit.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
Index: linux-2.6/kernel/exit.c
===================================================================
--- linux-2.6.orig/kernel/exit.c 2009-10-02 19:23:00.000000000 -0700
+++ linux-2.6/kernel/exit.c 2009-10-03 10:02:42.000000000 -0700
@@ -738,8 +738,20 @@ static struct task_struct *find_new_reap
static void reparent_thread(struct task_struct *father, struct task_struct *p,
struct list_head *dead)
{
- if (p->pdeath_signal)
- group_send_sig_info(p->pdeath_signal, SEND_SIG_NOINFO, p);
+ if (p->pdeath_signal) {
+ struct siginfo info;
+
+ info.si_code = SI_USER;
+ info.si_signo = p->pdeath_signal;
+ info.si_errno = 0;
+
+ rcu_read_lock();
+ info.si_pid = task_tgid_nr_ns(father, task_active_pid_ns(p));
+ info.si_uid = __task_cred(father)->uid;
+ rcu_read_unlock();
+
+ group_send_sig_info(p->pdeath_signal, &info, p);
+ }
list_move_tail(&p->sibling, &p->real_parent->children);
On 10/03, Sukadev Bhattiprolu wrote:
>
> static void reparent_thread(struct task_struct *father, struct task_struct *p,
> struct list_head *dead)
> {
> - if (p->pdeath_signal)
> - group_send_sig_info(p->pdeath_signal, SEND_SIG_NOINFO, p);
> + if (p->pdeath_signal) {
> + struct siginfo info;
> +
> + info.si_code = SI_USER;
> + info.si_signo = p->pdeath_signal;
> + info.si_errno = 0;
> +
> + rcu_read_lock();
> + info.si_pid = task_tgid_nr_ns(father, task_active_pid_ns(p));
> + info.si_uid = __task_cred(father)->uid;
> + rcu_read_unlock();
> +
> + group_send_sig_info(p->pdeath_signal, &info, p);
> + }
I think the patch is correct.
But afaics we should clarify the "from user" semantics and fix
send_signal() instead.
What do you think about this simple series? (the last 2 patches
are pure cosmetic and off-topic).
Oleg.
No changes in compiled code. The patch adds the new helper, si_fromuser()
and changes check_kill_permission() to use this helper.
The real effect of this patch is that from now we "officially" consider
SEND_SIG_NOINFO signal as "from user-space" signals. This is already true
if we look at the code which uses SEND_SIG_NOINFO, except __send_signal()
has another opinion - see the next patch.
The naming of these special SEND_SIG_XXX siginfo's is really bad imho.
>From __send_signal()'s pov they mean
SEND_SIG_NOINFO from user
SEND_SIG_PRIV from kernel
SEND_SIG_FORCED no info
Signed-off-by: Oleg Nesterov <[email protected]>
---
include/linux/sched.h | 5 -----
kernel/signal.c | 16 +++++++++++++---
2 files changed, 13 insertions(+), 8 deletions(-)
--- TTT_32/include/linux/sched.h~FU_1_HELPER 2009-09-24 21:38:54.000000000 +0200
+++ TTT_32/include/linux/sched.h 2009-10-04 02:21:49.000000000 +0200
@@ -2081,11 +2081,6 @@ static inline int kill_cad_pid(int sig,
#define SEND_SIG_PRIV ((struct siginfo *) 1)
#define SEND_SIG_FORCED ((struct siginfo *) 2)
-static inline int is_si_special(const struct siginfo *info)
-{
- return info <= SEND_SIG_FORCED;
-}
-
/* True if we are on the alternate signal stack. */
static inline int on_sig_stack(unsigned long sp)
--- TTT_32/kernel/signal.c~FU_1_HELPER 2009-09-24 21:38:54.000000000 +0200
+++ TTT_32/kernel/signal.c 2009-10-04 02:21:55.000000000 +0200
@@ -584,6 +584,17 @@ static int rm_from_queue(unsigned long m
return 1;
}
+static inline int is_si_special(const struct siginfo *info)
+{
+ return info <= SEND_SIG_FORCED;
+}
+
+static inline bool si_fromuser(const struct siginfo *info)
+{
+ return info == SEND_SIG_NOINFO ||
+ (!is_si_special(info) && SI_FROMUSER(info));
+}
+
/*
* Bad permissions for sending the signal
* - the caller must hold at least the RCU read lock
@@ -598,7 +609,7 @@ static int check_kill_permission(int sig
if (!valid_signal(sig))
return -EINVAL;
- if (info != SEND_SIG_NOINFO && (is_si_special(info) || SI_FROMKERNEL(info)))
+ if (!si_fromuser(info))
return 0;
error = audit_signal_info(sig, t); /* Let audit system see the signal */
@@ -1156,8 +1167,7 @@ int kill_pid_info_as_uid(int sig, struct
goto out_unlock;
}
pcred = __task_cred(p);
- if ((info == SEND_SIG_NOINFO ||
- (!is_si_special(info) && SI_FROMUSER(info))) &&
+ if (si_fromuser(info) &&
euid != pcred->suid && euid != pcred->uid &&
uid != pcred->suid && uid != pcred->uid) {
ret = -EPERM;
Change send_signal() to use si_fromuser(). From now SEND_SIG_NOINFO
triggers the "from_ancestor_ns" check.
This fixes reparent_thread()->group_send_sig_info(pdeath_signal)
behaviour, before this patch send_signal() does not detect the
cross-namespace case when the child of the dying parent belongs
to the sub-namespace.
This patch can affect the behaviour of send_sig(), kill_pgrp() and
kill_pid() when the caller sends the signal to the sub-namespace
with "priv == 0" but surprisingly all callers seem to use them
correctly, including disassociate_ctty(on_exit).
Except: drivers/staging/comedi/drivers/addi-data/*.c incorrectly
use send_sig(priv => 0). But his is minor and should be fixed
anyway.
Reported-by: Daniel Lezcano <[email protected]>
Signed-off-by: Oleg Nesterov <[email protected]>
---
kernel/signal.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
--- TTT_32/kernel/signal.c~FU_2_SEND_SIGNAL 2009-10-04 02:21:55.000000000 +0200
+++ TTT_32/kernel/signal.c 2009-10-04 03:09:44.000000000 +0200
@@ -928,9 +928,8 @@ static int send_signal(int sig, struct s
int from_ancestor_ns = 0;
#ifdef CONFIG_PID_NS
- if (!is_si_special(info) && SI_FROMUSER(info) &&
- task_pid_nr_ns(current, task_active_pid_ns(t)) <= 0)
- from_ancestor_ns = 1;
+ from_ancestor_ns = si_fromuser(info) &&
+ !task_pid_nr_ns(current, task_active_pid_ns(t));
#endif
return __send_signal(sig, info, t, group, from_ancestor_ns);
Trivial, s/0/SI_USER/ in collect_signal() for grep.
This is a bit confusing, we don't know the source of this signal.
But we don't care, and "info->si_code = 0" is imho worse.
Signed-off-by: Oleg Nesterov <[email protected]>
---
kernel/signal.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- TTT_32/kernel/signal.c~3_COLLECT_SIGNAL 2009-10-04 03:09:44.000000000 +0200
+++ TTT_32/kernel/signal.c 2009-10-04 03:48:48.000000000 +0200
@@ -400,7 +400,7 @@ still_pending:
*/
info->si_signo = sig;
info->si_errno = 0;
- info->si_code = 0;
+ info->si_code = SI_USER;
info->si_pid = 0;
info->si_uid = 0;
}
Kill force_sig_specific(), this trivial wrapper has no callers.
Signed-off-by: Oleg Nesterov <[email protected]>
---
include/linux/sched.h | 1 -
kernel/signal.c | 6 ------
2 files changed, 7 deletions(-)
--- TTT_32/include/linux/sched.h~4_KILL_force_sig_specific 2009-10-04 02:21:49.000000000 +0200
+++ TTT_32/include/linux/sched.h 2009-10-04 04:08:01.000000000 +0200
@@ -2062,7 +2062,6 @@ extern int kill_proc_info(int, struct si
extern int do_notify_parent(struct task_struct *, int);
extern void __wake_up_parent(struct task_struct *p, struct task_struct *parent);
extern void force_sig(int, struct task_struct *);
-extern void force_sig_specific(int, struct task_struct *);
extern int send_sig(int, struct task_struct *, int);
extern void zap_other_threads(struct task_struct *p);
extern struct sigqueue *sigqueue_alloc(void);
--- TTT_32/kernel/signal.c~4_KILL_force_sig_specific 2009-10-04 03:48:48.000000000 +0200
+++ TTT_32/kernel/signal.c 2009-10-04 04:08:36.000000000 +0200
@@ -1032,12 +1032,6 @@ force_sig_info(int sig, struct siginfo *
return ret;
}
-void
-force_sig_specific(int sig, struct task_struct *t)
-{
- force_sig_info(sig, SEND_SIG_FORCED, t);
-}
-
/*
* Nuke all other threads in the group.
*/
On 10/04, Oleg Nesterov wrote:
>
> No changes in compiled code. The patch adds the new helper, si_fromuser()
> and changes check_kill_permission() to use this helper.
>
> The real effect of this patch is that from now we "officially" consider
> SEND_SIG_NOINFO signal as "from user-space" signals. This is already true
> if we look at the code which uses SEND_SIG_NOINFO, except __send_signal()
> has another opinion - see the next patch.
>
> The naming of these special SEND_SIG_XXX siginfo's is really bad imho.
> From __send_signal()'s pov they mean
>
> SEND_SIG_NOINFO from user
To clarify, "from user" for SEND_SIG_NOINFO/SI_USER mean: sent by kernel
on behalf of some process. We should check permissions, sub-namespace,
we should fill si_pid/uid, etc.
Oleg.
Oleg Nesterov [[email protected]] wrote:
| No changes in compiled code. The patch adds the new helper, si_fromuser()
| and changes check_kill_permission() to use this helper.
|
| The real effect of this patch is that from now we "officially" consider
| SEND_SIG_NOINFO signal as "from user-space" signals. This is already true
| if we look at the code which uses SEND_SIG_NOINFO, except __send_signal()
| has another opinion - see the next patch.
|
| The naming of these special SEND_SIG_XXX siginfo's is really bad imho.
I agree :-)
| >From __send_signal()'s pov they mean
|
| SEND_SIG_NOINFO from user
Just to complicate further, all 'SEND_SIG_NOINFO' signals are from user,
but not all 'from user' signals are SEND_SIG_NOINFO.
| SEND_SIG_PRIV from kernel
SEND_SIG_PRIV also means there is no real info, just that sender is
privileged.
| SEND_SIG_FORCED no info
Are 'forced' signals considered 'from kernel' too ?
Separate from this patch, but would be good if we could fix the naming.
|
| Signed-off-by: Oleg Nesterov <[email protected]>
| ---
|
| include/linux/sched.h | 5 -----
| kernel/signal.c | 16 +++++++++++++---
| 2 files changed, 13 insertions(+), 8 deletions(-)
|
| --- TTT_32/include/linux/sched.h~FU_1_HELPER 2009-09-24 21:38:54.000000000 +0200
| +++ TTT_32/include/linux/sched.h 2009-10-04 02:21:49.000000000 +0200
| @@ -2081,11 +2081,6 @@ static inline int kill_cad_pid(int sig,
| #define SEND_SIG_PRIV ((struct siginfo *) 1)
| #define SEND_SIG_FORCED ((struct siginfo *) 2)
|
| -static inline int is_si_special(const struct siginfo *info)
| -{
| - return info <= SEND_SIG_FORCED;
| -}
| -
| /* True if we are on the alternate signal stack. */
|
| static inline int on_sig_stack(unsigned long sp)
| --- TTT_32/kernel/signal.c~FU_1_HELPER 2009-09-24 21:38:54.000000000 +0200
| +++ TTT_32/kernel/signal.c 2009-10-04 02:21:55.000000000 +0200
| @@ -584,6 +584,17 @@ static int rm_from_queue(unsigned long m
| return 1;
| }
|
| +static inline int is_si_special(const struct siginfo *info)
| +{
| + return info <= SEND_SIG_FORCED;
| +}
| +
| +static inline bool si_fromuser(const struct siginfo *info)
| +{
| + return info == SEND_SIG_NOINFO ||
| + (!is_si_special(info) && SI_FROMUSER(info));
| +}
| +
This change makes sense, but can we even drop the SEND_SIG_NOINFO
altogether and simply check for NULL:
return (!info || (is_si_special(info)) && SI_FROMUSER(info))
Sukadev
Oleg Nesterov [[email protected]] wrote:
| Trivial, s/0/SI_USER/ in collect_signal() for grep.
|
| This is a bit confusing, we don't know the source of this signal.
| But we don't care, and "info->si_code = 0" is imho worse.
|
| Signed-off-by: Oleg Nesterov <[email protected]>
Reviewed-by: Sukadev Bhattiprolu <[email protected]>
| ---
|
| kernel/signal.c | 2 +-
| 1 file changed, 1 insertion(+), 1 deletion(-)
|
| --- TTT_32/kernel/signal.c~3_COLLECT_SIGNAL 2009-10-04 03:09:44.000000000 +0200
| +++ TTT_32/kernel/signal.c 2009-10-04 03:48:48.000000000 +0200
| @@ -400,7 +400,7 @@ still_pending:
| */
| info->si_signo = sig;
| info->si_errno = 0;
| - info->si_code = 0;
| + info->si_code = SI_USER;
| info->si_pid = 0;
| info->si_uid = 0;
| }
Oleg Nesterov [[email protected]] wrote:
| Kill force_sig_specific(), this trivial wrapper has no callers.
|
| Signed-off-by: Oleg Nesterov <[email protected]>
Reviewed-by: Sukadev Bhattiprolu <[email protected]>
| ---
|
| include/linux/sched.h | 1 -
| kernel/signal.c | 6 ------
| 2 files changed, 7 deletions(-)
|
| --- TTT_32/include/linux/sched.h~4_KILL_force_sig_specific 2009-10-04 02:21:49.000000000 +0200
| +++ TTT_32/include/linux/sched.h 2009-10-04 04:08:01.000000000 +0200
| @@ -2062,7 +2062,6 @@ extern int kill_proc_info(int, struct si
| extern int do_notify_parent(struct task_struct *, int);
| extern void __wake_up_parent(struct task_struct *p, struct task_struct *parent);
| extern void force_sig(int, struct task_struct *);
| -extern void force_sig_specific(int, struct task_struct *);
| extern int send_sig(int, struct task_struct *, int);
| extern void zap_other_threads(struct task_struct *p);
| extern struct sigqueue *sigqueue_alloc(void);
| --- TTT_32/kernel/signal.c~4_KILL_force_sig_specific 2009-10-04 03:48:48.000000000 +0200
| +++ TTT_32/kernel/signal.c 2009-10-04 04:08:36.000000000 +0200
| @@ -1032,12 +1032,6 @@ force_sig_info(int sig, struct siginfo *
| return ret;
| }
|
| -void
| -force_sig_specific(int sig, struct task_struct *t)
| -{
| - force_sig_info(sig, SEND_SIG_FORCED, t);
| -}
| -
| /*
| * Nuke all other threads in the group.
| */
Oleg Nesterov [[email protected]] wrote:
| Change send_signal() to use si_fromuser(). From now SEND_SIG_NOINFO
| triggers the "from_ancestor_ns" check.
|
| This fixes reparent_thread()->group_send_sig_info(pdeath_signal)
| behaviour, before this patch send_signal() does not detect the
| cross-namespace case when the child of the dying parent belongs
| to the sub-namespace.
|
| This patch can affect the behaviour of send_sig(), kill_pgrp() and
| kill_pid() when the caller sends the signal to the sub-namespace
| with "priv == 0" but surprisingly all callers seem to use them
| correctly, including disassociate_ctty(on_exit).
|
| Except: drivers/staging/comedi/drivers/addi-data/*.c incorrectly
| use send_sig(priv => 0). But his is minor and should be fixed
| anyway.
|
| Reported-by: Daniel Lezcano <[email protected]>
| Signed-off-by: Oleg Nesterov <[email protected]>
| ---
|
| kernel/signal.c | 5 ++---
| 1 file changed, 2 insertions(+), 3 deletions(-)
|
| --- TTT_32/kernel/signal.c~FU_2_SEND_SIGNAL 2009-10-04 02:21:55.000000000 +0200
| +++ TTT_32/kernel/signal.c 2009-10-04 03:09:44.000000000 +0200
| @@ -928,9 +928,8 @@ static int send_signal(int sig, struct s
| int from_ancestor_ns = 0;
|
| #ifdef CONFIG_PID_NS
| - if (!is_si_special(info) && SI_FROMUSER(info) &&
| - task_pid_nr_ns(current, task_active_pid_ns(t)) <= 0)
| - from_ancestor_ns = 1;
| + from_ancestor_ns = si_fromuser(info) &&
| + !task_pid_nr_ns(current, task_active_pid_ns(t));
Makes sense. And we had mentioned earlier that container-init is immune
to suicide but should we add a check for 'current == t' above to cover the
send_sig(SIGKILL, current, 0);
in load_aout_binary() and friends
from_ancestor_ns = si_fromuser(info) && (current == t ||
!task_pid_nr_ns(current, task_active_pid_ns(t)));
Sukadev.
On 10/05, Sukadev Bhattiprolu wrote:
>
> Oleg Nesterov [[email protected]] wrote:
> |
> | --- TTT_32/kernel/signal.c~FU_2_SEND_SIGNAL 2009-10-04 02:21:55.000000000 +0200
> | +++ TTT_32/kernel/signal.c 2009-10-04 03:09:44.000000000 +0200
> | @@ -928,9 +928,8 @@ static int send_signal(int sig, struct s
> | int from_ancestor_ns = 0;
> |
> | #ifdef CONFIG_PID_NS
> | - if (!is_si_special(info) && SI_FROMUSER(info) &&
> | - task_pid_nr_ns(current, task_active_pid_ns(t)) <= 0)
> | - from_ancestor_ns = 1;
> | + from_ancestor_ns = si_fromuser(info) &&
> | + !task_pid_nr_ns(current, task_active_pid_ns(t));
>
> Makes sense. And we had mentioned earlier that container-init is immune
> to suicide but should we add a check for 'current == t' above to cover the
>
> send_sig(SIGKILL, current, 0);
>
> in load_aout_binary() and friends
>
> from_ancestor_ns = si_fromuser(info) && (current == t ||
> !task_pid_nr_ns(current, task_active_pid_ns(t)));
I don't think so.
First of all, this is just ugly. If we need this check we should change
the callers, not send_signal().
But more importantly, I disagree with "container-init is immune to suicide"
above. This is another issue I was going to discuss later, lets do this now.
When load_elf_binary() does send_sig(SIGKILL, current) init must die, because
we have no option. Exec failed, but we can't return to user-space with the
error code, it is too late.
So, imho this patch also fixes this case by accident, but I think it would
be better to change load_aout_binary/etc to use force_sig_info() to make
the code more explicit.
What do you think?
Oleg.
On 10/05, Sukadev Bhattiprolu wrote:
>
> Oleg Nesterov [[email protected]] wrote:
> | >From __send_signal()'s pov they mean
> |
> | SEND_SIG_NOINFO from user
>
> Just to complicate further, all 'SEND_SIG_NOINFO' signals are from user,
> but not all 'from user' signals are SEND_SIG_NOINFO.
Yes, SEND_SIG_NOINFO means: sent by kernel on behalf of some process.
>
> | SEND_SIG_PRIV from kernel
>
> SEND_SIG_PRIV also means there is no real info, just that sender is
> privileged.
Well. Unlike SEND_SIG_FORCED, SEND_SIG_NOINFO/SEND_SIG_NOINFO ask
__send_signal() to allocate and queue "struct sigqueue". But
SEND_SIG_PRIV and SEND_SIG_NOINFO both mean the real info, jut
this info is filled by __send_signal().
> | SEND_SIG_FORCED no info
>
> Are 'forced' signals considered 'from kernel' too ?
I think yes.
> | +static inline bool si_fromuser(const struct siginfo *info)
> | +{
> | + return info == SEND_SIG_NOINFO ||
> | + (!is_si_special(info) && SI_FROMUSER(info));
> | +}
> | +
>
> This change makes sense, but can we even drop the SEND_SIG_NOINFO
> altogether and simply check for NULL:
>
> return (!info || (is_si_special(info)) && SI_FROMUSER(info))
IOW, you suggest to use NULL instead of SEND_SIG_NOINFO.
Why?
If we use NULL as a "special" info, then SEND_SIG_FORCED semantics
makes more sense because __send_signal(SEND_SIG_FORCED) does not queue
info.
But I don't think we should use NULL. I think it is better to use the
symbolic names instead of NULL which is in fact the "harcoded constant".
But it would be nice to rename them.
Oleg.
Oleg Nesterov [[email protected]] wrote:
| On 10/05, Sukadev Bhattiprolu wrote:
| >
| > Oleg Nesterov [[email protected]] wrote:
| > |
| > | --- TTT_32/kernel/signal.c~FU_2_SEND_SIGNAL 2009-10-04 02:21:55.000000000 +0200
| > | +++ TTT_32/kernel/signal.c 2009-10-04 03:09:44.000000000 +0200
| > | @@ -928,9 +928,8 @@ static int send_signal(int sig, struct s
| > | int from_ancestor_ns = 0;
| > |
| > | #ifdef CONFIG_PID_NS
| > | - if (!is_si_special(info) && SI_FROMUSER(info) &&
| > | - task_pid_nr_ns(current, task_active_pid_ns(t)) <= 0)
| > | - from_ancestor_ns = 1;
| > | + from_ancestor_ns = si_fromuser(info) &&
| > | + !task_pid_nr_ns(current, task_active_pid_ns(t));
| >
| > Makes sense. And we had mentioned earlier that container-init is immune
| > to suicide but should we add a check for 'current == t' above to cover the
| >
| > send_sig(SIGKILL, current, 0);
| >
| > in load_aout_binary() and friends
| >
| > from_ancestor_ns = si_fromuser(info) && (current == t ||
| > !task_pid_nr_ns(current, task_active_pid_ns(t)));
|
| I don't think so.
|
| First of all, this is just ugly. If we need this check we should change
| the callers, not send_signal().
Well, all I am saying is that the check
!task_pid_nr_ns(current, task_active_pid_ns(t)))
excludes container-init sending signal to itself - task_pid_nr_ns() above
would return 1 for container-init and 'from_ancestor_ns' would be 0.
But sure, we could use force_sig_info() in caller.
|
| But more importantly, I disagree with "container-init is immune to suicide"
| above. This is another issue I was going to discuss later, lets do this now.
Ok :-)
|
| When load_elf_binary() does send_sig(SIGKILL, current) init must die, because
| we have no option. Exec failed, but we can't return to user-space with the
| error code, it is too late.
|
Hence the SIGKILL - I agree with this.
| So, imho this patch also fixes this case by accident,
This part I am not sure. But as mentioned above, from_ancestor_ns is 0
and the SIGKILL will not queued right ?
| but I think it would
| be better to change load_aout_binary/etc to use force_sig_info() to make
| the code more explicit.
|
| What do you think?
|
| Oleg.
On 10/05, Sukadev Bhattiprolu wrote:
>
> Oleg Nesterov [[email protected]] wrote:
> | On 10/05, Sukadev Bhattiprolu wrote:
> | >
> | > Oleg Nesterov [[email protected]] wrote:
> | > |
> | > | --- TTT_32/kernel/signal.c~FU_2_SEND_SIGNAL 2009-10-04 02:21:55.000000000 +0200
> | > | +++ TTT_32/kernel/signal.c 2009-10-04 03:09:44.000000000 +0200
> | > | @@ -928,9 +928,8 @@ static int send_signal(int sig, struct s
> | > | int from_ancestor_ns = 0;
> | > |
> | > | #ifdef CONFIG_PID_NS
> | > | - if (!is_si_special(info) && SI_FROMUSER(info) &&
> | > | - task_pid_nr_ns(current, task_active_pid_ns(t)) <= 0)
> | > | - from_ancestor_ns = 1;
> | > | + from_ancestor_ns = si_fromuser(info) &&
> | > | + !task_pid_nr_ns(current, task_active_pid_ns(t));
> | >
> | > Makes sense. And we had mentioned earlier that container-init is immune
> | > to suicide but should we add a check for 'current == t' above to cover the
> | >
> | > send_sig(SIGKILL, current, 0);
> | >
> | > in load_aout_binary() and friends
> | >
> | > from_ancestor_ns = si_fromuser(info) && (current == t ||
> | > !task_pid_nr_ns(current, task_active_pid_ns(t)));
> |
> | I don't think so.
> |
> | First of all, this is just ugly. If we need this check we should change
> | the callers, not send_signal().
>
> Well, all I am saying is that the check
>
> !task_pid_nr_ns(current, task_active_pid_ns(t)))
>
> excludes container-init sending signal to itself - task_pid_nr_ns() above
> would return 1 for container-init and 'from_ancestor_ns' would be 0.
Ah, I misunderstood you, and I misread the "current == t" check above.
I wrongly thought that you suggest to suppress "si_fromuser()" when
the task sends a signal to itself.
Sorry for confusion.
> But sure, we could use force_sig_info() in caller.
Yes, because this makes the code more explicit imho. And we can avoid
the further complicatiions in send_signal() path.
> | So, imho this patch also fixes this case by accident,
>
> This part I am not sure. But as mentioned above, from_ancestor_ns is 0
> and the SIGKILL will not queued right ?
Yes, you are right, see above.
I meant, it fixes the from-user logic, not from_ancestor_ns logic.
Oleg.
On 10/05, Oleg Nesterov wrote:
>
> On 10/05, Sukadev Bhattiprolu wrote:
> >
> > Oleg Nesterov [[email protected]] wrote:
>
> > | So, imho this patch also fixes this case by accident,
> >
> > This part I am not sure. But as mentioned above, from_ancestor_ns is 0
> > and the SIGKILL will not queued right ?
>
> Yes, you are right, see above.
>
> I meant, it fixes the from-user logic, not from_ancestor_ns logic.
Hmm. When I re-read that email, I do not understand any longer
what I reallly meant when I said "this patch also fixes this case" :)
Anyway, you are right, thanks for correcting me.OH
Oleg.
Oleg Nesterov [[email protected]] wrote:
| Sorry for confusion.
|
| > But sure, we could use force_sig_info() in caller.
|
| Yes, because this makes the code more explicit imho. And we can avoid
| the further complicatiions in send_signal() path.
Although, one small drawback would be the different behavior for the
SIGKILL in load_aout_binary() to the container-init itself calling:
kill(getpid(), SIGKILL);
(which of course is not very useful).
Oleg Nesterov [[email protected]] wrote:
| No changes in compiled code. The patch adds the new helper, si_fromuser()
| and changes check_kill_permission() to use this helper.
|
| The real effect of this patch is that from now we "officially" consider
| SEND_SIG_NOINFO signal as "from user-space" signals. This is already true
| if we look at the code which uses SEND_SIG_NOINFO, except __send_signal()
| has another opinion - see the next patch.
|
| The naming of these special SEND_SIG_XXX siginfo's is really bad imho.
| >From __send_signal()'s pov they mean
|
| SEND_SIG_NOINFO from user
| SEND_SIG_PRIV from kernel
| SEND_SIG_FORCED no info
|
| Signed-off-by: Oleg Nesterov <[email protected]>
Renaming the special siginfo cases be done independently.
Reviewed-by: Sukadev Bhattiprolu <[email protected]>
Oleg Nesterov [[email protected]] wrote:
| Change send_signal() to use si_fromuser(). From now SEND_SIG_NOINFO
| triggers the "from_ancestor_ns" check.
|
| This fixes reparent_thread()->group_send_sig_info(pdeath_signal)
| behaviour, before this patch send_signal() does not detect the
| cross-namespace case when the child of the dying parent belongs
| to the sub-namespace.
|
| This patch can affect the behaviour of send_sig(), kill_pgrp() and
| kill_pid() when the caller sends the signal to the sub-namespace
| with "priv == 0" but surprisingly all callers seem to use them
| correctly, including disassociate_ctty(on_exit).
|
| Except: drivers/staging/comedi/drivers/addi-data/*.c incorrectly
| use send_sig(priv => 0). But his is minor and should be fixed
| anyway.
|
| Reported-by: Daniel Lezcano <[email protected]>
| Signed-off-by: Oleg Nesterov <[email protected]>
Since, addressing the problem of container-init sending SIGKILL to itself
would have to be a separate patch:
Reviewed-by: Sukadev Bhattiprolu <[email protected]>
On 10/05, Sukadev Bhattiprolu wrote:
>
> Oleg Nesterov [[email protected]] wrote:
> | Sorry for confusion.
> |
> | > But sure, we could use force_sig_info() in caller.
> |
> | Yes, because this makes the code more explicit imho. And we can avoid
> | the further complicatiions in send_signal() path.
>
> Although, one small drawback would be the different behavior for the
> SIGKILL in load_aout_binary() to the container-init itself calling:
>
> kill(getpid(), SIGKILL);
could you clarify? load_aout_binary(), like other ->load_binary()
methods does send_sig(SIGKILL, current, 0) ?
And thanks for review.
Oleg.
Oleg Nesterov [[email protected]] wrote:
| On 10/05, Sukadev Bhattiprolu wrote:
| >
| > Oleg Nesterov [[email protected]] wrote:
| > | Sorry for confusion.
| > |
| > | > But sure, we could use force_sig_info() in caller.
| > |
| > | Yes, because this makes the code more explicit imho. And we can avoid
| > | the further complicatiions in send_signal() path.
| >
| > Although, one small drawback would be the different behavior for the
| > SIGKILL in load_aout_binary() to the container-init itself calling:
| >
| > kill(getpid(), SIGKILL);
|
| could you clarify? load_aout_binary(), like other ->load_binary()
| methods does send_sig(SIGKILL, current, 0) ?
Yes sorry for being cryptic.
If we use force_sig_info() in ->load_binary() methods for the SIGKILL,
they will, correctly, kill the container-init.
But if the container-init itself calls kill(getpid(), SIGKILL), the
container-init will not be killed.
I was just pointing out the small difference in behavior for the
same signal (when we use force_sig_info()).
Thanks for fixing the bug.
Sukadev
This whole series looks fine to me. I think in commenting and cleaning up
any of this, it bears explicit mention that (almost) every signal is
potentially reduced to SI_USER. That is, in siqueue exhaustion you don't
get any info and only non-special non-SI_USER >=SIGRTMIN signals ever fail
to get posted, so you get the all-zeros defaults from collect_signal() at
delivery time. That's the principle you're encoding in your si_fromuser()
logic, but your logs and comments are not explicit about the relationship
between that logic and what's implicit in the queue-exhaustion behavior.
Thanks,
Roland
On 10/05, Sukadev Bhattiprolu wrote:
>
> Oleg Nesterov [[email protected]] wrote:
> | On 10/05, Sukadev Bhattiprolu wrote:
> | >
> | > Oleg Nesterov [[email protected]] wrote:
> | > | Sorry for confusion.
> | > |
> | > | > But sure, we could use force_sig_info() in caller.
> | > |
> | > | Yes, because this makes the code more explicit imho. And we can avoid
> | > | the further complicatiions in send_signal() path.
> | >
> | > Although, one small drawback would be the different behavior for the
> | > SIGKILL in load_aout_binary() to the container-init itself calling:
> | >
> | > kill(getpid(), SIGKILL);
> |
> | could you clarify? load_aout_binary(), like other ->load_binary()
> | methods does send_sig(SIGKILL, current, 0) ?
>
> Yes sorry for being cryptic.
>
> If we use force_sig_info() in ->load_binary() methods for the SIGKILL,
> they will, correctly, kill the container-init.
>
> But if the container-init itself calls kill(getpid(), SIGKILL), the
> container-init will not be killed.
Ah, now I see what you mean.
Yes sure, init can't kill itself with or without these changes. But,
I think this is supposed behaviour which we do not want to change?
Oh. And I guess I misunderstood you before. From the previous email
> Makes sense. And we had mentioned earlier that container-init is immune
> to suicide
I guess this is what you meant, and I fully agree.
When I said "I disagree with container-init is immune to suicide",
I wrongly thought that you suggest that load_binary()->kill(SIGKILL)
should have no effect. I have to apologize for confusion again.
I hope we finally understand each other ;)
Oleg.
On 10/06, Roland McGrath wrote:
>
> This whole series looks fine to me. I think in commenting and cleaning up
> any of this, it bears explicit mention that (almost) every signal is
> potentially reduced to SI_USER.
Yes,
> but your logs and comments are not explicit about the relationship
> between that logic and what's implicit in the queue-exhaustion behavior.
Yes. the changelog for 3/4 mentions that this SI_USER doesn't really
mean SI_FROMUSER(), but I agree I should have been more explicit.
Perhaps, we should add the comment to explain that both SI_FROMUSER()
and si_fromuser() are only valid in the sending pathes. Fortunately
get_signal_to_deliver and friends do not care about the origination of
the signal.
Oleg.
> Perhaps, we should add the comment to explain that both SI_FROMUSER()
> and si_fromuser() are only valid in the sending pathes.
Yes. Also now that you put them both in a sentence together, it is clear
that it is insane to have two different things with those two names that
differ only in capitalization.
Thanks,
Roland
> Yes sure, init can't kill itself with or without these changes. But,
> I think this is supposed behaviour which we do not want to change?
I agree. (Actually, I think init shouldn't be "protected" that way at all.
You shoot downwards, you get your foot. But we are not talking about
changing the global init behavior, and I do think that the container-init
behavior should be as consistent as possible with what global init sees.)
It's possible to meaningfully swallow a kill() signal because then nothing
happens at all. The exec failure cases are special because all the damage
is already done so there is no way to avoid dying, and SIGKILL is just
making it more formal and graceful.
Thanks,
Roland
On 10/06, Roland McGrath wrote:
>
> > Perhaps, we should add the comment to explain that both SI_FROMUSER()
> > and si_fromuser() are only valid in the sending pathes.
>
> Yes. Also now that you put them both in a sentence together, it is clear
> that it is insane to have two different things with those two names that
> differ only in capitalization.
I think this doesn't matter because we need more cleanups. As for naming
I agree, si_fromuser() sucks and I'd be happy to send the patch which
renames it (or re-send these 2 patches).
The problem is, both SI_FROMUSER() and SI_FROMKERNEL() must die imho.
In fact I think they should never exist.
SI_FROMUSER(siptr) ((siptr)->si_code <= 0)
note "<=", this means this helper is unuseable. What we need is another
macro/inline which checks "si_code < 0" (or >= 0 depending on naming),
this helper should be used by sys_sigqueueinfo/etc which can not not use
SI_FROMXXX() because SI_USER is rightly forbidden. __send_signal() can
use the new helper too.
Other cleanups which imho makes sense:
- rename SEND_SIG_XXX
- redefine them to make sure SEND_SIG_NOINFO != NULL
Oleg.
> I think this doesn't matter because we need more cleanups.
Ok, good enough for me.
> The problem is, both SI_FROMUSER() and SI_FROMKERNEL() must die imho.
> In fact I think they should never exist.
I've always found all the si_code hackery rather confusing. (Don't forget
the crucial magic (short) cast repeated three places in exit.c without a
comment between them!)
> Other cleanups which imho makes sense:
>
> - rename SEND_SIG_XXX
>
> - redefine them to make sure SEND_SIG_NOINFO != NULL
Sounds good to me.
Thanks,
Roland