Received: by 2002:ac0:a582:0:0:0:0:0 with SMTP id m2-v6csp2162469imm; Sat, 13 Oct 2018 11:28:51 -0700 (PDT) X-Google-Smtp-Source: ACcGV60Fr8c33z5DsSI2AFmupPZJjLjyWLWbepm4H+WZKRA5kVZumDjpL9rPcw56H3+UL3EZDeno X-Received: by 2002:a17:902:5a8b:: with SMTP id r11-v6mr10990311pli.305.1539455331604; Sat, 13 Oct 2018 11:28:51 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1539455331; cv=none; d=google.com; s=arc-20160816; b=HFKT+ZO644/qpSOiuEsAFpELVCZaM3Op9e9b681+UgliS3PXipU5jb5g7paMkBzaOO 4HIQYiB4FkV1rKNH+5kMfNndlWblGEhR7eD0APXqdaPzFwmvp8nlVaEiJTEVBepcBbFA ydEmDWOota+VHf+CpQ621BT/7pe6JNtmlpKwxGO1tphicY2v/CjlrOTFZvF1xYbJNXfd s4q6KmBW67O3MH7beQoEeT4y11dNEIhz9LLhAUa2HKXfg3QZD+Zjlk3xnCSepaivWdyM w/SOh0Kv2ux1oYm0RluCouszpl3BLuFVKkJUxVa/3hD5rzdzS+xA5aAcplH+QB9hM/0p 2Vwg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=A/8Y0Hb6Wyj/qkND39Hgf/WWfH6YJH4yKTgmz9ZnF3A=; b=sXOO7emGK4g4MzuwTvJU8mfJIymUSSUFl9gnZZON/X3T2BeYLO4q28FNTPWqSJAxre yWlC8yjKdQsJvk+bWX+xXOM2J7f3tDwcj/JBrMFzO0UZBgu4Fd34rjw9r0Mk5MJAoL5h ab7ezJRHQ3PtfeEDZ2CEcgaALFdH/IqM9PnD45icM5MgBOcsrMMI1zH+SPpr2jzgZsIR 24XeNXu0VOOVHu8rRCCAPqqS/RVWff466NdvPsjYbfY1wwc5+6eJ8kY8Sc92bzHbCsDh oe3riWu+l0YjZ8ENiFQn5yHUyupME3Yi4Ou/RHh0MB2TxB2RYSEpkkNZ2Oun6owjL9ev JpCQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b="Kil9J/+W"; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id e36-v6si5185590pge.127.2018.10.13.11.28.36; Sat, 13 Oct 2018 11:28:51 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b="Kil9J/+W"; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726854AbeJNCGQ (ORCPT + 99 others); Sat, 13 Oct 2018 22:06:16 -0400 Received: from mail-oi1-f194.google.com ([209.85.167.194]:40348 "EHLO mail-oi1-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726670AbeJNCGP (ORCPT ); Sat, 13 Oct 2018 22:06:15 -0400 Received: by mail-oi1-f194.google.com with SMTP id j68-v6so12254882oib.7 for ; Sat, 13 Oct 2018 11:28:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=A/8Y0Hb6Wyj/qkND39Hgf/WWfH6YJH4yKTgmz9ZnF3A=; b=Kil9J/+WsGxve+TAkuLPVQxkmNl9o2ziXWey4kM9rXQkjVl9sZjGAKWuvCx11tLkoD cvJ0v4+DcU2QrwBosy+1tPlFgyh3nYdShLCPqUdBkccpfnOwuxpajAzvpR4P3gkN5mNH /ml7VXssETEDk5qhJgYRF20YT9CX6/nHninXnKslZiOEbLnOJTYNr67amVSLqlzyicGJ W00AiDZkFlThgdc6Xjzol+NE1YbkL+EkRmmcsWbOSBH5+vQkKZiFwB/q1Z8ESn9JpK2R jleYzSVezWrxqWk8silsN/PnoRqEKqwuTStMTFaL2SL++fVjsy8xmOIUhKM9IJsPNAAS ZB9A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=A/8Y0Hb6Wyj/qkND39Hgf/WWfH6YJH4yKTgmz9ZnF3A=; b=WyOWTBeSebyakAMx6bxldwKZwV/FCOtIEYL+qAQxPvo9ChaabJbVXp21+g8t1x3PW5 ErGHNMdw5Ql85SyJ0lik7Ri7FXtfGlTcIHVyRCT21UBFtgr82Kq7t/7u1xcKoKHpgWRu 1/uQ5bTcSvj4ypi0B2atyTlcyYT+o9J4oebux2pffs3ble6SXSExr6ZYWl5DzlmvAgvr KkrXuRvxy4rQkq70tRbac3Ihtk7GZ8bEL/EX8UAM6wigvASz5w+Mrq28harQv0ZKCs0Y iE9noNtvHOzJ3DlgmQVceo4lXSqE/OUGiEyqnDbzXml0VP48CCUGqyTezkkw0PY3nKlA UM6A== X-Gm-Message-State: ABuFfojYOASqm/gb9hbbg/F8TjMhzsolv28FQrybD15+ECMjDgSxf6e1 ndyMtArt/KjuNXakGImCwAQCKwudt+oJKtuT6aRrRw== X-Received: by 2002:aca:efd6:: with SMTP id n205-v6mr5580705oih.3.1539455285088; Sat, 13 Oct 2018 11:28:05 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Jann Horn Date: Sat, 13 Oct 2018 20:27:38 +0200 Message-ID: Subject: Re: [PATCH] kernel/signal: Signal-based pre-coredump notification To: enkechen@cisco.com Cc: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , "the arch/x86 maintainers" , Peter Zijlstra , Arnd Bergmann , "Eric W. Biederman" , Khalid Aziz , Kate Stewart , deller@gmx.de, Greg Kroah-Hartman , Al Viro , Andrew Morton , christian@brauner.io, Catalin Marinas , Will Deacon , Dave.Martin@arm.com, mchehab+samsung@kernel.org, Michal Hocko , Rik van Riel , "Kirill A . Shutemov" , guro@fb.com, marcos.souza.org@gmail.com, Oleg Nesterov , linux@dominikbrodowski.net, Cyrill Gorcunov , yang.shi@linux.alibaba.com, Kees Cook , kernel list , linux-arch , kamensky@cisco.com, xe-linux-external@cisco.com, sstrogin@cisco.com Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sat, Oct 13, 2018 at 2:33 AM Enke Chen wrote: > For simplicity and consistency, this patch provides an implementation > for signal-based fault notification prior to the coredump of a child > process. A new prctl command, PR_SET_PREDUMP_SIG, is defined that can > be used by an application to express its interest and to specify the > signal (SIGCHLD or SIGUSR1 or SIGUSR2) for such a notification. A new > signal code (si_code), CLD_PREDUMP, is also defined for SIGCHLD. Your suggested API looks vaguely similar to PR_SET_PDEATHSIG, but with some important differences: - You don't reset the signal on setuid execution. - You permit setting this not just on the current process, but also on others. For both of these: Are these differences actually necessary, and if so, can you provide a specific rationale? From a security perspective, I would very much prefer it if this API had semantics closer to PR_SET_PDEATHSIG. [...] > diff --git a/kernel/signal.c b/kernel/signal.c > index 312b43e..eb4a483 100644 > --- a/kernel/signal.c > +++ b/kernel/signal.c > @@ -2337,6 +2337,44 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info) > return signr; > } > > +/* > + * Let the parent, if so desired, know about the imminent death of a child > + * prior to its coredump. > + * > + * Locking logic is similar to do_notify_parent_cldstop(). > + */ > +static void do_notify_parent_predump(struct task_struct *tsk) > +{ > + struct sighand_struct *sighand; > + struct task_struct *parent; > + struct kernel_siginfo info; > + unsigned long flags; > + int sig; > + > + parent = tsk->real_parent; > + sig = parent->predump_signal; > + > + /* Check again with "tasklist_lock" locked by the caller */ > + if (!valid_predump_signal(sig)) > + return; > + > + clear_siginfo(&info); > + info.si_signo = sig; > + if (sig == SIGCHLD) > + info.si_code = CLD_PREDUMP; > + > + rcu_read_lock(); > + info.si_pid = task_pid_nr_ns(tsk, task_active_pid_ns(parent)); > + info.si_uid = from_kuid_munged(task_cred_xxx(parent, user_ns), > + task_uid(tsk)); You're sending a signal from the current namespaces, but with IDs that have been mapped into the parent's namespaces? That looks wrong to me. > + rcu_read_unlock(); > + > + sighand = parent->sighand; > + spin_lock_irqsave(&sighand->siglock, flags); > + __group_send_sig_info(sig, &info, parent); > + spin_unlock_irqrestore(&sighand->siglock, flags); > +} > + > bool get_signal(struct ksignal *ksig) > { > struct sighand_struct *sighand = current->sighand; > @@ -2497,6 +2535,19 @@ bool get_signal(struct ksignal *ksig) > current->flags |= PF_SIGNALED; > > if (sig_kernel_coredump(signr)) { > + /* > + * Notify the parent prior to the coredump if the > + * parent is interested in such a notificaiton. > + */ > + int p_sig = current->real_parent->predump_signal; current->real_parent is an __rcu member. I think if you run the sparse checker against this patch, it's going to complain. Are you allowed to access current->real_parent in this context? > + if (valid_predump_signal(p_sig)) { > + read_lock(&tasklist_lock); > + do_notify_parent_predump(current); > + read_unlock(&tasklist_lock); > + cond_resched(); > + } > + > if (print_fatal_signals) > print_fatal_signal(ksig->info.si_signo); > proc_coredump_connector(current); > diff --git a/kernel/sys.c b/kernel/sys.c > index 123bd73..43eb250d 100644 > --- a/kernel/sys.c > +++ b/kernel/sys.c > @@ -2258,6 +2258,76 @@ int __weak arch_prctl_spec_ctrl_set(struct task_struct *t, unsigned long which, > return -EINVAL; > } > > +static int prctl_get_predump_signal(struct task_struct *tsk, pid_t pid, > + int __user *addr) > +{ > + struct task_struct *p; > + int error; > + > + /* For the current task, the common case */ > + if (pid == 0) { > + put_user(tsk->predump_signal, addr); > + return 0; > + } > + > + error = -ESRCH; > + rcu_read_lock(); > + p = find_task_by_vpid(pid); > + if (p) { > + error = 0; > + put_user(p->predump_signal, addr); This is wrong. You can't call put_user() while you're in an RCU read-side critical section. As below, I would like it if you could just get rid of this branch, and restrict this prctl to operating on the current process. > + } > + rcu_read_unlock(); > + return error; > +} > + > +/* > + * Returns true if current's euid is same as p's uid or euid, > + * or has CAP_SYS_ADMIN. > + * > + * Called with rcu_read_lock, creds are safe. > + * > + * Adapted from set_one_prio_perm(). > + */ > +static bool set_predump_signal_perm(struct task_struct *p) > +{ > + const struct cred *cred = current_cred(), *pcred = __task_cred(p); > + > + return uid_eq(pcred->uid, cred->euid) || > + uid_eq(pcred->euid, cred->euid) || > + capable(CAP_SYS_ADMIN); > +} This permission check permits fiddling with other processes in scenarios where kill() wouldn't let you send signals (specifically, if you control the EUID of the target task). That worries me. Also, kill() is subject to LSM checks (see check_kill_permission()), but this interface isn't. I would really prefer it if you could amend this so that you can only operate on the current task, and get rid of this permission check. [...] > SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3, > unsigned long, arg4, unsigned long, arg5) > { > @@ -2476,6 +2546,13 @@ int __weak arch_prctl_spec_ctrl_set(struct task_struct *t, unsigned long which, > return -EINVAL; > error = arch_prctl_spec_ctrl_set(me, arg2, arg3); > break; > + case PR_SET_PREDUMP_SIG: > + error = prctl_set_predump_signal(me, (pid_t)arg2, (int)arg3); > + break; > + case PR_GET_PREDUMP_SIG: > + error = prctl_get_predump_signal(me, (pid_t)arg2, > + (int __user *)arg3); > + break; New prctl() calls should check that the unused arguments are zero, in order to make it possible to safely add more flags in the future.