Received: by 2002:ac0:a582:0:0:0:0:0 with SMTP id m2-v6csp4266885imm; Mon, 15 Oct 2018 11:46:53 -0700 (PDT) X-Google-Smtp-Source: ACcGV62PeQMSychZY7ILFdlXdTyNfsdYvwkMnLClCyWqBeTRdbqS45Pwt9V6E9Ld6mLK5uxQqWY4 X-Received: by 2002:a65:610e:: with SMTP id z14-v6mr17020200pgu.138.1539629213350; Mon, 15 Oct 2018 11:46:53 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1539629213; cv=none; d=google.com; s=arc-20160816; b=GOtFlYfBEplnai8aaeordPuaWExw+SpvxEvB/HRKSwAHTX6GntAzAGXuumhrqLrvAR Ml1tEjoLuA9sjkU+oSI5enzekIWwxz3M4MPFCtd0JOc3lt4ZN4ooMxtWe19/U2E9NvkS VvPGerKp+ebnnhuBr5btFgW/kRXFJ2PwKv9sdkNPG3dAb03o7arcasttjFUsporoFtH6 yVRjWYdOPtX8d60ZQPdUy8N4VM7mHypseSiYd/YkcfmDdgbUZNVh4M5ndpA6Q1s2F0bk 79B7ZvK3xy8VmunNj8f9LQbSw4mPbjSx3S/Q2OoJXlrMCgyPcOFYczKlvoFJv5PkxOve kgHA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject:dkim-signature; bh=/RyACWoaQFMQfdwF9PyEq7wG6BGZrEVf0Tv2hGiI6iQ=; b=VsZJj6N+ebJkFsuwUAivYw1iDVtj1b1/BVLjxaRV5NXHMNba/0XNafBi+mmHEkKgVw /5ojkSAE+me1ffwnHT8sHwKD3aDv1bkfxrY1exO/HioUtRu8sU9Uku2L3PCcHK+/k7il /JdV7ENRdXnaTCNpWL7EglikeSIzELPsZp3LgD2BBY0V2WuW293V24RAILKicb8QOzos 1wSeDfqL0+iI/OOqXyb3Mv88Mo2KLedSTlCBI3lXOqgR/5CM48fDnasc3es24On9727Y trH0bT2fbtfKTxUUIgnYQFmxycCDAuhhN6Z3s1Mavla4XHVRrea5OZIl2MyEo1yHpdjT 95QQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@cisco.com header.s=iport header.b=TD6+JAct; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=cisco.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id k3-v6si10300069pgb.72.2018.10.15.11.46.37; Mon, 15 Oct 2018 11:46:53 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@cisco.com header.s=iport header.b=TD6+JAct; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=cisco.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726888AbeJPCcZ (ORCPT + 99 others); Mon, 15 Oct 2018 22:32:25 -0400 Received: from rcdn-iport-5.cisco.com ([173.37.86.76]:42806 "EHLO rcdn-iport-5.cisco.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726820AbeJPCcZ (ORCPT ); Mon, 15 Oct 2018 22:32:25 -0400 X-Greylist: delayed 574 seconds by postgrey-1.27 at vger.kernel.org; Mon, 15 Oct 2018 22:32:24 EDT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=cisco.com; i=@cisco.com; l=8275; q=dns/txt; s=iport; t=1539629157; x=1540838757; h=subject:to:cc:references:from:message-id:date: mime-version:in-reply-to:content-transfer-encoding; bh=0No/hpWKflZ5alnCVm7wCsMvL6Umj7oJdX9ccgSpWGs=; b=TD6+JAct4Wdsowc2Z2YNjlbFckgfLseWyzb9L9FghELLjAuE6wFa3wPV IzIuMdNU16MooSuHjCNsoDNXmY5sWK2XMvKI/RX0ed+Tt272wuGQnxSzX kBFZqkKMg2jSsHxpaNXSsNVTJ9qdBrvnUVOub4IznTtQ4zMPzPH4i3cKj A=; X-IronPort-AV: E=Sophos;i="5.54,385,1534809600"; d="scan'208";a="248599199" Received: from alln-core-5.cisco.com ([173.36.13.138]) by rcdn-iport-5.cisco.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 15 Oct 2018 18:36:24 +0000 Received: from [10.154.208.167] ([10.154.208.167]) by alln-core-5.cisco.com (8.15.2/8.15.2) with ESMTP id w9FIaK4H024403; Mon, 15 Oct 2018 18:36:20 GMT Subject: Re: [PATCH] kernel/signal: Signal-based pre-coredump notification To: Jann Horn Cc: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , the arch/x86 maintainers , Peter Zijlstra , Arnd Bergmann , "Eric W. Biederman" , Khalid Aziz , Kate Stewart , deller@gmx.de, Greg Kroah-Hartman , Al Viro , Andrew Morton , christian@brauner.io, Catalin Marinas , Will Deacon , Dave.Martin@arm.com, mchehab+samsung@kernel.org, Michal Hocko , Rik van Riel , "Kirill A . Shutemov" , guro@fb.com, marcos.souza.org@gmail.com, Oleg Nesterov , linux@dominikbrodowski.net, Cyrill Gorcunov , yang.shi@linux.alibaba.com, Kees Cook , kernel list , linux-arch , kamensky@cisco.com, xe-linux-external@cisco.com, sstrogin@cisco.com, Enke Chen References: From: Enke Chen Message-ID: Date: Mon, 15 Oct 2018 11:36:19 -0700 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-Auto-Response-Suppress: DR, OOF, AutoReply X-Outbound-SMTP-Client: 10.154.208.167, [10.154.208.167] X-Outbound-Node: alln-core-5.cisco.com Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi, Jann: Thanks a lot for you detailed review. Please see my replied/comments inline. On 10/13/18 11:27 AM, Jann Horn wrote: > On Sat, Oct 13, 2018 at 2:33 AM Enke Chen wrote: >> For simplicity and consistency, this patch provides an implementation >> for signal-based fault notification prior to the coredump of a child >> process. A new prctl command, PR_SET_PREDUMP_SIG, is defined that can >> be used by an application to express its interest and to specify the >> signal (SIGCHLD or SIGUSR1 or SIGUSR2) for such a notification. A new >> signal code (si_code), CLD_PREDUMP, is also defined for SIGCHLD. > > Your suggested API looks vaguely similar to PR_SET_PDEATHSIG, but with > some important differences: > > - You don't reset the signal on setuid execution. > - You permit setting this not just on the current process, but also on others. > > For both of these: Are these differences actually necessary, and if > so, can you provide a specific rationale? From a security perspective, > I would very much prefer it if this API had semantics closer to > PR_SET_PDEATHSIG. Regarding setting on others, I started with setting for self. But there is a requirement for supporting the feature for a process manager written in bash script. That's the reason for allowing the setting on others. Given the feedback from you and others, I agree that it would be simpler and more secure to remove the setting on others. We can submit a patch for bash to support the setting natively. Regarding the impact of "setuid", this property "PR_SET_PREDUMP_SIG" has to do with the application/process whether the signal handler is set for receiving such a notification. If it is set, the "uid" should not matter. > > [...] >> diff --git a/kernel/signal.c b/kernel/signal.c >> index 312b43e..eb4a483 100644 >> --- a/kernel/signal.c >> +++ b/kernel/signal.c >> @@ -2337,6 +2337,44 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info) >> return signr; >> } >> >> +/* >> + * Let the parent, if so desired, know about the imminent death of a child >> + * prior to its coredump. >> + * >> + * Locking logic is similar to do_notify_parent_cldstop(). >> + */ >> +static void do_notify_parent_predump(struct task_struct *tsk) >> +{ >> + struct sighand_struct *sighand; >> + struct task_struct *parent; >> + struct kernel_siginfo info; >> + unsigned long flags; >> + int sig; >> + >> + parent = tsk->real_parent; >> + sig = parent->predump_signal; >> + >> + /* Check again with "tasklist_lock" locked by the caller */ >> + if (!valid_predump_signal(sig)) >> + return; >> + >> + clear_siginfo(&info); >> + info.si_signo = sig; >> + if (sig == SIGCHLD) >> + info.si_code = CLD_PREDUMP; >> + >> + rcu_read_lock(); >> + info.si_pid = task_pid_nr_ns(tsk, task_active_pid_ns(parent)); >> + info.si_uid = from_kuid_munged(task_cred_xxx(parent, user_ns), >> + task_uid(tsk)); > > You're sending a signal from the current namespaces, but with IDs that > have been mapped into the parent's namespaces? That looks wrong to me. I am following the example "do_notify_parent_cldstop()" called in the same routine "get_signal()". If there is a better way, sure I will use it. > >> + rcu_read_unlock(); >> + >> + sighand = parent->sighand; >> + spin_lock_irqsave(&sighand->siglock, flags); >> + __group_send_sig_info(sig, &info, parent); >> + spin_unlock_irqrestore(&sighand->siglock, flags); >> +} >> + >> bool get_signal(struct ksignal *ksig) >> { >> struct sighand_struct *sighand = current->sighand; >> @@ -2497,6 +2535,19 @@ bool get_signal(struct ksignal *ksig) >> current->flags |= PF_SIGNALED; >> >> if (sig_kernel_coredump(signr)) { >> + /* >> + * Notify the parent prior to the coredump if the >> + * parent is interested in such a notificaiton. >> + */ >> + int p_sig = current->real_parent->predump_signal; > > current->real_parent is an __rcu member. I think if you run the sparse > checker against this patch, it's going to complain. Are you allowed to > access current->real_parent in this context? Let me check, and get back to you on this one. > >> + if (valid_predump_signal(p_sig)) { >> + read_lock(&tasklist_lock); >> + do_notify_parent_predump(current); >> + read_unlock(&tasklist_lock); >> + cond_resched(); >> + } >> + >> if (print_fatal_signals) >> print_fatal_signal(ksig->info.si_signo); >> proc_coredump_connector(current); >> diff --git a/kernel/sys.c b/kernel/sys.c >> index 123bd73..43eb250d 100644 >> --- a/kernel/sys.c >> +++ b/kernel/sys.c >> @@ -2258,6 +2258,76 @@ int __weak arch_prctl_spec_ctrl_set(struct task_struct *t, unsigned long which, >> return -EINVAL; >> } >> >> +static int prctl_get_predump_signal(struct task_struct *tsk, pid_t pid, >> + int __user *addr) >> +{ >> + struct task_struct *p; >> + int error; >> + >> + /* For the current task, the common case */ >> + if (pid == 0) { >> + put_user(tsk->predump_signal, addr); >> + return 0; >> + } >> + >> + error = -ESRCH; >> + rcu_read_lock(); >> + p = find_task_by_vpid(pid); >> + if (p) { >> + error = 0; >> + put_user(p->predump_signal, addr); > > This is wrong. You can't call put_user() while you're in an RCU > read-side critical section. > > As below, I would like it if you could just get rid of this branch, > and restrict this prctl to operating on the current process. My bad. The code will be removed. > >> + } >> + rcu_read_unlock(); >> + return error; >> +} >> + >> +/* >> + * Returns true if current's euid is same as p's uid or euid, >> + * or has CAP_SYS_ADMIN. >> + * >> + * Called with rcu_read_lock, creds are safe. >> + * >> + * Adapted from set_one_prio_perm(). >> + */ >> +static bool set_predump_signal_perm(struct task_struct *p) >> +{ >> + const struct cred *cred = current_cred(), *pcred = __task_cred(p); >> + >> + return uid_eq(pcred->uid, cred->euid) || >> + uid_eq(pcred->euid, cred->euid) || >> + capable(CAP_SYS_ADMIN); >> +} > > This permission check permits fiddling with other processes in > scenarios where kill() wouldn't let you send signals (specifically, if > you control the EUID of the target task). That worries me. Also, > kill() is subject to LSM checks (see check_kill_permission()), but > this interface isn't. I would really prefer it if you could amend this > so that you can only operate on the current task, and get rid of this > permission check. It is gone. > > [...] >> SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3, >> unsigned long, arg4, unsigned long, arg5) >> { >> @@ -2476,6 +2546,13 @@ int __weak arch_prctl_spec_ctrl_set(struct task_struct *t, unsigned long which, >> return -EINVAL; >> error = arch_prctl_spec_ctrl_set(me, arg2, arg3); >> break; >> + case PR_SET_PREDUMP_SIG: >> + error = prctl_set_predump_signal(me, (pid_t)arg2, (int)arg3); >> + break; >> + case PR_GET_PREDUMP_SIG: >> + error = prctl_get_predump_signal(me, (pid_t)arg2, >> + (int __user *)arg3); >> + break; > > New prctl() calls should check that the unused arguments are zero, in > order to make it possible to safely add more flags in the future. Will do. Thanks again. -- Enke