To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, "H.
    Peter Anvin", x86@kernel.org, Peter Zijlstra, Arnd Bergmann,
    "Eric W. Biederman", Khalid Aziz, Kate Stewart, Helge Deller,
    Greg Kroah-Hartman, Al Viro, Andrew Morton, Christian Brauner,
    Catalin Marinas, Will Deacon, Dave Martin, Mauro Carvalho Chehab,
    Michal Hocko, Rik van Riel, "Kirill A. Shutemov", Roman Gushchin,
    Marcos Paulo de Souza, Oleg Nesterov, Dominik Brodowski,
    Cyrill Gorcunov, Yang Shi, Jann Horn, Kees Cook
From: Enke Chen
Cc: linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org, Enke Chen,
    "Victor Kamensky (kamensky)", xe-linux-external@cisco.com,
    Stefan Strogin
Subject: [PATCH] kernel/signal: Signal-based pre-coredump notification
Date: Fri, 12 Oct 2018 17:33:35 -0700
X-Mailing-List: linux-kernel@vger.kernel.org

For simplicity and consistency, this patch provides an implementation
for signal-based fault notification prior to the coredump of a child
process. A new prctl command, PR_SET_PREDUMP_SIG, is defined that can
be used by an application to express its interest and to specify the
signal (SIGCHLD or SIGUSR1 or SIGUSR2) for such a notification. A new
signal code (si_code), CLD_PREDUMP, is also defined for SIGCHLD.

Background:

As the coredump of a process may take time, in certain time-sensitive
applications it is necessary for a parent process (e.g., a process
manager) to be notified of a child's imminent death before the coredump
so that the parent process can act sooner, such as re-spawning an
application process, or initiating a control-plane fail-over.

Currently there are two ways for a parent process to be notified of a
child process's state change. One is to use the POSIX signal, and
another is to use the kernel connector module. The specific events and
actions are summarized as follows:

Process Event    POSIX Signal                Connector-based
----------------------------------------------------------------------
ptrace_attach()  do_notify_parent_cldstop()  proc_ptrace_connector()
                 SIGCHLD / CLD_STOPPED

ptrace_detach()  do_notify_parent_cldstop()  proc_ptrace_connector()
                 SIGCHLD / CLD_CONTINUED

pre_coredump/    N/A                         proc_coredump_connector()
get_signal()

post_coredump/   do_notify_parent()          proc_exit_connector()
do_exit()        SIGCHLD / exit_signal
----------------------------------------------------------------------

As shown in the table, the signal-based pre-coredump notification is not
currently available. In some cases using a connector-based notification
can be quite complicated (e.g., when a process manager is written in
shell scripts and thus is subject to certain inherent limitations), and
a signal-based notification would be simpler and better suited.

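
For illustration only, the user-space side of such a notification might
look like the sketch below. It assumes a kernel with this patch applied.
The PR_SET_PREDUMP_SIG and CLD_PREDUMP values are copied from the patch
because no released header defines them, and cld_code_name(),
on_sigchld() and watch_child_coredumps() are hypothetical names that are
not part of the patch. On a kernel without the patch the prctl() call is
expected to fail, or the option number may mean something else.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/types.h>

/*
 * ASSUMPTION: values copied from this patch. They are not in released
 * uapi headers, and a kernel without the patch will not honour them.
 */
#ifndef PR_SET_PREDUMP_SIG
#define PR_SET_PREDUMP_SIG	54
#endif
#ifndef CLD_PREDUMP
#define CLD_PREDUMP		7
#endif

/* Set by the handler; a real manager would act on them in its main loop. */
volatile sig_atomic_t predump_seen;
volatile sig_atomic_t predump_pid;

/* Hypothetical helper: label a SIGCHLD si_code, including the new one. */
const char *cld_code_name(int si_code)
{
	switch (si_code) {
	case CLD_EXITED:	return "exited";
	case CLD_KILLED:	return "killed";
	case CLD_DUMPED:	return "dumped";
	case CLD_TRAPPED:	return "trapped";
	case CLD_STOPPED:	return "stopped";
	case CLD_CONTINUED:	return "continued";
	case CLD_PREDUMP:	return "predump";
	default:		return "unknown";
	}
}

/* SA_SIGINFO handler: only records the event, which is async-signal-safe. */
static void on_sigchld(int sig, siginfo_t *info, void *ucontext)
{
	(void)sig;
	(void)ucontext;
	if (info->si_code == CLD_PREDUMP) {
		predump_pid = info->si_pid;
		predump_seen = 1;
	}
}

/*
 * Hypothetical helper: install the handler, then ask for SIGCHLD before a
 * child dumps core. arg2 == 0 selects the calling task and arg3 is the
 * signal, matching the argument order used by the patch.
 * Returns 0 on success, -1 with errno set otherwise.
 */
int watch_child_coredumps(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = on_sigchld;
	sa.sa_flags = SA_SIGINFO | SA_RESTART;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGCHLD, &sa, NULL) < 0)
		return -1;

	return prctl(PR_SET_PREDUMP_SIG, 0UL, (unsigned long)SIGCHLD,
		     0UL, 0UL);
}
```

A process manager would call watch_child_coredumps() once at start-up
and check predump_seen from its main loop. A non-zero return most likely
means the running kernel does not offer the feature, in which case the
manager can fall back to the usual SIGCHLD / exit_signal handling.
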
Signed-off-by: Enke Chen
---
 arch/x86/kernel/signal_compat.c    |  2 +-
 include/linux/sched.h              |  4 ++
 include/linux/signal.h             |  5 +++
 include/uapi/asm-generic/siginfo.h |  3 +-
 include/uapi/linux/prctl.h         |  4 ++
 kernel/fork.c                      |  1 +
 kernel/signal.c                    | 51 +++++++++++++++++++++++++
 kernel/sys.c                       | 77 ++++++++++++++++++++++++++++++++++++++
 8 files changed, 145 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/signal_compat.c b/arch/x86/kernel/signal_compat.c
index 9ccbf05..a3deba8 100644
--- a/arch/x86/kernel/signal_compat.c
+++ b/arch/x86/kernel/signal_compat.c
@@ -30,7 +30,7 @@ static inline void signal_compat_build_tests(void)
 	BUILD_BUG_ON(NSIGSEGV != 7);
 	BUILD_BUG_ON(NSIGBUS != 5);
 	BUILD_BUG_ON(NSIGTRAP != 5);
-	BUILD_BUG_ON(NSIGCHLD != 6);
+	BUILD_BUG_ON(NSIGCHLD != 7);
 	BUILD_BUG_ON(NSIGSYS != 1);
 
 	/* This is part of the ABI and can never change in size: */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 09026ea..cfb9645 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -696,6 +696,10 @@ struct task_struct {
 	int				exit_signal;
 	/* The signal sent when the parent dies: */
 	int				pdeath_signal;
+
+	/* The signal sent prior to a child's coredump: */
+	int				predump_signal;
+
 	/* JOBCTL_*, siglock protected: */
 	unsigned long			jobctl;
 
diff --git a/include/linux/signal.h b/include/linux/signal.h
index 706a499..7cb976d 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -256,6 +256,11 @@ static inline int valid_signal(unsigned long sig)
 	return sig <= _NSIG ? 1 : 0;
 }
 
+static inline int valid_predump_signal(int sig)
+{
+	return (sig == SIGCHLD) || (sig == SIGUSR1) || (sig == SIGUSR2);
+}
+
 struct timespec;
 struct pt_regs;
 enum pid_type;
diff --git a/include/uapi/asm-generic/siginfo.h b/include/uapi/asm-generic/siginfo.h
index cb3d6c2..1a47cef 100644
--- a/include/uapi/asm-generic/siginfo.h
+++ b/include/uapi/asm-generic/siginfo.h
@@ -267,7 +267,8 @@ struct { \
 #define CLD_TRAPPED	4	/* traced child has trapped */
 #define CLD_STOPPED	5	/* child has stopped */
 #define CLD_CONTINUED	6	/* stopped child has continued */
-#define NSIGCHLD	6
+#define CLD_PREDUMP	7	/* child is about to dump core */
+#define NSIGCHLD	7
 
 /*
  * SIGPOLL (or any other signal without signal specific si_codes) si_codes
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index c0d7ea0..79f0a8a 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -219,4 +219,8 @@ struct prctl_mm_map {
 # define PR_SPEC_DISABLE		(1UL << 2)
 # define PR_SPEC_FORCE_DISABLE		(1UL << 3)
 
+/* Whether to receive signal prior to child's coredump */
+#define PR_SET_PREDUMP_SIG	54
+#define PR_GET_PREDUMP_SIG	55
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/fork.c b/kernel/fork.c
index 07cddff..c296c11 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1985,6 +1985,7 @@ static __latent_entropy struct task_struct *copy_process(
 	p->dirty_paused_when = 0;
 
 	p->pdeath_signal = 0;
+	p->predump_signal = 0;
 	INIT_LIST_HEAD(&p->thread_group);
 	p->task_works = NULL;
 
diff --git a/kernel/signal.c b/kernel/signal.c
index 312b43e..eb4a483 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2337,6 +2337,44 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info)
 	return signr;
 }
 
+/*
+ * Let the parent, if so desired, know about the imminent death of a child
+ * prior to its coredump.
+ *
+ * Locking logic is similar to do_notify_parent_cldstop().
+ */
+static void do_notify_parent_predump(struct task_struct *tsk)
+{
+	struct sighand_struct *sighand;
+	struct task_struct *parent;
+	struct kernel_siginfo info;
+	unsigned long flags;
+	int sig;
+
+	parent = tsk->real_parent;
+	sig = parent->predump_signal;
+
+	/* Check again with "tasklist_lock" locked by the caller */
+	if (!valid_predump_signal(sig))
+		return;
+
+	clear_siginfo(&info);
+	info.si_signo = sig;
+	if (sig == SIGCHLD)
+		info.si_code = CLD_PREDUMP;
+
+	rcu_read_lock();
+	info.si_pid = task_pid_nr_ns(tsk, task_active_pid_ns(parent));
+	info.si_uid = from_kuid_munged(task_cred_xxx(parent, user_ns),
+				       task_uid(tsk));
+	rcu_read_unlock();
+
+	sighand = parent->sighand;
+	spin_lock_irqsave(&sighand->siglock, flags);
+	__group_send_sig_info(sig, &info, parent);
+	spin_unlock_irqrestore(&sighand->siglock, flags);
+}
+
 bool get_signal(struct ksignal *ksig)
 {
 	struct sighand_struct *sighand = current->sighand;
@@ -2497,6 +2535,19 @@ bool get_signal(struct ksignal *ksig)
 		current->flags |= PF_SIGNALED;
 
 		if (sig_kernel_coredump(signr)) {
+			/*
+			 * Notify the parent prior to the coredump if the
+			 * parent is interested in such a notification.
+			 */
+			int p_sig = current->real_parent->predump_signal;
+
+			if (valid_predump_signal(p_sig)) {
+				read_lock(&tasklist_lock);
+				do_notify_parent_predump(current);
+				read_unlock(&tasklist_lock);
+				cond_resched();
+			}
+
 			if (print_fatal_signals)
 				print_fatal_signal(ksig->info.si_signo);
 			proc_coredump_connector(current);
diff --git a/kernel/sys.c b/kernel/sys.c
index 123bd73..43eb250d 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2258,6 +2258,76 @@ int __weak arch_prctl_spec_ctrl_set(struct task_struct *t, unsigned long which,
 	return -EINVAL;
 }
 
+static int prctl_get_predump_signal(struct task_struct *tsk, pid_t pid,
+				    int __user *addr)
+{
+	struct task_struct *p;
+	int error;
+
+	/* For the current task, the common case */
+	if (pid == 0) {
+		put_user(tsk->predump_signal, addr);
+		return 0;
+	}
+
+	error = -ESRCH;
+	rcu_read_lock();
+	p = find_task_by_vpid(pid);
+	if (p) {
+		error = 0;
+		put_user(p->predump_signal, addr);
+	}
+	rcu_read_unlock();
+	return error;
+}
+
+/*
+ * Returns true if current's euid is same as p's uid or euid,
+ * or has CAP_SYS_ADMIN.
+ *
+ * Called with rcu_read_lock, creds are safe.
+ *
+ * Adapted from set_one_prio_perm().
+ */
+static bool set_predump_signal_perm(struct task_struct *p)
+{
+	const struct cred *cred = current_cred(), *pcred = __task_cred(p);
+
+	return uid_eq(pcred->uid, cred->euid) ||
+	       uid_eq(pcred->euid, cred->euid) ||
+	       capable(CAP_SYS_ADMIN);
+}
+
+static int prctl_set_predump_signal(struct task_struct *tsk, pid_t pid, int sig)
+{
+	struct task_struct *p;
+	int error;
+
+	/* 0 is valid for disabling the feature */
+	if (sig && !valid_predump_signal(sig))
+		return -EINVAL;
+
+	/* For the current task, the common case */
+	if (pid == 0) {
+		tsk->predump_signal = sig;
+		return 0;
+	}
+
+	error = -ESRCH;
+	rcu_read_lock();
+	p = find_task_by_vpid(pid);
+	if (p) {
+		if (!set_predump_signal_perm(p))
+			error = -EPERM;
+		else {
+			error = 0;
+			p->predump_signal = sig;
+		}
+	}
+	rcu_read_unlock();
+	return error;
+}
+
 SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 		unsigned long, arg4, unsigned long, arg5)
 {
@@ -2476,6 +2546,13 @@ int __weak arch_prctl_spec_ctrl_set(struct task_struct *t, unsigned long which,
 			return -EINVAL;
 		error = arch_prctl_spec_ctrl_set(me, arg2, arg3);
 		break;
+	case PR_SET_PREDUMP_SIG:
+		error = prctl_set_predump_signal(me, (pid_t)arg2, (int)arg3);
+		break;
+	case PR_GET_PREDUMP_SIG:
+		error = prctl_get_predump_signal(me, (pid_t)arg2,
+						 (int __user *)arg3);
+		break;
 	default:
 		error = -EINVAL;
 		break;
-- 
1.8.3.1