Received: by 2002:ac0:a582:0:0:0:0:0 with SMTP id m2-v6csp4508366imm; Mon, 15 Oct 2018 16:30:45 -0700 (PDT) X-Google-Smtp-Source: ACcGV63V3vEdefgPlUshBrMcZctlOXMKCh6OPBRyvZl/ivJnJqhp2Bnb69ZunAMYpevpH1DaGUnx X-Received: by 2002:a65:44c6:: with SMTP id g6-v6mr17407243pgs.350.1539646245540; Mon, 15 Oct 2018 16:30:45 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1539646245; cv=none; d=google.com; s=arc-20160816; b=C2GwYVijv3gARKALR62iuJVBmKsoQl5p9c9rNduN5c//Iwz9uMA0T606Ka6Kr4Nchk dSbr79iGHHFzwGBv11NYGDWnEKb+SPTuqFy6YRwfzaM5bRmOZRHthLfelXVVRB/alCi8 NT30VhQ6Z9Xzb20DHgl3YARGT185W62QsW9iil6NIFlzOchswfqtmF418UUixI7iqdTT q36S4iIp8hTjSVK+ufNp1BaC3LqKPl+GviQVUf5QULAfuUBPjYCSqoO223ZCgdJaRaYg r60HVSSz+EaRQTKIIAI2WYg5l61TbusSaZiCtsSaZwxycgrsGnyuf1dZr1UL5/qRCnjz MOug== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:subject:mime-version:user-agent :message-id:in-reply-to:date:references:cc:to:from; bh=1stDBHJYTYcH2JUu1d9A/Zl9J5Lle6jkhwSbzdd7OPg=; b=YG+vcRSPDbc3GY5A0VpGHkn2tmh3VVpbw1tgSG8XLHbcDO7hnA0KOFQBm5kQjV89k5 mwXlTEuib1byr3/AOaq4JSewcxVq4J+qbxrHYkYjTcbB49BExfL5NslQpz479pV30Uyz I/eHo5csHszozF8/rcNiLeN1pz6y5gVtWMhTZuXpQI9J9Zq5SB9jK9IuJEcauYKkX9dy 549TGWm1WHLFRtYM7A1gy1+R7tTv4PQnoC8gK7RqjyP4yCDMIlF5tXTRsdtLrv1tHK1j w1gZM7npZ++JXsn3LZJ5nFxTdpTa3AKkVEPBEC81uY26k00DPNBKP4oGMIIwDiXSkcMs ZSjQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id m16-v6si10357571pgl.462.2018.10.15.16.30.28; Mon, 15 Oct 2018 16:30:45 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727079AbeJPHQH (ORCPT + 99 others); Tue, 16 Oct 2018 03:16:07 -0400 Received: from out01.mta.xmission.com ([166.70.13.231]:60850 "EHLO out01.mta.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726440AbeJPHQH (ORCPT ); Tue, 16 Oct 2018 03:16:07 -0400 Received: from in02.mta.xmission.com ([166.70.13.52]) by out01.mta.xmission.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.87) (envelope-from ) id 1gCCHf-0007C3-0k; Mon, 15 Oct 2018 17:28:23 -0600 Received: from 67-3-154-154.omah.qwest.net ([67.3.154.154] helo=x220.xmission.com) by in02.mta.xmission.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.87) (envelope-from ) id 1gCCHc-0003Wp-GJ; Mon, 15 Oct 2018 17:28:22 -0600 From: ebiederm@xmission.com (Eric W. Biederman) To: Enke Chen Cc: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H. Peter Anvin" , x86@kernel.org, Peter Zijlstra , Arnd Bergmann , Khalid Aziz , Kate Stewart , Helge Deller , Greg Kroah-Hartman , Al Viro , Andrew Morton , Christian Brauner , Catalin Marinas , Will Deacon , Dave Martin , Mauro Carvalho Chehab , Michal Hocko , Rik van Riel , "Kirill A. Shutemov" , Roman Gushchin , Marcos Paulo de Souza , Oleg Nesterov , Dominik Brodowski , Cyrill Gorcunov , Yang Shi , Jann Horn , Kees Cook , linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org, "Victor Kamensky \(kamensky\)" , xe-linux-external@cisco.com, Stefan Strogin References: Date: Mon, 15 Oct 2018 18:28:03 -0500 In-Reply-To: (Enke Chen's message of "Fri, 12 Oct 2018 17:33:35 -0700") Message-ID: <87va62lri4.fsf@xmission.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-XM-SPF: eid=1gCCHc-0003Wp-GJ;;;mid=<87va62lri4.fsf@xmission.com>;;;hst=in02.mta.xmission.com;;;ip=67.3.154.154;;;frm=ebiederm@xmission.com;;;spf=neutral X-XM-AID: U2FsdGVkX1/2IzdcUh+cVVPDRDjon1VyTaYiixi7dx8= X-SA-Exim-Connect-IP: 67.3.154.154 X-SA-Exim-Mail-From: ebiederm@xmission.com X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on sa03.xmission.com X-Spam-Level: * X-Spam-Status: No, score=1.1 required=8.0 tests=ALL_TRUSTED,BAYES_50, DCC_CHECK_NEGATIVE,FVGT_m_MULTI_ODD,TVD_RCVD_IP,T_TM2_M_HEADER_IN_MSG, T_TooManySym_01,T_XMDrugObfuBody_14,XMSubLong autolearn=disabled version=3.4.0 X-Spam-Report: * -1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP * 0.8 BAYES_50 BODY: Bayes spam probability is 40 to 60% * [score: 0.5000] * 0.0 TVD_RCVD_IP Message was received from an IP address * 0.7 XMSubLong Long Subject * 0.0 T_TM2_M_HEADER_IN_MSG BODY: No description available. * -0.0 DCC_CHECK_NEGATIVE Not listed in DCC * [sa03 1397; Body=1 Fuz1=1 Fuz2=1] * 0.0 T_TooManySym_01 4+ unique symbols in subject * 0.2 T_XMDrugObfuBody_14 obfuscated drug references * 0.4 FVGT_m_MULTI_ODD Contains multiple odd letter combinations X-Spam-DCC: XMission; sa03 1397; Body=1 Fuz1=1 Fuz2=1 X-Spam-Combo: *;Enke Chen X-Spam-Relay-Country: X-Spam-Timing: total 1972 ms - load_scoreonly_sql: 0.39 (0.0%), signal_user_changed: 7 (0.4%), b_tie_ro: 4.0 (0.2%), parse: 4.7 (0.2%), extract_message_metadata: 49 (2.5%), get_uri_detail_list: 12 (0.6%), tests_pri_-1000: 22 (1.1%), tests_pri_-950: 3.7 (0.2%), tests_pri_-900: 3.0 (0.2%), tests_pri_-400: 97 (4.9%), check_bayes: 91 (4.6%), b_tokenize: 47 (2.4%), b_tok_get_all: 20 (1.0%), b_comp_prob: 11 (0.6%), b_tok_touch_all: 7 (0.3%), b_finish: 0.97 (0.0%), tests_pri_0: 1737 (88.1%), check_dkim_signature: 1.71 (0.1%), check_dkim_adsp: 6 (0.3%), tests_pri_10: 4.2 (0.2%), tests_pri_500: 29 (1.4%), rewrite_mail: 0.00 (0.0%) Subject: Re: [PATCH] kernel/signal: Signal-based pre-coredump notification X-Spam-Flag: No X-SA-Exim-Version: 4.2.1 (built Thu, 05 May 2016 13:38:54 -0600) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Enke Chen writes: > For simplicity and consistency, this patch provides an implementation > for signal-based fault notification prior to the coredump of a child > process. A new prctl command, PR_SET_PREDUMP_SIG, is defined that can > be used by an application to express its interest and to specify the > signal (SIGCHLD or SIGUSR1 or SIGUSR2) for such a notification. A new > signal code (si_code), CLD_PREDUMP, is also defined for SIGCHLD. > > Background: > > As the coredump of a process may take time, in certain time-sensitive > applications it is necessary for a parent process (e.g., a process > manager) to be notified of a child's imminent death before the coredump > so that the parent process can act sooner, such as re-spawning an > application process, or initiating a control-plane fail-over. You talk about time senstive and then you talk about bash scripts. I don't think your definition of time-sensitive and my definition match. With that said I think the best solution would be to figure out how to allow the coredump to run in parallel with the usual exit signal, and exit code reaping of the process. That would solve the problem for everyone, and would not introduce any new complicated APIs. Short of that having the prctl in the process that receives the signals they you are doing is the right way to go. You are however calling do_notify_parent_predump from the wrong function, and frankly with the wrong locking. There are multiple paths to the do_coredump function so you really want this notification from do_coredump. But I still think it would be better to solve the root cause problem and change the coredump logic to be able to run in parallel with the normal exit notification and zombie reaping logic. Then the problem you are trying to solve goes away and everyone's code gets simpler. Eric > Currently there are two ways for a parent process to be notified of a > child process's state change. One is to use the POSIX signal, and > another is to use the kernel connector module. The specific events and > actions are summarized as follows: > > Process Event POSIX Signal Connector-based > ---------------------------------------------------------------------- > ptrace_attach() do_notify_parent_cldstop() proc_ptrace_connector() > SIGCHLD / CLD_STOPPED > > ptrace_detach() do_notify_parent_cldstop() proc_ptrace_connector() > SIGCHLD / CLD_CONTINUED > > pre_coredump/ N/A proc_coredump_connector() > get_signal() > > post_coredump/ do_notify_parent() proc_exit_connector() > do_exit() SIGCHLD / exit_signal > ---------------------------------------------------------------------- > > As shown in the table, the signal-based pre-coredump notification is not > currently available. In some cases using a connector-based notification > can be quite complicated (e.g., when a process manager is written in shell > scripts and thus is subject to certain inherent limitations), and a > signal-based notification would be simpler and better suited. > > Signed-off-by: Enke Chen > --- > arch/x86/kernel/signal_compat.c | 2 +- > include/linux/sched.h | 4 ++ > include/linux/signal.h | 5 +++ > include/uapi/asm-generic/siginfo.h | 3 +- > include/uapi/linux/prctl.h | 4 ++ > kernel/fork.c | 1 + > kernel/signal.c | 51 +++++++++++++++++++++++++ > kernel/sys.c | 77 ++++++++++++++++++++++++++++++++++++++ > 8 files changed, 145 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/kernel/signal_compat.c b/arch/x86/kernel/signal_compat.c > index 9ccbf05..a3deba8 100644 > --- a/arch/x86/kernel/signal_compat.c > +++ b/arch/x86/kernel/signal_compat.c > @@ -30,7 +30,7 @@ static inline void signal_compat_build_tests(void) > BUILD_BUG_ON(NSIGSEGV != 7); > BUILD_BUG_ON(NSIGBUS != 5); > BUILD_BUG_ON(NSIGTRAP != 5); > - BUILD_BUG_ON(NSIGCHLD != 6); > + BUILD_BUG_ON(NSIGCHLD != 7); > BUILD_BUG_ON(NSIGSYS != 1); > > /* This is part of the ABI and can never change in size: */ > diff --git a/include/linux/sched.h b/include/linux/sched.h > index 09026ea..cfb9645 100644 > --- a/include/linux/sched.h > +++ b/include/linux/sched.h > @@ -696,6 +696,10 @@ struct task_struct { > int exit_signal; > /* The signal sent when the parent dies: */ > int pdeath_signal; > + > + /* The signal sent prior to a child's coredump: */ > + int predump_signal; > + > /* JOBCTL_*, siglock protected: */ > unsigned long jobctl; > > diff --git a/include/linux/signal.h b/include/linux/signal.h > index 706a499..7cb976d 100644 > --- a/include/linux/signal.h > +++ b/include/linux/signal.h > @@ -256,6 +256,11 @@ static inline int valid_signal(unsigned long sig) > return sig <= _NSIG ? 1 : 0; > } > > +static inline int valid_predump_signal(int sig) > +{ > + return (sig == SIGCHLD) || (sig == SIGUSR1) || (sig == SIGUSR2); > +} > + > struct timespec; > struct pt_regs; > enum pid_type; > diff --git a/include/uapi/asm-generic/siginfo.h b/include/uapi/asm-generic/siginfo.h > index cb3d6c2..1a47cef 100644 > --- a/include/uapi/asm-generic/siginfo.h > +++ b/include/uapi/asm-generic/siginfo.h > @@ -267,7 +267,8 @@ struct { \ > #define CLD_TRAPPED 4 /* traced child has trapped */ > #define CLD_STOPPED 5 /* child has stopped */ > #define CLD_CONTINUED 6 /* stopped child has continued */ > -#define NSIGCHLD 6 > +#define CLD_PREDUMP 7 /* child is about to dump core */ > +#define NSIGCHLD 7 > > /* > * SIGPOLL (or any other signal without signal specific si_codes) si_codes > diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h > index c0d7ea0..79f0a8a 100644 > --- a/include/uapi/linux/prctl.h > +++ b/include/uapi/linux/prctl.h > @@ -219,4 +219,8 @@ struct prctl_mm_map { > # define PR_SPEC_DISABLE (1UL << 2) > # define PR_SPEC_FORCE_DISABLE (1UL << 3) > > +/* Whether to receive signal prior to child's coredump */ > +#define PR_SET_PREDUMP_SIG 54 > +#define PR_GET_PREDUMP_SIG 55 > + > #endif /* _LINUX_PRCTL_H */ > diff --git a/kernel/fork.c b/kernel/fork.c > index 07cddff..c296c11 100644 > --- a/kernel/fork.c > +++ b/kernel/fork.c > @@ -1985,6 +1985,7 @@ static __latent_entropy struct task_struct *copy_process( > p->dirty_paused_when = 0; > > p->pdeath_signal = 0; > + p->predump_signal = 0; > INIT_LIST_HEAD(&p->thread_group); > p->task_works = NULL; > > diff --git a/kernel/signal.c b/kernel/signal.c > index 312b43e..eb4a483 100644 > --- a/kernel/signal.c > +++ b/kernel/signal.c > @@ -2337,6 +2337,44 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info) > return signr; > } > > +/* > + * Let the parent, if so desired, know about the imminent death of a child > + * prior to its coredump. > + * > + * Locking logic is similar to do_notify_parent_cldstop(). > + */ > +static void do_notify_parent_predump(struct task_struct *tsk) > +{ > + struct sighand_struct *sighand; > + struct task_struct *parent; > + struct kernel_siginfo info; > + unsigned long flags; > + int sig; > + > + parent = tsk->real_parent; > + sig = parent->predump_signal; > + > + /* Check again with "tasklist_lock" locked by the caller */ > + if (!valid_predump_signal(sig)) > + return; > + > + clear_siginfo(&info); > + info.si_signo = sig; > + if (sig == SIGCHLD) > + info.si_code = CLD_PREDUMP; > + > + rcu_read_lock(); > + info.si_pid = task_pid_nr_ns(tsk, task_active_pid_ns(parent)); > + info.si_uid = from_kuid_munged(task_cred_xxx(parent, user_ns), > + task_uid(tsk)); > + rcu_read_unlock(); > + > + sighand = parent->sighand; > + spin_lock_irqsave(&sighand->siglock, flags); > + __group_send_sig_info(sig, &info, parent); > + spin_unlock_irqrestore(&sighand->siglock, flags); > +} > + > bool get_signal(struct ksignal *ksig) > { > struct sighand_struct *sighand = current->sighand; > @@ -2497,6 +2535,19 @@ bool get_signal(struct ksignal *ksig) > current->flags |= PF_SIGNALED; > > if (sig_kernel_coredump(signr)) { > + /* > + * Notify the parent prior to the coredump if the > + * parent is interested in such a notificaiton. > + */ > + int p_sig = current->real_parent->predump_signal; > + > + if (valid_predump_signal(p_sig)) { > + read_lock(&tasklist_lock); > + do_notify_parent_predump(current); > + read_unlock(&tasklist_lock); > + cond_resched(); > + } > + > if (print_fatal_signals) > print_fatal_signal(ksig->info.si_signo); > proc_coredump_connector(current); > diff --git a/kernel/sys.c b/kernel/sys.c > index 123bd73..43eb250d 100644 > --- a/kernel/sys.c > +++ b/kernel/sys.c > @@ -2258,6 +2258,76 @@ int __weak arch_prctl_spec_ctrl_set(struct task_struct *t, unsigned long which, > return -EINVAL; > } > > +static int prctl_get_predump_signal(struct task_struct *tsk, pid_t pid, > + int __user *addr) > +{ > + struct task_struct *p; > + int error; > + > + /* For the current task, the common case */ > + if (pid == 0) { > + put_user(tsk->predump_signal, addr); > + return 0; > + } > + > + error = -ESRCH; > + rcu_read_lock(); > + p = find_task_by_vpid(pid); > + if (p) { > + error = 0; > + put_user(p->predump_signal, addr); > + } > + rcu_read_unlock(); > + return error; > +} > + > +/* > + * Returns true if current's euid is same as p's uid or euid, > + * or has CAP_SYS_ADMIN. > + * > + * Called with rcu_read_lock, creds are safe. > + * > + * Adapted from set_one_prio_perm(). > + */ > +static bool set_predump_signal_perm(struct task_struct *p) > +{ > + const struct cred *cred = current_cred(), *pcred = __task_cred(p); > + > + return uid_eq(pcred->uid, cred->euid) || > + uid_eq(pcred->euid, cred->euid) || > + capable(CAP_SYS_ADMIN); > +} > + > +static int prctl_set_predump_signal(struct task_struct *tsk, pid_t pid, int sig) > +{ > + struct task_struct *p; > + int error; > + > + /* 0 is valid for disabling the feature */ > + if (sig && !valid_predump_signal(sig)) > + return -EINVAL; > + > + /* For the current task, the common case */ > + if (pid == 0) { > + tsk->predump_signal = sig; > + return 0; > + } > + > + error = -ESRCH; > + rcu_read_lock(); > + p = find_task_by_vpid(pid); > + if (p) { > + if (!set_predump_signal_perm(p)) > + error = -EPERM; > + else { > + error = 0; > + p->predump_signal = sig; > + } > + } > + rcu_read_unlock(); > + return error; > +} > + > SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3, > unsigned long, arg4, unsigned long, arg5) > { > @@ -2476,6 +2546,13 @@ int __weak arch_prctl_spec_ctrl_set(struct task_struct *t, unsigned long which, > return -EINVAL; > error = arch_prctl_spec_ctrl_set(me, arg2, arg3); > break; > + case PR_SET_PREDUMP_SIG: > + error = prctl_set_predump_signal(me, (pid_t)arg2, (int)arg3); > + break; > + case PR_GET_PREDUMP_SIG: > + error = prctl_get_predump_signal(me, (pid_t)arg2, > + (int __user *)arg3); > + break; > default: > error = -EINVAL; > break;