Received: by 2002:ac0:a5a7:0:0:0:0:0 with SMTP id m36-v6csp348391imm; Tue, 24 Jul 2018 20:58:44 -0700 (PDT) X-Google-Smtp-Source: AAOMgpfSS05R6r1m6KX8zTn+0dlAfSgLFdCq8GKlxBXI1ZCTVLJRzeo4QisPtRGrWIReIgffiPjR X-Received: by 2002:a63:195e:: with SMTP id 30-v6mr18688579pgz.192.1532491124719; Tue, 24 Jul 2018 20:58:44 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1532491124; cv=none; d=google.com; s=arc-20160816; b=GaKcwCiXO7EpM6igJQGZHMj3E7xgL4qRGkvlb/5ay74CyjrtIEOPk9jTnmXkSHonrC PMcrOnwHLOWvuaHKoZ298diCd7i+qtBGNy2dnNgnnZUT12Gtf1EwQq0i6nORZ2bgpzlH Kl4YFYC4CcCit/5ETHlD7YNuEXLFiRA+fl632rAFbop1RAarzc6YG6hiXNs8Q59iqP4D ux1G7bAhYEYs3HXuEeWLqUMLdzSb4df6qzCbHpnqZlokNixjjMqQDt7NxC+3jAgy41UY MGDcQSGwl0gmfXm725YaNwkK4FpDsTrvgEeHPwk3+mqR3C7XLRHHFMX6LfwDdzI5GqXQ Q3SA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:subject:mime-version:user-agent :message-id:in-reply-to:date:references:cc:to:from :arc-authentication-results; bh=IvkA/RT9+wnfKR9JaXk2pDJO8+trkpjVYcOVpw4Y5vE=; b=fdXMm4kRWaVidNCG1Yj0GS1Zzk7bOmE97tm3XPtWud3+M0/dZIeKJwA7IRZcr1yxYE zhxxrKDH/z+j1MGaYVpgK3nEVnGu82qwHmyAHCkLCIXqYK/nUgQ/aA+eT5+M1GcYrgqe llqo8uEFcqWYcaqQmXambk/VrKiXeaI1KZAipxLD6VPFBY6AvkYnp4C7umSruRO8jpoH aMrcJmL7mpDUc+DUWNWyTnzgT2tzxTpWSxdg/kVRaY4F2RQXXi7qcwTr7RTjo4OVG8C5 tS/IsPLDcG7904Exo7G0neX/iCed97kgmlaV66w6PJPQ9inuZY/Y2CoZocKE5+QPUxHp LWBw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id r14-v6si12633444pfa.44.2018.07.24.20.58.30; Tue, 24 Jul 2018 20:58:44 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727816AbeGYFG4 (ORCPT + 99 others); Wed, 25 Jul 2018 01:06:56 -0400 Received: from out01.mta.xmission.com ([166.70.13.231]:58464 "EHLO out01.mta.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725998AbeGYFG4 (ORCPT ); Wed, 25 Jul 2018 01:06:56 -0400 Received: from in01.mta.xmission.com ([166.70.13.51]) by out01.mta.xmission.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.87) (envelope-from ) id 1fiAvI-0003zJ-UJ; Tue, 24 Jul 2018 21:57:12 -0600 Received: from [97.119.167.31] (helo=x220.xmission.com) by in01.mta.xmission.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.87) (envelope-from ) id 1fiAvG-0007sP-QG; Tue, 24 Jul 2018 21:57:12 -0600 From: ebiederm@xmission.com (Eric W. Biederman) To: Linus Torvalds Cc: Oleg Nesterov , Andrew Morton , Linux Kernel Mailing List , Wen Yang , majiang References: <87efft5ncd.fsf_-_@xmission.com> <20180724032419.20231-20-ebiederm@xmission.com> <874lgo5xdg.fsf@xmission.com> <87fu084cxj.fsf@xmission.com> <87a7qg4bb3.fsf_-_@xmission.com> <87pnzc2upf.fsf@xmission.com> Date: Tue, 24 Jul 2018 22:56:58 -0500 In-Reply-To: <87pnzc2upf.fsf@xmission.com> (Eric W. Biederman's message of "Tue, 24 Jul 2018 16:24:28 -0500") Message-ID: <87k1pk2cj9.fsf_-_@xmission.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-XM-SPF: eid=1fiAvG-0007sP-QG;;;mid=<87k1pk2cj9.fsf_-_@xmission.com>;;;hst=in01.mta.xmission.com;;;ip=97.119.167.31;;;frm=ebiederm@xmission.com;;;spf=neutral X-XM-AID: U2FsdGVkX189EQ6KgpITaZNhBUjCb0iN6HB6v3bc99M= X-SA-Exim-Connect-IP: 97.119.167.31 X-SA-Exim-Mail-From: ebiederm@xmission.com X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on sa05.xmission.com X-Spam-Level: X-Spam-Status: No, score=0.5 required=8.0 tests=ALL_TRUSTED,BAYES_50, DCC_CHECK_NEGATIVE,T_TM2_M_HEADER_IN_MSG,T_TooManySym_01,T_TooManySym_02, XMSubLong autolearn=disabled version=3.4.1 X-Spam-Report: * -1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP * 0.7 XMSubLong Long Subject * 0.0 T_TM2_M_HEADER_IN_MSG BODY: No description available. * 0.8 BAYES_50 BODY: Bayes spam probability is 40 to 60% * [score: 0.5000] * -0.0 DCC_CHECK_NEGATIVE Not listed in DCC * [sa05 1397; Body=1 Fuz1=1 Fuz2=1] * 0.0 T_TooManySym_02 5+ unique symbols in subject * 0.0 T_TooManySym_01 4+ unique symbols in subject X-Spam-DCC: XMission; sa05 1397; Body=1 Fuz1=1 Fuz2=1 X-Spam-Combo: ;Linus Torvalds X-Spam-Relay-Country: X-Spam-Timing: total 1482 ms - load_scoreonly_sql: 0.04 (0.0%), signal_user_changed: 3.8 (0.3%), b_tie_ro: 2.5 (0.2%), parse: 1.68 (0.1%), extract_message_metadata: 23 (1.5%), get_uri_detail_list: 6 (0.4%), tests_pri_-1000: 9 (0.6%), tests_pri_-950: 1.51 (0.1%), tests_pri_-900: 1.21 (0.1%), tests_pri_-400: 39 (2.6%), check_bayes: 37 (2.5%), b_tokenize: 16 (1.1%), b_tok_get_all: 11 (0.8%), b_comp_prob: 3.4 (0.2%), b_tok_touch_all: 4.3 (0.3%), b_finish: 0.65 (0.0%), tests_pri_0: 1392 (93.9%), check_dkim_signature: 0.62 (0.0%), check_dkim_adsp: 2.8 (0.2%), tests_pri_500: 7 (0.5%), poll_dns_idle: 0.54 (0.0%), rewrite_mail: 0.00 (0.0%) Subject: [PATCH v3 20/20] signal: Don't restart fork when signals come in. X-Spam-Flag: No X-SA-Exim-Version: 4.2.1 (built Thu, 05 May 2016 13:38:54 -0600) X-SA-Exim-Scanned: Yes (on in01.mta.xmission.com) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Wen Yang and majiang report that a periodic signal received during fork can cause fork to continually restart preventing an application from making progress. The code was being overly pesimistic. Fork needs to guarantee that a signal sent to multiple processes is logically delivered before the fork and just to the forking process or logically delivered after the fork to both the forking process and it's newly spawned child. For signals like periodic timers that are always delivered to a single process fork can safely complete and let them appear to logically delivered after the fork(). While examining this issue I also discovered that fork today will miss signals delivered to multiple processes during the fork and handled by another thread. Similarly the current code will also miss blocked signals that are delivered to multiple process, as those signals will not appear pending during fork. Add a list of each thread that is currently forking, and keep on that list a signal set that records all of the signals sent to multiple processes. When fork completes initialize the new processes shared_pending signal set with it. The calculate_sigpending function will see those signals and set TIF_SIGPENDING causing the new task to take the slow path to userspace to handle those signals. Making it appear as if those signals were received immediately after the fork. It is not possible to send real time signals to multiple processes and exceptions don't go to multiple processes, which means that that are no signals sent to multiple processes that require siginfo. This means it is safe to not bother collecting siginfo on signals sent during fork. The sigaction of a child of fork is initially the same as the sigaction of the parent process. So a signal the parent ignores the child will also initially ignore. Therefore it is safe to ignore signals sent to multiple processes and ignored by the forking process. Signals sent to only a single process or only a single thread and delivered during fork are treated as if they are received after the fork, and generally not dealt with. They won't cause any problems. V2: Added removal from the multiprocess list on failure. V3: Use -ERESTARTNOINTR directly Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=200447 Reported-by: Wen Yang and Reported-by: majiang Fixes: 4a2c7a7837da ("[PATCH] make fork() atomic wrt pgrp/session signals") Signed-off-by: "Eric W. Biederman" --- include/linux/sched/signal.h | 8 +++++++ kernel/fork.c | 43 ++++++++++++++++++++++-------------- kernel/signal.c | 9 ++++++++ 3 files changed, 44 insertions(+), 16 deletions(-) diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h index f3507bf165d0..62262021cf7e 100644 --- a/include/linux/sched/signal.h +++ b/include/linux/sched/signal.h @@ -69,6 +69,11 @@ struct thread_group_cputimer { bool checking_timer; }; +struct multiprocess_signals { + sigset_t signal; + struct hlist_node node; +}; + /* * NOTE! "signal_struct" does not have its own * locking, because a shared signal_struct always @@ -90,6 +95,9 @@ struct signal_struct { /* shared signal handling: */ struct sigpending shared_pending; + /* For collecting multiprocess signals during fork */ + struct hlist_head multiprocess; + /* thread group exit support */ int group_exit_code; /* overloaded: diff --git a/kernel/fork.c b/kernel/fork.c index 6c358846a8b8..94951fb562db 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1456,6 +1456,7 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk) init_waitqueue_head(&sig->wait_chldexit); sig->curr_target = tsk; init_sigpending(&sig->shared_pending); + INIT_HLIST_HEAD(&sig->multiprocess); seqlock_init(&sig->stats_lock); prev_cputime_init(&sig->prev_cputime); @@ -1602,6 +1603,7 @@ static __latent_entropy struct task_struct *copy_process( { int retval; struct task_struct *p; + struct multiprocess_signals delayed; /* * Don't allow sharing the root directory with processes in a different @@ -1649,6 +1651,24 @@ static __latent_entropy struct task_struct *copy_process( return ERR_PTR(-EINVAL); } + /* + * Force any signals received before this point to be delivered + * before the fork happens. Collect up signals sent to multiple + * processes that happen during the fork and delay them so that + * they appear to happen after the fork. + */ + sigemptyset(&delayed.signal); + INIT_HLIST_NODE(&delayed.node); + + spin_lock_irq(¤t->sighand->siglock); + if (!(clone_flags & CLONE_THREAD)) + hlist_add_head(&delayed.node, ¤t->signal->multiprocess); + recalc_sigpending(); + spin_unlock_irq(¤t->sighand->siglock); + retval = -ERESTARTNOINTR; + if (signal_pending(current)) + goto fork_out; + retval = -ENOMEM; p = dup_task_struct(current, node); if (!p) @@ -1934,22 +1954,6 @@ static __latent_entropy struct task_struct *copy_process( goto bad_fork_cancel_cgroup; } - if (!(clone_flags & CLONE_THREAD)) { - /* - * Process group and session signals need to be delivered to just the - * parent before the fork or both the parent and the child after the - * fork. Restart if a signal comes in before we add the new process to - * it's process group. - * A fatal signal pending means that current will exit, so the new - * thread can't slip out of an OOM kill (or normal SIGKILL). - */ - recalc_sigpending(); - if (signal_pending(current)) { - retval = -ERESTARTNOINTR; - goto bad_fork_cancel_cgroup; - } - } - init_task_pid_links(p); if (likely(p->pid)) { @@ -1979,6 +1983,8 @@ static __latent_entropy struct task_struct *copy_process( attach_pid(p, PIDTYPE_TGID); attach_pid(p, PIDTYPE_PGID); attach_pid(p, PIDTYPE_SID); + p->signal->shared_pending.signal = delayed.signal; + hlist_del(&delayed.node); __this_cpu_inc(process_counts); } else { current->signal->nr_threads++; @@ -2060,6 +2066,11 @@ static __latent_entropy struct task_struct *copy_process( put_task_stack(p); free_task(p); fork_out: + if (!(clone_flags & CLONE_THREAD)) { + spin_lock_irq(¤t->sighand->siglock); + hlist_del(&delayed.node); + spin_unlock_irq(¤t->sighand->siglock); + } return ERR_PTR(retval); } diff --git a/kernel/signal.c b/kernel/signal.c index 78e2d5d196f3..5b1aab94daf6 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -1123,6 +1123,15 @@ static int __send_signal(int sig, struct siginfo *info, struct task_struct *t, out_set: signalfd_notify(t, sig); sigaddset(&pending->signal, sig); + + /* Let multiprocess signals appear after on-going forks */ + if (type > PIDTYPE_TGID) { + struct multiprocess_signals *delayed; + hlist_for_each_entry(delayed, &t->signal->multiprocess, node) { + sigaddset(&delayed->signal, sig); + } + } + complete_signal(sig, t, type); ret: trace_signal_generate(sig, info, t, type != PIDTYPE_PID, result); -- 2.17.1