Message-ID:
 <5aebc712634afda1eaad820f2e4f330689b287ea.camel@bitron.ch>
Subject: Re: [PATCH v2 1/1] prctl: add PR_{GET,SET}_KILL_DESCENDANTS_ON_EXIT
From: Jürg Billeter
To: Florian Weimer
Cc: Andrew Morton, Oleg Nesterov, Thomas Gleixner, Eric Biederman,
 Kees Cook, Andy Lutomirski, linux-api@vger.kernel.org,
 linux-kernel@vger.kernel.org
Date: Sat, 01 Dec 2018 14:57:54 +0100
In-Reply-To: <878t19o2h6.fsf@oldenburg.str.redhat.com>
References: <20181127225408.7553-2-j@bitron.ch>
 <20181130080004.23635-1-j@bitron.ch>
 <20181130080004.23635-2-j@bitron.ch>
 <87bm66u1j5.fsf@oldenburg.str.redhat.com>
 <878t19o2h6.fsf@oldenburg.str.redhat.com>

On Sat, 2018-12-01 at 13:28 +0100, Florian Weimer wrote:
> * Jürg Billeter:
>
> > On Fri, 2018-11-30 at 14:40 +0100, Florian Weimer wrote:
> > > * Jürg Billeter:
> > >
> > > > This introduces a new thread group flag that can be set by calling
> > > >
> > > >     prctl(PR_SET_KILL_DESCENDANTS_ON_EXIT, 1, 0, 0, 0)
> > > >
> > > > When a thread group exits with this flag set, it will send SIGKILL to
> > > > all descendant processes. This can be used to prevent stray child
> > > > processes.
> > > >
> > > > This flag is cleared on privilege gaining execve(2) to ensure an
> > > > unprivileged process cannot get a privileged process to send SIGKILL.
> > >
> > > So this is inherited across regular execve? I'm not sure that's a good
> > > idea.
> >
> > Yes, this matches PR_SET_CHILD_SUBREAPER (and other process
> > attributes). Besides consistency and allowing a parent to configure the
> > flag for a spawned process, this is also needed to prevent a process
> > from clearing the flag (in combination with a seccomp filter).
>
> I think the semantics of PR_SET_CHILD_SUBREAPER are different, and the
> behavior makes more sense there.

In my opinion, introducing inconsistency by deviating from the common
behavior of retaining process attributes across execve would be more
confusing/surprising to users. I don't see why it makes sense for
PR_SET_CHILD_SUBREAPER but not for PR_SET_KILL_DESCENDANTS_ON_EXIT.

Also, the main motivation is to provide a subset of PID namespace
features to unprivileged processes with a lightweight mechanism.
Retaining kill_descendants_on_exit across execve allows very similar
usage to PID namespaces. For example, the parent can set
PR_SET_KILL_DESCENDANTS_ON_EXIT and PR_SET_CHILD_SUBREAPER in the child
before execve, and the spawned init-like executable doesn't need to
know about this flag itself. That is, the same init-like program can
function as the leader of a PID namespace or as a subreaper with this
extra flag set, without code changes. If the flag were cleared by
execve, the program would need to know about this flag, and it would be
impossible for the parent to lock this down using seccomp.

>
> > > > Descendants that are orphaned and reparented to an ancestor of the
> > > > current process before the current process exits, will not be killed.
> > > > PR_SET_CHILD_SUBREAPER can be used to contain orphaned processes.
> > >
> > > For double- or triple-forking daemons, the reparenting will be racy, if
> > > I understand things correctly.
> >
> > Can you please elaborate, if you're concerned about a particular race?
> > As the commit message mentions, for containment this flag can be
> > combined with PR_SET_CHILD_SUBREAPER (and PR_SET_NO_NEW_PRIVS).
>
> Without PR_SET_CHILD_SUBREAPER, if a newly execve'ed daemon performs
> double/triple forking to disentangle itself from the parent process
> session, and the parent process which set
> PR_SET_KILL_DESCENDANTS_ON_EXIT terminates, behavior depends on when
> exactly the parent process terminates.
> The daemon process will leak if
> it has completed its reparenting.
>
> I think this could be sufficiently common that solution is needed here.

I expect the common case to be that PR_SET_KILL_DESCENDANTS_ON_EXIT
will be used together with PR_SET_CHILD_SUBREAPER (and possibly
PR_SET_NO_NEW_PRIVS) to prevent stray children. And I don't see a race
condition in that case.

PR_SET_KILL_DESCENDANTS_ON_EXIT can be used for non-subreapers but I
expect this to be used in more specialized scenarios where the program
is designed/known to avoid such race conditions. We could theoretically
restrict PR_SET_KILL_DESCENDANTS_ON_EXIT to subreapers but I currently
don't see a strong enough reason for this.

Jürg
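
For illustration only, the spawn pattern discussed in this thread (set
the flags in the child before execve, so the init-like program needs no
code changes) might look roughly like the sketch below. The helper
names configure_containment() and spawn_contained() are hypothetical.
PR_SET_KILL_DESCENDANTS_ON_EXIT is the flag proposed by this patch and
is assumed to be absent from released kernel headers, so the sketch
guards it with #ifdef and otherwise falls back to the existing
PR_SET_CHILD_SUBREAPER and PR_SET_NO_NEW_PRIVS calls.

```c
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: mark the calling process as a containing,
 * init-like process. Returns 0 on success, -1 if any prctl() fails.
 */
static int configure_containment(void)
{
    /* Existing API: orphaned descendants get reparented to this process. */
    if (prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) == -1)
        return -1;

#ifdef PR_SET_KILL_DESCENDANTS_ON_EXIT
    /* Proposed flag from this patch; compiled in only if headers define it. */
    if (prctl(PR_SET_KILL_DESCENDANTS_ON_EXIT, 1, 0, 0, 0) == -1)
        return -1;
#endif

    /* Optional hardening mentioned in the thread. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == -1)
        return -1;

    return 0;
}

/*
 * Hypothetical helper: fork, configure the child, then exec argv[0].
 * Returns the child's PID in the parent, or -1 if fork() fails.
 */
static pid_t spawn_contained(char *const argv[])
{
    pid_t pid = fork();

    if (pid == 0) {
        if (configure_containment() == -1)
            _exit(126);     /* could not set the flags */
        execvp(argv[0], argv);
        _exit(127);         /* exec failed */
    }
    return pid;
}
```

The flags are set after fork() and before exec, so they apply to the
spawned program rather than to the spawning parent. A seccomp filter
that stops the child from clearing the flag again is left out of this
sketch.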