Date: Thu, 14 Nov 2019 17:40:22 +0100
From: Christian Brauner
To: Adrian Reber, Oleg Nesterov
Cc: Eric Biederman, Pavel Emelyanov, Jann Horn, Dmitry Safonov <0x7f454c46@gmail.com>, Rasmus Villemoes, linux-kernel@vger.kernel.org, Andrei Vagin, Mike Rapoport, Radostin Stoyanov
Subject: Re: [PATCH v10 1/2] fork: extend clone3() to support setting a PID
Message-ID: <20191114164021.zvlyjceifuusdzqm@wittgenstein>
References: <20191114142707.1608679-1-areber@redhat.com>
In-Reply-To: <20191114142707.1608679-1-areber@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Nov 14, 2019 at 03:27:06PM +0100, Adrian Reber wrote:
> The main motivation to add set_tid to clone3() is CRIU.
>
> To restore a process with the same PID/TID CRIU currently uses
> /proc/sys/kernel/ns_last_pid. It writes the desired (PID - 1) to
> ns_last_pid and then (quickly) does a clone(). This works most of the
> time, but it is racy. It is also slow as it requires multiple syscalls.
>
> Extending clone3() to support *set_tid makes it possible to restore a
> process using CRIU without accessing /proc/sys/kernel/ns_last_pid and
> race free (as long as the desired PID/TID is available).
>
> This clone3() extension places the same restrictions (CAP_SYS_ADMIN)
> on clone3() with *set_tid as are currently in place for ns_last_pid.
>
> The original version of this change was using a single value for
> set_tid. At the 2019 LPC, after presenting set_tid, it was, however,
> decided to change set_tid to an array to enable setting the PID of a
> process in multiple PID namespaces at the same time. If a process is
> created in a PID namespace it is possible to influence the PID inside
> and outside of the PID namespace. Details are also in the corresponding
> selftest.
>
> To create a process with the following PIDs:
>
>       PID NS level         Requested PID
>         0 (host)              31496
>         1                        42
>         2                         1
>
> For that example the two newly introduced parameters to struct
> clone_args (set_tid and set_tid_size) would need to be:
>
>   set_tid[0] = 1;
>   set_tid[1] = 42;
>   set_tid[2] = 31496;
>   set_tid_size = 3;
>
> If only the PIDs of the two innermost nested PID namespaces should be
> defined it would look like this:
>
>   set_tid[0] = 1;
>   set_tid[1] = 42;
>   set_tid_size = 2;
>
> The PID of the newly created process would then be the next available
> free PID in the PID namespace level 0 (host) and 42 in the PID namespace
> at level 1 and the PID of the process in the innermost PID namespace
> would be 1.
>
> The set_tid array is used to specify the PID of a process starting
> from the innermost nested PID namespace up to set_tid_size PID namespaces.
>
> set_tid_size cannot be larger than the current PID namespace level.
>
> Signed-off-by: Adrian Reber
> Reviewed-by: Christian Brauner

Applied. Thanks, Adrian and Oleg!

Christian
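
For readers following the set_tid ordering in the quoted commit message, here is a rough userspace sketch of how the example table could be turned into the two new clone_args fields. It is only an illustration under stated assumptions: struct clone_args_sketch is a local mirror of the kernel's struct clone_args layout with set_tid and set_tid_size appended, and fill_set_tid() and prepare_clone_args() are hypothetical helper names that do not come from the patch.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/*
 * Local mirror of struct clone_args including the two fields this patch
 * adds (set_tid, set_tid_size). Assumption: older uapi headers may lack
 * these fields, so the sketch does not rely on <linux/sched.h>.
 */
struct clone_args_sketch {
	uint64_t flags;
	uint64_t pidfd;
	uint64_t child_tid;
	uint64_t parent_tid;
	uint64_t exit_signal;
	uint64_t stack;
	uint64_t stack_size;
	uint64_t tls;
	uint64_t set_tid;      /* pointer to a pid_t array, innermost first */
	uint64_t set_tid_size; /* number of entries in that array */
};

/*
 * Hypothetical helper: take the requested PIDs ordered as in the table
 * (outermost namespace first) and write them innermost first, which is
 * the order set_tid expects. Returns the number of entries written, or
 * 0 if the destination is too small.
 */
static size_t fill_set_tid(const pid_t *outer_to_inner, size_t count,
			   pid_t *set_tid, size_t capacity)
{
	assert(outer_to_inner != NULL && set_tid != NULL);

	if (count > capacity)
		return 0;

	for (size_t i = 0; i < count; i++)
		set_tid[i] = outer_to_inner[count - 1 - i];

	return count;
}

/* Hypothetical helper: zero the args and attach the set_tid array. */
static void prepare_clone_args(struct clone_args_sketch *args,
			       const pid_t *set_tid, size_t count)
{
	memset(args, 0, sizeof(*args));
	args->exit_signal = SIGCHLD;
	args->set_tid = (uint64_t)(uintptr_t)set_tid;
	args->set_tid_size = count;
}
```

With the table from the commit message ({31496, 42, 1} from host to innermost), fill_set_tid() yields set_tid[0] = 1, set_tid[1] = 42, set_tid[2] = 31496. On a kernel carrying this patch, and with CAP_SYS_ADMIN, the prepared structure would then presumably be passed as syscall(SYS_clone3, &args, sizeof(args)); that call is left out of the sketch because it depends on the running kernel and on the requested PIDs being free.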