Date: Thu, 14 Nov 2019 08:58:00 +0100
From: Christian Brauner
To: Adrian Reber
Cc: Eric Biederman, Pavel Emelyanov, Jann Horn, Oleg Nesterov,
    Dmitry Safonov <0x7f454c46@gmail.com>, Rasmus Villemoes,
    linux-kernel@vger.kernel.org, Andrei Vagin, Mike Rapoport,
    Radostin Stoyanov
Subject: Re: [PATCH v9 1/2] fork: extend clone3() to support setting a PID
Message-ID: <20191114075759.3cdil2rh3dz4ozvs@wittgenstein>
In-Reply-To: <20191114070709.1504202-1-areber@redhat.com>

On Thu, Nov 14, 2019 at 08:07:08AM +0100, Adrian Reber wrote:
> The main motivation to add set_tid to clone3() is CRIU.
>
> To restore a process with the same PID/TID CRIU currently uses
> /proc/sys/kernel/ns_last_pid. It writes the desired (PID - 1) to
> ns_last_pid and then (quickly) does a clone(). This works most of the
> time, but it is racy. It is also slow as it requires multiple syscalls.
>
> Extending clone3() to support *set_tid makes it possible to restore a
> process using CRIU without accessing /proc/sys/kernel/ns_last_pid and
> race free (as long as the desired PID/TID is available).
>
> This clone3() extension places the same restrictions (CAP_SYS_ADMIN)
> on clone3() with *set_tid as they are currently in place for ns_last_pid.
>
> The original version of this change was using a single value for
> set_tid. At the 2019 LPC, after presenting set_tid, it was, however,
> decided to change set_tid to an array to enable setting the PID of a
> process in multiple PID namespaces at the same time. If a process is
> created in a PID namespace it is possible to influence the PID inside
> and outside of the PID namespace. Details also in the corresponding
> selftest.
>
> To create a process with the following PIDs:
>
>       PID NS level         Requested PID
>       0 (host)             31496
>       1                    42
>       2                    1
>
> For that example the two newly introduced parameters to struct
> clone_args (set_tid and set_tid_size) would need to be:
>
>   set_tid[0] = 1;
>   set_tid[1] = 42;
>   set_tid[2] = 31496;
>   set_tid_size = 3;
>
> If only the PIDs of the two innermost nested PID namespaces should be
> defined it would look like this:
>
>   set_tid[0] = 1;
>   set_tid[1] = 42;
>   set_tid_size = 2;
>
> The PID of the newly created process would then be the next available
> free PID in the PID namespace level 0 (host) and 42 in the PID namespace
> at level 1 and the PID of the process in the innermost PID namespace
> would be 1.
>
> The set_tid array is used to specify the PID of a process starting
> from the innermost nested PID namespaces up to set_tid_size PID namespaces.
>
> set_tid_size cannot be larger than the current PID namespace level.
>
> Signed-off-by: Adrian Reber

I have no quarrels with the core patch anymore.

Note, once Oleg has said he's fine with this patch too, I will likely
reword the kernel-doc, the comment in alloc_pid(), and the commit
message a little before applying; but really just minor things that are
not worth resending for.

Thanks!
Reviewed-by: Christian Brauner
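
For readers following the thread, the innermost-first ordering of set_tid
described in the quoted commit message can be sketched in C. This is only
an illustration, not code from the patch: build_set_tid() and its
parameters are hypothetical names, and the sketch only fills an array in
user space without calling clone3(). It takes the requested PIDs indexed
by PID namespace level (0 = host), as in the table above, and writes them
in the order the commit message gives for set_tid.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not part of the patch): reorder a table of
 * requested PIDs, indexed by PID namespace level with 0 = host, into
 * the innermost-first layout the commit message describes for set_tid.
 *
 * by_level: requested PID per level, by_level[0] is the host namespace
 * levels:   number of entries in by_level
 * pinned:   how many of the innermost levels get a fixed PID
 * set_tid:  output array with room for at least `pinned` entries
 *
 * Returns the value to use as set_tid_size, or -1 on invalid input.
 */
int build_set_tid(const pid_t *by_level, size_t levels,
                  size_t pinned, pid_t *set_tid)
{
    if (by_level == NULL || set_tid == NULL)
        return -1;
    if (pinned == 0 || pinned > levels)
        return -1;

    for (size_t i = 0; i < pinned; i++) {
        /* Walk from the innermost level outwards. */
        pid_t want = by_level[levels - 1 - i];

        if (want <= 0)
            return -1;
        set_tid[i] = want;
    }

    /* set_tid[0] always refers to the innermost PID namespace. */
    assert(set_tid[0] == by_level[levels - 1]);
    return (int)pinned;
}
```

With the table { 31496, 42, 1 } and pinned = 3 this yields the first
example from the commit message; with pinned = 2 only set_tid[0] and
set_tid[1] are written and the host PID is left for the kernel to choose.
A real caller would presumably pass the array and the returned size via
the set_tid and set_tid_size members of struct clone_args, subject to the
CAP_SYS_ADMIN restriction mentioned above.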