Date: Thu, 14 Nov 2019 09:02:09 +0100
From: Christian Brauner
To: Adrian Reber
Cc: Eric Biederman, Pavel Emelyanov, Jann Horn, Oleg Nesterov, Dmitry Safonov <0x7f454c46@gmail.com>, Rasmus Villemoes, linux-kernel@vger.kernel.org, Andrei Vagin, Mike Rapoport, Radostin Stoyanov
Subject: Re: [PATCH v9 1/2] fork: extend clone3() to support setting a PID
Message-ID: <20191114080208.m5cl4hv5w7k2za4e@wittgenstein>
References: <20191114070709.1504202-1-areber@redhat.com> <20191114075759.3cdil2rh3dz4ozvs@wittgenstein>
In-Reply-To: <20191114075759.3cdil2rh3dz4ozvs@wittgenstein>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Nov 14, 2019 at 08:57:59AM +0100, Christian Brauner wrote:
> On Thu, Nov 14, 2019 at 08:07:08AM +0100, Adrian Reber wrote:
> > The main motivation to add set_tid to clone3() is CRIU.
> >
> > To restore a process with the same PID/TID, CRIU currently uses
> > /proc/sys/kernel/ns_last_pid. It writes the desired (PID - 1) to
> > ns_last_pid and then (quickly) does a clone(). This works most of the
> > time, but it is racy. It is also slow, as it requires multiple syscalls.
> >
> > Extending clone3() to support *set_tid makes it possible to restore a
> > process using CRIU race-free and without accessing
> > /proc/sys/kernel/ns_last_pid (as long as the desired PID/TID is
> > available).
> >
> > This clone3() extension places the same restriction (CAP_SYS_ADMIN)
> > on clone3() with *set_tid as is currently in place for ns_last_pid.
> >
> > The original version of this change used a single value for set_tid.
> > After set_tid was presented at LPC 2019, however, it was decided to
> > change set_tid to an array so that the PID of a process can be set in
> > multiple PID namespaces at the same time. If a process is created in a
> > PID namespace, it is possible to influence its PID both inside and
> > outside of that PID namespace. Details are also in the corresponding
> > selftest.
> >
> > To create a process with the following PIDs:
> >
> >   PID NS level   Requested PID
> >   0 (host)       31496
> >   1              42
> >   2              1
> >
> > the two newly introduced members of struct clone_args (set_tid and
> > set_tid_size) would need to be:
> >
> >   set_tid[0] = 1;
> >   set_tid[1] = 42;
> >   set_tid[2] = 31496;
> >   set_tid_size = 3;
> >
> > If only the PIDs in the two innermost nested PID namespaces should be
> > defined, it would look like this:
> >
> >   set_tid[0] = 1;
> >   set_tid[1] = 42;
> >   set_tid_size = 2;
> >
> > The newly created process would then get the next available free PID
> > in PID namespace level 0 (host), PID 42 in the PID namespace at level 1,
> > and PID 1 in the innermost PID namespace.
> >
> > The set_tid array is used to specify the PID of a process, starting
> > from the innermost nested PID namespace, for up to set_tid_size PID
> > namespaces.
> >
> > set_tid_size cannot be larger than the current PID namespace level.
> >
> > Signed-off-by: Adrian Reber
>
> I have no quarrels with the core patch anymore.
> Note: once Oleg has said he's fine with this patch too, I will likely
> reword the kernel-doc and the comment in alloc_pid() and the commit
> message a little before applying, but really just minor things that are
> not worth resending for.
>
> Thanks!
> Reviewed-by: Christian Brauner

And thanks for the --base= information you've added to this version. :)

Christian
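The innermost-first ordering of set_tid described in the quoted patch can be illustrated with a small C sketch. It only fills in the arguments and does not call clone3(). Assumptions: struct clone_args_sketch is a local mirror of the clone_args layout from the kernel version that added set_tid/set_tid_size (all fields u64, pointers passed as integers), not the real UAPI header; the helper fill_set_tid(), its outermost-to-innermost input order (matching the PID NS level table), and its depth parameter (number of PID namespace levels, 3 in the example) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical local mirror of struct clone_args including set_tid and
 * set_tid_size. Illustration only; real code should use the kernel's
 * UAPI header.
 */
struct clone_args_sketch {
	uint64_t flags;
	uint64_t pidfd;
	uint64_t child_tid;
	uint64_t parent_tid;
	uint64_t exit_signal;
	uint64_t stack;
	uint64_t stack_size;
	uint64_t tls;
	uint64_t set_tid;      /* pointer to a pid_t array, innermost NS first */
	uint64_t set_tid_size; /* number of entries in that array */
};

/* Ten u64 fields: 80 bytes, with no padding. */
static_assert(sizeof(struct clone_args_sketch) == 80, "unexpected layout");

/*
 * Hypothetical helper. outer_to_inner lists the requested PIDs for the n
 * innermost levels, ordered from the outermost of those levels to the
 * innermost (like the table). They are written innermost-first into
 * set_tid. depth is the number of PID namespace levels the child lives
 * in; n > depth models the "set_tid_size cannot be larger" rule.
 * Empty lists are rejected. Returns 0 on success, -1 on error.
 */
static int fill_set_tid(struct clone_args_sketch *args, pid_t *set_tid,
			const pid_t *outer_to_inner, size_t n, size_t depth)
{
	if (n == 0 || n > depth)
		return -1;

	/* Reverse: the last (innermost) requested PID becomes set_tid[0]. */
	for (size_t i = 0; i < n; i++)
		set_tid[i] = outer_to_inner[n - 1 - i];

	memset(args, 0, sizeof(*args));
	args->set_tid = (uint64_t)(uintptr_t)set_tid;
	args->set_tid_size = n;
	return 0;
}
```

With the full table, passing {31496, 42, 1} and n = 3 yields set_tid = {1, 42, 31496} and set_tid_size = 3. For the partial case, passing {42, 1} and n = 2 yields set_tid = {1, 42} and set_tid_size = 2, leaving the host-level PID to the kernel. On a kernel with this series applied, the filled struct would then be passed to the raw clone3 system call via syscall(2), with CAP_SYS_ADMIN as the patch requires.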