From: Christian Brauner
Date: Fri, 2 Aug 2019 15:50:54 +0200
To: Oleg Nesterov
Cc: Christian Brauner, Adrian Reber, Eric Biederman, Pavel Emelianov, Jann Horn, Dmitry Safonov <0x7f454c46@gmail.com>, linux-kernel@vger.kernel.org, Andrei Vagin, Mike Rapoport, Radostin Stoyanov
Subject: Re: [PATCH v2 1/2] fork: extend clone3() to support CLONE_SET_TID
Message-ID: <20190802135050.fx3tbynztmxbmqik@brauner.io>
References: <20190731161223.2928-1-areber@redhat.com> <20190802131943.hkvcssv74j25xmmt@brauner.io> <20190802133001.GE20111@redhat.com>
In-Reply-To: <20190802133001.GE20111@redhat.com>

On Fri, Aug 02, 2019 at 03:30:01PM +0200, Oleg Nesterov wrote:
> On 08/02, Christian Brauner wrote:
> >
> > On Wed, Jul 31, 2019 at 06:12:22PM +0200, Adrian Reber wrote:
> > > The main motivation to add CLONE_SET_TID to clone3() is CRIU.
> > >
> > > To restore a process with the same PID/TID CRIU currently uses
> > > /proc/sys/kernel/ns_last_pid. It writes the desired (PID - 1) to
> > > ns_last_pid and then (quickly) does a clone(). This works most of the
> > > time, but it is racy. It is also slow as it requires multiple syscalls.
> >
> > Can you elaborate how this is racy, please. Afaict, CRIU will always
> > usually restore in a new pid namespace that it controls, right?
>
> Why? No. For example you can checkpoint (not sure this is correct word)
> a single process in your namespace, then (try to restore) it.
>
> > What is
> > the exact race?
>
> something else in the same namespace can fork() right after criu writes
> the pid-for-restore into ns_last_pid.

Ok, that makes sense. :)
My CRIU userspace knowledge is sporadic, so I'm not sure how exactly it
restores process trees in pid namespaces and what workloads this would
especially help with.
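
To make the race described above concrete, here is a minimal sketch in C. It is a toy model only, not kernel or CRIU code: struct toy_pidns and the toy_* functions are hypothetical names invented for illustration, and the model assumes PIDs are handed out strictly as last_pid + 1, ignoring wraparound at pid_max and PIDs that are already in use.

```c
/* Toy model of the ns_last_pid restore sequence. NOT kernel or CRIU code. */
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a PID namespace: it only remembers the
 * last PID that was handed out. */
struct toy_pidns {
    int last_pid;
};

/* Models fork()/clone() in this namespace: the new task gets
 * last_pid + 1 (assumption: no wraparound, no PIDs in use). */
int toy_fork(struct toy_pidns *ns)
{
    return ++ns->last_pid;
}

/* Models writing (want - 1) to /proc/sys/kernel/ns_last_pid. */
void toy_set_last_pid(struct toy_pidns *ns, int want)
{
    assert(want > 1);
    ns->last_pid = want - 1;
}

/*
 * Models the restore sequence: set last_pid, then fork.
 * If `interloper` is true, an unrelated task in the same namespace
 * forks in the window between the two steps.
 * Returns the PID the restored task actually received.
 */
int toy_restore(struct toy_pidns *ns, int want, bool interloper)
{
    toy_set_last_pid(ns, want);
    if (interloper)
        (void)toy_fork(ns);  /* the unrelated fork takes `want` */
    return toy_fork(ns);
}
```

In this model, toy_restore(&ns, 4242, false) returns 4242, while toy_restore(&ns, 4242, true) returns 4243: the unrelated fork consumed the PID that was meant for the restored task. A restorer relying on this scheme has to compare the PID it got with the one it wanted and cope with a mismatch, which is the gap a clone3() option that sets the TID directly is meant to close.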