Date: Wed, 31 Oct 2018 09:23:39 +1100
From: Aleksa Sarai
To: Joel Fernandes
Cc: Daniel Colascione, linux-kernel, Tim Murray, Suren Baghdasaryan
Subject: Re: [RFC PATCH] Implement /proc/pid/kill
Message-ID: <20181030222339.ud4wfp75tidowuo4@yavin>
References: <20181029221037.87724-1-dancol@google.com>
 <20181030050012.u43lcvydy6nom3ul@yavin>
 <20181030204501.jnbe7dyqui47hd2x@yavin>
 <20181030214243.GB32621@google.com>
In-Reply-To: <20181030214243.GB32621@google.com>

On 2018-10-30, Joel Fernandes wrote:
> On Wed, Oct 31, 2018 at 07:45:01AM +1100, Aleksa Sarai wrote:
> [...]
> > > > (Unfortunately
> > > > there are lots of things that make it a bit difficult to use /proc/$pid
> > > > exclusively for introspection of a process -- especially in the context
> > > > of containers.)
> > >
> > > Tons of things already break without a working /proc. What do you have in mind?
> >
> > Heh, if only that was the only blocker. :P
> >
> > The basic problem is that currently container runtimes either depend on
> > some non-transient on-disk state (which becomes invalid on machine
> > reboots or dead processes and so on), or on long-running processes that
> > keep file descriptors required for administration of a container alive
> > (think O_PATH to /dev/pts/ptmx to avoid malicious container filesystem
> > attacks). Usually both.
> >
> > What would be really useful would be having some way of "hiding away" a
> > mount namespace (of the pid1 of the container) that has all of the
> > information and bind-mounts-to-file-descriptors that are necessary for
> > administration. If the container's pid1 dies all of the transient state
> > has disappeared automatically -- because the stashed mount namespace has
> > died. In addition, if this was done the way I'm thinking with (and this
> > is the contentious bit) hierarchical mount namespaces you could make it
> > so that the pid1 could not manipulate its current mount namespace to
> > confuse the administrative process. You would also then create an
> > intermediate user namespace to help with several race conditions (that
> > have caused security bugs like CVE-2016-9962) we've seen when joining
> > containers.
> >
> > Unfortunately this all depends on hierarchical mount namespaces (and
> > note that this would just be that NS_GET_PARENT gives you the mount
> > namespace that it was created in -- I'm not suggesting we redesign peers
> > or anything like that). This makes it basically a non-starter.
> >
> > But if, on top of this ground-work, we then referenced containers
> > entirely via an fd to /proc/$pid then you could also avoid PID reuse
> > races (as well as being able to find out implicitly whether a container
> > has died thanks to the error semantics of /proc/$pid). And that's the
> > way I would suggest doing it (if we had these other things in place).
>
> I didn't fully follow exactly what you mean. If you can explain for the
> layman who doesn't have much experience with containers..
>
> Are you saying that keeping open a /proc/$pid directory handle is not
> sufficient to prevent PID reuse while the proc entries under /proc/$pid are
> being looked into? If it's not sufficient, then isn't that a bug? If it is
> sufficient, then can we not just keep the handle open while we do whatever we
> want under /proc/$pid ?

Sorry, I went on a bit of a tangent about various internals of container
runtimes. My main point is that I would love to use /proc/$pid because
it makes PID-reuse handling trivial and is always correct, but that
there are things which stop us from being able to use it for everything
(which is what my incoherent rambling was on about).
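
To make the "keep the handle open" idea concrete, here is a rough sketch
(not anything a runtime actually ships) of holding a directory fd to
/proc/$pid and doing every lookup relative to it. The helper names
open_proc_handle, read_via_handle and is_alive are made up for
illustration, and the sketch assumes a Linux system with procfs mounted
at /proc. The handle keeps referring to the process it was opened for,
so a later process that happens to get the same PID is never consulted;
once the original process has been reaped, lookups through the handle
simply fail.

```python
import os


def open_proc_handle(pid: int) -> int:
    """Open /proc/<pid> as a directory fd (hypothetical helper).

    The fd refers to the process that owned <pid> at open time, not to
    whichever process may be given that number later.
    """
    return os.open(f"/proc/{pid}", os.O_RDONLY | os.O_DIRECTORY)


def read_via_handle(proc_fd: int, name: str) -> bytes:
    """Read /proc/<pid>/<name> relative to the handle (openat-style)."""
    fd = os.open(name, os.O_RDONLY, dir_fd=proc_fd)
    try:
        return os.read(fd, 65536)
    finally:
        os.close(fd)


def is_alive(proc_fd: int) -> bool:
    """Return True while the process behind the handle has not been reaped.

    Any OSError from the lookup is treated as "gone"; a zombie that has
    not been waited for still counts as present.
    """
    try:
        read_via_handle(proc_fd, "stat")
        return True
    except OSError:
        return False


if __name__ == "__main__":
    # Inspect our own process through a handle.
    handle = open_proc_handle(os.getpid())
    try:
        print(read_via_handle(handle, "comm").decode().strip())
    finally:
        os.close(handle)
```

The point of the design is that the PID number is used exactly once, at
open time; everything afterwards goes through the fd, which is where the
error semantics mentioned above come from.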
-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH