Date: Wed, 31 Oct 2018 07:45:01 +1100
From: Aleksa Sarai
To: Daniel Colascione
Cc: linux-kernel, Tim Murray, Joel Fernandes, Suren Baghdasaryan
Subject: Re: [RFC PATCH] Implement /proc/pid/kill
Message-ID: <20181030204501.jnbe7dyqui47hd2x@yavin>
References: <20181029221037.87724-1-dancol@google.com> <20181030050012.u43lcvydy6nom3ul@yavin>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="ttptq65jxnckrurq"
Content-Disposition: inline
In-Reply-To:
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

--ttptq65jxnckrurq
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On 2018-10-30, Daniel Colascione wrote:
> >> Add a simple proc-based kill interface. To use /proc/pid/kill, just
> >> write the signal number in base-10 ASCII to the kill file of the
> >> process to be killed: for example, 'echo 9 > /proc/$$/kill'.
> >>
> >> Semantically, /proc/pid/kill works like kill(2), except that the
> >> process ID comes from the proc filesystem context instead of from an
> >> explicit system call parameter. This way, it's possible to avoid races
> >> between inspecting some aspect of a process and that process's PID
> >> being reused for some other process.
> >
> > (Aside from any UX concerns other folks might have.)
> >
> > I think it would be a good idea to (at least temporarily) restrict this
> > so that only processes that are in the same PID namespace as the /proc
> > being resolved through may use this interface. Otherwise you might have
> > cases where partial container breakouts can start sending signals to
> > PIDs they wouldn't normally be able to address.
>
> That's a good idea.

(Oh and I wonder how this interacts with SELinux/AppArmor signal
mediation.)

> > (Unfortunately
> > there are lots of things that make it a bit difficult to use /proc/$pid
> > exclusively for introspection of a process -- especially in the context
> > of containers.)
>
> Tons of things already break without a working /proc. What do you have in mind?

Heh, if only that was the only blocker. :P

The basic problem is that currently container runtimes either depend on
some non-transient on-disk state (which becomes invalid on machine
reboots or dead processes and so on), or on long-running processes that
keep file descriptors required for administration of a container alive
(think O_PATH to /dev/pts/ptmx to avoid malicious container filesystem
attacks).
Usually both.

What would be really useful would be having some way of "hiding away" a
mount namespace (of the pid1 of the container) that has all of the
information and bind-mounts-to-file-descriptors that are necessary for
administration. If the container's pid1 dies all of the transient state
has disappeared automatically -- because the stashed mount namespace has
died. In addition, if this was done the way I'm thinking with (and this
is the contentious bit) hierarchical mount namespaces you could make it
so that the pid1 could not manipulate its current mount namespace to
confuse the administrative process. You would also then create an
intermediate user namespace to help with several race conditions (that
have caused security bugs like CVE-2016-9962) we've seen when joining
containers.

Unfortunately this all depends on hierarchical mount namespaces (and
note that this would just be that NS_GET_PARENT gives you the mount
namespace that it was created in -- I'm not suggesting we redesign peers
or anything like that). This makes it basically a non-starter.

But if, on top of this ground-work, we then referenced containers
entirely via an fd to /proc/$pid then you could also avoid PID reuse
races (as well as being able to find out implicitly whether a container
has died thanks to the error semantics of /proc/$pid). And that's the
way I would suggest doing it (if we had these other things in place).
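
To make the "fd to /proc/$pid" point concrete, here is a minimal sketch
in Python. It is illustrative only and not part of the RFC patch:
open_proc_handle() and is_alive() are hypothetical helper names, and the
sketch assumes a Linux /proc where lookups through a held directory fd
fail with ENOENT (or ESRCH) once the process has been reaped. That
assumption should be checked against the kernel in use.

```python
import errno
import os


def open_proc_handle(pid: int) -> int:
    """Open /proc/<pid> as a directory fd.

    The fd stays bound to the process that owned the PID at open time,
    so later PID reuse cannot redirect it to another process.
    (Hypothetical helper name, for illustration only.)
    """
    return os.open(f"/proc/{pid}", os.O_RDONLY | os.O_DIRECTORY)


def is_alive(proc_fd: int) -> bool:
    """Probe the process through the held fd instead of through the PID.

    Returns True while the process exists (including as an unreaped
    zombie) and False once lookups through the fd start failing.
    Assumption: a reaped process shows up as ENOENT or ESRCH.
    """
    try:
        fd = os.open("stat", os.O_RDONLY, dir_fd=proc_fd)
    except OSError as exc:
        if exc.errno in (errno.ENOENT, errno.ESRCH):
            return False
        raise
    try:
        os.read(fd, 1)
    except OSError as exc:
        if exc.errno == errno.ESRCH:
            return False
        raise
    finally:
        os.close(fd)
    return True
```

A runtime would call open_proc_handle() once, right after starting the
container's pid1, and from then on use only the fd. The race it cannot
close by itself is the one between learning the PID and opening the fd,
so the open has to happen while the caller still knows the PID is valid
(for example, as the parent before reaping the child).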

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH

--ttptq65jxnckrurq
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQIzBAABCAAdFiEEb6Gz4/mhjNy+aiz1Snvnv3Dem58FAlvYws0ACgkQSnvnv3De
m59tqg//SJHM++rBB5LiwC+95XIVUdw1DmHVa+8z0p6QbArCWyzbDF47DGc6+31W
R5X9lEOiEH0Rtreq5ZN9xrK4VsjWVqeriu5/Cg4pY41UGsjF5Yc6V8td85mI5Km+
fw+b02qdtbNHO3JSRmKkwcwBr5Y5SX0rn/78YSDFxb2jbgke1ApUVDunkoHwzw2N
JoQaIq1iHRqPRMy9COMH1l2qJ/XCCmpyUathKmw6HNCx55aRJHkFrWWoiNbviHZ7
Qst2pM1AKK/dK80oBdGtb1vNb2oI2mOlROESwf0LxUOMbSCrXIDjO3WvDvoOw48P
ACO2DHeBqKYam8ZE5aM3Koz9vGbYR5S4Pb0vs6IgR0bLXGGicgxTNASRK/+hBkAi
IozExc8PQpgqwP62mM0Za4cRqHO2bW3bgM4ULTHmx7VHhhOkyL2lZf4lVznQ8rh3
D3hQTln8QqYAts32bvWLj/2YePo3maKxF66VwLgz+dkF/IKZ9SRG8wkt4ItSTA3+
+R1SsAVLkKLzn5bXVnR8eteE51MvB0ccn66gjgW7MJJhrSzIhisUv/Ayv2VVhzvk
34sCpiHbK+YFnFpuxTr2mqa6bB9bAeIg2Hiqqlq5NidMpdYSymAiYqrbXnzeXMo7
pHOtuiU7mzdN16sbiENBvJL14w3sMj7d7Wg0hQmRFToOH3j3liA=
=VYkc
-----END PGP SIGNATURE-----

--ttptq65jxnckrurq--