Subject: Re: [PATCH bpf-next v8 00/11] Landlock LSM: Toward unprivileged sandboxing
From: Mickaël Salaün
To: Andy Lutomirski
Cc: Tycho Andersen, LKML, Alexei Starovoitov, Arnaldo Carvalho de Melo,
    Casey Schaufler, Daniel Borkmann, David Drysdale, "David S. Miller",
    "Eric W. Biederman", James Morris, Jann Horn, Jonathan Corbet,
    Michael Kerrisk, Kees Cook, Paul Moore, Sargun Dhillon,
    "Serge E. Hallyn", Shuah Khan, Tejun Heo, Thomas Graf, Will Drewry,
    Kernel Hardening, Linux API, LSM List, Network Development
Date: Fri, 9 Mar 2018 00:51:51 +0100
References: <20180227004121.3633-1-mic@digikod.net>
    <2e06621c-08e9-dc12-9b6e-9c09d5d8f458@digikod.net>
    <20180306224636.wf5z3kujtc7r5qyh@cisco>
    <7082be04-d6af-b853-4bb7-f331836662e2@digikod.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On 07/03/2018 02:21, Andy Lutomirski wrote:
> On Tue, Mar 6, 2018 at 11:06 PM, Mickaël Salaün wrote:
>>
>> On 06/03/2018 23:46, Tycho Andersen wrote:
>>> On Tue, Mar 06, 2018 at 10:33:17PM +0000, Andy Lutomirski wrote:
>>>>>> Suppose I'm writing a container manager.  I want to run "mount" in the
>>>>>> container, but I don't want to allow mount() in general and I want to
>>>>>> emulate certain mount() actions.  I can write a filter that catches
>>>>>> mount using seccomp and calls out to the container manager for help.
>>>>>> This isn't theoretical -- Tycho wants *exactly* this use case to be
>>>>>> supported.
>>>>>
>>>>> Well, I think this use case should be handled with something like
>>>>> LD_PRELOAD and a helper library. FYI, I did something like this:
>>>>> https://github.com/stemjail/stemshim
>>>>
>>>> I doubt that will work for containers.  Containers that use user
>>>> namespaces and, for example, setuid programs aren't going to honor
>>>> LD_PRELOAD.
>>>
>>> Or anything that calls syscalls directly, like go programs.
>>
>> That's why the vDSO-like approach. Enforcing an access control is not
>> the issue here, patching a buggy userland (without patching its code) is
>> the issue isn't it?
>>
>> As far as I remember, the main problem is to handle file descriptors
>> while "emulating" the kernel behavior. This can be done with a "shim"
>> code mapped in every process.
>> Chrome used something like this (in a
>> previous sandbox mechanism) as a kind of emulation (with the current
>> seccomp-bpf). I think it should be doable to replace the (userland)
>> emulation code with an IPC wrapper receiving file descriptors through
>> UNIX socket.
>>
>
> Can you explain exactly what you mean by "vDSO-like"?
>
> When a 64-bit program does a syscall, it just executes the SYSCALL
> instruction.  The vDSO isn't involved at all.  32-bit programs usually
> go through the vDSO, but not always.
>
> It could be possible to force-load a DSO into an entire container and
> rig up seccomp to intercept all SYSCALLs not originating from the DSO
> such that they merely redirect control to the DSO, but that seems
> quite messy.

The vDSO is code mapped into all processes. As you said, these processes
may use it or not. What I was thinking about is to use the same concept,
i.e. map "shim" code into each process belonging to a particular
hierarchy (the same way seccomp filters are inherited across processes).
With a seccomp filter matching some syscalls (e.g. mount, open), it is
possible to jump back to the shim code thanks to SECCOMP_RET_TRAP, which
delivers a SIGSYS signal that the shim can handle. This shim code should
then be able to emulate/patch what is needed, even faking a file opening
by receiving a file descriptor through a UNIX socket. As the Chrome
sandbox did, the seccomp filter may look at the calling address to allow
the shim code to call syscalls without being caught, if needed. However,
relying on SIGSYS may not fit with arbitrary code. A new
SECCOMP_RET_EMULATE (?) could be used to jump to a specific process
address, to emulate the syscall in an easier way than only relying on a
{c,e}BPF program.
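
To illustrate only the file descriptor part of the idea above (a manager
opens a file and hands the descriptor to the shim over a UNIX socket),
here is a minimal sketch in Python using SCM_RIGHTS. It does not install
any seccomp filter or SIGSYS handler. The names send_fd, recv_fd and
emulated_open_demo are hypothetical, and both "manager" and "shim" run in
the same process here purely for demonstration.

```python
import array
import os
import socket
import tempfile


def send_fd(sock: socket.socket, fd: int) -> None:
    """Send one open file descriptor over a UNIX socket using SCM_RIGHTS."""
    fds = array.array("i", [fd])
    # On Linux, a stream socket needs at least one byte of regular data
    # to carry ancillary data.
    sock.sendmsg([b"F"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_fd(sock: socket.socket) -> int:
    """Receive one file descriptor sent with send_fd()."""
    fds = array.array("i")
    _msg, ancdata, _flags, _addr = sock.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore any trailing bytes that do not form a whole int.
            fds.frombytes(data[: len(data) - (len(data) % fds.itemsize)])
            return fds[0]
    raise RuntimeError("no file descriptor received")


def emulated_open_demo() -> bytes:
    """Hypothetical demo: the 'manager' opens a file for the 'shim'."""
    manager_end, shim_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with manager_end, shim_end, tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"opened by the manager")
        tmp.flush()
        # Manager side: perform the real open() on behalf of the caller.
        real_fd = os.open(tmp.name, os.O_RDONLY)
        try:
            send_fd(manager_end, real_fd)
        finally:
            os.close(real_fd)  # the copy in flight stays valid
        # Shim side: receive the descriptor and use it as the open() result.
        shim_fd = recv_fd(shim_end)
        try:
            return os.read(shim_fd, 64)
        finally:
            os.close(shim_fd)
```

In a real setup the two socket ends would live in different processes,
and the shim would return the received descriptor as the result of the
intercepted open().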