From: Andy Lutomirski
Date: Tue, 27 Feb 2018 04:36:58 +0000
Subject: Re: [PATCH bpf-next v8 00/11] Landlock LSM: Toward unprivileged sandboxing
To: Mickaël Salaün
Cc: LKML, Alexei Starovoitov, Arnaldo Carvalho de Melo, Casey Schaufler,
    Daniel Borkmann, David Drysdale, "David S. Miller", "Eric W. Biederman",
    James Morris, Jann Horn, Jonathan Corbet, Michael Kerrisk, Kees Cook,
    Paul Moore, Sargun Dhillon, "Serge E. Hallyn", Shuah Khan, Tejun Heo,
    Thomas Graf, Tycho Andersen, Will Drewry, Kernel Hardening, Linux API,
    LSM List, Network Development
In-Reply-To: <20180227004121.3633-1-mic@digikod.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Feb 27, 2018 at 12:41 AM, Mickaël Salaün wrote:
> Hi,
>
> This eight series is a major revamp of the Landlock design compared to
> the previous series [1]. This enables more flexibility and granularity
> of access control with file paths. It is now possible to enforce an
> access control according to a file hierarchy.
> Landlock uses the concept of inode and path to identify such
> hierarchy. In a way, it brings tools to program what is a file
> hierarchy.
>
> There is now three types of Landlock hooks: FS_WALK, FS_PICK and FS_GET.
> Each of them accepts a dedicated eBPF program, called a Landlock
> program. They can be chained to enforce a full access control according
> to a list of directories or files. The set of actions on a file is well
> defined (e.g. read, write, ioctl, append, lock, mount...) taking
> inspiration from the major Linux LSMs and some other access-controls
> like Capsicum. These program types are designed to be cache-friendly,
> which give room for optimizations in the future.
>
> The documentation patch contains some kernel documentation and
> explanations on how to use Landlock. The compiled documentation and
> a talk I gave at FOSDEM can be found here: https://landlock.io
> This patch series can be found in the branch landlock-v8 in this repo:
> https://github.com/landlock-lsm/linux
>
> There is still some minor issues with this patch series but it should
> demonstrate how powerful this design may be. One of these issues is that
> it is not a stackable LSM anymore, but the infrastructure management of
> security blobs should allow to stack it with other LSM [4].
>
> This is the first step of the roadmap discussed at LPC [2]. While the
> intended final goal is to allow unprivileged users to use Landlock, this
> series allows only a process with global CAP_SYS_ADMIN to load and
> enforce a rule. This may help to get feedback and avoid unexpected
> behaviors.
>
> This series can be applied on top of bpf-next, commit 7d72637eb39f
> ("Merge branch 'x86-jit'"). This can be tested with
> CONFIG_SECCOMP_FILTER and CONFIG_SECURITY_LANDLOCK. I would really
> appreciate constructive comments on the design and the code.
>
>
> # Landlock LSM
>
> The goal of this new Linux Security Module (LSM) called Landlock is to
> allow any process, including unprivileged ones, to create powerful
> security sandboxes comparable to XNU Sandbox or OpenBSD Pledge. This
> kind of sandbox is expected to help mitigate the security impact of bugs
> or unexpected/malicious behaviors in user-space applications.
>
> The approach taken is to add the minimum amount of code while still
> allowing the user-space application to create quite complex access
> rules. A dedicated security policy language such as the one used by
> SELinux, AppArmor and other major LSMs involves a lot of code and is
> usually permitted to only a trusted user (i.e. root). On the contrary,
> eBPF programs already exist and are designed to be safely loaded by
> unprivileged user-space.
>
> This design does not seem too intrusive but is flexible enough to allow
> a powerful sandbox mechanism accessible by any process on Linux. The use
> of seccomp and Landlock is more suitable with the help of a user-space
> library (e.g. libseccomp) that could help to specify a high-level
> language to express a security policy instead of raw eBPF programs.
> Moreover, thanks to the LLVM front-end, it is quite easy to write an
> eBPF program with a subset of the C language.
>
>
> # Frequently asked questions
>
> ## Why is seccomp-bpf not enough?
>
> A seccomp filter can access only raw syscall arguments (i.e. the
> register values) which means that it is not possible to filter according
> to the value pointed to by an argument, such as a file pathname. As an
> embryonic Landlock version demonstrated, filtering at the syscall level
> is complicated (e.g. need to take care of race conditions). This is
> mainly because the access control checkpoints of the kernel are not at
> this high-level but more underneath, at the LSM-hook level. The LSM
> hooks are designed to handle this kind of checks.
> Landlock abstracts this approach to leverage the ability of
> unprivileged users to limit themselves.
>
> Cf. section "What it isn't?" in Documentation/prctl/seccomp_filter.txt
>
>
> ## Why use the seccomp(2) syscall?
>
> Landlock use the same semantic as seccomp to apply access rule
> restrictions. It add a new layer of security for the current process
> which is inherited by its children. It makes sense to use an unique
> access-restricting syscall (that should be allowed by seccomp filters)
> which can only drop privileges. Moreover, a Landlock rule could come
> from outside a process (e.g. passed through a UNIX socket). It is then
> useful to differentiate the creation/load of Landlock eBPF programs via
> bpf(2), from rule enforcement via seccomp(2).

This seems like a weak argument to me. Sure, this is a bit different
from seccomp(), and maybe shoving it into the seccomp() multiplexer is
awkward, but surely the bpf() multiplexer is even less applicable.

But I think that you have more in common with seccomp() than you're
giving it credit for. With seccomp, you need to either prevent ptrace()
of any more-privileged task or you need to filter to make sure you
can't trace a more privileged program. With Landlock, you need exactly
the same thing. You have basically the same no_new_privs
considerations, etc.

Also, looking forward, I think you're going to want a bunch of the
stuff that's under consideration as new seccomp features. Tycho is
working on a "user notifier" feature for seccomp where, in addition to
accepting, rejecting, or kicking to ptrace, you can send a message to
the creator of the filter and wait for a reply. I think that Landlock
will want exactly the same feature.

In other words, it really seems to me that you should extend seccomp()
with the ability to attach filters to things that aren't syscall
entry, e.g. file open.

I would also seriously consider doing a scaled-back Landlock variant
first, with the intent of getting the main mechanism into the kernel.
In particular, there are two big sources of complexity in Landlock.
You need to deal with the API for managing bpf programs that filter
various actions beyond just syscall entry, and you need to give those
filters a way to handle inodes, paths, etc. But you can do the former
without the latter.

For example, you could start with some Landlock-style filters on
things that have nothing to do with files. You could allow a filter
for connecting to an abstract-namespace unix socket. Or you could have
a hook for file_receive. (You couldn't meaningfully filter based on
the *path* of the fd being received without adding all the path
infrastructure, but you could filter on the *type* of the fd being
received.) Both of these add new sandboxing abilities that don't
currently exist. In particular, you can't write a seccomp rule that
prevents receiving an fd using recvmsg() right now unless you block
cmsg entirely. And you can't write a filter that allows connecting to
unix sockets by path without allowing abstract namespace sockets
either.

If you split up Landlock like this then, once you got all the
installation and management of filters down, you could submit patches
to add all the path stuff and deal with that review separately.

What do you all think?
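
To make the two path-free examples above a little more concrete, here
is a rough user-space sketch in Python. It is not kernel code and not
part of the patch series: the function names (unix_addr_kind, fd_kind,
receive_fds_of_kind) are made up for illustration. It only shows the
kind of input such a filter would need: whether an AF_UNIX address is
abstract (leading NUL byte) or path-based, and what type a received fd
has. Note that a user-space check like this runs after the fd has
already been installed in the receiver, which is why real enforcement
would have to live in a kernel hook.

```python
import os
import socket
import stat


def unix_addr_kind(addr):
    """Classify an AF_UNIX address as 'abstract', 'pathname' or 'unnamed'.

    On Linux an abstract-namespace address starts with a NUL byte and has
    no entry in the filesystem, so no path lookup is needed to tell the
    two apart.
    """
    if isinstance(addr, str):
        addr = os.fsencode(addr)
    if not addr:
        return "unnamed"
    if addr[:1] == b"\0":
        return "abstract"
    return "pathname"


def fd_kind(fd):
    """Return a coarse type for an open fd, using only fstat() metadata."""
    mode = os.fstat(fd).st_mode
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISFIFO(mode):
        return "fifo"
    if stat.S_ISREG(mode):
        return "regular"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISCHR(mode):
        return "chardev"
    return "other"


def receive_fds_of_kind(sock, allowed, maxfds=8):
    """Receive fds sent with SCM_RIGHTS and keep only the allowed types.

    Fds of any other type are closed immediately. This is a user-space
    stand-in for the decision a hypothetical file_receive filter would
    make before the fd ever reached the process.
    """
    _data, fds, _flags, _addr = socket.recv_fds(sock, 1, maxfds)
    kept = []
    for fd in fds:
        if fd_kind(fd) in allowed:
            kept.append(fd)
        else:
            os.close(fd)
    return kept
```

Neither check needs any inode or path infrastructure, which is the
point of starting with these two hooks. socket.recv_fds() requires
Python 3.9 or later.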