2020-12-09 20:33:57

by Mickaël Salaün

Subject: [PATCH v26 00/12] Landlock LSM

Hi,

This patch series adds new build-time checks and a new test, renames
some variables and functions to improve readability, and shifts syscall
numbers to align with -next.

The SLOC count is 1289 for security/landlock/ and 1791 for
tools/testing/selftests/landlock/ . Test coverage for security/landlock/
is 94.1% of lines. The code not covered only deals with internal kernel
errors (e.g. memory allocation) and race conditions.

The compiled documentation is available here:
https://landlock.io/linux-doc/landlock-v26/userspace-api/landlock.html

This series can be applied on top of v5.10-rc7 . This can be tested
with CONFIG_SECURITY_LANDLOCK, CONFIG_SAMPLE_LANDLOCK and by prepending
"landlock," to CONFIG_LSM. This patch series can be found in a Git
repository here:
https://github.com/landlock-lsm/linux/commits/landlock-v26
I would really appreciate constructive comments on this patch series.


# Landlock LSM

The goal of Landlock is to enable restricting ambient rights (e.g.
global filesystem access) for a set of processes. Because Landlock is a
stackable LSM [1], it makes it possible to create safe security sandboxes
as new security layers in addition to the existing system-wide
access-controls. This kind of sandbox is expected to help mitigate the
security impact of bugs or unexpected/malicious behaviors in user-space
applications. Landlock empowers any process, including unprivileged
ones, to securely restrict themselves.

Landlock is inspired by seccomp-bpf but instead of filtering syscalls
and their raw arguments, a Landlock rule can restrict the use of kernel
objects like file hierarchies, according to the kernel semantics.
Landlock also takes inspiration from other OS sandbox mechanisms: XNU
Sandbox, FreeBSD Capsicum or OpenBSD Pledge/Unveil.

In its current form, Landlock lacks some access-control features.
This keeps the patch series small and eases review. This series
still addresses multiple use cases, especially with the combined use of
seccomp-bpf: applications with built-in sandboxing, init systems,
security sandbox tools and security-oriented APIs [2].

Previous version:
https://lore.kernel.org/lkml/[email protected]

[1] https://lore.kernel.org/lkml/[email protected]/
[2] https://lore.kernel.org/lkml/[email protected]/


Casey Schaufler (1):
LSM: Infrastructure management of the superblock

Mickaël Salaün (11):
landlock: Add object management
landlock: Add ruleset and domain management
landlock: Set up the security framework and manage credentials
landlock: Add ptrace restrictions
fs,security: Add sb_delete hook
landlock: Support filesystem access-control
landlock: Add syscall implementations
arch: Wire up Landlock syscalls
selftests/landlock: Add user space tests
samples/landlock: Add a sandbox manager example
landlock: Add user and kernel documentation

Documentation/security/index.rst | 1 +
Documentation/security/landlock.rst | 79 +
Documentation/userspace-api/index.rst | 1 +
Documentation/userspace-api/landlock.rst | 280 +++
MAINTAINERS | 13 +
arch/Kconfig | 7 +
arch/alpha/kernel/syscalls/syscall.tbl | 3 +
arch/arm/tools/syscall.tbl | 3 +
arch/arm64/include/asm/unistd.h | 2 +-
arch/arm64/include/asm/unistd32.h | 6 +
arch/ia64/kernel/syscalls/syscall.tbl | 3 +
arch/m68k/kernel/syscalls/syscall.tbl | 3 +
arch/microblaze/kernel/syscalls/syscall.tbl | 3 +
arch/mips/kernel/syscalls/syscall_n32.tbl | 3 +
arch/mips/kernel/syscalls/syscall_n64.tbl | 3 +
arch/mips/kernel/syscalls/syscall_o32.tbl | 3 +
arch/parisc/kernel/syscalls/syscall.tbl | 3 +
arch/powerpc/kernel/syscalls/syscall.tbl | 3 +
arch/s390/kernel/syscalls/syscall.tbl | 3 +
arch/sh/kernel/syscalls/syscall.tbl | 3 +
arch/sparc/kernel/syscalls/syscall.tbl | 3 +
arch/um/Kconfig | 1 +
arch/x86/entry/syscalls/syscall_32.tbl | 3 +
arch/x86/entry/syscalls/syscall_64.tbl | 3 +
arch/xtensa/kernel/syscalls/syscall.tbl | 3 +
fs/super.c | 1 +
include/linux/lsm_hook_defs.h | 1 +
include/linux/lsm_hooks.h | 3 +
include/linux/security.h | 4 +
include/linux/syscalls.h | 7 +
include/uapi/asm-generic/unistd.h | 8 +-
include/uapi/linux/landlock.h | 128 ++
kernel/sys_ni.c | 5 +
samples/Kconfig | 7 +
samples/Makefile | 1 +
samples/landlock/.gitignore | 1 +
samples/landlock/Makefile | 15 +
samples/landlock/sandboxer.c | 233 +++
security/Kconfig | 11 +-
security/Makefile | 2 +
security/landlock/Kconfig | 21 +
security/landlock/Makefile | 4 +
security/landlock/common.h | 20 +
security/landlock/cred.c | 46 +
security/landlock/cred.h | 58 +
security/landlock/fs.c | 622 ++++++
security/landlock/fs.h | 56 +
security/landlock/limits.h | 21 +
security/landlock/object.c | 67 +
security/landlock/object.h | 91 +
security/landlock/ptrace.c | 120 ++
security/landlock/ptrace.h | 14 +
security/landlock/ruleset.c | 466 +++++
security/landlock/ruleset.h | 161 ++
security/landlock/setup.c | 40 +
security/landlock/setup.h | 18 +
security/landlock/syscall.c | 427 ++++
security/security.c | 51 +-
security/selinux/hooks.c | 58 +-
security/selinux/include/objsec.h | 6 +
security/selinux/ss/services.c | 3 +-
security/smack/smack.h | 6 +
security/smack/smack_lsm.c | 35 +-
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/landlock/.gitignore | 2 +
tools/testing/selftests/landlock/Makefile | 24 +
tools/testing/selftests/landlock/base_test.c | 219 ++
tools/testing/selftests/landlock/common.h | 110 +
tools/testing/selftests/landlock/config | 5 +
tools/testing/selftests/landlock/fs_test.c | 1799 +++++++++++++++++
.../testing/selftests/landlock/ptrace_test.c | 314 +++
tools/testing/selftests/landlock/true.c | 5 +
72 files changed, 5678 insertions(+), 77 deletions(-)
create mode 100644 Documentation/security/landlock.rst
create mode 100644 Documentation/userspace-api/landlock.rst
create mode 100644 include/uapi/linux/landlock.h
create mode 100644 samples/landlock/.gitignore
create mode 100644 samples/landlock/Makefile
create mode 100644 samples/landlock/sandboxer.c
create mode 100644 security/landlock/Kconfig
create mode 100644 security/landlock/Makefile
create mode 100644 security/landlock/common.h
create mode 100644 security/landlock/cred.c
create mode 100644 security/landlock/cred.h
create mode 100644 security/landlock/fs.c
create mode 100644 security/landlock/fs.h
create mode 100644 security/landlock/limits.h
create mode 100644 security/landlock/object.c
create mode 100644 security/landlock/object.h
create mode 100644 security/landlock/ptrace.c
create mode 100644 security/landlock/ptrace.h
create mode 100644 security/landlock/ruleset.c
create mode 100644 security/landlock/ruleset.h
create mode 100644 security/landlock/setup.c
create mode 100644 security/landlock/setup.h
create mode 100644 security/landlock/syscall.c
create mode 100644 tools/testing/selftests/landlock/.gitignore
create mode 100644 tools/testing/selftests/landlock/Makefile
create mode 100644 tools/testing/selftests/landlock/base_test.c
create mode 100644 tools/testing/selftests/landlock/common.h
create mode 100644 tools/testing/selftests/landlock/config
create mode 100644 tools/testing/selftests/landlock/fs_test.c
create mode 100644 tools/testing/selftests/landlock/ptrace_test.c
create mode 100644 tools/testing/selftests/landlock/true.c


base-commit: 0477e92881850d44910a7e94fc2c46f96faa131f
--
2.29.2


2020-12-09 20:34:35

by Mickaël Salaün

Subject: [PATCH v26 02/12] landlock: Add ruleset and domain management

From: Mickaël Salaün <[email protected]>

A Landlock ruleset is mainly a red-black tree with Landlock rules as
nodes. This enables quick update and lookup to match a requested
access, e.g. to a file. A ruleset is usable through a dedicated file
descriptor (cf. following commit implementing syscalls) which enables a
process to create and populate a ruleset with new rules.

A domain is a ruleset tied to a set of processes. This group of rules
defines the security policy enforced on these processes and their future
children. A domain can transition to a new domain which is the
intersection of all its constraints and those of a ruleset provided by
the current process. This modification only impacts the current process.
This means that a process can only gain more constraints (i.e. lose
accesses) over time.
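
The "only gain more constraints" property can be illustrated with a small
user-space model. This is only a sketch, not the kernel code: every toy_*
name and TOY_* constant is hypothetical, and the model tracks a single
object, whereas the patch keeps one layer stack per rule in a red-black
tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_LAYERS 64

/* Hypothetical access bits, for illustration only. */
#define TOY_ACCESS_READ  (1u << 0)
#define TOY_ACCESS_WRITE (1u << 1)

/* Toy domain: one access mask per stacked layer, for a single object. */
struct toy_domain {
	size_t num_layers;
	uint16_t layer_access[TOY_MAX_LAYERS];
};

/*
 * Builds @child as a copy of @parent with one more layer on top.
 * Returns false when the layer stack is already full.
 */
static bool toy_restrict(const struct toy_domain *parent, uint16_t access,
			 struct toy_domain *child)
{
	if (parent->num_layers == TOY_MAX_LAYERS)
		return false;
	*child = *parent;
	child->layer_access[child->num_layers++] = access;
	return true;
}

/* A request is granted only if every layer allows all requested bits. */
static bool toy_is_allowed(const struct toy_domain *dom, uint16_t request)
{
	for (size_t i = 0; i < dom->num_layers; i++) {
		if ((dom->layer_access[i] & request) != request)
			return false;
	}
	return true;
}
```

In this model, stacking a read-only layer on top of a read-write layer
removes write access, and no later layer can give it back because the
check walks every layer.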

Cc: James Morris <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Serge E. Hallyn <[email protected]>
Signed-off-by: Mickaël Salaün <[email protected]>
---

Changes since v25:
* Add build-time checks for the num_layers and num_rules variables
according to LANDLOCK_MAX_NUM_LAYERS and LANDLOCK_MAX_NUM_RULES, and
move these limits to a dedicated file.
* Cosmetic variable renames.
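
The idea behind these build-time checks is that each counter field must be
wide enough to hold its limit. The patch does this with BUILD_BUG_ON() on a
const struct initialized with ~0. As a rough user-space equivalent, the
sketch below swaps in C11 _Static_assert with a sizeof-based macro; the
TOY_* and toy_* names are hypothetical and the structs are reduced to the
fields of interest.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define TOY_MAX_NUM_LAYERS 64
#define TOY_MAX_NUM_RULES UINT32_MAX

struct toy_layer {
	uint16_t level;
	uint16_t access;
};

struct toy_rule {
	uint32_t num_layers;
};

struct toy_ruleset {
	uint32_t num_rules;
	uint32_t num_layers;
};

/* Largest value an unsigned struct field can hold, as a constant. */
#define TOY_UFIELD_MAX(type, field) \
	(UINTMAX_MAX >> (CHAR_BIT * \
		(sizeof(uintmax_t) - sizeof(((type *)0)->field))))

/* Compilation fails if a field is too narrow for its limit. */
_Static_assert(TOY_UFIELD_MAX(struct toy_layer, level) >=
		TOY_MAX_NUM_LAYERS, "level too narrow");
_Static_assert(TOY_UFIELD_MAX(struct toy_rule, num_layers) >=
		TOY_MAX_NUM_LAYERS, "rule num_layers too narrow");
_Static_assert(TOY_UFIELD_MAX(struct toy_ruleset, num_rules) >=
		TOY_MAX_NUM_RULES, "num_rules too narrow");

/* Runtime accessors, only here so the macro can be inspected. */
static uintmax_t toy_level_max(void)
{
	return TOY_UFIELD_MAX(struct toy_layer, level);
}

static uintmax_t toy_rule_num_layers_max(void)
{
	return TOY_UFIELD_MAX(struct toy_rule, num_layers);
}
```

Shrinking a field (e.g. making level a uint8_t while raising the limit
above 255) would then break the build instead of silently truncating.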

Changes since v24:
* Update struct landlock_rule with a layer stack. This reverts "Always
intersect access rights" from v24 and also adds the ability to tie
access rights with their policy layer. As noted by Jann Horn, always
intersecting access rights made some use cases needlessly more
difficult to handle in user space. Thanks to this new stack, the
policy behavior stays deterministic whatever the level of a rule in the
stack of policies, while a ruleset is built with a "union" of accesses.
The implementation uses a FAM (flexible array member) to keep the
access checks quick and memory efficient (4 bytes per layer per inode).
Update insert_rule() accordingly.
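
The layer stack stored in a flexible array member can be sketched in user
space as follows. This is a simplified model, not the patch's
create_rule(): the toy_* names are hypothetical, calloc() stands in for
kzalloc() with struct_size(), and the object pointer and rb_node are left
out. The new layer goes first and the previous stack is copied after it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_NUM_LAYERS 64

struct toy_layer {
	uint16_t level;
	uint16_t access;
};

/* The "4 bytes per layer" figure follows from two 16-bit fields. */
_Static_assert(sizeof(struct toy_layer) == 4, "unexpected layer size");

struct toy_rule {
	uint32_t num_layers;
	struct toy_layer layers[]; /* flexible array member */
};

/*
 * Returns a newly allocated rule holding @new_layer (if any) followed by a
 * copy of the layers of @old (if any). Returns NULL on allocation failure
 * or if the stack would exceed TOY_MAX_NUM_LAYERS. The caller frees it.
 */
static struct toy_rule *toy_push_layer(const struct toy_rule *old,
				       const struct toy_layer *new_layer)
{
	const uint32_t old_num = old ? old->num_layers : 0;
	const uint32_t new_num = old_num + (new_layer ? 1 : 0);
	struct toy_rule *rule;

	if (old_num > TOY_MAX_NUM_LAYERS || new_num > TOY_MAX_NUM_LAYERS)
		return NULL;
	rule = calloc(1, sizeof(*rule) + new_num * sizeof(rule->layers[0]));
	if (!rule)
		return NULL;
	rule->num_layers = new_num;
	if (new_layer)
		rule->layers[0] = *new_layer;
	if (old_num)
		memcpy(&rule->layers[new_layer ? 1 : 0], old->layers,
		       old_num * sizeof(old->layers[0]));
	return rule;
}
```

Because a rule is replaced by a new allocation rather than resized in
place, the old rule can be released once the tree node has been swapped.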

Changes since v23:
* Always intersect access rights. Following the filesystem change
logic, make ruleset updates more consistent by always intersecting
access rights (boolean AND) instead of combining them (boolean OR) for
the same layer. This defensive approach could also help prevent user
space from inadvertently allowing multiple access rights for the same
object (e.g. write and execute access on a path hierarchy) instead of
dealing with such inconsistency. This can happen when there is no
deduplication of objects (e.g. paths and underlying inodes) whereas
they get different access rights with landlock_add_rule(2).
* Add extra checks to make sure that:
- there is always an (allocated) object in each used rule;
- when updating a ruleset with a new rule (i.e. not merging two
rulesets), the ruleset doesn't contain multiple layers.
* Hide merge parameter from the public landlock_insert_rule() API. This
helps avoid misuse of this function.
* Replace a remaining hardcoded 1 with SINGLE_DEPTH_NESTING.

Changes since v22:
* Explicitly use RB_ROOT and SINGLE_DEPTH_NESTING (suggested by Jann
Horn).
* Improve comments and fix spelling (suggested by Jann Horn).

Changes since v21:
* Add and clean up comments.

Changes since v18:
* Account rulesets to kmemcg.
* Remove struct holes.
* Cosmetic changes.

Changes since v17:
* Move include/uapi/linux/landlock.h and _LANDLOCK_ACCESS_FS_* to a
following patch.

Changes since v16:
* Allow enforcement of empty ruleset, which enables deny-all policies.

Changes since v15:
* Replace layer_levels and layer_depth with a bitfield of layers, cf.
filesystem commit.
* Rename LANDLOCK_ACCESS_FS_{UNLINK,RMDIR} to
LANDLOCK_ACCESS_FS_REMOVE_{FILE,DIR} because it makes sense to use
them for the action of renaming a file or a directory, which may lead
to the removal of the source file or directory. Remove
LANDLOCK_ACCESS_FS_{LINK_TO,RENAME_FROM,RENAME_TO}, which are now
replaced with LANDLOCK_ACCESS_FS_REMOVE_{FILE,DIR} and
LANDLOCK_ACCESS_FS_MAKE_* .
* Update the documentation accordingly and highlight how the access
rights are taken into account.
* Change nb_rules from atomic_t to u32 because it is not used anymore by
show_fdinfo().
* Add safeguard for level variables types.
* Check max number of rules.
* Replace struct landlock_access (self and beneath bitfields) with one
bitfield.
* Remove useless variable.
* Add comments.

Changes since v14:
* Simplify the object, rule and ruleset management at the expense of a
less aggressive memory freeing (contributed by Jann Horn, with
additional modifications):
- Make a domain immutable (remove the opportunistic cleaning).
- Remove RCU pointers.
- Merge struct landlock_ref and struct landlock_ruleset_elem into
landlock_rule: get rid of rule's RCU.
- Adjust union.
- Remove the landlock_insert_rule() check about a new object with the
same address as a previously disabled one, because it is not
possible to disable a rule anymore.
Cf. https://lore.kernel.org/lkml/CAG48ez21bEn0wL1bbmTiiu8j9jP5iEWtHOwz4tURUJ+ki0ydYw@mail.gmail.com/
* Fix nested domains by implementing a notion of layer level and depth:
- Update landlock_insert_rule() to manage such layers.
- Add an inherit_ruleset() helper to properly create a new domain.
- Rename landlock_find_access() to landlock_find_rule() and return a
full rule reference.
- Add layer_level and layer_depth fields to struct landlock_rule.
- Add a top_layer_level field to struct landlock_ruleset.
* Remove access rights that may be required for FD-only requests:
truncate, getattr, lock, chmod, chown, chgrp, ioctl. This will be
handled in a future evolution of Landlock, but right now the goal is to
lighten the code to ease review.
* Remove LANDLOCK_ACCESS_FS_OPEN and rename
LANDLOCK_ACCESS_FS_{READ,WRITE} with a FILE suffix.
* Rename LANDLOCK_ACCESS_FS_READDIR to match the *_FILE pattern.
* Remove LANDLOCK_ACCESS_FS_MAP which was useless.
* Fix memory leak in put_hierarchy() (reported by Jann Horn).
* Fix use-after-free and rename free_ruleset() (reported by Jann Horn).
* Replace the for loops with rbtree_postorder_for_each_entry_safe().
* Constify variables.
* Only use refcount_inc() through getter helpers.
* Change Landlock_insert_ruleset_access() to
Landlock_insert_ruleset_rule().
* Rename landlock_put_ruleset_enqueue() to landlock_put_ruleset_deferred().
* Improve kernel documentation and add a warning about the unhandled
access/syscall families.
* Move ABI check to syscall.c .

Changes since v13:
* New implementation, inspired by the previous inode eBPF map, but
agnostic to the underlying kernel object.

Previous changes:
https://lore.kernel.org/lkml/[email protected]/
---
security/landlock/Makefile | 2 +-
security/landlock/limits.h | 17 ++
security/landlock/ruleset.c | 462 ++++++++++++++++++++++++++++++++++++
security/landlock/ruleset.h | 161 +++++++++++++
4 files changed, 641 insertions(+), 1 deletion(-)
create mode 100644 security/landlock/limits.h
create mode 100644 security/landlock/ruleset.c
create mode 100644 security/landlock/ruleset.h

diff --git a/security/landlock/Makefile b/security/landlock/Makefile
index cb6deefbf4c0..d846eba445bb 100644
--- a/security/landlock/Makefile
+++ b/security/landlock/Makefile
@@ -1,3 +1,3 @@
obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o

-landlock-y := object.o
+landlock-y := object.o ruleset.o
diff --git a/security/landlock/limits.h b/security/landlock/limits.h
new file mode 100644
index 000000000000..b734f597bb0e
--- /dev/null
+++ b/security/landlock/limits.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Landlock LSM - Limits for different components
+ *
+ * Copyright © 2016-2020 Mickaël Salaün <[email protected]>
+ * Copyright © 2018-2020 ANSSI
+ */
+
+#ifndef _SECURITY_LANDLOCK_LIMITS_H
+#define _SECURITY_LANDLOCK_LIMITS_H
+
+#include <linux/limits.h>
+
+#define LANDLOCK_MAX_NUM_LAYERS 64
+#define LANDLOCK_MAX_NUM_RULES U32_MAX
+
+#endif /* _SECURITY_LANDLOCK_LIMITS_H */
diff --git a/security/landlock/ruleset.c b/security/landlock/ruleset.c
new file mode 100644
index 000000000000..bf7ff66c1b12
--- /dev/null
+++ b/security/landlock/ruleset.c
@@ -0,0 +1,462 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Landlock LSM - Ruleset management
+ *
+ * Copyright © 2016-2020 Mickaël Salaün <[email protected]>
+ * Copyright © 2018-2020 ANSSI
+ */
+
+#include <linux/bits.h>
+#include <linux/bug.h>
+#include <linux/compiler_types.h>
+#include <linux/err.h>
+#include <linux/errno.h>
+#include <linux/kernel.h>
+#include <linux/lockdep.h>
+#include <linux/overflow.h>
+#include <linux/rbtree.h>
+#include <linux/refcount.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/workqueue.h>
+
+#include "limits.h"
+#include "object.h"
+#include "ruleset.h"
+
+static struct landlock_ruleset *create_ruleset(void)
+{
+ struct landlock_ruleset *new_ruleset;
+
+ new_ruleset = kzalloc(sizeof(*new_ruleset), GFP_KERNEL_ACCOUNT);
+ if (!new_ruleset)
+ return ERR_PTR(-ENOMEM);
+ refcount_set(&new_ruleset->usage, 1);
+ mutex_init(&new_ruleset->lock);
+ new_ruleset->root = RB_ROOT;
+ /*
+ * hierarchy = NULL
+ * num_rules = 0
+ * num_layers = 0
+ * fs_access_mask = 0
+ */
+ return new_ruleset;
+}
+
+struct landlock_ruleset *landlock_create_ruleset(const u32 fs_access_mask)
+{
+ struct landlock_ruleset *new_ruleset;
+
+ /* Informs about useless ruleset. */
+ if (!fs_access_mask)
+ return ERR_PTR(-ENOMSG);
+ new_ruleset = create_ruleset();
+ if (!IS_ERR(new_ruleset))
+ new_ruleset->fs_access_mask = fs_access_mask;
+ return new_ruleset;
+}
+
+static void build_check_rule(void)
+{
+ const struct landlock_rule rule = {
+ .num_layers = ~0,
+ };
+
+ BUILD_BUG_ON(rule.num_layers < LANDLOCK_MAX_NUM_LAYERS);
+}
+
+static struct landlock_rule *create_rule(
+ struct landlock_object *const object,
+ const struct landlock_layer (*const layers)[],
+ const u32 num_layers,
+ const struct landlock_layer *const new_layer)
+{
+ struct landlock_rule *new_rule;
+ u32 new_num_layers = num_layers;
+
+ build_check_rule();
+ if (new_layer) {
+ /* Should already be checked by merge_ruleset(). */
+ if (WARN_ON_ONCE(num_layers == LANDLOCK_MAX_NUM_LAYERS))
+ return ERR_PTR(-E2BIG);
+ new_num_layers++;
+ }
+ new_rule = kzalloc(struct_size(new_rule, layers, new_num_layers),
+ GFP_KERNEL_ACCOUNT);
+ if (!new_rule)
+ return ERR_PTR(-ENOMEM);
+ RB_CLEAR_NODE(&new_rule->node);
+ landlock_get_object(object);
+ new_rule->object = object;
+ new_rule->num_layers = new_num_layers;
+ if (new_layer)
+ /* Push a copy of @new_layer on the layer stack. */
+ new_rule->layers[0] = *new_layer;
+ /* Copies the original layer stack. */
+ memcpy(&new_rule->layers[new_layer ? 1 : 0], layers,
+ flex_array_size(new_rule, layers, num_layers));
+ return new_rule;
+}
+
+static void put_rule(struct landlock_rule *const rule)
+{
+ might_sleep();
+ if (!rule)
+ return;
+ landlock_put_object(rule->object);
+ kfree(rule);
+}
+
+static void build_check_ruleset(void)
+{
+ const struct landlock_ruleset ruleset = {
+ .num_rules = ~0,
+ .num_layers = ~0,
+ };
+
+ BUILD_BUG_ON(ruleset.num_rules < LANDLOCK_MAX_NUM_RULES);
+ BUILD_BUG_ON(ruleset.num_layers < LANDLOCK_MAX_NUM_LAYERS);
+}
+
+/**
+ * insert_rule - Create and insert a rule in a ruleset
+ *
+ * @ruleset: The ruleset to be updated.
+ * @object: The object to build the new rule with. The underlying kernel
+ * object must be held by the caller.
+ * @layers: One or multiple layers to be copied into the new rule.
+ * @num_layers: The number of @layers entries.
+ *
+ * When user space requests to add a new rule to a ruleset, @layers only
+ * contains one entry and this entry is not assigned to any level. In this
+ * case, the new rule will extend @ruleset, similarly to a boolean OR between
+ * access rights.
+ *
+ * When merging a ruleset in a domain, or copying a domain, @layers will be
+ * added to @ruleset as new constraints, similarly to a boolean AND between
+ * access rights.
+ */
+static int insert_rule(struct landlock_ruleset *const ruleset,
+ struct landlock_object *const object,
+ const struct landlock_layer (*const layers)[],
+ size_t num_layers)
+{
+ struct rb_node **walker_node;
+ struct rb_node *parent_node = NULL;
+ struct landlock_rule *new_rule;
+
+ might_sleep();
+ lockdep_assert_held(&ruleset->lock);
+ if (WARN_ON_ONCE(!object || !layers))
+ return -ENOENT;
+ walker_node = &(ruleset->root.rb_node);
+ while (*walker_node) {
+ struct landlock_rule *const this = rb_entry(*walker_node,
+ struct landlock_rule, node);
+
+ if (this->object != object) {
+ parent_node = *walker_node;
+ if (this->object < object)
+ walker_node = &((*walker_node)->rb_right);
+ else
+ walker_node = &((*walker_node)->rb_left);
+ continue;
+ }
+
+ /* Only a single-level layer should match an existing rule. */
+ if (WARN_ON_ONCE(num_layers != 1))
+ return -EINVAL;
+
+ /* If there is a matching rule, updates it. */
+ if ((*layers)[0].level == 0) {
+ /*
+ * Extends access rights when the request comes from
+ * landlock_add_rule(2), i.e. @ruleset is not a domain.
+ */
+ if (WARN_ON_ONCE(this->num_layers != 1))
+ return -EINVAL;
+ if (WARN_ON_ONCE(this->layers[0].level != 0))
+ return -EINVAL;
+ this->layers[0].access |= (*layers)[0].access;
+ return 0;
+ }
+
+ if (WARN_ON_ONCE(this->layers[0].level == 0))
+ return -EINVAL;
+
+ /*
+ * Intersects access rights when it is a merge between a
+ * ruleset and a domain.
+ */
+ new_rule = create_rule(object, &this->layers, this->num_layers,
+ &(*layers)[0]);
+ if (IS_ERR(new_rule))
+ return PTR_ERR(new_rule);
+ rb_replace_node(&this->node, &new_rule->node, &ruleset->root);
+ put_rule(this);
+ return 0;
+ }
+
+ /* There is no match for @object. */
+ build_check_ruleset();
+ if (ruleset->num_rules == LANDLOCK_MAX_NUM_RULES)
+ return -E2BIG;
+ new_rule = create_rule(object, layers, num_layers, NULL);
+ if (IS_ERR(new_rule))
+ return PTR_ERR(new_rule);
+ rb_link_node(&new_rule->node, parent_node, walker_node);
+ rb_insert_color(&new_rule->node, &ruleset->root);
+ ruleset->num_rules++;
+ return 0;
+}
+
+static void build_check_layer(void)
+{
+ const struct landlock_layer layer = {
+ .level = ~0,
+ };
+
+ BUILD_BUG_ON(layer.level < LANDLOCK_MAX_NUM_LAYERS);
+}
+
+int landlock_insert_rule(struct landlock_ruleset *const ruleset,
+ struct landlock_object *const object, const u32 access)
+{
+ struct landlock_layer layers[] = {{
+ .access = access,
+ /* When @level is zero, insert_rule() extends @ruleset. */
+ .level = 0,
+ }};
+
+ build_check_layer();
+ return insert_rule(ruleset, object, &layers, ARRAY_SIZE(layers));
+}
+
+static inline void get_hierarchy(struct landlock_hierarchy *const hierarchy)
+{
+ if (hierarchy)
+ refcount_inc(&hierarchy->usage);
+}
+
+static void put_hierarchy(struct landlock_hierarchy *hierarchy)
+{
+ while (hierarchy && refcount_dec_and_test(&hierarchy->usage)) {
+ const struct landlock_hierarchy *const freeme = hierarchy;
+
+ hierarchy = hierarchy->parent;
+ kfree(freeme);
+ }
+}
+
+static int merge_ruleset(struct landlock_ruleset *const dst,
+ struct landlock_ruleset *const src)
+{
+ struct landlock_rule *walker_rule, *next_rule;
+ int err = 0;
+
+ might_sleep();
+ if (!src)
+ return 0;
+ /* Only merge into a domain. */
+ if (WARN_ON_ONCE(!dst || !dst->hierarchy))
+ return -EINVAL;
+
+ /*
+ * The ruleset being modified (@dst) is locked first, then the ruleset
+ * being copied (@src).
+ */
+ mutex_lock(&dst->lock);
+ mutex_lock_nested(&src->lock, SINGLE_DEPTH_NESTING);
+ /*
+ * Makes a new layer, but only increments the number of layers after
+ * the rules are inserted. The layer 0 is invalid, and the last layer
+ * is then LANDLOCK_MAX_NUM_LAYERS.
+ */
+ if (dst->num_layers == LANDLOCK_MAX_NUM_LAYERS) {
+ err = -E2BIG;
+ goto out_unlock;
+ }
+ dst->fs_access_mask |= src->fs_access_mask;
+
+ /* Merges the @src tree. */
+ rbtree_postorder_for_each_entry_safe(walker_rule, next_rule,
+ &src->root, node) {
+ struct landlock_layer layers[] = {{
+ .level = dst->num_layers + 1,
+ }};
+
+ if (WARN_ON_ONCE(walker_rule->num_layers != 1)) {
+ err = -EINVAL;
+ goto out_unlock;
+ }
+ if (WARN_ON_ONCE(walker_rule->layers[0].level != 0)) {
+ err = -EINVAL;
+ goto out_unlock;
+ }
+ layers[0].access = walker_rule->layers[0].access;
+ err = insert_rule(dst, walker_rule->object, &layers,
+ ARRAY_SIZE(layers));
+ if (err)
+ goto out_unlock;
+ }
+ dst->num_layers++;
+
+out_unlock:
+ mutex_unlock(&src->lock);
+ mutex_unlock(&dst->lock);
+ return err;
+}
+
+static struct landlock_ruleset *inherit_ruleset(
+ struct landlock_ruleset *const parent)
+{
+ struct landlock_rule *walker_rule, *next_rule;
+ struct landlock_ruleset *new_ruleset;
+ int err = 0;
+
+ might_sleep();
+ new_ruleset = create_ruleset();
+ if (IS_ERR(new_ruleset))
+ return new_ruleset;
+
+ new_ruleset->hierarchy = kzalloc(sizeof(*new_ruleset->hierarchy),
+ GFP_KERNEL_ACCOUNT);
+ if (!new_ruleset->hierarchy) {
+ err = -ENOMEM;
+ goto out_put_ruleset;
+ }
+ refcount_set(&new_ruleset->hierarchy->usage, 1);
+ if (!parent)
+ return new_ruleset;
+
+ mutex_lock(&new_ruleset->lock);
+ mutex_lock_nested(&parent->lock, SINGLE_DEPTH_NESTING);
+
+ /* Copies the @parent tree. */
+ rbtree_postorder_for_each_entry_safe(walker_rule, next_rule,
+ &parent->root, node) {
+ err = insert_rule(new_ruleset, walker_rule->object,
+ &walker_rule->layers, walker_rule->num_layers);
+ if (err)
+ goto out_unlock;
+ }
+ new_ruleset->num_layers = parent->num_layers;
+ new_ruleset->fs_access_mask = parent->fs_access_mask;
+ if (WARN_ON_ONCE(!parent->hierarchy)) {
+ err = -EINVAL;
+ goto out_unlock;
+ }
+ get_hierarchy(parent->hierarchy);
+ new_ruleset->hierarchy->parent = parent->hierarchy;
+
+ mutex_unlock(&parent->lock);
+ mutex_unlock(&new_ruleset->lock);
+ return new_ruleset;
+
+out_unlock:
+ mutex_unlock(&parent->lock);
+ mutex_unlock(&new_ruleset->lock);
+
+out_put_ruleset:
+ landlock_put_ruleset(new_ruleset);
+ return ERR_PTR(err);
+}
+
+static void free_ruleset(struct landlock_ruleset *const ruleset)
+{
+ struct landlock_rule *freeme, *next;
+
+ might_sleep();
+ rbtree_postorder_for_each_entry_safe(freeme, next, &ruleset->root,
+ node)
+ put_rule(freeme);
+ put_hierarchy(ruleset->hierarchy);
+ kfree(ruleset);
+}
+
+void landlock_put_ruleset(struct landlock_ruleset *const ruleset)
+{
+ might_sleep();
+ if (ruleset && refcount_dec_and_test(&ruleset->usage))
+ free_ruleset(ruleset);
+}
+
+static void free_ruleset_work(struct work_struct *const work)
+{
+ struct landlock_ruleset *ruleset;
+
+ ruleset = container_of(work, struct landlock_ruleset, work_free);
+ free_ruleset(ruleset);
+}
+
+void landlock_put_ruleset_deferred(struct landlock_ruleset *const ruleset)
+{
+ if (ruleset && refcount_dec_and_test(&ruleset->usage)) {
+ INIT_WORK(&ruleset->work_free, free_ruleset_work);
+ schedule_work(&ruleset->work_free);
+ }
+}
+
+/**
+ * landlock_merge_ruleset - Merge a ruleset with a domain
+ *
+ * @parent: Parent domain.
+ * @ruleset: New ruleset to be merged.
+ *
+ * Returns the intersection of @parent and @ruleset, or returns @parent if
+ * @ruleset is empty, or returns a duplicate of @ruleset if @parent is empty.
+ */
+struct landlock_ruleset *landlock_merge_ruleset(
+ struct landlock_ruleset *const parent,
+ struct landlock_ruleset *const ruleset)
+{
+ struct landlock_ruleset *new_dom;
+ int err;
+
+ might_sleep();
+ /*
+ * Merging duplicates a ruleset, so a new ruleset cannot be
+ * the same as the parent, but they can have similar content.
+ */
+ if (WARN_ON_ONCE(!ruleset || parent == ruleset)) {
+ landlock_get_ruleset(parent);
+ return parent;
+ }
+
+ new_dom = inherit_ruleset(parent);
+ if (IS_ERR(new_dom))
+ return new_dom;
+
+ err = merge_ruleset(new_dom, ruleset);
+ if (err) {
+ landlock_put_ruleset(new_dom);
+ return ERR_PTR(err);
+ }
+ return new_dom;
+}
+
+/*
+ * The returned access has the same lifetime as @ruleset.
+ */
+const struct landlock_rule *landlock_find_rule(
+ const struct landlock_ruleset *const ruleset,
+ const struct landlock_object *const object)
+{
+ const struct rb_node *node;
+
+ if (!object)
+ return NULL;
+ node = ruleset->root.rb_node;
+ while (node) {
+ struct landlock_rule *this = rb_entry(node,
+ struct landlock_rule, node);
+
+ if (this->object == object)
+ return this;
+ if (this->object < object)
+ node = node->rb_right;
+ else
+ node = node->rb_left;
+ }
+ return NULL;
+}
diff --git a/security/landlock/ruleset.h b/security/landlock/ruleset.h
new file mode 100644
index 000000000000..f99686cda94b
--- /dev/null
+++ b/security/landlock/ruleset.h
@@ -0,0 +1,161 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Landlock LSM - Ruleset management
+ *
+ * Copyright © 2016-2020 Mickaël Salaün <[email protected]>
+ * Copyright © 2018-2020 ANSSI
+ */
+
+#ifndef _SECURITY_LANDLOCK_RULESET_H
+#define _SECURITY_LANDLOCK_RULESET_H
+
+#include <linux/mutex.h>
+#include <linux/rbtree.h>
+#include <linux/refcount.h>
+#include <linux/workqueue.h>
+
+#include "object.h"
+
+/**
+ * struct landlock_layer - Access rights for a given layer
+ */
+struct landlock_layer {
+ /**
+ * @level: Position of this layer in the layer stack.
+ */
+ u16 level;
+ /**
+ * @access: Bitfield of allowed actions on the kernel object. They are
+ * relative to the object type (e.g. %LANDLOCK_ACCESS_FS_READ_FILE).
+ */
+ u16 access;
+};
+
+/**
+ * struct landlock_rule - Access rights tied to an object
+ */
+struct landlock_rule {
+ /**
+ * @node: Node in the ruleset's red-black tree.
+ */
+ struct rb_node node;
+ /**
+ * @object: Pointer to identify a kernel object (e.g. an inode). This
+ * is used as a key for this ruleset element. This pointer is set once
+ * and never modified. It always points to an allocated object because
+ * each rule increments the refcount of its object.
+ */
+ struct landlock_object *object;
+ /**
+ * @num_layers: Number of entries in @layers.
+ */
+ u32 num_layers;
+ /**
+ * @layers: Stack of layers, from the newest to the oldest, implemented
+ * as a flexible array member.
+ */
+ struct landlock_layer layers[];
+};
+
+/**
+ * struct landlock_hierarchy - Node in a ruleset hierarchy
+ */
+struct landlock_hierarchy {
+ /**
+ * @parent: Pointer to the parent node, or NULL if it is a root
+ * Landlock domain.
+ */
+ struct landlock_hierarchy *parent;
+ /**
+ * @usage: Number of potential children domains plus their parent
+ * domain.
+ */
+ refcount_t usage;
+};
+
+/**
+ * struct landlock_ruleset - Landlock ruleset
+ *
+ * This data structure must contain unique entries, be updatable, and quick to
+ * match an object.
+ */
+struct landlock_ruleset {
+ /**
+ * @root: Root of a red-black tree containing &struct landlock_rule
+ * nodes. Once a ruleset is tied to a process (i.e. as a domain), this
+ * tree is immutable until @usage reaches zero.
+ */
+ struct rb_root root;
+ /**
+ * @hierarchy: Enables hierarchy identification even when a parent
+ * domain vanishes. This is needed for the ptrace protection.
+ */
+ struct landlock_hierarchy *hierarchy;
+ union {
+ /**
+ * @work_free: Enables to free a ruleset within a lockless
+ * section. This is only used by
+ * landlock_put_ruleset_deferred() when @usage reaches zero.
+ * The fields @lock, @usage, @num_layers, @num_rules and
+ * @fs_access_mask are then unused.
+ */
+ struct work_struct work_free;
+ struct {
+ /**
+ * @lock: Guards against concurrent modifications of
+ * @root, if @usage is greater than zero.
+ */
+ struct mutex lock;
+ /**
+ * @usage: Number of processes (i.e. domains) or file
+ * descriptors referencing this ruleset.
+ */
+ refcount_t usage;
+ /**
+ * @num_rules: Number of non-overlapping (i.e. not for
+ * the same object) rules in this ruleset.
+ */
+ u32 num_rules;
+ /**
+ * @num_layers: Number of layers which are used in this
+ * ruleset. This enables to check that all the layers
+ * allow an access request. A value of 0 identifies a
+ * non-merged ruleset (i.e. not a domain).
+ */
+ u32 num_layers;
+ /**
+ * @fs_access_mask: Contains the subset of filesystem
+ * actions which are restricted by a ruleset. This is
+ * used when merging rulesets and for user space
+ * backward compatibility (i.e. future-proof). Set
+ * once and never changed for the lifetime of the
+ * ruleset.
+ */
+ u32 fs_access_mask;
+ };
+ };
+};
+
+struct landlock_ruleset *landlock_create_ruleset(const u32 fs_access_mask);
+
+void landlock_put_ruleset(struct landlock_ruleset *const ruleset);
+void landlock_put_ruleset_deferred(struct landlock_ruleset *const ruleset);
+
+int landlock_insert_rule(struct landlock_ruleset *const ruleset,
+ struct landlock_object *const object, const u32 access);
+
+struct landlock_ruleset *landlock_merge_ruleset(
+ struct landlock_ruleset *const parent,
+ struct landlock_ruleset *const ruleset);
+
+const struct landlock_rule *landlock_find_rule(
+ const struct landlock_ruleset *const ruleset,
+ const struct landlock_object *const object);
+
+static inline void landlock_get_ruleset(struct landlock_ruleset *const ruleset)
+{
+ if (ruleset)
+ refcount_inc(&ruleset->usage);
+}
+
+#endif /* _SECURITY_LANDLOCK_RULESET_H */
--
2.29.2

2020-12-09 23:48:41

by Mickaël Salaün

Subject: [PATCH v26 06/12] fs,security: Add sb_delete hook

From: Mickaël Salaün <[email protected]>

The sb_delete security hook is called when shutting down a superblock,
which may be useful to release kernel objects tied to the superblock's
lifetime (e.g. inodes).

This new hook is needed by Landlock to release (ephemerally) tagged
struct inodes. This comes from the unprivileged nature of Landlock
described in the next commit.
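
The calling convention of such a void hook can be sketched in user space. This is only an illustrative model under stated assumptions, not kernel code: struct model_sb, model_register_sb_delete(), model_security_sb_delete() and model_release_inodes() are hypothetical names, and a fixed array stands in for the LSM hook list that call_void_hook() walks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct super_block. */
struct model_sb {
	int tagged_inodes; /* objects a module still holds on this superblock */
};

typedef void (*model_sb_delete_fn)(struct model_sb *sb);

#define MODEL_MAX_HOOKS 4
static model_sb_delete_fn model_hooks[MODEL_MAX_HOOKS];
static size_t model_num_hooks;

/* Hypothetical registration helper; returns 0 on success, -1 otherwise. */
int model_register_sb_delete(model_sb_delete_fn fn)
{
	if (!fn || model_num_hooks >= MODEL_MAX_HOOKS)
		return -1;
	model_hooks[model_num_hooks++] = fn;
	return 0;
}

/*
 * Models security_sb_delete(): every registered hook is called in turn
 * and nothing is returned, so a hook cannot veto the shutdown.
 */
void model_security_sb_delete(struct model_sb *sb)
{
	size_t i;

	assert(sb);
	for (i = 0; i < model_num_hooks; i++)
		model_hooks[i](sb);
}

/* Example hook: drop everything the module holds on this superblock. */
void model_release_inodes(struct model_sb *sb)
{
	sb->tagged_inodes = 0;
}
```

In this model, calling model_security_sb_delete() before any hook is registered leaves the superblock untouched. Once model_release_inodes() is registered, the same call releases the tagged objects.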

Cc: Al Viro <[email protected]>
Cc: James Morris <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Serge E. Hallyn <[email protected]>
Signed-off-by: Mickaël Salaün <[email protected]>
Reviewed-by: Jann Horn <[email protected]>
---

Changes since v22:
* Add Reviewed-by: Jann Horn <[email protected]>

Changes since v17:
* Initial patch to replace the direct call to landlock_release_inodes()
(requested by James Morris).
https://lore.kernel.org/lkml/[email protected]/
---
fs/super.c | 1 +
include/linux/lsm_hook_defs.h | 1 +
include/linux/lsm_hooks.h | 2 ++
include/linux/security.h | 4 ++++
security/security.c | 5 +++++
5 files changed, 13 insertions(+)

diff --git a/fs/super.c b/fs/super.c
index 98bb0629ee10..751cad8c081f 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -454,6 +454,7 @@ void generic_shutdown_super(struct super_block *sb)
evict_inodes(sb);
/* only nonzero refcount inodes can have marks */
fsnotify_sb_delete(sb);
+ security_sb_delete(sb);

if (sb->s_dio_done_wq) {
destroy_workqueue(sb->s_dio_done_wq);
diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index 32a940117e7a..1ba9b4dfecb3 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -59,6 +59,7 @@ LSM_HOOK(int, 0, fs_context_dup, struct fs_context *fc,
LSM_HOOK(int, -ENOPARAM, fs_context_parse_param, struct fs_context *fc,
struct fs_parameter *param)
LSM_HOOK(int, 0, sb_alloc_security, struct super_block *sb)
+LSM_HOOK(void, LSM_RET_VOID, sb_delete, struct super_block *sb)
LSM_HOOK(void, LSM_RET_VOID, sb_free_security, struct super_block *sb)
LSM_HOOK(void, LSM_RET_VOID, sb_free_mnt_opts, void *mnt_opts)
LSM_HOOK(int, 0, sb_eat_lsm_opts, char *orig, void **mnt_opts)
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index ff0f03a45c56..dbfcec05a176 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -108,6 +108,8 @@
* allocated.
* @sb contains the super_block structure to be modified.
* Return 0 if operation was successful.
+ * @sb_delete:
+ * Release objects tied to a superblock (e.g. inodes).
* @sb_free_security:
* Deallocate and clear the sb->s_security field.
* @sb contains the super_block structure to be modified.
diff --git a/include/linux/security.h b/include/linux/security.h
index bc2725491560..a4603b62d444 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -287,6 +287,7 @@ void security_bprm_committed_creds(struct linux_binprm *bprm);
int security_fs_context_dup(struct fs_context *fc, struct fs_context *src_fc);
int security_fs_context_parse_param(struct fs_context *fc, struct fs_parameter *param);
int security_sb_alloc(struct super_block *sb);
+void security_sb_delete(struct super_block *sb);
void security_sb_free(struct super_block *sb);
void security_free_mnt_opts(void **mnt_opts);
int security_sb_eat_lsm_opts(char *options, void **mnt_opts);
@@ -619,6 +620,9 @@ static inline int security_sb_alloc(struct super_block *sb)
return 0;
}

+static inline void security_sb_delete(struct super_block *sb)
+{ }
+
static inline void security_sb_free(struct super_block *sb)
{ }

diff --git a/security/security.c b/security/security.c
index 4ffd6c3af9d7..4563e7a79216 100644
--- a/security/security.c
+++ b/security/security.c
@@ -899,6 +899,11 @@ int security_sb_alloc(struct super_block *sb)
return rc;
}

+void security_sb_delete(struct super_block *sb)
+{
+ call_void_hook(sb_delete, sb);
+}
+
void security_sb_free(struct super_block *sb)
{
call_void_hook(sb_free_security, sb);
--
2.29.2

2020-12-09 23:48:42

by Mickaël Salaün

Subject: [PATCH v26 07/12] landlock: Support filesystem access-control

From: Mickaël Salaün <[email protected]>

Thanks to the Landlock objects and ruleset, it is possible to identify
inodes according to a process's domain. To enable an unprivileged
process to express a file hierarchy, it first needs to open a directory
(or a file) and pass this file descriptor to the kernel through
landlock_add_rule(2). When checking if a file access request is
allowed, we walk from the requested dentry to the real root, following
the different mount layers. The accesses granted by each "tagged" inode
are collected according to their rule layer level, and ANDed to compute
the access to the requested file hierarchy. This makes it possible to
identify a lot of files without tagging every inode or modifying the
filesystem, while still following the view and understanding the user
has of the filesystem.
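
The walk described above can be modelled in user space. The following is a minimal sketch, not kernel code: struct model_node, struct model_layer and model_check_access() are hypothetical names, and a chain of parent pointers stands in for the dentry and mount walk. It mirrors the per-layer loop of check_access_path_continue() in this patch, where a bitmask tracks the layers that have not yet granted the request.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same bit positions as LANDLOCK_ACCESS_FS_{WRITE,READ}_FILE. */
enum { MODEL_WRITE = 1 << 1, MODEL_READ = 1 << 2 };

/* Hypothetical model types; none of these are kernel structures. */
struct model_layer {
	unsigned int level; /* 1-based layer level */
	uint32_t access;    /* rights granted by this layer's rule */
};

struct model_node {
	const struct model_node *parent;  /* NULL at the real root */
	const struct model_layer *layers; /* rule tied to this node, or NULL */
	size_t num_layers;
};

/* Returns 1 if every one of the num_layers layers grants access_request. */
int model_check_access(const struct model_node *node,
		unsigned int num_layers, uint32_t access_request)
{
	uint64_t layer_mask;
	size_t i;

	assert(num_layers >= 1 && num_layers <= 64);
	/* One bit per layer that still has to grant the request. */
	layer_mask = (num_layers == 64) ? UINT64_MAX :
		((UINT64_C(1) << num_layers) - 1);

	for (; node; node = node->parent) {
		for (i = 0; i < node->num_layers; i++) {
			const struct model_layer *l = &node->layers[i];
			const uint64_t bit = UINT64_C(1) << (l->level - 1);

			if (!(bit & layer_mask))
				continue; /* layer already satisfied */
			if ((l->access & access_request) != access_request)
				return 0; /* this layer denies */
			layer_mask &= ~bit;
		}
		if (layer_mask == 0)
			return 1; /* every layer granted the request */
	}
	/* Reached the root with at least one layer still unsatisfied. */
	return 0;
}
```

With a layer-1 rule on /a and a layer-2 rule on /a/b, both granting MODEL_READ, a read of /a/b is allowed in a two-layer domain, whereas a read of /a is denied because layer 2 never grants it on the way up.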

Add a new ARCH_EPHEMERAL_INODES for UML because it currently does not
keep the same struct inodes for the same inodes even while these inodes
are in use.

This commit adds a minimal set of supported filesystem access-control
rights, which does not make it possible to restrict all file-related
actions. This is the result of multiple discussions to minimize the code
of Landlock to ease review. Thanks to the Landlock design, extending
this access-control without breaking user space will not be a problem.
Moreover, seccomp filters can be used to restrict the use of syscall
families which may not be currently handled by Landlock.
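
One way to read the compatibility claim above is through the handled access mask (fs_access_mask). The sketch below is a user-space illustration under stated assumptions: the MODEL_* names and both helper functions are hypothetical, and MODEL_FS_ALL assumes the 13 rights defined by this patch (bits 0 to 12). It follows the two operations visible in this patch: landlock_append_fs_rule() ORs every unhandled right into a rule, and check_access_path() masks a request down to the handled rights.

```c
#include <assert.h>
#include <stdint.h>

/* Same bit positions as the LANDLOCK_ACCESS_FS_* flags of this patch. */
#define MODEL_FS_EXECUTE	(UINT32_C(1) << 0)
#define MODEL_FS_WRITE_FILE	(UINT32_C(1) << 1)
#define MODEL_FS_READ_FILE	(UINT32_C(1) << 2)

/* Assumption: all 13 rights known to this version, bits 0..12. */
#define MODEL_FS_ALL		((UINT32_C(1) << 13) - 1)

/*
 * Turns the rights listed in a rule into absolute ones: any right the
 * ruleset does not handle is implicitly granted.
 */
uint32_t model_absolute_rights(uint32_t handled, uint32_t rule_access)
{
	assert((rule_access & ~MODEL_FS_ALL) == 0);
	return rule_access | (MODEL_FS_ALL & ~handled);
}

/*
 * Keeps only the part of a request that the domain handles. A result of
 * 0 means the request is allowed without walking the path.
 */
uint32_t model_effective_request(uint32_t handled, uint32_t request)
{
	return request & handled;
}
```

A ruleset that only handles reads and writes therefore never denies an execute request, which is why a program written against this set of rights would keep its behavior if more rights were defined later.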

Cc: Al Viro <[email protected]>
Cc: Anton Ivanov <[email protected]>
Cc: James Morris <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Jeff Dike <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Serge E. Hallyn <[email protected]>
Signed-off-by: Mickaël Salaün <[email protected]>
---

Changes since v25:
* Move build_check_layer() to ruleset.c, and add build-time checks for
the fs_access_mask and access variables according to
_LANDLOCK_ACCESS_FS_MASK.
* Move limits to a dedicated file and rename them:
_LANDLOCK_ACCESS_FS_LAST and _LANDLOCK_ACCESS_FS_MASK.
* Set build_check_layer() as non-inline to trigger a warning if it is
not called.
* Use BITS_PER_TYPE() macro.
* Rename function to landlock_add_fs_hooks().
* Cosmetic variable renames.

Changes since v24:
* Use the new struct landlock_rule and landlock_layer to not mix
accesses from different layers. Revert "Enforce deterministic
interleaved path rules" from v24, and fix the layer check. This
provides a sane semantic: an access is granted if, for each
policy layer, at least one rule encountered on the pathwalk grants the
access, regardless of their position in the layer stack (suggested by
Jann Horn). See layout1.interleaved_masked_accesses tests from
tools/testing/selftests/landlock/fs_test.c for corner cases.
* Add build-time checks for layers.
* Use the new landlock_insert_rule() API.

Changes since v23:
* Enforce deterministic interleaved path rules. To have consistent
layered rules, granting access to a path implies that all accesses
tied to inodes, from the requested file to the real root, must be
checked. Otherwise, stacked rules may result in overzealous
restrictions. By excluding the ability to add exceptions in the same
layer (e.g. /a allowed, /a/b denied, and /a/b/c allowed), we get
deterministic interleaved path rules. This removes an optimization
which could be replaced by a proper cache mechanism. This also
further simplifies and explains check_access_path_continue().
* Fix memory allocation error handling in landlock_create_object()
calls. This prevents inadvertently holding an inode.
* In get_inode_object(), improve comments, make code more readable and
move kfree() call out of the lock window.
* Use the simplified landlock_insert_rule() API.

Changes since v22:
* Simplify check_access_path_continue() (suggested by Jann Horn).
* Remove prefetch() call for now (suggested by Jann Horn).
* Fix spelling and remove superfluous comment (spotted by Jann Horn).
* Cosmetic variable renaming.

Changes since v21:
* Rename ARCH_EPHEMERAL_STATES to ARCH_EPHEMERAL_INODES (suggested by
James Morris).
* Remove the LANDLOCK_ACCESS_FS_CHROOT right because chroot(2) (which
requires CAP_SYS_CHROOT) cannot be used to bypass Landlock (as tests
demonstrate), and because it is often used by sandboxes, it would
be counterproductive to forbid it. This also reduces the code size.
* Clean up documentation.

Changes since v19:
* Fix spelling (spotted by Randy Dunlap).

Changes since v18:
* Remove useless include.
* Fix spelling.

Changes since v17:
* Replace landlock_release_inodes() with security_sb_delete() (requested
by James Morris).
* Replace struct super_block->s_landlock_inode_refs with the LSM
infrastructure management of the superblock (requested by James
Morris).
* Fix mknod restriction with a zero mode (spotted by Vincent Dagonneau).
* Minimize executed code in path_mknod and file_open hooks when the
current task is not sandboxed.
* Remove useless checks on the file pointer and inode in
hook_file_open() .
* Constify domain pointers.
* Rename inode_landlock() to landlock_inode().
* Import include/uapi/linux/landlock.h and _LANDLOCK_ACCESS_FS_* from
the ruleset and domain management patch.
* Explain the rationale of this minimal set of access-control.
https://lore.kernel.org/lkml/[email protected]/

Changes since v16:
* Add ARCH_EPHEMERAL_STATES and enable it for UML.

Changes since v15:
* Replace layer_levels and layer_depth with a bitfield of layers: this
makes it possible to properly manage supersets and subsets of access
rights, whatever their order in the stack of layers.
Cf. https://lore.kernel.org/lkml/[email protected]/
* Allow opening pipes and similar special files through /proc/self/fd/.
* Properly handle internal filesystems such as nsfs: always allow this
kind of root because disconnected paths cannot be evaluated.
* Remove the LANDLOCK_ACCESS_FS_LINK_TO and
LANDLOCK_ACCESS_FS_RENAME_{TO,FROM}, but use the
LANDLOCK_ACCESS_FS_REMOVE_{FILE,DIR} and LANDLOCK_ACCESS_FS_MAKE_*
instead. Indeed, it is not possible for now (and not really useful)
to express the semantic of a source and a destination.
* Check access rights to remove a directory or a file with rename(2).
* Forbid reparenting when linking or renaming. This is needed to easily
protect against possible privilege escalation by changing the place of
a file or directory in relation to an enforced access policy (from the
set of layers). This will be relaxed in the future.
* Update hooks to take into account replacement of the object's self and
beneath access bitfields with one. Simplify the code.
* Check file related access rights.
* Check d_is_negative() instead of !d_backing_inode() in
check_access_path_continue(), and continue the path walk while there
is no mapped inode e.g., with rename(2).
* Check private inode in check_access_path().
* Optimize get_file_access() when dealing with a directory.
* Add missing atomic.h .

Changes since v14:
* Simplify the object, rule and ruleset management at the expense of a
less aggressive memory freeing (contributed by Jann Horn, with
additional modifications):
- Rewrite release_inode() to use inode->sb->s_landlock_inode_refs.
- Remove useless checks in landlock_release_inodes(), clean object
pointer according to the new struct landlock_object and wait for all
iput() to complete.
- Rewrite get_inode_object() according to the new struct
landlock_object. If there is a race-condition when cleaning up an
object, we retry until the concurrent thread finished the object
cleaning.
Cf. https://lore.kernel.org/lkml/CAG48ez21bEn0wL1bbmTiiu8j9jP5iEWtHOwz4tURUJ+ki0ydYw@mail.gmail.com/
* Fix nested domains by implementing a notion of layer level and depth:
- Check for matching level ranges when walking through a file path.
- Only allow access if every layer granted the access request.
* Handles files without mount points (e.g. pipes).
* Hardens path walk by checking inode pointer values.
* Prefetches d_parent when walking to the root directory.
* Remove useless inode_alloc_security hook() (suggested by Jann Horn):
already initialized by lsm_inode_alloc().
* Remove the inode_free_security hook.
* Remove access checks that may be required for FD-only requests:
truncate, getattr, lock, chmod, chown, chgrp, ioctl. This will be
handled in a future evolution of Landlock, but right now the goal is to
lighten the code to ease review.
* Constify variables.
* Move ABI checks into syscall.c .
* Cosmetic variable renames.

Changes since v11:
* Add back, revamp and make a fully working filesystem access-control
based on paths and inodes.
* Remove the eBPF dependency.

Previous changes:
https://lore.kernel.org/lkml/[email protected]/
---
MAINTAINERS | 1 +
arch/Kconfig | 7 +
arch/um/Kconfig | 1 +
include/uapi/linux/landlock.h | 75 ++++
security/landlock/Kconfig | 2 +-
security/landlock/Makefile | 2 +-
security/landlock/fs.c | 622 ++++++++++++++++++++++++++++++++++
security/landlock/fs.h | 56 +++
security/landlock/limits.h | 4 +
security/landlock/ruleset.c | 4 +
security/landlock/setup.c | 7 +
security/landlock/setup.h | 2 +
12 files changed, 781 insertions(+), 2 deletions(-)
create mode 100644 include/uapi/linux/landlock.h
create mode 100644 security/landlock/fs.c
create mode 100644 security/landlock/fs.h

diff --git a/MAINTAINERS b/MAINTAINERS
index dc718573317e..8656d3b9dd0e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9833,6 +9833,7 @@ L: [email protected]
S: Supported
W: https://landlock.io
T: git https://github.com/landlock-lsm/linux.git
+F: include/uapi/linux/landlock.h
F: security/landlock/
K: landlock
K: LANDLOCK
diff --git a/arch/Kconfig b/arch/Kconfig
index ba4e966484ab..e1f8180521fb 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -884,6 +884,13 @@ config COMPAT_32BIT_TIME
config ARCH_NO_PREEMPT
bool

+config ARCH_EPHEMERAL_INODES
+ def_bool n
+ help
+ An arch should select this symbol if it doesn't keep track of inode
+ instances on its own, but instead relies on something else (e.g. the host
+ kernel for a UML kernel).
+
config ARCH_SUPPORTS_RT
bool

diff --git a/arch/um/Kconfig b/arch/um/Kconfig
index 4b799fad8b48..082d0207a7be 100644
--- a/arch/um/Kconfig
+++ b/arch/um/Kconfig
@@ -5,6 +5,7 @@ menu "UML-specific options"
config UML
bool
default y
+ select ARCH_EPHEMERAL_INODES
select ARCH_HAS_KCOV
select ARCH_NO_PREEMPT
select HAVE_ARCH_AUDITSYSCALL
diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
new file mode 100644
index 000000000000..d547bd49fe38
--- /dev/null
+++ b/include/uapi/linux/landlock.h
@@ -0,0 +1,75 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Landlock - User space API
+ *
+ * Copyright © 2017-2020 Mickaël Salaün <[email protected]>
+ * Copyright © 2018-2020 ANSSI
+ */
+
+#ifndef _UAPI__LINUX_LANDLOCK_H__
+#define _UAPI__LINUX_LANDLOCK_H__
+
+/**
+ * DOC: fs_access
+ *
+ * A set of actions on kernel objects may be defined by an attribute (e.g.
+ * &struct landlock_path_beneath_attr) including a bitmask of access.
+ *
+ * Filesystem flags
+ * ~~~~~~~~~~~~~~~~
+ *
+ * These flags make it possible to restrict a sandboxed process to a set of
+ * actions on files and directories. Files or directories opened before the
+ * sandboxing are not subject to these restrictions.
+ *
+ * A file can only receive these access rights:
+ *
+ * - %LANDLOCK_ACCESS_FS_EXECUTE: Execute a file.
+ * - %LANDLOCK_ACCESS_FS_WRITE_FILE: Open a file with write access.
+ * - %LANDLOCK_ACCESS_FS_READ_FILE: Open a file with read access.
+ *
+ * A directory can receive access rights related to files or directories. The
+ * following access right is applied to the directory itself, and the
+ * directories beneath it:
+ *
+ * - %LANDLOCK_ACCESS_FS_READ_DIR: Open a directory or list its content.
+ *
+ * However, the following access rights only apply to the content of a
+ * directory, not the directory itself:
+ *
+ * - %LANDLOCK_ACCESS_FS_REMOVE_DIR: Remove an empty directory or rename one.
+ * - %LANDLOCK_ACCESS_FS_REMOVE_FILE: Unlink (or rename) a file.
+ * - %LANDLOCK_ACCESS_FS_MAKE_CHAR: Create (or rename or link) a character
+ * device.
+ * - %LANDLOCK_ACCESS_FS_MAKE_DIR: Create (or rename) a directory.
+ * - %LANDLOCK_ACCESS_FS_MAKE_REG: Create (or rename or link) a regular file.
+ * - %LANDLOCK_ACCESS_FS_MAKE_SOCK: Create (or rename or link) a UNIX domain
+ * socket.
+ * - %LANDLOCK_ACCESS_FS_MAKE_FIFO: Create (or rename or link) a named pipe.
+ * - %LANDLOCK_ACCESS_FS_MAKE_BLOCK: Create (or rename or link) a block device.
+ * - %LANDLOCK_ACCESS_FS_MAKE_SYM: Create (or rename or link) a symbolic link.
+ *
+ * .. warning::
+ *
+ * It is currently not possible to restrict some file-related actions
+ * accessible through these syscall families: :manpage:`chdir(2)`,
+ * :manpage:`truncate(2)`, :manpage:`stat(2)`, :manpage:`flock(2)`,
+ * :manpage:`chmod(2)`, :manpage:`chown(2)`, :manpage:`setxattr(2)`,
+ * :manpage:`ioctl(2)`, :manpage:`fcntl(2)`.
+ * Future Landlock evolutions will make it possible to restrict them.
+ */
+#define LANDLOCK_ACCESS_FS_EXECUTE (1ULL << 0)
+#define LANDLOCK_ACCESS_FS_WRITE_FILE (1ULL << 1)
+#define LANDLOCK_ACCESS_FS_READ_FILE (1ULL << 2)
+#define LANDLOCK_ACCESS_FS_READ_DIR (1ULL << 3)
+#define LANDLOCK_ACCESS_FS_REMOVE_DIR (1ULL << 4)
+#define LANDLOCK_ACCESS_FS_REMOVE_FILE (1ULL << 5)
+#define LANDLOCK_ACCESS_FS_MAKE_CHAR (1ULL << 6)
+#define LANDLOCK_ACCESS_FS_MAKE_DIR (1ULL << 7)
+#define LANDLOCK_ACCESS_FS_MAKE_REG (1ULL << 8)
+#define LANDLOCK_ACCESS_FS_MAKE_SOCK (1ULL << 9)
+#define LANDLOCK_ACCESS_FS_MAKE_FIFO (1ULL << 10)
+#define LANDLOCK_ACCESS_FS_MAKE_BLOCK (1ULL << 11)
+#define LANDLOCK_ACCESS_FS_MAKE_SYM (1ULL << 12)
+
+#endif /* _UAPI__LINUX_LANDLOCK_H__ */
diff --git a/security/landlock/Kconfig b/security/landlock/Kconfig
index ea58e6208afa..43e5b0bb0706 100644
--- a/security/landlock/Kconfig
+++ b/security/landlock/Kconfig
@@ -2,7 +2,7 @@

config SECURITY_LANDLOCK
bool "Landlock support"
- depends on SECURITY
+ depends on SECURITY && !ARCH_EPHEMERAL_INODES
select SECURITY_PATH
help
Landlock is a safe sandboxing mechanism which enables processes to
diff --git a/security/landlock/Makefile b/security/landlock/Makefile
index f1d1eb72fa76..92e3d80ab8ed 100644
--- a/security/landlock/Makefile
+++ b/security/landlock/Makefile
@@ -1,4 +1,4 @@
obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o

landlock-y := setup.o object.o ruleset.o \
- cred.o ptrace.o
+ cred.o ptrace.o fs.o
diff --git a/security/landlock/fs.c b/security/landlock/fs.c
new file mode 100644
index 000000000000..cd80b1973bb5
--- /dev/null
+++ b/security/landlock/fs.c
@@ -0,0 +1,622 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Landlock LSM - Filesystem management and hooks
+ *
+ * Copyright © 2016-2020 Mickaël Salaün <[email protected]>
+ * Copyright © 2018-2020 ANSSI
+ */
+
+#include <linux/atomic.h>
+#include <linux/bitops.h>
+#include <linux/bits.h>
+#include <linux/compiler_types.h>
+#include <linux/dcache.h>
+#include <linux/err.h>
+#include <linux/fs.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/limits.h>
+#include <linux/list.h>
+#include <linux/lsm_hooks.h>
+#include <linux/mount.h>
+#include <linux/namei.h>
+#include <linux/path.h>
+#include <linux/rcupdate.h>
+#include <linux/spinlock.h>
+#include <linux/stat.h>
+#include <linux/types.h>
+#include <linux/wait_bit.h>
+#include <linux/workqueue.h>
+#include <uapi/linux/landlock.h>
+
+#include "common.h"
+#include "cred.h"
+#include "fs.h"
+#include "limits.h"
+#include "object.h"
+#include "ruleset.h"
+#include "setup.h"
+
+/* Underlying object management */
+
+static void release_inode(struct landlock_object *const object)
+ __releases(object->lock)
+{
+ struct inode *const inode = object->underobj;
+ struct super_block *sb;
+
+ if (!inode) {
+ spin_unlock(&object->lock);
+ return;
+ }
+
+ spin_lock(&inode->i_lock);
+ /*
+ * Make sure that if the filesystem is concurrently unmounted,
+ * hook_sb_delete() will wait for us to finish iput().
+ */
+ sb = inode->i_sb;
+ atomic_long_inc(&landlock_superblock(sb)->inode_refs);
+ rcu_assign_pointer(landlock_inode(inode)->object, NULL);
+ spin_unlock(&inode->i_lock);
+ spin_unlock(&object->lock);
+ /*
+ * Now, new rules can safely be tied to @inode.
+ */
+
+ iput(inode);
+ if (atomic_long_dec_and_test(&landlock_superblock(sb)->inode_refs))
+ wake_up_var(&landlock_superblock(sb)->inode_refs);
+}
+
+static const struct landlock_object_underops landlock_fs_underops = {
+ .release = release_inode
+};
+
+/* Ruleset management */
+
+static struct landlock_object *get_inode_object(struct inode *const inode)
+{
+ struct landlock_object *object, *new_object;
+ struct landlock_inode_security *inode_sec = landlock_inode(inode);
+
+ rcu_read_lock();
+retry:
+ object = rcu_dereference(inode_sec->object);
+ if (object) {
+ if (likely(refcount_inc_not_zero(&object->usage))) {
+ rcu_read_unlock();
+ return object;
+ }
+ /*
+ * We are racing with release_inode(), the object is going
+ * away. Wait for release_inode(), then retry.
+ */
+ spin_lock(&object->lock);
+ spin_unlock(&object->lock);
+ goto retry;
+ }
+ rcu_read_unlock();
+
+ /*
+ * If there is no object tied to @inode, then create a new one (without
+ * holding any locks).
+ */
+ new_object = landlock_create_object(&landlock_fs_underops, inode);
+ if (IS_ERR(new_object))
+ return new_object;
+
+ spin_lock(&inode->i_lock);
+ object = rcu_dereference_protected(inode_sec->object,
+ lockdep_is_held(&inode->i_lock));
+ if (unlikely(object)) {
+ /* Someone else just created the object, bail out and retry. */
+ spin_unlock(&inode->i_lock);
+ kfree(new_object);
+
+ rcu_read_lock();
+ goto retry;
+ }
+
+ rcu_assign_pointer(inode_sec->object, new_object);
+ /*
+ * @inode will be released by hook_sb_delete() on its superblock
+ * shutdown.
+ */
+ ihold(inode);
+ spin_unlock(&inode->i_lock);
+ return new_object;
+}
+
+/* All access rights which can be tied to files. */
+#define ACCESS_FILE ( \
+ LANDLOCK_ACCESS_FS_EXECUTE | \
+ LANDLOCK_ACCESS_FS_WRITE_FILE | \
+ LANDLOCK_ACCESS_FS_READ_FILE)
+
+/*
+ * @path: Should have been checked by get_path_from_fd().
+ */
+int landlock_append_fs_rule(struct landlock_ruleset *const ruleset,
+ const struct path *const path, u32 access_rights)
+{
+ int err;
+ struct landlock_object *object;
+
+ /* Files only get access rights that make sense. */
+ if (!d_is_dir(path->dentry) && (access_rights | ACCESS_FILE) !=
+ ACCESS_FILE)
+ return -EINVAL;
+
+ /* Transforms relative access rights to absolute ones. */
+ access_rights |= LANDLOCK_MASK_ACCESS_FS & ~ruleset->fs_access_mask;
+ object = get_inode_object(d_backing_inode(path->dentry));
+ if (IS_ERR(object))
+ return PTR_ERR(object);
+ mutex_lock(&ruleset->lock);
+ err = landlock_insert_rule(ruleset, object, access_rights);
+ mutex_unlock(&ruleset->lock);
+ /*
+ * No need to check for an error because landlock_insert_rule()
+ * increments the refcount for the new object if needed.
+ */
+ landlock_put_object(object);
+ return err;
+}
+
+/* Access-control management */
+
+static bool check_access_path_continue(
+ const struct landlock_ruleset *const domain,
+ const struct path *const path, const u32 access_request,
+ u64 *const layer_mask)
+{
+ const struct landlock_rule *rule;
+ const struct inode *inode;
+ size_t i;
+
+ if (d_is_negative(path->dentry))
+ /* Continues to walk while there is no mapped inode. */
+ return true;
+ inode = d_backing_inode(path->dentry);
+ rcu_read_lock();
+ rule = landlock_find_rule(domain,
+ rcu_dereference(landlock_inode(inode)->object));
+ rcu_read_unlock();
+ if (!rule)
+ return true;
+
+ /*
+ * An access is granted if, for each policy layer, at least one rule
+ * encountered on the pathwalk grants the access, regardless of their
+ * position in the layer stack. We must then check not-yet-seen layers
+ * for each inode, from the last one added to the first one.
+ */
+ for (i = 0; i < rule->num_layers; i++) {
+ const struct landlock_layer *const layer = &rule->layers[i];
+ const u64 layer_level = BIT_ULL(layer->level - 1);
+
+ if (!(layer_level & *layer_mask))
+ continue;
+ if ((layer->access & access_request) != access_request)
+ return false;
+ *layer_mask &= ~layer_level;
+ }
+ return true;
+}
+
+static int check_access_path(const struct landlock_ruleset *const domain,
+ const struct path *const path, u32 access_request)
+{
+ bool allowed = false;
+ struct path walker_path;
+ u64 layer_mask;
+
+ /* Make sure all layers can be checked. */
+ BUILD_BUG_ON(BITS_PER_TYPE(layer_mask) < LANDLOCK_MAX_NUM_LAYERS);
+
+ if (WARN_ON_ONCE(!domain || !path))
+ return 0;
+ /*
+ * Allows access to pseudo filesystems that will never be mountable
+ * (e.g. sockfs, pipefs), but can still be reachable through
+ * /proc/self/fd .
+ */
+ if ((path->dentry->d_sb->s_flags & SB_NOUSER) ||
+ (d_is_positive(path->dentry) &&
+ unlikely(IS_PRIVATE(d_backing_inode(path->dentry)))))
+ return 0;
+ if (WARN_ON_ONCE(domain->num_layers < 1))
+ return -EACCES;
+
+ layer_mask = GENMASK_ULL(domain->num_layers - 1, 0);
+ /*
+ * An access request which is not handled by the domain should be
+ * allowed.
+ */
+ access_request &= domain->fs_access_mask;
+ if (access_request == 0)
+ return 0;
+ walker_path = *path;
+ path_get(&walker_path);
+ /*
+ * We need to walk through all the hierarchy to not miss any relevant
+ * restriction.
+ */
+ while (check_access_path_continue(domain, &walker_path, access_request,
+ &layer_mask)) {
+ struct dentry *parent_dentry;
+
+ /* Stops when a rule from each layer granted access. */
+ if (layer_mask == 0) {
+ allowed = true;
+ break;
+ }
+
+jump_up:
+ /*
+ * Does not work with orphaned/private mounts like overlayfs
+ * layers for now (cf. ovl_path_real() and ovl_path_open()).
+ */
+ if (walker_path.dentry == walker_path.mnt->mnt_root) {
+ if (follow_up(&walker_path)) {
+ /* Ignores hidden mount points. */
+ goto jump_up;
+ } else {
+ /*
+ * Stops at the real root. Denies access
+ * because not all layers have granted access.
+ */
+ allowed = false;
+ break;
+ }
+ }
+ if (unlikely(IS_ROOT(walker_path.dentry))) {
+ /*
+ * Stops at disconnected root directories. Only allows
+ * access to internal filesystems (e.g. nsfs which is
+ * reachable through /proc/self/ns).
+ */
+ allowed = !!(walker_path.mnt->mnt_flags & MNT_INTERNAL);
+ break;
+ }
+ parent_dentry = dget_parent(walker_path.dentry);
+ dput(walker_path.dentry);
+ walker_path.dentry = parent_dentry;
+ }
+ path_put(&walker_path);
+ return allowed ? 0 : -EACCES;
+}
+
+static inline int current_check_access_path(const struct path *const path,
+ const u32 access_request)
+{
+ const struct landlock_ruleset *const dom =
+ landlock_get_current_domain();
+
+ if (!dom)
+ return 0;
+ return check_access_path(dom, path, access_request);
+}
+
+/* Super-block hooks */
+
+/*
+ * Release the inodes used in a security policy.
+ *
+ * Cf. fsnotify_unmount_inodes()
+ */
+static void hook_sb_delete(struct super_block *const sb)
+{
+ struct inode *inode, *iput_inode = NULL;
+
+ if (!landlock_initialized)
+ return;
+
+ spin_lock(&sb->s_inode_list_lock);
+ list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+ struct landlock_inode_security *inode_sec =
+ landlock_inode(inode);
+ struct landlock_object *object;
+ bool do_put = false;
+
+ rcu_read_lock();
+ object = rcu_dereference(inode_sec->object);
+ if (!object) {
+ rcu_read_unlock();
+ continue;
+ }
+
+ spin_lock(&object->lock);
+ if (object->underobj) {
+ object->underobj = NULL;
+ do_put = true;
+ spin_lock(&inode->i_lock);
+ rcu_assign_pointer(inode_sec->object, NULL);
+ spin_unlock(&inode->i_lock);
+ }
+ spin_unlock(&object->lock);
+ rcu_read_unlock();
+ if (!do_put)
+ /*
+ * A concurrent iput() in release_inode() is ongoing
+ * and we will just wait for it to finish.
+ */
+ continue;
+
+ /*
+ * At this point, we own the ihold() reference that was
+ * originally set up by get_inode_object(). Therefore we can
+ * drop the list lock and know that the inode won't disappear
+ * from under us until the next loop walk.
+ */
+ spin_unlock(&sb->s_inode_list_lock);
+ /*
+ * We can now actually put the previous inode, which is not
+ * needed anymore for the loop walk.
+ */
+ if (iput_inode)
+ iput(iput_inode);
+ iput_inode = inode;
+ spin_lock(&sb->s_inode_list_lock);
+ }
+ spin_unlock(&sb->s_inode_list_lock);
+ if (iput_inode)
+ iput(iput_inode);
+
+ /*
+ * Wait for pending iput() in release_inode().
+ */
+ wait_var_event(&landlock_superblock(sb)->inode_refs, !atomic_long_read(
+ &landlock_superblock(sb)->inode_refs));
+}
+
+/*
+ * Because a Landlock security policy is defined according to the filesystem
+ * layout (i.e. the mount namespace), changing it may grant access to files not
+ * previously allowed.
+ *
+ * To make it simple, deny any filesystem layout modification by landlocked
+ * processes. Non-landlocked processes may still change the namespace of a
+ * landlocked process, but this kind of threat must be handled by a system-wide
+ * access-control security policy.
+ *
+ * This could be lifted in the future if Landlock can safely handle mount
+ * namespace updates requested by a landlocked process. Indeed, we could
+ * update the current domain (which is currently read-only) by taking into
+ * account the accesses of the source and the destination of a new mount point.
+ * However, it would also require making all the child domains dynamically
+ * inherit these new constraints. Anyway, for backward compatibility reasons,
+ * a dedicated user space option would be required (e.g. as a ruleset command
+ * option).
+ */
+static int hook_sb_mount(const char *const dev_name,
+ const struct path *const path, const char *const type,
+ const unsigned long flags, void *const data)
+{
+ if (!landlock_get_current_domain())
+ return 0;
+ return -EPERM;
+}
+
+static int hook_move_mount(const struct path *const from_path,
+ const struct path *const to_path)
+{
+ if (!landlock_get_current_domain())
+ return 0;
+ return -EPERM;
+}
+
+/*
+ * Removing a mount point may reveal a previously hidden file hierarchy, which
+ * may then grant access to files, which may have previously been forbidden.
+ */
+static int hook_sb_umount(struct vfsmount *const mnt, const int flags)
+{
+ if (!landlock_get_current_domain())
+ return 0;
+ return -EPERM;
+}
+
+static int hook_sb_remount(struct super_block *const sb, void *const mnt_opts)
+{
+ if (!landlock_get_current_domain())
+ return 0;
+ return -EPERM;
+}
+
+/*
+ * pivot_root(2), like mount(2), changes the current mount namespace. It must
+ * then be forbidden for a landlocked process.
+ *
+ * However, chroot(2) may be allowed because it only changes the relative root
+ * directory of the current process. Moreover, it can be used to restrict the
+ * view of the filesystem.
+ */
+static int hook_sb_pivotroot(const struct path *const old_path,
+ const struct path *const new_path)
+{
+ if (!landlock_get_current_domain())
+ return 0;
+ return -EPERM;
+}
+
+/* Path hooks */
+
+static inline u32 get_mode_access(const umode_t mode)
+{
+ switch (mode & S_IFMT) {
+ case S_IFLNK:
+ return LANDLOCK_ACCESS_FS_MAKE_SYM;
+ case 0:
+ /* A zero mode translates to S_IFREG. */
+ case S_IFREG:
+ return LANDLOCK_ACCESS_FS_MAKE_REG;
+ case S_IFDIR:
+ return LANDLOCK_ACCESS_FS_MAKE_DIR;
+ case S_IFCHR:
+ return LANDLOCK_ACCESS_FS_MAKE_CHAR;
+ case S_IFBLK:
+ return LANDLOCK_ACCESS_FS_MAKE_BLOCK;
+ case S_IFIFO:
+ return LANDLOCK_ACCESS_FS_MAKE_FIFO;
+ case S_IFSOCK:
+ return LANDLOCK_ACCESS_FS_MAKE_SOCK;
+ default:
+ WARN_ON_ONCE(1);
+ return 0;
+ }
+}
+
+/*
+ * Creating multiple links or renaming may lead to privilege escalations if not
+ * handled properly. Indeed, we must be sure that the source doesn't gain more
+ * privileges by being accessible from the destination. This is getting more
+ * complex when dealing with multiple layers. The whole picture can be seen as
+ * a multilayer partial ordering problem. A future version of Landlock will
+ * deal with that.
+ */
+static int hook_path_link(struct dentry *const old_dentry,
+ const struct path *const new_dir,
+ struct dentry *const new_dentry)
+{
+ const struct landlock_ruleset *const dom =
+ landlock_get_current_domain();
+
+ if (!dom)
+ return 0;
+ /* The mount points are the same for old and new paths, cf. EXDEV. */
+ if (old_dentry->d_parent != new_dir->dentry)
+ /* For now, forbid reparenting. */
+ return -EACCES;
+ if (unlikely(d_is_negative(old_dentry)))
+ return -EACCES;
+ return check_access_path(dom, new_dir,
+ get_mode_access(d_backing_inode(old_dentry)->i_mode));
+}
+
+static inline u32 maybe_remove(const struct dentry *const dentry)
+{
+ if (d_is_negative(dentry))
+ return 0;
+ return d_is_dir(dentry) ? LANDLOCK_ACCESS_FS_REMOVE_DIR :
+ LANDLOCK_ACCESS_FS_REMOVE_FILE;
+}
+
+static int hook_path_rename(const struct path *const old_dir,
+ struct dentry *const old_dentry,
+ const struct path *const new_dir,
+ struct dentry *const new_dentry)
+{
+ const struct landlock_ruleset *const dom =
+ landlock_get_current_domain();
+
+ if (!dom)
+ return 0;
+ /* The mount points are the same for old and new paths, cf. EXDEV. */
+ if (old_dir->dentry != new_dir->dentry)
+ /* For now, forbid reparenting. */
+ return -EACCES;
+ if (WARN_ON_ONCE(d_is_negative(old_dentry)))
+ return -EACCES;
+ /* RENAME_EXCHANGE is handled because directories are the same. */
+ return check_access_path(dom, old_dir, maybe_remove(old_dentry) |
+ maybe_remove(new_dentry) |
+ get_mode_access(d_backing_inode(old_dentry)->i_mode));
+}
+
+static int hook_path_mkdir(const struct path *const dir,
+ struct dentry *const dentry, const umode_t mode)
+{
+ return current_check_access_path(dir, LANDLOCK_ACCESS_FS_MAKE_DIR);
+}
+
+static int hook_path_mknod(const struct path *const dir,
+ struct dentry *const dentry, const umode_t mode,
+ const unsigned int dev)
+{
+ const struct landlock_ruleset *const dom =
+ landlock_get_current_domain();
+
+ if (!dom)
+ return 0;
+ return check_access_path(dom, dir, get_mode_access(mode));
+}
+
+static int hook_path_symlink(const struct path *const dir,
+ struct dentry *const dentry, const char *const old_name)
+{
+ return current_check_access_path(dir, LANDLOCK_ACCESS_FS_MAKE_SYM);
+}
+
+static int hook_path_unlink(const struct path *const dir,
+ struct dentry *const dentry)
+{
+ return current_check_access_path(dir, LANDLOCK_ACCESS_FS_REMOVE_FILE);
+}
+
+static int hook_path_rmdir(const struct path *const dir,
+ struct dentry *const dentry)
+{
+ return current_check_access_path(dir, LANDLOCK_ACCESS_FS_REMOVE_DIR);
+}
+
+/* File hooks */
+
+static inline u32 get_file_access(const struct file *const file)
+{
+ u32 access = 0;
+
+ if (file->f_mode & FMODE_READ) {
+ /* A directory can only be opened in read mode. */
+ if (S_ISDIR(file_inode(file)->i_mode))
+ return LANDLOCK_ACCESS_FS_READ_DIR;
+ access = LANDLOCK_ACCESS_FS_READ_FILE;
+ }
+ if (file->f_mode & FMODE_WRITE)
+ access |= LANDLOCK_ACCESS_FS_WRITE_FILE;
+ /* __FMODE_EXEC is indeed part of f_flags, not f_mode. */
+ if (file->f_flags & __FMODE_EXEC)
+ access |= LANDLOCK_ACCESS_FS_EXECUTE;
+ return access;
+}
+
+static int hook_file_open(struct file *const file)
+{
+ const struct landlock_ruleset *const dom =
+ landlock_get_current_domain();
+
+ if (!dom)
+ return 0;
+ /*
+ * Because a file may be opened with O_PATH, get_file_access() may
+ * return 0. This case will be handled with a future Landlock
+ * evolution.
+ */
+ return current_check_access_path(&file->f_path, get_file_access(file));
+}
+
+static struct security_hook_list landlock_hooks[] __lsm_ro_after_init = {
+ LSM_HOOK_INIT(sb_delete, hook_sb_delete),
+ LSM_HOOK_INIT(sb_mount, hook_sb_mount),
+ LSM_HOOK_INIT(move_mount, hook_move_mount),
+ LSM_HOOK_INIT(sb_umount, hook_sb_umount),
+ LSM_HOOK_INIT(sb_remount, hook_sb_remount),
+ LSM_HOOK_INIT(sb_pivotroot, hook_sb_pivotroot),
+
+ LSM_HOOK_INIT(path_link, hook_path_link),
+ LSM_HOOK_INIT(path_rename, hook_path_rename),
+ LSM_HOOK_INIT(path_mkdir, hook_path_mkdir),
+ LSM_HOOK_INIT(path_mknod, hook_path_mknod),
+ LSM_HOOK_INIT(path_symlink, hook_path_symlink),
+ LSM_HOOK_INIT(path_unlink, hook_path_unlink),
+ LSM_HOOK_INIT(path_rmdir, hook_path_rmdir),
+
+ LSM_HOOK_INIT(file_open, hook_file_open),
+};
+
+__init void landlock_add_fs_hooks(void)
+{
+ security_add_hooks(landlock_hooks, ARRAY_SIZE(landlock_hooks),
+ LANDLOCK_NAME);
+}
diff --git a/security/landlock/fs.h b/security/landlock/fs.h
new file mode 100644
index 000000000000..9f14ec4d8d48
--- /dev/null
+++ b/security/landlock/fs.h
@@ -0,0 +1,56 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Landlock LSM - Filesystem management and hooks
+ *
+ * Copyright © 2017-2020 Mickaël Salaün <[email protected]>
+ * Copyright © 2018-2020 ANSSI
+ */
+
+#ifndef _SECURITY_LANDLOCK_FS_H
+#define _SECURITY_LANDLOCK_FS_H
+
+#include <linux/fs.h>
+#include <linux/init.h>
+#include <linux/rcupdate.h>
+
+#include "ruleset.h"
+#include "setup.h"
+
+struct landlock_inode_security {
+ /*
+ * @object: Weak pointer to an allocated object. All writes (i.e.
+ * creating a new object or removing one) are protected by the
+ * underlying inode->i_lock. Disassociating @object from the inode is
+ * additionally protected by @object->lock, from the time @object's
+ * usage refcount drops to zero to the time this pointer is nulled out.
+ * Cf. release_inode().
+ */
+ struct landlock_object __rcu *object;
+};
+
+struct landlock_superblock_security {
+ /*
+ * @inode_refs: References to Landlock underlying objects.
+ * Cf. struct super_block->s_fsnotify_inode_refs .
+ */
+ atomic_long_t inode_refs;
+};
+
+static inline struct landlock_inode_security *landlock_inode(
+ const struct inode *const inode)
+{
+ return inode->i_security + landlock_blob_sizes.lbs_inode;
+}
+
+static inline struct landlock_superblock_security *landlock_superblock(
+ const struct super_block *const superblock)
+{
+ return superblock->s_security + landlock_blob_sizes.lbs_superblock;
+}
+
+__init void landlock_add_fs_hooks(void);
+
+int landlock_append_fs_rule(struct landlock_ruleset *const ruleset,
+ const struct path *const path, u32 access_hierarchy);
+
+#endif /* _SECURITY_LANDLOCK_FS_H */
diff --git a/security/landlock/limits.h b/security/landlock/limits.h
index b734f597bb0e..2a0a1095ee27 100644
--- a/security/landlock/limits.h
+++ b/security/landlock/limits.h
@@ -10,8 +10,12 @@
#define _SECURITY_LANDLOCK_LIMITS_H

#include <linux/limits.h>
+#include <uapi/linux/landlock.h>

#define LANDLOCK_MAX_NUM_LAYERS 64
#define LANDLOCK_MAX_NUM_RULES U32_MAX

+#define LANDLOCK_LAST_ACCESS_FS LANDLOCK_ACCESS_FS_MAKE_SYM
+#define LANDLOCK_MASK_ACCESS_FS ((LANDLOCK_LAST_ACCESS_FS << 1) - 1)
+
#endif /* _SECURITY_LANDLOCK_LIMITS_H */
diff --git a/security/landlock/ruleset.c b/security/landlock/ruleset.c
index bf7ff66c1b12..548636a68b48 100644
--- a/security/landlock/ruleset.c
+++ b/security/landlock/ruleset.c
@@ -112,10 +112,12 @@ static void build_check_ruleset(void)
const struct landlock_ruleset ruleset = {
.num_rules = ~0,
.num_layers = ~0,
+ .fs_access_mask = ~0,
};

BUILD_BUG_ON(ruleset.num_rules < LANDLOCK_MAX_NUM_RULES);
BUILD_BUG_ON(ruleset.num_layers < LANDLOCK_MAX_NUM_LAYERS);
+ BUILD_BUG_ON(ruleset.fs_access_mask < LANDLOCK_MASK_ACCESS_FS);
}

/**
@@ -214,9 +216,11 @@ static void build_check_layer(void)
{
const struct landlock_layer layer = {
.level = ~0,
+ .access = ~0,
};

BUILD_BUG_ON(layer.level < LANDLOCK_MAX_NUM_LAYERS);
+ BUILD_BUG_ON(layer.access < LANDLOCK_MASK_ACCESS_FS);
}

int landlock_insert_rule(struct landlock_ruleset *const ruleset,
diff --git a/security/landlock/setup.c b/security/landlock/setup.c
index a5d6ef334991..f8e8e980454c 100644
--- a/security/landlock/setup.c
+++ b/security/landlock/setup.c
@@ -11,17 +11,24 @@

#include "common.h"
#include "cred.h"
+#include "fs.h"
#include "ptrace.h"
#include "setup.h"

+bool landlock_initialized __lsm_ro_after_init = false;
+
struct lsm_blob_sizes landlock_blob_sizes __lsm_ro_after_init = {
.lbs_cred = sizeof(struct landlock_cred_security),
+ .lbs_inode = sizeof(struct landlock_inode_security),
+ .lbs_superblock = sizeof(struct landlock_superblock_security),
};

static int __init landlock_init(void)
{
landlock_add_cred_hooks();
landlock_add_ptrace_hooks();
+ landlock_add_fs_hooks();
+ landlock_initialized = true;
pr_info("Up and running.\n");
return 0;
}
diff --git a/security/landlock/setup.h b/security/landlock/setup.h
index 9fdbf33fcc33..1daffab1ab4b 100644
--- a/security/landlock/setup.h
+++ b/security/landlock/setup.h
@@ -11,6 +11,8 @@

#include <linux/lsm_hooks.h>

+extern bool landlock_initialized;
+
extern struct lsm_blob_sizes landlock_blob_sizes;

#endif /* _SECURITY_LANDLOCK_SETUP_H */
--
2.29.2
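
As a reader's aid, the access-mask arithmetic from limits.h and the mode
mapping done by get_mode_access() can be sketched in user space. This is a
minimal sketch, not the kernel code: the EX_* names and the two ex_*()
helpers are hypothetical, and the bit positions are assumed for
illustration instead of being taken from <linux/landlock.h>.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/stat.h>

/* Assumed bit layout, for illustration only (not the uapi header). */
#define EX_ACCESS_FS_EXECUTE	(1ULL << 0)
#define EX_ACCESS_FS_MAKE_CHAR	(1ULL << 6)
#define EX_ACCESS_FS_MAKE_DIR	(1ULL << 7)
#define EX_ACCESS_FS_MAKE_REG	(1ULL << 8)
#define EX_ACCESS_FS_MAKE_SOCK	(1ULL << 9)
#define EX_ACCESS_FS_MAKE_FIFO	(1ULL << 10)
#define EX_ACCESS_FS_MAKE_BLOCK	(1ULL << 11)
#define EX_ACCESS_FS_MAKE_SYM	(1ULL << 12)

/* Same construction as LANDLOCK_MASK_ACCESS_FS: all bits up to the last. */
#define EX_LAST_ACCESS_FS	EX_ACCESS_FS_MAKE_SYM
#define EX_MASK_ACCESS_FS	((EX_LAST_ACCESS_FS << 1) - 1)

/* Build-time check, in the spirit of the BUILD_BUG_ON() calls above. */
_Static_assert(EX_MASK_ACCESS_FS == 0x1fff, "mask must cover bits 0..12");

/* Maps a file type to the "make" right needed to create such a file. */
static uint32_t ex_mode_to_access(const mode_t mode)
{
	switch (mode & S_IFMT) {
	case S_IFLNK:
		return EX_ACCESS_FS_MAKE_SYM;
	case 0:
		/* A zero mode is treated as a regular file. */
	case S_IFREG:
		return EX_ACCESS_FS_MAKE_REG;
	case S_IFDIR:
		return EX_ACCESS_FS_MAKE_DIR;
	case S_IFCHR:
		return EX_ACCESS_FS_MAKE_CHAR;
	case S_IFBLK:
		return EX_ACCESS_FS_MAKE_BLOCK;
	case S_IFIFO:
		return EX_ACCESS_FS_MAKE_FIFO;
	case S_IFSOCK:
		return EX_ACCESS_FS_MAKE_SOCK;
	default:
		return 0;
	}
}

/* Returns nonzero when every requested bit is a known access right. */
static int ex_access_is_valid(const uint64_t access)
{
	return (access & ~EX_MASK_ACCESS_FS) == 0;
}
```

Deriving the mask from the last access right means that adding a new right
only requires updating the "last" definition; the build-time checks then
verify that the fields storing access masks are still wide enough.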

2020-12-09 23:48:46

by Mickaël Salaün

[permalink] [raw]
Subject: [PATCH v26 11/12] samples/landlock: Add a sandbox manager example

From: Mickaël Salaün <[email protected]>

Add a basic sandbox tool to launch a command which can only access a
whitelist of file hierarchies in a read-only or read-write way.

Cc: James Morris <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Serge E. Hallyn <[email protected]>
Signed-off-by: Mickaël Salaün <[email protected]>
---

Changes since v25:
* Remove useless errno set in the syscall wrappers.
* Cosmetic variable renames.

Changes since v23:
* Re-add hints to help users understand the required kernel
configuration. This was removed with the removal of
landlock_get_features(2).

Changes since v21:
* Remove LANDLOCK_ACCESS_FS_CHROOT.
* Clean up help.

Changes since v20:
* Update with new syscalls and type names.
* Update errno check for EOPNOTSUPP.
* Use the full syscall interfaces: explicitly set the "flags" field to
zero.

Changes since v19:
* Update with the new Landlock syscalls.
* Comply with commit 5f2fb52fac15 ("kbuild: rename hostprogs-y/always to
hostprogs/always-y").

Changes since v16:
* Switch syscall attribute pointer and size arguments.

Changes since v15:
* Update access right names.
* Properly assign access right to files according to the new related
syscall restriction.
* Replace "select" with "depends on" HEADERS_INSTALL (suggested by Randy
Dunlap).

Changes since v14:
* Fix Kconfig dependency.
* Remove access rights that may be required for FD-only requests:
mmap, truncate, getattr, lock, chmod, chown, chgrp, ioctl.
* Fix useless hardcoded syscall number.
* Use execvpe().
* Follow symlinks.
* Extend help with common file paths.
* Constify variables.
* Clean up comments.
* Improve error message.

Changes since v11:
* Add back the filesystem sandbox manager and update it to work with the
new Landlock syscall.

Previous changes:
https://lore.kernel.org/lkml/[email protected]/
---
samples/Kconfig | 7 ++
samples/Makefile | 1 +
samples/landlock/.gitignore | 1 +
samples/landlock/Makefile | 15 +++
samples/landlock/sandboxer.c | 233 +++++++++++++++++++++++++++++++++++
5 files changed, 257 insertions(+)
create mode 100644 samples/landlock/.gitignore
create mode 100644 samples/landlock/Makefile
create mode 100644 samples/landlock/sandboxer.c

diff --git a/samples/Kconfig b/samples/Kconfig
index 0ed6e4d71d87..e6129496ced5 100644
--- a/samples/Kconfig
+++ b/samples/Kconfig
@@ -124,6 +124,13 @@ config SAMPLE_HIDRAW
bool "hidraw sample"
depends on CC_CAN_LINK && HEADERS_INSTALL

+config SAMPLE_LANDLOCK
+ bool "Build Landlock sample code"
+ depends on HEADERS_INSTALL
+ help
+ Build a simple Landlock sandbox manager able to launch a process
+ restricted by a user-defined filesystem access control.
+
config SAMPLE_PIDFD
bool "pidfd sample"
depends on CC_CAN_LINK && HEADERS_INSTALL
diff --git a/samples/Makefile b/samples/Makefile
index c3392a595e4b..087e0988ccc5 100644
--- a/samples/Makefile
+++ b/samples/Makefile
@@ -11,6 +11,7 @@ obj-$(CONFIG_SAMPLE_KDB) += kdb/
obj-$(CONFIG_SAMPLE_KFIFO) += kfifo/
obj-$(CONFIG_SAMPLE_KOBJECT) += kobject/
obj-$(CONFIG_SAMPLE_KPROBES) += kprobes/
+subdir-$(CONFIG_SAMPLE_LANDLOCK) += landlock
obj-$(CONFIG_SAMPLE_LIVEPATCH) += livepatch/
subdir-$(CONFIG_SAMPLE_PIDFD) += pidfd
obj-$(CONFIG_SAMPLE_QMI_CLIENT) += qmi/
diff --git a/samples/landlock/.gitignore b/samples/landlock/.gitignore
new file mode 100644
index 000000000000..f43668b2d318
--- /dev/null
+++ b/samples/landlock/.gitignore
@@ -0,0 +1 @@
+/sandboxer
diff --git a/samples/landlock/Makefile b/samples/landlock/Makefile
new file mode 100644
index 000000000000..21eda5774948
--- /dev/null
+++ b/samples/landlock/Makefile
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: BSD-3-Clause
+
+hostprogs := sandboxer
+
+always-y := $(hostprogs)
+
+KBUILD_HOSTCFLAGS += -I$(objtree)/usr/include
+
+.PHONY: all clean
+
+all:
+ $(MAKE) -C ../.. samples/landlock/
+
+clean:
+ $(MAKE) -C ../.. M=samples/landlock/ clean
diff --git a/samples/landlock/sandboxer.c b/samples/landlock/sandboxer.c
new file mode 100644
index 000000000000..82b2738b216c
--- /dev/null
+++ b/samples/landlock/sandboxer.c
@@ -0,0 +1,233 @@
+// SPDX-License-Identifier: BSD-3-Clause
+/*
+ * Simple Landlock sandbox manager able to launch a process restricted by a
+ * user-defined filesystem access control.
+ *
+ * Copyright © 2017-2020 Mickaël Salaün <[email protected]>
+ * Copyright © 2020 ANSSI
+ */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <fcntl.h>
+#include <linux/landlock.h>
+#include <linux/prctl.h>
+#include <stddef.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/prctl.h>
+#include <sys/stat.h>
+#include <sys/syscall.h>
+#include <unistd.h>
+
+#ifndef landlock_create_ruleset
+static inline int landlock_create_ruleset(
+ const struct landlock_ruleset_attr *const attr,
+ const size_t size, const __u32 flags)
+{
+ return syscall(__NR_landlock_create_ruleset, attr, size, flags);
+}
+#endif
+
+#ifndef landlock_add_rule
+static inline int landlock_add_rule(const int ruleset_fd,
+ const enum landlock_rule_type rule_type,
+ const void *const rule_attr, const __u32 flags)
+{
+ return syscall(__NR_landlock_add_rule, ruleset_fd, rule_type,
+ rule_attr, flags);
+}
+#endif
+
+#ifndef landlock_enforce_ruleset_current
+static inline int landlock_enforce_ruleset_current(const int ruleset_fd,
+ const __u32 flags)
+{
+ return syscall(__NR_landlock_enforce_ruleset_current, ruleset_fd,
+ flags);
+}
+#endif
+
+#define ENV_FS_RO_NAME "LL_FS_RO"
+#define ENV_FS_RW_NAME "LL_FS_RW"
+#define ENV_PATH_TOKEN ":"
+
+static int parse_path(char *env_path, const char ***const path_list)
+{
+ int i, num_paths = 0;
+
+ if (env_path) {
+ num_paths++;
+ for (i = 0; env_path[i]; i++) {
+ if (env_path[i] == ENV_PATH_TOKEN[0])
+ num_paths++;
+ }
+ }
+ *path_list = malloc(num_paths * sizeof(**path_list));
+ for (i = 0; i < num_paths; i++)
+ (*path_list)[i] = strsep(&env_path, ENV_PATH_TOKEN);
+
+ return num_paths;
+}
+
+#define ACCESS_FILE ( \
+ LANDLOCK_ACCESS_FS_EXECUTE | \
+ LANDLOCK_ACCESS_FS_WRITE_FILE | \
+ LANDLOCK_ACCESS_FS_READ_FILE)
+
+static int populate_ruleset(
+ const char *const env_var, const int ruleset_fd,
+ const __u64 allowed_access)
+{
+ int num_paths, i;
+ char *env_path_name;
+ const char **path_list = NULL;
+ struct landlock_path_beneath_attr path_beneath = {
+ .parent_fd = -1,
+ };
+
+ env_path_name = getenv(env_var);
+ if (!env_path_name) {
+ fprintf(stderr, "Missing environment variable %s\n", env_var);
+ return 1;
+ }
+ env_path_name = strdup(env_path_name);
+ unsetenv(env_var);
+ num_paths = parse_path(env_path_name, &path_list);
+ if (num_paths == 1 && path_list[0][0] == '\0') {
+ fprintf(stderr, "Missing path in %s\n", env_var);
+ goto err_free_name;
+ }
+
+ for (i = 0; i < num_paths; i++) {
+ struct stat statbuf;
+
+ path_beneath.parent_fd = open(path_list[i], O_PATH |
+ O_CLOEXEC);
+ if (path_beneath.parent_fd < 0) {
+ fprintf(stderr, "Failed to open \"%s\": %s\n",
+ path_list[i],
+ strerror(errno));
+ goto err_free_name;
+ }
+ if (fstat(path_beneath.parent_fd, &statbuf)) {
+ close(path_beneath.parent_fd);
+ goto err_free_name;
+ }
+ path_beneath.allowed_access = allowed_access;
+ if (!S_ISDIR(statbuf.st_mode))
+ path_beneath.allowed_access &= ACCESS_FILE;
+ if (landlock_add_rule(ruleset_fd, LANDLOCK_RULE_PATH_BENEATH,
+ &path_beneath, 0)) {
+ fprintf(stderr, "Failed to update the ruleset with \"%s\": %s\n",
+ path_list[i], strerror(errno));
+ close(path_beneath.parent_fd);
+ goto err_free_name;
+ }
+ close(path_beneath.parent_fd);
+ }
+ free(env_path_name);
+ return 0;
+
+err_free_name:
+ free(env_path_name);
+ return 1;
+}
+
+#define ACCESS_FS_ROUGHLY_READ ( \
+ LANDLOCK_ACCESS_FS_EXECUTE | \
+ LANDLOCK_ACCESS_FS_READ_FILE | \
+ LANDLOCK_ACCESS_FS_READ_DIR)
+
+#define ACCESS_FS_ROUGHLY_WRITE ( \
+ LANDLOCK_ACCESS_FS_WRITE_FILE | \
+ LANDLOCK_ACCESS_FS_REMOVE_DIR | \
+ LANDLOCK_ACCESS_FS_REMOVE_FILE | \
+ LANDLOCK_ACCESS_FS_MAKE_CHAR | \
+ LANDLOCK_ACCESS_FS_MAKE_DIR | \
+ LANDLOCK_ACCESS_FS_MAKE_REG | \
+ LANDLOCK_ACCESS_FS_MAKE_SOCK | \
+ LANDLOCK_ACCESS_FS_MAKE_FIFO | \
+ LANDLOCK_ACCESS_FS_MAKE_BLOCK | \
+ LANDLOCK_ACCESS_FS_MAKE_SYM)
+
+int main(const int argc, char *const argv[], char *const *const envp)
+{
+ const char *cmd_path;
+ char *const *cmd_argv;
+ int ruleset_fd;
+ struct landlock_ruleset_attr ruleset_attr = {
+ .handled_access_fs = ACCESS_FS_ROUGHLY_READ |
+ ACCESS_FS_ROUGHLY_WRITE,
+ };
+
+ if (argc < 2) {
+ fprintf(stderr, "usage: %s=\"...\" %s=\"...\" %s <cmd> [args]...\n\n",
+ ENV_FS_RO_NAME, ENV_FS_RW_NAME, argv[0]);
+ fprintf(stderr, "Launch a command in a restricted environment.\n\n");
+ fprintf(stderr, "Environment variables containing paths, "
+ "each separated by a colon:\n");
+ fprintf(stderr, "* %s: list of paths allowed to be used in a read-only way.\n",
+ ENV_FS_RO_NAME);
+ fprintf(stderr, "* %s: list of paths allowed to be used in a read-write way.\n",
+ ENV_FS_RW_NAME);
+ fprintf(stderr, "\nexample:\n"
+ "%s=\"/bin:/lib:/usr:/proc:/etc:/dev/urandom\" "
+ "%s=\"/dev/null:/dev/full:/dev/zero:/dev/pts:/tmp\" "
+ "%s bash -i\n",
+ ENV_FS_RO_NAME, ENV_FS_RW_NAME, argv[0]);
+ return 1;
+ }
+
+ ruleset_fd = landlock_create_ruleset(&ruleset_attr, sizeof(ruleset_attr), 0);
+ if (ruleset_fd < 0) {
+ perror("Failed to create a ruleset");
+ switch (errno) {
+ case ENOSYS:
+ fprintf(stderr, "Hint: Landlock is not supported by the current kernel. "
+ "To support it, build the kernel with "
+ "CONFIG_SECURITY_LANDLOCK=y and prepend "
+ "\"landlock,\" to the content of CONFIG_LSM.\n");
+ break;
+ case EOPNOTSUPP:
+ fprintf(stderr, "Hint: Landlock is currently disabled. "
+ "It can be enabled in the kernel configuration by "
+ "prepending \"landlock,\" to the content of CONFIG_LSM, "
+ "or at boot time by setting the same content to the "
+ "\"lsm\" kernel parameter.\n");
+ break;
+ }
+ return 1;
+ }
+ if (populate_ruleset(ENV_FS_RO_NAME, ruleset_fd,
+ ACCESS_FS_ROUGHLY_READ)) {
+ goto err_close_ruleset;
+ }
+ if (populate_ruleset(ENV_FS_RW_NAME, ruleset_fd,
+ ACCESS_FS_ROUGHLY_READ | ACCESS_FS_ROUGHLY_WRITE)) {
+ goto err_close_ruleset;
+ }
+ if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
+ perror("Failed to restrict privileges");
+ goto err_close_ruleset;
+ }
+ if (landlock_enforce_ruleset_current(ruleset_fd, 0)) {
+ perror("Failed to enforce ruleset");
+ goto err_close_ruleset;
+ }
+ close(ruleset_fd);
+
+ cmd_path = argv[1];
+ cmd_argv = argv + 1;
+ execvpe(cmd_path, cmd_argv, envp);
+ fprintf(stderr, "Failed to execute \"%s\": %s\n", cmd_path,
+ strerror(errno));
+ fprintf(stderr, "Hint: access to the binary, the interpreter or "
+ "shared libraries may be denied.\n");
+ return 1;
+
+err_close_ruleset:
+ close(ruleset_fd);
+ return 1;
+}
--
2.29.2
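
The colon-separated parsing that the sandboxer applies to LL_FS_RO and
LL_FS_RW can be sketched on its own as follows. This is a hedged
illustration, not the sample's code: ex_split_paths() is a hypothetical
name, it uses strchr(3) instead of strsep(3), and it adds a check of the
malloc(3) result that parse_path() above does not perform.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Splits a colon-separated list in place. On success, *out points to a
 * malloc'ed array of pointers into @list and the number of entries is
 * returned. Returns 0 for a NULL list and -1 on allocation failure.
 */
static int ex_split_paths(char *list, const char ***const out)
{
	int count, i;
	char *cur;

	*out = NULL;
	if (!list)
		return 0;

	/* One entry, plus one for each separator. */
	count = 1;
	for (cur = list; *cur; cur++) {
		if (*cur == ':')
			count++;
	}

	*out = malloc(count * sizeof(**out));
	if (!*out)
		return -1;

	cur = list;
	for (i = 0; i < count; i++) {
		char *const sep = strchr(cur, ':');

		(*out)[i] = cur;
		if (sep) {
			/* Terminates this entry and moves to the next. */
			*sep = '\0';
			cur = sep + 1;
		}
	}
	return count;
}
```

As in the sample, an empty string yields a single empty entry, which lets
the caller report a missing path instead of silently adding no rule.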

2020-12-09 23:50:47

by Mickaël Salaün

[permalink] [raw]
Subject: [PATCH v26 10/12] selftests/landlock: Add user space tests

From: Mickaël Salaün <[email protected]>

Test all Landlock system calls, ptrace hook semantics and filesystem
access-control.

Test coverage for security/landlock/ is 94.1% of lines. The code not
covered only deals with internal kernel errors (e.g. memory allocation)
and race conditions.

Cc: James Morris <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Serge E. Hallyn <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Mickaël Salaün <[email protected]>
Reviewed-by: Vincent Dagonneau <[email protected]>
---

Changes since v25:
* Add a new test to check that Landlock ruleset file descriptors
received through UNIX sockets are usable. Contributed by Vincent
Dagonneau.
* Improve hierarchy.trace tests to not hang when testing on a kernel
that doesn't support Landlock.
* Replace EXPECT_EQ(0, close(*)) with ASSERT_EQ(0, close(*)).
* Guard WEXITSTATUS() use with WIFEXITED() in ptrace tests.
* Use pipe2(2) with O_CLOEXEC.
* Remove useless errno set for syscall wrappers, and related useless checks.
* Rename test.
* Add Microsoft copyright for layout1.interleaved_masked_accesses .

Changes since v24:
* Revert the ruleset_overlap test from v24: check that access rights are
ORed together when building a ruleset. Keep the extra checks
added with v24.
* Revert inherit_subset test from v24: use the automatic ORing of
access rights for the same file.
* Update interleaved_masked_accesses test (added with v24) to stop when
all layers have allowed an inode at least once in the path walk.
* Extend interleaved_masked_accesses test with new tricky interleaved
layers which would not work as intended with (allow or deny) bitmask
layer implementations.
* Simplify and rename test_path*() to test_open*() to ease diagnosis
in case of unexpected errors.
* Replace most calls to open(2) with a call to test_open(), which
reduces the number of lines and makes tests more readable.
* Fix erroneous check in inherit_superset.

Changes since v23:
* Add an interleaved_masked_accesses test to check corner cases for
interleaved layered ruleset combinations.
* Update ruleset_overlap and inherit_subset tests to follow the new
intersect access rights behavior.
* Extend the inherit_superset test to check that layers are handled as
expected in the superset use case, which completes the inherit_subset
checks.
* Fix comment (spotted by Vincent Dagonneau).

Changes since v22:
* Extend and add a new test to better check rules applied to the root
directory: rule_over_root_allow_then_deny, rule_over_root_deny.
* Change the signature of test_path*() to make the calls clearer.

Changes since v21:
* Remove layout1.chroot test and update layout1.unhandled_access to not
rely on LANDLOCK_ACCESS_FS_CHROOT.
* Clean up comments.

Changes since v20:
* Update with new syscalls and type names.
* Use the full syscall interfaces: explicitly set the "flags" field to
zero.
* Update the empty_path_beneath_attr test to check for EFAULT.
* Update and merge tests for the simplified copy_min_struct_from_user().
* Clean up makefile.
* Rename some types and variables in a more consistent way.

Changes since v19:
* Update with the new Landlock syscalls.
* Fix device creation.
* Check the new landlock_attr_features members: last_rule_type and
last_target_type .
* Constify variables.

Changes since v18:
* Replace ruleset_rw.inval with layout1.inval to avoid a nonexistent test
layout.
* Use the new FIXTURE_VARIANT for ptrace_test: makes the tests more
readable and usable.
* Add ARRAY_SIZE() macro to please checkpatch.

Changes since v17:
* Add new test for mknod with a zero mode.
* Use memset(3) to initialize attr_features in base_test.

Changes since v16:
* Add new unpriv_enforce_without_no_new_privs test: check that ruleset
enforcing is forbidden without no_new_privs and CAP_SYS_ADMIN.
* Drop capabilities when useful.
* Check the new size_attr_features field from struct
landlock_attr_features.
* Update the empty_or_same_ruleset test to check complementary empty
ruleset.
* Update base_test according to the new attribute structures and fix the
inconsistent_attr test accordingly.
* Switch syscall attribute pointer and size arguments.
* Rename test files with a "_test" suffix.

Changes since v14:
* Add new tests:
- superset: check new layer bitmask.
- max_layers: check maximum number of layers.
- release_inodes: check that umount works well.
- empty_or_same_ruleset.
- inconsistent_attr: checks copy_to_user limits.
- in ruleset_rw.inval to check ruleset FD.
- proc_unlinked_file: check file access through /proc/self/fd .
- file_access_rights: check that a file can only get consistent access
rights.
- unpriv: check that NO_NEW_PRIVS or CAP_SYS_ADMIN is required.
- check pipe access through /proc/self/fd .
- check move_mount(2).
- check ruleset file descriptor properties.
- proc_nsfs: extend to check that internal filesystems (e.g. nsfs) are
allowed.
* Double-check read and write effective actions.
* Fix potential desynchronization between the kernel sources and
installed headers by overriding the build step in the Makefile. This
also enables building with Clang.
* Add two files in the test directories (for link test and rename test).
* Remove test for ruleset's show_fdinfo().
* Replace EBADR with EBADFD.
* Update tests according to the changes to rename and link rights.
* Fix (now) illegal access rights tied to files.
* Update rename and link tests.
* Remove superfluous '\n' in TH_LOG() calls.
* Make assert calls consistent and readable.
* Fix the execute test.
* Make tests future-proof.
* Cosmetic fixes.

Changes since v14:
* Add new tests:
- Compatibility: empty_attr_{ruleset,path_beneath,enforce} to check
minimal attr size.
- Access types: link_to, rename_from, rename_to, rmdir, unlink,
make_char, make_block, make_reg, make_sock, make_fifo, make_sym,
make_dir, chroot, execute.
- Test privilege escalation prevention by enforcing a nested rule, on
a parent directory, with less restrictions than one on a child
directory.
- Test for empty and more than 32-bits allowed_access
* Merge the two test mount hierarchies.
* Complete relative path tests by combining chdir and chroot.
* Adjust tests:
- Remove the layout1/extend_ruleset_with_denied_path test.
- Extend layout1/whitelist test with checks on file.
- Add and use create_dir_and_file().
* Only use read/write checks but not stat(2) for tests.
* Rename test.h to common.h and improve it.
* Rename path names to make them more consistent and easier to understand,
and place them in a common directory.
* Make create_ruleset() more generic.
* Constify variables.
* Re-add static global variables.
* Remove useless openat(2).
* Fix and complete kernel config.
* Set umask and clean up file modes.
* Clean up open flags.
* Improve Makefile.
* Fix spelling.
* Improve comments and error messages.

Changes since v13:
* Add back the filesystem tests (from v10) and extend them.
* Add tests for the new syscall.

Previous changes:
https://lore.kernel.org/lkml/[email protected]/
---
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/landlock/.gitignore | 2 +
tools/testing/selftests/landlock/Makefile | 24 +
tools/testing/selftests/landlock/base_test.c | 219 ++
tools/testing/selftests/landlock/common.h | 110 +
tools/testing/selftests/landlock/config | 5 +
tools/testing/selftests/landlock/fs_test.c | 1799 +++++++++++++++++
.../testing/selftests/landlock/ptrace_test.c | 314 +++
tools/testing/selftests/landlock/true.c | 5 +
9 files changed, 2479 insertions(+)
create mode 100644 tools/testing/selftests/landlock/.gitignore
create mode 100644 tools/testing/selftests/landlock/Makefile
create mode 100644 tools/testing/selftests/landlock/base_test.c
create mode 100644 tools/testing/selftests/landlock/common.h
create mode 100644 tools/testing/selftests/landlock/config
create mode 100644 tools/testing/selftests/landlock/fs_test.c
create mode 100644 tools/testing/selftests/landlock/ptrace_test.c
create mode 100644 tools/testing/selftests/landlock/true.c

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index d9c283503159..f40a34430652 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -26,6 +26,7 @@ TARGETS += ir
TARGETS += kcmp
TARGETS += kexec
TARGETS += kvm
+TARGETS += landlock
TARGETS += lib
TARGETS += livepatch
TARGETS += lkdtm
diff --git a/tools/testing/selftests/landlock/.gitignore b/tools/testing/selftests/landlock/.gitignore
new file mode 100644
index 000000000000..470203a7cd73
--- /dev/null
+++ b/tools/testing/selftests/landlock/.gitignore
@@ -0,0 +1,2 @@
+/*_test
+/true
diff --git a/tools/testing/selftests/landlock/Makefile b/tools/testing/selftests/landlock/Makefile
new file mode 100644
index 000000000000..a99596ca9882
--- /dev/null
+++ b/tools/testing/selftests/landlock/Makefile
@@ -0,0 +1,24 @@
+# SPDX-License-Identifier: GPL-2.0
+
+CFLAGS += -Wall -O2
+
+src_test := $(wildcard *_test.c)
+
+TEST_GEN_PROGS := $(src_test:.c=)
+
+TEST_GEN_PROGS_EXTENDED := true
+
+KSFT_KHDR_INSTALL := 1
+OVERRIDE_TARGETS := 1
+include ../lib.mk
+
+khdr_dir = $(top_srcdir)/usr/include
+
+$(khdr_dir)/linux/landlock.h: khdr
+ @:
+
+$(OUTPUT)/true: true.c
+ $(LINK.c) $< $(LDLIBS) -o $@ -static
+
+$(OUTPUT)/%_test: %_test.c $(khdr_dir)/linux/landlock.h ../kselftest_harness.h common.h
+ $(LINK.c) $< $(LDLIBS) -o $@ -lcap -I$(khdr_dir)
diff --git a/tools/testing/selftests/landlock/base_test.c b/tools/testing/selftests/landlock/base_test.c
new file mode 100644
index 000000000000..e2bec028d831
--- /dev/null
+++ b/tools/testing/selftests/landlock/base_test.c
@@ -0,0 +1,219 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Landlock tests - Common user space base
+ *
+ * Copyright © 2017-2020 Mickaël Salaün <[email protected]>
+ * Copyright © 2019-2020 ANSSI
+ */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <fcntl.h>
+#include <linux/landlock.h>
+#include <string.h>
+#include <sys/prctl.h>
+#include <sys/socket.h>
+#include <sys/types.h>
+
+#include "common.h"
+
+#ifndef O_PATH
+#define O_PATH 010000000
+#endif
+
+TEST(inconsistent_attr) {
+ const long page_size = sysconf(_SC_PAGESIZE);
+ char *const buf = malloc(page_size + 1);
+ struct landlock_ruleset_attr *const ruleset_attr = (void *)buf;
+
+ ASSERT_NE(NULL, buf);
+
+ /* Checks copy_from_user(). */
+ ASSERT_EQ(-1, landlock_create_ruleset(ruleset_attr, 0, 0));
+ /* The size is less than sizeof(struct landlock_attr_enforce). */
+ ASSERT_EQ(EINVAL, errno);
+ ASSERT_EQ(-1, landlock_create_ruleset(ruleset_attr, 1, 0));
+ ASSERT_EQ(EINVAL, errno);
+
+ ASSERT_EQ(-1, landlock_create_ruleset(NULL, 1, 0));
+ /* The size is less than sizeof(struct landlock_attr_enforce). */
+ ASSERT_EQ(EFAULT, errno);
+
+ ASSERT_EQ(-1, landlock_create_ruleset(NULL,
+ sizeof(struct landlock_ruleset_attr), 0));
+ ASSERT_EQ(EFAULT, errno);
+
+ ASSERT_EQ(-1, landlock_create_ruleset(ruleset_attr, page_size + 1, 0));
+ ASSERT_EQ(E2BIG, errno);
+
+ ASSERT_EQ(-1, landlock_create_ruleset(ruleset_attr,
+ sizeof(struct landlock_ruleset_attr), 0));
+ ASSERT_EQ(ENOMSG, errno);
+ ASSERT_EQ(-1, landlock_create_ruleset(ruleset_attr, page_size, 0));
+ ASSERT_EQ(ENOMSG, errno);
+
+ /* Checks non-zero value. */
+ buf[page_size - 2] = '.';
+ ASSERT_EQ(-1, landlock_create_ruleset(ruleset_attr, page_size, 0));
+ ASSERT_EQ(E2BIG, errno);
+
+ ASSERT_EQ(-1, landlock_create_ruleset(ruleset_attr, page_size + 1, 0));
+ ASSERT_EQ(E2BIG, errno);
+
+ free(buf);
+}
+
+TEST(empty_path_beneath_attr) {
+ const struct landlock_ruleset_attr ruleset_attr = {
+ .handled_access_fs = LANDLOCK_ACCESS_FS_EXECUTE,
+ };
+ const int ruleset_fd = landlock_create_ruleset(&ruleset_attr,
+ sizeof(ruleset_attr), 0);
+
+ ASSERT_LE(0, ruleset_fd);
+
+ /* Similar to struct landlock_path_beneath_attr.parent_fd = 0 */
+ ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_PATH_BENEATH,
+ NULL, 0));
+ ASSERT_EQ(EFAULT, errno);
+ ASSERT_EQ(0, close(ruleset_fd));
+}
+
+TEST(inval_fd_enforce) {
+ ASSERT_EQ(0, prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0));
+
+ ASSERT_EQ(-1, landlock_enforce_ruleset_current(-1, 0));
+ ASSERT_EQ(EBADF, errno);
+}
+
+TEST(unpriv_enforce_without_no_new_privs) {
+ int err;
+
+ disable_caps(_metadata);
+ err = landlock_enforce_ruleset_current(-1, 0);
+ ASSERT_EQ(-1, err);
+ ASSERT_EQ(EPERM, errno);
+}
+
+TEST(ruleset_fd_io)
+{
+ struct landlock_ruleset_attr ruleset_attr = {
+ .handled_access_fs = LANDLOCK_ACCESS_FS_READ_FILE,
+ };
+ int ruleset_fd;
+ char buf;
+
+ disable_caps(_metadata);
+ ruleset_fd = landlock_create_ruleset(&ruleset_attr,
+ sizeof(ruleset_attr), 0);
+ ASSERT_LE(0, ruleset_fd);
+
+ ASSERT_EQ(-1, write(ruleset_fd, ".", 1));
+ ASSERT_EQ(EINVAL, errno);
+ ASSERT_EQ(-1, read(ruleset_fd, &buf, 1));
+ ASSERT_EQ(EINVAL, errno);
+
+ ASSERT_EQ(0, close(ruleset_fd));
+}
+
+/* Tests enforcement of a ruleset FD transferred through a UNIX socket. */
+TEST(ruleset_fd_transfer)
+{
+ struct landlock_ruleset_attr ruleset_attr = {
+ .handled_access_fs = LANDLOCK_ACCESS_FS_READ_DIR,
+ };
+ struct landlock_path_beneath_attr path_beneath_attr = {
+ .allowed_access = LANDLOCK_ACCESS_FS_READ_DIR,
+ };
+ int ruleset_fd_tx, dir_fd;
+ union {
+ /* Aligned ancillary data buffer. */
+ char buf[CMSG_SPACE(sizeof(ruleset_fd_tx))];
+ struct cmsghdr _align;
+ } cmsg_tx = {};
+ char data_tx = '.';
+ struct iovec io = {
+ .iov_base = &data_tx,
+ .iov_len = sizeof(data_tx),
+ };
+ struct msghdr msg = {
+ .msg_iov = &io,
+ .msg_iovlen = 1,
+ .msg_control = &cmsg_tx.buf,
+ .msg_controllen = sizeof(cmsg_tx.buf),
+ };
+ struct cmsghdr *cmsg;
+ int socket_fds[2];
+ pid_t child;
+ int status;
+
+ disable_caps(_metadata);
+
+ /* Creates a test ruleset with a simple rule. */
+ ruleset_fd_tx = landlock_create_ruleset(&ruleset_attr,
+ sizeof(ruleset_attr), 0);
+ ASSERT_LE(0, ruleset_fd_tx);
+ path_beneath_attr.parent_fd = open("/tmp", O_PATH | O_NOFOLLOW |
+ O_DIRECTORY | O_CLOEXEC);
+ ASSERT_LE(0, path_beneath_attr.parent_fd);
+ ASSERT_EQ(0, landlock_add_rule(ruleset_fd_tx, LANDLOCK_RULE_PATH_BENEATH,
+ &path_beneath_attr, 0));
+ ASSERT_EQ(0, close(path_beneath_attr.parent_fd));
+
+ cmsg = CMSG_FIRSTHDR(&msg);
+ ASSERT_NE(NULL, cmsg);
+ cmsg->cmsg_len = CMSG_LEN(sizeof(ruleset_fd_tx));
+ cmsg->cmsg_level = SOL_SOCKET;
+ cmsg->cmsg_type = SCM_RIGHTS;
+ memcpy(CMSG_DATA(cmsg), &ruleset_fd_tx, sizeof(ruleset_fd_tx));
+
+ /* Sends the ruleset FD over a socketpair and then closes it. */
+ ASSERT_EQ(0, socketpair(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0, socket_fds));
+ ASSERT_EQ(sizeof(data_tx), sendmsg(socket_fds[0], &msg, 0));
+ ASSERT_EQ(0, close(socket_fds[0]));
+ ASSERT_EQ(0, close(ruleset_fd_tx));
+
+ child = fork();
+ ASSERT_LE(0, child);
+ if (child == 0) {
+ int ruleset_fd_rx;
+
+ *(char *)msg.msg_iov->iov_base = '\0';
+ ASSERT_EQ(sizeof(data_tx), recvmsg(socket_fds[1], &msg, MSG_CMSG_CLOEXEC));
+ ASSERT_EQ('.', *(char *)msg.msg_iov->iov_base);
+ ASSERT_EQ(0, close(socket_fds[1]));
+ cmsg = CMSG_FIRSTHDR(&msg);
+ ASSERT_EQ(cmsg->cmsg_len, CMSG_LEN(sizeof(ruleset_fd_tx)));
+ memcpy(&ruleset_fd_rx, CMSG_DATA(cmsg), sizeof(ruleset_fd_tx));
+
+ /* Enforces the received ruleset on the child. */
+ ASSERT_EQ(0, prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0));
+ ASSERT_EQ(0, landlock_enforce_ruleset_current(ruleset_fd_rx, 0));
+ ASSERT_EQ(0, close(ruleset_fd_rx));
+
+ /* Checks the ruleset enforcement. */
+ ASSERT_EQ(-1, open("/", O_RDONLY | O_DIRECTORY | O_CLOEXEC));
+ ASSERT_EQ(EACCES, errno);
+ dir_fd = open("/tmp", O_RDONLY | O_DIRECTORY | O_CLOEXEC);
+ ASSERT_LE(0, dir_fd);
+ ASSERT_EQ(0, close(dir_fd));
+ _exit(_metadata->passed ? EXIT_SUCCESS : EXIT_FAILURE);
+ return;
+ }
+
+ ASSERT_EQ(0, close(socket_fds[1]));
+
+ /* Checks that the parent is unrestricted. */
+ dir_fd = open("/", O_RDONLY | O_DIRECTORY | O_CLOEXEC);
+ ASSERT_LE(0, dir_fd);
+ ASSERT_EQ(0, close(dir_fd));
+ dir_fd = open("/tmp", O_RDONLY | O_DIRECTORY | O_CLOEXEC);
+ ASSERT_LE(0, dir_fd);
+ ASSERT_EQ(0, close(dir_fd));
+
+ ASSERT_EQ(child, waitpid(child, &status, 0));
+ ASSERT_EQ(1, WIFEXITED(status));
+ ASSERT_EQ(EXIT_SUCCESS, WEXITSTATUS(status));
+}
+
+TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/landlock/common.h b/tools/testing/selftests/landlock/common.h
new file mode 100644
index 000000000000..223d60a52ead
--- /dev/null
+++ b/tools/testing/selftests/landlock/common.h
@@ -0,0 +1,110 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Landlock test helpers
+ *
+ * Copyright © 2017-2020 Mickaël Salaün <[email protected]>
+ * Copyright © 2019-2020 ANSSI
+ */
+
+#include <errno.h>
+#include <linux/landlock.h>
+#include <sys/capability.h>
+#include <sys/syscall.h>
+
+#include "../kselftest_harness.h"
+
+#ifndef ARRAY_SIZE
+#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
+#endif
+
+#ifndef landlock_create_ruleset
+static inline int landlock_create_ruleset(
+ const struct landlock_ruleset_attr *const attr,
+ const size_t size, const __u32 flags)
+{
+ return syscall(__NR_landlock_create_ruleset, attr, size, flags);
+}
+#endif
+
+#ifndef landlock_add_rule
+static inline int landlock_add_rule(const int ruleset_fd,
+ const enum landlock_rule_type rule_type,
+ const void *const rule_attr, const __u32 flags)
+{
+ return syscall(__NR_landlock_add_rule, ruleset_fd, rule_type,
+ rule_attr, flags);
+}
+#endif
+
+#ifndef landlock_enforce_ruleset_current
+static inline int landlock_enforce_ruleset_current(const int ruleset_fd,
+ const __u32 flags)
+{
+ return syscall(__NR_landlock_enforce_ruleset_current, ruleset_fd,
+ flags);
+}
+#endif
+
+static void disable_caps(struct __test_metadata *const _metadata)
+{
+ cap_t cap_p;
+ /* Only these three capabilities are useful for the tests. */
+ const cap_value_t caps[] = {
+ CAP_MKNOD,
+ CAP_SYS_ADMIN,
+ CAP_SYS_CHROOT,
+ };
+
+ cap_p = cap_get_proc();
+ ASSERT_NE(NULL, cap_p) {
+ TH_LOG("Failed to cap_get_proc: %s", strerror(errno));
+ }
+ ASSERT_NE(-1, cap_clear(cap_p)) {
+ TH_LOG("Failed to cap_clear: %s", strerror(errno));
+ }
+ ASSERT_NE(-1, cap_set_flag(cap_p, CAP_PERMITTED, ARRAY_SIZE(caps),
+ caps, CAP_SET)) {
+ TH_LOG("Failed to cap_set_flag: %s", strerror(errno));
+ }
+ ASSERT_NE(-1, cap_set_proc(cap_p)) {
+ TH_LOG("Failed to cap_set_proc: %s", strerror(errno));
+ }
+ ASSERT_NE(-1, cap_free(cap_p)) {
+ TH_LOG("Failed to cap_free: %s", strerror(errno));
+ }
+}
+
+static void effective_cap(struct __test_metadata *const _metadata,
+ const cap_value_t caps, const cap_flag_value_t value)
+{
+ cap_t cap_p;
+
+ cap_p = cap_get_proc();
+ ASSERT_NE(NULL, cap_p) {
+ TH_LOG("Failed to cap_get_proc: %s", strerror(errno));
+ }
+ ASSERT_NE(-1, cap_set_flag(cap_p, CAP_EFFECTIVE, 1, &caps, value)) {
+ TH_LOG("Failed to cap_set_flag: %s", strerror(errno));
+ }
+ ASSERT_NE(-1, cap_set_proc(cap_p)) {
+ TH_LOG("Failed to cap_set_proc: %s", strerror(errno));
+ }
+ ASSERT_NE(-1, cap_free(cap_p)) {
+ TH_LOG("Failed to cap_free: %s", strerror(errno));
+ }
+}
+
+/* We cannot put such helpers in a library because of kselftest_harness.h . */
+__attribute__((__unused__))
+static void set_cap(struct __test_metadata *const _metadata,
+ const cap_value_t caps)
+{
+ effective_cap(_metadata, caps, CAP_SET);
+}
+
+__attribute__((__unused__))
+static void clear_cap(struct __test_metadata *const _metadata,
+ const cap_value_t caps)
+{
+ effective_cap(_metadata, caps, CAP_CLEAR);
+}
diff --git a/tools/testing/selftests/landlock/config b/tools/testing/selftests/landlock/config
new file mode 100644
index 000000000000..042298105821
--- /dev/null
+++ b/tools/testing/selftests/landlock/config
@@ -0,0 +1,5 @@
+CONFIG_SECURITY_LANDLOCK=y
+CONFIG_SECURITY_PATH=y
+CONFIG_SECURITY=y
+CONFIG_SHMEM=y
+CONFIG_TMPFS=y
diff --git a/tools/testing/selftests/landlock/fs_test.c b/tools/testing/selftests/landlock/fs_test.c
new file mode 100644
index 000000000000..3480eff8a883
--- /dev/null
+++ b/tools/testing/selftests/landlock/fs_test.c
@@ -0,0 +1,1799 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Landlock tests - Filesystem
+ *
+ * Copyright © 2017-2020 Mickaël Salaün <[email protected]>
+ * Copyright © 2020 ANSSI
+ * Copyright © 2020 Microsoft Corporation
+ */
+
+#define _GNU_SOURCE
+#include <fcntl.h>
+#include <linux/landlock.h>
+#include <sched.h>
+#include <string.h>
+#include <sys/capability.h>
+#include <sys/mount.h>
+#include <sys/prctl.h>
+#include <sys/sendfile.h>
+#include <sys/stat.h>
+#include <sys/sysmacros.h>
+#include <unistd.h>
+
+#include "common.h"
+
+#define TMP_DIR "tmp/"
+#define FILE_1 "file1"
+#define FILE_2 "file2"
+#define BINARY_PATH "./true"
+
+/* Paths (sibling number and depth) */
+static const char dir_s1d1[] = TMP_DIR "s1d1";
+static const char file1_s1d1[] = TMP_DIR "s1d1/" FILE_1;
+static const char file2_s1d1[] = TMP_DIR "s1d1/" FILE_2;
+static const char dir_s1d2[] = TMP_DIR "s1d1/s1d2";
+static const char file1_s1d2[] = TMP_DIR "s1d1/s1d2/" FILE_1;
+static const char file2_s1d2[] = TMP_DIR "s1d1/s1d2/" FILE_2;
+static const char dir_s1d3[] = TMP_DIR "s1d1/s1d2/s1d3";
+static const char file1_s1d3[] = TMP_DIR "s1d1/s1d2/s1d3/" FILE_1;
+static const char file2_s1d3[] = TMP_DIR "s1d1/s1d2/s1d3/" FILE_2;
+
+static const char dir_s2d1[] = TMP_DIR "s2d1";
+static const char file1_s2d1[] = TMP_DIR "s2d1/" FILE_1;
+static const char dir_s2d2[] = TMP_DIR "s2d1/s2d2";
+static const char file1_s2d2[] = TMP_DIR "s2d1/s2d2/" FILE_1;
+static const char dir_s2d3[] = TMP_DIR "s2d1/s2d2/s2d3";
+static const char file1_s2d3[] = TMP_DIR "s2d1/s2d2/s2d3/" FILE_1;
+static const char file2_s2d3[] = TMP_DIR "s2d1/s2d2/s2d3/" FILE_2;
+
+static const char dir_s3d1[] = TMP_DIR "s3d1";
+/* dir_s3d2 is a mount point. */
+static const char dir_s3d2[] = TMP_DIR "s3d1/s3d2";
+static const char dir_s3d3[] = TMP_DIR "s3d1/s3d2/s3d3";
+
+static void create_dir_and_file(struct __test_metadata *const _metadata,
+ const char *const dir_path)
+{
+ int file_fd;
+ char *const file1_path = alloca(strlen(dir_path) + sizeof(FILE_1) + 2);
+ char *const file2_path = alloca(strlen(dir_path) + sizeof(FILE_2) + 2);
+
+ strcpy(file1_path, dir_path);
+ strcat(file1_path, "/");
+ strcat(file1_path, FILE_1);
+
+ strcpy(file2_path, dir_path);
+ strcat(file2_path, "/");
+ strcat(file2_path, FILE_2);
+
+ ASSERT_EQ(0, mkdir(dir_path, 0700)) {
+ TH_LOG("Failed to create directory \"%s\": %s", dir_path,
+ strerror(errno));
+ }
+ file_fd = open(file1_path, O_CREAT | O_EXCL | O_WRONLY | O_CLOEXEC,
+ 0700);
+ ASSERT_LE(0, file_fd);
+ ASSERT_EQ(0, close(file_fd));
+
+ file_fd = open(file2_path, O_CREAT | O_EXCL | O_WRONLY | O_CLOEXEC,
+ 0700);
+ ASSERT_LE(0, file_fd);
+ ASSERT_EQ(0, close(file_fd));
+}
+
+static void delete_dir_and_file(const char *const dir_path)
+{
+ char *const file1_path = alloca(strlen(dir_path) +
+ sizeof(FILE_1) + 2);
+ char *const file2_path = alloca(strlen(dir_path) +
+ sizeof(FILE_2) + 2);
+
+ strcpy(file1_path, dir_path);
+ strcat(file1_path, "/");
+ strcat(file1_path, FILE_1);
+
+ strcpy(file2_path, dir_path);
+ strcat(file2_path, "/");
+ strcat(file2_path, FILE_2);
+
+ unlink(file1_path);
+ unlink(file2_path);
+ /* file1_path may be a directory, cf. layout1/make_directory. */
+ rmdir(file1_path);
+ rmdir(dir_path);
+}
+
+static void cleanup_layout1(struct __test_metadata *const _metadata)
+{
+ delete_dir_and_file(dir_s1d3);
+ delete_dir_and_file(dir_s1d2);
+ delete_dir_and_file(dir_s1d1);
+
+ delete_dir_and_file(dir_s2d3);
+ delete_dir_and_file(dir_s2d2);
+ delete_dir_and_file(dir_s2d1);
+
+ delete_dir_and_file(dir_s3d3);
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ umount(dir_s3d2);
+ clear_cap(_metadata, CAP_SYS_ADMIN);
+ delete_dir_and_file(dir_s3d2);
+ delete_dir_and_file(dir_s3d1);
+
+ delete_dir_and_file(TMP_DIR);
+}
+
+FIXTURE(layout1) {
+};
+
+FIXTURE_SETUP(layout1)
+{
+ disable_caps(_metadata);
+ cleanup_layout1(_metadata);
+
+ /* Do not pollute the rest of the system. */
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ ASSERT_EQ(0, unshare(CLONE_NEWNS));
+ clear_cap(_metadata, CAP_SYS_ADMIN);
+ umask(0077);
+ create_dir_and_file(_metadata, TMP_DIR);
+
+ create_dir_and_file(_metadata, dir_s1d1);
+ create_dir_and_file(_metadata, dir_s1d2);
+ create_dir_and_file(_metadata, dir_s1d3);
+
+ create_dir_and_file(_metadata, dir_s2d1);
+ create_dir_and_file(_metadata, dir_s2d2);
+ create_dir_and_file(_metadata, dir_s2d3);
+
+ create_dir_and_file(_metadata, dir_s3d1);
+ create_dir_and_file(_metadata, dir_s3d2);
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ ASSERT_EQ(0, mount("tmp", dir_s3d2, "tmpfs", 0, "size=4m,mode=700"));
+ clear_cap(_metadata, CAP_SYS_ADMIN);
+ create_dir_and_file(_metadata, dir_s3d3);
+}
+
+FIXTURE_TEARDOWN(layout1)
+{
+ /*
+ * cleanup_layout1() would be denied here, use TEST(cleanup) instead.
+ */
+}
+
+/*
+ * This helper lets callers use the ASSERT_* macros themselves, so that the
+ * printed line number points to the test caller.
+ */
+static int test_open_rel(const int dirfd, const char *const path, const int flags)
+{
+ int fd;
+
+ /* Works with files and directories. */
+ fd = openat(dirfd, path, flags | O_CLOEXEC);
+ if (fd < 0)
+ return errno;
+ if (close(fd) == 0)
+ return 0;
+ /*
+ * Mixing error codes from close(2) and open(2) should not lead to any
+ * (access type) confusion for this test.
+ */
+ return errno;
+}
+
+static int test_open(const char *const path, const int flags)
+{
+ return test_open_rel(AT_FDCWD, path, flags);
+}
+
+TEST_F(layout1, no_restriction)
+{
+ ASSERT_EQ(0, test_open(dir_s1d1, O_RDONLY));
+ ASSERT_EQ(0, test_open(file1_s1d1, O_RDONLY));
+ ASSERT_EQ(0, test_open(file2_s1d1, O_RDONLY));
+ ASSERT_EQ(0, test_open(dir_s1d2, O_RDONLY));
+ ASSERT_EQ(0, test_open(file1_s1d2, O_RDONLY));
+ ASSERT_EQ(0, test_open(file2_s1d2, O_RDONLY));
+ ASSERT_EQ(0, test_open(dir_s1d3, O_RDONLY));
+ ASSERT_EQ(0, test_open(file1_s1d3, O_RDONLY));
+
+ ASSERT_EQ(0, test_open(dir_s2d1, O_RDONLY));
+ ASSERT_EQ(0, test_open(file1_s2d1, O_RDONLY));
+ ASSERT_EQ(0, test_open(dir_s2d2, O_RDONLY));
+ ASSERT_EQ(0, test_open(file1_s2d2, O_RDONLY));
+ ASSERT_EQ(0, test_open(dir_s2d3, O_RDONLY));
+ ASSERT_EQ(0, test_open(file1_s2d3, O_RDONLY));
+
+ ASSERT_EQ(0, test_open(dir_s3d1, O_RDONLY));
+ ASSERT_EQ(0, test_open(dir_s3d2, O_RDONLY));
+ ASSERT_EQ(0, test_open(dir_s3d3, O_RDONLY));
+}
+
+TEST_F(layout1, inval)
+{
+ struct landlock_path_beneath_attr path_beneath = {
+ .allowed_access = LANDLOCK_ACCESS_FS_READ_FILE |
+ LANDLOCK_ACCESS_FS_WRITE_FILE,
+ .parent_fd = -1,
+ };
+ struct landlock_ruleset_attr ruleset_attr = {
+ .handled_access_fs = LANDLOCK_ACCESS_FS_READ_FILE |
+ LANDLOCK_ACCESS_FS_WRITE_FILE,
+ };
+ int ruleset_fd;
+
+ path_beneath.parent_fd = open(dir_s1d2, O_PATH | O_DIRECTORY |
+ O_CLOEXEC);
+ ASSERT_LE(0, path_beneath.parent_fd);
+
+ ruleset_fd = open(dir_s1d1, O_PATH | O_DIRECTORY | O_CLOEXEC);
+ ASSERT_LE(0, ruleset_fd);
+ ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_PATH_BENEATH,
+ &path_beneath, 0));
+ /* Returns EBADF because ruleset_fd was opened with O_PATH. */
+ ASSERT_EQ(EBADF, errno);
+ ASSERT_EQ(0, close(ruleset_fd));
+
+ ruleset_fd = open(dir_s1d1, O_DIRECTORY | O_CLOEXEC);
+ ASSERT_LE(0, ruleset_fd);
+ ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_PATH_BENEATH,
+ &path_beneath, 0));
+ /* Returns EBADFD because ruleset_fd is not a valid ruleset. */
+ ASSERT_EQ(EBADFD, errno);
+ ASSERT_EQ(0, close(ruleset_fd));
+
+ /* Gets a real ruleset. */
+ ruleset_fd = landlock_create_ruleset(&ruleset_attr,
+ sizeof(ruleset_attr), 0);
+ ASSERT_LE(0, ruleset_fd);
+ ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_PATH_BENEATH,
+ &path_beneath, 0));
+ ASSERT_EQ(0, close(path_beneath.parent_fd));
+
+ /* Tests without O_PATH. */
+ path_beneath.parent_fd = open(dir_s1d2, O_DIRECTORY | O_CLOEXEC);
+ ASSERT_LE(0, path_beneath.parent_fd);
+ ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_PATH_BENEATH,
+ &path_beneath, 0));
+ ASSERT_EQ(EBADFD, errno);
+ ASSERT_EQ(0, close(path_beneath.parent_fd));
+
+ /* Checks unhandled allowed_access. */
+ path_beneath.parent_fd = open(dir_s1d2, O_PATH | O_DIRECTORY |
+ O_CLOEXEC);
+ ASSERT_LE(0, path_beneath.parent_fd);
+
+ /* Test with legitimate values. */
+ path_beneath.allowed_access |= LANDLOCK_ACCESS_FS_EXECUTE;
+ ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_PATH_BENEATH,
+ &path_beneath, 0));
+ ASSERT_EQ(EINVAL, errno);
+ path_beneath.allowed_access &= ~LANDLOCK_ACCESS_FS_EXECUTE;
+
+ /* Test with unknown (64-bit) value. */
+ path_beneath.allowed_access |= (1ULL << 60);
+ ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_PATH_BENEATH,
+ &path_beneath, 0));
+ ASSERT_EQ(EINVAL, errno);
+ path_beneath.allowed_access &= ~(1ULL << 60);
+
+ /* Test with no access. */
+ path_beneath.allowed_access = 0;
+ ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_PATH_BENEATH,
+ &path_beneath, 0));
+ path_beneath.allowed_access &= ~(1ULL << 60);
+
+ ASSERT_EQ(0, close(path_beneath.parent_fd));
+
+ /* Enforces the ruleset. */
+ ASSERT_EQ(0, prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0));
+ ASSERT_EQ(0, landlock_enforce_ruleset_current(ruleset_fd, 0));
+
+ ASSERT_EQ(0, close(ruleset_fd));
+}
+
+#define ACCESS_FILE ( \
+ LANDLOCK_ACCESS_FS_EXECUTE | \
+ LANDLOCK_ACCESS_FS_WRITE_FILE | \
+ LANDLOCK_ACCESS_FS_READ_FILE)
+
+#define ACCESS_LAST LANDLOCK_ACCESS_FS_MAKE_SYM
+
+#define ACCESS_ALL ( \
+ ACCESS_FILE | \
+ LANDLOCK_ACCESS_FS_READ_DIR | \
+ LANDLOCK_ACCESS_FS_REMOVE_DIR | \
+ LANDLOCK_ACCESS_FS_REMOVE_FILE | \
+ LANDLOCK_ACCESS_FS_MAKE_CHAR | \
+ LANDLOCK_ACCESS_FS_MAKE_DIR | \
+ LANDLOCK_ACCESS_FS_MAKE_REG | \
+ LANDLOCK_ACCESS_FS_MAKE_SOCK | \
+ LANDLOCK_ACCESS_FS_MAKE_FIFO | \
+ LANDLOCK_ACCESS_FS_MAKE_BLOCK | \
+ ACCESS_LAST)
+
+TEST_F(layout1, file_access_rights)
+{
+ __u64 access;
+ int err;
+ struct landlock_path_beneath_attr path_beneath = {};
+ struct landlock_ruleset_attr ruleset_attr = {
+ .handled_access_fs = ACCESS_ALL,
+ };
+ const int ruleset_fd = landlock_create_ruleset(&ruleset_attr,
+ sizeof(ruleset_attr), 0);
+
+ ASSERT_LE(0, ruleset_fd);
+
+ /* Tests access rights for files. */
+ path_beneath.parent_fd = open(file1_s1d2, O_PATH | O_CLOEXEC);
+ ASSERT_LE(0, path_beneath.parent_fd);
+ for (access = 1; access <= ACCESS_LAST; access <<= 1) {
+ path_beneath.allowed_access = access;
+ err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_PATH_BENEATH,
+ &path_beneath, 0);
+ if ((access | ACCESS_FILE) == ACCESS_FILE) {
+ ASSERT_EQ(0, err);
+ } else {
+ ASSERT_EQ(-1, err);
+ ASSERT_EQ(EINVAL, errno);
+ }
+ }
+ ASSERT_EQ(0, close(path_beneath.parent_fd));
+}
+
+static void add_path_beneath(struct __test_metadata *const _metadata,
+ const int ruleset_fd, const __u64 allowed_access,
+ const char *const path)
+{
+ struct landlock_path_beneath_attr path_beneath = {
+ .allowed_access = allowed_access,
+ };
+
+ path_beneath.parent_fd = open(path, O_PATH | O_CLOEXEC);
+ ASSERT_LE(0, path_beneath.parent_fd) {
+ TH_LOG("Failed to open directory \"%s\": %s", path,
+ strerror(errno));
+ }
+ ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_PATH_BENEATH,
+ &path_beneath, 0)) {
+ TH_LOG("Failed to update the ruleset with \"%s\": %s", path,
+ strerror(errno));
+ }
+ ASSERT_EQ(0, close(path_beneath.parent_fd));
+}
+
+struct rule {
+ const char *path;
+ __u64 access;
+};
+
+#define ACCESS_RO ( \
+ LANDLOCK_ACCESS_FS_READ_FILE | \
+ LANDLOCK_ACCESS_FS_READ_DIR)
+
+#define ACCESS_RW ( \
+ ACCESS_RO | \
+ LANDLOCK_ACCESS_FS_WRITE_FILE)
+
+static int create_ruleset(struct __test_metadata *const _metadata,
+ const __u64 handled_access_fs, const struct rule rules[])
+{
+ int ruleset_fd, i;
+ struct landlock_ruleset_attr ruleset_attr = {
+ .handled_access_fs = handled_access_fs,
+ };
+
+ ASSERT_NE(NULL, rules) {
+ TH_LOG("No rule list");
+ }
+ ASSERT_NE(NULL, rules[0].path) {
+ TH_LOG("Empty rule list");
+ }
+
+ ruleset_fd = landlock_create_ruleset(&ruleset_attr,
+ sizeof(ruleset_attr), 0);
+ ASSERT_LE(0, ruleset_fd) {
+ TH_LOG("Failed to create a ruleset: %s", strerror(errno));
+ }
+
+ for (i = 0; rules[i].path; i++) {
+ add_path_beneath(_metadata, ruleset_fd, rules[i].access,
+ rules[i].path);
+ }
+ return ruleset_fd;
+}
+
+static void enforce_ruleset(struct __test_metadata *const _metadata,
+ const int ruleset_fd)
+{
+ ASSERT_EQ(0, prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0));
+ ASSERT_EQ(0, landlock_enforce_ruleset_current(ruleset_fd, 0)) {
+ TH_LOG("Failed to enforce ruleset: %s", strerror(errno));
+ }
+}
+
+TEST_F(layout1, proc_nsfs)
+{
+ const struct rule rules[] = {
+ {
+ .path = "/dev/null",
+ .access = LANDLOCK_ACCESS_FS_READ_FILE |
+ LANDLOCK_ACCESS_FS_WRITE_FILE,
+ },
+ {}
+ };
+ struct landlock_path_beneath_attr path_beneath;
+ const int ruleset_fd = create_ruleset(_metadata, rules[0].access |
+ LANDLOCK_ACCESS_FS_READ_DIR, rules);
+
+ ASSERT_LE(0, ruleset_fd);
+ ASSERT_EQ(0, test_open("/proc/self/ns/mnt", O_RDONLY));
+
+ enforce_ruleset(_metadata, ruleset_fd);
+
+ ASSERT_EQ(EACCES, test_open("/", O_RDONLY));
+ ASSERT_EQ(EACCES, test_open("/dev", O_RDONLY));
+ ASSERT_EQ(0, test_open("/dev/null", O_RDONLY));
+ ASSERT_EQ(EACCES, test_open("/dev/full", O_RDONLY));
+
+ ASSERT_EQ(EACCES, test_open("/proc", O_RDONLY));
+ ASSERT_EQ(EACCES, test_open("/proc/self", O_RDONLY));
+ ASSERT_EQ(EACCES, test_open("/proc/self/ns", O_RDONLY));
+ /*
+ * Because nsfs is an internal filesystem, /proc/self/ns/mnt is a
+ * disconnected path. Such a path cannot be identified and must
+ * therefore be allowed.
+ */
+ ASSERT_EQ(0, test_open("/proc/self/ns/mnt", O_RDONLY));
+
+ /*
+ * Checks that it is not possible to add nsfs-like filesystem
+ * references to a ruleset.
+ */
+ path_beneath.allowed_access = LANDLOCK_ACCESS_FS_READ_FILE |
+ LANDLOCK_ACCESS_FS_WRITE_FILE;
+ path_beneath.parent_fd = open("/proc/self/ns/mnt", O_PATH | O_CLOEXEC);
+ ASSERT_LE(0, path_beneath.parent_fd);
+ ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_PATH_BENEATH,
+ &path_beneath, 0));
+ ASSERT_EQ(EBADFD, errno);
+ ASSERT_EQ(0, close(path_beneath.parent_fd));
+}
+
+static void drop_privileges(struct __test_metadata *const _metadata)
+{
+ cap_t caps;
+ const cap_value_t cap_val = CAP_SYS_ADMIN;
+
+ caps = cap_get_proc();
+ ASSERT_NE(NULL, caps);
+ ASSERT_EQ(0, cap_set_flag(caps, CAP_EFFECTIVE, 1, &cap_val,
+ CAP_CLEAR));
+ ASSERT_EQ(0, cap_set_proc(caps));
+ ASSERT_EQ(0, cap_free(caps));
+}
+
+TEST_F(layout1, unpriv) {
+ const struct rule rules[] = {
+ {
+ .path = dir_s1d2,
+ .access = ACCESS_RO,
+ },
+ {}
+ };
+ int ruleset_fd;
+
+ drop_privileges(_metadata);
+ ruleset_fd = create_ruleset(_metadata, ACCESS_RO, rules);
+ ASSERT_LE(0, ruleset_fd);
+ ASSERT_EQ(-1, landlock_enforce_ruleset_current(ruleset_fd, 0));
+ ASSERT_EQ(EPERM, errno);
+
+ /* enforce_ruleset() calls prctl(no_new_privs). */
+ enforce_ruleset(_metadata, ruleset_fd);
+ ASSERT_EQ(0, close(ruleset_fd));
+}
+
+TEST_F(layout1, effective_access)
+{
+ const struct rule rules[] = {
+ {
+ .path = dir_s1d2,
+ .access = ACCESS_RO,
+ },
+ {
+ .path = file1_s2d2,
+ .access = LANDLOCK_ACCESS_FS_READ_FILE |
+ LANDLOCK_ACCESS_FS_WRITE_FILE,
+ },
+ {}
+ };
+ const int ruleset_fd = create_ruleset(_metadata, ACCESS_RW, rules);
+ char buf;
+ int reg_fd;
+
+ ASSERT_LE(0, ruleset_fd);
+ enforce_ruleset(_metadata, ruleset_fd);
+ ASSERT_EQ(0, close(ruleset_fd));
+
+ /* Tests on a directory. */
+ ASSERT_EQ(EACCES, test_open("/", O_RDONLY));
+ ASSERT_EQ(EACCES, test_open(dir_s1d1, O_RDONLY));
+ ASSERT_EQ(EACCES, test_open(file1_s1d1, O_RDONLY));
+ ASSERT_EQ(0, test_open(dir_s1d2, O_RDONLY));
+ ASSERT_EQ(0, test_open(file1_s1d2, O_RDONLY));
+ ASSERT_EQ(0, test_open(dir_s1d3, O_RDONLY));
+ ASSERT_EQ(0, test_open(file1_s1d3, O_RDONLY));
+
+ /* Tests on a file. */
+ ASSERT_EQ(EACCES, test_open(dir_s2d2, O_RDONLY));
+ ASSERT_EQ(0, test_open(file1_s2d2, O_RDONLY));
+
+ /* Checks effective read and write actions. */
+ reg_fd = open(file1_s2d2, O_RDWR | O_CLOEXEC);
+ ASSERT_LE(0, reg_fd);
+ ASSERT_EQ(1, write(reg_fd, ".", 1));
+ ASSERT_LE(0, lseek(reg_fd, 0, SEEK_SET));
+ ASSERT_EQ(1, read(reg_fd, &buf, 1));
+ ASSERT_EQ('.', buf);
+ ASSERT_EQ(0, close(reg_fd));
+
+ /* Just in case, double-checks effective actions. */
+ reg_fd = open(file1_s2d2, O_RDONLY | O_CLOEXEC);
+ ASSERT_LE(0, reg_fd);
+ ASSERT_EQ(-1, write(reg_fd, &buf, 1));
+ ASSERT_EQ(EBADF, errno);
+ ASSERT_EQ(0, close(reg_fd));
+}
+
+TEST_F(layout1, unhandled_access)
+{
+ const struct rule rules[] = {
+ {
+ .path = dir_s1d2,
+ .access = ACCESS_RO,
+ },
+ {}
+ };
+ /* Here, we only handle read accesses, not write accesses. */
+ const int ruleset_fd = create_ruleset(_metadata, ACCESS_RO, rules);
+
+ ASSERT_LE(0, ruleset_fd);
+ enforce_ruleset(_metadata, ruleset_fd);
+ ASSERT_EQ(0, close(ruleset_fd));
+
+ /*
+ * Because the policy does not handle LANDLOCK_ACCESS_FS_WRITE_FILE,
+ * opening for write-only should be allowed, but not read-write.
+ */
+ ASSERT_EQ(0, test_open(file1_s1d1, O_WRONLY));
+ ASSERT_EQ(EACCES, test_open(file1_s1d1, O_RDWR));
+
+ ASSERT_EQ(0, test_open(file1_s1d2, O_WRONLY));
+ ASSERT_EQ(0, test_open(file1_s1d2, O_RDWR));
+}
+
+TEST_F(layout1, ruleset_overlap)
+{
+ const struct rule rules[] = {
+ /* These rules should be ORed together. */
+ {
+ .path = dir_s1d2,
+ .access = LANDLOCK_ACCESS_FS_READ_FILE |
+ LANDLOCK_ACCESS_FS_WRITE_FILE,
+ },
+ {
+ .path = dir_s1d2,
+ .access = LANDLOCK_ACCESS_FS_READ_FILE |
+ LANDLOCK_ACCESS_FS_READ_DIR,
+ },
+ {}
+ };
+ const int ruleset_fd = create_ruleset(_metadata, ACCESS_RW, rules);
+
+ ASSERT_LE(0, ruleset_fd);
+ enforce_ruleset(_metadata, ruleset_fd);
+ ASSERT_EQ(0, close(ruleset_fd));
+
+ /* Checks s1d1 hierarchy. */
+ ASSERT_EQ(EACCES, test_open(file1_s1d1, O_RDONLY));
+ ASSERT_EQ(EACCES, test_open(file1_s1d1, O_WRONLY));
+ ASSERT_EQ(EACCES, test_open(file1_s1d1, O_RDWR));
+ ASSERT_EQ(EACCES, test_open(dir_s1d1, O_RDONLY | O_DIRECTORY));
+
+ /* Checks s1d2 hierarchy. */
+ ASSERT_EQ(0, test_open(file1_s1d2, O_RDONLY));
+ ASSERT_EQ(0, test_open(file1_s1d2, O_WRONLY));
+ ASSERT_EQ(0, test_open(file1_s1d2, O_RDWR));
+ ASSERT_EQ(0, test_open(dir_s1d2, O_RDONLY | O_DIRECTORY));
+
+ /* Checks s1d3 hierarchy. */
+ ASSERT_EQ(0, test_open(file1_s1d3, O_RDONLY));
+ ASSERT_EQ(0, test_open(file1_s1d3, O_WRONLY));
+ ASSERT_EQ(0, test_open(file1_s1d3, O_RDWR));
+ ASSERT_EQ(0, test_open(dir_s1d3, O_RDONLY | O_DIRECTORY));
+}
+
+TEST_F(layout1, interleaved_masked_accesses)
+{
+ /*
+ * Checks overly restrictive rules:
+ * layer 1: allows R s1d1/s1d2/s1d3/file1
+ * layer 2: allows R s1d1/s1d2/s1d3
+ * denies R s1d1/s1d2
+ * layer 3: allows R s1d1
+ * layer 4: denies W s1d1/s1d2
+ * layer 5: allows R s1d1/s1d2
+ * layer 6: denies R s1d1/s1d2
+ */
+ const struct rule layer1_read[] = {
+ /* Allows access to file1_s1d3 with the first layer. */
+ {
+ .path = file1_s1d3,
+ .access = LANDLOCK_ACCESS_FS_READ_FILE,
+ },
+ {}
+ };
+ const struct rule layer2_read[] = {
+ /* Start by granting access via its parent directory... */
+ {
+ .path = dir_s1d3,
+ .access = LANDLOCK_ACCESS_FS_READ_FILE,
+ },
+ /* ...but also denies access via its grandparent directory. */
+ {
+ .path = dir_s1d2,
+ .access = 0,
+ },
+ {}
+ };
+ const struct rule layer3_read[] = {
+ /* Allows access via its great-grandparent directory. */
+ {
+ .path = dir_s1d1,
+ .access = LANDLOCK_ACCESS_FS_READ_FILE,
+ },
+ {}
+ };
+ const struct rule layer4_write[] = {
+ /*
+ * Try to confuse the deny access by denying write (but not
+ * read) access via its grandparent directory.
+ */
+ {
+ .path = dir_s1d2,
+ .access = 0,
+ },
+ {}
+ };
+ const struct rule layer5_read[] = {
+ /*
+ * Try to override layer2's deny read access by explicitly
+ * allowing read access via file1_s1d3's grandparent.
+ */
+ {
+ .path = dir_s1d2,
+ .access = LANDLOCK_ACCESS_FS_READ_FILE,
+ },
+ {}
+ };
+ const struct rule layer6_read[] = {
+ /*
+ * Finally, denies read access to file1_s1d3 via its
+ * grandparent.
+ */
+ {
+ .path = dir_s1d2,
+ .access = 0,
+ },
+ {}
+ };
+ int ruleset_fd;
+
+ ruleset_fd = create_ruleset(_metadata, LANDLOCK_ACCESS_FS_READ_FILE,
+ layer1_read);
+ ASSERT_LE(0, ruleset_fd);
+ enforce_ruleset(_metadata, ruleset_fd);
+ ASSERT_EQ(0, close(ruleset_fd));
+
+ /* Checks that access is granted for file1_s1d3 with layer 1. */
+ ASSERT_EQ(0, test_open(file1_s1d3, O_RDWR));
+ ASSERT_EQ(EACCES, test_open(file2_s1d3, O_RDONLY));
+ ASSERT_EQ(0, test_open(file2_s1d3, O_WRONLY));
+
+ ruleset_fd = create_ruleset(_metadata, LANDLOCK_ACCESS_FS_READ_FILE,
+ layer2_read);
+ ASSERT_LE(0, ruleset_fd);
+ enforce_ruleset(_metadata, ruleset_fd);
+ ASSERT_EQ(0, close(ruleset_fd));
+
+ /* Checks that previous access rights are unchanged with layer 2. */
+ ASSERT_EQ(0, test_open(file1_s1d3, O_RDWR));
+ ASSERT_EQ(EACCES, test_open(file2_s1d3, O_RDONLY));
+ ASSERT_EQ(0, test_open(file2_s1d3, O_WRONLY));
+
+ ruleset_fd = create_ruleset(_metadata, LANDLOCK_ACCESS_FS_READ_FILE,
+ layer3_read);
+ ASSERT_LE(0, ruleset_fd);
+ enforce_ruleset(_metadata, ruleset_fd);
+ ASSERT_EQ(0, close(ruleset_fd));
+
+ /* Checks that previous access rights are unchanged with layer 3. */
+ ASSERT_EQ(0, test_open(file1_s1d3, O_RDWR));
+ ASSERT_EQ(EACCES, test_open(file2_s1d3, O_RDONLY));
+ ASSERT_EQ(0, test_open(file2_s1d3, O_WRONLY));
+
+ /* This time, creates a write-only rule. */
+ ruleset_fd = create_ruleset(_metadata, LANDLOCK_ACCESS_FS_WRITE_FILE,
+ layer4_write);
+ ASSERT_LE(0, ruleset_fd);
+ enforce_ruleset(_metadata, ruleset_fd);
+ ASSERT_EQ(0, close(ruleset_fd));
+
+ /*
+ * Checks that the only change with layer 4 is that write access is
+ * denied.
+ */
+ ASSERT_EQ(0, test_open(file1_s1d3, O_RDONLY));
+ ASSERT_EQ(EACCES, test_open(file1_s1d3, O_WRONLY));
+ ASSERT_EQ(EACCES, test_open(file2_s1d3, O_RDONLY));
+ ASSERT_EQ(EACCES, test_open(file2_s1d3, O_WRONLY));
+
+ ruleset_fd = create_ruleset(_metadata, LANDLOCK_ACCESS_FS_READ_FILE,
+ layer5_read);
+ ASSERT_LE(0, ruleset_fd);
+ enforce_ruleset(_metadata, ruleset_fd);
+ ASSERT_EQ(0, close(ruleset_fd));
+
+ /* Checks that previous access rights are unchanged with layer 5. */
+ ASSERT_EQ(0, test_open(file1_s1d3, O_RDONLY));
+ ASSERT_EQ(EACCES, test_open(file1_s1d3, O_WRONLY));
+ ASSERT_EQ(EACCES, test_open(file2_s1d3, O_WRONLY));
+ ASSERT_EQ(EACCES, test_open(file2_s1d3, O_RDONLY));
+
+ ruleset_fd = create_ruleset(_metadata, LANDLOCK_ACCESS_FS_READ_FILE,
+ layer6_read);
+ ASSERT_LE(0, ruleset_fd);
+ enforce_ruleset(_metadata, ruleset_fd);
+ ASSERT_EQ(0, close(ruleset_fd));
+
+ /* Checks read access is now denied with layer 6. */
+ ASSERT_EQ(EACCES, test_open(file1_s1d3, O_RDONLY));
+ ASSERT_EQ(EACCES, test_open(file1_s1d3, O_WRONLY));
+ ASSERT_EQ(EACCES, test_open(file2_s1d3, O_WRONLY));
+ ASSERT_EQ(EACCES, test_open(file2_s1d3, O_RDONLY));
+}
+
+TEST_F(layout1, inherit_subset)
+{
+ const struct rule rules[] = {
+ {
+ .path = dir_s1d2,
+ .access = LANDLOCK_ACCESS_FS_READ_FILE |
+ LANDLOCK_ACCESS_FS_READ_DIR,
+ },
+ {}
+ };
+ const int ruleset_fd = create_ruleset(_metadata, ACCESS_RW, rules);
+
+ ASSERT_LE(0, ruleset_fd);
+ enforce_ruleset(_metadata, ruleset_fd);
+
+ ASSERT_EQ(EACCES, test_open(file1_s1d1, O_WRONLY));
+ ASSERT_EQ(EACCES, test_open(dir_s1d1, O_RDONLY | O_DIRECTORY));
+
+ /* Write access is forbidden. */
+ ASSERT_EQ(EACCES, test_open(file1_s1d2, O_WRONLY));
+ /* Readdir access is allowed. */
+ ASSERT_EQ(0, test_open(dir_s1d2, O_RDONLY | O_DIRECTORY));
+
+ /* Write access is forbidden. */
+ ASSERT_EQ(EACCES, test_open(file1_s1d3, O_WRONLY));
+ /* Readdir access is allowed. */
+ ASSERT_EQ(0, test_open(dir_s1d3, O_RDONLY | O_DIRECTORY));
+
+ /*
+ * Tests shared rule extension: the following rules should not grant
+ * any new access, only remove some. Once enforced, these rules are
+ * ANDed with the previous ones.
+ */
+ add_path_beneath(_metadata, ruleset_fd, LANDLOCK_ACCESS_FS_WRITE_FILE,
+ dir_s1d2);
+ /*
+ * According to ruleset_fd, dir_s1d2 should now have the
+ * LANDLOCK_ACCESS_FS_READ_FILE and LANDLOCK_ACCESS_FS_WRITE_FILE
+ * access rights (even if this directory is opened a second time).
+ * However, when enforcing this updated ruleset, the ruleset tied to
+ * the current process (i.e. its domain) will still only grant
+ * LANDLOCK_ACCESS_FS_READ_FILE and LANDLOCK_ACCESS_FS_READ_DIR on
+ * dir_s1d2. LANDLOCK_ACCESS_FS_WRITE_FILE must not be allowed,
+ * because granting it after the first enforcement would be a
+ * privilege escalation.
+ */
+ enforce_ruleset(_metadata, ruleset_fd);
+
+ /* Same tests and results as above. */
+ ASSERT_EQ(EACCES, test_open(file1_s1d1, O_WRONLY));
+ ASSERT_EQ(EACCES, test_open(dir_s1d1, O_RDONLY | O_DIRECTORY));
+
+ /* It is still forbidden to write in file1_s1d2. */
+ ASSERT_EQ(EACCES, test_open(file1_s1d2, O_WRONLY));
+ /* Readdir access is still allowed. */
+ ASSERT_EQ(0, test_open(dir_s1d2, O_RDONLY | O_DIRECTORY));
+
+ /* It is still forbidden to write in file1_s1d3. */
+ ASSERT_EQ(EACCES, test_open(file1_s1d3, O_WRONLY));
+ /* Readdir access is still allowed. */
+ ASSERT_EQ(0, test_open(dir_s1d3, O_RDONLY | O_DIRECTORY));
+
+ /*
+ * Try to get more privileges by adding new access rights to the parent
+ * directory: dir_s1d1.
+ */
+ add_path_beneath(_metadata, ruleset_fd, ACCESS_RW, dir_s1d1);
+ enforce_ruleset(_metadata, ruleset_fd);
+
+ /* Same tests and results as above. */
+ ASSERT_EQ(EACCES, test_open(file1_s1d1, O_WRONLY));
+ ASSERT_EQ(EACCES, test_open(dir_s1d1, O_RDONLY | O_DIRECTORY));
+
+ /* It is still forbidden to write in file1_s1d2. */
+ ASSERT_EQ(EACCES, test_open(file1_s1d2, O_WRONLY));
+ /* Readdir access is still allowed. */
+ ASSERT_EQ(0, test_open(dir_s1d2, O_RDONLY | O_DIRECTORY));
+
+ /* It is still forbidden to write in file1_s1d3. */
+ ASSERT_EQ(EACCES, test_open(file1_s1d3, O_WRONLY));
+ /* Readdir access is still allowed. */
+ ASSERT_EQ(0, test_open(dir_s1d3, O_RDONLY | O_DIRECTORY));
+
+ /*
+ * Now, dir_s1d3 gets a new rule tied to it, only allowing
+ * LANDLOCK_ACCESS_FS_WRITE_FILE. The (kernel internal) difference is
+ * that there was no rule tied to it before.
+ */
+ add_path_beneath(_metadata, ruleset_fd, LANDLOCK_ACCESS_FS_WRITE_FILE,
+ dir_s1d3);
+ enforce_ruleset(_metadata, ruleset_fd);
+ ASSERT_EQ(0, close(ruleset_fd));
+
+ /*
+ * Same tests and results as above, except for open(dir_s1d3) which is
+ * now denied because the new rule masks the rule previously inherited
+ * from dir_s1d2.
+ */
+
+ /* Same tests and results as above. */
+ ASSERT_EQ(EACCES, test_open(file1_s1d1, O_WRONLY));
+ ASSERT_EQ(EACCES, test_open(dir_s1d1, O_RDONLY | O_DIRECTORY));
+
+ /* It is still forbidden to write in file1_s1d2. */
+ ASSERT_EQ(EACCES, test_open(file1_s1d2, O_WRONLY));
+ /* Readdir access is still allowed. */
+ ASSERT_EQ(0, test_open(dir_s1d2, O_RDONLY | O_DIRECTORY));
+
+ /* It is still forbidden to write in file1_s1d3. */
+ ASSERT_EQ(EACCES, test_open(file1_s1d3, O_WRONLY));
+ /* Readdir of dir_s1d3 is now forbidden too. */
+ ASSERT_EQ(EACCES, test_open(dir_s1d3, O_RDONLY | O_DIRECTORY));
+}
+
+TEST_F(layout1, inherit_superset)
+{
+ const struct rule rules[] = {
+ {
+ .path = dir_s1d3,
+ .access = ACCESS_RO,
+ },
+ {}
+ };
+ const int ruleset_fd = create_ruleset(_metadata, ACCESS_RW, rules);
+
+ ASSERT_LE(0, ruleset_fd);
+ enforce_ruleset(_metadata, ruleset_fd);
+
+ /* Readdir access is denied for dir_s1d2. */
+ ASSERT_EQ(EACCES, test_open(dir_s1d2, O_RDONLY | O_DIRECTORY));
+ /* Readdir access is allowed for dir_s1d3. */
+ ASSERT_EQ(0, test_open(dir_s1d3, O_RDONLY | O_DIRECTORY));
+ /* File access is allowed for file1_s1d3. */
+ ASSERT_EQ(0, test_open(file1_s1d3, O_RDONLY));
+
+ /* Now dir_s1d2, parent of dir_s1d3, gets a new rule tied to it. */
+ add_path_beneath(_metadata, ruleset_fd, LANDLOCK_ACCESS_FS_READ_FILE |
+ LANDLOCK_ACCESS_FS_READ_DIR, dir_s1d2);
+ enforce_ruleset(_metadata, ruleset_fd);
+ ASSERT_EQ(0, close(ruleset_fd));
+
+ /* Readdir access is still denied for dir_s1d2. */
+ ASSERT_EQ(EACCES, test_open(dir_s1d2, O_RDONLY | O_DIRECTORY));
+ /* Readdir access is still allowed for dir_s1d3. */
+ ASSERT_EQ(0, test_open(dir_s1d3, O_RDONLY | O_DIRECTORY));
+ /* File access is still allowed for file1_s1d3. */
+ ASSERT_EQ(0, test_open(file1_s1d3, O_RDONLY));
+}
+
+TEST_F(layout1, max_layers)
+{
+ int i, err;
+ const struct rule rules[] = {
+ {
+ .path = dir_s1d2,
+ .access = ACCESS_RO,
+ },
+ {}
+ };
+ const int ruleset_fd = create_ruleset(_metadata, ACCESS_RW, rules);
+
+ ASSERT_LE(0, ruleset_fd);
+ for (i = 0; i < 64; i++)
+ enforce_ruleset(_metadata, ruleset_fd);
+
+ for (i = 0; i < 2; i++) {
+ err = landlock_enforce_ruleset_current(ruleset_fd, 0);
+ ASSERT_EQ(-1, err);
+ ASSERT_EQ(E2BIG, errno);
+ }
+ ASSERT_EQ(0, close(ruleset_fd));
+}
+
+TEST_F(layout1, empty_or_same_ruleset)
+{
+ struct landlock_ruleset_attr ruleset_attr = {};
+ int ruleset_fd;
+
+ /* Tests empty handled_access_fs. */
+ ruleset_fd = landlock_create_ruleset(&ruleset_attr,
+ sizeof(ruleset_attr), 0);
+ ASSERT_EQ(-1, ruleset_fd);
+ ASSERT_EQ(ENOMSG, errno);
+
+ /* Enforces a policy which denies read access to all files. */
+ ruleset_attr.handled_access_fs = LANDLOCK_ACCESS_FS_READ_FILE;
+ ruleset_fd = landlock_create_ruleset(&ruleset_attr,
+ sizeof(ruleset_attr), 0);
+ ASSERT_LE(0, ruleset_fd);
+ enforce_ruleset(_metadata, ruleset_fd);
+ ASSERT_EQ(EACCES, test_open(file1_s1d1, O_RDONLY));
+ ASSERT_EQ(0, test_open(dir_s1d1, O_RDONLY));
+
+ /* Nests a policy which denies read access to all directories. */
+ ruleset_attr.handled_access_fs = LANDLOCK_ACCESS_FS_READ_DIR;
+ ruleset_fd = landlock_create_ruleset(&ruleset_attr,
+ sizeof(ruleset_attr), 0);
+ ASSERT_LE(0, ruleset_fd);
+ enforce_ruleset(_metadata, ruleset_fd);
+ ASSERT_EQ(EACCES, test_open(file1_s1d1, O_RDONLY));
+ ASSERT_EQ(EACCES, test_open(dir_s1d1, O_RDONLY));
+
+ /* Enforces a second time with the same ruleset. */
+ enforce_ruleset(_metadata, ruleset_fd);
+ ASSERT_EQ(0, close(ruleset_fd));
+}
+
+TEST_F(layout1, rule_on_mountpoint)
+{
+ const struct rule rules[] = {
+ {
+ .path = dir_s1d1,
+ .access = ACCESS_RO,
+ },
+ {
+ /* dir_s3d2 is a mount point. */
+ .path = dir_s3d2,
+ .access = ACCESS_RO,
+ },
+ {}
+ };
+ const int ruleset_fd = create_ruleset(_metadata, ACCESS_RW, rules);
+
+ ASSERT_LE(0, ruleset_fd);
+ enforce_ruleset(_metadata, ruleset_fd);
+ ASSERT_EQ(0, close(ruleset_fd));
+
+ ASSERT_EQ(0, test_open(dir_s1d1, O_RDONLY));
+
+ ASSERT_EQ(EACCES, test_open(dir_s2d1, O_RDONLY));
+
+ ASSERT_EQ(EACCES, test_open(dir_s3d1, O_RDONLY));
+ ASSERT_EQ(0, test_open(dir_s3d2, O_RDONLY));
+ ASSERT_EQ(0, test_open(dir_s3d3, O_RDONLY));
+}
+
+TEST_F(layout1, rule_over_mountpoint)
+{
+ const struct rule rules[] = {
+ {
+ .path = dir_s1d1,
+ .access = ACCESS_RO,
+ },
+ {
+ /* dir_s3d1 is the parent of the dir_s3d2 mount point. */
+ .path = dir_s3d1,
+ .access = ACCESS_RO,
+ },
+ {}
+ };
+ const int ruleset_fd = create_ruleset(_metadata, ACCESS_RW, rules);
+
+ ASSERT_LE(0, ruleset_fd);
+ enforce_ruleset(_metadata, ruleset_fd);
+ ASSERT_EQ(0, close(ruleset_fd));
+
+ ASSERT_EQ(0, test_open(dir_s1d1, O_RDONLY));
+
+ ASSERT_EQ(EACCES, test_open(dir_s2d1, O_RDONLY));
+
+ ASSERT_EQ(0, test_open(dir_s3d1, O_RDONLY));
+ ASSERT_EQ(0, test_open(dir_s3d2, O_RDONLY));
+ ASSERT_EQ(0, test_open(dir_s3d3, O_RDONLY));
+}
+
+/*
+ * This test verifies that we can apply a Landlock rule on the root directory
+ * (which might require special handling).
+ */
+TEST_F(layout1, rule_over_root_allow_then_deny)
+{
+ struct rule rules[] = {
+ {
+ .path = "/",
+ .access = ACCESS_RO,
+ },
+ {}
+ };
+ int ruleset_fd = create_ruleset(_metadata, ACCESS_RW, rules);
+
+ ASSERT_LE(0, ruleset_fd);
+ enforce_ruleset(_metadata, ruleset_fd);
+ ASSERT_EQ(0, close(ruleset_fd));
+
+ /* Checks allowed access. */
+ ASSERT_EQ(0, test_open("/", O_RDONLY));
+ ASSERT_EQ(0, test_open(dir_s1d1, O_RDONLY));
+
+ rules[0].access = LANDLOCK_ACCESS_FS_READ_FILE;
+ ruleset_fd = create_ruleset(_metadata, ACCESS_RW, rules);
+ ASSERT_LE(0, ruleset_fd);
+ enforce_ruleset(_metadata, ruleset_fd);
+ ASSERT_EQ(0, close(ruleset_fd));
+
+ /* Checks denied access (on a directory). */
+ ASSERT_EQ(EACCES, test_open("/", O_RDONLY));
+ ASSERT_EQ(EACCES, test_open(dir_s1d1, O_RDONLY));
+}
+
+TEST_F(layout1, rule_over_root_deny)
+{
+ const struct rule rules[] = {
+ {
+ .path = "/",
+ .access = LANDLOCK_ACCESS_FS_READ_FILE,
+ },
+ {}
+ };
+ const int ruleset_fd = create_ruleset(_metadata, ACCESS_RW, rules);
+
+ ASSERT_LE(0, ruleset_fd);
+ enforce_ruleset(_metadata, ruleset_fd);
+ ASSERT_EQ(0, close(ruleset_fd));
+
+ /* Checks denied access (on a directory). */
+ ASSERT_EQ(EACCES, test_open("/", O_RDONLY));
+ ASSERT_EQ(EACCES, test_open(dir_s1d1, O_RDONLY));
+}
+
+TEST_F(layout1, rule_inside_mount_ns)
+{
+ const struct rule rules[] = {
+ {
+ .path = "s3d3",
+ .access = ACCESS_RO,
+ },
+ {}
+ };
+ int ruleset_fd;
+
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ ASSERT_EQ(0, mount(NULL, "/", NULL, MS_PRIVATE | MS_REC, NULL));
+ ASSERT_EQ(0, syscall(SYS_pivot_root, dir_s3d2, dir_s3d3)) {
+ TH_LOG("Failed to pivot_root into \"%s\": %s", dir_s3d2,
+ strerror(errno));
+ };
+ ASSERT_EQ(0, chdir("/"));
+ clear_cap(_metadata, CAP_SYS_ADMIN);
+
+ ruleset_fd = create_ruleset(_metadata, ACCESS_RW, rules);
+ ASSERT_LE(0, ruleset_fd);
+ enforce_ruleset(_metadata, ruleset_fd);
+ ASSERT_EQ(0, close(ruleset_fd));
+
+ ASSERT_EQ(0, test_open("s3d3", O_RDONLY));
+ ASSERT_EQ(EACCES, test_open("/", O_RDONLY));
+}
+
+TEST_F(layout1, mount_and_pivot)
+{
+ const struct rule rules[] = {
+ {
+ .path = dir_s3d2,
+ .access = ACCESS_RO,
+ },
+ {}
+ };
+ const int ruleset_fd = create_ruleset(_metadata, ACCESS_RW, rules);
+
+ ASSERT_LE(0, ruleset_fd);
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ ASSERT_EQ(0, mount(NULL, "/", NULL, MS_PRIVATE | MS_REC, NULL));
+
+ enforce_ruleset(_metadata, ruleset_fd);
+ ASSERT_EQ(0, close(ruleset_fd));
+
+ ASSERT_EQ(-1, mount(NULL, "/", NULL, MS_PRIVATE | MS_REC, NULL));
+ ASSERT_EQ(EPERM, errno);
+ ASSERT_EQ(-1, syscall(SYS_pivot_root, dir_s3d2, dir_s3d3));
+ ASSERT_EQ(EPERM, errno);
+}
+
+TEST_F(layout1, move_mount)
+{
+ const struct rule rules[] = {
+ {
+ .path = dir_s3d2,
+ .access = ACCESS_RO,
+ },
+ {}
+ };
+ const int ruleset_fd = create_ruleset(_metadata, ACCESS_RW, rules);
+
+ ASSERT_LE(0, ruleset_fd);
+
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ ASSERT_EQ(0, mount(NULL, "/", NULL, MS_PRIVATE | MS_REC, NULL));
+ ASSERT_EQ(0, syscall(SYS_move_mount, AT_FDCWD, dir_s3d2, AT_FDCWD,
+ dir_s1d2, 0)) {
+ TH_LOG("Failed to move_mount: %s", strerror(errno));
+ }
+ ASSERT_EQ(0, syscall(SYS_move_mount, AT_FDCWD, dir_s1d2, AT_FDCWD,
+ dir_s3d2, 0));
+
+ enforce_ruleset(_metadata, ruleset_fd);
+ ASSERT_EQ(0, close(ruleset_fd));
+
+ ASSERT_EQ(-1, syscall(SYS_move_mount, AT_FDCWD, dir_s3d2, AT_FDCWD,
+ dir_s1d2, 0));
+ ASSERT_EQ(EPERM, errno);
+}
+
+TEST_F(layout1, release_inodes)
+{
+ const struct rule rules[] = {
+ {
+ .path = dir_s1d1,
+ .access = ACCESS_RO,
+ },
+ {
+ .path = dir_s3d2,
+ .access = ACCESS_RO,
+ },
+ {
+ .path = dir_s3d3,
+ .access = ACCESS_RO,
+ },
+ {}
+ };
+ const int ruleset_fd = create_ruleset(_metadata, ACCESS_RW, rules);
+
+ ASSERT_LE(0, ruleset_fd);
+ /* Unmount a file hierarchy while it is being used by a ruleset. */
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ ASSERT_EQ(0, umount(dir_s3d2));
+ clear_cap(_metadata, CAP_SYS_ADMIN);
+
+ enforce_ruleset(_metadata, ruleset_fd);
+ ASSERT_EQ(0, close(ruleset_fd));
+
+ ASSERT_EQ(0, test_open(file1_s1d1, O_RDONLY));
+ ASSERT_EQ(EACCES, test_open(dir_s3d2, O_RDONLY));
+ /* This dir_s3d3 would not be allowed and does not exist anyway. */
+ ASSERT_EQ(ENOENT, test_open(dir_s3d3, O_RDONLY));
+}
+
+enum relative_access {
+ REL_OPEN,
+ REL_CHDIR,
+ REL_CHROOT_ONLY,
+ REL_CHROOT_CHDIR,
+};
+
+static void test_relative_path(struct __test_metadata *const _metadata,
+ const enum relative_access rel)
+{
+ const struct rule rules[] = {
+ {
+ .path = dir_s1d2,
+ .access = ACCESS_RO,
+ },
+ {
+ .path = dir_s2d2,
+ .access = ACCESS_RO,
+ },
+ {}
+ };
+ int dirfd;
+ const int ruleset_fd = create_ruleset(_metadata, ACCESS_RW, rules);
+
+ ASSERT_LE(0, ruleset_fd);
+ switch (rel) {
+ case REL_OPEN:
+ case REL_CHDIR:
+ break;
+ case REL_CHROOT_ONLY:
+ ASSERT_EQ(0, chdir(dir_s2d2));
+ break;
+ case REL_CHROOT_CHDIR:
+ ASSERT_EQ(0, chdir(dir_s1d2));
+ break;
+ default:
+ ASSERT_TRUE(false);
+ return;
+ }
+
+ set_cap(_metadata, CAP_SYS_CHROOT);
+ enforce_ruleset(_metadata, ruleset_fd);
+
+ switch (rel) {
+ case REL_OPEN:
+ dirfd = open(dir_s1d2, O_DIRECTORY);
+ ASSERT_LE(0, dirfd);
+ break;
+ case REL_CHDIR:
+ ASSERT_EQ(0, chdir(dir_s1d2));
+ dirfd = AT_FDCWD;
+ break;
+ case REL_CHROOT_ONLY:
+ /* Do chroot into dir_s1d2 (relative to dir_s2d2). */
+ ASSERT_EQ(0, chroot("../../s1d1/s1d2")) {
+ TH_LOG("Failed to chroot: %s", strerror(errno));
+ }
+ dirfd = AT_FDCWD;
+ break;
+ case REL_CHROOT_CHDIR:
+ /* Do chroot into dir_s1d2. */
+ ASSERT_EQ(0, chroot(".")) {
+ TH_LOG("Failed to chroot: %s", strerror(errno));
+ }
+ dirfd = AT_FDCWD;
+ break;
+ }
+
+ ASSERT_EQ((rel == REL_CHROOT_CHDIR) ? 0 : EACCES,
+ test_open_rel(dirfd, "..", O_RDONLY));
+ ASSERT_EQ(0, test_open_rel(dirfd, ".", O_RDONLY));
+
+ if (rel == REL_CHROOT_ONLY) {
+ /* The current directory is dir_s2d2. */
+ ASSERT_EQ(0, test_open_rel(dirfd, "./s2d3", O_RDONLY));
+ } else {
+ /* The current directory is dir_s1d2. */
+ ASSERT_EQ(0, test_open_rel(dirfd, "./s1d3", O_RDONLY));
+ }
+
+ if (rel != REL_CHROOT_CHDIR) {
+ ASSERT_EQ(EACCES, test_open_rel(dirfd, "../../s1d1", O_RDONLY));
+ ASSERT_EQ(0, test_open_rel(dirfd, "../../s1d1/s1d2", O_RDONLY));
+ ASSERT_EQ(0, test_open_rel(dirfd, "../../s1d1/s1d2/s1d3", O_RDONLY));
+
+ ASSERT_EQ(EACCES, test_open_rel(dirfd, "../../s2d1", O_RDONLY));
+ ASSERT_EQ(0, test_open_rel(dirfd, "../../s2d1/s2d2", O_RDONLY));
+ ASSERT_EQ(0, test_open_rel(dirfd, "../../s2d1/s2d2/s2d3", O_RDONLY));
+ }
+
+ if (rel == REL_OPEN)
+ ASSERT_EQ(0, close(dirfd));
+ ASSERT_EQ(0, close(ruleset_fd));
+}
+
+TEST_F(layout1, relative_open)
+{
+ test_relative_path(_metadata, REL_OPEN);
+}
+
+TEST_F(layout1, relative_chdir)
+{
+ test_relative_path(_metadata, REL_CHDIR);
+}
+
+TEST_F(layout1, relative_chroot_only)
+{
+ test_relative_path(_metadata, REL_CHROOT_ONLY);
+}
+
+TEST_F(layout1, relative_chroot_chdir)
+{
+ test_relative_path(_metadata, REL_CHROOT_CHDIR);
+}
+
+static void copy_binary(struct __test_metadata *const _metadata,
+ const char *const dst_path)
+{
+ int dst_fd, src_fd;
+ struct stat statbuf;
+
+ dst_fd = open(dst_path, O_WRONLY | O_TRUNC | O_CLOEXEC);
+ ASSERT_LE(0, dst_fd) {
+ TH_LOG("Failed to open \"%s\": %s", dst_path,
+ strerror(errno));
+ }
+ src_fd = open(BINARY_PATH, O_RDONLY | O_CLOEXEC);
+ ASSERT_LE(0, src_fd) {
+ TH_LOG("Failed to open \"" BINARY_PATH "\": %s",
+ strerror(errno));
+ }
+ ASSERT_EQ(0, fstat(src_fd, &statbuf));
+ ASSERT_EQ(statbuf.st_size, sendfile(dst_fd, src_fd, 0,
+ statbuf.st_size));
+ ASSERT_EQ(0, close(src_fd));
+ ASSERT_EQ(0, close(dst_fd));
+}
+
+static void test_execute(struct __test_metadata *const _metadata,
+ const char *const path, const int ret)
+{
+ int status;
+ char *const argv[] = {(char *)path, NULL};
+ const pid_t child = fork();
+
+ ASSERT_LE(0, child);
+ if (child == 0) {
+ ASSERT_EQ(ret, execve(path, argv, NULL)) {
+ TH_LOG("Failed to execute \"%s\": %s", path,
+ strerror(errno));
+ };
+ ASSERT_EQ(EACCES, errno);
+ _exit(_metadata->passed ? 2 : 1);
+ return;
+ }
+ ASSERT_EQ(child, waitpid(child, &status, 0));
+ ASSERT_EQ(1, WIFEXITED(status));
+ ASSERT_EQ(ret ? 2 : 0, WEXITSTATUS(status)) {
+ TH_LOG("Unexpected return code for \"%s\": %s", path,
+ strerror(errno));
+ };
+}
+
+TEST_F(layout1, execute)
+{
+ const struct rule rules[] = {
+ {
+ .path = dir_s1d2,
+ .access = LANDLOCK_ACCESS_FS_EXECUTE,
+ },
+ {}
+ };
+ const int ruleset_fd = create_ruleset(_metadata, rules[0].access,
+ rules);
+
+ ASSERT_LE(0, ruleset_fd);
+ copy_binary(_metadata, file1_s1d1);
+ copy_binary(_metadata, file1_s1d2);
+ copy_binary(_metadata, file1_s1d3);
+
+ enforce_ruleset(_metadata, ruleset_fd);
+ ASSERT_EQ(0, close(ruleset_fd));
+
+ test_execute(_metadata, file1_s1d1, -1);
+ test_execute(_metadata, file1_s1d2, 0);
+ test_execute(_metadata, file1_s1d3, 0);
+}
+
+TEST_F(layout1, link)
+{
+ const struct rule rules[] = {
+ {
+ .path = dir_s1d2,
+ .access = LANDLOCK_ACCESS_FS_MAKE_REG,
+ },
+ {}
+ };
+ const int ruleset_fd = create_ruleset(_metadata, rules[0].access,
+ rules);
+
+ ASSERT_LE(0, ruleset_fd);
+
+ ASSERT_EQ(0, unlink(file1_s1d1));
+ ASSERT_EQ(0, unlink(file1_s1d2));
+ ASSERT_EQ(0, unlink(file1_s1d3));
+
+ enforce_ruleset(_metadata, ruleset_fd);
+ ASSERT_EQ(0, close(ruleset_fd));
+
+ ASSERT_EQ(-1, link(file2_s1d1, file1_s1d1));
+ ASSERT_EQ(EACCES, errno);
+ /* Denies linking because of reparenting. */
+ ASSERT_EQ(-1, link(file1_s2d1, file1_s1d2));
+ ASSERT_EQ(EACCES, errno);
+ ASSERT_EQ(-1, link(file2_s1d2, file1_s1d3));
+ ASSERT_EQ(EACCES, errno);
+
+ ASSERT_EQ(0, link(file2_s1d2, file1_s1d2)) {
+ TH_LOG("Failed to link file to \"%s\": %s", file2_s1d2,
+ strerror(errno));
+ };
+ ASSERT_EQ(0, link(file2_s1d3, file1_s1d3));
+}
+
+TEST_F(layout1, rename_file)
+{
+ const struct rule rules[] = {
+ {
+ .path = dir_s1d3,
+ .access = LANDLOCK_ACCESS_FS_REMOVE_FILE,
+ },
+ {
+ .path = dir_s2d2,
+ .access = LANDLOCK_ACCESS_FS_REMOVE_FILE,
+ },
+ {}
+ };
+ const int ruleset_fd = create_ruleset(_metadata, rules[0].access,
+ rules);
+
+ ASSERT_LE(0, ruleset_fd);
+
+ ASSERT_EQ(0, unlink(file1_s1d1));
+ ASSERT_EQ(0, unlink(file1_s1d2));
+
+ enforce_ruleset(_metadata, ruleset_fd);
+ ASSERT_EQ(0, close(ruleset_fd));
+
+ /* Replaces file. */
+ ASSERT_EQ(-1, rename(file1_s2d3, file1_s1d3));
+ ASSERT_EQ(EACCES, errno);
+ ASSERT_EQ(-1, rename(file1_s2d1, file1_s1d3));
+ ASSERT_EQ(EACCES, errno);
+ /* Same parent. */
+ ASSERT_EQ(0, rename(file2_s2d3, file1_s2d3)) {
+ TH_LOG("Failed to rename file \"%s\": %s", file2_s2d3,
+ strerror(errno));
+ };
+
+ /* Renames files. */
+ ASSERT_EQ(-1, rename(file1_s2d2, file1_s1d2));
+ ASSERT_EQ(EACCES, errno);
+ ASSERT_EQ(0, unlink(file1_s1d3));
+ ASSERT_EQ(-1, rename(file1_s2d1, file1_s1d3));
+ ASSERT_EQ(EACCES, errno);
+ /* Same parent. */
+ ASSERT_EQ(0, rename(file2_s1d3, file1_s1d3));
+}
+
+TEST_F(layout1, rename_dir)
+{
+ const struct rule rules[] = {
+ {
+ .path = dir_s1d2,
+ .access = LANDLOCK_ACCESS_FS_REMOVE_DIR,
+ },
+ {
+ .path = dir_s2d1,
+ .access = LANDLOCK_ACCESS_FS_REMOVE_DIR,
+ },
+ {}
+ };
+ const int ruleset_fd = create_ruleset(_metadata, rules[0].access,
+ rules);
+
+ ASSERT_LE(0, ruleset_fd);
+
+ /* Empties dir_s1d3. */
+ ASSERT_EQ(0, unlink(file1_s1d3));
+ ASSERT_EQ(0, unlink(file2_s1d3));
+
+ enforce_ruleset(_metadata, ruleset_fd);
+ ASSERT_EQ(0, close(ruleset_fd));
+
+ /* Renames directory. */
+ ASSERT_EQ(-1, rename(dir_s2d3, dir_s1d3));
+ ASSERT_EQ(EACCES, errno);
+ ASSERT_EQ(0, unlink(file1_s1d2));
+ ASSERT_EQ(0, rename(dir_s1d3, file1_s1d2)) {
+ TH_LOG("Failed to rename directory \"%s\": %s", dir_s1d3,
+ strerror(errno));
+ };
+ ASSERT_EQ(0, rmdir(file1_s1d2));
+}
+
+TEST_F(layout1, rmdir)
+{
+ const struct rule rules[] = {
+ {
+ .path = dir_s1d2,
+ .access = LANDLOCK_ACCESS_FS_REMOVE_DIR,
+ },
+ {}
+ };
+ const int ruleset_fd = create_ruleset(_metadata, rules[0].access,
+ rules);
+
+ ASSERT_LE(0, ruleset_fd);
+
+ ASSERT_EQ(0, unlink(file1_s1d1));
+ ASSERT_EQ(0, unlink(file1_s1d2));
+ ASSERT_EQ(0, unlink(file1_s1d3));
+ ASSERT_EQ(0, unlink(file2_s1d3));
+
+ enforce_ruleset(_metadata, ruleset_fd);
+ ASSERT_EQ(0, close(ruleset_fd));
+
+ ASSERT_EQ(0, rmdir(dir_s1d3));
+ /* dir_s1d2 itself cannot be removed. */
+ ASSERT_EQ(-1, rmdir(dir_s1d2));
+ ASSERT_EQ(EACCES, errno);
+ ASSERT_EQ(-1, rmdir(dir_s1d1));
+ ASSERT_EQ(EACCES, errno);
+}
+
+TEST_F(layout1, unlink)
+{
+ const struct rule rules[] = {
+ {
+ .path = dir_s1d2,
+ .access = LANDLOCK_ACCESS_FS_REMOVE_FILE,
+ },
+ {}
+ };
+ const int ruleset_fd = create_ruleset(_metadata, rules[0].access,
+ rules);
+
+ ASSERT_LE(0, ruleset_fd);
+ enforce_ruleset(_metadata, ruleset_fd);
+ ASSERT_EQ(0, close(ruleset_fd));
+
+ ASSERT_EQ(-1, unlink(file1_s1d1));
+ ASSERT_EQ(EACCES, errno);
+ ASSERT_EQ(0, unlink(file1_s1d2)) {
+ TH_LOG("Failed to unlink file \"%s\": %s", file1_s1d2,
+ strerror(errno));
+ };
+ ASSERT_EQ(0, unlink(file1_s1d3));
+}
+
+static void test_make_file(struct __test_metadata *const _metadata,
+ const __u64 access, const mode_t mode, const dev_t dev)
+{
+ const struct rule rules[] = {
+ {
+ .path = dir_s1d2,
+ .access = access,
+ },
+ {}
+ };
+ const int ruleset_fd = create_ruleset(_metadata, access, rules);
+
+ ASSERT_LE(0, ruleset_fd);
+
+ unlink(file1_s1d1);
+ unlink(file1_s1d2);
+ unlink(file1_s1d3);
+
+ enforce_ruleset(_metadata, ruleset_fd);
+ ASSERT_EQ(0, close(ruleset_fd));
+
+ ASSERT_EQ(-1, mknod(file1_s1d1, mode | 0400, dev));
+ ASSERT_EQ(EACCES, errno);
+ ASSERT_EQ(0, mknod(file1_s1d2, mode | 0400, dev)) {
+ TH_LOG("Failed to make file \"%s\": %s",
+ file1_s1d2, strerror(errno));
+ };
+ ASSERT_EQ(0, mknod(file1_s1d3, mode | 0400, dev));
+}
+
+TEST_F(layout1, make_char)
+{
+ /* Creates a /dev/null device. */
+ set_cap(_metadata, CAP_MKNOD);
+ test_make_file(_metadata, LANDLOCK_ACCESS_FS_MAKE_CHAR, S_IFCHR,
+ makedev(1, 3));
+}
+
+TEST_F(layout1, make_block)
+{
+ /* Creates a /dev/loop0 device. */
+ set_cap(_metadata, CAP_MKNOD);
+ test_make_file(_metadata, LANDLOCK_ACCESS_FS_MAKE_BLOCK, S_IFBLK,
+ makedev(7, 0));
+}
+
+TEST_F(layout1, make_reg)
+{
+ test_make_file(_metadata, LANDLOCK_ACCESS_FS_MAKE_REG, S_IFREG, 0);
+ test_make_file(_metadata, LANDLOCK_ACCESS_FS_MAKE_REG, 0, 0);
+}
+
+TEST_F(layout1, make_sock)
+{
+ test_make_file(_metadata, LANDLOCK_ACCESS_FS_MAKE_SOCK, S_IFSOCK, 0);
+}
+
+TEST_F(layout1, make_fifo)
+{
+ test_make_file(_metadata, LANDLOCK_ACCESS_FS_MAKE_FIFO, S_IFIFO, 0);
+}
+
+TEST_F(layout1, make_sym)
+{
+ const struct rule rules[] = {
+ {
+ .path = dir_s1d2,
+ .access = LANDLOCK_ACCESS_FS_MAKE_SYM,
+ },
+ {}
+ };
+ const int ruleset_fd = create_ruleset(_metadata, rules[0].access,
+ rules);
+
+ ASSERT_LE(0, ruleset_fd);
+
+ ASSERT_EQ(0, unlink(file1_s1d1));
+ ASSERT_EQ(0, unlink(file1_s1d2));
+ ASSERT_EQ(0, unlink(file1_s1d3));
+
+ enforce_ruleset(_metadata, ruleset_fd);
+ ASSERT_EQ(0, close(ruleset_fd));
+
+ ASSERT_EQ(-1, symlink("none", file1_s1d1));
+ ASSERT_EQ(EACCES, errno);
+ ASSERT_EQ(0, symlink("none", file1_s1d2)) {
+ TH_LOG("Failed to make symlink \"%s\": %s",
+ file1_s1d2, strerror(errno));
+ };
+ ASSERT_EQ(0, symlink("none", file1_s1d3));
+}
+
+TEST_F(layout1, make_dir)
+{
+ const struct rule rules[] = {
+ {
+ .path = dir_s1d2,
+ .access = LANDLOCK_ACCESS_FS_MAKE_DIR,
+ },
+ {}
+ };
+ const int ruleset_fd = create_ruleset(_metadata, rules[0].access,
+ rules);
+
+ ASSERT_LE(0, ruleset_fd);
+
+ ASSERT_EQ(0, unlink(file1_s1d1));
+ ASSERT_EQ(0, unlink(file1_s1d2));
+ ASSERT_EQ(0, unlink(file1_s1d3));
+
+ enforce_ruleset(_metadata, ruleset_fd);
+ ASSERT_EQ(0, close(ruleset_fd));
+
+ /* Uses file_* as directory names. */
+ ASSERT_EQ(-1, mkdir(file1_s1d1, 0700));
+ ASSERT_EQ(EACCES, errno);
+ ASSERT_EQ(0, mkdir(file1_s1d2, 0700)) {
+ TH_LOG("Failed to make directory \"%s\": %s",
+ file1_s1d2, strerror(errno));
+ };
+ ASSERT_EQ(0, mkdir(file1_s1d3, 0700));
+}
+
+static int open_proc_fd(struct __test_metadata *const _metadata, const int fd,
+ const int open_flags)
+{
+ static const char path_template[] = "/proc/self/fd/%d";
+ char procfd_path[sizeof(path_template) + 10];
+ const int procfd_path_size = snprintf(procfd_path, sizeof(procfd_path),
+ path_template, fd);
+
+ ASSERT_LT(procfd_path_size, sizeof(procfd_path));
+ return open(procfd_path, open_flags);
+}
+
+TEST_F(layout1, proc_unlinked_file)
+{
+ const struct rule rules[] = {
+ {
+ .path = file1_s1d2,
+ .access = LANDLOCK_ACCESS_FS_READ_FILE,
+ },
+ {}
+ };
+ int reg_fd, proc_fd;
+ const int ruleset_fd = create_ruleset(_metadata,
+ LANDLOCK_ACCESS_FS_READ_FILE |
+ LANDLOCK_ACCESS_FS_WRITE_FILE, rules);
+
+ ASSERT_LE(0, ruleset_fd);
+ enforce_ruleset(_metadata, ruleset_fd);
+ ASSERT_EQ(0, close(ruleset_fd));
+
+ ASSERT_EQ(EACCES, test_open(file1_s1d2, O_RDWR));
+ ASSERT_EQ(0, test_open(file1_s1d2, O_RDONLY));
+ reg_fd = open(file1_s1d2, O_RDONLY | O_CLOEXEC);
+ ASSERT_LE(0, reg_fd);
+ ASSERT_EQ(0, unlink(file1_s1d2));
+
+ proc_fd = open_proc_fd(_metadata, reg_fd, O_RDONLY | O_CLOEXEC);
+ ASSERT_LE(0, proc_fd);
+ ASSERT_EQ(0, close(proc_fd));
+
+ proc_fd = open_proc_fd(_metadata, reg_fd, O_RDWR | O_CLOEXEC);
+ ASSERT_EQ(-1, proc_fd) {
+ TH_LOG("Successfully opened /proc/self/fd/%d: %s",
+ reg_fd, strerror(errno));
+ }
+ ASSERT_EQ(EACCES, errno);
+
+ ASSERT_EQ(0, close(reg_fd));
+}
+
+TEST_F(layout1, proc_pipe)
+{
+ int proc_fd;
+ int pipe_fds[2];
+ char buf = '\0';
+ const struct rule rules[] = {
+ {
+ .path = dir_s1d2,
+ .access = LANDLOCK_ACCESS_FS_READ_FILE |
+ LANDLOCK_ACCESS_FS_WRITE_FILE,
+ },
+ {}
+ };
+ /* Limits read and write access to files tied to the filesystem. */
+ const int ruleset_fd = create_ruleset(_metadata, rules[0].access,
+ rules);
+
+ ASSERT_LE(0, ruleset_fd);
+ enforce_ruleset(_metadata, ruleset_fd);
+ ASSERT_EQ(0, close(ruleset_fd));
+
+ /* Checks enforcement for normal files. */
+ ASSERT_EQ(0, test_open(file1_s1d2, O_RDWR));
+ ASSERT_EQ(EACCES, test_open(file1_s1d1, O_RDWR));
+
+ /* Checks access to pipes through FD. */
+ ASSERT_EQ(0, pipe2(pipe_fds, O_CLOEXEC));
+ ASSERT_EQ(1, write(pipe_fds[1], ".", 1)) {
+ TH_LOG("Failed to write in pipe: %s", strerror(errno));
+ }
+ ASSERT_EQ(1, read(pipe_fds[0], &buf, 1));
+ ASSERT_EQ('.', buf);
+
+ /* Checks write access to pipe through /proc/self/fd . */
+ proc_fd = open_proc_fd(_metadata, pipe_fds[1], O_WRONLY | O_CLOEXEC);
+ ASSERT_LE(0, proc_fd);
+ ASSERT_EQ(1, write(proc_fd, ".", 1)) {
+ TH_LOG("Failed to write through /proc/self/fd/%d: %s",
+ pipe_fds[1], strerror(errno));
+ }
+ ASSERT_EQ(0, close(proc_fd));
+
+ /* Checks read access to pipe through /proc/self/fd . */
+ proc_fd = open_proc_fd(_metadata, pipe_fds[0], O_RDONLY | O_CLOEXEC);
+ ASSERT_LE(0, proc_fd);
+ buf = '\0';
+ ASSERT_EQ(1, read(proc_fd, &buf, 1)) {
+ TH_LOG("Failed to read through /proc/self/fd/%d: %s",
+ pipe_fds[0], strerror(errno));
+ }
+ ASSERT_EQ(0, close(proc_fd));
+
+ ASSERT_EQ(0, close(pipe_fds[0]));
+ ASSERT_EQ(0, close(pipe_fds[1]));
+}
+
+TEST(cleanup)
+{
+ cleanup_layout1(_metadata);
+}
+
+TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/landlock/ptrace_test.c b/tools/testing/selftests/landlock/ptrace_test.c
new file mode 100644
index 000000000000..b5d97e2ff0d5
--- /dev/null
+++ b/tools/testing/selftests/landlock/ptrace_test.c
@@ -0,0 +1,314 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Landlock tests - Ptrace
+ *
+ * Copyright © 2017-2020 Mickaël Salaün <[email protected]>
+ * Copyright © 2019-2020 ANSSI
+ */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <fcntl.h>
+#include <linux/landlock.h>
+#include <signal.h>
+#include <sys/prctl.h>
+#include <sys/ptrace.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+#include "common.h"
+
+static void create_domain(struct __test_metadata *const _metadata)
+{
+ int ruleset_fd;
+ struct landlock_ruleset_attr ruleset_attr = {
+ .handled_access_fs = LANDLOCK_ACCESS_FS_READ_FILE,
+ };
+ struct landlock_path_beneath_attr path_beneath_attr = {
+ .allowed_access = LANDLOCK_ACCESS_FS_READ_FILE,
+ };
+
+ ruleset_fd = landlock_create_ruleset(&ruleset_attr,
+ sizeof(ruleset_attr), 0);
+ EXPECT_LE(0, ruleset_fd) {
+ TH_LOG("Failed to create a ruleset: %s", strerror(errno));
+ }
+ path_beneath_attr.parent_fd = open("/tmp", O_PATH | O_NOFOLLOW |
+ O_DIRECTORY | O_CLOEXEC);
+ EXPECT_LE(0, path_beneath_attr.parent_fd);
+ EXPECT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_PATH_BENEATH,
+ &path_beneath_attr, 0));
+ EXPECT_EQ(0, close(path_beneath_attr.parent_fd));
+
+ EXPECT_EQ(0, prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0));
+ EXPECT_EQ(0, landlock_enforce_ruleset_current(ruleset_fd, 0));
+ EXPECT_EQ(0, close(ruleset_fd));
+}
+
+FIXTURE(hierarchy) { };
+
+FIXTURE_VARIANT(hierarchy) {
+ const bool domain_both;
+ const bool domain_parent;
+ const bool domain_child;
+};
+
+/*
+ * Test multiple tracing combinations between a parent process P1 and a child
+ * process P2.
+ *
+ * Yama's scoped ptrace is presumed disabled. If enabled, this optional
+ * restriction is enforced in addition to any Landlock check, which means that
+ * all P2 requests to trace P1 would be denied.
+ */
+
+/*
+ * No domain
+ *
+ * P1-. P1 -> P2 : allow
+ * \ P2 -> P1 : allow
+ * 'P2
+ */
+FIXTURE_VARIANT_ADD(hierarchy, allow_without_domain) {
+ .domain_both = false,
+ .domain_parent = false,
+ .domain_child = false,
+};
+
+/*
+ * Child domain
+ *
+ * P1--. P1 -> P2 : allow
+ * \ P2 -> P1 : deny
+ * .'-----.
+ * | P2 |
+ * '------'
+ */
+FIXTURE_VARIANT_ADD(hierarchy, allow_with_one_domain) {
+ .domain_both = false,
+ .domain_parent = false,
+ .domain_child = true,
+};
+
+/*
+ * Parent domain
+ * .------.
+ * | P1 --. P1 -> P2 : deny
+ * '------' \ P2 -> P1 : allow
+ * '
+ * P2
+ */
+FIXTURE_VARIANT_ADD(hierarchy, deny_with_parent_domain) {
+ .domain_both = false,
+ .domain_parent = true,
+ .domain_child = false,
+};
+
+/*
+ * Parent + child domain (siblings)
+ * .------.
+ * | P1 ---. P1 -> P2 : deny
+ * '------' \ P2 -> P1 : deny
+ * .---'--.
+ * | P2 |
+ * '------'
+ */
+FIXTURE_VARIANT_ADD(hierarchy, deny_with_sibling_domain) {
+ .domain_both = false,
+ .domain_parent = true,
+ .domain_child = true,
+};
+
+/*
+ * Same domain (inherited)
+ * .-------------.
+ * | P1----. | P1 -> P2 : allow
+ * | \ | P2 -> P1 : allow
+ * | ' |
+ * | P2 |
+ * '-------------'
+ */
+FIXTURE_VARIANT_ADD(hierarchy, allow_sibling_domain) {
+ .domain_both = true,
+ .domain_parent = false,
+ .domain_child = false,
+};
+
+/*
+ * Inherited + child domain
+ * .-----------------.
+ * | P1----. | P1 -> P2 : allow
+ * | \ | P2 -> P1 : deny
+ * | .-'----. |
+ * | | P2 | |
+ * | '------' |
+ * '-----------------'
+ */
+FIXTURE_VARIANT_ADD(hierarchy, allow_with_nested_domain) {
+ .domain_both = true,
+ .domain_parent = false,
+ .domain_child = true,
+};
+
+/*
+ * Inherited + parent domain
+ * .-----------------.
+ * |.------. | P1 -> P2 : deny
+ * || P1 ----. | P2 -> P1 : allow
+ * |'------' \ |
+ * | ' |
+ * | P2 |
+ * '-----------------'
+ */
+FIXTURE_VARIANT_ADD(hierarchy, deny_with_nested_and_parent_domain) {
+ .domain_both = true,
+ .domain_parent = true,
+ .domain_child = false,
+};
+
+/*
+ * Inherited + parent and child domain (siblings)
+ * .-----------------.
+ * | .------. | P1 -> P2 : deny
+ * | | P1 . | P2 -> P1 : deny
+ * | '------'\ |
+ * | \ |
+ * | .--'---. |
+ * | | P2 | |
+ * | '------' |
+ * '-----------------'
+ */
+FIXTURE_VARIANT_ADD(hierarchy, deny_with_forked_domain) {
+ .domain_both = true,
+ .domain_parent = true,
+ .domain_child = true,
+};
+
+FIXTURE_SETUP(hierarchy)
+{ }
+
+FIXTURE_TEARDOWN(hierarchy)
+{ }
+
+/* Test PTRACE_TRACEME and PTRACE_ATTACH for parent and child. */
+TEST_F(hierarchy, trace)
+{
+ pid_t child, parent;
+ int status;
+ int pipe_child[2], pipe_parent[2];
+ char buf_parent;
+ long ret;
+
+ disable_caps(_metadata);
+
+ parent = getpid();
+ ASSERT_EQ(0, pipe2(pipe_child, O_CLOEXEC));
+ ASSERT_EQ(0, pipe2(pipe_parent, O_CLOEXEC));
+ if (variant->domain_both) {
+ create_domain(_metadata);
+ if (!_metadata->passed)
+ /* Aborts before forking. */
+ return;
+ }
+
+ child = fork();
+ ASSERT_LE(0, child);
+ if (child == 0) {
+ char buf_child;
+
+ ASSERT_EQ(0, close(pipe_parent[1]));
+ ASSERT_EQ(0, close(pipe_child[0]));
+ if (variant->domain_child)
+ create_domain(_metadata);
+
+ /* Waits for the parent to be in a domain, if any. */
+ ASSERT_EQ(1, read(pipe_parent[0], &buf_child, 1));
+
+ /* Tests PTRACE_ATTACH on the parent. */
+ ret = ptrace(PTRACE_ATTACH, parent, NULL, 0);
+ if (variant->domain_child) {
+ EXPECT_EQ(-1, ret);
+ EXPECT_EQ(EPERM, errno);
+ } else {
+ EXPECT_EQ(0, ret);
+ }
+ if (ret == 0) {
+ ASSERT_EQ(parent, waitpid(parent, &status, 0));
+ ASSERT_EQ(1, WIFSTOPPED(status));
+ ASSERT_EQ(0, ptrace(PTRACE_DETACH, parent, NULL, 0));
+ }
+
+ /* Tests child PTRACE_TRACEME. */
+ ret = ptrace(PTRACE_TRACEME);
+ if (variant->domain_parent) {
+ EXPECT_EQ(-1, ret);
+ EXPECT_EQ(EPERM, errno);
+ } else {
+ EXPECT_EQ(0, ret);
+ }
+
+ /*
+ * Signals that the PTRACE_ATTACH test is done and the
+ * PTRACE_TRACEME test is ongoing.
+ */
+ ASSERT_EQ(1, write(pipe_child[1], ".", 1));
+
+ if (!variant->domain_parent) {
+ ASSERT_EQ(0, raise(SIGSTOP));
+ }
+
+ /* Waits for the parent PTRACE_ATTACH test. */
+ ASSERT_EQ(1, read(pipe_parent[0], &buf_child, 1));
+ _exit(_metadata->passed ? EXIT_SUCCESS : EXIT_FAILURE);
+ return;
+ }
+
+ ASSERT_EQ(0, close(pipe_child[1]));
+ ASSERT_EQ(0, close(pipe_parent[0]));
+ if (variant->domain_parent)
+ create_domain(_metadata);
+
+ /* Signals that the parent is in a domain, if any. */
+ ASSERT_EQ(1, write(pipe_parent[1], ".", 1));
+
+ /*
+ * Waits for the child to test PTRACE_ATTACH on the parent and start
+ * testing PTRACE_TRACEME.
+ */
+ ASSERT_EQ(1, read(pipe_child[0], &buf_parent, 1));
+
+ /* Tests child PTRACE_TRACEME. */
+ if (!variant->domain_parent) {
+ ASSERT_EQ(child, waitpid(child, &status, 0));
+ ASSERT_EQ(1, WIFSTOPPED(status));
+ ASSERT_EQ(0, ptrace(PTRACE_DETACH, child, NULL, 0));
+ } else {
+ /* The child should not be traced by the parent. */
+ EXPECT_EQ(-1, ptrace(PTRACE_DETACH, child, NULL, 0));
+ EXPECT_EQ(ESRCH, errno);
+ }
+
+ /* Tests PTRACE_ATTACH on the child. */
+ ret = ptrace(PTRACE_ATTACH, child, NULL, 0);
+ if (variant->domain_parent) {
+ EXPECT_EQ(-1, ret);
+ EXPECT_EQ(EPERM, errno);
+ } else {
+ EXPECT_EQ(0, ret);
+ }
+ if (ret == 0) {
+ ASSERT_EQ(child, waitpid(child, &status, 0));
+ ASSERT_EQ(1, WIFSTOPPED(status));
+ ASSERT_EQ(0, ptrace(PTRACE_DETACH, child, NULL, 0));
+ }
+
+ /* Signals that the parent PTRACE_ATTACH test is done. */
+ ASSERT_EQ(1, write(pipe_parent[1], ".", 1));
+ ASSERT_EQ(child, waitpid(child, &status, 0));
+ if (WIFSIGNALED(status) || !WIFEXITED(status) ||
+ WEXITSTATUS(status) != EXIT_SUCCESS)
+ _metadata->passed = 0;
+}
+
+TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/landlock/true.c b/tools/testing/selftests/landlock/true.c
new file mode 100644
index 000000000000..3f9ccbf52783
--- /dev/null
+++ b/tools/testing/selftests/landlock/true.c
@@ -0,0 +1,5 @@
+// SPDX-License-Identifier: GPL-2.0
+int main(void)
+{
+ return 0;
+}
--
2.29.2

2020-12-09 23:53:29

by Mickaël Salaün

Subject: [PATCH v26 01/12] landlock: Add object management

From: Mickaël Salaün <[email protected]>

A Landlock object makes it possible to identify a kernel object (e.g. an inode).
A Landlock rule is a set of access rights allowed on an object. Rules
are grouped in rulesets that may be tied to a set of processes (i.e.
subjects) to enforce a scoped access-control (i.e. a domain).

Because Landlock's goal is to empower any process (especially
unprivileged ones) to sandbox themselves, we cannot rely on a
system-wide object identification such as file extended attributes.
Indeed, we need innocuous, composable and modular access-controls.

The main challenge with these constraints is to identify kernel objects
only while this identification is useful (i.e. when a security policy
makes use of the object), and to free the identification data once no
policy is using it. This ephemeral tagging should not and may not be
written to the filesystem. We therefore need to manage the lifetime of a
rule according to the lifetime of its objects. To avoid a global lock,
this implementation makes use of RCU and counters to safely reference
objects.

A following commit uses this generic object management for inodes.

Cc: James Morris <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Serge E. Hallyn <[email protected]>
Signed-off-by: Mickaël Salaün <[email protected]>
Reviewed-by: Jann Horn <[email protected]>
---

Changes since v24:
* Fix typo in comment (spotted by Jann Horn).
* Add Reviewed-by: Jann Horn <[email protected]>

Changes since v23:
* Update landlock_create_object() to return error codes instead of NULL.
This helps error handling in callers.
* When using make oldconfig with a previous configuration already
including the CONFIG_LSM variable, no question is asked to update its
content. Update the Kconfig help to warn about LSM stacking
configuration.
* Constify variable (spotted by Vincent Dagonneau).

Changes since v22:
* Fix spelling (spotted by Jann Horn).

Changes since v21:
* Update Kconfig help.
* Clean up comments.

Changes since v18:
* Account objects to kmemcg.

Changes since v14:
* Simplify the object, rule and ruleset management at the expense of a
less aggressive memory freeing (contributed by Jann Horn, with
additional modifications):
- Remove object->list aggregating the rules tied to an object.
- Remove landlock_get_object(), landlock_drop_object(),
{get,put}_object_cleaner() and landlock_rule_is_disabled().
- Rewrite landlock_put_object() to use a more simple mechanism
(no tricky RCU).
- Replace enum landlock_object_type and landlock_release_object() with
landlock_object_underops->release()
- Adjust unions and Sparse annotations.
Cf. https://lore.kernel.org/lkml/CAG48ez21bEn0wL1bbmTiiu8j9jP5iEWtHOwz4tURUJ+ki0ydYw@mail.gmail.com/
* Merge struct landlock_rule into landlock_ruleset_elem to simplify the
rule management.
* Constify variables.
* Improve kernel documentation.
* Cosmetic variable renames.
* Remove the "default" in the Kconfig (suggested by Jann Horn).
* Only use refcount_inc() through getter helpers.
* Update Kconfig description.

Changes since v13:
* New dedicated implementation, removing the need for eBPF.

Previous changes:
https://lore.kernel.org/lkml/[email protected]/
---
MAINTAINERS | 10 +++++
security/Kconfig | 1 +
security/Makefile | 2 +
security/landlock/Kconfig | 21 +++++++++
security/landlock/Makefile | 3 ++
security/landlock/object.c | 67 ++++++++++++++++++++++++++++
security/landlock/object.h | 91 ++++++++++++++++++++++++++++++++++++++
7 files changed, 195 insertions(+)
create mode 100644 security/landlock/Kconfig
create mode 100644 security/landlock/Makefile
create mode 100644 security/landlock/object.c
create mode 100644 security/landlock/object.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 6f474153dbec..dc718573317e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9827,6 +9827,16 @@ F: net/core/sock_map.c
F: net/ipv4/tcp_bpf.c
F: net/ipv4/udp_bpf.c

+LANDLOCK SECURITY MODULE
+M: Mickaël Salaün <[email protected]>
+L: [email protected]
+S: Supported
+W: https://landlock.io
+T: git https://github.com/landlock-lsm/linux.git
+F: security/landlock/
+K: landlock
+K: LANDLOCK
+
LANTIQ / INTEL Ethernet drivers
M: Hauke Mehrtens <[email protected]>
L: [email protected]
diff --git a/security/Kconfig b/security/Kconfig
index 7561f6f99f1d..15a4342b5d01 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -238,6 +238,7 @@ source "security/loadpin/Kconfig"
source "security/yama/Kconfig"
source "security/safesetid/Kconfig"
source "security/lockdown/Kconfig"
+source "security/landlock/Kconfig"

source "security/integrity/Kconfig"

diff --git a/security/Makefile b/security/Makefile
index 3baf435de541..c688f4907a1b 100644
--- a/security/Makefile
+++ b/security/Makefile
@@ -13,6 +13,7 @@ subdir-$(CONFIG_SECURITY_LOADPIN) += loadpin
subdir-$(CONFIG_SECURITY_SAFESETID) += safesetid
subdir-$(CONFIG_SECURITY_LOCKDOWN_LSM) += lockdown
subdir-$(CONFIG_BPF_LSM) += bpf
+subdir-$(CONFIG_SECURITY_LANDLOCK) += landlock

# always enable default capabilities
obj-y += commoncap.o
@@ -32,6 +33,7 @@ obj-$(CONFIG_SECURITY_SAFESETID) += safesetid/
obj-$(CONFIG_SECURITY_LOCKDOWN_LSM) += lockdown/
obj-$(CONFIG_CGROUPS) += device_cgroup.o
obj-$(CONFIG_BPF_LSM) += bpf/
+obj-$(CONFIG_SECURITY_LANDLOCK) += landlock/

# Object integrity file lists
subdir-$(CONFIG_INTEGRITY) += integrity
diff --git a/security/landlock/Kconfig b/security/landlock/Kconfig
new file mode 100644
index 000000000000..ea58e6208afa
--- /dev/null
+++ b/security/landlock/Kconfig
@@ -0,0 +1,21 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+config SECURITY_LANDLOCK
+ bool "Landlock support"
+ depends on SECURITY
+ select SECURITY_PATH
+ help
+ Landlock is a safe sandboxing mechanism which enables processes to
+ restrict themselves (and their future children) by gradually
+ enforcing tailored access control policies. A security policy is a
+ set of access rights (e.g. open a file in read-only, make a
+ directory, etc.) tied to a file hierarchy. Such policy can be configured
+ and enforced by any processes for themselves thanks to dedicated system
+ calls: landlock_create_ruleset(), landlock_add_rule(), and
+ landlock_enforce_ruleset_current().
+
+ See Documentation/userspace-api/landlock.rst for further information.
+
+ If you are unsure how to answer this question, answer N. Otherwise, you
+ should also prepend "landlock," to the content of CONFIG_LSM to enable
+ Landlock at boot time.
diff --git a/security/landlock/Makefile b/security/landlock/Makefile
new file mode 100644
index 000000000000..cb6deefbf4c0
--- /dev/null
+++ b/security/landlock/Makefile
@@ -0,0 +1,3 @@
+obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
+
+landlock-y := object.o
diff --git a/security/landlock/object.c b/security/landlock/object.c
new file mode 100644
index 000000000000..d674fdf9ff04
--- /dev/null
+++ b/security/landlock/object.c
@@ -0,0 +1,67 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Landlock LSM - Object management
+ *
+ * Copyright © 2016-2020 Mickaël Salaün <[email protected]>
+ * Copyright © 2018-2020 ANSSI
+ */
+
+#include <linux/bug.h>
+#include <linux/compiler_types.h>
+#include <linux/err.h>
+#include <linux/kernel.h>
+#include <linux/rcupdate.h>
+#include <linux/refcount.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+
+#include "object.h"
+
+struct landlock_object *landlock_create_object(
+ const struct landlock_object_underops *const underops,
+ void *const underobj)
+{
+ struct landlock_object *new_object;
+
+ if (WARN_ON_ONCE(!underops || !underobj))
+ return ERR_PTR(-ENOENT);
+ new_object = kzalloc(sizeof(*new_object), GFP_KERNEL_ACCOUNT);
+ if (!new_object)
+ return ERR_PTR(-ENOMEM);
+ refcount_set(&new_object->usage, 1);
+ spin_lock_init(&new_object->lock);
+ new_object->underops = underops;
+ new_object->underobj = underobj;
+ return new_object;
+}
+
+/*
+ * The caller must own the object (i.e. thanks to object->usage) to safely put
+ * it.
+ */
+void landlock_put_object(struct landlock_object *const object)
+{
+ /*
+ * The call to @object->underops->release(object) might sleep, e.g.
+ * because of iput().
+ */
+ might_sleep();
+ if (!object)
+ return;
+
+ /*
+ * If the @object's refcount cannot drop to zero, we can just decrement
+ * the refcount without holding a lock. Otherwise, the decrement must
+ * happen under @object->lock for synchronization with things like
+ * get_inode_object().
+ */
+ if (refcount_dec_and_lock(&object->usage, &object->lock)) {
+ __acquire(&object->lock);
+ /*
+ * With @object->lock initially held, remove the reference from
+ * @object->underobj to @object (if it still exists).
+ */
+ object->underops->release(object);
+ kfree_rcu(object, rcu_free);
+ }
+}
diff --git a/security/landlock/object.h b/security/landlock/object.h
new file mode 100644
index 000000000000..1d6edbf939e2
--- /dev/null
+++ b/security/landlock/object.h
@@ -0,0 +1,91 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Landlock LSM - Object management
+ *
+ * Copyright © 2016-2020 Mickaël Salaün <[email protected]>
+ * Copyright © 2018-2020 ANSSI
+ */
+
+#ifndef _SECURITY_LANDLOCK_OBJECT_H
+#define _SECURITY_LANDLOCK_OBJECT_H
+
+#include <linux/compiler_types.h>
+#include <linux/refcount.h>
+#include <linux/spinlock.h>
+
+struct landlock_object;
+
+/**
+ * struct landlock_object_underops - Operations on an underlying object
+ */
+struct landlock_object_underops {
+ /**
+ * @release: Releases the underlying object (e.g. iput() for an inode).
+ */
+ void (*release)(struct landlock_object *const object)
+ __releases(object->lock);
+};
+
+/**
+ * struct landlock_object - Security blob tied to a kernel object
+ *
+ * The goal of this structure is to tie a set of ephemeral access rights
+ * (pertaining to different domains) to a kernel object (e.g. an inode) in a
+ * safe way. This implies handling concurrent use and modification.
+ *
+ * The lifetime of a &struct landlock_object depends on the rules referring to
+ * it.
+ */
+struct landlock_object {
+ /**
+ * @usage: This counter is used to tie an object to the rules matching
+ * it or to keep it alive while adding a new rule. If this counter
+ * reaches zero, this struct must not be modified, but this counter can
+ * still be read from within an RCU read-side critical section. When
+ * adding a new rule to an object with a usage counter of zero, we must
+ * wait until the pointer to this object is set to NULL (or recycled).
+ */
+ refcount_t usage;
+ /**
+ * @lock: Guards against concurrent modifications. This lock must be
+ * held from the time @usage drops to zero until any weak references
+ * from @underobj to this object have been cleaned up.
+ *
+ * Lock ordering: inode->i_lock nests inside this.
+ */
+ spinlock_t lock;
+ /**
+ * @underobj: Used when cleaning up an object and to mark an object as
+ * tied to its underlying kernel structure. This pointer is protected
+ * by @lock. Cf. landlock_release_inodes() and release_inode().
+ */
+ void *underobj;
+ union {
+ /**
+ * @rcu_free: Enables lockless use of @usage, @lock and
+ * @underobj from within an RCU read-side critical section.
+ * @rcu_free and @underops are only used by
+ * landlock_put_object().
+ */
+ struct rcu_head rcu_free;
+ /**
+ * @underops: Enables landlock_put_object() to release the
+ * underlying object (e.g. inode).
+ */
+ const struct landlock_object_underops *underops;
+ };
+};
+
+struct landlock_object *landlock_create_object(
+ const struct landlock_object_underops *const underops,
+ void *const underobj);
+
+void landlock_put_object(struct landlock_object *const object);
+
+static inline void landlock_get_object(struct landlock_object *const object)
+{
+ if (object)
+ refcount_inc(&object->usage);
+}
+
+#endif /* _SECURITY_LANDLOCK_OBJECT_H */
--
2.29.2

2021-01-14 03:24:14

by Jann Horn

Subject: Re: [PATCH v26 11/12] samples/landlock: Add a sandbox manager example

On Wed, Dec 9, 2020 at 8:29 PM Mickaël Salaün <[email protected]> wrote:
> Add a basic sandbox tool to launch a command which can only access a
> whitelist of file hierarchies in a read-only or read-write way.

I have to admit that I didn't really look at this closely before
because it's just sample code... but I guess I should. You can add

Reviewed-by: Jann Horn <[email protected]>

if you fix the following nits:

[...]
> diff --git a/samples/Kconfig b/samples/Kconfig
[...]
> +config SAMPLE_LANDLOCK
> + bool "Build Landlock sample code"
> + depends on HEADERS_INSTALL
> + help
> + Build a simple Landlock sandbox manager able to launch a process
> + restricted by a user-defined filesystem access control.

nit: s/filesystem access control/filesystem access control policy/

[...]
> diff --git a/samples/landlock/sandboxer.c b/samples/landlock/sandboxer.c
[...]
> +/*
> + * Simple Landlock sandbox manager able to launch a process restricted by a
> + * user-defined filesystem access control.

nit: s/filesystem access control/filesystem access control policy/

[...]
> +int main(const int argc, char *const argv[], char *const *const envp)
> +{
[...]
> + if (argc < 2) {
[...]
> + fprintf(stderr, "* %s: list of paths allowed to be used in a read-only way.\n",
> + ENV_FS_RO_NAME);
> + fprintf(stderr, "* %s: list of paths allowed to be used in a read-write way.\n",
> + ENV_FS_RO_NAME);

s/ENV_FS_RO_NAME/ENV_FS_RW_NAME/

> + fprintf(stderr, "\nexample:\n"
> + "%s=\"/bin:/lib:/usr:/proc:/etc:/dev/urandom\" "
> + "%s=\"/dev/null:/dev/full:/dev/zero:/dev/pts:/tmp\" "
> + "%s bash -i\n",
> + ENV_FS_RO_NAME, ENV_FS_RW_NAME, argv[0]);
> + return 1;
> + }
> +
> + ruleset_fd = landlock_create_ruleset(&ruleset_attr, sizeof(ruleset_attr), 0);
> + if (ruleset_fd < 0) {
> + perror("Failed to create a ruleset");
> + switch (errno) {

(Just as a note: In theory perror() can change the value of errno, as
far as I know - so AFAIK you'd theoretically have to do something
like:

int errno_ = errno;
perror("...");
switch (errno_) {
...
}

It'll almost certainly work fine as-is in practice though.)
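
A minimal, self-contained sketch of the errno-saving pattern described
above. The helper names report_error() and explain_error() are
hypothetical, and the ENOSYS case is only an illustration; neither is
taken from the sandboxer sample:

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper: prints a message for the current errno and returns
 * the errno value observed *before* perror() ran.
 */
int report_error(const char *const msg)
{
	/* Snapshot errno before calling any other libc function. */
	const int saved_errno = errno;

	/* perror() is allowed to modify errno while writing to stderr. */
	perror(msg);
	/* Restore the original value for any later reader of errno. */
	errno = saved_errno;
	return saved_errno;
}

/* Hypothetical classifier: switches on the saved value, never on errno. */
const char *explain_error(const int saved_errno)
{
	switch (saved_errno) {
	case ENOSYS:
		return "the system call is not implemented";
	default:
		return "unexpected error";
	}
}
```

A caller would then write something like
explain_error(report_error("Failed to create a ruleset")) instead of
switching on errno after perror() has returned.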

2021-01-14 03:25:50

by Jann Horn

Subject: Re: [PATCH v26 02/12] landlock: Add ruleset and domain management

On Wed, Dec 9, 2020 at 8:28 PM Mickaël Salaün <[email protected]> wrote:
> A Landlock ruleset is mainly a red-black tree with Landlock rules as
> nodes. This enables quick update and lookup to match a requested
> access, e.g. to a file. A ruleset is usable through a dedicated file
> descriptor (cf. following commit implementing syscalls) which enables a
> process to create and populate a ruleset with new rules.
>
> A domain is a ruleset tied to a set of processes. This group of rules
> defines the security policy enforced on these processes and their future
> children. A domain can transition to a new domain which is the
> intersection of all its constraints and those of a ruleset provided by
> the current process. This modification only impact the current process.
> This means that a process can only gain more constraints (i.e. lose
> accesses) over time.
>
> Cc: James Morris <[email protected]>
> Cc: Jann Horn <[email protected]>
> Cc: Kees Cook <[email protected]>
> Cc: Serge E. Hallyn <[email protected]>
> Signed-off-by: Mickaël Salaün <[email protected]>

Yeah, the layer stack stuff in this version looks good to me. :)

Reviewed-by: Jann Horn <[email protected]>

2021-01-14 03:27:07

by Jann Horn

Subject: Re: [PATCH v26 07/12] landlock: Support filesystem access-control

On Wed, Dec 9, 2020 at 8:28 PM Mickaël Salaün <[email protected]> wrote:
> Thanks to the Landlock objects and ruleset, it is possible to identify
> inodes according to a process's domain. To enable an unprivileged
> process to express a file hierarchy, it first needs to open a directory
> (or a file) and pass this file descriptor to the kernel through
> landlock_add_rule(2). When checking if a file access request is
> allowed, we walk from the requested dentry to the real root, following
> the different mount layers. The access to each "tagged" inodes are
> collected according to their rule layer level, and ANDed to create
> access to the requested file hierarchy. This makes possible to identify
> a lot of files without tagging every inodes nor modifying the
> filesystem, while still following the view and understanding the user
> has from the filesystem.
>
> Add a new ARCH_EPHEMERAL_INODES for UML because it currently does not
> keep the same struct inodes for the same inodes whereas these inodes are
> in use.
>
> This commit adds a minimal set of supported filesystem access-control
> which doesn't enable to restrict all file-related actions. This is the
> result of multiple discussions to minimize the code of Landlock to ease
> review. Thanks to the Landlock design, extending this access-control
> without breaking user space will not be a problem. Moreover, seccomp
> filters can be used to restrict the use of syscall families which may
> not be currently handled by Landlock.
[...]
> +static bool check_access_path_continue(
> + const struct landlock_ruleset *const domain,
> + const struct path *const path, const u32 access_request,
> + u64 *const layer_mask)
> +{
[...]
> + /*
> + * An access is granted if, for each policy layer, at least one rule
> + * encountered on the pathwalk grants the access, regardless of their
> + * position in the layer stack. We must then check not-yet-seen layers
> + * for each inode, from the last one added to the first one.
> + */
> + for (i = 0; i < rule->num_layers; i++) {
> + const struct landlock_layer *const layer = &rule->layers[i];
> + const u64 layer_level = BIT_ULL(layer->level - 1);
> +
> + if (!(layer_level & *layer_mask))
> + continue;
> + if ((layer->access & access_request) != access_request)
> + return false;
> + *layer_mask &= ~layer_level;

Hmm... shouldn't the last 5 lines be replaced by the following?

if ((layer->access & access_request) == access_request)
*layer_mask &= ~layer_level;

And then, since this function would always return true, you could
change its return type to "void".


As far as I can tell, the current version will still, if a ruleset
looks like this:

/usr read+write
/usr/lib/ read

reject write access to /usr/lib, right?


> + }
> + return true;
> +}
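
To make the difference between the two variants concrete, here is a
user-space sketch, not kernel code. It assumes a single policy layer and
identifies rules by path string rather than by inode; struct rule,
find_rule(), to_parent() and walk_allows() are all hypothetical names.
With stop_on_mismatch set, the walk mimics the quoted loop (a rule that
does not grant the whole request ends the walk with a denial); with it
cleared, the walk mimics the suggested replacement (such a rule is
skipped and the walk continues toward the root):

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ACCESS_READ	(1u << 0)
#define ACCESS_WRITE	(1u << 1)

/* Hypothetical rule: access rights tied to one absolute path. */
struct rule {
	const char *path;
	unsigned int access;
};

/* Returns the rule attached to exactly @path, or NULL if there is none. */
const struct rule *find_rule(const struct rule *rules, size_t num,
		const char *path)
{
	size_t i;

	for (i = 0; i < num; i++) {
		if (strcmp(rules[i].path, path) == 0)
			return &rules[i];
	}
	return NULL;
}

/* Strips the last component in place; returns false once at "/". */
bool to_parent(char *path)
{
	char *slash;

	if (strcmp(path, "/") == 0)
		return false;
	slash = strrchr(path, '/');
	if (!slash)
		return false;
	if (slash == path)
		path[1] = '\0';	/* "/tmp" becomes "/" */
	else
		*slash = '\0';	/* "/tmp/aaa" becomes "/tmp" */
	return true;
}

/* Walks from @target up to "/" and decides a single-layer request. */
bool walk_allows(const struct rule *rules, size_t num, const char *target,
		unsigned int request, bool stop_on_mismatch)
{
	char path[256];

	if (strlen(target) >= sizeof(path))
		return false;
	strcpy(path, target);
	do {
		const struct rule *const rule = find_rule(rules, num, path);

		if (!rule)
			continue;
		if ((rule->access & request) == request)
			return true;	/* the layer is satisfied */
		if (stop_on_mismatch)
			return false;	/* behaviour of the quoted loop */
	} while (to_parent(path));
	return false;	/* no rule on the walk granted the request */
}
```

With rules {"/usr", read+write} and {"/usr/lib", read}, a write request
on /usr/lib/foo is denied when stop_on_mismatch is true and granted when
it is false, which is the case described above.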

2021-01-14 03:28:01

by Jann Horn

Subject: Re: [PATCH v26 00/12] Landlock LSM

On Wed, Dec 9, 2020 at 8:28 PM Mickaël Salaün <[email protected]> wrote:
> This patch series adds new built-time checks, a new test, renames some
> variables and functions to improve readability, and shift syscall
> numbers to align with -next.

Sorry, I've finally gotten around to looking at v26 - I hadn't
actually looked at v25 either yet. I think there's still one remaining
small issue in the filesystem access logic, but I think that's very
simple to fix, as long as we agree on what the expected semantics are.
Otherwise it basically looks good, apart from some typos.

I think v27 will be the final version of this series. :) (And I'll try
to actually look at that version much faster - I realize that waiting
for code reviews this long sucks.)

2021-01-14 18:57:11

by Mickaël Salaün

Subject: Re: [PATCH v26 07/12] landlock: Support filesystem access-control


On 14/01/2021 04:22, Jann Horn wrote:
> On Wed, Dec 9, 2020 at 8:28 PM Mickaël Salaün <[email protected]> wrote:
>> Thanks to the Landlock objects and ruleset, it is possible to identify
>> inodes according to a process's domain. To enable an unprivileged
>> process to express a file hierarchy, it first needs to open a directory
>> (or a file) and pass this file descriptor to the kernel through
>> landlock_add_rule(2). When checking if a file access request is
>> allowed, we walk from the requested dentry to the real root, following
>> the different mount layers. The access to each "tagged" inodes are
>> collected according to their rule layer level, and ANDed to create
>> access to the requested file hierarchy. This makes possible to identify
>> a lot of files without tagging every inodes nor modifying the
>> filesystem, while still following the view and understanding the user
>> has from the filesystem.
>>
>> Add a new ARCH_EPHEMERAL_INODES for UML because it currently does not
>> keep the same struct inodes for the same inodes whereas these inodes are
>> in use.
>>
>> This commit adds a minimal set of supported filesystem access-control
>> which doesn't enable to restrict all file-related actions. This is the
>> result of multiple discussions to minimize the code of Landlock to ease
>> review. Thanks to the Landlock design, extending this access-control
>> without breaking user space will not be a problem. Moreover, seccomp
>> filters can be used to restrict the use of syscall families which may
>> not be currently handled by Landlock.
> [...]
>> +static bool check_access_path_continue(
>> + const struct landlock_ruleset *const domain,
>> + const struct path *const path, const u32 access_request,
>> + u64 *const layer_mask)
>> +{
> [...]
>> + /*
>> + * An access is granted if, for each policy layer, at least one rule
>> + * encountered on the pathwalk grants the access, regardless of their
>> + * position in the layer stack. We must then check not-yet-seen layers
>> + * for each inode, from the last one added to the first one.
>> + */
>> + for (i = 0; i < rule->num_layers; i++) {
>> + const struct landlock_layer *const layer = &rule->layers[i];
>> + const u64 layer_level = BIT_ULL(layer->level - 1);
>> +
>> + if (!(layer_level & *layer_mask))
>> + continue;
>> + if ((layer->access & access_request) != access_request)
>> + return false;
>> + *layer_mask &= ~layer_level;
>
> Hmm... shouldn't the last 5 lines be replaced by the following?
>
> if ((layer->access & access_request) == access_request)
> *layer_mask &= ~layer_level;
>
> And then, since this function would always return true, you could
> change its return type to "void".
>
>
> As far as I can tell, the current version will still, if a ruleset
> looks like this:
>
> /usr read+write
> /usr/lib/ read
>
> reject write access to /usr/lib, right?

If these two rules are from different layers, then yes it would work as
intended. However, if these rules are from the same layer the path walk
will not stop at /usr/lib but go down to /usr, which grants write
access. This is the reason I wrote it like this and the
layout1.inherit_subset test checks that. I'm updating the documentation
to better explain how an access is checked with one or multiple layers.

Doing it this way also makes it possible to stop the path walk earlier,
which is the original purpose of this function.


>
>
>> + }
>> + return true;
>> +}

2021-01-14 19:09:06

by Mickaël Salaün

Subject: Re: [PATCH v26 00/12] Landlock LSM


On 14/01/2021 04:22, Jann Horn wrote:
> On Wed, Dec 9, 2020 at 8:28 PM Mickaël Salaün <[email protected]> wrote:
>> This patch series adds new built-time checks, a new test, renames some
>> variables and functions to improve readability, and shift syscall
>> numbers to align with -next.
>
> Sorry, I've finally gotten around to looking at v26 - I hadn't
> actually looked at v25 either yet. I think there's still one remaining
> small issue in the filesystem access logic, but I think that's very
> simple to fix, as long as we agree on what the expected semantics are.
> Otherwise it basically looks good, apart from some typos.
>
> I think v27 will be the final version of this series. :) (And I'll try
> to actually look at that version much faster - I realize that waiting
> for code reviews this long sucks.)
>

I'm improving the tests, especially with bind mount and overlayfs
tests. v27 will also contain better documentation to clarify the
semantics and explain how these mounts are handled. Thanks!

2021-01-14 22:46:47

by Jann Horn

Subject: Re: [PATCH v26 07/12] landlock: Support filesystem access-control

On Thu, Jan 14, 2021 at 7:54 PM Mickaël Salaün <[email protected]> wrote:
> On 14/01/2021 04:22, Jann Horn wrote:
> > On Wed, Dec 9, 2020 at 8:28 PM Mickaël Salaün <[email protected]> wrote:
> >> Thanks to the Landlock objects and ruleset, it is possible to identify
> >> inodes according to a process's domain. To enable an unprivileged
> >> process to express a file hierarchy, it first needs to open a directory
> >> (or a file) and pass this file descriptor to the kernel through
> >> landlock_add_rule(2). When checking if a file access request is
> >> allowed, we walk from the requested dentry to the real root, following
> >> the different mount layers. The access to each "tagged" inodes are
> >> collected according to their rule layer level, and ANDed to create
> >> access to the requested file hierarchy. This makes possible to identify
> >> a lot of files without tagging every inodes nor modifying the
> >> filesystem, while still following the view and understanding the user
> >> has from the filesystem.
> >>
> >> Add a new ARCH_EPHEMERAL_INODES for UML because it currently does not
> >> keep the same struct inodes for the same inodes whereas these inodes are
> >> in use.
> >>
> >> This commit adds a minimal set of supported filesystem access-control
> >> which doesn't enable to restrict all file-related actions. This is the
> >> result of multiple discussions to minimize the code of Landlock to ease
> >> review. Thanks to the Landlock design, extending this access-control
> >> without breaking user space will not be a problem. Moreover, seccomp
> >> filters can be used to restrict the use of syscall families which may
> >> not be currently handled by Landlock.
> > [...]
> >> +static bool check_access_path_continue(
> >> + const struct landlock_ruleset *const domain,
> >> + const struct path *const path, const u32 access_request,
> >> + u64 *const layer_mask)
> >> +{
> > [...]
> >> + /*
> >> + * An access is granted if, for each policy layer, at least one rule
> >> + * encountered on the pathwalk grants the access, regardless of their
> >> + * position in the layer stack. We must then check not-yet-seen layers
> >> + * for each inode, from the last one added to the first one.
> >> + */
> >> + for (i = 0; i < rule->num_layers; i++) {
> >> + const struct landlock_layer *const layer = &rule->layers[i];
> >> + const u64 layer_level = BIT_ULL(layer->level - 1);
> >> +
> >> + if (!(layer_level & *layer_mask))
> >> + continue;
> >> + if ((layer->access & access_request) != access_request)
> >> + return false;
> >> + *layer_mask &= ~layer_level;
> >
> > Hmm... shouldn't the last 5 lines be replaced by the following?
> >
> > if ((layer->access & access_request) == access_request)
> > *layer_mask &= ~layer_level;
> >
> > And then, since this function would always return true, you could
> > change its return type to "void".
> >
> >
> > As far as I can tell, the current version will still, if a ruleset
> > looks like this:
> >
> > /usr read+write
> > /usr/lib/ read
> >
> > reject write access to /usr/lib, right?
>
> If these two rules are from different layers, then yes it would work as
> intended. However, if these rules are from the same layer the path walk
> will not stop at /usr/lib but go down to /usr, which grants write
> access.

I don't see why the code would do what you're saying it does. And an
experiment seems to confirm what I said; I checked out landlock-v26,
and the behavior I get is:

user@vm:~/landlock$ dd if=/dev/null of=/tmp/aaa
0+0 records in
0+0 records out
0 bytes copied, 0.00106365 s, 0.0 kB/s
user@vm:~/landlock$ LL_FS_RO='/lib' LL_FS_RW='/' ./sandboxer dd if=/dev/null of=/tmp/aaa
0+0 records in
0+0 records out
0 bytes copied, 0.000491814 s, 0.0 kB/s
user@vm:~/landlock$ LL_FS_RO='/tmp' LL_FS_RW='/' ./sandboxer dd if=/dev/null of=/tmp/aaa
dd: failed to open '/tmp/aaa': Permission denied
user@vm:~/landlock$

Granting read access to /tmp prevents writing to it, even though write
access was granted to /.

2021-01-15 00:53:59

by Mickaël Salaün

Subject: Re: [PATCH v26 11/12] samples/landlock: Add a sandbox manager example


On 14/01/2021 04:21, Jann Horn wrote:
> On Wed, Dec 9, 2020 at 8:29 PM Mickaël Salaün <[email protected]> wrote:
>> Add a basic sandbox tool to launch a command which can only access a
>> whitelist of file hierarchies in a read-only or read-write way.
>
> I have to admit that I didn't really look at this closely before
> because it's just sample code... but I guess I should. You can add
>
> Reviewed-by: Jann Horn <[email protected]>
>
> if you fix the following nits:

OK, I will!

>
> [...]
>> diff --git a/samples/Kconfig b/samples/Kconfig
> [...]
>> +config SAMPLE_LANDLOCK
>> + bool "Build Landlock sample code"
>> + depends on HEADERS_INSTALL
>> + help
>> + Build a simple Landlock sandbox manager able to launch a process
>> + restricted by a user-defined filesystem access control.
>
> nit: s/filesystem access control/filesystem access control policy/
>
> [...]
>> diff --git a/samples/landlock/sandboxer.c b/samples/landlock/sandboxer.c
> [...]
>> +/*
>> + * Simple Landlock sandbox manager able to launch a process restricted by a
>> + * user-defined filesystem access control.
>
> nit: s/filesystem access control/filesystem access control policy/
>
> [...]
>> +int main(const int argc, char *const argv[], char *const *const envp)
>> +{
> [...]
>> + if (argc < 2) {
> [...]
>> + fprintf(stderr, "* %s: list of paths allowed to be used in a read-only way.\n",
>> + ENV_FS_RO_NAME);
>> + fprintf(stderr, "* %s: list of paths allowed to be used in a read-write way.\n",
>> + ENV_FS_RO_NAME);
>
> s/ENV_FS_RO_NAME/ENV_FS_RW_NAME/
>
>> + fprintf(stderr, "\nexample:\n"
>> + "%s=\"/bin:/lib:/usr:/proc:/etc:/dev/urandom\" "
>> + "%s=\"/dev/null:/dev/full:/dev/zero:/dev/pts:/tmp\" "
>> + "%s bash -i\n",
>> + ENV_FS_RO_NAME, ENV_FS_RW_NAME, argv[0]);
>> + return 1;
>> + }
>> +
>> + ruleset_fd = landlock_create_ruleset(&ruleset_attr, sizeof(ruleset_attr), 0);
>> + if (ruleset_fd < 0) {
>> + perror("Failed to create a ruleset");
>> + switch (errno) {
>
> (Just as a note: In theory perror() can change the value of errno, as
> far as I know - so AFAIK you'd theoretically have to do something
> like:
>
> int errno_ = errno;
> perror("...");
> switch (errno_) {
> ...
> }

Indeed :)

>
> It'll almost certainly work fine as-is in practice though.)
>

2021-01-15 09:13:26

by Mickaël Salaün

Subject: Re: [PATCH v26 07/12] landlock: Support filesystem access-control


On 14/01/2021 23:43, Jann Horn wrote:
> On Thu, Jan 14, 2021 at 7:54 PM Mickaël Salaün <[email protected]> wrote:
>> On 14/01/2021 04:22, Jann Horn wrote:
>>> On Wed, Dec 9, 2020 at 8:28 PM Mickaël Salaün <[email protected]> wrote:
>>>> Thanks to the Landlock objects and ruleset, it is possible to identify
>>>> inodes according to a process's domain. To enable an unprivileged
>>>> process to express a file hierarchy, it first needs to open a directory
>>>> (or a file) and pass this file descriptor to the kernel through
>>>> landlock_add_rule(2). When checking if a file access request is
>>>> allowed, we walk from the requested dentry to the real root, following
>>>> the different mount layers. The accesses granted by each "tagged" inode
>>>> are collected according to their rule layer level, and ANDed to create
>>>> access to the requested file hierarchy. This makes it possible to
>>>> identify a lot of files without tagging every inode or modifying the
>>>> filesystem, while still following the view and understanding the user
>>>> has of the filesystem.
>>>>
>>>> Add a new ARCH_EPHEMERAL_INODES for UML because it currently does not
>>>> keep the same struct inode for the same inode while that inode is
>>>> in use.
>>>>
>>>> This commit adds a minimal set of supported filesystem access controls,
>>>> which does not make it possible to restrict all file-related actions.
>>>> This is the result of multiple discussions to minimize the code of
>>>> Landlock to ease review. Thanks to the Landlock design, extending this
>>>> access control without breaking user space will not be a problem.
>>>> Moreover, seccomp filters can be used to restrict the use of syscall
>>>> families which may not be currently handled by Landlock.
>>> [...]
>>>> +static bool check_access_path_continue(
>>>> + const struct landlock_ruleset *const domain,
>>>> + const struct path *const path, const u32 access_request,
>>>> + u64 *const layer_mask)
>>>> +{
>>> [...]
>>>> + /*
>>>> + * An access is granted if, for each policy layer, at least one rule
>>>> + * encountered on the pathwalk grants the access, regardless of their
>>>> + * position in the layer stack. We must then check not-yet-seen layers
>>>> + * for each inode, from the last one added to the first one.
>>>> + */
>>>> + for (i = 0; i < rule->num_layers; i++) {
>>>> + const struct landlock_layer *const layer = &rule->layers[i];
>>>> + const u64 layer_level = BIT_ULL(layer->level - 1);
>>>> +
>>>> + if (!(layer_level & *layer_mask))
>>>> + continue;
>>>> + if ((layer->access & access_request) != access_request)
>>>> + return false;
>>>> + *layer_mask &= ~layer_level;
>>>
>>> Hmm... shouldn't the last 5 lines be replaced by the following?
>>>
>>> if ((layer->access & access_request) == access_request)
>>> *layer_mask &= ~layer_level;
>>>
>>> And then, since this function would always return true, you could
>>> change its return type to "void".
>>>
>>>
>>> As far as I can tell, the current version will still, if a ruleset
>>> looks like this:
>>>
>>> /usr read+write
>>> /usr/lib/ read
>>>
>>> reject write access to /usr/lib, right?
>>
>> If these two rules are from different layers, then yes it would work as
>> intended. However, if these rules are from the same layer the path walk
>> will not stop at /usr/lib but go down to /usr, which grants write
>> access.
>
> I don't see why the code would do what you're saying it does. And an
> experiment seems to confirm what I said; I checked out landlock-v26,
> and the behavior I get is:

There is a misunderstanding: I was responding to your proposal to modify
check_access_path_continue(), not describing the behavior of landlock-v26.

>
> user@vm:~/landlock$ dd if=/dev/null of=/tmp/aaa
> 0+0 records in
> 0+0 records out
> 0 bytes copied, 0.00106365 s, 0.0 kB/s
> user@vm:~/landlock$ LL_FS_RO='/lib' LL_FS_RW='/' ./sandboxer dd
> if=/dev/null of=/tmp/aaa
> 0+0 records in
> 0+0 records out
> 0 bytes copied, 0.000491814 s, 0.0 kB/s
> user@vm:~/landlock$ LL_FS_RO='/tmp' LL_FS_RW='/' ./sandboxer dd
> if=/dev/null of=/tmp/aaa
> dd: failed to open '/tmp/aaa': Permission denied
> user@vm:~/landlock$
>
> Granting read access to /tmp prevents writing to it, even though write
> access was granted to /.
>

It indeed works like this with landlock-v26. However, with the change
you proposed above, it would work like this:

$ LL_FS_RO='/tmp' LL_FS_RW='/' ./sandboxer dd if=/dev/null of=/tmp/aaa
0+0 records in
0+0 records out
0 bytes copied, 0.000187265 s, 0.0 kB/s

…which is not what users would expect I guess. :)
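
To make the difference between the two per-inode checks concrete, here is a simplified user-space model of the path walk. It is a sketch under stated assumptions, not the kernel code: struct model_rule, the ACCESS_* bits, and walk_allows() are hypothetical names, and each array entry stands for one rule of one layer met while walking from the requested file up to the root.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ACCESS_READ	(1U << 0)	/* simplified stand-ins for the */
#define ACCESS_WRITE	(1U << 1)	/* LANDLOCK_ACCESS_FS_* bits */

/* One rule of one layer, attached to one directory of the walk. */
struct model_rule {
	unsigned int level;	/* layer level, starting at 1 */
	uint32_t access;	/* accesses granted by this rule */
};

/* v26 behavior: the first rule seen for a layer decides for that layer. */
static bool step_v26(const struct model_rule *rule, uint32_t request,
		     uint64_t *layer_mask)
{
	const uint64_t bit = UINT64_C(1) << (rule->level - 1);

	assert(rule->level >= 1 && rule->level <= 64);
	if (!(bit & *layer_mask))
		return true;		/* layer already satisfied */
	if ((rule->access & request) != request)
		return false;		/* nearest rule denies: stop */
	*layer_mask &= ~bit;
	return true;
}

/* Proposed behavior: any rule on the walk may satisfy its layer. */
static void step_proposed(const struct model_rule *rule, uint32_t request,
			  uint64_t *layer_mask)
{
	const uint64_t bit = UINT64_C(1) << (rule->level - 1);

	assert(rule->level >= 1 && rule->level <= 64);
	if ((rule->access & request) == request)
		*layer_mask &= ~bit;
}

/*
 * Walk rules ordered from the requested file up to the root.
 * Access is granted once every layer bit has been cleared.
 */
bool walk_allows(const struct model_rule *rules, size_t n,
		 uint64_t layer_mask, uint32_t request, bool proposed)
{
	for (size_t i = 0; i < n && layer_mask; i++) {
		if (proposed)
			step_proposed(&rules[i], request, &layer_mask);
		else if (!step_v26(&rules[i], request, &layer_mask))
			return false;
	}
	return layer_mask == 0;
}
```

With LL_FS_RO='/tmp' and LL_FS_RW='/' in a single layer, a write to /tmp/aaa meets the read-only rule on /tmp first and then the read-write rule on /. The v26 step denies at /tmp, while the proposed step keeps walking and grants at /. In both variants every layer must still be satisfied, so separate layers remain ANDed.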

2021-01-15 18:33:56

by Jann Horn

Subject: Re: [PATCH v26 07/12] landlock: Support filesystem access-control

On Fri, Jan 15, 2021 at 10:10 AM Mickaël Salaün <[email protected]> wrote:
> On 14/01/2021 23:43, Jann Horn wrote:
> > On Thu, Jan 14, 2021 at 7:54 PM Mickaël Salaün <[email protected]> wrote:
> >> On 14/01/2021 04:22, Jann Horn wrote:
> >>> On Wed, Dec 9, 2020 at 8:28 PM Mickaël Salaün <[email protected]> wrote:
> >>>> Thanks to the Landlock objects and ruleset, it is possible to identify
> >>>> inodes according to a process's domain. To enable an unprivileged
> >>>> process to express a file hierarchy, it first needs to open a directory
> >>>> (or a file) and pass this file descriptor to the kernel through
> >>>> landlock_add_rule(2). When checking if a file access request is
> >>>> allowed, we walk from the requested dentry to the real root, following
> >>>> the different mount layers. The accesses granted by each "tagged" inode
> >>>> are collected according to their rule layer level, and ANDed to create
> >>>> access to the requested file hierarchy. This makes it possible to
> >>>> identify a lot of files without tagging every inode or modifying the
> >>>> filesystem, while still following the view and understanding the user
> >>>> has of the filesystem.
> >>>>
> >>>> Add a new ARCH_EPHEMERAL_INODES for UML because it currently does not
> >>>> keep the same struct inode for the same inode while that inode is
> >>>> in use.
> >>>>
> >>>> This commit adds a minimal set of supported filesystem access controls,
> >>>> which does not make it possible to restrict all file-related actions.
> >>>> This is the result of multiple discussions to minimize the code of
> >>>> Landlock to ease review. Thanks to the Landlock design, extending this
> >>>> access control without breaking user space will not be a problem.
> >>>> Moreover, seccomp filters can be used to restrict the use of syscall
> >>>> families which may not be currently handled by Landlock.
> >>> [...]
> >>>> +static bool check_access_path_continue(
> >>>> + const struct landlock_ruleset *const domain,
> >>>> + const struct path *const path, const u32 access_request,
> >>>> + u64 *const layer_mask)
> >>>> +{
> >>> [...]
> >>>> + /*
> >>>> + * An access is granted if, for each policy layer, at least one rule
> >>>> + * encountered on the pathwalk grants the access, regardless of their
> >>>> + * position in the layer stack. We must then check not-yet-seen layers
> >>>> + * for each inode, from the last one added to the first one.
> >>>> + */
> >>>> + for (i = 0; i < rule->num_layers; i++) {
> >>>> + const struct landlock_layer *const layer = &rule->layers[i];
> >>>> + const u64 layer_level = BIT_ULL(layer->level - 1);
> >>>> +
> >>>> + if (!(layer_level & *layer_mask))
> >>>> + continue;
> >>>> + if ((layer->access & access_request) != access_request)
> >>>> + return false;
> >>>> + *layer_mask &= ~layer_level;
> >>>
> >>> Hmm... shouldn't the last 5 lines be replaced by the following?
> >>>
> >>> if ((layer->access & access_request) == access_request)
> >>> *layer_mask &= ~layer_level;
> >>>
> >>> And then, since this function would always return true, you could
> >>> change its return type to "void".
> >>>
> >>>
> >>> As far as I can tell, the current version will still, if a ruleset
> >>> looks like this:
> >>>
> >>> /usr read+write
> >>> /usr/lib/ read
> >>>
> >>> reject write access to /usr/lib, right?
> >>
> >> If these two rules are from different layers, then yes it would work as
> >> intended. However, if these rules are from the same layer the path walk
> >> will not stop at /usr/lib but go down to /usr, which grants write
> >> access.
> >
> > I don't see why the code would do what you're saying it does. And an
> > experiment seems to confirm what I said; I checked out landlock-v26,
> > and the behavior I get is:
>
> There is a misunderstanding, I was responding to your proposition to
> modify check_access_path_continue(), not about the behavior of landlock-v26.
>
> >
> > user@vm:~/landlock$ dd if=/dev/null of=/tmp/aaa
> > 0+0 records in
> > 0+0 records out
> > 0 bytes copied, 0.00106365 s, 0.0 kB/s
> > user@vm:~/landlock$ LL_FS_RO='/lib' LL_FS_RW='/' ./sandboxer dd
> > if=/dev/null of=/tmp/aaa
> > 0+0 records in
> > 0+0 records out
> > 0 bytes copied, 0.000491814 s, 0.0 kB/s
> > user@vm:~/landlock$ LL_FS_RO='/tmp' LL_FS_RW='/' ./sandboxer dd
> > if=/dev/null of=/tmp/aaa
> > dd: failed to open '/tmp/aaa': Permission denied
> > user@vm:~/landlock$
> >
> > Granting read access to /tmp prevents writing to it, even though write
> > access was granted to /.
> >
>
> It indeed works like this with landlock-v26. However, with your above
> proposition, it would work like this:
>
> $ LL_FS_RO='/tmp' LL_FS_RW='/' ./sandboxer dd if=/dev/null of=/tmp/aaa
> 0+0 records in
> 0+0 records out
> 0 bytes copied, 0.000187265 s, 0.0 kB/s
>
> …which is not what users would expect I guess. :)

Ah, so we are disagreeing about what the right semantics are. ^^ To
me, that is exactly the behavior I would expect.

Imagine that someone wants to write a program that needs to be able to
load libraries from /usr/lib (including subdirectories) and needs to
be able to write output to some user-specified output directory. So
they use something like this to sandbox their program (plus error
handling):

static void add_fs_rule(int ruleset_fd, char *path, u64 allowed_access) {
    int fd = open(path, O_PATH);
    struct landlock_path_beneath_attr path_beneath = {
        .parent_fd = fd,
        .allowed_access = allowed_access
    };
    landlock_add_rule(ruleset_fd, LANDLOCK_RULE_PATH_BENEATH,
            &path_beneath, 0);
    close(fd);
}
int main(int argc, char **argv) {
    char *output_dir = argv[1];
    int ruleset_fd = landlock_create_ruleset(&ruleset_attr,
            sizeof(ruleset_attr), 0);
    add_fs_rule(ruleset_fd, "/usr/lib", ACCESS_FS_ROUGHLY_READ);
    add_fs_rule(ruleset_fd, output_dir,
            LANDLOCK_ACCESS_FS_WRITE_FILE |
            LANDLOCK_ACCESS_FS_MAKE_REG |
            LANDLOCK_ACCESS_FS_REMOVE_FILE);
    prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
    landlock_enforce_ruleset_current(ruleset_fd, 0);
}

This will *almost* always work; but if the output directory is
/usr/lib/x86_64-linux-gnu/, loading libraries from that directory
won't work anymore, right? So if userspace wanted this to *always*
work correctly, it would have to somehow figure out whether there is
a path upwards from the output directory (under any mount) that will
encounter /usr/lib, and set different permissions if that is the case.
That seems unnecessarily messy to me; and I think that this will make
it harder for generic commandline tools and such to adopt Landlock.
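
As a rough illustration of the extra work this would push onto userspace, here is a naive lexical check. It is a sketch only: path_is_beneath() is a hypothetical helper, and it assumes both paths are absolute and already canonicalized (for instance with realpath(3)), without a trailing slash. It compares strings, so it knows nothing about bind mounts or other mount layouts, which is exactly the part that makes a reliable check messy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return true if `path` equals `parent` or lies below
 * it, comparing whole components so that "/usr/libexec" does not match
 * "/usr/lib". Purely lexical: mounts are not taken into account.
 */
bool path_is_beneath(const char *path, const char *parent)
{
	const size_t len = strlen(parent);

	assert(path[0] == '/' && parent[0] == '/');
	if (strcmp(parent, "/") == 0)
		return true;		/* every absolute path is below "/" */
	if (strncmp(path, parent, len) != 0)
		return false;
	/* The match must end on a component boundary. */
	return path[len] == '\0' || path[len] == '/';
}
```

A tool following the v26 semantics would have to run something like this for every pair of rules and then widen the access mask of the inner rule, and it would still be wrong whenever the same directory is reachable through another mount.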


If you do want to have the ability to deny access to subtrees of trees
to which access is permitted, I think that that should be made
explicit in the UAPI - e.g. you could (at a later point, after this
series has landed) introduce a new EXCLUDE flag for
landlock_add_rule() that means "I want to deny the access specified by
this rule", or something like that. (And you'd have to very carefully
document under which circumstances such rules are actually effective -
e.g. if someone grants full access to $HOME, but excludes $HOME/.ssh,
an attacker would still be able to rename $HOME/.ssh to $HOME/old_ssh,
and then if the program is later restarted and creates the ruleset
from scratch again, the old SSH folder will be accessible.)

2021-01-16 17:18:54

by Mickaël Salaün

Subject: Re: [PATCH v26 07/12] landlock: Support filesystem access-control


On 15/01/2021 19:31, Jann Horn wrote:
> On Fri, Jan 15, 2021 at 10:10 AM Mickaël Salaün <[email protected]> wrote:
>> On 14/01/2021 23:43, Jann Horn wrote:
>>> On Thu, Jan 14, 2021 at 7:54 PM Mickaël Salaün <[email protected]> wrote:
>>>> On 14/01/2021 04:22, Jann Horn wrote:
>>>>> On Wed, Dec 9, 2020 at 8:28 PM Mickaël Salaün <[email protected]> wrote:
>>>>>> Thanks to the Landlock objects and ruleset, it is possible to identify
>>>>>> inodes according to a process's domain. To enable an unprivileged
>>>>>> process to express a file hierarchy, it first needs to open a directory
>>>>>> (or a file) and pass this file descriptor to the kernel through
>>>>>> landlock_add_rule(2). When checking if a file access request is
>>>>>> allowed, we walk from the requested dentry to the real root, following
>>>>>> the different mount layers. The accesses granted by each "tagged" inode
>>>>>> are collected according to their rule layer level, and ANDed to create
>>>>>> access to the requested file hierarchy. This makes it possible to
>>>>>> identify a lot of files without tagging every inode or modifying the
>>>>>> filesystem, while still following the view and understanding the user
>>>>>> has of the filesystem.
>>>>>>
>>>>>> Add a new ARCH_EPHEMERAL_INODES for UML because it currently does not
>>>>>> keep the same struct inode for the same inode while that inode is
>>>>>> in use.
>>>>>>
>>>>>> This commit adds a minimal set of supported filesystem access controls,
>>>>>> which does not make it possible to restrict all file-related actions.
>>>>>> This is the result of multiple discussions to minimize the code of
>>>>>> Landlock to ease review. Thanks to the Landlock design, extending this
>>>>>> access control without breaking user space will not be a problem.
>>>>>> Moreover, seccomp filters can be used to restrict the use of syscall
>>>>>> families which may not be currently handled by Landlock.
>>>>> [...]
>>>>>> +static bool check_access_path_continue(
>>>>>> + const struct landlock_ruleset *const domain,
>>>>>> + const struct path *const path, const u32 access_request,
>>>>>> + u64 *const layer_mask)
>>>>>> +{
>>>>> [...]
>>>>>> + /*
>>>>>> + * An access is granted if, for each policy layer, at least one rule
>>>>>> + * encountered on the pathwalk grants the access, regardless of their
>>>>>> + * position in the layer stack. We must then check not-yet-seen layers
>>>>>> + * for each inode, from the last one added to the first one.
>>>>>> + */
>>>>>> + for (i = 0; i < rule->num_layers; i++) {
>>>>>> + const struct landlock_layer *const layer = &rule->layers[i];
>>>>>> + const u64 layer_level = BIT_ULL(layer->level - 1);
>>>>>> +
>>>>>> + if (!(layer_level & *layer_mask))
>>>>>> + continue;
>>>>>> + if ((layer->access & access_request) != access_request)
>>>>>> + return false;
>>>>>> + *layer_mask &= ~layer_level;
>>>>>
>>>>> Hmm... shouldn't the last 5 lines be replaced by the following?
>>>>>
>>>>> if ((layer->access & access_request) == access_request)
>>>>> *layer_mask &= ~layer_level;
>>>>>
>>>>> And then, since this function would always return true, you could
>>>>> change its return type to "void".
>>>>>
>>>>>
>>>>> As far as I can tell, the current version will still, if a ruleset
>>>>> looks like this:
>>>>>
>>>>> /usr read+write
>>>>> /usr/lib/ read
>>>>>
>>>>> reject write access to /usr/lib, right?
>>>>
>>>> If these two rules are from different layers, then yes it would work as
>>>> intended. However, if these rules are from the same layer the path walk
>>>> will not stop at /usr/lib but go down to /usr, which grants write
>>>> access.
>>>
>>> I don't see why the code would do what you're saying it does. And an
>>> experiment seems to confirm what I said; I checked out landlock-v26,
>>> and the behavior I get is:
>>
>> There is a misunderstanding, I was responding to your proposition to
>> modify check_access_path_continue(), not about the behavior of landlock-v26.
>>
>>>
>>> user@vm:~/landlock$ dd if=/dev/null of=/tmp/aaa
>>> 0+0 records in
>>> 0+0 records out
>>> 0 bytes copied, 0.00106365 s, 0.0 kB/s
>>> user@vm:~/landlock$ LL_FS_RO='/lib' LL_FS_RW='/' ./sandboxer dd
>>> if=/dev/null of=/tmp/aaa
>>> 0+0 records in
>>> 0+0 records out
>>> 0 bytes copied, 0.000491814 s, 0.0 kB/s
>>> user@vm:~/landlock$ LL_FS_RO='/tmp' LL_FS_RW='/' ./sandboxer dd
>>> if=/dev/null of=/tmp/aaa
>>> dd: failed to open '/tmp/aaa': Permission denied
>>> user@vm:~/landlock$
>>>
>>> Granting read access to /tmp prevents writing to it, even though write
>>> access was granted to /.
>>>
>>
>> It indeed works like this with landlock-v26. However, with your above
>> proposition, it would work like this:
>>
>> $ LL_FS_RO='/tmp' LL_FS_RW='/' ./sandboxer dd if=/dev/null of=/tmp/aaa
>> 0+0 records in
>> 0+0 records out
>> 0 bytes copied, 0.000187265 s, 0.0 kB/s
>>
>> …which is not what users would expect I guess. :)
>
> Ah, so we are disagreeing about what the right semantics are. ^^ To
> me, that is exactly the behavior I would expect.
>
> Imagine that someone wants to write a program that needs to be able to
> load libraries from /usr/lib (including subdirectories) and needs to
> be able to write output to some user-specified output directory. So
> they use something like this to sandbox their program (plus error
> handling):
>
> static void add_fs_rule(int ruleset_fd, char *path, u64 allowed_access) {
> int fd = open(path, O_PATH);
> struct landlock_path_beneath_attr path_beneath = {
> .parent_fd = fd,
> .allowed_access = allowed_access
> };
> landlock_add_rule(ruleset_fd, LANDLOCK_RULE_PATH_BENEATH,
> &path_beneath, 0);
> close(fd);
> }
> int main(int argc, char **argv) {
> char *output_dir = argv[1];
> int ruleset_fd = landlock_create_ruleset(&ruleset_attr,
> sizeof(ruleset_attr), 0);
> add_fs_rule(ruleset_fd, "/usr/lib", ACCESS_FS_ROUGHLY_READ);
> add_fs_rule(ruleset_fd, output_dir,
> LANDLOCK_ACCESS_FS_WRITE_FILE|LANDLOCK_ACCESS_FS_MAKE_REG|LANDLOCK_ACCESS_FS_REMOVE_FILE);
> prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
> landlock_enforce_ruleset_current(ruleset_fd, 0);
> }
>
> This will *almost* always work; but if the output directory is
> /usr/lib/x86_64-linux-gnu/, loading libraries from that directory
> won't work anymore, right? So if userspace wanted this to *always*
> work correctly, it would have to somehow figure out whether there is
> a path upwards from the output directory (under any mount) that will
> encounter /usr/lib, and set different permissions if that is the case.
> That seems unnecessarily messy to me; and I think that this will make
> it harder for generic commandline tools and such to adopt Landlock.
>
>
> If you do want to have the ability to deny access to subtrees of trees
> to which access is permitted, I think that that should be made
> explicit in the UAPI - e.g. you could (at a later point, after this
> series has landed) introduce a new EXCLUDE flag for
> landlock_add_rule() that means "I want to deny the access specified by
> this rule", or something like that. (And you'd have to very carefully
> document under which circumstances such rules are actually effective -
> e.g. if someone grants full access to $HOME, but excludes $HOME/.ssh,
> an attacker would still be able to rename $HOME/.ssh to $HOME/old_ssh,
> and then if the program is later restarted and creates the ruleset
> from scratch again, the old SSH folder will be accessible.)
>

OK, it's indeed a more pragmatic approach. I'll take your change and
merge check_access_path_continue() with check_access_path(). Thanks!