2023-10-18 10:51:52

by Michael Weiß

[permalink] [raw]
Subject: [RFC PATCH v2 00/14] device_cgroup: guard mknod for non-initial user namespace

Introduce the flag BPF_DEVCG_ACC_MKNOD_UNS for bpf programs of type
BPF_PROG_TYPE_CGROUP_DEVICE which allows to guard access to mknod
in non-initial user namespaces.

If a container manager restricts its unprivileged (user namespaced)
children by a device cgroup, it is not necessary to deny mknod()
anymore. Thus, user space applications may map devices on different
locations in the file system by using mknod() inside the container.

A use case for this, we also use in GyroidOS, is to run virsh for
VMs inside an unprivileged container. virsh creates device nodes,
e.g., "/var/run/libvirt/qemu/11-fgfg.dev/null" which currently fails
in a non-initial userns, even if a cgroup device white list with the
corresponding major, minor of /dev/null exists. Thus, in this case
the usual bind mounts or pre populated device nodes under /dev are
not sufficient.

To circumvent this limitation, allow mknod() by checking CAP_MKNOD
in the userns by implementing the security_inode_mknod_nscap(). The
hook implementation checks if the corresponding permission flag
BPF_DEVCG_ACC_MKNOD_UNS is set for the device in the bpf program.
To avoid to create unusable inodes in user space the hook also
checks SB_I_NODEV on the corresponding super block.

Further, the security_sb_alloc_userns() hook is implemented using
cgroup_bpf_current_enabled() to allow usage of device nodes on super
blocks mounted by a guarded task.

Patch 1 to 3 rework the current devcgroup_inode hooks as an LSM

Patch 4 to 8 rework explicit calls to devcgroup_check_permission
also as LSM hooks and finalize the conversion of the device_cgroup
subsystem to a LSM.

Patch 9 and 10 introduce new generic security hooks to be used
for the actual mknod device guard implementation.

Patch 11 wires up the security hooks in the vfs

Patch 12 and 13 provide helper functions in the bpf cgroup
subsystem.

Patch 14 finally implement the LSM hooks to grand access

Signed-off-by: Michael Weiß <[email protected]>
---
Changes in v2:
- Integrate this as LSM (Christian, Paul)
- Switched to a device cgroup specific flag instead of a generic
bpf program flag (Christian)
- do not ignore SB_I_NODEV in fs/namei.c but use LSM hook in
sb_alloc_super in fs/super.c
- Link to v1: https://lore.kernel.org/r/[email protected]

Michael Weiß (14):
device_cgroup: Implement devcgroup hooks as lsm security hooks
vfs: Remove explicit devcgroup_inode calls
device_cgroup: Remove explicit devcgroup_inode hooks
lsm: Add security_dev_permission() hook
device_cgroup: Implement dev_permission() hook
block: Switch from devcgroup_check_permission to security hook
drm/amdkfd: Switch from devcgroup_check_permission to security hook
device_cgroup: Hide devcgroup functionality completely in lsm
lsm: Add security_inode_mknod_nscap() hook
lsm: Add security_sb_alloc_userns() hook
vfs: Wire up security hooks for lsm-based device guard in userns
bpf: Add flag BPF_DEVCG_ACC_MKNOD_UNS for device access
bpf: cgroup: Introduce helper cgroup_bpf_current_enabled()
device_cgroup: Allow mknod in non-initial userns if guarded

block/bdev.c | 9 +-
drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 7 +-
fs/namei.c | 24 ++--
fs/super.c | 6 +-
include/linux/bpf-cgroup.h | 2 +
include/linux/device_cgroup.h | 67 -----------
include/linux/lsm_hook_defs.h | 4 +
include/linux/security.h | 18 +++
include/uapi/linux/bpf.h | 1 +
init/Kconfig | 4 +
kernel/bpf/cgroup.c | 14 +++
security/Kconfig | 1 +
security/Makefile | 2 +-
security/device_cgroup/Kconfig | 7 ++
security/device_cgroup/Makefile | 4 +
security/{ => device_cgroup}/device_cgroup.c | 3 +-
security/device_cgroup/device_cgroup.h | 20 ++++
security/device_cgroup/lsm.c | 114 +++++++++++++++++++
security/security.c | 75 ++++++++++++
19 files changed, 294 insertions(+), 88 deletions(-)
delete mode 100644 include/linux/device_cgroup.h
create mode 100644 security/device_cgroup/Kconfig
create mode 100644 security/device_cgroup/Makefile
rename security/{ => device_cgroup}/device_cgroup.c (99%)
create mode 100644 security/device_cgroup/device_cgroup.h
create mode 100644 security/device_cgroup/lsm.c


base-commit: 58720809f52779dc0f08e53e54b014209d13eebb
--
2.30.2


2023-10-18 10:52:00

by Michael Weiß

[permalink] [raw]
Subject: [RFC PATCH v2 01/14] device_cgroup: Implement devcgroup hooks as lsm security hooks

devcgroup_inode_mknod and devcgroup_inode_permission hooks are
called at place where already the corresponding lsm hooks
security_inode_mknod and security_inode_permission are called
to govern device access. Though introduce a small LSM which
implements those two security hooks instead of the additional
explicit devcgroup calls. The explicit API will be removed when
corresponding subsystems will drop the direct call to devcgroup
hooks.

Signed-off-by: Michael Weiß <[email protected]>
---
init/Kconfig | 4 +
security/Kconfig | 1 +
security/Makefile | 2 +-
security/device_cgroup/Kconfig | 7 ++
security/device_cgroup/Makefile | 4 +
security/{ => device_cgroup}/device_cgroup.c | 0
security/device_cgroup/lsm.c | 82 ++++++++++++++++++++
7 files changed, 99 insertions(+), 1 deletion(-)
create mode 100644 security/device_cgroup/Kconfig
create mode 100644 security/device_cgroup/Makefile
rename security/{ => device_cgroup}/device_cgroup.c (100%)
create mode 100644 security/device_cgroup/lsm.c

diff --git a/init/Kconfig b/init/Kconfig
index 6d35728b94b2..5ed28dc821f3 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1111,6 +1111,8 @@ config PROC_PID_CPUSET

config CGROUP_DEVICE
bool "Device controller"
+ select SECURITY
+ select SECURITY_DEVICE_CGROUP
help
Provides a cgroup controller implementing whitelists for
devices which a process in the cgroup can mknod or open.
@@ -1136,6 +1138,8 @@ config CGROUP_BPF
bool "Support for eBPF programs attached to cgroups"
depends on BPF_SYSCALL
select SOCK_CGROUP_DATA
+ select SECURITY
+ select SECURITY_DEVICE_CGROUP
help
Allow attaching eBPF programs to a cgroup using the bpf(2)
syscall command BPF_PROG_ATTACH.
diff --git a/security/Kconfig b/security/Kconfig
index 52c9af08ad35..0a0e60fc50e1 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -194,6 +194,7 @@ source "security/yama/Kconfig"
source "security/safesetid/Kconfig"
source "security/lockdown/Kconfig"
source "security/landlock/Kconfig"
+source "security/device_cgroup/Kconfig"

source "security/integrity/Kconfig"

diff --git a/security/Makefile b/security/Makefile
index 18121f8f85cd..7000cb8a69e8 100644
--- a/security/Makefile
+++ b/security/Makefile
@@ -21,7 +21,7 @@ obj-$(CONFIG_SECURITY_YAMA) += yama/
obj-$(CONFIG_SECURITY_LOADPIN) += loadpin/
obj-$(CONFIG_SECURITY_SAFESETID) += safesetid/
obj-$(CONFIG_SECURITY_LOCKDOWN_LSM) += lockdown/
-obj-$(CONFIG_CGROUPS) += device_cgroup.o
+obj-$(CONFIG_SECURITY_DEVICE_CGROUP) += device_cgroup/
obj-$(CONFIG_BPF_LSM) += bpf/
obj-$(CONFIG_SECURITY_LANDLOCK) += landlock/

diff --git a/security/device_cgroup/Kconfig b/security/device_cgroup/Kconfig
new file mode 100644
index 000000000000..93934bda3b8e
--- /dev/null
+++ b/security/device_cgroup/Kconfig
@@ -0,0 +1,7 @@
+# SPDX-License-Identifier: GPL-2.0-only
+config SECURITY_DEVICE_CGROUP
+ bool "Device Cgroup Support"
+ depends on SECURITY
+ help
+ Provides the necessary security framework integration
+ for cgroup device controller implementations.
diff --git a/security/device_cgroup/Makefile b/security/device_cgroup/Makefile
new file mode 100644
index 000000000000..c715b2b96388
--- /dev/null
+++ b/security/device_cgroup/Makefile
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0-only
+obj-$(CONFIG_SECURITY_DEVICE_CGROUP) += devcgroup.o
+
+devcgroup-y := lsm.o device_cgroup.o
diff --git a/security/device_cgroup.c b/security/device_cgroup/device_cgroup.c
similarity index 100%
rename from security/device_cgroup.c
rename to security/device_cgroup/device_cgroup.c
diff --git a/security/device_cgroup/lsm.c b/security/device_cgroup/lsm.c
new file mode 100644
index 000000000000..ef30cff1f610
--- /dev/null
+++ b/security/device_cgroup/lsm.c
@@ -0,0 +1,82 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Device cgroup security module
+ *
+ * This file contains device cgroup LSM hooks.
+ *
+ * Copyright (C) 2023 Fraunhofer AISEC. All rights reserved.
+ * Based on code copied from <file:include/linux/device_cgroups.h> (which has no copyright)
+ *
+ * Authors: Michael Weiß <[email protected]>
+ */
+
+#include <linux/bpf-cgroup.h>
+#include <linux/device_cgroup.h>
+#include <linux/lsm_hooks.h>
+
+static int devcg_inode_permission(struct inode *inode, int mask)
+{
+ short type, access = 0;
+
+ if (likely(!inode->i_rdev))
+ return 0;
+
+ if (S_ISBLK(inode->i_mode))
+ type = DEVCG_DEV_BLOCK;
+ else if (S_ISCHR(inode->i_mode))
+ type = DEVCG_DEV_CHAR;
+ else
+ return 0;
+
+ if (mask & MAY_WRITE)
+ access |= DEVCG_ACC_WRITE;
+ if (mask & MAY_READ)
+ access |= DEVCG_ACC_READ;
+
+ return devcgroup_check_permission(type, imajor(inode), iminor(inode),
+ access);
+}
+
+static int __devcg_inode_mknod(int mode, dev_t dev, short access)
+{
+ short type;
+
+ if (!S_ISBLK(mode) && !S_ISCHR(mode))
+ return 0;
+
+ if (S_ISCHR(mode) && dev == WHITEOUT_DEV)
+ return 0;
+
+ if (S_ISBLK(mode))
+ type = DEVCG_DEV_BLOCK;
+ else
+ type = DEVCG_DEV_CHAR;
+
+ return devcgroup_check_permission(type, MAJOR(dev), MINOR(dev),
+ access);
+}
+
+static int devcg_inode_mknod(struct inode *dir, struct dentry *dentry,
+ umode_t mode, dev_t dev)
+{
+ return __devcg_inode_mknod(mode, dev, DEVCG_ACC_MKNOD);
+}
+
+static struct security_hook_list devcg_hooks[] __ro_after_init = {
+ LSM_HOOK_INIT(inode_permission, devcg_inode_permission),
+ LSM_HOOK_INIT(inode_mknod, devcg_inode_mknod),
+};
+
+static int __init devcgroup_init(void)
+{
+ security_add_hooks(devcg_hooks, ARRAY_SIZE(devcg_hooks),
+ "devcgroup");
+ pr_info("device cgroup initialized\n");
+ return 0;
+}
+
+DEFINE_LSM(devcgroup) = {
+ .name = "devcgroup",
+ .order = LSM_ORDER_FIRST,
+ .init = devcgroup_init,
+};
--
2.30.2

2023-10-18 10:52:08

by Michael Weiß

[permalink] [raw]
Subject: [RFC PATCH v2 03/14] device_cgroup: Remove explicit devcgroup_inode hooks

All users (actually just fs/namei) of devcgroup_inode_mknod and
devcgroup_inode_permission are removed. Now drop the API completely.

Signed-off-by: Michael Weiß <[email protected]>
---
include/linux/device_cgroup.h | 47 -----------------------------------
1 file changed, 47 deletions(-)

diff --git a/include/linux/device_cgroup.h b/include/linux/device_cgroup.h
index d02f32b7514e..d9a62b0cff87 100644
--- a/include/linux/device_cgroup.h
+++ b/include/linux/device_cgroup.h
@@ -14,54 +14,7 @@
#if defined(CONFIG_CGROUP_DEVICE) || defined(CONFIG_CGROUP_BPF)
int devcgroup_check_permission(short type, u32 major, u32 minor,
short access);
-static inline int devcgroup_inode_permission(struct inode *inode, int mask)
-{
- short type, access = 0;
-
- if (likely(!inode->i_rdev))
- return 0;
-
- if (S_ISBLK(inode->i_mode))
- type = DEVCG_DEV_BLOCK;
- else if (S_ISCHR(inode->i_mode))
- type = DEVCG_DEV_CHAR;
- else
- return 0;
-
- if (mask & MAY_WRITE)
- access |= DEVCG_ACC_WRITE;
- if (mask & MAY_READ)
- access |= DEVCG_ACC_READ;
-
- return devcgroup_check_permission(type, imajor(inode), iminor(inode),
- access);
-}
-
-static inline int devcgroup_inode_mknod(int mode, dev_t dev)
-{
- short type;
-
- if (!S_ISBLK(mode) && !S_ISCHR(mode))
- return 0;
-
- if (S_ISCHR(mode) && dev == WHITEOUT_DEV)
- return 0;
-
- if (S_ISBLK(mode))
- type = DEVCG_DEV_BLOCK;
- else
- type = DEVCG_DEV_CHAR;
-
- return devcgroup_check_permission(type, MAJOR(dev), MINOR(dev),
- DEVCG_ACC_MKNOD);
-}
-
#else
static inline int devcgroup_check_permission(short type, u32 major, u32 minor,
short access)
-{ return 0; }
-static inline int devcgroup_inode_permission(struct inode *inode, int mask)
-{ return 0; }
-static inline int devcgroup_inode_mknod(int mode, dev_t dev)
-{ return 0; }
#endif
--
2.30.2

2023-10-18 10:52:09

by Michael Weiß

[permalink] [raw]
Subject: [RFC PATCH v2 14/14] device_cgroup: Allow mknod in non-initial userns if guarded

If a container manager restricts its unprivileged (user namespaced)
children by a device cgroup, it is not necessary to deny mknod()
anymore. Thus, user space applications may map devices on different
locations in the file system by using mknod() inside the container.

A use case for this, we also use in GyroidOS, is to run virsh for
VMs inside an unprivileged container. virsh creates device nodes,
e.g., "/var/run/libvirt/qemu/11-fgfg.dev/null" which currently fails
in a non-initial userns, even if a cgroup device white list with the
corresponding major, minor of /dev/null exists. Thus, in this case
the usual bind mounts or pre populated device nodes under /dev are
not sufficient.

To circumvent this limitation, allow mknod() by checking CAP_MKNOD
in the userns by implementing the security_inode_mknod_nscap(). The
hook implementation checks if the corresponding permission flag
BPF_DEVCG_ACC_MKNOD_UNS is set for the device in the bpf program.
To avoid to create unusable inodes in user space the hook also checks
SB_I_NODEV on the corresponding super block.

Further, the security_sb_alloc_userns() hook is implemented using
cgroup_bpf_current_enabled() to allow usage of device nodes on super
blocks mounted by a guarded task.

Signed-off-by: Michael Weiß <[email protected]>
---
security/device_cgroup/lsm.c | 27 +++++++++++++++++++++++++++
1 file changed, 27 insertions(+)

diff --git a/security/device_cgroup/lsm.c b/security/device_cgroup/lsm.c
index a963536d0a15..6bc984d9c9d1 100644
--- a/security/device_cgroup/lsm.c
+++ b/security/device_cgroup/lsm.c
@@ -66,10 +66,37 @@ static int devcg_inode_mknod(struct inode *dir, struct dentry *dentry,
return __devcg_inode_mknod(mode, dev, DEVCG_ACC_MKNOD);
}

+#ifdef CONFIG_CGROUP_BPF
+static int devcg_sb_alloc_userns(struct super_block *sb)
+{
+ if (cgroup_bpf_current_enabled(CGROUP_DEVICE))
+ return 0;
+
+ return -EPERM;
+}
+
+static int devcg_inode_mknod_nscap(struct inode *dir, struct dentry *dentry,
+ umode_t mode, dev_t dev)
+{
+ if (!cgroup_bpf_current_enabled(CGROUP_DEVICE))
+ return -EPERM;
+
+ // avoid to create unusable inodes in user space
+ if (dentry->d_sb->s_iflags & SB_I_NODEV)
+ return -EPERM;
+
+ return __devcg_inode_mknod(mode, dev, BPF_DEVCG_ACC_MKNOD_UNS);
+}
+#endif /* CONFIG_CGROUP_BPF */
+
static struct security_hook_list devcg_hooks[] __ro_after_init = {
LSM_HOOK_INIT(inode_permission, devcg_inode_permission),
LSM_HOOK_INIT(inode_mknod, devcg_inode_mknod),
LSM_HOOK_INIT(dev_permission, devcg_dev_permission),
+#ifdef CONFIG_CGROUP_BPF
+ LSM_HOOK_INIT(sb_alloc_userns, devcg_sb_alloc_userns),
+ LSM_HOOK_INIT(inode_mknod_nscap, devcg_inode_mknod_nscap),
+#endif
};

static int __init devcgroup_init(void)
--
2.30.2

2023-10-18 10:52:14

by Michael Weiß

[permalink] [raw]
Subject: [RFC PATCH v2 12/14] bpf: Add flag BPF_DEVCG_ACC_MKNOD_UNS for device access

With this new flag for bpf cgroup device programs, it should be
possible to guard mknod() access in non-initial user namespaces
later on.

Signed-off-by: Michael Weiß <[email protected]>
---
include/uapi/linux/bpf.h | 1 +
1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 0448700890f7..0196b9c72d3e 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -6927,6 +6927,7 @@ enum {
BPF_DEVCG_ACC_MKNOD = (1ULL << 0),
BPF_DEVCG_ACC_READ = (1ULL << 1),
BPF_DEVCG_ACC_WRITE = (1ULL << 2),
+ BPF_DEVCG_ACC_MKNOD_UNS = (1ULL << 3),
};

enum {
--
2.30.2

2023-10-18 10:52:15

by Michael Weiß

[permalink] [raw]
Subject: [RFC PATCH v2 08/14] device_cgroup: Hide devcgroup functionality completely in lsm

Now since all users of devcgroup_check_permission() have been
removed, all device cgroup related functionality is covered by
security hooks. Thus, move the public device_cgroup.h header
into the subfolder of the lsm module.

Signed-off-by: Michael Weiß <[email protected]>
---
security/device_cgroup/device_cgroup.c | 3 ++-
{include/linux => security/device_cgroup}/device_cgroup.h | 0
security/device_cgroup/lsm.c | 3 ++-
3 files changed, 4 insertions(+), 2 deletions(-)
rename {include/linux => security/device_cgroup}/device_cgroup.h (100%)

diff --git a/security/device_cgroup/device_cgroup.c b/security/device_cgroup/device_cgroup.c
index dc4df7475081..1a8190929ec3 100644
--- a/security/device_cgroup/device_cgroup.c
+++ b/security/device_cgroup/device_cgroup.c
@@ -6,7 +6,6 @@
*/

#include <linux/bpf-cgroup.h>
-#include <linux/device_cgroup.h>
#include <linux/cgroup.h>
#include <linux/ctype.h>
#include <linux/list.h>
@@ -16,6 +15,8 @@
#include <linux/rcupdate.h>
#include <linux/mutex.h>

+#include "device_cgroup.h"
+
#ifdef CONFIG_CGROUP_DEVICE

static DEFINE_MUTEX(devcgroup_mutex);
diff --git a/include/linux/device_cgroup.h b/security/device_cgroup/device_cgroup.h
similarity index 100%
rename from include/linux/device_cgroup.h
rename to security/device_cgroup/device_cgroup.h
diff --git a/security/device_cgroup/lsm.c b/security/device_cgroup/lsm.c
index 987d2c20a577..a963536d0a15 100644
--- a/security/device_cgroup/lsm.c
+++ b/security/device_cgroup/lsm.c
@@ -11,9 +11,10 @@
*/

#include <linux/bpf-cgroup.h>
-#include <linux/device_cgroup.h>
#include <linux/lsm_hooks.h>

+#include "device_cgroup.h"
+
static int devcg_dev_permission(umode_t mode, dev_t dev, int mask)
{
short type, access = 0;
--
2.30.2

2023-10-18 10:52:21

by Michael Weiß

[permalink] [raw]
Subject: [RFC PATCH v2 09/14] lsm: Add security_inode_mknod_nscap() hook

Provide a new lsm hook which may be used to allow mknod in
non-initial userns. If access to the device is guarded by this
hook, access to mknod may be granted by checking cap mknod for
unprivileged user namespaces.

By default this will return -EPERM if no lsm implements the
hook. A first lsm to use this will be the lately converted
cgroup_device module.

Signed-off-by: Michael Weiß <[email protected]>
---
include/linux/lsm_hook_defs.h | 2 ++
include/linux/security.h | 8 ++++++++
security/security.c | 31 +++++++++++++++++++++++++++++++
3 files changed, 41 insertions(+)

diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index a868982725a9..f4fa01182910 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -276,6 +276,8 @@ LSM_HOOK(int, 0, inode_setsecctx, struct dentry *dentry, void *ctx, u32 ctxlen)
LSM_HOOK(int, 0, inode_getsecctx, struct inode *inode, void **ctx,
u32 *ctxlen)
LSM_HOOK(int, 0, dev_permission, umode_t mode, dev_t dev, int mask)
+LSM_HOOK(int, -EPERM, inode_mknod_nscap, struct inode *dir, struct dentry *dentry,
+ umode_t mode, dev_t dev)

#if defined(CONFIG_SECURITY) && defined(CONFIG_WATCH_QUEUE)
LSM_HOOK(int, 0, post_notification, const struct cred *w_cred,
diff --git a/include/linux/security.h b/include/linux/security.h
index 8bc6ac8816c6..bad6992877f4 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -485,6 +485,8 @@ int security_inode_setsecctx(struct dentry *dentry, void *ctx, u32 ctxlen);
int security_inode_getsecctx(struct inode *inode, void **ctx, u32 *ctxlen);
int security_locked_down(enum lockdown_reason what);
int security_dev_permission(umode_t mode, dev_t dev, int mask);
+int security_inode_mknod_nscap(struct inode *dir, struct dentry *dentry,
+ umode_t mode, dev_t dev);
#else /* CONFIG_SECURITY */

static inline int call_blocking_lsm_notifier(enum lsm_event event, void *data)
@@ -1400,6 +1402,12 @@ static inline int security_dev_permission(umode_t mode, dev_t dev, int mask)
{
return 0;
}
+static inline int security_inode_mknod_nscap(struct inode *dir,
+ struct dentry *dentry,
+ umode_t mode, dev_t dev);
+{
+ return -EPERM;
+}
#endif /* CONFIG_SECURITY */

#if defined(CONFIG_SECURITY) && defined(CONFIG_WATCH_QUEUE)
diff --git a/security/security.c b/security/security.c
index 40f6787df3b1..7708374b6d7e 100644
--- a/security/security.c
+++ b/security/security.c
@@ -4034,6 +4034,37 @@ int security_dev_permission(umode_t mode, dev_t dev, int mask)
}
EXPORT_SYMBOL(security_dev_permission);

+/**
+ * security_inode_mknod_nscap() - Check if device is guarded
+ * @dir: parent directory
+ * @dentry: new file
+ * @mode: new file mode
+ * @dev: device number
+ *
+ * If access to the device is guarded by this hook, access to mknod may be granted by
+ * checking cap mknod for unprivileged user namespaces.
+ *
+ * Return: Returns 0 on success, error on failure.
+ */
+int security_inode_mknod_nscap(struct inode *dir, struct dentry *dentry,
+ umode_t mode, dev_t dev)
+{
+ int thisrc;
+ int rc = LSM_RET_DEFAULT(inode_mknod_nscap);
+ struct security_hook_list *hp;
+
+ hlist_for_each_entry(hp, &security_hook_heads.inode_mknod_nscap, list) {
+ thisrc = hp->hook.inode_mknod_nscap(dir, dentry, mode, dev);
+ if (thisrc != LSM_RET_DEFAULT(inode_mknod_nscap)) {
+ rc = thisrc;
+ if (thisrc != 0)
+ break;
+ }
+ }
+ return rc;
+}
+EXPORT_SYMBOL(security_inode_mknod_nscap);
+
#ifdef CONFIG_WATCH_QUEUE
/**
* security_post_notification() - Check if a watch notification can be posted
--
2.30.2

2023-10-18 10:52:21

by Michael Weiß

[permalink] [raw]
Subject: [RFC PATCH v2 11/14] vfs: Wire up security hooks for lsm-based device guard in userns

Wire up security_inode_mknod_capns() in fs/namei.c. If implemented
and access is granted by an lsm, check ns_capable() instead of the
global CAP_MKNOD.

Wire up security_sb_alloc_userns() in fs/super.c. If implemented
and access is granted by an lsm, the created super block will allow
access to device nodes also if it was created in a non-inital userns.

Signed-off-by: Michael Weiß <[email protected]>
---
fs/namei.c | 16 +++++++++++++++-
fs/super.c | 6 +++++-
2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index f601fcbdc4d2..1f68d160e2c0 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3949,6 +3949,20 @@ inline struct dentry *user_path_create(int dfd, const char __user *pathname,
}
EXPORT_SYMBOL(user_path_create);

+static bool mknod_capable(struct inode *dir, struct dentry *dentry,
+ umode_t mode, dev_t dev)
+{
+ /*
+ * In case of a security hook implementation check mknod in user
+ * namespace. Otherwise just check global capability.
+ */
+ int error = security_inode_mknod_nscap(dir, dentry, mode, dev);
+ if (!error)
+ return ns_capable(current_user_ns(), CAP_MKNOD);
+ else
+ return capable(CAP_MKNOD);
+}
+
/**
* vfs_mknod - create device node or file
* @idmap: idmap of the mount the inode was found from
@@ -3975,7 +3989,7 @@ int vfs_mknod(struct mnt_idmap *idmap, struct inode *dir,
return error;

if ((S_ISCHR(mode) || S_ISBLK(mode)) && !is_whiteout &&
- !capable(CAP_MKNOD))
+ !mknod_capable(dir, dentry, mode, dev))
return -EPERM;

if (!dir->i_op->mknod)
diff --git a/fs/super.c b/fs/super.c
index 2d762ce67f6e..bb01db6d9986 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -362,7 +362,11 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
}
s->s_bdi = &noop_backing_dev_info;
s->s_flags = flags;
- if (s->s_user_ns != &init_user_ns)
+ /*
+ * We still have to think about this here. Several concerns exist
+ * about the security model, especially about malicious fuse.
+ */
+ if (s->s_user_ns != &init_user_ns && security_sb_alloc_userns(s))
s->s_iflags |= SB_I_NODEV;
INIT_HLIST_NODE(&s->s_instances);
INIT_HLIST_BL_HEAD(&s->s_roots);
--
2.30.2

2023-10-18 10:52:27

by Michael Weiß

[permalink] [raw]
Subject: [RFC PATCH v2 04/14] lsm: Add security_dev_permission() hook

Provide a new lsm hook which may be used to check permission on
a device by its dev_t representation only. This could be used if
an inode is not available and the security_inode_permission
check is not applicable.

A first lsm to use this will be the lately converted cgroup_device
module, to allow permission checks inside driver implementations.

Signed-off-by: Michael Weiß <[email protected]>
---
include/linux/lsm_hook_defs.h | 1 +
include/linux/security.h | 5 +++++
security/security.c | 18 ++++++++++++++++++
3 files changed, 24 insertions(+)

diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index ac962c4cb44b..a868982725a9 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -275,6 +275,7 @@ LSM_HOOK(int, 0, inode_notifysecctx, struct inode *inode, void *ctx, u32 ctxlen)
LSM_HOOK(int, 0, inode_setsecctx, struct dentry *dentry, void *ctx, u32 ctxlen)
LSM_HOOK(int, 0, inode_getsecctx, struct inode *inode, void **ctx,
u32 *ctxlen)
+LSM_HOOK(int, 0, dev_permission, umode_t mode, dev_t dev, int mask)

#if defined(CONFIG_SECURITY) && defined(CONFIG_WATCH_QUEUE)
LSM_HOOK(int, 0, post_notification, const struct cred *w_cred,
diff --git a/include/linux/security.h b/include/linux/security.h
index 5f16eecde00b..8bc6ac8816c6 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -484,6 +484,7 @@ int security_inode_notifysecctx(struct inode *inode, void *ctx, u32 ctxlen);
int security_inode_setsecctx(struct dentry *dentry, void *ctx, u32 ctxlen);
int security_inode_getsecctx(struct inode *inode, void **ctx, u32 *ctxlen);
int security_locked_down(enum lockdown_reason what);
+int security_dev_permission(umode_t mode, dev_t dev, int mask);
#else /* CONFIG_SECURITY */

static inline int call_blocking_lsm_notifier(enum lsm_event event, void *data)
@@ -1395,6 +1396,10 @@ static inline int security_locked_down(enum lockdown_reason what)
{
return 0;
}
+static inline int security_dev_permission(umode_t mode, dev_t dev, int mask)
+{
+ return 0;
+}
#endif /* CONFIG_SECURITY */

#if defined(CONFIG_SECURITY) && defined(CONFIG_WATCH_QUEUE)
diff --git a/security/security.c b/security/security.c
index 23b129d482a7..40f6787df3b1 100644
--- a/security/security.c
+++ b/security/security.c
@@ -4016,6 +4016,24 @@ int security_inode_getsecctx(struct inode *inode, void **ctx, u32 *ctxlen)
}
EXPORT_SYMBOL(security_inode_getsecctx);

+/**
+ * security_dev_permission() - Check if accessing a dev is allowed
+ * @mode: file mode holding device type
+ * @dev: device
+ * @mask: access mask
+ *
+ * Check permission before accessing an device by its major minor.
+ * This hook is called by drivers which may not have an inode but only
+ * the dev_t representation of a device to check permission.
+ *
+ * Return: Returns 0 if permission is granted.
+ */
+int security_dev_permission(umode_t mode, dev_t dev, int mask)
+{
+ return call_int_hook(dev_permission, 0, mode, dev, mask);
+}
+EXPORT_SYMBOL(security_dev_permission);
+
#ifdef CONFIG_WATCH_QUEUE
/**
* security_post_notification() - Check if a watch notification can be posted
--
2.30.2

2023-10-18 10:52:41

by Michael Weiß

[permalink] [raw]
Subject: [RFC PATCH v2 13/14] bpf: cgroup: Introduce helper cgroup_bpf_current_enabled()

This helper can be used to check if a cgroup-bpf specific program is
active for the current task.

Signed-off-by: Michael Weiß <[email protected]>
Reviewed-by: Alexander Mikhalitsyn <[email protected]>
---
include/linux/bpf-cgroup.h | 2 ++
kernel/bpf/cgroup.c | 14 ++++++++++++++
2 files changed, 16 insertions(+)

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index 8506690dbb9c..655697c2a620 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -184,6 +184,8 @@ static inline bool cgroup_bpf_sock_enabled(struct sock *sk,
return array != &bpf_empty_prog_array.hdr;
}

+bool cgroup_bpf_current_enabled(enum cgroup_bpf_attach_type type);
+
/* Wrappers for __cgroup_bpf_run_filter_skb() guarded by cgroup_bpf_enabled. */
#define BPF_CGROUP_RUN_PROG_INET_INGRESS(sk, skb) \
({ \
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 03b3d4492980..19ae3d037db7 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -24,6 +24,20 @@
DEFINE_STATIC_KEY_ARRAY_FALSE(cgroup_bpf_enabled_key, MAX_CGROUP_BPF_ATTACH_TYPE);
EXPORT_SYMBOL(cgroup_bpf_enabled_key);

+bool cgroup_bpf_current_enabled(enum cgroup_bpf_attach_type type)
+{
+ struct cgroup *cgrp;
+ struct bpf_prog_array *array;
+
+ rcu_read_lock();
+ cgrp = task_dfl_cgroup(current);
+ rcu_read_unlock();
+
+ array = rcu_access_pointer(cgrp->bpf.effective[type]);
+ return array != &bpf_empty_prog_array.hdr;
+}
+EXPORT_SYMBOL(cgroup_bpf_current_enabled);
+
/* __always_inline is necessary to prevent indirect call through run_prog
* function pointer.
*/
--
2.30.2

2023-10-18 10:57:18

by Michael Weiß

[permalink] [raw]
Subject: [RFC PATCH v2 07/14] drm/amdkfd: Switch from devcgroup_check_permission to security hook

The new lsm-based cgroup device access control provides an
equivalent hook to check device permission. Thus, switch to the
more generic security hook security_dev_permission() instead of
directly calling devcgroup_check_permission().

Signed-off-by: Michael Weiß <[email protected]>
---
drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index fa24e1852493..50979f332e38 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -38,7 +38,7 @@
#include <linux/seq_file.h>
#include <linux/kref.h>
#include <linux/sysfs.h>
-#include <linux/device_cgroup.h>
+#include <linux/security.h>
#include <drm/drm_file.h>
#include <drm/drm_drv.h>
#include <drm/drm_device.h>
@@ -1487,9 +1487,8 @@ static inline int kfd_devcgroup_check_permission(struct kfd_node *kfd)
#if defined(CONFIG_CGROUP_DEVICE) || defined(CONFIG_CGROUP_BPF)
struct drm_device *ddev = adev_to_drm(kfd->adev);

- return devcgroup_check_permission(DEVCG_DEV_CHAR, DRM_MAJOR,
- ddev->render->index,
- DEVCG_ACC_WRITE | DEVCG_ACC_READ);
+ return security_dev_permission(S_IFCHR, MKDEV(DRM_MAJOR, ddev->render->index),
+ MAY_WRITE | MAY_READ);
#else
return 0;
#endif
--
2.30.2

2023-10-18 10:57:25

by Michael Weiß

[permalink] [raw]
Subject: [RFC PATCH v2 10/14] lsm: Add security_sb_alloc_userns() hook

Provide a new lsm hook which may be used to allow access to device
nodes for super blocks created in unprivileged namespaces if some
sort of device guard to control access is implemented.

By default this will return -EPERM if no lsm implements the hook.
A first lsm to use this will be the lately converted cgroup_device
module.

Signed-off-by: Michael Weiß <[email protected]>
---
include/linux/lsm_hook_defs.h | 1 +
include/linux/security.h | 5 +++++
security/security.c | 26 ++++++++++++++++++++++++++
3 files changed, 32 insertions(+)

diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index f4fa01182910..0f734a0a5ebc 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -278,6 +278,7 @@ LSM_HOOK(int, 0, inode_getsecctx, struct inode *inode, void **ctx,
LSM_HOOK(int, 0, dev_permission, umode_t mode, dev_t dev, int mask)
LSM_HOOK(int, -EPERM, inode_mknod_nscap, struct inode *dir, struct dentry *dentry,
umode_t mode, dev_t dev)
+LSM_HOOK(int, -EPERM, sb_alloc_userns, struct super_block *sb)

#if defined(CONFIG_SECURITY) && defined(CONFIG_WATCH_QUEUE)
LSM_HOOK(int, 0, post_notification, const struct cred *w_cred,
diff --git a/include/linux/security.h b/include/linux/security.h
index bad6992877f4..0f66be1ed1ed 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -487,6 +487,7 @@ int security_locked_down(enum lockdown_reason what);
int security_dev_permission(umode_t mode, dev_t dev, int mask);
int security_inode_mknod_nscap(struct inode *dir, struct dentry *dentry,
umode_t mode, dev_t dev);
+int security_sb_alloc_userns(struct super_block *sb);
#else /* CONFIG_SECURITY */

static inline int call_blocking_lsm_notifier(enum lsm_event event, void *data)
@@ -1408,6 +1409,10 @@ static inline int security_inode_mknod_nscap(struct inode *dir,
{
return -EPERM;
}
+static inline int security_sb_alloc_userns(struct super_block *sb)
+{
+ return -EPERM;
+}
#endif /* CONFIG_SECURITY */

#if defined(CONFIG_SECURITY) && defined(CONFIG_WATCH_QUEUE)
diff --git a/security/security.c b/security/security.c
index 7708374b6d7e..9d5d4ec28e62 100644
--- a/security/security.c
+++ b/security/security.c
@@ -4065,6 +4065,32 @@ int security_inode_mknod_nscap(struct inode *dir, struct dentry *dentry,
}
EXPORT_SYMBOL(security_inode_mknod_nscap);

+/**
+ * security_sb_alloc_userns() - Grand access to device nodes on sb in userns
+ *
+ * If device access is provided elsewere, this hook will grand access to device nodes
+ * on the allocated sb for unprivileged user namespaces.
+ *
+ * Return: Returns 0 on success, error on failure.
+ */
+int security_sb_alloc_userns(struct super_block *sb)
+{
+ int thisrc;
+ int rc = LSM_RET_DEFAULT(sb_alloc_userns);
+ struct security_hook_list *hp;
+
+ hlist_for_each_entry(hp, &security_hook_heads.sb_alloc_userns, list) {
+ thisrc = hp->hook.sb_alloc_userns(sb);
+ if (thisrc != LSM_RET_DEFAULT(sb_alloc_userns)) {
+ rc = thisrc;
+ if (thisrc != 0)
+ break;
+ }
+ }
+ return rc;
+}
+EXPORT_SYMBOL(security_sb_alloc_userns);
+
#ifdef CONFIG_WATCH_QUEUE
/**
* security_post_notification() - Check if a watch notification can be posted
--
2.30.2

2023-10-18 10:57:26

by Michael Weiß

[permalink] [raw]
Subject: [RFC PATCH v2 02/14] vfs: Remove explicit devcgroup_inode calls

Since the new lsm-based cgroup device access control is settled,
the explicit calls to devcgroup_inode_permission and
devcgroup_inode_mknod in fs/namei.c are redundant and can safely
be dropped. The corresponding security_inode_permission and
security_inode_mknod hooks are taking over.

Signed-off-by: Michael Weiß <[email protected]>
---
fs/namei.c | 8 --------
1 file changed, 8 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 567ee547492b..f601fcbdc4d2 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -34,7 +34,6 @@
#include <linux/capability.h>
#include <linux/file.h>
#include <linux/fcntl.h>
-#include <linux/device_cgroup.h>
#include <linux/fs_struct.h>
#include <linux/posix_acl.h>
#include <linux/hash.h>
@@ -529,10 +528,6 @@ int inode_permission(struct mnt_idmap *idmap,
if (retval)
return retval;

- retval = devcgroup_inode_permission(inode, mask);
- if (retval)
- return retval;
-
return security_inode_permission(inode, mask);
}
EXPORT_SYMBOL(inode_permission);
@@ -3987,9 +3982,6 @@ int vfs_mknod(struct mnt_idmap *idmap, struct inode *dir,
return -EPERM;

mode = vfs_prepare_mode(idmap, dir, mode, mode, mode);
- error = devcgroup_inode_mknod(mode, dev);
- if (error)
- return error;

error = security_inode_mknod(dir, dentry, mode, dev);
if (error)
--
2.30.2

2023-10-18 10:57:30

by Michael Weiß

[permalink] [raw]
Subject: [RFC PATCH v2 05/14] device_cgroup: Implement dev_permission() hook

Wrap devcgroup_check_permission() by implementing the new security
hook dev_permission().

Signed-off-by: Michael Weiß <[email protected]>
---
security/device_cgroup/lsm.c | 22 +++++++++++++---------
1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/security/device_cgroup/lsm.c b/security/device_cgroup/lsm.c
index ef30cff1f610..987d2c20a577 100644
--- a/security/device_cgroup/lsm.c
+++ b/security/device_cgroup/lsm.c
@@ -14,29 +14,32 @@
#include <linux/device_cgroup.h>
#include <linux/lsm_hooks.h>

-static int devcg_inode_permission(struct inode *inode, int mask)
+static int devcg_dev_permission(umode_t mode, dev_t dev, int mask)
{
short type, access = 0;

- if (likely(!inode->i_rdev))
- return 0;
-
- if (S_ISBLK(inode->i_mode))
+ if (S_ISBLK(mode))
type = DEVCG_DEV_BLOCK;
- else if (S_ISCHR(inode->i_mode))
- type = DEVCG_DEV_CHAR;
else
- return 0;
+ type = DEVCG_DEV_CHAR;

if (mask & MAY_WRITE)
access |= DEVCG_ACC_WRITE;
if (mask & MAY_READ)
access |= DEVCG_ACC_READ;

- return devcgroup_check_permission(type, imajor(inode), iminor(inode),
+ return devcgroup_check_permission(type, MAJOR(dev), MINOR(dev),
access);
}

+static int devcg_inode_permission(struct inode *inode, int mask)
+{
+ if (likely(!inode->i_rdev))
+ return 0;
+
+ return devcg_dev_permission(inode->i_mode, inode->i_rdev, mask);
+}
+
static int __devcg_inode_mknod(int mode, dev_t dev, short access)
{
short type;
@@ -65,6 +68,7 @@ static int devcg_inode_mknod(struct inode *dir, struct dentry *dentry,
static struct security_hook_list devcg_hooks[] __ro_after_init = {
LSM_HOOK_INIT(inode_permission, devcg_inode_permission),
LSM_HOOK_INIT(inode_mknod, devcg_inode_mknod),
+ LSM_HOOK_INIT(dev_permission, devcg_dev_permission),
};

static int __init devcgroup_init(void)
--
2.30.2

2023-10-18 10:57:37

by Michael Weiß

[permalink] [raw]
Subject: [RFC PATCH v2 06/14] block: Switch from devcgroup_check_permission to security hook

The new lsm-based cgroup device access control provides an
equivalent hook to check device permission. Thus, switch to the
more generic security hook security_dev_permission() instead of
directly calling devcgroup_check_permission().

Signed-off-by: Michael Weiß <[email protected]>
---
block/bdev.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index f3b13aa1b7d4..fc6de4e2a80b 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -10,7 +10,6 @@
#include <linux/slab.h>
#include <linux/kmod.h>
#include <linux/major.h>
-#include <linux/device_cgroup.h>
#include <linux/blkdev.h>
#include <linux/blk-integrity.h>
#include <linux/backing-dev.h>
@@ -27,6 +26,7 @@
#include <linux/part_stat.h>
#include <linux/uaccess.h>
#include <linux/stat.h>
+#include <linux/security.h>
#include "../fs/internal.h"
#include "blk.h"

@@ -757,10 +757,9 @@ struct block_device *blkdev_get_by_dev(dev_t dev, blk_mode_t mode, void *holder,
struct gendisk *disk;
int ret;

- ret = devcgroup_check_permission(DEVCG_DEV_BLOCK,
- MAJOR(dev), MINOR(dev),
- ((mode & BLK_OPEN_READ) ? DEVCG_ACC_READ : 0) |
- ((mode & BLK_OPEN_WRITE) ? DEVCG_ACC_WRITE : 0));
+ ret = security_dev_permission(S_IFBLK, dev,
+ ((mode & BLK_OPEN_READ) ? MAY_READ : 0) |
+ ((mode & BLK_OPEN_WRITE) ? MAY_WRITE : 0));
if (ret)
return ERR_PTR(ret);

--
2.30.2