2013-03-18 21:33:04

by Matthew Garrett

Subject: [PATCH 01/12] Security: Add CAP_COMPROMISE_KERNEL

Caring about protecting the kernel from UID 0 was previously relatively
uninteresting, since an attacker could simply modify the kernel, a module
or an earlier part of the boot chain in order to insert new code. However,
there are now a range of widely-deployed mechanisms for ensuring the
authenticity of the early boot process and kernel. The addition of module
signing makes most of these attacks infeasible.

This means we can return our focus to the kernel. There are currently a number
of kernel interfaces that permit privileged userspace to modify the running
kernel. These are currently protected by CAP_SYS_RAWIO, but unfortunately
the semantics of this capability are poorly defined and it now covers a large
superset of the desired behaviour.

This patch introduces CAP_COMPROMISE_KERNEL. Holding this capability
indicates that a process is empowered to perform tasks that may result in
modification of the running kernel. While aimed at handling the specific
use-case of Secure Boot, it is generalisable to any other environment where
permitting userspace to modify the kernel is undesirable.

Signed-off-by: Matthew Garrett <[email protected]>
---
include/uapi/linux/capability.h | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
index ba478fa..7109e650 100644
--- a/include/uapi/linux/capability.h
+++ b/include/uapi/linux/capability.h
@@ -343,7 +343,11 @@ struct vfs_cap_data {

#define CAP_BLOCK_SUSPEND 36

-#define CAP_LAST_CAP CAP_BLOCK_SUSPEND
+/* Allow things that trivially permit root to modify the running kernel */
+
+#define CAP_COMPROMISE_KERNEL 37
+
+#define CAP_LAST_CAP CAP_COMPROMISE_KERNEL

#define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP)

--
1.8.1.2
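
For readers following the series, here is a minimal userspace sketch of what adding capability number 37 means for the bitmask representation. The constants and `cap_valid()` come from the patch above, and the index/mask arithmetic matches the kernel's `CAP_TO_INDEX`/`CAP_TO_MASK` helpers. `struct cap_set` and the `cap_set_*` functions are hypothetical stand-ins for `kernel_cap_t` and its accessors, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the patch above; everything else is a userspace model. */
#define CAP_BLOCK_SUSPEND      36
#define CAP_COMPROMISE_KERNEL  37
#define CAP_LAST_CAP           CAP_COMPROMISE_KERNEL

#define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP)

/* Same arithmetic as the kernel helpers: 32 capabilities per word. */
#define CAP_TO_INDEX(x) ((x) >> 5)
#define CAP_TO_MASK(x)  (1u << ((x) & 31))

/* Hypothetical stand-in for kernel_cap_t: two 32-bit words. */
struct cap_set {
	uint32_t cap[2];
};

/* Set the bit for one capability; out-of-range numbers are ignored. */
static void cap_set_raise(struct cap_set *s, int cap)
{
	if (cap_valid(cap))
		s->cap[CAP_TO_INDEX(cap)] |= CAP_TO_MASK(cap);
}

/* Clear the bit for one capability; out-of-range numbers are ignored. */
static void cap_set_lower(struct cap_set *s, int cap)
{
	if (cap_valid(cap))
		s->cap[CAP_TO_INDEX(cap)] &= ~CAP_TO_MASK(cap);
}

/* Report whether the capability bit is currently set. */
static int cap_set_has(const struct cap_set *s, int cap)
{
	return cap_valid(cap) &&
	       (s->cap[CAP_TO_INDEX(cap)] & CAP_TO_MASK(cap)) != 0;
}
```

Capability 37 lands in bit 5 of the second word, so no change to the size of the set is needed; only `CAP_LAST_CAP` has to move.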


2013-03-18 21:33:01

by Matthew Garrett

Subject: [PATCH 03/12] Secure boot: Add a dummy kernel parameter that will switch on Secure Boot mode

From: Josh Boyer <[email protected]>

This forcibly drops CAP_COMPROMISE_KERNEL from both cap_permitted and cap_bset
in the init_cred struct, which everything else inherits from. This works on
any machine and can be used to develop even if the box doesn't have UEFI.

Signed-off-by: Josh Boyer <[email protected]>
---
Documentation/kernel-parameters.txt | 7 +++++++
kernel/cred.c | 17 +++++++++++++++++
2 files changed, 24 insertions(+)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 4609e81..7c0b137 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2683,6 +2683,13 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
Note: increases power consumption, thus should only be
enabled if running jitter sensitive (HPC/RT) workloads.

+ secureboot_enable=
+ [KNL] Enables an emulated UEFI Secure Boot mode. This
+ locks down various aspects of the kernel guarded by the
+ CAP_COMPROMISE_KERNEL capability. This includes things
+ like /dev/mem, IO port access, and other areas. It can
+ be used on non-UEFI machines for testing purposes.
+
security= [SECURITY] Choose a security module to enable at boot.
If this boot parameter is not specified, only the first
security module asking for security registration will be
diff --git a/kernel/cred.c b/kernel/cred.c
index e0573a4..c3f4e3e 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -565,6 +565,23 @@ void __init cred_init(void)
0, SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL);
}

+void __init secureboot_enable(void)
+{
+ pr_info("Secure boot enabled\n");
+ cap_lower((&init_cred)->cap_bset, CAP_COMPROMISE_KERNEL);
+ cap_lower((&init_cred)->cap_permitted, CAP_COMPROMISE_KERNEL);
+}
+
+/* Dummy Secure Boot enable option to fake out UEFI SB=1 */
+static int __init secureboot_enable_opt(char *str)
+{
+ int sb_enable = !!simple_strtol(str, NULL, 0);
+ if (sb_enable)
+ secureboot_enable();
+ return 1;
+}
+__setup("secureboot_enable=", secureboot_enable_opt);
+
/**
* prepare_kernel_cred - Prepare a set of credentials for a kernel service
* @daemon: A userspace daemon to be used as a reference
--
1.8.1.2
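
The behaviour of the dummy parameter can be modelled in userspace as follows. This is only a sketch: `struct model_cred` and the `model_*` functions are hypothetical names, the capability sets are flattened to 64-bit masks, and libc `strtol` stands in for the kernel's `simple_strtol`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CAP_COMPROMISE_KERNEL 37

/* Hypothetical model of the two init_cred sets the patch touches. */
struct model_cred {
	uint64_t cap_permitted;
	uint64_t cap_bset;
};

/* Drop the capability from both sets, as secureboot_enable() does. */
static void model_secureboot_enable(struct model_cred *c)
{
	uint64_t mask = ~(UINT64_C(1) << CAP_COMPROMISE_KERNEL);

	c->cap_permitted &= mask;
	c->cap_bset &= mask;
}

/*
 * Mirrors "!!simple_strtol(str, NULL, 0)": any value that parses to a
 * nonzero integer enables the lockdown. Returns the parsed flag.
 */
static int model_secureboot_enable_opt(struct model_cred *c, const char *str)
{
	int sb_enable = !!strtol(str, NULL, 0);

	if (sb_enable)
		model_secureboot_enable(c);
	return sb_enable;
}
```

In this model a non-numeric value such as `secureboot_enable=yes` parses to 0 and leaves the capability in place, so a numeric value is the safe choice when testing.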

2013-03-18 21:33:22

by Matthew Garrett

Subject: [PATCH 10/12] acpi: Ignore acpi_rsdp kernel parameter in a secure boot environment

From: Josh Boyer <[email protected]>

This option allows userspace to pass the RSDP address to the kernel. This
could potentially be used to circumvent the secure boot trust model.
We ignore the setting if we don't have the CAP_COMPROMISE_KERNEL capability.

Signed-off-by: Josh Boyer <[email protected]>
---
drivers/acpi/osl.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 586e7e9..0ef63f1 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -245,7 +245,7 @@ early_param("acpi_rsdp", setup_acpi_rsdp);
acpi_physical_address __init acpi_os_get_root_pointer(void)
{
#ifdef CONFIG_KEXEC
- if (acpi_rsdp)
+ if (acpi_rsdp && capable(CAP_COMPROMISE_KERNEL))
return acpi_rsdp;
#endif

--
1.8.1.2

2013-03-18 21:33:46

by Matthew Garrett

Subject: [PATCH 09/12] Require CAP_COMPROMISE_KERNEL for /dev/mem and /dev/kmem access

Allowing users to write to address space makes it possible for the kernel
to be subverted. Restrict this when we need to protect the kernel.

Signed-off-by: Matthew Garrett <[email protected]>
---
drivers/char/mem.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index 7eee4d8..772ee2b 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -158,6 +158,9 @@ static ssize_t write_mem(struct file *file, const char __user *buf,
unsigned long copied;
void *ptr;

+ if (!capable(CAP_COMPROMISE_KERNEL))
+ return -EPERM;
+
if (!valid_phys_addr_range(p, count))
return -EFAULT;

@@ -530,6 +533,9 @@ static ssize_t write_kmem(struct file *file, const char __user *buf,
char *kbuf; /* k-addr because vwrite() takes vmlist_lock rwlock */
int err = 0;

+ if (!capable(CAP_COMPROMISE_KERNEL))
+ return -EPERM;
+
if (p < (unsigned long) high_memory) {
unsigned long to_write = min_t(unsigned long, count,
(unsigned long)high_memory - p);
--
1.8.1.2

2013-03-18 21:34:02

by Matthew Garrett

Subject: [PATCH 08/12] asus-wmi: Restrict debugfs interface

We have no way of validating what all of the Asus WMI methods do on a
given machine, and there's a risk that some will allow hardware state to
be manipulated in such a way that arbitrary code can be executed in the
kernel. Add a capability check to prevent that.

Signed-off-by: Matthew Garrett <[email protected]>
---
drivers/platform/x86/asus-wmi.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/drivers/platform/x86/asus-wmi.c b/drivers/platform/x86/asus-wmi.c
index c11b242..6d5f88f 100644
--- a/drivers/platform/x86/asus-wmi.c
+++ b/drivers/platform/x86/asus-wmi.c
@@ -1617,6 +1617,9 @@ static int show_dsts(struct seq_file *m, void *data)
int err;
u32 retval = -1;

+ if (!capable(CAP_COMPROMISE_KERNEL))
+ return -EPERM;
+
err = asus_wmi_get_devstate(asus, asus->debug.dev_id, &retval);

if (err < 0)
@@ -1633,6 +1636,9 @@ static int show_devs(struct seq_file *m, void *data)
int err;
u32 retval = -1;

+ if (!capable(CAP_COMPROMISE_KERNEL))
+ return -EPERM;
+
err = asus_wmi_set_devstate(asus->debug.dev_id, asus->debug.ctrl_param,
&retval);

@@ -1657,6 +1663,9 @@ static int show_call(struct seq_file *m, void *data)
union acpi_object *obj;
acpi_status status;

+ if (!capable(CAP_COMPROMISE_KERNEL))
+ return -EPERM;
+
status = wmi_evaluate_method(ASUS_WMI_MGMT_GUID,
1, asus->debug.method_id,
&input, &output);
--
1.8.1.2

2013-03-18 21:32:59

by Matthew Garrett

Subject: [PATCH 02/12] SELinux: define mapping for CAP_COMPROMISE_KERNEL

From: Josh Boyer <[email protected]>

Add the name of the new CAP_COMPROMISE_KERNEL capability. This allows SELinux
policies to properly map CAP_COMPROMISE_KERNEL to the appropriate
capability class.

Signed-off-by: Josh Boyer <[email protected]>
---
security/selinux/include/classmap.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h
index 14d04e6..ed99a2d 100644
--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -146,8 +146,8 @@ struct security_class_mapping secclass_map[] = {
{ "memprotect", { "mmap_zero", NULL } },
{ "peer", { "recv", NULL } },
{ "capability2",
- { "mac_override", "mac_admin", "syslog", "wake_alarm", "block_suspend",
- NULL } },
+ { "mac_override", "mac_admin", "syslog", "wake_alarm",
+ "block_suspend", "compromise_kernel", NULL } },
{ "kernel_service", { "use_as_override", "create_files_as", NULL } },
{ "tun_socket",
{ COMMON_SOCK_PERMS, "attach_queue", NULL } },
--
1.8.1.2

2013-03-18 21:34:25

by Matthew Garrett

Subject: [PATCH 07/12] ACPI: Limit access to custom_method

It must be impossible for even root to get code executed in kernel context
under a secure boot environment. custom_method effectively allows arbitrary
access to system memory, so it needs to have a capability check here.

Signed-off-by: Matthew Garrett <[email protected]>
---
drivers/acpi/custom_method.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/acpi/custom_method.c b/drivers/acpi/custom_method.c
index 12b62f2..edf0710 100644
--- a/drivers/acpi/custom_method.c
+++ b/drivers/acpi/custom_method.c
@@ -29,6 +29,9 @@ static ssize_t cm_write(struct file *file, const char __user * user_buf,
struct acpi_table_header table;
acpi_status status;

+ if (!capable(CAP_COMPROMISE_KERNEL))
+ return -EPERM;
+
if (!(*ppos)) {
/* parse the table header to get the table length */
if (count <= sizeof(struct acpi_table_header))
--
1.8.1.2

2013-03-18 21:34:46

by Matthew Garrett

Subject: [PATCH 06/12] x86: Require CAP_COMPROMISE_KERNEL for IO port access

IO port access would permit users to gain access to PCI configuration
registers, which in turn (on a lot of hardware) give access to MMIO register
space. This would potentially permit root to trigger arbitrary DMA, so lock
it down by default.

Signed-off-by: Matthew Garrett <[email protected]>
---
arch/x86/kernel/ioport.c | 4 ++--
drivers/char/mem.c | 3 +++
2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/ioport.c b/arch/x86/kernel/ioport.c
index 4ddaf66..f505995 100644
--- a/arch/x86/kernel/ioport.c
+++ b/arch/x86/kernel/ioport.c
@@ -28,7 +28,7 @@ asmlinkage long sys_ioperm(unsigned long from, unsigned long num, int turn_on)

if ((from + num <= from) || (from + num > IO_BITMAP_BITS))
return -EINVAL;
- if (turn_on && !capable(CAP_SYS_RAWIO))
+ if (turn_on && (!capable(CAP_SYS_RAWIO) || !capable(CAP_COMPROMISE_KERNEL)))
return -EPERM;

/*
@@ -103,7 +103,7 @@ SYSCALL_DEFINE1(iopl, unsigned int, level)
return -EINVAL;
/* Trying to gain more privileges? */
if (level > old) {
- if (!capable(CAP_SYS_RAWIO))
+ if (!capable(CAP_SYS_RAWIO) || !capable(CAP_COMPROMISE_KERNEL))
return -EPERM;
}
regs->flags = (regs->flags & ~X86_EFLAGS_IOPL) | (level << 12);
diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index 2c644af..7eee4d8 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -597,6 +597,9 @@ static ssize_t write_port(struct file *file, const char __user *buf,
unsigned long i = *ppos;
const char __user *tmp = buf;

+ if (!capable(CAP_COMPROMISE_KERNEL))
+ return -EPERM;
+
if (!access_ok(VERIFY_READ, buf, count))
return -EFAULT;
while (count-- > 0 && i < 65536) {
--
1.8.1.2
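
The combined checks in this patch require both capabilities at once: `!capable(A) || !capable(B)` rejects the caller unless A and B are both held. A minimal sketch of the `ioperm` case, assuming a hypothetical `struct caller_caps` in place of the real credential lookup:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the results of the two capable() calls. */
struct caller_caps {
	bool sys_rawio;
	bool compromise_kernel;
};

/*
 * Models the sys_ioperm() permission check after the patch:
 * granting port access needs both capabilities, revoking needs none.
 */
static int model_ioperm_check(const struct caller_caps *c, bool turn_on)
{
	if (turn_on && (!c->sys_rawio || !c->compromise_kernel))
		return -EPERM;
	return 0;
}
```

A process that still holds CAP_SYS_RAWIO but has lost CAP_COMPROMISE_KERNEL is refused, which is the situation every process is in once the capability has been dropped from the bounding set at boot.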

2013-03-18 21:35:04

by Matthew Garrett

Subject: [PATCH 05/12] PCI: Require CAP_COMPROMISE_KERNEL for PCI BAR access

Any hardware that can potentially generate DMA has to be locked down from
userspace in order to avoid it being possible for an attacker to cause
arbitrary kernel behaviour. Default to paranoid - in future we can
potentially relax this for sufficiently IOMMU-isolated devices.

Signed-off-by: Matthew Garrett <[email protected]>
---
drivers/pci/pci-sysfs.c | 9 +++++++++
drivers/pci/proc.c | 8 +++++++-
drivers/pci/syscall.c | 2 +-
3 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 9c6e9bb..b966089 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -622,6 +622,9 @@ pci_write_config(struct file* filp, struct kobject *kobj,
loff_t init_off = off;
u8 *data = (u8*) buf;

+ if (!capable(CAP_COMPROMISE_KERNEL))
+ return -EPERM;
+
if (off > dev->cfg_size)
return 0;
if (off + count > dev->cfg_size) {
@@ -928,6 +931,9 @@ pci_mmap_resource(struct kobject *kobj, struct bin_attribute *attr,
resource_size_t start, end;
int i;

+ if (!capable(CAP_COMPROMISE_KERNEL))
+ return -EPERM;
+
for (i = 0; i < PCI_ROM_RESOURCE; i++)
if (res == &pdev->resource[i])
break;
@@ -1035,6 +1041,9 @@ pci_write_resource_io(struct file *filp, struct kobject *kobj,
struct bin_attribute *attr, char *buf,
loff_t off, size_t count)
{
+ if (!capable(CAP_COMPROMISE_KERNEL))
+ return -EPERM;
+
return pci_resource_io(filp, kobj, attr, buf, off, count, true);
}

diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c
index 0b00947..7639f68 100644
--- a/drivers/pci/proc.c
+++ b/drivers/pci/proc.c
@@ -139,6 +139,9 @@ proc_bus_pci_write(struct file *file, const char __user *buf, size_t nbytes, lof
int size = dp->size;
int cnt;

+ if (!capable(CAP_COMPROMISE_KERNEL))
+ return -EPERM;
+
if (pos >= size)
return 0;
if (nbytes >= size)
@@ -219,6 +222,9 @@ static long proc_bus_pci_ioctl(struct file *file, unsigned int cmd,
#endif /* HAVE_PCI_MMAP */
int ret = 0;

+ if (!capable(CAP_COMPROMISE_KERNEL))
+ return -EPERM;
+
switch (cmd) {
case PCIIOC_CONTROLLER:
ret = pci_domain_nr(dev->bus);
@@ -259,7 +265,7 @@ static int proc_bus_pci_mmap(struct file *file, struct vm_area_struct *vma)
struct pci_filp_private *fpriv = file->private_data;
int i, ret;

- if (!capable(CAP_SYS_RAWIO))
+ if (!capable(CAP_SYS_RAWIO) || !capable(CAP_COMPROMISE_KERNEL))
return -EPERM;

/* Make sure the caller is mapping a real resource for this device */
diff --git a/drivers/pci/syscall.c b/drivers/pci/syscall.c
index e1c1ec5..97e785f 100644
--- a/drivers/pci/syscall.c
+++ b/drivers/pci/syscall.c
@@ -92,7 +92,7 @@ SYSCALL_DEFINE5(pciconfig_write, unsigned long, bus, unsigned long, dfn,
u32 dword;
int err = 0;

- if (!capable(CAP_SYS_ADMIN))
+ if (!capable(CAP_SYS_ADMIN) || !capable(CAP_COMPROMISE_KERNEL))
return -EPERM;

dev = pci_get_bus_and_slot(bus, dfn);
--
1.8.1.2

2013-03-18 21:35:53

by Matthew Garrett

Subject: [PATCH 04/12] efi: Enable secure boot lockdown automatically when enabled in firmware

The firmware has a set of flags that indicate whether secure boot is enabled
and enforcing. Use them to indicate whether the kernel should lock itself
down. We also indicate the machine is in secure boot mode by adding the
EFI_SECURE_BOOT bit for use with efi_enabled.

Signed-off-by: Matthew Garrett <[email protected]>
Signed-off-by: Josh Boyer <[email protected]>
---
Documentation/x86/zero-page.txt | 3 ++-
arch/x86/boot/compressed/eboot.c | 32 ++++++++++++++++++++++++++++++++
arch/x86/include/uapi/asm/bootparam.h | 3 ++-
arch/x86/kernel/setup.c | 5 +++++
include/linux/cred.h | 2 ++
include/linux/efi.h | 1 +
6 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/Documentation/x86/zero-page.txt b/Documentation/x86/zero-page.txt
index 199f453..16f2464 100644
--- a/Documentation/x86/zero-page.txt
+++ b/Documentation/x86/zero-page.txt
@@ -29,7 +29,8 @@ Offset Proto Name Meaning
1E8/001 ALL e820_entries Number of entries in e820_map (below)
1E9/001 ALL eddbuf_entries Number of entries in eddbuf (below)
1EA/001 ALL edd_mbr_sig_buf_entries Number of entries in edd_mbr_sig_buffer
- (below)
+1EB/001 ALL kbd_status Numlock is enabled
+1EC/001 ALL secure_boot Kernel should enable secure boot lockdowns
1EF/001 ALL sentinel Used to detect broken bootloaders
290/040 ALL edd_mbr_sig_buffer EDD MBR signatures
2D0/A00 ALL e820_map E820 memory map table
diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c
index c205035..96d859d 100644
--- a/arch/x86/boot/compressed/eboot.c
+++ b/arch/x86/boot/compressed/eboot.c
@@ -861,6 +861,36 @@ fail:
return status;
}

+static int get_secure_boot(efi_system_table_t *_table)
+{
+ u8 sb, setup;
+ unsigned long datasize = sizeof(sb);
+ efi_guid_t var_guid = EFI_GLOBAL_VARIABLE_GUID;
+ efi_status_t status;
+
+ status = efi_call_phys5(sys_table->runtime->get_variable,
+ L"SecureBoot", &var_guid, NULL, &datasize, &sb);
+
+ if (status != EFI_SUCCESS)
+ return 0;
+
+ if (sb == 0)
+ return 0;
+
+
+ status = efi_call_phys5(sys_table->runtime->get_variable,
+ L"SetupMode", &var_guid, NULL, &datasize,
+ &setup);
+
+ if (status != EFI_SUCCESS)
+ return 0;
+
+ if (setup == 1)
+ return 0;
+
+ return 1;
+}
+
/*
* Because the x86 boot code expects to be passed a boot_params we
* need to create one ourselves (usually the bootloader would create
@@ -1155,6 +1185,8 @@ struct boot_params *efi_main(void *handle, efi_system_table_t *_table,
if (sys_table->hdr.signature != EFI_SYSTEM_TABLE_SIGNATURE)
goto fail;

+ boot_params->secure_boot = get_secure_boot(sys_table);
+
setup_graphics(boot_params);

setup_efi_pci(boot_params);
diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h
index c15ddaf..85d7685 100644
--- a/arch/x86/include/uapi/asm/bootparam.h
+++ b/arch/x86/include/uapi/asm/bootparam.h
@@ -131,7 +131,8 @@ struct boot_params {
__u8 eddbuf_entries; /* 0x1e9 */
__u8 edd_mbr_sig_buf_entries; /* 0x1ea */
__u8 kbd_status; /* 0x1eb */
- __u8 _pad5[3]; /* 0x1ec */
+ __u8 secure_boot; /* 0x1ec */
+ __u8 _pad5[2]; /* 0x1ed */
/*
* The sentinel is set to a nonzero value (0xff) in header.S.
*
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 90d8cc9..5ef9285 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1104,6 +1104,11 @@ void __init setup_arch(char **cmdline_p)

io_delay_init();

+ if (boot_params.secure_boot) {
+ set_bit(EFI_SECURE_BOOT, &x86_efi_facility);
+ secureboot_enable();
+ }
+
/*
* Parse the ACPI tables for possible boot-time SMP configuration.
*/
diff --git a/include/linux/cred.h b/include/linux/cred.h
index 04421e8..9e69542 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -156,6 +156,8 @@ extern int set_security_override_from_ctx(struct cred *, const char *);
extern int set_create_files_as(struct cred *, struct inode *);
extern void __init cred_init(void);

+extern void secureboot_enable(void);
+
/*
* check for validity of credentials
*/
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 9bf2f1f..1bf382b 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -627,6 +627,7 @@ extern int __init efi_setup_pcdp_console(char *);
#define EFI_RUNTIME_SERVICES 3 /* Can we use runtime services? */
#define EFI_MEMMAP 4 /* Can we use EFI memory map? */
#define EFI_64BIT 5 /* Is the firmware 64-bit? */
+#define EFI_SECURE_BOOT 6 /* Are we in Secure Boot mode? */

#ifdef CONFIG_EFI
# ifdef CONFIG_X86
--
1.8.1.2
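
The decision made by `get_secure_boot()` in the patch above reduces to a small truth table over two firmware variables. The sketch below restates it as a pure function; `struct efi_var_read` and `model_get_secure_boot` are hypothetical names, and the `ok` flag stands in for `status == EFI_SUCCESS`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result of one get_variable call: success flag plus value. */
struct efi_var_read {
	bool ok;
	uint8_t value;
};

/*
 * Returns 1 only when SecureBoot was read successfully and is nonzero,
 * and SetupMode was read successfully and is not 1. Every failure
 * falls back to 0, i.e. no lockdown.
 */
static int model_get_secure_boot(struct efi_var_read secure_boot,
				 struct efi_var_read setup_mode)
{
	if (!secure_boot.ok || secure_boot.value == 0)
		return 0;
	if (!setup_mode.ok || setup_mode.value == 1)
		return 0;
	return 1;
}
```

Note the direction of the defaults: a firmware that fails either variable read is treated as not being in Secure Boot mode.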

2013-03-18 21:38:31

by Matthew Garrett

Subject: [PATCH 12/12] kexec: Require CAP_COMPROMISE_KERNEL

kexec can easily be used to modify the security policy of a running kernel.
CONFIG_KEXEC_JUMP makes it trivial for an attacker to simply jump to another
kernel, modify the security policy of the previous kernel and then switch
back; without it the attack is harder, but still only a matter of
difficulty. Long term we'll want
an interface for ensuring that kexec is able to launch signed code, but we
should default to safe behaviour for now.

Signed-off-by: Matthew Garrett <[email protected]>
---
kernel/kexec.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/kexec.c b/kernel/kexec.c
index bddd3d7..cbdb930 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -946,7 +946,7 @@ SYSCALL_DEFINE4(kexec_load, unsigned long, entry, unsigned long, nr_segments,
int result;

/* We only trust the superuser with rebooting the system. */
- if (!capable(CAP_SYS_BOOT))
+ if (!capable(CAP_SYS_BOOT) || !capable(CAP_COMPROMISE_KERNEL))
return -EPERM;

/*
--
1.8.1.2

2013-03-18 21:38:35

by Matthew Garrett

Subject: [PATCH 11/12] x86: Require CAP_COMPROMISE_KERNEL for MSR writing

From: Kees Cook <[email protected]>

Writing to MSRs should not be allowed unless CAP_COMPROMISE_KERNEL is
set since it could lead to execution of arbitrary code in kernel mode.

Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Matthew Garrett <[email protected]>
---
arch/x86/kernel/msr.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/arch/x86/kernel/msr.c b/arch/x86/kernel/msr.c
index ce13049..fa4dc6c 100644
--- a/arch/x86/kernel/msr.c
+++ b/arch/x86/kernel/msr.c
@@ -103,6 +103,9 @@ static ssize_t msr_write(struct file *file, const char __user *buf,
int err = 0;
ssize_t bytes = 0;

+ if (!capable(CAP_COMPROMISE_KERNEL))
+ return -EPERM;
+
if (count % 8)
return -EINVAL; /* Invalid chunk size */

@@ -150,6 +153,10 @@ static long msr_ioctl(struct file *file, unsigned int ioc, unsigned long arg)
err = -EBADF;
break;
}
+ if (!capable(CAP_COMPROMISE_KERNEL)) {
+ err = -EPERM;
+ break;
+ }
if (copy_from_user(&regs, uregs, sizeof regs)) {
err = -EFAULT;
break;
--
1.8.1.2

2013-03-19 04:47:41

by James Morris

Subject: Re: [PATCH 01/12] Security: Add CAP_COMPROMISE_KERNEL

On Mon, 18 Mar 2013, Matthew Garrett wrote:

> This patch introduces CAP_COMPROMISE_KERNEL.

I'd like to see this named CAP_MODIFY_KERNEL, which is more accurate and
less emotive. Otherwise I think core kernel developers will be scratching
their head over where to sprinkle this.

Apart from that, I like the idea, especially when it's wired up to MAC
security.


--
James Morris
<[email protected]>

2013-03-19 07:19:37

by Yves-Alexis Perez

Subject: Re: [PATCH 01/12] Security: Add CAP_COMPROMISE_KERNEL

On lun., 2013-03-18 at 17:32 -0400, Matthew Garrett wrote:
> This patch introduces CAP_COMPROMISE_KERNEL. Holding this capability
> indicates that a process is empowered to perform tasks that may result
> in
> modification of the running kernel. While aimed at handling the
> specific
> use-case of Secure Boot, it is generalisable to any other environment
> where
> permitting userspace to modify the kernel is undesirable.

About that, did someone look at the way securelevel(7) is handled on
OpenBSD? This is more or less the same thing, where there's a desire to
distinguish uid 0 from ring0. They're not using a capability but more a
global state which allows more or less stuff depending on the value
(securelevel=-1 to securelevel=2).

Regards,
--
Yves-Alexis



2013-03-19 08:48:09

by Dave Young

Subject: Re: [PATCH 10/12] acpi: Ignore acpi_rsdp kernel parameter in a secure boot environment

On 03/19/2013 05:32 AM, Matthew Garrett wrote:
> From: Josh Boyer <[email protected]>
>
> This option allows userspace to pass the RSDP address to the kernel. This
> could potentially be used to circumvent the secure boot trust model.
> We ignore the setting if we don't have the CAP_COMPROMISE_KERNEL capability.
>
> Signed-off-by: Josh Boyer <[email protected]>
> ---
> drivers/acpi/osl.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> index 586e7e9..0ef63f1 100644
> --- a/drivers/acpi/osl.c
> +++ b/drivers/acpi/osl.c
> @@ -245,7 +245,7 @@ early_param("acpi_rsdp", setup_acpi_rsdp);
> acpi_physical_address __init acpi_os_get_root_pointer(void)
> {
> #ifdef CONFIG_KEXEC
> - if (acpi_rsdp)
> + if (acpi_rsdp && capable(CAP_COMPROMISE_KERNEL))
> return acpi_rsdp;
> #endif
>
>

This does not work because capable is not usable at this early point.

Josh, could you update your fix here?

--
Thanks
Dave

2013-03-19 11:19:28

by Josh Boyer

Subject: Re: [PATCH 10/12] acpi: Ignore acpi_rsdp kernel parameter in a secure boot environment

On Tue, Mar 19, 2013 at 04:47:27PM +0800, Dave Young wrote:
> On 03/19/2013 05:32 AM, Matthew Garrett wrote:
> > From: Josh Boyer <[email protected]>
> >
> > This option allows userspace to pass the RSDP address to the kernel. This
> > could potentially be used to circumvent the secure boot trust model.
> > We ignore the setting if we don't have the CAP_COMPROMISE_KERNEL capability.
> >
> > Signed-off-by: Josh Boyer <[email protected]>
> > ---
> > drivers/acpi/osl.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> > index 586e7e9..0ef63f1 100644
> > --- a/drivers/acpi/osl.c
> > +++ b/drivers/acpi/osl.c
> > @@ -245,7 +245,7 @@ early_param("acpi_rsdp", setup_acpi_rsdp);
> > acpi_physical_address __init acpi_os_get_root_pointer(void)
> > {
> > #ifdef CONFIG_KEXEC
> > - if (acpi_rsdp)
> > + if (acpi_rsdp && capable(CAP_COMPROMISE_KERNEL))
> > return acpi_rsdp;
> > #endif
> >
> >
>
> This does not work because capable is not usable at this early point.

Right.

> Josh, could you update your fix here?

I have. Twice. Matthew sent out a stale patch.

josh

2013-03-19 17:07:58

by Josh Boyer

Subject: [PATCH v2] acpi: Ignore acpi_rsdp kernel parameter in a secure boot environment

This option allows userspace to pass the RSDP address to the kernel. This
could potentially be used to circumvent the secure boot trust model.
This is setup through the setup_arch function, which is called before the
security_init function sets up the security_ops, so we cannot use a
capable call here. We ignore the setting if we are booted in Secure Boot
mode.

Signed-off-by: Josh Boyer <[email protected]>
---

v2: Actually send it to Matthew this time

drivers/acpi/osl.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 586e7e9..8950454 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -245,7 +245,7 @@ early_param("acpi_rsdp", setup_acpi_rsdp);
acpi_physical_address __init acpi_os_get_root_pointer(void)
{
#ifdef CONFIG_KEXEC
- if (acpi_rsdp)
+ if (acpi_rsdp && !efi_enabled(EFI_SECURE_BOOT))
return acpi_rsdp;
#endif

--
1.8.1.2
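
The v2 approach replaces a credential check with a plain boot-time flag test, which is usable before the security framework is initialised. A minimal userspace sketch of that logic follows. The `model_*` names are hypothetical, the facility word stands in for `x86_efi_facility`, and `firmware_rsdp` represents whatever the normal lookup path would have returned.

```c
#include <assert.h>
#include <stdbool.h>

#define EFI_SECURE_BOOT 6	/* bit number from patch 04/12 */

/* Hypothetical stand-in for the x86_efi_facility bit field. */
static unsigned long model_efi_facility;

/* Models set_bit() on the facility word. */
static void model_set_facility(int bit)
{
	model_efi_facility |= 1UL << bit;
}

/* Models efi_enabled(): a simple bit test, no credentials involved. */
static bool model_efi_enabled(int bit)
{
	return (model_efi_facility >> bit) & 1UL;
}

/*
 * Honour the acpi_rsdp= override only outside Secure Boot mode;
 * otherwise fall back to the address found by the normal lookup.
 */
static unsigned long long model_root_pointer(unsigned long long acpi_rsdp,
					     unsigned long long firmware_rsdp)
{
	if (acpi_rsdp && !model_efi_enabled(EFI_SECURE_BOOT))
		return acpi_rsdp;
	return firmware_rsdp;
}
```

Because the flag is set in `setup_arch()` from `boot_params`, the test depends only on state that exists at that point in boot.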

2013-03-20 01:00:39

by H. Peter Anvin

Subject: Re: [PATCH 06/12] x86: Require CAP_COMPROMISE_KERNEL for IO port access

On 03/18/2013 02:32 PM, Matthew Garrett wrote:
> IO port access would permit users to gain access to PCI configuration
> registers, which in turn (on a lot of hardware) give access to MMIO register
> space. This would potentially permit root to trigger arbitrary DMA, so lock
> it down by default.
>

Again, if you are going to do this, just kill bloody RAWIO. Quite
frankly saying brainfucked SCSI drivers have abused it doesn't hold
water, because as far as you know those same commands can trigger
arbitrary DMA activity.

-hpa

2013-03-20 01:03:06

by H. Peter Anvin

Subject: Re: [PATCH 01/12] Security: Add CAP_COMPROMISE_KERNEL

On 03/18/2013 02:32 PM, Matthew Garrett wrote:
>
> This means we can return our focus to the kernel. There's currently a number
> of kernel interfaces that permit privileged userspace to modify the running
> kernel. These are currently protected by CAP_SYS_RAWIO, but unfortunately
> the semantics of this capability are poorly defined and it now covers a large
> superset of the desired behaviour.
>

... except it doesn't.

Looking at it in detail, EVERYTHING in CAP_SYS_RAWIO has the possibility
of compromising the kernel, because they let device drivers be bypassed,
which means arbitrary DMA, which means you have everything.

Now, a lot of the abuses of CAP_SYS_RAWIO have clearly been added by
people who had *no bloody clue* what that capability meant, but it
really doesn't change the fact that pretty much if you have
CAP_SYS_RAWIO you have the machine.

So just reject CAP_SYS_RAWIO.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2013-03-20 01:04:07

by H. Peter Anvin

Subject: Re: [PATCH 01/12] Security: Add CAP_COMPROMISE_KERNEL

On 03/18/2013 09:47 PM, James Morris wrote:
> On Mon, 18 Mar 2013, Matthew Garrett wrote:
>
>> This patch introduces CAP_COMPROMISE_KERNEL.
>
> I'd like to see this named CAP_MODIFY_KERNEL, which is more accurate and
> less emotive. Otherwise I think core kernel developers will be scratching
> their head over where to sprinkle this.
>
> Apart from that, I like the idea, especially when it's wired up to MAC
> security.

The wiring up to MAC security is a nice touch.

-hpa


--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2013-03-20 01:05:32

by Matthew Garrett

Subject: Re: [PATCH 06/12] x86: Require CAP_COMPROMISE_KERNEL for IO port access

Easiest way to do that would be to replace some existing users of CAP_SYS_RAWIO with CAP_SYS_ADMIN and then just insert a couple of extra CAP_SYS_RAWIO checks. That would break some existing userspace, but so would introducing a new capability. I'm happy to go that way, but would appreciate some broader feedback that that's the way to go.
--
Matthew Garrett | [email protected]

2013-03-20 01:05:54

by H. Peter Anvin

Subject: Re: [PATCH 01/12] Security: Add CAP_COMPROMISE_KERNEL

On 03/19/2013 06:02 PM, H. Peter Anvin wrote:
>
> Looking at it in detail, EVERYTHING in CAP_SYS_RAWIO has the possibility
> of compromising the kernel, because they let device drivers be bypassed,
> which means arbitrary DMA, which means you have everything.
>

Well, *unless* you have an iommu that you *actually know* is protecting you.

-hpa


--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2013-03-20 01:07:32

by Matthew Garrett

Subject: Re: [PATCH 01/12] Security: Add CAP_COMPROMISE_KERNEL

Yeah, I'd like the option of relaxing restrictions when drivers explicitly opt in based on iommu support.
--
Matthew Garrett | [email protected]

2013-03-20 01:09:44

by Matthew Garrett

Subject: Re: [PATCH 01/12] Security: Add CAP_COMPROMISE_KERNEL

The cases I'd looked at seemed to mostly involve obsolete hardware or only allow command submission to SCSI targets, so I wasn't too worried about them - but, like I said, I've no inherent objection to using CAP_SYS_RAWIO as long as we modify any cases where userspace really does need that access.
--
Matthew Garrett | [email protected]

2013-03-20 01:11:46

by H. Peter Anvin

Subject: Re: [PATCH 01/12] Security: Add CAP_COMPROMISE_KERNEL

On 03/19/2013 06:07 PM, Matthew Garrett wrote:
> Yeah, I'd like the option of relaxing restrictions when drivers explicitly opt in based on iommu support.

When drivers opt in they can provide an interface. The interesting case
becomes non-drivers.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2013-03-20 01:28:21

by Matthew Garrett

Subject: Re: [PATCH 01/12] Security: Add CAP_COMPROMISE_KERNEL

Mm. The question is whether we can reliably determine the ranges a device should be able to access without having to trust userspace (and, ideally, without having to worry about whether iommu vendors have done their job). It's pretty important for PCI passthrough, so we do need to care.
--
Matthew Garrett | [email protected]

2013-03-20 02:48:37

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 01/12] Security: Add CAP_COMPROMISE_KERNEL

On 03/19/2013 06:28 PM, Matthew Garrett wrote:
> Mm. The question is whether we can reliably determine the ranges a device should be able to access without having to trust userspace (and, ideally, without having to worry about whether iommu vendors have done their job). It's pretty important for PCI passthrough, so we do need to care.

It is actually very simple: the device should be able to DMA into/out of:

1. pinned pages
2. owned by the process controlling the device

... and nothing else.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2013-03-20 03:08:37

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 01/12] Security: Add CAP_COMPROMISE_KERNEL

On 03/19/2013 07:48 PM, H. Peter Anvin wrote:
> On 03/19/2013 06:28 PM, Matthew Garrett wrote:
>> Mm. The question is whether we can reliably determine the ranges a device should be able to access without having to trust userspace (and, ideally, without having to worry about whether iommu vendors have done their job). It's pretty important for PCI passthrough, so we do need to care.
>
> It is actually very simple: the device should be able to DMA into/out of:
>
> 1. pinned pages
> 2. owned by the process controlling the device
>
> ... and nothing else.
>

The "pinning" process needs to involve a call to the kernel to process
the page for DMA (pinning the page and opening it in the iommu) and
return a transaction address, of course.

I think we have the interface for that in vfio, but I haven't followed
that work.

-hpa
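
The rule above (a device may DMA only into pinned pages owned by the process controlling it, and pinning returns a transaction address) can be sketched as a small user-space model. This is not the vfio interface: pin_page(), dma_allowed(), the table, and the use of a slot number to derive the "transaction address" are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u
#define MAX_PINNED 16

/* Hypothetical record of one page pinned for DMA on behalf of a process. */
struct pinned_page {
    int owner_pid;   /* process that asked for the pin */
    uint64_t iova;   /* "transaction address" handed back to the caller */
    bool in_use;
};

static struct pinned_page pin_table[MAX_PINNED];

/* Pin one page for owner_pid and return its transaction address, or
 * UINT64_MAX when the table is full. The slot number stands in for
 * whatever address a real IOMMU mapping would produce. */
uint64_t pin_page(int owner_pid)
{
    for (size_t i = 0; i < MAX_PINNED; i++) {
        if (!pin_table[i].in_use) {
            pin_table[i].in_use = true;
            pin_table[i].owner_pid = owner_pid;
            pin_table[i].iova = (uint64_t)(i + 1) * PAGE_SIZE;
            return pin_table[i].iova;
        }
    }
    return UINT64_MAX;
}

/* A device controlled by controller_pid may DMA to addr only if addr lies
 * inside a page that is pinned AND owned by that same process. */
bool dma_allowed(int controller_pid, uint64_t addr)
{
    for (size_t i = 0; i < MAX_PINNED; i++) {
        const struct pinned_page *p = &pin_table[i];
        if (p->in_use && p->owner_pid == controller_pid &&
            addr >= p->iova && addr < p->iova + PAGE_SIZE)
            return true;
    }
    return false;
}
```

Everything outside the table is refused by default, which is the "... and nothing else" part of the rule.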

2013-03-20 03:18:33

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH 01/12] Security: Add CAP_COMPROMISE_KERNEL

On Tue, 2013-03-19 at 20:08 -0700, H. Peter Anvin wrote:
> On 03/19/2013 07:48 PM, H. Peter Anvin wrote:
> > On 03/19/2013 06:28 PM, Matthew Garrett wrote:
> >> Mm. The question is whether we can reliably determine the ranges a
> device should be able to access without having to trust userspace
> (and, ideally, without having to worry about whether iommu vendors
> have done their job). It's pretty important for PCI passthrough, so we
> do need to care.
> >
> > It is actually very simple: the device should be able to DMA into/out of:
> >
> > 1. pinned pages
> > 2. owned by the process controlling the device
> >
> > ... and nothing else.
> >
>
> The "pinning" process needs to involve a call to the kernel to process
> the page for DMA (pinning the page and opening it in the iommu) and
> return a transaction address, of course.
>
> I think we have the interface for that in vfio, but I haven't followed
> that work.

Yes, vfio does this and is meant to provide a secure-boot-friendly PCI
passthrough interface. Thanks,

Alex

2013-03-20 03:22:46

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 01/12] Security: Add CAP_COMPROMISE_KERNEL

On 03/19/2013 08:18 PM, Alex Williamson wrote:
>>
>> The "pinning" process needs to involve a call to the kernel to process
>> the page for DMA (pinning the page and opening it in the iommu) and
>> return a transaction address, of course.
>>
>> I think we have the interface for that in vfio, but I haven't followed
>> that work.
>
> Yes, vfio does this and is meant to provide a secure-boot-friendly PCI
> passthrough interface. Thanks,
>

Right, and presumably vfio does *not* require CAP_SYS_RAWIO, right?

-hpa

2013-03-20 03:27:25

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH 01/12] Security: Add CAP_COMPROMISE_KERNEL

On Tue, 2013-03-19 at 20:22 -0700, H. Peter Anvin wrote:
> On 03/19/2013 08:18 PM, Alex Williamson wrote:
> >>
> >> The "pinning" process needs to involve a call to the kernel to process
> >> the page for DMA (pinning the page and opening it in the iommu) and
> >> return a transaction address, of course.
> >>
> >> I think we have the interface for that in vfio, but I haven't followed
> >> that work.
> >
> > Yes, vfio does this and is meant to provide a secure-boot-friendly PCI
> > passthrough interface. Thanks,
> >
>
> Right, and presumably vfio does *not* require CAP_SYS_RAWIO, right?

Correct

2013-03-20 13:17:16

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH 01/12] Security: Add CAP_COMPROMISE_KERNEL

On Tue, 2013-03-19 at 18:02 -0700, H. Peter Anvin wrote:

> Looking at it in detail, EVERYTHING in CAP_SYS_RAWIO has the possibility
> of compromising the kernel, because they let device drivers be bypassed,
> which means arbitrary DMA, which means you have everything.

Having checked again, I don't think this is true. The most obvious case
is libata, which uses CAP_SYS_RAWIO to limit the ability to send raw ATA
commands. Being able to do so clearly permits userspace to avoid any
kind of policy the vfs has put in place, but there's no obvious way for
the user to modify the running kernel. Are you suggesting that removing
the CAP_SYS_RAWIO check there would be reasonable?

--
Matthew Garrett | [email protected]

2013-03-20 15:03:47

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 01/12] Security: Add CAP_COMPROMISE_KERNEL

CAP_SYS_RAWIO is definitely inappropriate there.

Matthew Garrett <[email protected]> wrote:

>On Tue, 2013-03-19 at 18:02 -0700, H. Peter Anvin wrote:
>
>> Looking at it in detail, EVERYTHING in CAP_SYS_RAWIO has the
>possibility
>> of compromising the kernel, because they let device drivers be
>bypassed,
>> which means arbitrary DMA, which means you have everything.
>
>Having checked again, I don't think this is true. The most obvious case
>is libata, which uses CAP_SYS_RAWIO to limit the ability to send raw
>ATA
>commands. Being able to do so clearly permits userspace to avoid any
>kind of policy the vfs has put in place, but there's no obvious way for
>the user to modify the running kernel. Are you suggesting that removing
>the CAP_SYS_RAWIO check there would be reasonable?

--
Sent from my mobile phone. Please excuse brevity and lack of formatting.

2013-03-20 15:14:37

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH 01/12] Security: Add CAP_COMPROMISE_KERNEL

On Wed, 2013-03-20 at 08:03 -0700, H. Peter Anvin wrote:
> CAP_SYS_RAWIO is definitely inappropriate there.

Ok. How do we fix that without breaking userspace that expects
CAP_SYS_RAWIO to be sufficient?

--
Matthew Garrett | [email protected]

2013-03-20 16:42:45

by Mimi Zohar

[permalink] [raw]
Subject: Re: [PATCH 01/12] Security: Add CAP_COMPROMISE_KERNEL

On Tue, 2013-03-19 at 15:47 +1100, James Morris wrote:
> On Mon, 18 Mar 2013, Matthew Garrett wrote:
>
> > This patch introduces CAP_COMPROMISE_KERNEL.
>
> I'd like to see this named CAP_MODIFY_KERNEL, which is more accurate and
> less emotive. Otherwise I think core kernel developers will be scratching
> their head over where to sprinkle this.
>
> Apart from that, I like the idea, especially when it's wired up to MAC
> security.

Matthew, perhaps you could clarify whether this will be tied to MAC
security. Based on the kexec thread, I'm under the impression that is
not the intention, or at least not for kexec. As root isn't trusted,
neither is the boot command line, nor any policy that is loaded by root,
including those for MAC.

thanks,

Mimi

2013-03-20 16:46:11

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 01/12] Security: Add CAP_COMPROMISE_KERNEL

On 03/20/2013 08:14 AM, Matthew Garrett wrote:
> On Wed, 2013-03-20 at 08:03 -0700, H. Peter Anvin wrote:
>> CAP_SYS_RAWIO is definitely inappropriate there.
>
> Ok. How do we fix that without breaking userspace that expects
> CAP_SYS_RAWIO to be sufficient?
>

I don't think we can, to some degree, because when what you have is
fundamentally broken, it's hard to fix.

However, it is extremely likely that the number of affected applications
is vanishingly small. There probably are a handful of apps that do
this, and I wouldn't be surprised if most of them simply run as root.

-hpa

2013-03-20 16:49:41

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH 01/12] Security: Add CAP_COMPROMISE_KERNEL

On Wed, 2013-03-20 at 12:41 -0400, Mimi Zohar wrote:

> Matthew, perhaps you could clarify whether this will be tied to MAC
> security. Based on the kexec thread, I'm under the impression that is
> not the intention, or at least not for kexec. As root isn't trusted,
> neither is the boot command line, nor any policy that is loaded by root,
> including those for MAC.

The work done on signed initramfs fragments would seem to be the best
option here so far?

--
Matthew Garrett | [email protected]

2013-03-20 18:07:03

by Mimi Zohar

[permalink] [raw]
Subject: Re: [PATCH 01/12] Security: Add CAP_COMPROMISE_KERNEL

On Wed, 2013-03-20 at 16:49 +0000, Matthew Garrett wrote:
> On Wed, 2013-03-20 at 12:41 -0400, Mimi Zohar wrote:
>
> > Matthew, perhaps you could clarify whether this will be tied to MAC
> > security. Based on the kexec thread, I'm under the impression that is
> > not the intention, or at least not for kexec. As root isn't trusted,
> > neither is the boot command line, nor any policy that is loaded by root,
> > including those for MAC.
>
> The work done on signed initramfs fragments would seem to be the best
> option here so far?

Sorry, I'm not sure to which work you're referring. If you're referring
to Dmitry's "initramfs with digital signature protection" patches, then
we're speaking about enforcing integrity, not MAC security.

thanks,

Mimi

2013-03-20 18:12:47

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH 01/12] Security: Add CAP_COMPROMISE_KERNEL

On Wed, 2013-03-20 at 14:01 -0400, Mimi Zohar wrote:

> Sorry, I'm not sure to which work you're referring. If you're referring
> to Dmitry's "initramfs with digital signature protection" patches, then
> we're speaking about enforcing integrity, not MAC security.

Well, in the absence of hardcoded in-kernel policy, there needs to be
some mechanism for ensuring the integrity of a policy. Shipping a signed
policy initramfs fragment and having any Secure Boot bootloaders pass a
flag in bootparams indicating that the kernel should panic if that
fragment isn't present would seem to be the easiest way of doing that.
Or have I misunderstood the question?

--
Matthew Garrett | [email protected]

2013-03-20 19:16:43

by Mimi Zohar

[permalink] [raw]
Subject: Re: [PATCH 01/12] Security: Add CAP_COMPROMISE_KERNEL

On Wed, 2013-03-20 at 18:12 +0000, Matthew Garrett wrote:
> On Wed, 2013-03-20 at 14:01 -0400, Mimi Zohar wrote:
>
> > Sorry, I'm not sure to which work you're referring. If you're referring
> > to Dmitry's "initramfs with digital signature protection" patches, then
> > we're speaking about enforcing integrity, not MAC security.
>
> Well, in the absence of hardcoded in-kernel policy, there needs to be
> some mechanism for ensuring the integrity of a policy. Shipping a signed
> policy initramfs fragment and having any Secure Boot bootloaders pass a
> flag in bootparams indicating that the kernel should panic if that
> fragment isn't present would seem to be the easiest way of doing that.
> Or have I misunderstood the question?

Ok, I was confused by the term "fragmented" initramfs. So once you have
verified the "early" fragmented initramfs signature, this initramfs will
load the "trusted" public keys and could also load the MAC policy. (I
realize that dracut is currently loading the MAC policy, not the
initramfs.) The MAC policy would then be trusted, right? Could we then
use the LSM labels for defining an integrity policy for kexec?

thanks,

Mimi

2013-03-20 20:38:54

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH 01/12] Security: Add CAP_COMPROMISE_KERNEL

On Wed, 2013-03-20 at 15:16 -0400, Mimi Zohar wrote:
> On Wed, 2013-03-20 at 18:12 +0000, Matthew Garrett wrote:
> > Well, in the absence of hardcoded in-kernel policy, there needs to be
> > some mechanism for ensuring the integrity of a policy. Shipping a signed
> > policy initramfs fragment and having any Secure Boot bootloaders pass a
> > flag in bootparams indicating that the kernel should panic if that
> > fragment isn't present would seem to be the easiest way of doing that.
> > Or have I misunderstood the question?
>
> Ok, I was confused by the term "fragmented" initramfs. So once you have
> verified the "early" fragmented initramfs signature, this initramfs will
> load the "trusted" public keys and could also load the MAC policy. (I
> realize that dracut is currently loading the MAC policy, not the
> initramfs.) The MAC policy would then be trusted, right? Could we then
> use the LSM labels for defining an integrity policy for kexec?

Right, that'd be the rough idea. Any further runtime policy updates
would presumably need to be signed with a trusted key.

--
Matthew Garrett | [email protected]

2013-03-20 21:11:27

by Mimi Zohar

[permalink] [raw]
Subject: Re: [PATCH 01/12] Security: Add CAP_COMPROMISE_KERNEL

On Wed, 2013-03-20 at 20:37 +0000, Matthew Garrett wrote:
> On Wed, 2013-03-20 at 15:16 -0400, Mimi Zohar wrote:
> > On Wed, 2013-03-20 at 18:12 +0000, Matthew Garrett wrote:
> > > Well, in the absence of hardcoded in-kernel policy, there needs to be
> > > some mechanism for ensuring the integrity of a policy. Shipping a signed
> > > policy initramfs fragment and having any Secure Boot bootloaders pass a
> > > flag in bootparams indicating that the kernel should panic if that
> > > fragment isn't present would seem to be the easiest way of doing that.
> > > Or have I misunderstood the question?
> >
> > Ok, I was confused by the term "fragmented" initramfs. So once you have
> > verified the "early" fragmented initramfs signature, this initramfs will
> > load the "trusted" public keys and could also load the MAC policy. (I
> > realize that dracut is currently loading the MAC policy, not the
> > initramfs.) The MAC policy would then be trusted, right? Could we then
> > use the LSM labels for defining an integrity policy for kexec?
>
> Right, that'd be the rough idea. Any further runtime policy updates
> would presumably need to be signed with a trusted key.

I'm really sorry to belabor this point, but can kexec rely on an LSM
label to identify a specific file, out of all the files being executed,
in a secure boot environment? The SELinux integrity rule for kexec
would then look something like,

appraise func=BPRM_CHECK obj_type=kdump_exec_t appraise_type=imasig

We could then follow this up with Serge's idea of, "a capset
akin to the bounding set, saying you can only have the caps in this set
if the running binary was a signed one." kexec already requires
CAP_SYS_BOOT.

thanks,

Mimi

2013-03-20 21:18:48

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH 01/12] Security: Add CAP_COMPROMISE_KERNEL

On Wed, 2013-03-20 at 17:11 -0400, Mimi Zohar wrote:
> On Wed, 2013-03-20 at 20:37 +0000, Matthew Garrett wrote:
> > Right, that'd be the rough idea. Any further runtime policy updates
> > would presumably need to be signed with a trusted key.
>
> I'm really sorry to belabor this point, but can kexec rely on an LSM
> label to identify a specific file, out of all the files being executed,
> in a secure boot environment? The SELinux integrity rule for kexec
> would then look something like,
>
> appraise func=BPRM_CHECK obj_type=kdump_exec_t appraise_type=imasig

It would certainly be possible to configure a system such that this was
true (assuming support for signed initramfs and restricted policy
loading), and anyone wanting to ensure that kexec only loaded trusted
binaries would have to ensure that their system was appropriately
configured. Having some mechanism to then give the kexec binary
CAP_MODIFY_KERNEL would avoid needing an extra kexec entry point.

--
Matthew Garrett | [email protected]

2013-03-21 01:58:04

by James Morris

[permalink] [raw]
Subject: Re: [PATCH 01/12] Security: Add CAP_COMPROMISE_KERNEL

On Wed, 20 Mar 2013, Mimi Zohar wrote:

> On Tue, 2013-03-19 at 15:47 +1100, James Morris wrote:
> > On Mon, 18 Mar 2013, Matthew Garrett wrote:
> >
> > > This patch introduces CAP_COMPROMISE_KERNEL.
> >
> > I'd like to see this named CAP_MODIFY_KERNEL, which is more accurate and
> > less emotive. Otherwise I think core kernel developers will be scratching
> > their head over where to sprinkle this.
> >
> > Apart from that, I like the idea, especially when it's wired up to MAC
> > security.
>
> > Matthew, perhaps you could clarify whether this will be tied to MAC
> security.

All capabilities are, via LSM.


--
James Morris
<[email protected]>

2013-03-21 13:44:00

by Vivek Goyal

[permalink] [raw]
Subject: Re: [PATCH 01/12] Security: Add CAP_COMPROMISE_KERNEL

On Wed, Mar 20, 2013 at 09:18:10PM +0000, Matthew Garrett wrote:
> On Wed, 2013-03-20 at 17:11 -0400, Mimi Zohar wrote:
> > On Wed, 2013-03-20 at 20:37 +0000, Matthew Garrett wrote:
> > > Right, that'd be the rough idea. Any further runtime policy updates
> > > would presumably need to be signed with a trusted key.
> >
> > I'm really sorry to belabor this point, but can kexec rely on an LSM
> > label to identify a specific file, out of all the files being executed,
> > in a secure boot environment? The SELinux integrity rule for kexec
> > would then look something like,
> >
> > appraise func=BPRM_CHECK obj_type=kdump_exec_t appraise_type=imasig
>
> It would certainly be possible to configure a system such that this was
> true (assuming support for signed initramfs and restricted policy
> loading), and anyone wanting to ensure that kexec only loaded trusted
> binaries would have to ensure that their system was appropriately
> configured. Having some mechanism to then give the kexec binary
> CAP_MODIFY_KERNEL would avoid needing an extra kexec entry point.

Giving CAP_MODIFY_KERNEL to processes upon signature verification
will simplify things a bit.

Only thing is that signature verification alone is not sufficient. We
also need to make sure after signature verification executable can
not be modified in memory in any way. So that means at least a couple of
things.

- Process code/data should not be swapped out. Otherwise it can possibly
be written by unsigned privileged processes and then faulted back in.

- Because privileged unsigned processes can bypass the file system and
write directly to disk, do not cache appraisal results. So create a
way in IMA rules to not cache the results.

I think the memory locking part is a little tricky, as which parts of a file
are to be locked will depend on the binary loader (and not IMA). Maybe IMA
can set a flag somewhere which gives a hint to the binary loader to lock
down the file. Once the file has been locked down, the binary loader should
set some flag too and call the security hook. This flag will be a hint to IMA
that the file has been locked down; another appraisal happens, and if
it passes successfully, then IMA can give the CAP_MODIFY_KERNEL capability
to the process.
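
The ordering described above (hint, lock down, appraise again, then grant) can be sketched as a toy model. Nothing here exists in the kernel: struct exec_state, its fields and the three functions are hypothetical names, and sig_ok stands in for a real signature check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-exec state; none of these fields exist in the kernel. */
struct exec_state {
    bool lock_requested;    /* IMA asked the loader to lock the file */
    bool locked;            /* loader reports the pages are locked */
    bool cap_modify_kernel; /* capability granted to the process */
};

/* Step 1: the IMA rule matches, so ask the binary loader to lock the file. */
void ima_request_lock(struct exec_state *s)
{
    s->lock_requested = true;
}

/* Step 2: the loader locks what it mapped and reports back. */
void loader_lock_down(struct exec_state *s)
{
    if (s->lock_requested)
        s->locked = true;
}

/* Step 3: appraise again. Only a passing appraisal of already-locked
 * pages grants the capability, so a check made before locking never counts. */
void ima_appraise_after_lock(struct exec_state *s, bool sig_ok)
{
    s->cap_modify_kernel = s->locked && sig_ok;
}
```

The point of the second appraisal is that a result obtained before the pages were locked says nothing about what is in memory afterwards.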

Another small nit is appraise_type=imasig. Given the fact that there
can be many formats of digital signature, we might have to make it
more fine grained to be able to specify a particular kind of digital
signature and not every possible digital signature supported.

Assuming all this works, I can look into how /sbin/kexec can call into
the kernel to verify the integrity of bzImage before it is loaded. Not sure
whether one needs to verify the PE/COFF signature, or whether bzImage will
be re-signed using IMA and one needs to call into IMA. I think here also we
will have to first lock down the file in memory, make sure nobody can open
the file for writes, and then do signature verification.

Thanks
Vivek

2013-03-21 15:36:40

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH 01/12] Security: Add CAP_COMPROMISE_KERNEL

Quoting Vivek Goyal ([email protected]):
...
> Giving CAP_MODIFY_KERNEL to processes upon signature verification
> will simplify things a bit.
>
> Only thing is that signature verification alone is not sufficient. We
> also need to make sure after signature verification executable can
> not be modified in memory in any way. So that means at least a couple of
> things.

Also what about context? If I construct a mount namespace a certain
way, can I trick this executable into loading an old signed bzImage that
I had lying around?

ISTM the only sane thing to do, if you're going down this road,
is to have CAP_MODIFY_KERNEL pulled from the bounding set for everyone
except a getty started by init on ttyS0. Then log in on serial
to update. Or run a daemon with CAP_MODIFY_KERNEL which listens
to an init_net_ns netlink socket for very basic instructions, like
"find and install latest signed bzImage in /boot". Then you can
at least trust that /boot for that daemon is not faked.
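
A rough model of why pulling a capability from the bounding set has this effect, using plain bitmasks. The exec rule follows the one documented in capabilities(7), with ambient capabilities left out. CAP_MODIFY_KERNEL and its number 37 come from the patch under discussion (with the rename proposed in this thread) and are not in any released header; struct cap_sets and both functions are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

/* From the patch under discussion, renamed as proposed in the thread;
 * not part of any released kernel header. */
#define CAP_MODIFY_KERNEL 37
#define CAP_MASK(c) (UINT64_C(1) << (c))

/* Illustrative subset of a process's capability state. */
struct cap_sets {
    uint64_t inheritable;
    uint64_t bounding;
};

/* Permitted set after execve(): file-permitted capabilities only survive
 * if they are still in the process's bounding set. */
uint64_t permitted_after_exec(struct cap_sets proc,
                              uint64_t file_inheritable,
                              uint64_t file_permitted)
{
    return (proc.inheritable & file_inheritable) |
           (file_permitted & proc.bounding);
}

/* Model of dropping one capability from the bounding set. */
struct cap_sets bounding_drop(struct cap_sets proc, int cap)
{
    proc.bounding &= ~CAP_MASK(cap);
    return proc;
}
```

In this model a process tree whose bounding set lacks the bit cannot regain it by executing a binary that carries it in its file-permitted set; only processes descended from init without the drop can.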

2013-03-21 15:52:39

by Vivek Goyal

[permalink] [raw]
Subject: Re: [PATCH 01/12] Security: Add CAP_COMPROMISE_KERNEL

On Thu, Mar 21, 2013 at 10:37:25AM -0500, Serge E. Hallyn wrote:
> Quoting Vivek Goyal ([email protected]):
> ...
> > Giving CAP_MODIFY_KERNEL to processes upon signature verification
> > will simplify things a bit.
> >
> > Only thing is that signature verification alone is not sufficient. We
> > also need to make sure after signature verification executable can
> > not be modified in memory in any way. So that means at least a couple of
> > things.
>
> Also what about context? If I construct a mount namespace a certain
> way, can I trick this executable into loading an old signed bzImage that
> I had lying around?

We were thinking that /sbin/kexec will need to verify bzImage signature
before loading it.

Key for verification is in kernel so idea was to take kernel's help
in verifying signature.

Not sure exactly what the interface should look like.

- I was thinking maybe the process can mmap() the bzImage with MAP_LOCKED.
We can create an additional flag, say MAP_VERIFY_SIG_POST, which tries
to verify signature/integrity of mapped region of file after mapping and
locking pages and mmap() fails if signature verification fails.

There have been suggestions about creating a new system call altogether.
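
A minimal user-space sketch of the mmap() half of this idea. MAP_LOCKED is a real Linux flag; MAP_VERIFY_SIG_POST is only the proposal above and does not exist, so the code just marks where it would go. The function name map_image_locked() is made up for the example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a file read-only with its pages locked in RAM.
 * Returns the mapping (length via *len) or NULL with errno set. */
void *map_image_locked(const char *path, size_t *len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return NULL;
    }
    if (st.st_size <= 0) {
        close(fd);
        errno = EINVAL;
        return NULL;
    }

    /* MAP_LOCKED asks the kernel to fault the pages in and keep them
     * resident. The proposed MAP_VERIFY_SIG_POST flag, if it existed,
     * would be OR-ed in here so that mmap() fails on a bad signature. */
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ,
                   MAP_PRIVATE | MAP_LOCKED, fd, 0);
    int saved = errno;
    close(fd);                 /* the mapping stays valid after close() */
    if (p == MAP_FAILED) {
        errno = saved;
        return NULL;
    }
    *len = (size_t)st.st_size;
    return p;
}
```

The call can fail with EAGAIN or EPERM when RLIMIT_MEMLOCK is too small for the image. As far as mmap(2) documents it, MAP_LOCKED does not guarantee that prefaulting succeeded, so a stricter variant would probably follow up with mlock().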

>
> ISTM the only sane thing to do, if you're going down this road,
> is to have CAP_MODIFY_KERNEL pulled from the bounding set for everyone
> except a getty started by init on ttyS0. Then log in on serial
> to update. Or run a daemon with CAP_MODIFY_KERNEL which listens
> to an init_net_ns netlink socket for very basic instructions, like
> "find and install latest signed bzImage in /boot". Then you can
> at least trust that /boot for that daemon is not faked.

daemon does not have the key and can't verify signature of signed
bzImage. Even if it had the key, it can't trust the crypto code for
signature verification as none of that is signed.

Thanks
Vivek

2013-03-21 15:57:37

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH 01/12] Security: Add CAP_COMPROMISE_KERNEL

Quoting Vivek Goyal ([email protected]):
> On Thu, Mar 21, 2013 at 10:37:25AM -0500, Serge E. Hallyn wrote:
> > Quoting Vivek Goyal ([email protected]):
> > ...
> > > Giving CAP_MODIFY_KERNEL to processes upon signature verification
> > > will simplify things a bit.
> > >
> > > Only thing is that signature verification alone is not sufficient. We
> > > also need to make sure after signature verification executable can
> > > not be modified in memory in any way. So that means at least a couple of
> > > things.
> >
> > Also what about context? If I construct a mount namespace a certain
> > way, can I trick this executable into loading an old signed bzImage that
> > I had lying around?
>
> We were thinking that /sbin/kexec will need to verify bzImage signature
> before loading it.
>
> Key for verification is in kernel so idea was to take kernel's help
> in verifying signature.
>
> Not sure exactly what the interface should look like.
>
> - I was thinking maybe the process can mmap() the bzImage with MAP_LOCKED.
> We can create an additional flag, say MAP_VERIFY_SIG_POST, which tries
> to verify signature/integrity of mapped region of file after mapping and
> locking pages and mmap() fails if signature verification fails.
>
> There have been suggestions about creating a new system call altogether.
>
> >
> > ISTM the only sane thing to do, if you're going down this road,
> > is to have CAP_MODIFY_KERNEL pulled from the bounding set for everyone
> > except a getty started by init on ttyS0. Then log in on serial
> > to update. Or run a daemon with CAP_MODIFY_KERNEL which listens
> > to an init_net_ns netlink socket for very basic instructions, like
> > "find and install latest signed bzImage in /boot". Then you can
> > at least trust that /boot for that daemon is not faked.
>
> daemon does not have the key and can't verify signature of signed
> bzImage. Even if it had the key, it can't trust the crypto code for
> signature verification as none of that is signed.

I'm not saying not to use the kernel to verify the signature.

-serge

2013-03-21 16:09:55

by Vivek Goyal

[permalink] [raw]
Subject: Re: [PATCH 01/12] Security: Add CAP_COMPROMISE_KERNEL

On Thu, Mar 21, 2013 at 10:58:23AM -0500, Serge E. Hallyn wrote:
> Quoting Vivek Goyal ([email protected]):
> > On Thu, Mar 21, 2013 at 10:37:25AM -0500, Serge E. Hallyn wrote:
> > > Quoting Vivek Goyal ([email protected]):
> > > ...
> > > > Giving CAP_MODIFY_KERNEL to processes upon signature verification
> > > > will simplify things a bit.
> > > >
> > > > Only thing is that signature verification alone is not sufficient. We
> > > > also need to make sure after signature verification executable can
> > > > not be modified in memory in any way. So that means at least a couple of
> > > > things.
> > >
> > > Also what about context? If I construct a mount namespace a certain
> > > way, can I trick this executable into loading an old signed bzImage that
> > > I had lying around?
> >
> > We were thinking that /sbin/kexec will need to verify bzImage signature
> > before loading it.
> >
> > Key for verification is in kernel so idea was to take kernel's help
> > in verifying signature.
> >
> > Not sure exactly what the interface should look like.
> >
> > - I was thinking maybe the process can mmap() the bzImage with MAP_LOCKED.
> > We can create an additional flag, say MAP_VERIFY_SIG_POST, which tries
> > to verify signature/integrity of mapped region of file after mapping and
> > locking pages and mmap() fails if signature verification fails.
> >
> > There have been suggestions about creating a new system call altogether.
> >
> > >
> > > ISTM the only sane thing to do, if you're going down this road,
> > > is to have CAP_MODIFY_KERNEL pulled from the bounding set for everyone
> > > except a getty started by init on ttyS0. Then log in on serial
> > > to update. Or run a daemon with CAP_MODIFY_KERNEL which listens
> > > to an init_net_ns netlink socket for very basic instructions, like
> > > "find and install latest signed bzImage in /boot". Then you can
> > > at least trust that /boot for that daemon is not faked.
> >
> > daemon does not have the key and can't verify signature of signed
> > bzImage. Even if it had the key, it can't trust the crypto code for
> > signature verification as none of that is signed.
>
> I'm not saying not to use the kernel to verify the signature.

Ok. So why can't /sbin/kexec do the verification of bzImage with the
kernel's help? Due to a crafted /boot/ it might load an old signed bzImage,
but it can't load unsigned/untrusted code on a secureboot system at ring 0.

I am hoping I did not miss your point entirely.

Thanks
Vivek

2013-03-21 16:19:08

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH 01/12] Security: Add CAP_COMPROMISE_KERNEL

Quoting Vivek Goyal ([email protected]):
> On Thu, Mar 21, 2013 at 10:58:23AM -0500, Serge E. Hallyn wrote:
> > Quoting Vivek Goyal ([email protected]):
> > > On Thu, Mar 21, 2013 at 10:37:25AM -0500, Serge E. Hallyn wrote:
> > > > Quoting Vivek Goyal ([email protected]):
> > > > ...
> > > > > Giving CAP_MODIFY_KERNEL to processes upon signature verification
> > > > > will simplify things a bit.
> > > > >
> > > > > Only thing is that signature verification alone is not sufficient. We
> > > > > also need to make sure after signature verification executable can
> > > > > not be modified in memory in any way. So that means at least a couple of
> > > > > things.
> > > >
> > > > Also what about context? If I construct a mount namespace a certain
> > > > way, can I trick this executable into loading an old signed bzImage that
> > > > I had lying around?
> > >
> > > We were thinking that /sbin/kexec will need to verify bzImage signature
> > > before loading it.
> > >
> > > Key for verification is in kernel so idea was to take kernel's help
> > > in verifying signature.
> > >
> > > Not sure exactly what the interface should look like.
> > >
> > > - I was thinking maybe the process can mmap() the bzImage with MAP_LOCKED.
> > > We can create an additional flag, say MAP_VERIFY_SIG_POST, which tries
> > > to verify signature/integrity of mapped region of file after mapping and
> > > locking pages and mmap() fails if signature verification fails.
> > >
> > > There have been suggestions about creating a new system call altogether.
> > >
> > > >
> > > > ISTM the only sane thing to do, if you're going down this road,
> > > > is to have CAP_MODIFY_KERNEL pulled from the bounding set for everyone
> > > > except a getty started by init on ttyS0. Then log in on serial
> > > > to update. Or run a daemon with CAP_MODIFY_KERNEL which listens
> > > > to an init_net_ns netlink socket for very basic instructions, like
> > > > "find and install latest signed bzImage in /boot". Then you can
> > > > at least trust that /boot for that daemon is not faked.
> > >
> > > daemon does not have the key and can't verify signature of signed
> > > bzImage. Even if it had the key, it can't trust the crypto code for
> > > signature verification as none of that is signed.
> >
> > I'm not saying not to use the kernel to verify the signature.
>
> Ok. So why can't /sbin/kexec do the verification of bzImage with the
> kernel's help? Due to a crafted /boot/ it might load an old signed bzImage,
> but it can't load unsigned/untrusted code on a secureboot system at ring 0.
>
> I am hoping I did not miss your point entirely.

No, you didn't. If replay attacks are not a concern then that bit
doesn't matter. But if^Wwhen there is a vulnerability in a signed kernel,
and a user has a copy of bzImage sitting around, signed kexec alone does
not suffice (and I'm assuming revocation is not going into the kernel?).
It seems to me if replay attacks are ignored, this is all for theater...

The other concern is analogous, just more general - seems like I may very
well be able to find a way to corrupt kexec or even corrupt the kernel with
a bad environment.

So I'm just saying that in general it doesn't seem worth having a special
list of capabilities that only signed executables can have, without doing
something about the environment. And that the solution to that seems like
what we can already do today (with a bounding set and init-launched
services).
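
For illustration, the bounding-set mechanism mentioned above can be sketched
from userspace with prctl(2). This is only a sketch under assumptions:
CAP_MODIFY_KERNEL is not a mainline capability, so CAP_SYS_RAWIO is used as a
stand-in, and the helper names are hypothetical.

```c
#include <assert.h>
#include <sys/prctl.h>
#include <linux/capability.h>

/* Stand-in only: CAP_MODIFY_KERNEL does not exist in mainline headers. */
#define STANDIN_CAP CAP_SYS_RAWIO

/*
 * Returns 1 if cap is still in the calling thread's bounding set, 0 if it
 * has been dropped, -1 (errno EINVAL) if the kernel does not recognise cap.
 */
int cap_in_bounding_set(int cap)
{
	return prctl(PR_CAPBSET_READ, (unsigned long)cap, 0UL, 0UL, 0UL);
}

/*
 * Removes cap from the bounding set. Needs CAP_SETPCAP, otherwise it fails
 * with -1 (errno EPERM). The drop cannot be undone for this thread or for
 * anything it later forks or execs.
 */
int drop_from_bounding_set(int cap)
{
	return prctl(PR_CAPBSET_DROP, (unsigned long)cap, 0UL, 0UL, 0UL);
}

/* What an init-launched wrapper might do before exec'ing a login shell. */
int confine(int cap)
{
	if (drop_from_bounding_set(cap) != 0)
		return -1;
	assert(cap_in_bounding_set(cap) == 0);
	return 0;
}
```

A getty on ttyS0 would simply skip the confine() call, which is what leaves
it as the only place the capability can still be acquired.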

All of this is probably premature though. IIUC the first thing you are
after is a way to record on the file the fact that it is a verified-signature
binary, and that's what CAP_SIGNED meant right? I agree we need something
like that, but using a capability is not right. You can add a field to
the binprm or file or f_cred, or even add a field to the capability struct,
meaningful only on files, to show it was signed - but not taint the list of
capabilities with something that is not a capability. I haven't looked
closer to see which would be the best way (my hunch would be binprm), will
be happy to come up with a proposal when I have time, but I don't want to slow
you down :)

-serge

2013-03-21 17:16:08

by Vivek Goyal

Subject: Re: [PATCH 01/12] Security: Add CAP_COMPROMISE_KERNEL

On Thu, Mar 21, 2013 at 11:19:52AM -0500, Serge E. Hallyn wrote:

[..]
> > I am hoping I did not miss your point entirely.
>
> No, you didn't. If replay attacks are not a concern then that bit
> doesn't matter. But if^Wwhen there is a vulnerability in a signed kernel,
> and a user has a copy of bzImage sitting around, signed kexec alone does
> not suffice (and I'm assuming revocation is not going into the kernel?).
> It seems to me if replay attacks are ignored, this is all for theater...
>

As Matthew mentioned, the revocation list is in the kernel. So old vulnerable
kernels should fail signature verification.

> The other concern is analogous, just more general - seems like I may very
> well be able to find a way to corrupt kexec or even corrupt the kernel with
> a bad environment.
>
> So I'm just saying that in general it doesn't seem worth having a special
> list of capabilities that only signed executables can have, without doing
> something about the environment.

Agreed that being signed is only part of the solution. Environment is
important too. And running signed binaries memory locked is, I think,
one part of controlling the environment. But there might be other
things too which I am blissfully unaware of.

Right now there are a few things we are considering for controlling
the environment.

- Build /sbin/kexec statically and sign only statically linked executables.
- Run executables memory locked.
- An unsigned binary can not ptrace() a signed one.
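
As a rough userspace illustration of the last two items, the sketch below
uses primitives that exist today: mlockall(2) and PR_SET_DUMPABLE. It is not
the proposed mechanism. Both can be bypassed by root (a tracer with
CAP_SYS_PTRACE can still attach), which is why kernel-side enforcement is
being discussed. The function names are hypothetical.

```c
#include <assert.h>
#include <sys/mman.h>
#include <sys/prctl.h>

/*
 * Lock all current and future mappings into RAM so that pages are never
 * re-read from a backing file or swap. Returns 0 on success, -1 on error
 * (typically ENOMEM or EPERM when RLIMIT_MEMLOCK is too small).
 */
int lock_self_in_memory(void)
{
	return mlockall(MCL_CURRENT | MCL_FUTURE);
}

/*
 * Mark the process non-dumpable. A tracer without CAP_SYS_PTRACE can then
 * no longer PTRACE_ATTACH to it. Returns 0 on success, -1 on error.
 */
int refuse_unprivileged_ptrace(void)
{
	if (prctl(PR_SET_DUMPABLE, 0UL, 0UL, 0UL, 0UL) != 0)
		return -1;
	assert(prctl(PR_GET_DUMPABLE, 0UL, 0UL, 0UL, 0UL) == 0);
	return 0;
}
```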

> And that the solution to that seems like
> what we can already do today (with a bounding set and init-launched
> services).

Frankly speaking, I did not understand this part. For the secureboot issue
we don't trust root and don't trust init. I am assuming any restricted
environment setup will have to be done by a trusted entity.

>
> All of this is probably premature though. IIUC the first thing you are
> after is a way to record on the file the fact that it is a verified-signature
> binary, and that's what CAP_SIGNED meant right?

Yes, that was the first thing. How to reliably sign and verify the signature
of an executable. Also make sure executable code/data can not be modified
in memory later by anything untrusted.

> I agree we need something
> like that, but using a capability is not right. You can add a field to
> the binprm or file or f_cred, or even add a field to the capability struct,
> meaningful only on files, to show it was signed - but not taint the list of
> capabilities with something that is not a capability.

Ok, I will look into other options too. Agreed, being signed is not a
capability. But being signed along with other attributes should allow one to
get a capability (CAP_MODIFY_KERNEL in this case). I am not sure why
nobody likes that idea. But that's fine, I will go with the advice of subject
matter experts.

> I haven't looked
> closer to see which would be the best way (my hunch would be binprm), will
> be happy to come up with a proposal when I have time, but I don't want to slow
> you down :)

Any suggestions are greatly appreciated whenever time permits. In the mean
time I will at least write more code and post it for RFC and hopefully
there will be some consensus on how to solve the kexec issue.

Thanks
Vivek

2013-03-27 15:03:28

by Josh Boyer

Subject: Re: [PATCH 05/12] PCI: Require CAP_COMPROMISE_KERNEL for PCI BAR access

On Mon, Mar 18, 2013 at 5:32 PM, Matthew Garrett
<[email protected]> wrote:
> Any hardware that can potentially generate DMA has to be locked down from
> userspace in order to avoid it being possible for an attacker to cause
> arbitrary kernel behaviour. Default to paranoid - in future we can
> potentially relax this for sufficiently IOMMU-isolated devices.
>
> Signed-off-by: Matthew Garrett <[email protected]>

As noted here:

https://bugzilla.redhat.com/show_bug.cgi?id=908888

this breaks pci passthru with QEMU. The suggestion in the bug is to move
the check from read/write to open, but sysfs makes that somewhat
difficult. The open code is part of the core sysfs functionality shared
with the majority of sysfs files, so adding a check there would restrict
things that clearly don't need to be restricted.

Kyle had the idea to add a cap field to the attribute structure, and do
a capable check if that is set. That would allow for a more generic
usage of capabilities in sysfs code, at the cost of slightly increasing
the structure size and open path. That seems somewhat promising if we
stick with capabilities.
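
A minimal userspace model of that idea, for illustration only. The struct
layout, the task model and the function names here are assumptions, not
kernel code; only the value 37 comes from patch 01/12.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define CAP_COMPROMISE_KERNEL 37	/* value proposed in patch 01/12 */

/* Hypothetical stand-ins for the caller's credentials and a sysfs attribute. */
struct task { uint64_t effective; };
struct attribute { const char *name; int cap; };

/* True if bit 'cap' is set in the task's effective set. */
bool task_capable(const struct task *t, int cap)
{
	assert(cap >= 0 && cap < 64);
	return (t->effective >> cap) & 1u;
}

/* Open-time check: only attributes that set .cap are restricted. */
int attr_open(const struct task *t, const struct attribute *attr)
{
	if (attr->cap && !task_capable(t, attr->cap))
		return -EACCES;
	return 0;
}
```

One consequence of using 0 as the "no capability required" marker is that
capability number 0 (CAP_CHOWN) cannot be expressed this way. That is
harmless for this use, but it matters if the field is meant to be generic.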

I would love to just squarely blame capabilities for causing this, but we
can't just replace it with an efi_enabled(EFI_SECURE_BOOT) check because
of the sysfs open case. I'm not sure there are great answers here.

josh

2013-03-27 15:08:17

by Kyle McMartin

Subject: Re: [PATCH 05/12] PCI: Require CAP_COMPROMISE_KERNEL for PCI BAR access

On Wed, Mar 27, 2013 at 11:03:26AM -0400, Josh Boyer wrote:
> On Mon, Mar 18, 2013 at 5:32 PM, Matthew Garrett
> <[email protected]> wrote:
> > Any hardware that can potentially generate DMA has to be locked down from
> > userspace in order to avoid it being possible for an attacker to cause
> > arbitrary kernel behaviour. Default to paranoid - in future we can
> > potentially relax this for sufficiently IOMMU-isolated devices.
> >
> > Signed-off-by: Matthew Garrett <[email protected]>
>
> As noted here:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=908888
>
> this breaks pci passthru with QEMU. The suggestion in the bug is to move
> the check from read/write to open, but sysfs makes that somewhat
> difficult. The open code is part of the core sysfs functionality shared
> with the majority of sysfs files, so adding a check there would restrict
> things that clearly don't need to be restricted.
>
> Kyle had the idea to add a cap field to the attribute structure, and do
> a capable check if that is set. That would allow for a more generic
> usage of capabilities in sysfs code, at the cost of slightly increasing
> the structure size and open path. That seems somewhat promising if we
> stick with capabilities.
>
> I would love to just squarely blame capabilities for causing this, but we
> can't just replace it with an efi_enabled(EFI_SECURE_BOOT) check because
> of the sysfs open case. I'm not sure there are great answers here.
>

Yeah, that was something like this (I don't even remember which Fedora
kernel version this was against.)

--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -546,9 +546,6 @@ pci_write_config(struct file* filp, struct kobject *kobj,
loff_t init_off = off;
u8 *data = (u8*) buf;

- if (!capable(CAP_COMPROMISE_KERNEL))
- return -EPERM;
-
if (off > dev->cfg_size)
return 0;
if (off + count > dev->cfg_size) {
@@ -772,6 +769,7 @@ void pci_create_legacy_files(struct pci_bus *b)
b->legacy_io->attr.name = "legacy_io";
b->legacy_io->size = 0xffff;
b->legacy_io->attr.mode = S_IRUSR | S_IWUSR;
+ b->legacy_io->attr.cap = CAP_COMPROMISE_KERNEL;
b->legacy_io->read = pci_read_legacy_io;
b->legacy_io->write = pci_write_legacy_io;
b->legacy_io->mmap = pci_mmap_legacy_io;
@@ -786,6 +784,7 @@ void pci_create_legacy_files(struct pci_bus *b)
b->legacy_mem->attr.name = "legacy_mem";
b->legacy_mem->size = 1024*1024;
b->legacy_mem->attr.mode = S_IRUSR | S_IWUSR;
+ b->legacy_mem->attr.cap = CAP_COMPROMISE_KERNEL;
b->legacy_mem->mmap = pci_mmap_legacy_mem;
pci_adjust_legacy_attr(b, pci_mmap_mem);
error = device_create_bin_file(&b->dev, b->legacy_mem);
@@ -855,9 +854,6 @@ pci_mmap_resource(struct kobject *kobj, struct bin_attribute *attr,
resource_size_t start, end;
int i;

- if (!capable(CAP_COMPROMISE_KERNEL))
- return -EPERM;
-
for (i = 0; i < PCI_ROM_RESOURCE; i++)
if (res == &pdev->resource[i])
break;
@@ -965,9 +961,6 @@ pci_write_resource_io(struct file *filp, struct kobject *kobj,
struct bin_attribute *attr, char *buf,
loff_t off, size_t count)
{
- if (!capable(CAP_COMPROMISE_KERNEL))
- return -EPERM;
-
return pci_resource_io(filp, kobj, attr, buf, off, count, true);
}

@@ -1027,6 +1020,7 @@ static int pci_create_attr(struct pci_dev *pdev, int num, int write_combine)
}
res_attr->attr.name = res_attr_name;
res_attr->attr.mode = S_IRUSR | S_IWUSR;
+ res_attr->attr.cap = CAP_COMPROMISE_KERNEL;
res_attr->size = pci_resource_len(pdev, num);
res_attr->private = &pdev->resource[num];
retval = sysfs_create_bin_file(&pdev->dev.kobj, res_attr);
@@ -1142,6 +1136,7 @@ static struct bin_attribute pci_config_attr = {
.attr = {
.name = "config",
.mode = S_IRUGO | S_IWUSR,
+ .cap = CAP_COMPROMISE_KERNEL,
},
.size = PCI_CFG_SPACE_SIZE,
.read = pci_read_config,
@@ -1152,6 +1147,7 @@ static struct bin_attribute pcie_config_attr = {
.attr = {
.name = "config",
.mode = S_IRUGO | S_IWUSR,
+ .cap = CAP_COMPROMISE_KERNEL,
},
.size = PCI_CFG_SPACE_EXP_SIZE,
.read = pci_read_config,
@@ -1201,6 +1197,7 @@ static int pci_create_capabilities_sysfs(struct pci_dev *dev)
attr->size = dev->vpd->len;
attr->attr.name = "vpd";
attr->attr.mode = S_IRUSR | S_IWUSR;
+ attr->attr.cap = CAP_COMPROMISE_KERNEL;
attr->read = read_vpd_attr;
attr->write = write_vpd_attr;
retval = sysfs_create_bin_file(&dev->dev.kobj, attr);
diff --git a/fs/sysfs/bin.c b/fs/sysfs/bin.c
index 614b2b5..e40a725 100644
--- a/fs/sysfs/bin.c
+++ b/fs/sysfs/bin.c
@@ -402,6 +402,10 @@ static int open(struct inode * inode, struct file * file)
if (!sysfs_get_active(attr_sd))
return -ENODEV;

+ error = -EACCES;
+ if (attr->attr.cap && !capable(attr->attr.cap))
+ goto err_out;
+
error = -EACCES;
if ((file->f_mode & FMODE_WRITE) && !(attr->write || attr->mmap))
goto err_out;
diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h
index 381f06d..0cf0034 100644
--- a/include/linux/sysfs.h
+++ b/include/linux/sysfs.h
@@ -26,6 +26,7 @@ enum kobj_ns_type;
struct attribute {
const char *name;
umode_t mode;
+ int cap;
#ifdef CONFIG_DEBUG_LOCK_ALLOC
bool ignore_lockdep:1;
struct lock_class_key *key;

2013-03-28 12:46:08

by Josh Boyer

Subject: Re: [PATCH 05/12] PCI: Require CAP_COMPROMISE_KERNEL for PCI BAR access

On Wed, Mar 27, 2013 at 11:08 AM, Kyle McMartin <[email protected]> wrote:
> On Wed, Mar 27, 2013 at 11:03:26AM -0400, Josh Boyer wrote:
>> On Mon, Mar 18, 2013 at 5:32 PM, Matthew Garrett
>> <[email protected]> wrote:
>> > Any hardware that can potentially generate DMA has to be locked down from
>> > userspace in order to avoid it being possible for an attacker to cause
>> > arbitrary kernel behaviour. Default to paranoid - in future we can
>> > potentially relax this for sufficiently IOMMU-isolated devices.
>> >
>> > Signed-off-by: Matthew Garrett <[email protected]>
>>
>> As noted here:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=908888
>>
>> this breaks pci passthru with QEMU. The suggestion in the bug is to move
>> the check from read/write to open, but sysfs makes that somewhat
>> difficult. The open code is part of the core sysfs functionality shared
>> with the majority of sysfs files, so adding a check there would restrict
>> things that clearly don't need to be restricted.
>>
>> Kyle had the idea to add a cap field to the attribute structure, and do
>> a capable check if that is set. That would allow for a more generic
>> usage of capabilities in sysfs code, at the cost of slightly increasing
>> the structure size and open path. That seems somewhat promising if we
>> stick with capabilities.
>>
>> I would love to just squarely blame capabilities for causing this, but we
>> can't just replace it with an efi_enabled(EFI_SECURE_BOOT) check because
>> of the sysfs open case. I'm not sure there are great answers here.
>>
>
> Yeah, that was something like this (I don't even remember which Fedora
> kernel version this was against.)

Mostly an FYI for the peanut gallery, but we noticed moving the cap check
to open breaks lspci being run by an unprivileged user. It also doesn't
fix pci passthrough because QEMU opens the PCI resource files by itself
after it's already dropped all caps.
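
To make the two failure modes concrete, here is a small userspace model of
where the check sits. It is a sketch under assumptions: the task model and
all names are hypothetical, and it assumes the original patch left the
config read path unguarded (the hunks above only remove checks from write
and mmap paths).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Where the capability check is performed. */
enum check_point { CHECK_AT_ACCESS, CHECK_AT_OPEN };

struct task { bool has_cap; };

int model_open(enum check_point p, const struct task *t)
{
	return (p == CHECK_AT_OPEN && !t->has_cap) ? -EACCES : 0;
}

/* Reads are assumed to be unguarded in both variants. */
int model_read(enum check_point p, const struct task *t)
{
	(void)p;
	(void)t;
	return 0;
}

int model_write(enum check_point p, const struct task *t)
{
	return (p == CHECK_AT_ACCESS && !t->has_cap) ? -EPERM : 0;
}

/* lspci run by an ordinary user: open the config file, then read it. */
int lspci_unprivileged(enum check_point p)
{
	struct task t = { .has_cap = false };
	int err = model_open(p, &t);

	return err ? err : model_read(p, &t);
}

/* QEMU passthrough: caps are dropped first, then the file is opened. */
int qemu_after_dropping_caps(enum check_point p)
{
	struct task t = { .has_cap = true };
	int err;

	t.has_cap = false;	/* all caps dropped before the open */
	err = model_open(p, &t);
	return err ? err : model_write(p, &t);
}
```

In this model lspci_unprivileged() succeeds only with CHECK_AT_ACCESS, while
qemu_after_dropping_caps() fails either way: with -EPERM at write time or
-EACCES at open time.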

More thinking required.

josh