2024-05-03 13:20:44

by Mickaël Salaün

[permalink] [raw]
Subject: [RFC PATCH v3 0/5] Hypervisor-Enforced Kernel Integrity - CR pinning

Hi,

This patch series implements control-register (CR) pinning for KVM and
provides an hypervisor-agnostic API to protect guests. It includes the
guest interface, the host interface, and the KVM implementation.

It's not ready for mainline yet (see the current limitations), but we
think the overall design and interfaces are good and we'd like to have
some feedback on that.

# Changes since previous version

We choose to remove as much as possible from the previous version of
this patch series to only keep the CR pinning feature and the API. This
makes the patches simpler and brings the foundation for future
enhancement. This will also enables us to quickly iterate on new
versions. We are still working on memory protection but that should be
part of another patch series, if possible once this one land.

We implemented proper KUnit tests and we are also improving the test
framework to make it easier to run tests (and another series is planed):
https://lore.kernel.org/r/[email protected]
It makes sense to use KUnit for hypervisor-agnostic features.

This series is rebased on top of v6.9-rc6 . guest_memfd is now merged
in mainline, which will help upcoming memory-related changes.

# Overview

The main idea being that kernel self-protection mechanisms should be
delegated to a more privileged part of the system, that is the
hypervisor (see the Threat model below for more details). It is still
the role of the guest kernel to request such restrictions according to
its configuration. The high-level security guarantees provided by the
hypervisor are semantically the same as a subset of those the kernel
already enforces on itself (CR pinning hardening), but with much strong
guarantees.

The guest kernel API layer contains a global struct heki_hypervisor to
share data and functions between the common code and the hypervisor
support code. The struct heki_hypervisor enables to plug in different
backend implementations that are initialized with the heki_early_init()
and heki_late_init() calls.

We took inspiration from previous patches, mainly the KVMI [1] [2] and
KVM CR-pinning [3] series, revamped and simplified relevant parts to fit
well with our goal, added one hypercall, and created a kernel API for
VMs to request protection in a generic way that can be leveraged by any
hypervisor.

When a guest request to change one of its previously protected CR, KVM
creates a GP fault.

Because the VMM needs to be involved and know the guests' requested
memory permissions, we implemented two new kind of VM exits to be able
to notify the VMM about guests' Heki configurations and policy
violations. Indeed, forwarding such signals to the VMM could help
improve attack detection, and react to such attempt (e.g. log events,
stop the VM). Giving visibility to the VMM would also enable us to
migrate VMs.

# Threat model

The main threat model is a malicious user space process exploiting a
kernel vulnerability to gain more privileges or to bypass the
access-control system. This threat also covers attacks coming from
network or storage data (e.g., malformed network packet, inconsistent
drive content).

Considering all potential ways to compromise a kernel, Heki's goal is to
harden a sane kernel before a runtime attack to make it more difficult,
and potentially to cause such an attack to fail. Because current attack
mitigations are only mitigations, we consider the kernel itself to be
partially malicious during its lifetime e.g., because a ROP attack that
could disable kernel self-protection mechanisms and make kernel
exploitation much easier. Indeed, an exploit is often split into several
stages, each bypassing some security measures (including CFI).

CR pinning should already be enforced by the guest kernel and the reason
to pin such registers is the same. With this patch series it
significantly improve such protection.

Getting the guarantee that these control registers cannot be changed
increases the cost of an attack.

# Prerequisites

For this new security layer to be effective, guest kernels must be
trusted by the VM owners at boot time, before launching any user space
processes nor receiving potentially malicious network packets. It is
then required to have a security mechanism to provide or check this
initial trust (e.g., secure boot, kernel module signing). To protect
against persistent attacks, complementary security mechanisms should be
used (e.g., IMA, IPE, Lockdown).

# How does it work?

The KVM_HC_LOCK_CR_UPDATE hypercall enables guests to pin some of its
CPU control register flags (e.g., X86_CR0_WP, X86_CR4_SMEP,
X86_CR4_SMAP).

Two new kinds of VM exits are implemented: one for a guest Heki request
(i.e. hypercall), and another for a guest attempt to change its pinned
CRs. When the guest attempts to update pinned CRs or to access memory
in a way that is not allowed, the VMM can then be notified and react to
such attack attempt. After that, if the VM is still running, KVM sends
a GP fault to the guest. The guest could then send a signal to the user
space process that triggered this policy violation (not implemented).

Heki can be enabled with the heki=1 boot command argument.

# Similar implementations

Here is a non-exhaustive list of similar implementations that we looked
at and took some ideas from. Linux mainline doesn't support such
security features, let's change that!

Windows's Virtualization-Based Security is a proprietary technology
that provides a superset of this kind of security mechanism, relying on
Hyper-V and Virtual Trust Levels which enables to have light and secure
VM enforcing restrictions on a full guest VM. This includes several
components such as HVCI for code authenticity, or HyperGuard for
monitoring and protecting kernel code and data.

Samsung's Real-time Kernel Protection (RKP) and Huawei Hypervisor
Execution Environment (HHEE) rely on proprietary hypervisors to protect
some Android devices. They monitor critical kernel data (e.g., page
tables, credentials, selinux_enforcing).

The iOS Kernel Patch Protection (KPP/Watchtower) is a proprietary
solution running in EL3 that monitors and protects critical parts of the
kernel. It is now replaced with a hardware-based mechanism: KTTR/RoRgn.

Bitdefender's Hypervisor Memory Introspection (HVMI) is an open-source
(but out of tree) set of components leveraging virtualization. HVMI
implementation is very complex, and this approach implies potential
semantic gap issues (i.e., kernel data structures may change from one
version to another).

Linux Kernel Runtime Guard is an open-source kernel module that can
detect some kernel data illegitimate modifications. Because it is the
same kernel as the compromised one, an attacker could also bypass or
disable these checks.

Intel's Virtualization Based Hardening [4] [5] is an open-source
proof-of-concept of a thin hypervisor dedicated to guest protection. As
such, it cannot be used to manage several VMs.

# Similar Linux patches

Paravirtualized Control Register pinning [3] added a set of KVM IOCTLs
to restrict some flags to be set. Heki doesn't implement such user space
interface, but only a dedicated hypercall to lock such registers. A
superset of these flags is configurable with Heki.

The Hypervisor Based Integrity patches [6] [7] only contain a generic
IPC mechanism (KVM_HC_UCALL hypercall) to request protection to the VMM.
The idea was to extend the KVM_SET_USER_MEMORY_REGION IOCTL to support
more permission than read-only.

# Current limitations

This patch series doesn't handle VM reboot, kexec, nor hybernate yet.
We'd like to leverage the realated feature from KVM CR-pinning patch
series [3]. Help appreciated!

We noticed that the KUnit tests don't work on AMD because the exception
table seems to not be properly handled (i.e. a double fault is
received). Any reason why this would differ from an Intel's CPU?

What about extending register pinning to MSRs? This should first be
implemented as a kernel self-protection though.

This patch series is also a call for collaboration. There is a lot to
do, either on hypervisors, guest kernels or VMMs sides.

# Resources

You can find related resources, including previous versions, and
conference talks about this work and the related LVBS project here:
https://github.com/heki-linux

[1] https://lore.kernel.org/all/[email protected]/
[2] https://www.linux-kvm.org/images/7/72/KVMForum2017_Introspection.pdf
[3] https://lore.kernel.org/all/[email protected]/
[4] https://github.com/intel/vbh
[5] https://sched.co/TmwN
[6] https://sched.co/eE3f
[7] https://lore.kernel.org/all/[email protected]/

Please reach out to us by replying to this thread, we're looking for
people to join and collaborate on this project!

Previous versions:
v2: https://lore.kernel.org/r/[email protected]
v1: https://lore.kernel.org/r/[email protected]

Regards,

Madhavan T. Venkataraman (1):
virt: Introduce Hypervisor Enforced Kernel Integrity (Heki)

Mickaël Salaün (4):
KVM: x86: Add new hypercall to lock control registers
KVM: x86: Add notifications for Heki policy configuration and
violation
heki: Lock guest control registers at the end of guest kernel init
virt: Add Heki KUnit tests

Documentation/virt/kvm/x86/hypercalls.rst | 17 ++
Kconfig | 2 +
arch/x86/Kconfig | 1 +
arch/x86/include/asm/x86_init.h | 1 +
arch/x86/include/uapi/asm/kvm_para.h | 2 +
arch/x86/kernel/cpu/common.c | 7 +-
arch/x86/kernel/cpu/hypervisor.c | 1 +
arch/x86/kernel/kvm.c | 56 +++++++
arch/x86/kvm/Kconfig | 1 +
arch/x86/kvm/vmx/vmx.c | 6 +
arch/x86/kvm/x86.c | 180 ++++++++++++++++++++++
arch/x86/kvm/x86.h | 23 +++
include/linux/heki.h | 54 +++++++
include/linux/kvm_host.h | 7 +
include/uapi/linux/kvm.h | 22 +++
include/uapi/linux/kvm_para.h | 1 +
init/main.c | 3 +
mm/mm_init.c | 1 +
virt/Makefile | 1 +
virt/heki/.kunitconfig | 9 ++
virt/heki/Kconfig | 43 ++++++
virt/heki/Makefile | 4 +
virt/heki/common.h | 16 ++
virt/heki/heki-test.c | 114 ++++++++++++++
virt/heki/main.c | 68 ++++++++
25 files changed, 638 insertions(+), 2 deletions(-)
create mode 100644 include/linux/heki.h
create mode 100644 virt/heki/.kunitconfig
create mode 100644 virt/heki/Kconfig
create mode 100644 virt/heki/Makefile
create mode 100644 virt/heki/common.h
create mode 100644 virt/heki/heki-test.c
create mode 100644 virt/heki/main.c


base-commit: e67572cd2204894179d89bd7b984072f19313b03
--
2.45.0



2024-05-03 13:21:51

by Mickaël Salaün

[permalink] [raw]
Subject: [RFC PATCH v3 4/5] heki: Lock guest control registers at the end of guest kernel init

The hypervisor needs to provide some functions to support Heki. These
form the Heki-Hypervisor API.

Define a heki_hypervisor structure to house the API functions. A
hypervisor that supports Heki must instantiate a heki_hypervisor
structure and pass it to the Heki common code. This allows the common
code to access these functions in a hypervisor-agnostic way.

The first function that is implemented is lock_crs() (lock control
registers). That is, certain flags in the control registers are pinned
so that they can never be changed for the lifetime of the guest.

Implement Heki support in the guest:

- Each supported hypervisor in x86 implements a set of functions for the
guest kernel. Add an init_heki() function to that set. This function
initializes Heki-related stuff. Call init_heki() for the detected
hypervisor in init_hypervisor_platform().

- Implement init_heki() for the guest.

- Implement kvm_lock_crs() in the guest to lock down control registers.
This function calls a KVM hypercall to do the job.

- Instantiate a heki_hypervisor structure that contains a pointer to
kvm_lock_crs().

- Pass the heki_hypervisor structure to Heki common code in init_heki().

Implement a heki_late_init() function and call it at the end of kernel
init. This function calls lock_crs(). In other words, control registers
of a guest are locked down at the end of guest kernel init.

Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Cc: Wanpeng Li <[email protected]>
Co-developed-by: Madhavan T. Venkataraman <[email protected]>
Signed-off-by: Madhavan T. Venkataraman <[email protected]>
Signed-off-by: Mickaël Salaün <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---

Changes since v2:
* Hide CONFIG_HYPERVISOR_SUPPORTS_HEKI from users.

Changes since v1:
* Shrinked the patch to only manage the CR pinning.
---
arch/x86/include/asm/x86_init.h | 1 +
arch/x86/kernel/cpu/hypervisor.c | 1 +
arch/x86/kernel/kvm.c | 56 ++++++++++++++++++++++++++++++++
arch/x86/kvm/Kconfig | 1 +
include/linux/heki.h | 22 +++++++++++++
init/main.c | 1 +
virt/heki/Kconfig | 8 ++++-
virt/heki/main.c | 25 ++++++++++++++
8 files changed, 114 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index 6149eabe200f..113998799473 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -128,6 +128,7 @@ struct x86_hyper_init {
bool (*msi_ext_dest_id)(void);
void (*init_mem_mapping)(void);
void (*init_after_bootmem)(void);
+ void (*init_heki)(void);
};

/**
diff --git a/arch/x86/kernel/cpu/hypervisor.c b/arch/x86/kernel/cpu/hypervisor.c
index 553bfbfc3a1b..6085c8129e0c 100644
--- a/arch/x86/kernel/cpu/hypervisor.c
+++ b/arch/x86/kernel/cpu/hypervisor.c
@@ -106,4 +106,5 @@ void __init init_hypervisor_platform(void)

x86_hyper_type = h->type;
x86_init.hyper.init_platform();
+ x86_init.hyper.init_heki();
}
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 7f0732bc0ccd..a54f2c0d7cd0 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -29,6 +29,7 @@
#include <linux/syscore_ops.h>
#include <linux/cc_platform.h>
#include <linux/efi.h>
+#include <linux/heki.h>
#include <asm/timer.h>
#include <asm/cpu.h>
#include <asm/traps.h>
@@ -999,6 +1000,60 @@ static bool kvm_sev_es_hcall_finish(struct ghcb *ghcb, struct pt_regs *regs)
}
#endif

+#ifdef CONFIG_HEKI
+
+extern unsigned long cr4_pinned_mask;
+
+/*
+ * TODO: Check SMP policy consistency, e.g. with
+ * this_cpu_read(cpu_tlbstate.cr4)
+ */
+static int kvm_lock_crs(void)
+{
+ unsigned long cr4;
+ int err;
+
+ err = kvm_hypercall3(KVM_HC_LOCK_CR_UPDATE, 0, X86_CR0_WP, 0);
+ if (err)
+ return err;
+
+ cr4 = __read_cr4();
+ err = kvm_hypercall3(KVM_HC_LOCK_CR_UPDATE, 4, cr4 & cr4_pinned_mask,
+ 0);
+ return err;
+}
+
+static struct heki_hypervisor kvm_heki_hypervisor = {
+ .lock_crs = kvm_lock_crs,
+};
+
+static void kvm_init_heki(void)
+{
+ long err;
+
+ if (!kvm_para_available()) {
+ /* Cannot make KVM hypercalls. */
+ return;
+ }
+
+ err = kvm_hypercall3(KVM_HC_LOCK_CR_UPDATE, 0, 0,
+ KVM_LOCK_CR_UPDATE_VERSION);
+ if (err < 1) {
+ /* Ignores host not supporting at least the first version. */
+ return;
+ }
+
+ heki.hypervisor = &kvm_heki_hypervisor;
+}
+
+#else /* CONFIG_HEKI */
+
+static void kvm_init_heki(void)
+{
+}
+
+#endif /* CONFIG_HEKI */
+
const __initconst struct hypervisor_x86 x86_hyper_kvm = {
.name = "KVM",
.detect = kvm_detect,
@@ -1007,6 +1062,7 @@ const __initconst struct hypervisor_x86 x86_hyper_kvm = {
.init.x2apic_available = kvm_para_available,
.init.msi_ext_dest_id = kvm_msi_ext_dest_id,
.init.init_platform = kvm_init_platform,
+ .init.init_heki = kvm_init_heki,
#if defined(CONFIG_AMD_MEM_ENCRYPT)
.runtime.sev_es_hcall_prepare = kvm_sev_es_hcall_prepare,
.runtime.sev_es_hcall_finish = kvm_sev_es_hcall_finish,
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 0ebdd088f28b..68e0e8d7230a 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -44,6 +44,7 @@ config KVM
select KVM_VFIO
select HAVE_KVM_PM_NOTIFIER if PM
select KVM_GENERIC_HARDWARE_ENABLING
+ select HYPERVISOR_SUPPORTS_HEKI
help
Support hosting fully virtualized guest machines using hardware
virtualization extensions. You will need a fairly recent
diff --git a/include/linux/heki.h b/include/linux/heki.h
index 4c18d2283392..96ccb17657e5 100644
--- a/include/linux/heki.h
+++ b/include/linux/heki.h
@@ -9,6 +9,7 @@
#define __HEKI_H__

#include <linux/types.h>
+#include <linux/bug.h>
#include <linux/cache.h>
#include <linux/init.h>
#include <linux/kernel.h>
@@ -16,15 +17,36 @@

#ifdef CONFIG_HEKI

+/*
+ * A hypervisor that supports Heki will instantiate this structure to
+ * provide hypervisor specific functions for Heki.
+ */
+struct heki_hypervisor {
+ int (*lock_crs)(void); /* Lock control registers. */
+};
+
+/*
+ * If the active hypervisor supports Heki, it will plug its heki_hypervisor
+ * pointer into this heki structure.
+ */
+struct heki {
+ struct heki_hypervisor *hypervisor;
+};
+
+extern struct heki heki;
extern bool heki_enabled;

void heki_early_init(void);
+void heki_late_init(void);

#else /* !CONFIG_HEKI */

static inline void heki_early_init(void)
{
}
+static inline void heki_late_init(void)
+{
+}

#endif /* CONFIG_HEKI */

diff --git a/init/main.c b/init/main.c
index bec2c8d939aa..c2dc663ab4b5 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1454,6 +1454,7 @@ static int __ref kernel_init(void *unused)
exit_boot_config();
free_initmem();
mark_readonly();
+ heki_late_init();

/*
* Kernel mappings are now finalized - update the userspace page-table
diff --git a/virt/heki/Kconfig b/virt/heki/Kconfig
index 66e73d212856..0c764e342f48 100644
--- a/virt/heki/Kconfig
+++ b/virt/heki/Kconfig
@@ -8,6 +8,12 @@ config ARCH_SUPPORTS_HEKI
# and run with CONFIG_HEKI. That is, it should provide all of the
# architecture support required for the HEKI feature.

+config HYPERVISOR_SUPPORTS_HEKI
+ bool
+ # A hypervisor should select this when it can successfully build
+ # and run with CONFIG_HEKI. That is, it should provide all of the
+ # hypervisor support required for the Heki feature.
+
menuconfig HEKI_MENU
bool "Virtualization hardening"

@@ -15,7 +21,7 @@ if HEKI_MENU

config HEKI
bool "Hypervisor Enforced Kernel Integrity (Heki)"
- depends on ARCH_SUPPORTS_HEKI
+ depends on ARCH_SUPPORTS_HEKI && HYPERVISOR_SUPPORTS_HEKI
help
This feature enhances guest virtual machine security by taking
advantage of security features provided by the hypervisor for guests.
diff --git a/virt/heki/main.c b/virt/heki/main.c
index 25c25f5700f7..ef0530a03e09 100644
--- a/virt/heki/main.c
+++ b/virt/heki/main.c
@@ -11,6 +11,7 @@
#include "common.h"

bool heki_enabled __ro_after_init = true;
+struct heki heki;

/*
* Must be called after kmem_cache_init().
@@ -22,6 +23,30 @@ __init void heki_early_init(void)
return;
}
pr_warn("Heki is enabled\n");
+
+ if (!heki.hypervisor) {
+ /* This happens for kernels running on bare metal as well. */
+ pr_warn("No support for Heki in the active hypervisor\n");
+ return;
+ }
+ pr_warn("Heki is supported by the active Hypervisor\n");
+}
+
+/*
+ * Must be called after mark_readonly().
+ */
+void heki_late_init(void)
+{
+ struct heki_hypervisor *hypervisor = heki.hypervisor;
+
+ if (!heki_enabled || !heki.hypervisor)
+ return;
+
+ /* Locks control registers so a compromised guest cannot change them. */
+ if (WARN_ON(hypervisor->lock_crs()))
+ return;
+
+ pr_warn("Control registers locked\n");
}

static int __init heki_parse_config(char *str)
--
2.45.0


2024-05-03 13:21:51

by Mickaël Salaün

[permalink] [raw]
Subject: [RFC PATCH v3 5/5] virt: Add Heki KUnit tests

The new CONFIG_HEKI_KUNIT_TEST option enables to run tests in a a kernel
module. The minimal required configuration is listed in the
virt/heki-test/.kunitconfig file.

test_cr_disable_smep checks control-register pinning by trying to
disable SMEP. This test should then failed on a non-protected kernel,
and only succeed with a kernel protected by Heki.

This test doesn't rely on native_write_cr4() because of the
cr4_pinned_mask hardening, which means that this *test* module loads a
valid kernel code to arbitrary change CR4. This simulate an attack
scenario where an attaker would use ROP to directly jump to the related
cr4 instruction.

As for any KUnit test, the kernel is tainted with TAINT_TEST when the
test is executed.

It is interesting to create new KUnit tests instead of extending KVM's
Kselftests because Heki is design to be hypervisor-agnostic, it relies
on a set of hypercalls (for KVM or others), and we also want to test
kernel's configuration (actual pinned CR). However, new KVM's
Kselftests would be useful to test KVM's interface with the host.

When using Qemu, we need to pass the following arguments: -cpu host
-enable-kvm

For now, it is not possible to run these tests as built-in but we are
working on that [1]. If tests are built-in anyway, they will just be
skipped because Heki would not be enabled.

Run Heki tests with:
insmod heki-test.ko

KTAP version 1
1..1
KTAP version 1
# Subtest: heki_x86
# module: heki_test
1..1
ok 1 test_cr_disable_smep
ok 1 heki_x86

Link: https://lore.kernel.org/r/[email protected] [1]
Signed-off-by: Mickaël Salaün <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---

Changes since v2:
* Make tests standalone (e.g. don't depends on CONFIG_HEKI).
* Enable to create a test kernel module.
* Don't rely on private kernel symbols.
* Handle GP fault for CR-pinning test case.
* Rename option to CONFIG_HEKI_KUNIT_TEST.
* Add the list of required kernel options.
* Move tests to virt/heki-test/ [FIXME]
* Only keep CR pinning test.
* Restore previous state (with SMEP enabled).
* Add a Kconfig menu for Heki and update the description.
* Skip tests if Heki is not protecting the running kernel.

Changes since v1:
* Move all tests to virt/heki/tests.c
---
include/linux/heki.h | 1 +
virt/heki/.kunitconfig | 9 ++++
virt/heki/Kconfig | 12 +++++
virt/heki/Makefile | 1 +
virt/heki/heki-test.c | 114 +++++++++++++++++++++++++++++++++++++++++
virt/heki/main.c | 10 ++++
6 files changed, 147 insertions(+)
create mode 100644 virt/heki/.kunitconfig
create mode 100644 virt/heki/heki-test.c

diff --git a/include/linux/heki.h b/include/linux/heki.h
index 96ccb17657e5..3294c4d583e5 100644
--- a/include/linux/heki.h
+++ b/include/linux/heki.h
@@ -35,6 +35,7 @@ struct heki {

extern struct heki heki;
extern bool heki_enabled;
+extern bool heki_enforcing;

void heki_early_init(void);
void heki_late_init(void);
diff --git a/virt/heki/.kunitconfig b/virt/heki/.kunitconfig
new file mode 100644
index 000000000000..ad4454800579
--- /dev/null
+++ b/virt/heki/.kunitconfig
@@ -0,0 +1,9 @@
+CONFIG_HEKI=y
+CONFIG_HEKI_KUNIT_TEST=m
+CONFIG_HEKI_MENU=y
+CONFIG_HIGH_RES_TIMERS=y
+CONFIG_HYPERVISOR_GUEST=y
+CONFIG_KUNIT=y
+CONFIG_KVM=y
+CONFIG_KVM_GUEST=y
+CONFIG_PARAVIRT=y
diff --git a/virt/heki/Kconfig b/virt/heki/Kconfig
index 0c764e342f48..18895a81a9af 100644
--- a/virt/heki/Kconfig
+++ b/virt/heki/Kconfig
@@ -28,4 +28,16 @@ config HEKI
This feature is helpful in maintaining guest virtual machine security
even after the guest kernel has been compromised.

+config HEKI_KUNIT_TEST
+ tristate "KUnit tests for Heki" if !KUNIT_ALL_TESTS
+ depends on KUNIT
+ depends on X86
+ default KUNIT_ALL_TESTS
+ help
+ Build KUnit tests for Landlock.
+
+ See the KUnit documentation in Documentation/dev-tools/kunit
+
+ If you are unsure how to answer this question, answer N.
+
endif
diff --git a/virt/heki/Makefile b/virt/heki/Makefile
index 8b10e73a154b..7133545eb5ae 100644
--- a/virt/heki/Makefile
+++ b/virt/heki/Makefile
@@ -1,3 +1,4 @@
# SPDX-License-Identifier: GPL-2.0-only

obj-$(CONFIG_HEKI) += main.o
+obj-$(CONFIG_HEKI_KUNIT_TEST) += heki-test.o
diff --git a/virt/heki/heki-test.c b/virt/heki/heki-test.c
new file mode 100644
index 000000000000..b4e11c21ac5d
--- /dev/null
+++ b/virt/heki/heki-test.c
@@ -0,0 +1,114 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Hypervisor Enforced Kernel Integrity (Heki) - Tests
+ *
+ * Copyright © 2023-2024 Microsoft Corporation
+ */
+
+#include <asm/asm.h>
+#include <asm/processor.h>
+#include <asm/special_insns.h>
+#include <kunit/test.h>
+#include <linux/heki.h>
+
+/* Returns true on error (i.e. GP fault), false otherwise. */
+static __always_inline bool set_cr4(unsigned long value)
+{
+ int err = 0;
+
+ might_sleep();
+ /* clang-format off */
+ asm volatile("1: mov %[value],%%cr4 \n"
+ "2: \n"
+ _ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_ONE_REG, %[err])
+ : [value] "+r" (value), [err] "+r" (err)
+ :);
+ /* clang-format on */
+ return err;
+}
+
+/* Control register pinning tests with SMEP check. */
+static void test_cr_disable_smep(struct kunit *const test)
+{
+ bool is_vme_set;
+
+ /* SMEP should be initially enabled. */
+ KUNIT_ASSERT_TRUE(test, __read_cr4() & X86_CR4_SMEP);
+
+ /*
+ * Trying to disable SMEP, bypassing kernel self-protection by not
+ * using cr4_clear_bits(X86_CR4_SMEP), and checking GP fault.
+ */
+ KUNIT_EXPECT_TRUE(test, set_cr4(__read_cr4() & ~X86_CR4_SMEP));
+
+ /* SMEP should still be enabled. */
+ KUNIT_EXPECT_TRUE(test, __read_cr4() & X86_CR4_SMEP);
+
+ /* Re-enabling SMEP doesn't throw a GP fault. */
+ KUNIT_EXPECT_FALSE(test, set_cr4(__read_cr4() | X86_CR4_SMEP));
+ KUNIT_EXPECT_TRUE(test, __read_cr4() & X86_CR4_SMEP);
+
+ /* We are allowed to set and unset VME. */
+ is_vme_set = __read_cr4() & X86_CR4_VME;
+ KUNIT_EXPECT_FALSE(test, set_cr4(__read_cr4() | X86_CR4_VME));
+ KUNIT_EXPECT_TRUE(test, __read_cr4() & X86_CR4_VME);
+
+ KUNIT_EXPECT_FALSE(test, set_cr4(__read_cr4() & ~X86_CR4_VME));
+ KUNIT_EXPECT_FALSE(test, __read_cr4() & X86_CR4_VME);
+
+ KUNIT_EXPECT_FALSE(test, set_cr4(__read_cr4() | X86_CR4_VME));
+ KUNIT_EXPECT_TRUE(test, __read_cr4() & X86_CR4_VME);
+
+ /* SMEP and VME changes should be consistent when setting both. */
+ KUNIT_EXPECT_TRUE(test, set_cr4(__read_cr4() &
+ ~(X86_CR4_SMEP | X86_CR4_VME)));
+ KUNIT_EXPECT_TRUE(test, __read_cr4() & X86_CR4_SMEP);
+ KUNIT_EXPECT_TRUE(test, __read_cr4() & X86_CR4_VME);
+
+ /* Unset VME. */
+ KUNIT_EXPECT_FALSE(test, set_cr4(__read_cr4() & ~X86_CR4_VME));
+ KUNIT_EXPECT_FALSE(test, __read_cr4() & X86_CR4_VME);
+
+ /* SMEP and VME changes should be consistent when only setting SMEP. */
+ KUNIT_EXPECT_TRUE(test, set_cr4(__read_cr4() &
+ ~(X86_CR4_SMEP | X86_CR4_VME)));
+ KUNIT_EXPECT_TRUE(test, __read_cr4() & X86_CR4_SMEP);
+ KUNIT_EXPECT_FALSE(test, __read_cr4() & X86_CR4_VME);
+
+ /* Restores VME. */
+ if (is_vme_set)
+ KUNIT_EXPECT_FALSE(test, set_cr4(__read_cr4() | X86_CR4_VME));
+ else
+ KUNIT_EXPECT_FALSE(test, set_cr4(__read_cr4() & ~X86_CR4_VME));
+}
+
+/* clang-format off */
+static struct kunit_case test_cases_x86[] = {
+ KUNIT_CASE(test_cr_disable_smep),
+ {}
+};
+/* clang-format on */
+
+static int test_init(struct kunit *test)
+{
+#ifdef CONFIG_HEKI
+ if (heki_enforcing)
+ return 0;
+#else /* CONFIG_HEKI */
+ kunit_skip(test, "Heki is not enforced");
+#endif /* CONFIG_HEKI */
+
+ return 0;
+}
+
+static struct kunit_suite test_suite_x86 = {
+ .name = "heki_x86",
+ .init = test_init,
+ .test_cases = test_cases_x86,
+};
+
+kunit_test_suite(test_suite_x86);
+
+MODULE_IMPORT_NS(HEKI_KUNIT_TEST);
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Tests for Hypervisor Enforced Kernel Integrity (Heki)");
diff --git a/virt/heki/main.c b/virt/heki/main.c
index ef0530a03e09..dcc89befaf66 100644
--- a/virt/heki/main.c
+++ b/virt/heki/main.c
@@ -11,6 +11,12 @@
#include "common.h"

bool heki_enabled __ro_after_init = true;
+
+#if IS_ENABLED(CONFIG_KUNIT)
+bool heki_enforcing = false;
+EXPORT_SYMBOL_NS_GPL(heki_enforcing, HEKI_KUNIT_TEST);
+#endif /* IS_ENABLED(CONFIG_KUNIT) */
+
struct heki heki;

/*
@@ -47,6 +53,10 @@ void heki_late_init(void)
return;

pr_warn("Control registers locked\n");
+
+#if IS_ENABLED(CONFIG_KUNIT)
+ heki_enforcing = true;
+#endif /* IS_ENABLED(CONFIG_KUNIT) */
}

static int __init heki_parse_config(char *str)
--
2.45.0


2024-05-03 13:21:53

by Mickaël Salaün

[permalink] [raw]
Subject: [RFC PATCH v3 1/5] virt: Introduce Hypervisor Enforced Kernel Integrity (Heki)

From: Madhavan T. Venkataraman <[email protected]>

Hypervisor Enforced Kernel Integrity (Heki) is a feature that will use
the hypervisor to enhance guest virtual machine security.

Implement minimal code to introduce Heki:

- Define the config variables.

- Define a kernel command line parameter "heki" to turn the feature
on or off. By default, Heki is on.

- Define heki_early_init() and call it in start_kernel(). Currently,
this function only prints the value of the "heki" command
line parameter.

Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Cc: Wanpeng Li <[email protected]>
Co-developed-by: Mickaël Salaün <[email protected]>
Signed-off-by: Mickaël Salaün <[email protected]>
Signed-off-by: Madhavan T. Venkataraman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---

Changes since v2:
* Move CONFIG_HEKI under a new CONFIG_HEKI_MENU to group it with the
test configuration (see following patches).
* Hide CONFIG_ARCH_SUPPORS_HEKI from users.

Changes since v1:
* Shrinked this patch to only contain the minimal common parts.
* Moved heki_early_init() to start_kernel().
* Use kstrtobool().
---
Kconfig | 2 ++
arch/x86/Kconfig | 1 +
include/linux/heki.h | 31 +++++++++++++++++++++++++++++++
init/main.c | 2 ++
mm/mm_init.c | 1 +
virt/Makefile | 1 +
virt/heki/Kconfig | 25 +++++++++++++++++++++++++
virt/heki/Makefile | 3 +++
virt/heki/common.h | 16 ++++++++++++++++
virt/heki/main.c | 33 +++++++++++++++++++++++++++++++++
10 files changed, 115 insertions(+)
create mode 100644 include/linux/heki.h
create mode 100644 virt/heki/Kconfig
create mode 100644 virt/heki/Makefile
create mode 100644 virt/heki/common.h
create mode 100644 virt/heki/main.c

diff --git a/Kconfig b/Kconfig
index 745bc773f567..0c844d9bcb03 100644
--- a/Kconfig
+++ b/Kconfig
@@ -29,4 +29,6 @@ source "lib/Kconfig"

source "lib/Kconfig.debug"

+source "virt/heki/Kconfig"
+
source "Documentation/Kconfig"
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 928820e61cb5..d2fba63c289b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -34,6 +34,7 @@ config X86_64
select SWIOTLB
select ARCH_HAS_ELFCORE_COMPAT
select ZONE_DMA32
+ select ARCH_SUPPORTS_HEKI

config FORCE_DYNAMIC_FTRACE
def_bool y
diff --git a/include/linux/heki.h b/include/linux/heki.h
new file mode 100644
index 000000000000..4c18d2283392
--- /dev/null
+++ b/include/linux/heki.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Hypervisor Enforced Kernel Integrity (Heki) - Definitions
+ *
+ * Copyright © 2023 Microsoft Corporation
+ */
+
+#ifndef __HEKI_H__
+#define __HEKI_H__
+
+#include <linux/types.h>
+#include <linux/cache.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/printk.h>
+
+#ifdef CONFIG_HEKI
+
+extern bool heki_enabled;
+
+void heki_early_init(void);
+
+#else /* !CONFIG_HEKI */
+
+static inline void heki_early_init(void)
+{
+}
+
+#endif /* CONFIG_HEKI */
+
+#endif /* __HEKI_H__ */
diff --git a/init/main.c b/init/main.c
index 5dcf5274c09c..bec2c8d939aa 100644
--- a/init/main.c
+++ b/init/main.c
@@ -102,6 +102,7 @@
#include <linux/randomize_kstack.h>
#include <linux/pidfs.h>
#include <linux/ptdump.h>
+#include <linux/heki.h>
#include <net/net_namespace.h>

#include <asm/io.h>
@@ -1059,6 +1060,7 @@ void start_kernel(void)
uts_ns_init();
key_init();
security_init();
+ heki_early_init();
dbg_late_init();
net_ns_init();
vfs_caches_init();
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 549e76af8f82..89d9f97bd471 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -27,6 +27,7 @@
#include <linux/swap.h>
#include <linux/cma.h>
#include <linux/crash_dump.h>
+#include <linux/heki.h>
#include "internal.h"
#include "slab.h"
#include "shuffle.h"
diff --git a/virt/Makefile b/virt/Makefile
index 1cfea9436af9..856b5ccedb5a 100644
--- a/virt/Makefile
+++ b/virt/Makefile
@@ -1,2 +1,3 @@
# SPDX-License-Identifier: GPL-2.0-only
obj-y += lib/
+obj-$(CONFIG_HEKI_MENU) += heki/
diff --git a/virt/heki/Kconfig b/virt/heki/Kconfig
new file mode 100644
index 000000000000..66e73d212856
--- /dev/null
+++ b/virt/heki/Kconfig
@@ -0,0 +1,25 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Hypervisor Enforced Kernel Integrity (Heki)
+
+config ARCH_SUPPORTS_HEKI
+ bool
+ # An architecture should select this when it can successfully build
+ # and run with CONFIG_HEKI. That is, it should provide all of the
+ # architecture support required for the HEKI feature.
+
+menuconfig HEKI_MENU
+ bool "Virtualization hardening"
+
+if HEKI_MENU
+
+config HEKI
+ bool "Hypervisor Enforced Kernel Integrity (Heki)"
+ depends on ARCH_SUPPORTS_HEKI
+ help
+ This feature enhances guest virtual machine security by taking
+ advantage of security features provided by the hypervisor for guests.
+ This feature is helpful in maintaining guest virtual machine security
+ even after the guest kernel has been compromised.
+
+endif
diff --git a/virt/heki/Makefile b/virt/heki/Makefile
new file mode 100644
index 000000000000..8b10e73a154b
--- /dev/null
+++ b/virt/heki/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+obj-$(CONFIG_HEKI) += main.o
diff --git a/virt/heki/common.h b/virt/heki/common.h
new file mode 100644
index 000000000000..edd98fc650a8
--- /dev/null
+++ b/virt/heki/common.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Hypervisor Enforced Kernel Integrity (Heki) - Common header
+ *
+ * Copyright © 2023 Microsoft Corporation
+ */
+
+#ifndef _HEKI_COMMON_H
+
+#ifdef pr_fmt
+#undef pr_fmt
+#endif
+
+#define pr_fmt(fmt) "heki-guest: " fmt
+
+#endif /* _HEKI_COMMON_H */
diff --git a/virt/heki/main.c b/virt/heki/main.c
new file mode 100644
index 000000000000..25c25f5700f7
--- /dev/null
+++ b/virt/heki/main.c
@@ -0,0 +1,33 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Hypervisor Enforced Kernel Integrity (Heki) - Common code
+ *
+ * Copyright © 2023 Microsoft Corporation
+ */
+
+#include <linux/heki.h>
+#include <linux/kstrtox.h>
+
+#include "common.h"
+
+bool heki_enabled __ro_after_init = true;
+
+/*
+ * Must be called after kmem_cache_init().
+ */
+__init void heki_early_init(void)
+{
+ if (!heki_enabled) {
+ pr_warn("Heki is not enabled\n");
+ return;
+ }
+ pr_warn("Heki is enabled\n");
+}
+
+static int __init heki_parse_config(char *str)
+{
+ if (kstrtobool(str, &heki_enabled))
+ pr_warn("Invalid option string for heki: '%s'\n", str);
+ return 1;
+}
+__setup("heki=", heki_parse_config);
--
2.45.0


2024-05-03 13:22:08

by Mickaël Salaün

[permalink] [raw]
Subject: [RFC PATCH v3 2/5] KVM: x86: Add new hypercall to lock control registers

This enables guests to lock their CR0 and CR4 registers with a subset of
X86_CR0_WP, X86_CR4_SMEP, X86_CR4_SMAP, X86_CR4_UMIP, X86_CR4_FSGSBASE
and X86_CR4_CET flags.

The new KVM_HC_LOCK_CR_UPDATE hypercall takes three arguments. The
first is to identify the control register, the second is a bit mask to
pin (i.e. mark as read-only), and the third is for optional flags.

These register flags should already be pinned by Linux guests, but once
compromised, this self-protection mechanism could be disabled, which is
not the case with this dedicated hypercall.

Once the CRs are pinned by the guest, if it attempts to change them,
then a general protection fault is sent to the guest.

This hypercall may evolve and support new kind of registers or pinning.
The optional KVM_LOCK_CR_UPDATE_VERSION flag enables guests to know the
supported abilities by mapping the returned version with the related
features.

Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Madhavan T. Venkataraman <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Cc: Wanpeng Li <[email protected]>
Signed-off-by: Mickaël Salaün <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---

Changes since v1:
* Guard KVM_HC_LOCK_CR_UPDATE hypercall with CONFIG_HEKI.
* Move extern cr4_pinned_mask to x86.h (suggested by Kees Cook).
* Move VMX CR checks from vmx_set_cr*() to handle_cr() to make it
possible to return to user space (see next commit).
* Change the heki_check_cr()'s first argument to vcpu.
* Don't use -KVM_EPERM in heki_check_cr().
* Generate a fault when the guest requests a denied CR update.
* Add a flags argument to get the version of this hypercall. Being able
to do a preper version check was suggested by Wei Liu.
---
Documentation/virt/kvm/x86/hypercalls.rst | 17 +++++
arch/x86/include/uapi/asm/kvm_para.h | 2 +
arch/x86/kernel/cpu/common.c | 7 +-
arch/x86/kvm/vmx/vmx.c | 5 ++
arch/x86/kvm/x86.c | 84 +++++++++++++++++++++++
arch/x86/kvm/x86.h | 22 ++++++
include/linux/kvm_host.h | 5 ++
include/uapi/linux/kvm_para.h | 1 +
8 files changed, 141 insertions(+), 2 deletions(-)

diff --git a/Documentation/virt/kvm/x86/hypercalls.rst b/Documentation/virt/kvm/x86/hypercalls.rst
index 10db7924720f..3178576f4c47 100644
--- a/Documentation/virt/kvm/x86/hypercalls.rst
+++ b/Documentation/virt/kvm/x86/hypercalls.rst
@@ -190,3 +190,20 @@ the KVM_CAP_EXIT_HYPERCALL capability. Userspace must enable that capability
before advertising KVM_FEATURE_HC_MAP_GPA_RANGE in the guest CPUID. In
addition, if the guest supports KVM_FEATURE_MIGRATION_CONTROL, userspace
must also set up an MSR filter to process writes to MSR_KVM_MIGRATION_CONTROL.
+
+9. KVM_HC_LOCK_CR_UPDATE
+------------------------
+
+:Architecture: x86
+:Status: active
+:Purpose: Request some control registers to be restricted.
+
+- a0: identify a control register
+- a1: bit mask to make some flags read-only
+- a2: optional KVM_LOCK_CR_UPDATE_VERSION flag that will return the version of
+ this hypercall. Version 1 supports CR0 and CR4 pinning.
+
+The hypercall lets a guest request control register flags to be pinned for
+itself.
+
+Returns 0 on success or a KVM error code otherwise.
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index a1efa7907a0b..cfc17f3d1877 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -149,4 +149,6 @@ struct kvm_vcpu_pv_apf_data {
#define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK
#define KVM_PV_EOI_DISABLED 0x0

+#define KVM_LOCK_CR_UPDATE_VERSION (1 << 0)
+
#endif /* _UAPI_ASM_X86_KVM_PARA_H */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 605c26c009c8..69695d9d6e2a 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -398,8 +398,11 @@ static __always_inline void setup_umip(struct cpuinfo_x86 *c)
}

/* These bits should not change their value after CPU init is finished. */
-static const unsigned long cr4_pinned_mask = X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP |
- X86_CR4_FSGSBASE | X86_CR4_CET | X86_CR4_FRED;
+const unsigned long cr4_pinned_mask = X86_CR4_SMEP | X86_CR4_SMAP |
+ X86_CR4_UMIP | X86_CR4_FSGSBASE |
+ X86_CR4_CET | X86_CR4_FRED;
+EXPORT_SYMBOL_GPL(cr4_pinned_mask);
+
static DEFINE_STATIC_KEY_FALSE_RO(cr_pinning);
static unsigned long cr4_pinned_bits __ro_after_init;

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 22411f4aff53..7ba970b525f7 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5453,6 +5453,11 @@ static int handle_cr(struct kvm_vcpu *vcpu)
case 0: /* mov to cr */
val = kvm_register_read(vcpu, reg);
trace_kvm_cr_write(cr, val);
+
+ ret = heki_check_cr(vcpu, cr, val);
+ if (ret)
+ return ret;
+
switch (cr) {
case 0:
err = handle_set_cr0(vcpu, val);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 91478b769af0..a5f47be59abc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8281,11 +8281,86 @@ static unsigned long emulator_get_cr(struct x86_emulate_ctxt *ctxt, int cr)
return value;
}

+#ifdef CONFIG_HEKI
+
+#define HEKI_ABI_VERSION 1
+
+static int heki_lock_cr(struct kvm_vcpu *const vcpu, const unsigned long cr,
+ unsigned long pin, unsigned long flags)
+{
+ if (flags) {
+ if ((flags == KVM_LOCK_CR_UPDATE_VERSION) && !cr && !pin)
+ return HEKI_ABI_VERSION;
+ return -KVM_EINVAL;
+ }
+
+ if (!pin)
+ return -KVM_EINVAL;
+
+ switch (cr) {
+ case 0:
+ /* Cf. arch/x86/kernel/cpu/common.c */
+ if (!(pin & X86_CR0_WP))
+ return -KVM_EINVAL;
+
+ if ((pin & read_cr0()) != pin)
+ return -KVM_EINVAL;
+
+ atomic_long_or(pin, &vcpu->kvm->heki_pinned_cr0);
+ return 0;
+ case 4:
+ /* Checks for irrelevant bits. */
+ if ((pin & cr4_pinned_mask) != pin)
+ return -KVM_EINVAL;
+
+ /* Ignores bits not present in host. */
+ pin &= __read_cr4();
+ atomic_long_or(pin, &vcpu->kvm->heki_pinned_cr4);
+ return 0;
+ }
+ return -KVM_EINVAL;
+}
+
+int heki_check_cr(struct kvm_vcpu *const vcpu, const unsigned long cr,
+ const unsigned long val)
+{
+ unsigned long pinned;
+
+ switch (cr) {
+ case 0:
+ pinned = atomic_long_read(&vcpu->kvm->heki_pinned_cr0);
+ if ((val & pinned) != pinned) {
+ pr_warn_ratelimited(
+ "heki: Blocked CR0 update: 0x%lx\n", val);
+ kvm_inject_gp(vcpu, 0);
+ return 1;
+ }
+ return 0;
+ case 4:
+ pinned = atomic_long_read(&vcpu->kvm->heki_pinned_cr4);
+ if ((val & pinned) != pinned) {
+ pr_warn_ratelimited(
+ "heki: Blocked CR4 update: 0x%lx\n", val);
+ kvm_inject_gp(vcpu, 0);
+ return 1;
+ }
+ return 0;
+ }
+ return 0;
+}
+EXPORT_SYMBOL_GPL(heki_check_cr);
+
+#endif /* CONFIG_HEKI */
+
static int emulator_set_cr(struct x86_emulate_ctxt *ctxt, int cr, ulong val)
{
struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
int res = 0;

+ res = heki_check_cr(vcpu, cr, val);
+ if (res)
+ return res;
+
switch (cr) {
case 0:
res = kvm_set_cr0(vcpu, mk_cr_64(kvm_read_cr0(vcpu), val));
@@ -10142,6 +10217,15 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
vcpu->arch.complete_userspace_io = complete_hypercall_exit;
return 0;
}
+#ifdef CONFIG_HEKI
+ case KVM_HC_LOCK_CR_UPDATE:
+ if (a0 > U32_MAX) {
+ ret = -KVM_EINVAL;
+ } else {
+ ret = heki_lock_cr(vcpu, a0, a1, a2);
+ }
+ break;
+#endif /* CONFIG_HEKI */
default:
ret = -KVM_ENOSYS;
break;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index a8b71803777b..ade7d68ddaff 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -290,6 +290,26 @@ static inline bool kvm_check_has_quirk(struct kvm *kvm, u64 quirk)
return !(kvm->arch.disabled_quirks & quirk);
}

+#ifdef CONFIG_HEKI
+
+int heki_check_cr(struct kvm_vcpu *vcpu, unsigned long cr, unsigned long val);
+
+#else /* CONFIG_HEKI */
+
+static inline int heki_check_cr(struct kvm_vcpu *vcpu, unsigned long cr,
+ unsigned long val)
+{
+ return 0;
+}
+
+static inline int heki_lock_cr(struct kvm_vcpu *const vcpu, unsigned long cr,
+ unsigned long pin)
+{
+ return 0;
+}
+
+#endif /* CONFIG_HEKI */
+
void kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip);

u64 get_kvmclock_ns(struct kvm *kvm);
@@ -327,6 +347,8 @@ extern u64 host_xcr0;
extern u64 host_xss;
extern u64 host_arch_capabilities;

+extern const unsigned long cr4_pinned_mask;
+
extern struct kvm_caps kvm_caps;

extern bool enable_pmu;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 48f31dcd318a..6ff13937929a 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -836,6 +836,11 @@ struct kvm {
bool vm_bugged;
bool vm_dead;

+#ifdef CONFIG_HEKI
+ atomic_long_t heki_pinned_cr0;
+ atomic_long_t heki_pinned_cr4;
+#endif /* CONFIG_HEKI */
+
#ifdef CONFIG_HAVE_KVM_PM_NOTIFIER
struct notifier_block pm_notifier;
#endif
diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
index 960c7e93d1a9..2ed418704603 100644
--- a/include/uapi/linux/kvm_para.h
+++ b/include/uapi/linux/kvm_para.h
@@ -30,6 +30,7 @@
#define KVM_HC_SEND_IPI 10
#define KVM_HC_SCHED_YIELD 11
#define KVM_HC_MAP_GPA_RANGE 12
+#define KVM_HC_LOCK_CR_UPDATE 13

/*
* hypercalls use architecture specific
--
2.45.0


2024-05-03 13:22:31

by Mickaël Salaün

[permalink] [raw]
Subject: [RFC PATCH v3 3/5] KVM: x86: Add notifications for Heki policy configuration and violation

Add an interface for user space to be notified about guests' Heki policy
and related violations.

Extend the KVM_ENABLE_CAP IOCTL with KVM_CAP_HEKI_CONFIGURE and
KVM_CAP_HEKI_DENIAL. Each one takes a bitmask as first argument that can
contains KVM_HEKI_EXIT_REASON_CR0 and KVM_HEKI_EXIT_REASON_CR4. The
returned value is the bitmask of known Heki exit reasons, for now:
KVM_HEKI_EXIT_REASON_CR0 and KVM_HEKI_EXIT_REASON_CR4.

If KVM_CAP_HEKI_CONFIGURE is set, a VM exit will be triggered for each
KVM_HC_LOCK_CR_UPDATE hypercalls according to the requested control
register. This enables to enlighten the VMM with the guest
auto-restrictions.

If KVM_CAP_HEKI_DENIAL is set, a VM exit will be triggered for each
pinned CR violation. This enables the VMM to react to a policy
violation.

Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Madhavan T. Venkataraman <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Cc: Wanpeng Li <[email protected]>
Signed-off-by: Mickaël Salaün <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---

Changes since v1:
* New patch. Making user space aware of Heki properties was requested by
Sean Christopherson.
---
arch/x86/kvm/vmx/vmx.c | 5 +-
arch/x86/kvm/x86.c | 114 +++++++++++++++++++++++++++++++++++----
arch/x86/kvm/x86.h | 7 +--
include/linux/kvm_host.h | 2 +
include/uapi/linux/kvm.h | 22 ++++++++
5 files changed, 136 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 7ba970b525f7..5869a1ed7866 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5445,6 +5445,7 @@ static int handle_cr(struct kvm_vcpu *vcpu)
int reg;
int err;
int ret;
+ bool exit = false;

exit_qualification = vmx_get_exit_qual(vcpu);
cr = exit_qualification & 15;
@@ -5454,8 +5455,8 @@ static int handle_cr(struct kvm_vcpu *vcpu)
val = kvm_register_read(vcpu, reg);
trace_kvm_cr_write(cr, val);

- ret = heki_check_cr(vcpu, cr, val);
- if (ret)
+ ret = heki_check_cr(vcpu, cr, val, &exit);
+ if (exit)
return ret;

switch (cr) {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a5f47be59abc..865e88f2b0fc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -119,6 +119,10 @@ static u64 __read_mostly cr4_reserved_bits = CR4_RESERVED_BITS;

#define KVM_CAP_PMU_VALID_MASK KVM_PMU_CAP_DISABLE

+#define KVM_HEKI_EXIT_REASON_VALID_MASK ( \
+ KVM_HEKI_EXIT_REASON_CR0 | \
+ KVM_HEKI_EXIT_REASON_CR4)
+
#define KVM_X2APIC_API_VALID_FLAGS (KVM_X2APIC_API_USE_32BIT_IDS | \
KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK)

@@ -4836,6 +4840,10 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
if (kvm_is_vm_type_supported(KVM_X86_SW_PROTECTED_VM))
r |= BIT(KVM_X86_SW_PROTECTED_VM);
break;
+ case KVM_CAP_HEKI_CONFIGURE:
+ case KVM_CAP_HEKI_DENIAL:
+ r = KVM_HEKI_EXIT_REASON_VALID_MASK;
+ break;
default:
break;
}
@@ -6729,6 +6737,22 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
}
mutex_unlock(&kvm->lock);
break;
+#ifdef CONFIG_HEKI
+ case KVM_CAP_HEKI_CONFIGURE:
+ r = -EINVAL;
+ if (cap->args[0] & ~KVM_HEKI_EXIT_REASON_VALID_MASK)
+ break;
+ kvm->heki_configure_exit_reason = cap->args[0];
+ r = 0;
+ break;
+ case KVM_CAP_HEKI_DENIAL:
+ r = -EINVAL;
+ if (cap->args[0] & ~KVM_HEKI_EXIT_REASON_VALID_MASK)
+ break;
+ kvm->heki_denial_exit_reason = cap->args[0];
+ r = 0;
+ break;
+#endif
default:
r = -EINVAL;
break;
@@ -8283,11 +8307,60 @@ static unsigned long emulator_get_cr(struct x86_emulate_ctxt *ctxt, int cr)

#ifdef CONFIG_HEKI

+static int complete_heki_configure_exit(struct kvm_vcpu *const vcpu)
+{
+ kvm_rax_write(vcpu, 0);
+ ++vcpu->stat.hypercalls;
+ return kvm_skip_emulated_instruction(vcpu);
+}
+
+static int complete_heki_denial_exit(struct kvm_vcpu *const vcpu)
+{
+ kvm_inject_gp(vcpu, 0);
+ return 1;
+}
+
+/* Returns true if the @exit_reason is handled by @vcpu->kvm. */
+static bool heki_exit_cr(struct kvm_vcpu *const vcpu, const __u32 exit_reason,
+ const u64 heki_reason, unsigned long value)
+{
+ switch (exit_reason) {
+ case KVM_EXIT_HEKI_CONFIGURE:
+ if (!(vcpu->kvm->heki_configure_exit_reason & heki_reason))
+ return false;
+
+ vcpu->run->heki_configure.reason = heki_reason;
+ memset(vcpu->run->heki_configure.reserved, 0,
+ sizeof(vcpu->run->heki_configure.reserved));
+ vcpu->run->heki_configure.cr_pinned = value;
+ vcpu->arch.complete_userspace_io = complete_heki_configure_exit;
+ break;
+ case KVM_EXIT_HEKI_DENIAL:
+ if (!(vcpu->kvm->heki_denial_exit_reason & heki_reason))
+ return false;
+
+ vcpu->run->heki_denial.reason = heki_reason;
+ memset(vcpu->run->heki_denial.reserved, 0,
+ sizeof(vcpu->run->heki_denial.reserved));
+ vcpu->run->heki_denial.cr_value = value;
+ vcpu->arch.complete_userspace_io = complete_heki_denial_exit;
+ break;
+ default:
+ WARN_ON_ONCE(1);
+ return false;
+ }
+
+ vcpu->run->exit_reason = exit_reason;
+ return true;
+}
+
#define HEKI_ABI_VERSION 1

static int heki_lock_cr(struct kvm_vcpu *const vcpu, const unsigned long cr,
- unsigned long pin, unsigned long flags)
+ unsigned long pin, unsigned long flags, bool *exit)
{
+ *exit = false;
+
if (flags) {
if ((flags == KVM_LOCK_CR_UPDATE_VERSION) && !cr && !pin)
return HEKI_ABI_VERSION;
@@ -8307,6 +8380,8 @@ static int heki_lock_cr(struct kvm_vcpu *const vcpu, const unsigned long cr,
return -KVM_EINVAL;

atomic_long_or(pin, &vcpu->kvm->heki_pinned_cr0);
+ *exit = heki_exit_cr(vcpu, KVM_EXIT_HEKI_CONFIGURE,
+ KVM_HEKI_EXIT_REASON_CR0, pin);
return 0;
case 4:
/* Checks for irrelevant bits. */
@@ -8316,24 +8391,37 @@ static int heki_lock_cr(struct kvm_vcpu *const vcpu, const unsigned long cr,
/* Ignores bits not present in host. */
pin &= __read_cr4();
atomic_long_or(pin, &vcpu->kvm->heki_pinned_cr4);
+ *exit = heki_exit_cr(vcpu, KVM_EXIT_HEKI_CONFIGURE,
+ KVM_HEKI_EXIT_REASON_CR4, pin);
return 0;
}
return -KVM_EINVAL;
}

+/*
+ * Sets @exit to true if the caller must exit (i.e. denied access) with the
+ * returned value:
+ * - 0 when kvm_run is configured;
+ * - 1 when there is no user space handler.
+ */
int heki_check_cr(struct kvm_vcpu *const vcpu, const unsigned long cr,
- const unsigned long val)
+ const unsigned long val, bool *exit)
{
unsigned long pinned;

+ *exit = false;
+
switch (cr) {
case 0:
pinned = atomic_long_read(&vcpu->kvm->heki_pinned_cr0);
if ((val & pinned) != pinned) {
pr_warn_ratelimited(
"heki: Blocked CR0 update: 0x%lx\n", val);
- kvm_inject_gp(vcpu, 0);
- return 1;
+ *exit = true;
+ if (heki_exit_cr(vcpu, KVM_EXIT_HEKI_DENIAL,
+ KVM_HEKI_EXIT_REASON_CR0, val))
+ return 0;
+ return complete_heki_denial_exit(vcpu);
}
return 0;
case 4:
@@ -8341,8 +8429,11 @@ int heki_check_cr(struct kvm_vcpu *const vcpu, const unsigned long cr,
if ((val & pinned) != pinned) {
pr_warn_ratelimited(
"heki: Blocked CR4 update: 0x%lx\n", val);
- kvm_inject_gp(vcpu, 0);
- return 1;
+ *exit = true;
+ if (heki_exit_cr(vcpu, KVM_EXIT_HEKI_DENIAL,
+ KVM_HEKI_EXIT_REASON_CR4, val))
+ return 0;
+ return complete_heki_denial_exit(vcpu);
}
return 0;
}
@@ -8356,9 +8447,10 @@ static int emulator_set_cr(struct x86_emulate_ctxt *ctxt, int cr, ulong val)
{
struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
int res = 0;
+ bool exit = false;

- res = heki_check_cr(vcpu, cr, val);
- if (res)
+ res = heki_check_cr(vcpu, cr, val, &exit);
+ if (exit)
return res;

switch (cr) {
@@ -10222,7 +10314,11 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
if (a0 > U32_MAX) {
ret = -KVM_EINVAL;
} else {
- ret = heki_lock_cr(vcpu, a0, a1, a2);
+ bool exit = false;
+
+ ret = heki_lock_cr(vcpu, a0, a1, a2, &exit);
+ if (exit)
+ return ret;
}
break;
#endif /* CONFIG_HEKI */
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index ade7d68ddaff..2740b74ab583 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -292,18 +292,19 @@ static inline bool kvm_check_has_quirk(struct kvm *kvm, u64 quirk)

#ifdef CONFIG_HEKI

-int heki_check_cr(struct kvm_vcpu *vcpu, unsigned long cr, unsigned long val);
+int heki_check_cr(struct kvm_vcpu *vcpu, unsigned long cr, unsigned long val,
+ bool *exit);

#else /* CONFIG_HEKI */

static inline int heki_check_cr(struct kvm_vcpu *vcpu, unsigned long cr,
- unsigned long val)
+ unsigned long val, bool *exit)
{
return 0;
}

static inline int heki_lock_cr(struct kvm_vcpu *const vcpu, unsigned long cr,
- unsigned long pin)
+ unsigned long pin, bool *exit)
{
return 0;
}
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 6ff13937929a..cf8e271d47aa 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -839,6 +839,8 @@ struct kvm {
#ifdef CONFIG_HEKI
atomic_long_t heki_pinned_cr0;
atomic_long_t heki_pinned_cr4;
+ u64 heki_configure_exit_reason;
+ u64 heki_denial_exit_reason;
#endif /* CONFIG_HEKI */

#ifdef CONFIG_HAVE_KVM_PM_NOTIFIER
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 2190adbe3002..1051c2f817ba 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -178,6 +178,8 @@ struct kvm_xen_exit {
#define KVM_EXIT_NOTIFY 37
#define KVM_EXIT_LOONGARCH_IOCSR 38
#define KVM_EXIT_MEMORY_FAULT 39
+#define KVM_EXIT_HEKI_CONFIGURE 40
+#define KVM_EXIT_HEKI_DENIAL 41

/* For KVM_EXIT_INTERNAL_ERROR */
/* Emulate instruction failed. */
@@ -433,6 +435,24 @@ struct kvm_run {
__u64 gpa;
__u64 size;
} memory_fault;
+ /* KVM_EXIT_HEKI_CONFIGURE */
+ struct {
+#define KVM_HEKI_EXIT_REASON_CR0 (1ULL << 0)
+#define KVM_HEKI_EXIT_REASON_CR4 (1ULL << 1)
+ __u64 reason;
+ union {
+ __u64 cr_pinned;
+ __u64 reserved[7]; /* ignored */
+ };
+ } heki_configure;
+ /* KVM_EXIT_HEKI_DENIAL */
+ struct {
+ __u64 reason;
+ union {
+ __u64 cr_value;
+ __u64 reserved[7]; /* ignored */
+ };
+ } heki_denial;
/* Fix the size of the union. */
char padding[256];
};
@@ -917,6 +937,8 @@ struct kvm_enable_cap {
#define KVM_CAP_MEMORY_ATTRIBUTES 233
#define KVM_CAP_GUEST_MEMFD 234
#define KVM_CAP_VM_TYPES 235
+#define KVM_CAP_HEKI_CONFIGURE 236
+#define KVM_CAP_HEKI_DENIAL 237

struct kvm_irq_routing_irqchip {
__u32 irqchip;
--
2.45.0


2024-05-06 17:50:34

by Mickaël Salaün

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/5] KVM: x86: Add notifications for Heki policy configuration and violation

On Fri, May 03, 2024 at 07:03:21AM GMT, Sean Christopherson wrote:
> On Fri, May 03, 2024, Mickaël Salaün wrote:
> > Add an interface for user space to be notified about guests' Heki policy
> > and related violations.
> >
> > Extend the KVM_ENABLE_CAP IOCTL with KVM_CAP_HEKI_CONFIGURE and
> > KVM_CAP_HEKI_DENIAL. Each one takes a bitmask as first argument that can
> > contains KVM_HEKI_EXIT_REASON_CR0 and KVM_HEKI_EXIT_REASON_CR4. The
> > returned value is the bitmask of known Heki exit reasons, for now:
> > KVM_HEKI_EXIT_REASON_CR0 and KVM_HEKI_EXIT_REASON_CR4.
> >
> > If KVM_CAP_HEKI_CONFIGURE is set, a VM exit will be triggered for each
> > KVM_HC_LOCK_CR_UPDATE hypercalls according to the requested control
> > register. This enables to enlighten the VMM with the guest
> > auto-restrictions.
> >
> > If KVM_CAP_HEKI_DENIAL is set, a VM exit will be triggered for each
> > pinned CR violation. This enables the VMM to react to a policy
> > violation.
> >
> > Cc: Borislav Petkov <[email protected]>
> > Cc: Dave Hansen <[email protected]>
> > Cc: H. Peter Anvin <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > Cc: Kees Cook <[email protected]>
> > Cc: Madhavan T. Venkataraman <[email protected]>
> > Cc: Paolo Bonzini <[email protected]>
> > Cc: Sean Christopherson <[email protected]>
> > Cc: Thomas Gleixner <[email protected]>
> > Cc: Vitaly Kuznetsov <[email protected]>
> > Cc: Wanpeng Li <[email protected]>
> > Signed-off-by: Mickaël Salaün <[email protected]>
> > Link: https://lore.kernel.org/r/[email protected]
> > ---
> >
> > Changes since v1:
> > * New patch. Making user space aware of Heki properties was requested by
> > Sean Christopherson.
>
> No, I suggested having userspace _control_ the pinning[*], not merely be notified
> of pinning.
>
> : IMO, manipulation of protections, both for memory (this patch) and CPU state
> : (control registers in the next patch) should come from userspace. I have no
> : objection to KVM providing plumbing if necessary, but I think userspace needs to
> : to have full control over the actual state.
> :
> : One of the things that caused Intel's control register pinning series to stall
> : out was how to handle edge cases like kexec() and reboot. Deferring to userspace
> : means the kernel doesn't need to define policy, e.g. when to unprotect memory,
> : and avoids questions like "should userspace be able to overwrite pinned control
> : registers".
> :
> : And like the confidential VM use case, keeping userspace in the loop is a big
> : beneifit, e.g. the guest can't circumvent protections by coercing userspace into
> : writing to protected memory.
>
> I stand by that suggestion, because I don't see a sane way to handle things like
> kexec() and reboot without having a _much_ more sophisticated policy than would
> ever be acceptable in KVM.
>
> I think that can be done without KVM having any awareness of CR pinning whatsoever.
> E.g. userspace just needs to ability to intercept CR writes and inject #GPs. Off
> the cuff, I suspect the uAPI could look very similar to MSR filtering. E.g. I bet
> userspace could enforce MSR pinning without any new KVM uAPI at all.
>
> [*] https://lore.kernel.org/all/[email protected]

OK, I had concern about the control not directly coming from the guest,
especially in the case of pKVM and confidential computing, but I get you
point. It should indeed be quite similar to the MSR filtering on the
userspace side, except that we need another interface for the guest to
request such change (i.e. self-protection).

Would it be OK to keep this new KVM_HC_LOCK_CR_UPDATE hypercall but
forward the request to userspace with a VM exit instead? That would
also enable userspace to get the request and directly configure the CR
pinning with the same VM exit.

2024-05-07 01:35:11

by Sean Christopherson

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/5] KVM: x86: Add notifications for Heki policy configuration and violation

On Mon, May 06, 2024, Mickaël Salaün wrote:
> On Fri, May 03, 2024 at 07:03:21AM GMT, Sean Christopherson wrote:
> > > ---
> > >
> > > Changes since v1:
> > > * New patch. Making user space aware of Heki properties was requested by
> > > Sean Christopherson.
> >
> > No, I suggested having userspace _control_ the pinning[*], not merely be notified
> > of pinning.
> >
> > : IMO, manipulation of protections, both for memory (this patch) and CPU state
> > : (control registers in the next patch) should come from userspace. I have no
> > : objection to KVM providing plumbing if necessary, but I think userspace needs to
> > : to have full control over the actual state.
> > :
> > : One of the things that caused Intel's control register pinning series to stall
> > : out was how to handle edge cases like kexec() and reboot. Deferring to userspace
> > : means the kernel doesn't need to define policy, e.g. when to unprotect memory,
> > : and avoids questions like "should userspace be able to overwrite pinned control
> > : registers".
> > :
> > : And like the confidential VM use case, keeping userspace in the loop is a big
> > : beneifit, e.g. the guest can't circumvent protections by coercing userspace into
> > : writing to protected memory.
> >
> > I stand by that suggestion, because I don't see a sane way to handle things like
> > kexec() and reboot without having a _much_ more sophisticated policy than would
> > ever be acceptable in KVM.
> >
> > I think that can be done without KVM having any awareness of CR pinning whatsoever.
> > E.g. userspace just needs to ability to intercept CR writes and inject #GPs. Off
> > the cuff, I suspect the uAPI could look very similar to MSR filtering. E.g. I bet
> > userspace could enforce MSR pinning without any new KVM uAPI at all.
> >
> > [*] https://lore.kernel.org/all/[email protected]
>
> OK, I had concern about the control not directly coming from the guest,
> especially in the case of pKVM and confidential computing, but I get you

Hardware-based CoCo is completely out of scope, because KVM has zero visibility
into the guest (well, SNP technically allows trapping CR0/CR4, but KVM really
shouldn't intercept CR0/CR4 for SNP guests).

And more importantly, _KVM_ doesn't define any policies for CoCo VMs. KVM might
help enforce policies that are defined by hardware/firmware, but KVM doesn't
define any of its own.

If pKVM on x86 comes along, then KVM will likely get in the business of defining
policy, but until that happens, KVM needs to stay firmly out of the picture.

> point. It should indeed be quite similar to the MSR filtering on the
> userspace side, except that we need another interface for the guest to
> request such change (i.e. self-protection).
>
> Would it be OK to keep this new KVM_HC_LOCK_CR_UPDATE hypercall but
> forward the request to userspace with a VM exit instead? That would
> also enable userspace to get the request and directly configure the CR
> pinning with the same VM exit.

No? Maybe? I strongly suspect that full support will need a richer set of APIs
than a single hypercall. E.g. to handle kexec(), suspend+resume, emulated SMM,
and so on and so forth. And that's just for CR pinning.

And hypercalls are hampered by the fact that VMCALL/VMMCALL don't allow for
delegation or restriction, i.e. there's no way for the guest to communicate to
the hypervisor that a less privileged component is allowed to perform some action,
nor is there a way for the guest to say some chunk of CPL0 code *isn't* allowed
to request transition. Delegation and restriction all has to be done out-of-band.

It'd probably be more annoying to setup initially, but I think a synthetic device
with an MMIO-based interface would be more powerful and flexible in the long run.
Then userspace can evolve without needing to wait for KVM to catch up.

Actually, potential bad/crazy idea. Why does the _host_ need to define policy?
Linux already knows what assets it wants to (un)protect and when. What's missing
is a way for the guest kernel to effectively deprivilege and re-authenticate
itself as needed. We've been tossing around the idea of paired VMs+vCPUs to
support VTLs and SEV's VMPLs, what if we usurped/piggybacked those ideas, with a
bit of pKVM mixed in?

Borrowing VTL terminology, where VTL0 is the least privileged, userspace launches
the VM at VTL0. At some point, the guest triggers the deprivileging sequence and
userspace creates VTL1. Userpace also provides a way for VTL0 restrict access to
its memory, e.g. to effectively make the page tables for the kernel's direct map
writable only from VTL1, to make kernel text RO (or XO), etc. And VTL0 could then
also completely remove its access to code that changes CR0/CR4.

It would obviously require a _lot_ more upfront work, e.g. to isolate the kernel
text that modifies CR0/CR4 so that it can be removed from VTL0, but that should
be doable with annotations, e.g. tag relevant functions with __magic or whatever,
throw them in a dedicated section, and then free/protect the section(s) at the
appropriate time.

KVM would likely need to provide the ability to switch VTLs (or whatever they get
called), and host userspace would need to provide a decent amount of the backend
mechanisms and "core" policies, e.g. to manage VTL0 memory, teardown (turn off?)
VTL1 on kexec(), etc. But everything else could live in the guest kernel itself.
E.g. to have CR pinning play nice with kexec(), toss the relevant kexec() code into
VTL1. That way VTL1 can verify the kexec() target and tear itself down before
jumping into the new kernel.

This is very off the cuff and have-wavy, e.g. I don't have much of an idea what
it would take to harden kernel text patching, but keeping the policy in the guest
seems like it'd make everything more tractable than trying to define an ABI
between Linux and a VMM that is rich and flexible enough to support all the
fancy things Linux does (and will do in the future).

Am I crazy? Or maybe reinventing whatever that McAfee thing was that led to
Intel implementing EPTP switching?

2024-05-07 09:30:52

by Mickaël Salaün

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/5] KVM: x86: Add notifications for Heki policy configuration and violation

On Mon, May 06, 2024 at 06:34:53PM GMT, Sean Christopherson wrote:
> On Mon, May 06, 2024, Mickaël Salaün wrote:
> > On Fri, May 03, 2024 at 07:03:21AM GMT, Sean Christopherson wrote:
> > > > ---
> > > >
> > > > Changes since v1:
> > > > * New patch. Making user space aware of Heki properties was requested by
> > > > Sean Christopherson.
> > >
> > > No, I suggested having userspace _control_ the pinning[*], not merely be notified
> > > of pinning.
> > >
> > > : IMO, manipulation of protections, both for memory (this patch) and CPU state
> > > : (control registers in the next patch) should come from userspace. I have no
> > > : objection to KVM providing plumbing if necessary, but I think userspace needs to
> > > : to have full control over the actual state.
> > > :
> > > : One of the things that caused Intel's control register pinning series to stall
> > > : out was how to handle edge cases like kexec() and reboot. Deferring to userspace
> > > : means the kernel doesn't need to define policy, e.g. when to unprotect memory,
> > > : and avoids questions like "should userspace be able to overwrite pinned control
> > > : registers".
> > > :
> > > : And like the confidential VM use case, keeping userspace in the loop is a big
> > > : beneifit, e.g. the guest can't circumvent protections by coercing userspace into
> > > : writing to protected memory.
> > >
> > > I stand by that suggestion, because I don't see a sane way to handle things like
> > > kexec() and reboot without having a _much_ more sophisticated policy than would
> > > ever be acceptable in KVM.
> > >
> > > I think that can be done without KVM having any awareness of CR pinning whatsoever.
> > > E.g. userspace just needs to ability to intercept CR writes and inject #GPs. Off
> > > the cuff, I suspect the uAPI could look very similar to MSR filtering. E.g. I bet
> > > userspace could enforce MSR pinning without any new KVM uAPI at all.
> > >
> > > [*] https://lore.kernel.org/all/[email protected]
> >
> > OK, I had concern about the control not directly coming from the guest,
> > especially in the case of pKVM and confidential computing, but I get you
>
> Hardware-based CoCo is completely out of scope, because KVM has zero visibility
> into the guest (well, SNP technically allows trapping CR0/CR4, but KVM really
> shouldn't intercept CR0/CR4 for SNP guests).
>
> And more importantly, _KVM_ doesn't define any policies for CoCo VMs. KVM might
> help enforce policies that are defined by hardware/firmware, but KVM doesn't
> define any of its own.
>
> If pKVM on x86 comes along, then KVM will likely get in the business of defining
> policy, but until that happens, KVM needs to stay firmly out of the picture.
>
> > point. It should indeed be quite similar to the MSR filtering on the
> > userspace side, except that we need another interface for the guest to
> > request such change (i.e. self-protection).
> >
> > Would it be OK to keep this new KVM_HC_LOCK_CR_UPDATE hypercall but
> > forward the request to userspace with a VM exit instead? That would
> > also enable userspace to get the request and directly configure the CR
> > pinning with the same VM exit.
>
> No? Maybe? I strongly suspect that full support will need a richer set of APIs
> than a single hypercall. E.g. to handle kexec(), suspend+resume, emulated SMM,
> and so on and so forth. And that's just for CR pinning.
>
> And hypercalls are hampered by the fact that VMCALL/VMMCALL don't allow for
> delegation or restriction, i.e. there's no way for the guest to communicate to
> the hypervisor that a less privileged component is allowed to perform some action,
> nor is there a way for the guest to say some chunk of CPL0 code *isn't* allowed
> to request transition. Delegation and restriction all has to be done out-of-band.
>
> It'd probably be more annoying to setup initially, but I think a synthetic device
> with an MMIO-based interface would be more powerful and flexible in the long run.
> Then userspace can evolve without needing to wait for KVM to catch up.
>
> Actually, potential bad/crazy idea. Why does the _host_ need to define policy?
> Linux already knows what assets it wants to (un)protect and when. What's missing
> is a way for the guest kernel to effectively deprivilege and re-authenticate
> itself as needed. We've been tossing around the idea of paired VMs+vCPUs to
> support VTLs and SEV's VMPLs, what if we usurped/piggybacked those ideas, with a
> bit of pKVM mixed in?
>
> Borrowing VTL terminology, where VTL0 is the least privileged, userspace launches
> the VM at VTL0. At some point, the guest triggers the deprivileging sequence and
> userspace creates VTL1. Userpace also provides a way for VTL0 restrict access to
> its memory, e.g. to effectively make the page tables for the kernel's direct map
> writable only from VTL1, to make kernel text RO (or XO), etc. And VTL0 could then
> also completely remove its access to code that changes CR0/CR4.
>
> It would obviously require a _lot_ more upfront work, e.g. to isolate the kernel
> text that modifies CR0/CR4 so that it can be removed from VTL0, but that should
> be doable with annotations, e.g. tag relevant functions with __magic or whatever,
> throw them in a dedicated section, and then free/protect the section(s) at the
> appropriate time.
>
> KVM would likely need to provide the ability to switch VTLs (or whatever they get
> called), and host userspace would need to provide a decent amount of the backend
> mechanisms and "core" policies, e.g. to manage VTL0 memory, teardown (turn off?)
> VTL1 on kexec(), etc. But everything else could live in the guest kernel itself.
> E.g. to have CR pinning play nice with kexec(), toss the relevant kexec() code into
> VTL1. That way VTL1 can verify the kexec() target and tear itself down before
> jumping into the new kernel.
>
> This is very off the cuff and have-wavy, e.g. I don't have much of an idea what
> it would take to harden kernel text patching, but keeping the policy in the guest
> seems like it'd make everything more tractable than trying to define an ABI
> between Linux and a VMM that is rich and flexible enough to support all the
> fancy things Linux does (and will do in the future).

Yes, we agree that the guest needs to manage its own policy. That's why
we implemented Heki for KVM this way, but without VTLs because KVM
doesn't support them.

To sum up, is the VTL approach the only one that would be acceptable for
KVM? If yes, that would indeed require a *lot* of work for something
we're not sure will be accepted later on.

>
> Am I crazy? Or maybe reinventing whatever that McAfee thing was that led to
> Intel implementing EPTP switching?
>

2024-05-07 16:16:28

by Sean Christopherson

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/5] KVM: x86: Add notifications for Heki policy configuration and violation

On Tue, May 07, 2024, Mickaël Salaün wrote:
> > Actually, potential bad/crazy idea. Why does the _host_ need to define policy?
> > Linux already knows what assets it wants to (un)protect and when. What's missing
> > is a way for the guest kernel to effectively deprivilege and re-authenticate
> > itself as needed. We've been tossing around the idea of paired VMs+vCPUs to
> > support VTLs and SEV's VMPLs, what if we usurped/piggybacked those ideas, with a
> > bit of pKVM mixed in?
> >
> > Borrowing VTL terminology, where VTL0 is the least privileged, userspace launches
> > the VM at VTL0. At some point, the guest triggers the deprivileging sequence and
> > userspace creates VTL1. Userpace also provides a way for VTL0 restrict access to
> > its memory, e.g. to effectively make the page tables for the kernel's direct map
> > writable only from VTL1, to make kernel text RO (or XO), etc. And VTL0 could then
> > also completely remove its access to code that changes CR0/CR4.
> >
> > It would obviously require a _lot_ more upfront work, e.g. to isolate the kernel
> > text that modifies CR0/CR4 so that it can be removed from VTL0, but that should
> > be doable with annotations, e.g. tag relevant functions with __magic or whatever,
> > throw them in a dedicated section, and then free/protect the section(s) at the
> > appropriate time.
> >
> > KVM would likely need to provide the ability to switch VTLs (or whatever they get
> > called), and host userspace would need to provide a decent amount of the backend
> > mechanisms and "core" policies, e.g. to manage VTL0 memory, teardown (turn off?)
> > VTL1 on kexec(), etc. But everything else could live in the guest kernel itself.
> > E.g. to have CR pinning play nice with kexec(), toss the relevant kexec() code into
> > VTL1. That way VTL1 can verify the kexec() target and tear itself down before
> > jumping into the new kernel.
> >
> > This is very off the cuff and have-wavy, e.g. I don't have much of an idea what
> > it would take to harden kernel text patching, but keeping the policy in the guest
> > seems like it'd make everything more tractable than trying to define an ABI
> > between Linux and a VMM that is rich and flexible enough to support all the
> > fancy things Linux does (and will do in the future).
>
> Yes, we agree that the guest needs to manage its own policy. That's why
> we implemented Heki for KVM this way, but without VTLs because KVM
> doesn't support them.
>
> To sum up, is the VTL approach the only one that would be acceptable for
> KVM?

Heh, that's not a question you want to be asking. You're effectively asking me
to make an authorative, "final" decision on a topic which I am only passingly
familiar with.

But since you asked it... :-) Probably?

I see a lot of advantages to a VTL/VSM-like approach:

1. Provides Linux-as-a guest the flexibility it needs to meaningfully advance
its security, with the least amount of policy built into the guest/host ABI.

2. Largely decouples guest policy from the host, i.e. should allow the guest to
evolve/update it's policy without needing to coordinate changes with the host.

3. The KVM implementation can be generic enough to be reusable for other features.

4. Other groups are already working on VTL-like support in KVM, e.g. for VSM
itself, and potentially for VMPL/SVSM support.

IMO, #2 is a *huge* selling point. Not having to coordinate changes across
multiple code bases and/or organizations and/or maintainers is a big win for
velocity, long term maintenance, and probably the very viability of HEKI.

Providing the guest with the tools to define and implement its own policy means
end users don't have to way for some third party, e.g. CSPs, to deploy the
accompanying host-side changes, because there are no host-side changes.

And encapsulating everything in the guest drastically reduces the friction with
changes in the kernel that interact with hardening, both from a technical and a
social perspective. I.e. giving the kernel (near) complete control over its
destiny minimizes the number of moving parts, and will be far, far easier to sell
to maintainers. I would expect maintainers to react much more favorably to being
handed tools to harden the kernel, as opposed to being presented a set of APIs
that can be used to make the kernel compliant with _someone else's_ vision of
what kernel hardening should look like.

E.g. imagine a new feature comes along that requires overriding CR0/CR4 pinning
in a way that doesn't fit into existing policy. If the VMM is involved in
defining/enforcing the CR pinning policy, then supporting said new feature would
require new guest/host ABI and an updated host VMM in order to make the new
feature compatible with HEKI. Inevitably, even if everything goes smoothly from
an upstreaming perspective, that will result in guests that have to choose between
HEKI and new feature X, because there is zero chance that all hosts that run Linux
as a guest will be updated in advance of new feature X being deployed.

And if/when things don't go smoothly, odds are very good that kernel maintainers
will eventually tire of having to coordinate and negotiate with QEMU and other
VMMs, and will become resistant to continuing to support/extend HEKI.

> If yes, that would indeed require a *lot* of work for something we're not
> sure will be accepted later on.

Yes and no. The AWS folks are pursuing VSM support in KVM+QEMU, and SVSM support
is trending toward the paired VM+vCPU model. IMO, it's entirely feasible to
design KVM support such that much of the development load can be shared between
the projects. And having 2+ use cases for a feature (set) makes it _much_ more
likely that the feature(s) will be accepted.

And similar to what Paolo said regarding HEKI not having a complete story, I
don't see a clear line of sight for landing host-defined policy enforcement, as
there are many open, non-trivial questions that need answers. I.e. upstreaming
HEKI in its current form is also far from a done deal, and isn't guaranteed to
be substantially less work when all is said and done.

2024-05-10 10:07:34

by Nicolas Saenz Julienne

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/5] KVM: x86: Add notifications for Heki policy configuration and violation

On Tue May 7, 2024 at 4:16 PM UTC, Sean Christopherson wrote:
> > If yes, that would indeed require a *lot* of work for something we're not
> > sure will be accepted later on.
>
> Yes and no. The AWS folks are pursuing VSM support in KVM+QEMU, and SVSM support
> is trending toward the paired VM+vCPU model. IMO, it's entirely feasible to
> design KVM support such that much of the development load can be shared between
> the projects. And having 2+ use cases for a feature (set) makes it _much_ more
> likely that the feature(s) will be accepted.

Since Sean mentioned our VSM efforts, a small update. We were able to
validate the concept of one KVM VM per VTL as discussed in LPC. Right
now only for single CPU guests, but are in the late stages of bringing
up MP support. The resulting KVM code is small, and most will be
uncontroversial (I hope). If other obligations allow it, we plan on
having something suitable for review in the coming months.

Our implementation aims to implement all the VSM spec necessary to run
with Microsoft Credential Guard. But note that some aspects necessary
for HVCI are not covered, especially the ones that depend on MBEC
support, or some categories of secure intercepts.

Development happens
https://github.com/vianpl/{linux,qemu,kvm-unit-tests} and the vsm-next
branch, but I'd advice against looking into it until we add some order
to the rework. Regardless, feel free to get in touch.

Nicolas

2024-05-14 12:16:13

by Mickaël Salaün

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/5] KVM: x86: Add notifications for Heki policy configuration and violation

On Tue, May 07, 2024 at 09:16:06AM -0700, Sean Christopherson wrote:
> On Tue, May 07, 2024, Mickaël Salaün wrote:
> > > Actually, potential bad/crazy idea. Why does the _host_ need to define policy?
> > > Linux already knows what assets it wants to (un)protect and when. What's missing
> > > is a way for the guest kernel to effectively deprivilege and re-authenticate
> > > itself as needed. We've been tossing around the idea of paired VMs+vCPUs to
> > > support VTLs and SEV's VMPLs, what if we usurped/piggybacked those ideas, with a
> > > bit of pKVM mixed in?
> > >
> > > Borrowing VTL terminology, where VTL0 is the least privileged, userspace launches
> > > the VM at VTL0. At some point, the guest triggers the deprivileging sequence and
> > > userspace creates VTL1. Userpace also provides a way for VTL0 restrict access to
> > > its memory, e.g. to effectively make the page tables for the kernel's direct map
> > > writable only from VTL1, to make kernel text RO (or XO), etc. And VTL0 could then
> > > also completely remove its access to code that changes CR0/CR4.
> > >
> > > It would obviously require a _lot_ more upfront work, e.g. to isolate the kernel
> > > text that modifies CR0/CR4 so that it can be removed from VTL0, but that should
> > > be doable with annotations, e.g. tag relevant functions with __magic or whatever,
> > > throw them in a dedicated section, and then free/protect the section(s) at the
> > > appropriate time.
> > >
> > > KVM would likely need to provide the ability to switch VTLs (or whatever they get
> > > called), and host userspace would need to provide a decent amount of the backend
> > > mechanisms and "core" policies, e.g. to manage VTL0 memory, teardown (turn off?)
> > > VTL1 on kexec(), etc. But everything else could live in the guest kernel itself.
> > > E.g. to have CR pinning play nice with kexec(), toss the relevant kexec() code into
> > > VTL1. That way VTL1 can verify the kexec() target and tear itself down before
> > > jumping into the new kernel.
> > >
> > > This is very off the cuff and have-wavy, e.g. I don't have much of an idea what
> > > it would take to harden kernel text patching, but keeping the policy in the guest
> > > seems like it'd make everything more tractable than trying to define an ABI
> > > between Linux and a VMM that is rich and flexible enough to support all the
> > > fancy things Linux does (and will do in the future).
> >
> > Yes, we agree that the guest needs to manage its own policy. That's why
> > we implemented Heki for KVM this way, but without VTLs because KVM
> > doesn't support them.
> >
> > To sum up, is the VTL approach the only one that would be acceptable for
> > KVM?
>
> Heh, that's not a question you want to be asking. You're effectively asking me
> to make an authorative, "final" decision on a topic which I am only passingly
> familiar with.
>
> But since you asked it... :-) Probably?
>
> I see a lot of advantages to a VTL/VSM-like approach:
>
> 1. Provides Linux-as-a guest the flexibility it needs to meaningfully advance
> its security, with the least amount of policy built into the guest/host ABI.
>
> 2. Largely decouples guest policy from the host, i.e. should allow the guest to
> evolve/update it's policy without needing to coordinate changes with the host.
>
> 3. The KVM implementation can be generic enough to be reusable for other features.
>
> 4. Other groups are already working on VTL-like support in KVM, e.g. for VSM
> itself, and potentially for VMPL/SVSM support.
>
> IMO, #2 is a *huge* selling point. Not having to coordinate changes across
> multiple code bases and/or organizations and/or maintainers is a big win for
> velocity, long term maintenance, and probably the very viability of HEKI.

Agree, this is our goal.

>
> Providing the guest with the tools to define and implement its own policy means
> end users don't have to way for some third party, e.g. CSPs, to deploy the
> accompanying host-side changes, because there are no host-side changes.
>
> And encapsulating everything in the guest drastically reduces the friction with
> changes in the kernel that interact with hardening, both from a technical and a
> social perspective. I.e. giving the kernel (near) complete control over its
> destiny minimizes the number of moving parts, and will be far, far easier to sell
> to maintainers. I would expect maintainers to react much more favorably to being
> handed tools to harden the kernel, as opposed to being presented a set of APIs
> that can be used to make the kernel compliant with _someone else's_ vision of
> what kernel hardening should look like.
>
> E.g. imagine a new feature comes along that requires overriding CR0/CR4 pinning
> in a way that doesn't fit into existing policy. If the VMM is involved in
> defining/enforcing the CR pinning policy, then supporting said new feature would
> require new guest/host ABI and an updated host VMM in order to make the new
> feature compatible with HEKI. Inevitably, even if everything goes smoothly from
> an upstreaming perspective, that will result in guests that have to choose between
> HEKI and new feature X, because there is zero chance that all hosts that run Linux
> as a guest will be updated in advance of new feature X being deployed.

Sure. We need to find a generic-enough KVM interface to be able to
restrict a wide range of virtualization/hardware mechanisms (to not rely
too much on KVM changes) and delegate most of enforcement/emulation to
VTL1. In short, policy definition owned by VTL0/guest, and policy
enforcement shared between KVM (coarse grained) and VTL1 (fine grained).

>
> And if/when things don't go smoothly, odds are very good that kernel maintainers
> will eventually tire of having to coordinate and negotiate with QEMU and other
> VMMs, and will become resistant to continuing to support/extend HEKI.

Yes, that was our concern too and another reason why we choose to let
the guest handle its own security policy.

>
> > If yes, that would indeed require a *lot* of work for something we're not
> > sure will be accepted later on.
>
> Yes and no. The AWS folks are pursuing VSM support in KVM+QEMU, and SVSM support
> is trending toward the paired VM+vCPU model. IMO, it's entirely feasible to
> design KVM support such that much of the development load can be shared between
> the projects. And having 2+ use cases for a feature (set) makes it _much_ more
> likely that the feature(s) will be accepted.
>
> And similar to what Paolo said regarding HEKI not having a complete story, I
> don't see a clear line of sight for landing host-defined policy enforcement, as
> there are many open, non-trivial questions that need answers. I.e. upstreaming
> HEKI in its current form is also far from a done deal, and isn't guaranteed to
> be substantially less work when all is said and done.

I'm not sure to understand why "Heki not having a complete story". The
goal is the same as the current kernel self-protection mechanisms.

2024-05-14 12:23:57

by Mickaël Salaün

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/5] KVM: x86: Add notifications for Heki policy configuration and violation

On Fri, May 10, 2024 at 10:07:00AM +0000, Nicolas Saenz Julienne wrote:
> On Tue May 7, 2024 at 4:16 PM UTC, Sean Christopherson wrote:
> > > If yes, that would indeed require a *lot* of work for something we're not
> > > sure will be accepted later on.
> >
> > Yes and no. The AWS folks are pursuing VSM support in KVM+QEMU, and SVSM support
> > is trending toward the paired VM+vCPU model. IMO, it's entirely feasible to
> > design KVM support such that much of the development load can be shared between
> > the projects. And having 2+ use cases for a feature (set) makes it _much_ more
> > likely that the feature(s) will be accepted.
>
> Since Sean mentioned our VSM efforts, a small update. We were able to
> validate the concept of one KVM VM per VTL as discussed in LPC. Right
> now only for single CPU guests, but are in the late stages of bringing
> up MP support. The resulting KVM code is small, and most will be
> uncontroversial (I hope). If other obligations allow it, we plan on
> having something suitable for review in the coming months.

Looks good!

>
> Our implementation aims to implement all the VSM spec necessary to run
> with Microsoft Credential Guard. But note that some aspects necessary
> for HVCI are not covered, especially the ones that depend on MBEC
> support, or some categories of secure intercepts.

We already implemented support for MBEC, so that should not be an issue.
We just need to find the best interface to configure it.

>
> Development happens
> https://github.com/vianpl/{linux,qemu,kvm-unit-tests} and the vsm-next
> branch, but I'd advice against looking into it until we add some order
> to the rework. Regardless, feel free to get in touch.

Thanks for the update.

Could we schedule a PUCK meeting to synchronize and help each other?
What about June 12?

2024-05-15 20:32:43

by Sean Christopherson

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/5] KVM: x86: Add notifications for Heki policy configuration and violation

On Tue, May 14, 2024, Mickaël Salaün wrote:
> On Fri, May 10, 2024 at 10:07:00AM +0000, Nicolas Saenz Julienne wrote:
> > Development happens
> > https://github.com/vianpl/{linux,qemu,kvm-unit-tests} and the vsm-next
> > branch, but I'd advice against looking into it until we add some order
> > to the rework. Regardless, feel free to get in touch.
>
> Thanks for the update.
>
> Could we schedule a PUCK meeting to synchronize and help each other?
> What about June 12?

June 12th works on my end.

2024-05-16 14:03:16

by Nicolas Saenz Julienne

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/5] KVM: x86: Add notifications for Heki policy configuration and violation

On Tue May 14, 2024 at 12:23 PM UTC, Mickaël Salaün wrote:
> > Development happens
> > https://github.com/vianpl/{linux,qemu,kvm-unit-tests} and the vsm-next
> > branch, but I'd advice against looking into it until we add some order
> > to the rework. Regardless, feel free to get in touch.
>
> Thanks for the update.
>
> Could we schedule a PUCK meeting to synchronize and help each other?
> What about June 12?

Sounds great! June 12th works for me.

Nicolas

2024-06-03 18:39:53

by Mickaël Salaün

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/5] KVM: x86: Add notifications for Heki policy configuration and violation

On Wed, May 15, 2024 at 01:32:24PM -0700, Sean Christopherson wrote:
> On Tue, May 14, 2024, Mickaël Salaün wrote:
> > On Fri, May 10, 2024 at 10:07:00AM +0000, Nicolas Saenz Julienne wrote:
> > > Development happens
> > > https://github.com/vianpl/{linux,qemu,kvm-unit-tests} and the vsm-next
> > > branch, but I'd advice against looking into it until we add some order
> > > to the rework. Regardless, feel free to get in touch.
> >
> > Thanks for the update.
> >
> > Could we schedule a PUCK meeting to synchronize and help each other?
> > What about June 12?
>
> June 12th works on my end.

Can you please send an invite?

Mickaël

2024-06-04 00:30:06

by Sean Christopherson

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/5] KVM: x86: Add notifications for Heki policy configuration and violation

On Tue, May 14, 2024, Mickaël Salaün wrote:
> On Tue, May 07, 2024 at 09:16:06AM -0700, Sean Christopherson wrote:
> > On Tue, May 07, 2024, Mickaël Salaün wrote:
> > > If yes, that would indeed require a *lot* of work for something we're not
> > > sure will be accepted later on.
> >
> > Yes and no. The AWS folks are pursuing VSM support in KVM+QEMU, and SVSM support
> > is trending toward the paired VM+vCPU model. IMO, it's entirely feasible to
> > design KVM support such that much of the development load can be shared between
> > the projects. And having 2+ use cases for a feature (set) makes it _much_ more
> > likely that the feature(s) will be accepted.
> >
> > And similar to what Paolo said regarding HEKI not having a complete story, I
> > don't see a clear line of sight for landing host-defined policy enforcement, as
> > there are many open, non-trivial questions that need answers. I.e. upstreaming
> > HEKI in its current form is also far from a done deal, and isn't guaranteed to
> > be substantially less work when all is said and done.
>
> I'm not sure to understand why "Heki not having a complete story". The
> goal is the same as the current kernel self-protection mechanisms.

HEKI doesn't have a complete story for how it's going to play nice with kexec(),
emulated RESET, etc. The kernel's existing self-protection mechanisms Just Work
because the protections are automatically disabled/lost on such transitions.
They are obviously significant drawbacks to that behavior, but they are accepted
drawbacks, i.e. solving those problems isn't in scope (yet) for the kernel. And
the "failure" mode is also loss of hardening, not an unusable guest.

In other words, the kernel's hardening is firmly best effort at this time,
whereas HEKI likely needs to be much more than "best effort" in order to justify
the extra complexity. And that means having answers to the various interoperability
questions.

2024-06-04 00:30:31

by Sean Christopherson

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/5] KVM: x86: Add notifications for Heki policy configuration and violation

On Mon, Jun 03, 2024, Mickaël Salaün wrote:
> On Wed, May 15, 2024 at 01:32:24PM -0700, Sean Christopherson wrote:
> > On Tue, May 14, 2024, Mickaël Salaün wrote:
> > > On Fri, May 10, 2024 at 10:07:00AM +0000, Nicolas Saenz Julienne wrote:
> > > > Development happens
> > > > https://github.com/vianpl/{linux,qemu,kvm-unit-tests} and the vsm-next
> > > > branch, but I'd advice against looking into it until we add some order
> > > > to the rework. Regardless, feel free to get in touch.
> > >
> > > Thanks for the update.
> > >
> > > Could we schedule a PUCK meeting to synchronize and help each other?
> > > What about June 12?
> >
> > June 12th works on my end.
>
> Can you please send an invite?

I think this is all the info?

Time: 6am PDT
Video: https://meet.google.com/vdb-aeqo-knk
Phone: https://tel.meet/vdb-aeqo-knk?pin=3003112178656

Calendar: https://calendar.google.com/calendar/u/0?cid=Y182MWE1YjFmNjQ0NzM5YmY1YmVkN2U1ZWE1ZmMzNjY5Y2UzMmEyNTQ0YzVkYjFjN2M4OTE3MDJjYTUwOTBjN2Q1QGdyb3VwLmNhbGVuZGFyLmdvb2dsZS5jb20
Drive: https://drive.google.com/drive/folders/1aTqCrvTsQI9T4qLhhLs_l986SngGlhPH?resourcekey=0-FDy0ykM3RerZedI8R-zj4A&usp=drive_link