Kernel developers working on confidential computing for virtualized
environments in x86 operate under a set of assumptions regarding the Linux
kernel threat model that differ from the traditional view. Historically,
the Linux threat model acknowledges attackers residing in userspace, as
well as a limited set of external attackers that are able to interact with
the kernel through networking or limited HW-specific exposed interfaces
(e.g. USB, Thunderbolt). The goal of this document is to explain additional
attack vectors that arise in the virtualized confidential computing space
and discuss the proposed protection mechanisms for the Linux kernel.
Reviewed-by: Larry Dewey <[email protected]>
Reviewed-by: David Kaplan <[email protected]>
Co-developed-by: Elena Reshetova <[email protected]>
Signed-off-by: Elena Reshetova <[email protected]>
Signed-off-by: Carlos Bilbao <[email protected]>
---
V1 can be found in:
https://lore.kernel.org/lkml/[email protected]/
Changes since v1:
- Apply feedback from first version of the patch
- Clarify that the document applies only to a particular angle of
confidential computing, namely confidential computing for virtualized
environments. Also, state that the document is specific to x86 and
that the main goal is to discuss the emerging threats.
- Change commit message and file name accordingly
- Replace AMD's link to AMD SEV SNP white paper
- Minor tweaking and clarifications
---
Documentation/security/index.rst | 1 +
.../security/x86-confidential-computing.rst | 298 ++++++++++++++++++
MAINTAINERS | 6 +
3 files changed, 305 insertions(+)
create mode 100644 Documentation/security/x86-confidential-computing.rst
diff --git a/Documentation/security/index.rst b/Documentation/security/index.rst
index 6ed8d2fa6f9e..bda919aecb37 100644
--- a/Documentation/security/index.rst
+++ b/Documentation/security/index.rst
@@ -6,6 +6,7 @@ Security Documentation
:maxdepth: 1
credentials
+ x86-confidential-computing
IMA-templates
keys/index
lsm
diff --git a/Documentation/security/x86-confidential-computing.rst b/Documentation/security/x86-confidential-computing.rst
new file mode 100644
index 000000000000..5c52b8888089
--- /dev/null
+++ b/Documentation/security/x86-confidential-computing.rst
@@ -0,0 +1,298 @@
+======================================================
+Confidential Computing in Linux for x86 virtualization
+======================================================
+
+.. contents:: :local:
+
+By: Elena Reshetova <[email protected]> and Carlos Bilbao <[email protected]>
+
+Motivation
+==========
+
+Kernel developers working on confidential computing for virtualized
+environments in x86 operate under a set of assumptions regarding the Linux
+kernel threat model that differ from the traditional view. Historically,
+the Linux threat model acknowledges attackers residing in userspace, as
+well as a limited set of external attackers that are able to interact with
+the kernel through various networking or limited HW-specific exposed
+interfaces (e.g. USB, Thunderbolt). The goal of this document is to explain
+additional attack vectors that arise in the confidential computing space
+and discuss the proposed protection mechanisms for the Linux kernel.
+
+Overview and terminology
+========================
+
+Confidential Computing (CoCo) is a broad term covering a wide range of
+security technologies that aim to protect the confidentiality and integrity
+of data in use (vs. data at rest or data in transit). At its core, CoCo
+solutions provide a Trusted Execution Environment (TEE), where secure data
+processing can be performed and, as a result, they are typically further
+classified into different subtypes depending on the SW that is intended
+to be run in a TEE. This document focuses on a subclass of CoCo
+technologies that target virtualized environments and allow running
+Virtual Machines (VM) inside a TEE. From now on, this document will refer
+to this subclass of CoCo as 'Confidential Computing (CoCo) for
+virtualized environments (VE)'.
+
+CoCo, in the virtualization context, refers to a set of HW and/or SW
+technologies that allow for stronger security guarantees for the SW running
+inside a CoCo VM. Namely, confidential computing allows its users to
+confirm the trustworthiness of all SW pieces included in its reduced
+Trusted Computing Base (TCB), given its ability to attest the state of
+these trusted components.
+
+While the concrete implementation details differ between technologies, all
+available mechanisms aim to provide increased confidentiality and
+integrity for the VM's guest memory and execution state (vCPU registers),
+more tightly controlled guest interrupt injection, as well as some
+additional mechanisms to control guest-host page mapping. More details on
+the x86-specific solutions can be found in
+:doc:`Intel Trust Domain Extensions (TDX) </arch/x86/tdx>` and
+`AMD Memory Encryption <https://www.amd.com/system/files/techdocs/sev-snp-strengthening-vm-isolation-with-integrity-protection-and-more.pdf>`_.
+
+The basic CoCo guest layout includes the host, the guest, the interfaces
+that enable communication between the guest and the host, a platform
+capable of supporting CoCo VMs, and a trusted intermediary between the
+guest VM and the underlying platform that acts as a security manager.
+The host-side virtual machine monitor (VMM) typically consists of a
+subset of traditional VMM features and is still in charge of the guest
+lifecycle, i.e. creating or destroying a CoCo VM, managing its access to
+system resources, etc. However, since it typically stays out of the CoCo
+VM TCB, its access is limited in order to preserve the security
+objectives.
+
+In the following diagram, the "<--->" lines represent bi-directional
+communication channels or interfaces between the CoCo security manager and
+the rest of the components (data flow for guest, host, hardware) ::
+
+ +-------------------+ +-----------------------+
+ | CoCo guest VM |<---->| |
+ +-------------------+ | |
+ | Interfaces | | CoCo security manager |
+ +-------------------+ | |
+ | Host VMM |<---->| |
+ +-------------------+ | |
+ | |
+ +--------------------+ | |
+ | CoCo platform |<--->| |
+ +--------------------+ +-----------------------+
+
+The specific details of the CoCo security manager vastly diverge between
+technologies. For example, in some cases, it will be implemented in HW
+while in others it may be pure SW. In some cases, such as for the
+`Protected kernel-based virtual machine (pKVM) <https://github.com/intel-staging/pKVM-IA>`_,
+the CoCo security manager is a small, isolated and highly privileged
+(compared to the rest of SW running on the host) part of a traditional
+VMM.
+
+Existing Linux kernel threat model
+==================================
+
+The overall components of the current Linux kernel threat model are::
+
+ +-----------------------+ +-------------------+
+ | |<---->| Userspace |
+ | | +-------------------+
+ | External attack | | Interfaces |
+ | vectors | +-------------------+
+ | |<---->| Linux Kernel |
+ | | +-------------------+
+ +-----------------------+ +-------------------+
+ | Bootloader/BIOS |
+ +-------------------+
+ +-------------------+
+ | HW platform |
+ +-------------------+
+
+There is also communication between the bootloader and the kernel during
+the boot process, but this diagram does not represent it explicitly. The
+"Interfaces" box represents the various interfaces that allow
+communication between kernel and userspace. This includes system calls,
+kernel APIs, device drivers, etc.
+
+The existing Linux kernel threat model typically assumes execution on a
+trusted HW platform with all of the firmware and bootloaders included in
+its TCB. The primary attacker resides in userspace, and all of the data
+coming from there is generally considered untrusted, unless userspace is
+privileged enough to perform trusted actions. In addition, external
+attackers are typically considered, including those with access to enabled
+external networks (e.g. Ethernet, Wireless, Bluetooth), exposed hardware
+interfaces (e.g. USB, Thunderbolt), and the ability to modify the contents
+of disks offline.
+
+Regarding external attack vectors, it is interesting to note that in
+most cases external attackers will try to exploit vulnerabilities in
+userspace first, but it is possible for an attacker to directly target
+the kernel, particularly if the attacker has physical access. Examples
+of direct kernel attacks include the vulnerabilities CVE-2019-19524,
+CVE-2022-0435 and CVE-2020-24490.
+
+Confidential Computing threat model and its security objectives
+===============================================================
+
+Confidential Computing adds a new type of attacker to the above list: a
+potentially misbehaving host (which can also include some part of a
+traditional VMM or all of it), which is typically placed outside of the
+CoCo VM TCB due to its large SW attack surface. It is important to note
+that this doesn’t imply that the host or VMM are intentionally
+malicious, but that there exists a security value in having a small CoCo
+VM TCB. This new type of adversary may be viewed as a more powerful type
+of external attacker, as it resides locally on the same physical machine
+(in contrast to a remote network attacker) and has control over the
+guest kernel's communication with most of the HW::
+
+ +------------------------+
+ | CoCo guest VM |
+ +-----------------------+ | +-------------------+ |
+ | |<--->| | Userspace | |
+ | | | +-------------------+ |
+ | External attack | | | Interfaces | |
+ | vectors | | +-------------------+ |
+ | |<--->| | Linux Kernel | |
+ | | | +-------------------+ |
+ +-----------------------+ | +-------------------+ |
+ | | Bootloader/BIOS | |
+ +-----------------------+ | +-------------------+ |
+ | |<--->+------------------------+
+ | | | Interfaces |
+ | | +------------------------+
+ | CoCo security |<--->| Host/Host-side VMM |
+ | manager | +------------------------+
+ | | +------------------------+
+ | |<--->| CoCo platform |
+ +-----------------------+ +------------------------+
+
+While traditionally the host has unlimited access to guest data and can
+leverage this access to attack the guest, CoCo systems mitigate such
+attacks by adding security features like guest data confidentiality and
+integrity protection. This threat model assumes that those features are
+available and intact.
+
+The **Linux kernel CoCo VM security objectives** can be summarized as follows:
+
+1. Preserve the confidentiality and integrity of CoCo guest's private
+memory and registers.
+
+2. Prevent privilege escalation from the host into a CoCo guest Linux
+kernel. While it is true that the host (and host-side VMM) requires some
+level of privilege to create, destroy, or pause the guest, part of the
+goal of preventing privilege escalation is to ensure that these
+operations do not provide a pathway for attackers to gain access to the
+guest's kernel.
+
+The above security objectives result in two primary **Linux kernel CoCo
+VM assets**:
+
+1. Guest kernel execution context.
+2. Guest kernel private memory.
+
+The host retains full control over the CoCo guest resources, and can
+deny access to them at any time. Examples of resources include CPU time,
+memory that the guest can consume, network bandwidth, etc. Because of
+this, host-initiated Denial of Service (DoS) attacks against CoCo guests
+are beyond the scope of this threat model.
+
+The **Linux CoCo VM attack surface** is any interface exposed from a CoCo
+guest Linux kernel towards an untrusted host that is not covered by the
+CoCo technology SW/HW protection. This includes any possible side
+channels, as well as transient execution side channels. Examples of
+explicit (not side-channel) interfaces include accesses to port I/O, MMIO
+and DMA interfaces, access to PCI configuration space, VMM-specific
+hypercalls (towards Host-side VMM), access to shared memory pages,
+interrupts allowed to be injected into the guest kernel by the host, as
+well as CoCo technology-specific hypercalls, if present. Additionally, the
+host in a CoCo system typically controls the process of creating a CoCo
+guest: it has a method to load the firmware and bootloader images, as
+well as the kernel image together with the kernel command line, into the
+guest. All of this data should also be considered untrusted until its
+integrity and authenticity are established via attestation.
+
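
The hardening principle behind this attack surface (treat every value the
host can read or write as hostile) can be sketched in C. This is a minimal,
self-contained illustration, not kernel code: the ``shared_desc`` layout,
its field names and ``GUEST_BUF_SIZE`` are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical descriptor living in a memory page shared with the
 * untrusted host. The layout is invented for this sketch; it does not
 * correspond to any real device or CoCo technology.
 */
struct shared_desc {
    volatile uint32_t len;    /* the host may rewrite this at any time */
    volatile uint32_t offset; /* likewise host-writable */
};

#define GUEST_BUF_SIZE 4096u

/*
 * Snapshot the host-controlled fields exactly once into local copies,
 * then validate only the snapshot. Re-reading the shared fields after
 * the check would reopen a time-of-check/time-of-use window that a
 * malicious host could race.
 */
static int desc_snapshot_and_check(const struct shared_desc *d,
                                   uint32_t *len_out, uint32_t *off_out)
{
    uint32_t len = d->len;     /* single read: local snapshot */
    uint32_t off = d->offset;  /* single read: local snapshot */

    if (len > GUEST_BUF_SIZE || off > GUEST_BUF_SIZE ||
        off + len > GUEST_BUF_SIZE)
        return 0;              /* reject malformed host input */

    *len_out = len;
    *off_out = off;
    return 1;
}
```

A guest driver following this pattern only ever acts on the validated
local copies, never on the live shared fields.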
+The table below shows a threat matrix for the CoCo guest Linux kernel with
+the potential mitigation strategies. The matrix refers to CoCo-specific
+versions of the guest, host and platform.
+
+.. list-table:: CoCo Linux guest kernel threat matrix
+ :widths: auto
+ :align: center
+ :header-rows: 1
+
+ * - Threat name
+ - Threat description
+ - Mitigation strategies
+
+ * - Guest malicious configuration
+ - A misbehaving host modifies one of the following parts of the
+ guest's configuration:
+
+ 1. Guest firmware or bootloader
+
+ 2. Guest kernel or module binaries
+
+ 3. Guest command line parameters
+
+ This allows the host to break the integrity of the code running
+ inside a CoCo guest, and violates the CoCo security objectives.
+ - The integrity of the guest's configuration passed via untrusted host
+ must be ensured by methods such as remote attestation and signing.
+ This should be largely transparent to the guest kernel, and would
+ allow it to assume a trusted state at the time of boot.
+
+ * - CoCo guest data attacks
+ - A misbehaving host retains full control of the CoCo guest's data
+ in transit between the guest and the host-managed physical or
+ virtual devices. This allows any attack against confidentiality,
+ integrity or freshness of such data.
+ - The CoCo guest is responsible for ensuring the confidentiality,
+ integrity and freshness of such data using well-established
+ security mechanisms. For example, for any guest external network
+ communications passed via the untrusted host, an end-to-end
+ secure session must be established between a guest and a trusted
+ remote endpoint using well-known protocols such as TLS.
+ This requirement also applies to protection of the guest's disk
+ image.
+
+ * - Malformed runtime input
+ - A misbehaving host injects malformed input via any communication
+ interface used by the guest's kernel code. If the code is not
+ prepared to handle this input correctly, this can result in a host
+ --> guest kernel privilege escalation. This includes traditional
+ side-channel and/or transient execution attack vectors.
+ - The attestation or signing process cannot help to mitigate this
+ threat since this input is highly dynamic. Instead, a different set
+ of mechanisms is required:
+
+ 1. *Limit the exposed attack surface*. Whenever possible, disable
+ complex kernel features and device drivers (not required for guest
+ operation) that actively use the communication interfaces between
+ the untrusted host and the guest. This is not a new concept for the
+ Linux kernel, since it already has mechanisms to disable external
+ interfaces, such as an attacker's access via the USB/Thunderbolt subsystem.
+
+ 2. *Harden the exposed attack surface*. Any code that uses such
+ interfaces must treat the input from the untrusted host as malicious,
+ and do sanity checks before processing it. This can be ensured by
+ performing a code audit of such device drivers as well as employing
+ other standard techniques for testing the code robustness, such as
+ fuzzing. This is again a well-known concept for the Linux kernel,
+ since all its networking code has been previously analyzed under the
+ presumption of processing malformed input from a network attacker.
+
+ * - Malicious runtime input
+ - A misbehaving host injects a specific input value via any
+ communication interface used by the guest's kernel code. The
+ difference with the previous attack vector (malformed runtime input)
+ is that this input is not malformed, but its value is crafted to
+ impact the guest's kernel security. Examples of such inputs include
+ providing malicious time to the guest or malicious entropy to the
+ guest's random number generator. Additionally, the timing of such events can
+ be an attack vector on its own, if it results in a particular guest
+ kernel action (i.e. processing of a host-injected interrupt).
+ - Similarly, as with the previous attack vector, it is not possible to
+ use attestation mechanisms to address this threat. Instead, such
+ attack vectors (i.e. interfaces) must be either disabled or made
+ resistant to supplied host input.
+
+As can be seen from the above table, the potential mitigation strategies
+to secure the CoCo Linux guest kernel vary, but can be roughly split into
+mechanisms that either require or do not require changes to the existing
+Linux kernel code. One main goal of the CoCo security architecture is to
+minimize changes to the Linux kernel code, while also providing usable
+and scalable means to facilitate the security of a CoCo guest kernel.
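
The "malicious runtime input" row of the threat matrix above can be
illustrated with a small C sketch: a host-supplied value (here, entropy)
is never consumed directly but mixed with guest-internal state. The mixer
used below is the splitmix64 finalizer, chosen purely for illustration; it
is an assumption of this example, not the Linux kernel's actual
entropy-mixing code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative only: combine a host-supplied value with guest-internal
 * state so that a host choosing its input adversarially cannot fully
 * determine the result. The splitmix64 finalizer below is a bijection
 * on 64-bit values, so for any fixed host input, distinct guest-internal
 * values always produce distinct outputs.
 */
static uint64_t mix_host_entropy(uint64_t host_supplied,
                                 uint64_t guest_internal)
{
    uint64_t x = host_supplied ^ guest_internal;

    x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27; x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}
```

Because the finalizer is invertible, the host cannot force a collision or
a chosen output without also knowing the guest-internal contribution.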
diff --git a/MAINTAINERS b/MAINTAINERS
index a73486c4aa6e..1d4ae60cdee9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5197,6 +5197,12 @@ S: Orphan
W: http://accessrunner.sourceforge.net/
F: drivers/usb/atm/cxacru.c
+CONFIDENTIAL COMPUTING THREAT MODEL FOR X86 VIRTUALIZATION
+M: Elena Reshetova <[email protected]>
+M: Carlos Bilbao <[email protected]>
+S: Maintained
+F: Documentation/security/x86-confidential-computing.rst
+
CONFIGFS
M: Joel Becker <[email protected]>
M: Christoph Hellwig <[email protected]>
--
2.34.1
Hi--
On 6/12/23 09:47, Carlos Bilbao wrote:
> credentials
> + x86-confidential-computing
Does the new entry align with the others?
> +:doc:`Intel Trust Domain Extensions (TDX) </arch/x86/tdx>` and
<Documentation/arch/x86/tdx>
or does it work without the leading subdir?
> +typically stays out of CoCo VM TCB, its access is limited to preserve the
to preserving the
?
> +of external attacker, as it resides locally on the same physical machine
> +-in contrast to a remote network attacker- and has control over the guest
Hyphens (dashes) are not normally used for a parenthetical phrase AFAIK.
> +kernel communication with most of the HW::
I would prefer to capitalize "kernel" above.
> +well as CoCo technology specific hypercalls, if present. Additionally, the
technology-specific
> + presumption of processing malformed input from a network attacker.
> +
> + * - Malicious runtime input
> + - A misbehaving host injects a specific input value via any
> + communication interface used by the guest's kernel code. The
> + difference with the previous attack vector (malformed runtime input)
> + is that this input is not malformed, but its value is crafted to
> + impact the guest's kernel security. Examples of such inputs include
> + providing a malicious time to the guest or the entropy to the guest
> + random number generator. Additionally, the timing of such events can
> + be an attack vector on its own, if it results in a particular guest
> + kernel action (i.e. processing of a host-injected interrupt).
> + - Similarly, as with the previous attack vector, it is not possible to
> + use attestation mechanisms to address this threat. Instead, such
> + attack vectors (i.e. interfaces) must be either disabled or made
> + resistant to supplied host input.
> +
> +As can be seen from the above table, the potential mitigation strategies
> +to secure the CoCo Linux guest kernel vary, but can be roughly split into
> +mechanisms that either require or do not require changes to the existing
> +Linux kernel code. One main goal of the CoCo security architecture is to
> +minimize changes to the Linux kernel code, while also providing usable
> +and scalable means to facilitate the security of a CoCo guest kernel.
HTH.
~Randy
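As an aside on the "harden the exposed attack surface" point in the table quoted above: the required sanity checking largely amounts to bounds- and overflow-checking every host-supplied value before use. A minimal, self-contained sketch of that pattern (all struct and function names here are hypothetical, not any real driver's code) might look like:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor that a host-controlled virtual device might
 * publish in shared memory; both fields are attacker-controlled. */
struct demo_desc {
	uint32_t off;	/* offset into the shared buffer, untrusted */
	uint32_t len;	/* number of bytes to copy, untrusted */
};

/*
 * Copy a host-described region from shared memory into guest-private
 * memory, rejecting any descriptor whose offset/length pair would
 * escape the shared buffer or overflow the destination.
 * Returns bytes copied, or -1 for malformed host input.
 */
static int demo_copy_from_host(uint8_t *dst, size_t dst_size,
			       const uint8_t *shared, size_t shared_size,
			       const struct demo_desc *d)
{
	if (d->len > shared_size || d->off > shared_size - d->len)
		return -1;	/* out of bounds (checked without overflow) */
	if (d->len > dst_size)
		return -1;	/* would overflow the private buffer */
	memcpy(dst, shared + d->off, d->len);
	return (int)d->len;
}
```

Note the order of the checks: `d->len > shared_size` is tested first so that `shared_size - d->len` cannot underflow, which is the classic pitfall when validating untrusted offset/length pairs.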
On Mon, Jun 12, 2023, Carlos Bilbao wrote:
> Kernel developers working on confidential computing for virtualized
> environments in x86 operate under a set of assumptions regarding the Linux
No, "x86" isn't special, SNP and TDX and s390's UV are special. pKVM is similar,
but (a) it's not as paranoid as SNP and TDX, and (b) the known use case for pKVM
on x86 is to harden usage of hardware devices, i.e. pKVM x86 "guests" likely don't
have the same "untrusted virtual device" attack surfaces as SNP/TDX/UV guests.
> +Kernel developers working on confidential computing for virtualized
> +environments in x86 operate under a set of assumptions regarding the Linux
I don't think "virtualized environments" is the right description. IMO, "cloud
computing environments" or maybe "off-premise environments" more accurately
captures what you want to document, though the latter fails to imply the "virtual"
aspect of things.
> +The specific details of the CoCo security manager vastly diverge between
> +technologies. For example, in some cases, it will be implemented in HW
> +while in others it may be pure SW. In some cases, such as for the
> +`Protected kernel-based virtual machine (pKVM) <https://github.com/intel-staging/pKVM-IA>`,
> +the CoCo security manager is a small, isolated and highly privileged
> +(compared to the rest of SW running on the host) part of a traditional
> +VMM.
I say that "virtualized environments" isn't a good description because while pKVM
does utilize hardware virtualization, my understanding is that the primary use
cases for pKVM don't have the same threat model as SNP/TDX, e.g. IIUC many (most?
all?) pKVM guests don't require network access.
> +Confidential Computing adds a new type of attacker to the above list: a
This should be qualified as "CoCo for cloud", or whatever sublabel we land on.
> +potentially misbehaving host (which can also include some part of a
> +traditional VMM or all of it), which is typically placed outside of the
> +CoCo VM TCB due to its large SW attack surface. It is important to note
> +that this doesn’t imply that the host or VMM are intentionally
> +malicious, but that there exists a security value in having a small CoCo
> +VM TCB. This new type of adversary may be viewed as a more powerful type
> +of external attacker, as it resides locally on the same physical machine
> +-in contrast to a remote network attacker- and has control over the guest
> +kernel communication with most of the HW::
IIUC, this last statement doesn't hold true for the pKVM on x86 use case, which
specifically aims to give a "guest" exclusive access to hardware resources.
> +The **Linux kernel CoCo VM security objectives** can be summarized as follows:
> +
> +1. Preserve the confidentiality and integrity of CoCo guest's private
> +memory and registers.
As I complained in v1, this doesn't hold true for all of x86. My complaint goes
away if the document is specific to the TDX/SNP/UV threat models, but describing
the doc as "x86 specific" is misleading, as the threat model isn't x86 specific,
nor do all confidential compute technologies that run on x86 share these objectives,
e.g. vanilla SEV.
> +well as CoCo technology specific hypercalls, if present. Additionally, the
> +host in a CoCo system typically controls the process of creating a CoCo
> +guest: it has a method to load into a guest the firmware and bootloader
> +images, the kernel image together with the kernel command line. All of this
> +data should also be considered untrusted until its integrity and
> +authenticity is established via attestation.
Attestation is SNP and TDX specific. AIUI, none of SEV, SEV-ES, or pKVM (which
doesn't even really exist on x86 yet), have attestation of their own, e.g. the
proposed pKVM support would rely on Secure Boot of the original "full" host kernel.
> +CONFIDENTIAL COMPUTING THREAT MODEL FOR X86 VIRTUALIZATION
> +M: Elena Reshetova <[email protected]>
> +M: Carlos Bilbao <[email protected]>
> +S: Maintained
> +F: Documentation/security/x86-confidential-computing.rst
Throwing "x86" on the name doesn't change my objections, this is still an SNP/TDX
specific doc pretending to be more generic than it actually is. I don't understand
the resistance to picking a name that makes it abundantly clear the doc covers a
very specific niche of confidential computing.
> On Mon, Jun 12, 2023, Carlos Bilbao wrote:
> > Kernel developers working on confidential computing for virtualized
> > environments in x86 operate under a set of assumptions regarding the Linux
>
> No, "x86" isn't special, SNP and TDX and s390's UV are special. pKVM is similar,
> but (a) it's not as paranoid as SNP and TDX, and (b) the known use case for pKVM
> on x86 is to harden usage of hardware devices, i.e. pKVM x86 "guests" likely
> don't
> have the same "untrusted virtual device" attack surfaces a SNP/TDX/UV guests.
+ Jason Chen to help clarify the pKVM on x86 case.
My impression was that pKVM on x86 would similarly care about hardening its
pKVM guest kernel against host attacks. In the security world,
if you put something outside of your TCB (the host SW stack in this case),
you automatically need to prevent privilege escalation attacks from
outside to inside, and that implies caring about attacks the host can
mount via, say, malicious PCI drivers and such.
What prevents the host from doing such attacks in the pKVM case?
>
> > +Kernel developers working on confidential computing for virtualized
> > +environments in x86 operate under a set of assumptions regarding the Linux
>
> I don't think "virtualized environments" is the right description. IMO, "cloud
> computing environments" or maybe "off-premise environments" more
> accurately
> captures what you want to document, though the latter fails to imply the
> "virtual"
> aspect of things.
Hmm, "cloud computing environments" explicitly implies "cloud", which is what
we were trying to get away from in v2, because it describes a *particular* use case
where CoCo VMs can be used (and probably will be used a lot in practice), but
we don’t want to limit this to just that use case and exclude others.
"off-premise environments" is so vague, IMO, that I would not know what it means
in this context if I were a person new to the topic of CoCo. And as you said,
it doesn’t even imply the virtual aspect at all.
>
> > +The specific details of the CoCo security manager vastly diverge between
> > +technologies. For example, in some cases, it will be implemented in HW
> > +while in others it may be pure SW. In some cases, such as for the
> > +`Protected kernel-based virtual machine (pKVM) <https://github.com/intel-
> staging/pKVM-IA>`,
> > +the CoCo security manager is a small, isolated and highly privileged
> > +(compared to the rest of SW running on the host) part of a traditional
> > +VMM.
>
> I say that "virtualized environments" isn't a good description because while
> pKVM
> does utilize hardware virtualization, my understanding is that the primary use
> cases for pKVM don't have the same threat model as SNP/TDX, e.g. IIUC many
> (most?
> all?) pKVM guests don't require network access.
Not having a network access requirement doesn’t invalidate the
separation guarantees between the host and guest; it just makes them easier
to uphold, since you have one interface fewer between the host and guest.
But again, I will let Jason reply on this since he knows the details.
But what you are saying more generally, here and above, is that you don’t want
the pKVM case included in this threat model; did I understand you correctly?
>
> > +Confidential Computing adds a new type of attacker to the above list: a
>
> This should be qualified as "CoCo for cloud", or whatever sublabel we land on.
Yes, we just need to find this label. If you remember, v1 had the name
"Confidential Cloud Computing", which you were the first one to complain about ))
>
> > +potentially misbehaving host (which can also include some part of a
> > +traditional VMM or all of it), which is typically placed outside of the
> > +CoCo VM TCB due to its large SW attack surface. It is important to note
> > +that this doesn’t imply that the host or VMM are intentionally
> > +malicious, but that there exists a security value in having a small CoCo
> > +VM TCB. This new type of adversary may be viewed as a more powerful type
> > +of external attacker, as it resides locally on the same physical machine
> > +-in contrast to a remote network attacker- and has control over the guest
> > +kernel communication with most of the HW::
>
> IIUC, this last statement doesn't hold true for the pKVM on x86 use case, which
> specifically aims to give a "guest" exclusive access to hardware resources.
Does it hold for *all* HW resources? If yes, indeed this would make pKVM on
x86 considerably different.
>
> > +The **Linux kernel CoCo VM security objectives** can be summarized as
> follows:
> > +
> > +1. Preserve the confidentiality and integrity of CoCo guest's private
> > +memory and registers.
>
> As I complained in v1, this doesn't hold true for all of x86. My complaint goes
> away if the document is specific to the TDX/SNP/UV threat models, but describing
> the doc as "x86 specific" is misleading, as the threat model isn't x86 specific,
> nor do all confidential compute technologies that run on x86 share these
> objectives,
> e.g. vanilla SEV.
Yes, this brings us back to the naming issue, see below.
>
> > +well as CoCo technology specific hypercalls, if present. Additionally, the
> > +host in a CoCo system typically controls the process of creating a CoCo
> > +guest: it has a method to load into a guest the firmware and bootloader
> > +images, the kernel image together with the kernel command line. All of this
> > +data should also be considered untrusted until its integrity and
> > +authenticity is established via attestation.
>
> Attestation is SNP and TDX specific. AIUI, none of SEV, SEV-ES, or pKVM (which
> doesn't even really exist on x86 yet), have attestation of their own, e.g. the
> proposed pKVM support would rely on Secure Boot of the original "full" host
> kernel.
Agreed, the last phrase needs to be corrected to also apply to the pKVM case
(this was missed in v2), so we propose this text instead:
"All of this data should also be considered untrusted until its integrity and
authenticity is established via a CoCo technology-defined process such as attestation
or variants of secure/trusted/authenticated boot."
The goal of the above sentence is only to say that the integrity/authenticity
must be established via whatever method a concrete technology brings,
otherwise we have a big problem in security.
>
> > +CONFIDENTIAL COMPUTING THREAT MODEL FOR X86 VIRTUALIZATION
> > +M: Elena Reshetova <[email protected]>
> > +M: Carlos Bilbao <[email protected]>
> > +S: Maintained
> > +F: Documentation/security/x86-confidential-computing.rst
>
> Throwing "x86" on the name doesn't change my objections, this is still an
> SNP/TDX
> specific doc pretending to be more generic then it actually is. I don't understand
> the resistance to picking a name that makes it abundantly clear the doc covers a
> very specific niche of confidential computing.
We really don’t intend to be "overgeneric", but since no one outside of x86 is
interested in helping write this document or becoming a co-maintainer, we cannot
claim to cover more than the x86-specific solutions in this space.
But let’s agree on the name, and then we can plug it in everywhere in v3.
v1 used "Confidential Cloud Computing"
v2 used "Confidential Computing for virtualized environments"
You proposed above
"Confidential computing for cloud computing environments "
and
"Confidential Computing for off-premise environments ".
I still don’t get what is wrong with the "Confidential Computing for
virtualized environments" name: you mentioned it doesn’t
correctly describe what we want to express, but you didn’t explain
why. Could you please elaborate?
Also, is the name *that* important, given that we have already spent
a whole paragraph in v2 explaining what we mean by this name?
We are all tech people here, so we don’t plan to use this name
for marketing campaigns :)
Best Regards,
Elena.
On Wed, Jun 14, 2023, Elena Reshetova wrote:
> > > +The specific details of the CoCo security manager vastly diverge between
> > > +technologies. For example, in some cases, it will be implemented in HW
> > > +while in others it may be pure SW. In some cases, such as for the
> > > +`Protected kernel-based virtual machine (pKVM) <https://github.com/intel-
> > staging/pKVM-IA>`,
> > > +the CoCo security manager is a small, isolated and highly privileged
> > > +(compared to the rest of SW running on the host) part of a traditional
> > > +VMM.
> >
> > I say that "virtualized environments" isn't a good description because
> > while pKVM does utilize hardware virtualization, my understanding is that
> > the primary use cases for pKVM don't have the same threat model as SNP/TDX,
> > e.g. IIUC many (most? all?) pKVM guests don't require network access.
>
> Not having a network access requirement doesn’t implicitly invalidate the
> separation guarantees between the host and guest, it just makes it easier
> since you have one interface less between the host and guest.
My point is that if the protected guest doesn't need any I/O beyond the hardware
device that it accesses, then the threat model is different because many of the
new/novel attack surfaces that come with the TDX/SNP threat model don't exist.
E.g. the hardening that people want to do for VirtIO drivers may not be at all
relevant to pKVM.
> But again I will let Jason to reply on this since he knows details.
>
> But what you are saying more generally here and above is that you don’t want
> pKVM case included into this threat model, did I understand you correctly?
More or less. I think the threat models for pKVM versus TDX/SNP are different
enough that accurately capturing the nuances and novelties of the TDX/SNP threat
model will be unnecessarily difficult if you also try to lump in pKVM. E.g. pKVM
is intended to run on portable client hardware, likely without memory encryption,
versus TDX/SNP being almost exclusively server oriented with the hardware being
owned and hosted by a third party that is benign (perhaps trusted even), but not
necessarily physically isolated enough to satisfy the end user's security
requirements.
One of the points I (and others) was trying to get across in v1 feedback is that
security requirements for CoCo are not the same across all use cases, and that
there are subtle but meaningful differences even when use cases are built on common
underlying technology. In other words, describing the TDX/SNP threat model with
sufficient detail and nuance is difficult enough without throwing pKVM into the
mix.
And I don't see any need to formally document pKVM's threat model right *now*.
pKVM on x86 is little more than a proposal at this point, and while I would love
to see documentation for pKVM on ARM's threat model, that obviously doesn't belong
in a doc that's x86 specific.
> > > +potentially misbehaving host (which can also include some part of a
> > > +traditional VMM or all of it), which is typically placed outside of the
> > > +CoCo VM TCB due to its large SW attack surface. It is important to note
> > > +that this doesn’t imply that the host or VMM are intentionally
> > > +malicious, but that there exists a security value in having a small CoCo
> > > +VM TCB. This new type of adversary may be viewed as a more powerful type
> > > +of external attacker, as it resides locally on the same physical machine
> > > +-in contrast to a remote network attacker- and has control over the guest
> > > +kernel communication with most of the HW::
> >
> > IIUC, this last statement doesn't hold true for the pKVM on x86 use case, which
> > specifically aims to give a "guest" exclusive access to hardware resources.
>
> Does it hold for *all* HW resources? If yes, indeed this would make pKVM on
> x86 considerably different.
Heh, the original says "most", so it doesn't have to hold for all hardware resources,
just a simple majority.
Hello Randy,
On 6/12/23 17:43, Randy Dunlap wrote:
> Hi--
>
> On 6/12/23 09:47, Carlos Bilbao wrote:
>> Kernel developers working on confidential computing for virtualized
>> environments in x86 operate under a set of assumptions regarding the Linux
>> kernel threat model that differs from the traditional view. Historically,
>> the Linux threat model acknowledges attackers residing in userspace, as
>> well as a limited set of external attackers that are able to interact with
>> the kernel through networking or limited HW-specific exposed interfaces
>> (e.g. USB, thunderbolt). The goal of this document is to explain additional
>> attack vectors that arise in the virtualized confidential computing space
>> and discuss the proposed protection mechanisms for the Linux kernel.
>>
>> Reviewed-by: Larry Dewey <[email protected]>
>> Reviewed-by: David Kaplan <[email protected]>
>> Co-developed-by: Elena Reshetova <[email protected]>
>> Signed-off-by: Elena Reshetova <[email protected]>
>> Signed-off-by: Carlos Bilbao <[email protected]>
>> ---
>>
>> V1 can be found in:
>>
>> https://lore.kernel.org/lkml/[email protected]/
>> Changes since v1:
>>
>> - Apply feedback from first version of the patch
>> - Clarify that the document applies only to a particular angle of
>> confidential computing, namely confidential computing for virtualized
>> environments. Also, state that the document is specific to x86 and
>> that the main goal is to discuss the emerging threats.
>> - Change commit message and file name accordingly
>> - Replace AMD's link to AMD SEV SNP white paper
>> - Minor tweaking and clarifications
>>
>> ---
>> Documentation/security/index.rst | 1 +
>> .../security/x86-confidential-computing.rst | 298 ++++++++++++++++++
>> MAINTAINERS | 6 +
>> 3 files changed, 305 insertions(+)
>> create mode 100644 Documentation/security/x86-confidential-computing.rst
>>
>> diff --git a/Documentation/security/index.rst
>> b/Documentation/security/index.rst
>> index 6ed8d2fa6f9e..bda919aecb37 100644
>> --- a/Documentation/security/index.rst
>> +++ b/Documentation/security/index.rst
>> @@ -6,6 +6,7 @@ Security Documentation
>> :maxdepth: 1
>> credentials
>> + x86-confidential-computing
>
> Does the new entry align with the others?
Yes, I believe so.
>
>> IMA-templates
>> keys/index
>> lsm
>> diff --git a/Documentation/security/x86-confidential-computing.rst
>> b/Documentation/security/x86-confidential-computing.rst
>> new file mode 100644
>> index 000000000000..5c52b8888089
>> --- /dev/null
>> +++ b/Documentation/security/x86-confidential-computing.rst
>> @@ -0,0 +1,298 @@
>> +======================================================
>> +Confidential Computing in Linux for x86 virtualization
>> +======================================================
>> +
>> +.. contents:: :local:
>> +
>> +By: Elena Reshetova <[email protected]> and Carlos Bilbao
>> <[email protected]>
>> +
>> +Motivation
>> +==========
>> +
>> +Kernel developers working on confidential computing for virtualized
>> +environments in x86 operate under a set of assumptions regarding the Linux
>> +kernel threat model that differ from the traditional view. Historically,
>> +the Linux threat model acknowledges attackers residing in userspace, as
>> +well as a limited set of external attackers that are able to interact with
>> +the kernel through various networking or limited HW-specific exposed
>> +interfaces (USB, thunderbolt). The goal of this document is to explain
>> +additional attack vectors that arise in the confidential computing space
>> +and discuss the proposed protection mechanisms for the Linux kernel.
>> +
>> +Overview and terminology
>> +========================
>> +
>> +Confidential Computing (CoCo) is a broad term covering a wide range of
>> +security technologies that aim to protect the confidentiality and integrity
>> +of data in use (vs. data at rest or data in transit). At its core, CoCo
>> +solutions provide a Trusted Execution Environment (TEE), where secure data
>> +processing can be performed and, as a result, they are typically further
>> +classified into different subtypes depending on the SW that is intended
>> +to be run in TEE. This document focuses on a subclass of CoCo technologies
>> +that are targeting virtualized environments and allow running Virtual
>> +Machines (VM) inside a TEE. From now on, this document will refer
>> +to this subclass of CoCo as 'Confidential Computing (CoCo) for
>> +virtualized environments (VE)'.
>> +
>> +CoCo, in the virtualization context, refers to a set of HW and/or SW
>> +technologies that allow for stronger security guarantees for the SW running
>> +inside a CoCo VM. Namely, confidential computing allows its users to
>> +confirm the trustworthiness of all SW pieces to include in its reduced
>> +Trusted Computing Base (TCB) given its ability to attest the state of these
>> +trusted components.
>> +
>> +While the concrete implementation details differ between technologies, all
>> +available mechanisms aim to provide increased confidentiality and
>> +integrity for the VM's guest memory and execution state (vCPU registers),
>> +more tightly controlled guest interrupt injection, as well as some
>> +additional mechanisms to control guest-host page mapping. More details on
>> +the x86-specific solutions can be found in
>> +:doc:`Intel Trust Domain Extensions (TDX) </arch/x86/tdx>` and
>
> <Documentation/arch/x86/tdx>
> or does it work without the leading subdir?
It works like this.
>
>> +`AMD Memory Encryption
>> <https://www.amd.com/system/files/techdocs/sev-snp-strengthening-vm-isolation-with-integrity-protection-and-more.pdf>`_.
>> +
>> +The basic CoCo guest layout includes the host, guest, the interfaces that
>> +communicate guest and host, a platform capable of supporting CoCo VMs, and
>> +a trusted intermediary between the guest VM and the underlying platform
>> +that acts as a security manager. The host-side virtual machine monitor
>> +(VMM) typically consists of a subset of traditional VMM features and
>> +is still in charge of the guest lifecycle, i.e. create or destroy a CoCo
>> +VM, manage its access to system resources, etc. However, since it
>> +typically stays out of CoCo VM TCB, its access is limited to preserve the
>
> to preserving the
> ?
I think that using "preserving" and "preserve" here may result in two
different interpretations:
"limited to preserve the security objectives" suggests that the limited
access is enforced to preserve the security guarantees. In other words, the
act of limiting access itself, particularly from the VMM, helps to maintain
the security objectives. This is what we want to say.
"limited to preserving the security objectives" suggests that the access of
the VMM is limited to the components that allow the VMM to preserve the
security objectives.
Hope that makes sense?
>
>> +security objectives.
>> +
>> +In the following diagram, the "<--->" lines represent bi-directional
>> +communication channels or interfaces between the CoCo security manager and
>> +the rest of the components (data flow for guest, host, hardware) ::
>> +
>> + +-------------------+ +-----------------------+
>> + | CoCo guest VM |<---->| |
>> + +-------------------+ | |
>> + | Interfaces | | CoCo security manager |
>> + +-------------------+ | |
>> + | Host VMM |<---->| |
>> + +-------------------+ | |
>> + | |
>> + +--------------------+ | |
>> + | CoCo platform |<--->| |
>> + +--------------------+ +-----------------------+
>> +
>> +The specific details of the CoCo security manager vastly diverge between
>> +technologies. For example, in some cases, it will be implemented in HW
>> +while in others it may be pure SW. In some cases, such as for the
>> +`Protected kernel-based virtual machine (pKVM)
>> <https://github.com/intel-staging/pKVM-IA>`,
>> +the CoCo security manager is a small, isolated and highly privileged
>> +(compared to the rest of SW running on the host) part of a traditional
>> +VMM.
>> +
>> +Existing Linux kernel threat model
>> +==================================
>> +
>> +The overall components of the current Linux kernel threat model are::
>> +
>> + +-----------------------+ +-------------------+
>> + | |<---->| Userspace |
>> + | | +-------------------+
>> + | External attack | | Interfaces |
>> + | vectors | +-------------------+
>> + | |<---->| Linux Kernel |
>> + | | +-------------------+
>> + +-----------------------+ +-------------------+
>> + | Bootloader/BIOS |
>> + +-------------------+
>> + +-------------------+
>> + | HW platform |
>> + +-------------------+
>> +
>> +There is also communication between the bootloader and the kernel during
>> +the boot process, but this diagram does not represent it explicitly. The
>> +"Interfaces" box represents the various interfaces that allow
>> +communication between kernel and userspace. This includes system calls,
>> +kernel APIs, device drivers, etc.
>> +
>> +The existing Linux kernel threat model typically assumes execution on a
>> +trusted HW platform with all of the firmware and bootloaders included on
>> +its TCB. The primary attacker resides in the userspace, and all of the data
>> +coming from there is generally considered untrusted, unless userspace is
>> +privileged enough to perform trusted actions. In addition, external
>> +attackers are typically considered, including those with access to enabled
>> +external networks (e.g. Ethernet, Wireless, Bluetooth), exposed hardware
>> +interfaces (e.g. USB, Thunderbolt), and the ability to modify the contents
>> +of disks offline.
>> +
>> +Regarding external attack vectors, it is interesting to note that in most
>> +cases external attackers will try to exploit vulnerabilities in userspace
>> +first, but that it is possible for an attacker to directly target the
>> +kernel; particularly if the host has physical access. Examples of direct
>> +kernel attacks include the vulnerabilities CVE-2019-19524, CVE-2022-0435
>> +and CVE-2020-24490.
>> +
>> +Confidential Computing threat model and its security objectives
>> +===============================================================
>> +
>> +Confidential Computing adds a new type of attacker to the above list: a
>> +potentially misbehaving host (which can also include some part of a
>> +traditional VMM or all of it), which is typically placed outside of the
>> +CoCo VM TCB due to its large SW attack surface. It is important to note
>> +that this doesn’t imply that the host or VMM are intentionally
>> +malicious, but that there exists a security value in having a small CoCo
>> +VM TCB. This new type of adversary may be viewed as a more powerful type
>> +of external attacker, as it resides locally on the same physical machine
>> +-in contrast to a remote network attacker- and has control over the guest
>
> Hyphens (dashes) are not normally used for a parenthetical phrase AFAIK.
Yes, parentheses would be more appropriate.
>
>> +kernel communication with most of the HW::
>
> I would prefer to capitalize "kernel" above.
I'm not sure I follow, we don't capitalize kernel elsewhere, why here?
>
>> +
>> + +------------------------+
>> + | CoCo guest VM |
>> + +-----------------------+ | +-------------------+ |
>> + | |<--->| | Userspace | |
>> + | | | +-------------------+ |
>> + | External attack | | | Interfaces | |
>> + | vectors | | +-------------------+ |
>> + | |<--->| | Linux Kernel | |
>> + | | | +-------------------+ |
>> + +-----------------------+ | +-------------------+ |
>> + | | Bootloader/BIOS | |
>> + +-----------------------+ | +-------------------+ |
>> + | |<--->+------------------------+
>> + | | | Interfaces |
>> + | | +------------------------+
>> + | CoCo security |<--->| Host/Host-side VMM |
>> + | manager | +------------------------+
>> + | | +------------------------+
>> + | |<--->| CoCo platform |
>> + +-----------------------+ +------------------------+
>> +
>> +While traditionally the host has unlimited access to guest data and can
>> +leverage this access to attack the guest, the CoCo systems mitigate such
>> +attacks by adding security features like guest data confidentiality and
>> +integrity protection. This threat model assumes that those features are
>> +available and intact.
>> +
>> +The **Linux kernel CoCo VM security objectives** can be summarized as
>> follows:
>> +
>> +1. Preserve the confidentiality and integrity of CoCo guest's private
>> +memory and registers.
>> +
>> +2. Prevent privilege escalation from a host into a CoCo guest Linux
>> kernel.
>> +While it is true that the host (and host-side VMM) requires some level of
>> +privilege to create, destroy, or pause the guest, part of the goal of
>> +preventing privilege escalation is to ensure that these operations do not
>> +provide a pathway for attackers to gain access to the guest's kernel.
>> +
>> +The above security objectives result in two primary **Linux kernel CoCo
>> +VM assets**:
>> +
>> +1. Guest kernel execution context.
>> +2. Guest kernel private memory.
>> +
>> +The host retains full control over the CoCo guest resources, and can deny
>> +access to them at any time. Examples of resources include CPU time, memory
>> +that the guest can consume, network bandwidth, etc. Because of this,
>> +host Denial of Service (DoS) attacks against CoCo guests are beyond the
>> +scope of this threat model.
>> +
>> +The **Linux CoCo VM attack surface** is any interface exposed from a CoCo
>> +guest Linux kernel towards an untrusted host that is not covered by the
>> +CoCo technology SW/HW protection. This includes any possible
>> +side channels, as well as transient execution side channels. Examples of
>> +explicit (not side-channel) interfaces include accesses to port I/O, MMIO
>> +and DMA interfaces, access to PCI configuration space, VMM-specific
>> +hypercalls (towards Host-side VMM), access to shared memory pages,
>> +interrupts allowed to be injected into the guest kernel by the host, as
>> +well as CoCo technology specific hypercalls, if present. Additionally, the
>
> technology-specific
True!
>
>> +host in a CoCo system typically controls the process of creating a CoCo
>> +guest: it has a method to load into a guest the firmware and bootloader
>> +images, the kernel image together with the kernel command line. All of this
>> +data should also be considered untrusted until its integrity and
>> +authenticity is established via attestation.
>> +
>> +The table below shows a threat matrix for the CoCo guest Linux kernel with
>> +the potential mitigation strategies. The matrix refers to CoCo-specific
>> +versions of the guest, host and platform.
>> +
>> +.. list-table:: CoCo Linux guest kernel threat matrix
>> + :widths: auto
>> + :align: center
>> + :header-rows: 1
>> +
>> + * - Threat name
>> + - Threat description
>> + - Mitigation strategies
>> +
>> + * - Guest malicious configuration
>> + - A misbehaving host modifies one of the following parts of the
>> + guest's configuration:
>> +
>> + 1. Guest firmware or bootloader
>> +
>> + 2. Guest kernel or module binaries
>> +
>> + 3. Guest command line parameters
>> +
>> + This allows the host to break the integrity of the code running
>> + inside a CoCo guest, and violates the CoCo security objectives.
>> + - The integrity of the guest's configuration passed via the untrusted
>> + host must be ensured by methods such as remote attestation and signing.
>> + This should be largely transparent to the guest kernel, and would
>> + allow it to assume a trusted state at the time of boot.
>> +
>> + * - CoCo guest data attacks
>> + - A misbehaving host retains full control of the CoCo guest's data
>> + in-transit between the guest and the host-managed physical or
>> + virtual devices. This allows any attack against confidentiality,
>> + integrity or freshness of such data.
>> + - The CoCo guest is responsible for ensuring the confidentiality,
>> + integrity and freshness of such data using well-established
>> + security mechanisms. For example, for any guest external network
>> + communications passed via the untrusted host, an end-to-end
>> + secure session must be established between a guest and a trusted
>> + remote endpoint using well-known protocols such as TLS.
>> + This requirement also applies to protection of the guest's disk
>> + image.
>> +
>> + * - Malformed runtime input
>> + - A misbehaving host injects malformed input via any communication
>> + interface used by the guest's kernel code. If the code is not
>> + prepared to handle this input correctly, this can result in a host
>> + --> guest kernel privilege escalation. This includes traditional
>> + side-channel and/or transient execution attack vectors.
>> + - The attestation or signing process cannot help to mitigate this
>> + threat since this input is highly dynamic. Instead, a different set
>> + of mechanisms is required:
>> +
>> + 1. *Limit the exposed attack surface*. Whenever possible, disable
>> + complex kernel features and device drivers (not required for guest
>> + operation) that actively use the communication interfaces between
>> + the untrusted host and the guest. This is not a new concept for the
>> + Linux kernel, since it already has mechanisms to disable external
>> + interfaces, such as attacker's access via USB/Thunderbolt subsystem.
>> +
>> + 2. *Harden the exposed attack surface*. Any code that uses such
>> + interfaces must treat the input from the untrusted host as
>> malicious,
>> + and do sanity checks before processing it. This can be ensured by
>> + performing a code audit of such device drivers as well as employing
>> + other standard techniques for testing the code robustness, such as
>> + fuzzing. This is again a well-known concept for the Linux kernel,
>> + since all its networking code has been previously analyzed under the
>> + presumption of processing malformed input from a network attacker.
>> +
>> + * - Malicious runtime input
>> + - A misbehaving host injects a specific input value via any
>> + communication interface used by the guest's kernel code. The
>> + difference with the previous attack vector (malformed runtime input)
>> + is that this input is not malformed, but its value is crafted to
>> + impact the guest's kernel security. Examples of such inputs include
>> + providing a malicious time to the guest or malicious entropy to the
>> + guest's random number generator. Additionally, the timing of such
>> + events can be an attack vector on its own, if it results in a
>> + particular guest kernel action (e.g. processing of a host-injected
>> + interrupt).
>> + - Similarly, as with the previous attack vector, it is not possible to
>> + use attestation mechanisms to address this threat. Instead, such
>> + attack vectors (i.e. interfaces) must be either disabled or made
>> + resistant to supplied host input.
>> +
>> +As can be seen from the above table, the potential mitigation strategies
>> +to secure the CoCo Linux guest kernel vary, but can be roughly split into
>> +mechanisms that either require or do not require changes to the existing
>> +Linux kernel code. One main goal of the CoCo security architecture is to
>> +minimize changes to the Linux kernel code, while also providing usable
>> +and scalable means to facilitate the security of a CoCo guest kernel.
>
> HTH.
Very helpful, thank you for the feedback.
> ~Randy
>
Best,
Carlos
Hi Carlos,
On 6/14/23 06:55, Carlos Bilbao wrote:
> Hello Randy,
>
> On 6/12/23 17:43, Randy Dunlap wrote:
>> Hi--
>>
>> On 6/12/23 09:47, Carlos Bilbao wrote:
>>> Kernel developers working on confidential computing for virtualized
>>> environments in x86 operate under a set of assumptions regarding the Linux
>>> kernel threat model that differs from the traditional view. Historically,
>>> the Linux threat model acknowledges attackers residing in userspace, as
>>> well as a limited set of external attackers that are able to interact with
>>> the kernel through networking or limited HW-specific exposed interfaces
>>> (e.g. USB, thunderbolt). The goal of this document is to explain additional
>>> attack vectors that arise in the virtualized confidential computing space
>>> and discuss the proposed protection mechanisms for the Linux kernel.
>>>
>>> Reviewed-by: Larry Dewey <[email protected]>
>>> Reviewed-by: David Kaplan <[email protected]>
>>> Co-developed-by: Elena Reshetova <[email protected]>
>>> Signed-off-by: Elena Reshetova <[email protected]>
>>> Signed-off-by: Carlos Bilbao <[email protected]>
>>> ---
>>>
>>> ---
>>> Documentation/security/index.rst | 1 +
>>> .../security/x86-confidential-computing.rst | 298 ++++++++++++++++++
>>> MAINTAINERS | 6 +
>>> 3 files changed, 305 insertions(+)
>>> create mode 100644 Documentation/security/x86-confidential-computing.rst
>>>
>>> diff --git a/Documentation/security/x86-confidential-computing.rst b/Documentation/security/x86-confidential-computing.rst
>>> new file mode 100644
>>> index 000000000000..5c52b8888089
>>> --- /dev/null
>>> +++ b/Documentation/security/x86-confidential-computing.rst
>>> @@ -0,0 +1,298 @@
>>> +======================================================
>>> +Confidential Computing in Linux for x86 virtualization
>>> +======================================================
>>> +
>>> +.. contents:: :local:
>>> +
>>> +By: Elena Reshetova <[email protected]> and Carlos Bilbao <[email protected]>
>>> +
>>> +The basic CoCo guest layout includes the host, guest, the interfaces that
>>> +communicate guest and host, a platform capable of supporting CoCo VMs, and
>>> +a trusted intermediary between the guest VM and the underlying platform
>>> +that acts as a security manager. The host-side virtual machine monitor
>>> +(VMM) typically consists of a subset of traditional VMM features and
>>> +is still in charge of the guest lifecycle, i.e. create or destroy a CoCo
>>> +VM, manage its access to system resources, etc. However, since it
>>> +typically stays out of CoCo VM TCB, its access is limited to preserve the
>>
>> to preserving the
>> ?
>
> I think that using "preserving" and "preserve" here may result in two
> different interpretations:
>
> "limited to preserve the security objectives" suggests that the limited
> access is enforced to preserve the security guarantees. In other words, the
> act of limiting access itself, particularly from the VMM, helps to maintain
> the security objectives. This is what we want to say.
>
> "limited to preserving the security objectives" suggests that the access of
> the VMM is limited to the components that allow the VMM to preserve the
> security objectives.
>
> Hope that makes sense?
Yes, I get it, thanks.
>>
>>> +security objectives.
>>> +
>>> +In the following diagram, the "<--->" lines represent bi-directional
>>> +communication channels or interfaces between the CoCo security manager and
>>> +the rest of the components (data flow for guest, host, hardware) ::
>>> +
>>> + +-------------------+ +-----------------------+
>>> + | CoCo guest VM |<---->| |
>>> + +-------------------+ | |
>>> + | Interfaces | | CoCo security manager |
>>> + +-------------------+ | |
>>> + | Host VMM |<---->| |
>>> + +-------------------+ | |
>>> + | |
>>> + +--------------------+ | |
>>> + | CoCo platform |<--->| |
>>> + +--------------------+ +-----------------------+
>>> +
>>> +The specific details of the CoCo security manager vastly diverge between
>>> +technologies. For example, in some cases, it will be implemented in HW
>>> +while in others it may be pure SW. In some cases, such as for the
>>> +`Protected kernel-based virtual machine (pKVM) <https://github.com/intel-staging/pKVM-IA>`_,
>>> +the CoCo security manager is a small, isolated and highly privileged
>>> +(compared to the rest of SW running on the host) part of a traditional
>>> +VMM.
>>> +
>>> +Confidential Computing threat model and its security objectives
>>> +===============================================================
>>> +
>>> +Confidential Computing adds a new type of attacker to the above list: a
>>> +potentially misbehaving host (which can also include some part of a
>>> +traditional VMM or all of it), which is typically placed outside of the
>>> +CoCo VM TCB due to its large SW attack surface. It is important to note
>>> +that this doesn’t imply that the host or VMM are intentionally
>>> +malicious, but that there exists a security value in having a small CoCo
>>> +VM TCB. This new type of adversary may be viewed as a more powerful type
>>> +of external attacker, as it resides locally on the same physical machine
>>> +-in contrast to a remote network attacker- and has control over the guest
>>
>> Hyphens (dashes) are not normally used for a parenthetical phrase AFAIK.
>
> Yes, parentheses would be more appropriate.
>
>>
>>> +kernel communication with most of the HW::
>>
>> I would prefer to capitalize "kernel" above.
>
> I'm not sure I follow, we don't capitalize kernel elsewhere, why here?
>
My mistake in reading. :(
Thanks.
--
~Randy
On 6/13/23 19:03, Sean Christopherson wrote:
> On Mon, Jun 12, 2023, Carlos Bilbao wrote:
>> +well as CoCo technology specific hypercalls, if present. Additionally, the
>> +host in a CoCo system typically controls the process of creating a CoCo
>> +guest: it has a method to load into a guest the firmware and bootloader
>> +images, the kernel image together with the kernel command line. All of this
>> +data should also be considered untrusted until its integrity and
>> +authenticity is established via attestation.
>
> Attestation is SNP and TDX specific. AIUI, none of SEV, SEV-ES, or pKVM (which
> doesn't even really exist on x86 yet), have attestation of their own, e.g. the
> proposed pKVM support would rely on Secure Boot of the original "full" host kernel.
Seems to be a bit of misunderstanding here. Secure Boot verifies the
host kernel, which is indeed also important, since the pKVM hypervisor
is a part of the host kernel image. But when it comes to verifying the
guests, it's a different story: a protected pKVM guest is started by the
(untrusted) host at an arbitrary moment in time, not before the early
kernel deprivileging when the host is still considered trusted.
(Moreover, in practice the guest is started by a userspace VMM, i.e. not
exactly the most trusted part of the host stack.) So the host can
maliciously or mistakenly load a wrong guest image to run as a protected
guest, which is why we do need attestation for protected guests.
This attestation is not implemented in pKVM on x86 yet (you are right
that pKVM on x86 is little more than a proposal at this point). But in
pKVM on ARM it is afaik already working, it is software based (ensured
by pKVM hypervisor + a tiny generic guest bootloader which verifies the
guest image before jumping to the guest) and architecture-independent,
so it should be possible to adopt it for x86 as is.
Furthermore, since pKVM on x86 use cases also require assigning physical
secure hardware devices to the protected guest, we need attestation not
just for the guest image itself but also for the secure devices assigned
to it by the host.
On 6/14/23 16:15, Sean Christopherson wrote:
> On Wed, Jun 14, 2023, Elena Reshetova wrote:
>>>> +The specific details of the CoCo security manager vastly diverge between
>>>> +technologies. For example, in some cases, it will be implemented in HW
>>>> +while in others it may be pure SW. In some cases, such as for the
>>>> +`Protected kernel-based virtual machine (pKVM) <https://github.com/intel-
>>> staging/pKVM-IA>`,
>>>> +the CoCo security manager is a small, isolated and highly privileged
>>>> +(compared to the rest of SW running on the host) part of a traditional
>>>> +VMM.
>>>
>>> I say that "virtualized environments" isn't a good description because
>>> while pKVM does utilize hardware virtualization, my understanding is that
>>> the primary use cases for pKVM don't have the same threat model as SNP/TDX,
>>> e.g. IIUC many (most? all?) pKVM guests don't require network access.
>>
>> Not having a network access requirement doesn’t implicitly invalidate the
>> separation guarantees between the host and guest, it just makes it easier
>> since you have one interface less between the host and guest.
>
> My point is that if the protected guest doesn't need any I/O beyond the hardware
> device that it accesses, then the threat model is different because many of the
> new/novel attack surfaces that come with the TDX/SNP threat model don't exist.
> E.g. the hardening that people want to do for VirtIO drivers may not be at all
> relevant to pKVM.
Strictly speaking, the protected pKVM guest does need some I/O beyond
that, e.g. for some (limited and specialized) communication between the
host and the guest, e.g. vsock-based. For example, in the fingerprint
use case, the guest receives requests from the host to capture
fingerprint data from the sensor, sends encrypted fingerprint templates
to the host, and so on.
Additionally, speaking of the hardware device, the guest does not
entirely own it. It has direct exclusive access to the data
communication with the device (ensured by its exclusive access to MMIO
and DMA buffers), but e.g. the device interrupts are forwarded to the
guest by the host, and the PCI config space is virtualized by the host.
But I think I get what you mean: there is no data transfer whereby the
host is not an endpoint but an intermediary between the guest and some
device. In simple words, things like virtio-net or virtio-blk are out of
scope. Yes, I think that's correct for pKVM-on-x86 use cases (and I
suppose it is correct for pKVM-on-ARM use cases as well). I guess it
means that "guest data attacks" may not be relevant to pKVM, and perhaps
this makes its threat model substantially different from cloud use
cases.
However, other kinds of threats described in the doc do seem to be
relevant to pKVM. "Malformed/malicious runtime input" is relevant since
communication channels between the host and the guest do exist, the host
may arbitrarily inject interrupts into the guest, etc. "Guest malicious
configuration" is relevant too, and guest attestation is required, as I
wrote in [1].
Cc'ing android-kvm and some ChromeOS folks to correct me if needed.
> And I don't see any need to formally document pKVM's threat model right *now*.
> pKVM on x86 is little more than a proposal at this point, and while I would love
> to see documentation for pKVM on ARM's threat model, that obviously doesn't belong
> in a doc that's x86 specific.
Agree, and I don't think it makes sense to mention pKVM-on-x86 without
mentioning pKVM-on-ARM, as if pKVM-on-x86 had more in common with cloud
use cases than with pKVM-on-ARM, while quite the opposite is true.
It seems there is no reason why the pKVM-on-x86 threat model should be
different from pKVM-on-ARM's. The use cases on ARM (for Android) and on
x86 (for ChromeOS) are somewhat different at this moment (in that in
ChromeOS use cases the protected guest's sensitive data also includes
data coming directly from a physical device), but IIUC they are
converging now, i.e. Android is getting interested in use cases with
physical devices too.
>>>> +potentially misbehaving host (which can also include some part of a
>>>> +traditional VMM or all of it), which is typically placed outside of the
>>>> +CoCo VM TCB due to its large SW attack surface. It is important to note
>>>> +that this doesn’t imply that the host or VMM are intentionally
>>>> +malicious, but that there exists a security value in having a small CoCo
>>>> +VM TCB. This new type of adversary may be viewed as a more powerful type
>>>> +of external attacker, as it resides locally on the same physical machine
>>>> +-in contrast to a remote network attacker- and has control over the guest
>>>> +kernel communication with most of the HW::
>>>
>>> IIUC, this last statement doesn't hold true for the pKVM on x86 use case, which
>>> specifically aims to give a "guest" exclusive access to hardware resources.
>>
>> Does it hold for *all* HW resources? If yes, indeed this would make pKVM on
>> x86 considerably different.
>
> Heh, the original says "most", so it doesn't have to hold for all hardware resources,
> just a simple majority.
Again, pedantic mode on, I find it difficult to agree with the wording
that the guest owns "most of" the HW resources it uses. It controls the
data communication with its hardware device, but other resources (e.g.
CPU time, interrupts, timers, PCI config space, ACPI) are owned by the
host and virtualized by it for the guest.
[1] https://lore.kernel.org/all/[email protected]/
On Fri, Jun 16, 2023, Dmytro Maluka wrote:
> On 6/14/23 16:15, Sean Christopherson wrote:
> > On Wed, Jun 14, 2023, Elena Reshetova wrote:
> >> Not having a network access requirement doesn’t implicitly invalidate the
> >> separation guarantees between the host and guest, it just makes it easier
> >> since you have one interface less between the host and guest.
> >
> > My point is that if the protected guest doesn't need any I/O beyond the hardware
> > device that it accesses, then the threat model is different because many of the
> > new/novel attack surfaces that come with the TDX/SNP threat model don't exist.
> > E.g. the hardening that people want to do for VirtIO drivers may not be at all
> > relevant to pKVM.
...
> But I think I get what you mean: there is no data transfer whereby the
> host is not an endpoint but an intermediary between the guest and some
> device. In simple words, things like virtio-net or virtio-blk are out of
> scope. Yes, I think that's correct for pKVM-on-x86 use cases (and I
> suppose it is correct for pKVM-on-ARM use cases as well). I guess it
> means that "guest data attacks" may not be relevant to pKVM, and perhaps
> this makes its threat model substantially different from cloud use
> cases.
Yes.
> >>>> +This new type of adversary may be viewed as a more powerful type
> >>>> +of external attacker, as it resides locally on the same physical machine
> >>>> +-in contrast to a remote network attacker- and has control over the guest
> >>>> +kernel communication with most of the HW::
> >>>
> >>> IIUC, this last statement doesn't hold true for the pKVM on x86 use case, which
> >>> specifically aims to give a "guest" exclusive access to hardware resources.
> >>
> >> Does it hold for *all* HW resources? If yes, indeed this would make pKVM on
> >> x86 considerably different.
> >
> > Heh, the original says "most", so it doesn't have to hold for all hardware resources,
> > just a simple majority.
>
> Again, pedantic mode on, I find it difficult to agree with the wording
> that the guest owns "most of" the HW resources it uses. It controls the
> data communication with its hardware device, but other resources (e.g.
> CPU time, interrupts, timers, PCI config space, ACPI) are owned by the
> host and virtualized by it for the guest.
I wasn't saying that the guest owns most resources, I was saying that the *untrusted*
host does *not* own most resources that are exposed to the guest. My understanding
is that everything in your list is owned by the trusted hypervisor in the pKVM model.
What I was pointing out is related to the above discussion about the guest needing
access to hardware that is effectively owned by the untrusted host, e.g. network
access.
On Fri, Jun 16, 2023 at 8:56 AM Sean Christopherson <[email protected]> wrote:
>
> On Fri, Jun 16, 2023, Dmytro Maluka wrote:
> > On 6/14/23 16:15, Sean Christopherson wrote:
> > > On Wed, Jun 14, 2023, Elena Reshetova wrote:
> > >> Not having a network access requirement doesn’t implicitly invalidate the
> > >> separation guarantees between the host and guest, it just makes it easier
> > >> since you have one interface less between the host and guest.
> > >
> > > My point is that if the protected guest doesn't need any I/O beyond the hardware
> > > device that it accesses, then the threat model is different because many of the
> > > new/novel attack surfaces that come with the TDX/SNP threat model don't exist.
> > > E.g. the hardening that people want to do for VirtIO drivers may not be at all
> > > relevant to pKVM.
>
> ...
>
> > But I think I get what you mean: there is no data transfer whereby the
> > host is not an endpoint but an intermediary between the guest and some
> > device. In simple words, things like virtio-net or virtio-blk are out of
> > scope. Yes, I think that's correct for pKVM-on-x86 use cases (and I
> > suppose it is correct for pKVM-on-ARM use cases as well). I guess it
> > means that "guest data attacks" may not be relevant to pKVM, and perhaps
> > this makes its threat model substantially different from cloud use
> > cases.
>
> Yes.
>
> > >>>> +This new type of adversary may be viewed as a more powerful type
> > >>>> +of external attacker, as it resides locally on the same physical machine
> > >>>> +-in contrast to a remote network attacker- and has control over the guest
> > >>>> +kernel communication with most of the HW::
> > >>>
> > >>> IIUC, this last statement doesn't hold true for the pKVM on x86 use case, which
> > >>> specifically aims to give a "guest" exclusive access to hardware resources.
> > >>
> > >> Does it hold for *all* HW resources? If yes, indeed this would make pKVM on
> > >> x86 considerably different.
> > >
> > > Heh, the original says "most", so it doesn't have to hold for all hardware resources,
> > > just a simple majority.
> >
> > Again, pedantic mode on, I find it difficult to agree with the wording
> > that the guest owns "most of" the HW resources it uses. It controls the
> > data communication with its hardware device, but other resources (e.g.
> > CPU time, interrupts, timers, PCI config space, ACPI) are owned by the
> > host and virtualized by it for the guest.
>
> I wasn't saying that the guest owns most resources, I was saying that the *untrusted*
> host does *not* own most resources that are exposed to the guest. My understanding
> is that everything in your list is owned by the trusted hypervisor in the pKVM model.
>
> What I was pointing out is related to the above discussion about the guest needing
> access to hardware that is effectively owned by the untrusted host, e.g. network
> access.
The network case isn't a great example because it is common for user
space applications not to trust the network and to use verification
schemes like TLS where trust of the network is not required, so the
trusted guest could use these strategies when needed. There wouldn't
be any availability guarantees, but my understanding is that isn't in
scope for pKVM.
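The TLS strategy above is just ordinary end-to-end security applied with the
extra assumption that the NIC and host in the path are hostile. A generic
sketch using Python's standard ssl module (nothing CoCo-specific is assumed
here):

```python
import ssl

# Build a client-side TLS context that places no trust in the transport:
# the peer's certificate chain and hostname are always verified, so an
# untrusted host/NIC in the path can at worst deny service, not read or
# tamper with the traffic undetected.
ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
assert ctx.verify_mode == ssl.CERT_REQUIRED
assert ctx.check_hostname is True

# Refuse legacy protocol versions an untrusted middlebox might try to force.
ctx.minimum_version = ssl.TLSVersion.TLSv1_2
```

As noted, this gives confidentiality and integrity but no availability: a
malicious host can always drop the connection.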
Consider the case where the host owns a TPM and the guest has to
cooperate with the host to communicate with it. There are schemes for
establishing trust between the TPM and the trusted guest with various
properties (authentication, confidentiality, integrity, etc.). This
does have the downside of additional complexity, but comes with the
benefit of also being resistant to attacks like monitoring the SPI
lines going to the TPM.
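One simple instance of such a scheme is authenticating every message that
crosses the host-mediated channel with a session key established out of band
(e.g. during attestation). The key provisioning below is hypothetical, and
the sketch covers integrity only, not confidentiality:

```python
import hashlib
import hmac

# Assume the guest and the secure device (e.g. a TPM) share a session key
# provisioned out of band; the untrusted host only ferries opaque
# (tag, message) pairs between the two endpoints.
session_key = b"provisioned-during-attestation"  # hypothetical

def seal(msg: bytes) -> tuple[bytes, bytes]:
    """Attach an HMAC tag so the receiver can detect host tampering."""
    return hmac.new(session_key, msg, hashlib.sha256).digest(), msg

def open_sealed(tag: bytes, msg: bytes) -> bytes:
    """Reject any message whose tag does not verify."""
    expected = hmac.new(session_key, msg, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("message was modified in transit by the host")
    return msg

tag, msg = seal(b"tpm command")
assert open_sealed(tag, msg) == b"tpm command"   # untampered delivery succeeds
try:
    open_sealed(tag, b"tpm command (modified)")  # host tampering is detected
except ValueError:
    pass
```

A real scheme would additionally encrypt the payload and protect against
replay, but the integrity check alone already removes the host's ability to
silently alter guest-device traffic.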
Did you have particular situations in mind for resources that would be
owned by the host and needed by the trusted guest?
On Fri, Jun 16, 2023, Dmytro Maluka wrote:
> On 6/13/23 19:03, Sean Christopherson wrote:
> > On Mon, Jun 12, 2023, Carlos Bilbao wrote:
> >> +well as CoCo technology specific hypercalls, if present. Additionally, the
> >> +host in a CoCo system typically controls the process of creating a CoCo
> >> +guest: it has a method to load into a guest the firmware and bootloader
> >> +images, the kernel image together with the kernel command line. All of this
> >> +data should also be considered untrusted until its integrity and
> >> +authenticity is established via attestation.
> >
> > Attestation is SNP and TDX specific. AIUI, none of SEV, SEV-ES, or pKVM (which
> > doesn't even really exist on x86 yet), have attestation of their own, e.g. the
> > proposed pKVM support would rely on Secure Boot of the original "full" host kernel.
>
> Seems to be a bit of misunderstanding here. Secure Boot verifies the
> host kernel, which is indeed also important, since the pKVM hypervisor
> is a part of the host kernel image. But when it comes to verifying the
> guests, it's a different story: a protected pKVM guest is started by the
> (untrusted) host at an arbitrary moment in time, not before the early
> kernel deprivileging when the host is still considered trusted.
> (Moreover, in practice the guest is started by a userspace VMM, i.e. not
> exactly the most trusted part of the host stack.) So the host can
> maliciously or mistakenly load a wrong guest image for running as a
> protected guest, so we do need attestation for protected guests.
>
> This attestation is not implemented in pKVM on x86 yet (you are right
> that pKVM on x86 is little more than a proposal at this point). But in
> pKVM on ARM it is afaik already working, it is software based (ensured
> by pKVM hypervisor + a tiny generic guest bootloader which verifies the
> guest image before jumping to the guest) and architecture-independent,
> so it should be possible to adopt it for x86 as is.
Sorry, instead of "Attestation is SNP and TDX specific", I should have said, "The
form of attestation described here is SNP and TDX specific".
pKVM's "attestation", effectively has its root of trust in the pKVM hypervisor,
which is in turn attested via Secure Boot. I.e. the guest payload is verified
*before* it is launched.
That is different from SNP and TDX where guest code and data is controlled by the
*untrusted* host. The initial payload is measured by trusted firmware, but it is
not verified, and so that measurement must be attested after the guest boots,
before any sensitive data is provisioned to the guest.
Specifically, with "untrusted" inserted by me for clarification, my understanding
is that this doesn't hold true for pKVM when splitting hairs:
Additionally, the **untrusted** host in a CoCo system typically controls the
process of creating a CoCo guest: it has a method to load into a guest the
firmware and bootloader images, the kernel image together with the kernel
command line. All of this data should also be considered untrusted until its
integrity and authenticity is established via attestation.
because the guest firmware comes from a trusted entity, not the untrusted host.
On Fri, Jun 16, 2023, Allen Webb wrote:
> On Fri, Jun 16, 2023 at 8:56 AM Sean Christopherson <[email protected]> wrote:
> >
> > On Fri, Jun 16, 2023, Dmytro Maluka wrote:
> > > On 6/14/23 16:15, Sean Christopherson wrote:
> > > > On Wed, Jun 14, 2023, Elena Reshetova wrote:
> > > >>>> +This new type of adversary may be viewed as a more powerful type
> > > >>>> +of external attacker, as it resides locally on the same physical machine
> > > >>>> +-in contrast to a remote network attacker- and has control over the guest
> > > >>>> +kernel communication with most of the HW::
> > > >>>
> > > >>> IIUC, this last statement doesn't hold true for the pKVM on x86 use case, which
> > > >>> specifically aims to give a "guest" exclusive access to hardware resources.
> > > >>
> > > >> Does it hold for *all* HW resources? If yes, indeed this would make pKVM on
> > > >> x86 considerably different.
> > > >
> > > > Heh, the original says "most", so it doesn't have to hold for all hardware resources,
> > > > just a simple majority.
> > >
> > > Again, pedantic mode on, I find it difficult to agree with the wording
> > > that the guest owns "most of" the HW resources it uses. It controls the
> > > data communication with its hardware device, but other resources (e.g.
> > > CPU time, interrupts, timers, PCI config space, ACPI) are owned by the
> > > host and virtualized by it for the guest.
> >
> > I wasn't saying that the guest owns most resources, I was saying that the *untrusted*
> > host does *not* own most resources that are exposed to the guest. My understanding
> > is that everything in your list is owned by the trusted hypervisor in the pKVM model.
> >
> > What I was pointing out is related to the above discussion about the guest needing
> > access to hardware that is effectively owned by the untrusted host, e.g. network
> > access.
>
> The network case isn't a great example because it is common for user
> space applications not to trust the network and to use verification
> schemes like TLS where trust of the network is not required, so the
> trusted guest could use these strategies when needed.
There's a bit of context/history that isn't captured here. The network being
untrusted isn't new/novel in the SNP/TDX threat model, what's new is that the
network *device* is untrusted.
In the SNP/TDX world, the NIC is likely to be a synthetic, virtual device that is
provided by the untrusted VMM. Pre-SNP/TDX, input from the device, i.e. the VMM,
is trusted; the guest still needs to use e.g. TLS to secure network traffic, but
the device configuration and whatnot is fully trusted. When the VMM is no longer
trusted, the device itself is no longer trusted.
To address that, the folks working on SNP and TDX started posting patches[1][2]
to harden kernel drivers against bad device configurations and whatnot, but without
first getting community buy-in on this new threat model, which led us here[3].
There is no equivalent in existing userspace applications, because userspace's
memory is not private, i.e. the kernel doesn't need to mount Iago attacks to compromise
userspace; the kernel can simply read whatever memory it wants.
And for pKVM, my understanding is that devices and configuration information that
are exposed to the guest are trusted and/or verified in some way, i.e. the points
of contention that led to this doc don't necessarily apply to the pKVM use case.
[1] https://lore.kernel.org/linux-iommu/[email protected]
[2] https://lore.kernel.org/all/[email protected]
[3] https://lore.kernel.org/lkml/DM8PR11MB57505481B2FE79C3D56C9201E7CE9@DM8PR11MB5750.namprd11.prod.outlook.com
On Fri, Jun 16, 2023 at 9:42 AM Sean Christopherson <[email protected]> wrote:
>
>
> There's a bit of context/history that isn't captured here. The network being
> untrusted isn't new/novel in the SNP/TDX threat model, what's new is that the
> network *device* is untrusted.
>
> In the SNP/TDX world, the NIC is likely to be a synthetic, virtual device that is
> provided by the untrusted VMM. Pre-SNP/TDX, input from the device, i.e. the VMM,
> is trusted; the guest still needs to use e.g. TLS to secure network traffic, but
> the device configuration and whatnot is fully trusted. When the VMM is no longer
> trusted, the device itself is no longer trusted.
>
> To address that, the folks working on SNP and TDX started posting patches[1][2]
> to harden kernel drivers against bad device configurations and whatnot, but without
> first getting community buy-in on this new threat model, which led us here[3].
>
> There is no equivalent in existing userspace applications, because userspace's
> memory is not private, i.e. the kernel doesn't need to do Iago attacks to compromise
> userspace, the kernel can simply read whatever memory it wants.
>
> And for pKVM, my understanding is that devices and configuration information that
> are exposed to the guest are trusted and/or verified in some way, i.e. the points
> of contention that led to this doc don't necessarily apply to the pKVM use case.
That extra context helps, so the hardening is on the side of the guest
kernel since the host kernel isn't trusted?
My biggest concerns would be around situations where devices have
memory access for things like DMA. In such cases the guest would need
to be protected from the devices so bounce buffers or some limited
shared memory might need to be set up to facilitate these devices
without breaking the goals of pKVM.
The minimum starting point for something like this would be a shared
memory region visible to both the guest and the host. Given that, it
should be possible to build communication primitives on top. But yes,
ideally something like vsock or virtio would just work without
introducing risk of exploitation, even though typically the hypervisor is
trusted. Maybe this could be modeled as sibling-to-sibling
virtio/vsock?
>
> [1] https://lore.kernel.org/linux-iommu/[email protected]
> [2] https://lore.kernel.org/all/[email protected]
> [3] https://lore.kernel.org/lkml/DM8PR11MB57505481B2FE79C3D56C9201E7CE9@DM8PR11MB5750.namprd11.prod.outlook.com
On 6/16/23 15:56, Sean Christopherson wrote:
> On Fri, Jun 16, 2023, Dmytro Maluka wrote:
>>
>> Again, pedantic mode on, I find it difficult to agree with the wording
>> that the guest owns "most of" the HW resources it uses. It controls the
>> data communication with its hardware device, but other resources (e.g.
>> CPU time, interrupts, timers, PCI config space, ACPI) are owned by the
>> host and virtualized by it for the guest.
>
> I wasn't saying that the guest owns most resources, I was saying that the *untrusted*
> host does *not* own most resources that are exposed to the guest. My understanding
> is that everything in your list is owned by the trusted hypervisor in the pKVM model.
Heh, no. Most of these resources are owned by the untrusted host, that's
the point.
Basically for two reasons: 1. we want to keep the trusted hypervisor as
simple as possible. 2. we don't need availability guarantees.
The trusted hypervisor owns only: 2nd-stage MMU, IOMMU, VMCS (or its
counterparts on non-Intel), physical PCI config space (merely for
controlling a few critical registers like BARs and MSI address
registers), perhaps a few more things that don't come to my mind now.
The untrusted host schedules its guests on physical CPUs (i.e. the
host's L1 vCPUs are 1:1 mapped onto pCPUs), while the trusted hypervisor
has no scheduling, it only handles vmexits from the host and guests. The
untrusted host fully controls the physical interrupt controllers (I
think we realize that is not perfectly fine, but here we are), etc.
> What I was pointing out is related to the above discussion about the guest needing
> access to hardware that is effectively owned by the untrusted host, e.g. network
> access.
On 6/16/23 16:20, Sean Christopherson wrote:
> On Fri, Jun 16, 2023, Dmytro Maluka wrote:
>> On 6/13/23 19:03, Sean Christopherson wrote:
>>> On Mon, Jun 12, 2023, Carlos Bilbao wrote:
>>>> +well as CoCo technology specific hypercalls, if present. Additionally, the
>>>> +host in a CoCo system typically controls the process of creating a CoCo
>>>> +guest: it has a method to load into a guest the firmware and bootloader
>>>> +images, the kernel image together with the kernel command line. All of this
>>>> +data should also be considered untrusted until its integrity and
>>>> +authenticity is established via attestation.
>>>
>>> Attestation is SNP and TDX specific. AIUI, none of SEV, SEV-ES, or pKVM (which
>>> doesn't even really exist on x86 yet), have attestation of their own, e.g. the
>>> proposed pKVM support would rely on Secure Boot of the original "full" host kernel.
>>
>> Seems to be a bit of misunderstanding here. Secure Boot verifies the
>> host kernel, which is indeed also important, since the pKVM hypervisor
>> is a part of the host kernel image. But when it comes to verifying the
>> guests, it's a different story: a protected pKVM guest is started by the
>> (untrusted) host at an arbitrary moment in time, not before the early
>> kernel deprivileging when the host is still considered trusted.
>> (Moreover, in practice the guest is started by a userspace VMM, i.e. not
>> exactly the most trusted part of the host stack.) So the host can
>> maliciously or mistakenly load a wrong guest image for running as a
>> protected guest, so we do need attestation for protected guests.
>>
>> This attestation is not implemented in pKVM on x86 yet (you are right
>> that pKVM on x86 is little more than a proposal at this point). But in
>> pKVM on ARM it is afaik already working, it is software based (ensured
>> by pKVM hypervisor + a tiny generic guest bootloader which verifies the
>> guest image before jumping to the guest) and architecture-independent,
>> so it should be possible to adopt it for x86 as is.
>
> Sorry, instead of "Attestation is SNP and TDX specific", I should have said, "The
> form of attestation described here is SNP and TDX specific".
>
> pKVM's "attestation", effectively has its root of trust in the pKVM hypervisor,
> which is in turn attested via Secure Boot. I.e. the guest payload is verified
> *before* it is launched.
Got it, fair point. Yep, I think this understanding is fully correct.
> That is different from SNP and TDX where guest code and data is controlled by the
> *untrusted* host. The initial payload is measured by trusted firmware, but it is
> not verified, and so that measurement must be attested after the guest boots,
> before any sensitive data is provisioned to the guest.
>
> Specifically, with "untrusted" inserted by me for clarification, my understanding
> is that this doesn't hold true for pKVM when splitting hairs:
>
> Additionally, the **untrusted** host in a CoCo system typically controls the
> process of creating a CoCo guest: it has a method to load into a guest the
> firmware and bootloader images, the kernel image together with the kernel
> command line. All of this data should also be considered untrusted until its
> integrity and authenticity is established via attestation.
>
> because the guest firmware comes from a trusted entity, not the untrusted host.
On Fri, Jun 16, 2023, Dmytro Maluka wrote:
> On 6/16/23 15:56, Sean Christopherson wrote:
> > I wasn't saying that the guest owns most resources, I was saying that the *untrusted*
> > host does *not* own most resources that are exposed to the guest. My understanding
> > is that everything in your list is owned by the trusted hypervisor in the pKVM model.
>
> Heh, no. Most of these resources are owned by the untrusted host, that's
> the point.
Ah, I was overloading "owned", probably wrongly. What I'm trying to call out is
that in pKVM, while the untrusted host can withhold resources, it can't subvert
most of those resources. Taking scheduling as an example, a pKVM vCPU may be
migrated to a different pCPU by the untrusted host, but pKVM ensures that it is
safe to run on the new pCPU, e.g. on Intel, pKVM (presumably) does any necessary
VMCLEAR, IBPB, INVEPT, etc. to ensure the vCPU doesn't consume stale data.
> Basically for two reasons: 1. we want to keep the trusted hypervisor as
> simple as possible. 2. we don't need availability guarantees.
>
> The trusted hypervisor owns only: 2nd-stage MMU, IOMMU, VMCS (or its
> counterparts on non-Intel), physical PCI config space (merely for
> controlling a few critical registers like BARs and MSI address
> registers), perhaps a few more things that don't come to my mind now.
The "physical PCI config space" is a key difference, and is very relevant to this
doc (see my response to Allen).
> The untrusted host schedules its guests on physical CPUs (i.e. the
> host's L1 vCPUs are 1:1 mapped onto pCPUs), while the trusted hypervisor
> has no scheduling, it only handles vmexits from the host and guests. The
> untrusted host fully controls the physical interrupt controllers (I
> think we realize that is not perfectly fine, but here we are), etc.
Yeah, IRQs are a tough nut to crack.
On 6/16/23 20:07, Sean Christopherson wrote:
>
> Ah, I was overloading "owned", probably wrongly. What I'm trying to call out is
> that in pKVM, while the untrusted host can withhold resources, it can't subvert
> most of those resources. Taking scheduling as an example, a pKVM vCPU may be
> migrated to a different pCPU by the untrusted host, but pKVM ensures that it is
> safe to run on the new pCPU, e.g. on Intel, pKVM (presumably) does any necessary
> VMCLEAR, IBPB, INVEPT, etc. to ensure the vCPU doesn't consume stale data.
Yep, agree.
>> Basically for two reasons: 1. we want to keep the trusted hypervisor as
>> simple as possible. 2. we don't need availability guarantees.
>>
>> The trusted hypervisor owns only: 2nd-stage MMU, IOMMU, VMCS (or its
>> counterparts on non-Intel), physical PCI config space (merely for
>> controlling a few critical registers like BARs and MSI address
>> registers), perhaps a few more things that don't come to my mind now.
>
> The "physical PCI config space" is a key difference, and is very relevant to this
> doc (see my response to Allen).
Yeah, thanks for the links and the context, BTW.
But let me clarify that there are two things here that should not be
confused with each other: we have two levels of virtualization of the PCI
config space in pKVM. The hypervisor traps the host's accesses to the
config space, but mostly it simply passes them through to hardware. Most
importantly, when the host reprograms a BAR, the hypervisor makes sure
to update the corresponding MMIO mappings in the host's and the guest's
2nd-level page tables (that is what makes protection of the protected
guest's passthrough PCI devices possible at all). But essentially it's
the host that manages the physical config space. And the host, in turn,
virtualizes it for the guest, using vfio-pci, like it is traditionally
done for passthrough PCI devices.
This latter, emulated config space is the concern. Looking at the
patches [1] and thinking about whether those MSI-X misconfiguration
attacks are possible in pKVM, I come to the conclusion that yes, they are.
Device attestation helps with trusting/verifying static information, but
the dynamically changing config space is something different.
So it seems that such "emulated PCI config misconfiguration attacks"
need to be included in the threat model for pKVM as well, i.e. need to
be hardened on the guest side. Unless we revisit our current design
assumptions for device assignment in pKVM on x86 and manage the physical
PCI config in the trusted hypervisor, not in the host (with all the
increasing complexity that comes with that, related to power management
and other things).
Also, thinking more about it: irrespective of passthrough devices, I
guess that the protected pKVM guest may well want to use virtio with PCI
transport (not for things like networking, but that's not the point),
thus be prone to the same attacks.
>> The untrusted host schedules its guests on physical CPUs (i.e. the
>> host's L1 vCPUs are 1:1 mapped onto pCPUs), while the trusted hypervisor
>> has no scheduling, it only handles vmexits from the host and guests. The
>> untrusted host fully controls the physical interrupt controllers (I
>> think we realize that is not perfectly fine, but here we are), etc.
>
> Yeah, IRQs are a tough nut to crack.
And BTW, doesn't it mean that interrupts also need to be hardened in the
guest (if we don't want the complexity of interrupt controllers in the
trusted hypervisor)? At least sensitive ones like IPIs, but I guess we
should also consider interrupt-based timing attacks, which could use
any type of interrupt. (I have no idea how to harden either of the two
cases, but I'm no expert.)
[1] https://lore.kernel.org/all/[email protected]/
On 6/16/23 17:16, Allen Webb wrote:
> That extra context helps, so the hardening is on the side of the guest
> kernel since the host kernel isn't trusted?
>
> My biggest concerns would be around situations where devices have
> memory access for things like DMA. In such cases the guest would need
> to be protected from the devices so bounce buffers or some limited
> shared memory might need to be set up to facilitate these devices
> without breaking the goals of pKVM.
I'm assuming you are talking about cases where we want a host-owned
device, e.g. a TPM from your example, to be able to DMA to guest
memory (please correct me if you mean something different). I think with
pKVM it should already be possible to do this securely and without extra
hardening in the guest (modulo establishing trust between the guest and
the TPM, which you mentioned, but that is needed anyway?). The
hypervisor in any case ensures protection of the guest memory from the
host devices' DMA via the IOMMU. Also, the hypervisor allows the guest to
explicitly share its memory pages with the host via a hypercall. Those
shared pages, and only those, become accessible to the host devices' DMA
as well.
P.S. I know that on chromebooks the TPM can't possibly do DMA. :)
> The minimum starting point for something like this would be a shared
> memory region visible to both the guest and the host. Given that, it
> should be possible to build communication primitives on top. But yes,
> ideally something like vsock or virtio would just work without
> introducing risk of exploitation, even though typically the hypervisor is
> trusted. Maybe this could be modeled as sibling-to-sibling
> virtio/vsock?
> On 6/16/23 20:07, Sean Christopherson wrote:
> > On Fri, Jun 16, 2023, Dmytro Maluka wrote:
> >> On 6/16/23 15:56, Sean Christopherson wrote:
> >>> On Fri, Jun 16, 2023, Dmytro Maluka wrote:
> >>>> Again, pedantic mode on, I find it difficult to agree with the wording
> >>>> that the guest owns "most of" the HW resources it uses. It controls the
> >>>> data communication with its hardware device, but other resources (e.g.
> >>>> CPU time, interrupts, timers, PCI config space, ACPI) are owned by the
> >>>> host and virtualized by it for the guest.
> >>>
> >>> I wasn't saying that the guest owns most resources, I was saying that the *untrusted*
> >>> host does *not* own most resources that are exposed to the guest. My understanding
> >>> is that everything in your list is owned by the trusted hypervisor in the pKVM model.
> >>
> >> Heh, no. Most of these resources are owned by the untrusted host, that's
> >> the point.
> >
> > Ah, I was overloading "owned", probably wrongly. What I'm trying to call out is
> > that in pKVM, while the untrusted host can withhold resources, it can't subvert
> > most of those resources. Taking scheduling as an example, a pKVM vCPU may be
> > migrated to a different pCPU by the untrusted host, but pKVM ensures that it is
> > safe to run on the new pCPU, e.g. on Intel, pKVM (presumably) does any necessary
> > VMCLEAR, IBPB, INVEPT, etc. to ensure the vCPU doesn't consume stale data.
>
> Yep, agree.
>
> >> Basically for two reasons: 1. we want to keep the trusted hypervisor as
> >> simple as possible. 2. we don't need availability guarantees.
> >>
> >> The trusted hypervisor owns only: 2nd-stage MMU, IOMMU, VMCS (or its
> >> counterparts on non-Intel), physical PCI config space (merely for
> >> controlling a few critical registers like BARs and MSI address
> >> registers), perhaps a few more things that don't come to my mind now.
> >
> > The "physical PCI config space" is a key difference, and is very relevant to this
> > doc (see my response to Allen).
>
> Yeah, thanks for the links and the context, BTW.
>
> But let me clarify that we have 2 things here that should not be
> confused with each other. We have 2 levels of virtualization of the PCI
> config space in pKVM. The hypervisor traps the host's accesses to the
> config space, but mostly it simply passes them through to hardware. Most
> importantly, when the host reprograms a BAR, the hypervisor makes sure
> to update the corresponding MMIO mappings in the host's and the guest's
> 2nd-level page tables (that is what makes protection of the protected
> guest's passthrough PCI devices possible at all). But essentially it's
> the host that manages the physical config space. And the host, in turn,
> virtualizes it for the guest, using vfio-pci, like it is traditionally
> done for passthrough PCI devices.
>
> This latter, emulated config space is the concern. Looking at the
> patches [1] and thinking if those MSI-X misconfiguration attacks are
> possible in pKVM, I come to the conclusion that yes, they are.
>
> Device attestation helps with trusting/verifying static information, but
> the dynamically changing config space is something different.
>
> So it seems that such "emulated PCI config misconfiguration attacks"
> need to be included in the threat model for pKVM as well, i.e. need to
> be hardened on the guest side. Unless we revisit our current design
> assumptions for device assignment in pKVM on x86 and manage the physical
> PCI config in the trusted hypervisor, not in the host (with all the
> increasing complexity that comes with that, related to power management
> and other things).
Thank you very much, Dmytro, for the clarification on this and many other
points when it comes to pKVM. It helps greatly to get us on the same
page.
>
> Also, thinking more about it: irrespective of passthrough devices, I
> guess that the protected pKVM guest may well want to use virtio with PCI
> transport (not for things like networking, but that's not the point),
> thus be prone to the same attacks.
>
> >> The untrusted host schedules its guests on physical CPUs (i.e. the
> >> host's L1 vCPUs are 1:1 mapped onto pCPUs), while the trusted hypervisor
> >> has no scheduling, it only handles vmexits from the host and guests. The
> >> untrusted host fully controls the physical interrupt controllers (I
> >> think we realize that is not perfectly fine, but here we are), etc.
> >
> > Yeah, IRQs are a tough nut to crack.
>
> And BTW, doesn't it mean that interrupts also need to be hardened in the
> guest (if we don't want the complexity of interrupt controllers in the
> trusted hypervisor)? At least sensitive ones like IPIs, but I guess we
> should also consider interrupt-based timing attacks, which could use
> any type of interrupt. (I have no idea how to harden either of the two
> cases, but I'm no expert.)
We have been thinking about this a bit, at least when it comes to our
TDX case. Two main issues were identified: interrupts contributing
to the state of the Linux PRNG [1] and the potential implications of
missing interrupts for reliable panic and other kernel use cases [2].
[1] https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html#randomness-inside-tdx-guest
[2] https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html#reliable-panic
For the first one, in addition to simply enforcing usage of RDSEED
for TDX guests, we still want to do a proper evaluation of the security
of the Linux PRNG under our threat model. The second one is
harder to reliably assess imo, but so far we have not been able to find any
concrete attack vectors. But it would be good if people who
have expertise in this could take a look at the assessment we did.
The logic was to go over all kernel core callers of the various
smp_call_function* and on_each_cpu* variants and check the implications
if such an IPI is never delivered.
Best Regards,
Elena.
On 6/19/23 13:23, Reshetova, Elena wrote:
>> And BTW, doesn't it mean that interrupts also need to be hardened in the
>> guest (if we don't want the complexity of interrupt controllers in the
>> trusted hypervisor)? At least sensitive ones like IPIs, but I guess we
>> should also consider interrupt-based timing attacks, which could use
>> any type of interrupt. (I have no idea how to harden either of the two
>> cases, but I'm no expert.)
>
> We have been thinking about this a bit, at least when it comes to our
> TDX case. Two main issues were identified: interrupts contributing
> to the state of the Linux PRNG [1] and the potential implications of
> missing interrupts for reliable panic and other kernel use cases [2].
>
> [1] https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html#randomness-inside-tdx-guest
> [2] https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html#reliable-panic
>
> For the first one, in addition to simply enforcing usage of RDSEED
> for TDX guests, we still want to do a proper evaluation of the security
> of the Linux PRNG under our threat model. The second one is
> harder to reliably assess imo, but so far we have not been able to find any
> concrete attack vectors. But it would be good if people who
> have expertise in this could take a look at the assessment we did.
> The logic was to go over all kernel core callers of the various
> smp_call_function* and on_each_cpu* variants and check the implications
> if such an IPI is never delivered.
Thanks. I also had in mind, for example, [1].
[1] https://people.cs.kuleuven.be/~jo.vanbulck/ccs18.pdf
On 6/12/23 11:47, Carlos Bilbao wrote:
> Kernel developers working on confidential computing for virtualized
> environments in x86 operate under a set of assumptions regarding the Linux
> kernel threat model that differs from the traditional view. Historically,
> the Linux threat model acknowledges attackers residing in userspace, as
> well as a limited set of external attackers that are able to interact with
> the kernel through networking or limited HW-specific exposed interfaces
> (e.g. USB, thunderbolt). The goal of this document is to explain additional
> attack vectors that arise in the virtualized confidential computing space
> and discuss the proposed protection mechanisms for the Linux kernel.
To expedite things, I'm going to outline the changes to make for v3 based
on the given feedback. Please take a look and let me know if I'm missing
something. Changes for v3:
- Remove pKVM from the document. Although there are clear overlaps in the
threat models (as the discussions have shown), it might be good to omit
pKVM for now to avoid further complexity. In the future, when pKVM is
more mature, we can revisit and discuss its inclusion.
- Change file name to "snp-tdx-threat-model.rst".
- Replace hyphens (dashes) with parentheses in a parenthetical sentence.
- Change "technology specific" to "technology-specific".
>
> Reviewed-by: Larry Dewey <[email protected]>
> Reviewed-by: David Kaplan <[email protected]>
> Co-developed-by: Elena Reshetova <[email protected]>
> Signed-off-by: Elena Reshetova <[email protected]>
> Signed-off-by: Carlos Bilbao <[email protected]>
> ---
>
> V1 can be found in:
> https://lore.kernel.org/lkml/[email protected]/
> Changes since v1:
>
> - Apply feedback from first version of the patch
> - Clarify that the document applies only to a particular angle of
> confidential computing, namely confidential computing for virtualized
> environments. Also, state that the document is specific to x86 and
> that the main goal is to discuss the emerging threats.
> - Change commit message and file name accordingly
> - Replace AMD's link to AMD SEV SNP white paper
> - Minor tweaking and clarifications
>
> ---
> Documentation/security/index.rst | 1 +
> .../security/x86-confidential-computing.rst | 298 ++++++++++++++++++
> MAINTAINERS | 6 +
> 3 files changed, 305 insertions(+)
> create mode 100644 Documentation/security/x86-confidential-computing.rst
>
> diff --git a/Documentation/security/index.rst b/Documentation/security/index.rst
> index 6ed8d2fa6f9e..bda919aecb37 100644
> --- a/Documentation/security/index.rst
> +++ b/Documentation/security/index.rst
> @@ -6,6 +6,7 @@ Security Documentation
> :maxdepth: 1
>
> credentials
> + x86-confidential-computing
> IMA-templates
> keys/index
> lsm
> diff --git a/Documentation/security/x86-confidential-computing.rst b/Documentation/security/x86-confidential-computing.rst
> new file mode 100644
> index 000000000000..5c52b8888089
> --- /dev/null
> +++ b/Documentation/security/x86-confidential-computing.rst
> @@ -0,0 +1,298 @@
> +======================================================
> +Confidential Computing in Linux for x86 virtualization
> +======================================================
> +
> +.. contents:: :local:
> +
> +By: Elena Reshetova <[email protected]> and Carlos Bilbao <[email protected]>
> +
> +Motivation
> +==========
> +
> +Kernel developers working on confidential computing for virtualized
> +environments in x86 operate under a set of assumptions regarding the Linux
> +kernel threat model that differ from the traditional view. Historically,
> +the Linux threat model acknowledges attackers residing in userspace, as
> +well as a limited set of external attackers that are able to interact with
> +the kernel through various networking or limited HW-specific exposed
> +interfaces (e.g. USB, Thunderbolt). The goal of this document is to explain
> +additional attack vectors that arise in the confidential computing space
> +and discuss the proposed protection mechanisms for the Linux kernel.
> +
> +Overview and terminology
> +========================
> +
> +Confidential Computing (CoCo) is a broad term covering a wide range of
> +security technologies that aim to protect the confidentiality and integrity
> +of data in use (vs. data at rest or data in transit). At their core, CoCo
> +solutions provide a Trusted Execution Environment (TEE), where secure data
> +processing can be performed and, as a result, they are typically further
> +classified into different subtypes depending on the SW that is intended
> +to be run in a TEE. This document focuses on a subclass of CoCo technologies
> +that target virtualized environments and allow running Virtual
> +Machines (VM) inside a TEE. From now on, this document will refer to this
> +subclass of CoCo as 'Confidential Computing (CoCo) for virtualized
> +environments (VE)'.
> +
> +CoCo, in the virtualization context, refers to a set of HW and/or SW
> +technologies that allow for stronger security guarantees for the SW running
> +inside a CoCo VM. Namely, confidential computing allows its users to
> +confirm the trustworthiness of all SW pieces included in its reduced
> +Trusted Computing Base (TCB), given its ability to attest the state of
> +these trusted components.
> +
> +While the concrete implementation details differ between technologies, all
> +available mechanisms aim to provide increased confidentiality and
> +integrity for the VM's guest memory and execution state (vCPU registers),
> +more tightly controlled guest interrupt injection, as well as some
> +additional mechanisms to control guest-host page mapping. More details on
> +the x86-specific solutions can be found in
> +:doc:`Intel Trust Domain Extensions (TDX) </arch/x86/tdx>` and
> +`AMD Memory Encryption <https://www.amd.com/system/files/techdocs/sev-snp-strengthening-vm-isolation-with-integrity-protection-and-more.pdf>`_.
> +
> +The basic CoCo guest layout includes the host, the guest, the interfaces
> +that allow communication between guest and host, a platform capable of
> +supporting CoCo VMs, and
> +a trusted intermediary between the guest VM and the underlying platform
> +that acts as a security manager. The host-side virtual machine monitor
> +(VMM) typically consists of a subset of traditional VMM features and
> +is still in charge of the guest lifecycle, i.e. creating or destroying a
> +CoCo VM, managing its access to system resources, etc. However, since it
> +typically stays out of the CoCo VM TCB, its access is limited to preserve the
> +security objectives.
> +
> +In the following diagram, the "<--->" lines represent bi-directional
> +communication channels or interfaces between the CoCo security manager and
> +the rest of the components (data flow for guest, host, hardware) ::
> +
> + +-------------------+ +-----------------------+
> + | CoCo guest VM |<---->| |
> + +-------------------+ | |
> + | Interfaces | | CoCo security manager |
> + +-------------------+ | |
> + | Host VMM |<---->| |
> + +-------------------+ | |
> + | |
> + +--------------------+ | |
> + | CoCo platform |<--->| |
> + +--------------------+ +-----------------------+
> +
> +The specific details of the CoCo security manager vastly diverge between
> +technologies. For example, in some cases, it will be implemented in HW
> +while in others it may be pure SW. In some cases, such as for the
> +`Protected kernel-based virtual machine (pKVM) <https://github.com/intel-staging/pKVM-IA>`_,
> +the CoCo security manager is a small, isolated and highly privileged
> +(compared to the rest of SW running on the host) part of a traditional
> +VMM.
> +
> +Existing Linux kernel threat model
> +==================================
> +
> +The overall components of the current Linux kernel threat model are::
> +
> + +-----------------------+ +-------------------+
> + | |<---->| Userspace |
> + | | +-------------------+
> + | External attack | | Interfaces |
> + | vectors | +-------------------+
> + | |<---->| Linux Kernel |
> + | | +-------------------+
> + +-----------------------+ +-------------------+
> + | Bootloader/BIOS |
> + +-------------------+
> + +-------------------+
> + | HW platform |
> + +-------------------+
> +
> +There is also communication between the bootloader and the kernel during
> +the boot process, but this diagram does not represent it explicitly. The
> +"Interfaces" box represents the various interfaces that allow
> +communication between kernel and userspace. This includes system calls,
> +kernel APIs, device drivers, etc.
> +
> +The existing Linux kernel threat model typically assumes execution on a
> +trusted HW platform with all of the firmware and bootloaders included in
> +its TCB. The primary attacker resides in userspace, and all of the data
> +coming from there is generally considered untrusted, unless userspace is
> +privileged enough to perform trusted actions. In addition, external
> +attackers are typically considered, including those with access to enabled
> +external networks (e.g. Ethernet, Wireless, Bluetooth), exposed hardware
> +interfaces (e.g. USB, Thunderbolt), and the ability to modify the contents
> +of disks offline.
> +
> +Regarding external attack vectors, it is interesting to note that in most
> +cases external attackers will try to exploit vulnerabilities in userspace
> +first, but that it is possible for an attacker to directly target the
> +kernel, particularly if the attacker has physical access. Examples of direct
> +kernel attacks include the vulnerabilities CVE-2019-19524, CVE-2022-0435
> +and CVE-2020-24490.
> +
> +Confidential Computing threat model and its security objectives
> +===============================================================
> +
> +Confidential Computing adds a new type of attacker to the above list: a
> +potentially misbehaving host (which can also include some part of a
> +traditional VMM or all of it), which is typically placed outside of the
> +CoCo VM TCB due to its large SW attack surface. It is important to note
> +that this doesn’t imply that the host or VMM are intentionally
> +malicious, but that there exists a security value in having a small CoCo
> +VM TCB. This new type of adversary may be viewed as a more powerful type
> +of external attacker, as it resides locally on the same physical machine
> +(in contrast to a remote network attacker) and has control over the guest
> +kernel communication with most of the HW::
> +
> + +------------------------+
> + | CoCo guest VM |
> + +-----------------------+ | +-------------------+ |
> + | |<--->| | Userspace | |
> + | | | +-------------------+ |
> + | External attack | | | Interfaces | |
> + | vectors | | +-------------------+ |
> + | |<--->| | Linux Kernel | |
> + | | | +-------------------+ |
> + +-----------------------+ | +-------------------+ |
> + | | Bootloader/BIOS | |
> + +-----------------------+ | +-------------------+ |
> + | |<--->+------------------------+
> + | | | Interfaces |
> + | | +------------------------+
> + | CoCo security |<--->| Host/Host-side VMM |
> + | manager | +------------------------+
> + | | +------------------------+
> + | |<--->| CoCo platform |
> + +-----------------------+ +------------------------+
> +
> +While traditionally the host has unlimited access to guest data and can
> +leverage this access to attack the guest, CoCo systems mitigate such
> +attacks by adding security features like guest data confidentiality and
> +integrity protection. This threat model assumes that those features are
> +available and intact.
> +
> +The **Linux kernel CoCo VM security objectives** can be summarized as follows:
> +
> +1. Preserve the confidentiality and integrity of a CoCo guest's private
> +memory and registers.
> +
> +2. Prevent privilege escalation from a host into a CoCo guest Linux kernel.
> +While it is true that the host (and host-side VMM) requires some level of
> +privilege to create, destroy, or pause the guest, part of the goal of
> +preventing privilege escalation is to ensure that these operations do not
> +provide a pathway for attackers to gain access to the guest's kernel.
> +
> +The above security objectives result in two primary **Linux kernel CoCo
> +VM assets**:
> +
> +1. Guest kernel execution context.
> +2. Guest kernel private memory.
> +
> +The host retains full control over the CoCo guest resources, and can deny
> +access to them at any time. Examples of resources include CPU time, memory
> +that the guest can consume, network bandwidth, etc. Because of this,
> +host-initiated Denial of Service (DoS) attacks against CoCo guests are
> +beyond the scope of this threat model.
> +
> +The **Linux CoCo VM attack surface** is any interface exposed from a CoCo
> +guest Linux kernel towards an untrusted host that is not covered by the
> +CoCo technology SW/HW protection. This includes any possible
> +side channels, as well as transient execution side channels. Examples of
> +explicit (not side-channel) interfaces include accesses to port I/O, MMIO
> +and DMA interfaces, access to PCI configuration space, VMM-specific
> +hypercalls (towards Host-side VMM), access to shared memory pages,
> +interrupts allowed to be injected into the guest kernel by the host, as
> +well as CoCo technology-specific hypercalls, if present. Additionally, the
> +host in a CoCo system typically controls the process of creating a CoCo
> +guest: it has a method to load the firmware and bootloader images, as
> +well as the kernel image and command line, into the guest. All of this
> +data should also be considered untrusted until its integrity and
> +authenticity are established via attestation.
> +
> +The table below shows a threat matrix for the CoCo guest Linux kernel with
> +the potential mitigation strategies. The matrix refers to CoCo-specific
> +versions of the guest, host and platform.
> +
> +.. list-table:: CoCo Linux guest kernel threat matrix
> + :widths: auto
> + :align: center
> + :header-rows: 1
> +
> + * - Threat name
> + - Threat description
> + - Mitigation strategies
> +
> + * - Guest malicious configuration
> + - A misbehaving host modifies one of the following parts of the
> + guest's configuration:
> +
> + 1. Guest firmware or bootloader
> +
> + 2. Guest kernel or module binaries
> +
> + 3. Guest command line parameters
> +
> + This allows the host to break the integrity of the code running
> + inside a CoCo guest, and violates the CoCo security objectives.
> + - The integrity of the guest's configuration passed via the untrusted
> + must be ensured by methods such as remote attestation and signing.
> + This should be largely transparent to the guest kernel, and would
> + allow it to assume a trusted state at the time of boot.
> +
> + * - CoCo guest data attacks
> + - A misbehaving host retains full control of the CoCo guest's data
> + in-transit between the guest and the host-managed physical or
> + virtual devices. This allows any attack against confidentiality,
> + integrity or freshness of such data.
> + - The CoCo guest is responsible for ensuring the confidentiality,
> + integrity and freshness of such data using well-established
> + security mechanisms. For example, for any guest external network
> + communications passed via the untrusted host, an end-to-end
> + secure session must be established between a guest and a trusted
> + remote endpoint using well-known protocols such as TLS.
> + This requirement also applies to protection of the guest's disk
> + image.
> +
> + * - Malformed runtime input
> + - A misbehaving host injects malformed input via any communication
> + interface used by the guest's kernel code. If the code is not
> + prepared to handle this input correctly, this can result in a host
> + --> guest kernel privilege escalation. This includes traditional
> + side-channel and/or transient execution attack vectors.
> + - The attestation or signing process cannot help to mitigate this
> + threat since this input is highly dynamic. Instead, a different set
> + of mechanisms is required:
> +
> + 1. *Limit the exposed attack surface*. Whenever possible, disable
> + complex kernel features and device drivers (not required for guest
> + operation) that actively use the communication interfaces between
> + the untrusted host and the guest. This is not a new concept for the
> + Linux kernel, since it already has mechanisms to disable external
> + interfaces, such as an attacker's access via the USB/Thunderbolt subsystem.
> +
> + 2. *Harden the exposed attack surface*. Any code that uses such
> + interfaces must treat the input from the untrusted host as malicious,
> + and do sanity checks before processing it. This can be ensured by
> + performing a code audit of such device drivers as well as employing
> + other standard techniques for testing the code robustness, such as
> + fuzzing. This is again a well-known concept for the Linux kernel,
> + since all its networking code has been previously analyzed under
> + the presumption of processing malformed input from a network attacker.
> +
> + * - Malicious runtime input
> + - A misbehaving host injects a specific input value via any
> + communication interface used by the guest's kernel code. The
> + difference with the previous attack vector (malformed runtime input)
> + is that this input is not malformed, but its value is crafted to
> + impact the guest's kernel security. Examples of such inputs include
> + providing a malicious time to the guest or the entropy to the guest
> + random number generator. Additionally, the timing of such events can
> + be an attack vector on its own, if it results in a particular guest
> + kernel action (e.g. processing of a host-injected interrupt).
> + - Similarly, as with the previous attack vector, it is not possible to
> + use attestation mechanisms to address this threat. Instead, such
> + attack vectors (i.e. interfaces) must be either disabled or made
> + resistant to supplied host input.
> +
> +As can be seen from the above table, the potential mitigation strategies
> +to secure the CoCo Linux guest kernel vary, but can be roughly split into
> +mechanisms that either require or do not require changes to the existing
> +Linux kernel code. One main goal of the CoCo security architecture is to
> +minimize changes to the Linux kernel code, while also providing usable
> +and scalable means to facilitate the security of a CoCo guest kernel.
> diff --git a/MAINTAINERS b/MAINTAINERS
> index a73486c4aa6e..1d4ae60cdee9 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -5197,6 +5197,12 @@ S: Orphan
> W: http://accessrunner.sourceforge.net/
> F: drivers/usb/atm/cxacru.c
>
> +CONFIDENTIAL COMPUTING THREAT MODEL FOR X86 VIRTUALIZATION
> +M: Elena Reshetova <[email protected]>
> +M: Carlos Bilbao <[email protected]>
> +S: Maintained
> +F: Documentation/security/x86-confidential-computing.rst
> +
> CONFIGFS
> M: Joel Becker <[email protected]>
> M: Christoph Hellwig <[email protected]>
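
As a concrete illustration of the "harden the exposed attack surface"
mitigation in the threat matrix above: any guest code that consumes a
host-supplied descriptor has to treat it as malicious until validated. The
sketch below is purely illustrative; the struct layout, names and limits
are hypothetical and do not come from any real virtio device or kernel
API, but it shows the kind of bounds and overflow checks such code needs
before dereferencing anything derived from host-controlled values:

```c
/* Illustrative sketch only: a guest-side driver validating a
 * host-supplied descriptor before copying data out of a shared page.
 * The struct, names and limits are made up for this example. */
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define GUEST_BUF_SIZE 256

struct host_desc {        /* layout as shared with the untrusted host */
	uint32_t offset;  /* start of payload in the shared region */
	uint32_t len;     /* payload length claimed by the host */
};

/* Copy a host-described payload into a private guest buffer.
 * Returns the number of bytes copied, or -1 if the descriptor is
 * malicious or malformed (out of range, wrapping, or oversized). */
int guest_copy_payload(uint8_t *dst, size_t dst_size,
		       const uint8_t *shared, size_t shared_size,
		       const struct host_desc *desc)
{
	/* Reject lengths and offsets that fall outside the shared
	 * region; written so that offset + len is never computed
	 * directly, which prevents integer wrap-around. */
	if (desc->len > shared_size || desc->offset > shared_size - desc->len)
		return -1;
	/* Reject payloads larger than the private destination buffer. */
	if (desc->len > dst_size)
		return -1;
	memcpy(dst, shared + desc->offset, desc->len);
	return (int)desc->len;
}
```

The ordering of the checks is the point of the example: all range tests
run on the host-controlled values before they are used in any address
arithmetic, so a malicious host cannot wrap the sum past the shared
region or overflow the guest's private buffer.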
Thanks,
Carlos