2020-04-21 18:43:55

by Paraschiv, Andra-Irina

Subject: [PATCH v1 00/15] Add support for Nitro Enclaves

Nitro Enclaves (NE) is a new Amazon Elastic Compute Cloud (EC2) capability
that allows customers to carve out isolated compute environments within EC2
instances [1].

For example, an application that processes highly sensitive data and runs
in a VM can be separated from other applications running in the same VM. This
application then runs in a VM separate from the primary VM, namely an
enclave.

An enclave runs alongside the VM that spawned it. This setup matches the needs
of low-latency applications. The resources that are allocated for the enclave,
such as memory and CPU, are carved out of the primary VM. Each enclave is
mapped to a process running in the primary VM, which communicates with the NE
driver via an ioctl interface.

An enclave communicates with the primary VM via a local communication channel,
using virtio-vsock [2]. An enclave does not have a disk or a network device
attached.
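
As an illustration only, a minimal sketch of how the primary VM side could
talk to an enclave over vsock; the CID comes from the enclave start step
described later, while the port value and the helper name are placeholders
and error handling is trimmed:

  #include <string.h>
  #include <sys/socket.h>
  #include <unistd.h>
  #include <linux/vm_sockets.h>

  /* Placeholder helper: connect from the primary VM to an enclave vsock port. */
  static int vsock_connect_to_enclave(unsigned int enclave_cid, unsigned int port)
  {
          struct sockaddr_vm addr;
          int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

          if (fd < 0)
                  return -1;

          memset(&addr, 0, sizeof(addr));
          addr.svm_family = AF_VSOCK;
          addr.svm_cid = enclave_cid;     /* CID of the enclave vsock device */
          addr.svm_port = port;           /* placeholder port */

          if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
                  close(fd);
                  return -1;
          }

          return fd;
  }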

The following patch series covers the NE driver for enclave lifetime management.
It provides an ioctl interface to user space and includes a PCI device
driver that is the means of communication with the hypervisor running on the
host where the primary VM and the enclave are launched.

The proposed solution follows the KVM model and uses the KVM API to create
and set resources for enclaves. An additional ioctl command, besides the ones
provided by KVM, is used to start an enclave and set up the addressing for the
communication channel and an enclave unique id.
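
As an illustration only, a rough sketch of the intended user space flow,
assuming the misc device node is /dev/nitro_enclaves and that the KVM-style
ioctls plus the enclave start ioctl from this series are used as described;
error handling and the actual resource values are omitted:

  #include <fcntl.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>
  #include <linux/nitro_enclaves.h>

  /* Sketch only; the memory region and vCPU values are placeholders. */
  static int ne_create_and_start_enclave(void)
  {
          struct enclave_start_metadata start_metadata = {};
          struct kvm_userspace_memory_region mem_region = {};
          int ne_dev_fd, enclave_fd;

          ne_dev_fd = open("/dev/nitro_enclaves", O_RDWR | O_CLOEXEC);

          /* KVM-style VM creation returns the enclave fd. */
          enclave_fd = ioctl(ne_dev_fd, KVM_CREATE_VM, 0);

          /* Set the enclave resources - vCPUs and memory regions. */
          ioctl(enclave_fd, KVM_CREATE_VCPU, 0);
          ioctl(enclave_fd, KVM_SET_USER_MEMORY_REGION, &mem_region);

          /* Trigger the enclave run; the CID is autogenerated if left 0. */
          ioctl(enclave_fd, NE_ENCLAVE_START, &start_metadata);

          return enclave_fd;
  }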

Thank you.

Andra

[1] https://aws.amazon.com/ec2/nitro/nitro-enclaves/
[2] http://man7.org/linux/man-pages/man7/vsock.7.html

---

Patch Series Changelog

The patch series is built on top of v5.7-rc2.

v1

* The current patch series.

---

Andra Paraschiv (15):
nitro_enclaves: Add ioctl interface definition
nitro_enclaves: Define the PCI device interface
nitro_enclaves: Define enclave info for internal bookkeeping
nitro_enclaves: Init PCI device driver
nitro_enclaves: Handle PCI device command requests
nitro_enclaves: Handle out-of-band PCI device events
nitro_enclaves: Init misc device providing the ioctl interface
nitro_enclaves: Add logic for enclave vm creation
nitro_enclaves: Add logic for enclave vcpu creation
nitro_enclaves: Add logic for enclave memory region set
nitro_enclaves: Add logic for enclave start
nitro_enclaves: Add logic for enclave termination
nitro_enclaves: Add Kconfig for the Nitro Enclaves driver
nitro_enclaves: Add Makefile for the Nitro Enclaves driver
MAINTAINERS: Add entry for the Nitro Enclaves driver

MAINTAINERS | 11 +
drivers/virt/Kconfig | 2 +
drivers/virt/Makefile | 2 +
drivers/virt/amazon/Kconfig | 28 +
drivers/virt/amazon/Makefile | 19 +
drivers/virt/amazon/nitro_enclaves/Makefile | 23 +
.../virt/amazon/nitro_enclaves/ne_misc_dev.c | 1039 +++++++++++++++++
.../virt/amazon/nitro_enclaves/ne_misc_dev.h | 120 ++
.../virt/amazon/nitro_enclaves/ne_pci_dev.c | 675 +++++++++++
.../virt/amazon/nitro_enclaves/ne_pci_dev.h | 266 +++++
include/linux/nitro_enclaves.h | 23 +
include/uapi/linux/nitro_enclaves.h | 52 +
12 files changed, 2260 insertions(+)
create mode 100644 drivers/virt/amazon/Kconfig
create mode 100644 drivers/virt/amazon/Makefile
create mode 100644 drivers/virt/amazon/nitro_enclaves/Makefile
create mode 100644 drivers/virt/amazon/nitro_enclaves/ne_misc_dev.c
create mode 100644 drivers/virt/amazon/nitro_enclaves/ne_misc_dev.h
create mode 100644 drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c
create mode 100644 drivers/virt/amazon/nitro_enclaves/ne_pci_dev.h
create mode 100644 include/linux/nitro_enclaves.h
create mode 100644 include/uapi/linux/nitro_enclaves.h

--
2.20.1 (Apple Git-117)






2020-04-21 18:44:11

by Paraschiv, Andra-Irina

Subject: [PATCH v1 02/15] nitro_enclaves: Define the PCI device interface

The Nitro Enclaves (NE) driver communicates with a new PCI device that is
exposed to a virtual machine (VM) and handles commands for enclave
lifetime management e.g. creation, termination, setting memory regions.
The communication with the PCI device is done via an MMIO space and MSI-X
interrupts.

This device communicates with the hypervisor on the host, where the VM
that spawned the enclave runs, e.g. to launch a VM that is used for the
enclave.

Define the MMIO space of the PCI device and the commands that are
provided by this device. Add an internal data structure used as private
data for the PCI device driver and the functions for the PCI device init
/ uninit and command requests handling.
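
As an illustration only, a kernel-side sketch of the request / reply
sequence over this MMIO space, mirroring how the offsets below are meant to
be used; locking and waiting for the reply MSI-X interrupt are omitted and
the function name is a placeholder:

  /* Sketch only: submit one command and read back its reply. */
  static void ne_mmio_cmd_sketch(void __iomem *iomem_base,
                                 enum ne_pci_dev_cmd_type cmd_type,
                                 void *cmd_request, size_t cmd_request_size,
                                 struct ne_pci_dev_cmd_reply *cmd_reply)
  {
          /* Copy the command request payload into the send buffer. */
          memcpy_toio(iomem_base + NE_SEND_DATA, cmd_request, cmd_request_size);

          /* Writing the command type triggers the command processing. */
          iowrite32(cmd_type, iomem_base + NE_COMMAND);

          /* ... wait here for the reply notification on NE_VEC_REPLY ... */

          /* Read the command reply payload from the receive buffer. */
          memcpy_fromio(cmd_reply, iomem_base + NE_RECV_DATA, sizeof(*cmd_reply));
  }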

Signed-off-by: Alexandru-Catalin Vasile <[email protected]>
Signed-off-by: Alexandru Ciobotaru <[email protected]>
Signed-off-by: Andra Paraschiv <[email protected]>
---
.../virt/amazon/nitro_enclaves/ne_pci_dev.h | 266 ++++++++++++++++++
1 file changed, 266 insertions(+)
create mode 100644 drivers/virt/amazon/nitro_enclaves/ne_pci_dev.h

diff --git a/drivers/virt/amazon/nitro_enclaves/ne_pci_dev.h b/drivers/virt/amazon/nitro_enclaves/ne_pci_dev.h
new file mode 100644
index 000000000000..e703419ed29d
--- /dev/null
+++ b/drivers/virt/amazon/nitro_enclaves/ne_pci_dev.h
@@ -0,0 +1,266 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef _NE_PCI_DEV_H_
+#define _NE_PCI_DEV_H_
+
+#include <linux/atomic.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/pci.h>
+#include <linux/pci_ids.h>
+#include <linux/wait.h>
+
+/* Nitro Enclaves (NE) PCI device identifier */
+
+#define PCI_DEVICE_ID_NE (0xe4c1)
+#define PCI_BAR_NE (0x03)
+
+/* Device registers */
+
+/**
+ * (1 byte) Register to notify the device that the driver is using it
+ * (Read/Write).
+ */
+#define NE_ENABLE (0x0000)
+#define NE_ENABLE_ON (0x00)
+#define NE_ENABLE_OFF (0x01)
+
+/* (2 bytes) Register to select the device run-time version (Read/Write). */
+#define NE_VERSION (0x0002)
+#define NE_VERSION_MAX (0x0001)
+
+/**
+ * (4 bytes) Register to notify the device what command was requested
+ * (Write-Only).
+ */
+#define NE_COMMAND (0x0004)
+
+/**
+ * (4 bytes) Register to notify the driver that a reply or a device event
+ * is available (Read-Only):
+ * - Lower half - command reply counter
+ * - Higher half - out-of-band device event counter
+ */
+#define NE_EVTCNT (0x000c)
+#define NE_EVTCNT_REPLY_SHIFT (0)
+#define NE_EVTCNT_REPLY_MASK (0x0000ffff)
+#define NE_EVTCNT_REPLY(cnt) (((cnt) & NE_EVTCNT_REPLY_MASK) >> \
+ NE_EVTCNT_REPLY_SHIFT)
+#define NE_EVTCNT_EVENT_SHIFT (16)
+#define NE_EVTCNT_EVENT_MASK (0xffff0000)
+#define NE_EVTCNT_EVENT(cnt) (((cnt) & NE_EVTCNT_EVENT_MASK) >> \
+ NE_EVTCNT_EVENT_SHIFT)
+
+/* (240 bytes) Buffer for sending the command request payload (Read/Write). */
+#define NE_SEND_DATA (0x0010)
+
+/* (240 bytes) Buffer for receiving the command reply payload (Read-Only). */
+#define NE_RECV_DATA (0x0100)
+
+/* Device MMIO buffer sizes */
+
+/* 240 bytes for send / recv buffer. */
+#define NE_SEND_DATA_SIZE (240)
+#define NE_RECV_DATA_SIZE (240)
+
+/* MSI-X interrupt vectors */
+
+/* MSI-X vector used for command reply notification. */
+#define NE_VEC_REPLY (0)
+
+/* MSI-X vector used for out-of-band events e.g. enclave crash. */
+#define NE_VEC_EVENT (1)
+
+/* Device command types. */
+enum ne_pci_dev_cmd_type {
+ INVALID_CMD = 0,
+ ENCLAVE_START = 1,
+ ENCLAVE_GET_SLOT = 2,
+ ENCLAVE_STOP = 3,
+ SLOT_ALLOC = 4,
+ SLOT_FREE = 5,
+ SLOT_ADD_MEM = 6,
+ SLOT_ADD_VCPU = 7,
+ SLOT_COUNT = 8,
+ NEXT_SLOT = 9,
+ SLOT_INFO = 10,
+ SLOT_ADD_BULK_VCPUS = 11,
+ MAX_CMD,
+};
+
+/* Device commands - payload structure for requests and replies. */
+
+struct enclave_start_req {
+ /* Slot unique id mapped to the enclave to start. */
+ u64 slot_uid;
+
+ /**
+ * Context ID (CID) for the enclave vsock device.
+ * If 0, CID is autogenerated.
+ */
+ u64 enclave_cid;
+
+ /* Flags for the enclave to start with (e.g. debug mode). */
+ u64 flags;
+} __attribute__ ((__packed__));
+
+struct enclave_get_slot_req {
+ /* Context ID (CID) for the enclave vsock device. */
+ u64 enclave_cid;
+} __attribute__ ((__packed__));
+
+struct enclave_stop_req {
+ /* Slot unique id mapped to the enclave to stop. */
+ u64 slot_uid;
+} __attribute__ ((__packed__));
+
+struct slot_alloc_req {
+ /* In order to avoid weird sizeof edge cases. */
+ u8 unused;
+} __attribute__ ((__packed__));
+
+struct slot_free_req {
+ /* Slot unique id mapped to the slot to free. */
+ u64 slot_uid;
+} __attribute__ ((__packed__));
+
+struct slot_add_mem_req {
+ /* Slot unique id mapped to the slot to add the memory region to. */
+ u64 slot_uid;
+
+ /* Physical address of the memory region to add to the slot. */
+ u64 paddr;
+
+ /* Memory size, in bytes, of the memory region to add to the slot. */
+ u64 size;
+} __attribute__ ((__packed__));
+
+struct slot_add_vcpu_req {
+ /* Slot unique id mapped to the slot to add the vCPU to. */
+ u64 slot_uid;
+
+ /* vCPU ID of the CPU to add to the enclave. */
+ u32 vcpu_id;
+} __attribute__ ((__packed__));
+
+struct slot_count_req {
+ /* In order to avoid weird sizeof edge cases. */
+ u8 unused;
+} __attribute__ ((__packed__));
+
+struct next_slot_req {
+ /* Slot unique id of the next slot in the iteration. */
+ u64 slot_uid;
+} __attribute__ ((__packed__));
+
+struct slot_info_req {
+ /* Slot unique id mapped to the slot to get information about. */
+ u64 slot_uid;
+} __attribute__ ((__packed__));
+
+struct slot_add_bulk_vcpus_req {
+ /* Slot unique id mapped to the slot to add vCPUs to. */
+ u64 slot_uid;
+
+ /* Number of vCPUs to add to the slot. */
+ u64 nr_vcpus;
+} __attribute__ ((__packed__));
+
+struct ne_pci_dev_cmd_reply {
+ s32 rc;
+
+ /* Valid for all commands except SLOT_COUNT. */
+ u64 slot_uid;
+
+ /* Valid for ENCLAVE_START command. */
+ u64 enclave_cid;
+
+ /* Valid for SLOT_COUNT command. */
+ u64 slot_count;
+
+ /* Valid for SLOT_ALLOC and SLOT_INFO commands. */
+ u64 mem_regions;
+
+ /* Valid for SLOT_INFO command. */
+ u64 mem_size;
+
+ /* Valid for SLOT_INFO command. */
+ u64 nr_vcpus;
+
+ /* Valid for SLOT_INFO command. */
+ u64 flags;
+
+ /* Valid for SLOT_INFO command. */
+ u16 state;
+} __attribute__ ((__packed__));
+
+/* Nitro Enclaves (NE) PCI device. */
+struct ne_pci_dev {
+ /* Variable set if a reply has been sent by the PCI device. */
+ atomic_t cmd_reply_avail;
+
+ /* Wait queue for handling command reply from the PCI device. */
+ wait_queue_head_t cmd_reply_wait_q;
+
+ /* List of the enclaves managed by the PCI device. */
+ struct list_head enclaves_list;
+
+ /* Mutex for accessing the list of enclaves. */
+ struct mutex enclaves_list_mutex;
+
+ /**
+ * Work queue for handling out-of-band events triggered by the Nitro
+ * Hypervisor which require enclave state scanning and propagation to
+ * the enclave process.
+ */
+ struct workqueue_struct *event_wq;
+
+ /* MMIO region of the PCI device. */
+ void __iomem *iomem_base;
+
+ /* Work item for every received out-of-band event. */
+ struct work_struct notify_work;
+
+ /* Mutex for accessing the PCI dev MMIO space. */
+ struct mutex pci_dev_mutex;
+};
+
+/**
+ * ne_do_request - Submit command request to the PCI device based on the command
+ * type and retrieve the associated reply.
+ *
+ * This function uses the ne_pci_dev mutex to handle one command at a time.
+ *
+ * @pdev: PCI device to send the command to and receive the reply from.
+ * @cmd_type: command type of the request sent to the PCI device.
+ * @cmd_request: command request payload.
+ * @cmd_request_size: size of the command request payload.
+ * @cmd_reply: command reply payload.
+ * @cmd_reply_size: size of the command reply payload.
+ *
+ * @returns: 0 on success, negative return value on failure.
+ */
+int ne_do_request(struct pci_dev *pdev, enum ne_pci_dev_cmd_type cmd_type,
+ void *cmd_request, size_t cmd_request_size,
+ struct ne_pci_dev_cmd_reply *cmd_reply,
+ size_t cmd_reply_size);
+
+/* Nitro Enclaves (NE) PCI device driver */
+extern struct pci_driver ne_pci_driver;
+
+#endif /* _NE_PCI_DEV_H_ */
--
2.20.1 (Apple Git-117)





2020-04-21 18:44:20

by Paraschiv, Andra-Irina

Subject: [PATCH v1 01/15] nitro_enclaves: Add ioctl interface definition

The Nitro Enclaves driver handles enclave lifetime management. This
includes enclave creation, termination and setting up its resources such
as memory and CPU.

An enclave runs alongside the VM that spawned it. It is abstracted as a
process running in the VM that launched it. The process interacts with
the NE driver, which exposes an ioctl interface for creating an enclave
and setting up its resources.

Include the KVM API as part of the provided ioctl interface, with an
additional ENCLAVE_START ioctl command that triggers the enclave run.
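
As an illustration only, a minimal user space sketch of the enclave start
call, assuming the enclave fd has already been created and its resources
set; error handling is trimmed and the helper name is a placeholder:

  #include <sys/ioctl.h>
  #include <linux/nitro_enclaves.h>

  static int start_enclave(int enclave_fd)
  {
          struct enclave_start_metadata start_metadata = {};

          start_metadata.flags = 0;       /* e.g. debug mode flag (in) */
          start_metadata.enclave_cid = 0; /* 0 - autogenerated CID (in / out) */

          if (ioctl(enclave_fd, NE_ENCLAVE_START, &start_metadata) < 0)
                  return -1;

          /* start_metadata.enclave_cid and .slot_uid are now set (out). */

          return 0;
  }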

Signed-off-by: Alexandru Vasile <[email protected]>
Signed-off-by: Andra Paraschiv <[email protected]>
---
include/linux/nitro_enclaves.h | 23 +++++++++++++
include/uapi/linux/nitro_enclaves.h | 52 +++++++++++++++++++++++++++++
2 files changed, 75 insertions(+)
create mode 100644 include/linux/nitro_enclaves.h
create mode 100644 include/uapi/linux/nitro_enclaves.h

diff --git a/include/linux/nitro_enclaves.h b/include/linux/nitro_enclaves.h
new file mode 100644
index 000000000000..7e593a9fbf8c
--- /dev/null
+++ b/include/linux/nitro_enclaves.h
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef _LINUX_NITRO_ENCLAVES_H_
+#define _LINUX_NITRO_ENCLAVES_H_
+
+#include <uapi/linux/nitro_enclaves.h>
+
+#endif /* _LINUX_NITRO_ENCLAVES_H_ */
diff --git a/include/uapi/linux/nitro_enclaves.h b/include/uapi/linux/nitro_enclaves.h
new file mode 100644
index 000000000000..b90dfcf6253a
--- /dev/null
+++ b/include/uapi/linux/nitro_enclaves.h
@@ -0,0 +1,52 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef _UAPI_LINUX_NITRO_ENCLAVES_H_
+#define _UAPI_LINUX_NITRO_ENCLAVES_H_
+
+#include <linux/kvm.h>
+#include <linux/types.h>
+
+/* Nitro Enclaves (NE) Kernel Driver Interface */
+
+/**
+ * The command is used to trigger enclave start after the enclave resources,
+ * such as memory and CPU, have been set.
+ *
+ * The enclave start metadata is an in / out data structure. It includes
+ * provided info by the caller - enclave cid and flags - and returns the
+ * slot uid and the cid (if input cid is 0).
+ */
+#define NE_ENCLAVE_START _IOWR('B', 0x1, struct enclave_start_metadata)
+
+/* Setup metadata necessary for enclave start. */
+struct enclave_start_metadata {
+ /* Flags for the enclave to start with (e.g. debug mode) (in). */
+ __u64 flags;
+
+ /**
+ * Context ID (CID) for the enclave vsock device. If 0 as input, the
+ * CID is autogenerated by the hypervisor and returned back as output
+ * by the driver (in/out).
+ */
+ __u64 enclave_cid;
+
+ /* Slot unique id mapped to the enclave to start (out). */
+ __u64 slot_uid;
+};
+
+#endif /* _UAPI_LINUX_NITRO_ENCLAVES_H_ */
--
2.20.1 (Apple Git-117)





2020-04-21 18:44:32

by Paraschiv, Andra-Irina

Subject: [PATCH v1 03/15] nitro_enclaves: Define enclave info for internal bookkeeping

The Nitro Enclaves driver keeps internal info for each enclave.

This is needed to manage the enclave resources state and enclave
notifications, and to keep a reference to the PCI device that handles
command requests for enclave lifetime management.
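
As an illustration only, a kernel-side sketch of how this per-enclave info
might be set up when an enclave is created; the actual creation logic comes
in a later patch, error handling and the CPU siblings mask setup are
omitted, and the helper name is a placeholder:

  /* Sketch only: allocate and init the internal bookkeeping for one enclave. */
  static struct ne_enclave *ne_enclave_alloc_sketch(struct pci_dev *pdev)
  {
          struct ne_enclave *ne_enclave = kzalloc(sizeof(*ne_enclave), GFP_KERNEL);

          if (!ne_enclave)
                  return NULL;

          ne_enclave->pdev = pdev;
          ne_enclave->state = NE_STATE_INIT;

          mutex_init(&ne_enclave->enclave_info_mutex);
          init_waitqueue_head(&ne_enclave->eventq);
          INIT_LIST_HEAD(&ne_enclave->mem_regions_list);
          INIT_LIST_HEAD(&ne_enclave->vcpu_ids_list);

          return ne_enclave;
  }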

Signed-off-by: Alexandru-Catalin Vasile <[email protected]>
Signed-off-by: Andra Paraschiv <[email protected]>
---
.../virt/amazon/nitro_enclaves/ne_misc_dev.h | 120 ++++++++++++++++++
1 file changed, 120 insertions(+)
create mode 100644 drivers/virt/amazon/nitro_enclaves/ne_misc_dev.h

diff --git a/drivers/virt/amazon/nitro_enclaves/ne_misc_dev.h b/drivers/virt/amazon/nitro_enclaves/ne_misc_dev.h
new file mode 100644
index 000000000000..dece3ead86b9
--- /dev/null
+++ b/drivers/virt/amazon/nitro_enclaves/ne_misc_dev.h
@@ -0,0 +1,120 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef _NE_MISC_DEV_H_
+#define _NE_MISC_DEV_H_
+
+#include <linux/cpumask.h>
+#include <linux/list.h>
+#include <linux/miscdevice.h>
+#include <linux/mm.h>
+#include <linux/mutex.h>
+#include <linux/pci.h>
+#include <linux/wait.h>
+
+/* Entry in vCPU IDs list. */
+struct ne_vcpu_id {
+ /* CPU id associated with a given slot, apic id on x86. */
+ u32 vcpu_id;
+
+ struct list_head vcpu_id_list_entry;
+};
+
+/* Entry in memory regions list. */
+struct ne_mem_region {
+ struct list_head mem_region_list_entry;
+
+ /* Number of pages that make up the memory region. */
+ unsigned long nr_pages;
+
+ /* Pages that make up the user space memory region. */
+ struct page **pages;
+};
+
+/* Per-enclave data used for enclave lifetime management. */
+struct ne_enclave {
+ /**
+ * CPU pool with siblings of already allocated CPUs to an enclave.
+ * This is used when a CPU pool is set, to be able to know the CPU
+ * siblings for the hyperthreading (HT) setup.
+ */
+ cpumask_var_t cpu_siblings;
+
+ struct list_head enclave_list_entry;
+
+ /* Mutex for accessing this internal state. */
+ struct mutex enclave_info_mutex;
+
+ /**
+ * Wait queue used for out-of-band event notifications
+ * triggered from the PCI device event handler to the enclave
+ * process via the poll function.
+ */
+ wait_queue_head_t eventq;
+
+ /* Variable used to determine if the out-of-band event was triggered. */
+ bool has_event;
+
+ /**
+ * The maximum number of memory regions that can be handled by the
+ * lower levels.
+ */
+ u64 max_mem_regions;
+
+ /* Enclave memory regions list. */
+ struct list_head mem_regions_list;
+
+ /* Enclave process abstraction mm data struct. */
+ struct mm_struct *mm;
+
+ /* PCI device used for enclave lifetime management. */
+ struct pci_dev *pdev;
+
+ /* Slot unique id mapped to the enclave. */
+ u64 slot_uid;
+
+ /* Enclave state, updated during enclave lifetime. */
+ u16 state;
+
+ /* Enclave vCPUs list. */
+ struct list_head vcpu_ids_list;
+};
+
+/**
+ * States available for an enclave.
+ *
+ * TODO: Determine if the following states are exposing enough information
+ * to the kernel driver.
+ */
+enum ne_state {
+ /* NE_ENCLAVE_START ioctl was never issued for the enclave. */
+ NE_STATE_INIT = 0,
+
+ /**
+ * NE_ENCLAVE_START ioctl was issued and the enclave is running
+ * as expected.
+ */
+ NE_STATE_RUNNING = 2,
+
+ /* Enclave exited without userspace interaction. */
+ NE_STATE_STOPPED = U16_MAX,
+};
+
+/* Nitro Enclaves (NE) misc device */
+extern struct miscdevice ne_miscdevice;
+
+#endif /* _NE_MISC_DEV_H_ */
--
2.20.1 (Apple Git-117)





2020-04-21 18:44:42

by Paraschiv, Andra-Irina

Subject: [PATCH v1 04/15] nitro_enclaves: Init PCI device driver

The Nitro Enclaves PCI device is used by the kernel driver as a means of
communication with the hypervisor on the host where the primary VM and
the enclaves run. It handles requests with regard to enclave lifetime.

Set up the PCI device driver and add support for MSI-X interrupts.

Signed-off-by: Alexandru-Catalin Vasile <[email protected]>
Signed-off-by: Alexandru Ciobotaru <[email protected]>
Signed-off-by: Andra Paraschiv <[email protected]>
---
.../virt/amazon/nitro_enclaves/ne_pci_dev.c | 278 ++++++++++++++++++
1 file changed, 278 insertions(+)
create mode 100644 drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c

diff --git a/drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c b/drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c
new file mode 100644
index 000000000000..8fbee95ea291
--- /dev/null
+++ b/drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c
@@ -0,0 +1,278 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/* Nitro Enclaves (NE) PCI device driver. */
+
+#include <linux/bug.h>
+#include <linux/delay.h>
+#include <linux/device.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/module.h>
+#include <linux/nitro_enclaves.h>
+#include <linux/pci.h>
+#include <linux/types.h>
+#include <linux/wait.h>
+
+#include "ne_misc_dev.h"
+#include "ne_pci_dev.h"
+
+#define DEFAULT_TIMEOUT_MSECS (120000) // 120 sec
+
+static const struct pci_device_id ne_pci_ids[] = {
+ { PCI_DEVICE(PCI_VENDOR_ID_AMAZON, PCI_DEVICE_ID_NE) },
+ { 0, }
+};
+
+MODULE_DEVICE_TABLE(pci, ne_pci_ids);
+
+/**
+ * ne_setup_msix - Setup MSI-X vectors for the PCI device.
+ *
+ * @pdev: PCI device to setup the MSI-X for.
+ * @ne_pci_dev: PCI device private data structure.
+ *
+ * @returns: 0 on success, negative return value on failure.
+ */
+static int ne_setup_msix(struct pci_dev *pdev, struct ne_pci_dev *ne_pci_dev)
+{
+ int nr_vecs = 0;
+ int rc = -EINVAL;
+
+ BUG_ON(!ne_pci_dev);
+
+ nr_vecs = pci_msix_vec_count(pdev);
+ if (nr_vecs < 0) {
+ rc = nr_vecs;
+
+ dev_err_ratelimited(&pdev->dev,
+ "Failure in getting vec count [rc=%d]\n",
+ rc);
+
+ return rc;
+ }
+
+ rc = pci_alloc_irq_vectors(pdev, nr_vecs, nr_vecs, PCI_IRQ_MSIX);
+ if (rc < 0) {
+ dev_err_ratelimited(&pdev->dev,
+ "Failure in alloc MSI-X vecs [rc=%d]\n",
+ rc);
+
+ goto err_alloc_irq_vecs;
+ }
+
+ return 0;
+
+err_alloc_irq_vecs:
+ return rc;
+}
+
+/**
+ * ne_pci_dev_enable - Select PCI device version and enable it.
+ *
+ * @pdev: PCI device to select version for and then enable.
+ * @ne_pci_dev: PCI device private data structure.
+ *
+ * @returns: 0 on success, negative return value on failure.
+ */
+static int ne_pci_dev_enable(struct pci_dev *pdev,
+ struct ne_pci_dev *ne_pci_dev)
+{
+ u8 dev_enable_reply = 0;
+ u16 dev_version_reply = 0;
+
+ BUG_ON(!pdev);
+ BUG_ON(!ne_pci_dev);
+ BUG_ON(!ne_pci_dev->iomem_base);
+
+ iowrite16(NE_VERSION_MAX, ne_pci_dev->iomem_base + NE_VERSION);
+
+ dev_version_reply = ioread16(ne_pci_dev->iomem_base + NE_VERSION);
+ if (dev_version_reply != NE_VERSION_MAX) {
+ dev_err_ratelimited(&pdev->dev,
+ "Failure in pci dev version cmd\n");
+
+ return -EIO;
+ }
+
+ iowrite8(NE_ENABLE_ON, ne_pci_dev->iomem_base + NE_ENABLE);
+
+ dev_enable_reply = ioread8(ne_pci_dev->iomem_base + NE_ENABLE);
+ if (dev_enable_reply != NE_ENABLE_ON) {
+ dev_err_ratelimited(&pdev->dev,
+ "Failure in pci dev enable cmd\n");
+
+ return -EIO;
+ }
+
+ return 0;
+}
+
+/**
+ * ne_pci_dev_disable - Disable PCI device.
+ *
+ * @pdev: PCI device to disable.
+ * @ne_pci_dev: PCI device private data structure.
+ *
+ * @returns: 0 on success, negative return value on failure.
+ */
+static int ne_pci_dev_disable(struct pci_dev *pdev,
+ struct ne_pci_dev *ne_pci_dev)
+{
+ u8 dev_disable_reply = 0;
+
+ BUG_ON(!pdev);
+ BUG_ON(!ne_pci_dev);
+ BUG_ON(!ne_pci_dev->iomem_base);
+
+ iowrite8(NE_ENABLE_OFF, ne_pci_dev->iomem_base + NE_ENABLE);
+
+ /*
+ * TODO: Check for NE_ENABLE_OFF in a loop, to handle cases when the
+ * device state is not immediately set to disabled and going through a
+ * transitory state of disabling.
+ */
+ dev_disable_reply = ioread8(ne_pci_dev->iomem_base + NE_ENABLE);
+ if (dev_disable_reply != NE_ENABLE_OFF) {
+ dev_err_ratelimited(&pdev->dev,
+ "Failure in pci dev disable cmd\n");
+
+ return -EIO;
+ }
+
+ return 0;
+}
+
+static int ne_probe(struct pci_dev *pdev, const struct pci_device_id *id)
+{
+ struct ne_pci_dev *ne_pci_dev = NULL;
+ int rc = -EINVAL;
+
+ ne_pci_dev = kzalloc(sizeof(*ne_pci_dev), GFP_KERNEL);
+ if (!ne_pci_dev)
+ return -ENOMEM;
+
+ rc = pci_enable_device(pdev);
+ if (rc < 0) {
+ dev_err_ratelimited(&pdev->dev,
+ "Failure in pci dev enable [rc=%d]\n", rc);
+
+ goto err_pci_enable_dev;
+ }
+
+ rc = pci_request_regions_exclusive(pdev, "ne_pci_dev");
+ if (rc < 0) {
+ dev_err_ratelimited(&pdev->dev,
+ "Failure in pci request regions [rc=%d]\n",
+ rc);
+
+ goto err_req_regions;
+ }
+
+ ne_pci_dev->iomem_base = pci_iomap(pdev, PCI_BAR_NE, 0);
+ if (!ne_pci_dev->iomem_base) {
+ rc = -ENOMEM;
+
+ dev_err_ratelimited(&pdev->dev,
+ "Failure in pci bar mapping [rc=%d]\n", rc);
+
+ goto err_iomap;
+ }
+
+ rc = ne_setup_msix(pdev, ne_pci_dev);
+ if (rc < 0) {
+ dev_err_ratelimited(&pdev->dev,
+ "Failure in pci dev msix setup [rc=%d]\n",
+ rc);
+
+ goto err_setup_msix;
+ }
+
+ rc = ne_pci_dev_disable(pdev, ne_pci_dev);
+ if (rc < 0) {
+ dev_err_ratelimited(&pdev->dev,
+ "Failure in ne_pci_dev disable [rc=%d]\n",
+ rc);
+
+ goto err_ne_pci_dev_disable;
+ }
+
+ rc = ne_pci_dev_enable(pdev, ne_pci_dev);
+ if (rc < 0) {
+ dev_err_ratelimited(&pdev->dev,
+ "Failure in ne_pci_dev enable [rc=%d]\n",
+ rc);
+
+ goto err_ne_pci_dev_enable;
+ }
+
+ atomic_set(&ne_pci_dev->cmd_reply_avail, 0);
+ init_waitqueue_head(&ne_pci_dev->cmd_reply_wait_q);
+ INIT_LIST_HEAD(&ne_pci_dev->enclaves_list);
+ mutex_init(&ne_pci_dev->enclaves_list_mutex);
+ mutex_init(&ne_pci_dev->pci_dev_mutex);
+
+ pci_set_drvdata(pdev, ne_pci_dev);
+
+ return 0;
+
+err_ne_pci_dev_enable:
+err_ne_pci_dev_disable:
+ pci_free_irq_vectors(pdev);
+err_setup_msix:
+ pci_iounmap(pdev, ne_pci_dev->iomem_base);
+err_iomap:
+ pci_release_regions(pdev);
+err_req_regions:
+ pci_disable_device(pdev);
+err_pci_enable_dev:
+ kzfree(ne_pci_dev);
+ return rc;
+}
+
+static void ne_remove(struct pci_dev *pdev)
+{
+ struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
+
+ if (!ne_pci_dev || !ne_pci_dev->iomem_base)
+ return;
+
+ ne_pci_dev_disable(pdev, ne_pci_dev);
+
+ pci_set_drvdata(pdev, NULL);
+
+ pci_free_irq_vectors(pdev);
+
+ pci_iounmap(pdev, ne_pci_dev->iomem_base);
+
+ kzfree(ne_pci_dev);
+
+ pci_release_regions(pdev);
+
+ pci_disable_device(pdev);
+}
+
+/*
+ * TODO: Add suspend / resume functions for power management w/ CONFIG_PM, if
+ * needed.
+ */
+struct pci_driver ne_pci_driver = {
+ .name = "ne_pci_dev",
+ .id_table = ne_pci_ids,
+ .probe = ne_probe,
+ .remove = ne_remove,
+};
--
2.20.1 (Apple Git-117)





2020-04-21 18:44:50

by Paraschiv, Andra-Irina

Subject: [PATCH v1 05/15] nitro_enclaves: Handle PCI device command requests

The Nitro Enclaves PCI device exposes an MMIO space that this driver
uses to submit command requests and to receive command replies, e.g. for
enclave creation / termination or setting enclave resources.

Add logic for handling PCI device command requests based on the given
command type.

Register an MSI-X interrupt vector for command reply notifications, to
handle this type of communication event.
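
As an illustration only, a kernel-side sketch of how a caller could use
ne_do_request() with one of the defined command types, e.g. SLOT_ALLOC;
error handling is trimmed and the helper name is a placeholder:

  /* Sketch only: allocate a slot and retrieve its unique id. */
  static int ne_slot_alloc_sketch(struct pci_dev *pdev, u64 *slot_uid)
  {
          struct ne_pci_dev_cmd_reply cmd_reply = {};
          struct slot_alloc_req slot_alloc_req = {};
          int rc;

          rc = ne_do_request(pdev, SLOT_ALLOC, &slot_alloc_req,
                             sizeof(slot_alloc_req), &cmd_reply,
                             sizeof(cmd_reply));
          if (rc < 0)
                  return rc;

          *slot_uid = cmd_reply.slot_uid;

          return 0;
  }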

Signed-off-by: Alexandru-Catalin Vasile <[email protected]>
Signed-off-by: Andra Paraschiv <[email protected]>
---
.../virt/amazon/nitro_enclaves/ne_pci_dev.c | 264 ++++++++++++++++++
1 file changed, 264 insertions(+)

diff --git a/drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c b/drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c
index 8fbee95ea291..7453d129689a 100644
--- a/drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c
+++ b/drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c
@@ -40,6 +40,251 @@ static const struct pci_device_id ne_pci_ids[] = {

MODULE_DEVICE_TABLE(pci, ne_pci_ids);

+/**
+ * ne_submit_request - Submit command request to the PCI device based on the
+ * command type.
+ *
+ * This function gets called with the ne_pci_dev mutex held.
+ *
+ * @pdev: PCI device to send the command to.
+ * @cmd_type: command type of the request sent to the PCI device.
+ * @cmd_request: command request payload.
+ * @cmd_request_size: size of the command request payload.
+ *
+ * @returns: 0 on success, negative return value on failure.
+ */
+static int ne_submit_request(struct pci_dev *pdev,
+ enum ne_pci_dev_cmd_type cmd_type,
+ void *cmd_request, size_t cmd_request_size)
+{
+ struct ne_pci_dev *ne_pci_dev = NULL;
+
+ BUG_ON(!pdev);
+
+ ne_pci_dev = pci_get_drvdata(pdev);
+ BUG_ON(!ne_pci_dev);
+ BUG_ON(!ne_pci_dev->iomem_base);
+
+ if (WARN_ON(cmd_type <= INVALID_CMD || cmd_type >= MAX_CMD)) {
+ dev_err_ratelimited(&pdev->dev, "Invalid cmd type=%d\n",
+ cmd_type);
+
+ return -EINVAL;
+ }
+
+ if (WARN_ON(!cmd_request))
+ return -EINVAL;
+
+ if (WARN_ON(cmd_request_size > NE_SEND_DATA_SIZE)) {
+ dev_err_ratelimited(&pdev->dev,
+ "Invalid req size=%ld for cmd type=%d\n",
+ cmd_request_size, cmd_type);
+
+ return -EINVAL;
+ }
+
+ memcpy_toio(ne_pci_dev->iomem_base + NE_SEND_DATA, cmd_request,
+ cmd_request_size);
+
+ iowrite32(cmd_type, ne_pci_dev->iomem_base + NE_COMMAND);
+
+ return 0;
+}
+
+/**
+ * ne_retrieve_reply - Retrieve reply from the PCI device.
+ *
+ * This function gets called with the ne_pci_dev mutex held.
+ *
+ * @pdev: PCI device to receive the reply from.
+ * @cmd_reply: command reply payload.
+ * @cmd_reply_size: size of the command reply payload.
+ *
+ * @returns: 0 on success, negative return value on failure.
+ */
+static int ne_retrieve_reply(struct pci_dev *pdev,
+ struct ne_pci_dev_cmd_reply *cmd_reply,
+ size_t cmd_reply_size)
+{
+ struct ne_pci_dev *ne_pci_dev = NULL;
+
+ BUG_ON(!pdev);
+
+ ne_pci_dev = pci_get_drvdata(pdev);
+ BUG_ON(!ne_pci_dev);
+ BUG_ON(!ne_pci_dev->iomem_base);
+
+ if (WARN_ON(!cmd_reply))
+ return -EINVAL;
+
+ if (WARN_ON(cmd_reply_size > NE_RECV_DATA_SIZE)) {
+ dev_err_ratelimited(&pdev->dev, "Invalid reply size=%ld\n",
+ cmd_reply_size);
+
+ return -EINVAL;
+ }
+
+ memcpy_fromio(cmd_reply, ne_pci_dev->iomem_base + NE_RECV_DATA,
+ cmd_reply_size);
+
+ return 0;
+}
+
+/**
+ * ne_wait_for_reply - Wait for a reply of a PCI command.
+ *
+ * This function gets called with the ne_pci_dev mutex held.
+ *
+ * @pdev: PCI device for which a reply is waited.
+ *
+ * @returns: 0 on success, negative return value on failure.
+ */
+static int ne_wait_for_reply(struct pci_dev *pdev)
+{
+ struct ne_pci_dev *ne_pci_dev = NULL;
+ int rc = -EINVAL;
+
+ BUG_ON(!pdev);
+
+ ne_pci_dev = pci_get_drvdata(pdev);
+ BUG_ON(!ne_pci_dev);
+
+ /*
+ * TODO: Update to _interruptible and handle interrupted wait event
+ * e.g. -ERESTARTSYS, incoming signals + add / update timeout.
+ */
+ rc = wait_event_timeout(ne_pci_dev->cmd_reply_wait_q,
+ atomic_read(&ne_pci_dev->cmd_reply_avail) != 0,
+ msecs_to_jiffies(DEFAULT_TIMEOUT_MSECS));
+ if (!rc) {
+ pr_err("Wait event timed out when waiting for PCI cmd reply\n");
+
+ return -ETIMEDOUT;
+ }
+
+ return 0;
+}
+
+int ne_do_request(struct pci_dev *pdev, enum ne_pci_dev_cmd_type cmd_type,
+ void *cmd_request, size_t cmd_request_size,
+ struct ne_pci_dev_cmd_reply *cmd_reply, size_t cmd_reply_size)
+{
+ struct ne_pci_dev *ne_pci_dev = NULL;
+ int rc = -EINVAL;
+
+ BUG_ON(!pdev);
+
+ ne_pci_dev = pci_get_drvdata(pdev);
+ BUG_ON(!ne_pci_dev);
+ BUG_ON(!ne_pci_dev->iomem_base);
+
+ if (WARN_ON(cmd_type <= INVALID_CMD || cmd_type >= MAX_CMD)) {
+ dev_err_ratelimited(&pdev->dev, "Invalid cmd type=%d\n",
+ cmd_type);
+
+ return -EINVAL;
+ }
+
+ if (WARN_ON(!cmd_request))
+ return -EINVAL;
+
+ if (WARN_ON(cmd_request_size > NE_SEND_DATA_SIZE)) {
+ dev_err_ratelimited(&pdev->dev,
+ "Invalid req size=%ld for cmd type=%d\n",
+ cmd_request_size, cmd_type);
+
+ return -EINVAL;
+ }
+
+ if (WARN_ON(!cmd_reply))
+ return -EINVAL;
+
+ if (WARN_ON(cmd_reply_size > NE_RECV_DATA_SIZE)) {
+ dev_err_ratelimited(&pdev->dev, "Invalid reply size=%ld\n",
+ cmd_reply_size);
+
+ return -EINVAL;
+ }
+
+ /*
+ * Use this mutex so that the PCI device handles one command request at
+ * a time.
+ */
+ mutex_lock(&ne_pci_dev->pci_dev_mutex);
+
+ atomic_set(&ne_pci_dev->cmd_reply_avail, 0);
+
+ rc = ne_submit_request(pdev, cmd_type, cmd_request, cmd_request_size);
+ if (rc < 0) {
+ dev_err_ratelimited(&pdev->dev,
+ "Failure in submit cmd request [rc=%d]\n",
+ rc);
+
+ mutex_unlock(&ne_pci_dev->pci_dev_mutex);
+
+ return rc;
+ }
+
+ rc = ne_wait_for_reply(pdev);
+ if (rc < 0) {
+ dev_err_ratelimited(&pdev->dev,
+ "Failure in wait cmd reply [rc=%d]\n",
+ rc);
+
+ mutex_unlock(&ne_pci_dev->pci_dev_mutex);
+
+ return rc;
+ }
+
+ rc = ne_retrieve_reply(pdev, cmd_reply, cmd_reply_size);
+ if (rc < 0) {
+ dev_err_ratelimited(&pdev->dev,
+ "Failure in retrieve cmd reply [rc=%d]\n",
+ rc);
+
+ mutex_unlock(&ne_pci_dev->pci_dev_mutex);
+
+ return rc;
+ }
+
+ atomic_set(&ne_pci_dev->cmd_reply_avail, 0);
+
+ if (cmd_reply->rc < 0) {
+ dev_err_ratelimited(&pdev->dev,
+ "Failure in cmd process logic [rc=%d]\n",
+ cmd_reply->rc);
+
+ mutex_unlock(&ne_pci_dev->pci_dev_mutex);
+
+ return cmd_reply->rc;
+ }
+
+ mutex_unlock(&ne_pci_dev->pci_dev_mutex);
+
+ return 0;
+}
+
+/**
+ * ne_reply_handler - Interrupt handler for retrieving a reply matching
+ * a request sent to the PCI device for enclave lifetime management.
+ *
+ * @irq: received interrupt for a reply sent by the PCI device.
+ * @args: PCI device private data structure.
+ *
+ * @returns: IRQ_HANDLED on handled interrupt, IRQ_NONE otherwise.
+ */
+static irqreturn_t ne_reply_handler(int irq, void *args)
+{
+ struct ne_pci_dev *ne_pci_dev = (struct ne_pci_dev *)args;
+
+ atomic_set(&ne_pci_dev->cmd_reply_avail, 1);
+
+ /* TODO: Update to _interruptible. */
+ wake_up(&ne_pci_dev->cmd_reply_wait_q);
+
+ return IRQ_HANDLED;
+}
+
/**
* ne_setup_msix - Setup MSI-X vectors for the PCI device.
*
@@ -75,8 +320,25 @@ static int ne_setup_msix(struct pci_dev *pdev, struct ne_pci_dev *ne_pci_dev)
goto err_alloc_irq_vecs;
}

+ /*
+ * This IRQ gets triggered every time the PCI device responds to a
+ * command request. The reply is then retrieved, reading from the MMIO
+ * space of the PCI device.
+ */
+ rc = request_irq(pci_irq_vector(pdev, NE_VEC_REPLY),
+ ne_reply_handler, 0, "enclave_cmd", ne_pci_dev);
+ if (rc < 0) {
+ dev_err_ratelimited(&pdev->dev,
+ "Failure in allocating irq reply [rc=%d]\n",
+ rc);
+
+ goto err_req_irq_reply;
+ }
+
return 0;

+err_req_irq_reply:
+ pci_free_irq_vectors(pdev);
err_alloc_irq_vecs:
return rc;
}
@@ -232,6 +494,7 @@ static int ne_probe(struct pci_dev *pdev, const struct pci_device_id *id)

err_ne_pci_dev_enable:
err_ne_pci_dev_disable:
+ free_irq(pci_irq_vector(pdev, NE_VEC_REPLY), ne_pci_dev);
pci_free_irq_vectors(pdev);
err_setup_msix:
pci_iounmap(pdev, ne_pci_dev->iomem_base);
@@ -255,6 +518,7 @@ static void ne_remove(struct pci_dev *pdev)

pci_set_drvdata(pdev, NULL);

+ free_irq(pci_irq_vector(pdev, NE_VEC_REPLY), ne_pci_dev);
pci_free_irq_vectors(pdev);

pci_iounmap(pdev, ne_pci_dev->iomem_base);
--
2.20.1 (Apple Git-117)





2020-04-21 18:44:58

by Paraschiv, Andra-Irina

Subject: [PATCH v1 07/15] nitro_enclaves: Init misc device providing the ioctl interface

The Nitro Enclaves driver provides an ioctl interface to user space
for enclave lifetime management, e.g. enclave creation / termination and
setting enclave resources such as memory and CPU.

This ioctl interface is mapped to a Nitro Enclaves misc device.

Signed-off-by: Andra Paraschiv <[email protected]>
---
.../virt/amazon/nitro_enclaves/ne_misc_dev.c | 174 ++++++++++++++++++
.../virt/amazon/nitro_enclaves/ne_pci_dev.c | 13 ++
2 files changed, 187 insertions(+)
create mode 100644 drivers/virt/amazon/nitro_enclaves/ne_misc_dev.c

diff --git a/drivers/virt/amazon/nitro_enclaves/ne_misc_dev.c b/drivers/virt/amazon/nitro_enclaves/ne_misc_dev.c
new file mode 100644
index 000000000000..d22a76ed07e5
--- /dev/null
+++ b/drivers/virt/amazon/nitro_enclaves/ne_misc_dev.c
@@ -0,0 +1,174 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/**
+ * Enclave lifetime management driver for Nitro Enclaves (NE).
+ * Nitro is a hypervisor that has been developed by Amazon.
+ */
+
+#include <linux/anon_inodes.h>
+#include <linux/bug.h>
+#include <linux/cpu.h>
+#include <linux/device.h>
+#include <linux/file.h>
+#include <linux/hugetlb.h>
+#include <linux/kvm_host.h>
+#include <linux/list.h>
+#include <linux/miscdevice.h>
+#include <linux/mm.h>
+#include <linux/mman.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/nitro_enclaves.h>
+#include <linux/pci.h>
+#include <linux/poll.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+
+#include "ne_misc_dev.h"
+#include "ne_pci_dev.h"
+
+#define NE_DEV_NAME "nitro_enclaves"
+
+#define MIN_MEM_REGION_SIZE (2 * 1024UL * 1024UL)
+
+static char *ne_cpus;
+module_param(ne_cpus, charp, 0644);
+MODULE_PARM_DESC(ne_cpus, "<cpu-list> - CPU pool used for Nitro Enclaves");
+
+/* CPU pool used for Nitro Enclaves. */
+struct ne_cpu_pool {
+ /* Available CPUs in the pool. */
+ cpumask_var_t avail;
+};
+
+static struct ne_cpu_pool ne_cpu_pool;
+
+static struct mutex ne_cpu_pool_mutex;
+
+static int ne_open(struct inode *node, struct file *file)
+{
+ return 0;
+}
+
+static long ne_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+ switch (cmd) {
+
+ default:
+ return -ENOTTY;
+ }
+
+ return 0;
+}
+
+static int ne_release(struct inode *inode, struct file *file)
+{
+ return 0;
+}
+
+static const struct file_operations ne_fops = {
+ .owner = THIS_MODULE,
+ .llseek = noop_llseek,
+ .unlocked_ioctl = ne_ioctl,
+ .open = ne_open,
+ .release = ne_release,
+};
+
+struct miscdevice ne_miscdevice = {
+ .minor = MISC_DYNAMIC_MINOR,
+ .name = NE_DEV_NAME,
+ .fops = &ne_fops,
+ .mode = 0664,
+};
+
+static int __init ne_init(void)
+{
+ unsigned int cpu = 0;
+ int rc = -EINVAL;
+
+ memset(&ne_cpu_pool, 0, sizeof(ne_cpu_pool));
+
+ if (!zalloc_cpumask_var(&ne_cpu_pool.avail, GFP_KERNEL))
+ return -ENOMEM;
+
+ mutex_init(&ne_cpu_pool_mutex);
+
+ rc = cpulist_parse(ne_cpus, ne_cpu_pool.avail);
+ if (rc < 0) {
+ pr_err_ratelimited("Failure in cpulist parse [rc=%d]\n", rc);
+
+ goto err_cpulist_parse;
+ }
+
+ for_each_cpu(cpu, ne_cpu_pool.avail) {
+ rc = remove_cpu(cpu);
+ if (rc != 0) {
+ pr_err_ratelimited("Failure in cpu=%d remove [rc=%d]\n",
+ cpu, rc);
+
+ goto err_remove_cpu;
+ }
+ }
+
+ rc = pci_register_driver(&ne_pci_driver);
+ if (rc < 0) {
+ pr_err_ratelimited("Failure in pci register driver [rc=%d]\n",
+ rc);
+
+ goto err_pci_register_driver;
+ }
+
+ return 0;
+
+err_pci_register_driver:
+err_remove_cpu:
+ for_each_cpu(cpu, ne_cpu_pool.avail)
+ add_cpu(cpu);
+err_cpulist_parse:
+ free_cpumask_var(ne_cpu_pool.avail);
+ return rc;
+}
+
+static void __exit ne_exit(void)
+{
+ unsigned int cpu = 0;
+ int rc = -EINVAL;
+
+ pci_unregister_driver(&ne_pci_driver);
+
+ if (!ne_cpu_pool.avail)
+ return;
+
+ for_each_cpu(cpu, ne_cpu_pool.avail) {
+ rc = add_cpu(cpu);
+ if (WARN_ON(rc != 0))
+ pr_err_ratelimited("Failure in cpu=%d add [rc=%d]\n",
+ cpu, rc);
+ }
+
+ free_cpumask_var(ne_cpu_pool.avail);
+}
+
+/* TODO: Handle actions such as reboot, kexec. */
+
+module_init(ne_init);
+module_exit(ne_exit);
+
+MODULE_AUTHOR("Amazon.com, Inc. or its affiliates");
+MODULE_DESCRIPTION("Nitro Enclaves Driver");
+MODULE_LICENSE("GPL v2");
diff --git a/drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c b/drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c
index 884acbb92305..19b3836b08f4 100644
--- a/drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c
+++ b/drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c
@@ -593,6 +593,15 @@ static int ne_probe(struct pci_dev *pdev, const struct pci_device_id *id)
goto err_ne_pci_dev_enable;
}

+ rc = misc_register(&ne_miscdevice);
+ if (rc < 0) {
+ dev_err_ratelimited(&pdev->dev,
+ "Failure in misc dev register [rc=%d]\n",
+ rc);
+
+ goto err_misc_register;
+ }
+
atomic_set(&ne_pci_dev->cmd_reply_avail, 0);
init_waitqueue_head(&ne_pci_dev->cmd_reply_wait_q);
INIT_LIST_HEAD(&ne_pci_dev->enclaves_list);
@@ -603,6 +612,8 @@ static int ne_probe(struct pci_dev *pdev, const struct pci_device_id *id)

return 0;

+err_misc_register:
+ ne_pci_dev_disable(pdev, ne_pci_dev);
err_ne_pci_dev_enable:
err_ne_pci_dev_disable:
free_irq(pci_irq_vector(pdev, NE_VEC_EVENT), ne_pci_dev);
@@ -627,6 +638,8 @@ static void ne_remove(struct pci_dev *pdev)
if (!ne_pci_dev || !ne_pci_dev->iomem_base)
return;

+ misc_deregister(&ne_miscdevice);
+
ne_pci_dev_disable(pdev, ne_pci_dev);

pci_set_drvdata(pdev, NULL);
--
2.20.1 (Apple Git-117)





2020-04-21 18:45:15

by Paraschiv, Andra-Irina

Subject: [PATCH v1 06/15] nitro_enclaves: Handle out-of-band PCI device events

In addition to the replies sent by the Nitro Enclaves PCI device in
response to command requests, out-of-band enclave events can happen, e.g.
an enclave crash. In this case, the Nitro Enclaves driver needs to be
aware of the event and notify the corresponding user space process that
abstracts the enclave.

Register an MSI-X interrupt vector to be used for this kind of
out-of-band event. The interrupt notifies that the state of an enclave
has changed and the driver logic scans the state of each running enclave
to identify which one(s) the notification is intended for.

Create a workqueue to handle the out-of-band events. Notify the user space
enclave process, which uses a polling mechanism on the enclave fd. The
enclave fd is returned as a result of the KVM_CREATE_VM ioctl call.
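
As an illustration only, a minimal user space sketch of waiting for such a
notification on the enclave fd; the exact poll mask signaled by the driver
is an assumption here, error handling is trimmed and the helper name is a
placeholder:

  #include <poll.h>

  static int wait_for_enclave_event(int enclave_fd)
  {
          struct pollfd pollfd = {
                  .fd = enclave_fd,
                  .events = POLLIN, /* assumed mask; any event means a state change */
          };

          if (poll(&pollfd, 1, -1) < 0)
                  return -1;

          /* The enclave state changed, e.g. the enclave crashed. */

          return 0;
  }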

Signed-off-by: Alexandru-Catalin Vasile <[email protected]>
Signed-off-by: Andra Paraschiv <[email protected]>
---
.../virt/amazon/nitro_enclaves/ne_pci_dev.c | 120 ++++++++++++++++++
1 file changed, 120 insertions(+)

diff --git a/drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c b/drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c
index 7453d129689a..884acbb92305 100644
--- a/drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c
+++ b/drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c
@@ -285,6 +285,85 @@ static irqreturn_t ne_reply_handler(int irq, void *args)
return IRQ_HANDLED;
}

+/**
+ * ne_event_work_handler - Work queue handler for notifying enclaves on
+ * a state change received by the event interrupt handler.
+ *
+ * An out-of-band event is being issued by the Nitro Hypervisor when at least
+ * one enclave is changing state without client interaction.
+ *
+ * @work: item containing the Nitro Enclaves PCI device for which a
+ * out-of-band event was issued.
+ */
+static void ne_event_work_handler(struct work_struct *work)
+{
+ struct ne_pci_dev_cmd_reply cmd_reply = {};
+ struct ne_enclave *ne_enclave = NULL;
+ struct ne_pci_dev *ne_pci_dev =
+ container_of(work, struct ne_pci_dev, notify_work);
+ int rc = -EINVAL;
+ struct slot_info_req slot_info_req = {};
+
+ mutex_lock(&ne_pci_dev->enclaves_list_mutex);
+
+ /*
+ * Iterate over all enclaves registered for the Nitro Enclaves
+ * PCI device and determine for which enclave(s) the out-of-band event
+ * is corresponding to.
+ */
+ list_for_each_entry(ne_enclave, &ne_pci_dev->enclaves_list,
+ enclave_list_entry) {
+ mutex_lock(&ne_enclave->enclave_info_mutex);
+
+ /*
+ * Enclaves that were never started cannot receive out-of-band
+ * events.
+ */
+ if (ne_enclave->state != NE_STATE_RUNNING)
+ goto unlock;
+
+ slot_info_req.slot_uid = ne_enclave->slot_uid;
+
+ rc = ne_do_request(ne_enclave->pdev, SLOT_INFO, &slot_info_req,
+ sizeof(slot_info_req), &cmd_reply,
+ sizeof(cmd_reply));
+ WARN_ON(rc < 0);
+
+ /* Notify enclave process that the enclave state changed. */
+ if (ne_enclave->state != cmd_reply.state) {
+ ne_enclave->state = cmd_reply.state;
+
+ ne_enclave->has_event = true;
+
+ wake_up_interruptible(&ne_enclave->eventq);
+ }
+
+unlock:
+ mutex_unlock(&ne_enclave->enclave_info_mutex);
+ }
+
+ mutex_unlock(&ne_pci_dev->enclaves_list_mutex);
+}
+
+/**
+ * ne_event_handler - Interrupt handler for PCI device out-of-band
+ * events. This interrupt does not supply any data in the MMIO region.
+ * It notifies a change in the state of any of the launched enclaves.
+ *
+ * @irq: received interrupt for an out-of-band event.
+ * @args: PCI device private data structure.
+ *
+ * @returns: IRQ_HANDLED on handled interrupt, IRQ_NONE otherwise.
+ */
+static irqreturn_t ne_event_handler(int irq, void *args)
+{
+ struct ne_pci_dev *ne_pci_dev = (struct ne_pci_dev *)args;
+
+ queue_work(ne_pci_dev->event_wq, &ne_pci_dev->notify_work);
+
+ return IRQ_HANDLED;
+}
+
/**
* ne_setup_msix - Setup MSI-X vectors for the PCI device.
*
@@ -311,6 +390,19 @@ static int ne_setup_msix(struct pci_dev *pdev, struct ne_pci_dev *ne_pci_dev)
return rc;
}

+ ne_pci_dev->event_wq = create_singlethread_workqueue("ne_pci_dev_wq");
+ if (!ne_pci_dev->event_wq) {
+ rc = -ENOMEM;
+
+ dev_err_ratelimited(&pdev->dev,
+ "Cannot get wq for device events [rc=%d]\n",
+ rc);
+
+ goto err_create_wq;
+ }
+
+ INIT_WORK(&ne_pci_dev->notify_work, ne_event_work_handler);
+
rc = pci_alloc_irq_vectors(pdev, nr_vecs, nr_vecs, PCI_IRQ_MSIX);
if (rc < 0) {
dev_err_ratelimited(&pdev->dev,
@@ -335,11 +427,30 @@ static int ne_setup_msix(struct pci_dev *pdev, struct ne_pci_dev *ne_pci_dev)
goto err_req_irq_reply;
}

+ /*
+ * This IRQ gets triggered every time any enclave's state changes. Its
+ * handler then scans for the changes and propagates them to the user
+ * space.
+ */
+ rc = request_irq(pci_irq_vector(pdev, NE_VEC_EVENT),
+ ne_event_handler, 0, "enclave_evt", ne_pci_dev);
+ if (rc < 0) {
+ dev_err_ratelimited(&pdev->dev,
+ "Failure in allocating irq event [rc=%d]\n",
+ rc);
+
+ goto err_req_irq_event;
+ }
+
return 0;

+err_req_irq_event:
+ free_irq(pci_irq_vector(pdev, NE_VEC_REPLY), ne_pci_dev);
err_req_irq_reply:
pci_free_irq_vectors(pdev);
err_alloc_irq_vecs:
+ destroy_workqueue(ne_pci_dev->event_wq);
+err_create_wq:
return rc;
}

@@ -494,8 +605,10 @@ static int ne_probe(struct pci_dev *pdev, const struct pci_device_id *id)

err_ne_pci_dev_enable:
err_ne_pci_dev_disable:
+ free_irq(pci_irq_vector(pdev, NE_VEC_EVENT), ne_pci_dev);
free_irq(pci_irq_vector(pdev, NE_VEC_REPLY), ne_pci_dev);
pci_free_irq_vectors(pdev);
+ destroy_workqueue(ne_pci_dev->event_wq);
err_setup_msix:
pci_iounmap(pdev, ne_pci_dev->iomem_base);
err_iomap:
@@ -518,9 +631,16 @@ static void ne_remove(struct pci_dev *pdev)

pci_set_drvdata(pdev, NULL);

+ free_irq(pci_irq_vector(pdev, NE_VEC_EVENT), ne_pci_dev);
free_irq(pci_irq_vector(pdev, NE_VEC_REPLY), ne_pci_dev);
pci_free_irq_vectors(pdev);

+ if (ne_pci_dev->event_wq) {
+ flush_work(&ne_pci_dev->notify_work);
+ flush_workqueue(ne_pci_dev->event_wq);
+ destroy_workqueue(ne_pci_dev->event_wq);
+ }
+
pci_iounmap(pdev, ne_pci_dev->iomem_base);

kzfree(ne_pci_dev);
--
2.20.1 (Apple Git-117)





2020-04-21 18:45:53

by Paraschiv, Andra-Irina

Subject: [PATCH v1 12/15] nitro_enclaves: Add logic for enclave termination

An enclave is associated with an fd that is returned after the enclave
creation logic is completed. This enclave fd is further used to set up
enclave resources. Once the enclave needs to be terminated, the enclave
fd is closed.

Add logic for enclave termination, which is mapped to the enclave fd
release callback. Free the internal enclave info used for bookkeeping.
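
From user space, termination is then simply closing the enclave fd; a
minimal sketch, with the helper name as a placeholder:

  #include <unistd.h>

  /* Closing the enclave fd triggers the termination path added below. */
  static void terminate_enclave(int enclave_fd)
  {
          close(enclave_fd);
  }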

Signed-off-by: Alexandru Vasile <[email protected]>
Signed-off-by: Andra Paraschiv <[email protected]>
---
.../virt/amazon/nitro_enclaves/ne_misc_dev.c | 164 ++++++++++++++++++
1 file changed, 164 insertions(+)

diff --git a/drivers/virt/amazon/nitro_enclaves/ne_misc_dev.c b/drivers/virt/amazon/nitro_enclaves/ne_misc_dev.c
index f07eb46f7995..08ba8295d524 100644
--- a/drivers/virt/amazon/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/amazon/nitro_enclaves/ne_misc_dev.c
@@ -611,8 +611,172 @@ static long ne_enclave_ioctl(struct file *file, unsigned int cmd,
return 0;
}

+/**
+ * ne_enclave_remove_all_mem_region_entries - Remove all memory region
+ * entries from the enclave data structure.
+ *
+ * This function gets called with the ne_enclave mutex held.
+ *
+ * @ne_enclave: private data associated with the current enclave.
+ */
+static void ne_enclave_remove_all_mem_region_entries(
+ struct ne_enclave *ne_enclave)
+{
+ struct ne_mem_region *ne_mem_region = NULL;
+ struct ne_mem_region *ne_mem_region_tmp = NULL;
+
+ BUG_ON(!ne_enclave);
+
+ list_for_each_entry_safe(ne_mem_region, ne_mem_region_tmp,
+ &ne_enclave->mem_regions_list,
+ mem_region_list_entry) {
+ list_del(&ne_mem_region->mem_region_list_entry);
+
+ unpin_user_pages(ne_mem_region->pages,
+ ne_mem_region->nr_pages);
+
+ kzfree(ne_mem_region->pages);
+
+ kzfree(ne_mem_region);
+ }
+}
+
+/**
+ * ne_enclave_remove_all_vcpu_id_entries - Remove all vCPU id entries
+ * from the enclave data structure.
+ *
+ * This function gets called with the ne_enclave mutex held.
+ *
+ * @ne_enclave: private data associated with the current enclave.
+ */
+static void ne_enclave_remove_all_vcpu_id_entries(struct ne_enclave *ne_enclave)
+{
+ unsigned int cpu = 0;
+ struct ne_vcpu_id *ne_vcpu_id = NULL;
+ struct ne_vcpu_id *ne_vcpu_id_tmp = NULL;
+
+ BUG_ON(!ne_enclave);
+
+ mutex_lock(&ne_cpu_pool_mutex);
+
+ list_for_each_entry_safe(ne_vcpu_id, ne_vcpu_id_tmp,
+ &ne_enclave->vcpu_ids_list,
+ vcpu_id_list_entry) {
+ list_del(&ne_vcpu_id->vcpu_id_list_entry);
+
+ /* Update the available CPU pool. */
+ cpumask_set_cpu(ne_vcpu_id->vcpu_id, ne_cpu_pool.avail);
+
+ kzfree(ne_vcpu_id);
+ }
+
+ /* If any siblings left in the enclave CPU pool, move to available. */
+ for_each_cpu(cpu, ne_enclave->cpu_siblings) {
+ cpumask_clear_cpu(cpu, ne_enclave->cpu_siblings);
+
+ cpumask_set_cpu(cpu, ne_cpu_pool.avail);
+ }
+
+ free_cpumask_var(ne_enclave->cpu_siblings);
+
+ mutex_unlock(&ne_cpu_pool_mutex);
+}
+
+/**
+ * ne_pci_dev_remove_enclave_entry - Remove enclave entry from the data
+ * structure that is part of the PCI device private data.
+ *
+ * This function gets called with the ne_pci_dev enclave mutex held.
+ *
+ * @ne_enclave: private data associated with the current enclave.
+ * @ne_pci_dev: private data associated with the PCI device.
+ */
+static void ne_pci_dev_remove_enclave_entry(struct ne_enclave *ne_enclave,
+ struct ne_pci_dev *ne_pci_dev)
+{
+ struct ne_enclave *ne_enclave_entry = NULL;
+ struct ne_enclave *ne_enclave_entry_tmp = NULL;
+
+ BUG_ON(!ne_enclave);
+ BUG_ON(!ne_pci_dev);
+
+ list_for_each_entry_safe(ne_enclave_entry, ne_enclave_entry_tmp,
+ &ne_pci_dev->enclaves_list,
+ enclave_list_entry) {
+ if (ne_enclave_entry->slot_uid == ne_enclave->slot_uid) {
+ list_del(&ne_enclave_entry->enclave_list_entry);
+
+ break;
+ }
+ }
+}
+
static int ne_enclave_release(struct inode *inode, struct file *file)
{
+ struct ne_pci_dev_cmd_reply cmd_reply = {};
+ struct enclave_stop_req enclave_stop_request = {};
+ struct ne_enclave *ne_enclave = file->private_data;
+ struct ne_pci_dev *ne_pci_dev = NULL;
+ int rc = -EINVAL;
+ struct slot_free_req slot_free_req = {};
+
+ BUG_ON(!ne_enclave);
+ BUG_ON(!ne_enclave->pdev);
+
+ ne_pci_dev = pci_get_drvdata(ne_enclave->pdev);
+ BUG_ON(!ne_pci_dev);
+
+ /*
+ * Acquire the enclave list mutex before the enclave mutex
+ * in order to avoid deadlocks with @ref ne_event_work_handler.
+ */
+ mutex_lock(&ne_pci_dev->enclaves_list_mutex);
+ mutex_lock(&ne_enclave->enclave_info_mutex);
+
+ if (ne_enclave->state != NE_STATE_INIT &&
+ ne_enclave->state != NE_STATE_STOPPED) {
+ enclave_stop_request.slot_uid = ne_enclave->slot_uid;
+
+ rc = ne_do_request(ne_enclave->pdev, ENCLAVE_STOP,
+ &enclave_stop_request,
+ sizeof(enclave_stop_request), &cmd_reply,
+ sizeof(cmd_reply));
+ if (WARN_ON(rc < 0)) {
+ pr_err_ratelimited("Failure in enclave stop [rc=%d]\n",
+ rc);
+
+ mutex_unlock(&ne_enclave->enclave_info_mutex);
+ mutex_unlock(&ne_pci_dev->enclaves_list_mutex);
+
+ return rc;
+ }
+
+ memset(&cmd_reply, 0, sizeof(cmd_reply));
+ }
+
+ slot_free_req.slot_uid = ne_enclave->slot_uid;
+
+ rc = ne_do_request(ne_enclave->pdev, SLOT_FREE, &slot_free_req,
+ sizeof(slot_free_req), &cmd_reply,
+ sizeof(cmd_reply));
+ if (WARN_ON(rc < 0)) {
+ pr_err_ratelimited("Failure in slot free [rc=%d]\n", rc);
+
+ mutex_unlock(&ne_enclave->enclave_info_mutex);
+ mutex_unlock(&ne_pci_dev->enclaves_list_mutex);
+
+ return rc;
+ }
+
+ ne_pci_dev_remove_enclave_entry(ne_enclave, ne_pci_dev);
+ ne_enclave_remove_all_mem_region_entries(ne_enclave);
+ ne_enclave_remove_all_vcpu_id_entries(ne_enclave);
+
+ mutex_unlock(&ne_enclave->enclave_info_mutex);
+ mutex_unlock(&ne_pci_dev->enclaves_list_mutex);
+
+ kzfree(ne_enclave);
+
return 0;
}

--
2.20.1 (Apple Git-117)





2020-04-21 18:47:13

by Paraschiv, Andra-Irina

[permalink] [raw]
Subject: [PATCH v1 14/15] nitro_enclaves: Add Makefile for the Nitro Enclaves driver

Signed-off-by: Andra Paraschiv <[email protected]>
---
drivers/virt/Makefile | 2 ++
drivers/virt/amazon/Makefile | 19 +++++++++++++++++
drivers/virt/amazon/nitro_enclaves/Makefile | 23 +++++++++++++++++++++
3 files changed, 44 insertions(+)
create mode 100644 drivers/virt/amazon/Makefile
create mode 100644 drivers/virt/amazon/nitro_enclaves/Makefile

diff --git a/drivers/virt/Makefile b/drivers/virt/Makefile
index fd331247c27a..50a7d792e1c4 100644
--- a/drivers/virt/Makefile
+++ b/drivers/virt/Makefile
@@ -5,3 +5,5 @@

obj-$(CONFIG_FSL_HV_MANAGER) += fsl_hypervisor.o
obj-y += vboxguest/
+
+obj-$(CONFIG_NITRO_ENCLAVES) += amazon/
diff --git a/drivers/virt/amazon/Makefile b/drivers/virt/amazon/Makefile
new file mode 100644
index 000000000000..9d77bbfc748a
--- /dev/null
+++ b/drivers/virt/amazon/Makefile
@@ -0,0 +1,19 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or modify it
+# under the terms and conditions of the GNU General Public License,
+# version 2, as published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, see <http://www.gnu.org/licenses/>.
+
+# Enclave lifetime management support for Nitro Enclaves (NE).
+
+obj-$(CONFIG_NITRO_ENCLAVES) += nitro_enclaves/
diff --git a/drivers/virt/amazon/nitro_enclaves/Makefile b/drivers/virt/amazon/nitro_enclaves/Makefile
new file mode 100644
index 000000000000..9109aed41070
--- /dev/null
+++ b/drivers/virt/amazon/nitro_enclaves/Makefile
@@ -0,0 +1,23 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or modify it
+# under the terms and conditions of the GNU General Public License,
+# version 2, as published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, see <http://www.gnu.org/licenses/>.
+
+# Enclave lifetime management support for Nitro Enclaves (NE).
+
+obj-$(CONFIG_NITRO_ENCLAVES) += nitro_enclaves.o
+
+nitro_enclaves-y := ne_pci_dev.o ne_misc_dev.o
+
+ccflags-y += -Wall
--
2.20.1 (Apple Git-117)





2020-04-21 18:47:16

by Paraschiv, Andra-Irina

[permalink] [raw]
Subject: [PATCH v1 15/15] MAINTAINERS: Add entry for the Nitro Enclaves driver

Signed-off-by: Andra Paraschiv <[email protected]>
---
MAINTAINERS | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index b816a453b10e..9625fadbd400 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11956,6 +11956,17 @@ S: Maintained
T: git git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2.git
F: arch/nios2/

+NITRO ENCLAVES (NE)
+M: Andra Paraschiv <[email protected]>
+M: Alexandru Vasile <[email protected]>
+M: Alexandru Ciobotaru <[email protected]>
+L: [email protected]
+S: Supported
+W: https://aws.amazon.com/ec2/nitro/nitro-enclaves/
+F: include/linux/nitro_enclaves.h
+F: include/uapi/linux/nitro_enclaves.h
+F: drivers/virt/amazon/nitro_enclaves/
+
NOHZ, DYNTICKS SUPPORT
M: Frederic Weisbecker <[email protected]>
M: Thomas Gleixner <[email protected]>
--
2.20.1 (Apple Git-117)





2020-04-21 18:47:32

by Paraschiv, Andra-Irina

[permalink] [raw]
Subject: [PATCH v1 09/15] nitro_enclaves: Add logic for enclave vcpu creation

Before an enclave is started, its resources are set. One of these
resources is CPU.

Add ioctl command logic for enclave vCPU creation. Return, as its
result, a file descriptor associated with the enclave vCPU.
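
For illustration, a minimal user space sketch of the call, assuming the
enclave fd was already obtained via KVM_CREATE_VM on the NE misc device;
note that, in this version of the driver, the ioctl argument is a pointer
to the vCPU id and the driver itself picks the CPU from its pool:

    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /* The value passed in is not used for the CPU choice in this version;
     * the driver chooses a CPU from its pool.
     */
    unsigned int vcpu_id = 0;
    int vcpu_fd = ioctl(enclave_fd, KVM_CREATE_VCPU, &vcpu_id);

    if (vcpu_fd < 0)
            perror("KVM_CREATE_VCPU");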

Signed-off-by: Alexandru Vasile <[email protected]>
Signed-off-by: Andra Paraschiv <[email protected]>
---
.../virt/amazon/nitro_enclaves/ne_misc_dev.c | 210 ++++++++++++++++++
1 file changed, 210 insertions(+)

diff --git a/drivers/virt/amazon/nitro_enclaves/ne_misc_dev.c b/drivers/virt/amazon/nitro_enclaves/ne_misc_dev.c
index abbebc7718c2..c9acdfd63daf 100644
--- a/drivers/virt/amazon/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/amazon/nitro_enclaves/ne_misc_dev.c
@@ -60,6 +60,179 @@ static struct ne_cpu_pool ne_cpu_pool;

static struct mutex ne_cpu_pool_mutex;

+static int ne_enclave_vcpu_open(struct inode *node, struct file *file)
+{
+ return 0;
+}
+
+static long ne_enclave_vcpu_ioctl(struct file *file, unsigned int cmd,
+ unsigned long arg)
+{
+ switch (cmd) {
+ default:
+ return -ENOTTY;
+ }
+
+ return 0;
+}
+
+static int ne_enclave_vcpu_release(struct inode *inode, struct file *file)
+{
+ return 0;
+}
+
+static const struct file_operations ne_enclave_vcpu_fops = {
+ .owner = THIS_MODULE,
+ .llseek = noop_llseek,
+ .unlocked_ioctl = ne_enclave_vcpu_ioctl,
+ .open = ne_enclave_vcpu_open,
+ .release = ne_enclave_vcpu_release,
+};
+
+/**
+ * ne_get_cpu_from_cpu_pool - Get a CPU from the CPU pool, if it is set.
+ *
+ * This function gets called with the ne_enclave mutex held.
+ *
+ * @ne_enclave: private data associated with the current enclave.
+ * @vcpu_id: id of the CPU to be associated with the given slot, apic id on x86.
+ *
+ * @returns: 0 on success, negative return value on failure.
+ */
+static int ne_get_cpu_from_cpu_pool(struct ne_enclave *ne_enclave, u32 *vcpu_id)
+{
+ unsigned int cpu = 0;
+ unsigned int cpu_sibling = 0;
+
+ BUG_ON(!ne_enclave);
+
+ if (WARN_ON(!vcpu_id))
+ return -EINVAL;
+
+ /* There are CPU siblings available to choose from. */
+ cpu = cpumask_any(ne_enclave->cpu_siblings);
+ if (cpu < nr_cpu_ids) {
+ cpumask_clear_cpu(cpu, ne_enclave->cpu_siblings);
+
+ *vcpu_id = cpu;
+
+ return 0;
+ }
+
+ mutex_lock(&ne_cpu_pool_mutex);
+
+ /* Choose any CPU from the available CPU pool. */
+ cpu = cpumask_any(ne_cpu_pool.avail);
+ if (cpu >= nr_cpu_ids) {
+ pr_err_ratelimited("No CPUs available in CPU pool\n");
+
+ mutex_unlock(&ne_cpu_pool_mutex);
+
+ return -EINVAL;
+ }
+
+ cpumask_clear_cpu(cpu, ne_cpu_pool.avail);
+
+ /*
+ * Make sure the CPU siblings are not marked as
+ * available anymore.
+ */
+ for_each_cpu(cpu_sibling, topology_sibling_cpumask(cpu)) {
+ if (cpu_sibling != cpu) {
+ cpumask_clear_cpu(cpu_sibling, ne_cpu_pool.avail);
+
+ cpumask_set_cpu(cpu_sibling, ne_enclave->cpu_siblings);
+ }
+ }
+
+ mutex_unlock(&ne_cpu_pool_mutex);
+
+ *vcpu_id = cpu;
+
+ return 0;
+}
+
+/**
+ * ne_create_vcpu_ioctl - Add vCPU to the slot associated with the current
+ * enclave. Create vCPU file descriptor to be further used for CPU handling.
+ *
+ * This function gets called with the ne_enclave mutex held.
+ *
+ * @ne_enclave: private data associated with the current enclave.
+ * @vcpu_id: id of the CPU to be associated with the given slot, apic id on x86.
+ *
+ * @returns: vCPU fd on success, negative return value on failure.
+ */
+static int ne_create_vcpu_ioctl(struct ne_enclave *ne_enclave, u32 vcpu_id)
+{
+ struct ne_pci_dev_cmd_reply cmd_reply = {};
+ int fd = 0;
+ struct file *file = NULL;
+ struct ne_vcpu_id *ne_vcpu_id = NULL;
+ int rc = -EINVAL;
+ struct slot_add_vcpu_req slot_add_vcpu_req = {};
+
+ BUG_ON(!ne_enclave);
+ BUG_ON(!ne_enclave->pdev);
+
+ if (ne_enclave->mm != current->mm)
+ return -EIO;
+
+ ne_vcpu_id = kzalloc(sizeof(*ne_vcpu_id), GFP_KERNEL);
+ if (!ne_vcpu_id)
+ return -ENOMEM;
+
+ fd = get_unused_fd_flags(O_CLOEXEC);
+ if (fd < 0) {
+ rc = fd;
+
+ pr_err_ratelimited("Failure in getting unused fd [rc=%d]\n",
+ rc);
+
+ goto err_get_unused_fd;
+ }
+
+ /* TODO: Include (vcpu) id in the ne-vm-vcpu naming. */
+ file = anon_inode_getfile("ne-vm-vcpu", &ne_enclave_vcpu_fops,
+ ne_enclave, O_RDWR);
+ if (IS_ERR(file)) {
+ rc = PTR_ERR(file);
+
+ pr_err_ratelimited("Failure in anon inode get file [rc=%d]\n",
+ rc);
+
+ goto err_anon_inode_getfile;
+ }
+
+ slot_add_vcpu_req.slot_uid = ne_enclave->slot_uid;
+ slot_add_vcpu_req.vcpu_id = vcpu_id;
+
+ rc = ne_do_request(ne_enclave->pdev, SLOT_ADD_VCPU, &slot_add_vcpu_req,
+ sizeof(slot_add_vcpu_req), &cmd_reply,
+ sizeof(cmd_reply));
+ if (rc < 0) {
+ pr_err_ratelimited("Failure in slot add vcpu [rc=%d]\n", rc);
+
+ goto err_slot_add_vcpu;
+ }
+
+ ne_vcpu_id->vcpu_id = vcpu_id;
+
+ list_add(&ne_vcpu_id->vcpu_id_list_entry, &ne_enclave->vcpu_ids_list);
+
+ fd_install(fd, file);
+
+ return fd;
+
+err_slot_add_vcpu:
+ fput(file);
+err_anon_inode_getfile:
+ put_unused_fd(fd);
+err_get_unused_fd:
+ kzfree(ne_vcpu_id);
+ return rc;
+}
+
static int ne_enclave_open(struct inode *node, struct file *file)
{
return 0;
@@ -68,7 +241,44 @@ static int ne_enclave_open(struct inode *node, struct file *file)
static long ne_enclave_ioctl(struct file *file, unsigned int cmd,
unsigned long arg)
{
+ struct ne_enclave *ne_enclave = file->private_data;
+
+ BUG_ON(!ne_enclave);
+
switch (cmd) {
+ case KVM_CREATE_VCPU: {
+ int rc = -EINVAL;
+ u32 vcpu_id = 0;
+
+ if (copy_from_user(&vcpu_id, (void *)arg, sizeof(vcpu_id))) {
+ pr_err_ratelimited("Failure in copy from user\n");
+
+ return -EFAULT;
+ }
+
+ mutex_lock(&ne_enclave->enclave_info_mutex);
+
+ /* Use the CPU pool for choosing a CPU for the enclave. */
+ rc = ne_get_cpu_from_cpu_pool(ne_enclave, &vcpu_id);
+ if (rc < 0) {
+ pr_err_ratelimited("Failure in get CPU from pool\n");
+
+ mutex_unlock(&ne_enclave->enclave_info_mutex);
+
+ return -EINVAL;
+ }
+
+ rc = ne_create_vcpu_ioctl(ne_enclave, vcpu_id);
+
+ /* Put the CPU back into the enclave CPU pool if the vCPU add failed. */
+ if (rc < 0)
+ cpumask_set_cpu(vcpu_id, ne_enclave->cpu_siblings);
+
+ mutex_unlock(&ne_enclave->enclave_info_mutex);
+
+ return rc;
+ }
+
default:
return -ENOTTY;
}
--
2.20.1 (Apple Git-117)





2020-04-21 18:47:32

by Paraschiv, Andra-Irina

[permalink] [raw]
Subject: [PATCH v1 08/15] nitro_enclaves: Add logic for enclave vm creation

Add ioctl command logic for enclave VM creation. It triggers a slot
allocation. The enclave resources will be associated with this slot and
the slot will be used as an identifier for triggering the enclave run.

Return a file descriptor, namely the enclave fd. This is further used
by the associated user space enclave process to set enclave resources
and to trigger enclave termination.

The poll function is implemented in order to notify the enclave process
when an enclave exits without an explicit enclave termination command
being triggered, e.g. when an enclave crashes.
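
For illustration, a sketch of how a user space enclave process could use
this, assuming the misc device node is /dev/nitro_enclaves (the node name
is an assumption here, it comes from the misc device init patch); note
that the vm type argument is passed as a pointer in this version:

    #include <fcntl.h>
    #include <poll.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    int ne_dev_fd = open("/dev/nitro_enclaves", O_RDWR | O_CLOEXEC);
    unsigned long vm_type = 0;

    /* Returns the enclave fd used for all further per-enclave ioctls. */
    int enclave_fd = ioctl(ne_dev_fd, KVM_CREATE_VM, &vm_type);

    /* Later on, wait for an enclave exit e.g. an enclave crash. */
    struct pollfd pollfd = { .fd = enclave_fd, .events = 0 };
    if (poll(&pollfd, 1, -1) > 0 && (pollfd.revents & POLLHUP))
            close(enclave_fd);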

Signed-off-by: Alexandru Vasile <[email protected]>
Signed-off-by: Andra Paraschiv <[email protected]>
---
.../virt/amazon/nitro_enclaves/ne_misc_dev.c | 166 ++++++++++++++++++
1 file changed, 166 insertions(+)

diff --git a/drivers/virt/amazon/nitro_enclaves/ne_misc_dev.c b/drivers/virt/amazon/nitro_enclaves/ne_misc_dev.c
index d22a76ed07e5..abbebc7718c2 100644
--- a/drivers/virt/amazon/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/amazon/nitro_enclaves/ne_misc_dev.c
@@ -60,6 +60,145 @@ static struct ne_cpu_pool ne_cpu_pool;

static struct mutex ne_cpu_pool_mutex;

+static int ne_enclave_open(struct inode *node, struct file *file)
+{
+ return 0;
+}
+
+static long ne_enclave_ioctl(struct file *file, unsigned int cmd,
+ unsigned long arg)
+{
+ switch (cmd) {
+ default:
+ return -ENOTTY;
+ }
+
+ return 0;
+}
+
+static int ne_enclave_release(struct inode *inode, struct file *file)
+{
+ return 0;
+}
+
+static __poll_t ne_enclave_poll(struct file *file, poll_table *wait)
+{
+ __poll_t mask = 0;
+ struct ne_enclave *ne_enclave = file->private_data;
+
+ poll_wait(file, &ne_enclave->eventq, wait);
+
+ if (!ne_enclave->has_event)
+ return mask;
+
+ mask = POLLHUP;
+
+ return mask;
+}
+
+static const struct file_operations ne_enclave_fops = {
+ .owner = THIS_MODULE,
+ .llseek = noop_llseek,
+ .poll = ne_enclave_poll,
+ .unlocked_ioctl = ne_enclave_ioctl,
+ .open = ne_enclave_open,
+ .release = ne_enclave_release,
+};
+
+/**
+ * ne_create_vm_ioctl - Alloc slot to be associated with an enclave. Create
+ * enclave file descriptor to be further used for enclave resources handling
+ * e.g. memory regions and CPUs.
+ *
+ * This function gets called with the ne_pci_dev enclave mutex held.
+ *
+ * @pdev: PCI device used for enclave lifetime management.
+ * @ne_pci_dev: private data associated with the PCI device.
+ * @type: type of the virtual machine to be created.
+ *
+ * @returns: enclave fd on success, negative return value on failure.
+ */
+static int ne_create_vm_ioctl(struct pci_dev *pdev,
+ struct ne_pci_dev *ne_pci_dev, unsigned long type)
+{
+ struct ne_pci_dev_cmd_reply cmd_reply = {};
+ int fd = 0;
+ struct file *file = NULL;
+ struct ne_enclave *ne_enclave = NULL;
+ int rc = -EINVAL;
+ struct slot_alloc_req slot_alloc_req = {};
+
+ BUG_ON(!pdev);
+ BUG_ON(!ne_pci_dev);
+
+ ne_enclave = kzalloc(sizeof(*ne_enclave), GFP_KERNEL);
+ if (!ne_enclave)
+ return -ENOMEM;
+
+ if (!zalloc_cpumask_var(&ne_enclave->cpu_siblings, GFP_KERNEL)) {
+ kzfree(ne_enclave);
+
+ return -ENOMEM;
+ }
+
+ fd = get_unused_fd_flags(O_CLOEXEC);
+ if (fd < 0) {
+ rc = fd;
+
+ pr_err_ratelimited("Failure in getting unused fd [rc=%d]\n",
+ rc);
+
+ goto err_get_unused_fd;
+ }
+
+ file = anon_inode_getfile("ne-vm", &ne_enclave_fops, ne_enclave,
+ O_RDWR);
+ if (IS_ERR(file)) {
+ rc = PTR_ERR(file);
+
+ pr_err_ratelimited("Failure in anon inode get file [rc=%d]\n",
+ rc);
+
+ goto err_anon_inode_getfile;
+ }
+
+ ne_enclave->pdev = pdev;
+
+ rc = ne_do_request(ne_enclave->pdev, SLOT_ALLOC, &slot_alloc_req,
+ sizeof(slot_alloc_req), &cmd_reply,
+ sizeof(cmd_reply));
+ if (rc < 0) {
+ pr_err_ratelimited("Failure in slot alloc [rc=%d]\n", rc);
+
+ goto err_slot_alloc;
+ }
+
+ init_waitqueue_head(&ne_enclave->eventq);
+ ne_enclave->has_event = false;
+ mutex_init(&ne_enclave->enclave_info_mutex);
+ ne_enclave->max_mem_regions = cmd_reply.mem_regions;
+ INIT_LIST_HEAD(&ne_enclave->mem_regions_list);
+ ne_enclave->mm = current->mm;
+ ne_enclave->slot_uid = cmd_reply.slot_uid;
+ ne_enclave->state = NE_STATE_INIT;
+ INIT_LIST_HEAD(&ne_enclave->vcpu_ids_list);
+
+ list_add(&ne_enclave->enclave_list_entry, &ne_pci_dev->enclaves_list);
+
+ fd_install(fd, file);
+
+ return fd;
+
+err_slot_alloc:
+ fput(file);
+err_anon_inode_getfile:
+ put_unused_fd(fd);
+err_get_unused_fd:
+ free_cpumask_var(ne_enclave->cpu_siblings);
+ kzfree(ne_enclave);
+ return rc;
+}
+
static int ne_open(struct inode *node, struct file *file)
{
return 0;
@@ -67,7 +206,34 @@ static int ne_open(struct inode *node, struct file *file)

static long ne_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
+ struct ne_pci_dev *ne_pci_dev = NULL;
+ struct pci_dev *pdev = pci_get_device(PCI_VENDOR_ID_AMAZON,
+ PCI_DEVICE_ID_NE, NULL);
+
+ BUG_ON(!pdev);
+
+ ne_pci_dev = pci_get_drvdata(pdev);
+ BUG_ON(!ne_pci_dev);
+
switch (cmd) {
+ case KVM_CREATE_VM: {
+ int rc = -EINVAL;
+ unsigned long type = 0;
+
+ if (copy_from_user(&type, (void *)arg, sizeof(type))) {
+ pr_err_ratelimited("Failure in copy from user\n");
+
+ return -EFAULT;
+ }
+
+ mutex_lock(&ne_pci_dev->enclaves_list_mutex);
+
+ rc = ne_create_vm_ioctl(pdev, ne_pci_dev, type);
+
+ mutex_unlock(&ne_pci_dev->enclaves_list_mutex);
+
+ return rc;
+ }

default:
return -ENOTTY;
--
2.20.1 (Apple Git-117)





2020-04-21 18:47:49

by Paraschiv, Andra-Irina

[permalink] [raw]
Subject: [PATCH v1 10/15] nitro_enclaves: Add logic for enclave memory region set

Another resource that is set for an enclave is memory. User space
memory regions, which need to be backed by contiguous physical memory
regions, are associated with the enclave.

One solution for allocating / reserving contiguous memory regions,
which is used for integration, is hugetlbfs. The user space process
associated with the enclave passes these memory regions to the driver.

Add ioctl command logic for setting a user space memory region for an
enclave.
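
For illustration, a sketch of the user space side, assuming the enclave
fd from KVM_CREATE_VM and a host with 2 MiB hugepages reserved; the sizes
and the guest physical address below are only examples:

    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <linux/kvm.h>

    size_t mem_size = 4 * 1024 * 1024;  /* Multiple of 2 MiB. */
    void *mem = mmap(NULL, mem_size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

    struct kvm_userspace_memory_region mem_region = {
            .slot = 0,
            .guest_phys_addr = 0,
            .memory_size = mem_size,
            .userspace_addr = (unsigned long)mem,
    };

    if (ioctl(enclave_fd, KVM_SET_USER_MEMORY_REGION, &mem_region) < 0)
            perror("KVM_SET_USER_MEMORY_REGION");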

Signed-off-by: Alexandru Vasile <[email protected]>
Signed-off-by: Andra Paraschiv <[email protected]>
---
.../virt/amazon/nitro_enclaves/ne_misc_dev.c | 242 ++++++++++++++++++
1 file changed, 242 insertions(+)

diff --git a/drivers/virt/amazon/nitro_enclaves/ne_misc_dev.c b/drivers/virt/amazon/nitro_enclaves/ne_misc_dev.c
index c9acdfd63daf..0bd283f73a87 100644
--- a/drivers/virt/amazon/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/amazon/nitro_enclaves/ne_misc_dev.c
@@ -233,6 +233,228 @@ static int ne_create_vcpu_ioctl(struct ne_enclave *ne_enclave, u32 vcpu_id)
return rc;
}

+/**
+ * ne_sanity_check_user_mem_region - Sanity check the userspace memory
+ * region received during the set user memory region ioctl call.
+ *
+ * This function gets called with the ne_enclave mutex held.
+ *
+ * @ne_enclave: private data associated with the current enclave.
+ * @mem_region: user space memory region to be sanity checked.
+ *
+ * @returns: 0 on success, negative return value on failure.
+ */
+static int ne_sanity_check_user_mem_region(struct ne_enclave *ne_enclave,
+ struct kvm_userspace_memory_region *mem_region)
+{
+ BUG_ON(!ne_enclave);
+
+ if (WARN_ON(!mem_region))
+ return -EINVAL;
+
+ if (mem_region->slot > ne_enclave->max_mem_regions) {
+ pr_err_ratelimited("Mem slot higher than max mem regions\n");
+
+ return -EINVAL;
+ }
+
+ if ((mem_region->memory_size % MIN_MEM_REGION_SIZE) != 0) {
+ pr_err_ratelimited("Mem region size not multiple of 2 MiB\n");
+
+ return -EINVAL;
+ }
+
+ if ((mem_region->userspace_addr & (MIN_MEM_REGION_SIZE - 1)) ||
+ !access_ok((void __user *)(unsigned long)mem_region->userspace_addr,
+ mem_region->memory_size)) {
+ pr_err_ratelimited("Invalid user space addr range\n");
+
+ return -EINVAL;
+ }
+
+ if ((mem_region->guest_phys_addr + mem_region->memory_size) <
+ mem_region->guest_phys_addr) {
+ pr_err_ratelimited("Invalid guest phys addr range\n");
+
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+/**
+ * ne_set_user_memory_region_ioctl - Add user space memory region to the slot
+ * associated with the current enclave.
+ *
+ * This function gets called with the ne_enclave mutex held.
+ *
+ * @ne_enclave: private data associated with the current enclave.
+ * @mem_region: user space memory region to be associated with the given slot.
+ *
+ * @returns: 0 on success, negative return value on failure.
+ */
+static int ne_set_user_memory_region_ioctl(struct ne_enclave *ne_enclave,
+ struct kvm_userspace_memory_region *mem_region)
+{
+ struct ne_pci_dev_cmd_reply cmd_reply = {};
+ long gup_rc = 0;
+ unsigned long i = 0;
+ struct ne_mem_region *ne_mem_region = NULL;
+ unsigned long nr_phys_contig_mem_regions = 0;
+ unsigned long nr_pinned_pages = 0;
+ struct page **phys_contig_mem_regions = NULL;
+ int rc = -EINVAL;
+ struct slot_add_mem_req slot_add_mem_req = {};
+
+ BUG_ON(!ne_enclave);
+ BUG_ON(!ne_enclave->pdev);
+
+ if (WARN_ON(!mem_region))
+ return -EINVAL;
+
+ if (ne_enclave->mm != current->mm)
+ return -EIO;
+
+ rc = ne_sanity_check_user_mem_region(ne_enclave, mem_region);
+ if (rc < 0)
+ return rc;
+
+ ne_mem_region = kzalloc(sizeof(*ne_mem_region), GFP_KERNEL);
+ if (!ne_mem_region)
+ return -ENOMEM;
+
+ /*
+ * TODO: Update nr_pages value to handle contiguous virtual address
+ * ranges mapped to non-contiguous physical regions. Hugetlbfs can give
+ * 2 MiB / 1 GiB contiguous physical regions.
+ */
+ ne_mem_region->nr_pages = mem_region->memory_size / MIN_MEM_REGION_SIZE;
+
+ ne_mem_region->pages = kcalloc(ne_mem_region->nr_pages,
+ sizeof(*ne_mem_region->pages),
+ GFP_KERNEL);
+ if (!ne_mem_region->pages) {
+ kzfree(ne_mem_region);
+
+ return -ENOMEM;
+ }
+
+ phys_contig_mem_regions = kcalloc(ne_mem_region->nr_pages,
+ sizeof(*phys_contig_mem_regions),
+ GFP_KERNEL);
+ if (!phys_contig_mem_regions) {
+ kzfree(ne_mem_region->pages);
+ kzfree(ne_mem_region);
+
+ return -ENOMEM;
+ }
+
+ /*
+ * TODO: Handle non-contiguous memory regions received from user space.
+ * Hugetlbfs can give 2 MiB / 1 GiB contiguous physical regions. The
+ * virtual address space can be seen as contiguous, although it is
+ * mapped underneath to 2 MiB / 1 GiB physical regions e.g. 8 MiB
+ * virtual address space mapped to 4 physically contiguous regions of 2
+ * MiB.
+ */
+ do {
+ unsigned long tmp_nr_pages = ne_mem_region->nr_pages -
+ nr_pinned_pages;
+ struct page **tmp_pages = ne_mem_region->pages +
+ nr_pinned_pages;
+ u64 tmp_userspace_addr = mem_region->userspace_addr +
+ nr_pinned_pages * MIN_MEM_REGION_SIZE;
+
+ gup_rc = get_user_pages(tmp_userspace_addr, tmp_nr_pages,
+ FOLL_GET, tmp_pages, NULL);
+ if (gup_rc < 0) {
+ rc = gup_rc;
+
+ pr_err_ratelimited("Failure in gup [rc=%d]\n", rc);
+
+ unpin_user_pages(ne_mem_region->pages, nr_pinned_pages);
+
+ goto err_get_user_pages;
+ }
+
+ nr_pinned_pages += gup_rc;
+
+ } while (nr_pinned_pages < ne_mem_region->nr_pages);
+
+ /*
+ * TODO: Update checks once physically contiguous regions are collected
+ * based on the user space address and get_user_pages() results.
+ */
+ for (i = 0; i < ne_mem_region->nr_pages; i++) {
+ if (!PageHuge(ne_mem_region->pages[i])) {
+ pr_err_ratelimited("The page isn't a hugetlbfs page\n");
+
+ goto err_phys_pages_check;
+ }
+
+ if (huge_page_size(page_hstate(ne_mem_region->pages[i])) !=
+ MIN_MEM_REGION_SIZE) {
+ pr_err_ratelimited("The page size isn't 2 MiB\n");
+
+ goto err_phys_pages_check;
+ }
+
+ /*
+ * TODO: Update once handled non-contiguous memory regions
+ * received from user space.
+ */
+ phys_contig_mem_regions[i] = ne_mem_region->pages[i];
+ }
+
+ /*
+ * TODO: Update once handled non-contiguous memory regions received
+ * from user space.
+ */
+ nr_phys_contig_mem_regions = ne_mem_region->nr_pages;
+
+ for (i = 0; i < nr_phys_contig_mem_regions; i++) {
+ u64 phys_addr = page_to_phys(phys_contig_mem_regions[i]);
+
+ slot_add_mem_req.slot_uid = ne_enclave->slot_uid;
+ slot_add_mem_req.paddr = phys_addr;
+ /*
+ * TODO: Update memory size of physical contiguous memory
+ * region, in case of non-contiguous memory regions received
+ * from user space.
+ */
+ slot_add_mem_req.size = MIN_MEM_REGION_SIZE;
+
+ rc = ne_do_request(ne_enclave->pdev, SLOT_ADD_MEM,
+ &slot_add_mem_req, sizeof(slot_add_mem_req),
+ &cmd_reply, sizeof(cmd_reply));
+ if (rc < 0) {
+ pr_err_ratelimited("Failure in slot add mem [rc=%d]\n",
+ rc);
+
+ goto err_slot_add_mem;
+ }
+
+ memset(&slot_add_mem_req, 0, sizeof(slot_add_mem_req));
+ memset(&cmd_reply, 0, sizeof(cmd_reply));
+ }
+
+ list_add(&ne_mem_region->mem_region_list_entry,
+ &ne_enclave->mem_regions_list);
+
+ kzfree(phys_contig_mem_regions);
+
+ return 0;
+
+err_slot_add_mem:
+err_phys_pages_check:
+ unpin_user_pages(ne_mem_region->pages, ne_mem_region->nr_pages);
+err_get_user_pages:
+ kzfree(phys_contig_mem_regions);
+ kzfree(ne_mem_region->pages);
+ kzfree(ne_mem_region);
+ return rc;
+}
+
static int ne_enclave_open(struct inode *node, struct file *file)
{
return 0;
@@ -279,6 +501,26 @@ static long ne_enclave_ioctl(struct file *file, unsigned int cmd,
return rc;
}

+ case KVM_SET_USER_MEMORY_REGION: {
+ struct kvm_userspace_memory_region mem_region = {};
+ int rc = -EINVAL;
+
+ if (copy_from_user(&mem_region, (void *)arg,
+ sizeof(mem_region))) {
+ pr_err_ratelimited("Failure in copy from user\n");
+
+ return -EFAULT;
+ }
+
+ mutex_lock(&ne_enclave->enclave_info_mutex);
+
+ rc = ne_set_user_memory_region_ioctl(ne_enclave, &mem_region);
+
+ mutex_unlock(&ne_enclave->enclave_info_mutex);
+
+ return rc;
+ }
+
default:
return -ENOTTY;
}
--
2.20.1 (Apple Git-117)





2020-04-21 18:47:56

by Paraschiv, Andra-Irina

[permalink] [raw]
Subject: [PATCH v1 11/15] nitro_enclaves: Add logic for enclave start

After all the enclave resources are set, the enclave is ready to start
running.

Add ioctl command logic for starting an enclave after all its resources,
memory regions and CPUs, have been set.

The enclave start information includes the local channel addressing -
vsock CID - and the flags associated with the enclave.
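
For illustration, a sketch of the start call from user space, assuming
the enclave fd from the earlier ioctls and the struct defined in the uapi
header of this series:

    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <linux/nitro_enclaves.h>

    struct enclave_start_metadata start = {
            .flags = 0,        /* e.g. a debug mode flag could be set here. */
            .enclave_cid = 0,  /* 0 - the hypervisor autogenerates the CID. */
    };

    /* On return, start.enclave_cid and start.slot_uid are filled in. */
    if (ioctl(enclave_fd, NE_ENCLAVE_START, &start) == 0)
            printf("Enclave CID %llu\n",
                   (unsigned long long)start.enclave_cid);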

Signed-off-by: Alexandru Vasile <[email protected]>
Signed-off-by: Andra Paraschiv <[email protected]>
---
.../virt/amazon/nitro_enclaves/ne_misc_dev.c | 83 +++++++++++++++++++
1 file changed, 83 insertions(+)

diff --git a/drivers/virt/amazon/nitro_enclaves/ne_misc_dev.c b/drivers/virt/amazon/nitro_enclaves/ne_misc_dev.c
index 0bd283f73a87..f07eb46f7995 100644
--- a/drivers/virt/amazon/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/amazon/nitro_enclaves/ne_misc_dev.c
@@ -455,6 +455,53 @@ static int ne_set_user_memory_region_ioctl(struct ne_enclave *ne_enclave,
return rc;
}

+/**
+ * ne_enclave_start_ioctl - Trigger enclave start after the enclave resources,
+ * such as memory and CPU, have been set.
+ *
+ * This function gets called with the ne_enclave mutex held.
+ *
+ * @ne_enclave: private data associated with the current enclave.
+ * @enclave_start_metadata: enclave metadata that includes enclave cid and
+ * flags and the slot uid.
+ *
+ * @returns: 0 on success, negative return value on failure.
+ */
+static int ne_enclave_start_ioctl(struct ne_enclave *ne_enclave,
+ struct enclave_start_metadata *enclave_start_metadata)
+{
+ struct ne_pci_dev_cmd_reply cmd_reply = {};
+ struct enclave_start_req enclave_start_req = {};
+ int rc = -EINVAL;
+
+ BUG_ON(!ne_enclave);
+ BUG_ON(!ne_enclave->pdev);
+
+ if (WARN_ON(!enclave_start_metadata))
+ return -EINVAL;
+
+ enclave_start_metadata->slot_uid = ne_enclave->slot_uid;
+
+ enclave_start_req.enclave_cid = enclave_start_metadata->enclave_cid;
+ enclave_start_req.flags = enclave_start_metadata->flags;
+ enclave_start_req.slot_uid = enclave_start_metadata->slot_uid;
+
+ rc = ne_do_request(ne_enclave->pdev, ENCLAVE_START, &enclave_start_req,
+ sizeof(enclave_start_req), &cmd_reply,
+ sizeof(cmd_reply));
+ if (rc < 0) {
+ pr_err_ratelimited("Failure in enclave start [rc=%d]\n", rc);
+
+ return rc;
+ }
+
+ ne_enclave->state = NE_STATE_RUNNING;
+
+ enclave_start_metadata->enclave_cid = cmd_reply.enclave_cid;
+
+ return 0;
+}
+
static int ne_enclave_open(struct inode *node, struct file *file)
{
return 0;
@@ -521,6 +568,42 @@ static long ne_enclave_ioctl(struct file *file, unsigned int cmd,
return rc;
}

+ case NE_ENCLAVE_START: {
+ struct enclave_start_metadata enclave_start_metadata = {};
+ int rc = -EINVAL;
+
+ if (copy_from_user(&enclave_start_metadata, (void *)arg,
+ sizeof(enclave_start_metadata))) {
+ pr_err_ratelimited("Failure in copy from user\n");
+
+ return -EFAULT;
+ }
+
+ mutex_lock(&ne_enclave->enclave_info_mutex);
+
+ if (!cpumask_empty(ne_enclave->cpu_siblings)) {
+ pr_err_ratelimited("Enclave has CPU siblings avail\n");
+
+ mutex_unlock(&ne_enclave->enclave_info_mutex);
+
+ return -EINVAL;
+ }
+
+ rc = ne_enclave_start_ioctl(ne_enclave,
+ &enclave_start_metadata);
+
+ mutex_unlock(&ne_enclave->enclave_info_mutex);
+
+ if (copy_to_user((void *)arg, &enclave_start_metadata,
+ sizeof(enclave_start_metadata))) {
+ pr_err_ratelimited("Failure in copy to user\n");
+
+ return -EFAULT;
+ }
+
+ return rc;
+ }
+
default:
return -ENOTTY;
}
--
2.20.1 (Apple Git-117)





2020-04-21 18:48:19

by Paraschiv, Andra-Irina

[permalink] [raw]
Subject: [PATCH v1 13/15] nitro_enclaves: Add Kconfig for the Nitro Enclaves driver

Signed-off-by: Andra Paraschiv <[email protected]>
---
drivers/virt/Kconfig | 2 ++
drivers/virt/amazon/Kconfig | 28 ++++++++++++++++++++++++++++
2 files changed, 30 insertions(+)
create mode 100644 drivers/virt/amazon/Kconfig

diff --git a/drivers/virt/Kconfig b/drivers/virt/Kconfig
index 363af2eaf2ba..06bb5cfa191d 100644
--- a/drivers/virt/Kconfig
+++ b/drivers/virt/Kconfig
@@ -32,4 +32,6 @@ config FSL_HV_MANAGER
partition shuts down.

source "drivers/virt/vboxguest/Kconfig"
+
+source "drivers/virt/amazon/Kconfig"
endif
diff --git a/drivers/virt/amazon/Kconfig b/drivers/virt/amazon/Kconfig
new file mode 100644
index 000000000000..57fd0aa58803
--- /dev/null
+++ b/drivers/virt/amazon/Kconfig
@@ -0,0 +1,28 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or modify it
+# under the terms and conditions of the GNU General Public License,
+# version 2, as published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, see <http://www.gnu.org/licenses/>.
+
+# Amazon Nitro Enclaves (NE) support.
+# Nitro is a hypervisor that has been developed by Amazon.
+
+config NITRO_ENCLAVES
+ tristate "Nitro Enclaves Support"
+ depends on HOTPLUG_CPU
+ ---help---
+ This driver consists of support for enclave lifetime management
+ for Nitro Enclaves (NE).
+
+ To compile this driver as a module, choose M here.
+ The module will be called nitro_enclaves.
--
2.20.1 (Apple Git-117)





2020-04-21 18:49:43

by Randy Dunlap

[permalink] [raw]
Subject: Re: [PATCH v1 01/15] nitro_enclaves: Add ioctl interface definition

Hi--

On 4/21/20 11:41 AM, Andra Paraschiv wrote:
> The Nitro Enclaves driver handles the enclave lifetime management. This
> includes enclave creation, termination and setting up its resources such
> as memory and CPU.
>
> An enclave runs alongside the VM that spawned it. It is abstracted as a
> process running in the VM that launched it. The process interacts with
> the NE driver, that exposes an ioctl interface for creating an enclave
> and setting up its resources.
>
> Include the KVM API as part of the provided ioctl interface, with an
> additional ENCLAVE_START ioctl command that triggers the enclave run.
>
> Signed-off-by: Alexandru Vasile <[email protected]>
> Signed-off-by: Andra Paraschiv <[email protected]>
> ---
> include/linux/nitro_enclaves.h | 23 +++++++++++++
> include/uapi/linux/nitro_enclaves.h | 52 +++++++++++++++++++++++++++++
> 2 files changed, 75 insertions(+)
> create mode 100644 include/linux/nitro_enclaves.h
> create mode 100644 include/uapi/linux/nitro_enclaves.h
>

> diff --git a/include/uapi/linux/nitro_enclaves.h b/include/uapi/linux/nitro_enclaves.h
> new file mode 100644
> index 000000000000..b90dfcf6253a
> --- /dev/null
> +++ b/include/uapi/linux/nitro_enclaves.h
> @@ -0,0 +1,52 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +/*
> + * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef _UAPI_LINUX_NITRO_ENCLAVES_H_
> +#define _UAPI_LINUX_NITRO_ENCLAVES_H_
> +
> +#include <linux/kvm.h>
> +#include <linux/types.h>
> +
> +/* Nitro Enclaves (NE) Kernel Driver Interface */
> +
> +/**
> + * The command is used to trigger enclave start after the enclave resources,
> + * such as memory and CPU, have been set.
> + *
> + * The enclave start metadata is an in / out data structure. It includes
> + * provided info by the caller - enclave cid and flags - and returns the
> + * slot uid and the cid (if input cid is 0).
> + */
> +#define NE_ENCLAVE_START _IOWR('B', 0x1, struct enclave_start_metadata)

Please document ioctl major ('B' in this case) and range used in
Documentation/userspace-api/ioctl/ioctl-number.rst.

> +
> +/* Setup metadata necessary for enclave start. */
> +struct enclave_start_metadata {
> + /* Flags for the enclave to start with (e.g. debug mode) (in). */
> + __u64 flags;
> +
> + /**
> + * Context ID (CID) for the enclave vsock device. If 0 as input, the
> + * CID is autogenerated by the hypervisor and returned back as output
> + * by the driver (in/out).
> + */
> + __u64 enclave_cid;
> +
> + /* Slot unique id mapped to the enclave to start (out). */
> + __u64 slot_uid;
> +};
> +
> +#endif /* _UAPI_LINUX_NITRO_ENCLAVES_H_ */
>

thanks.
--
~Randy

2020-04-21 18:52:00

by Randy Dunlap

[permalink] [raw]
Subject: Re: [PATCH v1 13/15] nitro_enclaves: Add Kconfig for the Nitro Enclaves driver

Hi--

On 4/21/20 11:41 AM, Andra Paraschiv wrote:
> Signed-off-by: Andra Paraschiv <[email protected]>
> ---
> drivers/virt/Kconfig | 2 ++
> drivers/virt/amazon/Kconfig | 28 ++++++++++++++++++++++++++++
> 2 files changed, 30 insertions(+)
> create mode 100644 drivers/virt/amazon/Kconfig
>
> diff --git a/drivers/virt/Kconfig b/drivers/virt/Kconfig
> index 363af2eaf2ba..06bb5cfa191d 100644
> --- a/drivers/virt/Kconfig
> +++ b/drivers/virt/Kconfig
> @@ -32,4 +32,6 @@ config FSL_HV_MANAGER
> partition shuts down.
>
> source "drivers/virt/vboxguest/Kconfig"
> +
> +source "drivers/virt/amazon/Kconfig"
> endif
> diff --git a/drivers/virt/amazon/Kconfig b/drivers/virt/amazon/Kconfig
> new file mode 100644
> index 000000000000..57fd0aa58803
> --- /dev/null
> +++ b/drivers/virt/amazon/Kconfig
> @@ -0,0 +1,28 @@
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
> +#
> +# This program is free software; you can redistribute it and/or modify it
> +# under the terms and conditions of the GNU General Public License,
> +# version 2, as published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, see <http://www.gnu.org/licenses/>.
> +
> +# Amazon Nitro Enclaves (NE) support.
> +# Nitro is a hypervisor that has been developed by Amazon.
> +
> +config NITRO_ENCLAVES
> + tristate "Nitro Enclaves Support"
> + depends on HOTPLUG_CPU
> + ---help---

For v2:
We are moving away from the use of "---help---" to just "help".

> + This driver consists of support for enclave lifetime management
> + for Nitro Enclaves (NE).
> +
> + To compile this driver as a module, choose M here.
> + The module will be called nitro_enclaves.
>

thanks.
--
~Randy

2020-04-21 21:26:24

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v1 02/15] nitro_enclaves: Define the PCI device interface

On 21/04/20 20:41, Andra Paraschiv wrote:
> The Nitro Enclaves (NE) driver communicates with a new PCI device, that
> is exposed to a virtual machine (VM) and handles commands meant for
> handling enclaves lifetime e.g. creation, termination, setting memory
> regions. The communication with the PCI device is handled using a MMIO
> space and MSI-X interrupts.
>
> This device communicates with the hypervisor on the host, where the VM
> that spawned the enclave itself run, e.g. to launch a VM that is used
> for the enclave.
>
> Define the MMIO space of the PCI device, the commands that are
> provided by this device. Add an internal data structure used as private
> data for the PCI device driver and the functions for the PCI device init
> / uninit and command requests handling.
>
> Signed-off-by: Alexandru-Catalin Vasile <[email protected]>
> Signed-off-by: Alexandru Ciobotaru <[email protected]>
> Signed-off-by: Andra Paraschiv <[email protected]>
> ---
> .../virt/amazon/nitro_enclaves/ne_pci_dev.h | 266 ++++++++++++++++++
> 1 file changed, 266 insertions(+)
> create mode 100644 drivers/virt/amazon/nitro_enclaves/ne_pci_dev.h

Can this be placed just in drivers/virt/nitro_enclaves, or
drivers/virt/enclave/nitro? It's not unlikely that this device will be
implemented outside EC2 sooner or later, and there's nothing
Amazon-specific as far as I can see from the UAPI.

Paolo

2020-04-21 21:49:45

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v1 01/15] nitro_enclaves: Add ioctl interface definition

On 21/04/20 20:47, Randy Dunlap wrote:
>> +
>> +/**
>> + * The command is used to trigger enclave start after the enclave resources,
>> + * such as memory and CPU, have been set.
>> + *
>> + * The enclave start metadata is an in / out data structure. It includes
>> + * provided info by the caller - enclave cid and flags - and returns the
>> + * slot uid and the cid (if input cid is 0).
>> + */
>> +#define NE_ENCLAVE_START _IOWR('B', 0x1, struct enclave_start_metadata)
> Please document ioctl major ('B' in this case) and range used in
> Documentation/userspace-api/ioctl/ioctl-number.rst.
>

Since it's really just a couple ioctls, I can "donate" part of the KVM
space, for example major 0xAE minor 0x20-0x3f.

Paolo

2020-04-21 21:50:48

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v1 00/15] Add support for Nitro Enclaves

On 21/04/20 20:41, Andra Paraschiv wrote:
> An enclave communicates with the primary VM via a local communication channel,
> using virtio-vsock [2]. An enclave does not have a disk or a network device
> attached.

Is it possible to have a sample of this in the samples/ directory?

I am interested especially in:

- the initial CPU state: CPL0 vs. CPL3, initial program counter, etc.

- the communication channel; does the enclave see the usual local APIC
and IOAPIC interfaces in order to get interrupts from virtio-vsock, and
where is the virtio-vsock device (virtio-mmio I suppose) placed in memory?

- what the enclave is allowed to do: can it change privilege levels,
what happens if the enclave performs an access to nonexistent memory, etc.

- whether there are special hypercall interfaces for the enclave

> The proposed solution is following the KVM model and uses the KVM API to be able
> to create and set resources for enclaves. An additional ioctl command, besides
> the ones provided by KVM, is used to start an enclave and setup the addressing
> for the communication channel and an enclave unique id.

Reusing some KVM ioctls is definitely a good idea, but I wouldn't really
say it's the KVM API since the VCPU file descriptor is basically non
functional (without KVM_RUN and mmap it's not really the KVM API).

Paolo

2020-04-22 15:07:45

by Paraschiv, Andra-Irina

[permalink] [raw]
Subject: Re: [PATCH v1 13/15] nitro_enclaves: Add Kconfig for the Nitro Enclaves driver



On 21/04/2020 21:50, Randy Dunlap wrote:
> Hi--
>
> On 4/21/20 11:41 AM, Andra Paraschiv wrote:
>> Signed-off-by: Andra Paraschiv <[email protected]>
>> ---
>> drivers/virt/Kconfig | 2 ++
>> drivers/virt/amazon/Kconfig | 28 ++++++++++++++++++++++++++++
>> 2 files changed, 30 insertions(+)
>> create mode 100644 drivers/virt/amazon/Kconfig
>>
>> diff --git a/drivers/virt/Kconfig b/drivers/virt/Kconfig
>> index 363af2eaf2ba..06bb5cfa191d 100644
>> --- a/drivers/virt/Kconfig
>> +++ b/drivers/virt/Kconfig
>> @@ -32,4 +32,6 @@ config FSL_HV_MANAGER
>> partition shuts down.
>>
>> source "drivers/virt/vboxguest/Kconfig"
>> +
>> +source "drivers/virt/amazon/Kconfig"
>> endif
>> diff --git a/drivers/virt/amazon/Kconfig b/drivers/virt/amazon/Kconfig
>> new file mode 100644
>> index 000000000000..57fd0aa58803
>> --- /dev/null
>> +++ b/drivers/virt/amazon/Kconfig
>> @@ -0,0 +1,28 @@
>> +# SPDX-License-Identifier: GPL-2.0
>> +#
>> +# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
>> +#
>> +# This program is free software; you can redistribute it and/or modify it
>> +# under the terms and conditions of the GNU General Public License,
>> +# version 2, as published by the Free Software Foundation.
>> +#
>> +# This program is distributed in the hope that it will be useful,
>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>> +# GNU General Public License for more details.
>> +#
>> +# You should have received a copy of the GNU General Public License
>> +# along with this program; if not, see <http://www.gnu.org/licenses/>.
>> +
>> +# Amazon Nitro Enclaves (NE) support.
>> +# Nitro is a hypervisor that has been developed by Amazon.
>> +
>> +config NITRO_ENCLAVES
>> + tristate "Nitro Enclaves Support"
>> + depends on HOTPLUG_CPU
>> + ---help---
> For v2:
> We are moving away from the use of "---help---" to just "help".

Hi Randy,

Ack, thank you, I updated in v2.

Thanks,
Andra

>
>> + This driver consists of support for enclave lifetime management
>> + for Nitro Enclaves (NE).
>> +
>> + To compile this driver as a module, choose M here.
>> + The module will be called nitro_enclaves.
>>
> thanks.
> --
> ~Randy
>





2020-04-22 15:54:06

by Paraschiv, Andra-Irina

[permalink] [raw]
Subject: Re: [PATCH v1 01/15] nitro_enclaves: Add ioctl interface definition



On 22/04/2020 00:45, Paolo Bonzini wrote:
> On 21/04/20 20:47, Randy Dunlap wrote:
>>> +
>>> +/**
>>> + * The command is used to trigger enclave start after the enclave resources,
>>> + * such as memory and CPU, have been set.
>>> + *
>>> + * The enclave start metadata is an in / out data structure. It includes
>>> + * provided info by the caller - enclave cid and flags - and returns the
>>> + * slot uid and the cid (if input cid is 0).
>>> + */
>>> +#define NE_ENCLAVE_START _IOWR('B', 0x1, struct enclave_start_metadata)
>> Please document ioctl major ('B' in this case) and range used in
>> Documentation/userspace-api/ioctl/ioctl-number.rst.
>>
> Since it's really just a couple ioctls, I can "donate" part of the KVM
> space, for example major 0xAE minor 0x20-0x3f.

Randy, thanks for the ioctl doc refs.

I can update the ioctl-number doc to add an entry for the Nitro
Enclaves uapi with the 0xAE and 0x20-0x3f range + update the KVM entry
to have 0xAE 0x00-0x1f and 0x40-0xff.

Will then use 0xAE and 0x20 for NE_ENCLAVE_START.
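
For illustration, assuming that numbering ends up being used, the uapi
define could then look along these lines (a sketch, not necessarily the
final v2 form):

    #define NE_ENCLAVE_START _IOWR(0xAE, 0x20, struct enclave_start_metadata)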

Paolo, let me know if we should do this ioctl number update other way.
And thanks for the proposal. :)

Thanks,
Andra





2020-04-23 08:15:18

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v1 14/15] nitro_enclaves: Add Makefile for the Nitro Enclaves driver

Hi Andra,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on linux/master v5.7-rc2 next-20200422]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url: https://github.com/0day-ci/linux/commits/Andra-Paraschiv/Add-support-for-Nitro-Enclaves/20200423-130814
base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 18bf34080c4c3beb6699181986cc97dd712498fe
config: parisc-allmodconfig (attached as .config)
compiler: hppa-linux-gcc (GCC) 9.3.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day GCC_VERSION=9.3.0 make.cross ARCH=parisc

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot <[email protected]>

All errors (new ones prefixed by >>):

In file included from include/uapi/linux/nitro_enclaves.h:21,
from include/linux/nitro_enclaves.h:21,
from drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c:26:
>> include/uapi/linux/kvm.h:14:10: fatal error: asm/kvm.h: No such file or directory
14 | #include <asm/kvm.h>
| ^~~~~~~~~~~
compilation terminated.

vim +14 include/uapi/linux/kvm.h

6aa8b732ca01c3 include/linux/kvm.h Avi Kivity 2006-12-10 4
6aa8b732ca01c3 include/linux/kvm.h Avi Kivity 2006-12-10 5 /*
6aa8b732ca01c3 include/linux/kvm.h Avi Kivity 2006-12-10 6 * Userspace interface for /dev/kvm - kernel based virtual machine
6aa8b732ca01c3 include/linux/kvm.h Avi Kivity 2006-12-10 7 *
dea8caee7b6971 include/linux/kvm.h Rusty Russell 2007-07-17 8 * Note: you must update KVM_API_VERSION if you change this interface.
6aa8b732ca01c3 include/linux/kvm.h Avi Kivity 2006-12-10 9 */
6aa8b732ca01c3 include/linux/kvm.h Avi Kivity 2006-12-10 10
00bfddaf7f68a6 include/linux/kvm.h Jaswinder Singh Rajput 2009-01-15 11 #include <linux/types.h>
97646202bc3f19 include/linux/kvm.h Christian Borntraeger 2008-03-12 12 #include <linux/compiler.h>
6aa8b732ca01c3 include/linux/kvm.h Avi Kivity 2006-12-10 13 #include <linux/ioctl.h>
f6a40e3bdf5fe0 include/linux/kvm.h Jerone Young 2007-11-19 @14 #include <asm/kvm.h>
6aa8b732ca01c3 include/linux/kvm.h Avi Kivity 2006-12-10 15

:::::: The code at line 14 was first introduced by commit
:::::: f6a40e3bdf5fe0a7d7d7f2dbc5b10158fbdad968 KVM: Portability: Move kvm_memory_alias to asm/kvm.h

:::::: TO: Jerone Young <[email protected]>
:::::: CC: Avi Kivity <[email protected]>

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]



2020-04-23 09:28:55

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v1 14/15] nitro_enclaves: Add Makefile for the Nitro Enclaves driver

Hi Andra,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on linux/master v5.7-rc2 next-20200422]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url: https://github.com/0day-ci/linux/commits/Andra-Paraschiv/Add-support-for-Nitro-Enclaves/20200423-130814
base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 18bf34080c4c3beb6699181986cc97dd712498fe
config: i386-allmodconfig (attached as .config)
compiler: gcc-7 (Ubuntu 7.5.0-6ubuntu2) 7.5.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot <[email protected]>

All warnings (new ones prefixed by >>):

In file included from include/linux/device.h:15:0,
from drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c:22:
drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c: In function 'ne_submit_request':
>> drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c:80:9: warning: format '%ld' expects argument of type 'long int', but argument 3 has type 'size_t {aka unsigned int}' [-Wformat=]
"Invalid req size=%ld for cmd type=%d\n",
^
include/linux/dev_printk.h:19:22: note: in definition of macro 'dev_fmt'
#define dev_fmt(fmt) fmt
^~~
>> include/linux/dev_printk.h:167:3: note: in expansion of macro 'dev_err'
dev_level(dev, fmt, ##__VA_ARGS__); \
^~~~~~~~~
include/linux/dev_printk.h:177:2: note: in expansion of macro 'dev_level_ratelimited'
dev_level_ratelimited(dev_err, dev, fmt, ##__VA_ARGS__)
^~~~~~~~~~~~~~~~~~~~~
>> drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c:79:3: note: in expansion of macro 'dev_err_ratelimited'
dev_err_ratelimited(&pdev->dev,
^~~~~~~~~~~~~~~~~~~
drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c: In function 'ne_retrieve_reply':
drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c:121:35: warning: format '%ld' expects argument of type 'long int', but argument 3 has type 'size_t {aka unsigned int}' [-Wformat=]
dev_err_ratelimited(&pdev->dev, "Invalid reply size=%ld\n",
^
include/linux/dev_printk.h:19:22: note: in definition of macro 'dev_fmt'
#define dev_fmt(fmt) fmt
^~~
>> include/linux/dev_printk.h:167:3: note: in expansion of macro 'dev_err'
dev_level(dev, fmt, ##__VA_ARGS__); \
^~~~~~~~~
include/linux/dev_printk.h:177:2: note: in expansion of macro 'dev_level_ratelimited'
dev_level_ratelimited(dev_err, dev, fmt, ##__VA_ARGS__)
^~~~~~~~~~~~~~~~~~~~~
drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c:121:3: note: in expansion of macro 'dev_err_ratelimited'
dev_err_ratelimited(&pdev->dev, "Invalid reply size=%ld\n",
^~~~~~~~~~~~~~~~~~~
drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c: In function 'ne_do_request':
drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c:193:9: warning: format '%ld' expects argument of type 'long int', but argument 3 has type 'size_t {aka unsigned int}' [-Wformat=]
"Invalid req size=%ld for cmd type=%d\n",
^
include/linux/dev_printk.h:19:22: note: in definition of macro 'dev_fmt'
#define dev_fmt(fmt) fmt
^~~
>> include/linux/dev_printk.h:167:3: note: in expansion of macro 'dev_err'
dev_level(dev, fmt, ##__VA_ARGS__); \
^~~~~~~~~
include/linux/dev_printk.h:177:2: note: in expansion of macro 'dev_level_ratelimited'
dev_level_ratelimited(dev_err, dev, fmt, ##__VA_ARGS__)
^~~~~~~~~~~~~~~~~~~~~
drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c:192:3: note: in expansion of macro 'dev_err_ratelimited'
dev_err_ratelimited(&pdev->dev,
^~~~~~~~~~~~~~~~~~~
drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c:203:35: warning: format '%ld' expects argument of type 'long int', but argument 3 has type 'size_t {aka unsigned int}' [-Wformat=]
dev_err_ratelimited(&pdev->dev, "Invalid reply size=%ld\n",
^
include/linux/dev_printk.h:19:22: note: in definition of macro 'dev_fmt'
#define dev_fmt(fmt) fmt
^~~
>> include/linux/dev_printk.h:167:3: note: in expansion of macro 'dev_err'
dev_level(dev, fmt, ##__VA_ARGS__); \
^~~~~~~~~
include/linux/dev_printk.h:177:2: note: in expansion of macro 'dev_level_ratelimited'
dev_level_ratelimited(dev_err, dev, fmt, ##__VA_ARGS__)
^~~~~~~~~~~~~~~~~~~~~
drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c:203:3: note: in expansion of macro 'dev_err_ratelimited'
dev_err_ratelimited(&pdev->dev, "Invalid reply size=%ld\n",
^~~~~~~~~~~~~~~~~~~

vim +80 drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c

0ed609272739ee Andra Paraschiv 2020-04-21 42
08a5a524ab0b6c Andra Paraschiv 2020-04-21 43 /**
08a5a524ab0b6c Andra Paraschiv 2020-04-21 44 * ne_submit_request - Submit command request to the PCI device based on the
08a5a524ab0b6c Andra Paraschiv 2020-04-21 45 * command type.
08a5a524ab0b6c Andra Paraschiv 2020-04-21 46 *
08a5a524ab0b6c Andra Paraschiv 2020-04-21 47 * This function gets called with the ne_pci_dev mutex held.
08a5a524ab0b6c Andra Paraschiv 2020-04-21 48 *
08a5a524ab0b6c Andra Paraschiv 2020-04-21 49 * @pdev: PCI device to send the command to.
08a5a524ab0b6c Andra Paraschiv 2020-04-21 50 * @cmd_type: command type of the request sent to the PCI device.
08a5a524ab0b6c Andra Paraschiv 2020-04-21 51 * @cmd_request: command request payload.
08a5a524ab0b6c Andra Paraschiv 2020-04-21 52 * @cmd_request_size: size of the command request payload.
08a5a524ab0b6c Andra Paraschiv 2020-04-21 53 *
08a5a524ab0b6c Andra Paraschiv 2020-04-21 54 * @returns: 0 on success, negative return value on failure.
08a5a524ab0b6c Andra Paraschiv 2020-04-21 55 */
08a5a524ab0b6c Andra Paraschiv 2020-04-21 56 static int ne_submit_request(struct pci_dev *pdev,
08a5a524ab0b6c Andra Paraschiv 2020-04-21 57 enum ne_pci_dev_cmd_type cmd_type,
08a5a524ab0b6c Andra Paraschiv 2020-04-21 58 void *cmd_request, size_t cmd_request_size)
08a5a524ab0b6c Andra Paraschiv 2020-04-21 59 {
08a5a524ab0b6c Andra Paraschiv 2020-04-21 60 struct ne_pci_dev *ne_pci_dev = NULL;
08a5a524ab0b6c Andra Paraschiv 2020-04-21 61
08a5a524ab0b6c Andra Paraschiv 2020-04-21 62 BUG_ON(!pdev);
08a5a524ab0b6c Andra Paraschiv 2020-04-21 63
08a5a524ab0b6c Andra Paraschiv 2020-04-21 64 ne_pci_dev = pci_get_drvdata(pdev);
08a5a524ab0b6c Andra Paraschiv 2020-04-21 65 BUG_ON(!ne_pci_dev);
08a5a524ab0b6c Andra Paraschiv 2020-04-21 66 BUG_ON(!ne_pci_dev->iomem_base);
08a5a524ab0b6c Andra Paraschiv 2020-04-21 67
08a5a524ab0b6c Andra Paraschiv 2020-04-21 68 if (WARN_ON(cmd_type <= INVALID_CMD || cmd_type >= MAX_CMD)) {
08a5a524ab0b6c Andra Paraschiv 2020-04-21 69 dev_err_ratelimited(&pdev->dev, "Invalid cmd type=%d\n",
08a5a524ab0b6c Andra Paraschiv 2020-04-21 70 cmd_type);
08a5a524ab0b6c Andra Paraschiv 2020-04-21 71
08a5a524ab0b6c Andra Paraschiv 2020-04-21 72 return -EINVAL;
08a5a524ab0b6c Andra Paraschiv 2020-04-21 73 }
08a5a524ab0b6c Andra Paraschiv 2020-04-21 74
08a5a524ab0b6c Andra Paraschiv 2020-04-21 75 if (WARN_ON(!cmd_request))
08a5a524ab0b6c Andra Paraschiv 2020-04-21 76 return -EINVAL;
08a5a524ab0b6c Andra Paraschiv 2020-04-21 77
08a5a524ab0b6c Andra Paraschiv 2020-04-21 78 if (WARN_ON(cmd_request_size > NE_SEND_DATA_SIZE)) {
08a5a524ab0b6c Andra Paraschiv 2020-04-21 @79 dev_err_ratelimited(&pdev->dev,
08a5a524ab0b6c Andra Paraschiv 2020-04-21 @80 "Invalid req size=%ld for cmd type=%d\n",
08a5a524ab0b6c Andra Paraschiv 2020-04-21 81 cmd_request_size, cmd_type);
08a5a524ab0b6c Andra Paraschiv 2020-04-21 82
08a5a524ab0b6c Andra Paraschiv 2020-04-21 83 return -EINVAL;
08a5a524ab0b6c Andra Paraschiv 2020-04-21 84 }
08a5a524ab0b6c Andra Paraschiv 2020-04-21 85
08a5a524ab0b6c Andra Paraschiv 2020-04-21 86 memcpy_toio(ne_pci_dev->iomem_base + NE_SEND_DATA, cmd_request,
08a5a524ab0b6c Andra Paraschiv 2020-04-21 87 cmd_request_size);
08a5a524ab0b6c Andra Paraschiv 2020-04-21 88
08a5a524ab0b6c Andra Paraschiv 2020-04-21 89 iowrite32(cmd_type, ne_pci_dev->iomem_base + NE_COMMAND);
08a5a524ab0b6c Andra Paraschiv 2020-04-21 90
08a5a524ab0b6c Andra Paraschiv 2020-04-21 91 return 0;
08a5a524ab0b6c Andra Paraschiv 2020-04-21 92 }
08a5a524ab0b6c Andra Paraschiv 2020-04-21 93

:::::: The code at line 80 was first introduced by commit
:::::: 08a5a524ab0b6c939997c8d44b4d07e5ee97e91d nitro_enclaves: Handle PCI device command requests

:::::: TO: Andra Paraschiv <[email protected]>
:::::: CC: 0day robot <[email protected]>

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]
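
The usual fix for this class of -Wformat warnings is to print size_t
values with the %zu length modifier; a sketch of how the first reported
call site could look once adjusted, with nothing else changed around it:

    dev_err_ratelimited(&pdev->dev,
                        "Invalid req size=%zu for cmd type=%d\n",
                        cmd_request_size, cmd_type);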



2020-04-23 13:22:51

by Paraschiv, Andra-Irina

[permalink] [raw]
Subject: Re: [PATCH v1 00/15] Add support for Nitro Enclaves



On 22/04/2020 00:46, Paolo Bonzini wrote:
> On 21/04/20 20:41, Andra Paraschiv wrote:
>> An enclave communicates with the primary VM via a local communication channel,
>> using virtio-vsock [2]. An enclave does not have a disk or a network device
>> attached.
> Is it possible to have a sample of this in the samples/ directory?

I can add in v2 a sample file including the basic flow of how to use the
ioctl interface to create / terminate an enclave.

Then we can update / build on top of it based on the ongoing discussions on
the patch series and the received feedback.

>
> I am interested especially in:
>
> - the initial CPU state: CPL0 vs. CPL3, initial program counter, etc.
>
> - the communication channel; does the enclave see the usual local APIC
> and IOAPIC interfaces in order to get interrupts from virtio-vsock, and
> where is the virtio-vsock device (virtio-mmio I suppose) placed in memory?
>
> - what the enclave is allowed to do: can it change privilege levels,
> what happens if the enclave performs an access to nonexistent memory, etc.
>
> - whether there are special hypercall interfaces for the enclave

An enclave is a VM running on the same host as the primary VM that
launched it. They are siblings.

Here we need to think of two components:

1. An enclave abstraction process - a process running in the primary VM
guest, that uses the provided ioctl interface of the Nitro Enclaves
kernel driver to spawn an enclave VM (that's 2 below).

How does all this get to an enclave VM running on the host?

There is a Nitro Enclaves emulated PCI device exposed to the primary VM.
The driver for this new PCI device is included in the current patch series.

The ioctl logic is mapped to PCI device commands, e.g. the
NE_ENCLAVE_START ioctl maps to an enclave start PCI command and
KVM_SET_USER_MEMORY_REGION maps to an add memory PCI command (a rough
user space sketch of this flow is included below). The PCI device
commands are then translated into actions taken on the hypervisor
side; that's the Nitro hypervisor running on the host where the primary
VM is running.

2. The enclave itself - a VM running on the same host as the primary VM
that spawned it.

The enclave VM has no persistent storage or network interface attached;
it uses its own memory and CPUs, plus its emulated virtio-vsock device,
for communication with the primary VM.

The memory and CPUs are carved out of the primary VM; they are dedicated
to the enclave. The Nitro hypervisor running on the host ensures memory
and CPU isolation between the primary VM and the enclave VM.


These two components need to reflect the same state e.g. when the
enclave abstraction process (1) is terminated, the enclave VM (2) is
terminated as well.

With regard to the communication channel, the primary VM has its own
emulated virtio-vsock PCI device. The enclave VM has its own emulated
virtio-vsock device as well. This channel is used, for example, to fetch
data in the enclave and then process it. An application that sets up the
vsock socket and connects or listens, depending on the use case, is then
developed to use this channel; this happens on both ends - primary VM
and enclave VM.
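
To make the ioctl flow described above more concrete, a rough user space
sketch follows. It is only an illustration: the misc device path, the exact
set of KVM ioctls reused by the driver, and the NE_ENCLAVE_START argument
are assumptions here, error handling is omitted, and the real interface is
the one defined by the series' UAPI headers.

#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>              /* KVM_CREATE_VM, KVM_CREATE_VCPU, KVM_SET_USER_MEMORY_REGION */
#include <linux/nitro_enclaves.h>   /* NE_ENCLAVE_START, from this series' UAPI header */

int main(void)
{
	/* Assumed device node name for the NE misc device. */
	int ne_fd = open("/dev/nitro_enclaves", O_RDWR);

	/* KVM-style VM creation; returns the enclave fd (translated to a PCI command). */
	int enclave_fd = ioctl(ne_fd, KVM_CREATE_VM, 0);

	/* Carve a vCPU out of the primary VM for the enclave. */
	ioctl(enclave_fd, KVM_CREATE_VCPU, 0);

	/* Donate a memory region to the enclave (the add memory PCI command).
	 * The fields are left zeroed as placeholders in this sketch.
	 */
	struct kvm_userspace_memory_region mem = { 0 };
	ioctl(enclave_fd, KVM_SET_USER_MEMORY_REGION, &mem);

	/* Start the enclave (the enclave start PCI command). The argument layout
	 * is defined by the series' UAPI header and is not reproduced here.
	 */
	ioctl(enclave_fd, NE_ENCLAVE_START, 0);

	return 0;
}

Each ioctl on the primary VM side ends up as a command sent to the emulated
PCI device, as described above.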

Let me know if further clarifications are needed.

>
>> The proposed solution is following the KVM model and uses the KVM API to be able
>> to create and set resources for enclaves. An additional ioctl command, besides
>> the ones provided by KVM, is used to start an enclave and setup the addressing
>> for the communication channel and an enclave unique id.
> Reusing some KVM ioctls is definitely a good idea, but I wouldn't really
> say it's the KVM API since the VCPU file descriptor is basically non
> functional (without KVM_RUN and mmap it's not really the KVM API).

It uses part of the KVM API or a set of KVM ioctls to model the way a VM
is created / terminated. That's true, KVM_RUN and mmap-ing the vcpu fd
are not included.

Thanks for the feedback regarding the reuse of KVM ioctls.

Andra





2020-04-23 13:40:36

by Paraschiv, Andra-Irina

[permalink] [raw]
Subject: Re: [PATCH v1 02/15] nitro_enclaves: Define the PCI device interface



On 22/04/2020 00:22, Paolo Bonzini wrote:
> On 21/04/20 20:41, Andra Paraschiv wrote:
>> The Nitro Enclaves (NE) driver communicates with a new PCI device, that
>> is exposed to a virtual machine (VM) and handles commands meant for
>> handling enclaves lifetime e.g. creation, termination, setting memory
>> regions. The communication with the PCI device is handled using a MMIO
>> space and MSI-X interrupts.
>>
>> This device communicates with the hypervisor on the host, where the VM
>> that spawned the enclave itself run, e.g. to launch a VM that is used
>> for the enclave.
>>
>> Define the MMIO space of the PCI device, the commands that are
>> provided by this device. Add an internal data structure used as private
>> data for the PCI device driver and the functions for the PCI device init
>> / uninit and command requests handling.
>>
>> Signed-off-by: Alexandru-Catalin Vasile <[email protected]>
>> Signed-off-by: Alexandru Ciobotaru <[email protected]>
>> Signed-off-by: Andra Paraschiv <[email protected]>
>> ---
>> .../virt/amazon/nitro_enclaves/ne_pci_dev.h | 266 ++++++++++++++++++
>> 1 file changed, 266 insertions(+)
>> create mode 100644 drivers/virt/amazon/nitro_enclaves/ne_pci_dev.h
> Can this be placed just in drivers/virt/nitro_enclaves, or
> drivers/virt/enclave/nitro? It's not unlikely that this device be
> implemented outside EC2 sooner or later, and there's nothing
> Amazon-specific as far as I can see from the UAPI.

I can update the path to drivers/virt/nitro_enclaves.

The PCI device in the patch series is registered under Amazon PCI Vendor
ID and it has this PCI Device ID - 0xe4c1.

Thanks,
Andra





2020-04-23 13:45:21

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v1 00/15] Add support for Nitro Enclaves

On 23/04/20 15:19, Paraschiv, Andra-Irina wrote:
> 2. The enclave itself - a VM running on the same host as the primary VM
> that spawned it.
>
> The enclave VM has no persistent storage or network interface attached,
> it uses its own memory and CPUs + its virtio-vsock emulated device for
> communication with the primary VM.
>
> The memory and CPUs are carved out of the primary VM, they are dedicated
> for the enclave. The Nitro hypervisor running on the host ensures memory
> and CPU isolation between the primary VM and the enclave VM.
>
> These two components need to reflect the same state e.g. when the
> enclave abstraction process (1) is terminated, the enclave VM (2) is
> terminated as well.
>
> With regard to the communication channel, the primary VM has its own
> emulated virtio-vsock PCI device. The enclave VM has its own emulated
> virtio-vsock device as well. This channel is used, for example, to fetch
> data in the enclave and then process it. An application that sets up the
> vsock socket and connects or listens, depending on the use case, is then
> developed to use this channel; this happens on both ends - primary VM
> and enclave VM.
>
> Let me know if further clarifications are needed.

Thanks, this is all useful. However can you please clarify the
low-level details here?

>> - the initial CPU state: CPL0 vs. CPL3, initial program counter, etc.
>> - the communication channel; does the enclave see the usual local APIC
>> and IOAPIC interfaces in order to get interrupts from virtio-vsock, and
>> where is the virtio-vsock device (virtio-mmio I suppose) placed in
>> memory?
>> - what the enclave is allowed to do: can it change privilege levels,
>> what happens if the enclave performs an access to nonexistent memory,
>> etc.
>> - whether there are special hypercall interfaces for the enclave

Thanks,

Paolo

2020-04-23 17:53:04

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v1 00/15] Add support for Nitro Enclaves

On 23/04/20 19:42, Paraschiv, Andra-Irina wrote:
>>
>>>> - the initial CPU state: CPL0 vs. CPL3, initial program counter, etc.
>
> The enclave VM has its own kernel and follows the well-known Linux boot
> protocol, in the end getting to the user application after init finishes
> its work, so that's CPL3.

CPL3 is how the user application runs, but does the enclave's Linux boot
process start in real mode at the reset vector (0xfffffff0), in 16-bit
protected mode at the Linux bzImage entry point, or at the ELF entry point?

Paolo

2020-04-23 19:37:40

by Paraschiv, Andra-Irina

[permalink] [raw]
Subject: Re: [PATCH v1 00/15] Add support for Nitro Enclaves



On 23/04/2020 16:42, Paolo Bonzini wrote:
> On 23/04/20 15:19, Paraschiv, Andra-Irina wrote:
>> 2. The enclave itself - a VM running on the same host as the primary VM
>> that spawned it.
>>
>> The enclave VM has no persistent storage or network interface attached,
>> it uses its own memory and CPUs + its virtio-vsock emulated device for
>> communication with the primary VM.
>>
>> The memory and CPUs are carved out of the primary VM, they are dedicated
>> for the enclave. The Nitro hypervisor running on the host ensures memory
>> and CPU isolation between the primary VM and the enclave VM.
>>
>> These two components need to reflect the same state e.g. when the
>> enclave abstraction process (1) is terminated, the enclave VM (2) is
>> terminated as well.
>>
>> With regard to the communication channel, the primary VM has its own
>> emulated virtio-vsock PCI device. The enclave VM has its own emulated
>> virtio-vsock device as well. This channel is used, for example, to fetch
>> data in the enclave and then process it. An application that sets up the
>> vsock socket and connects or listens, depending on the use case, is then
>> developed to use this channel; this happens on both ends - primary VM
>> and enclave VM.
>>
>> Let me know if further clarifications are needed.
> Thanks, this is all useful. However can you please clarify the
> low-level details here?
>
>>> - the initial CPU state: CPL0 vs. CPL3, initial program counter, etc.

The enclave VM has its own kernel and follows the well-known Linux boot
protocol, in the end getting to the user application after init finishes
its work, so that's CPL3.

>>> - the communication channel; does the enclave see the usual local APIC
>>> and IOAPIC interfaces in order to get interrupts from virtio-vsock, and
>>> where is the virtio-vsock device (virtio-mmio I suppose) placed in
>>> memory?
vsock uses eventfd for signalling; with regard to the enclave VM, it sees
the usual interfaces to get interrupts from the virtio device.

It's placed below the typical 4GB boundary; in general, the placement may depend on the architecture.

>>> - what the enclave is allowed to do: can it change privilege levels,
>>> what happens if the enclave performs an access to nonexistent memory,
>>> etc.

If we're talking about the enclave abstraction process, it runs in the
primary VM as a user space process, so it will trap into the primary VM
guest kernel if privileged instructions need to be executed.

The same happens with the user space application running in the enclave VM.
And the enclave VM itself will exit to the hypervisor running on the host
for privileged instructions. The Nitro hypervisor is based on core KVM
technology.

Access to nonexistent memory results in faults.

>>> - whether there are special hypercall interfaces for the enclave

The path for creating / setting resources for / terminating an enclave
(here referring to the enclave VM) goes through the ioctl interface, with
the corresponding misc device, and the emulated PCI device. That's the
interface used to manage enclaves. Once the enclave is booted, its
resource setup is not modified anymore. And the way to communicate with
the enclave after booting, i.e. with the application running in the
enclave, is via the vsock communication channel.


Thanks,
Andra

> Thanks,
>
> Paolo
>





2020-04-23 21:00:48

by Alexander Graf

[permalink] [raw]
Subject: Re: [PATCH v1 00/15] Add support for Nitro Enclaves



On 23.04.20 19:51, Paolo Bonzini wrote:
>
> On 23/04/20 19:42, Paraschiv, Andra-Irina wrote:
>>>
>>>>> - the initial CPU state: CPL0 vs. CPL3, initial program counter, etc.
>>
>> The enclave VM has its own kernel and follows the well-known Linux boot
>> protocol, in the end getting to the user application after init finishes
>> its work, so that's CPL3.
>
> CPL3 is how the user application run, but does the enclave's Linux boot
> process start in real mode at the reset vector (0xfffffff0), in 16-bit
> protected mode at the Linux bzImage entry point, or at the ELF entry point?

There is no "entry point" per se. You prepopulate at target bzImage into
the enclave memory on boot which then follows the standard boot
protocol. Everything before that (enclave firmware, etc.) is provided by
the enclave environment.

Think of it like a mechanism to launch a second QEMU instance on the
host, but all you can actually control are the -smp, -m, -kernel and
-initrd parameters. The only I/O channel you have between your VM and
that new VM is a vsock channel which is configured by the host on your
behalf.


Alex





2020-04-23 21:20:28

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v1 00/15] Add support for Nitro Enclaves

On 23/04/20 22:56, Alexander Graf wrote:
>>
>> CPL3 is how the user application run, but does the enclave's Linux boot
>> process start in real mode at the reset vector (0xfffffff0), in 16-bit
>> protected mode at the Linux bzImage entry point, or at the ELF entry
>> point?
>
> There is no "entry point" per se. You prepopulate at target bzImage into
> the enclave memory on boot which then follows the standard boot
> protocol. Everything

There's still a "where" missing in that sentence. :) I assume you put
it at 0x10000 (and so the entry point at 0x10200)? That should be
documented because that is absolutely not what the KVM API looks like.

> before that (enclave firmware, etc.) is provided by
> the enclave environment.
>
> Think of it like a mechanism to launch a second QEMU instance on the
> host, but all you can actually control are the -smp, -m, -kernel and
> -initrd parameters.

Are there requirements on how to populate the memory to ensure that the
host firmware doesn't crash and burn? E.g. some free memory right below
4GiB (for the firmware, the LAPIC/IOAPIC or any other special MMIO
devices you have, PCI BARs, and the like)?

> The only I/O channel you have between your VM and
> that new VM is a vsock channel which is configured by the host on your
> behalf.

Is this virtio-mmio or virtio-pci, and what other emulated devices are
there and how do you discover them? Are there any ISA devices
(RTC/PIC/PIT), and are there SMBIOS/RSDP/MP tables in the F segment?

Thanks,

Paolo

2020-04-24 03:08:19

by Longpeng(Mike)

[permalink] [raw]
Subject: Re: [PATCH v1 00/15] Add support for Nitro Enclaves


On 2020/4/23 21:19, Paraschiv, Andra-Irina wrote:
>
>
> On 22/04/2020 00:46, Paolo Bonzini wrote:
>> On 21/04/20 20:41, Andra Paraschiv wrote:
>>> An enclave communicates with the primary VM via a local communication channel,
>>> using virtio-vsock [2]. An enclave does not have a disk or a network device
>>> attached.
>> Is it possible to have a sample of this in the samples/ directory?
>
> I can add in v2 a sample file including the basic flow of how to use the ioctl
> interface to create / terminate an enclave.
>
> Then we can update / build on top it based on the ongoing discussions on the
> patch series and the received feedback.
>
>>
>> I am interested especially in:
>>
>> - the initial CPU state: CPL0 vs. CPL3, initial program counter, etc.
>>
>> - the communication channel; does the enclave see the usual local APIC
>> and IOAPIC interfaces in order to get interrupts from virtio-vsock, and
>> where is the virtio-vsock device (virtio-mmio I suppose) placed in memory?
>>
>> - what the enclave is allowed to do: can it change privilege levels,
>> what happens if the enclave performs an access to nonexistent memory, etc.
>>
>> - whether there are special hypercall interfaces for the enclave
>
> An enclave is a VM, running on the same host as the primary VM, that launched
> the enclave. They are siblings.
>
> Here we need to think of two components:
>
> 1. An enclave abstraction process - a process running in the primary VM guest,
> that uses the provided ioctl interface of the Nitro Enclaves kernel driver to
> spawn an enclave VM (that's 2 below).
>
> How does all gets to an enclave VM running on the host?
>
> There is a Nitro Enclaves emulated PCI device exposed to the primary VM. The
> driver for this new PCI device is included in the current patch series.
>
Hi Paraschiv,

The new PCI device is emulated in QEMU? If so, is there any plan to send the
QEMU code?

> The ioctl logic is mapped to PCI device commands e.g. the NE_ENCLAVE_START ioctl
> maps to an enclave start PCI command or the KVM_SET_USER_MEMORY_REGION maps to
> an add memory PCI command. The PCI device commands are then translated into
> actions taken on the hypervisor side; that's the Nitro hypervisor running on the
> host where the primary VM is running.
>
> 2. The enclave itself - a VM running on the same host as the primary VM that
> spawned it.
>
> The enclave VM has no persistent storage or network interface attached, it uses
> its own memory and CPUs + its virtio-vsock emulated device for communication
> with the primary VM.
>
> The memory and CPUs are carved out of the primary VM, they are dedicated for the
> enclave. The Nitro hypervisor running on the host ensures memory and CPU
> isolation between the primary VM and the enclave VM.
>
>
> These two components need to reflect the same state e.g. when the enclave
> abstraction process (1) is terminated, the enclave VM (2) is terminated as well.
>
> With regard to the communication channel, the primary VM has its own emulated
> virtio-vsock PCI device. The enclave VM has its own emulated virtio-vsock device
> as well. This channel is used, for example, to fetch data in the enclave and
> then process it. An application that sets up the vsock socket and connects or
> listens, depending on the use case, is then developed to use this channel; this
> happens on both ends - primary VM and enclave VM.
>
> Let me know if further clarifications are needed.
>
>>
>>> The proposed solution is following the KVM model and uses the KVM API to be able
>>> to create and set resources for enclaves. An additional ioctl command, besides
>>> the ones provided by KVM, is used to start an enclave and setup the addressing
>>> for the communication channel and an enclave unique id.
>> Reusing some KVM ioctls is definitely a good idea, but I wouldn't really
>> say it's the KVM API since the VCPU file descriptor is basically non
>> functional (without KVM_RUN and mmap it's not really the KVM API).
>
> It uses part of the KVM API or a set of KVM ioctls to model the way a VM is
> created / terminated. That's true, KVM_RUN and mmap-ing the vcpu fd are not
> included.
>
> Thanks for the feedback regarding the reuse of KVM ioctls.
>
> Andra
>
>
>
>
> Amazon Development Center (Romania) S.R.L. registered office: 27A Sf. Lazar
> Street, UBC5, floor 2, Iasi, Iasi County, 700045, Romania. Registered in
> Romania. Registration number J22/2621/2005.

--
---
Regards,
Longpeng(Mike)

2020-04-24 08:21:49

by Paraschiv, Andra-Irina

[permalink] [raw]
Subject: Re: [PATCH v1 00/15] Add support for Nitro Enclaves



On 24/04/2020 06:04, Longpeng (Mike, Cloud Infrastructure Service
Product Dept.) wrote:
> On 2020/4/23 21:19, Paraschiv, Andra-Irina wrote:
>>
>> On 22/04/2020 00:46, Paolo Bonzini wrote:
>>> On 21/04/20 20:41, Andra Paraschiv wrote:
>>>> An enclave communicates with the primary VM via a local communication channel,
>>>> using virtio-vsock [2]. An enclave does not have a disk or a network device
>>>> attached.
>>> Is it possible to have a sample of this in the samples/ directory?
>> I can add in v2 a sample file including the basic flow of how to use the ioctl
>> interface to create / terminate an enclave.
>>
>> Then we can update / build on top it based on the ongoing discussions on the
>> patch series and the received feedback.
>>
>>> I am interested especially in:
>>>
>>> - the initial CPU state: CPL0 vs. CPL3, initial program counter, etc.
>>>
>>> - the communication channel; does the enclave see the usual local APIC
>>> and IOAPIC interfaces in order to get interrupts from virtio-vsock, and
>>> where is the virtio-vsock device (virtio-mmio I suppose) placed in memory?
>>>
>>> - what the enclave is allowed to do: can it change privilege levels,
>>> what happens if the enclave performs an access to nonexistent memory, etc.
>>>
>>> - whether there are special hypercall interfaces for the enclave
>> An enclave is a VM, running on the same host as the primary VM, that launched
>> the enclave. They are siblings.
>>
>> Here we need to think of two components:
>>
>> 1. An enclave abstraction process - a process running in the primary VM guest,
>> that uses the provided ioctl interface of the Nitro Enclaves kernel driver to
>> spawn an enclave VM (that's 2 below).
>>
>> How does all gets to an enclave VM running on the host?
>>
>> There is a Nitro Enclaves emulated PCI device exposed to the primary VM. The
>> driver for this new PCI device is included in the current patch series.
>>
> Hi Paraschiv,
>
> The new PCI device is emulated in QEMU ? If so, is there any plan to send the
> QEMU code ?

Hi,

Nope, not that I know of so far.

Thanks,
Andra

>
>> The ioctl logic is mapped to PCI device commands e.g. the NE_ENCLAVE_START ioctl
>> maps to an enclave start PCI command or the KVM_SET_USER_MEMORY_REGION maps to
>> an add memory PCI command. The PCI device commands are then translated into
>> actions taken on the hypervisor side; that's the Nitro hypervisor running on the
>> host where the primary VM is running.
>>
>> 2. The enclave itself - a VM running on the same host as the primary VM that
>> spawned it.
>>
>> The enclave VM has no persistent storage or network interface attached, it uses
>> its own memory and CPUs + its virtio-vsock emulated device for communication
>> with the primary VM.
>>
>> The memory and CPUs are carved out of the primary VM, they are dedicated for the
>> enclave. The Nitro hypervisor running on the host ensures memory and CPU
>> isolation between the primary VM and the enclave VM.
>>
>>
>> These two components need to reflect the same state e.g. when the enclave
>> abstraction process (1) is terminated, the enclave VM (2) is terminated as well.
>>
>> With regard to the communication channel, the primary VM has its own emulated
>> virtio-vsock PCI device. The enclave VM has its own emulated virtio-vsock device
>> as well. This channel is used, for example, to fetch data in the enclave and
>> then process it. An application that sets up the vsock socket and connects or
>> listens, depending on the use case, is then developed to use this channel; this
>> happens on both ends - primary VM and enclave VM.
>>
>> Let me know if further clarifications are needed.
>>
>>>> The proposed solution is following the KVM model and uses the KVM API to be able
>>>> to create and set resources for enclaves. An additional ioctl command, besides
>>>> the ones provided by KVM, is used to start an enclave and setup the addressing
>>>> for the communication channel and an enclave unique id.
>>> Reusing some KVM ioctls is definitely a good idea, but I wouldn't really
>>> say it's the KVM API since the VCPU file descriptor is basically non
>>> functional (without KVM_RUN and mmap it's not really the KVM API).
>> It uses part of the KVM API or a set of KVM ioctls to model the way a VM is
>> created / terminated. That's true, KVM_RUN and mmap-ing the vcpu fd are not
>> included.
>>
>> Thanks for the feedback regarding the reuse of KVM ioctls.
>>
>> Andra
>>
>>
>>
>>
>> Amazon Development Center (Romania) S.R.L. registered office: 27A Sf. Lazar
>> Street, UBC5, floor 2, Iasi, Iasi County, 700045, Romania. Registered in
>> Romania. Registration number J22/2621/2005.





2020-04-24 09:58:46

by Paraschiv, Andra-Irina

[permalink] [raw]
Subject: Re: [PATCH v1 00/15] Add support for Nitro Enclaves



On 24/04/2020 11:19, Paraschiv, Andra-Irina wrote:
>
>
> On 24/04/2020 06:04, Longpeng (Mike, Cloud Infrastructure Service
> Product Dept.) wrote:
>> On 2020/4/23 21:19, Paraschiv, Andra-Irina wrote:
>>>
>>> On 22/04/2020 00:46, Paolo Bonzini wrote:
>>>> On 21/04/20 20:41, Andra Paraschiv wrote:
>>>>> An enclave communicates with the primary VM via a local
>>>>> communication channel,
>>>>> using virtio-vsock [2]. An enclave does not have a disk or a
>>>>> network device
>>>>> attached.
>>>> Is it possible to have a sample of this in the samples/ directory?
>>> I can add in v2 a sample file including the basic flow of how to use
>>> the ioctl
>>> interface to create / terminate an enclave.
>>>
>>> Then we can update / build on top it based on the ongoing
>>> discussions on the
>>> patch series and the received feedback.
>>>
>>>> I am interested especially in:
>>>>
>>>> - the initial CPU state: CPL0 vs. CPL3, initial program counter, etc.
>>>>
>>>> - the communication channel; does the enclave see the usual local APIC
>>>> and IOAPIC interfaces in order to get interrupts from virtio-vsock,
>>>> and
>>>> where is the virtio-vsock device (virtio-mmio I suppose) placed in
>>>> memory?
>>>>
>>>> - what the enclave is allowed to do: can it change privilege levels,
>>>> what happens if the enclave performs an access to nonexistent
>>>> memory, etc.
>>>>
>>>> - whether there are special hypercall interfaces for the enclave
>>> An enclave is a VM, running on the same host as the primary VM, that
>>> launched
>>> the enclave. They are siblings.
>>>
>>> Here we need to think of two components:
>>>
>>> 1. An enclave abstraction process - a process running in the primary
>>> VM guest,
>>> that uses the provided ioctl interface of the Nitro Enclaves kernel
>>> driver to
>>> spawn an enclave VM (that's 2 below).
>>>
>>> How does all gets to an enclave VM running on the host?
>>>
>>> There is a Nitro Enclaves emulated PCI device exposed to the primary
>>> VM. The
>>> driver for this new PCI device is included in the current patch series.
>>>
>> Hi Paraschiv,
>>
>> The new PCI device is emulated in QEMU ? If so, is there any plan to
>> send the
>> QEMU code ?
>
> Hi,
>
> Nope, not that I know of so far.

And just to be a bit more clear, the reply above takes into
consideration that it's not emulated in QEMU.


Thanks,
Andra

>
>>
>>> The ioctl logic is mapped to PCI device commands e.g. the
>>> NE_ENCLAVE_START ioctl
>>> maps to an enclave start PCI command or the
>>> KVM_SET_USER_MEMORY_REGION maps to
>>> an add memory PCI command. The PCI device commands are then
>>> translated into
>>> actions taken on the hypervisor side; that's the Nitro hypervisor
>>> running on the
>>> host where the primary VM is running.
>>>
>>> 2. The enclave itself - a VM running on the same host as the primary
>>> VM that
>>> spawned it.
>>>
>>> The enclave VM has no persistent storage or network interface
>>> attached, it uses
>>> its own memory and CPUs + its virtio-vsock emulated device for
>>> communication
>>> with the primary VM.
>>>
>>> The memory and CPUs are carved out of the primary VM, they are
>>> dedicated for the
>>> enclave. The Nitro hypervisor running on the host ensures memory and
>>> CPU
>>> isolation between the primary VM and the enclave VM.
>>>
>>>
>>> These two components need to reflect the same state e.g. when the
>>> enclave
>>> abstraction process (1) is terminated, the enclave VM (2) is
>>> terminated as well.
>>>
>>> With regard to the communication channel, the primary VM has its own
>>> emulated
>>> virtio-vsock PCI device. The enclave VM has its own emulated
>>> virtio-vsock device
>>> as well. This channel is used, for example, to fetch data in the
>>> enclave and
>>> then process it. An application that sets up the vsock socket and
>>> connects or
>>> listens, depending on the use case, is then developed to use this
>>> channel; this
>>> happens on both ends - primary VM and enclave VM.
>>>
>>> Let me know if further clarifications are needed.
>>>
>>>>> The proposed solution is following the KVM model and uses the KVM
>>>>> API to be able
>>>>> to create and set resources for enclaves. An additional ioctl
>>>>> command, besides
>>>>> the ones provided by KVM, is used to start an enclave and setup
>>>>> the addressing
>>>>> for the communication channel and an enclave unique id.
>>>> Reusing some KVM ioctls is definitely a good idea, but I wouldn't
>>>> really
>>>> say it's the KVM API since the VCPU file descriptor is basically non
>>>> functional (without KVM_RUN and mmap it's not really the KVM API).
>>> It uses part of the KVM API or a set of KVM ioctls to model the way
>>> a VM is
>>> created / terminated. That's true, KVM_RUN and mmap-ing the vcpu fd
>>> are not
>>> included.
>>>
>>> Thanks for the feedback regarding the reuse of KVM ioctls.
>>>
>>> Andra
>>>
>>>
>>>
>>>
>>> Amazon Development Center (Romania) S.R.L. registered office: 27A
>>> Sf. Lazar
>>> Street, UBC5, floor 2, Iasi, Iasi County, 700045, Romania.
>>> Registered in
>>> Romania. Registration number J22/2621/2005.
>





2020-04-24 10:04:50

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v1 00/15] Add support for Nitro Enclaves

> From: Paraschiv, Andra-Irina
> Sent: Thursday, April 23, 2020 9:20 PM
>
> On 22/04/2020 00:46, Paolo Bonzini wrote:
> > On 21/04/20 20:41, Andra Paraschiv wrote:
> >> An enclave communicates with the primary VM via a local communication
> channel,
> >> using virtio-vsock [2]. An enclave does not have a disk or a network device
> >> attached.
> > Is it possible to have a sample of this in the samples/ directory?
>
> I can add in v2 a sample file including the basic flow of how to use the
> ioctl interface to create / terminate an enclave.
>
> Then we can update / build on top it based on the ongoing discussions on
> the patch series and the received feedback.
>
> >
> > I am interested especially in:
> >
> > - the initial CPU state: CPL0 vs. CPL3, initial program counter, etc.
> >
> > - the communication channel; does the enclave see the usual local APIC
> > and IOAPIC interfaces in order to get interrupts from virtio-vsock, and
> > where is the virtio-vsock device (virtio-mmio I suppose) placed in memory?
> >
> > - what the enclave is allowed to do: can it change privilege levels,
> > what happens if the enclave performs an access to nonexistent memory,
> etc.
> >
> > - whether there are special hypercall interfaces for the enclave
>
> An enclave is a VM, running on the same host as the primary VM, that
> launched the enclave. They are siblings.
>
> Here we need to think of two components:
>
> 1. An enclave abstraction process - a process running in the primary VM
> guest, that uses the provided ioctl interface of the Nitro Enclaves
> kernel driver to spawn an enclave VM (that's 2 below).
>
> How does all gets to an enclave VM running on the host?
>
> There is a Nitro Enclaves emulated PCI device exposed to the primary VM.
> The driver for this new PCI device is included in the current patch series.
>
> The ioctl logic is mapped to PCI device commands e.g. the
> NE_ENCLAVE_START ioctl maps to an enclave start PCI command or the
> KVM_SET_USER_MEMORY_REGION maps to an add memory PCI command.
> The PCI
> device commands are then translated into actions taken on the hypervisor
> side; that's the Nitro hypervisor running on the host where the primary
> VM is running.
>
> 2. The enclave itself - a VM running on the same host as the primary VM
> that spawned it.
>
> The enclave VM has no persistent storage or network interface attached,
> it uses its own memory and CPUs + its virtio-vsock emulated device for
> communication with the primary VM.

Sounds like a Firecracker VM?

>
> The memory and CPUs are carved out of the primary VM, they are dedicated
> for the enclave. The Nitro hypervisor running on the host ensures memory
> and CPU isolation between the primary VM and the enclave VM.

In the last paragraph, you said that the enclave VM uses its own memory and
CPUs. Then here, you said the memory/CPUs are carved out of the primary VM
and dedicated to the enclave. Can you elaborate on which one is accurate?
Or is it a mixed model?

>
>
> These two components need to reflect the same state e.g. when the
> enclave abstraction process (1) is terminated, the enclave VM (2) is
> terminated as well.
>
> With regard to the communication channel, the primary VM has its own
> emulated virtio-vsock PCI device. The enclave VM has its own emulated
> virtio-vsock device as well. This channel is used, for example, to fetch
> data in the enclave and then process it. An application that sets up the
> vsock socket and connects or listens, depending on the use case, is then
> developed to use this channel; this happens on both ends - primary VM
> and enclave VM.

How does the application in the primary VM assign tasks to be executed
in the enclave VM? I didn't see such a command in this series, so I suppose
it is also communicated through virtio-vsock?

>
> Let me know if further clarifications are needed.
>
> >
> >> The proposed solution is following the KVM model and uses the KVM API
> to be able
> >> to create and set resources for enclaves. An additional ioctl command,
> besides
> >> the ones provided by KVM, is used to start an enclave and setup the
> addressing
> >> for the communication channel and an enclave unique id.
> > Reusing some KVM ioctls is definitely a good idea, but I wouldn't really
> > say it's the KVM API since the VCPU file descriptor is basically non
> > functional (without KVM_RUN and mmap it's not really the KVM API).
>
> It uses part of the KVM API or a set of KVM ioctls to model the way a VM
> is created / terminated. That's true, KVM_RUN and mmap-ing the vcpu fd
> are not included.
>
> Thanks for the feedback regarding the reuse of KVM ioctls.
>
> Andra
>

Thanks
Kevin

2020-04-24 13:00:13

by Alexander Graf

[permalink] [raw]
Subject: Re: [PATCH v1 00/15] Add support for Nitro Enclaves


On 23.04.20 23:18, Paolo Bonzini wrote:
>
>
> On 23/04/20 22:56, Alexander Graf wrote:
>>>
>>> CPL3 is how the user application run, but does the enclave's Linux boot
>>> process start in real mode at the reset vector (0xfffffff0), in 16-bit
>>> protected mode at the Linux bzImage entry point, or at the ELF entry
>>> point?
>>
>> There is no "entry point" per se. You prepopulate at target bzImage into
>> the enclave memory on boot which then follows the standard boot
>> protocol. Everything
>
> There's still a "where" missing in that sentence. :) I assume you put
> it at 0x10000 (and so the entry point at 0x10200)? That should be
> documented because that is absolutely not what the KVM API looks like.

Yes, that part is not documented in the patch set, correct. I would
personally just make an example user space binary the documentation for
now. Later we will publish a proper device specification outside of the
Linux ecosystem which will describe the register layout and image
loading semantics verbatim, so that other OSs can implement the
driver too.

To answer the question though, the target file is in a newly invented
file format called "EIF" and it needs to be loaded at offset 0x800000 of
the address space donated to the enclave.

>
>> before that (enclave firmware, etc.) is provided by
>> the enclave environment.
>>
>> Think of it like a mechanism to launch a second QEMU instance on the
>> host, but all you can actually control are the -smp, -m, -kernel and
>> -initrd parameters.
>
> Are there requirements on how to populate the memory to ensure that the
> host firmware doesn't crash and burn? E.g. some free memory right below
> 4GiB (for the firmware, the LAPIC/IOAPIC or any other special MMIO
> devices you have, PCI BARs, and the like)?

No, the target memory layout is currently disconnected from the memory
layout defined through the KVM_SET_USER_MEMORY_REGION ioctl. While we do
check that guest_phys_addr is contiguous, the underlying device API does
not have any notion of a "guest address" - all it gets is a
scatter-gather sliced bucket of memory.
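
As a hedged illustration of that contiguity check (example values only, not
code from the series; whether plain anonymous mappings are acceptable
backing for enclave memory is also an assumption here): two slots whose
guest_phys_addr ranges follow each other, while the backing user space
buffers can live anywhere.

#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/kvm.h>

static void add_two_contiguous_regions(int enclave_fd)
{
	size_t size = 2UL * 1024 * 1024;
	void *chunk0 = mmap(NULL, size, PROT_READ | PROT_WRITE,
			    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	void *chunk1 = mmap(NULL, size, PROT_READ | PROT_WRITE,
			    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	struct kvm_userspace_memory_region region0 = {
		.slot            = 0,
		.guest_phys_addr = 0x0,
		.memory_size     = size,
		.userspace_addr  = (unsigned long)chunk0,
	};
	struct kvm_userspace_memory_region region1 = {
		.slot            = 1,
		/* Contiguous with region0 in guest physical address space. */
		.guest_phys_addr = region0.guest_phys_addr + region0.memory_size,
		.memory_size     = size,
		/* The user space buffers themselves need not be contiguous. */
		.userspace_addr  = (unsigned long)chunk1,
	};

	ioctl(enclave_fd, KVM_SET_USER_MEMORY_REGION, &region0);
	ioctl(enclave_fd, KVM_SET_USER_MEMORY_REGION, &region1);
}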

>> The only I/O channel you have between your VM and
>> that new VM is a vsock channel which is configured by the host on your
>> behalf.
>
> Is this virtio-mmio or virtio-pci, and what other emulated devices are
> there and how do you discover them? Are there any ISA devices
> (RTC/PIC/PIT), and are there SMBIOS/RSDP/MP tables in the F segment?

It is virtio-mmio for the enclave and virtio-pci for the parent. The
enclave is a microvm.

For more details on the enclave device topology, we'll have to wait for
the public documentation that describes the enclave view of the world
though. I don't think that one's public quite yet. This patch set is
about the parent's view.


Alex





2020-04-24 14:06:42

by Paraschiv, Andra-Irina

[permalink] [raw]
Subject: Re: [PATCH v1 00/15] Add support for Nitro Enclaves



On 24/04/2020 12:59, Tian, Kevin wrote:
>
>> From: Paraschiv, Andra-Irina
>> Sent: Thursday, April 23, 2020 9:20 PM
>>
>> On 22/04/2020 00:46, Paolo Bonzini wrote:
>>> On 21/04/20 20:41, Andra Paraschiv wrote:
>>>> An enclave communicates with the primary VM via a local communication
>> channel,
>>>> using virtio-vsock [2]. An enclave does not have a disk or a network device
>>>> attached.
>>> Is it possible to have a sample of this in the samples/ directory?
>> I can add in v2 a sample file including the basic flow of how to use the
>> ioctl interface to create / terminate an enclave.
>>
>> Then we can update / build on top it based on the ongoing discussions on
>> the patch series and the received feedback.
>>
>>> I am interested especially in:
>>>
>>> - the initial CPU state: CPL0 vs. CPL3, initial program counter, etc.
>>>
>>> - the communication channel; does the enclave see the usual local APIC
>>> and IOAPIC interfaces in order to get interrupts from virtio-vsock, and
>>> where is the virtio-vsock device (virtio-mmio I suppose) placed in memory?
>>>
>>> - what the enclave is allowed to do: can it change privilege levels,
>>> what happens if the enclave performs an access to nonexistent memory,
>> etc.
>>> - whether there are special hypercall interfaces for the enclave
>> An enclave is a VM, running on the same host as the primary VM, that
>> launched the enclave. They are siblings.
>>
>> Here we need to think of two components:
>>
>> 1. An enclave abstraction process - a process running in the primary VM
>> guest, that uses the provided ioctl interface of the Nitro Enclaves
>> kernel driver to spawn an enclave VM (that's 2 below).
>>
>> How does all gets to an enclave VM running on the host?
>>
>> There is a Nitro Enclaves emulated PCI device exposed to the primary VM.
>> The driver for this new PCI device is included in the current patch series.
>>
>> The ioctl logic is mapped to PCI device commands e.g. the
>> NE_ENCLAVE_START ioctl maps to an enclave start PCI command or the
>> KVM_SET_USER_MEMORY_REGION maps to an add memory PCI command.
>> The PCI
>> device commands are then translated into actions taken on the hypervisor
>> side; that's the Nitro hypervisor running on the host where the primary
>> VM is running.
>>
>> 2. The enclave itself - a VM running on the same host as the primary VM
>> that spawned it.
>>
>> The enclave VM has no persistent storage or network interface attached,
>> it uses its own memory and CPUs + its virtio-vsock emulated device for
>> communication with the primary VM.
> sounds like a firecracker VM?

It's a VM crafted for enclave needs.

>
>> The memory and CPUs are carved out of the primary VM, they are dedicated
>> for the enclave. The Nitro hypervisor running on the host ensures memory
>> and CPU isolation between the primary VM and the enclave VM.
> In last paragraph, you said that the enclave VM uses its own memory and
> CPUs. Then here, you said the memory/CPUs are carved out and dedicated
> from the primary VM. Can you elaborate which one is accurate? or a mixed
> model?

Memory and CPUs are carved out of the primary VM and are dedicated to
the enclave VM. I said "its own" above in the sense that the
primary VM doesn't use these carved out resources while the enclave is
running, as they are dedicated to the enclave.

Hope that makes it clearer now.

>
>>
>> These two components need to reflect the same state e.g. when the
>> enclave abstraction process (1) is terminated, the enclave VM (2) is
>> terminated as well.
>>
>> With regard to the communication channel, the primary VM has its own
>> emulated virtio-vsock PCI device. The enclave VM has its own emulated
>> virtio-vsock device as well. This channel is used, for example, to fetch
>> data in the enclave and then process it. An application that sets up the
>> vsock socket and connects or listens, depending on the use case, is then
>> developed to use this channel; this happens on both ends - primary VM
>> and enclave VM.
> How does the application in the primary VM assign task to be executed
> in the enclave VM? I didn't see such command in this series, so suppose
> it is also communicated through virtio-vsock?

The application that runs in the enclave needs to be packaged in an
enclave image together with the OS (e.g. kernel, ramdisk, init) that
will run in the enclave VM.

Then the enclave image is loaded in memory. After booting is finished,
the application starts. Now, depending on the app implementation and use
case, one example is that the app in the enclave waits for data to
be fetched in via the vsock channel.
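
For illustration, a minimal sketch of an enclave-side vsock listener along
these lines (the port number is an arbitrary example, error handling is
omitted, and this is not sample code from the series):

#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

int main(void)
{
	char buf[4096];
	int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

	/* Listen on an example port; the app in the primary VM connects to the
	 * enclave's CID and this port to push data in.
	 */
	struct sockaddr_vm addr = {
		.svm_family = AF_VSOCK,
		.svm_cid    = VMADDR_CID_ANY,
		.svm_port   = 9000,
	};
	bind(fd, (struct sockaddr *)&addr, sizeof(addr));
	listen(fd, 1);

	int conn = accept(fd, NULL, NULL);
	ssize_t n = read(conn, buf, sizeof(buf));
	printf("received %zd bytes\n", n);

	close(conn);
	close(fd);
	return 0;
}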

Thanks,
Andra

>
>> Let me know if further clarifications are needed.
>>
>>>> The proposed solution is following the KVM model and uses the KVM API
>> to be able
>>>> to create and set resources for enclaves. An additional ioctl command,
>> besides
>>>> the ones provided by KVM, is used to start an enclave and setup the
>> addressing
>>>> for the communication channel and an enclave unique id.
>>> Reusing some KVM ioctls is definitely a good idea, but I wouldn't really
>>> say it's the KVM API since the VCPU file descriptor is basically non
>>> functional (without KVM_RUN and mmap it's not really the KVM API).
>> It uses part of the KVM API or a set of KVM ioctls to model the way a VM
>> is created / terminated. That's true, KVM_RUN and mmap-ing the vcpu fd
>> are not included.
>>
>> Thanks for the feedback regarding the reuse of KVM ioctls.
>>
>> Andra
>>
> Thanks
> Kevin





2020-04-24 15:16:17

by Paraschiv, Andra-Irina

[permalink] [raw]
Subject: Re: [PATCH v1 02/15] nitro_enclaves: Define the PCI device interface



On 23/04/2020 16:37, Paraschiv, Andra-Irina wrote:
>
>
> On 22/04/2020 00:22, Paolo Bonzini wrote:
>> On 21/04/20 20:41, Andra Paraschiv wrote:
>>> The Nitro Enclaves (NE) driver communicates with a new PCI device, that
>>> is exposed to a virtual machine (VM) and handles commands meant for
>>> handling enclaves lifetime e.g. creation, termination, setting memory
>>> regions. The communication with the PCI device is handled using a MMIO
>>> space and MSI-X interrupts.
>>>
>>> This device communicates with the hypervisor on the host, where the VM
>>> that spawned the enclave itself run, e.g. to launch a VM that is used
>>> for the enclave.
>>>
>>> Define the MMIO space of the PCI device, the commands that are
>>> provided by this device. Add an internal data structure used as private
>>> data for the PCI device driver and the functions for the PCI device
>>> init
>>> / uninit and command requests handling.
>>>
>>> Signed-off-by: Alexandru-Catalin Vasile <[email protected]>
>>> Signed-off-by: Alexandru Ciobotaru <[email protected]>
>>> Signed-off-by: Andra Paraschiv <[email protected]>
>>> ---
>>>   .../virt/amazon/nitro_enclaves/ne_pci_dev.h   | 266
>>> ++++++++++++++++++
>>>   1 file changed, 266 insertions(+)
>>>   create mode 100644 drivers/virt/amazon/nitro_enclaves/ne_pci_dev.h
>> Can this be placed just in drivers/virt/nitro_enclaves, or
>> drivers/virt/enclave/nitro?  It's not unlikely that this device be
>> implemented outside EC2 sooner or later, and there's nothing
>> Amazon-specific as far as I can see from the UAPI.
>
> I can update the path to drivers/virt/nitro_enclaves.
>
> The PCI device in the patch series is registered under Amazon PCI
> Vendor ID and it has this PCI Device ID - 0xe4c1.

v2 now includes the updated path - drivers/virt/nitro_enclaves.

Thanks,
Andra





2020-04-24 15:31:37

by Paraschiv, Andra-Irina

[permalink] [raw]
Subject: Re: [PATCH v1 14/15] nitro_enclaves: Add Makefile for the Nitro Enclaves driver



On 23/04/2020 11:43, kbuild test robot wrote:
>
> Hi Andra,
>
> Thank you for the patch! Perhaps something to improve:

Fixed in v2.

Andra

>
> [auto build test WARNING on linus/master]
> [also build test WARNING on linux/master v5.7-rc2 next-20200422]
> [if your patch is applied to the wrong git tree, please drop us a note to help
> improve the system. BTW, we also suggest to use '--base' option to specify the
> base tree in git format-patch, please see https://stackoverflow.com/a/37406982]
>
> url: https://github.com/0day-ci/linux/commits/Andra-Paraschiv/Add-support-for-Nitro-Enclaves/20200423-130814
> base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 18bf34080c4c3beb6699181986cc97dd712498fe
> config: i386-allmodconfig (attached as .config)
> compiler: gcc-7 (Ubuntu 7.5.0-6ubuntu2) 7.5.0
> reproduce:
> # save the attached .config to linux build tree
> make ARCH=i386
>
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kbuild test robot <[email protected]>
>
> All warnings (new ones prefixed by >>):
>
> In file included from include/linux/device.h:15:0,
> from drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c:22:
> drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c: In function 'ne_submit_request':
>>> drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c:80:9: warning: format '%ld' expects argument of type 'long int', but argument 3 has type 'size_t {aka unsigned int}' [-Wformat=]
> "Invalid req size=%ld for cmd type=%d\n",
> ^
> include/linux/dev_printk.h:19:22: note: in definition of macro 'dev_fmt'
> #define dev_fmt(fmt) fmt
> ^~~
>>> include/linux/dev_printk.h:167:3: note: in expansion of macro 'dev_err'
> dev_level(dev, fmt, ##__VA_ARGS__); \
> ^~~~~~~~~
> include/linux/dev_printk.h:177:2: note: in expansion of macro 'dev_level_ratelimited'
> dev_level_ratelimited(dev_err, dev, fmt, ##__VA_ARGS__)
> ^~~~~~~~~~~~~~~~~~~~~
>>> drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c:79:3: note: in expansion of macro 'dev_err_ratelimited'
> dev_err_ratelimited(&pdev->dev,
> ^~~~~~~~~~~~~~~~~~~
> drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c: In function 'ne_retrieve_reply':
> drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c:121:35: warning: format '%ld' expects argument of type 'long int', but argument 3 has type 'size_t {aka unsigned int}' [-Wformat=]
> dev_err_ratelimited(&pdev->dev, "Invalid reply size=%ld\n",
> ^
> include/linux/dev_printk.h:19:22: note: in definition of macro 'dev_fmt'
> #define dev_fmt(fmt) fmt
> ^~~
>>> include/linux/dev_printk.h:167:3: note: in expansion of macro 'dev_err'
> dev_level(dev, fmt, ##__VA_ARGS__); \
> ^~~~~~~~~
> include/linux/dev_printk.h:177:2: note: in expansion of macro 'dev_level_ratelimited'
> dev_level_ratelimited(dev_err, dev, fmt, ##__VA_ARGS__)
> ^~~~~~~~~~~~~~~~~~~~~
> drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c:121:3: note: in expansion of macro 'dev_err_ratelimited'
> dev_err_ratelimited(&pdev->dev, "Invalid reply size=%ld\n",
> ^~~~~~~~~~~~~~~~~~~
> drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c: In function 'ne_do_request':
> drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c:193:9: warning: format '%ld' expects argument of type 'long int', but argument 3 has type 'size_t {aka unsigned int}' [-Wformat=]
> "Invalid req size=%ld for cmd type=%d\n",
> ^
> include/linux/dev_printk.h:19:22: note: in definition of macro 'dev_fmt'
> #define dev_fmt(fmt) fmt
> ^~~
>>> include/linux/dev_printk.h:167:3: note: in expansion of macro 'dev_err'
> dev_level(dev, fmt, ##__VA_ARGS__); \
> ^~~~~~~~~
> include/linux/dev_printk.h:177:2: note: in expansion of macro 'dev_level_ratelimited'
> dev_level_ratelimited(dev_err, dev, fmt, ##__VA_ARGS__)
> ^~~~~~~~~~~~~~~~~~~~~
> drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c:192:3: note: in expansion of macro 'dev_err_ratelimited'
> dev_err_ratelimited(&pdev->dev,
> ^~~~~~~~~~~~~~~~~~~
> drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c:203:35: warning: format '%ld' expects argument of type 'long int', but argument 3 has type 'size_t {aka unsigned int}' [-Wformat=]
> dev_err_ratelimited(&pdev->dev, "Invalid reply size=%ld\n",
> ^
> include/linux/dev_printk.h:19:22: note: in definition of macro 'dev_fmt'
> #define dev_fmt(fmt) fmt
> ^~~
>>> include/linux/dev_printk.h:167:3: note: in expansion of macro 'dev_err'
> dev_level(dev, fmt, ##__VA_ARGS__); \
> ^~~~~~~~~
> include/linux/dev_printk.h:177:2: note: in expansion of macro 'dev_level_ratelimited'
> dev_level_ratelimited(dev_err, dev, fmt, ##__VA_ARGS__)
> ^~~~~~~~~~~~~~~~~~~~~
> drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c:203:3: note: in expansion of macro 'dev_err_ratelimited'
> dev_err_ratelimited(&pdev->dev, "Invalid reply size=%ld\n",
> ^~~~~~~~~~~~~~~~~~~
>
> vim +80 drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c
>
> 0ed609272739ee Andra Paraschiv 2020-04-21 42
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 43 /**
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 44 * ne_submit_request - Submit command request to the PCI device based on the
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 45 * command type.
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 46 *
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 47 * This function gets called with the ne_pci_dev mutex held.
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 48 *
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 49 * @pdev: PCI device to send the command to.
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 50 * @cmd_type: command type of the request sent to the PCI device.
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 51 * @cmd_request: command request payload.
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 52 * @cmd_request_size: size of the command request payload.
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 53 *
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 54 * @returns: 0 on success, negative return value on failure.
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 55 */
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 56 static int ne_submit_request(struct pci_dev *pdev,
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 57 enum ne_pci_dev_cmd_type cmd_type,
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 58 void *cmd_request, size_t cmd_request_size)
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 59 {
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 60 struct ne_pci_dev *ne_pci_dev = NULL;
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 61
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 62 BUG_ON(!pdev);
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 63
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 64 ne_pci_dev = pci_get_drvdata(pdev);
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 65 BUG_ON(!ne_pci_dev);
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 66 BUG_ON(!ne_pci_dev->iomem_base);
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 67
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 68 if (WARN_ON(cmd_type <= INVALID_CMD || cmd_type >= MAX_CMD)) {
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 69 dev_err_ratelimited(&pdev->dev, "Invalid cmd type=%d\n",
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 70 cmd_type);
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 71
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 72 return -EINVAL;
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 73 }
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 74
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 75 if (WARN_ON(!cmd_request))
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 76 return -EINVAL;
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 77
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 78 if (WARN_ON(cmd_request_size > NE_SEND_DATA_SIZE)) {
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 @79 dev_err_ratelimited(&pdev->dev,
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 @80 "Invalid req size=%ld for cmd type=%d\n",
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 81 cmd_request_size, cmd_type);
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 82
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 83 return -EINVAL;
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 84 }
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 85
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 86 memcpy_toio(ne_pci_dev->iomem_base + NE_SEND_DATA, cmd_request,
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 87 cmd_request_size);
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 88
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 89 iowrite32(cmd_type, ne_pci_dev->iomem_base + NE_COMMAND);
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 90
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 91 return 0;
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 92 }
> 08a5a524ab0b6c Andra Paraschiv 2020-04-21 93
>
> :::::: The code at line 80 was first introduced by commit
> :::::: 08a5a524ab0b6c939997c8d44b4d07e5ee97e91d nitro_enclaves: Handle PCI device command requests
>
> :::::: TO: Andra Paraschiv <[email protected]>
> :::::: CC: 0day robot <[email protected]>
>
> ---
> 0-DAY CI Kernel Test Service, Intel Corporation
> https://lists.01.org/hyperkitty/list/[email protected]





2020-04-24 16:30:08

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v1 00/15] Add support for Nitro Enclaves

On 24/04/20 14:56, Alexander Graf wrote:
> Yes, that part is not documented in the patch set, correct. I would
> personally just make an example user space binary the documentation for
> now. Later we will publish a proper device specification outside of the
> Linux ecosystem which will describe the register layout and image
> loading semantics in verbatim, so that other OSs can implement the
> driver too.

But this is not part of the device specification; it's part of the child
enclave view. And in my opinion, understanding the way the child
enclave is programmed is very important for deciding whether Linux should
support this new device at all.

> To answer the question though, the target file is in a newly invented
> file format called "EIF" and it needs to be loaded at offset 0x800000 of
> the address space donated to the enclave.

What is this EIF?

* a new Linux kernel format? If so, are there patches in flight to
compile Linux in this new format (and I would be surprised if they were
accepted, since we already have PVH as a standard way to boot
uncompressed Linux kernels)?

* a userspace binary (the CPL3 that Andra was referring to)? In that
case what is the rationale to prefer it over a statically linked ELF binary?

* something completely different like WebAssembly?

Again, I cannot provide a sensible review without an explanation of how to
use all this. I understand that Amazon needs to do part of the design
behind closed doors, but this seems to have resulted in issues that
remind me of Intel's SGX misadventures. If Amazon has designed NE in a
way that is incompatible with open standards, it's up to Amazon to fix
it for the patches to be accepted. I'm very saddened to have to say
this, because I do love the idea.

Thanks,

Paolo

2020-04-24 17:04:22

by Paraschiv, Andra-Irina

[permalink] [raw]
Subject: Re: [PATCH v1 14/15] nitro_enclaves: Add Makefile for the Nitro Enclaves driver



On 23/04/2020 11:12, kbuild test robot wrote:
>
> Hi Andra,
>
> Thank you for the patch! Yet something to improve:

From what I see, this was triggered by using the uapi KVM header, which
includes asm/kvm.h and this is not present for the parisc arch.
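
One way to avoid this class of build failure (a hypothetical sketch only; the
actual Kconfig symbol and dependency list in the series may differ) would be
to restrict the driver to architectures that provide asm/kvm.h, e.g.:

	config NITRO_ENCLAVES
		tristate "Nitro Enclaves Support"
		depends on X86 && PCI
		help
		  The driver consumes the uapi KVM header, so it can only
		  be built on architectures that ship asm/kvm.h.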

Andra

>
> [auto build test ERROR on linus/master]
> [also build test ERROR on linux/master v5.7-rc2 next-20200422]
> [if your patch is applied to the wrong git tree, please drop us a note to help
> improve the system. BTW, we also suggest to use '--base' option to specify the
> base tree in git format-patch, please see https://stackoverflow.com/a/37406982]
>
> url: https://github.com/0day-ci/linux/commits/Andra-Paraschiv/Add-support-for-Nitro-Enclaves/20200423-130814
> base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 18bf34080c4c3beb6699181986cc97dd712498fe
> config: parisc-allmodconfig (attached as .config)
> compiler: hppa-linux-gcc (GCC) 9.3.0
> reproduce:
> wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # save the attached .config to linux build tree
> COMPILER_INSTALL_PATH=$HOME/0day GCC_VERSION=9.3.0 make.cross ARCH=parisc
>
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kbuild test robot <[email protected]>
>
> All errors (new ones prefixed by >>):
>
> In file included from include/uapi/linux/nitro_enclaves.h:21,
> from include/linux/nitro_enclaves.h:21,
> from drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c:26:
>>> include/uapi/linux/kvm.h:14:10: fatal error: asm/kvm.h: No such file or directory
> 14 | #include <asm/kvm.h>
> | ^~~~~~~~~~~
> compilation terminated.
>
> vim +14 include/uapi/linux/kvm.h
>
> 6aa8b732ca01c3 include/linux/kvm.h Avi Kivity 2006-12-10 4
> 6aa8b732ca01c3 include/linux/kvm.h Avi Kivity 2006-12-10 5 /*
> 6aa8b732ca01c3 include/linux/kvm.h Avi Kivity 2006-12-10 6 * Userspace interface for /dev/kvm - kernel based virtual machine
> 6aa8b732ca01c3 include/linux/kvm.h Avi Kivity 2006-12-10 7 *
> dea8caee7b6971 include/linux/kvm.h Rusty Russell 2007-07-17 8 * Note: you must update KVM_API_VERSION if you change this interface.
> 6aa8b732ca01c3 include/linux/kvm.h Avi Kivity 2006-12-10 9 */
> 6aa8b732ca01c3 include/linux/kvm.h Avi Kivity 2006-12-10 10
> 00bfddaf7f68a6 include/linux/kvm.h Jaswinder Singh Rajput 2009-01-15 11 #include <linux/types.h>
> 97646202bc3f19 include/linux/kvm.h Christian Borntraeger 2008-03-12 12 #include <linux/compiler.h>
> 6aa8b732ca01c3 include/linux/kvm.h Avi Kivity 2006-12-10 13 #include <linux/ioctl.h>
> f6a40e3bdf5fe0 include/linux/kvm.h Jerone Young 2007-11-19 @14 #include <asm/kvm.h>
> 6aa8b732ca01c3 include/linux/kvm.h Avi Kivity 2006-12-10 15
>
> :::::: The code at line 14 was first introduced by commit
> :::::: f6a40e3bdf5fe0a7d7d7f2dbc5b10158fbdad968 KVM: Portability: Move kvm_memory_alias to asm/kvm.h
>
> :::::: TO: Jerone Young <[email protected]>
> :::::: CC: Avi Kivity <[email protected]>
>
> ---
> 0-DAY CI Kernel Test Service, Intel Corporation
> https://lists.01.org/hyperkitty/list/[email protected]





2020-04-24 19:13:48

by Alexander Graf

[permalink] [raw]
Subject: Re: [PATCH v1 00/15] Add support for Nitro Enclaves


On 24.04.20 18:27, Paolo Bonzini wrote:
>
> On 24/04/20 14:56, Alexander Graf wrote:
>> Yes, that part is not documented in the patch set, correct. I would
>> personally just make an example user space binary the documentation for
>> now. Later we will publish a proper device specification outside of the
>> Linux ecosystem which will describe the register layout and image
>> loading semantics in verbatim, so that other OSs can implement the
>> driver too.
>
> But this is not part of the device specification, it's part of the child
> enclave view. And in my opinion, understanding the way the child
> enclave is programmed is very important to understand if Linux should at
> all support this new device.

Oh, absolutely. All of the "how do I load an enclave image, run it and
interact with it" bits need to be explained.

What I was saying above is that maybe code is easier to transfer than a
.txt file that gets lost somewhere in the Documentation directory :).

I'm more than happy to hear of other suggestions though.

>
>> To answer the question though, the target file is in a newly invented
>> file format called "EIF" and it needs to be loaded at offset 0x800000 of
>> the address space donated to the enclave.
>
> What is this EIF?

It's just a very dumb container format that has a trivial header, a
section with the bzImage and one to many sections of initramfs.

As mentioned earlier in this thread, it really is just "-kernel" and
"-initrd", packed into a single binary for transmission to the host.

>
> * a new Linux kernel format? If so, are there patches in flight to
> compile Linux in this new format (and I would be surprised if they were
> accepted, since we already have PVH as a standard way to boot
> uncompressed Linux kernels)?
>
> * a userspace binary (the CPL3 that Andra was referring to)? In that
> case what is the rationale to prefer it over a statically linked ELF binary?
>
> * something completely different like WebAssembly?
>
> Again, I cannot provide a sensible review without explaining how to use
> all this. I understand that Amazon needs to do part of the design
> behind closed doors, but this seems to have resulted in issues that
> remind me of Intel's SGX misadventures. If Amazon has designed NE in a
> way that is incompatible with open standards, it's up to Amazon to fix

Oh, if there's anything that conflicts with open standards here, I would
love to hear it immediately. I do not believe in security by obscurity :).


Alex





2020-04-25 14:28:01

by Liran Alon

[permalink] [raw]
Subject: Re: [PATCH v1 04/15] nitro_enclaves: Init PCI device driver


On 21/04/2020 21:41, Andra Paraschiv wrote:
> +
> +/**
> + * ne_setup_msix - Setup MSI-X vectors for the PCI device.
> + *
> + * @pdev: PCI device to setup the MSI-X for.
> + * @ne_pci_dev: PCI device private data structure.
> + *
> + * @returns: 0 on success, negative return value on failure.
> + */
> +static int ne_setup_msix(struct pci_dev *pdev, struct ne_pci_dev *ne_pci_dev)
> +{
> + int nr_vecs = 0;
> + int rc = -EINVAL;
> +
> + BUG_ON(!ne_pci_dev);
This kind of defensive programming does not align with Linux coding
convention.
I think these BUG_ON() conditions should be removed.
> +
> + nr_vecs = pci_msix_vec_count(pdev);
> + if (nr_vecs < 0) {
> + rc = nr_vecs;
> +
> + dev_err_ratelimited(&pdev->dev,
> + "Failure in getting vec count [rc=%d]\n",
> + rc);
> +
> + return rc;
> + }
> +
> + rc = pci_alloc_irq_vectors(pdev, nr_vecs, nr_vecs, PCI_IRQ_MSIX);
> + if (rc < 0) {
> + dev_err_ratelimited(&pdev->dev,
> + "Failure in alloc MSI-X vecs [rc=%d]\n",
> + rc);
> +
> + goto err_alloc_irq_vecs;
You should just replace this with "return rc;" as no cleanup is required
here.
> + }
> +
> + return 0;
> +
> +err_alloc_irq_vecs:
> + return rc;
> +}
> +
> +/**
> + * ne_pci_dev_enable - Select PCI device version and enable it.
> + *
> + * @pdev: PCI device to select version for and then enable.
> + * @ne_pci_dev: PCI device private data structure.
> + *
> + * @returns: 0 on success, negative return value on failure.
> + */
> +static int ne_pci_dev_enable(struct pci_dev *pdev,
> + struct ne_pci_dev *ne_pci_dev)
> +{
> + u8 dev_enable_reply = 0;
> + u16 dev_version_reply = 0;
> +
> + BUG_ON(!pdev);
> + BUG_ON(!ne_pci_dev);
> + BUG_ON(!ne_pci_dev->iomem_base);
Same.
> +
> + iowrite16(NE_VERSION_MAX, ne_pci_dev->iomem_base + NE_VERSION);
> +
> + dev_version_reply = ioread16(ne_pci_dev->iomem_base + NE_VERSION);
> + if (dev_version_reply != NE_VERSION_MAX) {
> + dev_err_ratelimited(&pdev->dev,
> + "Failure in pci dev version cmd\n");
> +
> + return -EIO;
> + }
> +
> + iowrite8(NE_ENABLE_ON, ne_pci_dev->iomem_base + NE_ENABLE);
> +
> + dev_enable_reply = ioread8(ne_pci_dev->iomem_base + NE_ENABLE);
> + if (dev_enable_reply != NE_ENABLE_ON) {
> + dev_err_ratelimited(&pdev->dev,
> + "Failure in pci dev enable cmd\n");
> +
> + return -EIO;
> + }
> +
> + return 0;
> +}
> +
> +/**
> + * ne_pci_dev_disable - Disable PCI device.
> + *
> + * @pdev: PCI device to disable.
> + * @ne_pci_dev: PCI device private data structure.
> + *
> + * @returns: 0 on success, negative return value on failure.
> + */
> +static int ne_pci_dev_disable(struct pci_dev *pdev,
> + struct ne_pci_dev *ne_pci_dev)
> +{
> + u8 dev_disable_reply = 0;
> +
> + BUG_ON(!pdev);
> + BUG_ON(!ne_pci_dev);
> + BUG_ON(!ne_pci_dev->iomem_base);
Same.
> +
> + iowrite8(NE_ENABLE_OFF, ne_pci_dev->iomem_base + NE_ENABLE);
> +
> + /*
> + * TODO: Check for NE_ENABLE_OFF in a loop, to handle cases when the
> + * device state is not immediately set to disabled and going through a
> + * transitory state of disabling.
> + */
> + dev_disable_reply = ioread8(ne_pci_dev->iomem_base + NE_ENABLE);
> + if (dev_disable_reply != NE_ENABLE_OFF) {
> + dev_err_ratelimited(&pdev->dev,
> + "Failure in pci dev disable cmd\n");
> +
> + return -EIO;
> + }
> +
> + return 0;
> +}
> +
> +static int ne_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> +{
> + struct ne_pci_dev *ne_pci_dev = NULL;
> + int rc = -EINVAL;
Unnecessary variable initialization.
ne_pci_dev and rc are always initialized below before they are used.
> +
> + ne_pci_dev = kzalloc(sizeof(*ne_pci_dev), GFP_KERNEL);
> + if (!ne_pci_dev)
> + return -ENOMEM;
> +
> + rc = pci_enable_device(pdev);
> + if (rc < 0) {
> + dev_err_ratelimited(&pdev->dev,
> + "Failure in pci dev enable [rc=%d]\n", rc);
> +
Why is this dev_err_ratelimited() instead of dev_err()?
Same for the rest of error printing in this probe() method and other
places in this patch.
> + goto err_pci_enable_dev;
I find it confusing that the error labels are named after the failure case
in which they are used, instead of the unwinding action they perform
(i.e. undoing the previous successful operation that requires cleanup).
This doesn't seem to match the Linux kernel coding convention.
It also creates two unnecessary labels pointing to the same place in the
cleanup code.
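For illustration only (the label names here are made up, just to show the
shape of the usual convention):

	rc = pci_enable_device(pdev);
	if (rc < 0) {
		dev_err(&pdev->dev, "Failure in pci dev enable [rc=%d]\n", rc);
		goto free_ne_pci_dev;
	}

	rc = pci_request_regions_exclusive(pdev, "ne_pci_dev");
	if (rc < 0) {
		dev_err(&pdev->dev, "Failure in pci request regions [rc=%d]\n", rc);
		goto disable_pci_dev;
	}

	...

	return 0;

disable_pci_dev:
	pci_disable_device(pdev);
free_ne_pci_dev:
	kzfree(ne_pci_dev);

	return rc;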
> + }
> +
> + rc = pci_request_regions_exclusive(pdev, "ne_pci_dev");
> + if (rc < 0) {
> + dev_err_ratelimited(&pdev->dev,
> + "Failure in pci request regions [rc=%d]\n",
> + rc);
> +
> + goto err_req_regions;
> + }
> +
> + ne_pci_dev->iomem_base = pci_iomap(pdev, PCI_BAR_NE, 0);
> + if (!ne_pci_dev->iomem_base) {
> + rc = -ENOMEM;
> +
> + dev_err_ratelimited(&pdev->dev,
> + "Failure in pci bar mapping [rc=%d]\n", rc);
> +
> + goto err_iomap;
> + }
> +
> + rc = ne_setup_msix(pdev, ne_pci_dev);
> + if (rc < 0) {
> + dev_err_ratelimited(&pdev->dev,
> + "Failure in pci dev msix setup [rc=%d]\n",
> + rc);
> +
> + goto err_setup_msix;
> + }
> +
> + rc = ne_pci_dev_disable(pdev, ne_pci_dev);
> + if (rc < 0) {
> + dev_err_ratelimited(&pdev->dev,
> + "Failure in ne_pci_dev disable [rc=%d]\n",
> + rc);
> +
> + goto err_ne_pci_dev_disable;
> + }
It seems weird that we need to disable the device before enabling it in
the probe() method.
Why can't we just enable the device without disabling it first?
> +
> + rc = ne_pci_dev_enable(pdev, ne_pci_dev);
> + if (rc < 0) {
> + dev_err_ratelimited(&pdev->dev,
> + "Failure in ne_pci_dev enable [rc=%d]\n",
> + rc);
> +
> + goto err_ne_pci_dev_enable;
> + }
> +
> + atomic_set(&ne_pci_dev->cmd_reply_avail, 0);
> + init_waitqueue_head(&ne_pci_dev->cmd_reply_wait_q);
> + INIT_LIST_HEAD(&ne_pci_dev->enclaves_list);
> + mutex_init(&ne_pci_dev->enclaves_list_mutex);
> + mutex_init(&ne_pci_dev->pci_dev_mutex);
> +
> + pci_set_drvdata(pdev, ne_pci_dev);
If you called pci_set_drvdata() as one of the first operations in
ne_probe(), then you could avoid passing both the struct pci_dev and
struct ne_pci_dev parameters to
ne_setup_msix(), ne_pci_dev_enable() and ne_pci_dev_disable().
That would be a bit more elegant.
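For example, a rough, untested sketch of the shape:

	ne_pci_dev = kzalloc(sizeof(*ne_pci_dev), GFP_KERNEL);
	if (!ne_pci_dev)
		return -ENOMEM;

	pci_set_drvdata(pdev, ne_pci_dev);
	...
	rc = ne_setup_msix(pdev);

	/* And the helpers then take only the pci_dev. */
	static int ne_setup_msix(struct pci_dev *pdev)
	{
		struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
		...
	}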
> +
> + return 0;
> +
> +err_ne_pci_dev_enable:
> +err_ne_pci_dev_disable:
> + pci_free_irq_vectors(pdev);
> +err_setup_msix:
> + pci_iounmap(pdev, ne_pci_dev->iomem_base);
> +err_iomap:
> + pci_release_regions(pdev);
> +err_req_regions:
> + pci_disable_device(pdev);
> +err_pci_enable_dev:
> + kzfree(ne_pci_dev);
An empty new-line is appropriate here.
To separate the return statement from the cleanup logic.
> + return rc;
> +}
> +
> +static void ne_remove(struct pci_dev *pdev)
> +{
> + struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
> +
> + if (!ne_pci_dev || !ne_pci_dev->iomem_base)
> + return;
Why is this condition necessary?
The ne_remove() function should be called only in case ne_probe() succeeded.
In that case, both ne_pci_dev and ne_pci_dev->iomem_base should be non-NULL.
> +
> + ne_pci_dev_disable(pdev, ne_pci_dev);
> +
> + pci_set_drvdata(pdev, NULL);
> +
> + pci_free_irq_vectors(pdev);
> +
> + pci_iounmap(pdev, ne_pci_dev->iomem_base);
> +
> + kzfree(ne_pci_dev);
> +
> + pci_release_regions(pdev);
> +
> + pci_disable_device(pdev);
You should aspire to keep the order of operations in ne_remove() the
reverse of the order done in ne_probe().
That would also nicely match the order of operations in the ne_probe()
cleanup path.
i.e. The following order:

pci_set_drvdata();
ne_pci_dev_disable();
pci_free_irq_vectors();
pci_iounmap();
pci_release_regions();
pci_disable_device()
kzfree();

-Liran

2020-04-25 14:56:33

by Liran Alon

[permalink] [raw]
Subject: Re: [PATCH v1 05/15] nitro_enclaves: Handle PCI device command requests


On 21/04/2020 21:41, Andra Paraschiv wrote:
> The Nitro Enclaves PCI device exposes a MMIO space that this driver
> uses to submit command requests and to receive command replies e.g. for
> enclave creation / termination or setting enclave resources.
>
> Add logic for handling PCI device command requests based on the given
> command type.
>
> Register an MSI-X interrupt vector for command reply notifications to
> handle this type of communication events.
>
> Signed-off-by: Alexandru-Catalin Vasile <[email protected]>
> Signed-off-by: Andra Paraschiv <[email protected]>
> ---
> .../virt/amazon/nitro_enclaves/ne_pci_dev.c | 264 ++++++++++++++++++
> 1 file changed, 264 insertions(+)
>
> diff --git a/drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c b/drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c
> index 8fbee95ea291..7453d129689a 100644
> --- a/drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c
> +++ b/drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c
> @@ -40,6 +40,251 @@ static const struct pci_device_id ne_pci_ids[] = {
>
> MODULE_DEVICE_TABLE(pci, ne_pci_ids);
>
> +/**
> + * ne_submit_request - Submit command request to the PCI device based on the
> + * command type.
> + *
> + * This function gets called with the ne_pci_dev mutex held.
> + *
> + * @pdev: PCI device to send the command to.
> + * @cmd_type: command type of the request sent to the PCI device.
> + * @cmd_request: command request payload.
> + * @cmd_request_size: size of the command request payload.
> + *
> + * @returns: 0 on success, negative return value on failure.
> + */
> +static int ne_submit_request(struct pci_dev *pdev,
> + enum ne_pci_dev_cmd_type cmd_type,
> + void *cmd_request, size_t cmd_request_size)
> +{
> + struct ne_pci_dev *ne_pci_dev = NULL;
These local vars are unnecessarily initialized.
> +
> + BUG_ON(!pdev);
> +
> + ne_pci_dev = pci_get_drvdata(pdev);
> + BUG_ON(!ne_pci_dev);
> + BUG_ON(!ne_pci_dev->iomem_base);
You should remove these defensive BUG_ON() calls.
> +
> + if (WARN_ON(cmd_type <= INVALID_CMD || cmd_type >= MAX_CMD)) {
> + dev_err_ratelimited(&pdev->dev, "Invalid cmd type=%d\n",
> + cmd_type);
> +
> + return -EINVAL;
> + }
> +
> + if (WARN_ON(!cmd_request))
> + return -EINVAL;
> +
> + if (WARN_ON(cmd_request_size > NE_SEND_DATA_SIZE)) {
> + dev_err_ratelimited(&pdev->dev,
> + "Invalid req size=%ld for cmd type=%d\n",
> + cmd_request_size, cmd_type);
> +
> + return -EINVAL;
> + }
It doesn't make sense to have WARN_ON() print an error to dmesg every
time the condition evaluates to true,
while also using dev_err_ratelimited(), which attempts to rate-limit
prints.

Anyway, these conditions were already checked by ne_do_request(). Why
also check them here?
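If the checks are kept at all, a sketch of keeping only the rate-limited
print (illustrative only, not a prescription):

	if (cmd_request_size > NE_SEND_DATA_SIZE) {
		dev_err_ratelimited(&pdev->dev,
				    "Invalid req size=%zu for cmd type=%d\n",
				    cmd_request_size, cmd_type);

		return -EINVAL;
	}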

> +
> + memcpy_toio(ne_pci_dev->iomem_base + NE_SEND_DATA, cmd_request,
> + cmd_request_size);
> +
> + iowrite32(cmd_type, ne_pci_dev->iomem_base + NE_COMMAND);
> +
> + return 0;
> +}
> +
> +/**
> + * ne_retrieve_reply - Retrieve reply from the PCI device.
> + *
> + * This function gets called with the ne_pci_dev mutex held.
> + *
> + * @pdev: PCI device to receive the reply from.
> + * @cmd_reply: command reply payload.
> + * @cmd_reply_size: size of the command reply payload.
> + *
> + * @returns: 0 on success, negative return value on failure.
> + */
> +static int ne_retrieve_reply(struct pci_dev *pdev,
> + struct ne_pci_dev_cmd_reply *cmd_reply,
> + size_t cmd_reply_size)
> +{
> + struct ne_pci_dev *ne_pci_dev = NULL;
These local vars are unnecessarily initialized.
> +
> + BUG_ON(!pdev);
> +
> + ne_pci_dev = pci_get_drvdata(pdev);
> + BUG_ON(!ne_pci_dev);
> + BUG_ON(!ne_pci_dev->iomem_base);
You should remove these defensive BUG_ON() calls.
> +
> + if (WARN_ON(!cmd_reply))
> + return -EINVAL;
> +
> + if (WARN_ON(cmd_reply_size > NE_RECV_DATA_SIZE)) {
> + dev_err_ratelimited(&pdev->dev, "Invalid reply size=%ld\n",
> + cmd_reply_size);
> +
> + return -EINVAL;
> + }
It doesn't make sense to have WARN_ON() print an error to dmesg every
time the condition evaluates to true,
while also using dev_err_ratelimited(), which attempts to rate-limit
prints.

Anyway, these conditions were already checked by ne_do_request(). Why
also check them here?

> +
> + memcpy_fromio(cmd_reply, ne_pci_dev->iomem_base + NE_RECV_DATA,
> + cmd_reply_size);
> +
> + return 0;
> +}
> +
> +/**
> + * ne_wait_for_reply - Wait for a reply of a PCI command.
> + *
> + * This function gets called with the ne_pci_dev mutex held.
> + *
> + * @pdev: PCI device for which a reply is waited.
> + *
> + * @returns: 0 on success, negative return value on failure.
> + */
> +static int ne_wait_for_reply(struct pci_dev *pdev)
> +{
> + struct ne_pci_dev *ne_pci_dev = NULL;
> + int rc = -EINVAL;
These local vars are unnecessarily initialized.
> +
> + BUG_ON(!pdev);
> +
> + ne_pci_dev = pci_get_drvdata(pdev);
> + BUG_ON(!ne_pci_dev);
You should remove these defensive BUG_ON() calls.
> +
> + /*
> + * TODO: Update to _interruptible and handle interrupted wait event
> + * e.g. -ERESTARTSYS, incoming signals + add / update timeout.
> + */
> + rc = wait_event_timeout(ne_pci_dev->cmd_reply_wait_q,
> + atomic_read(&ne_pci_dev->cmd_reply_avail) != 0,
> + msecs_to_jiffies(DEFAULT_TIMEOUT_MSECS));
> + if (!rc) {
> + pr_err("Wait event timed out when waiting for PCI cmd reply\n");
> +
> + return -ETIMEDOUT;
> + }
> +
> + return 0;
> +}
> +
> +int ne_do_request(struct pci_dev *pdev, enum ne_pci_dev_cmd_type cmd_type,
> + void *cmd_request, size_t cmd_request_size,
> + struct ne_pci_dev_cmd_reply *cmd_reply, size_t cmd_reply_size)
This function is introduced in this patch but it is not used.
It will cause compiling the kernel on this commit to raise
warnings/errors about unused functions.
You should introduce functions in the patch where they are first used.
> +{
> + struct ne_pci_dev *ne_pci_dev = NULL;
> + int rc = -EINVAL;
These local vars are unnecessarily initialized.
> +
> + BUG_ON(!pdev);
> +
> + ne_pci_dev = pci_get_drvdata(pdev);
> + BUG_ON(!ne_pci_dev);
> + BUG_ON(!ne_pci_dev->iomem_base);
You should remove these defensive BUG_ON() calls.
> +
> + if (WARN_ON(cmd_type <= INVALID_CMD || cmd_type >= MAX_CMD)) {
> + dev_err_ratelimited(&pdev->dev, "Invalid cmd type=%d\n",
> + cmd_type);
> +
> + return -EINVAL;
> + }
> +
> + if (WARN_ON(!cmd_request))
> + return -EINVAL;
> +
> + if (WARN_ON(cmd_request_size > NE_SEND_DATA_SIZE)) {
> + dev_err_ratelimited(&pdev->dev,
> + "Invalid req size=%ld for cmd type=%d\n",
> + cmd_request_size, cmd_type);
> +
> + return -EINVAL;
> + }
> +
> + if (WARN_ON(!cmd_reply))
> + return -EINVAL;
> +
> + if (WARN_ON(cmd_reply_size > NE_RECV_DATA_SIZE)) {
> + dev_err_ratelimited(&pdev->dev, "Invalid reply size=%ld\n",
> + cmd_reply_size);
> +
> + return -EINVAL;
> + }
I would consider specifying all these conditions in function
documentation instead of enforcing them at runtime on every function call.
> +
> + /*
> + * Use this mutex so that the PCI device handles one command request at
> + * a time.
> + */
> + mutex_lock(&ne_pci_dev->pci_dev_mutex);
> +
> + atomic_set(&ne_pci_dev->cmd_reply_avail, 0);
> +
> + rc = ne_submit_request(pdev, cmd_type, cmd_request, cmd_request_size);
> + if (rc < 0) {
> + dev_err_ratelimited(&pdev->dev,
> + "Failure in submit cmd request [rc=%d]\n",
> + rc);
> +
> + mutex_unlock(&ne_pci_dev->pci_dev_mutex);
> +
> + return rc;
Consider leaving the function with a goto to a label that unlocks the
mutex and then returns.
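e.g. a sketch of the shape:

	rc = ne_submit_request(pdev, cmd_type, cmd_request, cmd_request_size);
	if (rc < 0) {
		dev_err_ratelimited(&pdev->dev,
				    "Failure in submit cmd request [rc=%d]\n",
				    rc);

		goto unlock;
	}

	...

unlock:
	mutex_unlock(&ne_pci_dev->pci_dev_mutex);

	return rc;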
> + }
> +
> + rc = ne_wait_for_reply(pdev);
> + if (rc < 0) {
> + dev_err_ratelimited(&pdev->dev,
> + "Failure in wait cmd reply [rc=%d]\n",
> + rc);
> +
> + mutex_unlock(&ne_pci_dev->pci_dev_mutex);
> +
> + return rc;
> + }
> +
> + rc = ne_retrieve_reply(pdev, cmd_reply, cmd_reply_size);
> + if (rc < 0) {
> + dev_err_ratelimited(&pdev->dev,
> + "Failure in retrieve cmd reply [rc=%d]\n",
> + rc);
> +
> + mutex_unlock(&ne_pci_dev->pci_dev_mutex);
> +
> + return rc;
> + }
> +
> + atomic_set(&ne_pci_dev->cmd_reply_avail, 0);
> +
> + if (cmd_reply->rc < 0) {
> + dev_err_ratelimited(&pdev->dev,
> + "Failure in cmd process logic [rc=%d]\n",
> + cmd_reply->rc);
> +
> + mutex_unlock(&ne_pci_dev->pci_dev_mutex);
> +
> + return cmd_reply->rc;
> + }
> +
> + mutex_unlock(&ne_pci_dev->pci_dev_mutex);
> +
> + return 0;
> +}
> +
> +/**
> + * ne_reply_handler - Interrupt handler for retrieving a reply matching
> + * a request sent to the PCI device for enclave lifetime management.
> + *
> + * @irq: received interrupt for a reply sent by the PCI device.
> + * @args: PCI device private data structure.
> + *
> + * @returns: IRQ_HANDLED on handled interrupt, IRQ_NONE otherwise.
> + */
> +static irqreturn_t ne_reply_handler(int irq, void *args)
> +{
> + struct ne_pci_dev *ne_pci_dev = (struct ne_pci_dev *)args;
> +
> + atomic_set(&ne_pci_dev->cmd_reply_avail, 1);
> +
> + /* TODO: Update to _interruptible. */
> + wake_up(&ne_pci_dev->cmd_reply_wait_q);
> +
> + return IRQ_HANDLED;
> +}
> +
> /**
> * ne_setup_msix - Setup MSI-X vectors for the PCI device.
> *
> @@ -75,8 +320,25 @@ static int ne_setup_msix(struct pci_dev *pdev, struct ne_pci_dev *ne_pci_dev)
> goto err_alloc_irq_vecs;
> }
>
> + /*
> + * This IRQ gets triggered every time the PCI device responds to a
> + * command request. The reply is then retrieved, reading from the MMIO
> + * space of the PCI device.
> + */
> + rc = request_irq(pci_irq_vector(pdev, NE_VEC_REPLY),
> + ne_reply_handler, 0, "enclave_cmd", ne_pci_dev);
> + if (rc < 0) {
> + dev_err_ratelimited(&pdev->dev,
> + "Failure in allocating irq reply [rc=%d]\n",
> + rc);
> +
> + goto err_req_irq_reply;
> + }
> +
> return 0;
>
> +err_req_irq_reply:
> + pci_free_irq_vectors(pdev);
> err_alloc_irq_vecs:
> return rc;
> }
> @@ -232,6 +494,7 @@ static int ne_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>
> err_ne_pci_dev_enable:
> err_ne_pci_dev_disable:
> + free_irq(pci_irq_vector(pdev, NE_VEC_REPLY), ne_pci_dev);
> pci_free_irq_vectors(pdev);
I suggest introducing a ne_teardown_msix() utility that is aimed at
cleaning up after ne_setup_msix().
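Roughly (sketch only):

	static void ne_teardown_msix(struct pci_dev *pdev,
				     struct ne_pci_dev *ne_pci_dev)
	{
		free_irq(pci_irq_vector(pdev, NE_VEC_REPLY), ne_pci_dev);

		pci_free_irq_vectors(pdev);
	}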
> err_setup_msix:
> pci_iounmap(pdev, ne_pci_dev->iomem_base);
> @@ -255,6 +518,7 @@ static void ne_remove(struct pci_dev *pdev)
>
> pci_set_drvdata(pdev, NULL);
>
> + free_irq(pci_irq_vector(pdev, NE_VEC_REPLY), ne_pci_dev);
> pci_free_irq_vectors(pdev);
>
> pci_iounmap(pdev, ne_pci_dev->iomem_base);

2020-04-25 15:27:37

by Liran Alon

[permalink] [raw]
Subject: Re: [PATCH v1 00/15] Add support for Nitro Enclaves


On 23/04/2020 16:19, Paraschiv, Andra-Irina wrote:
>
> The memory and CPUs are carved out of the primary VM, they are
> dedicated for the enclave. The Nitro hypervisor running on the host
> ensures memory and CPU isolation between the primary VM and the
> enclave VM.
I hope you properly take into consideration Hyper-Threading speculative
side-channel vulnerabilities here.
i.e. Cloud providers usually designate each CPU core to run only vCPUs
of a specific guest, to avoid sharing a single CPU core
between multiple guests.
To handle this properly, you need to use some kind of core-scheduling
mechanism (Such that each CPU core either runs only vCPUs of enclave or
only vCPUs of primary VM at any given point in time).

In addition, can you elaborate more on how the enclave memory is carved
out of the primary VM?
Does this involve performing a memory hot-unplug operation from primary
VM or just unmap enclave-assigned guest physical pages from primary VM's
SLAT (EPT/NPT) and map them now only in enclave's SLAT?

>
> Let me know if further clarifications are needed.
>
I don't quite understand why the Enclave VM needs to be provisioned / torn
down during the primary VM's runtime.

For example, an alternative could have been to just provision both
primary VM and Enclave VM on primary VM startup.
Then, wait for primary VM to setup a communication channel with Enclave
VM (E.g. via virtio-vsock).
Then, primary VM is free to request Enclave VM to perform various tasks
when required on the isolated environment.

Such setup will mimic a common Enclave setup. Such as Microsoft Windows
VBS EPT-based Enclaves (That all runs on VTL1). It is also similar to
TEEs running on ARM TrustZone.
i.e. In my alternative proposed solution, the Enclave VM is similar to
VTL1/TrustZone.
It will also avoid requiring introducing a new PCI device and driver.

-Liran


2020-04-25 16:07:52

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v1 00/15] Add support for Nitro Enclaves

On 24/04/20 21:11, Alexander Graf wrote:
> What I was saying above is that maybe code is easier to transfer than
> a .txt file that gets lost somewhere in the Documentation directory
> :).

whynotboth.jpg :D

>>> To answer the question though, the target file is in a newly invented
>>> file format called "EIF" and it needs to be loaded at offset 0x800000 of
>>> the address space donated to the enclave.
>>
>> What is this EIF?
>
> It's just a very dumb container format that has a trivial header, a
> section with the bzImage and one to many sections of initramfs.
>
> As mentioned earlier in this thread, it really is just "-kernel" and
> "-initrd", packed into a single binary for transmission to the host.

Okay, got it. So, correct me if this is wrong, the information that is
needed to boot the enclave is:

* the kernel, in bzImage format

* the initrd

* a consecutive amount of memory, to be mapped with
KVM_SET_USER_MEMORY_REGION

Off list, Alex and I discussed having a struct that points to kernel and
initrd off enclave memory, and having the driver build the EIF at the
appropriate point in enclave memory (the 8 MiB offset that you mentioned).

This however has two disadvantages:

1) having the kernel and initrd loaded by the parent VM in enclave
memory has the advantage that you save memory outside the enclave memory
for something that is only needed inside the enclave

2) it is less extensible (what if you want to use PVH in the future for
example) and puts in the driver policy that should be in userspace.


So why not just start running the enclave at 0xfffffff0 in real mode?
Yes everybody hates it, but that's what OSes are written against. In
the simplest example, the parent enclave can load bzImage and initrd at
0x10000 and place firmware tables (MPTable and DMI) somewhere at
0xf0000; the firmware would just be a few movs to segment registers
followed by a long jmp.

If you want to keep EIF, we measured in QEMU that there is no measurable
difference between loading the kernel in the host and doing it in the
guest, so Amazon could provide an EIF loader stub at 0xfffffff0 for
backwards compatibility.

>> Again, I cannot provide a sensible review without explaining how to use
>> all this.  I understand that Amazon needs to do part of the design
>> behind closed doors, but this seems to have resulted in issues that
>> remind me of Intel's SGX misadventures. If Amazon has designed NE in a
>> way that is incompatible with open standards, it's up to Amazon to fix
>
> Oh, if there's anything that conflicts with open standards here, I would
> love to hear it immediately. I do not believe in security by obscurity  :).

That's great to hear!

Paolo

2020-04-26 01:57:41

by Longpeng(Mike)

[permalink] [raw]
Subject: Re: [PATCH v1 00/15] Add support for Nitro Enclaves



On 2020/4/24 17:54, Paraschiv, Andra-Irina wrote:
>
>
> On 24/04/2020 11:19, Paraschiv, Andra-Irina wrote:
>>
>>
>> On 24/04/2020 06:04, Longpeng (Mike, Cloud Infrastructure Service Product
>> Dept.) wrote:
>>> On 2020/4/23 21:19, Paraschiv, Andra-Irina wrote:
>>>>
>>>> On 22/04/2020 00:46, Paolo Bonzini wrote:
>>>>> On 21/04/20 20:41, Andra Paraschiv wrote:
>>>>>> An enclave communicates with the primary VM via a local communication
>>>>>> channel,
>>>>>> using virtio-vsock [2]. An enclave does not have a disk or a network device
>>>>>> attached.
>>>>> Is it possible to have a sample of this in the samples/ directory?
>>>> I can add in v2 a sample file including the basic flow of how to use the ioctl
>>>> interface to create / terminate an enclave.
>>>>
>>>> Then we can update / build on top it based on the ongoing discussions on the
>>>> patch series and the received feedback.
>>>>
>>>>> I am interested especially in:
>>>>>
>>>>> - the initial CPU state: CPL0 vs. CPL3, initial program counter, etc.
>>>>>
>>>>> - the communication channel; does the enclave see the usual local APIC
>>>>> and IOAPIC interfaces in order to get interrupts from virtio-vsock, and
>>>>> where is the virtio-vsock device (virtio-mmio I suppose) placed in memory?
>>>>>
>>>>> - what the enclave is allowed to do: can it change privilege levels,
>>>>> what happens if the enclave performs an access to nonexistent memory, etc.
>>>>>
>>>>> - whether there are special hypercall interfaces for the enclave
>>>> An enclave is a VM, running on the same host as the primary VM, that launched
>>>> the enclave. They are siblings.
>>>>
>>>> Here we need to think of two components:
>>>>
>>>> 1. An enclave abstraction process - a process running in the primary VM guest,
>>>> that uses the provided ioctl interface of the Nitro Enclaves kernel driver to
>>>> spawn an enclave VM (that's 2 below).
>>>>
>>>> How does all gets to an enclave VM running on the host?
>>>>
>>>> There is a Nitro Enclaves emulated PCI device exposed to the primary VM. The
>>>> driver for this new PCI device is included in the current patch series.
>>>>
>>> Hi Paraschiv,
>>>
>>> The new PCI device is emulated in QEMU ? If so, is there any plan to send the
>>> QEMU code ?
>>
>> Hi,
>>
>> Nope, not that I know of so far.
>
> And just to be a bit more clear, the reply above takes into consideration that
> it's not emulated in QEMU.
>

Thanks.

Guys in this thread are much more interested in the design of the enclave VM
and the new device, but there's no document about this device yet, so I think
the emulation code would be a good alternative. However, Alex said the device
specification will be published later, so I'll wait for it.

>
> Thanks,
> Andra
>
>>
>>>
>>>> The ioctl logic is mapped to PCI device commands e.g. the NE_ENCLAVE_START
>>>> ioctl
>>>> maps to an enclave start PCI command or the KVM_SET_USER_MEMORY_REGION maps to
>>>> an add memory PCI command. The PCI device commands are then translated into
>>>> actions taken on the hypervisor side; that's the Nitro hypervisor running on
>>>> the
>>>> host where the primary VM is running.
>>>>
>>>> 2. The enclave itself - a VM running on the same host as the primary VM that
>>>> spawned it.
>>>>
>>>> The enclave VM has no persistent storage or network interface attached, it uses
>>>> its own memory and CPUs + its virtio-vsock emulated device for communication
>>>> with the primary VM.
>>>>
>>>> The memory and CPUs are carved out of the primary VM, they are dedicated for
>>>> the
>>>> enclave. The Nitro hypervisor running on the host ensures memory and CPU
>>>> isolation between the primary VM and the enclave VM.
>>>>
>>>>
>>>> These two components need to reflect the same state e.g. when the enclave
>>>> abstraction process (1) is terminated, the enclave VM (2) is terminated as
>>>> well.
>>>>
>>>> With regard to the communication channel, the primary VM has its own emulated
>>>> virtio-vsock PCI device. The enclave VM has its own emulated virtio-vsock
>>>> device
>>>> as well. This channel is used, for example, to fetch data in the enclave and
>>>> then process it. An application that sets up the vsock socket and connects or
>>>> listens, depending on the use case, is then developed to use this channel; this
>>>> happens on both ends - primary VM and enclave VM.
>>>>
>>>> Let me know if further clarifications are needed.
>>>>
>>>>>> The proposed solution is following the KVM model and uses the KVM API to
>>>>>> be able
>>>>>> to create and set resources for enclaves. An additional ioctl command,
>>>>>> besides
>>>>>> the ones provided by KVM, is used to start an enclave and setup the
>>>>>> addressing
>>>>>> for the communication channel and an enclave unique id.
>>>>> Reusing some KVM ioctls is definitely a good idea, but I wouldn't really
>>>>> say it's the KVM API since the VCPU file descriptor is basically non
>>>>> functional (without KVM_RUN and mmap it's not really the KVM API).
>>>> It uses part of the KVM API or a set of KVM ioctls to model the way a VM is
>>>> created / terminated. That's true, KVM_RUN and mmap-ing the vcpu fd are not
>>>> included.
>>>>
>>>> Thanks for the feedback regarding the reuse of KVM ioctls.
>>>>
>>>> Andra
>>>>
>>>>
>>>>
>>>>
>>>> Amazon Development Center (Romania) S.R.L. registered office: 27A Sf. Lazar
>>>> Street, UBC5, floor 2, Iasi, Iasi County, 700045, Romania. Registered in
>>>> Romania. Registration number J22/2621/2005.
>>
>
>
>
>
> Amazon Development Center (Romania) S.R.L. registered office: 27A Sf. Lazar
> Street, UBC5, floor 2, Iasi, Iasi County, 700045, Romania. Registered in
> Romania. Registration number J22/2621/2005.
---
Regards,
Longpeng(Mike)

2020-04-26 08:18:34

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v1 00/15] Add support for Nitro Enclaves

> From: Paraschiv, Andra-Irina <[email protected]>
> Sent: Friday, April 24, 2020 9:59 PM
>
>
> On 24/04/2020 12:59, Tian, Kevin wrote:
> >
> >> From: Paraschiv, Andra-Irina
> >> Sent: Thursday, April 23, 2020 9:20 PM
> >>
> >> On 22/04/2020 00:46, Paolo Bonzini wrote:
> >>> On 21/04/20 20:41, Andra Paraschiv wrote:
> >>>> An enclave communicates with the primary VM via a local
> communication
> >> channel,
> >>>> using virtio-vsock [2]. An enclave does not have a disk or a network
> device
> >>>> attached.
> >>> Is it possible to have a sample of this in the samples/ directory?
> >> I can add in v2 a sample file including the basic flow of how to use the
> >> ioctl interface to create / terminate an enclave.
> >>
> >> Then we can update / build on top it based on the ongoing discussions on
> >> the patch series and the received feedback.
> >>
> >>> I am interested especially in:
> >>>
> >>> - the initial CPU state: CPL0 vs. CPL3, initial program counter, etc.
> >>>
> >>> - the communication channel; does the enclave see the usual local APIC
> >>> and IOAPIC interfaces in order to get interrupts from virtio-vsock, and
> >>> where is the virtio-vsock device (virtio-mmio I suppose) placed in
> memory?
> >>>
> >>> - what the enclave is allowed to do: can it change privilege levels,
> >>> what happens if the enclave performs an access to nonexistent memory,
> >> etc.
> >>> - whether there are special hypercall interfaces for the enclave
> >> An enclave is a VM, running on the same host as the primary VM, that
> >> launched the enclave. They are siblings.
> >>
> >> Here we need to think of two components:
> >>
> >> 1. An enclave abstraction process - a process running in the primary VM
> >> guest, that uses the provided ioctl interface of the Nitro Enclaves
> >> kernel driver to spawn an enclave VM (that's 2 below).
> >>
> >> How does all gets to an enclave VM running on the host?
> >>
> >> There is a Nitro Enclaves emulated PCI device exposed to the primary VM.
> >> The driver for this new PCI device is included in the current patch series.
> >>
> >> The ioctl logic is mapped to PCI device commands e.g. the
> >> NE_ENCLAVE_START ioctl maps to an enclave start PCI command or the
> >> KVM_SET_USER_MEMORY_REGION maps to an add memory PCI
> command.
> >> The PCI
> >> device commands are then translated into actions taken on the hypervisor
> >> side; that's the Nitro hypervisor running on the host where the primary
> >> VM is running.
> >>
> >> 2. The enclave itself - a VM running on the same host as the primary VM
> >> that spawned it.
> >>
> >> The enclave VM has no persistent storage or network interface attached,
> >> it uses its own memory and CPUs + its virtio-vsock emulated device for
> >> communication with the primary VM.
> > sounds like a firecracker VM?
>
> It's a VM crafted for enclave needs.
>
> >
> >> The memory and CPUs are carved out of the primary VM, they are
> dedicated
> >> for the enclave. The Nitro hypervisor running on the host ensures memory
> >> and CPU isolation between the primary VM and the enclave VM.
> > In last paragraph, you said that the enclave VM uses its own memory and
> > CPUs. Then here, you said the memory/CPUs are carved out and dedicated
> > from the primary VM. Can you elaborate which one is accurate? or a mixed
> > model?
>
> Memory and CPUs are carved out of the primary VM and are dedicated for
> the enclave VM. I mentioned above as "its own" in the sense that the
> primary VM doesn't use these carved out resources while the enclave is
> running, as they are dedicated to the enclave.
>
> Hope that now it's more clear.

yes, it's clearer.

>
> >
> >>
> >> These two components need to reflect the same state e.g. when the
> >> enclave abstraction process (1) is terminated, the enclave VM (2) is
> >> terminated as well.
> >>
> >> With regard to the communication channel, the primary VM has its own
> >> emulated virtio-vsock PCI device. The enclave VM has its own emulated
> >> virtio-vsock device as well. This channel is used, for example, to fetch
> >> data in the enclave and then process it. An application that sets up the
> >> vsock socket and connects or listens, depending on the use case, is then
> >> developed to use this channel; this happens on both ends - primary VM
> >> and enclave VM.
> > How does the application in the primary VM assign task to be executed
> > in the enclave VM? I didn't see such command in this series, so suppose
> > it is also communicated through virtio-vsock?
>
> The application that runs in the enclave needs to be packaged in an
> enclave image together with the OS ( e.g. kernel, ramdisk, init ) that
> will run in the enclave VM.
>
> Then the enclave image is loaded in memory. After booting is finished,
> the application starts. Now, depending on the app implementation and use
> case, one example can be that the app in the enclave waits for data to
> be fetched in via the vsock channel.
>

OK, I thought the code/data was dynamically injected from the primary
VM and then run in the enclave. From your description it sounds like
a servicing model, where an auto-running application waits for and responds
to service requests from the application in the primary VM.

Thanks
Kevin

2020-04-27 08:00:57

by Paraschiv, Andra-Irina

[permalink] [raw]
Subject: Re: [PATCH v1 00/15] Add support for Nitro Enclaves



On 25/04/2020 18:25, Liran Alon wrote:
>
> On 23/04/2020 16:19, Paraschiv, Andra-Irina wrote:
>>
>> The memory and CPUs are carved out of the primary VM, they are
>> dedicated for the enclave. The Nitro hypervisor running on the host
>> ensures memory and CPU isolation between the primary VM and the
>> enclave VM.
> I hope you properly take into consideration Hyper-Threading
> speculative side-channel vulnerabilities here.
> i.e. Usually cloud providers designate each CPU core to be assigned to
> run only vCPUs of specific guest. To avoid sharing a single CPU core
> between multiple guests.
> To handle this properly, you need to use some kind of core-scheduling
> mechanism (Such that each CPU core either runs only vCPUs of enclave
> or only vCPUs of primary VM at any given point in time).
>
> In addition, can you elaborate more on how the enclave memory is
> carved out of the primary VM?
> Does this involve performing a memory hot-unplug operation from
> primary VM or just unmap enclave-assigned guest physical pages from
> primary VM's SLAT (EPT/NPT) and map them now only in enclave's SLAT?


Correct, we take into consideration the HT setup. The enclave gets
dedicated physical cores. The primary VM and the enclave VM don't run on
CPU siblings of a physical core.

Regarding the memory carve out, the logic includes page table entries
handling.

IIRC, memory hot-unplug can be used for the memory blocks that were
previously hot-plugged.

https://www.kernel.org/doc/html/latest/admin-guide/mm/memory-hotplug.html

>
>>
>> Let me know if further clarifications are needed.
>>
> I don't quite understand why Enclave VM needs to be
> provisioned / torn down during the primary VM's runtime.
>
> For example, an alternative could have been to just provision both
> primary VM and Enclave VM on primary VM startup.
> Then, wait for primary VM to setup a communication channel with
> Enclave VM (E.g. via virtio-vsock).
> Then, primary VM is free to request Enclave VM to perform various
> tasks when required on the isolated environment.
>
> Such setup will mimic a common Enclave setup. Such as Microsoft
> Windows VBS EPT-based Enclaves (That all runs on VTL1). It is also
> similar to TEEs running on ARM TrustZone.
> i.e. In my alternative proposed solution, the Enclave VM is similar to
> VTL1/TrustZone.
> It will also avoid requiring introducing a new PCI device and driver.

True, this can be another option, to provision the primary VM and the
enclave VM at launch time.

In the proposed setup, the primary VM starts with the initial allocated
resources (memory, CPUs). The launch path of the enclave VM, as it's
spawned on the same host, is done via the ioctl interface - PCI device -
host hypervisor path. Short-running or long-running enclave can be
bootstrapped during primary VM lifetime. Depending on the use case, a
custom set of resources (memory and CPUs) is set for an enclave and then
given back when the enclave is terminated; these resources can be used
for another enclave spawned later on or the primary VM tasks.

Thanks,
Andra





2020-04-27 09:18:38

by Paraschiv, Andra-Irina

[permalink] [raw]
Subject: Re: [PATCH v1 00/15] Add support for Nitro Enclaves



On 25/04/2020 19:05, Paolo Bonzini wrote:
>
> On 24/04/20 21:11, Alexander Graf wrote:
>> What I was saying above is that maybe code is easier to transfer than
>> a .txt file that gets lost somewhere in the Documentation directory
>> :).
> whynotboth.jpg :D

:) Alright, I added it to the list, in addition to the sample we've been
talking before, with the basic flow of the ioctl interface usage.

>
>>>> To answer the question though, the target file is in a newly invented
>>>> file format called "EIF" and it needs to be loaded at offset 0x800000 of
>>>> the address space donated to the enclave.
>>> What is this EIF?
>> It's just a very dumb container format that has a trivial header, a
>> section with the bzImage and one to many sections of initramfs.
>>
>> As mentioned earlier in this thread, it really is just "-kernel" and
>> "-initrd", packed into a single binary for transmission to the host.
> Okay, got it. So, correct me if this is wrong, the information that is
> needed to boot the enclave is:
>
> * the kernel, in bzImage format
>
> * the initrd
>
> * a consecutive amount of memory, to be mapped with
> KVM_SET_USER_MEMORY_REGION

Yes, the kernel bzImage, the kernel command line, the ramdisk(s) are
part of the Enclave Image Format (EIF); plus an EIF header including
metadata such as magic number, eif version, image size and CRC.
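
Purely for illustration (not the actual layout; field names, sizes and order
here are guesses), the header metadata above could be pictured as something
like:

	/* Hypothetical sketch; this is not the real EIF header layout. */
	struct eif_header_sketch {
		u32 magic;      /* magic number identifying an EIF file */
		u16 version;    /* EIF format version */
		u64 image_size; /* total size of the enclave image */
		u32 crc;        /* checksum over the image contents */
	};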

>
> Off list, Alex and I discussed having a struct that points to kernel and
> initrd off enclave memory, and have the driver build EIF at the
> appropriate point in enclave memory (the 8 MiB offset that you mentioned).
>
> This however has two disadvantages:
>
> 1) having the kernel and initrd loaded by the parent VM in enclave
> memory has the advantage that you save memory outside the enclave memory
> for something that is only needed inside the enclave

Here you wanted to say disadvantage? :) Wrt saving memory, it's about
additional memory from the parent / primary VM needed for handling the
enclave image sections (such as the kernel, ramdisk) and setting the EIF
at a certain offset in enclave memory?

>
> 2) it is less extensible (what if you want to use PVH in the future for
> example) and puts in the driver policy that should be in userspace.
>
>
> So why not just start running the enclave at 0xfffffff0 in real mode?
> Yes everybody hates it, but that's what OSes are written against. In
> the simplest example, the parent enclave can load bzImage and initrd at
> 0x10000 and place firmware tables (MPTable and DMI) somewhere at
> 0xf0000; the firmware would just be a few movs to segment registers
> followed by a long jmp.
>
> If you want to keep EIF, we measured in QEMU that there is no measurable
> difference between loading the kernel in the host and doing it in the
> guest, so Amazon could provide an EIF loader stub at 0xfffffff0 for
> backwards compatibility.

Thanks for info.

Andra

>
>>> Again, I cannot provide a sensible review without explaining how to use
>>> all this. I understand that Amazon needs to do part of the design
>>> behind closed doors, but this seems to have resulted in issues that
>>> remind me of Intel's SGX misadventures. If Amazon has designed NE in a
>>> way that is incompatible with open standards, it's up to Amazon to fix
>> Oh, if there's anything that conflicts with open standards here, I would
>> love to hear it immediately. I do not believe in security by obscurity :).
> That's great to hear!
>
> Paolo
>





2020-04-27 09:51:17

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v1 00/15] Add support for Nitro Enclaves

On 27/04/20 11:22, Paraschiv, Andra-Irina wrote:
>>
>>
>> 1) having the kernel and initrd loaded by the parent VM in enclave
>> memory has the advantage that you save memory outside the enclave memory
>> for something that is only needed inside the enclave
>
> Here you wanted to say disadvantage? :)Wrt saving memory, it's about
> additional memory from the parent / primary VM needed for handling the
> enclave image sections (such as the kernel, ramdisk) and setting the EIF
> at a certain offset in enclave memory?

No, it's an advantage. If the parent VM can load everything in enclave
memory, it can read() into it directly. It doesn't have to waste its own
memory for a kernel and initrd, whose only reason to exist is to be
copied into enclave memory.

Paolo

2020-04-27 10:02:54

by Paraschiv, Andra-Irina

[permalink] [raw]
Subject: Re: [PATCH v1 00/15] Add support for Nitro Enclaves



On 27/04/2020 12:46, Paolo Bonzini wrote:
> On 27/04/20 11:22, Paraschiv, Andra-Irina wrote:
>>>
>>> 1) having the kernel and initrd loaded by the parent VM in enclave
>>> memory has the advantage that you save memory outside the enclave memory
>>> for something that is only needed inside the enclave
>> Here you wanted to say disadvantage? :)Wrt saving memory, it's about
>> additional memory from the parent / primary VM needed for handling the
>> enclave image sections (such as the kernel, ramdisk) and setting the EIF
>> at a certain offset in enclave memory?
> No, it's an advantage. If the parent VM can load everything in enclave
> memory, it can read() into it directly. It doesn't have to waste its own
> memory for a kernel and initrd, whose only reason to exist is to be
> copied into enclave memory.

Ok, got it, saving was referring to actually not using additional memory.

Thank you for clarification.

Andra





2020-04-27 11:48:25

by Liran Alon

[permalink] [raw]
Subject: Re: [PATCH v1 00/15] Add support for Nitro Enclaves


On 27/04/2020 10:56, Paraschiv, Andra-Irina wrote:
>
> On 25/04/2020 18:25, Liran Alon wrote:
>>
>> On 23/04/2020 16:19, Paraschiv, Andra-Irina wrote:
>>>
>>> The memory and CPUs are carved out of the primary VM, they are
>>> dedicated for the enclave. The Nitro hypervisor running on the host
>>> ensures memory and CPU isolation between the primary VM and the
>>> enclave VM.
>> I hope you properly take into consideration Hyper-Threading
>> speculative side-channel vulnerabilities here.
>> i.e. Usually cloud providers designate each CPU core to be assigned
>> to run only vCPUs of specific guest. To avoid sharing a single CPU
>> core between multiple guests.
>> To handle this properly, you need to use some kind of core-scheduling
>> mechanism (Such that each CPU core either runs only vCPUs of enclave
>> or only vCPUs of primary VM at any given point in time).
>>
>> In addition, can you elaborate more on how the enclave memory is
>> carved out of the primary VM?
>> Does this involve performing a memory hot-unplug operation from
>> primary VM or just unmap enclave-assigned guest physical pages from
>> primary VM's SLAT (EPT/NPT) and map them now only in enclave's SLAT?
>
> Correct, we take into consideration the HT setup. The enclave gets
> dedicated physical cores. The primary VM and the enclave VM don't run
> on CPU siblings of a physical core.
The way I would imagine this to work is that the Primary-VM just specifies
how many vCPUs the Enclave-VM will have, and those vCPUs are set with
affinity to run on the same physical CPU cores as the Primary-VM,
with the exception that the scheduler is modified to not run vCPUs of
the Primary-VM and the Enclave-VM as siblings on the same physical CPU core
(core-scheduling). i.e. This is different from the Primary-VM losing
physical CPU cores for as long as the Enclave-VM is running.
Or maybe this should even be controlled by a knob in the virtual PCI device
interface, to give the customer the flexibility to decide whether the
Enclave-VM needs dedicated CPU cores or whether it is ok to share them with
the Primary-VM, as long as core-scheduling is used to guarantee proper
isolation.
>
> Regarding the memory carve out, the logic includes page table entries
> handling.
As I thought. Thanks for the confirmation.
>
> IIRC, memory hot-unplug can be used for the memory blocks that were
> previously hot-plugged.
>
> https://www.kernel.org/doc/html/latest/admin-guide/mm/memory-hotplug.html
>
>>
>> I don't quite understand why Enclave VM needs to be
>> provisioned / torn down during the primary VM's runtime.
>>
>> For example, an alternative could have been to just provision both
>> primary VM and Enclave VM on primary VM startup.
>> Then, wait for primary VM to setup a communication channel with
>> Enclave VM (E.g. via virtio-vsock).
>> Then, primary VM is free to request Enclave VM to perform various
>> tasks when required on the isolated environment.
>>
>> Such setup will mimic a common Enclave setup. Such as Microsoft
>> Windows VBS EPT-based Enclaves (That all runs on VTL1). It is also
>> similar to TEEs running on ARM TrustZone.
>> i.e. In my alternative proposed solution, the Enclave VM is similar
>> to VTL1/TrustZone.
>> It will also avoid requiring introducing a new PCI device and driver.
>
> True, this can be another option, to provision the primary VM and the
> enclave VM at launch time.
>
> In the proposed setup, the primary VM starts with the initial
> allocated resources (memory, CPUs). The launch path of the enclave VM,
> as it's spawned on the same host, is done via the ioctl interface -
> PCI device - host hypervisor path. Short-running or long-running
> enclave can be bootstrapped during primary VM lifetime. Depending on
> the use case, a custom set of resources (memory and CPUs) is set for
> an enclave and then given back when the enclave is terminated; these
> resources can be used for another enclave spawned later on or the
> primary VM tasks.
>
Yes, I already understood this is how the mechanism works. I'm
questioning whether this is indeed a good approach that should also be
taken by upstream.

The use-case of Nitro Enclaves is a Confidential-Computing service, i.e.
the ability to provision a compute instance that can be trusted to
perform computation on sensitive information, with high confidence that
it cannot be compromised as it's highly isolated. Some technologies such
as Intel SGX and AMD SEV attempted to achieve this even with guarantees
that the computation is isolated from the hardware and hypervisor itself.

I would have expected that for the vast majority of real customer
use-cases, the customer will provision a compute instance that runs some
confidential-computing task in an enclave, which it keeps running for the
entire life-time of the compute instance, as the sole purpose of the
compute instance is to just expose a service that performs that
confidential-computing task.
For those cases, it should have been sufficient to just pre-provision a
single Enclave-VM that performs this task, together with the compute
instance, and connect them via virtio-vsock, without introducing any new
virtual PCI device, guest PCI driver and unique semantics of stealing
resources (CPUs and memory) from the primary-VM at runtime.

In this Nitro Enclave architecture, we de-facto put Compute
control-plane abilities in the hands of the guest VM, instead of
introducing new control-plane primitives that allow building
the data-plane architecture desired by the customer in a flexible manner.
* What if the customer prefers to have its Enclave-VM polling an S3 bucket
for new tasks and producing results to S3 as well, without having any
"Primary-VM" or virtio-vsock connection of any kind?
* What if for some use-cases the customer wants the Enclave-VM to have
dedicated compute power (i.e. not share physical CPU cores with the
Primary-VM, not even with core-scheduling), but for other use-cases the
customer prefers to share physical CPU cores with the Primary-VM
(together with core-scheduling guarantees)? (Although this could be
addressed by extending the virtual PCI device interface with a knob to
control this.)

An alternative would have been to have the following new control-plane
primitives:
* Ability to provision a VM without a boot volume, booting instead from an
image loaded into memory, which allows provisioning disk-less VMs.
  (E.g. can be useful for other use-cases, such as VMs not requiring EBS
at all, which could allow cheaper compute instances.)
* Ability to provision a group of VMs together such that they are
guaranteed to launch as sibling VMs on the same host.
* Ability to create a fast-path connection between sibling VMs on the
same host, with virtio-vsock or even other shared-memory mechanisms.
* Extend AWS Fargate with the ability to run multiple microVMs as a group
(similar to the above) connected with virtio-vsock, to allow on-demand
scaling of confidential-computing tasks.

Having said that, I do see an architecture similar to the Nitro Enclaves
virtual PCI device being used for a different purpose: hypervisor-based
security isolation (such as Windows VBS).
E.g. a Linux boot-loader can detect the presence of this virtual PCI
device and use it to provision multiple VM security domains, such that
when a security domain is created, it is specified which hardware
resources it has access to (guest memory pages, IOPorts, MSRs, etc.) and
the blob it should run to bootstrap. Similar to, but superior to,
Hyper-V VSM. In addition, some security domains would be given special
abilities to control other security domains (for example, to control the
+XS,+XU EPT bits of other security domains to enforce code-integrity,
similar to Windows VBS HVCI). Just an idea... :)

-Liran











2020-04-27 18:42:15

by Paraschiv, Andra-Irina

[permalink] [raw]
Subject: Re: [PATCH v1 00/15] Add support for Nitro Enclaves



On 26/04/2020 04:55, Longpeng (Mike, Cloud Infrastructure Service
Product Dept.) wrote:
>
> On 2020/4/24 17:54, Paraschiv, Andra-Irina wrote:
>>
>> On 24/04/2020 11:19, Paraschiv, Andra-Irina wrote:
>>>
>>> On 24/04/2020 06:04, Longpeng (Mike, Cloud Infrastructure Service Product
>>> Dept.) wrote:
>>>> On 2020/4/23 21:19, Paraschiv, Andra-Irina wrote:
>>>>> On 22/04/2020 00:46, Paolo Bonzini wrote:
>>>>>> On 21/04/20 20:41, Andra Paraschiv wrote:
>>>>>>> An enclave communicates with the primary VM via a local communication
>>>>>>> channel,
>>>>>>> using virtio-vsock [2]. An enclave does not have a disk or a network device
>>>>>>> attached.
>>>>>> Is it possible to have a sample of this in the samples/ directory?
>>>>> I can add in v2 a sample file including the basic flow of how to use the ioctl
>>>>> interface to create / terminate an enclave.
>>>>>
>>>>> Then we can update / build on top it based on the ongoing discussions on the
>>>>> patch series and the received feedback.
>>>>>
>>>>>> I am interested especially in:
>>>>>>
>>>>>> - the initial CPU state: CPL0 vs. CPL3, initial program counter, etc.
>>>>>>
>>>>>> - the communication channel; does the enclave see the usual local APIC
>>>>>> and IOAPIC interfaces in order to get interrupts from virtio-vsock, and
>>>>>> where is the virtio-vsock device (virtio-mmio I suppose) placed in memory?
>>>>>>
>>>>>> - what the enclave is allowed to do: can it change privilege levels,
>>>>>> what happens if the enclave performs an access to nonexistent memory, etc.
>>>>>>
>>>>>> - whether there are special hypercall interfaces for the enclave
>>>>> An enclave is a VM, running on the same host as the primary VM, that launched
>>>>> the enclave. They are siblings.
>>>>>
>>>>> Here we need to think of two components:
>>>>>
>>>>> 1. An enclave abstraction process - a process running in the primary VM guest,
>>>>> that uses the provided ioctl interface of the Nitro Enclaves kernel driver to
>>>>> spawn an enclave VM (that's 2 below).
>>>>>
>>>>> How does all gets to an enclave VM running on the host?
>>>>>
>>>>> There is a Nitro Enclaves emulated PCI device exposed to the primary VM. The
>>>>> driver for this new PCI device is included in the current patch series.
>>>>>
>>>> Hi Paraschiv,
>>>>
>>>> The new PCI device is emulated in QEMU ? If so, is there any plan to send the
>>>> QEMU code ?
>>> Hi,
>>>
>>> Nope, not that I know of so far.
>> And just to be a bit more clear, the reply above takes into consideration that
>> it's not emulated in QEMU.
>>
> Thanks.
>
> Guys in this thread are much more interested in the design of the enclave VM and
> the new device, but there's no document about this device yet, so I think the
> emulation code is a good alternative. However, Alex said the device spec will
> be published later, so I'll wait for it.

True, that was mentioned wrt device spec. The device interface could
also be updated based on the ongoing discussions on the patch series.
Refs to the device spec should be included e.g. in the .h file of the
PCI device, once it's available.

Thanks,
Andra

>
>> Thanks,
>> Andra
>>
>>>>> The ioctl logic is mapped to PCI device commands e.g. the NE_ENCLAVE_START
>>>>> ioctl
>>>>> maps to an enclave start PCI command or the KVM_SET_USER_MEMORY_REGION maps to
>>>>> an add memory PCI command. The PCI device commands are then translated into
>>>>> actions taken on the hypervisor side; that's the Nitro hypervisor running on
>>>>> the
>>>>> host where the primary VM is running.
>>>>>
>>>>> 2. The enclave itself - a VM running on the same host as the primary VM that
>>>>> spawned it.
>>>>>
>>>>> The enclave VM has no persistent storage or network interface attached, it uses
>>>>> its own memory and CPUs + its virtio-vsock emulated device for communication
>>>>> with the primary VM.
>>>>>
>>>>> The memory and CPUs are carved out of the primary VM, they are dedicated for
>>>>> the
>>>>> enclave. The Nitro hypervisor running on the host ensures memory and CPU
>>>>> isolation between the primary VM and the enclave VM.
>>>>>
>>>>>
>>>>> These two components need to reflect the same state e.g. when the enclave
>>>>> abstraction process (1) is terminated, the enclave VM (2) is terminated as
>>>>> well.
>>>>>
>>>>> With regard to the communication channel, the primary VM has its own emulated
>>>>> virtio-vsock PCI device. The enclave VM has its own emulated virtio-vsock
>>>>> device
>>>>> as well. This channel is used, for example, to fetch data in the enclave and
>>>>> then process it. An application that sets up the vsock socket and connects or
>>>>> listens, depending on the use case, is then developed to use this channel; this
>>>>> happens on both ends - primary VM and enclave VM.
>>>>>
>>>>> Let me know if further clarifications are needed.
>>>>>
>>>>>>> The proposed solution is following the KVM model and uses the KVM API to
>>>>>>> be able
>>>>>>> to create and set resources for enclaves. An additional ioctl command,
>>>>>>> besides
>>>>>>> the ones provided by KVM, is used to start an enclave and setup the
>>>>>>> addressing
>>>>>>> for the communication channel and an enclave unique id.
>>>>>> Reusing some KVM ioctls is definitely a good idea, but I wouldn't really
>>>>>> say it's the KVM API since the VCPU file descriptor is basically non
>>>>>> functional (without KVM_RUN and mmap it's not really the KVM API).
>>>>> It uses part of the KVM API or a set of KVM ioctls to model the way a VM is
>>>>> created / terminated. That's true, KVM_RUN and mmap-ing the vcpu fd are not
>>>>> included.
>>>>>
>>>>> Thanks for the feedback regarding the reuse of KVM ioctls.
>>>>>
>>>>> Andra
>>>>>
>>>>>
>>>>>
>>>>>
>>
>>
>>
> ---
> Regards,
> Longpeng(Mike)





2020-04-27 19:07:45

by Paraschiv, Andra-Irina

[permalink] [raw]
Subject: Re: [PATCH v1 00/15] Add support for Nitro Enclaves



On 26/04/2020 11:16, Tian, Kevin wrote:
>> From: Paraschiv, Andra-Irina <[email protected]>
>> Sent: Friday, April 24, 2020 9:59 PM
>>
>>
>> On 24/04/2020 12:59, Tian, Kevin wrote:
>>>> From: Paraschiv, Andra-Irina
>>>> Sent: Thursday, April 23, 2020 9:20 PM
>>>>
>>>> On 22/04/2020 00:46, Paolo Bonzini wrote:
>>>>> On 21/04/20 20:41, Andra Paraschiv wrote:
>>>>>> An enclave communicates with the primary VM via a local
>> communication
>>>> channel,
>>>>>> using virtio-vsock [2]. An enclave does not have a disk or a network
>> device
>>>>>> attached.
>>>>> Is it possible to have a sample of this in the samples/ directory?
>>>> I can add in v2 a sample file including the basic flow of how to use the
>>>> ioctl interface to create / terminate an enclave.
>>>>
>>>> Then we can update / build on top it based on the ongoing discussions on
>>>> the patch series and the received feedback.
>>>>
>>>>> I am interested especially in:
>>>>>
>>>>> - the initial CPU state: CPL0 vs. CPL3, initial program counter, etc.
>>>>>
>>>>> - the communication channel; does the enclave see the usual local APIC
>>>>> and IOAPIC interfaces in order to get interrupts from virtio-vsock, and
>>>>> where is the virtio-vsock device (virtio-mmio I suppose) placed in
>> memory?
>>>>> - what the enclave is allowed to do: can it change privilege levels,
>>>>> what happens if the enclave performs an access to nonexistent memory,
>>>> etc.
>>>>> - whether there are special hypercall interfaces for the enclave
>>>> An enclave is a VM, running on the same host as the primary VM, that
>>>> launched the enclave. They are siblings.
>>>>
>>>> Here we need to think of two components:
>>>>
>>>> 1. An enclave abstraction process - a process running in the primary VM
>>>> guest, that uses the provided ioctl interface of the Nitro Enclaves
>>>> kernel driver to spawn an enclave VM (that's 2 below).
>>>>
>>>> How does all gets to an enclave VM running on the host?
>>>>
>>>> There is a Nitro Enclaves emulated PCI device exposed to the primary VM.
>>>> The driver for this new PCI device is included in the current patch series.
>>>>
>>>> The ioctl logic is mapped to PCI device commands e.g. the
>>>> NE_ENCLAVE_START ioctl maps to an enclave start PCI command or the
>>>> KVM_SET_USER_MEMORY_REGION maps to an add memory PCI
>> command.
>>>> The PCI
>>>> device commands are then translated into actions taken on the hypervisor
>>>> side; that's the Nitro hypervisor running on the host where the primary
>>>> VM is running.
>>>>
>>>> 2. The enclave itself - a VM running on the same host as the primary VM
>>>> that spawned it.
>>>>
>>>> The enclave VM has no persistent storage or network interface attached,
>>>> it uses its own memory and CPUs + its virtio-vsock emulated device for
>>>> communication with the primary VM.
>>> sounds like a firecracker VM?
>> It's a VM crafted for enclave needs.
>>
>>>> The memory and CPUs are carved out of the primary VM, they are
>> dedicated
>>>> for the enclave. The Nitro hypervisor running on the host ensures memory
>>>> and CPU isolation between the primary VM and the enclave VM.
>>> In last paragraph, you said that the enclave VM uses its own memory and
>>> CPUs. Then here, you said the memory/CPUs are carved out and dedicated
>>> from the primary VM. Can you elaborate which one is accurate? or a mixed
>>> model?
>> Memory and CPUs are carved out of the primary VM and are dedicated for
>> the enclave VM. I mentioned above as "its own" in the sense that the
>> primary VM doesn't use these carved out resources while the enclave is
>> running, as they are dedicated to the enclave.
>>
>> Hope that now it's more clear.
> yes, it's clearer.

Good, glad to hear that.

>
>>>> These two components need to reflect the same state e.g. when the
>>>> enclave abstraction process (1) is terminated, the enclave VM (2) is
>>>> terminated as well.
>>>>
>>>> With regard to the communication channel, the primary VM has its own
>>>> emulated virtio-vsock PCI device. The enclave VM has its own emulated
>>>> virtio-vsock device as well. This channel is used, for example, to fetch
>>>> data in the enclave and then process it. An application that sets up the
>>>> vsock socket and connects or listens, depending on the use case, is then
>>>> developed to use this channel; this happens on both ends - primary VM
>>>> and enclave VM.
>>> How does the application in the primary VM assign task to be executed
>>> in the enclave VM? I didn't see such command in this series, so suppose
>>> it is also communicated through virtio-vsock?
>> The application that runs in the enclave needs to be packaged in an
>> enclave image together with the OS ( e.g. kernel, ramdisk, init ) that
>> will run in the enclave VM.
>>
>> Then the enclave image is loaded in memory. After booting is finished,
>> the application starts. Now, depending on the app implementation and use
>> case, one example can be that the app in the enclave waits for data to
>> be fetched in via the vsock channel.
>>
> OK, I thought the code/data was dynamically injected from the primary
> VM and then run in the enclave. From your description it sounds like
> a servicing model where an auto-running application waits for and responds
> to service requests from the application in the primary VM.

That was an example with a possible use case; in that one example, data
can be dynamically injected e.g. fetch a bunch of data into the enclave
VM, get back the results after processing, then fetch in another set of
data and so on.

The architecture of the solution depends on how the tasks are split
between the primary VM and the enclave VM and what is sent via the vsock
channel. The primary VM, the enclave VM and the communication between
them are part of the foundational technology we provide. What's running
inside each of them can vary based on the customer use case, which is
adapted to fit this infrastructure so that tasks are split and part of
them run in the enclave VM.
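
As a rough illustration only (the CID and port values, the helper name and
the lack of error handling are just placeholders, not part of this patch
series), the primary VM side of such an application could connect over
vsock along these lines:

    #include <linux/vm_sockets.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Connect to the enclave VM over the local vsock channel. */
    static int connect_to_enclave(unsigned int enclave_cid, unsigned int port)
    {
        struct sockaddr_vm addr;
        int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

        if (fd < 0)
            return -1;

        memset(&addr, 0, sizeof(addr));
        addr.svm_family = AF_VSOCK;
        addr.svm_cid = enclave_cid;    /* CID assigned to the enclave VM */
        addr.svm_port = port;          /* port the enclave app listens on */

        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            close(fd);
            return -1;
        }

        return fd;    /* ready for send() / recv() of the data to process */
    }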

Thanks,
Andra




2020-04-28 15:10:21

by Alexander Graf

[permalink] [raw]
Subject: Re: [PATCH v1 00/15] Add support for Nitro Enclaves



On 25.04.20 18:05, Paolo Bonzini wrote:
>
>
>
> On 24/04/20 21:11, Alexander Graf wrote:
>> What I was saying above is that maybe code is easier to transfer that
>> than a .txt file that gets lost somewhere in the Documentation directory
>> :).
>
> whynotboth.jpg :D

Uh, sure? :)

Let's first hammer out what we really want for the UABI though. Then we
can document it.

>>>> To answer the question though, the target file is in a newly invented
>>>> file format called "EIF" and it needs to be loaded at offset 0x800000 of
>>>> the address space donated to the enclave.
>>>
>>> What is this EIF?
>>
>> It's just a very dumb container format that has a trivial header, a
>> section with the bzImage and one to many sections of initramfs.
>>
>> As mentioned earlier in this thread, it really is just "-kernel" and
>> "-initrd", packed into a single binary for transmission to the host.
>
> Okay, got it. So, correct me if this is wrong, the information that is
> needed to boot the enclave is:
>
> * the kernel, in bzImage format
>
> * the initrd

It's a single EIF file for a good reason. There are checksums in there
and potentially signatures too, so that the enclave can attest itself.
For the sake of the user space API, the enclave image really should just
be considered a blob.

>
> * a consecutive amount of memory, to be mapped with
> KVM_SET_USER_MEMORY_REGION
>
> Off list, Alex and I discussed having a struct that points to kernel and
> initrd off enclave memory, and have the driver build EIF at the
> appropriate point in enclave memory (the 8 MiB offset that you mentioned).
>
> This however has two disadvantages:
>
> 1) having the kernel and initrd loaded by the parent VM in enclave
> memory has the advantage that you save memory outside the enclave memory
> for something that is only needed inside the enclave
>
> 2) it is less extensible (what if you want to use PVH in the future for
> example) and puts in the driver policy that should be in userspace.
>
>
> So why not just start running the enclave at 0xfffffff0 in real mode?
> Yes everybody hates it, but that's what OSes are written against. In
> the simplest example, the parent enclave can load bzImage and initrd at
> 0x10000 and place firmware tables (MPTable and DMI) somewhere at
> 0xf0000; the firmware would just be a few movs to segment registers
> followed by a long jmp.

There is a bit of initial attestation flow in the enclave, so that you
can be sure that the code that is running is actually what you wanted to
run.

I would also in general prefer to disconnect the notion of "enclave
memory" as much as possible from a memory location view. User space
shouldn't be in the business of knowing at which enclave memory position
its donated memory ended up. By disconnecting the views of the memory
world, we can do some more optimizations, such as compacting memory
ranges more efficiently in kernel space.

> If you want to keep EIF, we measured in QEMU that there is no measurable
> difference between loading the kernel in the host and doing it in the
> guest, so Amazon could provide an EIF loader stub at 0xfffffff0 for
> backwards compatibility.

It's not about performance :).

So the other thing we discussed was whether the KVM API really turned
out to be a good fit here. After all, today we merely call:

* CREATE_VM
* SET_MEMORY_RANGE
* CREATE_VCPU
* START_ENCLAVE

where we even butcher up CREATE_VCPU into a meaningless blob of overhead
for no good reason.

Why don't we build something like the following instead?

vm = ne_create(vcpus = 4)
ne_set_memory(vm, hva, len)
ne_load_image(vm, addr, len)
ne_start(vm)

That way we would get the EIF loading into kernel space. "LOAD_IMAGE"
would only be available in the time window between set_memory and start.
It basically implements a memcpy(), but it would completely hide the
hidden semantics of where an EIF has to go, so future device versions
(or even other enclave implementers) could change the logic.
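
To make that flow a bit more concrete, here is a rough user space sketch.
Everything below - the device node path, the ioctl names and numbers, the
struct layouts - is purely illustrative and not an existing UAPI; error
handling and memory alignment requirements are ignored:

    #include <fcntl.h>
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Hypothetical request layouts and ioctl numbers, for illustration only. */
    struct ne_mem_region { uint64_t hva; uint64_t len; };
    struct ne_image      { uint64_t addr; uint64_t len; };

    #define NE_CREATE_VM    _IO('N', 0x01)
    #define NE_SET_MEMORY   _IOW('N', 0x02, struct ne_mem_region)
    #define NE_LOAD_IMAGE   _IOW('N', 0x03, struct ne_image)
    #define NE_START        _IO('N', 0x04)

    static int launch_enclave(void *eif_buf, uint64_t eif_len, uint64_t mem_len)
    {
        int dev_fd = open("/dev/nitro_enclaves", O_RDWR);
        /* vm = ne_create(vcpus = 4) */
        int vm_fd = ioctl(dev_fd, NE_CREATE_VM, 4);

        /* ne_set_memory(vm, hva, len): donate a memory range to the enclave. */
        void *mem = mmap(NULL, mem_len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        struct ne_mem_region region = { (uintptr_t)mem, mem_len };
        ioctl(vm_fd, NE_SET_MEMORY, &region);

        /*
         * ne_load_image(vm, addr, len): only valid between set_memory and
         * start; where the EIF ends up in enclave memory stays hidden.
         */
        struct ne_image image = { (uintptr_t)eif_buf, eif_len };
        ioctl(vm_fd, NE_LOAD_IMAGE, &image);

        /* ne_start(vm) */
        return ioctl(vm_fd, NE_START, 0);
    }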

I think it also makes sense to just allocate those 4 ioctls from
scratch. Paolo, would you still want to "donate" KVM ioctl space in that
case?

Overall, the above should address most of the concerns you raised in
this mail, right? It still requires copying, but at least we don't have
to keep the copy in kernel space.


Alex





2020-04-28 15:28:24

by Alexander Graf

[permalink] [raw]
Subject: Re: [PATCH v1 00/15] Add support for Nitro Enclaves



On 27.04.20 13:44, Liran Alon wrote:
>
> On 27/04/2020 10:56, Paraschiv, Andra-Irina wrote:
>>
>> On 25/04/2020 18:25, Liran Alon wrote:
>>>
>>> On 23/04/2020 16:19, Paraschiv, Andra-Irina wrote:
>>>>
>>>> The memory and CPUs are carved out of the primary VM, they are
>>>> dedicated for the enclave. The Nitro hypervisor running on the host
>>>> ensures memory and CPU isolation between the primary VM and the
>>>> enclave VM.
>>> I hope you properly take into consideration Hyper-Threading
>>> speculative side-channel vulnerabilities here.
>>> i.e. Usually cloud providers designate each CPU core to be assigned
>>> to run only vCPUs of specific guest. To avoid sharing a single CPU
>>> core between multiple guests.
>>> To handle this properly, you need to use some kind of core-scheduling
>>> mechanism (Such that each CPU core either runs only vCPUs of enclave
>>> or only vCPUs of primary VM at any given point in time).
>>>
>>> In addition, can you elaborate more on how the enclave memory is
>>> carved out of the primary VM?
>>> Does this involve performing a memory hot-unplug operation from
>>> primary VM or just unmap enclave-assigned guest physical pages from
>>> primary VM's SLAT (EPT/NPT) and map them now only in enclave's SLAT?
>>
>> Correct, we take into consideration the HT setup. The enclave gets
>> dedicated physical cores. The primary VM and the enclave VM don't run
>> on CPU siblings of a physical core.
> The way I would imagine this to work is that Primary-VM just specifies
> how many vCPUs will the Enclave-VM have and those vCPUs will be set with
> affinity to run on same physical CPU cores as Primary-VM.
> But with the exception that scheduler is modified to not run vCPUs of
> Primary-VM and Enclave-VM as sibling on the same physical CPU core
> (core-scheduling). i.e. This is different than primary-VM losing
> physical CPU cores permanently as long as the Enclave-VM is running.
> Or maybe this should even be controlled by a knob in virtual PCI device
> interface to allow flexibility to customer to decide if Enclave-VM needs
> dedicated CPU cores or is it ok to share them with Primary-VM
> as long as core-scheduling is used to guarantee proper isolation.

Running both parent and enclave on the same core can *potentially* lead
to L2 cache leakage, so we decided not to go with it :).

>>
>> Regarding the memory carve out, the logic includes page table entries
>> handling.
> As I thought. Thanks for conformation.
>>
>> IIRC, memory hot-unplug can be used for the memory blocks that were
>> previously hot-plugged.
>>
>> https://www.kernel.org/doc/html/latest/admin-guide/mm/memory-hotplug.html
>>
>>
>>>
>>> I don't quite understand why Enclave VM needs to be
>>> provisioned/teardown during primary VM's runtime.
>>>
>>> For example, an alternative could have been to just provision both
>>> primary VM and Enclave VM on primary VM startup.
>>> Then, wait for primary VM to setup a communication channel with
>>> Enclave VM (E.g. via virtio-vsock).
>>> Then, primary VM is free to request Enclave VM to perform various
>>> tasks when required on the isolated environment.
>>>
>>> Such setup will mimic a common Enclave setup. Such as Microsoft
>>> Windows VBS EPT-based Enclaves (That all runs on VTL1). It is also
>>> similar to TEEs running on ARM TrustZone.
>>> i.e. In my alternative proposed solution, the Enclave VM is similar
>>> to VTL1/TrustZone.
>>> It will also avoid requiring introducing a new PCI device and driver.
>>
>> True, this can be another option, to provision the primary VM and the
>> enclave VM at launch time.
>>
>> In the proposed setup, the primary VM starts with the initial
>> allocated resources (memory, CPUs). The launch path of the enclave VM,
>> as it's spawned on the same host, is done via the ioctl interface -
>> PCI device - host hypervisor path. Short-running or long-running
>> enclave can be bootstrapped during primary VM lifetime. Depending on
>> the use case, a custom set of resources (memory and CPUs) is set for
>> an enclave and then given back when the enclave is terminated; these
>> resources can be used for another enclave spawned later on or the
>> primary VM tasks.
>>
> Yes, I already understood this is how the mechanism work. I'm
> questioning whether this is indeed a good approach that should also be
> taken by upstream.

I thought the point of Linux was to support devices that exist, rather
than change the way the world works around it? ;)

> The use-case of using Nitro Enclaves is for a Confidential-Computing
> service. i.e. The ability to provision a compute instance that can be
> trusted to perform a bunch of computation on sensitive
> information with high confidence that it cannot be compromised as it's
> highly isolated. Some technologies such as Intel SGX and AMD SEV
> attempted to achieve this even with guarantees that
> the computation is isolated from the hardware and hypervisor itself.

Yeah, that worked really well, didn't it? ;)

> I would have expected that for the vast majority of real customer
> use-cases, the customer will provision a compute instance that runs some
> confidential-computing task in an enclave which it
> keeps running for the entire life-time of the compute instance. As the
> sole purpose of the compute instance is to just expose a service that
> performs some confidential-computing task.
> For those cases, it should have been sufficient to just pre-provision a
> single Enclave-VM that performs this task, together with the compute
> instance and connect them via virtio-vsock.
> Without introducing any new virtual PCI device, guest PCI driver and
> unique semantics of stealing resources (CPUs and Memory) from primary-VM
> at runtime.

You would also need to preprovision the image that runs in the enclave,
which is usually only determined at runtime. For that you need the PCI
driver anyway, so why not make the creation dynamic too?

> In this Nitro Enclave architecture, we de-facto put Compute
> control-plane abilities in the hands of the guest VM. Instead of
> introducing new control-plane primitives that allows building
> the data-plane architecture desired by the customer in a flexible manner.
> * What if the customer prefers to have it's Enclave VM polling S3 bucket
> for new tasks and produce results to S3 as-well? Without having any
> "Primary-VM" or virtio-vsock connection of any kind?
> * What if for some use-cases customer wants Enclave-VM to have dedicated
> compute power (i.e. Not share physical CPU cores with primary-VM. Not
> even with core-scheduling) but for other
> use-cases, customer prefers to share physical CPU cores with Primary-VM
> (Together with core-scheduling guarantees)? (Although this could be
> addressed by extending the virtual PCI device
> interface with a knob to control this)
>
> An alternative would have been to have the following new control-plane
> primitives:
> * Ability to provision a VM without boot-volume, but instead from an
> Image that is used to boot from memory. Allowing to provision disk-less
> VMs.
>   (E.g. Can be useful for other use-cases such as VMs not requiring EBS
> at all which could allow cheaper compute instance)
> * Ability to provision a group of VMs together as a group such that they
> are guaranteed to launch as sibling VMs on the same host.
> * Ability to create a fast-path connection between sibling VMs on the
> same host with virtio-vsock. Or even also other shared-memory mechanism.
> * Extend AWS Fargate with ability to run multiple microVMs as a group
> (Similar to above) connected with virtio-vsock. To allow on-demand scale
> of confidential-computing task.

Yes, there are a *lot* of different ways to implement enclaves in a
cloud environment. This is the one that we focused on, but I'm sure
others in the space will have more ideas. It's definitely an interesting
space and I'm eager to see more innovation happening :).

> Having said that, I do see a similar architecture to Nitro Enclaves
> virtual PCI device used for a different purpose: For hypervisor-based
> security isolation (Such as Windows VBS).
> E.g. Linux boot-loader can detect the presence of this virtual PCI
> device and use it to provision multiple VM security domains. Such that
> when a security domain is created,
> it is specified what is the hardware resources it have access to (Guest
> memory pages, IOPorts, MSRs and etc.) and the blob it should run to
> bootstrap. Similar, but superior than,
> Hyper-V VSM. In addition, some security domains will be given special
> abilities to control other security domains (For example, to control the
> +XS,+XU EPT bits of other security
> domains to enforce code-integrity. Similar to Windows VBS HVCI). Just an
> idea... :)

Yes, absolutely! So much fun to be had :D


Alex





2020-04-28 16:04:55

by Liran Alon

[permalink] [raw]
Subject: Re: [PATCH v1 00/15] Add support for Nitro Enclaves


On 28/04/2020 18:25, Alexander Graf wrote:
>
>
> On 27.04.20 13:44, Liran Alon wrote:
>>
>> On 27/04/2020 10:56, Paraschiv, Andra-Irina wrote:
>>>
>>> On 25/04/2020 18:25, Liran Alon wrote:
>>>>
>>>> On 23/04/2020 16:19, Paraschiv, Andra-Irina wrote:
>>>>>
>>>>> The memory and CPUs are carved out of the primary VM, they are
>>>>> dedicated for the enclave. The Nitro hypervisor running on the host
>>>>> ensures memory and CPU isolation between the primary VM and the
>>>>> enclave VM.
>>>> I hope you properly take into consideration Hyper-Threading
>>>> speculative side-channel vulnerabilities here.
>>>> i.e. Usually cloud providers designate each CPU core to be assigned
>>>> to run only vCPUs of specific guest. To avoid sharing a single CPU
>>>> core between multiple guests.
>>>> To handle this properly, you need to use some kind of core-scheduling
>>>> mechanism (Such that each CPU core either runs only vCPUs of enclave
>>>> or only vCPUs of primary VM at any given point in time).
>>>>
>>>> In addition, can you elaborate more on how the enclave memory is
>>>> carved out of the primary VM?
>>>> Does this involve performing a memory hot-unplug operation from
>>>> primary VM or just unmap enclave-assigned guest physical pages from
>>>> primary VM's SLAT (EPT/NPT) and map them now only in enclave's SLAT?
>>>
>>> Correct, we take into consideration the HT setup. The enclave gets
>>> dedicated physical cores. The primary VM and the enclave VM don't run
>>> on CPU siblings of a physical core.
>> The way I would imagine this to work is that Primary-VM just specifies
>> how many vCPUs will the Enclave-VM have and those vCPUs will be set with
>> affinity to run on same physical CPU cores as Primary-VM.
>> But with the exception that scheduler is modified to not run vCPUs of
>> Primary-VM and Enclave-VM as sibling on the same physical CPU core
>> (core-scheduling). i.e. This is different than primary-VM losing
>> physical CPU cores permanently as long as the Enclave-VM is running.
>> Or maybe this should even be controlled by a knob in virtual PCI device
>> interface to allow flexibility to customer to decide if Enclave-VM needs
>> dedicated CPU cores or is it ok to share them with Primary-VM
>> as long as core-scheduling is used to guarantee proper isolation.
>
> Running both parent and enclave on the same core can *potentially*
> lead to L2 cache leakage, so we decided not to go with it :).
Haven't thought about the L2 cache. Makes sense. Ack.
>
>>>
>>> Regarding the memory carve out, the logic includes page table entries
>>> handling.
>> As I thought. Thanks for conformation.
>>>
>>> IIRC, memory hot-unplug can be used for the memory blocks that were
>>> previously hot-plugged.
>>>
>>> https://www.kernel.org/doc/html/latest/admin-guide/mm/memory-hotplug.html
>>>
>>>
>>>>
>>>> I don't quite understand why Enclave VM needs to be
>>>> provisioned/teardown during primary VM's runtime.
>>>>
>>>> For example, an alternative could have been to just provision both
>>>> primary VM and Enclave VM on primary VM startup.
>>>> Then, wait for primary VM to setup a communication channel with
>>>> Enclave VM (E.g. via virtio-vsock).
>>>> Then, primary VM is free to request Enclave VM to perform various
>>>> tasks when required on the isolated environment.
>>>>
>>>> Such setup will mimic a common Enclave setup. Such as Microsoft
>>>> Windows VBS EPT-based Enclaves (That all runs on VTL1). It is also
>>>> similar to TEEs running on ARM TrustZone.
>>>> i.e. In my alternative proposed solution, the Enclave VM is similar
>>>> to VTL1/TrustZone.
>>>> It will also avoid requiring introducing a new PCI device and driver.
>>>
>>> True, this can be another option, to provision the primary VM and the
>>> enclave VM at launch time.
>>>
>>> In the proposed setup, the primary VM starts with the initial
>>> allocated resources (memory, CPUs). The launch path of the enclave VM,
>>> as it's spawned on the same host, is done via the ioctl interface -
>>> PCI device - host hypervisor path. Short-running or long-running
>>> enclave can be bootstrapped during primary VM lifetime. Depending on
>>> the use case, a custom set of resources (memory and CPUs) is set for
>>> an enclave and then given back when the enclave is terminated; these
>>> resources can be used for another enclave spawned later on or the
>>> primary VM tasks.
>>>
>> Yes, I already understood this is how the mechanism work. I'm
>> questioning whether this is indeed a good approach that should also be
>> taken by upstream.
>
> I thought the point of Linux was to support devices that exist, rather
> than change the way the world works around it? ;)
I agree. Just poking around to see if upstream wants to implement a
different approach for Enclaves, regardless of accepting the Nitro
Enclave virtual PCI driver for AWS use-case of course.
>
>> The use-case of using Nitro Enclaves is for a Confidential-Computing
>> service. i.e. The ability to provision a compute instance that can be
>> trusted to perform a bunch of computation on sensitive
>> information with high confidence that it cannot be compromised as it's
>> highly isolated. Some technologies such as Intel SGX and AMD SEV
>> attempted to achieve this even with guarantees that
>> the computation is isolated from the hardware and hypervisor itself.
>
> Yeah, that worked really well, didn't it? ;)
You haven't seen me saying SGX worked well. :)
AMD SEV though still has its shot (once SEV-SNP is GA).
>
>> I would have expected that for the vast majority of real customer
>> use-cases, the customer will provision a compute instance that runs some
>> confidential-computing task in an enclave which it
>> keeps running for the entire life-time of the compute instance. As the
>> sole purpose of the compute instance is to just expose a service that
>> performs some confidential-computing task.
>> For those cases, it should have been sufficient to just pre-provision a
>> single Enclave-VM that performs this task, together with the compute
>> instance and connect them via virtio-vsock.
>> Without introducing any new virtual PCI device, guest PCI driver and
>> unique semantics of stealing resources (CPUs and Memory) from primary-VM
>> at runtime.
>
> You would also need to preprovision the image that runs in the
> enclave, which is usually only determined at runtime. For that you
> need the PCI driver anyway, so why not make the creation dynamic too?
The image doesn't have to be determined at runtime. It could be supplied
to the control-plane, as mentioned below.
>
>> In this Nitro Enclave architecture, we de-facto put Compute
>> control-plane abilities in the hands of the guest VM. Instead of
>> introducing new control-plane primitives that allows building
>> the data-plane architecture desired by the customer in a flexible
>> manner.
>> * What if the customer prefers to have it's Enclave VM polling S3 bucket
>> for new tasks and produce results to S3 as-well? Without having any
>> "Primary-VM" or virtio-vsock connection of any kind?
>> * What if for some use-cases customer wants Enclave-VM to have dedicated
>> compute power (i.e. Not share physical CPU cores with primary-VM. Not
>> even with core-scheduling) but for other
>> use-cases, customer prefers to share physical CPU cores with Primary-VM
>> (Together with core-scheduling guarantees)? (Although this could be
>> addressed by extending the virtual PCI device
>> interface with a knob to control this)
>>
>> An alternative would have been to have the following new control-plane
>> primitives:
>> * Ability to provision a VM without boot-volume, but instead from an
>> Image that is used to boot from memory. Allowing to provision
>> disk-less VMs.
>>    (E.g. Can be useful for other use-cases such as VMs not requiring EBS
>> at all which could allow cheaper compute instance)
>> * Ability to provision a group of VMs together as a group such that they
>> are guaranteed to launch as sibling VMs on the same host.
>> * Ability to create a fast-path connection between sibling VMs on the
>> same host with virtio-vsock. Or even also other shared-memory mechanism.
>> * Extend AWS Fargate with ability to run multiple microVMs as a group
>> (Similar to above) connected with virtio-vsock. To allow on-demand scale
>> of confidential-computing task.
>
> Yes, there are a *lot* of different ways to implement enclaves in a
> cloud environment. This is the one that we focused on, but I'm sure
> others in the space will have more ideas. It's definitely an
> interesting space and I'm eager to see more innovation happening :).
>
>> Having said that, I do see a similar architecture to Nitro Enclaves
>> virtual PCI device used for a different purpose: For hypervisor-based
>> security isolation (Such as Windows VBS).
>> E.g. Linux boot-loader can detect the presence of this virtual PCI
>> device and use it to provision multiple VM security domains. Such that
>> when a security domain is created,
>> it is specified what is the hardware resources it have access to (Guest
>> memory pages, IOPorts, MSRs and etc.) and the blob it should run to
>> bootstrap. Similar, but superior than,
>> Hyper-V VSM. In addition, some security domains will be given special
>> abilities to control other security domains (For example, to control the
>> +XS,+XU EPT bits of other security
>> domains to enforce code-integrity. Similar to Windows VBS HVCI). Just an
>> idea... :)
>
> Yes, absolutely! So much fun to be had :D

:)

-Liran

>
>
> Alex
>
>
>
>
>

2020-04-29 13:22:56

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v1 00/15] Add support for Nitro Enclaves

On 28/04/20 17:07, Alexander Graf wrote:
>> So why not just start running the enclave at 0xfffffff0 in real mode?
>> Yes everybody hates it, but that's what OSes are written against. In
>> the simplest example, the parent enclave can load bzImage and initrd at
>> 0x10000 and place firmware tables (MPTable and DMI) somewhere at
>> 0xf0000; the firmware would just be a few movs to segment registers
>> followed by a long jmp.
>
> There is a bit of initial attestation flow in the enclave, so that
> you can be sure that the code that is running is actually what you wanted to
> run.

Can you explain this, since it's not documented?

>   vm = ne_create(vcpus = 4)
>   ne_set_memory(vm, hva, len)
>   ne_load_image(vm, addr, len)
>   ne_start(vm)
>
> That way we would get the EIF loading into kernel space. "LOAD_IMAGE"
> would only be available in the time window between set_memory and start.
> It basically implements a memcpy(), but it would completely hide the
> hidden semantics of where an EIF has to go, so future device versions
> (or even other enclave implementers) could change the logic.
>
> I think it also makes sense to just allocate those 4 ioctls from
> scratch. Paolo, would you still want to "donate" KVM ioctl space in that
> case?

Sure, that's not a problem.

Paolo

> Overall, the above should address most of the concerns you raised in
> this mail, right? It still requires copying, but at least we don't have
> to keep the copy in kernel space.

2020-04-29 16:34:41

by Paraschiv, Andra-Irina

[permalink] [raw]
Subject: Re: [PATCH v1 04/15] nitro_enclaves: Init PCI device driver



On 25/04/2020 17:25, Liran Alon wrote:
>
> On 21/04/2020 21:41, Andra Paraschiv wrote:
>> +
>> +/**
>> + * ne_setup_msix - Setup MSI-X vectors for the PCI device.
>> + *
>> + * @pdev: PCI device to setup the MSI-X for.
>> + * @ne_pci_dev: PCI device private data structure.
>> + *
>> + * @returns: 0 on success, negative return value on failure.
>> + */
>> +static int ne_setup_msix(struct pci_dev *pdev, struct ne_pci_dev
>> *ne_pci_dev)
>> +{
>> +    int nr_vecs = 0;
>> +    int rc = -EINVAL;
>> +
>> +    BUG_ON(!ne_pci_dev);
> This kind of defensive programming does not align with Linux coding
> convention.
> I think these BUG_ON() conditions should be removed.

I replaced them with WARN_ON here and in the other places in the codebase.
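
So instead of e.g. BUG_ON(!ne_pci_dev), the checks are now, roughly, early
returns of the form:

    if (WARN_ON(!ne_pci_dev))
        return -EINVAL;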

>> +
>> +    nr_vecs = pci_msix_vec_count(pdev);
>> +    if (nr_vecs < 0) {
>> +        rc = nr_vecs;
>> +
>> +        dev_err_ratelimited(&pdev->dev,
>> +                    "Failure in getting vec count [rc=%d]\n",
>> +                    rc);
>> +
>> +        return rc;
>> +    }
>> +
>> +    rc = pci_alloc_irq_vectors(pdev, nr_vecs, nr_vecs, PCI_IRQ_MSIX);
>> +    if (rc < 0) {
>> +        dev_err_ratelimited(&pdev->dev,
>> +                    "Failure in alloc MSI-X vecs [rc=%d]\n",
>> +                    rc);
>> +
>> +        goto err_alloc_irq_vecs;
> You should just replace this with "return rc;" as no cleanup is
> required here.

Done.

>> +    }
>> +
>> +    return 0;
>> +
>> +err_alloc_irq_vecs:
>> +    return rc;
>> +}
>> +
>> +/**
>> + * ne_pci_dev_enable - Select PCI device version and enable it.
>> + *
>> + * @pdev: PCI device to select version for and then enable.
>> + * @ne_pci_dev: PCI device private data structure.
>> + *
>> + * @returns: 0 on success, negative return value on failure.
>> + */
>> +static int ne_pci_dev_enable(struct pci_dev *pdev,
>> +                 struct ne_pci_dev *ne_pci_dev)
>> +{
>> +    u8 dev_enable_reply = 0;
>> +    u16 dev_version_reply = 0;
>> +
>> +    BUG_ON(!pdev);
>> +    BUG_ON(!ne_pci_dev);
>> +    BUG_ON(!ne_pci_dev->iomem_base);
> Same.
>> +
>> +    iowrite16(NE_VERSION_MAX, ne_pci_dev->iomem_base + NE_VERSION);
>> +
>> +    dev_version_reply = ioread16(ne_pci_dev->iomem_base + NE_VERSION);
>> +    if (dev_version_reply != NE_VERSION_MAX) {
>> +        dev_err_ratelimited(&pdev->dev,
>> +                    "Failure in pci dev version cmd\n");
>> +
>> +        return -EIO;
>> +    }
>> +
>> +    iowrite8(NE_ENABLE_ON, ne_pci_dev->iomem_base + NE_ENABLE);
>> +
>> +    dev_enable_reply = ioread8(ne_pci_dev->iomem_base + NE_ENABLE);
>> +    if (dev_enable_reply != NE_ENABLE_ON) {
>> +        dev_err_ratelimited(&pdev->dev,
>> +                    "Failure in pci dev enable cmd\n");
>> +
>> +        return -EIO;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +/**
>> + * ne_pci_dev_disable - Disable PCI device.
>> + *
>> + * @pdev: PCI device to disable.
>> + * @ne_pci_dev: PCI device private data structure.
>> + *
>> + * @returns: 0 on success, negative return value on failure.
>> + */
>> +static int ne_pci_dev_disable(struct pci_dev *pdev,
>> +                  struct ne_pci_dev *ne_pci_dev)
>> +{
>> +    u8 dev_disable_reply = 0;
>> +
>> +    BUG_ON(!pdev);
>> +    BUG_ON(!ne_pci_dev);
>> +    BUG_ON(!ne_pci_dev->iomem_base);
> Same.
>> +
>> +    iowrite8(NE_ENABLE_OFF, ne_pci_dev->iomem_base + NE_ENABLE);
>> +
>> +    /*
>> +     * TODO: Check for NE_ENABLE_OFF in a loop, to handle cases when
>> the
>> +     * device state is not immediately set to disabled and going
>> through a
>> +     * transitory state of disabling.
>> +     */
>> +    dev_disable_reply = ioread8(ne_pci_dev->iomem_base + NE_ENABLE);
>> +    if (dev_disable_reply != NE_ENABLE_OFF) {
>> +        dev_err_ratelimited(&pdev->dev,
>> +                    "Failure in pci dev disable cmd\n");
>> +
>> +        return -EIO;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static int ne_probe(struct pci_dev *pdev, const struct pci_device_id
>> *id)
>> +{
>> +    struct ne_pci_dev *ne_pci_dev = NULL;
>> +    int rc = -EINVAL;
> Unnecessary variable initialization.
> ne_pci_dev and rc are initialized below always before they are used.

I would rather keep the initialization in place overall, to not have a
mix of initialized and uninitialized vars.

>> +
>> +    ne_pci_dev = kzalloc(sizeof(*ne_pci_dev), GFP_KERNEL);
>> +    if (!ne_pci_dev)
>> +        return -ENOMEM;
>> +
>> +    rc = pci_enable_device(pdev);
>> +    if (rc < 0) {
>> +        dev_err_ratelimited(&pdev->dev,
>> +                    "Failure in pci dev enable [rc=%d]\n", rc);
>> +
> Why is this dev_err_ratelimited() instead of dev_err()?
> Same for the rest of error printing in this probe() method and other
> places in this patch.

Just to avoid any misbehaving scenario where there would be way too many
logs in a short timeframe. That may not happen here, but it could while
handling PCI dev requests.

>> +        goto err_pci_enable_dev;
> I find it confusing that the error labels are named based on the
> failure-case in which they are used,
> instead of the action they perform (i.e. unwind the previous successful
> operation that requires unwinding).
> This doesn't seem to match the Linux kernel coding convention.
> It also creates 2 unnecessary labels pointing to the same place in the
> cleanup code.

Yep, that's better this way wrt the naming of the labels. I updated the
gotos in the patch series.
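
As a sketch of the updated naming (not the exact v2 code), the probe error
path now reads along these lines, with each label named after the unwind
action it performs:

    rc = pci_request_regions_exclusive(pdev, "ne_pci_dev");
    if (rc < 0)
        goto disable_pci_dev;

    ne_pci_dev->iomem_base = pci_iomap(pdev, PCI_BAR_NE, 0);
    if (!ne_pci_dev->iomem_base) {
        rc = -ENOMEM;
        goto release_pci_regions;
    }

    return 0;

release_pci_regions:
    pci_release_regions(pdev);
disable_pci_dev:
    pci_disable_device(pdev);

    return rc;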

>> +    }
>> +
>> +    rc = pci_request_regions_exclusive(pdev, "ne_pci_dev");
>> +    if (rc < 0) {
>> +        dev_err_ratelimited(&pdev->dev,
>> +                    "Failure in pci request regions [rc=%d]\n",
>> +                    rc);
>> +
>> +        goto err_req_regions;
>> +    }
>> +
>> +    ne_pci_dev->iomem_base = pci_iomap(pdev, PCI_BAR_NE, 0);
>> +    if (!ne_pci_dev->iomem_base) {
>> +        rc = -ENOMEM;
>> +
>> +        dev_err_ratelimited(&pdev->dev,
>> +                    "Failure in pci bar mapping [rc=%d]\n", rc);
>> +
>> +        goto err_iomap;
>> +    }
>> +
>> +    rc = ne_setup_msix(pdev, ne_pci_dev);
>> +    if (rc < 0) {
>> +        dev_err_ratelimited(&pdev->dev,
>> +                    "Failure in pci dev msix setup [rc=%d]\n",
>> +                    rc);
>> +
>> +        goto err_setup_msix;
>> +    }
>> +
>> +    rc = ne_pci_dev_disable(pdev, ne_pci_dev);
>> +    if (rc < 0) {
>> +        dev_err_ratelimited(&pdev->dev,
>> +                    "Failure in ne_pci_dev disable [rc=%d]\n",
>> +                    rc);
>> +
>> +        goto err_ne_pci_dev_disable;
>> +    }
> It seems weird that we need to disable the device before enabling it
> on the probe() method.
> Why can't we just enable the device without disabling it?

The pci dev disable call cleans up the internal state of the device and
terminates any running / "dangling" enclaves; here it is included just
in case there is any remaining state from a previous PCI device use. The
enable call below would fail in that case, though.

>> +
>> +    rc = ne_pci_dev_enable(pdev, ne_pci_dev);
>> +    if (rc < 0) {
>> +        dev_err_ratelimited(&pdev->dev,
>> +                    "Failure in ne_pci_dev enable [rc=%d]\n",
>> +                    rc);
>> +
>> +        goto err_ne_pci_dev_enable;
>> +    }
>> +
>> +    atomic_set(&ne_pci_dev->cmd_reply_avail, 0);
>> +    init_waitqueue_head(&ne_pci_dev->cmd_reply_wait_q);
>> +    INIT_LIST_HEAD(&ne_pci_dev->enclaves_list);
>> +    mutex_init(&ne_pci_dev->enclaves_list_mutex);
>> +    mutex_init(&ne_pci_dev->pci_dev_mutex);
>> +
>> +    pci_set_drvdata(pdev, ne_pci_dev);
> If you would have pci_set_drvdata() as one of the first operations in
> ne_probe(), then you could have avoided
> passing both struct pci_dev  and struct ne_pci_dev parameters to
> ne_setup_msix(), ne_pci_dev_enable() and ne_pci_dev_disable().
> Which would have been a bit more elegant.

Fair point. I moved pci_set_drvdata() earlier in the logic and updated the
signature of the functions to only include the pci_dev parameter.

>> +
>> +    return 0;
>> +
>> +err_ne_pci_dev_enable:
>> +err_ne_pci_dev_disable:
>> +    pci_free_irq_vectors(pdev);
>> +err_setup_msix:
>> +    pci_iounmap(pdev, ne_pci_dev->iomem_base);
>> +err_iomap:
>> +    pci_release_regions(pdev);
>> +err_req_regions:
>> +    pci_disable_device(pdev);
>> +err_pci_enable_dev:
>> +    kzfree(ne_pci_dev);
> An empty new-line is appropriate here.
> To separate the return statement from the cleanup logic.

Done.

>> +    return rc;
>> +}
>> +
>> +static void ne_remove(struct pci_dev *pdev)
>> +{
>> +    struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
>> +
>> +    if (!ne_pci_dev || !ne_pci_dev->iomem_base)
>> +        return;
> Why is this condition necessary?
> The ne_remove() function should be called only in case ne_probe()
> succeeded.
> In that case, both ne_pci_dev and ne_pci_dev->iomem_base should be
> non-NULL.

Correct, that shouldn't happen.

Just for early exit in case of bad behavior.

>> +
>> +    ne_pci_dev_disable(pdev, ne_pci_dev);
>> +
>> +    pci_set_drvdata(pdev, NULL);
>> +
>> +    pci_free_irq_vectors(pdev);
>> +
>> +    pci_iounmap(pdev, ne_pci_dev->iomem_base);
>> +
>> +    kzfree(ne_pci_dev);
>> +
>> +    pci_release_regions(pdev);
>> +
>> +    pci_disable_device(pdev);
> You should aspire to keep ne_remove() order of operations to be the
> reverse order of operations done in ne_probe().
> Which would also nicely match the order of operations done in
> ne_probe() cleanup.
> i.e. The following order:
>
> pci_set_drvdata();
> ne_pci_dev_disable();
> pci_free_irq_vectors();
> pci_iounmap();
> pci_release_regions();
> pci_disable_device()
> kzfree();

I updated the order of operations.
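
So, as a sketch (keeping the v1 function signatures), the remove path now
mirrors the probe path in reverse:

    static void ne_remove(struct pci_dev *pdev)
    {
        struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);

        pci_set_drvdata(pdev, NULL);

        ne_pci_dev_disable(pdev, ne_pci_dev);

        pci_free_irq_vectors(pdev);

        pci_iounmap(pdev, ne_pci_dev->iomem_base);

        pci_release_regions(pdev);

        pci_disable_device(pdev);

        kzfree(ne_pci_dev);
    }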


Thanks for the review, Liran.

Andra






2020-04-29 17:05:49

by Paraschiv, Andra-Irina

[permalink] [raw]
Subject: Re: [PATCH v1 05/15] nitro_enclaves: Handle PCI device command requests



On 25/04/2020 17:52, Liran Alon wrote:
>
> On 21/04/2020 21:41, Andra Paraschiv wrote:
>> The Nitro Enclaves PCI device exposes a MMIO space that this driver
>> uses to submit command requests and to receive command replies e.g. for
>> enclave creation / termination or setting enclave resources.
>>
>> Add logic for handling PCI device command requests based on the given
>> command type.
>>
>> Register an MSI-X interrupt vector for command reply notifications to
>> handle this type of communication events.
>>
>> Signed-off-by: Alexandru-Catalin Vasile <[email protected]>
>> Signed-off-by: Andra Paraschiv <[email protected]>
>> ---
>>   .../virt/amazon/nitro_enclaves/ne_pci_dev.c   | 264 ++++++++++++++++++
>>   1 file changed, 264 insertions(+)
>>
>> diff --git a/drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c
>> b/drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c
>> index 8fbee95ea291..7453d129689a 100644
>> --- a/drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c
>> +++ b/drivers/virt/amazon/nitro_enclaves/ne_pci_dev.c
>> @@ -40,6 +40,251 @@ static const struct pci_device_id ne_pci_ids[] = {
>>     MODULE_DEVICE_TABLE(pci, ne_pci_ids);
>>   +/**
>> + * ne_submit_request - Submit command request to the PCI device
>> based on the
>> + * command type.
>> + *
>> + * This function gets called with the ne_pci_dev mutex held.
>> + *
>> + * @pdev: PCI device to send the command to.
>> + * @cmd_type: command type of the request sent to the PCI device.
>> + * @cmd_request: command request payload.
>> + * @cmd_request_size: size of the command request payload.
>> + *
>> + * @returns: 0 on success, negative return value on failure.
>> + */
>> +static int ne_submit_request(struct pci_dev *pdev,
>> +                 enum ne_pci_dev_cmd_type cmd_type,
>> +                 void *cmd_request, size_t cmd_request_size)
>> +{
>> +    struct ne_pci_dev *ne_pci_dev = NULL;
> These local vars are unnecessarily initialized.

I would keep this initialized overall.

>> +
>> +    BUG_ON(!pdev);
>> +
>> +    ne_pci_dev = pci_get_drvdata(pdev);
>> +    BUG_ON(!ne_pci_dev);
>> +    BUG_ON(!ne_pci_dev->iomem_base);
> You should remove these defensive BUG_ON() calls.

Done.

>> +
>> +    if (WARN_ON(cmd_type <= INVALID_CMD || cmd_type >= MAX_CMD)) {
>> +        dev_err_ratelimited(&pdev->dev, "Invalid cmd type=%d\n",
>> +                    cmd_type);
>> +
>> +        return -EINVAL;
>> +    }
>> +
>> +    if (WARN_ON(!cmd_request))
>> +        return -EINVAL;
>> +
>> +    if (WARN_ON(cmd_request_size > NE_SEND_DATA_SIZE)) {
>> +        dev_err_ratelimited(&pdev->dev,
>> +                    "Invalid req size=%ld for cmd type=%d\n",
>> +                    cmd_request_size, cmd_type);
>> +
>> +        return -EINVAL;
>> +    }
> It doesn't make sense to have WARN_ON() print error to dmesg on every
> evaluation to true,
> together with using dev_err_ratelimited() which attempts to rate-limit
> prints.
>
> Anyway, these conditions were already checked by ne_do_request(). Why
> also check them here?

Updated to not use WARN_ON. Right, they were checked before, but I kept
them here just for checking the parameters.

>
>> +
>> +    memcpy_toio(ne_pci_dev->iomem_base + NE_SEND_DATA, cmd_request,
>> +            cmd_request_size);
>> +
>> +    iowrite32(cmd_type, ne_pci_dev->iomem_base + NE_COMMAND);
>> +
>> +    return 0;
>> +}
>> +
>> +/**
>> + * ne_retrieve_reply - Retrieve reply from the PCI device.
>> + *
>> + * This function gets called with the ne_pci_dev mutex held.
>> + *
>> + * @pdev: PCI device to receive the reply from.
>> + * @cmd_reply: command reply payload.
>> + * @cmd_reply_size: size of the command reply payload.
>> + *
>> + * @returns: 0 on success, negative return value on failure.
>> + */
>> +static int ne_retrieve_reply(struct pci_dev *pdev,
>> +                 struct ne_pci_dev_cmd_reply *cmd_reply,
>> +                 size_t cmd_reply_size)
>> +{
>> +    struct ne_pci_dev *ne_pci_dev = NULL;
> These local vars are unnecessarily initialized.
>> +
>> +    BUG_ON(!pdev);
>> +
>> +    ne_pci_dev = pci_get_drvdata(pdev);
>> +    BUG_ON(!ne_pci_dev);
>> +    BUG_ON(!ne_pci_dev->iomem_base);
> You should remove these defensive BUG_ON() calls.
>> +
>> +    if (WARN_ON(!cmd_reply))
>> +        return -EINVAL;
>> +
>> +    if (WARN_ON(cmd_reply_size > NE_RECV_DATA_SIZE)) {
>> +        dev_err_ratelimited(&pdev->dev, "Invalid reply size=%ld\n",
>> +                    cmd_reply_size);
>> +
>> +        return -EINVAL;
>> +    }
> It doesn't make sense to have WARN_ON() print error to dmesg on every
> evaluation to true,
> together with using dev_err_ratelimited() which attempts to rate-limit
> prints.
>
> Anyway, these conditions were already checked by ne_do_request(). Why
> also check them here?
>
>> +
>> +    memcpy_fromio(cmd_reply, ne_pci_dev->iomem_base + NE_RECV_DATA,
>> +              cmd_reply_size);
>> +
>> +    return 0;
>> +}
>> +
>> +/**
>> + * ne_wait_for_reply - Wait for a reply of a PCI command.
>> + *
>> + * This function gets called with the ne_pci_dev mutex held.
>> + *
>> + * @pdev: PCI device for which a reply is waited.
>> + *
>> + * @returns: 0 on success, negative return value on failure.
>> + */
>> +static int ne_wait_for_reply(struct pci_dev *pdev)
>> +{
>> +    struct ne_pci_dev *ne_pci_dev = NULL;
>> +    int rc = -EINVAL;
> These local vars are unnecessarily initialized.
>> +
>> +    BUG_ON(!pdev);
>> +
>> +    ne_pci_dev = pci_get_drvdata(pdev);
>> +    BUG_ON(!ne_pci_dev);
> You should remove these defensive BUG_ON() calls.
>> +
>> +    /*
>> +     * TODO: Update to _interruptible and handle interrupted wait event
>> +     * e.g. -ERESTARTSYS, incoming signals + add / update timeout.
>> +     */
>> +    rc = wait_event_timeout(ne_pci_dev->cmd_reply_wait_q,
>> +                atomic_read(&ne_pci_dev->cmd_reply_avail) != 0,
>> +                msecs_to_jiffies(DEFAULT_TIMEOUT_MSECS));
>> +    if (!rc) {
>> +        pr_err("Wait event timed out when waiting for PCI cmd
>> reply\n");
>> +
>> +        return -ETIMEDOUT;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +int ne_do_request(struct pci_dev *pdev, enum ne_pci_dev_cmd_type cmd_type,
>> +          void *cmd_request, size_t cmd_request_size,
>> +          struct ne_pci_dev_cmd_reply *cmd_reply, size_t cmd_reply_size)
> This function is introduced in this patch but it is not used.
> It will cause compiling the kernel on this commit to raise
> warnings/errors on unused functions.
> You should introduce functions on the patch that they are used.

This function is externally available, via the ne_pci_dev header, so it
shouldn't raise warnings.
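For reference, the prototype exported via the header is along these lines
(matching the function definition in this patch):

    /* drivers/virt/amazon/nitro_enclaves/ne_pci_dev.h */
    int ne_do_request(struct pci_dev *pdev, enum ne_pci_dev_cmd_type cmd_type,
              void *cmd_request, size_t cmd_request_size,
              struct ne_pci_dev_cmd_reply *cmd_reply, size_t cmd_reply_size);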

>> +{
>> +    struct ne_pci_dev *ne_pci_dev = NULL;
>> +    int rc = -EINVAL;
> These local vars are unnecessarily initialized.
>> +
>> +    BUG_ON(!pdev);
>> +
>> +    ne_pci_dev = pci_get_drvdata(pdev);
>> +    BUG_ON(!ne_pci_dev);
>> +    BUG_ON(!ne_pci_dev->iomem_base);
> You should remove these defensive BUG_ON() calls.
>> +
>> +    if (WARN_ON(cmd_type <= INVALID_CMD || cmd_type >= MAX_CMD)) {
>> +        dev_err_ratelimited(&pdev->dev, "Invalid cmd type=%d\n",
>> +                    cmd_type);
>> +
>> +        return -EINVAL;
>> +    }
>> +
>> +    if (WARN_ON(!cmd_request))
>> +        return -EINVAL;
>> +
>> +    if (WARN_ON(cmd_request_size > NE_SEND_DATA_SIZE)) {
>> +        dev_err_ratelimited(&pdev->dev,
>> +                    "Invalid req size=%ld for cmd type=%d\n",
>> +                    cmd_request_size, cmd_type);
>> +
>> +        return -EINVAL;
>> +    }
>> +
>> +    if (WARN_ON(!cmd_reply))
>> +        return -EINVAL;
>> +
>> +    if (WARN_ON(cmd_reply_size > NE_RECV_DATA_SIZE)) {
>> +        dev_err_ratelimited(&pdev->dev, "Invalid reply size=%ld\n",
>> +                    cmd_reply_size);
>> +
>> +        return -EINVAL;
>> +    }
> I would consider specifying all these conditions in function
> documentation instead of enforcing them at runtime on every function
> call.

I think that both PCI dev logic checks and documentation would be
helpful in this case. :)
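For the documentation part, the preconditions can go into the kernel-doc
comment, e.g. a sketch for ne_do_request():

    /**
     * ne_do_request - Submit a command request to the PCI device and wait for
     * its reply.
     *
     * @pdev: PCI device to send the command to and receive the reply from.
     * @cmd_type: command type, in the (INVALID_CMD, MAX_CMD) interval.
     * @cmd_request: non-NULL command request payload.
     * @cmd_request_size: size of the request payload, at most NE_SEND_DATA_SIZE.
     * @cmd_reply: non-NULL command reply payload.
     * @cmd_reply_size: size of the reply payload, at most NE_RECV_DATA_SIZE.
     *
     * Context: Takes and releases the ne_pci_dev mutex; must not be called with
     * it already held.
     *
     * @returns: 0 on success, negative return value on failure.
     */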

>> +
>> +    /*
>> +     * Use this mutex so that the PCI device handles one command request
>> +     * at a time.
>> +     */
>> +    mutex_lock(&ne_pci_dev->pci_dev_mutex);
>> +
>> +    atomic_set(&ne_pci_dev->cmd_reply_avail, 0);
>> +
>> +    rc = ne_submit_request(pdev, cmd_type, cmd_request, cmd_request_size);
>> +    if (rc < 0) {
>> +        dev_err_ratelimited(&pdev->dev,
>> +                    "Failure in submit cmd request [rc=%d]\n",
>> +                    rc);
>> +
>> +        mutex_unlock(&ne_pci_dev->pci_dev_mutex);
>> +
>> +        return rc;
> Consider leaving function with a goto to a label that unlocks mutex
> and then return.

Done, I added a goto for mutex unlock and return, both in this patch and in a
following one that has a similar cleanup structure.
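The error paths now funnel into a single label, roughly like this (a sketch of
the reworked ne_do_request() tail; the final patch may differ in details):

    rc = ne_submit_request(pdev, cmd_type, cmd_request, cmd_request_size);
    if (rc < 0) {
        dev_err_ratelimited(&pdev->dev,
                    "Failure in submit cmd request [rc=%d]\n", rc);

        goto unlock_mutex;
    }

    rc = ne_wait_for_reply(pdev);
    if (rc < 0) {
        dev_err_ratelimited(&pdev->dev,
                    "Failure in wait cmd reply [rc=%d]\n", rc);

        goto unlock_mutex;
    }

    rc = ne_retrieve_reply(pdev, cmd_reply, cmd_reply_size);
    if (rc < 0) {
        dev_err_ratelimited(&pdev->dev,
                    "Failure in retrieve cmd reply [rc=%d]\n", rc);

        goto unlock_mutex;
    }

    atomic_set(&ne_pci_dev->cmd_reply_avail, 0);

    if (cmd_reply->rc < 0) {
        dev_err_ratelimited(&pdev->dev,
                    "Failure in cmd process logic [rc=%d]\n", cmd_reply->rc);

        rc = cmd_reply->rc;
    }

unlock_mutex:
    mutex_unlock(&ne_pci_dev->pci_dev_mutex);

    return rc;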

>> +    }
>> +
>> +    rc = ne_wait_for_reply(pdev);
>> +    if (rc < 0) {
>> +        dev_err_ratelimited(&pdev->dev,
>> +                    "Failure in wait cmd reply [rc=%d]\n",
>> +                    rc);
>> +
>> +        mutex_unlock(&ne_pci_dev->pci_dev_mutex);
>> +
>> +        return rc;
>> +    }
>> +
>> +    rc = ne_retrieve_reply(pdev, cmd_reply, cmd_reply_size);
>> +    if (rc < 0) {
>> +        dev_err_ratelimited(&pdev->dev,
>> +                    "Failure in retrieve cmd reply [rc=%d]\n",
>> +                    rc);
>> +
>> +        mutex_unlock(&ne_pci_dev->pci_dev_mutex);
>> +
>> +        return rc;
>> +    }
>> +
>> +    atomic_set(&ne_pci_dev->cmd_reply_avail, 0);
>> +
>> +    if (cmd_reply->rc < 0) {
>> +        dev_err_ratelimited(&pdev->dev,
>> +                    "Failure in cmd process logic [rc=%d]\n",
>> +                    cmd_reply->rc);
>> +
>> +        mutex_unlock(&ne_pci_dev->pci_dev_mutex);
>> +
>> +        return cmd_reply->rc;
>> +    }
>> +
>> +    mutex_unlock(&ne_pci_dev->pci_dev_mutex);
>> +
>> +    return 0;
>> +}
>> +
>> +/**
>> + * ne_reply_handler - Interrupt handler for retrieving a reply matching
>> + * a request sent to the PCI device for enclave lifetime management.
>> + *
>> + * @irq: received interrupt for a reply sent by the PCI device.
>> + * @args: PCI device private data structure.
>> + *
>> + * @returns: IRQ_HANDLED on handled interrupt, IRQ_NONE otherwise.
>> + */
>> +static irqreturn_t ne_reply_handler(int irq, void *args)
>> +{
>> +    struct ne_pci_dev *ne_pci_dev = (struct ne_pci_dev *)args;
>> +
>> +    atomic_set(&ne_pci_dev->cmd_reply_avail, 1);
>> +
>> +    /* TODO: Update to _interruptible. */
>> +    wake_up(&ne_pci_dev->cmd_reply_wait_q);
>> +
>> +    return IRQ_HANDLED;
>> +}
>> +
>>   /**
>>    * ne_setup_msix - Setup MSI-X vectors for the PCI device.
>>    *
>> @@ -75,8 +320,25 @@ static int ne_setup_msix(struct pci_dev *pdev, struct ne_pci_dev *ne_pci_dev)
>>           goto err_alloc_irq_vecs;
>>       }
>>   +    /*
>> +     * This IRQ gets triggered every time the PCI device responds to a
>> +     * command request. The reply is then retrieved, reading from
>> the MMIO
>> +     * space of the PCI device.
>> +     */
>> +    rc = request_irq(pci_irq_vector(pdev, NE_VEC_REPLY),
>> +             ne_reply_handler, 0, "enclave_cmd", ne_pci_dev);
>> +    if (rc < 0) {
>> +        dev_err_ratelimited(&pdev->dev,
>> +                    "Failure in allocating irq reply [rc=%d]\n",
>> +                    rc);
>> +
>> +        goto err_req_irq_reply;
>> +    }
>> +
>>       return 0;
>>   +err_req_irq_reply:
>> +    pci_free_irq_vectors(pdev);
>>   err_alloc_irq_vecs:
>>       return rc;
>>   }
>> @@ -232,6 +494,7 @@ static int ne_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>>     err_ne_pci_dev_enable:
>>   err_ne_pci_dev_disable:
>> +    free_irq(pci_irq_vector(pdev, NE_VEC_REPLY), ne_pci_dev);
>>       pci_free_irq_vectors(pdev);
> I suggest to introduce a ne_teardown_msix() utility. That is aimed to
> cleanup after ne_setup_msix().

I added this functionality in a new function, then used it for cleanup in
this function and for teardown in the PCI remove function.
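As a rough sketch of that new helper (naming and exact placement may differ in
the next revision):

    static void ne_teardown_msix(struct pci_dev *pdev, struct ne_pci_dev *ne_pci_dev)
    {
        /* Mirror ne_setup_msix(): release the reply IRQ, then the vectors. */
        free_irq(pci_irq_vector(pdev, NE_VEC_REPLY), ne_pci_dev);

        pci_free_irq_vectors(pdev);
    }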

Thank you.

Andra

>>   err_setup_msix:
>>       pci_iounmap(pdev, ne_pci_dev->iomem_base);
>> @@ -255,6 +518,7 @@ static void ne_remove(struct pci_dev *pdev)
>>         pci_set_drvdata(pdev, NULL);
>>   +    free_irq(pci_irq_vector(pdev, NE_VEC_REPLY), ne_pci_dev);
>>       pci_free_irq_vectors(pdev);
>>         pci_iounmap(pdev, ne_pci_dev->iomem_base);




2020-04-30 10:37:02

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v1 00/15] Add support for Nitro Enclaves

On 28/04/20 17:07, Alexander Graf wrote:
>
> Why don't we build something like the following instead?
>
>   vm = ne_create(vcpus = 4)
>   ne_set_memory(vm, hva, len)
>   ne_load_image(vm, addr, len)
>   ne_start(vm)
>
> That way we would get the EIF loading into kernel space. "LOAD_IMAGE"
> would only be available in the time window between set_memory and start.
> It basically implements a memcpy(), but it would completely hide the
> hidden semantics of where an EIF has to go, so future device versions
> (or even other enclave implementers) could change the logic.

Can we add a file format argument and flags to ne_load_image, to avoid
having a v2 ioctl later?

Also, would you consider a mode where ne_load_image is not invoked and
the enclave starts in real mode at 0xfffffff0?

Thanks,

Paolo

2020-04-30 11:23:40

by Alexander Graf

[permalink] [raw]
Subject: Re: [PATCH v1 00/15] Add support for Nitro Enclaves



On 30.04.20 12:34, Paolo Bonzini wrote:
>
> On 28/04/20 17:07, Alexander Graf wrote:
>>
>> Why don't we build something like the following instead?
>>
>> vm = ne_create(vcpus = 4)
>> ne_set_memory(vm, hva, len)
>> ne_load_image(vm, addr, len)
>> ne_start(vm)
>>
>> That way we would get the EIF loading into kernel space. "LOAD_IMAGE"
>> would only be available in the time window between set_memory and start.
>> It basically implements a memcpy(), but it would completely hide the
>> hidden semantics of where an EIF has to go, so future device versions
>> (or even other enclave implementers) could change the logic.
>
> Can we add a file format argument and flags to ne_load_image, to avoid
> having a v2 ioctl later?

I think flags alone should be enough, no? A new format would just be a flag.

That said, any of the commands above should have flags IMHO.
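Just to illustrate (the struct and field names below are made up, not
something from the current series), the load image argument could carry flags
plus room to grow:

    struct ne_image_load_info {
        __u64 flags;       /* e.g. bit 0 = EIF; a new format is just a new flag */
        __u64 user_addr;   /* userspace address of the image to load */
        __u64 size;        /* size of the image, in bytes */
        __u64 reserved[5]; /* headroom, so no v2 ioctl is needed later */
    };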

> Also, would you consider a mode where ne_load_image is not invoked and
> the enclave starts in real mode at 0xfffffff0?

Consider, sure. But I don't quite see any big benefit just yet. The
current abstraction level for the booted payloads is much higher. That
allows us to simplify the device model dramatically: There is no need to
create a virtual flash region for example.

In addition, by moving firmware into the trusted base, firmware can
execute validation of the target image. If you make it all flat, how do
you verify whether what you're booting is what you think you're booting?

So in a nutshell, for a PV virtual machine spawning interface, I think
it would make sense to have memory fully owned by the parent. In the
enclave world, I would rather not give the parent too much
control over what memory actually means, outside of donating a bucket of it.


Alex





2020-04-30 11:43:03

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v1 00/15] Add support for Nitro Enclaves

On 30/04/20 13:21, Alexander Graf wrote:
>> Also, would you consider a mode where ne_load_image is not invoked and
>> the enclave starts in real mode at 0xfffffff0?
>
> Consider, sure. But I don't quite see any big benefit just yet. The
> current abstraction level for the booted payloads is much higher. That
> allows us to simplify the device model dramatically: There is no need to
> create a virtual flash region for example.

It doesn't have to be flash, it can be just ROM.

> In addition, by moving firmware into the trusted base, firmware can
> execute validation of the target image. If you make it all flat, how do
> you verify whether what you're booting is what you think you're booting?

So the issue would be that a firmware image provided by the parent could
be tampered with by something malicious running in the parent enclave?

Paolo

> So in a nutshell, for a PV virtual machine spawning interface, I think
> it would make sense to have memory fully owned by the parent. In the
> enclave world, I would rather not give the parent too much
> control over what memory actually means, outside of donating a bucket of
> it.

2020-04-30 11:49:13

by Alexander Graf

[permalink] [raw]
Subject: Re: [PATCH v1 00/15] Add support for Nitro Enclaves



On 30.04.20 13:38, Paolo Bonzini wrote:
>
> On 30/04/20 13:21, Alexander Graf wrote:
>>> Also, would you consider a mode where ne_load_image is not invoked and
>>> the enclave starts in real mode at 0xfffffff0?
>>
>> Consider, sure. But I don't quite see any big benefit just yet. The
>> current abstraction level for the booted payloads is much higher. That
>> allows us to simplify the device model dramatically: There is no need to
>> create a virtual flash region for example.
>
> It doesn't have to be flash, it can be just ROM.
>
>> In addition, by moving firmware into the trusted base, firmware can
>> execute validation of the target image. If you make it all flat, how do
>> you verify whether what you're booting is what you think you're booting?
>
> So the issue would be that a firmware image provided by the parent could
> be tampered with by something malicious running in the parent enclave?

You have to have a root of trust somewhere. That root then checks and
attests everything it runs. What exactly would you attest for with a
flat address space model?

So the issue is that the enclave code can not trust its own integrity if
it doesn't have anything at a higher level attesting it. The way this is
usually solved on bare metal systems is that you trust your CPU which
then checks the firmware integrity (Boot Guard). Where would you put
that check in a VM model? How close would it be to a normal VM then? And
if it's not, what's the point of sticking to such terrible legacy boot
paths?


Alex





2020-04-30 12:01:06

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v1 00/15] Add support for Nitro Enclaves

On 30/04/20 13:47, Alexander Graf wrote:
>>
>> So the issue would be that a firmware image provided by the parent could
>> be tampered with by something malicious running in the parent enclave?
>
> You have to have a root of trust somewhere. That root then checks and
> attests everything it runs. What exactly would you attest for with a
> flat address space model?
>
> So the issue is that the enclave code can not trust its own integrity if
> it doesn't have anything at a higher level attesting it. The way this is
> usually solved on bare metal systems is that you trust your CPU which
> then checks the firmware integrity (Boot Guard). Where would you put
> that check in a VM model?

In the enclave device driver, I would just limit the attestation to the
firmware image

So yeah it wouldn't be a mode where ne_load_image is not invoked and
the enclave starts in real mode at 0xfffffff0. You would still need
"load image" functionality.

> How close would it be to a normal VM then? And
> if it's not, what's the point of sticking to such terrible legacy boot
> paths?

The point is that there's already two plausible loaders for the kernel
(bzImage and ELF), so I'd like to decouple the loader and the image.

Paolo

2020-04-30 12:23:48

by Alexander Graf

[permalink] [raw]
Subject: Re: [PATCH v1 00/15] Add support for Nitro Enclaves



On 30.04.20 13:58, Paolo Bonzini wrote:
>
> On 30/04/20 13:47, Alexander Graf wrote:
>>>
>>> So the issue would be that a firmware image provided by the parent could
>>> be tampered with by something malicious running in the parent enclave?
>>
>> You have to have a root of trust somewhere. That root then checks and
>> attests everything it runs. What exactly would you attest for with a
>> flat address space model?
>>
>> So the issue is that the enclave code can not trust its own integrity if
>> it doesn't have anything at a higher level attesting it. The way this is
>> usually solved on bare metal systems is that you trust your CPU which
>> then checks the firmware integrity (Boot Guard). Where would you put
>> that check in a VM model?
>
> In the enclave device driver, I would just limit the attestation to the
> firmware image
>
> So yeah it wouldn't be a mode where ne_load_image is not invoked and
> the enclave starts in real mode at 0xfffffff0. You would still need
> "load image" functionality.
>
>> How close would it be to a normal VM then? And
>> if it's not, what's the point of sticking to such terrible legacy boot
>> paths?
>
> The point is that there's already two plausible loaders for the kernel
> (bzImage and ELF), so I'd like to decouple the loader and the image.

The loader is implemented by the enclave device. If it wishes to support
bzImage and ELF it does that. Today, it only does bzImage though IIRC :).

So yes, they are decoupled? Are you saying you would like to build your
own code in any way you like? Well, that means we either need to add
support for another loader in the enclave device or your workload just
fakes a bzImage header and gets loaded regardless :).


Alex





2020-04-30 14:03:22

by Paraschiv, Andra-Irina

[permalink] [raw]
Subject: Re: [PATCH v1 00/15] Add support for Nitro Enclaves



On 29/04/2020 16:20, Paolo Bonzini wrote:
> On 28/04/20 17:07, Alexander Graf wrote:
>>> So why not just start running the enclave at 0xfffffff0 in real mode?
>>> Yes everybody hates it, but that's what OSes are written against. In
>>> the simplest example, the parent enclave can load bzImage and initrd at
>>> 0x10000 and place firmware tables (MPTable and DMI) somewhere at
>>> 0xf0000; the firmware would just be a few movs to segment registers
>>> followed by a long jmp.
>> There is a bit of initial attestation flow in the enclave, so that
>> you can be sure that the code that is running is actually what you wanted to
>> run.
> Can you explain this, since it's not documented?

Hash values are computed for the entire enclave image (EIF), the kernel
and ramdisk(s). That's used, for example, to check that the enclave image
that is loaded in the enclave VM is the one that was intended to be run.

These crypto measurements are included in a signed attestation document
generated by the Nitro Hypervisor and further used to prove the identity
of the enclave. KMS is an example of a service that NE is integrated with
and that checks the attestation doc.

>
>>   vm = ne_create(vcpus = 4)
>>   ne_set_memory(vm, hva, len)
>>   ne_load_image(vm, addr, len)
>>   ne_start(vm)
>>
>> That way we would get the EIF loading into kernel space. "LOAD_IMAGE"
>> would only be available in the time window between set_memory and start.
>> It basically implements a memcpy(), but it would completely hide the
>> hidden semantics of where an EIF has to go, so future device versions
>> (or even other enclave implementers) could change the logic.
>>
>> I think it also makes sense to just allocate those 4 ioctls from
>> scratch. Paolo, would you still want to "donate" KVM ioctl space in that
>> case?
> Sure, that's not a problem.

Ok, thanks for the confirmation. I've updated the ioctl number documentation
to reflect the ioctl space update, taking into account the previous
discussion, the proposal above from Alex, the discussions we currently have,
and further easy extensibility of the user space interface.
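Just to sketch the direction for the dedicated ioctl space (the type letter,
numbers and argument types below are placeholders for illustration, not the
values that end up reserved in the ioctl number documentation):

    #define NE_MAGIC 0xAE /* placeholder type letter */

    #define NE_CREATE_VM     _IOR(NE_MAGIC, 0x20, __u64) /* e.g. returns an enclave slot id */
    #define NE_ADD_VCPU      _IOW(NE_MAGIC, 0x21, __u32) /* e.g. vcpu id */
    #define NE_SET_MEMORY    _IOW(NE_MAGIC, 0x22, __u64) /* e.g. reference to a memory region */
    #define NE_START_ENCLAVE _IOW(NE_MAGIC, 0x23, __u64) /* e.g. reference to start info (flags, vsock CID) */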

Thanks,
Andra

>> Overall, the above should address most of the concerns you raised in
>> this mail, right? It still requires copying, but at least we don't have
>> to keep the copy in kernel space.





2020-05-07 17:46:40

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH v1 00/15] Add support for Nitro Enclaves

Hi!

> > it uses its own memory and CPUs + its virtio-vsock emulated device for
> > communication with the primary VM.
> >
> > The memory and CPUs are carved out of the primary VM, they are dedicated
> > for the enclave. The Nitro hypervisor running on the host ensures memory
> > and CPU isolation between the primary VM and the enclave VM.
> >
> > These two components need to reflect the same state e.g. when the
> > enclave abstraction process (1) is terminated, the enclave VM (2) is
> > terminated as well.
> >
> > With regard to the communication channel, the primary VM has its own
> > emulated virtio-vsock PCI device. The enclave VM has its own emulated
> > virtio-vsock device as well. This channel is used, for example, to fetch
> > data in the enclave and then process it. An application that sets up the
> > vsock socket and connects or listens, depending on the use case, is then
> > developed to use this channel; this happens on both ends - primary VM
> > and enclave VM.
> >
> > Let me know if further clarifications are needed.
>
> Thanks, this is all useful. However can you please clarify the
> low-level details here?

Is the virtual machine manager open-source? If so, I guess pointer for sources
would be useful.

Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2020-05-08 07:02:50

by Paraschiv, Andra-Irina

[permalink] [raw]
Subject: Re: [PATCH v1 00/15] Add support for Nitro Enclaves



On 07/05/2020 20:44, Pavel Machek wrote:
>
> Hi!
>
>>> it uses its own memory and CPUs + its virtio-vsock emulated device for
>>> communication with the primary VM.
>>>
>>> The memory and CPUs are carved out of the primary VM, they are dedicated
>>> for the enclave. The Nitro hypervisor running on the host ensures memory
>>> and CPU isolation between the primary VM and the enclave VM.
>>>
>>> These two components need to reflect the same state e.g. when the
>>> enclave abstraction process (1) is terminated, the enclave VM (2) is
>>> terminated as well.
>>>
>>> With regard to the communication channel, the primary VM has its own
>>> emulated virtio-vsock PCI device. The enclave VM has its own emulated
>>> virtio-vsock device as well. This channel is used, for example, to fetch
>>> data in the enclave and then process it. An application that sets up the
>>> vsock socket and connects or listens, depending on the use case, is then
>>> developed to use this channel; this happens on both ends - primary VM
>>> and enclave VM.
>>>
>>> Let me know if further clarifications are needed.
>> Thanks, this is all useful. However can you please clarify the
>> low-level details here?
> Is the virtual machine manager open-source? If so, I guess pointer for sources
> would be useful.

Hi Pavel,

Thanks for reaching out.

The VMM that is used for the primary / parent VM is not open source.

Andra





2020-05-09 19:23:16

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH v1 00/15] Add support for Nitro Enclaves

On Fri 2020-05-08 10:00:27, Paraschiv, Andra-Irina wrote:
>
>
> On 07/05/2020 20:44, Pavel Machek wrote:
> >
> >Hi!
> >
> >>>it uses its own memory and CPUs + its virtio-vsock emulated device for
> >>>communication with the primary VM.
> >>>
> >>>The memory and CPUs are carved out of the primary VM, they are dedicated
> >>>for the enclave. The Nitro hypervisor running on the host ensures memory
> >>>and CPU isolation between the primary VM and the enclave VM.
> >>>
> >>>These two components need to reflect the same state e.g. when the
> >>>enclave abstraction process (1) is terminated, the enclave VM (2) is
> >>>terminated as well.
> >>>
> >>>With regard to the communication channel, the primary VM has its own
> >>>emulated virtio-vsock PCI device. The enclave VM has its own emulated
> >>>virtio-vsock device as well. This channel is used, for example, to fetch
> >>>data in the enclave and then process it. An application that sets up the
> >>>vsock socket and connects or listens, depending on the use case, is then
> >>>developed to use this channel; this happens on both ends - primary VM
> >>>and enclave VM.
> >>>
> >>>Let me know if further clarifications are needed.
> >>Thanks, this is all useful. However can you please clarify the
> >>low-level details here?
> >Is the virtual machine manager open-source? If so, I guess pointer for sources
> >would be useful.
>
> Hi Pavel,
>
> Thanks for reaching out.
>
> The VMM that is used for the primary / parent VM is not open source.

Do we want to merge code that the open source community cannot test?

Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2020-05-10 11:04:12

by Herrenschmidt, Benjamin

[permalink] [raw]
Subject: Re: [PATCH v1 00/15] Add support for Nitro Enclaves

On Sat, 2020-05-09 at 21:21 +0200, Pavel Machek wrote:
>
> On Fri 2020-05-08 10:00:27, Paraschiv, Andra-Irina wrote:
> >
> >
> > On 07/05/2020 20:44, Pavel Machek wrote:
> > >
> > > Hi!
> > >
> > > > > it uses its own memory and CPUs + its virtio-vsock emulated device for
> > > > > communication with the primary VM.
> > > > >
> > > > > The memory and CPUs are carved out of the primary VM, they are dedicated
> > > > > for the enclave. The Nitro hypervisor running on the host ensures memory
> > > > > and CPU isolation between the primary VM and the enclave VM.
> > > > >
> > > > > These two components need to reflect the same state e.g. when the
> > > > > enclave abstraction process (1) is terminated, the enclave VM (2) is
> > > > > terminated as well.
> > > > >
> > > > > With regard to the communication channel, the primary VM has its own
> > > > > emulated virtio-vsock PCI device. The enclave VM has its own emulated
> > > > > virtio-vsock device as well. This channel is used, for example, to fetch
> > > > > data in the enclave and then process it. An application that sets up the
> > > > > vsock socket and connects or listens, depending on the use case, is then
> > > > > developed to use this channel; this happens on both ends - primary VM
> > > > > and enclave VM.
> > > > >
> > > > > Let me know if further clarifications are needed.
> > > >
> > > > Thanks, this is all useful. However can you please clarify the
> > > > low-level details here?
> > >
> > > Is the virtual machine manager open-source? If so, I guess pointer for sources
> > > would be useful.
> >
> > Hi Pavel,
> >
> > Thanks for reaching out.
> >
> > The VMM that is used for the primary / parent VM is not open source.
>
> Do we want to merge code that the open source community cannot test?

Hehe.. this isn't quite the story Pavel :)

We merge support for proprietary hypervisors, this is no different. You
can test it, well at least you'll be able to ... when AWS deploys the
functionality. You don't need the hypervisor itself to be open source.

In fact, in this case, it's not even low level invasive arch code like
some of the above can be. It's a driver for a PCI device :-) Granted a
virtual one. We merge drivers for PCI devices routinely without the RTL
or firmware of those devices being open source.

So yes, we probably want this if it's going to be a useful feature to
users when running on AWS EC2. (Disclaimer: I work for AWS these days).

Cheers,
Ben.

2020-05-11 10:52:35

by Paraschiv, Andra-Irina

[permalink] [raw]
Subject: Re: [PATCH v1 00/15] Add support for Nitro Enclaves



On 10/05/2020 14:02, Herrenschmidt, Benjamin wrote:
> On Sat, 2020-05-09 at 21:21 +0200, Pavel Machek wrote:
>> On Fri 2020-05-08 10:00:27, Paraschiv, Andra-Irina wrote:
>>>
>>> On 07/05/2020 20:44, Pavel Machek wrote:
>>>> Hi!
>>>>
>>>>>> it uses its own memory and CPUs + its virtio-vsock emulated device for
>>>>>> communication with the primary VM.
>>>>>>
>>>>>> The memory and CPUs are carved out of the primary VM, they are dedicated
>>>>>> for the enclave. The Nitro hypervisor running on the host ensures memory
>>>>>> and CPU isolation between the primary VM and the enclave VM.
>>>>>>
>>>>>> These two components need to reflect the same state e.g. when the
>>>>>> enclave abstraction process (1) is terminated, the enclave VM (2) is
>>>>>> terminated as well.
>>>>>>
>>>>>> With regard to the communication channel, the primary VM has its own
>>>>>> emulated virtio-vsock PCI device. The enclave VM has its own emulated
>>>>>> virtio-vsock device as well. This channel is used, for example, to fetch
>>>>>> data in the enclave and then process it. An application that sets up the
>>>>>> vsock socket and connects or listens, depending on the use case, is then
>>>>>> developed to use this channel; this happens on both ends - primary VM
>>>>>> and enclave VM.
>>>>>>
>>>>>> Let me know if further clarifications are needed.
>>>>> Thanks, this is all useful. However can you please clarify the
>>>>> low-level details here?
>>>> Is the virtual machine manager open-source? If so, I guess pointer for sources
>>>> would be useful.
>>> Hi Pavel,
>>>
>>> Thanks for reaching out.
>>>
>>> The VMM that is used for the primary / parent VM is not open source.
>> Do we want to merge code that the open source community cannot test?
> Hehe.. this isn't quite the story Pavel :)
>
> We merge support for proprietary hypervisors, this is no different. You
> can test it, well at least you'll be able to ... when AWS deploys the
> functionality. You don't need the hypervisor itself to be open source.
>
> In fact, in this case, it's not even low level invasive arch code like
> some of the above can be. It's a driver for a PCI device :-) Granted a
> virtual one. We merge drivers for PCI devices routinely without the RTL
> or firmware of those devices being open source.
>
> So yes, we probably want this if it's going to be a useful feature to
> users when running on AWS EC2. (Disclaimer: I work for AWS these days).

Indeed, it will be available for checking out how it works.

The discussions are ongoing here on the LKML - understanding the
context, clarifying items, sharing feedback, and coming with codebase
updates and a basic example flow of the ioctl interface usage. This all
helps with the path towards merging.

Thanks, Ben, for the follow-up.

Andra





2020-05-11 12:09:14

by Paraschiv, Andra-Irina

[permalink] [raw]
Subject: Re: [PATCH v1 00/15] Add support for Nitro Enclaves



On 10/05/2020 12:57, Li Qiang wrote:
>
>
> Paraschiv, Andra-Irina <[email protected]> wrote on Friday, April 24, 2020 at 10:03 PM:
>
>
>
> On 24/04/2020 12:59, Tian, Kevin wrote:
> >
> >> From: Paraschiv, Andra-Irina
> >> Sent: Thursday, April 23, 2020 9:20 PM
> >>
> >> On 22/04/2020 00:46, Paolo Bonzini wrote:
> >>> On 21/04/20 20:41, Andra Paraschiv wrote:
> >>>> An enclave communicates with the primary VM via a local
> communication
> >> channel,
> >>>> using virtio-vsock [2]. An enclave does not have a disk or a
> network device
> >>>> attached.
> >>> Is it possible to have a sample of this in the samples/ directory?
> >> I can add in v2 a sample file including the basic flow of how
> to use the
> >> ioctl interface to create / terminate an enclave.
> >>
> >> Then we can update / build on top it based on the ongoing
> discussions on
> >> the patch series and the received feedback.
> >>
> >>> I am interested especially in:
> >>>
> >>> - the initial CPU state: CPL0 vs. CPL3, initial program
> counter, etc.
> >>>
> >>> - the communication channel; does the enclave see the usual
> local APIC
> >>> and IOAPIC interfaces in order to get interrupts from
> virtio-vsock, and
> >>> where is the virtio-vsock device (virtio-mmio I suppose)
> placed in memory?
> >>>
> >>> - what the enclave is allowed to do: can it change privilege
> levels,
> >>> what happens if the enclave performs an access to nonexistent
> memory,
> >> etc.
> >>> - whether there are special hypercall interfaces for the enclave
> >> An enclave is a VM, running on the same host as the primary VM,
> that
> >> launched the enclave. They are siblings.
> >>
> >> Here we need to think of two components:
> >>
> >> 1. An enclave abstraction process - a process running in the
> primary VM
> >> guest, that uses the provided ioctl interface of the Nitro Enclaves
> >> kernel driver to spawn an enclave VM (that's 2 below).
> >>
> >> How does all gets to an enclave VM running on the host?
> >>
> >> There is a Nitro Enclaves emulated PCI device exposed to the
> primary VM.
> >> The driver for this new PCI device is included in the current
> patch series.
> >>
> >> The ioctl logic is mapped to PCI device commands e.g. the
> >> NE_ENCLAVE_START ioctl maps to an enclave start PCI command or the
> >> KVM_SET_USER_MEMORY_REGION maps to an add memory PCI command.
> >> The PCI
> >> device commands are then translated into actions taken on the
> hypervisor
> >> side; that's the Nitro hypervisor running on the host where the
> primary
> >> VM is running.
> >>
> >> 2. The enclave itself - a VM running on the same host as the
> primary VM
> >> that spawned it.
> >>
> >> The enclave VM has no persistent storage or network interface
> attached,
> >> it uses its own memory and CPUs + its virtio-vsock emulated
> device for
> >> communication with the primary VM.
> > sounds like a firecracker VM?
>
> It's a VM crafted for enclave needs.
>
> >
> >> The memory and CPUs are carved out of the primary VM, they are
> dedicated
> >> for the enclave. The Nitro hypervisor running on the host
> ensures memory
> >> and CPU isolation between the primary VM and the enclave VM.
> > In last paragraph, you said that the enclave VM uses its own
> memory and
> > CPUs. Then here, you said the memory/CPUs are carved out and
> dedicated
> > from the primary VM. Can you elaborate which one is accurate? or
> a mixed
> > model?
>
> Memory and CPUs are carved out of the primary VM and are dedicated
> for
> the enclave VM. I mentioned above as "its own" in the sense that the
> primary VM doesn't use these carved out resources while the
> enclave is
> running, as they are dedicated to the enclave.
>
> Hope that now it's more clear.
>
> >
> >>
> >> These two components need to reflect the same state e.g. when the
> >> enclave abstraction process (1) is terminated, the enclave VM
> (2) is
> >> terminated as well.
> >>
> >> With regard to the communication channel, the primary VM has
> its own
> >> emulated virtio-vsock PCI device. The enclave VM has its own
> emulated
> >> virtio-vsock device as well. This channel is used, for example,
> to fetch
> >> data in the enclave and then process it. An application that
> sets up the
> >> vsock socket and connects or listens, depending on the use
> case, is then
> >> developed to use this channel; this happens on both ends -
> primary VM
> >> and enclave VM.
> > How does the application in the primary VM assign task to be
> executed
> > in the enclave VM? I didn't see such command in this series, so
> suppose
> > it is also communicated through virtio-vsock?
>
> The application that runs in the enclave needs to be packaged in an
> enclave image together with the OS ( e.g. kernel, ramdisk, init )
> that
> will run in the enclave VM.
>
> Then the enclave image is loaded in memory. After booting is
> finished,
> the application starts. Now, depending on the app implementation
> and use
> case, one example can be that the app in the enclave waits for
> data to
> be fetched in via the vsock channel.
>
>
> Hi Paraschiv,
>
> So here the customer's application should be programmed to respect the
> enclave VM spec, and can't be any binary, right? And also the application
> in the enclave can't use any other I/O except the vsock?

Hi,

The application running in the enclave should be built so that it uses
the available exposed functionality e.g. the vsock comm channel.

With regard to I/O, vsock is the means to interact with the primary /
parent VM. The enclave VM doesn't have a network interface attached or
persistent storage.
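As a minimal illustration of the comm channel (not part of the patch series;
the port number is just a placeholder agreed upon between the two apps), an
application inside the enclave could listen on vsock like this:

    #include <sys/socket.h>
    #include <unistd.h>
    #include <linux/vm_sockets.h>

    int main(void)
    {
        struct sockaddr_vm addr = {
            .svm_family = AF_VSOCK,
            .svm_cid = VMADDR_CID_ANY,
            .svm_port = 9000, /* placeholder port */
        };
        char buf[4096];
        ssize_t n;
        int fd, conn;

        fd = socket(AF_VSOCK, SOCK_STREAM, 0);
        if (fd < 0 || bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
            listen(fd, 1) < 0)
            return 1;

        /* The app in the parent / primary VM connects over vsock and sends data. */
        conn = accept(fd, NULL, NULL);
        if (conn < 0)
            return 1;

        while ((n = read(conn, buf, sizeof(buf))) > 0)
            ; /* process the fetched data */

        close(conn);
        close(fd);

        return 0;
    }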

There is also an exposed device in the enclave, for the attestation flow
e.g. to get the signed attestation document generated by the Nitro
Hypervisor on the host where the primary VM and the enclave VM run.

From a previous mail thread on LKML, where I added a couple of
clarifications on the attestation flow:

"

Hash values are computed for the entire enclave image (EIF), the kernel
and ramdisk(s). That's used, for example, to check that the enclave image
that is loaded in the enclave VM is the one that was intended to be run.

These crypto measurements are included in a signed attestation document
generated by the Nitro Hypervisor and further used to prove the identity
of the enclave. KMS is an example of a service that NE is integrated with
and that checks the attestation doc.

"


Thanks,
Andra

>
> >
> >> Let me know if further clarifications are needed.
> >>
> >>>> The proposed solution is following the KVM model and uses the
> KVM API
> >> to be able
> >>>> to create and set resources for enclaves. An additional ioctl
> command,
> >> besides
> >>>> the ones provided by KVM, is used to start an enclave and
> setup the
> >> addressing
> >>>> for the communication channel and an enclave unique id.
> >>> Reusing some KVM ioctls is definitely a good idea, but I
> wouldn't really
> >>> say it's the KVM API since the VCPU file descriptor is
> basically non
> >>> functional (without KVM_RUN and mmap it's not really the KVM API).
> >> It uses part of the KVM API or a set of KVM ioctls to model the
> way a VM
> >> is created / terminated. That's true, KVM_RUN and mmap-ing the
> vcpu fd
> >> are not included.
> >>
> >> Thanks for the feedback regarding the reuse of KVM ioctls.
> >>
> >> Andra
> >>
> > Thanks
> > Kevin
>
>
>
>
>





2020-05-11 13:51:54

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: [PATCH v1 00/15] Add support for Nitro Enclaves

On Sun, May 10, 2020 at 11:02:18AM +0000, Herrenschmidt, Benjamin wrote:
> On Sat, 2020-05-09 at 21:21 +0200, Pavel Machek wrote:
> >
> > On Fri 2020-05-08 10:00:27, Paraschiv, Andra-Irina wrote:
> > >
> > >
> > > On 07/05/2020 20:44, Pavel Machek wrote:
> > > >
> > > > Hi!
> > > >
> > > > > > it uses its own memory and CPUs + its virtio-vsock emulated device for
> > > > > > communication with the primary VM.
> > > > > >
> > > > > > The memory and CPUs are carved out of the primary VM, they are dedicated
> > > > > > for the enclave. The Nitro hypervisor running on the host ensures memory
> > > > > > and CPU isolation between the primary VM and the enclave VM.
> > > > > >
> > > > > > These two components need to reflect the same state e.g. when the
> > > > > > enclave abstraction process (1) is terminated, the enclave VM (2) is
> > > > > > terminated as well.
> > > > > >
> > > > > > With regard to the communication channel, the primary VM has its own
> > > > > > emulated virtio-vsock PCI device. The enclave VM has its own emulated
> > > > > > virtio-vsock device as well. This channel is used, for example, to fetch
> > > > > > data in the enclave and then process it. An application that sets up the
> > > > > > vsock socket and connects or listens, depending on the use case, is then
> > > > > > developed to use this channel; this happens on both ends - primary VM
> > > > > > and enclave VM.
> > > > > >
> > > > > > Let me know if further clarifications are needed.
> > > > >
> > > > > Thanks, this is all useful. However can you please clarify the
> > > > > low-level details here?
> > > >
> > > > Is the virtual machine manager open-source? If so, I guess pointer for sources
> > > > would be useful.
> > >
> > > Hi Pavel,
> > >
> > > Thanks for reaching out.
> > >
> > > The VMM that is used for the primary / parent VM is not open source.
> >
> > Do we want to merge code that the open source community cannot test?
>
> Hehe.. this isn't quite the story Pavel :)
>
> We merge support for proprietary hypervisors, this is no different. You
> can test it, well at least you'll be able to ... when AWS deploys the
> functionality. You don't need the hypervisor itself to be open source.
>
> In fact, in this case, it's not even low level invasive arch code like
> some of the above can be. It's a driver for a PCI device :-) Granted a
> virtual one. We merge drivers for PCI devices routinely without the RTL
> or firmware of those devices being open source.
>
> So yes, we probably want this if it's going to be a useful feature to
> users when running on AWS EC2. (Disclaimer: I work for AWS these days).

I agree that the VMM does not need to be open source.

What is missing though are details of the enclave's initial state and
the image format required to boot code. Until this documentation is
available only Amazon can write a userspace application that does
anything useful with this driver.

Some of the people from Amazon are long-time Linux contributors (such as
yourself!) and the intent to publish this information has been
expressed, so I'm sure that will be done.

Until then, it's cool but no one else can play with it.

Stefan

