Early draft, shared because there are people outside of Intel who want to
see how this is coming along, and this is the easiest way to share it.
I wouldn't advise running this code on a production system, as testing
has been very light.
SGX memory pages are allocated from special protected memory ranges
and do not have Linux "struct page" structures to manage them.
A recent architecture change results in new behavior for SGX enclaves
on a system when a recoverable local machine check occurs.
a) If the machine check is triggered by code executing outside of an
enclave, then it can be handled as normal by the OS. Enclaves are
not affected.
b) If the machine check is triggered by code in an enclave, then that
enclave cannot be re-entered. But other enclaves on the system can
continue to execute.
This means that "recovery" from an error in an active enclave page
will result in the termination of that enclave.
Memory controller patrol scrubbing may find errors in unused SGX pages.
Those can simply be removed from the free list so that they will not
be used.
On bare metal there are two cases for "regular" SGX pages:
1) Error is found by the patrol scrubber. Action is to remove the page
from the enclave. If the page isn't accessed again, then the enclave
can continue to execute. If the page is accessed, it cannot be
replaced, so the enclave will be terminated.
This part of the code (and the free page part) was tested using
/sys/devices/system/memory/hard_offline_page to call memory_failure().
2) Error triggers a machine check when enclave code accesses poison. In
this case a SIGBUS is sent to the task that owns the enclave (just
like the non-SGX case).
This part of the code has been tested with EINJ error injection.
Poison in other types of SGX pages (e.g. SECS) isn't handled yet.
The virtualization case is just a shell. Linux doesn't know how the
guest is using each page. For now just SIGKILL the task (qemu?) that
owns the SGX pages.
This part of the code compiles, but has not been tested.
Tony Luck (4):
x86/sgx: Track phase and type of SGX EPC pages
x86/sgx: Add basic infrastructure to recover from errors in SGX memory
x86/sgx: Hook sgx_memory_failure() into mainline code
x86/sgx: Add hook to error injection address validation
.../firmware-guide/acpi/apei/einj.rst | 19 +++
arch/x86/include/asm/sgx.h | 6 +
arch/x86/kernel/cpu/sgx/encl.c | 4 +-
arch/x86/kernel/cpu/sgx/ioctl.c | 4 +-
arch/x86/kernel/cpu/sgx/main.c | 147 +++++++++++++++++-
arch/x86/kernel/cpu/sgx/sgx.h | 17 +-
arch/x86/kernel/cpu/sgx/virt.c | 11 +-
drivers/acpi/apei/einj.c | 3 +-
include/linux/mm.h | 15 ++
mm/memory-failure.c | 4 +
10 files changed, 219 insertions(+), 11 deletions(-)
--
2.29.2
Memory errors can be reported either synchronously as memory is accessed,
or asynchronously by speculative access or by a memory controller page
scrubber. The life cycle of an EPC page takes it through:
dirty -> free -> in-use -> free.
Memory errors are reported using physical addresses. It is a simple
matter to find which sgx_epc_page structure maps a given address.
But then recovery code needs to be able to determine the current use of
the page to take the appropriate recovery action. Within the "in-use"
phase different actions are needed based on how the page is used in
the enclave.
Add new flag bits to describe the phase (with an extra bit for the new
phase of "poisoned"). Drop pages marked as poisoned instead of adding
them to a free list, to make sure they are not re-used.
Add a type field to struct epc_page for how an in-use page has been
allocated. Re-use "enum sgx_page_type" for this type, with a couple
of additions for s/w types.
Signed-off-by: Tony Luck <[email protected]>
---
arch/x86/include/asm/sgx.h | 6 ++++++
arch/x86/kernel/cpu/sgx/encl.c | 4 ++--
arch/x86/kernel/cpu/sgx/ioctl.c | 4 ++--
arch/x86/kernel/cpu/sgx/main.c | 21 +++++++++++++++++++--
arch/x86/kernel/cpu/sgx/sgx.h | 14 ++++++++++++--
arch/x86/kernel/cpu/sgx/virt.c | 2 +-
6 files changed, 42 insertions(+), 9 deletions(-)
diff --git a/arch/x86/include/asm/sgx.h b/arch/x86/include/asm/sgx.h
index 9c31e0ebc55b..9619a6d77a83 100644
--- a/arch/x86/include/asm/sgx.h
+++ b/arch/x86/include/asm/sgx.h
@@ -216,6 +216,8 @@ struct sgx_pageinfo {
* %SGX_PAGE_TYPE_REG: a regular page
* %SGX_PAGE_TYPE_VA: a VA page
* %SGX_PAGE_TYPE_TRIM: a page in trimmed state
+ *
+ * Also used to track current use of &struct sgx_epc_page
*/
enum sgx_page_type {
SGX_PAGE_TYPE_SECS,
@@ -223,6 +225,10 @@ enum sgx_page_type {
SGX_PAGE_TYPE_REG,
SGX_PAGE_TYPE_VA,
SGX_PAGE_TYPE_TRIM,
+
+ /* sgx_epc_page.type */
+ SGX_PAGE_TYPE_FREE = 100,
+ SGX_PAGE_TYPE_KVM = 101,
};
#define SGX_NR_PAGE_TYPES 5
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 3be203297988..abf6e1a704c0 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -72,7 +72,7 @@ static struct sgx_epc_page *sgx_encl_eldu(struct sgx_encl_page *encl_page,
struct sgx_epc_page *epc_page;
int ret;
- epc_page = sgx_alloc_epc_page(encl_page, false);
+ epc_page = sgx_alloc_epc_page(encl_page, SGX_PAGE_TYPE_REG, false);
if (IS_ERR(epc_page))
return epc_page;
@@ -679,7 +679,7 @@ struct sgx_epc_page *sgx_alloc_va_page(void)
struct sgx_epc_page *epc_page;
int ret;
- epc_page = sgx_alloc_epc_page(NULL, true);
+ epc_page = sgx_alloc_epc_page(NULL, SGX_PAGE_TYPE_VA, true);
if (IS_ERR(epc_page))
return ERR_CAST(epc_page);
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 83df20e3e633..a74ae00194cc 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -83,7 +83,7 @@ static int sgx_encl_create(struct sgx_encl *encl, struct sgx_secs *secs)
encl->backing = backing;
- secs_epc = sgx_alloc_epc_page(&encl->secs, true);
+ secs_epc = sgx_alloc_epc_page(&encl->secs, SGX_PAGE_TYPE_SECS, true);
if (IS_ERR(secs_epc)) {
ret = PTR_ERR(secs_epc);
goto err_out_backing;
@@ -300,7 +300,7 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long src,
if (IS_ERR(encl_page))
return PTR_ERR(encl_page);
- epc_page = sgx_alloc_epc_page(encl_page, true);
+ epc_page = sgx_alloc_epc_page(encl_page, SGX_PAGE_TYPE_REG, true);
if (IS_ERR(epc_page)) {
kfree(encl_page);
return PTR_ERR(epc_page);
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 63d3de02bbcc..643df87b3e01 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -401,7 +401,12 @@ static void sgx_reclaim_pages(void)
section = &sgx_epc_sections[epc_page->section];
node = section->node;
+ /* drop poison pages instead of adding to free list */
+ if (epc_page->flags & SGX_EPC_PAGE_POISON)
+ continue;
+
spin_lock(&node->lock);
+ epc_page->flags = SGX_EPC_PAGE_FREE;
list_add_tail(&epc_page->list, &node->free_page_list);
sgx_nr_free_pages++;
spin_unlock(&node->lock);
@@ -560,6 +565,7 @@ int sgx_unmark_page_reclaimable(struct sgx_epc_page *page)
/**
* sgx_alloc_epc_page() - Allocate an EPC page
* @owner: the owner of the EPC page
+ * @type: type of page being allocated
* @reclaim: reclaim pages if necessary
*
* Iterate through EPC sections and borrow a free EPC page to the caller. When a
@@ -574,7 +580,7 @@ int sgx_unmark_page_reclaimable(struct sgx_epc_page *page)
* an EPC page,
* -errno on error
*/
-struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
+struct sgx_epc_page *sgx_alloc_epc_page(void *owner, enum sgx_page_type type, bool reclaim)
{
struct sgx_epc_page *page;
@@ -582,6 +588,8 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
page = __sgx_alloc_epc_page();
if (!IS_ERR(page)) {
page->owner = owner;
+ page->type = type;
+ page->flags = 0;
break;
}
@@ -616,14 +624,22 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
* responsibility to make sure that the page is in uninitialized state. In other
* words, do EREMOVE, EWB or whatever operation is necessary before calling
* this function.
+ *
+ * Note that if the page has been tagged as poisoned, it is simply
+ * dropped on the floor instead of added to the free list to make
+ * sure we do not re-use it.
*/
void sgx_free_epc_page(struct sgx_epc_page *page)
{
struct sgx_epc_section *section = &sgx_epc_sections[page->section];
struct sgx_numa_node *node = section->node;
+ if (page->flags & SGX_EPC_PAGE_POISON)
+ return;
+
spin_lock(&node->lock);
+ page->flags = SGX_EPC_PAGE_FREE;
list_add_tail(&page->list, &node->free_page_list);
sgx_nr_free_pages++;
@@ -651,7 +667,8 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
for (i = 0; i < nr_pages; i++) {
section->pages[i].section = index;
- section->pages[i].flags = 0;
+ section->pages[i].flags = SGX_EPC_PAGE_DIRTY;
+ section->pages[i].type = SGX_PAGE_TYPE_FREE;
section->pages[i].owner = NULL;
list_add_tail(&section->pages[i].list, &sgx_dirty_page_list);
}
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 4628acec0009..e43d3c27eb96 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -26,9 +26,19 @@
/* Pages, which are being tracked by the page reclaimer. */
#define SGX_EPC_PAGE_RECLAIMER_TRACKED BIT(0)
+/* Pages, on the "sgx_dirty_page_list" */
+#define SGX_EPC_PAGE_DIRTY BIT(1)
+
+/* Pages, on one of the node free lists */
+#define SGX_EPC_PAGE_FREE BIT(2)
+
+/* Pages, with h/w poison errors */
+#define SGX_EPC_PAGE_POISON BIT(3)
+
struct sgx_epc_page {
unsigned int section;
- unsigned int flags;
+ u16 flags;
+ u16 type;
struct sgx_encl_page *owner;
struct list_head list;
};
@@ -82,7 +92,7 @@ void sgx_free_epc_page(struct sgx_epc_page *page);
void sgx_mark_page_reclaimable(struct sgx_epc_page *page);
int sgx_unmark_page_reclaimable(struct sgx_epc_page *page);
-struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim);
+struct sgx_epc_page *sgx_alloc_epc_page(void *owner, enum sgx_page_type type, bool reclaim);
#ifdef CONFIG_X86_SGX_KVM
int __init sgx_vepc_init(void);
diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
index 6ad165a5c0cc..a9d89e9ddf93 100644
--- a/arch/x86/kernel/cpu/sgx/virt.c
+++ b/arch/x86/kernel/cpu/sgx/virt.c
@@ -46,7 +46,7 @@ static int __sgx_vepc_fault(struct sgx_vepc *vepc,
if (epc_page)
return 0;
- epc_page = sgx_alloc_epc_page(vepc, false);
+ epc_page = sgx_alloc_epc_page(vepc, SGX_PAGE_TYPE_KVM, false);
if (IS_ERR(epc_page))
return PTR_ERR(epc_page);
--
2.29.2
Add a call inside memory_failure() to check if the address is an SGX
EPC page and handle it.
Note that SGX EPC pages do not have a "struct page" entry, so the hook
goes in at the same point as the device mapping hook.
Signed-off-by: Tony Luck <[email protected]>
---
include/linux/mm.h | 9 +++++++++
mm/memory-failure.c | 4 ++++
2 files changed, 13 insertions(+)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index c274f75efcf9..8ad4c513d4cc 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3248,5 +3248,14 @@ static inline int seal_check_future_write(int seals, struct vm_area_struct *vma)
return 0;
}
+#ifdef CONFIG_X86_SGX
+int sgx_memory_failure(unsigned long pfn, int flags);
+#else
+static inline int sgx_memory_failure(unsigned long pfn, int flags)
+{
+ return -ENXIO;
+}
+#endif
+
#endif /* __KERNEL__ */
#endif /* _LINUX_MM_H */
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 85ad98c00fd9..c2ade18fd8d5 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1413,6 +1413,10 @@ int memory_failure(unsigned long pfn, int flags)
p = pfn_to_online_page(pfn);
if (!p) {
+ res = sgx_memory_failure(pfn, flags);
+ if (res == 0)
+ return 0;
+
if (pfn_valid(pfn)) {
pgmap = get_dev_pagemap(pfn, NULL);
if (pgmap)
--
2.29.2
The x86 machine check architecture reports a physical address when there
is a memory error.
Add an end_phys_addr field to the sgx_epc_section structure and a new
function sgx_paddr_to_page() that searches all such structures to see
if a physical address is part of an SGX EPC page.
The ACPI EINJ module has a sanity check that the injection address is
valid. Add and export a function sgx_is_epc_page() so that it can be
changed to allow injection to SGX EPC pages.
Provide a recovery function sgx_memory_failure(). If the poison was
consumed synchronously inside the enclave, send a SIGBUS just the
same as for poison consumption outside of an enclave.
For asynchronously reported errors, the easiest case is when the page
is currently free. Just drop the page from the free list so that it will
never be allocated.
An SGX_PAGE_TYPE_REG page can just be unmapped from the enclave. If the
enclave doesn't ever touch that page again all is well. If it does touch
the page it will die. Possible future enhancement here to mark the
unmapped PTE so that it will cause a SIGBUS.
SGX_PAGE_TYPE_KVM pages may be recoverable depending on how they are
being used by guests. To fix that properly requires injecting the machine
check into the guest. For now just kill it.
Signed-off-by: Tony Luck <[email protected]>
---
arch/x86/kernel/cpu/sgx/main.c | 126 +++++++++++++++++++++++++++++++++
arch/x86/kernel/cpu/sgx/sgx.h | 3 +-
arch/x86/kernel/cpu/sgx/virt.c | 9 +++
3 files changed, 137 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 643df87b3e01..4a354f89373e 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -664,6 +664,7 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
}
section->phys_addr = phys_addr;
+ section->end_phys_addr = phys_addr + size - 1;
for (i = 0; i < nr_pages; i++) {
section->pages[i].section = index;
@@ -676,6 +677,131 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
return true;
}
+static struct sgx_epc_page *sgx_paddr_to_page(u64 paddr)
+{
+ struct sgx_epc_section *section;
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(sgx_epc_sections); i++) {
+ section = &sgx_epc_sections[i];
+
+ if (paddr < section->phys_addr || paddr > section->end_phys_addr)
+ continue;
+
+ return &section->pages[PFN_DOWN(paddr - section->phys_addr)];
+ }
+
+ return NULL;
+}
+
+bool sgx_is_epc_page(u64 paddr)
+{
+ return !!sgx_paddr_to_page(paddr);
+}
+EXPORT_SYMBOL_GPL(sgx_is_epc_page);
+
+void __attribute__((weak)) sgx_memory_failure_vepc(struct sgx_epc_page *epc_page)
+{
+}
+
+/*
+ * This can be called in three ways:
+ * a) When an enclave has just consumed poison. In this case
+ * calling process context is the owner of the enclave.
+ * b) For some asynchronous poison notification (e.g. patrol
+ * scrubber or speculative execution saw the poison.)
+ * In this case calling context is a kernel thread.
+ * c) Someone asked to take a page offline using the
+ * /sys/devices/system/memory/{hard,soft}_offline_page.
+ * In this case caller is the process writing these files.
+ */
+int sgx_memory_failure(unsigned long pfn, int flags)
+{
+ struct sgx_epc_page *epc_page = sgx_paddr_to_page(pfn << PAGE_SHIFT);
+ struct sgx_epc_section *section;
+ struct sgx_encl_page *encl_page;
+ struct sgx_numa_node *node;
+ unsigned long addr;
+ u16 page_flags;
+
+ if (!epc_page)
+ return -ENXIO;
+
+ spin_lock(&sgx_reclaimer_lock);
+
+ page_flags = epc_page->flags;
+ epc_page->flags |= SGX_EPC_PAGE_POISON;
+
+ /*
+ * Poison was synchronously consumed by an enclave in the current
+ * task. Send a SIGBUS here. Hardware should prevent this enclave
+ * from being re-entered, so no concern that the poisoned page
+ * will be touched a second time. The poisoned EPC page will be
+ * dropped as pages are freed during task exit.
+ */
+ if (flags & MF_ACTION_REQUIRED) {
+ if (epc_page->type == SGX_PAGE_TYPE_REG) {
+ encl_page = epc_page->owner;
+ addr = encl_page->desc & PAGE_MASK;
+ force_sig_mceerr(BUS_MCEERR_AR, (void *)addr, PAGE_SHIFT);
+ } else {
+ force_sig(SIGBUS);
+ }
+ goto unlock;
+ }
+
+ section = &sgx_epc_sections[epc_page->section];
+ node = section->node;
+
+ if (page_flags & SGX_EPC_PAGE_POISON)
+ goto unlock;
+
+ if (page_flags & SGX_EPC_PAGE_DIRTY) {
+ list_del(&epc_page->list);
+ } else if (page_flags & SGX_EPC_PAGE_FREE) {
+ spin_lock(&node->lock);
+ epc_page->owner = NULL;
+ list_del(&epc_page->list);
+ sgx_nr_free_pages--;
+ spin_unlock(&node->lock);
+ goto unlock;
+ }
+
+ switch (epc_page->type) {
+ case SGX_PAGE_TYPE_REG:
+ encl_page = epc_page->owner;
+ /* Unmap the page, unless it was already in progress to be freed */
+ if (kref_get_unless_zero(&encl_page->encl->refcount) != 0) {
+ spin_unlock(&sgx_reclaimer_lock);
+ sgx_reclaimer_block(epc_page);
+ kref_put(&encl_page->encl->refcount, sgx_encl_release);
+ goto done;
+ }
+ break;
+
+ case SGX_PAGE_TYPE_KVM:
+ spin_unlock(&sgx_reclaimer_lock);
+ sgx_memory_failure_vepc(epc_page);
+ break;
+
+ default:
+ /*
+ * I don't expect SECS, TCS, VA pages would result
+ * in recoverable machine checks. If it turns out
+ * that they do, then add cases for recovery for
+ * each of them.
+ */
+ panic("Poisoned active SGX page type %d\n", epc_page->type);
+ break;
+ }
+ goto done;
+
+unlock:
+ spin_unlock(&sgx_reclaimer_lock);
+done:
+ return 0;
+}
+
/**
* A section metric is concatenated in a way that @low bits 12-31 define the
* bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index e43d3c27eb96..7701f5f88b61 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -39,7 +39,7 @@ struct sgx_epc_page {
unsigned int section;
u16 flags;
u16 type;
- struct sgx_encl_page *owner;
+ void *owner;
struct list_head list;
};
@@ -60,6 +60,7 @@ struct sgx_numa_node {
*/
struct sgx_epc_section {
unsigned long phys_addr;
+ unsigned long end_phys_addr;
void *virt_addr;
struct sgx_epc_page *pages;
struct sgx_numa_node *node;
diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
index a9d89e9ddf93..4d40c2a1e4ee 100644
--- a/arch/x86/kernel/cpu/sgx/virt.c
+++ b/arch/x86/kernel/cpu/sgx/virt.c
@@ -21,6 +21,7 @@
struct sgx_vepc {
struct xarray page_array;
struct mutex lock;
+ struct task_struct *task;
};
/*
@@ -217,6 +218,13 @@ static int sgx_vepc_release(struct inode *inode, struct file *file)
return 0;
}
+void sgx_memory_failure_vepc(struct sgx_epc_page *epc_page)
+{
+ struct sgx_vepc *vepc = epc_page->owner;
+
+ send_sig(SIGBUS, vepc->task, false);
+}
+
static int sgx_vepc_open(struct inode *inode, struct file *file)
{
struct sgx_vepc *vepc;
@@ -226,6 +234,7 @@ static int sgx_vepc_open(struct inode *inode, struct file *file)
return -ENOMEM;
mutex_init(&vepc->lock);
xa_init(&vepc->page_array);
+ vepc->task = current;
file->private_data = vepc;
--
2.29.2
On Tue, Jun 08, 2021, Tony Luck wrote:
> Add a type field to struct epc_page for how an in-use page has been
> allocated. Re-use "enum sgx_page_type" for this type, with a couple
> of additions for s/w types.
IMO, taking different actions based on the page type is flawed. Ditto for adding
a flag to track so called "dirty" pages.
> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> index 3be203297988..abf6e1a704c0 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.c
> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> @@ -72,7 +72,7 @@ static struct sgx_epc_page *sgx_encl_eldu(struct sgx_encl_page *encl_page,
> struct sgx_epc_page *epc_page;
> int ret;
>
> - epc_page = sgx_alloc_epc_page(encl_page, false);
> + epc_page = sgx_alloc_epc_page(encl_page, SGX_PAGE_TYPE_REG, false);
This is wrong when reloading an SECS page.
> if (IS_ERR(epc_page))
> return epc_page;
>
> @@ -651,7 +667,8 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
>
> for (i = 0; i < nr_pages; i++) {
> section->pages[i].section = index;
> - section->pages[i].flags = 0;
> + section->pages[i].flags = SGX_EPC_PAGE_DIRTY;
This flag is confusing, "dirty" in paging context almost always refers to the
page having been written. The existing sgx_dirty_page_list name is not good,
fixing that instead of propagating the bad name would be preferable.
But why is the flag needed? If the flag is set iff the page is on the dirty
list, what's the point of the flag?
Looking forward to patch 3, why does sgx_memory_failure() need to eagerly remove
pages from the list? Just have the sanitization skip poisoned pages.
> + section->pages[i].type = SGX_PAGE_TYPE_FREE;
> section->pages[i].owner = NULL;
> + list_add_tail(&section->pages[i].list, &sgx_dirty_page_list);
> }
On Tue, Jun 08, 2021, Tony Luck wrote:
> + /*
> + * Poison was synchronously consumed by an enclave in the current
> + * task. Send a SIGBUS here. Hardware should prevent this enclave
> + * from being re-entered, so no concern that the poisoned page
> + * will be touched a second time. The poisoned EPC page will be
> + * dropped as pages are freed during task exit.
> + */
> + if (flags & MF_ACTION_REQUIRED) {
> + if (epc_page->type == SGX_PAGE_TYPE_REG) {
Why only for REG pages? I don't see the value added by assuming that SECS, TCS,
and VA pages will result in uncorrectable #MC.
> + encl_page = epc_page->owner;
> + addr = encl_page->desc & PAGE_MASK;
> + force_sig_mceerr(BUS_MCEERR_AR, (void *)addr, PAGE_SHIFT);
> + } else {
> + force_sig(SIGBUS);
> + }
> + goto unlock;
> + }
> +
> + section = &sgx_epc_sections[epc_page->section];
> + node = section->node;
> +
> + if (page_flags & SGX_EPC_PAGE_POISON)
> + goto unlock;
> +
> + if (page_flags & SGX_EPC_PAGE_DIRTY) {
> + list_del(&epc_page->list);
As suggested in the prior patch, why not let the sanitizer handle poisoned pages?
> + } else if (page_flags & SGX_EPC_PAGE_FREE) {
> + spin_lock(&node->lock);
> + epc_page->owner = NULL;
> + list_del(&epc_page->list);
> + sgx_nr_free_pages--;
> + spin_unlock(&node->lock);
> + goto unlock;
> + }
> +
> + switch (epc_page->type) {
> + case SGX_PAGE_TYPE_REG:
> + encl_page = epc_page->owner;
> + /* Unmap the page, unless it was already in progress to be freed */
> + if (kref_get_unless_zero(&encl_page->encl->refcount) != 0) {
> + spin_unlock(&sgx_reclaimer_lock);
> + sgx_reclaimer_block(epc_page);
> + kref_put(&encl_page->encl->refcount, sgx_encl_release);
> + goto done;
> + }
> + break;
> +
> + case SGX_PAGE_TYPE_KVM:
> + spin_unlock(&sgx_reclaimer_lock);
> + sgx_memory_failure_vepc(epc_page);
...
> @@ -217,6 +218,13 @@ static int sgx_vepc_release(struct inode *inode, struct file *file)
> return 0;
> }
>
> +void sgx_memory_failure_vepc(struct sgx_epc_page *epc_page)
> +{
> + struct sgx_vepc *vepc = epc_page->owner;
> +
> + send_sig(SIGBUS, vepc->task, false);
...
> +}
> +
> static int sgx_vepc_open(struct inode *inode, struct file *file)
> {
> struct sgx_vepc *vepc;
> @@ -226,6 +234,7 @@ static int sgx_vepc_open(struct inode *inode, struct file *file)
> return -ENOMEM;
> mutex_init(&vepc->lock);
> xa_init(&vepc->page_array);
> + vepc->task = current;
This is broken, there is no guarantee whatsoever that the task that opened the
vEPC is the task that is actively using vEPC. Since Dave successfully lobbied
to allow multiple mm_structs per vEPC, it might not even be the correct process.
> file->private_data = vepc;
>
> --
> 2.29.2
>
On Tue, 2021-07-13 at 16:56 +0000, Sean Christopherson wrote:
> On Tue, Jun 08, 2021, Tony Luck wrote:
> > Add a type field to struct epc_page for how an in-use page has been
> > allocated. Re-use "enum sgx_page_type" for this type, with a couple
> > of additions for s/w types.
>
> IMO, taking different actions based on the page type is flawed. Ditto for adding
> a flag to track so called "dirty" pages.
I agree with this. Let's keep microarchitecture-related
structures/enums separate from "software defined" things.
If such things add up (i.e. similar additions to something
else in the future), it will be both hard to maintain, and
the subsystem will also be harder to understand for potential
future contributors.
/Jarkko