2022-01-07 18:17:05

by Kristen Carlson Accardi

[permalink] [raw]
Subject: [PATCH v2 0/2] x86/sgx: Limit EPC overcommit

SGX currently allows EPC pages to be overcommitted. If the system is
out of enclave memory, EPC pages are swapped to normal RAM via
a per enclave shared memory area. This shared memory is not charged
to the enclave or the task mapping it, making it hard to account
for using normal methods. Since SGX will allow EPC pages to be
overcommitted without limits, enclaves can consume system memory
for these backing pages without limits.

In order to prevent this, set a cap on the amount of overcommit SGX
allows. Whenever a backing page is requested by an enclave, track
the total amount of shared memory pages used across all enclaves and
return an error if the overcommit limit has been reached. This will
restrict the total amount of backing pages that all enclaves can
consume to a maximum amount, and prevent enclaves from consuming
all the system RAM for backing pages.

The overcommit percentage has a value of 150, which limits shared
memory page consumption to 1.5x the number of EPC pages in the system.

Changes from v1
----------------
* removed module parameter and disable boolean
* increased over commit percentage to 150% from 100%

Kristen Carlson Accardi (2):
x86/sgx: Add accounting for tracking overcommit
x86/sgx: account backing pages

arch/x86/kernel/cpu/sgx/encl.c | 76 ++++++++++++++++++++++++++++++++--
arch/x86/kernel/cpu/sgx/encl.h | 6 ++-
arch/x86/kernel/cpu/sgx/main.c | 52 +++++++++++++++++++++--
arch/x86/kernel/cpu/sgx/sgx.h | 2 +
4 files changed, 128 insertions(+), 8 deletions(-)

--
2.20.1



2022-01-07 18:17:13

by Kristen Carlson Accardi

[permalink] [raw]
Subject: [PATCH v2 1/2] x86/sgx: Add accounting for tracking overcommit

When the system runs out of enclave memory, SGX can reclaim EPC pages
by swapping to normal RAM. This normal RAM is allocated via a
per-enclave shared memory area. The shared memory area is not mapped
into the enclave or the task mapping it, which makes its memory use
opaque (including to the OOM killer). Having lots of hard to find
memory around is problematic, especially when there is no limit.

Introduce a global counter that can be used to limit the number of pages
that enclaves are able to consume for backing storage. This parameter
is a percentage value that is used in conjunction with the number of
EPC pages in the system to set a cap on the amount of backing RAM that
can be consumed.

The overcommit percentage value is 150, which limits the total number of
shared memory pages that may be consumed by all enclaves as backing pages
to 1.5X of EPC pages on the system.

For example, on an SGX system that has 128MB of EPC, this would
cap the amount of normal RAM that SGX consumes for its shared memory
areas at 192MB.

The SGX overcommit percent works differently than the core VM overcommit
limit. Enclaves request backing pages one page at a time, and the number
of in use backing pages that are allowed is a global resource that is
limited for all enclaves.

Introduce a pair of functions which can be used by callers when requesting
backing RAM pages. These functions are responsible for accounting the
page charges. A request may return an error if the request will cause the
counter to exceed the backing page cap.

Signed-off-by: Kristen Carlson Accardi <[email protected]>
---
arch/x86/kernel/cpu/sgx/main.c | 46 ++++++++++++++++++++++++++++++++++
arch/x86/kernel/cpu/sgx/sgx.h | 2 ++
2 files changed, 48 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 2857a49f2335..8a7bfe0d9023 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -43,6 +43,45 @@ static struct sgx_numa_node *sgx_numa_nodes;

static LIST_HEAD(sgx_dirty_page_list);

+/*
+ * Limits the amount of normal RAM that SGX can consume for EPC
+ * overcommit to the total EPC pages * sgx_overcommit_percent / 100
+ */
+static int sgx_overcommit_percent = 150;
+
+/* The number of pages that can be allocated globally for backing storage. */
+static atomic_long_t sgx_nr_available_backing_pages;
+
+/**
+ * sgx_charge_mem() - charge for a page used for backing storage
+ *
+ * Backing storage usage is capped by the sgx_nr_available_backing_pages.
+ * If the backing storage usage is over the overcommit limit,
+ * return an error.
+ *
+ * Return:
+ * 0: The page requested does not exceed the limit
+ * -ENOMEM: The page requested exceeds the overcommit limit
+ */
+int sgx_charge_mem(void)
+{
+ if (!atomic_long_add_unless(&sgx_nr_available_backing_pages, -1, 0))
+ return -ENOMEM;
+
+ return 0;
+}
+
+/**
+ * sgx_uncharge_mem() - uncharge a page previously used for backing storage
+ *
+ * When backing storage is no longer in use, increment the
+ * sgx_nr_available_backing_pages counter.
+ */
+void sgx_uncharge_mem(void)
+{
+ atomic_long_inc(&sgx_nr_available_backing_pages);
+}
+
/*
* Reset post-kexec EPC pages to the uninitialized state. The pages are removed
* from the input list, and made available for the page allocator. SECS pages
@@ -786,6 +825,8 @@ static bool __init sgx_page_cache_init(void)
u64 pa, size;
int nid;
int i;
+ u64 total_epc_bytes = 0;
+ u64 available_backing_bytes;

sgx_numa_nodes = kmalloc_array(num_possible_nodes(), sizeof(*sgx_numa_nodes), GFP_KERNEL);
if (!sgx_numa_nodes)
@@ -830,6 +871,7 @@ static bool __init sgx_page_cache_init(void)

sgx_epc_sections[i].node = &sgx_numa_nodes[nid];
sgx_numa_nodes[nid].size += size;
+ total_epc_bytes += size;

sgx_nr_epc_sections++;
}
@@ -839,6 +881,10 @@ static bool __init sgx_page_cache_init(void)
return false;
}

+ available_backing_bytes = total_epc_bytes * (sgx_overcommit_percent / 100);
+
+ atomic_long_set(&sgx_nr_available_backing_pages, available_backing_bytes >> PAGE_SHIFT);
+
return true;
}

diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 0f17def9fe6f..3507a9983fc1 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -89,6 +89,8 @@ void sgx_free_epc_page(struct sgx_epc_page *page);
void sgx_mark_page_reclaimable(struct sgx_epc_page *page);
int sgx_unmark_page_reclaimable(struct sgx_epc_page *page);
struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim);
+int sgx_charge_mem(void);
+void sgx_uncharge_mem(void);

#ifdef CONFIG_X86_SGX_KVM
int __init sgx_vepc_init(void);
--
2.20.1


2022-01-07 18:17:16

by Kristen Carlson Accardi

[permalink] [raw]
Subject: [PATCH v2 2/2] x86/sgx: account backing pages

SGX may allow EPC pages to be overcommitted. If the system is
out of enclave memory, EPC pages are swapped to normal RAM via
a per enclave shared memory area. This shared memory is not
charged to the enclave or the task mapping it, making it hard
to account for using normal methods.

In order to avoid unlimited usage of normal RAM, enclaves must be
charged for each new page used for backing storage, and uncharged
when they are no longer using a backing page.

Modify the existing flow for requesting backing pages to reduce the
available backing page counter and confirm that the limit has not
been exceeded. Backing page usage for loading EPC pages back out of
the shared memory do not incur a charge.

When a backing page is released from usage, increment the available
backing page counter.

When swapping EPC pages to RAM, in addition to storing the page
contents, SGX must store some additional metadata to protect
against a malicious kernel when the page is swapped back in. This
additional metadata is called Paging Crypto MetaData. PCMD is
allocated from the same shared memory area as the backing page
contents and consumes RAM the same way.

PCMD is 128 bytes in size, and there is one PCMD structure per
page written to shared RAM. The page index for the PCMD page is
calculated from the page index of the backing page, so it is possible
that the PCMD structures are not packed into the minimum number of
pages possible. If 32 PCMDs can fit onto a single page, then PCMD
usage is 1/32 of total EPC pages. In the worst case, PCMD can
consume the same amount of RAM as EPC backing pages (1:1). For
simplicity, this implementation does not account for PCMD page usage.

Signed-off-by: Kristen Carlson Accardi <[email protected]>
---
arch/x86/kernel/cpu/sgx/encl.c | 76 ++++++++++++++++++++++++++++++++--
arch/x86/kernel/cpu/sgx/encl.h | 6 ++-
arch/x86/kernel/cpu/sgx/main.c | 6 +--
3 files changed, 80 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 001808e3901c..8be6f0592bdc 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -32,7 +32,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
else
page_index = PFN_DOWN(encl->size);

- ret = sgx_encl_get_backing(encl, page_index, &b);
+ ret = sgx_encl_lookup_backing(encl, page_index, &b);
if (ret)
return ret;

@@ -407,6 +407,12 @@ void sgx_encl_release(struct kref *ref)
sgx_encl_free_epc_page(entry->epc_page);
encl->secs_child_cnt--;
entry->epc_page = NULL;
+ } else {
+ /*
+ * If there is no epc_page, it means it has been
+ * swapped out. Uncharge the backing storage.
+ */
+ sgx_uncharge_mem();
}

kfree(entry);
@@ -574,8 +580,8 @@ static struct page *sgx_encl_get_backing_page(struct sgx_encl *encl,
* 0 on success,
* -errno otherwise.
*/
-int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
- struct sgx_backing *backing)
+static int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
+ struct sgx_backing *backing)
{
pgoff_t pcmd_index = PFN_DOWN(encl->size) + 1 + (page_index >> 5);
struct page *contents;
@@ -601,6 +607,62 @@ int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
return 0;
}

+/**
+ * sgx_encl_alloc_backing() - allocate a new backing storage page
+ * @encl: an enclave pointer
+ * @page_index: enclave page index
+ * @backing: data for accessing backing storage for the page
+ *
+ * Confirm that the global overcommit limit has not been reached before
+ * requesting a new backing storage page for storing the encrypted contents
+ * and Paging Crypto MetaData (PCMD) of an enclave page. This is called when
+ * there is no existing backing page, just before writing to the backing
+ * storage with EWB.
+ *
+ * Return:
+ * 0 on success,
+ * -errno otherwise.
+ */
+int sgx_encl_alloc_backing(struct sgx_encl *encl, unsigned long page_index,
+ struct sgx_backing *backing)
+{
+ int ret;
+
+ if (sgx_charge_mem())
+ return -ENOMEM;
+
+ ret = sgx_encl_get_backing(encl, page_index, backing);
+ if (ret)
+ sgx_uncharge_mem();
+
+ return ret;
+}
+
+/**
+ * sgx_encl_lookup_backing() - retrieve an existing backing storage page
+ * @encl: an enclave pointer
+ * @page_index: enclave page index
+ * @backing: data for accessing backing storage for the page
+ *
+ * Retrieve a backing page for loading data back into an EPC page with ELDU.
+ * This call does not cause a charge to the overcommit limit because a page
+ * has already been allocated, but has been swapped out or is in RAM
+ *
+ * It is the caller's responsibility to ensure that it is appropriate to
+ * use sgx_encl_lookup_backing() rather than sgx_encl_alloc_backing(). If
+ * lookup is not used correctly, this will cause an allocation that is
+ * not accounted for.
+ *
+ * Return:
+ * 0 on success,
+ * -errno otherwise.
+ */
+int sgx_encl_lookup_backing(struct sgx_encl *encl, unsigned long page_index,
+ struct sgx_backing *backing)
+{
+ return sgx_encl_get_backing(encl, page_index, backing);
+}
+
/**
* sgx_encl_put_backing() - Unpin the backing storage
* @backing: data for accessing backing storage for the page
@@ -608,9 +670,17 @@ int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
*/
void sgx_encl_put_backing(struct sgx_backing *backing, bool do_write)
{
+ /*
+ * If the page is being written to by the reclaimer, it is
+ * still in use and the backing page usage should not be
+ * uncharged. However, if the page is not being written to,
+ * it is no longer in use and may be uncharged.
+ */
if (do_write) {
set_page_dirty(backing->pcmd);
set_page_dirty(backing->contents);
+ } else {
+ sgx_uncharge_mem();
}

put_page(backing->pcmd);
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index fec43ca65065..8ffb8a83263f 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -105,8 +105,10 @@ int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,

void sgx_encl_release(struct kref *ref);
int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm);
-int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
- struct sgx_backing *backing);
+int sgx_encl_alloc_backing(struct sgx_encl *encl, unsigned long page_index,
+ struct sgx_backing *backing);
+int sgx_encl_lookup_backing(struct sgx_encl *encl, unsigned long page_index,
+ struct sgx_backing *backing);
void sgx_encl_put_backing(struct sgx_backing *backing, bool do_write);
int sgx_encl_test_and_clear_young(struct mm_struct *mm,
struct sgx_encl_page *page);
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 8a7bfe0d9023..20ccd9c5b184 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -347,8 +347,8 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
encl->secs_child_cnt--;

if (!encl->secs_child_cnt && test_bit(SGX_ENCL_INITIALIZED, &encl->flags)) {
- ret = sgx_encl_get_backing(encl, PFN_DOWN(encl->size),
- &secs_backing);
+ ret = sgx_encl_alloc_backing(encl, PFN_DOWN(encl->size),
+ &secs_backing);
if (ret)
goto out;

@@ -418,7 +418,7 @@ static void sgx_reclaim_pages(void)
goto skip;

page_index = PFN_DOWN(encl_page->desc - encl_page->encl->base);
- ret = sgx_encl_get_backing(encl_page->encl, page_index, &backing[i]);
+ ret = sgx_encl_alloc_backing(encl_page->encl, page_index, &backing[i]);
if (ret)
goto skip;

--
2.20.1


2022-01-07 18:47:16

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] x86/sgx: Add accounting for tracking overcommit

On 1/7/22 10:16, Kristen Carlson Accardi wrote:
> The overcommit percentage value is 150, which limits the total number of
> shared memory pages that may be consumed by all enclaves as backing pages
> to 1.5X of EPC pages on the system.

Hi Kristen,

Could you give some background on how this value was chosen and how it
might impact userspace?

2022-01-07 19:16:44

by Kristen Carlson Accardi

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] x86/sgx: Add accounting for tracking overcommit

On Fri, 2022-01-07 at 10:46 -0800, Dave Hansen wrote:
> On 1/7/22 10:16, Kristen Carlson Accardi wrote:
> > The overcommit percentage value is 150, which limits the total
> > number of
> > shared memory pages that may be consumed by all enclaves as backing
> > pages
> > to 1.5X of EPC pages on the system.
>
> Hi Kristen,
>
> Could you give some background on how this value was chosen and how
> it
> might impact userspace?

Yes,
The value of 1.5x the number of EPC pages was chosen because it will
handle the most common case of a few enclaves that don't need much
overcommit without any impact to user space. In the less commone case
where there are many enclaves or a few large enclaves which need a lot
of overcommit due to large EPC memory requirements, the reclaimer may
fail to allocate a backing page for swapping if the limit has been
reached. In this case the page will not be able to be reclaimed and the
system will not be able to allocate any new EPC pages. Any ioctl or
call to add new EPC pages will get -ENOMEM, so for example, new
enclaves will fail to load, and new EPC pages will not be able to be
added.

Does that make sense?

Kristen



2022-01-08 15:59:58

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] x86/sgx: Add accounting for tracking overcommit

On Fri, Jan 07, 2022 at 10:16:16AM -0800, Kristen Carlson Accardi wrote:
> When the system runs out of enclave memory, SGX can reclaim EPC pages
> by swapping to normal RAM. This normal RAM is allocated via a
> per-enclave shared memory area. The shared memory area is not mapped
> into the enclave or the task mapping it, which makes its memory use
> opaque (including to the OOM killer). Having lots of hard to find
> memory around is problematic, especially when there is no limit.
>
> Introduce a global counter that can be used to limit the number of pages
> that enclaves are able to consume for backing storage. This parameter
> is a percentage value that is used in conjunction with the number of
> EPC pages in the system to set a cap on the amount of backing RAM that
> can be consumed.
>
> The overcommit percentage value is 150, which limits the total number of
> shared memory pages that may be consumed by all enclaves as backing pages
> to 1.5X of EPC pages on the system.
>
> For example, on an SGX system that has 128MB of EPC, this would
> cap the amount of normal RAM that SGX consumes for its shared memory
> areas at 192MB.
>
> The SGX overcommit percent works differently than the core VM overcommit
> limit. Enclaves request backing pages one page at a time, and the number
> of in use backing pages that are allowed is a global resource that is
> limited for all enclaves.
>
> Introduce a pair of functions which can be used by callers when requesting
> backing RAM pages. These functions are responsible for accounting the
> page charges. A request may return an error if the request will cause the
> counter to exceed the backing page cap.
>
> Signed-off-by: Kristen Carlson Accardi <[email protected]>
> ---
> arch/x86/kernel/cpu/sgx/main.c | 46 ++++++++++++++++++++++++++++++++++
> arch/x86/kernel/cpu/sgx/sgx.h | 2 ++
> 2 files changed, 48 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> index 2857a49f2335..8a7bfe0d9023 100644
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -43,6 +43,45 @@ static struct sgx_numa_node *sgx_numa_nodes;
>
> static LIST_HEAD(sgx_dirty_page_list);
>
> +/*
> + * Limits the amount of normal RAM that SGX can consume for EPC
> + * overcommit to the total EPC pages * sgx_overcommit_percent / 100
> + */
> +static int sgx_overcommit_percent = 150;
> +
> +/* The number of pages that can be allocated globally for backing storage. */
> +static atomic_long_t sgx_nr_available_backing_pages;
> +
> +/**
> + * sgx_charge_mem() - charge for a page used for backing storage
> + *
> + * Backing storage usage is capped by the sgx_nr_available_backing_pages.
> + * If the backing storage usage is over the overcommit limit,
> + * return an error.
> + *
> + * Return:
> + * 0: The page requested does not exceed the limit
> + * -ENOMEM: The page requested exceeds the overcommit limit
> + */
> +int sgx_charge_mem(void)
> +{
> + if (!atomic_long_add_unless(&sgx_nr_available_backing_pages, -1, 0))
> + return -ENOMEM;
> +
> + return 0;
> +}
> +
> +/**
> + * sgx_uncharge_mem() - uncharge a page previously used for backing storage
> + *
> + * When backing storage is no longer in use, increment the
> + * sgx_nr_available_backing_pages counter.
> + */
> +void sgx_uncharge_mem(void)
> +{
> + atomic_long_inc(&sgx_nr_available_backing_pages);
> +}
> +
> /*
> * Reset post-kexec EPC pages to the uninitialized state. The pages are removed
> * from the input list, and made available for the page allocator. SECS pages
> @@ -786,6 +825,8 @@ static bool __init sgx_page_cache_init(void)
> u64 pa, size;
> int nid;
> int i;
> + u64 total_epc_bytes = 0;
> + u64 available_backing_bytes;

Use reverse-christmas-tree declaration order here. This was required for
the original patch set, so I expect this to still hold.

>
> sgx_numa_nodes = kmalloc_array(num_possible_nodes(), sizeof(*sgx_numa_nodes), GFP_KERNEL);
> if (!sgx_numa_nodes)
> @@ -830,6 +871,7 @@ static bool __init sgx_page_cache_init(void)
>
> sgx_epc_sections[i].node = &sgx_numa_nodes[nid];
> sgx_numa_nodes[nid].size += size;
> + total_epc_bytes += size;
>
> sgx_nr_epc_sections++;
> }
> @@ -839,6 +881,10 @@ static bool __init sgx_page_cache_init(void)
> return false;
> }
>
> + available_backing_bytes = total_epc_bytes * (sgx_overcommit_percent / 100);
> +

IMHO this empty line could be removed, and group these statements together.

> + atomic_long_set(&sgx_nr_available_backing_pages, available_backing_bytes >> PAGE_SHIFT);
> +
> return true;
> }
>
> diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
> index 0f17def9fe6f..3507a9983fc1 100644
> --- a/arch/x86/kernel/cpu/sgx/sgx.h
> +++ b/arch/x86/kernel/cpu/sgx/sgx.h
> @@ -89,6 +89,8 @@ void sgx_free_epc_page(struct sgx_epc_page *page);
> void sgx_mark_page_reclaimable(struct sgx_epc_page *page);
> int sgx_unmark_page_reclaimable(struct sgx_epc_page *page);
> struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim);
> +int sgx_charge_mem(void);
> +void sgx_uncharge_mem(void);
>
> #ifdef CONFIG_X86_SGX_KVM
> int __init sgx_vepc_init(void);
> --
> 2.20.1
>

Other than that, looks good to me.

BR,
Jarkko

2022-01-08 16:05:35

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] x86/sgx: account backing pages

On Fri, Jan 07, 2022 at 10:16:17AM -0800, Kristen Carlson Accardi wrote:
> SGX may allow EPC pages to be overcommitted. If the system is
> out of enclave memory, EPC pages are swapped to normal RAM via
> a per enclave shared memory area. This shared memory is not
> charged to the enclave or the task mapping it, making it hard
> to account for using normal methods.
>
> In order to avoid unlimited usage of normal RAM, enclaves must be
> charged for each new page used for backing storage, and uncharged
> when they are no longer using a backing page.
>
> Modify the existing flow for requesting backing pages to reduce the
> available backing page counter and confirm that the limit has not
> been exceeded. Backing page usage for loading EPC pages back out of
> the shared memory do not incur a charge.
>
> When a backing page is released from usage, increment the available
> backing page counter.
>
> When swapping EPC pages to RAM, in addition to storing the page
> contents, SGX must store some additional metadata to protect
> against a malicious kernel when the page is swapped back in. This
> additional metadata is called Paging Crypto MetaData. PCMD is
> allocated from the same shared memory area as the backing page
> contents and consumes RAM the same way.
>
> PCMD is 128 bytes in size, and there is one PCMD structure per
> page written to shared RAM. The page index for the PCMD page is
> calculated from the page index of the backing page, so it is possible
> that the PCMD structures are not packed into the minimum number of
> pages possible. If 32 PCMDs can fit onto a single page, then PCMD
> usage is 1/32 of total EPC pages. In the worst case, PCMD can
> consume the same amount of RAM as EPC backing pages (1:1). For
> simplicity, this implementation does not account for PCMD page usage.
>
> Signed-off-by: Kristen Carlson Accardi <[email protected]>
> ---
> arch/x86/kernel/cpu/sgx/encl.c | 76 ++++++++++++++++++++++++++++++++--
> arch/x86/kernel/cpu/sgx/encl.h | 6 ++-
> arch/x86/kernel/cpu/sgx/main.c | 6 +--
> 3 files changed, 80 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> index 001808e3901c..8be6f0592bdc 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.c
> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> @@ -32,7 +32,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
> else
> page_index = PFN_DOWN(encl->size);
>
> - ret = sgx_encl_get_backing(encl, page_index, &b);
> + ret = sgx_encl_lookup_backing(encl, page_index, &b);
> if (ret)
> return ret;
>
> @@ -407,6 +407,12 @@ void sgx_encl_release(struct kref *ref)
> sgx_encl_free_epc_page(entry->epc_page);
> encl->secs_child_cnt--;
> entry->epc_page = NULL;
> + } else {
> + /*
> + * If there is no epc_page, it means it has been
> + * swapped out. Uncharge the backing storage.
> + */
> + sgx_uncharge_mem();
> }
>
> kfree(entry);
> @@ -574,8 +580,8 @@ static struct page *sgx_encl_get_backing_page(struct sgx_encl *encl,
> * 0 on success,
> * -errno otherwise.
> */
> -int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
> - struct sgx_backing *backing)
> +static int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
> + struct sgx_backing *backing)
> {
> pgoff_t pcmd_index = PFN_DOWN(encl->size) + 1 + (page_index >> 5);
> struct page *contents;
> @@ -601,6 +607,62 @@ int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
> return 0;
> }
>
> +/**
> + * sgx_encl_alloc_backing() - allocate a new backing storage page
> + * @encl: an enclave pointer
> + * @page_index: enclave page index
> + * @backing: data for accessing backing storage for the page
> + *
> + * Confirm that the global overcommit limit has not been reached before
> + * requesting a new backing storage page for storing the encrypted contents
> + * and Paging Crypto MetaData (PCMD) of an enclave page. This is called when
> + * there is no existing backing page, just before writing to the backing
> + * storage with EWB.
> + *
> + * Return:
> + * 0 on success,
> + * -errno otherwise.
> + */
> +int sgx_encl_alloc_backing(struct sgx_encl *encl, unsigned long page_index,
> + struct sgx_backing *backing)
> +{
> + int ret;
> +
> + if (sgx_charge_mem())
> + return -ENOMEM;
> +
> + ret = sgx_encl_get_backing(encl, page_index, backing);
> + if (ret)
> + sgx_uncharge_mem();
> +
> + return ret;
> +}
> +
> +/**
> + * sgx_encl_lookup_backing() - retrieve an existing backing storage page
> + * @encl: an enclave pointer
> + * @page_index: enclave page index
> + * @backing: data for accessing backing storage for the page
> + *
> + * Retrieve a backing page for loading data back into an EPC page with ELDU.
> + * This call does not cause a charge to the overcommit limit because a page
> + * has already been allocated, but has been swapped out or is in RAM
> + *
> + * It is the caller's responsibility to ensure that it is appropriate to
> + * use sgx_encl_lookup_backing() rather than sgx_encl_alloc_backing(). If
> + * lookup is not used correctly, this will cause an allocation that is
> + * not accounted for.
> + *
> + * Return:
> + * 0 on success,
> + * -errno otherwise.
> + */
> +int sgx_encl_lookup_backing(struct sgx_encl *encl, unsigned long page_index,
> + struct sgx_backing *backing)
> +{
> + return sgx_encl_get_backing(encl, page_index, backing);
> +}

IMHO, sgx_encl_backing() should be open-coded here.

> +
> /**
> * sgx_encl_put_backing() - Unpin the backing storage
> * @backing: data for accessing backing storage for the page
> @@ -608,9 +670,17 @@ int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
> */
> void sgx_encl_put_backing(struct sgx_backing *backing, bool do_write)
> {
> + /*
> + * If the page is being written to by the reclaimer, it is
> + * still in use and the backing page usage should not be
> + * uncharged. However, if the page is not being written to,
> + * it is no longer in use and may be uncharged.
> + */
> if (do_write) {
> set_page_dirty(backing->pcmd);
> set_page_dirty(backing->contents);
> + } else {
> + sgx_uncharge_mem();
> }
>
> put_page(backing->pcmd);
> diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
> index fec43ca65065..8ffb8a83263f 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.h
> +++ b/arch/x86/kernel/cpu/sgx/encl.h
> @@ -105,8 +105,10 @@ int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
>
> void sgx_encl_release(struct kref *ref);
> int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm);
> -int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
> - struct sgx_backing *backing);
> +int sgx_encl_alloc_backing(struct sgx_encl *encl, unsigned long page_index,
> + struct sgx_backing *backing);

This might sound overly long but it really would make this a lot clearer
than alloc, which can mean pretty much anything:

sgx_encl_lookup_and_charge_backing()

What do you think? That leaves no guesswork what the function does.

> +int sgx_encl_lookup_backing(struct sgx_encl *encl, unsigned long page_index,
> + struct sgx_backing *backing);
> void sgx_encl_put_backing(struct sgx_backing *backing, bool do_write);
> int sgx_encl_test_and_clear_young(struct mm_struct *mm,
> struct sgx_encl_page *page);
> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> index 8a7bfe0d9023..20ccd9c5b184 100644
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -347,8 +347,8 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
> encl->secs_child_cnt--;
>
> if (!encl->secs_child_cnt && test_bit(SGX_ENCL_INITIALIZED, &encl->flags)) {
> - ret = sgx_encl_get_backing(encl, PFN_DOWN(encl->size),
> - &secs_backing);
> + ret = sgx_encl_alloc_backing(encl, PFN_DOWN(encl->size),
> + &secs_backing);
> if (ret)
> goto out;
>
> @@ -418,7 +418,7 @@ static void sgx_reclaim_pages(void)
> goto skip;
>
> page_index = PFN_DOWN(encl_page->desc - encl_page->encl->base);
> - ret = sgx_encl_get_backing(encl_page->encl, page_index, &backing[i]);
> + ret = sgx_encl_alloc_backing(encl_page->encl, page_index, &backing[i]);
> if (ret)
> goto skip;
>
> --
> 2.20.1
>

/Jarkko

2022-01-11 14:20:39

by Haitao Huang

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] x86/sgx: Add accounting for tracking overcommit

On Fri, 07 Jan 2022 13:16:12 -0600, Kristen Carlson Accardi
<[email protected]> wrote:

> On Fri, 2022-01-07 at 10:46 -0800, Dave Hansen wrote:
>> On 1/7/22 10:16, Kristen Carlson Accardi wrote:
>> > The overcommit percentage value is 150, which limits the total
>> > number of
>> > shared memory pages that may be consumed by all enclaves as backing
>> > pages
>> > to 1.5X of EPC pages on the system.
>>
>> Hi Kristen,
>>
>> Could you give some background on how this value was chosen and how
>> it
>> might impact userspace?
>
> Yes,
> The value of 1.5x the number of EPC pages was chosen because it will
> handle the most common case of a few enclaves that don't need much
> overcommit without any impact to user space. In the less commone case
> where there are many enclaves or a few large enclaves which need a lot
> of overcommit due to large EPC memory requirements, the reclaimer may
> fail to allocate a backing page for swapping if the limit has been
> reached. In this case the page will not be able to be reclaimed and the
> system will not be able to allocate any new EPC pages. Any ioctl or
> call to add new EPC pages will get -ENOMEM, so for example, new
> enclaves will fail to load, and new EPC pages will not be able to be
> added.
>
> Does that make sense?

If the system has a ton of RAM but limited EPC, I think it makes sense to
allow more EPC swapping, can we do min(0.5*RAM, 2*EPC)?
I suppose if the system is used for heavy enclave load, user would be
willing to at least use half of RAM.

Thanks
Haitao

2022-01-11 15:43:52

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] x86/sgx: Add accounting for tracking overcommit

On 1/11/22 06:20, Haitao Huang wrote:
> If the system has a ton of RAM but limited EPC, I think it makes sense
> to allow more EPC swapping, can we do min(0.5*RAM, 2*EPC)?
> I suppose if the system is used for heavy enclave load, user would be
> willing to at least use half of RAM.

If I have 100GB of RAM and 100MB of EPC, can I really *meaningfully* run
50GB of enclaves? In that case, if everything was swapped out evenly, I
would only have a 499/500 chance that a given page reference would fault.

This isn't about a "heavy enclave load". If there is *that* much
swapped-out enclave memory, will an enclave even make meaningful forward
progress?

2022-01-11 16:33:34

by Haitao Huang

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] x86/sgx: Add accounting for tracking overcommit

On Tue, 11 Jan 2022 09:43:35 -0600, Dave Hansen <[email protected]>
wrote:

> On 1/11/22 06:20, Haitao Huang wrote:
>> If the system has a ton of RAM but limited EPC, I think it makes sense
>> to allow more EPC swapping, can we do min(0.5*RAM, 2*EPC)?
>> I suppose if the system is used for heavy enclave load, user would be
>> willing to at least use half of RAM.
>
> If I have 100GB of RAM and 100MB of EPC, can I really *meaningfully* run
> 50GB of enclaves? In that case, if everything was swapped out evenly, I
> would only have a 499/500 chance that a given page reference would fault.
>

The formula will cap swapping at 2*EPC so only 200MB swapped out. So the
miss is at most 1/3.
The original hard coded cap 1.5*EPC may still consume too much RAM if
RAM<1.5*EPC.

> This isn't about a "heavy enclave load". If there is *that* much
> swapped-out enclave memory, will an enclave even make meaningful forward
> progress?



2022-01-11 17:39:55

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] x86/sgx: Add accounting for tracking overcommit

On 1/11/22 08:33, Haitao Huang wrote:
> On Tue, 11 Jan 2022 09:43:35 -0600, Dave Hansen <[email protected]>
> wrote:
>> On 1/11/22 06:20, Haitao Huang wrote:
>>> If the system has a ton of RAM but limited EPC, I think it makes
>>> sense to allow more EPC swapping, can we do min(0.5*RAM, 2*EPC)?
>>> I suppose if the system is used for heavy enclave load, user would be
>>> willing to at least use half of RAM.
>>
>> If I have 100GB of RAM and 100MB of EPC, can I really *meaningfully*
>> run 50GB of enclaves?  In that case, if everything was swapped out
>> evenly, I would only have a 499/500 chance that a given page reference
>> would fault.
>
> The formula will cap swapping at 2*EPC so only 200MB swapped out.  So
> the miss is at most 1/3.
> The original hard coded cap 1.5*EPC may still consume too much RAM if
> RAM<1.5*EPC.

Oh, sorry, I read that backwards.

Basing it on the amount of RAM is a bit nasty. You might either really
overly restrict the amount of allowed EPC, or you have to handle hotplug.

2022-01-14 22:59:26

by Kristen Carlson Accardi

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] x86/sgx: Add accounting for tracking overcommit

On Tue, 2022-01-11 at 09:39 -0800, Dave Hansen wrote:
> On 1/11/22 08:33, Haitao Huang wrote:
> > On Tue, 11 Jan 2022 09:43:35 -0600, Dave Hansen <
> > [email protected]>
> > wrote:
> > > On 1/11/22 06:20, Haitao Huang wrote:
> > > > If the system has a ton of RAM but limited EPC, I think it
> > > > makes
> > > > sense to allow more EPC swapping, can we do min(0.5*RAM,
> > > > 2*EPC)?
> > > > I suppose if the system is used for heavy enclave load, user
> > > > would be
> > > > willing to at least use half of RAM.
> > >
> > > If I have 100GB of RAM and 100MB of EPC, can I really
> > > *meaningfully*
> > > run 50GB of enclaves? In that case, if everything was swapped
> > > out
> > > evenly, I would only have a 499/500 chance that a given page
> > > reference
> > > would fault.
> >
> > The formula will cap swapping at 2*EPC so only 200MB swapped out.
> > So
> > the miss is at most 1/3.
> > The original hard coded cap 1.5*EPC may still consume too much RAM
> > if
> > RAM<1.5*EPC.
>
> Oh, sorry, I read that backwards.
>
> Basing it on the amount of RAM is a bit nasty. You might either
> really
> overly restrict the amount of allowed EPC, or you have to handle
> hotplug.

My opinion is that we should keep the current algorithm for now as it
is pretty straightforward, and cgroups will eventually allow for more
control.


2022-01-14 22:59:27

by Kristen Carlson Accardi

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] x86/sgx: Add accounting for tracking overcommit

On Sat, 2022-01-08 at 17:59 +0200, Jarkko Sakkinen wrote:
> On Fri, Jan 07, 2022 at 10:16:16AM -0800, Kristen Carlson Accardi
> wrote:
> > When the system runs out of enclave memory, SGX can reclaim EPC
> > pages
> > by swapping to normal RAM. This normal RAM is allocated via a
> > per-enclave shared memory area. The shared memory area is not
> > mapped
> > into the enclave or the task mapping it, which makes its memory use
> > opaque (including to the OOM killer). Having lots of hard to find
> > memory around is problematic, especially when there is no limit.
> >
> > Introduce a global counter that can be used to limit the number of
> > pages
> > that enclaves are able to consume for backing storage. This
> > parameter
> > is a percentage value that is used in conjunction with the number
> > of
> > EPC pages in the system to set a cap on the amount of backing RAM
> > that
> > can be consumed.
> >
> > The overcommit percentage value is 150, which limits the total
> > number of
> > shared memory pages that may be consumed by all enclaves as backing
> > pages
> > to 1.5X of EPC pages on the system.
> >
> > For example, on an SGX system that has 128MB of EPC, this would
> > cap the amount of normal RAM that SGX consumes for its shared
> > memory
> > areas at 192MB.
> >
> > The SGX overcommit percent works differently than the core VM
> > overcommit
> > limit. Enclaves request backing pages one page at a time, and the
> > number
> > of in use backing pages that are allowed is a global resource that
> > is
> > limited for all enclaves.
> >
> > Introduce a pair of functions which can be used by callers when
> > requesting
> > backing RAM pages. These functions are responsible for accounting
> > the
> > page charges. A request may return an error if the request will
> > cause the
> > counter to exceed the backing page cap.
> >
> > Signed-off-by: Kristen Carlson Accardi <[email protected]>
> > ---
> > arch/x86/kernel/cpu/sgx/main.c | 46
> > ++++++++++++++++++++++++++++++++++
> > arch/x86/kernel/cpu/sgx/sgx.h | 2 ++
> > 2 files changed, 48 insertions(+)
> >
> > diff --git a/arch/x86/kernel/cpu/sgx/main.c
> > b/arch/x86/kernel/cpu/sgx/main.c
> > index 2857a49f2335..8a7bfe0d9023 100644
> > --- a/arch/x86/kernel/cpu/sgx/main.c
> > +++ b/arch/x86/kernel/cpu/sgx/main.c
> > @@ -43,6 +43,45 @@ static struct sgx_numa_node *sgx_numa_nodes;
> >
> > static LIST_HEAD(sgx_dirty_page_list);
> >
> > +/*
> > + * Limits the amount of normal RAM that SGX can consume for EPC
> > + * overcommit to the total EPC pages * sgx_overcommit_percent /
> > 100
> > + */
> > +static int sgx_overcommit_percent = 150;
> > +
> > +/* The number of pages that can be allocated globally for backing
> > storage. */
> > +static atomic_long_t sgx_nr_available_backing_pages;
> > +
> > +/**
> > + * sgx_charge_mem() - charge for a page used for backing storage
> > + *
> > + * Backing storage usage is capped by the
> > sgx_nr_available_backing_pages.
> > + * If the backing storage usage is over the overcommit limit,
> > + * return an error.
> > + *
> > + * Return:
> > + * 0: The page requested does not exceed the limit
> > + * -ENOMEM: The page requested exceeds the overcommit limit
> > + */
> > +int sgx_charge_mem(void)
> > +{
> > + if (!atomic_long_add_unless(&sgx_nr_available_backing_pages,
> > -1, 0))
> > + return -ENOMEM;
> > +
> > + return 0;
> > +}
> > +
> > +/**
> > + * sgx_uncharge_mem() - uncharge a page previously used for
> > backing storage
> > + *
> > + * When backing storage is no longer in use, increment the
> > + * sgx_nr_available_backing_pages counter.
> > + */
> > +void sgx_uncharge_mem(void)
> > +{
> > + atomic_long_inc(&sgx_nr_available_backing_pages);
> > +}
> > +
> > /*
> > * Reset post-kexec EPC pages to the uninitialized state. The
> > pages are removed
> > * from the input list, and made available for the page allocator.
> > SECS pages
> > @@ -786,6 +825,8 @@ static bool __init sgx_page_cache_init(void)
> > u64 pa, size;
> > int nid;
> > int i;
> > + u64 total_epc_bytes = 0;
> > + u64 available_backing_bytes;
>
> Use reverse-christmas-tree declaration order here. This was required
> for
> the original patch set, so I expect this to still hold.

sure.

>
> >
> > sgx_numa_nodes = kmalloc_array(num_possible_nodes(),
> > sizeof(*sgx_numa_nodes), GFP_KERNEL);
> > if (!sgx_numa_nodes)
> > @@ -830,6 +871,7 @@ static bool __init sgx_page_cache_init(void)
> >
> > sgx_epc_sections[i].node = &sgx_numa_nodes[nid];
> > sgx_numa_nodes[nid].size += size;
> > + total_epc_bytes += size;
> >
> > sgx_nr_epc_sections++;
> > }
> > @@ -839,6 +881,10 @@ static bool __init sgx_page_cache_init(void)
> > return false;
> > }
> >
> > + available_backing_bytes = total_epc_bytes *
> > (sgx_overcommit_percent / 100);
> > +
>
> IMHO this empty line could be removed, and group these statements
> together.

ok.

>
> > + atomic_long_set(&sgx_nr_available_backing_pages,
> > available_backing_bytes >> PAGE_SHIFT);
> > +
> > return true;
> > }
> >
> > diff --git a/arch/x86/kernel/cpu/sgx/sgx.h
> > b/arch/x86/kernel/cpu/sgx/sgx.h
> > index 0f17def9fe6f..3507a9983fc1 100644
> > --- a/arch/x86/kernel/cpu/sgx/sgx.h
> > +++ b/arch/x86/kernel/cpu/sgx/sgx.h
> > @@ -89,6 +89,8 @@ void sgx_free_epc_page(struct sgx_epc_page
> > *page);
> > void sgx_mark_page_reclaimable(struct sgx_epc_page *page);
> > int sgx_unmark_page_reclaimable(struct sgx_epc_page *page);
> > struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool
> > reclaim);
> > +int sgx_charge_mem(void);
> > +void sgx_uncharge_mem(void);
> >
> > #ifdef CONFIG_X86_SGX_KVM
> > int __init sgx_vepc_init(void);
> > --
> > 2.20.1
> >
>
> Other than that, looks good to me.

thanks for taking another look. I will incorporate your suggestions
into the next revision.


2022-01-14 22:59:33

by Kristen Carlson Accardi

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] x86/sgx: account backing pages

On Sat, 2022-01-08 at 18:05 +0200, Jarkko Sakkinen wrote:
> On Fri, Jan 07, 2022 at 10:16:17AM -0800, Kristen Carlson Accardi
> wrote:
> > SGX may allow EPC pages to be overcommitted. If the system is
> > out of enclave memory, EPC pages are swapped to normal RAM via
> > a per enclave shared memory area. This shared memory is not
> > charged to the enclave or the task mapping it, making it hard
> > to account for using normal methods.
> >
> > In order to avoid unlimited usage of normal RAM, enclaves must be
> > charged for each new page used for backing storage, and uncharged
> > when they are no longer using a backing page.
> >
> > Modify the existing flow for requesting backing pages to reduce the
> > available backing page counter and confirm that the limit has not
> > been exceeded. Backing page usage for loading EPC pages back out of
> > the shared memory do not incur a charge.
> >
> > When a backing page is released from usage, increment the available
> > backing page counter.
> >
> > When swapping EPC pages to RAM, in addition to storing the page
> > contents, SGX must store some additional metadata to protect
> > against a malicious kernel when the page is swapped back in. This
> > additional metadata is called Paging Crypto MetaData. PCMD is
> > allocated from the same shared memory area as the backing page
> > contents and consumes RAM the same way.
> >
> > PCMD is 128 bytes in size, and there is one PCMD structure per
> > page written to shared RAM. The page index for the PCMD page is
> > calculated from the page index of the backing page, so it is
> > possible
> > that the PCMD structures are not packed into the minimum number of
> > pages possible. If 32 PCMDs can fit onto a single page, then PCMD
> > usage is 1/32 of total EPC pages. In the worst case, PCMD can
> > consume the same amount of RAM as EPC backing pages (1:1). For
> > simplicity, this implementation does not account for PCMD page
> > usage.
> >
> > Signed-off-by: Kristen Carlson Accardi <[email protected]>
> > ---
> > arch/x86/kernel/cpu/sgx/encl.c | 76
> > ++++++++++++++++++++++++++++++++--
> > arch/x86/kernel/cpu/sgx/encl.h | 6 ++-
> > arch/x86/kernel/cpu/sgx/main.c | 6 +--
> > 3 files changed, 80 insertions(+), 8 deletions(-)
> >
> > diff --git a/arch/x86/kernel/cpu/sgx/encl.c
> > b/arch/x86/kernel/cpu/sgx/encl.c
> > index 001808e3901c..8be6f0592bdc 100644
> > --- a/arch/x86/kernel/cpu/sgx/encl.c
> > +++ b/arch/x86/kernel/cpu/sgx/encl.c
> > @@ -32,7 +32,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page
> > *encl_page,
> > else
> > page_index = PFN_DOWN(encl->size);
> >
> > - ret = sgx_encl_get_backing(encl, page_index, &b);
> > + ret = sgx_encl_lookup_backing(encl, page_index, &b);
> > if (ret)
> > return ret;
> >
> > @@ -407,6 +407,12 @@ void sgx_encl_release(struct kref *ref)
> > sgx_encl_free_epc_page(entry->epc_page);
> > encl->secs_child_cnt--;
> > entry->epc_page = NULL;
> > + } else {
> > + /*
> > + * If there is no epc_page, it means it has
> > been
> > + * swapped out. Uncharge the backing storage.
> > + */
> > + sgx_uncharge_mem();
> > }
> >
> > kfree(entry);
> > @@ -574,8 +580,8 @@ static struct page
> > *sgx_encl_get_backing_page(struct sgx_encl *encl,
> > * 0 on success,
> > * -errno otherwise.
> > */
> > -int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long
> > page_index,
> > - struct sgx_backing *backing)
> > +static int sgx_encl_get_backing(struct sgx_encl *encl, unsigned
> > long page_index,
> > + struct sgx_backing *backing)
> > {
> > pgoff_t pcmd_index = PFN_DOWN(encl->size) + 1 + (page_index >>
> > 5);
> > struct page *contents;
> > @@ -601,6 +607,62 @@ int sgx_encl_get_backing(struct sgx_encl
> > *encl, unsigned long page_index,
> > return 0;
> > }
> >
> > +/**
> > + * sgx_encl_alloc_backing() - allocate a new backing storage page
> > + * @encl: an enclave pointer
> > + * @page_index: enclave page index
> > + * @backing: data for accessing backing storage for the page
> > + *
> > + * Confirm that the global overcommit limit has not been reached
> > before
> > + * requesting a new backing storage page for storing the encrypted
> > contents
> > + * and Paging Crypto MetaData (PCMD) of an enclave page. This is
> > called when
> > + * there is no existing backing page, just before writing to the
> > backing
> > + * storage with EWB.
> > + *
> > + * Return:
> > + * 0 on success,
> > + * -errno otherwise.
> > + */
> > +int sgx_encl_alloc_backing(struct sgx_encl *encl, unsigned long
> > page_index,
> > + struct sgx_backing *backing)
> > +{
> > + int ret;
> > +
> > + if (sgx_charge_mem())
> > + return -ENOMEM;
> > +
> > + ret = sgx_encl_get_backing(encl, page_index, backing);
> > + if (ret)
> > + sgx_uncharge_mem();
> > +
> > + return ret;
> > +}
> > +
> > +/**
> > + * sgx_encl_lookup_backing() - retrieve an existing backing
> > storage page
> > + * @encl: an enclave pointer
> > + * @page_index: enclave page index
> > + * @backing: data for accessing backing storage for the page
> > + *
> > + * Retrieve a backing page for loading data back into an EPC page
> > with ELDU.
> > + * This call does not cause a charge to the overcommit limit
> > because a page
> > + * has already been allocated, but has been swapped out or is in
> > RAM
> > + *
> > + * It is the caller's responsibility to ensure that it is
> > appropriate to
> > + * use sgx_encl_lookup_backing() rather than
> > sgx_encl_alloc_backing(). If
> > + * lookup is not used correctly, this will cause an allocation
> > that is
> > + * not accounted for.
> > + *
> > + * Return:
> > + * 0 on success,
> > + * -errno otherwise.
> > + */
> > +int sgx_encl_lookup_backing(struct sgx_encl *encl, unsigned long
> > page_index,
> > + struct sgx_backing *backing)
> > +{
> > + return sgx_encl_get_backing(encl, page_index, backing);
> > +}
>
> IMHO, sgx_encl_backing() should be open-coded here.

I can understand your hesitation, but I agree with Dave here that
wrapping the function makes the code more clear. I would prefer to keep
this the way it is.


2022-01-14 23:02:56

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] x86/sgx: account backing pages

On 1/14/22 9:51 AM, Kristen Carlson Accardi wrote:
>>> +int sgx_encl_lookup_backing(struct sgx_encl *encl, unsigned long
>>> page_index,
>>> + struct sgx_backing *backing)
>>> +{
>>> + return sgx_encl_get_backing(encl, page_index, backing);
>>> +}
>> IMHO, sgx_encl_backing() should be open-coded here.
> I can understand your hesitation, but I agree with Dave here that
> wrapping the function makes the code more clear. I would prefer to keep
> this the way it is.

I'd also like to see sgx_encl_lookup_backing() and
sgx_encl_alloc_backing() diverge more in the future.

For instance, sgx_encl_alloc_backing() could ensure that the page does
not exist in the file before doing the sgx_encl_get_backing() call.
This would ensure that it truly *does* allocate a page and does not just
return a previously-allocated page.

sgx_encl_lookup_backing() could ensure the opposite: that the page
*DOES* exist in the file before doing the sgx_encl_get_backing() call.
This would ensure that it does not allocate a page in a case where we
expected an old, existing page to be present.

2022-01-15 04:27:29

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] x86/sgx: account backing pages

On Fri, Jan 14, 2022 at 09:51:14AM -0800, Kristen Carlson Accardi wrote:
> On Sat, 2022-01-08 at 18:05 +0200, Jarkko Sakkinen wrote:
> > On Fri, Jan 07, 2022 at 10:16:17AM -0800, Kristen Carlson Accardi
> > wrote:
> > > SGX may allow EPC pages to be overcommitted. If the system is
> > > out of enclave memory, EPC pages are swapped to normal RAM via
> > > a per enclave shared memory area. This shared memory is not
> > > charged to the enclave or the task mapping it, making it hard
> > > to account for using normal methods.
> > >
> > > In order to avoid unlimited usage of normal RAM, enclaves must be
> > > charged for each new page used for backing storage, and uncharged
> > > when they are no longer using a backing page.
> > >
> > > Modify the existing flow for requesting backing pages to reduce the
> > > available backing page counter and confirm that the limit has not
> > > been exceeded. Backing page usage for loading EPC pages back out of
> > > the shared memory do not incur a charge.
> > >
> > > When a backing page is released from usage, increment the available
> > > backing page counter.
> > >
> > > When swapping EPC pages to RAM, in addition to storing the page
> > > contents, SGX must store some additional metadata to protect
> > > against a malicious kernel when the page is swapped back in. This
> > > additional metadata is called Paging Crypto MetaData. PCMD is
> > > allocated from the same shared memory area as the backing page
> > > contents and consumes RAM the same way.
> > >
> > > PCMD is 128 bytes in size, and there is one PCMD structure per
> > > page written to shared RAM. The page index for the PCMD page is
> > > calculated from the page index of the backing page, so it is
> > > possible
> > > that the PCMD structures are not packed into the minimum number of
> > > pages possible. If 32 PCMDs can fit onto a single page, then PCMD
> > > usage is 1/32 of total EPC pages. In the worst case, PCMD can
> > > consume the same amount of RAM as EPC backing pages (1:1). For
> > > simplicity, this implementation does not account for PCMD page
> > > usage.
> > >
> > > Signed-off-by: Kristen Carlson Accardi <[email protected]>
> > > ---
> > > arch/x86/kernel/cpu/sgx/encl.c | 76
> > > ++++++++++++++++++++++++++++++++--
> > > arch/x86/kernel/cpu/sgx/encl.h | 6 ++-
> > > arch/x86/kernel/cpu/sgx/main.c | 6 +--
> > > 3 files changed, 80 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/arch/x86/kernel/cpu/sgx/encl.c
> > > b/arch/x86/kernel/cpu/sgx/encl.c
> > > index 001808e3901c..8be6f0592bdc 100644
> > > --- a/arch/x86/kernel/cpu/sgx/encl.c
> > > +++ b/arch/x86/kernel/cpu/sgx/encl.c
> > > @@ -32,7 +32,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page
> > > *encl_page,
> > > else
> > > page_index = PFN_DOWN(encl->size);
> > >
> > > - ret = sgx_encl_get_backing(encl, page_index, &b);
> > > + ret = sgx_encl_lookup_backing(encl, page_index, &b);
> > > if (ret)
> > > return ret;
> > >
> > > @@ -407,6 +407,12 @@ void sgx_encl_release(struct kref *ref)
> > > sgx_encl_free_epc_page(entry->epc_page);
> > > encl->secs_child_cnt--;
> > > entry->epc_page = NULL;
> > > + } else {
> > > + /*
> > > + * If there is no epc_page, it means it has
> > > been
> > > + * swapped out. Uncharge the backing storage.
> > > + */
> > > + sgx_uncharge_mem();
> > > }
> > >
> > > kfree(entry);
> > > @@ -574,8 +580,8 @@ static struct page
> > > *sgx_encl_get_backing_page(struct sgx_encl *encl,
> > > * 0 on success,
> > > * -errno otherwise.
> > > */
> > > -int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long
> > > page_index,
> > > - struct sgx_backing *backing)
> > > +static int sgx_encl_get_backing(struct sgx_encl *encl, unsigned
> > > long page_index,
> > > + struct sgx_backing *backing)
> > > {
> > > pgoff_t pcmd_index = PFN_DOWN(encl->size) + 1 + (page_index >>
> > > 5);
> > > struct page *contents;
> > > @@ -601,6 +607,62 @@ int sgx_encl_get_backing(struct sgx_encl
> > > *encl, unsigned long page_index,
> > > return 0;
> > > }
> > >
> > > +/**
> > > + * sgx_encl_alloc_backing() - allocate a new backing storage page
> > > + * @encl: an enclave pointer
> > > + * @page_index: enclave page index
> > > + * @backing: data for accessing backing storage for the page
> > > + *
> > > + * Confirm that the global overcommit limit has not been reached
> > > before
> > > + * requesting a new backing storage page for storing the encrypted
> > > contents
> > > + * and Paging Crypto MetaData (PCMD) of an enclave page. This is
> > > called when
> > > + * there is no existing backing page, just before writing to the
> > > backing
> > > + * storage with EWB.
> > > + *
> > > + * Return:
> > > + * 0 on success,
> > > + * -errno otherwise.
> > > + */
> > > +int sgx_encl_alloc_backing(struct sgx_encl *encl, unsigned long
> > > page_index,
> > > + struct sgx_backing *backing)
> > > +{
> > > + int ret;
> > > +
> > > + if (sgx_charge_mem())
> > > + return -ENOMEM;
> > > +
> > > + ret = sgx_encl_get_backing(encl, page_index, backing);
> > > + if (ret)
> > > + sgx_uncharge_mem();
> > > +
> > > + return ret;
> > > +}
> > > +
> > > +/**
> > > + * sgx_encl_lookup_backing() - retrieve an existing backing
> > > storage page
> > > + * @encl: an enclave pointer
> > > + * @page_index: enclave page index
> > > + * @backing: data for accessing backing storage for the page
> > > + *
> > > + * Retrieve a backing page for loading data back into an EPC page
> > > with ELDU.
> > > + * This call does not cause a charge to the overcommit limit
> > > because a page
> > > + * has already been allocated, but has been swapped out or is in
> > > RAM
> > > + *
> > > + * It is the caller's responsibility to ensure that it is
> > > appropriate to
> > > + * use sgx_encl_lookup_backing() rather than
> > > sgx_encl_alloc_backing(). If
> > > + * lookup is not used correctly, this will cause an allocation
> > > that is
> > > + * not accounted for.
> > > + *
> > > + * Return:
> > > + * 0 on success,
> > > + * -errno otherwise.
> > > + */
> > > +int sgx_encl_lookup_backing(struct sgx_encl *encl, unsigned long
> > > page_index,
> > > + struct sgx_backing *backing)
> > > +{
> > > + return sgx_encl_get_backing(encl, page_index, backing);
> > > +}
> >
> > IMHO, sgx_encl_backing() should be open-coded here.
>
> I can understand your hesitation, but I agree with Dave here that
> wrapping the function makes the code more clear. I would prefer to keep
> this the way it is.

What if sgx_encl_get_backing() was changed as "static inline", if the
only motivation is encapsulation?

/Jarkko

2022-01-15 04:28:44

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] x86/sgx: account backing pages

On 1/14/22 3:52 PM, Jarkko Sakkinen wrote:
>> I can understand your hesitation, but I agree with Dave here that
>> wrapping the function makes the code more clear. I would prefer to keep
>> this the way it is.
> What if sgx_encl_get_backing() was changed as "static inline", if the
> only motivation is encapsulation?

What would the purpose be of adding an 'inline' to the function definition?

2022-01-15 06:31:10

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] x86/sgx: account backing pages

On Fri, Jan 14, 2022 at 09:55:51AM -0800, Dave Hansen wrote:
> On 1/14/22 9:51 AM, Kristen Carlson Accardi wrote:
> >>> +int sgx_encl_lookup_backing(struct sgx_encl *encl, unsigned long
> >>> page_index,
> >>> + struct sgx_backing *backing)
> >>> +{
> >>> + return sgx_encl_get_backing(encl, page_index, backing);
> >>> +}
> >> IMHO, sgx_encl_backing() should be open-coded here.
> > I can understand your hesitation, but I agree with Dave here that
> > wrapping the function makes the code more clear. I would prefer to keep
> > this the way it is.
>
> I'd also like to see sgx_encl_lookup_backing() and
> sgx_encl_alloc_backing() diverge more in the future.
>
> For instance, sgx_encl_alloc_backing() could ensure that the page does
> not exist in the file before doing the sgx_encl_get_backing() call.
> This would ensure that it truly *does* allocate a page and does not just
> return a previously-allocated page.
>
> sgx_encl_lookup_backing() could ensure the opposite: that the page
> *DOES* exist in the file before doing the sgx_encl_get_backing() call.
> This would ensure that it does not allocate a page in a case where we
> expected an old, existing page to be present.

Would it be a too big tretch to add these and make the whole scheme
fully legit? Does not sound like an extremely huge stretch and there is now
a full cycle amount of time make it happen before 5.18 merge window.

/Jarkko

2022-01-15 08:17:12

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] x86/sgx: account backing pages

On 1/14/22 3:57 PM, Jarkko Sakkinen wrote:
> Would it be a too big tretch to add these and make the whole scheme
> fully legit? Does not sound like an extremely huge stretch and there is now
> a full cycle amount of time make it happen before 5.18 merge window.

No, it's not a big stretch. If anyone wants it for 5.18 (or whatever)
just send the patch.

2022-01-15 11:14:38

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] x86/sgx: account backing pages

On Fri, Jan 14, 2022 at 03:55:43PM -0800, Dave Hansen wrote:
> On 1/14/22 3:52 PM, Jarkko Sakkinen wrote:
> >> I can understand your hesitation, but I agree with Dave here that
> >> wrapping the function makes the code more clear. I would prefer to keep
> >> this the way it is.
> > What if sgx_encl_get_backing() was changed as "static inline", if the
> > only motivation is encapsulation?
>
> What would the purpose be of adding an 'inline' to the function definition?

Agreed, not much sense to do this. I just had troubles to get the way
things were encapsulated before your response.

/Jarkko

2022-01-15 11:15:45

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] x86/sgx: account backing pages

On Fri, Jan 14, 2022 at 04:16:41PM -0800, Dave Hansen wrote:
> On 1/14/22 3:57 PM, Jarkko Sakkinen wrote:
> > Would it be a too big tretch to add these and make the whole scheme
> > fully legit? Does not sound like an extremely huge stretch and there is now
> > a full cycle amount of time make it happen before 5.18 merge window.
>
> No, it's not a big stretch. If anyone wants it for 5.18 (or whatever)
> just send the patch.

Fixing this to work with the current mainline is first and foremost
needed:

https://lore.kernel.org/linux-sgx/[email protected]/T/#t

With a fixeed version of that, wouldn't it be then just matter of calling
shmem_getpage() with SGX_NOALLOC to check the state of the file in both
calls?

/Jarkko

2022-01-16 16:22:26

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] x86/sgx: Limit EPC overcommit

On Sat, Jan 15, 2022 at 08:57:44PM +0200, Jarkko Sakkinen wrote:
> On Fri, Jan 07, 2022 at 10:16:15AM -0800, Kristen Carlson Accardi wrote:
> > SGX currently allows EPC pages to be overcommitted. If the system is
> > out of enclave memory, EPC pages are swapped to normal RAM via
> > a per enclave shared memory area. This shared memory is not charged
> > to the enclave or the task mapping it, making it hard to account
> > for using normal methods. Since SGX will allow EPC pages to be
> > overcommitted without limits, enclaves can consume system memory
> > for these backing pages without limits.
> >
> > In order to prevent this, set a cap on the amount of overcommit SGX
> > allows. Whenever a backing page is requested by an enclave, track
> > the total amount of shared memory pages used across all enclaves and
> > return an error if the overcommit limit has been reached. This will
> > restrict the total amount of backing pages that all enclaves can
> > consume to a maximum amount, and prevent enclaves from consuming
> > all the system RAM for backing pages.
> >
> > The overcommit percentage has a value of 150, which limits shared
> > memory page consumption to 1.5x the number of EPC pages in the system.
> >
> > Changes from v1
> > ----------------
> > * removed module parameter and disable boolean
> > * increased over commit percentage to 150% from 100%
> >
> > Kristen Carlson Accardi (2):
> > x86/sgx: Add accounting for tracking overcommit
> > x86/sgx: account backing pages
> >
> > arch/x86/kernel/cpu/sgx/encl.c | 76 ++++++++++++++++++++++++++++++++--
> > arch/x86/kernel/cpu/sgx/encl.h | 6 ++-
> > arch/x86/kernel/cpu/sgx/main.c | 52 +++++++++++++++++++++--
> > arch/x86/kernel/cpu/sgx/sgx.h | 2 +
> > 4 files changed, 128 insertions(+), 8 deletions(-)
> >
> > --
> > 2.20.1
> >
>
> I've tested also these. Looking at the feedback, there's
> nothing game changing, so you could add for the next
> version:
>
> Tested-by: Jarkko Sakkinen <[email protected]>

The test environment was a VM running my desktop [*] and I just
run many instances of kselftest as a test case.

[*] i5-9600KF CPU

/Jarkko

2022-01-16 16:22:42

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] x86/sgx: Limit EPC overcommit

On Fri, Jan 07, 2022 at 10:16:15AM -0800, Kristen Carlson Accardi wrote:
> SGX currently allows EPC pages to be overcommitted. If the system is
> out of enclave memory, EPC pages are swapped to normal RAM via
> a per enclave shared memory area. This shared memory is not charged
> to the enclave or the task mapping it, making it hard to account
> for using normal methods. Since SGX will allow EPC pages to be
> overcommitted without limits, enclaves can consume system memory
> for these backing pages without limits.
>
> In order to prevent this, set a cap on the amount of overcommit SGX
> allows. Whenever a backing page is requested by an enclave, track
> the total amount of shared memory pages used across all enclaves and
> return an error if the overcommit limit has been reached. This will
> restrict the total amount of backing pages that all enclaves can
> consume to a maximum amount, and prevent enclaves from consuming
> all the system RAM for backing pages.
>
> The overcommit percentage has a value of 150, which limits shared
> memory page consumption to 1.5x the number of EPC pages in the system.
>
> Changes from v1
> ----------------
> * removed module parameter and disable boolean
> * increased over commit percentage to 150% from 100%
>
> Kristen Carlson Accardi (2):
> x86/sgx: Add accounting for tracking overcommit
> x86/sgx: account backing pages
>
> arch/x86/kernel/cpu/sgx/encl.c | 76 ++++++++++++++++++++++++++++++++--
> arch/x86/kernel/cpu/sgx/encl.h | 6 ++-
> arch/x86/kernel/cpu/sgx/main.c | 52 +++++++++++++++++++++--
> arch/x86/kernel/cpu/sgx/sgx.h | 2 +
> 4 files changed, 128 insertions(+), 8 deletions(-)
>
> --
> 2.20.1
>

I've tested also these. Looking at the feedback, there's
nothing game changing, so you could add for the next
version:

Tested-by: Jarkko Sakkinen <[email protected]>

/Jarkko