2022-01-21 11:23:31

by Reinette Chatre

[permalink] [raw]
Subject: [PATCH] x86/sgx: Add poison handling to reclaimer

The machine check recovery handling in SGX added the changes
listed below to the freeing of pages in sgx_free_epc_page().
The SGX reclaimer contains an open coded version of
sgx_free_epc_page() and thus did not obtain the changes in
support of poison handling.

The changes made to EPC page freeing in support of poison handling
are:
1) A new SGX_EPC_PAGE_IS_FREE flag is set when the EPC page is
freed. Introduced in commit d6d261bded8a ("x86/sgx: Add new
sgx_epc_page flag bit to mark free pages").
2) A new "poison" field in struct sgx_epc_page is used to
determine whether a newly freed EPC page should be placed
on the list of poisoned or list of free pages. Introduced
in commit 992801ae9243 ("x86/sgx: Initial poison handling
for dirty and free pages").
3) The owner field in struct sgx_epc_page is cleared when the EPC
page is freed. Introduced in commit 992801ae9243 ("x86/sgx:
Initial poison handling for dirty and free pages").

Replace the open coded enclave page freeing code in the reclaimer
with sgx_free_epc_page() to obtain support for poison page handling.

Fixes: d6d261bded8a ("x86/sgx: Add new sgx_epc_page flag bit to mark free pages")
Fixes: 992801ae9243 ("x86/sgx: Initial poison handling for dirty and free pages")
Signed-off-by: Reinette Chatre <[email protected]>
---
arch/x86/kernel/cpu/sgx/main.c | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 4b41efc9e367..997a5d0bc488 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -418,13 +418,7 @@ static void sgx_reclaim_pages(void)
kref_put(&encl_page->encl->refcount, sgx_encl_release);
epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;

- section = &sgx_epc_sections[epc_page->section];
- node = section->node;
-
- spin_lock(&node->lock);
- list_add_tail(&epc_page->list, &node->free_page_list);
- spin_unlock(&node->lock);
- atomic_long_inc(&sgx_nr_free_pages);
+ sgx_free_epc_page(epc_page);
}
}

--
2.25.1


2022-01-21 20:00:55

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH] x86/sgx: Add poison handling to reclaimer

On 1/18/22 3:05 PM, Reinette Chatre wrote:
> The machine check recovery handling in SGX added the changes
> listed below to the freeing of pages in sgx_free_epc_page().
> The SGX reclaimer contains an open coded version of
> sgx_free_epc_page() and thus did not obtain the changes in
> support of poison handling.

I was trying to decide if this is an urgent fix or not. A more crisp
problem statement might have helped in the changelog.

But, from what I can tell, the most probable troublesome scenario here
would be something like:

1. Machine check (#MC) occurs (asynchronous, !MF_ACTION_REQUIRED)
2. arch_memory_failure() called is eventually
3. (SGX) page->poison set to 1
4. Page is reclaimed
5. Page added to normal free lists by sgx_reclaim_pages()
^ This is the bug
6. Page is reallocated by some innocent enclave, a second (synchronous)
in-kernel #MC is induced, probably during EADD instruction.
^ This is the fallout from the bug

#6 is unfortunate and can be avoided if this patch is applied.

Basically, this patch ensures that a bad enclave page is isolated
quickly and causes a minimal amount of collateral damage. Is this a
valid summary?

The SGX reclaimer code lacks page poison handling in its free
path. This can lead to completely avoidable machine checks if a
poisoned page is freed and reallocated instead of being
isolated.

2022-01-21 20:03:58

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH] x86/sgx: Add poison handling to reclaimer

On 1/19/22 12:47 PM, Reinette Chatre wrote:
> Thank you very much for the detailed analysis. Should I resend with
> an improved commit message that contains your scenario description
> and summary?

Sure, that would be great.

2022-01-21 20:04:46

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH] x86/sgx: Add poison handling to reclaimer

Hi Dave,

On 1/19/2022 11:51 AM, Dave Hansen wrote:
> On 1/18/22 3:05 PM, Reinette Chatre wrote:
>> The machine check recovery handling in SGX added the changes
>> listed below to the freeing of pages in sgx_free_epc_page().
>> The SGX reclaimer contains an open coded version of
>> sgx_free_epc_page() and thus did not obtain the changes in
>> support of poison handling.
>
> I was trying to decide if this is an urgent fix or not. A more crisp
> problem statement might have helped in the changelog.
>
> But, from what I can tell, the most probable troublesome scenario here
> would be something like:
>
> 1. Machine check (#MC) occurs (asynchronous, !MF_ACTION_REQUIRED)
> 2. arch_memory_failure() called is eventually
> 3. (SGX) page->poison set to 1
> 4. Page is reclaimed
> 5. Page added to normal free lists by sgx_reclaim_pages()
> ^ This is the bug
> 6. Page is reallocated by some innocent enclave, a second (synchronous)
> in-kernel #MC is induced, probably during EADD instruction.
> ^ This is the fallout from the bug
>
> #6 is unfortunate and can be avoided if this patch is applied.
>
> Basically, this patch ensures that a bad enclave page is isolated
> quickly and causes a minimal amount of collateral damage. Is this a
> valid summary?
>
> The SGX reclaimer code lacks page poison handling in its free
> path. This can lead to completely avoidable machine checks if a
> poisoned page is freed and reallocated instead of being
> isolated.

As I understand this code it does look like a valid summary to me. One
detail is that the poison page handling is currently done for SECS pages
when they are freed by the reclaimer (via sgx_reclaimer_write()).

Thank you very much for the detailed analysis. Should I resend with
an improved commit message that contains your scenario description
and summary?

Reinette

2022-01-21 22:03:16

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH] x86/sgx: Add poison handling to reclaimer

On Tue, 2022-01-18 at 15:05 -0800, Reinette Chatre wrote:
> The machine check recovery handling in SGX added the changes
> listed below to the freeing of pages in sgx_free_epc_page().
> The SGX reclaimer contains an open coded version of
> sgx_free_epc_page() and thus did not obtain the changes in
> support of poison handling.
>
> The changes made to EPC page freeing in support of poison handling
> are:
> 1) A new SGX_EPC_PAGE_IS_FREE flag is set when the EPC page is
>    freed. Introduced in commit d6d261bded8a ("x86/sgx: Add new
>    sgx_epc_page flag bit to mark free pages").
> 2) A new "poison" field in struct sgx_epc_page is used to
>    determine whether a newly freed EPC page should be placed
>    on the list of poisoned or list of free pages. Introduced
>    in commit 992801ae9243 ("x86/sgx: Initial poison handling
>    for dirty and free pages").
> 3) The owner field in struct sgx_epc_page is cleared when the EPC
>    page is freed.  Introduced in commit 992801ae9243 ("x86/sgx:
>    Initial poison handling for dirty and free pages").
>
> Replace the open coded enclave page freeing code in the reclaimer
> with sgx_free_epc_page() to obtain support for poison page handling.
>
> Fixes: d6d261bded8a ("x86/sgx: Add new sgx_epc_page flag bit to mark
> free pages")

AFAIK, this patch does not semantically break anything so it is not
a legit fixes tag.

BR, Jarkko

2022-01-21 22:21:09

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH] x86/sgx: Add poison handling to reclaimer

Hi Jarkko,

On 1/20/2022 4:57 AM, Jarkko Sakkinen wrote:
> On Tue, 2022-01-18 at 15:05 -0800, Reinette Chatre wrote:
>> The machine check recovery handling in SGX added the changes
>> listed below to the freeing of pages in sgx_free_epc_page().
>> The SGX reclaimer contains an open coded version of
>> sgx_free_epc_page() and thus did not obtain the changes in
>> support of poison handling.
>>
>> The changes made to EPC page freeing in support of poison handling
>> are:
>> 1) A new SGX_EPC_PAGE_IS_FREE flag is set when the EPC page is
>>    freed. Introduced in commit d6d261bded8a ("x86/sgx: Add new
>>    sgx_epc_page flag bit to mark free pages").
>> 2) A new "poison" field in struct sgx_epc_page is used to
>>    determine whether a newly freed EPC page should be placed
>>    on the list of poisoned or list of free pages. Introduced
>>    in commit 992801ae9243 ("x86/sgx: Initial poison handling
>>    for dirty and free pages").
>> 3) The owner field in struct sgx_epc_page is cleared when the EPC
>>    page is freed.  Introduced in commit 992801ae9243 ("x86/sgx:
>>    Initial poison handling for dirty and free pages").
>>
>> Replace the open coded enclave page freeing code in the reclaimer
>> with sgx_free_epc_page() to obtain support for poison page handling.
>>
>> Fixes: d6d261bded8a ("x86/sgx: Add new sgx_epc_page flag bit to mark
>> free pages")
>
> AFAIK, this patch does not semantically break anything so it is not
> a legit fixes tag.
>

The commit you refer to, commit d6d261bded8a ("x86/sgx: Add new
sgx_epc_page flag bit to mark free pages", introduced a new page flag bit
(SGX_EPC_PAGE_IS_FREE) that should be set when an EPC page is freed. The
commit also sets the bit in sgx_free_epc_page() when an EPC page is freed.
The commit should also have set that bit when the EPC page is freed in the
reclaimer, which contains an open coded version of sgx_free_epc_page(),
but it did not. This fix adds the snippet that was omitted from that
commit.

Reinette