The SGX reclaimer code lacks page poison handling in its main
free path. This can lead to avoidable machine checks if a
poisoned page is freed and reallocated instead of being
isolated.
A troublesome scenario is:
1. Machine check (#MC) occurs (asynchronous, !MF_ACTION_REQUIRED)
2. arch_memory_failure() is eventually called
3. (SGX) page->poison set to 1
4. Page is reclaimed
5. Page added to normal free lists by sgx_reclaim_pages()
^ This is the bug (poison pages should be isolated on the
sgx_poison_page_list instead)
6. Page is reallocated by some innocent enclave, a second (synchronous)
in-kernel #MC is induced, probably during EADD instruction.
^ This is the fallout from the bug
(6) is unfortunate and can be avoided by replacing the open coded
enclave page freeing code in the reclaimer with sgx_free_epc_page()
to obtain support for poison page handling that includes placing the
poisoned page on the correct list.
Fixes: d6d261bded8a ("x86/sgx: Add new sgx_epc_page flag bit to mark free pages")
Fixes: 992801ae9243 ("x86/sgx: Initial poison handling for dirty and free pages")
Signed-off-by: Reinette Chatre <[email protected]>
---
Changes since V1:
- V1: https://lore.kernel.org/lkml/ef74bd9548df61f77e802e7505affcfb5159c48c.1642545829.git.reinette.chatre@intel.com/
- Complete rewrite of commit message with significant guidance from Dave
who provided the summary as well as troublesome scenario.
arch/x86/kernel/cpu/sgx/main.c | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 4b41efc9e367..997a5d0bc488 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -418,13 +418,7 @@ static void sgx_reclaim_pages(void)
kref_put(&encl_page->encl->refcount, sgx_encl_release);
epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
- section = &sgx_epc_sections[epc_page->section];
- node = section->node;
-
- spin_lock(&node->lock);
- list_add_tail(&epc_page->list, &node->free_page_list);
- spin_unlock(&node->lock);
- atomic_long_inc(&sgx_nr_free_pages);
+ sgx_free_epc_page(epc_page);
}
}
--
2.25.1
On Wed, 2022-01-19 at 14:23 -0800, Reinette Chatre wrote:
> The SGX reclaimer code lacks page poison handling in its main
> free path. This can lead to avoidable machine checks if a
> poisoned page is freed and reallocated instead of being
> isolated.
>
> A troublesome scenario is:
> 1. Machine check (#MC) occurs (asynchronous, !MF_ACTION_REQUIRED)
> 2. arch_memory_failure() is eventually called
> 3. (SGX) page->poison set to 1
> 4. Page is reclaimed
> 5. Page added to normal free lists by sgx_reclaim_pages()
> ^ This is the bug (poison pages should be isolated on the
> sgx_poison_page_list instead)
> 6. Page is reallocated by some innocent enclave, a second
> (synchronous)
> in-kernel #MC is induced, probably during EADD instruction.
> ^ This is the fallout from the bug
>
> (6) is unfortunate and can be avoided by replacing the open coded
> enclave page freeing code in the reclaimer with sgx_free_epc_page()
> to obtain support for poison page handling that includes placing the
> poisoned page on the correct list.
>
> Fixes: d6d261bded8a ("x86/sgx: Add new sgx_epc_page flag bit to mark
> free pages")
> Fixes: 992801ae9243 ("x86/sgx: Initial poison handling for dirty and
> free pages")
Same comment as for the first version: remove the first fixes tag.
BR, Jarkko
Hi Jarkko,
On 1/20/2022 5:09 AM, Jarkko Sakkinen wrote:
> On Wed, 2022-01-19 at 14:23 -0800, Reinette Chatre wrote:
>> The SGX reclaimer code lacks page poison handling in its main
>> free path. This can lead to avoidable machine checks if a
>> poisoned page is freed and reallocated instead of being
>> isolated.
>>
>> A troublesome scenario is:
>> 1. Machine check (#MC) occurs (asynchronous, !MF_ACTION_REQUIRED)
>> 2. arch_memory_failure() is eventually called
>> 3. (SGX) page->poison set to 1
>> 4. Page is reclaimed
>> 5. Page added to normal free lists by sgx_reclaim_pages()
>> ^ This is the bug (poison pages should be isolated on the
>> sgx_poison_page_list instead)
>> 6. Page is reallocated by some innocent enclave, a second
>> (synchronous)
>> in-kernel #MC is induced, probably during EADD instruction.
>> ^ This is the fallout from the bug
>>
>> (6) is unfortunate and can be avoided by replacing the open coded
>> enclave page freeing code in the reclaimer with sgx_free_epc_page()
>> to obtain support for poison page handling that includes placing the
>> poisoned page on the correct list.
>>
>> Fixes: d6d261bded8a ("x86/sgx: Add new sgx_epc_page flag bit to mark
>> free pages")
>> Fixes: 992801ae9243 ("x86/sgx: Initial poison handling for dirty and
>> free pages")
>
> Same comment as for the first version: remove the first fixes tag.
>
For completeness I'll duplicate my response also:
The commit you refer to, commit d6d261bded8a ("x86/sgx: Add new
sgx_epc_page flag bit to mark free pages", introduced a new page flag bit
(SGX_EPC_PAGE_IS_FREE) that should be set when an EPC page is freed. The
commit also sets the bit in sgx_free_epc_page() when an EPC page is freed.
The commit should also have set that bit when the EPC page is freed in the
reclaimer, which contains an open coded version of sgx_free_epc_page(),
but it did not. This fix adds the snippet that was omitted from that
commit.
Reinette
On Thu, Jan 20, 2022 at 10:20:01AM -0800, Reinette Chatre wrote:
> Hi Jarkko,
>
> On 1/20/2022 5:09 AM, Jarkko Sakkinen wrote:
> > On Wed, 2022-01-19 at 14:23 -0800, Reinette Chatre wrote:
> >> The SGX reclaimer code lacks page poison handling in its main
> >> free path. This can lead to avoidable machine checks if a
> >> poisoned page is freed and reallocated instead of being
> >> isolated.
> >>
> >> A troublesome scenario is:
> >> ?1. Machine check (#MC) occurs (asynchronous, !MF_ACTION_REQUIRED)
> >> ?2. arch_memory_failure() is eventually called
> >> ?3. (SGX) page->poison set to 1
> >> ?4. Page is reclaimed
> >> ?5. Page added to normal free lists by sgx_reclaim_pages()
> >> ??? ^ This is the bug (poison pages should be isolated on the
> >> ??? sgx_poison_page_list instead)
> >> ?6. Page is reallocated by some innocent enclave, a second
> >> (synchronous)
> >> ??? in-kernel #MC is induced, probably during EADD instruction.
> >> ??? ^ This is the fallout from the bug
> >>
> >> (6) is unfortunate and can be avoided by replacing the open coded
> >> enclave page freeing code in the reclaimer with sgx_free_epc_page()
> >> to obtain support for poison page handling that includes placing the
> >> poisoned page on the correct list.
> >>
> >> Fixes: d6d261bded8a ("x86/sgx: Add new sgx_epc_page flag bit to mark
> >> free pages")
> >> Fixes: 992801ae9243 ("x86/sgx: Initial poison handling for dirty and
> >> free pages")
> >
> > Same comment as for the first version: remove the first fixes tag.
> >
>
> For completeness I'll duplicate my response also:
>
> The commit you refer to, commit d6d261bded8a ("x86/sgx: Add new
> sgx_epc_page flag bit to mark free pages", introduced a new page flag bit
> (SGX_EPC_PAGE_IS_FREE) that should be set when an EPC page is freed. The
> commit also sets the bit in sgx_free_epc_page() when an EPC page is freed.
> The commit should also have set that bit when the EPC page is freed in the
> reclaimer, which contains an open coded version of sgx_free_epc_page(),
> but it did not. This fix adds the snippet that was omitted from that
> commit.
>
> Reinette
Let me check. Yes, I can see it in lore, but somehow it has slipped out
of my inbox. My apologies.
The change would make sense even as a clean up (no reason to have open
coded of sgx_free_epc_page(), let alone this regression:
Reviewed-by: Jarkko Sakkinen <[email protected]>
BR, Jarkko
On Sun, Jan 23, 2022 at 01:27:10AM +0200, Jarkko Sakkinen wrote:
> On Thu, Jan 20, 2022 at 10:20:01AM -0800, Reinette Chatre wrote:
> > Hi Jarkko,
> >
> > On 1/20/2022 5:09 AM, Jarkko Sakkinen wrote:
> > > On Wed, 2022-01-19 at 14:23 -0800, Reinette Chatre wrote:
> > >> The SGX reclaimer code lacks page poison handling in its main
> > >> free path. This can lead to avoidable machine checks if a
> > >> poisoned page is freed and reallocated instead of being
> > >> isolated.
> > >>
> > >> A troublesome scenario is:
> > >> ?1. Machine check (#MC) occurs (asynchronous, !MF_ACTION_REQUIRED)
> > >> ?2. arch_memory_failure() is eventually called
> > >> ?3. (SGX) page->poison set to 1
> > >> ?4. Page is reclaimed
> > >> ?5. Page added to normal free lists by sgx_reclaim_pages()
> > >> ??? ^ This is the bug (poison pages should be isolated on the
> > >> ??? sgx_poison_page_list instead)
> > >> ?6. Page is reallocated by some innocent enclave, a second
> > >> (synchronous)
> > >> ??? in-kernel #MC is induced, probably during EADD instruction.
> > >> ??? ^ This is the fallout from the bug
> > >>
> > >> (6) is unfortunate and can be avoided by replacing the open coded
> > >> enclave page freeing code in the reclaimer with sgx_free_epc_page()
> > >> to obtain support for poison page handling that includes placing the
> > >> poisoned page on the correct list.
> > >>
> > >> Fixes: d6d261bded8a ("x86/sgx: Add new sgx_epc_page flag bit to mark
> > >> free pages")
> > >> Fixes: 992801ae9243 ("x86/sgx: Initial poison handling for dirty and
> > >> free pages")
> > >
> > > Same comment as for the first version: remove the first fixes tag.
> > >
> >
> > For completeness I'll duplicate my response also:
> >
> > The commit you refer to, commit d6d261bded8a ("x86/sgx: Add new
> > sgx_epc_page flag bit to mark free pages", introduced a new page flag bit
> > (SGX_EPC_PAGE_IS_FREE) that should be set when an EPC page is freed. The
> > commit also sets the bit in sgx_free_epc_page() when an EPC page is freed.
> > The commit should also have set that bit when the EPC page is freed in the
> > reclaimer, which contains an open coded version of sgx_free_epc_page(),
> > but it did not. This fix adds the snippet that was omitted from that
> > commit.
> >
> > Reinette
>
> Let me check. Yes, I can see it in lore, but somehow it has slipped out
> of my inbox. My apologies.
And thanks for re-elaborating this.
/Jarkko