During review of TDX guests on Hyper-V patchset Dave pointed to the
potential race between changing page private/shared status and
load_unaligned_zeropad().
Fix the issue.
v3:
- Fix grammar;
- Add Sathya's Reviewed-bys;
v2:
- Add more info in commit message of the first patch.
- Move enc_status_change_finish_noop() into a separate patch.
- Fix typo in commit message and comment.
Kirill A. Shutemov (3):
x86/mm: Allow guest.enc_status_change_prepare() to fail
x86/tdx: Fix race between set_memory_encrypted() and
load_unaligned_zeropad()
x86/mm: Fix enc_status_change_finish_noop()
arch/x86/coco/tdx/tdx.c | 64 +++++++++++++++++++++++++++++++--
arch/x86/include/asm/x86_init.h | 2 +-
arch/x86/kernel/x86_init.c | 4 +--
arch/x86/mm/mem_encrypt_amd.c | 4 ++-
arch/x86/mm/pat/set_memory.c | 3 +-
5 files changed, 69 insertions(+), 8 deletions(-)
--
2.39.3
Touching privately mapped GPA that is not properly converted to private
with MapGPA and accepted leads to an unrecoverable exit to VMM.
load_unaligned_zeropad() can touch memory that is not owned by the
caller, but just happened to next after the owned memory.
This load_unaligned_zeropad() behaviour makes it important when kernel
asks VMM to convert a GPA from shared to private or back. Kernel must
never have a page mapped into direct mapping (and aliases) as private
when the GPA is already converted to shared or when GPA is not yet
converted to private.
load_unaligned_zeropad() can touch memory that is not owned by the
caller, but just happens to be next after the owned memory. This
load_unaligned_zeropad() behavior makes it important when the kernel
asks VMM to convert a GPA from shared to private or back. The kernel
must never have a page mapped into direct mapping (and aliases) as
private when the GPA is already converted to shared or when the GPA is
not yet converted to private.
guest.enc_status_change_prepare() is called before adjusting direct
mapping and therefore is responsible for converting the memory to
private.
guest.enc_status_change_finish() is called after adjusting direct
mapping and it converts the memory to shared.
It is okay to have a shared mapping of memory that is not properly
converted. handle_mmio() knows how to deal with load_unaligned_zeropad()
stepping on it.
Signed-off-by: Kirill A. Shutemov <[email protected]>
Fixes: 7dbde7631629 ("x86/mm/cpa: Add support for TDX shared memory")
Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]>
---
arch/x86/coco/tdx/tdx.c | 64 +++++++++++++++++++++++++++++++++++++++--
1 file changed, 61 insertions(+), 3 deletions(-)
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index e146b599260f..f6213a10de3a 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -840,6 +840,30 @@ static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
return true;
}
+static bool tdx_enc_status_change_prepare(unsigned long vaddr, int numpages,
+ bool enc)
+{
+ /*
+ * Only handle shared->private conversion here.
+ * See the comment in tdx_early_init().
+ */
+ if (enc)
+ return tdx_enc_status_changed(vaddr, numpages, enc);
+ return true;
+}
+
+static bool tdx_enc_status_change_finish(unsigned long vaddr, int numpages,
+ bool enc)
+{
+ /*
+ * Only handle private->shared conversion here.
+ * See the comment in tdx_early_init().
+ */
+ if (!enc)
+ return tdx_enc_status_changed(vaddr, numpages, enc);
+ return true;
+}
+
void __init tdx_early_init(void)
{
u64 cc_mask;
@@ -867,9 +891,43 @@ void __init tdx_early_init(void)
*/
physical_mask &= cc_mask - 1;
- x86_platform.guest.enc_cache_flush_required = tdx_cache_flush_required;
- x86_platform.guest.enc_tlb_flush_required = tdx_tlb_flush_required;
- x86_platform.guest.enc_status_change_finish = tdx_enc_status_changed;
+ /*
+ * Touching privately mapped GPA that is not properly converted to
+ * private with MapGPA and accepted leads to an unrecoverable exit
+ * to VMM.
+ *
+ * load_unaligned_zeropad() can touch memory that is not owned by the
+ * caller, but just happened to next after the owned memory.
+ * This load_unaligned_zeropad() behaviour makes it important when
+ * kernel asks VMM to convert a GPA from shared to private or back.
+ * Kernel must never have a page mapped into direct mapping (and
+ * aliases) as private when the GPA is already converted to shared or
+ * when GPA is not yet converted to private.
+ *
+ * load_unaligned_zeropad() can touch memory that is not owned by the
+ * caller, but just happens to be next after the owned memory. This
+ * load_unaligned_zeropad() behavior makes it important when the kernel
+ * asks VMM to convert a GPA from shared to private or back. The kernel
+ * must never have a page mapped into direct mapping (and aliases) as
+ * private when the GPA is already converted to shared or when the GPA
+ * is not yet converted to private.
+ *
+ * guest.enc_status_change_prepare() is called before adjusting direct
+ * mapping and therefore is responsible for converting the memory to
+ * private.
+ *
+ * guest.enc_status_change_finish() is called after adjusting direct
+ * mapping and it converts the memory to shared.
+ *
+ * It is okay to have a shared mapping of memory that is not properly
+ * converted. handle_mmio() knows how to deal with
+ * load_unaligned_zeropad() stepping on it.
+ */
+ x86_platform.guest.enc_status_change_prepare = tdx_enc_status_change_prepare;
+ x86_platform.guest.enc_status_change_finish = tdx_enc_status_change_finish;
+
+ x86_platform.guest.enc_cache_flush_required = tdx_cache_flush_required;
+ x86_platform.guest.enc_tlb_flush_required = tdx_tlb_flush_required;
pr_info("Guest detected\n");
}
--
2.39.3
enc_status_change_finish_noop() is now defined as always-fail, which
doesn't make sense for noop.
The change has no user-visible effect because it is only called if the
platform has CC_ATTR_MEM_ENCRYPT. All platforms with the attribute
override the callback with their own implementation.
Signed-off-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]>
---
arch/x86/kernel/x86_init.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index f230d4d7d8eb..64664311ac2b 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -131,7 +131,7 @@ struct x86_cpuinit_ops x86_cpuinit = {
static void default_nmi_init(void) { };
static bool enc_status_change_prepare_noop(unsigned long vaddr, int npages, bool enc) { return true; }
-static bool enc_status_change_finish_noop(unsigned long vaddr, int npages, bool enc) { return false; }
+static bool enc_status_change_finish_noop(unsigned long vaddr, int npages, bool enc) { return true; }
static bool enc_tlb_flush_required_noop(bool enc) { return false; }
static bool enc_cache_flush_required_noop(void) { return false; }
static bool is_private_mmio_noop(u64 addr) {return false; }
--
2.39.3
TDX code is going to provide guest.enc_status_change_prepare() that is
able to fail. TDX will use the call to convert the GPA range from shared
to private. This operation can fail.
Add a way to return an error from the callback.
Signed-off-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]>
---
arch/x86/include/asm/x86_init.h | 2 +-
arch/x86/kernel/x86_init.c | 2 +-
arch/x86/mm/mem_encrypt_amd.c | 4 +++-
arch/x86/mm/pat/set_memory.c | 3 ++-
4 files changed, 7 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index 88085f369ff6..1ca9701917c5 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -150,7 +150,7 @@ struct x86_init_acpi {
* @enc_cache_flush_required Returns true if a cache flush is needed before changing page encryption status
*/
struct x86_guest {
- void (*enc_status_change_prepare)(unsigned long vaddr, int npages, bool enc);
+ bool (*enc_status_change_prepare)(unsigned long vaddr, int npages, bool enc);
bool (*enc_status_change_finish)(unsigned long vaddr, int npages, bool enc);
bool (*enc_tlb_flush_required)(bool enc);
bool (*enc_cache_flush_required)(void);
diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index d82f4fa2f1bf..f230d4d7d8eb 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -130,7 +130,7 @@ struct x86_cpuinit_ops x86_cpuinit = {
static void default_nmi_init(void) { };
-static void enc_status_change_prepare_noop(unsigned long vaddr, int npages, bool enc) { }
+static bool enc_status_change_prepare_noop(unsigned long vaddr, int npages, bool enc) { return true; }
static bool enc_status_change_finish_noop(unsigned long vaddr, int npages, bool enc) { return false; }
static bool enc_tlb_flush_required_noop(bool enc) { return false; }
static bool enc_cache_flush_required_noop(void) { return false; }
diff --git a/arch/x86/mm/mem_encrypt_amd.c b/arch/x86/mm/mem_encrypt_amd.c
index e0b51c09109f..4f95c449a406 100644
--- a/arch/x86/mm/mem_encrypt_amd.c
+++ b/arch/x86/mm/mem_encrypt_amd.c
@@ -319,7 +319,7 @@ static void enc_dec_hypercall(unsigned long vaddr, int npages, bool enc)
#endif
}
-static void amd_enc_status_change_prepare(unsigned long vaddr, int npages, bool enc)
+static bool amd_enc_status_change_prepare(unsigned long vaddr, int npages, bool enc)
{
/*
* To maintain the security guarantees of SEV-SNP guests, make sure
@@ -327,6 +327,8 @@ static void amd_enc_status_change_prepare(unsigned long vaddr, int npages, bool
*/
if (cc_platform_has(CC_ATTR_GUEST_SEV_SNP) && !enc)
snp_set_memory_shared(vaddr, npages);
+
+ return true;
}
/* Return true unconditionally: return value doesn't matter for the SEV side */
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 7159cf787613..b8f48ebe753c 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -2151,7 +2151,8 @@ static int __set_memory_enc_pgtable(unsigned long addr, int numpages, bool enc)
cpa_flush(&cpa, x86_platform.guest.enc_cache_flush_required());
/* Notify hypervisor that we are about to set/clr encryption attribute. */
- x86_platform.guest.enc_status_change_prepare(addr, numpages, enc);
+ if (!x86_platform.guest.enc_status_change_prepare(addr, numpages, enc))
+ return -EIO;
ret = __change_page_attr_set_clr(&cpa, 1);
--
2.39.3
On 6/6/23 02:56, Kirill A. Shutemov wrote:
> load_unaligned_zeropad() can touch memory that is not owned by the
> caller, but just happened to next after the owned memory.
> This load_unaligned_zeropad() behaviour makes it important when kernel
> asks VMM to convert a GPA from shared to private or back. Kernel must
> never have a page mapped into direct mapping (and aliases) as private
> when the GPA is already converted to shared or when GPA is not yet
> converted to private.
>
> load_unaligned_zeropad() can touch memory that is not owned by the
> caller, but just happens to be next after the owned memory. This
> load_unaligned_zeropad() behavior makes it important when the kernel
> asks VMM to convert a GPA from shared to private or back. The kernel
> must never have a page mapped into direct mapping (and aliases) as
> private when the GPA is already converted to shared or when the GPA is
> not yet converted to private.
Heh, that must be really important info to have it in the changelog twice!
I'll fix it up when I apply it.
On Tue, Jun 06, 2023 at 11:14:29AM -0700, Dave Hansen wrote:
> On 6/6/23 02:56, Kirill A. Shutemov wrote:
> > load_unaligned_zeropad() can touch memory that is not owned by the
> > caller, but just happened to next after the owned memory.
> > This load_unaligned_zeropad() behaviour makes it important when kernel
> > asks VMM to convert a GPA from shared to private or back. Kernel must
> > never have a page mapped into direct mapping (and aliases) as private
> > when the GPA is already converted to shared or when GPA is not yet
> > converted to private.
> >
> > load_unaligned_zeropad() can touch memory that is not owned by the
> > caller, but just happens to be next after the owned memory. This
> > load_unaligned_zeropad() behavior makes it important when the kernel
> > asks VMM to convert a GPA from shared to private or back. The kernel
> > must never have a page mapped into direct mapping (and aliases) as
> > private when the GPA is already converted to shared or when the GPA is
> > not yet converted to private.
>
> Heh, that must be really important info to have it in the changelog twice!
>
> I'll fix it up when I apply it.
Ouch. Please fix the comment in the code too.
--
Kiryl Shutsemau / Kirill A. Shutemov
On 6/6/23 11:37, Kirill A. Shutemov wrote:
>> Heh, that must be really important info to have it in the changelog twice!
>>
>> I'll fix it up when I apply it.
> Ouch. Please fix the comment in the code too.
I couldn't help myself and rewrote the changelog and comment. Please
let me know if it looks OK:
> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=x86/tdx
On Tue, Jun 06, 2023 at 04:29:15PM -0700, Dave Hansen wrote:
> On 6/6/23 11:37, Kirill A. Shutemov wrote:
> >> Heh, that must be really important info to have it in the changelog twice!
> >>
> >> I'll fix it up when I apply it.
> > Ouch. Please fix the comment in the code too.
>
> I couldn't help myself and rewrote the changelog and comment. Please
> let me know if it looks OK:
>
> > https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=x86/tdx
Looks good. Thank you.
--
Kiryl Shutsemau / Kirill A. Shutemov
From: Kirill A. Shutemov <[email protected]> Sent: Tuesday, June 6, 2023 2:56 AM
>
> During review of TDX guests on Hyper-V patchset Dave pointed to the
> potential race between changing page private/shared status and
> load_unaligned_zeropad().
>
> Fix the issue.
>
> v3:
> - Fix grammar;
> - Add Sathya's Reviewed-bys;
> v2:
> - Add more info in commit message of the first patch.
> - Move enc_status_change_finish_noop() into a separate patch.
> - Fix typo in commit message and comment.
>
> Kirill A. Shutemov (3):
> x86/mm: Allow guest.enc_status_change_prepare() to fail
> x86/tdx: Fix race between set_memory_encrypted() and
> load_unaligned_zeropad()
> x86/mm: Fix enc_status_change_finish_noop()
>
> arch/x86/coco/tdx/tdx.c | 64 +++++++++++++++++++++++++++++++--
> arch/x86/include/asm/x86_init.h | 2 +-
> arch/x86/kernel/x86_init.c | 4 +--
> arch/x86/mm/mem_encrypt_amd.c | 4 ++-
> arch/x86/mm/pat/set_memory.c | 3 +-
> 5 files changed, 69 insertions(+), 8 deletions(-)
>
> --
> 2.39.3
These fixes notwithstanding, load_unaligned_zeropad() is not handled
properly in a TDX VM. The problem was introduced with commit
c4e34dd99f2e, which moved the fixup code to function
ex_handler_zeropad(). This new function does a verification against
fault_addr, and the verification always fails because fault_addr is zero.
The call sequence is:
exc_virtualization_exception()
ve_raise_fault()
gp_try_fixup_and_notify() <-- always passes 0 as fault_addr
fixup_exception()
ex_handler_zeropad()
The validation of fault_addr could probably be removed since
such validation wasn't there prior to c4e34dd99f2e. But before
going down that path, I want to propose a different top-level
solution to the interaction between load_unaligned_zeropad()
and CoCo VM page transitions between private and shared.
When a page is transitioning, the caller can and should ensure
that it is not being accessed during the transition. But we have
the load_unaligned_zeropad() wildcard. So do the following for
the transition sequence in __set_memory_enc_pgtable():
1. Remove aliasing mappings
2. Remove the PRESENT bit from the PTEs of all transitioning pages
3. Flush the TLB globally
4. Flush the data cache if needed
5. Set/clear the encryption attribute as appropriate
6. Notify the hypervisor of the page status change
7. Add back the PRESENT bit
With this approach, load_unaligned_zeropad() just takes the
normal page-fault-based fixup path if it touches a page that is
transitioning. As a result, load_unaligned_zeropad() and CoCo
VM page transitioning are completely decoupled. We don't
need to handle architecture-specific CoCo VM exceptions and
fix things up.
I've posted an RFC PATCH that implements this approach [1],
and tested on TDX VMs and SEV-SNP VMs in vTOM mode.
The RFC PATCH has more details on the benefits and
implications. Follow-up discussion should probably be done
on that email thread.
Michael
[1] https://lore.kernel.org/lkml/[email protected]/T/#u
On 7/6/23 09:48, Michael Kelley (LINUX) wrote:
> When a page is transitioning, the caller can and should ensure
> that it is not being accessed during the transition. But we have
> the load_unaligned_zeropad() wildcard. So do the following for
> the transition sequence in __set_memory_enc_pgtable():
>
> 1. Remove aliasing mappings
> 2. Remove the PRESENT bit from the PTEs of all transitioning pages
> 3. Flush the TLB globally
> 4. Flush the data cache if needed
> 5. Set/clear the encryption attribute as appropriate
> 6. Notify the hypervisor of the page status change
> 7. Add back the PRESENT bit
>
> With this approach, load_unaligned_zeropad() just takes the
> normal page-fault-based fixup path if it touches a page that is
> transitioning.
Yes, this does seem much simpler. It funnels everything through the
page fault handler and also doesn't require because careful about the
ordering of the private<=>shared conversions.
On Thu, Jul 06, 2023 at 04:48:32PM +0000, Michael Kelley (LINUX) wrote:
> From: Kirill A. Shutemov <[email protected]> Sent: Tuesday, June 6, 2023 2:56 AM
> >
> > During review of TDX guests on Hyper-V patchset Dave pointed to the
> > potential race between changing page private/shared status and
> > load_unaligned_zeropad().
> >
> > Fix the issue.
> >
> > v3:
> > - Fix grammar;
> > - Add Sathya's Reviewed-bys;
> > v2:
> > - Add more info in commit message of the first patch.
> > - Move enc_status_change_finish_noop() into a separate patch.
> > - Fix typo in commit message and comment.
> >
> > Kirill A. Shutemov (3):
> > x86/mm: Allow guest.enc_status_change_prepare() to fail
> > x86/tdx: Fix race between set_memory_encrypted() and
> > load_unaligned_zeropad()
> > x86/mm: Fix enc_status_change_finish_noop()
> >
> > arch/x86/coco/tdx/tdx.c | 64 +++++++++++++++++++++++++++++++--
> > arch/x86/include/asm/x86_init.h | 2 +-
> > arch/x86/kernel/x86_init.c | 4 +--
> > arch/x86/mm/mem_encrypt_amd.c | 4 ++-
> > arch/x86/mm/pat/set_memory.c | 3 +-
> > 5 files changed, 69 insertions(+), 8 deletions(-)
> >
> > --
> > 2.39.3
>
> These fixes notwithstanding, load_unaligned_zeropad() is not handled
> properly in a TDX VM. The problem was introduced with commit
> c4e34dd99f2e, which moved the fixup code to function
> ex_handler_zeropad().
Ughh.. I needed to pay more attention to the rework around
load_unaligned_zeropad() :/
> This new function does a verification against
> fault_addr, and the verification always fails because fault_addr is zero.
> The call sequence is:
>
> exc_virtualization_exception()
> ve_raise_fault()
> gp_try_fixup_and_notify() <-- always passes 0 as fault_addr
> fixup_exception()
> ex_handler_zeropad()
>
> The validation of fault_addr could probably be removed since
> such validation wasn't there prior to c4e34dd99f2e. But before
> going down that path, I want to propose a different top-level
> solution to the interaction between load_unaligned_zeropad()
> and CoCo VM page transitions between private and shared.
>
> When a page is transitioning, the caller can and should ensure
> that it is not being accessed during the transition. But we have
> the load_unaligned_zeropad() wildcard. So do the following for
> the transition sequence in __set_memory_enc_pgtable():
>
> 1. Remove aliasing mappings
> 2. Remove the PRESENT bit from the PTEs of all transitioning pages
> 3. Flush the TLB globally
> 4. Flush the data cache if needed
> 5. Set/clear the encryption attribute as appropriate
> 6. Notify the hypervisor of the page status change
> 7. Add back the PRESENT bit
>
> With this approach, load_unaligned_zeropad() just takes the
> normal page-fault-based fixup path if it touches a page that is
> transitioning. As a result, load_unaligned_zeropad() and CoCo
> VM page transitioning are completely decoupled. We don't
> need to handle architecture-specific CoCo VM exceptions and
> fix things up.
It only addresses the problem that happens on transition, but
load_unaligned_zeropad() is still a problem for the shared mappings in
general, after transition is complete. Like if load_unaligned_zeropad()
steps from private to shared mapping and shared mapping triggers #VE,
kernel should be able to handle it.
The testcase that triggers the problem (It is ugly, but useful.):
#include <linux/mm.h>
#include <asm/word-at-a-time.h>
static int f(pte_t *pte, unsigned long addr, void *data)
{
pte_t ***p = data;
if (p) {
*(*p) = pte;
(*p)++;
}
return 0;
}
static struct vm_struct *alloc_vm_area(size_t size, pte_t **ptes)
{
struct vm_struct *area;
area = get_vm_area_caller(size, VM_IOREMAP,
__builtin_return_address(0));
if (area == NULL)
return NULL;
if (apply_to_page_range(&init_mm, (unsigned long)area->addr,
size, f, ptes ? &ptes : NULL)) {
free_vm_area(area);
return NULL;
}
return area;
}
#define SHARED_PFN_TRIGGERS_VE 0x80000
static int test_zeropad(void)
{
struct vm_struct *area;
pte_t *pte[2];
unsigned long a;
struct page *page;
page = alloc_page(GFP_KERNEL);
area = alloc_vm_area(2 * PAGE_SIZE, pte);
set_pte_at(&init_mm, (unsigned long)area->addr, pte[0],
pfn_pte(page_to_pfn(page), PAGE_KERNEL));
set_pte_at(&init_mm, (unsigned long)(area->addr + PAGE_SIZE), pte[1],
pfn_pte(SHARED_PFN_TRIGGERS_VE, pgprot_decrypted(PAGE_KERNEL)));
a = load_unaligned_zeropad(area->addr + PAGE_SIZE - 1);
printk("a: %#lx\n", a);
for(;;);
return 0;
}
late_initcall(test_zeropad);
Below is a patch that provides fixup code with the address it wants.
Any comments?
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 58b1f208eff5..4a817d20ce3b 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -697,9 +697,10 @@ static bool try_fixup_enqcmd_gp(void)
}
static bool gp_try_fixup_and_notify(struct pt_regs *regs, int trapnr,
- unsigned long error_code, const char *str)
+ unsigned long error_code, const char *str,
+ unsigned long address)
{
- if (fixup_exception(regs, trapnr, error_code, 0))
+ if (fixup_exception(regs, trapnr, error_code, address))
return true;
current->thread.error_code = error_code;
@@ -759,7 +760,7 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
goto exit;
}
- if (gp_try_fixup_and_notify(regs, X86_TRAP_GP, error_code, desc))
+ if (gp_try_fixup_and_notify(regs, X86_TRAP_GP, error_code, desc, 0))
goto exit;
if (error_code)
@@ -1357,17 +1358,20 @@ DEFINE_IDTENTRY(exc_device_not_available)
#define VE_FAULT_STR "VE fault"
-static void ve_raise_fault(struct pt_regs *regs, long error_code)
+static void ve_raise_fault(struct pt_regs *regs, long error_code,
+ unsigned long address)
{
if (user_mode(regs)) {
gp_user_force_sig_segv(regs, X86_TRAP_VE, error_code, VE_FAULT_STR);
return;
}
- if (gp_try_fixup_and_notify(regs, X86_TRAP_VE, error_code, VE_FAULT_STR))
+ if (gp_try_fixup_and_notify(regs, X86_TRAP_VE, error_code,
+ VE_FAULT_STR, address)) {
return;
+ }
- die_addr(VE_FAULT_STR, regs, error_code, 0);
+ die_addr(VE_FAULT_STR, regs, error_code, address);
}
/*
@@ -1431,7 +1435,7 @@ DEFINE_IDTENTRY(exc_virtualization_exception)
* it successfully, treat it as #GP(0) and handle it.
*/
if (!tdx_handle_virt_exception(regs, &ve))
- ve_raise_fault(regs, 0);
+ ve_raise_fault(regs, 0, ve.gla);
cond_local_irq_disable(regs);
}
--
Kiryl Shutsemau / Kirill A. Shutemov
From: Kirill A. Shutemov <[email protected]> Sent: Friday, July 7, 2023 7:07 AM
>
> On Thu, Jul 06, 2023 at 04:48:32PM +0000, Michael Kelley (LINUX) wrote:
> > From: Kirill A. Shutemov <[email protected]> Sent: Tuesday, June 6, 2023 2:56 AM
[snip]
>
> It only addresses the problem that happens on transition, but
> load_unaligned_zeropad() is still a problem for the shared mappings in
> general, after transition is complete. Like if load_unaligned_zeropad()
> steps from private to shared mapping and shared mapping triggers #VE,
> kernel should be able to handle it.
I'm showing my ignorance of TDX architectural details, but what's the
situation where shared mappings in general can trigger a #VE? How
do such situations get handled for references that aren't from
load_unaligned_zeropad()?
>
> The testcase that triggers the problem (It is ugly, but useful.):
>
> #include <linux/mm.h>
> #include <asm/word-at-a-time.h>
>
> static int f(pte_t *pte, unsigned long addr, void *data)
> {
> pte_t ***p = data;
>
> if (p) {
> *(*p) = pte;
> (*p)++;
> }
> return 0;
> }
>
> static struct vm_struct *alloc_vm_area(size_t size, pte_t **ptes)
> {
> struct vm_struct *area;
>
> area = get_vm_area_caller(size, VM_IOREMAP,
> __builtin_return_address(0));
> if (area == NULL)
> return NULL;
>
> if (apply_to_page_range(&init_mm, (unsigned long)area->addr,
> size, f, ptes ? &ptes : NULL)) {
> free_vm_area(area);
> return NULL;
> }
>
> return area;
> }
>
> #define SHARED_PFN_TRIGGERS_VE 0x80000
>
> static int test_zeropad(void)
> {
> struct vm_struct *area;
> pte_t *pte[2];
> unsigned long a;
> struct page *page;
>
> page = alloc_page(GFP_KERNEL);
> area = alloc_vm_area(2 * PAGE_SIZE, pte);
>
> set_pte_at(&init_mm, (unsigned long)area->addr, pte[0],
> pfn_pte(page_to_pfn(page), PAGE_KERNEL));
> set_pte_at(&init_mm, (unsigned long)(area->addr + PAGE_SIZE), pte[1],
> pfn_pte(SHARED_PFN_TRIGGERS_VE,
> pgprot_decrypted(PAGE_KERNEL)));
>
> a = load_unaligned_zeropad(area->addr + PAGE_SIZE - 1);
> printk("a: %#lx\n", a);
> for(;;);
> return 0;
> }
> late_initcall(test_zeropad);
>
> Below is a patch that provides fixup code with the address it wants.
>
> Any comments?
This looks good to me. I applied the diff to a TDX VM running on
Hyper-V. When a load_unaligned_zeropad() occurs on a page that is
transitioning between private and shared, the zeropad fixup is now
done correctly via the #VE handler. (This is *without* my RFC patch to
mark the pages invalid during a transition.)
Michael
>
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index 58b1f208eff5..4a817d20ce3b 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -697,9 +697,10 @@ static bool try_fixup_enqcmd_gp(void)
> }
>
> static bool gp_try_fixup_and_notify(struct pt_regs *regs, int trapnr,
> - unsigned long error_code, const char *str)
> + unsigned long error_code, const char *str,
> + unsigned long address)
> {
> - if (fixup_exception(regs, trapnr, error_code, 0))
> + if (fixup_exception(regs, trapnr, error_code, address))
> return true;
>
> current->thread.error_code = error_code;
> @@ -759,7 +760,7 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
> goto exit;
> }
>
> - if (gp_try_fixup_and_notify(regs, X86_TRAP_GP, error_code, desc))
> + if (gp_try_fixup_and_notify(regs, X86_TRAP_GP, error_code, desc, 0))
> goto exit;
>
> if (error_code)
> @@ -1357,17 +1358,20 @@ DEFINE_IDTENTRY(exc_device_not_available)
>
> #define VE_FAULT_STR "VE fault"
>
> -static void ve_raise_fault(struct pt_regs *regs, long error_code)
> +static void ve_raise_fault(struct pt_regs *regs, long error_code,
> + unsigned long address)
> {
> if (user_mode(regs)) {
> gp_user_force_sig_segv(regs, X86_TRAP_VE, error_code,
> VE_FAULT_STR);
> return;
> }
>
> - if (gp_try_fixup_and_notify(regs, X86_TRAP_VE, error_code, VE_FAULT_STR))
> + if (gp_try_fixup_and_notify(regs, X86_TRAP_VE, error_code,
> + VE_FAULT_STR, address)) {
> return;
> + }
>
> - die_addr(VE_FAULT_STR, regs, error_code, 0);
> + die_addr(VE_FAULT_STR, regs, error_code, address);
> }
>
> /*
> @@ -1431,7 +1435,7 @@ DEFINE_IDTENTRY(exc_virtualization_exception)
> * it successfully, treat it as #GP(0) and handle it.
> */
> if (!tdx_handle_virt_exception(regs, &ve))
> - ve_raise_fault(regs, 0);
> + ve_raise_fault(regs, 0, ve.gla);
>
> cond_local_irq_disable(regs);
> }
> --
> Kiryl Shutsemau / Kirill A. Shutemov
On Sat, Jul 08, 2023 at 11:53:08PM +0000, Michael Kelley (LINUX) wrote:
> From: Kirill A. Shutemov <[email protected]> Sent: Friday, July 7, 2023 7:07 AM
> >
> > On Thu, Jul 06, 2023 at 04:48:32PM +0000, Michael Kelley (LINUX) wrote:
> > > From: Kirill A. Shutemov <[email protected]> Sent: Tuesday, June 6, 2023 2:56 AM
>
> [snip]
>
> >
> > It only addresses the problem that happens on transition, but
> > load_unaligned_zeropad() is still a problem for the shared mappings in
> > general, after transition is complete. Like if load_unaligned_zeropad()
> > steps from private to shared mapping and shared mapping triggers #VE,
> > kernel should be able to handle it.
>
> I'm showing my ignorance of TDX architectural details, but what's the
> situation where shared mappings in general can trigger a #VE? How
> do such situations get handled for references that aren't from
> load_unaligned_zeropad()?
>
Shared mappings are under host/VMM control. It can just not map the page
in shared-ept and trigger ept-violation #VE.
> > Any comments?
>
> This looks good to me. I applied the diff to a TDX VM running on
> Hyper-V. When a load_unaligned_zeropad() occurs on a page that is
> transitioning between private and shared, the zeropad fixup is now
> done correctly via the #VE handler. (This is *without* my RFC patch to
> mark the pages invalid during a transition.)
Great.
I am at vacation for the next two weeks. I will prepare a proper patch
when I am back. Feel free to make patch yourself if you feel it is urgent.
--
Kiryl Shutsemau / Kirill A. Shutemov
On 7/9/23 01:09, Kirill A. Shutemov wrote:
> On Sat, Jul 08, 2023 at 11:53:08PM +0000, Michael Kelley (LINUX) wrote:
>> From: Kirill A. Shutemov <[email protected]> Sent: Friday, July 7, 2023 7:07 AM
>>>
>>> On Thu, Jul 06, 2023 at 04:48:32PM +0000, Michael Kelley (LINUX) wrote:
>>>> From: Kirill A. Shutemov <[email protected]> Sent: Tuesday, June 6, 2023 2:56 AM
>>
>> [snip]
>>
>>>
>>> It only addresses the problem that happens on transition, but
>>> load_unaligned_zeropad() is still a problem for the shared mappings in
>>> general, after transition is complete. Like if load_unaligned_zeropad()
>>> steps from private to shared mapping and shared mapping triggers #VE,
>>> kernel should be able to handle it.
>>
>> I'm showing my ignorance of TDX architectural details, but what's the
>> situation where shared mappings in general can trigger a #VE? How
>> do such situations get handled for references that aren't from
>> load_unaligned_zeropad()?
>>
>
> Shared mappings are under host/VMM control. It can just not map the page
> in shared-ept and trigger ept-violation #VE.
>
>>> Any comments?
>>
>> This looks good to me. I applied the diff to a TDX VM running on
>> Hyper-V. When a load_unaligned_zeropad() occurs on a page that is
>> transitioning between private and shared, the zeropad fixup is now
>> done correctly via the #VE handler. (This is *without* my RFC patch to
>> mark the pages invalid during a transition.)
>
> Great.
>
> I am at vacation for the next two weeks. I will prepare a proper patch
> when I am back. Feel free to make patch yourself if you feel it is urgent.
>
Michael,
Are you still pursuing the RFC patch, then? Just trying to decide whether
a patch will be needed for SNP...
Thanks,
Tom
From: Tom Lendacky <[email protected]> Sent: Monday, July 10, 2023 6:59 AM
>
> On 7/9/23 01:09, Kirill A. Shutemov wrote:
> > On Sat, Jul 08, 2023 at 11:53:08PM +0000, Michael Kelley (LINUX) wrote:
> >> From: Kirill A. Shutemov <[email protected]> Sent: Friday, July 7, 2023 7:07 AM
> >>>
> >>> On Thu, Jul 06, 2023 at 04:48:32PM +0000, Michael Kelley (LINUX) wrote:
> >>>> From: Kirill A. Shutemov <[email protected]> Sent: Tuesday, June 6,
> 2023 2:56 AM
> >>
> >> [snip]
> >>
> >>>
> >>> It only addresses the problem that happens on transition, but
> >>> load_unaligned_zeropad() is still a problem for the shared mappings in
> >>> general, after transition is complete. Like if load_unaligned_zeropad()
> >>> steps from private to shared mapping and shared mapping triggers #VE,
> >>> kernel should be able to handle it.
> >>
> >> I'm showing my ignorance of TDX architectural details, but what's the
> >> situation where shared mappings in general can trigger a #VE? How
> >> do such situations get handled for references that aren't from
> >> load_unaligned_zeropad()?
> >>
> >
> > Shared mappings are under host/VMM control. It can just not map the page
> > in shared-ept and trigger ept-violation #VE.
> >
> >>> Any comments?
> >>
> >> This looks good to me. I applied the diff to a TDX VM running on
> >> Hyper-V. When a load_unaligned_zeropad() occurs on a page that is
> >> transitioning between private and shared, the zeropad fixup is now
> >> done correctly via the #VE handler. (This is *without* my RFC patch to
> >> mark the pages invalid during a transition.)
> >
> > Great.
> >
> > I am at vacation for the next two weeks. I will prepare a proper patch
> > when I am back. Feel free to make patch yourself if you feel it is urgent.
> >
>
> Michael,
>
> Are you still pursuing the RFC patch, then? Just trying to decide whether
> a patch will be needed for SNP...
>
Yes, I'm still pursuing the RFC patch. In addition to solving the SNP
problems, I think there are some benefits with TDX. But I need to have
further discussion with Kirill, which may be delayed a bit while he's out
on vacation.
Michael
From: Kirill A. Shutemov <[email protected]> Sent: Saturday, July 8, 2023 11:09 PM
>
> On Sat, Jul 08, 2023 at 11:53:08PM +0000, Michael Kelley (LINUX) wrote:
> > From: Kirill A. Shutemov <[email protected]> Sent: Friday, July 7, 2023 7:07 AM
> > >
> > > On Thu, Jul 06, 2023 at 04:48:32PM +0000, Michael Kelley (LINUX) wrote:
> > > > From: Kirill A. Shutemov <[email protected]> Sent: Tuesday, June 6,
> 2023 2:56 AM
> >
> > [snip]
> >
> > >
> > > It only addresses the problem that happens on transition, but
> > > load_unaligned_zeropad() is still a problem for the shared mappings in
> > > general, after transition is complete. Like if load_unaligned_zeropad()
> > > steps from private to shared mapping and shared mapping triggers #VE,
> > > kernel should be able to handle it.
> >
> > I'm showing my ignorance of TDX architectural details, but what's the
> > situation where shared mappings in general can trigger a #VE? How
> > do such situations get handled for references that aren't from
> > load_unaligned_zeropad()?
> >
>
> Shared mappings are under host/VMM control. It can just not map the page
> in shared-ept and trigger ept-violation #VE.
I know you are out on vacation, but let me follow up now for further
discussion when you are back.
Isn't the scenario you are describing a malfunctioning or malicious
host/VMM? Would what you are describing be done as part of normal
operation? Kernel code must have switched the page from private to
shared for some purpose. As soon as that code (which presumably
does not have any entry in the exception table) touches the page, it
would take the #VE and the enter the die path because there's no fixup.
So is there value in having load_unaligned_zeropad() handle the #VE and
succeed where a normal reference would fail?
I'd still like to see the private <-> shared transition code mark the pages
as invalid during the transition, and avoid the possibility of #VE and
similar cases with SEV-SNP. Such approach reduces (eliminates?)
entanglement between CoCo-specific exceptions and
load_unaligned_zeropad(). It also greatly simplifies TD Partition cases
and SEV-SNP cases where a paravisor is used.
But maybe I'm still missing a case where code must handle the #VE
for load_unaligned_zeropad() outside of private <-> shared transitions.
Michael
>
> > > Any comments?
> >
> > This looks good to me. I applied the diff to a TDX VM running on
> > Hyper-V. When a load_unaligned_zeropad() occurs on a page that is
> > transitioning between private and shared, the zeropad fixup is now
> > done correctly via the #VE handler. (This is *without* my RFC patch to
> > mark the pages invalid during a transition.)
>
> Great.
>
> I am at vacation for the next two weeks. I will prepare a proper patch
> when I am back. Feel free to make patch yourself if you feel it is urgent.
>
> --
> Kiryl Shutsemau / Kirill A. Shutemov
On Thu, Jul 13, 2023 at 02:43:39PM +0000, Michael Kelley (LINUX) wrote:
> From: Kirill A. Shutemov <[email protected]> Sent: Saturday, July 8, 2023 11:09 PM
> >
> > On Sat, Jul 08, 2023 at 11:53:08PM +0000, Michael Kelley (LINUX) wrote:
> > > From: Kirill A. Shutemov <[email protected]> Sent: Friday, July 7, 2023 7:07 AM
> > > >
> > > > On Thu, Jul 06, 2023 at 04:48:32PM +0000, Michael Kelley (LINUX) wrote:
> > > > > From: Kirill A. Shutemov <[email protected]> Sent: Tuesday, June 6,
> > 2023 2:56 AM
> > >
> > > [snip]
> > >
> > > >
> > > > It only addresses the problem that happens on transition, but
> > > > load_unaligned_zeropad() is still a problem for the shared mappings in
> > > > general, after transition is complete. Like if load_unaligned_zeropad()
> > > > steps from private to shared mapping and shared mapping triggers #VE,
> > > > kernel should be able to handle it.
> > >
> > > I'm showing my ignorance of TDX architectural details, but what's the
> > > situation where shared mappings in general can trigger a #VE? How
> > > do such situations get handled for references that aren't from
> > > load_unaligned_zeropad()?
> > >
> >
> > Shared mappings are under host/VMM control. It can just not map the page
> > in shared-ept and trigger ept-violation #VE.
>
> I know you are out on vacation, but let me follow up now for further
> discussion when you are back.
>
> Isn't the scenario you are describing a malfunctioning or malicious
> host/VMM? Would what you are describing be done as part of normal
> operation? Kernel code must have switched the page from private to
> shared for some purpose. As soon as that code (which presumably
> does not have any entry in the exception table) touches the page, it
> would take the #VE and the enter the die path because there's no fixup.
> So is there value in having load_unaligned_zeropad() handle the #VE and
> succeed where a normal reference would fail?
#VE on shared memory is legitimately used for MMIO. But MMIO region is
usually separate from the real memory in physical address space.
But we also have DMA.
DMA pages allocated from common pool of memory and they can be next to
dentry cache that kernel accesses with load_unaligned_zeropad(). DMA pages
are shared, but they usually backed by memory and not cause #VE. However
shared memory is under full control from VMM and VMM can remove page at
any point which would make platform to deliver #VE to the guest. This is
pathological scenario, but I think it still worth handling gracefully.
> I'd still like to see the private <-> shared transition code mark the pages
> as invalid during the transition, and avoid the possibility of #VE and
> similar cases with SEV-SNP. Such approach reduces (eliminates?)
> entanglement between CoCo-specific exceptions and
> load_unaligned_zeropad(). It also greatly simplifies TD Partition cases
> and SEV-SNP cases where a paravisor is used.
I doesn't eliminates issue for TDX as the scenario above is not transient.
It can happen after memory is converted to shared.
--
Kiryl Shutsemau / Kirill A. Shutemov
From: Kirill A. Shutemov <[email protected]> Sent: Monday, July 24, 2023 4:19 PM
>
> On Thu, Jul 13, 2023 at 02:43:39PM +0000, Michael Kelley (LINUX) wrote:
> > From: Kirill A. Shutemov <[email protected]> Sent: Saturday, July 8, 2023 11:09 PM
> > >
> > > On Sat, Jul 08, 2023 at 11:53:08PM +0000, Michael Kelley (LINUX) wrote:
> > > > From: Kirill A. Shutemov <[email protected]> Sent: Friday, July 7, 2023 7:07 AM
> > > > >
> > > > > On Thu, Jul 06, 2023 at 04:48:32PM +0000, Michael Kelley (LINUX) wrote:
> > > > > > From: Kirill A. Shutemov <[email protected]> Sent: Tuesday, June 6, 2023 2:56 AM
> > > >
> > > > [snip]
> > > >
> > > > >
> > > > > It only addresses the problem that happens on transition, but
> > > > > load_unaligned_zeropad() is still a problem for the shared mappings in
> > > > > general, after transition is complete. Like if load_unaligned_zeropad()
> > > > > steps from private to shared mapping and shared mapping triggers #VE,
> > > > > kernel should be able to handle it.
> > > >
> > > > I'm showing my ignorance of TDX architectural details, but what's the
> > > > situation where shared mappings in general can trigger a #VE? How
> > > > do such situations get handled for references that aren't from
> > > > load_unaligned_zeropad()?
> > > >
> > >
> > > Shared mappings are under host/VMM control. It can just not map the page
> > > in shared-ept and trigger ept-violation #VE.
> >
> > I know you are out on vacation, but let me follow up now for further
> > discussion when you are back.
> >
> > Isn't the scenario you are describing a malfunctioning or malicious
> > host/VMM? Would what you are describing be done as part of normal
> > operation? Kernel code must have switched the page from private to
> > shared for some purpose. As soon as that code (which presumably
> > does not have any entry in the exception table) touches the page, it
> > would take the #VE and the enter the die path because there's no fixup.
> > So is there value in having load_unaligned_zeropad() handle the #VE and
> > succeed where a normal reference would fail?
>
> #VE on shared memory is legitimately used for MMIO. But MMIO region is
> usually separate from the real memory in physical address space.
>
> But we also have DMA.
>
> DMA pages allocated from common pool of memory and they can be next to
> dentry cache that kernel accesses with load_unaligned_zeropad(). DMA pages
> are shared, but they usually backed by memory and not cause #VE. However
> shared memory is under full control from VMM and VMM can remove page at
> any point which would make platform to deliver #VE to the guest. This is
> pathological scenario, but I think it still worth handling gracefully.
Yep, pages targeted by DMA have been transitioned to shared, and could
be scattered anywhere in the guest physical address space. Such a page
could be touched by load_unaligned_zeropad(). But could you elaborate
more on the "pathological scenario" where the guest physical page isn't
backed by memory?
Sure, the VMM can remove the page at any point, but why would it do
so? Is doing so a legitimate part of the host/guest contract that the guest
must handle cleanly? More importantly, what is the guest expected to
do in such a case? I would expect that at some point such a DMA page
is accessed by a guest vCPU with an explicit reference that is not
load_unaligned_zeropad(). Then the guest would take a #VE that
goes down the #GP path and invokes die().
I don't object to make the load_unaligned_zeropad() fixup path work
correctly in response to a #VE, as it seems like general goodness. I'm
just trying to make sure I understand the nuances of "why". :-)
>
> > I'd still like to see the private <-> shared transition code mark the pages
> > as invalid during the transition, and avoid the possibility of #VE and
> > similar cases with SEV-SNP. Such approach reduces (eliminates?)
> > entanglement between CoCo-specific exceptions and
> > load_unaligned_zeropad(). It also greatly simplifies TD Partition cases
> > and SEV-SNP cases where a paravisor is used.
>
> I doesn't eliminates issue for TDX as the scenario above is not transient.
> It can happen after memory is converted to shared.
Notwithstanding, do you have any objections to the private <-> shared
transition code being changed so it won't be the cause of #VE, and
similar on SEV-SNP?
Michael
On Tue, Jul 25, 2023 at 03:51:24PM +0000, Michael Kelley (LINUX) wrote:
> From: Kirill A. Shutemov <[email protected]> Sent: Monday, July 24, 2023 4:19 PM
> >
> > On Thu, Jul 13, 2023 at 02:43:39PM +0000, Michael Kelley (LINUX) wrote:
> > > From: Kirill A. Shutemov <[email protected]> Sent: Saturday, July 8, 2023 11:09 PM
> > > >
> > > > On Sat, Jul 08, 2023 at 11:53:08PM +0000, Michael Kelley (LINUX) wrote:
> > > > > From: Kirill A. Shutemov <[email protected]> Sent: Friday, July 7, 2023 7:07 AM
> > > > > >
> > > > > > On Thu, Jul 06, 2023 at 04:48:32PM +0000, Michael Kelley (LINUX) wrote:
> > > > > > > From: Kirill A. Shutemov <[email protected]> Sent: Tuesday, June 6, 2023 2:56 AM
> > > > >
> > > > > [snip]
> > > > >
> > > > > >
> > > > > > It only addresses the problem that happens on transition, but
> > > > > > load_unaligned_zeropad() is still a problem for the shared mappings in
> > > > > > general, after transition is complete. Like if load_unaligned_zeropad()
> > > > > > steps from private to shared mapping and shared mapping triggers #VE,
> > > > > > kernel should be able to handle it.
> > > > >
> > > > > I'm showing my ignorance of TDX architectural details, but what's the
> > > > > situation where shared mappings in general can trigger a #VE? How
> > > > > do such situations get handled for references that aren't from
> > > > > load_unaligned_zeropad()?
> > > > >
> > > >
> > > > Shared mappings are under host/VMM control. It can just not map the page
> > > > in shared-ept and trigger ept-violation #VE.
> > >
> > > I know you are out on vacation, but let me follow up now for further
> > > discussion when you are back.
> > >
> > > Isn't the scenario you are describing a malfunctioning or malicious
> > > host/VMM? Would what you are describing be done as part of normal
> > > operation? Kernel code must have switched the page from private to
> > > shared for some purpose. As soon as that code (which presumably
> > > does not have any entry in the exception table) touches the page, it
> > > would take the #VE and the enter the die path because there's no fixup.
> > > So is there value in having load_unaligned_zeropad() handle the #VE and
> > > succeed where a normal reference would fail?
> >
> > #VE on shared memory is legitimately used for MMIO. But MMIO region is
> > usually separate from the real memory in physical address space.
> >
> > But we also have DMA.
> >
> > DMA pages allocated from common pool of memory and they can be next to
> > dentry cache that kernel accesses with load_unaligned_zeropad(). DMA pages
> > are shared, but they usually backed by memory and not cause #VE. However
> > shared memory is under full control from VMM and VMM can remove page at
> > any point which would make platform to deliver #VE to the guest. This is
> > pathological scenario, but I think it still worth handling gracefully.
>
> Yep, pages targeted by DMA have been transitioned to shared, and could
> be scattered anywhere in the guest physical address space. Such a page
> could be touched by load_unaligned_zeropad(). But could you elaborate
> more on the "pathological scenario" where the guest physical page isn't
> backed by memory?
>
> Sure, the VMM can remove the page at any point, but why would it do
> so? Is doing so a legitimate part of the host/guest contract that the guest
> must handle cleanly? More importantly, what is the guest expected to
> do in such a case? I would expect that at some point such a DMA page
> is accessed by a guest vCPU with an explicit reference that is not
> load_unaligned_zeropad(). Then the guest would take a #VE that
> goes down the #GP path and invokes die().
>
> I don't object to make the load_unaligned_zeropad() fixup path work
> correctly in response to a #VE, as it seems like general goodness. I'm
> just trying to make sure I understand the nuances of "why". :-)
We actually triggered the issue during internal testing. I wrote initial
patch after that. But looking back on the report I don't have an answer
why the page triggered #VE. Maybe VMM or virtual BIOS screwed up and put
MMIO next to normal memory, I donno.
> > > I'd still like to see the private <-> shared transition code mark the pages
> > > as invalid during the transition, and avoid the possibility of #VE and
> > > similar cases with SEV-SNP. Such approach reduces (eliminates?)
> > > entanglement between CoCo-specific exceptions and
> > > load_unaligned_zeropad(). It also greatly simplifies TD Partition cases
> > > and SEV-SNP cases where a paravisor is used.
> >
> > I doesn't eliminates issue for TDX as the scenario above is not transient.
> > It can happen after memory is converted to shared.
>
> Notwithstanding, do you have any objections to the private <-> shared
> transition code being changed so it won't be the cause of #VE, and
> similar on SEV-SNP?
I am not yet convinced it is needed. But let's see the code.
--
Kiryl Shutsemau / Kirill A. Shutemov
From: Kirill A. Shutemov <[email protected]> Sent: Tuesday, July 25, 2023 4:13 PM
>
> On Tue, Jul 25, 2023 at 03:51:24PM +0000, Michael Kelley (LINUX) wrote:
> > From: Kirill A. Shutemov <[email protected]> Sent: Monday, July 24,
> 2023 4:19 PM
> > >
> > > On Thu, Jul 13, 2023 at 02:43:39PM +0000, Michael Kelley (LINUX) wrote:
> > > > From: Kirill A. Shutemov <[email protected]> Sent: Saturday, July 8, 2023
> 11:09 PM
> > > > >
> > > > > On Sat, Jul 08, 2023 at 11:53:08PM +0000, Michael Kelley (LINUX) wrote:
> > > > > > From: Kirill A. Shutemov <[email protected]> Sent: Friday, July 7, 2023
> 7:07 AM
> > > > > > >
> > > > > > > On Thu, Jul 06, 2023 at 04:48:32PM +0000, Michael Kelley (LINUX) wrote:
> > > > > > > > From: Kirill A. Shutemov <[email protected]> Sent: Tuesday,
> June 6, 2023 2:56 AM
> > > > > >
> > > > > > [snip]
> > > > > >
> > > > > > >
> > > > > > > It only addresses the problem that happens on transition, but
> > > > > > > load_unaligned_zeropad() is still a problem for the shared mappings in
> > > > > > > general, after transition is complete. Like if load_unaligned_zeropad()
> > > > > > > steps from private to shared mapping and shared mapping triggers #VE,
> > > > > > > kernel should be able to handle it.
> > > > > >
> > > > > > I'm showing my ignorance of TDX architectural details, but what's the
> > > > > > situation where shared mappings in general can trigger a #VE? How
> > > > > > do such situations get handled for references that aren't from
> > > > > > load_unaligned_zeropad()?
> > > > > >
> > > > >
> > > > > Shared mappings are under host/VMM control. It can just not map the page
> > > > > in shared-ept and trigger ept-violation #VE.
> > > >
> > > > I know you are out on vacation, but let me follow up now for further
> > > > discussion when you are back.
> > > >
> > > > Isn't the scenario you are describing a malfunctioning or malicious
> > > > host/VMM? Would what you are describing be done as part of normal
> > > > operation? Kernel code must have switched the page from private to
> > > > shared for some purpose. As soon as that code (which presumably
> > > > does not have any entry in the exception table) touches the page, it
> > > > would take the #VE and the enter the die path because there's no fixup.
> > > > So is there value in having load_unaligned_zeropad() handle the #VE and
> > > > succeed where a normal reference would fail?
> > >
> > > #VE on shared memory is legitimately used for MMIO. But MMIO region is
> > > usually separate from the real memory in physical address space.
> > >
> > > But we also have DMA.
> > >
> > > DMA pages allocated from common pool of memory and they can be next to
> > > dentry cache that kernel accesses with load_unaligned_zeropad(). DMA pages
> > > are shared, but they usually backed by memory and not cause #VE. However
> > > shared memory is under full control from VMM and VMM can remove page at
> > > any point which would make platform to deliver #VE to the guest. This is
> > > pathological scenario, but I think it still worth handling gracefully.
> >
> > Yep, pages targeted by DMA have been transitioned to shared, and could
> > be scattered anywhere in the guest physical address space. Such a page
> > could be touched by load_unaligned_zeropad(). But could you elaborate
> > more on the "pathological scenario" where the guest physical page isn't
> > backed by memory?
> >
> > Sure, the VMM can remove the page at any point, but why would it do
> > so? Is doing so a legitimate part of the host/guest contract that the guest
> > must handle cleanly? More importantly, what is the guest expected to
> > do in such a case? I would expect that at some point such a DMA page
> > is accessed by a guest vCPU with an explicit reference that is not
> > load_unaligned_zeropad(). Then the guest would take a #VE that
> > goes down the #GP path and invokes die().
> >
> > I don't object to make the load_unaligned_zeropad() fixup path work
> > correctly in response to a #VE, as it seems like general goodness. I'm
> > just trying to make sure I understand the nuances of "why". :-)
>
> We actually triggered the issue during internal testing. I wrote initial
> patch after that. But looking back on the report I don't have an answer
> why the page triggered #VE. Maybe VMM or virtual BIOS screwed up and put
> MMIO next to normal memory, I donno.
>
> > > > I'd still like to see the private <-> shared transition code mark the pages
> > > > as invalid during the transition, and avoid the possibility of #VE and
> > > > similar cases with SEV-SNP. Such approach reduces (eliminates?)
> > > > entanglement between CoCo-specific exceptions and
> > > > load_unaligned_zeropad(). It also greatly simplifies TD Partition cases
> > > > and SEV-SNP cases where a paravisor is used.
> > >
> > > I doesn't eliminates issue for TDX as the scenario above is not transient.
> > > It can happen after memory is converted to shared.
> >
> > Notwithstanding, do you have any objections to the private <-> shared
> > transition code being changed so it won't be the cause of #VE, and
> > similar on SEV-SNP?
>
> I am not yet convinced it is needed. But let's see the code.
>
Fair enough. But similarly, I'm not convinced that we have
load_unaligned_zeropad() cases outside of private <-> shared transitions
that *must* be fixed up cleanly. :-)
Here's the RFC patch, and it includes a couple of open questions:
https://lore.kernel.org/lkml/[email protected]/
In TD Partitioning configurations, I'm hoping we can eliminate #VE EPT
violations needing to be handled in the L2 guest. The L1 guest will handle
things like MMIO. But the L1 guest can't easily detect that a #VE with EPT
violation is due load_unaligned_zeropad() in the L2. And trying to reflect
the #VE from the L1 guest to the L2 guest wades into a bunch of problems
that I want to avoid.
The same problems arise in SEV-SNP with a paravisor running at VMPL 0.
Changing the private <-> shared transition code prevents the problems
there as well.
Michael