From: Ashwin Dayanand Kamat <[email protected]>
A kernel crash caused by a page fault was observed while running the
cpuhotplug LTP testcases on SEV-ES enabled systems. The crash occurred
during hotplug, after the CPU was offlined and the process was migrated
to a different CPU. setup_ghcb() is then called again and tries to
update ghcb_version in sev_es_negotiate_protocol(). This variable is
meant to be read-only once it has been initialised during boot, so the
write results in a page fault.
From the logs:
[ 256.447466] BUG: unable to handle page fault for address: ffffffffba556e70
[ 256.447476] #PF: supervisor write access in kernel mode
[ 256.447478] #PF: error_code(0x0003) - permissions violation
[ 256.447479] PGD 8000667c0f067 P4D 8000667c0f067 PUD 8000667c10063 PMD 80080006674001e1
[ 256.447483] Oops: 0003 [#1] PREEMPT SMP NOPTI
[ 256.447487] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.1.45-8.ph5 #1-photon
[ ... ]
[ 256.447511] CR2: ffffffffba556e70 CR3: 0008000667c0a004 CR4: 0000000000770ee0
[ 256.447514] PKRU: 55555554
[ 256.447515] Call Trace:
[ 256.447516] <TASK>
[ 256.447519] ? __die_body.cold+0x1a/0x1f
[ 256.447526] ? __die+0x2a/0x35
[ 256.447528] ? page_fault_oops+0x10c/0x270
[ 256.447531] ? setup_ghcb+0x71/0x100
[ 256.447533] ? __x86_return_thunk+0x5/0x6
[ 256.447537] ? search_exception_tables+0x60/0x70
[ 256.447541] ? __x86_return_thunk+0x5/0x6
[ 256.447543] ? fixup_exception+0x27/0x320
[ 256.447546] ? kernelmode_fixup_or_oops+0xa2/0x120
[ 256.447549] ? __bad_area_nosemaphore+0x16a/0x1b0
[ 256.447551] ? kernel_exc_vmm_communication+0x60/0xb0
[ 256.447556] ? bad_area_nosemaphore+0x16/0x20
[ 256.447558] ? do_kern_addr_fault+0x7a/0x90
[ 256.447560] ? exc_page_fault+0xbd/0x160
[ 256.447563] ? asm_exc_page_fault+0x27/0x30
[ 256.447570] ? setup_ghcb+0x71/0x100
[ 256.447572] ? setup_ghcb+0xe/0x100
[ 256.447574] cpu_init_exception_handling+0x1b9/0x1f0
The fix is to call sev_es_negotiate_protocol() only in the BSP boot phase;
it needs to be done only once in any case.
Fixes: 95d33bfaa3e1 ("x86/sev: Register GHCB memory when SEV-SNP is active")
Co-developed-by: Bo Gan <[email protected]>
Signed-off-by: Bo Gan <[email protected]>
Signed-off-by: Ashwin Dayanand Kamat <[email protected]>
---
v2:
Changes in v2, per the review comments from Tom Lendacky:
- Moved sev_es_negotiate_protocol() after the initial_vc_handler if-check in setup_ghcb()
- Added the co-developer's Signed-off-by
---
arch/x86/kernel/sev.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 70472eebe719..c67285824e82 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -1234,10 +1234,6 @@ void setup_ghcb(void)
if (!cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))
return;
- /* First make sure the hypervisor talks a supported protocol. */
- if (!sev_es_negotiate_protocol())
- sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_GEN_REQ);
-
/*
* Check whether the runtime #VC exception handler is active. It uses
* the per-CPU GHCB page which is set up by sev_es_init_vc_handling().
@@ -1254,6 +1250,13 @@ void setup_ghcb(void)
return;
}
+ /*
+ * Make sure the hypervisor talks a supported protocol.
+ * This gets called only in the BSP boot phase.
+ */
+ if (!sev_es_negotiate_protocol())
+ sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_GEN_REQ);
+
/*
* Clear the boot_ghcb. The first exception comes in before the bss
* section is cleared.
--
2.39.0
* Ashwin Dayanand Kamat <[email protected]> wrote:
> From: Ashwin Dayanand Kamat <[email protected]>
>
> A kernel crash caused by a page fault was observed while running the
> cpuhotplug LTP testcases on SEV-ES enabled systems. The crash occurred
> during hotplug, after the CPU was offlined and the process was migrated
> to a different CPU. setup_ghcb() is then called again and tries to
> update ghcb_version in sev_es_negotiate_protocol(). This variable is
> meant to be read-only once it has been initialised during boot, so the
> write results in a page fault.
Applied to tip:x86/urgent, thanks.
Tom: I've added your Suggested-by and Acked-by, which appeared to be the
case given the v1 discussion, let me know if that's not accurate.
I've also tidied up the changelog - final version attached below.
Thanks,
Ingo
============>
From: Ashwin Dayanand Kamat <[email protected]>
Date: Wed, 29 Nov 2023 16:10:29 +0530
Subject: [PATCH] x86/sev: Fix kernel crash due to late update to read-only ghcb_version
A write-access violation page fault kernel crash was observed while running
cpuhotplug LTP testcases on SEV-ES enabled systems. The crash was
observed during hotplug, after the CPU was offlined and the process
was migrated to a different CPU. setup_ghcb() is called again, which
tries to update ghcb_version in sev_es_negotiate_protocol(). This variable
is meant to be read-only once it has been initialised during boot.
Trying to write it results in a page fault:
BUG: unable to handle page fault for address: ffffffffba556e70
#PF: supervisor write access in kernel mode
#PF: error_code(0x0003) - permissions violation
[ ...]
Call Trace:
<TASK>
? __die_body.cold+0x1a/0x1f
? __die+0x2a/0x35
? page_fault_oops+0x10c/0x270
? setup_ghcb+0x71/0x100
? __x86_return_thunk+0x5/0x6
? search_exception_tables+0x60/0x70
? __x86_return_thunk+0x5/0x6
? fixup_exception+0x27/0x320
? kernelmode_fixup_or_oops+0xa2/0x120
? __bad_area_nosemaphore+0x16a/0x1b0
? kernel_exc_vmm_communication+0x60/0xb0
? bad_area_nosemaphore+0x16/0x20
? do_kern_addr_fault+0x7a/0x90
? exc_page_fault+0xbd/0x160
? asm_exc_page_fault+0x27/0x30
? setup_ghcb+0x71/0x100
? setup_ghcb+0xe/0x100
cpu_init_exception_handling+0x1b9/0x1f0
The fix is to call sev_es_negotiate_protocol() only in the BSP boot phase,
and it only needs to be done once in any case.
[ mingo: Refined the changelog. ]
Fixes: 95d33bfaa3e1 ("x86/sev: Register GHCB memory when SEV-SNP is active")
Suggested-by: Tom Lendacky <[email protected]>
Co-developed-by: Bo Gan <[email protected]>
Signed-off-by: Bo Gan <[email protected]>
Signed-off-by: Ashwin Dayanand Kamat <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Tom Lendacky <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/kernel/sev.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 70472eebe719..c67285824e82 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -1234,10 +1234,6 @@ void setup_ghcb(void)
if (!cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))
return;
- /* First make sure the hypervisor talks a supported protocol. */
- if (!sev_es_negotiate_protocol())
- sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_GEN_REQ);
-
/*
* Check whether the runtime #VC exception handler is active. It uses
* the per-CPU GHCB page which is set up by sev_es_init_vc_handling().
@@ -1254,6 +1250,13 @@ void setup_ghcb(void)
return;
}
+ /*
+ * Make sure the hypervisor talks a supported protocol.
+ * This gets called only in the BSP boot phase.
+ */
+ if (!sev_es_negotiate_protocol())
+ sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_GEN_REQ);
+
/*
* Clear the boot_ghcb. The first exception comes in before the bss
* section is cleared.
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: 27d25348d42161837be08fc63b04a2559d2e781c
Gitweb: https://git.kernel.org/tip/27d25348d42161837be08fc63b04a2559d2e781c
Author: Ashwin Dayanand Kamat <[email protected]>
AuthorDate: Wed, 29 Nov 2023 16:10:29 +05:30
Committer: Ingo Molnar <[email protected]>
CommitterDate: Thu, 30 Nov 2023 10:23:12 +01:00
x86/sev: Fix kernel crash due to late update to read-only ghcb_version
A write-access violation page fault kernel crash was observed while running
cpuhotplug LTP testcases on SEV-ES enabled systems. The crash was
observed during hotplug, after the CPU was offlined and the process
was migrated to a different CPU. setup_ghcb() is called again, which
tries to update ghcb_version in sev_es_negotiate_protocol(). This variable
is meant to be read-only once it has been initialised during boot.
Trying to write it results in a page fault:
BUG: unable to handle page fault for address: ffffffffba556e70
#PF: supervisor write access in kernel mode
#PF: error_code(0x0003) - permissions violation
[ ...]
Call Trace:
<TASK>
? __die_body.cold+0x1a/0x1f
? __die+0x2a/0x35
? page_fault_oops+0x10c/0x270
? setup_ghcb+0x71/0x100
? __x86_return_thunk+0x5/0x6
? search_exception_tables+0x60/0x70
? __x86_return_thunk+0x5/0x6
? fixup_exception+0x27/0x320
? kernelmode_fixup_or_oops+0xa2/0x120
? __bad_area_nosemaphore+0x16a/0x1b0
? kernel_exc_vmm_communication+0x60/0xb0
? bad_area_nosemaphore+0x16/0x20
? do_kern_addr_fault+0x7a/0x90
? exc_page_fault+0xbd/0x160
? asm_exc_page_fault+0x27/0x30
? setup_ghcb+0x71/0x100
? setup_ghcb+0xe/0x100
cpu_init_exception_handling+0x1b9/0x1f0
The fix is to call sev_es_negotiate_protocol() only in the BSP boot phase,
and it only needs to be done once in any case.
[ mingo: Refined the changelog. ]
Fixes: 95d33bfaa3e1 ("x86/sev: Register GHCB memory when SEV-SNP is active")
Suggested-by: Tom Lendacky <[email protected]>
Co-developed-by: Bo Gan <[email protected]>
Signed-off-by: Bo Gan <[email protected]>
Signed-off-by: Ashwin Dayanand Kamat <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Tom Lendacky <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/kernel/sev.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 70472ee..c672858 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -1234,10 +1234,6 @@ void setup_ghcb(void)
if (!cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))
return;
- /* First make sure the hypervisor talks a supported protocol. */
- if (!sev_es_negotiate_protocol())
- sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_GEN_REQ);
-
/*
* Check whether the runtime #VC exception handler is active. It uses
* the per-CPU GHCB page which is set up by sev_es_init_vc_handling().
@@ -1255,6 +1251,13 @@ void setup_ghcb(void)
}
/*
+ * Make sure the hypervisor talks a supported protocol.
+ * This gets called only in the BSP boot phase.
+ */
+ if (!sev_es_negotiate_protocol())
+ sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_GEN_REQ);
+
+ /*
* Clear the boot_ghcb. The first exception comes in before the bss
* section is cleared.
*/
On 11/30/23 03:30, Ingo Molnar wrote:
>
> * Ashwin Dayanand Kamat <[email protected]> wrote:
>
>> From: Ashwin Dayanand Kamat <[email protected]>
>>
>> A kernel crash caused by a page fault was observed while running the
>> cpuhotplug LTP testcases on SEV-ES enabled systems. The crash occurred
>> during hotplug, after the CPU was offlined and the process was migrated
>> to a different CPU. setup_ghcb() is then called again and tries to
>> update ghcb_version in sev_es_negotiate_protocol(). This variable is
>> meant to be read-only once it has been initialised during boot, so the
>> write results in a page fault.
>
> Applied to tip:x86/urgent, thanks.
>
> Tom: I've added your Suggested-by and Acked-by, which appeared to be the
> case given the v1 discussion, let me know if that's not accurate.
All good.
Thanks,
Tom
>
> I've also tidied up the changelog - final version attached below.
>
> Thanks,
>
> Ingo
>
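
For readers following the thread, the effect of moving the negotiation
below the runtime-handler check can be modelled with a small user-space C
sketch. This is not kernel code and not the patch itself: init_done,
runtime_vc_handler, late_writes, simulate() and the version value 2 are
hypothetical stand-ins. The sketch assumes that a write to ghcb_version
after init is what faults, and that the runtime #VC handler is already
active by the time a CPU is brought back online.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state; none of these are the kernel's definitions. */
static bool init_done;            /* true once "read-only after init" data is sealed */
static bool runtime_vc_handler;   /* true once the runtime #VC handler is active */
static unsigned int ghcb_version; /* stand-in for the negotiated protocol version */
static int late_writes;           /* writes attempted after sealing (a fault in the kernel) */

/* Stand-in for sev_es_negotiate_protocol(): it stores the version. */
static bool negotiate_protocol(void)
{
	if (init_done)
		late_writes++;    /* the kernel would take a page fault here */
	else
		ghcb_version = 2; /* assumed value, for illustration only */
	return true;
}

/* Ordering before the patch: negotiate on every call. */
static void setup_ghcb_before(void)
{
	negotiate_protocol();
	if (runtime_vc_handler)
		return;           /* runtime path: nothing more to do in this model */
}

/* Ordering after the patch: negotiate only on the boot path. */
static void setup_ghcb_after(void)
{
	if (runtime_vc_handler)
		return;           /* hotplug path returns before negotiating */
	negotiate_protocol();
}

/* Run one boot call and one hotplug call; return the number of late writes. */
static int simulate(void (*setup)(void))
{
	init_done = false;
	runtime_vc_handler = false;
	ghcb_version = 0;
	late_writes = 0;

	setup();                  /* BSP boot phase */
	assert(ghcb_version == 2);

	runtime_vc_handler = true;
	init_done = true;
	setup();                  /* CPU coming back online after hotplug */

	return late_writes;
}
```

In this model, simulate(setup_ghcb_before) reports one late write, which
corresponds to the faulting write in the trace, while
simulate(setup_ghcb_after) reports none because the second call returns
at the runtime-handler check.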