Hi all,
With the earlyprintk=ttyS0 kernel parameter, a C-bit mode Linux SNP guest
on Hyper-V always crashes via sev_es_terminate() in
do_boot_stage2_vc() because early_setup_ghcb() fails:
early_setup_ghcb() ->
set_page_decrypted() ->
set_clr_page_flags() ->
split_large_pmd() ->
alloc_pgt_page() fails to allocate memory.
static void *alloc_pgt_page(void *context)
{
...
/* Validate there is space available for a new page. */
if (pages->pgt_buf_offset >= pages->pgt_buf_size) {
...
return NULL;
}
...
}
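For illustration, the bounds check above behaves like a simple bump allocator. The following is a minimal userspace sketch, not the kernel source. It assumes 4 KiB pages and mirrors only the three fields the check uses (pgt_buf, pgt_buf_size, pgt_buf_offset), and model_alloc_pgt_page() is a hypothetical name. With pgt_buf_size == 0 the very first request fails, because 0 >= 0.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL /* assumption: 4 KiB pages */

/* Simplified stand-in for the decompressor's allocation context. */
struct alloc_pgt_data {
	unsigned char *pgt_buf;        /* start of the page-table buffer */
	unsigned long pgt_buf_size;    /* bytes available in pgt_buf */
	unsigned long pgt_buf_offset;  /* bytes already handed out */
};

/*
 * Userspace model of alloc_pgt_page(): return the next page of pgt_buf,
 * or NULL once the offset reaches the size. With pgt_buf_size == 0 the
 * check (0 >= 0) fails on the first call.
 */
static void *model_alloc_pgt_page(void *context)
{
	struct alloc_pgt_data *pages = context;
	unsigned char *entry;

	/* Pages are handed out whole, so the offset stays page-aligned. */
	assert(pages->pgt_buf_offset % PAGE_SIZE == 0);

	/* Validate there is space available for a new page. */
	if (pages->pgt_buf_offset >= pages->pgt_buf_size)
		return NULL;

	entry = pages->pgt_buf + pages->pgt_buf_offset;
	pages->pgt_buf_offset += PAGE_SIZE;
	return entry;
}
```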
alloc_pgt_page() fails to allocate memory because both
pages->pgt_buf_offset and pages->pgt_buf_size are zero.
pgt_data.pgt_buf_size is zero because of this line in
initialize_identity_maps():
pgt_data.pgt_buf_size = BOOT_PGT_SIZE - BOOT_INIT_PGT_SIZE;
void initialize_identity_maps(void *rmode)
{
...
top_level_pgt = read_cr3_pa();
if (p4d_offset((pgd_t *)top_level_pgt, 0) == (p4d_t *)_pgtable) {
pgt_data.pgt_buf = _pgtable + BOOT_INIT_PGT_SIZE;
pgt_data.pgt_buf_size = BOOT_PGT_SIZE - BOOT_INIT_PGT_SIZE;
memset(pgt_data.pgt_buf, 0, pgt_data.pgt_buf_size);
} else {
pgt_data.pgt_buf = _pgtable;
pgt_data.pgt_buf_size = BOOT_PGT_SIZE;
memset(pgt_data.pgt_buf, 0, pgt_data.pgt_buf_size);
top_level_pgt = (unsigned long)alloc_pgt_page(&pgt_data);
}
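The branch above decides how much of the _pgtable area alloc_pgt_page() may use. Here is a minimal userspace sketch of that bookkeeping, assuming the !CONFIG_RANDOMIZE_BASE values from boot.h (so BOOT_PGT_SIZE == BOOT_INIT_PGT_SIZE) and representing pgt_buf as a byte offset from _pgtable rather than a pointer. struct pgt_layout and model_pgt_layout() are hypothetical names. free_pages counts what is left right after the branch, before any identity mappings are added; the else branch spends one page on the new top-level table.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SIZE          4096UL              /* assumption: 4 KiB pages */
#define BOOT_INIT_PGT_SIZE (6 * 4096UL)
#define BOOT_PGT_SIZE      BOOT_INIT_PGT_SIZE  /* !CONFIG_RANDOMIZE_BASE */

struct pgt_layout {
	unsigned long buf_offset;  /* pgt_buf - _pgtable, in bytes */
	unsigned long buf_size;    /* pgt_data.pgt_buf_size */
	unsigned long free_pages;  /* pages left once the branch has run */
};

/* cr3_is_pgtable: whether p4d_offset(top_level_pgt, 0) == _pgtable. */
static struct pgt_layout model_pgt_layout(bool cr3_is_pgtable)
{
	struct pgt_layout layout;

	if (cr3_is_pgtable) {
		/* Keep the tables already in _pgtable; use only the tail. */
		layout.buf_offset = BOOT_INIT_PGT_SIZE;
		layout.buf_size = BOOT_PGT_SIZE - BOOT_INIT_PGT_SIZE;
		layout.free_pages = layout.buf_size / PAGE_SIZE;
	} else {
		/* Build fresh tables; the new top level takes one page. */
		layout.buf_offset = 0;
		layout.buf_size = BOOT_PGT_SIZE;
		layout.free_pages = layout.buf_size / PAGE_SIZE - 1;
	}
	assert(layout.buf_size % PAGE_SIZE == 0);
	return layout;
}
```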
In arch/x86/include/asm/boot.h, BOOT_PGT_SIZE equals
BOOT_INIT_PGT_SIZE if CONFIG_RANDOMIZE_BASE is not defined
(which is my case):
# define BOOT_INIT_PGT_SIZE (6*4096)
# ifdef CONFIG_RANDOMIZE_BASE
...
# ifdef CONFIG_X86_VERBOSE_BOOTUP
# define BOOT_PGT_SIZE (19*4096)
# else /* !CONFIG_X86_VERBOSE_BOOTUP */
# define BOOT_PGT_SIZE (17*4096)
# endif
# else /* !CONFIG_RANDOMIZE_BASE */
# define BOOT_PGT_SIZE BOOT_INIT_PGT_SIZE
# endif
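The same selection can be written as a plain function, which makes the per-config values easy to compare with the numbers reported later in the thread (0x13000 and 0x6000). This sketch mirrors only the lines quoted above (the elided "..." is not modeled). boot_pgt_size() and spare_pages_reusing_init() are hypothetical helpers, and 4 KiB pages are assumed.

```c
#include <assert.h>

#define PAGE_SIZE          4096UL /* assumption: 4 KiB pages */
#define BOOT_INIT_PGT_SIZE (6 * 4096UL)

/* BOOT_PGT_SIZE as selected by the quoted #ifdefs, per Kconfig choice. */
static unsigned long boot_pgt_size(int randomize_base, int verbose_bootup)
{
	if (!randomize_base)
		return BOOT_INIT_PGT_SIZE;
	return verbose_bootup ? 19 * 4096UL : 17 * 4096UL;
}

/* Pages left for alloc_pgt_page() when the first branch is taken. */
static unsigned long spare_pages_reusing_init(int randomize_base,
					      int verbose_bootup)
{
	unsigned long size = boot_pgt_size(randomize_base, verbose_bootup);

	assert(size >= BOOT_INIT_PGT_SIZE);
	return (size - BOOT_INIT_PGT_SIZE) / PAGE_SIZE;
}
```

Without CONFIG_RANDOMIZE_BASE the first branch gets zero spare pages regardless of CONFIG_X86_VERBOSE_BOOTUP; with it, 11 or 13 pages remain.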
I think this means: if CONFIG_RANDOMIZE_BASE is not defined,
earlyprintk=ttyS0 also doesn't work for an SNP guest on KVM?
Sorry, I don't have a KVM environment at hand to test it.
If I define CONFIG_RANDOMIZE_BASE, my C-bit mode SNP guest crashes
even earlier -- it looks like CONFIG_RANDOMIZE_BASE is incompatible
with my guest on Hyper-V for some reason I don't yet understand.
Do you always use CONFIG_RANDOMIZE_BASE for an SNP guest on KVM,
and does earlyprintk=ttyS0 work for you?
Can you please share your thoughts? Thanks!
Thanks,
-- Dexuan
On Thu, Feb 16, 2023 at 04:40:14AM +0000, Dexuan Cui wrote:
> Hi all,
> With the earlyprintk=ttyS0 kernel parameter, a C-bit mode Linux SNP guest
> on Hyper-V always crashes via sev_es_terminate() in
> do_boot_stage2_vc() because early_setup_ghcb() fails:
>
> early_setup_ghcb() ->
> set_page_decrypted() ->
> set_clr_page_flags() ->
> split_large_pmd() ->
> alloc_pgt_page() fails to allocate memory.
>
> static void *alloc_pgt_page(void *context)
> {
> ...
> /* Validate there is space available for a new page. */
> if (pages->pgt_buf_offset >= pages->pgt_buf_size) {
> ...
> return NULL;
> }
> ...
> }
>
> alloc_pgt_page() fails to allocate memory because both
> pages->pgt_buf_offset and pages->pgt_buf_size are zero.
>
>
> pgt_data.pgt_buf_size is zero because of this line in
> initialize_identity_maps():
> pgt_data.pgt_buf_size = BOOT_PGT_SIZE - BOOT_INIT_PGT_SIZE;
>
> void initialize_identity_maps(void *rmode)
> {
> ...
> top_level_pgt = read_cr3_pa();
> if (p4d_offset((pgd_t *)top_level_pgt, 0) == (p4d_t *)_pgtable) {
> pgt_data.pgt_buf = _pgtable + BOOT_INIT_PGT_SIZE;
> pgt_data.pgt_buf_size = BOOT_PGT_SIZE - BOOT_INIT_PGT_SIZE;
> memset(pgt_data.pgt_buf, 0, pgt_data.pgt_buf_size);
> } else {
> pgt_data.pgt_buf = _pgtable;
> pgt_data.pgt_buf_size = BOOT_PGT_SIZE;
> memset(pgt_data.pgt_buf, 0, pgt_data.pgt_buf_size);
> top_level_pgt = (unsigned long)alloc_pgt_page(&pgt_data);
I just tested an SNP guest on KVM with and without CONFIG_RANDOMIZE_BASE.
In both cases we end up in the else branch.
With CONFIG_RANDOMIZE_BASE: BOOT_PGT_SIZE=0x13000
Without CONFIG_RANDOMIZE_BASE: BOOT_PGT_SIZE=0x6000
So in both cases pgt_data.pgt_buf_size != 0.
Getting into that first branch would require having 5-level paging supported
(CONFIG_X86_5LEVEL=y) and enabled inside the guest; none of the hardware
I have access to supports that.
Jeremi
> From: Jeremi Piotrowski
> Sent: Thursday, February 16, 2023 1:15 AM
> > ...
> > alloc_pgt_page() fails to allocate memory because both
> > pages->pgt_buf_offset and pages->pgt_buf_size are zero.
> >
> >
> > pgt_data.pgt_buf_size is zero because of this line in
> > initialize_identity_maps():
> > pgt_data.pgt_buf_size = BOOT_PGT_SIZE - BOOT_INIT_PGT_SIZE;
> >
> > void initialize_identity_maps(void *rmode)
> > {
> > ...
> > top_level_pgt = read_cr3_pa();
> > if (p4d_offset((pgd_t *)top_level_pgt, 0) == (p4d_t *)_pgtable) {
> > pgt_data.pgt_buf = _pgtable + BOOT_INIT_PGT_SIZE;
> > pgt_data.pgt_buf_size = BOOT_PGT_SIZE - BOOT_INIT_PGT_SIZE;
> > memset(pgt_data.pgt_buf, 0, pgt_data.pgt_buf_size);
> > } else {
> > pgt_data.pgt_buf = _pgtable;
> > pgt_data.pgt_buf_size = BOOT_PGT_SIZE;
> > memset(pgt_data.pgt_buf, 0, pgt_data.pgt_buf_size);
> > top_level_pgt = (unsigned long)alloc_pgt_page(&pgt_data);
>
> I just tested an SNP guest on KVM with and without CONFIG_RANDOMIZE_BASE.
> In both cases we end up in the else branch.
> With CONFIG_RANDOMIZE_BASE: BOOT_PGT_SIZE=0x13000
> Without CONFIG_RANDOMIZE_BASE: BOOT_PGT_SIZE=0x6000
>
> So in both cases pgt_data.pgt_buf_size != 0.
>
> Getting into that first branch would require having 5-level paging supported
> (CONFIG_X86_5LEVEL=y) and enabled inside the guest; none of the hardware
> I have access to supports that.
>
> Jeremi
CONFIG_X86_5LEVEL is not set for my kernel.
The comment before the first branch says:
On 4-level paging, p4d_offset(top_level_pgt, 0) is equal to 'top_level_pgt'.
IIUC this means 'top_level_pgt' is equal to '_pgtable'? i.e. without
CONFIG_RANDOMIZE_BASE, pgt_data.pgt_buf_size should be 0.
Not sure why it's not getting into the first branch for you.
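Dexuan's reading of the comment can be sketched as follows. With 4-level paging the p4d level is folded into the pgd, so p4d_offset(top_level_pgt, 0) is simply top_level_pgt and the test reduces to "is CR3 already _pgtable?". This is a hypothetical userspace model: the pgd_t/p4d_t stand-in types, model_p4d_offset(), model_reuses_startup32_tables() and the l5_enabled flag are illustrative assumptions. The 5-level case is reduced to "the table that pgd[0] points to", passed in by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Opaque stand-ins for the kernel's page-table entry types. */
typedef struct { uint64_t pgd; } pgd_t;
typedef struct { uint64_t p4d; } p4d_t;

/* Model of p4d_offset(pgd, 0) for the two paging modes. */
static p4d_t *model_p4d_offset(pgd_t *pgd, bool l5_enabled, p4d_t *next_level)
{
	assert(pgd != NULL);
	if (!l5_enabled)
		return (p4d_t *)pgd;  /* p4d folded: the pgd page itself */
	return next_level;            /* 5-level: table referenced by pgd[0] */
}

/* The condition guarding the first branch of initialize_identity_maps(). */
static bool model_reuses_startup32_tables(pgd_t *top_level_pgt, p4d_t *pgtable,
					  bool l5_enabled, p4d_t *next_level)
{
	return model_p4d_offset(top_level_pgt, l5_enabled, next_level) == pgtable;
}
```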
On 16/02/2023 18:58, Dexuan Cui wrote:
>> From: Jeremi Piotrowski
>> Sent: Thursday, February 16, 2023 1:15 AM
>>> ...
>>> alloc_pgt_page() fails to allocate memory because both
>>> pages->pgt_buf_offset and pages->pgt_buf_size are zero.
>>>
>>>
>>> pgt_data.pgt_buf_size is zero because of this line in
>>> initialize_identity_maps():
>>> pgt_data.pgt_buf_size = BOOT_PGT_SIZE - BOOT_INIT_PGT_SIZE;
>>>
>>> void initialize_identity_maps(void *rmode)
>>> {
>>> ...
>>> top_level_pgt = read_cr3_pa();
>>> if (p4d_offset((pgd_t *)top_level_pgt, 0) == (p4d_t *)_pgtable) {
>>> pgt_data.pgt_buf = _pgtable + BOOT_INIT_PGT_SIZE;
>>> pgt_data.pgt_buf_size = BOOT_PGT_SIZE - BOOT_INIT_PGT_SIZE;
>>> memset(pgt_data.pgt_buf, 0, pgt_data.pgt_buf_size);
>>> } else {
>>> pgt_data.pgt_buf = _pgtable;
>>> pgt_data.pgt_buf_size = BOOT_PGT_SIZE;
>>> memset(pgt_data.pgt_buf, 0, pgt_data.pgt_buf_size);
>>> top_level_pgt = (unsigned long)alloc_pgt_page(&pgt_data);
>>
>> I just tested an SNP guest on KVM with and without CONFIG_RANDOMIZE_BASE.
>> In both cases we end up in the else branch.
>> With CONFIG_RANDOMIZE_BASE: BOOT_PGT_SIZE=0x13000
>> Without CONFIG_RANDOMIZE_BASE: BOOT_PGT_SIZE=0x6000
>>
>> So in both cases pgt_data.pgt_buf_size != 0.
>>
>> Getting into that first branch would require having 5-level paging supported
>> (CONFIG_X86_5LEVEL=y) and enabled inside the guest; none of the hardware
>> I have access to supports that.
>>
>> Jeremi
>
> CONFIG_X86_5LEVEL is not set for my kernel.
>
> The comment before the first branch says:
> On 4-level paging, p4d_offset(top_level_pgt, 0) is equal to 'top_level_pgt'.
>
> IIUC this means 'top_level_pgt' is equal to '_pgtable'? i.e. without
> CONFIG_RANDOMIZE_BASE, pgt_data.pgt_buf_size should be 0.
>
> Not sure why it's not getting into the first branch for you.
Sorry, I got two things confused here. The relevant part of the comment is this:
"If we came here via startup_32(), cr3 will be _pgtable already".
When booting a (non-SNP) guest via BIOS, I end up in the first branch. Upstream SNP support
requires OVMF (UEFI), so we always reach the kernel in 64-bit mode (startup_64?)
and end up in the second branch.
Jeremi
> From: Jeremi Piotrowski
> Sent: Friday, February 17, 2023 4:51 AM
> To: Dexuan Cui
> > [...]
> > The comment before the first branch says:
> > On 4-level paging, p4d_offset(top_level_pgt, 0) is equal to 'top_level_pgt'.
> >
> > IIUC this means 'top_level_pgt' is equal to '_pgtable'? i.e. without
> > CONFIG_RANDOMIZE_BASE, pgt_data.pgt_buf_size should be 0.
> >
> > Not sure why it's not getting into the first branch for you.
>
> Sorry, I got two things confused here. The relevant part of the comment is this:
> "If we came here via startup_32(), cr3 will be _pgtable already".
>
> When booting a (non-SNP) guest via BIOS, I end up in the first branch. Upstream SNP
> support requires OVMF (UEFI), so we always reach the kernel in 64-bit mode
> (startup_64?) and end up in the second branch.
>
> Jeremi
Here I'm running a C-bit mode SNP guest on Hyper-V via "direct-boot" (i.e.,
I run Set-VMFirmware to tell Hyper-V to boot the kernel directly without
UEFI). It looks like startup_32 in arch/x86/boot/compressed/head_64.S runs
first and calls startup_64 later(?). This might explain why I'm getting into
the first branch, which I hope someone can fix...
On 2/17/23 20:54, Dexuan Cui wrote:
>> From: Jeremi Piotrowski <[email protected]>
>> Sent: Friday, February 17, 2023 4:51 AM
>> To: Dexuan Cui <[email protected]>
>>> [...]
>>> The comment before the first branch says:
>>> On 4-level paging, p4d_offset(top_level_pgt, 0) is equal to 'top_level_pgt'.
>>>
>>> IIUC this means 'top_level_pgt' is equal to '_pgtable'? i.e. without
>>> CONFIG_RANDOMIZE_BASE, pgt_data.pgt_buf_size should be 0.
>>>
>>> Not sure why it's not getting into the first branch for you.
>>
>> Sorry, I got two things confused here. The relevant part of the comment is this:
>> "If we came here via startup_32(), cr3 will be _pgtable already".
>>
>> When booting a (non-SNP) guest via BIOS, I end up in the first branch. Upstream SNP
>> support requires OVMF (UEFI), so we always reach the kernel in 64-bit mode
>> (startup_64?) and end up in the second branch.
>>
>> Jeremi
>
> Here I'm running a C-bit mode SNP guest on Hyper-V via "direct-boot" (i.e.,
> I run Set-VMFirmware to tell Hyper-V to boot the kernel directly without
> UEFI). It looks like startup_32 in arch/x86/boot/compressed/head_64.S runs
> first and calls startup_64 later(?). This might explain why I'm getting into
> the first branch, which I hope someone can fix...
It sounds like there aren't enough pages available to satisfy the page
split needed to make the GHCB shared. Have you tried increasing
BOOT_INIT_PGT_SIZE by 1 page? Splitting the page will require an
additional page table; I think that is all that would be needed.
Thanks,
Tom
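Tom's estimate can be checked with a small model. It assumes, as he describes, that splitting the 2 MiB mapping around the GHCB consumes exactly one page-table page from alloc_pgt_page(), and that the guest goes through the first (startup_32) branch of initialize_identity_maps(), where the usable space is BOOT_PGT_SIZE - BOOT_INIT_PGT_SIZE. Because the boot.h excerpt quoted earlier defines BOOT_PGT_SIZE as BOOT_INIT_PGT_SIZE when CONFIG_RANDOMIZE_BASE is off, the model suggests the extra page has to widen that difference; raising both macros together would still leave no spare page. spare_pages() and can_split_one_pmd() are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SIZE 4096UL /* assumption: 4 KiB pages */

/* Assumption: splitting one 2 MiB PMD mapping needs one PTE table page. */
#define PAGES_PER_PMD_SPLIT 1UL

/* Pages alloc_pgt_page() can hand out when startup_32's tables are reused. */
static unsigned long spare_pages(unsigned long boot_pgt_size,
				 unsigned long boot_init_pgt_size)
{
	assert(boot_pgt_size >= boot_init_pgt_size);
	return (boot_pgt_size - boot_init_pgt_size) / PAGE_SIZE;
}

/* Would split_large_pmd() find a page for the new PTE table? */
static bool can_split_one_pmd(unsigned long boot_pgt_size,
			      unsigned long boot_init_pgt_size)
{
	return spare_pages(boot_pgt_size, boot_init_pgt_size) >=
	       PAGES_PER_PMD_SPLIT;
}
```

For example, can_split_one_pmd(6 * 4096, 6 * 4096) is false, while can_split_one_pmd(7 * 4096, 6 * 4096) is true.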