Initialize the mask for PKU permissions as if CR4.PKE=0, avoiding
incorrect interpretations of the nested hypervisor's page tables.
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 5942e9c6dd6e..a33b5361bc67 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4855,7 +4855,7 @@ void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
struct kvm_mmu *context = &vcpu->arch.guest_mmu;
struct kvm_mmu_role_regs regs = {
.cr0 = cr0,
- .cr4 = cr4,
+ .cr4 = cr4 & ~X86_CR4_PKE,
.efer = efer,
};
union kvm_mmu_role new_role;
@@ -4919,7 +4919,7 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
context->direct_map = false;
update_permission_bitmask(context, true);
- update_pkru_bitmask(context);
+ context->pkru_mask = 0;
reset_rsvds_bits_mask_ept(vcpu, context, execonly);
reset_ept_shadow_zero_bits_mask(vcpu, context, execonly);
}
--
2.31.1
On 2021/11/26 21:21, Paolo Bonzini wrote:
> Initialize the mask for PKU permissions as if CR4.PKE=0, avoiding
> incorrect interpretations of the nested hypervisor's page tables.
I think the AMD64 volume2 Architecture Programmer’s Manual does not
specify it, but it seems that for a sane NPT walk, PKU should not work
in NPT.
I once planed to set
cr0 = X86_CR0_PG | X86_CR0_WP;
cr4 = cr4 & ~(X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE);
It adds X86_CR0_WP and removes smep smap just because it is always usermode
access, and it has no meaning for CR0_WP, smep, smap. Setting it like this
ways can reduce the role combination.
>
> Cc: [email protected]
> Signed-off-by: Paolo Bonzini <[email protected]>
> ---
> arch/x86/kvm/mmu/mmu.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 5942e9c6dd6e..a33b5361bc67 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4855,7 +4855,7 @@ void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
> struct kvm_mmu *context = &vcpu->arch.guest_mmu;
> struct kvm_mmu_role_regs regs = {
> .cr0 = cr0,
> - .cr4 = cr4,
> + .cr4 = cr4 & ~X86_CR4_PKE,
> .efer = efer,
> };
> union kvm_mmu_role new_role;
> @@ -4919,7 +4919,7 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
> context->direct_map = false;
>
> update_permission_bitmask(context, true);
> - update_pkru_bitmask(context);
> + context->pkru_mask = 0;
It is not worth to optimize it since update_pkru_bitmask() will also just
set context->pkru_mask = 0 and then return.
> reset_rsvds_bits_mask_ept(vcpu, context, execonly);
> reset_ept_shadow_zero_bits_mask(vcpu, context, execonly);
> }
>
On 11/27/21 02:21, Lai Jiangshan wrote:
>
>
> On 2021/11/26 21:21, Paolo Bonzini wrote:
>> Initialize the mask for PKU permissions as if CR4.PKE=0, avoiding
>> incorrect interpretations of the nested hypervisor's page tables.
>
> I think the AMD64 volume2 Architecture Programmer’s Manual does not
> specify it, but it seems that for a sane NPT walk, PKU should not work
> in NPT.
The PK bit is not defined in the nested page fault EXITINFO1, too.
Thomas, can you have it fixed in the APM that the host's SMEP, SMAP and
PKE bits do not affect nested page table walks?
> I once planed to set
>
> cr0 = X86_CR0_PG | X86_CR0_WP;
> cr4 = cr4 & ~(X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE);
>
> It adds X86_CR0_WP and removes smep smap just because it is always usermode
> access, and it has no meaning for CR0_WP, smep, smap. Setting it like this
> ways can reduce the role combination.
Adding WP is a good idea (the host hCR0.WP bit is ignored under nested
paging). Adding PG is unnecessary though, it must be on.
Removing SMEP and SMAP makes sense, but not really because of the role
(if you add WP, then SMEP and SMAP are not part of the role because SMEP
& ~WP and SMAP & ~WP are both zero). Special-casing hCR4.SMEP basically
allows us to implement Guest Mode Execute Trap essentially for free and
even on older processors, because it's the same thing as SMEP---just
governed by a field in the VMCB control area instead of host CR4.
I'll send a v2 that also removes WP, SMEP and SMAP.
>> - update_pkru_bitmask(context);
>> + context->pkru_mask = 0;
>
> It is not worth to optimize it since update_pkru_bitmask() will also just
> set context->pkru_mask = 0 and then return.
I didn't think of it as an optimization, but I can undo it.
Paolo
On 11/27/21 4:25 AM, Paolo Bonzini wrote:
> On 11/27/21 02:21, Lai Jiangshan wrote:
>>
>>
>> On 2021/11/26 21:21, Paolo Bonzini wrote:
>>> Initialize the mask for PKU permissions as if CR4.PKE=0, avoiding
>>> incorrect interpretations of the nested hypervisor's page tables.
>>
>> I think the AMD64 volume2 Architecture Programmer’s Manual does not
>> specify it, but it seems that for a sane NPT walk, PKU should not work
>> in NPT.
>
> The PK bit is not defined in the nested page fault EXITINFO1, too. Thomas,
> can you have it fixed in the APM that the host's SMEP, SMAP and PKE bits
> do not affect nested page table walks?
>
I talked to our documentation folks and they will look to update the APM
with the appropriate information.
Thanks,
Tom