Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751471AbdILQsJ (ORCPT ); Tue, 12 Sep 2017 12:48:09 -0400 Received: from mail-pf0-f174.google.com ([209.85.192.174]:33776 "EHLO mail-pf0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751405AbdILQsF (ORCPT ); Tue, 12 Sep 2017 12:48:05 -0400 X-Google-Smtp-Source: ADKCNb6AUszza+qi7Dv5mFlJjX3AJsTkMfxpHeoOCJQgZx5c8+GO0g7rcc7HC8qNofa8E6zqOHE8E7Og2YRAtbXL8zM= MIME-Version: 1.0 In-Reply-To: References: <1503590171-41030-1-git-send-email-pbonzini@redhat.com> From: Peter Feiner Date: Tue, 12 Sep 2017 09:48:03 -0700 Message-ID: Subject: Re: [PATCH] KVM: MMU: speedup update_permission_bitmask To: Jim Mattson Cc: Paolo Bonzini , LKML , kvm list , David Hildenbrand Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 10002 Lines: 204 On Mon, Aug 28, 2017 at 12:42 PM, Jim Mattson wrote: > > Looks okay to me, but I'm hoping Peter will chime in. Sorry, this slipped by. Busy couple of weeks! > > > Reviewed-by: Jim Mattson > > On Thu, Aug 24, 2017 at 8:56 AM, Paolo Bonzini wrote: > > update_permission_bitmask currently does a 128-iteration loop to, > > essentially, compute a constant array. Computing the 8 bits in parallel > > reduces it to 16 iterations, and is enough to speed it up substantially > > because many boolean operations in the inner loop become constants or > > simplify noticeably. > > > > Because update_permission_bitmask is actually the top item in the profile > > for nested vmexits, this speeds up an L2->L1 vmexit by about ten thousand > > clock cycles, or up to 30%: This is a great improvement! Why not take it a step further and compute the whole table once at module init time and be done with it? There are only 5 extra input bits (nx, ept, smep, smap, wp), so the whole table would only take up (1 << 5) * 16 = 512 bytes. Moreover, if you had 32 VMs on the host, you'd actually save memory! > > > > before after > > cpuid 35173 25954 > > vmcall 35122 27079 > > inl_from_pmtimer 52635 42675 > > inl_from_qemu 53604 44599 > > inl_from_kernel 38498 30798 > > outl_to_kernel 34508 28816 > > wr_tsc_adjust_msr 34185 26818 > > rd_tsc_adjust_msr 37409 27049 > > mmio-no-eventfd:pci-mem 50563 45276 > > mmio-wildcard-eventfd:pci-mem 34495 30823 > > mmio-datamatch-eventfd:pci-mem 35612 31071 > > portio-no-eventfd:pci-io 44925 40661 > > portio-wildcard-eventfd:pci-io 29708 27269 > > portio-datamatch-eventfd:pci-io 31135 27164 > > > > (I wrote a small C program to compare the tables for all values of CR0.WP, > > CR4.SMAP and CR4.SMEP, and they match). > > > > Signed-off-by: Paolo Bonzini > > --- > > arch/x86/kvm/mmu.c | 121 +++++++++++++++++++++++++++++++---------------------- > > 1 file changed, 70 insertions(+), 51 deletions(-) > > > > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c > > index f47cccace1a1..2a8a6e3e2a31 100644 > > --- a/arch/x86/kvm/mmu.c > > +++ b/arch/x86/kvm/mmu.c > > @@ -4204,66 +4204,85 @@ static inline bool boot_cpu_is_amd(void) > > boot_cpu_data.x86_phys_bits, execonly); > > } > > > > +#define BYTE_MASK(access) \ > > + ((1 & (access) ? 2 : 0) | \ > > + (2 & (access) ? 4 : 0) | \ > > + (3 & (access) ? 8 : 0) | \ > > + (4 & (access) ? 16 : 0) | \ > > + (5 & (access) ? 32 : 0) | \ > > + (6 & (access) ? 64 : 0) | \ > > + (7 & (access) ? 128 : 0)) > > + > > + > > static void update_permission_bitmask(struct kvm_vcpu *vcpu, > > struct kvm_mmu *mmu, bool ept) > > { > > - unsigned bit, byte, pfec; > > - u8 map; > > - bool fault, x, w, u, wf, uf, ff, smapf, cr4_smap, cr4_smep, smap = 0; > > + unsigned byte; > > + > > + const u8 x = BYTE_MASK(ACC_EXEC_MASK); > > + const u8 w = BYTE_MASK(ACC_WRITE_MASK); > > + const u8 u = BYTE_MASK(ACC_USER_MASK); > > + > > + bool cr4_smep = kvm_read_cr4_bits(vcpu, X86_CR4_SMEP) != 0; > > + bool cr4_smap = kvm_read_cr4_bits(vcpu, X86_CR4_SMAP) != 0; > > + bool cr0_wp = is_write_protection(vcpu); > > > > - cr4_smep = kvm_read_cr4_bits(vcpu, X86_CR4_SMEP); > > - cr4_smap = kvm_read_cr4_bits(vcpu, X86_CR4_SMAP); > > for (byte = 0; byte < ARRAY_SIZE(mmu->permissions); ++byte) { > > - pfec = byte << 1; > > - map = 0; > > - wf = pfec & PFERR_WRITE_MASK; > > - uf = pfec & PFERR_USER_MASK; > > - ff = pfec & PFERR_FETCH_MASK; > > + unsigned pfec = byte << 1; > > + > > /* > > - * PFERR_RSVD_MASK bit is set in PFEC if the access is not > > - * subject to SMAP restrictions, and cleared otherwise. The > > - * bit is only meaningful if the SMAP bit is set in CR4. > > + * Each "*f" variable has a 1 bit for each UWX value > > + * that causes a fault with the given PFEC. > > */ > > - smapf = !(pfec & PFERR_RSVD_MASK); > > - for (bit = 0; bit < 8; ++bit) { > > - x = bit & ACC_EXEC_MASK; > > - w = bit & ACC_WRITE_MASK; > > - u = bit & ACC_USER_MASK; > > - > > - if (!ept) { > > - /* Not really needed: !nx will cause pte.nx to fault */ > > - x |= !mmu->nx; > > - /* Allow supervisor writes if !cr0.wp */ > > - w |= !is_write_protection(vcpu) && !uf; > > - /* Disallow supervisor fetches of user code if cr4.smep */ > > - x &= !(cr4_smep && u && !uf); > > - > > - /* > > - * SMAP:kernel-mode data accesses from user-mode > > - * mappings should fault. A fault is considered > > - * as a SMAP violation if all of the following > > - * conditions are ture: > > - * - X86_CR4_SMAP is set in CR4 > > - * - A user page is accessed > > - * - Page fault in kernel mode > > - * - if CPL = 3 or X86_EFLAGS_AC is clear > > - * > > - * Here, we cover the first three conditions. > > - * The fourth is computed dynamically in > > - * permission_fault() and is in smapf. > > - * > > - * Also, SMAP does not affect instruction > > - * fetches, add the !ff check here to make it > > - * clearer. > > - */ > > - smap = cr4_smap && u && !uf && !ff; > > - } > > > > - fault = (ff && !x) || (uf && !u) || (wf && !w) || > > - (smapf && smap); > > - map |= fault << bit; > > + /* Faults from writes to non-writable pages */ > > + u8 wf = (pfec & PFERR_WRITE_MASK) ? ~w : 0; > > + /* Faults from user mode accesses to supervisor pages */ > > + u8 uf = (pfec & PFERR_USER_MASK) ? ~u : 0; Aside: In the EPT case, I wondered why it's necessary to construct any of the mmu->permissions where PFERR_USER_MASK is set. EPT doesn't have a 'user' bit. Digging through the code, I see that vmx.c sets PFERR_USER_MASK for read violations, which normal x86 page tables don't have. I'm not sure to what effect, but it's used for something :-) > > + /* Faults from fetches of non-executable pages*/ > > + u8 ff = (pfec & PFERR_FETCH_MASK) ? ~x : 0; > > + /* Faults from kernel mode fetches of user pages */ > > + u8 smepf = 0; > > + /* Faults from kernel mode accesses of user pages */ > > + u8 smapf = 0; > > + > > + if (!ept) { > > + /* Faults from kernel mode accesses to user pages */ > > + u8 kf = (pfec & PFERR_USER_MASK) ? 0 : u; > > + > > + /* Not really needed: !nx will cause pte.nx to fault */ > > + if (!mmu->nx) > > + ff = 0; > > + > > + /* Allow supervisor writes if !cr0.wp */ > > + if (!cr0_wp) > > + wf = (pfec & PFERR_USER_MASK) ? wf : 0; > > + > > + /* Disallow supervisor fetches of user code if cr4.smep */ > > + if (cr4_smep) > > + smepf = (pfec & PFERR_FETCH_MASK) ? kf : 0; > > + > > + /* > > + * SMAP:kernel-mode data accesses from user-mode > > + * mappings should fault. A fault is considered > > + * as a SMAP violation if all of the following > > + * conditions are ture: > > + * - X86_CR4_SMAP is set in CR4 > > + * - A user page is accessed > > + * - The access is not a fetch > > + * - Page fault in kernel mode > > + * - if CPL = 3 or X86_EFLAGS_AC is clear > > + * > > + * Here, we cover the first three conditions. > > + * The fourth is computed dynamically in permission_fault(); > > + * PFERR_RSVD_MASK bit will be set in PFEC if the access is > > + * *not* subject to SMAP restrictions. > > + */ > > + if (cr4_smap) > > + smapf = (pfec & (PFERR_RSVD_MASK|PFERR_FETCH_MASK)) ? 0 : kf; > > } > > - mmu->permissions[byte] = map; > > + > > + mmu->permissions[byte] = ff | uf | wf | smepf | smapf; > > } > > } > > > > -- > > 1.8.3.1 > >