Wrap the body of pmd_populate() in lazy MMU mode. This allows the
set_pmd() to be batched with whatever setup paravirt_alloc_pte() needs
to do. pmd_populate() is always called under the pagetable lock, so
this is safe.

With a corresponding change in the Xen-specific code, this allows
the setup of a new pte page to be done with one hypercall rather than
three. This results in a 3-8% improvement in fork performance,
depending on process size (larger processes see more improvement). I
expect a similar improvement in the time spent touching a previously
untouched part of the address space.

Signed-off-by: Jeremy Fitzhardinge <[email protected]>
diff --git a/arch/x86/include/asm/pgalloc.h b/arch/x86/include/asm/pgalloc.h
index 271de94..15e1153f 100644
--- a/arch/x86/include/asm/pgalloc.h
+++ b/arch/x86/include/asm/pgalloc.h
@@ -71,8 +71,10 @@ static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd,
 {
 	unsigned long pfn = page_to_pfn(pte);
 
+	arch_enter_lazy_mmu_mode();
 	paravirt_alloc_pte(mm, pfn);
 	set_pmd(pmd, __pmd(((pteval_t)pfn << PAGE_SHIFT) | _PAGE_TABLE));
+	arch_leave_lazy_mmu_mode();
 }
 
 #define pmd_pgtable(pmd) pmd_page(pmd)