From: Ingo Molnar
To: linux-kernel@vger.kernel.org
Cc: linux-mml@vger.kernel.org, Andy Lutomirski, Andrew Morton, Denys Vlasenko, Brian Gerst, Peter Zijlstra, Borislav Petkov, "H. Peter Anvin", Linus Torvalds, Oleg Nesterov, Thomas Gleixner, Waiman Long
Subject: [PATCH 06/12] x86/mm: Enable and use the arch_pgd_init_late() method
Date: Thu, 11 Jun 2015 16:07:11 +0200
Message-Id: <1434031637-9091-7-git-send-email-mingo@kernel.org>
In-Reply-To: <1434031637-9091-1-git-send-email-mingo@kernel.org>
References: <1434031637-9091-1-git-send-email-mingo@kernel.org>

Prepare for lockless PGD init: enable the arch_pgd_init_late() callback
and add a 'careful' implementation of PGD init to it: only copy over
non-zero entries.

Since PGD entries only ever get added, this method catches any updates
to swapper_pg_dir[] that might have occurred between early PGD init
and late PGD init.

Note that this only matters for code that does not use the pgd_list but
the task list to find all PGDs in the system.

Subsequent patches will convert pgd_list users to task-list iterations.

[ This adds extra overhead in that we do the PGD initialization for a
  second time - a later patch will simplify this, once we don't
  have old pgd_list users. ]

Cc: Andrew Morton
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Signed-off-by: Ingo Molnar
---
 arch/x86/Kconfig      |  1 +
 arch/x86/mm/pgtable.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 58 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 7e39f9b22705..15c19ce149f0 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -27,6 +27,7 @@ config X86
 	select ARCH_HAS_ELF_RANDOMIZE
 	select ARCH_HAS_FAST_MULTIPLIER
 	select ARCH_HAS_GCOV_PROFILE_ALL
+	select ARCH_HAS_PGD_INIT_LATE
 	select ARCH_HAS_SG_CHAIN
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
 	select ARCH_MIGHT_HAVE_ACPI_PDC if ACPI
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index fb0a9dd1d6e4..e0bf90470d70 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -391,6 +391,63 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
 	return NULL;
 }
 
+/*
+ * Initialize the kernel portion of the PGD.
+ *
+ * This is done separately, because pgd_alloc() happens when
+ * the task is not on the task list yet - and PGD updates
+ * happen by walking the task list.
+ *
+ * No locking is needed here, as we just copy over the reference
+ * PGD. The reference PGD (pgtable_init) is only ever expanded
+ * at the highest, PGD level. Thus any other task extending it
+ * will first update the reference PGD, then modify the task PGDs.
+ */
+void arch_pgd_init_late(struct mm_struct *mm, pgd_t *pgd)
+{
+	/*
+	 * This is called after a new MM has been made visible
+	 * in fork() or exec().
+	 *
+	 * This barrier makes sure the MM is visible to new RCU
+	 * walkers before we initialize it, so that we don't miss
+	 * updates:
+	 */
+	smp_wmb();
+
+	/*
+	 * If the pgd points to a shared pagetable level (either the
+	 * ptes in non-PAE, or shared PMD in PAE), then just copy the
+	 * references from swapper_pg_dir:
+	 */
+	if (CONFIG_PGTABLE_LEVELS == 2 ||
+	    (CONFIG_PGTABLE_LEVELS == 3 && SHARED_KERNEL_PMD) ||
+	    CONFIG_PGTABLE_LEVELS == 4) {
+
+		pgd_t *pgd_src = swapper_pg_dir + KERNEL_PGD_BOUNDARY;
+		pgd_t *pgd_dst = pgd + KERNEL_PGD_BOUNDARY;
+		int i;
+
+		for (i = 0; i < KERNEL_PGD_PTRS; i++, pgd_src++, pgd_dst++) {
+			/*
+			 * This is lock-less, so it can race with PGD updates
+			 * coming from vmalloc() or CPA methods, but it's safe,
+			 * because:
+			 *
+			 * 1) this PGD is not in use yet, we have still not
+			 *    scheduled this task.
+			 * 2) we only ever extend PGD entries
+			 *
+			 * So if we observe a non-zero PGD entry we can copy it,
+			 * it won't change from under us. Parallel updates (new
+			 * allocations) will modify our (already visible) PGD:
+			 */
+			if (pgd_val(*pgd_src))
+				WRITE_ONCE(*pgd_dst, *pgd_src);
+		}
+	}
+}
+
 void pgd_free(struct mm_struct *mm, pgd_t *pgd)
 {
 	pgd_mop_up_pmds(mm, pgd);
-- 
2.1.4
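
The 'careful' copy described in the changelog can be illustrated outside
the kernel. What follows is a minimal userspace sketch in C, not the
patch's code: entry_t, NR_ENTRIES and copy_nonzero_entries() are
hypothetical stand-ins for pgd_t, KERNEL_PGD_PTRS and the loop in
arch_pgd_init_late(). It assumes a single thread and models neither
smp_wmb() nor WRITE_ONCE(); it only shows why copying non-zero entries
twice is harmless when entries are only ever added.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for pgd_t and KERNEL_PGD_PTRS. */
typedef uint64_t entry_t;
#define NR_ENTRIES 8

/*
 * Copy only the populated (non-zero) slots of the reference table
 * into the destination table. Slots that are zero in the reference
 * are skipped, so nothing already present in the destination is
 * cleared. Returns the number of slots copied.
 */
size_t copy_nonzero_entries(entry_t *dst, const entry_t *src, size_t n)
{
	size_t copied = 0;
	size_t i;

	assert(dst != NULL && src != NULL);

	for (i = 0; i < n; i++) {
		if (src[i] != 0) {
			dst[i] = src[i];
			copied++;
		}
	}

	return copied;
}
```

Calling copy_nonzero_entries() once after populating one reference slot,
and again after a second slot has been added to the reference table,
leaves both entries in the destination. A destination slot whose
reference slot is still zero is left untouched by either pass.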