From: Andy Lutomirski
Date: Wed, 26 Apr 2017 13:52:29 -0700
Subject: xen_exit_mmap() questions
To: "xen-devel@lists.xenproject.org", Boris Ostrovsky, Juergen Gross
Cc: X86 ML, Borislav Petkov, "linux-kernel@vger.kernel.org"

I was trying to understand xen_drop_mm_ref() to update it for some
changes I'm working on, and I'm wondering whether we need
xen_exit_mmap() at all.

AFAICS the intent is to force all CPUs to drop their lazy uses of the
mm being destroyed so it can be unpinned before tearing down the page
tables, thus making it faster to tear down the page tables. This
seems like it'll speed up xen_set_pud() and xen_set_pmd(), but this
seems like it may be of rather limited value. Could we get away with
deleting it?

Also, this code in drop_other_mm_ref() looks dubious to me:

        /* If this cpu still has a stale cr3 reference, then make sure
           it has been flushed. */
        if (this_cpu_read(xen_current_cr3) == __pa(mm->pgd))
                load_cr3(swapper_pg_dir);

If cr3 hasn't been flushed to the hypervisor because we're in a lazy
mode, why would load_cr3() help? Shouldn't this be xen_mc_flush()
instead?
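To make the lazy-mode question concrete, here is a toy userspace model of batched cr3 writes. It is only a sketch: every toy_* name and the TOY_* constants are made up for illustration, it is not the Xen multicall code, and it assumes that a cr3 write issued in lazy mode is merely queued until the batch is flushed.

```c
/* Toy model of batched ("lazy") cr3 writes. NOT the real Xen code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SWAPPER_PGD 0x1000UL  /* hypothetical stand-in for swapper_pg_dir */
#define TOY_BATCH_MAX   8         /* hypothetical batch capacity */

struct toy_cpu {
        bool lazy;                /* lazy mode: queue instead of issuing */
        unsigned long hyp_cr3;    /* cr3 the pretend hypervisor has seen */
        unsigned long pending[TOY_BATCH_MAX];
        size_t npending;
};

/* Issue every queued cr3 write to the pretend hypervisor, in order. */
static void toy_mc_flush(struct toy_cpu *cpu)
{
        for (size_t i = 0; i < cpu->npending; i++)
                cpu->hyp_cr3 = cpu->pending[i];
        cpu->npending = 0;
}

/* Queue a cr3 write; flush right away only when not in lazy mode. */
static void toy_load_cr3(struct toy_cpu *cpu, unsigned long cr3)
{
        if (cpu->npending == TOY_BATCH_MAX)
                toy_mc_flush(cpu);      /* batch full: forced flush */
        assert(cpu->npending < TOY_BATCH_MAX);
        cpu->pending[cpu->npending++] = cr3;
        if (!cpu->lazy)
                toy_mc_flush(cpu);
}
```

Under that assumption, calling toy_load_cr3() on a CPU with lazy set leaves hyp_cr3 unchanged, and only toy_mc_flush() makes the queued write visible.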
Anyway, the whitespace-damaged patch below seems to result in a
fully-functional kernel:

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 37cb5aad71de..e4e073844cbf 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -998,101 +998,6 @@ static void xen_dup_mmap(struct mm_struct *oldmm, struct mm_struct *mm)
         spin_unlock(&mm->page_table_lock);
 }

-
-#ifdef CONFIG_SMP
-/* Another cpu may still have their %cr3 pointing at the pagetable, so
-   we need to repoint it somewhere else before we can unpin it. */
-static void drop_other_mm_ref(void *info)
-{
-        struct mm_struct *mm = info;
-        struct mm_struct *active_mm;
-
-        active_mm = this_cpu_read(cpu_tlbstate.active_mm);
-
-        if (active_mm == mm && this_cpu_read(cpu_tlbstate.state) != TLBSTATE_OK)
-                leave_mm(smp_processor_id());
-
-        /* If this cpu still has a stale cr3 reference, then make sure
-           it has been flushed. */
-        if (this_cpu_read(xen_current_cr3) == __pa(mm->pgd))
-                load_cr3(swapper_pg_dir);
-}
-
-static void xen_drop_mm_ref(struct mm_struct *mm)
-{
-        cpumask_var_t mask;
-        unsigned cpu;
-
-        if (current->active_mm == mm) {
-                if (current->mm == mm)
-                        load_cr3(swapper_pg_dir);
-                else
-                        leave_mm(smp_processor_id());
-        }
-
-        /* Get the "official" set of cpus referring to our pagetable. */
-        if (!alloc_cpumask_var(&mask, GFP_ATOMIC)) {
-                for_each_online_cpu(cpu) {
-                        if (!cpumask_test_cpu(cpu, mm_cpumask(mm))
-                            && per_cpu(xen_current_cr3, cpu) != __pa(mm->pgd))
-                                continue;
-                        smp_call_function_single(cpu, drop_other_mm_ref, mm, 1);
-                }
-                return;
-        }
-        cpumask_copy(mask, mm_cpumask(mm));
-
-        /* It's possible that a vcpu may have a stale reference to our
-           cr3, because its in lazy mode, and it hasn't yet flushed
-           its set of pending hypercalls yet. In this case, we can
-           look at its actual current cr3 value, and force it to flush
-           if needed. */
-        for_each_online_cpu(cpu) {
-                if (per_cpu(xen_current_cr3, cpu) == __pa(mm->pgd))
-                        cpumask_set_cpu(cpu, mask);
-        }
-
-        if (!cpumask_empty(mask))
-                smp_call_function_many(mask, drop_other_mm_ref, mm, 1);
-        free_cpumask_var(mask);
-}
-#else
-static void xen_drop_mm_ref(struct mm_struct *mm)
-{
-        if (current->active_mm == mm)
-                load_cr3(swapper_pg_dir);
-}
-#endif
-
-/*
- * While a process runs, Xen pins its pagetables, which means that the
- * hypervisor forces it to be read-only, and it controls all updates
- * to it. This means that all pagetable updates have to go via the
- * hypervisor, which is moderately expensive.
- *
- * Since we're pulling the pagetable down, we switch to use init_mm,
- * unpin old process pagetable and mark it all read-write, which
- * allows further operations on it to be simple memory accesses.
- *
- * The only subtle point is that another CPU may be still using the
- * pagetable because of lazy tlb flushing. This means we need need to
- * switch all CPUs off this pagetable before we can unpin it.
- */
-static void xen_exit_mmap(struct mm_struct *mm)
-{
-        get_cpu();              /* make sure we don't move around */
-        xen_drop_mm_ref(mm);
-        put_cpu();
-
-        spin_lock(&mm->page_table_lock);
-
-        /* pgd may not be pinned in the error exit path of execve */
-        if (xen_page_pinned(mm->pgd))
-                xen_pgd_unpin(mm);
-
-        spin_unlock(&mm->page_table_lock);
-}
-
 static void xen_post_allocator_init(void);

 static void __init pin_pagetable_pfn(unsigned cmd, unsigned long pfn)
@@ -1544,6 +1449,8 @@ static int xen_pgd_alloc(struct mm_struct *mm)

 static void xen_pgd_free(struct mm_struct *mm, pgd_t *pgd)
 {
+        xen_pgd_unpin(mm);
+
 #ifdef CONFIG_X86_64
         pgd_t *user_pgd = xen_get_user_pgd(pgd);

@@ -2465,7 +2372,6 @@ static const struct pv_mmu_ops xen_mmu_ops __initconst = {

         .activate_mm = xen_activate_mm,
         .dup_mmap = xen_dup_mmap,
-        .exit_mmap = xen_exit_mmap,

         .lazy_mode = {
                 .enter = paravirt_enter_lazy_mmu,