Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758571AbcDHPQn (ORCPT ); Fri, 8 Apr 2016 11:16:43 -0400 Received: from mail-wm0-f47.google.com ([74.125.82.47]:36964 "EHLO mail-wm0-f47.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753138AbcDHPQm (ORCPT ); Fri, 8 Apr 2016 11:16:42 -0400 Date: Fri, 8 Apr 2016 17:16:53 +0200 From: Christoffer Dall To: Marc Zyngier Cc: Suzuki K Poulose , linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, mark.rutland@arm.com, will.deacon@arm.com, catalin.marinas@arm.com Subject: Re: [PATCH 12/17] kvm-arm: Add explicit hyp page table modifiers Message-ID: <20160408151653.GC8961@cbox> References: <1459787177-12767-1-git-send-email-suzuki.poulose@arm.com> <1459787177-12767-13-git-send-email-suzuki.poulose@arm.com> <20160408131537.GR8961@cbox> <5707C997.3030304@arm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <5707C997.3030304@arm.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 6335 Lines: 189 On Fri, Apr 08, 2016 at 04:09:11PM +0100, Marc Zyngier wrote: > On 08/04/16 14:15, Christoffer Dall wrote: > > On Mon, Apr 04, 2016 at 05:26:12PM +0100, Suzuki K Poulose wrote: > >> We have common routines to modify hyp and stage2 page tables > >> based on the 'kvm' parameter. For a smoother transition to > >> using separate routines for each, duplicate the routines > >> and modify the copy to work on hyp. > >> > >> Marks the forked routines with _hyp_ and gets rid of the > >> kvm parameter which is no longer needed and is NULL for hyp. > >> Also, gets rid of calls to kvm_tlb_flush_by_vmid_ipa() calls > >> from the hyp versions. Uses explicit host page table accessors > >> instead of the kvm_* page table helpers. > >> > >> Suggested-by: Christoffer Dall > >> Cc: Marc Zyngier > >> Signed-off-by: Suzuki K Poulose > >> --- > >> arch/arm/kvm/mmu.c | 123 +++++++++++++++++++++++++++++++++++++++++++++++++--- > >> 1 file changed, 118 insertions(+), 5 deletions(-) > >> > >> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c > >> index b46a337..2b491e5 100644 > >> --- a/arch/arm/kvm/mmu.c > >> +++ b/arch/arm/kvm/mmu.c > >> @@ -388,6 +388,119 @@ static void stage2_flush_vm(struct kvm *kvm) > >> srcu_read_unlock(&kvm->srcu, idx); > >> } > >> > >> +static void clear_hyp_pgd_entry(pgd_t *pgd) > >> +{ > >> + pud_t *pud_table __maybe_unused = pud_offset(pgd, 0UL); > >> + pgd_clear(pgd); > >> + pud_free(NULL, pud_table); > >> + put_page(virt_to_page(pgd)); > >> +} > >> + > >> +static void clear_hyp_pud_entry(pud_t *pud) > >> +{ > >> + pmd_t *pmd_table __maybe_unused = pmd_offset(pud, 0); > >> + VM_BUG_ON(pud_huge(*pud)); > >> + pud_clear(pud); > >> + pmd_free(NULL, pmd_table); > >> + put_page(virt_to_page(pud)); > >> +} > >> + > >> +static void clear_hyp_pmd_entry(pmd_t *pmd) > >> +{ > >> + pte_t *pte_table = pte_offset_kernel(pmd, 0); > >> + VM_BUG_ON(pmd_thp_or_huge(*pmd)); > >> + pmd_clear(pmd); > >> + pte_free_kernel(NULL, pte_table); > >> + put_page(virt_to_page(pmd)); > >> +} > >> + > >> +static void unmap_hyp_ptes(pmd_t *pmd, phys_addr_t addr, phys_addr_t end) > >> +{ > >> + pte_t *pte, *start_pte; > >> + > >> + start_pte = pte = pte_offset_kernel(pmd, addr); > >> + do { > >> + if (!pte_none(*pte)) { > >> + pte_t old_pte = *pte; > >> + > >> + kvm_set_pte(pte, __pte(0)); > >> + > >> + /* XXX: Do we need to invalidate the cache for device mappings ? */ > > > > no, we will not be swapping out any pages mapped in Hyp mode so you can > > get rid of both of the following two lines. > > > >> + if (!kvm_is_device_pfn(pte_pfn(old_pte))) > >> + kvm_flush_dcache_pte(old_pte); > >> + > >> + put_page(virt_to_page(pte)); > >> + } > >> + } while (pte++, addr += PAGE_SIZE, addr != end); > >> + > >> + if (hyp_pte_table_empty(start_pte)) > >> + clear_hyp_pmd_entry(pmd); > >> +} > >> + > >> +static void unmap_hyp_pmds(pud_t *pud, phys_addr_t addr, phys_addr_t end) > >> +{ > >> + phys_addr_t next; > >> + pmd_t *pmd, *start_pmd; > >> + > >> + start_pmd = pmd = pmd_offset(pud, addr); > >> + do { > >> + next = pmd_addr_end(addr, end); > >> + if (!pmd_none(*pmd)) { > >> + if (pmd_thp_or_huge(*pmd)) { > > > > do we ever actually map anything with section mappings in the Hyp > > mappings? > > No, this is purely a page mapping so far. On my system, the HYP text is > just over 4 pages big (4k pages), so the incentive is pretty low, unless > we can demonstrate some big gains due to the reduced TLB impact. > > >> + pmd_t old_pmd = *pmd; > >> + > >> + pmd_clear(pmd); > >> + kvm_flush_dcache_pmd(old_pmd); > >> + put_page(virt_to_page(pmd)); > >> + } else { > >> + unmap_hyp_ptes(pmd, addr, next); > >> + } > >> + } > >> + } while (pmd++, addr = next, addr != end); > >> + > >> + if (hyp_pmd_table_empty(start_pmd)) > >> + clear_hyp_pud_entry(pud); > >> +} > >> + > >> +static void unmap_hyp_puds(pgd_t *pgd, phys_addr_t addr, phys_addr_t end) > >> +{ > >> + phys_addr_t next; > >> + pud_t *pud, *start_pud; > >> + > >> + start_pud = pud = pud_offset(pgd, addr); > >> + do { > >> + next = pud_addr_end(addr, end); > >> + if (!pud_none(*pud)) { > >> + if (pud_huge(*pud)) { > > > > do we ever actually map anything with huge pud > > mappings for the Hyp space? > > Same thing. Looks like there is some potential simplification here. > > > > >> + pud_t old_pud = *pud; > >> + > >> + pud_clear(pud); > >> + kvm_flush_dcache_pud(old_pud); > >> + put_page(virt_to_page(pud)); > >> + } else { > >> + unmap_hyp_pmds(pud, addr, next); > >> + } > >> + } > >> + } while (pud++, addr = next, addr != end); > >> + > >> + if (hyp_pud_table_empty(start_pud)) > >> + clear_hyp_pgd_entry(pgd); > >> +} > >> + > >> +static void unmap_hyp_range(pgd_t *pgdp, phys_addr_t start, u64 size) > >> +{ > >> + pgd_t *pgd; > >> + phys_addr_t addr = start, end = start + size; > >> + phys_addr_t next; > >> + > >> + pgd = pgdp + pgd_index(addr); > >> + do { > >> + next = pgd_addr_end(addr, end); > >> + if (!pgd_none(*pgd)) > >> + unmap_hyp_puds(pgd, addr, next); > >> + } while (pgd++, addr = next, addr != end); > > > > shouldn't we flush the EL2 (hyp) TLB here, strictly speaking? > > > > Or do we rely on all mappings ever created/torn down here to always have > > the same VA/PA relationship? Since we didn't flush the EL2 TLB in the > > existing code, that indeed does seem to be the case. > > Actually, we never unmap anything from HYP. Once a structure (kvm, vcpu) > is mapped there, it stays forever, whatever happens to the VM (that's > because we'd otherwise have to refcount the number of objects in a page, > and I'm lazy...). > > > That, in turn, raises the question why we don't simply map all pages > > that could be referenced by a kmalloc() in Hyp mode during the init > > phase and be done with all this hyp mapping/unmapping stuff? > > > > In any case, that behavior doesn't have to change now, but if we don't > > add a TLB flush here, I'd like a comment to explain why that's not > > needed. > > Hope you have your answer above... ;-) > Not quite: Could we just map the linearly mapped region in Hyp mode from the beginning and be done with all this? Otherwise yes, I have the answer, and we should add a comment too. Thanks, -Christoffer