Subject: Re: [PATCH v9 1/8] KVM: arm/arm64: Share common code in user_mem_abort()
From: Suzuki K Poulose
To: Anshuman Khandual, kvmarm@lists.cs.columbia.edu
Cc: marc.zyngier@arm.com, will.deacon@arm.com, linux-kernel@vger.kernel.org,
    Christoffer Dall, punitagrawal@gmail.com,
    linux-arm-kernel@lists.infradead.org
Date: Mon, 3 Dec 2018 13:37:37 +0000
Message-ID: <8fd34e5f-7d75-4de2-3fee-d6d70805685c@arm.com>
References: <20181031175745.18650-1-punit.agrawal@arm.com>
    <20181031175745.18650-2-punit.agrawal@arm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Anshuman,

On 03/12/2018 12:11, Anshuman Khandual wrote:
>
>
> On 10/31/2018 11:27 PM, Punit Agrawal wrote:
>> The code for operations such as marking the pfn as dirty, and
>> dcache/icache maintenance during stage 2 fault handling is duplicated
>> between normal pages and PMD hugepages.
>>
>> Instead of creating another copy of the operations when we introduce
>> PUD hugepages, let's share them across the different pagesizes.
>>
>> Signed-off-by: Punit Agrawal
>> Reviewed-by: Suzuki K Poulose
>> Cc: Christoffer Dall
>> Cc: Marc Zyngier
>> ---
>>  virt/kvm/arm/mmu.c | 49 ++++++++++++++++++++++++++++------------------
>>  1 file changed, 30 insertions(+), 19 deletions(-)
>>
>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>> index 5eca48bdb1a6..59595207c5e1 100644
>> --- a/virt/kvm/arm/mmu.c
>> +++ b/virt/kvm/arm/mmu.c
>> @@ -1475,7 +1475,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  			  unsigned long fault_status)
>>  {
>>  	int ret;
>> -	bool write_fault, exec_fault, writable, hugetlb = false, force_pte = false;
>> +	bool write_fault, exec_fault, writable, force_pte = false;
>>  	unsigned long mmu_seq;
>>  	gfn_t gfn = fault_ipa >> PAGE_SHIFT;
>>  	struct kvm *kvm = vcpu->kvm;
>> @@ -1484,7 +1484,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  	kvm_pfn_t pfn;
>>  	pgprot_t mem_type = PAGE_S2;
>>  	bool logging_active = memslot_is_logging(memslot);
>> -	unsigned long flags = 0;
>> +	unsigned long vma_pagesize, flags = 0;
>
> A small nit: s/vma_pagesize/pagesize. Why call it VMA? It's implicit.

Maybe we could call it mapsize. pagesize is confusing.
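To make the naming question concrete: after this patch the variable holds the size used for the stage 2 mapping, which can differ from vma_kernel_pagesize(vma). Below is a rough userspace C model of that selection, not kernel code. stage2_map_size() is a hypothetical helper, vma_size stands in for vma_kernel_pagesize(vma), logging_active for memslot_is_logging(memslot), and the 4K/2M sizes assume a 4K-granule arm64 configuration. The later THP upgrade is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed 4K-granule arm64 sizes: a PMD block maps 2M. */
#define PAGE_SIZE (1UL << 12)
#define PMD_SIZE  (1UL << 21)

/*
 * Hypothetical userspace model of the size selection in user_mem_abort()
 * after this patch. vma_size stands in for vma_kernel_pagesize(vma) and
 * logging_active for memslot_is_logging(memslot).
 */
static unsigned long stage2_map_size(unsigned long vma_size, bool logging_active)
{
	/* Starts out as the VMA's page size, as in the patch. */
	unsigned long map_size = vma_size;

	/*
	 * Anything other than a PMD-sized VMA with logging off falls back
	 * to PTEs. That includes a PMD-sized VMA while dirty logging is
	 * active, which is why the reset to PAGE_SIZE is not redundant.
	 */
	if (map_size != PMD_SIZE || logging_active)
		map_size = PAGE_SIZE;

	return map_size;
}
```

For example, stage2_map_size(PMD_SIZE, true) gives PAGE_SIZE: a 2M hugetlbfs VMA is still mapped with PTEs while dirty logging is on, which is the logging_active case discussed further down.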
>
>>
>>  	write_fault = kvm_is_write_fault(vcpu);
>>  	exec_fault = kvm_vcpu_trap_is_iabt(vcpu);
>> @@ -1504,10 +1504,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  		return -EFAULT;
>>  	}
>>
>> -	if (vma_kernel_pagesize(vma) == PMD_SIZE && !logging_active) {
>> -		hugetlb = true;
>> +	vma_pagesize = vma_kernel_pagesize(vma);
>> +	if (vma_pagesize == PMD_SIZE && !logging_active) {
>>  		gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
>>  	} else {
>> +		/*
>> +		 * Fallback to PTE if it's not one of the Stage 2
>> +		 * supported hugepage sizes
>> +		 */
>> +		vma_pagesize = PAGE_SIZE;
>
> This seems redundant and should be dropped. vma_kernel_pagesize() here either
> calls hugetlb_vm_op_pagesize (via hugetlb_vm_ops->pagesize) or simply returns
> PAGE_SIZE. The vm_ops path is taken if the QEMU VMA covering any given HVA is
> backed either by HugeTLB pages or simply normal pages. vma_pagesize would
> either have a value of PMD_SIZE (HugeTLB hstate based) or PAGE_SIZE. Hence if
> it's not PMD_SIZE it must be PAGE_SIZE, which should not be assigned again.

We may want to force PTE mappings when logging_active is set (e.g., during
migration), so that dirty pages are tracked at page granularity rather than
at hugepage granularity. In that case vma_kernel_pagesize() can return
PMD_SIZE and we still take the else branch, so vma_pagesize must be reset
to PAGE_SIZE here. So the check is still valid.

>
>> +
>>  		/*
>>  		 * Pages belonging to memslots that don't have the same
>>  		 * alignment for userspace and IPA cannot be mapped using
>> @@ -1573,23 +1579,33 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  	if (mmu_notifier_retry(kvm, mmu_seq))
>>  		goto out_unlock;
>>
>> -	if (!hugetlb && !force_pte)
>> -		hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
>> +	if (vma_pagesize == PAGE_SIZE && !force_pte) {
>> +		/*
>> +		 * Only PMD_SIZE transparent hugepages(THP) are
>> +		 * currently supported. This code will need to be
>> +		 * updated to support other THP sizes.
>> +		 */
>
> This comment belongs in transparent_hugepage_adjust(), not here.
I think this is more relevant here than in thp_adjust, unless we rename the
function below to something generic, e.g. handle_hugepage_adjust().

>> +		if (transparent_hugepage_adjust(&pfn, &fault_ipa))
>> +			vma_pagesize = PMD_SIZE;
>
> IIUC transparent_hugepage_adjust() is only getting called here. Instead of
> returning 'true' when it is able to detect a huge page backing and doing
> an adjustment thereafter, it should rather return the THP size (PMD_SIZE)
> to accommodate probable multi-size THP support in future.

That makes sense.

>
>> +	}
>> +
>> +	if (writable)
>> +		kvm_set_pfn_dirty(pfn);
>>
>> -	if (hugetlb) {
>> +	if (fault_status != FSC_PERM)
>> +		clean_dcache_guest_page(pfn, vma_pagesize);
>> +
>> +	if (exec_fault)
>> +		invalidate_icache_guest_page(pfn, vma_pagesize);
>> +
>> +	if (vma_pagesize == PMD_SIZE) {
>>  		pmd_t new_pmd = pfn_pmd(pfn, mem_type);
>>  		new_pmd = pmd_mkhuge(new_pmd);
>> -		if (writable) {
>> +		if (writable)
>>  			new_pmd = kvm_s2pmd_mkwrite(new_pmd);
>> -			kvm_set_pfn_dirty(pfn);
>> -		}
>> -
>> -		if (fault_status != FSC_PERM)
>> -			clean_dcache_guest_page(pfn, PMD_SIZE);
>>
>>  		if (exec_fault) {
>>  			new_pmd = kvm_s2pmd_mkexec(new_pmd);
>> -			invalidate_icache_guest_page(pfn, PMD_SIZE);
>>  		} else if (fault_status == FSC_PERM) {
>>  			/* Preserve execute if XN was already cleared */
>>  			if (stage2_is_exec(kvm, fault_ipa))
>> @@ -1602,16 +1618,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>
>>  		if (writable) {
>>  			new_pte = kvm_s2pte_mkwrite(new_pte);
>> -			kvm_set_pfn_dirty(pfn);
>>  			mark_page_dirty(kvm, gfn);
>>  		}
>>
>> -		if (fault_status != FSC_PERM)
>> -			clean_dcache_guest_page(pfn, PAGE_SIZE);
>> -
>>  		if (exec_fault) {
>>  			new_pte = kvm_s2pte_mkexec(new_pte);
>> -			invalidate_icache_guest_page(pfn, PAGE_SIZE);
>>  		} else if (fault_status == FSC_PERM) {
>>  			/* Preserve execute if XN was already cleared */
>>  			if (stage2_is_exec(kvm, fault_ipa))
>>
>
> kvm_set_pfn_dirty, clean_dcache_guest_page, invalidate_icache_guest_page
> can all be safely moved before setting the page table entries either as
> PMD or PTE.

I think this is what the patch already does, so I assume this is fine.

Cheers
Suzuki
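For reference, the review suggestion that transparent_hugepage_adjust() return the mapping size, together with the consolidated dirtying and cache maintenance, can be sketched as a userspace C model. It is illustrative only: thp_adjust_size(), handle_fault() and both structs are hypothetical, thp_backed stands in for whether a THP backs the faulting address, force_pte mirrors the existing flag, and the trace fields record the sizes that would be passed to clean_dcache_guest_page() and invalidate_icache_guest_page(). The 4K/2M sizes assume a 4K-granule arm64 configuration.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed 4K-granule arm64 sizes: a PMD block maps 2M. */
#define PAGE_SIZE (1UL << 12)
#define PMD_SIZE  (1UL << 21)

/* Hypothetical inputs describing one stage 2 fault. */
struct fault_info {
	bool force_pte;   /* PTE mappings forced, e.g. while dirty logging */
	bool thp_backed;  /* a transparent hugepage backs the faulting HVA */
	bool writable;
	bool perm_fault;  /* models fault_status == FSC_PERM */
	bool exec_fault;
};

/* Hypothetical record of what the handler would have done. */
struct fault_trace {
	bool pfn_dirtied;                 /* kvm_set_pfn_dirty() would be called */
	unsigned long dcache_cleaned;     /* size for the dcache clean, 0 if none */
	unsigned long icache_invalidated; /* size for the icache invalidate, 0 if none */
	bool mapped_as_pmd;               /* PMD block rather than a PTE */
};

/*
 * Hypothetical stand-in for transparent_hugepage_adjust() as suggested in
 * review: report the size to map instead of returning a bool.
 */
static unsigned long thp_adjust_size(const struct fault_info *f)
{
	return f->thp_backed ? PMD_SIZE : PAGE_SIZE;
}

static struct fault_trace handle_fault(unsigned long map_size,
				       const struct fault_info *f)
{
	struct fault_trace t = { 0 };

	if (map_size == PAGE_SIZE && !f->force_pte)
		map_size = thp_adjust_size(f);

	/* Shared maintenance: done once, parameterised only by map_size. */
	if (f->writable)
		t.pfn_dirtied = true;
	if (!f->perm_fault)
		t.dcache_cleaned = map_size;
	if (f->exec_fault)
		t.icache_invalidated = map_size;

	/* Only the choice of entry still depends on the mapping level. */
	t.mapped_as_pmd = (map_size == PMD_SIZE);
	return t;
}
```

The point the model captures is that the three maintenance steps depend only on map_size, so they can run once before the PMD/PTE split; only building the entry itself (and mark_page_dirty() in the PTE path, not modelled here) still differs per level.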