2018-04-20 14:56:45

by Punit Agrawal

[permalink] [raw]
Subject: [PATCH 0/4] KVM: Support PUD hugepages at stage 2

Hi,

This patchset adds support for PUD hugepages at stage 2. This feature
is useful on cores that have support for large sized TLB mappings
(e.g., 1GB for 4K granule). Previous posting[0].

Support is added to code that is shared between arm and arm64. Dummy
helpers for arm are provided as the port does not support PUD hugepage
sizes.

There is a small conflict with the series to add support for 52 bit
IPA[1]. The patches have been functionally tested on an A57 based
system. The patchset is based on v4.17-rc1.

Thanks,
Punit

[0] https://www.spinics.net/lists/arm-kernel/msg628053.html
[1] https://lwn.net/Articles/750176/

Punit Agrawal (4):
KVM: arm/arm64: Share common code in user_mem_abort()
KVM: arm/arm64: Introduce helpers to manupulate page table entries
KVM: arm64: Support dirty page tracking for PUD hugepages
KVM: arm64: Add support for PUD hugepages at stage 2

arch/arm/include/asm/kvm_mmu.h | 40 ++++++++++++
arch/arm64/include/asm/kvm_mmu.h | 30 +++++++++
arch/arm64/include/asm/pgtable-hwdef.h | 4 ++
arch/arm64/include/asm/pgtable.h | 2 +
virt/kvm/arm/mmu.c | 84 +++++++++++++++++++-------
5 files changed, 139 insertions(+), 21 deletions(-)

--
2.17.0



2018-04-20 14:57:17

by Punit Agrawal

[permalink] [raw]
Subject: [PATCH 2/4] KVM: arm/arm64: Introduce helpers to manupulate page table entries

Introduce helpers to abstract architectural handling of the conversion
of pfn to page table entries and marking a PMD page table entry as a
block entry.

The helpers are introduced in preparation for supporting PUD hugepages
at stage 2 - which are supported on arm64 but do not exist on arm.

Signed-off-by: Punit Agrawal <[email protected]>
Cc: Christoffer Dall <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Russell King <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
---
arch/arm/include/asm/kvm_mmu.h | 5 +++++
arch/arm64/include/asm/kvm_mmu.h | 5 +++++
virt/kvm/arm/mmu.c | 7 ++++---
3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 707a1f06dc5d..5907a81ad5c1 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -75,6 +75,11 @@ phys_addr_t kvm_get_idmap_vector(void);
int kvm_mmu_init(void);
void kvm_clear_hyp_idmap(void);

+#define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot)
+#define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot)
+
+#define kvm_pmd_mkhuge(pmd) pmd_mkhuge(pmd)
+
static inline void kvm_set_pmd(pmd_t *pmd, pmd_t new_pmd)
{
*pmd = new_pmd;
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 082110993647..d962508ce4b3 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -173,6 +173,11 @@ void kvm_clear_hyp_idmap(void);
#define kvm_set_pte(ptep, pte) set_pte(ptep, pte)
#define kvm_set_pmd(pmdp, pmd) set_pmd(pmdp, pmd)

+#define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot)
+#define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot)
+
+#define kvm_pmd_mkhuge(pmd) pmd_mkhuge(pmd)
+
static inline pte_t kvm_s2pte_mkwrite(pte_t pte)
{
pte_val(pte) |= PTE_S2_RDWR;
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index db382c7c7cd7..f72ae7a6dea0 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1538,8 +1538,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
invalidate_icache_guest_page(pfn, vma_pagesize);

if (hugetlb) {
- pmd_t new_pmd = pfn_pmd(pfn, mem_type);
- new_pmd = pmd_mkhuge(new_pmd);
+ pmd_t new_pmd = kvm_pfn_pmd(pfn, mem_type);
+
+ new_pmd = kvm_pmd_mkhuge(new_pmd);
if (writable)
new_pmd = kvm_s2pmd_mkwrite(new_pmd);

@@ -1553,7 +1554,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,

ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd);
} else {
- pte_t new_pte = pfn_pte(pfn, mem_type);
+ pte_t new_pte = kvm_pfn_pte(pfn, mem_type);

if (writable) {
new_pte = kvm_s2pte_mkwrite(new_pte);
--
2.17.0


2018-04-20 14:57:25

by Punit Agrawal

[permalink] [raw]
Subject: [PATCH 3/4] KVM: arm64: Support dirty page tracking for PUD hugepages

In preparation for creating PUD hugepages at stage 2, add support for
write protecting PUD hugepages when they are encountered. Write
protecting guest tables is used to track dirty pages when migrating
VMs.

Also, provide trivial implementations of required kvm_s2pud_* helpers
to allow sharing of code with arm32.

Signed-off-by: Punit Agrawal <[email protected]>
Cc: Christoffer Dall <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Russell King <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
---
arch/arm/include/asm/kvm_mmu.h | 16 ++++++++++++++++
arch/arm64/include/asm/kvm_mmu.h | 10 ++++++++++
virt/kvm/arm/mmu.c | 9 ++++++---
3 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 5907a81ad5c1..224c22c0a69c 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -80,6 +80,22 @@ void kvm_clear_hyp_idmap(void);

#define kvm_pmd_mkhuge(pmd) pmd_mkhuge(pmd)

+/*
+ * The following kvm_*pud*() functionas are provided strictly to allow
+ * sharing code with arm64. They should never be called in practice.
+ */
+static inline void kvm_set_s2pud_readonly(pud_t *pud)
+{
+ BUG();
+}
+
+static inline bool kvm_s2pud_readonly(pud_t *pud)
+{
+ BUG();
+ return false;
+}
+
+
static inline void kvm_set_pmd(pmd_t *pmd, pmd_t new_pmd)
{
*pmd = new_pmd;
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index d962508ce4b3..f440cf216a23 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -240,6 +240,16 @@ static inline bool kvm_s2pmd_exec(pmd_t *pmdp)
return !(READ_ONCE(pmd_val(*pmdp)) & PMD_S2_XN);
}

+static inline void kvm_set_s2pud_readonly(pud_t *pudp)
+{
+ kvm_set_s2pte_readonly((pte_t *)pudp);
+}
+
+static inline bool kvm_s2pud_readonly(pud_t *pudp)
+{
+ return kvm_s2pte_readonly((pte_t *)pudp);
+}
+
static inline bool kvm_page_empty(void *ptr)
{
struct page *ptr_page = virt_to_page(ptr);
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index f72ae7a6dea0..5f53909da90e 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1286,9 +1286,12 @@ static void stage2_wp_puds(pgd_t *pgd, phys_addr_t addr, phys_addr_t end)
do {
next = stage2_pud_addr_end(addr, end);
if (!stage2_pud_none(*pud)) {
- /* TODO:PUD not supported, revisit later if supported */
- BUG_ON(stage2_pud_huge(*pud));
- stage2_wp_pmds(pud, addr, next);
+ if (stage2_pud_huge(*pud)) {
+ if (!kvm_s2pud_readonly(pud))
+ kvm_set_s2pud_readonly(pud);
+ } else {
+ stage2_wp_pmds(pud, addr, next);
+ }
}
} while (pud++, addr = next, addr != end);
}
--
2.17.0


2018-04-20 14:58:14

by Punit Agrawal

[permalink] [raw]
Subject: [PATCH 1/4] KVM: arm/arm64: Share common code in user_mem_abort()

The code for operations such as marking the pfn as dirty, and
dcache/icache maintenance during stage 2 fault handling is duplicated
between normal pages and PMD hugepages.

Instead of creating another copy of the operations when we introduce
PUD hugepages, let's share them across the different pagesizes.

Signed-off-by: Punit Agrawal <[email protected]>
Cc: Christoffer Dall <[email protected]>
Cc: Marc Zyngier <[email protected]>
---
virt/kvm/arm/mmu.c | 36 +++++++++++++++++++++---------------
1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 7f6a944db23d..db382c7c7cd7 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1428,7 +1428,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
kvm_pfn_t pfn;
pgprot_t mem_type = PAGE_S2;
bool logging_active = memslot_is_logging(memslot);
- unsigned long flags = 0;
+ unsigned long vma_pagesize, flags = 0;

write_fault = kvm_is_write_fault(vcpu);
exec_fault = kvm_vcpu_trap_is_iabt(vcpu);
@@ -1448,7 +1448,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
return -EFAULT;
}

- if (vma_kernel_pagesize(vma) == PMD_SIZE && !logging_active) {
+ vma_pagesize = vma_kernel_pagesize(vma);
+ if (vma_pagesize == PMD_SIZE && !logging_active) {
hugetlb = true;
gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
} else {
@@ -1517,23 +1518,33 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
if (mmu_notifier_retry(kvm, mmu_seq))
goto out_unlock;

- if (!hugetlb && !force_pte)
+ if (!hugetlb && !force_pte) {
+ /*
+ * Only PMD_SIZE transparent hugepages(THP) are
+ * currently supported. This code will need to be
+ * updated if other THP sizes are supported.
+ */
hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
+ vma_pagesize = PMD_SIZE;
+ }
+
+ if (writable)
+ kvm_set_pfn_dirty(pfn);
+
+ if (fault_status != FSC_PERM)
+ clean_dcache_guest_page(pfn, vma_pagesize);
+
+ if (exec_fault)
+ invalidate_icache_guest_page(pfn, vma_pagesize);

if (hugetlb) {
pmd_t new_pmd = pfn_pmd(pfn, mem_type);
new_pmd = pmd_mkhuge(new_pmd);
- if (writable) {
+ if (writable)
new_pmd = kvm_s2pmd_mkwrite(new_pmd);
- kvm_set_pfn_dirty(pfn);
- }
-
- if (fault_status != FSC_PERM)
- clean_dcache_guest_page(pfn, PMD_SIZE);

if (exec_fault) {
new_pmd = kvm_s2pmd_mkexec(new_pmd);
- invalidate_icache_guest_page(pfn, PMD_SIZE);
} else if (fault_status == FSC_PERM) {
/* Preserve execute if XN was already cleared */
if (stage2_is_exec(kvm, fault_ipa))
@@ -1546,16 +1557,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,

if (writable) {
new_pte = kvm_s2pte_mkwrite(new_pte);
- kvm_set_pfn_dirty(pfn);
mark_page_dirty(kvm, gfn);
}

- if (fault_status != FSC_PERM)
- clean_dcache_guest_page(pfn, PAGE_SIZE);
-
if (exec_fault) {
new_pte = kvm_s2pte_mkexec(new_pte);
- invalidate_icache_guest_page(pfn, PAGE_SIZE);
} else if (fault_status == FSC_PERM) {
/* Preserve execute if XN was already cleared */
if (stage2_is_exec(kvm, fault_ipa))
--
2.17.0


2018-04-20 14:58:14

by Punit Agrawal

[permalink] [raw]
Subject: [PATCH 4/4] KVM: arm64: Add support for PUD hugepages at stage 2

KVM only supports PMD hugepages at stage 2. Extend the stage 2 fault
handling to add support for PUD hugepages.

Addition of pud hugepage support enables additional hugepage
sizes (e.g., 1G with 4K granule) which can be useful on cores that
support mapping larger block sizes in the TLB entries.

Signed-off-by: Punit Agrawal <[email protected]>
Cc: Christoffer Dall <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Russell King <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
---
arch/arm/include/asm/kvm_mmu.h | 19 +++++++++
arch/arm64/include/asm/kvm_mmu.h | 15 +++++++
arch/arm64/include/asm/pgtable-hwdef.h | 4 ++
arch/arm64/include/asm/pgtable.h | 2 +
virt/kvm/arm/mmu.c | 54 ++++++++++++++++++++------
5 files changed, 83 insertions(+), 11 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 224c22c0a69c..155916dbdd7e 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -77,8 +77,11 @@ void kvm_clear_hyp_idmap(void);

#define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot)
#define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot)
+#define kvm_pfn_pud(pfn, prot) (__pud(0))

#define kvm_pmd_mkhuge(pmd) pmd_mkhuge(pmd)
+/* No support for pud hugepages */
+#define kvm_pud_mkhuge(pud) (pud)

/*
* The following kvm_*pud*() functionas are provided strictly to allow
@@ -95,6 +98,22 @@ static inline bool kvm_s2pud_readonly(pud_t *pud)
return false;
}

+static inline void kvm_set_pud(pud_t *pud, pud_t new_pud)
+{
+ BUG();
+}
+
+static inline pud_t kvm_s2pud_mkwrite(pud_t pud)
+{
+ BUG();
+ return pud;
+}
+
+static inline pud_t kvm_s2pud_mkexec(pud_t pud)
+{
+ BUG();
+ return pud;
+}

static inline void kvm_set_pmd(pmd_t *pmd, pmd_t new_pmd)
{
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index f440cf216a23..f49a68fcbf26 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -172,11 +172,14 @@ void kvm_clear_hyp_idmap(void);

#define kvm_set_pte(ptep, pte) set_pte(ptep, pte)
#define kvm_set_pmd(pmdp, pmd) set_pmd(pmdp, pmd)
+#define kvm_set_pud(pudp, pud) set_pud(pudp, pud)

#define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot)
#define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot)
+#define kvm_pfn_pud(pfn, prot) pfn_pud(pfn, prot)

#define kvm_pmd_mkhuge(pmd) pmd_mkhuge(pmd)
+#define kvm_pud_mkhuge(pud) pud_mkhuge(pud)

static inline pte_t kvm_s2pte_mkwrite(pte_t pte)
{
@@ -190,6 +193,12 @@ static inline pmd_t kvm_s2pmd_mkwrite(pmd_t pmd)
return pmd;
}

+static inline pud_t kvm_s2pud_mkwrite(pud_t pud)
+{
+ pud_val(pud) |= PUD_S2_RDWR;
+ return pud;
+}
+
static inline pte_t kvm_s2pte_mkexec(pte_t pte)
{
pte_val(pte) &= ~PTE_S2_XN;
@@ -202,6 +211,12 @@ static inline pmd_t kvm_s2pmd_mkexec(pmd_t pmd)
return pmd;
}

+static inline pud_t kvm_s2pud_mkexec(pud_t pud)
+{
+ pud_val(pud) &= ~PUD_S2_XN;
+ return pud;
+}
+
static inline void kvm_set_s2pte_readonly(pte_t *ptep)
{
pteval_t old_pteval, pteval;
diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
index fd208eac9f2a..e327665e94d1 100644
--- a/arch/arm64/include/asm/pgtable-hwdef.h
+++ b/arch/arm64/include/asm/pgtable-hwdef.h
@@ -193,6 +193,10 @@
#define PMD_S2_RDWR (_AT(pmdval_t, 3) << 6) /* HAP[2:1] */
#define PMD_S2_XN (_AT(pmdval_t, 2) << 53) /* XN[1:0] */

+#define PUD_S2_RDONLY (_AT(pudval_t, 1) << 6) /* HAP[2:1] */
+#define PUD_S2_RDWR (_AT(pudval_t, 3) << 6) /* HAP[2:1] */
+#define PUD_S2_XN (_AT(pudval_t, 2) << 53) /* XN[1:0] */
+
/*
* Memory Attribute override for Stage-2 (MemAttr[3:0])
*/
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 7e2c27e63cd8..5efb4585c879 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -386,6 +386,8 @@ static inline int pmd_protnone(pmd_t pmd)

#define pud_write(pud) pte_write(pud_pte(pud))

+#define pud_mkhuge(pud) (__pud(pud_val(pud) & ~PUD_TABLE_BIT))
+
#define __pud_to_phys(pud) __pte_to_phys(pud_pte(pud))
#define __phys_to_pud_val(phys) __phys_to_pte_val(phys)
#define pud_pfn(pud) ((__pud_to_phys(pud) & PUD_MASK) >> PAGE_SHIFT)
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 5f53909da90e..7fb58dca0a83 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1036,6 +1036,26 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
return 0;
}

+static int stage2_set_pud_huge(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
+ phys_addr_t addr, const pud_t *new_pud)
+{
+ pud_t *pud, old_pud;
+
+ pud = stage2_get_pud(kvm, cache, addr);
+ VM_BUG_ON(!pud);
+
+ old_pud = *pud;
+ if (pud_present(old_pud)) {
+ pud_clear(pud);
+ kvm_tlb_flush_vmid_ipa(kvm, addr);
+ } else {
+ get_page(virt_to_page(pud));
+ }
+
+ kvm_set_pud(pud, *new_pud);
+ return 0;
+}
+
static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
{
pmd_t *pmdp;
@@ -1452,9 +1472,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
}

vma_pagesize = vma_kernel_pagesize(vma);
- if (vma_pagesize == PMD_SIZE && !logging_active) {
+ if ((vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE) &&
+ !logging_active) {
+ struct hstate *h = hstate_vma(vma);
+
hugetlb = true;
- gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
+ gfn = (fault_ipa & huge_page_mask(h)) >> PAGE_SHIFT;
} else {
/*
* Pages belonging to memslots that don't have the same
@@ -1521,15 +1544,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
if (mmu_notifier_retry(kvm, mmu_seq))
goto out_unlock;

- if (!hugetlb && !force_pte) {
- /*
- * Only PMD_SIZE transparent hugepages(THP) are
- * currently supported. This code will need to be
- * updated if other THP sizes are supported.
- */
+ if (!hugetlb && !force_pte)
hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
- vma_pagesize = PMD_SIZE;
- }

if (writable)
kvm_set_pfn_dirty(pfn);
@@ -1540,7 +1556,23 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
if (exec_fault)
invalidate_icache_guest_page(pfn, vma_pagesize);

- if (hugetlb) {
+ if (vma_pagesize == PUD_SIZE) {
+ pud_t new_pud = kvm_pfn_pud(pfn, mem_type);
+
+ new_pud = kvm_pud_mkhuge(new_pud);
+ if (writable)
+ new_pud = kvm_s2pud_mkwrite(new_pud);
+
+ if (exec_fault) {
+ new_pud = kvm_s2pud_mkexec(new_pud);
+ } else if (fault_status == FSC_PERM) {
+ /* Preserve execute if XN was already cleared */
+ if (stage2_is_exec(kvm, fault_ipa))
+ new_pud = kvm_s2pud_mkexec(new_pud);
+ }
+
+ ret = stage2_set_pud_huge(kvm, memcache, fault_ipa, &new_pud);
+ } else if (vma_pagesize == PMD_SIZE) {
pmd_t new_pmd = kvm_pfn_pmd(pfn, mem_type);

new_pmd = kvm_pmd_mkhuge(new_pmd);
--
2.17.0


2018-04-27 11:15:55

by Christoffer Dall

[permalink] [raw]
Subject: Re: [PATCH 1/4] KVM: arm/arm64: Share common code in user_mem_abort()

On Fri, Apr 20, 2018 at 03:54:06PM +0100, Punit Agrawal wrote:
> The code for operations such as marking the pfn as dirty, and
> dcache/icache maintenance during stage 2 fault handling is duplicated
> between normal pages and PMD hugepages.
>
> Instead of creating another copy of the operations when we introduce
> PUD hugepages, let's share them across the different pagesizes.
>
> Signed-off-by: Punit Agrawal <[email protected]>
> Cc: Christoffer Dall <[email protected]>
> Cc: Marc Zyngier <[email protected]>
> ---
> virt/kvm/arm/mmu.c | 36 +++++++++++++++++++++---------------
> 1 file changed, 21 insertions(+), 15 deletions(-)
>
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index 7f6a944db23d..db382c7c7cd7 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -1428,7 +1428,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> kvm_pfn_t pfn;
> pgprot_t mem_type = PAGE_S2;
> bool logging_active = memslot_is_logging(memslot);
> - unsigned long flags = 0;
> + unsigned long vma_pagesize, flags = 0;
>
> write_fault = kvm_is_write_fault(vcpu);
> exec_fault = kvm_vcpu_trap_is_iabt(vcpu);
> @@ -1448,7 +1448,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> return -EFAULT;
> }
>
> - if (vma_kernel_pagesize(vma) == PMD_SIZE && !logging_active) {
> + vma_pagesize = vma_kernel_pagesize(vma);
> + if (vma_pagesize == PMD_SIZE && !logging_active) {
> hugetlb = true;
> gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
> } else {
> @@ -1517,23 +1518,33 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> if (mmu_notifier_retry(kvm, mmu_seq))
> goto out_unlock;
>
> - if (!hugetlb && !force_pte)
> + if (!hugetlb && !force_pte) {
> + /*
> + * Only PMD_SIZE transparent hugepages(THP) are
> + * currently supported. This code will need to be
> + * updated if other THP sizes are supported.
> + */
> hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
> + vma_pagesize = PMD_SIZE;
> + }
> +
> + if (writable)
> + kvm_set_pfn_dirty(pfn);
> +
> + if (fault_status != FSC_PERM)
> + clean_dcache_guest_page(pfn, vma_pagesize);
> +
> + if (exec_fault)
> + invalidate_icache_guest_page(pfn, vma_pagesize);
>
> if (hugetlb) {
> pmd_t new_pmd = pfn_pmd(pfn, mem_type);
> new_pmd = pmd_mkhuge(new_pmd);
> - if (writable) {
> + if (writable)
> new_pmd = kvm_s2pmd_mkwrite(new_pmd);
> - kvm_set_pfn_dirty(pfn);
> - }
> -
> - if (fault_status != FSC_PERM)
> - clean_dcache_guest_page(pfn, PMD_SIZE);
>
> if (exec_fault) {
> new_pmd = kvm_s2pmd_mkexec(new_pmd);
> - invalidate_icache_guest_page(pfn, PMD_SIZE);
> } else if (fault_status == FSC_PERM) {

This could now be rewritten to:

if (exec_fault ||
(fault_status == FSC_PERM && stage2_is_exec(kvm, fault_ipa)))
new_pmd = kvm_s2pmd_mkexec(new_pmd);

which we could even consider making

static bool stage2_should_exec(struct kvm *kvm, phys_addr_t addr,
bool exec_fault, unsigned long fault_status)
{
/*
* If we took an execution fault we will have made the
* icache/dcache coherent and should now let the s2 mapping be
* executable.
*
* Write faults (!exec_fault && FSC_PERM) are orthogonal to
* execute permissions, and we preserve whatever we have.
*/
return exec_fault ||
(fault_status == FSC_PERM && stage2_is_exec(kvm, fault_ipa));
}

The benefit would be to have this documentation in a single place and
slightly simply both the hugetlb and !hugetlb blocks.


> /* Preserve execute if XN was already cleared */
> if (stage2_is_exec(kvm, fault_ipa))
> @@ -1546,16 +1557,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>
> if (writable) {
> new_pte = kvm_s2pte_mkwrite(new_pte);
> - kvm_set_pfn_dirty(pfn);
> mark_page_dirty(kvm, gfn);
> }
>
> - if (fault_status != FSC_PERM)
> - clean_dcache_guest_page(pfn, PAGE_SIZE);
> -
> if (exec_fault) {
> new_pte = kvm_s2pte_mkexec(new_pte);
> - invalidate_icache_guest_page(pfn, PAGE_SIZE);
> } else if (fault_status == FSC_PERM) {
> /* Preserve execute if XN was already cleared */
> if (stage2_is_exec(kvm, fault_ipa))
> --
> 2.17.0
>

Notwithstanding my suggestion above:

Reviewed-by: Christoffer Dall <[email protected]>

2018-04-27 11:15:55

by Christoffer Dall

[permalink] [raw]
Subject: Re: [PATCH 4/4] KVM: arm64: Add support for PUD hugepages at stage 2

On Fri, Apr 20, 2018 at 03:54:09PM +0100, Punit Agrawal wrote:
> KVM only supports PMD hugepages at stage 2. Extend the stage 2 fault
> handling to add support for PUD hugepages.
>
> Addition of pud hugepage support enables additional hugepage
> sizes (e.g., 1G with 4K granule) which can be useful on cores that
> support mapping larger block sizes in the TLB entries.
>
> Signed-off-by: Punit Agrawal <[email protected]>
> Cc: Christoffer Dall <[email protected]>
> Cc: Marc Zyngier <[email protected]>
> Cc: Russell King <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> ---
> arch/arm/include/asm/kvm_mmu.h | 19 +++++++++
> arch/arm64/include/asm/kvm_mmu.h | 15 +++++++
> arch/arm64/include/asm/pgtable-hwdef.h | 4 ++
> arch/arm64/include/asm/pgtable.h | 2 +
> virt/kvm/arm/mmu.c | 54 ++++++++++++++++++++------
> 5 files changed, 83 insertions(+), 11 deletions(-)
>
> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> index 224c22c0a69c..155916dbdd7e 100644
> --- a/arch/arm/include/asm/kvm_mmu.h
> +++ b/arch/arm/include/asm/kvm_mmu.h
> @@ -77,8 +77,11 @@ void kvm_clear_hyp_idmap(void);
>
> #define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot)
> #define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot)
> +#define kvm_pfn_pud(pfn, prot) (__pud(0))
>
> #define kvm_pmd_mkhuge(pmd) pmd_mkhuge(pmd)
> +/* No support for pud hugepages */
> +#define kvm_pud_mkhuge(pud) (pud)
>
> /*
> * The following kvm_*pud*() functionas are provided strictly to allow
> @@ -95,6 +98,22 @@ static inline bool kvm_s2pud_readonly(pud_t *pud)
> return false;
> }
>
> +static inline void kvm_set_pud(pud_t *pud, pud_t new_pud)
> +{
> + BUG();
> +}
> +
> +static inline pud_t kvm_s2pud_mkwrite(pud_t pud)
> +{
> + BUG();
> + return pud;
> +}
> +
> +static inline pud_t kvm_s2pud_mkexec(pud_t pud)
> +{
> + BUG();
> + return pud;
> +}
>
> static inline void kvm_set_pmd(pmd_t *pmd, pmd_t new_pmd)
> {
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index f440cf216a23..f49a68fcbf26 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -172,11 +172,14 @@ void kvm_clear_hyp_idmap(void);
>
> #define kvm_set_pte(ptep, pte) set_pte(ptep, pte)
> #define kvm_set_pmd(pmdp, pmd) set_pmd(pmdp, pmd)
> +#define kvm_set_pud(pudp, pud) set_pud(pudp, pud)
>
> #define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot)
> #define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot)
> +#define kvm_pfn_pud(pfn, prot) pfn_pud(pfn, prot)
>
> #define kvm_pmd_mkhuge(pmd) pmd_mkhuge(pmd)
> +#define kvm_pud_mkhuge(pud) pud_mkhuge(pud)
>
> static inline pte_t kvm_s2pte_mkwrite(pte_t pte)
> {
> @@ -190,6 +193,12 @@ static inline pmd_t kvm_s2pmd_mkwrite(pmd_t pmd)
> return pmd;
> }
>
> +static inline pud_t kvm_s2pud_mkwrite(pud_t pud)
> +{
> + pud_val(pud) |= PUD_S2_RDWR;
> + return pud;
> +}
> +
> static inline pte_t kvm_s2pte_mkexec(pte_t pte)
> {
> pte_val(pte) &= ~PTE_S2_XN;
> @@ -202,6 +211,12 @@ static inline pmd_t kvm_s2pmd_mkexec(pmd_t pmd)
> return pmd;
> }
>
> +static inline pud_t kvm_s2pud_mkexec(pud_t pud)
> +{
> + pud_val(pud) &= ~PUD_S2_XN;
> + return pud;
> +}
> +
> static inline void kvm_set_s2pte_readonly(pte_t *ptep)
> {
> pteval_t old_pteval, pteval;
> diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
> index fd208eac9f2a..e327665e94d1 100644
> --- a/arch/arm64/include/asm/pgtable-hwdef.h
> +++ b/arch/arm64/include/asm/pgtable-hwdef.h
> @@ -193,6 +193,10 @@
> #define PMD_S2_RDWR (_AT(pmdval_t, 3) << 6) /* HAP[2:1] */
> #define PMD_S2_XN (_AT(pmdval_t, 2) << 53) /* XN[1:0] */
>
> +#define PUD_S2_RDONLY (_AT(pudval_t, 1) << 6) /* HAP[2:1] */
> +#define PUD_S2_RDWR (_AT(pudval_t, 3) << 6) /* HAP[2:1] */
> +#define PUD_S2_XN (_AT(pudval_t, 2) << 53) /* XN[1:0] */
> +
> /*
> * Memory Attribute override for Stage-2 (MemAttr[3:0])
> */
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 7e2c27e63cd8..5efb4585c879 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -386,6 +386,8 @@ static inline int pmd_protnone(pmd_t pmd)
>
> #define pud_write(pud) pte_write(pud_pte(pud))
>
> +#define pud_mkhuge(pud) (__pud(pud_val(pud) & ~PUD_TABLE_BIT))
> +
> #define __pud_to_phys(pud) __pte_to_phys(pud_pte(pud))
> #define __phys_to_pud_val(phys) __phys_to_pte_val(phys)
> #define pud_pfn(pud) ((__pud_to_phys(pud) & PUD_MASK) >> PAGE_SHIFT)
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index 5f53909da90e..7fb58dca0a83 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -1036,6 +1036,26 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
> return 0;
> }
>
> +static int stage2_set_pud_huge(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
> + phys_addr_t addr, const pud_t *new_pud)
> +{
> + pud_t *pud, old_pud;
> +
> + pud = stage2_get_pud(kvm, cache, addr);
> + VM_BUG_ON(!pud);
> +
> + old_pud = *pud;
> + if (pud_present(old_pud)) {
> + pud_clear(pud);
> + kvm_tlb_flush_vmid_ipa(kvm, addr);
> + } else {
> + get_page(virt_to_page(pud));
> + }
> +
> + kvm_set_pud(pud, *new_pud);
> + return 0;
> +}
> +
> static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
> {
> pmd_t *pmdp;
> @@ -1452,9 +1472,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> }
>
> vma_pagesize = vma_kernel_pagesize(vma);
> - if (vma_pagesize == PMD_SIZE && !logging_active) {
> + if ((vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE) &&
> + !logging_active) {
> + struct hstate *h = hstate_vma(vma);
> +
> hugetlb = true;
> - gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
> + gfn = (fault_ipa & huge_page_mask(h)) >> PAGE_SHIFT;
> } else {
> /*
> * Pages belonging to memslots that don't have the same
> @@ -1521,15 +1544,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> if (mmu_notifier_retry(kvm, mmu_seq))
> goto out_unlock;
>
> - if (!hugetlb && !force_pte) {
> - /*
> - * Only PMD_SIZE transparent hugepages(THP) are
> - * currently supported. This code will need to be
> - * updated if other THP sizes are supported.
> - */
> + if (!hugetlb && !force_pte)
> hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
> - vma_pagesize = PMD_SIZE;

Why this change? Won't you end up trying to map THPs as individual
pages now?

> - }
>
> if (writable)
> kvm_set_pfn_dirty(pfn);
> @@ -1540,7 +1556,23 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> if (exec_fault)
> invalidate_icache_guest_page(pfn, vma_pagesize);
>
> - if (hugetlb) {
> + if (vma_pagesize == PUD_SIZE) {
> + pud_t new_pud = kvm_pfn_pud(pfn, mem_type);
> +
> + new_pud = kvm_pud_mkhuge(new_pud);
> + if (writable)
> + new_pud = kvm_s2pud_mkwrite(new_pud);
> +
> + if (exec_fault) {
> + new_pud = kvm_s2pud_mkexec(new_pud);
> + } else if (fault_status == FSC_PERM) {
> + /* Preserve execute if XN was already cleared */
> + if (stage2_is_exec(kvm, fault_ipa))
> + new_pud = kvm_s2pud_mkexec(new_pud);
> + }

aha, another reason for my suggestion in the other patch.

> +
> + ret = stage2_set_pud_huge(kvm, memcache, fault_ipa, &new_pud);
> + } else if (vma_pagesize == PMD_SIZE) {
> pmd_t new_pmd = kvm_pfn_pmd(pfn, mem_type);
>
> new_pmd = kvm_pmd_mkhuge(new_pmd);
> --
> 2.17.0
>

Otherwise, this patch looks fine.

Thanks,
-Christoffer

2018-04-27 11:16:13

by Christoffer Dall

[permalink] [raw]
Subject: Re: [PATCH 2/4] KVM: arm/arm64: Introduce helpers to manupulate page table entries

On Fri, Apr 20, 2018 at 03:54:07PM +0100, Punit Agrawal wrote:
> Introduce helpers to abstract architectural handling of the conversion
> of pfn to page table entries and marking a PMD page table entry as a
> block entry.
>
> The helpers are introduced in preparation for supporting PUD hugepages
> at stage 2 - which are supported on arm64 but do not exist on arm.
>
> Signed-off-by: Punit Agrawal <[email protected]>
> Cc: Christoffer Dall <[email protected]>
> Cc: Marc Zyngier <[email protected]>
> Cc: Russell King <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>

Acked-by: Christoffer Dall <[email protected]>

> ---
> arch/arm/include/asm/kvm_mmu.h | 5 +++++
> arch/arm64/include/asm/kvm_mmu.h | 5 +++++
> virt/kvm/arm/mmu.c | 7 ++++---
> 3 files changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> index 707a1f06dc5d..5907a81ad5c1 100644
> --- a/arch/arm/include/asm/kvm_mmu.h
> +++ b/arch/arm/include/asm/kvm_mmu.h
> @@ -75,6 +75,11 @@ phys_addr_t kvm_get_idmap_vector(void);
> int kvm_mmu_init(void);
> void kvm_clear_hyp_idmap(void);
>
> +#define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot)
> +#define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot)
> +
> +#define kvm_pmd_mkhuge(pmd) pmd_mkhuge(pmd)
> +
> static inline void kvm_set_pmd(pmd_t *pmd, pmd_t new_pmd)
> {
> *pmd = new_pmd;
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 082110993647..d962508ce4b3 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -173,6 +173,11 @@ void kvm_clear_hyp_idmap(void);
> #define kvm_set_pte(ptep, pte) set_pte(ptep, pte)
> #define kvm_set_pmd(pmdp, pmd) set_pmd(pmdp, pmd)
>
> +#define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot)
> +#define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot)
> +
> +#define kvm_pmd_mkhuge(pmd) pmd_mkhuge(pmd)
> +
> static inline pte_t kvm_s2pte_mkwrite(pte_t pte)
> {
> pte_val(pte) |= PTE_S2_RDWR;
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index db382c7c7cd7..f72ae7a6dea0 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -1538,8 +1538,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> invalidate_icache_guest_page(pfn, vma_pagesize);
>
> if (hugetlb) {
> - pmd_t new_pmd = pfn_pmd(pfn, mem_type);
> - new_pmd = pmd_mkhuge(new_pmd);
> + pmd_t new_pmd = kvm_pfn_pmd(pfn, mem_type);
> +
> + new_pmd = kvm_pmd_mkhuge(new_pmd);
> if (writable)
> new_pmd = kvm_s2pmd_mkwrite(new_pmd);
>
> @@ -1553,7 +1554,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>
> ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd);
> } else {
> - pte_t new_pte = pfn_pte(pfn, mem_type);
> + pte_t new_pte = kvm_pfn_pte(pfn, mem_type);
>
> if (writable) {
> new_pte = kvm_s2pte_mkwrite(new_pte);
> --
> 2.17.0
>

2018-04-27 11:16:27

by Christoffer Dall

[permalink] [raw]
Subject: Re: [PATCH 3/4] KVM: arm64: Support dirty page tracking for PUD hugepages

On Fri, Apr 20, 2018 at 03:54:08PM +0100, Punit Agrawal wrote:
> In preparation for creating PUD hugepages at stage 2, add support for
> write protecting PUD hugepages when they are encountered. Write
> protecting guest tables is used to track dirty pages when migrating
> VMs.
>
> Also, provide trivial implementations of required kvm_s2pud_* helpers
> to allow sharing of code with arm32.
>
> Signed-off-by: Punit Agrawal <[email protected]>
> Cc: Christoffer Dall <[email protected]>
> Cc: Marc Zyngier <[email protected]>
> Cc: Russell King <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> ---
> arch/arm/include/asm/kvm_mmu.h | 16 ++++++++++++++++
> arch/arm64/include/asm/kvm_mmu.h | 10 ++++++++++
> virt/kvm/arm/mmu.c | 9 ++++++---
> 3 files changed, 32 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> index 5907a81ad5c1..224c22c0a69c 100644
> --- a/arch/arm/include/asm/kvm_mmu.h
> +++ b/arch/arm/include/asm/kvm_mmu.h
> @@ -80,6 +80,22 @@ void kvm_clear_hyp_idmap(void);
>
> #define kvm_pmd_mkhuge(pmd) pmd_mkhuge(pmd)
>
> +/*
> + * The following kvm_*pud*() functionas are provided strictly to allow
> + * sharing code with arm64. They should never be called in practice.
> + */
> +static inline void kvm_set_s2pud_readonly(pud_t *pud)
> +{
> + BUG();
> +}
> +
> +static inline bool kvm_s2pud_readonly(pud_t *pud)
> +{
> + BUG();
> + return false;
> +}
> +
> +
> static inline void kvm_set_pmd(pmd_t *pmd, pmd_t new_pmd)
> {
> *pmd = new_pmd;
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index d962508ce4b3..f440cf216a23 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -240,6 +240,16 @@ static inline bool kvm_s2pmd_exec(pmd_t *pmdp)
> return !(READ_ONCE(pmd_val(*pmdp)) & PMD_S2_XN);
> }
>
> +static inline void kvm_set_s2pud_readonly(pud_t *pudp)
> +{
> + kvm_set_s2pte_readonly((pte_t *)pudp);
> +}
> +
> +static inline bool kvm_s2pud_readonly(pud_t *pudp)
> +{
> + return kvm_s2pte_readonly((pte_t *)pudp);
> +}
> +
> static inline bool kvm_page_empty(void *ptr)
> {
> struct page *ptr_page = virt_to_page(ptr);
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index f72ae7a6dea0..5f53909da90e 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -1286,9 +1286,12 @@ static void stage2_wp_puds(pgd_t *pgd, phys_addr_t addr, phys_addr_t end)
> do {
> next = stage2_pud_addr_end(addr, end);
> if (!stage2_pud_none(*pud)) {
> - /* TODO:PUD not supported, revisit later if supported */
> - BUG_ON(stage2_pud_huge(*pud));
> - stage2_wp_pmds(pud, addr, next);
> + if (stage2_pud_huge(*pud)) {
> + if (!kvm_s2pud_readonly(pud))
> + kvm_set_s2pud_readonly(pud);
> + } else {
> + stage2_wp_pmds(pud, addr, next);
> + }
> }
> } while (pud++, addr = next, addr != end);
> }
> --
> 2.17.0
>

Reviewed-by: Christoffer Dall <[email protected]>

2018-04-27 14:23:18

by Punit Agrawal

[permalink] [raw]
Subject: Re: [PATCH 1/4] KVM: arm/arm64: Share common code in user_mem_abort()

Christoffer Dall <[email protected]> writes:

> On Fri, Apr 20, 2018 at 03:54:06PM +0100, Punit Agrawal wrote:
>> The code for operations such as marking the pfn as dirty, and
>> dcache/icache maintenance during stage 2 fault handling is duplicated
>> between normal pages and PMD hugepages.
>>
>> Instead of creating another copy of the operations when we introduce
>> PUD hugepages, let's share them across the different pagesizes.
>>
>> Signed-off-by: Punit Agrawal <[email protected]>
>> Cc: Christoffer Dall <[email protected]>
>> Cc: Marc Zyngier <[email protected]>
>> ---
>> virt/kvm/arm/mmu.c | 36 +++++++++++++++++++++---------------
>> 1 file changed, 21 insertions(+), 15 deletions(-)
>>
>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>> index 7f6a944db23d..db382c7c7cd7 100644
>> --- a/virt/kvm/arm/mmu.c
>> +++ b/virt/kvm/arm/mmu.c
>> @@ -1428,7 +1428,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>> kvm_pfn_t pfn;
>> pgprot_t mem_type = PAGE_S2;
>> bool logging_active = memslot_is_logging(memslot);
>> - unsigned long flags = 0;
>> + unsigned long vma_pagesize, flags = 0;
>>
>> write_fault = kvm_is_write_fault(vcpu);
>> exec_fault = kvm_vcpu_trap_is_iabt(vcpu);
>> @@ -1448,7 +1448,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>> return -EFAULT;
>> }
>>
>> - if (vma_kernel_pagesize(vma) == PMD_SIZE && !logging_active) {
>> + vma_pagesize = vma_kernel_pagesize(vma);
>> + if (vma_pagesize == PMD_SIZE && !logging_active) {
>> hugetlb = true;
>> gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
>> } else {
>> @@ -1517,23 +1518,33 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>> if (mmu_notifier_retry(kvm, mmu_seq))
>> goto out_unlock;
>>
>> - if (!hugetlb && !force_pte)
>> + if (!hugetlb && !force_pte) {
>> + /*
>> + * Only PMD_SIZE transparent hugepages(THP) are
>> + * currently supported. This code will need to be
>> + * updated if other THP sizes are supported.
>> + */
>> hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
>> + vma_pagesize = PMD_SIZE;
>> + }
>> +
>> + if (writable)
>> + kvm_set_pfn_dirty(pfn);
>> +
>> + if (fault_status != FSC_PERM)
>> + clean_dcache_guest_page(pfn, vma_pagesize);
>> +
>> + if (exec_fault)
>> + invalidate_icache_guest_page(pfn, vma_pagesize);
>>
>> if (hugetlb) {
>> pmd_t new_pmd = pfn_pmd(pfn, mem_type);
>> new_pmd = pmd_mkhuge(new_pmd);
>> - if (writable) {
>> + if (writable)
>> new_pmd = kvm_s2pmd_mkwrite(new_pmd);
>> - kvm_set_pfn_dirty(pfn);
>> - }
>> -
>> - if (fault_status != FSC_PERM)
>> - clean_dcache_guest_page(pfn, PMD_SIZE);
>>
>> if (exec_fault) {
>> new_pmd = kvm_s2pmd_mkexec(new_pmd);
>> - invalidate_icache_guest_page(pfn, PMD_SIZE);
>> } else if (fault_status == FSC_PERM) {
>
> This could now be rewritten to:
>
> if (exec_fault ||
> (fault_status == FSC_PERM && stage2_is_exec(kvm, fault_ipa)))
> new_pmd = kvm_s2pmd_mkexec(new_pmd);
>
> which we could even consider making
>
> static bool stage2_should_exec(struct kvm *kvm, phys_addr_t addr,
> bool exec_fault, unsigned long fault_status)
> {
> /*
> * If we took an execution fault we will have made the
> * icache/dcache coherent and should now let the s2 mapping be
> * executable.
> *
> * Write faults (!exec_fault && FSC_PERM) are orthogonal to
> * execute permissions, and we preserve whatever we have.
> */
> return exec_fault ||
> (fault_status == FSC_PERM && stage2_is_exec(kvm, fault_ipa));
> }
>
> The benefit would be to have this documentation in a single place and
> slightly simply both the hugetlb and !hugetlb blocks.

Makes sense. And saves another copy when we introduce PUD handling in a
latter patch. I've rolled this in with minor changes.
>
>
>> /* Preserve execute if XN was already cleared */
>> if (stage2_is_exec(kvm, fault_ipa))
>> @@ -1546,16 +1557,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>
>> if (writable) {
>> new_pte = kvm_s2pte_mkwrite(new_pte);
>> - kvm_set_pfn_dirty(pfn);
>> mark_page_dirty(kvm, gfn);
>> }
>>
>> - if (fault_status != FSC_PERM)
>> - clean_dcache_guest_page(pfn, PAGE_SIZE);
>> -
>> if (exec_fault) {
>> new_pte = kvm_s2pte_mkexec(new_pte);
>> - invalidate_icache_guest_page(pfn, PAGE_SIZE);
>> } else if (fault_status == FSC_PERM) {
>> /* Preserve execute if XN was already cleared */
>> if (stage2_is_exec(kvm, fault_ipa))
>> --
>> 2.17.0
>>
>
> Notwithstanding my suggestion above:
>
> Reviewed-by: Christoffer Dall <[email protected]>

Thanks,
Punit

2018-04-27 14:52:08

by Punit Agrawal

[permalink] [raw]
Subject: Re: [PATCH 4/4] KVM: arm64: Add support for PUD hugepages at stage 2

Christoffer Dall <[email protected]> writes:

> On Fri, Apr 20, 2018 at 03:54:09PM +0100, Punit Agrawal wrote:
>> KVM only supports PMD hugepages at stage 2. Extend the stage 2 fault
>> handling to add support for PUD hugepages.
>>
>> Addition of pud hugepage support enables additional hugepage
>> sizes (e.g., 1G with 4K granule) which can be useful on cores that
>> support mapping larger block sizes in the TLB entries.
>>
>> Signed-off-by: Punit Agrawal <[email protected]>
>> Cc: Christoffer Dall <[email protected]>
>> Cc: Marc Zyngier <[email protected]>
>> Cc: Russell King <[email protected]>
>> Cc: Catalin Marinas <[email protected]>
>> Cc: Will Deacon <[email protected]>
>> ---
>> arch/arm/include/asm/kvm_mmu.h | 19 +++++++++
>> arch/arm64/include/asm/kvm_mmu.h | 15 +++++++
>> arch/arm64/include/asm/pgtable-hwdef.h | 4 ++
>> arch/arm64/include/asm/pgtable.h | 2 +
>> virt/kvm/arm/mmu.c | 54 ++++++++++++++++++++------
>> 5 files changed, 83 insertions(+), 11 deletions(-)
>>

[...]

>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c

[...]

>> @@ -1452,9 +1472,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>> }
>>
>> vma_pagesize = vma_kernel_pagesize(vma);
>> - if (vma_pagesize == PMD_SIZE && !logging_active) {
>> + if ((vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE) &&
>> + !logging_active) {
>> + struct hstate *h = hstate_vma(vma);
>> +
>> hugetlb = true;
>> - gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
>> + gfn = (fault_ipa & huge_page_mask(h)) >> PAGE_SHIFT;
>> } else {
>> /*
>> * Pages belonging to memslots that don't have the same
>> @@ -1521,15 +1544,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>> if (mmu_notifier_retry(kvm, mmu_seq))
>> goto out_unlock;
>>
>> - if (!hugetlb && !force_pte) {
>> - /*
>> - * Only PMD_SIZE transparent hugepages(THP) are
>> - * currently supported. This code will need to be
>> - * updated if other THP sizes are supported.
>> - */
>> + if (!hugetlb && !force_pte)
>> hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
>> - vma_pagesize = PMD_SIZE;
>
> Why this change? Won't you end up trying to map THPs as individual
> pages now?

Argh - that's a rebase gone awry. Thanks for spotting.

There's another issue with this hunk - hugetlb can be false after the
call to transparent_hugepage_adjust(). I've fixed that up for the next
update.

>
>> - }
>>
>> if (writable)
>> kvm_set_pfn_dirty(pfn);
>> @@ -1540,7 +1556,23 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>> if (exec_fault)
>> invalidate_icache_guest_page(pfn, vma_pagesize);
>>
>> - if (hugetlb) {
>> + if (vma_pagesize == PUD_SIZE) {
>> + pud_t new_pud = kvm_pfn_pud(pfn, mem_type);
>> +
>> + new_pud = kvm_pud_mkhuge(new_pud);
>> + if (writable)
>> + new_pud = kvm_s2pud_mkwrite(new_pud);
>> +
>> + if (exec_fault) {
>> + new_pud = kvm_s2pud_mkexec(new_pud);
>> + } else if (fault_status == FSC_PERM) {
>> + /* Preserve execute if XN was already cleared */
>> + if (stage2_is_exec(kvm, fault_ipa))
>> + new_pud = kvm_s2pud_mkexec(new_pud);
>> + }
>
> aha, another reason for my suggestion in the other patch.

Ack! Already fixed locally.

>
>> +
>> + ret = stage2_set_pud_huge(kvm, memcache, fault_ipa, &new_pud);
>> + } else if (vma_pagesize == PMD_SIZE) {
>> pmd_t new_pmd = kvm_pfn_pmd(pfn, mem_type);
>>
>> new_pmd = kvm_pmd_mkhuge(new_pmd);
>> --
>> 2.17.0
>>
>
> Otherwise, this patch looks fine.

Thanks a lot for reviewing the patches. I'll send out an update
incorporating your suggestions.

Punit