2018-10-31 19:01:18

by Punit Agrawal

Subject: [PATCH v9 0/8] KVM: Support PUD hugepage at stage 2

This series is an update to the PUD hugepage support previously posted
at [0]. This patchset adds support for PUD hugepages at stage 2, a
feature that is useful on cores that support large TLB mappings (e.g.,
1GB blocks with a 4K granule).
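For reference, how these block sizes fall out of the granule can be
seen with a small standalone calculation (illustration only, not part
of the series; with a 4K granule each table level resolves 9 bits):

    #include <stdio.h>

    int main(void)
    {
            unsigned long pte = 1UL << 12;  /* 4K page */
            unsigned long pmd = pte << 9;   /* 2M block (PMD hugepage) */
            unsigned long pud = pmd << 9;   /* 1G block (PUD hugepage) */

            printf("PTE %luK, PMD %luM, PUD %luG\n",
                   pte >> 10, pmd >> 20, pud >> 30);
            return 0;
    }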

The patches are based on the latest upstream kernel.

The patches have been tested on an AMD Seattle system with the
following hugepage sizes - 2M and 1G.

Thanks,
Punit

[0] https://patchwork.kernel.org/cover/10622379/

v8 -> v9

* Dropped bugfix patch 1 which has been merged

v7 -> v8

* Add kvm_stage2_has_pud() helper on arm32
* Rebased to v6 of 52bit dynamic IPA support

v6 -> v7

* Restrict thp check to exclude hugetlbfs pages - Patch 1
* Don't update PUD entry if there's no change - Patch 9
* Add check for PUD level in stage 2 - Patch 9

v5 -> v6

* Split Patch 1 to move out the refactoring of exec permissions on
page table entries.
* Patch 4 - Initialise p*dpp pointers in stage2_get_leaf_entry()
* Patch 5 - Trigger a BUG() in kvm_pud_pfn() on arm

v4 -> v5:
* Patch 1 - Drop helper stage2_should_exec() and refactor the
condition to decide if a page table entry should be marked
executable
* Patch 4-6 - Introduce stage2_get_leaf_entry() and use it in this and
latter patches
* Patch 7 - Use stage 2 accessors instead of using the page table
helpers directly
* Patch 7 - Add a note to update the PUD hugepage support when number
of levels of stage 2 tables differs from stage 1

v3 -> v4:
* Patch 1 and 7 - Don't put down hugepages pte if logging is enabled
* Patch 4-5 - Add PUD hugepage support for exec and access faults
* Patch 6 - PUD hugepage support for aging page table entries

v2 -> v3:
* Update vma_pagesize directly if THP [1/4]. Previously this was done
indirectly via hugetlb
* Added review tag [4/4]

v1 -> v2:
* Create helper to check if the page should have exec permission [1/4]
* Fix broken condition to detect THP hugepage [1/4]
* Fix incorrect hunk resulting from a rebase [4/4]

Punit Agrawal (8):
KVM: arm/arm64: Share common code in user_mem_abort()
KVM: arm/arm64: Re-factor setting the Stage 2 entry to exec on fault
KVM: arm/arm64: Introduce helpers to manipulate page table entries
KVM: arm64: Support dirty page tracking for PUD hugepages
KVM: arm64: Support PUD hugepage in stage2_is_exec()
KVM: arm64: Support handling access faults for PUD hugepages
KVM: arm64: Update age handlers to support PUD hugepages
KVM: arm64: Add support for creating PUD hugepages at stage 2

arch/arm/include/asm/kvm_mmu.h | 61 +++++
arch/arm/include/asm/stage2_pgtable.h | 5 +
arch/arm64/include/asm/kvm_mmu.h | 48 ++++
arch/arm64/include/asm/pgtable-hwdef.h | 4 +
arch/arm64/include/asm/pgtable.h | 9 +
virt/kvm/arm/mmu.c | 312 ++++++++++++++++++-------
6 files changed, 360 insertions(+), 79 deletions(-)

--
2.19.1



2018-10-31 17:59:14

by Punit Agrawal

Subject: [PATCH v9 3/8] KVM: arm/arm64: Introduce helpers to manipulate page table entries

Introduce helpers to abstract the architecture-specific handling of
converting a pfn to a page table entry and of marking a PMD page table
entry as a block entry.

The helpers are introduced in preparation for supporting PUD hugepages
at stage 2 - which are supported on arm64 but do not exist on arm.
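The intended caller pattern, simplified from the user_mem_abort() hunk
below (sketch only, permission handling elided):

    pmd_t new_pmd = kvm_pfn_pmd(pfn, mem_type);

    new_pmd = kvm_pmd_mkhuge(new_pmd);
    /* ... apply stage 2 write/exec permissions ... */
    ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd);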

Signed-off-by: Punit Agrawal <[email protected]>
Reviewed-by: Suzuki K Poulose <[email protected]>
Acked-by: Christoffer Dall <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Russell King <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
---
arch/arm/include/asm/kvm_mmu.h | 5 +++++
arch/arm64/include/asm/kvm_mmu.h | 5 +++++
virt/kvm/arm/mmu.c | 14 ++++++++------
3 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 1098ffc3d54b..e6eff8bf5d7f 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -82,6 +82,11 @@ void kvm_clear_hyp_idmap(void);
#define kvm_mk_pud(pmdp) __pud(__pa(pmdp) | PMD_TYPE_TABLE)
#define kvm_mk_pgd(pudp) ({ BUILD_BUG(); 0; })

+#define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot)
+#define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot)
+
+#define kvm_pmd_mkhuge(pmd) pmd_mkhuge(pmd)
+
static inline pte_t kvm_s2pte_mkwrite(pte_t pte)
{
pte_val(pte) |= L_PTE_S2_RDWR;
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 658657367f2f..13d482710292 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -184,6 +184,11 @@ void kvm_clear_hyp_idmap(void);
#define kvm_mk_pgd(pudp) \
__pgd(__phys_to_pgd_val(__pa(pudp)) | PUD_TYPE_TABLE)

+#define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot)
+#define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot)
+
+#define kvm_pmd_mkhuge(pmd) pmd_mkhuge(pmd)
+
static inline pte_t kvm_s2pte_mkwrite(pte_t pte)
{
pte_val(pte) |= PTE_S2_RDWR;
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 6912529946fb..fb5325f7a1ac 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -607,7 +607,7 @@ static void create_hyp_pte_mappings(pmd_t *pmd, unsigned long start,
addr = start;
do {
pte = pte_offset_kernel(pmd, addr);
- kvm_set_pte(pte, pfn_pte(pfn, prot));
+ kvm_set_pte(pte, kvm_pfn_pte(pfn, prot));
get_page(virt_to_page(pte));
pfn++;
} while (addr += PAGE_SIZE, addr != end);
@@ -1202,7 +1202,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
pfn = __phys_to_pfn(pa);

for (addr = guest_ipa; addr < end; addr += PAGE_SIZE) {
- pte_t pte = pfn_pte(pfn, PAGE_S2_DEVICE);
+ pte_t pte = kvm_pfn_pte(pfn, PAGE_S2_DEVICE);

if (writable)
pte = kvm_s2pte_mkwrite(pte);
@@ -1611,8 +1611,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
(fault_status == FSC_PERM && stage2_is_exec(kvm, fault_ipa));

if (vma_pagesize == PMD_SIZE) {
- pmd_t new_pmd = pfn_pmd(pfn, mem_type);
- new_pmd = pmd_mkhuge(new_pmd);
+ pmd_t new_pmd = kvm_pfn_pmd(pfn, mem_type);
+
+ new_pmd = kvm_pmd_mkhuge(new_pmd);
+
if (writable)
new_pmd = kvm_s2pmd_mkwrite(new_pmd);

@@ -1621,7 +1623,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,

ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd);
} else {
- pte_t new_pte = pfn_pte(pfn, mem_type);
+ pte_t new_pte = kvm_pfn_pte(pfn, mem_type);

if (writable) {
new_pte = kvm_s2pte_mkwrite(new_pte);
@@ -1878,7 +1880,7 @@ void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
* just like a translation fault and clean the cache to the PoC.
*/
clean_dcache_guest_page(pfn, PAGE_SIZE);
- stage2_pte = pfn_pte(pfn, PAGE_S2);
+ stage2_pte = kvm_pfn_pte(pfn, PAGE_S2);
handle_hva_to_gpa(kvm, hva, end, &kvm_set_spte_handler, &stage2_pte);
}

--
2.19.1


2018-10-31 17:59:29

by Punit Agrawal

Subject: [PATCH v9 4/8] KVM: arm64: Support dirty page tracking for PUD hugepages

In preparation for creating PUD hugepages at stage 2, add support for
write protecting PUD hugepages when they are encountered. Write
protecting guest memory is used to track dirty pages when migrating
VMs.

Also, provide trivial implementations of required kvm_s2pud_* helpers
to allow sharing of code with arm32.
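For context, this write protection kicks in once userspace enables
dirty logging on a memslot. A rough, illustrative example (not part of
this series; slot, gpa, size, hva and vm_fd are placeholders assumed to
be set up elsewhere) of how a VMM requests it via the standard KVM
ioctl:

    struct kvm_userspace_memory_region mem = {
            .slot            = slot,
            .flags           = KVM_MEM_LOG_DIRTY_PAGES,
            .guest_phys_addr = gpa,
            .memory_size     = size,
            .userspace_addr  = (__u64)hva,
    };

    ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &mem);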

Signed-off-by: Punit Agrawal <[email protected]>
Reviewed-by: Christoffer Dall <[email protected]>
Reviewed-by: Suzuki K Poulose <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Russell King <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
---
arch/arm/include/asm/kvm_mmu.h | 15 +++++++++++++++
arch/arm64/include/asm/kvm_mmu.h | 10 ++++++++++
virt/kvm/arm/mmu.c | 11 +++++++----
3 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index e6eff8bf5d7f..37bf85d39607 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -87,6 +87,21 @@ void kvm_clear_hyp_idmap(void);

#define kvm_pmd_mkhuge(pmd) pmd_mkhuge(pmd)

+/*
+ * The following kvm_*pud*() functions are provided strictly to allow
+ * sharing code with arm64. They should never be called in practice.
+ */
+static inline void kvm_set_s2pud_readonly(pud_t *pud)
+{
+ BUG();
+}
+
+static inline bool kvm_s2pud_readonly(pud_t *pud)
+{
+ BUG();
+ return false;
+}
+
static inline pte_t kvm_s2pte_mkwrite(pte_t pte)
{
pte_val(pte) |= L_PTE_S2_RDWR;
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 13d482710292..8da6d1b2a196 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -251,6 +251,16 @@ static inline bool kvm_s2pmd_exec(pmd_t *pmdp)
return !(READ_ONCE(pmd_val(*pmdp)) & PMD_S2_XN);
}

+static inline void kvm_set_s2pud_readonly(pud_t *pudp)
+{
+ kvm_set_s2pte_readonly((pte_t *)pudp);
+}
+
+static inline bool kvm_s2pud_readonly(pud_t *pudp)
+{
+ return kvm_s2pte_readonly((pte_t *)pudp);
+}
+
#define hyp_pte_table_empty(ptep) kvm_page_empty(ptep)

#ifdef __PAGETABLE_PMD_FOLDED
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index fb5325f7a1ac..1c669c3c1208 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1347,9 +1347,12 @@ static void stage2_wp_puds(struct kvm *kvm, pgd_t *pgd,
do {
next = stage2_pud_addr_end(kvm, addr, end);
if (!stage2_pud_none(kvm, *pud)) {
- /* TODO:PUD not supported, revisit later if supported */
- BUG_ON(stage2_pud_huge(kvm, *pud));
- stage2_wp_pmds(kvm, pud, addr, next);
+ if (stage2_pud_huge(kvm, *pud)) {
+ if (!kvm_s2pud_readonly(pud))
+ kvm_set_s2pud_readonly(pud);
+ } else {
+ stage2_wp_pmds(kvm, pud, addr, next);
+ }
}
} while (pud++, addr = next, addr != end);
}
@@ -1392,7 +1395,7 @@ static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
*
* Called to start logging dirty pages after memory region
* KVM_MEM_LOG_DIRTY_PAGES operation is called. After this function returns
- * all present PMD and PTEs are write protected in the memory region.
+ * all present PUD, PMD and PTEs are write protected in the memory region.
* Afterwards read of dirty page log can be called.
*
* Acquires kvm_mmu_lock. Called with kvm->slots_lock mutex acquired,
--
2.19.1


2018-10-31 18:00:12

by Punit Agrawal

Subject: [PATCH v9 7/8] KVM: arm64: Update age handlers to support PUD hugepages

In preparation for creating larger hugepages at Stage 2, add support
for PUD hugepages to the age handling notifiers.

Provide trivial helpers for arm32 to allow sharing code.
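After this change the MMU notifier age handlers dispatch on whichever
leaf level stage2_get_leaf_entry() finds (simplified from the hunks
below):

    if (pud)
            return stage2_pudp_test_and_clear_young(pud);
    else if (pmd)
            return stage2_pmdp_test_and_clear_young(pmd);
    else
            return stage2_ptep_test_and_clear_young(pte);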

Signed-off-by: Punit Agrawal <[email protected]>
Reviewed-by: Suzuki K Poulose <[email protected]>
Cc: Christoffer Dall <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Russell King <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
---
arch/arm/include/asm/kvm_mmu.h | 6 +++++
arch/arm64/include/asm/kvm_mmu.h | 5 ++++
arch/arm64/include/asm/pgtable.h | 1 +
virt/kvm/arm/mmu.c | 39 ++++++++++++++++----------------
4 files changed, 32 insertions(+), 19 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index fea5e723e3ac..e62f0913ce7d 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -117,6 +117,12 @@ static inline pud_t kvm_s2pud_mkyoung(pud_t pud)
return pud;
}

+static inline bool kvm_s2pud_young(pud_t pud)
+{
+ BUG();
+ return false;
+}
+
static inline pte_t kvm_s2pte_mkwrite(pte_t pte)
{
pte_val(pte) |= L_PTE_S2_RDWR;
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 612032bbb428..9f941f70775c 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -273,6 +273,11 @@ static inline pud_t kvm_s2pud_mkyoung(pud_t pud)
return pud_mkyoung(pud);
}

+static inline bool kvm_s2pud_young(pud_t pud)
+{
+ return pud_young(pud);
+}
+
#define hyp_pte_table_empty(ptep) kvm_page_empty(ptep)

#ifdef __PAGETABLE_PMD_FOLDED
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index f51e2271e6a3..bb0f3f17a7a9 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -386,6 +386,7 @@ static inline int pmd_protnone(pmd_t pmd)
#define pfn_pmd(pfn,prot) __pmd(__phys_to_pmd_val((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot))
#define mk_pmd(page,prot) pfn_pmd(page_to_pfn(page),prot)

+#define pud_young(pud) pte_young(pud_pte(pud))
#define pud_mkyoung(pud) pte_pud(pte_mkyoung(pud_pte(pud)))
#define pud_write(pud) pte_write(pud_pte(pud))

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index bd749601195f..3893ea6a50bf 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1225,6 +1225,11 @@ static int stage2_pmdp_test_and_clear_young(pmd_t *pmd)
return stage2_ptep_test_and_clear_young((pte_t *)pmd);
}

+static int stage2_pudp_test_and_clear_young(pud_t *pud)
+{
+ return stage2_ptep_test_and_clear_young((pte_t *)pud);
+}
+
/**
* kvm_phys_addr_ioremap - map a device range to guest IPA
*
@@ -1932,42 +1937,38 @@ void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)

static int kvm_age_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data)
{
+ pud_t *pud;
pmd_t *pmd;
pte_t *pte;

- WARN_ON(size != PAGE_SIZE && size != PMD_SIZE);
- pmd = stage2_get_pmd(kvm, NULL, gpa);
- if (!pmd || pmd_none(*pmd)) /* Nothing there */
+ WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE);
+ if (!stage2_get_leaf_entry(kvm, gpa, &pud, &pmd, &pte))
return 0;

- if (pmd_thp_or_huge(*pmd)) /* THP, HugeTLB */
+ if (pud)
+ return stage2_pudp_test_and_clear_young(pud);
+ else if (pmd)
return stage2_pmdp_test_and_clear_young(pmd);
-
- pte = pte_offset_kernel(pmd, gpa);
- if (pte_none(*pte))
- return 0;
-
- return stage2_ptep_test_and_clear_young(pte);
+ else
+ return stage2_ptep_test_and_clear_young(pte);
}

static int kvm_test_age_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data)
{
+ pud_t *pud;
pmd_t *pmd;
pte_t *pte;

- WARN_ON(size != PAGE_SIZE && size != PMD_SIZE);
- pmd = stage2_get_pmd(kvm, NULL, gpa);
- if (!pmd || pmd_none(*pmd)) /* Nothing there */
+ WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE);
+ if (!stage2_get_leaf_entry(kvm, gpa, &pud, &pmd, &pte))
return 0;

- if (pmd_thp_or_huge(*pmd)) /* THP, HugeTLB */
+ if (pud)
+ return kvm_s2pud_young(*pud);
+ else if (pmd)
return pmd_young(*pmd);
-
- pte = pte_offset_kernel(pmd, gpa);
- if (!pte_none(*pte)) /* Just a page... */
+ else
return pte_young(*pte);
-
- return 0;
}

int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end)
--
2.19.1


2018-10-31 18:00:18

by Punit Agrawal

Subject: [PATCH v9 2/8] KVM: arm/arm64: Re-factor setting the Stage 2 entry to exec on fault

The stage 2 fault handler marks a page as executable if it is handling
an execution fault, or if it is handling a permission fault on a page
that was already executable, in which case the executable bit needs to
be preserved.

The logic to decide if the page should be marked executable is
duplicated for PMD and PTE entries. To avoid creating another copy when
support for PUD hugepages is introduced, refactor the code to share the
checks needed to mark a page table entry as executable.
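The shared check boils down to (as in the hunk below):

    needs_exec = exec_fault ||
                 (fault_status == FSC_PERM && stage2_is_exec(kvm, fault_ipa));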

Signed-off-by: Punit Agrawal <[email protected]>
Reviewed-by: Suzuki K Poulose <[email protected]>
Cc: Christoffer Dall <[email protected]>
Cc: Marc Zyngier <[email protected]>
---
virt/kvm/arm/mmu.c | 28 +++++++++++++++-------------
1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 59595207c5e1..6912529946fb 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1475,7 +1475,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
unsigned long fault_status)
{
int ret;
- bool write_fault, exec_fault, writable, force_pte = false;
+ bool write_fault, writable, force_pte = false;
+ bool exec_fault, needs_exec;
unsigned long mmu_seq;
gfn_t gfn = fault_ipa >> PAGE_SHIFT;
struct kvm *kvm = vcpu->kvm;
@@ -1598,19 +1599,25 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
if (exec_fault)
invalidate_icache_guest_page(pfn, vma_pagesize);

+ /*
+ * If we took an execution fault we have made the
+ * icache/dcache coherent above and should now let the s2
+ * mapping be executable.
+ *
+ * Write faults (!exec_fault && FSC_PERM) are orthogonal to
+ * execute permissions, and we preserve whatever we have.
+ */
+ needs_exec = exec_fault ||
+ (fault_status == FSC_PERM && stage2_is_exec(kvm, fault_ipa));
+
if (vma_pagesize == PMD_SIZE) {
pmd_t new_pmd = pfn_pmd(pfn, mem_type);
new_pmd = pmd_mkhuge(new_pmd);
if (writable)
new_pmd = kvm_s2pmd_mkwrite(new_pmd);

- if (exec_fault) {
+ if (needs_exec)
new_pmd = kvm_s2pmd_mkexec(new_pmd);
- } else if (fault_status == FSC_PERM) {
- /* Preserve execute if XN was already cleared */
- if (stage2_is_exec(kvm, fault_ipa))
- new_pmd = kvm_s2pmd_mkexec(new_pmd);
- }

ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd);
} else {
@@ -1621,13 +1628,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
mark_page_dirty(kvm, gfn);
}

- if (exec_fault) {
+ if (needs_exec)
new_pte = kvm_s2pte_mkexec(new_pte);
- } else if (fault_status == FSC_PERM) {
- /* Preserve execute if XN was already cleared */
- if (stage2_is_exec(kvm, fault_ipa))
- new_pte = kvm_s2pte_mkexec(new_pte);
- }

ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, flags);
}
--
2.19.1


2018-10-31 18:00:23

by Punit Agrawal

Subject: [PATCH v9 8/8] KVM: arm64: Add support for creating PUD hugepages at stage 2

KVM only supports PMD hugepages at stage 2. Now that the various page
handling routines are updated, extend the stage 2 fault handling to
map in PUD hugepages.

Addition of PUD hugepage support enables additional page sizes (e.g.,
1G with a 4K granule), which can be useful on cores that support
mapping larger block sizes in their TLB entries.
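For a guest to actually get PUD mappings at stage 2, its memory has to
be backed by suitably sized hugepages. One illustrative way (not part
of this series; size is a placeholder, MAP_HUGE_1GB comes from
<linux/mman.h> and needs a 1G hstate configured, e.g. hugepagesz=1G on
the kernel command line) a VMM could back guest RAM with 1G hugetlbfs
pages:

    /* Illustration only: 1G hugetlbfs backing so stage 2 can use PUD blocks. */
    void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_HUGE_1GB,
                     -1, 0);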

Signed-off-by: Punit Agrawal <[email protected]>
Reviewed-by: Suzuki Poulose <[email protected]>
Cc: Christoffer Dall <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Russell King <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
---
arch/arm/include/asm/kvm_mmu.h | 20 +++++
arch/arm/include/asm/stage2_pgtable.h | 5 ++
arch/arm64/include/asm/kvm_mmu.h | 16 ++++
arch/arm64/include/asm/pgtable-hwdef.h | 2 +
arch/arm64/include/asm/pgtable.h | 2 +
virt/kvm/arm/mmu.c | 104 +++++++++++++++++++++++--
6 files changed, 143 insertions(+), 6 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index e62f0913ce7d..6336319a0d5b 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -84,11 +84,14 @@ void kvm_clear_hyp_idmap(void);

#define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot)
#define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot)
+#define kvm_pfn_pud(pfn, prot) (__pud(0))

#define kvm_pud_pfn(pud) ({ BUG(); 0; })


#define kvm_pmd_mkhuge(pmd) pmd_mkhuge(pmd)
+/* No support for pud hugepages */
+#define kvm_pud_mkhuge(pud) ( {BUG(); pud; })

/*
* The following kvm_*pud*() functions are provided strictly to allow
@@ -105,6 +108,23 @@ static inline bool kvm_s2pud_readonly(pud_t *pud)
return false;
}

+static inline void kvm_set_pud(pud_t *pud, pud_t new_pud)
+{
+ BUG();
+}
+
+static inline pud_t kvm_s2pud_mkwrite(pud_t pud)
+{
+ BUG();
+ return pud;
+}
+
+static inline pud_t kvm_s2pud_mkexec(pud_t pud)
+{
+ BUG();
+ return pud;
+}
+
static inline bool kvm_s2pud_exec(pud_t *pud)
{
BUG();
diff --git a/arch/arm/include/asm/stage2_pgtable.h b/arch/arm/include/asm/stage2_pgtable.h
index f6a7ea805232..f9017167a8d1 100644
--- a/arch/arm/include/asm/stage2_pgtable.h
+++ b/arch/arm/include/asm/stage2_pgtable.h
@@ -68,4 +68,9 @@ stage2_pmd_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
#define stage2_pmd_table_empty(kvm, pmdp) kvm_page_empty(pmdp)
#define stage2_pud_table_empty(kvm, pudp) false

+static inline bool kvm_stage2_has_pud(struct kvm *kvm)
+{
+ return false;
+}
+
#endif /* __ARM_S2_PGTABLE_H_ */
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 9f941f70775c..8af4b1befa42 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -184,12 +184,16 @@ void kvm_clear_hyp_idmap(void);
#define kvm_mk_pgd(pudp) \
__pgd(__phys_to_pgd_val(__pa(pudp)) | PUD_TYPE_TABLE)

+#define kvm_set_pud(pudp, pud) set_pud(pudp, pud)
+
#define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot)
#define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot)
+#define kvm_pfn_pud(pfn, prot) pfn_pud(pfn, prot)

#define kvm_pud_pfn(pud) pud_pfn(pud)

#define kvm_pmd_mkhuge(pmd) pmd_mkhuge(pmd)
+#define kvm_pud_mkhuge(pud) pud_mkhuge(pud)

static inline pte_t kvm_s2pte_mkwrite(pte_t pte)
{
@@ -203,6 +207,12 @@ static inline pmd_t kvm_s2pmd_mkwrite(pmd_t pmd)
return pmd;
}

+static inline pud_t kvm_s2pud_mkwrite(pud_t pud)
+{
+ pud_val(pud) |= PUD_S2_RDWR;
+ return pud;
+}
+
static inline pte_t kvm_s2pte_mkexec(pte_t pte)
{
pte_val(pte) &= ~PTE_S2_XN;
@@ -215,6 +225,12 @@ static inline pmd_t kvm_s2pmd_mkexec(pmd_t pmd)
return pmd;
}

+static inline pud_t kvm_s2pud_mkexec(pud_t pud)
+{
+ pud_val(pud) &= ~PUD_S2_XN;
+ return pud;
+}
+
static inline void kvm_set_s2pte_readonly(pte_t *ptep)
{
pteval_t old_pteval, pteval;
diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
index 336e24cddc87..6f1c187f1c86 100644
--- a/arch/arm64/include/asm/pgtable-hwdef.h
+++ b/arch/arm64/include/asm/pgtable-hwdef.h
@@ -193,6 +193,8 @@
#define PMD_S2_RDWR (_AT(pmdval_t, 3) << 6) /* HAP[2:1] */
#define PMD_S2_XN (_AT(pmdval_t, 2) << 53) /* XN[1:0] */

+#define PUD_S2_RDONLY (_AT(pudval_t, 1) << 6) /* HAP[2:1] */
+#define PUD_S2_RDWR (_AT(pudval_t, 3) << 6) /* HAP[2:1] */
#define PUD_S2_XN (_AT(pudval_t, 2) << 53) /* XN[1:0] */

/*
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index bb0f3f17a7a9..576128635f3c 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -390,6 +390,8 @@ static inline int pmd_protnone(pmd_t pmd)
#define pud_mkyoung(pud) pte_pud(pte_mkyoung(pud_pte(pud)))
#define pud_write(pud) pte_write(pud_pte(pud))

+#define pud_mkhuge(pud) (__pud(pud_val(pud) & ~PUD_TABLE_BIT))
+
#define __pud_to_phys(pud) __pte_to_phys(pud_pte(pud))
#define __phys_to_pud_val(phys) __phys_to_pte_val(phys)
#define pud_pfn(pud) ((__pud_to_phys(pud) & PUD_MASK) >> PAGE_SHIFT)
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 3893ea6a50bf..2dcff38868d4 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -115,6 +115,25 @@ static void stage2_dissolve_pmd(struct kvm *kvm, phys_addr_t addr, pmd_t *pmd)
put_page(virt_to_page(pmd));
}

+/**
+ * stage2_dissolve_pud() - clear and flush huge PUD entry
+ * @kvm: pointer to kvm structure.
+ * @addr: IPA
+ * @pud: pud pointer for IPA
+ *
+ * Function clears a PUD entry, flushes addr 1st and 2nd stage TLBs. Marks all
+ * pages in the range dirty.
+ */
+static void stage2_dissolve_pud(struct kvm *kvm, phys_addr_t addr, pud_t *pudp)
+{
+ if (!stage2_pud_huge(kvm, *pudp))
+ return;
+
+ stage2_pud_clear(kvm, pudp);
+ kvm_tlb_flush_vmid_ipa(kvm, addr);
+ put_page(virt_to_page(pudp));
+}
+
static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
int min, int max)
{
@@ -1022,7 +1041,7 @@ static pmd_t *stage2_get_pmd(struct kvm *kvm, struct kvm_mmu_memory_cache *cache
pmd_t *pmd;

pud = stage2_get_pud(kvm, cache, addr);
- if (!pud)
+ if (!pud || stage2_pud_huge(kvm, *pud))
return NULL;

if (stage2_pud_none(kvm, *pud)) {
@@ -1083,6 +1102,36 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
return 0;
}

+static int stage2_set_pud_huge(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
+ phys_addr_t addr, const pud_t *new_pudp)
+{
+ pud_t *pudp, old_pud;
+
+ pudp = stage2_get_pud(kvm, cache, addr);
+ VM_BUG_ON(!pudp);
+
+ old_pud = *pudp;
+
+ /*
+ * A large number of vcpus faulting on the same stage 2 entry,
+ * can lead to a refault due to the
+ * stage2_pud_clear()/tlb_flush(). Skip updating the page
+ * tables if there is no change.
+ */
+ if (pud_val(old_pud) == pud_val(*new_pudp))
+ return 0;
+
+ if (stage2_pud_present(kvm, old_pud)) {
+ stage2_pud_clear(kvm, pudp);
+ kvm_tlb_flush_vmid_ipa(kvm, addr);
+ } else {
+ get_page(virt_to_page(pudp));
+ }
+
+ kvm_set_pud(pudp, *new_pudp);
+ return 0;
+}
+
/*
* stage2_get_leaf_entry - walk the stage2 VM page tables and return
* true if a valid and present leaf-entry is found. A pointer to the
@@ -1149,6 +1198,7 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
phys_addr_t addr, const pte_t *new_pte,
unsigned long flags)
{
+ pud_t *pud;
pmd_t *pmd;
pte_t *pte, old_pte;
bool iomap = flags & KVM_S2PTE_FLAG_IS_IOMAP;
@@ -1157,7 +1207,31 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
VM_BUG_ON(logging_active && !cache);

/* Create stage-2 page table mapping - Levels 0 and 1 */
- pmd = stage2_get_pmd(kvm, cache, addr);
+ pud = stage2_get_pud(kvm, cache, addr);
+ if (!pud) {
+ /*
+ * Ignore calls from kvm_set_spte_hva for unallocated
+ * address ranges.
+ */
+ return 0;
+ }
+
+ /*
+ * While dirty page logging - dissolve huge PUD, then continue
+ * on to allocate page.
+ */
+ if (logging_active)
+ stage2_dissolve_pud(kvm, addr, pud);
+
+ if (stage2_pud_none(kvm, *pud)) {
+ if (!cache)
+ return 0; /* ignore calls from kvm_set_spte_hva */
+ pmd = mmu_memory_cache_alloc(cache);
+ stage2_pud_populate(kvm, pud, pmd);
+ get_page(virt_to_page(pud));
+ }
+
+ pmd = stage2_pmd_offset(kvm, pud, addr);
if (!pmd) {
/*
* Ignore calls from kvm_set_spte_hva for unallocated
@@ -1557,12 +1631,19 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
}

vma_pagesize = vma_kernel_pagesize(vma);
- if (vma_pagesize == PMD_SIZE && !logging_active) {
- gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
+ /*
+ * PUD level may not exist for a VM but PMD is guaranteed to
+ * exist.
+ */
+ if ((vma_pagesize == PMD_SIZE ||
+ (vma_pagesize == PUD_SIZE && kvm_stage2_has_pud(kvm))) &&
+ !logging_active) {
+ gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT;
} else {
/*
* Fallback to PTE if it's not one of the Stage 2
- * supported hugepage sizes
+ * supported hugepage sizes or the corresponding level
+ * doesn't exist
*/
vma_pagesize = PAGE_SIZE;

@@ -1661,7 +1742,18 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
needs_exec = exec_fault ||
(fault_status == FSC_PERM && stage2_is_exec(kvm, fault_ipa));

- if (vma_pagesize == PMD_SIZE) {
+ if (vma_pagesize == PUD_SIZE) {
+ pud_t new_pud = kvm_pfn_pud(pfn, mem_type);
+
+ new_pud = kvm_pud_mkhuge(new_pud);
+ if (writable)
+ new_pud = kvm_s2pud_mkwrite(new_pud);
+
+ if (needs_exec)
+ new_pud = kvm_s2pud_mkexec(new_pud);
+
+ ret = stage2_set_pud_huge(kvm, memcache, fault_ipa, &new_pud);
+ } else if (vma_pagesize == PMD_SIZE) {
pmd_t new_pmd = kvm_pfn_pmd(pfn, mem_type);

new_pmd = kvm_pmd_mkhuge(new_pmd);
--
2.19.1


2018-10-31 18:01:05

by Punit Agrawal

Subject: [PATCH v9 5/8] KVM: arm64: Support PUD hugepage in stage2_is_exec()

In preparation for creating PUD hugepages at stage 2, add support for
detecting execute permissions on PUD page table entries. Faults due to
lack of execute permissions on page table entries are used to perform
i-cache invalidation on first execute.

Provide trivial implementations of arm32 helpers to allow sharing of
code.
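With the new walker, checking execute permission becomes a dispatch on
whichever leaf level is found (simplified from the hunks below):

    if (pudp)
            return kvm_s2pud_exec(pudp);
    else if (pmdp)
            return kvm_s2pmd_exec(pmdp);
    else
            return kvm_s2pte_exec(ptep);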

Signed-off-by: Punit Agrawal <[email protected]>
Reviewed-by: Suzuki K Poulose <[email protected]>
Cc: Christoffer Dall <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Russell King <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
---
arch/arm/include/asm/kvm_mmu.h | 6 +++
arch/arm64/include/asm/kvm_mmu.h | 5 +++
arch/arm64/include/asm/pgtable-hwdef.h | 2 +
virt/kvm/arm/mmu.c | 53 +++++++++++++++++++++++---
4 files changed, 61 insertions(+), 5 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 37bf85d39607..839a619873d3 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -102,6 +102,12 @@ static inline bool kvm_s2pud_readonly(pud_t *pud)
return false;
}

+static inline bool kvm_s2pud_exec(pud_t *pud)
+{
+ BUG();
+ return false;
+}
+
static inline pte_t kvm_s2pte_mkwrite(pte_t pte)
{
pte_val(pte) |= L_PTE_S2_RDWR;
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 8da6d1b2a196..c755b37b3f92 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -261,6 +261,11 @@ static inline bool kvm_s2pud_readonly(pud_t *pudp)
return kvm_s2pte_readonly((pte_t *)pudp);
}

+static inline bool kvm_s2pud_exec(pud_t *pudp)
+{
+ return !(READ_ONCE(pud_val(*pudp)) & PUD_S2_XN);
+}
+
#define hyp_pte_table_empty(ptep) kvm_page_empty(ptep)

#ifdef __PAGETABLE_PMD_FOLDED
diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
index 1d7d8da2ef9b..336e24cddc87 100644
--- a/arch/arm64/include/asm/pgtable-hwdef.h
+++ b/arch/arm64/include/asm/pgtable-hwdef.h
@@ -193,6 +193,8 @@
#define PMD_S2_RDWR (_AT(pmdval_t, 3) << 6) /* HAP[2:1] */
#define PMD_S2_XN (_AT(pmdval_t, 2) << 53) /* XN[1:0] */

+#define PUD_S2_XN (_AT(pudval_t, 2) << 53) /* XN[1:0] */
+
/*
* Memory Attribute override for Stage-2 (MemAttr[3:0])
*/
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 1c669c3c1208..8e44dccd1b47 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1083,23 +1083,66 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
return 0;
}

-static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
+/*
+ * stage2_get_leaf_entry - walk the stage2 VM page tables and return
+ * true if a valid and present leaf-entry is found. A pointer to the
+ * leaf-entry is returned in the appropriate level variable - pudpp,
+ * pmdpp, ptepp.
+ */
+static bool stage2_get_leaf_entry(struct kvm *kvm, phys_addr_t addr,
+ pud_t **pudpp, pmd_t **pmdpp, pte_t **ptepp)
{
+ pud_t *pudp;
pmd_t *pmdp;
pte_t *ptep;

- pmdp = stage2_get_pmd(kvm, NULL, addr);
+ *pudpp = NULL;
+ *pmdpp = NULL;
+ *ptepp = NULL;
+
+ pudp = stage2_get_pud(kvm, NULL, addr);
+ if (!pudp || stage2_pud_none(kvm, *pudp) || !stage2_pud_present(kvm, *pudp))
+ return false;
+
+ if (stage2_pud_huge(kvm, *pudp)) {
+ *pudpp = pudp;
+ return true;
+ }
+
+ pmdp = stage2_pmd_offset(kvm, pudp, addr);
if (!pmdp || pmd_none(*pmdp) || !pmd_present(*pmdp))
return false;

- if (pmd_thp_or_huge(*pmdp))
- return kvm_s2pmd_exec(pmdp);
+ if (pmd_thp_or_huge(*pmdp)) {
+ *pmdpp = pmdp;
+ return true;
+ }

ptep = pte_offset_kernel(pmdp, addr);
if (!ptep || pte_none(*ptep) || !pte_present(*ptep))
return false;

- return kvm_s2pte_exec(ptep);
+ *ptepp = ptep;
+ return true;
+}
+
+static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
+{
+ pud_t *pudp;
+ pmd_t *pmdp;
+ pte_t *ptep;
+ bool found;
+
+ found = stage2_get_leaf_entry(kvm, addr, &pudp, &pmdp, &ptep);
+ if (!found)
+ return false;
+
+ if (pudp)
+ return kvm_s2pud_exec(pudp);
+ else if (pmdp)
+ return kvm_s2pmd_exec(pmdp);
+ else
+ return kvm_s2pte_exec(ptep);
}

static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
--
2.19.1


2018-10-31 18:01:26

by Punit Agrawal

Subject: [PATCH v9 6/8] KVM: arm64: Support handling access faults for PUD hugepages

In preparation for creating larger hugepages at Stage 2, extend the
access fault handling at Stage 2 to support PUD hugepages when
encountered.

Provide trivial helpers for arm32 to allow sharing of code.
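An access fault is taken when the hardware access flag is clear; the
handler simply marks the found leaf entry young again. Simplified from
the hunk below:

    if (pud) {
            *pud = kvm_s2pud_mkyoung(*pud);
            pfn = kvm_pud_pfn(*pud);
    } else if (pmd) {
            *pmd = pmd_mkyoung(*pmd);
            pfn = pmd_pfn(*pmd);
    } else {
            *pte = pte_mkyoung(*pte);
            pfn = pte_pfn(*pte);
    }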

Signed-off-by: Punit Agrawal <[email protected]>
Reviewed-by: Suzuki K Poulose <[email protected]>
Cc: Christoffer Dall <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Russell King <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
---
arch/arm/include/asm/kvm_mmu.h | 9 +++++++++
arch/arm64/include/asm/kvm_mmu.h | 7 +++++++
arch/arm64/include/asm/pgtable.h | 6 ++++++
virt/kvm/arm/mmu.c | 22 +++++++++++-----------
4 files changed, 33 insertions(+), 11 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 839a619873d3..fea5e723e3ac 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -85,6 +85,9 @@ void kvm_clear_hyp_idmap(void);
#define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot)
#define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot)

+#define kvm_pud_pfn(pud) ({ BUG(); 0; })
+
+
#define kvm_pmd_mkhuge(pmd) pmd_mkhuge(pmd)

/*
@@ -108,6 +111,12 @@ static inline bool kvm_s2pud_exec(pud_t *pud)
return false;
}

+static inline pud_t kvm_s2pud_mkyoung(pud_t pud)
+{
+ BUG();
+ return pud;
+}
+
static inline pte_t kvm_s2pte_mkwrite(pte_t pte)
{
pte_val(pte) |= L_PTE_S2_RDWR;
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index c755b37b3f92..612032bbb428 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -187,6 +187,8 @@ void kvm_clear_hyp_idmap(void);
#define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot)
#define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot)

+#define kvm_pud_pfn(pud) pud_pfn(pud)
+
#define kvm_pmd_mkhuge(pmd) pmd_mkhuge(pmd)

static inline pte_t kvm_s2pte_mkwrite(pte_t pte)
@@ -266,6 +268,11 @@ static inline bool kvm_s2pud_exec(pud_t *pudp)
return !(READ_ONCE(pud_val(*pudp)) & PUD_S2_XN);
}

+static inline pud_t kvm_s2pud_mkyoung(pud_t pud)
+{
+ return pud_mkyoung(pud);
+}
+
#define hyp_pte_table_empty(ptep) kvm_page_empty(ptep)

#ifdef __PAGETABLE_PMD_FOLDED
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 50b1ef8584c0..f51e2271e6a3 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -314,6 +314,11 @@ static inline pte_t pud_pte(pud_t pud)
return __pte(pud_val(pud));
}

+static inline pud_t pte_pud(pte_t pte)
+{
+ return __pud(pte_val(pte));
+}
+
static inline pmd_t pud_pmd(pud_t pud)
{
return __pmd(pud_val(pud));
@@ -381,6 +386,7 @@ static inline int pmd_protnone(pmd_t pmd)
#define pfn_pmd(pfn,prot) __pmd(__phys_to_pmd_val((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot))
#define mk_pmd(page,prot) pfn_pmd(page_to_pfn(page),prot)

+#define pud_mkyoung(pud) pte_pud(pte_mkyoung(pud_pte(pud)))
#define pud_write(pud) pte_write(pud_pte(pud))

#define __pud_to_phys(pud) __pte_to_phys(pud_pte(pud))
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 8e44dccd1b47..bd749601195f 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1698,6 +1698,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
*/
static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
{
+ pud_t *pud;
pmd_t *pmd;
pte_t *pte;
kvm_pfn_t pfn;
@@ -1707,24 +1708,23 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)

spin_lock(&vcpu->kvm->mmu_lock);

- pmd = stage2_get_pmd(vcpu->kvm, NULL, fault_ipa);
- if (!pmd || pmd_none(*pmd)) /* Nothing there */
+ if (!stage2_get_leaf_entry(vcpu->kvm, fault_ipa, &pud, &pmd, &pte))
goto out;

- if (pmd_thp_or_huge(*pmd)) { /* THP, HugeTLB */
+ if (pud) { /* HugeTLB */
+ *pud = kvm_s2pud_mkyoung(*pud);
+ pfn = kvm_pud_pfn(*pud);
+ pfn_valid = true;
+ } else if (pmd) { /* THP, HugeTLB */
*pmd = pmd_mkyoung(*pmd);
pfn = pmd_pfn(*pmd);
pfn_valid = true;
- goto out;
+ } else {
+ *pte = pte_mkyoung(*pte); /* Just a page... */
+ pfn = pte_pfn(*pte);
+ pfn_valid = true;
}

- pte = pte_offset_kernel(pmd, fault_ipa);
- if (pte_none(*pte)) /* Nothing there either */
- goto out;
-
- *pte = pte_mkyoung(*pte); /* Just a page... */
- pfn = pte_pfn(*pte);
- pfn_valid = true;
out:
spin_unlock(&vcpu->kvm->mmu_lock);
if (pfn_valid)
--
2.19.1


2018-10-31 19:01:18

by Punit Agrawal

Subject: [PATCH v9 1/8] KVM: arm/arm64: Share common code in user_mem_abort()

The code for operations such as marking the pfn as dirty, and
dcache/icache maintenance during stage 2 fault handling is duplicated
between normal pages and PMD hugepages.

Instead of creating another copy of the operations when we introduce
PUD hugepages, let's share them across the different pagesizes.
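Concretely, the common operations get hoisted out of the PMD/PTE
branches and keyed off the fault's mapping size (simplified from the
hunks below):

    if (writable)
            kvm_set_pfn_dirty(pfn);

    if (fault_status != FSC_PERM)
            clean_dcache_guest_page(pfn, vma_pagesize);

    if (exec_fault)
            invalidate_icache_guest_page(pfn, vma_pagesize);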

Signed-off-by: Punit Agrawal <[email protected]>
Reviewed-by: Suzuki K Poulose <[email protected]>
Cc: Christoffer Dall <[email protected]>
Cc: Marc Zyngier <[email protected]>
---
virt/kvm/arm/mmu.c | 49 ++++++++++++++++++++++++++++------------------
1 file changed, 30 insertions(+), 19 deletions(-)

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 5eca48bdb1a6..59595207c5e1 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1475,7 +1475,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
unsigned long fault_status)
{
int ret;
- bool write_fault, exec_fault, writable, hugetlb = false, force_pte = false;
+ bool write_fault, exec_fault, writable, force_pte = false;
unsigned long mmu_seq;
gfn_t gfn = fault_ipa >> PAGE_SHIFT;
struct kvm *kvm = vcpu->kvm;
@@ -1484,7 +1484,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
kvm_pfn_t pfn;
pgprot_t mem_type = PAGE_S2;
bool logging_active = memslot_is_logging(memslot);
- unsigned long flags = 0;
+ unsigned long vma_pagesize, flags = 0;

write_fault = kvm_is_write_fault(vcpu);
exec_fault = kvm_vcpu_trap_is_iabt(vcpu);
@@ -1504,10 +1504,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
return -EFAULT;
}

- if (vma_kernel_pagesize(vma) == PMD_SIZE && !logging_active) {
- hugetlb = true;
+ vma_pagesize = vma_kernel_pagesize(vma);
+ if (vma_pagesize == PMD_SIZE && !logging_active) {
gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
} else {
+ /*
+ * Fallback to PTE if it's not one of the Stage 2
+ * supported hugepage sizes
+ */
+ vma_pagesize = PAGE_SIZE;
+
/*
* Pages belonging to memslots that don't have the same
* alignment for userspace and IPA cannot be mapped using
@@ -1573,23 +1579,33 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
if (mmu_notifier_retry(kvm, mmu_seq))
goto out_unlock;

- if (!hugetlb && !force_pte)
- hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
+ if (vma_pagesize == PAGE_SIZE && !force_pte) {
+ /*
+ * Only PMD_SIZE transparent hugepages(THP) are
+ * currently supported. This code will need to be
+ * updated to support other THP sizes.
+ */
+ if (transparent_hugepage_adjust(&pfn, &fault_ipa))
+ vma_pagesize = PMD_SIZE;
+ }
+
+ if (writable)
+ kvm_set_pfn_dirty(pfn);

- if (hugetlb) {
+ if (fault_status != FSC_PERM)
+ clean_dcache_guest_page(pfn, vma_pagesize);
+
+ if (exec_fault)
+ invalidate_icache_guest_page(pfn, vma_pagesize);
+
+ if (vma_pagesize == PMD_SIZE) {
pmd_t new_pmd = pfn_pmd(pfn, mem_type);
new_pmd = pmd_mkhuge(new_pmd);
- if (writable) {
+ if (writable)
new_pmd = kvm_s2pmd_mkwrite(new_pmd);
- kvm_set_pfn_dirty(pfn);
- }
-
- if (fault_status != FSC_PERM)
- clean_dcache_guest_page(pfn, PMD_SIZE);

if (exec_fault) {
new_pmd = kvm_s2pmd_mkexec(new_pmd);
- invalidate_icache_guest_page(pfn, PMD_SIZE);
} else if (fault_status == FSC_PERM) {
/* Preserve execute if XN was already cleared */
if (stage2_is_exec(kvm, fault_ipa))
@@ -1602,16 +1618,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,

if (writable) {
new_pte = kvm_s2pte_mkwrite(new_pte);
- kvm_set_pfn_dirty(pfn);
mark_page_dirty(kvm, gfn);
}

- if (fault_status != FSC_PERM)
- clean_dcache_guest_page(pfn, PAGE_SIZE);
-
if (exec_fault) {
new_pte = kvm_s2pte_mkexec(new_pte);
- invalidate_icache_guest_page(pfn, PAGE_SIZE);
} else if (fault_status == FSC_PERM) {
/* Preserve execute if XN was already cleared */
if (stage2_is_exec(kvm, fault_ipa))
--
2.19.1


2018-11-01 13:41:36

by Christoffer Dall

Subject: Re: [PATCH v9 6/8] KVM: arm64: Support handling access faults for PUD hugepages

On Wed, Oct 31, 2018 at 05:57:43PM +0000, Punit Agrawal wrote:
> In preparation for creating larger hugepages at Stage 2, extend the
> access fault handling at Stage 2 to support PUD hugepages when
> encountered.
>
> Provide trivial helpers for arm32 to allow sharing of code.
>
> Signed-off-by: Punit Agrawal <[email protected]>
> Reviewed-by: Suzuki K Poulose <[email protected]>
> Cc: Christoffer Dall <[email protected]>
> Cc: Marc Zyngier <[email protected]>
> Cc: Russell King <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> ---
> arch/arm/include/asm/kvm_mmu.h | 9 +++++++++
> arch/arm64/include/asm/kvm_mmu.h | 7 +++++++
> arch/arm64/include/asm/pgtable.h | 6 ++++++
> virt/kvm/arm/mmu.c | 22 +++++++++++-----------
> 4 files changed, 33 insertions(+), 11 deletions(-)
>
> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> index 839a619873d3..fea5e723e3ac 100644
> --- a/arch/arm/include/asm/kvm_mmu.h
> +++ b/arch/arm/include/asm/kvm_mmu.h
> @@ -85,6 +85,9 @@ void kvm_clear_hyp_idmap(void);
> #define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot)
> #define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot)
>
> +#define kvm_pud_pfn(pud) ({ BUG(); 0; })
> +
> +
> #define kvm_pmd_mkhuge(pmd) pmd_mkhuge(pmd)
>
> /*
> @@ -108,6 +111,12 @@ static inline bool kvm_s2pud_exec(pud_t *pud)
> return false;
> }
>
> +static inline pud_t kvm_s2pud_mkyoung(pud_t pud)
> +{
> + BUG();

nit: I think this should be WARN now.

> + return pud;
> +}
> +
> static inline pte_t kvm_s2pte_mkwrite(pte_t pte)
> {
> pte_val(pte) |= L_PTE_S2_RDWR;
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index c755b37b3f92..612032bbb428 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -187,6 +187,8 @@ void kvm_clear_hyp_idmap(void);
> #define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot)
> #define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot)
>
> +#define kvm_pud_pfn(pud) pud_pfn(pud)
> +
> #define kvm_pmd_mkhuge(pmd) pmd_mkhuge(pmd)
>
> static inline pte_t kvm_s2pte_mkwrite(pte_t pte)
> @@ -266,6 +268,11 @@ static inline bool kvm_s2pud_exec(pud_t *pudp)
> return !(READ_ONCE(pud_val(*pudp)) & PUD_S2_XN);
> }
>
> +static inline pud_t kvm_s2pud_mkyoung(pud_t pud)
> +{
> + return pud_mkyoung(pud);
> +}
> +
> #define hyp_pte_table_empty(ptep) kvm_page_empty(ptep)
>
> #ifdef __PAGETABLE_PMD_FOLDED
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 50b1ef8584c0..f51e2271e6a3 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -314,6 +314,11 @@ static inline pte_t pud_pte(pud_t pud)
> return __pte(pud_val(pud));
> }
>
> +static inline pud_t pte_pud(pte_t pte)
> +{
> + return __pud(pte_val(pte));
> +}
> +
> static inline pmd_t pud_pmd(pud_t pud)
> {
> return __pmd(pud_val(pud));
> @@ -381,6 +386,7 @@ static inline int pmd_protnone(pmd_t pmd)
> #define pfn_pmd(pfn,prot) __pmd(__phys_to_pmd_val((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot))
> #define mk_pmd(page,prot) pfn_pmd(page_to_pfn(page),prot)
>
> +#define pud_mkyoung(pud) pte_pud(pte_mkyoung(pud_pte(pud)))
> #define pud_write(pud) pte_write(pud_pte(pud))
>
> #define __pud_to_phys(pud) __pte_to_phys(pud_pte(pud))
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index 8e44dccd1b47..bd749601195f 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -1698,6 +1698,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> */
> static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
> {
> + pud_t *pud;
> pmd_t *pmd;
> pte_t *pte;
> kvm_pfn_t pfn;
> @@ -1707,24 +1708,23 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
>
> spin_lock(&vcpu->kvm->mmu_lock);
>
> - pmd = stage2_get_pmd(vcpu->kvm, NULL, fault_ipa);
> - if (!pmd || pmd_none(*pmd)) /* Nothing there */
> + if (!stage2_get_leaf_entry(vcpu->kvm, fault_ipa, &pud, &pmd, &pte))
> goto out;
>
> - if (pmd_thp_or_huge(*pmd)) { /* THP, HugeTLB */
> + if (pud) { /* HugeTLB */
> + *pud = kvm_s2pud_mkyoung(*pud);
> + pfn = kvm_pud_pfn(*pud);
> + pfn_valid = true;
> + } else if (pmd) { /* THP, HugeTLB */
> *pmd = pmd_mkyoung(*pmd);
> pfn = pmd_pfn(*pmd);
> pfn_valid = true;
> - goto out;
> + } else {
> + *pte = pte_mkyoung(*pte); /* Just a page... */
> + pfn = pte_pfn(*pte);
> + pfn_valid = true;
> }
>
> - pte = pte_offset_kernel(pmd, fault_ipa);
> - if (pte_none(*pte)) /* Nothing there either */
> - goto out;
> -
> - *pte = pte_mkyoung(*pte); /* Just a page... */
> - pfn = pte_pfn(*pte);
> - pfn_valid = true;
> out:
> spin_unlock(&vcpu->kvm->mmu_lock);
> if (pfn_valid)
> --
> 2.19.1
>

2018-11-01 14:05:13

by Christoffer Dall

Subject: Re: [PATCH v9 5/8] KVM: arm64: Support PUD hugepage in stage2_is_exec()

On Wed, Oct 31, 2018 at 05:57:42PM +0000, Punit Agrawal wrote:
> In preparation for creating PUD hugepages at stage 2, add support for
> detecting execute permissions on PUD page table entries. Faults due to
> lack of execute permissions on page table entries are used to perform
> i-cache invalidation on first execute.
>
> Provide trivial implementations of arm32 helpers to allow sharing of
> code.
>
> Signed-off-by: Punit Agrawal <[email protected]>
> Reviewed-by: Suzuki K Poulose <[email protected]>
> Cc: Christoffer Dall <[email protected]>
> Cc: Marc Zyngier <[email protected]>
> Cc: Russell King <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> ---
> arch/arm/include/asm/kvm_mmu.h | 6 +++
> arch/arm64/include/asm/kvm_mmu.h | 5 +++
> arch/arm64/include/asm/pgtable-hwdef.h | 2 +
> virt/kvm/arm/mmu.c | 53 +++++++++++++++++++++++---
> 4 files changed, 61 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> index 37bf85d39607..839a619873d3 100644
> --- a/arch/arm/include/asm/kvm_mmu.h
> +++ b/arch/arm/include/asm/kvm_mmu.h
> @@ -102,6 +102,12 @@ static inline bool kvm_s2pud_readonly(pud_t *pud)
> return false;
> }
>
> +static inline bool kvm_s2pud_exec(pud_t *pud)
> +{
> + BUG();

nit: I think this should be WARN() now :)

> + return false;
> +}
> +
> static inline pte_t kvm_s2pte_mkwrite(pte_t pte)
> {
> pte_val(pte) |= L_PTE_S2_RDWR;
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 8da6d1b2a196..c755b37b3f92 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -261,6 +261,11 @@ static inline bool kvm_s2pud_readonly(pud_t *pudp)
> return kvm_s2pte_readonly((pte_t *)pudp);
> }
>
> +static inline bool kvm_s2pud_exec(pud_t *pudp)
> +{
> + return !(READ_ONCE(pud_val(*pudp)) & PUD_S2_XN);
> +}
> +
> #define hyp_pte_table_empty(ptep) kvm_page_empty(ptep)
>
> #ifdef __PAGETABLE_PMD_FOLDED
> diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
> index 1d7d8da2ef9b..336e24cddc87 100644
> --- a/arch/arm64/include/asm/pgtable-hwdef.h
> +++ b/arch/arm64/include/asm/pgtable-hwdef.h
> @@ -193,6 +193,8 @@
> #define PMD_S2_RDWR (_AT(pmdval_t, 3) << 6) /* HAP[2:1] */
> #define PMD_S2_XN (_AT(pmdval_t, 2) << 53) /* XN[1:0] */
>
> +#define PUD_S2_XN (_AT(pudval_t, 2) << 53) /* XN[1:0] */
> +
> /*
> * Memory Attribute override for Stage-2 (MemAttr[3:0])
> */
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index 1c669c3c1208..8e44dccd1b47 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -1083,23 +1083,66 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
> return 0;
> }
>
> -static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
> +/*
> + * stage2_get_leaf_entry - walk the stage2 VM page tables and return
> + * true if a valid and present leaf-entry is found. A pointer to the
> + * leaf-entry is returned in the appropriate level variable - pudpp,
> + * pmdpp, ptepp.
> + */
> +static bool stage2_get_leaf_entry(struct kvm *kvm, phys_addr_t addr,
> + pud_t **pudpp, pmd_t **pmdpp, pte_t **ptepp)

Do we need this type madness or could this just return a u64 pointer
(NULL if nothing is found) and pass that to kvm_s2pte_exec (because we
know it's the same bit we need to check regardless of the pgtable level
on both arm and arm64)?

Or do we consider that bad for some reason?


Thanks,

Christoffer

> {
> + pud_t *pudp;
> pmd_t *pmdp;
> pte_t *ptep;
>
> - pmdp = stage2_get_pmd(kvm, NULL, addr);
> + *pudpp = NULL;
> + *pmdpp = NULL;
> + *ptepp = NULL;
> +
> + pudp = stage2_get_pud(kvm, NULL, addr);
> + if (!pudp || stage2_pud_none(kvm, *pudp) || !stage2_pud_present(kvm, *pudp))
> + return false;
> +
> + if (stage2_pud_huge(kvm, *pudp)) {
> + *pudpp = pudp;
> + return true;
> + }
> +
> + pmdp = stage2_pmd_offset(kvm, pudp, addr);
> if (!pmdp || pmd_none(*pmdp) || !pmd_present(*pmdp))
> return false;
>
> - if (pmd_thp_or_huge(*pmdp))
> - return kvm_s2pmd_exec(pmdp);
> + if (pmd_thp_or_huge(*pmdp)) {
> + *pmdpp = pmdp;
> + return true;
> + }
>
> ptep = pte_offset_kernel(pmdp, addr);
> if (!ptep || pte_none(*ptep) || !pte_present(*ptep))
> return false;
>
> - return kvm_s2pte_exec(ptep);
> + *ptepp = ptep;
> + return true;
> +}
> +
> +static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
> +{
> + pud_t *pudp;
> + pmd_t *pmdp;
> + pte_t *ptep;
> + bool found;
> +
> + found = stage2_get_leaf_entry(kvm, addr, &pudp, &pmdp, &ptep);
> + if (!found)
> + return false;
> +
> + if (pudp)
> + return kvm_s2pud_exec(pudp);
> + else if (pmdp)
> + return kvm_s2pmd_exec(pmdp);
> + else
> + return kvm_s2pte_exec(ptep);
> }
>
> static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
> --
> 2.19.1
>

2018-12-03 12:13:58

by Anshuman Khandual

Subject: Re: [PATCH v9 1/8] KVM: arm/arm64: Share common code in user_mem_abort()



On 10/31/2018 11:27 PM, Punit Agrawal wrote:
> The code for operations such as marking the pfn as dirty, and
> dcache/icache maintenance during stage 2 fault handling is duplicated
> between normal pages and PMD hugepages.
>
> Instead of creating another copy of the operations when we introduce
> PUD hugepages, let's share them across the different pagesizes.
>
> Signed-off-by: Punit Agrawal <[email protected]>
> Reviewed-by: Suzuki K Poulose <[email protected]>
> Cc: Christoffer Dall <[email protected]>
> Cc: Marc Zyngier <[email protected]>
> ---
> virt/kvm/arm/mmu.c | 49 ++++++++++++++++++++++++++++------------------
> 1 file changed, 30 insertions(+), 19 deletions(-)
>
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index 5eca48bdb1a6..59595207c5e1 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -1475,7 +1475,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> unsigned long fault_status)
> {
> int ret;
> - bool write_fault, exec_fault, writable, hugetlb = false, force_pte = false;
> + bool write_fault, exec_fault, writable, force_pte = false;
> unsigned long mmu_seq;
> gfn_t gfn = fault_ipa >> PAGE_SHIFT;
> struct kvm *kvm = vcpu->kvm;
> @@ -1484,7 +1484,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> kvm_pfn_t pfn;
> pgprot_t mem_type = PAGE_S2;
> bool logging_active = memslot_is_logging(memslot);
> - unsigned long flags = 0;
> + unsigned long vma_pagesize, flags = 0;

A small nit: s/vma_pagesize/pagesize. Why call it VMA? It's implicit.

>
> write_fault = kvm_is_write_fault(vcpu);
> exec_fault = kvm_vcpu_trap_is_iabt(vcpu);
> @@ -1504,10 +1504,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> return -EFAULT;
> }
>
> - if (vma_kernel_pagesize(vma) == PMD_SIZE && !logging_active) {
> - hugetlb = true;
> + vma_pagesize = vma_kernel_pagesize(vma);
> + if (vma_pagesize == PMD_SIZE && !logging_active) {
> gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
> } else {
> + /*
> + * Fallback to PTE if it's not one of the Stage 2
> + * supported hugepage sizes
> + */
> + vma_pagesize = PAGE_SIZE;

This seems redundant and should be dropped. vma_kernel_pagesize() here either
calls hugetlb_vm_op_pagesize (via hugetlb_vm_ops->pagesize) or simply returns
PAGE_SIZE. The vm_ops path is taken if the QEMU VMA covering any given HVA is
backed either by HugeTLB pages or simply normal pages. vma_pagesize will
either have a value of PMD_SIZE (HugeTLB hstate based) or PAGE_SIZE. Hence if
it's not PMD_SIZE it must be PAGE_SIZE, which should not be assigned again.

> +
> /*
> * Pages belonging to memslots that don't have the same
> * alignment for userspace and IPA cannot be mapped using
> @@ -1573,23 +1579,33 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> if (mmu_notifier_retry(kvm, mmu_seq))
> goto out_unlock;
>
> - if (!hugetlb && !force_pte)
> - hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
> + if (vma_pagesize == PAGE_SIZE && !force_pte) {
> + /*
> + * Only PMD_SIZE transparent hugepages(THP) are
> + * currently supported. This code will need to be
> + * updated to support other THP sizes.
> + */

This comment belongs in transparent_hugepage_adjust(), not here.

> + if (transparent_hugepage_adjust(&pfn, &fault_ipa))
> + vma_pagesize = PMD_SIZE;

IIUC transparent_hugepage_adjust() is only getting called here. Instead of
returning 'true' when it is able to detect a huge page backing and doing
an adjustment thereafter, it should rather return the THP size (PMD_SIZE) to
accommodate probable multi-size THP support in the future.

> + }
> +
> + if (writable)
> + kvm_set_pfn_dirty(pfn);
>
> - if (hugetlb) {
> + if (fault_status != FSC_PERM)
> + clean_dcache_guest_page(pfn, vma_pagesize);
> +
> + if (exec_fault)
> + invalidate_icache_guest_page(pfn, vma_pagesize);
> +
> + if (vma_pagesize == PMD_SIZE) {
> pmd_t new_pmd = pfn_pmd(pfn, mem_type);
> new_pmd = pmd_mkhuge(new_pmd);
> - if (writable) {
> + if (writable)
> new_pmd = kvm_s2pmd_mkwrite(new_pmd);
> - kvm_set_pfn_dirty(pfn);
> - }
> -
> - if (fault_status != FSC_PERM)
> - clean_dcache_guest_page(pfn, PMD_SIZE);
>
> if (exec_fault) {
> new_pmd = kvm_s2pmd_mkexec(new_pmd);
> - invalidate_icache_guest_page(pfn, PMD_SIZE);
> } else if (fault_status == FSC_PERM) {
> /* Preserve execute if XN was already cleared */
> if (stage2_is_exec(kvm, fault_ipa))
> @@ -1602,16 +1618,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>
> if (writable) {
> new_pte = kvm_s2pte_mkwrite(new_pte);
> - kvm_set_pfn_dirty(pfn);
> mark_page_dirty(kvm, gfn);
> }
>
> - if (fault_status != FSC_PERM)
> - clean_dcache_guest_page(pfn, PAGE_SIZE);
> -
> if (exec_fault) {
> new_pte = kvm_s2pte_mkexec(new_pte);
> - invalidate_icache_guest_page(pfn, PAGE_SIZE);
> } else if (fault_status == FSC_PERM) {
> /* Preserve execute if XN was already cleared */
> if (stage2_is_exec(kvm, fault_ipa))
>

kvm_set_pfn_dirty(), clean_dcache_guest_page() and invalidate_icache_guest_page()
can all be safely moved before setting the page table entries, either as
PMD or PTE.

2018-12-03 13:33:58

by Anshuman Khandual

Subject: Re: [PATCH v9 2/8] KVM: arm/arm64: Re-factor setting the Stage 2 entry to exec on fault



On 10/31/2018 11:27 PM, Punit Agrawal wrote:
> The stage 2 fault handler marks a page as executable if it is handling
> an execution fault, or if it is handling a permission fault on a page
> that was already executable, in which case the executable bit needs to
> be preserved.
>
> The logic to decide if the page should be marked executable is
> duplicated for PMD and PTE entries. To avoid creating another copy
> when support for PUD hugepages is introduced, refactor the code to
> share the checks needed to mark a page table entry as executable.
>
> Signed-off-by: Punit Agrawal <[email protected]>
> Reviewed-by: Suzuki K Poulose <[email protected]>
> Cc: Christoffer Dall <[email protected]>
> Cc: Marc Zyngier <[email protected]>
> ---
> virt/kvm/arm/mmu.c | 28 +++++++++++++++-------------
> 1 file changed, 15 insertions(+), 13 deletions(-)
>
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index 59595207c5e1..6912529946fb 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -1475,7 +1475,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> unsigned long fault_status)
> {
> int ret;
> - bool write_fault, exec_fault, writable, force_pte = false;
> + bool write_fault, writable, force_pte = false;
> + bool exec_fault, needs_exec;

New line not required, still within 80 characters.

> unsigned long mmu_seq;
> gfn_t gfn = fault_ipa >> PAGE_SHIFT;
> struct kvm *kvm = vcpu->kvm;
> @@ -1598,19 +1599,25 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> if (exec_fault)
> invalidate_icache_guest_page(pfn, vma_pagesize);
>
> + /*
> + * If we took an execution fault we have made the
> + * icache/dcache coherent above and should now let the s2

Coherent, or invalidated with invalidate_icache_guest_page()?

> + * mapping be executable.
> + *
> + * Write faults (!exec_fault && FSC_PERM) are orthogonal to
> + * execute permissions, and we preserve whatever we have.
> + */

Otherwise looks good.

2018-12-03 13:40:03

by Suzuki K Poulose

[permalink] [raw]
Subject: Re: [PATCH v9 1/8] KVM: arm/arm64: Share common code in user_mem_abort()

Hi Anshuman,

On 03/12/2018 12:11, Anshuman Khandual wrote:
>
>
> On 10/31/2018 11:27 PM, Punit Agrawal wrote:
>> The code for operations such as marking the pfn as dirty, and
>> dcache/icache maintenance during stage 2 fault handling is duplicated
>> between normal pages and PMD hugepages.
>>
>> Instead of creating another copy of the operations when we introduce
>> PUD hugepages, let's share them across the different pagesizes.
>>
>> Signed-off-by: Punit Agrawal <[email protected]>
>> Reviewed-by: Suzuki K Poulose <[email protected]>
>> Cc: Christoffer Dall <[email protected]>
>> Cc: Marc Zyngier <[email protected]>
>> ---
>> virt/kvm/arm/mmu.c | 49 ++++++++++++++++++++++++++++------------------
>> 1 file changed, 30 insertions(+), 19 deletions(-)
>>
>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>> index 5eca48bdb1a6..59595207c5e1 100644
>> --- a/virt/kvm/arm/mmu.c
>> +++ b/virt/kvm/arm/mmu.c
>> @@ -1475,7 +1475,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>> unsigned long fault_status)
>> {
>> int ret;
>> - bool write_fault, exec_fault, writable, hugetlb = false, force_pte = false;
>> + bool write_fault, exec_fault, writable, force_pte = false;
>> unsigned long mmu_seq;
>> gfn_t gfn = fault_ipa >> PAGE_SHIFT;
>> struct kvm *kvm = vcpu->kvm;
>> @@ -1484,7 +1484,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>> kvm_pfn_t pfn;
>> pgprot_t mem_type = PAGE_S2;
>> bool logging_active = memslot_is_logging(memslot);
>> - unsigned long flags = 0;
>> + unsigned long vma_pagesize, flags = 0;
>
> A small nit s/vma_pagesize/pagesize. Why call it VMA ? Its implicit.

Maybe we could call it mapsize. pagesize is confusing.

>
>>
>> write_fault = kvm_is_write_fault(vcpu);
>> exec_fault = kvm_vcpu_trap_is_iabt(vcpu);
>> @@ -1504,10 +1504,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>> return -EFAULT;
>> }
>>
>> - if (vma_kernel_pagesize(vma) == PMD_SIZE && !logging_active) {
>> - hugetlb = true;
>> + vma_pagesize = vma_kernel_pagesize(vma);
>> + if (vma_pagesize == PMD_SIZE && !logging_active) {
>> gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
>> } else {
>> + /*
>> + * Fallback to PTE if it's not one of the Stage 2
>> + * supported hugepage sizes
>> + */
>> + vma_pagesize = PAGE_SIZE;
>
> This seems redundant and should be dropped. vma_kernel_pagesize() here either
> calls hugetlb_vm_op_pagesize (via hugetlb_vm_ops->pagesize) or simply returns
> PAGE_SIZE. The vm_ops path is taken if the QEMU VMA covering any given HVA is
> backed either by HugeTLB pages or simply normal pages. vma_pagesize would
> either has a value of PMD_SIZE (HugeTLB hstate based) or PAGE_SIZE. Hence if
> its not PMD_SIZE it must be PAGE_SIZE which should not be assigned again.

We may want to force using PTE mappings when logging_active (e.g., during
migration) to avoid having to track huge pages. So the check is still valid.


>
>> +
>> /*
>> * Pages belonging to memslots that don't have the same
>> * alignment for userspace and IPA cannot be mapped using
>> @@ -1573,23 +1579,33 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>> if (mmu_notifier_retry(kvm, mmu_seq))
>> goto out_unlock;
>>
>> - if (!hugetlb && !force_pte)
>> - hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
>> + if (vma_pagesize == PAGE_SIZE && !force_pte) {
>> + /*
>> + * Only PMD_SIZE transparent hugepages(THP) are
>> + * currently supported. This code will need to be
>> + * updated to support other THP sizes.
>> + */
>
> This comment belongs to transparent_hugepage_adjust() but not here.

I think this is more relevant here than in thp_adjust, unless we rename
the function below to something generic, e.g., handle_hugepage_adjust().

>> + if (transparent_hugepage_adjust(&pfn, &fault_ipa))
>> + vma_pagesize = PMD_SIZE;
>
> IIUC transparent_hugepage_adjust() is only getting called here. Instead of
> returning 'true' when it is able to detect a huge page backing and doing
> an adjustment there after, it should rather return THP size (PMD_SIZE) to
> accommodate probable multi size THP support in future .

That makes sense.

>
>> + }
>> +
>> + if (writable)
>> + kvm_set_pfn_dirty(pfn);
>>
>> - if (hugetlb) {
>> + if (fault_status != FSC_PERM)
>> + clean_dcache_guest_page(pfn, vma_pagesize);
>> +
>> + if (exec_fault)
>> + invalidate_icache_guest_page(pfn, vma_pagesize);
>> +
>> + if (vma_pagesize == PMD_SIZE) {
>> pmd_t new_pmd = pfn_pmd(pfn, mem_type);
>> new_pmd = pmd_mkhuge(new_pmd);
>> - if (writable) {
>> + if (writable)
>> new_pmd = kvm_s2pmd_mkwrite(new_pmd);
>> - kvm_set_pfn_dirty(pfn);
>> - }
>> -
>> - if (fault_status != FSC_PERM)
>> - clean_dcache_guest_page(pfn, PMD_SIZE);
>>
>> if (exec_fault) {
>> new_pmd = kvm_s2pmd_mkexec(new_pmd);
>> - invalidate_icache_guest_page(pfn, PMD_SIZE);
>> } else if (fault_status == FSC_PERM) {
>> /* Preserve execute if XN was already cleared */
>> if (stage2_is_exec(kvm, fault_ipa))
>> @@ -1602,16 +1618,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>
>> if (writable) {
>> new_pte = kvm_s2pte_mkwrite(new_pte);
>> - kvm_set_pfn_dirty(pfn);
>> mark_page_dirty(kvm, gfn);
>> }
>>
>> - if (fault_status != FSC_PERM)
>> - clean_dcache_guest_page(pfn, PAGE_SIZE);
>> -
>> if (exec_fault) {
>> new_pte = kvm_s2pte_mkexec(new_pte);
>> - invalidate_icache_guest_page(pfn, PAGE_SIZE);
>> } else if (fault_status == FSC_PERM) {
>> /* Preserve execute if XN was already cleared */
>> if (stage2_is_exec(kvm, fault_ipa))
>>
>
> kvm_set_pfn_dirty, clean_dcache_guest_page, invalidate_icache_guest_page
> can all be safely moved before setting the page table entries either as
> PMD or PTE.

I think this is what we do currently. So I assume this is fine.

Cheers
Suzuki

2018-12-03 13:51:07

by Anshuman Khandual

[permalink] [raw]
Subject: Re: [PATCH v9 3/8] KVM: arm/arm64: Introduce helpers to manipulate page table entries



On 10/31/2018 11:27 PM, Punit Agrawal wrote:
> Introduce helpers to abstract architectural handling of the conversion
> of pfn to page table entries and marking a PMD page table entry as a
> block entry.

Why is this necessary? We would still need to access the PMD and PUD as is
without any conversion. IOW, KVM knows the breakdown of the page table
at the various levels. Is this something required by generic KVM code?

>
> The helpers are introduced in preparation for supporting PUD hugepages
> at stage 2 - which are supported on arm64 but do not exist on arm.

Some of these patches (including the earlier two) are good on their
own. Do we still need to mention the incoming PUD enablement in the
commit message as the reason for these cleanup patches?

2018-12-03 14:05:47

by Suzuki K Poulose

[permalink] [raw]
Subject: Re: [PATCH v9 3/8] KVM: arm/arm64: Introduce helpers to manipulate page table entries

Hi Anshuman,

On 03/12/2018 13:50, Anshuman Khandual wrote:
>
>
> On 10/31/2018 11:27 PM, Punit Agrawal wrote:
>> Introduce helpers to abstract architectural handling of the conversion
>> of pfn to page table entries and marking a PMD page table entry as a
>> block entry.
>
> Why is this necessary ? we would still need to access PMD, PUD as is
> without any conversion. IOW KVM knows the breakdown of the page table
> at various levels. Is this something required from generic KVM code ?

The KVM MMU code is shared between arm32 and arm64. Hence we need arch-specific
helpers to hide how we deal with the levels, e.g., the PUD level never exists
on arm32.

>
>>
>> The helpers are introduced in preparation for supporting PUD hugepages
>> at stage 2 - which are supported on arm64 but do not exist on arm.
>
> Some of these patches (including the earlier two) are good on it's
> own. Do we have still mention in commit message about the incoming PUD
> enablement as the reason for these cleanup patches ?
>

Cheers
Suzuki

2018-12-03 14:19:34

by Anshuman Khandual

[permalink] [raw]
Subject: Re: [PATCH v9 4/8] KVM: arm64: Support dirty page tracking for PUD hugepages



On 10/31/2018 11:27 PM, Punit Agrawal wrote:
> In preparation for creating PUD hugepages at stage 2, add support for
> write protecting PUD hugepages when they are encountered. Write
> protecting guest tables is used to track dirty pages when migrating
> VMs.
>
> Also, provide trivial implementations of required kvm_s2pud_* helpers
> to allow sharing of code with arm32.
>
> Signed-off-by: Punit Agrawal <[email protected]>
> Reviewed-by: Christoffer Dall <[email protected]>
> Reviewed-by: Suzuki K Poulose <[email protected]>
> Cc: Marc Zyngier <[email protected]>
> Cc: Russell King <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> ---
> arch/arm/include/asm/kvm_mmu.h | 15 +++++++++++++++
> arch/arm64/include/asm/kvm_mmu.h | 10 ++++++++++
> virt/kvm/arm/mmu.c | 11 +++++++----
> 3 files changed, 32 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> index e6eff8bf5d7f..37bf85d39607 100644
> --- a/arch/arm/include/asm/kvm_mmu.h
> +++ b/arch/arm/include/asm/kvm_mmu.h
> @@ -87,6 +87,21 @@ void kvm_clear_hyp_idmap(void);
>
> #define kvm_pmd_mkhuge(pmd) pmd_mkhuge(pmd)
>
> +/*
> + * The following kvm_*pud*() functions are provided strictly to allow
> + * sharing code with arm64. They should never be called in practice.
> + */
> +static inline void kvm_set_s2pud_readonly(pud_t *pud)
> +{
> + BUG();
> +}
> +
> +static inline bool kvm_s2pud_readonly(pud_t *pud)
> +{
> + BUG();
> + return false;
> +}

This makes sense, as arm32 does not support direct manipulation of PUD entries.

> +
> static inline pte_t kvm_s2pte_mkwrite(pte_t pte)
> {
> pte_val(pte) |= L_PTE_S2_RDWR;
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 13d482710292..8da6d1b2a196 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -251,6 +251,16 @@ static inline bool kvm_s2pmd_exec(pmd_t *pmdp)
> return !(READ_ONCE(pmd_val(*pmdp)) & PMD_S2_XN);
> }
>
> +static inline void kvm_set_s2pud_readonly(pud_t *pudp)
> +{
> + kvm_set_s2pte_readonly((pte_t *)pudp);
> +}
> +
> +static inline bool kvm_s2pud_readonly(pud_t *pudp)
> +{
> + return kvm_s2pte_readonly((pte_t *)pudp);
> +}
> +
> #define hyp_pte_table_empty(ptep) kvm_page_empty(ptep)
>
> #ifdef __PAGETABLE_PMD_FOLDED
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index fb5325f7a1ac..1c669c3c1208 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -1347,9 +1347,12 @@ static void stage2_wp_puds(struct kvm *kvm, pgd_t *pgd,
> do {
> next = stage2_pud_addr_end(kvm, addr, end);
> if (!stage2_pud_none(kvm, *pud)) {
> - /* TODO:PUD not supported, revisit later if supported */
> - BUG_ON(stage2_pud_huge(kvm, *pud));
> - stage2_wp_pmds(kvm, pud, addr, next);
> + if (stage2_pud_huge(kvm, *pud)) {
> + if (!kvm_s2pud_readonly(pud))
> + kvm_set_s2pud_readonly(pud);
> + } else {
> + stage2_wp_pmds(kvm, pud, addr, next);
> + }

As this series is enabling PUD-related changes in multiple places, it is
reasonable to enable PGD-level support as well, even if it might not be
used much at the moment. I don't see much extra code needed to enable PGD,
so why not? Even just to make the HugeTLB support matrix complete.

2018-12-03 14:22:22

by Suzuki K Poulose

[permalink] [raw]
Subject: Re: [PATCH v9 4/8] KVM: arm64: Support dirty page tracking for PUD hugepages



On 03/12/2018 14:17, Anshuman Khandual wrote:
>
>
> On 10/31/2018 11:27 PM, Punit Agrawal wrote:
>> In preparation for creating PUD hugepages at stage 2, add support for
>> write protecting PUD hugepages when they are encountered. Write
>> protecting guest tables is used to track dirty pages when migrating
>> VMs.
>>
>> Also, provide trivial implementations of required kvm_s2pud_* helpers
>> to allow sharing of code with arm32.
>>
>> Signed-off-by: Punit Agrawal <[email protected]>
>> Reviewed-by: Christoffer Dall <[email protected]>
>> Reviewed-by: Suzuki K Poulose <[email protected]>
>> Cc: Marc Zyngier <[email protected]>
>> Cc: Russell King <[email protected]>
>> Cc: Catalin Marinas <[email protected]>
>> Cc: Will Deacon <[email protected]>
>> ---
>> arch/arm/include/asm/kvm_mmu.h | 15 +++++++++++++++
>> arch/arm64/include/asm/kvm_mmu.h | 10 ++++++++++
>> virt/kvm/arm/mmu.c | 11 +++++++----
>> 3 files changed, 32 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
>> index e6eff8bf5d7f..37bf85d39607 100644
>> --- a/arch/arm/include/asm/kvm_mmu.h
>> +++ b/arch/arm/include/asm/kvm_mmu.h
>> @@ -87,6 +87,21 @@ void kvm_clear_hyp_idmap(void);
>>
>> #define kvm_pmd_mkhuge(pmd) pmd_mkhuge(pmd)
>>
>> +/*
>> + * The following kvm_*pud*() functions are provided strictly to allow
>> + * sharing code with arm64. They should never be called in practice.
>> + */
>> +static inline void kvm_set_s2pud_readonly(pud_t *pud)
>> +{
>> + BUG();
>> +}
>> +
>> +static inline bool kvm_s2pud_readonly(pud_t *pud)
>> +{
>> + BUG();
>> + return false;
>> +}
>
> As arm32 does not support direct manipulation of PUD entries.
>
>> +
>> static inline pte_t kvm_s2pte_mkwrite(pte_t pte)
>> {
>> pte_val(pte) |= L_PTE_S2_RDWR;
>> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
>> index 13d482710292..8da6d1b2a196 100644
>> --- a/arch/arm64/include/asm/kvm_mmu.h
>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>> @@ -251,6 +251,16 @@ static inline bool kvm_s2pmd_exec(pmd_t *pmdp)
>> return !(READ_ONCE(pmd_val(*pmdp)) & PMD_S2_XN);
>> }
>>
>> +static inline void kvm_set_s2pud_readonly(pud_t *pudp)
>> +{
>> + kvm_set_s2pte_readonly((pte_t *)pudp);
>> +}
>> +
>> +static inline bool kvm_s2pud_readonly(pud_t *pudp)
>> +{
>> + return kvm_s2pte_readonly((pte_t *)pudp);
>> +}
>> +
>> #define hyp_pte_table_empty(ptep) kvm_page_empty(ptep)
>>
>> #ifdef __PAGETABLE_PMD_FOLDED
>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>> index fb5325f7a1ac..1c669c3c1208 100644
>> --- a/virt/kvm/arm/mmu.c
>> +++ b/virt/kvm/arm/mmu.c
>> @@ -1347,9 +1347,12 @@ static void stage2_wp_puds(struct kvm *kvm, pgd_t *pgd,
>> do {
>> next = stage2_pud_addr_end(kvm, addr, end);
>> if (!stage2_pud_none(kvm, *pud)) {
>> - /* TODO:PUD not supported, revisit later if supported */
>> - BUG_ON(stage2_pud_huge(kvm, *pud));
>> - stage2_wp_pmds(kvm, pud, addr, next);
>> + if (stage2_pud_huge(kvm, *pud)) {
>> + if (!kvm_s2pud_readonly(pud))
>> + kvm_set_s2pud_readonly(pud);
>> + } else {
>> + stage2_wp_pmds(kvm, pud, addr, next);
>> + }
>
> As this series is enabling PUD related changes in multiple places, it is
> reasonable to enable PGD level support as well even if it might not be
> used much at the moment. I dont see much extra code to enable PGD, then
> why not ? Even just to make the HugeTLB support matrix complete.

Architecturally, we have hugepages at PUD/PMD (level 1 and level 2). PGD
is a variable level which could be L0, or L1 (when the PUD is folded), or L2
(when the PMD is folded). So if we enable PUD/PMD hugepages, that makes sure
that PGD hugepages (i.e., when one of the levels is folded) are supported.
We don't have L0 huge pages on arm64. Does that make sense?

Cheers
Suzuki



>

2018-12-03 14:38:25

by Anshuman Khandual

[permalink] [raw]
Subject: Re: [PATCH v9 5/8] KVM: arm64: Support PUD hugepage in stage2_is_exec()



On 10/31/2018 11:27 PM, Punit Agrawal wrote:
> In preparation for creating PUD hugepages at stage 2, add support for
> detecting execute permissions on PUD page table entries. Faults due to
> lack of execute permissions on page table entries is used to perform
> i-cache invalidation on first execute.
>
> Provide trivial implementations of arm32 helpers to allow sharing of
> code.
>
> Signed-off-by: Punit Agrawal <[email protected]>
> Reviewed-by: Suzuki K Poulose <[email protected]>
> Cc: Christoffer Dall <[email protected]>
> Cc: Marc Zyngier <[email protected]>
> Cc: Russell King <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> ---
> arch/arm/include/asm/kvm_mmu.h | 6 +++
> arch/arm64/include/asm/kvm_mmu.h | 5 +++
> arch/arm64/include/asm/pgtable-hwdef.h | 2 +
> virt/kvm/arm/mmu.c | 53 +++++++++++++++++++++++---
> 4 files changed, 61 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> index 37bf85d39607..839a619873d3 100644
> --- a/arch/arm/include/asm/kvm_mmu.h
> +++ b/arch/arm/include/asm/kvm_mmu.h
> @@ -102,6 +102,12 @@ static inline bool kvm_s2pud_readonly(pud_t *pud)
> return false;
> }
>
> +static inline bool kvm_s2pud_exec(pud_t *pud)
> +{
> + BUG();
> + return false;
> +}
> +
> static inline pte_t kvm_s2pte_mkwrite(pte_t pte)
> {
> pte_val(pte) |= L_PTE_S2_RDWR;
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 8da6d1b2a196..c755b37b3f92 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -261,6 +261,11 @@ static inline bool kvm_s2pud_readonly(pud_t *pudp)
> return kvm_s2pte_readonly((pte_t *)pudp);
> }
>
> +static inline bool kvm_s2pud_exec(pud_t *pudp)
> +{
> + return !(READ_ONCE(pud_val(*pudp)) & PUD_S2_XN);
> +}
> +
> #define hyp_pte_table_empty(ptep) kvm_page_empty(ptep)
>
> #ifdef __PAGETABLE_PMD_FOLDED
> diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
> index 1d7d8da2ef9b..336e24cddc87 100644
> --- a/arch/arm64/include/asm/pgtable-hwdef.h
> +++ b/arch/arm64/include/asm/pgtable-hwdef.h
> @@ -193,6 +193,8 @@
> #define PMD_S2_RDWR (_AT(pmdval_t, 3) << 6) /* HAP[2:1] */
> #define PMD_S2_XN (_AT(pmdval_t, 2) << 53) /* XN[1:0] */
>
> +#define PUD_S2_XN (_AT(pudval_t, 2) << 53) /* XN[1:0] */
> +
> /*
> * Memory Attribute override for Stage-2 (MemAttr[3:0])
> */
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index 1c669c3c1208..8e44dccd1b47 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -1083,23 +1083,66 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
> return 0;
> }
>
> -static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
> +/*
> + * stage2_get_leaf_entry - walk the stage2 VM page tables and return
> + * true if a valid and present leaf-entry is found. A pointer to the
> + * leaf-entry is returned in the appropriate level variable - pudpp,
> + * pmdpp, ptepp.
> + */
> +static bool stage2_get_leaf_entry(struct kvm *kvm, phys_addr_t addr,
> + pud_t **pudpp, pmd_t **pmdpp, pte_t **ptepp)
> {
> + pud_t *pudp;
> pmd_t *pmdp;
> pte_t *ptep;
>
> - pmdp = stage2_get_pmd(kvm, NULL, addr);
> + *pudpp = NULL;
> + *pmdpp = NULL;
> + *ptepp = NULL;
> +
> + pudp = stage2_get_pud(kvm, NULL, addr);
> + if (!pudp || stage2_pud_none(kvm, *pudp) || !stage2_pud_present(kvm, *pudp))
> + return false;
> +
> + if (stage2_pud_huge(kvm, *pudp)) {
> + *pudpp = pudp;
> + return true;
> + }
> +
> + pmdp = stage2_pmd_offset(kvm, pudp, addr);
> if (!pmdp || pmd_none(*pmdp) || !pmd_present(*pmdp))
> return false;
>
> - if (pmd_thp_or_huge(*pmdp))
> - return kvm_s2pmd_exec(pmdp);
> + if (pmd_thp_or_huge(*pmdp)) {
> + *pmdpp = pmdp;
> + return true;
> + }
>
> ptep = pte_offset_kernel(pmdp, addr);
> if (!ptep || pte_none(*ptep) || !pte_present(*ptep))
> return false;
>
> - return kvm_s2pte_exec(ptep);
> + *ptepp = ptep;
> + return true;
> +}
> +
> +static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
> +{
> + pud_t *pudp;
> + pmd_t *pmdp;
> + pte_t *ptep;
> + bool found;
> +
> + found = stage2_get_leaf_entry(kvm, addr, &pudp, &pmdp, &ptep);
> + if (!found)
> + return false;
> +
> + if (pudp)
> + return kvm_s2pud_exec(pudp);
> + else if (pmdp)
> + return kvm_s2pmd_exec(pmdp);
> + else
> + return kvm_s2pte_exec(ptep);
> }

stage2_get_leaf_entry() is not really necessary as a separate function.
It determines the leaf entry and just returns true/false. At those 'true'
return points it could just return kvm_s2XXX_exec() directly. Passing
three different pointers as arguments, checking them for being non-NULL
upon return and then doing a simple return seems like a lot of machinery
without much reason. stage2_is_exec() can just be expanded to add PUD
support.
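
E.g., an untested sketch, reusing the walkers from the hunk above:

static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
{
	pud_t *pudp;
	pmd_t *pmdp;
	pte_t *ptep;

	/* Walk down and return the exec check as soon as a leaf is found */
	pudp = stage2_get_pud(kvm, NULL, addr);
	if (!pudp || stage2_pud_none(kvm, *pudp) ||
	    !stage2_pud_present(kvm, *pudp))
		return false;

	if (stage2_pud_huge(kvm, *pudp))
		return kvm_s2pud_exec(pudp);

	pmdp = stage2_pmd_offset(kvm, pudp, addr);
	if (!pmdp || pmd_none(*pmdp) || !pmd_present(*pmdp))
		return false;

	if (pmd_thp_or_huge(*pmdp))
		return kvm_s2pmd_exec(pmdp);

	ptep = pte_offset_kernel(pmdp, addr);
	if (!ptep || pte_none(*ptep) || !pte_present(*ptep))
		return false;

	return kvm_s2pte_exec(ptep);
}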

2018-12-03 15:13:21

by Anshuman Khandual

[permalink] [raw]
Subject: Re: [PATCH v9 6/8] KVM: arm64: Support handling access faults for PUD hugepages



On 10/31/2018 11:27 PM, Punit Agrawal wrote:
> In preparation for creating larger hugepages at Stage 2, extend the
> access fault handling at Stage 2 to support PUD hugepages when
> encountered.
>
> Provide trivial helpers for arm32 to allow sharing of code.
>
> Signed-off-by: Punit Agrawal <[email protected]>
> Reviewed-by: Suzuki K Poulose <[email protected]>
> Cc: Christoffer Dall <[email protected]>
> Cc: Marc Zyngier <[email protected]>
> Cc: Russell King <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> ---
> arch/arm/include/asm/kvm_mmu.h | 9 +++++++++
> arch/arm64/include/asm/kvm_mmu.h | 7 +++++++
> arch/arm64/include/asm/pgtable.h | 6 ++++++
> virt/kvm/arm/mmu.c | 22 +++++++++++-----------
> 4 files changed, 33 insertions(+), 11 deletions(-)
>
> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> index 839a619873d3..fea5e723e3ac 100644
> --- a/arch/arm/include/asm/kvm_mmu.h
> +++ b/arch/arm/include/asm/kvm_mmu.h
> @@ -85,6 +85,9 @@ void kvm_clear_hyp_idmap(void);
> #define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot)
> #define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot)
>
> +#define kvm_pud_pfn(pud) ({ BUG(); 0; })
> +
> +
> #define kvm_pmd_mkhuge(pmd) pmd_mkhuge(pmd)
>
> /*
> @@ -108,6 +111,12 @@ static inline bool kvm_s2pud_exec(pud_t *pud)
> return false;
> }
>
> +static inline pud_t kvm_s2pud_mkyoung(pud_t pud)
> +{
> + BUG();
> + return pud;
> +}
> +
> static inline pte_t kvm_s2pte_mkwrite(pte_t pte)
> {
> pte_val(pte) |= L_PTE_S2_RDWR;
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index c755b37b3f92..612032bbb428 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -187,6 +187,8 @@ void kvm_clear_hyp_idmap(void);
> #define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot)
> #define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot)
>
> +#define kvm_pud_pfn(pud) pud_pfn(pud)
> +
> #define kvm_pmd_mkhuge(pmd) pmd_mkhuge(pmd)
>
> static inline pte_t kvm_s2pte_mkwrite(pte_t pte)
> @@ -266,6 +268,11 @@ static inline bool kvm_s2pud_exec(pud_t *pudp)
> return !(READ_ONCE(pud_val(*pudp)) & PUD_S2_XN);
> }
>
> +static inline pud_t kvm_s2pud_mkyoung(pud_t pud)
> +{
> + return pud_mkyoung(pud);
> +}
> +
> #define hyp_pte_table_empty(ptep) kvm_page_empty(ptep)
>
> #ifdef __PAGETABLE_PMD_FOLDED
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 50b1ef8584c0..f51e2271e6a3 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -314,6 +314,11 @@ static inline pte_t pud_pte(pud_t pud)
> return __pte(pud_val(pud));
> }
>
> +static inline pud_t pte_pud(pte_t pte)
> +{
> + return __pud(pte_val(pte));
> +}
> +

Yeah, these would be required for PUD-based THP when enabled.

> static inline pmd_t pud_pmd(pud_t pud)
> {
> return __pmd(pud_val(pud));
> @@ -381,6 +386,7 @@ static inline int pmd_protnone(pmd_t pmd)
> #define pfn_pmd(pfn,prot) __pmd(__phys_to_pmd_val((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot))
> #define mk_pmd(page,prot) pfn_pmd(page_to_pfn(page),prot)
>
> +#define pud_mkyoung(pud) pte_pud(pte_mkyoung(pud_pte(pud)))
> #define pud_write(pud) pte_write(pud_pte(pud))
>
> #define __pud_to_phys(pud) __pte_to_phys(pud_pte(pud))
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index 8e44dccd1b47..bd749601195f 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -1698,6 +1698,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> */
> static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
> {
> + pud_t *pud;
> pmd_t *pmd;
> pte_t *pte;
> kvm_pfn_t pfn;
> @@ -1707,24 +1708,23 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
>
> spin_lock(&vcpu->kvm->mmu_lock);
>
> - pmd = stage2_get_pmd(vcpu->kvm, NULL, fault_ipa);
> - if (!pmd || pmd_none(*pmd)) /* Nothing there */
> + if (!stage2_get_leaf_entry(vcpu->kvm, fault_ipa, &pud, &pmd, &pte))
> goto out;
>
> - if (pmd_thp_or_huge(*pmd)) { /* THP, HugeTLB */
> + if (pud) { /* HugeTLB */
> + *pud = kvm_s2pud_mkyoung(*pud);
> + pfn = kvm_pud_pfn(*pud);
> + pfn_valid = true;
> + } else if (pmd) { /* THP, HugeTLB */
> *pmd = pmd_mkyoung(*pmd);
> pfn = pmd_pfn(*pmd);
> pfn_valid = true;
> - goto out;
> + } else {
> + *pte = pte_mkyoung(*pte); /* Just a page... */
> + pfn = pte_pfn(*pte);
> + pfn_valid = true;
> }

As mentioned before, stage2_get_leaf_entry() is not required for the previous
patch, and handle_access_fault() can definitely do without it. The existing
page table walker flow is better than this helper, which takes three
different pointer arguments and makes the semantics more complicated than
required.
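
I.e., keeping the open-coded walk and just adding a PUD step at the top,
roughly (untested):

	pud = stage2_get_pud(vcpu->kvm, NULL, fault_ipa);
	if (!pud || stage2_pud_none(vcpu->kvm, *pud))	/* Nothing there */
		goto out;

	if (stage2_pud_huge(vcpu->kvm, *pud)) {		/* HugeTLB */
		*pud = kvm_s2pud_mkyoung(*pud);
		pfn = kvm_pud_pfn(*pud);
		pfn_valid = true;
		goto out;
	}

	pmd = stage2_pmd_offset(vcpu->kvm, pud, fault_ipa);
	if (!pmd || pmd_none(*pmd))			/* Nothing there */
		goto out;

	if (pmd_thp_or_huge(*pmd)) {			/* THP, HugeTLB */
		*pmd = pmd_mkyoung(*pmd);
		pfn = pmd_pfn(*pmd);
		pfn_valid = true;
		goto out;
	}

	pte = pte_offset_kernel(pmd, fault_ipa);
	if (pte_none(*pte))				/* Nothing there either */
		goto out;

	*pte = pte_mkyoung(*pte);			/* Just a page... */
	pfn = pte_pfn(*pte);
	pfn_valid = true;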


2018-12-03 15:20:37

by Anshuman Khandual

[permalink] [raw]
Subject: Re: [PATCH v9 7/8] KVM: arm64: Update age handlers to support PUD hugepages



On 10/31/2018 11:27 PM, Punit Agrawal wrote:
> In preparation for creating larger hugepages at Stage 2, add support
> to the age handling notifiers for PUD hugepages when encountered.
>
> Provide trivial helpers for arm32 to allow sharing code.
>
> Signed-off-by: Punit Agrawal <[email protected]>
> Reviewed-by: Suzuki K Poulose <[email protected]>
> Cc: Christoffer Dall <[email protected]>
> Cc: Marc Zyngier <[email protected]>
> Cc: Russell King <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> ---
> arch/arm/include/asm/kvm_mmu.h | 6 +++++
> arch/arm64/include/asm/kvm_mmu.h | 5 ++++
> arch/arm64/include/asm/pgtable.h | 1 +
> virt/kvm/arm/mmu.c | 39 ++++++++++++++++----------------
> 4 files changed, 32 insertions(+), 19 deletions(-)
>
> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> index fea5e723e3ac..e62f0913ce7d 100644
> --- a/arch/arm/include/asm/kvm_mmu.h
> +++ b/arch/arm/include/asm/kvm_mmu.h
> @@ -117,6 +117,12 @@ static inline pud_t kvm_s2pud_mkyoung(pud_t pud)
> return pud;
> }
>
> +static inline bool kvm_s2pud_young(pud_t pud)
> +{
> + BUG();
> + return false;
> +}
> +
> static inline pte_t kvm_s2pte_mkwrite(pte_t pte)
> {
> pte_val(pte) |= L_PTE_S2_RDWR;
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 612032bbb428..9f941f70775c 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -273,6 +273,11 @@ static inline pud_t kvm_s2pud_mkyoung(pud_t pud)
> return pud_mkyoung(pud);
> }
>
> +static inline bool kvm_s2pud_young(pud_t pud)
> +{
> + return pud_young(pud);
> +}
> +
> #define hyp_pte_table_empty(ptep) kvm_page_empty(ptep)
>
> #ifdef __PAGETABLE_PMD_FOLDED
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index f51e2271e6a3..bb0f3f17a7a9 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -386,6 +386,7 @@ static inline int pmd_protnone(pmd_t pmd)
> #define pfn_pmd(pfn,prot) __pmd(__phys_to_pmd_val((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot))
> #define mk_pmd(page,prot) pfn_pmd(page_to_pfn(page),prot)
>
> +#define pud_young(pud) pte_young(pud_pte(pud))
> #define pud_mkyoung(pud) pte_pud(pte_mkyoung(pud_pte(pud)))
> #define pud_write(pud) pte_write(pud_pte(pud))
>
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index bd749601195f..3893ea6a50bf 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -1225,6 +1225,11 @@ static int stage2_pmdp_test_and_clear_young(pmd_t *pmd)
> return stage2_ptep_test_and_clear_young((pte_t *)pmd);
> }
>
> +static int stage2_pudp_test_and_clear_young(pud_t *pud)
> +{
> + return stage2_ptep_test_and_clear_young((pte_t *)pud);
> +}
> +
> /**
> * kvm_phys_addr_ioremap - map a device range to guest IPA
> *
> @@ -1932,42 +1937,38 @@ void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
>
> static int kvm_age_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data)
> {
> + pud_t *pud;
> pmd_t *pmd;
> pte_t *pte;
>
> - WARN_ON(size != PAGE_SIZE && size != PMD_SIZE);
> - pmd = stage2_get_pmd(kvm, NULL, gpa);
> - if (!pmd || pmd_none(*pmd)) /* Nothing there */
> + WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE);
> + if (!stage2_get_leaf_entry(kvm, gpa, &pud, &pmd, &pte))
> return 0;
>
> - if (pmd_thp_or_huge(*pmd)) /* THP, HugeTLB */
> + if (pud)
> + return stage2_pudp_test_and_clear_young(pud);
> + else if (pmd)
> return stage2_pmdp_test_and_clear_young(pmd);
> -
> - pte = pte_offset_kernel(pmd, gpa);
> - if (pte_none(*pte))
> - return 0;
> -
> - return stage2_ptep_test_and_clear_young(pte);
> + else
> + return stage2_ptep_test_and_clear_young(pte);
> }
>
> static int kvm_test_age_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data)
> {
> + pud_t *pud;
> pmd_t *pmd;
> pte_t *pte;
>
> - WARN_ON(size != PAGE_SIZE && size != PMD_SIZE);
> - pmd = stage2_get_pmd(kvm, NULL, gpa);
> - if (!pmd || pmd_none(*pmd)) /* Nothing there */
> + WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE);
> + if (!stage2_get_leaf_entry(kvm, gpa, &pud, &pmd, &pte))
> return 0;
>
> - if (pmd_thp_or_huge(*pmd)) /* THP, HugeTLB */
> + if (pud)
> + return kvm_s2pud_young(*pud);
> + else if (pmd)
> return pmd_young(*pmd);
> -
> - pte = pte_offset_kernel(pmd, gpa);
> - if (!pte_none(*pte)) /* Just a page... */
> + else
> return pte_young(*pte);
> -
> - return 0;
> }
>
> int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end)
>

If stage2_get_leaf_entry() is avoided (which it should be), some of the previous
patches can be folded into just a single one, e.g., adding the new HW page table
definitions, the trivial arm32 functions, and the PUD-specific additions to the
various stage 2 page table walkers.

2018-12-03 15:49:32

by Anshuman Khandual

[permalink] [raw]
Subject: Re: [PATCH v9 8/8] KVM: arm64: Add support for creating PUD hugepages at stage 2



On 10/31/2018 11:27 PM, Punit Agrawal wrote:
> KVM only supports PMD hugepages at stage 2. Now that the various page
> handling routines are updated, extend the stage 2 fault handling to
> map in PUD hugepages.
>
> Addition of PUD hugepage support enables additional page sizes (e.g.,
> 1G with 4K granule) which can be useful on cores that support mapping
> larger block sizes in the TLB entries.
>
> Signed-off-by: Punit Agrawal <[email protected]>
> Reviewed-by: Suzuki Poulose <[email protected]>
> Cc: Christoffer Dall <[email protected]>
> Cc: Marc Zyngier <[email protected]>
> Cc: Russell King <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> ---
> arch/arm/include/asm/kvm_mmu.h | 20 +++++
> arch/arm/include/asm/stage2_pgtable.h | 5 ++
> arch/arm64/include/asm/kvm_mmu.h | 16 ++++
> arch/arm64/include/asm/pgtable-hwdef.h | 2 +
> arch/arm64/include/asm/pgtable.h | 2 +
> virt/kvm/arm/mmu.c | 104 +++++++++++++++++++++++--
> 6 files changed, 143 insertions(+), 6 deletions(-)
>
> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> index e62f0913ce7d..6336319a0d5b 100644
> --- a/arch/arm/include/asm/kvm_mmu.h
> +++ b/arch/arm/include/asm/kvm_mmu.h
> @@ -84,11 +84,14 @@ void kvm_clear_hyp_idmap(void);
>
> #define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot)
> #define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot)
> +#define kvm_pfn_pud(pfn, prot) (__pud(0))

This makes sense. It was not enabled before and is getting added here.

>
> #define kvm_pud_pfn(pud) ({ BUG(); 0; })
>
>
> #define kvm_pmd_mkhuge(pmd) pmd_mkhuge(pmd)
> +/* No support for pud hugepages */
> +#define kvm_pud_mkhuge(pud) ( {BUG(); pud; })
>
> /*
> * The following kvm_*pud*() functions are provided strictly to allow
> @@ -105,6 +108,23 @@ static inline bool kvm_s2pud_readonly(pud_t *pud)
> return false;
> }
>
> +static inline void kvm_set_pud(pud_t *pud, pud_t new_pud)
> +{
> + BUG();
> +}
> +
> +static inline pud_t kvm_s2pud_mkwrite(pud_t pud)
> +{
> + BUG();
> + return pud;
> +}
> +
> +static inline pud_t kvm_s2pud_mkexec(pud_t pud)
> +{
> + BUG();
> + return pud;
> +}
> +

But the previous patches have been adding some of these helpers
as well. Needs consolidation.

> static inline bool kvm_s2pud_exec(pud_t *pud)
> {
> BUG();
> diff --git a/arch/arm/include/asm/stage2_pgtable.h b/arch/arm/include/asm/stage2_pgtable.h
> index f6a7ea805232..f9017167a8d1 100644
> --- a/arch/arm/include/asm/stage2_pgtable.h
> +++ b/arch/arm/include/asm/stage2_pgtable.h
> @@ -68,4 +68,9 @@ stage2_pmd_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
> #define stage2_pmd_table_empty(kvm, pmdp) kvm_page_empty(pmdp)
> #define stage2_pud_table_empty(kvm, pudp) false
>
> +static inline bool kvm_stage2_has_pud(struct kvm *kvm)
> +{
> + return false;
> +}
>

Makes sense. Disabled on arm32.

> +
> #endif /* __ARM_S2_PGTABLE_H_ */
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 9f941f70775c..8af4b1befa42 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -184,12 +184,16 @@ void kvm_clear_hyp_idmap(void);
> #define kvm_mk_pgd(pudp) \
> __pgd(__phys_to_pgd_val(__pa(pudp)) | PUD_TYPE_TABLE)
>
> +#define kvm_set_pud(pudp, pud) set_pud(pudp, pud)
> +
> #define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot)
> #define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot)
> +#define kvm_pfn_pud(pfn, prot) pfn_pud(pfn, prot)
>
> #define kvm_pud_pfn(pud) pud_pfn(pud)
>
> #define kvm_pmd_mkhuge(pmd) pmd_mkhuge(pmd)
> +#define kvm_pud_mkhuge(pud) pud_mkhuge(pud)
>
> static inline pte_t kvm_s2pte_mkwrite(pte_t pte)
> {
> @@ -203,6 +207,12 @@ static inline pmd_t kvm_s2pmd_mkwrite(pmd_t pmd)
> return pmd;
> }
>
> +static inline pud_t kvm_s2pud_mkwrite(pud_t pud)
> +{
> + pud_val(pud) |= PUD_S2_RDWR;
> + return pud;
> +}
> +
> static inline pte_t kvm_s2pte_mkexec(pte_t pte)
> {
> pte_val(pte) &= ~PTE_S2_XN;
> @@ -215,6 +225,12 @@ static inline pmd_t kvm_s2pmd_mkexec(pmd_t pmd)
> return pmd;
> }
>
> +static inline pud_t kvm_s2pud_mkexec(pud_t pud)
> +{
> + pud_val(pud) &= ~PUD_S2_XN;
> + return pud;
> +}
> +

Shouldn't these be consolidated into the previous patches?

> static inline void kvm_set_s2pte_readonly(pte_t *ptep)
> {
> pteval_t old_pteval, pteval;
> diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
> index 336e24cddc87..6f1c187f1c86 100644
> --- a/arch/arm64/include/asm/pgtable-hwdef.h
> +++ b/arch/arm64/include/asm/pgtable-hwdef.h
> @@ -193,6 +193,8 @@
> #define PMD_S2_RDWR (_AT(pmdval_t, 3) << 6) /* HAP[2:1] */
> #define PMD_S2_XN (_AT(pmdval_t, 2) << 53) /* XN[1:0] */
>
> +#define PUD_S2_RDONLY (_AT(pudval_t, 1) << 6) /* HAP[2:1] */
> +#define PUD_S2_RDWR (_AT(pudval_t, 3) << 6) /* HAP[2:1] */
> #define PUD_S2_XN (_AT(pudval_t, 2) << 53) /* XN[1:0] */

All PUD-related HW page table definitions should have been added
at once in one of the preparatory patches.

>
> /*
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index bb0f3f17a7a9..576128635f3c 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -390,6 +390,8 @@ static inline int pmd_protnone(pmd_t pmd)
> #define pud_mkyoung(pud) pte_pud(pte_mkyoung(pud_pte(pud)))
> #define pud_write(pud) pte_write(pud_pte(pud))
>
> +#define pud_mkhuge(pud) (__pud(pud_val(pud) & ~PUD_TABLE_BIT))
> +

All new pud_* definitions should have come together at once as well.

> #define __pud_to_phys(pud) __pte_to_phys(pud_pte(pud))
> #define __phys_to_pud_val(phys) __phys_to_pte_val(phys)
> #define pud_pfn(pud) ((__pud_to_phys(pud) & PUD_MASK) >> PAGE_SHIFT)



> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index 3893ea6a50bf..2dcff38868d4 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -115,6 +115,25 @@ static void stage2_dissolve_pmd(struct kvm *kvm, phys_addr_t addr, pmd_t *pmd)
> put_page(virt_to_page(pmd));
> }
>
> +/**
> + * stage2_dissolve_pud() - clear and flush huge PUD entry
> + * @kvm: pointer to kvm structure.
> + * @addr: IPA
> + * @pud: pud pointer for IPA
> + *
> + * Function clears a PUD entry, flushes addr 1st and 2nd stage TLBs. Marks all
> + * pages in the range dirty.
> + */
> +static void stage2_dissolve_pud(struct kvm *kvm, phys_addr_t addr, pud_t *pudp)
> +{
> + if (!stage2_pud_huge(kvm, *pudp))
> + return;
> +
> + stage2_pud_clear(kvm, pudp);
> + kvm_tlb_flush_vmid_ipa(kvm, addr);
> + put_page(virt_to_page(pudp));
> +}
> +
> static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
> int min, int max)
> {
> @@ -1022,7 +1041,7 @@ static pmd_t *stage2_get_pmd(struct kvm *kvm, struct kvm_mmu_memory_cache *cache
> pmd_t *pmd;
>
> pud = stage2_get_pud(kvm, cache, addr);
> - if (!pud)
> + if (!pud || stage2_pud_huge(kvm, *pud))
> return NULL;
>
> if (stage2_pud_none(kvm, *pud)) {
> @@ -1083,6 +1102,36 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
> return 0;
> }
>
> +static int stage2_set_pud_huge(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
> + phys_addr_t addr, const pud_t *new_pudp)
> +{
> + pud_t *pudp, old_pud;
> +
> + pudp = stage2_get_pud(kvm, cache, addr);
> + VM_BUG_ON(!pudp);
> +
> + old_pud = *pudp;
> +
> + /*
> + * A large number of vcpus faulting on the same stage 2 entry,
> + * can lead to a refault due to the
> + * stage2_pud_clear()/tlb_flush(). Skip updating the page
> + * tables if there is no change.
> + */
> + if (pud_val(old_pud) == pud_val(*new_pudp))
> + return 0;
> +
> + if (stage2_pud_present(kvm, old_pud)) {
> + stage2_pud_clear(kvm, pudp);
> + kvm_tlb_flush_vmid_ipa(kvm, addr);
> + } else {
> + get_page(virt_to_page(pudp));
> + }
> +
> + kvm_set_pud(pudp, *new_pudp);
> + return 0;
> +}
> +
> /*
> * stage2_get_leaf_entry - walk the stage2 VM page tables and return
> * true if a valid and present leaf-entry is found. A pointer to the
> @@ -1149,6 +1198,7 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
> phys_addr_t addr, const pte_t *new_pte,
> unsigned long flags)
> {
> + pud_t *pud;
> pmd_t *pmd;
> pte_t *pte, old_pte;
> bool iomap = flags & KVM_S2PTE_FLAG_IS_IOMAP;
> @@ -1157,7 +1207,31 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
> VM_BUG_ON(logging_active && !cache);
>
> /* Create stage-2 page table mapping - Levels 0 and 1 */
> - pmd = stage2_get_pmd(kvm, cache, addr);
> + pud = stage2_get_pud(kvm, cache, addr);
> + if (!pud) {
> + /*
> + * Ignore calls from kvm_set_spte_hva for unallocated
> + * address ranges.
> + */
> + return 0;
> + }
> +
> + /*
> + * While dirty page logging - dissolve huge PUD, then continue
> + * on to allocate page.
> + */
> + if (logging_active)
> + stage2_dissolve_pud(kvm, addr, pud);
> +
> + if (stage2_pud_none(kvm, *pud)) {
> + if (!cache)
> + return 0; /* ignore calls from kvm_set_spte_hva */
> + pmd = mmu_memory_cache_alloc(cache);
> + stage2_pud_populate(kvm, pud, pmd);
> + get_page(virt_to_page(pud));
> + }
> +
> + pmd = stage2_pmd_offset(kvm, pud, addr);
> if (!pmd) {
> /*
> * Ignore calls from kvm_set_spte_hva for unallocated
> @@ -1557,12 +1631,19 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> }
>
> vma_pagesize = vma_kernel_pagesize(vma);
> - if (vma_pagesize == PMD_SIZE && !logging_active) {
> - gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
> + /*
> + * PUD level may not exist for a VM but PMD is guaranteed to
> + * exist.
> + */
> + if ((vma_pagesize == PMD_SIZE ||
> + (vma_pagesize == PUD_SIZE && kvm_stage2_has_pud(kvm))) &&
> + !logging_active) {

This could be improved by wrapping the check in a helper. The arm64 definition
of the helper should probably check for (PUD_SIZE || PMD_SIZE), while the arm32
one should just check for PMD_SIZE alone. But this is optional.
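
E.g., something like the following (untested; the helper name is made up here,
and kvm_stage2_has_pud() already hides the arm32/arm64 difference):

static bool stage2_supports_huge_mapping(struct kvm *kvm,
					 unsigned long pagesize)
{
	/* The PMD level always exists; the PUD level may be folded away */
	return pagesize == PMD_SIZE ||
	       (pagesize == PUD_SIZE && kvm_stage2_has_pud(kvm));
}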

> + gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT;
> } else {
> /*
> * Fallback to PTE if it's not one of the Stage 2
> - * supported hugepage sizes
> + * supported hugepage sizes or the corresponding level
> + * doesn't exist
> */
> vma_pagesize = PAGE_SIZE;
>
> @@ -1661,7 +1742,18 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> needs_exec = exec_fault ||
> (fault_status == FSC_PERM && stage2_is_exec(kvm, fault_ipa));
>
> - if (vma_pagesize == PMD_SIZE) {
> + if (vma_pagesize == PUD_SIZE) {
> + pud_t new_pud = kvm_pfn_pud(pfn, mem_type);
> +
> + new_pud = kvm_pud_mkhuge(new_pud);
> + if (writable)
> + new_pud = kvm_s2pud_mkwrite(new_pud);
> +
> + if (needs_exec)
> + new_pud = kvm_s2pud_mkexec(new_pud);
> +
> + ret = stage2_set_pud_huge(kvm, memcache, fault_ipa, &new_pud);
> + } else if (vma_pagesize == PMD_SIZE) {

I ran some memory-intensive tests on guests with these patches in place, but
will continue experimenting and looking into the details here.

2018-12-05 10:48:05

by Suzuki K Poulose

[permalink] [raw]
Subject: Re: [PATCH v9 2/8] KVM: arm/arm64: Re-factor setting the Stage 2 entry to exec on fault



On 03/12/2018 13:32, Anshuman Khandual wrote:
>
>
> On 10/31/2018 11:27 PM, Punit Agrawal wrote:
>> Stage 2 fault handler marks a page as executable if it is handling an
>> execution fault or if it was a permission fault in which case the
>> executable bit needs to be preserved.
>>
>> The logic to decide if the page should be marked executable is
>> duplicated for PMD and PTE entries. To avoid creating another copy
>> when support for PUD hugepages is introduced refactor the code to
>> share the checks needed to mark a page table entry as executable.
>>
>> Signed-off-by: Punit Agrawal <[email protected]>
>> Reviewed-by: Suzuki K Poulose <[email protected]>
>> Cc: Christoffer Dall <[email protected]>
>> Cc: Marc Zyngier <[email protected]>
>> ---
>> virt/kvm/arm/mmu.c | 28 +++++++++++++++-------------
>> 1 file changed, 15 insertions(+), 13 deletions(-)
>>
>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>> index 59595207c5e1..6912529946fb 100644
>> --- a/virt/kvm/arm/mmu.c
>> +++ b/virt/kvm/arm/mmu.c
>> @@ -1475,7 +1475,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>> unsigned long fault_status)
>> {
>> int ret;
>> - bool write_fault, exec_fault, writable, force_pte = false;
>> + bool write_fault, writable, force_pte = false;
>> + bool exec_fault, needs_exec;
>
> New line not required, still within 80 characters.
>
>> unsigned long mmu_seq;
>> gfn_t gfn = fault_ipa >> PAGE_SHIFT;
>> struct kvm *kvm = vcpu->kvm;
>> @@ -1598,19 +1599,25 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>> if (exec_fault)
>> invalidate_icache_guest_page(pfn, vma_pagesize);
>>
>> + /*
>> + * If we took an execution fault we have made the
>> + * icache/dcache coherent above and should now let the s2
>
> Coherent or invalidated with invalidate_icache_guest_page ?

We also do clean_dcache above if needed, so that makes sure
the data is coherent. Am I missing something here?

>
>> + * mapping be executable.
>> + *
>> + * Write faults (!exec_fault && FSC_PERM) are orthogonal to
>> + * execute permissions, and we preserve whatever we have.
>> + */
>
> Otherwise looks good.
>


Suzuki

2018-12-05 17:59:59

by Suzuki K Poulose

[permalink] [raw]
Subject: Re: [PATCH v9 5/8] KVM: arm64: Support PUD hugepage in stage2_is_exec()



On 01/11/2018 13:38, Christoffer Dall wrote:
> On Wed, Oct 31, 2018 at 05:57:42PM +0000, Punit Agrawal wrote:
>> In preparation for creating PUD hugepages at stage 2, add support for
>> detecting execute permissions on PUD page table entries. Faults due to
>> lack of execute permissions on page table entries is used to perform
>> i-cache invalidation on first execute.
>>
>> Provide trivial implementations of arm32 helpers to allow sharing of
>> code.
>>
>> Signed-off-by: Punit Agrawal <[email protected]>
>> Reviewed-by: Suzuki K Poulose <[email protected]>
>> Cc: Christoffer Dall <[email protected]>
>> Cc: Marc Zyngier <[email protected]>
>> Cc: Russell King <[email protected]>
>> Cc: Catalin Marinas <[email protected]>
>> Cc: Will Deacon <[email protected]>
>> ---
>> arch/arm/include/asm/kvm_mmu.h | 6 +++
>> arch/arm64/include/asm/kvm_mmu.h | 5 +++
>> arch/arm64/include/asm/pgtable-hwdef.h | 2 +
>> virt/kvm/arm/mmu.c | 53 +++++++++++++++++++++++---
>> 4 files changed, 61 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
>> index 37bf85d39607..839a619873d3 100644
>> --- a/arch/arm/include/asm/kvm_mmu.h
>> +++ b/arch/arm/include/asm/kvm_mmu.h
>> @@ -102,6 +102,12 @@ static inline bool kvm_s2pud_readonly(pud_t *pud)
>> return false;
>> }
>>
>> +static inline bool kvm_s2pud_exec(pud_t *pud)
>> +{
>> + BUG();
>
> nit: I think this should be WARN() now :)
>
>> + return false;
>> +}
>> +
>> static inline pte_t kvm_s2pte_mkwrite(pte_t pte)
>> {
>> pte_val(pte) |= L_PTE_S2_RDWR;
>> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
>> index 8da6d1b2a196..c755b37b3f92 100644
>> --- a/arch/arm64/include/asm/kvm_mmu.h
>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>> @@ -261,6 +261,11 @@ static inline bool kvm_s2pud_readonly(pud_t *pudp)
>> return kvm_s2pte_readonly((pte_t *)pudp);
>> }
>>
>> +static inline bool kvm_s2pud_exec(pud_t *pudp)
>> +{
>> + return !(READ_ONCE(pud_val(*pudp)) & PUD_S2_XN);
>> +}
>> +
>> #define hyp_pte_table_empty(ptep) kvm_page_empty(ptep)
>>
>> #ifdef __PAGETABLE_PMD_FOLDED
>> diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
>> index 1d7d8da2ef9b..336e24cddc87 100644
>> --- a/arch/arm64/include/asm/pgtable-hwdef.h
>> +++ b/arch/arm64/include/asm/pgtable-hwdef.h
>> @@ -193,6 +193,8 @@
>> #define PMD_S2_RDWR (_AT(pmdval_t, 3) << 6) /* HAP[2:1] */
>> #define PMD_S2_XN (_AT(pmdval_t, 2) << 53) /* XN[1:0] */
>>
>> +#define PUD_S2_XN (_AT(pudval_t, 2) << 53) /* XN[1:0] */
>> +
>> /*
>> * Memory Attribute override for Stage-2 (MemAttr[3:0])
>> */
>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>> index 1c669c3c1208..8e44dccd1b47 100644
>> --- a/virt/kvm/arm/mmu.c
>> +++ b/virt/kvm/arm/mmu.c
>> @@ -1083,23 +1083,66 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
>> return 0;
>> }
>>
>> -static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
>> +/*
>> + * stage2_get_leaf_entry - walk the stage2 VM page tables and return
>> + * true if a valid and present leaf-entry is found. A pointer to the
>> + * leaf-entry is returned in the appropriate level variable - pudpp,
>> + * pmdpp, ptepp.
>> + */
>> +static bool stage2_get_leaf_entry(struct kvm *kvm, phys_addr_t addr,
>> + pud_t **pudpp, pmd_t **pmdpp, pte_t **ptepp)
>
> Do we need this type madness or could this just return a u64 pointer
> (NULL if nothing is found) and pass that to kvm_s2pte_exec (because we
> know it's the same bit we need to check regardless of the pgtable level
> on both arm and arm64)?
>
> Or do we consider that bad for some reason?

Practically, yes, the bit positions are the same, and thus we should be able
to do this by treating the entry as just a pte. When we get to an independent
stage 2 pgtable implementation that treats all page table entries as a single
type with level information, we should be able to get rid of these. But for
now we have followed the Linux way of page-table manipulation, where we have
level-specific accessors. The other option is to open code the walking
sequence from the pgd to the leaf entry everywhere.

I am fine with changing this code, if you like.
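
If we go that route, a rough (untested) sketch could look like:

/*
 * Assuming stage2_get_leaf_entry() is changed to return the leaf entry
 * as a plain pte pointer (NULL when nothing is mapped), relying on the
 * XN bit sitting at the same position at all levels.
 */
static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
{
	pte_t *ptep = stage2_get_leaf_entry(kvm, addr);

	return ptep && kvm_s2pte_exec(ptep);
}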

Cheers
Suzuki

2018-12-10 08:58:30

by Christoffer Dall

[permalink] [raw]
Subject: Re: [PATCH v9 1/8] KVM: arm/arm64: Share common code in user_mem_abort()

On Mon, Dec 03, 2018 at 01:37:37PM +0000, Suzuki K Poulose wrote:
> Hi Anshuman,
>
> On 03/12/2018 12:11, Anshuman Khandual wrote:
> >
> >
> >On 10/31/2018 11:27 PM, Punit Agrawal wrote:
> >>The code for operations such as marking the pfn as dirty, and
> >>dcache/icache maintenance during stage 2 fault handling is duplicated
> >>between normal pages and PMD hugepages.
> >>
> >>Instead of creating another copy of the operations when we introduce
> >>PUD hugepages, let's share them across the different pagesizes.
> >>
> >>Signed-off-by: Punit Agrawal <[email protected]>
> >>Reviewed-by: Suzuki K Poulose <[email protected]>
> >>Cc: Christoffer Dall <[email protected]>
> >>Cc: Marc Zyngier <[email protected]>
> >>---
> >> virt/kvm/arm/mmu.c | 49 ++++++++++++++++++++++++++++------------------
> >> 1 file changed, 30 insertions(+), 19 deletions(-)
> >>
> >>diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> >>index 5eca48bdb1a6..59595207c5e1 100644
> >>--- a/virt/kvm/arm/mmu.c
> >>+++ b/virt/kvm/arm/mmu.c
> >>@@ -1475,7 +1475,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >> unsigned long fault_status)
> >> {
> >> int ret;
> >>- bool write_fault, exec_fault, writable, hugetlb = false, force_pte = false;
> >>+ bool write_fault, exec_fault, writable, force_pte = false;
> >> unsigned long mmu_seq;
> >> gfn_t gfn = fault_ipa >> PAGE_SHIFT;
> >> struct kvm *kvm = vcpu->kvm;
> >>@@ -1484,7 +1484,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >> kvm_pfn_t pfn;
> >> pgprot_t mem_type = PAGE_S2;
> >> bool logging_active = memslot_is_logging(memslot);
> >>- unsigned long flags = 0;
> >>+ unsigned long vma_pagesize, flags = 0;
> >
> >A small nit s/vma_pagesize/pagesize. Why call it VMA ? Its implicit.
>
> May be we could call it mapsize. pagesize is confusing.
>

I'm ok with mapsize. I see the vma_pagesize name coming from the fact
that this is initially set to the return value from vma_kernel_pagesize.

I have no problems with either vma_pagesize or mapsize.

> >
> >> write_fault = kvm_is_write_fault(vcpu);
> >> exec_fault = kvm_vcpu_trap_is_iabt(vcpu);
> >>@@ -1504,10 +1504,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >> return -EFAULT;
> >> }
> >>- if (vma_kernel_pagesize(vma) == PMD_SIZE && !logging_active) {
> >>- hugetlb = true;
> >>+ vma_pagesize = vma_kernel_pagesize(vma);
> >>+ if (vma_pagesize == PMD_SIZE && !logging_active) {
> >> gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
> >> } else {
> >>+ /*
> >>+ * Fallback to PTE if it's not one of the Stage 2
> >>+ * supported hugepage sizes
> >>+ */
> >>+ vma_pagesize = PAGE_SIZE;
> >
> >This seems redundant and should be dropped. vma_kernel_pagesize() here either
> >calls hugetlb_vm_op_pagesize (via hugetlb_vm_ops->pagesize) or simply returns
> >PAGE_SIZE. The vm_ops path is taken if the QEMU VMA covering any given HVA is
> >backed either by HugeTLB pages or simply normal pages. vma_pagesize would
> >either has a value of PMD_SIZE (HugeTLB hstate based) or PAGE_SIZE. Hence if
> >its not PMD_SIZE it must be PAGE_SIZE which should not be assigned again.
>
> We may want to force using the PTE mappings when logging_active (e.g, migration
> ?) to prevent keep tracking of huge pages. So the check is still valid.
>
>

Agreed, and let's not try to additionally change the logic and flow with
this patch.

> >
> >>+
> >> /*
> >> * Pages belonging to memslots that don't have the same
> >> * alignment for userspace and IPA cannot be mapped using
> >>@@ -1573,23 +1579,33 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >> if (mmu_notifier_retry(kvm, mmu_seq))
> >> goto out_unlock;
> >>- if (!hugetlb && !force_pte)
> >>- hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
> >>+ if (vma_pagesize == PAGE_SIZE && !force_pte) {
> >>+ /*
> >>+ * Only PMD_SIZE transparent hugepages(THP) are
> >>+ * currently supported. This code will need to be
> >>+ * updated to support other THP sizes.
> >>+ */
> >
> >This comment belongs to transparent_hugepage_adjust() but not here.
>
> I think this is relevant here than in thp_adjust, unless we rename
> the function below to something generic, handle_hugepage_adjust().
>

Agreed.

> >>+ if (transparent_hugepage_adjust(&pfn, &fault_ipa))
> >>+ vma_pagesize = PMD_SIZE;
> >
> >IIUC transparent_hugepage_adjust() is only getting called here. Instead of
> >returning 'true' when it is able to detect a huge page backing and doing
> >an adjustment there after, it should rather return THP size (PMD_SIZE) to
> >accommodate probable multi size THP support in future .
>
> That makes sense.
>

That's fine.

> >
> >>+ }
> >>+
> >>+ if (writable)
> >>+ kvm_set_pfn_dirty(pfn);
> >>- if (hugetlb) {
> >>+ if (fault_status != FSC_PERM)
> >>+ clean_dcache_guest_page(pfn, vma_pagesize);
> >>+
> >>+ if (exec_fault)
> >>+ invalidate_icache_guest_page(pfn, vma_pagesize);
> >>+
> >>+ if (vma_pagesize == PMD_SIZE) {
> >> pmd_t new_pmd = pfn_pmd(pfn, mem_type);
> >> new_pmd = pmd_mkhuge(new_pmd);
> >>- if (writable) {
> >>+ if (writable)
> >> new_pmd = kvm_s2pmd_mkwrite(new_pmd);
> >>- kvm_set_pfn_dirty(pfn);
> >>- }
> >>-
> >>- if (fault_status != FSC_PERM)
> >>- clean_dcache_guest_page(pfn, PMD_SIZE);
> >> if (exec_fault) {
> >> new_pmd = kvm_s2pmd_mkexec(new_pmd);
> >>- invalidate_icache_guest_page(pfn, PMD_SIZE);
> >> } else if (fault_status == FSC_PERM) {
> >> /* Preserve execute if XN was already cleared */
> >> if (stage2_is_exec(kvm, fault_ipa))
> >>@@ -1602,16 +1618,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >> if (writable) {
> >> new_pte = kvm_s2pte_mkwrite(new_pte);
> >>- kvm_set_pfn_dirty(pfn);
> >> mark_page_dirty(kvm, gfn);
> >> }
> >>- if (fault_status != FSC_PERM)
> >>- clean_dcache_guest_page(pfn, PAGE_SIZE);
> >>-
> >> if (exec_fault) {
> >> new_pte = kvm_s2pte_mkexec(new_pte);
> >>- invalidate_icache_guest_page(pfn, PAGE_SIZE);
> >> } else if (fault_status == FSC_PERM) {
> >> /* Preserve execute if XN was already cleared */
> >> if (stage2_is_exec(kvm, fault_ipa))
> >>
> >
> >kvm_set_pfn_dirty, clean_dcache_guest_page, invalidate_icache_guest_page
> >can all be safely moved before setting the page table entries either as
> >PMD or PTE.
>
> I think this is what we do currently. So I assume this is fine.
>
Agreed, I don't understand the comment raised by Anshuman here.


Thanks,

Christoffer

2018-12-10 09:00:03

by Christoffer Dall

[permalink] [raw]
Subject: Re: [PATCH v9 2/8] KVM: arm/arm64: Re-factor setting the Stage 2 entry to exec on fault

On Mon, Dec 03, 2018 at 07:02:23PM +0530, Anshuman Khandual wrote:
>
>
> On 10/31/2018 11:27 PM, Punit Agrawal wrote:
> > Stage 2 fault handler marks a page as executable if it is handling an
> > execution fault or if it was a permission fault in which case the
> > executable bit needs to be preserved.
> >
> > The logic to decide if the page should be marked executable is
> > duplicated for PMD and PTE entries. To avoid creating another copy
> > when support for PUD hugepages is introduced refactor the code to
> > share the checks needed to mark a page table entry as executable.
> >
> > Signed-off-by: Punit Agrawal <[email protected]>
> > Reviewed-by: Suzuki K Poulose <[email protected]>
> > Cc: Christoffer Dall <[email protected]>
> > Cc: Marc Zyngier <[email protected]>
> > ---
> > virt/kvm/arm/mmu.c | 28 +++++++++++++++-------------
> > 1 file changed, 15 insertions(+), 13 deletions(-)
> >
> > diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> > index 59595207c5e1..6912529946fb 100644
> > --- a/virt/kvm/arm/mmu.c
> > +++ b/virt/kvm/arm/mmu.c
> > @@ -1475,7 +1475,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> > unsigned long fault_status)
> > {
> > int ret;
> > - bool write_fault, exec_fault, writable, force_pte = false;
> > + bool write_fault, writable, force_pte = false;
> > + bool exec_fault, needs_exec;
>
> New line not required, still within 80 characters.
>

He's trying to logically group the two variables. I don't see a problem
with that.


Thanks,

Christoffer

2018-12-10 09:45:43

by Christoffer Dall

[permalink] [raw]
Subject: Re: [PATCH v9 2/8] KVM: arm/arm64: Re-factor setting the Stage 2 entry to exec on fault

On Wed, Dec 05, 2018 at 10:47:10AM +0000, Suzuki K Poulose wrote:
>
>
> On 03/12/2018 13:32, Anshuman Khandual wrote:
> >
> >
> >On 10/31/2018 11:27 PM, Punit Agrawal wrote:
> >>Stage 2 fault handler marks a page as executable if it is handling an
> >>execution fault or if it was a permission fault in which case the
> >>executable bit needs to be preserved.
> >>
> >>The logic to decide if the page should be marked executable is
> >>duplicated for PMD and PTE entries. To avoid creating another copy
> >>when support for PUD hugepages is introduced refactor the code to
> >>share the checks needed to mark a page table entry as executable.
> >>
> >>Signed-off-by: Punit Agrawal <[email protected]>
> >>Reviewed-by: Suzuki K Poulose <[email protected]>
> >>Cc: Christoffer Dall <[email protected]>
> >>Cc: Marc Zyngier <[email protected]>
> >>---
> >> virt/kvm/arm/mmu.c | 28 +++++++++++++++-------------
> >> 1 file changed, 15 insertions(+), 13 deletions(-)
> >>
> >>diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> >>index 59595207c5e1..6912529946fb 100644
> >>--- a/virt/kvm/arm/mmu.c
> >>+++ b/virt/kvm/arm/mmu.c
> >>@@ -1475,7 +1475,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >> unsigned long fault_status)
> >> {
> >> int ret;
> >>- bool write_fault, exec_fault, writable, force_pte = false;
> >>+ bool write_fault, writable, force_pte = false;
> >>+ bool exec_fault, needs_exec;
> >
> >New line not required, still within 80 characters.
> >
> >> unsigned long mmu_seq;
> >> gfn_t gfn = fault_ipa >> PAGE_SHIFT;
> >> struct kvm *kvm = vcpu->kvm;
> >>@@ -1598,19 +1599,25 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >> if (exec_fault)
> >> invalidate_icache_guest_page(pfn, vma_pagesize);
> >>+ /*
> >>+ * If we took an execution fault we have made the
> >>+ * icache/dcache coherent above and should now let the s2
> >
> >Coherent or invalidated with invalidate_icache_guest_page ?
>
> We also do clean_dcache above if needed. So that makes sure
> the data is coherent. Am I missing something here ?
>

I think you've got it right. We have made the icache coherent with the
data/instructions in the page by invalidating the icache. I think the
comment is ok either way.

Thanks,

Christoffer

2018-12-10 09:45:47

by Christoffer Dall

[permalink] [raw]
Subject: Re: [PATCH v9 3/8] KVM: arm/arm64: Introduce helpers to manipulate page table entries

On Mon, Dec 03, 2018 at 07:20:08PM +0530, Anshuman Khandual wrote:
>
>
> On 10/31/2018 11:27 PM, Punit Agrawal wrote:
> > Introduce helpers to abstract architectural handling of the conversion
> > of pfn to page table entries and marking a PMD page table entry as a
> > block entry.
>
> Why is this necessary? We would still need to access the PMD and PUD
> as-is, without any conversion. IOW, KVM knows the breakdown of the page
> table at the various levels. Is this something required by generic KVM code?
>
> >
> > The helpers are introduced in preparation for supporting PUD hugepages
> > at stage 2 - which are supported on arm64 but do not exist on arm.
>
> Some of these patches (including the earlier two) are good on their
> own. Do we still need to mention the incoming PUD enablement in the
> commit message as the reason for these cleanup patches?
>

Does it hurt? What is your concern here?
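
For readers following the thread: the helpers are thin wrappers so the
fault-handling code can build entries without caring which architecture
(or level) it is dealing with. A minimal sketch of their shape on arm64
follows; the names are illustrative and may not match the posted patch
exactly:

	static inline pte_t kvm_pfn_pte(kvm_pfn_t pfn, pgprot_t prot)
	{
		return pfn_pte(pfn, prot);
	}

	static inline pmd_t kvm_pfn_pmd(kvm_pfn_t pfn, pgprot_t prot)
	{
		return pfn_pmd(pfn, prot);
	}

	/* mark a PMD entry as a block (huge) mapping */
	static inline pmd_t kvm_pmd_mkhuge(pmd_t pmd)
	{
		return pmd_mkhuge(pmd);
	}

The common mmu code then calls these instead of the pfn_pte()/pfn_pmd()
page table helpers directly, so the PUD variants can be stubbed out on
arm where PUD hugepages do not exist.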


Thanks,

Christoffer

2018-12-10 09:46:46

by Christoffer Dall

[permalink] [raw]
Subject: Re: [PATCH v9 5/8] KVM: arm64: Support PUD hugepage in stage2_is_exec()

On Wed, Dec 05, 2018 at 05:57:51PM +0000, Suzuki K Poulose wrote:
>
>
> On 01/11/2018 13:38, Christoffer Dall wrote:
> >On Wed, Oct 31, 2018 at 05:57:42PM +0000, Punit Agrawal wrote:
> >>In preparation for creating PUD hugepages at stage 2, add support for
> >>detecting execute permissions on PUD page table entries. Faults due to
> >>lack of execute permissions on page table entries is used to perform
> >>i-cache invalidation on first execute.
> >>
> >>Provide trivial implementations of arm32 helpers to allow sharing of
> >>code.
> >>
> >>Signed-off-by: Punit Agrawal <[email protected]>
> >>Reviewed-by: Suzuki K Poulose <[email protected]>
> >>Cc: Christoffer Dall <[email protected]>
> >>Cc: Marc Zyngier <[email protected]>
> >>Cc: Russell King <[email protected]>
> >>Cc: Catalin Marinas <[email protected]>
> >>Cc: Will Deacon <[email protected]>
> >>---
> >> arch/arm/include/asm/kvm_mmu.h | 6 +++
> >> arch/arm64/include/asm/kvm_mmu.h | 5 +++
> >> arch/arm64/include/asm/pgtable-hwdef.h | 2 +
> >> virt/kvm/arm/mmu.c | 53 +++++++++++++++++++++++---
> >> 4 files changed, 61 insertions(+), 5 deletions(-)
> >>
> >>diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> >>index 37bf85d39607..839a619873d3 100644
> >>--- a/arch/arm/include/asm/kvm_mmu.h
> >>+++ b/arch/arm/include/asm/kvm_mmu.h
> >>@@ -102,6 +102,12 @@ static inline bool kvm_s2pud_readonly(pud_t *pud)
> >> return false;
> >> }
> >>+static inline bool kvm_s2pud_exec(pud_t *pud)
> >>+{
> >>+ BUG();
> >
> >nit: I think this should be WARN() now :)
> >
> >>+ return false;
> >>+}
> >>+
> >> static inline pte_t kvm_s2pte_mkwrite(pte_t pte)
> >> {
> >> pte_val(pte) |= L_PTE_S2_RDWR;
> >>diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> >>index 8da6d1b2a196..c755b37b3f92 100644
> >>--- a/arch/arm64/include/asm/kvm_mmu.h
> >>+++ b/arch/arm64/include/asm/kvm_mmu.h
> >>@@ -261,6 +261,11 @@ static inline bool kvm_s2pud_readonly(pud_t *pudp)
> >> return kvm_s2pte_readonly((pte_t *)pudp);
> >> }
> >>+static inline bool kvm_s2pud_exec(pud_t *pudp)
> >>+{
> >>+ return !(READ_ONCE(pud_val(*pudp)) & PUD_S2_XN);
> >>+}
> >>+
> >> #define hyp_pte_table_empty(ptep) kvm_page_empty(ptep)
> >> #ifdef __PAGETABLE_PMD_FOLDED
> >>diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
> >>index 1d7d8da2ef9b..336e24cddc87 100644
> >>--- a/arch/arm64/include/asm/pgtable-hwdef.h
> >>+++ b/arch/arm64/include/asm/pgtable-hwdef.h
> >>@@ -193,6 +193,8 @@
> >> #define PMD_S2_RDWR (_AT(pmdval_t, 3) << 6) /* HAP[2:1] */
> >> #define PMD_S2_XN (_AT(pmdval_t, 2) << 53) /* XN[1:0] */
> >>+#define PUD_S2_XN (_AT(pudval_t, 2) << 53) /* XN[1:0] */
> >>+
> >> /*
> >> * Memory Attribute override for Stage-2 (MemAttr[3:0])
> >> */
> >>diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> >>index 1c669c3c1208..8e44dccd1b47 100644
> >>--- a/virt/kvm/arm/mmu.c
> >>+++ b/virt/kvm/arm/mmu.c
> >>@@ -1083,23 +1083,66 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
> >> return 0;
> >> }
> >>-static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
> >>+/*
> >>+ * stage2_get_leaf_entry - walk the stage2 VM page tables and return
> >>+ * true if a valid and present leaf-entry is found. A pointer to the
> >>+ * leaf-entry is returned in the appropriate level variable - pudpp,
> >>+ * pmdpp, ptepp.
> >>+ */
> >>+static bool stage2_get_leaf_entry(struct kvm *kvm, phys_addr_t addr,
> >>+ pud_t **pudpp, pmd_t **pmdpp, pte_t **ptepp)
> >
> >Do we need this type madness or could this just return a u64 pointer
> >(NULL if nothing is found) and pass that to kvm_s2pte_exec (because we
> >know it's the same bit we need to check regardless of the pgtable level
> >on both arm and arm64)?
> >
> >Or do we consider that bad for some reason?
>
> Practically, yes, the bit positions are the same, so we should be able
> to do this and treat everything as if it were just a pte. When we get to
> an independent stage 2 pgtable implementation that treats all page table
> entries as a single type with level information, we should be able to get
> rid of these. But for now we have followed the Linux way of page-table
> manipulation, which has "level"-specific accessors; the other option is
> to open-code the walk from the pgd to the leaf entry everywhere.
>
> I am fine with changing this code, if you like.
>

Meh, it just looked a bit over-engineered to me when I originally looked
at the patches, but you're right, they align with the rest of the
implementation.
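
As an illustration of how the level-specific walker is consumed, the
exec check ends up looking roughly like this (a sketch following the
hunk above; the kvm_s2pmd_exec()/kvm_s2pte_exec() names for the lower
levels are assumed here):

	static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
	{
		pud_t *pudp;
		pmd_t *pmdp;
		pte_t *ptep;

		/* at most one pointer is set when a leaf entry is found */
		if (!stage2_get_leaf_entry(kvm, addr, &pudp, &pmdp, &ptep))
			return false;

		if (pudp)
			return kvm_s2pud_exec(pudp);
		else if (pmdp)
			return kvm_s2pmd_exec(pmdp);
		else
			return kvm_s2pte_exec(ptep);
	}

A flat u64-based walk would avoid the three out-parameters, at the cost
of departing from the level-specific accessors used everywhere else.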

Thanks,

Christoffer

2018-12-10 10:37:52

by Suzuki K Poulose

[permalink] [raw]
Subject: Re: [PATCH v9 1/8] KVM: arm/arm64: Share common code in user_mem_abort()



On 10/12/2018 08:56, Christoffer Dall wrote:
> On Mon, Dec 03, 2018 at 01:37:37PM +0000, Suzuki K Poulose wrote:
>> Hi Anshuman,
>>
>> On 03/12/2018 12:11, Anshuman Khandual wrote:
>>>
>>>
>>> On 10/31/2018 11:27 PM, Punit Agrawal wrote:
>>>> The code for operations such as marking the pfn as dirty, and
>>>> dcache/icache maintenance during stage 2 fault handling is duplicated
>>>> between normal pages and PMD hugepages.
>>>>
>>>> Instead of creating another copy of the operations when we introduce
>>>> PUD hugepages, let's share them across the different pagesizes.
>>>>
>>>> Signed-off-by: Punit Agrawal <[email protected]>
>>>> Reviewed-by: Suzuki K Poulose <[email protected]>
>>>> Cc: Christoffer Dall <[email protected]>
>>>> Cc: Marc Zyngier <[email protected]>
>>>> ---
>>>> virt/kvm/arm/mmu.c | 49 ++++++++++++++++++++++++++++------------------
>>>> 1 file changed, 30 insertions(+), 19 deletions(-)
>>>>
>>>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>>>> index 5eca48bdb1a6..59595207c5e1 100644
>>>> --- a/virt/kvm/arm/mmu.c
>>>> +++ b/virt/kvm/arm/mmu.c
>>>> @@ -1475,7 +1475,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>> unsigned long fault_status)
>>>> {
>>>> int ret;
>>>> - bool write_fault, exec_fault, writable, hugetlb = false, force_pte = false;
>>>> + bool write_fault, exec_fault, writable, force_pte = false;
>>>> unsigned long mmu_seq;
>>>> gfn_t gfn = fault_ipa >> PAGE_SHIFT;
>>>> struct kvm *kvm = vcpu->kvm;
>>>> @@ -1484,7 +1484,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>> kvm_pfn_t pfn;
>>>> pgprot_t mem_type = PAGE_S2;
>>>> bool logging_active = memslot_is_logging(memslot);
>>>> - unsigned long flags = 0;
>>>> + unsigned long vma_pagesize, flags = 0;
>>>
>>> A small nit s/vma_pagesize/pagesize. Why call it VMA? It's implicit.
>>
>> Maybe we could call it mapsize. pagesize is confusing.
>>
>
> I'm ok with mapsize. I see the vma_pagesize name coming from the fact
> that this is initially set to the return value from vma_kernel_pagesize.
>
> I have no problems with either vma_pagesize or mapsize.

Ok, I will retain the vma_pagesize to avoid unnecessary changes to the patch.

Thanks
Suzuki

2018-12-10 10:49:53

by Suzuki K Poulose

[permalink] [raw]
Subject: Re: [PATCH v9 1/8] KVM: arm/arm64: Share common code in user_mem_abort()



On 10/12/2018 08:56, Christoffer Dall wrote:
> On Mon, Dec 03, 2018 at 01:37:37PM +0000, Suzuki K Poulose wrote:
>> Hi Anshuman,
>>
>> On 03/12/2018 12:11, Anshuman Khandual wrote:
>>>
>>>
>>> On 10/31/2018 11:27 PM, Punit Agrawal wrote:
>>>> The code for operations such as marking the pfn as dirty, and
>>>> dcache/icache maintenance during stage 2 fault handling is duplicated
>>>> between normal pages and PMD hugepages.
>>>>
>>>> Instead of creating another copy of the operations when we introduce
>>>> PUD hugepages, let's share them across the different pagesizes.
>>>>
>>>> Signed-off-by: Punit Agrawal <[email protected]>
>>>> Reviewed-by: Suzuki K Poulose <[email protected]>
>>>> Cc: Christoffer Dall <[email protected]>
>>>> Cc: Marc Zyngier <[email protected]>
>>>> ---
>>>> virt/kvm/arm/mmu.c | 49 ++++++++++++++++++++++++++++------------------
>>>> 1 file changed, 30 insertions(+), 19 deletions(-)
>>>>
>>>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>>>> index 5eca48bdb1a6..59595207c5e1 100644
>>>> --- a/virt/kvm/arm/mmu.c
>>>> +++ b/virt/kvm/arm/mmu.c
>>>> @@ -1475,7 +1475,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>> unsigned long fault_status)
>>>> {
>>>> int ret;
>>>> - bool write_fault, exec_fault, writable, hugetlb = false, force_pte = false;
>>>> + bool write_fault, exec_fault, writable, force_pte = false;
>>>> unsigned long mmu_seq;
>>>> gfn_t gfn = fault_ipa >> PAGE_SHIFT;
>>>> struct kvm *kvm = vcpu->kvm;
>>>> @@ -1484,7 +1484,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>> kvm_pfn_t pfn;
>>>> pgprot_t mem_type = PAGE_S2;
>>>> bool logging_active = memslot_is_logging(memslot);
>>>> - unsigned long flags = 0;
>>>> + unsigned long vma_pagesize, flags = 0;
>>>
> >>>A small nit s/vma_pagesize/pagesize. Why call it VMA? It's implicit.
> >>
> >>Maybe we could call it mapsize. pagesize is confusing.
> >>
> >
> >I'm ok with mapsize. I see the vma_pagesize name coming from the fact
> >that this is initially set to the return value from vma_kernel_pagesize.
> >
> >I have no problems with either vma_pagesize or mapsize.
>
>>>
>>>> write_fault = kvm_is_write_fault(vcpu);
>>>> exec_fault = kvm_vcpu_trap_is_iabt(vcpu);
>>>> @@ -1504,10 +1504,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>> return -EFAULT;
>>>> }
>>>> - if (vma_kernel_pagesize(vma) == PMD_SIZE && !logging_active) {
>>>> - hugetlb = true;
>>>> + vma_pagesize = vma_kernel_pagesize(vma);
>>>> + if (vma_pagesize == PMD_SIZE && !logging_active) {
>>>> gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
>>>> } else {
>>>> + /*
>>>> + * Fallback to PTE if it's not one of the Stage 2
>>>> + * supported hugepage sizes
>>>> + */
>>>> + vma_pagesize = PAGE_SIZE;
>>>
> >>>This seems redundant and should be dropped. vma_kernel_pagesize() here either
> >>>calls hugetlb_vm_op_pagesize (via hugetlb_vm_ops->pagesize) or simply returns
> >>>PAGE_SIZE. The vm_ops path is taken if the QEMU VMA covering any given HVA is
> >>>backed either by HugeTLB pages or simply normal pages. vma_pagesize would
> >>>either have a value of PMD_SIZE (HugeTLB hstate based) or PAGE_SIZE. Hence if
> >>>it's not PMD_SIZE it must be PAGE_SIZE, which should not be assigned again.
> >>
> >>We may want to force using PTE mappings when logging_active (e.g., migration?)
> >>to avoid having to keep track of huge pages. So the check is still valid.
>>
>>
>
> Agreed, and let's not try additionally change the logic and flow with
> this patch.
>
>>>
>>>> +
>>>> /*
>>>> * Pages belonging to memslots that don't have the same
>>>> * alignment for userspace and IPA cannot be mapped using
>>>> @@ -1573,23 +1579,33 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>> if (mmu_notifier_retry(kvm, mmu_seq))
>>>> goto out_unlock;
>>>> - if (!hugetlb && !force_pte)
>>>> - hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
>>>> + if (vma_pagesize == PAGE_SIZE && !force_pte) {
>>>> + /*
>>>> + * Only PMD_SIZE transparent hugepages(THP) are
>>>> + * currently supported. This code will need to be
>>>> + * updated to support other THP sizes.
>>>> + */
>>>
> >>>This comment belongs in transparent_hugepage_adjust(), not here.
> >>
> >>I think this is more relevant here than in thp_adjust, unless we rename
> >>the function below to something generic, handle_hugepage_adjust().
>>
>
> Agreed.
>
>>>> + if (transparent_hugepage_adjust(&pfn, &fault_ipa))
>>>> + vma_pagesize = PMD_SIZE;
>>>
> >>>IIUC transparent_hugepage_adjust() is only getting called here. Instead of
> >>>returning 'true' when it is able to detect a huge page backing and doing
> >>>an adjustment thereafter, it should rather return the THP size (PMD_SIZE) to
> >>>accommodate probable multi-size THP support in the future.
>>
>> That makes sense.
>>
>
> That's fine.
>

Btw, on further thought, since we don't have THP support for anything
other than PMD_SIZE, I am dropping the above suggestion. We would need
to change our stage 2 page table manipulation code anyway to support
new sizes, so this can be addressed when we get there, keeping the
current changes minimal and specific to the PUD hugepage support.
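
For contrast, the call site under the dropped suggestion would have
looked something like this (hypothetical only, never part of the
series):

	/* hypothetical: the helper reports the mapping size it produced */
	if (vma_pagesize == PAGE_SIZE && !force_pte)
		vma_pagesize = transparent_hugepage_adjust(&pfn, &fault_ipa);

whereas the posted patch keeps the boolean helper and sets
vma_pagesize = PMD_SIZE at the call site, as quoted above.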


Cheers
Suzuki


2018-12-10 12:25:15

by Christoffer Dall

[permalink] [raw]
Subject: Re: [PATCH v9 1/8] KVM: arm/arm64: Share common code in user_mem_abort()

On Mon, Dec 10, 2018 at 10:47:42AM +0000, Suzuki K Poulose wrote:
>
>
> On 10/12/2018 08:56, Christoffer Dall wrote:
> >On Mon, Dec 03, 2018 at 01:37:37PM +0000, Suzuki K Poulose wrote:
> >>Hi Anshuman,
> >>
> >>On 03/12/2018 12:11, Anshuman Khandual wrote:
> >>>
> >>>
> >>>On 10/31/2018 11:27 PM, Punit Agrawal wrote:
> >>>>The code for operations such as marking the pfn as dirty, and
> >>>>dcache/icache maintenance during stage 2 fault handling is duplicated
> >>>>between normal pages and PMD hugepages.
> >>>>
> >>>>Instead of creating another copy of the operations when we introduce
> >>>>PUD hugepages, let's share them across the different pagesizes.
> >>>>
> >>>>Signed-off-by: Punit Agrawal <[email protected]>
> >>>>Reviewed-by: Suzuki K Poulose <[email protected]>
> >>>>Cc: Christoffer Dall <[email protected]>
> >>>>Cc: Marc Zyngier <[email protected]>
> >>>>---
> >>>> virt/kvm/arm/mmu.c | 49 ++++++++++++++++++++++++++++------------------
> >>>> 1 file changed, 30 insertions(+), 19 deletions(-)
> >>>>
> >>>>diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> >>>>index 5eca48bdb1a6..59595207c5e1 100644
> >>>>--- a/virt/kvm/arm/mmu.c
> >>>>+++ b/virt/kvm/arm/mmu.c
> >>>>@@ -1475,7 +1475,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >>>> unsigned long fault_status)
> >>>> {
> >>>> int ret;
> >>>>- bool write_fault, exec_fault, writable, hugetlb = false, force_pte = false;
> >>>>+ bool write_fault, exec_fault, writable, force_pte = false;
> >>>> unsigned long mmu_seq;
> >>>> gfn_t gfn = fault_ipa >> PAGE_SHIFT;
> >>>> struct kvm *kvm = vcpu->kvm;
> >>>>@@ -1484,7 +1484,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >>>> kvm_pfn_t pfn;
> >>>> pgprot_t mem_type = PAGE_S2;
> >>>> bool logging_active = memslot_is_logging(memslot);
> >>>>- unsigned long flags = 0;
> >>>>+ unsigned long vma_pagesize, flags = 0;
> >>>
> >>>A small nit s/vma_pagesize/pagesize. Why call it VMA? It's implicit.
> >>
> >>Maybe we could call it mapsize. pagesize is confusing.
> >>
> >
> >I'm ok with mapsize. I see the vma_pagesize name coming from the fact
> >that this is initially set to the return value from vma_kernel_pagesize.
> >
> >I have no problems with either vma_pagesize or mapsize.
> >
> >>>
> >>>> write_fault = kvm_is_write_fault(vcpu);
> >>>> exec_fault = kvm_vcpu_trap_is_iabt(vcpu);
> >>>>@@ -1504,10 +1504,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >>>> return -EFAULT;
> >>>> }
> >>>>- if (vma_kernel_pagesize(vma) == PMD_SIZE && !logging_active) {
> >>>>- hugetlb = true;
> >>>>+ vma_pagesize = vma_kernel_pagesize(vma);
> >>>>+ if (vma_pagesize == PMD_SIZE && !logging_active) {
> >>>> gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
> >>>> } else {
> >>>>+ /*
> >>>>+ * Fallback to PTE if it's not one of the Stage 2
> >>>>+ * supported hugepage sizes
> >>>>+ */
> >>>>+ vma_pagesize = PAGE_SIZE;
> >>>
> >>>This seems redundant and should be dropped. vma_kernel_pagesize() here either
> >>>calls hugetlb_vm_op_pagesize (via hugetlb_vm_ops->pagesize) or simply returns
> >>>PAGE_SIZE. The vm_ops path is taken if the QEMU VMA covering any given HVA is
> >>>backed either by HugeTLB pages or simply normal pages. vma_pagesize would
> >>>either have a value of PMD_SIZE (HugeTLB hstate based) or PAGE_SIZE. Hence if
> >>>it's not PMD_SIZE it must be PAGE_SIZE, which should not be assigned again.
> >>
> >>We may want to force using PTE mappings when logging_active (e.g., migration?)
> >>to avoid having to keep track of huge pages. So the check is still valid.
> >>
> >>
> >
> >Agreed, and let's not try additionally change the logic and flow with
> >this patch.
> >
> >>>
> >>>>+
> >>>> /*
> >>>> * Pages belonging to memslots that don't have the same
> >>>> * alignment for userspace and IPA cannot be mapped using
> >>>>@@ -1573,23 +1579,33 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >>>> if (mmu_notifier_retry(kvm, mmu_seq))
> >>>> goto out_unlock;
> >>>>- if (!hugetlb && !force_pte)
> >>>>- hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
> >>>>+ if (vma_pagesize == PAGE_SIZE && !force_pte) {
> >>>>+ /*
> >>>>+ * Only PMD_SIZE transparent hugepages(THP) are
> >>>>+ * currently supported. This code will need to be
> >>>>+ * updated to support other THP sizes.
> >>>>+ */
> >>>
> >>>This comment belongs in transparent_hugepage_adjust(), not here.
> >>
> >>I think this is more relevant here than in thp_adjust, unless we rename
> >>the function below to something generic, handle_hugepage_adjust().
> >>
> >
> >Agreed.
> >
> >>>>+ if (transparent_hugepage_adjust(&pfn, &fault_ipa))
> >>>>+ vma_pagesize = PMD_SIZE;
> >>>
> >>>IIUC transparent_hugepage_adjust() is only getting called here. Instead of
> >>>returning 'true' when it is able to detect a huge page backing and doing
> >>>an adjustment thereafter, it should rather return the THP size (PMD_SIZE) to
> >>>accommodate probable multi-size THP support in the future.
> >>
> >>That makes sense.
> >>
> >
> >That's fine.
> >
>
> Btw, after a further thought, since we don't have any THP support for anything
> other than PMD_SIZE, I am dropping the above suggestion. We need to make changes
> in our stage2 page table manipulation code anyway to support the new sizes. So
> this could be addressed when we get there, to keep the changes minimal and
> specific to the PUD huge page support.
>
>

Sounds good to me.

Thanks,

Christoffer