2018-03-27 13:27:23

by Suzuki K Poulose

[permalink] [raw]
Subject: [PATCH v2 00/17] kvm: arm64: Dynamic & 52bit IPA support

The physical address space size for a VM (IPA size) on arm/arm64 is
limited to a static limit of 40bits. This series adds support for
using a limit specific to a VM, allowing to use a limit supported
by the host (based on the host kernel configuration and CPU support).
The default and the minimum size is fixed to 40bits. We also add
support for handling 52bit IPA addresses added by Arm v8.2 extensions.

As mentioned above, the supported IPA size on a host could be different
from the system's PARange indicated by the CPUs (e.g, kernel limit
on the PA size). So we expose the limit via a new system ioctl request
- KVM_ARM_GET_MAX_VM_PHYS_SHIFT - on arm/arm64. This can then be
passed on to the KVM_CREATE_VM ioctl, encoded in the "type" field.
Bits [7-0] of the type are reserved for the IPA size. This approach
allows simpler management of the stage2 page table and guest memory
slots.

The arm64 page table level helpers are defined based on the page
table levels used by the host VA. So, the accessors may not work
if the guest uses more number of levels in stage2 than the stage1
of the host. In order to provide an independent stage2 page table,
we refactor the arm64 page table helpers to give us raw accessors
for each level, which should only used when that level is present.
And then, based on the VM, we make the decision of the stage2
page table using the raw accessors.

52bit support is added for VGIC (including ITS emulation) and handling
of PAR, HPFAR registers.

The series applies on arm64 for-next/core. A tree is available here:

git://linux-arm.org/linux-skp.git ipa52/v2

Changes since V1:
- Change the userspace API for configuring VM to encode the IPA
size in the VM type. (suggested by Christoffer)
- Expose the IPA limit on the host via ioctl on /dev/kvm
- Handle 52bit addresses in PAR & HPFAR
- Drop patch changing the life time of stage2 PGD
- Rename macros for 48-to-52 bit conversion for GIC ITS BASER.
(suggested by Christoffer)
- Split virtio PFN check patches and address comments.

The series also adds :
- Support for handling 52bit IPA for vgic ITS.
- Cleanup in virtio to handle errors when the PFN used in
the virtio transport doesn't fit in 32bit.

Tested with
- Modified kvmtool, which can only be used for (patches included in
the series for reference / testing):
* with virtio-pci upto 44bit PA (Due to 4K page size for virtio-pci
legacy implemented by kvmtool)
* Upto 48bit PA with virtio-mmio, due to 32bit PFN limitation.
- Hacked Qemu (boot loader support for highmem, phys-shift support)
* with virtio-pci GIC-v3 ITS & MSI upto 52bit on Foundation model.

Kristina Martsenko (1):
vgic: Add support for 52bit guest physical address

Suzuki K Poulose (16):
virtio: mmio-v1: Validate queue PFN
virtio: pci-legacy: Validate queue pfn
arm64: Make page table helpers reusable
arm64: Refactor pud_huge for reusability
arm64: Helper for parange to PASize
kvm: arm/arm64: Fix stage2_flush_memslot for 4 level page table
kvm: arm/arm64: Remove spurious WARN_ON
kvm: arm/arm64: Prepare for VM specific stage2 translations
kvm: arm64: Make stage2 page table layout dynamic
kvm: arm64: Dynamic configuration of VTCR and VTTBR mask
kvm: arm64: Configure VTCR per VM
kvm: arm/arm64: Expose supported physical address limit for VM
kvm: arm/arm64: Allow tuning the physical address size for VM
kvm: arm64: Switch to per VM IPA limit
kvm: arm64: Add support for handling 52bit IPA
kvm: arm64: Allow IPA size supported by the system

Documentation/virtual/kvm/api.txt | 14 ++
arch/arm/include/asm/kvm_arm.h | 3 +-
arch/arm/include/asm/kvm_mmu.h | 22 ++-
arch/arm/include/asm/stage2_pgtable.h | 42 ++---
arch/arm64/include/asm/cpufeature.h | 16 ++
arch/arm64/include/asm/kvm_arm.h | 119 +++++++++++++--
arch/arm64/include/asm/kvm_asm.h | 2 +-
arch/arm64/include/asm/kvm_host.h | 19 ++-
arch/arm64/include/asm/kvm_mmu.h | 71 +++++++--
arch/arm64/include/asm/pgalloc.h | 34 ++++-
arch/arm64/include/asm/pgtable.h | 63 +++++---
arch/arm64/include/asm/stage2_pgtable-nopmd.h | 42 -----
arch/arm64/include/asm/stage2_pgtable-nopud.h | 39 -----
arch/arm64/include/asm/stage2_pgtable.h | 211 +++++++++++++++++---------
arch/arm64/kvm/hyp/s2-setup.c | 34 +----
arch/arm64/kvm/hyp/switch.c | 7 +-
arch/arm64/mm/hugetlbpage.c | 2 +-
drivers/virtio/virtio_mmio.c | 18 ++-
drivers/virtio/virtio_pci_legacy.c | 12 +-
include/linux/irqchip/arm-gic-v3.h | 5 +
include/uapi/linux/kvm.h | 16 ++
virt/kvm/arm/arm.c | 32 +++-
virt/kvm/arm/mmu.c | 124 ++++++++-------
virt/kvm/arm/vgic/vgic-its.c | 37 ++---
virt/kvm/arm/vgic/vgic-kvm-device.c | 2 +-
virt/kvm/arm/vgic/vgic-mmio-v3.c | 2 -
26 files changed, 626 insertions(+), 362 deletions(-)
delete mode 100644 arch/arm64/include/asm/stage2_pgtable-nopmd.h
delete mode 100644 arch/arm64/include/asm/stage2_pgtable-nopud.h

kvmtool hack for IPA support :

Suzuki K Poulose (4):
kvmtool: Allow backends to run checks on the KVM device fd
kvmtool: arm64: Add support for guest physical address size
kvmtool: arm64: Switch memory layout
kvmtool: arm: Add support for creating VM with PA size

arm/aarch32/include/kvm/kvm-arch.h | 1 +
arm/aarch64/include/kvm/kvm-arch.h | 15 ++++++++++++---
arm/aarch64/include/kvm/kvm-config-arch.h | 5 ++++-
arm/include/arm-common/kvm-arch.h | 17 +++++++++++------
arm/include/arm-common/kvm-config-arch.h | 1 +
arm/kvm.c | 23 ++++++++++++++++++++++-
include/kvm/kvm.h | 4 ++++
kvm.c | 2 ++
8 files changed, 57 insertions(+), 11 deletions(-)

--
2.7.4


2018-03-27 13:17:46

by Suzuki K Poulose

[permalink] [raw]
Subject: [PATCH v2 04/17] arm64: Refactor pud_huge for reusability

Make pud_huge reusable for stage2 tables, independent
of the stage1 levels.

Cc: Steve Capper <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christoffer Dall <[email protected]>
Signed-off-by: Suzuki K Poulose <[email protected]>
---
arch/arm64/include/asm/pgtable.h | 5 +++++
arch/arm64/mm/hugetlbpage.c | 2 +-
2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 5e22503..4a16c11 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -479,6 +479,11 @@ static inline phys_addr_t pmd_page_paddr(pmd_t pmd)
#define __raw_pud_bad(pud) (!(pud_val(pud) & PUD_TABLE_BIT))
#define __raw_pud_present(pud) pte_present(pud_pte(pud))

+static inline int __raw_pud_huge(pud_t pud)
+{
+ return pud_val(pud) && !(pud_val(pud) & PUD_TABLE_BIT);
+}
+
static inline void __raw_set_pud(pud_t *pudp, pud_t pud)
{
WRITE_ONCE(*pudp, pud);
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index ecc6818..3f4bb43 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -35,7 +35,7 @@ int pmd_huge(pmd_t pmd)
int pud_huge(pud_t pud)
{
#ifndef __PAGETABLE_PMD_FOLDED
- return pud_val(pud) && !(pud_val(pud) & PUD_TABLE_BIT);
+ return __raw_pud_huge(pud);
#else
return 0;
#endif
--
2.7.4


2018-03-27 13:17:48

by Suzuki K Poulose

[permalink] [raw]
Subject: [PATCH v2 06/17] kvm: arm/arm64: Fix stage2_flush_memslot for 4 level page table

So far we have only supported 3 level page table with fixed IPA of 40bits.
Fix stage2_flush_memslot() to accommodate for 4 level tables.

Cc: Marc Zyngier <[email protected]>
Acked-by: Christoffer Dall <[email protected]>
Signed-off-by: Suzuki K Poulose <[email protected]>
---
virt/kvm/arm/mmu.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index ec62d1c..16a19fb 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -375,7 +375,8 @@ static void stage2_flush_memslot(struct kvm *kvm,
pgd = kvm->arch.pgd + stage2_pgd_index(addr);
do {
next = stage2_pgd_addr_end(addr, end);
- stage2_flush_puds(kvm, pgd, addr, next);
+ if (!stage2_pgd_none(*pgd))
+ stage2_flush_puds(kvm, pgd, addr, next);
} while (pgd++, addr = next, addr != end);
}

--
2.7.4


2018-03-27 13:17:58

by Suzuki K Poulose

[permalink] [raw]
Subject: [PATCH v2 09/17] kvm: arm64: Make stage2 page table layout dynamic

So far we had a static stage2 page table handling code, based on a
fixed IPA of 40bits. As we prepare for a configurable IPA size per
VM, make the our stage2 page table code dynamic to do the right thing
for a given VM.

Support for the IPA size configuration needs other changes in the way
we configure the EL2 registers (VTTBR and VTCR). So, the IPA is still
fixed to 40bits. The patch also moves the kvm_page_empty() in asm/kvm_mmu.h
to the top, before including the asm/stage2_pgtable.h to avoid a forward
declaration.

Cc: Marc Zyngier <[email protected]>
Cc: Christoffer Dall <[email protected]>
Signed-off-by: Suzuki K Poulose <[email protected]>
---
arch/arm64/include/asm/kvm_mmu.h | 16 +-
arch/arm64/include/asm/stage2_pgtable-nopmd.h | 42 ------
arch/arm64/include/asm/stage2_pgtable-nopud.h | 39 -----
arch/arm64/include/asm/stage2_pgtable.h | 203 +++++++++++++++++---------
4 files changed, 145 insertions(+), 155 deletions(-)
delete mode 100644 arch/arm64/include/asm/stage2_pgtable-nopmd.h
delete mode 100644 arch/arm64/include/asm/stage2_pgtable-nopud.h

diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 594c4e6..bc133ce 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -18,9 +18,10 @@
#ifndef __ARM64_KVM_MMU_H__
#define __ARM64_KVM_MMU_H__

+#include <asm/cpufeature.h>
#include <asm/page.h>
#include <asm/memory.h>
-#include <asm/cpufeature.h>
+#include <asm/kvm_arm.h>

/*
* As ARMv8.0 only has the TTBR0_EL2 register, we cannot express
@@ -140,6 +141,13 @@ static inline unsigned long __kern_hyp_va(unsigned long v)
#define kvm_phys_mask(kvm) (kvm_phys_size(kvm) - _AC(1, ULL))
#define kvm_vttbr_baddr_mask(kvm) VTTBR_BADDR_MASK

+static inline bool kvm_page_empty(void *ptr)
+{
+ struct page *ptr_page = virt_to_page(ptr);
+
+ return page_count(ptr_page) == 1;
+}
+
#include <asm/stage2_pgtable.h>

int create_hyp_mappings(void *from, void *to, pgprot_t prot);
@@ -226,12 +234,6 @@ static inline bool kvm_s2pmd_exec(pmd_t *pmdp)
return !(READ_ONCE(pmd_val(*pmdp)) & PMD_S2_XN);
}

-static inline bool kvm_page_empty(void *ptr)
-{
- struct page *ptr_page = virt_to_page(ptr);
- return page_count(ptr_page) == 1;
-}
-
#define hyp_pte_table_empty(ptep) kvm_page_empty(ptep)

#ifdef __PAGETABLE_PMD_FOLDED
diff --git a/arch/arm64/include/asm/stage2_pgtable-nopmd.h b/arch/arm64/include/asm/stage2_pgtable-nopmd.h
deleted file mode 100644
index 0280ded..0000000
--- a/arch/arm64/include/asm/stage2_pgtable-nopmd.h
+++ /dev/null
@@ -1,42 +0,0 @@
-/*
- * Copyright (C) 2016 - ARM Ltd
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program. If not, see <http://www.gnu.org/licenses/>.
- */
-
-#ifndef __ARM64_S2_PGTABLE_NOPMD_H_
-#define __ARM64_S2_PGTABLE_NOPMD_H_
-
-#include <asm/stage2_pgtable-nopud.h>
-
-#define __S2_PGTABLE_PMD_FOLDED
-
-#define S2_PMD_SHIFT S2_PUD_SHIFT
-#define S2_PTRS_PER_PMD 1
-#define S2_PMD_SIZE (1UL << S2_PMD_SHIFT)
-#define S2_PMD_MASK (~(S2_PMD_SIZE-1))
-
-#define stage2_pud_none(kvm, pud) (0)
-#define stage2_pud_present(kvm, pud) (1)
-#define stage2_pud_clear(kvm, pud) do { } while (0)
-#define stage2_pud_populate(kvm, pud, pmd) do { } while (0)
-#define stage2_pmd_offset(kvm, pud, address) ((pmd_t *)(pud))
-
-#define stage2_pmd_free(kvm, pmd) do { } while (0)
-
-#define stage2_pmd_addr_end(kvm, addr, end) (end)
-
-#define stage2_pud_huge(kvm, pud) (0)
-#define stage2_pmd_table_empty(kvm, pmdp) (0)
-
-#endif
diff --git a/arch/arm64/include/asm/stage2_pgtable-nopud.h b/arch/arm64/include/asm/stage2_pgtable-nopud.h
deleted file mode 100644
index cd6304e..0000000
--- a/arch/arm64/include/asm/stage2_pgtable-nopud.h
+++ /dev/null
@@ -1,39 +0,0 @@
-/*
- * Copyright (C) 2016 - ARM Ltd
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program. If not, see <http://www.gnu.org/licenses/>.
- */
-
-#ifndef __ARM64_S2_PGTABLE_NOPUD_H_
-#define __ARM64_S2_PGTABLE_NOPUD_H_
-
-#define __S2_PGTABLE_PUD_FOLDED
-
-#define S2_PUD_SHIFT S2_PGDIR_SHIFT
-#define S2_PTRS_PER_PUD 1
-#define S2_PUD_SIZE (_AC(1, UL) << S2_PUD_SHIFT)
-#define S2_PUD_MASK (~(S2_PUD_SIZE-1))
-
-#define stage2_pgd_none(kvm, pgd) (0)
-#define stage2_pgd_present(kvm, pgd) (1)
-#define stage2_pgd_clear(kvm, pgd) do { } while (0)
-#define stage2_pgd_populate(kvm, pgd, pud) do { } while (0)
-
-#define stage2_pud_offset(kvm, pgd, address) ((pud_t *)(pgd))
-
-#define stage2_pud_free(kvm, x) do { } while (0)
-
-#define stage2_pud_addr_end(kvm, addr, end) (end)
-#define stage2_pud_table_empty(kvm, pmdp) (0)
-
-#endif
diff --git a/arch/arm64/include/asm/stage2_pgtable.h b/arch/arm64/include/asm/stage2_pgtable.h
index 057a405..33e8ebb 100644
--- a/arch/arm64/include/asm/stage2_pgtable.h
+++ b/arch/arm64/include/asm/stage2_pgtable.h
@@ -21,6 +21,9 @@

#include <asm/pgtable.h>

+/* The PGDIR shift for a given page table with "n" levels. */
+#define pt_levels_pgdir_shift(n) ARM64_HW_PGTABLE_LEVEL_SHIFT(4 - (n))
+
/*
* The hardware supports concatenation of up to 16 tables at stage2 entry level
* and we use the feature whenever possible.
@@ -29,118 +32,184 @@
* On arm64, the smallest PAGE_SIZE supported is 4k, which means
* (PAGE_SHIFT - 3) > 4 holds for all page sizes.
* This implies, the total number of page table levels at stage2 expected
- * by the hardware is actually the number of levels required for (KVM_PHYS_SHIFT - 4)
+ * by the hardware is actually the number of levels required for (IPA_SHIFT - 4)
* in normal translations(e.g, stage1), since we cannot have another level in
- * the range (KVM_PHYS_SHIFT, KVM_PHYS_SHIFT - 4).
+ * the range (IPA_SHIFT, IPA_SHIFT - 4).
*/
-#define STAGE2_PGTABLE_LEVELS ARM64_HW_PGTABLE_LEVELS(KVM_PHYS_SHIFT - 4)
-
-/*
- * With all the supported VA_BITs and 40bit guest IPA, the following condition
- * is always true:
- *
- * STAGE2_PGTABLE_LEVELS <= CONFIG_PGTABLE_LEVELS
- *
- * We base our stage-2 page table walker helpers on this assumption and
- * fall back to using the host version of the helper wherever possible.
- * i.e, if a particular level is not folded (e.g, PUD) at stage2, we fall back
- * to using the host version, since it is guaranteed it is not folded at host.
- *
- * If the condition breaks in the future, we can rearrange the host level
- * definitions and reuse them for stage2. Till then...
- */
-#if STAGE2_PGTABLE_LEVELS > CONFIG_PGTABLE_LEVELS
-#error "Unsupported combination of guest IPA and host VA_BITS."
-#endif
-
-/* S2_PGDIR_SHIFT is the size mapped by top-level stage2 entry */
-#define S2_PGDIR_SHIFT ARM64_HW_PGTABLE_LEVEL_SHIFT(4 - STAGE2_PGTABLE_LEVELS)
-#define S2_PGDIR_SIZE (_AC(1, UL) << S2_PGDIR_SHIFT)
-#define S2_PGDIR_MASK (~(S2_PGDIR_SIZE - 1))
+#define stage2_pt_levels(ipa_shift) ARM64_HW_PGTABLE_LEVELS((ipa_shift) - 4)

/*
* The number of PTRS across all concatenated stage2 tables given by the
* number of bits resolved at the initial level.
*/
-#define PTRS_PER_S2_PGD (1 << (KVM_PHYS_SHIFT - S2_PGDIR_SHIFT))
+#define __s2_pgd_ptrs(pa, lvls) (1 << ((pa) - pt_levels_pgdir_shift((lvls))))
+
+#define kvm_stage2_levels(kvm) stage2_pt_levels(kvm_phys_shift(kvm))
+#define stage2_pgdir_shift(kvm) \
+ pt_levels_pgdir_shift(kvm_stage2_levels(kvm))
+#define stage2_pgdir_size(kvm) (_AC(1, UL) << stage2_pgdir_shift((kvm)))
+#define stage2_pgdir_mask(kvm) (~(stage2_pgdir_size((kvm)) - 1))
+#define stage2_pgd_ptrs(kvm) \
+ __s2_pgd_ptrs(kvm_phys_shift(kvm), kvm_stage2_levels(kvm))
+

/*
* kvm_mmmu_cache_min_pages is the number of stage2 page table translation
* levels in addition to the PGD.
*/
-#define kvm_mmu_cache_min_pages(kvm) (STAGE2_PGTABLE_LEVELS - 1)
+#define kvm_mmu_cache_min_pages(kvm) (kvm_stage2_levels(kvm) - 1)


-#if STAGE2_PGTABLE_LEVELS > 3
+/* PUD/PMD definitions if present */
+#define __S2_PUD_SHIFT ARM64_HW_PGTABLE_LEVEL_SHIFT(1)
+#define __S2_PUD_SIZE (_AC(1, UL) << __S2_PUD_SHIFT)
+#define __S2_PUD_MASK (~(__S2_PUD_SIZE - 1))

-#define S2_PUD_SHIFT ARM64_HW_PGTABLE_LEVEL_SHIFT(1)
-#define S2_PUD_SIZE (_AC(1, UL) << S2_PUD_SHIFT)
-#define S2_PUD_MASK (~(S2_PUD_SIZE - 1))
+#define __S2_PMD_SHIFT ARM64_HW_PGTABLE_LEVEL_SHIFT(2)
+#define __S2_PMD_SIZE (_AC(1, UL) << __S2_PMD_SHIFT)
+#define __S2_PMD_MASK (~(__S2_PMD_SIZE - 1))

-#define stage2_pgd_none(kvm, pgd) pgd_none(pgd)
-#define stage2_pgd_clear(kvm, pgd) pgd_clear(pgd)
-#define stage2_pgd_present(kvm, pgd) pgd_present(pgd)
-#define stage2_pgd_populate(kvm, pgd, pud) pgd_populate(NULL, pgd, pud)
-#define stage2_pud_offset(kvm, pgd, address) pud_offset(pgd, address)
-#define stage2_pud_free(kvm, pud) pud_free(NULL, pud)
+#define __s2_pud_index(addr) \
+ (((addr) >> __S2_PUD_SHIFT) & (PTRS_PER_PTE - 1))
+#define __s2_pmd_index(addr) \
+ (((addr) >> __S2_PMD_SHIFT) & (PTRS_PER_PTE - 1))

-#define stage2_pud_table_empty(kvm, pudp) kvm_page_empty(pudp)
+static inline int stage2_pgd_none(struct kvm *kvm, pgd_t pgd)
+{
+ return (kvm_stage2_levels(kvm) > 3) ? __raw_pgd_none(pgd) : 0;
+}
+
+static inline void stage2_pgd_clear(struct kvm *kvm, pgd_t *pgdp)
+{
+ if (kvm_stage2_levels(kvm) > 3)
+ __raw_pgd_clear(pgdp);
+}
+
+static inline int stage2_pgd_present(struct kvm *kvm, pgd_t pgd)
+{
+ return kvm_stage2_levels(kvm) > 3 ? __raw_pgd_present(pgd) : 1;
+}
+
+static inline void stage2_pgd_populate(struct kvm *kvm, pgd_t *pgdp, pud_t *pud)
+{
+ if (kvm_stage2_levels(kvm) > 3)
+ __raw_pgd_populate(pgdp, __pa(pud), PUD_TYPE_TABLE);
+ else
+ BUG();
+}
+
+static inline pud_t *stage2_pud_offset(struct kvm *kvm,
+ pgd_t *pgd, unsigned long address)
+{
+ if (kvm_stage2_levels(kvm) > 3) {
+ phys_addr_t pud_phys = __raw_pgd_page_paddr(*pgd);
+
+ pud_phys += __s2_pud_index(address) * sizeof(pud_t);
+ return __va(pud_phys);
+ }
+ return (pud_t *)pgd;
+}
+
+static inline void stage2_pud_free(struct kvm *kvm, pud_t *pud)
+{
+ if (kvm_stage2_levels(kvm) > 3)
+ __raw_pud_free(pud);
+}
+
+static inline int stage2_pud_table_empty(struct kvm *kvm, pud_t *pudp)
+{
+ return kvm_stage2_levels(kvm) > 3 && kvm_page_empty(pudp);
+}

static inline phys_addr_t
stage2_pud_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
{
- phys_addr_t boundary = (addr + S2_PUD_SIZE) & S2_PUD_MASK;
+ if (kvm_stage2_levels(kvm) > 3) {
+ phys_addr_t boundary = (addr + __S2_PUD_SIZE) & __S2_PUD_MASK;

- return (boundary - 1 < end - 1) ? boundary : end;
+ return (boundary - 1 < end - 1) ? boundary : end;
+ }
+ return end;
}

-#endif /* STAGE2_PGTABLE_LEVELS > 3 */
+static inline int stage2_pud_none(struct kvm *kvm, pud_t pud)
+{
+ return kvm_stage2_levels(kvm) > 2 ? __raw_pud_none(pud) : 0;
+}

+static inline void stage2_pud_clear(struct kvm *kvm, pud_t *pudp)
+{
+ if (kvm_stage2_levels(kvm) > 2)
+ __raw_pud_clear(pudp);
+}

-#if STAGE2_PGTABLE_LEVELS > 2
+static inline int stage2_pud_present(struct kvm *kvm, pud_t pud)
+{
+ return kvm_stage2_levels(kvm) > 2 ? __raw_pud_present(pud) : 1;
+}

-#define S2_PMD_SHIFT ARM64_HW_PGTABLE_LEVEL_SHIFT(2)
-#define S2_PMD_SIZE (_AC(1, UL) << S2_PMD_SHIFT)
-#define S2_PMD_MASK (~(S2_PMD_SIZE - 1))
+static inline void stage2_pud_populate(struct kvm *kvm, pud_t *pudp, pmd_t *pmd)
+{
+ if (kvm_stage2_levels(kvm) > 2)
+ __raw_pud_populate(pudp, __pa(pmd), PMD_TYPE_TABLE);
+ else
+ BUG();
+}

-#define stage2_pud_none(kvm, pud) pud_none(pud)
-#define stage2_pud_clear(kvm, pud) pud_clear(pud)
-#define stage2_pud_present(kvm, pud) pud_present(pud)
-#define stage2_pud_populate(kvm, pud, pmd) pud_populate(NULL, pud, pmd)
-#define stage2_pmd_offset(kvm, pud, address) pmd_offset(pud, address)
-#define stage2_pmd_free(kvm, pmd) pmd_free(NULL, pmd)
+static inline pmd_t *stage2_pmd_offset(struct kvm *kvm,
+ pud_t *pud, unsigned long address)
+{
+ if (kvm_stage2_levels(kvm) > 2) {
+ phys_addr_t pmd_phys = __raw_pud_page_paddr(*pud);

-#define stage2_pud_huge(kvm, pud) pud_huge(pud)
-#define stage2_pmd_table_empty(kvm, pmdp) kvm_page_empty(pmdp)
+ pmd_phys += __s2_pmd_index(address) * sizeof(pmd_t);
+ return __va(pmd_phys);
+ }
+ return (pmd_t *)pud;
+}
+
+static inline void stage2_pmd_free(struct kvm *kvm, pmd_t *pmd)
+{
+ if (kvm_stage2_levels(kvm) > 2)
+ __raw_pmd_free(pmd);
+}
+
+static inline int stage2_pmd_table_empty(struct kvm *kvm, pmd_t *pmdp)
+{
+ return kvm_stage2_levels(kvm) > 2 && kvm_page_empty(pmdp);
+}

static inline phys_addr_t
stage2_pmd_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
{
- phys_addr_t boundary = (addr + S2_PMD_SIZE) & S2_PMD_MASK;
+ if (kvm_stage2_levels(kvm) > 2) {
+ phys_addr_t boundary = (addr + __S2_PMD_SIZE) & __S2_PMD_MASK;

- return (boundary - 1 < end - 1) ? boundary : end;
+ return (boundary - 1 < end - 1) ? boundary : end;
+ }
+ return end;
}

-#endif /* STAGE2_PGTABLE_LEVELS > 2 */
+static inline int stage2_pud_huge(struct kvm *kvm, pud_t pud)
+{
+ return kvm_stage2_levels(kvm) > 2 ? __raw_pud_huge(pud) : 0;
+}

#define stage2_pte_table_empty(kvm, ptep) kvm_page_empty(ptep)

-#if STAGE2_PGTABLE_LEVELS == 2
-#include <asm/stage2_pgtable-nopmd.h>
-#elif STAGE2_PGTABLE_LEVELS == 3
-#include <asm/stage2_pgtable-nopud.h>
-#endif
+#define stage2_pgd_size(kvm) (stage2_pgd_ptrs(kvm) * sizeof(pgd_t))

-#define stage2_pgd_size(kvm) (PTRS_PER_S2_PGD * sizeof(pgd_t))
-
-#define stage2_pgd_index(kvm, addr) \
- (((addr) >> S2_PGDIR_SHIFT) & (PTRS_PER_S2_PGD - 1))
+static inline unsigned long stage2_pgd_index(struct kvm *kvm, phys_addr_t addr)
+{
+ return (addr >> stage2_pgdir_shift(kvm)) & (stage2_pgd_ptrs(kvm) - 1);
+}

static inline phys_addr_t
stage2_pgd_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
{
- phys_addr_t boundary = (addr + S2_PGDIR_SIZE) & S2_PGDIR_MASK;
+ phys_addr_t boundary;

+ boundary = (addr + stage2_pgdir_size(kvm)) & stage2_pgdir_mask(kvm);
return (boundary - 1 < end - 1) ? boundary : end;
}

--
2.7.4


2018-03-27 13:18:00

by Suzuki K Poulose

[permalink] [raw]
Subject: [PATCH v2 11/17] kvm: arm64: Configure VTCR per VM

We set VTCR_EL2 very early during the stage2 init and don't
touch it ever. This is fine as we had a fixed IPA size. This
patch changes the behavior to set the VTCR for a given VM,
depending on its stage2 table. The common configuration for
VTCR is still performed during the early init as we have to
retain the hardware access flag update bits (VTCR_EL2_HA)
per CPU (as they are only set for the CPUs which are capabile).
The bits defining the number of levels in the page table (SL0)
and and the size of the Input address to the translation (T0SZ)
are programmed for each VM upon entry to the guest.

Cc: Marc Zyngier <[email protected]>
Cc: Christoffer Dall <[email protected]>
Signed-off-by: Suzuki K Poulose <[email protected]>
---
arch/arm64/include/asm/kvm_arm.h | 16 ++++++----------
arch/arm64/include/asm/kvm_asm.h | 2 +-
arch/arm64/include/asm/kvm_host.h | 8 +++++---
arch/arm64/kvm/hyp/s2-setup.c | 16 +---------------
arch/arm64/kvm/hyp/switch.c | 6 ++++++
5 files changed, 19 insertions(+), 29 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 176551c..2b278dc 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -117,9 +117,7 @@
#define VTCR_EL2_IRGN0_WBWA TCR_IRGN0_WBWA
#define VTCR_EL2_SL0_SHIFT 6
#define VTCR_EL2_SL0_MASK (3 << VTCR_EL2_SL0_SHIFT)
-#define VTCR_EL2_SL0_LVL1 (1 << VTCR_EL2_SL0_SHIFT)
#define VTCR_EL2_T0SZ_MASK 0x3f
-#define VTCR_EL2_T0SZ_40B 24
#define VTCR_EL2_VS_SHIFT 19
#define VTCR_EL2_VS_8BIT (0 << VTCR_EL2_VS_SHIFT)
#define VTCR_EL2_VS_16BIT (1 << VTCR_EL2_VS_SHIFT)
@@ -141,38 +139,36 @@
* D4-23 and D4-25 in ARM DDI 0487A.b.
*/

-#define VTCR_EL2_T0SZ_IPA VTCR_EL2_T0SZ_40B
#define VTCR_EL2_COMMON_BITS (VTCR_EL2_SH0_INNER | VTCR_EL2_ORGN0_WBWA | \
VTCR_EL2_IRGN0_WBWA | VTCR_EL2_RES1)
+#define VTCR_EL2_PRIVATE_MASK (VTCR_EL2_SL0_MASK | VTCR_EL2_T0SZ_MASK)

#ifdef CONFIG_ARM64_64K_PAGES
/*
* Stage2 translation configuration:
* 64kB pages (TG0 = 1)
- * 2 level page tables (SL = 1)
*/
-#define VTCR_EL2_TGRAN_FLAGS (VTCR_EL2_TG0_64K | VTCR_EL2_SL0_LVL1)
+#define VTCR_EL2_TGRAN VTCR_EL2_TG0_64K
#define VTCR_EL2_TGRAN_SL0_BASE 3UL

#elif defined(CONFIG_ARM64_16K_PAGES)
/*
* Stage2 translation configuration:
* 16kB pages (TG0 = 2)
- * 2 level page tables (SL = 1)
*/
-#define VTCR_EL2_TGRAN_FLAGS (VTCR_EL2_TG0_16K | VTCR_EL2_SL0_LVL1)
+#define VTCR_EL2_TGRAN VTCR_EL2_TG0_16K
#define VTCR_EL2_TGRAN_SL0_BASE 3UL
#else /* 4K */
/*
* Stage2 translation configuration:
* 4kB pages (TG0 = 0)
- * 3 level page tables (SL = 1)
*/
-#define VTCR_EL2_TGRAN_FLAGS (VTCR_EL2_TG0_4K | VTCR_EL2_SL0_LVL1)
+#define VTCR_EL2_TGRAN VTCR_EL2_TG0_4K
#define VTCR_EL2_TGRAN_SL0_BASE 2UL
#endif

-#define VTCR_EL2_FLAGS (VTCR_EL2_COMMON_BITS | VTCR_EL2_TGRAN_FLAGS)
+#define VTCR_EL2_FLAGS (VTCR_EL2_COMMON_BITS | VTCR_EL2_TGRAN)
+
/*
* VTCR_EL2:SL0 indicates the entry level for Stage2 translation.
* Interestingly, it depends on the page size.
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 24961b7..977f945 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -66,7 +66,7 @@ extern void __vgic_v3_init_lrs(void);

extern u32 __kvm_get_mdcr_el2(void);

-extern u32 __init_stage2_translation(void);
+extern void __init_stage2_translation(void);

extern void __qcom_hyp_sanitize_btac_predictors(void);

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 596f8e4..9f3c8b8 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -392,10 +392,12 @@ int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,

static inline void __cpu_init_stage2(void)
{
- u32 parange = kvm_call_hyp(__init_stage2_translation);
+ u32 ps;

- WARN_ONCE(parange < 40,
- "PARange is %d bits, unsupported configuration!", parange);
+ kvm_call_hyp(__init_stage2_translation);
+ ps = id_aa64mmfr0_parange_to_phys_shift(read_sysreg(id_aa64mmfr0_el1));
+ WARN_ONCE(ps < 40,
+ "PARange is %d bits, unsupported configuration!", ps);
}

/*
diff --git a/arch/arm64/kvm/hyp/s2-setup.c b/arch/arm64/kvm/hyp/s2-setup.c
index b1129c8..5c26ad4b 100644
--- a/arch/arm64/kvm/hyp/s2-setup.c
+++ b/arch/arm64/kvm/hyp/s2-setup.c
@@ -19,13 +19,11 @@
#include <asm/kvm_arm.h>
#include <asm/kvm_asm.h>
#include <asm/kvm_hyp.h>
-#include <asm/cpufeature.h>

-u32 __hyp_text __init_stage2_translation(void)
+void __hyp_text __init_stage2_translation(void)
{
u64 val = VTCR_EL2_FLAGS;
u64 parange;
- u32 phys_shift;
u64 tmp;

/*
@@ -38,16 +36,6 @@ u32 __hyp_text __init_stage2_translation(void)
parange = ID_AA64MMFR0_PARANGE_MAX;
val |= parange << 16;

- /* Compute the actual PARange... */
- phys_shift = id_aa64mmfr0_parange_to_phys_shift(parange);
-
- /*
- * ... and clamp it to 40 bits, unless we have some braindead
- * HW that implements less than that. In all cases, we'll
- * return that value for the rest of the kernel to decide what
- * to do.
- */
- val |= 64 - (phys_shift > 40 ? 40 : phys_shift);

/*
* Check the availability of Hardware Access Flag / Dirty Bit
@@ -67,6 +55,4 @@ u32 __hyp_text __init_stage2_translation(void)
VTCR_EL2_VS_8BIT;

write_sysreg(val, vtcr_el2);
-
- return phys_shift;
}
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 870f4b1..5ccd3ae 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -164,6 +164,12 @@ static void __hyp_text __deactivate_traps(struct kvm_vcpu *vcpu)
static void __hyp_text __activate_vm(struct kvm_vcpu *vcpu)
{
struct kvm *kvm = kern_hyp_va(vcpu->kvm);
+ u64 vtcr = read_sysreg(vtcr_el2);
+
+ vtcr &= ~VTCR_EL2_PRIVATE_MASK;
+ vtcr |= VTCR_EL2_SL0(kvm_stage2_levels(kvm)) |
+ VTCR_EL2_T0SZ(kvm_phys_shift(kvm));
+ write_sysreg(vtcr, vtcr_el2);
write_sysreg(kvm->arch.vttbr, vttbr_el2);
}

--
2.7.4


2018-03-27 13:18:23

by Suzuki K Poulose

[permalink] [raw]
Subject: [kvmtool PATCH 20/17] kvmtool: arm64: Switch memory layout

If the guest wants to use a larger physical address space place
the RAM at upper half of the address space. Otherwise, it uses the
default layout.

Signed-off-by: Suzuki K Poulose <[email protected]>
---
arm/aarch32/include/kvm/kvm-arch.h | 1 +
arm/aarch64/include/kvm/kvm-arch.h | 15 ++++++++++++---
arm/include/arm-common/kvm-arch.h | 11 ++++++-----
arm/kvm.c | 2 +-
4 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/arm/aarch32/include/kvm/kvm-arch.h b/arm/aarch32/include/kvm/kvm-arch.h
index cd31e72..2d62aab 100644
--- a/arm/aarch32/include/kvm/kvm-arch.h
+++ b/arm/aarch32/include/kvm/kvm-arch.h
@@ -4,6 +4,7 @@
#define ARM_KERN_OFFSET(...) 0x8000

#define ARM_MAX_MEMORY(...) ARM_LOMAP_MAX_MEMORY
+#define ARM_MEMORY_AREA(...) ARM32_MEMORY_AREA

#include "arm-common/kvm-arch.h"

diff --git a/arm/aarch64/include/kvm/kvm-arch.h b/arm/aarch64/include/kvm/kvm-arch.h
index 9de623a..bad35b9 100644
--- a/arm/aarch64/include/kvm/kvm-arch.h
+++ b/arm/aarch64/include/kvm/kvm-arch.h
@@ -1,14 +1,23 @@
#ifndef KVM__KVM_ARCH_H
#define KVM__KVM_ARCH_H

+#include "arm-common/kvm-arch.h"
+
+#define ARM64_MEMORY_AREA(phys_shift) (1UL << (phys_shift - 1))
+#define ARM64_MAX_MEMORY(phys_shift) \
+ ((1ULL << (phys_shift)) - ARM64_MEMORY_AREA(phys_shift))
+
+#define ARM_MEMORY_AREA(kvm) ((kvm)->cfg.arch.aarch32_guest ? \
+ ARM32_MEMORY_AREA : \
+ ARM64_MEMORY_AREA(kvm->cfg.arch.phys_shift))
+
#define ARM_KERN_OFFSET(kvm) ((kvm)->cfg.arch.aarch32_guest ? \
0x8000 : \
0x80000)

#define ARM_MAX_MEMORY(kvm) ((kvm)->cfg.arch.aarch32_guest ? \
- ARM_LOMAP_MAX_MEMORY : \
- ARM_HIMAP_MAX_MEMORY)
+ ARM32_MAX_MEMORY : \
+ ARM64_MAX_MEMORY(kvm->cfg.arch.phys_shift))

-#include "arm-common/kvm-arch.h"

#endif /* KVM__KVM_ARCH_H */
diff --git a/arm/include/arm-common/kvm-arch.h b/arm/include/arm-common/kvm-arch.h
index c83c45f..ca7ab0f 100644
--- a/arm/include/arm-common/kvm-arch.h
+++ b/arm/include/arm-common/kvm-arch.h
@@ -6,14 +6,15 @@
#include <linux/types.h>

#include "arm-common/gic.h"
-
#define ARM_IOPORT_AREA _AC(0x0000000000000000, UL)
#define ARM_MMIO_AREA _AC(0x0000000000010000, UL)
#define ARM_AXI_AREA _AC(0x0000000040000000, UL)
-#define ARM_MEMORY_AREA _AC(0x0000000080000000, UL)

-#define ARM_LOMAP_MAX_MEMORY ((1ULL << 32) - ARM_MEMORY_AREA)
-#define ARM_HIMAP_MAX_MEMORY ((1ULL << 40) - ARM_MEMORY_AREA)
+#define ARM32_MEMORY_AREA _AC(0x0000000080000000, UL)
+#define ARM32_MAX_MEMORY ((1ULL << 32) - ARM32_MEMORY_AREA)
+
+#define ARM_IOMEM_AREA_END ARM32_MEMORY_AREA
+

#define ARM_GIC_DIST_BASE (ARM_AXI_AREA - ARM_GIC_DIST_SIZE)
#define ARM_GIC_CPUI_BASE (ARM_GIC_DIST_BASE - ARM_GIC_CPUI_SIZE)
@@ -24,7 +25,7 @@
#define ARM_IOPORT_SIZE (ARM_MMIO_AREA - ARM_IOPORT_AREA)
#define ARM_VIRTIO_MMIO_SIZE (ARM_AXI_AREA - (ARM_MMIO_AREA + ARM_GIC_SIZE))
#define ARM_PCI_CFG_SIZE (1ULL << 24)
-#define ARM_PCI_MMIO_SIZE (ARM_MEMORY_AREA - \
+#define ARM_PCI_MMIO_SIZE (ARM_IOMEM_AREA_END - \
(ARM_AXI_AREA + ARM_PCI_CFG_SIZE))

#define KVM_IOPORT_AREA ARM_IOPORT_AREA
diff --git a/arm/kvm.c b/arm/kvm.c
index 2ab436e..5701d41 100644
--- a/arm/kvm.c
+++ b/arm/kvm.c
@@ -30,7 +30,7 @@ void kvm__init_ram(struct kvm *kvm)
u64 phys_start, phys_size;
void *host_mem;

- phys_start = ARM_MEMORY_AREA;
+ phys_start = ARM_MEMORY_AREA(kvm);
phys_size = kvm->ram_size;
host_mem = kvm->ram_start;

--
1.9.1


2018-03-27 13:18:41

by Suzuki K Poulose

[permalink] [raw]
Subject: [kvmtool PATCH 21/17] kvmtool: arm: Add support for creating VM with PA size

Specify the physical size for the VM encoded in the vm type.

Signed-off-by: Suzuki K Poulose <[email protected]>
---
arm/include/arm-common/kvm-arch.h | 6 +++++-
arm/kvm.c | 21 +++++++++++++++++++++
2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/arm/include/arm-common/kvm-arch.h b/arm/include/arm-common/kvm-arch.h
index ca7ab0f..64bc92a 100644
--- a/arm/include/arm-common/kvm-arch.h
+++ b/arm/include/arm-common/kvm-arch.h
@@ -42,7 +42,11 @@

#define KVM_IRQ_OFFSET GIC_SPI_IRQ_BASE

-#define KVM_VM_TYPE 0
+extern unsigned long kvm_arm_type;
+extern void kvm__arch_init_hyp(struct kvm *kvm);
+
+#define KVM_VM_TYPE kvm_arm_type
+#define kvm__arch_init_hyp kvm__arch_init_hyp

#define VIRTIO_DEFAULT_TRANS(kvm) \
((kvm)->cfg.arch.virtio_trans_pci ? VIRTIO_PCI : VIRTIO_MMIO)
diff --git a/arm/kvm.c b/arm/kvm.c
index 5701d41..a9a9140 100644
--- a/arm/kvm.c
+++ b/arm/kvm.c
@@ -11,6 +11,8 @@
#include <linux/kvm.h>
#include <linux/sizes.h>

+unsigned long kvm_arm_type;
+
struct kvm_ext kvm_req_ext[] = {
{ DEFINE_KVM_EXT(KVM_CAP_IRQCHIP) },
{ DEFINE_KVM_EXT(KVM_CAP_ONE_REG) },
@@ -18,6 +20,25 @@ struct kvm_ext kvm_req_ext[] = {
{ 0, 0 },
};

+#ifndef KVM_ARM_GET_MAX_VM_PHYS_SHIFT
+#define KVM_ARM_GET_MAX_VM_PHYS_SHIFT _IO(KVMIO, 0x0a)
+#endif
+
+void kvm__arch_init_hyp(struct kvm *kvm)
+{
+ unsigned max_ipa;
+
+ max_ipa = ioctl(kvm->sys_fd, KVM_ARM_GET_MAX_VM_PHYS_SHIFT);
+ if (max_ipa < 0)
+ max_ipa = 40;
+ if (!kvm->cfg.arch.phys_shift)
+ kvm->cfg.arch.phys_shift = 40;
+ if (kvm->cfg.arch.phys_shift > max_ipa)
+ die("Requested PA size (%u) is not supported by the host (%ubits)\n",
+ kvm->cfg.arch.phys_shift, max_ipa);
+ kvm_arm_type = kvm->cfg.arch.phys_shift;
+}
+
bool kvm__arch_cpu_supports_vm(void)
{
/* The KVM capability check is enough. */
--
1.9.1


2018-03-27 13:19:43

by Suzuki K Poulose

[permalink] [raw]
Subject: [kvmtool PATCH 19/17] kvmtool: arm64: Add support for guest physical address size

Add an option to specify the physical address size used by this
VM.

Signed-off-by: Suzuki K Poulose <[email protected]>
---
arm/aarch64/include/kvm/kvm-config-arch.h | 5 ++++-
arm/include/arm-common/kvm-config-arch.h | 1 +
2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/arm/aarch64/include/kvm/kvm-config-arch.h b/arm/aarch64/include/kvm/kvm-config-arch.h
index 04be43d..c4bb207 100644
--- a/arm/aarch64/include/kvm/kvm-config-arch.h
+++ b/arm/aarch64/include/kvm/kvm-config-arch.h
@@ -8,7 +8,10 @@
"Create PMUv3 device"), \
OPT_U64('\0', "kaslr-seed", &(cfg)->kaslr_seed, \
"Specify random seed for Kernel Address Space " \
- "Layout Randomization (KASLR)"),
+ "Layout Randomization (KASLR)"), \
+ OPT_UINTEGER('\0', "phys-shift", &(cfg)->phys_shift, \
+ "Specify maximum physical address size (not " \
+ "the amount of memory)"),

#include "arm-common/kvm-config-arch.h"

diff --git a/arm/include/arm-common/kvm-config-arch.h b/arm/include/arm-common/kvm-config-arch.h
index 6a196f1..d841b0b 100644
--- a/arm/include/arm-common/kvm-config-arch.h
+++ b/arm/include/arm-common/kvm-config-arch.h
@@ -11,6 +11,7 @@ struct kvm_config_arch {
bool has_pmuv3;
u64 kaslr_seed;
enum irqchip_type irqchip;
+ unsigned int phys_shift;
};

int irqchip_parser(const struct option *opt, const char *arg, int unset);
--
1.9.1


2018-03-27 13:20:07

by Suzuki K Poulose

[permalink] [raw]
Subject: [PATCH v2 17/17] kvm: arm64: Allow IPA size supported by the system

So far we have restricted the IPA size of the VM to the default
value (40bits). Now that we can manage the IPA size per VM and
support dynamic stage2 page tables, allow VMs to bigger IPA.
This is done by setting the IPA limit to the one supported by
the hardware and kernel. This patch also moves the check for
the default IPA size support to kvm_get_ipa_limit().

Cc: Marc Zyngier <[email protected]>
Cc: Christoffer Dall <[email protected]>
Signed-off-by: Suzuki K Poulose <[email protected]>
---
arch/arm64/include/asm/kvm_host.h | 5 -----
arch/arm64/include/asm/kvm_mmu.h | 18 +++++++++++++++++-
2 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 7b0af32..ec61e08 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -402,12 +402,7 @@ int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,

static inline void __cpu_init_stage2(void)
{
- u32 ps;
-
kvm_call_hyp(__init_stage2_translation);
- ps = id_aa64mmfr0_parange_to_phys_shift(read_sysreg(id_aa64mmfr0_el1));
- WARN_ONCE(ps < 40,
- "PARange is %d bits, unsupported configuration!", ps);
}

/*
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index e86d7f4..6c5f0be 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -414,7 +414,23 @@ static inline u64 kvm_vttbr_baddr_mask(struct kvm *kvm)

static inline u32 kvm_get_ipa_limit(void)
{
- return KVM_PHYS_SHIFT;
+ unsigned int ipa_max, parange;
+
+ parange = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1) & 0x7;
+ ipa_max = id_aa64mmfr0_parange_to_phys_shift(parange);
+
+ /* Raise the limit to the default size for backward compatibility */
+ if (ipa_max < KVM_PHYS_SHIFT) {
+ WARN_ONCE(1,
+ "PARange is %d bits, unsupported configuration!",
+ ipa_max);
+ ipa_max = KVM_PHYS_SHIFT;
+ }
+
+ /* Clamp it to the size supported by the kernel */
+ ipa_max = (ipa_max > PHYS_MASK_SHIFT) ? PHYS_MASK_SHIFT : ipa_max;
+
+ return ipa_max;
}

static inline void kvm_config_stage2(struct kvm *kvm, u8 ipa_shift)
--
2.7.4


2018-03-27 13:20:22

by Suzuki K Poulose

[permalink] [raw]
Subject: [kvmtool PATCH 18/17] kvmtool: Allow backends to run checks on the KVM device fd

Allow architectures to perform initialisation based on the
KVM device fd ioctls, even before the VM is created.

Signed-off-by: Suzuki K Poulose <[email protected]>
---
include/kvm/kvm.h | 4 ++++
kvm.c | 2 ++
2 files changed, 6 insertions(+)

diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h
index 90463b8..a036dd2 100644
--- a/include/kvm/kvm.h
+++ b/include/kvm/kvm.h
@@ -103,6 +103,10 @@ int kvm__get_sock_by_instance(const char *name);
int kvm__enumerate_instances(int (*callback)(const char *name, int pid));
void kvm__remove_socket(const char *name);

+#ifndef kvm__arch_init_hyp
+static inline void kvm__arch_init_hyp(struct kvm *kvm) {}
+#endif
+
void kvm__arch_set_cmdline(char *cmdline, bool video);
void kvm__arch_init(struct kvm *kvm, const char *hugetlbfs_path, u64 ram_size);
void kvm__arch_delete_ram(struct kvm *kvm);
diff --git a/kvm.c b/kvm.c
index f8f2fdc..b992e74 100644
--- a/kvm.c
+++ b/kvm.c
@@ -304,6 +304,8 @@ int kvm__init(struct kvm *kvm)
goto err_sys_fd;
}

+ kvm__arch_init_hyp(kvm);
+
kvm->vm_fd = ioctl(kvm->sys_fd, KVM_CREATE_VM, KVM_VM_TYPE);
if (kvm->vm_fd < 0) {
pr_err("KVM_CREATE_VM ioctl");
--
1.9.1


2018-03-27 13:21:05

by Suzuki K Poulose

[permalink] [raw]
Subject: [PATCH v2 15/17] vgic: Add support for 52bit guest physical address

From: Kristina Martsenko <[email protected]>

Add support for handling 52bit guest physical address to the
VGIC layer. So far we have limited the guest physical address
to 48bits, by explicitly masking the upper bits. This patch
removes the restriction. We do not have to check if the host
supports 52bit as the gpa is always validated during an access.
(e.g, kvm_{read/write}_guest, kvm_is_visible_gfn()).
Also, the ITS table save-restore is also not affected with
the enhancement. The DTE entries already store the bits[51:8]
of the ITT_addr (with a 256byte alignment).

Cc: Marc Zyngier <[email protected]>
Cc: Christoffer Dall <[email protected]>
Signed-off-by: Kristina Martsenko <[email protected]>
[ Macro clean ups, fix PROPBASER and PENDBASER accesses ]
Signed-off-by: Suzuki K Poulose <[email protected]>
---
include/linux/irqchip/arm-gic-v3.h | 5 +++++
virt/kvm/arm/vgic/vgic-its.c | 37 ++++++++++---------------------------
virt/kvm/arm/vgic/vgic-mmio-v3.c | 2 --
3 files changed, 15 insertions(+), 29 deletions(-)

diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
index c00c4c33..d9c60eb 100644
--- a/include/linux/irqchip/arm-gic-v3.h
+++ b/include/linux/irqchip/arm-gic-v3.h
@@ -344,6 +344,8 @@
#define GITS_CBASER_RaWaWt GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, RaWaWt)
#define GITS_CBASER_RaWaWb GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, RaWaWb)

+#define GITS_CBASER_ADDRESS(cbaser) ((cbaser) & GENMASK_ULL(52, 12))
+
#define GITS_BASER_NR_REGS 8

#define GITS_BASER_VALID (1ULL << 63)
@@ -375,6 +377,9 @@
#define GITS_BASER_ENTRY_SIZE_MASK GENMASK_ULL(52, 48)
#define GITS_BASER_PHYS_52_to_48(phys) \
(((phys) & GENMASK_ULL(47, 16)) | (((phys) >> 48) & 0xf) << 12)
+#define GITS_BASER_ADDR_48_to_52(baser) \
+ (((baser) & GENMASK_ULL(47, 16)) | (((baser) >> 12) & 0xf) << 48)
+
#define GITS_BASER_SHAREABILITY_SHIFT (10)
#define GITS_BASER_InnerShareable \
GIC_BASER_SHAREABILITY(GITS_BASER, InnerShareable)
diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
index 4650953..ae6bfaf 100644
--- a/virt/kvm/arm/vgic/vgic-its.c
+++ b/virt/kvm/arm/vgic/vgic-its.c
@@ -233,13 +233,6 @@ static struct its_ite *find_ite(struct vgic_its *its, u32 device_id,
list_for_each_entry(dev, &(its)->device_list, dev_list) \
list_for_each_entry(ite, &(dev)->itt_head, ite_list)

-/*
- * We only implement 48 bits of PA at the moment, although the ITS
- * supports more. Let's be restrictive here.
- */
-#define BASER_ADDRESS(x) ((x) & GENMASK_ULL(47, 16))
-#define CBASER_ADDRESS(x) ((x) & GENMASK_ULL(47, 12))
-
#define GIC_LPI_OFFSET 8192

#define VITS_TYPER_IDBITS 16
@@ -745,6 +738,7 @@ static bool vgic_its_check_id(struct vgic_its *its, u64 baser, u32 id,
{
int l1_tbl_size = GITS_BASER_NR_PAGES(baser) * SZ_64K;
u64 indirect_ptr, type = GITS_BASER_TYPE(baser);
+ phys_addr_t base = GITS_BASER_ADDR_48_to_52(baser);
int esz = GITS_BASER_ENTRY_SIZE(baser);
int index;
gfn_t gfn;
@@ -769,7 +763,7 @@ static bool vgic_its_check_id(struct vgic_its *its, u64 baser, u32 id,
if (id >= (l1_tbl_size / esz))
return false;

- addr = BASER_ADDRESS(baser) + id * esz;
+ addr = base + id * esz;
gfn = addr >> PAGE_SHIFT;

if (eaddr)
@@ -783,8 +777,7 @@ static bool vgic_its_check_id(struct vgic_its *its, u64 baser, u32 id,
return false;

/* Each 1st level entry is represented by a 64-bit value. */
- if (kvm_read_guest(its->dev->kvm,
- BASER_ADDRESS(baser) + index * sizeof(indirect_ptr),
+ if (kvm_read_guest(its->dev->kvm, base + index * sizeof(indirect_ptr),
&indirect_ptr, sizeof(indirect_ptr)))
return false;

@@ -794,11 +787,7 @@ static bool vgic_its_check_id(struct vgic_its *its, u64 baser, u32 id,
if (!(indirect_ptr & BIT_ULL(63)))
return false;

- /*
- * Mask the guest physical address and calculate the frame number.
- * Any address beyond our supported 48 bits of PA will be caught
- * by the actual check in the final step.
- */
+ /* Mask the guest physical address and calculate the frame number. */
indirect_ptr &= GENMASK_ULL(51, 16);

/* Find the address of the actual entry */
@@ -1290,9 +1279,6 @@ static u64 vgic_sanitise_its_baser(u64 reg)
GITS_BASER_OUTER_CACHEABILITY_SHIFT,
vgic_sanitise_outer_cacheability);

- /* Bits 15:12 contain bits 51:48 of the PA, which we don't support. */
- reg &= ~GENMASK_ULL(15, 12);
-
/* We support only one (ITS) page size: 64K */
reg = (reg & ~GITS_BASER_PAGE_SIZE_MASK) | GITS_BASER_PAGE_SIZE_64K;

@@ -1311,11 +1297,8 @@ static u64 vgic_sanitise_its_cbaser(u64 reg)
GITS_CBASER_OUTER_CACHEABILITY_SHIFT,
vgic_sanitise_outer_cacheability);

- /*
- * Sanitise the physical address to be 64k aligned.
- * Also limit the physical addresses to 48 bits.
- */
- reg &= ~(GENMASK_ULL(51, 48) | GENMASK_ULL(15, 12));
+ /* Sanitise the physical address to be 64k aligned. */
+ reg &= ~GENMASK_ULL(15, 12);

return reg;
}
@@ -1361,7 +1344,7 @@ static void vgic_its_process_commands(struct kvm *kvm, struct vgic_its *its)
if (!its->enabled)
return;

- cbaser = CBASER_ADDRESS(its->cbaser);
+ cbaser = GITS_CBASER_ADDRESS(its->cbaser);

while (its->cwriter != its->creadr) {
int ret = kvm_read_guest(kvm, cbaser + its->creadr,
@@ -2219,7 +2202,7 @@ static int vgic_its_restore_device_tables(struct vgic_its *its)
if (!(baser & GITS_BASER_VALID))
return 0;

- l1_gpa = BASER_ADDRESS(baser);
+ l1_gpa = GITS_BASER_ADDR_48_to_52(baser);

if (baser & GITS_BASER_INDIRECT) {
l1_esz = GITS_LVL1_ENTRY_SIZE;
@@ -2291,7 +2274,7 @@ static int vgic_its_save_collection_table(struct vgic_its *its)
{
const struct vgic_its_abi *abi = vgic_its_get_abi(its);
u64 baser = its->baser_coll_table;
- gpa_t gpa = BASER_ADDRESS(baser);
+ gpa_t gpa = GITS_BASER_ADDR_48_to_52(baser);
struct its_collection *collection;
u64 val;
size_t max_size, filled = 0;
@@ -2340,7 +2323,7 @@ static int vgic_its_restore_collection_table(struct vgic_its *its)
if (!(baser & GITS_BASER_VALID))
return 0;

- gpa = BASER_ADDRESS(baser);
+ gpa = GITS_BASER_ADDR_48_to_52(baser);

max_size = GITS_BASER_NR_PAGES(baser) * SZ_64K;

diff --git a/virt/kvm/arm/vgic/vgic-mmio-v3.c b/virt/kvm/arm/vgic/vgic-mmio-v3.c
index 671fe81..80e0d15 100644
--- a/virt/kvm/arm/vgic/vgic-mmio-v3.c
+++ b/virt/kvm/arm/vgic/vgic-mmio-v3.c
@@ -333,7 +333,6 @@ static u64 vgic_sanitise_pendbaser(u64 reg)
vgic_sanitise_outer_cacheability);

reg &= ~PENDBASER_RES0_MASK;
- reg &= ~GENMASK_ULL(51, 48);

return reg;
}
@@ -351,7 +350,6 @@ static u64 vgic_sanitise_propbaser(u64 reg)
vgic_sanitise_outer_cacheability);

reg &= ~PROPBASER_RES0_MASK;
- reg &= ~GENMASK_ULL(51, 48);
return reg;
}

--
2.7.4


2018-03-27 13:21:24

by Suzuki K Poulose

[permalink] [raw]
Subject: [PATCH v2 16/17] kvm: arm64: Add support for handling 52bit IPA

Add support for handling the 52bit IPA. 52bit IPA
support needs changes to the following :

1) Page-table entries - We use kernel page table helpers for setting
up the stage2. Hence we don't explicit changes here

2) VTTBR:BADDR - This is already supported with :
commit 529c4b05a3cb2f324aa ("arm64: handle 52-bit addresses in TTBR")

3) VGIC support for 52bit: Supported with a patch in this series.

That leaves us with the handling for PAR and HPAR. This patch adds
support for handling the 52bit addresses in PAR and HPFAR,
which are used while handling the permission faults in stage1.

Cc: Marc Zyngier <[email protected]>
Cc: Kristina Martsenko <[email protected]>
Cc: Christoffer Dall <[email protected]>
Signed-off-by: Suzuki K Poulose <[email protected]>
---
arch/arm64/include/asm/kvm_arm.h | 7 +++++++
arch/arm64/kvm/hyp/switch.c | 2 +-
2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 2b278dc..e48852c 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -302,6 +302,13 @@

/* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */
#define HPFAR_MASK (~UL(0xf))
+/*
+ * We have
+ * PAR [PA_Shift - 1 : 12] = PA [PA_Shift - 1 : 12]
+ * HPFAR [PA_Shift - 9 : 4] = FIPA [PA_Shift - 1 : 12]
+ */
+#define PAR_TO_HPFAR(par) \
+ (((par) & GENMASK_ULL(PHYS_MASK_SHIFT - 1, 12)) >> 8)

#define kvm_arm_exception_type \
{0, "IRQ" }, \
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 794da55..b785853 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -241,7 +241,7 @@ static bool __hyp_text __translate_far_to_hpfar(u64 far, u64 *hpfar)
return false; /* Translation failed, back to guest */

/* Convert PAR to HPFAR format */
- *hpfar = ((tmp >> 12) & ((1UL << 36) - 1)) << 4;
+ *hpfar = PAR_TO_HPFAR(tmp);
return true;
}

--
2.7.4


2018-03-27 13:22:01

by Suzuki K Poulose

[permalink] [raw]
Subject: [PATCH v2 13/17] kvm: arm/arm64: Allow tuning the physical address size for VM

Allow specifying the physical address size for a new VM via
the kvm_type argument for KVM_CREATE_VM ioctl. This allows
us to finalise the stage2 page table format as early as possible
and hence perform the right checks on the memory slots without
complication. The size is encoded as Log2(PA_Siz) in the bits[7:0]
of the type field and can encode more information in the future if
required.

Cc: Marc Zyngier <[email protected]>
Cc: Christoffer Dall <[email protected]>
Cc: Peter Maydel <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Signed-off-by: Suzuki K Poulose <[email protected]>
---
arch/arm/include/asm/kvm_mmu.h | 2 ++
arch/arm64/include/asm/kvm_mmu.h | 2 ++
include/uapi/linux/kvm.h | 10 ++++++++++
virt/kvm/arm/arm.c | 24 ++++++++++++++++++++++--
4 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 50896da..3f13827 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -323,6 +323,8 @@ static inline u32 kvm_get_ipa_limit(void)
return KVM_PHYS_SHIFT;
}

+static inline void kvm_config_stage2(struct kvm *kvm, u32 ipa_shift) {}
+
#endif /* !__ASSEMBLY__ */

#endif /* __ARM_KVM_MMU_H__ */
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index a4c8c00..bb458bf 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -416,5 +416,7 @@ static inline u32 kvm_get_ipa_limit(void)
return KVM_PHYS_SHIFT;
}

+static inline void kvm_config_stage2(struct kvm *kvm, u32 ipa_shift) {}
+
#endif /* __ASSEMBLY__ */
#endif /* __ARM64_KVM_MMU_H__ */
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index b737ee1..67b09b0 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -740,6 +740,16 @@ struct kvm_ppc_resize_hpt {
#define KVM_S390_SIE_PAGE_OFFSET 1

/*
+ * On arm/arm64, machine type can be used to request the physical
+ * address size for the VM. Bits [7-0] have been reserved for the
+ * PA size shift (i.e, log2(PA-Size)). For backward compatibility,
+ * value 0 implies the default IPA size, which is 40bits.
+ */
+#define KVM_VM_TYPE_ARM_PHYS_SHIFT_MASK 0xff
+#define KVM_VM_TYPE_ARM_PHYS_SHIFT(x) \
+ ((x) & KVM_VM_TYPE_ARM_PHYS_SHIFT_MASK)
+
+/*
* ioctls for /dev/kvm fds:
*/
#define KVM_GET_API_VERSION _IO(KVMIO, 0x00)
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 53bb05c..13eb66f 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -110,6 +110,25 @@ void kvm_arch_check_processor_compat(void *rtn)
}


+static int kvm_arch_config_vm(struct kvm *kvm, unsigned long type)
+{
+ u32 ipa_shift = (u32)type & KVM_VM_TYPE_ARM_PHYS_SHIFT_MASK;
+
+ /*
+ * Make sure the size, if specified, is within the range of
+ * default size and supported maximum limit.
+ */
+ if (ipa_shift) {
+ if (ipa_shift < KVM_PHYS_SHIFT || ipa_shift > kvm_ipa_limit)
+ return -EINVAL;
+ } else {
+ ipa_shift = KVM_PHYS_SHIFT;
+ }
+
+ kvm_config_stage2(kvm, ipa_shift);
+ return 0;
+}
+
/**
* kvm_arch_init_vm - initializes a VM data structure
* @kvm: pointer to the KVM struct
@@ -118,8 +137,9 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
{
int ret, cpu;

- if (type)
- return -EINVAL;
+ ret = kvm_arch_config_vm(kvm, type);
+ if (ret)
+ return ret;

kvm->arch.last_vcpu_ran = alloc_percpu(typeof(*kvm->arch.last_vcpu_ran));
if (!kvm->arch.last_vcpu_ran)
--
2.7.4


2018-03-27 13:22:28

by Suzuki K Poulose

[permalink] [raw]
Subject: [PATCH v2 14/17] kvm: arm64: Switch to per VM IPA limit

Now that we can manage the stage2 page table per VM, switch the
configuration details to per VM instance. We keep track of the
IPA bits, number of page table levels and the VTCR bits (which
depends on the IPA and the number of levels). While at it, remove
unused pgd_lock field from kvm_arch for arm64.

Cc: Marc Zyngier <[email protected]>
Cc: Christoffer Dall <[email protected]>
Signed-off-by: Suzuki K Poulose <[email protected]>
---
arch/arm64/include/asm/kvm_host.h | 14 ++++++++++++--
arch/arm64/include/asm/kvm_mmu.h | 11 +++++++++--
arch/arm64/include/asm/stage2_pgtable.h | 1 -
arch/arm64/kvm/hyp/switch.c | 3 +--
virt/kvm/arm/mmu.c | 4 ++++
5 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 9f3c8b8..7b0af32 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -60,13 +60,23 @@ struct kvm_arch {
u64 vmid_gen;
u32 vmid;

- /* 1-level 2nd stage table and lock */
- spinlock_t pgd_lock;
+ /* stage-2 page table */
pgd_t *pgd;

/* VTTBR value associated with above pgd and vmid */
u64 vttbr;

+ /* Private bits of VTCR_EL2 for this VM */
+ u64 vtcr_private;
+ /* Size of the PA size for this guest */
+ u8 phys_shift;
+ /*
+ * Number of levels in page table. We could always calculate
+ * it from phys_shift above. We cache it for faster switches
+ * in stage2 page table helpers.
+ */
+ u8 s2_levels;
+
/* The last vcpu id that ran on each physical CPU */
int __percpu *last_vcpu_ran;

diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index bb458bf..e86d7f4 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -136,9 +136,10 @@ static inline unsigned long __kern_hyp_va(unsigned long v)
*/
#define KVM_PHYS_SHIFT (40)

-#define kvm_phys_shift(kvm) KVM_PHYS_SHIFT
+#define kvm_phys_shift(kvm) (kvm->arch.phys_shift)
#define kvm_phys_size(kvm) (_AC(1, ULL) << kvm_phys_shift(kvm))
#define kvm_phys_mask(kvm) (kvm_phys_size(kvm) - _AC(1, ULL))
+#define kvm_stage2_levels(kvm) (kvm->arch.s2_levels)

static inline bool kvm_page_empty(void *ptr)
{
@@ -416,7 +417,13 @@ static inline u32 kvm_get_ipa_limit(void)
return KVM_PHYS_SHIFT;
}

-static inline void kvm_config_stage2(struct kvm *kvm, u32 ipa_shift) {}
+static inline void kvm_config_stage2(struct kvm *kvm, u8 ipa_shift)
+{
+ kvm->arch.phys_shift = ipa_shift;
+ kvm->arch.s2_levels = stage2_pt_levels(ipa_shift);
+ kvm->arch.vtcr_private = VTCR_EL2_SL0(kvm->arch.s2_levels) |
+ TCR_T0SZ(ipa_shift);
+}

#endif /* __ASSEMBLY__ */
#endif /* __ARM64_KVM_MMU_H__ */
diff --git a/arch/arm64/include/asm/stage2_pgtable.h b/arch/arm64/include/asm/stage2_pgtable.h
index 33e8ebb..9b75b83 100644
--- a/arch/arm64/include/asm/stage2_pgtable.h
+++ b/arch/arm64/include/asm/stage2_pgtable.h
@@ -44,7 +44,6 @@
*/
#define __s2_pgd_ptrs(pa, lvls) (1 << ((pa) - pt_levels_pgdir_shift((lvls))))

-#define kvm_stage2_levels(kvm) stage2_pt_levels(kvm_phys_shift(kvm))
#define stage2_pgdir_shift(kvm) \
pt_levels_pgdir_shift(kvm_stage2_levels(kvm))
#define stage2_pgdir_size(kvm) (_AC(1, UL) << stage2_pgdir_shift((kvm)))
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 5ccd3ae..794da55 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -167,8 +167,7 @@ static void __hyp_text __activate_vm(struct kvm_vcpu *vcpu)
u64 vtcr = read_sysreg(vtcr_el2);

vtcr &= ~VTCR_EL2_PRIVATE_MASK;
- vtcr |= VTCR_EL2_SL0(kvm_stage2_levels(kvm)) |
- VTCR_EL2_T0SZ(kvm_phys_shift(kvm));
+ vtcr |= kvm->arch.vtcr_private;
write_sysreg(vtcr, vtcr_el2);
write_sysreg(kvm->arch.vttbr, vttbr_el2);
}
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 7a264c6..746f38e 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -753,6 +753,10 @@ int kvm_alloc_stage2_pgd(struct kvm *kvm)
return -EINVAL;
}

+ /* Make sure we have the stage2 configured for this VM */
+ if (WARN_ON(!kvm_stage2_levels(kvm)))
+ return -EINVAL;
+
/* Allocate the HW PGD, making sure that each page gets its own refcount */
pgd = alloc_pages_exact(stage2_pgd_size(kvm), GFP_KERNEL | __GFP_ZERO);
if (!pgd)
--
2.7.4


2018-03-27 13:22:39

by Suzuki K Poulose

[permalink] [raw]
Subject: [PATCH v2 12/17] kvm: arm/arm64: Expose supported physical address limit for VM

Expose the maximum physical address size supported by the host
for a VM. This could be later used by the userspace to choose the
appropriate size for a given VM. The limit is determined as the
minimum of actual CPU limit, the kernel limit (i.e, either 48 or 52)
and the stage2 page table support limit (which is 40bits at the moment).
For backward compatibility, we support a minimum of 40bits. The limit
will be lifted as we add support for the stage2 to support the host
kernel PA limit.

This value may be different from what is exposed to the VM via
CPU ID registers. The limit only applies to the stage2 page table.

Cc: Christoffer Dall <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Peter Maydel <[email protected]>
Signed-off-by: Suzuki K Poulose <[email protected]>
---
Documentation/virtual/kvm/api.txt | 14 ++++++++++++++
arch/arm/include/asm/kvm_mmu.h | 5 +++++
arch/arm64/include/asm/kvm_mmu.h | 5 +++++
include/uapi/linux/kvm.h | 6 ++++++
virt/kvm/arm/arm.c | 6 ++++++
5 files changed, 36 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 792fa87..55908a8 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3500,6 +3500,20 @@ Returns: 0 on success; -1 on error
This ioctl can be used to unregister the guest memory region registered
with KVM_MEMORY_ENCRYPT_REG_REGION ioctl above.

+4.113 KVM_ARM_GET_MAX_VM_PHYS_SHIFT
+Capability: basic
+Architectures: arm, arm64
+Type: system ioctl
+Parameters: none
+Returns: log2(Maximum physical address space size) supported by the
+hyperviosr.
+
+This ioctl can be used to identify the maximum physical address space size
+supported by the hypervisor. The returned value indicates the maximum size
+of the address that can be resolved by the stage2 translation table on
+arm/arm64. On arm64, the value is decided based on the host kernel
+configuration and the system wide safe value of ID_AA64MMFR0_EL1:PARange.
+This may not match the value exposed to the VM in CPU ID registers.

5. The kvm_run structure
------------------------
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 305aa40..50896da 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -318,6 +318,11 @@ static inline int kvm_map_vectors(void)

#define kvm_phys_to_vttbr(addr) (addr)

+static inline u32 kvm_get_ipa_limit(void)
+{
+ return KVM_PHYS_SHIFT;
+}
+
#endif /* !__ASSEMBLY__ */

#endif /* __ARM_KVM_MMU_H__ */
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 447bc10..a4c8c00 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -411,5 +411,10 @@ static inline u64 kvm_vttbr_baddr_mask(struct kvm *kvm)
return GENMASK_ULL(PHYS_MASK_SHIFT - 1, x);
}

+static inline u32 kvm_get_ipa_limit(void)
+{
+ return KVM_PHYS_SHIFT;
+}
+
#endif /* __ASSEMBLY__ */
#endif /* __ARM64_KVM_MMU_H__ */
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 0fb5ef9..b737ee1 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -763,6 +763,12 @@ struct kvm_ppc_resize_hpt {
#define KVM_GET_EMULATED_CPUID _IOWR(KVMIO, 0x09, struct kvm_cpuid2)

/*
+ * Get the maximum physical address size supported by the host.
+ * Returns log2(Max-Physical-Address-Size)
+ */
+#define KVM_ARM_GET_MAX_VM_PHYS_SHIFT _IO(KVMIO, 0x0a)
+
+/*
* Extension capability list.
*/
#define KVM_CAP_IRQCHIP 0
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 8b35a47..53bb05c 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -64,6 +64,7 @@ static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
static u32 kvm_next_vmid;
static unsigned int kvm_vmid_bits __read_mostly;
static DEFINE_SPINLOCK(kvm_vmid_lock);
+static u32 kvm_ipa_limit;

static bool vgic_present;

@@ -246,6 +247,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
long kvm_arch_dev_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg)
{
+ if (ioctl == KVM_ARM_GET_MAX_VM_PHYS_SHIFT)
+ return kvm_ipa_limit;
+
return -EINVAL;
}

@@ -1348,6 +1352,8 @@ static int init_common_resources(void)
kvm_vmid_bits = kvm_get_vmid_bits();
kvm_info("%d-bit VMID\n", kvm_vmid_bits);

+ kvm_ipa_limit = kvm_get_ipa_limit();
+
return 0;
}

--
2.7.4


2018-03-27 13:23:04

by Suzuki K Poulose

[permalink] [raw]
Subject: [PATCH v2 10/17] kvm: arm64: Dynamic configuration of VTCR and VTTBR mask

VTCR_EL2 holds the following key stage2 translation table
parameters:
SL0 - Entry level in the page table lookup.
T0SZ - Denotes the size of the memory addressed by the table.

We have been using fixed values for the SL0 depending on the
page size as we have a fixed IPA size. But since we are about
to make it dynamic, we need to calculate the SL0 at runtime
per VM.

Also the VTTBR:BADDR holds the base address for the stage2
translation table and the ARM ARM mandates that bits
BADDR[x-1:0] should be 0, where 'x' defined using some
magical constant, which depends on the page size, T0SZ
and the entry level of lookup (Since the entry level page
tables can be concatenated at stage2, a given T0SZ could
possibly start at 2 different levels). We need a way to
calculate this magical value per VM, depending on the
IPA size. Luckily there is a magic formula for finding
the "magic" number to find "x". See the patch for more
details.

This patch adds helpers to figure out the VTCR_SL0 and
the magic "X" for a configuration of stage2.

The other advantage we have with this change is switching
the entry level for a given IPA size, depending on if we
are able to get contiguous block of memory for the entry
level page table. (e.g, With 64KB page size and 46bit IPA
starting at level 2, finding 16 * 64KB contiguous block on a
loaded system could be tricky. So we could decide to rather
enter at level 1, with a single page).

Cc: Marc Zyngier <[email protected]>
Cc: Christoffer Dall <[email protected]>
Signed-off-by: Suzuki K Poulose <[email protected]>
---
arch/arm64/include/asm/kvm_arm.h | 96 +++++++++++++++++++++++++++++++++++++---
arch/arm64/include/asm/kvm_mmu.h | 20 ++++++++-
2 files changed, 110 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index b0c8417..176551c 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -124,6 +124,8 @@
#define VTCR_EL2_VS_8BIT (0 << VTCR_EL2_VS_SHIFT)
#define VTCR_EL2_VS_16BIT (1 << VTCR_EL2_VS_SHIFT)

+#define VTCR_EL2_T0SZ(x) TCR_T0SZ((x))
+
/*
* We configure the Stage-2 page tables to always restrict the IPA space to be
* 40 bits wide (T0SZ = 24). Systems with a PARange smaller than 40 bits are
@@ -150,7 +152,8 @@
* 2 level page tables (SL = 1)
*/
#define VTCR_EL2_TGRAN_FLAGS (VTCR_EL2_TG0_64K | VTCR_EL2_SL0_LVL1)
-#define VTTBR_X_TGRAN_MAGIC 38
+#define VTCR_EL2_TGRAN_SL0_BASE 3UL
+
#elif defined(CONFIG_ARM64_16K_PAGES)
/*
* Stage2 translation configuration:
@@ -158,7 +161,7 @@
* 2 level page tables (SL = 1)
*/
#define VTCR_EL2_TGRAN_FLAGS (VTCR_EL2_TG0_16K | VTCR_EL2_SL0_LVL1)
-#define VTTBR_X_TGRAN_MAGIC 42
+#define VTCR_EL2_TGRAN_SL0_BASE 3UL
#else /* 4K */
/*
* Stage2 translation configuration:
@@ -166,13 +169,96 @@
* 3 level page tables (SL = 1)
*/
#define VTCR_EL2_TGRAN_FLAGS (VTCR_EL2_TG0_4K | VTCR_EL2_SL0_LVL1)
-#define VTTBR_X_TGRAN_MAGIC 37
+#define VTCR_EL2_TGRAN_SL0_BASE 2UL
#endif

#define VTCR_EL2_FLAGS (VTCR_EL2_COMMON_BITS | VTCR_EL2_TGRAN_FLAGS)
-#define VTTBR_X (VTTBR_X_TGRAN_MAGIC - VTCR_EL2_T0SZ_IPA)
+/*
+ * VTCR_EL2:SL0 indicates the entry level for Stage2 translation.
+ * Interestingly, it depends on the page size.
+ * See D.10.2.110, VTCR_EL2, in ARM DDI 0487B.b
+ *
+ * -----------------------------------------
+ * | Entry level | 4K | 16K/64K |
+ * ------------------------------------------
+ * | Level: 0 | 2 | - |
+ * ------------------------------------------
+ * | Level: 1 | 1 | 2 |
+ * ------------------------------------------
+ * | Level: 2 | 0 | 1 |
+ * ------------------------------------------
+ * | Level: 3 | - | 0 |
+ * ------------------------------------------
+ *
+ * That table roughly translates to :
+ *
+ * SL0(PAGE_SIZE, Entry_level) = SL0_BASE(PAGE_SIZE) - Entry_Level
+ *
+ * Where SL0_BASE(4K) = 2 and SL0_BASE(16K) = 3, SL0_BASE(64K) = 3, provided
+ * we take care of ruling out the unsupported cases and
+ * Entry_Level = 4 - Number_of_levels.
+ *
+ */
+#define VTCR_EL2_SL0(levels) \
+ ((VTCR_EL2_TGRAN_SL0_BASE - (4 - (levels))) << VTCR_EL2_SL0_SHIFT)
+/*
+ * ARM VMSAv8-64 defines an algorithm for finding the translation table
+ * descriptors in section D4.2.8 in ARM DDI 0487B.b.
+ *
+ * The algorithm defines the expectaions on the BaseAddress (for the page
+ * table) bits resolved at each level based on the page size, entry level
+ * and T0SZ. The variable "x" in the algorithm also affects the VTTBR:BADDR
+ * for stage2 page table.
+ *
+ * The value of "x" is calculated as :
+ * x = Magic_N - T0SZ
+ *
+ * where Magic_N is an integer depending on the page size and the entry
+ * level of the page table as below:
+ *
+ * --------------------------------------------
+ * | Entry level | 4K 16K 64K |
+ * --------------------------------------------
+ * | Level: 0 (4 levels) | 28 | - | - |
+ * --------------------------------------------
+ * | Level: 1 (3 levels) | 37 | 31 | 25 |
+ * --------------------------------------------
+ * | Level: 2 (2 levels) | 46 | 42 | 38 |
+ * --------------------------------------------
+ * | Level: 3 (1 level) | - | 53 | 51 |
+ * --------------------------------------------
+ *
+ * We have a magic formula for the Magic_N below.
+ * Which can also be expressed as:
+ *
+ * Magic_N(PAGE_SIZE, Entry_Level) = 64 - ((PAGE_SHIFT - 3) * Number of levels)
+ *
+ * where number of levels = (4 - Entry_Level).
+ *
+ * So, given that T0SZ = (64 - PA_SHIFT), we can compute 'x' as follows:
+ *
+ * x = (64 - ((PAGE_SHIFT - 3) * Number_of_levels)) - (64 - PA_SHIFT)
+ * = PA_SHIFT - ((PAGE_SHIFT - 3) * Number of levels)
+ *
+ * Here is one way to explain the Magic Formula:
+ *
+ * x = log2(Size_of_Entry_Level_Table)
+ *
+ * Since, we can resolve (PAGE_SHIFT - 3) bits at each level, and another
+ * PAGE_SHIFT bits in the PTE, we have :
+ *
+ * Bits_Entry_level = PA_SHIFT - ((PAGE_SHIFT - 3) * (n - 1) + PAGE_SHIFT)
+ * = PA_SHIFT - (PAGE_SHIFT - 3) * n - 3
+ * where n = number of levels, and since each pointer is 8bytes, we have:
+ *
+ * x = Bits_Entry_Level + 3
+ * = PA_SHIFT - (PAGE_SHIFT - 3) * n
+ *
+ * The only constraint here is that, we have to find the number of page table
+ * levels for a given IPA size (which we do, see STAGE2_PGTABLE_LEVELS).
+ */
+#define ARM64_VTTBR_X(ipa, levels) ((ipa) - ((levels) * (PAGE_SHIFT - 3)))

-#define VTTBR_BADDR_MASK (((UL(1) << (PHYS_MASK_SHIFT - VTTBR_X)) - 1) << VTTBR_X)
#define VTTBR_VMID_SHIFT (UL(48))
#define VTTBR_VMID_MASK(size) (_AT(u64, (1 << size) - 1) << VTTBR_VMID_SHIFT)

diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index bc133ce..447bc10 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -139,7 +139,6 @@ static inline unsigned long __kern_hyp_va(unsigned long v)
#define kvm_phys_shift(kvm) KVM_PHYS_SHIFT
#define kvm_phys_size(kvm) (_AC(1, ULL) << kvm_phys_shift(kvm))
#define kvm_phys_mask(kvm) (kvm_phys_size(kvm) - _AC(1, ULL))
-#define kvm_vttbr_baddr_mask(kvm) VTTBR_BADDR_MASK

static inline bool kvm_page_empty(void *ptr)
{
@@ -393,5 +392,24 @@ static inline int kvm_map_vectors(void)

#define kvm_phys_to_vttbr(addr) phys_to_ttbr(addr)

+/*
+ * Get the magic number 'x' for VTTBR:BADDR of this KVM instance.
+ * With v8.2 LVA extensions, 'x' (rather 'z') should be a minimum
+ * of 6 with 52bit IPS.
+ */
+static inline int kvm_vttbr_x(struct kvm *kvm)
+{
+ int x = ARM64_VTTBR_X(kvm_phys_shift(kvm), kvm_stage2_levels(kvm));
+
+ return (IS_ENABLED(CONFIG_ARM64_PA_BITS_52) && x < 6) ? 6 : x;
+}
+
+static inline u64 kvm_vttbr_baddr_mask(struct kvm *kvm)
+{
+ unsigned int x = kvm_vttbr_x(kvm);
+
+ return GENMASK_ULL(PHYS_MASK_SHIFT - 1, x);
+}
+
#endif /* __ASSEMBLY__ */
#endif /* __ARM64_KVM_MMU_H__ */
--
2.7.4


2018-03-27 13:24:54

by Suzuki K Poulose

[permalink] [raw]
Subject: [PATCH v2 07/17] kvm: arm/arm64: Remove spurious WARN_ON

On a 4-level page table pgd entry can be empty, unlike a 3-level
page table. Remove the spurious WARN_ON() in stage_get_pud().

Cc: Marc Zyngier <[email protected]>
Acked-by: Christoffer Dall <[email protected]>
Signed-off-by: Suzuki K Poulose <[email protected]>
---
virt/kvm/arm/mmu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 16a19fb..9cba334 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -862,7 +862,7 @@ static pud_t *stage2_get_pud(struct kvm *kvm, struct kvm_mmu_memory_cache *cache
pud_t *pud;

pgd = kvm->arch.pgd + stage2_pgd_index(addr);
- if (WARN_ON(stage2_pgd_none(*pgd))) {
+ if (stage2_pgd_none(*pgd)) {
if (!cache)
return NULL;
pud = mmu_memory_cache_alloc(cache);
--
2.7.4


2018-03-27 13:25:03

by Suzuki K Poulose

[permalink] [raw]
Subject: [PATCH v2 05/17] arm64: Helper for parange to PASize

Add a helper to convert ID_AA64MMFR0_EL1:PARange to they physical
size shift. Limit the size to the maximum supported by the kernel.
We are about to move the user of this code and this helps to
keep the changes cleaner.

Cc: Mark Rutland <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Christoffer Dall <[email protected]>
Signed-off-by: Suzuki K Poulose <[email protected]>
---
arch/arm64/include/asm/cpufeature.h | 16 ++++++++++++++++
arch/arm64/kvm/hyp/s2-setup.c | 28 +++++-----------------------
2 files changed, 21 insertions(+), 23 deletions(-)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index fbf0aab..1f2a5dd 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -311,6 +311,22 @@ static inline u64 read_zcr_features(void)
return zcr;
}

+static inline u32 id_aa64mmfr0_parange_to_phys_shift(int parange)
+{
+ switch (parange) {
+ case 0: return 32;
+ case 1: return 36;
+ case 2: return 40;
+ case 3: return 42;
+ case 4: return 44;
+ /* Report 48 bit if the kernel doesn't support 52bit */
+ default:
+ case 5: return 48;
+#ifdef CONFIG_ARM64_PA_BITS_52
+ case 6: return 52;
+#endif
+ }
+}
#endif /* __ASSEMBLY__ */

#endif
diff --git a/arch/arm64/kvm/hyp/s2-setup.c b/arch/arm64/kvm/hyp/s2-setup.c
index 603e1ee..b1129c8 100644
--- a/arch/arm64/kvm/hyp/s2-setup.c
+++ b/arch/arm64/kvm/hyp/s2-setup.c
@@ -19,11 +19,13 @@
#include <asm/kvm_arm.h>
#include <asm/kvm_asm.h>
#include <asm/kvm_hyp.h>
+#include <asm/cpufeature.h>

u32 __hyp_text __init_stage2_translation(void)
{
u64 val = VTCR_EL2_FLAGS;
u64 parange;
+ u32 phys_shift;
u64 tmp;

/*
@@ -37,27 +39,7 @@ u32 __hyp_text __init_stage2_translation(void)
val |= parange << 16;

/* Compute the actual PARange... */
- switch (parange) {
- case 0:
- parange = 32;
- break;
- case 1:
- parange = 36;
- break;
- case 2:
- parange = 40;
- break;
- case 3:
- parange = 42;
- break;
- case 4:
- parange = 44;
- break;
- case 5:
- default:
- parange = 48;
- break;
- }
+ phys_shift = id_aa64mmfr0_parange_to_phys_shift(parange);

/*
* ... and clamp it to 40 bits, unless we have some braindead
@@ -65,7 +47,7 @@ u32 __hyp_text __init_stage2_translation(void)
* return that value for the rest of the kernel to decide what
* to do.
*/
- val |= 64 - (parange > 40 ? 40 : parange);
+ val |= 64 - (phys_shift > 40 ? 40 : phys_shift);

/*
* Check the availability of Hardware Access Flag / Dirty Bit
@@ -86,5 +68,5 @@ u32 __hyp_text __init_stage2_translation(void)

write_sysreg(val, vtcr_el2);

- return parange;
+ return phys_shift;
}
--
2.7.4


2018-03-27 13:25:21

by Suzuki K Poulose

[permalink] [raw]
Subject: [PATCH v2 08/17] kvm: arm/arm64: Prepare for VM specific stage2 translations

Right now the stage2 page table for a VM is hard coded, assuming
an IPA of 40bits. As we are about to add support for per VM IPA,
prepare the stage2 page table helpers to accept the kvm instance
to make the right decision for the VM. No functional changes.

Cc: Marc Zyngier <[email protected]>
Cc: Christoffer Dall <[email protected]>
Signed-off-by: Suzuki K Poulose <[email protected]>
---
arch/arm/include/asm/kvm_arm.h | 3 +-
arch/arm/include/asm/kvm_mmu.h | 15 +++-
arch/arm/include/asm/stage2_pgtable.h | 42 ++++-----
arch/arm64/include/asm/kvm_mmu.h | 7 +-
arch/arm64/include/asm/stage2_pgtable-nopmd.h | 18 ++--
arch/arm64/include/asm/stage2_pgtable-nopud.h | 16 ++--
arch/arm64/include/asm/stage2_pgtable.h | 49 ++++++-----
virt/kvm/arm/arm.c | 2 +-
virt/kvm/arm/mmu.c | 119 +++++++++++++-------------
virt/kvm/arm/vgic/vgic-kvm-device.c | 2 +-
10 files changed, 148 insertions(+), 125 deletions(-)

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index 3ab8b37..c3f1f9b 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -133,8 +133,7 @@
* space.
*/
#define KVM_PHYS_SHIFT (40)
-#define KVM_PHYS_SIZE (_AC(1, ULL) << KVM_PHYS_SHIFT)
-#define KVM_PHYS_MASK (KVM_PHYS_SIZE - _AC(1, ULL))
+
#define PTRS_PER_S2_PGD (_AC(1, ULL) << (KVM_PHYS_SHIFT - 30))

/* Virtualization Translation Control Register (VTCR) bits */
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index de1b919..305aa40 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -29,19 +29,30 @@
#define kern_hyp_va(kva) (kva)

/*
- * KVM_MMU_CACHE_MIN_PAGES is the number of stage2 page table translation levels.
+ * kvm_mmu_cache_min_pages() is the number of stage2 page
+ * table translation levels, excluding the top level, for
+ * the given VM. Since we have a 3 level page-table, this
+ * is fixed.
*/
-#define KVM_MMU_CACHE_MIN_PAGES 2
+#define kvm_mmu_cache_min_pages(kvm) 2

#ifndef __ASSEMBLY__

#include <linux/highmem.h>
#include <asm/cacheflush.h>
#include <asm/cputype.h>
+#include <asm/kvm_arm.h>
#include <asm/kvm_hyp.h>
#include <asm/pgalloc.h>
#include <asm/stage2_pgtable.h>

+#define kvm_phys_shift(kvm) KVM_PHYS_SHIFT
+#define kvm_phys_size(kvm) (_AC(1, ULL) << kvm_phys_shift(kvm))
+#define kvm_phys_mask(kvm) (kvm_phys_size(kvm) - _AC(1, ULL))
+#define kvm_vttbr_baddr_mask(kvm) VTTBR_BADDR_MASK
+
+#define stage2_pgd_size(kvm) (PTRS_PER_S2_PGD * sizeof(pgd_t))
+
int create_hyp_mappings(void *from, void *to, pgprot_t prot);
int create_hyp_io_mappings(void *from, void *to, phys_addr_t);
void free_hyp_pgds(void);
diff --git a/arch/arm/include/asm/stage2_pgtable.h b/arch/arm/include/asm/stage2_pgtable.h
index 460d616..e22ae94 100644
--- a/arch/arm/include/asm/stage2_pgtable.h
+++ b/arch/arm/include/asm/stage2_pgtable.h
@@ -19,43 +19,45 @@
#ifndef __ARM_S2_PGTABLE_H_
#define __ARM_S2_PGTABLE_H_

-#define stage2_pgd_none(pgd) pgd_none(pgd)
-#define stage2_pgd_clear(pgd) pgd_clear(pgd)
-#define stage2_pgd_present(pgd) pgd_present(pgd)
-#define stage2_pgd_populate(pgd, pud) pgd_populate(NULL, pgd, pud)
-#define stage2_pud_offset(pgd, address) pud_offset(pgd, address)
-#define stage2_pud_free(pud) pud_free(NULL, pud)
+#define stage2_pgd_none(kvm, pgd) pgd_none(pgd)
+#define stage2_pgd_clear(kvm, pgd) pgd_clear(pgd)
+#define stage2_pgd_present(kvm, pgd) pgd_present(pgd)
+#define stage2_pgd_populate(kvm, pgd, pud) pgd_populate(NULL, pgd, pud)
+#define stage2_pud_offset(kvm, pgd, address) pud_offset(pgd, address)
+#define stage2_pud_free(kvm, pud) pud_free(NULL, pud)

-#define stage2_pud_none(pud) pud_none(pud)
-#define stage2_pud_clear(pud) pud_clear(pud)
-#define stage2_pud_present(pud) pud_present(pud)
-#define stage2_pud_populate(pud, pmd) pud_populate(NULL, pud, pmd)
-#define stage2_pmd_offset(pud, address) pmd_offset(pud, address)
-#define stage2_pmd_free(pmd) pmd_free(NULL, pmd)
+#define stage2_pud_none(kvm, pud) pud_none(pud)
+#define stage2_pud_clear(kvm, pud) pud_clear(pud)
+#define stage2_pud_present(kvm, pud) pud_present(pud)
+#define stage2_pud_populate(kvm, pud, pmd) pud_populate(NULL, pud, pmd)
+#define stage2_pmd_offset(kvm, pud, address) pmd_offset(pud, address)
+#define stage2_pmd_free(kvm, pmd) pmd_free(NULL, pmd)

-#define stage2_pud_huge(pud) pud_huge(pud)
+#define stage2_pud_huge(kvm, pud) pud_huge(pud)

/* Open coded p*d_addr_end that can deal with 64bit addresses */
-static inline phys_addr_t stage2_pgd_addr_end(phys_addr_t addr, phys_addr_t end)
+static inline phys_addr_t
+stage2_pgd_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
{
phys_addr_t boundary = (addr + PGDIR_SIZE) & PGDIR_MASK;

return (boundary - 1 < end - 1) ? boundary : end;
}

-#define stage2_pud_addr_end(addr, end) (end)
+#define stage2_pud_addr_end(kvm, addr, end) (end)

-static inline phys_addr_t stage2_pmd_addr_end(phys_addr_t addr, phys_addr_t end)
+static inline phys_addr_t
+stage2_pmd_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
{
phys_addr_t boundary = (addr + PMD_SIZE) & PMD_MASK;

return (boundary - 1 < end - 1) ? boundary : end;
}

-#define stage2_pgd_index(addr) pgd_index(addr)
+#define stage2_pgd_index(kvm, addr) pgd_index(addr)

-#define stage2_pte_table_empty(ptep) kvm_page_empty(ptep)
-#define stage2_pmd_table_empty(pmdp) kvm_page_empty(pmdp)
-#define stage2_pud_table_empty(pudp) false
+#define stage2_pte_table_empty(kvm, ptep) kvm_page_empty(ptep)
+#define stage2_pmd_table_empty(kvm, pmdp) kvm_page_empty(pmdp)
+#define stage2_pud_table_empty(kvm, pudp) false

#endif /* __ARM_S2_PGTABLE_H_ */
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 7faed6e..594c4e6 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -134,8 +134,11 @@ static inline unsigned long __kern_hyp_va(unsigned long v)
* We currently only support a 40bit IPA.
*/
#define KVM_PHYS_SHIFT (40)
-#define KVM_PHYS_SIZE (1UL << KVM_PHYS_SHIFT)
-#define KVM_PHYS_MASK (KVM_PHYS_SIZE - 1UL)
+
+#define kvm_phys_shift(kvm) KVM_PHYS_SHIFT
+#define kvm_phys_size(kvm) (_AC(1, ULL) << kvm_phys_shift(kvm))
+#define kvm_phys_mask(kvm) (kvm_phys_size(kvm) - _AC(1, ULL))
+#define kvm_vttbr_baddr_mask(kvm) VTTBR_BADDR_MASK

#include <asm/stage2_pgtable.h>

diff --git a/arch/arm64/include/asm/stage2_pgtable-nopmd.h b/arch/arm64/include/asm/stage2_pgtable-nopmd.h
index 2656a0f..0280ded 100644
--- a/arch/arm64/include/asm/stage2_pgtable-nopmd.h
+++ b/arch/arm64/include/asm/stage2_pgtable-nopmd.h
@@ -26,17 +26,17 @@
#define S2_PMD_SIZE (1UL << S2_PMD_SHIFT)
#define S2_PMD_MASK (~(S2_PMD_SIZE-1))

-#define stage2_pud_none(pud) (0)
-#define stage2_pud_present(pud) (1)
-#define stage2_pud_clear(pud) do { } while (0)
-#define stage2_pud_populate(pud, pmd) do { } while (0)
-#define stage2_pmd_offset(pud, address) ((pmd_t *)(pud))
+#define stage2_pud_none(kvm, pud) (0)
+#define stage2_pud_present(kvm, pud) (1)
+#define stage2_pud_clear(kvm, pud) do { } while (0)
+#define stage2_pud_populate(kvm, pud, pmd) do { } while (0)
+#define stage2_pmd_offset(kvm, pud, address) ((pmd_t *)(pud))

-#define stage2_pmd_free(pmd) do { } while (0)
+#define stage2_pmd_free(kvm, pmd) do { } while (0)

-#define stage2_pmd_addr_end(addr, end) (end)
+#define stage2_pmd_addr_end(kvm, addr, end) (end)

-#define stage2_pud_huge(pud) (0)
-#define stage2_pmd_table_empty(pmdp) (0)
+#define stage2_pud_huge(kvm, pud) (0)
+#define stage2_pmd_table_empty(kvm, pmdp) (0)

#endif
diff --git a/arch/arm64/include/asm/stage2_pgtable-nopud.h b/arch/arm64/include/asm/stage2_pgtable-nopud.h
index 5ee87b5..cd6304e 100644
--- a/arch/arm64/include/asm/stage2_pgtable-nopud.h
+++ b/arch/arm64/include/asm/stage2_pgtable-nopud.h
@@ -24,16 +24,16 @@
#define S2_PUD_SIZE (_AC(1, UL) << S2_PUD_SHIFT)
#define S2_PUD_MASK (~(S2_PUD_SIZE-1))

-#define stage2_pgd_none(pgd) (0)
-#define stage2_pgd_present(pgd) (1)
-#define stage2_pgd_clear(pgd) do { } while (0)
-#define stage2_pgd_populate(pgd, pud) do { } while (0)
+#define stage2_pgd_none(kvm, pgd) (0)
+#define stage2_pgd_present(kvm, pgd) (1)
+#define stage2_pgd_clear(kvm, pgd) do { } while (0)
+#define stage2_pgd_populate(kvm, pgd, pud) do { } while (0)

-#define stage2_pud_offset(pgd, address) ((pud_t *)(pgd))
+#define stage2_pud_offset(kvm, pgd, address) ((pud_t *)(pgd))

-#define stage2_pud_free(x) do { } while (0)
+#define stage2_pud_free(kvm, x) do { } while (0)

-#define stage2_pud_addr_end(addr, end) (end)
-#define stage2_pud_table_empty(pmdp) (0)
+#define stage2_pud_addr_end(kvm, addr, end) (end)
+#define stage2_pud_table_empty(kvm, pmdp) (0)

#endif
diff --git a/arch/arm64/include/asm/stage2_pgtable.h b/arch/arm64/include/asm/stage2_pgtable.h
index 8b68099..057a405 100644
--- a/arch/arm64/include/asm/stage2_pgtable.h
+++ b/arch/arm64/include/asm/stage2_pgtable.h
@@ -65,10 +65,10 @@
#define PTRS_PER_S2_PGD (1 << (KVM_PHYS_SHIFT - S2_PGDIR_SHIFT))

/*
- * KVM_MMU_CACHE_MIN_PAGES is the number of stage2 page table translation
+ * kvm_mmmu_cache_min_pages is the number of stage2 page table translation
* levels in addition to the PGD.
*/
-#define KVM_MMU_CACHE_MIN_PAGES (STAGE2_PGTABLE_LEVELS - 1)
+#define kvm_mmu_cache_min_pages(kvm) (STAGE2_PGTABLE_LEVELS - 1)


#if STAGE2_PGTABLE_LEVELS > 3
@@ -77,16 +77,17 @@
#define S2_PUD_SIZE (_AC(1, UL) << S2_PUD_SHIFT)
#define S2_PUD_MASK (~(S2_PUD_SIZE - 1))

-#define stage2_pgd_none(pgd) pgd_none(pgd)
-#define stage2_pgd_clear(pgd) pgd_clear(pgd)
-#define stage2_pgd_present(pgd) pgd_present(pgd)
-#define stage2_pgd_populate(pgd, pud) pgd_populate(NULL, pgd, pud)
-#define stage2_pud_offset(pgd, address) pud_offset(pgd, address)
-#define stage2_pud_free(pud) pud_free(NULL, pud)
+#define stage2_pgd_none(kvm, pgd) pgd_none(pgd)
+#define stage2_pgd_clear(kvm, pgd) pgd_clear(pgd)
+#define stage2_pgd_present(kvm, pgd) pgd_present(pgd)
+#define stage2_pgd_populate(kvm, pgd, pud) pgd_populate(NULL, pgd, pud)
+#define stage2_pud_offset(kvm, pgd, address) pud_offset(pgd, address)
+#define stage2_pud_free(kvm, pud) pud_free(NULL, pud)

-#define stage2_pud_table_empty(pudp) kvm_page_empty(pudp)
+#define stage2_pud_table_empty(kvm, pudp) kvm_page_empty(pudp)

-static inline phys_addr_t stage2_pud_addr_end(phys_addr_t addr, phys_addr_t end)
+static inline phys_addr_t
+stage2_pud_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
{
phys_addr_t boundary = (addr + S2_PUD_SIZE) & S2_PUD_MASK;

@@ -102,17 +103,18 @@ static inline phys_addr_t stage2_pud_addr_end(phys_addr_t addr, phys_addr_t end)
#define S2_PMD_SIZE (_AC(1, UL) << S2_PMD_SHIFT)
#define S2_PMD_MASK (~(S2_PMD_SIZE - 1))

-#define stage2_pud_none(pud) pud_none(pud)
-#define stage2_pud_clear(pud) pud_clear(pud)
-#define stage2_pud_present(pud) pud_present(pud)
-#define stage2_pud_populate(pud, pmd) pud_populate(NULL, pud, pmd)
-#define stage2_pmd_offset(pud, address) pmd_offset(pud, address)
-#define stage2_pmd_free(pmd) pmd_free(NULL, pmd)
+#define stage2_pud_none(kvm, pud) pud_none(pud)
+#define stage2_pud_clear(kvm, pud) pud_clear(pud)
+#define stage2_pud_present(kvm, pud) pud_present(pud)
+#define stage2_pud_populate(kvm, pud, pmd) pud_populate(NULL, pud, pmd)
+#define stage2_pmd_offset(kvm, pud, address) pmd_offset(pud, address)
+#define stage2_pmd_free(kvm, pmd) pmd_free(NULL, pmd)

-#define stage2_pud_huge(pud) pud_huge(pud)
-#define stage2_pmd_table_empty(pmdp) kvm_page_empty(pmdp)
+#define stage2_pud_huge(kvm, pud) pud_huge(pud)
+#define stage2_pmd_table_empty(kvm, pmdp) kvm_page_empty(pmdp)

-static inline phys_addr_t stage2_pmd_addr_end(phys_addr_t addr, phys_addr_t end)
+static inline phys_addr_t
+stage2_pmd_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
{
phys_addr_t boundary = (addr + S2_PMD_SIZE) & S2_PMD_MASK;

@@ -121,7 +123,7 @@ static inline phys_addr_t stage2_pmd_addr_end(phys_addr_t addr, phys_addr_t end)

#endif /* STAGE2_PGTABLE_LEVELS > 2 */

-#define stage2_pte_table_empty(ptep) kvm_page_empty(ptep)
+#define stage2_pte_table_empty(kvm, ptep) kvm_page_empty(ptep)

#if STAGE2_PGTABLE_LEVELS == 2
#include <asm/stage2_pgtable-nopmd.h>
@@ -129,10 +131,13 @@ static inline phys_addr_t stage2_pmd_addr_end(phys_addr_t addr, phys_addr_t end)
#include <asm/stage2_pgtable-nopud.h>
#endif

+#define stage2_pgd_size(kvm) (PTRS_PER_S2_PGD * sizeof(pgd_t))

-#define stage2_pgd_index(addr) (((addr) >> S2_PGDIR_SHIFT) & (PTRS_PER_S2_PGD - 1))
+#define stage2_pgd_index(kvm, addr) \
+ (((addr) >> S2_PGDIR_SHIFT) & (PTRS_PER_S2_PGD - 1))

-static inline phys_addr_t stage2_pgd_addr_end(phys_addr_t addr, phys_addr_t end)
+static inline phys_addr_t
+stage2_pgd_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
{
phys_addr_t boundary = (addr + S2_PGDIR_SIZE) & S2_PGDIR_MASK;

diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 86941f6..8b35a47 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -518,7 +518,7 @@ static void update_vttbr(struct kvm *kvm)

/* update vttbr to be used with the new vmid */
pgd_phys = virt_to_phys(kvm->arch.pgd);
- BUG_ON(pgd_phys & ~VTTBR_BADDR_MASK);
+ BUG_ON(pgd_phys & ~kvm_vttbr_baddr_mask(kvm));
vmid = ((u64)(kvm->arch.vmid) << VTTBR_VMID_SHIFT) & VTTBR_VMID_MASK(kvm_vmid_bits);
kvm->arch.vttbr = kvm_phys_to_vttbr(pgd_phys) | vmid;

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 9cba334..7a264c6 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -43,7 +43,6 @@ static unsigned long hyp_idmap_start;
static unsigned long hyp_idmap_end;
static phys_addr_t hyp_idmap_vector;

-#define S2_PGD_SIZE (PTRS_PER_S2_PGD * sizeof(pgd_t))
#define hyp_pgd_order get_order(PTRS_PER_PGD * sizeof(pgd_t))

#define KVM_S2PTE_FLAG_IS_IOMAP (1UL << 0)
@@ -148,20 +147,20 @@ static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)

static void clear_stage2_pgd_entry(struct kvm *kvm, pgd_t *pgd, phys_addr_t addr)
{
- pud_t *pud_table __maybe_unused = stage2_pud_offset(pgd, 0UL);
- stage2_pgd_clear(pgd);
+ pud_t *pud_table __maybe_unused = stage2_pud_offset(kvm, pgd, 0UL);
+ stage2_pgd_clear(kvm, pgd);
kvm_tlb_flush_vmid_ipa(kvm, addr);
- stage2_pud_free(pud_table);
+ stage2_pud_free(kvm, pud_table);
put_page(virt_to_page(pgd));
}

static void clear_stage2_pud_entry(struct kvm *kvm, pud_t *pud, phys_addr_t addr)
{
- pmd_t *pmd_table __maybe_unused = stage2_pmd_offset(pud, 0);
- VM_BUG_ON(stage2_pud_huge(*pud));
- stage2_pud_clear(pud);
+ pmd_t *pmd_table __maybe_unused = stage2_pmd_offset(kvm, pud, 0);
+ VM_BUG_ON(stage2_pud_huge(kvm, *pud));
+ stage2_pud_clear(kvm, pud);
kvm_tlb_flush_vmid_ipa(kvm, addr);
- stage2_pmd_free(pmd_table);
+ stage2_pmd_free(kvm, pmd_table);
put_page(virt_to_page(pud));
}

@@ -217,7 +216,7 @@ static void unmap_stage2_ptes(struct kvm *kvm, pmd_t *pmd,
}
} while (pte++, addr += PAGE_SIZE, addr != end);

- if (stage2_pte_table_empty(start_pte))
+ if (stage2_pte_table_empty(kvm, start_pte))
clear_stage2_pmd_entry(kvm, pmd, start_addr);
}

@@ -227,9 +226,9 @@ static void unmap_stage2_pmds(struct kvm *kvm, pud_t *pud,
phys_addr_t next, start_addr = addr;
pmd_t *pmd, *start_pmd;

- start_pmd = pmd = stage2_pmd_offset(pud, addr);
+ start_pmd = pmd = stage2_pmd_offset(kvm, pud, addr);
do {
- next = stage2_pmd_addr_end(addr, end);
+ next = stage2_pmd_addr_end(kvm, addr, end);
if (!pmd_none(*pmd)) {
if (pmd_thp_or_huge(*pmd)) {
pmd_t old_pmd = *pmd;
@@ -246,7 +245,7 @@ static void unmap_stage2_pmds(struct kvm *kvm, pud_t *pud,
}
} while (pmd++, addr = next, addr != end);

- if (stage2_pmd_table_empty(start_pmd))
+ if (stage2_pmd_table_empty(kvm, start_pmd))
clear_stage2_pud_entry(kvm, pud, start_addr);
}

@@ -256,14 +255,14 @@ static void unmap_stage2_puds(struct kvm *kvm, pgd_t *pgd,
phys_addr_t next, start_addr = addr;
pud_t *pud, *start_pud;

- start_pud = pud = stage2_pud_offset(pgd, addr);
+ start_pud = pud = stage2_pud_offset(kvm, pgd, addr);
do {
- next = stage2_pud_addr_end(addr, end);
- if (!stage2_pud_none(*pud)) {
- if (stage2_pud_huge(*pud)) {
+ next = stage2_pud_addr_end(kvm, addr, end);
+ if (!stage2_pud_none(kvm, *pud)) {
+ if (stage2_pud_huge(kvm, *pud)) {
pud_t old_pud = *pud;

- stage2_pud_clear(pud);
+ stage2_pud_clear(kvm, pud);
kvm_tlb_flush_vmid_ipa(kvm, addr);
kvm_flush_dcache_pud(old_pud);
put_page(virt_to_page(pud));
@@ -273,7 +272,7 @@ static void unmap_stage2_puds(struct kvm *kvm, pgd_t *pgd,
}
} while (pud++, addr = next, addr != end);

- if (stage2_pud_table_empty(start_pud))
+ if (stage2_pud_table_empty(kvm, start_pud))
clear_stage2_pgd_entry(kvm, pgd, start_addr);
}

@@ -295,7 +294,7 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
phys_addr_t next;

assert_spin_locked(&kvm->mmu_lock);
- pgd = kvm->arch.pgd + stage2_pgd_index(addr);
+ pgd = kvm->arch.pgd + stage2_pgd_index(kvm, addr);
do {
/*
* Make sure the page table is still active, as another thread
@@ -304,8 +303,8 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
*/
if (!READ_ONCE(kvm->arch.pgd))
break;
- next = stage2_pgd_addr_end(addr, end);
- if (!stage2_pgd_none(*pgd))
+ next = stage2_pgd_addr_end(kvm, addr, end);
+ if (!stage2_pgd_none(kvm, *pgd))
unmap_stage2_puds(kvm, pgd, addr, next);
/*
* If the range is too large, release the kvm->mmu_lock
@@ -334,9 +333,9 @@ static void stage2_flush_pmds(struct kvm *kvm, pud_t *pud,
pmd_t *pmd;
phys_addr_t next;

- pmd = stage2_pmd_offset(pud, addr);
+ pmd = stage2_pmd_offset(kvm, pud, addr);
do {
- next = stage2_pmd_addr_end(addr, end);
+ next = stage2_pmd_addr_end(kvm, addr, end);
if (!pmd_none(*pmd)) {
if (pmd_thp_or_huge(*pmd))
kvm_flush_dcache_pmd(*pmd);
@@ -352,11 +351,11 @@ static void stage2_flush_puds(struct kvm *kvm, pgd_t *pgd,
pud_t *pud;
phys_addr_t next;

- pud = stage2_pud_offset(pgd, addr);
+ pud = stage2_pud_offset(kvm, pgd, addr);
do {
- next = stage2_pud_addr_end(addr, end);
- if (!stage2_pud_none(*pud)) {
- if (stage2_pud_huge(*pud))
+ next = stage2_pud_addr_end(kvm, addr, end);
+ if (!stage2_pud_none(kvm, *pud)) {
+ if (stage2_pud_huge(kvm, *pud))
kvm_flush_dcache_pud(*pud);
else
stage2_flush_pmds(kvm, pud, addr, next);
@@ -372,10 +371,10 @@ static void stage2_flush_memslot(struct kvm *kvm,
phys_addr_t next;
pgd_t *pgd;

- pgd = kvm->arch.pgd + stage2_pgd_index(addr);
+ pgd = kvm->arch.pgd + stage2_pgd_index(kvm, addr);
do {
- next = stage2_pgd_addr_end(addr, end);
- if (!stage2_pgd_none(*pgd))
+ next = stage2_pgd_addr_end(kvm, addr, end);
+ if (!stage2_pgd_none(kvm, *pgd))
stage2_flush_puds(kvm, pgd, addr, next);
} while (pgd++, addr = next, addr != end);
}
@@ -755,7 +754,7 @@ int kvm_alloc_stage2_pgd(struct kvm *kvm)
}

/* Allocate the HW PGD, making sure that each page gets its own refcount */
- pgd = alloc_pages_exact(S2_PGD_SIZE, GFP_KERNEL | __GFP_ZERO);
+ pgd = alloc_pages_exact(stage2_pgd_size(kvm), GFP_KERNEL | __GFP_ZERO);
if (!pgd)
return -ENOMEM;

@@ -844,7 +843,7 @@ void kvm_free_stage2_pgd(struct kvm *kvm)

spin_lock(&kvm->mmu_lock);
if (kvm->arch.pgd) {
- unmap_stage2_range(kvm, 0, KVM_PHYS_SIZE);
+ unmap_stage2_range(kvm, 0, kvm_phys_size(kvm));
pgd = READ_ONCE(kvm->arch.pgd);
kvm->arch.pgd = NULL;
}
@@ -852,7 +851,7 @@ void kvm_free_stage2_pgd(struct kvm *kvm)

/* Free the HW pgd, one page at a time */
if (pgd)
- free_pages_exact(pgd, S2_PGD_SIZE);
+ free_pages_exact(pgd, stage2_pgd_size(kvm));
}

static pud_t *stage2_get_pud(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
@@ -861,16 +860,16 @@ static pud_t *stage2_get_pud(struct kvm *kvm, struct kvm_mmu_memory_cache *cache
pgd_t *pgd;
pud_t *pud;

- pgd = kvm->arch.pgd + stage2_pgd_index(addr);
- if (stage2_pgd_none(*pgd)) {
+ pgd = kvm->arch.pgd + stage2_pgd_index(kvm, addr);
+ if (stage2_pgd_none(kvm, *pgd)) {
if (!cache)
return NULL;
pud = mmu_memory_cache_alloc(cache);
- stage2_pgd_populate(pgd, pud);
+ stage2_pgd_populate(kvm, pgd, pud);
get_page(virt_to_page(pgd));
}

- return stage2_pud_offset(pgd, addr);
+ return stage2_pud_offset(kvm, pgd, addr);
}

static pmd_t *stage2_get_pmd(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
@@ -883,15 +882,15 @@ static pmd_t *stage2_get_pmd(struct kvm *kvm, struct kvm_mmu_memory_cache *cache
if (!pud)
return NULL;

- if (stage2_pud_none(*pud)) {
+ if (stage2_pud_none(kvm, *pud)) {
if (!cache)
return NULL;
pmd = mmu_memory_cache_alloc(cache);
- stage2_pud_populate(pud, pmd);
+ stage2_pud_populate(kvm, pud, pmd);
get_page(virt_to_page(pud));
}

- return stage2_pmd_offset(pud, addr);
+ return stage2_pmd_offset(kvm, pud, addr);
}

static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
@@ -1045,8 +1044,9 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
if (writable)
pte = kvm_s2pte_mkwrite(pte);

- ret = mmu_topup_memory_cache(&cache, KVM_MMU_CACHE_MIN_PAGES,
- KVM_NR_MEM_OBJS);
+ ret = mmu_topup_memory_cache(&cache,
+ kvm_mmu_cache_min_pages(kvm),
+ KVM_NR_MEM_OBJS);
if (ret)
goto out;
spin_lock(&kvm->mmu_lock);
@@ -1134,19 +1134,21 @@ static void stage2_wp_ptes(pmd_t *pmd, phys_addr_t addr, phys_addr_t end)

/**
* stage2_wp_pmds - write protect PUD range
+ * kvm: kvm instance for the VM
* @pud: pointer to pud entry
* @addr: range start address
* @end: range end address
*/
-static void stage2_wp_pmds(pud_t *pud, phys_addr_t addr, phys_addr_t end)
+static void stage2_wp_pmds(struct kvm *kvm, pud_t *pud,
+ phys_addr_t addr, phys_addr_t end)
{
pmd_t *pmd;
phys_addr_t next;

- pmd = stage2_pmd_offset(pud, addr);
+ pmd = stage2_pmd_offset(kvm, pud, addr);

do {
- next = stage2_pmd_addr_end(addr, end);
+ next = stage2_pmd_addr_end(kvm, addr, end);
if (!pmd_none(*pmd)) {
if (pmd_thp_or_huge(*pmd)) {
if (!kvm_s2pmd_readonly(pmd))
@@ -1166,18 +1168,19 @@ static void stage2_wp_pmds(pud_t *pud, phys_addr_t addr, phys_addr_t end)
*
* Process PUD entries, for a huge PUD we cause a panic.
*/
-static void stage2_wp_puds(pgd_t *pgd, phys_addr_t addr, phys_addr_t end)
+static void stage2_wp_puds(struct kvm *kvm, pgd_t *pgd,
+ phys_addr_t addr, phys_addr_t end)
{
pud_t *pud;
phys_addr_t next;

- pud = stage2_pud_offset(pgd, addr);
+ pud = stage2_pud_offset(kvm, pgd, addr);
do {
- next = stage2_pud_addr_end(addr, end);
- if (!stage2_pud_none(*pud)) {
+ next = stage2_pud_addr_end(kvm, addr, end);
+ if (!stage2_pud_none(kvm, *pud)) {
/* TODO:PUD not supported, revisit later if supported */
- BUG_ON(stage2_pud_huge(*pud));
- stage2_wp_pmds(pud, addr, next);
+ BUG_ON(stage2_pud_huge(kvm, *pud));
+ stage2_wp_pmds(kvm, pud, addr, next);
}
} while (pud++, addr = next, addr != end);
}
@@ -1193,7 +1196,7 @@ static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
pgd_t *pgd;
phys_addr_t next;

- pgd = kvm->arch.pgd + stage2_pgd_index(addr);
+ pgd = kvm->arch.pgd + stage2_pgd_index(kvm, addr);
do {
/*
* Release kvm_mmu_lock periodically if the memory region is
@@ -1207,9 +1210,9 @@ static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
cond_resched_lock(&kvm->mmu_lock);
if (!READ_ONCE(kvm->arch.pgd))
break;
- next = stage2_pgd_addr_end(addr, end);
- if (stage2_pgd_present(*pgd))
- stage2_wp_puds(pgd, addr, next);
+ next = stage2_pgd_addr_end(kvm, addr, end);
+ if (stage2_pgd_present(kvm, *pgd))
+ stage2_wp_puds(kvm, pgd, addr, next);
} while (pgd++, addr = next, addr != end);
}

@@ -1357,7 +1360,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
up_read(&current->mm->mmap_sem);

/* We need minimum second+third level pages */
- ret = mmu_topup_memory_cache(memcache, KVM_MMU_CACHE_MIN_PAGES,
+ ret = mmu_topup_memory_cache(memcache, kvm_mmu_cache_min_pages(kvm),
KVM_NR_MEM_OBJS);
if (ret)
return ret;
@@ -1600,7 +1603,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
}

/* Userspace should not be able to register out-of-bounds IPAs */
- VM_BUG_ON(fault_ipa >= KVM_PHYS_SIZE);
+ VM_BUG_ON(fault_ipa >= kvm_phys_size(vcpu->kvm));

if (fault_status == FSC_ACCESS) {
handle_access_fault(vcpu, fault_ipa);
@@ -1900,7 +1903,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
* space addressable by the KVM guest IPA space.
*/
if (memslot->base_gfn + memslot->npages >=
- (KVM_PHYS_SIZE >> PAGE_SHIFT))
+ (kvm_phys_size(kvm) >> PAGE_SHIFT))
return -EFAULT;

down_read(&current->mm->mmap_sem);
diff --git a/virt/kvm/arm/vgic/vgic-kvm-device.c b/virt/kvm/arm/vgic/vgic-kvm-device.c
index 10ae6f3..613ff4a 100644
--- a/virt/kvm/arm/vgic/vgic-kvm-device.c
+++ b/virt/kvm/arm/vgic/vgic-kvm-device.c
@@ -25,7 +25,7 @@
int vgic_check_ioaddr(struct kvm *kvm, phys_addr_t *ioaddr,
phys_addr_t addr, phys_addr_t alignment)
{
- if (addr & ~KVM_PHYS_MASK)
+ if (addr & ~kvm_phys_mask(kvm))
return -E2BIG;

if (!IS_ALIGNED(addr, alignment))
--
2.7.4


2018-03-27 13:26:13

by Suzuki K Poulose

[permalink] [raw]
Subject: [PATCH v2 02/17] virtio: pci-legacy: Validate queue pfn

Legacy PCI over virtio uses a 32bit PFN for the queue. If the
queue pfn is too large to fit in 32bits, which we could hit on
arm64 systems with 52bit physical addresses (even with 64K page
size), we simply miss out a proper link to the other side of
the queue.

Add a check to validate the PFN, rather than silently breaking
the devices.

Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Christoffer Dall <[email protected]>
Cc: Peter Maydel <[email protected]>
Cc: Jean-Philippe Brucker <[email protected]>
Signed-off-by: Suzuki K Poulose <[email protected]>
---
drivers/virtio/virtio_pci_legacy.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
index 2780886..4b84a75 100644
--- a/drivers/virtio/virtio_pci_legacy.c
+++ b/drivers/virtio/virtio_pci_legacy.c
@@ -122,6 +122,7 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
struct virtqueue *vq;
u16 num;
int err;
+ u64 q_pfn;

/* Select the queue we're interested in */
iowrite16(index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL);
@@ -141,9 +142,15 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
if (!vq)
return ERR_PTR(-ENOMEM);

+ q_pfn = virtqueue_get_desc_addr(vq) >> VIRTIO_PCI_QUEUE_ADDR_SHIFT;
+ if (q_pfn >> 32) {
+ dev_err(&vp_dev->pci_dev->dev, "virtio-pci queue PFN too large\n");
+ err = -ENOMEM;
+ goto out_del_vq;
+ }
+
/* activate the queue */
- iowrite32(virtqueue_get_desc_addr(vq) >> VIRTIO_PCI_QUEUE_ADDR_SHIFT,
- vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
+ iowrite32((u32)q_pfn, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);

vq->priv = (void __force *)vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NOTIFY;

@@ -160,6 +167,7 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,

out_deactivate:
iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
+out_del_vq:
vring_del_virtqueue(vq);
return ERR_PTR(err);
}
--
2.7.4


2018-03-27 13:26:34

by Suzuki K Poulose

[permalink] [raw]
Subject: [PATCH v2 03/17] arm64: Make page table helpers reusable

This patch rearranges the page table level helpers so that it can
be reused for a page table with different number of levels
(e.g, stage2 page table for a VM) than the kernel page tables.
As such there is no functional change with this patch.

The page table helpers are defined to do the right thing for the
fixed page table levels set for the kernel. This patch tries to
refactor the code such that, we can have helpers for each level,
which should be used when the caller knows that the level exists
for the page table dealt with. Since the kernel defines helpers
p.d_action and __p.d_action, for consistency, we name the raw
page table action helpers __raw_p.d_action.

Cc: Catalin Marinas <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Steve Capper <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Christoffer Dall <[email protected]>
Signed-off-by: Suzuki K Poulose <[email protected]>
---
arch/arm64/include/asm/pgalloc.h | 34 +++++++++++++++++++----
arch/arm64/include/asm/pgtable.h | 60 +++++++++++++++++++++++++---------------
2 files changed, 66 insertions(+), 28 deletions(-)

diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
index 2e05bcd..ab4662c 100644
--- a/arch/arm64/include/asm/pgalloc.h
+++ b/arch/arm64/include/asm/pgalloc.h
@@ -29,6 +29,30 @@
#define PGALLOC_GFP (GFP_KERNEL | __GFP_ZERO)
#define PGD_SIZE (PTRS_PER_PGD * sizeof(pgd_t))

+static inline void __raw_pmd_free(pmd_t *pmdp)
+{
+ BUG_ON((unsigned long)pmdp & (PAGE_SIZE-1));
+ free_page((unsigned long)pmdp);
+}
+
+static inline void
+__raw_pud_populate(pud_t *pudp, phys_addr_t pmdp, pudval_t prot)
+{
+ __raw_set_pud(pudp, __pud(__phys_to_pud_val(pmdp) | prot));
+}
+
+static inline void __raw_pud_free(pud_t *pudp)
+{
+ BUG_ON((unsigned long)pudp & (PAGE_SIZE-1));
+ free_page((unsigned long)pudp);
+}
+
+static inline void
+__raw_pgd_populate(pgd_t *pgdp, phys_addr_t pudp, pgdval_t prot)
+{
+ __raw_set_pgd(pgdp, __pgd(__phys_to_pgd_val(pudp) | prot));
+}
+
#if CONFIG_PGTABLE_LEVELS > 2

static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
@@ -38,13 +62,12 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)

static inline void pmd_free(struct mm_struct *mm, pmd_t *pmdp)
{
- BUG_ON((unsigned long)pmdp & (PAGE_SIZE-1));
- free_page((unsigned long)pmdp);
+ __raw_pmd_free(pmdp);
}

static inline void __pud_populate(pud_t *pudp, phys_addr_t pmdp, pudval_t prot)
{
- set_pud(pudp, __pud(__phys_to_pud_val(pmdp) | prot));
+ __raw_pud_populate(pudp, pmdp, prot);
}

static inline void pud_populate(struct mm_struct *mm, pud_t *pudp, pmd_t *pmdp)
@@ -67,13 +90,12 @@ static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)

static inline void pud_free(struct mm_struct *mm, pud_t *pudp)
{
- BUG_ON((unsigned long)pudp & (PAGE_SIZE-1));
- free_page((unsigned long)pudp);
+ __raw_pud_free(pudp);
}

static inline void __pgd_populate(pgd_t *pgdp, phys_addr_t pudp, pgdval_t prot)
{
- set_pgd(pgdp, __pgd(__phys_to_pgd_val(pudp) | prot));
+ __raw_pgd_populate(pgdp, pudp, prot);
}

static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgdp, pud_t *pudp)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 7e2c27e..5e22503 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -475,31 +475,39 @@ static inline phys_addr_t pmd_page_paddr(pmd_t pmd)
*/
#define mk_pte(page,prot) pfn_pte(page_to_pfn(page),prot)

-#if CONFIG_PGTABLE_LEVELS > 2
+#define __raw_pud_none(pud) (!pud_val(pud))
+#define __raw_pud_bad(pud) (!(pud_val(pud) & PUD_TABLE_BIT))
+#define __raw_pud_present(pud) pte_present(pud_pte(pud))

-#define pmd_ERROR(pmd) __pmd_error(__FILE__, __LINE__, pmd_val(pmd))
-
-#define pud_none(pud) (!pud_val(pud))
-#define pud_bad(pud) (!(pud_val(pud) & PUD_TABLE_BIT))
-#define pud_present(pud) pte_present(pud_pte(pud))
-
-static inline void set_pud(pud_t *pudp, pud_t pud)
+static inline void __raw_set_pud(pud_t *pudp, pud_t pud)
{
WRITE_ONCE(*pudp, pud);
dsb(ishst);
isb();
}

-static inline void pud_clear(pud_t *pudp)
+static inline void __raw_pud_clear(pud_t *pudp)
{
- set_pud(pudp, __pud(0));
+ __raw_set_pud(pudp, __pud(0));
}

-static inline phys_addr_t pud_page_paddr(pud_t pud)
+static inline phys_addr_t __raw_pud_page_paddr(pud_t pud)
{
return __pud_to_phys(pud);
}

+#if CONFIG_PGTABLE_LEVELS > 2
+
+#define pmd_ERROR(pmd) __pmd_error(__FILE__, __LINE__, pmd_val(pmd))
+
+#define pud_none(pud) __raw_pud_none(pud)
+#define pud_bad(pud) __raw_pud_bad(pud)
+#define pud_present(pud) __raw_pud_present(pud)
+
+#define set_pud(pudp, pud) __raw_set_pud((pudp), (pud))
+#define pud_clear(pudp) __raw_pud_clear((pudp))
+#define pud_page_paddr(pud) __raw_pud_page_paddr((pud))
+
/* Find an entry in the second-level page table. */
#define pmd_index(addr) (((addr) >> PMD_SHIFT) & (PTRS_PER_PMD - 1))

@@ -528,30 +536,38 @@ static inline phys_addr_t pud_page_paddr(pud_t pud)

#endif /* CONFIG_PGTABLE_LEVELS > 2 */

-#if CONFIG_PGTABLE_LEVELS > 3
+#define __raw_pgd_none(pgd) (!pgd_val(pgd))
+#define __raw_pgd_bad(pgd) (!(pgd_val(pgd) & 2))
+#define __raw_pgd_present(pgd) (pgd_val(pgd))

-#define pud_ERROR(pud) __pud_error(__FILE__, __LINE__, pud_val(pud))
-
-#define pgd_none(pgd) (!pgd_val(pgd))
-#define pgd_bad(pgd) (!(pgd_val(pgd) & 2))
-#define pgd_present(pgd) (pgd_val(pgd))
-
-static inline void set_pgd(pgd_t *pgdp, pgd_t pgd)
+static inline void __raw_set_pgd(pgd_t *pgdp, pgd_t pgd)
{
WRITE_ONCE(*pgdp, pgd);
dsb(ishst);
}

-static inline void pgd_clear(pgd_t *pgdp)
+static inline void __raw_pgd_clear(pgd_t *pgdp)
{
- set_pgd(pgdp, __pgd(0));
+ __raw_set_pgd(pgdp, __pgd(0));
}

-static inline phys_addr_t pgd_page_paddr(pgd_t pgd)
+static inline phys_addr_t __raw_pgd_page_paddr(pgd_t pgd)
{
return __pgd_to_phys(pgd);
}

+#if CONFIG_PGTABLE_LEVELS > 3
+
+#define pud_ERROR(pud) __pud_error(__FILE__, __LINE__, pud_val(pud))
+
+#define pgd_none(pgd) __raw_pgd_none((pgd))
+#define pgd_bad(pgd) __raw_pgd_bad((pgd))
+#define pgd_present(pgd) __raw_pgd_present((pgd))
+
+#define set_pgd(pgdp, pgd) __raw_set_pgd((pgdp), (pgd))
+#define pgd_clear(pgdp) __raw_pgd_clear((pgdp))
+#define pgd_page_paddr(pgd) __raw_pgd_page_paddr((pgd))
+
/* Find an entry in the frst-level page table. */
#define pud_index(addr) (((addr) >> PUD_SHIFT) & (PTRS_PER_PUD - 1))

--
2.7.4


2018-03-27 13:26:56

by Suzuki K Poulose

[permalink] [raw]
Subject: [PATCH v2 01/17] virtio: mmio-v1: Validate queue PFN

virtio-mmio with virtio-v1 uses a 32bit PFN for the queue.
If the queue pfn is too large to fit in 32bits, which
we could hit on arm64 systems with 52bit physical addresses
(even with 64K page size), we simply miss out a proper link
to the other side of the queue.

Add a check to validate the PFN, rather than silently breaking
the devices.

Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Christoffer Dall <[email protected]>
Cc: Peter Maydel <[email protected]>
Cc: Jean-Philippe Brucker <[email protected]>
Signed-off-by: Suzuki K Poulose <[email protected]>
---
drivers/virtio/virtio_mmio.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
index 67763d3..b2f9b5c 100644
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -397,9 +397,21 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index,
/* Activate the queue */
writel(virtqueue_get_vring_size(vq), vm_dev->base + VIRTIO_MMIO_QUEUE_NUM);
if (vm_dev->version == 1) {
+ u64 q_pfn = virtqueue_get_desc_addr(vq) >> PAGE_SHIFT;
+
+ /*
+ * virtio-mmio v1 uses a 32bit QUEUE PFN. If we have something
+ * that doesn't fit in 32bit, fail the setup rather than
+ * pretending to be successful.
+ */
+ if (q_pfn >> 32) {
+ dev_err(&vdev->dev, "virtio-mmio: queue address too large\n");
+ err = -ENOMEM;
+ goto error_bad_pfn;
+ }
+
writel(PAGE_SIZE, vm_dev->base + VIRTIO_MMIO_QUEUE_ALIGN);
- writel(virtqueue_get_desc_addr(vq) >> PAGE_SHIFT,
- vm_dev->base + VIRTIO_MMIO_QUEUE_PFN);
+ writel(q_pfn, vm_dev->base + VIRTIO_MMIO_QUEUE_PFN);
} else {
u64 addr;

@@ -430,6 +442,8 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index,

return vq;

+error_bad_pfn:
+ vring_del_virtqueue(vq);
error_new_virtqueue:
if (vm_dev->version == 1) {
writel(0, vm_dev->base + VIRTIO_MMIO_QUEUE_PFN);
--
2.7.4


2018-03-27 14:10:03

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v2 01/17] virtio: mmio-v1: Validate queue PFN

On Tue, Mar 27, 2018 at 02:15:11PM +0100, Suzuki K Poulose wrote:
> virtio-mmio with virtio-v1 uses a 32bit PFN for the queue.
> If the queue pfn is too large to fit in 32bits, which
> we could hit on arm64 systems with 52bit physical addresses
> (even with 64K page size), we simply miss out a proper link
> to the other side of the queue.
>
> Add a check to validate the PFN, rather than silently breaking
> the devices.
>
> Cc: "Michael S. Tsirkin" <[email protected]>
> Cc: Jason Wang <[email protected]>
> Cc: Marc Zyngier <[email protected]>
> Cc: Christoffer Dall <[email protected]>
> Cc: Peter Maydel <[email protected]>
> Cc: Jean-Philippe Brucker <[email protected]>
> Signed-off-by: Suzuki K Poulose <[email protected]>

OK - seems harmless so I will queue this.
But I really think effort should be spent on
adding v1.0 support in QEMU.

> ---
> drivers/virtio/virtio_mmio.c | 18 ++++++++++++++++--
> 1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> index 67763d3..b2f9b5c 100644
> --- a/drivers/virtio/virtio_mmio.c
> +++ b/drivers/virtio/virtio_mmio.c
> @@ -397,9 +397,21 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index,
> /* Activate the queue */
> writel(virtqueue_get_vring_size(vq), vm_dev->base + VIRTIO_MMIO_QUEUE_NUM);
> if (vm_dev->version == 1) {
> + u64 q_pfn = virtqueue_get_desc_addr(vq) >> PAGE_SHIFT;
> +
> + /*
> + * virtio-mmio v1 uses a 32bit QUEUE PFN. If we have something
> + * that doesn't fit in 32bit, fail the setup rather than
> + * pretending to be successful.
> + */
> + if (q_pfn >> 32) {
> + dev_err(&vdev->dev, "virtio-mmio: queue address too large\n");
> + err = -ENOMEM;
> + goto error_bad_pfn;
> + }
> +
> writel(PAGE_SIZE, vm_dev->base + VIRTIO_MMIO_QUEUE_ALIGN);
> - writel(virtqueue_get_desc_addr(vq) >> PAGE_SHIFT,
> - vm_dev->base + VIRTIO_MMIO_QUEUE_PFN);
> + writel(q_pfn, vm_dev->base + VIRTIO_MMIO_QUEUE_PFN);
> } else {
> u64 addr;
>
> @@ -430,6 +442,8 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index,
>
> return vq;
>
> +error_bad_pfn:
> + vring_del_virtqueue(vq);
> error_new_virtqueue:
> if (vm_dev->version == 1) {
> writel(0, vm_dev->base + VIRTIO_MMIO_QUEUE_PFN);
> --
> 2.7.4

2018-03-27 14:12:45

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v2 02/17] virtio: pci-legacy: Validate queue pfn

On Tue, Mar 27, 2018 at 02:15:12PM +0100, Suzuki K Poulose wrote:
> Legacy PCI over virtio uses a 32bit PFN for the queue. If the
> queue pfn is too large to fit in 32bits, which we could hit on
> arm64 systems with 52bit physical addresses (even with 64K page
> size), we simply miss out a proper link to the other side of
> the queue.
>
> Add a check to validate the PFN, rather than silently breaking
> the devices.
>
> Cc: "Michael S. Tsirkin" <[email protected]>
> Cc: Jason Wang <[email protected]>
> Cc: Marc Zyngier <[email protected]>
> Cc: Christoffer Dall <[email protected]>
> Cc: Peter Maydel <[email protected]>
> Cc: Jean-Philippe Brucker <[email protected]>
> Signed-off-by: Suzuki K Poulose <[email protected]>
> ---
> drivers/virtio/virtio_pci_legacy.c | 12 ++++++++++--
> 1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
> index 2780886..4b84a75 100644
> --- a/drivers/virtio/virtio_pci_legacy.c
> +++ b/drivers/virtio/virtio_pci_legacy.c
> @@ -122,6 +122,7 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
> struct virtqueue *vq;
> u16 num;
> int err;
> + u64 q_pfn;
>
> /* Select the queue we're interested in */
> iowrite16(index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL);
> @@ -141,9 +142,15 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
> if (!vq)
> return ERR_PTR(-ENOMEM);
>
> + q_pfn = virtqueue_get_desc_addr(vq) >> VIRTIO_PCI_QUEUE_ADDR_SHIFT;
> + if (q_pfn >> 32) {
> + dev_err(&vp_dev->pci_dev->dev, "virtio-pci queue PFN too large\n");
> + err = -ENOMEM;

ENOMEM seems wrong here. E2BIG?

> + goto out_del_vq;
> + }
> +
> /* activate the queue */
> - iowrite32(virtqueue_get_desc_addr(vq) >> VIRTIO_PCI_QUEUE_ADDR_SHIFT,
> - vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
> + iowrite32((u32)q_pfn, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);

Is the cast really necessary here?

>
> vq->priv = (void __force *)vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NOTIFY;
>
> @@ -160,6 +167,7 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
>
> out_deactivate:
> iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
> +out_del_vq:
> vring_del_virtqueue(vq);
> return ERR_PTR(err);
> }
> --
> 2.7.4

2018-04-03 12:32:25

by Jean-Philippe Brucker

[permalink] [raw]
Subject: Re: [kvmtool PATCH 20/17] kvmtool: arm64: Switch memory layout

Hi Suzuki,

On 27/03/18 14:15, Suzuki K Poulose wrote:
> If the guest wants to use a larger physical address space place
> the RAM at upper half of the address space. Otherwise, it uses the
> default layout.
>
> Signed-off-by: Suzuki K Poulose <[email protected]>
> ---
> arm/aarch32/include/kvm/kvm-arch.h | 1 +
> arm/aarch64/include/kvm/kvm-arch.h | 15 ++++++++++++---
> arm/include/arm-common/kvm-arch.h | 11 ++++++-----
> arm/kvm.c | 2 +-
> 4 files changed, 20 insertions(+), 9 deletions(-)
>
> diff --git a/arm/aarch32/include/kvm/kvm-arch.h b/arm/aarch32/include/kvm/kvm-arch.h
> index cd31e72..2d62aab 100644
> --- a/arm/aarch32/include/kvm/kvm-arch.h
> +++ b/arm/aarch32/include/kvm/kvm-arch.h
> @@ -4,6 +4,7 @@
> #define ARM_KERN_OFFSET(...) 0x8000
>
> #define ARM_MAX_MEMORY(...) ARM_LOMAP_MAX_MEMORY

I guess this should now be ARM32_MAX_MEMORY?

Thanks,
Jean

2018-04-03 15:03:47

by James Morse

[permalink] [raw]
Subject: Re: [PATCH v2 11/17] kvm: arm64: Configure VTCR per VM

Hi Suzuki,

On 27/03/18 14:15, Suzuki K Poulose wrote:
> We set VTCR_EL2 very early during the stage2 init and don't
> touch it ever. This is fine as we had a fixed IPA size. This
> patch changes the behavior to set the VTCR for a given VM,
> depending on its stage2 table. The common configuration for
> VTCR is still performed during the early init as we have to
> retain the hardware access flag update bits (VTCR_EL2_HA)
> per CPU (as they are only set for the CPUs which are capabile).

(Nit: capable)


> The bits defining the number of levels in the page table (SL0)
> and and the size of the Input address to the translation (T0SZ)
> are programmed for each VM upon entry to the guest.

> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 870f4b1..5ccd3ae 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -164,6 +164,12 @@ static void __hyp_text __deactivate_traps(struct kvm_vcpu *vcpu)
> static void __hyp_text __activate_vm(struct kvm_vcpu *vcpu)
> {
> struct kvm *kvm = kern_hyp_va(vcpu->kvm);
> + u64 vtcr = read_sysreg(vtcr_el2);
> +
> + vtcr &= ~VTCR_EL2_PRIVATE_MASK;
> + vtcr |= VTCR_EL2_SL0(kvm_stage2_levels(kvm)) |
> + VTCR_EL2_T0SZ(kvm_phys_shift(kvm));
> + write_sysreg(vtcr, vtcr_el2);
> write_sysreg(kvm->arch.vttbr, vttbr_el2);
> }

Do we need to set this register for tlb maintenance too?
e.g. tlbi for a 3-level-stage2 vm when a 2-level-stage2 vm's vtcr is loaded...

(The ARM-ARM has 'Any of the bits of VTCR_EL2 are permitted to be cached in a TLB'.)


Thanks,

James

2018-04-03 15:45:51

by Suzuki K Poulose

[permalink] [raw]
Subject: Re: [PATCH v2 11/17] kvm: arm64: Configure VTCR per VM

On 03/04/18 15:58, James Morse wrote:
> Hi Suzuki,
>
> On 27/03/18 14:15, Suzuki K Poulose wrote:
>> We set VTCR_EL2 very early during the stage2 init and don't
>> touch it ever. This is fine as we had a fixed IPA size. This
>> patch changes the behavior to set the VTCR for a given VM,
>> depending on its stage2 table. The common configuration for
>> VTCR is still performed during the early init as we have to
>> retain the hardware access flag update bits (VTCR_EL2_HA)
>> per CPU (as they are only set for the CPUs which are capabile).
>
> (Nit: capable)
>

Thanks for spotting, will fix it.

>
>> The bits defining the number of levels in the page table (SL0)
>> and and the size of the Input address to the translation (T0SZ)
>> are programmed for each VM upon entry to the guest.
>
>> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
>> index 870f4b1..5ccd3ae 100644
>> --- a/arch/arm64/kvm/hyp/switch.c
>> +++ b/arch/arm64/kvm/hyp/switch.c
>> @@ -164,6 +164,12 @@ static void __hyp_text __deactivate_traps(struct kvm_vcpu *vcpu)
>> static void __hyp_text __activate_vm(struct kvm_vcpu *vcpu)
>> {
>> struct kvm *kvm = kern_hyp_va(vcpu->kvm);
>> + u64 vtcr = read_sysreg(vtcr_el2);
>> +
>> + vtcr &= ~VTCR_EL2_PRIVATE_MASK;
>> + vtcr |= VTCR_EL2_SL0(kvm_stage2_levels(kvm)) |
>> + VTCR_EL2_T0SZ(kvm_phys_shift(kvm));
>> + write_sysreg(vtcr, vtcr_el2);
>> write_sysreg(kvm->arch.vttbr, vttbr_el2);
>> }
>
> Do we need to set this register for tlb maintenance too?
> e.g. tlbi for a 3-level-stage2 vm when a 2-level-stage2 vm's vtcr is loaded...
>
> (The ARM-ARM has 'Any of the bits of VTCR_EL2 are permitted to be cached in a TLB'.)

You're right. We need to set the VTCR for the tlb operations. I think
we can do this by hooking it to the __tlb_switch_to_guest() routine.
Will address it in the next version.

Cheers
Suzuki

2018-04-13 13:47:20

by Peter Maydell

[permalink] [raw]
Subject: Re: [PATCH v2 12/17] kvm: arm/arm64: Expose supported physical address limit for VM

On 27 March 2018 at 14:15, Suzuki K Poulose <[email protected]> wrote:
> Expose the maximum physical address size supported by the host
> for a VM. This could be later used by the userspace to choose the
> appropriate size for a given VM. The limit is determined as the
> minimum of actual CPU limit, the kernel limit (i.e, either 48 or 52)
> and the stage2 page table support limit (which is 40bits at the moment).
> For backward compatibility, we support a minimum of 40bits. The limit
> will be lifted as we add support for the stage2 to support the host
> kernel PA limit.
>
> This value may be different from what is exposed to the VM via
> CPU ID registers. The limit only applies to the stage2 page table.
>
> Cc: Christoffer Dall <[email protected]>
> Cc: Marc Zyngier <[email protected]>
> Cc: Peter Maydel <[email protected]>
> Signed-off-by: Suzuki K Poulose <[email protected]>
> ---
> Documentation/virtual/kvm/api.txt | 14 ++++++++++++++
> arch/arm/include/asm/kvm_mmu.h | 5 +++++
> arch/arm64/include/asm/kvm_mmu.h | 5 +++++
> include/uapi/linux/kvm.h | 6 ++++++
> virt/kvm/arm/arm.c | 6 ++++++
> 5 files changed, 36 insertions(+)
>
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 792fa87..55908a8 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -3500,6 +3500,20 @@ Returns: 0 on success; -1 on error
> This ioctl can be used to unregister the guest memory region registered
> with KVM_MEMORY_ENCRYPT_REG_REGION ioctl above.
>
> +4.113 KVM_ARM_GET_MAX_VM_PHYS_SHIFT
> +Capability: basic
> +Architectures: arm, arm64
> +Type: system ioctl
> +Parameters: none
> +Returns: log2(Maximum physical address space size) supported by the
> +hyperviosr.

typo: "hypervisor".

> +
> +This ioctl can be used to identify the maximum physical address space size
> +supported by the hypervisor.

Is that the physical address space on the host, or the physical
address space size we present to the guest?

> The returned value indicates the maximum size
> +of the address that can be resolved by the stage2 translation table on
> +arm/arm64. On arm64, the value is decided based on the host kernel
> +configuration and the system wide safe value of ID_AA64MMFR0_EL1:PARange.
> +This may not match the value exposed to the VM in CPU ID registers.

Isn't it likely to confuse the guest if we lie to it about the PA range it
sees? When would the two values differ?

Do we also need a 'set' operation, so userspace can create a VM
that has a 40 bit userspace on a CPU that supports more than that,
or does it just work?

What's the x86 API for KVM to tell userspace about physical address
range restrictions?

thanks
-- PMM

2018-04-13 16:29:31

by Punit Agrawal

[permalink] [raw]
Subject: Re: [PATCH v2 14/17] kvm: arm64: Switch to per VM IPA limit

Hi Suzuki,

I haven't had a chance to look at the code but noticed one issue below.

Suzuki K Poulose <[email protected]> writes:

> Now that we can manage the stage2 page table per VM, switch the
> configuration details to per VM instance. We keep track of the
> IPA bits, number of page table levels and the VTCR bits (which
> depends on the IPA and the number of levels). While at it, remove
> unused pgd_lock field from kvm_arch for arm64.
>
> Cc: Marc Zyngier <[email protected]>
> Cc: Christoffer Dall <[email protected]>
> Signed-off-by: Suzuki K Poulose <[email protected]>
> ---
> arch/arm64/include/asm/kvm_host.h | 14 ++++++++++++--
> arch/arm64/include/asm/kvm_mmu.h | 11 +++++++++--
> arch/arm64/include/asm/stage2_pgtable.h | 1 -
> arch/arm64/kvm/hyp/switch.c | 3 +--
> virt/kvm/arm/mmu.c | 4 ++++
> 5 files changed, 26 insertions(+), 7 deletions(-)
>

[...]

> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index bb458bf..e86d7f4 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -136,9 +136,10 @@ static inline unsigned long __kern_hyp_va(unsigned long v)
> */
> #define KVM_PHYS_SHIFT (40)
>
> -#define kvm_phys_shift(kvm) KVM_PHYS_SHIFT
> +#define kvm_phys_shift(kvm) (kvm->arch.phys_shift)
> #define kvm_phys_size(kvm) (_AC(1, ULL) << kvm_phys_shift(kvm))
> #define kvm_phys_mask(kvm) (kvm_phys_size(kvm) - _AC(1, ULL))
> +#define kvm_stage2_levels(kvm) (kvm->arch.s2_levels)
>
> static inline bool kvm_page_empty(void *ptr)
> {

[...]

> diff --git a/arch/arm64/include/asm/stage2_pgtable.h b/arch/arm64/include/asm/stage2_pgtable.h
> index 33e8ebb..9b75b83 100644
> --- a/arch/arm64/include/asm/stage2_pgtable.h
> +++ b/arch/arm64/include/asm/stage2_pgtable.h
> @@ -44,7 +44,6 @@
> */
> #define __s2_pgd_ptrs(pa, lvls) (1 << ((pa) - pt_levels_pgdir_shift((lvls))))
>
> -#define kvm_stage2_levels(kvm) stage2_pt_levels(kvm_phys_shift(kvm))
> #define stage2_pgdir_shift(kvm) \
> pt_levels_pgdir_shift(kvm_stage2_levels(kvm))
> #define stage2_pgdir_size(kvm) (_AC(1, UL) << stage2_pgdir_shift((kvm)))

[...]

> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index 7a264c6..746f38e 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -753,6 +753,10 @@ int kvm_alloc_stage2_pgd(struct kvm *kvm)
> return -EINVAL;
> }
>
> + /* Make sure we have the stage2 configured for this VM */
> + if (WARN_ON(!kvm_stage2_levels(kvm)))
> + return -EINVAL;
> +

This hunk breaks the 32-bit build as kvm_stag2_levels() isn't defined on
arm.

Thanks,
Punit

> /* Allocate the HW PGD, making sure that each page gets its own refcount */
> pgd = alloc_pages_exact(stage2_pgd_size(kvm), GFP_KERNEL | __GFP_ZERO);
> if (!pgd)

2018-04-16 10:25:02

by Suzuki K Poulose

[permalink] [raw]
Subject: Re: [PATCH v2 12/17] kvm: arm/arm64: Expose supported physical address limit for VM

On 13/04/18 14:21, Peter Maydell wrote:
> On 27 March 2018 at 14:15, Suzuki K Poulose <[email protected]> wrote:
>> Expose the maximum physical address size supported by the host
>> for a VM. This could be later used by the userspace to choose the
>> appropriate size for a given VM. The limit is determined as the
>> minimum of actual CPU limit, the kernel limit (i.e, either 48 or 52)
>> and the stage2 page table support limit (which is 40bits at the moment).
>> For backward compatibility, we support a minimum of 40bits. The limit
>> will be lifted as we add support for the stage2 to support the host
>> kernel PA limit.
>>
>> This value may be different from what is exposed to the VM via
>> CPU ID registers. The limit only applies to the stage2 page table.
>>
>> Cc: Christoffer Dall <[email protected]>
>> Cc: Marc Zyngier <[email protected]>
>> Cc: Peter Maydel <[email protected]>
>> Signed-off-by: Suzuki K Poulose <[email protected]>
>> ---
>> Documentation/virtual/kvm/api.txt | 14 ++++++++++++++
>> arch/arm/include/asm/kvm_mmu.h | 5 +++++
>> arch/arm64/include/asm/kvm_mmu.h | 5 +++++
>> include/uapi/linux/kvm.h | 6 ++++++
>> virt/kvm/arm/arm.c | 6 ++++++
>> 5 files changed, 36 insertions(+)
>>
>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>> index 792fa87..55908a8 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -3500,6 +3500,20 @@ Returns: 0 on success; -1 on error
>> This ioctl can be used to unregister the guest memory region registered
>> with KVM_MEMORY_ENCRYPT_REG_REGION ioctl above.
>>
>> +4.113 KVM_ARM_GET_MAX_VM_PHYS_SHIFT
>> +Capability: basic
>> +Architectures: arm, arm64
>> +Type: system ioctl
>> +Parameters: none
>> +Returns: log2(Maximum physical address space size) supported by the
>> +hyperviosr.
>
> typo: "hypervisor".
>

Will fix it.

>> +
>> +This ioctl can be used to identify the maximum physical address space size
>> +supported by the hypervisor.
>
> Is that the physical address space on the host, or the physical
> address space size we present to the guest?

It is the size of the address space we present to the guest. I will update
the documentation to make it more clear.

>
>> The returned value indicates the maximum size
>> +of the address that can be resolved by the stage2 translation table on
>> +arm/arm64. On arm64, the value is decided based on the host kernel
>> +configuration and the system wide safe value of ID_AA64MMFR0_EL1:PARange.
>> +This may not match the value exposed to the VM in CPU ID registers.
>
> Isn't it likely to confuse the guest if we lie to it about the PA range it
> sees? When would the two values differ?

On a heterogeneous system, the guest could see different values
of PARange on the same VCPU. So that is not safe for a guest at the moment.
Ideally, we should emulate the PARange to provide the system wide safe value, which the
guest can read.

We don't touch the emulation of PARange in the ID registers in this set.
All we do is (in the next patches) limiting the address space size provided
to the guest. May be we could update PARange to the limit imposed and emulate
the field.

>
> Do we also need a 'set' operation, so userspace can create a VM
> that has a 40 bit userspace on a CPU that supports more than that,
> or does it just work?

It just works as before, creating a 40bit userspace, without any additional
steps. All we do is, allowing to create a VM with bigger address space
by specifying the size in the "type" field. The other question is, does
it really matter what a guest sees in PARange and what it is provided
with ? e.g, on my Juno, the A53's have 40bit and A57 has 44bit, while
the system uses only 40bit.

This will be true even with the new change. i.e, we don't allow a size
beyond the limit supported by all the CPUs on the system.

>
> What's the x86 API for KVM to tell userspace about physical address
> range restrictions?

From a quick look, the limit comes from cpuid (leaf 0x80000008 ?). So, it
could be via the existing per-VCPU get/set_cpuid{,2}() API on x86.

Suzuki

2018-04-16 14:28:29

by Suzuki K Poulose

[permalink] [raw]
Subject: Re: [PATCH v2 14/17] kvm: arm64: Switch to per VM IPA limit

On 13/04/18 17:27, Punit Agrawal wrote:
> Hi Suzuki,
>
> I haven't had a chance to look at the code but noticed one issue below.
>
> Suzuki K Poulose <[email protected]> writes:
>
>> Now that we can manage the stage2 page table per VM, switch the
>> configuration details to per VM instance. We keep track of the
>> IPA bits, number of page table levels and the VTCR bits (which
>> depends on the IPA and the number of levels). While at it, remove
>> unused pgd_lock field from kvm_arch for arm64.
>>
>> Cc: Marc Zyngier <[email protected]>
>> Cc: Christoffer Dall <[email protected]>
>> Signed-off-by: Suzuki K Poulose <[email protected]>
>> ---
>> arch/arm64/include/asm/kvm_host.h | 14 ++++++++++++--
>> arch/arm64/include/asm/kvm_mmu.h | 11 +++++++++--
>> arch/arm64/include/asm/stage2_pgtable.h | 1 -
>> arch/arm64/kvm/hyp/switch.c | 3 +--
>> virt/kvm/arm/mmu.c | 4 ++++
>> 5 files changed, 26 insertions(+), 7 deletions(-)
>>
>
> [...]
>
>> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
>> index bb458bf..e86d7f4 100644
>> --- a/arch/arm64/include/asm/kvm_mmu.h
>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>> @@ -136,9 +136,10 @@ static inline unsigned long __kern_hyp_va(unsigned long v)
>> */
>> #define KVM_PHYS_SHIFT (40)
>>
>> -#define kvm_phys_shift(kvm) KVM_PHYS_SHIFT
>> +#define kvm_phys_shift(kvm) (kvm->arch.phys_shift)
>> #define kvm_phys_size(kvm) (_AC(1, ULL) << kvm_phys_shift(kvm))
>> #define kvm_phys_mask(kvm) (kvm_phys_size(kvm) - _AC(1, ULL))
>> +#define kvm_stage2_levels(kvm) (kvm->arch.s2_levels)
>>
>> static inline bool kvm_page_empty(void *ptr)
>> {
>
> [...]
>
>> diff --git a/arch/arm64/include/asm/stage2_pgtable.h b/arch/arm64/include/asm/stage2_pgtable.h
>> index 33e8ebb..9b75b83 100644
>> --- a/arch/arm64/include/asm/stage2_pgtable.h
>> +++ b/arch/arm64/include/asm/stage2_pgtable.h
>> @@ -44,7 +44,6 @@
>> */
>> #define __s2_pgd_ptrs(pa, lvls) (1 << ((pa) - pt_levels_pgdir_shift((lvls))))
>>
>> -#define kvm_stage2_levels(kvm) stage2_pt_levels(kvm_phys_shift(kvm))
>> #define stage2_pgdir_shift(kvm) \
>> pt_levels_pgdir_shift(kvm_stage2_levels(kvm))
>> #define stage2_pgdir_size(kvm) (_AC(1, UL) << stage2_pgdir_shift((kvm)))
>
> [...]
>
>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>> index 7a264c6..746f38e 100644
>> --- a/virt/kvm/arm/mmu.c
>> +++ b/virt/kvm/arm/mmu.c
>> @@ -753,6 +753,10 @@ int kvm_alloc_stage2_pgd(struct kvm *kvm)
>> return -EINVAL;
>> }
>>
>> + /* Make sure we have the stage2 configured for this VM */
>> + if (WARN_ON(!kvm_stage2_levels(kvm)))
>> + return -EINVAL;
>> +
>
> This hunk breaks the 32-bit build as kvm_stag2_levels() isn't defined on
> arm.

Thanks for spotting. I have fixed this locally in my next version to check
for the kvm_phys_shift(), as I plan to delay the levels selection to the
actual allocation of the table, giving us a fall back to increase the level
if we are unable to allocate contiguous pages (e.g, 16 * 64K pages with say 46bit
IPA).


Cheers
Suzuki

2018-04-25 16:11:37

by Julien Grall

[permalink] [raw]
Subject: Re: [PATCH v2 13/17] kvm: arm/arm64: Allow tuning the physical address size for VM

Hi Suzuki,

On 27/03/18 14:15, Suzuki K Poulose wrote:
> Allow specifying the physical address size for a new VM via
> the kvm_type argument for KVM_CREATE_VM ioctl. This allows
> us to finalise the stage2 page table format as early as possible
> and hence perform the right checks on the memory slots without
> complication. The size is encoded as Log2(PA_Siz) in the bits[7:0]

s/PA_Siz/PA_Size/.

> of the type field and can encode more information in the future if
> required.
>
> Cc: Marc Zyngier <[email protected]>
> Cc: Christoffer Dall <[email protected]>
> Cc: Peter Maydel <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: Radim Krčmář <[email protected]>
> Signed-off-by: Suzuki K Poulose <[email protected]>
> ---
> arch/arm/include/asm/kvm_mmu.h | 2 ++
> arch/arm64/include/asm/kvm_mmu.h | 2 ++
> include/uapi/linux/kvm.h | 10 ++++++++++
> virt/kvm/arm/arm.c | 24 ++++++++++++++++++++++--
> 4 files changed, 36 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> index 50896da..3f13827 100644
> --- a/arch/arm/include/asm/kvm_mmu.h
> +++ b/arch/arm/include/asm/kvm_mmu.h
> @@ -323,6 +323,8 @@ static inline u32 kvm_get_ipa_limit(void)
> return KVM_PHYS_SHIFT;
> }
>
> +static inline void kvm_config_stage2(struct kvm *kvm, u32 ipa_shift) {}
> +
> #endif /* !__ASSEMBLY__ */
>
> #endif /* __ARM_KVM_MMU_H__ */
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index a4c8c00..bb458bf 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -416,5 +416,7 @@ static inline u32 kvm_get_ipa_limit(void)
> return KVM_PHYS_SHIFT;
> }
>
> +static inline void kvm_config_stage2(struct kvm *kvm, u32 ipa_shift) {}
> +
> #endif /* __ASSEMBLY__ */
> #endif /* __ARM64_KVM_MMU_H__ */
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index b737ee1..67b09b0 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -740,6 +740,16 @@ struct kvm_ppc_resize_hpt {
> #define KVM_S390_SIE_PAGE_OFFSET 1
>
> /*
> + * On arm/arm64, machine type can be used to request the physical
> + * address size for the VM. Bits [7-0] have been reserved for the
> + * PA size shift (i.e, log2(PA-Size)). For backward compatibility,

I would s/PA-Size/PA_Size/ to avoid the impression that it is a
substraction.

> + * value 0 implies the default IPA size, which is 40bits.
> + */
> +#define KVM_VM_TYPE_ARM_PHYS_SHIFT_MASK 0xff
> +#define KVM_VM_TYPE_ARM_PHYS_SHIFT(x) \
> + ((x) & KVM_VM_TYPE_ARM_PHYS_SHIFT_MASK)
> +
> +/*
> * ioctls for /dev/kvm fds:
> */
> #define KVM_GET_API_VERSION _IO(KVMIO, 0x00)
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index 53bb05c..13eb66f 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -110,6 +110,25 @@ void kvm_arch_check_processor_compat(void *rtn)
> }
>
>
> +static int kvm_arch_config_vm(struct kvm *kvm, unsigned long type)
> +{
> + u32 ipa_shift = (u32)type & KVM_VM_TYPE_ARM_PHYS_SHIFT_MASK;

How about using KVM_VM_TYPE_ARM_PHYS_SHIFT(type) directly here?

I am not entirely sure whether the cast is necessary. If it is, then I
think you want to add it in KVM_VM_TYPE_ARM_PHYS_SHIFT(...) as well.

> +
> + /*
> + * Make sure the size, if specified, is within the range of
> + * default size and supported maximum limit.
> + */
> + if (ipa_shift) {
> + if (ipa_shift < KVM_PHYS_SHIFT || ipa_shift > kvm_ipa_limit)
> + return -EINVAL;
> + } else {
> + ipa_shift = KVM_PHYS_SHIFT;
> + }
> +
> + kvm_config_stage2(kvm, ipa_shift);
> + return 0;
> +}
> +
> /**
> * kvm_arch_init_vm - initializes a VM data structure
> * @kvm: pointer to the KVM struct
> @@ -118,8 +137,9 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> {
> int ret, cpu;
>
> - if (type)
> - return -EINVAL;
> + ret = kvm_arch_config_vm(kvm, type);
> + if (ret)
> + return ret;
>
> kvm->arch.last_vcpu_ran = alloc_percpu(typeof(*kvm->arch.last_vcpu_ran));
> if (!kvm->arch.last_vcpu_ran)
>

Cheers,

--
Julien Grall

2018-04-25 16:24:00

by Suzuki K Poulose

[permalink] [raw]
Subject: Re: [PATCH v2 13/17] kvm: arm/arm64: Allow tuning the physical address size for VM

On 25/04/18 17:10, Julien Grall wrote:
> Hi Suzuki,
>
> On 27/03/18 14:15, Suzuki K Poulose wrote:
>> Allow specifying the physical address size for a new VM via
>> the kvm_type argument for KVM_CREATE_VM ioctl. This allows
>> us to finalise the stage2 page table format as early as possible
>> and hence perform the right checks on the memory slots without
>> complication. The size is encoded as Log2(PA_Siz) in the bits[7:0]
>
> s/PA_Siz/PA_Size/.
>

>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -740,6 +740,16 @@ struct kvm_ppc_resize_hpt {
>>   #define KVM_S390_SIE_PAGE_OFFSET 1
>>   /*
>> + * On arm/arm64, machine type can be used to request the physical
>> + * address size for the VM. Bits [7-0] have been reserved for the
>> + * PA size shift (i.e, log2(PA-Size)). For backward compatibility,
>
> I would s/PA-Size/PA_Size/ to avoid the impression that it is a substraction.
>

Agree.

>> + * value 0 implies the default IPA size, which is 40bits.
>> + */
>> +#define KVM_VM_TYPE_ARM_PHYS_SHIFT_MASK    0xff
>> +#define KVM_VM_TYPE_ARM_PHYS_SHIFT(x)        \
>> +    ((x) & KVM_VM_TYPE_ARM_PHYS_SHIFT_MASK)
>> +
>> +/*
>>    * ioctls for /dev/kvm fds:
>>    */
>>   #define KVM_GET_API_VERSION       _IO(KVMIO,   0x00)
>> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
>> index 53bb05c..13eb66f 100644
>> --- a/virt/kvm/arm/arm.c
>> +++ b/virt/kvm/arm/arm.c
>> @@ -110,6 +110,25 @@ void kvm_arch_check_processor_compat(void *rtn)
>>   }
>> +static int kvm_arch_config_vm(struct kvm *kvm, unsigned long type)
>> +{
>> +    u32 ipa_shift = (u32)type & KVM_VM_TYPE_ARM_PHYS_SHIFT_MASK;
>
> How about using KVM_VM_TYPE_ARM_PHYS_SHIFT(type) directly here?

Thanks for that, I have this already fixed in for v3.

>>   /**
>>    * kvm_arch_init_vm - initializes a VM data structure
>>    * @kvm:    pointer to the KVM struct
>> @@ -118,8 +137,9 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>>   {
>>       int ret, cpu;
>> -    if (type)
>> -        return -EINVAL;
>> +    ret = kvm_arch_config_vm(kvm, type);
>> +    if (ret)
>> +        return ret;
>>       kvm->arch.last_vcpu_ran = alloc_percpu(typeof(*kvm->arch.last_vcpu_ran));
>>       if (!kvm->arch.last_vcpu_ran)
>>
>
> Cheers,

Thanks for the review.

Cheers.

2018-04-25 16:38:54

by Julien Grall

[permalink] [raw]
Subject: Re: [PATCH v2 09/17] kvm: arm64: Make stage2 page table layout dynamic

Hi Suzuki,

On 27/03/18 14:15, Suzuki K Poulose wrote:
> So far we had a static stage2 page table handling code, based on a
> fixed IPA of 40bits. As we prepare for a configurable IPA size per
> VM, make the our stage2 page table code dynamic to do the right thing
> for a given VM.
>
> Support for the IPA size configuration needs other changes in the way
> we configure the EL2 registers (VTTBR and VTCR). So, the IPA is still
> fixed to 40bits. The patch also moves the kvm_page_empty() in asm/kvm_mmu.h
> to the top, before including the asm/stage2_pgtable.h to avoid a forward
> declaration.
>
> Cc: Marc Zyngier <[email protected]>
> Cc: Christoffer Dall <[email protected]>
> Signed-off-by: Suzuki K Poulose <[email protected]>
> ---
> arch/arm64/include/asm/kvm_mmu.h | 16 +-
> arch/arm64/include/asm/stage2_pgtable-nopmd.h | 42 ------
> arch/arm64/include/asm/stage2_pgtable-nopud.h | 39 -----
> arch/arm64/include/asm/stage2_pgtable.h | 203 +++++++++++++++++---------
> 4 files changed, 145 insertions(+), 155 deletions(-)
> delete mode 100644 arch/arm64/include/asm/stage2_pgtable-nopmd.h
> delete mode 100644 arch/arm64/include/asm/stage2_pgtable-nopud.h
>
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 594c4e6..bc133ce 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -18,9 +18,10 @@
> #ifndef __ARM64_KVM_MMU_H__
> #define __ARM64_KVM_MMU_H__
>
> +#include <asm/cpufeature.h>
> #include <asm/page.h>
> #include <asm/memory.h>
> -#include <asm/cpufeature.h>

OOI, why was the include of cpufeature.h moved earlier?

Cheers,

--
Julien Grall

2018-04-25 16:40:21

by Suzuki K Poulose

[permalink] [raw]
Subject: Re: [PATCH v2 09/17] kvm: arm64: Make stage2 page table layout dynamic

On 25/04/18 17:35, Julien Grall wrote:
> Hi Suzuki,
>
> On 27/03/18 14:15, Suzuki K Poulose wrote:
>> So far we had a static stage2 page table handling code, based on a
>> fixed IPA of 40bits. As we prepare for a configurable IPA size per
>> VM, make the our stage2 page table code dynamic to do the right thing
>> for a given VM.
>>
>> Support for the IPA size configuration needs other changes in the way
>> we configure the EL2 registers (VTTBR and VTCR). So, the IPA is still
>> fixed to 40bits. The patch also moves the kvm_page_empty() in asm/kvm_mmu.h
>> to the top, before including the asm/stage2_pgtable.h to avoid a forward
>> declaration.
>>
>> Cc: Marc Zyngier <[email protected]>
>> Cc: Christoffer Dall <[email protected]>
>> Signed-off-by: Suzuki K Poulose <[email protected]>
>> ---
>>   arch/arm64/include/asm/kvm_mmu.h              |  16 +-
>>   arch/arm64/include/asm/stage2_pgtable-nopmd.h |  42 ------
>>   arch/arm64/include/asm/stage2_pgtable-nopud.h |  39 -----
>>   arch/arm64/include/asm/stage2_pgtable.h       | 203 +++++++++++++++++---------
>>   4 files changed, 145 insertions(+), 155 deletions(-)
>>   delete mode 100644 arch/arm64/include/asm/stage2_pgtable-nopmd.h
>>   delete mode 100644 arch/arm64/include/asm/stage2_pgtable-nopud.h
>>
>> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
>> index 594c4e6..bc133ce 100644
>> --- a/arch/arm64/include/asm/kvm_mmu.h
>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>> @@ -18,9 +18,10 @@
>>   #ifndef __ARM64_KVM_MMU_H__
>>   #define __ARM64_KVM_MMU_H__
>> +#include <asm/cpufeature.h>
>>   #include <asm/page.h>
>>   #include <asm/memory.h>
>> -#include <asm/cpufeature.h>
>
> OOI, why was the include of cpufeature.h moved earlier?

I think it was a minor cleanup to keep the includes in order. I will either
mention it or drop it from the hunk altogether.

Cheers
Suzuki

2018-04-26 10:57:15

by Julien Grall

[permalink] [raw]
Subject: Re: [PATCH v2 03/17] arm64: Make page table helpers reusable

Hi Suzuki,

On 27/03/18 14:15, Suzuki K Poulose wrote:
> This patch rearranges the page table level helpers so that it can
> be reused for a page table with different number of levels
> (e.g, stage2 page table for a VM) than the kernel page tables.
> As such there is no functional change with this patch.
>
> The page table helpers are defined to do the right thing for the
> fixed page table levels set for the kernel. This patch tries to
> refactor the code such that, we can have helpers for each level,
> which should be used when the caller knows that the level exists
> for the page table dealt with. Since the kernel defines helpers
> p.d_action and __p.d_action, for consistency, we name the raw
> page table action helpers __raw_p.d_action.
>
> Cc: Catalin Marinas <[email protected]>
> Cc: Mark Rutland <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Steve Capper <[email protected]>
> Cc: Marc Zyngier <[email protected]>
> Cc: Christoffer Dall <[email protected]>
> Signed-off-by: Suzuki K Poulose <[email protected]>

Reviewed-by: Julien Grall <[email protected]>

Cheers,

> ---
> arch/arm64/include/asm/pgalloc.h | 34 +++++++++++++++++++----
> arch/arm64/include/asm/pgtable.h | 60 +++++++++++++++++++++++++---------------
> 2 files changed, 66 insertions(+), 28 deletions(-)
>
> diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
> index 2e05bcd..ab4662c 100644
> --- a/arch/arm64/include/asm/pgalloc.h
> +++ b/arch/arm64/include/asm/pgalloc.h
> @@ -29,6 +29,30 @@
> #define PGALLOC_GFP (GFP_KERNEL | __GFP_ZERO)
> #define PGD_SIZE (PTRS_PER_PGD * sizeof(pgd_t))
>
> +static inline void __raw_pmd_free(pmd_t *pmdp)
> +{
> + BUG_ON((unsigned long)pmdp & (PAGE_SIZE-1));
> + free_page((unsigned long)pmdp);
> +}
> +
> +static inline void
> +__raw_pud_populate(pud_t *pudp, phys_addr_t pmdp, pudval_t prot)
> +{
> + __raw_set_pud(pudp, __pud(__phys_to_pud_val(pmdp) | prot));
> +}
> +
> +static inline void __raw_pud_free(pud_t *pudp)
> +{
> + BUG_ON((unsigned long)pudp & (PAGE_SIZE-1));
> + free_page((unsigned long)pudp);
> +}
> +
> +static inline void
> +__raw_pgd_populate(pgd_t *pgdp, phys_addr_t pudp, pgdval_t prot)
> +{
> + __raw_set_pgd(pgdp, __pgd(__phys_to_pgd_val(pudp) | prot));
> +}
> +
> #if CONFIG_PGTABLE_LEVELS > 2
>
> static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
> @@ -38,13 +62,12 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
>
> static inline void pmd_free(struct mm_struct *mm, pmd_t *pmdp)
> {
> - BUG_ON((unsigned long)pmdp & (PAGE_SIZE-1));
> - free_page((unsigned long)pmdp);
> + __raw_pmd_free(pmdp);
> }
>
> static inline void __pud_populate(pud_t *pudp, phys_addr_t pmdp, pudval_t prot)
> {
> - set_pud(pudp, __pud(__phys_to_pud_val(pmdp) | prot));
> + __raw_pud_populate(pudp, pmdp, prot);
> }
>
> static inline void pud_populate(struct mm_struct *mm, pud_t *pudp, pmd_t *pmdp)
> @@ -67,13 +90,12 @@ static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
>
> static inline void pud_free(struct mm_struct *mm, pud_t *pudp)
> {
> - BUG_ON((unsigned long)pudp & (PAGE_SIZE-1));
> - free_page((unsigned long)pudp);
> + __raw_pud_free(pudp);
> }
>
> static inline void __pgd_populate(pgd_t *pgdp, phys_addr_t pudp, pgdval_t prot)
> {
> - set_pgd(pgdp, __pgd(__phys_to_pgd_val(pudp) | prot));
> + __raw_pgd_populate(pgdp, pudp, prot);
> }
>
> static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgdp, pud_t *pudp)
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 7e2c27e..5e22503 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -475,31 +475,39 @@ static inline phys_addr_t pmd_page_paddr(pmd_t pmd)
> */
> #define mk_pte(page,prot) pfn_pte(page_to_pfn(page),prot)
>
> -#if CONFIG_PGTABLE_LEVELS > 2
> +#define __raw_pud_none(pud) (!pud_val(pud))
> +#define __raw_pud_bad(pud) (!(pud_val(pud) & PUD_TABLE_BIT))
> +#define __raw_pud_present(pud) pte_present(pud_pte(pud))
>
> -#define pmd_ERROR(pmd) __pmd_error(__FILE__, __LINE__, pmd_val(pmd))
> -
> -#define pud_none(pud) (!pud_val(pud))
> -#define pud_bad(pud) (!(pud_val(pud) & PUD_TABLE_BIT))
> -#define pud_present(pud) pte_present(pud_pte(pud))
> -
> -static inline void set_pud(pud_t *pudp, pud_t pud)
> +static inline void __raw_set_pud(pud_t *pudp, pud_t pud)
> {
> WRITE_ONCE(*pudp, pud);
> dsb(ishst);
> isb();
> }
>
> -static inline void pud_clear(pud_t *pudp)
> +static inline void __raw_pud_clear(pud_t *pudp)
> {
> - set_pud(pudp, __pud(0));
> + __raw_set_pud(pudp, __pud(0));
> }
>
> -static inline phys_addr_t pud_page_paddr(pud_t pud)
> +static inline phys_addr_t __raw_pud_page_paddr(pud_t pud)
> {
> return __pud_to_phys(pud);
> }
>
> +#if CONFIG_PGTABLE_LEVELS > 2
> +
> +#define pmd_ERROR(pmd) __pmd_error(__FILE__, __LINE__, pmd_val(pmd))
> +
> +#define pud_none(pud) __raw_pud_none(pud)
> +#define pud_bad(pud) __raw_pud_bad(pud)
> +#define pud_present(pud) __raw_pud_present(pud)
> +
> +#define set_pud(pudp, pud) __raw_set_pud((pudp), (pud))
> +#define pud_clear(pudp) __raw_pud_clear((pudp))
> +#define pud_page_paddr(pud) __raw_pud_page_paddr((pud))
> +
> /* Find an entry in the second-level page table. */
> #define pmd_index(addr) (((addr) >> PMD_SHIFT) & (PTRS_PER_PMD - 1))
>
> @@ -528,30 +536,38 @@ static inline phys_addr_t pud_page_paddr(pud_t pud)
>
> #endif /* CONFIG_PGTABLE_LEVELS > 2 */
>
> -#if CONFIG_PGTABLE_LEVELS > 3
> +#define __raw_pgd_none(pgd) (!pgd_val(pgd))
> +#define __raw_pgd_bad(pgd) (!(pgd_val(pgd) & 2))
> +#define __raw_pgd_present(pgd) (pgd_val(pgd))
>
> -#define pud_ERROR(pud) __pud_error(__FILE__, __LINE__, pud_val(pud))
> -
> -#define pgd_none(pgd) (!pgd_val(pgd))
> -#define pgd_bad(pgd) (!(pgd_val(pgd) & 2))
> -#define pgd_present(pgd) (pgd_val(pgd))
> -
> -static inline void set_pgd(pgd_t *pgdp, pgd_t pgd)
> +static inline void __raw_set_pgd(pgd_t *pgdp, pgd_t pgd)
> {
> WRITE_ONCE(*pgdp, pgd);
> dsb(ishst);
> }
>
> -static inline void pgd_clear(pgd_t *pgdp)
> +static inline void __raw_pgd_clear(pgd_t *pgdp)
> {
> - set_pgd(pgdp, __pgd(0));
> + __raw_set_pgd(pgdp, __pgd(0));
> }
>
> -static inline phys_addr_t pgd_page_paddr(pgd_t pgd)
> +static inline phys_addr_t __raw_pgd_page_paddr(pgd_t pgd)
> {
> return __pgd_to_phys(pgd);
> }
>
> +#if CONFIG_PGTABLE_LEVELS > 3
> +
> +#define pud_ERROR(pud) __pud_error(__FILE__, __LINE__, pud_val(pud))
> +
> +#define pgd_none(pgd) __raw_pgd_none((pgd))
> +#define pgd_bad(pgd) __raw_pgd_bad((pgd))
> +#define pgd_present(pgd) __raw_pgd_present((pgd))
> +
> +#define set_pgd(pgdp, pgd) __raw_set_pgd((pgdp), (pgd))
> +#define pgd_clear(pgdp) __raw_pgd_clear((pgdp))
> +#define pgd_page_paddr(pgd) __raw_pgd_page_paddr((pgd))
> +
> /* Find an entry in the frst-level page table. */
> #define pud_index(addr) (((addr) >> PUD_SHIFT) & (PTRS_PER_PUD - 1))
>
>

--
Julien Grall

2018-04-26 10:57:21

by Julien Grall

[permalink] [raw]
Subject: Re: [PATCH v2 04/17] arm64: Refactor pud_huge for reusability

Hi Suzuki,

On 27/03/18 14:15, Suzuki K Poulose wrote:
> Make pud_huge reusable for stage2 tables, independent
> of the stage1 levels.
>
> Cc: Steve Capper <[email protected]>
> Cc: Mark Rutland <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Christoffer Dall <[email protected]>
> Signed-off-by: Suzuki K Poulose <[email protected]>

Reviewed-by: Julien Grall <[email protected]>

Cheers,

> ---
> arch/arm64/include/asm/pgtable.h | 5 +++++
> arch/arm64/mm/hugetlbpage.c | 2 +-
> 2 files changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 5e22503..4a16c11 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -479,6 +479,11 @@ static inline phys_addr_t pmd_page_paddr(pmd_t pmd)
> #define __raw_pud_bad(pud) (!(pud_val(pud) & PUD_TABLE_BIT))
> #define __raw_pud_present(pud) pte_present(pud_pte(pud))
>
> +static inline int __raw_pud_huge(pud_t pud)
> +{
> + return pud_val(pud) && !(pud_val(pud) & PUD_TABLE_BIT);
> +}
> +
> static inline void __raw_set_pud(pud_t *pudp, pud_t pud)
> {
> WRITE_ONCE(*pudp, pud);
> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> index ecc6818..3f4bb43 100644
> --- a/arch/arm64/mm/hugetlbpage.c
> +++ b/arch/arm64/mm/hugetlbpage.c
> @@ -35,7 +35,7 @@ int pmd_huge(pmd_t pmd)
> int pud_huge(pud_t pud)
> {
> #ifndef __PAGETABLE_PMD_FOLDED
> - return pud_val(pud) && !(pud_val(pud) & PUD_TABLE_BIT);
> + return __raw_pud_huge(pud);
> #else
> return 0;
> #endif
>

--
Julien Grall

2018-04-26 10:59:42

by Julien Grall

[permalink] [raw]
Subject: Re: [PATCH v2 05/17] arm64: Helper for parange to PASize

Hi Suzuki,

On 27/03/18 14:15, Suzuki K Poulose wrote:
> Add a helper to convert ID_AA64MMFR0_EL1:PARange to they physical
> size shift. Limit the size to the maximum supported by the kernel.
> We are about to move the user of this code and this helps to
> keep the changes cleaner.

It is probably worth to mention that you are also adding 52-bit support
in the patch.

Cheers,

>
> Cc: Mark Rutland <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Marc Zyngier <[email protected]>
> Cc: Christoffer Dall <[email protected]>
> Signed-off-by: Suzuki K Poulose <[email protected]>
> ---
> arch/arm64/include/asm/cpufeature.h | 16 ++++++++++++++++
> arch/arm64/kvm/hyp/s2-setup.c | 28 +++++-----------------------
> 2 files changed, 21 insertions(+), 23 deletions(-)
>
> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
> index fbf0aab..1f2a5dd 100644
> --- a/arch/arm64/include/asm/cpufeature.h
> +++ b/arch/arm64/include/asm/cpufeature.h
> @@ -311,6 +311,22 @@ static inline u64 read_zcr_features(void)
> return zcr;
> }
>
> +static inline u32 id_aa64mmfr0_parange_to_phys_shift(int parange)
> +{
> + switch (parange) {
> + case 0: return 32;
> + case 1: return 36;
> + case 2: return 40;
> + case 3: return 42;
> + case 4: return 44;
> + /* Report 48 bit if the kernel doesn't support 52bit */
> + default:
> + case 5: return 48;
> +#ifdef CONFIG_ARM64_PA_BITS_52
> + case 6: return 52;
> +#endif
> + }
> +}
> #endif /* __ASSEMBLY__ */
>
> #endif
> diff --git a/arch/arm64/kvm/hyp/s2-setup.c b/arch/arm64/kvm/hyp/s2-setup.c
> index 603e1ee..b1129c8 100644
> --- a/arch/arm64/kvm/hyp/s2-setup.c
> +++ b/arch/arm64/kvm/hyp/s2-setup.c
> @@ -19,11 +19,13 @@
> #include <asm/kvm_arm.h>
> #include <asm/kvm_asm.h>
> #include <asm/kvm_hyp.h>
> +#include <asm/cpufeature.h>
>
> u32 __hyp_text __init_stage2_translation(void)
> {
> u64 val = VTCR_EL2_FLAGS;
> u64 parange;
> + u32 phys_shift;
> u64 tmp;
>
> /*
> @@ -37,27 +39,7 @@ u32 __hyp_text __init_stage2_translation(void)
> val |= parange << 16;
>
> /* Compute the actual PARange... */
> - switch (parange) {
> - case 0:
> - parange = 32;
> - break;
> - case 1:
> - parange = 36;
> - break;
> - case 2:
> - parange = 40;
> - break;
> - case 3:
> - parange = 42;
> - break;
> - case 4:
> - parange = 44;
> - break;
> - case 5:
> - default:
> - parange = 48;
> - break;
> - }
> + phys_shift = id_aa64mmfr0_parange_to_phys_shift(parange);
>
> /*
> * ... and clamp it to 40 bits, unless we have some braindead
> @@ -65,7 +47,7 @@ u32 __hyp_text __init_stage2_translation(void)
> * return that value for the rest of the kernel to decide what
> * to do.
> */
> - val |= 64 - (parange > 40 ? 40 : parange);
> + val |= 64 - (phys_shift > 40 ? 40 : phys_shift);
>
> /*
> * Check the availability of Hardware Access Flag / Dirty Bit
> @@ -86,5 +68,5 @@ u32 __hyp_text __init_stage2_translation(void)
>
> write_sysreg(val, vtcr_el2);
>
> - return parange;
> + return phys_shift;
> }
>

--
Julien Grall

2018-04-26 13:37:00

by Julien Grall

[permalink] [raw]
Subject: Re: [PATCH v2 08/17] kvm: arm/arm64: Prepare for VM specific stage2 translations

Hi Suzuki,

On 27/03/18 14:15, Suzuki K Poulose wrote:
> Right now the stage2 page table for a VM is hard coded, assuming
> an IPA of 40bits. As we are about to add support for per VM IPA,
> prepare the stage2 page table helpers to accept the kvm instance
> to make the right decision for the VM. No functional changes.
>
> Cc: Marc Zyngier <[email protected]>
> Cc: Christoffer Dall <[email protected]>
> Signed-off-by: Suzuki K Poulose <[email protected]>
> ---
> arch/arm/include/asm/kvm_arm.h | 3 +-
> arch/arm/include/asm/kvm_mmu.h | 15 +++-
> arch/arm/include/asm/stage2_pgtable.h | 42 ++++-----
> arch/arm64/include/asm/kvm_mmu.h | 7 +-
> arch/arm64/include/asm/stage2_pgtable-nopmd.h | 18 ++--
> arch/arm64/include/asm/stage2_pgtable-nopud.h | 16 ++--
> arch/arm64/include/asm/stage2_pgtable.h | 49 ++++++-----
> virt/kvm/arm/arm.c | 2 +-
> virt/kvm/arm/mmu.c | 119 +++++++++++++-------------
> virt/kvm/arm/vgic/vgic-kvm-device.c | 2 +-
> 10 files changed, 148 insertions(+), 125 deletions(-)
>
> diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
> index 3ab8b37..c3f1f9b 100644
> --- a/arch/arm/include/asm/kvm_arm.h
> +++ b/arch/arm/include/asm/kvm_arm.h
> @@ -133,8 +133,7 @@
> * space.
> */
> #define KVM_PHYS_SHIFT (40)
> -#define KVM_PHYS_SIZE (_AC(1, ULL) << KVM_PHYS_SHIFT)
> -#define KVM_PHYS_MASK (KVM_PHYS_SIZE - _AC(1, ULL))

I assume you are moving them to kvm_mmu.h in order to match the arm64
side, right? If so, would not it make sense to make KVM_PHYS_SHIFT with it?

[...]

> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 7faed6e..594c4e6 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -134,8 +134,11 @@ static inline unsigned long __kern_hyp_va(unsigned long v)
> * We currently only support a 40bit IPA.
> */
> #define KVM_PHYS_SHIFT (40)
> -#define KVM_PHYS_SIZE (1UL << KVM_PHYS_SHIFT)
> -#define KVM_PHYS_MASK (KVM_PHYS_SIZE - 1UL)
> +
> +#define kvm_phys_shift(kvm) KVM_PHYS_SHIFT
> +#define kvm_phys_size(kvm) (_AC(1, ULL) << kvm_phys_shift(kvm))

NIT: is the _AC(...) necessary? It does not look like this function is
going to be usable in assembly.

> +#define kvm_phys_mask(kvm) (kvm_phys_size(kvm) - _AC(1, ULL))

Same here.

--
Julien Grall

2018-04-26 14:10:50

by Julien Grall

[permalink] [raw]
Subject: Re: [kvmtool PATCH 21/17] kvmtool: arm: Add support for creating VM with PA size

Hi Suzuki,

On 27/03/18 14:15, Suzuki K Poulose wrote:
> Specify the physical size for the VM encoded in the vm type.
>
> Signed-off-by: Suzuki K Poulose <[email protected]>
> ---
> arm/include/arm-common/kvm-arch.h | 6 +++++-
> arm/kvm.c | 21 +++++++++++++++++++++
> 2 files changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/arm/include/arm-common/kvm-arch.h b/arm/include/arm-common/kvm-arch.h
> index ca7ab0f..64bc92a 100644
> --- a/arm/include/arm-common/kvm-arch.h
> +++ b/arm/include/arm-common/kvm-arch.h
> @@ -42,7 +42,11 @@
>
> #define KVM_IRQ_OFFSET GIC_SPI_IRQ_BASE
>
> -#define KVM_VM_TYPE 0
> +extern unsigned long kvm_arm_type;
> +extern void kvm__arch_init_hyp(struct kvm *kvm);
> +
> +#define KVM_VM_TYPE kvm_arm_type
> +#define kvm__arch_init_hyp kvm__arch_init_hyp
>
> #define VIRTIO_DEFAULT_TRANS(kvm) \
> ((kvm)->cfg.arch.virtio_trans_pci ? VIRTIO_PCI : VIRTIO_MMIO)
> diff --git a/arm/kvm.c b/arm/kvm.c
> index 5701d41..a9a9140 100644
> --- a/arm/kvm.c
> +++ b/arm/kvm.c
> @@ -11,6 +11,8 @@
> #include <linux/kvm.h>
> #include <linux/sizes.h>
>
> +unsigned long kvm_arm_type;
> +
> struct kvm_ext kvm_req_ext[] = {
> { DEFINE_KVM_EXT(KVM_CAP_IRQCHIP) },
> { DEFINE_KVM_EXT(KVM_CAP_ONE_REG) },
> @@ -18,6 +20,25 @@ struct kvm_ext kvm_req_ext[] = {
> { 0, 0 },
> };
>
> +#ifndef KVM_ARM_GET_MAX_VM_PHYS_SHIFT
> +#define KVM_ARM_GET_MAX_VM_PHYS_SHIFT _IO(KVMIO, 0x0a)
> +#endif

Once the IOCTL is agreed, I think you would need to update
include/linux/kvm.h.

> +
> +void kvm__arch_init_hyp(struct kvm *kvm)
> +{
> + unsigned max_ipa;
> +
> + max_ipa = ioctl(kvm->sys_fd, KVM_ARM_GET_MAX_VM_PHYS_SHIFT);
> + if (max_ipa < 0)
> + max_ipa = 40;
> + if (!kvm->cfg.arch.phys_shift)
> + kvm->cfg.arch.phys_shift = 40;
> + if (kvm->cfg.arch.phys_shift > max_ipa)
> + die("Requested PA size (%u) is not supported by the host (%ubits)\n",
> + kvm->cfg.arch.phys_shift, max_ipa);
> + kvm_arm_type = kvm->cfg.arch.phys_shift;

FWIW, older Linux will not accept a KVM type other than 0. So I think
the best would be to bail out if KVM_ARM_GET_MAX_VM_PHYS_SHIFT does not
exist.

For safety, you could even check that arch.phys_shift is always 40 when
running under older Linux.

Cheers,

--
Julien Grall

2018-04-27 15:19:27

by Suzuki K Poulose

[permalink] [raw]
Subject: Re: [PATCH v2 05/17] arm64: Helper for parange to PASize

On 26/04/18 11:58, Julien Grall wrote:
> Hi Suzuki,
>
> On 27/03/18 14:15, Suzuki K Poulose wrote:
>> Add a helper to convert ID_AA64MMFR0_EL1:PARange to they physical
>> size shift. Limit the size to the maximum supported by the kernel.
>> We are about to move the user of this code and this helps to
>> keep the changes cleaner.
>
> It is probably worth to mention that you are also adding 52-bit support in the patch.

Sure, will do. Can I take that as a Reviewed-by with the fixed
commit description ?

Cheers
Suzuki

>
> Cheers,
>
>>
>> Cc: Mark Rutland <[email protected]>
>> Cc: Catalin Marinas <[email protected]>
>> Cc: Will Deacon <[email protected]>
>> Cc: Marc Zyngier <[email protected]>
>> Cc: Christoffer Dall <[email protected]>
>> Signed-off-by: Suzuki K Poulose <[email protected]>
>> ---
>>   arch/arm64/include/asm/cpufeature.h | 16 ++++++++++++++++
>>   arch/arm64/kvm/hyp/s2-setup.c       | 28 +++++-----------------------
>>   2 files changed, 21 insertions(+), 23 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
>> index fbf0aab..1f2a5dd 100644
>> --- a/arch/arm64/include/asm/cpufeature.h
>> +++ b/arch/arm64/include/asm/cpufeature.h
>> @@ -311,6 +311,22 @@ static inline u64 read_zcr_features(void)
>>       return zcr;
>>   }
>> +static inline u32 id_aa64mmfr0_parange_to_phys_shift(int parange)
>> +{
>> +    switch (parange) {
>> +    case 0: return 32;
>> +    case 1: return 36;
>> +    case 2: return 40;
>> +    case 3: return 42;
>> +    case 4: return 44;
>> +    /* Report 48 bit if the kernel doesn't support 52bit */
>> +    default:
>> +    case 5: return 48;
>> +#ifdef CONFIG_ARM64_PA_BITS_52
>> +    case 6: return 52;
>> +#endif
>> +    }
>> +}
>>   #endif /* __ASSEMBLY__ */
>>   #endif
>> diff --git a/arch/arm64/kvm/hyp/s2-setup.c b/arch/arm64/kvm/hyp/s2-setup.c
>> index 603e1ee..b1129c8 100644
>> --- a/arch/arm64/kvm/hyp/s2-setup.c
>> +++ b/arch/arm64/kvm/hyp/s2-setup.c
>> @@ -19,11 +19,13 @@
>>   #include <asm/kvm_arm.h>
>>   #include <asm/kvm_asm.h>
>>   #include <asm/kvm_hyp.h>
>> +#include <asm/cpufeature.h>
>>   u32 __hyp_text __init_stage2_translation(void)
>>   {
>>       u64 val = VTCR_EL2_FLAGS;
>>       u64 parange;
>> +    u32 phys_shift;
>>       u64 tmp;
>>       /*
>> @@ -37,27 +39,7 @@ u32 __hyp_text __init_stage2_translation(void)
>>       val |= parange << 16;
>>       /* Compute the actual PARange... */
>> -    switch (parange) {
>> -    case 0:
>> -        parange = 32;
>> -        break;
>> -    case 1:
>> -        parange = 36;
>> -        break;
>> -    case 2:
>> -        parange = 40;
>> -        break;
>> -    case 3:
>> -        parange = 42;
>> -        break;
>> -    case 4:
>> -        parange = 44;
>> -        break;
>> -    case 5:
>> -    default:
>> -        parange = 48;
>> -        break;
>> -    }
>> +    phys_shift = id_aa64mmfr0_parange_to_phys_shift(parange);
>>       /*
>>        * ... and clamp it to 40 bits, unless we have some braindead
>> @@ -65,7 +47,7 @@ u32 __hyp_text __init_stage2_translation(void)
>>        * return that value for the rest of the kernel to decide what
>>        * to do.
>>        */
>> -    val |= 64 - (parange > 40 ? 40 : parange);
>> +    val |= 64 - (phys_shift > 40 ? 40 : phys_shift);
>>       /*
>>        * Check the availability of Hardware Access Flag / Dirty Bit
>> @@ -86,5 +68,5 @@ u32 __hyp_text __init_stage2_translation(void)
>>       write_sysreg(val, vtcr_el2);
>> -    return parange;
>> +    return phys_shift;
>>   }
>>
>


2018-04-27 15:20:20

by Julien Grall

[permalink] [raw]
Subject: Re: [PATCH v2 05/17] arm64: Helper for parange to PASize



On 27/04/18 16:18, Suzuki K Poulose wrote:
> On 26/04/18 11:58, Julien Grall wrote:
>> Hi Suzuki,
>>
>> On 27/03/18 14:15, Suzuki K Poulose wrote:
>>> Add a helper to convert ID_AA64MMFR0_EL1:PARange to they physical
>>> size shift. Limit the size to the maximum supported by the kernel.
>>> We are about to move the user of this code and this helps to
>>> keep the changes cleaner.
>>
>> It is probably worth to mention that you are also adding 52-bit
>> support in the patch.
>
> Sure, will do. Can I take that as a Reviewed-by with the fixed
> commit description ?

Yes. Here we go:

Reviewed-by: Julien Grall <[email protected]>

Cheers,

>
> Cheers
> Suzuki
>
>>
>> Cheers,
>>
>>>
>>> Cc: Mark Rutland <[email protected]>
>>> Cc: Catalin Marinas <[email protected]>
>>> Cc: Will Deacon <[email protected]>
>>> Cc: Marc Zyngier <[email protected]>
>>> Cc: Christoffer Dall <[email protected]>
>>> Signed-off-by: Suzuki K Poulose <[email protected]>
>>> ---
>>>   arch/arm64/include/asm/cpufeature.h | 16 ++++++++++++++++
>>>   arch/arm64/kvm/hyp/s2-setup.c       | 28 +++++-----------------------
>>>   2 files changed, 21 insertions(+), 23 deletions(-)
>>>
>>> diff --git a/arch/arm64/include/asm/cpufeature.h
>>> b/arch/arm64/include/asm/cpufeature.h
>>> index fbf0aab..1f2a5dd 100644
>>> --- a/arch/arm64/include/asm/cpufeature.h
>>> +++ b/arch/arm64/include/asm/cpufeature.h
>>> @@ -311,6 +311,22 @@ static inline u64 read_zcr_features(void)
>>>       return zcr;
>>>   }
>>> +static inline u32 id_aa64mmfr0_parange_to_phys_shift(int parange)
>>> +{
>>> +    switch (parange) {
>>> +    case 0: return 32;
>>> +    case 1: return 36;
>>> +    case 2: return 40;
>>> +    case 3: return 42;
>>> +    case 4: return 44;
>>> +    /* Report 48 bit if the kernel doesn't support 52bit */
>>> +    default:
>>> +    case 5: return 48;
>>> +#ifdef CONFIG_ARM64_PA_BITS_52
>>> +    case 6: return 52;
>>> +#endif
>>> +    }
>>> +}
>>>   #endif /* __ASSEMBLY__ */
>>>   #endif
>>> diff --git a/arch/arm64/kvm/hyp/s2-setup.c
>>> b/arch/arm64/kvm/hyp/s2-setup.c
>>> index 603e1ee..b1129c8 100644
>>> --- a/arch/arm64/kvm/hyp/s2-setup.c
>>> +++ b/arch/arm64/kvm/hyp/s2-setup.c
>>> @@ -19,11 +19,13 @@
>>>   #include <asm/kvm_arm.h>
>>>   #include <asm/kvm_asm.h>
>>>   #include <asm/kvm_hyp.h>
>>> +#include <asm/cpufeature.h>
>>>   u32 __hyp_text __init_stage2_translation(void)
>>>   {
>>>       u64 val = VTCR_EL2_FLAGS;
>>>       u64 parange;
>>> +    u32 phys_shift;
>>>       u64 tmp;
>>>       /*
>>> @@ -37,27 +39,7 @@ u32 __hyp_text __init_stage2_translation(void)
>>>       val |= parange << 16;
>>>       /* Compute the actual PARange... */
>>> -    switch (parange) {
>>> -    case 0:
>>> -        parange = 32;
>>> -        break;
>>> -    case 1:
>>> -        parange = 36;
>>> -        break;
>>> -    case 2:
>>> -        parange = 40;
>>> -        break;
>>> -    case 3:
>>> -        parange = 42;
>>> -        break;
>>> -    case 4:
>>> -        parange = 44;
>>> -        break;
>>> -    case 5:
>>> -    default:
>>> -        parange = 48;
>>> -        break;
>>> -    }
>>> +    phys_shift = id_aa64mmfr0_parange_to_phys_shift(parange);
>>>       /*
>>>        * ... and clamp it to 40 bits, unless we have some braindead
>>> @@ -65,7 +47,7 @@ u32 __hyp_text __init_stage2_translation(void)
>>>        * return that value for the rest of the kernel to decide what
>>>        * to do.
>>>        */
>>> -    val |= 64 - (parange > 40 ? 40 : parange);
>>> +    val |= 64 - (phys_shift > 40 ? 40 : phys_shift);
>>>       /*
>>>        * Check the availability of Hardware Access Flag / Dirty Bit
>>> @@ -86,5 +68,5 @@ u32 __hyp_text __init_stage2_translation(void)
>>>       write_sysreg(val, vtcr_el2);
>>> -    return parange;
>>> +    return phys_shift;
>>>   }
>>>
>>
>

--
Julien Grall

2018-04-27 15:23:34

by Suzuki K Poulose

[permalink] [raw]
Subject: Re: [PATCH v2 08/17] kvm: arm/arm64: Prepare for VM specific stage2 translations

On 26/04/18 14:35, Julien Grall wrote:
> Hi Suzuki,
>
> On 27/03/18 14:15, Suzuki K Poulose wrote:
>> Right now the stage2 page table for a VM is hard coded, assuming
>> an IPA of 40bits. As we are about to add support for per VM IPA,
>> prepare the stage2 page table helpers to accept the kvm instance
>> to make the right decision for the VM. No functional changes.


>> diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
>> index 3ab8b37..c3f1f9b 100644
>> --- a/arch/arm/include/asm/kvm_arm.h
>> +++ b/arch/arm/include/asm/kvm_arm.h
>> @@ -133,8 +133,7 @@
>>    * space.
>>    */
>>   #define KVM_PHYS_SHIFT    (40)
>> -#define KVM_PHYS_SIZE    (_AC(1, ULL) << KVM_PHYS_SHIFT)
>> -#define KVM_PHYS_MASK    (KVM_PHYS_SIZE - _AC(1, ULL))
>
> I assume you are moving them to kvm_mmu.h in order to match the arm64 side, right? If so, would not it make sense to make KVM_PHYS_SHIFT with it?
>
> [...]

I am moving all the macros that depend on the "kvm" instance to kvm_mmu.h.
I will see if I can move the KVM_PHYS_SHIFT without much trouble.

>
>> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
>> index 7faed6e..594c4e6 100644
>> --- a/arch/arm64/include/asm/kvm_mmu.h
>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>> @@ -134,8 +134,11 @@ static inline unsigned long __kern_hyp_va(unsigned long v)
>>    * We currently only support a 40bit IPA.
>>    */
>>   #define KVM_PHYS_SHIFT    (40)
>> -#define KVM_PHYS_SIZE    (1UL << KVM_PHYS_SHIFT)
>> -#define KVM_PHYS_MASK    (KVM_PHYS_SIZE - 1UL)
>> +
>> +#define kvm_phys_shift(kvm)        KVM_PHYS_SHIFT
>> +#define kvm_phys_size(kvm)        (_AC(1, ULL) << kvm_phys_shift(kvm))
>
> NIT: is the _AC(...) necessary? It does not look like this function is going to be usable in assembly.
>
>> +#define kvm_phys_mask(kvm)        (kvm_phys_size(kvm) - _AC(1, ULL))
>
> Same here.
>

Ok, will drop them.

Cheers
Suzuki



2018-04-27 16:03:08

by Suzuki K Poulose

[permalink] [raw]
Subject: Re: [PATCH v2 08/17] kvm: arm/arm64: Prepare for VM specific stage2 translations

On 27/04/18 16:22, Suzuki K Poulose wrote:
> On 26/04/18 14:35, Julien Grall wrote:
>> Hi Suzuki,
>>
>> On 27/03/18 14:15, Suzuki K Poulose wrote:
>>> Right now the stage2 page table for a VM is hard coded, assuming
>>> an IPA of 40bits. As we are about to add support for per VM IPA,
>>> prepare the stage2 page table helpers to accept the kvm instance
>>> to make the right decision for the VM. No functional changes.
>
>
>>> diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
>>> index 3ab8b37..c3f1f9b 100644
>>> --- a/arch/arm/include/asm/kvm_arm.h
>>> +++ b/arch/arm/include/asm/kvm_arm.h
>>> @@ -133,8 +133,7 @@
>>>    * space.
>>>    */
>>>   #define KVM_PHYS_SHIFT    (40)
>>> -#define KVM_PHYS_SIZE    (_AC(1, ULL) << KVM_PHYS_SHIFT)
>>> -#define KVM_PHYS_MASK    (KVM_PHYS_SIZE - _AC(1, ULL))
>>
>> I assume you are moving them to kvm_mmu.h in order to match the arm64 side, right? If so, would not it make sense to make KVM_PHYS_SHIFT with it?
>>
>> [...]
>
> I am moving all the macros that depend on the "kvm" instance to kvm_mmu.h.
> I will see if I can move the KVM_PHYS_SHIFT without much trouble.

It looks like we can't do that easily. KVM_PHYS_SHIFT is used for KVM_T0SZ
on arm, even though that can be simply hard coded to avoid the dependency on
KVM_PHYS_SHIFT (like we did for arm64, T0SZ is defined to 24). I would leave it
as it is to avoid the noise.

Cheers
Suzuki

2018-04-27 16:06:09

by Julien Grall

[permalink] [raw]
Subject: Re: [PATCH v2 08/17] kvm: arm/arm64: Prepare for VM specific stage2 translations

Hi Suzuki,

On 27/04/18 16:58, Suzuki K Poulose wrote:
> On 27/04/18 16:22, Suzuki K Poulose wrote:
>> On 26/04/18 14:35, Julien Grall wrote:
>>> Hi Suzuki,
>>>
>>> On 27/03/18 14:15, Suzuki K Poulose wrote:
>>>> Right now the stage2 page table for a VM is hard coded, assuming
>>>> an IPA of 40bits. As we are about to add support for per VM IPA,
>>>> prepare the stage2 page table helpers to accept the kvm instance
>>>> to make the right decision for the VM. No functional changes.
>>
>>
>>>> diff --git a/arch/arm/include/asm/kvm_arm.h
>>>> b/arch/arm/include/asm/kvm_arm.h
>>>> index 3ab8b37..c3f1f9b 100644
>>>> --- a/arch/arm/include/asm/kvm_arm.h
>>>> +++ b/arch/arm/include/asm/kvm_arm.h
>>>> @@ -133,8 +133,7 @@
>>>>    * space.
>>>>    */
>>>>   #define KVM_PHYS_SHIFT    (40)
>>>> -#define KVM_PHYS_SIZE    (_AC(1, ULL) << KVM_PHYS_SHIFT)
>>>> -#define KVM_PHYS_MASK    (KVM_PHYS_SIZE - _AC(1, ULL))
>>>
>>> I assume you are moving them to kvm_mmu.h in order to match the arm64
>>> side, right? If so, would not it make sense to make KVM_PHYS_SHIFT
>>> with it?
>>>
>>> [...]
>>
>> I am moving all the macros that depend on the "kvm" instance to
>> kvm_mmu.h.
>> I will see if I can move the KVM_PHYS_SHIFT without much trouble.
>
> It looks like we can't do that easily. KVM_PHYS_SHIFT is used for KVM_T0SZ
> on arm, even though that can be simply hard coded to avoid the
> dependency on
> KVM_PHYS_SHIFT (like we did for arm64, T0SZ is defined to 24). I would
> leave it
> as it is to avoid the noise.

Fine. That was only a suggestion :).

Cheers,

--
Julien Grall

2018-04-30 11:14:39

by Julien Grall

[permalink] [raw]
Subject: Re: [PATCH v2 10/17] kvm: arm64: Dynamic configuration of VTCR and VTTBR mask

Hi Suzuki,

The algos in this patch looks good to me. A couple of NIT below.

On 27/03/18 14:15, Suzuki K Poulose wrote:
> diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
> index b0c8417..176551c 100644
> --- a/arch/arm64/include/asm/kvm_arm.h
> +++ b/arch/arm64/include/asm/kvm_arm.h
> @@ -124,6 +124,8 @@
> #define VTCR_EL2_VS_8BIT (0 << VTCR_EL2_VS_SHIFT)
> #define VTCR_EL2_VS_16BIT (1 << VTCR_EL2_VS_SHIFT)
>
> +#define VTCR_EL2_T0SZ(x) TCR_T0SZ((x))

NIT: The inner parentheses should not be necessary.

[...]

> +/*
> + * VTCR_EL2:SL0 indicates the entry level for Stage2 translation.
> + * Interestingly, it depends on the page size.
> + * See D.10.2.110, VTCR_EL2, in ARM DDI 0487B.b
> + *
> + * -----------------------------------------
> + * | Entry level | 4K | 16K/64K |
> + * ------------------------------------------
> + * | Level: 0 | 2 | - |
> + * ------------------------------------------
> + * | Level: 1 | 1 | 2 |
> + * ------------------------------------------
> + * | Level: 2 | 0 | 1 |
> + * ------------------------------------------
> + * | Level: 3 | - | 0 |
> + * ------------------------------------------
> + *
> + * That table roughly translates to :
> + *
> + * SL0(PAGE_SIZE, Entry_level) = SL0_BASE(PAGE_SIZE) - Entry_Level
> + *
> + * Where SL0_BASE(4K) = 2 and SL0_BASE(16K) = 3, SL0_BASE(64K) = 3, provided
> + * we take care of ruling out the unsupported cases and
> + * Entry_Level = 4 - Number_of_levels.
> + *
> + */
> +#define VTCR_EL2_SL0(levels) \
> + ((VTCR_EL2_TGRAN_SL0_BASE - (4 - (levels))) << VTCR_EL2_SL0_SHIFT)
> +/*
> + * ARM VMSAv8-64 defines an algorithm for finding the translation table
> + * descriptors in section D4.2.8 in ARM DDI 0487B.b.
> + *
> + * The algorithm defines the expectaions on the BaseAddress (for the page

NIT: s/expectaions/expectations/

Cheers,


--
Julien Grall

2018-04-30 14:17:39

by Julien Grall

[permalink] [raw]
Subject: Re: [kvmtool PATCH 21/17] kvmtool: arm: Add support for creating VM with PA size

Hi,

On 27/03/18 14:15, Suzuki K Poulose wrote:
> diff --git a/arm/kvm.c b/arm/kvm.c
> index 5701d41..a9a9140 100644
> --- a/arm/kvm.c
> +++ b/arm/kvm.c
> @@ -11,6 +11,8 @@
> #include <linux/kvm.h>
> #include <linux/sizes.h>
>
> +unsigned long kvm_arm_type;
> +
> struct kvm_ext kvm_req_ext[] = {
> { DEFINE_KVM_EXT(KVM_CAP_IRQCHIP) },
> { DEFINE_KVM_EXT(KVM_CAP_ONE_REG) },
> @@ -18,6 +20,25 @@ struct kvm_ext kvm_req_ext[] = {
> { 0, 0 },
> };
>
> +#ifndef KVM_ARM_GET_MAX_VM_PHYS_SHIFT
> +#define KVM_ARM_GET_MAX_VM_PHYS_SHIFT _IO(KVMIO, 0x0a)
> +#endif
> +
> +void kvm__arch_init_hyp(struct kvm *kvm)
> +{
> + unsigned max_ipa;
> +
> + max_ipa = ioctl(kvm->sys_fd, KVM_ARM_GET_MAX_VM_PHYS_SHIFT);
> + if (max_ipa < 0)

Another issues spotted while doing some testing. This will always be
false because max_ipa is unsigned.

I think we want to turn max_ipa to signed.

Cheers,

--
Julien Grall

2018-04-30 14:19:29

by Suzuki K Poulose

[permalink] [raw]
Subject: Re: [kvmtool PATCH 21/17] kvmtool: arm: Add support for creating VM with PA size

On 30/04/18 15:17, Julien Grall wrote:
> Hi,
>
> On 27/03/18 14:15, Suzuki K Poulose wrote:
>> diff --git a/arm/kvm.c b/arm/kvm.c
>> index 5701d41..a9a9140 100644
>> --- a/arm/kvm.c
>> +++ b/arm/kvm.c
>> @@ -11,6 +11,8 @@
>>   #include <linux/kvm.h>
>>   #include <linux/sizes.h>
>> +unsigned long kvm_arm_type;
>> +
>>   struct kvm_ext kvm_req_ext[] = {
>>       { DEFINE_KVM_EXT(KVM_CAP_IRQCHIP) },
>>       { DEFINE_KVM_EXT(KVM_CAP_ONE_REG) },
>> @@ -18,6 +20,25 @@ struct kvm_ext kvm_req_ext[] = {
>>       { 0, 0 },
>>   };
>> +#ifndef KVM_ARM_GET_MAX_VM_PHYS_SHIFT
>> +#define KVM_ARM_GET_MAX_VM_PHYS_SHIFT        _IO(KVMIO, 0x0a)
>> +#endif
>> +
>> +void kvm__arch_init_hyp(struct kvm *kvm)
>> +{
>> +    unsigned max_ipa;
>> +
>> +    max_ipa = ioctl(kvm->sys_fd, KVM_ARM_GET_MAX_VM_PHYS_SHIFT);
>> +    if (max_ipa < 0)
>
> Another issues spotted while doing some testing. This will always be false because max_ipa is unsigned.
>
> I think we want to turn max_ipa to signed.

Good catch. I will fix it.

Cheers
Suzuki


2018-05-03 14:43:02

by James Morse

[permalink] [raw]
Subject: Re: [PATCH v2 11/17] kvm: arm64: Configure VTCR per VM

Hi Suzuki,

On 27/03/18 14:15, Suzuki K Poulose wrote:
> We set VTCR_EL2 very early during the stage2 init and don't
> touch it ever. This is fine as we had a fixed IPA size. This
> patch changes the behavior to set the VTCR for a given VM,
> depending on its stage2 table. The common configuration for
> VTCR is still performed during the early init as we have to
> retain the hardware access flag update bits (VTCR_EL2_HA)
> per CPU (as they are only set for the CPUs which are capabile).
> The bits defining the number of levels in the page table (SL0)
> and and the size of the Input address to the translation (T0SZ)
> are programmed for each VM upon entry to the guest.


> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 596f8e4..9f3c8b8 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -392,10 +392,12 @@ int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
>
> static inline void __cpu_init_stage2(void)
> {
> - u32 parange = kvm_call_hyp(__init_stage2_translation);
> + u32 ps;
>
> - WARN_ONCE(parange < 40,
> - "PARange is %d bits, unsupported configuration!", parange);
> + kvm_call_hyp(__init_stage2_translation);
> + ps = id_aa64mmfr0_parange_to_phys_shift(read_sysreg(id_aa64mmfr0_el1));

Doesn't id_aa64mmfr0_parange_to_phys_shift() expect you do to the mask and shift
for it? This will always hit the default case.


> + WARN_ONCE(ps < 40,
> + "PARange is %d bits, unsupported configuration!", ps);
> }



Thanks,

James

2018-05-03 14:43:31

by James Morse

[permalink] [raw]
Subject: Re: [PATCH v2 05/17] arm64: Helper for parange to PASize

Hi Suzuki,

Nit: KVM in the subject line?

On 27/03/18 14:15, Suzuki K Poulose wrote:
> Add a helper to convert ID_AA64MMFR0_EL1:PARange to they physical
> size shift. Limit the size to the maximum supported by the kernel.
> We are about to move the user of this code and this helps to
> keep the changes cleaner.

> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
> index fbf0aab..1f2a5dd 100644
> --- a/arch/arm64/include/asm/cpufeature.h
> +++ b/arch/arm64/include/asm/cpufeature.h
> @@ -311,6 +311,22 @@ static inline u64 read_zcr_features(void)
> return zcr;
> }
>
> +static inline u32 id_aa64mmfr0_parange_to_phys_shift(int parange)
> +{
> + switch (parange) {
> + case 0: return 32;
> + case 1: return 36;
> + case 2: return 40;
> + case 3: return 42;
> + case 4: return 44;
> + /* Report 48 bit if the kernel doesn't support 52bit */
> + default:
> + case 5: return 48;
> +#ifdef CONFIG_ARM64_PA_BITS_52
> + case 6: return 52;
> +#endif

Eeew. I thought 'default' had to appear at the end of the list, but evidently
not! If the last three bit value ever gets used this is going to look really weird.

Can't we have a helper that just does the mapping, then apply the clamping with
something like:
| parange = min(CONFIG_ARM64_PA_BITS, parange);


Its odd that the helper has the id-register in the name, but expects you do the
shift and mask for it...

(and for this patch, KVM has already done the 52bit clamping with:
| if (parange > ID_AA64MMFR0_PARANGE_MAX)
| parange = ID_AA64MMFR0_PARANGE_MAX;
)


> diff --git a/arch/arm64/kvm/hyp/s2-setup.c b/arch/arm64/kvm/hyp/s2-setup.c
> index 603e1ee..b1129c8 100644
> --- a/arch/arm64/kvm/hyp/s2-setup.c
> +++ b/arch/arm64/kvm/hyp/s2-setup.c
> @@ -19,11 +19,13 @@
> #include <asm/kvm_arm.h>
> #include <asm/kvm_asm.h>
> #include <asm/kvm_hyp.h>
> +#include <asm/cpufeature.h>
>
> u32 __hyp_text __init_stage2_translation(void)
> {

Nit: Why change the variable you put this in, if its all removed again in patch 11?


Thanks,

James


2018-05-08 11:17:21

by Suzuki K Poulose

[permalink] [raw]
Subject: Re: [PATCH v2 11/17] kvm: arm64: Configure VTCR per VM

On 03/05/18 15:39, James Morse wrote:
> Hi Suzuki,
>
> On 27/03/18 14:15, Suzuki K Poulose wrote:
>> We set VTCR_EL2 very early during the stage2 init and don't
>> touch it ever. This is fine as we had a fixed IPA size. This
>> patch changes the behavior to set the VTCR for a given VM,
>> depending on its stage2 table. The common configuration for
>> VTCR is still performed during the early init as we have to
>> retain the hardware access flag update bits (VTCR_EL2_HA)
>> per CPU (as they are only set for the CPUs which are capabile).
>> The bits defining the number of levels in the page table (SL0)
>> and and the size of the Input address to the translation (T0SZ)
>> are programmed for each VM upon entry to the guest.
>
>
>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index 596f8e4..9f3c8b8 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -392,10 +392,12 @@ int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
>>
>> static inline void __cpu_init_stage2(void)
>> {
>> - u32 parange = kvm_call_hyp(__init_stage2_translation);
>> + u32 ps;
>>
>> - WARN_ONCE(parange < 40,
>> - "PARange is %d bits, unsupported configuration!", parange);
>> + kvm_call_hyp(__init_stage2_translation);
>> + ps = id_aa64mmfr0_parange_to_phys_shift(read_sysreg(id_aa64mmfr0_el1));
>
> Doesn't id_aa64mmfr0_parange_to_phys_shift() expect you do to the mask and shift
> for it? This will always hit the default case.

Good catch ! The error case was not hit on the system I tested, as it was
indeed having 48bit PA. I should have done more testing with Juno where it
is 40bit PA (which doesn't really allow different phys-shift ranges).

I will change the helper to extract the parange and the convert it.
Also, rename it to :

id_aa64mmfr0_phys_shift()

Cheers
Suzuki

>
>
>> + WARN_ONCE(ps < 40,
>> + "PARange is %d bits, unsupported configuration!", ps);
>> }
>
>
>
> Thanks,
>
> James
>


2018-05-08 13:47:41

by Suzuki K Poulose

[permalink] [raw]
Subject: Re: [PATCH v2 05/17] arm64: Helper for parange to PASize

On 03/05/18 15:39, James Morse wrote:
> Hi Suzuki,
>
> Nit: KVM in the subject line?

Well, the helper is generic and its just that KVM makes use of it.

>
> On 27/03/18 14:15, Suzuki K Poulose wrote:
>> Add a helper to convert ID_AA64MMFR0_EL1:PARange to they physical
>> size shift. Limit the size to the maximum supported by the kernel.
>> We are about to move the user of this code and this helps to
>> keep the changes cleaner.
>
>> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
>> index fbf0aab..1f2a5dd 100644
>> --- a/arch/arm64/include/asm/cpufeature.h
>> +++ b/arch/arm64/include/asm/cpufeature.h
>> @@ -311,6 +311,22 @@ static inline u64 read_zcr_features(void)
>> return zcr;
>> }
>>
>> +static inline u32 id_aa64mmfr0_parange_to_phys_shift(int parange)
>> +{
>> + switch (parange) {
>> + case 0: return 32;
>> + case 1: return 36;
>> + case 2: return 40;
>> + case 3: return 42;
>> + case 4: return 44;
>> + /* Report 48 bit if the kernel doesn't support 52bit */
>> + default:
>> + case 5: return 48;
>> +#ifdef CONFIG_ARM64_PA_BITS_52
>> + case 6: return 52;
>> +#endif
>
> Eeew. I thought 'default' had to appear at the end of the list, but evidently
> not! If the last three bit value ever gets used this is going to look really weird.

I could rearrange them a bit to :

case 4: ..
#ifdef CONFIG_ARM64_PA_BITS_52
case 6: ...
#endif
case 5:
/* Report 48 bit if the kernel doesn't support 52bit */
default: return 48;

>
> Can't we have a helper that just does the mapping, then apply the clamping with
> something like:
> | parange = min(CONFIG_ARM64_PA_BITS, parange);

Yes, I think that might be a bit cleaner.

>
>
> Its odd that the helper has the id-register in the name, but expects you do the
> shift and mask for it...
>
> (and for this patch, KVM has already done the 52bit clamping with:
> | if (parange > ID_AA64MMFR0_PARANGE_MAX)
> | parange = ID_AA64MMFR0_PARANGE_MAX;
> )
>

As mentioned in the other thread, I will change the name of the helper.

>
>> diff --git a/arch/arm64/kvm/hyp/s2-setup.c b/arch/arm64/kvm/hyp/s2-setup.c
>> index 603e1ee..b1129c8 100644
>> --- a/arch/arm64/kvm/hyp/s2-setup.c
>> +++ b/arch/arm64/kvm/hyp/s2-setup.c
>> @@ -19,11 +19,13 @@
>> #include <asm/kvm_arm.h>
>> #include <asm/kvm_asm.h>
>> #include <asm/kvm_hyp.h>
>> +#include <asm/cpufeature.h>
>>
>> u32 __hyp_text __init_stage2_translation(void)
>> {
>
> Nit: Why change the variable you put this in, if its all removed again in patch 11?

The parange holds the PARange initially, used to set the VTCR.IPS and is then
overloaded with the converted "phys-shift". The change is a minor cleanup to
make that clear, even though we remove it later as we don't deal with the
phys-shifts anymore. I would prefer to keep it as it is.

Cheers
Suzuki

2018-07-13 00:37:09

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v2 02/17] virtio: pci-legacy: Validate queue pfn

On Tue, Mar 27, 2018 at 05:11:04PM +0300, Michael S. Tsirkin wrote:
> On Tue, Mar 27, 2018 at 02:15:12PM +0100, Suzuki K Poulose wrote:
> > Legacy PCI over virtio uses a 32bit PFN for the queue. If the
> > queue pfn is too large to fit in 32bits, which we could hit on
> > arm64 systems with 52bit physical addresses (even with 64K page
> > size), we simply miss out a proper link to the other side of
> > the queue.
> >
> > Add a check to validate the PFN, rather than silently breaking
> > the devices.
> >
> > Cc: "Michael S. Tsirkin" <[email protected]>
> > Cc: Jason Wang <[email protected]>
> > Cc: Marc Zyngier <[email protected]>
> > Cc: Christoffer Dall <[email protected]>
> > Cc: Peter Maydel <[email protected]>
> > Cc: Jean-Philippe Brucker <[email protected]>
> > Signed-off-by: Suzuki K Poulose <[email protected]>
> > ---
> > drivers/virtio/virtio_pci_legacy.c | 12 ++++++++++--
> > 1 file changed, 10 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
> > index 2780886..4b84a75 100644
> > --- a/drivers/virtio/virtio_pci_legacy.c
> > +++ b/drivers/virtio/virtio_pci_legacy.c
> > @@ -122,6 +122,7 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
> > struct virtqueue *vq;
> > u16 num;
> > int err;
> > + u64 q_pfn;
> >
> > /* Select the queue we're interested in */
> > iowrite16(index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL);
> > @@ -141,9 +142,15 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
> > if (!vq)
> > return ERR_PTR(-ENOMEM);
> >
> > + q_pfn = virtqueue_get_desc_addr(vq) >> VIRTIO_PCI_QUEUE_ADDR_SHIFT;
> > + if (q_pfn >> 32) {
> > + dev_err(&vp_dev->pci_dev->dev, "virtio-pci queue PFN too large\n");
> > + err = -ENOMEM;
>
> ENOMEM seems wrong here. E2BIG?
>
> > + goto out_del_vq;
> > + }
> > +
> > /* activate the queue */
> > - iowrite32(virtqueue_get_desc_addr(vq) >> VIRTIO_PCI_QUEUE_ADDR_SHIFT,
> > - vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
> > + iowrite32((u32)q_pfn, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
>
> Is the cast really necessary here?
>
> >
> > vq->priv = (void __force *)vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NOTIFY;
> >
> > @@ -160,6 +167,7 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
> >
> > out_deactivate:
> > iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
> > +out_del_vq:
> > vring_del_virtqueue(vq);
> > return ERR_PTR(err);
> > }
> > --
> > 2.7.4

Ping are you going to address and repost, or should I drop this?

--
MST

2018-07-13 08:55:52

by Suzuki K Poulose

[permalink] [raw]
Subject: Re: [PATCH v2 02/17] virtio: pci-legacy: Validate queue pfn

On 13/07/18 01:36, Michael S. Tsirkin wrote:
> On Tue, Mar 27, 2018 at 05:11:04PM +0300, Michael S. Tsirkin wrote:
>> On Tue, Mar 27, 2018 at 02:15:12PM +0100, Suzuki K Poulose wrote:
>>> Legacy PCI over virtio uses a 32bit PFN for the queue. If the
>>> queue pfn is too large to fit in 32bits, which we could hit on
>>> arm64 systems with 52bit physical addresses (even with 64K page
>>> size), we simply miss out a proper link to the other side of
>>> the queue.
>>>
>>> Add a check to validate the PFN, rather than silently breaking
>>> the devices.
>>>
>>> Cc: "Michael S. Tsirkin" <[email protected]>
>>> Cc: Jason Wang <[email protected]>
>>> Cc: Marc Zyngier <[email protected]>
>>> Cc: Christoffer Dall <[email protected]>
>>> Cc: Peter Maydel <[email protected]>
>>> Cc: Jean-Philippe Brucker <[email protected]>
>>> Signed-off-by: Suzuki K Poulose <[email protected]>
>>> ---
>>> drivers/virtio/virtio_pci_legacy.c | 12 ++++++++++--
>>> 1 file changed, 10 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
>>> index 2780886..4b84a75 100644
>>> --- a/drivers/virtio/virtio_pci_legacy.c
>>> +++ b/drivers/virtio/virtio_pci_legacy.c
>>> @@ -122,6 +122,7 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
>>> struct virtqueue *vq;
>>> u16 num;
>>> int err;
>>> + u64 q_pfn;
>>>
>>> /* Select the queue we're interested in */
>>> iowrite16(index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL);
>>> @@ -141,9 +142,15 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
>>> if (!vq)
>>> return ERR_PTR(-ENOMEM);
>>>
>>> + q_pfn = virtqueue_get_desc_addr(vq) >> VIRTIO_PCI_QUEUE_ADDR_SHIFT;
>>> + if (q_pfn >> 32) {
>>> + dev_err(&vp_dev->pci_dev->dev, "virtio-pci queue PFN too large\n");
>>> + err = -ENOMEM;
>>
>> ENOMEM seems wrong here. E2BIG?
>>
>>> + goto out_del_vq;
>>> + }
>>> +
>>> /* activate the queue */
>>> - iowrite32(virtqueue_get_desc_addr(vq) >> VIRTIO_PCI_QUEUE_ADDR_SHIFT,
>>> - vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
>>> + iowrite32((u32)q_pfn, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
>>
>> Is the cast really necessary here?
>>
>>>
>>> vq->priv = (void __force *)vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NOTIFY;
>>>
>>> @@ -160,6 +167,7 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
>>>
>>> out_deactivate:
>>> iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
>>> +out_del_vq:
>>> vring_del_virtqueue(vq);
>>> return ERR_PTR(err);
>>> }
>>> --
>>> 2.7.4

Michael,

>
> Ping are you going to address and repost, or should I drop this?

This was addressed and reposted as v3, which needs a minor update to the
error message as mentioned here [0]. I will post the fixed version today.

[0] http://lists.infradead.org/pipermail/linux-arm-kernel/2018-July/588398.html

Thanks
Suzuki