2018-09-26 15:12:35

by Lukas Braun

[permalink] [raw]
Subject: [PATCH v2] KVM: arm/arm64: Check memslot bounds before mapping hugepages

Userspace can create a memslot with memory backed by (transparent)
hugepages, but with bounds that do not align with hugepages.
In that case, we cannot map the entire region in the guest as hugepages
without exposing additional host memory to the guest and potentially
interfering with other memslots.
Consequently, this patch adds a bounds check when populating guest page
tables and forces the creation of regular PTEs if mapping an entire
hugepage would violate the memslots bounds.

Signed-off-by: Lukas Braun <[email protected]>
---

Hi everyone,

for v2, in addition to writing the condition the way Marc suggested, I
moved the whole check so it also catches the problem when the hugepage
was allocated explicitly, not only for THPs. The second line is quite
long, but splitting it up would make things rather ugly IMO, so I left
it as it is.

Regards,
Lukas


virt/kvm/arm/mmu.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index ed162a6c57c5..ba77339e23ec 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1500,7 +1500,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
return -EFAULT;
}

- if (vma_kernel_pagesize(vma) == PMD_SIZE && !logging_active) {
+ if ((fault_ipa & S2_PMD_MASK) < (memslot->base_gfn << PAGE_SHIFT) ||
+ ALIGN(fault_ipa, S2_PMD_SIZE) >= ((memslot->base_gfn + memslot->npages) << PAGE_SHIFT)) {
+ /* PMD entry would map something outside of the memslot */
+ force_pte = true;
+ } else if (vma_kernel_pagesize(vma) == PMD_SIZE && !logging_active) {
hugetlb = true;
gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
} else {
--
2.11.0



2018-10-04 09:54:26

by Punit Agrawal

[permalink] [raw]
Subject: Re: [PATCH v2] KVM: arm/arm64: Check memslot bounds before mapping hugepages

Hi Lukas,

Lukas Braun <[email protected]> writes:

> Userspace can create a memslot with memory backed by (transparent)
> hugepages, but with bounds that do not align with hugepages.
> In that case, we cannot map the entire region in the guest as hugepages
> without exposing additional host memory to the guest and potentially
> interfering with other memslots.
> Consequently, this patch adds a bounds check when populating guest page
> tables and forces the creation of regular PTEs if mapping an entire
> hugepage would violate the memslots bounds.
>
> Signed-off-by: Lukas Braun <[email protected]>
> ---
>
> Hi everyone,
>
> for v2, in addition to writing the condition the way Marc suggested, I
> moved the whole check so it also catches the problem when the hugepage
> was allocated explicitly, not only for THPs.

Ok, that makes sense. Memslot bounds could exceed for hugetlbfs pages as
well.

> The second line is quite long, but splitting it up would make things
> rather ugly IMO, so I left it as it is.

Let's try to do better - user_mem_abort() is quite hard to follow as it
is.

>
>
> Regards,
> Lukas
>
>
> virt/kvm/arm/mmu.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index ed162a6c57c5..ba77339e23ec 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -1500,7 +1500,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> return -EFAULT;
> }
>
> - if (vma_kernel_pagesize(vma) == PMD_SIZE && !logging_active) {
> + if ((fault_ipa & S2_PMD_MASK) < (memslot->base_gfn << PAGE_SHIFT) ||
> + ALIGN(fault_ipa, S2_PMD_SIZE) >= ((memslot->base_gfn + memslot->npages) << PAGE_SHIFT)) {
> + /* PMD entry would map something outside of the memslot */
> + force_pte = true;
> + } else if (vma_kernel_pagesize(vma) == PMD_SIZE && !logging_active) {
> hugetlb = true;
> gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
> } else {

For the purpose of this fix, using a helper to check whether the mapping
fits in the memslot makes things clearer (imo) (untested patch below) -

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index ed162a6c57c5..8bca141eb45e 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1466,6 +1466,18 @@ static void kvm_send_hwpoison_signal(unsigned long address,
send_sig_info(SIGBUS, &info, current);
}

+static bool mapping_in_memslot(struct kvm_memory_slot *memslot,
+ phys_addr_t fault_ipa, unsigned long mapping_size)
+{
+ gfn_t start_gfn = (fault_ipa & ~(mapping_size - 1)) >> PAGE_SHIFT;
+ gfn_t end_gfn = ALIGN(fault_ipa, mapping_size) >> PAGE_SHIFT;
+
+ WARN_ON(!is_power_of_2(mapping_size));
+
+ return memslot->base_gfn <= start_gfn &&
+ end_gfn < memslot->base_gfn + memslot->npages;
+}
+
static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
struct kvm_memory_slot *memslot, unsigned long hva,
unsigned long fault_status)
@@ -1480,7 +1492,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
kvm_pfn_t pfn;
pgprot_t mem_type = PAGE_S2;
bool logging_active = memslot_is_logging(memslot);
- unsigned long flags = 0;
+ unsigned long vma_pagesize, flags = 0;

write_fault = kvm_is_write_fault(vcpu);
exec_fault = kvm_vcpu_trap_is_iabt(vcpu);
@@ -1500,7 +1512,15 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
return -EFAULT;
}

- if (vma_kernel_pagesize(vma) == PMD_SIZE && !logging_active) {
+ vma_pagesize = vma_kernel_pagesize(vma);
+ /* Is the mapping contained in the memslot? */
+ if (!mapping_in_memslot(memslot, fault_ipa, vma_pagesize)) {
+ /* memslot should be aligned to page size */
+ vma_pagesize = PAGE_SIZE;
+ force_pte = true;
+ }
+
+ if (vma_pagesize == PMD_SIZE && !logging_active) {
hugetlb = true;
gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
} else {

Thoughts?

2018-10-31 16:02:52

by Christoffer Dall

[permalink] [raw]
Subject: Re: [PATCH v2] KVM: arm/arm64: Check memslot bounds before mapping hugepages

On Wed, Sep 26, 2018 at 05:11:35PM +0200, Lukas Braun wrote:
> Userspace can create a memslot with memory backed by (transparent)
> hugepages, but with bounds that do not align with hugepages.
> In that case, we cannot map the entire region in the guest as hugepages
> without exposing additional host memory to the guest and potentially
> interfering with other memslots.
> Consequently, this patch adds a bounds check when populating guest page
> tables and forces the creation of regular PTEs if mapping an entire
> hugepage would violate the memslots bounds.
>
> Signed-off-by: Lukas Braun <[email protected]>

It took me fairly long to understand why we didn't catch that when we
introduced the 'offset check', which indicates that this function is
just getting too long to read.

I don't absolutely mind the fix below, but it does pile on to the
complexity.

Here's an alternative approach (untested, of course), but it slightly
limits the functionality we have today, in favor of simplicitly. (Also
not sure if it's too large for cc'ing stable).


Thanks,

Christoffer


diff --git a/arch/arm64/include/asm/stage2_pgtable.h b/arch/arm64/include/asm/stage2_pgtable.h
index 8b68099348e5..ccb003dfdb97 100644
--- a/arch/arm64/include/asm/stage2_pgtable.h
+++ b/arch/arm64/include/asm/stage2_pgtable.h
@@ -119,6 +119,11 @@ static inline phys_addr_t stage2_pmd_addr_end(phys_addr_t addr, phys_addr_t end)
return (boundary - 1 < end - 1) ? boundary : end;
}

+static inline bool stage2_pmd_aligned(phys_addr_t addr)
+{
+ return !!(addr & ~S2_PMD_MASK);
+}
+
#endif /* STAGE2_PGTABLE_LEVELS > 2 */

#define stage2_pte_table_empty(ptep) kvm_page_empty(ptep)
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index ed162a6c57c5..e5709ccee224 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1466,6 +1466,22 @@ static void kvm_send_hwpoison_signal(unsigned long address,
send_sig_info(SIGBUS, &info, current);
}

+static bool memslot_supports_pmd_mappings(struct kvm_memory_slot *memslot)
+{
+ gpa_t gpa_start, gpa_end;
+ hva_t hva_start, hva_end;
+ size_t size;
+
+ size = memslot->npages * PAGE_SIZE;
+ gpa_start = memslot->base_gfn << PAGE_SHIFT;
+ gpa_end = gpa_start + size;
+ hva_start = memslot->userspace_addr;
+ hva_end = hva_start + size;
+
+ return stage2_pmd_aligned(gpa_start) && stage2_pmd_aligned(gpa_end) &&
+ stage2_pmd_aligned(hva_start) && stage2_pmd_aligned(hva_end);
+}
+
static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
struct kvm_memory_slot *memslot, unsigned long hva,
unsigned long fault_status)
@@ -1491,6 +1507,22 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
return -EFAULT;
}

+ /*
+ * We limit PMD-size mappings to situations where both the userspace
+ * region and GPA region is aligned to the stage2 pmd size and is a
+ * multiple of the stage2 pmd size in total size. This ensures
+ * that we won't map unintended host memory and that we'll map the
+ * intended user pages (not skewed by mismatching PMD offsets).
+ *
+ * We miss out on the opportunity to map non-edge PMD regions in
+ * unaligned memslots. Oh well...
+ */
+ if (!memslot_supports_pmd_mappings(memslot))
+ force_pte = true;
+
+ if (logging_active)
+ force_pte = true;
+
/* Let's check if we will get back a huge page backed by hugetlbfs */
down_read(&current->mm->mmap_sem);
vma = find_vma_intersection(current->mm, hva, hva + 1);
@@ -1500,22 +1532,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
return -EFAULT;
}

- if (vma_kernel_pagesize(vma) == PMD_SIZE && !logging_active) {
+ if (vma_kernel_pagesize(vma) == PMD_SIZE && !force_pte) {
hugetlb = true;
gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
- } else {
- /*
- * Pages belonging to memslots that don't have the same
- * alignment for userspace and IPA cannot be mapped using
- * block descriptors even if the pages belong to a THP for
- * the process, because the stage-2 block descriptor will
- * cover more than a single THP and we loose atomicity for
- * unmapping, updates, and splits of the THP or other pages
- * in the stage-2 block range.
- */
- if ((memslot->userspace_addr & ~PMD_MASK) !=
- ((memslot->base_gfn << PAGE_SHIFT) & ~PMD_MASK))
- force_pte = true;
}
up_read(&current->mm->mmap_sem);

@@ -1554,7 +1573,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
* should not be mapped with huge pages (it introduces churn
* and performance degradation), so force a pte mapping.
*/
- force_pte = true;
flags |= KVM_S2_FLAG_LOGGING_ACTIVE;

/*