2020-01-08 20:29:58

by Sean Christopherson

Subject: [PATCH 00/14] KVM: x86/mmu: Huge page fixes, cleanup, and DAX

This series is a mix of bug fixes, cleanup and new support in KVM's
handling of huge pages. The series initially stemmed from a syzkaller
bug report[1], which is fixed by patch 02, "mm: thp: KVM: Explicitly
check for THP when populating secondary MMU".

While investigating options for fixing the syzkaller bug, I realized KVM
could reuse the approach from Barret's series to enable huge pages for DAX
mappings in KVM[2] for all types of huge mappings, i.e. walk the host page
tables instead of querying metadata (patches 05 - 09).

Walking the host page tables sidesteps, at least in theory, the issues
with refcounting and identifying THP mappings, and using a common method
for identifying huge mappings should improve KVM's overall page fault
latency (not yet measured) by eliminating the vma lookup that is
currently needed to identify HugeTLB mappings. Removing the HugeTLB
specific code also allows for additional cleanup (patches 10 - 13).
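
Concretely, the common query boils down to walking the host page tables
and translating the host mapping level into a KVM page table level,
something like this simplified sketch (the real code lands in patches
05 - 09; details such as locking are omitted here):

	unsigned int level;
	pte_t *pte;

	/* hva is the host virtual address backing the guest gfn. */
	pte = lookup_address_in_mm(vcpu->kvm->mm, hva, &level);
	if (!pte || !pte_present(*pte))
		return PT_PAGE_TABLE_LEVEL;

	/* PG_LEVEL_4K/2M/1G lines up with KVM's 4k/2mb/1gb levels. */
	return min(max_level, (int)level);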

Testing the page walk approach revealed several pre-existing bugs; the
fixes are included here (patches 01, 03 and 04) because they interact
with the rest of the series. E.g. without the read-only memslot fix,
walking the host page tables without explicitly filtering out HugeTLB
mappings would pick up read-only memslots and introduce a completely
unintended functional change.

Lastly, with the page walk infrastructure in place, supporting DAX-based
huge mappings becomes a trivial change (patch 14).

Based on kvm/queue, commit e41a90be9659 ("KVM: x86/mmu: WARN if root_hpa
is invalid when handling a page fault")

Paolo, assuming I understand your workflow, patch 01 can be squashed with
the buggy commit as it's still sitting in kvm/queue.

[1] https://lkml.kernel.org/r/[email protected]
[2] https://lkml.kernel.org/r/[email protected]

Sean Christopherson (14):
KVM: x86/mmu: Enforce max_level on HugeTLB mappings
mm: thp: KVM: Explicitly check for THP when populating secondary MMU
KVM: Use vcpu-specific gva->hva translation when querying host page
size
KVM: Play nice with read-only memslots when querying host page size
x86/mm: Introduce lookup_address_in_mm()
KVM: x86/mmu: Refactor THP adjust to prep for changing query
KVM: x86/mmu: Walk host page tables to find THP mappings
KVM: x86/mmu: Drop level optimization from fast_page_fault()
KVM: x86/mmu: Rely on host page tables to find HugeTLB mappings
KVM: x86/mmu: Remove obsolete gfn restoration in FNAME(fetch)
KVM: x86/mmu: Zap any compound page when collapsing sptes
KVM: x86/mmu: Fold max_mapping_level() into kvm_mmu_hugepage_adjust()
KVM: x86/mmu: Remove lpage_is_disallowed() check from set_spte()
KVM: x86/mmu: Use huge pages for DAX-backed files

arch/powerpc/kvm/book3s_xive_native.c |   2 +-
arch/x86/include/asm/pgtable_types.h  |   4 +
arch/x86/kvm/mmu/mmu.c                | 208 ++++++++++----------------
arch/x86/kvm/mmu/paging_tmpl.h        |  29 +---
arch/x86/mm/pageattr.c                |  11 ++
include/linux/huge_mm.h               |   6 +
include/linux/kvm_host.h              |   3 +-
mm/huge_memory.c                      |  11 ++
virt/kvm/arm/mmu.c                    |   8 +-
virt/kvm/kvm_main.c                   |  24 ++-
10 files changed, 145 insertions(+), 161 deletions(-)

--
2.24.1


2020-01-08 20:30:05

by Sean Christopherson

Subject: [PATCH 01/14] KVM: x86/mmu: Enforce max_level on HugeTLB mappings

Limit KVM's mapping level for HugeTLB mappings based on its calculated
max_level. The max_level check prior to invoking host_mapping_level()
only filters out the case where KVM cannot create a 2mb mapping; it
doesn't handle the scenario where KVM can create a 2mb but not a 1gb
mapping while the host is using a 1gb HugeTLB mapping.
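
E.g. if max_level allows a 2mb mapping but the gfn is backed by a 1gb
HugeTLB page, host_mapping_level() returns the 1gb level and, without
the clamp, KVM would install a mapping larger than max_level permits.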

Fixes: ad163aa8903d ("KVM: x86/mmu: Persist gfn_lpage_is_disallowed() to max_level")
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 7269130ea5e2..8e822c09170d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1330,7 +1330,7 @@ gfn_to_memslot_dirty_bitmap(struct kvm_vcpu *vcpu, gfn_t gfn,
 static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn,
 			 int *max_levelp)
 {
-	int max_level = *max_levelp;
+	int host_level, max_level = *max_levelp;
 	struct kvm_memory_slot *slot;
 
 	if (unlikely(max_level == PT_PAGE_TABLE_LEVEL))
@@ -1362,7 +1362,8 @@ static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn,
 	 * So, do not propagate host_mapping_level() to max_level as KVM can
 	 * still promote the guest mapping to a huge page in the THP case.
 	 */
-	return host_mapping_level(vcpu->kvm, large_gfn);
+	host_level = host_mapping_level(vcpu->kvm, large_gfn);
+	return min(host_level, max_level);
 }
 
 /*
--
2.24.1

2020-01-08 20:30:09

by Sean Christopherson

Subject: [PATCH 05/14] x86/mm: Introduce lookup_address_in_mm()

Add a helper, lookup_address_in_mm(), to traverse the page tables of a
given mm struct. KVM will use the helper to retrieve the host mapping
level, e.g. 4k vs. 2mb vs. 1gb, of a compound (or DAX-backed) page
without having to resort to implementation-specific metadata. For
example, KVM currently uses different logic for HugeTLB vs. THP, and
would need to add a third variant for DAX-backed files.
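
The intended usage is along these lines (an illustrative sketch, not
the exact KVM code from later in the series; the caller is responsible
for ensuring the page tables are stable, e.g. via mmap_sem or MMU
notifiers):

	unsigned int level;
	pte_t *pte;
	bool huge;

	pte = lookup_address_in_mm(mm, address, &level);
	huge = pte && pte_present(*pte) && level != PG_LEVEL_4K;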

Cc: Dan Williams <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/pgtable_types.h | 4 ++++
arch/x86/mm/pageattr.c | 11 +++++++++++
2 files changed, 15 insertions(+)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index b5e49e6bac63..400ac8da75e8 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -561,6 +561,10 @@ static inline void update_page_count(int level, unsigned long pages) { }
 extern pte_t *lookup_address(unsigned long address, unsigned int *level);
 extern pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long address,
 				    unsigned int *level);
+
+struct mm_struct;
+pte_t *lookup_address_in_mm(struct mm_struct *mm, unsigned long address,
+			    unsigned int *level);
 extern pmd_t *lookup_pmd_address(unsigned long address);
 extern phys_addr_t slow_virt_to_phys(void *__address);
 extern int __init kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn,
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 0d09cc5aad61..8787fec876e4 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -618,6 +618,17 @@ pte_t *lookup_address(unsigned long address, unsigned int *level)
 }
 EXPORT_SYMBOL_GPL(lookup_address);
 
+/*
+ * Lookup the page table entry for a virtual address in a given mm. Return a
+ * pointer to the entry and the level of the mapping.
+ */
+pte_t *lookup_address_in_mm(struct mm_struct *mm, unsigned long address,
+			    unsigned int *level)
+{
+	return lookup_address_in_pgd(pgd_offset(mm, address), address, level);
+}
+EXPORT_SYMBOL_GPL(lookup_address_in_mm);
+
 static pte_t *_lookup_address_cpa(struct cpa_data *cpa, unsigned long address,
 				  unsigned int *level)
 {
--
2.24.1

2020-01-09 20:51:40

by Barret Rhoden

[permalink] [raw]
Subject: Re: [PATCH 00/14] KVM: x86/mmu: Huge page fixes, cleanup, and DAX

Hi -

On 1/8/20 3:24 PM, Sean Christopherson wrote:
> This series is a mix of bug fixes, cleanup and new support in KVM's
> handling of huge pages. The series initially stemmed from a syzkaller
> bug report[1], which is fixed by patch 02, "mm: thp: KVM: Explicitly
> check for THP when populating secondary MMU".
>
> While investigating options for fixing the syzkaller bug, I realized KVM
> could reuse the approach from Barret's series to enable huge pages for DAX
> mappings in KVM[2] for all types of huge mappings, i.e. walk the host page
> tables instead of querying metadata (patches 05 - 09).

Thanks, Sean. I tested this patch series out, and it works for me.
(Huge KVM mappings of a DAX file, etc.).

Thanks,

Barret



2020-01-09 21:08:19

by Thomas Gleixner

Subject: Re: [PATCH 05/14] x86/mm: Introduce lookup_address_in_mm()

Sean Christopherson <[email protected]> writes:

> diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
> index b5e49e6bac63..400ac8da75e8 100644
> --- a/arch/x86/include/asm/pgtable_types.h
> +++ b/arch/x86/include/asm/pgtable_types.h
> @@ -561,6 +561,10 @@ static inline void update_page_count(int level, unsigned long pages) { }
>  extern pte_t *lookup_address(unsigned long address, unsigned int *level);
>  extern pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long address,
>  				    unsigned int *level);
> +
> +struct mm_struct;
> +pte_t *lookup_address_in_mm(struct mm_struct *mm, unsigned long address,
> +			    unsigned int *level);

Please keep the file consistent and use extern even if not required.

Other than that:

Reviewed-by: Thomas Gleixner <[email protected]>

2020-01-21 14:29:05

by Paolo Bonzini

Subject: Re: [PATCH 05/14] x86/mm: Introduce lookup_address_in_mm()

On 09/01/20 22:04, Thomas Gleixner wrote:
> Sean Christopherson <[email protected]> writes:
>
>> diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
>> index b5e49e6bac63..400ac8da75e8 100644
>> --- a/arch/x86/include/asm/pgtable_types.h
>> +++ b/arch/x86/include/asm/pgtable_types.h
>> @@ -561,6 +561,10 @@ static inline void update_page_count(int level, unsigned long pages) { }
>>  extern pte_t *lookup_address(unsigned long address, unsigned int *level);
>>  extern pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long address,
>>  				    unsigned int *level);
>> +
>> +struct mm_struct;
>> +pte_t *lookup_address_in_mm(struct mm_struct *mm, unsigned long address,
>> +			    unsigned int *level);
>
> Please keep the file consistent and use extern even if not required.
>
> Other than that:
>
> Reviewed-by: Thomas Gleixner <[email protected]>
>

Adjusted, thanks for the review.

Paolo

2020-01-21 15:11:56

by Paolo Bonzini

Subject: Re: [PATCH 00/14] KVM: x86/mmu: Huge page fixes, cleanup, and DAX

On 09/01/20 20:47, Barret Rhoden wrote:
> Hi -
>
> On 1/8/20 3:24 PM, Sean Christopherson wrote:
>> This series is a mix of bug fixes, cleanup and new support in KVM's
>> handling of huge pages.  The series initially stemmed from a syzkaller
>> bug report[1], which is fixed by patch 02, "mm: thp: KVM: Explicitly
>> check for THP when populating secondary MMU".
>>
>> While investigating options for fixing the syzkaller bug, I realized KVM
>> could reuse the approach from Barret's series to enable huge pages for DAX
>> mappings in KVM[2] for all types of huge mappings, i.e. walk the host page
>> tables instead of querying metadata (patches 05 - 09).
>
> Thanks, Sean.  I tested this patch series out, and it works for me.
> (Huge KVM mappings of a DAX file, etc.).

Queued, thanks.

Paolo