2020-11-30 12:22:52

by Yanan Wang

[permalink] [raw]
Subject: [RFC PATCH 1/3] KVM: arm64: Fix possible memory leak in kvm stage2

When installing a new leaf pte onto an invalid ptep, we need to get_page(ptep).
When just updating a valid leaf ptep, we shouldn't get_page(ptep).
Incorrect page_count of translation tables might lead to memory leak,
when unmapping a stage 2 memory range.

Signed-off-by: Yanan Wang <[email protected]>
---
arch/arm64/kvm/hyp/pgtable.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 0271b4a3b9fe..696b6aa83faf 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -186,6 +186,7 @@ static bool kvm_set_valid_leaf_pte(kvm_pte_t *ptep, u64 pa, kvm_pte_t attr,
return old == pte;

smp_store_release(ptep, pte);
+ get_page(virt_to_page(ptep));
return true;
}

@@ -476,6 +477,7 @@ static bool stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
/* There's an existing valid leaf entry, so perform break-before-make */
kvm_set_invalid_pte(ptep);
kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
+ put_page(virt_to_page(ptep));
kvm_set_valid_leaf_pte(ptep, phys, data->attr, level);
out:
data->phys += granule;
@@ -512,7 +514,7 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
}

if (stage2_map_walker_try_leaf(addr, end, level, ptep, data))
- goto out_get_page;
+ return 0;

if (WARN_ON(level == KVM_PGTABLE_MAX_LEVELS - 1))
return -EINVAL;
@@ -536,9 +538,8 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
}

kvm_set_table_pte(ptep, childp);
-
-out_get_page:
get_page(page);
+
return 0;
}

--
2.19.1


2020-11-30 13:24:46

by Will Deacon

[permalink] [raw]
Subject: Re: [RFC PATCH 1/3] KVM: arm64: Fix possible memory leak in kvm stage2

On Mon, Nov 30, 2020 at 08:18:45PM +0800, Yanan Wang wrote:
> When installing a new leaf pte onto an invalid ptep, we need to get_page(ptep).
> When just updating a valid leaf ptep, we shouldn't get_page(ptep).
> Incorrect page_count of translation tables might lead to memory leak,
> when unmapping a stage 2 memory range.

Did you find this by inspection, or did you hit this in practice? I'd be
interested to see the backtrace for mapping over an existing mapping.

> Signed-off-by: Yanan Wang <[email protected]>
> ---
> arch/arm64/kvm/hyp/pgtable.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 0271b4a3b9fe..696b6aa83faf 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -186,6 +186,7 @@ static bool kvm_set_valid_leaf_pte(kvm_pte_t *ptep, u64 pa, kvm_pte_t attr,
> return old == pte;
>
> smp_store_release(ptep, pte);
> + get_page(virt_to_page(ptep));

This is also used for the hypervisor stage-1 page-table, so I'd prefer to
leave this function as-is.

> return true;
> }
>
> @@ -476,6 +477,7 @@ static bool stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
> /* There's an existing valid leaf entry, so perform break-before-make */
> kvm_set_invalid_pte(ptep);
> kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
> + put_page(virt_to_page(ptep));
> kvm_set_valid_leaf_pte(ptep, phys, data->attr, level);
> out:
> data->phys += granule;

Isn't this hunk alone sufficient to solve the problem?

Will

2020-12-01 07:27:10

by Yanan Wang

[permalink] [raw]
Subject: Re: [RFC PATCH 1/3] KVM: arm64: Fix possible memory leak in kvm stage2

Hi Will,

On 2020/11/30 21:21, Will Deacon wrote:
> On Mon, Nov 30, 2020 at 08:18:45PM +0800, Yanan Wang wrote:
>> When installing a new leaf pte onto an invalid ptep, we need to get_page(ptep).
>> When just updating a valid leaf ptep, we shouldn't get_page(ptep).
>> Incorrect page_count of translation tables might lead to memory leak,
>> when unmapping a stage 2 memory range.
> Did you find this by inspection, or did you hit this in practice? I'd be
> interested to see the backtrace for mapping over an existing mapping.

Actually this is found by inspection.

In the current code, get_page() will uniformly called at "out_get_page"
in function stage2_map_walk_leaf(),

no matter the old ptep is valid or not.

When using stage2_unmap_walker() API to unmap a memory range, some
page-table pages might not be

freed if page_count of the pages is not right.

>> Signed-off-by: Yanan Wang <[email protected]>
>> ---
>> arch/arm64/kvm/hyp/pgtable.c | 7 ++++---
>> 1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
>> index 0271b4a3b9fe..696b6aa83faf 100644
>> --- a/arch/arm64/kvm/hyp/pgtable.c
>> +++ b/arch/arm64/kvm/hyp/pgtable.c
>> @@ -186,6 +186,7 @@ static bool kvm_set_valid_leaf_pte(kvm_pte_t *ptep, u64 pa, kvm_pte_t attr,
>> return old == pte;
>>
>> smp_store_release(ptep, pte);
>> + get_page(virt_to_page(ptep));
> This is also used for the hypervisor stage-1 page-table, so I'd prefer to
> leave this function as-is.
I agree at this point.
>> return true;
>> }
>>
>> @@ -476,6 +477,7 @@ static bool stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
>> /* There's an existing valid leaf entry, so perform break-before-make */
>> kvm_set_invalid_pte(ptep);
>> kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
>> + put_page(virt_to_page(ptep));
>> kvm_set_valid_leaf_pte(ptep, phys, data->attr, level);
>> out:
>> data->phys += granule;
> Isn't this hunk alone sufficient to solve the problem?
>
> Will
> .

Not sufficient enough. When the old ptep is valid and old pte equlas new
pte, in this case, "True" is also returned by kvm_set_valid_leaf_pte()

and get_page() will still be called.


Yanan

2020-12-01 14:21:12

by Will Deacon

[permalink] [raw]
Subject: Re: [RFC PATCH 1/3] KVM: arm64: Fix possible memory leak in kvm stage2

On Tue, Dec 01, 2020 at 03:21:23PM +0800, wangyanan (Y) wrote:
> On 2020/11/30 21:21, Will Deacon wrote:
> > On Mon, Nov 30, 2020 at 08:18:45PM +0800, Yanan Wang wrote:
> > > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > > index 0271b4a3b9fe..696b6aa83faf 100644
> > > --- a/arch/arm64/kvm/hyp/pgtable.c
> > > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > > @@ -186,6 +186,7 @@ static bool kvm_set_valid_leaf_pte(kvm_pte_t *ptep, u64 pa, kvm_pte_t attr,
> > > return old == pte;
> > > smp_store_release(ptep, pte);
> > > + get_page(virt_to_page(ptep));
> > This is also used for the hypervisor stage-1 page-table, so I'd prefer to
> > leave this function as-is.
> I agree at this point.
> > > return true;
> > > }
> > > @@ -476,6 +477,7 @@ static bool stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
> > > /* There's an existing valid leaf entry, so perform break-before-make */
> > > kvm_set_invalid_pte(ptep);
> > > kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
> > > + put_page(virt_to_page(ptep));
> > > kvm_set_valid_leaf_pte(ptep, phys, data->attr, level);
> > > out:
> > > data->phys += granule;
> > Isn't this hunk alone sufficient to solve the problem?
> >
> > Will
> > .
>
> Not sufficient enough. When the old ptep is valid and old pte equlas new
> pte, in this case, "True" is also returned by kvm_set_valid_leaf_pte()
>
> and get_page() will still be called.

I had a go at fixing this without introducing refcounting to the hyp stage-1
case, and ended up with the diff below. What do you think?

Will

--->8

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 0271b4a3b9fe..78e2c0dc47ae 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -170,23 +170,16 @@ static void kvm_set_table_pte(kvm_pte_t *ptep, kvm_pte_t *childp)
smp_store_release(ptep, pte);
}

-static bool kvm_set_valid_leaf_pte(kvm_pte_t *ptep, u64 pa, kvm_pte_t attr,
- u32 level)
+static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, u32 level)
{
- kvm_pte_t old = *ptep, pte = kvm_phys_to_pte(pa);
+ kvm_pte_t pte = kvm_phys_to_pte(pa);
u64 type = (level == KVM_PGTABLE_MAX_LEVELS - 1) ? KVM_PTE_TYPE_PAGE :
KVM_PTE_TYPE_BLOCK;

pte |= attr & (KVM_PTE_LEAF_ATTR_LO | KVM_PTE_LEAF_ATTR_HI);
pte |= FIELD_PREP(KVM_PTE_TYPE, type);
pte |= KVM_PTE_VALID;
-
- /* Tolerate KVM recreating the exact same mapping. */
- if (kvm_pte_valid(old))
- return old == pte;
-
- smp_store_release(ptep, pte);
- return true;
+ return pte;
}

static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data, u64 addr,
@@ -341,12 +334,17 @@ static int hyp_map_set_prot_attr(enum kvm_pgtable_prot prot,
static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level,
kvm_pte_t *ptep, struct hyp_map_data *data)
{
+ kvm_pte_t new, old = *ptep;
u64 granule = kvm_granule_size(level), phys = data->phys;

if (!kvm_block_mapping_supported(addr, end, phys, level))
return false;

- WARN_ON(!kvm_set_valid_leaf_pte(ptep, phys, data->attr, level));
+ /* Tolerate KVM recreating the exact same mapping. */
+ new = kvm_init_valid_leaf_pte(phys, data->attr, level);
+ if (old != new && !WARN_ON(kvm_pte_valid(old)))
+ smp_store_release(ptep, new);
+
data->phys += granule;
return true;
}
@@ -465,19 +463,24 @@ static bool stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
kvm_pte_t *ptep,
struct stage2_map_data *data)
{
+ kvm_pte_t new, old = *ptep;
u64 granule = kvm_granule_size(level), phys = data->phys;

if (!kvm_block_mapping_supported(addr, end, phys, level))
return false;

- if (kvm_set_valid_leaf_pte(ptep, phys, data->attr, level))
- goto out;
+ new = kvm_init_valid_leaf_pte(phys, data->attr, level);
+ if (kvm_pte_valid(old)) {
+ /*
+ * There's an existing valid leaf entry, so perform
+ * break-before-make.
+ */
+ kvm_set_invalid_pte(ptep);
+ kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
+ put_page(virt_to_page(ptep));
+ }

- /* There's an existing valid leaf entry, so perform break-before-make */
- kvm_set_invalid_pte(ptep);
- kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
- kvm_set_valid_leaf_pte(ptep, phys, data->attr, level);
-out:
+ smp_store_release(ptep, new);
data->phys += granule;
return true;
}

2020-12-01 17:24:06

by Yanan Wang

[permalink] [raw]
Subject: Re: [RFC PATCH 1/3] KVM: arm64: Fix possible memory leak in kvm stage2

On 2020/12/1 22:16, Will Deacon wrote:

> On Tue, Dec 01, 2020 at 03:21:23PM +0800, wangyanan (Y) wrote:
>> On 2020/11/30 21:21, Will Deacon wrote:
>>> On Mon, Nov 30, 2020 at 08:18:45PM +0800, Yanan Wang wrote:
>>>> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
>>>> index 0271b4a3b9fe..696b6aa83faf 100644
>>>> --- a/arch/arm64/kvm/hyp/pgtable.c
>>>> +++ b/arch/arm64/kvm/hyp/pgtable.c
>>>> @@ -186,6 +186,7 @@ static bool kvm_set_valid_leaf_pte(kvm_pte_t *ptep, u64 pa, kvm_pte_t attr,
>>>> return old == pte;
>>>> smp_store_release(ptep, pte);
>>>> + get_page(virt_to_page(ptep));
>>> This is also used for the hypervisor stage-1 page-table, so I'd prefer to
>>> leave this function as-is.
>> I agree at this point.
>>>> return true;
>>>> }
>>>> @@ -476,6 +477,7 @@ static bool stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
>>>> /* There's an existing valid leaf entry, so perform break-before-make */
>>>> kvm_set_invalid_pte(ptep);
>>>> kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
>>>> + put_page(virt_to_page(ptep));
>>>> kvm_set_valid_leaf_pte(ptep, phys, data->attr, level);
>>>> out:
>>>> data->phys += granule;
>>> Isn't this hunk alone sufficient to solve the problem?
>>>
>>> Will
>>> .
>> Not sufficient enough. When the old ptep is valid and old pte equlas new
>> pte, in this case, "True" is also returned by kvm_set_valid_leaf_pte()
>>
>> and get_page() will still be called.
> I had a go at fixing this without introducing refcounting to the hyp stage-1
> case, and ended up with the diff below. What do you think?

Hi Will,

Functionally this diff looks fine to me. A small comment inline, please
see below.

I had made an alternative fix (after sending v1) and it looks much more
concise.

If you're ok with it, I can send it as v2 (together with patch#2 and #3)
after some tests.


Thanks,

Yanan


diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 0271b4a3b9fe..b232bdd142a6 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -470,6 +470,9 @@ static bool stage2_map_walker_try_leaf(u64 addr, u64
end, u32 level,
        if (!kvm_block_mapping_supported(addr, end, phys, level))
                return false;

+       if (kvm_pte_valid(*ptep))
+               put_page(virt_to_page(ptep));
+
        if (kvm_set_valid_leaf_pte(ptep, phys, data->attr, level))
                goto out;

>
> --->8
>
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 0271b4a3b9fe..78e2c0dc47ae 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -170,23 +170,16 @@ static void kvm_set_table_pte(kvm_pte_t *ptep, kvm_pte_t *childp)
> smp_store_release(ptep, pte);
> }
>
> -static bool kvm_set_valid_leaf_pte(kvm_pte_t *ptep, u64 pa, kvm_pte_t attr,
> - u32 level)
> +static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, u32 level)
> {
> - kvm_pte_t old = *ptep, pte = kvm_phys_to_pte(pa);
> + kvm_pte_t pte = kvm_phys_to_pte(pa);
> u64 type = (level == KVM_PGTABLE_MAX_LEVELS - 1) ? KVM_PTE_TYPE_PAGE :
> KVM_PTE_TYPE_BLOCK;
>
> pte |= attr & (KVM_PTE_LEAF_ATTR_LO | KVM_PTE_LEAF_ATTR_HI);
> pte |= FIELD_PREP(KVM_PTE_TYPE, type);
> pte |= KVM_PTE_VALID;
> -
> - /* Tolerate KVM recreating the exact same mapping. */
> - if (kvm_pte_valid(old))
> - return old == pte;
> -
> - smp_store_release(ptep, pte);
> - return true;
> + return pte;
> }
>
> static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data, u64 addr,
> @@ -341,12 +334,17 @@ static int hyp_map_set_prot_attr(enum kvm_pgtable_prot prot,
> static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level,
> kvm_pte_t *ptep, struct hyp_map_data *data)
> {
> + kvm_pte_t new, old = *ptep;
> u64 granule = kvm_granule_size(level), phys = data->phys;
>
> if (!kvm_block_mapping_supported(addr, end, phys, level))
> return false;
>
> - WARN_ON(!kvm_set_valid_leaf_pte(ptep, phys, data->attr, level));
> + /* Tolerate KVM recreating the exact same mapping. */
> + new = kvm_init_valid_leaf_pte(phys, data->attr, level);
> + if (old != new && !WARN_ON(kvm_pte_valid(old)))
> + smp_store_release(ptep, new);
> +
> data->phys += granule;
> return true;
> }
> @@ -465,19 +463,24 @@ static bool stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
> kvm_pte_t *ptep,
> struct stage2_map_data *data)
> {
> + kvm_pte_t new, old = *ptep;
> u64 granule = kvm_granule_size(level), phys = data->phys;
>
> if (!kvm_block_mapping_supported(addr, end, phys, level))
> return false;
>
> - if (kvm_set_valid_leaf_pte(ptep, phys, data->attr, level))
> - goto out;
> + new = kvm_init_valid_leaf_pte(phys, data->attr, level);
> + if (kvm_pte_valid(old)) {
> + /*
> + * There's an existing valid leaf entry, so perform
> + * break-before-make.
> + */
> + kvm_set_invalid_pte(ptep);
> + kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
> + put_page(virt_to_page(ptep));

When old PTE is valid and equals to the new one, we will also perform
break-before-make and the new PTE installation. But they're actually not
necessary in this case.

> + }
>
> - /* There's an existing valid leaf entry, so perform break-before-make */
> - kvm_set_invalid_pte(ptep);
> - kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
> - kvm_set_valid_leaf_pte(ptep, phys, data->attr, level);
> -out:
> + smp_store_release(ptep, new);
> data->phys += granule;
> return true;
> }
> .

2020-12-01 18:21:22

by Will Deacon

[permalink] [raw]
Subject: Re: [RFC PATCH 1/3] KVM: arm64: Fix possible memory leak in kvm stage2

On Wed, Dec 02, 2020 at 01:19:35AM +0800, wangyanan (Y) wrote:
> On 2020/12/1 22:16, Will Deacon wrote:
> > On Tue, Dec 01, 2020 at 03:21:23PM +0800, wangyanan (Y) wrote:
> > > On 2020/11/30 21:21, Will Deacon wrote:
> > > > On Mon, Nov 30, 2020 at 08:18:45PM +0800, Yanan Wang wrote:
> > > > > @@ -476,6 +477,7 @@ static bool stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
> > > > > /* There's an existing valid leaf entry, so perform break-before-make */
> > > > > kvm_set_invalid_pte(ptep);
> > > > > kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
> > > > > + put_page(virt_to_page(ptep));
> > > > > kvm_set_valid_leaf_pte(ptep, phys, data->attr, level);
> > > > > out:
> > > > > data->phys += granule;
> > > > Isn't this hunk alone sufficient to solve the problem?
> > > >
> > > Not sufficient enough. When the old ptep is valid and old pte equlas new
> > > pte, in this case, "True" is also returned by kvm_set_valid_leaf_pte()
> > >
> > > and get_page() will still be called.
> > I had a go at fixing this without introducing refcounting to the hyp stage-1
> > case, and ended up with the diff below. What do you think?
>
> Functionally this diff looks fine to me. A small comment inline, please see
> below.
>
> I had made an alternative fix (after sending v1) and it looks much more
> concise.
>
> If you're ok with it, I can send it as v2 (together with patch#2 and #3)
> after some tests.

Thanks.

> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 0271b4a3b9fe..b232bdd142a6 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -470,6 +470,9 @@ static bool stage2_map_walker_try_leaf(u64 addr, u64
> end, u32 level,
> ??????? if (!kvm_block_mapping_supported(addr, end, phys, level))
> ??????????????? return false;
>
> +?????? if (kvm_pte_valid(*ptep))
> +?????????????? put_page(virt_to_page(ptep));
> +
> ??????? if (kvm_set_valid_leaf_pte(ptep, phys, data->attr, level))
> ??????????????? goto out;

This is certainly smaller, but see below.

> > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > index 0271b4a3b9fe..78e2c0dc47ae 100644
> > --- a/arch/arm64/kvm/hyp/pgtable.c
> > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > @@ -170,23 +170,16 @@ static void kvm_set_table_pte(kvm_pte_t *ptep, kvm_pte_t *childp)
> > smp_store_release(ptep, pte);
> > }
> > -static bool kvm_set_valid_leaf_pte(kvm_pte_t *ptep, u64 pa, kvm_pte_t attr,
> > - u32 level)
> > +static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, u32 level)
> > {
> > - kvm_pte_t old = *ptep, pte = kvm_phys_to_pte(pa);
> > + kvm_pte_t pte = kvm_phys_to_pte(pa);
> > u64 type = (level == KVM_PGTABLE_MAX_LEVELS - 1) ? KVM_PTE_TYPE_PAGE :
> > KVM_PTE_TYPE_BLOCK;
> > pte |= attr & (KVM_PTE_LEAF_ATTR_LO | KVM_PTE_LEAF_ATTR_HI);
> > pte |= FIELD_PREP(KVM_PTE_TYPE, type);
> > pte |= KVM_PTE_VALID;
> > -
> > - /* Tolerate KVM recreating the exact same mapping. */
> > - if (kvm_pte_valid(old))
> > - return old == pte;
> > -
> > - smp_store_release(ptep, pte);
> > - return true;
> > + return pte;
> > }
> > static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data, u64 addr,
> > @@ -341,12 +334,17 @@ static int hyp_map_set_prot_attr(enum kvm_pgtable_prot prot,
> > static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level,
> > kvm_pte_t *ptep, struct hyp_map_data *data)
> > {
> > + kvm_pte_t new, old = *ptep;
> > u64 granule = kvm_granule_size(level), phys = data->phys;
> > if (!kvm_block_mapping_supported(addr, end, phys, level))
> > return false;
> > - WARN_ON(!kvm_set_valid_leaf_pte(ptep, phys, data->attr, level));
> > + /* Tolerate KVM recreating the exact same mapping. */
> > + new = kvm_init_valid_leaf_pte(phys, data->attr, level);
> > + if (old != new && !WARN_ON(kvm_pte_valid(old)))
> > + smp_store_release(ptep, new);
> > +
> > data->phys += granule;
> > return true;
> > }
> > @@ -465,19 +463,24 @@ static bool stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
> > kvm_pte_t *ptep,
> > struct stage2_map_data *data)
> > {
> > + kvm_pte_t new, old = *ptep;
> > u64 granule = kvm_granule_size(level), phys = data->phys;
> > if (!kvm_block_mapping_supported(addr, end, phys, level))
> > return false;
> > - if (kvm_set_valid_leaf_pte(ptep, phys, data->attr, level))
> > - goto out;
> > + new = kvm_init_valid_leaf_pte(phys, data->attr, level);
> > + if (kvm_pte_valid(old)) {
> > + /*
> > + * There's an existing valid leaf entry, so perform
> > + * break-before-make.
> > + */
> > + kvm_set_invalid_pte(ptep);
> > + kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
> > + put_page(virt_to_page(ptep));
>
> When old PTE is valid and equals to the new one, we will also perform
> break-before-make and the new PTE installation. But they're actually not
> necessary in this case.

Agreed, but I don't think that case should happen for stage-2 mappings.
That's why I prefer my diff here, as it makes that 'old == new' logic
specific to the hyp mappings.

But I'll leave it all up to you (these diffs are only intended to be
helpful).

Will

2020-12-01 20:13:38

by Yanan Wang

[permalink] [raw]
Subject: Re: [RFC PATCH 1/3] KVM: arm64: Fix possible memory leak in kvm stage2


On 2020/12/2 2:15, Will Deacon wrote:
> On Wed, Dec 02, 2020 at 01:19:35AM +0800, wangyanan (Y) wrote:
>> On 2020/12/1 22:16, Will Deacon wrote:
>>> On Tue, Dec 01, 2020 at 03:21:23PM +0800, wangyanan (Y) wrote:
>>>> On 2020/11/30 21:21, Will Deacon wrote:
>>>>> On Mon, Nov 30, 2020 at 08:18:45PM +0800, Yanan Wang wrote:
>>>>>> @@ -476,6 +477,7 @@ static bool stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
>>>>>> /* There's an existing valid leaf entry, so perform break-before-make */
>>>>>> kvm_set_invalid_pte(ptep);
>>>>>> kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
>>>>>> + put_page(virt_to_page(ptep));
>>>>>> kvm_set_valid_leaf_pte(ptep, phys, data->attr, level);
>>>>>> out:
>>>>>> data->phys += granule;
>>>>> Isn't this hunk alone sufficient to solve the problem?
>>>>>
>>>> Not sufficient enough. When the old ptep is valid and old pte equlas new
>>>> pte, in this case, "True" is also returned by kvm_set_valid_leaf_pte()
>>>>
>>>> and get_page() will still be called.
>>> I had a go at fixing this without introducing refcounting to the hyp stage-1
>>> case, and ended up with the diff below. What do you think?
>> Functionally this diff looks fine to me. A small comment inline, please see
>> below.
>>
>> I had made an alternative fix (after sending v1) and it looks much more
>> concise.
>>
>> If you're ok with it, I can send it as v2 (together with patch#2 and #3)
>> after some tests.
> Thanks.
>
>> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
>> index 0271b4a3b9fe..b232bdd142a6 100644
>> --- a/arch/arm64/kvm/hyp/pgtable.c
>> +++ b/arch/arm64/kvm/hyp/pgtable.c
>> @@ -470,6 +470,9 @@ static bool stage2_map_walker_try_leaf(u64 addr, u64
>> end, u32 level,
>>         if (!kvm_block_mapping_supported(addr, end, phys, level))
>>                 return false;
>>
>> +       if (kvm_pte_valid(*ptep))
>> +               put_page(virt_to_page(ptep));
>> +
>>         if (kvm_set_valid_leaf_pte(ptep, phys, data->attr, level))
>>                 goto out;
> This is certainly smaller, but see below.
>
>>> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
>>> index 0271b4a3b9fe..78e2c0dc47ae 100644
>>> --- a/arch/arm64/kvm/hyp/pgtable.c
>>> +++ b/arch/arm64/kvm/hyp/pgtable.c
>>> @@ -170,23 +170,16 @@ static void kvm_set_table_pte(kvm_pte_t *ptep, kvm_pte_t *childp)
>>> smp_store_release(ptep, pte);
>>> }
>>> -static bool kvm_set_valid_leaf_pte(kvm_pte_t *ptep, u64 pa, kvm_pte_t attr,
>>> - u32 level)
>>> +static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, u32 level)
>>> {
>>> - kvm_pte_t old = *ptep, pte = kvm_phys_to_pte(pa);
>>> + kvm_pte_t pte = kvm_phys_to_pte(pa);
>>> u64 type = (level == KVM_PGTABLE_MAX_LEVELS - 1) ? KVM_PTE_TYPE_PAGE :
>>> KVM_PTE_TYPE_BLOCK;
>>> pte |= attr & (KVM_PTE_LEAF_ATTR_LO | KVM_PTE_LEAF_ATTR_HI);
>>> pte |= FIELD_PREP(KVM_PTE_TYPE, type);
>>> pte |= KVM_PTE_VALID;
>>> -
>>> - /* Tolerate KVM recreating the exact same mapping. */
>>> - if (kvm_pte_valid(old))
>>> - return old == pte;
>>> -
>>> - smp_store_release(ptep, pte);
>>> - return true;
>>> + return pte;
>>> }
>>> static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data, u64 addr,
>>> @@ -341,12 +334,17 @@ static int hyp_map_set_prot_attr(enum kvm_pgtable_prot prot,
>>> static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level,
>>> kvm_pte_t *ptep, struct hyp_map_data *data)
>>> {
>>> + kvm_pte_t new, old = *ptep;
>>> u64 granule = kvm_granule_size(level), phys = data->phys;
>>> if (!kvm_block_mapping_supported(addr, end, phys, level))
>>> return false;
>>> - WARN_ON(!kvm_set_valid_leaf_pte(ptep, phys, data->attr, level));
>>> + /* Tolerate KVM recreating the exact same mapping. */
>>> + new = kvm_init_valid_leaf_pte(phys, data->attr, level);
>>> + if (old != new && !WARN_ON(kvm_pte_valid(old)))
>>> + smp_store_release(ptep, new);
>>> +
>>> data->phys += granule;
>>> return true;
>>> }
>>> @@ -465,19 +463,24 @@ static bool stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
>>> kvm_pte_t *ptep,
>>> struct stage2_map_data *data)
>>> {
>>> + kvm_pte_t new, old = *ptep;
>>> u64 granule = kvm_granule_size(level), phys = data->phys;
>>> if (!kvm_block_mapping_supported(addr, end, phys, level))
>>> return false;
>>> - if (kvm_set_valid_leaf_pte(ptep, phys, data->attr, level))
>>> - goto out;
>>> + new = kvm_init_valid_leaf_pte(phys, data->attr, level);
>>> + if (kvm_pte_valid(old)) {
>>> + /*
>>> + * There's an existing valid leaf entry, so perform
>>> + * break-before-make.
>>> + */
>>> + kvm_set_invalid_pte(ptep);
>>> + kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
>>> + put_page(virt_to_page(ptep));
>> When old PTE is valid and equals to the new one, we will also perform
>> break-before-make and the new PTE installation. But they're actually not
>> necessary in this case.
> Agreed, but I don't think that case should happen for stage-2 mappings.
> That's why I prefer my diff here, as it makes that 'old == new' logic
> specific to the hyp mappings.
>
> But I'll leave it all up to you (these diffs are only intended to be
> helpful).
>
> Will
> .

Hi Will,

In my opinion, the 'old == new' case might happen too in stage-2
translation,  especially in the condition of concurrent access of
multiple vCPUs.

For example, when merging tables into a block, we will perform
break-before-make for a valid old PTE in function stage2_map_walk_pre().

If the other vCPUs are trying to access the same GPA range, so the MMUs
will target at the same PTE as above, and they might just catch the moment

when the target PTE is set invalid in BBM by the vCPU holding the
mmu_lock, but the old PTE will be updated to valid later.

As a result, translation fault will be triggered in these vCPUs, then
they will wait for the mmu_lock to map exactly the *same* mapping (old
== new).


Thanks,

Yanan