2017-08-22 14:43:20

by Xiaoming Ni

[permalink] [raw]
Subject: [PATCH] fix memory leak on kvm_vm_ioctl_create_spapr_tce

miss kfree(stt) when anon_inode_getfd return fail
so add check anon_inode_getfd return val, and kfree stt

Signed-off-by: nixiaoming <[email protected]>
---
arch/powerpc/kvm/book3s_64_vio.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index a160c14..a0b4459 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -341,8 +341,11 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,

mutex_unlock(&kvm->lock);

- return anon_inode_getfd("kvm-spapr-tce", &kvm_spapr_tce_fops,
+ ret = anon_inode_getfd("kvm-spapr-tce", &kvm_spapr_tce_fops,
stt, O_RDWR | O_CLOEXEC);
+ if (ret < 0)
+ goto fail;
+ return ret;

fail:
if (stt) {
--
2.11.0.1


2017-08-22 15:16:00

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH] fix memory leak on kvm_vm_ioctl_create_spapr_tce

On 22.08.2017 16:28, nixiaoming wrote:
> miss kfree(stt) when anon_inode_getfd return fail
> so add check anon_inode_getfd return val, and kfree stt
>
> Signed-off-by: nixiaoming <[email protected]>
> ---
> arch/powerpc/kvm/book3s_64_vio.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
> index a160c14..a0b4459 100644
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -341,8 +341,11 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
>
> mutex_unlock(&kvm->lock);
>
> - return anon_inode_getfd("kvm-spapr-tce", &kvm_spapr_tce_fops,
> + ret = anon_inode_getfd("kvm-spapr-tce", &kvm_spapr_tce_fops,
> stt, O_RDWR | O_CLOEXEC);
> + if (ret < 0)
> + goto fail;
> + return ret;
>
> fail:
> if (stt) {
>


stt has already been added to kvm->arch.spapr_tce_tables, so freeing it
is evil IMHO. I don't know that code, so I don't know if there is some
other place that will make sure that everything in
kvm->arch.spapr_tce_tables will properly get freed, even when no release
function has been called (kvm_spapr_tce_release).
--

Thanks,

David

2017-08-22 15:23:42

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH] fix memory leak on kvm_vm_ioctl_create_spapr_tce

On 22.08.2017 17:15, David Hildenbrand wrote:
> On 22.08.2017 16:28, nixiaoming wrote:
>> miss kfree(stt) when anon_inode_getfd return fail
>> so add check anon_inode_getfd return val, and kfree stt
>>
>> Signed-off-by: nixiaoming <[email protected]>
>> ---
>> arch/powerpc/kvm/book3s_64_vio.c | 5 ++++-
>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
>> index a160c14..a0b4459 100644
>> --- a/arch/powerpc/kvm/book3s_64_vio.c
>> +++ b/arch/powerpc/kvm/book3s_64_vio.c
>> @@ -341,8 +341,11 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
>>
>> mutex_unlock(&kvm->lock);
>>
>> - return anon_inode_getfd("kvm-spapr-tce", &kvm_spapr_tce_fops,
>> + ret = anon_inode_getfd("kvm-spapr-tce", &kvm_spapr_tce_fops,
>> stt, O_RDWR | O_CLOEXEC);
>> + if (ret < 0)
>> + goto fail;
>> + return ret;
>>
>> fail:
>> if (stt) {
>>
>
>
> stt has already been added to kvm->arch.spapr_tce_tables, so freeing it
> is evil IMHO. I don't know that code, so I don't know if there is some
> other place that will make sure that everything in
> kvm->arch.spapr_tce_tables will properly get freed, even when no release
> function has been called (kvm_spapr_tce_release).
>

If it is really not freed, than also kvm_put_kvm(stt->kvm) is missing.

--

Thanks,

David

2017-08-23 01:47:18

by Xiaoming Ni

[permalink] [raw]
Subject: Re:Re: [PATCH] fix memory leak on kvm_vm_ioctl_create_spapr_tce

>On 22.08.2017 17:15, David Hildenbrand wrote:
>> On 22.08.2017 16:28, nixiaoming wrote:
>>> miss kfree(stt) when anon_inode_getfd return fail so add check
>>> anon_inode_getfd return val, and kfree stt
>>>
>>> Signed-off-by: nixiaoming <[email protected]>
>>> ---
>>> arch/powerpc/kvm/book3s_64_vio.c | 5 ++++-
>>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/powerpc/kvm/book3s_64_vio.c
>>> b/arch/powerpc/kvm/book3s_64_vio.c
>>> index a160c14..a0b4459 100644
>>> --- a/arch/powerpc/kvm/book3s_64_vio.c
>>> +++ b/arch/powerpc/kvm/book3s_64_vio.c
>>> @@ -341,8 +341,11 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm
>>> *kvm,
>>>
>>> mutex_unlock(&kvm->lock);
>>>
>>> - return anon_inode_getfd("kvm-spapr-tce", &kvm_spapr_tce_fops,
>>> + ret = anon_inode_getfd("kvm-spapr-tce", &kvm_spapr_tce_fops,
>>> stt, O_RDWR | O_CLOEXEC);
>>> + if (ret < 0)
>>> + goto fail;
>>> + return ret;
>>>
>>> fail:
>>> if (stt) {
>>>
>>
>>
>> stt has already been added to kvm->arch.spapr_tce_tables, so freeing
>> it is evil IMHO. I don't know that code, so I don't know if there is
>> some other place that will make sure that everything in
>> kvm->arch.spapr_tce_tables will properly get freed, even when no
>> kvm->release
>> function has been called (kvm_spapr_tce_release).
>>
>
>If it is really not freed, than also kvm_put_kvm(stt->kvm) is missing.
>
>--
>
>Thanks,
>
>David
>

if (!stt) return -ENOMEM;
kvm_get_kvm(kvm);
if anon_inode_getfd return -ENOMEM
The user can not determine whether kvm_get_kvm has been called
so need add kvm_pet_kvm when anon_inode_getfd fail

stt has already been added to kvm->arch.spapr_tce_tables,
but if anon_inode_getfd fail, stt is unused val,
so call list_del_rcu, and free as quickly as possible

new patch:

---
arch/powerpc/kvm/book3s_64_vio.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index a160c14..e2228f1 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -341,8 +341,16 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,

mutex_unlock(&kvm->lock);

- return anon_inode_getfd("kvm-spapr-tce", &kvm_spapr_tce_fops,
+ ret = anon_inode_getfd("kvm-spapr-tce", &kvm_spapr_tce_fops,
stt, O_RDWR | O_CLOEXEC);
+ if (ret < 0) {
+ mutex_lock(&kvm->lock);
+ list_del_rcu(&stt->list);
+ mutex_unlock(&kvm->lock);
+ kvm_put_kvm(kvm);
+ goto fail;
+ }
+ return ret;

fail:
if (stt) {
--
2.11.0.1




2017-08-23 06:06:34

by Paul Mackerras

[permalink] [raw]
Subject: Re: Re: [PATCH] fix memory leak on kvm_vm_ioctl_create_spapr_tce

On Wed, Aug 23, 2017 at 01:43:08AM +0000, Nixiaoming wrote:
> >On 22.08.2017 17:15, David Hildenbrand wrote:
> >> On 22.08.2017 16:28, nixiaoming wrote:
> >>> miss kfree(stt) when anon_inode_getfd return fail so add check
> >>> anon_inode_getfd return val, and kfree stt
> >>>
> >>> Signed-off-by: nixiaoming <[email protected]>
> >>> ---
> >>> arch/powerpc/kvm/book3s_64_vio.c | 5 ++++-
> >>> 1 file changed, 4 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/arch/powerpc/kvm/book3s_64_vio.c
> >>> b/arch/powerpc/kvm/book3s_64_vio.c
> >>> index a160c14..a0b4459 100644
> >>> --- a/arch/powerpc/kvm/book3s_64_vio.c
> >>> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> >>> @@ -341,8 +341,11 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm
> >>> *kvm,
> >>>
> >>> mutex_unlock(&kvm->lock);
> >>>
> >>> - return anon_inode_getfd("kvm-spapr-tce", &kvm_spapr_tce_fops,
> >>> + ret = anon_inode_getfd("kvm-spapr-tce", &kvm_spapr_tce_fops,
> >>> stt, O_RDWR | O_CLOEXEC);
> >>> + if (ret < 0)
> >>> + goto fail;
> >>> + return ret;
> >>>
> >>> fail:
> >>> if (stt) {
> >>>
> >>
> >>
> >> stt has already been added to kvm->arch.spapr_tce_tables, so freeing
> >> it is evil IMHO. I don't know that code, so I don't know if there is
> >> some other place that will make sure that everything in
> >> kvm->arch.spapr_tce_tables will properly get freed, even when no
> >> kvm->release
> >> function has been called (kvm_spapr_tce_release).
> >>
> >
> >If it is really not freed, than also kvm_put_kvm(stt->kvm) is missing.
> >
> >--
> >
> >Thanks,
> >
> >David
> >
>
> if (!stt) return -ENOMEM;
> kvm_get_kvm(kvm);
> if anon_inode_getfd return -ENOMEM
> The user can not determine whether kvm_get_kvm has been called
> so need add kvm_pet_kvm when anon_inode_getfd fail
>
> stt has already been added to kvm->arch.spapr_tce_tables,
> but if anon_inode_getfd fail, stt is unused val,
> so call list_del_rcu, and free as quickly as possible
>
> new patch:
>
> ---
> arch/powerpc/kvm/book3s_64_vio.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
> index a160c14..e2228f1 100644
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -341,8 +341,16 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
>
> mutex_unlock(&kvm->lock);
>
> - return anon_inode_getfd("kvm-spapr-tce", &kvm_spapr_tce_fops,
> + ret = anon_inode_getfd("kvm-spapr-tce", &kvm_spapr_tce_fops,
> stt, O_RDWR | O_CLOEXEC);
> + if (ret < 0) {
> + mutex_lock(&kvm->lock);
> + list_del_rcu(&stt->list);
> + mutex_unlock(&kvm->lock);
> + kvm_put_kvm(kvm);
> + goto fail;
> + }
> + return ret;

It seems to me that it would be better to do the anon_inode_getfd()
call before the kvm_get_kvm() call, and go to the fail label if it
fails.

Paul.

2017-08-23 08:25:49

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH] fix memory leak on kvm_vm_ioctl_create_spapr_tce

On 23.08.2017 08:06, Paul Mackerras wrote:
> On Wed, Aug 23, 2017 at 01:43:08AM +0000, Nixiaoming wrote:
>>> On 22.08.2017 17:15, David Hildenbrand wrote:
>>>> On 22.08.2017 16:28, nixiaoming wrote:
>>>>> miss kfree(stt) when anon_inode_getfd return fail so add check
>>>>> anon_inode_getfd return val, and kfree stt
>>>>>
>>>>> Signed-off-by: nixiaoming <[email protected]>
>>>>> ---
>>>>> arch/powerpc/kvm/book3s_64_vio.c | 5 ++++-
>>>>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/arch/powerpc/kvm/book3s_64_vio.c
>>>>> b/arch/powerpc/kvm/book3s_64_vio.c
>>>>> index a160c14..a0b4459 100644
>>>>> --- a/arch/powerpc/kvm/book3s_64_vio.c
>>>>> +++ b/arch/powerpc/kvm/book3s_64_vio.c
>>>>> @@ -341,8 +341,11 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm
>>>>> *kvm,
>>>>>
>>>>> mutex_unlock(&kvm->lock);
>>>>>
>>>>> - return anon_inode_getfd("kvm-spapr-tce", &kvm_spapr_tce_fops,
>>>>> + ret = anon_inode_getfd("kvm-spapr-tce", &kvm_spapr_tce_fops,
>>>>> stt, O_RDWR | O_CLOEXEC);
>>>>> + if (ret < 0)
>>>>> + goto fail;
>>>>> + return ret;
>>>>>
>>>>> fail:
>>>>> if (stt) {
>>>>>
>>>>
>>>>
>>>> stt has already been added to kvm->arch.spapr_tce_tables, so freeing
>>>> it is evil IMHO. I don't know that code, so I don't know if there is
>>>> some other place that will make sure that everything in
>>>> kvm->arch.spapr_tce_tables will properly get freed, even when no
>>>> kvm->release
>>>> function has been called (kvm_spapr_tce_release).
>>>>
>>>
>>> If it is really not freed, than also kvm_put_kvm(stt->kvm) is missing.
>>>
>>> --
>>>
>>> Thanks,
>>>
>>> David
>>>
>>
>> if (!stt) return -ENOMEM;
>> kvm_get_kvm(kvm);
>> if anon_inode_getfd return -ENOMEM
>> The user can not determine whether kvm_get_kvm has been called
>> so need add kvm_pet_kvm when anon_inode_getfd fail
>>
>> stt has already been added to kvm->arch.spapr_tce_tables,
>> but if anon_inode_getfd fail, stt is unused val,
>> so call list_del_rcu, and free as quickly as possible
>>
>> new patch:
>>
>> ---
>> arch/powerpc/kvm/book3s_64_vio.c | 10 +++++++++-
>> 1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
>> index a160c14..e2228f1 100644
>> --- a/arch/powerpc/kvm/book3s_64_vio.c
>> +++ b/arch/powerpc/kvm/book3s_64_vio.c
>> @@ -341,8 +341,16 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
>>
>> mutex_unlock(&kvm->lock);
>>
>> - return anon_inode_getfd("kvm-spapr-tce", &kvm_spapr_tce_fops,
>> + ret = anon_inode_getfd("kvm-spapr-tce", &kvm_spapr_tce_fops,
>> stt, O_RDWR | O_CLOEXEC);
>> + if (ret < 0) {
>> + mutex_lock(&kvm->lock);
>> + list_del_rcu(&stt->list);

... don't we have to take care of rcu synchronization before freeing it?

>> + mutex_unlock(&kvm->lock);
>> + kvm_put_kvm(kvm);
>> + goto fail;
>> + }
>> + return ret;

of simply

if (!ret)
return 0;

mutex_lock(&kvm->lock);
list_del_rcu(&stt->list);
mutex_unlock(&kvm->lock);
kvm_put_kvm(kvm);


>
> It seems to me that it would be better to do the anon_inode_getfd()
> call before the kvm_get_kvm() call, and go to the fail label if it
> fails.

I would have suggested to not add it to the list before it has been
fully created (so nobody can have access to it). But I guess than we
need another level of protection(e.g. kvm->lock).

Am I missing something, or is kvm_vm_ioctl_create_spapr_tce() racy?

The -EBUSY check is done without any locking, so two parallel creators
could create an inconsistency, no? Shouldn't this all be protected by
kvm->lock?

>
> Paul.
>

Independent of the fix, I'd suggest the following cleanup.


>From 979f55083ee965e25827a8743e8a9fdb85231a6f Mon Sep 17 00:00:00 2001
From: David Hildenbrand <[email protected]>
Date: Wed, 23 Aug 2017 10:08:58 +0200
Subject: [PATCH v1 1/1] KVM: PPC: cleanup kvm_vm_ioctl_create_spapr_tce

Let's simplify error handling.

Signed-off-by: David Hildenbrand <[email protected]>
---
arch/powerpc/kvm/book3s_64_vio.c | 29 +++++++++++------------------
1 file changed, 11 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_vio.c
b/arch/powerpc/kvm/book3s_64_vio.c
index a160c14304eb..6bac49292296 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -295,8 +295,7 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
{
struct kvmppc_spapr_tce_table *stt = NULL;
unsigned long npages, size;
- int ret = -ENOMEM;
- int i;
+ int i, ret;

if (!args->size)
return -EINVAL;
@@ -310,16 +309,13 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
size = _ALIGN_UP(args->size, PAGE_SIZE >> 3);
npages = kvmppc_tce_pages(size);
ret = kvmppc_account_memlimit(kvmppc_stt_pages(npages), true);
- if (ret) {
- stt = NULL;
- goto fail;
- }
+ if (ret)
+ return ret;

- ret = -ENOMEM;
stt = kzalloc(sizeof(*stt) + npages * sizeof(struct page *),
GFP_KERNEL);
if (!stt)
- goto fail;
+ return -ENOMEM;

stt->liobn = args->liobn;
stt->page_shift = args->page_shift;
@@ -331,7 +327,7 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
for (i = 0; i < npages; i++) {
stt->pages[i] = alloc_page(GFP_KERNEL | __GFP_ZERO);
if (!stt->pages[i])
- goto fail;
+ goto fail_free;
}

kvm_get_kvm(kvm);
@@ -344,15 +340,12 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
return anon_inode_getfd("kvm-spapr-tce", &kvm_spapr_tce_fops,
stt, O_RDWR | O_CLOEXEC);

-fail:
- if (stt) {
- for (i = 0; i < npages; i++)
- if (stt->pages[i])
- __free_page(stt->pages[i]);
-
- kfree(stt);
- }
- return ret;
+fail_free:
+ for (i = 0; i < npages; i++)
+ if (stt->pages[i])
+ __free_page(stt->pages[i]);
+ kfree(stt);
+ return -ENOMEM;
}

static void kvmppc_clear_tce(struct iommu_table *tbl, unsigned long entry)
--
2.13.5


--

Thanks,

David

2017-08-23 09:16:32

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH] fix memory leak on kvm_vm_ioctl_create_spapr_tce


>>> + mutex_unlock(&kvm->lock);
>>> + kvm_put_kvm(kvm);
>>> + goto fail;
>>> + }
>>> + return ret;
>
> of simply
>
> if (!ret)

if (ret >= 0)
return ret;

is of course what I meant :)

> return 0;
>
> mutex_lock(&kvm->lock);
> list_del_rcu(&stt->list);
> mutex_unlock(&kvm->lock);
> kvm_put_kvm(kvm);


--

Thanks,

David

2017-08-23 10:17:44

by Xiaoming Ni

[permalink] [raw]
Subject: 答复: [PATCH] fix memory leak on kvm_vm_ioc tl_create_spapr_tce

>On 23.08.2017 08:06, Paul Mackerras wrote:
>> On Wed, Aug 23, 2017 at 01:43:08AM +0000, Nixiaoming wrote:
>>>> On 22.08.2017 17:15, David Hildenbrand wrote:
>>>>> On 22.08.2017 16:28, nixiaoming wrote:
>>>>>> miss kfree(stt) when anon_inode_getfd return fail so add check
>>>>>> anon_inode_getfd return val, and kfree stt
>>>>>>
>>>>>> Signed-off-by: nixiaoming <[email protected]>
>>>>>> ---
>>>>>> arch/powerpc/kvm/book3s_64_vio.c | 5 ++++-
>>>>>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/arch/powerpc/kvm/book3s_64_vio.c
>>>>>> b/arch/powerpc/kvm/book3s_64_vio.c
>>>>>> index a160c14..a0b4459 100644
>>>>>> --- a/arch/powerpc/kvm/book3s_64_vio.c
>>>>>> +++ b/arch/powerpc/kvm/book3s_64_vio.c
>>>>>> @@ -341,8 +341,11 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm
>>>>>> *kvm,
>>>>>>
>>>>>> mutex_unlock(&kvm->lock);
>>>>>>
>>>>>> - return anon_inode_getfd("kvm-spapr-tce", &kvm_spapr_tce_fops,
>>>>>> + ret = anon_inode_getfd("kvm-spapr-tce", &kvm_spapr_tce_fops,
>>>>>> stt, O_RDWR | O_CLOEXEC);
>>>>>> + if (ret < 0)
>>>>>> + goto fail;
>>>>>> + return ret;
>>>>>>
>>>>>> fail:
>>>>>> if (stt) {
>>>>>>
>>>>>
>>>>>
>>>>> stt has already been added to kvm->arch.spapr_tce_tables, so freeing
>>>>> it is evil IMHO. I don't know that code, so I don't know if there is
>>>>> some other place that will make sure that everything in
>>>>> kvm->arch.spapr_tce_tables will properly get freed, even when no
>>>>> kvm->release
>>>>> function has been called (kvm_spapr_tce_release).
>>>>>
>>>>
>>>> If it is really not freed, than also kvm_put_kvm(stt->kvm) is missing.
>>>>
>>>> --
>>>>
>>>> Thanks,
>>>>
>>>> David
>>>>
>>>
>>> if (!stt) return -ENOMEM;
>>> kvm_get_kvm(kvm);
>>> if anon_inode_getfd return -ENOMEM
>>> The user can not determine whether kvm_get_kvm has been called
>>> so need add kvm_pet_kvm when anon_inode_getfd fail
>>>
>>> stt has already been added to kvm->arch.spapr_tce_tables,
>>> but if anon_inode_getfd fail, stt is unused val,
>>> so call list_del_rcu, and free as quickly as possible
>>>
>>> new patch:
>>>
>>> ---
>>> arch/powerpc/kvm/book3s_64_vio.c | 10 +++++++++-
>>> 1 file changed, 9 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
>>> index a160c14..e2228f1 100644
>>> --- a/arch/powerpc/kvm/book3s_64_vio.c
>>> +++ b/arch/powerpc/kvm/book3s_64_vio.c
>>> @@ -341,8 +341,16 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
>>>
>>> mutex_unlock(&kvm->lock);
>>>
>>> - return anon_inode_getfd("kvm-spapr-tce", &kvm_spapr_tce_fops,
>>> + ret = anon_inode_getfd("kvm-spapr-tce", &kvm_spapr_tce_fops,
>>> stt, O_RDWR | O_CLOEXEC);
>>> + if (ret < 0) {
>>> + mutex_lock(&kvm->lock);
>>> + list_del_rcu(&stt->list);
>
>... don't we have to take care of rcu synchronization before freeing it?
>
>>> + mutex_unlock(&kvm->lock);
>>> + kvm_put_kvm(kvm);
>>> + goto fail;
>>> + }
>>> + return ret;
>
>of simply
>
>if (!ret)
> return 0;
>
>mutex_lock(&kvm->lock);
>list_del_rcu(&stt->list);
>mutex_unlock(&kvm->lock);
>kvm_put_kvm(kvm);
>
>
>>
>> It seems to me that it would be better to do the anon_inode_getfd()
>> call before the kvm_get_kvm() call, and go to the fail label if it
>> fails.
>
>I would have suggested to not add it to the list before it has been
>fully created (so nobody can have access to it). But I guess than we
>need another level of protection(e.g. kvm->lock).
>
>Am I missing something, or is kvm_vm_ioctl_create_spapr_tce() racy?
>
>The -EBUSY check is done without any locking, so two parallel creators
>could create an inconsistency, no? Shouldn't this all be protected by
>kvm->lock?
>
>>
>> Paul.
>>
>
>Independent of the fix, I'd suggest the following cleanup.
>
>
>From 979f55083ee965e25827a8743e8a9fdb85231a6f Mon Sep 17 00:00:00 2001
>From: David Hildenbrand <[email protected]>
>Date: Wed, 23 Aug 2017 10:08:58 +0200
>Subject: [PATCH v1 1/1] KVM: PPC: cleanup kvm_vm_ioctl_create_spapr_tce
>
>Let's simplify error handling.
>
>Signed-off-by: David Hildenbrand <[email protected]>
>---
> arch/powerpc/kvm/book3s_64_vio.c | 29 +++++++++++------------------
> 1 file changed, 11 insertions(+), 18 deletions(-)
>
>diff --git a/arch/powerpc/kvm/book3s_64_vio.c
>b/arch/powerpc/kvm/book3s_64_vio.c
>index a160c14304eb..6bac49292296 100644
>--- a/arch/powerpc/kvm/book3s_64_vio.c
>+++ b/arch/powerpc/kvm/book3s_64_vio.c
>@@ -295,8 +295,7 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
> {
> struct kvmppc_spapr_tce_table *stt = NULL;
> unsigned long npages, size;
>- int ret = -ENOMEM;
>- int i;
>+ int i, ret;
>
> if (!args->size)
> return -EINVAL;
>@@ -310,16 +309,13 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
> size = _ALIGN_UP(args->size, PAGE_SIZE >> 3);
> npages = kvmppc_tce_pages(size);
> ret = kvmppc_account_memlimit(kvmppc_stt_pages(npages), true);
>- if (ret) {
>- stt = NULL;
>- goto fail;
>- }
>+ if (ret)
>+ return ret;
>
>- ret = -ENOMEM;
> stt = kzalloc(sizeof(*stt) + npages * sizeof(struct page *),
> GFP_KERNEL);
> if (!stt)
>- goto fail;
>+ return -ENOMEM;
>
> stt->liobn = args->liobn;
> stt->page_shift = args->page_shift;
>@@ -331,7 +327,7 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
> for (i = 0; i < npages; i++) {
> stt->pages[i] = alloc_page(GFP_KERNEL | __GFP_ZERO);
> if (!stt->pages[i])
>- goto fail;
>+ goto fail_free;
> }
>
> kvm_get_kvm(kvm);
>@@ -344,15 +340,12 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
> return anon_inode_getfd("kvm-spapr-tce", &kvm_spapr_tce_fops,
> stt, O_RDWR | O_CLOEXEC);
>
>-fail:
>- if (stt) {
>- for (i = 0; i < npages; i++)
>- if (stt->pages[i])
>- __free_page(stt->pages[i]);
>-
>- kfree(stt);
>- }
>- return ret;
>+fail_free:
>+ for (i = 0; i < npages; i++)
>+ if (stt->pages[i])
>+ __free_page(stt->pages[i]);
>+ kfree(stt);
>+ return -ENOMEM;
> }
>
> static void kvmppc_clear_tce(struct iommu_table *tbl, unsigned long entry)
>--
>2.13.5
>
>
>--
>
>Thanks,
>
>David
>

Update patch based on advice from David Hildenbrand and Paul Mackerras

---
arch/powerpc/kvm/book3s_64_vio.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index a160c14..517594a 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -334,16 +334,21 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
goto fail;
}

- kvm_get_kvm(kvm);
-
mutex_lock(&kvm->lock);
list_add_rcu(&stt->list, &kvm->arch.spapr_tce_tables);

mutex_unlock(&kvm->lock);

- return anon_inode_getfd("kvm-spapr-tce", &kvm_spapr_tce_fops,
+ ret = anon_inode_getfd("kvm-spapr-tce", &kvm_spapr_tce_fops,
stt, O_RDWR | O_CLOEXEC);
-
+ if (ret >= 0) {
+ kvm_get_kvm(kvm);
+ return ret;
+ }
+ mutex_lock(&kvm->lock);
+ list_del_rcu(&stt->list);
+ mutex_unlock(&kvm->lock);
+ synchronize_rcu();
fail:
if (stt) {
for (i = 0; i < npages; i++)
--



2017-08-24 01:06:54

by Xiaoming Ni

[permalink] [raw]
Subject: Re: [PATCH] fix memory leak on kvm_vm_ioctl_create_spapr_tce

>On 23.08.2017 08:06, Paul Mackerras wrote:
>> On Wed, Aug 23, 2017 at 01:43:08AM +0000, Nixiaoming wrote:
>>>> On 22.08.2017 17:15, David Hildenbrand wrote:
>>>>> On 22.08.2017 16:28, nixiaoming wrote:
>>>>>> miss kfree(stt) when anon_inode_getfd return fail so add check
>>>>>> anon_inode_getfd return val, and kfree stt
>>>>>>
>>>>>> Signed-off-by: nixiaoming <[email protected]>
>>>>>> ---
>>>>>> arch/powerpc/kvm/book3s_64_vio.c | 5 ++++-
>>>>>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/arch/powerpc/kvm/book3s_64_vio.c
>>>>>> b/arch/powerpc/kvm/book3s_64_vio.c
>>>>>> index a160c14..a0b4459 100644
>>>>>> --- a/arch/powerpc/kvm/book3s_64_vio.c
>>>>>> +++ b/arch/powerpc/kvm/book3s_64_vio.c
>>>>>> @@ -341,8 +341,11 @@ long kvm_vm_ioctl_create_spapr_tce(struct
>>>>>> kvm *kvm,
>>>>>>
>>>>>> mutex_unlock(&kvm->lock);
>>>>>>
>>>>>> - return anon_inode_getfd("kvm-spapr-tce", &kvm_spapr_tce_fops,
>>>>>> + ret = anon_inode_getfd("kvm-spapr-tce", &kvm_spapr_tce_fops,
>>>>>> stt, O_RDWR | O_CLOEXEC);
>>>>>> + if (ret < 0)
>>>>>> + goto fail;
>>>>>> + return ret;
>>>>>>
>>>>>> fail:
>>>>>> if (stt) {
>>>>>>
>>>>>
>>>>>
>>>>> stt has already been added to kvm->arch.spapr_tce_tables, so
>>>>> freeing it is evil IMHO. I don't know that code, so I don't know
>>>>> if there is some other place that will make sure that everything
>>>>> in
>>>>> kvm->arch.spapr_tce_tables will properly get freed, even when no
>>>>> kvm->release
>>>>> function has been called (kvm_spapr_tce_release).
>>>>>
>>>>
>>>> If it is really not freed, than also kvm_put_kvm(stt->kvm) is missing.
>>>>
>>>> --
>>>>
>>>> Thanks,
>>>>
>>>> David
>>>>
>>>
>>> if (!stt) return -ENOMEM;
>>> kvm_get_kvm(kvm);
>>> if anon_inode_getfd return -ENOMEM
>>> The user can not determine whether kvm_get_kvm has been called so
>>> need add kvm_pet_kvm when anon_inode_getfd fail
>>>
>>> stt has already been added to kvm->arch.spapr_tce_tables, but if
>>> anon_inode_getfd fail, stt is unused val, so call list_del_rcu, and
>>> free as quickly as possible
>>>
>>> new patch:
>>>
>>> ---
>>> arch/powerpc/kvm/book3s_64_vio.c | 10 +++++++++-
>>> 1 file changed, 9 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/powerpc/kvm/book3s_64_vio.c
>>> b/arch/powerpc/kvm/book3s_64_vio.c
>>> index a160c14..e2228f1 100644
>>> --- a/arch/powerpc/kvm/book3s_64_vio.c
>>> +++ b/arch/powerpc/kvm/book3s_64_vio.c
>>> @@ -341,8 +341,16 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm
>>> *kvm,
>>>
>>> mutex_unlock(&kvm->lock);
>>>
>>> - return anon_inode_getfd("kvm-spapr-tce", &kvm_spapr_tce_fops,
>>> + ret = anon_inode_getfd("kvm-spapr-tce", &kvm_spapr_tce_fops,
>>> stt, O_RDWR | O_CLOEXEC);
>>> + if (ret < 0) {
>>> + mutex_lock(&kvm->lock);
>>> + list_del_rcu(&stt->list);
>
>... don't we have to take care of rcu synchronization before freeing it?
>
>>> + mutex_unlock(&kvm->lock);
>>> + kvm_put_kvm(kvm);
>>> + goto fail;
>>> + }
>>> + return ret;
>
>of simply
>
>if (!ret)
> return 0;
>
>mutex_lock(&kvm->lock);
>list_del_rcu(&stt->list);
>mutex_unlock(&kvm->lock);
>kvm_put_kvm(kvm);
>
>
>>
>> It seems to me that it would be better to do the anon_inode_getfd()
>> call before the kvm_get_kvm() call, and go to the fail label if it
>> fails.
>
>I would have suggested to not add it to the list before it has been
>fully created (so nobody can have access to it). But I guess than we
>need another level of protection(e.g. kvm->lock).
>
>Am I missing something, or is kvm_vm_ioctl_create_spapr_tce() racy?
>
>The -EBUSY check is done without any locking, so two parallel creators
>could create an inconsistency, no? Shouldn't this all be protected by
>kvm->lock?
>
>>
>> Paul.
>>
>
>Independent of the fix, I'd suggest the following cleanup.
>
>
>From 979f55083ee965e25827a8743e8a9fdb85231a6f Mon Sep 17 00:00:00 2001
>From: David Hildenbrand <[email protected]>
>Date: Wed, 23 Aug 2017 10:08:58 +0200
>Subject: [PATCH v1 1/1] KVM: PPC: cleanup kvm_vm_ioctl_create_spapr_tce
>
>Let's simplify error handling.
>
>Signed-off-by: David Hildenbrand <[email protected]>
>---
> arch/powerpc/kvm/book3s_64_vio.c | 29 +++++++++++------------------
> 1 file changed, 11 insertions(+), 18 deletions(-)
>
>diff --git a/arch/powerpc/kvm/book3s_64_vio.c
>b/arch/powerpc/kvm/book3s_64_vio.c
>index a160c14304eb..6bac49292296 100644
>--- a/arch/powerpc/kvm/book3s_64_vio.c
>+++ b/arch/powerpc/kvm/book3s_64_vio.c
>@@ -295,8 +295,7 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
>{
> struct kvmppc_spapr_tce_table *stt = NULL;
> unsigned long npages, size;
>- int ret = -ENOMEM;
>- int i;
>+ int i, ret;
>
> if (!args->size)
> return -EINVAL;
>@@ -310,16 +309,13 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
> size = _ALIGN_UP(args->size, PAGE_SIZE >> 3);
> npages = kvmppc_tce_pages(size);
> ret = kvmppc_account_memlimit(kvmppc_stt_pages(npages), true);
>- if (ret) {
>- stt = NULL;
>- goto fail;
>- }
>+ if (ret)
>+ return ret;
>
>- ret = -ENOMEM;
> stt = kzalloc(sizeof(*stt) + npages * sizeof(struct page *),
> GFP_KERNEL);
> if (!stt)
>- goto fail;
>+ return -ENOMEM;
>
> stt->liobn = args->liobn;
> stt->page_shift = args->page_shift;
>@@ -331,7 +327,7 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
> for (i = 0; i < npages; i++) {
> stt->pages[i] = alloc_page(GFP_KERNEL | __GFP_ZERO);
> if (!stt->pages[i])
>- goto fail;
>+ goto fail_free;
> }
>
> kvm_get_kvm(kvm);
>@@ -344,15 +340,12 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
> return anon_inode_getfd("kvm-spapr-tce", &kvm_spapr_tce_fops,
> stt, O_RDWR | O_CLOEXEC);
>
>-fail:
>- if (stt) {
>- for (i = 0; i < npages; i++)
>- if (stt->pages[i])
>- __free_page(stt->pages[i]);
>-
>- kfree(stt);
>- }
>- return ret;
>+fail_free:
>+ for (i = 0; i < npages; i++)
>+ if (stt->pages[i])
>+ __free_page(stt->pages[i]);
>+ kfree(stt);
>+ return -ENOMEM;
> }
>
> static void kvmppc_clear_tce(struct iommu_table *tbl, unsigned long
>entry)
>--
>2.13.5
>
>
>--
>
>Thanks,
>
>David
>

Update patch based on advice from David Hildenbrand and Paul Mackerras

---
arch/powerpc/kvm/book3s_64_vio.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index a160c14..517594a 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -334,16 +334,21 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
goto fail;
}

- kvm_get_kvm(kvm);
-
mutex_lock(&kvm->lock);
list_add_rcu(&stt->list, &kvm->arch.spapr_tce_tables);

mutex_unlock(&kvm->lock);

- return anon_inode_getfd("kvm-spapr-tce", &kvm_spapr_tce_fops,
+ ret = anon_inode_getfd("kvm-spapr-tce", &kvm_spapr_tce_fops,
stt, O_RDWR | O_CLOEXEC);
-
+ if (ret >= 0) {
+ kvm_get_kvm(kvm);
+ return ret;
+ }
+ mutex_lock(&kvm->lock);
+ list_del_rcu(&stt->list);
+ mutex_unlock(&kvm->lock);
+ synchronize_rcu();
fail:
if (stt) {
for (i = 0; i < npages; i++)
--



2017-08-27 21:02:38

by Al Viro

[permalink] [raw]
Subject: Re: Re: [PATCH] fix memory leak on kvm_vm_ioctl_create_spapr_tce

On Wed, Aug 23, 2017 at 04:06:24PM +1000, Paul Mackerras wrote:

> It seems to me that it would be better to do the anon_inode_getfd()
> call before the kvm_get_kvm() call, and go to the fail label if it
> fails.

And what happens if another thread does close() on the (guessed) fd?

2017-08-28 04:42:35

by Paul Mackerras

[permalink] [raw]
Subject: Re: Re: [PATCH] fix memory leak on kvm_vm_ioctl_create_spapr_tce

On Sun, Aug 27, 2017 at 10:02:20PM +0100, Al Viro wrote:
> On Wed, Aug 23, 2017 at 04:06:24PM +1000, Paul Mackerras wrote:
>
> > It seems to me that it would be better to do the anon_inode_getfd()
> > call before the kvm_get_kvm() call, and go to the fail label if it
> > fails.
>
> And what happens if another thread does close() on the (guessed) fd?

Chaos ensues, but mostly because we don't have proper mutual exclusion
on the modifications to the list. I'll add a mutex_lock/unlock to
kvm_spapr_tce_release() and move the anon_inode_getfd() call inside
the mutex.

It looks like the other possible uses of the fd (mmap, and passing it
as a parameter to the KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE ioctl on a KVM
device fd) are safe.

Thanks,
Paul.

2017-08-28 05:28:23

by Al Viro

[permalink] [raw]
Subject: Re: Re: [PATCH] fix memory leak on kvm_vm_ioctl_create_spapr_tce

On Mon, Aug 28, 2017 at 02:38:37PM +1000, Paul Mackerras wrote:
> On Sun, Aug 27, 2017 at 10:02:20PM +0100, Al Viro wrote:
> > On Wed, Aug 23, 2017 at 04:06:24PM +1000, Paul Mackerras wrote:
> >
> > > It seems to me that it would be better to do the anon_inode_getfd()
> > > call before the kvm_get_kvm() call, and go to the fail label if it
> > > fails.
> >
> > And what happens if another thread does close() on the (guessed) fd?
>
> Chaos ensues, but mostly because we don't have proper mutual exclusion
> on the modifications to the list. I'll add a mutex_lock/unlock to
> kvm_spapr_tce_release() and move the anon_inode_getfd() call inside
> the mutex.
>
> It looks like the other possible uses of the fd (mmap, and passing it
> as a parameter to the KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE ioctl on a KVM
> device fd) are safe.

Frankly, it's a lot saner to have "no failure points past anon_inode_getfd()"
policy...

2017-08-28 06:06:38

by Paul Mackerras

[permalink] [raw]
Subject: Re: Re: [PATCH] fix memory leak on kvm_vm_ioctl_create_spapr_tce

On Mon, Aug 28, 2017 at 06:28:08AM +0100, Al Viro wrote:
> On Mon, Aug 28, 2017 at 02:38:37PM +1000, Paul Mackerras wrote:
> > On Sun, Aug 27, 2017 at 10:02:20PM +0100, Al Viro wrote:
> > > On Wed, Aug 23, 2017 at 04:06:24PM +1000, Paul Mackerras wrote:
> > >
> > > > It seems to me that it would be better to do the anon_inode_getfd()
> > > > call before the kvm_get_kvm() call, and go to the fail label if it
> > > > fails.
> > >
> > > And what happens if another thread does close() on the (guessed) fd?
> >
> > Chaos ensues, but mostly because we don't have proper mutual exclusion
> > on the modifications to the list. I'll add a mutex_lock/unlock to
> > kvm_spapr_tce_release() and move the anon_inode_getfd() call inside
> > the mutex.
> >
> > It looks like the other possible uses of the fd (mmap, and passing it
> > as a parameter to the KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE ioctl on a KVM
> > device fd) are safe.
>
> Frankly, it's a lot saner to have "no failure points past anon_inode_getfd()"
> policy...

Right. In my latest patch, there are no failure points past
anon_inode_getfd().

Paul.

2017-08-28 11:31:19

by Michael Ellerman

[permalink] [raw]
Subject: Re: Re: [PATCH] fix memory leak on kvm_vm_ioctl_create_spapr_tce

Al Viro <[email protected]> writes:

> On Mon, Aug 28, 2017 at 02:38:37PM +1000, Paul Mackerras wrote:
>> On Sun, Aug 27, 2017 at 10:02:20PM +0100, Al Viro wrote:
>> > On Wed, Aug 23, 2017 at 04:06:24PM +1000, Paul Mackerras wrote:
>> >
>> > > It seems to me that it would be better to do the anon_inode_getfd()
>> > > call before the kvm_get_kvm() call, and go to the fail label if it
>> > > fails.
>> >
>> > And what happens if another thread does close() on the (guessed) fd?
>>
>> Chaos ensues, but mostly because we don't have proper mutual exclusion
>> on the modifications to the list. I'll add a mutex_lock/unlock to
>> kvm_spapr_tce_release() and move the anon_inode_getfd() call inside
>> the mutex.
>>
>> It looks like the other possible uses of the fd (mmap, and passing it
>> as a parameter to the KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE ioctl on a KVM
>> device fd) are safe.
>
> Frankly, it's a lot saner to have "no failure points past anon_inode_getfd()"
> policy...

Actually I thought that was a hard rule. But I don't see it documented
or mentioned anywhere so I'm not sure now why I thought that.

cheers