2021-07-30 04:34:01

by Alexey Kardashevskiy

[permalink] [raw]
Subject: [RFC PATCH kernel] KVM: Stop leaking memory in debugfs

The debugfs folder name is made of a supposedly unique pair of
the process pid and a VM fd. However it is possible to get a race here
which manifests in these messages:

[ 471.846235] debugfs: Directory '20245-4' with parent 'kvm' already present!

debugfs_create_dir() returns an error which is handled correctly
everywhere except kvm_create_vm_debugfs() where the code allocates
stat data structs and overwrites the older values regardless.

Spotted by syzkaller. This slow memory leak produces way too many
OOM reports.

Signed-off-by: Alexey Kardashevskiy <[email protected]>
---

I am pretty sure we better fix the race but I am not quite sure what
lock is appropriate here, ideas?

---
virt/kvm/kvm_main.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 986959833d70..89496fd8127a 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -904,6 +904,10 @@ static int kvm_create_vm_debugfs(struct kvm *kvm, int fd)

snprintf(dir_name, sizeof(dir_name), "%d-%d", task_pid_nr(current), fd);
kvm->debugfs_dentry = debugfs_create_dir(dir_name, kvm_debugfs_dir);
+ if (IS_ERR_OR_NULL(kvm->debugfs_dentry)) {
+ pr_err("Failed to create %s\n", dir_name);
+ return 0;
+ }

kvm->debugfs_stat_data = kcalloc(kvm_debugfs_num_entries,
sizeof(*kvm->debugfs_stat_data),
--
2.30.2



2021-08-03 11:19:00

by Greg KH

[permalink] [raw]
Subject: Re: [RFC PATCH kernel] KVM: Stop leaking memory in debugfs

On Fri, Jul 30, 2021 at 02:32:17PM +1000, Alexey Kardashevskiy wrote:
> The debugfs folder name is made of a supposedly unique pair of
> the process pid and a VM fd. However it is possible to get a race here
> which manifests in these messages:
>
> [ 471.846235] debugfs: Directory '20245-4' with parent 'kvm' already present!
>
> debugfs_create_dir() returns an error which is handled correctly
> everywhere except kvm_create_vm_debugfs() where the code allocates
> stat data structs and overwrites the older values regardless.
>
> Spotted by syzkaller. This slow memory leak produces way too many
> OOM reports.
>
> Signed-off-by: Alexey Kardashevskiy <[email protected]>
> ---
>
> I am pretty sure we better fix the race but I am not quite sure what
> lock is appropriate here, ideas?
>
> ---
> virt/kvm/kvm_main.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 986959833d70..89496fd8127a 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -904,6 +904,10 @@ static int kvm_create_vm_debugfs(struct kvm *kvm, int fd)
>
> snprintf(dir_name, sizeof(dir_name), "%d-%d", task_pid_nr(current), fd);
> kvm->debugfs_dentry = debugfs_create_dir(dir_name, kvm_debugfs_dir);
> + if (IS_ERR_OR_NULL(kvm->debugfs_dentry)) {
> + pr_err("Failed to create %s\n", dir_name);
> + return 0;
> + }

It should not matter if you fail a debugfs call at all.

If there is a larger race at work here, please fix that root cause, do
not paper over it by attempting to have debugfs catch the issue for you.

thanks,

greg k-h

2021-08-03 12:53:55

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [RFC PATCH kernel] KVM: Stop leaking memory in debugfs

On Tue, Aug 3, 2021 at 1:16 PM Greg KH <[email protected]> wrote:
> On Fri, Jul 30, 2021 at 02:32:17PM +1000, Alexey Kardashevskiy wrote:
> > snprintf(dir_name, sizeof(dir_name), "%d-%d", task_pid_nr(current), fd);
> > kvm->debugfs_dentry = debugfs_create_dir(dir_name, kvm_debugfs_dir);
> > + if (IS_ERR_OR_NULL(kvm->debugfs_dentry)) {
> > + pr_err("Failed to create %s\n", dir_name);
> > + return 0;
> > + }
>
> It should not matter if you fail a debugfs call at all.
>
> If there is a larger race at work here, please fix that root cause, do
> not paper over it by attempting to have debugfs catch the issue for you.

I don't think it's a race, it's really just a bug that is intrinsic in how
the debugfs files are named. You can just do something like this:

#include <unistd.h>
#include <stdio.h>
#include <fcntl.h>
#include <sys/wait.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>
#include <stdlib.h>
int main() {
int kvmfd = open("/dev/kvm", O_RDONLY);
int fd = ioctl(kvmfd, KVM_CREATE_VM, 0);
if (fork() == 0) {
printf("before: %d\n", fd);
sleep(2);
} else {
close(fd);
sleep(1);
int fd = ioctl(kvmfd, KVM_CREATE_VM, 0);
printf("after: %d\n", fd);
wait(NULL);
}
}

So Alexey's patch is okay and I've queued it, though with pr_warn_ratelimited
instead of pr_err.

Paolo


2021-08-03 13:14:57

by Greg KH

[permalink] [raw]
Subject: Re: [RFC PATCH kernel] KVM: Stop leaking memory in debugfs

On Tue, Aug 03, 2021 at 02:52:17PM +0200, Paolo Bonzini wrote:
> On Tue, Aug 3, 2021 at 1:16 PM Greg KH <[email protected]> wrote:
> > On Fri, Jul 30, 2021 at 02:32:17PM +1000, Alexey Kardashevskiy wrote:
> > > snprintf(dir_name, sizeof(dir_name), "%d-%d", task_pid_nr(current), fd);
> > > kvm->debugfs_dentry = debugfs_create_dir(dir_name, kvm_debugfs_dir);
> > > + if (IS_ERR_OR_NULL(kvm->debugfs_dentry)) {
> > > + pr_err("Failed to create %s\n", dir_name);
> > > + return 0;
> > > + }
> >
> > It should not matter if you fail a debugfs call at all.
> >
> > If there is a larger race at work here, please fix that root cause, do
> > not paper over it by attempting to have debugfs catch the issue for you.
>
> I don't think it's a race, it's really just a bug that is intrinsic in how
> the debugfs files are named. You can just do something like this:
>
> #include <unistd.h>
> #include <stdio.h>
> #include <fcntl.h>
> #include <sys/wait.h>
> #include <sys/ioctl.h>
> #include <linux/kvm.h>
> #include <stdlib.h>
> int main() {
> int kvmfd = open("/dev/kvm", O_RDONLY);
> int fd = ioctl(kvmfd, KVM_CREATE_VM, 0);
> if (fork() == 0) {
> printf("before: %d\n", fd);
> sleep(2);
> } else {
> close(fd);
> sleep(1);
> int fd = ioctl(kvmfd, KVM_CREATE_VM, 0);
> printf("after: %d\n", fd);
> wait(NULL);
> }
> }
>
> So Alexey's patch is okay and I've queued it, though with pr_warn_ratelimited
> instead of pr_err.

So userspace can create kvm resources with duplicate names? That feels
wrong to me.

But if all that is "duplicate" is the debugfs kvm directory, why not ask
debugfs if it is already present before trying to create it again? That
way you will not have debugfs complain about duplicate
files/directories.

thanks,

greg k-h

2021-08-03 13:16:23

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [RFC PATCH kernel] KVM: Stop leaking memory in debugfs



On 03/08/2021 22:52, Paolo Bonzini wrote:
> On Tue, Aug 3, 2021 at 1:16 PM Greg KH <[email protected]> wrote:
>> On Fri, Jul 30, 2021 at 02:32:17PM +1000, Alexey Kardashevskiy wrote:
>>> snprintf(dir_name, sizeof(dir_name), "%d-%d", task_pid_nr(current), fd);
>>> kvm->debugfs_dentry = debugfs_create_dir(dir_name, kvm_debugfs_dir);
>>> + if (IS_ERR_OR_NULL(kvm->debugfs_dentry)) {
>>> + pr_err("Failed to create %s\n", dir_name);
>>> + return 0;
>>> + }
>>
>> It should not matter if you fail a debugfs call at all.
>>
>> If there is a larger race at work here, please fix that root cause, do
>> not paper over it by attempting to have debugfs catch the issue for you.
>
> I don't think it's a race, it's really just a bug that is intrinsic in how
> the debugfs files are named. You can just do something like this:
>
> #include <unistd.h>
> #include <stdio.h>
> #include <fcntl.h>
> #include <sys/wait.h>
> #include <sys/ioctl.h>
> #include <linux/kvm.h>
> #include <stdlib.h>
> int main() {
> int kvmfd = open("/dev/kvm", O_RDONLY);
> int fd = ioctl(kvmfd, KVM_CREATE_VM, 0);
> if (fork() == 0) {
> printf("before: %d\n", fd);
> sleep(2);
> } else {
> close(fd);
> sleep(1);
> int fd = ioctl(kvmfd, KVM_CREATE_VM, 0);
> printf("after: %d\n", fd);
> wait(NULL);
> }
> }

oh nice demo :)

although I still think there was a race when I saw it as there was no
fork() in the picture but continuous create/destroy VM in 16 threads on
16 VCPUs with no KVM_RUN in between.

>
> So Alexey's patch is okay and I've queued it, though with pr_warn_ratelimited
> instead of pr_err.

Makes sense with your reproducer. Thanks,


--
Alexey

2021-08-03 13:33:46

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [RFC PATCH kernel] KVM: Stop leaking memory in debugfs

> So userspace can create kvm resources with duplicate names? That feels
> wrong to me.

Yes, the name is just the (pid, file descriptor). It's used only for
debugfs, and it's not really likely going to happen unless a program
does it on purpose, but the ugliness/wrongness is one of the reasons
why we now have a non-debugfs mechanism to retrieve the stats.

> But if all that is "duplicate" is the debugfs kvm directory, why not ask
> debugfs if it is already present before trying to create it again? That
> way you will not have debugfs complain about duplicate
> files/directories.

That would also be racy; it would need a simple mutex around
debugfs_lookup and debugfs_create_dir. But it would indeed avoid the
complaints altogether, so I'll prepare a patch. Thanks,

Paolo