Hello Avi,
here is the latest respin of my fixes for the kvm module unload problem:
[PATCH 1/2] anon_inodes: use fops->owner for module refcount
[PATCH 2/2] kvm: set owner of cpu and vm file operations
Both patches fix module reference counting problems and only matter for module
unload - nothing critical.
Tested on s390 and x86_32.
Christian
There is an imbalance for anonymous inodes. If the fops->owner field is set,
the module reference count of owner is decreases on release.
("filp_close" --> "__fput" ---> "fops_put")
On the other hand, anon_inode_getfd does not increase the module reference
count of owner. This causes two problems:
- if owner is set, the module refcount goes negative
- if owner is not set, the module can be unloaded while code is running
This patch changes anon_inode_getfd to be symmetric regarding fops->owner
handling.
I have checked all existing users of anon_inode_getfd. Noone sets fops->owner,
thats why nobody has seen the module refcount negative. The refcounting was
tested with a patched and unpatched KVM module.(see patch 2/2) I also did an
epoll_open/close test.
Signed-off-by: Christian Borntraeger <[email protected]>
Reviewed-by: Davide Libenzi <[email protected]>
---
fs/anon_inodes.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
Index: kvm/fs/anon_inodes.c
===================================================================
--- kvm.orig/fs/anon_inodes.c
+++ kvm/fs/anon_inodes.c
@@ -79,9 +79,12 @@ int anon_inode_getfd(const char *name, c
if (IS_ERR(anon_inode_inode))
return -ENODEV;
+ if (fops->owner && !try_module_get(fops->owner))
+ return -ENOENT;
+
error = get_unused_fd_flags(flags);
if (error < 0)
- return error;
+ goto err_module;
fd = error;
/*
@@ -128,6 +131,8 @@ err_dput:
dput(dentry);
err_put_unused_fd:
put_unused_fd(fd);
+err_module:
+ module_put(fops->owner);
return error;
}
EXPORT_SYMBOL_GPL(anon_inode_getfd);
There is a race between a "close of the file descriptors" and module
unload in the kvm module.
You can easily trigger this problem by applying this debug patch:
>--- kvm.orig/virt/kvm/kvm_main.c
>+++ kvm/virt/kvm/kvm_main.c
>@@ -648,10 +648,14 @@ void kvm_free_physmem(struct kvm *kvm)
> kvm_free_physmem_slot(&kvm->memslots[i], NULL);
> }
>
>+#include <linux/delay.h>
> static void kvm_destroy_vm(struct kvm *kvm)
> {
> struct mm_struct *mm = kvm->mm;
>
>+ printk("off1\n");
>+ msleep(5000);
>+ printk("off2\n");
> spin_lock(&kvm_lock);
> list_del(&kvm->vm_list);
> spin_unlock(&kvm_lock);
and killing the userspace, followed by an rmmod.
The problem is that kvm_destroy_vm can run while the module count
is 0. That means, you can remove the module while kvm_destroy_vm
is running. But kvm_destroy_vm is part of the module text. This
causes a kerneloops. The race exists without the msleep but is much
harder to trigger.
This patch requires the fix for anon_inodes (anon_inodes: use fops->owner
for module refcount).
With this patch, we can set the owner of all anonymous KVM inodes file
operations. The VFS will then control the KVM module refcount as long as there
is an open file. kvm_destroy_vm will be called by the release function of the
last closed file - before the VFS drops the module refcount.
Signed-off-by: Christian Borntraeger <[email protected]>
---
virt/kvm/kvm_main.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
Index: kvm/virt/kvm/kvm_main.c
===================================================================
--- kvm.orig/virt/kvm/kvm_main.c
+++ kvm/virt/kvm/kvm_main.c
@@ -1303,7 +1303,7 @@ static int kvm_vcpu_release(struct inode
return 0;
}
-static const struct file_operations kvm_vcpu_fops = {
+static struct file_operations kvm_vcpu_fops = {
.release = kvm_vcpu_release,
.unlocked_ioctl = kvm_vcpu_ioctl,
.compat_ioctl = kvm_vcpu_ioctl,
@@ -1697,7 +1697,7 @@ static int kvm_vm_mmap(struct file *file
return 0;
}
-static const struct file_operations kvm_vm_fops = {
+static struct file_operations kvm_vm_fops = {
.release = kvm_vm_release,
.unlocked_ioctl = kvm_vm_ioctl,
.compat_ioctl = kvm_vm_ioctl,
@@ -2061,6 +2061,8 @@ int kvm_init(void *opaque, unsigned int
}
kvm_chardev_ops.owner = module;
+ kvm_vm_fops.owner = module;
+ kvm_vcpu_fops.owner = module;
r = misc_register(&kvm_dev);
if (r) {
Am Dienstag, 2. Dezember 2008 schrieb Christian Borntraeger:
> Hello Avi,
>
> here is the latest respin of my fixes for the kvm module unload problem:
>
> [PATCH 1/2] anon_inodes: use fops->owner for module refcount
> [PATCH 2/2] kvm: set owner of cpu and vm file operations
In the meantime patch 2 has a offset against the latest git. Should I resend
the patch or will you apply it anyway?
Christian
Christian Borntraeger wrote:
> Am Dienstag, 2. Dezember 2008 schrieb Christian Borntraeger:
>
>> Hello Avi,
>>
>> here is the latest respin of my fixes for the kvm module unload problem:
>>
>> [PATCH 1/2] anon_inodes: use fops->owner for module refcount
>> [PATCH 2/2] kvm: set owner of cpu and vm file operations
>>
>
> In the meantime patch 2 has a offset against the latest git. Should I resend
> the patch or will you apply it anyway?
>
Applied the patches; thanks for the reminder.
--
error compiling committee.c: too many arguments to function
On Monday 08 December 2008 19:57:14 Avi Kivity wrote:
> Christian Borntraeger wrote:
> > Am Dienstag, 2. Dezember 2008 schrieb Christian Borntraeger:
> >> Hello Avi,
> >>
> >> here is the latest respin of my fixes for the kvm module unload problem:
> >>
> >> [PATCH 1/2] anon_inodes: use fops->owner for module refcount
> >> [PATCH 2/2] kvm: set owner of cpu and vm file operations
> >
> > In the meantime patch 2 has a offset against the latest git. Should I
> > resend the patch or will you apply it anyway?
>
> Applied the patches; thanks for the reminder.
Should we push the first patch to 2.6.28? I got some trouble with the separate
2nd patch, for I am using Linus' tree and make KVM as modules, so the
reference count reduced to negative now... (Oh Avi, I know you suggest to use
in kernel rather than modules, but module is indeed convenient. :) )
--
regards
Yang, Sheng
Sheng Yang wrote:
> Should we push the first patch to 2.6.28?
It's not a recent regression, so no.
> I got some trouble with the separate
> 2nd patch, for I am using Linus' tree and make KVM as modules, so the
> reference count reduced to negative now... (Oh Avi, I know you suggest to use
> in kernel rather than modules, but module is indeed convenient. :) )
>
Right, that would affect everyone. What we need is to hack the second
patch for external modules on <2.6.29.
--
error compiling committee.c: too many arguments to function
Am Dienstag, 9. Dezember 2008 schrieb Avi Kivity:
> Sheng Yang wrote:
> > Should we push the first patch to 2.6.28?
>
> It's not a recent regression, so no.
>
> > I got some trouble with the separate
> > 2nd patch, for I am using Linus' tree and make KVM as modules, so the
> > reference count reduced to negative now... (Oh Avi, I know you suggest to
use
> > in kernel rather than modules, but module is indeed convenient. :) )
> >
>
> Right, that would affect everyone. What we need is to hack the second
> patch for external modules on <2.6.29.
Oh this is tricky. Both patches belong together, patch 2 depends on patch 1.
For base kernels which do not contain patch1, this additional (untested) patch
would probably help:
---
virt/kvm/kvm_main.c | 21 +++++++++++++++++----
1 file changed, 17 insertions(+), 4 deletions(-)
Index: kvm/virt/kvm/kvm_main.c
===================================================================
--- kvm.orig/virt/kvm/kvm_main.c
+++ kvm/virt/kvm/kvm_main.c
@@ -1501,9 +1501,15 @@ static struct file_operations kvm_vcpu_f
*/
static int create_vcpu_fd(struct kvm_vcpu *vcpu)
{
- int fd = anon_inode_getfd("kvm-vcpu", &kvm_vcpu_fops, vcpu, 0);
- if (fd < 0)
+ int fd;
+
+ if (!try_module_get(kvm_vcpu_fops.owner))
+ return -ENOENT;
+ fd = anon_inode_getfd("kvm-vcpu", &kvm_vcpu_fops, vcpu, 0);
+ if (fd < 0) {
kvm_put_kvm(vcpu->kvm);
+ module_put(kvm_vcpu_fops.owner);
+ }
return fd;
}
@@ -1895,12 +1901,19 @@ static int kvm_dev_ioctl_create_vm(void)
int fd;
struct kvm *kvm;
+ if (!try_module_get(kvm_vm_fops.owner))
+ return -ENOENT;
+
kvm = kvm_create_vm();
- if (IS_ERR(kvm))
+ if (IS_ERR(kvm)) {
+ module_put(kvm_vm_fops.owner);
return PTR_ERR(kvm);
+ }
fd = anon_inode_getfd("kvm-vm", &kvm_vm_fops, kvm, 0);
- if (fd < 0)
+ if (fd < 0) {
kvm_put_kvm(kvm);
+ module_put(kvm_vm_fops.owner);
+ }
return fd;
}
The problem is, how do you detect if the base kernel has patch1 applied?
Christian
Christian Borntraeger wrote:
> The problem is, how do you detect if the base kernel has patch1 applied?
>
In the external module compatibility kit, implement two versions of
anon_inode_getfd(), and select the appropriate one according to kernel
version (like we currently support 91 smp_call_function_single variants).
--
error compiling committee.c: too many arguments to function
Avi Kivity wrote:
> Christian Borntraeger wrote:
>> The problem is, how do you detect if the base kernel has patch1 applied?
>>
>
> In the external module compatibility kit, implement two versions of
> anon_inode_getfd(), and select the appropriate one according to kernel
> version (like we currently support 91 smp_call_function_single variants).
>
I committed something simpler: if running on 2.6.28 or earlier, I hack
out the two lines your second patch adds.\
--
error compiling committee.c: too many arguments to function
On Tuesday 09 December 2008 21:22:42 Avi Kivity wrote:
> Avi Kivity wrote:
> > Christian Borntraeger wrote:
> >> The problem is, how do you detect if the base kernel has patch1 applied?
> >
> > In the external module compatibility kit, implement two versions of
> > anon_inode_getfd(), and select the appropriate one according to kernel
> > version (like we currently support 91 smp_call_function_single variants).
>
> I committed something simpler: if running on 2.6.28 or earlier, I hack
> out the two lines your second patch adds.\
Good hack!
> diff --git a/kernel/external-module-compat-comm.h
> b/kernel/external-module-compat-comm.h index a089f62..d90522d 100644
> --- a/kernel/external-module-compat-comm.h
> +++ b/kernel/external-module-compat-comm.h
> @@ -682,3 +682,13 @@ static inline void cpumask_clear_cpu(int cpu,
> cpumask_var_t mask)
>
> #endif
>
> +#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,28)
But should it be 2,6,29?...
--
regards
Yang, Sheng
> +
> +#define IF_ANON_INODES_DOES_REFCOUNTS(x)
> +
> +#else
> +
> +#define IF_ANON_INODES_DOES_REFCOUNTS(x) x
> +
> +#endif
> +
Sheng Yang wrote:
>> +#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,28)
>>
>
> But should it be 2,6,29?...
>
>
Thanks, fixed.
--
error compiling committee.c: too many arguments to function