2021-12-22 22:53:56

by Vipin Sharma

Subject: [PATCH v2] KVM: Move VM's worker kthreads back to the original cgroups before exiting.

VM worker kthreads can linger in the VM process's cgroup for some time
after KVM terminates the VM process.

KVM terminates the worker kthreads by calling kthread_stop(), which
waits on the 'exited' completion that is triggered by exit_mm(), via
mm_release(), during the kthread's exit. However, the kthreads are
removed from the cgroup only by cgroup_exit(), which runs after
exit_mm(). A VM process can terminate in the window between exit_mm()
and cgroup_exit(), leaving only the worker kthreads in the cgroup.

Moving worker kthreads back to the original cgroup (kthreadd_task's
cgroup) makes sure that cgroup is empty as soon as the main VM process
is terminated.

kthreadd_task is not an exported symbol, which causes build errors if KVM
is built as a loadable module. Both users of cgroup_attach_task_all()
(kvm_main and vhost) have the same issue; therefore, kthreadd_task is
used as the default when the API is called with a NULL argument.

Signed-off-by: Vipin Sharma <[email protected]>
---

v2:
- Use kthreadd_task in the cgroup API to avoid build issue.

v1: https://lore.kernel.org/lkml/[email protected]/

kernel/cgroup/cgroup-v1.c | 5 +++++
virt/kvm/kvm_main.c | 15 ++++++++++++++-
2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
index 81c9e0685948..81d4b2f2acf0 100644
--- a/kernel/cgroup/cgroup-v1.c
+++ b/kernel/cgroup/cgroup-v1.c
@@ -51,6 +51,8 @@ bool cgroup1_ssid_disabled(int ssid)
* @from: attach to all cgroups of a given task
* @tsk: the task to be attached
*
+ * If @from is NULL then use kthreadd_task for finding the destination cgroups.
+ *
* Return: %0 on success or a negative errno code on failure
*/
int cgroup_attach_task_all(struct task_struct *from, struct task_struct *tsk)
@@ -58,6 +60,9 @@ int cgroup_attach_task_all(struct task_struct *from, struct task_struct *tsk)
struct cgroup_root *root;
int retval = 0;

+ if (!from)
+ from = kthreadd_task;
+
mutex_lock(&cgroup_mutex);
percpu_down_write(&cgroup_threadgroup_rwsem);
for_each_root(root) {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index b0f7e6eb00ff..f7504578c374 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5785,7 +5785,7 @@ static int kvm_vm_worker_thread(void *context)
init_context = NULL;

if (err)
- return err;
+ goto out;

/* Wait to be woken up by the spawner before proceeding. */
kthread_parkme();
@@ -5793,6 +5793,19 @@ static int kvm_vm_worker_thread(void *context)
if (!kthread_should_stop())
err = thread_fn(kvm, data);

+out:
+ /*
+ * We need to move the kthread back to its original cgroups, so that it
+ * doesn't linger in the cgroups of the user process after the user
+ * process has already terminated.
+ *
+ * kthread_stop() waits on 'exited' completion condition which is set
+ * in exit_mm(), via mm_release(), in do_exit(). However, kthread
+ * is removed from cgroups in the cgroup_exit() which is called after
+ * exit_mm(). This causes lingering of kthreads in cgroups after main
+ * VM process has finished.
+ */
+ WARN_ON(cgroup_attach_task_all(NULL, current));
return err;
}


base-commit: 5e4e84f1124aa02643833b7ea40abd5a8e964388
--
2.34.1.307.g9b7440fafd-goog



2021-12-28 17:17:48

by Sean Christopherson

Subject: Re: [PATCH v2] KVM: Move VM's worker kthreads back to the original cgroups before exiting.

On Wed, Dec 22, 2021, Vipin Sharma wrote:
> kthreadd_task is not an exported symbol, which causes build errors if KVM
> is built as a loadable module. Both users of cgroup_attach_task_all()
> (kvm_main and vhost) have the same issue; therefore, kthreadd_task is
> used as the default when the API is called with a NULL argument.

...

> diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
> index 81c9e0685948..81d4b2f2acf0 100644
> --- a/kernel/cgroup/cgroup-v1.c
> +++ b/kernel/cgroup/cgroup-v1.c
> @@ -51,6 +51,8 @@ bool cgroup1_ssid_disabled(int ssid)
> * @from: attach to all cgroups of a given task
> * @tsk: the task to be attached
> *
> + * If @from is NULL then use kthreadd_task for finding the destination cgroups.
> + *
> * Return: %0 on success or a negative errno code on failure
> */
> int cgroup_attach_task_all(struct task_struct *from, struct task_struct *tsk)
> @@ -58,6 +60,9 @@ int cgroup_attach_task_all(struct task_struct *from, struct task_struct *tsk)
> struct cgroup_root *root;
> int retval = 0;
>
> + if (!from)
> + from = kthreadd_task;

Rather than sully cgroup_attach_task_all() with this behavior, can't KVM do

cgroup_attach_task_all(current->real_parent, current)

since AFAICT real_parent is guaranteed to point at kthreadd_task.

> +
> mutex_lock(&cgroup_mutex);
> percpu_down_write(&cgroup_threadgroup_rwsem);
> for_each_root(root) {
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index b0f7e6eb00ff..f7504578c374 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -5785,7 +5785,7 @@ static int kvm_vm_worker_thread(void *context)
> init_context = NULL;
>
> if (err)
> - return err;
> + goto out;
>
> /* Wait to be woken up by the spawner before proceeding. */
> kthread_parkme();
> @@ -5793,6 +5793,19 @@ static int kvm_vm_worker_thread(void *context)
> if (!kthread_should_stop())
> err = thread_fn(kvm, data);
>
> +out:
> + /*
> + * We need to move the kthread back to its original cgroups, so that it

Please state what is being done, not what "needs" to be done. The need to do
something is implicit, otherwise we wouldn't be doing it.

> + * doesn't linger in the cgroups of the user process after the user
> + * process has already terminated.
> + *
> + * kthread_stop() waits on 'exited' completion condition which is set
> + * in exit_mm(), via mm_release(), in do_exit(). However, kthread
> + * is removed from cgroups in the cgroup_exit() which is called after
> + * exit_mm(). This causes lingering of kthreads in cgroups after main
> + * VM process has finished.
> + */
> + WARN_ON(cgroup_attach_task_all(NULL, current));

This should not WARN, cgroup_attach_task_all() needs to perform allocations and
will fail with -ENOMEM even in the absence of kernel bugs.

> return err;
> }
>
>
> base-commit: 5e4e84f1124aa02643833b7ea40abd5a8e964388
> --
> 2.34.1.307.g9b7440fafd-goog
>

2022-01-05 18:05:01

by Michal Koutný

Subject: Re: [PATCH v2] KVM: Move VM's worker kthreads back to the original cgroups before exiting.

Hi Vipin.

On Wed, Dec 22, 2021 at 10:53:50PM +0000, Vipin Sharma <[email protected]> wrote:
> VM worker kthreads can linger in the VM process's cgroup for some time
> after KVM terminates the VM process.

Why is it a problem? And how long are we talking about?

> A VM process can terminate in the window between exit_mm() and
> cgroup_exit(), leaving only the worker kthreads in the cgroup.

Even kthreads should eventually have PF_EXITING set; they should be
treated as "user-space" zombies by cgroups, i.e. mostly invisible (e.g.
it doesn't prevent rmdir'ing the cgroup).

(And after the last task_struct reference is gone, the cgroup structs
can be released too. Maybe the cause is holding the reference to the KVM
worker thread somewhere for too long.)

> Moving worker kthreads back to the original cgroup (kthreadd_task's
> cgroup) makes sure that cgroup is empty as soon as the main VM process
> is terminated.

BTW this used to be done for "user-space" tasks too (migrate to root
cgroup) but it was replaced with the less transactional "ignore zombies"
approach. So this change seems inconsistent.


Regards,
Michal

2022-01-21 06:59:21

by Vipin Sharma

Subject: Re: [PATCH v2] KVM: Move VM's worker kthreads back to the original cgroups before exiting.

On Tue, Dec 28, 2021 at 9:17 AM Sean Christopherson <[email protected]> wrote:
>
> On Wed, Dec 22, 2021, Vipin Sharma wrote:
> > diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
> > index 81c9e0685948..81d4b2f2acf0 100644
> > --- a/kernel/cgroup/cgroup-v1.c
> > +++ b/kernel/cgroup/cgroup-v1.c
> > int cgroup_attach_task_all(struct task_struct *from, struct task_struct *tsk)
> > @@ -58,6 +60,9 @@ int cgroup_attach_task_all(struct task_struct *from, struct task_struct *tsk)
> > struct cgroup_root *root;
> > int retval = 0;
> >
> > + if (!from)
> > + from = kthreadd_task;
>
> Rather than sully cgroup_attach_task_all() with this behavior, can't KVM do
>
> cgroup_attach_task_all(current->real_parent, current)
>
> since AFAICT real_parent is guaranteed to point at kthreadd_task.
>

Thanks for the "real_parent" suggestion. This is much cleaner and
better than changing cgroup logic. I will make this change.

> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index b0f7e6eb00ff..f7504578c374 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -5793,6 +5793,19 @@ static int kvm_vm_worker_thread(void *context)
> > if (!kthread_should_stop())
> > err = thread_fn(kvm, data);
> >
> > +out:
> > + /*
> > + * We need to move the kthread back to its original cgroups, so that it
>
> Please state what is being done, not what "needs" to be done. The need to do
> something is implicit, otherwise we wouldn't be doing it.
>

I will update the statement. Thanks.

> > + * doesn't linger in the cgroups of the user process after the user
> > + * process has already terminated.
> > + *
> > + * kthread_stop() waits on 'exited' completion condition which is set
> > + * in exit_mm(), via mm_release(), in do_exit(). However, kthread
> > + * is removed from cgroups in the cgroup_exit() which is called after
> > + * exit_mm(). This causes lingering of kthreads in cgroups after main
> > + * VM process has finished.
> > + */
> > + WARN_ON(cgroup_attach_task_all(NULL, current));
>
> This should not WARN, cgroup_attach_task_all() needs to perform allocations and
> will fail with -ENOMEM even in the absence of kernel bugs.
>

I will remove the WARN_ON and print an error using kvm_err(); it will be
similar to the earlier call of cgroup_attach_task_all() in the same
function.

Thanks

2022-01-21 07:38:09

by Vipin Sharma

Subject: Re: [PATCH v2] KVM: Move VM's worker kthreads back to the original cgroups before exiting.

Hi Michal,

Thanks for looking into this patch. I will use Sean's suggestion and
take the real_parent of the task, which avoids my ugly code in the
cgroup APIs.

On Wed, Jan 5, 2022 at 10:04 AM Michal Koutný <[email protected]> wrote:
>
> Hi Vipin.
>
> On Wed, Dec 22, 2021 at 10:53:50PM +0000, Vipin Sharma <[email protected]> wrote:
> > VM worker kthreads can linger in the VM process's cgroup for some time
> > after KVM terminates the VM process.
>
> Why is it a problem? And how long are we talking about?
>

Automated tools/scripts which delete VM cgroups after the main KVM
process ends were seeing deletion errors because kernel worker threads
were still running inside those cgroups. This is not a very frequent
issue but we noticed it happens every now and then.


> > A VM process can terminate in the window between exit_mm() and
> > cgroup_exit(), leaving only the worker kthreads in the cgroup.
>
> Even kthreads should eventually have PF_EXITING set; they should be
> treated as "user-space" zombies by cgroups, i.e. mostly invisible (e.g.
> it doesn't prevent rmdir'ing the cgroup).
>

Since that eventual time period is not known, we can either pause the
script for some time before starting the cleanup or add some number of
retries. Neither is preferable, due to its nondeterministic nature.

> (And after the last task_struct reference is gone, the cgroup structs
> can be released too. Maybe the cause is holding the reference to the KVM
> worker thread somewhere for too long.)
>
> > Moving worker kthreads back to the original cgroup (kthreadd_task's
> > cgroup) makes sure that cgroup is empty as soon as the main VM process
> > is terminated.
>
> BTW this used to be done for "user-space" tasks too (migrate to root
> cgroup) but it was replaced with the less transactional "ignore zombies"
> approach. So this change seems inconsistent.
>
Interesting; however, until PF_EXITING is set those threads will still
exhibit the same behavior when it comes to deletion. Thanks for the
background, good to know.

Thanks

2022-01-21 07:38:17

by Tejun Heo

Subject: Re: [PATCH v2] KVM: Move VM's worker kthreads back to the original cgroups before exiting.

Hello,

On Tue, Jan 18, 2022 at 12:25:48PM -0800, Vipin Sharma wrote:
> Automated tools/scripts which delete VM cgroups after the main KVM
> process ends were seeing deletion errors because kernel worker threads
> were still running inside those cgroups. This is not a very frequent
> issue but we noticed it happens every now and then.

So, these are normally driven by the !populated events. That's how everyone
else is doing it. If you want to tie the kvm workers lifetimes to kvm
process, wouldn't it be cleaner to do so from kvm side? ie. let kvm process
exit wait for the workers to be cleaned up.

Thanks.

--
tejun

2022-01-21 19:54:34

by Paolo Bonzini

Subject: Re: [PATCH v2] KVM: Move VM's worker kthreads back to the original cgroups before exiting.

On 1/18/22 21:39, Tejun Heo wrote:
> So, these are normally driven by the !populated events. That's how everyone
> else is doing it. If you want to tie the kvm workers lifetimes to kvm
> process, wouldn't it be cleaner to do so from kvm side? ie. let kvm process
> exit wait for the workers to be cleaned up.

It does. For example kvm_mmu_post_init_vm's call to
kvm_vm_create_worker_thread is matched with the call to
kthread_stop in kvm_mmu_pre_destroy_vm.

According to Vipin, the problem is that there's a small amount of time
between the return from kthread_stop and the point where the cgroup
can be removed. My understanding of the race is the following:

user process                     kthread                    management
------------                     -------                    ----------
                                                            wait4()
exit_task_work()
  ____fput()
    kvm_mmu_pre_destroy_vm()
      kthread_stop();
        wait_for_completion();
                                 exit_signals()
                                   /* set PF_EXITING */
                                 exit_mm()
                                   exit_mm_release()
                                     complete_vfork_done()
                                       complete();
cgroup_exit()
  cgroup_set_move_task()
    css_set_update_populated()
exit_notify()
  do_notify_parent()
                                                            <wakeup>
                                                            rmdir()
                                                              cgroup_destroy_locked()
                                                                cgroup_is_populated()
                                                                  return -EBUSY
                                 cgroup_exit()
                                   cgroup_set_move_task()
                                     css_set_update_populated()

I cannot find the code that makes it possible to rmdir a cgroup
if PF_EXITING is set.

Paolo

2022-01-21 19:56:09

by Tejun Heo

Subject: Re: [PATCH v2] KVM: Move VM's worker kthreads back to the original cgroups before exiting.

On Wed, Jan 19, 2022 at 07:02:53PM +0100, Paolo Bonzini wrote:
> On 1/18/22 21:39, Tejun Heo wrote:
> > So, these are normally driven by the !populated events. That's how everyone
> > else is doing it. If you want to tie the kvm workers lifetimes to kvm
> > process, wouldn't it be cleaner to do so from kvm side? ie. let kvm process
> > exit wait for the workers to be cleaned up.
>
> It does. For example kvm_mmu_post_init_vm's call to
> kvm_vm_create_worker_thread is matched with the call to
> kthread_stop in kvm_mmu_pre_destroy_vm.
> > According to Vipin, the problem is that there's a small amount of time
> between the return from kthread_stop and the point where the cgroup
> can be removed. My understanding of the race is the following:

Okay, this is because kthread_stop piggybacks on vfork_done to wait for the
task exit instead of the usual exit notification, so it only waits till
exit_mm(), which is uhh... weird. So, migrating is one option, I guess,
albeit a rather ugly one. It'd be nicer if we can make kthread_stop()
waiting more regular but I couldn't find a good existing place and routing
the usual parent signaling might be too complicated. Anyone has better
ideas?

Thanks.

--
tejun

2022-01-21 19:57:43

by Tejun Heo

Subject: Re: [PATCH v2] KVM: Move VM's worker kthreads back to the original cgroups before exiting.

On Wed, Jan 19, 2022 at 10:49:57AM -0800, Vipin Sharma wrote:
> Sean suggested that we can use the real_parent of the kthread task,
> which will always be kthreadd_task; this will also not require any
> changes in the cgroup API. I like that approach and will give it a try.

Yeah, that's better than the original but still not great in that it's still
a workaround and just pushes up the problem. You can get the same race if
the cgroups are nested. e.g. if you have a kvm instance under a/b and when
kvm exits, the management software removes b and then realizes that a is
empty too and then tries to delete that too.

It'd be great if we can make kthread_stop actually wait for what most others
consider thread exit but if we're just gonna work around them, just doing it
in userspace might be better - e.g. after kvm exits, wait for !populated
event (this is a pollable event) on the cgroup and then clean up.

Thanks.

--
tejun

2022-01-21 19:58:19

by Vipin Sharma

Subject: Re: [PATCH v2] KVM: Move VM's worker kthreads back to the original cgroups before exiting.

On Wed, Jan 19, 2022 at 10:30 AM Tejun Heo <[email protected]> wrote:
>
> On Wed, Jan 19, 2022 at 07:02:53PM +0100, Paolo Bonzini wrote:
> > On 1/18/22 21:39, Tejun Heo wrote:
> > > So, these are normally driven by the !populated events. That's how everyone
> > > else is doing it. If you want to tie the kvm workers lifetimes to kvm
> > > process, wouldn't it be cleaner to do so from kvm side? ie. let kvm process
> > > exit wait for the workers to be cleaned up.
> >
> > It does. For example kvm_mmu_post_init_vm's call to
> > kvm_vm_create_worker_thread is matched with the call to
> > kthread_stop in kvm_mmu_pre_destroy_vm.
> > According to Vipin, the problem is that there's a small amount of time
> > between the return from kthread_stop and the point where the cgroup
> > can be removed. My understanding of the race is the following:
>
> Okay, this is because kthread_stop piggybacks on vfork_done to wait for the
> task exit instead of the usual exit notification, so it only waits till
> exit_mm(), which is uhh... weird. So, migrating is one option, I guess,
> albeit a rather ugly one. It'd be nicer if we can make kthread_stop()
> waiting more regular but I couldn't find a good existing place and routing
> the usual parent signaling might be too complicated. Anyone has better
> ideas?
>
Sean suggested that we can use the real_parent of the kthread task,
which will always be kthreadd_task; this will also not require any
changes in the cgroup API. I like that approach and will give it a try.

2022-01-21 22:18:44

by Michal Koutný

Subject: Re: [PATCH v2] KVM: Move VM's worker kthreads back to the original cgroups before exiting.

On Wed, Jan 19, 2022 at 08:30:43AM -1000, Tejun Heo <[email protected]> wrote:
> It'd be nicer if we can make kthread_stop() waiting more regular but I
> couldn't find a good existing place and routing the usual parent
> signaling might be too complicated. Anyone has better ideas?

The regular way is pictured in Paolo's diagram already, the
exit_notify/do_signal_parent -> wait4 path.

Actually, I see that a kernel_wait() already exists and is used by a
UMH wrapper kthread. kthreadd issues ignore_signals(), so (besides
there being no well-defined point for signalling a kthread) the signal
notification is moot and only waking up the waiter is relevant. So
kthread_stop() could wait via kernel_wait() based on the pid (extracted
from the task_struct).

Have I missed an obstacle?


Michal

2022-02-16 22:48:42

by Vipin Sharma

Subject: Re: [PATCH v2] KVM: Move VM's worker kthreads back to the original cgroups before exiting.

Hi Paolo, Michal

Paolo:
Will you accept a patch which uses real_parent in
kvm_vm_worker_thread(), as suggested by Sean, while I figure out
Michal's recommendation about making kthread_stop() wait via
kernel_wait()?
cgroup_attach_task_all(current->real_parent, current)

Michal:

On Thu, Jan 20, 2022 at 7:05 AM Michal Koutný <[email protected]> wrote:
>
> On Wed, Jan 19, 2022 at 08:30:43AM -1000, Tejun Heo <[email protected]> wrote:
> > It'd be nicer if we can make kthread_stop() waiting more regular but I
> > couldn't find a good existing place and routing the usual parent
> > signaling might be too complicated. Anyone has better ideas?
>
> The regular way is pictured in Paolo's diagram already, the
> exit_notify/do_signal_parent -> wait4 path.
>
> Actually, I see that a kernel_wait() already exists and is used by a
> UMH wrapper kthread. kthreadd issues ignore_signals(), so (besides
> there being no well-defined point for signalling a kthread) the signal
> notification is moot and only waking up the waiter is relevant. So
> kthread_stop() could wait via kernel_wait() based on the pid (extracted
> from the task_struct).
>
> Have I missed an obstacle?
>

I must admit I do not have a good understanding of the kernel_wait()
and kthread_stop() APIs. I tried making some changes in kthread_stop()
but was not able to use the API successfully. I tested it by writing a
test module where, during init, I start a kthread that prints a message
every few seconds, and during module exit I call kthread_stop(). The
module worked as intended without the kernel_wait() changes in
kthread_stop().

My changes basically replaced the wait_for_completion() call with a
kernel_wait() call:

@@ -645,8 +645,9 @@ int kthread_stop(struct task_struct *k)
set_bit(KTHREAD_SHOULD_STOP, &kthread->flags);
kthread_unpark(k);
wake_up_process(k);
- wait_for_completion(&kthread->exited);
- ret = k->exit_code;
+ kernel_wait(k->pid, &ret);
+// kernel_wait(task_pid_vnr(k), &ret);
+// wait_for_completion(&kthread->exited);
+// ret = k->exit_code;
put_task_struct(k);

I tried a few other combinations, e.g. putting the kernel_wait() call
after the put_task_struct(k) call.

Every time during the module exit, kernel was crashing like:

[ 285.014612] RIP: 0010:0xffffffffc04ed074
[ 285.018537] RSP: 0018:ffff9ccdc8365ee8 EFLAGS: 00010246
[ 285.023761] RAX: 0000000000000000 RBX: 0000000000000012 RCX: 0000000000000001
[ 285.030896] RDX: 0000000000000000 RSI: 0000000000000286 RDI: ffff9cce3f7d9cc0
[ 285.038028] RBP: ffff9ccdc8365ef8 R08: 0000000000000000 R09: 0000000000015504
[ 285.045160] R10: 000000000000004b R11: ffffffff8dd92880 R12: 0000000000000012
[ 285.052293] R13: ffff9ccdc813db90 R14: ffff9ccdc7e66240 R15: ffffffffc04ed000
[ 285.059425] FS: 0000000000000000(0000) GS:ffff9cce3f7c0000(0000)
knlGS:0000000000000000
[ 285.067510] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 285.073258] CR2: ffffffffc04ed074 CR3: 000000c07f20e002 CR4: 0000000000362ef0
[ 285.080390] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 285.087522] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 285.094656] Call Trace:
[ 285.097112] kthread+0x148/0x1b0
[ 285.100343] ? kthread_blkcg+0x30/0x30
[ 285.104096] ret_from_fork+0x3a/0x60
[ 285.107671] Code: Bad RIP value.
[ 285.107671] IP: 0xffffffffc04ecff4:

The crash is not observed if I keep wait_for_completion(&kthread->exited)
along with kernel_wait(), but I guess kernel_wait() should be
sufficient by itself once I figure out the proper way to use it.

Do you have any suggestions what might be the right way to use this API?

Thanks
Vipin

2022-02-17 05:28:47

by Paolo Bonzini

Subject: Re: [PATCH v2] KVM: Move VM's worker kthreads back to the original cgroups before exiting.

On 2/16/22 18:37, Vipin Sharma wrote:
> Paolo:
> Will you accept a patch which uses real_parent in
> kvm_vm_worker_thread() as suggested by Sean, while I figure out the
> recommendation from Michal about making kthread_stop() wait on
> kernel_wait()?
> cgroup_attach_task_all(current->real_parent, current)

Yes, of course.

Paolo

2022-02-26 02:15:04

by Michal Koutný

Subject: Re: [PATCH v2] KVM: Move VM's worker kthreads back to the original cgroups before exiting.

Hi Vipin.

On Wed, Feb 16, 2022 at 09:37:45AM -0800, Vipin Sharma <[email protected]> wrote:
> On Thu, Jan 20, 2022 at 7:05 AM Michal Koutný <[email protected]> wrote:
> > Have I missed an obstacle?

Aha...

[
> I used few other combination where I put kernel_wait() call after
> put_task_struct(k) call.
>
> Every time during the module exit, kernel was crashing like:

Thanks for trying this out.
]

> Do you have any suggestions what might be the right way to use this API?

...it has occurred to me now -- the KVM kthread is not a child of the
user task that wants to wait. So kernel_wait() silently fails with
-ECHILD, the task_struct is released too early, and that (probably)
brings about the crash.

I'm sorry for not realizing that initially.

(Generally, any kthread_create'd task would be affected by this. I guess
the KVM worker threads can't be forked from the kvm_create_vm() callers?
(That could prevent the double migration to and from the caller's
cgroup, though.))

Michal