Hello,
syzbot found the following issue on:
HEAD commit: 71ab9c3e2253 net: fix UaF in netns ops registration error ..
git tree: net
console output: https://syzkaller.appspot.com/x/log.txt?x=10f86a56480000
kernel config: https://syzkaller.appspot.com/x/.config?x=899d86a7610a0ea0
dashboard link: https://syzkaller.appspot.com/bug?extid=6cd18e123583550cf469
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
Unfortunately, I don't have any reproducer for this issue yet.
Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/54c953096fdb/disk-71ab9c3e.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/644d72265810/vmlinux-71ab9c3e.xz
kernel image: https://storage.googleapis.com/syzbot-assets/16e994579ca5/bzImage-71ab9c3e.xz
IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]
------------[ cut here ]------------
DEBUG_LOCKS_WARN_ON(1)
WARNING: CPU: 0 PID: 46 at kernel/locking/lockdep.c:231 hlock_class kernel/locking/lockdep.c:231 [inline]
WARNING: CPU: 0 PID: 46 at kernel/locking/lockdep.c:231 hlock_class kernel/locking/lockdep.c:220 [inline]
WARNING: CPU: 0 PID: 46 at kernel/locking/lockdep.c:231 check_wait_context kernel/locking/lockdep.c:4729 [inline]
WARNING: CPU: 0 PID: 46 at kernel/locking/lockdep.c:231 __lock_acquire+0x13ae/0x56d0 kernel/locking/lockdep.c:5005
Modules linked in:
CPU: 0 PID: 46 Comm: kworker/u4:3 Not tainted 6.2.0-rc4-syzkaller-00191-g71ab9c3e2253 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/12/2023
Workqueue: events_unbound call_usermodehelper_exec_work
RIP: 0010:hlock_class kernel/locking/lockdep.c:231 [inline]
RIP: 0010:hlock_class kernel/locking/lockdep.c:220 [inline]
RIP: 0010:check_wait_context kernel/locking/lockdep.c:4729 [inline]
RIP: 0010:__lock_acquire+0x13ae/0x56d0 kernel/locking/lockdep.c:5005
Code: 08 84 d2 0f 85 56 41 00 00 8b 15 55 7b 0f 0d 85 d2 0f 85 d5 fd ff ff 48 c7 c6 c0 51 4c 8a 48 c7 c7 20 4b 4c 8a e8 92 e1 5b 08 <0f> 0b 31 ed e9 50 f0 ff ff 48 c7 c0 a8 2b 73 8e 48 ba 00 00 00 00
RSP: 0018:ffffc90000b77a30 EFLAGS: 00010082
RAX: 0000000000000000 RBX: ffff88801272a778 RCX: 0000000000000000
RDX: ffff888012729d40 RSI: ffffffff8166822c RDI: fffff5200016ef38
RBP: 0000000000001b2c R08: 0000000000000005 R09: 0000000000000000
R10: 0000000080000001 R11: 0000000000000001 R12: ffff88801272a7c8
R13: ffff888012729d40 R14: ffffc9000397fb28 R15: 0000000000040000
FS: 0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa5a95e81d0 CR3: 00000000284d2000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
lock_acquire kernel/locking/lockdep.c:5668 [inline]
lock_acquire+0x1e3/0x630 kernel/locking/lockdep.c:5633
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x3d/0x60 kernel/locking/spinlock.c:162
complete+0x1d/0x1f0 kernel/sched/completion.c:32
umh_complete+0x32/0x90 kernel/umh.c:59
call_usermodehelper_exec_sync kernel/umh.c:144 [inline]
call_usermodehelper_exec_work+0x115/0x180 kernel/umh.c:167
process_one_work+0x9bf/0x1710 kernel/workqueue.c:2289
worker_thread+0x669/0x1090 kernel/workqueue.c:2436
kthread+0x2e8/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
</TASK>
---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].
syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
On 2023/01/27 10:41, Hillf Danton wrote:
> On Thu, 26 Jan 2023 14:27:51 -0800
>> syzbot found the following issue on:
>>
>> HEAD commit: 71ab9c3e2253 net: fix UaF in netns ops registration error ..
>> git tree: net
>> console output: https://syzkaller.appspot.com/x/log.txt?x=10f86a56480000
>> kernel config: https://syzkaller.appspot.com/x/.config?x=899d86a7610a0ea0
>> dashboard link: https://syzkaller.appspot.com/bug?extid=6cd18e123583550cf469
>> compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
>>
>> Unfortunately, I don't have any reproducer for this issue yet.
>>
>> Downloadable assets:
>> disk image: https://storage.googleapis.com/syzbot-assets/54c953096fdb/disk-71ab9c3e.raw.xz
>> vmlinux: https://storage.googleapis.com/syzbot-assets/644d72265810/vmlinux-71ab9c3e.xz
>> kernel image: https://storage.googleapis.com/syzbot-assets/16e994579ca5/bzImage-71ab9c3e.xz
>>
>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>> Reported-by: [email protected]
>>
>> ------------[ cut here ]------------
>> DEBUG_LOCKS_WARN_ON(1)
>> WARNING: CPU: 0 PID: 46 at kernel/locking/lockdep.c:231 hlock_class kernel/locking/lockdep.c:231 [inline]
>> WARNING: CPU: 0 PID: 46 at kernel/locking/lockdep.c:231 hlock_class kernel/locking/lockdep.c:220 [inline]
>> WARNING: CPU: 0 PID: 46 at kernel/locking/lockdep.c:231 check_wait_context kernel/locking/lockdep.c:4729 [inline]
>> WARNING: CPU: 0 PID: 46 at kernel/locking/lockdep.c:231 __lock_acquire+0x13ae/0x56d0 kernel/locking/lockdep.c:5005
>> Modules linked in:
>> CPU: 0 PID: 46 Comm: kworker/u4:3 Not tainted 6.2.0-rc4-syzkaller-00191-g71ab9c3e2253 #0
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/12/2023
>> Workqueue: events_unbound call_usermodehelper_exec_work
>> RIP: 0010:hlock_class kernel/locking/lockdep.c:231 [inline]
>> RIP: 0010:hlock_class kernel/locking/lockdep.c:220 [inline]
>> RIP: 0010:check_wait_context kernel/locking/lockdep.c:4729 [inline]
>> RIP: 0010:__lock_acquire+0x13ae/0x56d0 kernel/locking/lockdep.c:5005
>> Code: 08 84 d2 0f 85 56 41 00 00 8b 15 55 7b 0f 0d 85 d2 0f 85 d5 fd ff ff 48 c7 c6 c0 51 4c 8a 48 c7 c7 20 4b 4c 8a e8 92 e1 5b 08 <0f> 0b 31 ed e9 50 f0 ff ff 48 c7 c0 a8 2b 73 8e 48 ba 00 00 00 00
>> RSP: 0018:ffffc90000b77a30 EFLAGS: 00010082
>> RAX: 0000000000000000 RBX: ffff88801272a778 RCX: 0000000000000000
>> RDX: ffff888012729d40 RSI: ffffffff8166822c RDI: fffff5200016ef38
>> RBP: 0000000000001b2c R08: 0000000000000005 R09: 0000000000000000
>> R10: 0000000080000001 R11: 0000000000000001 R12: ffff88801272a7c8
>> R13: ffff888012729d40 R14: ffffc9000397fb28 R15: 0000000000040000
>> FS: 0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 00007fa5a95e81d0 CR3: 00000000284d2000 CR4: 00000000003506f0
>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> Call Trace:
>> <TASK>
>> lock_acquire kernel/locking/lockdep.c:5668 [inline]
>> lock_acquire+0x1e3/0x630 kernel/locking/lockdep.c:5633
>> __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
>> _raw_spin_lock_irqsave+0x3d/0x60 kernel/locking/spinlock.c:162
>> complete+0x1d/0x1f0 kernel/sched/completion.c:32
>> umh_complete+0x32/0x90 kernel/umh.c:59
>> call_usermodehelper_exec_sync kernel/umh.c:144 [inline]
>> call_usermodehelper_exec_work+0x115/0x180 kernel/umh.c:167
>> process_one_work+0x9bf/0x1710 kernel/workqueue.c:2289
>> worker_thread+0x669/0x1090 kernel/workqueue.c:2436
>> kthread+0x2e8/0x3a0 kernel/kthread.c:376
>> ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
>> </TASK>
>
> This is an interesting case - given done initialized on stack, no garbage
> should have been detected by lockdep.
>
> One explanation to the report is uaf on the waker side, and it can be
> tested with the diff below when a reproducer is available.
>
> Hillf
>
> --- a/kernel/umh.c
> +++ b/kernel/umh.c
> @@ -452,6 +452,12 @@ int call_usermodehelper_exec(struct subp
> /* umh_complete() will see NULL and free sub_info */
> if (xchg(&sub_info->complete, NULL))
> goto unlock;
> + else {
> + /* wait for umh_complete() to finish in a bid to avoid
> + * uaf because done is destructed
> + */
> + wait_for_completion(&done);
> + }
> }
>
> wait_done:
> --
Yes, this bug is caused by commit f5d39b020809 ("freezer,sched: Rewrite core freezer
logic"), for that commit for unknown reason omits wait_for_completion(&done) call
when wait_for_completion_state(&done, state) returned -ERESTARTSYS.
Peter, is it safe to restore wait_for_completion(&done) call?
On 2023/02/03 19:22, Tetsuo Handa wrote:
> Yes, this bug is caused by commit f5d39b020809 ("freezer,sched: Rewrite core freezer
> logic"), for that commit for unknown reason omits wait_for_completion(&done) call
> when wait_for_completion_state(&done, state) returned -ERESTARTSYS.
>
> Peter, is it safe to restore wait_for_completion(&done) call?
>
Something like this?
diff --git a/kernel/umh.c b/kernel/umh.c
index 850631518665..97230edb1849 100644
--- a/kernel/umh.c
+++ b/kernel/umh.c
@@ -441,8 +441,8 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait)
if (wait & UMH_KILLABLE)
state |= TASK_KILLABLE;
- if (wait & UMH_FREEZABLE)
- state |= TASK_FREEZABLE;
+ //if (wait & UMH_FREEZABLE)
+ // state |= TASK_FREEZABLE;
retval = wait_for_completion_state(&done, state);
if (!retval)
@@ -452,7 +452,9 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait)
/* umh_complete() will see NULL and free sub_info */
if (xchg(&sub_info->complete, NULL))
goto unlock;
+ /* fallthrough, umh_complete() was already called */
}
+ wait_for_completion(&done);
wait_done:
retval = sub_info->retval;
How does TASK_FREEZABLE affect here? Since call_usermodehelper_exec() is a function
for starting and waiting for termination of a userspace process (which is subjected
to freezing), the caller of call_usermodehelper_exec() can't wait for the termination
of that userspace process if that process was frozen, and wait_for_completion()
blocks forever?
On Fri, Feb 03, 2023 at 07:22:43PM +0900, Tetsuo Handa wrote:
> On 2023/01/27 10:41, Hillf Danton wrote:
> >> Call Trace:
> >> <TASK>
> >> lock_acquire kernel/locking/lockdep.c:5668 [inline]
> >> lock_acquire+0x1e3/0x630 kernel/locking/lockdep.c:5633
> >> __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
> >> _raw_spin_lock_irqsave+0x3d/0x60 kernel/locking/spinlock.c:162
> >> complete+0x1d/0x1f0 kernel/sched/completion.c:32
> >> umh_complete+0x32/0x90 kernel/umh.c:59
> >> call_usermodehelper_exec_sync kernel/umh.c:144 [inline]
> >> call_usermodehelper_exec_work+0x115/0x180 kernel/umh.c:167
> >> process_one_work+0x9bf/0x1710 kernel/workqueue.c:2289
> >> worker_thread+0x669/0x1090 kernel/workqueue.c:2436
> >> kthread+0x2e8/0x3a0 kernel/kthread.c:376
> >> ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
> >> </TASK>
> >
> > This is an interesting case - given done initialized on stack, no garbage
> > should have been detected by lockdep.
> >
> > One explanation to the report is uaf on the waker side, and it can be
> > tested with the diff below when a reproducer is available.
> >
> > Hillf
> >
> > --- a/kernel/umh.c
> > +++ b/kernel/umh.c
> > @@ -452,6 +452,12 @@ int call_usermodehelper_exec(struct subp
> > /* umh_complete() will see NULL and free sub_info */
> > if (xchg(&sub_info->complete, NULL))
> > goto unlock;
> > + else {
> > + /* wait for umh_complete() to finish in a bid to avoid
> > + * uaf because done is destructed
> > + */
Invalid comment style at the very least.
> > + wait_for_completion(&done);
> > + }
> > }
> >
> > wait_done:
> > --
>
> Yes, this bug is caused by commit f5d39b020809 ("freezer,sched: Rewrite core freezer
> logic"), for that commit for unknown reason omits wait_for_completion(&done) call
> when wait_for_completion_state(&done, state) returned -ERESTARTSYS.
>
> Peter, is it safe to restore wait_for_completion(&done) call?
Urgh, that code is terrible.. the way I read it was that it would
wait_for_completion_killable() if KILLABLE and assumed the
second wait_for_completion() would NOP out because we'd already
completed on the first.
I don't see how adding a second wait is correct in the case of
-ERESTARTSYS, what's the stop this second wait to also get interrupted
like that?
Should there be a loop?
On Fri, Feb 03, 2023 at 07:48:35PM +0900, Tetsuo Handa wrote:
> On 2023/02/03 19:22, Tetsuo Handa wrote:
> > Yes, this bug is caused by commit f5d39b020809 ("freezer,sched: Rewrite core freezer
> > logic"), for that commit for unknown reason omits wait_for_completion(&done) call
> > when wait_for_completion_state(&done, state) returned -ERESTARTSYS.
> >
> > Peter, is it safe to restore wait_for_completion(&done) call?
> >
>
> Something like this?
>
> diff --git a/kernel/umh.c b/kernel/umh.c
> index 850631518665..97230edb1849 100644
> --- a/kernel/umh.c
> +++ b/kernel/umh.c
> @@ -441,8 +441,8 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait)
> if (wait & UMH_KILLABLE)
> state |= TASK_KILLABLE;
>
> - if (wait & UMH_FREEZABLE)
> - state |= TASK_FREEZABLE;
> + //if (wait & UMH_FREEZABLE)
> + // state |= TASK_FREEZABLE;
>
> retval = wait_for_completion_state(&done, state);
> if (!retval)
> @@ -452,7 +452,9 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait)
> /* umh_complete() will see NULL and free sub_info */
> if (xchg(&sub_info->complete, NULL))
> goto unlock;
> + /* fallthrough, umh_complete() was already called */
> }
> + wait_for_completion(&done);
>
> wait_done:
> retval = sub_info->retval;
>
> How does TASK_FREEZABLE affect here?
It marks those waits that are safe to convert to a frozen state.
> Since call_usermodehelper_exec() is a function for starting and
> waiting for termination of a userspace process (which is subjected to
> freezing), the caller of call_usermodehelper_exec() can't wait for the
> termination of that userspace process if that process was frozen, and
> wait_for_completion()
> blocks forever?
It'll probably make the freeze fail and abort the suspend. We first
freezer userspace (including the helper), then we try and freeze all the
kernel threads. If we can't, we error out and abort -- waking everything
back up.
But now I realize what I missed before, wait_for_completion() it not
interruptible.
I think the right fix is to:
state &= ~TASK_KILLABLE;
wait_for_completion_state(&done, state);
Also, put in a comment..
On Fri, Feb 03, 2023 at 01:19:53PM +0100, Peter Zijlstra wrote:
> On Fri, Feb 03, 2023 at 07:48:35PM +0900, Tetsuo Handa wrote:
> > On 2023/02/03 19:22, Tetsuo Handa wrote:
> > > Yes, this bug is caused by commit f5d39b020809 ("freezer,sched: Rewrite core freezer
> > > logic"), for that commit for unknown reason omits wait_for_completion(&done) call
> > > when wait_for_completion_state(&done, state) returned -ERESTARTSYS.
> > >
> > > Peter, is it safe to restore wait_for_completion(&done) call?
> > >
> >
> > Something like this?
> >
> > diff --git a/kernel/umh.c b/kernel/umh.c
> > index 850631518665..97230edb1849 100644
> > --- a/kernel/umh.c
> > +++ b/kernel/umh.c
> > @@ -441,8 +441,8 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait)
> > if (wait & UMH_KILLABLE)
> > state |= TASK_KILLABLE;
> >
> > - if (wait & UMH_FREEZABLE)
> > - state |= TASK_FREEZABLE;
> > + //if (wait & UMH_FREEZABLE)
> > + // state |= TASK_FREEZABLE;
> >
> > retval = wait_for_completion_state(&done, state);
> > if (!retval)
> > @@ -452,7 +452,9 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait)
> > /* umh_complete() will see NULL and free sub_info */
> > if (xchg(&sub_info->complete, NULL))
> > goto unlock;
> > + /* fallthrough, umh_complete() was already called */
> > }
> > + wait_for_completion(&done);
> >
> > wait_done:
> > retval = sub_info->retval;
> >
> > How does TASK_FREEZABLE affect here?
>
> It marks those waits that are safe to convert to a frozen state.
>
> > Since call_usermodehelper_exec() is a function for starting and
> > waiting for termination of a userspace process (which is subjected to
> > freezing), the caller of call_usermodehelper_exec() can't wait for the
> > termination of that userspace process if that process was frozen, and
> > wait_for_completion()
> > blocks forever?
>
> It'll probably make the freeze fail and abort the suspend. We first
> freezer userspace (including the helper), then we try and freeze all the
> kernel threads. If we can't, we error out and abort -- waking everything
> back up.
>
> But now I realize what I missed before, wait_for_completion() it not
> interruptible.
>
> I think the right fix is to:
>
> state &= ~TASK_KILLABLE;
state &= ~__TASK_WAKEKILL;
we don't want to mask out UNINTERUPTIBLE, that would be bad.
> wait_for_completion_state(&done, state);
>
> Also, put in a comment..
On 2023/02/03 21:30, Peter Zijlstra wrote:
>> I think the right fix is to:
>>
>> state &= ~TASK_KILLABLE;
>
> state &= ~__TASK_WAKEKILL;
>
> we don't want to mask out UNINTERUPTIBLE, that would be bad.
This code was made killable as a solution for CVE-2012-4398.
Although OOM reaper is available today, making back to unkillable is not smart.
>
>> wait_for_completion_state(&done, state);
>>
>> Also, put in a comment..
On Fri, Feb 03, 2023 at 09:59:51PM +0900, Tetsuo Handa wrote:
> On 2023/02/03 21:30, Peter Zijlstra wrote:
> >> I think the right fix is to:
> >>
> >> state &= ~TASK_KILLABLE;
> >
> > state &= ~__TASK_WAKEKILL;
> >
> > we don't want to mask out UNINTERUPTIBLE, that would be bad.
>
> This code was made killable as a solution for CVE-2012-4398.
> Although OOM reaper is available today, making back to unkillable is not smart.
Yes, I meant something like so.
diff --git a/kernel/umh.c b/kernel/umh.c
index 850631518665..0e69e1e4b6fe 100644
--- a/kernel/umh.c
+++ b/kernel/umh.c
@@ -438,21 +438,24 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait)
if (wait == UMH_NO_WAIT) /* task has freed sub_info */
goto unlock;
- if (wait & UMH_KILLABLE)
- state |= TASK_KILLABLE;
-
- if (wait & UMH_FREEZABLE)
+ if (wait & UMH_FREEZABLE) {
state |= TASK_FREEZABLE;
- retval = wait_for_completion_state(&done, state);
- if (!retval)
- goto wait_done;
-
if (wait & UMH_KILLABLE) {
+ retval = wait_for_completion_state(&done, state | TASK_KILLABLE);
+ if (!retval)
+ goto wait_done;
+
/* umh_complete() will see NULL and free sub_info */
if (xchg(&sub_info->complete, NULL))
goto unlock;
+
+ /*
+ * fallthrough; in case of -ERESTARTSYS now do uninterruptible
+ * wait_for_completion().
+ */
}
+ wait_for_completion_state(&done, state);
wait_done:
retval = sub_info->retval;
On 2023/02/03 23:31, Peter Zijlstra wrote:
> Yes, I meant something like so.
>
>
> diff --git a/kernel/umh.c b/kernel/umh.c
> index 850631518665..0e69e1e4b6fe 100644
> --- a/kernel/umh.c
> +++ b/kernel/umh.c
> @@ -438,21 +438,24 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait)
> if (wait == UMH_NO_WAIT) /* task has freed sub_info */
> goto unlock;
>
> - if (wait & UMH_KILLABLE)
> - state |= TASK_KILLABLE;
> -
> - if (wait & UMH_FREEZABLE)
> + if (wait & UMH_FREEZABLE) {
> state |= TASK_FREEZABLE;
>
> - retval = wait_for_completion_state(&done, state);
> - if (!retval)
> - goto wait_done;
> -
> if (wait & UMH_KILLABLE) {
> + retval = wait_for_completion_state(&done, state | TASK_KILLABLE);
> + if (!retval)
> + goto wait_done;
> +
> /* umh_complete() will see NULL and free sub_info */
> if (xchg(&sub_info->complete, NULL))
> goto unlock;
> +
> + /*
> + * fallthrough; in case of -ERESTARTSYS now do uninterruptible
> + * wait_for_completion().
> + */
> }
> + wait_for_completion_state(&done, state);
>
> wait_done:
> retval = sub_info->retval;
Please fold below diff, provided that wait_for_completion_state(TASK_FREEZABLE)
does not return when the current thread was frozen. (If
wait_for_completion_state(TASK_FREEZABLE) returns when the current thread was
frozen, we will fail to execute the
retval = sub_info->retval;
line in order to get the exit code after the current thread is thawed...)
diff --git a/kernel/umh.c b/kernel/umh.c
index 0e69e1e4b6fe..a776920ec051 100644
--- a/kernel/umh.c
+++ b/kernel/umh.c
@@ -431,35 +431,38 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait)
* This makes it possible to use umh_complete to free
* the data structure in case of UMH_NO_WAIT.
*/
sub_info->complete = (wait == UMH_NO_WAIT) ? NULL : &done;
sub_info->wait = wait;
queue_work(system_unbound_wq, &sub_info->work);
if (wait == UMH_NO_WAIT) /* task has freed sub_info */
goto unlock;
- if (wait & UMH_FREEZABLE) {
+ if (wait & UMH_FREEZABLE)
state |= TASK_FREEZABLE;
if (wait & UMH_KILLABLE) {
retval = wait_for_completion_state(&done, state | TASK_KILLABLE);
if (!retval)
goto wait_done;
/* umh_complete() will see NULL and free sub_info */
if (xchg(&sub_info->complete, NULL))
goto unlock;
/*
* fallthrough; in case of -ERESTARTSYS now do uninterruptible
- * wait_for_completion().
+ * wait_for_completion_state(). Since umh_complete() shall call
+ * complete() in a moment if xchg() above returned NULL, this
+ * uninterruptible wait_for_completion_state() will not block
+ * SIGKILL'ed process for so long.
*/
}
wait_for_completion_state(&done, state);
wait_done:
retval = sub_info->retval;
out:
call_usermodehelper_freeinfo(sub_info);
unlock:
helper_unlock();
On Sat, Feb 04, 2023 at 09:48:39AM +0900, Tetsuo Handa wrote:
> On 2023/02/03 23:31, Peter Zijlstra wrote:
> > Yes, I meant something like so.
> >
> >
> > diff --git a/kernel/umh.c b/kernel/umh.c
> > index 850631518665..0e69e1e4b6fe 100644
> > --- a/kernel/umh.c
> > +++ b/kernel/umh.c
> > @@ -438,21 +438,24 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait)
> > if (wait == UMH_NO_WAIT) /* task has freed sub_info */
> > goto unlock;
> >
> > - if (wait & UMH_KILLABLE)
> > - state |= TASK_KILLABLE;
> > -
> > - if (wait & UMH_FREEZABLE)
> > + if (wait & UMH_FREEZABLE) {
> > state |= TASK_FREEZABLE;
> >
> > - retval = wait_for_completion_state(&done, state);
> > - if (!retval)
> > - goto wait_done;
> > -
> > if (wait & UMH_KILLABLE) {
> > + retval = wait_for_completion_state(&done, state | TASK_KILLABLE);
> > + if (!retval)
> > + goto wait_done;
> > +
> > /* umh_complete() will see NULL and free sub_info */
> > if (xchg(&sub_info->complete, NULL))
> > goto unlock;
> > +
> > + /*
> > + * fallthrough; in case of -ERESTARTSYS now do uninterruptible
> > + * wait_for_completion().
> > + */
> > }
> > + wait_for_completion_state(&done, state);
> >
> > wait_done:
> > retval = sub_info->retval;
>
> Please fold below diff, provided that wait_for_completion_state(TASK_FREEZABLE)
> does not return when the current thread was frozen. (If
> wait_for_completion_state(TASK_FREEZABLE) returns when the current thread was
> frozen, we will fail to execute the
>
> retval = sub_info->retval;
>
> line in order to get the exit code after the current thread is thawed...)
>
> diff --git a/kernel/umh.c b/kernel/umh.c
> index 0e69e1e4b6fe..a776920ec051 100644
> --- a/kernel/umh.c
> +++ b/kernel/umh.c
> @@ -431,35 +431,38 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait)
> * This makes it possible to use umh_complete to free
> * the data structure in case of UMH_NO_WAIT.
> */
> sub_info->complete = (wait == UMH_NO_WAIT) ? NULL : &done;
> sub_info->wait = wait;
>
> queue_work(system_unbound_wq, &sub_info->work);
> if (wait == UMH_NO_WAIT) /* task has freed sub_info */
> goto unlock;
>
> - if (wait & UMH_FREEZABLE) {
> + if (wait & UMH_FREEZABLE)
> state |= TASK_FREEZABLE;
>
> if (wait & UMH_KILLABLE) {
> retval = wait_for_completion_state(&done, state | TASK_KILLABLE);
> if (!retval)
> goto wait_done;
>
> /* umh_complete() will see NULL and free sub_info */
> if (xchg(&sub_info->complete, NULL))
> goto unlock;
>
> /*
> * fallthrough; in case of -ERESTARTSYS now do uninterruptible
> - * wait_for_completion().
> + * wait_for_completion_state(). Since umh_complete() shall call
> + * complete() in a moment if xchg() above returned NULL, this
> + * uninterruptible wait_for_completion_state() will not block
> + * SIGKILL'ed process for so long.
> */
> }
> wait_for_completion_state(&done, state);
I think this seems to be the same issue that Schspa Shi reported / provided a
fix sugggestion for [0]. This lead me to ask if:
a) incorrect usage of completion on stack could be generic and;
b) if we should instead have an API helper for that?
Although he already implemented a suggestion for b) to answer a) we need
some SmPL constructs yet to be written by Schspa. The reason I asked for
b) is that if this is a regular pattern it begs for a) as this sort of
issue could be prevalent in other places. So the status of Schspa's work
was that he was going to work on the SmPL grammar to check how frequent
this incorrect patern could be found.
Please let me know your thoughts as perhaps this issue / discussion
didn't get on Peter's radar as it was rececently discussed with Schspa
despite being on Cc.
[0] https://lore.kernel.org/all/[email protected]/T/#u
Luis
On Mon, Feb 06, 2023 at 07:51:16AM -0800, Luis Chamberlain wrote:
> I think this seems to be the same issue that Schspa Shi reported / provided a
> fix sugggestion for [0]. This lead me to ask if:
>
> a) incorrect usage of completion on stack could be generic and;
> b) if we should instead have an API helper for that?
>
> Although he already implemented a suggestion for b) to answer a) we need
> some SmPL constructs yet to be written by Schspa. The reason I asked for
> b) is that if this is a regular pattern it begs for a) as this sort of
> issue could be prevalent in other places. So the status of Schspa's work
> was that he was going to work on the SmPL grammar to check how frequent
> this incorrect patern could be found.
>
> Please let me know your thoughts as perhaps this issue / discussion
> didn't get on Peter's radar as it was rececently discussed with Schspa
> despite being on Cc.
>
> [0] https://lore.kernel.org/all/[email protected]/T/#u
-ETOODAMNMUCHEMAIL :-/
Urgh, esp. :
https://lore.kernel.org/all/[email protected]/T/#m9f0105d28fcfe4947a2583cd3d425169c4fe5dfa
is quite insane.
On Mon, Feb 06, 2023 at 07:51:16AM -0800, Luis Chamberlain wrote:
> I think this seems to be the same issue that Schspa Shi reported / provided a
> fix sugggestion for [0]. This lead me to ask if:
>
> a) incorrect usage of completion on stack could be generic and;
> b) if we should instead have an API helper for that?
>
> Although he already implemented a suggestion for b) to answer a) we need
> some SmPL constructs yet to be written by Schspa. The reason I asked for
> b) is that if this is a regular pattern it begs for a) as this sort of
> issue could be prevalent in other places. So the status of Schspa's work
> was that he was going to work on the SmPL grammar to check how frequent
> this incorrect patern could be found.
Do I read correctly, from you above alphabet-soup, that someone is
working on some static analysis for on-stack completions or something?
If so, perhaps the simplest rule would to be ensure there is an
unconditional uninterruptible wait-for-completion() before going out of
scope.
This latter can be spelled like wait_for_completion() or
wait_for_completion_state(TASK_UNINTERRUPTIBLE). More specifically,
TASK_INTERRUPTIBLE and TASK_WAKEKILL must not be set in the state mask
for the wait to be uninterruptible.
If it cannot be proven, raise a warning and audit or somesuch.
On Sat, Feb 04, 2023 at 09:48:39AM +0900, Tetsuo Handa wrote:
> Please fold below diff,
Find final version below -- typing hard I suppose..
> provided that wait_for_completion_state(TASK_FREEZABLE)
> does not return when the current thread was frozen. (If
> wait_for_completion_state(TASK_FREEZABLE) returns when the current thread was
> frozen, we will fail to execute the
FREEZABLE should be transparent; that is it will only return when an
actual wakeup was missed, otherwise it will remain asleep.
Specifically, the FROZEN thing relies on wait loops to be resillient
against spurious wakeups. Consider do_wait_for_common(), the action() :=
schedule_timeout() might 'suriously' return after thawing, but it will
re-validate the actual completion condition and go back to sleep if it
hasn't happened yet.
OTOH, if the completeion condition has happened (right before the
completer itself was frozen for example, but after the waiter was
already frozen), then the 'spurious' wakeup on thaw is exactly what was
needed, the completion condition is satisfied and the wait terminated.
---
Subject: freezer,umh: Fix call_usermode_helper_exec() vs SIGKILL
From: Peter Zijlstra <[email protected]>
Date: Fri, 3 Feb 2023 15:31:11 +0100
Tetsuo-San noted that commit f5d39b020809 ("freezer,sched: Rewrite
core freezer logic") broke call_usermodehelper_exec() for the KILLABLE
case.
Specifically it was missed that the second, unconditional,
wait_for_completion() was not optional and ensures the on-stack
completion is unused before going out-of-scope.
Fixes: f5d39b020809 ("freezer,sched: Rewrite core freezer logic")
Reported-by: [email protected]
Reported-by: Tetsuo Handa <[email protected]>
Debugged-by: Tetsuo Handa <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/umh.c | 20 +++++++++++++-------
1 file changed, 13 insertions(+), 7 deletions(-)
--- a/kernel/umh.c
+++ b/kernel/umh.c
@@ -438,21 +438,27 @@ int call_usermodehelper_exec(struct subp
if (wait == UMH_NO_WAIT) /* task has freed sub_info */
goto unlock;
- if (wait & UMH_KILLABLE)
- state |= TASK_KILLABLE;
-
if (wait & UMH_FREEZABLE)
state |= TASK_FREEZABLE;
- retval = wait_for_completion_state(&done, state);
- if (!retval)
- goto wait_done;
-
if (wait & UMH_KILLABLE) {
+ retval = wait_for_completion_state(&done, state | TASK_KILLABLE);
+ if (!retval)
+ goto wait_done;
+
/* umh_complete() will see NULL and free sub_info */
if (xchg(&sub_info->complete, NULL))
goto unlock;
+
+ /*
+ * fallthrough; in case of -ERESTARTSYS now do uninterruptible
+ * wait_for_completion_state(). Since umh_complete() shall call
+ * complete() in a moment if xchg() above returned NULL, this
+ * uninterruptible wait_for_completion_state() will not block
+ * SIGKILL'ed processes for long.
+ */
}
+ wait_for_completion_state(&done, state);
wait_done:
retval = sub_info->retval;
On 2023/02/14 0:34, Peter Zijlstra wrote:
> On Sat, Feb 04, 2023 at 09:48:39AM +0900, Tetsuo Handa wrote:
>> Please fold below diff,
>
> Find final version below -- typing hard I suppose..
>
Thanks for explanation and patch. Please proceed.
As per https://lkml.kernel.org/r/[email protected] ,
I think Schspa Shi <[email protected]> can be added using Reported-by: tag.
The following commit has been merged into the sched/urgent branch of tip:
Commit-ID: eedeb787ebb53de5c5dcf7b7b39d01bf1b0f037d
Gitweb: https://git.kernel.org/tip/eedeb787ebb53de5c5dcf7b7b39d01bf1b0f037d
Author: Peter Zijlstra <[email protected]>
AuthorDate: Fri, 03 Feb 2023 15:31:11 +01:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Mon, 13 Feb 2023 16:36:14 +01:00
freezer,umh: Fix call_usermode_helper_exec() vs SIGKILL
Tetsuo-San noted that commit f5d39b020809 ("freezer,sched: Rewrite
core freezer logic") broke call_usermodehelper_exec() for the KILLABLE
case.
Specifically it was missed that the second, unconditional,
wait_for_completion() was not optional and ensures the on-stack
completion is unused before going out-of-scope.
Fixes: f5d39b020809 ("freezer,sched: Rewrite core freezer logic")
Reported-by: [email protected]
Reported-by: Tetsuo Handa <[email protected]>
Debugged-by: Tetsuo Handa <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/umh.c | 20 +++++++++++++-------
1 file changed, 13 insertions(+), 7 deletions(-)
diff --git a/kernel/umh.c b/kernel/umh.c
index 8506315..fbf872c 100644
--- a/kernel/umh.c
+++ b/kernel/umh.c
@@ -438,21 +438,27 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait)
if (wait == UMH_NO_WAIT) /* task has freed sub_info */
goto unlock;
- if (wait & UMH_KILLABLE)
- state |= TASK_KILLABLE;
-
if (wait & UMH_FREEZABLE)
state |= TASK_FREEZABLE;
- retval = wait_for_completion_state(&done, state);
- if (!retval)
- goto wait_done;
-
if (wait & UMH_KILLABLE) {
+ retval = wait_for_completion_state(&done, state | TASK_KILLABLE);
+ if (!retval)
+ goto wait_done;
+
/* umh_complete() will see NULL and free sub_info */
if (xchg(&sub_info->complete, NULL))
goto unlock;
+
+ /*
+ * fallthrough; in case of -ERESTARTSYS now do uninterruptible
+ * wait_for_completion_state(). Since umh_complete() shall call
+ * complete() in a moment if xchg() above returned NULL, this
+ * uninterruptible wait_for_completion_state() will not block
+ * SIGKILL'ed processes for long.
+ */
}
+ wait_for_completion_state(&done, state);
wait_done:
retval = sub_info->retval;
Peter Zijlstra <[email protected]> writes:
> On Mon, Feb 06, 2023 at 07:51:16AM -0800, Luis Chamberlain wrote:
>
>> I think this seems to be the same issue that Schspa Shi reported / provided a
>> fix sugggestion for [0]. This lead me to ask if:
>>
>> a) incorrect usage of completion on stack could be generic and;
>> b) if we should instead have an API helper for that?
>>
>> Although he already implemented a suggestion for b) to answer a) we need
>> some SmPL constructs yet to be written by Schspa. The reason I asked for
>> b) is that if this is a regular pattern it begs for a) as this sort of
>> issue could be prevalent in other places. So the status of Schspa's work
>> was that he was going to work on the SmPL grammar to check how frequent
>> this incorrect patern could be found.
>
> Do I read correctly, from you above alphabet-soup, that someone is
> working on some static analysis for on-stack completions or something?
>
Yes, I was trying to do this.
> If so, perhaps the simplest rule would to be ensure there is an
> unconditional uninterruptible wait-for-completion() before going out of
> scope.
>
> This latter can be spelled like wait_for_completion() or
> wait_for_completion_state(TASK_UNINTERRUPTIBLE). More specifically,
> TASK_INTERRUPTIBLE and TASK_WAKEKILL must not be set in the state mask
> for the wait to be uninterruptible.
>
> If it cannot be proven, raise a warning and audit or somesuch.
This is a good suggestion. I have written a SmPL patch to complete this
check, and now I need to rule out the situation that the driver has
added an additional lock to protect it.
And I have found a lot of bad usage, should we consider adding a new
helper API to simplify the fix this?
Such as:
+
+void complete_on_stack(struct completion **x)
+{
+ struct completion *comp = xchg(*x, NULL);
+
+ if (comp)
+ complete(comp);
+}
+EXPORT_SYMBOL(complete_on_stack);
+
+int __sched wait_for_completion_state_on_stack(struct completion **x,
+ unsigned int state)
+{
+ struct completion *comp = *x;
+ int retval;
+
+ retval = wait_for_completion_state(comp, state);
+ if (retval) {
+ if (xchg(*x, NULL))
+ return retval;
+
+ /*
+ * complete_on_stack will call complete shortly.
+ */
+ wait_for_completion(comp);
+ }
+
+ return retval;
+}
+EXPORT_SYMBOL(wait_for_completion_state_on_stack);
Link: https://lore.kernel.org/all/[email protected]/T/#mf6a41a7009bb47af1b15adf2b7b355e495f609c4
--
BRs
Schspa Shi
On Tue, Feb 14, 2023 at 10:31:58AM +0800, Schspa Shi wrote:
> I have written a SmPL patch to complete this
> check
<-- etc -->
> And I have found a lot of bad usage
Can you share what you found here? Just a simple list of affected
files.
Luis
On Tue, Feb 14, 2023 at 10:31:58AM +0800, Schspa Shi wrote:
> Peter Zijlstra <[email protected]> writes:
> > If so, perhaps the simplest rule would to be ensure there is an
> > unconditional uninterruptible wait-for-completion() before going out of
> > scope.
> >
> > This latter can be spelled like wait_for_completion() or
> > wait_for_completion_state(TASK_UNINTERRUPTIBLE). More specifically,
> > TASK_INTERRUPTIBLE and TASK_WAKEKILL must not be set in the state mask
> > for the wait to be uninterruptible.
> >
> > If it cannot be proven, raise a warning and audit or somesuch.
>
> This is a good suggestion. I have written a SmPL patch to complete this
> check, and now I need to rule out the situation that the driver has
> added an additional lock to protect it.
>
> And I have found a lot of bad usage, should we consider adding a new
> helper API to simplify the fix this?
Please first share some of the locations where this would be applied.
Peter Zijlstra <[email protected]> writes:
> On Tue, Feb 14, 2023 at 10:31:58AM +0800, Schspa Shi wrote:
>> Peter Zijlstra <[email protected]> writes:
>
>> > If so, perhaps the simplest rule would to be ensure there is an
>> > unconditional uninterruptible wait-for-completion() before going out of
>> > scope.
>> >
>> > This latter can be spelled like wait_for_completion() or
>> > wait_for_completion_state(TASK_UNINTERRUPTIBLE). More specifically,
>> > TASK_INTERRUPTIBLE and TASK_WAKEKILL must not be set in the state mask
>> > for the wait to be uninterruptible.
>> >
>> > If it cannot be proven, raise a warning and audit or somesuch.
>>
>> This is a good suggestion. I have written a SmPL patch to complete this
>> check, and now I need to rule out the situation that the driver has
>> added an additional lock to protect it.
>>
>> And I have found a lot of bad usage, should we consider adding a new
>> helper API to simplify the fix this?
>
> Please first share some of the locations where this would be applied.
Hi Peter:
I started a new thread to discuss the SmPL patch.
Link: https://lore.kernel.org/all/[email protected]/
--
BRs
Schspa Shi