2021-11-10 10:21:03

by syzbot

[permalink] [raw]
Subject: [syzbot] BUG: sleeping function called from invalid context in __might_resched

Hello,

syzbot found the following issue on:

HEAD commit: e851dfae4371 Merge tag 'kgdb-5.16-rc1' of git://git.kernel..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=14078392b00000
kernel config: https://syzkaller.appspot.com/x/.config?x=25617f80cd596994
dashboard link: https://syzkaller.appspot.com/bug?extid=5f47a8cea6a12b77a876
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]

BUG: sleeping function called from invalid context at kernel/printk/printk.c:2522
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 21727, name: syz-executor.3
preempt_count: 1, expected: 0
RCU nest depth: 0, expected: 0
3 locks held by syz-executor.3/21727:
#0: ffff888022671098 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x22/0x80 drivers/tty/tty_ldisc.c:252
#1: ffff888022671468 (&tty->flow.lock){....}-{2:2}, at: spin_lock_irq include/linux/spinlock.h:375 [inline]
#1: ffff888022671468 (&tty->flow.lock){....}-{2:2}, at: n_tty_ioctl_helper+0xb6/0x2d0 drivers/tty/tty_ioctl.c:877
#2: ffff888022671098 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref+0x1d/0x80 drivers/tty/tty_ldisc.c:273
irq event stamp: 236
hardirqs last enabled at (235): [<ffffffff81be7585>] kasan_quarantine_put+0xf5/0x210 mm/kasan/quarantine.c:220
hardirqs last disabled at (236): [<ffffffff893c12f1>] __raw_spin_lock_irq include/linux/spinlock_api_smp.h:117 [inline]
hardirqs last disabled at (236): [<ffffffff893c12f1>] _raw_spin_lock_irq+0x41/0x50 kernel/locking/spinlock.c:170
softirqs last enabled at (0): [<ffffffff8144aeac>] copy_process+0x1e8c/0x75a0 kernel/fork.c:2136
softirqs last disabled at (0): [<0000000000000000>] 0x0
Preemption disabled at:
[<0000000000000000>] 0x0
CPU: 0 PID: 21727 Comm: syz-executor.3 Not tainted 5.15.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
__might_resched.cold+0x222/0x26b kernel/sched/core.c:9539
console_lock+0x17/0x80 kernel/printk/printk.c:2522
do_con_write+0x10f/0x1e40 drivers/tty/vt/vt.c:2908
con_write+0x21/0x40 drivers/tty/vt/vt.c:3295
n_hdlc_send_frames+0x24b/0x490 drivers/tty/n_hdlc.c:290
tty_wakeup+0xe1/0x120 drivers/tty/tty_io.c:534
__start_tty drivers/tty/tty_io.c:806 [inline]
__start_tty+0xfb/0x130 drivers/tty/tty_io.c:799
n_tty_ioctl_helper+0x299/0x2d0 drivers/tty/tty_ioctl.c:880
n_hdlc_tty_ioctl+0xd2/0x340 drivers/tty/n_hdlc.c:633
tty_ioctl+0xc69/0x1670 drivers/tty/tty_io.c:2814
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:874 [inline]
__se_sys_ioctl fs/ioctl.c:860 [inline]
__x64_sys_ioctl+0x193/0x200 fs/ioctl.c:860
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f5a77a72ae9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f5a74fe8188 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007f5a77b85f60 RCX: 00007f5a77a72ae9
RDX: 0000000000000001 RSI: 000000000000540a RDI: 0000000000000003
RBP: 00007f5a77accf25 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f5a780b9b2f R14: 00007f5a74fe8300 R15: 0000000000022000
</TASK>
BUG: scheduling while atomic: syz-executor.3/21727/0x00000002
3 locks held by syz-executor.3/21727:
#0: ffff888022671098 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x22/0x80 drivers/tty/tty_ldisc.c:252
#1: ffff888022671468 (&tty->flow.lock){+.+.}-{2:2}, at: spin_lock_irq include/linux/spinlock.h:375 [inline]
#1: ffff888022671468 (&tty->flow.lock){+.+.}-{2:2}, at: n_tty_ioctl_helper+0xb6/0x2d0 drivers/tty/tty_ioctl.c:877
#2: ffff888022671098 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref+0x1d/0x80 drivers/tty/tty_ldisc.c:273
Modules linked in:
Preemption disabled at:
[<0000000000000000>] 0x0


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.


2021-11-12 12:22:21

by Fabio M. De Francesco

[permalink] [raw]
Subject: Re: [syzbot] BUG: sleeping function called from invalid context in __might_resched

On Wednesday, November 10, 2021 11:18:26 AM CET syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: e851dfae4371 Merge tag 'kgdb-5.16-rc1' of git://
git.kernel..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=14078392b00000
> kernel config: https://syzkaller.appspot.com/x/.config?x=25617f80cd596994
> dashboard link: https://syzkaller.appspot.com/bug?
extid=5f47a8cea6a12b77a876
> compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils
for Debian) 2.35.2
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> IMPORTANT: if you fix the issue, please add the following tag to the
commit:
> Reported-by: [email protected]
>
> BUG: sleeping function called from invalid context at kernel/printk/
printk.c:2522
> in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 21727, name: syz-
executor.3
> preempt_count: 1, expected: 0
> RCU nest depth: 0, expected: 0
> 3 locks held by syz-executor.3/21727:
> #0: ffff888022671098 (&tty->ldisc_sem){++++}-{0:0}, at:
tty_ldisc_ref_wait+0x22/0x80 drivers/tty/tty_ldisc.c:252
> #1: ffff888022671468 (&tty->flow.lock){....}-{2:2}, at: spin_lock_irq
include/linux/spinlock.h:375 [inline]
> #1: ffff888022671468 (&tty->flow.lock){....}-{2:2}, at:
n_tty_ioctl_helper+0xb6/0x2d0 drivers/tty/tty_ioctl.c:877
> #2: ffff888022671098 (&tty->ldisc_sem){++++}-{0:0}, at:
tty_ldisc_ref+0x1d/0x80 drivers/tty/tty_ldisc.c:273
> irq event stamp: 236
> hardirqs last enabled at (235): [<ffffffff81be7585>]
kasan_quarantine_put+0xf5/0x210 mm/kasan/quarantine.c:220
> hardirqs last disabled at (236): [<ffffffff893c12f1>] __raw_spin_lock_irq
include/linux/spinlock_api_smp.h:117 [inline]
> hardirqs last disabled at (236): [<ffffffff893c12f1>]
_raw_spin_lock_irq+0x41/0x50 kernel/locking/spinlock.c:170
> softirqs last enabled at (0): [<ffffffff8144aeac>] copy_process+0x1e8c/
0x75a0 kernel/fork.c:2136
> softirqs last disabled at (0): [<0000000000000000>] 0x0
> Preemption disabled at:
> [<0000000000000000>] 0x0
> CPU: 0 PID: 21727 Comm: syz-executor.3 Not tainted 5.15.0-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
> Call Trace:
> <TASK>
> __dump_stack lib/dump_stack.c:88 [inline]
> dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
> __might_resched.cold+0x222/0x26b kernel/sched/core.c:9539
> console_lock+0x17/0x80 kernel/printk/printk.c:2522

I think that this "BUG" is a false positive.

In do_con_write(), Just before the call of console_lock() there is an
in_interrupt() check that, if it evaluates to true, makes this function to
return "count" and prevents the SAC bug.

Therefore, if I'm not missing something, the sleeping code is never executed.

If someone wants to check it and, if they find out that I'm not wrong, please
also tell Syzbot to close this issue (I don't know which command to send).

Regards,

Fabio M. De Francesco

> do_con_write+0x10f/0x1e40 drivers/tty/vt/vt.c:2908
> con_write+0x21/0x40 drivers/tty/vt/vt.c:3295
> n_hdlc_send_frames+0x24b/0x490 drivers/tty/n_hdlc.c:290
> tty_wakeup+0xe1/0x120 drivers/tty/tty_io.c:534
> __start_tty drivers/tty/tty_io.c:806 [inline]
> __start_tty+0xfb/0x130 drivers/tty/tty_io.c:799
> n_tty_ioctl_helper+0x299/0x2d0 drivers/tty/tty_ioctl.c:880
> n_hdlc_tty_ioctl+0xd2/0x340 drivers/tty/n_hdlc.c:633
> tty_ioctl+0xc69/0x1670 drivers/tty/tty_io.c:2814
> vfs_ioctl fs/ioctl.c:51 [inline]
> __do_sys_ioctl fs/ioctl.c:874 [inline]
> __se_sys_ioctl fs/ioctl.c:860 [inline]
> __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:860
> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
> entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7f5a77a72ae9
> Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff
73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
> RSP: 002b:00007f5a74fe8188 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> RAX: ffffffffffffffda RBX: 00007f5a77b85f60 RCX: 00007f5a77a72ae9
> RDX: 0000000000000001 RSI: 000000000000540a RDI: 0000000000000003
> RBP: 00007f5a77accf25 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> R13: 00007f5a780b9b2f R14: 00007f5a74fe8300 R15: 0000000000022000
> </TASK>
> BUG: scheduling while atomic: syz-executor.3/21727/0x00000002
> 3 locks held by syz-executor.3/21727:
> #0: ffff888022671098 (&tty->ldisc_sem){++++}-{0:0}, at:
tty_ldisc_ref_wait+0x22/0x80 drivers/tty/tty_ldisc.c:252
> #1: ffff888022671468 (&tty->flow.lock){+.+.}-{2:2}, at: spin_lock_irq
include/linux/spinlock.h:375 [inline]
> #1: ffff888022671468 (&tty->flow.lock){+.+.}-{2:2}, at:
n_tty_ioctl_helper+0xb6/0x2d0 drivers/tty/tty_ioctl.c:877
> #2: ffff888022671098 (&tty->ldisc_sem){++++}-{0:0}, at:
tty_ldisc_ref+0x1d/0x80 drivers/tty/tty_ldisc.c:273
> Modules linked in:
> Preemption disabled at:
> [<0000000000000000>] 0x0



2021-11-12 13:58:28

by Marco Elver

[permalink] [raw]
Subject: Re: [syzbot] BUG: sleeping function called from invalid context in __might_resched

On Fri, 12 Nov 2021 at 13:22, Fabio M. De Francesco
<[email protected]> wrote:
[...]
> I think that this "BUG" is a false positive.
>
> In do_con_write(), Just before the call of console_lock() there is an
> in_interrupt() check that, if it evaluates to true, makes this function to
> return "count" and prevents the SAC bug.

It's not complaining about being in an interrupt, but rather
interrupts disabled, i.e. still an atomic context.

> > do_con_write+0x10f/0x1e40 drivers/tty/vt/vt.c:2908
> > con_write+0x21/0x40 drivers/tty/vt/vt.c:3295
> > n_hdlc_send_frames+0x24b/0x490 drivers/tty/n_hdlc.c:290
> > tty_wakeup+0xe1/0x120 drivers/tty/tty_io.c:534
> > __start_tty drivers/tty/tty_io.c:806 [inline]
> > __start_tty+0xfb/0x130 drivers/tty/tty_io.c:799
> > n_tty_ioctl_helper+0x299/0x2d0 drivers/tty/tty_ioctl.c:880

^^ n_tty_ioctl_helper() disabled interrupts via
spin_lock_irq(&tty->flow.lock)

2021-11-12 16:05:22

by Fabio M. De Francesco

[permalink] [raw]
Subject: Re: [syzbot] BUG: sleeping function called from invalid context in __might_resched

On Friday, November 12, 2021 2:58:14 PM CET Marco Elver wrote:
> On Fri, 12 Nov 2021 at 13:22, Fabio M. De Francesco
> <[email protected]> wrote:
> [...]
> > I think that this "BUG" is a false positive.
> >
> > In do_con_write(), Just before the call of console_lock() there is an
> > in_interrupt() check that, if it evaluates to true, makes this function
to
> > return "count" and prevents the SAC bug.
>
> It's not complaining about being in an interrupt, but rather
> interrupts disabled, i.e. still an atomic context.

Yes, still in an atomic context.

Actually, I've never talked about being "in an interrupt", but I've just said
that the in_interrupt() macro prevents to fall into the code that might
sleep.

Now I suppose that this is the place for in_atomic(). Isn't it?

I wrongly thought that in_interrupt() returns non-zero from any kind of
atomic context.

Thanks,

Fabio

> > > do_con_write+0x10f/0x1e40 drivers/tty/vt/vt.c:2908
> > > con_write+0x21/0x40 drivers/tty/vt/vt.c:3295
> > > n_hdlc_send_frames+0x24b/0x490 drivers/tty/n_hdlc.c:290
> > > tty_wakeup+0xe1/0x120 drivers/tty/tty_io.c:534
> > > __start_tty drivers/tty/tty_io.c:806 [inline]
> > > __start_tty+0xfb/0x130 drivers/tty/tty_io.c:799
> > > n_tty_ioctl_helper+0x299/0x2d0 drivers/tty/tty_ioctl.c:880
>
> ^^ n_tty_ioctl_helper() disabled interrupts via
> spin_lock_irq(&tty->flow.lock)
>





2021-11-12 16:28:01

by Marco Elver

[permalink] [raw]
Subject: Re: [syzbot] BUG: sleeping function called from invalid context in __might_resched

On Fri, Nov 12, 2021 at 05:05PM +0100, Fabio M. De Francesco wrote:
> On Friday, November 12, 2021 2:58:14 PM CET Marco Elver wrote:
> > On Fri, 12 Nov 2021 at 13:22, Fabio M. De Francesco
> > <[email protected]> wrote:
> > [...]
> > > I think that this "BUG" is a false positive.
> > >
> > > In do_con_write(), Just before the call of console_lock() there is an
> > > in_interrupt() check that, if it evaluates to true, makes this function to
> > > return "count" and prevents the SAC bug.
> >
> > It's not complaining about being in an interrupt, but rather
> > interrupts disabled, i.e. still an atomic context.
>
> Yes, still in an atomic context.
>
> Actually, I've never talked about being "in an interrupt", but I've just said
> that the in_interrupt() macro prevents to fall into the code that might
> sleep.
>
> Now I suppose that this is the place for in_atomic(). Isn't it?

in_atomic() probably won't do:

/*
* Are we running in atomic context? WARNING: this macro cannot
* always detect atomic context; in particular, it cannot know about
* held spinlocks in non-preemptible kernels. Thus it should not be
* used in the general case to determine whether sleeping is possible.
* Do not use in_atomic() in driver code.
*/
#define in_atomic() (preempt_count() != 0)

In particular, it doesn't detect if interrupts are disabled.

My guess is that in this case '!preemptible()' could work:

#define preemptible() (preempt_count() == 0 && !irqs_disabled())

But still am not entirely sure.

Thanks,
-- Marco

2021-11-12 17:15:50

by Fabio M. De Francesco

[permalink] [raw]
Subject: Re: [syzbot] BUG: sleeping function called from invalid context in __might_resched

On Friday, November 12, 2021 5:27:51 PM CET Marco Elver wrote:

> My guess is that in this case '!preemptible()' could work:
>
> #define preemptible() (preempt_count() == 0 && !
irqs_disabled())
>
> But still am not entirely sure.
>
> Thanks,
> -- Marco

Oh, I didn't even know that we have that preemptible() macro.

Instead, I should have known that in_atomic() won't do, since last week I had
copy-pasted in an email that I sent to someone on the LKML exactly the same
inline documentation that you have showed now.

Thanks, Marco. At least, this thread was useful to make me recall how these
macros should be used :)

Regards,

Fabio





2021-11-13 20:13:34

by syzbot

[permalink] [raw]
Subject: Re: [syzbot] BUG: sleeping function called from invalid context in __might_resched

syzbot has found a reproducer for the following issue on:

HEAD commit: f2e19fd15bd7 Add linux-next specific files for 20211112
git tree: linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=15def132b00000
kernel config: https://syzkaller.appspot.com/x/.config?x=ba9c83199208e103
dashboard link: https://syzkaller.appspot.com/bug?extid=5f47a8cea6a12b77a876
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12bdb2c9b00000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=109194aab00000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]

BUG: sleeping function called from invalid context at kernel/printk/printk.c:2525
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 6529, name: syz-executor925
preempt_count: 1, expected: 0
RCU nest depth: 0, expected: 0
3 locks held by syz-executor925/6529:
#0: ffff888018d79098 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x22/0x80 drivers/tty/tty_ldisc.c:252
#1: ffff888018d79468 (&tty->flow.lock){....}-{2:2}, at: spin_lock_irq include/linux/spinlock.h:379 [inline]
#1: ffff888018d79468 (&tty->flow.lock){....}-{2:2}, at: n_tty_ioctl_helper+0xb6/0x2d0 drivers/tty/tty_ioctl.c:877
#2: ffff888018d79098 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref+0x1d/0x80 drivers/tty/tty_ldisc.c:273
irq event stamp: 16252
hardirqs last enabled at (16251): [<ffffffff8952dbc0>] __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:151 [inline]
hardirqs last enabled at (16251): [<ffffffff8952dbc0>] _raw_spin_unlock_irqrestore+0x50/0x70 kernel/locking/spinlock.c:194
hardirqs last disabled at (16252): [<ffffffff8952d921>] __raw_spin_lock_irq include/linux/spinlock_api_smp.h:117 [inline]
hardirqs last disabled at (16252): [<ffffffff8952d921>] _raw_spin_lock_irq+0x41/0x50 kernel/locking/spinlock.c:170
softirqs last enabled at (14452): [<ffffffff812bc45c>] fpu_reset_fpregs arch/x86/kernel/fpu/core.c:601 [inline]
softirqs last enabled at (14452): [<ffffffff812bc45c>] fpu_flush_thread+0x23c/0x390 arch/x86/kernel/fpu/core.c:647
softirqs last disabled at (14450): [<ffffffff812bc3eb>] fpstate_reset arch/x86/kernel/fpu/core.c:452 [inline]
softirqs last disabled at (14450): [<ffffffff812bc3eb>] fpu_flush_thread+0x1cb/0x390 arch/x86/kernel/fpu/core.c:646
Preemption disabled at:
[<0000000000000000>] 0x0
CPU: 0 PID: 6529 Comm: syz-executor925 Not tainted 5.15.0-next-20211112-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
__might_resched.cold+0x222/0x26b kernel/sched/core.c:9554
console_lock+0x17/0x80 kernel/printk/printk.c:2525
do_con_write+0x10f/0x1e40 drivers/tty/vt/vt.c:2908
con_write+0x21/0x40 drivers/tty/vt/vt.c:3295
n_hdlc_send_frames+0x24b/0x490 drivers/tty/n_hdlc.c:290
tty_wakeup+0xe1/0x120 drivers/tty/tty_io.c:534
__start_tty drivers/tty/tty_io.c:806 [inline]
__start_tty+0xfb/0x130 drivers/tty/tty_io.c:799
n_tty_ioctl_helper+0x299/0x2d0 drivers/tty/tty_ioctl.c:880
n_hdlc_tty_ioctl+0xd2/0x340 drivers/tty/n_hdlc.c:633
tty_ioctl+0xc69/0x1670 drivers/tty/tty_io.c:2814
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:874 [inline]
__se_sys_ioctl fs/ioctl.c:860 [inline]
__x64_sys_ioctl+0x193/0x200 fs/ioctl.c:860
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f16c8522349
Code: 28 c3 e8 2a 14 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fff6ccb8b98 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f16c8522349
RDX: 0000000000000001 RSI: 000000000000540a RDI: 0000000000000005
RBP: 00007f16c84e61d0 R08: 0000000000000000 R09: 0000000000000000
R10: 00000000000


2021-11-16 07:58:03

by Fabio M. De Francesco

[permalink] [raw]
Subject: Re: [syzbot] BUG: sleeping function called from invalid context in __might_resched

On Saturday, November 13, 2021 9:13:24 PM CET syzbot wrote:
> syzbot has found a reproducer for the following issue on:
>
> HEAD commit: f2e19fd15bd7 Add linux-next specific files for 20211112
> git tree: linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=15def132b00000
> kernel config: https://syzkaller.appspot.com/x/.config?x=ba9c83199208e103
> dashboard link: https://syzkaller.appspot.com/bug?
extid=5f47a8cea6a12b77a876
> compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils
for Debian) 2.35.2
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12bdb2c9b00000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=109194aab00000
>
> IMPORTANT: if you fix the issue, please add the following tag to the
commit:
> Reported-by: [email protected]
>
> BUG: sleeping function called from invalid context at kernel/printk/
printk.c:2525
> in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 6529, name: syz-
executor925
> preempt_count: 1, expected: 0
> RCU nest depth: 0, expected: 0
> 3 locks held by syz-executor925/6529:
> #0: ffff888018d79098 (&tty->ldisc_sem){++++}-{0:0}, at:
tty_ldisc_ref_wait+0x22/0x80 drivers/tty/tty_ldisc.c:252
> #1: ffff888018d79468 (&tty->flow.lock){....}-{2:2}, at: spin_lock_irq
include/linux/spinlock.h:379 [inline]
> #1: ffff888018d79468 (&tty->flow.lock){....}-{2:2}, at:
n_tty_ioctl_helper+0xb6/0x2d0 drivers/tty/tty_ioctl.c:877
> #2: ffff888018d79098 (&tty->ldisc_sem){++++}-{0:0}, at:
tty_ldisc_ref+0x1d/0x80 drivers/tty/tty_ldisc.c:273
> irq event stamp: 16252
> hardirqs last enabled at (16251): [<ffffffff8952dbc0>]
__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:151 [inline]
> hardirqs last enabled at (16251): [<ffffffff8952dbc0>]
_raw_spin_unlock_irqrestore+0x50/0x70 kernel/locking/spinlock.c:194
> hardirqs last disabled at (16252): [<ffffffff8952d921>] __raw_spin_lock_irq
include/linux/spinlock_api_smp.h:117 [inline]
> hardirqs last disabled at (16252): [<ffffffff8952d921>]
_raw_spin_lock_irq+0x41/0x50 kernel/locking/spinlock.c:170
> softirqs last enabled at (14452): [<ffffffff812bc45c>] fpu_reset_fpregs
arch/x86/kernel/fpu/core.c:601 [inline]
> softirqs last enabled at (14452): [<ffffffff812bc45c>]
fpu_flush_thread+0x23c/0x390 arch/x86/kernel/fpu/core.c:647
> softirqs last disabled at (14450): [<ffffffff812bc3eb>] fpstate_reset arch/
x86/kernel/fpu/core.c:452 [inline]
> softirqs last disabled at (14450): [<ffffffff812bc3eb>]
fpu_flush_thread+0x1cb/0x390 arch/x86/kernel/fpu/core.c:646
> Preemption disabled at:
> [<0000000000000000>] 0x0
> CPU: 0 PID: 6529 Comm: syz-executor925 Not tainted 5.15.0-next-20211112-
syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
> Call Trace:
> <TASK>
> __dump_stack lib/dump_stack.c:88 [inline]
> dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
> __might_resched.cold+0x222/0x26b kernel/sched/core.c:9554
> console_lock+0x17/0x80 kernel/printk/printk.c:2525
> do_con_write+0x10f/0x1e40 drivers/tty/vt/vt.c:2908
> con_write+0x21/0x40 drivers/tty/vt/vt.c:3295
> n_hdlc_send_frames+0x24b/0x490 drivers/tty/n_hdlc.c:290
> tty_wakeup+0xe1/0x120 drivers/tty/tty_io.c:534
> __start_tty drivers/tty/tty_io.c:806 [inline]
> __start_tty+0xfb/0x130 drivers/tty/tty_io.c:799
> n_tty_ioctl_helper+0x299/0x2d0 drivers/tty/tty_ioctl.c:880
> n_hdlc_tty_ioctl+0xd2/0x340 drivers/tty/n_hdlc.c:633
> tty_ioctl+0xc69/0x1670 drivers/tty/tty_io.c:2814
> vfs_ioctl fs/ioctl.c:51 [inline]
> __do_sys_ioctl fs/ioctl.c:874 [inline]
> __se_sys_ioctl fs/ioctl.c:860 [inline]
> __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:860
> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
> entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7f16c8522349
> Code: 28 c3 e8 2a 14 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 89 f7
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff
73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:00007fff6ccb8b98 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f16c8522349
> RDX: 0000000000000001 RSI: 000000000000540a RDI: 0000000000000005
> RBP: 00007f16c84e61d0 R08: 0000000000000000 R09: 0000000000000000
> R10: 00000000000

I think that thanks to Marco Elver we have a solution for this issue.

However I'd propose a transformation on Marco's solution based on one of the
De Morgan's laws: !(A && B) <=> !A || !B.

I think that this is more readable and comprehensible.

Therefore, if I'm not wrong, Marco's "!preemptible()", that is "if (!
(preempt_count() == 0 && !irqs_disabled())", might be rewritten to an easier
to understand "if (preempt_count() || irqs_disabled())".

Am I wrong? Let's test it...

#syz test:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

--
Fabio M. De Francesco


Attachments:
vt.c.diff (402.00 B)

2021-11-16 08:09:13

by syzbot

[permalink] [raw]
Subject: Re: [syzbot] BUG: sleeping function called from invalid context in __might_resched

Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
BUG: sleeping function called from invalid context in __might_resched

BUG: sleeping function called from invalid context at kernel/printk/printk.c:2522
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 8755, name: syz-executor.2
preempt_count: 1, expected: 0
RCU nest depth: 0, expected: 0
3 locks held by syz-executor.2/8755:
#0: ffff888070c9a098
(&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x22/0x80 drivers/tty/tty_ldisc.c:252
#1: ffff888070c9a468
(&tty->flow.lock){....}-{2:2}, at: spin_lock_irq include/linux/spinlock.h:374 [inline]
(&tty->flow.lock){....}-{2:2}, at: n_tty_ioctl_helper+0xb6/0x2d0 drivers/tty/tty_ioctl.c:877
#2: ffff888070c9a098 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref+0x1d/0x80 drivers/tty/tty_ldisc.c:273
irq event stamp: 916
hardirqs last enabled at (915): [<ffffffff81beabd5>] kasan_quarantine_put+0xf5/0x210 mm/kasan/quarantine.c:220
hardirqs last disabled at (916): [<ffffffff8950a731>] __raw_spin_lock_irq include/linux/spinlock_api_smp.h:117 [inline]
hardirqs last disabled at (916): [<ffffffff8950a731>] _raw_spin_lock_irq+0x41/0x50 kernel/locking/spinlock.c:170
softirqs last enabled at (0): [<ffffffff8144cf0c>] copy_process+0x1e8c/0x75a0 kernel/fork.c:2136
softirqs last disabled at (0): [<0000000000000000>] 0x0
Preemption disabled at:
[<0000000000000000>] 0x0
CPU: 1 PID: 8755 Comm: syz-executor.2 Not tainted 5.16.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
__might_resched.cold+0x222/0x26b kernel/sched/core.c:9542
console_lock+0x17/0x80 kernel/printk/printk.c:2522
con_flush_chars drivers/tty/vt/vt.c:3365 [inline]
con_flush_chars+0x35/0x90 drivers/tty/vt/vt.c:3357
con_write+0x2c/0x40 drivers/tty/vt/vt.c:3296
n_hdlc_send_frames+0x24b/0x490 drivers/tty/n_hdlc.c:290
tty_wakeup+0xe1/0x120 drivers/tty/tty_io.c:534
__start_tty drivers/tty/tty_io.c:806 [inline]
__start_tty+0xfb/0x130 drivers/tty/tty_io.c:799
n_tty_ioctl_helper+0x299/0x2d0 drivers/tty/tty_ioctl.c:880
n_hdlc_tty_ioctl+0xd2/0x340 drivers/tty/n_hdlc.c:633
tty_ioctl+0xc69/0x1670 drivers/tty/tty_io.c:2814
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:874 [inline]
__se_sys_ioctl fs/ioctl.c:860 [inline]
__x64_sys_ioctl+0x193/0x200 fs/ioctl.c:860
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f9322bb3ae9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f9322329188 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007f9322cc6f60 RCX: 00007f9322bb3ae9
RDX: 0000000000000001 RSI: 000000000000540a RDI: 0000000000000005
RBP: 00007f9322c0df6d R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe7ad1cc3f R14: 00007f9322329300 R15: 0000000000022000
</TASK>
BUG: scheduling while atomic: syz-executor.2/8755/0x00000002
3 locks held by syz-executor.2/8755:
#0: ffff888070c9a098 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x22/0x80 drivers/tty/tty_ldisc.c:252
#1: ffff888070c9a468 (&tty->flow.lock){+.+.}-{2:2}, at: spin_lock_irq include/linux/spinlock.h:374 [inline]
#1: ffff888070c9a468 (&tty->flow.lock){+.+.}-{2:2}, at: n_tty_ioctl_helper+0xb6/0x2d0 drivers/tty/tty_ioctl.c:877
#2: ffff888070c9a098 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref+0x1d/0x80 drivers/tty/tty_ldisc.c:273
Modules linked in:
Preemption disabled at:
[<0000000000000000>] 0x0


Tested on:

commit: 8ab77458 Merge tag 'trace-v5.16-5' of git://git.kernel..
git tree: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git --
console output: https://syzkaller.appspot.com/x/log.txt?x=16dd826eb00000
kernel config: https://syzkaller.appspot.com/x/.config?x=d2db820d271ec6d2
dashboard link: https://syzkaller.appspot.com/bug?extid=5f47a8cea6a12b77a876
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=15c10b69b00000


2021-11-16 08:54:08

by Fabio M. De Francesco

[permalink] [raw]
Subject: Re: [syzbot] BUG: sleeping function called from invalid context in __might_resched

On Tuesday, November 16, 2021 9:09:11 AM CET syzbot wrote:
> Hello,
>
> syzbot has tested the proposed patch but the reproducer is still triggering
an issue:
> BUG: sleeping function called from invalid context in __might_resched
>
> BUG: sleeping function called from invalid context at kernel/printk/
printk.c:2522
> in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 8755, name: syz-
executor.2
> preempt_count: 1, expected: 0
> RCU nest depth: 0, expected: 0
> 3 locks held by syz-executor.2/8755:
> #0: ffff888070c9a098
> (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x22/0x80 drivers/
tty/tty_ldisc.c:252
> #1: ffff888070c9a468
> (&tty->flow.lock){....}-{2:2}, at: spin_lock_irq include/linux/spinlock.h:
374 [inline]
> (&tty->flow.lock){....}-{2:2}, at: n_tty_ioctl_helper+0xb6/0x2d0 drivers/
tty/tty_ioctl.c:877
> #2: ffff888070c9a098 (&tty->ldisc_sem){++++}-{0:0}, at:
tty_ldisc_ref+0x1d/0x80 drivers/tty/tty_ldisc.c:273
> irq event stamp: 916
> hardirqs last enabled at (915): [<ffffffff81beabd5>]
kasan_quarantine_put+0xf5/0x210 mm/kasan/quarantine.c:220
> hardirqs last disabled at (916): [<ffffffff8950a731>] __raw_spin_lock_irq
include/linux/spinlock_api_smp.h:117 [inline]
> hardirqs last disabled at (916): [<ffffffff8950a731>]
_raw_spin_lock_irq+0x41/0x50 kernel/locking/spinlock.c:170
> softirqs last enabled at (0): [<ffffffff8144cf0c>] copy_process+0x1e8c/
0x75a0 kernel/fork.c:2136
> softirqs last disabled at (0): [<0000000000000000>] 0x0
> Preemption disabled at:
> [<0000000000000000>] 0x0
> CPU: 1 PID: 8755 Comm: syz-executor.2 Not tainted 5.16.0-rc1-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
> Call Trace:
> <TASK>
> __dump_stack lib/dump_stack.c:88 [inline]
> dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
> __might_resched.cold+0x222/0x26b kernel/sched/core.c:9542
> console_lock+0x17/0x80 kernel/printk/printk.c:2522
> con_flush_chars drivers/tty/vt/vt.c:3365 [inline]
> con_flush_chars+0x35/0x90 drivers/tty/vt/vt.c:3357
> con_write+0x2c/0x40 drivers/tty/vt/vt.c:3296

The reproducer is still triggering an issue, but this time it looks like it
is triggered by a different path of execution.

The same invalid "in_interrupt()" test is also in con_flush_chars().

Let's try to remove it too...

My first idea would be to replace "if (in_interrupt())" with the same
"preempt_count() || irqs_disabled()" I used in do_con_write(). However I
noticed that both do_con_write() and con_flush_chars() are only called from
inside con_write() (which, aside from calling those functions, does nothing
else).

So why not remove the if (in_interrupt()) from both them and use if
(preempt_count() || irqs_disabled()) just once in con_write()?

I think this should be the right solution, but I prefer to go one step at a
time.

Therefore, I'll (1) use the same (redundant, if it was used in con_write())
test also in con_flush_chars(), (2) wait for Syzbot to confirm that it fixes
the bug, and (3) wait for maintainers review and suggestions about whether or
not moving those tests one level upper.

#syz test:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

---
Fabio M. De Francesco


Attachments:
vt.c.diff (684.00 B)

2021-11-16 08:55:33

by syzbot

[permalink] [raw]
Subject: Re: [syzbot] BUG: sleeping function called from invalid context in __might_resched

Hello,

syzbot tried to test the proposed patch but the build/boot failed:

failed to checkout kernel repo git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/---: failed to run ["git" "fetch" "--force" "f569e972c8e9057ee9c286220c83a480ebf30cc5" "---"]: exit status 129
error: unknown option `-'
usage: git fetch [<options>] [<repository> [<refspec>...]]
or: git fetch [<options>] <group>
or: git fetch --multiple [<options>] [(<repository> | <group>)...]
or: git fetch --all [<options>]

-v, --verbose be more verbose
-q, --quiet be more quiet
--all fetch from all remotes
--set-upstream set upstream for git pull/fetch
-a, --append append to .git/FETCH_HEAD instead of overwriting
--upload-pack <path> path to upload pack on remote end
-f, --force force overwrite of local reference
-m, --multiple fetch from multiple remotes
-t, --tags fetch all tags and associated objects
-n do not fetch all tags (--no-tags)
-j, --jobs <n> number of submodules fetched in parallel
-p, --prune prune remote-tracking branches no longer on remote
-P, --prune-tags prune local tags no longer on remote and clobber changed tags
--recurse-submodules[=<on-demand>]
control recursive fetching of submodules
--dry-run dry run
--write-fetch-head write fetched references to the FETCH_HEAD file
-k, --keep keep downloaded pack
-u, --update-head-ok allow updating of HEAD ref
--progress force progress reporting
--depth <depth> deepen history of shallow clone
--shallow-since <time>
deepen history of shallow repository based on time
--shallow-exclude <revision>
deepen history of shallow clone, excluding rev
--deepen <n> deepen history of shallow clone
--unshallow convert to a complete repository
--update-shallow accept refs that update .git/shallow
--refmap <refmap> specify fetch refmap
-o, --server-option <server-specific>
option to transmit
-4, --ipv4 use IPv4 addresses only
-6, --ipv6 use IPv6 addresses only
--negotiation-tip <revision>
report that we have only objects reachable from this object
--filter <args> object filtering
--auto-maintenance run 'maintenance --auto' after fetching
--auto-gc run 'maintenance --auto' after fetching
--show-forced-updates
check for forced-updates on all updated branches
--write-commit-graph write the commit-graph after fetching
--stdin accept refspecs from stdin




Tested on:

commit: [unknown
git tree: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git ---
dashboard link: https://syzkaller.appspot.com/bug?extid=5f47a8cea6a12b77a876
compiler:
patch: https://syzkaller.appspot.com/x/patch.diff?x=131e6221b00000


2021-11-16 09:03:39

by Fabio M. De Francesco

[permalink] [raw]
Subject: Re: [syzbot] BUG: sleeping function called from invalid context in __might_resched

On Tuesday, November 16, 2021 9:53:53 AM CET Fabio M. De Francesco wrote:
> On Tuesday, November 16, 2021 9:09:11 AM CET syzbot wrote:
> > Hello,
> >
> > syzbot has tested the proposed patch but the reproducer is still
triggering
> an issue:
> > BUG: sleeping function called from invalid context in __might_resched
> > [...]
>
> The reproducer is still triggering an issue, but this time it looks like it
> is triggered by a different path of execution.
>
> The same invalid "in_interrupt()" test is also in con_flush_chars().
>
> Let's try to remove it too...
>
> My first idea would be to replace "if (in_interrupt())" with the same
> "preempt_count() || irqs_disabled()" I used in do_con_write(). However I
> noticed that both do_con_write() and con_flush_chars() are only called from
> inside con_write() (which, aside from calling those functions, does nothing
> else).
>
> So why not remove the if (in_interrupt()) from both them and use if
> (preempt_count() || irqs_disabled()) just once in con_write()?
>
> I think this should be the right solution, but I prefer to go one step at a
> time.
>
> Therefore, I'll (1) use the same (redundant, if it was used in con_write())
> test also in con_flush_chars(), (2) wait for Syzbot to confirm that it
fixes
> the bug, and (3) wait for maintainers review and suggestions about whether
or
> not moving those tests one level upper.
>
> #syz test:
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>
> ---
> Fabio M. De Francesco

Don't know exactly what happened, so let's retry the test...

#syz test:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git


Attachments:
vt.c.diff (684.00 B)

2021-11-16 09:03:51

by syzbot

[permalink] [raw]
Subject: Re: [syzbot] BUG: sleeping function called from invalid context in __might_resched

> On Tuesday, November 16, 2021 9:53:53 AM CET Fabio M. De Francesco wrote:
>> On Tuesday, November 16, 2021 9:09:11 AM CET syzbot wrote:
>> > Hello,
>> >
>> > syzbot has tested the proposed patch but the reproducer is still
> triggering
>> an issue:
>> > BUG: sleeping function called from invalid context in __might_resched
>> > [...]
>>
>> The reproducer is still triggering an issue, but this time it looks like it
>> is triggered by a different path of execution.
>>
>> The same invalid "in_interrupt()" test is also in con_flush_chars().
>>
>> Let's try to remove it too...
>>
>> My first idea would be to replace "if (in_interrupt())" with the same
>> "preempt_count() || irqs_disabled()" I used in do_con_write(). However I
>> noticed that both do_con_write() and con_flush_chars() are only called from
>> inside con_write() (which, aside from calling those functions, does nothing
>> else).
>>
>> So why not remove the if (in_interrupt()) from both them and use if
>> (preempt_count() || irqs_disabled()) just once in con_write()?
>>
>> I think this should be the right solution, but I prefer to go one step at a
>> time.
>>
>> Therefore, I'll (1) use the same (redundant, if it was used in con_write())
>> test also in con_flush_chars(), (2) wait for Syzbot to confirm that it
> fixes
>> the bug, and (3) wait for maintainers review and suggestions about whether
> or
>> not moving those tests one level upper.
>>
>> #syz test:
>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>>
>> ---
>> Fabio M. De Francesco
>
> Don't know exactly what happened, so let's retry the test...
>
> #syz test:

want 2 args (repo, branch), got 1

> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

2021-11-16 09:13:29

by Fabio M. De Francesco

[permalink] [raw]
Subject: Re: [syzbot] BUG: sleeping function called from invalid context in __might_resched

On Tuesday, November 16, 2021 9:53:53 AM CET Fabio M. De Francesco wrote:
> On Tuesday, November 16, 2021 9:09:11 AM CET syzbot wrote:
> > Hello,
> >
> > syzbot has tested the proposed patch but the reproducer is still
triggering
> an issue:
> > BUG: sleeping function called from invalid context in __might_resched
> >
> > BUG: sleeping function called from invalid context at kernel/printk/
> printk.c:2522
> > in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 8755, name: syz-
> executor.2
> > preempt_count: 1, expected: 0
> > RCU nest depth: 0, expected: 0
> > 3 locks held by syz-executor.2/8755:
> > #0: ffff888070c9a098
> > (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x22/0x80 drivers/
> tty/tty_ldisc.c:252
> > #1: ffff888070c9a468
> > (&tty->flow.lock){....}-{2:2}, at: spin_lock_irq include/linux/
spinlock.h:
> 374 [inline]
> > (&tty->flow.lock){....}-{2:2}, at: n_tty_ioctl_helper+0xb6/0x2d0
drivers/
> tty/tty_ioctl.c:877
> > #2: ffff888070c9a098 (&tty->ldisc_sem){++++}-{0:0}, at:
> tty_ldisc_ref+0x1d/0x80 drivers/tty/tty_ldisc.c:273
> > irq event stamp: 916
> > hardirqs last enabled at (915): [<ffffffff81beabd5>]
> kasan_quarantine_put+0xf5/0x210 mm/kasan/quarantine.c:220
> > hardirqs last disabled at (916): [<ffffffff8950a731>] __raw_spin_lock_irq
> include/linux/spinlock_api_smp.h:117 [inline]
> > hardirqs last disabled at (916): [<ffffffff8950a731>]
> _raw_spin_lock_irq+0x41/0x50 kernel/locking/spinlock.c:170
> > softirqs last enabled at (0): [<ffffffff8144cf0c>] copy_process+0x1e8c/
> 0x75a0 kernel/fork.c:2136
> > softirqs last disabled at (0): [<0000000000000000>] 0x0
> > Preemption disabled at:
> > [<0000000000000000>] 0x0
> > CPU: 1 PID: 8755 Comm: syz-executor.2 Not tainted 5.16.0-rc1-syzkaller #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> > Call Trace:
> > <TASK>
> > __dump_stack lib/dump_stack.c:88 [inline]
> > dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
> > __might_resched.cold+0x222/0x26b kernel/sched/core.c:9542
> > console_lock+0x17/0x80 kernel/printk/printk.c:2522
> > con_flush_chars drivers/tty/vt/vt.c:3365 [inline]
> > con_flush_chars+0x35/0x90 drivers/tty/vt/vt.c:3357
> > con_write+0x2c/0x40 drivers/tty/vt/vt.c:3296
>
> The reproducer is still triggering an issue, but this time it looks like it
> is triggered by a different path of execution.
>
> The same invalid "in_interrupt()" test is also in con_flush_chars().
>
> Let's try to remove it too...
>
> My first idea would be to replace "if (in_interrupt())" with the same
> "preempt_count() || irqs_disabled()" I used in do_con_write(). However I
> noticed that both do_con_write() and con_flush_chars() are only called from
> inside con_write() (which, aside from calling those functions, does nothing
> else).
>
> So why not remove the if (in_interrupt()) from both them and use if
> (preempt_count() || irqs_disabled()) just once in con_write()?
>
> I think this should be the right solution, but I prefer to go one step at a
> time.
>
> Therefore, I'll (1) use the same (redundant, if it was used in con_write())
> test also in con_flush_chars(), (2) wait for Syzbot to confirm that it
fixes
> the bug, and (3) wait for maintainers review and suggestions about whether
or
> not moving those tests one level upper.
>

#syz test:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

>
> ---
> Fabio M. De Francesco
>


Attachments:
vt.c.diff (684.00 B)

2021-11-16 09:20:13

by syzbot

[permalink] [raw]
Subject: Re: [syzbot] BUG: sleeping function called from invalid context in __might_resched

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: [email protected]

Tested on:

commit: 8ab77458 Merge tag 'trace-v5.16-5' of git://git.kernel..
git tree: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git --
kernel config: https://syzkaller.appspot.com/x/.config?x=d2db820d271ec6d2
dashboard link: https://syzkaller.appspot.com/bug?extid=5f47a8cea6a12b77a876
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=131739e1b00000

Note: testing is done by a robot and is best-effort only.

2021-11-16 09:38:22

by syzbot

[permalink] [raw]
Subject: Re: [syzbot] BUG: sleeping function called from invalid context in __might_resched

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: [email protected]

Tested on:

commit: 8ab77458 Merge tag 'trace-v5.16-5' of git://git.kernel..
git tree: upstream
kernel config: https://syzkaller.appspot.com/x/.config?x=d2db820d271ec6d2
dashboard link: https://syzkaller.appspot.com/bug?extid=5f47a8cea6a12b77a876
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=149be83eb00000

Note: testing is done by a robot and is best-effort only.

2021-11-16 10:25:35

by Marco Elver

[permalink] [raw]
Subject: Re: [syzbot] BUG: sleeping function called from invalid context in __might_resched

On Tue, 16 Nov 2021 at 08:57, Fabio M. De Francesco
<[email protected]> wrote:
[...]
> I think that this is more readable and comprehensible.
>
> Therefore, if I'm not wrong, Marco's "!preemptible()", that is "if (!
> (preempt_count() == 0 && !irqs_disabled())", might be rewritten to an easier
> to understand "if (preempt_count() || irqs_disabled())".
>
> Am I wrong? Let's test it...

It's right, but why not use preemptible()? The definition of
preemptible() might change and then you'd have to fix the code again.

I actually find (preempt_count() || irqs_disabled()) tells me less of
what your intent here is vs. just writing !preemptible().

2021-11-16 11:37:59

by Fabio M. De Francesco

[permalink] [raw]
Subject: Re: [syzbot] BUG: sleeping function called from invalid context in __might_resched

On Tuesday, November 16, 2021 11:24:54 AM CET Marco Elver wrote:
> On Tue, 16 Nov 2021 at 08:57, Fabio M. De Francesco
> <[email protected]> wrote:
> [...]
> > I think that this is more readable and comprehensible.
> >
> > Therefore, if I'm not wrong, Marco's "!preemptible()", that is "if (!
> > (preempt_count() == 0 && !irqs_disabled())", might be rewritten to an
easier
> > to understand "if (preempt_count() || irqs_disabled())".
> >
> > Am I wrong? Let's test it...
>
> It's right, but why not use preemptible()? The definition of
> preemptible() might change and then you'd have to fix the code again.
>
> I actually find (preempt_count() || irqs_disabled()) tells me less of
> what your intent here is vs. just writing !preemptible().
>
You are right :)

If we have a macro, there must be a good reason behind its existence. So
let's use it.

For I didn't know that we have that macro, I had to read its definition. Then
I had to understand what means the negation of its parts. It was a bit
difficult to understand, so I thought that open coding if we have preemption
disabled or irqs disabled was easier to understand.

But now I see that, as said, if we have an API we should use it.

I'm preparing a patch and give you proper credit for suggestions.

Thanks,

Fabio M. De Francesco