2020-01-03 03:47:02

by Zhenzhong Duan

[permalink] [raw]
Subject: [PATCH RESEND] ttyprintk: fix a potential sleeping in interrupt context issue

Google syzbot reports:
BUG: sleeping function called from invalid context at
kernel/locking/mutex.c:938
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 0, name: swapper/1
1 lock held by swapper/1/0:
...
Call Trace:
<IRQ>
dump_stack+0x197/0x210
___might_sleep.cold+0x1fb/0x23e
__might_sleep+0x95/0x190
__mutex_lock+0xc5/0x13c0
mutex_lock_nested+0x16/0x20
tpk_write+0x5d/0x340
resync_tnc+0x1b6/0x320
call_timer_fn+0x1ac/0x780
run_timer_softirq+0x6c3/0x1790
__do_softirq+0x262/0x98c
irq_exit+0x19b/0x1e0
smp_apic_timer_interrupt+0x1a3/0x610
apic_timer_interrupt+0xf/0x20
</IRQ>

Fix it by using spinlock in process context instead of mutex and having
interrupt disabled in critical section.

Signed-off-by: Zhenzhong Duan <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
---
drivers/char/ttyprintk.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/char/ttyprintk.c b/drivers/char/ttyprintk.c
index 4f24e46ebe7c..56db949a7b70 100644
--- a/drivers/char/ttyprintk.c
+++ b/drivers/char/ttyprintk.c
@@ -15,10 +15,11 @@
#include <linux/serial.h>
#include <linux/tty.h>
#include <linux/module.h>
+#include <linux/spinlock.h>

struct ttyprintk_port {
struct tty_port port;
- struct mutex port_write_mutex;
+ spinlock_t spinlock;
};

static struct ttyprintk_port tpk_port;
@@ -99,11 +100,12 @@ static int tpk_open(struct tty_struct *tty, struct file *filp)
static void tpk_close(struct tty_struct *tty, struct file *filp)
{
struct ttyprintk_port *tpkp = tty->driver_data;
+ unsigned long flags;

- mutex_lock(&tpkp->port_write_mutex);
+ spin_lock_irqsave(&tpkp->spinlock, flags);
/* flush tpk_printk buffer */
tpk_printk(NULL, 0);
- mutex_unlock(&tpkp->port_write_mutex);
+ spin_unlock_irqrestore(&tpkp->spinlock, flags);

tty_port_close(&tpkp->port, tty, filp);
}
@@ -115,13 +117,14 @@ static int tpk_write(struct tty_struct *tty,
const unsigned char *buf, int count)
{
struct ttyprintk_port *tpkp = tty->driver_data;
+ unsigned long flags;
int ret;


/* exclusive use of tpk_printk within this tty */
- mutex_lock(&tpkp->port_write_mutex);
+ spin_lock_irqsave(&tpkp->spinlock, flags);
ret = tpk_printk(buf, count);
- mutex_unlock(&tpkp->port_write_mutex);
+ spin_unlock_irqrestore(&tpkp->spinlock, flags);

return ret;
}
@@ -171,7 +174,7 @@ static int __init ttyprintk_init(void)
{
int ret = -ENOMEM;

- mutex_init(&tpk_port.port_write_mutex);
+ spin_lock_init(&tpk_port.spinlock);

ttyprintk_driver = tty_alloc_driver(1,
TTY_DRIVER_RESET_TERMIOS |
--
2.17.1


2020-01-03 08:44:57

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH RESEND] ttyprintk: fix a potential sleeping in interrupt context issue

On Fri, Jan 03, 2020 at 11:45:41AM +0800, Zhenzhong Duan wrote:
> Google syzbot reports:
> BUG: sleeping function called from invalid context at
> kernel/locking/mutex.c:938
> in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 0, name: swapper/1
> 1 lock held by swapper/1/0:
> ...
> Call Trace:
> <IRQ>
> dump_stack+0x197/0x210
> ___might_sleep.cold+0x1fb/0x23e
> __might_sleep+0x95/0x190
> __mutex_lock+0xc5/0x13c0
> mutex_lock_nested+0x16/0x20
> tpk_write+0x5d/0x340
> resync_tnc+0x1b6/0x320
> call_timer_fn+0x1ac/0x780
> run_timer_softirq+0x6c3/0x1790
> __do_softirq+0x262/0x98c
> irq_exit+0x19b/0x1e0
> smp_apic_timer_interrupt+0x1a3/0x610
> apic_timer_interrupt+0xf/0x20
> </IRQ>
>
> Fix it by using spinlock in process context instead of mutex and having
> interrupt disabled in critical section.
>
> Signed-off-by: Zhenzhong Duan <[email protected]>
> Cc: Arnd Bergmann <[email protected]>
> Cc: Greg Kroah-Hartman <[email protected]>
> ---
> drivers/char/ttyprintk.c | 15 +++++++++------
> 1 file changed, 9 insertions(+), 6 deletions(-)

Why was this resent? What differs from the first version that required
it to be resent?

Always give us a clue here please :)


>
> diff --git a/drivers/char/ttyprintk.c b/drivers/char/ttyprintk.c
> index 4f24e46ebe7c..56db949a7b70 100644
> --- a/drivers/char/ttyprintk.c
> +++ b/drivers/char/ttyprintk.c
> @@ -15,10 +15,11 @@
> #include <linux/serial.h>
> #include <linux/tty.h>
> #include <linux/module.h>
> +#include <linux/spinlock.h>
>
> struct ttyprintk_port {
> struct tty_port port;
> - struct mutex port_write_mutex;
> + spinlock_t spinlock;
> };
>
> static struct ttyprintk_port tpk_port;
> @@ -99,11 +100,12 @@ static int tpk_open(struct tty_struct *tty, struct file *filp)
> static void tpk_close(struct tty_struct *tty, struct file *filp)
> {
> struct ttyprintk_port *tpkp = tty->driver_data;
> + unsigned long flags;
>
> - mutex_lock(&tpkp->port_write_mutex);
> + spin_lock_irqsave(&tpkp->spinlock, flags);
> /* flush tpk_printk buffer */
> tpk_printk(NULL, 0);

Are you sure you can call this with a spinlock held?

Doesn't your trace above show the opposite?

What is wrong with sleeping during the mutex you currently have? How is
syzbot reporting this error, is there a reproducer somewhere?

thanks,

greg k-h

2020-01-06 02:46:04

by Zhenzhong Duan

[permalink] [raw]
Subject: Re: [PATCH RESEND] ttyprintk: fix a potential sleeping in interrupt context issue

On Fri, Jan 3, 2020 at 4:43 PM Greg Kroah-Hartman
<[email protected]> wrote:
>
> On Fri, Jan 03, 2020 at 11:45:41AM +0800, Zhenzhong Duan wrote:
> > Google syzbot reports:
> > BUG: sleeping function called from invalid context at
> > kernel/locking/mutex.c:938
> > in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 0, name: swapper/1
> > 1 lock held by swapper/1/0:
> > ...
> > Call Trace:
> > <IRQ>
> > dump_stack+0x197/0x210
> > ___might_sleep.cold+0x1fb/0x23e
> > __might_sleep+0x95/0x190
> > __mutex_lock+0xc5/0x13c0
> > mutex_lock_nested+0x16/0x20
> > tpk_write+0x5d/0x340
> > resync_tnc+0x1b6/0x320
> > call_timer_fn+0x1ac/0x780
> > run_timer_softirq+0x6c3/0x1790
> > __do_softirq+0x262/0x98c
> > irq_exit+0x19b/0x1e0
> > smp_apic_timer_interrupt+0x1a3/0x610
> > apic_timer_interrupt+0xf/0x20
> > </IRQ>
> >
> > Fix it by using spinlock in process context instead of mutex and having
> > interrupt disabled in critical section.
> >
> > Signed-off-by: Zhenzhong Duan <[email protected]>
> > Cc: Arnd Bergmann <[email protected]>
> > Cc: Greg Kroah-Hartman <[email protected]>
> > ---
> > drivers/char/ttyprintk.c | 15 +++++++++------
> > 1 file changed, 9 insertions(+), 6 deletions(-)
>
> Why was this resent? What differs from the first version that required
> it to be resent?
>
> Always give us a clue here please :)
Sorry, I should have done that.
patch-bot told me my last version is malformed(tabs converted to
spaces) which may be due to I used gmail web browser to send patch.
Now I have direct access to smtp.gmail.com and use 'git send-email',
so that's not an issue now. No functional changes compared to last
version.

>
>
> >
> > diff --git a/drivers/char/ttyprintk.c b/drivers/char/ttyprintk.c
> > index 4f24e46ebe7c..56db949a7b70 100644
> > --- a/drivers/char/ttyprintk.c
> > +++ b/drivers/char/ttyprintk.c
> > @@ -15,10 +15,11 @@
> > #include <linux/serial.h>
> > #include <linux/tty.h>
> > #include <linux/module.h>
> > +#include <linux/spinlock.h>
> >
> > struct ttyprintk_port {
> > struct tty_port port;
> > - struct mutex port_write_mutex;
> > + spinlock_t spinlock;
> > };
> >
> > static struct ttyprintk_port tpk_port;
> > @@ -99,11 +100,12 @@ static int tpk_open(struct tty_struct *tty, struct file *filp)
> > static void tpk_close(struct tty_struct *tty, struct file *filp)
> > {
> > struct ttyprintk_port *tpkp = tty->driver_data;
> > + unsigned long flags;
> >
> > - mutex_lock(&tpkp->port_write_mutex);
> > + spin_lock_irqsave(&tpkp->spinlock, flags);
> > /* flush tpk_printk buffer */
> > tpk_printk(NULL, 0);
>
> Are you sure you can call this with a spinlock held?
I think so.

>
> Doesn't your trace above show the opposite?
That's why I use spin_lock_irqsave() variants rather than spin_lock()

The issue here is tpk_write()/tpk_close() could be interrupted when
holding a mutex, then in timer handler tpk_write() is called again
trying to acquire same mutex, lead to dead lock.

With spin_lock_irqsave(), interrupt is disabled in process context, so
no such issue.

>
> What is wrong with sleeping during the mutex you currently have? How is
> syzbot reporting this error, is there a reproducer somewhere?
See https://syzkaller.appspot.com/bug?extid=2eeef62ee31f9460ad65

I didn't reproduce the dead lock locally, not even for the warning
syzbot reported, but syzbot does.

Thanks
Zhenzhong

2020-01-11 20:04:04

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH RESEND] ttyprintk: fix a potential sleeping in interrupt context issue

On Mon, Jan 06, 2020 at 10:44:42AM +0800, Zhenzhong Duan wrote:
> On Fri, Jan 3, 2020 at 4:43 PM Greg Kroah-Hartman
> <[email protected]> wrote:
> >
> > On Fri, Jan 03, 2020 at 11:45:41AM +0800, Zhenzhong Duan wrote:
> > > Google syzbot reports:
> > > BUG: sleeping function called from invalid context at
> > > kernel/locking/mutex.c:938
> > > in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 0, name: swapper/1
> > > 1 lock held by swapper/1/0:
> > > ...
> > > Call Trace:
> > > <IRQ>
> > > dump_stack+0x197/0x210
> > > ___might_sleep.cold+0x1fb/0x23e
> > > __might_sleep+0x95/0x190
> > > __mutex_lock+0xc5/0x13c0
> > > mutex_lock_nested+0x16/0x20
> > > tpk_write+0x5d/0x340
> > > resync_tnc+0x1b6/0x320
> > > call_timer_fn+0x1ac/0x780
> > > run_timer_softirq+0x6c3/0x1790
> > > __do_softirq+0x262/0x98c
> > > irq_exit+0x19b/0x1e0
> > > smp_apic_timer_interrupt+0x1a3/0x610
> > > apic_timer_interrupt+0xf/0x20
> > > </IRQ>
> > >
> > > Fix it by using spinlock in process context instead of mutex and having
> > > interrupt disabled in critical section.
> > >
> > > Signed-off-by: Zhenzhong Duan <[email protected]>
> > > Cc: Arnd Bergmann <[email protected]>
> > > Cc: Greg Kroah-Hartman <[email protected]>
> > > ---
> > > drivers/char/ttyprintk.c | 15 +++++++++------
> > > 1 file changed, 9 insertions(+), 6 deletions(-)
> >
> > Why was this resent? What differs from the first version that required
> > it to be resent?
> >
> > Always give us a clue here please :)
> Sorry, I should have done that.
> patch-bot told me my last version is malformed(tabs converted to
> spaces) which may be due to I used gmail web browser to send patch.
> Now I have direct access to smtp.gmail.com and use 'git send-email',
> so that's not an issue now. No functional changes compared to last
> version.

Ok, then this is a v2 patch, and it needs to be described what changed
below the --- line. Please always do that, when you resend this again.

> > > diff --git a/drivers/char/ttyprintk.c b/drivers/char/ttyprintk.c
> > > index 4f24e46ebe7c..56db949a7b70 100644
> > > --- a/drivers/char/ttyprintk.c
> > > +++ b/drivers/char/ttyprintk.c
> > > @@ -15,10 +15,11 @@
> > > #include <linux/serial.h>
> > > #include <linux/tty.h>
> > > #include <linux/module.h>
> > > +#include <linux/spinlock.h>
> > >
> > > struct ttyprintk_port {
> > > struct tty_port port;
> > > - struct mutex port_write_mutex;
> > > + spinlock_t spinlock;
> > > };
> > >
> > > static struct ttyprintk_port tpk_port;
> > > @@ -99,11 +100,12 @@ static int tpk_open(struct tty_struct *tty, struct file *filp)
> > > static void tpk_close(struct tty_struct *tty, struct file *filp)
> > > {
> > > struct ttyprintk_port *tpkp = tty->driver_data;
> > > + unsigned long flags;
> > >
> > > - mutex_lock(&tpkp->port_write_mutex);
> > > + spin_lock_irqsave(&tpkp->spinlock, flags);
> > > /* flush tpk_printk buffer */
> > > tpk_printk(NULL, 0);
> >
> > Are you sure you can call this with a spinlock held?
> I think so.
>
> >
> > Doesn't your trace above show the opposite?
> That's why I use spin_lock_irqsave() variants rather than spin_lock()
>
> The issue here is tpk_write()/tpk_close() could be interrupted when
> holding a mutex, then in timer handler tpk_write() is called again
> trying to acquire same mutex, lead to dead lock.
>
> With spin_lock_irqsave(), interrupt is disabled in process context, so
> no such issue.
>
> >
> > What is wrong with sleeping during the mutex you currently have? How is
> > syzbot reporting this error, is there a reproducer somewhere?
> See https://syzkaller.appspot.com/bug?extid=2eeef62ee31f9460ad65
>
> I didn't reproduce the dead lock locally, not even for the warning
> syzbot reported, but syzbot does.

If this is a syzbot found issue, please include in the changelog text
that information and the proper link to how to have it marked that way
(it's in the syzkaller information.)

Fix that up and resend a v3 please.

thanks,

greg k-h