2021-04-13 09:34:59

by Dmitry Vyukov

[permalink] [raw]
Subject: Re: [syzbot] general protection fault in gadget_setup

On Tue, Apr 13, 2021 at 10:08 AM syzbot
<[email protected]> wrote:
>
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: 0f4498ce Merge tag 'for-5.12/dm-fixes-2' of git://git.kern..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=124adbf6d00000
> kernel config: https://syzkaller.appspot.com/x/.config?x=daeff30c2474a60f
> dashboard link: https://syzkaller.appspot.com/bug?extid=eb4674092e6cc8d9e0bd
> userspace arch: i386
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: [email protected]

I suspect that the raw gadget_unbind() can be called while the timer
is still active. gadget_unbind() sets gadget data to NULL.
But I am not sure which unbind call this is:
usb_gadget_remove_driver() or right in udc_bind_to_driver() due to a
start error.

Also looking at the code, gadget_bind() resets data to NULL on this error path:
https://elixir.bootlin.com/linux/v5.12-rc7/source/drivers/usb/gadget/legacy/raw_gadget.c#L283
but not on this error path:
https://elixir.bootlin.com/linux/v5.12-rc7/source/drivers/usb/gadget/legacy/raw_gadget.c#L306
Should the second one also reset data to NULL?


> general protection fault, probably for non-canonical address 0xdffffc0000000004: 0000 [#1] PREEMPT SMP KASAN
> KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027]
> CPU: 1 PID: 5016 Comm: systemd-udevd Not tainted 5.12.0-rc4-syzkaller #0
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
> RIP: 0010:__lock_acquire+0xcfe/0x54c0 kernel/locking/lockdep.c:4770
> Code: 09 0e 41 bf 01 00 00 00 0f 86 8c 00 00 00 89 05 48 69 09 0e e9 81 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 f2 48 c1 ea 03 <80> 3c 02 00 0f 85 5b 31 00 00 49 81 3e c0 13 38 8f 0f 84 d0 f3 ff
> RSP: 0000:ffffc90000ce77d8 EFLAGS: 00010002
> RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
> RDX: 0000000000000004 RSI: 1ffff9200019cf0c RDI: 0000000000000020
> RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000001
> R10: 0000000000000001 R11: 0000000000000006 R12: ffff88801295b880
> R13: 0000000000000000 R14: 0000000000000020 R15: 0000000000000000
> FS: 00007fcd745f98c0(0000) GS:ffff88802cb00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007ffe279f7d87 CR3: 000000001c7d4000 CR4: 0000000000150ee0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> lock_acquire kernel/locking/lockdep.c:5510 [inline]
> lock_acquire+0x1ab/0x740 kernel/locking/lockdep.c:5475
> __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
> _raw_spin_lock_irqsave+0x39/0x50 kernel/locking/spinlock.c:159
> gadget_setup+0x4e/0x510 drivers/usb/gadget/legacy/raw_gadget.c:327
> dummy_timer+0x1615/0x32a0 drivers/usb/gadget/udc/dummy_hcd.c:1903
> call_timer_fn+0x1a5/0x6b0 kernel/time/timer.c:1431
> expire_timers kernel/time/timer.c:1476 [inline]
> __run_timers.part.0+0x67c/0xa50 kernel/time/timer.c:1745
> __run_timers kernel/time/timer.c:1726 [inline]
> run_timer_softirq+0xb3/0x1d0 kernel/time/timer.c:1758
> __do_softirq+0x29b/0x9f6 kernel/softirq.c:345
> invoke_softirq kernel/softirq.c:221 [inline]
> __irq_exit_rcu kernel/softirq.c:422 [inline]
> irq_exit_rcu+0x134/0x200 kernel/softirq.c:434
> sysvec_apic_timer_interrupt+0x45/0xc0 arch/x86/kernel/apic/apic.c:1100
> asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:632
> RIP: 0033:0x560cfc4a02ed
> Code: 4c 39 c1 48 89 42 18 4c 89 52 08 4c 89 5a 10 48 89 1a 0f 87 7b ff ff ff 48 89 f8 48 f7 d0 48 01 c8 48 83 e0 f8 48 8d 7c 07 08 <48> 8d 0d 34 d9 02 00 48 63 04 b1 48 01 c8 ff e0 0f 1f 00 48 8d 0d
> RSP: 002b:00007ffe279f9dd0 EFLAGS: 00000246
> RAX: 0000000000000000 RBX: 0000560cfcd88e40 RCX: 0000560cfcd72af0
> RDX: 00007ffe279f9de0 RSI: 0000000000000007 RDI: 0000560cfcd72af0
> RBP: 00007ffe279f9e70 R08: 0000000000000000 R09: 0000000000000020
> R10: 0000560cfcd72af7 R11: 0000560cfcd73530 R12: 0000560cfcd72af0
> R13: 0000000000000000 R14: 0000560cfcd72b10 R15: 0000000000000001
> Modules linked in:
> ---[ end trace ab0f6632fdd289cf ]---
> RIP: 0010:__lock_acquire+0xcfe/0x54c0 kernel/locking/lockdep.c:4770
> Code: 09 0e 41 bf 01 00 00 00 0f 86 8c 00 00 00 89 05 48 69 09 0e e9 81 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 f2 48 c1 ea 03 <80> 3c 02 00 0f 85 5b 31 00 00 49 81 3e c0 13 38 8f 0f 84 d0 f3 ff
> RSP: 0000:ffffc90000ce77d8 EFLAGS: 00010002
> RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
> RDX: 0000000000000004 RSI: 1ffff9200019cf0c RDI: 0000000000000020
> RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000001
> R10: 0000000000000001 R11: 0000000000000006 R12: ffff88801295b880
> R13: 0000000000000000 R14: 0000000000000020 R15: 0000000000000000
> FS: 00007fcd745f98c0(0000) GS:ffff88802cb00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007ffe279f7d87 CR3: 000000001c7d4000 CR4: 0000000000150ee0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>
>
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at [email protected].
>
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>
> --
> You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/00000000000075c58405bfd6228c%40google.com.


2021-04-13 21:29:02

by Alan Stern

[permalink] [raw]
Subject: Re: [syzbot] general protection fault in gadget_setup

On Tue, Apr 13, 2021 at 10:12:05AM +0200, Dmitry Vyukov wrote:
> On Tue, Apr 13, 2021 at 10:08 AM syzbot
> <[email protected]> wrote:
> >
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit: 0f4498ce Merge tag 'for-5.12/dm-fixes-2' of git://git.kern..
> > git tree: upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=124adbf6d00000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=daeff30c2474a60f
> > dashboard link: https://syzkaller.appspot.com/bug?extid=eb4674092e6cc8d9e0bd
> > userspace arch: i386
> >
> > Unfortunately, I don't have any reproducer for this issue yet.
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: [email protected]
>
> I suspect that the raw gadget_unbind() can be called while the timer
> is still active. gadget_unbind() sets gadget data to NULL.
> But I am not sure which unbind call this is:
> usb_gadget_remove_driver() or right in udc_bind_to_driver() due to a
> start error.

This certainly looks like a race between gadget_unbind and gadget_setup
in raw_gadget.

In theory, this race shouldn't matter. The gadget core is supposed to
guarantee that there won't be any more callbacks to the gadget driver
once the driver's unbind routine is called. That guarantee is enforced
in usb_gadget_remove_driver as follows:

usb_gadget_disconnect(udc->gadget);
if (udc->gadget->irq)
synchronize_irq(udc->gadget->irq);
udc->driver->unbind(udc->gadget);
usb_gadget_udc_stop(udc);

usb_gadget_disconnect turns off the pullup resistor, telling the host
that the gadget is no longer connected and preventing the transmission
of any more USB packets. Any packets that have already been received
are sure to processed by the UDC driver's interrupt handler by the time
synchronize_irq returns.

But this doesn't work with dummy_hcd, because dummy_hcd doesn't use
interrupts; it uses a timer instead. It does have code to emulate the
effect of synchronize_irq, but that code doesn't get invoked at the
right time -- it currently runs in usb_gadget_udc_stop, after the unbind
callback instead of before. Indeed, there's no way for
usb_gadget_remove_driver to invoke this code before the unbind
callback,.

I thought the synchronize_irq emulation problem had been completely
solved, but evidently it hasn't. It looks like the best solution is to
add a call of the synchronize_irq emulation code in dummy_pullup.

Maybe we can test this reasoning by putting a delay just before the call
to dum->driver->setup. That runs in the timer handler, so it's not a
good place to delay, but it may be okay just for testing purposes.

Hopefully this patch will make the race a lot more likely to occur. Is
there any way to tell syzkaller to test it, despite the fact that
syzkaller doesn't think it has a reproducer for this issue?

Alan Stern


Index: usb-devel/drivers/usb/gadget/udc/dummy_hcd.c
===================================================================
--- usb-devel.orig/drivers/usb/gadget/udc/dummy_hcd.c
+++ usb-devel/drivers/usb/gadget/udc/dummy_hcd.c
@@ -1900,6 +1900,7 @@ restart:
if (value > 0) {
++dum->callback_usage;
spin_unlock(&dum->lock);
+ mdelay(5);
value = dum->driver->setup(&dum->gadget,
&setup);
spin_lock(&dum->lock);

2021-04-14 02:44:52

by Dmitry Vyukov

[permalink] [raw]
Subject: Re: [syzbot] general protection fault in gadget_setup

On Tue, Apr 13, 2021 at 6:13 PM Alan Stern <[email protected]> wrote:
>
> On Tue, Apr 13, 2021 at 10:12:05AM +0200, Dmitry Vyukov wrote:
> > On Tue, Apr 13, 2021 at 10:08 AM syzbot
> > <[email protected]> wrote:
> > >
> > > Hello,
> > >
> > > syzbot found the following issue on:
> > >
> > > HEAD commit: 0f4498ce Merge tag 'for-5.12/dm-fixes-2' of git://git.kern..
> > > git tree: upstream
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=124adbf6d00000
> > > kernel config: https://syzkaller.appspot.com/x/.config?x=daeff30c2474a60f
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=eb4674092e6cc8d9e0bd
> > > userspace arch: i386
> > >
> > > Unfortunately, I don't have any reproducer for this issue yet.
> > >
> > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > Reported-by: [email protected]
> >
> > I suspect that the raw gadget_unbind() can be called while the timer
> > is still active. gadget_unbind() sets gadget data to NULL.
> > But I am not sure which unbind call this is:
> > usb_gadget_remove_driver() or right in udc_bind_to_driver() due to a
> > start error.
>
> This certainly looks like a race between gadget_unbind and gadget_setup
> in raw_gadget.
>
> In theory, this race shouldn't matter. The gadget core is supposed to
> guarantee that there won't be any more callbacks to the gadget driver
> once the driver's unbind routine is called. That guarantee is enforced
> in usb_gadget_remove_driver as follows:
>
> usb_gadget_disconnect(udc->gadget);
> if (udc->gadget->irq)
> synchronize_irq(udc->gadget->irq);
> udc->driver->unbind(udc->gadget);
> usb_gadget_udc_stop(udc);
>
> usb_gadget_disconnect turns off the pullup resistor, telling the host
> that the gadget is no longer connected and preventing the transmission
> of any more USB packets. Any packets that have already been received
> are sure to processed by the UDC driver's interrupt handler by the time
> synchronize_irq returns.
>
> But this doesn't work with dummy_hcd, because dummy_hcd doesn't use
> interrupts; it uses a timer instead. It does have code to emulate the
> effect of synchronize_irq, but that code doesn't get invoked at the
> right time -- it currently runs in usb_gadget_udc_stop, after the unbind
> callback instead of before. Indeed, there's no way for
> usb_gadget_remove_driver to invoke this code before the unbind
> callback,.
>
> I thought the synchronize_irq emulation problem had been completely
> solved, but evidently it hasn't. It looks like the best solution is to
> add a call of the synchronize_irq emulation code in dummy_pullup.
>
> Maybe we can test this reasoning by putting a delay just before the call
> to dum->driver->setup. That runs in the timer handler, so it's not a
> good place to delay, but it may be okay just for testing purposes.
>
> Hopefully this patch will make the race a lot more likely to occur. Is
> there any way to tell syzkaller to test it, despite the fact that
> syzkaller doesn't think it has a reproducer for this issue?

If there is no reproducer the only way syzbot can test it is if it's
in linux-next under CONFIG_DEBUG_AID_FOR_SYZBOT:
http://bit.do/syzbot#no-custom-patches





> Alan Stern
>
>
> Index: usb-devel/drivers/usb/gadget/udc/dummy_hcd.c
> ===================================================================
> --- usb-devel.orig/drivers/usb/gadget/udc/dummy_hcd.c
> +++ usb-devel/drivers/usb/gadget/udc/dummy_hcd.c
> @@ -1900,6 +1900,7 @@ restart:
> if (value > 0) {
> ++dum->callback_usage;
> spin_unlock(&dum->lock);
> + mdelay(5);
> value = dum->driver->setup(&dum->gadget,
> &setup);
> spin_lock(&dum->lock);

2021-04-16 06:58:15

by Anirudh Rayabharam

[permalink] [raw]
Subject: Re: [syzbot] general protection fault in gadget_setup

On Tue, Apr 13, 2021 at 12:13:11PM -0400, Alan Stern wrote:
> On Tue, Apr 13, 2021 at 10:12:05AM +0200, Dmitry Vyukov wrote:
> > On Tue, Apr 13, 2021 at 10:08 AM syzbot
> > <[email protected]> wrote:
> > >
> > > Hello,
> > >
> > > syzbot found the following issue on:
> > >
> > > HEAD commit: 0f4498ce Merge tag 'for-5.12/dm-fixes-2' of git://git.kern..
> > > git tree: upstream
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=124adbf6d00000
> > > kernel config: https://syzkaller.appspot.com/x/.config?x=daeff30c2474a60f
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=eb4674092e6cc8d9e0bd
> > > userspace arch: i386
> > >
> > > Unfortunately, I don't have any reproducer for this issue yet.
> > >
> > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > Reported-by: [email protected]
> >
> > I suspect that the raw gadget_unbind() can be called while the timer
> > is still active. gadget_unbind() sets gadget data to NULL.
> > But I am not sure which unbind call this is:
> > usb_gadget_remove_driver() or right in udc_bind_to_driver() due to a
> > start error.
>
> This certainly looks like a race between gadget_unbind and gadget_setup
> in raw_gadget.
>
> In theory, this race shouldn't matter. The gadget core is supposed to
> guarantee that there won't be any more callbacks to the gadget driver
> once the driver's unbind routine is called. That guarantee is enforced
> in usb_gadget_remove_driver as follows:
>
> usb_gadget_disconnect(udc->gadget);
> if (udc->gadget->irq)
> synchronize_irq(udc->gadget->irq);
> udc->driver->unbind(udc->gadget);
> usb_gadget_udc_stop(udc);
>
> usb_gadget_disconnect turns off the pullup resistor, telling the host
> that the gadget is no longer connected and preventing the transmission
> of any more USB packets. Any packets that have already been received
> are sure to processed by the UDC driver's interrupt handler by the time
> synchronize_irq returns.
>
> But this doesn't work with dummy_hcd, because dummy_hcd doesn't use
> interrupts; it uses a timer instead. It does have code to emulate the
> effect of synchronize_irq, but that code doesn't get invoked at the
> right time -- it currently runs in usb_gadget_udc_stop, after the unbind
> callback instead of before. Indeed, there's no way for
> usb_gadget_remove_driver to invoke this code before the unbind
> callback,.
>
> I thought the synchronize_irq emulation problem had been completely
> solved, but evidently it hasn't. It looks like the best solution is to
> add a call of the synchronize_irq emulation code in dummy_pullup.
>
> Maybe we can test this reasoning by putting a delay just before the call
> to dum->driver->setup. That runs in the timer handler, so it's not a
> good place to delay, but it may be okay just for testing purposes.
>
> Hopefully this patch will make the race a lot more likely to occur. Is

Hi Alan,

Indeed, I was able to reproduce this bug easily on my machine with your
delay patch applied and using this syzkaller program:

syz_usb_connect$cdc_ncm(0x1, 0x6e, &(0x7f0000000040)={{0x12, 0x1, 0x0, 0x2, 0x0, 0x0, 0x8, 0x525, 0xa4a1, 0x40, 0x1, 0x2, 0x3, 0x1, [{{0x9, 0x2, 0x5c, 0x2, 0x1, 0x0, 0x0, 0x0, {{0x9, 0x4, 0x0, 0x0, 0x1, 0x2, 0xd, 0x0, 0x0, {{0x5}, {0x5}, {0xd}, {0x6}}, {{0x9, 0x5, 0x81, 0x3, 0x200}}}}}}]}}, &(0x7f0000000480)={0x0, 0x0, 0x0, 0x0, 0x3, [{0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}]})

I also tested doing the synchronize_irq emulation in dummy_pullup and it
fixed the issue. The patch is below.

Thanks!

- Anirudh.

diff --git a/drivers/usb/gadget/udc/dummy_hcd.c b/drivers/usb/gadget/udc/dummy_hcd.c
index ce24d4f28f2a..931d4612d859 100644
--- a/drivers/usb/gadget/udc/dummy_hcd.c
+++ b/drivers/usb/gadget/udc/dummy_hcd.c
@@ -903,6 +903,12 @@ static int dummy_pullup(struct usb_gadget *_gadget, int value)
spin_lock_irqsave(&dum->lock, flags);
dum->pullup = (value != 0);
set_link_state(dum_hcd);
+ /* emulate synchronize_irq(): wait for callbacks to finish */
+ while (dum->callback_usage > 0) {
+ spin_unlock_irqrestore(&dum->lock, flags);
+ usleep_range(1000, 2000);
+ spin_lock_irqsave(&dum->lock, flags);
+ }
spin_unlock_irqrestore(&dum->lock, flags);

usb_hcd_poll_rh_status(dummy_hcd_to_hcd(dum_hcd));
@@ -1005,13 +1011,6 @@ static int dummy_udc_stop(struct usb_gadget *g)
dum->ints_enabled = 0;
stop_activity(dum);

- /* emulate synchronize_irq(): wait for callbacks to finish */
- while (dum->callback_usage > 0) {
- spin_unlock_irq(&dum->lock);
- usleep_range(1000, 2000);
- spin_lock_irq(&dum->lock);
- }
-
dum->driver = NULL;
spin_unlock_irq(&dum->lock);

@@ -1900,6 +1899,7 @@ static void dummy_timer(struct timer_list *t)
if (value > 0) {
++dum->callback_usage;
spin_unlock(&dum->lock);
+ mdelay(5);
value = dum->driver->setup(&dum->gadget,
&setup);
spin_lock(&dum->lock);

2021-04-16 16:12:21

by Alan Stern

[permalink] [raw]
Subject: Re: [syzbot] general protection fault in gadget_setup

On Fri, Apr 16, 2021 at 11:10:35AM +0530, Anirudh Rayabharam wrote:
> On Tue, Apr 13, 2021 at 12:13:11PM -0400, Alan Stern wrote:
> > Maybe we can test this reasoning by putting a delay just before the call
> > to dum->driver->setup. That runs in the timer handler, so it's not a
> > good place to delay, but it may be okay just for testing purposes.
> >
> > Hopefully this patch will make the race a lot more likely to occur. Is
>
> Hi Alan,
>
> Indeed, I was able to reproduce this bug easily on my machine with your
> delay patch applied and using this syzkaller program:
>
> syz_usb_connect$cdc_ncm(0x1, 0x6e, &(0x7f0000000040)={{0x12, 0x1, 0x0, 0x2, 0x0, 0x0, 0x8, 0x525, 0xa4a1, 0x40, 0x1, 0x2, 0x3, 0x1, [{{0x9, 0x2, 0x5c, 0x2, 0x1, 0x0, 0x0, 0x0, {{0x9, 0x4, 0x0, 0x0, 0x1, 0x2, 0xd, 0x0, 0x0, {{0x5}, {0x5}, {0xd}, {0x6}}, {{0x9, 0x5, 0x81, 0x3, 0x200}}}}}}]}}, &(0x7f0000000480)={0x0, 0x0, 0x0, 0x0, 0x3, [{0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}]})
>
> I also tested doing the synchronize_irq emulation in dummy_pullup and it
> fixed the issue. The patch is below.

That's great! Thanks for testing.

> Thanks!
>
> - Anirudh.
>
> diff --git a/drivers/usb/gadget/udc/dummy_hcd.c b/drivers/usb/gadget/udc/dummy_hcd.c
> index ce24d4f28f2a..931d4612d859 100644
> --- a/drivers/usb/gadget/udc/dummy_hcd.c
> +++ b/drivers/usb/gadget/udc/dummy_hcd.c
> @@ -903,6 +903,12 @@ static int dummy_pullup(struct usb_gadget *_gadget, int value)
> spin_lock_irqsave(&dum->lock, flags);
> dum->pullup = (value != 0);
> set_link_state(dum_hcd);
> + /* emulate synchronize_irq(): wait for callbacks to finish */
> + while (dum->callback_usage > 0) {
> + spin_unlock_irqrestore(&dum->lock, flags);
> + usleep_range(1000, 2000);
> + spin_lock_irqsave(&dum->lock, flags);
> + }

We should do this only if value == 0. No synchronization is needed when
the pullup is turned on.

Also, there should be a comment explaining that this is necessary
because there's no other place to emulate the call made to
synchronize_irq() in core.c:usb_gadget_remove_driver().

> spin_unlock_irqrestore(&dum->lock, flags);
>
> usb_hcd_poll_rh_status(dummy_hcd_to_hcd(dum_hcd));
> @@ -1005,13 +1011,6 @@ static int dummy_udc_stop(struct usb_gadget *g)
> dum->ints_enabled = 0;
> stop_activity(dum);
>
> - /* emulate synchronize_irq(): wait for callbacks to finish */
> - while (dum->callback_usage > 0) {
> - spin_unlock_irq(&dum->lock);
> - usleep_range(1000, 2000);
> - spin_lock_irq(&dum->lock);
> - }
> -
> dum->driver = NULL;
> spin_unlock_irq(&dum->lock);

Actually, I wanted to move this emulation code into a new subroutine and
then call that subroutine from _both_ places. Would you like to write
and submit a patch that does this?

Alan Stern

2021-04-16 18:53:01

by Anirudh Rayabharam

[permalink] [raw]
Subject: Re: [syzbot] general protection fault in gadget_setup

On Fri, Apr 16, 2021 at 11:27:34AM -0400, Alan Stern wrote:
> On Fri, Apr 16, 2021 at 11:10:35AM +0530, Anirudh Rayabharam wrote:
> > On Tue, Apr 13, 2021 at 12:13:11PM -0400, Alan Stern wrote:
> > > Maybe we can test this reasoning by putting a delay just before the call
> > > to dum->driver->setup. That runs in the timer handler, so it's not a
> > > good place to delay, but it may be okay just for testing purposes.
> > >
> > > Hopefully this patch will make the race a lot more likely to occur. Is
> >
> > Hi Alan,
> >
> > Indeed, I was able to reproduce this bug easily on my machine with your
> > delay patch applied and using this syzkaller program:
> >
> > syz_usb_connect$cdc_ncm(0x1, 0x6e, &(0x7f0000000040)={{0x12, 0x1, 0x0, 0x2, 0x0, 0x0, 0x8, 0x525, 0xa4a1, 0x40, 0x1, 0x2, 0x3, 0x1, [{{0x9, 0x2, 0x5c, 0x2, 0x1, 0x0, 0x0, 0x0, {{0x9, 0x4, 0x0, 0x0, 0x1, 0x2, 0xd, 0x0, 0x0, {{0x5}, {0x5}, {0xd}, {0x6}}, {{0x9, 0x5, 0x81, 0x3, 0x200}}}}}}]}}, &(0x7f0000000480)={0x0, 0x0, 0x0, 0x0, 0x3, [{0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}]})
> >
> > I also tested doing the synchronize_irq emulation in dummy_pullup and it
> > fixed the issue. The patch is below.
>
> That's great! Thanks for testing.
>
> > Thanks!
> >
> > - Anirudh.
> >
> > diff --git a/drivers/usb/gadget/udc/dummy_hcd.c b/drivers/usb/gadget/udc/dummy_hcd.c
> > index ce24d4f28f2a..931d4612d859 100644
> > --- a/drivers/usb/gadget/udc/dummy_hcd.c
> > +++ b/drivers/usb/gadget/udc/dummy_hcd.c
> > @@ -903,6 +903,12 @@ static int dummy_pullup(struct usb_gadget *_gadget, int value)
> > spin_lock_irqsave(&dum->lock, flags);
> > dum->pullup = (value != 0);
> > set_link_state(dum_hcd);
> > + /* emulate synchronize_irq(): wait for callbacks to finish */
> > + while (dum->callback_usage > 0) {
> > + spin_unlock_irqrestore(&dum->lock, flags);
> > + usleep_range(1000, 2000);
> > + spin_lock_irqsave(&dum->lock, flags);
> > + }
>
> We should do this only if value == 0. No synchronization is needed when
> the pullup is turned on.

Oh right! My bad.

> Also, there should be a comment explaining that this is necessary
> because there's no other place to emulate the call made to
> synchronize_irq() in core.c:usb_gadget_remove_driver().

Will do.

> > spin_unlock_irqrestore(&dum->lock, flags);
> >
> > usb_hcd_poll_rh_status(dummy_hcd_to_hcd(dum_hcd));
> > @@ -1005,13 +1011,6 @@ static int dummy_udc_stop(struct usb_gadget *g)
> > dum->ints_enabled = 0;
> > stop_activity(dum);
> >
> > - /* emulate synchronize_irq(): wait for callbacks to finish */
> > - while (dum->callback_usage > 0) {
> > - spin_unlock_irq(&dum->lock);
> > - usleep_range(1000, 2000);
> > - spin_lock_irq(&dum->lock);
> > - }
> > -
> > dum->driver = NULL;
> > spin_unlock_irq(&dum->lock);
>
> Actually, I wanted to move this emulation code into a new subroutine and
> then call that subroutine from _both_ places. Would you like to write

Does it really need to be called from both places?

> and submit a patch that does this?

Sure! I will do that.

Thanks!

- Anirudh.

2021-04-16 20:43:31

by Alan Stern

[permalink] [raw]
Subject: Re: [syzbot] general protection fault in gadget_setup

On Fri, Apr 16, 2021 at 10:35:20PM +0530, Anirudh Rayabharam wrote:
> On Fri, Apr 16, 2021 at 11:27:34AM -0400, Alan Stern wrote:
> > Actually, I wanted to move this emulation code into a new subroutine and
> > then call that subroutine from _both_ places. Would you like to write
>
> Does it really need to be called from both places?

You know, I was going to say Yes, but now I think you're right; it's not
needed in dummy_udc_stop. This is because core.c always calls
usb_gadget_disconnect before usb_gadget_udc_stop. And we can rely on
this behavior; it's obviously necessary to disconnect from the host
before stopping the UDC driver.

On the other hand, while checking that fact I noticed that
soft_connect_store in core.c doesn't call synchronize_irq in between the
other two, the way usb_gadget_remove_driver does. That seems like a bug
-- if it's necessary to synchronize with the IRQ handler on one path, it
should be necessary on the other path as well. But that's a matter for
a separate patch.

Alan Stern

> > and submit a patch that does this?
>
> Sure! I will do that.
>
> Thanks!
>
> - Anirudh.