2024-04-11 06:53:15

by Sam Sun

[permalink] [raw]
Subject: [Linux kernel bug] general protection fault in disable_store

Dear developers and maintainers,

We encountered a general protection fault in function disable_store.
It is tested against the latest upstream linux (tag 6.9-rc3). C repro
and kernel config are attached to this email. Kernel crash log is
listed below.
```
general protection fault, probably for non-canonical address
0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 1 PID: 9459 Comm: syz-executor414 Not tainted 6.7.0-rc7 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
RIP: 0010:disable_store+0xd0/0x3d0 drivers/usb/core/port.c:88
Code: 02 00 00 4c 8b 75 40 4d 8d be 58 ff ff ff 4c 89 ff e8 a4 20 fa
ff 48 89 c2 48 89 c5 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c
02 00 0f 85 b0 02 00 00 48 8b 45 00 48 8d bb 34 05 00 00 48
RSP: 0018:ffffc90006e3fc08 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffff88801d4d4008 RCX: ffffffff86706be8
RDX: 0000000000000000 RSI: ffffffff86706c4d RDI: 0000000000000005
RBP: 0000000000000000 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff92000dc7f85
R13: ffff88810f4bfb18 R14: ffff88801d4d10a8 R15: ffff88801d4d1000
FS: 00007fa0af71b640(0000) GS:ffff888135c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa0af71a4b8 CR3: 0000000022f5f000 CR4: 0000000000750ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
dev_attr_store+0x54/0x80 drivers/base/core.c:2366
sysfs_kf_write+0x114/0x170 fs/sysfs/file.c:136
kernfs_fop_write_iter+0x337/0x500 fs/kernfs/file.c:334
call_write_iter include/linux/fs.h:2020 [inline]
new_sync_write fs/read_write.c:491 [inline]
vfs_write+0x96a/0xd80 fs/read_write.c:584
ksys_write+0x122/0x250 fs/read_write.c:637
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x40/0x110 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b
RIP: 0033:0x7fa0aff9ee1f
Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 a9 f4 02 00 48 8b 54
24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d
00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 ec f4 02 00 48
RSP: 002b:00007fa0af71a460 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00000000ffffffff RCX: 00007fa0aff9ee1f
RDX: 0000000000000004 RSI: 00007fa0af71acc0 RDI: 0000000000000005
RBP: 0000000000000005 R08: 0000000000000000 R09: 00007fffdb6af2cf
R10: 0000000000000000 R11: 0000000000000293 R12: 00007fa0af71acc0
R13: 000000000000006e R14: 00007fa0aff613d0 R15: 00007fa0af6fb000
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:disable_store+0xd0/0x3d0 drivers/usb/core/port.c:88
Code: 02 00 00 4c 8b 75 40 4d 8d be 58 ff ff ff 4c 89 ff e8 a4 20 fa
ff 48 89 c2 48 89 c5 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c
02 00 0f 85 b0 02 00 00 48 8b 45 00 48 8d bb 34 05 00 00 48
RSP: 0018:ffffc90006e3fc08 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffff88801d4d4008 RCX: ffffffff86706be8
RDX: 0000000000000000 RSI: ffffffff86706c4d RDI: 0000000000000005
RBP: 0000000000000000 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff92000dc7f85
R13: ffff88810f4bfb18 R14: ffff88801d4d10a8 R15: ffff88801d4d1000
FS: 00007fa0af71b640(0000) GS:ffff888063a00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055d23050c460 CR3: 0000000022f5f000 CR4: 0000000000750ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
----------------
Code disassembly (best guess):
0: 02 00 add (%rax),%al
2: 00 4c 8b 75 add %cl,0x75(%rbx,%rcx,4)
6: 40 rex
7: 4d 8d be 58 ff ff ff lea -0xa8(%r14),%r15
e: 4c 89 ff mov %r15,%rdi
11: e8 a4 20 fa ff call 0xfffa20ba
16: 48 89 c2 mov %rax,%rdx
19: 48 89 c5 mov %rax,%rbp
1c: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax
23: fc ff df
26: 48 c1 ea 03 shr $0x3,%rdx
* 2a: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1) <--
trapping instruction
2e: 0f 85 b0 02 00 00 jne 0x2e4
34: 48 8b 45 00 mov 0x0(%rbp),%rax
38: 48 8d bb 34 05 00 00 lea 0x534(%rbx),%rdi
3f: 48 rex.W
```
We analyzed the root cause of this bug. When calling disable_store()
in drivers/usb/core/port.c, if function authorized_store() is calling
usb_deauthorized_device() concurrently, the usb_interface will be
removed by usb_disable_device. However, in function disable_store,
usb_hub_to_struct_hub() would try to deref interface, causing
nullptr-deref. We also tested other functions in
drivers/usb/core/port.c. So far we haven't found a similar problem.

If you have any questions, please contact us.

Reported by Yue Sun <[email protected]>
Reported by xingwei lee <[email protected]>

Best Regards,
Yue


Attachments:
disable_store.c (80.68 kB)
config (242.05 kB)
Download all attachments

2024-04-11 06:59:45

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [Linux kernel bug] general protection fault in disable_store

On Thu, Apr 11, 2024 at 02:52:27PM +0800, Sam Sun wrote:
> Dear developers and maintainers,
>
> We encountered a general protection fault in function disable_store.
> It is tested against the latest upstream linux (tag 6.9-rc3). C repro
> and kernel config are attached to this email. Kernel crash log is
> listed below.
> ```
> general protection fault, probably for non-canonical address
> 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN
> KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
> CPU: 1 PID: 9459 Comm: syz-executor414 Not tainted 6.7.0-rc7 #2
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
> RIP: 0010:disable_store+0xd0/0x3d0 drivers/usb/core/port.c:88
> Code: 02 00 00 4c 8b 75 40 4d 8d be 58 ff ff ff 4c 89 ff e8 a4 20 fa
> ff 48 89 c2 48 89 c5 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c
> 02 00 0f 85 b0 02 00 00 48 8b 45 00 48 8d bb 34 05 00 00 48
> RSP: 0018:ffffc90006e3fc08 EFLAGS: 00010246
> RAX: dffffc0000000000 RBX: ffff88801d4d4008 RCX: ffffffff86706be8
> RDX: 0000000000000000 RSI: ffffffff86706c4d RDI: 0000000000000005
> RBP: 0000000000000000 R08: 0000000000000005 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff92000dc7f85
> R13: ffff88810f4bfb18 R14: ffff88801d4d10a8 R15: ffff88801d4d1000
> FS: 00007fa0af71b640(0000) GS:ffff888135c00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fa0af71a4b8 CR3: 0000000022f5f000 CR4: 0000000000750ef0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> PKRU: 55555554
> Call Trace:
> <TASK>
> dev_attr_store+0x54/0x80 drivers/base/core.c:2366
> sysfs_kf_write+0x114/0x170 fs/sysfs/file.c:136
> kernfs_fop_write_iter+0x337/0x500 fs/kernfs/file.c:334
> call_write_iter include/linux/fs.h:2020 [inline]
> new_sync_write fs/read_write.c:491 [inline]
> vfs_write+0x96a/0xd80 fs/read_write.c:584
> ksys_write+0x122/0x250 fs/read_write.c:637
> do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> do_syscall_64+0x40/0x110 arch/x86/entry/common.c:83
> entry_SYSCALL_64_after_hwframe+0x63/0x6b
> RIP: 0033:0x7fa0aff9ee1f
> Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 a9 f4 02 00 48 8b 54
> 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d
> 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 ec f4 02 00 48
> RSP: 002b:00007fa0af71a460 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
> RAX: ffffffffffffffda RBX: 00000000ffffffff RCX: 00007fa0aff9ee1f
> RDX: 0000000000000004 RSI: 00007fa0af71acc0 RDI: 0000000000000005
> RBP: 0000000000000005 R08: 0000000000000000 R09: 00007fffdb6af2cf
> R10: 0000000000000000 R11: 0000000000000293 R12: 00007fa0af71acc0
> R13: 000000000000006e R14: 00007fa0aff613d0 R15: 00007fa0af6fb000
> </TASK>
> Modules linked in:
> ---[ end trace 0000000000000000 ]---
> RIP: 0010:disable_store+0xd0/0x3d0 drivers/usb/core/port.c:88
> Code: 02 00 00 4c 8b 75 40 4d 8d be 58 ff ff ff 4c 89 ff e8 a4 20 fa
> ff 48 89 c2 48 89 c5 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c
> 02 00 0f 85 b0 02 00 00 48 8b 45 00 48 8d bb 34 05 00 00 48
> RSP: 0018:ffffc90006e3fc08 EFLAGS: 00010246
> RAX: dffffc0000000000 RBX: ffff88801d4d4008 RCX: ffffffff86706be8
> RDX: 0000000000000000 RSI: ffffffff86706c4d RDI: 0000000000000005
> RBP: 0000000000000000 R08: 0000000000000005 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff92000dc7f85
> R13: ffff88810f4bfb18 R14: ffff88801d4d10a8 R15: ffff88801d4d1000
> FS: 00007fa0af71b640(0000) GS:ffff888063a00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000055d23050c460 CR3: 0000000022f5f000 CR4: 0000000000750ef0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> PKRU: 55555554
> ----------------
> Code disassembly (best guess):
> 0: 02 00 add (%rax),%al
> 2: 00 4c 8b 75 add %cl,0x75(%rbx,%rcx,4)
> 6: 40 rex
> 7: 4d 8d be 58 ff ff ff lea -0xa8(%r14),%r15
> e: 4c 89 ff mov %r15,%rdi
> 11: e8 a4 20 fa ff call 0xfffa20ba
> 16: 48 89 c2 mov %rax,%rdx
> 19: 48 89 c5 mov %rax,%rbp
> 1c: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax
> 23: fc ff df
> 26: 48 c1 ea 03 shr $0x3,%rdx
> * 2a: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1) <--
> trapping instruction
> 2e: 0f 85 b0 02 00 00 jne 0x2e4
> 34: 48 8b 45 00 mov 0x0(%rbp),%rax
> 38: 48 8d bb 34 05 00 00 lea 0x534(%rbx),%rdi
> 3f: 48 rex.W
> ```
> We analyzed the root cause of this bug. When calling disable_store()
> in drivers/usb/core/port.c, if function authorized_store() is calling
> usb_deauthorized_device() concurrently, the usb_interface will be
> removed by usb_disable_device. However, in function disable_store,
> usb_hub_to_struct_hub() would try to deref interface, causing
> nullptr-deref. We also tested other functions in
> drivers/usb/core/port.c. So far we haven't found a similar problem.
>
> If you have any questions, please contact us.
>
> Reported by Yue Sun <[email protected]>
> Reported by xingwei lee <[email protected]>

Do you have a proposed patch to fix this as you have a way to easily
test this?

thanks,

greg k-h

2024-04-11 07:19:53

by Sam Sun

[permalink] [raw]
Subject: Re: [Linux kernel bug] general protection fault in disable_store

On Thu, Apr 11, 2024 at 2:58 PM Greg KH <[email protected]> wrote:
>
> On Thu, Apr 11, 2024 at 02:52:27PM +0800, Sam Sun wrote:
> > Dear developers and maintainers,
> >
> > We encountered a general protection fault in function disable_store.
> > It is tested against the latest upstream linux (tag 6.9-rc3). C repro
> > and kernel config are attached to this email. Kernel crash log is
> > listed below.
> > ```
> > general protection fault, probably for non-canonical address
> > 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN
> > KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
> > CPU: 1 PID: 9459 Comm: syz-executor414 Not tainted 6.7.0-rc7 #2
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
> > RIP: 0010:disable_store+0xd0/0x3d0 drivers/usb/core/port.c:88
> > Code: 02 00 00 4c 8b 75 40 4d 8d be 58 ff ff ff 4c 89 ff e8 a4 20 fa
> > ff 48 89 c2 48 89 c5 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c
> > 02 00 0f 85 b0 02 00 00 48 8b 45 00 48 8d bb 34 05 00 00 48
> > RSP: 0018:ffffc90006e3fc08 EFLAGS: 00010246
> > RAX: dffffc0000000000 RBX: ffff88801d4d4008 RCX: ffffffff86706be8
> > RDX: 0000000000000000 RSI: ffffffff86706c4d RDI: 0000000000000005
> > RBP: 0000000000000000 R08: 0000000000000005 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff92000dc7f85
> > R13: ffff88810f4bfb18 R14: ffff88801d4d10a8 R15: ffff88801d4d1000
> > FS: 00007fa0af71b640(0000) GS:ffff888135c00000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00007fa0af71a4b8 CR3: 0000000022f5f000 CR4: 0000000000750ef0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > PKRU: 55555554
> > Call Trace:
> > <TASK>
> > dev_attr_store+0x54/0x80 drivers/base/core.c:2366
> > sysfs_kf_write+0x114/0x170 fs/sysfs/file.c:136
> > kernfs_fop_write_iter+0x337/0x500 fs/kernfs/file.c:334
> > call_write_iter include/linux/fs.h:2020 [inline]
> > new_sync_write fs/read_write.c:491 [inline]
> > vfs_write+0x96a/0xd80 fs/read_write.c:584
> > ksys_write+0x122/0x250 fs/read_write.c:637
> > do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> > do_syscall_64+0x40/0x110 arch/x86/entry/common.c:83
> > entry_SYSCALL_64_after_hwframe+0x63/0x6b
> > RIP: 0033:0x7fa0aff9ee1f
> > Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 a9 f4 02 00 48 8b 54
> > 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d
> > 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 ec f4 02 00 48
> > RSP: 002b:00007fa0af71a460 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
> > RAX: ffffffffffffffda RBX: 00000000ffffffff RCX: 00007fa0aff9ee1f
> > RDX: 0000000000000004 RSI: 00007fa0af71acc0 RDI: 0000000000000005
> > RBP: 0000000000000005 R08: 0000000000000000 R09: 00007fffdb6af2cf
> > R10: 0000000000000000 R11: 0000000000000293 R12: 00007fa0af71acc0
> > R13: 000000000000006e R14: 00007fa0aff613d0 R15: 00007fa0af6fb000
> > </TASK>
> > Modules linked in:
> > ---[ end trace 0000000000000000 ]---
> > RIP: 0010:disable_store+0xd0/0x3d0 drivers/usb/core/port.c:88
> > Code: 02 00 00 4c 8b 75 40 4d 8d be 58 ff ff ff 4c 89 ff e8 a4 20 fa
> > ff 48 89 c2 48 89 c5 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c
> > 02 00 0f 85 b0 02 00 00 48 8b 45 00 48 8d bb 34 05 00 00 48
> > RSP: 0018:ffffc90006e3fc08 EFLAGS: 00010246
> > RAX: dffffc0000000000 RBX: ffff88801d4d4008 RCX: ffffffff86706be8
> > RDX: 0000000000000000 RSI: ffffffff86706c4d RDI: 0000000000000005
> > RBP: 0000000000000000 R08: 0000000000000005 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff92000dc7f85
> > R13: ffff88810f4bfb18 R14: ffff88801d4d10a8 R15: ffff88801d4d1000
> > FS: 00007fa0af71b640(0000) GS:ffff888063a00000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 000055d23050c460 CR3: 0000000022f5f000 CR4: 0000000000750ef0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > PKRU: 55555554
> > ----------------
> > Code disassembly (best guess):
> > 0: 02 00 add (%rax),%al
> > 2: 00 4c 8b 75 add %cl,0x75(%rbx,%rcx,4)
> > 6: 40 rex
> > 7: 4d 8d be 58 ff ff ff lea -0xa8(%r14),%r15
> > e: 4c 89 ff mov %r15,%rdi
> > 11: e8 a4 20 fa ff call 0xfffa20ba
> > 16: 48 89 c2 mov %rax,%rdx
> > 19: 48 89 c5 mov %rax,%rbp
> > 1c: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax
> > 23: fc ff df
> > 26: 48 c1 ea 03 shr $0x3,%rdx
> > * 2a: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1) <--
> > trapping instruction
> > 2e: 0f 85 b0 02 00 00 jne 0x2e4
> > 34: 48 8b 45 00 mov 0x0(%rbp),%rax
> > 38: 48 8d bb 34 05 00 00 lea 0x534(%rbx),%rdi
> > 3f: 48 rex.W
> > ```
> > We analyzed the root cause of this bug. When calling disable_store()
> > in drivers/usb/core/port.c, if function authorized_store() is calling
> > usb_deauthorized_device() concurrently, the usb_interface will be
> > removed by usb_disable_device. However, in function disable_store,
> > usb_hub_to_struct_hub() would try to deref interface, causing
> > nullptr-deref. We also tested other functions in
> > drivers/usb/core/port.c. So far we haven't found a similar problem.
> >
> > If you have any questions, please contact us.
> >
> > Reported by Yue Sun <[email protected]>
> > Reported by xingwei lee <[email protected]>
>
> Do you have a proposed patch to fix this as you have a way to easily
> test this?
>
> thanks,
>
> greg k-h

I am glad to help, but I am afraid I don't have a proposed patch
currently. This bug is a bit tricky, since the child device (usb port)
is trying to get its parent device (usb interface) while its
grandparent device (usb device) is trying to delete it (usb
interface). I need to further look into the locks to see which lock
should I grab to avoid deadlock or introducing other problems. I will
try my best but there is no guarantee. Hope someone who knows usb
subsystem better could fix this.

Best Regards,
Yue

2024-04-11 15:32:29

by Alan Stern

[permalink] [raw]
Subject: Re: [Linux kernel bug] general protection fault in disable_store

On Thu, Apr 11, 2024 at 02:52:27PM +0800, Sam Sun wrote:
> Dear developers and maintainers,
>
> We encountered a general protection fault in function disable_store.
> It is tested against the latest upstream linux (tag 6.9-rc3). C repro
> and kernel config are attached to this email. Kernel crash log is
> listed below.
> ```
> general protection fault, probably for non-canonical address
> 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN
> KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
> CPU: 1 PID: 9459 Comm: syz-executor414 Not tainted 6.7.0-rc7 #2
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
> RIP: 0010:disable_store+0xd0/0x3d0 drivers/usb/core/port.c:88
> Code: 02 00 00 4c 8b 75 40 4d 8d be 58 ff ff ff 4c 89 ff e8 a4 20 fa
> ff 48 89 c2 48 89 c5 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c
> 02 00 0f 85 b0 02 00 00 48 8b 45 00 48 8d bb 34 05 00 00 48
> RSP: 0018:ffffc90006e3fc08 EFLAGS: 00010246
> RAX: dffffc0000000000 RBX: ffff88801d4d4008 RCX: ffffffff86706be8
> RDX: 0000000000000000 RSI: ffffffff86706c4d RDI: 0000000000000005
> RBP: 0000000000000000 R08: 0000000000000005 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff92000dc7f85
> R13: ffff88810f4bfb18 R14: ffff88801d4d10a8 R15: ffff88801d4d1000
> FS: 00007fa0af71b640(0000) GS:ffff888135c00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fa0af71a4b8 CR3: 0000000022f5f000 CR4: 0000000000750ef0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> PKRU: 55555554

> ----------------
> Code disassembly (best guess):
> 0: 02 00 add (%rax),%al
> 2: 00 4c 8b 75 add %cl,0x75(%rbx,%rcx,4)
> 6: 40 rex
> 7: 4d 8d be 58 ff ff ff lea -0xa8(%r14),%r15
> e: 4c 89 ff mov %r15,%rdi
> 11: e8 a4 20 fa ff call 0xfffa20ba
> 16: 48 89 c2 mov %rax,%rdx
> 19: 48 89 c5 mov %rax,%rbp
> 1c: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax
> 23: fc ff df
> 26: 48 c1 ea 03 shr $0x3,%rdx
> * 2a: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1) <--
> trapping instruction
> 2e: 0f 85 b0 02 00 00 jne 0x2e4
> 34: 48 8b 45 00 mov 0x0(%rbp),%rax
> 38: 48 8d bb 34 05 00 00 lea 0x534(%rbx),%rdi
> 3f: 48 rex.W
> ```
> We analyzed the root cause of this bug. When calling disable_store()
> in drivers/usb/core/port.c, if function authorized_store() is calling
> usb_deauthorized_device() concurrently, the usb_interface will be
> removed by usb_disable_device. However, in function disable_store,
> usb_hub_to_struct_hub() would try to deref interface, causing
> nullptr-deref. We also tested other functions in
> drivers/usb/core/port.c. So far we haven't found a similar problem.

I don't see how this explanation could be correct. disable_store() is a
sysfs attribute file for the port device, so when it is called the port
device structure must still be registered. The interface structure
doesn't get removed until after usb_disable_device() calls device_del(),
which won't return until hub_disconnect() returns, which won't happen
until after the port devices are unregistered, which doesn't happen
until disable_store() calls sysfs_break_active_protection(), which is
after the call to usb_hub_to_struct_hub().

Can you do a little extra debugging to find out exactly which C
statement causes the trap? The disassembly above indicates the trap
happens during a compare against 0 inside disable_store() -- not inside
usb_hub_to_struct_hub(). Can you figure out which comparison that is?

Alan Stern

> If you have any questions, please contact us.
>
> Reported by Yue Sun <[email protected]>
> Reported by xingwei lee <[email protected]>
>
> Best Regards,
> Yue

2024-04-12 13:08:42

by Sam Sun

[permalink] [raw]
Subject: Re: [Linux kernel bug] general protection fault in disable_store

On Thu, Apr 11, 2024 at 11:24 PM Alan Stern <[email protected]> wrote:
>
> On Thu, Apr 11, 2024 at 02:52:27PM +0800, Sam Sun wrote:
> > Dear developers and maintainers,
> >
> > We encountered a general protection fault in function disable_store.
> > It is tested against the latest upstream linux (tag 6.9-rc3). C repro
> > and kernel config are attached to this email. Kernel crash log is
> > listed below.
> > ```
> > general protection fault, probably for non-canonical address
> > 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN
> > KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
> > CPU: 1 PID: 9459 Comm: syz-executor414 Not tainted 6.7.0-rc7 #2
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
> > RIP: 0010:disable_store+0xd0/0x3d0 drivers/usb/core/port.c:88
> > Code: 02 00 00 4c 8b 75 40 4d 8d be 58 ff ff ff 4c 89 ff e8 a4 20 fa
> > ff 48 89 c2 48 89 c5 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c
> > 02 00 0f 85 b0 02 00 00 48 8b 45 00 48 8d bb 34 05 00 00 48
> > RSP: 0018:ffffc90006e3fc08 EFLAGS: 00010246
> > RAX: dffffc0000000000 RBX: ffff88801d4d4008 RCX: ffffffff86706be8
> > RDX: 0000000000000000 RSI: ffffffff86706c4d RDI: 0000000000000005
> > RBP: 0000000000000000 R08: 0000000000000005 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff92000dc7f85
> > R13: ffff88810f4bfb18 R14: ffff88801d4d10a8 R15: ffff88801d4d1000
> > FS: 00007fa0af71b640(0000) GS:ffff888135c00000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00007fa0af71a4b8 CR3: 0000000022f5f000 CR4: 0000000000750ef0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > PKRU: 55555554
>
> > ----------------
> > Code disassembly (best guess):
> > 0: 02 00 add (%rax),%al
> > 2: 00 4c 8b 75 add %cl,0x75(%rbx,%rcx,4)
> > 6: 40 rex
> > 7: 4d 8d be 58 ff ff ff lea -0xa8(%r14),%r15
> > e: 4c 89 ff mov %r15,%rdi
> > 11: e8 a4 20 fa ff call 0xfffa20ba
> > 16: 48 89 c2 mov %rax,%rdx
> > 19: 48 89 c5 mov %rax,%rbp
> > 1c: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax
> > 23: fc ff df
> > 26: 48 c1 ea 03 shr $0x3,%rdx
> > * 2a: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1) <--
> > trapping instruction
> > 2e: 0f 85 b0 02 00 00 jne 0x2e4
> > 34: 48 8b 45 00 mov 0x0(%rbp),%rax
> > 38: 48 8d bb 34 05 00 00 lea 0x534(%rbx),%rdi
> > 3f: 48 rex.W
> > ```
> > We analyzed the root cause of this bug. When calling disable_store()
> > in drivers/usb/core/port.c, if function authorized_store() is calling
> > usb_deauthorized_device() concurrently, the usb_interface will be
> > removed by usb_disable_device. However, in function disable_store,
> > usb_hub_to_struct_hub() would try to deref interface, causing
> > nullptr-deref. We also tested other functions in
> > drivers/usb/core/port.c. So far we haven't found a similar problem.
>
> I don't see how this explanation could be correct. disable_store() is a
> sysfs attribute file for the port device, so when it is called the port
> device structure must still be registered. The interface structure
> doesn't get removed until after usb_disable_device() calls device_del(),
> which won't return until hub_disconnect() returns, which won't happen
> until after the port devices are unregistered, which doesn't happen
> until disable_store() calls sysfs_break_active_protection(), which is
> after the call to usb_hub_to_struct_hub().
>
> Can you do a little extra debugging to find out exactly which C
> statement causes the trap? The disassembly above indicates the trap
> happens during a compare against 0 inside disable_store() -- not inside
> usb_hub_to_struct_hub(). Can you figure out which comparison that is?
>

Sorry for the mistake I made when debugging this bug. Now I have more
information about it. Disassembly of function disable_store() in the
latest upstream kernel is listed below.
```
Dump of assembler code for function disable_store:
...
0xffffffff86e907eb <+187>: lea -0x8(%r14),%r12
0xffffffff86e907ef <+191>: mov (%rbx),%rax
0xffffffff86e907f2 <+194>: mov %rax,0x20(%rsp)
0xffffffff86e907f7 <+199>: lea -0xa8(%rax),%rdi
0xffffffff86e907fe <+206>: mov %rdi,0x18(%rsp)
0xffffffff86e90803 <+211>: call 0xffffffff86e20220
<usb_hub_to_struct_hub>
0xffffffff86e90808 <+216>: mov %rax,%rbx
0xffffffff86e9080b <+219>: shr $0x3,%rax
0xffffffff86e9080f <+223>: movabs $0xdffffc0000000000,%rcx
0xffffffff86e90819 <+233>: cmpb $0x0,(%rax,%rcx,1)
0xffffffff86e9081d <+237>: je 0xffffffff86e90827 <disable_store+247>
0xffffffff86e9081f <+239>: mov %rbx,%rdi
0xffffffff86e90822 <+242>: call 0xffffffff81eeb0b0
<__asan_report_load8_noabort>
0xffffffff86e90827 <+247>: lea 0x60(%rsp),%rsi
...
```
The cmpb in disable_store()<+233> is generated by KASAN to check the
shadow memory status. If equals 0, which means the load 8 is valid,
pass the KASAN check. However, this time rax is 0, so it first
triggers general protection fault, since 0xdffffc0000000000 is not a
valid address. rax contains the return address of function
usb_hub_to_struct_hub(), in this case is a NULL.

In function usb_hub_to_struct_hub(), I checked hdev and its sub
domains, and they are not NULL. Is it possible that
usb_deauthorized_device() set
hdev->actconfig->interface[0]->dev.driver_data to NULL? I cannot
confirm that since every time I try to breakpoint the code it crashes
differently.

If there is any other thing I could help, please let me know.

Best,
Yue


> Alan Stern
>
> > If you have any questions, please contact us.
> >
> > Reported by Yue Sun <[email protected]>
> > Reported by xingwei lee <[email protected]>
> >
> > Best Regards,
> > Yue

2024-04-12 14:40:50

by Alan Stern

[permalink] [raw]
Subject: Re: [Linux kernel bug] general protection fault in disable_store

On Fri, Apr 12, 2024 at 09:08:12PM +0800, Sam Sun wrote:
> Sorry for the mistake I made when debugging this bug. Now I have more
> information about it. Disassembly of function disable_store() in the
> latest upstream kernel is listed below.
> ```
> Dump of assembler code for function disable_store:
> ...
> 0xffffffff86e907eb <+187>: lea -0x8(%r14),%r12
> 0xffffffff86e907ef <+191>: mov (%rbx),%rax
> 0xffffffff86e907f2 <+194>: mov %rax,0x20(%rsp)
> 0xffffffff86e907f7 <+199>: lea -0xa8(%rax),%rdi
> 0xffffffff86e907fe <+206>: mov %rdi,0x18(%rsp)
> 0xffffffff86e90803 <+211>: call 0xffffffff86e20220
> <usb_hub_to_struct_hub>
> 0xffffffff86e90808 <+216>: mov %rax,%rbx
> 0xffffffff86e9080b <+219>: shr $0x3,%rax
> 0xffffffff86e9080f <+223>: movabs $0xdffffc0000000000,%rcx
> 0xffffffff86e90819 <+233>: cmpb $0x0,(%rax,%rcx,1)
> 0xffffffff86e9081d <+237>: je 0xffffffff86e90827 <disable_store+247>
> 0xffffffff86e9081f <+239>: mov %rbx,%rdi
> 0xffffffff86e90822 <+242>: call 0xffffffff81eeb0b0
> <__asan_report_load8_noabort>
> 0xffffffff86e90827 <+247>: lea 0x60(%rsp),%rsi
> ...
> ```
> The cmpb in disable_store()<+233> is generated by KASAN to check the
> shadow memory status. If equals 0, which means the load 8 is valid,
> pass the KASAN check. However, this time rax is 0, so it first
> triggers general protection fault, since 0xdffffc0000000000 is not a
> valid address. rax contains the return address of function
> usb_hub_to_struct_hub(), in this case is a NULL.
>
> In function usb_hub_to_struct_hub(), I checked hdev and its sub
> domains, and they are not NULL. Is it possible that
> usb_deauthorized_device() set
> hdev->actconfig->interface[0]->dev.driver_data to NULL? I cannot
> confirm that since every time I try to breakpoint the code it crashes
> differently.

I suspect the usb_hub_to_struct_hub() call is racing with the
spinlock-protected region in hub_disconnect() (in hub.c).

> If there is any other thing I could help, please let me know.

Try the patch below. It should eliminate that race, which hopefully
will fix the problem.

Alan Stern



Index: usb-devel/drivers/usb/core/hub.c
===================================================================
--- usb-devel.orig/drivers/usb/core/hub.c
+++ usb-devel/drivers/usb/core/hub.c
@@ -72,6 +72,9 @@
* change to USB_STATE_NOTATTACHED even when the semaphore isn't held. */
static DEFINE_SPINLOCK(device_state_lock);

+/* Protect hdev->maxchild and hub's intfdata */
+static DEFINE_SPINLOCK(hub_state_lock);
+
/* workqueue to process hub events */
static struct workqueue_struct *hub_wq;
static void hub_event(struct work_struct *work);
@@ -152,9 +155,13 @@ static inline char *portspeed(struct usb
/* Note that hdev or one of its children must be locked! */
struct usb_hub *usb_hub_to_struct_hub(struct usb_device *hdev)
{
- if (!hdev || !hdev->actconfig || !hdev->maxchild)
- return NULL;
- return usb_get_intfdata(hdev->actconfig->interface[0]);
+ struct usb_hub *hub = NULL;
+
+ spin_lock_irq(&hub_state_lock);
+ if (hdev && hdev->actconfig && hdev->maxchild)
+ hub = usb_get_intfdata(hdev->actconfig->interface[0]);
+ spin_unlock_irq(&hub_state_lock);
+ return hub;
}

int usb_device_supports_lpm(struct usb_device *udev)
@@ -1714,7 +1721,9 @@ static int hub_configure(struct usb_hub
break;
}
}
+ spin_lock_irq(&hub_state_lock);
hdev->maxchild = i;
+ spin_unlock_irq(&hub_state_lock);
for (i = 0; i < hdev->maxchild; i++) {
struct usb_port *port_dev = hub->ports[i];

@@ -1790,9 +1799,11 @@ static void hub_disconnect(struct usb_in

/* Avoid races with recursively_mark_NOTATTACHED() */
spin_lock_irq(&device_state_lock);
+ spin_lock(&hub_state_lock);
port1 = hdev->maxchild;
hdev->maxchild = 0;
usb_set_intfdata(intf, NULL);
+ spin_unlock(&hub_state_lock);
spin_unlock_irq(&device_state_lock);

for (; port1 > 0; --port1)


2024-04-12 16:28:05

by Sam Sun

[permalink] [raw]
Subject: Re: [Linux kernel bug] general protection fault in disable_store

On Fri, Apr 12, 2024 at 10:40 PM Alan Stern <[email protected]> wrote:
>
> On Fri, Apr 12, 2024 at 09:08:12PM +0800, Sam Sun wrote:
> > Sorry for the mistake I made when debugging this bug. Now I have more
> > information about it. Disassembly of function disable_store() in the
> > latest upstream kernel is listed below.
> > ```
> > Dump of assembler code for function disable_store:
> > ...
> > 0xffffffff86e907eb <+187>: lea -0x8(%r14),%r12
> > 0xffffffff86e907ef <+191>: mov (%rbx),%rax
> > 0xffffffff86e907f2 <+194>: mov %rax,0x20(%rsp)
> > 0xffffffff86e907f7 <+199>: lea -0xa8(%rax),%rdi
> > 0xffffffff86e907fe <+206>: mov %rdi,0x18(%rsp)
> > 0xffffffff86e90803 <+211>: call 0xffffffff86e20220
> > <usb_hub_to_struct_hub>
> > 0xffffffff86e90808 <+216>: mov %rax,%rbx
> > 0xffffffff86e9080b <+219>: shr $0x3,%rax
> > 0xffffffff86e9080f <+223>: movabs $0xdffffc0000000000,%rcx
> > 0xffffffff86e90819 <+233>: cmpb $0x0,(%rax,%rcx,1)
> > 0xffffffff86e9081d <+237>: je 0xffffffff86e90827 <disable_store+247>
> > 0xffffffff86e9081f <+239>: mov %rbx,%rdi
> > 0xffffffff86e90822 <+242>: call 0xffffffff81eeb0b0
> > <__asan_report_load8_noabort>
> > 0xffffffff86e90827 <+247>: lea 0x60(%rsp),%rsi
> > ...
> > ```
> > The cmpb in disable_store()<+233> is generated by KASAN to check the
> > shadow memory status. If equals 0, which means the load 8 is valid,
> > pass the KASAN check. However, this time rax is 0, so it first
> > triggers general protection fault, since 0xdffffc0000000000 is not a
> > valid address. rax contains the return address of function
> > usb_hub_to_struct_hub(), in this case is a NULL.
> >
> > In function usb_hub_to_struct_hub(), I checked hdev and its sub
> > domains, and they are not NULL. Is it possible that
> > usb_deauthorized_device() set
> > hdev->actconfig->interface[0]->dev.driver_data to NULL? I cannot
> > confirm that since every time I try to breakpoint the code it crashes
> > differently.
>
> I suspect the usb_hub_to_struct_hub() call is racing with the
> spinlock-protected region in hub_disconnect() (in hub.c).
>
> > If there is any other thing I could help, please let me know.
>
> Try the patch below. It should eliminate that race, which hopefully
> will fix the problem.
>
> Alan Stern
>
>
>
> Index: usb-devel/drivers/usb/core/hub.c
> ===================================================================
> --- usb-devel.orig/drivers/usb/core/hub.c
> +++ usb-devel/drivers/usb/core/hub.c
> @@ -72,6 +72,9 @@
> * change to USB_STATE_NOTATTACHED even when the semaphore isn't held. */
> static DEFINE_SPINLOCK(device_state_lock);
>
> +/* Protect hdev->maxchild and hub's intfdata */
> +static DEFINE_SPINLOCK(hub_state_lock);
> +
> /* workqueue to process hub events */
> static struct workqueue_struct *hub_wq;
> static void hub_event(struct work_struct *work);
> @@ -152,9 +155,13 @@ static inline char *portspeed(struct usb
> /* Note that hdev or one of its children must be locked! */
> struct usb_hub *usb_hub_to_struct_hub(struct usb_device *hdev)
> {
> - if (!hdev || !hdev->actconfig || !hdev->maxchild)
> - return NULL;
> - return usb_get_intfdata(hdev->actconfig->interface[0]);
> + struct usb_hub *hub = NULL;
> +
> + spin_lock_irq(&hub_state_lock);
> + if (hdev && hdev->actconfig && hdev->maxchild)
> + hub = usb_get_intfdata(hdev->actconfig->interface[0]);
> + spin_unlock_irq(&hub_state_lock);
> + return hub;
> }
>
> int usb_device_supports_lpm(struct usb_device *udev)
> @@ -1714,7 +1721,9 @@ static int hub_configure(struct usb_hub
> break;
> }
> }
> + spin_lock_irq(&hub_state_lock);
> hdev->maxchild = i;
> + spin_unlock_irq(&hub_state_lock);
> for (i = 0; i < hdev->maxchild; i++) {
> struct usb_port *port_dev = hub->ports[i];
>
> @@ -1790,9 +1799,11 @@ static void hub_disconnect(struct usb_in
>
> /* Avoid races with recursively_mark_NOTATTACHED() */
> spin_lock_irq(&device_state_lock);
> + spin_lock(&hub_state_lock);
> port1 = hdev->maxchild;
> hdev->maxchild = 0;
> usb_set_intfdata(intf, NULL);
> + spin_unlock(&hub_state_lock);
> spin_unlock_irq(&device_state_lock);
>
> for (; port1 > 0; --port1)
>

I applied this patch and tried to execute several times, no more
kernel core dump in my environment. I think this bug is fixed by the
patch. But I do have one more question about it. Since it is a data
race bug, it has reproducibility issues originally. How can I confirm
if a racy bug is fixed by test? This kind of bug might still have a
race window but is harder to trigger. Just curious, not for this
patch. I think this patch eliminates the racy window.

Best,
Yue

2024-04-12 18:12:25

by Alan Stern

[permalink] [raw]
Subject: Re: [Linux kernel bug] general protection fault in disable_store

On Sat, Apr 13, 2024 at 12:26:07AM +0800, Sam Sun wrote:
> On Fri, Apr 12, 2024 at 10:40 PM Alan Stern <[email protected]> wrote:
> > I suspect the usb_hub_to_struct_hub() call is racing with the
> > spinlock-protected region in hub_disconnect() (in hub.c).
> >
> > > If there is any other thing I could help, please let me know.
> >
> > Try the patch below. It should eliminate that race, which hopefully
> > will fix the problem.

> I applied this patch and tried to execute several times, no more
> kernel core dump in my environment. I think this bug is fixed by the
> patch. But I do have one more question about it. Since it is a data
> race bug, it has reproducibility issues originally. How can I confirm
> if a racy bug is fixed by test? This kind of bug might still have a
> race window but is harder to trigger. Just curious, not for this
> patch. I think this patch eliminates the racy window.

If you don't what what is racing, then testing cannot prove that a race
is eliminated. However, if you do know where a race occurs then it's
easy to see how mutual exclusion can prevent the race from happening.

In this case the bug might have had a different cause, something other
than a race between usb_hub_to_struct_hub() and hub_disconnect(). If
that's so then testing this patch would not be a definite proof that the
bug is gone. But if that race _is_ the cause of the bug then this patch
will fix it -- you can see that just by reading the code with no need
for testing.

Besides, the patch is needed in any case because that race certainly
_can_ occur. And maybe not only on this pathway.

May I add your "Reported-and-tested-by:" to the patch?

Alan Stern

2024-04-12 18:32:14

by Alan Stern

[permalink] [raw]
Subject: Re: [Linux kernel bug] general protection fault in disable_store

On Fri, Apr 12, 2024 at 02:11:49PM -0400, Alan Stern wrote:
> On Sat, Apr 13, 2024 at 12:26:07AM +0800, Sam Sun wrote:
> > On Fri, Apr 12, 2024 at 10:40 PM Alan Stern <[email protected]> wrote:
> > > I suspect the usb_hub_to_struct_hub() call is racing with the
> > > spinlock-protected region in hub_disconnect() (in hub.c).
> > >
> > > > If there is any other thing I could help, please let me know.
> > >
> > > Try the patch below. It should eliminate that race, which hopefully
> > > will fix the problem.
>
> > I applied this patch and tried to execute several times, no more
> > kernel core dump in my environment. I think this bug is fixed by the
> > patch. But I do have one more question about it. Since it is a data
> > race bug, it has reproducibility issues originally. How can I confirm
> > if a racy bug is fixed by test? This kind of bug might still have a
> > race window but is harder to trigger. Just curious, not for this
> > patch. I think this patch eliminates the racy window.
>
> If you don't what what is racing, then testing cannot prove that a race
> is eliminated. However, if you do know where a race occurs then it's
> easy to see how mutual exclusion can prevent the race from happening.
>
> In this case the bug might have had a different cause, something other
> than a race between usb_hub_to_struct_hub() and hub_disconnect(). If
> that's so then testing this patch would not be a definite proof that the
> bug is gone. But if that race _is_ the cause of the bug then this patch
> will fix it -- you can see that just by reading the code with no need
> for testing.

In fact, there still might be a remaining bug, because even with this
patch it is possible for usb_hub_to_struct_hub() to return NULL. You
can test for this possibility by editing the disable_show() routine:
Move the initializations of the local variables out of the declaration
lines and into the main part of the routine, and add a delay (like
msleep(1000)) before the call to usb_hub_to_struct_hub() -- this will
make the bug a lot easier to trigger (you could even do it by hand).

I think to fully fix this problem it will be necessary to test whether
hub is NULL before using it. The same problem ought to exist in
disable_show(), even though your testing didn't trigger it.

Alan Stern

2024-04-13 05:09:06

by Sam Sun

[permalink] [raw]
Subject: Re: [Linux kernel bug] general protection fault in disable_store

On Sat, Apr 13, 2024 at 2:11 AM Alan Stern <[email protected]> wrote:
>
> On Sat, Apr 13, 2024 at 12:26:07AM +0800, Sam Sun wrote:
> > On Fri, Apr 12, 2024 at 10:40 PM Alan Stern <[email protected]> wrote:
> > > I suspect the usb_hub_to_struct_hub() call is racing with the
> > > spinlock-protected region in hub_disconnect() (in hub.c).
> > >
> > > > If there is any other thing I could help, please let me know.
> > >
> > > Try the patch below. It should eliminate that race, which hopefully
> > > will fix the problem.
>
> > I applied this patch and tried to execute several times, no more
> > kernel core dump in my environment. I think this bug is fixed by the
> > patch. But I do have one more question about it. Since it is a data
> > race bug, it has reproducibility issues originally. How can I confirm
> > if a racy bug is fixed by test? This kind of bug might still have a
> > race window but is harder to trigger. Just curious, not for this
> > patch. I think this patch eliminates the racy window.
>
> If you don't what what is racing, then testing cannot prove that a race
> is eliminated. However, if you do know where a race occurs then it's
> easy to see how mutual exclusion can prevent the race from happening.
>
> In this case the bug might have had a different cause, something other
> than a race between usb_hub_to_struct_hub() and hub_disconnect(). If
> that's so then testing this patch would not be a definite proof that the
> bug is gone. But if that race _is_ the cause of the bug then this patch
> will fix it -- you can see that just by reading the code with no need
> for testing.
>
> Besides, the patch is needed in any case because that race certainly
> _can_ occur. And maybe not only on this pathway.
>

Thanks for explaining! I will check the related code next time.

> May I add your "Reported-and-tested-by:" to the patch?

Sure, thanks for your help!

Best Regards,
Yue

2024-04-15 14:47:58

by Alan Stern

[permalink] [raw]
Subject: Re: [Linux kernel bug] general protection fault in disable_store

On Sat, Apr 13, 2024 at 01:08:41PM +0800, Sam Sun wrote:
> On Sat, Apr 13, 2024 at 2:11 AM Alan Stern <[email protected]> wrote:
> >
> > On Sat, Apr 13, 2024 at 12:26:07AM +0800, Sam Sun wrote:
> > > On Fri, Apr 12, 2024 at 10:40 PM Alan Stern <[email protected]> wrote:
> > > > I suspect the usb_hub_to_struct_hub() call is racing with the
> > > > spinlock-protected region in hub_disconnect() (in hub.c).
> > > >
> > > > > If there is any other thing I could help, please let me know.
> > > >
> > > > Try the patch below. It should eliminate that race, which hopefully
> > > > will fix the problem.
> >
> > > I applied this patch and tried to execute several times, no more
> > > kernel core dump in my environment. I think this bug is fixed by the
> > > patch. But I do have one more question about it. Since it is a data
> > > race bug, it has reproducibility issues originally. How can I confirm
> > > if a racy bug is fixed by test? This kind of bug might still have a
> > > race window but is harder to trigger. Just curious, not for this
> > > patch. I think this patch eliminates the racy window.
> >
> > If you don't what what is racing, then testing cannot prove that a race
> > is eliminated. However, if you do know where a race occurs then it's
> > easy to see how mutual exclusion can prevent the race from happening.
> >
> > In this case the bug might have had a different cause, something other
> > than a race between usb_hub_to_struct_hub() and hub_disconnect(). If
> > that's so then testing this patch would not be a definite proof that the
> > bug is gone. But if that race _is_ the cause of the bug then this patch
> > will fix it -- you can see that just by reading the code with no need
> > for testing.
> >
> > Besides, the patch is needed in any case because that race certainly
> > _can_ occur. And maybe not only on this pathway.
> >
>
> Thanks for explaining! I will check the related code next time.
>
> > May I add your "Reported-and-tested-by:" to the patch?
>
> Sure, thanks for your help!

Actually, I've got a completely different patch which I think will fix
the problem you encountered. Instead of using mutual exclusion to
avoid the race, it prevents the two routines from being called at the
same time so the race can't occur in the first place. It also should
guarantee the usb_hub_to_struct_hub() doesn't return NULL when
disable_store() calls it.

Can you try the patch below, instead of (not along with) the first
patch? Thanks.

Alan Stern



Index: usb-devel/drivers/usb/core/hub.c
===================================================================
--- usb-devel.orig/drivers/usb/core/hub.c
+++ usb-devel/drivers/usb/core/hub.c
@@ -1788,16 +1788,15 @@ static void hub_disconnect(struct usb_in

mutex_lock(&usb_port_peer_mutex);

+ for (port1 = hdev->maxchild; port1 > 0; --port1)
+ usb_hub_remove_port_device(hub, port1);
+
/* Avoid races with recursively_mark_NOTATTACHED() */
spin_lock_irq(&device_state_lock);
- port1 = hdev->maxchild;
hdev->maxchild = 0;
usb_set_intfdata(intf, NULL);
spin_unlock_irq(&device_state_lock);

- for (; port1 > 0; --port1)
- usb_hub_remove_port_device(hub, port1);
-
mutex_unlock(&usb_port_peer_mutex);

if (hub->hdev->speed == USB_SPEED_HIGH)


2024-04-16 09:06:25

by Sam Sun

[permalink] [raw]
Subject: Re: [Linux kernel bug] general protection fault in disable_store

On Mon, Apr 15, 2024 at 10:47 PM Alan Stern <[email protected]> wrote:
>
> Actually, I've got a completely different patch which I think will fix
> the problem you encountered. Instead of using mutual exclusion to
> avoid the race, it prevents the two routines from being called at the
> same time so the race can't occur in the first place. It also should
> guarantee the usb_hub_to_struct_hub() doesn't return NULL when
> disable_store() calls it.
>
> Can you try the patch below, instead of (not along with) the first
> patch? Thanks.
>
> Alan Stern
>
>
>
> Index: usb-devel/drivers/usb/core/hub.c
> ===================================================================
> --- usb-devel.orig/drivers/usb/core/hub.c
> +++ usb-devel/drivers/usb/core/hub.c
> @@ -1788,16 +1788,15 @@ static void hub_disconnect(struct usb_in
>
> mutex_lock(&usb_port_peer_mutex);
>
> + for (port1 = hdev->maxchild; port1 > 0; --port1)
> + usb_hub_remove_port_device(hub, port1);
> +
> /* Avoid races with recursively_mark_NOTATTACHED() */
> spin_lock_irq(&device_state_lock);
> - port1 = hdev->maxchild;
> hdev->maxchild = 0;
> usb_set_intfdata(intf, NULL);
> spin_unlock_irq(&device_state_lock);
>
> - for (; port1 > 0; --port1)
> - usb_hub_remove_port_device(hub, port1);
> -
> mutex_unlock(&usb_port_peer_mutex);
>
> if (hub->hdev->speed == USB_SPEED_HIGH)
>

I tried this patch and it worked. I agree this patch is better and it
avoids introducing new locks.

Best,
Yue

2024-04-16 16:36:09

by Alan Stern

[permalink] [raw]
Subject: Re: [Linux kernel bug] general protection fault in disable_store

On Tue, Apr 16, 2024 at 05:05:55PM +0800, Sam Sun wrote:
> On Mon, Apr 15, 2024 at 10:47 PM Alan Stern <[email protected]> wrote:
> >
> > Actually, I've got a completely different patch which I think will fix
> > the problem you encountered. Instead of using mutual exclusion to
> > avoid the race, it prevents the two routines from being called at the
> > same time so the race can't occur in the first place. It also should
> > guarantee the usb_hub_to_struct_hub() doesn't return NULL when
> > disable_store() calls it.
> >
> > Can you try the patch below, instead of (not along with) the first
> > patch? Thanks.
> >
> > Alan Stern

> I tried this patch and it worked. I agree this patch is better and it
> avoids introducing new locks.

It turns out that patch is no good. The reason is mentioned in the
changelog for commit 543d7784b07f ("USB: fix race between hub_disconnect
and recursively_mark_NOTATTACHED"); it says that the port devices have to
be removed _after_ maxchild has been set to 0.

So okay, here's a third attempt to fix the problem. This doesn't try to
avoid the race or anything like that. Instead it just adds checks for
usb_hub_to_struct_hub() returning NULL. That should be enough to prevent
the invalid pointer dereference you encountered.

This should be tested by itself, without either of the first two patches.

Alan Stern



Index: usb-devel/drivers/usb/core/port.c
===================================================================
--- usb-devel.orig/drivers/usb/core/port.c
+++ usb-devel/drivers/usb/core/port.c
@@ -51,13 +51,15 @@ static ssize_t disable_show(struct devic
struct usb_port *port_dev = to_usb_port(dev);
struct usb_device *hdev = to_usb_device(dev->parent->parent);
struct usb_hub *hub = usb_hub_to_struct_hub(hdev);
- struct usb_interface *intf = to_usb_interface(hub->intfdev);
+ struct usb_interface *intf = to_usb_interface(dev->parent);
int port1 = port_dev->portnum;
u16 portstatus, unused;
bool disabled;
int rc;
struct kernfs_node *kn;

+ if (!hub)
+ return -ENODEV;
hub_get(hub);
rc = usb_autopm_get_interface(intf);
if (rc < 0)
@@ -101,12 +103,14 @@ static ssize_t disable_store(struct devi
struct usb_port *port_dev = to_usb_port(dev);
struct usb_device *hdev = to_usb_device(dev->parent->parent);
struct usb_hub *hub = usb_hub_to_struct_hub(hdev);
- struct usb_interface *intf = to_usb_interface(hub->intfdev);
+ struct usb_interface *intf = to_usb_interface(dev->parent);
int port1 = port_dev->portnum;
bool disabled;
int rc;
struct kernfs_node *kn;

+ if (!hub)
+ return -ENODEV;
rc = kstrtobool(buf, &disabled);
if (rc)
return rc;


2024-04-17 07:39:27

by Sam Sun

[permalink] [raw]
Subject: Re: [Linux kernel bug] general protection fault in disable_store

On Wed, Apr 17, 2024 at 12:35 AM Alan Stern <[email protected]> wrote:
>
> On Tue, Apr 16, 2024 at 05:05:55PM +0800, Sam Sun wrote:
> > On Mon, Apr 15, 2024 at 10:47 PM Alan Stern <[email protected]> wrote:
> > >
> > > Actually, I've got a completely different patch which I think will fix
> > > the problem you encountered. Instead of using mutual exclusion to
> > > avoid the race, it prevents the two routines from being called at the
> > > same time so the race can't occur in the first place. It also should
> > > guarantee the usb_hub_to_struct_hub() doesn't return NULL when
> > > disable_store() calls it.
> > >
> > > Can you try the patch below, instead of (not along with) the first
> > > patch? Thanks.
> > >
> > > Alan Stern
>
> > I tried this patch and it worked. I agree this patch is better and it
> > avoids introducing new locks.
>
> It turns out that patch is no good. The reason is mentioned in the
> changelog for commit 543d7784b07f ("USB: fix race between hub_disconnect
> and recursively_mark_NOTATTACHED"); it says that the port devices have to
> be removed _after_ maxchild has been set to 0.
>

I checked the commit you mentioned. Maybe your first fix is all we
need to fix the problem? At least no race would occur for
hdev->maxchild and usb_set_intfdata().

> So okay, here's a third attempt to fix the problem. This doesn't try to
> avoid the race or anything like that. Instead it just adds checks for
> usb_hub_to_struct_hub() returning NULL. That should be enough to prevent
> the invalid pointer dereference you encountered.
>
> This should be tested by itself, without either of the first two patches.
>
> Alan Stern
>
>
>
> Index: usb-devel/drivers/usb/core/port.c
> ===================================================================
> --- usb-devel.orig/drivers/usb/core/port.c
> +++ usb-devel/drivers/usb/core/port.c
> @@ -51,13 +51,15 @@ static ssize_t disable_show(struct devic
> struct usb_port *port_dev = to_usb_port(dev);
> struct usb_device *hdev = to_usb_device(dev->parent->parent);
> struct usb_hub *hub = usb_hub_to_struct_hub(hdev);
> - struct usb_interface *intf = to_usb_interface(hub->intfdev);
> + struct usb_interface *intf = to_usb_interface(dev->parent);
> int port1 = port_dev->portnum;
> u16 portstatus, unused;
> bool disabled;
> int rc;
> struct kernfs_node *kn;
>
> + if (!hub)
> + return -ENODEV;
> hub_get(hub);
> rc = usb_autopm_get_interface(intf);
> if (rc < 0)
> @@ -101,12 +103,14 @@ static ssize_t disable_store(struct devi
> struct usb_port *port_dev = to_usb_port(dev);
> struct usb_device *hdev = to_usb_device(dev->parent->parent);
> struct usb_hub *hub = usb_hub_to_struct_hub(hdev);
> - struct usb_interface *intf = to_usb_interface(hub->intfdev);
> + struct usb_interface *intf = to_usb_interface(dev->parent);
> int port1 = port_dev->portnum;
> bool disabled;
> int rc;
> struct kernfs_node *kn;
>
> + if (!hub)
> + return -ENODEV;
> rc = kstrtobool(buf, &disabled);
> if (rc)
> return rc;
>

I applied this patch and it also can fix the warning. I am not sure
which one is better.

Best,
Yue

2024-04-17 14:24:10

by Alan Stern

[permalink] [raw]
Subject: Re: [Linux kernel bug] general protection fault in disable_store

On Wed, Apr 17, 2024 at 03:39:02PM +0800, Sam Sun wrote:
> On Wed, Apr 17, 2024 at 12:35 AM Alan Stern <[email protected]> wrote:
> > It turns out that patch is no good. The reason is mentioned in the
> > changelog for commit 543d7784b07f ("USB: fix race between hub_disconnect
> > and recursively_mark_NOTATTACHED"); it says that the port devices have to
> > be removed _after_ maxchild has been set to 0.
> >
>
> I checked the commit you mentioned. Maybe your first fix is all we
> need to fix the problem? At least no race would occur for
> hdev->maxchild and usb_set_intfdata().

No, the first patch won't help, even though it passed your testing. The
race it eliminates is a harmless one -- or at least, it's harmless in
this context. If usb_hub_to_struct_hub() sees bad values for
hdev->maxchild or usb_get_intfdata(), it will simply return NULL. But
this can happen even with the first patch applied, if the user tries to
access disable_store() during the brief time between when hdev->maxchild
is set to 0 and when the port devices are removed.

The true fix is simply to check whether the return value from
usb_hub_to_struct_hub() is NULL, which is what this patch does.

> I applied this patch and it also can fix the warning. I am not sure
> which one is better.

I'm quite sure that this one is better. I will submit it shortly, with
your Tested-by:.

Thanks a lot; the work you have done on this has been a big help.

Alan Stern