2017-12-07 09:37:04

by Alexander Kappner

[permalink] [raw]
Subject:

Date: Wed, 6 Dec 2017 15:28:37 -0800
Subject: [PATCH] usb-core: Fix potential null pointer dereference in xhci-debugfs.c

My kernel crashed just after resuming from hibernate and starting usbmuxd
(a user-space daemon for iOS device pairing) with several USB devices
connected (dmesg attached).

Backtrace leads to:

0xffffffff8170465d is in xhci_debugfs_create_endpoint
(drivers/usb/host/xhci-debugfs.c:381).
376 int ep_index)
377 {
378 struct xhci_ep_priv *epriv;
379 struct xhci_slot_priv *spriv = dev->debugfs_private;
380
381 if (spriv->eps[ep_index])
382 return;
383
384 epriv = kzalloc(sizeof(*epriv), GFP_KERNEL);
385 if (!epriv)

The read violation happens at address 0x40 and sizeof(struct
xhci_ep_priv)=0x40, so it seems ep_index is 1 and spriv is NULL here.

spriv gets allocated in xhci_debugfs_create_slot:

...
priv = kzalloc(sizeof(*priv), GFP_KERNEL);
if (!priv)
return;
...

There's no separate error path if this allocation fails, so we might be
left with NULL in priv. Subsequent users of priv thus need to check for
this NULL - so this is what the patch does.

There might be other ways of triggering this null pointer dereference,
including when xhci_resume frees the device structures (e.g. after
returning from a hibernate), but I wasn't able to find or reproduce it.

[63953.758083] BUG: unable to handle kernel NULL pointer dereference at
0000000000000040
[63953.758090] IP: xhci_debugfs_create_endpoint+0x1d/0xa0
[63953.758091] PGD bb911d067 P4D bb911d067 PUD 10500ff067 PMD 0
[63953.758093] Oops: 0000 [#1] PREEMPT SMP
[63953.758094] Modules linked in: ipheth tun nvidia_modeset(PO) iwlmvm
mac80211 iwlwifi nvidia(PO) btusb btrtl btbcm btintel bluetooth cfg80211
qmi_wwan ecdh_generic thinkpad_acpi rfkill
[63953.758103] CPU: 4 PID: 27091 Comm: usbmuxd Tainted: P O
4.14.0.1-12769-g1deab8c #1
[63953.758104] Hardware name: LENOVO 20ENCTO1WW/20ENCTO1WW, BIOS N1EET62W
(1.35 ) 11/10/2016
[63953.758105] task: ffff8810527ba0c0 task.stack: ffffc9000a8ec000
[63953.758107] RIP: 0010:xhci_debugfs_create_endpoint+0x1d/0xa0
[63953.758108] RSP: 0018:ffffc9000a8efc80 EFLAGS: 00010206
[63953.758109] RAX: 0000000000000000 RBX: ffff88105a71c000 RCX:
0000000000030000
[63953.758110] RDX: 0000000000000003 RSI: ffff880c0b57e000 RDI:
ffff88105a71c238
[63953.758110] RBP: 0000000000000003 R08: ffff881063800600 R09:
0000000000000003
[63953.758111] R10: ffff88105a71c238 R11: 0000000000000001 R12:
0000000000000011
[63953.758112] R13: 0000000000000018 R14: 0000000000000000 R15:
0000000000000000
[63953.758113] FS: 00007f0a77715700(0000) GS:ffff8810a3d00000(0000)
knlGS:0000000000000000
[63953.758114] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[63953.758115] CR2: 0000000000000040 CR3: 00000003f91a8006 CR4:
00000000003606e0
[63953.758115] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[63953.758116] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[63953.758117] Call Trace:
[63953.758120] xhci_add_endpoint+0x127/0x2b0
[63953.758123] usb_hcd_alloc_bandwidth+0x1ad/0x300
[63953.758125] usb_set_configuration+0x1c8/0x880
[63953.758128] usbdev_do_ioctl+0xc41/0x1120
[63953.758130] usbdev_ioctl+0xa/0x10
[63953.758151] do_vfs_ioctl+0x8b/0x5c0
[63953.758153] ? __fget+0x6c/0xb0
[63953.758155] SyS_ioctl+0x76/0x90
[63953.758157] do_syscall_64+0x6b/0x290
[63953.758173] entry_SYSCALL64_slow_path+0x25/0x25
[63953.758175] RIP: 0033:0x7f0a76a151c7
[63953.758175] RSP: 002b:00007ffd1431b0c8 EFLAGS: 00000202 ORIG_RAX:
0000000000000010
[63953.758177] RAX: ffffffffffffffda RBX: 00000000023239a0 RCX:
00007f0a76a151c7
[63953.758178] RDX: 00007ffd1431b0dc RSI: 0000000080045505 RDI:
000000000000000e
[63953.758178] RBP: 00000000023240c0 R08: 00007ffd1431b008 R09:
0000000000000004
[63953.758179] R10: 00007ffd1431aec0 R11: 0000000000000202 R12:
00000000023240c0
[63953.758180] R13: 0000000000000001 R14: 0000000000000056 R15:
0000000000000038
[63953.758182] Code: e9 39 ff ff ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00
00 41 57 41 56 41 55 41 54 55 48 63 ea 53 4c 8b b6 88 15 00 00 4d 8d 2c ee
<49> 83 7d 28 00 74 0b 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 8b 3d
[63953.758202] RIP: xhci_debugfs_create_endpoint+0x1d/0xa0 RSP:
ffffc9000a8efc80
[63953.758203] CR2: 0000000000000040
[63953.758204] ---[ end trace 1f7ea9a959f02054 ]---

Signed-off-by: Alexander Kappner <[email protected]>
---
drivers/usb/host/xhci-debugfs.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/usb/host/xhci-debugfs.c b/drivers/usb/host/xhci-debugfs.c
index 4f7895d..1cea59c 100644
--- a/drivers/usb/host/xhci-debugfs.c
+++ b/drivers/usb/host/xhci-debugfs.c
@@ -378,6 +378,9 @@ void xhci_debugfs_create_endpoint(struct xhci_hcd *xhci,
struct xhci_ep_priv *epriv;
struct xhci_slot_priv *spriv = dev->debugfs_private;

+ if (!spriv)
+ return;
+
if (spriv->eps[ep_index])
return;

--
2.1.4


2017-12-07 10:38:11

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: your mail

On Thu, Dec 07, 2017 at 01:26:14AM -0800, Alexander Kappner wrote:
> Date: Wed, 6 Dec 2017 15:28:37 -0800
> Subject: [PATCH] usb-core: Fix potential null pointer dereference in xhci-debugfs.c

Something went wrong here, resulting in an email with no subject.

Can you fix this up and resend?

thanks,

greg k-h

2017-12-07 12:45:09

by Mathias Nyman

[permalink] [raw]
Subject: Re: [PATCH] usb-core: Fix potential null pointer dereference in xhci-debugfs.c

On 07.12.2017 11:26, Alexander Kappner wrote:
> Date: Wed, 6 Dec 2017 15:28:37 -0800
> Subject: [PATCH] usb-core: Fix potential null pointer dereference in xhci-debugfs.c

(Added this as reply subject, please remember to send mail with proper subject)

>
> My kernel crashed just after resuming from hibernate and starting usbmuxd
> (a user-space daemon for iOS device pairing) with several USB devices
> connected (dmesg attached).
>
> Backtrace leads to:
>
> 0xffffffff8170465d is in xhci_debugfs_create_endpoint
> (drivers/usb/host/xhci-debugfs.c:381).
> 376 int ep_index)
> 377 {
> 378 struct xhci_ep_priv *epriv;
> 379 struct xhci_slot_priv *spriv = dev->debugfs_private;
> 380
> 381 if (spriv->eps[ep_index])
> 382 return;
> 383
> 384 epriv = kzalloc(sizeof(*epriv), GFP_KERNEL);
> 385 if (!epriv)
>
> The read violation happens at address 0x40 and sizeof(struct
> xhci_ep_priv)=0x40, so it seems ep_index is 1 and spriv is NULL here.
>

Thanks for reporting and debugging this, much appreciated.

> spriv gets allocated in xhci_debugfs_create_slot:
>
> ...
> priv = kzalloc(sizeof(*priv), GFP_KERNEL);
> if (!priv)
> return;
> ...
>
> There's no separate error path if this allocation fails, so we might be
> left with NULL in priv. Subsequent users of priv thus need to check for
> this NULL - so this is what the patch does.
>

I think we need to dig a bit deeper. It's good to check if spriv is valid
but there are probably other reasons than kzalloc failing.

Resume from hibernation will reset the xhci controller, reinitializing a lot.
It could be related.

-Mathias

(keeping rest of message as reference)

> There might be other ways of triggering this null pointer dereference,
> including when xhci_resume frees the device structures (e.g. after
> returning from a hibernate), but I wasn't able to find or reproduce it.
>
> [63953.758083] BUG: unable to handle kernel NULL pointer dereference at
> 0000000000000040
> [63953.758090] IP: xhci_debugfs_create_endpoint+0x1d/0xa0
> [63953.758091] PGD bb911d067 P4D bb911d067 PUD 10500ff067 PMD 0
> [63953.758093] Oops: 0000 [#1] PREEMPT SMP
> [63953.758094] Modules linked in: ipheth tun nvidia_modeset(PO) iwlmvm
> mac80211 iwlwifi nvidia(PO) btusb btrtl btbcm btintel bluetooth cfg80211
> qmi_wwan ecdh_generic thinkpad_acpi rfkill
> [63953.758103] CPU: 4 PID: 27091 Comm: usbmuxd Tainted: P O
> 4.14.0.1-12769-g1deab8c #1
> [63953.758104] Hardware name: LENOVO 20ENCTO1WW/20ENCTO1WW, BIOS N1EET62W
> (1.35 ) 11/10/2016
> [63953.758105] task: ffff8810527ba0c0 task.stack: ffffc9000a8ec000
> [63953.758107] RIP: 0010:xhci_debugfs_create_endpoint+0x1d/0xa0
> [63953.758108] RSP: 0018:ffffc9000a8efc80 EFLAGS: 00010206
> [63953.758109] RAX: 0000000000000000 RBX: ffff88105a71c000 RCX:
> 0000000000030000
> [63953.758110] RDX: 0000000000000003 RSI: ffff880c0b57e000 RDI:
> ffff88105a71c238
> [63953.758110] RBP: 0000000000000003 R08: ffff881063800600 R09:
> 0000000000000003
> [63953.758111] R10: ffff88105a71c238 R11: 0000000000000001 R12:
> 0000000000000011
> [63953.758112] R13: 0000000000000018 R14: 0000000000000000 R15:
> 0000000000000000
> [63953.758113] FS: 00007f0a77715700(0000) GS:ffff8810a3d00000(0000)
> knlGS:0000000000000000
> [63953.758114] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [63953.758115] CR2: 0000000000000040 CR3: 00000003f91a8006 CR4:
> 00000000003606e0
> [63953.758115] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [63953.758116] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> [63953.758117] Call Trace:
> [63953.758120] xhci_add_endpoint+0x127/0x2b0
> [63953.758123] usb_hcd_alloc_bandwidth+0x1ad/0x300
> [63953.758125] usb_set_configuration+0x1c8/0x880
> [63953.758128] usbdev_do_ioctl+0xc41/0x1120
> [63953.758130] usbdev_ioctl+0xa/0x10
> [63953.758151] do_vfs_ioctl+0x8b/0x5c0
> [63953.758153] ? __fget+0x6c/0xb0
> [63953.758155] SyS_ioctl+0x76/0x90
> [63953.758157] do_syscall_64+0x6b/0x290
> [63953.758173] entry_SYSCALL64_slow_path+0x25/0x25
> [63953.758175] RIP: 0033:0x7f0a76a151c7
> [63953.758175] RSP: 002b:00007ffd1431b0c8 EFLAGS: 00000202 ORIG_RAX:
> 0000000000000010
> [63953.758177] RAX: ffffffffffffffda RBX: 00000000023239a0 RCX:
> 00007f0a76a151c7
> [63953.758178] RDX: 00007ffd1431b0dc RSI: 0000000080045505 RDI:
> 000000000000000e
> [63953.758178] RBP: 00000000023240c0 R08: 00007ffd1431b008 R09:
> 0000000000000004
> [63953.758179] R10: 00007ffd1431aec0 R11: 0000000000000202 R12:
> 00000000023240c0
> [63953.758180] R13: 0000000000000001 R14: 0000000000000056 R15:
> 0000000000000038
> [63953.758182] Code: e9 39 ff ff ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00
> 00 41 57 41 56 41 55 41 54 55 48 63 ea 53 4c 8b b6 88 15 00 00 4d 8d 2c ee
> <49> 83 7d 28 00 74 0b 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 8b 3d
> [63953.758202] RIP: xhci_debugfs_create_endpoint+0x1d/0xa0 RSP:
> ffffc9000a8efc80
> [63953.758203] CR2: 0000000000000040
> [63953.758204] ---[ end trace 1f7ea9a959f02054 ]---
>
> Signed-off-by: Alexander Kappner <[email protected]>
> ---
> drivers/usb/host/xhci-debugfs.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/usb/host/xhci-debugfs.c b/drivers/usb/host/xhci-debugfs.c
> index 4f7895d..1cea59c 100644
> --- a/drivers/usb/host/xhci-debugfs.c
> +++ b/drivers/usb/host/xhci-debugfs.c
> @@ -378,6 +378,9 @@ void xhci_debugfs_create_endpoint(struct xhci_hcd *xhci,
> struct xhci_ep_priv *epriv;
> struct xhci_slot_priv *spriv = dev->debugfs_private;
>
> + if (!spriv)
> + return;
> +
> if (spriv->eps[ep_index])
> return;
>
>

2017-12-08 11:06:45

by Alexander Kappner

[permalink] [raw]
Subject: Re: [PATCH] usb-core: Fix potential null pointer dereference in xhci-debugfs.c

Hi,

> I think we need to dig a bit deeper. It's good to check if spriv is
valid
but there are probably other reasons than kzalloc failing.

I agree -- this small allocation is unlikely to fail in practice. Also,
while my patch prevents the kernel oops, it also prevents the debugfs
entries from being created.

I've been debugging this more trying to come up with a better solution,
but I might need some guidance as I'm not too familiar with the USB
subsystem. The immediate cause of the crash was usbmuxd sending a
USBDEVFS_SETCONFIGURATION ioctl to a device, which _only if it fails_
calls usb_hcd_alloc_bandwidth to try and reset the device, which in turn
calls xhci_debugfs_create_endpoint. The ioctl handler acquires a
device-specific lock via usb_lock_device.

When the system resumed from hibernate, xhci_resume was called. This in
turn called xhci_mem_cleanup to deallocate the device structures, which
include setting the debugfs_private pointer to NULL (via
xhci_free_virt_devices_depth_first). It thus seems likely that the ioctl
is somehow racing with the hibernate. The call to xhci_resume is
protected by a host-controller specific lock (xhci->lock) but it doesn't
attempt to take the usb_lock_device device-specific lock.

Now my suspicion is that xhci_resume freed and zeroed the device
structures while racing with the ioctl handler. I can't seem to find any
exclusion mechanism that would prevent xhci_resume from racing with the
USBDEVFS_SETCONFIGURATION ioctl (or any other ioctl, for that matter).
Am I missing something? If not, is there any reason why an ioctl might
need to execute in parallel with the xhci_resume? If not, can we just do
a busy wait in xhci_resume until all pending ioctls have returned?

> (Added this as reply subject, please remember to send mail with
proper subject)
My apologies, thanks for fixing.


On 12/07/2017 04:47 AM, Mathias Nyman wrote:
> On 07.12.2017 11:26, Alexander Kappner wrote:
>> Date: Wed, 6 Dec 2017 15:28:37 -0800
>> Subject: [PATCH] usb-core: Fix potential null pointer dereference in
>> xhci-debugfs.c
>
> (Added this as reply subject, please remember to send mail with proper
> subject)
>
>>
>> My kernel crashed just after resuming from hibernate and starting
>> usbmuxd
>> (a user-space daemon for iOS device pairing) with several USB devices
>> connected (dmesg attached).
>>
>> Backtrace leads to:
>>
>> 0xffffffff8170465d is in xhci_debugfs_create_endpoint
>> (drivers/usb/host/xhci-debugfs.c:381).
>> 376 int ep_index)
>> 377 {
>> 378 struct xhci_ep_priv *epriv;
>> 379 struct xhci_slot_priv *spriv = dev->debugfs_private;
>> 380
>> 381 if (spriv->eps[ep_index])
>> 382 return;
>> 383
>> 384 epriv = kzalloc(sizeof(*epriv), GFP_KERNEL);
>> 385 if (!epriv)
>>
>> The read violation happens at address 0x40 and sizeof(struct
>> xhci_ep_priv)=0x40, so it seems ep_index is 1 and spriv is NULL here.
>>
>
> Thanks for reporting and debugging this, much appreciated.
>
>> spriv gets allocated in xhci_debugfs_create_slot:
>>
>> ...
>> priv = kzalloc(sizeof(*priv), GFP_KERNEL);
>> if (!priv)
>> return;
>> ...
>>
>> There's no separate error path if this allocation fails, so we might be
>> left with NULL in priv. Subsequent users of priv thus need to check for
>> this NULL - so this is what the patch does.
>>
>
> I think we need to dig a bit deeper. It's good to check if spriv is valid
> but there are probably other reasons than kzalloc failing.
>
> Resume from hibernation will reset the xhci controller, reinitializing
> a lot.
> It could be related.
>
> -Mathias
>
> (keeping rest of message as reference)
>
>> There might be other ways of triggering this null pointer dereference,
>> including when xhci_resume frees the device structures (e.g. after
>> returning from a hibernate), but I wasn't able to find or reproduce it.
>>
>> [63953.758083] BUG: unable to handle kernel NULL pointer dereference at
>> 0000000000000040
>> [63953.758090] IP: xhci_debugfs_create_endpoint+0x1d/0xa0
>> [63953.758091] PGD bb911d067 P4D bb911d067 PUD 10500ff067 PMD 0
>> [63953.758093] Oops: 0000 [#1] PREEMPT SMP
>> [63953.758094] Modules linked in: ipheth tun nvidia_modeset(PO) iwlmvm
>> mac80211 iwlwifi nvidia(PO) btusb btrtl btbcm btintel bluetooth cfg80211
>> qmi_wwan ecdh_generic thinkpad_acpi rfkill
>> [63953.758103] CPU: 4 PID: 27091 Comm: usbmuxd Tainted: P O
>> 4.14.0.1-12769-g1deab8c #1
>> [63953.758104] Hardware name: LENOVO 20ENCTO1WW/20ENCTO1WW, BIOS
>> N1EET62W
>> (1.35 ) 11/10/2016
>> [63953.758105] task: ffff8810527ba0c0 task.stack: ffffc9000a8ec000
>> [63953.758107] RIP: 0010:xhci_debugfs_create_endpoint+0x1d/0xa0
>> [63953.758108] RSP: 0018:ffffc9000a8efc80 EFLAGS: 00010206
>> [63953.758109] RAX: 0000000000000000 RBX: ffff88105a71c000 RCX:
>> 0000000000030000
>> [63953.758110] RDX: 0000000000000003 RSI: ffff880c0b57e000 RDI:
>> ffff88105a71c238
>> [63953.758110] RBP: 0000000000000003 R08: ffff881063800600 R09:
>> 0000000000000003
>> [63953.758111] R10: ffff88105a71c238 R11: 0000000000000001 R12:
>> 0000000000000011
>> [63953.758112] R13: 0000000000000018 R14: 0000000000000000 R15:
>> 0000000000000000
>> [63953.758113] FS: 00007f0a77715700(0000) GS:ffff8810a3d00000(0000)
>> knlGS:0000000000000000
>> [63953.758114] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [63953.758115] CR2: 0000000000000040 CR3: 00000003f91a8006 CR4:
>> 00000000003606e0
>> [63953.758115] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
>> 0000000000000000
>> [63953.758116] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
>> 0000000000000400
>> [63953.758117] Call Trace:
>> [63953.758120] xhci_add_endpoint+0x127/0x2b0
>> [63953.758123] usb_hcd_alloc_bandwidth+0x1ad/0x300
>> [63953.758125] usb_set_configuration+0x1c8/0x880
>> [63953.758128] usbdev_do_ioctl+0xc41/0x1120
>> [63953.758130] usbdev_ioctl+0xa/0x10
>> [63953.758151] do_vfs_ioctl+0x8b/0x5c0
>> [63953.758153] ? __fget+0x6c/0xb0
>> [63953.758155] SyS_ioctl+0x76/0x90
>> [63953.758157] do_syscall_64+0x6b/0x290
>> [63953.758173] entry_SYSCALL64_slow_path+0x25/0x25
>> [63953.758175] RIP: 0033:0x7f0a76a151c7
>> [63953.758175] RSP: 002b:00007ffd1431b0c8 EFLAGS: 00000202 ORIG_RAX:
>> 0000000000000010
>> [63953.758177] RAX: ffffffffffffffda RBX: 00000000023239a0 RCX:
>> 00007f0a76a151c7
>> [63953.758178] RDX: 00007ffd1431b0dc RSI: 0000000080045505 RDI:
>> 000000000000000e
>> [63953.758178] RBP: 00000000023240c0 R08: 00007ffd1431b008 R09:
>> 0000000000000004
>> [63953.758179] R10: 00007ffd1431aec0 R11: 0000000000000202 R12:
>> 00000000023240c0
>> [63953.758180] R13: 0000000000000001 R14: 0000000000000056 R15:
>> 0000000000000038
>> [63953.758182] Code: e9 39 ff ff ff 66 0f 1f 84 00 00 00 00 00 0f 1f
>> 44 00
>> 00 41 57 41 56 41 55 41 54 55 48 63 ea 53 4c 8b b6 88 15 00 00 4d 8d
>> 2c ee
>> <49> 83 7d 28 00 74 0b 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 8b 3d
>> [63953.758202] RIP: xhci_debugfs_create_endpoint+0x1d/0xa0 RSP:
>> ffffc9000a8efc80
>> [63953.758203] CR2: 0000000000000040
>> [63953.758204] ---[ end trace 1f7ea9a959f02054 ]---
>>
>> Signed-off-by: Alexander Kappner <[email protected]>
>> ---
>> drivers/usb/host/xhci-debugfs.c | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/usb/host/xhci-debugfs.c
>> b/drivers/usb/host/xhci-debugfs.c
>> index 4f7895d..1cea59c 100644
>> --- a/drivers/usb/host/xhci-debugfs.c
>> +++ b/drivers/usb/host/xhci-debugfs.c
>> @@ -378,6 +378,9 @@ void xhci_debugfs_create_endpoint(struct xhci_hcd
>> *xhci,
>> struct xhci_ep_priv *epriv;
>> struct xhci_slot_priv *spriv = dev->debugfs_private;
>> + if (!spriv)
>> + return;
>> +
>> if (spriv->eps[ep_index])
>> return;
>>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-usb" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2017-12-08 16:59:17

by Mathias Nyman

[permalink] [raw]
Subject: Re: [PATCH] usb-core: Fix potential null pointer dereference in xhci-debugfs.c

On 08.12.2017 13:06, Alexander Kappner wrote:
> Hi,
>
>> I think we need to dig a bit deeper. It's good to check if spriv is
>> valid
>> but there are probably other reasons than kzalloc failing.
>
> I agree -- this small allocation is unlikely to fail in practice.
> Also, while my patch prevents the kernel oops, it also prevents the
> debugfs entries from being created.
>
> I've been debugging this more trying to come up with a better
> solution, but I might need some guidance as I'm not too familiar with
> the USB subsystem. The immediate cause of the crash was usbmuxd
> sending a USBDEVFS_SETCONFIGURATION ioctl to a device, which _only if
> it fails_ calls usb_hcd_alloc_bandwidth to try and reset the device,
> which in turn calls xhci_debugfs_create_endpoint. The ioctl handler
> acquires a device-specific lock via usb_lock_device.
>
> When the system resumed from hibernate, xhci_resume was called. This
> in turn called xhci_mem_cleanup to deallocate the device structures,
> which include setting the debugfs_private pointer to NULL (via
> xhci_free_virt_devices_depth_first). It thus seems likely that the
> ioctl is somehow racing with the hibernate. The call to xhci_resume
> is protected by a host-controller specific lock (xhci->lock) but it
> doesn't attempt to take the usb_lock_device device-specific lock.
>
> Now my suspicion is that xhci_resume freed and zeroed the device
> structures while racing with the ioctl handler. I can't seem to find
> any exclusion mechanism that would prevent xhci_resume from racing
> with the USBDEVFS_SETCONFIGURATION ioctl (or any other ioctl, for
> that matter). Am I missing something? If not, is there any reason why
> an ioctl might need to execute in parallel with the xhci_resume? If
> not, can we just do a busy wait in xhci_resume until all pending
> ioctls have returned?

I'm not sure, but if I recall correctly then power management is supposed
to make sure a driver doesn't access usb devices while the host controller
is still resuming.

The odd thing here is that
xhci_debugfs_remove_slot(xhci, slot_id), and
xhci_free_virt_device(xhci, slot_id) are called together when
xhci_mem_cleanup() calls xhci_free_virt_devices_depth_first()

That means both the xhci_virt_device *dev and dev->debugfs_private
should both be freed and xhci->devs[slot_id] set NULL for that virt_device.

so xhci_add_endpoint() should fail a lot earlier because the xhci->devs[slot_id]
should be a null pointer as well.

Allocation is also done together in xhci_alloc_dev()

Looking at it more closely there is actually the .free_dev callback that
first frees the dev->debugs_private but the virt_dev is only freed
conditionally later

Attached a patch that frees them together, can you try it out?

If it doesn't help we need to add some elaborate tracing

Thanks
-Mathias



Attachments:
0001-xhci-Only-free-xhci_virt_device-debugfs_private-if-d.patch (1.32 kB)

2017-12-09 22:38:51

by Alexander Kappner

[permalink] [raw]
Subject: Re: [PATCH] usb-core: Fix potential null pointer dereference in xhci-debugfs.c

Hi Mathias,

thanks for the patch! The system now resumes cleanly from hibernate even
with usbmuxd doing its thing.

Tested-by: Alexander Kappner <[email protected]>

While testing this I hit some other issues with xhci-debugfs.c but I'll
write these up in a separate bug.

On 12/08/2017 09:01 AM, Mathias Nyman wrote:
> On 08.12.2017 13:06, Alexander Kappner wrote:
>> Hi,
>>
>>> I think we need to dig a bit deeper. It's good to check if spriv is
>>> valid
>>> but there are probably other reasons than kzalloc failing.
>>
>> I agree -- this small allocation is unlikely to fail in practice.
>> Also, while my patch prevents the kernel oops, it also prevents the
>> debugfs entries from being created.
>>
>> I've been debugging this more trying to come up with a better
>> solution, but I might need some guidance as I'm not too familiar with
>> the USB subsystem. The immediate cause of the crash was usbmuxd
>> sending a USBDEVFS_SETCONFIGURATION ioctl to a device, which _only if
>> it fails_ calls usb_hcd_alloc_bandwidth to try and reset the device,
>> which in turn calls xhci_debugfs_create_endpoint. The ioctl handler
>> acquires a device-specific lock via usb_lock_device.
>>
>> When the system resumed from hibernate, xhci_resume was called. This
>> in turn called xhci_mem_cleanup to deallocate the device structures,
>> which include setting the debugfs_private pointer to NULL (via
>> xhci_free_virt_devices_depth_first). It thus seems likely that the
>> ioctl is somehow racing with the hibernate. The call to xhci_resume
>> is protected by a host-controller specific lock (xhci->lock) but it
>> doesn't attempt to take the usb_lock_device device-specific lock.
>>
>> Now my suspicion is that xhci_resume freed and zeroed the device
>> structures while racing with the ioctl handler. I can't seem to find
>> any exclusion mechanism that would prevent xhci_resume from racing
>> with the USBDEVFS_SETCONFIGURATION ioctl (or any other ioctl, for
>> that matter). Am I missing something? If not, is there any reason why
>> an ioctl might need to execute in parallel with the xhci_resume? If
>> not, can we just do a busy wait in xhci_resume until all pending
>> ioctls have returned?
>
> I'm not sure, but if I recall correctly then power management is supposed
> to make sure a driver doesn't access usb devices while the host
> controller
> is still resuming.
>
> The odd thing here is that
> xhci_debugfs_remove_slot(xhci, slot_id), and
> xhci_free_virt_device(xhci, slot_id) are called together when
> xhci_mem_cleanup() calls xhci_free_virt_devices_depth_first()
>
> That means both the xhci_virt_device *dev and dev->debugfs_private
> should both be freed and xhci->devs[slot_id] set NULL for that
> virt_device.
>
> so xhci_add_endpoint() should fail a lot earlier because the
> xhci->devs[slot_id]
> should be a null pointer as well.
>
> Allocation is also done together in xhci_alloc_dev()
>
> Looking at it more closely there is actually the .free_dev callback that
> first frees the dev->debugs_private but the virt_dev is only freed
> conditionally later
>
> Attached a patch that frees them together, can you try it out?
>
> If it doesn't help we need to add some elaborate tracing
>
> Thanks
> -Mathias
>
>

2017-12-11 10:36:44

by Mathias Nyman

[permalink] [raw]
Subject: Re: [PATCH] usb-core: Fix potential null pointer dereference in xhci-debugfs.c

On 10.12.2017 00:38, Alexander Kappner wrote:
> Hi Mathias,
>
> thanks for the patch! The system now resumes cleanly from hibernate even with usbmuxd doing its thing.
>
> Tested-by: Alexander Kappner <[email protected]>
>
> While testing this I hit some other issues with xhci-debugfs.c but I'll write these up in a separate bug.
>

Thanks,
Adding reported-by and tested-by tags, and pushing patch to my for-usb-linus branch

-Mathias