2018-07-20 09:46:07

by Paul Menzel

Subject: BUG: KASAN: use-after-free in xhci_trb_virt_to_dma.part.24+0x1c/0x80

Dear Linux folks,


Using Linux 4.18-rc5+ with kmemleak enabled on a MSI B350M MORTAR (MS-7A37)
with an AMD Ryzen 3 2200G, the memory leak below is suspected.

```
$ sudo less /sys/kernel/debug/kmemleak
unreferenced object 0xffff894f8874a2b8 (size 8):
comm "systemd-udevd", pid 119, jiffies 4294893109 (age 908.348s)
hex dump (first 8 bytes):
34 01 05 00 00 00 00 00 4.......
backtrace:
[<00000000308e4456>] xhci_init+0x81/0x170 [xhci_hcd]
[<00000000269aa18f>] xhci_gen_setup+0x2cb/0x510 [xhci_hcd]
[<000000007b70d85f>] xhci_pci_setup+0x4d/0x120 [xhci_pci]
[<0000000059f49127>] usb_add_hcd+0x2b6/0x800 [usbcore]
[<000000009a16d67c>] usb_hcd_pci_probe+0x1f3/0x460 [usbcore]
[<0000000001295c2e>] xhci_pci_probe+0x27/0x1d7 [xhci_pci]
[<00000000395bd8f9>] local_pci_probe+0x41/0x90
[<00000000a344e362>] pci_device_probe+0x189/0x1a0
[<00000000318999e5>] driver_probe_device+0x2b9/0x460
[<00000000c29d8a55>] __driver_attach+0xdd/0x110
[<00000000975b7f46>] bus_for_each_dev+0x76/0xc0
[<000000006bc40955>] bus_add_driver+0x152/0x230
[<00000000840ed63c>] driver_register+0x6b/0xb0
[<00000000123908c4>] do_one_initcall+0x46/0x1c3
[<00000000ce69c793>] do_init_module+0x5a/0x210
[<0000000091d4aef2>] load_module+0x21c4/0x2410
[…]
```


Kind regards,

Paul




2018-07-20 09:55:41

by Greg KH

Subject: Re: BUG: KASAN: use-after-free in xhci_trb_virt_to_dma.part.24+0x1c/0x80

On Fri, Jul 20, 2018 at 11:44:49AM +0200, Paul Menzel wrote:
> Dear Linux folks,
>
>
> Using Linux 4.18-rc5+ with kmemleak enabled on a MSI B350M MORTAR (MS-7A37)
> with an AMD Ryzen 3 2200G, the memory leak below is suspected.
>
> ```
> $ sudo less /sys/kernel/debug/kmemleak
> unreferenced object 0xffff894f8874a2b8 (size 8):
> comm "systemd-udevd", pid 119, jiffies 4294893109 (age 908.348s)
> hex dump (first 8 bytes):
> 34 01 05 00 00 00 00 00 4.......
> backtrace:
> [<00000000308e4456>] xhci_init+0x81/0x170 [xhci_hcd]
> [<00000000269aa18f>] xhci_gen_setup+0x2cb/0x510 [xhci_hcd]
> [<000000007b70d85f>] xhci_pci_setup+0x4d/0x120 [xhci_pci]
> [<0000000059f49127>] usb_add_hcd+0x2b6/0x800 [usbcore]
> [<000000009a16d67c>] usb_hcd_pci_probe+0x1f3/0x460 [usbcore]
> [<0000000001295c2e>] xhci_pci_probe+0x27/0x1d7 [xhci_pci]
> [<00000000395bd8f9>] local_pci_probe+0x41/0x90
> [<00000000a344e362>] pci_device_probe+0x189/0x1a0
> [<00000000318999e5>] driver_probe_device+0x2b9/0x460
> [<00000000c29d8a55>] __driver_attach+0xdd/0x110
> [<00000000975b7f46>] bus_for_each_dev+0x76/0xc0
> [<000000006bc40955>] bus_add_driver+0x152/0x230
> [<00000000840ed63c>] driver_register+0x6b/0xb0
> [<00000000123908c4>] do_one_initcall+0x46/0x1c3
> [<00000000ce69c793>] do_init_module+0x5a/0x210
> [<0000000091d4aef2>] load_module+0x21c4/0x2410
> […]
> ```

That's really vague. Any chance for a reproducer or some other types of
hints as to what you feel the problem is here?

thanks,

greg k-h

2018-07-23 11:25:11

by Paul Menzel

Subject: Re: BUG: KASAN: use-after-free in xhci_trb_virt_to_dma.part.24+0x1c/0x80

Dear Greg,


On 07/20/18 11:54, Greg KH wrote:
> On Fri, Jul 20, 2018 at 11:44:49AM +0200, Paul Menzel wrote:

>> Using Linux 4.18-rc5+ with kmemleak enabled on a MSI B350M MORTAR (MS-7A37)
>> with an AMD Ryzen 3 2200G, the memory leak below is suspected.
>>
>> ```
>> $ sudo less /sys/kernel/debug/kmemleak
>> unreferenced object 0xffff894f8874a2b8 (size 8):
>> comm "systemd-udevd", pid 119, jiffies 4294893109 (age 908.348s)
>> hex dump (first 8 bytes):
>> 34 01 05 00 00 00 00 00 4.......
>> backtrace:
>> [<00000000308e4456>] xhci_init+0x81/0x170 [xhci_hcd]
>> [<00000000269aa18f>] xhci_gen_setup+0x2cb/0x510 [xhci_hcd]
>> [<000000007b70d85f>] xhci_pci_setup+0x4d/0x120 [xhci_pci]
>> [<0000000059f49127>] usb_add_hcd+0x2b6/0x800 [usbcore]
>> [<000000009a16d67c>] usb_hcd_pci_probe+0x1f3/0x460 [usbcore]
>> [<0000000001295c2e>] xhci_pci_probe+0x27/0x1d7 [xhci_pci]
>> [<00000000395bd8f9>] local_pci_probe+0x41/0x90
>> [<00000000a344e362>] pci_device_probe+0x189/0x1a0
>> [<00000000318999e5>] driver_probe_device+0x2b9/0x460
>> [<00000000c29d8a55>] __driver_attach+0xdd/0x110
>> [<00000000975b7f46>] bus_for_each_dev+0x76/0xc0
>> [<000000006bc40955>] bus_add_driver+0x152/0x230
>> [<00000000840ed63c>] driver_register+0x6b/0xb0
>> [<00000000123908c4>] do_one_initcall+0x46/0x1c3
>> [<00000000ce69c793>] do_init_module+0x5a/0x210
>> [<0000000091d4aef2>] load_module+0x21c4/0x2410
>> […]
>> ```
>
> That's really vague. Any chance for a reproducer or some other types of
> hints as to what you feel the problem is here?

Unfortunately, not really. I have a SanDisk USB storage medium attached.

```
$ lsusb
Bus 006 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 005 Device 002: ID 413c:3012 Dell Computer Corp. Optical Wheel Mouse
Bus 005 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 004 Device 002: ID 0781:558b SanDisk Corp.
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 002: ID 413c:2105 Dell Computer Corp. Model L100 Keyboard
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
```

Ten or eleven minutes after boot, systemd-udevd gets triggered
and causes the kmemleak message.

```
[ 82.196740] calling fuse_init+0x0/0x1a6 [fuse] @ 4455
[ 82.196741] fuse init (API version 7.27)
[ 82.201779] initcall fuse_init+0x0/0x1a6 [fuse] returned 0 after 4925 usecs
[ 677.532745] kmemleak: 3 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
```

Please note, that there could be memory corruption issues [1] with AMD
Raven devices. But as I can reproduce the kmemleak messages, I thought
that this is unrelated and that you might have an idea.

Please tell me, if I can provide more information. Sorry for
forgetting to attach the Linux messages.


Kind regards,

Paul


[1]: https://bugs.freedesktop.org/show_bug.cgi?id=105684
"Loading amdgpu hits general protection fault: 0000 [#1] SMP NOPTI"


Attachments:
20180723–linux-messages.txt (146.54 kB)

2020-01-02 14:11:39

by Paul Menzel

Subject: Re: BUG: KASAN: use-after-free in xhci_trb_virt_to_dma.part.24+0x1c/0x80

Dear Linux folks,


On 2018-07-23 13:23, Paul Menzel wrote:

> On 07/20/18 11:54, Greg KH wrote:
>> On Fri, Jul 20, 2018 at 11:44:49AM +0200, Paul Menzel wrote:
>
>>> Using Linux 4.18-rc5+ with kmemleak enabled on a MSI B350M MORTAR (MS-7A37)
>>> with an AMD Ryzen 3 2200G, the memory leak below is suspected.
>>>
>>> ```
>>> $ sudo less /sys/kernel/debug/kmemleak
>>> unreferenced object 0xffff894f8874a2b8 (size 8):
>>> comm "systemd-udevd", pid 119, jiffies 4294893109 (age 908.348s)
>>> hex dump (first 8 bytes):
>>> 34 01 05 00 00 00 00 00 4.......
>>> backtrace:
>>> [<00000000308e4456>] xhci_init+0x81/0x170 [xhci_hcd]
>>> [<00000000269aa18f>] xhci_gen_setup+0x2cb/0x510 [xhci_hcd]
>>> [<000000007b70d85f>] xhci_pci_setup+0x4d/0x120 [xhci_pci]
>>> [<0000000059f49127>] usb_add_hcd+0x2b6/0x800 [usbcore]
>>> [<000000009a16d67c>] usb_hcd_pci_probe+0x1f3/0x460 [usbcore]
>>> [<0000000001295c2e>] xhci_pci_probe+0x27/0x1d7 [xhci_pci]
>>> [<00000000395bd8f9>] local_pci_probe+0x41/0x90
>>> [<00000000a344e362>] pci_device_probe+0x189/0x1a0
>>> [<00000000318999e5>] driver_probe_device+0x2b9/0x460
>>> [<00000000c29d8a55>] __driver_attach+0xdd/0x110
>>> [<00000000975b7f46>] bus_for_each_dev+0x76/0xc0
>>> [<000000006bc40955>] bus_add_driver+0x152/0x230
>>> [<00000000840ed63c>] driver_register+0x6b/0xb0
>>> [<00000000123908c4>] do_one_initcall+0x46/0x1c3
>>> [<00000000ce69c793>] do_init_module+0x5a/0x210
>>> [<0000000091d4aef2>] load_module+0x21c4/0x2410
>>> […]
>>> ```
>>
>> That's really vague. Any chance for a reproducer or some other types of
>> hints as to what you feel the problem is here?

I guess you just have to run your system also with kmemleak, and you will
be notified about similar leaks.

> Unfortunately, not really. I have a SanDisk USB storage medium attached.

On the system there is now an external DVD drive connected over USB Type-C.

> ```
> $ lsusb
> Bus 006 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
> Bus 005 Device 002: ID 413c:3012 Dell Computer Corp. Optical Wheel Mouse
> Bus 005 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
> Bus 004 Device 002: ID 0781:558b SanDisk Corp.
> Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
> Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
> Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
> Bus 001 Device 002: ID 413c:2105 Dell Computer Corp. Model L100 Keyboard
> Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
> ```
>
> Ten or eleven minutes after boot, systemd-udevd gets triggered
> and causes the kmemleak message.
>
> ```
> [ 82.196740] calling fuse_init+0x0/0x1a6 [fuse] @ 4455
> [ 82.196741] fuse init (API version 7.27)
> [ 82.201779] initcall fuse_init+0x0/0x1a6 [fuse] returned 0 after 4925 usecs
> [ 677.532745] kmemleak: 3 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> ```

[…]

> Please tell me, if I can provide more information. Sorry for forgetting
> to attach the Linux messages.

I am still getting this with Linux 5.5-rc4, and the commit below
included.

> commit ce91f1a43b37463f517155bdfbd525eb43adbd1a
> Author: Mika Westerberg <[email protected]>
> Date: Wed Dec 11 16:20:02 2019 +0200
>
> xhci: Fix memory leak in xhci_add_in_port()

Mika, as you fixed the other leak, any idea, how to continue from the
kmemleak log below?

```
unreferenced object 0xffff8c207a1e1408 (size 8):
comm "systemd-udevd", pid 183, jiffies 4294667978 (age 752.292s)
hex dump (first 8 bytes):
34 01 05 00 00 00 00 00 4.......
backtrace:
[<00000000aea7b46d>] xhci_mem_init+0xcfa/0xec0 [xhci_hcd]
[<00000000417c4e3f>] xhci_init+0x81/0x170 [xhci_hcd]
[<00000000dcdd3292>] xhci_gen_setup+0x26a/0x340 [xhci_hcd]
[<0000000079014433>] xhci_pci_setup+0x4d/0x120 [xhci_pci]
[<000000008f4fc4d1>] usb_add_hcd.cold+0x266/0x74b
[<0000000023fadb59>] usb_hcd_pci_probe+0x216/0x3b1
[<0000000006043143>] xhci_pci_probe+0x29/0x1bc [xhci_pci]
[<000000006e8744e3>] local_pci_probe+0x42/0x80
[<00000000120e570a>] pci_device_probe+0x107/0x1a0
[<0000000074c180e1>] really_probe+0x147/0x3c0
[<000000002d64c344>] driver_probe_device+0xb6/0x100
[<00000000e695e4ae>] device_driver_attach+0x53/0x60
[<00000000b7832c47>] __driver_attach+0x8a/0x150
[<0000000069df77eb>] bus_for_each_dev+0x78/0xc0
[<00000000aa6d98a4>] bus_add_driver+0x14d/0x1f0
[<0000000002c7d24b>] driver_register+0x6c/0xc0
unreferenced object 0xffff8c207a1e1718 (size 8):
comm "systemd-udevd", pid 183, jiffies 4294667978 (age 752.292s)
hex dump (first 8 bytes):
34 01 05 00 00 00 00 00 4.......
backtrace:
[<00000000aea7b46d>] xhci_mem_init+0xcfa/0xec0 [xhci_hcd]
[<00000000417c4e3f>] xhci_init+0x81/0x170 [xhci_hcd]
[<00000000dcdd3292>] xhci_gen_setup+0x26a/0x340 [xhci_hcd]
[<0000000079014433>] xhci_pci_setup+0x4d/0x120 [xhci_pci]
[<000000008f4fc4d1>] usb_add_hcd.cold+0x266/0x74b
[<0000000023fadb59>] usb_hcd_pci_probe+0x216/0x3b1
[<0000000006043143>] xhci_pci_probe+0x29/0x1bc [xhci_pci]
[<000000006e8744e3>] local_pci_probe+0x42/0x80
[<00000000120e570a>] pci_device_probe+0x107/0x1a0
[<0000000074c180e1>] really_probe+0x147/0x3c0
[<000000002d64c344>] driver_probe_device+0xb6/0x100
[<00000000e695e4ae>] device_driver_attach+0x53/0x60
[<00000000b7832c47>] __driver_attach+0x8a/0x150
[<0000000069df77eb>] bus_for_each_dev+0x78/0xc0
[<00000000aa6d98a4>] bus_add_driver+0x14d/0x1f0
[<0000000002c7d24b>] driver_register+0x6c/0xc0
unreferenced object 0xffff8c207a1e1338 (size 8):
comm "systemd-udevd", pid 183, jiffies 4294667978 (age 752.292s)
hex dump (first 8 bytes):
34 01 05 00 00 00 00 00 4.......
backtrace:
[<00000000aea7b46d>] xhci_mem_init+0xcfa/0xec0 [xhci_hcd]
[<00000000417c4e3f>] xhci_init+0x81/0x170 [xhci_hcd]
[<00000000dcdd3292>] xhci_gen_setup+0x26a/0x340 [xhci_hcd]
[<0000000079014433>] xhci_pci_setup+0x4d/0x120 [xhci_pci]
[<000000008f4fc4d1>] usb_add_hcd.cold+0x266/0x74b
[<0000000023fadb59>] usb_hcd_pci_probe+0x216/0x3b1
[<0000000006043143>] xhci_pci_probe+0x29/0x1bc [xhci_pci]
[<000000006e8744e3>] local_pci_probe+0x42/0x80
[<00000000120e570a>] pci_device_probe+0x107/0x1a0
[<0000000074c180e1>] really_probe+0x147/0x3c0
[<000000002d64c344>] driver_probe_device+0xb6/0x100
[<00000000e695e4ae>] device_driver_attach+0x53/0x60
[<00000000b7832c47>] __driver_attach+0x8a/0x150
[<0000000069df77eb>] bus_for_each_dev+0x78/0xc0
[<00000000aa6d98a4>] bus_add_driver+0x14d/0x1f0
[<0000000002c7d24b>] driver_register+0x6c/0xc0
```


Kind regards,

Paul


> [1]: https://bugs.freedesktop.org/show_bug.cgi?id=105684
> "Loading amdgpu hits general protection fault: 0000 [#1] SMP NOPTI"
[2]: https://patchwork.kernel.org/patch/11242383/


Attachments:
20200101–msi-MS-7A37–linux-5.5.0-rc4-drm-tip-messages-kmemleak.txt (69.84 kB)

2020-01-03 11:06:17

by Mika Westerberg

Subject: Re: BUG: KASAN: use-after-free in xhci_trb_virt_to_dma.part.24+0x1c/0x80

On Thu, Jan 02, 2020 at 03:10:14PM +0100, Paul Menzel wrote:
> Mika, as you fixed the other leak, any idea, how to continue from the
> kmemleak log below?
>
> ```
> unreferenced object 0xffff8c207a1e1408 (size 8):
> comm "systemd-udevd", pid 183, jiffies 4294667978 (age 752.292s)
> hex dump (first 8 bytes):
> 34 01 05 00 00 00 00 00 4.......
> backtrace:
> [<00000000aea7b46d>] xhci_mem_init+0xcfa/0xec0 [xhci_hcd]

There are probably better ways for doing this but you can use objdump
for example:

$ objdump -l --prefix-addresses -j .text --disassemble=xhci_mem_init drivers/usb/host/xhci-hcd.ko

then find the offset xhci_mem_init+0xcfa. It should show you the line
numbers as well if you have compiled your kernel with debug info. This
should be close to the line that allocated the memory that was leaked.

2020-01-07 12:11:03

by Mathias Nyman

Subject: Re: BUG: KASAN: use-after-free in xhci_trb_virt_to_dma.part.24+0x1c/0x80

On 3.1.2020 13.04, Mika Westerberg wrote:
> On Thu, Jan 02, 2020 at 03:10:14PM +0100, Paul Menzel wrote:
>> Mika, as you fixed the other leak, any idea, how to continue from the
>> kmemleak log below?
>>
>> ```
>> unreferenced object 0xffff8c207a1e1408 (size 8):
>> comm "systemd-udevd", pid 183, jiffies 4294667978 (age 752.292s)
>> hex dump (first 8 bytes):
>> 34 01 05 00 00 00 00 00 4.......
>> backtrace:
>> [<00000000aea7b46d>] xhci_mem_init+0xcfa/0xec0 [xhci_hcd]
>
> There are probably better ways for doing this but you can use objdump
> for example:
>
> $ objdump -l --prefix-addresses -j .text --disassemble=xhci_mem_init drivers/usb/host/xhci-hcd.ko
>
> then find the offset xhci_mem_init+0xcfa. It should show you the line
> numbers as well if you have compiled your kernel with debug info. This
> should be close to the line that allocated the memory that was leaked.
>

Paul, it is possible that your xhci controller has several
supported protocol extended capabilities for usb 3 ports, each
with their own custom protocol speed ID table.

xhci driver assumes there is only one custom PSI table per roothub,
and we will end up allocating the second PSI table on top of the first,
leaking the first.
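
The overwrite described above can be sketched in user-space C. This is only an illustration under stated assumptions: `demo_hub`, `demo_add_cap`, `demo_cleanup` and `demo_leaked_after` are hypothetical names, `calloc`/`free` stand in for the kernel allocator, and none of this is the actual xhci code.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a roothub with a single PSI table slot. */
struct demo_hub {
    uint32_t *psi;       /* only one table can be remembered here */
    unsigned psi_count;
};

int demo_live_tables;    /* tables allocated and not yet freed */

/* Models parsing one capability that carries its own PSI table. */
void demo_add_cap(struct demo_hub *hub, unsigned psi_count, uint32_t entry)
{
    uint32_t *tbl = calloc(psi_count, sizeof(*tbl));

    if (!tbl)
        return;
    demo_live_tables++;
    for (unsigned i = 0; i < psi_count; i++)
        tbl[i] = entry;
    /* Any table stored earlier becomes unreachable from here on. */
    hub->psi = tbl;
    hub->psi_count = psi_count;
}

/* Frees only the table the hub still points to. */
void demo_cleanup(struct demo_hub *hub)
{
    if (hub->psi) {
        free(hub->psi);
        demo_live_tables--;
    }
    hub->psi = NULL;
    hub->psi_count = 0;
}

/* Number of tables still allocated after cleanup; the demo leaks them on purpose. */
int demo_leaked_after(unsigned num_caps)
{
    struct demo_hub hub = { 0 };

    demo_live_tables = 0;
    for (unsigned i = 0; i < num_caps; i++)
        demo_add_cap(&hub, 1, 0x00050134u);
    demo_cleanup(&hub);
    return demo_live_tables;
}
```

In this sketch, a hub that sees a single capability leaks nothing, while every additional capability with a PSI table leaves one more allocation that cleanup can no longer reach.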

Could you boot with xhci dynamic debug enabled, and show dmesg after boot, add:
xhci_hcd.dyndbg=+p
to your kernel cmdline.

Or as an alternative, show output of:

sudo cat /sys/kernel/debug/usb/xhci/*/reg-ext-protocol*

-Mathias

2020-01-07 15:36:30

by Paul Menzel

Subject: Re: BUG: KASAN: use-after-free in xhci_trb_virt_to_dma.part.24+0x1c/0x80

Dear Mathias, dear Mika,


On 2020-01-07 13:09, Mathias Nyman wrote:
> On 3.1.2020 13.04, Mika Westerberg wrote:
>> On Thu, Jan 02, 2020 at 03:10:14PM +0100, Paul Menzel wrote:
>>> Mika, as you fixed the other leak, any idea, how to continue from the
>>> kmemleak log below?
>>>
>>> ```
>>> unreferenced object 0xffff8c207a1e1408 (size 8):
>>>    comm "systemd-udevd", pid 183, jiffies 4294667978 (age 752.292s)
>>>    hex dump (first 8 bytes):
>>>      34 01 05 00 00 00 00 00                          4.......
>>>    backtrace:
>>>      [<00000000aea7b46d>] xhci_mem_init+0xcfa/0xec0 [xhci_hcd]
>>
>> There are probably better ways for doing this but you can use objdump
>> for example:
>>
>>    $ objdump -l --prefix-addresses -j .text --disassemble=xhci_mem_init drivers/usb/host/xhci-hcd.ko
>>
>> then find the offset xhci_mem_init+0xcfa. It should show you the line
>> numbers as well if you have compiled your kernel with debug info. This
>> should be close to the line that allocated the memory that was leaked.

Thank you. I actually remembered `scripts/faddr2line`.

$ scripts/faddr2line drivers/usb/host/xhci-hcd.o xhci_mem_init+0xcfa
xhci_mem_init+0xcfa/0xec0:
xhci_add_in_port at /mnt/drivers/usb/host/xhci-mem.c:2161
(inlined by) xhci_setup_port_arrays at /mnt/drivers/usb/host/xhci-mem.c:2309
(inlined by) xhci_mem_init at /mnt/drivers/usb/host/xhci-mem.c:2538

> Paul, it is possible that your xhci controller has several
> supported protocol extended capabilities for usb 3 ports, each
> with their own custom protocol speed ID table.
>
> xhci driver assumes there is only one custom PSI table per roothub,
> and we will end up allocating the second PSI table on top of the first,
> leaking the first.
>
> Could you boot with xhci dynamic debug enabled, and show dmesg after boot, add:
> xhci_hcd.dyndbg=+p
> to your kernel cmdline.
>
> Or as an alternative, show output of:
>
> sudo cat /sys/kernel/debug/usb/xhci/*/reg-ext-protocol*

`/sys/kernel/debug/` cannot be read by unprivileged users, and the shell expands
the wildcard before `sudo` runs, so the wildcard does not work with `sudo`.

```
$ sudo ls /sys/kernel/debug/usb/xhci
0000:12:00.0 0000:26:00.3 0000:26:00.4
# cat /sys/kernel/debug/usb/xhci/*/reg-ext-protocol*
EXTCAP_REVISION = 0x03100802
EXTCAP_NAME = 0x20425355
EXTCAP_PORTINFO = 0x00000201
EXTCAP_PORTTYPE = 0x00000000
EXTCAP_REVISION = 0x03000802
EXTCAP_NAME = 0x20425355
EXTCAP_PORTINFO = 0x00000203
EXTCAP_PORTTYPE = 0x00000000
EXTCAP_REVISION = 0x02000802
EXTCAP_NAME = 0x20425355
EXTCAP_PORTINFO = 0x00190a05
EXTCAP_PORTTYPE = 0x00000000
EXTCAP_REVISION = 0x02000402
EXTCAP_NAME = 0x20425355
EXTCAP_PORTINFO = 0x00180401
EXTCAP_PORTTYPE = 0x00000000
EXTCAP_REVISION = 0x03100802
EXTCAP_NAME = 0x20425355
EXTCAP_PORTINFO = 0x10000105
EXTCAP_PORTTYPE = 0x00000000
EXTCAP_MANTISSA1 = 0x00050134
EXTCAP_REVISION = 0x03100802
EXTCAP_NAME = 0x20425355
EXTCAP_PORTINFO = 0x10000106
EXTCAP_PORTTYPE = 0x00000000
EXTCAP_MANTISSA1 = 0x00050134
EXTCAP_REVISION = 0x03100802
EXTCAP_NAME = 0x20425355
EXTCAP_PORTINFO = 0x10000107
EXTCAP_PORTTYPE = 0x00000000
EXTCAP_MANTISSA1 = 0x00050134
EXTCAP_REVISION = 0x03100802
EXTCAP_NAME = 0x20425355
EXTCAP_PORTINFO = 0x10000108
EXTCAP_PORTTYPE = 0x00000000
EXTCAP_MANTISSA1 = 0x00050134
EXTCAP_REVISION = 0x02000402
EXTCAP_NAME = 0x20425355
EXTCAP_PORTINFO = 0x00180101
EXTCAP_PORTTYPE = 0x00000000
EXTCAP_REVISION = 0x03100802
EXTCAP_NAME = 0x20425355
EXTCAP_PORTINFO = 0x10000102
EXTCAP_PORTTYPE = 0x00000000
EXTCAP_MANTISSA1 = 0x00050134
```


Kind regards,

Paul



2020-01-08 10:48:36

by Mathias Nyman

Subject: Re: BUG: KASAN: use-after-free in xhci_trb_virt_to_dma.part.24+0x1c/0x80

On 7.1.2020 17.35, Paul Menzel wrote:
> Dear Mathias, dear Mika,
>
>
> On 2020-01-07 13:09, Mathias Nyman wrote:
>> On 3.1.2020 13.04, Mika Westerberg wrote:
>>> On Thu, Jan 02, 2020 at 03:10:14PM +0100, Paul Menzel wrote:
>>>> Mika, as you fixed the other leak, any idea, how to continue from the
>>>> kmemleak log below?
>>>>
>>>> ```
>>>> unreferenced object 0xffff8c207a1e1408 (size 8):
>>>>    comm "systemd-udevd", pid 183, jiffies 4294667978 (age 752.292s)
>>>>    hex dump (first 8 bytes):
>>>>      34 01 05 00 00 00 00 00                          4.......
>>>>    backtrace:
>>>>      [<00000000aea7b46d>] xhci_mem_init+0xcfa/0xec0 [xhci_hcd]
>>>
>>> There are probably better ways for doing this but you can use objdump
>>> for example:
>>>
>>>    $ objdump -l --prefix-addresses -j .text --disassemble=xhci_mem_init drivers/usb/host/xhci-hcd.ko
>>>
>>> then find the offset xhci_mem_init+0xcfa. It should show you the line
>>> numbers as well if you have compiled your kernel with debug info. This
>>> should be close to the line that allocated the memory that was leaked.
>
> Thank you. I actually remembered `scripts/faddr2line`.
>
> $ scripts/faddr2line drivers/usb/host/xhci-hcd.o xhci_mem_init+0xcfa
> xhci_mem_init+0xcfa/0xec0:
> xhci_add_in_port at /mnt/drivers/usb/host/xhci-mem.c:2161
> (inlined by) xhci_setup_port_arrays at /mnt/drivers/usb/host/xhci-mem.c:2309
> (inlined by) xhci_mem_init at /mnt/drivers/usb/host/xhci-mem.c:2538
>
>> Paul, it is possible that your xhci controller has several
>> supported protocol extended capabilities for usb 3 ports, each
>> with their own custom protocol speed ID table.
>>
>> xhci driver assumes there is only one custom PSI table per roothub,
>> and we will end up allocating the second PSI table on top of the first,
>> leaking the first.
>>
>> Could you boot with xhci dynamic debug enabled, and show dmesg after boot, add:
>> xhci_hcd.dyndbg=+p
>> to your kernel cmdline.
>>
>> Or as an alternative, show output of:
>>
>> sudo cat /sys/kernel/debug/usb/xhci/*/reg-ext-protocol*
>
> `/sys/kernel/debug/` cannot be read by unprivileged users, and the shell expands
> the wildcard before `sudo` runs, so the wildcard does not work with `sudo`.
>
> ```
> $ sudo ls /sys/kernel/debug/usb/xhci
> 0000:12:00.0 0000:26:00.3 0000:26:00.4
> # cat /sys/kernel/debug/usb/xhci/*/reg-ext-protocol*

problematic xhci:
capability for first four USB 2 ports
> EXTCAP_REVISION = 0x02000402
> EXTCAP_NAME = 0x20425355
> EXTCAP_PORTINFO = 0x00180401
> EXTCAP_PORTTYPE = 0x00000000

capability for one USB 3.1 port (5th port)
> EXTCAP_REVISION = 0x03100802
> EXTCAP_NAME = 0x20425355
> EXTCAP_PORTINFO = 0x10000105
> EXTCAP_PORTTYPE = 0x00000000
> EXTCAP_MANTISSA1 = 0x00050134
capability for one USB 3.1 port (6th port)
> EXTCAP_REVISION = 0x03100802
> EXTCAP_NAME = 0x20425355
> EXTCAP_PORTINFO = 0x10000106
> EXTCAP_PORTTYPE = 0x00000000
> EXTCAP_MANTISSA1 = 0x00050134
capability for one USB 3.1 port (7th port)
> EXTCAP_REVISION = 0x03100802
> EXTCAP_NAME = 0x20425355
> EXTCAP_PORTINFO = 0x10000107
> EXTCAP_PORTTYPE = 0x00000000
> EXTCAP_MANTISSA1 = 0x00050134
capability for one USB 3.1 port (8th port)
> EXTCAP_REVISION = 0x03100802
> EXTCAP_NAME = 0x20425355
> EXTCAP_PORTINFO = 0x10000108
> EXTCAP_PORTTYPE = 0x00000000
> EXTCAP_MANTISSA1 = 0x00050134

It has eight ports; the last four of them are USB 3.1 ports.
It has a very odd setup where each 3.1 port has its own
supported protocol capability with a custom PSI, but all the PSIs are the same,
telling that the port only supports a 5 Gbps speed.
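
As a rough aid for reading the register dump, the fields can be pulled apart as in the sketch below. The bit positions reflect my understanding of the xHCI supported protocol capability layout and should be checked against the specification; the `demo_*` names are made up for this example.

```c
#include <stdint.h>

/* EXTCAP_REVISION: capability header plus protocol revision (BCD). */
unsigned demo_cap_next(uint32_t rev) { return (rev >> 8) & 0xff; }   /* dwords to next capability */
unsigned demo_minor(uint32_t rev)    { return (rev >> 16) & 0xff; }
unsigned demo_major(uint32_t rev)    { return (rev >> 24) & 0xff; }

/* EXTCAP_PORTINFO: which root hub ports the capability covers. */
unsigned demo_port_offset(uint32_t info) { return info & 0xff; }        /* first port, 1-based */
unsigned demo_port_count(uint32_t info)  { return (info >> 8) & 0xff; }
unsigned demo_psi_count(uint32_t info)   { return (info >> 28) & 0x0f; } /* custom PSI entries */

/* Last port number covered by the capability. */
unsigned demo_last_port(uint32_t info)
{
    return demo_port_offset(info) + demo_port_count(info) - 1;
}
```

With these helpers, `0x00180401` reads as ports 1 to 4 with no custom PSI entries, and `0x10000105` reads as port 5 alone with one custom PSI entry, under a `0x03100802` header whose revision is 3.1.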

We leak all the custom PSI tables for USB 3.1 ports except the last one.
These would be the EXTCAP_MANTISSA1 = 0x00050134 entries, which match
the hex dump of the unreferenced object you posted earlier (considering byte order):

hex dump (first 8 bytes):
34 01 05 00 00 00 00 00 4.......
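
The match between the register value and the dump can be checked with a small sketch. The field positions are my reading of the PSI dword layout (value in bits 3:0, exponent in bits 5:4, mantissa in bits 31:16), and all `demo_*` names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

unsigned demo_psiv(uint32_t psi) { return psi & 0x0f; }           /* speed ID value */
unsigned demo_psie(uint32_t psi) { return (psi >> 4) & 0x03; }    /* 0 b/s, 1 Kb/s, 2 Mb/s, 3 Gb/s */
unsigned demo_psim(uint32_t psi) { return (psi >> 16) & 0xffff; } /* mantissa */

/* Speed in bits per second: mantissa times 1000 to the power of the exponent. */
uint64_t demo_bits_per_second(uint32_t psi)
{
    uint64_t rate = demo_psim(psi);

    for (unsigned e = demo_psie(psi); e > 0; e--)
        rate *= 1000;
    return rate;
}

/* Lay out a 32-bit value the way a little-endian CPU stores it in memory. */
void demo_store_le32(uint32_t v, unsigned char out[4])
{
    out[0] = v & 0xff;
    out[1] = (v >> 8) & 0xff;
    out[2] = (v >> 16) & 0xff;
    out[3] = (v >> 24) & 0xff;
}

/* Returns 1 if the first four bytes in memory equal those in the kmemleak dump. */
int demo_matches_dump(uint32_t psi)
{
    static const unsigned char dump[4] = { 0x34, 0x01, 0x05, 0x00 };
    unsigned char bytes[4];

    demo_store_le32(psi, bytes);
    return memcmp(bytes, dump, sizeof(dump)) == 0;
}
```

For `0x00050134` this gives speed ID 4, exponent 3 and mantissa 5, which is 5 Gb/s, and the little-endian byte sequence `34 01 05 00` seen at the start of the leaked object.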

I'm working on a patch for this.

-Mathias

2020-01-08 15:18:05

by Mathias Nyman

Subject: [RFT PATCH] xhci: Fix memory leak when caching protocol extended capability PSI tables

xhci driver assumed that xHC controllers have at most one custom
supported speed table (PSI) for all usb 3.x ports.
Memory was allocated for one PSI table under the xhci hub structure.

Turns out this is not the case, some controllers have a separate
"supported protocol capability" entry with a PSI table for each port.
This means each usb3 port can in theory support different custom speeds.

To solve this cache all supported protocol capabilities with their PSI
tables in an array, and add pointers to the xhci port structure so that
every port points to its capability entry in the array.

When creating the SuperSpeedPlus USB Device Capability BOS descriptor
for the xhci USB 3.1 roothub we for now will use only data from the
first USB 3.1 capable protocol capability entry in the array.
This could be improved later; this patch focuses on resolving
the memory leak.

Reported-by: Paul Menzel <[email protected]>
Reported-by: Sajja Venkateswara Rao <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
---
drivers/usb/host/xhci-hub.c | 25 +++++++++++-----
drivers/usb/host/xhci-mem.c | 60 +++++++++++++++++++++++--------------
drivers/usb/host/xhci.h | 14 +++++++--
3 files changed, 66 insertions(+), 33 deletions(-)

diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c
index 7a3a29e5e9d2..0974eebd28e7 100644
--- a/drivers/usb/host/xhci-hub.c
+++ b/drivers/usb/host/xhci-hub.c
@@ -55,6 +55,7 @@ static u8 usb_bos_descriptor [] = {
static int xhci_create_usb3_bos_desc(struct xhci_hcd *xhci, char *buf,
u16 wLength)
{
+ struct xhci_port_cap *port_cap;
int i, ssa_count;
u32 temp;
u16 desc_size, ssp_cap_size, ssa_size = 0;
@@ -64,16 +65,24 @@ static int xhci_create_usb3_bos_desc(struct xhci_hcd *xhci, char *buf,
ssp_cap_size = sizeof(usb_bos_descriptor) - desc_size;

/* does xhci support USB 3.1 Enhanced SuperSpeed */
- if (xhci->usb3_rhub.min_rev >= 0x01) {
+ for (i = 0; i < xhci->num_port_caps; i++) {
+ if (xhci->port_caps[i].maj_rev == 0x03 &&
+ xhci->port_caps[i].min_rev >= 0x01) {
+ usb3_1 = true;
+ port_cap = &xhci->port_caps[i];
+ break;
+ }
+ }
+
+ if (usb3_1) {
/* does xhci provide a PSI table for SSA speed attributes? */
- if (xhci->usb3_rhub.psi_count) {
+ if (port_cap->psi_count) {
/* two SSA entries for each unique PSI ID, RX and TX */
- ssa_count = xhci->usb3_rhub.psi_uid_count * 2;
+ ssa_count = port_cap->psi_uid_count * 2;
ssa_size = ssa_count * sizeof(u32);
ssp_cap_size -= 16; /* skip copying the default SSA */
}
desc_size += ssp_cap_size;
- usb3_1 = true;
}
memcpy(buf, &usb_bos_descriptor, min(desc_size, wLength));

@@ -99,7 +108,7 @@ static int xhci_create_usb3_bos_desc(struct xhci_hcd *xhci, char *buf,
}

/* If PSI table exists, add the custom speed attributes from it */
- if (usb3_1 && xhci->usb3_rhub.psi_count) {
+ if (usb3_1 && port_cap->psi_count) {
u32 ssp_cap_base, bm_attrib, psi, psi_mant, psi_exp;
int offset;

@@ -111,7 +120,7 @@ static int xhci_create_usb3_bos_desc(struct xhci_hcd *xhci, char *buf,

/* attribute count SSAC bits 4:0 and ID count SSIC bits 8:5 */
bm_attrib = (ssa_count - 1) & 0x1f;
- bm_attrib |= (xhci->usb3_rhub.psi_uid_count - 1) << 5;
+ bm_attrib |= (port_cap->psi_uid_count - 1) << 5;
put_unaligned_le32(bm_attrib, &buf[ssp_cap_base + 4]);

if (wLength < desc_size + ssa_size)
@@ -124,8 +133,8 @@ static int xhci_create_usb3_bos_desc(struct xhci_hcd *xhci, char *buf,
* USB 3.1 requires two SSA entries (RX and TX) for every link
*/
offset = desc_size;
- for (i = 0; i < xhci->usb3_rhub.psi_count; i++) {
- psi = xhci->usb3_rhub.psi[i];
+ for (i = 0; i < port_cap->psi_count; i++) {
+ psi = port_cap->psi[i];
psi &= ~USB_SSP_SUBLINK_SPEED_RSVD;
psi_exp = XHCI_EXT_PORT_PSIE(psi);
psi_mant = XHCI_EXT_PORT_PSIM(psi);
diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
index 3b1388fa2f36..cf4d27774a7d 100644
--- a/drivers/usb/host/xhci-mem.c
+++ b/drivers/usb/host/xhci-mem.c
@@ -1909,17 +1909,18 @@ void xhci_mem_cleanup(struct xhci_hcd *xhci)
xhci->usb3_rhub.num_ports = 0;
xhci->num_active_eps = 0;
kfree(xhci->usb2_rhub.ports);
- kfree(xhci->usb2_rhub.psi);
kfree(xhci->usb3_rhub.ports);
- kfree(xhci->usb3_rhub.psi);
kfree(xhci->hw_ports);
kfree(xhci->rh_bw);
kfree(xhci->ext_caps);
+ for (i = 0; i < xhci->num_port_caps; i++) {
+ kfree(xhci->port_caps[i].psi);
+ xhci->port_caps[i].psi = NULL;
+ }
+ kfree(xhci->port_caps);

xhci->usb2_rhub.ports = NULL;
- xhci->usb2_rhub.psi = NULL;
xhci->usb3_rhub.ports = NULL;
- xhci->usb3_rhub.psi = NULL;
xhci->hw_ports = NULL;
xhci->rh_bw = NULL;
xhci->ext_caps = NULL;
@@ -2120,6 +2121,7 @@ static void xhci_add_in_port(struct xhci_hcd *xhci, unsigned int num_ports,
u8 major_revision, minor_revision;
struct xhci_hub *rhub;
struct device *dev = xhci_to_hcd(xhci)->self.sysdev;
+ struct xhci_port_cap *port_cap;

temp = readl(addr);
major_revision = XHCI_EXT_PORT_MAJOR(temp);
@@ -2154,31 +2156,39 @@ static void xhci_add_in_port(struct xhci_hcd *xhci, unsigned int num_ports,
/* WTF? "Valid values are ‘1’ to MaxPorts" */
return;

- rhub->psi_count = XHCI_EXT_PORT_PSIC(temp);
- if (rhub->psi_count) {
- rhub->psi = kcalloc_node(rhub->psi_count, sizeof(*rhub->psi),
- GFP_KERNEL, dev_to_node(dev));
- if (!rhub->psi)
- rhub->psi_count = 0;
+ port_cap = &xhci->port_caps[xhci->num_port_caps++];
+ if (xhci->num_port_caps > max_caps)
+ return;
+
+ port_cap->maj_rev = major_revision;
+ port_cap->min_rev = minor_revision;
+ port_cap->psi_count = XHCI_EXT_PORT_PSIC(temp);
+
+ if (port_cap->psi_count) {
+ port_cap->psi = kcalloc_node(port_cap->psi_count,
+ sizeof(*port_cap->psi),
+ GFP_KERNEL, dev_to_node(dev));
+ if (!port_cap->psi)
+ port_cap->psi_count = 0;

- rhub->psi_uid_count++;
- for (i = 0; i < rhub->psi_count; i++) {
- rhub->psi[i] = readl(addr + 4 + i);
+ port_cap->psi_uid_count++;
+ for (i = 0; i < port_cap->psi_count; i++) {
+ port_cap->psi[i] = readl(addr + 4 + i);

/* count unique ID values, two consecutive entries can
* have the same ID if link is assymetric
*/
- if (i && (XHCI_EXT_PORT_PSIV(rhub->psi[i]) !=
- XHCI_EXT_PORT_PSIV(rhub->psi[i - 1])))
- rhub->psi_uid_count++;
+ if (i && (XHCI_EXT_PORT_PSIV(port_cap->psi[i]) !=
+ XHCI_EXT_PORT_PSIV(port_cap->psi[i - 1])))
+ port_cap->psi_uid_count++;

xhci_dbg(xhci, "PSIV:%d PSIE:%d PLT:%d PFD:%d LP:%d PSIM:%d\n",
- XHCI_EXT_PORT_PSIV(rhub->psi[i]),
- XHCI_EXT_PORT_PSIE(rhub->psi[i]),
- XHCI_EXT_PORT_PLT(rhub->psi[i]),
- XHCI_EXT_PORT_PFD(rhub->psi[i]),
- XHCI_EXT_PORT_LP(rhub->psi[i]),
- XHCI_EXT_PORT_PSIM(rhub->psi[i]));
+ XHCI_EXT_PORT_PSIV(port_cap->psi[i]),
+ XHCI_EXT_PORT_PSIE(port_cap->psi[i]),
+ XHCI_EXT_PORT_PLT(port_cap->psi[i]),
+ XHCI_EXT_PORT_PFD(port_cap->psi[i]),
+ XHCI_EXT_PORT_LP(port_cap->psi[i]),
+ XHCI_EXT_PORT_PSIM(port_cap->psi[i]));
}
}
/* cache usb2 port capabilities */
@@ -2213,6 +2223,7 @@ static void xhci_add_in_port(struct xhci_hcd *xhci, unsigned int num_ports,
continue;
}
hw_port->rhub = rhub;
+ hw_port->port_cap = port_cap;
rhub->num_ports++;
}
/* FIXME: Should we disable ports not in the Extended Capabilities? */
@@ -2303,6 +2314,11 @@ static int xhci_setup_port_arrays(struct xhci_hcd *xhci, gfp_t flags)
if (!xhci->ext_caps)
return -ENOMEM;

+ xhci->port_caps = kcalloc_node(cap_count, sizeof(*xhci->port_caps),
+ flags, dev_to_node(dev));
+ if (!xhci->port_caps)
+ return -ENOMEM;
+
offset = cap_start;

while (offset) {
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index 13d8838cd552..3ecee10fdcdc 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1702,12 +1702,20 @@ struct xhci_bus_state {
* Intel Lynx Point LP xHCI host.
*/
#define XHCI_MAX_REXIT_TIMEOUT_MS 20
+struct xhci_port_cap {
+ u32 *psi; /* array of protocol speed ID entries */
+ u8 psi_count;
+ u8 psi_uid_count;
+ u8 maj_rev;
+ u8 min_rev;
+};

struct xhci_port {
__le32 __iomem *addr;
int hw_portnum;
int hcd_portnum;
struct xhci_hub *rhub;
+ struct xhci_port_cap *port_cap;
};

struct xhci_hub {
@@ -1719,9 +1727,6 @@ struct xhci_hub {
/* supported prococol extended capabiliy values */
u8 maj_rev;
u8 min_rev;
- u32 *psi; /* array of protocol speed ID entries */
- u8 psi_count;
- u8 psi_uid_count;
};

/* There is one xhci_hcd structure per controller */
@@ -1880,6 +1885,9 @@ struct xhci_hcd {
/* cached usb2 extened protocol capabilites */
u32 *ext_caps;
unsigned int num_ext_caps;
+ /* cached extended protocol port capabilities */
+ struct xhci_port_cap *port_caps;
+ unsigned int num_port_caps;
/* Compliance Mode Recovery Data */
struct timer_list comp_mode_recovery_timer;
u32 port_status_u0;
--
2.17.1

2020-01-08 17:42:39

by Greg Kroah-Hartman

Subject: Re: [RFT PATCH] xhci: Fix memory leak when caching protocol extended capability PSI tables

On Wed, Jan 08, 2020 at 05:17:30PM +0200, Mathias Nyman wrote:
> xhci driver assumed that xHC controllers have at most one custom
> supported speed table (PSI) for all usb 3.x ports.
> Memory was allocated for one PSI table under the xhci hub structure.
>
> Turns out this is not the case, some controllers have a separate
> "supported protocol capability" entry with a PSI table for each port.
> This means each usb3 port can in theory support different custom speeds.

Is there a "max" number of port capabilities that can happen? Or is
this truly dynamic?

> + for (i = 0; i < xhci->num_port_caps; i++) {
> + kfree(xhci->port_caps[i].psi);
> + xhci->port_caps[i].psi = NULL;
> + }

Nit, no need to set to NULL here :)

thanks,

greg k-h

2020-01-08 17:46:20

by Mathias Nyman

Subject: Re: [RFT PATCH] xhci: Fix memory leak when caching protocol extended capability PSI tables

On 8.1.2020 17.40, Greg KH wrote:
> On Wed, Jan 08, 2020 at 05:17:30PM +0200, Mathias Nyman wrote:
>> xhci driver assumed that xHC controllers have at most one custom
>> supported speed table (PSI) for all usb 3.x ports.
>> Memory was allocated for one PSI table under the xhci hub structure.
>>
>> Turns out this is not the case, some controllers have a separate
>> "supported protocol capability" entry with a PSI table for each port.
>> This means each usb3 port can in theory support different custom speeds.
>
> Is there a "max" number of port capabilities that can happen? Or is
> this truly dynamic?

Almost truly dynamic: each capability points to the next, and the last points to 0.

But we can't have more "supported protocol capabilities" than xHC ports
(the MaxPorts value in the xHC HCSPARAMS1 register).
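
As a rough illustration of that walk, here is a minimal user-space sketch, not the driver code. It assumes the register space is modelled as a plain array of dwords, with the capability ID in bits 7:0 and the relative next pointer (in dwords) in bits 15:8 of each header. The names count_protocol_caps, regs, len, start and max_ports are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXT_CAP_ID(v)    ((v) & 0xff)         /* bits 7:0: capability ID */
#define EXT_CAP_NEXT(v)  (((v) >> 8) & 0xff)  /* bits 15:8: dwords to next entry, 0 = last */
#define EXT_CAP_PROTOCOL 2                    /* "supported protocol" capability ID */

/*
 * Count "supported protocol" entries in a simulated capability list.
 * regs/len model the register space (hypothetical, an in-memory array),
 * start is the index of the first capability header, and max_ports
 * caps the result, since there cannot be more entries than ports.
 */
unsigned int count_protocol_caps(const uint32_t *regs, size_t len,
				 size_t start, unsigned int max_ports)
{
	unsigned int count = 0;
	size_t offset = start;

	assert(regs != NULL);

	while (offset < len && count < max_ports) {
		uint32_t header = regs[offset];

		if (EXT_CAP_ID(header) == EXT_CAP_PROTOCOL)
			count++;
		if (EXT_CAP_NEXT(header) == 0)
			break;	/* the last entry points to 0 */
		offset += EXT_CAP_NEXT(header);
	}
	return count;
}
```

The list length is only known after walking it, which is why an upper bound such as MaxPorts is useful for sizing an array up front.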

>
>> + for (i = 0; i < xhci->num_port_caps; i++) {
>> + kfree(xhci->port_caps[i].psi);
>> + xhci->port_caps[i].psi = NULL;
>> + }
>
> Nit, no need to set to NULL here :)

Thanks, will remove that

-Mathias

2020-01-09 10:12:12

by Felipe Balbi

Subject: Re: BUG: KASAN: use-after-free in xhci_trb_virt_to_dma.part.24+0x1c/0x80


Hi,

Mika Westerberg <[email protected]> writes:

> On Thu, Jan 02, 2020 at 03:10:14PM +0100, Paul Menzel wrote:
>> Mika, as you fixed the other leak, any idea, how to continue from the
>> kmemleak log below?
>>
>> ```
>> unreferenced object 0xffff8c207a1e1408 (size 8):
>> comm "systemd-udevd", pid 183, jiffies 4294667978 (age 752.292s)
>> hex dump (first 8 bytes):
>> 34 01 05 00 00 00 00 00 4.......
>> backtrace:
>> [<00000000aea7b46d>] xhci_mem_init+0xcfa/0xec0 [xhci_hcd]
>
> There are probably better ways for doing this but you can use objdump
> for example:
>
> $ objdump -l --prefix-addresses -j .text --disassemble=xhci_mem_init drivers/usb/host/xhci-hcd.ko
>
> then find the offset xhci_mem_init+0xcfa. It should show you the line
> numbers as well if you have compiled your kernel with debug info. This
> should be close to the line that allocated the memory that was leaked.

addr2line helps here. So does gdb (run "gdb vmlinux", then "l *(xhci_mem_init+0xcfa)").

--
balbi



2020-02-11 11:44:14

by Marek Szyprowski

Subject: Re: [RFT PATCH] xhci: Fix memory leak when caching protocol extended capability PSI tables

Hi

On 08.01.2020 16:17, Mathias Nyman wrote:
> xhci driver assumed that xHC controllers have at most one custom
> supported speed table (PSI) for all usb 3.x ports.
> Memory was allocated for one PSI table under the xhci hub structure.
>
> Turns out this is not the case, some controllers have a separate
> "supported protocol capability" entry with a PSI table for each port.
> This means each usb3 port can in theory support different custom speeds.
>
> To solve this cache all supported protocol capabilities with their PSI
> tables in an array, and add pointers to the xhci port structure so that
> every port points to its capability entry in the array.
>
> When creating the SuperSpeedPlus USB Device Capability BOS descriptor
> for the xhci USB 3.1 roothub we for now will use only data from the
> first USB 3.1 capable protocol capability entry in the array.
> This could be improved later, this patch focuses resolving
> the memory leak.
>
> Reported-by: Paul Menzel <[email protected]>
> Reported-by: Sajja Venkateswara Rao <[email protected]>
> Signed-off-by: Mathias Nyman <[email protected]>

This patch landed in today's linux-next (20200211) and causes NULL
pointer dereference during second suspend/resume cycle on Samsung
Exynos5422-based (arm 32bit) Odroid XU3lite board:

# time rtcwake -s10 -mmem
rtcwake: wakeup from "mem" using /dev/rtc0 at Tue Feb 11 10:51:43 2020
PM: suspend entry (deep)
Filesystems sync: 0.012 seconds
Freezing user space processes ... (elapsed 0.010 seconds) done.
OOM killer disabled.
Freezing remaining freezable tasks ... (elapsed 0.002 seconds) done.
smsc95xx 1-1.1:1.0 eth0: entering SUSPEND2 mode
wake enabled for irq 153
wake enabled for irq 158
samsung-pinctrl 13400000.pinctrl: Setting external wakeup interrupt
mask: 0xffffffe7
Disabling non-boot CPUs ...
IRQ 51: no longer affine to CPU1
IRQ 52: no longer affine to CPU2
s3c2410-wdt 101d0000.watchdog: watchdog disabled
wake disabled for irq 158
usb usb1: root hub lost power or was reset
usb usb2: root hub lost power or was reset
wake disabled for irq 153
exynos-tmu 10060000.tmu: More trip points than supported by this TMU.
exynos-tmu 10060000.tmu: 2 trip points should be configured in polling mode.
exynos-tmu 10064000.tmu: More trip points than supported by this TMU.
exynos-tmu 10064000.tmu: 2 trip points should be configured in polling mode.
exynos-tmu 10068000.tmu: More trip points than supported by this TMU.
exynos-tmu 10068000.tmu: 2 trip points should be configured in polling mode.
exynos-tmu 1006c000.tmu: More trip points than supported by this TMU.
exynos-tmu 1006c000.tmu: 2 trip points should be configured in polling mode.
exynos-tmu 100a0000.tmu: More trip points than supported by this TMU.
exynos-tmu 100a0000.tmu: 6 trip points should be configured in polling mode.
usb usb3: root hub lost power or was reset
s3c-rtc 101e0000.rtc: rtc disabled, re-enabling
usb usb4: root hub lost power or was reset
xhci-hcd xhci-hcd.8.auto: No ports on the roothubs?
PM: dpm_run_callback(): platform_pm_resume+0x0/0x44 returns -12
PM: Device xhci-hcd.8.auto failed to resume async: error -12
hub 3-0:1.0: hub_ext_port_status failed (err = -32)
hub 4-0:1.0: hub_ext_port_status failed (err = -32)
usb 1-1: reset high-speed USB device number 2 using exynos-ehci
usb 1-1.1: reset high-speed USB device number 3 using exynos-ehci
OOM killer enabled.
Restarting tasks ... done.

real    0m11.890s
user    0m0.001s
sys     0m0.679s
root@target:~# PM: suspend exit
mmc_host mmc0: Bus speed (slot 0) = 50000000Hz (slot req 400000Hz,
actual 396825HZ div = 63)
mmc_host mmc0: Bus speed (slot 0) = 200000000Hz (slot req 200000000Hz,
actual 200000000HZ div = 0)
mmc_host mmc0: Bus speed (slot 0) = 50000000Hz (slot req 52000000Hz,
actual 50000000HZ div = 0)
mmc_host mmc0: Bus speed (slot 0) = 400000000Hz (slot req 200000000Hz,
actual 200000000HZ div = 1)
smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xC1E1

root@target:~#
root@target:~# time rtcwake -s10 -mmem[   35.451572] vdd_ldo12: disabling

rtcwake: wakeup from "mem" using /dev/rtc0 at Tue Feb 11 10:52:02 2020
PM: suspend entry (deep)
Filesystems sync: 0.004 seconds
Freezing user space processes ... (elapsed 0.006 seconds) done.
OOM killer disabled.
Freezing remaining freezable tasks ... (elapsed 0.070 seconds) done.
hub 4-0:1.0: hub_ext_port_status failed (err = -32)
hub 3-0:1.0: hub_ext_port_status failed (err = -32)
8<--- cut here ---
Unable to handle kernel NULL pointer dereference at virtual address 00000014
pgd = 4c26b54b
[00000014] *pgd=00000000
Internal error: Oops: 17 [#1] PREEMPT SMP ARM
Modules linked in:
CPU: 3 PID: 1468 Comm: kworker/u16:23 Not tainted
5.6.0-rc1-next-20200211 #268
Hardware name: Samsung Exynos (Flattened Device Tree)
Workqueue: events_unbound async_run_entry_fn
PC is at xhci_suspend+0x12c/0x520
LR is at 0xa6aa9898
pc : [<c0724c90>]    lr : [<a6aa9898>]    psr: 60000093
sp : ec401df8  ip : 0000001a  fp : c12e7864
r10: 00000000  r9 : ecfb87b0  r8 : ecfb8220
r7 : 00000000  r6 : 00000000  r5 : 00000004  r4 : ecfb81f0
r3 : 00007d00  r2 : 00000001  r1 : 00000001  r0 : 00000000
Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 10c5387d  Table: 6bd4006a  DAC: 00000051
Process kworker/u16:23 (pid: 1468, stack limit = 0x6e4b6fba)
Stack: (0xec401df8 to 0xec402000)
...
[<c0724c90>] (xhci_suspend) from [<c061b4f4>] (dpm_run_callback+0xb4/0x3fc)
[<c061b4f4>] (dpm_run_callback) from [<c061bd5c>]
(__device_suspend+0x134/0x7e8)
[<c061bd5c>] (__device_suspend) from [<c061c42c>] (async_suspend+0x1c/0x94)
[<c061c42c>] (async_suspend) from [<c0154bd0>]
(async_run_entry_fn+0x48/0x1b8)
[<c0154bd0>] (async_run_entry_fn) from [<c0149b38>]
(process_one_work+0x230/0x7bc)
[<c0149b38>] (process_one_work) from [<c014a108>] (worker_thread+0x44/0x524)
[<c014a108>] (worker_thread) from [<c01511fc>] (kthread+0x130/0x164)
[<c01511fc>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20)
Exception stack(0xec401fb0 to 0xec401ff8)
...
---[ end trace c72caf6487666442 ]---
note: kworker/u16:23[1468] exited with preempt_count 1

Reverting it fixes the NULL pointer issue. I can provide more
information or do some other tests. Just let me know what will help to
fix it.

> ...

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland

2020-02-11 12:47:17

by Greg Kroah-Hartman

Subject: Re: [RFT PATCH] xhci: Fix memory leak when caching protocol extended capability PSI tables

On Tue, Feb 11, 2020 at 11:56:12AM +0100, Marek Szyprowski wrote:
> Hi
>
> On 08.01.2020 16:17, Mathias Nyman wrote:
> > xhci driver assumed that xHC controllers have at most one custom
> > supported speed table (PSI) for all usb 3.x ports.
> > Memory was allocated for one PSI table under the xhci hub structure.
> >
> > Turns out this is not the case, some controllers have a separate
> > "supported protocol capability" entry with a PSI table for each port.
> > This means each usb3 port can in theory support different custom speeds.
> >
> > To solve this cache all supported protocol capabilities with their PSI
> > tables in an array, and add pointers to the xhci port structure so that
> > every port points to its capability entry in the array.
> >
> > When creating the SuperSpeedPlus USB Device Capability BOS descriptor
> > for the xhci USB 3.1 roothub we for now will use only data from the
> > first USB 3.1 capable protocol capability entry in the array.
> > This could be improved later, this patch focuses resolving
> > the memory leak.
> >
> > Reported-by: Paul Menzel <[email protected]>
> > Reported-by: Sajja Venkateswara Rao <[email protected]>
> > Signed-off-by: Mathias Nyman <[email protected]>
>
> This patch landed in today's linux-next (20200211) and causes NULL
> pointer dereference during second suspend/resume cycle on Samsung
> Exynos5422-based (arm 32bit) Odroid XU3lite board:
>
> # time rtcwake -s10 -mmem
> rtcwake: wakeup from "mem" using /dev/rtc0 at Tue Feb 11 10:51:43 2020
> PM: suspend entry (deep)
> Filesystems sync: 0.012 seconds
> Freezing user space processes ... (elapsed 0.010 seconds) done.
> OOM killer disabled.
> Freezing remaining freezable tasks ... (elapsed 0.002 seconds) done.
> smsc95xx 1-1.1:1.0 eth0: entering SUSPEND2 mode
> wake enabled for irq 153
> wake enabled for irq 158
> samsung-pinctrl 13400000.pinctrl: Setting external wakeup interrupt
> mask: 0xffffffe7
> Disabling non-boot CPUs ...
> IRQ 51: no longer affine to CPU1
> IRQ 52: no longer affine to CPU2
> s3c2410-wdt 101d0000.watchdog: watchdog disabled
> wake disabled for irq 158
> usb usb1: root hub lost power or was reset
> usb usb2: root hub lost power or was reset
> wake disabled for irq 153
> exynos-tmu 10060000.tmu: More trip points than supported by this TMU.
> exynos-tmu 10060000.tmu: 2 trip points should be configured in polling mode.
> exynos-tmu 10064000.tmu: More trip points than supported by this TMU.
> exynos-tmu 10064000.tmu: 2 trip points should be configured in polling mode.
> exynos-tmu 10068000.tmu: More trip points than supported by this TMU.
> exynos-tmu 10068000.tmu: 2 trip points should be configured in polling mode.
> exynos-tmu 1006c000.tmu: More trip points than supported by this TMU.
> exynos-tmu 1006c000.tmu: 2 trip points should be configured in polling mode.
> exynos-tmu 100a0000.tmu: More trip points than supported by this TMU.
> exynos-tmu 100a0000.tmu: 6 trip points should be configured in polling mode.
> usb usb3: root hub lost power or was reset
> s3c-rtc 101e0000.rtc: rtc disabled, re-enabling
> usb usb4: root hub lost power or was reset
> xhci-hcd xhci-hcd.8.auto: No ports on the roothubs?
> PM: dpm_run_callback(): platform_pm_resume+0x0/0x44 returns -12
> PM: Device xhci-hcd.8.auto failed to resume async: error -12
> hub 3-0:1.0: hub_ext_port_status failed (err = -32)
> hub 4-0:1.0: hub_ext_port_status failed (err = -32)
> usb 1-1: reset high-speed USB device number 2 using exynos-ehci
> usb 1-1.1: reset high-speed USB device number 3 using exynos-ehci
> OOM killer enabled.
> Restarting tasks ... done.
>
> real    0m11.890s
> user    0m0.001s
> sys     0m0.679s
> root@target:~# PM: suspend exit
> mmc_host mmc0: Bus speed (slot 0) = 50000000Hz (slot req 400000Hz,
> actual 396825HZ div = 63)
> mmc_host mmc0: Bus speed (slot 0) = 200000000Hz (slot req 200000000Hz,
> actual 200000000HZ div = 0)
> mmc_host mmc0: Bus speed (slot 0) = 50000000Hz (slot req 52000000Hz,
> actual 50000000HZ div = 0)
> mmc_host mmc0: Bus speed (slot 0) = 400000000Hz (slot req 200000000Hz,
> actual 200000000HZ div = 1)
> smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xC1E1
>
> root@target:~#
> root@target:~# time rtcwake -s10 -mmem[   35.451572] vdd_ldo12: disabling
>
> rtcwake: wakeup from "mem" using /dev/rtc0 at Tue Feb 11 10:52:02 2020
> PM: suspend entry (deep)
> Filesystems sync: 0.004 seconds
> Freezing user space processes ... (elapsed 0.006 seconds) done.
> OOM killer disabled.
> Freezing remaining freezable tasks ... (elapsed 0.070 seconds) done.
> hub 4-0:1.0: hub_ext_port_status failed (err = -32)
> hub 3-0:1.0: hub_ext_port_status failed (err = -32)
> 8<--- cut here ---
> Unable to handle kernel NULL pointer dereference at virtual address 00000014
> pgd = 4c26b54b
> [00000014] *pgd=00000000
> Internal error: Oops: 17 [#1] PREEMPT SMP ARM
> Modules linked in:
> CPU: 3 PID: 1468 Comm: kworker/u16:23 Not tainted
> 5.6.0-rc1-next-20200211 #268
> Hardware name: Samsung Exynos (Flattened Device Tree)
> Workqueue: events_unbound async_run_entry_fn
> PC is at xhci_suspend+0x12c/0x520
> LR is at 0xa6aa9898
> pc : [<c0724c90>]    lr : [<a6aa9898>]    psr: 60000093
> sp : ec401df8  ip : 0000001a  fp : c12e7864
> r10: 00000000  r9 : ecfb87b0  r8 : ecfb8220
> r7 : 00000000  r6 : 00000000  r5 : 00000004  r4 : ecfb81f0
> r3 : 00007d00  r2 : 00000001  r1 : 00000001  r0 : 00000000
> Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
> Control: 10c5387d  Table: 6bd4006a  DAC: 00000051
> Process kworker/u16:23 (pid: 1468, stack limit = 0x6e4b6fba)
> Stack: (0xec401df8 to 0xec402000)
> ...
> [<c0724c90>] (xhci_suspend) from [<c061b4f4>] (dpm_run_callback+0xb4/0x3fc)
> [<c061b4f4>] (dpm_run_callback) from [<c061bd5c>]
> (__device_suspend+0x134/0x7e8)
> [<c061bd5c>] (__device_suspend) from [<c061c42c>] (async_suspend+0x1c/0x94)
> [<c061c42c>] (async_suspend) from [<c0154bd0>]
> (async_run_entry_fn+0x48/0x1b8)
> [<c0154bd0>] (async_run_entry_fn) from [<c0149b38>]
> (process_one_work+0x230/0x7bc)
> [<c0149b38>] (process_one_work) from [<c014a108>] (worker_thread+0x44/0x524)
> [<c014a108>] (worker_thread) from [<c01511fc>] (kthread+0x130/0x164)
> [<c01511fc>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20)
> Exception stack(0xec401fb0 to 0xec401ff8)
> ...
> ---[ end trace c72caf6487666442 ]---
> note: kworker/u16:23[1468] exited with preempt_count 1
>
> Reverting it fixes the NULL pointer issue. I can provide more
> information or do some other tests. Just let me know what will help to
> fix it.
>
> > ...

Ugh. Mathias, should I just revert this for now?

thanks,

greg k-h

2020-02-11 12:48:10

by Mathias Nyman

Subject: Re: [RFT PATCH] xhci: Fix memory leak when caching protocol extended capability PSI tables

On 11.2.2020 14.23, Greg KH wrote:
> On Tue, Feb 11, 2020 at 11:56:12AM +0100, Marek Szyprowski wrote:
>> Hi
>>
>> On 08.01.2020 16:17, Mathias Nyman wrote:
>>> xhci driver assumed that xHC controllers have at most one custom
>>> supported speed table (PSI) for all usb 3.x ports.
>>> Memory was allocated for one PSI table under the xhci hub structure.
>>>
>>> Turns out this is not the case, some controllers have a separate
>>> "supported protocol capability" entry with a PSI table for each port.
>>> This means each usb3 port can in theory support different custom speeds.
>>>
>>> To solve this cache all supported protocol capabilities with their PSI
>>> tables in an array, and add pointers to the xhci port structure so that
>>> every port points to its capability entry in the array.
>>>
>>> When creating the SuperSpeedPlus USB Device Capability BOS descriptor
>>> for the xhci USB 3.1 roothub we for now will use only data from the
>>> first USB 3.1 capable protocol capability entry in the array.
>>> This could be improved later, this patch focuses resolving
>>> the memory leak.
>>>
>>> Reported-by: Paul Menzel <[email protected]>
>>> Reported-by: Sajja Venkateswara Rao <[email protected]>
>>> Signed-off-by: Mathias Nyman <[email protected]>
>>
>> This patch landed in today's linux-next (20200211) and causes NULL
>> pointer dereference during second suspend/resume cycle on Samsung
>> Exynos5422-based (arm 32bit) Odroid XU3lite board:
>>
>> # time rtcwake -s10 -mmem
>> rtcwake: wakeup from "mem" using /dev/rtc0 at Tue Feb 11 10:51:43 2020
>> PM: suspend entry (deep)
>> Filesystems sync: 0.012 seconds
>> Freezing user space processes ... (elapsed 0.010 seconds) done.
>> OOM killer disabled.
>> Freezing remaining freezable tasks ... (elapsed 0.002 seconds) done.
>> smsc95xx 1-1.1:1.0 eth0: entering SUSPEND2 mode
>> wake enabled for irq 153
>> wake enabled for irq 158
>> samsung-pinctrl 13400000.pinctrl: Setting external wakeup interrupt
>> mask: 0xffffffe7
>> Disabling non-boot CPUs ...
>> IRQ 51: no longer affine to CPU1
>> IRQ 52: no longer affine to CPU2
>> s3c2410-wdt 101d0000.watchdog: watchdog disabled
>> wake disabled for irq 158
>> usb usb1: root hub lost power or was reset
>> usb usb2: root hub lost power or was reset
>> wake disabled for irq 153
>> exynos-tmu 10060000.tmu: More trip points than supported by this TMU.
>> exynos-tmu 10060000.tmu: 2 trip points should be configured in polling mode.
>> exynos-tmu 10064000.tmu: More trip points than supported by this TMU.
>> exynos-tmu 10064000.tmu: 2 trip points should be configured in polling mode.
>> exynos-tmu 10068000.tmu: More trip points than supported by this TMU.
>> exynos-tmu 10068000.tmu: 2 trip points should be configured in polling mode.
>> exynos-tmu 1006c000.tmu: More trip points than supported by this TMU.
>> exynos-tmu 1006c000.tmu: 2 trip points should be configured in polling mode.
>> exynos-tmu 100a0000.tmu: More trip points than supported by this TMU.
>> exynos-tmu 100a0000.tmu: 6 trip points should be configured in polling mode.
>> usb usb3: root hub lost power or was reset
>> s3c-rtc 101e0000.rtc: rtc disabled, re-enabling
>> usb usb4: root hub lost power or was reset
>> xhci-hcd xhci-hcd.8.auto: No ports on the roothubs?
>> PM: dpm_run_callback(): platform_pm_resume+0x0/0x44 returns -12
>> PM: Device xhci-hcd.8.auto failed to resume async: error -12
>> hub 3-0:1.0: hub_ext_port_status failed (err = -32)
>> hub 4-0:1.0: hub_ext_port_status failed (err = -32)
>> usb 1-1: reset high-speed USB device number 2 using exynos-ehci
>> usb 1-1.1: reset high-speed USB device number 3 using exynos-ehci
>> OOM killer enabled.
>> Restarting tasks ... done.
>>
>> real    0m11.890s
>> user    0m0.001s
>> sys     0m0.679s
>> root@target:~# PM: suspend exit
>> mmc_host mmc0: Bus speed (slot 0) = 50000000Hz (slot req 400000Hz,
>> actual 396825HZ div = 63)
>> mmc_host mmc0: Bus speed (slot 0) = 200000000Hz (slot req 200000000Hz,
>> actual 200000000HZ div = 0)
>> mmc_host mmc0: Bus speed (slot 0) = 50000000Hz (slot req 52000000Hz,
>> actual 50000000HZ div = 0)
>> mmc_host mmc0: Bus speed (slot 0) = 400000000Hz (slot req 200000000Hz,
>> actual 200000000HZ div = 1)
>> smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xC1E1
>>
>> root@target:~#
>> root@target:~# time rtcwake -s10 -mmem[   35.451572] vdd_ldo12: disabling
>>
>> rtcwake: wakeup from "mem" using /dev/rtc0 at Tue Feb 11 10:52:02 2020
>> PM: suspend entry (deep)
>> Filesystems sync: 0.004 seconds
>> Freezing user space processes ... (elapsed 0.006 seconds) done.
>> OOM killer disabled.
>> Freezing remaining freezable tasks ... (elapsed 0.070 seconds) done.
>> hub 4-0:1.0: hub_ext_port_status failed (err = -32)
>> hub 3-0:1.0: hub_ext_port_status failed (err = -32)
>> 8<--- cut here ---
>> Unable to handle kernel NULL pointer dereference at virtual address 00000014
>> pgd = 4c26b54b
>> [00000014] *pgd=00000000
>> Internal error: Oops: 17 [#1] PREEMPT SMP ARM
>> Modules linked in:
>> CPU: 3 PID: 1468 Comm: kworker/u16:23 Not tainted
>> 5.6.0-rc1-next-20200211 #268
>> Hardware name: Samsung Exynos (Flattened Device Tree)
>> Workqueue: events_unbound async_run_entry_fn
>> PC is at xhci_suspend+0x12c/0x520
>> LR is at 0xa6aa9898
>> pc : [<c0724c90>]    lr : [<a6aa9898>]    psr: 60000093
>> sp : ec401df8  ip : 0000001a  fp : c12e7864
>> r10: 00000000  r9 : ecfb87b0  r8 : ecfb8220
>> r7 : 00000000  r6 : 00000000  r5 : 00000004  r4 : ecfb81f0
>> r3 : 00007d00  r2 : 00000001  r1 : 00000001  r0 : 00000000
>> Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
>> Control: 10c5387d  Table: 6bd4006a  DAC: 00000051
>> Process kworker/u16:23 (pid: 1468, stack limit = 0x6e4b6fba)
>> Stack: (0xec401df8 to 0xec402000)
>> ...
>> [<c0724c90>] (xhci_suspend) from [<c061b4f4>] (dpm_run_callback+0xb4/0x3fc)
>> [<c061b4f4>] (dpm_run_callback) from [<c061bd5c>]
>> (__device_suspend+0x134/0x7e8)
>> [<c061bd5c>] (__device_suspend) from [<c061c42c>] (async_suspend+0x1c/0x94)
>> [<c061c42c>] (async_suspend) from [<c0154bd0>]
>> (async_run_entry_fn+0x48/0x1b8)
>> [<c0154bd0>] (async_run_entry_fn) from [<c0149b38>]
>> (process_one_work+0x230/0x7bc)
>> [<c0149b38>] (process_one_work) from [<c014a108>] (worker_thread+0x44/0x524)
>> [<c014a108>] (worker_thread) from [<c01511fc>] (kthread+0x130/0x164)
>> [<c01511fc>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20)
>> Exception stack(0xec401fb0 to 0xec401ff8)
>> ...
>> ---[ end trace c72caf6487666442 ]---
>> note: kworker/u16:23[1468] exited with preempt_count 1
>>
>> Reverting it fixes the NULL pointer issue. I can provide more
>> information or do some other tests. Just let me know what will help to
>> fix it.
>>
>> > ...
>
> Ugh. Mathias, should I just revert this for now?
>

Yes, revert it.

This looks very odd: after the second resume, and after losing power, the
driver can't find any port at all.

Marek, do you still get the "xhci-hcd xhci-hcd.8.auto: No ports on the roothubs?"
message on second resume after reverting the patch?

-Mathias

2020-02-11 14:31:52

by Mathias Nyman

Subject: Re: [RFT PATCH] xhci: Fix memory leak when caching protocol extended capability PSI tables

On 11.2.2020 14.29, Mathias Nyman wrote:
> On 11.2.2020 14.23, Greg KH wrote:
>> On Tue, Feb 11, 2020 at 11:56:12AM +0100, Marek Szyprowski wrote:
>>> Hi
>>>
>>> On 08.01.2020 16:17, Mathias Nyman wrote:
>>>> xhci driver assumed that xHC controllers have at most one custom
>>>> supported speed table (PSI) for all usb 3.x ports.
>>>> Memory was allocated for one PSI table under the xhci hub structure.
>>>>
>>>> Turns out this is not the case, some controllers have a separate
>>>> "supported protocol capability" entry with a PSI table for each port.
>>>> This means each usb3 port can in theory support different custom speeds.
>>>>
>>>> To solve this cache all supported protocol capabilities with their PSI
>>>> tables in an array, and add pointers to the xhci port structure so that
>>>> every port points to its capability entry in the array.
>>>>
>>>> When creating the SuperSpeedPlus USB Device Capability BOS descriptor
>>>> for the xhci USB 3.1 roothub we for now will use only data from the
>>>> first USB 3.1 capable protocol capability entry in the array.
>>>> This could be improved later, this patch focuses resolving
>>>> the memory leak.
>>>>
>>>> Reported-by: Paul Menzel <[email protected]>
>>>> Reported-by: Sajja Venkateswara Rao <[email protected]>
>>>> Signed-off-by: Mathias Nyman <[email protected]>
>>>
>>> This patch landed in today's linux-next (20200211) and causes NULL
>>> pointer dereference during second suspend/resume cycle on Samsung
>>> Exynos5422-based (arm 32bit) Odroid XU3lite board:
>>>
>>> # time rtcwake -s10 -mmem
>>> rtcwake: wakeup from "mem" using /dev/rtc0 at Tue Feb 11 10:51:43 2020
>>> PM: suspend entry (deep)
>>> Filesystems sync: 0.012 seconds
>>> Freezing user space processes ... (elapsed 0.010 seconds) done.
>>> OOM killer disabled.
>>> Freezing remaining freezable tasks ... (elapsed 0.002 seconds) done.
>>> smsc95xx 1-1.1:1.0 eth0: entering SUSPEND2 mode
>>> wake enabled for irq 153
>>> wake enabled for irq 158
>>> samsung-pinctrl 13400000.pinctrl: Setting external wakeup interrupt
>>> mask: 0xffffffe7
>>> Disabling non-boot CPUs ...
>>> IRQ 51: no longer affine to CPU1
>>> IRQ 52: no longer affine to CPU2
>>> s3c2410-wdt 101d0000.watchdog: watchdog disabled
>>> wake disabled for irq 158
>>> usb usb1: root hub lost power or was reset
>>> usb usb2: root hub lost power or was reset
>>> wake disabled for irq 153
>>> exynos-tmu 10060000.tmu: More trip points than supported by this TMU.
>>> exynos-tmu 10060000.tmu: 2 trip points should be configured in polling mode.
>>> exynos-tmu 10064000.tmu: More trip points than supported by this TMU.
>>> exynos-tmu 10064000.tmu: 2 trip points should be configured in polling mode.
>>> exynos-tmu 10068000.tmu: More trip points than supported by this TMU.
>>> exynos-tmu 10068000.tmu: 2 trip points should be configured in polling mode.
>>> exynos-tmu 1006c000.tmu: More trip points than supported by this TMU.
>>> exynos-tmu 1006c000.tmu: 2 trip points should be configured in polling mode.
>>> exynos-tmu 100a0000.tmu: More trip points than supported by this TMU.
>>> exynos-tmu 100a0000.tmu: 6 trip points should be configured in polling mode.
>>> usb usb3: root hub lost power or was reset
>>> s3c-rtc 101e0000.rtc: rtc disabled, re-enabling
>>> usb usb4: root hub lost power or was reset
>>> xhci-hcd xhci-hcd.8.auto: No ports on the roothubs?
>>> PM: dpm_run_callback(): platform_pm_resume+0x0/0x44 returns -12
>>> PM: Device xhci-hcd.8.auto failed to resume async: error -12
>>> hub 3-0:1.0: hub_ext_port_status failed (err = -32)
>>> hub 4-0:1.0: hub_ext_port_status failed (err = -32)
>>> usb 1-1: reset high-speed USB device number 2 using exynos-ehci
>>> usb 1-1.1: reset high-speed USB device number 3 using exynos-ehci
>>> OOM killer enabled.
>>> Restarting tasks ... done.
>>>
>>> real    0m11.890s
>>> user    0m0.001s
>>> sys     0m0.679s
>>> root@target:~# PM: suspend exit
>>> mmc_host mmc0: Bus speed (slot 0) = 50000000Hz (slot req 400000Hz,
>>> actual 396825HZ div = 63)
>>> mmc_host mmc0: Bus speed (slot 0) = 200000000Hz (slot req 200000000Hz,
>>> actual 200000000HZ div = 0)
>>> mmc_host mmc0: Bus speed (slot 0) = 50000000Hz (slot req 52000000Hz,
>>> actual 50000000HZ div = 0)
>>> mmc_host mmc0: Bus speed (slot 0) = 400000000Hz (slot req 200000000Hz,
>>> actual 200000000HZ div = 1)
>>> smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xC1E1
>>>
>>> root@target:~#
>>> root@target:~# time rtcwake -s10 -mmem[   35.451572] vdd_ldo12: disabling
>>>
>>> rtcwake: wakeup from "mem" using /dev/rtc0 at Tue Feb 11 10:52:02 2020
>>> PM: suspend entry (deep)
>>> Filesystems sync: 0.004 seconds
>>> Freezing user space processes ... (elapsed 0.006 seconds) done.
>>> OOM killer disabled.
>>> Freezing remaining freezable tasks ... (elapsed 0.070 seconds) done.
>>> hub 4-0:1.0: hub_ext_port_status failed (err = -32)
>>> hub 3-0:1.0: hub_ext_port_status failed (err = -32)
>>> 8<--- cut here ---
>>> Unable to handle kernel NULL pointer dereference at virtual address 00000014
>>> pgd = 4c26b54b
>>> [00000014] *pgd=00000000
>>> Internal error: Oops: 17 [#1] PREEMPT SMP ARM
>>> Modules linked in:
>>> CPU: 3 PID: 1468 Comm: kworker/u16:23 Not tainted
>>> 5.6.0-rc1-next-20200211 #268
>>> Hardware name: Samsung Exynos (Flattened Device Tree)
>>> Workqueue: events_unbound async_run_entry_fn
>>> PC is at xhci_suspend+0x12c/0x520
>>> LR is at 0xa6aa9898
>>> pc : [<c0724c90>]    lr : [<a6aa9898>]    psr: 60000093
>>> sp : ec401df8  ip : 0000001a  fp : c12e7864
>>> r10: 00000000  r9 : ecfb87b0  r8 : ecfb8220
>>> r7 : 00000000  r6 : 00000000  r5 : 00000004  r4 : ecfb81f0
>>> r3 : 00007d00  r2 : 00000001  r1 : 00000001  r0 : 00000000
>>> Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
>>> Control: 10c5387d  Table: 6bd4006a  DAC: 00000051
>>> Process kworker/u16:23 (pid: 1468, stack limit = 0x6e4b6fba)
>>> Stack: (0xec401df8 to 0xec402000)
>>> ...
>>> [<c0724c90>] (xhci_suspend) from [<c061b4f4>] (dpm_run_callback+0xb4/0x3fc)
>>> [<c061b4f4>] (dpm_run_callback) from [<c061bd5c>]
>>> (__device_suspend+0x134/0x7e8)
>>> [<c061bd5c>] (__device_suspend) from [<c061c42c>] (async_suspend+0x1c/0x94)
>>> [<c061c42c>] (async_suspend) from [<c0154bd0>]
>>> (async_run_entry_fn+0x48/0x1b8)
>>> [<c0154bd0>] (async_run_entry_fn) from [<c0149b38>]
>>> (process_one_work+0x230/0x7bc)
>>> [<c0149b38>] (process_one_work) from [<c014a108>] (worker_thread+0x44/0x524)
>>> [<c014a108>] (worker_thread) from [<c01511fc>] (kthread+0x130/0x164)
>>> [<c01511fc>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20)
>>> Exception stack(0xec401fb0 to 0xec401ff8)
>>> ...
>>> ---[ end trace c72caf6487666442 ]---
>>> note: kworker/u16:23[1468] exited with preempt_count 1
>>>
>>> Reverting it fixes the NULL pointer issue. I can provide more
>>> information or do some other tests. Just let me know what will help to
>>> fix it.
>>>
>>> > ...
>>
>> Ugh. Mathias, should I just revert this for now?
>>
>
> Yes, revert it.
>
> This looks very odd: after the second resume, and after losing power, the
> driver can't find any port at all.
>
> Marek, do you still get the "xhci-hcd xhci-hcd.8.auto: No ports on the roothubs?"
> message on second resume after reverting the patch?
>

Ok, I think I got it.
The patch doesn't reset xhci->num_port_caps to 0 in xhci_mem_cleanup().

Adding new ports will then fail when we reinitialize xhci manually, as in this
Exynos case where xhci loses power during the suspend/resume cycle.
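
A simplified user-space model of that failure mode, for illustration only. The struct and function names (host, host_init, host_add_cap, host_cleanup) and the reset_count switch are hypothetical and not taken from the driver; the sketch only assumes that slots are handed out by an in-use counter checked against the allocated size.

```c
#include <assert.h>
#include <stdlib.h>

struct port_cap {
	unsigned int *psi;		/* per-capability table, may be NULL */
};

struct host {
	struct port_cap *port_caps;	/* array of cap_count entries */
	unsigned int num_port_caps;	/* entries handed out so far */
	unsigned int cap_count;		/* entries allocated */
};

/* Allocate a zeroed capability array; does not touch num_port_caps. */
int host_init(struct host *h, unsigned int cap_count)
{
	h->port_caps = calloc(cap_count, sizeof(*h->port_caps));
	if (!h->port_caps)
		return -1;
	h->cap_count = cap_count;
	return 0;
}

/* Hand out the next free slot, or NULL once the counter reaches the array size. */
struct port_cap *host_add_cap(struct host *h)
{
	if (h->num_port_caps >= h->cap_count)
		return NULL;
	return &h->port_caps[h->num_port_caps++];
}

/*
 * Free everything. With reset_count == 0 the counter is left stale,
 * which models the behaviour described above.
 */
void host_cleanup(struct host *h, int reset_count)
{
	unsigned int i;

	assert(h != NULL);

	for (i = 0; i < h->num_port_caps && i < h->cap_count; i++)
		free(h->port_caps[i].psi);
	free(h->port_caps);
	h->port_caps = NULL;
	if (reset_count)
		h->num_port_caps = 0;
}
```

In this model, an init, cleanup, init sequence with the stale counter leaves no free slot for the second initialization, so callers that expect a valid entry end up with nothing to point at.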

I'll post a new version soon

-Mathias


2020-02-11 16:56:14

by Mathias Nyman

Subject: [RFT PATCH v2] xhci: Fix memory leak when caching protocol extended capability PSI tables

xhci driver assumed that xHC controllers have at most one custom
supported speed table (PSI) for all usb 3.x ports.
Memory was allocated for one PSI table under the xhci hub structure.

Turns out this is not the case, some controllers have a separate
"supported protocol capability" entry with a PSI table for each port.
This means each usb3 roothub port can in theory support different custom
speeds.

To solve this, cache all supported protocol capabilities with their PSI
tables in an array, and add pointers to the xhci port structure so that
every port points to its capability entry in the array.

When creating the SuperSpeedPlus USB Device Capability BOS descriptor
for the xhci USB 3.1 roothub, we will for now use only data from the
first USB 3.1 capable protocol capability entry in the array.
This could be improved later; this patch focuses on resolving
the memory leak.
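
As a rough illustration of the layout described above, the following user-space sketch shows ports pointing into one shared capability array, plus the "first USB 3.1 capable entry" lookup. The types and the helpers `assign_cap()` and `first_usb31_cap()` are simplified, hypothetical stand-ins, not the driver's own definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the structures the patch adds. */
struct port_cap {
	unsigned char maj_rev;
	unsigned char min_rev;
	unsigned char psi_count;
};

struct port {
	int hw_portnum;
	const struct port_cap *cap;	/* points into the shared array */
};

/* Point every port in [first, first + count) at one capability entry. */
static void assign_cap(struct port *ports, size_t first, size_t count,
		       const struct port_cap *cap)
{
	size_t i;

	assert(ports != NULL && cap != NULL);
	for (i = first; i < first + count; i++)
		ports[i].cap = cap;
}

/* Return the first entry with major 0x03 and minor >= 0x01, else NULL. */
static const struct port_cap *first_usb31_cap(const struct port_cap *caps,
					      size_t num_caps)
{
	size_t i;

	for (i = 0; i < num_caps; i++)
		if (caps[i].maj_rev == 0x03 && caps[i].min_rev >= 0x01)
			return &caps[i];
	return NULL;
}
```

Each PSI table is then owned by exactly one array entry, so cleanup can walk the array once instead of tracking a single pointer per hub.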

Reported-by: Paul Menzel <[email protected]>
Reported-by: Sajja Venkateswara Rao <[email protected]>
Fixes: 47189098f8be ("xhci: parse xhci protocol speed ID list for usb 3.1 usage")
Cc: stable <[email protected]> # v4.4+
Signed-off-by: Mathias Nyman <[email protected]>
---

Changes since v1:

- Clear xhci->num_port_caps in xhci_mem_cleanup().
  Otherwise we fail to add new ports and cause a NULL pointer dereference at
  manual xhci re-initialization. This can happen at resume if the host lost
  power during suspend.
---
drivers/usb/host/xhci-hub.c | 25 +++++++++++-----
drivers/usb/host/xhci-mem.c | 59 +++++++++++++++++++++++--------------
drivers/usb/host/xhci.h | 14 +++++++--
3 files changed, 65 insertions(+), 33 deletions(-)

diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c
index 7a3a29e5e9d2..af92b2576fe9 100644
--- a/drivers/usb/host/xhci-hub.c
+++ b/drivers/usb/host/xhci-hub.c
@@ -55,6 +55,7 @@ static u8 usb_bos_descriptor [] = {
static int xhci_create_usb3_bos_desc(struct xhci_hcd *xhci, char *buf,
u16 wLength)
{
+ struct xhci_port_cap *port_cap = NULL;
int i, ssa_count;
u32 temp;
u16 desc_size, ssp_cap_size, ssa_size = 0;
@@ -64,16 +65,24 @@ static int xhci_create_usb3_bos_desc(struct xhci_hcd *xhci, char *buf,
ssp_cap_size = sizeof(usb_bos_descriptor) - desc_size;

/* does xhci support USB 3.1 Enhanced SuperSpeed */
- if (xhci->usb3_rhub.min_rev >= 0x01) {
+ for (i = 0; i < xhci->num_port_caps; i++) {
+ if (xhci->port_caps[i].maj_rev == 0x03 &&
+ xhci->port_caps[i].min_rev >= 0x01) {
+ usb3_1 = true;
+ port_cap = &xhci->port_caps[i];
+ break;
+ }
+ }
+
+ if (usb3_1) {
/* does xhci provide a PSI table for SSA speed attributes? */
- if (xhci->usb3_rhub.psi_count) {
+ if (port_cap->psi_count) {
/* two SSA entries for each unique PSI ID, RX and TX */
- ssa_count = xhci->usb3_rhub.psi_uid_count * 2;
+ ssa_count = port_cap->psi_uid_count * 2;
ssa_size = ssa_count * sizeof(u32);
ssp_cap_size -= 16; /* skip copying the default SSA */
}
desc_size += ssp_cap_size;
- usb3_1 = true;
}
memcpy(buf, &usb_bos_descriptor, min(desc_size, wLength));

@@ -99,7 +108,7 @@ static int xhci_create_usb3_bos_desc(struct xhci_hcd *xhci, char *buf,
}

/* If PSI table exists, add the custom speed attributes from it */
- if (usb3_1 && xhci->usb3_rhub.psi_count) {
+ if (usb3_1 && port_cap->psi_count) {
u32 ssp_cap_base, bm_attrib, psi, psi_mant, psi_exp;
int offset;

@@ -111,7 +120,7 @@ static int xhci_create_usb3_bos_desc(struct xhci_hcd *xhci, char *buf,

/* attribute count SSAC bits 4:0 and ID count SSIC bits 8:5 */
bm_attrib = (ssa_count - 1) & 0x1f;
- bm_attrib |= (xhci->usb3_rhub.psi_uid_count - 1) << 5;
+ bm_attrib |= (port_cap->psi_uid_count - 1) << 5;
put_unaligned_le32(bm_attrib, &buf[ssp_cap_base + 4]);

if (wLength < desc_size + ssa_size)
@@ -124,8 +133,8 @@ static int xhci_create_usb3_bos_desc(struct xhci_hcd *xhci, char *buf,
* USB 3.1 requires two SSA entries (RX and TX) for every link
*/
offset = desc_size;
- for (i = 0; i < xhci->usb3_rhub.psi_count; i++) {
- psi = xhci->usb3_rhub.psi[i];
+ for (i = 0; i < port_cap->psi_count; i++) {
+ psi = port_cap->psi[i];
psi &= ~USB_SSP_SUBLINK_SPEED_RSVD;
psi_exp = XHCI_EXT_PORT_PSIE(psi);
psi_mant = XHCI_EXT_PORT_PSIM(psi);
diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
index 0e2701649369..884c601bfa15 100644
--- a/drivers/usb/host/xhci-mem.c
+++ b/drivers/usb/host/xhci-mem.c
@@ -1915,17 +1915,17 @@ void xhci_mem_cleanup(struct xhci_hcd *xhci)
xhci->usb3_rhub.num_ports = 0;
xhci->num_active_eps = 0;
kfree(xhci->usb2_rhub.ports);
- kfree(xhci->usb2_rhub.psi);
kfree(xhci->usb3_rhub.ports);
- kfree(xhci->usb3_rhub.psi);
kfree(xhci->hw_ports);
kfree(xhci->rh_bw);
kfree(xhci->ext_caps);
+ for (i = 0; i < xhci->num_port_caps; i++)
+ kfree(xhci->port_caps[i].psi);
+ kfree(xhci->port_caps);
+ xhci->num_port_caps = 0;

xhci->usb2_rhub.ports = NULL;
- xhci->usb2_rhub.psi = NULL;
xhci->usb3_rhub.ports = NULL;
- xhci->usb3_rhub.psi = NULL;
xhci->hw_ports = NULL;
xhci->rh_bw = NULL;
xhci->ext_caps = NULL;
@@ -2126,6 +2126,7 @@ static void xhci_add_in_port(struct xhci_hcd *xhci, unsigned int num_ports,
u8 major_revision, minor_revision;
struct xhci_hub *rhub;
struct device *dev = xhci_to_hcd(xhci)->self.sysdev;
+ struct xhci_port_cap *port_cap;

temp = readl(addr);
major_revision = XHCI_EXT_PORT_MAJOR(temp);
@@ -2160,31 +2161,39 @@ static void xhci_add_in_port(struct xhci_hcd *xhci, unsigned int num_ports,
/* WTF? "Valid values are ‘1’ to MaxPorts" */
return;

- rhub->psi_count = XHCI_EXT_PORT_PSIC(temp);
- if (rhub->psi_count) {
- rhub->psi = kcalloc_node(rhub->psi_count, sizeof(*rhub->psi),
- GFP_KERNEL, dev_to_node(dev));
- if (!rhub->psi)
- rhub->psi_count = 0;
+ port_cap = &xhci->port_caps[xhci->num_port_caps++];
+ if (xhci->num_port_caps > max_caps)
+ return;
+
+ port_cap->maj_rev = major_revision;
+ port_cap->min_rev = minor_revision;
+ port_cap->psi_count = XHCI_EXT_PORT_PSIC(temp);
+
+ if (port_cap->psi_count) {
+ port_cap->psi = kcalloc_node(port_cap->psi_count,
+ sizeof(*port_cap->psi),
+ GFP_KERNEL, dev_to_node(dev));
+ if (!port_cap->psi)
+ port_cap->psi_count = 0;

- rhub->psi_uid_count++;
- for (i = 0; i < rhub->psi_count; i++) {
- rhub->psi[i] = readl(addr + 4 + i);
+ port_cap->psi_uid_count++;
+ for (i = 0; i < port_cap->psi_count; i++) {
+ port_cap->psi[i] = readl(addr + 4 + i);

/* count unique ID values, two consecutive entries can
* have the same ID if link is assymetric
*/
- if (i && (XHCI_EXT_PORT_PSIV(rhub->psi[i]) !=
- XHCI_EXT_PORT_PSIV(rhub->psi[i - 1])))
- rhub->psi_uid_count++;
+ if (i && (XHCI_EXT_PORT_PSIV(port_cap->psi[i]) !=
+ XHCI_EXT_PORT_PSIV(port_cap->psi[i - 1])))
+ port_cap->psi_uid_count++;

xhci_dbg(xhci, "PSIV:%d PSIE:%d PLT:%d PFD:%d LP:%d PSIM:%d\n",
- XHCI_EXT_PORT_PSIV(rhub->psi[i]),
- XHCI_EXT_PORT_PSIE(rhub->psi[i]),
- XHCI_EXT_PORT_PLT(rhub->psi[i]),
- XHCI_EXT_PORT_PFD(rhub->psi[i]),
- XHCI_EXT_PORT_LP(rhub->psi[i]),
- XHCI_EXT_PORT_PSIM(rhub->psi[i]));
+ XHCI_EXT_PORT_PSIV(port_cap->psi[i]),
+ XHCI_EXT_PORT_PSIE(port_cap->psi[i]),
+ XHCI_EXT_PORT_PLT(port_cap->psi[i]),
+ XHCI_EXT_PORT_PFD(port_cap->psi[i]),
+ XHCI_EXT_PORT_LP(port_cap->psi[i]),
+ XHCI_EXT_PORT_PSIM(port_cap->psi[i]));
}
}
/* cache usb2 port capabilities */
@@ -2219,6 +2228,7 @@ static void xhci_add_in_port(struct xhci_hcd *xhci, unsigned int num_ports,
continue;
}
hw_port->rhub = rhub;
+ hw_port->port_cap = port_cap;
rhub->num_ports++;
}
/* FIXME: Should we disable ports not in the Extended Capabilities? */
@@ -2309,6 +2319,11 @@ static int xhci_setup_port_arrays(struct xhci_hcd *xhci, gfp_t flags)
if (!xhci->ext_caps)
return -ENOMEM;

+ xhci->port_caps = kcalloc_node(cap_count, sizeof(*xhci->port_caps),
+ flags, dev_to_node(dev));
+ if (!xhci->port_caps)
+ return -ENOMEM;
+
offset = cap_start;

while (offset) {
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index 13d8838cd552..3ecee10fdcdc 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1702,12 +1702,20 @@ struct xhci_bus_state {
* Intel Lynx Point LP xHCI host.
*/
#define XHCI_MAX_REXIT_TIMEOUT_MS 20
+struct xhci_port_cap {
+ u32 *psi; /* array of protocol speed ID entries */
+ u8 psi_count;
+ u8 psi_uid_count;
+ u8 maj_rev;
+ u8 min_rev;
+};

struct xhci_port {
__le32 __iomem *addr;
int hw_portnum;
int hcd_portnum;
struct xhci_hub *rhub;
+ struct xhci_port_cap *port_cap;
};

struct xhci_hub {
@@ -1719,9 +1727,6 @@ struct xhci_hub {
/* supported prococol extended capabiliy values */
u8 maj_rev;
u8 min_rev;
- u32 *psi; /* array of protocol speed ID entries */
- u8 psi_count;
- u8 psi_uid_count;
};

/* There is one xhci_hcd structure per controller */
@@ -1880,6 +1885,9 @@ struct xhci_hcd {
/* cached usb2 extened protocol capabilites */
u32 *ext_caps;
unsigned int num_ext_caps;
+ /* cached extended protocol port capabilities */
+ struct xhci_port_cap *port_caps;
+ unsigned int num_port_caps;
/* Compliance Mode Recovery Data */
struct timer_list comp_mode_recovery_timer;
u32 port_status_u0;
--
2.17.1

2020-02-11 17:00:10

by Marek Szyprowski

Subject: Re: [RFT PATCH v2] xhci: Fix memory leak when caching protocol extended capability PSI tables

Hi Mathias,

On 11.02.2020 16:01, Mathias Nyman wrote:
> xhci driver assumed that xHC controllers have at most one custom
> supported speed table (PSI) for all usb 3.x ports.
> Memory was allocated for one PSI table under the xhci hub structure.
>
> Turns out this is not the case, some controllers have a separate
> "supported protocol capability" entry with a PSI table for each port.
> This means each usb3 roothub port can in theory support different custom
> speeds.
>
> To solve this, cache all supported protocol capabilities with their PSI
> tables in an array, and add pointers to the xhci port structure so that
> every port points to its capability entry in the array.
>
> When creating the SuperSpeedPlus USB Device Capability BOS descriptor
> for the xhci USB 3.1 roothub we for now will use only data from the
> first USB 3.1 capable protocol capability entry in the array.
> This could be improved later, this patch focuses resolving
> the memory leak.
>
> Reported-by: Paul Menzel <[email protected]>
> Reported-by: Sajja Venkateswara Rao <[email protected]>
> Fixes: 47189098f8be ("xhci: parse xhci protocol speed ID list for usb 3.1 usage")
> Cc: stable <[email protected]> # v4.4+
> Signed-off-by: Mathias Nyman <[email protected]>

Tested-by: Marek Szyprowski <[email protected]>

> ---
>
> Changes since v1:
>
> - Clear xhci->num_port_caps in xhci_mem_cleanup()
> Otherwise we fail to add new ports and cause NULL pointer dereference at
> manual xhci re-initialization. This can happen at resume if host lost power
> during suspend.
> ---
> drivers/usb/host/xhci-hub.c | 25 +++++++++++-----
> drivers/usb/host/xhci-mem.c | 59 +++++++++++++++++++++++--------------
> drivers/usb/host/xhci.h | 14 +++++++--
> 3 files changed, 65 insertions(+), 33 deletions(-)
>
> diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c
> index 7a3a29e5e9d2..af92b2576fe9 100644
> --- a/drivers/usb/host/xhci-hub.c
> +++ b/drivers/usb/host/xhci-hub.c
> @@ -55,6 +55,7 @@ static u8 usb_bos_descriptor [] = {
> static int xhci_create_usb3_bos_desc(struct xhci_hcd *xhci, char *buf,
> u16 wLength)
> {
> + struct xhci_port_cap *port_cap = NULL;
> int i, ssa_count;
> u32 temp;
> u16 desc_size, ssp_cap_size, ssa_size = 0;
> @@ -64,16 +65,24 @@ static int xhci_create_usb3_bos_desc(struct xhci_hcd *xhci, char *buf,
> ssp_cap_size = sizeof(usb_bos_descriptor) - desc_size;
>
> /* does xhci support USB 3.1 Enhanced SuperSpeed */
> - if (xhci->usb3_rhub.min_rev >= 0x01) {
> + for (i = 0; i < xhci->num_port_caps; i++) {
> + if (xhci->port_caps[i].maj_rev == 0x03 &&
> + xhci->port_caps[i].min_rev >= 0x01) {
> + usb3_1 = true;
> + port_cap = &xhci->port_caps[i];
> + break;
> + }
> + }
> +
> + if (usb3_1) {
> /* does xhci provide a PSI table for SSA speed attributes? */
> - if (xhci->usb3_rhub.psi_count) {
> + if (port_cap->psi_count) {
> /* two SSA entries for each unique PSI ID, RX and TX */
> - ssa_count = xhci->usb3_rhub.psi_uid_count * 2;
> + ssa_count = port_cap->psi_uid_count * 2;
> ssa_size = ssa_count * sizeof(u32);
> ssp_cap_size -= 16; /* skip copying the default SSA */
> }
> desc_size += ssp_cap_size;
> - usb3_1 = true;
> }
> memcpy(buf, &usb_bos_descriptor, min(desc_size, wLength));
>
> @@ -99,7 +108,7 @@ static int xhci_create_usb3_bos_desc(struct xhci_hcd *xhci, char *buf,
> }
>
> /* If PSI table exists, add the custom speed attributes from it */
> - if (usb3_1 && xhci->usb3_rhub.psi_count) {
> + if (usb3_1 && port_cap->psi_count) {
> u32 ssp_cap_base, bm_attrib, psi, psi_mant, psi_exp;
> int offset;
>
> @@ -111,7 +120,7 @@ static int xhci_create_usb3_bos_desc(struct xhci_hcd *xhci, char *buf,
>
> /* attribute count SSAC bits 4:0 and ID count SSIC bits 8:5 */
> bm_attrib = (ssa_count - 1) & 0x1f;
> - bm_attrib |= (xhci->usb3_rhub.psi_uid_count - 1) << 5;
> + bm_attrib |= (port_cap->psi_uid_count - 1) << 5;
> put_unaligned_le32(bm_attrib, &buf[ssp_cap_base + 4]);
>
> if (wLength < desc_size + ssa_size)
> @@ -124,8 +133,8 @@ static int xhci_create_usb3_bos_desc(struct xhci_hcd *xhci, char *buf,
> * USB 3.1 requires two SSA entries (RX and TX) for every link
> */
> offset = desc_size;
> - for (i = 0; i < xhci->usb3_rhub.psi_count; i++) {
> - psi = xhci->usb3_rhub.psi[i];
> + for (i = 0; i < port_cap->psi_count; i++) {
> + psi = port_cap->psi[i];
> psi &= ~USB_SSP_SUBLINK_SPEED_RSVD;
> psi_exp = XHCI_EXT_PORT_PSIE(psi);
> psi_mant = XHCI_EXT_PORT_PSIM(psi);
> diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
> index 0e2701649369..884c601bfa15 100644
> --- a/drivers/usb/host/xhci-mem.c
> +++ b/drivers/usb/host/xhci-mem.c
> @@ -1915,17 +1915,17 @@ void xhci_mem_cleanup(struct xhci_hcd *xhci)
> xhci->usb3_rhub.num_ports = 0;
> xhci->num_active_eps = 0;
> kfree(xhci->usb2_rhub.ports);
> - kfree(xhci->usb2_rhub.psi);
> kfree(xhci->usb3_rhub.ports);
> - kfree(xhci->usb3_rhub.psi);
> kfree(xhci->hw_ports);
> kfree(xhci->rh_bw);
> kfree(xhci->ext_caps);
> + for (i = 0; i < xhci->num_port_caps; i++)
> + kfree(xhci->port_caps[i].psi);
> + kfree(xhci->port_caps);
> + xhci->num_port_caps = 0;
>
> xhci->usb2_rhub.ports = NULL;
> - xhci->usb2_rhub.psi = NULL;
> xhci->usb3_rhub.ports = NULL;
> - xhci->usb3_rhub.psi = NULL;
> xhci->hw_ports = NULL;
> xhci->rh_bw = NULL;
> xhci->ext_caps = NULL;
> @@ -2126,6 +2126,7 @@ static void xhci_add_in_port(struct xhci_hcd *xhci, unsigned int num_ports,
> u8 major_revision, minor_revision;
> struct xhci_hub *rhub;
> struct device *dev = xhci_to_hcd(xhci)->self.sysdev;
> + struct xhci_port_cap *port_cap;
>
> temp = readl(addr);
> major_revision = XHCI_EXT_PORT_MAJOR(temp);
> @@ -2160,31 +2161,39 @@ static void xhci_add_in_port(struct xhci_hcd *xhci, unsigned int num_ports,
> /* WTF? "Valid values are ‘1’ to MaxPorts" */
> return;
>
> - rhub->psi_count = XHCI_EXT_PORT_PSIC(temp);
> - if (rhub->psi_count) {
> - rhub->psi = kcalloc_node(rhub->psi_count, sizeof(*rhub->psi),
> - GFP_KERNEL, dev_to_node(dev));
> - if (!rhub->psi)
> - rhub->psi_count = 0;
> + port_cap = &xhci->port_caps[xhci->num_port_caps++];
> + if (xhci->num_port_caps > max_caps)
> + return;
> +
> + port_cap->maj_rev = major_revision;
> + port_cap->min_rev = minor_revision;
> + port_cap->psi_count = XHCI_EXT_PORT_PSIC(temp);
> +
> + if (port_cap->psi_count) {
> + port_cap->psi = kcalloc_node(port_cap->psi_count,
> + sizeof(*port_cap->psi),
> + GFP_KERNEL, dev_to_node(dev));
> + if (!port_cap->psi)
> + port_cap->psi_count = 0;
>
> - rhub->psi_uid_count++;
> - for (i = 0; i < rhub->psi_count; i++) {
> - rhub->psi[i] = readl(addr + 4 + i);
> + port_cap->psi_uid_count++;
> + for (i = 0; i < port_cap->psi_count; i++) {
> + port_cap->psi[i] = readl(addr + 4 + i);
>
> /* count unique ID values, two consecutive entries can
> * have the same ID if link is assymetric
> */
> - if (i && (XHCI_EXT_PORT_PSIV(rhub->psi[i]) !=
> - XHCI_EXT_PORT_PSIV(rhub->psi[i - 1])))
> - rhub->psi_uid_count++;
> + if (i && (XHCI_EXT_PORT_PSIV(port_cap->psi[i]) !=
> + XHCI_EXT_PORT_PSIV(port_cap->psi[i - 1])))
> + port_cap->psi_uid_count++;
>
> xhci_dbg(xhci, "PSIV:%d PSIE:%d PLT:%d PFD:%d LP:%d PSIM:%d\n",
> - XHCI_EXT_PORT_PSIV(rhub->psi[i]),
> - XHCI_EXT_PORT_PSIE(rhub->psi[i]),
> - XHCI_EXT_PORT_PLT(rhub->psi[i]),
> - XHCI_EXT_PORT_PFD(rhub->psi[i]),
> - XHCI_EXT_PORT_LP(rhub->psi[i]),
> - XHCI_EXT_PORT_PSIM(rhub->psi[i]));
> + XHCI_EXT_PORT_PSIV(port_cap->psi[i]),
> + XHCI_EXT_PORT_PSIE(port_cap->psi[i]),
> + XHCI_EXT_PORT_PLT(port_cap->psi[i]),
> + XHCI_EXT_PORT_PFD(port_cap->psi[i]),
> + XHCI_EXT_PORT_LP(port_cap->psi[i]),
> + XHCI_EXT_PORT_PSIM(port_cap->psi[i]));
> }
> }
> /* cache usb2 port capabilities */
> @@ -2219,6 +2228,7 @@ static void xhci_add_in_port(struct xhci_hcd *xhci, unsigned int num_ports,
> continue;
> }
> hw_port->rhub = rhub;
> + hw_port->port_cap = port_cap;
> rhub->num_ports++;
> }
> /* FIXME: Should we disable ports not in the Extended Capabilities? */
> @@ -2309,6 +2319,11 @@ static int xhci_setup_port_arrays(struct xhci_hcd *xhci, gfp_t flags)
> if (!xhci->ext_caps)
> return -ENOMEM;
>
> + xhci->port_caps = kcalloc_node(cap_count, sizeof(*xhci->port_caps),
> + flags, dev_to_node(dev));
> + if (!xhci->port_caps)
> + return -ENOMEM;
> +
> offset = cap_start;
>
> while (offset) {
> diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
> index 13d8838cd552..3ecee10fdcdc 100644
> --- a/drivers/usb/host/xhci.h
> +++ b/drivers/usb/host/xhci.h
> @@ -1702,12 +1702,20 @@ struct xhci_bus_state {
> * Intel Lynx Point LP xHCI host.
> */
> #define XHCI_MAX_REXIT_TIMEOUT_MS 20
> +struct xhci_port_cap {
> + u32 *psi; /* array of protocol speed ID entries */
> + u8 psi_count;
> + u8 psi_uid_count;
> + u8 maj_rev;
> + u8 min_rev;
> +};
>
> struct xhci_port {
> __le32 __iomem *addr;
> int hw_portnum;
> int hcd_portnum;
> struct xhci_hub *rhub;
> + struct xhci_port_cap *port_cap;
> };
>
> struct xhci_hub {
> @@ -1719,9 +1727,6 @@ struct xhci_hub {
> /* supported prococol extended capabiliy values */
> u8 maj_rev;
> u8 min_rev;
> - u32 *psi; /* array of protocol speed ID entries */
> - u8 psi_count;
> - u8 psi_uid_count;
> };
>
> /* There is one xhci_hcd structure per controller */
> @@ -1880,6 +1885,9 @@ struct xhci_hcd {
> /* cached usb2 extened protocol capabilites */
> u32 *ext_caps;
> unsigned int num_ext_caps;
> + /* cached extended protocol port capabilities */
> + struct xhci_port_cap *port_caps;
> + unsigned int num_port_caps;
> /* Compliance Mode Recovery Data */
> struct timer_list comp_mode_recovery_timer;
> u32 port_status_u0;

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland

2020-02-11 17:15:01

by Greg Kroah-Hartman

Subject: Re: [RFT PATCH v2] xhci: Fix memory leak when caching protocol extended capability PSI tables

On Tue, Feb 11, 2020 at 04:12:40PM +0100, Marek Szyprowski wrote:
> Hi Mathias,
>
> On 11.02.2020 16:01, Mathias Nyman wrote:
> > xhci driver assumed that xHC controllers have at most one custom
> > supported speed table (PSI) for all usb 3.x ports.
> > Memory was allocated for one PSI table under the xhci hub structure.
> >
> > Turns out this is not the case, some controllers have a separate
> > "supported protocol capability" entry with a PSI table for each port.
> > This means each usb3 roothub port can in theory support different custom
> > speeds.
> >
> > To solve this, cache all supported protocol capabilities with their PSI
> > tables in an array, and add pointers to the xhci port structure so that
> > every port points to its capability entry in the array.
> >
> > When creating the SuperSpeedPlus USB Device Capability BOS descriptor
> > for the xhci USB 3.1 roothub we for now will use only data from the
> > first USB 3.1 capable protocol capability entry in the array.
> > This could be improved later, this patch focuses resolving
> > the memory leak.
> >
> > Reported-by: Paul Menzel <[email protected]>
> > Reported-by: Sajja Venkateswara Rao <[email protected]>
> > Fixes: 47189098f8be ("xhci: parse xhci protocol speed ID list for usb 3.1 usage")
> > Cc: stable <[email protected]> # v4.4+
> > Signed-off-by: Mathias Nyman <[email protected]>
>
> Tested-by: Marek Szyprowski <[email protected]>

Nice!

Should I revert the first and then apply this?

thanks,

greg k-h

2020-02-12 08:59:56

by Mathias Nyman

Subject: Re: [RFT PATCH v2] xhci: Fix memory leak when caching protocol extended capability PSI tables

On 11.2.2020 18.13, Greg KH wrote:
> On Tue, Feb 11, 2020 at 04:12:40PM +0100, Marek Szyprowski wrote:
>> Hi Mathias,
>>
>> On 11.02.2020 16:01, Mathias Nyman wrote:
>>> xhci driver assumed that xHC controllers have at most one custom
>>> supported speed table (PSI) for all usb 3.x ports.
>>> Memory was allocated for one PSI table under the xhci hub structure.
>>>
>>> Turns out this is not the case, some controllers have a separate
>>> "supported protocol capability" entry with a PSI table for each port.
>>> This means each usb3 roothub port can in theory support different custom
>>> speeds.
>>>
>>> To solve this, cache all supported protocol capabilities with their PSI
>>> tables in an array, and add pointers to the xhci port structure so that
>>> every port points to its capability entry in the array.
>>>
>>> When creating the SuperSpeedPlus USB Device Capability BOS descriptor
>>> for the xhci USB 3.1 roothub we for now will use only data from the
>>> first USB 3.1 capable protocol capability entry in the array.
>>> This could be improved later, this patch focuses resolving
>>> the memory leak.
>>>
>>> Reported-by: Paul Menzel <[email protected]>
>>> Reported-by: Sajja Venkateswara Rao <[email protected]>
>>> Fixes: 47189098f8be ("xhci: parse xhci protocol speed ID list for usb 3.1 usage")
>>> Cc: stable <[email protected]> # v4.4+
>>> Signed-off-by: Mathias Nyman <[email protected]>
>>
>> Tested-by: Marek Szyprowski <[email protected]>
>
> Nice!
>
> Should I revert the first and then apply this?
>

Yes, please

Thanks

-Mathias


2020-02-12 17:52:22

by Greg Kroah-Hartman

Subject: Re: [RFT PATCH v2] xhci: Fix memory leak when caching protocol extended capability PSI tables

On Wed, Feb 12, 2020 at 11:01:52AM +0200, Mathias Nyman wrote:
> On 11.2.2020 18.13, Greg KH wrote:
> > On Tue, Feb 11, 2020 at 04:12:40PM +0100, Marek Szyprowski wrote:
> >> Hi Mathias,
> >>
> >> On 11.02.2020 16:01, Mathias Nyman wrote:
> >>> xhci driver assumed that xHC controllers have at most one custom
> >>> supported speed table (PSI) for all usb 3.x ports.
> >>> Memory was allocated for one PSI table under the xhci hub structure.
> >>>
> >>> Turns out this is not the case, some controllers have a separate
> >>> "supported protocol capability" entry with a PSI table for each port.
> >>> This means each usb3 roothub port can in theory support different custom
> >>> speeds.
> >>>
> >>> To solve this, cache all supported protocol capabilities with their PSI
> >>> tables in an array, and add pointers to the xhci port structure so that
> >>> every port points to its capability entry in the array.
> >>>
> >>> When creating the SuperSpeedPlus USB Device Capability BOS descriptor
> >>> for the xhci USB 3.1 roothub we for now will use only data from the
> >>> first USB 3.1 capable protocol capability entry in the array.
> >>> This could be improved later, this patch focuses resolving
> >>> the memory leak.
> >>>
> >>> Reported-by: Paul Menzel <[email protected]>
> >>> Reported-by: Sajja Venkateswara Rao <[email protected]>
> >>> Fixes: 47189098f8be ("xhci: parse xhci protocol speed ID list for usb 3.1 usage")
> >>> Cc: stable <[email protected]> # v4.4+
> >>> Signed-off-by: Mathias Nyman <[email protected]>
> >>
> >> Tested-by: Marek Szyprowski <[email protected]>
> >
> > Nice!
> >
> > Should I revert the first and then apply this?
> >
>
> Yes, please

Now done, thanks.

greg k-h

2020-02-13 13:34:08

by Jon Hunter

Subject: Re: [RFT PATCH v2] xhci: Fix memory leak when caching protocol extended capability PSI tables


On 11/02/2020 15:01, Mathias Nyman wrote:
> xhci driver assumed that xHC controllers have at most one custom
> supported speed table (PSI) for all usb 3.x ports.
> Memory was allocated for one PSI table under the xhci hub structure.
>
> Turns out this is not the case, some controllers have a separate
> "supported protocol capability" entry with a PSI table for each port.
> This means each usb3 roothub port can in theory support different custom
> speeds.
>
> To solve this, cache all supported protocol capabilities with their PSI
> tables in an array, and add pointers to the xhci port structure so that
> every port points to its capability entry in the array.
>
> When creating the SuperSpeedPlus USB Device Capability BOS descriptor
> for the xhci USB 3.1 roothub we for now will use only data from the
> first USB 3.1 capable protocol capability entry in the array.
> This could be improved later, this patch focuses resolving
> the memory leak.
>
> Reported-by: Paul Menzel <[email protected]>
> Reported-by: Sajja Venkateswara Rao <[email protected]>
> Fixes: 47189098f8be ("xhci: parse xhci protocol speed ID list for usb 3.1 usage")
> Cc: stable <[email protected]> # v4.4+
> Signed-off-by: Mathias Nyman <[email protected]>


Since next-20200211, we have been observing a regression exiting suspend
on our Tegra124 Jetson TK1 board. Bisect is pointing to this commit and
reverting on top of -next fixes the problem.

On exiting suspend, I am seeing the following ...

[ 56.216793] tegra-xusb 70090000.usb: Firmware already loaded, Falcon state 0x20
[ 56.216834] usb usb3: root hub lost power or was reset
[ 56.216837] usb usb4: root hub lost power or was reset
[ 56.217760] tegra-xusb 70090000.usb: No ports on the roothubs?
[ 56.218257] tegra-xusb 70090000.usb: failed to resume XHCI: -12
[ 56.218299] PM: dpm_run_callback(): platform_pm_resume+0x0/0x40 returns -12
[ 56.218312] PM: Device 70090000.usb failed to resume: error -12
[ 56.334366] hub 4-0:1.0: hub_ext_port_status failed (err = -32)
[ 56.334368] hub 3-0:1.0: hub_ext_port_status failed (err = -32)
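
For anyone decoding the log above, the negative numbers are kernel-style error returns. A minimal sketch, assuming Linux errno numbering (where 12 is ENOMEM and 32 is EPIPE); `errno_name()` is a hypothetical helper for illustration, not a kernel function:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Map a negative kernel-style return code to its symbolic errno name.
 * Only the two codes that appear in the log are handled here.
 */
static const char *errno_name(int ret)
{
	switch (-ret) {
	case ENOMEM:
		return "ENOMEM";	/* -12 on Linux */
	case EPIPE:
		return "EPIPE";		/* -32 on Linux */
	default:
		return "unknown";
	}
}
```

On Linux, `errno_name(-12)` yields "ENOMEM" for the failed resume and `errno_name(-32)` yields "EPIPE" for the hub_ext_port_status failures.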

Let me know if you have any thoughts on this.

Cheers
Jon

--
nvpublic

2020-02-14 07:45:54

by Mathias Nyman

Subject: Re: [RFT PATCH v2] xhci: Fix memory leak when caching protocol extended capability PSI tables

On 13.2.2020 15.33, Jon Hunter wrote:
>
> On 11/02/2020 15:01, Mathias Nyman wrote:
>> xhci driver assumed that xHC controllers have at most one custom
>> supported speed table (PSI) for all usb 3.x ports.
>> Memory was allocated for one PSI table under the xhci hub structure.
>>
>> Turns out this is not the case, some controllers have a separate
>> "supported protocol capability" entry with a PSI table for each port.
>> This means each usb3 roothub port can in theory support different custom
>> speeds.
>>
>> To solve this, cache all supported protocol capabilities with their PSI
>> tables in an array, and add pointers to the xhci port structure so that
>> every port points to its capability entry in the array.
>>
>> When creating the SuperSpeedPlus USB Device Capability BOS descriptor
>> for the xhci USB 3.1 roothub we for now will use only data from the
>> first USB 3.1 capable protocol capability entry in the array.
>> This could be improved later, this patch focuses resolving
>> the memory leak.
>>
>> Reported-by: Paul Menzel <[email protected]>
>> Reported-by: Sajja Venkateswara Rao <[email protected]>
>> Fixes: 47189098f8be ("xhci: parse xhci protocol speed ID list for usb 3.1 usage")
>> Cc: stable <[email protected]> # v4.4+
>> Signed-off-by: Mathias Nyman <[email protected]>
>
>
> Since next-20200211, we have been observing a regression exiting suspend
> on our Tegra124 Jetson TK1 board. Bisect is pointing to this commit and
> reverting on top of -next fixes the problem.
>
> On exiting suspend, I am seeing the following ...
>
> [ 56.216793] tegra-xusb 70090000.usb: Firmware already loaded, Falcon state 0x20
> [ 56.216834] usb usb3: root hub lost power or was reset
> [ 56.216837] usb usb4: root hub lost power or was reset
> [ 56.217760] tegra-xusb 70090000.usb: No ports on the roothubs?
> [ 56.218257] tegra-xusb 70090000.usb: failed to resume XHCI: -12
> [ 56.218299] PM: dpm_run_callback(): platform_pm_resume+0x0/0x40 returns -12
> [ 56.218312] PM: Device 70090000.usb failed to resume: error -12
> [ 56.334366] hub 4-0:1.0: hub_ext_port_status failed (err = -32)
> [ 56.334368] hub 3-0:1.0: hub_ext_port_status failed (err = -32)
>
> Let me know if you have any thoughts on this.
>
> Cheers
> Jon

This was an issue with the first version, and should be fixed in the second.

next-20200211 has the faulty version;
next-20200213 is fixed: the first version was reverted and the second applied.

Does next-20200213 work for you?

-Mathias

2020-02-14 08:36:35

by Jon Hunter

Subject: Re: [RFT PATCH v2] xhci: Fix memory leak when caching protocol extended capability PSI tables


On 14/02/2020 07:47, Mathias Nyman wrote:
> On 13.2.2020 15.33, Jon Hunter wrote:
>>
>> On 11/02/2020 15:01, Mathias Nyman wrote:
>>> xhci driver assumed that xHC controllers have at most one custom
>>> supported speed table (PSI) for all usb 3.x ports.
>>> Memory was allocated for one PSI table under the xhci hub structure.
>>>
>>> Turns out this is not the case, some controllers have a separate
>>> "supported protocol capability" entry with a PSI table for each port.
>>> This means each usb3 roothub port can in theory support different custom
>>> speeds.
>>>
>>> To solve this, cache all supported protocol capabilities with their PSI
>>> tables in an array, and add pointers to the xhci port structure so that
>>> every port points to its capability entry in the array.
>>>
>>> When creating the SuperSpeedPlus USB Device Capability BOS descriptor
>>> for the xhci USB 3.1 roothub we for now will use only data from the
>>> first USB 3.1 capable protocol capability entry in the array.
>>> This could be improved later, this patch focuses resolving
>>> the memory leak.
>>>
>>> Reported-by: Paul Menzel <[email protected]>
>>> Reported-by: Sajja Venkateswara Rao <[email protected]>
>>> Fixes: 47189098f8be ("xhci: parse xhci protocol speed ID list for usb 3.1 usage")
>>> Cc: stable <[email protected]> # v4.4+
>>> Signed-off-by: Mathias Nyman <[email protected]>
>>
>>
>> Since next-20200211, we have been observing a regression exiting suspend
>> on our Tegra124 Jetson TK1 board. Bisect is pointing to this commit and
>> reverting on top of -next fixes the problem.
>>
>> On exiting suspend, I am seeing the following ...
>>
>> [ 56.216793] tegra-xusb 70090000.usb: Firmware already loaded, Falcon state 0x20
>> [ 56.216834] usb usb3: root hub lost power or was reset
>> [ 56.216837] usb usb4: root hub lost power or was reset
>> [ 56.217760] tegra-xusb 70090000.usb: No ports on the roothubs?
>> [ 56.218257] tegra-xusb 70090000.usb: failed to resume XHCI: -12
>> [ 56.218299] PM: dpm_run_callback(): platform_pm_resume+0x0/0x40 returns -12
>> [ 56.218312] PM: Device 70090000.usb failed to resume: error -12
>> [ 56.334366] hub 4-0:1.0: hub_ext_port_status failed (err = -32)
>> [ 56.334368] hub 3-0:1.0: hub_ext_port_status failed (err = -32)
>>
>> Let me know if you have any thoughts on this.
>>
>> Cheers
>> Jon
>
> This was an issue with the first version, and should be fixed in the second.
>
> next-20200211 has the faulty version,
> next-20200213 is fixed, reverted first version and applied second.
>
> Does next-20200213 work for you?

Yes, it does. Sorry, I am an idiot and should have read the changes and
thread more closely!

Thanks for fixing so quickly.

Jon

--
nvpublic