2020-01-17 07:44:48

by Macpaul Lin

[permalink] [raw]
Subject: [PATCH] xhci-mtk: Fix NULL pointer dereference with xhci_irq() for shared_hcd

According to NULL pointer fix: https://tinyurl.com/uqft5ra
xhci: Fix NULL pointer dereference with xhci_irq() for shared_hcd
The similar issue has also been found in QC activities in Mediatek.

Here quote the description from the referenced patch as follows.
"Commit ("f068090426ea xhci: Fix leaking USB3 shared_hcd
at xhci removal") sets xhci_shared_hcd to NULL without
stopping xhci host. This results into a race condition
where shared_hcd (super speed roothub) related interrupts
are being handled with xhci_irq happens when the
xhci_plat_remove is called and shared_hcd is set to NULL.
Fix this by setting the shared_hcd to NULL only after the
controller is halted and no interrupts are generated."

Signed-off-by: Sriharsha Allenki <[email protected]>
Signed-off-by: Macpaul Lin <[email protected]>
---
drivers/usb/host/xhci-mtk.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/host/xhci-mtk.c b/drivers/usb/host/xhci-mtk.c
index b18a6baef204..c227c67f5dc5 100644
--- a/drivers/usb/host/xhci-mtk.c
+++ b/drivers/usb/host/xhci-mtk.c
@@ -593,11 +593,11 @@ static int xhci_mtk_remove(struct platform_device *dev)
struct usb_hcd *shared_hcd = xhci->shared_hcd;

usb_remove_hcd(shared_hcd);
- xhci->shared_hcd = NULL;
device_init_wakeup(&dev->dev, false);

usb_remove_hcd(hcd);
usb_put_hcd(shared_hcd);
+ xhci->shared_hcd = NULL;
usb_put_hcd(hcd);
xhci_mtk_sch_exit(mtk);
xhci_mtk_clks_disable(mtk);
--
2.18.0


2020-01-31 14:50:21

by Mathias Nyman

[permalink] [raw]
Subject: Re: [PATCH] xhci-mtk: Fix NULL pointer dereference with xhci_irq() for shared_hcd

On 17.1.2020 9.41, Macpaul Lin wrote:
> According to NULL pointer fix: https://tinyurl.com/uqft5ra
> xhci: Fix NULL pointer dereference with xhci_irq() for shared_hcd
> The similar issue has also been found in QC activities in Mediatek.
>
> Here quote the description from the referenced patch as follows.
> "Commit ("f068090426ea xhci: Fix leaking USB3 shared_hcd
> at xhci removal") sets xhci_shared_hcd to NULL without
> stopping xhci host. This results into a race condition
> where shared_hcd (super speed roothub) related interrupts
> are being handled with xhci_irq happens when the
> xhci_plat_remove is called and shared_hcd is set to NULL.
> Fix this by setting the shared_hcd to NULL only after the
> controller is halted and no interrupts are generated."
>
> Signed-off-by: Sriharsha Allenki <[email protected]>
> Signed-off-by: Macpaul Lin <[email protected]>
> ---
> drivers/usb/host/xhci-mtk.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/usb/host/xhci-mtk.c b/drivers/usb/host/xhci-mtk.c
> index b18a6baef204..c227c67f5dc5 100644
> --- a/drivers/usb/host/xhci-mtk.c
> +++ b/drivers/usb/host/xhci-mtk.c
> @@ -593,11 +593,11 @@ static int xhci_mtk_remove(struct platform_device *dev)
> struct usb_hcd *shared_hcd = xhci->shared_hcd;
>
> usb_remove_hcd(shared_hcd);
> - xhci->shared_hcd = NULL;
> device_init_wakeup(&dev->dev, false);
>
> usb_remove_hcd(hcd);
> usb_put_hcd(shared_hcd);
> + xhci->shared_hcd = NULL;
> usb_put_hcd(hcd);
> xhci_mtk_sch_exit(mtk);
> xhci_mtk_clks_disable(mtk);
>

Could you share details of the NULL pointer dereference, (backtrace).

The USB3 hcd is already removed when xhci->shared_hcd is set to NULL.
We might want to add some checks to make sure we are not using the removed
hcd anymore in that codepath anymore.

-Mathias

2020-02-01 11:24:41

by Macpaul Lin

[permalink] [raw]
Subject: Re: [PATCH] xhci-mtk: Fix NULL pointer dereference with xhci_irq() for shared_hcd

On Fri, 2020-01-31 at 16:50 +0200, Mathias Nyman wrote:
> On 17.1.2020 9.41, Macpaul Lin wrote:
> > According to NULL pointer fix: https://tinyurl.com/uqft5ra
> > xhci: Fix NULL pointer dereference with xhci_irq() for shared_hcd
> > The similar issue has also been found in QC activities in Mediatek.
> >
> > Here quote the description from the referenced patch as follows.
> > "Commit ("f068090426ea xhci: Fix leaking USB3 shared_hcd
> > at xhci removal") sets xhci_shared_hcd to NULL without
> > stopping xhci host. This results into a race condition
> > where shared_hcd (super speed roothub) related interrupts
> > are being handled with xhci_irq happens when the
> > xhci_plat_remove is called and shared_hcd is set to NULL.
> > Fix this by setting the shared_hcd to NULL only after the
> > controller is halted and no interrupts are generated."
> >
> > Signed-off-by: Sriharsha Allenki <[email protected]>
> > Signed-off-by: Macpaul Lin <[email protected]>
> > ---
> > drivers/usb/host/xhci-mtk.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/usb/host/xhci-mtk.c b/drivers/usb/host/xhci-mtk.c
> > index b18a6baef204..c227c67f5dc5 100644
> > --- a/drivers/usb/host/xhci-mtk.c
> > +++ b/drivers/usb/host/xhci-mtk.c
> > @@ -593,11 +593,11 @@ static int xhci_mtk_remove(struct platform_device *dev)
> > struct usb_hcd *shared_hcd = xhci->shared_hcd;
> >
> > usb_remove_hcd(shared_hcd);
> > - xhci->shared_hcd = NULL;
> > device_init_wakeup(&dev->dev, false);
> >
> > usb_remove_hcd(hcd);
> > usb_put_hcd(shared_hcd);
> > + xhci->shared_hcd = NULL;
> > usb_put_hcd(hcd);
> > xhci_mtk_sch_exit(mtk);
> > xhci_mtk_clks_disable(mtk);
> >
>
> Could you share details of the NULL pointer dereference, (backtrace).

This bug was found by our QA staff while doing 500 times plug-in and
plug-out devices. The backtrace I have was recorded by QA and I didn't
reproduce this issue on my own environment. However, after applied this
patch the issue seems resolve. Here is the backtrace:

Exception Class: Kernel (KE)
PC is at [<ffffff8008cccbc0>] xhci_irq+0x728/0x2364
LR is at [<ffffff8008ccc788>] xhci_irq+0x2f0/0x2364

Current Executing Process:
[iptables, 859][netdagent, 770]

Backtrace:
[<ffffff80080ead58>] __atomic_notifier_call_chain+0xa8/0x130
[<ffffff80080eb6d4>] notify_die+0x84/0xac
[<ffffff800808e874>] die+0x1d8/0x3b8
[<ffffff80080a89b0>] __do_kernel_fault+0x178/0x188
[<ffffff80080a81b4>] do_page_fault+0x44/0x3b0
[<ffffff80080a811c>] do_translation_fault+0x44/0x98
[<ffffff8008080e08>] do_mem_abort+0x4c/0x128
[<ffffff80080832d0>] el1_da+0x24/0x3c
[<ffffff8008cccbc0>] xhci_irq+0x728/0x2364
[<ffffff8008c98804>] usb_hcd_irq+0x2c/0x44
[<ffffff8008179bb0>] __handle_irq_event_percpu+0x26c/0x4a4
[<ffffff8008179ec8>] handle_irq_event+0x5c/0xd0
[<ffffff800817e3c0>] handle_fasteoi_irq+0x10c/0x1e0
[<ffffff80081787b0>] __handle_domain_irq+0x32c/0x738
[<ffffff800808159c>] gic_handle_irq+0x174/0x1c4
[<ffffff8008083cf8>] el0_irq_naked+0x50/0x5c
[<ffffffffffffffff>] 0xffffffffffffffff

> The USB3 hcd is already removed when xhci->shared_hcd is set to NULL.
> We might want to add some checks to make sure we are not using the removed
> hcd anymore in that codepath anymore.
>
> -Mathias

Thanks,
Macpaul Lin

2020-02-02 19:30:27

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH] xhci-mtk: Fix NULL pointer dereference with xhci_irq() for shared_hcd

On Fri, Jan 17, 2020 at 03:41:50PM +0800, Macpaul Lin wrote:
> According to NULL pointer fix: https://tinyurl.com/uqft5ra
> xhci: Fix NULL pointer dereference with xhci_irq() for shared_hcd
> The similar issue has also been found in QC activities in Mediatek.
>
> Here quote the description from the referenced patch as follows.
> "Commit ("f068090426ea xhci: Fix leaking USB3 shared_hcd
> at xhci removal") sets xhci_shared_hcd to NULL without
> stopping xhci host. This results into a race condition
> where shared_hcd (super speed roothub) related interrupts
> are being handled with xhci_irq happens when the
> xhci_plat_remove is called and shared_hcd is set to NULL.
> Fix this by setting the shared_hcd to NULL only after the
> controller is halted and no interrupts are generated."
>
> Signed-off-by: Sriharsha Allenki <[email protected]>
> Signed-off-by: Macpaul Lin <[email protected]>
> ---
> drivers/usb/host/xhci-mtk.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/usb/host/xhci-mtk.c b/drivers/usb/host/xhci-mtk.c
> index b18a6baef204..c227c67f5dc5 100644
> --- a/drivers/usb/host/xhci-mtk.c
> +++ b/drivers/usb/host/xhci-mtk.c
> @@ -593,11 +593,11 @@ static int xhci_mtk_remove(struct platform_device *dev)
> struct usb_hcd *shared_hcd = xhci->shared_hcd;
>
> usb_remove_hcd(shared_hcd);
> - xhci->shared_hcd = NULL;
> device_init_wakeup(&dev->dev, false);
>
> usb_remove_hcd(hcd);
> usb_put_hcd(shared_hcd);
> + xhci->shared_hcd = NULL;

I may be wrong, but I have some concerns that this replaces
the NULL pointer access with a UAF.

Guenter

> usb_put_hcd(hcd);
> xhci_mtk_sch_exit(mtk);
> xhci_mtk_clks_disable(mtk);
> --
> 2.18.0

2020-02-04 09:43:46

by Mathias Nyman

[permalink] [raw]
Subject: Re: [PATCH] xhci-mtk: Fix NULL pointer dereference with xhci_irq() for shared_hcd

On 1.2.2020 13.20, Macpaul Lin wrote:
> On Fri, 2020-01-31 at 16:50 +0200, Mathias Nyman wrote:
>> On 17.1.2020 9.41, Macpaul Lin wrote:
>>> According to NULL pointer fix: https://tinyurl.com/uqft5ra
>>> xhci: Fix NULL pointer dereference with xhci_irq() for shared_hcd
>>> The similar issue has also been found in QC activities in Mediatek.
>>>
>>> Here quote the description from the referenced patch as follows.
>>> "Commit ("f068090426ea xhci: Fix leaking USB3 shared_hcd
>>> at xhci removal") sets xhci_shared_hcd to NULL without
>>> stopping xhci host. This results into a race condition
>>> where shared_hcd (super speed roothub) related interrupts
>>> are being handled with xhci_irq happens when the
>>> xhci_plat_remove is called and shared_hcd is set to NULL.
>>> Fix this by setting the shared_hcd to NULL only after the
>>> controller is halted and no interrupts are generated."
>>>
>>> Signed-off-by: Sriharsha Allenki <[email protected]>
>>> Signed-off-by: Macpaul Lin <[email protected]>
>>> ---
>>> drivers/usb/host/xhci-mtk.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/usb/host/xhci-mtk.c b/drivers/usb/host/xhci-mtk.c
>>> index b18a6baef204..c227c67f5dc5 100644
>>> --- a/drivers/usb/host/xhci-mtk.c
>>> +++ b/drivers/usb/host/xhci-mtk.c
>>> @@ -593,11 +593,11 @@ static int xhci_mtk_remove(struct platform_device *dev)
>>> struct usb_hcd *shared_hcd = xhci->shared_hcd;
>>>
>>> usb_remove_hcd(shared_hcd);
>>> - xhci->shared_hcd = NULL;
>>> device_init_wakeup(&dev->dev, false);
>>>
>>> usb_remove_hcd(hcd);
>>> usb_put_hcd(shared_hcd);
>>> + xhci->shared_hcd = NULL;
>>> usb_put_hcd(hcd);
>>> xhci_mtk_sch_exit(mtk);
>>> xhci_mtk_clks_disable(mtk);
>>>
>>
>> Could you share details of the NULL pointer dereference, (backtrace).
>
> This bug was found by our QA staff while doing 500 times plug-in and
> plug-out devices. The backtrace I have was recorded by QA and I didn't
> reproduce this issue on my own environment. However, after applied this
> patch the issue seems resolve. Here is the backtrace:
>
> Exception Class: Kernel (KE)
> PC is at [<ffffff8008cccbc0>] xhci_irq+0x728/0x2364
> LR is at [<ffffff8008ccc788>] xhci_irq+0x2f0/0x2364
>
> Current Executing Process:
> [iptables, 859][netdagent, 770]
>
> Backtrace:
> [<ffffff80080ead58>] __atomic_notifier_call_chain+0xa8/0x130
> [<ffffff80080eb6d4>] notify_die+0x84/0xac
> [<ffffff800808e874>] die+0x1d8/0x3b8
> [<ffffff80080a89b0>] __do_kernel_fault+0x178/0x188
> [<ffffff80080a81b4>] do_page_fault+0x44/0x3b0
> [<ffffff80080a811c>] do_translation_fault+0x44/0x98
> [<ffffff8008080e08>] do_mem_abort+0x4c/0x128
> [<ffffff80080832d0>] el1_da+0x24/0x3c
> [<ffffff8008cccbc0>] xhci_irq+0x728/0x2364
> [<ffffff8008c98804>] usb_hcd_irq+0x2c/0x44
> [<ffffff8008179bb0>] __handle_irq_event_percpu+0x26c/0x4a4
> [<ffffff8008179ec8>] handle_irq_event+0x5c/0xd0
> [<ffffff800817e3c0>] handle_fasteoi_irq+0x10c/0x1e0
> [<ffffff80081787b0>] __handle_domain_irq+0x32c/0x738
> [<ffffff800808159c>] gic_handle_irq+0x174/0x1c4
> [<ffffff8008083cf8>] el0_irq_naked+0x50/0x5c
> [<ffffffffffffffff>] 0xffffffffffffffff
>

Thanks,
Could you help me find out which line of code xhci_irq+0x728 is in your case.

As Guenter pointed out there is a risk of turning the NULL pointer dereference
into a use after free if we just solve this by setting xhci->shared_hcd = NULL
later.

If you still have that kernel around, and xhci is compiled in:
gdb vmlinux
gdb li *(xhci_irq+0x728)

-Mathias

2020-03-04 03:17:01

by Macpaul Lin

[permalink] [raw]
Subject: Re: [PATCH] xhci-mtk: Fix NULL pointer dereference with xhci_irq() for shared_hcd

On Tue, 2020-02-04 at 17:44 +0800, Mathias Nyman wrote:
> On 1.2.2020 13.20, Macpaul Lin wrote:
> > On Fri, 2020-01-31 at 16:50 +0200, Mathias Nyman wrote:
> >> On 17.1.2020 9.41, Macpaul Lin wrote:
> >>> According to NULL pointer fix: https://tinyurl.com/uqft5ra
> >>> xhci: Fix NULL pointer dereference with xhci_irq() for shared_hcd
> >>> The similar issue has also been found in QC activities in Mediatek.
> >>>
> >>> Here quote the description from the referenced patch as follows.
> >>> "Commit ("f068090426ea xhci: Fix leaking USB3 shared_hcd
> >>> at xhci removal") sets xhci_shared_hcd to NULL without
> >>> stopping xhci host. This results into a race condition
> >>> where shared_hcd (super speed roothub) related interrupts
> >>> are being handled with xhci_irq happens when the
> >>> xhci_plat_remove is called and shared_hcd is set to NULL.
> >>> Fix this by setting the shared_hcd to NULL only after the
> >>> controller is halted and no interrupts are generated."
> >>>
> >>> Signed-off-by: Sriharsha Allenki <[email protected]>
> >>> Signed-off-by: Macpaul Lin <[email protected]>
> >>> ---
> >>> drivers/usb/host/xhci-mtk.c | 2 +-
> >>> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/usb/host/xhci-mtk.c b/drivers/usb/host/xhci-mtk.c
> >>> index b18a6baef204..c227c67f5dc5 100644
> >>> --- a/drivers/usb/host/xhci-mtk.c
> >>> +++ b/drivers/usb/host/xhci-mtk.c
> >>> @@ -593,11 +593,11 @@ static int xhci_mtk_remove(struct platform_device *dev)
> >>> struct usb_hcd *shared_hcd = xhci->shared_hcd;
> >>>
> >>> usb_remove_hcd(shared_hcd);
> >>> - xhci->shared_hcd = NULL;
> >>> device_init_wakeup(&dev->dev, false);
> >>>
> >>> usb_remove_hcd(hcd);
> >>> usb_put_hcd(shared_hcd);
> >>> + xhci->shared_hcd = NULL;
> >>> usb_put_hcd(hcd);
> >>> xhci_mtk_sch_exit(mtk);
> >>> xhci_mtk_clks_disable(mtk);
> >>>
> >>
> >> Could you share details of the NULL pointer dereference, (backtrace).
> >
> > This bug was found by our QA staff while doing 500 times plug-in and
> > plug-out devices. The backtrace I have was recorded by QA and I didn't
> > reproduce this issue on my own environment. However, after applied this
> > patch the issue seems resolve. Here is the backtrace:
> >
> > Exception Class: Kernel (KE)
> > PC is at [<ffffff8008cccbc0>] xhci_irq+0x728/0x2364
> > LR is at [<ffffff8008ccc788>] xhci_irq+0x2f0/0x2364
> >
> > Current Executing Process:
> > [iptables, 859][netdagent, 770]
> >
> > Backtrace:
> > [<ffffff80080ead58>] __atomic_notifier_call_chain+0xa8/0x130
> > [<ffffff80080eb6d4>] notify_die+0x84/0xac
> > [<ffffff800808e874>] die+0x1d8/0x3b8
> > [<ffffff80080a89b0>] __do_kernel_fault+0x178/0x188
> > [<ffffff80080a81b4>] do_page_fault+0x44/0x3b0
> > [<ffffff80080a811c>] do_translation_fault+0x44/0x98
> > [<ffffff8008080e08>] do_mem_abort+0x4c/0x128
> > [<ffffff80080832d0>] el1_da+0x24/0x3c
> > [<ffffff8008cccbc0>] xhci_irq+0x728/0x2364
> > [<ffffff8008c98804>] usb_hcd_irq+0x2c/0x44
> > [<ffffff8008179bb0>] __handle_irq_event_percpu+0x26c/0x4a4
> > [<ffffff8008179ec8>] handle_irq_event+0x5c/0xd0
> > [<ffffff800817e3c0>] handle_fasteoi_irq+0x10c/0x1e0
> > [<ffffff80081787b0>] __handle_domain_irq+0x32c/0x738
> > [<ffffff800808159c>] gic_handle_irq+0x174/0x1c4
> > [<ffffff8008083cf8>] el0_irq_naked+0x50/0x5c
> > [<ffffffffffffffff>] 0xffffffffffffffff
> >
>
> Thanks,
> Could you help me find out which line of code xhci_irq+0x728 is in your case.
>
> As Guenter pointed out there is a risk of turning the NULL pointer dereference
> into a use after free if we just solve this by setting xhci->shared_hcd = NULL
> later.
>
> If you still have that kernel around, and xhci is compiled in:
> gdb vmlinux
> gdb li *(xhci_irq+0x728)
>

Sorry that I couldn't get back to you soon. The internal code version
for this issue was really old and a little bit difficult to rewind to
that version.
However, I think the following dump might be correct for the code base.

(gdb) li *(xhci_irq+0x728)
0xffffff8008cc8634 is in xhci_irq (*stripped*
kernel-4.14/drivers/usb/host/xhci.h:1694).
1689 */
1690 #define XHCI_MAX_REXIT_TIMEOUT_MS 20
1691
1692 static inline unsigned int hcd_index(struct usb_hcd *hcd)
1693 {
1694 if (hcd->speed >= HCD_USB3)
1695 return 0;
1696 else
1697 return 1;
1698 }
(gdb)

Thanks
Macpaul Lin

2020-03-04 14:38:09

by Mathias Nyman

[permalink] [raw]
Subject: Re: [PATCH] xhci-mtk: Fix NULL pointer dereference with xhci_irq() for shared_hcd

On 4.3.2020 5.16, Macpaul Lin wrote:
> On Tue, 2020-02-04 at 17:44 +0800, Mathias Nyman wrote:
>> On 1.2.2020 13.20, Macpaul Lin wrote:
>>> On Fri, 2020-01-31 at 16:50 +0200, Mathias Nyman wrote:
>>>> On 17.1.2020 9.41, Macpaul Lin wrote:
>>>>> According to NULL pointer fix: https://tinyurl.com/uqft5ra
>>>>> xhci: Fix NULL pointer dereference with xhci_irq() for shared_hcd
>>>>> The similar issue has also been found in QC activities in Mediatek.
>>>>>
>>>>> Here quote the description from the referenced patch as follows.
>>>>> "Commit ("f068090426ea xhci: Fix leaking USB3 shared_hcd
>>>>> at xhci removal") sets xhci_shared_hcd to NULL without
>>>>> stopping xhci host. This results into a race condition
>>>>> where shared_hcd (super speed roothub) related interrupts
>>>>> are being handled with xhci_irq happens when the
>>>>> xhci_plat_remove is called and shared_hcd is set to NULL.
>>>>> Fix this by setting the shared_hcd to NULL only after the
>>>>> controller is halted and no interrupts are generated."
>>>>>
>>>>> Signed-off-by: Sriharsha Allenki <[email protected]>
>>>>> Signed-off-by: Macpaul Lin <[email protected]>
>>>>> ---
>>>>> drivers/usb/host/xhci-mtk.c | 2 +-
>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/usb/host/xhci-mtk.c b/drivers/usb/host/xhci-mtk.c
>>>>> index b18a6baef204..c227c67f5dc5 100644
>>>>> --- a/drivers/usb/host/xhci-mtk.c
>>>>> +++ b/drivers/usb/host/xhci-mtk.c
>>>>> @@ -593,11 +593,11 @@ static int xhci_mtk_remove(struct platform_device *dev)
>>>>> struct usb_hcd *shared_hcd = xhci->shared_hcd;
>>>>>
>>>>> usb_remove_hcd(shared_hcd);
>>>>> - xhci->shared_hcd = NULL;
>>>>> device_init_wakeup(&dev->dev, false);
>>>>>
>>>>> usb_remove_hcd(hcd);
>>>>> usb_put_hcd(shared_hcd);
>>>>> + xhci->shared_hcd = NULL;
>>>>> usb_put_hcd(hcd);
>>>>> xhci_mtk_sch_exit(mtk);
>>>>> xhci_mtk_clks_disable(mtk);
>>>>>
>>>>
>>>> Could you share details of the NULL pointer dereference, (backtrace).
>>>
>>> This bug was found by our QA staff while doing 500 times plug-in and
>>> plug-out devices. The backtrace I have was recorded by QA and I didn't
>>> reproduce this issue on my own environment. However, after applied this
>>> patch the issue seems resolve. Here is the backtrace:
>>>
>>> Exception Class: Kernel (KE)
>>> PC is at [<ffffff8008cccbc0>] xhci_irq+0x728/0x2364
>>> LR is at [<ffffff8008ccc788>] xhci_irq+0x2f0/0x2364
>>>
>>> Current Executing Process:
>>> [iptables, 859][netdagent, 770]
>>>
>>> Backtrace:
>>> [<ffffff80080ead58>] __atomic_notifier_call_chain+0xa8/0x130
>>> [<ffffff80080eb6d4>] notify_die+0x84/0xac
>>> [<ffffff800808e874>] die+0x1d8/0x3b8
>>> [<ffffff80080a89b0>] __do_kernel_fault+0x178/0x188
>>> [<ffffff80080a81b4>] do_page_fault+0x44/0x3b0
>>> [<ffffff80080a811c>] do_translation_fault+0x44/0x98
>>> [<ffffff8008080e08>] do_mem_abort+0x4c/0x128
>>> [<ffffff80080832d0>] el1_da+0x24/0x3c
>>> [<ffffff8008cccbc0>] xhci_irq+0x728/0x2364
>>> [<ffffff8008c98804>] usb_hcd_irq+0x2c/0x44
>>> [<ffffff8008179bb0>] __handle_irq_event_percpu+0x26c/0x4a4
>>> [<ffffff8008179ec8>] handle_irq_event+0x5c/0xd0
>>> [<ffffff800817e3c0>] handle_fasteoi_irq+0x10c/0x1e0
>>> [<ffffff80081787b0>] __handle_domain_irq+0x32c/0x738
>>> [<ffffff800808159c>] gic_handle_irq+0x174/0x1c4
>>> [<ffffff8008083cf8>] el0_irq_naked+0x50/0x5c
>>> [<ffffffffffffffff>] 0xffffffffffffffff
>>>
>>
>> Thanks,
>> Could you help me find out which line of code xhci_irq+0x728 is in your case.
>>
>> As Guenter pointed out there is a risk of turning the NULL pointer dereference
>> into a use after free if we just solve this by setting xhci->shared_hcd = NULL
>> later.
>>
>> If you still have that kernel around, and xhci is compiled in:
>> gdb vmlinux
>> gdb li *(xhci_irq+0x728)
>>
>
> Sorry that I couldn't get back to you soon. The internal code version
> for this issue was really old and a little bit difficult to rewind to
> that version.
> However, I think the following dump might be correct for the code base.
>
> (gdb) li *(xhci_irq+0x728)
> 0xffffff8008cc8634 is in xhci_irq (*stripped*
> kernel-4.14/drivers/usb/host/xhci.h:1694).
> 1689 */
> 1690 #define XHCI_MAX_REXIT_TIMEOUT_MS 20
> 1691
> 1692 static inline unsigned int hcd_index(struct usb_hcd *hcd)
> 1693 {
> 1694 if (hcd->speed >= HCD_USB3)
> 1695 return 0;
> 1696 else
> 1697 return 1;
> 1698 }
> (gdb)
>
> Thanks
> Macpaul Lin
>

Ah, it was a 4.14 kernel.
This should be fixed in 4.20 with patch:
1245374e9b83 xhci: handle port status events for removed USB3 hcd

Port arrays/structures were changed completely in 4.18

Something like the below should work for 4.14:

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 61fa3007a74a..e7367b9f19c5 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -1640,6 +1640,12 @@ static void handle_port_status(struct xhci_hcd *xhci,
if ((major_revision == 0x03) != (hcd->speed >= HCD_USB3))
hcd = xhci->shared_hcd;

+ if (!hcd) {
+ xhci_dbg(xhci, "No hcd found for port %u event\n", port_id);
+ bogus_port_status = true;
+ goto cleanup;
+ }
+
if (major_revision == 0) {
xhci_warn(xhci, "Event for port %u not in "
"Extended Capabilities, ignoring.\n",


-Mathias

2020-03-05 02:59:55

by Macpaul Lin

[permalink] [raw]
Subject: Re: [PATCH] xhci-mtk: Fix NULL pointer dereference with xhci_irq() for shared_hcd

On Wed, 2020-03-04 at 16:39 +0200, Mathias Nyman wrote:
> On 4.3.2020 5.16, Macpaul Lin wrote:
> > On Tue, 2020-02-04 at 17:44 +0800, Mathias Nyman wrote:
> >> On 1.2.2020 13.20, Macpaul Lin wrote:
> >>> On Fri, 2020-01-31 at 16:50 +0200, Mathias Nyman wrote:
> >>>> On 17.1.2020 9.41, Macpaul Lin wrote:
> >>>>> According to NULL pointer fix: https://tinyurl.com/uqft5ra
> >>>>> xhci: Fix NULL pointer dereference with xhci_irq() for shared_hcd
> >>>>> The similar issue has also been found in QC activities in Mediatek.
> >>>>>
> >>>>> Here quote the description from the referenced patch as follows.
> >>>>> "Commit ("f068090426ea xhci: Fix leaking USB3 shared_hcd
> >>>>> at xhci removal") sets xhci_shared_hcd to NULL without
> >>>>> stopping xhci host. This results into a race condition
> >>>>> where shared_hcd (super speed roothub) related interrupts
> >>>>> are being handled with xhci_irq happens when the
> >>>>> xhci_plat_remove is called and shared_hcd is set to NULL.
> >>>>> Fix this by setting the shared_hcd to NULL only after the
> >>>>> controller is halted and no interrupts are generated."
> >>>>>
> >>>>> Signed-off-by: Sriharsha Allenki <[email protected]>
> >>>>> Signed-off-by: Macpaul Lin <[email protected]>
> >>>>> ---
> >>>>> drivers/usb/host/xhci-mtk.c | 2 +-
> >>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>>
> >>>>> diff --git a/drivers/usb/host/xhci-mtk.c b/drivers/usb/host/xhci-mtk.c
> >>>>> index b18a6baef204..c227c67f5dc5 100644
> >>>>> --- a/drivers/usb/host/xhci-mtk.c
> >>>>> +++ b/drivers/usb/host/xhci-mtk.c
> >>>>> @@ -593,11 +593,11 @@ static int xhci_mtk_remove(struct platform_device *dev)
> >>>>> struct usb_hcd *shared_hcd = xhci->shared_hcd;
> >>>>>
> >>>>> usb_remove_hcd(shared_hcd);
> >>>>> - xhci->shared_hcd = NULL;
> >>>>> device_init_wakeup(&dev->dev, false);
> >>>>>
> >>>>> usb_remove_hcd(hcd);
> >>>>> usb_put_hcd(shared_hcd);
> >>>>> + xhci->shared_hcd = NULL;
> >>>>> usb_put_hcd(hcd);
> >>>>> xhci_mtk_sch_exit(mtk);
> >>>>> xhci_mtk_clks_disable(mtk);
> >>>>>
> >>>>
> >>>> Could you share details of the NULL pointer dereference, (backtrace).
> >>>
> >>> This bug was found by our QA staff while doing 500 times plug-in and
> >>> plug-out devices. The backtrace I have was recorded by QA and I didn't
> >>> reproduce this issue on my own environment. However, after applied this
> >>> patch the issue seems resolve. Here is the backtrace:
> >>>
> >>> Exception Class: Kernel (KE)
> >>> PC is at [<ffffff8008cccbc0>] xhci_irq+0x728/0x2364
> >>> LR is at [<ffffff8008ccc788>] xhci_irq+0x2f0/0x2364
> >>>
> >>> Current Executing Process:
> >>> [iptables, 859][netdagent, 770]
> >>>
> >>> Backtrace:
> >>> [<ffffff80080ead58>] __atomic_notifier_call_chain+0xa8/0x130
> >>> [<ffffff80080eb6d4>] notify_die+0x84/0xac
> >>> [<ffffff800808e874>] die+0x1d8/0x3b8
> >>> [<ffffff80080a89b0>] __do_kernel_fault+0x178/0x188
> >>> [<ffffff80080a81b4>] do_page_fault+0x44/0x3b0
> >>> [<ffffff80080a811c>] do_translation_fault+0x44/0x98
> >>> [<ffffff8008080e08>] do_mem_abort+0x4c/0x128
> >>> [<ffffff80080832d0>] el1_da+0x24/0x3c
> >>> [<ffffff8008cccbc0>] xhci_irq+0x728/0x2364
> >>> [<ffffff8008c98804>] usb_hcd_irq+0x2c/0x44
> >>> [<ffffff8008179bb0>] __handle_irq_event_percpu+0x26c/0x4a4
> >>> [<ffffff8008179ec8>] handle_irq_event+0x5c/0xd0
> >>> [<ffffff800817e3c0>] handle_fasteoi_irq+0x10c/0x1e0
> >>> [<ffffff80081787b0>] __handle_domain_irq+0x32c/0x738
> >>> [<ffffff800808159c>] gic_handle_irq+0x174/0x1c4
> >>> [<ffffff8008083cf8>] el0_irq_naked+0x50/0x5c
> >>> [<ffffffffffffffff>] 0xffffffffffffffff
> >>>
> >>
> >> Thanks,
> >> Could you help me find out which line of code xhci_irq+0x728 is in your case.
> >>
> >> As Guenter pointed out there is a risk of turning the NULL pointer dereference
> >> into a use after free if we just solve this by setting xhci->shared_hcd = NULL
> >> later.
> >>
> >> If you still have that kernel around, and xhci is compiled in:
> >> gdb vmlinux
> >> gdb li *(xhci_irq+0x728)
> >>
> >
> > Sorry that I couldn't get back to you soon. The internal code version
> > for this issue was really old and a little bit difficult to rewind to
> > that version.
> > However, I think the following dump might be correct for the code base.
> >
> > (gdb) li *(xhci_irq+0x728)
> > 0xffffff8008cc8634 is in xhci_irq (*stripped*
> > kernel-4.14/drivers/usb/host/xhci.h:1694).
> > 1689 */
> > 1690 #define XHCI_MAX_REXIT_TIMEOUT_MS 20
> > 1691
> > 1692 static inline unsigned int hcd_index(struct usb_hcd *hcd)
> > 1693 {
> > 1694 if (hcd->speed >= HCD_USB3)
> > 1695 return 0;
> > 1696 else
> > 1697 return 1;
> > 1698 }
> > (gdb)
> >
> > Thanks
> > Macpaul Lin
> >
>
> Ah, it was a 4.14 kernel.
> This should be fixed in 4.20 with patch:
> 1245374e9b83 xhci: handle port status events for removed USB3 hcd
>
> Port arrays/structures were changed completely in 4.18
>
> Something like the below should work for 4.14:
>
> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
> index 61fa3007a74a..e7367b9f19c5 100644
> --- a/drivers/usb/host/xhci-ring.c
> +++ b/drivers/usb/host/xhci-ring.c
> @@ -1640,6 +1640,12 @@ static void handle_port_status(struct xhci_hcd *xhci,
> if ((major_revision == 0x03) != (hcd->speed >= HCD_USB3))
> hcd = xhci->shared_hcd;
>
> + if (!hcd) {
> + xhci_dbg(xhci, "No hcd found for port %u event\n", port_id);
> + bogus_port_status = true;
> + goto cleanup;
> + }
> +
> if (major_revision == 0) {
> xhci_warn(xhci, "Event for port %u not in "
> "Extended Capabilities, ignoring.\n",

Thanks for this suggestion, this is much better! I am sorry that we're
using android kernel that some reported issue might be out of date. I
will update the suggestion into our code base. Thanks!

Regards,
Macpaul Lin

2020-03-05 18:33:50

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH] xhci-mtk: Fix NULL pointer dereference with xhci_irq() for shared_hcd

On Thu, Mar 05, 2020 at 10:58:46AM +0800, Macpaul Lin wrote:
> On Wed, 2020-03-04 at 16:39 +0200, Mathias Nyman wrote:
> > On 4.3.2020 5.16, Macpaul Lin wrote:
> > > On Tue, 2020-02-04 at 17:44 +0800, Mathias Nyman wrote:
> > >> On 1.2.2020 13.20, Macpaul Lin wrote:
> > >>> On Fri, 2020-01-31 at 16:50 +0200, Mathias Nyman wrote:
> > >>>> On 17.1.2020 9.41, Macpaul Lin wrote:
> > >>>>> According to NULL pointer fix: https://tinyurl.com/uqft5ra
> > >>>>> xhci: Fix NULL pointer dereference with xhci_irq() for shared_hcd
> > >>>>> The similar issue has also been found in QC activities in Mediatek.
> > >>>>>
> > >>>>> Here quote the description from the referenced patch as follows.
> > >>>>> "Commit ("f068090426ea xhci: Fix leaking USB3 shared_hcd
> > >>>>> at xhci removal") sets xhci_shared_hcd to NULL without
> > >>>>> stopping xhci host. This results into a race condition
> > >>>>> where shared_hcd (super speed roothub) related interrupts
> > >>>>> are being handled with xhci_irq happens when the
> > >>>>> xhci_plat_remove is called and shared_hcd is set to NULL.
> > >>>>> Fix this by setting the shared_hcd to NULL only after the
> > >>>>> controller is halted and no interrupts are generated."
> > >>>>>
> > >>>>> Signed-off-by: Sriharsha Allenki <[email protected]>
> > >>>>> Signed-off-by: Macpaul Lin <[email protected]>
> > >>>>> ---
> > >>>>> drivers/usb/host/xhci-mtk.c | 2 +-
> > >>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
> > >>>>>
> > >>>>> diff --git a/drivers/usb/host/xhci-mtk.c b/drivers/usb/host/xhci-mtk.c
> > >>>>> index b18a6baef204..c227c67f5dc5 100644
> > >>>>> --- a/drivers/usb/host/xhci-mtk.c
> > >>>>> +++ b/drivers/usb/host/xhci-mtk.c
> > >>>>> @@ -593,11 +593,11 @@ static int xhci_mtk_remove(struct platform_device *dev)
> > >>>>> struct usb_hcd *shared_hcd = xhci->shared_hcd;
> > >>>>>
> > >>>>> usb_remove_hcd(shared_hcd);
> > >>>>> - xhci->shared_hcd = NULL;
> > >>>>> device_init_wakeup(&dev->dev, false);
> > >>>>>
> > >>>>> usb_remove_hcd(hcd);
> > >>>>> usb_put_hcd(shared_hcd);
> > >>>>> + xhci->shared_hcd = NULL;
> > >>>>> usb_put_hcd(hcd);
> > >>>>> xhci_mtk_sch_exit(mtk);
> > >>>>> xhci_mtk_clks_disable(mtk);
> > >>>>>
> > >>>>
> > >>>> Could you share details of the NULL pointer dereference, (backtrace).
> > >>>
> > >>> This bug was found by our QA staff while doing 500 times plug-in and
> > >>> plug-out devices. The backtrace I have was recorded by QA and I didn't
> > >>> reproduce this issue on my own environment. However, after applied this
> > >>> patch the issue seems resolve. Here is the backtrace:
> > >>>
> > >>> Exception Class: Kernel (KE)
> > >>> PC is at [<ffffff8008cccbc0>] xhci_irq+0x728/0x2364
> > >>> LR is at [<ffffff8008ccc788>] xhci_irq+0x2f0/0x2364
> > >>>
> > >>> Current Executing Process:
> > >>> [iptables, 859][netdagent, 770]
> > >>>
> > >>> Backtrace:
> > >>> [<ffffff80080ead58>] __atomic_notifier_call_chain+0xa8/0x130
> > >>> [<ffffff80080eb6d4>] notify_die+0x84/0xac
> > >>> [<ffffff800808e874>] die+0x1d8/0x3b8
> > >>> [<ffffff80080a89b0>] __do_kernel_fault+0x178/0x188
> > >>> [<ffffff80080a81b4>] do_page_fault+0x44/0x3b0
> > >>> [<ffffff80080a811c>] do_translation_fault+0x44/0x98
> > >>> [<ffffff8008080e08>] do_mem_abort+0x4c/0x128
> > >>> [<ffffff80080832d0>] el1_da+0x24/0x3c
> > >>> [<ffffff8008cccbc0>] xhci_irq+0x728/0x2364
> > >>> [<ffffff8008c98804>] usb_hcd_irq+0x2c/0x44
> > >>> [<ffffff8008179bb0>] __handle_irq_event_percpu+0x26c/0x4a4
> > >>> [<ffffff8008179ec8>] handle_irq_event+0x5c/0xd0
> > >>> [<ffffff800817e3c0>] handle_fasteoi_irq+0x10c/0x1e0
> > >>> [<ffffff80081787b0>] __handle_domain_irq+0x32c/0x738
> > >>> [<ffffff800808159c>] gic_handle_irq+0x174/0x1c4
> > >>> [<ffffff8008083cf8>] el0_irq_naked+0x50/0x5c
> > >>> [<ffffffffffffffff>] 0xffffffffffffffff
> > >>>
> > >>
> > >> Thanks,
> > >> Could you help me find out which line of code xhci_irq+0x728 is in your case.
> > >>
> > >> As Guenter pointed out there is a risk of turning the NULL pointer dereference
> > >> into a use after free if we just solve this by setting xhci->shared_hcd = NULL
> > >> later.
> > >>
> > >> If you still have that kernel around, and xhci is compiled in:
> > >> gdb vmlinux
> > >> gdb li *(xhci_irq+0x728)
> > >>
> > >
> > > Sorry that I couldn't get back to you soon. The internal code version
> > > for this issue was really old and a little bit difficult to rewind to
> > > that version.
> > > However, I think the following dump might be correct for the code base.
> > >
> > > (gdb) li *(xhci_irq+0x728)
> > > 0xffffff8008cc8634 is in xhci_irq (*stripped*
> > > kernel-4.14/drivers/usb/host/xhci.h:1694).
> > > 1689 */
> > > 1690 #define XHCI_MAX_REXIT_TIMEOUT_MS 20
> > > 1691
> > > 1692 static inline unsigned int hcd_index(struct usb_hcd *hcd)
> > > 1693 {
> > > 1694 if (hcd->speed >= HCD_USB3)
> > > 1695 return 0;
> > > 1696 else
> > > 1697 return 1;
> > > 1698 }
> > > (gdb)
> > >
> > > Thanks
> > > Macpaul Lin
> > >
> >
> > Ah, it was a 4.14 kernel.
> > This should be fixed in 4.20 with patch:
> > 1245374e9b83 xhci: handle port status events for removed USB3 hcd
> >
> > Port arrays/structures were changed completely in 4.18
> >
> > Something like the below should work for 4.14:
> >
> > diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
> > index 61fa3007a74a..e7367b9f19c5 100644
> > --- a/drivers/usb/host/xhci-ring.c
> > +++ b/drivers/usb/host/xhci-ring.c
> > @@ -1640,6 +1640,12 @@ static void handle_port_status(struct xhci_hcd *xhci,
> > if ((major_revision == 0x03) != (hcd->speed >= HCD_USB3))
> > hcd = xhci->shared_hcd;
> >
> > + if (!hcd) {
> > + xhci_dbg(xhci, "No hcd found for port %u event\n", port_id);
> > + bogus_port_status = true;
> > + goto cleanup;
> > + }
> > +
> > if (major_revision == 0) {
> > xhci_warn(xhci, "Event for port %u not in "
> > "Extended Capabilities, ignoring.\n",
>
> Thanks for this suggestion, this is much better! I am sorry that we're
> using android kernel that some reported issue might be out of date. I
> will update the suggestion into our code base. Thanks!

Should I backport this to 4.14 and older kernels to prevent this issue
from showing up in newer Android devices that are using these older
kernels?

thanks,

greg k-h

2020-03-06 02:37:14

by Macpaul Lin

[permalink] [raw]
Subject: Re: [PATCH] xhci-mtk: Fix NULL pointer dereference with xhci_irq() for shared_hcd

On Thu, 2020-03-05 at 19:32 +0100, Greg Kroah-Hartman wrote:
> On Thu, Mar 05, 2020 at 10:58:46AM +0800, Macpaul Lin wrote:
> > On Wed, 2020-03-04 at 16:39 +0200, Mathias Nyman wrote:
> > > On 4.3.2020 5.16, Macpaul Lin wrote:
> > > > On Tue, 2020-02-04 at 17:44 +0800, Mathias Nyman wrote:
> > > >> On 1.2.2020 13.20, Macpaul Lin wrote:
> > > >>> On Fri, 2020-01-31 at 16:50 +0200, Mathias Nyman wrote:
> > > >>>> On 17.1.2020 9.41, Macpaul Lin wrote:
> > > >>>>> According to NULL pointer fix: https://tinyurl.com/uqft5ra
> > > >>>>> xhci: Fix NULL pointer dereference with xhci_irq() for shared_hcd
> > > >>>>> The similar issue has also been found in QC activities in Mediatek.
> > > >>>>>
> > > >>>>> Here quote the description from the referenced patch as follows.
> > > >>>>> "Commit ("f068090426ea xhci: Fix leaking USB3 shared_hcd
> > > >>>>> at xhci removal") sets xhci_shared_hcd to NULL without
> > > >>>>> stopping xhci host. This results into a race condition
> > > >>>>> where shared_hcd (super speed roothub) related interrupts
> > > >>>>> are being handled with xhci_irq happens when the
> > > >>>>> xhci_plat_remove is called and shared_hcd is set to NULL.
> > > >>>>> Fix this by setting the shared_hcd to NULL only after the
> > > >>>>> controller is halted and no interrupts are generated."
> > > >>>>>
> > > >>>>> Signed-off-by: Sriharsha Allenki <[email protected]>
> > > >>>>> Signed-off-by: Macpaul Lin <[email protected]>
> > > >>>>> ---
> > > >>>>> drivers/usb/host/xhci-mtk.c | 2 +-
> > > >>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
> > > >>>>>
> > > >>>>> diff --git a/drivers/usb/host/xhci-mtk.c b/drivers/usb/host/xhci-mtk.c
> > > >>>>> index b18a6baef204..c227c67f5dc5 100644
> > > >>>>> --- a/drivers/usb/host/xhci-mtk.c
> > > >>>>> +++ b/drivers/usb/host/xhci-mtk.c
> > > >>>>> @@ -593,11 +593,11 @@ static int xhci_mtk_remove(struct platform_device *dev)
> > > >>>>> struct usb_hcd *shared_hcd = xhci->shared_hcd;
> > > >>>>>
> > > >>>>> usb_remove_hcd(shared_hcd);
> > > >>>>> - xhci->shared_hcd = NULL;
> > > >>>>> device_init_wakeup(&dev->dev, false);
> > > >>>>>
> > > >>>>> usb_remove_hcd(hcd);
> > > >>>>> usb_put_hcd(shared_hcd);
> > > >>>>> + xhci->shared_hcd = NULL;
> > > >>>>> usb_put_hcd(hcd);
> > > >>>>> xhci_mtk_sch_exit(mtk);
> > > >>>>> xhci_mtk_clks_disable(mtk);
> > > >>>>>
> > > >>>>
> > > >>>> Could you share details of the NULL pointer dereference, (backtrace).
> > > >>>
> > > >>> This bug was found by our QA staff while doing 500 times plug-in and
> > > >>> plug-out devices. The backtrace I have was recorded by QA and I didn't
> > > >>> reproduce this issue on my own environment. However, after applied this
> > > >>> patch the issue seems resolve. Here is the backtrace:
> > > >>>
> > > >>> Exception Class: Kernel (KE)
> > > >>> PC is at [<ffffff8008cccbc0>] xhci_irq+0x728/0x2364
> > > >>> LR is at [<ffffff8008ccc788>] xhci_irq+0x2f0/0x2364
> > > >>>
> > > >>> Current Executing Process:
> > > >>> [iptables, 859][netdagent, 770]
> > > >>>
> > > >>> Backtrace:
> > > >>> [<ffffff80080ead58>] __atomic_notifier_call_chain+0xa8/0x130
> > > >>> [<ffffff80080eb6d4>] notify_die+0x84/0xac
> > > >>> [<ffffff800808e874>] die+0x1d8/0x3b8
> > > >>> [<ffffff80080a89b0>] __do_kernel_fault+0x178/0x188
> > > >>> [<ffffff80080a81b4>] do_page_fault+0x44/0x3b0
> > > >>> [<ffffff80080a811c>] do_translation_fault+0x44/0x98
> > > >>> [<ffffff8008080e08>] do_mem_abort+0x4c/0x128
> > > >>> [<ffffff80080832d0>] el1_da+0x24/0x3c
> > > >>> [<ffffff8008cccbc0>] xhci_irq+0x728/0x2364
> > > >>> [<ffffff8008c98804>] usb_hcd_irq+0x2c/0x44
> > > >>> [<ffffff8008179bb0>] __handle_irq_event_percpu+0x26c/0x4a4
> > > >>> [<ffffff8008179ec8>] handle_irq_event+0x5c/0xd0
> > > >>> [<ffffff800817e3c0>] handle_fasteoi_irq+0x10c/0x1e0
> > > >>> [<ffffff80081787b0>] __handle_domain_irq+0x32c/0x738
> > > >>> [<ffffff800808159c>] gic_handle_irq+0x174/0x1c4
> > > >>> [<ffffff8008083cf8>] el0_irq_naked+0x50/0x5c
> > > >>> [<ffffffffffffffff>] 0xffffffffffffffff
> > > >>>
> > > >>
> > > >> Thanks,
> > > >> Could you help me find out which line of code xhci_irq+0x728 is in your case.
> > > >>
> > > >> As Guenter pointed out there is a risk of turning the NULL pointer dereference
> > > >> into a use after free if we just solve this by setting xhci->shared_hcd = NULL
> > > >> later.
> > > >>
> > > >> If you still have that kernel around, and xhci is compiled in:
> > > >> gdb vmlinux
> > > >> gdb li *(xhci_irq+0x728)
> > > >>
> > > >
> > > > Sorry that I couldn't get back to you soon. The internal code version
> > > > for this issue was really old and a little bit difficult to rewind to
> > > > that version.
> > > > However, I think the following dump might be correct for the code base.
> > > >
> > > > (gdb) li *(xhci_irq+0x728)
> > > > 0xffffff8008cc8634 is in xhci_irq (*stripped*
> > > > kernel-4.14/drivers/usb/host/xhci.h:1694).
> > > > 1689 */
> > > > 1690 #define XHCI_MAX_REXIT_TIMEOUT_MS 20
> > > > 1691
> > > > 1692 static inline unsigned int hcd_index(struct usb_hcd *hcd)
> > > > 1693 {
> > > > 1694 if (hcd->speed >= HCD_USB3)
> > > > 1695 return 0;
> > > > 1696 else
> > > > 1697 return 1;
> > > > 1698 }
> > > > (gdb)
> > > >
> > > > Thanks
> > > > Macpaul Lin
> > > >
> > >
> > > Ah, it was a 4.14 kernel.
> > > This should be fixed in 4.20 with patch:
> > > 1245374e9b83 xhci: handle port status events for removed USB3 hcd
> > >
> > > Port arrays/structures were changed completely in 4.18
> > >
> > > Something like the below should work for 4.14:
> > >
> > > diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
> > > index 61fa3007a74a..e7367b9f19c5 100644
> > > --- a/drivers/usb/host/xhci-ring.c
> > > +++ b/drivers/usb/host/xhci-ring.c
> > > @@ -1640,6 +1640,12 @@ static void handle_port_status(struct xhci_hcd *xhci,
> > > if ((major_revision == 0x03) != (hcd->speed >= HCD_USB3))
> > > hcd = xhci->shared_hcd;
> > >
> > > + if (!hcd) {
> > > + xhci_dbg(xhci, "No hcd found for port %u event\n", port_id);
> > > + bogus_port_status = true;
> > > + goto cleanup;
> > > + }
> > > +
> > > if (major_revision == 0) {
> > > xhci_warn(xhci, "Event for port %u not in "
> > > "Extended Capabilities, ignoring.\n",
> >
> > Thanks for this suggestion, this is much better! I am sorry that we're
> > using android kernel that some reported issue might be out of date. I
> > will update the suggestion into our code base. Thanks!
>
> Should I backport this to 4.14 and older kernels to prevent this issue
> from showing up in newer Android devices that are using these older
> kernels?
>
> thanks,
>
> greg k-h

If this could be backported to older kernel that will be great for newer
Android devices. Some of the shipping devices will have requirement of
kernel upgrade. Hence if you could backport this patch will be great.

Thanks!

Regards,
Macpaul Lin