2021-12-02 15:00:59

by Robert Munteanu

Subject: Regression: plugging in USB scanner breaks all USB functionality

Hi,

After updating from kernel 5.14.11 to 5.14.14 I am seeing the following
problem:

When plugging in a USB scanner (Brother DSMobile DS-740D) to my
Lenovo P52 laptop I lose connection to all USB devices. Not only are
the devices no longer available on the host, but no power is drawn by
them. Only a reboot fixes the problem.

The scanner is the only device that triggers the problem, even when it
is the only device plugged in. I have a host of other devices,
connected either directly or via a USB hub in my monitor:

- keyboard
- mouse
- Logitech Brio webcam
- YubiKey
- Stream Deck
- microphone

None of these cause any issues.

I have tried the following kernels (packaged for openSUSE Tumbleweed),
and none of them fixed the issue:

- 5.15.2
- 5.15.5
- 5.16~rc3-1.1.ge8ae228

The problem does not appear if the scanner is connected when the laptop
is shut down. The scanner seems to have an init phase of about 6-7 seconds
(blinking green LED) and then stays on. However, it is not detected via
lsusb or scanimage -L.

The problem does not appear on a desktop class machine (ASUS Prime
X470-PRO/Ryzen 3700X).

The relevant parts of the kernel log seem to be:

Nov 22 11:53:18 rombert kernel: xhci_hcd 0000:00:14.0: Abort failed to stop command ring: -110
Nov 22 11:53:18 rombert kernel: xhci_hcd 0000:00:14.0: xHCI host controller not responding, assume dead
Nov 22 11:53:18 rombert kernel: xhci_hcd 0000:00:14.0: HC died; cleaning up

I initially reported this at
https://bugzilla.opensuse.org/show_bug.cgi?id=1192569 and CC'ed the
distribution's kernel maintainer.

Please let me know if additional information is needed.

Regards,
Robert Munteanu


2021-12-02 15:13:48

by Thorsten Leemhuis

Subject: Re: Regression: plugging in USB scanner breaks all USB functionality

Hi, this is your Linux kernel regression tracker speaking.

Thanks for the report.

Top-posting for once, to make this easily accessible to everyone.

FWIW, 5.14 is EOL, so it might not be fixed there. As the problem is in
newer kernels as well, I suspect that it was a change applied to 5.15 or
5.16 that got backported. Maybe one of the developers might have an idea
which commit causes it. If that's not the case you likely should try a
bisection to find the culprit. Performing one between v5.14.11..v5.14.14
is likely the easiest and quickest way to find it.

To be sure this issue doesn't fall through the cracks unnoticed, I'm
adding it to regzbot, my Linux kernel regression tracking bot:

#regzbot ^introduced v5.14.11..v5.14.14
#regzbot title usb: plugging in USB scanner breaks all USB functionality
[regression present in 5.15.2 and 5.16-rc3, too]
#regzbot ignore-activity

Reminder for developers: when fixing the issue, please add a 'Link:' tag
with the URL to the report (the parent of this mail), then regzbot will
automatically mark the regression as resolved once the fix lands in the
appropriate tree. For more details about regzbot see footer.

Sending this to everyone that got the initial report, to make all aware
of the tracking. I also hope that messages like this motivate people to
directly get regzbot involved when dealing with regressions, as messages
like this wouldn't be needed then.

Don't worry, I'll send further messages wrt this regression just to
the lists (with a tag in the subject so people can filter them away), as
long as they are intended just for regzbot. With a bit of luck no such
messages will be needed anyway.

Ciao, Thorsten, your Linux kernel regression tracker.

P.S.: As a Linux kernel regression tracker I'm getting a lot of reports
on my table. I can only look briefly into most of them. Unfortunately,
I will therefore sometimes get things wrong or miss something important.
I hope that's not the case here; if you think it is, don't hesitate to
tell me about it in a public reply. That's in everyone's interest, as
what I wrote above might be misleading to everyone reading this; any
suggestion I gave might thus send someone reading this down the
wrong rabbit hole, which none of us wants.

BTW, I have no personal interest in this issue, which is tracked using
regzbot, my Linux kernel regression tracking bot
(https://linux-regtracking.leemhuis.info/regzbot/). I'm only posting
this mail to get things rolling again and hence don't need to be CCed on
all further activities wrt this regression.

---
Additional information about regzbot:

If you want to know more about regzbot, check out its web-interface, the
getting started guide, and/or the reference documentation:

https://linux-regtracking.leemhuis.info/regzbot/
https://gitlab.com/knurd42/regzbot/-/blob/main/docs/getting_started.md
https://gitlab.com/knurd42/regzbot/-/blob/main/docs/reference.md

The last two documents will explain how you can interact with regzbot
yourself if you want to.

Hint for reporters: when reporting a regression it's in your interest to
tell #regzbot about it in the report, as that will ensure the regression
gets on the radar of regzbot and the regression tracker. That's in your
interest, as they will make sure the report won't fall through the
cracks unnoticed.

Hint for developers: you normally don't need to care about regzbot once
it's involved. Fix the issue as you normally would, just remember to
include a 'Link:' tag to the report in the commit message, as explained
in Documentation/process/submitting-patches.rst
That aspect was recently made more explicit in commit 1f57bd42b77c:
https://git.kernel.org/linus/1f57bd42b77c


2021-12-02 15:17:56

by Greg Kroah-Hartman

Subject: Re: Regression: plugging in USB scanner breaks all USB functionality

On Thu, Dec 02, 2021 at 03:55:44PM +0100, Robert Munteanu wrote:
> Hi,
>
> After updating from kernel 5.14.11 to 5.14.14 I am seeing the following
> problem:

Can you run 'git bisect' between those kernel versions to get the
offending commit located? It shouldn't take that long as there's not a
lot of changes there.

thanks,

greg k-h

2021-12-03 10:56:07

by Mathias Nyman

Subject: Re: Regression: plugging in USB scanner breaks all USB functionality

On 2.12.2021 16.55, Robert Munteanu wrote:
> Hi,
>
> After updating from kernel 5.14.11 to 5.14.14 I am seeing the following
> problem:
>
> When plugging in a USB scanner (Brother DSMobile DS-740D) to my
> Lenovo P52 laptop I lose connection to all USB devices. Not only are
> the devices no longer available on the host, but no power is drawn by
> them. Only a reboot fixes the problem.
>
> The scanner is the only device that triggers the problem, even when it
> is the only device plugged in. I have a host of other devices,
> connected either directly or via a USB hub in my monitor:
>

There is one xhci patch in that range that has caused other issues:
ff0e50d3564f xhci: Fix command ring pointer corruption while aborting a command

That patch has a fix that is not yet applied; the fix can be found here:
https://lore.kernel.org/linux-usb/[email protected]/
or
https://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git/commit/?h=for-usb-linus&id=385b5b09c3546c87cfb730b76abe5f8d73c579a2

Does reverting the original patch, or applying the fix help?

Thanks
-Mathias




2021-12-03 11:36:49

by Takashi Iwai

Subject: Re: Regression: plugging in USB scanner breaks all USB functionality

On Fri, 03 Dec 2021 11:57:38 +0100,
Mathias Nyman wrote:
>
> On 2.12.2021 16.55, Robert Munteanu wrote:
> > Hi,
> >
> > After updating from kernel 5.14.11 to 5.14.14 I am seeing the following
> > problem:
> >
> > When plugging in a USB scanner (Brother DSMobile DS-740D) to my
> > Lenovo P52 laptop I lose connection to all USB devices. Not only are
> > the devices no longer available on the host, but no power is drawn by
> > them. Only a reboot fixes the problem.
> >
> > The scanner is the only device that triggers the problem, even when it
> > is the only device plugged in. I have a host of other devices,
> > connected either directly or via a USB hub in my monitor:
> >
>
> There is one xhci patch in that range that has caused other issues:
> ff0e50d3564f xhci: Fix command ring pointer corruption while aborting a command
>
> That patch has a fix that is not yet applied; the fix can be found here:
> https://lore.kernel.org/linux-usb/[email protected]/
> or
> https://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git/commit/?h=for-usb-linus&id=385b5b09c3546c87cfb730b76abe5f8d73c579a2
>
> Does reverting the original patch, or applying the fix help?

Thanks!

For convenience, I'm building a test 5.15.x kernel for openSUSE TW in
OBS home:tiwai:bsc1192569 repo. Robert, if you have time, please test
it later.


Takashi

2021-12-03 15:33:33

by Robert Munteanu

Subject: Re: Regression: plugging in USB scanner breaks all USB functionality

On Thu, 2021-12-02 at 16:17 +0100, Greg Kroah-Hartman wrote:
> On Thu, Dec 02, 2021 at 03:55:44PM +0100, Robert Munteanu wrote:
> > Hi,
> >
> > After updating from kernel 5.14.11 to 5.14.14 I am seeing the following
> > problem:
>
> Can you run 'git bisect' between those kernel versions to get the
> offending commit located?  It shouldn't take that long as there's not a
> lot of changes there.

A full bisect run, as suspected in other messages, results in

e54abefe703ab7c4e5983e889babd1447738ca42 is the first bad commit
commit e54abefe703ab7c4e5983e889babd1447738ca42
Author: Pavankumar Kondeti <[email protected]>
Date: Fri Oct 8 12:25:46 2021 +0300

xhci: Fix command ring pointer corruption while aborting a command

commit ff0e50d3564f33b7f4b35cadeabd951d66cfc570 upstream.

The command ring pointer is located at [6:63] bits of the command
ring control register (CRCR). All the control bits like command stop,
abort are located at [0:3] bits. While aborting a command, we read the
CRCR and set the abort bit and write to the CRCR. The read will always
give command ring pointer as all zeros. So we essentially write only
the control bits. Since we split the 64 bit write into two 32 bit writes,
there is a possibility of xHC command ring stopped before the upper
dword (all zeros) is written. If that happens, xHC updates the upper
dword of its internal command ring pointer with all zeros. Next time,
when the command ring is restarted, we see xHC memory access failures.
Fix this issue by only writing to the lower dword of CRCR where all
control bits are located.

Cc: [email protected]
Signed-off-by: Pavankumar Kondeti <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

drivers/usb/host/xhci-ring.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)

I will try the patch linked by Mathias as soon as the openSUSE kernel
build is complete.

Thanks,
Robert

2021-12-03 16:22:29

by Robert Munteanu

Subject: Re: Regression: plugging in USB scanner breaks all USB functionality

On Fri, 2021-12-03 at 12:36 +0100, Takashi Iwai wrote:
> > That patch has a fix that is not yet applied; the fix can be found here:
> > https://lore.kernel.org/linux-usb/[email protected]/
> > or
> > https://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git/commit/?h=for-usb-linus&id=385b5b09c3546c87cfb730b76abe5f8d73c579a2
> >
> > Does reverting the original patch, or applying the fix help?
>
> Thanks!
>
> For convenience, I'm building a test 5.15.x kernel for openSUSE TW in
> OBS home:tiwai:bsc1192569 repo.  Robert, if you have time, please test
> it later.

I confirm that building and installing the kernel from the repository
that Takashi has provided fixed the problem for me.

Thanks a lot for the help!

Robert

2021-12-03 17:24:57

by Thorsten Leemhuis

Subject: Re: Regression: plugging in USB scanner breaks all USB functionality #forregzbot

On 02.12.21 16:13, Thorsten Leemhuis wrote:
> Hi, this is your Linux kernel regression tracker speaking.
>
> Thanks for the report.
>
> Top-posting for once, to make this easily accessible to everyone.
>
> FWIW, 5.14 is EOL, so it might not be fixed there. As the problem is in
> newer kernels as well, I suspect that it was a change applied to 5.15 or
> 5.16 that got backported. Maybe one of the developers might have an idea
> which commit causes it. If that's not the case you likely should try a
> bisection to find the culprit. Performing one between v5.14.11..v5.14.14
> is likely the easiest and quickest way to find it.
>
> To be sure this issue doesn't fall through the cracks unnoticed, I'm
> adding it to regzbot, my Linux kernel regression tracking bot:
>
> #regzbot ^introduced v5.14.11..v5.14.14
> #regzbot title usb: plugging in USB scanner breaks all USB functionality
> [regression present in 5.15.2 and 5.16-rc3, too]
> #regzbot ignore-activity

#regzbot introduced ff0e50d3564f
#regzbot fixed-by 385b5b09c3546c87cfb730b76abe5f8d73c579a2

Ciao, Thorsten, your Linux kernel regression tracker

P.S.: this mail is primarily sent for documentation purposes and for
regzbot, my Linux kernel regression tracking bot. These mails usually
contain '#forregzbot' in the subject, to make them easy to spot and filter.




2021-12-04 10:03:18

by Greg Kroah-Hartman

Subject: Re: Regression: plugging in USB scanner breaks all USB functionality #forregzbot

On Fri, Dec 03, 2021 at 06:24:52PM +0100, Thorsten Leemhuis wrote:
> On 02.12.21 16:13, Thorsten Leemhuis wrote:
> > Hi, this is your Linux kernel regression tracker speaking.
> >
> > Thanks for the report.
> >
> > Top-posting for once, to make this easily accessible to everyone.
> >
> > FWIW, 5.14 is EOL, so it might not be fixed there. As the problem is in
> > newer kernels as well, I suspect that it was a change applied to 5.15 or
> > 5.16 that got backported. Maybe one of the developers might have an idea
> > which commit causes it. If that's not the case you likely should try a
> > bisection to find the culprit. Performing one between v5.14.11..v5.14.14
> > is likely the easiest and quickest way to find it.
> >
> > To be sure this issue doesn't fall through the cracks unnoticed, I'm
> > adding it to regzbot, my Linux kernel regression tracking bot:
> >
> > #regzbot ^introduced v5.14.11..v5.14.14
> > #regzbot title usb: plugging in USB scanner breaks all USB functionality
> > [regression present in 5.15.2 and 5.16-rc3, too]
> > #regzbot ignore-activity
>
> #regzbot introduced ff0e50d3564f
> #regzbot fixed-by 385b5b09c3546c87cfb730b76abe5f8d73c579a2

Odd, where did that git commit id come from? I don't see it in
linux-next or Linus's tree.

confused,

greg k-h

2021-12-04 10:26:49

by Thorsten Leemhuis

Subject: Re: Regression: plugging in USB scanner breaks all USB functionality


On 04.12.21 11:03, Greg KH wrote:
> On Fri, Dec 03, 2021 at 06:24:52PM +0100, Thorsten Leemhuis wrote:
>> On 02.12.21 16:13, Thorsten Leemhuis wrote:
>>> Hi, this is your Linux kernel regression tracker speaking.
>>>
>>> Thanks for the report.
>>>
>>> Top-posting for once, to make this easily accessible to everyone.
>>>
>>> FWIW, 5.14 is EOL, so it might not be fixed there. As the problem is in
>>> newer kernels as well, I suspect that it was a change applied to 5.15 or
>>> 5.16 that got backported. Maybe one of the developers might have an idea
>>> which commit causes it. If that's not the case you likely should try a
>>> bisection to find the culprit. Performing one between v5.14.11..v5.14.14
>>> is likely the easiest and quickest way to find it.
>>>
>>> To be sure this issue doesn't fall through the cracks unnoticed, I'm
>>> adding it to regzbot, my Linux kernel regression tracking bot:
>>>
>>> #regzbot ^introduced v5.14.11..v5.14.14
>>> #regzbot title usb: plugging in USB scanner breaks all USB functionality
>>> [regression present in 5.15.2 and 5.16-rc3, too]
>>> #regzbot ignore-activity
>>
>> #regzbot introduced ff0e50d3564f
>> #regzbot fixed-by 385b5b09c3546c87cfb730b76abe5f8d73c579a2
>
> Odd, where did that git commit id come from? I don't see it in
> linux-next or Linus's tree.
>
> confused,

Yeah, sorry, after sending that mail it occurred to me that this wasn't
ideal and hard to follow.

I got it from here:
https://lore.kernel.org/lkml/[email protected]/

I already decided that next time something like this comes up I'll reply
to the mail with the details instead (with proper quoting) to make this
easier to follow.

Reading that message again I suspect that I might have been a bit quick
as well, as this might not be the commit id this ends up with when it
gets merged: I now see that this is likely a developer's tree and not one
that gets indirectly merged.

Sorry, I'll manually keep an eye on things to fix this up once that
patch gets its real id.

Ciao, Thorsten

BTW, while at it:

#regzbot monitor
https://lore.kernel.org/linux-usb/[email protected]/

2021-12-04 10:44:20

by Greg Kroah-Hartman

Subject: Re: Regression: plugging in USB scanner breaks all USB functionality

On Sat, Dec 04, 2021 at 11:26:45AM +0100, Thorsten Leemhuis wrote:
>
> On 04.12.21 11:03, Greg KH wrote:
> > On Fri, Dec 03, 2021 at 06:24:52PM +0100, Thorsten Leemhuis wrote:
> >> On 02.12.21 16:13, Thorsten Leemhuis wrote:
> >>> Hi, this is your Linux kernel regression tracker speaking.
> >>>
> >>> Thanks for the report.
> >>>
> >>> Top-posting for once, to make this easily accessible to everyone.
> >>>
> >>> FWIW, 5.14 is EOL, so it might not be fixed there. As the problem is in
> >>> newer kernels as well, I suspect that it was a change applied to 5.15 or
> >>> 5.16 that got backported. Maybe one of the developers might have an idea
> >>> which commit causes it. If that's not the case you likely should try a
> >>> bisection to find the culprit. Performing one between v5.14.11..v5.14.14
> >>> is likely the easiest and quickest way to find it.
> >>>
> >>> To be sure this issue doesn't fall through the cracks unnoticed, I'm
> >>> adding it to regzbot, my Linux kernel regression tracking bot:
> >>>
> >>> #regzbot ^introduced v5.14.11..v5.14.14
> >>> #regzbot title usb: plugging in USB scanner breaks all USB functionality
> >>> [regression present in 5.15.2 and 5.16-rc3, too]
> >>> #regzbot ignore-activity
> >>
> >> #regzbot introduced ff0e50d3564f
> >> #regzbot fixed-by 385b5b09c3546c87cfb730b76abe5f8d73c579a2
> >
> > Odd, where did that git commit id come from? I don't see it in
> > linux-next or Linus's tree.
> >
> > confused,
>
> Yeah, sorry, after sending that mail it occurred to me that this wasn't
> ideal and hard to follow.
>
> I got it from here:
> https://lore.kernel.org/lkml/[email protected]/
>
> I already decided that next time something like this comes up I'll reply
> to the mail with the details instead (with proper quoting) to make this
> easier to follow.
>
> Reading that message again I suspect that I might have been a bit quick
> as well, as this might not be the commit id this ends up with when it
> gets merged: I now see that this is likely a developer's tree and not one
> that gets indirectly merged.
>
> Sorry, I'll manually keep an eye on things to fix this up once that
> patch gets its real id.

Ah, found it, it's now in my usb-linus branch, and I'll send it to Linus
later today:
09f736aa9547 ("xhci: Fix commad ring abort, write all 64 bits to CRCR register.")

thanks,

greg k-h

2021-12-04 11:06:38

by Thorsten Leemhuis

Subject: Re: Regression: plugging in USB scanner breaks all USB functionality



On 04.12.21 11:44, Greg KH wrote:
> On Sat, Dec 04, 2021 at 11:26:45AM +0100, Thorsten Leemhuis wrote:
>>
>> On 04.12.21 11:03, Greg KH wrote:
>>> On Fri, Dec 03, 2021 at 06:24:52PM +0100, Thorsten Leemhuis wrote:
>>>> On 02.12.21 16:13, Thorsten Leemhuis wrote:
>>>>> Hi, this is your Linux kernel regression tracker speaking.
>>>>>
>>>>> Thanks for the report.
>>>>>
>>>>> Top-posting for once, to make this easily accessible to everyone.
>>>>>
>>>>> FWIW, 5.14 is EOL, so it might not be fixed there. As the problem is in
>>>>> newer kernels as well, I suspect that it was a change applied to 5.15 or
>>>>> 5.16 that got backported. Maybe one of the developers might have an idea
>>>>> which commit causes it. If that's not the case you likely should try a
>>>>> bisection to find the culprit. Performing one between v5.14.11..v5.14.14
>>>>> is likely the easiest and quickest way to find it.
>>>>>
>>>>> To be sure this issue doesn't fall through the cracks unnoticed, I'm
>>>>> adding it to regzbot, my Linux kernel regression tracking bot:
>>>>>
>>>>> #regzbot ^introduced v5.14.11..v5.14.14
>>>>> #regzbot title usb: plugging in USB scanner breaks all USB functionality
>>>>> [regression present in 5.15.2 and 5.16-rc3, too]
>>>>> #regzbot ignore-activity
>>>>
>>>> #regzbot introduced ff0e50d3564f
>>>> #regzbot fixed-by 385b5b09c3546c87cfb730b76abe5f8d73c579a2
>>>
>>> Odd, where did that git commit id come from? I don't see it in
>>> linux-next or Linus's tree.
>>>
>>> confused,
>>
>> Yeah, sorry, after sending that mail it occurred to me that this wasn't
>> ideal and hard to follow.
>>
>> I got it from here:
>> https://lore.kernel.org/lkml/[email protected]/
>>
>> I already decided that next time something like this comes up I'll reply
>> to the mail with the details instead (with proper quoting) to make this
>> easier to follow.
>>
>> Reading that message again I suspect that I might have been a bit quick
>> as well, as this might not be the commit id this ends up with when it
>> gets merged: I now see that this is likely a developer's tree and not one
>> that gets indirectly merged.
>>
>> Sorry, I'll manually keep an eye on things to fix this up once that
>> patch gets its real id.
>
> Ah, found it, it's now in my usb-linus branch, and I'll send it to Linus
> later today:
> 09f736aa9547 ("xhci: Fix commad ring abort, write all 64 bits to CRCR register.")

Great, thx for letting me know, then I will let regzbot know:

#regzbot fixed-by: 09f736aa9547

TWIMC: regzbot will automatically pick up the title once it sees the
commit in next or mainline.

Ciao, Thorsten