2019-02-17 06:15:02

by Nix

Subject: 4.20.7: pl2303 not working (post-4.19 regression) (limited info so far, not yet bisected)

So I just tried to connect up to my ancient Soekris firewall's serial
console to try to bisect a problem where it stopped booting in 4.20, and
found I couldn't.

minicom says:

minicom: cannot open /dev/ttyUSB0: Input/output error

and in the dmesg we see

[705576.028170] pl2303 ttyUSB0: failed to submit interrupt urb: -28

Booting to 4.19, everything works fine. (A random GalliumOS Chromebook
running 4.9.4 works fine too, not that that confirmation is terribly
useful.)

This is an extremely preliminary report in case it's instantly obvious
what's going on: I'll do enough investigation to produce an actually
useful bug report, including bisecting this, after I've bisected the
*other* non-booting bug, but that might not be until next weekend. (All
this for a firewall I was trying to decommission! bah :) )


2019-02-17 07:19:32

by Greg KH

Subject: Re: 4.20.7: pl2303 not working (post-4.19 regression) (limited info so far, not yet bisected)

On Sat, Feb 16, 2019 at 04:26:30PM +0000, Nix wrote:
> So I just tried to connect up to my ancient Soekris firewall's serial
> console to try to bisect a problem where it stopped booting in 4.20, and
> found I couldn't.
>
> minicom says:
>
> minicom: cannot open /dev/ttyUSB0: Input/output error
>
> and in the dmesg we see
>
> [705576.028170] pl2303 ttyUSB0: failed to submit interrupt urb: -28
>
> Booting to 4.19, everything works fine. (A random GalliumOS Chromebook
> running 4.9.4 works fine too, not that that confirmation is terribly
> useful.)
>
> This is an extremely preliminary report in case it's instantly obvious
> what's going on: I'll do enough investigation to produce an actually
> useful bug report, including bisecting this, after I've bisected the
> *other* non-booting bug, but that might not be until next weekend. (All
> this for a firewall I was trying to decommission! bah :) )

bisection would be great, thanks!

greg k-h

2019-02-17 19:16:06

by Nix

Subject: Re: 4.20.7: pl2303 not working (post-4.19 regression) (limited info so far, not yet bisected)

On 16 Feb 2019, Greg KH told this:

> On Sat, Feb 16, 2019 at 04:26:30PM +0000, Nix wrote:
>> So I just tried to connect up to my ancient Soekris firewall's serial
>> console to try to bisect a problem where it stopped booting in 4.20, and
>> found I couldn't.
>>
>> minicom says:
>>
>> minicom: cannot open /dev/ttyUSB0: Input/output error
>>
>> and in the dmesg we see
>>
>> [705576.028170] pl2303 ttyUSB0: failed to submit interrupt urb: -28
>>
>> Booting to 4.19, everything works fine. (A random GalliumOS Chromebook
[...]
>
> bisection would be great, thanks!

Rrrg. This is going to be harder than I thought. Rebooting, everything
works fine! So this is something that kicks in after something less than
eight days uptime, consistently on every box I own, but is fine on
reboot.

I'm still fairly sure this is a regression -- my machines are often up
for a lot longer than that and I've never seen this before I upgraded to
4.20.x -- but I don't think I'm going to identify it by mindless
bisection. I might have to actually *think* about it.

2019-02-18 08:07:58

by Johan Hovold

Subject: Re: 4.20.7: pl2303 not working (post-4.19 regression) (limited info so far, not yet bisected)

On Sun, Feb 17, 2019 at 07:13:52PM +0000, Nix wrote:
> On 16 Feb 2019, Greg KH told this:
>
> > On Sat, Feb 16, 2019 at 04:26:30PM +0000, Nix wrote:
> >> So I just tried to connect up to my ancient Soekris firewall's serial
> >> console to try to bisect a problem where it stopped booting in 4.20, and
> >> found I couldn't.
> >>
> >> minicom says:
> >>
> >> minicom: cannot open /dev/ttyUSB0: Input/output error
> >>
> >> and in the dmesg we see
> >>
> >> [705576.028170] pl2303 ttyUSB0: failed to submit interrupt urb: -28
> >>
> >> Booting to 4.19, everything works fine. (A random GalliumOS Chromebook
> [...]
> >
> > bisection would be great, thanks!
>
> Rrrg. This is going to be harder than I thought. Rebooting, everything
> works fine! So this is something that kicks in after something less than
> eight days uptime, consistently on every box I own, but is fine on
> reboot.

Tough one.

> I'm still fairly sure this is a regression -- my machines are often up
> for a lot longer than that and I've never seen this before I upgraded to
> 4.20.x -- but I don't think I'm going to identify it by mindless
> bisection. I might have to actually *think* about it.

I doubt it's a regression in usb-serial as essentially nothing changed
with respect to pl2303 or core since 4.19.

The -ENOSPC you're seeing is returned by the host controller to
indicate:

    This request would overcommit the usb bandwidth reserved for
    periodic transfers (interrupt, isochronous).

but if you're saying you can reproduce this on "every box" it may not be
related to any particular host-controller driver (or USB topology).

Johan

2019-02-18 10:35:40

by Nix

Subject: Re: 4.20.7: pl2303 not working (post-4.19 regression) (limited info so far, not yet bisected)

On 18 Feb 2019, Johan Hovold stated:

> On Sun, Feb 17, 2019 at 07:13:52PM +0000, Nix wrote:
>> I'm still fairly sure this is a regression -- my machines are often up
>> for a lot longer than that and I've never seen this before I upgraded to
>> 4.20.x -- but I don't think I'm going to identify it by mindless
>> bisection. I might have to actually *think* about it.
>
> I doubt it's a regression in usb-serial as essentially nothing changed
> with respect to pl2303 or core since 4.19.

Yeah, I came to that conclusion as well.

> The -ENOSPC you're seeing is returned by the host controller to
> indicate:
>
>     This request would overcommit the usb bandwidth reserved for
>     periodic transfers (interrupt, isochronous).

Side note: probably not related to *this* -ENOSPC, which I've been
seeing for a few releases now and which appears to break Chromium's U2F
negotiation when the USB bus has sufficiently weird devices on it (like,
uh, my wireless mouse):

<https://bugs.chromium.org/p/chromium/issues/detail?id=932699>

(I say "probably not related" because it's much older and long predates
the pl2303 trouble.)

> but if you're saying you can reproduce this on "every box" it may not be
> related to any particular host-controller driver (or USB topology).

They are all xhci, at least. The pl2303 is USB 2. One machine, a
two-year-old Broadwell server, says:

xhci_hcd 0000:00:14.0: xHCI Host Controller
xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 3
xhci_hcd 0000:00:14.0: hcc params 0x200077c1 hci version 0x100 quirks 0x0000000000009810
xhci_hcd 0000:00:14.0: cache line size of 32 is not supported
hub 3-0:1.0: USB hub found
hub 3-0:1.0: 15 ports detected
xhci_hcd 0000:00:14.0: xHCI Host Controller
xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 4
xhci_hcd 0000:00:14.0: Host supports USB 3.0 SuperSpeed

00:14.0 USB controller: Intel Corporation C610/X99 series chipset USB xHCI Host Controller (rev 05) (prog-if 30 [XHCI])

The other, a 2012-era cheapish Ivy Bridge workstation:

xhci_hcd 0000:03:00.0: xHCI Host Controller
xhci_hcd 0000:03:00.0: new USB bus registered, assigned bus number 3
xhci_hcd 0000:03:00.0: hcc params 0x014042cb hci version 0x96 quirks 0x0000000000000004
hub 3-0:1.0: USB hub found
hub 3-0:1.0: 2 ports detected
xhci_hcd 0000:03:00.0: xHCI Host Controller
xhci_hcd 0000:03:00.0: new USB bus registered, assigned bus number 4
xhci_hcd 0000:03:00.0: Host supports USB 3.0 SuperSpeed
usb usb4: We don't know the algorithms for LPM for this host, disabling LPM.
hub 4-0:1.0: USB hub found
hub 4-0:1.0: 2 ports detected
xhci_hcd 0000:04:00.0: xHCI Host Controller
xhci_hcd 0000:04:00.0: new USB bus registered, assigned bus number 5
xhci_hcd 0000:04:00.0: hcc params 0x0200f180 hci version 0x96 quirks 0x0000000000080000
hub 5-0:1.0: USB hub found
hub 5-0:1.0: 2 ports detected
xhci_hcd 0000:04:00.0: xHCI Host Controller
xhci_hcd 0000:04:00.0: new USB bus registered, assigned bus number 6
xhci_hcd 0000:04:00.0: Host supports USB 3.0 SuperSpeed
usb usb6: We don't know the algorithms for LPM for this host, disabling LPM.

03:00.0 USB controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 03) (prog-if 30 [XHCI])
04:00.0 USB controller: ASMedia Technology Inc. ASM1042 SuperSpeed USB Host Controller (prog-if 30 [XHCI])

(I really don't know which of these is which. I suspect only one
actually has any visible ports on the outside of the case...)

So the quirks are all totally different, and the controllers are quite
different as well...

2019-02-20 09:32:05

by Johan Hovold

Subject: Re: 4.20.7: pl2303 not working (post-4.19 regression) (limited info so far, not yet bisected)

On Mon, Feb 18, 2019 at 10:32:57AM +0000, Nix wrote:
> On 18 Feb 2019, Johan Hovold stated:
>
> > On Sun, Feb 17, 2019 at 07:13:52PM +0000, Nix wrote:
> >> I'm still fairly sure this is a regression -- my machines are often up
> >> for a lot longer than that and I've never seen this before I upgraded to
> >> 4.20.x -- but I don't think I'm going to identify it by mindless
> >> bisection. I might have to actually *think* about it.
> >
> > I doubt it's a regression in usb-serial as essentially nothing changed
> > with respect to pl2303 or core since 4.19.
>
> Yeah, I came to that conclusion as well.
>
> > The -ENOSPC you're seeing is returned by the host controller to
> > indicate:
> >
> >     This request would overcommit the usb bandwidth reserved for
> >     periodic transfers (interrupt, isochronous).
>
> Side note: probably not related to *this* -ENOSPC, which I've been
> seeing for a few releases now and which appears to break Chromium's U2F
> negotiation when the USB bus has sufficiently weird devices on it (like,
> uh, my wireless mouse):
>
> <https://bugs.chromium.org/p/chromium/issues/detail?id=932699>
>
> (I say "probably not related" because it's much older and long predates
> the pl2303 trouble.)

Yeah, hard to tell from a quick look.

> > but if you're saying you can reproduce this on "every box" it may not be
> > related to any particular host-controller driver (or USB topology).
>
> They are all xhci, at least. The pl2303 is USB 2. One machine, a
> two-year-old Broadwell server, says:

> So the quirks are all totally different, and the controllers are quite
> different as well...

Yeah, but they are all xhci as you point out so theoretically it could
be an xhci driver regression.

Johan