2014-10-11 19:06:00

by Nix

[permalink] [raw]
Subject: Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16

[Cc:ed someone who knows the people behind the Entropy Key: they're not
being manufactured at the moment, but he might want to know anyway]

On 5 Sep 2014, [email protected] stated:

> On 1 Sep 2014, Oliver Neukum stated:
>
>>> I'll do a bisection of the cdc-acm changes since 3.15 tomorrow night and
>>> see if I can find the commit at fault.
>>
>> Thank you for the report. Please let me know the results of your
>> bisection.
>
> Bisection underway (fifth attempt -- I *may* have characterized it well
> enough after a few hours of thrashing at it to bisect accurately this
> time).
[...]
> More generally, the problem may be at *shutdown* -- something goes wrong
> during link suspension or something, such that the link never comes up
> again until physically reconnected. So a straight bisect is misleading
> -- the error may have been in the *last* kernel tested -- and even then,
> some kernels (e.g. the 3.15.0 merge base) appear capable of making it
> work fine. But even this is not consistent: sometimes a kernel that
> works fine if you repeatedly reboot it (such as 3.15) malfunctions when
> you reboot into 3.16 -- but sometimes a newly plugged USB key on a 3.16
> kernel malfunctions upon reboot, even if you reboot into a working
> kernel such as 3.15 (and it then proceeds to work indefinitely if you
> unplug and replug it and stick with 3.15.x, but upon rebooting into
> 3.16.x it goes wrong again).

*Finally* bisected, not helped by the fact that I sometimes needed up to
five reboots (!) to see a failure. The guilty commit is this one:

commit 0943d8ead30e9474034cc5e92225ab0fd29fd0d4
Author: Johan Hovold <[email protected]>
Date: Mon May 26 19:23:51 2014 +0200

USB: cdc-acm: use tty-port dtr_rts

Add dtr_rts tty-port operation which implements proper DTR/RTS handling
(e.g. only lower DTR/RTS during shutdown if HUPCL is set).

Note that modem-control locking still needs to be added throughout the
driver.

Signed-off-by: Johan Hovold <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

To re-describe this failure for the people who weren't in the thread: in
3.16.x I often see this output when asking the ekey daemon for the state
of my Simtec Entropy Key (a cdc-acm-based random number generator) after
rebooting my ohci-based Soekris net5501:

fold:~# ekeydctl stats 1
BytesRead=0
BytesWritten=0
ConnectionNonces=0
ConnectionPackets=0
ConnectionRekeys=0
ConnectionResets=0
ConnectionTime=65
EntropyRate=0
FipsFrameRate=0
FrameByteLast=0
FramesOk=0
FramingErrors=0
KeyDbsdShannonPerByteL=0
KeyDbsdShannonPerByteR=0
KeyEnglishBadness=No failure
KeyRawBadness=0
KeyRawShannonPerByteL=0
KeyRawShannonPerByteR=0
KeyRawShannonPerByteX=0
KeyShortBadness=efm_ok
KeyTemperatureC=-273.15
KeyTemperatureF=-459.67
KeyTemperatureK=0
KeyVoltage=0
PacketErrors=0
PacketOK=0
ReadRate=0
TotalEntropy=0
WriteRate=0

This device streams data continuously at at rate of several KiB/s, so
normally we would never expect to see a report of zero bytes read or
written if the key were functional (nor, indeed, a key temperature of
absolute zero!). This failure never occurred in 3.15.x nor any earlier
kernel. (Note: the 'no failure' message above is sent *from the key* to
indicate that the random numbers can be trusted: it is a bit unfortunate
that the code for 'No failure' is 0, which is also the default value
before anything is received from the key. In this case, we're just
seeing the daemon's initialization-time default. As BytesRead indicates,
the key is not talking to us.)

The symptoms are such that it is the kernel you reboot *from* that
causes the failure, not the one you reboot into: once the key fails it
never recovers without physical removal and reinsertion (or, one
presumes, a poweroff of the whole machine, but I haven't tried that)

This is not a consistent failure: sometimes it can take up to four
reboots for the key to fail. As a result, the bisection took forever (I
had to wait until I had a spare weekend day to devote to it). Despite
the errative nature, I'm fairly confident this commit is at fault: with
it reverted, I have restarted a couple of dozen times without failure
symptoms.

(I speculate that the device's firmware may be terminally confused by
having something try to hang it up, since it's not a modem nor anything
like one, as the boot messages correctly proclaim. The firmware isn't
open, so I can't check.)


2014-10-11 20:07:14

by Paul Martin

[permalink] [raw]
Subject: Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16

Having been privy to the firmware of the eKey, it is very simplisting,
with no implementation whatsoever of any flow control.

--
Paul Martin <[email protected]>

2014-10-11 22:25:14

by Nix

[permalink] [raw]
Subject: Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16

On 11 Oct 2014, Paul Martin spake thusly:

> Having been privy to the firmware of the eKey, it is very simplisting,
> with no implementation whatsoever of any flow control.

That's what I thought. (Why would something that just provides data at a
constant rate way below that of even the slowest USB bus *need* flow
control?)

One presumes therefore that the kernel suddenly trying to do flow
control on shutdown would fubar the firmware's internal state, leading
to the symptoms I see.

So, the question becomes, is there a way to spot this general 'no flow
control on this device' thing from the kernel side, or do we need a
blacklist? Or, perhaps, if this is commonplace for cdc-acm devices, a
whitelist? I can't imagine it's *that* commonplace or someone would have
spotted this already in the months and months it took me to do the
bisection.

Maybe all non-modem cdc-acm devices should eschew flow control, or
something? (This is a genuine guess and is almost certainly wrong.)

--
NULL && (void)

2014-10-12 11:14:15

by Paul Martin

[permalink] [raw]
Subject: Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16

On Sat, Oct 11, 2014 at 11:24:59PM +0100, Nix wrote:
> On 11 Oct 2014, Paul Martin spake thusly:
>
> > Having been privy to the firmware of the eKey, it is very simplistic,
> > with no implementation whatsoever of any flow control.
>
> That's what I thought. (Why would something that just provides data at a
> constant rate way below that of even the slowest USB bus *need* flow
> control?)
>
> One presumes therefore that the kernel suddenly trying to do flow
> control on shutdown would fubar the firmware's internal state, leading
> to the symptoms I see.
>
> So, the question becomes, is there a way to spot this general 'no flow
> control on this device' thing from the kernel side, or do we need a
> blacklist? Or, perhaps, if this is commonplace for cdc-acm devices, a
> whitelist? I can't imagine it's *that* commonplace or someone would have
> spotted this already in the months and months it took me to do the
> bisection.
>
> Maybe all non-modem cdc-acm devices should eschew flow control, or
> something? (This is a genuine guess and is almost certainly wrong.)
>

I'm going to pass this on to Daniel Silverstone, who did the actual
coding of the firmware.

--
Paul Martin <[email protected]>

2014-10-12 18:56:33

by Johan Hovold

[permalink] [raw]
Subject: Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16

On Sat, Oct 11, 2014 at 11:24:59PM +0100, Nix wrote:
> On 11 Oct 2014, Paul Martin spake thusly:
>
> > Having been privy to the firmware of the eKey, it is very simplisting,
> > with no implementation whatsoever of any flow control.
>
> That's what I thought. (Why would something that just provides data at a
> constant rate way below that of even the slowest USB bus *need* flow
> control?)
>
> One presumes therefore that the kernel suddenly trying to do flow
> control on shutdown would fubar the firmware's internal state, leading
> to the symptoms I see.

The cdc-acm driver was dropping DTR/RTS on shutdown (close) also before
the commit you refer to. One thing it did change however is that this is
now only done if HUPCL is set. Might setting that flag be enough to
prevent the device firmware from crashing?

Johan

2014-10-12 21:36:49

by Nix

[permalink] [raw]
Subject: Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16

On 12 Oct 2014, Johan Hovold verbalised:

> On Sat, Oct 11, 2014 at 11:24:59PM +0100, Nix wrote:
>> On 11 Oct 2014, Paul Martin spake thusly:
>>
>> > Having been privy to the firmware of the eKey, it is very simplisting,
>> > with no implementation whatsoever of any flow control.
>>
>> That's what I thought. (Why would something that just provides data at a
>> constant rate way below that of even the slowest USB bus *need* flow
>> control?)
>>
>> One presumes therefore that the kernel suddenly trying to do flow
>> control on shutdown would fubar the firmware's internal state, leading
>> to the symptoms I see.
>
> The cdc-acm driver was dropping DTR/RTS on shutdown (close) also before
> the commit you refer to. One thing it did change however is that this is
> now only done if HUPCL is set. Might setting that flag be enough to
> prevent the device firmware from crashing?

If I read the ekeyd 1.1.5 source code correctly, this is already
happening:

,----[ host/stream.c:estream_open() ]
| } else if (S_ISCHR(sbuf.st_mode)) {
| /* Open the file as a character device/tty */
| fd = open(uri, O_RDWR | O_NOCTTY);
| if ((fd != -1) && (isatty(fd))) {
| if (tcgetattr(fd, &settings) == 0 ) {
| settings.c_cflag &= ~(CSIZE | CSTOPB | PARENB | CLOCAL |
| CREAD | PARODD | CRTSCTS);
| settings.c_iflag &= ~(BRKINT | IGNPAR | PARMRK | INPCK |
| ISTRIP | INLCR | IGNCR | ICRNL | IXON |
| IXOFF | IXANY | IMAXBEL);
| settings.c_iflag |= IGNBRK;
| settings.c_oflag &= ~(OPOST | OCRNL | ONOCR | ONLRET);
| settings.c_lflag &= ~(ISIG | ICANON | IEXTEN | ECHO |
| ECHOE | ECHOK | ECHONL | NOFLSH |
| TOSTOP | ECHOPRT | ECHOCTL | ECHOKE);
| settings.c_cflag |= CS8 | HUPCL | CREAD | CLOCAL;
| #ifdef EKEY_FULL_TERMIOS
| settings.c_cflag &= ~(CBAUD);
| settings.c_iflag &= ~(IUTF8 | IUCLC);
| settings.c_oflag &= ~(OFILL | OFDEL | NLDLY | CRDLY | TABDLY |
| BSDLY | VTDLY | FFDLY | OLCUC );
| settings.c_oflag |= NL0 | CR0 | TAB0 | BS0 | VT0 | FF0;
| settings.c_lflag &= ~(XCASE);
| #endif
| settings.c_cflag |= B115200;
| if (tcsetattr(fd, TCSANOW, &settings) < 0) {
`----

Note the HUPCL in there.

I have checked: this code is being executed against a symlink that
points to /dev/ttyACM0, and the tcsetattr() succeeds. (At least, it's
succeeding on the kernel I'm running now, but of course that's 3.16.5
with this commit reverted...)

Of course the daemon is stopped before the reboot, closing the device.
(But even if it wasn't, one would assume that the fact the system was
rebooting would be considered tantamount to a close!)

--
NULL && (void)

2014-10-14 08:32:22

by Johan Hovold

[permalink] [raw]
Subject: Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16

On Sun, Oct 12, 2014 at 10:36:30PM +0100, Nix wrote:
> On 12 Oct 2014, Johan Hovold verbalised:
>
> > On Sat, Oct 11, 2014 at 11:24:59PM +0100, Nix wrote:
> >> On 11 Oct 2014, Paul Martin spake thusly:
> >>
> >> > Having been privy to the firmware of the eKey, it is very simplisting,
> >> > with no implementation whatsoever of any flow control.
> >>
> >> That's what I thought. (Why would something that just provides data at a
> >> constant rate way below that of even the slowest USB bus *need* flow
> >> control?)
> >>
> >> One presumes therefore that the kernel suddenly trying to do flow
> >> control on shutdown would fubar the firmware's internal state, leading
> >> to the symptoms I see.
> >
> > The cdc-acm driver was dropping DTR/RTS on shutdown (close) also before
> > the commit you refer to. One thing it did change however is that this is
> > now only done if HUPCL is set. Might setting that flag be enough to
> > prevent the device firmware from crashing?
>
> If I read the ekeyd 1.1.5 source code correctly, this is already
> happening:
>
> ,----[ host/stream.c:estream_open() ]
> | } else if (S_ISCHR(sbuf.st_mode)) {
> | /* Open the file as a character device/tty */
> | fd = open(uri, O_RDWR | O_NOCTTY);
> | if ((fd != -1) && (isatty(fd))) {
> | if (tcgetattr(fd, &settings) == 0 ) {
> | settings.c_cflag &= ~(CSIZE | CSTOPB | PARENB | CLOCAL |
> | CREAD | PARODD | CRTSCTS);
> | settings.c_iflag &= ~(BRKINT | IGNPAR | PARMRK | INPCK |
> | ISTRIP | INLCR | IGNCR | ICRNL | IXON |
> | IXOFF | IXANY | IMAXBEL);
> | settings.c_iflag |= IGNBRK;
> | settings.c_oflag &= ~(OPOST | OCRNL | ONOCR | ONLRET);
> | settings.c_lflag &= ~(ISIG | ICANON | IEXTEN | ECHO |
> | ECHOE | ECHOK | ECHONL | NOFLSH |
> | TOSTOP | ECHOPRT | ECHOCTL | ECHOKE);
> | settings.c_cflag |= CS8 | HUPCL | CREAD | CLOCAL;
> | #ifdef EKEY_FULL_TERMIOS
> | settings.c_cflag &= ~(CBAUD);
> | settings.c_iflag &= ~(IUTF8 | IUCLC);
> | settings.c_oflag &= ~(OFILL | OFDEL | NLDLY | CRDLY | TABDLY |
> | BSDLY | VTDLY | FFDLY | OLCUC );
> | settings.c_oflag |= NL0 | CR0 | TAB0 | BS0 | VT0 | FF0;
> | settings.c_lflag &= ~(XCASE);
> | #endif
> | settings.c_cflag |= B115200;
> | if (tcsetattr(fd, TCSANOW, &settings) < 0) {
> `----
>
> Note the HUPCL in there.
>
> I have checked: this code is being executed against a symlink that
> points to /dev/ttyACM0, and the tcsetattr() succeeds. (At least, it's
> succeeding on the kernel I'm running now, but of course that's 3.16.5
> with this commit reverted...)

You could verify that by enabling debugging in the cdc-acm driver and
making sure that the corresponding control messages are indeed sent on
close.

But you haven't seen any fw crashes since you reverted the commit in
question?

Another thing you could try is to add back the

acm_set_control(acm, 0);

just after the dev_info message in probe.

Johan

2014-10-14 14:44:53

by Nix

[permalink] [raw]
Subject: Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16

On 14 Oct 2014, Johan Hovold spake thusly:

> On Sun, Oct 12, 2014 at 10:36:30PM +0100, Nix wrote:
>> On 12 Oct 2014, Johan Hovold verbalised:
>>
>> > On Sat, Oct 11, 2014 at 11:24:59PM +0100, Nix wrote:
>> >> On 11 Oct 2014, Paul Martin spake thusly:
>> >>
>> >> > Having been privy to the firmware of the eKey, it is very simplisting,
>> >> > with no implementation whatsoever of any flow control.
>> >>
>> >> That's what I thought. (Why would something that just provides data at a
>> >> constant rate way below that of even the slowest USB bus *need* flow
>> >> control?)
>> >>
>> >> One presumes therefore that the kernel suddenly trying to do flow
>> >> control on shutdown would fubar the firmware's internal state, leading
>> >> to the symptoms I see.
>> >
>> > The cdc-acm driver was dropping DTR/RTS on shutdown (close) also before
>> > the commit you refer to. One thing it did change however is that this is
>> > now only done if HUPCL is set. Might setting that flag be enough to
>> > prevent the device firmware from crashing?
>>
>> If I read the ekeyd 1.1.5 source code correctly, this is already
>> happening:
>>
>> ,----[ host/stream.c:estream_open() ]
>> | } else if (S_ISCHR(sbuf.st_mode)) {
>> | /* Open the file as a character device/tty */
>> | fd = open(uri, O_RDWR | O_NOCTTY);
>> | if ((fd != -1) && (isatty(fd))) {
>> | if (tcgetattr(fd, &settings) == 0 ) {
>> | settings.c_cflag &= ~(CSIZE | CSTOPB | PARENB | CLOCAL |
>> | CREAD | PARODD | CRTSCTS);
>> | settings.c_iflag &= ~(BRKINT | IGNPAR | PARMRK | INPCK |
>> | ISTRIP | INLCR | IGNCR | ICRNL | IXON |
>> | IXOFF | IXANY | IMAXBEL);
>> | settings.c_iflag |= IGNBRK;
>> | settings.c_oflag &= ~(OPOST | OCRNL | ONOCR | ONLRET);
>> | settings.c_lflag &= ~(ISIG | ICANON | IEXTEN | ECHO |
>> | ECHOE | ECHOK | ECHONL | NOFLSH |
>> | TOSTOP | ECHOPRT | ECHOCTL | ECHOKE);
>> | settings.c_cflag |= CS8 | HUPCL | CREAD | CLOCAL;
>> | #ifdef EKEY_FULL_TERMIOS
>> | settings.c_cflag &= ~(CBAUD);
>> | settings.c_iflag &= ~(IUTF8 | IUCLC);
>> | settings.c_oflag &= ~(OFILL | OFDEL | NLDLY | CRDLY | TABDLY |
>> | BSDLY | VTDLY | FFDLY | OLCUC );
>> | settings.c_oflag |= NL0 | CR0 | TAB0 | BS0 | VT0 | FF0;
>> | settings.c_lflag &= ~(XCASE);
>> | #endif
>> | settings.c_cflag |= B115200;
>> | if (tcsetattr(fd, TCSANOW, &settings) < 0) {
>> `----
>>
>> Note the HUPCL in there.
>>
>> I have checked: this code is being executed against a symlink that
>> points to /dev/ttyACM0, and the tcsetattr() succeeds. (At least, it's
>> succeeding on the kernel I'm running now, but of course that's 3.16.5
>> with this commit reverted...)
>
> You could verify that by enabling debugging in the cdc-acm driver and
> making sure that the corresponding control messages are indeed sent on
> close.

Yeah, good idea -- in the specific context of the system rebooting, in
particular (though I'll also see if just killing the daemon causes this
problem, something I haven't tested). I was assuming it would be hard to
see it before the reboot process cleared the screen -- but of course
this machine has a serial console so that's not important.

> But you haven't seen any fw crashes since you reverted the commit in
> question?

Not one.

> Another thing you could try is to add back the
>
> acm_set_control(acm, 0);
>
> just after the dev_info message in probe.

I'll try that tonight.

--
NULL && (void)

2014-10-17 13:21:52

by Nix

[permalink] [raw]
Subject: Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16

On 14 Oct 2014, Johan Hovold outgrape:
> Another thing you could try is to add back the
>
> acm_set_control(acm, 0);
>
> just after the dev_info message in probe.

"Add back" suggests that this line existed before this change. It
didn't, as far as I can see. Probing has

acm_set_control(acm, acm->ctrlout);

*shutdown* has

acm_set_control(acm, acm->ctrlout = 0);

Did you mean that I should try adding back the acm_set_control() during
shutdown, or the one during probe?

--
NULL && (void)

2014-10-19 13:48:59

by Johan Hovold

[permalink] [raw]
Subject: Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16

On Fri, Oct 17, 2014 at 02:21:35PM +0100, Nix wrote:
> On 14 Oct 2014, Johan Hovold outgrape:
> > Another thing you could try is to add back the
> >
> > acm_set_control(acm, 0);
> >
> > just after the dev_info message in probe.
>
> "Add back" suggests that this line existed before this change. It
> didn't, as far as I can see. Probing has
>
> acm_set_control(acm, acm->ctrlout);
>
> *shutdown* has
>
> acm_set_control(acm, acm->ctrlout = 0);
>
> Did you mean that I should try adding back the acm_set_control() during
> shutdown, or the one during probe?

After the dev_info in probe as I wrote. acm->ctrlout will always be 0 at
that point, could have been more clear about that.

Johan

2014-10-22 09:31:36

by Nix

[permalink] [raw]
Subject: Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16

On 14 Oct 2014, Johan Hovold verbalised:

> On Sun, Oct 12, 2014 at 10:36:30PM +0100, Nix wrote:
>> I have checked: this code is being executed against a symlink that
>> points to /dev/ttyACM0, and the tcsetattr() succeeds. (At least, it's
>> succeeding on the kernel I'm running now, but of course that's 3.16.5
>> with this commit reverted...)
>
> You could verify that by enabling debugging in the cdc-acm driver and
> making sure that the corresponding control messages are indeed sent on
> close.

I have a debugging dump at
<http://www.esperi.org.uk/~nix/temporary/cdc-acm.log>; it's fairly
voluminous because the ekeyd is constantly doing USB reads, but the end
says

Oct 22 10:19:13 fold kern debug: : [ 88.423970] cdc_acm 2-1:1.0: acm_tty_close
Oct 22 10:19:13 fold kern debug: : [ 88.424012] cdc_acm 2-1:1.0: acm_port_shutdown
Oct 22 10:19:13 fold kern debug: : [ 88.440038] cdc_acm 2-1:1.0: acm_ctrl_msg - rq 0x22, val 0x0, len 0x0, result 0
Oct 22 10:19:13 fold kern debug: : [ 88.440038] cdc_acm 2-1:1.0: acm_ctrl_irq - urb shutting down with status: -2
Oct 22 10:19:13 fold kern debug: : [ 88.440038] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 0, len 0
Oct 22 10:19:13 fold kern debug: : [ 88.440038] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 22 10:19:13 fold kern debug: : [ 88.447588] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 1, len 0
Oct 22 10:19:13 fold kern debug: : [ 88.447613] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 22 10:19:13 fold kern debug: : [ 88.448575] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 2, len 0
Oct 22 10:19:13 fold kern debug: : [ 88.448599] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 22 10:19:13 fold kern debug: : [ 88.449576] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 3, len 0
Oct 22 10:19:13 fold kern debug: : [ 88.449599] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 22 10:19:13 fold kern debug: : [ 88.450578] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 4, len 0
Oct 22 10:19:13 fold kern debug: : [ 88.450602] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 22 10:19:13 fold kern debug: : [ 88.451573] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 5, len 0
Oct 22 10:19:13 fold kern debug: : [ 88.451596] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 22 10:19:13 fold kern debug: : [ 88.452574] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 6, len 0
Oct 22 10:19:13 fold kern debug: : [ 88.452597] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 22 10:19:13 fold kern debug: : [ 88.453567] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 7, len 0
Oct 22 10:19:13 fold kern debug: : [ 88.453588] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 22 10:19:13 fold kern debug: : [ 88.454570] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 8, len 0
Oct 22 10:19:13 fold kern debug: : [ 88.454592] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 22 10:19:13 fold kern debug: : [ 88.462591] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 9, len 0
Oct 22 10:19:13 fold kern debug: : [ 88.462619] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 22 10:19:13 fold kern debug: : [ 88.463568] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 10, len 0
Oct 22 10:19:13 fold kern debug: : [ 88.463590] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 22 10:19:13 fold kern debug: : [ 88.464564] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 11, len 0
Oct 22 10:19:13 fold kern debug: : [ 88.464585] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 22 10:19:13 fold kern debug: : [ 88.465578] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 12, len 0
Oct 22 10:19:13 fold kern debug: : [ 88.465602] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 22 10:19:13 fold kern debug: : [ 88.466566] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 13, len 0
Oct 22 10:19:13 fold kern debug: : [ 88.466587] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2

which looks, hm, a bit suspicious to me.

> But you haven't seen any fw crashes since you reverted the commit in
> question?

Not a one.

> Another thing you could try is to add back the
>
> acm_set_control(acm, 0);
>
> just after the dev_info message in probe.

Tried that (with, obviously, the commit not reverted) -- rebooted, and

BytesRead=0
BytesWritten=0
ConnectionNonces=0
ConnectionPackets=0
ConnectionRekeys=0
ConnectionResets=0
ConnectionTime=46
EntropyRate=0
FipsFrameRate=0
FrameByteLast=0
FramesOk=0
FramingErrors=0
KeyDbsdShannonPerByteL=0
KeyDbsdShannonPerByteR=0
KeyEnglishBadness=No failure
KeyRawBadness=0
KeyRawShannonPerByteL=0
KeyRawShannonPerByteR=0
KeyRawShannonPerByteX=0
KeyShortBadness=efm_ok
KeyTemperatureC=-273.15
KeyTemperatureF=-459.67
KeyTemperatureK=0
KeyVoltage=0
PacketErrors=0
PacketOK=0
ReadRate=0
TotalEntropy=0
WriteRate=0

So that doesn't help.

--
NULL && (void)

2014-10-22 10:18:04

by Johan Hovold

[permalink] [raw]
Subject: Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16

On Wed, Oct 22, 2014 at 10:31:17AM +0100, Nix wrote:
> On 14 Oct 2014, Johan Hovold verbalised:
>
> > On Sun, Oct 12, 2014 at 10:36:30PM +0100, Nix wrote:
> >> I have checked: this code is being executed against a symlink that
> >> points to /dev/ttyACM0, and the tcsetattr() succeeds. (At least, it's
> >> succeeding on the kernel I'm running now, but of course that's 3.16.5
> >> with this commit reverted...)
> >
> > You could verify that by enabling debugging in the cdc-acm driver and
> > making sure that the corresponding control messages are indeed sent on
> > close.
>
> I have a debugging dump at
> <http://www.esperi.org.uk/~nix/temporary/cdc-acm.log>; it's fairly

What kernel were you using here? The log seems to suggest that it was
generated with the commit in question reverted.

> voluminous because the ekeyd is constantly doing USB reads, but the end
> says

You can enable debugging selectively using the debugfs control file of
dynamic debug, see:

Documentation/dynamic-debug-howto.txt

> Oct 22 10:19:13 fold kern debug: : [ 88.423970] cdc_acm 2-1:1.0: acm_tty_close
> Oct 22 10:19:13 fold kern debug: : [ 88.424012] cdc_acm 2-1:1.0: acm_port_shutdown
> Oct 22 10:19:13 fold kern debug: : [ 88.440038] cdc_acm 2-1:1.0: acm_ctrl_msg - rq 0x22, val 0x0, len 0x0, result 0

So DTR/RTS is indeed lowered (i.e. HUPCL is set).

> Oct 22 10:19:13 fold kern debug: : [ 88.440038] cdc_acm 2-1:1.0: acm_ctrl_irq - urb shutting down with status: -2
> Oct 22 10:19:13 fold kern debug: : [ 88.440038] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 0, len 0
> Oct 22 10:19:13 fold kern debug: : [ 88.440038] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 22 10:19:13 fold kern debug: : [ 88.447588] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 1, len 0
> Oct 22 10:19:13 fold kern debug: : [ 88.447613] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 22 10:19:13 fold kern debug: : [ 88.448575] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 2, len 0
> Oct 22 10:19:13 fold kern debug: : [ 88.448599] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 22 10:19:13 fold kern debug: : [ 88.449576] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 3, len 0
> Oct 22 10:19:13 fold kern debug: : [ 88.449599] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 22 10:19:13 fold kern debug: : [ 88.450578] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 4, len 0
> Oct 22 10:19:13 fold kern debug: : [ 88.450602] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 22 10:19:13 fold kern debug: : [ 88.451573] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 5, len 0
> Oct 22 10:19:13 fold kern debug: : [ 88.451596] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 22 10:19:13 fold kern debug: : [ 88.452574] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 6, len 0
> Oct 22 10:19:13 fold kern debug: : [ 88.452597] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 22 10:19:13 fold kern debug: : [ 88.453567] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 7, len 0
> Oct 22 10:19:13 fold kern debug: : [ 88.453588] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 22 10:19:13 fold kern debug: : [ 88.454570] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 8, len 0
> Oct 22 10:19:13 fold kern debug: : [ 88.454592] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 22 10:19:13 fold kern debug: : [ 88.462591] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 9, len 0
> Oct 22 10:19:13 fold kern debug: : [ 88.462619] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 22 10:19:13 fold kern debug: : [ 88.463568] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 10, len 0
> Oct 22 10:19:13 fold kern debug: : [ 88.463590] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 22 10:19:13 fold kern debug: : [ 88.464564] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 11, len 0
> Oct 22 10:19:13 fold kern debug: : [ 88.464585] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 22 10:19:13 fold kern debug: : [ 88.465578] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 12, len 0
> Oct 22 10:19:13 fold kern debug: : [ 88.465602] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 22 10:19:13 fold kern debug: : [ 88.466566] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 13, len 0
> Oct 22 10:19:13 fold kern debug: : [ 88.466587] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
>
> which looks, hm, a bit suspicious to me.

That's normal, it's just the urbs being killed (stopped) at close.

> > But you haven't seen any fw crashes since you reverted the commit in
> > question?
>
> Not a one.
>
> > Another thing you could try is to add back the
> >
> > acm_set_control(acm, 0);
> >
> > just after the dev_info message in probe.
>
> Tried that (with, obviously, the commit not reverted) -- rebooted, and
>
> BytesRead=0
> BytesWritten=0
> ConnectionNonces=0
> ConnectionPackets=0
> ConnectionRekeys=0
> ConnectionResets=0
> ConnectionTime=46
> EntropyRate=0
> FipsFrameRate=0
> FrameByteLast=0
> FramesOk=0
> FramingErrors=0
> KeyDbsdShannonPerByteL=0
> KeyDbsdShannonPerByteR=0
> KeyEnglishBadness=No failure
> KeyRawBadness=0
> KeyRawShannonPerByteL=0
> KeyRawShannonPerByteR=0
> KeyRawShannonPerByteX=0
> KeyShortBadness=efm_ok
> KeyTemperatureC=-273.15
> KeyTemperatureF=-459.67
> KeyTemperatureK=0
> KeyVoltage=0
> PacketErrors=0
> PacketOK=0
> ReadRate=0
> TotalEntropy=0
> WriteRate=0
>
> So that doesn't help.

Good to know. It was a bit of a long shot.

It seems the only thing that have changed then is that DTR/RTS is now
raised after the read urbs have been submitted during open (rather than
after the ctrlurb was submitted).

Perhaps this slight change in timing causes the device to misbehave
occasionally.

What kernel version are you using? And do you have autosuspend enabled?

Johan

2014-10-22 14:00:34

by Nix

[permalink] [raw]
Subject: Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16

On 22 Oct 2014, Johan Hovold uttered the following:

> On Wed, Oct 22, 2014 at 10:31:17AM +0100, Nix wrote:
>> On 14 Oct 2014, Johan Hovold verbalised:
>>
>> > On Sun, Oct 12, 2014 at 10:36:30PM +0100, Nix wrote:
>> >> I have checked: this code is being executed against a symlink that
>> >> points to /dev/ttyACM0, and the tcsetattr() succeeds. (At least, it's
>> >> succeeding on the kernel I'm running now, but of course that's 3.16.5
>> >> with this commit reverted...)
>> >
>> > You could verify that by enabling debugging in the cdc-acm driver and
>> > making sure that the corresponding control messages are indeed sent on
>> > close.
>>
>> I have a debugging dump at
>> <http://www.esperi.org.uk/~nix/temporary/cdc-acm.log>; it's fairly
>
> What kernel were you using here? The log seems to suggest that it was
> generated with the commit in question reverted.

Wurgle. I think I was probably using the wrong one, as you suggest.
Hell. That's no use at all is it. Still, at least you know what it looks
like when it works. :)

I'll retry with the right one.

> What kernel version are you using? And do you have autosuspend enabled?

3.16.6 (but it's happened for 3.16.* obviously). No autosuspend (no
*way* can this box suspend, but it only draws a couple of watts so I
don't care).

--
NULL && (void)

2014-10-22 15:37:13

by Nix

[permalink] [raw]
Subject: Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16

On 22 Oct 2014, Johan Hovold outgrape:

> On Wed, Oct 22, 2014 at 10:31:17AM +0100, Nix wrote:
>> On 14 Oct 2014, Johan Hovold verbalised:
>>
>> > On Sun, Oct 12, 2014 at 10:36:30PM +0100, Nix wrote:
>> >> I have checked: this code is being executed against a symlink that
>> >> points to /dev/ttyACM0, and the tcsetattr() succeeds. (At least, it's
>> >> succeeding on the kernel I'm running now, but of course that's 3.16.5
>> >> with this commit reverted...)
>> >
>> > You could verify that by enabling debugging in the cdc-acm driver and
>> > making sure that the corresponding control messages are indeed sent on
>> > close.
>>
>> I have a debugging dump at
>> <http://www.esperi.org.uk/~nix/temporary/cdc-acm.log>; it's fairly
>
> What kernel were you using here? The log seems to suggest that it was
> generated with the commit in question reverted.

Look now :) (the previous log is still there in cdc-acm-reverted.log.)

This contains two boots -- one on which the USB key worked, and the next
(with an identical kernel) with which the key was broken. (I'm not sure
whether this problem happens at startup or shutdown time, so it seemed
best to provide both.)

--
NULL && (void)

2014-10-24 11:17:48

by Johan Hovold

[permalink] [raw]
Subject: Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16

[ +CC: linux-usb ]

On Wed, Oct 22, 2014 at 04:36:59PM +0100, Nix wrote:
> On 22 Oct 2014, Johan Hovold outgrape:
>
> > On Wed, Oct 22, 2014 at 10:31:17AM +0100, Nix wrote:
> >> On 14 Oct 2014, Johan Hovold verbalised:
> >>
> >> > On Sun, Oct 12, 2014 at 10:36:30PM +0100, Nix wrote:
> >> >> I have checked: this code is being executed against a symlink that
> >> >> points to /dev/ttyACM0, and the tcsetattr() succeeds. (At least, it's
> >> >> succeeding on the kernel I'm running now, but of course that's 3.16.5
> >> >> with this commit reverted...)
> >> >
> >> > You could verify that by enabling debugging in the cdc-acm driver and
> >> > making sure that the corresponding control messages are indeed sent on
> >> > close.
> >>
> >> I have a debugging dump at
> >> <http://www.esperi.org.uk/~nix/temporary/cdc-acm.log>; it's fairly
> >
> > What kernel were you using here? The log seems to suggest that it was
> > generated with the commit in question reverted.
>
> Look now :) (the previous log is still there in cdc-acm-reverted.log.)

Unfortunately, it seems the logs are incomplete. There are lots of
entries missing (e.g. "acm_tty_install" when opening, but also some
"acm_submit_read_urb"), some of which were there in your first log.

Also you can disable the "acm_tty_write - count" output, which isn't
really useful here.

> This contains two boots -- one on which the USB key worked, and the next
> (with an identical kernel) with which the key was broken. (I'm not sure
> whether this problem happens at startup or shutdown time, so it seemed
> best to provide both.)

That's a good idea.

Is it only after reboot you've seen the device fail? What if you
physically disconnect and reconnect the device instead, or simply
repeatedly open and close it?

Johan

2014-10-24 15:09:29

by Nix

[permalink] [raw]
Subject: Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16

On 24 Oct 2014, Johan Hovold told this:

> [ +CC: linux-usb ]
> On Wed, Oct 22, 2014 at 04:36:59PM +0100, Nix wrote:
>> On 22 Oct 2014, Johan Hovold outgrape:
>>
>> > On Wed, Oct 22, 2014 at 10:31:17AM +0100, Nix wrote:
>> >> On 14 Oct 2014, Johan Hovold verbalised:
>> >>
>> >> > On Sun, Oct 12, 2014 at 10:36:30PM +0100, Nix wrote:
>> >> >> I have checked: this code is being executed against a symlink that
>> >> >> points to /dev/ttyACM0, and the tcsetattr() succeeds. (At least, it's
>> >> >> succeeding on the kernel I'm running now, but of course that's 3.16.5
>> >> >> with this commit reverted...)
>> >> >
>> >> > You could verify that by enabling debugging in the cdc-acm driver and
>> >> > making sure that the corresponding control messages are indeed sent on
>> >> > close.
>> >>
>> >> I have a debugging dump at
>> >> <http://www.esperi.org.uk/~nix/temporary/cdc-acm.log>; it's fairly
>> >
>> > What kernel were you using here? The log seems to suggest that it was
>> > generated with the commit in question reverted.
>>
>> Look now :) (the previous log is still there in cdc-acm-reverted.log.)
>
> Unfortunately, it seems the logs are incomplete. There are lots of
> entries missing (e.g. "acm_tty_install" when opening, but also some
> "acm_submit_read_urb"), some of which were there in your first log.

Curious. It's a straight cut-and-paste from the syslog. Maybe the kmsg
buffer overflowed, but I start the ekey daemon *after* I start syslogd,
so that seems unlikely.

I'll have another look.

> Is it only after reboot you've seen the device fail?

Yes (though sometimes after reboot of an unaffected kernel into an
affected one! i.e. sometimes the first boot into an affected kernel is
broken).

> What if you
> physically disconnect and reconnect the device instead,

That makes it work.

> or simply
> repeatedly open and close it?

IIRC -- but I'll have to check this tomorrow when I look at the log
problem, so don't take it as gospel -- that doesn't affect it: I can
stop and restart the daemon repeatedly and, if it wasn't working before,
it's not working afterwards: if it was working before, it'll be working
afterwards.

--
NULL && (void)

2014-10-31 16:45:07

by Nix

[permalink] [raw]
Subject: Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16

Sorry for the delay: illness and work-related release time flurries.

On 24 Oct 2014, Johan Hovold told this:

> [ +CC: linux-usb ]
>
> On Wed, Oct 22, 2014 at 04:36:59PM +0100, Nix wrote:
>> On 22 Oct 2014, Johan Hovold outgrape:
>>
>> > On Wed, Oct 22, 2014 at 10:31:17AM +0100, Nix wrote:
>> >> On 14 Oct 2014, Johan Hovold verbalised:
>> >>
>> >> > On Sun, Oct 12, 2014 at 10:36:30PM +0100, Nix wrote:
>> >> >> I have checked: this code is being executed against a symlink that
>> >> >> points to /dev/ttyACM0, and the tcsetattr() succeeds. (At least, it's
>> >> >> succeeding on the kernel I'm running now, but of course that's 3.16.5
>> >> >> with this commit reverted...)
>> >> >
>> >> > You could verify that by enabling debugging in the cdc-acm driver and
>> >> > making sure that the corresponding control messages are indeed sent on
>> >> > close.
>> >>
>> >> I have a debugging dump at
>> >> <http://www.esperi.org.uk/~nix/temporary/cdc-acm.log>; it's fairly
>> >
>> > What kernel were you using here? The log seems to suggest that it was
>> > generated with the commit in question reverted.
>>
>> Look now :) (the previous log is still there in cdc-acm-reverted.log.)
>
> Unfortunately, it seems the logs are incomplete. There are lots of
> entries missing (e.g. "acm_tty_install" when opening, but also some
> "acm_submit_read_urb"), some of which were there in your first log.

OK. What we have now in
<http://www.esperi.org.uk/~nix/temporary/cdc-acm.log> is a log from the
pristine upstream 3.16.6 kernel with cdc-acm debugging turned on and the
acm_tty_write - count stuff in acm_tty_write() disabled: I've increased
the dmesg buffer size so the top isn't being cut off any more. It took a
lot of boots for it to fail this time: about a dozen. The log contains
the boot that failed and the one before.

(To my uneducated eye, the initial traffic to/from the key looks *very*
different in the second boot. Something is clearly wrong by this point,
but that's not much of a surprise!)

>> This contains two boots -- one on which the USB key worked, and the next
>> (with an identical kernel) with which the key was broken. (I'm not sure
>> whether this problem happens at startup or shutdown time, so it seemed
>> best to provide both.)
>
> That's a good idea.
>
> Is it only after reboot you've seen the device fail?

So far.

> What if you
> physically disconnect and reconnect the device instead, or simply

That fixes it, in fact it's the only way to fix it once it's broken by
this bug.

> repeatedly open and close it?

Hm. Good idea.

... no, that doesn't help. Additional log from that shows a lot of what
looks like error returns:

Oct 31 16:38:03 fold kern debug: : [ 168.135213] cdc_acm 2-1:1.0: acm_tty_close
Oct 31 16:38:08 fold kern debug: : [ 173.130531] cdc_acm 2-1:1.0: acm_ctrl_msg - rq 0x22, val 0x0, len 0x0, result -110
Oct 31 16:38:08 fold kern debug: : [ 173.130557] cdc_acm 2-1:1.0: acm_port_shutdown
Oct 31 16:38:08 fold kern debug: : [ 173.131475] cdc_acm 2-1:1.0: acm_ctrl_irq - urb shutting down with status: -2
Oct 31 16:38:08 fold kern debug: : [ 173.132474] cdc_acm 2-1:1.1: acm_write_bulk - len 0/1, status -2
Oct 31 16:38:08 fold kern debug: : [ 173.132519] cdc_acm 2-1:1.1: acm_softint
Oct 31 16:38:08 fold kern debug: : [ 173.133473] cdc_acm 2-1:1.1: acm_write_bulk - len 0/1, status -2
Oct 31 16:38:08 fold kern debug: : [ 173.133510] cdc_acm 2-1:1.1: acm_softint
Oct 31 16:38:08 fold kern debug: : [ 173.134471] cdc_acm 2-1:1.1: acm_write_bulk - len 0/1, status -2
Oct 31 16:38:08 fold kern debug: : [ 173.134507] cdc_acm 2-1:1.1: acm_softint
Oct 31 16:38:08 fold kern debug: : [ 173.135471] cdc_acm 2-1:1.1: acm_write_bulk - len 0/1, status -2
Oct 31 16:38:08 fold kern debug: : [ 173.135509] cdc_acm 2-1:1.1: acm_softint
Oct 31 16:38:08 fold kern debug: : [ 173.136472] cdc_acm 2-1:1.1: acm_write_bulk - len 0/1, status -2
Oct 31 16:38:08 fold kern debug: : [ 173.136507] cdc_acm 2-1:1.1: acm_softint
Oct 31 16:38:08 fold kern debug: : [ 173.137471] cdc_acm 2-1:1.1: acm_write_bulk - len 0/1, status -2
Oct 31 16:38:08 fold kern debug: : [ 173.137517] cdc_acm 2-1:1.1: acm_softint
Oct 31 16:38:08 fold kern debug: : [ 173.138471] cdc_acm 2-1:1.1: acm_write_bulk - len 0/1, status -2
Oct 31 16:38:08 fold kern debug: : [ 173.138520] cdc_acm 2-1:1.1: acm_softint
Oct 31 16:38:08 fold kern debug: : [ 173.139469] cdc_acm 2-1:1.1: acm_write_bulk - len 0/1, status -2
Oct 31 16:38:08 fold kern debug: : [ 173.139515] cdc_acm 2-1:1.1: acm_softint
Oct 31 16:38:08 fold kern debug: : [ 173.140470] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 0, len 0
Oct 31 16:38:08 fold kern debug: : [ 173.140491] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 31 16:38:08 fold kern debug: : [ 173.141469] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 1, len 0
Oct 31 16:38:08 fold kern debug: : [ 173.141489] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 31 16:38:08 fold kern debug: : [ 173.142469] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 2, len 0
Oct 31 16:38:08 fold kern debug: : [ 173.142490] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 31 16:38:08 fold kern debug: : [ 173.143468] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 3, len 0
Oct 31 16:38:08 fold kern debug: : [ 173.143488] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 31 16:38:08 fold kern debug: : [ 173.144468] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 4, len 0
Oct 31 16:38:08 fold kern debug: : [ 173.144488] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 31 16:38:08 fold kern debug: : [ 173.145467] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 5, len 0
Oct 31 16:38:08 fold kern debug: : [ 173.145488] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 31 16:38:08 fold kern debug: : [ 173.146467] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 6, len 0
Oct 31 16:38:08 fold kern debug: : [ 173.146487] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 31 16:38:08 fold kern debug: : [ 173.147477] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 7, len 0
Oct 31 16:38:08 fold kern debug: : [ 173.147498] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 31 16:38:08 fold kern debug: : [ 173.153496] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 8, len 0
Oct 31 16:38:08 fold kern debug: : [ 173.153524] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 31 16:38:08 fold kern debug: : [ 173.154468] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 9, len 0
Oct 31 16:38:08 fold kern debug: : [ 173.154489] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 31 16:38:08 fold kern debug: : [ 173.155466] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 10, len 0
Oct 31 16:38:08 fold kern debug: : [ 173.155487] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 31 16:38:08 fold kern debug: : [ 173.156466] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 11, len 0
Oct 31 16:38:08 fold kern debug: : [ 173.156487] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 31 16:38:08 fold kern debug: : [ 173.157466] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 12, len 0
Oct 31 16:38:08 fold kern debug: : [ 173.157487] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 31 16:38:08 fold kern debug: : [ 173.158466] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 13, len 0
Oct 31 16:38:08 fold kern debug: : [ 173.158486] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 31 16:38:08 fold kern debug: : [ 173.159465] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 14, len 0
Oct 31 16:38:08 fold kern debug: : [ 173.159486] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 31 16:38:08 fold kern debug: : [ 173.160475] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 15, len 0
Oct 31 16:38:08 fold kern debug: : [ 173.160496] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 31 16:38:08 fold kern debug: : [ 173.160567] tty ttyACM0: acm_tty_open
Oct 31 16:38:08 fold kern debug: : [ 173.160588] cdc_acm 2-1:1.0: acm_port_activate
Oct 31 16:38:08 fold kern debug: : [ 173.160626] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 0
Oct 31 16:38:08 fold kern debug: : [ 173.160651] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 1
Oct 31 16:38:08 fold kern debug: : [ 173.160674] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 2
Oct 31 16:38:08 fold kern debug: : [ 173.160796] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 3
Oct 31 16:38:08 fold kern debug: : [ 173.160820] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 4
Oct 31 16:38:08 fold kern debug: : [ 173.160842] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 5
Oct 31 16:38:08 fold kern debug: : [ 173.160863] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 6
Oct 31 16:38:08 fold kern debug: : [ 173.160884] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 7
Oct 31 16:38:08 fold kern debug: : [ 173.160906] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 8
Oct 31 16:38:08 fold kern debug: : [ 173.160927] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 9
Oct 31 16:38:08 fold kern debug: : [ 173.160948] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 10
Oct 31 16:38:08 fold kern debug: : [ 173.160969] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 11
Oct 31 16:38:08 fold kern debug: : [ 173.160991] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 12
Oct 31 16:38:08 fold kern debug: : [ 173.161012] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 13
Oct 31 16:38:08 fold kern debug: : [ 173.161033] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 14
Oct 31 16:38:08 fold kern debug: : [ 173.161054] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 15
Oct 31 16:38:08 fold kern debug: : [ 173.161489] cdc_acm 2-1:1.0: acm_ctrl_msg - rq 0x22, val 0x3, len 0x0, result -62

2014-11-05 12:00:01

by Johan Hovold

[permalink] [raw]
Subject: Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16

On Fri, Oct 31, 2014 at 04:44:46PM +0000, Nix wrote:
> Sorry for the delay: illness and work-related release time flurries.
>
> On 24 Oct 2014, Johan Hovold told this:
>
> > [ +CC: linux-usb ]
> >
> > On Wed, Oct 22, 2014 at 04:36:59PM +0100, Nix wrote:
> >> On 22 Oct 2014, Johan Hovold outgrape:
> >>
> >> > On Wed, Oct 22, 2014 at 10:31:17AM +0100, Nix wrote:
> >> >> On 14 Oct 2014, Johan Hovold verbalised:
> >> >>
> >> >> > On Sun, Oct 12, 2014 at 10:36:30PM +0100, Nix wrote:
> >> >> >> I have checked: this code is being executed against a symlink that
> >> >> >> points to /dev/ttyACM0, and the tcsetattr() succeeds. (At least, it's
> >> >> >> succeeding on the kernel I'm running now, but of course that's 3.16.5
> >> >> >> with this commit reverted...)
> >> >> >
> >> >> > You could verify that by enabling debugging in the cdc-acm driver and
> >> >> > making sure that the corresponding control messages are indeed sent on
> >> >> > close.
> >> >>
> >> >> I have a debugging dump at
> >> >> <http://www.esperi.org.uk/~nix/temporary/cdc-acm.log>; it's fairly
> >> >
> >> > What kernel were you using here? The log seems to suggest that it was
> >> > generated with the commit in question reverted.
> >>
> >> Look now :) (the previous log is still there in cdc-acm-reverted.log.)
> >
> > Unfortunately, it seems the logs are incomplete. There are lots of
> > entries missing (e.g. "acm_tty_install" when opening, but also some
> > "acm_submit_read_urb"), some of which were there in your first log.
>
> OK. What we have now in
> <http://www.esperi.org.uk/~nix/temporary/cdc-acm.log> is a log from the
> pristine upstream 3.16.6 kernel with cdc-acm debugging turned on and the
> acm_tty_write - count stuff in acm_tty_write() disabled: I've increased
> the dmesg buffer size so the top isn't being cut off any more. It took a
> lot of boots for it to fail this time: about a dozen. The log contains
> the boot that failed and the one before.
>
> (To my uneducated eye, the initial traffic to/from the key looks *very*
> different in the second boot. Something is clearly wrong by this point,
> but that's not much of a surprise!)

The log appears incomplete again, this time it seems the last part is
completely missing (the device is never closed for example). The device
still seems to be responding.

> >> This contains two boots -- one on which the USB key worked, and the next
> >> (with an identical kernel) with which the key was broken. (I'm not sure
> >> whether this problem happens at startup or shutdown time, so it seemed
> >> best to provide both.)
> >
> > That's a good idea.
> >
> > Is it only after reboot you've seen the device fail?
>
> So far.
>
> > What if you
> > physically disconnect and reconnect the device instead, or simply
>
> That fixes it, in fact it's the only way to fix it once it's broken by
> this bug.

I didn't mean whether it would get the device working again, but rather
whether you could kill it this way.

> > repeatedly open and close it?
>
> Hm. Good idea.

Same here, but the below test was also informative.

> ... no, that doesn't help. Additional log from that shows a lot of what
> looks like error returns:
>
> Oct 31 16:38:03 fold kern debug: : [ 168.135213] cdc_acm 2-1:1.0: acm_tty_close
> Oct 31 16:38:08 fold kern debug: : [ 173.130531] cdc_acm 2-1:1.0: acm_ctrl_msg - rq 0x22, val 0x0, len 0x0, result -110

Yeah, it seems your device firmware has crashed. It stops responding to
control requests.

> Oct 31 16:38:08 fold kern debug: : [ 173.130557] cdc_acm 2-1:1.0: acm_port_shutdown
> Oct 31 16:38:08 fold kern debug: : [ 173.131475] cdc_acm 2-1:1.0: acm_ctrl_irq - urb shutting down with status: -2
> Oct 31 16:38:08 fold kern debug: : [ 173.132474] cdc_acm 2-1:1.1: acm_write_bulk - len 0/1, status -2
> Oct 31 16:38:08 fold kern debug: : [ 173.132519] cdc_acm 2-1:1.1: acm_softint
> Oct 31 16:38:08 fold kern debug: : [ 173.133473] cdc_acm 2-1:1.1: acm_write_bulk - len 0/1, status -2
> Oct 31 16:38:08 fold kern debug: : [ 173.133510] cdc_acm 2-1:1.1: acm_softint
> Oct 31 16:38:08 fold kern debug: : [ 173.134471] cdc_acm 2-1:1.1: acm_write_bulk - len 0/1, status -2
> Oct 31 16:38:08 fold kern debug: : [ 173.134507] cdc_acm 2-1:1.1: acm_softint
> Oct 31 16:38:08 fold kern debug: : [ 173.135471] cdc_acm 2-1:1.1: acm_write_bulk - len 0/1, status -2
> Oct 31 16:38:08 fold kern debug: : [ 173.135509] cdc_acm 2-1:1.1: acm_softint
> Oct 31 16:38:08 fold kern debug: : [ 173.136472] cdc_acm 2-1:1.1: acm_write_bulk - len 0/1, status -2
> Oct 31 16:38:08 fold kern debug: : [ 173.136507] cdc_acm 2-1:1.1: acm_softint
> Oct 31 16:38:08 fold kern debug: : [ 173.137471] cdc_acm 2-1:1.1: acm_write_bulk - len 0/1, status -2
> Oct 31 16:38:08 fold kern debug: : [ 173.137517] cdc_acm 2-1:1.1: acm_softint
> Oct 31 16:38:08 fold kern debug: : [ 173.138471] cdc_acm 2-1:1.1: acm_write_bulk - len 0/1, status -2
> Oct 31 16:38:08 fold kern debug: : [ 173.138520] cdc_acm 2-1:1.1: acm_softint
> Oct 31 16:38:08 fold kern debug: : [ 173.139469] cdc_acm 2-1:1.1: acm_write_bulk - len 0/1, status -2
> Oct 31 16:38:08 fold kern debug: : [ 173.139515] cdc_acm 2-1:1.1: acm_softint
> Oct 31 16:38:08 fold kern debug: : [ 173.140470] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 0, len 0
> Oct 31 16:38:08 fold kern debug: : [ 173.140491] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 31 16:38:08 fold kern debug: : [ 173.141469] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 1, len 0
> Oct 31 16:38:08 fold kern debug: : [ 173.141489] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 31 16:38:08 fold kern debug: : [ 173.142469] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 2, len 0
> Oct 31 16:38:08 fold kern debug: : [ 173.142490] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 31 16:38:08 fold kern debug: : [ 173.143468] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 3, len 0
> Oct 31 16:38:08 fold kern debug: : [ 173.143488] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 31 16:38:08 fold kern debug: : [ 173.144468] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 4, len 0
> Oct 31 16:38:08 fold kern debug: : [ 173.144488] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 31 16:38:08 fold kern debug: : [ 173.145467] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 5, len 0
> Oct 31 16:38:08 fold kern debug: : [ 173.145488] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 31 16:38:08 fold kern debug: : [ 173.146467] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 6, len 0
> Oct 31 16:38:08 fold kern debug: : [ 173.146487] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 31 16:38:08 fold kern debug: : [ 173.147477] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 7, len 0
> Oct 31 16:38:08 fold kern debug: : [ 173.147498] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 31 16:38:08 fold kern debug: : [ 173.153496] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 8, len 0
> Oct 31 16:38:08 fold kern debug: : [ 173.153524] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 31 16:38:08 fold kern debug: : [ 173.154468] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 9, len 0
> Oct 31 16:38:08 fold kern debug: : [ 173.154489] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 31 16:38:08 fold kern debug: : [ 173.155466] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 10, len 0
> Oct 31 16:38:08 fold kern debug: : [ 173.155487] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 31 16:38:08 fold kern debug: : [ 173.156466] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 11, len 0
> Oct 31 16:38:08 fold kern debug: : [ 173.156487] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 31 16:38:08 fold kern debug: : [ 173.157466] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 12, len 0
> Oct 31 16:38:08 fold kern debug: : [ 173.157487] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 31 16:38:08 fold kern debug: : [ 173.158466] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 13, len 0
> Oct 31 16:38:08 fold kern debug: : [ 173.158486] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 31 16:38:08 fold kern debug: : [ 173.159465] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 14, len 0
> Oct 31 16:38:08 fold kern debug: : [ 173.159486] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 31 16:38:08 fold kern debug: : [ 173.160475] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 15, len 0
> Oct 31 16:38:08 fold kern debug: : [ 173.160496] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 31 16:38:08 fold kern debug: : [ 173.160567] tty ttyACM0: acm_tty_open
> Oct 31 16:38:08 fold kern debug: : [ 173.160588] cdc_acm 2-1:1.0: acm_port_activate
> Oct 31 16:38:08 fold kern debug: : [ 173.160626] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 0
> Oct 31 16:38:08 fold kern debug: : [ 173.160651] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 1
> Oct 31 16:38:08 fold kern debug: : [ 173.160674] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 2
> Oct 31 16:38:08 fold kern debug: : [ 173.160796] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 3
> Oct 31 16:38:08 fold kern debug: : [ 173.160820] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 4
> Oct 31 16:38:08 fold kern debug: : [ 173.160842] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 5
> Oct 31 16:38:08 fold kern debug: : [ 173.160863] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 6
> Oct 31 16:38:08 fold kern debug: : [ 173.160884] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 7
> Oct 31 16:38:08 fold kern debug: : [ 173.160906] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 8
> Oct 31 16:38:08 fold kern debug: : [ 173.160927] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 9
> Oct 31 16:38:08 fold kern debug: : [ 173.160948] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 10
> Oct 31 16:38:08 fold kern debug: : [ 173.160969] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 11
> Oct 31 16:38:08 fold kern debug: : [ 173.160991] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 12
> Oct 31 16:38:08 fold kern debug: : [ 173.161012] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 13
> Oct 31 16:38:08 fold kern debug: : [ 173.161033] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 14
> Oct 31 16:38:08 fold kern debug: : [ 173.161054] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 15

The above is all normal, but

> Oct 31 16:38:08 fold kern debug: : [ 173.161489] cdc_acm 2-1:1.0: acm_ctrl_msg - rq 0x22, val 0x3, len 0x0, result -62

here's another timeout. It's dead.

Did you get anywhere with trying to look at the device firmware?

Johan

2014-11-05 15:15:02

by Nix

[permalink] [raw]
Subject: Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16

On 5 Nov 2014, Johan Hovold stated:

> On Fri, Oct 31, 2014 at 04:44:46PM +0000, Nix wrote:
>> Sorry for the delay: illness and work-related release time flurries.
>>
>> On 24 Oct 2014, Johan Hovold told this:
>>
>> > [ +CC: linux-usb ]
>> >
>> > On Wed, Oct 22, 2014 at 04:36:59PM +0100, Nix wrote:
>> >> On 22 Oct 2014, Johan Hovold outgrape:
>> >>
>> >> > On Wed, Oct 22, 2014 at 10:31:17AM +0100, Nix wrote:
>> >> >> On 14 Oct 2014, Johan Hovold verbalised:
>> >> >>
>> >> >> > On Sun, Oct 12, 2014 at 10:36:30PM +0100, Nix wrote:
>> >> >> >> I have checked: this code is being executed against a symlink that
>> >> >> >> points to /dev/ttyACM0, and the tcsetattr() succeeds. (At least, it's
>> >> >> >> succeeding on the kernel I'm running now, but of course that's 3.16.5
>> >> >> >> with this commit reverted...)
>> >> >> >
>> >> >> > You could verify that by enabling debugging in the cdc-acm driver and
>> >> >> > making sure that the corresponding control messages are indeed sent on
>> >> >> > close.
>> >> >>
>> >> >> I have a debugging dump at
>> >> >> <http://www.esperi.org.uk/~nix/temporary/cdc-acm.log>; it's fairly
>> >> >
>> >> > What kernel were you using here? The log seems to suggest that it was
>> >> > generated with the commit in question reverted.
>> >>
>> >> Look now :) (the previous log is still there in cdc-acm-reverted.log.)
>> >
>> > Unfortunately, it seems the logs are incomplete. There are lots of
>> > entries missing (e.g. "acm_tty_install" when opening, but also some
>> > "acm_submit_read_urb"), some of which were there in your first log.
>>
>> OK. What we have now in
>> <http://www.esperi.org.uk/~nix/temporary/cdc-acm.log> is a log from the
>> pristine upstream 3.16.6 kernel with cdc-acm debugging turned on and the
>> acm_tty_write - count stuff in acm_tty_write() disabled: I've increased
>> the dmesg buffer size so the top isn't being cut off any more. It took a
>> lot of boots for it to fail this time: about a dozen. The log contains
>> the boot that failed and the one before.
>>
>> (To my uneducated eye, the initial traffic to/from the key looks *very*
>> different in the second boot. Something is clearly wrong by this point,
>> but that's not much of a surprise!)
>
> The log appears incomplete again, this time it seems the last part is
> completely missing (the device is never closed for example). The device
> still seems to be responding.

Nope -- by the time I clipped the log, the device was definitely
nonresponsive.

I've appended the remaining log, just in case. This is the same as the
snapshot I added to the email itself last time: a close-and-open as I
tried restarting the daemon, and another close as part of system
shutdown.

>> > What if you
>> > physically disconnect and reconnect the device instead, or simply
>>
>> That fixes it, in fact it's the only way to fix it once it's broken by
>> this bug.
>
> I didn't mean whether it would get the device working again, but rather
> whether you could kill it this way.

Never seen it fail after a physical disconnection.

>> ... no, that doesn't help. Additional log from that shows a lot of what
>> looks like error returns:
>>
>> Oct 31 16:38:03 fold kern debug: : [ 168.135213] cdc_acm 2-1:1.0: acm_tty_close
>> Oct 31 16:38:08 fold kern debug: : [ 173.130531] cdc_acm 2-1:1.0: acm_ctrl_msg - rq 0x22, val 0x0, len 0x0, result -110
>
> Yeah, it seems your device firmware has crashed. It stops responding to
> control requests.

Not surprising: I was fairly sure we were provoking a key firmware crash
or something like that. This is a device with no support for flow
control at all, after all, so I'm not terribly surprised that trying to
do flow control confuses it.

> The above is all normal, but
>
>> Oct 31 16:38:08 fold kern debug: : [ 173.161489] cdc_acm 2-1:1.0: acm_ctrl_msg - rq 0x22, val 0x3, len 0x0, result -62
>
> here's another timeout. It's dead.

Again, not surprising.

> Did you get anywhere with trying to look at the device firmware?

Look at it? Only Daniel Silverstone (Cc:ed) can do that. The only copy
of the firmware I have is baked into the sealed key. :)

--
NULL && (void)

2014-11-05 15:56:06

by Daniel Silverstone

[permalink] [raw]
Subject: Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16

On Wed, Nov 05, 2014 at 15:14:49 +0000, Nix wrote:
> > Did you get anywhere with trying to look at the device firmware?
> Look at it? Only Daniel Silverstone (Cc:ed) can do that. The only copy
> of the firmware I have is baked into the sealed key. :)

Sadly I can't look at the exact firmware of the device, but I can tell you it
was based on ST Electronics' virtual COM port example in their *old* STM32
development kits.

Fortunately I do have a record of that example code here:
https://git.gitano.org.uk/stm32-virtcomport.git/

I'm not a USB guru though.

D.

--
Daniel Silverstone http://www.digital-scurf.org/
PGP mail accepted and encouraged. Key Id: 3CCE BABE 206C 3B69

2014-11-05 18:18:07

by Johan Hovold

[permalink] [raw]
Subject: Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16

On Wed, Nov 05, 2014 at 03:14:49PM +0000, Nix wrote:
> On 5 Nov 2014, Johan Hovold stated:
>
> > On Fri, Oct 31, 2014 at 04:44:46PM +0000, Nix wrote:
> >> Sorry for the delay: illness and work-related release time flurries.
> >>
> >> On 24 Oct 2014, Johan Hovold told this:
> >>

> > The log appears incomplete again, this time it seems the last part is
> > completely missing (the device is never closed for example). The device
> > still seems to be responding.
>
> Nope -- by the time I clipped the log, the device was definitely
> nonresponsive.
>
> I've appended the remaining log, just in case. This is the same as the
> snapshot I added to the email itself last time: a close-and-open as I
> tried restarting the daemon, and another close as part of system
> shutdown.

Ok, yeah, there's the crash.

> >> > What if you
> >> > physically disconnect and reconnect the device instead, or simply
> >>
> >> That fixes it, in fact it's the only way to fix it once it's broken by
> >> this bug.
> >
> > I didn't mean whether it would get the device working again, but rather
> > whether you could kill it this way.
>
> Never seen it fail after a physical disconnection.

Ok.

And it has to include an enumeration, since just opening and closing it
(restarting the daemon) repeatedly seems to work?

> >> ... no, that doesn't help. Additional log from that shows a lot of what
> >> looks like error returns:
> >>
> >> Oct 31 16:38:03 fold kern debug: : [ 168.135213] cdc_acm 2-1:1.0: acm_tty_close
> >> Oct 31 16:38:08 fold kern debug: : [ 173.130531] cdc_acm 2-1:1.0: acm_ctrl_msg - rq 0x22, val 0x0, len 0x0, result -110
> >
> > Yeah, it seems your device firmware has crashed. It stops responding to
> > control requests.
>
> Not surprising: I was fairly sure we were provoking a key firmware crash
> or something like that. This is a device with no support for flow
> control at all, after all, so I'm not terribly surprised that trying to
> do flow control confuses it.

Only weird thing is that it has been coping with those calls for a long
time. Only the slightly changed timings seems to have triggered this
issue.

> > The above is all normal, but
> >
> >> Oct 31 16:38:08 fold kern debug: : [ 173.161489] cdc_acm 2-1:1.0: acm_ctrl_msg - rq 0x22, val 0x3, len 0x0, result -62
> >
> > here's another timeout. It's dead.
>
> Again, not surprising.
>
> > Did you get anywhere with trying to look at the device firmware?
>
> Look at it? Only Daniel Silverstone (Cc:ed) can do that. The only copy
> of the firmware I have is baked into the sealed key. :)

Ah, ok. I guess we need a new quirk then. There's already a quirk in the
driver to suppress error from when trying to set the control lines.

But that doesn't help this device, which happily accepts the requests
and then dies at random times.

Could you try two more things (too make sure line control is really the
culprit):

1. If you clear HUPCL in ekeyd so that the lines are never lowered, does
that fix the stability issue?

2. Can you verify that the patch below works? Did I use the correct
VID/PID? Could you provide "lsusb -v" output (for the capability flags)
as well?

Note that the patch is against the usb-linus branch of the usb tree,
which also contains another fix which could affect this device
(set_termios will now be called an extra time on every open). You also
need this one, which have not yet been applied:

http://marc.info/?l=linux-usb&m=141520959813505&w=2

Thanks,
Johan


>From 76abc8a7eda9ea1978ee3527c773210c28332317 Mon Sep 17 00:00:00 2001
From: Johan Hovold <[email protected]>
Date: Wed, 5 Nov 2014 18:51:54 +0100
Subject: [PATCH] USB: cdc-acm: add quirk for control-line state requests

Add new quirk for devices that cannot handle control-line state
requests.

Note that we currently send these requests to all devices, regardless of
whether they claim to support it, but that errors are only logged if
support is claimed.

Since commit 0943d8ead30e ("USB: cdc-acm: use tty-port dtr_rts"), which
only changed the timings for these requests slightly, this has been
reported to cause occasional firmware crashes on Simtec Electronics
Entropy Key devices after re-enumeration. Enable the quirk for this
device.

Reported-by: Nix <[email protected]>
Cc: stable <[email protected]> # v3.16
Signed-off-by: Johan Hovold <[email protected]>
---
drivers/usb/class/cdc-acm.c | 14 ++++++++++++--
drivers/usb/class/cdc-acm.h | 2 ++
2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c
index 9d6495424b06..077d58ac3dcb 100644
--- a/drivers/usb/class/cdc-acm.c
+++ b/drivers/usb/class/cdc-acm.c
@@ -148,8 +148,15 @@ static int acm_ctrl_msg(struct acm *acm, int request, int value,
/* devices aren't required to support these requests.
* the cdc acm descriptor tells whether they do...
*/
-#define acm_set_control(acm, control) \
- acm_ctrl_msg(acm, USB_CDC_REQ_SET_CONTROL_LINE_STATE, control, NULL, 0)
+static inline int acm_set_control(struct acm *acm, int control)
+{
+ if (acm->quirks & QUIRK_CONTROL_LINE_STATE)
+ return -EOPNOTSUPP;
+
+ return acm_ctrl_msg(acm, USB_CDC_REQ_SET_CONTROL_LINE_STATE,
+ control, NULL, 0);
+}
+
#define acm_set_line(acm, line) \
acm_ctrl_msg(acm, USB_CDC_REQ_SET_LINE_CODING, 0, line, sizeof *(line))
#define acm_send_break(acm, ms) \
@@ -1320,6 +1327,7 @@ made_compressed_probe:
tty_port_init(&acm->port);
acm->port.ops = &acm_port_ops;
init_usb_anchor(&acm->delayed);
+ acm->quirks = quirks;

buf = usb_alloc_coherent(usb_dev, ctrlsize, GFP_KERNEL, &acm->ctrl_dma);
if (!buf) {
@@ -1687,6 +1695,8 @@ static const struct usb_device_id acm_ids[] = {
{ USB_DEVICE(0x0572, 0x1328), /* Shiro / Aztech USB MODEM UM-3100 */
.driver_info = NO_UNION_NORMAL, /* has no union descriptor */
},
+ { USB_DEVICE(0x20df, 0x0001), /* Simtec Electronics Entropy Key */
+ .driver_info = QUIRK_CONTROL_LINE_STATE, },
{ USB_DEVICE(0x2184, 0x001c) }, /* GW Instek AFG-2225 */
{ USB_DEVICE(0x22b8, 0x6425), /* Motorola MOTOMAGX phones */
},
diff --git a/drivers/usb/class/cdc-acm.h b/drivers/usb/class/cdc-acm.h
index fc75651afe1c..d3251ebd09e2 100644
--- a/drivers/usb/class/cdc-acm.h
+++ b/drivers/usb/class/cdc-acm.h
@@ -121,6 +121,7 @@ struct acm {
unsigned int throttle_req:1; /* throttle requested */
u8 bInterval;
struct usb_anchor delayed; /* writes queued for a device about to be woken */
+ unsigned long quirks;
};

#define CDC_DATA_INTERFACE_TYPE 0x0a
@@ -132,3 +133,4 @@ struct acm {
#define NOT_A_MODEM BIT(3)
#define NO_DATA_INTERFACE BIT(4)
#define IGNORE_DEVICE BIT(5)
+#define QUIRK_CONTROL_LINE_STATE BIT(6)
--
2.0.4

2014-11-06 13:49:50

by Nix

[permalink] [raw]
Subject: Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16

On 5 Nov 2014, Johan Hovold told this:

> On Wed, Nov 05, 2014 at 03:14:49PM +0000, Nix wrote:
>> >> > What if you
>> >> > physically disconnect and reconnect the device instead, or simply
>> >>
>> >> That fixes it, in fact it's the only way to fix it once it's broken by
>> >> this bug.
>> >
>> > I didn't mean whether it would get the device working again, but rather
>> > whether you could kill it this way.
>>
>> Never seen it fail after a physical disconnection.
>
> Ok.
>
> And it has to include an enumeration, since just opening and closing it
> (restarting the daemon) repeatedly seems to work?

Well, it's more accurate to say that restarting the daemon doesn't make
it fail, but won't make it start working if it's already gone nuts
either. It normally copes fine with the daemon stopping and starting,
yes (and the daemon copes fine with keys being connected and
disconnected).

>> > Yeah, it seems your device firmware has crashed. It stops responding to
>> > control requests.
>>
>> Not surprising: I was fairly sure we were provoking a key firmware crash
>> or something like that. This is a device with no support for flow
>> control at all, after all, so I'm not terribly surprised that trying to
>> do flow control confuses it.
>
> Only weird thing is that it has been coping with those calls for a long
> time. Only the slightly changed timings seems to have triggered this
> issue.

Yeah. I get the strong impression from Daniel that the USB side of this
was done by taking something that worked on the kernel of the day,
adding the key-specific stuff to it, and making sure that it still
worked. i.e. this was not a from-spec reimplementation, it was using an
existing (old) stack. If that stack makes iffy assumptions, so will the
entropy key.

>> Look at it? Only Daniel Silverstone (Cc:ed) can do that. The only copy
>> of the firmware I have is baked into the sealed key. :)

As his email noted, no he can't :) but you do now have a link to the
thing it was based on.

> Ah, ok. I guess we need a new quirk then. There's already a quirk in the
> driver to suppress error from when trying to set the control lines.
>
> But that doesn't help this device, which happily accepts the requests
> and then dies at random times.

Yeah. Strange. (I thought it was it's 'right after', but you seem to
have disproved that. It's only 'sometime later'.)

> Could you try two more things (too make sure line control is really the
> culprit):
>
> 1. If you clear HUPCL in ekeyd so that the lines are never lowered, does
> that fix the stability issue?

Definitely not. I got a hang after the first reboot out of an afflicted
kernel, when using a HUPCLless ekeyd. Weird. (I guess they're lowered on
reboot anyway?)

> 2. Can you verify that the patch below works? Did I use the correct
> VID/PID? Could you provide "lsusb -v" output (for the capability flags)
> as well?

The VID/PID are right.

The patch seems to work. I tested against the usb testing tree directly,
since that was easier than adjusting it to apply against 3.16. Sixteen
reboots, no failures, looks to be fixed.

lsusb output:

Bus 002 Device 003: ID 20df:0001 Simtec Electronics Entropy Key [UDEKEY01]
Device Descriptor:
bLength 18
bDescriptorType 1
bcdUSB 2.00
bDeviceClass 2 Communications
bDeviceSubClass 0
bDeviceProtocol 0
bMaxPacketSize0 64
idVendor 0x20df Simtec Electronics
idProduct 0x0001 Entropy Key [UDEKEY01]
bcdDevice 2.00
iManufacturer 1 Simtec Electronics
iProduct 2 Entropy Key
iSerial 3 M/9tBjBLNzE2RSFD
bNumConfigurations 1
Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength 67
bNumInterfaces 2
bConfigurationValue 1
iConfiguration 0
bmAttributes 0xc0
Self Powered
MaxPower 76mA
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 0
bAlternateSetting 0
bNumEndpoints 1
bInterfaceClass 2 Communications
bInterfaceSubClass 2 Abstract (modem)
bInterfaceProtocol 1 AT-commands (v.25ter)
iInterface 0
CDC Header:
bcdCDC 1.10
CDC Call Management:
bmCapabilities 0x00
bDataInterface 1
CDC ACM:
bmCapabilities 0x00
CDC Union:
bMasterInterface 0
bSlaveInterface 1
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x82 EP 2 IN
bmAttributes 3
Transfer Type Interrupt
Synch Type None
Usage Type Data
wMaxPacketSize 0x0008 1x 8 bytes
bInterval 255
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 1
bAlternateSetting 0
bNumEndpoints 2
bInterfaceClass 10 CDC Data
bInterfaceSubClass 0 Unused
bInterfaceProtocol 0
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x03 EP 3 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0040 1x 64 bytes
bInterval 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x81 EP 1 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0040 1x 64 bytes
bInterval 0
Device Status: 0x0000
(Bus Powered)

> Note that the patch is against the usb-linus branch of the usb tree,

OK. (I presume this is gregkh's tree, not yours?)

> which also contains another fix which could affect this device
> (set_termios will now be called an extra time on every open). You also
> need this one, which have not yet been applied:
>
> http://marc.info/?l=linux-usb&m=141520959813505&w=2

It had been applied by the time I tested this :)

--
NULL && (void)

2014-11-06 17:08:09

by Johan Hovold

[permalink] [raw]
Subject: Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16

On Thu, Nov 06, 2014 at 01:49:13PM +0000, Nix wrote:
> On 5 Nov 2014, Johan Hovold told this:
> > On Wed, Nov 05, 2014 at 03:14:49PM +0000, Nix wrote:

> > Could you try two more things (too make sure line control is really the
> > culprit):
> >
> > 1. If you clear HUPCL in ekeyd so that the lines are never lowered, does
> > that fix the stability issue?
>
> Definitely not. I got a hang after the first reboot out of an afflicted
> kernel, when using a HUPCLless ekeyd. Weird. (I guess they're lowered on
> reboot anyway?)

It's actually only the timings related to the control-lines being raised
on open that has changed, so this would seem consistent with that.

> > 2. Can you verify that the patch below works? Did I use the correct
> > VID/PID? Could you provide "lsusb -v" output (for the capability flags)
> > as well?
>
> The VID/PID are right.
>
> The patch seems to work. I tested against the usb testing tree directly,
> since that was easier than adjusting it to apply against 3.16. Sixteen
> reboots, no failures, looks to be fixed.

Great, thanks for testing.

> > Note that the patch is against the usb-linus branch of the usb tree,
>
> OK. (I presume this is gregkh's tree, not yours?)

Yes, Greg's tree, could have been more clear about that.

> > which also contains another fix which could affect this device
> > (set_termios will now be called an extra time on every open). You also
> > need this one, which have not yet been applied:
> >
> > http://marc.info/?l=linux-usb&m=141520959813505&w=2
>
> It had been applied by the time I tested this :)

Greg is fast. :)

Thanks for tracking this down. That bisect cannot have been fun given
the low failure rate (sometimes only one in ten reboots?).

Johan

2014-11-06 17:13:20

by Johan Hovold

[permalink] [raw]
Subject: [PATCH] USB: cdc-acm: add quirk for control-line state requests

Add new quirk for devices that cannot handle control-line state
requests.

Note that we currently send these requests to all devices, regardless of
whether they claim to support it, but that errors are only logged if
support is claimed.

Since commit 0943d8ead30e ("USB: cdc-acm: use tty-port dtr_rts"), which
only changed the timings for these requests slightly, this has been
reported to cause occasional firmware crashes on Simtec Electronics
Entropy Key devices after re-enumeration. Enable the quirk for this
device.

Reported-by: Nix <[email protected]>
Tested-by: Nix <[email protected]>
Cc: stable <[email protected]> # v3.16
Signed-off-by: Johan Hovold <[email protected]>
---

Greg,

I believe this should into v3.18 as it fixes a reported regression with
these devices.

Johan


drivers/usb/class/cdc-acm.c | 14 ++++++++++++--
drivers/usb/class/cdc-acm.h | 2 ++
2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c
index 9d6495424b06..077d58ac3dcb 100644
--- a/drivers/usb/class/cdc-acm.c
+++ b/drivers/usb/class/cdc-acm.c
@@ -148,8 +148,15 @@ static int acm_ctrl_msg(struct acm *acm, int request, int value,
/* devices aren't required to support these requests.
* the cdc acm descriptor tells whether they do...
*/
-#define acm_set_control(acm, control) \
- acm_ctrl_msg(acm, USB_CDC_REQ_SET_CONTROL_LINE_STATE, control, NULL, 0)
+static inline int acm_set_control(struct acm *acm, int control)
+{
+ if (acm->quirks & QUIRK_CONTROL_LINE_STATE)
+ return -EOPNOTSUPP;
+
+ return acm_ctrl_msg(acm, USB_CDC_REQ_SET_CONTROL_LINE_STATE,
+ control, NULL, 0);
+}
+
#define acm_set_line(acm, line) \
acm_ctrl_msg(acm, USB_CDC_REQ_SET_LINE_CODING, 0, line, sizeof *(line))
#define acm_send_break(acm, ms) \
@@ -1320,6 +1327,7 @@ made_compressed_probe:
tty_port_init(&acm->port);
acm->port.ops = &acm_port_ops;
init_usb_anchor(&acm->delayed);
+ acm->quirks = quirks;

buf = usb_alloc_coherent(usb_dev, ctrlsize, GFP_KERNEL, &acm->ctrl_dma);
if (!buf) {
@@ -1687,6 +1695,8 @@ static const struct usb_device_id acm_ids[] = {
{ USB_DEVICE(0x0572, 0x1328), /* Shiro / Aztech USB MODEM UM-3100 */
.driver_info = NO_UNION_NORMAL, /* has no union descriptor */
},
+ { USB_DEVICE(0x20df, 0x0001), /* Simtec Electronics Entropy Key */
+ .driver_info = QUIRK_CONTROL_LINE_STATE, },
{ USB_DEVICE(0x2184, 0x001c) }, /* GW Instek AFG-2225 */
{ USB_DEVICE(0x22b8, 0x6425), /* Motorola MOTOMAGX phones */
},
diff --git a/drivers/usb/class/cdc-acm.h b/drivers/usb/class/cdc-acm.h
index fc75651afe1c..d3251ebd09e2 100644
--- a/drivers/usb/class/cdc-acm.h
+++ b/drivers/usb/class/cdc-acm.h
@@ -121,6 +121,7 @@ struct acm {
unsigned int throttle_req:1; /* throttle requested */
u8 bInterval;
struct usb_anchor delayed; /* writes queued for a device about to be woken */
+ unsigned long quirks;
};

#define CDC_DATA_INTERFACE_TYPE 0x0a
@@ -132,3 +133,4 @@ struct acm {
#define NOT_A_MODEM BIT(3)
#define NO_DATA_INTERFACE BIT(4)
#define IGNORE_DEVICE BIT(5)
+#define QUIRK_CONTROL_LINE_STATE BIT(6)
--
2.0.4

2014-11-06 17:14:28

by Nix

[permalink] [raw]
Subject: Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16

On 6 Nov 2014, Johan Hovold said:

> On Thu, Nov 06, 2014 at 01:49:13PM +0000, Nix wrote:
>> On 5 Nov 2014, Johan Hovold told this:
>> > On Wed, Nov 05, 2014 at 03:14:49PM +0000, Nix wrote:
>
>> > Could you try two more things (too make sure line control is really the
>> > culprit):
>> >
>> > 1. If you clear HUPCL in ekeyd so that the lines are never lowered, does
>> > that fix the stability issue?
>>
>> Definitely not. I got a hang after the first reboot out of an afflicted
>> kernel, when using a HUPCLless ekeyd. Weird. (I guess they're lowered on
>> reboot anyway?)
>
> It's actually only the timings related to the control-lines being raised
> on open that has changed, so this would seem consistent with that.

Urgh. No wonder it was intermittent.

> Thanks for tracking this down. That bisect cannot have been fun given
> the low failure rate (sometimes only one in ten reboots?).

It often failed quite fast, but yes, the negative case was hard to
prove: I had to rewind twice. I counted reboots because I'm a flaming
aspie pedant. 173 reboots that took... thank goodness it replicated on
the machine of mine that's fastest to reboot!

--
NULL && (void)

2014-11-07 09:05:17

by Oliver Neukum

[permalink] [raw]
Subject: Re: [PATCH] USB: cdc-acm: add quirk for control-line state requests

On Thu, 2014-11-06 at 18:08 +0100, Johan Hovold wrote:
> Add new quirk for devices that cannot handle control-line state
> requests.
>
> Note that we currently send these requests to all devices, regardless
> of
> whether they claim to support it, but that errors are only logged if
> support is claimed.

That makes me wonder whether we should do this. What do you think?

Regards
Oliver

2014-11-07 09:19:27

by Johan Hovold

[permalink] [raw]
Subject: Re: [PATCH] USB: cdc-acm: add quirk for control-line state requests

On Fri, Nov 07, 2014 at 10:05:12AM +0100, Oliver Neukum wrote:
> On Thu, 2014-11-06 at 18:08 +0100, Johan Hovold wrote:
> > Add new quirk for devices that cannot handle control-line state
> > requests.
> >
> > Note that we currently send these requests to all devices, regardless
> > of
> > whether they claim to support it, but that errors are only logged if
> > support is claimed.
>
> That makes me wonder whether we should do this. What do you think?

My interpretation was that it's done this way as there may be devices
with broken CDC headers which fail to set the corresponding capability
bits, but still support the request (c.f. our recent not-a-modem
discussion).

In that case, always attempting the request, but only reporting errors
if support was claimed, makes sense.

As changing this behaviour now would risk breaking such devices, I
think black-listing (i.e. this patch) is preferred moving forward.

Johan

2014-11-07 10:23:29

by Oliver Neukum

[permalink] [raw]
Subject: Re: [PATCH] USB: cdc-acm: add quirk for control-line state requests

On Fri, 2014-11-07 at 10:16 +0100, Johan Hovold wrote:
> On Fri, Nov 07, 2014 at 10:05:12AM +0100, Oliver Neukum wrote:
> > On Thu, 2014-11-06 at 18:08 +0100, Johan Hovold wrote:
> > > Add new quirk for devices that cannot handle control-line state
> > > requests.
> > >
> > > Note that we currently send these requests to all devices, regardless
> > > of
> > > whether they claim to support it, but that errors are only logged if
> > > support is claimed.
> >
> > That makes me wonder whether we should do this. What do you think?
>
> My interpretation was that it's done this way as there may be devices
> with broken CDC headers which fail to set the corresponding capability
> bits, but still support the request (c.f. our recent not-a-modem
> discussion).

Oh well, yes I don't like it, but we can't risk the change.

> In that case, always attempting the request, but only reporting errors
> if support was claimed, makes sense.
>
> As changing this behaviour now would risk breaking such devices, I
> think black-listing (i.e. this patch) is preferred moving forward.

Unfortunately, yes.

Regards
Oliver