2007-08-10 09:14:19

by Daniel Exner

Subject: EHCI Regression in 2.6.23-rc2

Hi!

Please CC me, as I'm currently not subscribed to this list, thx.

After some serious hangs with 2.6.23-rc2 I did some bisects and this was the
result:

196705c9bbc03540429b0f7cf9ee35c2f928a534 is first bad commit
commit 196705c9bbc03540429b0f7cf9ee35c2f928a534
Author: [email protected] <[email protected]>
Date: Thu May 3 08:58:49 2007 -0700

USB: EHCI cpufreq fix

EHCI controllers that don't cache enough microframes can get MMF errors
when CPU frequency changes occur between the start and completion of
split interrupt transactions, due to delays in reading main memory
(caused by CPU cache snoop delays).

This patch adds a cpufreq notifier to the EHCI driver that will
inactivate split interrupt transactions during frequency transitions.
It was tested on Intel ICH7 and Serverworks/Broadcom HT1000 EHCI
controllers.

Signed-off-by: Stuart Hayes <[email protected]>
Signed-off-by: David Brownell <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

:040000 040000 0e6d518de17cf18155c7529f7b044a4660ca24e9
736bbcc7d3fb138138ee1840d8a6b83b959c07fc M drivers

As expected, my system only hangs when the cpufreq, powernow-k8 and ehci modules
are loaded and a frequency transition occurs.
(Simulated by using the userspace governor and changing the frequency manually.)

The corresponding EHCI Controller is:

00:10.4 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 86) (prog-if 20
[EHCI])
Subsystem: ASUSTeK Computer Inc. A7V600/K8V-X/A8V Deluxe motherboard

I could not capture any output while the hang occurs; it seems the
CPU is locked up hard.

Greetings
Daniel Exner


2007-08-10 09:28:31

by Jiri Kosina

Subject: Re: EHCI Regression in 2.6.23-rc2

On Fri, 10 Aug 2007, Daniel Exner wrote:

> Please CC me, as I'm currently not subscribed to this list, thx.

Please also don't forget to CC relevant people/lists when reporting bugs,
thanks.

> After some serious hangs with 2.6.23-rc2 I did some bisects and this was the
> result:
> 196705c9bbc03540429b0f7cf9ee35c2f928a534 is first bad commit
> commit 196705c9bbc03540429b0f7cf9ee35c2f928a534
> Author: [email protected] <[email protected]>
> Date: Thu May 3 08:58:49 2007 -0700

I guess that the patch attached to bug 8535 in kernel.org bugzilla --
http://bugzilla.kernel.org/attachment.cgi?id=12228&action=view -- solves
your issues, right?

Stuart, did you submit this fix for upstream already please?

--
Jiri Kosina

2007-08-10 13:37:25

by Daniel Exner

Subject: Re: EHCI Regression in 2.6.23-rc2

Jiri Kosina wrote:
> On Fri, 10 Aug 2007, Daniel Exner wrote:
> > Please CC me, as I'm currently not subscribed to this list, thx.
>
> Please also don't forget to CC relevant people/lists when reporting bugs,
> thanks.
Guess it's OK now? Thanks anyway :)

> > After some serious hangs with 2.6.23-rc2 I did some bisects and this was
> > the result:
> > 196705c9bbc03540429b0f7cf9ee35c2f928a534 is first bad commit
> > commit 196705c9bbc03540429b0f7cf9ee35c2f928a534
> > Author: [email protected] <[email protected]>
> > Date: Thu May 3 08:58:49 2007 -0700
>
> I guess that the patch attached to bug 8535 in kernel.org bugzilla --
> http://bugzilla.kernel.org/attachment.cgi?id=12228&action=view -- solves
> your issues, right?
Nope, this does _not_ fix my issue.

Anything else I could try, or some files you need?
I tried finding some clue in my logs, but without any results so far.

Greetings
Daniel Exner

2007-08-10 15:15:56

by Stuart Hayes

Subject: RE: EHCI Regression in 2.6.23-rc2

Jiri Kosina wrote:
> On Fri, 10 Aug 2007, Daniel Exner wrote:
>
>> After some serious hangs with 2.6.23-rc2 I did some bisects and this
>> was the result:
>> 196705c9bbc03540429b0f7cf9ee35c2f928a534 is first bad commit commit
>> 196705c9bbc03540429b0f7cf9ee35c2f928a534
>> Author: [email protected] <[email protected]>
>> Date: Thu May 3 08:58:49 2007 -0700
>
> I guess that the patch attached to bug 8535 in kernel.org bugzilla --
> http://bugzilla.kernel.org/attachment.cgi?id=12228&action=view --
> solves your issues, right?
>
> Stuart, did you submit this fix for upstream already please?

Yes... http://marc.info/?l=linux-usb-devel&m=118598561010046&w=2

However, I have not tested this with a VIA EHCI controller (though it's
been tested with Intel, Broadcom, and nVidia). This patch uses the
"inactivate" bit in the QH, which wasn't previously used by the linux
kernel, and I found that the different vendors of EHCI controllers
(Intel, Broadcom, nVidia) all handle this a little differently. There's
probably something about the way VIA controllers respond to seeing this
bit set that is breaking things.

I'll try to get my hands on a VIA EHCI controller so I can look at
this... if you happen to know of an add-in card that has one of these,
please let me know! It would be a lot easier for me to debug this
myself here than to try to get someone else to run test kernels for
me...

2007-08-13 20:49:33

by Stuart Hayes

Subject: RE: EHCI Regression in 2.6.23-rc2

Daniel Exner wrote:
> Jiri Kosina wrote:
>> On Fri, 10 Aug 2007, Daniel Exner wrote:
>>> Please CC me, as I'm currently not subscribed to this list, thx.
>>
>> Please also don't forget to CC relevant people/lists when reporting
>> bugs, thanks.
> Guess it's OK now? Thanks anyway :)
>
>>> After some serious hangs with 2.6.23-rc2 I did some bisects and
>>> this was the result: 196705c9bbc03540429b0f7cf9ee35c2f928a534 is
>>> first bad commit commit 196705c9bbc03540429b0f7cf9ee35c2f928a534
>>> Author: [email protected] <[email protected]>
>>> Date: Thu May 3 08:58:49 2007 -0700
>>
>> I guess that the patch attached to bug 8535 in kernel.org bugzilla --
>> http://bugzilla.kernel.org/attachment.cgi?id=12228&action=view --
>> solves your issues, right?
> Nope, this does _not_ fix my issue.
>
> Anything else I could try, or some files you need?
> I tried finding some clue in my logs, but without any results so far.
>
> Greetings
> Daniel Exner


It appears that the VIA controllers just ignore the "inactivate" bit
completely.

Normally, when I set the "inactivate" bit in the QH and then watch the
QH & overlay, I eventually see the controller clear the "active" bit in
the overlay token, and, of course, it doesn't do the transaction.

With the VIA controller I have, after I set the "inactivate" bit, I
eventually see the controller set bit 1 in the overlay token
(SplitXstate), indicating that it's running the transaction, and, a
couple microframes later, it clears that bit again. The transaction is
not inactivated.

The problem occurs if a transaction completes when the "inactivate" bit
is set... qh_completions will ignore the transaction until the
"inactivate" bit is cleared, and then, when the transaction should be
re-activated, my patch will set the "active" bit back to 1 in the
overlay & qtd token, even though the transaction was already completed
by the controller...

To work around this, I'd have to re-write my patch so that it didn't
depend on the "inactivate" bit at all... I suppose it could possibly be
done just by directly manipulating the "active" bit in the overlay
token, since the code already doesn't mess with the overlay if there's
any chance that the transaction is already cached or in progress, but
that would be tricky.

Perhaps for now the best thing would just be to bypass the EHCI CPU
frequency notifier code (i.e., my patch) for VIA EHCI controllers, since
they are broken. Would a hard-coded blacklist (just an "if
(manufacturer==VIA)..." type thing) be OK?
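
A minimal sketch of the kind of hard-coded check proposed above. It is not the actual driver code: `struct hc_pci_info` and `ehci_wants_cpufreq_notifier()` are hypothetical names made up for illustration, and only the VIA PCI vendor ID (0x1106) is a real value.

```c
#include <stdbool.h>
#include <stdint.h>

/* PCI vendor ID assigned to VIA Technologies. */
#define PCI_VENDOR_ID_VIA 0x1106

/* Hypothetical, simplified stand-in for the PCI identity of a host controller. */
struct hc_pci_info {
    uint16_t vendor;
    uint16_t device;
};

/*
 * Hypothetical helper: returns true when the cpufreq notifier (the
 * "inactivate"-based workaround) should be registered for this controller.
 * VIA controllers are skipped because they mishandle the "inactivate" bit.
 */
static bool ehci_wants_cpufreq_notifier(const struct hc_pci_info *pdev)
{
    if (pdev->vendor == PCI_VENDOR_ID_VIA)
        return false;
    return true;
}
```

A vendor-wide check like this is the bluntest option; a narrower list keyed on device ID would spare any VIA parts that turn out to behave.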

I've also acquired a card with an NEC EHCI controller on it, which I'm
going to look at while I'm into it...

Thanks
Stuart

2007-08-13 22:16:35

by David Brownell

Subject: Re: EHCI Regression in 2.6.23-rc2

On Monday 13 August 2007, [email protected] wrote:
> With the VIA controller I have,

Which kind is that? The VT6202 is buggy as all get-out, and
they sold a *LOT* of those discrete chips for use in add-on PCI
cards. We generally warn people away from those. A more current
version is the VT6212, which was much more usable. (If it says
EHCI 0.95, it's a VT6202... their EHCI 1.0 chips were much better.)


> after I set the "inactivate" bit, I
> eventually see the controller set bit 1 in the overlay token
> (SplitXstate), indicating that it's running the transaction, and, a
> couple microframes later, it clears that bit again. The transaction is
> not inactivated.

> ...

> Perhaps for now the best thing would just be to bypass the EHCI CPU
> frequency notifier code (i.e., my patch) for VIA EHCI controllers, since
> they are broken. Would a hard-coded blacklist (just an "if
> (manufacturer==VIA)..." type thing) be OK?

Yes ... although if you don't need to blacklist their EHCI 1.0 chips
don't do it. (Any VIA EHCI integrated into a southbridge is going
to follow spec rev 1.0 pretty well, modulo idiosyncratic timings.)


> I've also acquired a card with an NEC EHCI controller on it, which I'm
> going to look at while I'm into it...

Another case where there are a lot of add-on "EHCI 0.95" cards; but
in this case the quirks were less significant.

- Dave

2007-08-14 06:44:30

by Daniel Exner

Subject: Re: EHCI Regression in 2.6.23-rc2

David Brownell wrote:
> On Monday 13 August 2007, [email protected] wrote:
> > With the VIA controller I have,
>
> Which kind is that? The VT6202 is buggy as all get-out, and
> they sold a *LOT* of those discrete chips for use in add-on PCI
> cards. We generally warn people away from those. A more current
> version is the VT6212, which was much more usable. (If it says
> EHCI 0.95, it's a VT6202... their EHCI 1.0 chips were much better.)
Where exactly should I search for this? Neither lspci nor lsusb showed any
hint on the EHCI rev. the chip conforms to..

[..]
> > Perhaps for now the best thing would just be to bypass the EHCI CPU
> > frequency notifier code (i.e., my patch) for VIA EHCI controllers, since
> > they are broken. Would a hard-coded blacklist (just an "if
> > (manufacturer==VIA)..." type thing) be OK?
>
> Yes ... although if you don't need to blacklist their EHCI 1.0 chips
> don't do it. (Any VIA EHCI integrated into a southbridge is going
> to follow spec rev 1.0 pretty well, modulo idiosyncratic timings.)
I guess it's necessary to blacklist even the EHCI 1.0 chips, since my problem is
with exactly one of those ;)

I'm not really into USB protocol specs, but perhaps it's possible to test
whether the problem Stuart's patch addressed can actually happen on VIA EHCI
chips? Perhaps those guys solved the problem in Hard/Firmware..

> > I've also acquired a card with an NEC EHCI controller on it, which I'm
> > going to look at while I'm into it...
>
> Another case where there are a lot of add-on "EHCI 0.95" cards; but
> in this case the quirks were less significant.
Some guy donated me a PCMCIA card with one of those, because it won't work in
his Windows-only notebook :)



Greetings
Daniel Exner

2007-08-14 08:01:58

by David Brownell

Subject: Re: EHCI Regression in 2.6.23-rc2

On Monday 13 August 2007, Daniel Exner wrote:
> David Brownell wrote:
> > On Monday 13 August 2007, [email protected] wrote:
> > > With the VIA controller I have,
> >
> > Which kind is that? The VT6202 is buggy as all get-out, and
> > they sold a *LOT* of those discrete chips for use in add-on PCI
> > cards. We generally warn people away from those. A more current
> > version is the VT6212, which was much more usable. (If it says
> > EHCI 0.95, it's a VT6202... their EHCI 1.0 chips were much better.)
>
> Where exactly should I search for this? Neither lspci nor lsusb showed any
> hint on the EHCI rev. the chip conforms to..

The driver logs that information as it starts; on this system:

ehci_hcd 0000:00:02.2: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004

vs "EHCI 0.95".
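
For readers wondering where that "1.00" comes from: as far as I know it is the controller's 16-bit HCIVERSION capability register, a BCD value with the major revision in the high byte and the minor revision in the low byte. A rough sketch of the formatting follows; `format_ehci_revision()` is a hypothetical helper, not a kernel function.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: render a BCD-encoded HCIVERSION value the way the
 * log line above shows it, e.g. 0x0100 as "EHCI 1.00".
 * Returns the snprintf() result (number of characters that would be written).
 */
static int format_ehci_revision(uint16_t hciversion, char *buf, size_t len)
{
    unsigned int major = hciversion >> 8;    /* high byte: major revision */
    unsigned int minor = hciversion & 0xff;  /* low byte: minor revision, BCD */

    return snprintf(buf, len, "EHCI %x.%02x", major, minor);
}
```

Printing the BCD digits with `%x` is what keeps a 0.95 part reading as "0.95" rather than as a decimal conversion of 0x95.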


> [..]
> > > Perhaps for now the best thing would just be to bypass the EHCI CPU
> > > frequency notifier code (i.e., my patch) for VIA EHCI controllers, since
> > > they are broken. Would a hard-coded blacklist (just an "if
> > > (manufacturer==VIA)..." type thing) be OK?
> >
> > Yes ... although if you don't need to blacklist their EHCI 1.0 chips
> > don't do it. (Any VIA EHCI integrated into a southbridge is going
> > to follow spec rev 1.0 pretty well, modulo idiosyncratic timings.)
>
> I guess it's necessary to blacklist even the EHCI 1.0 chips, since my problem is
> with exactly one of those ;)

Something doesn't add up then ... above you ask where to find that info,
but here you say you already got it from somewhere ... ?


> I'm not really into USB protocol specs, but perhaps it's possible to test
> whether the problem Stuart's patch addressed can actually happen on VIA EHCI
> chips? Perhaps those guys solved the problem in Hard/Firmware..

Theoretically possible, and I've certainly seen hardware made to do
stranger things than that.


> > > I've also acquired a card with an NEC EHCI controller on it, which I'm
> > > going to look at while I'm into it...
> >
> > Another case where there are a lot of add-on "EHCI 0.95" cards; but
> > in this case the quirks were less significant.
>
> Some guy donated me a PCMCIA card with one of those, because it won't work in
> his Windows-only notebook :)

A NEC 0.95 ?? Should be fine with Linux. Assuming no bugs have
crept in.

- Dave



2007-08-14 09:47:42

by Daniel Exner

Subject: Re: EHCI Regression in 2.6.23-rc2

David Brownell wrote:
> On Monday 13 August 2007, Daniel Exner wrote:
[..]
> > Where exactly should I search for this? Neither lspci nor lsusb showed
> > any hint on the EHCI rev. the chip conforms to..
>
> The driver logs that information as it starts; on this system:
>
> ehci_hcd 0000:00:02.2: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
>
> vs "EHCI 0.95".
ehci_hcd 0000:00:10.4: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004

Built into:
00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI bridge [K8T800/K8T890
South]

> > > > I've also acquired a card with an NEC EHCI controller on it, which
> > > > I'm going to look at while I'm into it...
> > >
> > > Another case where there are a lot of add-on "EHCI 0.95" cards; but
> > > in this case the quirks were less significant.
> >
> > Some guy donated me a PCMCIA card with one of those, because it won't
> > work in his Windows-only notebook :)
>
> A NEC 0.95 ?? Should be fine with Linux. Assuming no bugs have
> crept in.
Didn't test it yet with 2.6.23-rc2 or rc3, but up to 2.6.22 it was fine :)

Regarding the option to blacklist VIA in the module:
I would prefer blacklisting VIA by default but giving the module some
parameter like "honours inactive bit" to override this.

Perhaps there are newer VIA Chips out there, that indeed do this and some
users trigger happy enough to test this. :)

Greetings
Daniel Exner

2007-08-14 15:24:18

by Stuart Hayes

Subject: RE: EHCI Regression in 2.6.23-rc2

Daniel Exner wrote:
> David Brownell wrote:
>> On Monday 13 August 2007, Daniel Exner wrote:
> [..]
>>> Where exactly should I search for this? Neither lspci nor lsusb
>>> showed any hint on the EHCI rev. the chip conforms to..
>>
>> The driver logs that information as it starts; on this system:
>>
>> ehci_hcd 0000:00:02.2: USB 2.0 started, EHCI 1.00, driver 10 Dec
>> 2004
>>
>> vs "EHCI 0.95".
> ehci_hcd 0000:00:10.4: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
>
> Built into:
> 00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI bridge
> [K8T800/K8T890 South]
>

Hm... I've got a 0.95. I'll try to get a Via EHCI 1.00 controller and
make sure it's the same problem.

>>>>> I've also acquired a card with an NEC EHCI controller on it,
>>>>> which I'm going to look at while I'm into it...
>>>>
>>>> Another case where there are a lot of add-on "EHCI 0.95" cards;
>>>> but in this case the quirks were less significant.
>>>
>>> Some guy donated me a PCMCIA card with one of those, because it
>>> won't work in his Windows-only notebook :)
>>
>> A NEC 0.95 ?? Should be fine with Linux. Assuming no bugs have
>> crept in.
> Didn't test it yet with 2.6.23-rc2 or rc3, but up to 2.6.22 it was
> fine :)
>
> Regarding the option to blacklist VIA in the module:
> I would prefer blacklisting VIA by default but giving the module some
> parameter like "honours inactive bit" to override this.
>
> Perhaps there are newer VIA Chips out there, that indeed do this and
> some users trigger happy enough to test this. :)

That kernel parameter sounds like a reasonable idea to me. The problem
that the patch is trying to work around is that, while the CPUs are
changing frequency, the EHCI controller gets delayed trying to read main
memory (because CPU cache snoops have to wait until the CPU is
finished)... if this happens in the middle of a split transaction to a
low/full speed device, the transaction won't complete in time, and you
get an error and possible data loss.

If the EHCI controller caches ahead enough, it shouldn't need to read
main memory to be able to complete the split transaction... but, while
the controller does say how much ahead it may cache, it isn't clear to
me that it will always be able to cache that much, so I thought it would
be safe to go ahead and inactivate split transactions during CPU
frequency transitions regardless.
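
To illustrate the "how much ahead it may cache" part: my assumption is that this refers to the Isochronous Scheduling Threshold field in the HCCPARAMS capability register (bits 7:4), where bit 7 set means the controller may cache a whole frame and otherwise the low three bits give a microframe count. A sketch under that assumption, with `ehci_isoc_cache_uframes()` being a hypothetical name:

```c
#include <stdint.h>

/*
 * Hypothetical helper: how many microframes of schedule the controller
 * says it may cache, decoded from HCCPARAMS bits 7:4
 * (Isochronous Scheduling Threshold).
 */
static unsigned int ehci_isoc_cache_uframes(uint32_t hccparams)
{
    /* Bit 7 set: the controller may cache an entire frame (8 microframes). */
    if (hccparams & (1u << 7))
        return 8;

    /* Bit 7 clear: bits 6:4 hold the number of microframes. */
    return (hccparams >> 4) & 0x7;
}
```

Note that this is only what the controller advertises; as said above, it is not clear that it can always cache that much in practice.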

2007-08-14 15:42:56

by David Brownell

Subject: Re: EHCI Regression in 2.6.23-rc2

> ehci_hcd 0000:00:10.4: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
>
> Built into:
> 00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI bridge [K8T800/K8T890
> South]

Yeah, VT8235 was their first southbridge with integrated EHCI.
ISTR that the VT8237 worked a bit more smoothly.

I think 8237 was about where they forked off the VT6212 as
their first discrete EHCI (for addon cards) claiming EHCI 1.0
conformance. Too bad they didn't recycle all the VT6202 chips
in their inventory at that time...

Now ... are you reporting that this worked with Stuart's patch?
Or that it didn't? Or that you couldn't say?

- Dave

2007-08-14 15:49:54

by David Brownell

Subject: Re: EHCI Regression in 2.6.23-rc2

> Hm... I've got a 0.95. I'll try to get a Via EHCI 1.00 controller and
> make sure it's the same problem.

Yeah, for some reason way too many of the add-on PCI cards with
VIA chips use that pretty-broken VT6202 chip. Ones with VT6212
are also available, and work a lot better.


> > Regarding the option to blacklist VIA in the module:
> > I would prefer blacklisting VIA by default but giving the module some
> > parameter like "honours inactive bit" to override this.
> >
> > Perhaps there are newer VIA Chips out there, that indeed do this and
> > some users trigger happy enough to test this. :)
>
> That kernel parameter sounds like a reasonable idea to me.

Yes, IFF we know that the bug shows up in EHCI 1.00 chips rather than
just the already-known-to-be-buggy VT6202 chips. (I think part of the
deal was that until the parts went through some conformance testing,
nobody could use the "1.0" label. There were also a few small feature
updates and spec clarifications. If anyone else shipped silicon in
volume that was as buggy as a VT6202, I didn't see any.)

I'd be happy to see a warning come out whenever a VT6202 is found,
since its problems are NOT limited to this I-bit bug.


> The problem
> that the patch is trying to work around is that, while the CPUs are
> changing frequency, the EHCI controller gets delayed trying to read main
> memory (because CPU cache snoops have to wait until the CPU is
> finished)... if this happens in the middle of a split transaction to a
> low/full speed device, the transaction won't complete in time, and you
> get an error and possible data loss.
>
> If the EHCI controller caches ahead enough, it shouldn't need to read
> main memory to be able to complete the split transaction... but, while
> the controller does say how much ahead it may cache, it isn't clear to
> me that it will always be able to cache that much, so I thought it would
> be safe to go ahead and inactivate split transactions during CPU
> frequency transitions regardless.

Right.

2007-08-14 15:58:18

by Daniel Exner

Subject: Re: EHCI Regression in 2.6.23-rc2

David Brownell wrote:
> > ehci_hcd 0000:00:10.4: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
> >
> > Built into:
> > 00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI bridge
> > [K8T800/K8T890 South]
>
> Yeah, VT8235 was their first southbridge with integrated EHCI.
> ISTR that the VT8237 worked a bit more smoothly.
>
> I think 8237 was about where they forked off the VT6212 as
> their first discrete EHCI (for addon cards) claiming EHCI 1.0
> conformance. Too bad they didn't recycle all the VT6202 chips
> in their inventory at that time...
No hardware producer would have done that ;)

> Now ... are you reporting that this worked with Stuart's patch?
> Or that it didn't? Or that you couldn't say?
I started this thread because Stuart's patch freezes my whole system (at
least my bisect did blame it), so I report that it doesn't work for
me.




Greetings
Daniel Exner

2007-08-14 16:13:04

by David Brownell

Subject: Re: EHCI Regression in 2.6.23-rc2

> > I think 8237 was about where they forked off the VT6212 as
> > their first discrete EHCI (for addon cards) claiming EHCI 1.0
> > conformance. Too bad they didn't recycle all the VT6202 chips
> > in their inventory at that time...
>
> No hardware producer would have done that ;)

As I said: too bad. ;)


> > Now ... are you reporting that this worked with Stuart's patch?
> > Or that it didn't? Or that you couldn't say?
>
> I started this thread because Stuart's patch freezes my whole system (at
> least my bisect did blame it), so I report that it doesn't work for
> me.

The original patch, yes. ISTR seeing an update come around though.
Maybe I was imagining things.

2007-08-14 21:54:56

by Stuart Hayes

Subject: RE: EHCI Regression in 2.6.23-rc2

David Brownell wrote:
>> Hm... I've got a 0.95. I'll try to get a Via EHCI 1.00 controller
>> and make sure it's the same problem.
>
> Yeah, for some reason way too many of the add-on PCI cards with VIA
> chips use that pretty-broken VT6202 chip. Ones with VT6212 are also
> available, and work a lot better.
>
>
>>> Regarding the option to blacklist VIA in the module:
>>> I would prefer blacklisting VIA by default but giving the module
>>> some parameter like "honours inactive bit" to override this.
>>>
>>> Perhaps there are newer VIA Chips out there, that indeed do this and
>>> some users trigger happy enough to test this. :)
>>
>> That kernel parameter sounds like a reasonable idea to me.
>
> Yes, IFF we know that the bug shows up in EHCI 1.00 chips rather than
> just the already-known-to-be-buggy VT6202 chips. (I think part of
> the deal was that until the parts went through some conformance
> testing, nobody could use the "1.0" label. There were also a few
> small feature updates and spec clarifications. If anyone else
> shipped silicon in volume that was as buggy as a VT6202, I didn't see
> any.)
>
> I'd be happy to see a warning come out whenever a VT6202 is found,
> since its problems are NOT limited to this I-bit bug.
>

OK, I've got a VIA VT6212, and it's definitely not the same as the
6202--it's locking up my system, too, with my patch, and it is
definitely not just ignoring the inactivate bit. I'm still trying to
figure out what's going on.

The NEC controller (EHCI 1.00) seems to work fine, though.


2007-08-15 18:38:27

by Stuart Hayes

Subject: RE: EHCI Regression in 2.6.23-rc2

Hayes, Stuart wrote:
> David Brownell wrote:
>>> Hm... I've got a 0.95. I'll try to get a Via EHCI 1.00 controller
>>> and make sure it's the same problem.
>>
>> Yeah, for some reason way too many of the add-on PCI cards with VIA
>> chips use that pretty-broken VT6202 chip. Ones with VT6212 are also
>> available, and work a lot better.
>>
>>
>>>> Regarding the option to blacklist VIA in the module:
>>>> I would prefer blacklisting VIA by default but giving the module
>>>> some parameter like "honours inactive bit" to override this.
>>>>
>>>> Perhaps there are newer VIA Chips out there, that indeed do this
>>>> and some users trigger happy enough to test this. :)
>>>
>>> That kernel parameter sounds like a reasonable idea to me.
>>
>> Yes, IFF we know that the bug shows up in EHCI 1.00 chips rather than
>> just the already-known-to-be-buggy VT6202 chips. (I think part of
>> the deal was that until the parts went through some conformance
>> testing, nobody could use the "1.0" label. There were also a few
>> small feature updates and spec clarifications. If anyone else
>> shipped silicon in volume that was as buggy as a VT6202, I didn't see
>> any.)
>>
>> I'd be happy to see a warning come out whenever a VT6202 is found,
>> since its problems are NOT limited to this I-bit bug.
>>
>
> OK, I've got a VIA VT6212, and it's definitely not the same as the
> 6202--it's locking up my system, too, with my patch, and it is
> definitely not just ignoring the inactivate bit. I'm still trying to
> figure out what's going on.
>
> The NEC controller (EHCI 1.00) seems to work fine, though.

OK... I see what's happening. When the VIA VT6212 sees the "inactivate"
bit set, it will START the split transaction, but it doesn't finish it.
When I set the "I" bit--even if I set it like 50 uframes before the
transaction should start--the controller will set bit 1
(splitXstate--this means it's started the transaction) and clear bit 7
(active bit) in the token, when it comes time for the transaction to be
run.

This is a violation of EHCI 1.0 spec section 4.12.2.5, second bullet:

"If the Active bit is a one and the SplitXState is DoStart (regardless
of the value of S-mask), the host controller will simply set Active bit
to a zero... the host controller must not issue the start-split bus
transaction."

With an analyzer, I've observed that the controller is indeed issuing
"start split" without issuing "complete split", and I'm losing
keystrokes if I type when this is happening, as expected.
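
The token states described above can be told apart with a small check like the following sketch. The bit positions (bit 7 = Active, bit 1 = SplitXState) are the ones quoted in this mail; the macro, enum and function names are hypothetical and chosen only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Token bits as described above. */
#define TOKEN_ACTIVE       (1u << 7)  /* transaction still enabled */
#define TOKEN_SPLIT_XSTATE (1u << 1)  /* 0 = DoStart, 1 = DoComplete */

/* Hypothetical classification of a token sampled after the "I" bit was set. */
enum ibit_result {
    IBIT_PENDING,        /* controller has not reached the QH yet */
    IBIT_HONOURED,       /* Active cleared, start-split never issued */
    IBIT_SPLIT_STARTED   /* start-split was issued despite the "I" bit */
};

static enum ibit_result classify_token(uint32_t token)
{
    bool active  = (token & TOKEN_ACTIVE) != 0;
    bool started = (token & TOKEN_SPLIT_XSTATE) != 0;

    if (started)
        return IBIT_SPLIT_STARTED;  /* what the VIA parts were seen doing */
    if (!active)
        return IBIT_HONOURED;       /* what the spec text above requires */
    return IBIT_PENDING;
}
```

With this reading, a token of 0x02 (Active cleared, SplitXState set) is the VT6212 case from this mail, and 0x00 is the behaviour the spec asks for.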

So... I still think blacklisting the VIA controllers from this CPU
frequency stuff is the best option. It is unlikely that any real issues
will be seen during CPU frequency transitions with these controllers
anyway, because they claim to cache 8 uframes of the periodic schedule.

2007-08-15 19:12:19

by David Brownell

Subject: Re: [linux-usb-devel] EHCI Regression in 2.6.23-rc2

On Wednesday 15 August 2007, [email protected] wrote:
> So... I still think blacklisting the VIA controllers from this CPU
> frequency stuff is the best option.

Sadly, yes. My negative impression of VIA quality is confirmed,
yet again...

Please make sure the comments in your blacklist code describe both
of the chip bugs you've observed.


> It is unlikely that any real issues
> will be seen during CPU frequency transitions with these controllers
> anyway, because they claim to cache 8 uframes of the periodic schedule.

I'd not be so sure. But if there are such issues, we can wait
for problem reports.

- Dave