2003-05-07 15:59:43

by Paul Fulghum

[permalink] [raw]
Subject: 2.5.69 Interrupt Latency

Starting with kernel version 2.5.69, I am
encountering what appears to be increased
interrupt latency or spikes in interrupt latency.

I noticed this on two serial drivers that use programmed
I/O with FIFOs. On 2.5.68, no problems. On 2.5.69
plenty of underruns. Inspecting the driver tracing, it
does not look like lost interrupts.

I see this on 2 different machines
(one SMP server and one laptop).

There were changes involving the return type of
interrupt handlers (from void to irqreturn_t) in 2.5.69.
Could this be related?

Has anyone else seen similar results?

If I can get time, I'll try and hook up a scope
to measure the latencies precisely. I want to
check to see if this is a known problems before doing so.

--
Paul Fulghum, [email protected]
Microgate Corporation, http://www.microgate.com



2003-05-07 19:29:39

by Paul Fulghum

[permalink] [raw]
Subject: Re: 2.5.69 Interrupt Latency

On Wed, 2003-05-07 at 11:12, Paul Fulghum wrote:
> Starting with kernel version 2.5.69, I am
> encountering what appears to be increased
> interrupt latency or spikes in interrupt latency.
> ...
> I see this on 2 different machines
> (one SMP server and one laptop).
> ...
> If I can get time, I'll try and hook up a scope
> to measure the latencies precisely. I want to
> check to see if this is a known problems before doing so.

Here are some results with a scope hooked to
the hardware while running tests with a regular
interrupt pattern:

2.4.20-8 (redhat)
Latency 20-30usec
Spikes to 80usec

2.5.68
Latency 20-30usec
Spikes to 100usec

2.5.69
Latency 100-110usec (5x increase)
Spikes from 5-10 milliseconds

This is all on a PCI adapter not sharing interrupts
on a dual Pentium II-400 Netserver LC3.

Any ideas what happened?

--
Paul Fulghum, [email protected]
Microgate Corporation, http://www.microgate.com


2003-05-07 22:20:09

by Andrew Morton

[permalink] [raw]
Subject: Re: 2.5.69 Interrupt Latency

Paul Fulghum <[email protected]> wrote:
>
> 2.5.69
> Latency 100-110usec (5x increase)
> Spikes from 5-10 milliseconds
>
> This is all on a PCI adapter not sharing interrupts
> on a dual Pentium II-400 Netserver LC3.
>
> Any ideas what happened?

Could be that some random piece of code forgot to reenable interrupts, and
things stay that way until they get reenabled again by schedule() or
syscall return.

One way of finding the culprit would be:

my_isr()
{
if (this interrupt is more than 5 milliseconds delayed)
dump_stack();
}

the stack dump will point up at the place where interrupts finally got
enabled.

If you can describe what drivers are in use, and what workload triggers the
problem then it may be locatable by inspection.

2003-05-08 01:06:24

by Paul Fulghum

[permalink] [raw]
Subject: Re: 2.5.69 Interrupt Latency

On Wed, 2003-05-07 at 17:28, Andrew Morton wrote:
> Paul Fulghum <[email protected]> wrote:
> >
> > 2.5.69
> > Latency 100-110usec (5x increase)
> > Spikes from 5-10 milliseconds
> >
> > This is all on a PCI adapter not sharing interrupts
> > on a dual Pentium II-400 Netserver LC3.
> >
> > Any ideas what happened?
>
> Could be that some random piece of code forgot to reenable interrupts, and
> things stay that way until they get reenabled again by schedule() or
> syscall return.
>
> One way of finding the culprit would be:
>
> my_isr()
> {
> if (this interrupt is more than 5 milliseconds delayed)
> dump_stack();
> }
>
> the stack dump will point up at the place where interrupts finally got
> enabled.

I'll give that a try tomorrow.

> If you can describe what drivers are in use, and what workload triggers the
> problem then it may be locatable by inspection.

It happens on both of the machines I tried (server and laptop).
I think the only common hardware between the two is the net
controller which is intel etherpro 100 based. I'll check tomorrow
to be sure.

There was essentially no work load (no net traffic, no CPU
intensive program, no disk activity). I was just doing simple
loopback tests on our serial devices (PCI based on server
and PC Card on laptop).

Paul Fulghum
[email protected]




2003-05-08 13:43:45

by Paul Fulghum

[permalink] [raw]
Subject: Re: 2.5.69 Interrupt Latency

On Wed, 2003-05-07 at 17:28, Andrew Morton wrote:
> Paul Fulghum <[email protected]> wrote:
> >
> > 2.5.69
> > Latency 100-110usec (5x increase)
> > Spikes from 5-10 milliseconds
> >

> If you can describe what drivers are in use, and what workload triggers the
> problem then it may be locatable by inspection.

After inspecting both machines, there is no common
hardware other than the net device. Both machines
use the e100 driver.

After removing the e100 driver altogether,
the increased latency and latency spikes persist.

So it looks like this problem is not specific to a
particular hardware driver, but is a result of a
more fundemental, system wide change.

I'm going to try your suggestion of doing a stack dump
when the driver encounters the large spikes in IRQ latency,
to determine if something is leaving interrupts disabled.

That will not address the fact that the minimum
latency has jumped from 20usec (2.4.20 - 2.5.68) to 100usec
(2.5.69). This may actually be two separate problems
introduced with 2.5.69

--
Paul Fulghum, [email protected]
Microgate Corporation, http://www.microgate.com


2003-05-08 14:34:25

by Paul Fulghum

[permalink] [raw]
Subject: Re: 2.5.69 Interrupt Latency

On Wed, 2003-05-07 at 17:28, Andrew Morton wrote:
> Paul Fulghum <[email protected]> wrote:
> >
> > 2.5.69
> > Latency 100-110usec (5x increase)
> > Spikes from 5-10 milliseconds
> >
> Could be that some random piece of code forgot to reenable interrupts, and
> things stay that way until they get reenabled again by schedule() or
> syscall return.
>
> One way of finding the culprit would be:
>
> my_isr()
> {
> if (this interrupt is more than 5 milliseconds delayed)
> dump_stack();
> }
>
> the stack dump will point up at the place where interrupts finally got
> enabled.

Here is what I got (latency spike in milliseconds):

synclinkscc is a driver I maintain, and that is
where I placed the stack_dump()

Call Trace:
[<cc8bdf1a>] IsrStatusB+0x1fa/0x2c0 [synclinkscc]
[<cc8c5b29>] +0xf4/0x5ab [synclinkscc]
[<cc8c5fe0>] +0x0/0x14c8 [synclinkscc]
[<cc8be575>] mgscc_interrupt+0xa5/0x1b0 [synclinkscc]
[<cc8c5fe0>] +0x0/0x14c8 [synclinkscc]
[<c010d5eb>] handle_IRQ_event+0x4b/0x120
[<c010d89b>] do_IRQ+0x9b/0x110
[<c010bc00>] common_interrupt+0x18/0x20
[<c01255db>] do_softirq+0x6b/0xe0
[<c010d8fb>] do_IRQ+0xfb/0x110
[<c0108f90>] default_idle+0x0/0x40
[<c010bc00>] common_interrupt+0x18/0x20
[<c0108f90>] default_idle+0x0/0x40
[<c0108fc0>] default_idle+0x30/0x40
[<c010905a>] cpu_idle+0x4a/0x60
[<c0105000>] rest_init+0x0/0x60
[<c0456951>] start_kernel+0x181/0x1b0
[<c0456500>] unknown_bootoption+0x0/0x110



--
Paul Fulghum, [email protected]
Microgate Corporation, http://www.microgate.com


2003-05-08 19:09:17

by Andrew Morton

[permalink] [raw]
Subject: Re: 2.5.69 Interrupt Latency

Paul Fulghum <[email protected]> wrote:
>
> On Wed, 2003-05-07 at 17:28, Andrew Morton wrote:
> > Paul Fulghum <[email protected]> wrote:
> > >
> > > 2.5.69
> > > Latency 100-110usec (5x increase)
> > > Spikes from 5-10 milliseconds
> > >
>
> > If you can describe what drivers are in use, and what workload triggers the
> > problem then it may be locatable by inspection.
>
> After inspecting both machines, there is no common
> hardware other than the net device. Both machines
> use the e100 driver.
>
> After removing the e100 driver altogether,
> the increased latency and latency spikes persist.
>
> So it looks like this problem is not specific to a
> particular hardware driver, but is a result of a
> more fundemental, system wide change.
>
> I'm going to try your suggestion of doing a stack dump
> when the driver encounters the large spikes in IRQ latency,
> to determine if something is leaving interrupts disabled.

I wasn't very informative, alas.

> That will not address the fact that the minimum
> latency has jumped from 20usec (2.4.20 - 2.5.68) to 100usec
> (2.5.69). This may actually be two separate problems
> introduced with 2.5.69

Can you pinpoint a kernel version at which it started to happen?

2003-05-08 19:22:33

by Paul Fulghum

[permalink] [raw]
Subject: Re: 2.5.69 Interrupt Latency

On Thu, 2003-05-08 at 14:22, Andrew Morton wrote:
> Paul Fulghum <[email protected]> wrote:
> >
> > On Wed, 2003-05-07 at 17:28, Andrew Morton wrote:
> > > Paul Fulghum <[email protected]> wrote:
> > > >
> > > > 2.5.69
> > > > Latency 100-110usec (5x increase)
> > > > Spikes from 5-10 milliseconds
> > > >

> > I'm going to try your suggestion of doing a stack dump
> > when the driver encounters the large spikes in IRQ latency,
> > to determine if something is leaving interrupts disabled.
>
> I wasn't very informative, alas.

Yeah, I've been reading through the 2.5.69 patch again and
could not really see anything that related to the
stack dump.

> > That will not address the fact that the minimum
> > latency has jumped from 20usec (2.4.20 - 2.5.68) to 100usec
> > (2.5.69). This may actually be two separate problems
> > introduced with 2.5.69
>
> Can you pinpoint a kernel version at which it started to happen?

Exactly with 2.5.69

2.5.68 works fine as do earlier versions back to 2.4.20-8
(earliest tested for this problem). All these versions have
very consistant latencies as described above.

The problem definately started with the 2.5.69

--
Paul Fulghum, [email protected]
Microgate Corporation, http://www.microgate.com


2003-05-08 23:08:21

by Brian Gerst

[permalink] [raw]
Subject: Re: 2.5.69 Interrupt Latency

Paul Fulghum wrote:
> On Thu, 2003-05-08 at 14:22, Andrew Morton wrote:
>
>>Paul Fulghum <[email protected]> wrote:
>>
>>>On Wed, 2003-05-07 at 17:28, Andrew Morton wrote:
>>>
>>>>Paul Fulghum <[email protected]> wrote:
>>>>
>>>>>2.5.69
>>>>>Latency 100-110usec (5x increase)
>>>>>Spikes from 5-10 milliseconds
>>>>>
>
>
>>>I'm going to try your suggestion of doing a stack dump
>>>when the driver encounters the large spikes in IRQ latency,
>>>to determine if something is leaving interrupts disabled.
>>
>>I wasn't very informative, alas.
>
>
> Yeah, I've been reading through the 2.5.69 patch again and
> could not really see anything that related to the
> stack dump.
>
>
>>>That will not address the fact that the minimum
>>>latency has jumped from 20usec (2.4.20 - 2.5.68) to 100usec
>>>(2.5.69). This may actually be two separate problems
>>>introduced with 2.5.69
>>
>>Can you pinpoint a kernel version at which it started to happen?
>
>
> Exactly with 2.5.69
>
> 2.5.68 works fine as do earlier versions back to 2.4.20-8
> (earliest tested for this problem). All these versions have
> very consistant latencies as described above.
>
> The problem definately started with the 2.5.69
>

Try to narrow it down with the 2.5.68-bk snapshots.

--
Brian Gerst

2003-05-09 17:59:11

by Paul Fulghum

[permalink] [raw]
Subject: Re: 2.5.69 Interrupt Latency

On Thu, 2003-05-08 at 14:22, Andrew Morton wrote:
> Can you pinpoint a kernel version at which it started to happen?

I have now isolated the latency problems further to 2.5.68-bk11

2.5.68-bk10 an earlier works fine.

--
Paul Fulghum, [email protected]
Microgate Corporation, http://www.microgate.com


2003-05-09 20:17:43

by Paul Fulghum

[permalink] [raw]
Subject: Re: 2.5.69 Interrupt Latency

On Fri, 2003-05-09 at 13:12, Paul Fulghum wrote:
> On Thu, 2003-05-08 at 14:22, Andrew Morton wrote:
> > Can you pinpoint a kernel version at which it started to happen?
>
> I have now isolated the latency problems further to 2.5.68-bk11
>
> 2.5.68-bk10 an earlier works fine.

In the process of eliminating kernel options to isolate
the problem, eliminating USB completely appears to fix it.

One machine (server) was using usb-uhci and
the other (laptop) was using usb-ohci.

So it looks like something with USB in 2.5.68-bk11

--
Paul Fulghum, [email protected]
Microgate Corporation, http://www.microgate.com


2003-05-09 20:56:56

by Arnd Bergmann

[permalink] [raw]
Subject: Re: 2.5.69 Interrupt Latency

Paul Fulghum wrote:

> One machine (server) was using usb-uhci and
> the other (laptop) was using usb-ohci.
>
> So it looks like something with USB in 2.5.68-bk11

The change below was part of 2.5.68-bk11, and adds a 20ms
delay to the uhci interrupt handler. Could that be
the culprit?

Arnd <><

ChangeSet 1.1042.1.129 2003/04/29 15:30:31 [email protected]
[PATCH] USB: Minor patch for uhci-hcd.c
--- 1.32/drivers/usb/host/uhci-hcd.c Mon Apr 14 11:51:40 2003
+++ 1.33/drivers/usb/host/uhci-hcd.c Fri Apr 18 13:37:24 2003
@@ -1283,7 +1283,8 @@
}

if (last_urb) {
- *end = (last_urb->start_frame + last_urb->number_of_packets) & 1023;
+ *end = (last_urb->start_frame + last_urb->number_of_packets *
+ last_urb->interval) & (UHCI_NUMFRAMES-1);
ret = 0;
} else
ret = -1; /* no previous urb found */
@@ -1933,9 +1934,10 @@

dbg("%x: suspend_hc", io_addr);

- outw(USBCMD_EGSM, io_addr + USBCMD);
-
uhci->is_suspended = 1;
+ smp_wmb();
+
+ outw(USBCMD_EGSM, io_addr + USBCMD);
}

static void wakeup_hc(struct uhci_hcd *uhci)
@@ -1945,6 +1947,9 @@

dbg("%x: wakeup_hc", io_addr);

+ /* Global resume for 20ms */
+ outw(USBCMD_FGR | USBCMD_EGSM, io_addr + USBCMD);
+ wait_ms(20);
outw(0, io_addr + USBCMD);

/* wait for EOP to be sent */
@@ -1965,7 +1970,7 @@
int i;

for (i = 0; i < uhci->rh_numports; i++)
- connection |= (inw(io_addr + USBPORTSC1 + i * 2) & 0x1);
+ connection |= (inw(io_addr + USBPORTSC1 + i * 2) & USBPORTSC_CCS);

return connection;
}

2003-05-09 20:58:45

by Andrew Morton

[permalink] [raw]
Subject: Re: 2.5.69 Interrupt Latency

Paul Fulghum <[email protected]> wrote:
>
> On Thu, 2003-05-08 at 14:22, Andrew Morton wrote:
> > Can you pinpoint a kernel version at which it started to happen?
>
> I have now isolated the latency problems further to 2.5.68-bk11
>
> 2.5.68-bk10 an earlier works fine.

Well I'm darned if I can see a thing wrong there. Are you using
ieee1394, or USB, or any fancy networking features?

2003-05-09 21:12:44

by Paul Fulghum

[permalink] [raw]
Subject: Re: 2.5.69 Interrupt Latency

On Fri, 2003-05-09 at 16:06, Arnd Bergmann wrote:
> Paul Fulghum wrote:
>
> > One machine (server) was using usb-uhci and
> > the other (laptop) was using usb-ohci.
> >
> > So it looks like something with USB in 2.5.68-bk11
>
> The change below was part of 2.5.68-bk11, and adds a 20ms
> delay to the uhci interrupt handler. Could that be
> the culprit?

Possibly, I can try backing out just that part.

To complicate matters, this is happening on 2 machines:
a server and a laptop.

I disabled USB on the laptop (2.5.69) and the problem
is still there :-(

I am confident about these results:
1. On the server, bk10 and earlier works, bk11 and later breaks.
2. On the server, bk11 with USB breaks, bk11 without USB works.
3. On the laptop, 2.5.68 and earlier works, 2.5.69 breaks

I need to test the laptop with the bk10/bk11 sets to
see if this follows the results on the server.

Maybe disabling/enabling USB is just triggering something else
in the configuration file.

I'm leaving for the weekend now, and will try
to get back to this on Monday.

--
Paul Fulghum, [email protected]
Microgate Corporation, http://www.microgate.com



--
Paul Fulghum, [email protected]
Microgate Corporation, http://www.microgate.com


2003-05-09 21:15:49

by Paul Fulghum

[permalink] [raw]
Subject: Re: 2.5.69 Interrupt Latency

On Fri, 2003-05-09 at 16:07, Andrew Morton wrote:

> Well I'm darned if I can see a thing wrong there. Are you using
> ieee1394, or USB, or any fancy networking features?

ieee1394 is disabled, pretty basic network options
(started from make defconfig)

See my reponse to Arnd Bergmann for more details.
I'm not thoroughly convinced it's USB either.
I'm still collecting info and testing different versions
to try and piece this together.

--
Paul Fulghum, [email protected]
Microgate Corporation, http://www.microgate.com


2003-05-09 21:19:46

by Andrew Morton

[permalink] [raw]
Subject: Re: 2.5.69 Interrupt Latency

Paul Fulghum <[email protected]> wrote:
>
> In the process of eliminating kernel options to isolate
> the problem, eliminating USB completely appears to fix it.
>
> One machine (server) was using usb-uhci and
> the other (laptop) was using usb-ohci.
>
> So it looks like something with USB in 2.5.68-bk11

ah, that helps.

This code was added to wakeup_hc(). It is called from uhci_irq():

+ /* Global resume for 20ms */
+ outw(USBCMD_FGR | USBCMD_EGSM, io_addr + USBCMD);
+ wait_ms(20);

The changelog just says "Minor patch for uhci-hcd.c"

Can you delete the wait_ms() and see if that is our culprit?

2003-05-12 13:44:41

by Paul Fulghum

[permalink] [raw]
Subject: Re: 2.5.69 Interrupt Latency

On Fri, 2003-05-09 at 16:28, Andrew Morton wrote:

> This code was added to wakeup_hc(). It is called from uhci_irq():
>
> + /* Global resume for 20ms */
> + outw(USBCMD_FGR | USBCMD_EGSM, io_addr + USBCMD);
> + wait_ms(20);
>
> The changelog just says "Minor patch for uhci-hcd.c"
>
> Can you delete the wait_ms() and see if that is our culprit?

This is the culprit.

Removing this line corrects the latency problems on
the server. A 20ms delay seems pretty excessive for an
interrupt handler. I'm not sure what it is supposed to
accomplish, but this seems like something that should
be scheduled to run outside of the ISR.

I must have messed up a test on the laptop that is
also showing latency problems. On the laptop the
problem *is* in both 2.5.68/2.5.69 and *is not*
eliminated by turning off USB. The laptop uses the
ohci driver anyways which is not effected by this patch.
The laptop does not show latency problems on 2.4.20.

So the patch above is definately a problem,
but the problem I am seeing on the laptop
is something unrelated, but part of 2.5.x
(which I will investigate further).

Thanks,
Paul

--
Paul Fulghum, [email protected]
Microgate Corporation, http://www.microgate.com


2003-05-12 13:53:09

by Paul Fulghum

[permalink] [raw]
Subject: Re: 2.5.69 Interrupt Latency

On Mon, 2003-05-12 at 08:57, Paul Fulghum wrote:
> On Fri, 2003-05-09 at 16:28, Andrew Morton wrote:
>
> > This code was added to wakeup_hc(). It is called from uhci_irq():
> >
> > + /* Global resume for 20ms */
> > + outw(USBCMD_FGR | USBCMD_EGSM, io_addr + USBCMD);
> > + wait_ms(20);
> >
> > The changelog just says "Minor patch for uhci-hcd.c"
> >
> > Can you delete the wait_ms() and see if that is our culprit?
>
> This is the culprit.
>
> Removing this line corrects the latency problems on
> the server. A 20ms delay seems pretty excessive for an
> interrupt handler. I'm not sure what it is supposed to
> accomplish, but this seems like something that should
> be scheduled to run outside of the ISR.
>
> I must have messed up a test on the laptop that is
> also showing latency problems. On the laptop the
> problem *is* in both 2.5.68/2.5.69 and *is not*
> eliminated by turning off USB. The laptop uses the
> ohci driver anyways which is not effected by this patch.
> The laptop does not show latency problems on 2.4.20.
>
> So the patch above is definately a problem,
> but the problem I am seeing on the laptop
> is something unrelated, but part of 2.5.x
> (which I will investigate further).
>
> Thanks,
> Paul

I forgot to add this snippet from the /var/log/messages file
of the server in case it helps the USB maintainer
in evaluating what to do about the above patch.

kernel: drivers/usb/host/uhci-hcd.c: USB Universal Host Controller Interface driver v2.0
kernel: uhci-hcd 00:04.2: Intel Corp. 82371AB/EB/MB PIIX4
kernel: uhci-hcd 00:04.2: irq 19, io base 0000fce0
kernel: Please use the 'usbfs' filetype instead, the 'usbdevfs' name is deprecated.
kernel: uhci-hcd 00:04.2: new USB bus registered, assigned bus number 1
kernel: hub 1-0:0: USB hub found
kernel: hub 1-0:0: 2 ports detected


--
Paul Fulghum, [email protected]
Microgate Corporation, http://www.microgate.com


2003-05-12 16:10:44

by Greg KH

[permalink] [raw]
Subject: Re: 2.5.69 Interrupt Latency

On Mon, May 12, 2003 at 08:57:42AM -0500, Paul Fulghum wrote:
> On Fri, 2003-05-09 at 16:28, Andrew Morton wrote:
>
> > This code was added to wakeup_hc(). It is called from uhci_irq():
> >
> > + /* Global resume for 20ms */
> > + outw(USBCMD_FGR | USBCMD_EGSM, io_addr + USBCMD);
> > + wait_ms(20);
> >
> > The changelog just says "Minor patch for uhci-hcd.c"
> >
> > Can you delete the wait_ms() and see if that is our culprit?
>
> This is the culprit.
>
> Removing this line corrects the latency problems on
> the server. A 20ms delay seems pretty excessive for an
> interrupt handler. I'm not sure what it is supposed to
> accomplish, but this seems like something that should
> be scheduled to run outside of the ISR.

This should only happen when your hardware receives a "RESUME" signal
from a USB device AND the host controller is in a global suspend state
at that time.

Now I think the wait_ms() call is valid for when this is really
happening, but it sounds like you are having this happen all the time
during normal operation. Are you using any USB devices with this
server? Is USB enabled in the BIOS or not?

Also, Johannes / Alan, should we be verifying the global suspend state
when we read this value so that we don't accidentally call wakeup_hc()
for hardware that sets this bit in an illegal way? I think that might
be the proper fix for this problem.

thanks,

greg k-h

2003-05-12 16:57:26

by Paul Fulghum

[permalink] [raw]
Subject: Re: 2.5.69 Interrupt Latency

On Mon, 2003-05-12 at 11:24, Greg KH wrote:
> On Mon, May 12, 2003 at 08:57:42AM -0500, Paul Fulghum wrote:
> > On Fri, 2003-05-09 at 16:28, Andrew Morton wrote:
> >
> > > This code was added to wakeup_hc(). It is called from uhci_irq():
> > >
> > > + /* Global resume for 20ms */
> > > + outw(USBCMD_FGR | USBCMD_EGSM, io_addr + USBCMD);
> > > + wait_ms(20);
> > >
> > > The changelog just says "Minor patch for uhci-hcd.c"
> > >
> > > Can you delete the wait_ms() and see if that is our culprit?
> >
> > This is the culprit.
> >
> > Removing this line corrects the latency problems on
> > the server. A 20ms delay seems pretty excessive for an
> > interrupt handler. I'm not sure what it is supposed to
> > accomplish, but this seems like something that should
> > be scheduled to run outside of the ISR.
>
> This should only happen when your hardware receives a "RESUME" signal
> from a USB device AND the host controller is in a global suspend state
> at that time.
>
> Now I think the wait_ms() call is valid for when this is really
> happening, but it sounds like you are having this happen all the time
> during normal operation.

It does appear to happen on a regular basis.

Should the 20ms delay be in the ISR though?
I thought it was standard practice to move such
lengthy operations outside of the ISR so as not to
impact interrupt latency for the system.

> Are you using any USB devices with this
> server? Is USB enabled in the BIOS or not?

There are no USB devices attached to the server.
There are no actual USB connectors, and the
server's specs do not list USB. There is no
option to enable/disable USB in the BIOS.

So my guess is this is an unused portion of
the chipset being detected and enabled.



--
Paul Fulghum, [email protected]
Microgate Corporation, http://www.microgate.com

2003-05-12 17:16:46

by Greg KH

[permalink] [raw]
Subject: Re: 2.5.69 Interrupt Latency

On Mon, May 12, 2003 at 12:08:21PM -0500, Paul Fulghum wrote:
> On Mon, 2003-05-12 at 11:24, Greg KH wrote:
> > On Mon, May 12, 2003 at 08:57:42AM -0500, Paul Fulghum wrote:
> > > On Fri, 2003-05-09 at 16:28, Andrew Morton wrote:
> > >
> > > > This code was added to wakeup_hc(). It is called from uhci_irq():
> > > >
> > > > + /* Global resume for 20ms */
> > > > + outw(USBCMD_FGR | USBCMD_EGSM, io_addr + USBCMD);
> > > > + wait_ms(20);
> > > >
> > > > The changelog just says "Minor patch for uhci-hcd.c"
> > > >
> > > > Can you delete the wait_ms() and see if that is our culprit?
> > >
> > > This is the culprit.
> > >
> > > Removing this line corrects the latency problems on
> > > the server. A 20ms delay seems pretty excessive for an
> > > interrupt handler. I'm not sure what it is supposed to
> > > accomplish, but this seems like something that should
> > > be scheduled to run outside of the ISR.
> >
> > This should only happen when your hardware receives a "RESUME" signal
> > from a USB device AND the host controller is in a global suspend state
> > at that time.
> >
> > Now I think the wait_ms() call is valid for when this is really
> > happening, but it sounds like you are having this happen all the time
> > during normal operation.
>
> It does appear to happen on a regular basis.
>
> Should the 20ms delay be in the ISR though?
> I thought it was standard practice to move such
> lengthy operations outside of the ISR so as not to
> impact interrupt latency for the system.

This should only happen when the hardware is suspended, and we are being
woken up by a device. So this should be a _very_ rare occurance, and
when it does happen, the latency isn't that big of a deal (we need it to
wake up the hardware properly.)

> > Are you using any USB devices with this
> > server? Is USB enabled in the BIOS or not?
>
> There are no USB devices attached to the server.
> There are no actual USB connectors, and the
> server's specs do not list USB. There is no
> option to enable/disable USB in the BIOS.

Heh, then I would suggest not loading this driver at all. It sounds
like you have an internal USB controller that probably does not have
properly terminated connectors.

thanks,

greg k-h

2003-05-12 17:38:38

by Paul Fulghum

[permalink] [raw]
Subject: Re: 2.5.69 Interrupt Latency

On Mon, 2003-05-12 at 12:30, Greg KH wrote:
> > Should the 20ms delay be in the ISR though?
> > I thought it was standard practice to move such
> > lengthy operations outside of the ISR so as not to
> > impact interrupt latency for the system.
>
> This should only happen when the hardware is suspended, and we are being
> woken up by a device. So this should be a _very_ rare occurance, and
> when it does happen, the latency isn't that big of a deal (we need it to
> wake up the hardware properly.)

So you feel interrupt latency does not matter when
a machine is waking up? I'm not particularly worried
about that situation, so I won't argue that.

How about some sort of sanity check (as you mentioned
earlier), so this is not shooting off all of the time
during normal operation.

> > There are no USB devices attached to the server.
> > There are no actual USB connectors, and the
> > server's specs do not list USB. There is no
> > option to enable/disable USB in the BIOS.
>
> Heh, then I would suggest not loading this driver at all. It sounds
> like you have an internal USB controller that probably does not have
> properly terminated connectors.

Maybe.

But most distributions have the USB driver loaded by
default, so if this new change stays as is, it will silently
cause erratic problems for such machines (with unused
USB controllers on the mainboard).

Then this investigation will be repeated over and over
by end users and anyone trying to support latency
sensitive devices (such as standard serial ports) on Linux.

So either a sanity check to prevent unnecessary
calls to this delay, or recoding the delay so it
does not run in the ISR and kill interrupt latency
would be the correct thing to do.

--
Paul Fulghum, [email protected]
Microgate Corporation, http://www.microgate.com


2003-05-12 18:03:08

by Greg KH

[permalink] [raw]
Subject: Re: 2.5.69 Interrupt Latency

On Mon, May 12, 2003 at 12:49:29PM -0500, Paul Fulghum wrote:
>
> How about some sort of sanity check (as you mentioned
> earlier), so this is not shooting off all of the time
> during normal operation.

That's the proper thing to do. Also possibly blacklisting your
motherboard's USB controller. What does lspci -v show?

thanks,

greg k-h

2003-05-12 18:05:47

by Paul Fulghum

[permalink] [raw]
Subject: Re: 2.5.69 Interrupt Latency

On Mon, 2003-05-12 at 13:01, Greg KH wrote:
> On Mon, May 12, 2003 at 12:49:29PM -0500, Paul Fulghum wrote:
> >
> > How about some sort of sanity check (as you mentioned
> > earlier), so this is not shooting off all of the time
> > during normal operation.
>
> That's the proper thing to do. Also possibly blacklisting your
> motherboard's USB controller. What does lspci -v show?

If both can be accomplished (state check to qualify delay
and blacklisting), that would be optimal.

The machine is the HP Netserver LC3 which does not
officially have USB (but apparently has a vestigial
controller), so blacklisting should be a no brainer.

Output of lspci:

00:00.0 Host bridge: Intel Corp. 440BX/ZX/DX - 82443BX/ZX/DX Host bridge
(AGP disabled) (rev 02)
Flags: bus master, medium devsel, latency 64
Memory at <unassigned> (32-bit, prefetchable) [size=256M]

00:04.0 ISA bridge: Intel Corp. 82371AB/EB/MB PIIX4 ISA (rev 02)
Flags: bus master, medium devsel, latency 0

00:04.1 IDE interface: Intel Corp. 82371AB/EB/MB PIIX4 IDE (rev 01)
(prog-if 80 [Master])
Flags: bus master, medium devsel, latency 32
I/O ports at fcb0 [size=16]

00:04.2 USB Controller: Intel Corp. 82371AB/EB/MB PIIX4 USB (rev 01)
(prog-if 00 [UHCI])
Flags: bus master, medium devsel, latency 32, IRQ 19
I/O ports at fce0 [size=32]

00:04.3 Bridge: Intel Corp. 82371AB/EB/MB PIIX4 ACPI (rev 02)
Flags: medium devsel, IRQ 9

00:07.0 PCI bridge: Digital Equipment Corporation DECchip 21152 (rev 02)
(prog-if 00 [Normal decode])
Flags: bus master, medium devsel, latency 57
Bus: primary=00, secondary=01, subordinate=01, sec-latency=36
I/O behind bridge: 0000e000-0000efff
Memory behind bridge: feb00000-febfffff
Prefetchable memory behind bridge: 00000000fbf00000-00000000fbf00000

00:08.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100]
(rev 05)
Subsystem: Hewlett-Packard Company NetServer 10/100TX
Flags: bus master, medium devsel, latency 66, IRQ 16
Memory at fecfc000 (32-bit, prefetchable) [size=4K]
I/O ports at fcc0 [size=32]
Memory at fed00000 (32-bit, non-prefetchable) [size=1M]
Expansion ROM at <unassigned> [disabled] [size=1M]
Capabilities: [dc] Power Management version 1

00:09.0 Communication controller: Microgate Corporation: Unknown device
0030 (rev 01)
Subsystem: Microgate Corporation: Unknown device 0030
Flags: medium devsel, IRQ 17
Memory at fecfd400 (32-bit, non-prefetchable) [size=128]
I/O ports at fc00 [size=128]
Memory at fecfd800 (32-bit, non-prefetchable) [size=512]
Memory at fec80000 (32-bit, prefetchable) [size=256K]
Memory at fecfdc00 (32-bit, non-prefetchable) [size=16]

00:0a.0 SCSI storage controller: Adaptec AIC-7880U (rev 01)
Subsystem: Adaptec AIC-7880P Ultra/Ultra Wide SCSI Chipset
Flags: bus master, medium devsel, latency 64, IRQ 19
I/O ports at f800 [disabled] [size=256]
Memory at fecff000 (32-bit, non-prefetchable) [size=4K]
Expansion ROM at <unassigned> [disabled] [size=64K]
Capabilities: [dc] Power Management version 1

00:0d.0 VGA compatible controller: Cirrus Logic GD 5446 (rev 45)
(prog-if 00 [VGA])
Subsystem: Hewlett-Packard Company: Unknown device 0001
Flags: medium devsel
Memory at fc000000 (32-bit, prefetchable) [size=32M]
Memory at fecfe000 (32-bit, non-prefetchable) [size=4K]
Expansion ROM at <unassigned> [disabled] [size=32K]

01:02.0 Communication controller: Microgate Corporation: Unknown device
0020 (rev 01)
Subsystem: Microgate Corporation: Unknown device 0020
Flags: medium devsel, IRQ 18
Memory at febff800 (32-bit, non-prefetchable) [size=128]
I/O ports at ec00 [size=128]
I/O ports at ecfc [size=4]

01:03.0 Communication controller: Microgate Corporation: Unknown device
0210 (rev 02)
Subsystem: Microgate Corporation: Unknown device 0210
Flags: medium devsel, IRQ 19
Memory at febff400 (32-bit, non-prefetchable) [size=128]
I/O ports at e880 [size=128]
I/O ports at ece8 [size=8]
Memory at fbf80000 (32-bit, prefetchable) [size=256K]

01:04.0 Communication controller: Microgate Corporation SyncLink WAN
Adapter (rev 01)
Subsystem: Microgate Corporation SyncLink WAN Adapter
Flags: medium devsel, IRQ 16
Memory at febfe800 (32-bit, non-prefetchable) [size=128]
I/O ports at e800 [size=128]
I/O ports at ecf0 [size=8]
Memory at fbf40000 (32-bit, prefetchable) [size=256K]



--
Paul Fulghum, [email protected]
Microgate Corporation, http://www.microgate.com


2003-05-13 15:13:19

by Alan Stern

[permalink] [raw]
Subject: Re: 2.5.69 Interrupt Latency

On Mon, 12 May 2003, Greg KH wrote:

> > > This should only happen when your hardware receives a "RESUME" signal
> > > from a USB device AND the host controller is in a global suspend state
> > > at that time.
> > >
> > > Now I think the wait_ms() call is valid for when this is really
> > > happening, but it sounds like you are having this happen all the time
> > > during normal operation.
> >
> > It does appear to happen on a regular basis.
> >
> > Should the 20ms delay be in the ISR though?
> > I thought it was standard practice to move such
> > lengthy operations outside of the ISR so as not to
> > impact interrupt latency for the system.
>
> This should only happen when the hardware is suspended, and we are being
> woken up by a device. So this should be a _very_ rare occurance, and
> when it does happen, the latency isn't that big of a deal (we need it to
> wake up the hardware properly.)

Putting in a sanity check for the global suspend state will be very easy.
But I would like to point out that this "global suspend" does not refer to
the entire system, only the USB bus. I'm not sure under what
circumstances the bus is placed in global suspend; I think it's just when
there are no devices attached (or the last remaining device is detached).

However, there have been cases on my own system where turning off the only
USB peripheral caused the driver to bounce between suspend_hc() and
wakeup_hc() several times without any apparent explanation -- possibly as
a result of transient electrical signals on the bus (?). So perhaps
moving that delay out of the ISR isn't such a bad idea.

Alan Stern

2003-05-13 15:24:39

by Paul Fulghum

[permalink] [raw]
Subject: Re: 2.5.69 Interrupt Latency

On Tue, 2003-05-13 at 10:26, Alan Stern wrote:

> Putting in a sanity check for the global suspend state will be very easy.
> But I would like to point out that this "global suspend" does not refer to
> the entire system, only the USB bus.

That is a problem then, because the delay can still
occur during normal system operation.

> I'm not sure under what
> circumstances the bus is placed in global suspend; I think it's just when
> there are no devices attached (or the last remaining device is detached).
>
> However, there have been cases on my own system where turning off the only
> USB peripheral caused the driver to bounce between suspend_hc() and
> wakeup_hc() several times without any apparent explanation -- possibly as
> a result of transient electrical signals on the bus (?). So perhaps
> moving that delay out of the ISR isn't such a bad idea.

Agreed. If this can happen on functional USB controllers
when no devices are attached, then it is a serious problem.

--
Paul Fulghum, [email protected]
Microgate Corporation, http://www.microgate.com


2003-05-13 17:17:09

by Greg KH

[permalink] [raw]
Subject: Re: 2.5.69 Interrupt Latency

On Tue, May 13, 2003 at 10:35:07AM -0500, Paul Fulghum wrote:
> On Tue, 2003-05-13 at 10:26, Alan Stern wrote:
>
> > Putting in a sanity check for the global suspend state will be very easy.
> > But I would like to point out that this "global suspend" does not refer to
> > the entire system, only the USB bus.
>
> That is a problem then, because the delay can still
> occur during normal system operation.

Ok, can you try the attached patch and see if it causes your latency
problem to go away?

thanks,

greg k-h


--- a/drivers/usb/host/uhci-hcd.c Sun May 4 23:49:54 2003
+++ b/drivers/usb/host/uhci-hcd.c Tue May 13 10:26:02 2003
@@ -1947,6 +1947,11 @@

dbg("%x: wakeup_hc", io_addr);

+ /* Verify that we really should wake up the hc */
+ status = inw(io_addr + USBCMD);
+ if (!(status & USBCMD_EGSM))
+ return;
+
/* Global resume for 20ms */
outw(USBCMD_FGR | USBCMD_EGSM, io_addr + USBCMD);
wait_ms(20);

2003-05-13 17:50:05

by Paul Fulghum

[permalink] [raw]
Subject: Re: 2.5.69 Interrupt Latency

On Tue, 2003-05-13 at 12:30, Greg KH wrote:
> On Tue, May 13, 2003 at 10:35:07AM -0500, Paul Fulghum wrote:
> > On Tue, 2003-05-13 at 10:26, Alan Stern wrote:
> >
> > > Putting in a sanity check for the global suspend state will be very easy.
> > > But I would like to point out that this "global suspend" does not refer to
> > > the entire system, only the USB bus.
> >
> > That is a problem then, because the delay can still
> > occur during normal system operation.
>
> Ok, can you try the attached patch and see if it causes your latency
> problem to go away?

I applied the patch plus a couple of printk statements,
and the wakeup_hc() is being continuously called
as well as actually executing the delay.

So the check is not preventing anything.

--
Paul Fulghum, [email protected]
Microgate Corporation, http://www.microgate.com


2003-05-13 17:56:52

by Greg KH

[permalink] [raw]
Subject: Re: 2.5.69 Interrupt Latency

On Tue, May 13, 2003 at 08:01:01AM -0500, Paul Fulghum wrote:
> On Tue, 2003-05-13 at 12:30, Greg KH wrote:
> > On Tue, May 13, 2003 at 10:35:07AM -0500, Paul Fulghum wrote:
> > > On Tue, 2003-05-13 at 10:26, Alan Stern wrote:
> > >
> > > > Putting in a sanity check for the global suspend state will be very easy.
> > > > But I would like to point out that this "global suspend" does not refer to
> > > > the entire system, only the USB bus.
> > >
> > > That is a problem then, because the delay can still
> > > occur during normal system operation.
> >
> > Ok, can you try the attached patch and see if it causes your latency
> > problem to go away?
>
> I applied the patch plus a couple of printk statements,
> and the wakeup_hc() is being continuously called
> as well as actually executing the delay.

Is the suspend_hc() function ever getting called by anyone in that
driver?

thanks,

greg k-h

2003-05-13 17:59:23

by Greg KH

[permalink] [raw]
Subject: Re: 2.5.69 Interrupt Latency

On Tue, May 13, 2003 at 11:09:13AM -0700, Greg KH wrote:
> On Tue, May 13, 2003 at 08:01:01AM -0500, Paul Fulghum wrote:
> > On Tue, 2003-05-13 at 12:30, Greg KH wrote:
> > > On Tue, May 13, 2003 at 10:35:07AM -0500, Paul Fulghum wrote:
> > > > On Tue, 2003-05-13 at 10:26, Alan Stern wrote:
> > > >
> > > > > Putting in a sanity check for the global suspend state will be very easy.
> > > > > But I would like to point out that this "global suspend" does not refer to
> > > > > the entire system, only the USB bus.
> > > >
> > > > That is a problem then, because the delay can still
> > > > occur during normal system operation.
> > >
> > > Ok, can you try the attached patch and see if it causes your latency
> > > problem to go away?
> >
> > I applied the patch plus a couple of printk statements,
> > and the wakeup_hc() is being continuously called
> > as well as actually executing the delay.
>
> Is the suspend_hc() function ever getting called by anyone in that
> driver?

Ok, nevermind, I see where it would be getting called under normal
operation...

Hm, I don't really know. Johannes, any thoughts?

thanks,

greg k-h

2003-05-13 20:10:42

by Bill Davidsen

[permalink] [raw]
Subject: Re: 2.5.69 Interrupt Latency

On 13 May 2003, Paul Fulghum wrote:

> On Tue, 2003-05-13 at 10:26, Alan Stern wrote:
>
> > Putting in a sanity check for the global suspend state will be very easy.
> > But I would like to point out that this "global suspend" does not refer to
> > the entire system, only the USB bus.
>
> That is a problem then, because the delay can still
> occur during normal system operation.
>
> > I'm not sure under what
> > circumstances the bus is placed in global suspend; I think it's just when
> > there are no devices attached (or the last remaining device is detached).
> >
> > However, there have been cases on my own system where turning off the only
> > USB peripheral caused the driver to bounce between suspend_hc() and
> > wakeup_hc() several times without any apparent explanation -- possibly as
> > a result of transient electrical signals on the bus (?). So perhaps
> > moving that delay out of the ISR isn't such a bad idea.
>
> Agreed. If this can happen on functional USB controllers
> when no devices are attached, then it is a serious problem.

Instead of trying to guess when to do it, could the sleep be replaced by
setting a flag bit to indicate that a sleep was needed before using the
hardware? Then the sleep could be done when needed but no noise on the USB
bus wouldn't hurt.

1 - there may be many places, I thought of that but didn't look
since someone will tell me if it's a problem.

2 - if you don't use USB why not just take the driver out?

It would be nice to prevent the problem, of course.

--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

2003-05-13 21:23:40

by Alan Stern

[permalink] [raw]
Subject: Re: 2.5.69 Interrupt Latency

On Tue, 13 May 2003, Greg KH wrote:

> On Tue, May 13, 2003 at 11:09:13AM -0700, Greg KH wrote:
> > On Tue, May 13, 2003 at 08:01:01AM -0500, Paul Fulghum wrote:
> > >
> > > I applied the patch plus a couple of printk statements,
> > > and the wakeup_hc() is being continuously called
> > > as well as actually executing the delay.
> >
> > Is the suspend_hc() function ever getting called by anyone in that
> > driver?
>
> Ok, nevermind, I see where it would be getting called under normal
> operation...
>
> Hm, I don't really know. Johannes, any thoughts?

My take is that wakeup_hc() is getting called whenever some stray signal
causes the device to generate an interrupt, and then a little while later
the stall timer routine calls suspend_hc() since nothing is active. The
interrupts are probably indistinguishable from what you would get if a new
device really had just been attached to the bus.

Assuming this analysis is correct, only malfunctioning hardware would ever
cause the problem to arise. Still, it's something that needs to be
handled. (That's a tricky point -- to what extent should the kernel try
to compensate for broken hardware?)

Unfortunately, there isn't any obvious way to tell that under these
circumstances the wakeup_hc() routine doesn't need to run. Using a timer
routine to implement that 20 ms delay would at least remove the large
interrupt latency. However, this presents some problems as well. In
particular, is there anything that would prevent suspend_hc() from being
called before the timer had expired? We don't want to find ourselves
simultaneously trying to turn the USB controller on and off. Getting that
done properly will require some thought.

Maybe a kind of grace period would help: each time the controller changes
state, don't allow another change until at least 1 second later. That
would also help the "bouncing" effect I see when I turn on or off my USB
CD-RW drive.

Alan Stern

2003-05-13 21:38:08

by Helge Hafting

[permalink] [raw]
Subject: Re: 2.5.69 Interrupt Latency

On Tue, May 13, 2003 at 05:35:47PM -0400, Alan Stern wrote:
> My take is that wakeup_hc() is getting called whenever some stray signal
> causes the device to generate an interrupt, and then a little while later
> the stall timer routine calls suspend_hc() since nothing is active. The
> interrupts are probably indistinguishable from what you would get if a new
> device really had just been attached to the bus.
>
Could this also happen if the USB interrupt is shared?
The other device interrupts, and the kernel calls into
usb interrupt routine just in case USB has some data too?

Helge Hafting

2003-05-13 21:56:44

by Alan Stern

[permalink] [raw]
Subject: Re: 2.5.69 Interrupt Latency

On Tue, 13 May 2003, Helge Hafting wrote:

> On Tue, May 13, 2003 at 05:35:47PM -0400, Alan Stern wrote:
> > My take is that wakeup_hc() is getting called whenever some stray signal
> > causes the device to generate an interrupt, and then a little while later
> > the stall timer routine calls suspend_hc() since nothing is active. The
> > interrupts are probably indistinguishable from what you would get if a new
> > device really had just been attached to the bus.
> >
> Could this also happen if the USB interrupt is shared?
> The other device interrupts, and the kernel calls into
> usb interrupt routine just in case USB has some data too?

Yes, it certainly could. The other part of the problem, which I failed to
mention, is that the Resume-Detect bit in the USB controller's status
register is set. wakeup_hc() gets called only if that bit is set, and the
bit is supposed to be set only if some device attached to the USB bus has
requested a wakeup (also known as "resume"). If there's nothing on the
bus, the controller shouldn't indicate that a resume was detected.

Alan Stern

2003-05-13 23:08:04

by Paul Fulghum

[permalink] [raw]
Subject: Re: 2.5.69 Interrupt Latency

On Tue, 2003-05-13 at 15:17, Bill Davidsen wrote:

> 2 - if you don't use USB why not just take the driver out?

Because a driver that runs amok, silently causing
interrupt latency problems, becomes a real support
nightmare for others.

> It would be nice to prevent the problem, of course.

Agreed

--
Paul Fulghum, [email protected]
Microgate Corporation, http://www.microgate.com


2003-05-14 17:38:58

by Paul Fulghum

[permalink] [raw]
Subject: Re: 2.5.69 Interrupt Latency

On Fri, 2003-05-09 at 16:28, Andrew Morton wrote:
> Paul Fulghum <[email protected]> wrote:
> >
> > In the process of eliminating kernel options to isolate
> > the problem, eliminating USB completely appears to fix it.
> >
> > One machine (server) was using usb-uhci and
> > the other (laptop) was using usb-ohci.
> >
> > So it looks like something with USB in 2.5.68-bk11

The latency problem seen on the laptop turned out to
be a stupid mistake on my part: I enabled the ALI15XX
IDE controller option as a module instead of in kernel
and so it was not available for using DMA mode. Once
corrected the latency is running at a smooth 20us without
the >5ms spikes associated with PIO IDE.

Final Diagnosis:

server latency problem = USB wakeup_hc() delay added in 2.5.68-bk11
laptop latency problem = user with dain bramage

Thanks,
Paul

--
Paul Fulghum, [email protected]
Microgate Corporation, http://www.microgate.com


2003-05-14 20:39:46

by Alan Stern

[permalink] [raw]
Subject: Test Patch: 2.5.69 Interrupt Latency

On 13 May 2003, Paul Fulghum wrote:

> On Tue, 2003-05-13 at 10:26, Alan Stern wrote:
>
> > Putting in a sanity check for the global suspend state will be very easy.
> > But I would like to point out that this "global suspend" does not refer to
> > the entire system, only the USB bus.
>
> That is a problem then, because the delay can still
> occur during normal system operation.
>
> > I'm not sure under what
> > circumstances the bus is placed in global suspend; I think it's just when
> > there are no devices attached (or the last remaining device is detached).
> >
> > However, there have been cases on my own system where turning off the only
> > USB peripheral caused the driver to bounce between suspend_hc() and
> > wakeup_hc() several times without any apparent explanation -- possibly as
> > a result of transient electrical signals on the bus (?). So perhaps
> > moving that delay out of the ISR isn't such a bad idea.
>
> Agreed. If this can happen on functional USB controllers
> when no devices are attached, then it is a serious problem.

Below is a patch that addresses both of the issues raised in this thread.
The delay time is moved out of the interrupt handler, and the
wakeup/suspend transitions are de-bounced. To do this, I needed to add a
mildly elaborate state mechanism. State transitions are polled during the
stall-timer callback.

There is no protection against simultaneous access from multiple threads,
such as a PCI suspend occurring at the same time as a normal suspend or
resume. The original driver didn't have any either; it's probably not
worth worrying too much about. The patch works okay on my system. Try it
and see how it works on yours.

Johannes, please look over this code and verify that I haven't screwed
anything up.

Alan Stern


===== uhci-hcd.c 1.48 vs edited =====
--- 1.48/drivers/usb/host/uhci-hcd.c Mon May 5 02:49:54 2003
+++ edited/drivers/usb/host/uhci-hcd.c Wed May 14 16:45:42 2003
@@ -61,7 +61,7 @@
/*
* Version Information
*/
-#define DRIVER_VERSION "v2.0"
+#define DRIVER_VERSION "v2.1"
#define DRIVER_AUTHOR "Linus 'Frodo Rabbit' Torvalds, Johannes Erdfelt, Randy Dunlap, Georg Acher, Deti Fliegl, Thomas Sailer, Roman Weissgaerber"
#define DRIVER_DESC "USB Universal Host Controller Interface driver"

@@ -91,9 +91,7 @@
static int uhci_urb_dequeue(struct usb_hcd *hcd, struct urb *urb);
static void uhci_unlink_generic(struct uhci_hcd *uhci, struct urb *urb);

-static int ports_active(struct uhci_hcd *uhci);
-static void suspend_hc(struct uhci_hcd *uhci);
-static void wakeup_hc(struct uhci_hcd *uhci);
+static void hc_state_transitions(struct uhci_hcd *uhci);

/* If a transfer is still active after this much time, turn off FSBR */
#define IDLE_TIMEOUT (HZ / 20) /* 50 ms */
@@ -1757,9 +1755,8 @@
uhci->skel_term_qh->link = UHCI_PTR_TERM;
}

- /* enter global suspend if nothing connected */
- if (!uhci->is_suspended && !ports_active(uhci))
- suspend_hc(uhci);
+ /* Poll for and perform state transitions */
+ hc_state_transitions(uhci);

init_stall_timer(hcd);
}
@@ -1884,14 +1881,14 @@
err("%x: host system error, PCI problems?", io_addr);
if (status & USBSTS_HCPE)
err("%x: host controller process error. something bad happened", io_addr);
- if ((status & USBSTS_HCH) && !uhci->is_suspended) {
+ if ((status & USBSTS_HCH) && uhci->state > 0) {
err("%x: host controller halted. very bad", io_addr);
/* FIXME: Reset the controller, fix the offending TD */
}
}

if (status & USBSTS_RD)
- wakeup_hc(uhci);
+ uhci->resume_detect = 1;

uhci_free_pending_qhs(uhci);

@@ -1922,45 +1919,75 @@
unsigned int io_addr = uhci->io_addr;

/* Global reset for 50ms */
+ uhci->state = UHCI_RESET;
outw(USBCMD_GRESET, io_addr + USBCMD);
wait_ms(50);
outw(0, io_addr + USBCMD);
wait_ms(10);
+ uhci->resume_detect = 0;
}

static void suspend_hc(struct uhci_hcd *uhci)
{
unsigned int io_addr = uhci->io_addr;

- dbg("%x: suspend_hc", io_addr);
+ switch (uhci->state) {
+ case UHCI_RUNNING: /* Suspend after 1 second */
+ uhci->state = UHCI_SUSPENDING_GRACE;
+ uhci->state_end = jiffies + HZ;
+ break;

- uhci->is_suspended = 1;
- smp_wmb();
+ case UHCI_SUSPENDING_GRACE: /* Actually suspend */
+ dbg("%x: suspend_hc", io_addr);
+ uhci->state = UHCI_SUSPENDED;
+ uhci->resume_detect = 0;
+ outw(USBCMD_EGSM, io_addr + USBCMD);
+ break;

- outw(USBCMD_EGSM, io_addr + USBCMD);
+ default:
+ break;
+ }
}

static void wakeup_hc(struct uhci_hcd *uhci)
{
unsigned int io_addr = uhci->io_addr;
- unsigned int status;

- dbg("%x: wakeup_hc", io_addr);
+ switch (uhci->state) {
+ case UHCI_SUSPENDED: /* Start the resume */
+ dbg("%x: wakeup_hc", io_addr);
+
+ /* Global resume for >= 20ms */
+ uhci->state = UHCI_RESUMING_1;
+ uhci->state_end = jiffies + (20*HZ+999) / 1000;
+ outw(USBCMD_FGR | USBCMD_EGSM, io_addr + USBCMD);
+ break;

- /* Global resume for 20ms */
- outw(USBCMD_FGR | USBCMD_EGSM, io_addr + USBCMD);
- wait_ms(20);
- outw(0, io_addr + USBCMD);
-
- /* wait for EOP to be sent */
- status = inw(io_addr + USBCMD);
- while (status & USBCMD_FGR)
- status = inw(io_addr + USBCMD);
+ case UHCI_RESUMING_1: /* Continue the resume */
+ uhci->state = UHCI_RESUMING_2;
+ outw(0, io_addr + USBCMD);
+ /* Falls through */

- uhci->is_suspended = 0;
+ case UHCI_RESUMING_2: /* Run for at least 1 second */

- /* Run and mark it configured with a 64-byte max packet */
- outw(USBCMD_RS | USBCMD_CF | USBCMD_MAXP, io_addr + USBCMD);
+ /* wait for EOP to be sent */
+ if (inw(io_addr + USBCMD) & USBCMD_FGR)
+ break;
+
+ /* Run and mark it configured with a 64-byte max packet */
+ uhci->state = UHCI_RUNNING_GRACE;
+ uhci->state_end = jiffies + HZ;
+ outw(USBCMD_RS | USBCMD_CF | USBCMD_MAXP,
+ io_addr + USBCMD);
+ break;
+
+ case UHCI_RUNNING_GRACE: /* Now allowed to suspend */
+ uhci->state = UHCI_RUNNING;
+ break;
+
+ default:
+ break;
+ }
}

static int ports_active(struct uhci_hcd *uhci)
@@ -1975,6 +2002,40 @@
return connection;
}

+static void hc_state_transitions(struct uhci_hcd *uhci)
+{
+ switch (uhci->state) {
+ case UHCI_RUNNING:
+
+ /* enter global suspend if nothing connected */
+ if (!ports_active(uhci))
+ suspend_hc(uhci);
+ break;
+
+ case UHCI_SUSPENDING_GRACE:
+ if (time_after_eq(jiffies, uhci->state_end))
+ suspend_hc(uhci);
+ break;
+
+ case UHCI_SUSPENDED:
+
+ /* wakeup if requested by a device */
+ if (uhci->resume_detect)
+ wakeup_hc(uhci);
+ break;
+
+ case UHCI_RESUMING_1:
+ case UHCI_RESUMING_2:
+ case UHCI_RUNNING_GRACE:
+ if (time_after_eq(jiffies, uhci->state_end))
+ wakeup_hc(uhci);
+ break;
+
+ default:
+ break;
+ }
+}
+
static void start_hc(struct uhci_hcd *uhci)
{
unsigned int io_addr = uhci->io_addr;
@@ -2003,6 +2064,8 @@
outl(uhci->fl->dma_handle, io_addr + USBFLBASEADD);

/* Run and mark it configured with a 64-byte max packet */
+ uhci->state = UHCI_RUNNING_GRACE;
+ uhci->state_end = jiffies + HZ;
outw(USBCMD_RS | USBCMD_CF | USBCMD_MAXP, io_addr + USBCMD);

uhci->hcd.state = USB_STATE_READY;
@@ -2101,8 +2164,6 @@
uhci->fsbr = 0;
uhci->fsbrtimeout = 0;

- uhci->is_suspended = 0;
-
spin_lock_init(&uhci->qh_remove_list_lock);
INIT_LIST_HEAD(&uhci->qh_remove_list);

@@ -2335,6 +2396,8 @@
{
struct uhci_hcd *uhci = hcd_to_uhci(hcd);

+ /* Force an immediate suspend */
+ uhci->state = UHCI_SUSPENDING_GRACE;
suspend_hc(uhci);
return 0;
}
===== uhci-hcd.h 1.11 vs edited =====
--- 1.11/drivers/usb/host/uhci-hcd.h Tue Dec 10 14:03:10 2002
+++ edited/drivers/usb/host/uhci-hcd.h Wed May 14 15:48:18 2003
@@ -282,6 +282,29 @@
return 0; /* int128 for 128-255 ms (Max.) */
}

+/*
+ * Device states for the host controller.
+ *
+ * To prevent "bouncing" in the presence of electrical noise,
+ * we insist on a 1-second "grace" period, before switching to
+ * the RUNNING or SUSPENDED states, during which the state is
+ * not allowed to change.
+ *
+ * The resume process is divided into substates in order to avoid
+ * potentially length delays during the timer handler.
+ *
+ * States in which the host controller is halted must have values <= 0.
+ */
+enum uhci_state {
+ UHCI_RESET,
+ UHCI_RUNNING_GRACE, /* Before RUNNING */
+ UHCI_RUNNING, /* The normal state */
+ UHCI_SUSPENDING_GRACE, /* Before SUSPENDED */
+ UHCI_SUSPENDED = -10, /* When no devices are attached */
+ UHCI_RESUMING_1,
+ UHCI_RESUMING_2
+};
+
#define hcd_to_uhci(hcd_ptr) container_of(hcd_ptr, struct uhci_hcd, hcd)

/*
@@ -313,7 +336,10 @@
struct uhci_frame_list *fl; /* P: uhci->frame_list_lock */
int fsbr; /* Full speed bandwidth reclamation */
unsigned long fsbrtimeout; /* FSBR delay */
- int is_suspended;
+
+ enum uhci_state state; /* FIXME: needs a spinlock */
+ unsigned long state_end; /* Time of next transition */
+ int resume_detect; /* Need a Global Resume */

/* Main list of URB's currently controlled by this HC */
spinlock_t urb_list_lock;

2003-05-14 20:57:05

by Paul Fulghum

[permalink] [raw]
Subject: Re: 2.5.69 Interrupt Latency

On Tue, 2003-05-13 at 13:11, Greg KH wrote:

I was looking over the PIIX3 datasheet and noticed
that the USBSTS_RD bit is only valid when the
device is in the suspended state.

This bit is being acted on regardless of the
suspend state of the controller in the ISR.
Could this be why the driver is detecting
false 'resume' signals and calling wakeup_hc()
when it shouldn't?

Maybe the code should be something like:

if (uhci->is_suspended && (status & USBSTS_RD))
wakeup_hc(uhci);

in the ISR to qualify acting on that status bit.
Alternatively, USBCMD_EGSM (BIT3) of the USBCMD
register could be tested to qualify action on
the state of USBSTS_RD

I'm going to test this now, but I wanted to
know what you think.

Thanks,
Paul

--
Paul Fulghum, [email protected]
Microgate Corporation, http://www.microgate.com


2003-05-14 21:03:05

by Johannes Erdfelt

[permalink] [raw]
Subject: Re: 2.5.69 Interrupt Latency

On Wed, May 14, 2003, Paul Fulghum <[email protected]> wrote:
> I was looking over the PIIX3 datasheet and noticed
> that the USBSTS_RD bit is only valid when the
> device is in the suspended state.
>
> This bit is being acted on regardless of the
> suspend state of the controller in the ISR.
> Could this be why the driver is detecting
> false 'resume' signals and calling wakeup_hc()
> when it shouldn't?
>
> Maybe the code should be something like:
>
> if (uhci->is_suspended && (status & USBSTS_RD))
> wakeup_hc(uhci);
>
> in the ISR to qualify acting on that status bit.
> Alternatively, USBCMD_EGSM (BIT3) of the USBCMD
> register could be tested to qualify action on
> the state of USBSTS_RD
>
> I'm going to test this now, but I wanted to
> know what you think.

Good eye. That may very well be the problem. Looking at the UHCI specs,
it says the same thing, but I never really noticed it before.

JE

2003-05-14 21:15:54

by Greg KH

[permalink] [raw]
Subject: Re: 2.5.69 Interrupt Latency

On Wed, May 14, 2003 at 04:06:33PM -0500, Paul Fulghum wrote:
> On Tue, 2003-05-13 at 13:11, Greg KH wrote:
>
> I was looking over the PIIX3 datasheet and noticed
> that the USBSTS_RD bit is only valid when the
> device is in the suspended state.
>
> This bit is being acted on regardless of the
> suspend state of the controller in the ISR.
> Could this be why the driver is detecting
> false 'resume' signals and calling wakeup_hc()
> when it shouldn't?
>
> Maybe the code should be something like:
>
> if (uhci->is_suspended && (status & USBSTS_RD))
> wakeup_hc(uhci);

That's basically what the code I sent you did :)

> in the ISR to qualify acting on that status bit.
> Alternatively, USBCMD_EGSM (BIT3) of the USBCMD
> register could be tested to qualify action on
> the state of USBSTS_RD
>
> I'm going to test this now, but I wanted to
> know what you think.

I think it's correct, but I don't think it will solve your problem. I
would be very happy to be wrong though.

thanks,

greg k-h

2003-05-14 21:34:21

by Paul Fulghum

[permalink] [raw]
Subject: Re: 2.5.69 Interrupt Latency

On Wed, 2003-05-14 at 16:30, Greg KH wrote:
> On Wed, May 14, 2003 at 04:06:33PM -0500, Paul Fulghum wrote:
> > On Tue, 2003-05-13 at 13:11, Greg KH wrote:
> >
> > I was looking over the PIIX3 datasheet and noticed
> > that the USBSTS_RD bit is only valid when the
> > device is in the suspended state.
> >
> > This bit is being acted on regardless of the
> > suspend state of the controller in the ISR.
> > Could this be why the driver is detecting
> > false 'resume' signals and calling wakeup_hc()
> > when it shouldn't?
> >
> > Maybe the code should be something like:
> >
> > if (uhci->is_suspended && (status & USBSTS_RD))
> > wakeup_hc(uhci);
>
> That's basically what the code I sent you did :)

Yes, that's right.

In this case suspend_hc() is being called anyways, so
the controller *is* in the suspended state.
suspend_hc() and wakeup_hc() are spinning back
and forth forever.

For some reason I thought this was firing without
a call to suspend_hc(), but I verified it with printks.

I tried it both with is_suspended, and again testing
USBCMD_EGSM in the command register (Greg's patch)
with same results.

So it is a good check to add to qualify USBSTS_RD,
but in this case it looks like the mainboard implementation
is FUBAR and bogus resume messages are being
recognized by the controller.

Is there some transient period after setting USBCMD_EGSM
before which the controller is not officially in the
suspended state that might cause a spurious
USBSTS_RD indication? (seems unlikely)

> > in the ISR to qualify acting on that status bit.
> > Alternatively, USBCMD_EGSM (BIT3) of the USBCMD
> > register could be tested to qualify action on
> > the state of USBSTS_RD
> >
> > I'm going to test this now, but I wanted to
> > know what you think.
>
> I think it's correct, but I don't think it will solve your problem. I
> would be very happy to be wrong though.

You are right (IMO) that it is correct and should be added,
and you are also right in that it does not solve this problem.

--
Paul Fulghum, [email protected]
Microgate Corporation, http://www.microgate.com


2003-05-15 13:34:42

by Paul Fulghum

[permalink] [raw]
Subject: Re: Test Patch: 2.5.69 Interrupt Latency

On Wed, 2003-05-14 at 15:52, Alan Stern wrote:
> Below is a patch that addresses both of the issues raised in this thread.
> The delay time is moved out of the interrupt handler, and the
> wakeup/suspend transitions are de-bounced. To do this, I needed to add a
> mildly elaborate state mechanism. State transitions are polled during the
> stall-timer callback.
>
> There is no protection against simultaneous access from multiple threads,
> such as a PCI suspend occurring at the same time as a normal suspend or
> resume. The original driver didn't have any either; it's probably not
> worth worrying too much about. The patch works okay on my system. Try it
> and see how it works on yours.
>
> Johannes, please look over this code and verify that I haven't screwed
> anything up.
>
> Alan Stern

I tested the patch, and it solves the IRQ latency problems by moving
the delay outside of the ISR. The debouncing period reduces the
rate of thrashing back and forth between wake and suspend, but
the cycle does continue forever:

May 15 08:27:27 diemos kernel: suspend_hc():UHCI_RUNNING_GRACE
May 15 08:27:27 diemos kernel: suspend_hc():UHCI_RUNNING
May 15 08:27:28 diemos kernel: suspend_hc():UHCI_SUSPENDING_GRACE
May 15 08:27:28 diemos kernel: wakeup_hc():UHCI_SUSPENDED
May 15 08:27:28 diemos kernel: wakeup_hc():UHCI_RESUMING_1
May 15 08:27:28 diemos kernel: suspend_hc():UHCI_RESUMING_2
May 15 08:27:29 diemos kernel: suspend_hc():UHCI_RUNNING_GRACE
May 15 08:27:29 diemos kernel: suspend_hc():UHCI_RUNNING
May 15 08:27:30 diemos kernel: suspend_hc():UHCI_SUSPENDING_GRACE
May 15 08:27:30 diemos kernel: wakeup_hc():UHCI_SUSPENDED
May 15 08:27:30 diemos kernel: wakeup_hc():UHCI_RESUMING_1
May 15 08:27:30 diemos kernel: suspend_hc():UHCI_RESUMING_2
May 15 08:27:31 diemos kernel: suspend_hc():UHCI_RUNNING_GRACE

This patch removes my complaint, but I do wonder why this
unused controller continually generates the USBSTS_RD
indications. I would hope HP used pull-ups/downs on unused
input signals of the PIIX3 chipset, but maybe not.

I also can't vouch for the correct operation of this patch
for fully functional USB implementations.

If you have other tests you want me to do to try and figure
out a sane way of dealing with such unused controllers,
just ask.

Thanks,
Paul

--
Paul Fulghum, [email protected]
Microgate Corporation, http://www.microgate.com


2003-05-15 14:01:41

by Paul Fulghum

[permalink] [raw]
Subject: Re: Test Patch: 2.5.69 Interrupt Latency


I have a question about the wakeup_hc() code in general:

when waking in response to a resume event,
the current code sets the USBCMD_FGR (force global resume)
and USBCMD_EGSM (enter global suspend mode) bits,
waits 20ms and clears both bits to start sending EOP signal.

According to the datasheet, the controller itself
(not software) sets the FGR bit on detection of
a resume event. 20ms after the USBSTS_RD indication,
software should clear both the FGR and EGSM bits.

My reading of this is that the line:

outw(USBCMD_FGR | USBCMD_EGSM, io_addr + USBCMD);

before the 20ms wait should not be necessary.

Am I reading this correctly?

Thanks,
Paul

--
Paul Fulghum, [email protected]
Microgate Corporation, http://www.microgate.com


2003-05-15 15:15:16

by Paul Fulghum

[permalink] [raw]
Subject: Re: Test Patch: 2.5.69 Interrupt Latency


I think I found the key to this whole problem:

Note:I mistakenly referred to the chipset as PIIX3
in previous messages, in fact it is the PIIX4 (BX)

The PIIX4 errata states that false resume indications
can be generated if CLK48 is active during an
overcondition indication (OC[1..0]).

Sure enough, the USBPORTSC[12] registers constantly
report a status of 0C80 which shows that both
ports are showing overcurrent condition (which
disables the associated port).

My guess is that HP deliberately tied the OC[1..0]
inputs active to force the ports to a disabled state.

So checking for the case of both ports constantly
in OC condition and disabled may be a reasonable
way to either disable the controller altogether or
at least not do the wakeup/suspend shuffle.

Any comments?

--
Paul Fulghum, [email protected]
Microgate Corporation, http://www.microgate.com


2003-05-15 17:58:55

by Alan Stern

[permalink] [raw]
Subject: Re: Test Patch: 2.5.69 Interrupt Latency

On 15 May 2003, Paul Fulghum wrote:

> I think I found the key to this whole problem:
>
> Note:I mistakenly referred to the chipset as PIIX3
> in previous messages, in fact it is the PIIX4 (BX)
>
> The PIIX4 errata states that false resume indications
> can be generated if CLK48 is active during an
> overcondition indication (OC[1..0]).
>
> Sure enough, the USBPORTSC[12] registers constantly
> report a status of 0C80 which shows that both
> ports are showing overcurrent condition (which
> disables the associated port).
>
> My guess is that HP deliberately tied the OC[1..0]
> inputs active to force the ports to a disabled state.
>
> So checking for the case of both ports constantly
> in OC condition and disabled may be a reasonable
> way to either disable the controller altogether or
> at least not do the wakeup/suspend shuffle.
>
> Any comments?

That sounds like a believable explanation. My copy of the generic UHCI
specification does not include the OC port status bits. I'm guessing from
your mail they are either bit 10 or bit 11 of the PORTSC registers, can't
tell which. Maybe they are an Intel-specific addition? Or perhaps a more
recent version of the spec has more information -- the one I've got is 1.1
(March 1996).

Can you suggest a good way of detecting whether or not a controller is
part of a PIIX4 chipset, to indicate whether or not the OC bits are valid?
Maybe the PCI vendor and product codes will have that information? I'm
not sure it's safe to assume that any old host controller will have
meaningful values there; the spec just says "reserved" and doesn't
stipulate that they will always read as 0's.

Alan Stern

2003-05-15 18:29:27

by Paul Fulghum

[permalink] [raw]
Subject: Re: Test Patch: 2.5.69 Interrupt Latency

On Thu, 2003-05-15 at 13:11, Alan Stern wrote:
> That sounds like a believable explanation. My copy of the generic UHCI
> specification does not include the OC port status bits. I'm guessing from
> your mail they are either bit 10 or bit 11 of the PORTSC registers, can't
> tell which. Maybe they are an Intel-specific addition? Or perhaps a more
> recent version of the spec has more information -- the one I've got is 1.1
> (March 1996).
>
> Can you suggest a good way of detecting whether or not a controller is
> part of a PIIX4 chipset, to indicate whether or not the OC bits are valid?
> Maybe the PCI vendor and product codes will have that information? I'm
> not sure it's safe to assume that any old host controller will have
> meaningful values there; the spec just says "reserved" and doesn't
> stipulate that they will always read as 0's.

I was originally looking at the 82731FB (PIIX) / 82731SB (PIIX3) datasheet
which does not have over current inputs and has bits 11..10
labelled as reserved (but read value is not specified).

The lspci show the device on the Netserver to be
the 82731AB/EB/MB (PIIX4). This datasheet shows 2 over current
inputs OC[1..0] and defines PORTSC bits:
11 - over current indicator change (1=changed, 0=not changed)
10 - over current indicator state (1=over current, 0=normal)

If bit 10 is set then the documentation says the port is disabled.
Which triggers the erratum and false resume signals.

As you say, the PIIX3 does not specify that the reserved bits
will necessarily read 0, then I guess some other method
is needed to indicate these bits are significant. Or maybe
some other document does specify that the reserved bits
must be zero if not used? The PCI ID should differentiate
between the controllers.

--
Paul Fulghum, [email protected]
Microgate Corporation, http://www.microgate.com


2003-05-15 19:31:19

by Paul Fulghum

[permalink] [raw]
Subject: Re: Test Patch: 2.5.69 Interrupt Latency

On Thu, 2003-05-15 at 13:11, Alan Stern wrote:
> Maybe they are an Intel-specific addition? Or perhaps a more
> recent version of the spec has more information -- the one I've got is 1.1
> (March 1996).

I can't find any later documents.

> Can you suggest a good way of detecting whether or not a controller is
> part of a PIIX4 chipset, to indicate whether or not the OC bits are valid?

I don't see a generic way to determine the validity of these bits.

I think the PCI ID is the only way:
Vendor ID 8086
Device ID 7112

The erratum is only for the PIIX4, and it is
triggered only when the OC inputs are active,
so limiting the check to that device should
be OK.

Probably the least intrusive thing to do
is to disable suspending the uhci controller
if it is a PIIX4 *and* either port has an
over current condition. This will catch the case
of a functional USB controller that has one
or more real over current conditions and the
case of a deliberately disabled (by hardwiring
the OC inputs) controller. The erratum will
pop up in both cases causing suspend<->wake
thrashing.

--
Paul Fulghum, [email protected]
Microgate Corporation, http://www.microgate.com


2003-05-15 19:48:38

by Paul Fulghum

[permalink] [raw]
Subject: Re: Test Patch: 2.5.69 Interrupt Latency

On Thu, 2003-05-15 at 14:42, Paul Fulghum wrote:
> On Thu, 2003-05-15 at 13:11, Alan Stern wrote:
> > Maybe they are an Intel-specific addition? Or perhaps a more
> > recent version of the spec has more information -- the one I've got is 1.1
> > (March 1996).
>
> I can't find any later documents.
>
> > Can you suggest a good way of detecting whether or not a controller is
> > part of a PIIX4 chipset, to indicate whether or not the OC bits are valid?
>
> I don't see a generic way to determine the validity of these bits.
>
> I think the PCI ID is the only way:
> Vendor ID 8086
> Device ID 7112
>
> The erratum is only for the PIIX4, and it is
> triggered only when the OC inputs are active,
> so limiting the check to that device should
> be OK.

More clarification (at a suggestion from Charles Lepple):
The errata covers all steppings of the 82371AB/EB/MB
with a note that this bug will never be fixed
in these devices.

So checking for 8086:7112 should be sufficient without
a need to check the version number.

--
Paul Fulghum, [email protected]
Microgate Corporation, http://www.microgate.com


2003-05-15 20:54:53

by Alan Stern

[permalink] [raw]
Subject: Re: Test Patch: 2.5.69 Interrupt Latency

On 15 May 2003, Paul Fulghum wrote:

> I have a question about the wakeup_hc() code in general:
>
> when waking in response to a resume event,
> the current code sets the USBCMD_FGR (force global resume)
> and USBCMD_EGSM (enter global suspend mode) bits,
> waits 20ms and clears both bits to start sending EOP signal.
>
> According to the datasheet, the controller itself
> (not software) sets the FGR bit on detection of
> a resume event. 20ms after the USBSTS_RD indication,
> software should clear both the FGR and EGSM bits.
>
> My reading of this is that the line:
>
> outw(USBCMD_FGR | USBCMD_EGSM, io_addr + USBCMD);
>
> before the 20ms wait should not be necessary.
>
> Am I reading this correctly?

I interpret it the same way as you. I tried removing that line from the
driver, and it continued to work just fine. So it looks like you are
right.

Alan Stern

2003-05-15 21:01:00

by Alan Stern

[permalink] [raw]
Subject: Re: Test Patch: 2.5.69 Interrupt Latency

On 15 May 2003, Paul Fulghum wrote:

> The erratum is only for the PIIX4, and it is
> triggered only when the OC inputs are active,
> so limiting the check to that device should
> be OK.
>
> Probably the least intrusive thing to do
> is to disable suspending the uhci controller
> if it is a PIIX4 *and* either port has an
> over current condition. This will catch the case
> of a functional USB controller that has one
> or more real over current conditions and the
> case of a deliberately disabled (by hardwiring
> the OC inputs) controller. The erratum will
> pop up in both cases causing suspend<->wake
> thrashing.

My intention was to avoid resuming if the resume-detect bit is set only
on ports in an over-current condition, since that is the case mentioned in
the erratum. Of course, this isn't as failsafe as your suggestion. Which
do you think would work better?

Alan Stern

2003-05-15 21:19:01

by Paul Fulghum

[permalink] [raw]
Subject: Re: Test Patch: 2.5.69 Interrupt Latency

On Thu, 2003-05-15 at 16:13, Alan Stern wrote:
> My intention was to avoid resuming if the resume-detect bit is set only
> on ports in an over-current condition, since that is the case mentioned in
> the erratum. Of course, this isn't as failsafe as your suggestion. Which
> do you think would work better?

This should be caught on the suspend side so
that you can still service the ports that do not
have the over current condition.

A single port in OC makes resume unreliable,
so the only thing to do is not suspend.

The following worked for me:

static int suspend_allowed(struct uhci_hcd *uhci)
{
unsigned int io_addr = uhci->io_addr;
int i;

if (!uhci->hcd.pdev ||
(uhci->hcd.pdev->vendor != PCI_VENDOR_ID_INTEL) ||
(uhci->hcd.pdev->device != PCI_DEVICE_ID_INTEL_82371AB_2))
return 1;

/* This is a 82371AB/EB/MB USB controller which has a bug that
* causes false resume indications if any port has an
* over current condition. If we do a global suspend then
* the controller thrashes back and forth between suspend and wakeup.
*
* Some motherboards using the 82371AB/EB/MB (but not the USB portion)
* appear to hardwire the over current inputs active to disable
* the USB ports..
*/

/* check for over current condition on all ports */
for (i = 0; i < uhci->rh_numports; i++) {
if (inw(io_addr + USBPORTSC1 + i * 2) & 0x0400)
return 0;
}

return 1;
}

static void suspend_hc(struct uhci_hcd *uhci)
{
unsigned int io_addr = uhci->io_addr;

if (!suspend_allowed(uhci))
return;




--
Paul Fulghum, [email protected]
Microgate Corporation, http://www.microgate.com


2003-05-16 00:07:35

by Paul Fulghum

[permalink] [raw]
Subject: Re: Test Patch: 2.5.69 Interrupt Latency

On Thu, 2003-05-15 at 16:30, Paul Fulghum wrote:
> On Thu, 2003-05-15 at 16:13, Alan Stern wrote:
> > My intention was to avoid resuming if the resume-detect bit is set only
> > on ports in an over-current condition, since that is the case mentioned in
> > the erratum. Of course, this isn't as failsafe as your suggestion. Which
> > do you think would work better?
>
> This should be caught on the suspend side so
> that you can still service the ports that do not
> have the over current condition.
>
> A single port in OC makes resume unreliable,
> so the only thing to do is not suspend.

Alan:

I think I misread your message. Is there a per port resume
indication? (I'm at home and don't have the specs in front
of me) I was thinking of the global USBSTS_RD bit.

If you can qualify the global USBSTS_RD bit with a per
port resume indication on a non OC port, then it might
make sense to do this on the wakeup side.

Pro: you could suspend the controller when appropriate
without interference from the OC ports

Con: you would be generating a lot of spurious interrupts
as the global USBSTS_RD is set (incorrectly) by the OC ports.
Even though you would not actually do the wake, you still
burn cycles servicing the false interrupts.

So my inclination is still to nab this on the suspend side.

Paul Fulghum
[email protected]



2003-05-16 15:20:19

by Alan Stern

[permalink] [raw]
Subject: Re: Test Patch: 2.5.69 Interrupt Latency

Paul:

On 15 May 2003, Paul Fulghum wrote:

> I think I misread your message. Is there a per port resume
> indication? (I'm at home and don't have the specs in front
> of me) I was thinking of the global USBSTS_RD bit.

You can probably find this information in your copy of the specs. Bit 6
of the PORTSC register is Resume Detect. When that bit gets set, the host
controller signals a global resume and presumably sets bit 2 of the USBSTS
register.

> If you can qualify the global USBSTS_RD bit with a per
> port resume indication on a non OC port, then it might
> make sense to do this on the wakeup side.
>
> Pro: you could suspend the controller when appropriate
> without interference from the OC ports
>
> Con: you would be generating a lot of spurious interrupts
> as the global USBSTS_RD is set (incorrectly) by the OC ports.
> Even though you would not actually do the wake, you still
> burn cycles servicing the false interrupts.

I'm not sure about that. For ports in a permanent OC state, the RD bit
would get set just once, so a single interrupt would be generated. When
the host clears the Resume Detect bit in the USBSTS register, it shouldn't
get set again (not until a different port signals a resume). Otherwise a
properly working system would generate continuous interrupts during the
global resume sequence.

That's just guesswork on my part; the spec isn't sufficiently precise to
be certain one way or the other. You can find out pretty easily by
testing the driver on your machine. Just don't do anything in the ISR
when USBSTS_RD is set. Come to think of it, I can try the same thing
here.

> So my inclination is still to nab this on the suspend side.

By a nice coincidence, my system has a 8086:7112 controller -- wired up
correctly and perfectly useable. So I can easily test to make sure that
the final proposed change works okay on a good system.

BTW, I'm not entirely pleased with the size of my test patch. It's a bit
lengthy for something intended mainly to move a delay loop outside an ISR.
But I couldn't think of any simpler way to do it, and once the state
transition code is there it's really no harder to add the 1-second "grace"
periods. So of the three ingredients we've got here (20ms delay outside
of ISR, "grace" periods, checking for OC ports), I don't think we could
make the patch much simpler by eliminating any. What do you think?

Alan Stern

2003-05-16 15:45:36

by Paul Fulghum

[permalink] [raw]
Subject: Re: Test Patch: 2.5.69 Interrupt Latency

On Fri, 2003-05-16 at 10:33, Alan Stern wrote:

> You can probably find this information in your copy of the specs. Bit 6
> of the PORTSC register is Resume Detect. When that bit gets set, the host
> controller signals a global resume and presumably sets bit 2 of the USBSTS
> register.

Yes, I see that now.

> > Con: you would be generating a lot of spurious interrupts
> > as the global USBSTS_RD is set (incorrectly) by the OC ports.
> > Even though you would not actually do the wake, you still
> > burn cycles servicing the false interrupts.
>
> I'm not sure about that. For ports in a permanent OC state, the RD bit
> would get set just once, so a single interrupt would be generated. When
> the host clears the Resume Detect bit in the USBSTS register, it shouldn't
> get set again (not until a different port signals a resume). Otherwise a
> properly working system would generate continuous interrupts during the
> global resume sequence.

Good point. The continuous interrupts I was seeing is because
the wakeup was actually being carried out, followed by the
thrashing between suspend and wakeup.

If your interpretation is true (which I suspect it is)
then global suspend could still be supported with only
a couple of extra interrupts after each suspend.

> That's just guesswork on my part; the spec isn't sufficiently precise to
> be certain one way or the other. You can find out pretty easily by
> testing the driver on your machine. Just don't do anything in the ISR
> when USBSTS_RD is set. Come to think of it, I can try the same thing
> here.

I'll test this.

> By a nice coincidence, my system has a 8086:7112 controller -- wired up
> correctly and perfectly useable. So I can easily test to make sure that
> the final proposed change works okay on a good system.

Good

> BTW, I'm not entirely pleased with the size of my test patch. It's a bit
> lengthy for something intended mainly to move a delay loop outside an ISR.
> But I couldn't think of any simpler way to do it, and once the state
> transition code is there it's really no harder to add the 1-second "grace"
> periods. So of the three ingredients we've got here (20ms delay outside
> of ISR, "grace" periods, checking for OC ports), I don't think we could
> make the patch much simpler by eliminating any. What do you think?

Moving the wait out of the ISR and doing the wakeup
only for RD on non-OC ports are winners.

I can't comment on the 1 second grace period. Was that
in response to this investigation, or have you actually
seen false RD indications due to noise?

There is also the more trivial matter of removing the
unnecessary setting of the FGR bit on wakeup.

I'll check that the global RD interrupt does not
keep repeating after a false RD by an OC port.

So I suggest you build a patch that does all of
the above (with the grace period at your discretion).
Then we can both test it, and you can submit it
for actual inclusion.

Thanks,
Paul



2003-05-16 16:05:35

by Paul Fulghum

[permalink] [raw]
Subject: Re: Test Patch: 2.5.69 Interrupt Latency

On Fri, 2003-05-16 at 10:58, Paul Fulghum wrote:
> There is also the more trivial matter of removing the
> unnecessary setting of the FGR bit on wakeup.

Gack! I just thought of something else:

According to the 82371AB datasheet the controller
itself sets the USBCMD_FGR bit when a global RD is
detected. So qualifying the wakeup in software won't
prevent the controller itself from starting the
wakeup process on a false RD from an OC port. :-(

Hmmmm.... crap
Is there a way around this or are we back to
preventing the suspend?

--
Paul Fulghum
[email protected]

2003-05-16 17:03:21

by Alan Stern

[permalink] [raw]
Subject: Re: Test Patch: 2.5.69 Interrupt Latency

On 16 May 2003, Paul Fulghum wrote:

> Gack! I just thought of something else:
>
> According to the 82371AB datasheet the controller
> itself sets the USBCMD_FGR bit when a global RD is
> detected. So qualifying the wakeup in software won't
> prevent the controller itself from starting the
> wakeup process on a false RD from an OC port. :-(
>
> Hmmmm.... crap
> Is there a way around this or are we back to
> preventing the suspend?

It might not be that bad. What will happen is that the controller will
assert FGR and interrupt the host. The driver will see the global RD, but
will also see that it's present only on OC ports, so the driver will never
leave the SUSPENDED state and will never write a 0 to FGR and EGSM. Hence
the controller will never become active -- the wakeup won't finish.

Of course, it's necessary to worry about what happens if one port is OC,
so the controller is in this permanently-waking-up state, when a device is
plugged into the other port. I think this will re-assert global RD and
generate another interrupt. This time the driver will see that the RD is
for real, so it will complete the wakeup sequence.

Neither of us can easily test that, because it requires one port to be
broken and the other to be working. But if everything else is okay, I
think this will work too.

Alan Stern

2003-05-16 17:08:01

by Alan Stern

[permalink] [raw]
Subject: Re: Test Patch: 2.5.69 Interrupt Latency

On 16 May 2003, Paul Fulghum wrote:

> Moving the wait out of the ISR and doing the wakeup
> only for RD on non-OC ports are winners.
>
> I can't comment on the 1 second grace period. Was that
> in response to this investigation, or have you actually
> seen false RD indications due to noise?

Well, I have actually seen false indications. Whether they are due to
noise is open to debate. Since they occur just at the time when I turn
the power to my USB peripheral on or off, that's my best guess. It might
even turn out that power on/off generates a temporary OC condition, so
fixing that might render the grace period unnecessary. I haven't had a
chance try it yet.

> There is also the more trivial matter of removing the
> unnecessary setting of the FGR bit on wakeup.

Yes. That can be done in any case.

> I'll check that the global RD interrupt does not
> keep repeating after a false RD by an OC port.

Good.

> So I suggest you build a patch that does all of
> the above (with the grace period at your discretion).
> Then we can both test it, and you can submit it
> for actual inclusion.

I will. Probably won't be ready until some time next week.

Alan Stern

2003-05-16 17:39:02

by Paul Fulghum

[permalink] [raw]
Subject: Re: Test Patch: 2.5.69 Interrupt Latency

On Fri, 2003-05-16 at 12:20, Alan Stern wrote:

> Well, I have actually seen false indications. Whether they are due to
> noise is open to debate. Since they occur just at the time when I turn
> the power to my USB peripheral on or off, that's my best guess. It might
> even turn out that power on/off generates a temporary OC condition, so
> fixing that might render the grace period unnecessary. I haven't had a
> chance try it yet.

That sounds like a reasonable explanation.

--
Paul Fulghum
[email protected]

2003-05-16 17:35:41

by Paul Fulghum

[permalink] [raw]
Subject: Re: Test Patch: 2.5.69 Interrupt Latency

On Fri, 2003-05-16 at 12:16, Alan Stern wrote:
> On 16 May 2003, Paul Fulghum wrote:
> > According to the 82371AB datasheet the controller
> > itself sets the USBCMD_FGR bit when a global RD is
> > detected. So qualifying the wakeup in software won't
> > prevent the controller itself from starting the
> > wakeup process on a false RD from an OC port. :-(
> >
> > Hmmmm.... crap
> > Is there a way around this or are we back to
> > preventing the suspend?
>
> It might not be that bad. What will happen is that the controller will
> assert FGR and interrupt the host. The driver will see the global RD, but
> will also see that it's present only on OC ports, so the driver will never
> leave the SUSPENDED state and will never write a 0 to FGR and EGSM. Hence
> the controller will never become active -- the wakeup won't finish.
>
> Of course, it's necessary to worry about what happens if one port is OC,
> so the controller is in this permanently-waking-up state, when a device is
> plugged into the other port. I think this will re-assert global RD and
> generate another interrupt. This time the driver will see that the RD is
> for real, so it will complete the wakeup sequence.

Agreed, when the controller sets the FGR bit, it
starts sending the resume signal to all ports,
waking attached devices, which will send back
valid resume signals to the host controller, which
will complete the wakeup. Which takes us back
to the thrashing problem.

For the case where ports are not hardwired to OC,
we should account for the possibility that the
OC condition may clear (bad cable replaced, etc).

So if we allow suspend, and then ignore resumes
on an OC (temporary condition) port, then that port will not
be able to cause a valid resume when the OC
condition is cleared. (per port RD is already set)

So always allowing suspend, and selectively doing the
wakeup will cause:
1. thrashing for case of one port OC,
other port OK with attached device.
2. prevent port with OC from doing resume
after clearing OC condition.

For the case of all ports hardwired OC, this
will work because you suspend the whole controller
and never get a valid resume.


--
Paul Fulghum
[email protected]

2003-05-16 17:57:39

by Paul Fulghum

[permalink] [raw]
Subject: Re: Test Patch: 2.5.69 Interrupt Latency

On Fri, 2003-05-16 at 10:33, Alan Stern wrote:
> Paul:
>
> On 15 May 2003, Paul Fulghum wrote:
> > Con: you would be generating a lot of spurious interrupts
> > as the global USBSTS_RD is set (incorrectly) by the OC ports.
> > Even though you would not actually do the wake, you still
> > burn cycles servicing the false interrupts.
>
> I'm not sure about that. For ports in a permanent OC state, the RD bit
> would get set just once, so a single interrupt would be generated. When
> the host clears the Resume Detect bit in the USBSTS register, it shouldn't
> get set again (not until a different port signals a resume). Otherwise a
> properly working system would generate continuous interrupts during the
> global resume sequence.

Your interpretation checks out. The global RD interrupt does not
reoccur once the individual RD bit is set. So we get a max of
once extra interrupt per OC port.

--
Paul Fulghum
[email protected]

2003-05-16 18:19:04

by Paul Fulghum

[permalink] [raw]
Subject: Re: Test Patch: 2.5.69 Interrupt Latency

On Fri, 2003-05-16 at 12:48, Paul Fulghum wrote:
> So always allowing suspend, and selectively doing the
> wakeup will cause:
> 1. thrashing for case of one port OC,
> other port OK with attached device.
> 2. prevent port with OC from doing resume
> after clearing OC condition.
>
> For the case of all ports hardwired OC, this
> will work because you suspend the whole controller
> and never get a valid resume.

Just to add another argument to disallowing suspend
instead of qualifying wakeup:

In 99% of cases, with no OC, this won't come into play.
In .9% of cases, with transient OC, this won't delay suspend long.
In .01% of cases, with all hardwired OC ports, suspend does not matter.

Plus it cures the above problems #1 and #2.

If problem #1 occurs, I don't see that thrashing is
any better than not suspending at all.


--
Paul Fulghum
[email protected]

2003-05-16 18:27:14

by Alan Stern

[permalink] [raw]
Subject: Re: Test Patch: 2.5.69 Interrupt Latency

On 16 May 2003, Paul Fulghum wrote:

> Agreed, when the controller sets the FGR bit, it
> starts sending the resume signal to all ports,
> waking attached devices, which will send back
> valid resume signals to the host controller, which
> will complete the wakeup. Which takes us back
> to the thrashing problem.
>
> For the case where ports are not hardwired to OC,
> we should account for the possibility that the
> OC condition may clear (bad cable replaced, etc).
>
> So if we allow suspend, and then ignore resumes
> on an OC (temporary condition) port, then that port will not
> be able to cause a valid resume when the OC
> condition is cleared. (per port RD is already set)
>
> So always allowing suspend, and selectively doing the
> wakeup will cause:
> 1. thrashing for case of one port OC,
> other port OK with attached device.
> 2. prevent port with OC from doing resume
> after clearing OC condition.

I disagree with 1. When one port has an attached device there won't be
any problem because suspends will never occur (since one port is active).

But I agree with 2. So yes, it's probably safer to do a conditional
suspend rather than a conditional resume.

> For the case of all ports hardwired OC, this
> will work because you suspend the whole controller
> and never get a valid resume.

Not suspending isn't really a big deal. After all, we would suspend only
when no devices are plugged in anyway. Is the PIIX4 chipset used in
laptops, where the power saving might be important? That's the only
reason I can think of for wanting to suspend whenever possible.

Alan Stern


2003-05-16 18:54:03

by Paul Fulghum

[permalink] [raw]
Subject: Re: Test Patch: 2.5.69 Interrupt Latency

On Fri, 2003-05-16 at 13:40, Alan Stern wrote:

> I disagree with 1. When one port has an attached device there won't be
> any problem because suspends will never occur (since one port is active).
> ...
> > For the case of all ports hardwired OC, this
> > will work because you suspend the whole controller
> > and never get a valid resume.
>
> Not suspending isn't really a big deal. After all, we would suspend only
> when no devices are plugged in anyway. Is the PIIX4 chipset used in
> laptops, where the power saving might be important? That's the only
> reason I can think of for wanting to suspend whenever possible.

OK, my bad. I thought global suspend could occur
with devices plugged in, if not that makes #1 a non issue.

The PIIX4E (same ID, same bug) is used in some laptops.
I would assume the 99% of laptop users without the OC
condition would like to save power.

I don't want to get in the way of the vast majority
of users for these rare special cases.

Since the thrashing is not a problem (no global suspend
when a device is plugged in), the only downside of
your original qualify wakeup plan is the possibility of
missing a valid resume once a transient OC condition is cleared.

Maybe just polling individual ports for OC cleared and
RD set to do a global wakeup?

You convinced me that the qualified wakeup is the
way to go.

--
Paul Fulghum
[email protected]

2003-05-18 00:59:53

by Andrew McGregor

[permalink] [raw]
Subject: Re: Test Patch: 2.5.69 Interrupt Latency



--On Friday, 16 May 2003 2:40 p.m. -0400 Alan Stern
<[email protected]> wrote:

> Not suspending isn't really a big deal. After all, we would suspend only
> when no devices are plugged in anyway. Is the PIIX4 chipset used in
> laptops, where the power saving might be important? That's the only
> reason I can think of for wanting to suspend whenever possible.

Yes, it has been used in laptops (some Gateway models, for instance), but I
suspect strangely broken configurations of this chip are more common than
laptops using it, and I don't expect the power saving to be that dramatic.
Those machines are pretty good for battery life anyway, considering the
high-end (for the time) configurations. (btw: I no longer have one)

Andrew

2003-05-19 16:28:56

by Alan Stern

[permalink] [raw]
Subject: Re: Test Patch: 2.5.69 Interrupt Latency

Paul:

The patch below incorporates your suggested subroutine. That alone wasn't
enough to prevent the state from bouncing a few times when I powered my
USB device on or off, so the debounce code is in there too. This patch
behaves fine on my workstation, which has both ports connected. I'll try
it later on my laptop, which only has one port.

In the end, I decided it was easiest and safest to follow your "don't
suspend if any ports are OC" scheme. We can try it the other way too if
you want. What do you think would happen if you were to try to put your
machine in an APM/ACPI "suspend" state?

This is a cumulative patch, i.e., it applies to a virgin 2.5.69 source.
Let me know how it works for you.

Alan Stern


===== uhci-hcd.c 1.48 vs edited =====
--- 1.48/drivers/usb/host/uhci-hcd.c Mon May 5 02:49:54 2003
+++ edited/drivers/usb/host/uhci-hcd.c Mon May 19 11:30:22 2003
@@ -61,7 +61,7 @@
/*
* Version Information
*/
-#define DRIVER_VERSION "v2.0"
+#define DRIVER_VERSION "v2.1"
#define DRIVER_AUTHOR "Linus 'Frodo Rabbit' Torvalds, Johannes Erdfelt, Randy Dunlap, Georg Acher, Deti Fliegl, Thomas Sailer, Roman Weissgaerber"
#define DRIVER_DESC "USB Universal Host Controller Interface driver"

@@ -91,9 +91,7 @@
static int uhci_urb_dequeue(struct usb_hcd *hcd, struct urb *urb);
static void uhci_unlink_generic(struct uhci_hcd *uhci, struct urb *urb);

-static int ports_active(struct uhci_hcd *uhci);
-static void suspend_hc(struct uhci_hcd *uhci);
-static void wakeup_hc(struct uhci_hcd *uhci);
+static void hc_state_transitions(struct uhci_hcd *uhci);

/* If a transfer is still active after this much time, turn off FSBR */
#define IDLE_TIMEOUT (HZ / 20) /* 50 ms */
@@ -1757,9 +1755,8 @@
uhci->skel_term_qh->link = UHCI_PTR_TERM;
}

- /* enter global suspend if nothing connected */
- if (!uhci->is_suspended && !ports_active(uhci))
- suspend_hc(uhci);
+ /* Poll for and perform state transitions */
+ hc_state_transitions(uhci);

init_stall_timer(hcd);
}
@@ -1884,14 +1881,14 @@
err("%x: host system error, PCI problems?", io_addr);
if (status & USBSTS_HCPE)
err("%x: host controller process error. something bad happened", io_addr);
- if ((status & USBSTS_HCH) && !uhci->is_suspended) {
+ if ((status & USBSTS_HCH) && uhci->state > 0) {
err("%x: host controller halted. very bad", io_addr);
/* FIXME: Reset the controller, fix the offending TD */
}
}

if (status & USBSTS_RD)
- wakeup_hc(uhci);
+ uhci->resume_detect = 1;

uhci_free_pending_qhs(uhci);

@@ -1922,10 +1919,12 @@
unsigned int io_addr = uhci->io_addr;

/* Global reset for 50ms */
+ uhci->state = UHCI_RESET;
outw(USBCMD_GRESET, io_addr + USBCMD);
wait_ms(50);
outw(0, io_addr + USBCMD);
wait_ms(10);
+ uhci->resume_detect = 0;
}

static void suspend_hc(struct uhci_hcd *uhci)
@@ -1933,34 +1932,48 @@
unsigned int io_addr = uhci->io_addr;

dbg("%x: suspend_hc", io_addr);
-
- uhci->is_suspended = 1;
- smp_wmb();
-
+ uhci->state = UHCI_SUSPENDED;
+ uhci->resume_detect = 0;
outw(USBCMD_EGSM, io_addr + USBCMD);
}

static void wakeup_hc(struct uhci_hcd *uhci)
{
unsigned int io_addr = uhci->io_addr;
- unsigned int status;

- dbg("%x: wakeup_hc", io_addr);
+ switch (uhci->state) {
+ case UHCI_SUSPENDED: /* Start the resume */
+ dbg("%x: wakeup_hc", io_addr);
+
+ /* Global resume for >= 20ms */
+ uhci->state = UHCI_RESUMING_1;
+ uhci->state_end = jiffies + (20*HZ+999) / 1000;
+ break;

- /* Global resume for 20ms */
- outw(USBCMD_FGR | USBCMD_EGSM, io_addr + USBCMD);
- wait_ms(20);
- outw(0, io_addr + USBCMD);
-
- /* wait for EOP to be sent */
- status = inw(io_addr + USBCMD);
- while (status & USBCMD_FGR)
- status = inw(io_addr + USBCMD);
+ case UHCI_RESUMING_1: /* End global resume */
+ uhci->state = UHCI_RESUMING_2;
+ outw(0, io_addr + USBCMD);
+ /* Falls through */

- uhci->is_suspended = 0;
+ case UHCI_RESUMING_2: /* Wait for EOP to be sent */
+ if (inw(io_addr + USBCMD) & USBCMD_FGR)
+ break;

- /* Run and mark it configured with a 64-byte max packet */
- outw(USBCMD_RS | USBCMD_CF | USBCMD_MAXP, io_addr + USBCMD);
+ /* Run for at least 1 second, and
+ * mark it configured with a 64-byte max packet */
+ uhci->state = UHCI_RUNNING_GRACE;
+ uhci->state_end = jiffies + HZ;
+ outw(USBCMD_RS | USBCMD_CF | USBCMD_MAXP,
+ io_addr + USBCMD);
+ break;
+
+ case UHCI_RUNNING_GRACE: /* Now allowed to suspend */
+ uhci->state = UHCI_RUNNING;
+ break;
+
+ default:
+ break;
+ }
}

static int ports_active(struct uhci_hcd *uhci)
@@ -1975,6 +1988,73 @@
return connection;
}

+static int suspend_allowed(struct uhci_hcd *uhci)
+{
+ unsigned int io_addr = uhci->io_addr;
+ int i;
+
+ if (!uhci->hcd.pdev ||
+ uhci->hcd.pdev->vendor != PCI_VENDOR_ID_INTEL ||
+ uhci->hcd.pdev->device != PCI_DEVICE_ID_INTEL_82371AB_2)
+ return 1;
+
+ /* This is a 82371AB/EB/MB USB controller which has a bug that
+ * causes false resume indications if any port has an
+ * over current condition. To prevent problems, we will not
+ * allow a global suspend if any ports are OC.
+ *
+ * Some motherboards using the 82371AB/EB/MB (but not the USB portion)
+ * appear to hardwire the over current inputs active to disable
+ * the USB ports.
+ */
+
+ /* check for over current condition on any port */
+ for (i = 0; i < uhci->rh_numports; i++) {
+ if (inw(io_addr + USBPORTSC1 + i * 2) & USBPORTSC_OC)
+ return 0;
+ }
+
+ return 1;
+}
+
+static void hc_state_transitions(struct uhci_hcd *uhci)
+{
+ switch (uhci->state) {
+ case UHCI_RUNNING:
+
+ /* global suspend if nothing connected for 1 second */
+ if (!ports_active(uhci) && suspend_allowed(uhci)) {
+ uhci->state = UHCI_SUSPENDING_GRACE;
+ uhci->state_end = jiffies + HZ;
+ }
+ break;
+
+ case UHCI_SUSPENDING_GRACE:
+ if (ports_active(uhci))
+ uhci->state = UHCI_RUNNING;
+ else if (time_after_eq(jiffies, uhci->state_end))
+ suspend_hc(uhci);
+ break;
+
+ case UHCI_SUSPENDED:
+
+ /* wakeup if requested by a device */
+ if (uhci->resume_detect)
+ wakeup_hc(uhci);
+ break;
+
+ case UHCI_RESUMING_1:
+ case UHCI_RESUMING_2:
+ case UHCI_RUNNING_GRACE:
+ if (time_after_eq(jiffies, uhci->state_end))
+ wakeup_hc(uhci);
+ break;
+
+ default:
+ break;
+ }
+}
+
static void start_hc(struct uhci_hcd *uhci)
{
unsigned int io_addr = uhci->io_addr;
@@ -2003,6 +2083,8 @@
outl(uhci->fl->dma_handle, io_addr + USBFLBASEADD);

/* Run and mark it configured with a 64-byte max packet */
+ uhci->state = UHCI_RUNNING_GRACE;
+ uhci->state_end = jiffies + HZ;
outw(USBCMD_RS | USBCMD_CF | USBCMD_MAXP, io_addr + USBCMD);

uhci->hcd.state = USB_STATE_READY;
@@ -2100,8 +2182,6 @@

uhci->fsbr = 0;
uhci->fsbrtimeout = 0;
-
- uhci->is_suspended = 0;

spin_lock_init(&uhci->qh_remove_list_lock);
INIT_LIST_HEAD(&uhci->qh_remove_list);
===== uhci-hcd.h 1.11 vs edited =====
--- 1.11/drivers/usb/host/uhci-hcd.h Tue Dec 10 14:03:10 2002
+++ edited/drivers/usb/host/uhci-hcd.h Mon May 19 10:36:36 2003
@@ -53,6 +53,7 @@
#define USBPORTSC_RD 0x0040 /* Resume Detect */
#define USBPORTSC_LSDA 0x0100 /* Low Speed Device Attached */
#define USBPORTSC_PR 0x0200 /* Port Reset */
+#define USBPORTSC_OC 0x0400 /* Over Current condition */
#define USBPORTSC_SUSP 0x1000 /* Suspend */

/* Legacy support register */
@@ -282,6 +283,29 @@
return 0; /* int128 for 128-255 ms (Max.) */
}

+/*
+ * Device states for the host controller.
+ *
+ * To prevent "bouncing" in the presence of electrical noise,
+ * we insist on a 1-second "grace" period, before switching to
+ * the RUNNING or SUSPENDED states, during which the state is
+ * not allowed to change.
+ *
+ * The resume process is divided into substates in order to avoid
+ * potentially length delays during the timer handler.
+ *
+ * States in which the host controller is halted must have values <= 0.
+ */
+enum uhci_state {
+ UHCI_RESET,
+ UHCI_RUNNING_GRACE, /* Before RUNNING */
+ UHCI_RUNNING, /* The normal state */
+ UHCI_SUSPENDING_GRACE, /* Before SUSPENDED */
+ UHCI_SUSPENDED = -10, /* When no devices are attached */
+ UHCI_RESUMING_1,
+ UHCI_RESUMING_2
+};
+
#define hcd_to_uhci(hcd_ptr) container_of(hcd_ptr, struct uhci_hcd, hcd)

/*
@@ -313,7 +337,10 @@
struct uhci_frame_list *fl; /* P: uhci->frame_list_lock */
int fsbr; /* Full speed bandwidth reclamation */
unsigned long fsbrtimeout; /* FSBR delay */
- int is_suspended;
+
+ enum uhci_state state; /* FIXME: needs a spinlock */
+ unsigned long state_end; /* Time of next transition */
+ int resume_detect; /* Need a Global Resume */

/* Main list of URB's currently controlled by this HC */
spinlock_t urb_list_lock;

2003-05-19 18:09:24

by Paul Fulghum

[permalink] [raw]
Subject: Re: Test Patch: 2.5.69 Interrupt Latency

On Mon, 2003-05-19 at 11:41, Alan Stern wrote:
> The patch below incorporates your suggested subroutine. That alone wasn't
> enough to prevent the state from bouncing a few times when I powered my
> USB device on or off, so the debounce code is in there too. This patch
> behaves fine on my workstation, which has both ports connected. I'll try
> it later on my laptop, which only has one port.
>
> In the end, I decided it was easiest and safest to follow your "don't
> suspend if any ports are OC" scheme. We can try it the other way too if
> you want. What do you think would happen if you were to try to put your
> machine in an APM/ACPI "suspend" state?
>
> This is a cumulative patch, i.e., it applies to a virgin 2.5.69 source.
> Let me know how it works for you.

Alan,

the patch applied cleanly and worked for me
(prevented global suspension). Having the lengthy
waits outside of the ISR is a definate plus, and
the debounce makes sense.

My machine does not have APM/ACPI facilities so
I can't test the suspend. It is getting pretty
dated, but the economy dictates I live with it for
a while longer :-)

Does you laptop use the PIIX4? If it does and uses only
one port, I would be very interested to see if
one port is continuously reporting OC (hardwired).

Thanks for the patch,
Paul

--
Paul Fulghum, [email protected]
Microgate Corporation, http://www.microgate.com


2003-05-19 18:36:53

by Alan Stern

[permalink] [raw]
Subject: Re: Test Patch: 2.5.69 Interrupt Latency

On 19 May 2003, Paul Fulghum wrote:

> the patch applied cleanly and worked for me
> (prevented global suspension). Having the lengthy
> waits outside of the ISR is a definate plus, and
> the debounce makes sense.

Good. I'll submit this, if no other problems arise.

> My machine does not have APM/ACPI facilities so
> I can't test the suspend. It is getting pretty
> dated, but the economy dictates I live with it for
> a while longer :-)
>
> Does you laptop use the PIIX4? If it does and uses only
> one port, I would be very interested to see if
> one port is continuously reporting OC (hardwired).

I don't remember what chipset it has. But it definitely includes a UHCI
controller with only one port. I'll check it out tonight.

Alan Stern