2009-11-18 12:57:56

by Simon Arlott

[permalink] [raw]
Subject: cxacru usb_bulk_msg() firmware upload 36x slower with OHCI vs. UHCI

Can anyone explain this speed difference I'm seeing with usb_bulk_msg()?

Increasing the size of the data doesn't improve the speed at all. This
makes firmware loading take significantly longer with OHCI.

I've tested this with two UHCI controllers, two OHCI controllers, and
two different versions of the device. Even on the same system with the
same device the speed difference occurs between OHCI and UHCI.

I've added a debug line to drivers/usb/atm/cxacru.c in cxacru_fw(),
which uploads the firmware in PAGE_SIZE chunks:
+ printk(KERN_INFO "cxacru: sending fw %#x size %#x to %#x", fw, offb, offd);
ret = usb_bulk_msg(usb_dev, usb_sndbulkpipe(usb_dev, CXACRU_EP_CMD),


With OHCI and PAGE_SIZE chunks (4KB):
[2567188.504299] cxacru: sending fw 0x3 size 0x1000 to 0xe00
[2567188.760293] cxacru: sending fw 0x3 size 0x1000 to 0x1c00
[2567189.016258] cxacru: sending fw 0x3 size 0x1000 to 0x2a00
[2567189.272235] cxacru: sending fw 0x3 size 0x1000 to 0x3800
[2567189.528210] cxacru: sending fw 0x3 size 0x1000 to 0x4600
[2567189.784488] cxacru: sending fw 0x3 size 0x1000 to 0x5400
...
[2567233.044134] cxacru: sending fw 0x3 size 0x2c0 to 0x98668

760293-504299 = 255994us
1016258-760293 = 255965us
272235- 16258 = 255977us
528210-272235 = 255975us
784488-528210 = 256278us
approx. 128,000 bits/s


With UHCI and PAGE_SIZE chunks (4KB):
[ 770.148148] cxacru: sending fw 0x3 size 0x1000 to 0xe00
[ 770.155163] cxacru: sending fw 0x3 size 0x1000 to 0x1c00
[ 770.162146] cxacru: sending fw 0x3 size 0x1000 to 0x2a00
[ 770.169136] cxacru: sending fw 0x3 size 0x1000 to 0x3800
[ 770.176139] cxacru: sending fw 0x3 size 0x1000 to 0x4600
[ 770.183143] cxacru: sending fw 0x3 size 0x1000 to 0x5400
...
[ 771.386167] cxacru: sending fw 0x3 size 0x2c0 to #98668

155163-148148 = 7015us
162146-155163 = 6983us
169136-162146 = 6990us
176139-169136 = 7003us
183143-176139 = 7004us
approx. 128,000 bits/s


With OHCI and PAGE_SIZE*8 chunks (32KB):
[ 4731.826738] cxacru: sending fw 0x3 size 0x8000 to 0x7000
[ 4733.874628] cxacru: sending fw 0x3 size 0x8000 to 0xe000
[ 4735.922353] cxacru: sending fw 0x3 size 0x8000 to 0x15000
[ 4737.970153] cxacru: sending fw 0x3 size 0x8000 to 0x1c000
[ 4740.017937] cxacru: sending fw 0x3 size 0x8000 to 0x23000
[ 4742.065814] cxacru: sending fw 0x3 size 0x8000 to 0x2a000
...
[ 4774.830569] cxacru: sending fw 0x3 size 0x62c0 to #98668

33874628-31826738 = 2047890us, /8 = 255986us
35922353-33874628 = 2047725us, /8 = 255966us
37970153-35922353 = 2047800us, /8 = 255975us
40017937-37970153 = 2047784us, /8 = 255973us
42065814-40017937 = 2047877us, /8 = 255985us
approx. 128,000 bits/s


With UHCI and PAGE_SIZE*8 chunks (32KB):
[ 4835.342142] cxacru: sending fw 0x3 size 0x8000 to 0x7000
[ 4835.399307] cxacru: sending fw 0x3 size 0x8000 to 0xe000
[ 4835.456209] cxacru: sending fw 0x3 size 0x8000 to 0x15000
[ 4835.513308] cxacru: sending fw 0x3 size 0x8000 to 0x1c000
[ 4835.570308] cxacru: sending fw 0x3 size 0x8000 to 0x23000
[ 4835.627287] cxacru: sending fw 0x3 size 0x8000 to 0x2a000
...
[ 4836.539297] cxacru: sending fw 0x3 size 0x62c0 to #98668

399307-342142 = 57165us, /8 = 7146us
456209-399307 = 56902us, /8 = 7113us
513308-456209 = 57099us, /8 = 7137us
570308-513308 = 57000us, /8 = 7125us
627287-570308 = 56979us, /8 = 7122us
approx. 4,600,000 bits/s

--
Simon Arlott


2009-11-18 15:51:12

by Alan Stern

[permalink] [raw]
Subject: Re: cxacru usb_bulk_msg() firmware upload 36x slower with OHCI vs. UHCI

On Wed, 18 Nov 2009, Simon Arlott wrote:

> Can anyone explain this speed difference I'm seeing with usb_bulk_msg()?
>
> Increasing the size of the data doesn't improve the speed at all. This
> makes firmware loading take significantly longer with OHCI.
>
> I've tested this with two UHCI controllers, two OHCI controllers, and
> two different versions of the device. Even on the same system with the
> same device the speed difference occurs between OHCI and UHCI.
>
> I've added a debug line to drivers/usb/atm/cxacru.c in cxacru_fw(),
> which uploads the firmware in PAGE_SIZE chunks:
> + printk(KERN_INFO "cxacru: sending fw %#x size %#x to %#x", fw, offb, offd);
> ret = usb_bulk_msg(usb_dev, usb_sndbulkpipe(usb_dev, CXACRU_EP_CMD),
>
>
> With OHCI and PAGE_SIZE chunks (4KB):
> [2567188.504299] cxacru: sending fw 0x3 size 0x1000 to 0xe00
> [2567188.760293] cxacru: sending fw 0x3 size 0x1000 to 0x1c00
> [2567189.016258] cxacru: sending fw 0x3 size 0x1000 to 0x2a00
> [2567189.272235] cxacru: sending fw 0x3 size 0x1000 to 0x3800
> [2567189.528210] cxacru: sending fw 0x3 size 0x1000 to 0x4600
> [2567189.784488] cxacru: sending fw 0x3 size 0x1000 to 0x5400
> ...
> [2567233.044134] cxacru: sending fw 0x3 size 0x2c0 to 0x98668

...

> With OHCI and PAGE_SIZE*8 chunks (32KB):
> [ 4731.826738] cxacru: sending fw 0x3 size 0x8000 to 0x7000
> [ 4733.874628] cxacru: sending fw 0x3 size 0x8000 to 0xe000
> [ 4735.922353] cxacru: sending fw 0x3 size 0x8000 to 0x15000
> [ 4737.970153] cxacru: sending fw 0x3 size 0x8000 to 0x1c000
> [ 4740.017937] cxacru: sending fw 0x3 size 0x8000 to 0x23000
> [ 4742.065814] cxacru: sending fw 0x3 size 0x8000 to 0x2a000
> ...
> [ 4774.830569] cxacru: sending fw 0x3 size 0x62c0 to #98668

Each OHCI transfer requires about 256 ms per 4-KB page. There's no
obvious reason for this to take so long. I'd guess it was caused by a
hardware problem except that you say the same thing happened with two
different controllers. Were they both on the same computer?

You may have to do a bisection search to find the answer.

Alan Stern

2009-11-18 20:02:25

by Simon Arlott

[permalink] [raw]
Subject: Re: cxacru usb_bulk_msg() firmware upload 36x slower with OHCI vs. UHCI

On 18/11/09 15:51, Alan Stern wrote:
> On Wed, 18 Nov 2009, Simon Arlott wrote:
>
>> Can anyone explain this speed difference I'm seeing with usb_bulk_msg()?
>>
>> Increasing the size of the data doesn't improve the speed at all. This
>> makes firmware loading take significantly longer with OHCI.
>>
>> I've tested this with two UHCI controllers, two OHCI controllers, and
>> two different versions of the device. Even on the same system with the
>> same device the speed difference occurs between OHCI and UHCI.
>>
>> I've added a debug line to drivers/usb/atm/cxacru.c in cxacru_fw(),
>> which uploads the firmware in PAGE_SIZE chunks:
>> + printk(KERN_INFO "cxacru: sending fw %#x size %#x to %#x", fw, offb, offd);
>> ret = usb_bulk_msg(usb_dev, usb_sndbulkpipe(usb_dev, CXACRU_EP_CMD),
>>
>>
>> With OHCI and PAGE_SIZE chunks (4KB):
>> [2567188.504299] cxacru: sending fw 0x3 size 0x1000 to 0xe00
>> [2567188.760293] cxacru: sending fw 0x3 size 0x1000 to 0x1c00
>> [2567189.016258] cxacru: sending fw 0x3 size 0x1000 to 0x2a00
>> [2567189.272235] cxacru: sending fw 0x3 size 0x1000 to 0x3800
>> [2567189.528210] cxacru: sending fw 0x3 size 0x1000 to 0x4600
>> [2567189.784488] cxacru: sending fw 0x3 size 0x1000 to 0x5400
>> ...
>> [2567233.044134] cxacru: sending fw 0x3 size 0x2c0 to 0x98668
>
> ...
>
>> With OHCI and PAGE_SIZE*8 chunks (32KB):
>> [ 4731.826738] cxacru: sending fw 0x3 size 0x8000 to 0x7000
>> [ 4733.874628] cxacru: sending fw 0x3 size 0x8000 to 0xe000
>> [ 4735.922353] cxacru: sending fw 0x3 size 0x8000 to 0x15000
>> [ 4737.970153] cxacru: sending fw 0x3 size 0x8000 to 0x1c000
>> [ 4740.017937] cxacru: sending fw 0x3 size 0x8000 to 0x23000
>> [ 4742.065814] cxacru: sending fw 0x3 size 0x8000 to 0x2a000
>> ...
>> [ 4774.830569] cxacru: sending fw 0x3 size 0x62c0 to #98668
>
> Each OHCI transfer requires about 256 ms per 4-KB page. There's no
> obvious reason for this to take so long. I'd guess it was caused by a
> hardware problem except that you say the same thing happened with two
> different controllers. Were they both on the same computer?

Yes. I have a UHCI controller on a PCI card, and used that with on-board
OHCI. Same computer (SMP), same kernel, same device...

Different computer (UP), on-board UHCI = 7ms per 4KB
A third computer (SMP), on-board OHCI = 256ms per 4KB

The device otherwise works ok (it's an ADSL modem) with no issues relating
to latency or throughput.

> You may have to do a bisection search to find the answer.

With what? I can't really bisect "the UHCI code" and "the OHCI code"...

I have no good kernel to work with unless I start trying really old kernels,
but there's no reason why those should work either. I'm hoping someone
recognises the significance of the transfer speed.

The OHCI code appears to split the data up into 4096 chunks, but even the
odd sized transfer of 25280 bytes at the end runs at the same speed:

[ 4774.830569] cxacru: sending fw 0x3 size 0x62c0 to #98668
[ 4776.410375] cxacru: sending fw 0x3 size 0x100 to #e0

6.410375-4.830569
1.579806
1579806/25280*4096
255968.56708860759493668864

--
Simon Arlott

2009-11-18 21:28:44

by Alan Stern

[permalink] [raw]
Subject: Re: cxacru usb_bulk_msg() firmware upload 36x slower with OHCI vs. UHCI

On Wed, 18 Nov 2009, Simon Arlott wrote:

> > You may have to do a bisection search to find the answer.
>
> With what? I can't really bisect "the UHCI code" and "the OHCI code"...
>
> I have no good kernel to work with unless I start trying really old kernels,
> but there's no reason why those should work either. I'm hoping someone
> recognises the significance of the transfer speed.

If you don't have a good kernel to start from then there's nothing to
search for. I was assuming that a relatively recent kernel change
might have caused the slow-down, but if it has been this way for a long
time then a different approach is needed.

There's no particular significance to 256 ms that I know of, although
if the hardware is malfunctioning it could easily pick such a rate.
The page size is significant because that's how the driver and the
hardware divide up transfers; each Transfer Descriptor refers to at
most 4096 bytes of data.

> The OHCI code appears to split the data up into 4096 chunks, but even the
> odd sized transfer of 25280 bytes at the end runs at the same speed:
>
> [ 4774.830569] cxacru: sending fw 0x3 size 0x62c0 to #98668
> [ 4776.410375] cxacru: sending fw 0x3 size 0x100 to #e0

Yep. My intuition says "hardware problem", but there's no hard
evidence one way or another.

What happens with other sorts of devices, such as a USB flash drive?

Alan Stern

2009-11-18 21:59:36

by Simon Arlott

[permalink] [raw]
Subject: Re: cxacru usb_bulk_msg() firmware upload 36x slower with OHCI vs. UHCI

On 18/11/09 21:28, Alan Stern wrote:
> If you don't have a good kernel to start from then there's nothing to
> search for. I was assuming that a relatively recent kernel change
> might have caused the slow-down, but if it has been this way for a long
> time then a different approach is needed.

The earliest kernel I've had this happen with is 2.6.28-rc2, when I first
started using it with an OHCI controller.

> There's no particular significance to 256 ms that I know of, although
> if the hardware is malfunctioning it could easily pick such a rate.
> The page size is significant because that's how the driver and the
> hardware divide up transfers; each Transfer Descriptor refers to at
> most 4096 bytes of data.
>
> On Wed, 18 Nov 2009, Simon Arlott wrote:
>> The OHCI code appears to split the data up into 4096 chunks, but even the
>> odd sized transfer of 25280 bytes at the end runs at the same speed:
>>
>> [ 4774.830569] cxacru: sending fw 0x3 size 0x62c0 to #98668
>> [ 4776.410375] cxacru: sending fw 0x3 size 0x100 to #e0
>
> Yep. My intuition says "hardware problem", but there's no hard
> evidence one way or another.
>
> What happens with other sorts of devices, such as a USB flash drive?

On my desktop, I don't usually use the OHCI controller directly, but
through hubs on EHCI. I do have an USB Audio device which works ok
using OHCI.

The firmware load fails over EHCI (via a hub) despite running even faster:
[89633.219403] cxacru: sending fw 0x3 size 0x8000 to #7000
[89633.220147] cxacru: sending fw 0x3 size 0x8000 to #e000
[89633.220530] cxacru: sending fw 0x3 size 0x8000 to #15000
[89633.220774] cxacru: sending fw 0x3 size 0x8000 to #1c000
[89633.220898] cxacru: sending fw 0x3 size 0x8000 to #23000
[89633.221286] cxacru: sending fw 0x3 size 0x8000 to #2a000

338µs per 32KB... which is 775mbit/s+ and doesn't make sense.

I can write a 10MB file to an USB flash drive over OHCI, and umount+sync
takes around 13 seconds.

--
Simon Arlott

2009-11-18 22:25:34

by Alan Stern

[permalink] [raw]
Subject: Re: cxacru usb_bulk_msg() firmware upload 36x slower with OHCI vs. UHCI

On Wed, 18 Nov 2009, Simon Arlott wrote:

> > What happens with other sorts of devices, such as a USB flash drive?
>
> On my desktop, I don't usually use the OHCI controller directly, but
> through hubs on EHCI. I do have an USB Audio device which works ok
> using OHCI.
>
> The firmware load fails over EHCI (via a hub) despite running even faster:
> [89633.219403] cxacru: sending fw 0x3 size 0x8000 to #7000
> [89633.220147] cxacru: sending fw 0x3 size 0x8000 to #e000
> [89633.220530] cxacru: sending fw 0x3 size 0x8000 to #15000
> [89633.220774] cxacru: sending fw 0x3 size 0x8000 to #1c000
> [89633.220898] cxacru: sending fw 0x3 size 0x8000 to #23000
> [89633.221286] cxacru: sending fw 0x3 size 0x8000 to #2a000
>
> 338µs per 32KB... which is 775mbit/s+ and doesn't make sense

It sure doesn't. EHCI can't send bulk data faster than 52 KB/ms, which
is 615 us per 32 KB. I wonder if the time values above are correct.

> I can write a 10MB file to an USB flash drive over OHCI, and umount+sync
> takes around 13 seconds.

Yes, that's about right. It leads me to wonder if something funny is
going on with the device, or least with the firmware-loading part of
it. Odd that it works differently with UHCI and OHCI, though. There
shouldn't be any differences visible to the device. You don't have
anything else attached to the same bus, do you?

Alan Stern

2009-11-22 10:58:49

by Simon Arlott

[permalink] [raw]
Subject: Re: cxacru usb_bulk_msg() firmware upload 36x slower with OHCI vs. UHCI

On 18/11/09 22:25, Alan Stern wrote:
> On Wed, 18 Nov 2009, Simon Arlott wrote:
>> > What happens with other sorts of devices, such as a USB flash drive?
>>
>> I can write a 10MB file to an USB flash drive over OHCI, and umount+sync
>> takes around 13 seconds.
>
> Yes, that's about right. It leads me to wonder if something funny is
> going on with the device, or least with the firmware-loading part of
> it. Odd that it works differently with UHCI and OHCI, though. There
> shouldn't be any differences visible to the device. You don't have
> anything else attached to the same bus, do you?

Yes, but not in use and I could disconnect everything to test it.

core/hcd.c has an interesting comment in usb_hcd_check_unlink_urb():
/* The USB 2.0 spec says 256 ms. This is close enough and won't
* exceed that limit if HZ is 100. The math is more clunky than
* maybe expected, this is to make sure that all timers for USB devices
* fire at the same time to give the CPU a break inbetween */

I'll try increasing the frequency of this timer too.

--
Simon Arlott

2009-11-22 16:14:24

by Alan Stern

[permalink] [raw]
Subject: Re: cxacru usb_bulk_msg() firmware upload 36x slower with OHCI vs. UHCI

On Sun, 22 Nov 2009, Simon Arlott wrote:

> On 18/11/09 22:25, Alan Stern wrote:
> > On Wed, 18 Nov 2009, Simon Arlott wrote:
> >> > What happens with other sorts of devices, such as a USB flash drive?
> >>
> >> I can write a 10MB file to an USB flash drive over OHCI, and umount+sync
> >> takes around 13 seconds.
> >
> > Yes, that's about right. It leads me to wonder if something funny is
> > going on with the device, or least with the firmware-loading part of
> > it. Odd that it works differently with UHCI and OHCI, though. There
> > shouldn't be any differences visible to the device. You don't have
> > anything else attached to the same bus, do you?
>
> Yes, but not in use and I could disconnect everything to test it.
>
> core/hcd.c has an interesting comment in usb_hcd_check_unlink_urb():

You mean usb_hcd_poll_rh_status().

> /* The USB 2.0 spec says 256 ms. This is close enough and won't
> * exceed that limit if HZ is 100. The math is more clunky than
> * maybe expected, this is to make sure that all timers for USB devices
> * fire at the same time to give the CPU a break inbetween */
>
> I'll try increasing the frequency of this timer too.

It shouldn't make any difference; that timer is used with OHCI only in
exceptional circumstances (like immediately after a device is plugged
in or unplugged), not during normal operation. But maybe something
strange is going on. You could add a printk in ohci_hub_status_data(),
which is the routine that timer ends up calling.

Alan Stern

2009-11-22 17:16:32

by Simon Arlott

[permalink] [raw]
Subject: Re: cxacru usb_bulk_msg() firmware upload 36x slower with OHCI vs. UHCI

On 22/11/09 16:14, Alan Stern wrote:
> On Sun, 22 Nov 2009, Simon Arlott wrote:
>> On 18/11/09 22:25, Alan Stern wrote:
>> > On Wed, 18 Nov 2009, Simon Arlott wrote:
>> >> > What happens with other sorts of devices, such as a USB flash drive?
>> >>
>> >> I can write a 10MB file to an USB flash drive over OHCI, and umount+sync
>> >> takes around 13 seconds.
>> >
>> > Yes, that's about right. It leads me to wonder if something funny is
>> > going on with the device, or least with the firmware-loading part of
>> > it. Odd that it works differently with UHCI and OHCI, though. There
>> > shouldn't be any differences visible to the device. You don't have
>> > anything else attached to the same bus, do you?
>>
>> Yes, but not in use and I could disconnect everything to test it.
>>
>> core/hcd.c has an interesting comment in usb_hcd_check_unlink_urb():
>
> You mean usb_hcd_poll_rh_status().

Yes... misplaced paste from my other email.

>> /* The USB 2.0 spec says 256 ms. This is close enough and won't
>> * exceed that limit if HZ is 100. The math is more clunky than
>> * maybe expected, this is to make sure that all timers for USB devices
>> * fire at the same time to give the CPU a break inbetween */
>>
>> I'll try increasing the frequency of this timer too.
>
> It shouldn't make any difference; that timer is used with OHCI only in
> exceptional circumstances (like immediately after a device is plugged
> in or unplugged), not during normal operation. But maybe something
> strange is going on. You could add a printk in ohci_hub_status_data(),
> which is the routine that timer ends up calling.

I've tried sending 64 and 2048 bytes at a time, with the same speed
(4ms and 128ms), so that time is just a coincidence.

Submitting it all as multiple asynchronous URBs in one go doesn't help
either. I've been trying to get EHCI working too (via two different
high speed hubs) but that's not working even if I add long delays.

--
Simon Arlott

2009-11-22 22:17:38

by Alan Stern

[permalink] [raw]
Subject: Re: cxacru usb_bulk_msg() firmware upload 36x slower with OHCI vs. UHCI

On Sun, 22 Nov 2009, Simon Arlott wrote:

> I've tried sending 64 and 2048 bytes at a time, with the same speed
> (4ms and 128ms), so that time is just a coincidence.

These timings indicate that one packet is getting transmitted every 4
frames. In theory the slowness could be caused by either the host
controller or the device, but you'd need a bus analyzer to tell which.
Certainly there's no good reason for the controller to run that slowly,
but there's also no good reason for the device to run that slowly only
with OHCI.

> Submitting it all as multiple asynchronous URBs in one go doesn't help
> either. I've been trying to get EHCI working too (via two different
> high speed hubs) but that's not working even if I add long delays.

Have you tried using a non-high-speed hub, or forcing the port into
full speed and adding a hub? That is, does the presence of a hub with
OHCI make any difference?

Alan Stern

2010-08-30 21:56:30

by Simon Arlott

[permalink] [raw]
Subject: Re: cxacru usb_bulk_msg() firmware upload 36x slower with OHCI vs. UHCI

On 22/11/09 22:17, Alan Stern wrote:
> On Sun, 22 Nov 2009, Simon Arlott wrote:
>> I've tried sending 64 and 2048 bytes at a time, with the same speed
>> (4ms and 128ms), so that time is just a coincidence.
>
> These timings indicate that one packet is getting transmitted every 4
> frames. In theory the slowness could be caused by either the host
> controller or the device, but you'd need a bus analyzer to tell which.
> Certainly there's no good reason for the controller to run that slowly,
> but there's also no good reason for the device to run that slowly only
> with OHCI.
>
>> Submitting it all as multiple asynchronous URBs in one go doesn't help
>> either. I've been trying to get EHCI working too (via two different
>> high speed hubs) but that's not working even if I add long delays.
>
> Have you tried using a non-high-speed hub, or forcing the port into
> full speed and adding a hub? That is, does the presence of a hub with
> OHCI make any difference?

Something has changed in the kernel between the last time I looked at
this issue and now. On both my OHCI hardware the firmware initialisation
runs at the same speed as UHCI did. It's even working via a hub on EHCI.

This may have something to do with cxacru trying to use an interrupt
transfer for a bulk endpoint, which now returns an error if
CONFIG_USB_DEBUG is enabled...

Commit f661c6f8c67bd55e93348f160d590ff9edf08904 is in urgent need of a
printk(); a patch follows this email.

--
Simon Arlott