(Resending because [email protected] bounced right back to me. Sorry for the multiple messages!)
----- Forwarded Message ----
From: Open Source <[email protected]>
To: [email protected]; [email protected]
Sent: Thursday, October 12, 2006 12:21:56 PM
Subject: USB performance bug since kernel 2.6.13 (CRITICAL???)
Hi all,
I am writing regarding a performance issue that I recently observed after upgrading from kernel 2.6.12 to 2.6.17. I did some hunting around and have found that the issue first arises in 2.6.13.
I am using a device that submits URBs asynchronously using the libusb devio infrastructure. In version 2.6.12 I am able to submit and reap URBs for my particular application at a transaction rate of one per millisecond. A transaction consists of a single WRITE URB (< 512 bytes) followed by a single READ URB (1024 bytes). Once I upgrade to version 2.6.13, the transactional rate drops to one per 4 milliseconds!
The overall performance of a particular algorithm is increased from a total execution time of 75 seconds to over 160 seconds. The only difference between the two tests is the kernel. Microsoft Windows executes the algorithm in 70-75 seconds!
I am using a Fedora Core distribution with FC4 kernels for testing. Is there some new incantation that is required in my user-mode driver to get around a "feature" in recent kernels? Does anyone else know about this? I was not able to easily find discussion about this on the newsgroups. It appears that this problem has been around for a while, if it is indeed a problem.
I am not a subscriber to the linux-kernel mailing list but have cross-posted to it since this seems like a serious enough issue. Please continue to keep any responses on linux-usb-devel as well so I can see them in my email box.
Thank you,
Beleaguered Open Source Fan
On Thu, 2006-10-12 at 12:33 -0700, Open Source wrote:
> I am using a device that submits URBs asynchronously using the libusb
> devio infrastructure. In version 2.6.12 I am able to submit and reap
> URBs for my particular application at a transaction rate of one per
> millisecond. A transaction consists of a single WRITE URB (< 512
> bytes) followed by a single READ URB (1024 bytes). Once I upgrade to
> version 2.6.13, the transactional rate drops to one per 4
> milliseconds!
The kernel's internal tick rate was lowered from 1000Hz to 250Hz
starting with 2.6.13. Rebuild with CONFIG_HZ=1000 and the performance
should return to normal.
Lee
Hi,
Thanks for the rapid response! But...I'm a bit confused. Why is there any software OS timer involved at all? Bulk messages should be scheduled by the host controller and for USB 2.0 the microframe period is 125 us. When I submit an URB, it should be sent to the host controller and scheduled for the next microframe. When the URB completes I should get an interrupt from the hardware. Hence, my transactions (WRITE followed by READ) should at worst be approximately 250 us plus some overhead to process the transaction itself, provided there aren't any other time consuming processes running on my OS. My overhead is less than 250 us. I was willing to tolerate 1 ms per transaction, but 4 ms just doesn't make any sense. Therefore I'm not sure if CONFIG_HZ is an appropriate excuse for this issue.
Besides, Windows is now more than twice as fast out of the box than Linux when it comes to this particular type of USB usage. We can't have that can we? :-) Linux used to be fast and the only difference with my recent experiences is the kernel version.
Best regards,
Still Beleagered Open Source Fan
----- Original Message ----
From: Lee Revell <[email protected]>
To: Open Source <[email protected]>
Cc: [email protected]; [email protected]
Sent: Thursday, October 12, 2006 12:55:55 PM
Subject: Re: USB performance bug since kernel 2.6.13 (CRITICAL???)
On Thu, 2006-10-12 at 12:33 -0700, Open Source wrote:
> I am using a device that submits URBs asynchronously using the libusb
> devio infrastructure. In version 2.6.12 I am able to submit and reap
> URBs for my particular application at a transaction rate of one per
> millisecond. A transaction consists of a single WRITE URB (< 512
> bytes) followed by a single READ URB (1024 bytes). Once I upgrade to
> version 2.6.13, the transactional rate drops to one per 4
> milliseconds!
The kernel's internal tick rate was lowered from 1000Hz to 250Hz
starting with 2.6.13. Rebuild with CONFIG_HZ=1000 and the performance
should return to normal.
Lee
On Thu, 2006-10-12 at 13:05 -0700, Open Source wrote:
> Hi,
>
> Thanks for the rapid response! But...I'm a bit confused. Why is there any software OS timer involved at all? Bulk messages should be scheduled by the host controller and for USB 2.0 the microframe period is 125 us. When I submit an URB, it should be sent to the host controller and scheduled for the next microframe. When the URB completes I should get an interrupt from the hardware. Hence, my transactions (WRITE followed by READ) should at worst be approximately 250 us plus some overhead to process the transaction itself, provided there aren't any other time consuming processes running on my OS. My overhead is less than 250 us. I was willing to tolerate 1 ms per transaction, but 4 ms just doesn't make any sense. Therefore I'm not sure if CONFIG_HZ is an appropriate excuse for this issue.
>
I don't know, it just seemed likely because 1ms vs 4ms is close to the
change in the timer tick rate. Try it.
Lee
Yes, I am pretty sure you are right about the timing. But it shouldn't be that way. If it is, then there's a bug.
I'm fully willing to accept there is something else I should be doing driver-wise, but it shoudn't require recompiling the stock distribution kernels. Otherwise, Linux is not competitive with Microsoft Windows in this regard!
I'll try a recompile and report back. In the meantime, if anyone else has any ideas, please let me know!
Gopal
----- Original Message ----
From: Lee Revell <[email protected]>
To: Open Source <[email protected]>
Cc: [email protected]; [email protected]
Sent: Thursday, October 12, 2006 1:19:12 PM
Subject: Re: USB performance bug since kernel 2.6.13 (CRITICAL???)
On Thu, 2006-10-12 at 13:05 -0700, Open Source wrote:
> Hi,
>
> Thanks for the rapid response! But...I'm a bit confused. Why is there any software OS timer involved at all? Bulk messages should be scheduled by the host controller and for USB 2.0 the microframe period is 125 us. When I submit an URB, it should be sent to the host controller and scheduled for the next microframe. When the URB completes I should get an interrupt from the hardware. Hence, my transactions (WRITE followed by READ) should at worst be approximately 250 us plus some overhead to process the transaction itself, provided there aren't any other time consuming processes running on my OS. My overhead is less than 250 us. I was willing to tolerate 1 ms per transaction, but 4 ms just doesn't make any sense. Therefore I'm not sure if CONFIG_HZ is an appropriate excuse for this issue.
>
I don't know, it just seemed likely because 1ms vs 4ms is close to the
change in the timer tick rate. Try it.
Lee
On Thu, 2006-10-12 at 13:56 -0700, Open Source wrote:
> Yes, I am pretty sure you are right about the timing. But it shouldn't be that way. If it is, then there's a bug.
>
> I'm fully willing to accept there is something else I should be doing driver-wise, but it shoudn't require recompiling the stock distribution kernels. Otherwise, Linux is not competitive with Microsoft Windows in this regard!
>
> I'll try a recompile and report back. In the meantime, if anyone else has any ideas, please let me know!
>
Yes, I agree that it would be a bug. If it turns out to be related to
CONFIG_HZ, ask your distro why they rolled it back from 1000 to 250Hz.
Lee
On Thursday 12 October 2006 17:26, Lee Revell wrote:
> On Thu, 2006-10-12 at 13:56 -0700, Open Source wrote:
> > Yes, I am pretty sure you are right about the timing. But it shouldn't be that way. If it is, then there's a bug.
> >
> > I'm fully willing to accept there is something else I should be doing driver-wise, but it shoudn't require recompiling the stock distribution kernels. Otherwise, Linux is not competitive with Microsoft Windows in this regard!
> >
> > I'll try a recompile and report back. In the meantime, if anyone else has any ideas, please let me know!
> >
>
> Yes, I agree that it would be a bug. If it turns out to be related to
> CONFIG_HZ, ask your distro why they rolled it back from 1000 to 250Hz.
If this turns out to be tied to the HZ rate its a bug. It _should_ not be using
this timing to do this ergo bug. You may be able to bypass by using 1000Hz
but this is not a fix...
Ed Tomlinson
[FYI, it would make things easier for the rest of us if you can convince
your email client to wrap lines before 80 columns.]
On Thu, 12 Oct 2006, Open Source wrote:
> Hi all,
> I am writing regarding a performance issue that I recently observed
> after upgrading from kernel 2.6.12 to 2.6.17. I did some hunting around
> and have found that the issue first arises in 2.6.13.
>
> I am using a device that submits URBs asynchronously using the libusb
> devio infrastructure. In version 2.6.12 I am able to submit and reap
> URBs for my particular application at a transaction rate of one per
> millisecond. A transaction consists of a single WRITE URB (< 512 bytes)
> followed by a single READ URB (1024 bytes). Once I upgrade to version
> 2.6.13, the transactional rate drops to one per 4 milliseconds!
>
> The overall performance of a particular algorithm is increased from a
> total execution time of 75 seconds to over 160 seconds. The only
> difference between the two tests is the kernel. Microsoft Windows
> executes the algorithm in 70-75 seconds!
>
> I am using a Fedora Core distribution with FC4 kernels for testing. Is
> there some new incantation that is required in my user-mode driver to
> get around a "feature" in recent kernels? Does anyone else know about
> this? I was not able to easily find discussion about this on the
> newsgroups. It appears that this problem has been around for a while,
> if it is indeed a problem.
I'll be interested to here if changing HZ back to 1000 makes any
difference. As others have already stated, it shouldn't matter but maybe
it does somehow.
Even if it does, there are things you might be able to do with HZ=250 to
improve throughput. You could transfer more than 512/1024 bytes per URB.
You could queue multiple URBs before waiting for the first one to
complete. Provided you can keep the endpoint queues filled, you should be
able to achieve the maximum throughput of the hardware.
Alan Stern
Hi Wolfgang (and all),
Thanks for the input. However, I am not understanding
exactly why kernel mode is treated any differently than
user mode for this sort of thing. I am looking at the code
in ehci-q.c and ehci-hcd.c.
It seems like the unlinking of completed URBs
happens asynchronously on a timer. This is a
surprise to me since I thought this was happening
on an IRQ from the host controller. But if what I'm
surmising is correct it would explain everything
I am seeing. I'm not able to ascertain how
user mode drivers are treated differently than
kernel mode drivers in this regard. From what I
can tell, all drivers would be broken equally!
Can anyone who has more experience
with this code confirm this for me?
Besides, we count on sub-10 ms response times all the
time in user mode. Take for example, the access of a file.
If opening a file had a fixed latency of 4 ms, people
would be up in arms. So that's not entirely a valid excuse.
A USB operation that used to take 1 ms now takes 4 ms.
That's a pretty big change.
The ability to write user-mode drivers for USB devices
is very powerful for deployment. If one writes a kernel
driver, there are severe deployment hassles. As such,
my company has chosen to write user-mode drivers
on both Windows to avoid driver deployment nightmares.
This has been extremely successful so far.. Ironically,
Windows (using libusb-win32) has had no such performance
glitches. As a matter of principle, Linux should at least be
as good as Windows, right?
Hopefully we can get this sorted out.
Cheers.
----- Original Message ----
From: WolfgangM?es <[email protected]>
To: [email protected]
Sent: Friday, October 13, 2006 12:11:08 PM
Subject: Re: [linux-usb-devel] USB performance bug since kernel 2.6.13 (CRITICAL???)
On Friday 13 October 2006 19:20, Open Source wrote:
> Alan -- yes, I understand the ability to increase throughput
> by transfering more bytes and I am definitely able to see
> better overall throughput when increasing the number
> of bytes per transaction. However, I needs to still have
> good transaction-level timing because I cannot always
> queue the transactions up. Recall that each transaction
> is a WRITE followed by a READ. The results of the
> READ determine the outgoing bytes for the following
> transaction's WRITE.
Relying on sub-10ms response times in userspace is broken by design.
I have written a driver with similar timing requirements, and I have
done it in the kernel. This is the right way to go. Nothing else.
regards
Wolfgang
--
Das Leben kann nur r?ckw?rts verstanden,
mu? aber vorw?rts gelebt werden.
-------------------------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
_______________________________________________
[email protected]
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel
On Fri, 13 Oct 2006, Open Source wrote:
> Hi Wolfgang (and all),
>
> Thanks for the input. However, I am not understanding
> exactly why kernel mode is treated any differently than
> user mode for this sort of thing. I am looking at the code
> in ehci-q.c and ehci-hcd.c.
>
> It seems like the unlinking of completed URBs
> happens asynchronously on a timer. This is a
> surprise to me since I thought this was happening
> on an IRQ from the host controller. But if what I'm
> surmising is correct it would explain everything
> I am seeing. I'm not able to ascertain how
> user mode drivers are treated differently than
> kernel mode drivers in this regard. From what I
> can tell, all drivers would be broken equally!
> Can anyone who has more experience
> with this code confirm this for me?
I don't think so. You must be mis-reading the code. The only timers used
in ehci-hcd are a couple of watchdogs; they shouldn't affect the normal
URB completions which occur within ehci_work(), called by ehci_irq().
What Wolfgang meant was that user processes are subject to unpredictable
delays from all kinds of sources.
Alan Stern
Hi Alan (and all),
Thanks for your message.
There is an ioctl that is waiting for the URB to be reaped.
I am almost certain it is this syscall that is taking 4 ms (as
opposed to 1 ms with CONFIG_HZ=1000).
Let me ask a very specific question related to my
previous email about devio.c.
Say I have a process that has done the following:
--
add_wait_queue(&ps->wait, &wait);
for (;;) {
__set_current_state(TASK_INTERRUPTIBLE);
if ((as = async_getcompleted(ps)))
break;
if (signal_pending(current))
break;
usb_unlock_device(dev);
schedule();
usb_lock_device(dev);
}
remove_wait_queue(&ps->wait, &wait);
--
Then later, a USB interrupt triggers, and the following is called
after a bunch of code:
--
wake_up(&ps->wait)
--
Will the first piece of code wake up immediately or only after the
next HZ timeslice? If it is the latter, which is what I am starting to
fear now, that would explain everything. Then, the question is
whether or not there is anyway to improve this situation. If it does
wake up immediatley, then the delay is elsewhere and probably
has nothing to do with kernel mode versus user mode issues.
Thanks again.
----- Original Message ----
From: Alan Cox <[email protected]>
To: Open Source <[email protected]>
Cc: Alan Stern <[email protected]>; [email protected]; WolfgangM?es <[email protected]>; [email protected]
Sent: Friday, October 13, 2006 4:41:44 PM
Subject: Re: [linux-usb-devel] USB performance bug since kernel 2.6.13 (CRITICAL???)
Ar Gwe, 2006-10-13 am 16:02 -0700, ysgrifennodd Open Source:
> clear understanding of what is causing it. As it stands it doesn't
> seem like even the experts know exactly where this
> delay is being caused.
strace should tell you precisely how long each syscall takes if you ask
it to trace things nicely. If you have code trying to wait for a tiny
time then HZ will bump the wait to be longer (kernel or user) but for
other cases all should be fine either way.
The other issues like priority and paging caused delays can generally be
dealt with by having the relevant service code running mlockall and real
time priority.
On Friday 13 October 2006 12:31 pm, Open Source wrote:
>
> It seems like the unlinking of completed URBs
> happens asynchronously on a timer. This is a
> surprise to me since I thought this was happening
> on an IRQ from the host controller. But if what I'm
> surmising is correct it would explain everything
> I am seeing.
You didn't say what EHCI controller you're using.
The original reason for that timer was specifically
to handle bugs in VIA silicon: that won't always issue
all of the IRQs it's supposed to. Specifically the IAA
IRQ would never arrive, leading to unlink operations
wedging.
Later on that timer got overloaded (and hence it got
to be a bit of a mess) with two other tasks:
(a) deducing that an endpoint had been idle long enough
to take off the ring of active control/bulk queues,
which is what ensures that an idle controller isn't
wasting memory bandwidth by doing DMA all the time.
(It turns out to be a Bad Idea to take idle endpoints
off that ring the instant they become idle, since they
will as a rule be reused very quickly ... so immediate
removal slows things down a lot. Plus it chases an
annoying number of races between driver and controller.)
(b) being a general I/O watchdog, since there were some
other (rare) cases where IRQs would appear to get lost
In short, except for (a) the only reason I know of
that changing HZ should affect system performance is
IRQ lossage ... of which the primary reason has always
been strange behavior from VIA controllers.
- Dave
Ar Gwe, 2006-10-13 am 16:30 -0700, ysgrifennodd Open Source:
> There is an ioctl that is waiting for the URB to be reaped.
> I am almost certain it is this syscall that is taking 4 ms (as
> opposed to 1 ms with CONFIG_HZ=1000).
What does strace say about it ? This is measurable not speculation.
> Will the first piece of code wake up immediately or only after the
> next HZ timeslice? If it is the latter, which is what I am starting to
It depends if there are other things running and on their priority
relative to you.
On Sat, 14 Oct 2006, Alan Cox wrote:
> Ar Gwe, 2006-10-13 am 16:30 -0700, ysgrifennodd Open Source:
> > There is an ioctl that is waiting for the URB to be reaped.
> > I am almost certain it is this syscall that is taking 4 ms (as
> > opposed to 1 ms with CONFIG_HZ=1000).
>
> What does strace say about it ? This is measurable not speculation.
I completely agree with the other Alan. You don't have to guess about
these things. Use strace to see what your process is doing and at the
same time use usbmon to see what the USB stack is doing. Run the
experiment at both 1000 Hz and 250 Hz and compare the results.
Alan Stern
Hi all,
Yes, I am on board with both Alans. I did not know
about this very nifty option in strace. I will look into
it in detail on Mon/Tue and report back.
Clearly something is weird, because there are no
other load-intensive processes running on the system.
I would think that the scheduler would wake up any
blocked processes rather than sitting idle for 4 ms.
In any case, I am open to the fact it may not be
USB per se. We'll see what strace says.
As always, thanks for the suggestions. Stay tuned....
----- Original Message ----
From: Alan Stern <[email protected]>
To: Open Source <[email protected]>
Cc: Alan Cox <[email protected]>; USB development list <[email protected]>; Kernel development list <[email protected]>
Sent: Friday, October 13, 2006 7:11:39 PM
Subject: Re: [linux-usb-devel] USB performance bug since kernel 2.6.13 (CRITICAL???)
On Sat, 14 Oct 2006, Alan Cox wrote:
> Ar Gwe, 2006-10-13 am 16:30 -0700, ysgrifennodd Open Source:
> > There is an ioctl that is waiting for the URB to be reaped.
> > I am almost certain it is this syscall that is taking 4 ms (as
> > opposed to 1 ms with CONFIG_HZ=1000).
>
> What does strace say about it ? This is measurable not speculation.
I completely agree with the other Alan. You don't have to guess about
these things. Use strace to see what your process is doing and at the
same time use usbmon to see what the USB stack is doing. Run the
experiment at both 1000 Hz and 250 Hz and compare the results.
Alan Stern