Hi,
I wonder if:
drivers/usb/core/devio.c:86
#define MAX_USBFS_BUFFER_SIZE 16384
is an arbitrary or outdated limit, or if there really is some code path that could
not handle bigger URBs.
For performance reasons I would like to use bigger transfers for an image
acquisition device.
Yours,
--
René Rebe - Rubensstr. 64 - 12157 Berlin (Europe / Germany)
http://www.exactcode.de | http://www.t2-project.org
+49 (0)30 255 897 45
On Wednesday, 1 March 2006 21:16, René Rebe wrote:
> Hi,
>
> I wonder if:
>
> drivers/usb/core/devio.c:86
> #define MAX_USBFS_BUFFER_SIZE 16384
>
> is an arbitrary or outdated limit, or if there really is some code path that could
> not handle bigger URBs.
We are being nice to the VM. 16384 is optimistic anyway. You cannot expect
to repeatedly and reliably allocate larger buffers.
Regards
Oliver
On Wed, Mar 01, 2006 at 09:16:25PM +0100, René Rebe wrote:
> Hi,
>
> I wonder if:
>
> drivers/usb/core/devio.c:86
> #define MAX_USBFS_BUFFER_SIZE 16384
>
> is an arbitrary or outdated limit, or if there really is some code path that could
> not handle bigger URBs.
>
> For performance reasons I would like to use bigger transfers for an image
> acquisition device.
Why not just send down 2 URBs of that size then? That would keep the
pipe quite full.
thanks,
greg k-h
Hi,
On Wednesday 01 March 2006 22:32, Greg KH wrote:
> On Wed, Mar 01, 2006 at 09:16:25PM +0100, René Rebe wrote:
> > Hi,
> >
> > I wonder if:
> >
> > drivers/usb/core/devio.c:86
> > #define MAX_USBFS_BUFFER_SIZE 16384
> >
> > is an arbitrary or outdated limit, or if there really is some code path that could
> > not handle bigger URBs.
> >
> > For performance reasons I would like to use bigger transfers for an image
> > acquisition device.
>
> > Why not just send down 2 URBs of that size then? That would keep the
> > pipe quite full.
Because that requires even more modifications to libusb and SANE (sanei_usb) ...
So, queuing a lot of URBs is the recommended way to sustain the bus? Allowing
much bigger buffers is not realistic?
Yours,
--
René Rebe - Rubensstr. 64 - 12157 Berlin (Europe / Germany)
http://www.exactcode.de | http://www.t2-project.org
+49 (0)30 255 897 45
On Wed, Mar 01, 2006 at 10:42:35PM +0100, René Rebe wrote:
> Hi,
>
> On Wednesday 01 March 2006 22:32, Greg KH wrote:
> > On Wed, Mar 01, 2006 at 09:16:25PM +0100, René Rebe wrote:
> > > Hi,
> > >
> > > I wonder if:
> > >
> > > drivers/usb/core/devio.c:86
> > > #define MAX_USBFS_BUFFER_SIZE 16384
> > >
> > > is an arbitrary or outdated limit, or if there really is some code path that could
> > > not handle bigger URBs.
> > >
> > > For performance reasons I would like to use bigger transfers for an image
> > > acquisition device.
> >
> > Why not just send down 2 URBs of that size then? That would keep the
> > pipe quite full.
>
> Because that requires even more modifications to libusb and SANE (sanei_usb) ...
No, do it in your application I mean.
> So, queuing a lot of URBs is the recommended way to sustain the bus? Allowing
> much bigger buffers is not realistic?
16KB is "way big" in the USB scheme of things already. Look at the size
of your endpoint. It's probably _very_ small compared to that. So no,
larger buffer sizes are not realistic at all.
thanks,
greg k-h
On Wed, Mar 01, 2006 at 01:54:23PM -0800, Greg KH wrote:
> On Wed, Mar 01, 2006 at 10:42:35PM +0100, René Rebe wrote:
> > So, queuing a lot of URBs is the recommended way to sustain the bus? Allowing
> > much bigger buffers is not realistic?
>
> 16KB is "way big" in the USB scheme of things already. Look at the size
> of your endpoint. It's probably _very_ small compared to that. So no,
> larger buffer sizes are not realistic at all.
As a data point, I have traces of a scanner session including a
download of a 26MB binary image using 524288-byte logical blocks
physically transferred with 61440-byte bulk-in transfers. Seems stable
enough. IIRC the scanner-side controller chip has some advanced
buffering just to handle that kind of bandwidth.
ISTR a preliminary linux userland driver using libusb having problems
keeping up with the scanner too. May very well have been an issue
with the driver itself though, so I wouldn't read too much into that.
OG.
On Wed, Mar 01, 2006 at 11:34:30PM +0100, Olivier Galibert wrote:
> On Wed, Mar 01, 2006 at 01:54:23PM -0800, Greg KH wrote:
> > On Wed, Mar 01, 2006 at 10:42:35PM +0100, René Rebe wrote:
> > > So, queuing a lot of URBs is the recommended way to sustain the bus? Allowing
> > > much bigger buffers is not realistic?
> >
> > 16KB is "way big" in the USB scheme of things already. Look at the size
> > of your endpoint. It's probably _very_ small compared to that. So no,
> > larger buffer sizes are not realistic at all.
>
> As a data point, I have traces of a scanner session including a
> download of a 26MB binary image using 524288-byte logical blocks
> physically transferred with 61440-byte bulk-in transfers. Seems stable
> enough. IIRC the scanner-side controller chip has some advanced
> buffering just to handle that kind of bandwidth.
That's impressive. What are the endpoint sizes on the device that did
this?
thanks,
greg k-h
On Wed, Mar 01, 2006 at 02:41:23PM -0800, Greg KH wrote:
> On Wed, Mar 01, 2006 at 11:34:30PM +0100, Olivier Galibert wrote:
> > As a data point, I have traces of a scanner session including a
> > download of a 26MB binary image using 524288-byte logical blocks
> > physically transferred with 61440-byte bulk-in transfers. Seems stable
> > enough. IIRC the scanner-side controller chip has some advanced
> > buffering just to handle that kind of bandwidth.
>
> That's impressive. What are the endpoint sizes on the device that did
> this?
Hmmm, the chip is a Genesys GL841, on a CanoScan LiDE 35. And it
advertises a 64-byte wMaxPacketSize on both the bulk-in and bulk-out
endpoints. Go figure.
Want the log and/or the lsusb -v?
OG.
On Thu, Mar 02, 2006 at 12:25:35AM +0100, Olivier Galibert wrote:
> On Wed, Mar 01, 2006 at 02:41:23PM -0800, Greg KH wrote:
> > On Wed, Mar 01, 2006 at 11:34:30PM +0100, Olivier Galibert wrote:
> > > As a data point, I have traces of a scanner session including a
> > > download of a 26MB binary image using 524288-byte logical blocks
> > > physically transferred with 61440-byte bulk-in transfers. Seems stable
> > > enough. IIRC the scanner-side controller chip has some advanced
> > > buffering just to handle that kind of bandwidth.
> >
> > That's impressive. What are the endpoint sizes on the device that did
> > this?
>
> Hmmm, the chip is a Genesys GL841, on a CanoScan LiDE 35. And it
> advertises a 64-byte wMaxPacketSize on both the bulk-in and bulk-out
> endpoints. Go figure.
>
> Want the log and/or the lsusb -v?
Nah, I was just curious.
Now notice that the max the device can take in a single USB packet is 64
bytes. So if you send one URB of 16KB, you should have plenty of CPU
time to queue up another one of the same size before that one flushes
out to the device, even if it is a high-speed device.
That's the reason upping the size of this buffer will not really help
anyone out, except lazy userspace programmers :)
thanks,
greg k-h
Hi,
On Wednesday 01 March 2006 22:54, Greg KH wrote:
> > > Why not just send down 2 URBs of that size then? That would keep the
> > > pipe quite full.
> >
> > Because that requires even more modifications to libusb and SANE (sanei_usb) ...
>
> No, do it in your application I mean.
? The driver is a SANE backend and forced to use sanei_usb over libusb. Thus
I would have to modify them all to allow asynchronous URB queuing - or have I missed
something?
Yours,
--
René Rebe - Rubensstr. 64 - 12157 Berlin (Europe / Germany)
http://www.exactcode.de | http://www.t2-project.org
+49 (0)30 255 897 45
Hi,
On Wednesday 01 March 2006 22:54, Greg KH wrote:
> > > Why not just send down 2 URBs of that size then? That would keep the
> > > pipe quite full.
> >
> > Because that requires even more modifications to libusb and SANE (sanei_usb) ...
>
> No, do it in your application I mean.
Ok, tweaking libusb to split large reads into N queued URBs (9 URBs
in my use case), I see a nearly 100% improvement here (2 times faster).
How many URBs may I queue? Nearly infinite (in my case that would be at most 64),
or is there some tiny static list somewhere in the affected code path?
Yours,
--
René Rebe - Rubensstr. 64 - 12157 Berlin (Europe / Germany)
http://www.exactcode.de | http://www.t2-project.org
+49 (0)30 255 897 45
On Thu, Mar 02, 2006 at 05:03:26PM +0100, René Rebe wrote:
> Hi,
>
> On Wednesday 01 March 2006 22:54, Greg KH wrote:
>
> > > > Why not just send down 2 URBs of that size then? That would keep the
> > > > pipe quite full.
> > >
> > > Because that requires even more modifications to libusb and SANE (sanei_usb) ...
> >
> > No, do it in your application I mean.
>
> Ok, tweaking libusb to split large reads into N queued URBs (9 URBs
> in my use case), I see a nearly 100% improvement here (2 times faster).
>
> How many URBs may I queue? Nearly infinite (in my case that would be at most 64),
> or is there some tiny static list somewhere in the affected code path?
There is no static list that I know of, as it is all just pointers.
Just don't DoS the kernel by asking it for an infinite amount of memory :)
More details can be found on the linux-usb-devel list if you ask there.
thanks,
greg k-h
On Thu, Mar 02, 2006 at 10:04:21AM +0100, René Rebe wrote:
> Hi,
>
> On Wednesday 01 March 2006 22:54, Greg KH wrote:
>
> > > > Why not just send down 2 URBs of that size then? That would keep the
> > > > pipe quite full.
> > >
> > > Because that requires even more modifications to libusb and SANE (sanei_usb) ...
> >
> > No, do it in your application I mean.
>
> ? The driver is a SANE backend and forced to use sanei_usb over libusb. Thus
> I would have to modify them all to allow asynchronous URB queuing - or have I missed
> something?
I really don't know the SANE backend design, sorry.
greg k-h
On Wed, 1 Mar 2006 22:42:35 +0100, René Rebe <[email protected]> wrote:
> > > drivers/usb/core/devio.c:86
> > > #define MAX_USBFS_BUFFER_SIZE 16384
> So, queuing a lot of URBs is the recommended way to sustain the bus? Allowing
> much bigger buffers is not realistic?
Have you ever considered how many TDs have to be allocated to transfer
a data buffer this big? No, seriously. If your application cannot deliver
the transfer speeds with 16KB URBs, we ought to consider whether the combination
of our USB stack, usbfs, libusb and the application ought to get serious
performance-enhancing surgery. The problem is obviously in the software
overhead.
-- Pete
Hi,
On Thursday 02 March 2006 22:05, Pete Zaitcev wrote:
> On Wed, 1 Mar 2006 22:42:35 +0100, René Rebe <[email protected]> wrote:
>
> > > > drivers/usb/core/devio.c:86
> > > > #define MAX_USBFS_BUFFER_SIZE 16384
>
> > So, queuing a lot of URBs is the recommended way to sustain the bus? Allowing
> > much bigger buffers is not realistic?
>
> Have you ever considered how many TDs have to be allocated to transfer
> a data buffer this big? No, seriously. If your application cannot deliver
> the transfer speeds with 16KB URBs, we ought to consider whether the combination
> of our USB stack, usbfs, libusb and the application ought to get serious
> performance-enhancing surgery. The problem is obviously in the software
> overhead.
As I already wrote, queuing multiple URBs in parallel solved the problem for me.
I'll post the libusb patch later. So the problem was simply that the periods with no
pending URBs wasted a lot of time slots in which no URB was exchanged with the scanner.
Queueing N = size / 16k URBs in parallel gets the maximum possible throughput with
the scanner - a 2x speedup. The driver is now even slightly faster than the
vendor's Windows one, by about 20%.
For even further improvements an async interface would be needed in libusb
(and sanei_usb) so I can queue the prologue and epilogue URBs of the communication
protocol into the kernel and thus eliminate some more wasted time
slots. I estimate that the driver would then be over 30% faster compared with
the Windows one.
Yours,
--
René Rebe - Rubensstr. 64 - 12157 Berlin (Europe / Germany)
http://www.exactcode.de | http://www.t2-project.org
+49 (0)30 255 897 45
> So, queuing a lot of URBs is the recommended way to sustain the bus? Allowing
> much bigger buffers is not realistic?
usbfs could copy the user buffer to a bunch of non-contiguous pages, and
then fire those off in an urb using the scatter-gather stuff. [Rather than,
as now, allocating a bunch of contiguous pages using kmalloc]. That would
probably make it possible to use much much bigger user-space buffers. Plus
the code looks rather easy to write.
Ciao,
Duncan.
> Have you ever considered how many TDs have to be allocated to transfer
> a data buffer this big? No, seriously. If your application cannot deliver
> the transfer speeds with 16KB URBs, we ought to consider whether the combination
> of our USB stack, usbfs, libusb and the application ought to get serious
> performance-enhancing surgery. The problem is obviously in the software
> overhead.
If you queue a large number of 16KB urbs, rather than one jumbo urb,
does that make any difference to the number of TDs allocated? I thought
TDs were allocated for all queued urbs at the moment they are queued...
Ciao,
Duncan.
On Friday, 3 March 2006 09:12, Duncan Sands wrote:
> > Have you ever considered how many TDs have to be allocated to transfer
> > a data buffer this big? No, seriously. If your application cannot deliver
> > the transfer speeds with 16KB URBs, we ought to consider whether the combination
> > of our USB stack, usbfs, libusb and the application ought to get serious
> > performance-enhancing surgery. The problem is obviously in the software
> > overhead.
>
> If you queue a large number of 16KB urbs, rather than one jumbo urb,
> does that make any difference to the number of TDs allocated? I thought
> TDs were allocated for all queued urbs at the moment they are queued...
It changes the time at which the TDs are allocated. TDs allocated while an URB is
in flight don't hurt bandwidth. If your throughput is low because there
is too much delay between URBs, allocating many TDs at submission time makes matters worse.
Regards
Oliver
On Wednesday, 1 March 2006 22:59, Duncan Sands wrote:
> > So, queuing a lot of URBs is the recommended way to sustain the bus? Allowing
> > much bigger buffers is not realistic?
>
> usbfs could copy the user buffer to a bunch of non-contiguous pages, and
> then fire those off in an urb using the scatter-gather stuff. [Rather than,
> as now, allocating a bunch of contiguous pages using kmalloc]. That would
> probably make it possible to use much much bigger user-space buffers. Plus
> the code looks rather easy to write.
It seems to me that that would change the API. The scatter/gather stuff
can fail partially, can't it?
Regards
Oliver
On Fri, Mar 03, 2006 at 08:27:45AM +0100, René Rebe wrote:
> Queueing N = size / 16k URBs in parallel gets the maximum possible throughput with
> the scanner - a 2x speedup. The driver is now even slightly faster than the
> vendor's Windows one, by about 20%.
That's great. It's also another data point in the many success stories
saying that Linux's USB stack is faster than Windows', even when driven
by userspace programs :)
> For even further improvements an async interface would be needed in libusb
> (and sanei_usb) so I can queue the prologue and epilogue URBs of the communication
> protocol into the kernel and thus eliminate some more wasted time
> slots. I estimate that the driver would then be over 30% faster compared with
> the Windows one.
I'm currently working on a "usbfs2" that will be async-io driven. That
should allow you to get that added speed you need.
thanks,
greg k-h