2008-06-20 20:45:42

by Alan Stern

Subject: Scatter-gather list constraints

Is there any way to express the constraint that for a particular
request queue, all members of a scatter-gather list (except the last)
must be a multiple of a particular length?

This question arises in connection with wireless USB mass-storage
devices. The controller driver requires that every DMA segment in a
transfer, other than the last one, have a length that is a multiple of
1024 bytes.
But we're sometimes getting s-g lists where an element contains an odd
number of 512-byte sectors, and of course it doesn't work.
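For readers following along, the constraint can be written as a small predicate. This is only an illustrative Python model, not kernel code; the name sg_list_ok and its arguments are made up for this sketch.

```python
def sg_list_ok(lengths, multiple):
    """Return True if every element except the last has a length that is
    a multiple of `multiple` (the 1024-byte requirement described above)."""
    return all(length % multiple == 0 for length in lengths[:-1])


# 3 sectors followed by 4 sectors: the first element is an odd sector count.
print(sg_list_ok([1536, 2048], 1024))        # False
# Only the last element is allowed to be short.
print(sg_list_ok([2048, 1024, 512], 1024))   # True
```

A list with a single element always passes, because that element is also the last one.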

Thanks,

Alan Stern


2008-06-20 20:50:32

by David Miller

Subject: Re: Scatter-gather list constraints

From: Alan Stern
Date: Fri, 20 Jun 2008 16:30:25 -0400 (EDT)

> This question arises in connection with wireless USB mass-storage
> devices. The controller driver requires that all DMA segments
> in a transfer, other than the last one, have a multiple of 1024 bytes.
> But we're sometimes getting s-g lists where an element contains an odd
> number of 512-byte sectors, and of course it doesn't work.

The generic device layer DMA bits do have ways to indicate
DMA restrictions such as maximum segment size, but not something
like this.

This is a pretty strange requirement, and would probably be
quite difficult to support across the board just to handle this
strange device :)

2008-06-21 14:01:45

by Andi Kleen

Subject: Re: Scatter-gather list constraints

Alan Stern writes:

> This question arises in connection with wireless USB mass-storage
> devices. The controller driver requires that all DMA segments
> in a transfer, other than the last one, have a multiple of 1024 bytes.
> But we're sometimes getting s-g lists where an element contains an odd
> number of 512-byte sectors, and of course it doesn't work.

But you can handle a single 512 byte request? Splitting the request
in this case should work. Or maybe copying is cheaper than splitting?

I don't think the block layer knows about such kinds of restrictions.

-Andi

2008-06-21 14:54:51

by Alan Stern

Subject: Re: Scatter-gather list constraints

On Sat, 21 Jun 2008, Andi Kleen wrote:

> Alan Stern <[email protected]> writes:
>
> > This question arises in connection with wireless USB mass-storage
> > devices. The controller driver requires that all DMA segments
> > in a transfer, other than the last one, have a multiple of 1024 bytes.
> > But we're sometimes getting s-g lists where an element contains an odd
> > number of 512-byte sectors, and of course it doesn't work.
>
> But you can handle a single 512 byte request?

Yes. And we can handle a list containing a bunch of 1024-byte segments
terminated by a single 512-byte segment.

> Splitting the request
> in this case should work. Or maybe copying is cheaper than splitting?

Splitting would work. But it has to be done fairly high up in the
stack, ideally in the block layer.

> I don't think the block layer knows about such kinds of restrictions.

Evidently not. Is it feasible to add such knowledge to the block
layer?

Alan Stern

2008-06-21 15:21:23

by Andi Kleen

Subject: Re: Scatter-gather list constraints

Alan Stern wrote:
> On Sat, 21 Jun 2008, Andi Kleen wrote:
>
>> Alan Stern <[email protected]> writes:
>>
>>> This question arises in connection with wireless USB mass-storage
>>> devices. The controller driver requires that all DMA segments
>>> in a transfer, other than the last one, have a multiple of 1024 bytes.
>>> But we're sometimes getting s-g lists where an element contains an odd
>>> number of 512-byte sectors, and of course it doesn't work.
>> But you can handle a single 512 byte request?
>
> Yes. And we can handle a list containing a bunch of 1024-byte segments
> terminated by a single 512-byte segment.
>
>> Splitting the request
>> in this case should work. Or maybe copying is cheaper than splitting?
>
> Splitting would work. But it has to be done fairly high up in the
> stack, ideally in the block layer.

That's true. Or you would need to reserve requests for this, which is likely
a bad idea. Perhaps you're better off just copying in this (hopefully
rare) case. Fortunately, thanks to recent work from Peter Z., allocating
memory in the write-out path is much safer than it used to be.

Also for my edification: is that a general restriction of the Wireless USB
spec or just a specific hardware quirk in some design?

>
>> I don't think the block layer knows about such kinds of restrictions.
>
> Evidently not. Is it feasible to add such knowledge to the block
> layer?

You would need to ask Jens, but I would assume he would ask:
- Is it common?
- Is it performance critical?
and presumably the answer to both would be "no" ?

-Andi

2008-06-21 21:50:27

by Alan Stern

Subject: Re: Scatter-gather list constraints

On Sat, 21 Jun 2008, Andi Kleen wrote:

> > Splitting would work. But it has to be done fairly high up in the
> > stack, ideally in the block layer.
>
> That's true. Or you would need to reserve requests for this, which is likely
> a bad idea. Perhaps you're better off just copying in this (hopefully
> rare) case. Fortunately lately thanks to work from Peter Z. allocating memory
> in the write out path is much safer than it used to be.
>
> Also for my edification: is that a general restriction of the wireless usb
> spec or just a specific hardware quirk in some design?

It's a combination of factors. The USB spec requires that during a
bulk data transfer, all packets except the last must have the maximum
packet size defined for the device. In fact, a packet whose size is
smaller than the maximum always marks the end of a transfer. (More
precisely, the transfer ends when the expected number of bytes has been
received or when a short packet is received, whichever occurs first.)

The USB host controller drivers, in concert with the USB scatter-gather
library, package each S-G element as a separate set of packets. This
is easy to do, of course, since each element represents data that is
contiguous in DMA space. Until recently the maximum packet sizes have
been no larger than 512 bytes; hence filesystem transfers would never
cause problems because they always involve complete sectors and so the
S-G elements are always a multiple of 512 in length. Thus the
transfers would always consist of nothing but maximum-sized packets.

But now there is a new spec for Wireless USB devices, and it permits
the maximum packet size to be 1024. So when an S-G element is 3584
bytes (this actually was observed in testing), it gets packaged as
three 1024-byte packets followed by a 512-byte packet. The device
naturally thinks that the 512-byte packet signals the end of the
transfer, and therefore it doesn't accept the remaining S-G elements in
the list.
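The failure can be reproduced with a toy model of the packetizing just described. This is a hedged Python sketch, not the USB stack's code: the function names are invented, and the second S-G element (4096 bytes) is an assumed value added only to show what happens to the rest of the list.

```python
def packetize(sg_lengths, max_packet):
    """Cut each S-G element into packets on its own, so that no packet
    ever spans two elements (the behaviour described above)."""
    packets = []
    for length in sg_lengths:
        full, rest = divmod(length, max_packet)
        packets.extend([max_packet] * full)
        if rest:
            packets.append(rest)
    return packets


def bytes_accepted(packets, max_packet):
    """Device side: a packet shorter than max_packet ends the transfer,
    so nothing after it is accepted."""
    total = 0
    for size in packets:
        total += size
        if size < max_packet:
            break
    return total


sg = [3584, 4096]          # 3584 was observed in testing; 4096 is assumed
pkts = packetize(sg, 1024)
print(pkts)                # [1024, 1024, 1024, 512, 1024, 1024, 1024, 1024]
print(bytes_accepted(pkts, 1024))                 # 3584 of 7680 bytes
print(bytes_accepted(packetize(sg, 512), 512))    # 7680: no short packet
```

With a 512-byte maximum packet size every sector-sized element divides evenly, which is why the problem never showed up before.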

> >> I don't think the block layer knows about such kinds of restrictions.
> >
> > Evidently not. Is it feasible to add such knowledge to the block
> > layer?
>
> You would need to ask Jens, but I would assume he would ask:
> - Is it common?

This is the only situation I know about. But of course it will become
more and more common as wireless USB devices spread into use.

> - Is it performance critical?

For people using wireless USB drives, yes.

> and presumably the answer to both would be "no" ?

In theory this could be fixed at the host controller level, by making
packets span S-G elements. But this would be a very large and
difficult change. Altering the block layer should be a lot easier (he
said, secure in his blind ignorance). For example, the DMA alignment
restriction could be made to apply to the _end_ of each S-G element as
well as the _beginning_, except for the last element in the list.

Alan Stern

2008-06-21 23:01:17

by Andi Kleen

Subject: Re: Scatter-gather list constraints


>> - Is it performance critical?
>
> For people using wireless USB drives, yes.

But only if there is a lot of 512 byte block IO? The only case I can think
of right now would be XFS log IO and perhaps some O_DIRECT/raw device
accesses.

Or how did you determine it is critical? Was there some important workload where
most of the requests had this form?

If it's only a relative oddball, just copying is fine imho.

-Andi

2008-06-22 14:35:51

by Alan Stern

Subject: Re: Scatter-gather list constraints

On Sun, 22 Jun 2008, Andi Kleen wrote:

>
> >> - Is it performance critical?
> >
> > For people using wireless USB drives, yes.
>
> But only if there is a lot of 512 byte block IO? The only case I can think
> of right now would be XFS log IO and perhaps some O_DIRECT/raw device
> accesses.

Sorry, I misunderstood your question. There probably will not be a lot
of 512-byte block I/O -- not in the workloads I'm acquainted with. But
there will be some.

It isn't performance-critical, in the sense that slowing down the odd
512-byte block transfers won't hurt performance much. But it is
critical in the sense that the transfers must work properly when they
do occur.

> If it's only an relative oddball just copying is fine imho.

You mean, have the USB stack allocate bounce buffers and copy the data
between the S-G buffers (which may be in high memory) and the bounce
buffers? We're talking about a potentially fairly large amount of
data, say up to 100 KB. Is that really easier than splitting up an I/O
request?

Alan Stern

2008-06-23 15:12:23

by Alan Stern

Subject: Re: Scatter-gather list constraints

On Mon, 23 Jun 2008, David Vrabel wrote:

> Note that this 1024 byte multiple is for one particular WUSB mass
> storage device. The WUSB standard permits max packet sizes of up 3584
> (in multiples of 512), but I suspect WUSB mass storage devices will only
> use 512, 1024, or 2048.
>
> For a solution, we may be able to do something if the HWA host
> controller is passed a single URB with an s-g list (rather than one URB
> per s-g list entry) and was careful about how it segmented the URB into
> transfers to the rpipe.

That would be ideal. However there is no way to pass an S-G list along
with an URB; there's no field for it in the data structure. And none
of the existing host controller drivers support such a thing.

I suppose we could add a field to struct urb and add a flag indicating
whether the controller driver supports S-G lists.

Alan Stern

2008-06-23 15:54:31

by David Vrabel

Subject: Re: Scatter-gather list constraints

Alan Stern wrote:
> Is there any way to express the constraint that for a particular
> request queue, all members of a scatter-gather list (except the last)
> must be a multiple of a particular length?
>
> This question arises in connection with wireless USB mass-storage
> devices. The controller driver requires that all DMA segments
> in a transfer, other than the last one, have a multiple of 1024 bytes.
> But we're sometimes getting s-g lists where an element contains an odd
> number of 512-byte sectors, and of course it doesn't work.

Note that this 1024 byte multiple is for one particular WUSB mass
storage device. The WUSB standard permits max packet sizes of up to 3584
(in multiples of 512), but I suspect WUSB mass storage devices will only
use 512, 1024, or 2048.

For a solution, we may be able to do something if the HWA host
controller is passed a single URB with an s-g list (rather than one URB
per s-g list entry) and is careful about how it segments the URB into
transfers to the rpipe.

Inaky, you know more about WAs than me. Could you comment on this?

A similar solution for WHCI host controllers would require a revision of
the WHCI spec to support full scatter-gather DMA.

David
--
David Vrabel, Senior Software Engineer, Drivers
CSR, Churchill House, Cambridge Business Park, Tel: +44 (0)1223 692562
Cowley Road, Cambridge, CB4 0WZ http://www.csr.com/

2008-06-23 19:07:31

by David Vrabel

Subject: Re: Scatter-gather list constraints

Alan Stern wrote:
> On Mon, 23 Jun 2008, David Vrabel wrote:
>
>> Note that this 1024 byte multiple is for one particular WUSB mass
>> storage device. The WUSB standard permits max packet sizes of up 3584
>> (in multiples of 512), but I suspect WUSB mass storage devices will only
>> use 512, 1024, or 2048.
>>
>> For a solution, we may be able to do something if the HWA host
>> controller is passed a single URB with an s-g list (rather than one URB
>> per s-g list entry) and was careful about how it segmented the URB into
>> transfers to the rpipe.
>
> That would be ideal. However there is no way to pass an S-G list along
> with an URB; there's no field for it in the data structure. And none
> of the existing host controller drivers support such a thing.
>
> I suppose we could add a field to struct urb and add a flag indicating
> whether the controller driver supports S-G lists.

This is what I was thinking.

Can the number of entries in an sg list be limited, e.g., if the
hardware only supported, say, 64 entries?

David
--
David Vrabel, Senior Software Engineer, Drivers
CSR, Churchill House, Cambridge Business Park, Tel: +44 (0)1223 692562
Cowley Road, Cambridge, CB4 0WZ http://www.csr.com/

2008-06-23 19:45:21

by Alan Stern

Subject: Re: Scatter-gather list constraints

On Mon, 23 Jun 2008, David Vrabel wrote:

> Alan Stern wrote:
> > On Mon, 23 Jun 2008, David Vrabel wrote:
> >
> >> Note that this 1024 byte multiple is for one particular WUSB mass
> >> storage device. The WUSB standard permits max packet sizes of up 3584
> >> (in multiples of 512), but I suspect WUSB mass storage devices will only
> >> use 512, 1024, or 2048.
> >>
> >> For a solution, we may be able to do something if the HWA host
> >> controller is passed a single URB with an s-g list (rather than one URB
> >> per s-g list entry) and was careful about how it segmented the URB into
> >> transfers to the rpipe.
> >
> > That would be ideal. However there is no way to pass an S-G list along
> > with an URB; there's no field for it in the data structure. And none
> > of the existing host controller drivers support such a thing.
> >
> > I suppose we could add a field to struct urb and add a flag indicating
> > whether the controller driver supports S-G lists.
>
> This is what I was thinking.
>
> Can the number of entries in a sg list be limited? e.g., if the
> hardware only had support for say, 64 entries?

Yes, there are two fields in struct request_queue for this:
max_phys_segments (the driver's limit) and max_hw_segments (the
hardware's limit).

Standard EHCI hardware requires that the memory locations of the data
for each packet be "virtually contiguous", i.e., discontiguities are
allowed only at 4-KB page boundaries. This severely limits the ability
to handle general S-G lists. For example, a 1024-byte packet can't be
broken up into two 512-byte pieces unless the first piece ends at a
page boundary and the second piece begins at a page boundary. Maybe
HWA host controllers are required to be more flexible, I don't know.

This may mean that your suggested approach won't work.
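The page-boundary rule can be stated as a short check. The sketch below is a simplified Python model written for illustration; the function name and the sample DMA addresses are hypothetical, and the 4096-byte page size is taken from the description above.

```python
PAGE_SIZE = 4096


def pieces_virtually_contiguous(pieces):
    """pieces: list of (dma_address, length) tuples holding one packet's data.

    A break between two consecutive pieces is allowed only if the earlier
    piece ends on a page boundary and the later one starts on one."""
    for (addr, length), (next_addr, _) in zip(pieces, pieces[1:]):
        end = addr + length
        if end == next_addr:
            continue                      # physically adjacent, no break
        if end % PAGE_SIZE != 0 or next_addr % PAGE_SIZE != 0:
            return False
    return True


# A 1024-byte packet cut into two 512-byte pieces in the middle of pages.
print(pieces_virtually_contiguous([(0x10200, 512), (0x2A600, 512)]))   # False
# The same cut, but the first piece ends a page and the second starts one.
print(pieces_virtually_contiguous([(0x10E00, 512), (0x2A000, 512)]))   # True
```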

Alan Stern

2008-06-23 21:54:31

by Stefan Richter

Subject: Re: Scatter-gather list constraints

Alan Stern wrote:
> On Mon, 23 Jun 2008, David Vrabel wrote:
>> Can the number of entries in a sg list be limited? e.g., if the
>> hardware only had support for say, 64 entries?
>
> Yes, there are two fields in struct request_queue for this:
> max_phys_segments (the driver's limit) and max_hw_segments (the
> hardware's limit).

David, also have a look at <linux/blkdev.h> from the comment "Access
functions for manipulating queue properties" downwards for accessors to
request queue tunables.
--
Stefan Richter
-=====-==--- -==- =-===
http://arcgraph.de/sr/

2008-06-24 10:42:39

by FUJITA Tomonori

Subject: Re: Scatter-gather list constraints

On Sat, 21 Jun 2008 17:50:13 -0400 (EDT)
Alan Stern wrote:

> > >> I don't think the block layer knows about such kinds of restrictions.
> > >
> > > Evidently not. Is it feasible to add such knowledge to the block
> > > layer?
> >
> > You would need to ask Jens, but I would assume he would ask:
> > - Is it common?
>
> This is the only situation I know about. But of course it will become
> more and more common as wireless USB devices spread into use.
>
> > - Is it performance critical?
>
> For people using wireless USB drives, yes.
>
> > and presumably the answer to both would be "no" ?
>
> In theory this could be fixed at the host controller level, by making
> packets span S-G elements. But this would be a very large and
> difficult change. Altering the block layer should be a lot easier (he
> said, secure in his blind ignorance). For example, the DMA alignment
> restriction could be made to apply to the _end_ of each S-G element as
> well as the _beginning_, except for the last element in the list.

I don't think that the block layer has the DMA alignment concept in the
FS I/O path. And I think that you need something like DMA padding rather
than DMA alignment, though again the block layer doesn't have the DMA
padding concept in the FS I/O path. And DMA padding applies only to
the last SG element.

I guess that it's pretty hard to implement such a strange restriction
in the block layer cleanly.

The iSER driver has a strange restriction too. I think that bouncing,
as iSER does, is a better option, though adding some generic
mechanism to reserve buffers in the block layer might be nice, I
guess.

2008-06-24 14:57:24

by Alan Stern

Subject: Re: Scatter-gather list constraints

On Tue, 24 Jun 2008, FUJITA Tomonori wrote:

> I don't think that the block layer has the DMA alignment concept in FS
> I/O path. And I think that you need kinda the DMA padding instead the
> DMA alignment though again The block layer doesn't have the DMA
> padding concept in FS I/O path. And the DMA padding applies to only
> the last SG element.
>
> I guess that it's pretty hard to implement such a strange restriction
> in the block layer cleanly.

I don't see why there should be any problem. It's simply a matter of
splitting a single request into multiple requests, at places where
the length restriction would be violated.

For example, suppose an I/O request starts out with two S-G elements
of 1536 bytes and 2048 bytes respectively, and the DMA requirement is
that all elements except the last must have length divisible by 1024.
Then the request could be broken up into three requests of 1024, 512,
and 2048 bytes.
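One way to picture such splitting is the greedy sketch below. It is a hypothetical Python model, not a block-layer patch: split_request is an invented name, and it works on whole S-G elements rather than on real requests or bios.

```python
def split_request(sg_lengths, multiple):
    """Greedily cut a list of S-G element lengths into sub-requests so that
    in each one every element except the last is a multiple of `multiple`."""
    requests, current = [], []
    for length in sg_lengths:
        current.append(length)
        if length % multiple != 0:
            # This element ends in a short packet, so it has to be the
            # last element of its request.
            requests.append(current)
            current = []
    if current:
        requests.append(current)
    return requests


print(split_request([1536, 2048], 1024))              # [[1536], [2048]]
print(split_request([2048, 3584, 1024, 512], 1024))   # [[2048, 3584], [1024, 512]]
```

For the example above this policy stops at two requests, because a lone 1536-byte element is already the last element of its list. The three-request split into 1024, 512, and 2048 bytes satisfies the same constraint.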

> The iSER driver has a strange restriction too. I think that as iSER
> does, bouncing is a better option, though adding some generic
> mechanism to reserve buffer in the block layer might be nice, I
> gueess.

Is it reasonable to have 120-KB bounce buffers?

Alan Stern

2008-06-25 00:18:54

by FUJITA Tomonori

Subject: Re: Scatter-gather list constraints

On Tue, 24 Jun 2008 10:57:13 -0400 (EDT)
Alan Stern wrote:

> On Tue, 24 Jun 2008, FUJITA Tomonori wrote:
>
> > I don't think that the block layer has the DMA alignment concept in FS
> > I/O path. And I think that you need kinda the DMA padding instead the
> > DMA alignment though again The block layer doesn't have the DMA
> > padding concept in FS I/O path. And the DMA padding applies to only
> > the last SG element.
> >
> > I guess that it's pretty hard to implement such a strange restriction
> > in the block layer cleanly.
>
> I don't see why there should be any problem. It's simply a matter of
> splitting a single request into multiple requests, at places where
> the length restriction would be violated.
>
> For example, suppose an I/O request starts out with two S-G elements
> of 1536 bytes and 2048 bytes respectively, and the DMA requirement is
> that all elements except the last must have length divisible by 1024.
> Then the request could be broken up into three requests of 1024, 512,
> and 2048 bytes.

I can't say that it's easy to implement a clean mechanism to break up
a request into multiple requests until I see a patch.

What I meant is that you think this is about extending something
in the block layer, but it's really about adding a new concept to the
block layer.


> > The iSER driver has a strange restriction too. I think that as iSER
> > does, bouncing is a better option, though adding some generic
> > mechanism to reserve buffer in the block layer might be nice, I
> > gueess.
>
> Is it reasonable to have 120-KB bounce buffers?

The block layer does. Why do you think that USB can't?

2008-06-25 04:09:31

by Perez-Gonzalez, Inaky

Subject: RE: Scatter-gather list constraints

>From: David Vrabel [mailto:[email protected]]
>
>> That would be ideal. However there is no way to pass an S-G list
>> along with an URB; there's no field for it in the data structure. And none
>> of the existing host controller drivers support such a thing.
>>
>> I suppose we could add a field to struct urb and add a flag
>> indicating whether the controller driver supports S-G lists.
>
>This is what I was thinking.

That would simplify a *LOT* of WUSB wire-adapter code which now is a
horrible kludge.

-- Inaky

2008-06-25 14:23:27

by Alan Stern

Subject: Re: Scatter-gather list constraints

On Wed, 25 Jun 2008, FUJITA Tomonori wrote:

> > For example, suppose an I/O request starts out with two S-G elements
> > of 1536 bytes and 2048 bytes respectively, and the DMA requirement is
> > that all elements except the last must have length divisible by 1024.
> > Then the request could be broken up into three requests of 1024, 512,
> > and 2048 bytes.
>
> I can't say that it's easy to implement a clean mechanism to break up
> a request into multiple requests until I see a patch.

And I can't write a patch without learning a lot more about how the
block core works.

> What I said is that you think that this is about extending something
> in the block layer but it's about adding a new concept to the block
> layer.

Is it? What does the block layer do when it receives an I/O request
that doesn't satisfy the other constraints (max_sectors or
dma_alignment_mask, for example)?

> > Is it reasonable to have 120-KB bounce buffers?
>
> The block layer does. Why do you think that USB can't?

Why do you think I think that USB can't? I didn't ask whether it was
_possible_; I asked whether it was _reasonable_.

Alan Stern

2008-06-25 14:25:05

by Alan Stern

Subject: RE: Scatter-gather list constraints

On Tue, 24 Jun 2008, Perez-Gonzalez, Inaky wrote:

> >From: David Vrabel [mailto:[email protected]]
> >
> >> That would be ideal. However there is no way to pass an S-G list
> along
> >> with an URB; there's no field for it in the data structure. And none
> >> of the existing host controller drivers support such a thing.
> >>
> >> I suppose we could add a field to struct urb and add a flag
> indicating
> >> whether the controller driver supports S-G lists.
> >
> >This is what I was thinking.
>
> That would simplify a *LOT* of WUSB wire-adapter code which now is a
> horrible kludge.

In what way would it simplify the code?

Note that usbcore already contains a scatter-gather library.
(Unfortunately the library is limited in usefulness because it needs to
run in process context.)

Alan Stern

2008-06-26 02:06:51

by FUJITA Tomonori

Subject: Re: Scatter-gather list constraints

On Wed, 25 Jun 2008 10:23:00 -0400 (EDT)
Alan Stern wrote:

> On Wed, 25 Jun 2008, FUJITA Tomonori wrote:
>
> > > For example, suppose an I/O request starts out with two S-G elements
> > > of 1536 bytes and 2048 bytes respectively, and the DMA requirement is
> > > that all elements except the last must have length divisible by 1024.
> > > Then the request could be broken up into three requests of 1024, 512,
> > > and 2048 bytes.
> >
> > I can't say that it's easy to implement a clean mechanism to break up
> > a request into multiple requests until I see a patch.
>
> And I can't write a patch without learning a lot more about how the
> block core works.
>
> > What I said is that you think that this is about extending something
> > in the block layer but it's about adding a new concept to the block
> > layer.
>
> Is it? What does the block layer do when it receives an I/O request
> that don't satisfy the other constraints (max_sectors or
> dma_alignment_mask, for example)?

As I explained, you need something new.

I don't think that max_sectors works as you expect.

dma_alignment_mask is not used in the FS path. And I think that
dma_alignment_mask doesn't solve your problems.


> > > Is it reasonable to have 120-KB bounce buffers?
> >
> > The block layer does. Why do you think that USB can't?
>
> Why do you think I think that USB can't? I didn't ask whether it was
> _possible_; I asked whether it was _reasonable_.

What the block layer does is reasonable with regard to this, I think.

2008-06-26 05:40:43

by FUJITA Tomonori

Subject: Re: Scatter-gather list constraints

On Thu, 26 Jun 2008 11:06:03 +0900
FUJITA Tomonori wrote:

> On Wed, 25 Jun 2008 10:23:00 -0400 (EDT)
> Alan Stern <[email protected]> wrote:
>
> > On Wed, 25 Jun 2008, FUJITA Tomonori wrote:
> >
> > > > For example, suppose an I/O request starts out with two S-G elements
> > > > of 1536 bytes and 2048 bytes respectively, and the DMA requirement is
> > > > that all elements except the last must have length divisible by 1024.
> > > > Then the request could be broken up into three requests of 1024, 512,
> > > > and 2048 bytes.
> > >
> > > I can't say that it's easy to implement a clean mechanism to break up
> > > a request into multiple requests until I see a patch.
> >
> > And I can't write a patch without learning a lot more about how the
> > block core works.
> >
> > > What I said is that you think that this is about extending something
> > > in the block layer but it's about adding a new concept to the block
> > > layer.
> >
> > Is it? What does the block layer do when it receives an I/O request
> > that don't satisfy the other constraints (max_sectors or
> > dma_alignment_mask, for example)?
>
> As I explained, you need something new.
>
> I don't think that max_sectors works as you expect.

The block layer looks at max_sectors when merging two things (or adding
one to another). If the test fails, it doesn't merge them.
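A toy model makes the point that this limit only stops growth and never splits anything. The Python sketch below is an assumption-laden illustration: the names are invented, sizes are plain sector counts, and the other conditions a real merge depends on (such as the sectors being adjacent on disk) are ignored.

```python
def can_merge(request_sectors, bio_sectors, max_sectors):
    """The size test: merging is allowed only if the result stays
    within max_sectors."""
    return request_sectors + bio_sectors <= max_sectors


def build_requests(bio_sizes, max_sectors):
    """Add each incoming piece (a sector count) to the current request when
    the test passes; otherwise start a new request. Nothing is split."""
    requests = []
    for sectors in bio_sizes:
        if requests and can_merge(requests[-1], sectors, max_sectors):
            requests[-1] += sectors
        else:
            requests.append(sectors)
    return requests


print(build_requests([8, 8, 8, 3, 8], max_sectors=16))   # [16, 11, 8]
```

The 11-sector request in the output shows that an odd sector count passes through untouched, so max_sectors by itself cannot enforce a length-multiple rule.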


> dma_alignment_mask is not used in the FS path. And I think that
> dma_alignment_mask doens't solve your problems.

If the dma_alignment_mask test fails, the block layer allocates temporary
buffers and does memory copies.
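As a rough picture of that behaviour, the sketch below models the decision only. It is a simplified Python illustration with invented names (needs_bounce, map_buffer), and the mask value 511 is an assumed example; it does not reproduce the block layer's actual code.

```python
def needs_bounce(addr, length, dma_alignment_mask):
    """True if the buffer's start address or its length has any bit set
    under the alignment mask."""
    return ((addr | length) & dma_alignment_mask) != 0


def map_buffer(addr, length, dma_alignment_mask):
    """Decide how a buffer would be handed to the device in this model."""
    if needs_bounce(addr, length, dma_alignment_mask):
        return "copy"      # go through a temporary, aligned buffer
    return "direct"        # use the caller's memory as it is


MASK = 511                 # assumed example: 512-byte alignment
print(map_buffer(0x1000, 4096, MASK))   # direct
print(map_buffer(0x1004, 4096, MASK))   # copy (misaligned start)
print(map_buffer(0x1000, 100, MASK))    # copy (misaligned length)
```

Note that this test looks at each buffer in isolation; it has no notion of "every element except the last", which is the rule the wireless USB case needs.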

2008-06-26 06:36:16

by Jens Axboe

Subject: Re: Scatter-gather list constraints

On Thu, Jun 26 2008, FUJITA Tomonori wrote:
> On Thu, 26 Jun 2008 11:06:03 +0900
> FUJITA Tomonori <[email protected]> wrote:
>
> > On Wed, 25 Jun 2008 10:23:00 -0400 (EDT)
> > Alan Stern <[email protected]> wrote:
> >
> > > On Wed, 25 Jun 2008, FUJITA Tomonori wrote:
> > >
> > > > > For example, suppose an I/O request starts out with two S-G elements
> > > > > of 1536 bytes and 2048 bytes respectively, and the DMA requirement is
> > > > > that all elements except the last must have length divisible by 1024.
> > > > > Then the request could be broken up into three requests of 1024, 512,
> > > > > and 2048 bytes.
> > > >
> > > > I can't say that it's easy to implement a clean mechanism to break up
> > > > a request into multiple requests until I see a patch.
> > >
> > > And I can't write a patch without learning a lot more about how the
> > > block core works.
> > >
> > > > What I said is that you think that this is about extending something
> > > > in the block layer but it's about adding a new concept to the block
> > > > layer.
> > >
> > > Is it? What does the block layer do when it receives an I/O request
> > > that don't satisfy the other constraints (max_sectors or
> > > dma_alignment_mask, for example)?
> >
> > As I explained, you need something new.
> >
> > I don't think that max_sectors works as you expect.
>
> The block layer looks at max_sectors when merging two things (or add
> one to another). So the test fails, it doesn't merge them.
>
>
> > dma_alignment_mask is not used in the FS path. And I think that
> > dma_alignment_mask doens't solve your problems.
>
> If dma_alignment_mask test fails, the block layer allocates temporary
> buffers and does memory copies.

I don't think adding anything in the general IO path makes a lot of
sense; this is a really screwy case. I don't mind adding work-arounds to
the block layer to cater for hardware weirdness, but this is getting a
little silly. We could provide a helper function for 'bouncing' this
request and thus reuse the block bounce buffer for this, but I'm not
even sure how to simply express this generically. As it is likely of no
use outside of this specific case, putting it in the driver (or usb
layer, if you expect more of these similar cases) is the best option.

--
Jens Axboe

2008-06-26 06:59:52

by FUJITA Tomonori

Subject: Re: Scatter-gather list constraints

On Thu, 26 Jun 2008 08:35:59 +0200
Jens Axboe wrote:

> On Thu, Jun 26 2008, FUJITA Tomonori wrote:
> > On Thu, 26 Jun 2008 11:06:03 +0900
> > FUJITA Tomonori <[email protected]> wrote:
> >
> > > On Wed, 25 Jun 2008 10:23:00 -0400 (EDT)
> > > Alan Stern <[email protected]> wrote:
> > >
> > > > On Wed, 25 Jun 2008, FUJITA Tomonori wrote:
> > > >
> > > > > > For example, suppose an I/O request starts out with two S-G elements
> > > > > > of 1536 bytes and 2048 bytes respectively, and the DMA requirement is
> > > > > > that all elements except the last must have length divisible by 1024.
> > > > > > Then the request could be broken up into three requests of 1024, 512,
> > > > > > and 2048 bytes.
> > > > >
> > > > > I can't say that it's easy to implement a clean mechanism to break up
> > > > > a request into multiple requests until I see a patch.
> > > >
> > > > And I can't write a patch without learning a lot more about how the
> > > > block core works.
> > > >
> > > > > What I said is that you think that this is about extending something
> > > > > in the block layer but it's about adding a new concept to the block
> > > > > layer.
> > > >
> > > > Is it? What does the block layer do when it receives an I/O request
> > > > that don't satisfy the other constraints (max_sectors or
> > > > dma_alignment_mask, for example)?
> > >
> > > As I explained, you need something new.
> > >
> > > I don't think that max_sectors works as you expect.
> >
> > The block layer looks at max_sectors when merging two things (or add
> > one to another). So the test fails, it doesn't merge them.
> >
> >
> > > dma_alignment_mask is not used in the FS path. And I think that
> > > dma_alignment_mask doens't solve your problems.
> >
> > If dma_alignment_mask test fails, the block layer allocates temporary
> > buffers and does memory copies.
>
> I don't think adding anything in the general IO path makes a lot of
> sense, this is a really screwy case. I don't mind adding work-arounds to
> the block layer to cater for hardware weirdness, but this is getting a
> little silly. We could provide a helper function for 'bouncing' this
> request and thus reuse the block bounce buffer for this, but I'm not
> even sure how to simply express this generically. As it is likely of no
> use outside of this specific case, putting it in the driver (or usb
> layer, if you expect more of these similar cases) is the best option.

Yeah, agreed, as I wrote in the first mail:

http://marc.info/?l=linux-kernel&m=121430416329618&w=2

I guess that a generic mechanism reserving some buffers in the block
layer might work for them. I also need such a mechanism to convert sg
and st to use the block layer (yeah, it's overdue but still on my todo
list).

2008-06-26 12:54:52

by Jens Axboe

Subject: Re: Scatter-gather list constraints

On Thu, Jun 26 2008, FUJITA Tomonori wrote:
> On Thu, 26 Jun 2008 08:35:59 +0200
> Jens Axboe <[email protected]> wrote:
>
> > On Thu, Jun 26 2008, FUJITA Tomonori wrote:
> > > On Thu, 26 Jun 2008 11:06:03 +0900
> > > FUJITA Tomonori <[email protected]> wrote:
> > >
> > > > On Wed, 25 Jun 2008 10:23:00 -0400 (EDT)
> > > > Alan Stern <[email protected]> wrote:
> > > >
> > > > > On Wed, 25 Jun 2008, FUJITA Tomonori wrote:
> > > > >
> > > > > > > For example, suppose an I/O request starts out with two S-G elements
> > > > > > > of 1536 bytes and 2048 bytes respectively, and the DMA requirement is
> > > > > > > that all elements except the last must have length divisible by 1024.
> > > > > > > Then the request could be broken up into three requests of 1024, 512,
> > > > > > > and 2048 bytes.
> > > > > >
> > > > > > I can't say that it's easy to implement a clean mechanism to break up
> > > > > > a request into multiple requests until I see a patch.
> > > > >
> > > > > And I can't write a patch without learning a lot more about how the
> > > > > block core works.
> > > > >
> > > > > > What I said is that you think that this is about extending something
> > > > > > in the block layer but it's about adding a new concept to the block
> > > > > > layer.
> > > > >
> > > > > Is it? What does the block layer do when it receives an I/O request
> > > > > that don't satisfy the other constraints (max_sectors or
> > > > > dma_alignment_mask, for example)?
> > > >
> > > > As I explained, you need something new.
> > > >
> > > > I don't think that max_sectors works as you expect.
> > >
> > > The block layer looks at max_sectors when merging two things (or add
> > > one to another). So the test fails, it doesn't merge them.
> > >
> > >
> > > > dma_alignment_mask is not used in the FS path. And I think that
> > > > > dma_alignment_mask doesn't solve your problems.
> > >
> > > If dma_alignment_mask test fails, the block layer allocates temporary
> > > buffers and does memory copies.
> >
> > I don't think adding anything in the general IO path makes a lot of
> > sense, this is a really screwy case. I don't mind adding work-arounds to
> > the block layer to cater for hardware weirdness, but this is getting a
> > little silly. We could provide a helper function for 'bouncing' this
> > request and thus reuse the block bounce buffer for this, but I'm not
> > even sure how to simply express this generically. As it is likely of no
> > use outside of this specific case, putting it in the driver (or usb
> > layer, if you expect more of these similar cases) is the best option.
>
> Yeah, agreed, as I wrote in the first mail:
>
> http://marc.info/?l=linux-kernel&m=121430416329618&w=2
>
> I guess that a generic mechanism reserving some buffers in the block
> layer might work for them. I also need such a mechanism to convert sg
> and st to use the block layer (yeah, it's overdue but still on my todo
> list).

On the fs side, just setting a hw block size of 1k should fix the
problem, since that'd be your minimum transfer size AND alignment there
even for O_DIRECT IO.

So that leaves SG_IO (and similar) issued IO, which are typically really
small (and thus not an issue, since it'll be a single sg element). For
the bigger ones, sg elements should be tightly packed (eg page size)
except the last one.

Alan, in what specific cases have you observed IO requests that violate
the rules you gave? The example of:

"For example, suppose an I/O request starts out with two S-G elements of
1536 bytes and 2048 bytes respectively, and the DMA requirement is"

really sounds concocted, have you ever seen something like that?

--
Jens Axboe

2008-06-26 13:09:56

by Andi Kleen

[permalink] [raw]
Subject: Re: Scatter-gather list constraints


> On the fs side, just setting a hw block size of 1k should fix the
> problem, since that'd be your minimum transfer size AND alignment there
> even for O_DIRECT IO.

XFS used to force 512 byte IOs for its log IO. Not sure
that was ever fixed. If it was fixed it likely required a disk format
change (I think s390 ran into a problem like this)

-Andi

2008-06-26 13:11:14

by Jens Axboe

[permalink] [raw]
Subject: Re: Scatter-gather list constraints

On Thu, Jun 26 2008, Andi Kleen wrote:
>
> > On the fs side, just setting a hw block size of 1k should fix the
> > problem, since that'd be your minimum transfer size AND alignment there
> > even for O_DIRECT IO.
>
> XFS used to force 512 byte IOs for its log IO. Not sure
> that was ever fixed. If it was fixed it likely required a disk format
> change (I think s390 ran into a problem like this)

Issuing IO less than the hardware block size is illegal, so if they do
that then they can't be supported on hardware with > 512b block sizes.
Someone has to do the RMW for such an operation and we don't do it in
drivers.
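
To illustrate why a sub-block write needs a read-modify-write (RMW)
cycle somewhere above the driver, here is a minimal user-space sketch.
It is not kernel code: the in-memory `disk` array, `HW_BLOCK`, and
`rmw_write()` are hypothetical names made up for this illustration, and
the 1024-byte block size is an assumption taken from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HW_BLOCK   1024u   /* assumed hardware block size */
#define NUM_BLOCKS 4u

/* Hypothetical "device": it can only transfer whole HW_BLOCK-sized blocks. */
static unsigned char disk[NUM_BLOCKS * HW_BLOCK];

static void dev_read_block(size_t blk, unsigned char *buf)
{
    assert(blk < NUM_BLOCKS);
    memcpy(buf, disk + blk * HW_BLOCK, HW_BLOCK);
}

static void dev_write_block(size_t blk, const unsigned char *buf)
{
    assert(blk < NUM_BLOCKS);
    memcpy(disk + blk * HW_BLOCK, buf, HW_BLOCK);
}

/*
 * Write len bytes at byte offset off, where the range fits inside one
 * hardware block but does not cover all of it: read the whole block,
 * patch the affected bytes, then write the whole block back.
 */
static void rmw_write(size_t off, const unsigned char *data, size_t len)
{
    unsigned char tmp[HW_BLOCK];
    size_t blk = off / HW_BLOCK;
    size_t in_blk = off % HW_BLOCK;

    assert(in_blk + len <= HW_BLOCK);
    dev_read_block(blk, tmp);          /* read   */
    memcpy(tmp + in_blk, data, len);   /* modify */
    dev_write_block(blk, tmp);         /* write  */
}
```

A 512-byte write at offset 1536 touches only the second half of block
1, yet the device still sees one full-block read and one full-block
write. That extra work is what someone has to do for such an operation.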

--
Jens Axboe

2008-06-26 14:18:44

by Alan Stern

[permalink] [raw]
Subject: Re: Scatter-gather list constraints

On Thu, 26 Jun 2008, Jens Axboe wrote:

> I don't think adding anything in the general IO path makes a lot of
> sense, this is a really screwy case. I don't mind adding work-arounds to
> the block layer to cater for hardware weirdness, but this is getting a
> little silly. We could provide a helper function for 'bouncing' this
> request and thus reuse the block bounce buffer for this, but I'm not
> even sure how to simply express this generically. As it is likely of no
> use outside of this specific case, putting it in the driver (or usb
> layer, if you expect more of these similar cases) is the best option.

That sounds great. If any of you could provide the bounce-buffer
helper functions, I will be pleased to write the USB-specific code to
call them.

Alan Stern

2008-06-26 15:12:22

by Alan Stern

[permalink] [raw]
Subject: Re: Scatter-gather list constraints

On Thu, 26 Jun 2008, Jens Axboe wrote:

> Alan, in what specific cases have you observed IO requests that violate
> the rules you gave? The example of:
>
> "For example, suppose an I/O request starts out with two S-G elements of
> 1536 bytes and 2048 bytes respectively, and the DMA requirement is"
>
> really sounds concocted, have you ever seen something like that?

It really was observed, though not by me. Here's the email message in
which it was reported (for some reason this doesn't seem to have made
it into the list archives):

> From [email protected] Thu Jun 26 11:05:30 2008
> Date: Wed, 11 Jun 2008 20:51:52 +0800
> From: AntonioLin <[email protected]>
> To: Alan Stern <[email protected]>
> Cc: David Vrabel <[email protected]>, [email protected]
> Subject: Re: [S] Re: [linux-uwb] packet size problem
>
> Hi All,
>
> I checked srb->device->request_queue->dma_alignment in usb_stor_bulk_Bulk_transport() routine. , the value is 1023.
>
> But in usb_stor_bulk_transfer_sglist, the length of first element in sg array is 3584 which is not divisible by 1024.
>
>
> Can you post your /proc/bus/usb/devices ?
>
> I don't know how to do this, could you descript moe about it ?
> (Sorry,I have few experience about Linux.)
>
> Thanks.
>
> Jun 11 16:43:14 localhost kernel: [ 1959.320234] usb-storage: *** thread sleeping.
> Jun 11 16:43:14 localhost kernel: [ 1959.320271] usb-storage: queuecommand called
> Jun 11 16:43:14 localhost kernel: [ 1959.320288] usb-storage: *** thread awakened.
> Jun 11 16:43:14 localhost kernel: [ 1959.320294] usb-storage: Command READ_10 (10 bytes)
> Jun 11 16:43:14 localhost kernel: [ 1959.320297] usb-storage: 28 00 00 00 a8 91 00 00 1f 00
> Jun 11 16:43:14 localhost kernel: [ 1959.320316] usb_stor_Bulk_transport:dma_alignment:1023
> Jun 11 16:43:14 localhost kernel: [ 1959.320322] usb-storage: Bulk Command S 0x43425355 T 0x2f L 15872 F 128 Trg 0 LUN 0 CL 10
> Jun 11 16:43:14 localhost kernel: [ 1959.320327] usb-storage: usb_stor_bulk_transfer_buf: xfer 31 bytes
> Jun 11 16:43:14 localhost kernel: [ 1959.320333] hwahc_op_urb_enqueue
> Jun 11 16:43:14 localhost kernel: [ 1959.320340] xfer d3202dc0 urb d30e6780 pipe 0xc0008200 [31 bytes] dma outbound inline
> Jun 11 16:43:14 localhost kernel: [ 1956.597834] giveback d3202dc0 0
> Jun 11 16:43:14 localhost kernel: [ 1959.323583] usb-storage: Status code 0; transferred 31/31
> Jun 11 16:43:14 localhost kernel: [ 1959.323588] usb-storage: -- transfer complete
> Jun 11 16:43:14 localhost kernel: [ 1959.323593] usb-storage: Bulk command transfer result=0
> Jun 11 16:43:14 localhost kernel: [ 1959.323598] usb-storage: usb_stor_bulk_transfer_sglist: xfer 15872 bytes, 4 entries
> Jun 11 16:43:14 localhost kernel: [ 1959.323611] hwahc_op_urb_enqueue
> Jun 11 16:43:14 localhost kernel: [ 1959.323618] xfer d3202000 urb d30e6c00 pipe 0xc0008280 [3584 bytes] dma inbound deferred
> Jun 11 16:43:14 localhost kernel: [ 1959.323633] hwahc_op_urb_enqueue
> Jun 11 16:43:14 localhost kernel: [ 1959.323640] xfer d32020c0 urb d30e6180 pipe 0xc0008280 [4096 bytes] dma inbound deferred
> Jun 11 16:43:14 localhost kernel: [ 1959.323647] hwahc_op_urb_enqueue
> Jun 11 16:43:14 localhost kernel: [ 1959.323652] xfer d3202780 urb d30e6900 pipe 0xc0008280 [4096 bytes] dma inbound deferred
> Jun 11 16:43:14 localhost kernel: [ 1959.323659] hwahc_op_urb_enqueue
> Jun 11 16:43:14 localhost kernel: [ 1959.323665] xfer d3202800 urb d30e6100 pipe 0xc0008280 [4096 bytes] dma inbound deferred
> Jun 11 16:43:14 localhost kernel: [ 1956.607877] hwa-hc 1-4:1.1: DTI: xfer d3202000#0 failed (0x87)
> Jun 11 16:43:14 localhost kernel: [ 1956.607877] giveback d3202000 -84

As you can see, the S-G element lengths for this I/O request were 3584,
4096, 4096, 4096, totalling 15872 bytes. I don't know what workload
caused this request to be generated; maybe Antonio can tell us.
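
To make the violation concrete, here is a minimal user-space sketch
that checks a list of element lengths against the "every element but
the last is a multiple of N" rule, and decodes the transfer length from
the READ(10) command in the log. The function names are hypothetical,
and the 512-byte sector size is an assumption (it matches the totals in
the log).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the index of the first element (other than the last) whose
 * length is not a multiple of granule, or -1 if the list obeys the rule.
 */
static int first_bad_segment(const unsigned int *len, size_t n,
                             unsigned int granule)
{
    assert(granule != 0);
    for (size_t i = 0; i + 1 < n; i++)
        if (len[i] % granule != 0)
            return (int)i;
    return -1;
}

/* Total byte count of the list. */
static unsigned int total_length(const unsigned int *len, size_t n)
{
    unsigned int sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += len[i];
    return sum;
}

/* Transfer length in blocks from a READ(10) CDB: bytes 7..8, big-endian. */
static unsigned int read10_blocks(const unsigned char cdb[10])
{
    return ((unsigned int)cdb[7] << 8) | cdb[8];
}
```

For the logged request, `first_bad_segment()` flags element 0, since
3584 = 3 * 1024 + 512. The CDB `28 00 00 00 a8 91 00 00 1f 00` asks for
0x1f = 31 blocks, and 31 * 512 = 15872 bytes, which agrees with the sum
of the four element lengths.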

Alan Stern

2008-06-26 15:20:18

by Boaz Harrosh

[permalink] [raw]
Subject: Re: Scatter-gather list constraints

Jens Axboe wrote:
> On Thu, Jun 26 2008, FUJITA Tomonori wrote:
>> On Thu, 26 Jun 2008 08:35:59 +0200
>> Jens Axboe <[email protected]> wrote:
>>
>>> On Thu, Jun 26 2008, FUJITA Tomonori wrote:
>>>> On Thu, 26 Jun 2008 11:06:03 +0900
>>>> FUJITA Tomonori <[email protected]> wrote:
>>>>
>>>>> On Wed, 25 Jun 2008 10:23:00 -0400 (EDT)
>>>>> Alan Stern <[email protected]> wrote:
>>>>>
>>>>>> On Wed, 25 Jun 2008, FUJITA Tomonori wrote:
>>>>>>
>>>>>>>> For example, suppose an I/O request starts out with two S-G elements
>>>>>>>> of 1536 bytes and 2048 bytes respectively, and the DMA requirement is
>>>>>>>> that all elements except the last must have length divisible by 1024.
>>>>>>>> Then the request could be broken up into three requests of 1024, 512,
>>>>>>>> and 2048 bytes.
>>>>>>> I can't say that it's easy to implement a clean mechanism to break up
>>>>>>> a request into multiple requests until I see a patch.
>>>>>> And I can't write a patch without learning a lot more about how the
>>>>>> block core works.
>>>>>>
>>>>>>> What I said is that you think that this is about extending something
>>>>>>> in the block layer but it's about adding a new concept to the block
>>>>>>> layer.
>>>>>> Is it? What does the block layer do when it receives an I/O request
>>>>>> that don't satisfy the other constraints (max_sectors or
>>>>>> dma_alignment_mask, for example)?
>>>>> As I explained, you need something new.
>>>>>
>>>>> I don't think that max_sectors works as you expect.
>>>> The block layer looks at max_sectors when merging two things (or add
>>>> one to another). So the test fails, it doesn't merge them.
>>>>
>>>>
>>>>> dma_alignment_mask is not used in the FS path. And I think that
>>>>> dma_alignment_mask doesn't solve your problems.
>>>> If dma_alignment_mask test fails, the block layer allocates temporary
>>>> buffers and does memory copies.
>>> I don't think adding anything in the general IO path makes a lot of
>>> sense, this is a really screwy case. I don't mind adding work-arounds to
>>> the block layer to cater for hardware weirdness, but this is getting a
>>> little silly. We could provide a helper function for 'bouncing' this
>>> request and thus reuse the block bounce buffer for this, but I'm not
>>> even sure how to simply express this generically. As it is likely of no
>>> use outside of this specific case, putting it in the driver (or usb
>>> layer, if you expect more of these similar cases) is the best option.
>> Yeah, agreed, as I wrote in the first mail:
>>
>> http://marc.info/?l=linux-kernel&m=121430416329618&w=2
>>
>> I guess that a generic mechanism reserving some buffers in the block
>> layer might work for them. I also need such a mechanism to convert sg
>> and st to use the block layer (yeah, it's overdue but still on my todo
>> list).
>
> On the fs side, just setting a hw block size of 1k should fix the
> problem, since that'd be your minimum transfer size AND alignment there
> even for O_DIRECT IO.

Please forgive my ignorance, is there a way for devices to specify
minimum block size to upper layer, say if we have a new sata with 1k
sectors?

If not, should we include it in Martin's "I/O hints work", if it is
not already included? (CCed)

Not that any of this will help with a device that already has a file
system with a 512-byte block size, say from another OS. That could be
supported with the special bouncing discussed above.

>
> So that leaves SG_IO (and similar) issued IO, which are typically really
> small (and thus not an issue, since it'll be a single sg element). For
> the bigger ones, sg elements should be tightly packed (eg page size)
> except the last one.
>
> Alan, in what specific cases have you observed IO requests that violate
> the rules you gave? The example of:
>
> "For example, suppose an I/O request starts out with two S-G elements of
> 1536 bytes and 2048 bytes respectively, and the DMA requirement is"
>
> really sounds concocted, have you ever seen something like that?
>

Boaz

2008-06-26 17:39:37

by Jens Axboe

[permalink] [raw]
Subject: Re: Scatter-gather list constraints

On Thu, Jun 26 2008, Boaz Harrosh wrote:
> Jens Axboe wrote:
> > On Thu, Jun 26 2008, FUJITA Tomonori wrote:
> >> On Thu, 26 Jun 2008 08:35:59 +0200
> >> Jens Axboe <[email protected]> wrote:
> >>
> >>> On Thu, Jun 26 2008, FUJITA Tomonori wrote:
> >>>> On Thu, 26 Jun 2008 11:06:03 +0900
> >>>> FUJITA Tomonori <[email protected]> wrote:
> >>>>
> >>>>> On Wed, 25 Jun 2008 10:23:00 -0400 (EDT)
> >>>>> Alan Stern <[email protected]> wrote:
> >>>>>
> >>>>>> On Wed, 25 Jun 2008, FUJITA Tomonori wrote:
> >>>>>>
> >>>>>>>> For example, suppose an I/O request starts out with two S-G elements
> >>>>>>>> of 1536 bytes and 2048 bytes respectively, and the DMA requirement is
> >>>>>>>> that all elements except the last must have length divisible by 1024.
> >>>>>>>> Then the request could be broken up into three requests of 1024, 512,
> >>>>>>>> and 2048 bytes.
> >>>>>>> I can't say that it's easy to implement a clean mechanism to break up
> >>>>>>> a request into multiple requests until I see a patch.
> >>>>>> And I can't write a patch without learning a lot more about how the
> >>>>>> block core works.
> >>>>>>
> >>>>>>> What I said is that you think that this is about extending something
> >>>>>>> in the block layer but it's about adding a new concept to the block
> >>>>>>> layer.
> >>>>>> Is it? What does the block layer do when it receives an I/O request
> >>>>>> that don't satisfy the other constraints (max_sectors or
> >>>>>> dma_alignment_mask, for example)?
> >>>>> As I explained, you need something new.
> >>>>>
> >>>>> I don't think that max_sectors works as you expect.
> >>>> The block layer looks at max_sectors when merging two things (or add
> >>>> one to another). So the test fails, it doesn't merge them.
> >>>>
> >>>>
> >>>>> dma_alignment_mask is not used in the FS path. And I think that
> >>>>> dma_alignment_mask doesn't solve your problems.
> >>>> If dma_alignment_mask test fails, the block layer allocates temporary
> >>>> buffers and does memory copies.
> >>> I don't think adding anything in the general IO path makes a lot of
> >>> sense, this is a really screwy case. I don't mind adding work-arounds to
> >>> the block layer to cater for hardware weirdness, but this is getting a
> >>> little silly. We could provide a helper function for 'bouncing' this
> >>> request and thus reuse the block bounce buffer for this, but I'm not
> >>> even sure how to simply express this generically. As it is likely of no
> >>> use outside of this specific case, putting it in the driver (or usb
> >>> layer, if you expect more of these similar cases) is the best option.
> >> Yeah, agreed, as I wrote in the first mail:
> >>
> >> http://marc.info/?l=linux-kernel&m=121430416329618&w=2
> >>
> >> I guess that a generic mechanism reserving some buffers in the block
> >> layer might work for them. I also need such a mechanism to convert sg
> >> and st to use the block layer (yeah, it's overdue but still on my todo
> >> list).
> >
> > On the fs side, just setting a hw block size of 1k should fix the
> > problem, since that'd be your minimum transfer size AND alignment there
> > even for O_DIRECT IO.
>
> Please forgive my ignorance, is there a way for devices to specify
> minimum block size to upper layer, say if we have a new sata with 1k
> sectors?

Sure, blk_queue_hardsect_size(). We support > 512b sector devices just
fine.

--
Jens Axboe

2008-06-26 17:42:21

by Jens Axboe

[permalink] [raw]
Subject: Re: Scatter-gather list constraints

On Thu, Jun 26 2008, Alan Stern wrote:
> On Thu, 26 Jun 2008, Jens Axboe wrote:
>
> > Alan, in what specific cases have you observed IO requests that violate
> > the rules you gave? The example of:
> >
> > "For example, suppose an I/O request starts out with two S-G elements of
> > 1536 bytes and 2048 bytes respectively, and the DMA requirement is"
> >
> > really sounds concocted, have you ever seen something like that?
>
> It really was observed, though not by me. Here's the email message in
> which it was reported (for some reason this doesn't seem to have made
> it into the list archives):
>
> > From [email protected] Thu Jun 26 11:05:30 2008
> > Date: Wed, 11 Jun 2008 20:51:52 +0800
> > From: AntonioLin <[email protected]>
> > To: Alan Stern <[email protected]>
> > Cc: David Vrabel <[email protected]>, [email protected]
> > Subject: Re: [S] Re: [linux-uwb] packet size problem
> >
> > Hi All,
> >
> > I checked srb->device->request_queue->dma_alignment in usb_stor_bulk_Bulk_transport() routine. , the value is 1023.
> >
> > But in usb_stor_bulk_transfer_sglist, the length of first element in sg array is 3584 which is not divisible by 1024.
> >
> >
> > Can you post your /proc/bus/usb/devices ?
> >
> > I don't know how to do this, could you descript moe about it ?
> > (Sorry,I have few experience about Linux.)
> >
> > Thanks.
> >
> > Jun 11 16:43:14 localhost kernel: [ 1959.320234] usb-storage: *** thread sleeping.
> > Jun 11 16:43:14 localhost kernel: [ 1959.320271] usb-storage: queuecommand called
> > Jun 11 16:43:14 localhost kernel: [ 1959.320288] usb-storage: *** thread awakened.
> > Jun 11 16:43:14 localhost kernel: [ 1959.320294] usb-storage: Command READ_10 (10 bytes)
> > Jun 11 16:43:14 localhost kernel: [ 1959.320297] usb-storage: 28 00 00 00 a8 91 00 00 1f 00
> > Jun 11 16:43:14 localhost kernel: [ 1959.320316] usb_stor_Bulk_transport:dma_alignment:1023
> > Jun 11 16:43:14 localhost kernel: [ 1959.320322] usb-storage: Bulk Command S 0x43425355 T 0x2f L 15872 F 128 Trg 0 LUN 0 CL 10
> > Jun 11 16:43:14 localhost kernel: [ 1959.320327] usb-storage: usb_stor_bulk_transfer_buf: xfer 31 bytes
> > Jun 11 16:43:14 localhost kernel: [ 1959.320333] hwahc_op_urb_enqueue
> > Jun 11 16:43:14 localhost kernel: [ 1959.320340] xfer d3202dc0 urb d30e6780 pipe 0xc0008200 [31 bytes] dma outbound inline
> > Jun 11 16:43:14 localhost kernel: [ 1956.597834] giveback d3202dc0 0
> > Jun 11 16:43:14 localhost kernel: [ 1959.323583] usb-storage: Status code 0; transferred 31/31
> > Jun 11 16:43:14 localhost kernel: [ 1959.323588] usb-storage: -- transfer complete
> > Jun 11 16:43:14 localhost kernel: [ 1959.323593] usb-storage: Bulk command transfer result=0
> > Jun 11 16:43:14 localhost kernel: [ 1959.323598] usb-storage: usb_stor_bulk_transfer_sglist: xfer 15872 bytes, 4 entries
> > Jun 11 16:43:14 localhost kernel: [ 1959.323611] hwahc_op_urb_enqueue
> > Jun 11 16:43:14 localhost kernel: [ 1959.323618] xfer d3202000 urb d30e6c00 pipe 0xc0008280 [3584 bytes] dma inbound deferred
> > Jun 11 16:43:14 localhost kernel: [ 1959.323633] hwahc_op_urb_enqueue
> > Jun 11 16:43:14 localhost kernel: [ 1959.323640] xfer d32020c0 urb d30e6180 pipe 0xc0008280 [4096 bytes] dma inbound deferred
> > Jun 11 16:43:14 localhost kernel: [ 1959.323647] hwahc_op_urb_enqueue
> > Jun 11 16:43:14 localhost kernel: [ 1959.323652] xfer d3202780 urb d30e6900 pipe 0xc0008280 [4096 bytes] dma inbound deferred
> > Jun 11 16:43:14 localhost kernel: [ 1959.323659] hwahc_op_urb_enqueue
> > Jun 11 16:43:14 localhost kernel: [ 1959.323665] xfer d3202800 urb d30e6100 pipe 0xc0008280 [4096 bytes] dma inbound deferred
> > Jun 11 16:43:14 localhost kernel: [ 1956.607877] hwa-hc 1-4:1.1: DTI: xfer d3202000#0 failed (0x87)
> > Jun 11 16:43:14 localhost kernel: [ 1956.607877] giveback d3202000 -84
>
> As you can see, the S-G element lengths for this I/O request were 3584,
> 4096, 4096, 4096, totalling 15872 bytes. I don't know what workload
> caused this request to be generated; maybe Antonio can tell us.

OK, I can see that happening for fs IO if the block alignment is odd. A
1kb block size would definitely fix that, but SCSI would then treat the
device as having 1kb sectors, which would cause problems as well. Alright,
I'll write something up for you to bounce such a request.
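
As a rough idea of what bouncing such a request involves, here is a
minimal user-space sketch, not the block layer's bounce code. It copies
a misaligned element list into one contiguous buffer before a transfer
and copies the data back afterwards. `struct seg`, `bounce_gather()`,
and `bounce_scatter()` are hypothetical names used only for this
illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one scatter-gather element. */
struct seg {
    unsigned char *buf;
    size_t len;
};

/*
 * Gather: copy every element into one newly allocated contiguous buffer.
 * Returns NULL on allocation failure; the caller frees the result.
 */
static unsigned char *bounce_gather(const struct seg *sg, size_t n,
                                    size_t *total)
{
    size_t sum = 0, off = 0;
    unsigned char *bounce;

    assert(total != NULL);
    for (size_t i = 0; i < n; i++)
        sum += sg[i].len;

    bounce = malloc(sum ? sum : 1);
    if (!bounce)
        return NULL;

    for (size_t i = 0; i < n; i++) {
        memcpy(bounce + off, sg[i].buf, sg[i].len);
        off += sg[i].len;
    }
    *total = sum;
    return bounce;
}

/* Scatter: after a read completes, copy the data back to the elements. */
static void bounce_scatter(const unsigned char *bounce,
                           const struct seg *sg, size_t n)
{
    size_t off = 0;

    for (size_t i = 0; i < n; i++) {
        memcpy(sg[i].buf, bounce + off, sg[i].len);
        off += sg[i].len;
    }
}
```

With a single contiguous buffer the controller sees one element, so
the "all but the last" rule holds trivially. The cost is one extra copy
of the data in each direction.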

--
Jens Axboe

2008-06-26 17:42:45

by Perez-Gonzalez, Inaky

[permalink] [raw]
Subject: RE: Scatter-gather list constraints



>From: Alan Stern [mailto:[email protected]]
>
>> >> I suppose we could add a field to struct urb and add a flag
>> indicating
>> >> whether the controller driver supports S-G lists.
>> >
>> >This is what I was thinking.
>>
>> That would simplify a *LOT* of WUSB wire-adapter code which now is a
>> horrible kludge.
>
>In what way would it simplify the code?
>
>Note that usbcore already contains a scatter-gather library.
>(Unfortunately the library is limited in usefulness because it needs to
>run in process context.)
>
>Alan Stern

For WA, when we get a buffer to be sent from a URB, it has to be split
in chunks, each chunk has a header added. So we end up with a list of
chunks, most of them quite small. Each requires a single URB to send.
resources galore.

If we could queue all those, the overhead would be reduced to allocating
the headers (possibly in a continuous array) and the sg "descriptors"
to describe the whole thing.

However, the alignment stuff somebody mentioned in another email in this
thread might cause problems.

At the end it might not be all that doable (I might be missing some
subtle issues), but it is well worth a look.

>Note that usbcore already contains a scatter-gather library.
>(Unfortunately the library is limited in usefulness because it needs to
>run in process context.)

And the overhead of one URB per sg "node" kills its usability for
WAs.

2008-06-26 19:43:23

by Alan Stern

[permalink] [raw]
Subject: RE: Scatter-gather list constraints

On Thu, 26 Jun 2008, Perez-Gonzalez, Inaky wrote:

> For WA, when we get a buffer to be sent from a URB, it has to be split
> in chunks, each chunk has a header added. So we end up with a list of
> chunks, most of them quite small. Each requires a single URB to send.
> resources galore.
>
> If we could queue all those, the overhead would be reduced to allocating
> the headers (possibly in a continuous array) and the sg "descriptors"
> to describe the whole thing.
>
> However, the alignment stuff somebody mentioned in another email in this
> thread might cause problems.
>
> At the end it might not be all that doable (I might be missing some
> subtle issues), but it is well worth a look.
>
> >Note that usbcore already contains a scatter-gather library.
> >(Unfortunately the library is limited in usefulness because it needs to
> >run in process context.)
>
> And the overhead of one URB per sg "node" kills its usability for
> WAs.

For this case (lots of small chunks making up a single URB), using a
bounce buffer might well be the easiest solution. It depends on the
size of the URB and the number and sizes of the small chunks. There
would be a lot less overhead -- only one URB -- and one large memory
allocation instead of lots of small ones.
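
A minimal user-space sketch of that idea follows: one allocation that
holds every header followed by its chunk, so the whole thing could go
out as a single transfer. The header layout in `struct chunk_hdr` is
purely hypothetical (the real wire-adapter format is not shown in this
thread), as is the `pack_chunks()` name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-chunk header, for illustration only. */
struct chunk_hdr {
    unsigned short seq;   /* chunk index, truncated to 16 bits */
    unsigned short len;   /* payload bytes that follow */
};

/*
 * Split data into chunks of at most chunk bytes and pack them as
 * [hdr][payload][hdr][payload]... into one contiguous buffer.
 * Returns NULL on allocation failure; the caller frees the result.
 */
static unsigned char *pack_chunks(const unsigned char *data, size_t len,
                                  size_t chunk, size_t *out_len)
{
    size_t nchunks, total, off = 0;
    unsigned char *buf;

    assert(out_len != NULL);
    assert(chunk > 0 && chunk <= 0xffff);

    nchunks = (len + chunk - 1) / chunk;
    total = len + nchunks * sizeof(struct chunk_hdr);
    buf = malloc(total ? total : 1);
    if (!buf)
        return NULL;

    for (size_t i = 0; i < nchunks; i++) {
        size_t remaining = len - i * chunk;
        size_t n = remaining < chunk ? remaining : chunk;
        struct chunk_hdr h = { (unsigned short)i, (unsigned short)n };

        memcpy(buf + off, &h, sizeof h);          /* header  */
        off += sizeof h;
        memcpy(buf + off, data + i * chunk, n);   /* payload */
        off += n;
    }
    *out_len = total;
    return buf;
}
```

The trade-off is that every payload byte gets copied once, in exchange
for a single allocation and a single buffer to submit.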

Alan Stern

2008-06-26 22:39:58

by Inaky Perez-Gonzalez

[permalink] [raw]
Subject: Re: Scatter-gather list constraints

On Thursday 26 June 2008, Alan Stern wrote:
> On Thu, 26 Jun 2008, Perez-Gonzalez, Inaky wrote:
>
> > For WA, when we get a buffer to be sent from a URB, it has to be split
> > in chunks, each chunk has a header added. So we end up with a list of
> > chunks, most of them quite small. Each requires a single URB to send.
> > resources galore.
> >
> > If we could queue all those, the overhead would be reduced to allocating
> > the headers (possibly in a continuous array) and the sg "descriptors"
> > to describe the whole thing.
> >
> > However, the alignment stuff somebody mentioned in another email in this
> > thread might cause problems.
> >
> > At the end it might not be all that doable (I might be missing some
> > subtle issues), but it is well worth a look.
> >
> > >Note that usbcore already contains a scatter-gather library.
> > >(Unfortunately the library is limited in usefulness because it needs to
> > >run in process context.)
> >
> > And the overhead of one URB per sg "node" kills its usability for
> > WAs.
>
> For this case (lots of small chunks making up a single URB), using a
> bounce buffer might well be the easiest solution. It depends on the
> size of the URB and the number and sizes of the small chunks. There
> would be a lot less overhead -- only one URB -- and one large memory
> allocation instead of lots of small ones.

That's what we have right now (if I remember correctly); the issue
is that you end up copying A LOT. I don't know, maybe I am just being
overperfectionist. The data chunks (segments) can be up to (digs)
3584 [from 512, in 512 increments] if I am reading the spec right
(WUSB1.0 4.5.1). Interleaving that with small chunks and change...
I don't know if that much copying will end up being that good, along
with the allocations it requires, etc.

--
Inaky

2008-08-27 21:32:28

by Alan Stern

[permalink] [raw]
Subject: Re: Scatter-gather list constraints

On Thu, 26 Jun 2008, Jens Axboe wrote:

> On Thu, Jun 26 2008, Alan Stern wrote:
> > On Thu, 26 Jun 2008, Jens Axboe wrote:
> >
> > > Alan, in what specific cases have you observed IO requests that violate
> > > the rules you gave? The example of:
> > >
> > > "For example, suppose an I/O request starts out with two S-G elements of
> > > 1536 bytes and 2048 bytes respectively, and the DMA requirement is"
> > >
> > > really sounds concocted, have you ever seen something like that?
> >
> > It really was observed, though not by me. Here's the email message in
> > which it was reported (for some reason this doesn't seem to have made
> > it into the list archives):
> >
> > > From [email protected] Thu Jun 26 11:05:30 2008
> > > Date: Wed, 11 Jun 2008 20:51:52 +0800
> > > From: AntonioLin <[email protected]>
> > > To: Alan Stern <[email protected]>
> > > Cc: David Vrabel <[email protected]>, [email protected]
> > > Subject: Re: [S] Re: [linux-uwb] packet size problem
> > >
> > > Hi All,
> > >
> > > I checked srb->device->request_queue->dma_alignment in usb_stor_bulk_Bulk_transport() routine. , the value is 1023.
> > >
> > > But in usb_stor_bulk_transfer_sglist, the length of first element in sg array is 3584 which is not divisible by 1024.
> > >
> > >
> > > Can you post your /proc/bus/usb/devices ?
> > >
> > > I don't know how to do this, could you descript moe about it ?
> > > (Sorry,I have few experience about Linux.)
> > >
> > > Thanks.
> > >
> > > Jun 11 16:43:14 localhost kernel: [ 1959.320234] usb-storage: *** thread sleeping.
> > > Jun 11 16:43:14 localhost kernel: [ 1959.320271] usb-storage: queuecommand called
> > > Jun 11 16:43:14 localhost kernel: [ 1959.320288] usb-storage: *** thread awakened.
> > > Jun 11 16:43:14 localhost kernel: [ 1959.320294] usb-storage: Command READ_10 (10 bytes)
> > > Jun 11 16:43:14 localhost kernel: [ 1959.320297] usb-storage: 28 00 00 00 a8 91 00 00 1f 00
> > > Jun 11 16:43:14 localhost kernel: [ 1959.320316] usb_stor_Bulk_transport:dma_alignment:1023
> > > Jun 11 16:43:14 localhost kernel: [ 1959.320322] usb-storage: Bulk Command S 0x43425355 T 0x2f L 15872 F 128 Trg 0 LUN 0 CL 10
> > > Jun 11 16:43:14 localhost kernel: [ 1959.320327] usb-storage: usb_stor_bulk_transfer_buf: xfer 31 bytes
> > > Jun 11 16:43:14 localhost kernel: [ 1959.320333] hwahc_op_urb_enqueue
> > > Jun 11 16:43:14 localhost kernel: [ 1959.320340] xfer d3202dc0 urb d30e6780 pipe 0xc0008200 [31 bytes] dma outbound inline
> > > Jun 11 16:43:14 localhost kernel: [ 1956.597834] giveback d3202dc0 0
> > > Jun 11 16:43:14 localhost kernel: [ 1959.323583] usb-storage: Status code 0; transferred 31/31
> > > Jun 11 16:43:14 localhost kernel: [ 1959.323588] usb-storage: -- transfer complete
> > > Jun 11 16:43:14 localhost kernel: [ 1959.323593] usb-storage: Bulk command transfer result=0
> > > Jun 11 16:43:14 localhost kernel: [ 1959.323598] usb-storage: usb_stor_bulk_transfer_sglist: xfer 15872 bytes, 4 entries
> > > Jun 11 16:43:14 localhost kernel: [ 1959.323611] hwahc_op_urb_enqueue
> > > Jun 11 16:43:14 localhost kernel: [ 1959.323618] xfer d3202000 urb d30e6c00 pipe 0xc0008280 [3584 bytes] dma inbound deferred
> > > Jun 11 16:43:14 localhost kernel: [ 1959.323633] hwahc_op_urb_enqueue
> > > Jun 11 16:43:14 localhost kernel: [ 1959.323640] xfer d32020c0 urb d30e6180 pipe 0xc0008280 [4096 bytes] dma inbound deferred
> > > Jun 11 16:43:14 localhost kernel: [ 1959.323647] hwahc_op_urb_enqueue
> > > Jun 11 16:43:14 localhost kernel: [ 1959.323652] xfer d3202780 urb d30e6900 pipe 0xc0008280 [4096 bytes] dma inbound deferred
> > > Jun 11 16:43:14 localhost kernel: [ 1959.323659] hwahc_op_urb_enqueue
> > > Jun 11 16:43:14 localhost kernel: [ 1959.323665] xfer d3202800 urb d30e6100 pipe 0xc0008280 [4096 bytes] dma inbound deferred
> > > Jun 11 16:43:14 localhost kernel: [ 1956.607877] hwa-hc 1-4:1.1: DTI: xfer d3202000#0 failed (0x87)
> > > Jun 11 16:43:14 localhost kernel: [ 1956.607877] giveback d3202000 -84
> >
> > As you can see, the S-G element lengths for this I/O request were 3584,
> > 4096, 4096, 4096, totalling 15872 bytes. I don't know what workload
> > caused this request to be generated; maybe Antonio can tell us.
>
> OK, I can see that happening for fs IO if the block alignment is odd. A
> 1kb block size would definitely fix that, but SCSI would then treat the
> device as having 1kb sectors, which would cause problems as well. Alright,
> I'll write something up for you to bounce such a request.

Any progress?

Alan Stern