2008-11-17 21:54:44

by Leon Woestenberg

[permalink] [raw]
Subject: pci_map_sg() does not coalesce adjacent physical memory? x86

Hello,

pci_map_sg() does not coalesce the scattergather list for me on x86.
Is this expected? Documentation mentions that coalescing is typically
done by pci_map_sg().

Manually traversing scatterlists that describe large user space
allocations I found that on my system 25% reduction of list length can
be achieved.


static int sgm_map_to_table(struct sg_mapping_t *sgm)
{
int i, j = 0;
dma_addr_t addr = sg_dma_address(&sgm->sgl[0]);
unsigned int len = sg_dma_len(&sgm->sgl[0]);
dma_addr_t cont_addr = addr;
unsigned int cont_len = len;
for (i = 0; i < sgm->mapped_pages - 1; i++) {
dma_addr_t next = sg_dma_address(&sgm->sgl[i + 1]);
len = sg_dma_len(&sgm->sgl[i]);
printk(KERN_DEBUG "%04d: addr=0x%08x length=0x%08x\n",
i, addr, len);
/* page i + 1 is non-contiguous with page i? */
if (next != addr + len) {
/* TODO create entry here (we could overwrite i) */
printk(KERN_DEBUG "%4d: cont_addr=0x%08x
cont_len=0x%08x\n",
j++, cont_addr, cont_len);
cont_addr = next;
cont_len = 0;
}
/* add page i + 1 to current contiguous block */
cont_len += len;
/* goto page i + 1 */
addr = next;
}
/* TODO create entry here (we could overwrite i) */
printk(KERN_DEBUG "%04d: addr=0x%08x length=0x%08x\n", i, addr, len);
printk(KERN_DEBUG "%4d: cont_addr=0x%08x length=0x%08x\n",
j++, cont_addr, cont_len);
}

Regards,
--
Leon


2008-11-18 03:16:26

by Andrew Morton

[permalink] [raw]
Subject: Re: pci_map_sg() does not coalesce adjacent physical memory? x86

(cc's added)

On Mon, 17 Nov 2008 22:54:33 +0100 "Leon Woestenberg" <[email protected]> wrote:

> Hello,
>
> pci_map_sg() does not coalesce the scattergather list for me on x86.

In which kernel version(s)?

> Is this expected? Documentation mentions that coalescing is typically
> done by pci_map_sg().
>
> Manually traversing scatterlists that describe large user space
> allocations I found that on my system 25% reduction of list length can
> be achieved.
>
>
> static int sgm_map_to_table(struct sg_mapping_t *sgm)
> {
> int i, j = 0;
> dma_addr_t addr = sg_dma_address(&sgm->sgl[0]);
> unsigned int len = sg_dma_len(&sgm->sgl[0]);
> dma_addr_t cont_addr = addr;
> unsigned int cont_len = len;
> for (i = 0; i < sgm->mapped_pages - 1; i++) {
> dma_addr_t next = sg_dma_address(&sgm->sgl[i + 1]);
> len = sg_dma_len(&sgm->sgl[i]);
> printk(KERN_DEBUG "%04d: addr=0x%08x length=0x%08x\n",
> i, addr, len);
> /* page i + 1 is non-contiguous with page i? */
> if (next != addr + len) {
> /* TODO create entry here (we could overwrite i) */
> printk(KERN_DEBUG "%4d: cont_addr=0x%08x
> cont_len=0x%08x\n",
> j++, cont_addr, cont_len);
> cont_addr = next;
> cont_len = 0;
> }
> /* add page i + 1 to current contiguous block */
> cont_len += len;
> /* goto page i + 1 */
> addr = next;
> }
> /* TODO create entry here (we could overwrite i) */
> printk(KERN_DEBUG "%04d: addr=0x%08x length=0x%08x\n", i, addr, len);
> printk(KERN_DEBUG "%4d: cont_addr=0x%08x length=0x%08x\n",
> j++, cont_addr, cont_len);
> }
>

2008-11-18 05:21:44

by James Bottomley

[permalink] [raw]
Subject: Re: pci_map_sg() does not coalesce adjacent physical memory? x86

On Mon, 2008-11-17 at 19:15 -0800, Andrew Morton wrote:
> (cc's added)
>
> On Mon, 17 Nov 2008 22:54:33 +0100 "Leon Woestenberg" <[email protected]> wrote:
>
> > Hello,
> >
> > pci_map_sg() does not coalesce the scattergather list for me on x86.
>
> In which kernel version(s)?

Also which driver? Some manually disable physical merging, so they will
never coalesce adjacent pages.

You need the flag QUEUE_FLAG_CLUSTER set for physical merging to take
place.

James

2008-11-18 09:37:32

by Leon Woestenberg

[permalink] [raw]
Subject: Re: pci_map_sg() does not coalesce adjacent physical memory? x86

Hello all,

On Tue, Nov 18, 2008 at 6:21 AM, James Bottomley
<[email protected]> wrote:
> On Mon, 2008-11-17 at 19:15 -0800, Andrew Morton wrote:
>> (cc's added)
>>
>> On Mon, 17 Nov 2008 22:54:33 +0100 "Leon Woestenberg" <[email protected]> wrote:
>> > pci_map_sg() does not coalesce the scattergather list for me on x86.
>> In which kernel version(s)?
>
2.6.24 for x86, and 2.6.27 for powerpc. Sorry, this is a bogus report:

I tried to isolate the scattergather implementation, thereby the dev
pointer got NULL, which results in different dma_ops.

I am bringing a freshly written driver upstream 'altpciechdma' for the
Altera FPGA PCI Express Chaining DMA reference design. Will probably
appear in staging RSN.

Thanks for the help,
--
Leon

2008-11-19 05:20:26

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: pci_map_sg() does not coalesce adjacent physical memory? x86

On Mon, 17 Nov 2008 19:15:32 -0800
Andrew Morton <[email protected]> wrote:

> (cc's added)
>
> On Mon, 17 Nov 2008 22:54:33 +0100 "Leon Woestenberg" <[email protected]> wrote:
>
> > Hello,
> >
> > pci_map_sg() does not coalesce the scattergather list for me on x86.
>
> In which kernel version(s)?
>
> > Is this expected? Documentation mentions that coalescing is typically
> > done by pci_map_sg().

Hm, what document did you read? We might need to fix it.

pci_map_sg() is not a typical place to coalesce the entries of the sg
list are physically adjacent. The block layer is the typical place.

The dma operations are free to coalesce the entries that physically
and virtually adjacent but there are not many that does.

For example, by default, on x86, only AMD GART (x86_64) dma operation
coalesces such entries.


> > Manually traversing scatterlists that describe large user space
> > allocations I found that on my system 25% reduction of list length can
> > be achieved.
> >
> >
> > static int sgm_map_to_table(struct sg_mapping_t *sgm)
> > {
> > int i, j = 0;
> > dma_addr_t addr = sg_dma_address(&sgm->sgl[0]);
> > unsigned int len = sg_dma_len(&sgm->sgl[0]);
> > dma_addr_t cont_addr = addr;
> > unsigned int cont_len = len;
> > for (i = 0; i < sgm->mapped_pages - 1; i++) {
> > dma_addr_t next = sg_dma_address(&sgm->sgl[i + 1]);
> > len = sg_dma_len(&sgm->sgl[i]);
> > printk(KERN_DEBUG "%04d: addr=0x%08x length=0x%08x\n",
> > i, addr, len);
> > /* page i + 1 is non-contiguous with page i? */
> > if (next != addr + len) {
> > /* TODO create entry here (we could overwrite i) */
> > printk(KERN_DEBUG "%4d: cont_addr=0x%08x
> > cont_len=0x%08x\n",
> > j++, cont_addr, cont_len);
> > cont_addr = next;
> > cont_len = 0;
> > }
> > /* add page i + 1 to current contiguous block */
> > cont_len += len;
> > /* goto page i + 1 */
> > addr = next;
> > }
> > /* TODO create entry here (we could overwrite i) */
> > printk(KERN_DEBUG "%04d: addr=0x%08x length=0x%08x\n", i, addr, len);
> > printk(KERN_DEBUG "%4d: cont_addr=0x%08x length=0x%08x\n",
> > j++, cont_addr, cont_len);
> > }

What's kinda of your driver? If it's a SCSI (or block) driver, you
don't need this trick.

I don't think that inventing a homegrown function to coalesce sg
entries in a driver is a good idea. If you really need this, it's
would be better to have a generic function to coalesce sg entries and
modify the block layer to use it (and your driver can use it too).

2008-11-19 06:22:45

by Leon Woestenberg

[permalink] [raw]
Subject: Re: pci_map_sg() does not coalesce adjacent physical memory? x86

Hello,

On Wed, Nov 19, 2008 at 6:19 AM, FUJITA Tomonori
<[email protected]> wrote:
> On Mon, 17 Nov 2008 19:15:32 -0800
>> > pci_map_sg() does not coalesce the scattergather list for me on x86.
>> [ Marked as Bogus Report ]
>>
>> > Is this expected? Documentation mentions that coalescing is typically
>> > done by pci_map_sg().
>
> Hm, what document did you read? We might need to fix it.
>
http://lxr.linux.no/linux+v2.6.27.6/Documentation/DMA-API.txt

316 pci_map_sg(struct pci_dev *hwdev, struct scatterlist *sg,
317 int nents, int direction)
318
319Maps a scatter gather list from the block layer.
320
321Returns: the number of physical segments mapped (this may be shorter
322than <nents> passed in if the block layer determines that some
323elements of the scatter/gather list are physically adjacent and thus
324may be mapped with a single entry).

Also it is mentioned elsewhere, such a Linux Device Driver 3rd
Edition, page 451.

> pci_map_sg() is not a typical place to coalesce the entries of the sg
> list are physically adjacent. The block layer is the typical place.
>
Hmmm, why not?

In my case I am using the following flow:

virtual user buffer -> scatterlist -> pci_map_sg -> device specific
scatterlist table

Of course, I could write my own (tm) and map my pages into PCI using
pci_map_single, but I thought I was being the good kid by using what
was there.

> The dma operations are free to coalesce the entries that physically
> and virtually adjacent but there are not many that does.
>
> For example, by default, on x86, only AMD GART (x86_64) dma operation
> coalesces such entries.
>

> What's kinda of your driver? If it's a SCSI (or block) driver, you
> don't need this trick.
>
Not a block driver.

A data acquisition driver through PCI Express.

> I don't think that inventing a homegrown function to coalesce sg
> entries in a driver is a good idea. If you really need this, it's
> would be better to have a generic function to coalesce sg entries and
> modify the block layer to use it (and your driver can use it too).
>
+1 for this idea, *iff* it does not belong in pci_map_sg.

However, I think it *does* belong in pci_map_sg, because pci_map_sg
can perform virtual coalescing exploiting IOMMU's if I am not
mistaken.

Regards,
--
Leon

2008-11-19 06:59:14

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: pci_map_sg() does not coalesce adjacent physical memory? x86

On Wed, 19 Nov 2008 07:22:31 +0100
"Leon Woestenberg" <[email protected]> wrote:

> Hello,
>
> On Wed, Nov 19, 2008 at 6:19 AM, FUJITA Tomonori
> <[email protected]> wrote:
> > On Mon, 17 Nov 2008 19:15:32 -0800
> >> > pci_map_sg() does not coalesce the scattergather list for me on x86.
> >> [ Marked as Bogus Report ]
> >>
> >> > Is this expected? Documentation mentions that coalescing is typically
> >> > done by pci_map_sg().
> >
> > Hm, what document did you read? We might need to fix it.
> >
> http://lxr.linux.no/linux+v2.6.27.6/Documentation/DMA-API.txt
>
> 316 pci_map_sg(struct pci_dev *hwdev, struct scatterlist *sg,
> 317 int nents, int direction)
> 318
> 319Maps a scatter gather list from the block layer.
> 320
> 321Returns: the number of physical segments mapped (this may be shorter
> 322than <nents> passed in if the block layer determines that some
> 323elements of the scatter/gather list are physically adjacent and thus
> 324may be mapped with a single entry).

Hmm, the description looks confusing. The block layer coalesces
physically adjacent entries before pci_map_sg(). pci_map_sg() could
coalesces physically or virtually adjacent entries.


> Also it is mentioned elsewhere, such a Linux Device Driver 3rd
> Edition, page 451.
>
> > pci_map_sg() is not a typical place to coalesce the entries of the sg
> > list are physically adjacent. The block layer is the typical place.
> >
> Hmmm, why not?
>
> In my case I am using the following flow:
>
> virtual user buffer -> scatterlist -> pci_map_sg -> device specific
> scatterlist table

If you follow what the block does now, you build coalesced
scatterlists from virtual user buffers.


> Of course, I could write my own (tm) and map my pages into PCI using
> pci_map_single, but I thought I was being the good kid by using what
> was there.

I don't think it's not a good place. What pci_map_single should do is
mapping a virtual address to a dma-capable address.

IOMMU dma operations maps a virtual address to an I/O address and
non-IOMMU dma operations do nothing. As you said below, some IOMMU
operations do coalescing but it's virtual mapping (probably their code
can handle physical coalescing too though but the block layer does
physical coalescing before that though). And the virtual mapping is
optional. Not all the IOMMU operations do. As I said, x86 IOMMUs
(VT-d, AMD, and Calgary) except for GART does not do coalescing. Any
non-IOMMU dma operations don't do something complicated.

If you add physical coalescing code into pci_map_sg, block-layer
drivers go though physical coalescing code twice, the block layer and
pci_map_sg meaninglessly. It's not the right approach.


> > The dma operations are free to coalesce the entries that physically
> > and virtually adjacent but there are not many that does.
> >
> > For example, by default, on x86, only AMD GART (x86_64) dma operation
> > coalesces such entries.
> >
>
> > What's kinda of your driver? If it's a SCSI (or block) driver, you
> > don't need this trick.
> >
> Not a block driver.
>
> A data acquisition driver through PCI Express.

Hmm, what location will your driver live under drivers/?


> > I don't think that inventing a homegrown function to coalesce sg
> > entries in a driver is a good idea. If you really need this, it's
> > would be better to have a generic function to coalesce sg entries and
> > modify the block layer to use it (and your driver can use it too).
> >
> +1 for this idea, *iff* it does not belong in pci_map_sg.
>
> However, I think it *does* belong in pci_map_sg, because pci_map_sg
> can perform virtual coalescing exploiting IOMMU's if I am not
> mistaken.

As I wrote above, it's not the right approach.

2008-11-19 07:05:48

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: pci_map_sg() does not coalesce adjacent physical memory? x86

Oops, sorry about typos.

On Wed, 19 Nov 2008 15:58:12 +0900
FUJITA Tomonori <[email protected]> wrote:

> > Of course, I could write my own (tm) and map my pages into PCI using
> > pci_map_single, but I thought I was being the good kid by using what
> > was there.
>
> I don't think it's not a good place. What pci_map_single should do is
> mapping a virtual address to a dma-capable address.
>
> IOMMU dma operations maps a virtual address to an I/O address and
> non-IOMMU dma operations do nothing. As you said below, some IOMMU
> operations do coalescing but it's virtual mapping (probably their code
virtual coalescing

> can handle physical coalescing too though but the block layer does
> physical coalescing before that though). And the virtual mapping is
virtual coalescing

> optional. Not all the IOMMU operations do. As I said, x86 IOMMUs
> (VT-d, AMD, and Calgary) except for GART does not do coalescing. Any
> non-IOMMU dma operations don't do something complicated.

2008-11-19 07:58:58

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: pci_map_sg() does not coalesce adjacent physical memory? x86

On Wed, 19 Nov 2008 15:58:12 +0900
FUJITA Tomonori <[email protected]> wrote:

> On Wed, 19 Nov 2008 07:22:31 +0100
> "Leon Woestenberg" <[email protected]> wrote:
>
> > Hello,
> >
> > On Wed, Nov 19, 2008 at 6:19 AM, FUJITA Tomonori
> > <[email protected]> wrote:
> > > On Mon, 17 Nov 2008 19:15:32 -0800
> > >> > pci_map_sg() does not coalesce the scattergather list for me on x86.
> > >> [ Marked as Bogus Report ]
> > >>
> > >> > Is this expected? Documentation mentions that coalescing is typically
> > >> > done by pci_map_sg().
> > >
> > > Hm, what document did you read? We might need to fix it.
> > >
> > http://lxr.linux.no/linux+v2.6.27.6/Documentation/DMA-API.txt
> >
> > 316 pci_map_sg(struct pci_dev *hwdev, struct scatterlist *sg,
> > 317 int nents, int direction)
> > 318
> > 319Maps a scatter gather list from the block layer.
> > 320
> > 321Returns: the number of physical segments mapped (this may be shorter
> > 322than <nents> passed in if the block layer determines that some
> > 323elements of the scatter/gather list are physically adjacent and thus
> > 324may be mapped with a single entry).
>
> Hmm, the description looks confusing. The block layer coalesces
> physically adjacent entries before pci_map_sg(). pci_map_sg() could
> coalesces physically or virtually adjacent entries.

How about this?

=
From: FUJITA Tomonori <[email protected]>
Subject: [PATCH] fix pci_map_sg/dma_map_sg scatterlists handling in DMA-API.txt

- pci_map_sg/dma_map_sg are used with a scatter gather list that
doesn't come from the block layer (e.g. some network drivers do).

- how IOMMUs merge adjacent elements of the scatter/gather list is
independent of how the block layer determines sees elements.


Signed-off-by: FUJITA Tomonori <[email protected]>
---
Documentation/DMA-API.txt | 8 +++-----
1 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
index b8e8646..b462bb1 100644
--- a/Documentation/DMA-API.txt
+++ b/Documentation/DMA-API.txt
@@ -316,12 +316,10 @@ reduce current DMA mapping usage or delay and try again later).
pci_map_sg(struct pci_dev *hwdev, struct scatterlist *sg,
int nents, int direction)

-Maps a scatter gather list from the block layer.
-
Returns: the number of physical segments mapped (this may be shorter
-than <nents> passed in if the block layer determines that some
-elements of the scatter/gather list are physically adjacent and thus
-may be mapped with a single entry).
+than <nents> passed in if some elements of the scatter/gather list are
+physically or virtually adjacent and an IOMMU maps them with a single
+entry).

Please note that the sg cannot be mapped again if it has been mapped once.
The mapping process is allowed to destroy information in the sg.
--
1.5.5.GIT

2008-11-19 09:45:25

by Leon Woestenberg

[permalink] [raw]
Subject: Re: pci_map_sg() does not coalesce adjacent physical memory? x86

Hello,

On Wed, Nov 19, 2008 at 8:58 AM, FUJITA Tomonori
<[email protected]> wrote:
> On Wed, 19 Nov 2008 15:58:12 +0900
> FUJITA Tomonori <[email protected]> wrote:
>> On Wed, 19 Nov 2008 07:22:31 +0100
>
> - pci_map_sg/dma_map_sg are used with a scatter gather list that
> doesn't come from the block layer (e.g. some network drivers do).
>
This is the point I then want to make: we have pci_map_sg() users in
other system than the block layer, the network and v4l2 subsystems,
why cannot they benefit from coalescing?

Should they copy the block layer coalescing implementation, or should
that implementation be made more generic and live outside the block
sub system?

My intended use case is as follows. See the sg_write() call for the code flow.

http://www.sidebranch.com/leon/user_dma_sg.c


Thanks,
--
Leon

2008-11-19 10:06:16

by Tejun Heo

[permalink] [raw]
Subject: Re: pci_map_sg() does not coalesce adjacent physical memory? x86

Leon Woestenberg wrote:
> Hello,
>
> On Wed, Nov 19, 2008 at 8:58 AM, FUJITA Tomonori
> <[email protected]> wrote:
>> On Wed, 19 Nov 2008 15:58:12 +0900
>> FUJITA Tomonori <[email protected]> wrote:
>>> On Wed, 19 Nov 2008 07:22:31 +0100
>> - pci_map_sg/dma_map_sg are used with a scatter gather list that
>> doesn't come from the block layer (e.g. some network drivers do).
>>
> This is the point I then want to make: we have pci_map_sg() users in
> other system than the block layer, the network and v4l2 subsystems,
> why cannot they benefit from coalescing?

Because pci_map_sg() doesn't know the memory access limits of the
controller as block layer does.

> Should they copy the block layer coalescing implementation, or should
> that implementation be made more generic and live outside the block
> sub system?

The latter sounds like a good idea to me.

Thanks.

--
tejun