This is a followup to http://lkml.org/lkml/2007/8/24/280
Despite Grant's desire for a more elegant solution, there's
not much new here. I moved the API change from pci.h to
dma-mapping.h and removed the pci_ prefix from the name.
Problem Description
-------------------
On Altix, DMA may be reordered within the NUMA interconnect.
This can be a problem with Infiniband, where DMA to Completion Queues
allocated in user-space can race with data DMA. This patchset allows
a driver to associate a user-space memory region with a "dmaflush"
attribute, so that writes to the memory region flush in-flight DMA,
preventing the CQ/data race.
There are four patches in this set:
[1/4] dma: add dma_flags_set_dmaflush() to dma interface
[2/4] dma: redefine dma_flags_set_dmaflush() for sn-ia64
[3/4] dma: document dma_flags_set_dmaflush()
[4/4] mthca: allow setting "dmaflush" attribute on user-allocated memory
[+jejb to cc]
On Tue, Sep 25, 2007 at 04:58:43PM -0700, [email protected] wrote:
> This is a followup to http://lkml.org/lkml/2007/8/24/280
>
> Despite Grant's desire for a more elegant solution, there's
> not much new here. I moved the API change from pci.h to
> dma-mapping.h and removed the pci_ prefix from the name.
Thanks - but I don't have a better idea either.
I think you are right to just move forward with this until
someone provides a better API.
> Problem Description
> -------------------
> On Altix, DMA may be reordered within the NUMA interconnect.
> This can be a problem with Infiniband, where DMA to Completion Queues
> allocated in user-space can race with data DMA. This patchset allows
> a driver to associate a user-space memory region with a "dmaflush"
> attribute, so that writes to the memory region flush in-flight DMA,
> preventing the CQ/data race.
Can we define this API to provide the same semantics as the memory
that dma_alloc_coherent() returns?
Did I summarize this correctly?
Defining it terms of completion queues won't mean much to most folks.
Better to add a description of completion queues to the DMA-API.txt if
necessary. dma_alloc_coherent() API is pretty well understood.
> There are four patches in this set:
>
> [1/4] dma: add dma_flags_set_dmaflush() to dma interface
Sorry - this feels like a "color of the shed" argument, but isn't
this about DMA ordering attribute?
"dmaflush" is an action and not an attribute to me.
Is dma_flags_set_coherent() better since it's doing the same thing
as dma_alloc_coherent()?
> [2/4] dma: redefine dma_flags_set_dmaflush() for sn-ia64
> [3/4] dma: document dma_flags_set_dmaflush()
This patch updates Documentation/DMA-mapping.txt. But it's a change to
the generic (not PCI specific) API described in DMA-API.txt.
Can you update that as well please?
Upon reading the "2) Platforms that permit DMA reordering", I think I
have been confusing coherency with ordering. I think I have because DMA
is leaving the "PCI domain", crossing an "unordered domain" (NUMA,
interconnect), and then finally hitting the cache coherency "domain"
when it reaches a "far away" memory controller. That's why I've
been thinking of this as a coherency problem.
The description and API uses the word "flush" (which is ok I guess) instead
of describing this in terms of enforcing DMA ordering. Any DMA write to the
"strongly ordered" region will cause _all_ inflight DMA to be visible
to cache coherency, thus preserving the illusion of strong DMA ordering.
Does that sound right/better to you too?
I don't have chipset docs and some of this is just trying to rephrase
what I've heard before from former SGI employees.
> [4/4] mthca: allow setting "dmaflush" attribute on user-allocated memory
Besides calling the parameter "dmaflush", it looks fine to me.
(It's either a DMA ordering or coherency attribute depending on how
you want to look at it.)
thanks,
grant
On Tuesday, September 25, 2007 11:49:50 pm Grant Grundler wrote:
> Upon reading the "2) Platforms that permit DMA reordering", I think I
> have been confusing coherency with ordering. I think I have because DMA
> is leaving the "PCI domain", crossing an "unordered domain" (NUMA,
> interconnect), and then finally hitting the cache coherency "domain"
> when it reaches a "far away" memory controller. That's why I've
> been thinking of this as a coherency problem.
>
> The description and API uses the word "flush" (which is ok I guess) instead
> of describing this in terms of enforcing DMA ordering. Any DMA write to
> the "strongly ordered" region will cause _all_ inflight DMA to be visible
> to cache coherency, thus preserving the illusion of strong DMA ordering.
>
> Does that sound right/better to you too?
> I don't have chipset docs and some of this is just trying to rephrase
> what I've heard before from former SGI employees.
I definitely wouldn't describe this as a coherency issue--the lines involved
in the DMA writes are fully coherent. It's really an ordering problem, and
the new API is setting a "barrier" bit in the DMA address that indicates to
the bridge that any outstanding DMA should be written before the barriered
data. So calling it set_flush or set_barrier is fine with me...
Jesse
> Can we define this API to provide the same semantics as the memory
> that dma_alloc_coherent() returns?
No, definitely not. The property of the mapping here is all about
ordering with respect to other DMAs (from the same device) and nothing
to do with coherency between the CPU's and device's view of the memory.
> Sorry - this feels like a "color of the shed" argument, but isn't
> this about DMA ordering attribute?
> "dmaflush" is an action and not an attribute to me.
I guess I don't love the "dmaflush" name, but the property of these
mappings is exactly that DMA into one of these mappings also performs
the action of flushing other in-flight DMAs. However I guess your
point is a good one: the effect really desired is that DMAs to these
mappings become visible strictly after earlier DMAs, and we don't care
exactly how the effect is obtained.
No good idea of a better name though.
- R.
On Wed, Sep 26, 2007 at 12:49:50AM -0600, Grant Grundler wrote:
[edited out several points that I think have been already
addresed by others in this thread.]
> ....
> Defining it terms of completion queues won't mean much to most folks.
> Better to add a description of completion queues to the DMA-API.txt if
> necessary. dma_alloc_coherent() API is pretty well understood.
OK, next time I'll use a more generic description.
>
> > There are four patches in this set:
> >
> > [1/4] dma: add dma_flags_set_dmaflush() to dma interface
>
> Sorry - this feels like a "color of the shed" argument, but isn't
> this about DMA ordering attribute?
> "dmaflush" is an action and not an attribute to me.
Right - an attribute is a noun, not a verb. I'm going to try
"s/dmaflush/dmabarrier/" in the next version.
> ....
> This patch updates Documentation/DMA-mapping.txt. But it's a change to
> the generic (not PCI specific) API described in DMA-API.txt.
> Can you update that as well please?
>....
Ja, I realized that soon after hitting the send button. I'll
move the documentation to DMA-API.txt.
--
Arthur