2008-02-28 03:26:24

by Arthur Kepner

[permalink] [raw]
Subject: [PATCH 1/3 v3] dma: document dma_{un}map_{single|sg}_attrs() interface


Document the new dma_{un}map_{single|sg}_attrs() functions.

Signed-off-by: Arthur Kepner <[email protected]>
Acked-by: Grant Grundler <[email protected]>

---

DMA-API.txt | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++++
DMA-attributes.txt | 29 +++++++++++++++++++++++
2 files changed, 94 insertions(+)

diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
index b939ebb..fdb82b0 100644
--- a/Documentation/DMA-API.txt
+++ b/Documentation/DMA-API.txt
@@ -395,6 +395,71 @@ Notes: You must do this:

See also dma_map_single().

+dma_addr_t
+dma_map_single_attrs(struct device *dev, void *cpu_addr, size_t size,
+ enum dma_data_direction dir,
+ struct dma_attrs *attrs)
+
+void
+dma_unmap_single_attrs(struct device *dev, dma_addr_t dma_addr,
+ size_t size, enum dma_data_direction dir,
+ struct dma_attrs *attrs)
+
+int
+dma_map_sg_attrs(struct device *dev, struct scatterlist *sgl,
+ int nents, enum dma_data_direction dir,
+ struct dma_attrs *attrs)
+
+void
+dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sgl,
+ int nents, enum dma_data_direction dir,
+ struct dma_attrs *attrs)
+
+The four functions above are just like the counterpart functions
+without the _attrs suffixes, except that they pass an optional
+struct dma_attrs*.
+
+struct dma_attrs encapsulates a set of "dma attributes". For the
+definition of struct dma_attrs see linux/dma-attrs.h.
+
+The interpretation of dma attributes is architecture-specific, and
+each attribute should be documented in Documentation/DMA-attributes.txt.
+
+If struct dma_attrs* is NULL, the semantics of each of these
+functions is identical to those of the corresponding function
+without the _attrs suffix. As a result dma_map_single_attrs()
+can generally replace dma_map_single(), etc.
+
+As an example of the use of the *_attrs functions, here's how
+you could pass an attribute DMA_ATTR_FOO when mapping memory
+for DMA:
+
+#include <linux/dma-attrs.h>
+/* DMA_ATTR_FOO should be defined in linux/dma-attrs.h and
+ * documented in Documentation/DMA-attributes.txt */
+...
+
+ DECLARE_DMA_ATTRS(attrs);
+ dma_set_attr(&attrs, DMA_ATTR_FOO);
+ ....
+ n = dma_map_sg_attrs(dev, sg, nents, DMA_TO_DEVICE, &attr);
+ ....
+
+Architectures that care about DMA_ATTR_FOO would check for its
+presence in their implementations of the mapping and unmapping
+routines, e.g.:
+
+void whizco_dma_map_sg_attrs(struct device *dev, dma_addr_t dma_addr,
+ size_t size, enum dma_data_direction dir,
+ struct dma_attrs *attrs)
+{
+ ....
+ int foo = dma_get_attr(attrs, DMA_ATTR_FOO);
+ ....
+ if (foo)
+ /* twizzle the frobnozzle */
+ ....
+

Part II - Advanced dma_ usage
-----------------------------
diff --git a/Documentation/DMA-attributes.txt b/Documentation/DMA-attributes.txt
index e69de29..36baea5 100644
--- a/Documentation/DMA-attributes.txt
+++ b/Documentation/DMA-attributes.txt
@@ -0,0 +1,29 @@
+ DMA attributes
+ ==============
+
+This document describes the semantics of the DMA attributes that are
+defined in linux/dma-attrs.h.
+
+
+DMA_ATTR_SYNC_ON_WRITE
+----------------------
+
+DMA_ATTR_SYNC_ON_WRITE is used on the IA64_SGI_SN2 architecture.
+It provides a mechanism for devices to explicitly order their DMA
+writes.
+
+On IA64_SGI_SN2 machines, DMA may be reordered within the NUMA
+interconnect. Allowing reordering improves performance, but in some
+situations it may be necessary to ensure that one DMA write is
+complete before another is visible. For example, if the device does
+a DMA write to indicate that data is available in memory, DMA of the
+"completion indication" can race with DMA of data.
+
+When a memory region is mapped with the DMA_ATTR_SYNC_ON_WRITE attribute,
+a write to that region causes all in-flight DMA to be flushed to memory.
+Any pending DMA will complete and be visible in memory before the write
+to the region with the DMA_ATTR_SYNC_ON_WRITE attribute becomes visible.
+
+(For more information, see the document titled "SGI Altix Architecture
+Considerations for Linux Device Drivers" at http://techpubs.sgi.com/.)
+


2008-02-29 02:45:59

by Michael Ellerman

[permalink] [raw]
Subject: Re: [PATCH 1/3 v3] dma: document dma_{un}map_{single|sg}_attrs() interface

On Thu, Feb 28, 2008 at 2:24 PM, <[email protected]> wrote:
>
> Document the new dma_{un}map_{single|sg}_attrs() functions.
>
> diff --git a/Documentation/DMA-attributes.txt b/Documentation/DMA-attributes.txt
> index e69de29..36baea5 100644
> --- a/Documentation/DMA-attributes.txt
> +++ b/Documentation/DMA-attributes.txt
> @@ -0,0 +1,29 @@
> + DMA attributes
> + ==============
> +
> +This document describes the semantics of the DMA attributes that are
> +defined in linux/dma-attrs.h.
> +
> +
> +DMA_ATTR_SYNC_ON_WRITE
> +----------------------
> +
> +DMA_ATTR_SYNC_ON_WRITE is used on the IA64_SGI_SN2 architecture.
> +It provides a mechanism for devices to explicitly order their DMA
> +writes.
> +
> +On IA64_SGI_SN2 machines, DMA may be reordered within the NUMA
> +interconnect. Allowing reordering improves performance, but in some
> +situations it may be necessary to ensure that one DMA write is
> +complete before another is visible. For example, if the device does
> +a DMA write to indicate that data is available in memory, DMA of the
> +"completion indication" can race with DMA of data.
> +
> +When a memory region is mapped with the DMA_ATTR_SYNC_ON_WRITE attribute,
> +a write to that region causes all in-flight DMA to be flushed to memory.
> +Any pending DMA will complete and be visible in memory before the write
> +to the region with the DMA_ATTR_SYNC_ON_WRITE attribute becomes visible.


I'm not clear how this is all meant to work. Your intial patch says
this is an interface to pass "architecture-specific
attributes" from drivers through to the DMA mapping code, which is
fair enough - we want to do something similar.

But it's not clear that DMA_ATTR_SYNC_ON_WRITE is architecture
specific. If I was a driver writer might assume it works on all
platforms. And in fact in patch 3/3 you define it in
linux/dma-attrs.h, so it's not architecture specific.

What is architecture specific is the semantic. DMA_ATTR_SYNC_ON_WRITE
is defined entirely in terms of what it does on ia64 hardware, and a
particular flavour thereof.

It just seems to me we're going to end up with a gazillion flags,
because DMA_ATTR_SYNC_ON_WRITE isn't quite the same semantic as what
Power can do. So we'll have to add
DMA_ATTR_SYNC_ON_WRITE_WRT_OTHER_SYNC_ON_WRITE_MAPPINGS_ONLY and so
on. Or, we end with subtle driver bugs because the semantics aren't
clear across platforms.

I guess I think the attributes should either be truly arch-specific,
ie. DMA_ATTR_IA64_SYNC_ON_WRITE. Or we need to come up with some well
defined, and architecture neutral semantics for what the flags mean.

cheers

2008-02-29 18:25:36

by Grant Grundler

[permalink] [raw]
Subject: Re: [PATCH 1/3 v3] dma: document dma_{un}map_{single|sg}_attrs() interface

On Fri, Feb 29, 2008 at 01:45:46PM +1100, Michael Ellerman wrote:
> On Thu, Feb 28, 2008 at 2:24 PM, <[email protected]> wrote:
> >
> > Document the new dma_{un}map_{single|sg}_attrs() functions.
> >
> > diff --git a/Documentation/DMA-attributes.txt b/Documentation/DMA-attributes.txt
> > index e69de29..36baea5 100644
> > --- a/Documentation/DMA-attributes.txt
> > +++ b/Documentation/DMA-attributes.txt
> > @@ -0,0 +1,29 @@
> > + DMA attributes
> > + ==============
> > +
> > +This document describes the semantics of the DMA attributes that are
> > +defined in linux/dma-attrs.h.
> > +
> > +
> > +DMA_ATTR_SYNC_ON_WRITE
> > +----------------------
> > +
> > +DMA_ATTR_SYNC_ON_WRITE is used on the IA64_SGI_SN2 architecture.
> > +It provides a mechanism for devices to explicitly order their DMA
> > +writes.
> > +
> > +On IA64_SGI_SN2 machines, DMA may be reordered within the NUMA
> > +interconnect. Allowing reordering improves performance, but in some
> > +situations it may be necessary to ensure that one DMA write is
> > +complete before another is visible. For example, if the device does
> > +a DMA write to indicate that data is available in memory, DMA of the
> > +"completion indication" can race with DMA of data.
> > +
> > +When a memory region is mapped with the DMA_ATTR_SYNC_ON_WRITE attribute,
> > +a write to that region causes all in-flight DMA to be flushed to memory.
> > +Any pending DMA will complete and be visible in memory before the write
> > +to the region with the DMA_ATTR_SYNC_ON_WRITE attribute becomes visible.
>
>
> I'm not clear how this is all meant to work. Your intial patch says
> this is an interface to pass "architecture-specific
> attributes" from drivers through to the DMA mapping code, which is
> fair enough - we want to do something similar.
>
> But it's not clear that DMA_ATTR_SYNC_ON_WRITE is architecture
> specific. If I was a driver writer might assume it works on all
> platforms.

That would be a fair assumption. But it is required to be a NOP on
platforms that don't need the "hint" ("attr" or whatever you want
to call it). Specific architectures (SN2 in this case) will need
to implement something.

> And in fact in patch 3/3 you define it in
> linux/dma-attrs.h, so it's not architecture specific.

Correct. The same driver needs to compile on all architectures.

> What is architecture specific is the semantic. DMA_ATTR_SYNC_ON_WRITE
> is defined entirely in terms of what it does on ia64 hardware, and a
> particular flavour thereof.
>
> It just seems to me we're going to end up with a gazillion flags,
> because DMA_ATTR_SYNC_ON_WRITE isn't quite the same semantic as what
> Power can do. So we'll have to add
> DMA_ATTR_SYNC_ON_WRITE_WRT_OTHER_SYNC_ON_WRITE_MAPPINGS_ONLY and so
> on. Or, we end with subtle driver bugs because the semantics aren't
> clear across platforms.

I agree. Just have to sort those issues out as people come up with them.

> I guess I think the attributes should either be truly arch-specific,
> ie. DMA_ATTR_IA64_SYNC_ON_WRITE. Or we need to come up with some well
> defined, and architecture neutral semantics for what the flags mean.

The latter. I have no intention of adding arch specific flags that
only mean something on one arch. The intent here is to enable
correct functionality on specific arches without impacting
performance on arches that don't need those "attributes".

hth,
grant

>
> cheers

2008-02-29 18:38:20

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH 1/3 v3] dma: document dma_{un}map_{single|sg}_attrs() interface

On Fri, 2008-02-29 at 11:25 -0700, Grant Grundler wrote:
> On Fri, Feb 29, 2008 at 01:45:46PM +1100, Michael Ellerman wrote:
> > On Thu, Feb 28, 2008 at 2:24 PM, <[email protected]> wrote:
> > >
> > > Document the new dma_{un}map_{single|sg}_attrs() functions.
> > >
> > > diff --git a/Documentation/DMA-attributes.txt b/Documentation/DMA-attributes.txt
> > > index e69de29..36baea5 100644
> > > --- a/Documentation/DMA-attributes.txt
> > > +++ b/Documentation/DMA-attributes.txt
> > > @@ -0,0 +1,29 @@
> > > + DMA attributes
> > > + ==============
> > > +
> > > +This document describes the semantics of the DMA attributes that are
> > > +defined in linux/dma-attrs.h.
> > > +
> > > +
> > > +DMA_ATTR_SYNC_ON_WRITE
> > > +----------------------
> > > +
> > > +DMA_ATTR_SYNC_ON_WRITE is used on the IA64_SGI_SN2 architecture.
> > > +It provides a mechanism for devices to explicitly order their DMA
> > > +writes.
> > > +
> > > +On IA64_SGI_SN2 machines, DMA may be reordered within the NUMA
> > > +interconnect. Allowing reordering improves performance, but in some
> > > +situations it may be necessary to ensure that one DMA write is
> > > +complete before another is visible. For example, if the device does
> > > +a DMA write to indicate that data is available in memory, DMA of the
> > > +"completion indication" can race with DMA of data.
> > > +
> > > +When a memory region is mapped with the DMA_ATTR_SYNC_ON_WRITE attribute,
> > > +a write to that region causes all in-flight DMA to be flushed to memory.
> > > +Any pending DMA will complete and be visible in memory before the write
> > > +to the region with the DMA_ATTR_SYNC_ON_WRITE attribute becomes visible.
> >
> >
> > I'm not clear how this is all meant to work. Your intial patch says
> > this is an interface to pass "architecture-specific
> > attributes" from drivers through to the DMA mapping code, which is
> > fair enough - we want to do something similar.
> >
> > But it's not clear that DMA_ATTR_SYNC_ON_WRITE is architecture
> > specific. If I was a driver writer might assume it works on all
> > platforms.
>
> That would be a fair assumption. But it is required to be a NOP on
> platforms that don't need the "hint" ("attr" or whatever you want
> to call it). Specific architectures (SN2 in this case) will need
> to implement something.

To be honest, I still don't like the name. SYNC_ON_WRITE is the SN2
implementation. What it's actually doing is implementing strict
ordering semantics. I think it should really be
DMA_ATTR_STRICT_ORDERING (with a corresponding
DMA_ATTR_RELAXED_ORDERING).

This means that if ever anyone sets a PCIe bridge to relaxed ordering by
default, this attribute will also work for them.

James

2008-02-29 21:25:37

by Arthur Kepner

[permalink] [raw]
Subject: Re: [PATCH 1/3 v3] dma: document dma_{un}map_{single|sg}_attrs() interface

On Fri, Feb 29, 2008 at 01:45:46PM +1100, Michael Ellerman wrote:
> ....
> I guess I think the attributes should either be truly arch-specific,
> ie. DMA_ATTR_IA64_SYNC_ON_WRITE. Or we need to come up with some well
> defined, and architecture neutral semantics for what the flags mean.
>

I'd hoped to do just this - to define architecture neutral semantics
for attributes.

But I see your point - the documentation in DMA-attributes.txt isn't
consistent with that plan. Let me see if I can address your concerns
by changing the documentation.

--
Arthur

2008-03-01 03:01:22

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH 1/3 v3] dma: document dma_{un}map_{single|sg}_attrs() interface


On Fri, 2008-02-29 at 12:37 -0600, James Bottomley wrote:
> To be honest, I still don't like the name. SYNC_ON_WRITE is the SN2
> implementation. What it's actually doing is implementing strict
> ordering semantics. I think it should really be
> DMA_ATTR_STRICT_ORDERING (with a corresponding
> DMA_ATTR_RELAXED_ORDERING).
>
> This means that if ever anyone sets a PCIe bridge to relaxed ordering
> by
> default, this attribute will also work for them.

But that would be asking for trouble no ? I would expect pretty much
everything to break appart with relaxed by default no ? Or I don't fully
understand what your arch calls "relaxed"...

I do agree that we should aim for simple semantics, they should cover
99% of the needs, and leave some bit space or attribute space for archs
to define private ones when really needed.

In our case, I suspect that the two main thing we could define here for
DMA that would be useful generally would be relaxed ordering (strict
being the default) and maybe read prefetching (though that would be the
default, maybe no prefetch). We -might- have use of separating relaxed
ordering for read vs. writes, but that's pretty much it.

Cheers,
Ben.

2008-03-01 03:12:28

by Jesse Barnes

[permalink] [raw]
Subject: Re: [PATCH 1/3 v3] dma: document dma_{un}map_{single|sg}_attrs() interface

On Friday, February 29, 2008 6:56 pm Benjamin Herrenschmidt wrote:
> On Fri, 2008-02-29 at 12:37 -0600, James Bottomley wrote:
> > To be honest, I still don't like the name. SYNC_ON_WRITE is the SN2
> > implementation. What it's actually doing is implementing strict
> > ordering semantics. I think it should really be
> > DMA_ATTR_STRICT_ORDERING (with a corresponding
> > DMA_ATTR_RELAXED_ORDERING).
> >
> > This means that if ever anyone sets a PCIe bridge to relaxed ordering
> > by
> > default, this attribute will also work for them.
>
> But that would be asking for trouble no ? I would expect pretty much
> everything to break appart with relaxed by default no ? Or I don't fully
> understand what your arch calls "relaxed"...

Depends on how the device completes commands and what the driver does to check
for completion. It would probably work fine in many cases.

> I do agree that we should aim for simple semantics, they should cover
> 99% of the needs, and leave some bit space or attribute space for archs
> to define private ones when really needed.
>
> In our case, I suspect that the two main thing we could define here for
> DMA that would be useful generally would be relaxed ordering (strict
> being the default) and maybe read prefetching (though that would be the
> default, maybe no prefetch). We -might- have use of separating relaxed
> ordering for read vs. writes, but that's pretty much it.

Sounds about right. I've got to poke the PCIe 3.0 folks again about some of
the potential ordering changes there; hopefully we can cover that stuff too
with Arthur's approach.

Jesse

2008-03-01 07:18:48

by Grant Grundler

[permalink] [raw]
Subject: Re: [PATCH 1/3 v3] dma: document dma_{un}map_{single|sg}_attrs() interface

On Sat, Mar 01, 2008 at 01:56:44PM +1100, Benjamin Herrenschmidt wrote:
...
> > This means that if ever anyone sets a PCIe bridge to relaxed ordering
> > by default, this attribute will also work for them.

Is this the same "relaxed ordering" defined in PCIe 1.0 specs?
If so, as explained before, use of "RO" bit in the DMA transactions
is up the the device and it MUST send a DMA transaction without
RO bit set to flush in-flight writes to the device. The host OS
does not have control over individual transactions, just the
general enablement of the feature. And "RO" bit should be
transperent to current applications including RDMA.

It's a very different issue than the SN2 node interconnect where
ALL transactions can be re-ordered by default and we have to do
something special to force ordering whene DMA writes are issued
to specific memory regions. The PCI device has no control over
this (unlike RO-bit discussed above).

hth,
grant

> But that would be asking for trouble no ? I would expect pretty much
> everything to break appart with relaxed by default no ? Or I don't fully
> understand what your arch calls "relaxed"...
>
> I do agree that we should aim for simple semantics, they should cover
> 99% of the needs, and leave some bit space or attribute space for archs
> to define private ones when really needed.
>
> In our case, I suspect that the two main thing we could define here for
> DMA that would be useful generally would be relaxed ordering (strict
> being the default) and maybe read prefetching (though that would be the
> default, maybe no prefetch). We -might- have use of separating relaxed
> ordering for read vs. writes, but that's pretty much it.
>
> Cheers,
> Ben.
>

2008-03-05 18:14:45

by Arthur Kepner

[permalink] [raw]
Subject: Re: [PATCH 1/3 v3] dma: document dma_{un}map_{single|sg}_attrs() interface

On Fri, Feb 29, 2008 at 12:37:56PM -0600, James Bottomley wrote:
> ....
> To be honest, I still don't like the name. SYNC_ON_WRITE is the SN2
> implementation. What it's actually doing is implementing strict
> ordering semantics. I think it should really be
> DMA_ATTR_STRICT_ORDERING (with a corresponding
> DMA_ATTR_RELAXED_ORDERING)....

I've been thinking about a new name, but don't like
DMA_ATTR_STRICT_ORDERING.

What I'm trying to do is to establish order (across
a NUMA fabric) of DMA to different memory regions, i.e.,
DMA to memory region A forces all outstanding DMA (to
memory regions B, C,....) to complete first.

DMA_ATTR_STRICT_ORDERING sounds like a PCI thing to me,
and this is a NUMA interconnect thing.

--
Arthur

2008-03-05 19:02:45

by Jesse Barnes

[permalink] [raw]
Subject: Re: [PATCH 1/3 v3] dma: document dma_{un}map_{single|sg}_attrs() interface

On Wednesday, March 05, 2008 10:13 am [email protected] wrote:
> On Fri, Feb 29, 2008 at 12:37:56PM -0600, James Bottomley wrote:
> > ....
> > To be honest, I still don't like the name. SYNC_ON_WRITE is the SN2
> > implementation. What it's actually doing is implementing strict
> > ordering semantics. I think it should really be
> > DMA_ATTR_STRICT_ORDERING (with a corresponding
> > DMA_ATTR_RELAXED_ORDERING)....
>
> I've been thinking about a new name, but don't like
> DMA_ATTR_STRICT_ORDERING.
>
> What I'm trying to do is to establish order (across
> a NUMA fabric) of DMA to different memory regions, i.e.,
> DMA to memory region A forces all outstanding DMA (to
> memory regions B, C,....) to complete first.
>
> DMA_ATTR_STRICT_ORDERING sounds like a PCI thing to me,
> and this is a NUMA interconnect thing.

Well, we used to call it a DMA barrier back in the old days, so
DMA_ATTR_BARRIER might work (or DMA_ATTR_INTERCONNECT_BARRIER if you want to
be extra clear about what it's doing).

Jesse

2008-03-06 06:01:57

by Michael Ellerman

[permalink] [raw]
Subject: Re: [PATCH 1/3 v3] dma: document dma_{un}map_{single|sg}_attrs() interface

On Wed, 2008-03-05 at 10:13 -0800, [email protected] wrote:
> On Fri, Feb 29, 2008 at 12:37:56PM -0600, James Bottomley wrote:
> > ....
> > To be honest, I still don't like the name. SYNC_ON_WRITE is the SN2
> > implementation. What it's actually doing is implementing strict
> > ordering semantics. I think it should really be
> > DMA_ATTR_STRICT_ORDERING (with a corresponding
> > DMA_ATTR_RELAXED_ORDERING)....
>
> I've been thinking about a new name, but don't like
> DMA_ATTR_STRICT_ORDERING.
>
> What I'm trying to do is to establish order (across
> a NUMA fabric) of DMA to different memory regions, i.e.,
> DMA to memory region A forces all outstanding DMA (to
> memory regions B, C,....) to complete first.
>
> DMA_ATTR_STRICT_ORDERING sounds like a PCI thing to me,
> and this is a NUMA interconnect thing.


It's not a matter of whether it's a PCI thing or a NUMA thing. The name
of the attribute needs to reflect the semantics of what the driver
perceives, along the entire path from the device including PCI and NUMA
and other bits.

How the attribute is implemented is up to the arch code, it can decide
if it sets attributes on the PHB or the device or in the NUMA
interconnect or whatever.

For a single mapping, it's fairly straight forward to define two obvious
cases:
- strict ordering, all accesses to this mapping are ordered.
- weak ordering (none, relaxed ..) - accesses to this mapping are not
ordered in anyway.

Where I think it gets confusing is the semantics between mappings with
different attributes. If I map region A weak and then region B strict,
does an access to region B have any effect on accesses to region A?

It sounds like in your case the answer is yes - but I don't know if
that's clearly the right answer.

/me scratches his head :)

cheers

--
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2008-03-12 01:21:32

by Arthur Kepner

[permalink] [raw]
Subject: Re: [PATCH 1/3 v3] dma: document dma_{un}map_{single|sg}_attrs() interface


I have a new proposal for the documentation portion of this
patchset. The code changes essentially amount to
s/DMA_ATTR_SYNC_ON_WRITE/DMA_ATTR_BARRIER/ so thought I'd send
just the doc change for comments now.

The description of the DMA_ATTR_BARRIER is now very short and
generic - no particular architecture is even mentioned. I can add
a sentence or two near the architecture-specific changes in
arch/ia64/sn/pci/pci_dma.c about why the implementation works
on ia64/sn, etc.

diff --git a/Documentation/DMA-attributes.txt b/Documentation/DMA-attributes.txt
index e69de29..a4106ec 100644
--- a/Documentation/DMA-attributes.txt
+++ b/Documentation/DMA-attributes.txt
@@ -0,0 +1,24 @@
+ DMA attributes
+ ==============
+
+This document describes the semantics of the DMA attributes that are
+defined in linux/dma-attrs.h.
+
+DMA_ATTR_BARRIER
+----------------
+
+DMA_ATTR_BARRIER is a barrier attribute for DMA. DMA to a memory
+region with the DMA_ATTR_BARRIER attribute forces all pending DMA
+writes to complete, and thus provides a mechanism to strictly order
+DMA from a device across all intervening busses and bridges. This
+barrier is not specific to a particular type of interconnect, it
+applies to the system as a whole, and so its implementation must
+account for the idiosyncracies of the system all the way from the
+DMA device to memory.
+
+As an example of a situation where DMA_ATTR_BARRIER would be useful,
+suppose that a device does a DMA write to indicate that data is ready
+and available in memory. The DMA of the "completion indication" could
+race with data DMA. Mapping the memory used for completion indications
+with DMA_ATTR_BARRIER would prevent the race.
+

2008-03-14 04:14:02

by Grant Grundler

[permalink] [raw]
Subject: Re: [PATCH 1/3 v3] dma: document dma_{un}map_{single|sg}_attrs() interface

On Tue, Mar 11, 2008 at 06:19:54PM -0700, [email protected] wrote:
>
> I have a new proposal for the documentation portion of this
> patchset. The code changes essentially amount to
> s/DMA_ATTR_SYNC_ON_WRITE/DMA_ATTR_BARRIER/ so thought I'd send
> just the doc change for comments now.
>
> The description of the DMA_ATTR_BARRIER is now very short and
> generic - no particular architecture is even mentioned. I can add
> a sentence or two near the architecture-specific changes in
> arch/ia64/sn/pci/pci_dma.c about why the implementation works
> on ia64/sn, etc.

Acked-by: Grant Grundler <[email protected]>

One part of me still wants some history in the doc. But I think
it's better to keep it short and let folks look for implementations
and make sure they are sufficiently well explained.

thanks,
grant

> diff --git a/Documentation/DMA-attributes.txt b/Documentation/DMA-attributes.txt
> index e69de29..a4106ec 100644
> --- a/Documentation/DMA-attributes.txt
> +++ b/Documentation/DMA-attributes.txt
> @@ -0,0 +1,24 @@
> + DMA attributes
> + ==============
> +
> +This document describes the semantics of the DMA attributes that are
> +defined in linux/dma-attrs.h.
> +
> +DMA_ATTR_BARRIER
> +----------------
> +
> +DMA_ATTR_BARRIER is a barrier attribute for DMA. DMA to a memory
> +region with the DMA_ATTR_BARRIER attribute forces all pending DMA
> +writes to complete, and thus provides a mechanism to strictly order
> +DMA from a device across all intervening busses and bridges. This
> +barrier is not specific to a particular type of interconnect, it
> +applies to the system as a whole, and so its implementation must
> +account for the idiosyncracies of the system all the way from the
> +DMA device to memory.
> +
> +As an example of a situation where DMA_ATTR_BARRIER would be useful,
> +suppose that a device does a DMA write to indicate that data is ready
> +and available in memory. The DMA of the "completion indication" could
> +race with data DMA. Mapping the memory used for completion indications
> +with DMA_ATTR_BARRIER would prevent the race.
> +

2008-03-14 04:32:18

by Michael Ellerman

[permalink] [raw]
Subject: Re: [PATCH 1/3 v3] dma: document dma_{un}map_{single|sg}_attrs() interface

On Tue, 2008-03-11 at 18:19 -0700, [email protected] wrote:
> I have a new proposal for the documentation portion of this
> patchset. The code changes essentially amount to
> s/DMA_ATTR_SYNC_ON_WRITE/DMA_ATTR_BARRIER/ so thought I'd send
> just the doc change for comments now.
>
> The description of the DMA_ATTR_BARRIER is now very short and
> generic - no particular architecture is even mentioned. I can add
> a sentence or two near the architecture-specific changes in
> arch/ia64/sn/pci/pci_dma.c about why the implementation works
> on ia64/sn, etc.

I like this better, sorry to keep nitpicking, but just a few comments
below ..

> diff --git a/Documentation/DMA-attributes.txt b/Documentation/DMA-attributes.txt
> index e69de29..a4106ec 100644
> --- a/Documentation/DMA-attributes.txt
> +++ b/Documentation/DMA-attributes.txt
> @@ -0,0 +1,24 @@
> + DMA attributes
> + ==============
> +
> +This document describes the semantics of the DMA attributes that are
> +defined in linux/dma-attrs.h.
> +
> +DMA_ATTR_BARRIER
> +----------------
> +
> +DMA_ATTR_BARRIER is a barrier attribute for DMA. DMA to a memory
> +region with the DMA_ATTR_BARRIER attribute forces all pending DMA
> +writes to complete, and thus provides a mechanism to strictly order
> +DMA from a device across all intervening busses and bridges. This
> +barrier is not specific to a particular type of interconnect, it
> +applies to the system as a whole, and so its implementation must
> +account for the idiosyncracies of the system all the way from the
> +DMA device to memory.

You say a "DMA to a memory region with the DMA_ATTR_BARRIER attribute
forces all pending DMA writes to complete". Does it force _all_ DMA
writes to complete, or just writes to the region, or just writes coming
from devices? What if something is writing to a device?

Does DMA_ATTR_BARRIER have any effect on reads?

cheers

--
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2008-03-14 05:23:30

by Grant Grundler

[permalink] [raw]
Subject: Re: [PATCH 1/3 v3] dma: document dma_{un}map_{single|sg}_attrs() interface

On Fri, Mar 14, 2008 at 03:30:29PM +1100, Michael Ellerman wrote:
...
> > +DMA_ATTR_BARRIER is a barrier attribute for DMA. DMA to a memory
> > +region with the DMA_ATTR_BARRIER attribute forces all pending DMA
> > +writes to complete, and thus provides a mechanism to strictly order
> > +DMA from a device across all intervening busses and bridges. This
> > +barrier is not specific to a particular type of interconnect, it
> > +applies to the system as a whole, and so its implementation must
> > +account for the idiosyncracies of the system all the way from the
> > +DMA device to memory.
>
> You say a "DMA to a memory region with the DMA_ATTR_BARRIER attribute
> forces all pending DMA writes to complete". Does it force _all_ DMA
> writes to complete, or just writes to the region, or just writes coming
> from devices?

Yes to "_all_ DMA writes". re-read the second sentence.

> What if something is writing to a device?

aka MMIO writes. Not defined and in general unrelated. AFAIK, no
version of PCI has ordering rules for traffic going in opposite
directions. I'm 99% sure about this but haven't reviewed any
PCI docs in about 2 years.

Note I think it's possible for DMA READs to complete out of order
given PCI-x (IIRC) and PCI-e (I'm certain) support split transactions.
This means a device can have multiple DMA reads in flight and the
chipset/system firmware (aka BIOS) have to enable and support this
functionality as well. I _believe_ those split transactions can
complete out of order (since they are tagged like SCSI/SATA are)
and I expect it's up to the device to deal with that already.

Devices which do not support or use more than one split transaction
obviously won't have any DMA READ ordering issues.

On a related note, I always think of MSI/MSI-X as DMA Writes.
Michael's got me thinking we need to explicitly state that.
In "normal" use, the device will not issue an MSI until after
the "completion DMA write" has been issued and thus the MSI/MSI-X
transactions are NOT subject to the ordering requirement...but
that's not exactly true. We don't want the MSI to ever
pass the "completion DMA write".

Do we need to state the platform interconnect can NOT allow
successive DMA writes to pass the ordered DMA writes?
Or state MSI DMA writes are implied to be to an "ordered DMA region"?
Both statements?


> Does DMA_ATTR_BARRIER have any effect on reads?

It could but it's not defined to. :)
I thought about this before but wanted to keep the document short.

Perhaps two things should be done to address this:
1) rename this to DMA_ATTR_WRITE_BARRIER
2) state it has no explicit effect on DMA Reads.

If someone needs that changed in the future, we can then
define how reading from a region will trigger ordering rules.

thanks,
grant

>
> cheers
>
> --
> Michael Ellerman
> OzLabs, IBM Australia Development Lab
>
> wwweb: http://michael.ellerman.id.au
> phone: +61 2 6212 1183 (tie line 70 21183)
>
> We do not inherit the earth from our ancestors,
> we borrow it from our children. - S.M.A.R.T Person

2008-03-14 16:41:33

by Jesse Barnes

[permalink] [raw]
Subject: Re: [PATCH 1/3 v3] dma: document dma_{un}map_{single|sg}_attrs() interface

On Thursday, March 13, 2008 10:21 pm Grant Grundler wrote:
> > What if something is writing to a device?
>
> aka MMIO writes. Not defined and in general unrelated. AFAIK, no
> version of PCI has ordering rules for traffic going in opposite
> directions. I'm 99% sure about this but haven't reviewed any
> PCI docs in about 2 years.

Yeah, afaik mmio traffic will be unaffected. It's travelling in the other
direction and subject to different ordering constraints.

> On a related note, I always think of MSI/MSI-X as DMA Writes.
> Michael's got me thinking we need to explicitly state that.
> In "normal" use, the device will not issue an MSI until after
> the "completion DMA write" has been issued and thus the MSI/MSI-X
> transactions are NOT subject to the ordering requirement...but
> that's not exactly true. We don't want the MSI to ever
> pass the "completion DMA write".
>
> Do we need to state the platform interconnect can NOT allow
> successive DMA writes to pass the ordered DMA writes?
> Or state MSI DMA writes are implied to be to an "ordered DMA region"?
> Both statements?

Yeah, I've always thought of MSIs the same way. The platform really should
ensure that MSIs have this barrier bit set (regular interrupts on Altix are
actually DMAs as well, and have the barrier bit set), either these docs or
the MSI docs should probably indicate as much. That could be done as a
separate patch though...

Jesse

2008-03-18 01:09:15

by Michael Ellerman

[permalink] [raw]
Subject: Re: [PATCH 1/3 v3] dma: document dma_{un}map_{single|sg}_attrs() interface

On Thu, 2008-03-13 at 23:21 -0600, Grant Grundler wrote:
> On Fri, Mar 14, 2008 at 03:30:29PM +1100, Michael Ellerman wrote:
> ...
> > > +DMA_ATTR_BARRIER is a barrier attribute for DMA. DMA to a memory
> > > +region with the DMA_ATTR_BARRIER attribute forces all pending DMA
> > > +writes to complete, and thus provides a mechanism to strictly order
> > > +DMA from a device across all intervening busses and bridges. This
> > > +barrier is not specific to a particular type of interconnect, it
> > > +applies to the system as a whole, and so its implementation must
> > > +account for the idiosyncracies of the system all the way from the
> > > +DMA device to memory.
> >
> > You say a "DMA to a memory region with the DMA_ATTR_BARRIER attribute
> > forces all pending DMA writes to complete". Does it force _all_ DMA
> > writes to complete, or just writes to the region, or just writes coming
> > from devices?
>
> Yes to "_all_ DMA writes". re-read the second sentence.

Yeah, just checking.

> > What if something is writing to a device?
>
> aka MMIO writes. Not defined and in general unrelated. AFAIK, no
> version of PCI has ordering rules for traffic going in opposite
> directions. I'm 99% sure about this but haven't reviewed any
> PCI docs in about 2 years.

I'll take you word for it :) But there's also non-PCI things between
the device and the cpu, so anything's potentially possible.

> On a related note, I always think of MSI/MSI-X as DMA Writes.
> Michael's got me thinking we need to explicitly state that.
> In "normal" use, the device will not issue an MSI until after
> the "completion DMA write" has been issued and thus the MSI/MSI-X
> transactions are NOT subject to the ordering requirement...but
> that's not exactly true. We don't want the MSI to ever
> pass the "completion DMA write".
>
> Do we need to state the platform interconnect can NOT allow
> successive DMA writes to pass the ordered DMA writes?
> Or state MSI DMA writes are implied to be to an "ordered DMA region"?
> Both statements?

I think the 2nd at least. If MSIs can pass the DMAs they're signalling
completion for then they're no better than LSI, the driver will still
need to go check the DMA has finished somehow.

> > Does DMA_ATTR_BARRIER have any effect on reads?
>
> It could but it's not defined to. :)
> I thought about this before but wanted to keep the document short.
>
> Perhaps two things should be done to address this:
> 1) rename this to DMA_ATTR_WRITE_BARRIER
> 2) state it has no explicit effect on DMA Reads.
>
> If someone needs that changed in the future, we can then
> define how reading from a region will trigger ordering rules.

I'd vote for 1), that way if someone comes up with a semantic for
read/write/both we will have sensible flag names.

cheers

--
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2008-03-20 00:42:36

by Arthur Kepner

[permalink] [raw]
Subject: Re: [PATCH 1/3 v3] dma: document dma_{un}map_{single|sg}_attrs() interface

On Fri, Mar 14, 2008 at 03:30:29PM +1100, Michael Ellerman wrote:
>
> I like this better, sorry to keep nitpicking, but just a few comments
> below ..

Comments are appreciated.

> ...
> You say a "DMA to a memory region with the DMA_ATTR_BARRIER attribute
> forces all pending DMA writes to complete". Does it force _all_ DMA
> writes to complete, or just writes to the region, or just writes coming
> from devices? What if something is writing to a device?
>

Yes, forces all writes to complete (as others have already said).


> Does DMA_ATTR_BARRIER have any effect on reads?
>

No, changed the name to DMA_ATTR_WRITE_BARRIER to emphasize this.

--
Arthur