2007-12-18 00:31:27

by Arthur Kepner

Subject: [RFC] dma: passing "attributes" to dma_map_* routines


Waaay back in October I sent some patches for passing additional
attributes to the dma_map_* routines:

http://marc.info/?l=linux-kernel&m=119137949604365&w=2

This is something needed for ia64 Altix NUMA machines (as described
in that thread).

Several folks objected to the approach I used - it was a bit of a
hack - and so these patches were dropped.

This time I'd like to get something resembling consensus before
spending much time on this.

Can you please comment on the options listed here, or suggest
alternatives?

We could:

1) add an additional parameter to the dma_map_* routines, something
like:

dma_map_single(struct device *dev, void *cpu_addr, size_t size,
enum dma_data_direction direction, u32 flags)
....

2) change the "direction" argument to be a field of bits:

dma_map_single(struct device *dev, void *cpu_addr, size_t size,
u32 flags)
....

where flags encodes the direction and optionally other attributes.

3) add new functions to the dma interface, e.g.:

dma_map_single_attrs(struct device *dev, void *cpu_addr, size_t size,
enum dma_data_direction direction, u32 attrs)
.....

1) and 2) are going to require many minor changes, and 3) is going
to pollute the dma interface a bit.

Other ideas?

--
Arthur


2007-12-18 16:51:25

by Stefan Richter

Subject: Re: [RFC] dma: passing "attributes" to dma_map_* routines

[email protected] wrote:
> Waaay back in October I sent some patches for passing additional
> attributes to the dma_map_* routines:
>
> http://marc.info/?l=linux-kernel&m=119137949604365&w=2

Do I understand correctly?: A device and the CPUs communicate via two
separate memory areas: A data buffer and a status FIFO. The NUMA
interconnect may reorder accesses of the device to the areas. (Write
accesses? Read accesses? Both?)

To ensure synchronization between device and CPUs, you want to mark a
memory area which is to be dma-mapped with a flag which says: "Writes
to the memory region will cause in-flight DMA to be flushed". Whose
writes? A write access from the device or a write access from a CPU?
--
Stefan Richter
-=====-=-=== ==-- =--=-
http://arcgraph.de/sr/

2007-12-18 19:05:07

by Roland Dreier

Subject: Re: [RFC] dma: passing "attributes" to dma_map_* routines

> > Waaay back in October I sent some patches for passing additional
> > attributes to the dma_map_* routines:
> >
> > http://marc.info/?l=linux-kernel&m=119137949604365&w=2
>
> Do I understand correctly?: A device and the CPUs communicate via two
> separate memory areas: A data buffer and a status FIFO. The NUMA
> interconnect may reorder accesses of the device to the areas. (Write
> accesses? Read accesses? Both?)

Yes, that is correct. In-flight DMA writes from the device to system
memory can pass each other, so that the CPU sees a "complete" status
in the completion queue before the operation in question has actually
finished updating CPU memory.

> To ensure synchronization between device and CPUs, you want to mark a
> memory area which is to be dma-mapped with a flag which says: "Writes
> to the memory region will cause in-flight DMA to be flushed". Whose
> writes? A write access from the device or a write access from a CPU?

Write access from the device. The idea is to set the flag on the
completion queue, so that when the device writes the "complete" status
for the operation, all DMAs associated with that operation are flushed
first.

- R.

2007-12-18 20:09:37

by Arthur Kepner

Subject: Re: [RFC] dma: passing "attributes" to dma_map_* routines

On Tue, Dec 18, 2007 at 05:50:42PM +0100, Stefan Richter wrote:

> Do I understand correctly?: A device and the CPUs communicate via two
> separate memory areas: A data buffer and a status FIFO. The NUMA
> interconnect may reorder accesses of the device to the areas. (Write
> accesses? Read accesses? Both?)

Yes, I think you understand. Reorderings are possible on reads and
writes. Things get synced up by either an interrupt or a write to
a memory region with a "barrier attribute". Memory allocated with
dma_alloc_coherent() gets the barrier attribute. The idea here is
to allow memory allocated with plain old malloc() or whatever to
get the same attribute.

>
> To ensure synchronization between device and CPUs, you want to mark a
> memory area which is to be dma-mapped with a flag which says: "Writes
> to the memory region will cause in-flight DMA to be flushed". Whose
> writes? A write access from the device or a write access from a CPU?

A write from the device, e.g., when the device writes to indicate
"data DMA is complete".

--
Arthur

2007-12-18 21:00:30

by Stefan Richter

Subject: Re: [RFC] dma: passing "attributes" to dma_map_* routines

[email protected] wrote:
> Reorderings are possible on reads and
> writes. Things get synced up by either an interrupt or a write to
> a memory region with a "barrier attribute". Memory allocated with
> dma_alloc_coherent() gets the barrier attribute. The idea here is
> to allow memory allocated with plain old malloc() or whatever to
> get the same attribute.

From its purpose it sounds like you need this only for a few special
memory regions which would typically be mapped by dma_map_single(), and
furthermore that drivers which need this behavior will be changed to
explicitly demand it. If so, a nonintrusive API extension could simply
be to add a

dma_addr_t dma_map_single_write_last(struct device *dev, void *ptr,
size_t size, enum dma_data_direction direction);

or whatever you'd like to call it. (This DMA-maps a memory area for
which it is guaranteed that, of all DMA writes pending at any time, a
DMA-reordering interconnect --- if such an interconnect is present ---
will perform DMAs to other areas first and to this area last. On machines
which don't reorder DMAs, this function is the same as dma_map_single().)

However, your older patch series looks like you want this behavior also
in areas which are mapped by dma_map_sg(), do you?. Still, adding two
functions of the kind like above, if necessary, might still be
preferable to changing the call parameters of existing functions or to
overloading enum dma_data_direction.

So that would be option 3) of yours, though without your attrs
parameter. Do you expect the need for even more flags for other kinds
of special behavior?
--
Stefan Richter
-=====-=-=== ==-- =--=-
http://arcgraph.de/sr/

2007-12-18 21:09:59

by Roland Dreier

Subject: Re: [RFC] dma: passing "attributes" to dma_map_* routines

> However, your older patch series looks like you want this behavior also
> in areas which are mapped by dma_map_sg(), do you?. Still, adding two
> functions of the kind like above, if necessary, might still be
> preferable to changing the call parameters of existing functions or to
> overloading enum dma_data_direction.

Yes, the _sg variants are needed, because we want to map userspace
regions with this attribute and there's of course no guarantee that
userspace memory is contiguous.

- R.

2007-12-20 18:53:37

by Arthur Kepner

Subject: Re: [RFC] dma: passing "attributes" to dma_map_* routines

On Tue, Dec 18, 2007 at 09:59:24PM +0100, Stefan Richter wrote:

> ....
> So that would be option 3) of yours, though without your attrs
> parameter. Do you expect the need for even more flags for other kinds
> of special behavior?

I was hoping to keep the option of adding additional
flags, but for now there's no obvious need for other
flags that I'm aware of.

--
Arthur

2007-12-20 19:08:19

by Arthur Kepner

Subject: Re: [RFC] dma: passing "attributes" to dma_map_* routines

On Tue, Dec 18, 2007 at 09:59:24PM +0100, Stefan Richter wrote:
>
> From its purpose it sounds like you need this only for a few special
> memory regions which would typically be mapped by dma_map_single()

We need the _sg versions too, as Roland already mentioned.

> .... and
> furthermore that drivers which need this behavior will be changed to
> explicitly demand it. If so, a nonintrusive API extension could simply
> be to add an
>
> dma_addr_t dma_map_single_write_last(struct device *dev, void *ptr,
> size_t size, enum dma_data_direction direction);
> ...

This is the easiest thing to do, and therefore it'd be my
preference. But I'm concerned that the keepers of the dma
interface will object to this. So far they've been silent
in this thread - maybe they need to see a patch before
they'll get engaged....

--
Arthur

2007-12-20 20:58:29

by Benjamin Herrenschmidt

Subject: Re: [RFC] dma: passing "attributes" to dma_map_* routines


On Tue, 2007-12-18 at 12:07 -0800, [email protected] wrote:
> On Tue, Dec 18, 2007 at 05:50:42PM +0100, Stefan Richter wrote:
>
> > Do I understand correctly?: A device and the CPUs communicate via two
> > separate memory areas: A data buffer and a status FIFO. The NUMA
> > interconnect may reorder accesses of the device to the areas. (Write
> > accesses? Read accesses? Both?)
>
> Yes, I think you understand. Reorderings are possible on reads and
> writes. Things get synced up by either an interrupt or a write to
> a memory region with a "barrier attribute". Memory allocated with
> dma_alloc_coherent() gets the barrier attribute. The idea here is
> to allow memory allocated with plain old malloc() or whatever to
> get the same attribute.
>
> >
> > To ensure synchronization between device and CPUs, you want to mark a
> > memory area which is to be dma-mapped with a flag which says: "Writes
> > to the memory region will cause in-flight DMA to be flushed". Whose
> > writes? A write access from the device or a write access from a CPU?
>
> A write from the device, e.g., when the device writes to indicate
> "data DMA is complete".

Can't you just have a primitive to sync things up that you call
explicitly from your driver after fetching a new status entry?

Ben.

2007-12-21 18:02:01

by Arthur Kepner

Subject: Re: [RFC] dma: passing "attributes" to dma_map_* routines

On Fri, Dec 21, 2007 at 07:56:25AM +1100, Benjamin Herrenschmidt wrote:
> ...
> Can't you just have a primitive to sync things up that you call
> explicitly from your driver after fetching a new status entry?
>

Well, the only mechanisms I know to get things synced are the ones
I mentioned before: 1) generate an interrupt, 2) write to memory
which has the "barrier" attribute. Obviously 1 is out - giving
the memory used for status indications the barrier attribute is
the most primitive means I'm aware of.

--
Arthur

2007-12-21 20:25:18

by Stefan Richter

Subject: Re: [RFC] dma: passing "attributes" to dma_map_* routines

[email protected] wrote:
> On Fri, Dec 21, 2007 at 07:56:25AM +1100, Benjamin Herrenschmidt wrote:
>> ...
>> Can't you just have a primitive to sync things up that you call
>> explicitly from your driver after fetching a new status entry?
>
> Well, the only mechanisms I know to get things synced are the ones
> I mentioned before: 1) generate an interrupt, 2) write to memory
> which has the "barrier" attribute. Obviously 1 is out - giving
> the memory used for status indications the barrier attribute is
> the most primitive means I'm aware of.

I don't know what it is in detail what Arthur et al are facing, but this
is how it looks to me:

A sync call like Benjamin suggests has two preconditions.

1. The interconnect hardware actually offers such a functionality.
I.e. there is a way for the CPU to request flushing of pending DMAs, and
the interconnect is able to generate an interrupt to signal when it is done.

2. The protocols which the devices use are suitable for this. I.e.
there are no race conditions during the window between when the DMAs
were flushed and when the CPU has finished reading the status memory area.

Even _if_ both are true, a regime with such a call would obviously be
more complex and might be harder to get to perform well, compared to the
regime where the interconnect hardware is told beforehand that some
particular DMA writes must never be reordered relative to earlier and
later DMA writes.
--
Stefan Richter
-=====-=-=== ==-- =-=-=
http://arcgraph.de/sr/

2007-12-21 21:32:31

by Benjamin Herrenschmidt

Subject: Re: [RFC] dma: passing "attributes" to dma_map_* routines


On Fri, 2007-12-21 at 10:00 -0800, [email protected] wrote:
> On Fri, Dec 21, 2007 at 07:56:25AM +1100, Benjamin Herrenschmidt wrote:
> > ...
> > Can't you just have a primitive to sync things up that you call
> > explicitly from your driver after fetching a new status entry?
> >
>
> Well, the only mechanisms I know to get things synced are the ones
> I mentioned before: 1) generate an interrupt, 2) write to memory
> which has the "barrier" attribute. Obviously 1 is out - giving
> the memory used for status indications the barrier attribute is
> the most primitive means I'm aware of.

Well, I'm not totally against turning "direction" into a flag mask, as
I do in fact have requests to do something similar on some PowerPCs,
in order to control the ordering guarantees of a given DMA mapping
(i.e. relaxed vs. fully ordered).

I'm just worried that we'll end up with as many semantics for those
flags as we have host bridges & archs around, which would be bad.

Ben.