LinuxLists.cc - [RFC] generic device DMA implementation

Subject: Re: [RFC] generic device DMA implementation

You should update Documentation/DMA-mapping.txt too :)

2002-12-04 19:28:50

Subject: Re: [RFC] generic device DMA implementation

[email protected] said:
> You should update Documentation/DMA-mapping.txt too :)

Oh, yes, and convert all the other arches too. That's on my list of things
to do (and will be there when I post a patch). I was just throwing out a
request for comments to see what turned up.

James

2002-12-04 21:12:31

Subject: Re: [RFC] generic device DMA implementation

James Bottomley writes:
> Currently our only DMA API is highly PCI specific (making any non-pci
> bus with a DMA controller create fake PCI devices to help it
> function).
>
> Now that we have the generic device model, it should be equally
> possible to rephrase the entire API for generic devices instead of
> pci_devs.

Keep in mind that sometimes the actual _implementation_ is also highly
PCI-specific -- that is, what works for PCI devices may not work for
other devices and vice-versa.

So perhaps instead of just replacing `pci_...' with `dma_...', it would
be better to add new function pointers to `struct bus_type' for all this
stuff (or something like that).
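To make the `struct bus_type' suggestion concrete, here is a minimal user-space sketch of per-bus DMA function pointers. Every name in it (struct bus_dma_ops, the dma_ops field, the toy "pci" allocator) is hypothetical and only illustrates the dispatch pattern; it is not the API from the patch under discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch: none of these names are a real kernel API. */
typedef uint64_t dma_addr_t;

struct device;

/* Per-bus table of DMA operations, hung off struct bus_type. */
struct bus_dma_ops {
	void *(*alloc_consistent)(struct device *dev, size_t size, dma_addr_t *handle);
	void (*free_consistent)(struct device *dev, size_t size, void *vaddr, dma_addr_t handle);
};

struct bus_type {
	const char *name;
	const struct bus_dma_ops *dma_ops; /* NULL if the bus cannot do DMA */
};

struct device {
	struct bus_type *bus;
};

/* Generic entry point: dispatch through the device's bus, or fail. */
static void *dma_alloc_consistent(struct device *dev, size_t size, dma_addr_t *handle)
{
	assert(dev != NULL && dev->bus != NULL);
	if (dev->bus->dma_ops == NULL || dev->bus->dma_ops->alloc_consistent == NULL)
		return NULL;
	return dev->bus->dma_ops->alloc_consistent(dev, size, handle);
}

static void dma_free_consistent(struct device *dev, size_t size, void *vaddr, dma_addr_t handle)
{
	if (dev->bus->dma_ops && dev->bus->dma_ops->free_consistent)
		dev->bus->dma_ops->free_consistent(dev, size, vaddr, handle);
}

/* Toy "PCI" backend: malloc, with the CPU address doubling as bus address. */
static void *toy_pci_alloc(struct device *dev, size_t size, dma_addr_t *handle)
{
	(void)dev;
	void *p = malloc(size);
	if (p)
		*handle = (dma_addr_t)(uintptr_t)p;
	return p;
}

static void toy_pci_free(struct device *dev, size_t size, void *vaddr, dma_addr_t handle)
{
	(void)dev; (void)size; (void)handle;
	free(vaddr);
}

static const struct bus_dma_ops toy_pci_dma_ops = { toy_pci_alloc, toy_pci_free };
static struct bus_type toy_pci_bus = { "pci", &toy_pci_dma_ops };
static struct bus_type toy_nodma_bus = { "nodma", NULL };
```

The cost of this design is one extra indirection per call; James's reply argues the same variation can instead live entirely inside the platform implementation.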

> The PCI api has pci_alloc_consistent which allocates only consistent memory
> and fails the allocation if none is available thus leading to driver writers
> who might need to function with inconsistent memory to detect this and employ
> a fallback strategy.
> ...
> The idea is that the memory type can be coded into dma_addr_t which the
> subsequent memory sync operations can use to determine whether
> wback/invalidate should be a nop or not.

How is the driver supposed to tell whether a given dma_addr_t value
represents consistent memory or not? It seems like an (arch-specific)
`dma_addr_is_consistent' function is necessary, but I couldn't see one
in your patch.

Thanks,

-Miles
--
We are all lying in the gutter, but some of us are looking at the stars.
-Oscar Wilde

2002-12-04 21:14:18

Subject: Re: [RFC] generic device DMA implementation

2002-12-04 21:39:29

Subject: Re: [RFC] generic device DMA implementation

[email protected] said:
> How is the driver supposed to tell whether a given dma_addr_t value
> represents consistent memory or not? It seems like an (arch-specific)
> `dma_addr_is_consistent' function is necessary, but I couldn't see one
> in your patch.

Well, the patch was only for x86, which is fully consistent. For parisc, that
becomes a field for the dma accessor functions.

However, even on parisc, the (supported) machines are either entirely
consistent or entirely inconsistent.

If you have a machine that has both consistent and inconsistent blocks, you
need to encode that in dma_addr_t (which is a platform definable type).

The sync functions would just decode the type and either nop or perform the
sync.
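One way to picture "encode the type in dma_addr_t, decode it in the sync function" is the user-space sketch below. It assumes a 64-bit dma_addr_t whose top bit is free to act as a "consistent" tag; that bit choice, the helper names, and the flush counter are all illustrative assumptions, not part of any real platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the top bit of a 64-bit dma_addr_t is unused by the bus and
 * can mark consistent memory. Purely illustrative. */
typedef uint64_t dma_addr_t;

#define DMA_ADDR_CONSISTENT_BIT ((dma_addr_t)1 << 63)

static unsigned long cache_flushes; /* counts simulated writebacks */

static dma_addr_t dma_encode(uint64_t bus_addr, int consistent)
{
	return consistent ? (bus_addr | DMA_ADDR_CONSISTENT_BIT) : bus_addr;
}

static int dma_addr_is_consistent(dma_addr_t addr)
{
	return (addr & DMA_ADDR_CONSISTENT_BIT) != 0;
}

/* Strip the tag to get the address the device actually uses. */
static uint64_t dma_bus_address(dma_addr_t addr)
{
	return addr & ~DMA_ADDR_CONSISTENT_BIT;
}

/* Stand-in for a platform cache writeback over [bus, bus + len). */
static void platform_cache_wback(uint64_t bus, size_t len)
{
	(void)bus; (void)len;
	cache_flushes++;
}

/* The sync decodes the tag: no-op for consistent memory, flush otherwise. */
static void dma_sync_single(dma_addr_t addr, size_t len)
{
	if (dma_addr_is_consistent(addr))
		return;
	platform_cache_wback(dma_bus_address(addr), len);
}
```

A real platform would pick whatever encoding its bus addresses leave room for; the point is only that the driver-visible sync call stays the same either way.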

James

2002-12-04 21:35:13

Subject: Re: [RFC] generic device DMA implementation

[email protected] said:
> Keep in mind that sometimes the actual _implementation_ is also highly
> PCI-specific -- that is, what works for PCI devices may not work for
> other devices and vice-versa.

> So perhaps instead of just replacing `pci_...' with `dma_...', it
> would be better to add new function pointers to `struct bus_type' for
> all this stuff (or something like that).

Not really, that can all be taken care of in the platform implementation.

The parisc implementation has exactly that problem. The platform
implementation uses the generic device platform_data to cache the iommu
accessor methods (it actually finds the iommu by walking up the device parents
until it gets to the iommu driver--which means it needs to walk off the PCI
bus).

In general, the generic device already has enough information that the
platform implementation can be highly bus specific---and, of course, once you
know exactly what bus it's on, you can cast it to the bus device if you want.
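The parent walk described above can be sketched roughly as follows. The structure layout, the iommu field, and dma_find_iommu() are hypothetical stand-ins for the parisc code; the sketch only shows the idea of walking up the device tree once and caching the result in platform_data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model; names are illustrative, not the parisc implementation. */
struct iommu_ops {
	const char *name;
};

struct device {
	struct device *parent;
	const struct iommu_ops *iommu; /* set only on the IOMMU's own node */
	void *platform_data;           /* caches the accessors once found */
};

/* Walk up the parents (possibly off the PCI bus) until an IOMMU is
 * found, then cache it so later DMA calls skip the walk. */
static const struct iommu_ops *dma_find_iommu(struct device *dev)
{
	struct device *d;

	assert(dev != NULL);
	if (dev->platform_data)
		return dev->platform_data;
	for (d = dev->parent; d; d = d->parent) {
		if (d->iommu) {
			dev->platform_data = (void *)d->iommu;
			return d->iommu;
		}
	}
	return NULL; /* no IOMMU above this device */
}
```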

James

2002-12-05 00:37:26

Subject: Re: [RFC] generic device DMA implementation

On 2002-12-04, James Bottomley wrote:

>Now that we have the generic device model, it should be equally possible to
>rephrase the entire [DMA] API for generic devices instead of pci_devs.

Yes. This issue has come up repeatedly. I'd really like to
see a change like yours integrated soon to stop the spread of fake PCI
devices (including the pcidev==NULL convention) and other contortions
being used to work around this. Also, such a change would enable
consolidation of certain memory allocations and their often buggy
error branches from hundreds of drivers into a few places.

As you know, I posted a similar patch that created a new field
in struct bus_type, as Miles Bader suggested just now, although only
for {alloc,free}_consistent. If the bus-specific variation can be
confined to some smaller part of these routines or eliminated, then
I'm all in favor of skipping the extra indirection and going with your
approach. It will be interesting to see if your model allows most of
the sbus_ and pci_ DMA mapping routines in sparc to be merged. I
suspect that you will have to adopt some kind of convention, such as
that device->parent->driver_private will have a common meaning for pci
and sbus devices on that platform.

>The new DMA API allows a driver to advertise its level of consistent memory
>compliance to dma_alloc_consistent. There are essentially two levels:
>
>- I only work with consistent memory, fail if I cannot get it, or
>- I can work with inconsistent memory, try consistent first but return
>inconsistent if it's not available.

If these routines can allocate non-consistent memory, then how
about renaming them to something less misleading, like dma_{malloc,free}?

Can you please define the "consistency" argument to these
two routines as a bit mask? There are probably other kinds of memory
inconsistency a driver might be able to accommodate in the future (CPU
read caching, CPU writeback, inconsistency across multiple CPUs if the
driver knows that it is only going to run on one CPU). I think 0
should be the "most consistent" kind of memory. That way, DMA memory
allocators could ignore bits that they don't know about, as those bits
would only advertise extra capabilities of a driver. I think this
extensibility is more useful than the debugging value of
DMA_CONFORMANCE_NONE.
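A rough sketch of the bit-mask idea: 0 requests the most consistent memory, each set bit advertises one kind of inconsistency the driver tolerates, and an allocator masks off bits it does not understand. The flag names and the "platform knows two bits" setup are hypothetical.

```c
#include <assert.h>

/* Hypothetical flags: 0 = "I need fully consistent memory"; each set bit
 * advertises one kind of inconsistency the driver can tolerate. */
#define DMA_TOLERATES_READ_CACHING 0x1u
#define DMA_TOLERATES_WRITEBACK    0x2u
#define DMA_TOLERATES_CROSS_CPU    0x4u

/* The kinds of inconsistency this imaginary platform knows about. */
#define PLATFORM_KNOWN_BITS (DMA_TOLERATES_READ_CACHING | DMA_TOLERATES_WRITEBACK)

/* Unknown bits are dropped. That is always safe, because a bit only ever
 * widens what the driver accepts; ignoring it yields stricter memory. */
static unsigned int dma_effective_flags(unsigned int driver_flags)
{
	return driver_flags & PLATFORM_KNOWN_BITS;
}

/* Nonzero if the allocator may hand this driver write-back cached memory. */
static int dma_may_return_writeback(unsigned int driver_flags)
{
	return (dma_effective_flags(driver_flags) & DMA_TOLERATES_WRITEBACK) != 0;
}
```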

>The idea is that the memory type can be coded into dma_addr_t which the
>subsequent memory sync operations can use to determine whether
>wback/invalidate should be a nop or not.

Your patch does not have to wait for this, but I would like
macros like {r,w}mb_maybe(dma_addr, len) that would compile to nothing
on machines where dma_malloc always returned consistent memory,
compile to your proposed range checking versions on machines that
could return consistent or inconsistent memory, and compile to
dma_cache_wback and rmb(?) on machines that always returned
inconsistent memory. The existing dma_cache_wback routines would
still never do the range checks, because they would continue to be
used only in cases where the need for flushing is known at compile
time (they would always compile to either the barrier code or nothing).
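The three compile-time variants of the proposed macro might look like the sketch below. DMA_CONSISTENCY_MODEL, the fixed consistent address range, and the stand-in dma_cache_wback() counter are assumptions made so the example runs in user space; only the write-side macro is shown, and rmb_maybe() would be analogous.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-platform choice:
 *   0 = DMA memory always consistent -> wmb_maybe() compiles to nothing
 *   1 = may be either                -> range check at run time
 *   2 = never consistent             -> unconditional writeback */
#ifndef DMA_CONSISTENCY_MODEL
#define DMA_CONSISTENCY_MODEL 1
#endif

static unsigned long writebacks; /* counts simulated writebacks */

/* Stand-in for the platform's real cache writeback primitive. */
static void dma_cache_wback(uintptr_t addr, size_t len)
{
	(void)addr; (void)len;
	writebacks++;
}

/* Illustrative consistent region [consistent_lo, consistent_hi). */
static const uintptr_t consistent_lo = 0x100000, consistent_hi = 0x300000;

static int addr_is_consistent(uintptr_t addr, size_t len)
{
	return addr >= consistent_lo && addr + len <= consistent_hi;
}

#if DMA_CONSISTENCY_MODEL == 0
#define wmb_maybe(addr, len) do { } while (0)
#elif DMA_CONSISTENCY_MODEL == 1
#define wmb_maybe(addr, len) \
	do { if (!addr_is_consistent((addr), (len))) dma_cache_wback((addr), (len)); } while (0)
#else
#define wmb_maybe(addr, len) dma_cache_wback((addr), (len))
#endif
```

Only model 1 pays for the range check; the other two resolve at compile time, which is the property being asked for.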

Also something that could be added later is a
bus_type.mem_mapped flag so that these DMA routines could do:

BUG_ON(!dev->bus->mem_mapped);

...to catch attempts to allocate memory for devices that are
not mapped. Alternatively, we could have a struct mem_device that
embeds a struct device and represents only those types of devices
that can be mapped into memory.
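A small sketch of the suggested check, assuming a hypothetical mem_mapped field on struct bus_type (it is only a proposal here). Because BUG_ON() would halt a user-space test, the sketch returns NULL where the kernel version would trip the BUG_ON().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical: mem_mapped is the proposed field, not an existing member. */
struct bus_type {
	const char *name;
	int mem_mapped; /* can devices on this bus DMA into system memory? */
};

struct device {
	struct bus_type *bus;
};

static void *dma_alloc_checked(struct device *dev, size_t size)
{
	/* Kernel version would be: BUG_ON(!dev->bus->mem_mapped); */
	if (!dev->bus->mem_mapped)
		return NULL;
	return malloc(size);
}
```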

It is also possible that we might want to add a field to
struct device identifying the memory mapped "parent" of a
non-memory-mapped device, such as the PCI-based USB host adapter of a
USB network device so that mapping of network packets for transmission
could be centralized. That's probably a separate patch though.

P.S., Did you miss a patch for include/linux/device.h adding
device.dma_mask, or is that change already queued for 2.5.51?

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."

2002-12-05 00:40:18

Subject: Re: [RFC] generic device DMA implementation

On Wed, Dec 04, 2002 at 11:47:14AM -0600, James Bottomley wrote:
> Currently our only DMA API is highly PCI specific (making any non-pci bus with
> a DMA controller create fake PCI devices to help it function).
>
> Now that we have the generic device model, it should be equally possible to
> rephrase the entire API for generic devices instead of pci_devs.
>
> This patch does just that (for x86---although I also have working code for
> parisc, that's where I actually tested the DMA capability).
>
> The API is substantially the same as the PCI DMA one, with one important
> exception with regard to consistent memory:
>
> The PCI api has pci_alloc_consistent which allocates only consistent memory
> and fails the allocation if none is available thus leading to driver writers
> who might need to function with inconsistent memory to detect this and employ
> a fallback strategy.
>
> The new DMA API allows a driver to advertise its level of consistent memory
> compliance to dma_alloc_consistent. There are essentially two levels:
>
> - I only work with consistent memory, fail if I cannot get it, or
> - I can work with inconsistent memory, try consistent first but return
> inconsistent if it's not available.

Do you have an example of where the second option is useful? Off hand
the only places I can think of where you'd use a consistent_alloc()
rather than map_single() and friends is in cases where the hardware's
behaviour means you absolutely positively have to have consistent
memory.

--
David Gibson | For every complex problem there is a
[email protected] | solution which is simple, neat and
| wrong.
http://www.ozlabs.org/people/dgibson

2002-12-05 00:47:59

Subject: Re: [RFC] generic device DMA implementation

David Gibson wrote:
> On Wed, Dec 04, 2002 at 11:47:14AM -0600, James Bottomley wrote:
>>The new DMA API allows a driver to advertise its level of consistent memory
>>compliance to dma_alloc_consistent. There are essentially two levels:
>>
>>- I only work with consistent memory, fail if I cannot get it, or
>>- I can work with inconsistent memory, try consistent first but return
>>inconsistent if it's not available.
>
>
> Do you have an example of where the second option is useful? Off hand
> the only places I can think of where you'd use a consistent_alloc()
> rather than map_single() and friends is in cases where the hardware's
> behaviour means you absolutely positively have to have consistent
> memory.

Agreed, good catch. Returning inconsistent memory when you asked for
consistent does not make much sense: the programmer either knows what the
hardware wants, or the programmer is silly and should not be using
alloc_consistent anyway.

2002-12-05 00:48:52

Subject: Re: [RFC] generic device DMA implementation

Adam J. Richter wrote:
> On 2002-12-04, James Bottomley wrote:
>
>
>>Now that we have the generic device model, it should be equally possible to
>>rephrase the entire [DMA] API for generic devices instead of pci_devs.
>
>
> Yes. This issue has come up repeatedly. I'd really like to
> see a change like yours integrated soon to stop the spread of fake PCI
> devices (including the pcidev==NULL convention) and other contortions
> being used to work around this. Also, such a change would enable
> consolidation of certain memory allocations and their often buggy
> error branches from hundreds of drivers into a few places.

Agreed. I'm glad James is doing this work, it will clean up a lot of
assumptions and corner-case-uglies...

Jeff

2002-12-05 01:15:03

Subject: Re: [RFC] generic device DMA implementation

David Gibson wrote:
>On Wed, Dec 04, 2002 at 11:47:14AM -0600, James Bottomley wrote:
[...]
>> The new DMA API allows a driver to advertise its level of consistent memory
>> compliance to dma_alloc_consistent. There are essentially two levels:
>>
>> - I only work with consistent memory, fail if I cannot get it, or
>> - I can work with inconsistent memory, try consistent first but return
>> inconsistent if it's not available.
>
>Do you have an example of where the second option is useful?

From a previous discussion, I understand that there are some
PCI bus parisc machines without consistent memory.

>Off hand
>the only places I can think of where you'd use a consistent_alloc()
>rather than map_single() and friends is in cases where the hardware's
>behaviour means you absolutely positively have to have consistent
>memory.

That would result in big, rarely used branches in device
drivers or lots of ifdef's and the equivalent. With James's approach,
porting a driver to support those parisc machines (for example) would
involve sprinkling in some calls to macros that would compile to
nothing on the other machines.

Compare the code clutter involved in allowing those
inconsistent parisc machines to run, say, the ten most popular
ethernet controllers and the four most popular scsi controllers. I
think the difference in the resulting source code size would already
be in the hundreds of lines.

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."

2002-12-05 01:36:49

Subject: Re: [RFC] generic device DMA implementation

[email protected] said:
> Do you have an example of where the second option is useful? Off hand
> the only places I can think of where you'd use a consistent_alloc()
> rather than map_single() and friends is in cases where the hardware's
> behaviour means you absolutely positively have to have consistent
> memory.

Well, it comes from parisc drivers. Here you'd really rather have consistent
memory because it's more efficient, but on certain platforms it's just not
possible.

In the drivers that do this, it leads to this type of awfulness:

consistent = 1;
if (!(mem = pci_alloc_consistent(...))) {
	mem = __get_free_pages(...);
	dma = pci_map_single(...);
	consistent = 0;
}
....
if (!consistent)
	dma_cache_wback(...);

etc.

The idea is that this translates to

mem = dma_alloc_consistent(... DMA_CONFORMANCE_NON_CONSISTENT)

...

dma_sync_single(mem..)

Where if you have consistent memory then the sync is a nop.
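The driver-side effect can be modelled in user space as below. The enum value names come from this thread, but dma_alloc(), struct dma_buf, and the platform_has_consistent switch are hypothetical; the sketch only shows that one allocation call plus unconditional sync points replaces the two-branch code above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model; signatures and behaviour are assumptions. */
enum dma_conformance_level {
	DMA_CONFORMANCE_CONSISTENT,    /* fail if consistent memory is unavailable */
	DMA_CONFORMANCE_NON_CONSISTENT /* driver has sync points; either kind will do */
};

struct dma_buf {
	void *cpu;
	int consistent; /* what the platform actually returned */
};

static int platform_has_consistent = 0; /* pretend this machine has none */
static unsigned long syncs_performed;

static int dma_alloc(struct dma_buf *buf, size_t size, enum dma_conformance_level level)
{
	if (!platform_has_consistent && level == DMA_CONFORMANCE_CONSISTENT)
		return -1; /* the driver insisted on consistent memory */
	buf->cpu = malloc(size);
	if (!buf->cpu)
		return -1;
	buf->consistent = platform_has_consistent;
	return 0;
}

/* Called at every sync point; a no-op on consistent memory. */
static void dma_sync_single(struct dma_buf *buf, size_t len)
{
	(void)len;
	if (!buf->consistent)
		syncs_performed++; /* real code would write back/invalidate here */
}

static void dma_free(struct dma_buf *buf)
{
	free(buf->cpu);
	buf->cpu = NULL;
}
```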

[email protected] said:
> If these routines can allocate non-consistent memory, then how about
> renaming them to something less misleading, like dma_{malloc,free}?

Yes, I think the above makes this point. I'll change the names.

James

2002-12-05 01:55:09

Subject: Re: [RFC] generic device DMA implementation

[email protected] said:
> As you know, I posted a similar patch that created a new field in
> struct bus_type, as Miles Bader suggested just now, although only for
> {alloc,free}_consistent. If the bus-specific variation can be
> confined to some smaller part of these routines or eliminated, then
> I'm all in favor of skipping the extra indirection and going with your
> approach. It will be interesting to see if your model allows most of
> the sbus_ and pci_ DMA mapping routines in sparc to be merged. I
> suspect that you will have to adopt some kind of convention, such as
> that device->parent->driver_private will have a common meaning for pci
> and sbus devices on that platform.

I did prototype something like this, using a field called dma_accessors that
was basically a platform opaque set of function pointers.

I ultimately came to the conclusion that these functions couldn't be per
bus_type, they had to be per bus instance. Finally, it just seemed easier to
load this information into the platform_data field of the generic device and
let the implementation handle it instead of exposing it explicitly in the
model.

> Can you please define the "consistency" argument to these two
> routines as a bit mask? There are probably other kinds of memory
> inconsistency a driver might be able to accommodate in the future (CPU
> read caching, CPU writeback, inconsistency across multiple CPUs if the
> driver knows that it is only going to run on one CPU). I think 0
> should be the "most consistent" kind of memory. That way, DMA memory
> allocators could ignore bits that they don't know about, as those bits
> would only advertise extra capabilities of a driver. I think this
> extensibility is more useful than the debugging value of
> DMA_CONFORMANCE_NONE.

I'd rather hide the range of possible memory types from the drivers. I think
all a driver needs to know is that the memory is fully consistent, or it isn't
(and if it isn't, the driver has to put the full syncs in, the implementation
decides if they really correspond to anything).

By and large, most drivers just want to specify CONFORMANCE_CONSISTENT, so
that they don't have to bother with the sync points.

> Also something that could be added later is a bus_type.mem_mapped
> flag so that these DMA routines could do:

> BUG_ON(!dev->bus->mem_mapped);

> ...to catch attempts to allocate memory for devices that are not
> mapped. Alternatively, we could have a struct mem_device that embeds
> a struct device and represents only those types of devices that can be
> mapped into memory.

I'm dubious about efforts to unify io space and memory space. I think the
semantics are just too different. However, if someone else wants to lead the
charge...

> P.S., Did you miss a patch for include/linux/device.h adding
> device.dma_mask, or is that change already queued for 2.5.51?

I think that's queued somewhere in Patrick Mochel's pile for inclusion.

James

2002-12-05 02:24:24

Subject: Re: [RFC] generic device DMA implementation

James Bottomley <[email protected]> writes:
> > How is the driver supposed to tell whether a given dma_addr_t value
> > represents consistent memory or not? It seems like an
> > (arch-specific) `dma_addr_is_consistent' function is necessary, but
> > I couldn't see one in your patch.
>
> If you have a machine that has both consistent and inconsistent blocks, you
> need to encode that in dma_addr_t (which is a platform definable type).
>
> The sync functions would just decode the type and either nop or perform the
> sync.

My thinking was that a driver might want to do things like --

if (dma_addr_is_consistent (some_funky_addr)) {
do it quickly;
} else
do_it_the_slow_way (some_funky_addr);

in other words, something besides just calling the sync functions, in
the case where the memory was consistent.

-Miles
--
Suburbia: where they tear out the trees and then name streets after them.

2002-12-05 02:33:14

Subject: Re: [RFC] generic device DMA implementation

On Wed, Dec 04, 2002 at 05:21:04PM -0800, Adam J. Richter wrote:
> David Gibson wrote:
> >On Wed, Dec 04, 2002 at 11:47:14AM -0600, James Bottomley wrote:
> [...]
> >> The new DMA API allows a driver to advertise its level of consistent memory
> >> compliance to dma_alloc_consistent. There are essentially two levels:
> >>
> >> - I only work with consistent memory, fail if I cannot get it, or
> >> - I can work with inconsistent memory, try consistent first but return
> >> inconsistent if it's not available.
> >
> >Do you have an example of where the second option is useful?
>
> From a previous discussion, I understand that there are some
> PCI bus parisc machines without consistent memory.

And there are PPCs without consistent memory, except by disabling
the cache.

> >Off hand
> >the only places I can think of where you'd use a consistent_alloc()
> >rather than map_single() and friends is in cases where the hardware's
> >behaviour means you absolutely positively have to have consistent
> >memory.
>
> That would result in big, rarely used branches in device
> drivers or lots of ifdef's and the equivalent. With James's approach,
> porting a driver to support those parisc machines (for example) would
> involve sprinkling in some calls to macros that would compile to
> nothing on the other machines.
>
> Compare the code clutter involved in allowing those
> inconsistent parisc machines to run, say, the ten most popular
> ethernet controllers and the four most popular scsi controllers. I
> think the difference in the resulting source code size would already
> be in the hundreds of lines.

For cases like this, I'm talking about replacing the
consistent_alloc() with a kmalloc(), then using the cache flush
macros. Is there any machine for which this is not sufficient?

--
David Gibson | For every complex problem there is a
[email protected] | solution which is simple, neat and
| wrong.
http://www.ozlabs.org/people/dgibson

2002-12-05 02:33:12

Subject: Re: [RFC] generic device DMA implementation

On Wed, Dec 04, 2002 at 07:44:17PM -0600, James Bottomley wrote:
> [email protected] said:
> > Do you have an example of where the second option is useful? Off hand
> > the only places I can think of where you'd use a consistent_alloc()
> > rather than map_single() and friends is in cases where the hardware's
> > behaviour means you absolutely positively have to have consistent
> > memory.
>
> Well, it comes from parisc drivers. Here you'd really rather have
> consistent memory because it's more efficient, but on certain
> platforms it's just not possible.

Hmm... that doesn't seem sufficient to explain it.

Some background: I work with PPC embedded chips (the 4xx family) whose
only way to get consistent memory is by entirely disabling the cache.
However in some cases you *have* to have consistent memory despite
this very high cost. In all other cases you want to use inconsistent
memory (just allocated with kmalloc() or get_free_pages()) and
explicit cache flushes.

It seems the "try to get consistent memory, but otherwise give me
inconsistent" is only useful on machines which:
(1) Are not fully consistent, BUT
(2) Can get consistent memory without disabling the cache, BUT
(3) Not very much of it, so you might run out.

The point is, there has to be an advantage to using consistent memory
if it is available AND the possibility of it not being available.

Otherwise, drivers which absolutely need consistent memory, no matter
the cost, should use consistent_alloc(), all other drivers just use
kmalloc() (or whatever) then use the DMA flushing functions which
compile to NOPs on platforms with consistent memory.

Are there actually any machines with the properties described above?
The machines I know about don't:
- x86 and normal PPC are fully consistent, so the question
doesn't arise
- PPC 4xx and 8xx are inconsistent if cached, so you never
want consistent if you don't absolutely need it
- PA-RISC is fully non-consistent (I'm told), so the question
doesn't arise.

--
David Gibson | For every complex problem there is a
[email protected] | solution which is simple, neat and
| wrong.
http://www.ozlabs.org/people/dgibson

2002-12-05 02:43:09

Subject: Re: [RFC] generic device DMA implementation

David Gibson <[email protected]> writes:
> For cases like this, I'm talking about replacing the
> consistent_alloc() with a kmalloc(), then using the cache flush
> macros. Is there any machine for which this is not sufficient?

I'm not entirely sure what you mean by `using the cache flush macros,'
but on one of my platforms, PCI consistent memory must be allocated from
a special area.

It's also not clear what you mean by `for cases like this' -- do you
mean, replace _all_ uses of xxx_alloc_consistent with kmalloc, or do you
mean just those cases where pci_alloc_consistent currently returns 0?

If the former, it obviously doesn't work on my platform; if the latter,
I guess this is what James' patch assumes the platform-specific
dma_alloc_consistent function will do.

-Miles
--
I'd rather be consing.

2002-12-05 02:56:40

Subject: Re: [RFC] generic device DMA implementation

>On Wed, Dec 04, 2002 at 07:44:17PM -0600, James Bottomley wrote:
>> [email protected] said:
>> > Do you have an example of where the second option is useful? Off hand
>> > the only places I can think of where you'd use a consistent_alloc()
>> > rather than map_single() and friends is in cases where the hardware's
>> > behaviour means you absolutely positively have to have consistent
>> > memory.
>>
>> Well, it comes from parisc drivers. Here you'd really rather have
>> consistent memory because it's more efficient, but on certain
>> platforms it's just not possible.

>Hmm... that doesn't seem sufficient to explain it.

The question is not what is possible, but what is optimal.

Yes, it is possible to write drivers for machines without
consistent memory that work with any DMA device, by using
dma_{map,sync}_single as you suggest, even if caching could be
disabled. That is how drivers/scsi/53c700.c and
drivers/net/lasi_82596.c work today.

The advantage of James's approach is that it will result in
these drivers having simpler source code and even smaller object code
on machines that do not have this problem.

If we were to try the approach of using pci_{map,sync}_single
always (i.e., just writing the code not to use alloc_consistent), that
would have a performance cost on machines where using consistent
memory for writing small amounts of data is cheaper than the cost of
the cache flushes that would otherwise be required.

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."

2002-12-05 02:59:08

Subject: Re: [RFC] generic device DMA implementation

[email protected] said:
> My thinking was that a driver might want to do things like --
> if (dma_addr_is_consistent (some_funky_addr)) {
> do it quickly;
> } else
> do_it_the_slow_way (some_funky_addr);
> in other words, something besides just calling the sync functions, in
> the case where the memory was consistent.

Actually, I did code an api for that case, it's the dma_get_conformance() one
which tells you the consistency type of memory that you actually got, so if
you really need to tell the difference, you can.
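A driver-side sketch of that query, under the assumption that dma_get_conformance() simply reports a tag recorded at allocation time. The thread names the function but does not show its signature, so struct dma_region and the argument type here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: the real dma_get_conformance() signature is not shown. */
enum dma_conformance_level { DMA_CONFORMANCE_CONSISTENT, DMA_CONFORMANCE_NON_CONSISTENT };

struct dma_region {
	void *cpu;
	size_t size;
	enum dma_conformance_level got; /* what the allocator actually returned */
};

static enum dma_conformance_level dma_get_conformance(const struct dma_region *r)
{
	return r->got;
}

static int no_sync_path, sync_path;

/* Branch on the kind of memory actually obtained, in the spirit of the
 * dma_addr_is_consistent() pseudocode quoted above. */
static void post_descriptor(const struct dma_region *r)
{
	if (dma_get_conformance(r) == DMA_CONFORMANCE_CONSISTENT)
		no_sync_path++; /* write and hand to the device */
	else
		sync_path++;    /* write, sync, then hand to the device */
}
```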

James

2002-12-05 03:06:04

[permalink] [raw]

Subject: Re: [RFC] generic device DMA implementation

[email protected] said:
> The point is, there has to be an advantage to using consistent memory
> if it is available AND the possibility of it not being available.

I'm really thinking of this from the driver writer's point of view. The
advantage of consistent memory is that you don't have to think about where to
place all the sync points (sync points can be really subtle and nasty and an
absolute pain---I shudder to recall all of the problems I ran into writing a
driver on a fully inconsistent platform).

The advantage here is that you can code the driver only to use consistent
memory and not bother with the sync points (whatever the cost of this is).
Most platforms support reasonably cheap consistent memory, so most people
simply don't want to bother with inconsistent memory if they can avoid it.

If you do the sync points, you can specify the DMA_CONFORMANCE_NON_CONSISTENT
level and have the platform choose what type of memory you get. For a
platform which makes memory consistent by turning off CPU caching at the page
level, it's probably better to return non-consistent memory if the driver can
cope with it.
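The platform-side choice described here can be sketched as a small policy function. The struct platform fields and enum mem_kind are hypothetical; the example platforms in the usage check follow the machines mentioned in this thread (x86 fully consistent, PPC 4xx consistent only when uncached, some parisc machines with no consistent memory).

```c
#include <assert.h>

/* Hypothetical descriptors; names are illustrative only. */
enum mem_kind { MEM_NONE, MEM_CONSISTENT, MEM_NON_CONSISTENT };

struct platform {
	int has_consistent;         /* can it provide consistent memory at all? */
	int consistent_is_uncached; /* is consistency obtained by disabling the cache? */
};

/* driver_can_sync: the driver has sync points (the NON_CONSISTENT level). */
static enum mem_kind choose_memory(const struct platform *p, int driver_can_sync)
{
	if (!driver_can_sync)
		return p->has_consistent ? MEM_CONSISTENT : MEM_NONE;
	/* The driver copes either way, so avoid slow uncached memory. */
	if (p->has_consistent && !p->consistent_is_uncached)
		return MEM_CONSISTENT;
	return MEM_NON_CONSISTENT;
}
```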

James

2002-12-05 03:11:02

Subject: Re: [RFC] generic device DMA implementation

David Gibson <[email protected]> writes:
> It seems the "try to get consistent memory, but otherwise give me
> inconsistent" is only useful on machines which:
> (1) Are not fully consistent, BUT
> (2) Can get consistent memory without disabling the cache, BUT
> (3) Not very much of it, so you might run out.
>
> The point is, there has to be an advantage to using consistent memory
> if it is available AND the possibility of it not being available.
...
> Are there actually any machines with the properties described above?

As I mentioned in my previous message, one of my platforms is like that
-- PCI consistent memory must be allocated from a special pool of
memory, which is only 2 megabytes in size.
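A platform like this is where "try consistent first, then fall back" pays off. The sketch below models the 2 MB pool with a simple bump allocator; dma_malloc(), the 64-byte rounding, and the fallback to malloc() are illustrative assumptions, not the actual platform code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model: consistent memory comes from a fixed 2 MB pool.
 * A bump allocator stands in for the real pool manager. */
#define CONSISTENT_POOL_SIZE (2u * 1024u * 1024u)

static _Alignas(64) unsigned char consistent_pool[CONSISTENT_POOL_SIZE];
static size_t pool_used;

/* Tries the pool first and reports the kind obtained via *consistent.
 * Once the pool is full, only a driver with sync points gets memory. */
static void *dma_malloc(size_t size, int driver_can_sync, int *consistent)
{
	size_t aligned = (size + 63) & ~(size_t)63; /* round up to 64 bytes */

	if (aligned <= CONSISTENT_POOL_SIZE - pool_used) {
		void *p = consistent_pool + pool_used;
		pool_used += aligned;
		*consistent = 1;
		return p;
	}
	if (!driver_can_sync)
		return NULL; /* driver demanded consistent memory; pool exhausted */
	*consistent = 0;
	return malloc(size); /* ordinary cached memory; driver must sync */
}
```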

-Miles
--
`Life is a boundless sea of bitterness'

2002-12-05 03:34:03

Subject: Re: [RFC] generic device DMA implementation

On Thu, Dec 05, 2002 at 01:38:47PM +1100, David Gibson wrote:
> It seems the "try to get consistent memory, but otherwise give me
> inconsistent" is only useful on machines which:
> (1) Are not fully consistent, BUT
> (2) Can get consistent memory without disabling the cache, BUT
> (3) Not very much of it, so you might run out.
>
> The point is, there has to be an advantage to using consistent memory
> if it is available AND the possibility of it not being available.

Agreed here. Add to this

(4) quite silly from an API taste perspective.

> Otherwise, drivers which absolutely need consistent memory, no matter
> the cost, should use consistent_alloc(), all other drivers just use
> kmalloc() (or whatever) then use the DMA flushing functions which
> compile to NOPs on platforms with consistent memory.

Ug. This is travelling backwards in time.

kmalloc is not intended to allocate memory for DMA'ing. I (and others)
didn't spend all that time converting drivers to the PCI DMA API just to
see all that work undone.

Note that I am speaking from a driver API perspective, however. If you
are talking about using kmalloc "under the hood" while still presenting
the same driver interface, that's fine.

Jeff

2002-12-05 05:15:22

Subject: Re: [RFC] generic device DMA implementation

At the risk of beating a dead horse, I'd like to clarify a potential
ambiguity.

David Gibson wrote:
>It seems the "try to get consistent memory, but otherwise give me
>inconsistent" is only useful on machines which:
> (1) Are not fully consistent, BUT
> (2) Can get consistent memory without disabling the cache, BUT
> (3) Not very much of it, so you might run out.

>The point is, there has to be an advantage to using consistent memory
>if it is available AND the possibility of it not being available.

It is enough that there is an advantage to using consistent
memory on one platform (such as sparc64?) and the possibility of it
not being available on another platform (such as parisc), given that
you want the driver on both platforms (such as 53c700). In that case,
we have identified three possible choices so far:

APPROACH                                PROBLEMS

1. Use both memory allocators           Increased source and object size,
   (as 53c700 currently does).          rarely used code branches, unneeded
                                        "if (!consistent)" tests on platforms
                                        where the answer is constant.

2. Assume only inconsistent memory.     Slower on platforms where consistent
                                        memory has a speed advantage.

3. Have "maybe consistent" allocation
   and {w,r}mb_maybe(addr,len) macros.

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."

2002-12-05 05:37:12

Subject: Re: [RFC] generic device DMA implementation

James Bottomley <[email protected]> writes:
> > Keep in mind that sometimes the actual _implementation_ is also highly
> > PCI-specific -- that is, what works for PCI devices may not work for
> > other devices and vice-versa.
>
> that can all be taken care of in the platform implementation.
>
> In general, the generic device already has enough information that the
> platform implementation can be highly bus specific---and, of course,
> once you know exactly what bus it's on, you can cast it to the bus
> device if you want.

I presume you mean something like (in an arch-specific file somewhere):

void *dma_alloc_consistent (struct device *dev, size_t size,
                            dma_addr_t *dma_handle,
                            enum dma_conformance_level level)
{
        if (dev->SOME_FIELD == SOME_CONSTANT)
                return my_wierd_ass_pci_alloc_consistent ((struct pci_dev *)dev, ...);
        else
                return 0; /* or kmalloc(...); */
}

?

I did a bit of grovelling, but I'm still not quite sure what test I
can do (i.e., what SOME_FIELD and SOME_CONSTANT should be, if it's
really that simple).

Ah well, as long as it's possible I guess I'll figure it out when the
source hits the fan...

-Miles
--
I have seen the enemy, and he is us. -- Pogo

2002-12-05 05:57:38

Subject: Re: [RFC] generic device DMA implementation

On Thu, Dec 05, 2002 at 11:31:10AM +0900, Miles Bader wrote:
> James Bottomley <[email protected]> writes:
> > > How is the driver supposed to tell whether a given dma_addr_t value
> > > represents consistent memory or not? It seems like an
> > > (arch-specific) `dma_addr_is_consistent' function is necessary, but
> > > I couldn't see one in your patch.
> >
> > If you have a machine that has both consistent and inconsistent blocks, you
> > need to encode that in dma_addr_t (which is a platform definable type).
> >
> > The sync functions would just decode the type and either nop or perform the
> > sync.
>
> My thinking was that a driver might want to do things like --
>
> if (dma_addr_is_consistent (some_funky_addr)) {
> do it quickly;
> } else
> do_it_the_slow_way (some_funky_addr);
>
> in other words, something besides just calling the sync functions, in
> the case where the memory was consistent.

Yes, but using consistent memory is not necessarily the fast way - in
fact it probably won't be. Machines which don't do DMA cache snooping
will need to disable caching to get consistent memory, so using
consistent memory is very slow - on such a machine explicit syncs are
preferable wherever possible.

On a machine which is nicely consistent, the cache "flushes" should
become NOPs, so we'd expect the two sides of that if to do the same
thing.

--
David Gibson | For every complex problem there is a
[email protected] | solution which is simple, neat and
| wrong.
http://www.ozlabs.org/people/dgibson

2002-12-05 05:58:46

Subject: Re: [RFC] generic device DMA implementation

On Wed, Dec 04, 2002 at 10:41:31PM -0500, Jeff Garzik wrote:
> On Thu, Dec 05, 2002 at 01:38:47PM +1100, David Gibson wrote:
> > It seems the "try to get consistent memory, but otherwise give me
> > inconsistent" is only useful on machines which:
> > (1) Are not fully consistent, BUT
> > (2) Can get consistent memory without disabling the cache, BUT
> > (3) Not very much of it, so you might run out.
> >
> > The point is, there has to be an advantage to using consistent memory
> > if it is available AND the possibility of it not being available.
>
> Agreed here. Add to this
>
> (4) quite silly from an API taste perspective.
>
>
> > Otherwise, drivers which absolutely need consistent memory, no matter
> > the cost, should use consistent_alloc(), all other drivers just use
> > kmalloc() (or whatever) then use the DMA flushing functions which
> > compile to NOPs on platforms with consistent memory.
>
> Ug. This is travelling backwards in time.
>
> kmalloc is not intended to allocate memory for DMA'ing. I (and others)
> didn't spend all that time converting drivers to the PCI DMA API just to
> see all that work undone.

But if there aren't any consistency constraints on the memory, why not
get it with kmalloc()? There are two approaches to handling DMA on a
not-fully-consistent machine:
1) Allocate the memory specially so that it is consistent
2) Use any old memory, and make sure we have explicit cache
frobbing.

We have to have both: some hardware requires approach (1), and the
structure of the kernel often requires (2) to avoid lots of copying
(e.g. a network device doesn't allocate its own skbs to transmit, so
it can't assume the memory has any special consistency properties).

Since in case (2), we can't make assumptions about where the memory
came from, it might as well come from kmalloc() (or a slab, or
get_free_pages() or whatever).
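Approach (2) could be sketched as below, assuming a hypothetical build-time switch PLATFORM_COHERENT_DMA and a hypothetical cache_writeback() hook that here only counts calls. The buffer comes from ordinary memory (malloc() standing in for kmalloc()), and on a coherent build the sync helper is an empty inline function, which is the "compiles to a NOP" behaviour described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical build-time switch; 1 would mean a snooping/coherent platform. */
#ifndef PLATFORM_COHERENT_DMA
#define PLATFORM_COHERENT_DMA 0
#endif

static int writebacks;  /* counts simulated cache writebacks */

/* Hypothetical arch hook: real hardware would write back dirty lines here. */
static void cache_writeback(const void *addr, size_t len)
{
    (void)addr;
    (void)len;
    writebacks++;
}

/* Called after the CPU fills a buffer and before the device reads it. */
static inline void sync_for_device(const void *addr, size_t len)
{
#if PLATFORM_COHERENT_DMA
    (void)addr;         /* hardware keeps caches coherent: nothing to do */
    (void)len;
#else
    cache_writeback(addr, len);
#endif
}

/* Approach (2): any old memory, plus an explicit sync point. */
static char *prepare_tx(const char *payload, size_t len)
{
    char *buf = malloc(len);
    if (!buf)
        return NULL;
    memcpy(buf, payload, len);
    sync_for_device(buf, len);
    return buf;
}
```

The sketch deliberately ignores whether such memory is reachable by the device and whether it shares cache lines with unrelated data.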

--
David Gibson | For every complex problem there is a
[email protected] | solution which is simple, neat and
| wrong.
http://www.ozlabs.org/people/dgibson

2002-12-05 05:57:39

Subject: Re: [RFC] generic device DMA implementation

On Wed, Dec 04, 2002 at 09:13:33PM -0600, James Bottomley wrote:
> [email protected] said:
> > The point is, there has to be an advantage to using consistent memory
> > if it is available AND the possibility of it not being available.
>
> I'm really thinking of this from the driver writer's point of view. The
> advantage of consistent memory is that you don't have to think about where to
> place all the sync points (sync points can be really subtle and nasty and an
> absolute pain---I shudder to recall all of the problems I ran into writing a
> driver on a fully inconsistent platform).
>
> The advantage here is that you can code the driver only to use consistent
> memory and not bother with the sync points (whatever the cost of this is).
> Most platforms support reasonably cheap consistent memory, so most people
> simply don't want to bother with inconsistent memory if they can avoid it.
>
> If you do the sync points, you can specify the
> DMA_CONFORMANCE_NON_CONSISTENT level and have the platform choose
> what type of memory you get. For a platform which makes memory
> consistent by turning off CPU caching at the page level, it's
> probably better to return non-consistent memory if the driver can
> cope with it.

But if you have the sync points, you don't need a special allocator
for the memory at all - any old RAM will do. So why not just use
kmalloc() to get it?

--
David Gibson | For every complex problem there is a
[email protected] | solution which is simple, neat and
| wrong.
http://www.ozlabs.org/people/dgibson

2002-12-05 05:58:46

Subject: Re: [RFC] generic device DMA implementation

On Thu, Dec 05, 2002 at 12:17:55PM +0900, Miles Bader wrote:
> David Gibson <[email protected]> writes:
> > It seems the "try to get consistent memory, but otherwise give me
> > inconsistent" is only useful on machines which:
> > (1) Are not fully consisent, BUT
> > (2) Can get consistent memory without disabling the cache, BUT
> > (3) Not very much of it, so you might run out.
> >
> > The point is, there has to be an advantage to using consistent memory
> > if it is available AND the possibility of it not being available.
> ...
> > Are there actually any machines with the properties described above?
>
> As I mentioned in my previous message, one of my platforms is like that
> memory, which is only 2 megabytes in size.

Ok, that starts to make sense then (what platform is it,
incidentally?). Is using consistent memory actually faster than doing
the cache flushes explicitly? Much?

--
David Gibson | For every complex problem there is a
[email protected] | solution which is simple, neat and
| wrong.
http://www.ozlabs.org/people/dgibson

2002-12-05 06:08:40

Subject: Re: [RFC] generic device DMA implementation

On Wed, Dec 04, 2002 at 07:02:18PM -0800, Adam J. Richter wrote:
> >On Wed, Dec 04, 2002 at 07:44:17PM -0600, James Bottomley wrote:
> >> [email protected] said:
> >> > Do you have an example of where the second option is useful? Off hand
> >> > the only places I can think of where you'd use a consistent_alloc()
> >> > rather than map_single() and friends is in cases where the hardware's
> >> > behaviour means you absolutely positively have to have consistent
> >> > memory.
> >>
> >> Well, it comes from parisc drivers. Here you'd really rather have
> >> consistent memory because it's more efficient, but on certain
> >> platforms it's just not possible.
>
> >Hmm... that doesn't seem sufficient to explain it.
>
> The question is not what is possible, but what is optimal.
>
> Yes, it is possible to write drivers for machines without
> consistent memory that work with any DMA device, by using
> dma_{map,sync}_single as you suggest, even if caching could be
> disabled. That is how drivers/scsi/53c700.c and
> drivers/net/lasi_82596.c work today.
>
> The advantage of James's approach is that it will result in
> these drivers having simpler source code and even smaller object code
> on machines that do not have this problem.

Since, with James's approach you'd need a dma sync function (which
might compile to NOP) in pretty much the same places you'd need
map/sync calls, I don't see that it does make the source noticeably
simpler.

The only difference is that the map functions might also involve iommu
or similar setup - which also could compile to a nop in some cases.

> If we were to try the approach of using pci_{map,sync}_single
> always (i.e., just writing the code not to use alloc_consistent),
> that would have a performance cost on machines where using
> consistent memory for writing small amounts of data is cheaper than
> the cost of the cache flushes that would otherwise be required.

Well, I'm only talking about the cases where we actually care about
reducing the use of consistent memory.

--
David Gibson | For every complex problem there is a
[email protected] | solution which is simple, neat and
| wrong.
http://www.ozlabs.org/people/dgibson

2002-12-05 06:08:39

Subject: Re: [RFC] generic device DMA implementation

On Thu, Dec 05, 2002 at 11:49:52AM +0900, Miles Bader wrote:
> David Gibson <[email protected]> writes:
> > For cases like this, I'm talking about replacing the
> > consistent_alloc() with a kmalloc(), then using the cache flush
> > macros. Is there any machine for which this is not sufficient?
>
> I'm not entirely sure what you mean by `using the cache flush macros,'
> but on one of my platforms, PCI consistent memory must be allocated from
> a special area.

Well, yes, you only need the cache flush macros on memory that *isn't*
consistent.

> It's also not clear what you mean by `for cases like this' -- do you
> mean, replace _all_ uses of xxx_alloc_consistent with kmalloc, or do you
> mean just those cases where pci_alloc_consistent currently returns 0?

I mean replace xxx_alloc_consistent() with kmalloc() and appropriate
calls to map_single() (or whatever) in those cases where we actually
care about reducing our usage of (genuinely) consistent memory and it
is possible to do so.

> If the former, it obviously doesn't work on my platform; if the latter,
> I guess this is what James' patch assumes the platform-specific
> dma_alloc_consistent function will do.

Well, with James's approach you need a dma_sync() of some sort in pretty
much exactly the same places you need a map_single() or similar if you
used kmalloc() to start with.

--
David Gibson | For every complex problem there is a
[email protected] | solution which is simple, neat and
| wrong.
http://www.ozlabs.org/people/dgibson

2002-12-05 06:36:42

Subject: Re: [RFC] generic device DMA implementation

David Gibson <[email protected]> writes:
> > As I mentioned in my previous message, one of my platforms is like that
> > memory, which is only 2 megabytes in size.
>
> Ok, that starts to make sense then (what platform is it,
> incidentally?). Is using consistent memory actually faster than doing
> the cache flushes explicitly? Much?

It's an embedded evaluation board (Midas `RTE-MOTHER-A' and
`RTE-V850E-MA1-CB').

The thing is there _is_ no cache on this machine (it's very slow), so
cache-consistency is actually not an issue (and the cache-flushing
macros won't help me at all).

PCI devices are several busses removed from the CPU, and they only have
this one 2MB area in common. So on this machine, PCI devices can _only_
use consistent memory.

When a driver uses the non-consistent interfaces, then:

* pci_map_single allocates a `shadow area' of consistent memory and
pci_unmap_single deallocates it

* pci_dma_sync_... just does a memcpy to/from the `shadow' consistent
memory from/to the driver's kmalloc'd block (in principle I think this
is incorrect, because it uses the `dir' parameter to determine the
direction to copy, but it works in practice)
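The shadow-area scheme can be modelled in user space as follows. This is a sketch under assumptions: an 8 KB static array stands in for the 2 MB device-visible window, a bump allocator releases only the most recent mapping, and all function names are hypothetical. Unlike the behaviour flagged above as questionable, the copy direction here follows from which side is about to use the data (separate for-CPU and for-device syncs), and the DMA direction is only used to skip copies that cannot matter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum dma_dir { DMA_TO_DEVICE, DMA_FROM_DEVICE, DMA_BIDIRECTIONAL };

/* Hypothetical device-visible window (2 MB on the real board, 8 KB here). */
static unsigned char shadow_window[8192];
static size_t shadow_used;

struct bounce {
    void *cpu_buf;          /* driver's ordinary (kmalloc'd) buffer */
    unsigned char *shadow;  /* copy inside the device-visible window */
    size_t len;
    enum dma_dir dir;
};

/* Map: grab shadow space and pre-load it if the device will read it. */
static int bounce_map(struct bounce *b, void *buf, size_t len, enum dma_dir dir)
{
    if (shadow_used + len > sizeof(shadow_window))
        return -1;                       /* window exhausted */
    b->cpu_buf = buf;
    b->shadow = shadow_window + shadow_used;
    b->len = len;
    b->dir = dir;
    shadow_used += len;
    if (dir != DMA_FROM_DEVICE)
        memcpy(b->shadow, buf, len);
    return 0;
}

/* CPU is about to look at the data: copy device results back. */
static void bounce_sync_for_cpu(struct bounce *b)
{
    if (b->dir != DMA_TO_DEVICE)
        memcpy(b->cpu_buf, b->shadow, b->len);
}

/* CPU has modified the buffer: push it to the shadow for the device. */
static void bounce_sync_for_device(struct bounce *b)
{
    if (b->dir != DMA_FROM_DEVICE)
        memcpy(b->shadow, b->cpu_buf, b->len);
}

/* Unmap: final copy back, then release if this was the latest mapping. */
static void bounce_unmap(struct bounce *b)
{
    bounce_sync_for_cpu(b);
    if (b->shadow + b->len == shadow_window + shadow_used)
        shadow_used -= b->len;
}
```

A real implementation would need a proper allocator for the window; the bump allocator only keeps the sketch short.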

So you can see that for this platform, it would be better if drivers
could _always_ use alloc_consistent, but many don't.

Yes, this is a weird and frustrating design, but I think it does credit
to the linux PCI layer that I could get it to work at all, without
modifying any drivers! I guess my main goal in this discussion is to
ensure that remains the case...

-Miles
--
.Numeric stability is probably not all that important when you're guessing.

2002-12-05 11:03:36

by Benjamin Herrenschmidt

Subject: Re: [RFC] generic device DMA implementation

On Wed, 2002-12-04 at 22:46, James Bottomley wrote:
> [email protected] said:
> > How is the driver supposed to tell whether a given dma_addr_t value
> > represents consistent memory or not? It seems like an (arch-specific)
> > `dma_addr_is_consistent' function is necessary, but I couldn't see one
> > in your patch.
>
> well, the patch was only for x86, which is fully consistent. For parisc, that
> becomes a field for the dma accessor functions.
>
> However, even on parisc, the (supported) machines are either entirely
> consistent or entirely inconsistent.
>
> If you have a machine that has both consistent and inconsistent blocks, you
> need to encode that in dma_addr_t (which is a platform definable type).

I don't agree here. Encoding things in dma_addr_t, then special casing
in consistent_{map,unmap,sync,...} looks really ugly to me! You want
dma_addr_t to contain a bus address for the given bus you are working
with and pass that to your device, period.

Consistency of memory (or simply, in some cases, accessibility of system
memory by a given device) is really a property of the bus. Tweaking
magic bits in dma_addr_t and testing them later is a hack. The proper
implementation is to have the consistent_{alloc,free,map,unmap,sync,...}
functions be function pointers in the generic bus structure.

Actually, the device model defines a bus "type" structure rather than a
"bus instance" structure (well, at least it did last I looked a couple
of weeks ago). That's a problem here, I believe, as those functions are
really a property of a given bus instance. One solution would eventually
be to have the set of functions pointers in the generic struct device
and by default be copied from parent to child.

Actually, to avoid bloat, I think a single pointer to a struct
containing the whole set of consistent functions is enough though, as
those will typically be statically defined.
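A sketch of this suggestion, using a trimmed user-space model of struct device and hypothetical names (struct dma_ops, device_register_child()): each device holds one pointer to a statically defined operations table, and a child that does not set its own inherits its parent's at registration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct device;

/* Hypothetical per-bus-instance operations, defined statically per bridge. */
struct dma_ops {
    const char *name;
    void *(*alloc_consistent)(struct device *dev, size_t size);
    void (*free_consistent)(struct device *dev, void *vaddr);
};

/* Hypothetical, heavily trimmed stand-in for the generic struct device. */
struct device {
    struct device *parent;
    const struct dma_ops *dma_ops;   /* one pointer: little per-device bloat */
};

static void *plain_alloc(struct device *dev, size_t size)
{
    (void)dev;
    return calloc(1, size);
}

static void plain_free(struct device *dev, void *vaddr)
{
    (void)dev;
    free(vaddr);
}

static const struct dma_ops host_bridge_ops = {
    .name = "host", .alloc_consistent = plain_alloc, .free_consistent = plain_free,
};
static const struct dma_ops odd_bridge_ops = {
    .name = "odd", .alloc_consistent = plain_alloc, .free_consistent = plain_free,
};

/* On registration, a child without its own table copies its parent's. */
static void device_register_child(struct device *child, struct device *parent)
{
    child->parent = parent;
    if (!child->dma_ops && parent)
        child->dma_ops = parent->dma_ops;
}

static void *dma_alloc_consistent(struct device *dev, size_t size)
{
    return dev->dma_ops->alloc_consistent(dev, size);
}
```

Both tables use the same functions here for brevity; on real hardware they would differ per bridge, which is the per-instance distinction a table hung off the bus "type" cannot express.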

Ben.

2002-12-05 10:56:28

by Benjamin Herrenschmidt

Subject: Re: [RFC] generic device DMA implementation

On Thu, 2002-12-05 at 01:47, David Gibson wrote:
> Do you have an example of where the second option is useful? Off hand
> the only places I can think of where you'd use a consistent_alloc()
> rather than map_single() and friends is in cases where the hardware's
> behaviour means you absolutely positively have to have consistent
> memory.

Looking at our implementation (ppc32 on non-coherent CPUs like 405) of
pci_map_single, which just flushes the cache, I still feel we need a
consistent_alloc, that is an implementation that _disables_ caching for
the area.

A typical example is a USB OHCI driver. You really don't want to play
cache tricks with the shared area here. The same applies whenever you
have a shared area in memory in which both the CPU and the device may
read/write in the same cache line.

For things like ring descriptors of a net driver, I feel it's very much
simpler (and possibly more efficient too) to also allocate non-cacheable
space for consistent instead of continuously flushing/invalidating.
Actually, flush/invalidate here can also have nasty side effects if
several descriptors fit in the same cache line.

The data buffers, of course (typically skbuffs), would preferably use
pci_map_*-like APIs (hrm... did we ever make sure skbuffs would _not_
mix the data buffer with control data in the same cache line? This
has been a problem with non-coherent CPUs in the past).

Ben.

2002-12-05 11:09:25

by William Lee Irwin III

Subject: Re: [RFC] generic device DMA implementation

On Thu, Dec 05, 2002 at 12:15:30PM +0100, Benjamin Herrenschmidt wrote:
> Actually, the device model defines a bus "type" structure rather than a
> "bus instance" structure (well, at least it did last I looked a couple
> of weeks ago). That's a problem I believe here, as those functions are
> really a property of a given bus instance. One solution would eventually
> be to have the set of functions pointers in the generic struct device
> and by default be copied from parent to child.

On an unrelated note, a "bus instance" structure would seem to be
required for proper handling of bridges in combination with PCI segments.

Bill

2002-12-05 11:28:21

by Russell King

Subject: Re: [RFC] generic device DMA implementation

On Thu, Dec 05, 2002 at 12:08:16PM +0100, Benjamin Herrenschmidt wrote:
> For things like ring descriptors of a net driver, I feel it's very much
> simpler (and possibly more efficient too) to also allocate non-cacheable
> space for consistent instead of continuously flushing/invalidating.
> Actually, flush/invalidate here can also have nasty side effects if
> several descriptors fit in the same cache line.

Indeed. Think about a 16-byte descriptor in a 32-byte cache line.
The net chip has written status information to the first word, and
you've just written to the 4th word of that cache line.

To access the status word written by the chip, you need to invalidate
(without writeback) that cache line. For the chip to access the word
you've just written, you need to writeback that cache line.

In other words, you _will_ lose information in this case, guaranteed.
I'd rather keep our existing pci_* API than be forced into this crap
again.
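The scenario can be replayed in a small simulation, assuming 32-bit words, one 32-byte line holding two 16-byte descriptors, and a write-back cache modelled as a private copy of the line. Whichever maintenance operation the CPU picks, one of the two writes is lost.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WORDS_PER_LINE 8                  /* 32-byte line of 32-bit words */

struct line_sim {
    uint32_t memory[WORDS_PER_LINE];      /* what the device sees */
    uint32_t cache[WORDS_PER_LINE];       /* CPU's cached copy of the line */
};

/* Set up the race: the line is cached, the device DMAs status into word 0
   of memory, and the CPU writes word 3 (the 4th word) into its cache. */
static void race(struct line_sim *s)
{
    memset(s->memory, 0, sizeof(s->memory));
    memcpy(s->cache, s->memory, sizeof(s->cache));   /* line now cached */
    s->memory[0] = 0xD0D0;    /* device: status written via DMA */
    s->cache[3] = 0xC0C0;     /* CPU: update, still dirty in the cache */
}

/* Option A: invalidate without writeback so the CPU can see the status. */
static void invalidate(struct line_sim *s)
{
    memcpy(s->cache, s->memory, sizeof(s->cache));   /* dirty CPU word dropped */
}

/* Option B: write back so the device can see the CPU's word. */
static void writeback(struct line_sim *s)
{
    memcpy(s->memory, s->cache, sizeof(s->memory));  /* whole line overwrites */
}
```

Because both operations act on the whole line, there is no order of invalidate and writeback that preserves both words.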

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2002-12-05 11:52:58

Subject: Re: [RFC] generic device DMA implementation

David Gibson wrote:
>Since, with James's approach you'd need a dma sync function (which
>might compile to NOP) in pretty much the same places you'd need
>map/sync calls, I don't see that it does make the source noticeably
>simpler.

Because then you don't have to have a branch for the
case where the platform *does* support consistent memory.

>> If we were to try the approach of using pci_{map,sync}_single
>> always (i.e., just writing the code not to use alloc_consistent),
>> that would have a performance cost on machines where using
>> consistent memory for writing small amounts of data is cheaper than
>> the cost of the cache flushes that would otherwise be required.
>
>Well, I'm only talking about the cases where we actually care about
>reducing the use of consistent memory.

Then you're not fully weighing the benefits of this facility.
The primary beneficiaries of this facility are device drivers for
which we'd like to have the performance advantages of consistent
memory when available (at least on machines that always return
consistent memory) but which we'd also like to have work as
efficiently as possible on platforms that lack consistent memory or
have so little that we want the device driver to still work even when
no consistent memory is available. That includes all PCI devices that
users of the inconsistent parisc machines want to use.

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."

2002-12-05 12:08:32

Subject: Re: [RFC] generic device DMA implementation

Benjamin Herrenschmidt wrote:
>On Wed, 2002-12-04 at 22:46, James Bottomley wrote:
>> If you have a machine that has both consistent and inconsistent blocks, you
>> need to encode that in dma_addr_t (which is a platform definable type).
>
>I don't agree here. Encoding things in dma_addr_t, then special casing
>in consistent_{map,unmap,sync,...} looks really ugly to me! You want
>dma_addr_t to contain a bus address for the given bus you are working
>with and pass that to your device, period.

I don't think that James meant actually defining flag bits
inside of dma_addr_t, although I suppose you could do it for some
unused high bits on some architectures. I think the implication
was that you could have something like:

static inline int is_consistent(dma_addr_t addr)
{
        return (addr >= CONSISTENT_AREA_START && addr < CONSISTENT_AREA_END);
}

I also don't recall anyone proposing special casing
dma_{map,unmap,sync} based on the results of such a check. I think
the only function that might use it would be maybe_wmb(addr,len),
which, on some machines, might be:

static inline void maybe_wmb(dma_addr_t addr, size_t len)
{
        if (!is_consistent(addr))
                wmb();
}

In practice, I think dma_malloc() would either always return
consistent memory or never return consistent memory on a given
machine, so maybe_wmb would probably never do such range checking.
Instead it would compile to nothing on machines where dma_alloc always
returned consistent memory, would compile to wmb() on systems where
dma_alloc would only succeed if it could return non-consistent memory,
and would compile to a procedure pointer on parisc that would be set to
point to either a no-op or wmb, depending on which kind of machine the
kernel was booted on.
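The parisc variant, where one kernel image must run on both kinds of machine, could be sketched with a pointer chosen once at boot. The names maybe_wmb_fn and maybe_wmb_setup() are hypothetical, dma_addr_t is simplified to unsigned long, and the GCC/Clang builtin __sync_synchronize() (a full barrier) stands in for the kernel's wmb().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned long dma_addr_t;   /* simplified for this sketch */

static void barrier_noop(void) { }
static void barrier_wmb(void) { __sync_synchronize(); }   /* stand-in for wmb() */

/* Chosen once at boot; callers never test the machine type themselves. */
static void (*maybe_wmb_fn)(void) = barrier_wmb;          /* safe default */

static void maybe_wmb_setup(bool machine_is_consistent)
{
    maybe_wmb_fn = machine_is_consistent ? barrier_noop : barrier_wmb;
}

/* Same signature as maybe_wmb() above; addr and len are ignored because
   the decision is made per machine, not per address range. */
static inline void maybe_wmb(dma_addr_t addr, size_t len)
{
    (void)addr;
    (void)len;
    maybe_wmb_fn();
}
```

On single-flavour architectures the pointer would be unnecessary: maybe_wmb() would simply be defined as nothing or as wmb() at compile time.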

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."

2002-12-05 12:16:12

Subject: Re: [RFC] generic device DMA implementation

Russell King wrote:
[An excellent explanation of why you sometimes may need consistent
memory.]
>In other words, you _will_ lose information in this case, guaranteed.
>I'd rather keep our existing pci_* API than be forced into this crap
>again.

All of the proposed API variants that we have discussed in
this thread for pci_alloc_consistent / dma_malloc give you consistent
memory (or fail) unless you specifically tell it that returning
inconsistent memory is OK.

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."

2002-12-05 12:37:04

by Russell King

Subject: Re: [RFC] generic device DMA implementation

On Thu, Dec 05, 2002 at 04:21:01AM -0800, Adam J. Richter wrote:
> Russell King wrote:
> [An excellent explanation of why you sometimes may need consistent
> memory.]
> >In other words, you _will_ lose information in this case, guaranteed.
> >I'd rather keep our existing pci_* API than be forced into this crap
> >again.
>
> All of the proposed API variants that we have discussed in
> this thread for pci_alloc_consistent / dma_malloc give you consistent
> memory (or fail) unless you specifically tell it that returning
> inconsistent memory is OK.

How does a driver writer determine if his driver can cope with inconsistent
memory? If their view is a 32-byte cache line, and their descriptors are
32 bytes long, they could well say "we can cope with inconsistent memory".
When 64 byte cache lines are the norm, the driver magically breaks.

I think we actually want to pass the minimum granularity the driver can
cope with if we're going to allocate inconsistent memory. A driver
writer does not have enough information to determine on their own
whether inconsistent memory is going to be usable on any architecture.
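One possible reading of this proposal, as a sketch: the driver passes the smallest unit it syncs independently (for example its descriptor size, or 0 if it cannot cope with inconsistent memory at all), and the allocator only falls back to inconsistent memory when each unit covers whole cache lines. The function name, the cache_line_size() helper and the 64-byte line are hypothetical, and line-aligned allocation is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical platform constant; a real kernel would know this per CPU. */
static size_t cache_line_size(void) { return 64; }

enum dma_kind { DMA_FAIL, DMA_GOT_CONSISTENT, DMA_GOT_INCONSISTENT };

/* Decide what a driver may be given.  granularity == 0 means the driver
   cannot cope with inconsistent memory at all. */
static enum dma_kind dma_decide(bool consistent_available, size_t granularity)
{
    if (consistent_available)
        return DMA_GOT_CONSISTENT;
    if (granularity != 0 && granularity % cache_line_size() == 0)
        return DMA_GOT_INCONSISTENT;   /* units never share a cache line */
    return DMA_FAIL;                   /* otherwise the driver would break */
}
```

With 64-byte lines, a driver declaring 32-byte descriptors is refused inconsistent memory instead of silently breaking, which is the failure mode described above.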

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2002-12-05 14:55:58

Subject: Re: [RFC] generic device DMA implementation

[email protected] said:
> But if you have the sync points, you don't need a special allocator
> for the memory at all - any old RAM will do. So why not just use
> kmalloc() to get it?

Because with kmalloc, you have to be aware of platform implementations, most
notably that cache flush/invalidate instructions only operate on whole blocks
of memory (one cache line wide). If kmalloc returns a region smaller than a
cache line, you have the potential for severe cockups, because cache operations
on adjacent kmalloc regions that share the same cache line can interfere with
each other.

The dma_alloc... function guarantees to avoid this for you by passing the
allocation to the platform layer which knows the cache characteristics and
requirements for the machine (and dma controller) you're using.
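A minimal sketch of that guarantee, assuming a hypothetical fixed 64-byte line and using C11 aligned_alloc() as the underlying allocator: rounding both the start address and the length to whole cache lines means no other allocation can share a line with the DMA buffer. A real platform layer would take the line size from the CPU and DMA controller rather than from a constant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define DMA_CACHE_LINE 64   /* hypothetical; must be a power of two */

/* Round n up to a whole number of cache lines. */
static size_t round_to_line(size_t n)
{
    return (n + DMA_CACHE_LINE - 1) & ~(size_t)(DMA_CACHE_LINE - 1);
}

static int is_line_aligned(const void *p)
{
    return ((uintptr_t)p & (DMA_CACHE_LINE - 1)) == 0;
}

/* Line-aligned start and line-multiple length, so flushing or
   invalidating this buffer never touches a neighbouring allocation. */
static void *dma_buf_alloc(size_t size)
{
    if (size == 0)
        return NULL;
    return aligned_alloc(DMA_CACHE_LINE, round_to_line(size));
}
```

The cost is wasted space for small buffers, which is one reason a pool allocator that packs line-sized objects is attractive.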

James

2002-12-05 15:05:04

Subject: Re: [RFC] generic device DMA implementation

[email protected] said:
> Actually, the device model defines a bus "type" structure rather than a
> "bus instance" structure (well, at least it did last I looked a couple
> of weeks ago). That's a problem I believe here, as those functions are
> really a property of a given bus instance. One solution would
> eventually be to have the set of functions pointers in the generic
> struct device and by default be copied from parent to child.

Well, the bus in the generic device model is just a struct device as well.

The parisc implementation I'm working on stores this type of conversion
information in the platform_data field of the generic device (although the
function pointers that make use of it are global).

I did do an implementation which added a dma_accessors set of functions to the
struct device (and also struct bus_type), but I eventually concluded they had
to be so platform specific that there was no utility to exposing them.

> Consistency of memory (or simply, in some cases, accessibility of
> system memory by a given device) is really a property of the bus.
> Tweaking magic bits in dma_addr_t and testing them later is a hack.
> The proper implementation is to have the consistent_{alloc,free,map,
> unmap,sync,...} functions be function pointers in the generic bus
> structure.

Actually, in parisc, the implementation is simple. The type of memory is
determined globally per architecture (so it's not encoded in the dma_addr_t),
as Adam said. The preferred platform implementation for a machine that did
both (I believe this is the fabled parisc V class, which I've never seen)
would be to implement a consistent region, so you could tell whether memory
was consistent by checking whether its dma_addr_t fell in that region.

I fully recognise that dma_addr_t actually has to be freely convertible to the
physical address by simple casting, because that's the way the pci_ functions
use it.

James

2002-12-05 15:17:02

Subject: Re: [RFC] generic device DMA implementation

[email protected] said:
> I'd rather keep our existing pci_* API than be forced into this crap
> again.

Let me just clarify: I'm not planning to revoke the pci_* API, or to deviate
substantially from it. I'm not even planning to force any arch's to use it if
they don't want to. I'm actually thinking of putting something like this in
the asm-generic implementations:

dma_*(struct device *dev, ...) {
        BUG_ON(dev->bus != &pci_bus_type);
        pci_*(to_pci_dev(dev), ...);
}

The whole point is not to force another massive shift in the way drivers are
written, but to provide a generic device based API for those who need it.
There are very few drivers that actually have to allocate fake PCI devices
today, but this API is aimed squarely at helping them. Drivers that only ever
see real PCI devices won't need touching.
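The asm-generic pattern can be made concrete with a user-space sketch of one instance. The trimmed struct device, struct pci_dev, pci_bus_type and the pci_alloc_consistent() stub are stand-ins rather than the kernel definitions, and assert() plays the role of BUG_ON(). Because the generic device is embedded inside struct pci_dev rather than placed first, the conversion uses container_of(), which is how the kernel's to_pci_dev() is defined, instead of a plain pointer cast.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

typedef unsigned long dma_addr_t;           /* simplified */

struct bus_type { const char *name; };
struct device { struct bus_type *bus; };

/* As in the kernel, the generic device is embedded, and not first. */
struct pci_dev {
    unsigned int devfn;
    struct device dev;
};

static struct bus_type pci_bus_type = { "pci" };

#define to_pci_dev(d) container_of(d, struct pci_dev, dev)

/* Stub for the existing PCI-specific allocator. */
static void *pci_alloc_consistent(struct pci_dev *pdev, size_t size,
                                  dma_addr_t *handle)
{
    (void)pdev;
    void *p = calloc(1, size);
    *handle = (dma_addr_t)(uintptr_t)p;     /* identity mapping here */
    return p;
}

/* Generic wrapper in the style of the asm-generic sketch. */
static void *dma_alloc_consistent(struct device *dev, size_t size,
                                  dma_addr_t *handle)
{
    assert(dev->bus == &pci_bus_type);      /* BUG_ON() in the kernel */
    return pci_alloc_consistent(to_pci_dev(dev), size, handle);
}
```

The bus pointer comparison is also the kind of test a platform implementation could use to decide which bus-specific path a generic device should take.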

James

2002-12-05 16:22:44

Subject: Re: [RFC] generic device DMA implementation

David Gibson wrote:
> On Wed, Dec 04, 2002 at 10:41:31PM -0500, Jeff Garzik wrote:
>
>>On Thu, Dec 05, 2002 at 01:38:47PM +1100, David Gibson wrote:
>>
>>>It seems the "try to get consistent memory, but otherwise give me
>>>inconsistent" is only useful on machines which:
>>> (1) Are not fully consistent, BUT
>>> (2) Can get consistent memory without disabling the cache, BUT
>>> (3) Not very much of it, so you might run out.
>>>
>>>The point is, there has to be an advantage to using consistent memory
>>>if it is available AND the possibility of it not being available.
>>
>>Agreed here. Add to this
>>
>>(4) quite silly from an API taste perspective.
>>
>>
>>
>>>Otherwise, drivers which absolutely need consistent memory, no matter
>>>the cost, should use consistent_alloc(), all other drivers just use
>>>kmalloc() (or whatever) then use the DMA flushing functions which
>>>compile to NOPs on platforms with consistent memory.
>>
>>Ug. This is travelling backwards in time.
>>
>>kmalloc is not intended to allocate memory for DMA'ing. I (and others)
>>didn't spend all that time converting drivers to the PCI DMA API just to
>>see all that work undone.
>
>
> But if there aren't any consistency constraints on the memory, why not
> get it with kmalloc()? There are two approaches to handling DMA on a
> not-fully-consistent machine:
> 1) Allocate the memory specially so that it is consistent
> 2) Use any old memory, and make sure we have explicit cache
> frobbing.

For me it's an API issue. kmalloc does not return DMA'able memory.

If "your way" is acceptable to most, then at the very least I would want

#define get_any_old_dmaable_memory kmalloc

2002-12-05 17:41:44

by Manfred Spraul

Subject: Re: [RFC] generic device DMA implementation

>
>
>Hmm... that doesn't seem sufficient to explain it.
>
>Some background: I work with PPC embedded chips (the 4xx family) whose
>only way to get consistent memory is by entirely disabling the cache.
>
What do you mean by "disable"?
Do you have to disable the cache entirely when you encounter the first
pci_alloc_consistent() call, or do you disable the cache just for the
region that is returned by pci_alloc_consistent()?

If you disable it entirely - would "before_access_consistent_area() /
after_access_consistent_area()" macros help to avoid that, or are there
other problems?

--
Manfred

2002-12-05 20:23:37

Subject: Re: [RFC] generic device DMA implementation

Russell King wrote:
>On Thu, Dec 05, 2002 at 04:21:01AM -0800, Adam J. Richter wrote:
>> Russell King wrote:
>> [An excellent explanation of why you sometimes may need consistent
>> memory.]
>> >In other words, you _will_ lose information in this case, guaranteed.
>> >I'd rather keep our existing pci_* API than be forced into this crap
>> >again.
>>
>> All of the proposed API variants that we have discussed in
>> this thread for pci_alloc_consistent / dma_malloc give you consistent
>> memory (or fail) unless you specifically tell it that returning
>> inconsistent memory is OK.

>How does a driver writer determine if his driver can cope with inconsistent
>memory? If their view is a 32-byte cache line, and their descriptors are
>32 bytes long, they could well say "we can cope with inconsistent memory".
>When 64 byte cache lines are the norm, the driver magically breaks.
>
>I think we actually want to pass the minimum granularity the driver can
>cope with if we're going to allocate inconsistent memory. A driver
>writer does not have enough information to determine on their own
>whether inconsistent memory is going to be usable on any architecture.

I agree with James that dma_malloc should round its allocation
sizes up to a multiple of cache line size (at least if it is returning
inconsistent writeback cached memory), and I would extend that
statement to the pool allocator (currently PCI specific, and an API
that I'd like to change slightly, but that's another matter). For
dma_malloc, this would currently just be a documentation change, as it
currently always allocates entire pages. For the pool allocator, it
might require adding a few lines of code.
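For illustration, a minimal sketch of the rounding described above, assuming the platform supplies a power-of-two cache line size. The helper names are hypothetical and do not come from any existing kernel interface:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round a DMA allocation up to a whole number of cache lines, so the
 * tail of the buffer never shares a line with a neighbouring
 * allocation.  line_bytes is assumed to be a power of two, which holds
 * for real cache line sizes. */
static size_t dma_round_to_cache_line(size_t size, size_t line_bytes)
{
    assert(line_bytes != 0 && (line_bytes & (line_bytes - 1)) == 0);
    return (size + line_bytes - 1) & ~(line_bytes - 1);
}

/* Rounding the size is only half the job: the start of the buffer must
 * also sit on a line boundary, or its head can share a line instead. */
static int dma_is_cache_aligned(const void *p, size_t line_bytes)
{
    return ((uintptr_t)p & (line_bytes - 1)) == 0;
}
```

With 32-byte lines a 33-byte request grows to 64 bytes, and a 32-byte descriptor on a machine with 64-byte lines is padded to 64, which is the situation Russell describes.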

There may still be other cache size issues, and how to deal
with them will be a driver-specific question. I think most drivers
will not have this problem because hardware programming and data structures
are designed so that at any given time either the IO device or the CPU
has an implicit write lock on the data structure and there is a
specific protocol for handing ownership from one to the other (for
example, the CPU sets up the data structures, flushes its write cache,
then sets the "go" bit and does not do further writes until it
sees that the IO device has set the "done" bit).

However, not all data structures and protocols are amenable to
such techniques. For example, OHCI USB controllers have a 16 byte
Endpoint Descriptor which contains a NextTD (next transfer descriptor)
field designed to be writable by the controller and an EndTD (end
transfer descriptor) field designed to be writable by the driver, so that
the device driver can add more transfers to an endpoint while that
endpoint descriptor is still hot, as long as the architecture supports
32-bit atomic writes, instead of waiting for that endpoint's queue to
empty before adding new requests. I think that the Linux OHCI
controller currently only queues one request per bulk or control
endpoint, so I don't think it uses this feature; if it were to, it
would have to check that it really did have consistent memory or that
the cache line size was 8 bytes or less (not 4 bytes, because of where
these registers are located). These checks would evaluate to
compile-time constants on most or all architectures.
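A sketch of the check described above, under the assumption that a platform exposes "DMA memory is always consistent" and its cache line size as compile-time constants. All names here are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical platform constants; a real port would derive these from
 * its architecture headers rather than define them here. */
#ifndef PLATFORM_DMA_ALWAYS_CONSISTENT
#define PLATFORM_DMA_ALWAYS_CONSISTENT 0
#endif
#ifndef PLATFORM_CACHE_LINE_BYTES
#define PLATFORM_CACHE_LINE_BYTES 32
#endif

/* Largest line size at which, per the discussion above, the ED fields
 * that the CPU and the controller update concurrently never share a line. */
#define OHCI_ED_SPLIT_LINE_BYTES 8

/* May the driver append transfers to an endpoint descriptor while the
 * controller is still walking it?  got_consistent says whether this
 * particular ED came from consistent memory.  When either platform
 * constant settles the answer, the compiler can fold the whole test. */
static bool ohci_can_append_to_live_ed(bool got_consistent)
{
    return PLATFORM_DMA_ALWAYS_CONSISTENT ||
           PLATFORM_CACHE_LINE_BYTES <= OHCI_ED_SPLIT_LINE_BYTES ||
           got_consistent;
}
```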

For other devices, it may be necessary to use other workarounds
or to fail initialization. It may also depend on how inconsistent the
memory is. For example, read-cached, write-through memory may suffice
given read barriers (and it would be interesting to find out if this
kind of memory is available on the inconsistent parisc machines).

I think the question of whether it would actually simplify
things to embed this test in dma_malloc would depend on how common the
case is where you really want the device driver to fail. I suspect
that it would be simpler to create a symbol like SMP_CACHE_BYTES or
L1_CACHE_BYTES that the affected drivers could examine. Also, if the
need really is that common, maybe it could be put in struct
device_driver so that it could appear once instead of the typical
two or three times in each driver (and you could even teach depmod to
read it, although I don't know if that would be useful).

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."

2002-12-06 00:03:26

Subject: Re: [RFC] generic device DMA implementation

On Thu, Dec 05, 2002 at 03:43:58PM +0900, Miles Bader wrote:
> David Gibson <[email protected]> writes:
> > > As I mentioned in my previous message, one of my platforms is like that
> > > memory, which is only 2 megabytes in size.
> >
> > Ok, that starts to make sense then (what platform is it,
> > incidentally). Is using consistent memory actually faster than doing
> > the cache flushes explicitly? Much?
>
> It's an embedded evaluation board (Midas `RTE-MOTHER-A' and
> `RTE-V850E-MA1-CB').
>
> The thing is there _is_ no cache on this machine (it's very slow), so
> cache-consistency is actually not an issue (and the cache-flushing
> macros won't help me at all).
>
> PCI devices are several busses removed from the CPU, and they only have
> this one 2MB area in common. So on this machine, PCI devices can _only_
> use consistent memory.
>
> When a driver uses the non-consistent interfaces, then:
>
> * pci_map_single allocates a `shadow area' of consistent memory and
> pci_unmap_single deallocates it

That's a little misleading: all your memory is consistent, the point
is that it is a shadow area of PCI-mappable memory.

> * pci_dma_sync_... just does a memcpy to/from the `shadow' consistent
> memory from/to the driver's kmalloc'd block (in principle I think this
> is incorrect, because it uses the `dir' parameter to determine the
> direction to copy, but it works in practice)
>
> So you can see that for this platform, it would be better if drivers
> could _always_ use alloc_consistent, but many don't.
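The shadow-area scheme Miles describes is essentially a bounce buffer. Here is a userspace sketch, assuming plain malloc() stands in for the 2MB PCI-reachable window. The names (shadow_map_single() and friends) are hypothetical and are not the real pci_* calls:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Directions modelled on PCI_DMA_TODEVICE / PCI_DMA_FROMDEVICE,
 * redefined here so the sketch stands alone. */
enum dma_dir { DMA_TO_DEV, DMA_FROM_DEV };

/* A driver buffer paired with a shadow copy in device-reachable memory. */
struct shadow_map {
    void *driver_buf;   /* the driver's ordinary buffer */
    void *shadow;       /* bounce copy the device actually DMAs to/from */
    size_t len;
    enum dma_dir dir;
};

/* "map": allocate the shadow; for outbound data, copy it in now. */
static int shadow_map_single(struct shadow_map *m, void *buf, size_t len,
                             enum dma_dir dir)
{
    m->shadow = malloc(len);
    if (!m->shadow)
        return -1;
    m->driver_buf = buf;
    m->len = len;
    m->dir = dir;
    if (dir == DMA_TO_DEV)
        memcpy(m->shadow, buf, len);
    return 0;
}

/* "sync": copy in whichever direction the mapping says data flows. */
static void shadow_sync(struct shadow_map *m)
{
    if (m->dir == DMA_TO_DEV)
        memcpy(m->shadow, m->driver_buf, m->len);
    else
        memcpy(m->driver_buf, m->shadow, m->len);
}

/* "unmap": pull inbound data back to the driver, then free the shadow. */
static void shadow_unmap_single(struct shadow_map *m)
{
    assert(m->shadow != NULL);
    if (m->dir == DMA_FROM_DEV)
        memcpy(m->driver_buf, m->shadow, m->len);
    free(m->shadow);
    m->shadow = NULL;
}
```

Note that shadow_sync() has to infer the copy direction from the mapping's direction alone. This is the limitation Miles flags: a single sync call cannot say whether ownership is moving to the CPU or to the device.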

Ah, ok, now I understand. The issue here is actually nothing to do
with consistency, since your platform is fully consistent. The issue
is that there are other constraints for DMAable memory and you want
drivers to be able to easily mallocate with those constraints.

Actually, it occurs to me that PC ISA DMA is in a similar situation -
there is a constraint on DMAable memory (sufficiently low physical
address) which has nothing to do with consistency.

> Yes this is a weird and frustrating design, but I think it does credit
> to the linux PCI layer that I could get it to work at all, without
> modifying any drivers! I guess my main goal in this discussion is to
> ensure that remains the case...

Ok... see also my reply to one of James's posts.

--
David Gibson | For every complex problem there is a
[email protected] | solution which is simple, neat and
| wrong.
http://www.ozlabs.org/people/dgibson

2002-12-06 00:03:27

Subject: Re: [RFC] generic device DMA implementation

On Thu, Dec 05, 2002 at 12:08:16PM +0100, Benjamin Herrenschmidt wrote:
> On Thu, 2002-12-05 at 01:47, David Gibson wrote:
> > Do you have an example of where the second option is useful? Off hand
> > the only places I can think of where you'd use a consistent_alloc()
> > rather than map_single() and friends is in cases where the hardware's
> > behaviour means you absolutely positively have to have consistent
> > memory.
>
> Looking at our implementation (ppc32 on non-coherent CPUs like 405) of
> pci_map_single, which just flushes the cache, I still feel we need a
> consistent_alloc, that is an implementation that _disables_ caching for
> the area.

No question there: that's James's first option.

> A typical example is an USB OHCI driver. You really don't want to play
> cache tricks with the shared area here. That will happen each time you
> have a shared area in memory in which both the CPU and the device may
> read/write in the same cache line.
>
> For things like ring descriptors of a net driver, I feel it's very much
> simpler (and possibly more efficient too) to also allocate non-cacheable
> space for consistent instead of continuously flushing/invalidating.
> Actually, flush/invalidate here can also have nasty side effects if
> several descriptors fit in the same cache line.
>
> The data buffers, of course (skbuffs typically) would preferably use
> pci_map_* like APIs (hrm... did we ever make sure skbuffs would _not_
> mix the data buffer with control data in the same cache line? This
> has been a problem with non-coherent CPUs in the past).

Indeed - the 405GP ethernet driver, which I've worked on, uses exactly
this approach. consistent_alloc() is used for the descriptor ring
buffer, and DMA syncs are used for the data buffers.
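To make the point about several descriptors per cache line concrete, here is a small hypothetical helper (not part of any kernel API) that reports whether two ring entries share a line. Allocating the ring from uncached consistent memory, as the 405GP driver does, sidesteps the problem entirely:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Do descriptors i and j of a contiguous, line-aligned ring touch a
 * common cache line?  If so, invalidating that line to read the
 * device's update to one descriptor can throw away a CPU write still
 * sitting in the cache for the other, and writing it back can clobber
 * a field the device has just updated. */
static bool descs_share_line(size_t i, size_t j, size_t desc_bytes,
                             size_t line_bytes)
{
    size_t i_first, i_last, j_first, j_last;

    assert(desc_bytes > 0 && line_bytes > 0);
    if (i == j)
        return false;
    i_first = (i * desc_bytes) / line_bytes;
    i_last  = ((i + 1) * desc_bytes - 1) / line_bytes;
    j_first = (j * desc_bytes) / line_bytes;
    j_last  = ((j + 1) * desc_bytes - 1) / line_bytes;
    return i_first <= j_last && j_first <= i_last;
}
```

For example, 16-byte descriptors 0 and 1 share a 32-byte line, and 32-byte descriptors 0 and 1 share a 64-byte line. The second case is the 32-versus-64-byte scenario raised earlier in the thread.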

--
David Gibson | For every complex problem there is a
[email protected] | solution which is simple, neat and
| wrong.
http://www.ozlabs.org/people/dgibson

2002-12-06 00:03:26

Subject: Re: [RFC] generic device DMA implementation

On Thu, Dec 05, 2002 at 06:49:10PM +0100, Manfred Spraul wrote:
> >
> >
> >Hmm... that doesn't seem sufficient to explain it.
> >
> >Some background: I work with PPC embedded chips (the 4xx family) whose
> >only way to get consistent memory is by entirely disabling the cache.

> What do you mean by "disable"?
> Do you have to disable the cache entirely when you encounter the first
> pci_alloc_consistent() call, or do you disable the cache just for the
> region that is returned by pci_alloc_consistent()?

Just for the region - it is an attribute in the PTE.

--
David Gibson | For every complex problem there is a
[email protected] | solution which is simple, neat and
| wrong.
http://www.ozlabs.org/people/dgibson

2002-12-06 00:02:03

Subject: Re: [RFC] generic device DMA implementation

On Thu, Dec 05, 2002 at 11:29:45AM -0500, Jeff Garzik wrote:
> David Gibson wrote:
> >On Wed, Dec 04, 2002 at 10:41:31PM -0500, Jeff Garzik wrote:
> >
> >>On Thu, Dec 05, 2002 at 01:38:47PM +1100, David Gibson wrote:
> >>
> >>>It seems the "try to get consistent memory, but otherwise give me
> >>>inconsistent" is only useful on machines which:
> > >>> (1) Are not fully consistent, BUT
> >>> (2) Can get consistent memory without disabling the cache, BUT
> >>> (3) Not very much of it, so you might run out.
> >>>
> >>>The point is, there has to be an advantage to using consistent memory
> >>>if it is available AND the possibility of it not being available.
> >>
> >>Agreed here. Add to this
> >>
> >>(4) quite silly from an API taste perspective.
> >>
> >>
> >>
> >>>Otherwise, drivers which absolutely need consistent memory, no matter
> >>>the cost, should use consistent_alloc(), all other drivers just use
> >>>kmalloc() (or whatever) then use the DMA flushing functions which
> >>>compile to NOPs on platforms with consistent memory.
> >>
> >>Ug. This is travelling backwards in time.
> >>
> >>kmalloc is not intended to allocate memory for DMA'ing. I (and others)
> >>didn't spend all that time converting drivers to the PCI DMA API just to
> >>see all that work undone.
> >
> >
> >But if there aren't any consistency constraints on the memory, why not
> >get it with kmalloc()? There are two approaches to handling DMA on a
> >not-fully-consistent machine:
> > 1) Allocate the memory specially so that it is consistent
> > 2) Use any old memory, and make sure we have explicit cache
> >frobbing.
>
> For me it's an API issue. kmalloc does not return DMA'able memory.

Ok - see my reply to James's post. I see the point of this given that
there are constraints on DMAable memory which are not related to
consistency (e.g. particular address ranges and cacheline alignment).
A mallocater which can satisfy these constraints makes sense to me.

I just think it's a mistake to associate these constraints with cache
consistency - they are not related. James's original patch does make
this separation in practice, but it misleadingly suggests a link - which
is what confused me.

>
> If "your way" is acceptable to most, then at the very least I would want
>
> #define get_any_old_dmaable_memory kmalloc

I imagine platforms where any address is DMAable and which are fully
consistent would do this.

--
David Gibson | For every complex problem there is a
[email protected] | solution which is simple, neat and
| wrong.
http://www.ozlabs.org/people/dgibson

2002-12-06 00:03:27

Subject: Re: [RFC] generic device DMA implementation

On Thu, Dec 05, 2002 at 03:57:53AM -0800, Adam J. Richter wrote:
> David Gibson wrote:
> >Since, with James's approach you'd need a dma sync function (which
> >might compile to NOP) in pretty much the same places you'd need
> >map/sync calls, I don't see that it does make the source noticeably
> >simpler.
>
> Because then you don't have to have a branch for
> the case where the platform *does* support consistent memory.

Sorry, you're going to have to explain where this extra branch is, I
don't see it.

> >> If we were to try the approach of using pci_{map,sync}_single
> >> always (i.e., just writing the code not to use alloc_consistent),
> >> that would have a performance cost on machines where using
> >> consistent memory for writing small amounts of data is cheaper than
> >> the cost of the cache flushes that would otherwise be required.
> >
> >Well, I'm only talking about the cases where we actually care about
> >reducing the use of consistent memory.
>
> Then you're not fully weighing the benefits of this facility.
> The primary beneficiaries of this facility are device drivers for
> which we'd like to have the performance advantages of consistent
> memory when available (at least on machines that always return
> consistent memory) but which we'd also like to have work as

What performance advantages of consistent memory? Can you name any
non-fully-consistent platform where consistent memory is preferable
when it is not strictly required? For all the non-consistent
platforms I'm aware of, getting consistent memory means disabling the
cache and is therefore to be avoided wherever it can be.

> efficiently as possible on platforms that lack consistent memory or
> have so little that we want the device driver to still work even when
> no consistent memory is available. That includes all PCI devices that
> users of the inconsistent parisc machines want to use.

--
David Gibson | For every complex problem there is a
[email protected] | solution which is simple, neat and
| wrong.
http://www.ozlabs.org/people/dgibson

2002-12-06 00:02:05

Subject: Re: [RFC] generic device DMA implementation

On Thu, Dec 05, 2002 at 09:03:24AM -0600, James Bottomley wrote:
> [email protected] said:
> > But if you have the sync points, you don't need a special allocator
> > for the memory at all - any old RAM will do. So why not just use
> > kmalloc() to get it.
>
> Because with kmalloc, you have to be aware of platform
> implementations. Most notably that cache flush/invalidate
> instructions only operate at the level of certain block of memory
> (called the cache line width). If kmalloc returns less than a cache
> line width you have the potential for severe cockups because of the
> possibility of interfering cache operations on adjacent kmalloc
> regions that share the same cache line.

Having debugged a stack corruption problem when attempting to use USB
on a PPC 4xx machine, which was due to improperly aligned DMA buffers,
I am well aware of this issue.

> the dma_alloc... function guarantees to avoid this for you by passing the
> allocation to the platform layer which knows the cache characteristics and
> requirements for the machine (and dma controller) you're using.

Ok - now I begin to see the point of this: I was being misled by the
emphasis on a preference for consistent allocation and the original
"alloc_consistent" name you suggested. When consistent memory isn't
strictly required it's as likely as not that it won't be preferred
either.

Given this, and Miles's example, I can see the point of a DMA mallocater
that applies DMA constraints that are not to do with consistency.
Then consistency could also be specified, but that's a separate issue.

So, to remove the misleading emphasis on the allocated memory
being consistent (your name change was a start; this goes
further), I'd prefer to see something like:

void *dma_malloc(struct device *bus, unsigned long size, int flags,
dma_addr_t *dma_addr);

Which returns virtual and DMA pointers for a chunk of memory
satisfying any DMA conditions for the specified bus. Then if flags
includes DMA_CONSISTENT (or some such) the memory will be allocated
consistent in addition to those constraints.

If DMA_CONSISTENT is not specified, the memory might be consistent,
and there would be a preference for consistent only on platforms where
consistent memory is actually preferable (I haven't yet heard of one).
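Here is a userspace mock of the interface proposed above, written only to show how the flag and the per-device constraints might fit together. The simplified struct device, the alignment policy and the mask check are assumptions for illustration, not a description of any real implementation:

```c
#define _POSIX_C_SOURCE 200112L   /* for posix_memalign() */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t dma_addr_t;        /* stand-in for the kernel type */

#define DMA_CONSISTENT 0x1          /* flag name taken from the proposal */

/* Much-simplified stand-in for struct device: only what this mock uses. */
struct device {
    uint64_t dma_mask;              /* highest bus address the device reaches */
    size_t   cache_line;            /* platform cache line size (power of 2) */
    int      have_consistent;       /* can the platform supply consistent memory? */
};

/* Mock of the proposed
 *   void *dma_malloc(struct device *dev, unsigned long size, int flags,
 *                    dma_addr_t *dma_addr);
 * Returns cache-line aligned, line-padded memory inside dev->dma_mask.
 * With DMA_CONSISTENT it fails when the platform has no consistent
 * memory; when it does, the mock simply pretends ordinary memory is
 * consistent.  Bus address equals the virtual address here purely for
 * illustration. */
static void *dma_malloc(struct device *dev, unsigned long size, int flags,
                        dma_addr_t *dma_addr)
{
    size_t line = dev->cache_line;
    size_t align = line < sizeof(void *) ? sizeof(void *) : line;
    size_t padded;
    void *p;

    assert(line != 0 && (line & (line - 1)) == 0);
    if (size == 0)
        return NULL;
    if ((flags & DMA_CONSISTENT) && !dev->have_consistent)
        return NULL;
    padded = ((size_t)size + line - 1) & ~(line - 1);
    if (posix_memalign(&p, align, padded) != 0)
        return NULL;
    if ((uint64_t)(uintptr_t)p + padded - 1 > dev->dma_mask) {
        free(p);                    /* outside what the device can reach */
        return NULL;
    }
    *dma_addr = (dma_addr_t)(uintptr_t)p;
    return p;
}

static void dma_free(struct device *dev, unsigned long size, void *vaddr,
                     dma_addr_t dma_addr)
{
    (void)dev;
    (void)size;
    (void)dma_addr;
    free(vaddr);
}
```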

--
David Gibson | For every complex problem there is a
[email protected] | solution which is simple, neat and
| wrong.
http://www.ozlabs.org/people/dgibson

2002-12-06 02:04:25

Subject: Re: [RFC] generic device DMA implementation

David Gibson wrote:
>On Thu, Dec 05, 2002 at 03:57:53AM -0800, Adam J. Richter wrote:
>> David Gibson wrote:
>> >Since, with James's approach you'd need a dma sync function (which
>> >might compile to NOP) in pretty much the same places you'd need
>> >map/sync calls, I don't see that it does make the source noticeably
>> >simpler.
>>
>> Because then you don't have to have a branch for
>> case where the platform *does* support consistent memory.

>Sorry, you're going to have to explain where this extra branch is, I
>don't see it.

In linux-2.5.50/drivers/net/lasi_82596.c, the macros
CHECK_{WBACK,INV,WBACK_INV} have definitions like:

#define CHECK_WBACK(addr,len) \
do { if (!dma_consistent) dma_cache_wback((unsigned long)addr,len); } while (0)

These macros are even used in IO paths like i596_rx(). The
"if()" statement in each of these macros is the extra branch that
disappears on most architectures under James's proposal.
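The difference between the two styles can be sketched in userspace. The first macro is modelled on the shape of CHECK_WBACK, with a counting stub in place of dma_cache_wback(). DMA_SYNC_WBACK and ARCH_DMA_ALWAYS_CONSISTENT are hypothetical names for the compile-time variant:

```c
#include <assert.h>

/* Stand-in for the arch writeback routine; it just counts calls. */
static int wback_calls;
static void fake_cache_wback(unsigned long addr, int len)
{
    (void)addr;
    (void)len;
    wback_calls++;
}

/* lasi_82596.c style: a run-time flag tested on every I/O-path call. */
static int dma_consistent;          /* set once at probe time */
#define CHECK_WBACK(addr, len) \
    do { if (!dma_consistent) fake_cache_wback((unsigned long)(addr), (len)); } while (0)

/* Proposed style (hypothetical names): the platform fixes the answer at
 * compile time.  Where DMA memory is always consistent the condition is
 * the constant 0 and the compiler discards both the call and the branch. */
#ifndef ARCH_DMA_ALWAYS_CONSISTENT
#define ARCH_DMA_ALWAYS_CONSISTENT 1
#endif
#define DMA_SYNC_WBACK(addr, len) \
    do { if (!ARCH_DMA_ALWAYS_CONSISTENT) fake_cache_wback((unsigned long)(addr), (len)); } while (0)
```

On a platform that can hand back either kind of memory, the test cannot be a constant, so the branch would simply move inside the sync macro.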

[...]
>What performance advantages of consistent memory? Can you name any
>non-fully-consistent platform where consistent memory is preferable
>when it is not strictly required? For all the non-consistent
>platforms I'm aware of, getting consistent memory means disabling the
>cache and is therefore to be avoided wherever it can be.

I believe that the cache synchronization operations for
nonconsistent memory are often expensive enough so that consistent
memory is faster on many platforms for small reads and writes, such as
dealing with control and status fields and hardware DMA lists. For
example, pci_sync_single is 55 lines of C code in
linux-2.5.50/arch/sparc64/kernel/pci_iommu.c.

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."

2002-12-06 02:16:31

Subject: Re: [RFC] generic device DMA implementation

David Gibson <[email protected]> writes:
> > * pci_map_single allocates a `shadow area' of consistent memory and
> > pci_unmap_single deallocates it
>
> That's a little misleading: all your memory is consistent, the point
> is that it is a shadow area of PCI-mappable memory.

Well I suppose that's true if you're using the term in a cache-related
sense (which I gather is the convention).

OTOH, if you think about it from the view point of the PCI framework,
the terminology actually does make sense even in this odd case --
`consistent' memory is indeed consistent (both CPU and device see a
single image), but other memory used to communicate with the driver is
`inconsistent' (CPU and device see different things until a sync
operation is done).

> The issue is that there are other constraints for DMAable memory and
> you want drivers to be able to easily mallocate with those
> constraints.
>
> Actually, it occurs to me that PC ISA DMA is in a similar situation -
> there is a constraint on DMAable memory (sufficiently low physical
> address) which has nothing to do with consistency.

Indeed. What I'm doing is basically bounce-buffers.

-Miles
--
`The suburb is an obsolete and contradictory form of human settlement'

2002-12-06 02:45:40

Subject: Re: [RFC] generic device DMA implementation

On Thu, Dec 05, 2002 at 06:08:22PM -0800, Adam J. Richter wrote:
> David Gibson wrote:
> >On Thu, Dec 05, 2002 at 03:57:53AM -0800, Adam J. Richter wrote:
> >> David Gibson wrote:
> >> >Since, with James's approach you'd need a dma sync function (which
> >> >might compile to NOP) in pretty much the same places you'd need
> >> >map/sync calls, I don't see that it does make the source noticeably
> >> >simpler.
> >>
> >> Because then you don't have to have a branch for
> >> case where the platform *does* support consistent memory.
>
> >Sorry, you're going to have to explain where this extra branch is, I
> >don't see it.
>
> In linux-2.5.50/drivers/net/lasi_82596.c, the macros
> CHECK_{WBACK,INV,WBACK_INV} have definitions like:
>
> #define CHECK_WBACK(addr,len) \
> do { if (!dma_consistent) dma_cache_wback((unsigned long)addr,len); } while (0)
>
> These macros are even used in IO paths like i596_rx(). The
> "if()" statement in each of these macros is the extra branch that
> disappears on most architectures under James's proposal.

Erm... I have no problem with the macros that James's proposal would
use to take away this branch - I would expect to use exactly the same
ones. It's just the notion of "try to get consistent memory, but get
me any old memory otherwise" that I'm not so convinced by.

In any case, on platforms where the dma_malloc() could really return
either consistent or non-consistent memory, James's sync macros would
have to have an equivalent branch within.

> [...]
> >What performance advantages of consistent memory? Can you name any
> >non-fully-consistent platform where consistent memory is preferable
> >when it is not strictly required? For all the non-consistent
> >platforms I'm aware of, getting consistent memory means disabling the
> >cache and is therefore to be avoided wherever it can be.
>
> I believe that the cache synchronization operations for
> nonconsistent memory are often expensive enough so that consistent
> memory is faster on many platforms for small reads and writes, such as
> dealing with control and status fields and hardware DMA lists. For
> example, pci_sync_single is 55 lines of C code in
> linux-2.5.50/arch/sparc64/kernel/pci_iommu.c.

Hmm... fair enough. Ok, I can see the point of a fall back to
non-consistent approach given that. So I guess the idea makes sense,
so long as dma_malloc() (without the consistent flag) is taken to be
"give me DMAable memory, consistent or not, whichever is cheaper for
this platform" rather than "give me DMAable memory, consistent if
possible". It was originally presented as the latter which misled me.

I think the change to the parameters which I suggested in a reply to
James makes this a bit clearer.

--
David Gibson | For every complex problem there is a
[email protected] | solution which is simple, neat and
| wrong.
http://www.ozlabs.org/people/dgibson

2002-12-06 03:59:04

Subject: Re: [RFC] generic device DMA implementation

I think it's a huge error to try and move the DMA stuff into
the generic device interfaces _AND_ change semantics and arguments
at the same time.

Each operation should be done in separate steps.

Then, if you want to talk about changing semantics etc. there are
more pressing needs (read as: real bugs) in the current DMA APIs
that must be fixed before you add new "cool" features to the
interfaces. For example, we have a "pci_dma_sync_*()" interface
which changes ownership from the device back to the cpu, but we
do not have the corollary which returns ownership of the DMA buffer
back to the device. Basically, every networking device driver that
recycles buffers using pci_dma_sync_*() to peek at the header but then
gives the buffer back to the device is buggy for this reason.

Fix this before changing stuff.
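The ownership argument above can be modelled as a two-state hand-off. In this hypothetical sketch, sync_for_cpu() plays the role of the existing pci_dma_sync_*() and sync_for_device() is the missing reverse step. A driver that peeks at a received header and then requeues the buffer without the second call is the buggy case described:

```c
#include <assert.h>

/* Who may touch a streaming DMA buffer at the moment. */
enum dma_owner { OWNER_DEVICE, OWNER_CPU };

struct dma_buf_state {
    enum dma_owner owner;
};

/* What pci_dma_sync_*() provides today: ownership moves device -> CPU.
 * A real port would invalidate the CPU's cached copy of the buffer here. */
static void sync_for_cpu(struct dma_buf_state *b)
{
    assert(b->owner == OWNER_DEVICE);
    b->owner = OWNER_CPU;
}

/* The missing corollary (hypothetical name): ownership moves CPU -> device.
 * A real port would write back or invalidate lines as the direction
 * requires before the device may DMA into the buffer again. */
static void sync_for_device(struct dma_buf_state *b)
{
    assert(b->owner == OWNER_CPU);
    b->owner = OWNER_DEVICE;
}

/* A recycle path may only requeue the buffer once the device owns it. */
static int may_requeue(const struct dma_buf_state *b)
{
    return b->owner == OWNER_DEVICE;
}
```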

I don't have any time to discuss this further so please do me a big
favor and drop me from the CC: lists, I've been able to only lightly
read the existing parts of this thread, if at all, so the postings
will only hit /dev/null while I'm so busy right now.

Thanks.

2002-12-06 06:12:58

by David Brownell

Subject: Re: [RFC] generic device DMA implementation

I'm all in favor of making the driver model support dma mapping,
so usb won't need to try any more. I'd expect that to make some
dma model issues for the sa1100 and uml usb ports vanish, and
ideally to eliminate some code now in usbcore.

> empty before adding new requests. I think that the Linux OHCI
> controller currently only queues one request per bulk or control
> endpoint, so I don't think it uses this feature; if it were to, it

In 2.5, all hcds are supposed to queue all kinds of usb requests,
including ohci. (The ohci driver has supported that feature as
long as I recall.) Storage is using that by default now, which
lets high speed disks talk using big scatterlist dma requests.

That's a big change from 2.4, where queueing mostly worked but
wasn't really used by many drivers. In particular, storage
rarely queued more than one page ... now I've seen it queueing
several dozen pages, so faster devices can reach their peak
transfer speeds. (Tens of MByte/sec, sure.)

- Dave

2002-12-06 07:10:18

Subject: Re: [RFC] generic device DMA implementation

David Gibson wrote:
>On Thu, Dec 05, 2002 at 06:08:22PM -0800, Adam J. Richter wrote:
[...]
>> In linux-2.5.50/drivers/net/lasi_82596.c, the macros
>> CHECK_{WBACK,INV,WBACK_INV} have definitions like:
>>
>> #define CHECK_WBACK(addr,len) \
>> do { if (!dma_consistent) dma_cache_wback((unsigned long)addr,len); } while (0)
>>
>> These macros are even used in IO paths like i596_rx(). The
>> "if()" statement in each of these macros is the extra branch that
>> disappears on most architectures under James's proposal.
>
>Erm... I have no problem with the macros that James's proposal would
>use to take away this branch - I would expect to use exactly the same
>ones. It's just the notion of "try to get consistent memory, but get
>me any old memory otherwise" that I'm not so convinced by.
>
>In any case, on platforms where the dma_malloc() could really return
>either consistent or non-consistent memory, James's sync macros would
>have to have an equivalent branch within.

Yeah, I should have said "because then you don't have to have a
branch for the case where the platform always or *never* returns
consistent memory on a given machine."

>> >What performance advantages of consistent memory?

>> [...] For
>> example, pci_sync_single is 55 lines of C code in
>> linux-2.5.50/arch/sparc64/kernel/pci_iommu.c.
>
>Hmm... fair enough. Ok, I can see the point of a fall back to
>non-consistent approach given that. So I guess the idea makes sense,
>so long as dma_malloc() (without the consistent flag) is taken to be
>"give me DMAable memory, consistent or not, whichever is cheaper for
>this platform" rather than "give me DMAable memory, consistent if
>possible". It was originally presented as the latter which misled me.

As long as dma_sync_maybe works with the addresses returned by
dma_malloc and dma_malloc only returns the types of memory that the
callers claims to be prepared to deal with, the decision about what
kind of memory dma_malloc should return when it has a choice is up to
the platform implementation.

>I think the change to the parameters which I suggested in a reply to
>James makes this a bit clearer.

I previously suggested some of the changes in your description:
name them dma_{malloc,free} (which James basically agrees with), have
a flags field. However, given that it's a parameter and you're going
to pass a constant symbol like DMA_CONSISTENT or DMA_INCONSISTENT to it,
it doesn't really matter if it's an enum or an int to start with, as it
could be changed later with minimal or zero driver changes.

I like your term DMA_CONSISTENT better than
DMA_CONFORMANCE_CONSISTANT. I think the word "conformance" in there
does not reduce the time that it takes to figure out what the symbol
means. I don't think any other facility will want to use the terms
DMA_{,IN}CONSISTENT, so I prefer that we go with the more medium sized
symbol.

Naming the parameter to dma_malloc "bus" would imply that it
will not look at individual device information like dma_mask, which is
wrong. Putting the flags field in the middle of the parameter list
will make the dma_malloc and dma_free parameter lists unnecessarily different.
I think these two were just oversights in your posting.

Anyhow, I think we're in full agreement on the
substantive stuff at this point.

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."

2002-12-06 07:37:36

Subject: Re: [RFC] generic device DMA implementation

Dave Miller wrote:
>I think it's a huge error to try and move the DMA stuff into
>the generic device interfaces _AND_ change semantics and arguments
>at the same time.

Nobody is talking about changing the existing pci_xxx
interface. For the new dma_xxx routines, I think it would actually be
an error to wait to make the particular changes we are discussing,
because now there is no compatibility to break and, less
importantly, because having it in 2.6.0 from the start might make one
less #ifdef for those people who want to try to maintain a
multi-version "2.6.x" device driver. (Notice that I try to describe
underlying advantages or disadvantages when I advocate something.)

People have already given a lot of thought to the modest
difference in the dma_xxx interface being discussed. These changes
will eliminate a difficulty in supporting devices on inconsistent-only
machines, a real problem that was partly induced by the original
pci_alloc_consistent interface.

Six months ago, I posted a proposal to turn scatterlists into
linked lists to reduce copying and translation between certain IO
list formats. David responded "Now is not the time for this, when we
finally have the generic struct device stuff, then you can start doing
DMA stuff at the generic layer" in this posting:
http://marc.theaimsgroup.com/?l=linux-kernel&m=102501406027125&w=2

I think that David is erring too much on the side of
stagnation right now. I hope he'll understand this in future. In the
meantime, I'd be in favor of continuing to work this into a clean
patch that everyone else likes and then asking Linus to integrate that
with or without David's blessing if nobody identifies any real
technical problems with it, at least if nobody else objects.

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."

2002-12-06 15:17:59

Subject: Re: [RFC] generic device DMA implementation

On Thu, 2002-12-05 at 23:41, Adam J. Richter wrote:
> These changes
> will eliminate a difficulty in supporting devices on inconsistent-only
> machines

These systems simply do not exist.

You can turn the cache off on pages or the cpu caches are fully coherent
with the device with caches turned on for the page.

What platform is the exception? Not bothering to implement the
cache-disabled mapping solution does not make a platform a candidate
for the answer to this question.

I think you are solving a non-problem, but if you want me to see your
side of the story you need to give me specific examples of where
pci_alloc_consistent() is "IMPOSSIBLE".

2002-12-06 16:14:40

Subject: Re: [RFC] generic device DMA implementation

On Fri, 2002-12-06, David S. Miller wrote:
>On Thu, 2002-12-05 at 23:41, Adam J. Richter wrote:
>> These changes
>> will eliminate a difficulty in supporting devices on inconsistent-only
>> machines

>I think you are solving a non-problem, but if you want me to see your
>side of the story you need to give me specific examples of where
>pci_alloc_consistent() is "IMPOSSIBLE".

I am not a parisc developer, but it is apparently the
case for certain parisc machines with "PCXS/T processors" or
the "T class" machines, as described by Matthew Wilcox:

http://lists.parisc-linux.org/pipermail/parisc-linux/2002-December/018535.html

They currently need the contortions
that are implemented in linux-2.5.50/drivers/net/lasi_82596.c
and partially implemented in drivers/scsi/53c700.c to be
implemented in every driver that they want to use (i.e., what
these drivers try to do when pci_alloc_consistent fails).

Under the API addition that we've been discussing, the
extra cache flushes and invalidations that these drivers need
would become macros that would expand to nothing on the
other architectures, and the drivers would no longer have to
have "if (consistent_allocation_failed) ..." branches around them.

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."

2002-12-06 16:19:39

Subject: Re: [RFC] generic device DMA implementation

[email protected] said:
> I like your term DMA_CONSISTENT better than DMA_CONFORMANCE_CONSISTENT
> . I think the word "conformance" in there does not reduce the time
> that it takes to figure out what the symbol means. I don't think any
> other facility will want to use the terms DMA_{,IN}CONSISTENT, so I
> prefer that we go with the more medium sized symbol.

I'm not so keen on this. The idea of this parameter is not to tell the
allocation routine what type of memory you would like, but to tell it what
type of memory the driver can cope with. I think for the inconsistent case,
DMA_INCONSISTENT looks like the driver is requiring inconsistent memory, and
expecting to get it. I'm open to changing the "CONFORMANCE" part, but I'd
like to name these parameters something that doesn't imply they're requesting
a type of memory.

James

2002-12-06 16:32:30

by Matthew Wilcox

Subject: Re: [RFC] generic device DMA implementation

On Fri, Dec 06, 2002 at 08:19:25AM -0800, Adam J. Richter wrote:
> On Fri, 2002-12-06, David S. Miller wrote:
> >I think you are solving a non-problem, but if you want me to see your
> >side of the story you need to give me specific examples of where
> >pci_alloc_consistent() is "IMPOSSIBLE".
>
> I am not a parisc developer, but it is apparently the
> case for certain parisc machines with "PCXS/T processors" or
> the "T class" machines, as described by Matthew Wilcox:

Machines built with PCXS and PCXT processors are guaranteed not to have
PCI. So this only becomes a problem when supporting non-PCI devices.
The devices you mentioned -- 53c700 & 82596 -- are core IO and really do
need to be supported. There's also a large userbase for these machines;
dropping support for them is not an option.

T class machines don't have PCI slots per se, but they do have GSC
slots into which a card can be plugged that contains a Dino GSC to PCI
bridge and one or more PCI devices. Examples of cards that are like
this include acenic, single and dual tulip.

On the other hand, it's going to take some really motivated person to
make T class work. I'm firmly uninterested in it.

--
"It's not Hollywood. War is real, war is primarily not about defeat or
victory, it is about death. I've seen thousands and thousands of dead bodies.
Do you think I want to have an academic debate on this subject?" -- Robert Fisk

2002-12-06 16:40:53

Subject: Re: [RFC] generic device DMA implementation

> These systems simply do not exist.

Yes, they do. The parisc pcxs and pcxt processors are the prime example that
has annoyed me for a while. This has no ability to control the cache at the
page level (it doesn't even seem to allow fully disabling the processor
cache---not that you'd want to do that). The result is that it cannot ever
return consistent memory, so pci_alloc_consistent always fails (see
arch/parisc/kernel/pci-dma.c:fail_alloc_consistent). I have one of these
machines (an HP9000/715) and I maintain the driver for the SCSI chip, which
also needs to work efficiently on the intel platform, which is what got me
first thinking about the problem.

Let me say again: I don't envisage any driver writer worrying about this edge
case, unless they're already implementing workarounds for it now.

I plan to maintain the current pci_ DMA API exactly as it is, with no
deviations. Thus the dma_ API too can be operated in full compatibility mode
with the pci_ API. That's the design intent. However, I want the dma_ API to
simplify this driver edge case for me (and for others who have to maintain
similar drivers), which is why it allows a deviation from the pci_ API *if the
driver writer asks for it*.

James

2002-12-06 17:02:43

Subject: Re: [RFC] generic device DMA implementation

On Fri, 06 Dec 2002, James Bottomley wrote:
>[email protected] said:
>> I like your term DMA_CONSISTENT better than DMA_CONFORMANCE_CONSISTENT
>> . I think the word "conformance" in there does not reduce the time
>> that it takes to figure out what the symbol means. I don't think any
>> other facility will want to use the terms DMA_{,IN}CONSISTENT, so I
>> prefer that we go with the more medium sized symbol.

>I'm not so keen on this. The idea of this parameter is not to tell the
>allocation routine what type of memory you would like, but to tell it what
>type of memory the driver can cope with. I think for the inconsistent case,
>DMA_INCONSISTENT looks like the driver is requiring inconsistent memory, and
>expecting to get it. I'm open to changing the "CONFORMANCE" part, but I'd
>like to name these parameters something that doesn't imply they're requesting
>a type of memory.

How about renaming DMA_INCONSISTENT to DMA_MAYBE_CONSISTENT?

By the way, I previously suggested a flags field to indicate
what the driver could cope with. 0 would mean consistent memory, 1's
would indicate other things that the driver could cope with that would
be added if and when a real need for them arises (read caching, write
back caching, cpu-cpu consistency, cache line size smaller than 2**n
bytes, etc.). Regarding the debugging capability of
DMA_CONFORMANCE_NONE, I don't think that will be as useful in the way
that DMA_DIRECTION_NONE is, because transfer direction is often passed
through the io path of a device driver and errors in doing so are
common. In comparison, I think the calls to dma_malloc will typically
have this argument specified as a constant where the call is made,
with the possible exception of some allocation being consolidated in
the generic device layer.
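One way to read the flags-field suggestion, as a minimal user-space sketch: DMA_MAYBE_CONSISTENT is the name floated in this thread, but its bit value, struct model_platform and model_dma_malloc are hypothetical, and malloc() merely stands in for both kinds of memory. A flags value of 0 means "consistent memory only", so the allocator fails rather than hand back memory the caller has not said it can cope with.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical bit: the driver issues its own sync points and can cope
 * with memory that is not consistent.  A flags value of 0 therefore
 * means "consistent memory only". */
#define DMA_MAYBE_CONSISTENT 0x1u

/* Hypothetical description of what the running platform can provide. */
struct model_platform {
    bool can_alloc_consistent;   /* false models PCXS/PCXT parisc */
};

/* Returns a buffer, or NULL when the caller cannot cope with what the
 * platform has.  *is_consistent reports which kind was handed out. */
static void *model_dma_malloc(const struct model_platform *plat,
                              size_t size, unsigned int flags,
                              bool *is_consistent)
{
    if (plat->can_alloc_consistent) {
        *is_consistent = true;
        return malloc(size);     /* stand-in for consistent memory */
    }
    if (flags & DMA_MAYBE_CONSISTENT) {
        *is_consistent = false;
        return malloc(size);     /* stand-in for ordinary cached memory */
    }
    return NULL;                 /* consistent-only caller: fail */
}
```

Further bits would be defined only when a real need for them arises; callers that pass 0 keep today's pci_alloc_consistent behaviour.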

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."

2002-12-06 17:34:59

Subject: Re: [RFC] generic device DMA implementation

On Fri, 6 Dec 2002, Matthew Wilcox wrote:
>Machines built with PCXS and PCXT processors are guaranteed not to have
>PCI. So this only becomes a problem when supporting non-PCI devices.
>The devices you mentioned -- 53c700 & 82596 -- are core IO and really do
>need to be supported. There's also a large userbase for these machines;
>dropping support for them is not an option.

Back on 7 Nov 2002, James Bottomley wrote:
| The ncr8xxx driver is another one used for the Zalon controller in parisc, so
| it will eventually have the same issues.

How many other drivers beyond these three do we expect to
need similar sync points if the T class remains unsupported?

>T class machines don't have PCI slots per se, but they do have GSC
>slots into which a card can be plugged that contains a Dino GSC to PCI
>bridge and one or more PCI devices. Examples of cards that are like
>this include acenic, single and dual tulip.

Regarding the "T class", I would be interested in knowing how
old it is, if it is discontinued at this point, how much of a user
base there is, and how many of these PCI-on-GSC cards there are.

I was previously under the impression that there were some
parisc machines that could take some kind of commodity PCI cards and
lacked consistent memory. If the reality is that only about six
drivers would ever have to be ported to use these sync points, then I
could see keeping dma_{alloc,free}_consistent, and moving the
capability of dealing with inconsistent memory to some wrappers in a
separate .h file (dma_alloc_maybe_consistent, dma_alloc_maybe_free).

I suppose another consideration would be how likely it is that
a machine that we might care about without consistent memory will ship
in the future. In general, the memory hierarchy is getting taller
(levels of caching, non-uniform memory access), but perhaps the
industry will continue to treat consistent memory capability as a
requirement.

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."

2002-12-06 17:41:08

Subject: Re: [RFC] generic device DMA implementation

On Fri, Dec 06, 2002 at 10:26:57AM -0600, James Bottomley wrote:
> I'm not so keen on this. The idea of this parameter is not to tell the
> allocation routine what type of memory you would like, but to tell it what
> type of memory the driver can cope with. I think for the inconsistent case,
> DMA_INCONSISTENT looks like the driver is requiring inconsistent memory, and
> expecting to get it.

Of course if they're flags, then `DMA_CONSISTENT | DMA_INCONSISTENT'
is pretty obvious...

-Miles

--
P.S. All information contained in the above letter is false,
for reasons of military security.

2002-12-06 18:00:24

by Matthew Wilcox

Subject: Re: [RFC] generic device DMA implementation

On Fri, Dec 06, 2002 at 09:39:24AM -0800, Adam J. Richter wrote:
> On Fri, 6 Dec 2002, Matthew Wilcox wrote:
> >Machines built with PCXS and PCXT processors are guaranteed not to have
> >PCI. So this only becomes a problem when supporting non-PCI devices.
> >The devices you mentioned -- 53c700 & 82596 -- are core IO and really do
> >need to be supported. There's also a large userbase for these machines;
> >dropping support for them is not an option.
>
> Back on 7 Nov 2002, James Bottomley wrote:
> | The ncr8xxx driver is another one used for the Zalon controller in parisc, so
> | it will eventually have the same issues.
>
> How many other drivers beyond these three do we expect to
> need similar sync points if the T class remains unsupported?

Er, well... machines that can take the Zalon card also have consistent PCI.
However, there are machines which can take the ncr53c720 chip which have
no consistent shared memory available. Rumours abound of an ncr53c770
driver that already supports non-consistent memory, but nobody's actually
said which one it is yet.

Leaving aside the T-class, machines that don't support io consistent memory
generally have:

(drivers that need io consistent memory):
- 82596 ethernet
- ncr53c700 scsi
- ncr53c720 scsi
- zero to four EISA slots

(drivers that don't do DMA):
- two 16550-compatible serial ports
- Mux serial port
- Lasi parallel port
- Skunk parallel port
- HIL keyboard/mouse
- Graphics cards

(custom drivers needed anyway):
- Harmony audio
- various other SCSI chips
- Interphase 100BaseTx
- HPPB slots

I think that's about it...

> >T class machines don't have PCI slots per se, but they do have GSC
> >slots into which a card can be plugged that contains a Dino GSC to PCI
> >bridge and one or more PCI devices. Examples of cards that are like
> >this include acenic, single and dual tulip.
>
> Regarding the "T class", I would be interested in knowing how
> old it is, if it is discontinued at this point, how much of a user
> base there is, and how many of these PCI-on-GSC cards there are.

It's certainly discontinued. I get the impression it was already out in
1997 from a quick Google search. It's not exactly a slow machine even
by today's standards -- up to 12 180MHz 64-bit processors, but it's just
too weird to be worth supporting.

There's lots of PCI-on-GSC cards; they were used in the B/C/J workstations
and the D/K/R servers.

> I was previously under the impression that there were some
> parisc machines that could take some kind of commodity PCI cards and
> lacked consistent memory.

No, that is not the case.

> If the reality is that only about six
> drivers would ever have to be ported to use these sync points, then I
> could see keeping dma_{alloc,free}_consistent, and moving the
> capability of dealing with inconsistent memory to some wrappers in a
> separate .h file (dma_alloc_maybe_consistent, dma_alloc_maybe_free).
>
> I suppose another consideration would be how likely it is that
> a machine that we might care about without consistent memory will ship
> in the future. In general, the memory hierarchy is getting taller
> (levels of caching, non-uniform memory access), but perhaps the
> industry will continue to treat consistent memory capability as a
> requirement.

I think it will. The IOMMU in the T600 is the only one I've ever heard
of that wasn't consistent with host memory.

2002-12-06 18:12:42

Subject: Re: [RFC] generic device DMA implementation

From: "Adam J. Richter" <[email protected]>
Date: Fri, 6 Dec 2002 08:19:25 -0800

Under the API addition that we've been discussing, the
extra cache flushes and invalidations that these drivers need
would become macros that would expand to nothing on the
other architectures, and the drivers would no longer have to
have "if (consistent_allocation_failed) ..." branches around them.

Ok, but here is where my big concerns lie.

Specifically, it took years to get most developers comfortable with
pci_alloc_consistent() and friends. I totally fear that asking them to
now add cache flushing stuff to their drivers takes the complexity way
over the edge.

Willy, these PCXS/T processors sound like a newer cpu, do you mean to
tell me the caches are totally not coherent with device bus space?

Please elaborate, I want to learn more.

2002-12-06 18:21:42

Subject: Re: [RFC] generic device DMA implementation

[email protected] said:
> Specifically, it took years to get most developers comfortable with
> pci_alloc_consistent() and friends. I totally fear that asking them to
> now add cache flushing stuff to their drivers takes the complexity way
> over the edge.

I have no plans ever to do that. It's only a tiny minority of drivers that
should ever need to know the awful guts of cache flushing, and such drivers
are already implementing the cache flushes now.

How about (as Adam suggested) two dma allocation APIs

1) dma_alloc_consistent which behaves identically to pci_alloc_consistent
2) dma_alloc which can take the conformance flag and can be used to tidy up
the drivers that need to know about cache flushing.
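A minimal sketch of the two-entry-point arrangement, assuming hypothetical model_* names, a global flag standing in for the platform, and malloc() as a stand-in for DMA memory. The consistent-only call keeps pci_alloc_consistent's behaviour of failing when no consistent memory exists, while the general allocator takes the conformance value and may succeed on the same machine.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical conformance values modelled on the proposal above. */
enum model_dma_conformance {
    MODEL_DMA_CONFORMANCE_CONSISTENT,      /* driver needs consistent memory */
    MODEL_DMA_CONFORMANCE_NON_CONSISTENT,  /* driver does its own syncs */
};

/* Hypothetical platform property: 0 models PCXS/PCXT parisc. */
static int model_platform_has_consistent = 1;

/* 2) The general allocator: the value states what the driver copes with. */
static void *model_dma_alloc(size_t size, enum model_dma_conformance conf)
{
    if (!model_platform_has_consistent &&
        conf == MODEL_DMA_CONFORMANCE_CONSISTENT)
        return NULL;   /* the same failure pci_alloc_consistent reports */
    return malloc(size);
}

/* 1) Behaves like pci_alloc_consistent: never returns inconsistent
 * memory, fails instead. */
static void *model_dma_alloc_consistent(size_t size)
{
    return model_dma_alloc(size, MODEL_DMA_CONFORMANCE_CONSISTENT);
}
```

Existing drivers would keep calling the consistent-only entry point unchanged; only the few drivers that already carry their own sync points would move to the general one.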

James

2002-12-06 18:26:48

Subject: Re: [RFC] generic device DMA implementation

From: James Bottomley <[email protected]>
Date: Fri, 06 Dec 2002 12:29:10 -0600

How about (as Adam suggested) two dma allocation APIs

1) dma_alloc_consistent which behaves identically to pci_alloc_consistent
2) dma_alloc which can take the conformance flag and can be used to tidy up
the drivers that need to know about cache flushing.

Now that the situation is much more clear, I'm feeling a lot
better about this.

I have only one request, in terms of naming. What we're really
doing is adding a third class of memory, it really isn't consistent
and it really isn't streaming. It's inconsistent memory meant to
be used for "consistent memory things".

So could someone come up with a clever name for this thing? :-)

2002-12-06 18:28:33

by Matthew Wilcox

Subject: Re: [RFC] generic device DMA implementation

On Fri, Dec 06, 2002 at 10:17:15AM -0800, David S. Miller wrote:
> Specifically, it took years to get most developers comfortable with
> pci_alloc_consistent() and friends. I totally fear that asking them to
> now add cache flushing stuff to their drivers takes the complexity way
> over the edge.
>
> Willy, these PCXS/T processors sound like a newer cpu, do you mean to
> tell me the caches are totally not coherent with device bus space?
>
> Please elaborate, I want to learn more.

Nono, these are _old_ machines, probably stopped production round about
1995 or so. We mentioned these briefly back in the original days of
the pci_alloc_consistent interface discussions. These machines cannot
allocate uncached memory, nor can the peripherals snoop the CPU's cache
(or vice versa). As I indicated to Adam, there's a fairly limited range
of devices available for these systems and there shouldn't be a huge
problem converting the few drivers we need to these interfaces.

2002-12-06 18:33:20

Subject: Re: [RFC] generic device DMA implementation

[email protected] said:
> I have only one request, in terms of naming. What we're really doing
> is adding a third class of memory, it really isn't consistent and it
> really isn't streaming. It's inconsistent memory meant to be used for
> "consistent memory things".

Yes, we've discussed that too...but not come to a conclusion. The problem is
really that if you call dma_alloc and pass in the DMA_CONFORMANCE_NON_CONSISTENT
flag, what you're saying is "This driver implements all the correct cache
flushes and can cope with inconsistent memory. Please give me the type of
memory that's most efficient for the platform I'm running on." The driver
isn't asking for a specific type of memory; it's telling the platform what
its capabilities are.
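The point that the flag describes the driver, rather than requesting a memory type, can be sketched as a platform-side decision. Everything here is hypothetical, including the assumption that some platform implements consistent memory as an uncached mapping that is costly for CPU access; the function only picks a memory kind and allocates nothing.

```c
#include <stdbool.h>

/* Hypothetical result of the platform's choice. */
enum model_mem_kind {
    MODEL_MEM_NONE,            /* allocation must fail */
    MODEL_MEM_CONSISTENT,
    MODEL_MEM_NONCONSISTENT,
};

/* Hypothetical platform properties. */
struct model_platform {
    bool has_consistent;        /* can consistent memory be provided at all? */
    bool consistent_is_costly;  /* assumed: consistent == slow uncached mapping */
};

/* The driver only states whether it performs its own sync points;
 * the platform picks whichever acceptable kind suits it best. */
static enum model_mem_kind
model_pick_memory(const struct model_platform *p, bool driver_does_syncs)
{
    if (!driver_does_syncs)
        return p->has_consistent ? MODEL_MEM_CONSISTENT : MODEL_MEM_NONE;
    if (!p->has_consistent || p->consistent_is_costly)
        return MODEL_MEM_NONCONSISTENT;
    return MODEL_MEM_CONSISTENT;
}
```

On a platform with cheap consistent memory both kinds of driver get consistent memory; the stated capability only matters where consistent memory is costly or unavailable.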

Any thoughts on naming would be most welcome.

James

2002-12-06 18:33:38

Subject: Re: [RFC] generic device DMA implementation

From: Matthew Wilcox <[email protected]>
Date: Fri, 6 Dec 2002 18:36:09 +0000

As I indicated to Adam, there's a fairly limited range of devices
available for these systems and there shouldn't be a huge problem
converting the few drivers we need to these interfaces.

Ok, so to reiterate my other email, I'm fine with this as long as
suitable names are used to describe what is happening in the API
and to avoid confusion with existing practice.

2002-12-06 18:37:48

Subject: Re: [RFC] generic device DMA implementation

From: James Bottomley <[email protected]>
Date: Fri, 06 Dec 2002 12:40:49 -0600

Yes, we've discussed that too...but not come to a conclusion. The problem is
really that if you call dma_alloc and pass in the DMA_CONFORMANCE_NON_CONSISTENT
flag, what you're saying is "This driver implements all the correct cache
flushes and can cope with inconsistent memory. Please give me the type of
memory that's most efficient for the platform I'm running on." The driver
isn't asking for a specific type of memory; it's telling the platform what
its capabilities are.

Any thoughts on naming would be most welcome.

How about just making a dma_alloc_$(NEWNAME)(), and consistent ports
can just alias that to dma_alloc_consistent()?

The only question is $(NEWNAME). "inconsistent" might be ok, but it's
maybe too similar to "consistent" for my taste.

How about dma_alloc_noncoherent(). I like this one, comments?
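A sketch of the aliasing described here, assuming hypothetical model_* names and MODEL_ARCH_NONCOHERENT as a stand-in for an inconsistent-only port; malloc() stands in for DMA memory, and the counter exists only to show that on a consistent port the new name routes straight to the consistent allocator.

```c
#include <stdlib.h>

static int model_consistent_calls;   /* for illustration only */

/* Hypothetical stand-in for the port's consistent allocator. */
static void *model_dma_alloc_consistent(size_t size)
{
    model_consistent_calls++;
    return malloc(size);
}

#ifdef MODEL_ARCH_NONCOHERENT
/* An inconsistent-only port supplies a real allocator (cached memory);
 * drivers using it must issue the sync points themselves. */
static void *model_dma_alloc_noncoherent(size_t size)
{
    return malloc(size);
}
#else
/* Consistent ports: the new name is just an alias, so drivers written
 * against it cost nothing extra there. */
#define model_dma_alloc_noncoherent(size) model_dma_alloc_consistent(size)
#endif
```

Because the name is distinct, every driver that opts in can be found with a simple grep.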

2002-12-06 20:57:31

by Oliver Xymoron

Subject: Re: [RFC] generic device DMA implementation

On Fri, Dec 06, 2002 at 10:42:21AM -0800, David S. Miller wrote:
> From: James Bottomley <[email protected]>
> Date: Fri, 06 Dec 2002 12:40:49 -0600
>
> Yes, we've discussed that too...but not come to a conclusion. The problem is
> really that if you call dma_alloc and pass in the DMA_CONFORMANCE_NON_CONSISTENT
> flag, what you're saying is "This driver implements all the correct cache
> flushes and can cope with inconsistent memory. Please give me the type of
> memory that's most efficient for the platform I'm running on." The driver
> isn't asking for a specific type of memory; it's telling the platform what
> its capabilities are.
>
> Any thoughts on naming would be most welcome.
>
> How about just making a dma_alloc_$(NEWNAME)(), and consistent ports
> can just alias that to dma_alloc_consistent()?
>
> The only question is $(NEWNAME). "inconsistent" might be ok, but it's
> maybe too similar to "consistent" for my taste.

Can we do pci_alloc_consistent -> dma_alloc? Then regardless of what
you name the other one, the consistent version will obviously be preferred.

--
"Love the dolphins," she advised him. "Write by W.A.S.T.E.."

2002-12-06 22:14:05

Subject: Re: [RFC] generic device DMA implementation

On Fri, 6 Dec 2002, Matthew Wilcox wrote:
>Leaving aside the T-class, machines that don't support io consistent memory
>generally have:
>
>(drivers that need io consistent memory):
[...]
> - zero to four EISA slots

So it sounds like any EISA or ISA card could be plugged into
these machines.

This makes me lean infinitesimally more toward a parameter
to dma_alloc rather than a separate dma_alloc_not_necessarily_consistent
function, because if there ever are other dma_alloc variations that
we want to support, it is more likely that there may be overlap
between the users of those features and then the number of
different function calls would have to grow exponentially (or
we might then talk about changing the API again, which is not
the end of the world, but is certainly more difficult than not
having to do so).
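The growth argument, made concrete under the simplifying assumption that every allocation property is an independent yes/no choice that drivers may combine freely:

```latex
N_{\text{functions}}(n) = 2^{n}, \qquad N_{\text{flag bits}}(n) = n
```

So three such properties would mean eight separately named allocators, against one dma_alloc() taking a three-bit flags word.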

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."

2002-12-06 22:19:23

Subject: Re: [RFC] generic device DMA implementation

[email protected] said:
> This makes me lean infinitesimally more toward a parameter to
> dma_alloc rather than a separate dma_alloc_not_necessarily_consistent
> function, because if there ever are other dma_alloc variations that we
> want to support, it is more likely that there may be overlap between
> the users of those features and then the number of different function
> calls would have to grow exponentially (or we might then talk about
> changing the API again, which is not the end of the world, but is
> certainly more difficult than not having to do so).

I think I like this.

how about dma_alloc to take two flags

DRIVER_SUPPORTS_CONSISTENT_ONLY

and

DRIVER_SUPPORTS_NON_CONSISTENT

The meanings of which are hopefully obvious this time.

and dma_alloc_consistent to be equivalent to dma_alloc with
DRIVER_SUPPORTS_CONSISTENT_ONLY (and hence equivalent to pci_alloc_consistent)

James

2002-12-06 22:24:57

Subject: Re: [RFC] generic device DMA implementation

From: James Bottomley <[email protected]>
Date: Fri, 06 Dec 2002 16:26:54 -0600

[email protected] said:
> This makes me lean infinitesimally more toward a parameter to
> dma_alloc rather than a separate dma_alloc_not_necessarily_consistent
> function, because if there ever are other dma_alloc variations that we
> want to support, it is more likely that there may be overlap between
> the users of those features and then the number of different function
> calls would have to grow exponentially (or we might then talk about
> changing the API again, which is not the end of the world, but is
> certainly more difficult than not having to do so).

I think I like this.

I don't.

If the concept isn't all that important, why bother?

See, if you have to allocate a whole new routine, you'll think
about whether it makes sense or not.

It's like adding a new system call, and the same arguments apply.

I don't want a 'flags' thing, because that tends to be the action
which opens the flood gates for putting random feature-of-the-day new
bits.

If you have to actually get a real API change made, it will get review
and won't "sneak on in". I also don't want architectures adding arch
specific flag bits that some drivers end up using, for example.
The suggested scheme allows that, and I can guarantee you that people
will do things like that.

You must take the time to get the semantics right and make sure they
really do handle the cases that are problematic. Random flag bits
passed to a "do everything" dma_alloc function don't encourage that at
all.

2002-12-06 22:25:17

by Arjan van de Ven

Subject: Re: [RFC] generic device DMA implementation

On Fri, 2002-12-06 at 23:26, James Bottomley wrote:
> [email protected] said:
> > This makes me lean infinitesimally more toward a parameter to
> > dma_alloc rather than a separate dma_alloc_not_necessarily_consistent
> > function, because if there ever are other dma_alloc variations that we
> > want to support, it is more likely that there may be overlap between
> > the users of those features and then the number of different function
> > calls would have to grow exponentially (or we might then talk about
> > changing the API again, which is not the end of the world, but is
> > certainly more difficult than not having to do so).
>
> I think I like this.
>
> how about dma_alloc to take two flags
>
> DRIVER_SUPPORTS_CONSISTENT_ONLY
>
> and
> DRIVER_SUPPORTS_NON_CONSISTENT
>

I rather like Dave's suggestion. I wouldn't want to type
DRIVER_SUPPORTS_CONSISTENT_ONLY a few dozen times for example... sure
you can do that internally but exposing it to drivers... why ?

2002-12-06 22:44:33

Subject: Re: [RFC] generic device DMA implementation

From: James Bottomley <[email protected]>
Date: Fri, 06 Dec 2002 16:48:57 -0600

I just don't like API names that look like

dma_alloc_may_be_inconsistent()

but if that's what it takes, I'll do it

Just use dma_alloc_noncoherent() and we can grep for that.

2002-12-06 22:41:27

Subject: Re: [RFC] generic device DMA implementation

[email protected] said:
> It's like adding a new system call, and the same arguments apply.

> I don't want a 'flags' thing, because that tends to be the action
> which opens the flood gates for putting random feature-of-the-day new
> bits.

I did think of this. The flags are enums in include/linux/dma-mapping.h. In
theory they can't be hijacked by an architecture without either changing this
global header or exciting compiler warnings.

However, I can only see there being two types of drivers: those which do all
the sync points and those which don't do any, so I can't see any reason for
there to be any more than two such flags.

I also want an active discouragement from using the may return inconsistent
API, and I think, given the general programmer predisposition not to want to
type, that a long flag name (or a long routine name) does this.

I just don't like API names that look like

dma_alloc_may_be_inconsistent()

but if that's what it takes, I'll do it

James

2002-12-06 22:49:12

Subject: Re: [RFC] generic device DMA implementation

James Bottomley wrote:
>how about dma_alloc to take two flags

>DRIVER_SUPPORTS_CONSISTENT_ONLY

It's pretty much impossible not to support consistent memory.

I'd suggest a shorter name for code readability and particularly
to hint that this is the standard usage. I'd suggest DMA_CONSISTENT
or "0".

>and

>DRIVER_SUPPORTS_NON_CONSISTENT

There is a pretty strong convention for medium to short names
in the kernel, although this name will be used much less, so its
length is not as important. I'd like something that would match the
names of the corresponding cache flushing and invalidation functions.
I think I had previously suggested DMA_MAYBE_CONSISTENT and wmb_maybe
or dma_sync_maybe but I'm not that attached to the "maybe" word.

[...]

>and dma_alloc_consistent to be equivalent to dma_alloc with
>DRIVER_SUPPORTS_CONSISTENT_ONLY (and hence equivalent to pci_alloc_consistent)

Why have a separate dma_alloc_consistent function?

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."

2002-12-07 04:09:56

Subject: Re: [RFC] generic device DMA implementation

The question of flags versus an extra procedure name is not
actually that big of a deal to me. I can live with either approach
for dma_alloc. However, I'll explain what I see as advantages of
using a flags parameter for others to consider (or to tell me where
they think I'm wrong or haven't thought of something).

On Fri, 06 Dec 2002, David S. Miller wrote:
> I don't want a 'flags' thing, because that tends to be the action
> which opens the flood gates for putting random feature-of-the-day new
> bits.

It is possible to overuse any extension mechanism. I think
you've made a general argument against extension mechanisms that is
usually not true in practice, at least in Linux. I think simple
extension mechanisms, like having a flag word in this case when we
need to express a choice between two options anyhow, tend to do more
good than harm on average, even when I think about the most egregious
cases like filesystems. Maybe if I could see an example or two of
extension mechanisms that have a net negative impact in your opinion,
I'd better understand.

> If you have to actually get a real API change made, it will get review
> and won't "sneak on in"

Or Linux just won't get that optimization because people give
up or leave their changes on the back burner indefinitely, something
that I think happens to most Linux improvements, especially if you
count those that don't make it to implementation because people
correctly foresee this kind of bureaucracy. If anyone does decide to
propose another flag-like facility for dma_alloc, I expect people will
complain that changing the API may require hundreds of drivers to be
updated.

I did think about the possibility of a flags parameter
inviting features that aren't worth their complexity or other costs
before I suggested a flags parameter. My view was and is that if
people handling individual architectures want to add and remove flags
bits, even just to experiment with features, I think the existing
process of getting patches integrated would cause enough review. I
also think our capacity to process changes is already exceeded by
voluntary submissions as evidenced by backlogs and dropped patches.

> I also don't want architectures adding arch
> specific flag bits that some drivers end up using, for example.

Here I'm guessing at your intended meaning. If you mean that
there will be numbering collisions, I would expect these flags to be
defined in include/asm-xxx/dma-mapping.h. I was going to suggest that
we even do this for DMA_ALLOW_INCONSISTENT.
include/asm-parisc/dma-mapping.h would contain:

#define DMA_ALLOW_INCONSISTENT 0x1

linux/dma-mapping.h would contain:

#include <asm/dma-mapping.h>
#ifndef DMA_ALLOW_INCONSISTENT
# define DMA_ALLOW_INCONSISTENT 0
#endif

By that convention, bits would not be used in architectures
that never set them, and it could conceivably simplify some compiler
optimizations like "flags |= DMA_SOME_FLAG;" and
"if (flags & DMA_SOME_FLAG) {....}" on architectures where this is never
true. Bit assignment would be under the control of the platforms, at
least in the absence of a flag that is meaningful on every platform (if
there were one, it would just simplify the source code to define it only
in linux/dma-mapping.h).
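The header convention above, assembled into one compilable sketch. MODEL_PARISC is a hypothetical stand-in for building include/asm-parisc/dma-mapping.h, and model_may_get_inconsistent is an illustrative helper; the two #define blocks follow the text above.

```c
#include <stdbool.h>

/* What include/asm-parisc/dma-mapping.h would contain.
 * MODEL_PARISC is a hypothetical stand-in for building for parisc. */
#ifdef MODEL_PARISC
#define DMA_ALLOW_INCONSISTENT 0x1
#endif

/* What linux/dma-mapping.h would contain: ports that never hand out
 * inconsistent memory get a flag value of 0. */
#ifndef DMA_ALLOW_INCONSISTENT
# define DMA_ALLOW_INCONSISTENT 0
#endif

/* Where the flag is the constant 0, this test is always false, and the
 * compiler is free to discard any code guarded by it. */
static bool model_may_get_inconsistent(unsigned int flags)
{
    return (flags & DMA_ALLOW_INCONSISTENT) != 0;
}
```

Compiled without -DMODEL_PARISC, even a caller that sets every bit gets false back; with it, the flag is 0x1 and the test is live.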

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."

2002-12-07 10:52:01

Subject: Re: [RFC] generic device DMA implementation

On Thu, Dec 05, 2002 at 11:14:16PM -0800, Adam J. Richter wrote:
> David Gibson wrote:
> >On Thu, Dec 05, 2002 at 06:08:22PM -0800, Adam J. Richter wrote:
> [...]
> >> In linux-2.5.50/drivers/net/lasi_82596.c, the macros
> >> CHECK_{WBACK,INV,WBACK_INV} have definitions like:
> >>
> >> #define CHECK_WBACK(addr,len) \
> >> do { if (!dma_consistent) dma_cache_wback((unsigned long)addr,len); } while (0)
> >>
> >> These macros are even used in IO paths like i596_rx(). The
> >> "if()" statement in each of these macros is the extra branch that
> >> disappears on most architectures under James's proposal.
> >
> >Erm... I have no problem with the macros that James's proposal would
> >use to take away this branch - I would expect to use exactly the same
> >ones. It's just the notion of "try to get consistent memory, but get
> >me any old memory otherwise" that I'm not so convinced by.
> >
> >In any case, on platforms where the dma_malloc() could really return
> >either consistent or non-consistent memory, James's sync macros would
> >have to have an equivalent branch within.
>
> Yeah, I should have said "because then you don't have to have a
> branch for the case where the platform always or *never* returns
> consistent memory on a given machine."

Actually, no, since my idea was to remove the "consistent_alloc()"
path from the driver entirely - leaving only the map/sync approach.
That gives a result which is correct everywhere (afaict) but (as
you've since pointed out) will perform poorly on platforms where the
map/sync operations are expensive.
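To make the branch discussion concrete, here is a hypothetical user-space sketch. struct dma_buf, PLATFORM_ALWAYS_CONSISTENT and fake_cache_wback() are invented names, and the per-buffer `consistent` bit stands in for however a platform would remember the memory type. On a platform whose allocator can return either kind, the test on that bit simply moves from the driver (as in the CHECK_WBACK macro) into the sync helper; only a platform that always (or never) returns consistent memory can resolve it at compile time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of one DMA allocation. */
struct dma_buf {
	void *cpu_addr;
	size_t size;
	bool consistent;	/* memory type chosen by the platform */
};

/* Counts cache write-backs actually issued, for illustration only. */
static unsigned long wback_count;

/* Stand-in for an arch cache write-back such as dma_cache_wback(). */
static void fake_cache_wback(void *addr, size_t len)
{
	(void)addr;
	(void)len;
	wback_count++;
}

/* Hypothetical platform knob: 1 if every allocation is consistent. */
#define PLATFORM_ALWAYS_CONSISTENT 0

static void dma_sync_for_device(const struct dma_buf *b)
{
	assert(b != NULL);
#if PLATFORM_ALWAYS_CONSISTENT
	(void)b;	/* resolved at compile time: no branch at all */
#else
	/* Mixed platform: the branch lives here instead of in the driver. */
	if (!b->consistent)
		fake_cache_wback(b->cpu_addr, b->size);
#endif
}
```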

> >> >What performance advantages of consistent memory?
>
> >> [...] For
> >> example, pci_sync_single is 55 lines of C code in
> >> linux-2.5.50/arch/sparc64/kernel/pci_iommu.c.
> >
> >Hmm... fair enough. Ok, I can see the point of a fall back to
> >non-consistent approach given that. So I guess the idea makes sense,
> >so long as dma_malloc() (without the consistent flag) is taken to be
> >"give me DMAable memory, consistent or not, whichever is cheaper for
> >this platform" rather than "give me DMAable memory, consistent if
> >possible". It was originally presented as the latter which misled me.
>
> As long as dma_sync_maybe works with the addresses returned by
> dma_malloc and dma_malloc only returns the types of memory that the
> caller claims to be prepared to deal with, the decision about what
> kind of memory dma_malloc should return when it has a choice is up to
> the platform implementation.
>
> >I think the change to the parameters which I suggested in a reply to
> >James makes this a bit clearer.
>
> I previously suggested some of the changes in your description:
> name them dma_{malloc,free} (which James basically agrees with), have
> a flags field. However, given that it's a parameter and you're going
> to pass a constant symbol like DMA_CONSISTENT or DMA_INCONSISTENT to it,
> it doesn't really matter if it's an enum or an int to start with, as it
> could be changed later with minimal or zero driver changes.
>
> I like your term DMA_CONSISTENT better than
> DMA_CONFORMANCE_CONSISTANT. I think the word "conformance" in there
> does not reduce the time that it takes to figure out what the symbol
> means. I don't think any other facility will want to use the terms
> DMA_{,IN}CONSISTENT, so I prefer that we go with the more medium sized
> symbol.

Actually I think the "conformance" is actively misleading, since (to
me) it implies that the function is always "trying" to get consistent
memory, which it really isn't.

> Naming the parameter to dma_malloc "bus" would imply that it
> will not look at individual device information like dma_mask, which is
> wrong. Putting the flags field in the middle of the parameter list
> will make the dma_malloc and dma_free parameter lists unnecessarily different.
> I think these two were just oversights in your posting.

Yes indeed.

--
David Gibson | For every complex problem there is a
[email protected] | solution which is simple, neat and
| wrong.
http://www.ozlabs.org/people/dgibson

2002-12-07 10:50:47

Subject: Re: [RFC] generic device DMA implementation

On Fri, Dec 06, 2002 at 10:26:57AM -0600, James Bottomley wrote:
> [email protected] said:
> > I like your term DMA_CONSISTENT better than DMA_CONFORMANCE_CONSISTANT
> > . I think the word "conformance" in there does not reduce the time
> > that it takes to figure out what the symbol means. I don't think any
> > other facility will want to use the terms DMA_{,IN}CONSISTENT, so I
> > prefer that we go with the more medium sized symbol.
>
> I'm not so keen on this. The idea of this parameter is not to tell
> the allocation routine what type of memory you would like, but to
> tell it what type of memory the driver can cope with. I think for
> the inconsistent case, DMA_INCONSISTENT looks like the driver is
> requiring inconsistent memory, and expecting to get it. I'm open to
> changing the "CONFORMANCE" part, but I'd like to name these
> parameters something that doesn't imply they're requesting a type of
> memory.

Well, actually I was thinking of the flags as a bitmask, not an enum,
so I was assuming (flags==0) for not-necessarily-consistent memory.
However, having seen davem's comments, I agree with him that
separate entry points are probably a better idea for API sanity.
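A sketch of the two API shapes discussed here, using malloc() as a user-space stand-in for the allocator. The flags word is a bitmask where 0 means "consistent or not, whichever this platform prefers", and dma_alloc_consistent() (James's suggested name) is a separate entry point that carries the requirement in its name. DMA_CONSISTENT, fake_dma_addr_t and platform_has_consistent are hypothetical, the struct device argument a real API would take is omitted, and flags go last so the alloc and free parameter lists line up.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical flag bit: the caller can only cope with consistent memory. */
#define DMA_CONSISTENT 0x1u

typedef uintptr_t fake_dma_addr_t;	/* stand-in for dma_addr_t */

/* Hypothetical platform property: can this machine supply consistent memory? */
static int platform_has_consistent = 0;

/* Bitmask-style entry point: flags == 0 means "either kind is fine". */
static void *dma_alloc(size_t size, fake_dma_addr_t *handle, unsigned int flags)
{
	void *p;

	assert(handle != NULL);
	if ((flags & DMA_CONSISTENT) && !platform_has_consistent)
		return NULL;	/* the caller cannot handle anything else */
	p = malloc(size);
	if (p)
		*handle = (fake_dma_addr_t)p;	/* identity "bus address" */
	return p;
}

/* Separate entry point: the intent is visible in the name. */
static void *dma_alloc_consistent(size_t size, fake_dma_addr_t *handle)
{
	return dma_alloc(size, handle, DMA_CONSISTENT);
}

static void dma_free(void *cpu_addr, size_t size, fake_dma_addr_t handle)
{
	(void)size;
	(void)handle;
	free(cpu_addr);
}
```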

--
David Gibson | For every complex problem there is a
[email protected] | solution which is simple, neat and
| wrong.
http://www.ozlabs.org/people/dgibson

2002-12-07 10:50:46

Subject: Re: [RFC] generic device DMA implementation

On Fri, Dec 06, 2002 at 10:31:13AM -0800, David Miller wrote:
> From: James Bottomley <[email protected]>
> Date: Fri, 06 Dec 2002 12:29:10 -0600
>
> How about (as Adam suggested) two dma allocation APIs
>
> 1) dma_alloc_consistent which behaves identically to pci_alloc_consistent
> 2) dma_alloc which can take the conformance flag and can be used to tidy up
> the drivers that need to know about cache flushing.
>
> Now that the situation is much more clear, I'm feeling a lot
> better about this.
>
> I have only one request, in terms of naming. What we're really
> doing is adding a third class of memory, it really isn't consistent
> and it really isn't streaming. It's inconsistent memory meant to
> be used for "consistent memory things".

Not really... it seems to me it's abdicating the choice of consistent
versus streaming memory to the platform. Or to look at it another
way, the actual guarantees it provides are identical to those of
streaming DMA, but this gives the platform an opportunity to optimise
by controlling the allocation rather than demanding it deal with
memory from any old place as pci_map_* must do.

A driver using this sort of memory should be at least isomorphic to
one using streaming memory (maybe identical, depending on exactly
which functions are which etc.).
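One way to read "isomorphic to streaming memory" is as an ownership rule: the CPU touches the buffer only while it owns it, and each sync call hands ownership over. This user-space sketch models only that rule. sync_for_device(), sync_for_cpu() and struct stream_buf are hypothetical names, and the cache maintenance a real platform would do (or skip, for consistent memory) is reduced to an ownership flag.

```c
#include <assert.h>
#include <string.h>

/* Who may touch the buffer right now (hypothetical model). */
enum owner { OWNER_CPU, OWNER_DEVICE };

/* Hypothetical DMA buffer, from either allocation path. */
struct stream_buf {
	unsigned char data[64];
	enum owner owner;
};

/* Stand-in for a sync-for-device call: a real one would write back
   dirty cache lines, or do nothing for consistent memory. */
static void sync_for_device(struct stream_buf *b)
{
	assert(b->owner == OWNER_CPU);
	b->owner = OWNER_DEVICE;
}

/* Stand-in for a sync-for-CPU call: a real one would invalidate stale
   cache lines before the CPU reads what the device wrote. */
static void sync_for_cpu(struct stream_buf *b)
{
	assert(b->owner == OWNER_DEVICE);
	b->owner = OWNER_CPU;
}

/* Driver pattern: write while the CPU owns the buffer, then hand it over. */
static void post_command(struct stream_buf *b, const char *cmd)
{
	assert(b->owner == OWNER_CPU);
	strncpy((char *)b->data, cmd, sizeof b->data - 1);
	b->data[sizeof b->data - 1] = '\0';
	sync_for_device(b);
}

/* Driver pattern: take the buffer back before reading device results. */
static void reap_status(struct stream_buf *b)
{
	sync_for_cpu(b);
}
```

The driver code is the same whichever path produced the buffer; only the cost hidden inside the sync calls differs.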

> So could someone come up with a clever name for this thing? :-)

Given that, how about "fast-streaming" DMA memory.

--
David Gibson | For every complex problem there is a
[email protected] | solution which is simple, neat and
| wrong.
http://www.ozlabs.org/people/dgibson

2002-12-07 11:19:29

by Russell King

Subject: Re: [RFC] generic device DMA implementation

On Sat, Dec 07, 2002 at 08:45:30PM +1100, David Gibson wrote:
> Actually, no, since my idea was to remove the "consistent_alloc()"
> path from the driver entirely - leaving only the map/sync approach.
> That gives a result which is correct everywhere (afaict) but (as
> you've since pointed out) will perform poorly on platforms where the
> map/sync operations are expensive.

As I've also pointed out in the past couple of days, doing this will
mean that you then need to teach the drivers to align structures to
cache line boundaries. Otherwise, you _will_ get into a situation
where you _will_ lose data.

One such illustration of this is the tulip driver, with an array of
16-byte control/status blocks on a machine with a 32-byte cache line
size.
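A small worked example of the tulip case, assuming the descriptor array starts on a cache-line boundary. With 16-byte descriptors and 32-byte lines, descriptors 0 and 1 land in the same line: invalidating that line to read descriptor 1's status would discard a CPU write to descriptor 0 still sitting in the cache, and writing back descriptor 0 could overwrite status the device just stored in descriptor 1. Padding each descriptor to a full line removes the sharing. The field names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 32	/* cache line size from the example in the mail */

/* Hypothetical 16-byte control/status block, packed back to back. */
struct desc {
	uint32_t status;
	uint32_t control;
	uint32_t buf1;
	uint32_t buf2;
};

/* The same descriptor padded so that each one owns a whole cache line. */
struct desc_padded {
	struct desc d;
	unsigned char pad[CACHE_LINE - sizeof(struct desc)];
};

static_assert(sizeof(struct desc) == 16, "descriptor must be 16 bytes");

/* Cache line holding byte offset 'off' from the (line-aligned) array base. */
static size_t line_of(size_t off)
{
	return off / CACHE_LINE;
}

/* Do entries i and j of an array with the given stride and entry size
   touch any common cache line? */
static int share_line(size_t stride, size_t size, size_t i, size_t j)
{
	size_t i_first = line_of(i * stride), i_last = line_of(i * stride + size - 1);
	size_t j_first = line_of(j * stride), j_last = line_of(j * stride + size - 1);

	return i_first <= j_last && j_first <= i_last;
}
```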

I would rather keep the consistent_alloc() approach for allocating
consistent memory, and align structures as they see fit, rather than
having to teach the drivers to align appropriately. And you can be
damned sure that driver writers are _not_ going to get the alignment
right.

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2002-12-07 14:34:43

Subject: Re: [RFC] generic device DMA implementation

On Sat, 7 Dec 2002 11 Russell King wrote:
>On Sat, Dec 07, 2002 at 08:45:30PM +1100, David Gibson wrote:
>> Actually, no, since my idea was to remove the "consistent_alloc()"
>> path from the driver entirely - leaving only the map/sync approach.
>> That gives a result which is correct everywhere (afaict) but (as
>> you've since pointed out) will perform poorly on platforms where the
>> map/sync operations are expensive.

>As I've also pointed out in the past couple of days, doing this will
>mean that you then need to teach the drivers to align structures to
>cache line boundaries. Otherwise, you _will_ get into a situation
>where you _will_ lose data.

Drivers for such hardware would allocate their memory with
dma_alloc(...,DMA_CONSISTENT), which is what 99.9% of all current
drivers would do, indicating that the allocation should
fail if consistent memory is unavailable.

David Gibson was describing a hypothetical platform which
would have both consistent and inconsistent memory but on which the
cache operations were so cheap that he thought it might be more
optimal to give inconsistent memory to those drivers that claimed
to be able to handle it. (Ignore the question of whether that
really is optimal; let's assume David is right for the sake
of example.) On such a platform, drivers that did not
claim to be able to handle inconsistent memory would still get
consistent memory (or get NULL). The optimization that David has
in mind would only be done for drivers that claim to be able to
handle inconsistent memory.

>I would rather keep the consistent_alloc() approach for allocating
>consistent memory, and align structures as they see fit, rather than
>having to teach the drivers to align appropriately. And you can be
>damned sure that driver writers are _not_ going to get the alignment
>right.

Nobody is talking about eliminating the mechanism for a
driver to say "fail if you cannot give me consistent memory."
That would be the normal usage.

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."

2002-12-08 05:22:43