2002-06-08 20:38:59

by Roland Dreier

Subject: PCI DMA to small buffers on cache-incoherent arch

I recently fixed some problems in the 2.4 USB stack that caused
crashes on the PowerPC 440GP (and probably other cache-incoherent
architectures). However, some parts of my fix have raised some
questions on the linux-usb-devel and linuxppc-embedded mailing lists
and I would like to raise these issues here so that we can get a
definitive answer. I would especially appreciate comments from David
Miller and other PCI DMA experts.

The problem that caused crashes on cache-incoherent architectures (my
specific system uses a PPC 440GP but this should apply in general) was
the following. The USB stack was doing PCI DMA into buffers that were
allocated on the stack, which causes stack corruption: on the PPC
440GP, pci_map_single() with PCI_DMA_FROMDEVICE just invalidates the
cache for the region being mapped. However, if a buffer is smaller
than a cache line, then two bad things can happen.

First, there may be valid data outside the buffer but in the same
cache line that has not been flushed to main memory yet. In that case
when the cache is invalidated the new data is lost and any access to
that memory will get the old data from main memory. Second, access to
the cache line after the cache has been invalidated but before the DMA
has completed will pull the cache line back into processor cache and
the DMA buffer will have invalid data.

The solution to this was simply to use kmalloc() to allocate buffers
instead of using automatic variables. On the PPC 440GP this is
definitely safe because the 440GP has 32-byte cache lines and
kmalloc() will never return a buffer that is smaller than 32 bytes or
not 32-byte aligned. However, this leads to my first question: is
this safe on all architectures? Could there be a cache-incoherent
architecture where kmalloc() returned a buffer smaller than a cache
line?

The second question is related to this. There are other parts of the
USB stack where structures are defined:

struct something {
        /* ... some members ... */
        char buffer[SMALLER_THAN_L1_CACHE_LINE_SIZE];
        /* ... some more members ... */
};

Then a struct something is kmalloc()'ed and buffer is used to DMA
into. However, even though the overall structure is aligned on a
cache line boundary, buffer is not aligned. It seems to me that this
is not safe because access to members near buffer during a DMA could
cause corruption (as detailed above). In my patch I changed code like
this to look like

struct something {
        /* ... some members ... */
        char *buffer;
        /* ... some more members ... */
};

and then kmalloc()'ed buffer separately when kmalloc()'ing a struct
something.
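
Roughly (error handling omitted, and BUF_SIZE standing in for whatever
size the code actually needs):

        struct something *s = kmalloc(sizeof *s, GFP_KERNEL);

        s->buffer = kmalloc(BUF_SIZE, GFP_KERNEL);
        /* on the 440GP this is a separate, 32-byte-aligned slab object
           of at least 32 bytes, so nothing else shares its cache line */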

However, there is some question about whether changing these buffers
is really necessary. Code like the above doesn't cause immediately
obvious crashes the way buffers on the stack do (since we usually get
lucky and don't touch the members near buffer around the DMA access).
I felt that relying on this coincidence is not safe and should be
fixed at the same time.

DMA-mapping.txt says kmalloc()'ed memory is OK for DMAs and does not
mention cache alignment. So the question is: did I misunderstand the
cache coherency issues or is my patch correct? At a higher level: how
is this supposed to work? Should code doing DMA into a smallish
buffer do

buffer = kmalloc(max(BUFFER_SIZE, L1_CACHE_BYTES), GFP_XXX);

or is a kmalloc()'ed buffer always safe? Does the code doing DMA need
to worry about the cache alignment of its buffer?

In any case this probably needs to be documented better. Once I
understand the answer I'll write up a patch to DMA-mapping.txt so no
one has to rediscover all this.

Thanks,
Roland


2002-06-08 21:44:18

by Anton Blanchard

Subject: Re: PCI DMA to small buffers on cache-incoherent arch


> The problem that caused crashes on cache-incoherent architectures (my
> specific system uses a PPC 440GP but this should apply in general) was
> the following. The USB stack was doing PCI DMA into buffers that were
> allocated on the stack, which causes stack corruption: on the PPC
> 440GP, pci_map_single() with PCI_DMA_FROMDEVICE just invalidates the
> cache for the region being mapped. However, if a buffer is smaller
> than a cache line, then two bad things can happen.

Yes, DMAing to the stack is definitely a bug; that's mentioned in
Documentation/DMA-mapping.txt. We used to vmalloc our kernel stacks
on ppc64 and that picked up all sorts of DMA violations.

I just checked 2.5 and noticed the scsi code is _still_ DMAing to the
stack! Maybe it would be worth having a debug option for x86 where
it uses vmalloc for kernel stack allocation :)

Anyway attached is a patch from Todd Inglett that I updated for 2.5.

Anton

===== drivers/scsi/scsi_scan.c 1.13 vs edited =====
--- 1.13/drivers/scsi/scsi_scan.c Fri May 31 11:17:30 2002
+++ edited/drivers/scsi/scsi_scan.c Sun Jun 9 07:28:25 2002
@@ -368,7 +368,6 @@
unsigned int dev;
unsigned int lun;
unsigned char *scsi_result;
- unsigned char scsi_result0[256];
Scsi_Device *SDpnt;
Scsi_Device *SDtail;

@@ -390,9 +389,7 @@
*/
scsi_initialize_queue(SDpnt, shpnt);
SDpnt->request_queue.queuedata = (void *) SDpnt;
- /* Make sure we have something that is valid for DMA purposes */
- scsi_result = ((!shpnt->unchecked_isa_dma)
- ? &scsi_result0[0] : kmalloc(512, GFP_DMA));
+ scsi_result = kmalloc(512, GFP_DMA);
}

if (scsi_result == NULL) {
@@ -532,10 +529,9 @@
kfree((char *) SDpnt);
}

- /* If we allocated a buffer so we could do DMA, free it now */
- if (scsi_result != &scsi_result0[0] && scsi_result != NULL) {
- kfree(scsi_result);
- } {
+ kfree(scsi_result);
+
+ {
Scsi_Device *sdev;
Scsi_Cmnd *scmd;

===== drivers/scsi/sr_ioctl.c 1.13 vs edited =====
--- 1.13/drivers/scsi/sr_ioctl.c Thu May 23 23:18:39 2002
+++ edited/drivers/scsi/sr_ioctl.c Sun Jun 9 07:30:15 2002
@@ -344,7 +344,12 @@
Scsi_CD *SCp = cdi->handle;
u_char sr_cmd[10];
int result, target = minor(cdi->dev);
- unsigned char buffer[32];
+ unsigned char *buffer = kmalloc(512, GFP_DMA);
+
+ if (buffer == NULL) {
+ printk("SCSI DMA pool exhausted.");
+ return -ENOMEM;
+ }

memset(sr_cmd, 0, sizeof(sr_cmd));

@@ -417,6 +422,7 @@
return -EINVAL;
}

+ kfree(buffer);
#if 0
if (result)
printk("DEBUG: sr_audio: result for ioctl %x: %x\n", cmd, result);

2002-06-08 23:07:05

by David Miller

Subject: Re: PCI DMA to small buffers on cache-incoherent arch

From: Roland Dreier <[email protected]>
Date: 08 Jun 2002 13:38:53 -0700

The USB stack was doing PCI DMA into buffers that were
allocated on the stack

This is illegal.

which causes stack corruption: on the PPC
440GP, pci_map_single() with PCI_DMA_FROMDEVICE just invalidates the
cache for the region being mapped. However, if a buffer is smaller
than a cache line, then two bad things can happen.

There is no allocation scheme legal for PCI DMA which gives you
smaller than a cacheline of data, this includes SLAB. This is why
stack buffers and the like are illegal for PCI DMA.

The solution to this was simply to use kmalloc() to allocate buffers
instead of using automatic variables.

Right.

However, this leads to my first question: is this safe on all
architectures?

It must be. If the architecture allows SLAB to give smaller than
cacheline sized data, it must handle PCI DMA map/unmap flushing
in an appropriate fashion (ie. handle sub-cacheline buffers).

DMA-mapping.txt says kmalloc()'ed memory is OK for DMAs and does not
mention cache alignment.

It doesn't mention cache alignment because that is an implementation
specific issue. The user of the interfaces need not be concerned
about any of this.

There need be no changes to the documentation. If you do as the
documentation states (use kmalloc or get_free_page to get your
buffers) then it will just work.

Franks a lot,
David S. Miller
[email protected]

2002-06-09 00:40:29

by Roland Dreier

Subject: Re: PCI DMA to small buffers on cache-incoherent arch

>>>>> "David" == David S Miller <[email protected]> writes:

David> There is no allocation scheme legal for PCI DMA which gives
David> you smaller than a cacheline of data, this includes SLAB.
David> This is why stack buffers and the like are illegal for PCI
David> DMA.

David> If the architecture allows SLAB to give smaller than
David> cacheline sized data, it must handle PCI DMA map/unmap
David> flushing in an appropriate fashion (ie. handle
David> sub-cacheline buffers).

Thanks, that's great information. However, there is one question you
didn't cover. How about using a sub-cache-line piece of a kmalloc()'ed
buffer (eg the case I described of using one member from a struct as a
DMA buffer)? As far as I can see there is no guarantee that this will
always work. Do you agree that this should be treated as a bug and
fixed when it is found? Or should we leave that usage unless it is
observed causing problems (since we almost always get lucky and don't
touch the rest of the cache line near the DMA)?

Thanks,
Roland

2002-06-09 00:52:54

by Roland Dreier

Subject: Re: PCI DMA to small buffers on cache-incoherent arch

>>>>> "Anton" == Anton Blanchard <[email protected]> writes:

Anton> I just checked 2.5 and noticed the scsi code is _still_
Anton> DMAing to the stack! Maybe it would be worth having a debug
Anton> option for x86 where it uses vmalloc for kernel stack
Anton> allocation :)

Good catches, just a few comments on the patch.

Anton> Anyway attached is a patch from Todd Inglett that I updated
Anton> for 2.5.

===== drivers/scsi/scsi_scan.c 1.13 vs edited =====
- /* Make sure we have something that is valid for DMA purposes */
- scsi_result = ((!shpnt->unchecked_isa_dma)
- ? &scsi_result0[0] : kmalloc(512, GFP_DMA));
+ scsi_result = kmalloc(512, GFP_DMA);

I don't think you should always use GFP_DMA here... probably better to
do:

scsi_result = kmalloc(512,
                      shpnt->unchecked_isa_dma ? GFP_DMA : GFP_ATOMIC);

And similarly this:

===== drivers/scsi/sr_ioctl.c 1.13 vs edited =====
+ unsigned char *buffer = kmalloc(512, GFP_DMA);

should probably use GFP_KERNEL (or possibly a check of
unchecked_isa_dma as above, though this doesn't seem to be needed
based on the existing code).

Best,
Roland

2002-06-09 00:57:11

by David Miller

Subject: Re: PCI DMA to small buffers on cache-incoherent arch

From: Roland Dreier <[email protected]>
Date: 08 Jun 2002 17:40:24 -0700

Or should we leave that usage unless it is observed causing
problems (since we almost always get lucky and don't touch the rest
of the cache line near the DMA)?

I think passing in a 4 byte chunk and assuming the rest of the
cacheline is unmodified is a valid expectation the more I think
about it.

This means what MIPS is doing is wrong. For partial cacheline bits it
can't do the invalidate thing.

2002-06-09 01:30:32

by Albert D. Cahalan

Subject: Re: PCI DMA to small buffers on cache-incoherent arch

David S. Miller writes:
> From: Roland Dreier <[email protected]>

>> which causes stack corruption: on the PPC
>> 440GP, pci_map_single() with PCI_DMA_FROMDEVICE just invalidates the
>> cache for the region being mapped. However, if a buffer is smaller
>> than a cache line, then two bad things can happen.
>
> There is no allocation scheme legal for PCI DMA which gives you
> smaller than a cacheline of data, this includes SLAB. This is why
> stack buffers and the like are illegal for PCI DMA.
>
>> The solution to this was simply to use kmalloc() to allocate buffers
>> instead of using automatic variables.
>
> Right.
>
>> However, this leads to my first question: is this safe on all
>> architectures?
>
> It must be. If the architecture allows SLAB to give smaller than
> cacheline sized data, it must handle PCI DMA map/unmap flushing
> in an appropriate fashion (ie. handle sub-cacheline buffers).
>>
>> DMA-mapping.txt says kmalloc()'ed memory is OK for DMAs and does not
>> mention cache alignment.
>
> It doesn't mention cache alignment because that is an implementation
> specific issue. The user of the interfaces need not be concerned
> about any of this.
>
> There need be no changes to the documentation. If you do as the
> documentation states (use kmalloc or get_free_page to get your
> buffers) then it will just work.

On a non-SMP system, would it be OK to map all the memory
without memory coherency enabled? You seem to be implying that
one only needs to implement some mechanism in pci_map_single()
to handle flushing cache lines (write back, then invalidate).

This would be useful for Macs.


2002-06-09 01:26:17

by Roland Dreier

Subject: Re: PCI DMA to small buffers on cache-incoherent arch

>>>>> "David" == David S Miller <[email protected]> writes:

Roland> Or should we leave that usage unless it is observed
Roland> causing problems (since we almost always get lucky and
Roland> don't touch the rest of the cache line near the DMA)?

David> I think passing in a 4 byte chunk and assuming the rest of
David> the cacheline is unmodified is a valid expectation the more
David> I think about it.

Just to make sure I'm reading this correctly, you're saying that as
long as a buffer is OK for DMA, it should be OK to use a
sub-cache-line chunk as a DMA buffer via pci_map_single(), and
accessing the rest of the cache line should be OK at any time before,
during and after the DMA.

David> This means what MIPS is doing is wrong. For partial
David> cacheline bits it can't do the invalidate thing.

If I understand you, this means non-cache-coherent PPC is wrong as
well -- pci_map_single() goes through consistent_sync() and turns
into:

        case PCI_DMA_FROMDEVICE:        /* invalidate only */
                invalidate_dcache_range(start, end);
                break;

What alternate implementation are you proposing?

Thanks,
Roland

2002-06-09 05:32:51

by David Miller

Subject: Re: PCI DMA to small buffers on cache-incoherent arch

From: Roland Dreier <[email protected]>
Date: 08 Jun 2002 18:26:12 -0700

Just to make sure I'm reading this correctly, you're saying that as
long as a buffer is OK for DMA, it should be OK to use a
sub-cache-line chunk as a DMA buffer via pci_map_single(), and
accessing the rest of the cache line should be OK at any time before,
during and after the DMA.

Yes.

David> This means what MIPS is doing is wrong. For partial
David> cacheline bits it can't do the invalidate thing.

If I understand you, this means non-cache-coherent PPC is wrong as
well -- pci_map_single() goes through consistent_sync() and turns
into:

        case PCI_DMA_FROMDEVICE:        /* invalidate only */
                invalidate_dcache_range(start, end);
                break;

What alternate implementation are you proposing?

For non-cacheline aligned chunks in the range "start" to "end" you
must perform a cache writeback and invalidate. To preserve the data
outside of the DMA range.
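
Roughly, for the PCI_DMA_FROMDEVICE case, something like the following
(purely illustrative, not the actual architecture code; flush_dcache_range()
is assumed here to write back and then invalidate):

        static void sync_fromdevice(unsigned long start, unsigned long end)
        {
                unsigned long mask  = L1_CACHE_BYTES - 1;
                unsigned long first = start & ~mask;      /* line holding start   */
                unsigned long last  = (end - 1) & ~mask;  /* line holding end - 1 */

                /* partial line at the front: write back + invalidate so
                   the bytes outside the buffer survive */
                if (start & mask)
                        flush_dcache_range(first, first + L1_CACHE_BYTES);

                /* partial line at the back */
                if (end & mask)
                        flush_dcache_range(last, last + L1_CACHE_BYTES);

                /* lines fully inside the buffer: invalidate only */
                invalidate_dcache_range(start, end);
        }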

2002-06-09 05:33:33

by David Miller

Subject: Re: PCI DMA to small buffers on cache-incoherent arch

From: "Albert D. Cahalan" <[email protected]>
Date: Sat, 8 Jun 2002 21:30:31 -0400 (EDT)

On a non-SMP system, would it be OK to map all the memory
without memory coherency enabled? You seem to be implying that
one only needs to implement some mechanism in pci_map_single()
to handle flushing cache lines (write back, then invalidate).

This would be useful for Macs.

It's just avoiding flushing by, in effect, flushing the cache after
every load/store the cpu does, so of course it would work.

It would be slow as hell, but it would work.

2002-06-09 06:16:43

by Roland Dreier

Subject: Re: PCI DMA to small buffers on cache-incoherent arch

>>>>> "David" == David S Miller <[email protected]> writes:

Roland> Just to make sure I'm reading this correctly, you're
Roland> saying that as long as a buffer is OK for DMA, it should
Roland> be OK to use a sub-cache-line chunk as a DMA buffer via
Roland> pci_map_single(), and accessing the rest of the cache line
Roland> should be OK at any time before, during and after the DMA.

David> Yes.

Roland> What alternate implementation are you proposing?

David> For non-cacheline aligned chunks in the range "start" to
David> "end" you must perform a cache writeback and invalidate. To
David> preserve the data outside of the DMA range.

Doesn't this still have a problem if you touch data in the same cache
line as the DMA buffer after the pci_map but before the DMA takes
place? The CPU will pull the cache line back in and it might not see
the data the DMA brought in.

It seems to me that to be totally safe, pci_unmap would have to save
the non-aligned part outside the buffer to temporary storage, do an
invalidate, and then copy back the non-aligned part.

In any case if this is what pci_map is supposed to do then we have to
fix up several architectures at least...

Thanks,
Roland

2002-06-09 06:33:39

by Albert D. Cahalan

Subject: Re: PCI DMA to small buffers on cache-incoherent arch

David S. Miller writes:
> From: "Albert D. Cahalan" <[email protected]>

>> On a non-SMP system, would it be OK to map all the memory
>> without memory coherency enabled? You seem to be implying that
>> one only needs to implement some mechanism in pci_map_single()
>> to handle flushing cache lines (write back, then invalidate).
>>
>> This would be useful for Macs.
>
> It's just avoiding flushing by, in effect, flushing the cache after
> every load/store the cpu does, so of course it would work.
>
> It would be slow as hell, but it would work.

I don't see why it would have to be slow. Mapping the memory
with coherency disabled will reduce bus traffic for all the
regular kernel code doing non-DMA stuff. (coherency requires
that the CPU generate address-only bus operations)

The obvious concern is that some kernel code might touch
a cache line after the invalidate but before the DMA is
done. If that could be considered a bug, then every non-SMP
Mac could gain some speed by turning off coherency.
Maybe the x86 MTRRs allow for something similar.

For device --> memory DMA:

1. write back cache lines that cross unaligned boundaries
2. start the DMA
3. invalidate the above cache lines
4. invalidate cache lines that are fully inside the DMA

For memory --> device DMA:

1. write back all cache lines affected by the DMA
2. start the DMA
3. invalidate the above cache lines

2002-06-09 06:46:06

by Oliver Neukum

Subject: Re: PCI DMA to small buffers on cache-incoherent arch

Am Sonntag, 9. Juni 2002 07:29 schrieb David S. Miller:
> From: Roland Dreier <[email protected]>
> Date: 08 Jun 2002 18:26:12 -0700
>
> Just to make sure I'm reading this correctly, you're saying that as
> long as a buffer is OK for DMA, it should be OK to use a
> sub-cache-line chunk as a DMA buffer via pci_map_single(), and
> accessing the rest of the cache line should be OK at any time before,
> during and after the DMA.
>
> Yes.

Does this mean that this piece of memory does have to be declared
uncacheable until DMA is finished ?
How else do you solve the problem of validity during DMA and
especially after DMA ?

Regards
Oliver

2002-06-09 06:50:16

by Oliver Neukum

Subject: Re: PCI DMA to small buffers on cache-incoherent arch


> For memory --> device DMA:
>
> 1. write back all cache lines affected by the DMA
> 2. start the DMA
> 3. invalidate the above cache lines

Why the third step ? That data should still
be valid.

Regards
Oliver

2002-06-09 06:57:28

by Albert D. Cahalan

Subject: Re: PCI DMA to small buffers on cache-incoherent arch

Oliver Neukum writes:

>> For memory --> device DMA:
>>
>> 1. write back all cache lines affected by the DMA
>> 2. start the DMA
>> 3. invalidate the above cache lines
>
> Why the third step ? That data should still
> be valid.

I made a mistake, but perhaps it is a good one.
There is no need to invalidate the cache lines,
but I'd guess that commonly the data won't be
used again. Doing the invalidate would free up
some cache lines, meaning that the CPU would
have empty slots to use for other stuff.

2002-06-09 07:15:24

by Oliver Neukum

Subject: Re: PCI DMA to small buffers on cache-incoherent arch

Am Sonntag, 9. Juni 2002 08:57 schrieb Albert D. Cahalan:
> Oliver Neukum writes:
> >> For memory --> device DMA:
> >>
> >> 1. write back all cache lines affected by the DMA
> >> 2. start the DMA
> >> 3. invalidate the above cache lines
> >
> > Why the third step ? That data should still
> > be valid.
>
> I made a mistake, but perhaps it is a good one.
> There is no need to invalidate the cache lines,
> but I'd guess that commonly the data won't be
> used again. Doing the invalidate would free up
> some cache lines, meaning that the CPU would
> have empty slots to use for other stuff.

Then this should be made commonly available
and could be used eg. on kfree.
The choice of kicking the data out or not could then
be made.

Regards
Oliver

2002-06-09 08:48:56

by Russell King

Subject: Re: PCI DMA to small buffers on cache-incoherent arch

On Sun, Jun 09, 2002 at 02:33:35AM -0400, Albert D. Cahalan wrote:
> For device --> memory DMA:
>
> 1. write back cache lines that cross unaligned boundaries

What if some of the cache lines inside the DMA region are dirty and...

> 2. start the DMA

get evicted now when the CPU is doing something else?

> 3. invalidate the above cache lines
> 4. invalidate cache lines that are fully inside the DMA

You really need to:

1. write back cache lines that cross unaligned boundaries
3. invalidate the above cache lines
2. start the DMA
4. invalidate cache lines that are fully inside the DMA

which is precisely what we do on ARM.

> For memory --> device DMA:
>
> 1. write back all cache lines affected by the DMA
> 2. start the DMA
> 3. invalidate the above cache lines

What's the point of (3) ? The data in memory hasn't been changed by DMA.
What if we were writing out a page that was mmaped into a process, and
the process wrote to that page while we were DMAing ?

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2002-06-09 09:55:54

by Paul Mackerras

Subject: Re: PCI DMA to small buffers on cache-incoherent arch

David S. Miller writes:

> From: Roland Dreier <[email protected]>
> Date: 08 Jun 2002 18:26:12 -0700
>
> Just to make sure I'm reading this correctly, you're saying that as
> long as a buffer is OK for DMA, it should be OK to use a
> sub-cache-line chunk as a DMA buffer via pci_map_single(), and
> accessing the rest of the cache line should be OK at any time before,
> during and after the DMA.
>
> Yes.

No. Not on processors where the cache doesn't snoop DMA accesses to
memory.

> What alternate implementation are you proposing?
>
> For non-cacheline aligned chunks in the range "start" to "end" you
> must perform a cache writeback and invalidate. To preserve the data
> outside of the DMA range.

This is the problem scenario. Suppose we are doing DMA to a buffer B
and also independently accessing a variable X which is not part of B.
Suppose that X and the beginning of B are both in cache line C.

        CPU                                     I/O device

1. write back & invalidate cache
   line(s) containing B
2. start DMA
3. read X: cache line C now
   present in cache
                                        4. DMA write to B:
                                           cache line C updated in memory
5. write X: cache line C now
   dirty in cache
                                        6. Signal DMA completion

Now at this point the driver calls pci_unmap_single or whatever. What
is pci_unmap_single to do? If it does nothing, or does a writeback,
we lose the DMA data. If it does an invalidate we lose the value
written to X. Clearly, neither is correct.

The bottom line is that we can only have one writer to any given cache
line at a time. If a buffer is being used for DMA from a device to
memory, the cpu MUST NOT write to any cache line that overlaps the
buffer.

Fundamentally, this is because the hardware lacks the ability to
transfer the "ownership" of the cache line between the cpu and the DMA
device on demand, and so we must manage that in software. The only
safe way to allow the cpu to write to the cache line while a DMA
transfer is active is to pause the DMA, invalidate the cache line, do
the write, write back and invalidate the cache line, and restart the
DMA.

Paul.

2002-06-09 10:54:09

by Benjamin Herrenschmidt

Subject: Re: PCI DMA to small buffers on cache-incoherent arch

>
>Now at this point the driver calls pci_unmap_single or whatever. What
>is pci_unmap_single to do? If it does nothing, or does a writeback,
>we lose the DMA data. If it does an invalidate we lose the value
>written to X. Clearly, neither is correct.
>
>The bottom line is that we can only have one writer to any given cache
>line at a time. If a buffer is being used for DMA from a device to
>memory, the cpu MUST NOT write to any cache line that overlaps the
>buffer.

Note that this typically happens with network drivers: some info in
the skbuff gets lost when it happens to share a cache line with
the data portion of the skbuff. When writing a driver specific to a
non-coherent CPU, it can be worked around by reserving enough space,
but "generic" PCI drivers are still affected.

On those architectures, the core skbuff alloc routines should probably
make sure the data portion doesn't share a cache line with other
information.
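
Something in this spirit, purely as a sketch (the macro name is made
up, and this is not what the skbuff code does today):

        /* pad the data area out to whole cache lines so that the skb
           bookkeeping placed after it never shares a line with payload
           a device may DMA into */
        #define SKB_DMA_ALIGN(x) \
                (((x) + (L1_CACHE_BYTES - 1)) & ~(L1_CACHE_BYTES - 1))

        data = kmalloc(SKB_DMA_ALIGN(size), gfp_mask);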

Ben.


2002-06-09 15:42:49

by Albert D. Cahalan

Subject: Re: PCI DMA to small buffers on cache-incoherent arch

Russell King writes:
> On Sun, Jun 09, 2002 at 02:33:35AM -0400, Albert D. Cahalan wrote:

>> For device --> memory DMA:
>>
>> 1. write back cache lines that cross unaligned boundaries
>
> What if some of the cache lines inside the DMA region are dirty and...
>
>> 2. start the DMA
>
> get evicted now when the CPU is doing something else?

Ugh. Yes, I guess the very act of starting the DMA
could evict a few. In that case, your method won't
work either.

It sure would be nice to get the DMA started early,
but that would require some way to keep new stuff
out of the cache.

Well gee, the MPC7400 has such an ability. I don't
know if it's fast, but one could do:

1. write back cache lines that cross unaligned boundaries
2. sync
3. set L2CR[L2DO] and L2CR[L2IO] (and maybe HID0[DLOCK] too)
4. start the DMA
5. invalidate all cache lines affected by the DMA
6. undo step 3 (clear L2CR[L2DO], L2CR[L2IO], HID0[DLOCK])

Most likely the above needs another sync or two.

> You really need to:
>
> 1. write back cache lines that cross unaligned boundaries
> 3. invalidate the above cache lines
> 2. start the DMA
------ write-back happens during DMA; data is ruined ------
> 4. invalidate cache lines that are fully inside the DMA
>
> which is precisely what we do on ARM.

>> For memory --> device DMA:
>>
>> 1. write back all cache lines affected by the DMA
>> 2. start the DMA
>> 3. invalidate the above cache lines
>
> What's the point of (3) ? The data in memory hasn't been changed by DMA.
> What if we were writing out a page that was mmaped into a process, and
> the process wrote to that page while we were DMAing ?

As noted in another email, that was a mistake.
It might be good to invalidate anyway, since
that frees up cache lines for other uses.
The process won't get a chance to touch the page
unless you have SMP or (maybe) preemption.

2002-06-09 23:26:40

by Oliver Neukum

Subject: Re: PCI DMA to small buffers on cache-incoherent arch

Am Sonntag, 9. Juni 2002 10:48 schrieb Russell King:
> On Sun, Jun 09, 2002 at 02:33:35AM -0400, Albert D. Cahalan wrote:
> > For device --> memory DMA:
> >
> > 1. write back cache lines that cross unaligned boundaries
>
> What if some of the cache lines inside the DMA region are dirty and...
>
> > 2. start the DMA
>
> get evicted now when the CPU is doing something else?
>
> > 3. invalidate the above cache lines
> > 4. invalidate cache lines that are fully inside the DMA
>
> You really need to:
>
> 1. write back cache lines that cross unaligned boundaries
> 3. invalidate the above cache lines
> 2. start the DMA
> 4. invalidate cache lines that are fully inside the DMA

Starting DMA has to be the very last step.

Regards
Oliver

2002-06-10 04:31:24

by David Miller

Subject: Re: PCI DMA to small buffers on cache-incoherent arch

From: Paul Mackerras <[email protected]>
Date: Sun, 9 Jun 2002 19:51:58 +1000 (EST)

This is the problem scenario. Suppose we are doing DMA to a buffer B
and also independently accessing a variable X which is not part of B.
Suppose that X and the beginning of B are both in cache line C.

I see what the problem is. Hmmm...

I'm trying to specify this such that knowledge of cachelines and
whatnot don't escape the arch specific code, ho hum... Looks like
that isn't possible.

2002-06-10 04:28:43

by David Miller

Subject: Re: PCI DMA to small buffers on cache-incoherent arch

From: Oliver Neukum <[email protected]>
Date: Sun, 9 Jun 2002 08:45:49 +0200

Am Sonntag, 9. Juni 2002 07:29 schrieb David S. Miller:
> From: Roland Dreier <[email protected]>
> Date: 08 Jun 2002 18:26:12 -0700
>
> Just to make sure I'm reading this correctly, you're saying that as
> long as a buffer is OK for DMA, it should be OK to use a
> sub-cache-line chunk as a DMA buffer via pci_map_single(), and
> accessing the rest of the cache line should be OK at any time before,
> during and after the DMA.
>
> Yes.

Does this mean that this piece of memory does have to be declared
uncacheable until DMA is finished ?
How else do you solve the problem of validity during DMA and
especially after DMA ?

You flush either before/after depending upon whether the cpu caches
are writeback in nature or not, and the cpu is not allowed to touch
those addresses while the device is doing the DMA.

2002-06-10 10:00:41

by Oliver Neukum

Subject: Re: PCI DMA to small buffers on cache-incoherent arch


> Does this mean that this piece of memory does have to be declared
> uncacheable until DMA is finished ?
> How else do you solve the problem of validity during DMA and
> especially after DMA ?
>
> You flush either before/after depending upon whether the cpu caches
> are writeback in nature or not, and the cpu is not allowed to touch
> those addresses while the device is doing the DMA.

So we need some kind of cache_aligned macro in our USB data
structures if they contain a buffer. Which macro would we have to use ?
Is there a macro conditional to incoherent architectures so we don't
have to waste RAM needlessly ?

Regards
Oliver

2002-06-10 10:28:01

by David Miller

Subject: Re: PCI DMA to small buffers on cache-incoherent arch

From: Oliver Neukum <[email protected]>
Date: Mon, 10 Jun 2002 12:00:14 +0200

> You flush either before/after depending upon whether the cpu caches
> are writeback in nature or not, and the cpu is not allowed to touch
> those addresses while the device is doing the DMA.

So we need some kind of cache_aligned macro in our USB data
structures if they contain a buffer. Which macro would we have to use ?

For now use SMP_CACHE_BYTES I guess.

2002-06-10 15:59:54

by Roland Dreier

Subject: Re: PCI DMA to small buffers on cache-incoherent arch

>>>>> "David" == David S Miller <[email protected]> writes:

Paul> This is the problem scenario. Suppose we are doing DMA to a
Paul> buffer B and also independently accessing a variable X which
Paul> is not part of B. Suppose that X and the beginning of B are
Paul> both in cache line C.

David> I see what the problem is. Hmmm...

David> I'm trying to specify this such that knowledge of
David> cachelines and whatnot don't escape the arch specific code,
David> ho hum... Looks like that isn't possible.

So is the consensus now that in general drivers should make sure any
buffers passed to pci_map/unmap are aligned to SMP_CACHE_BYTES (and a
multiple of SMP_CACHE_BYTES in size)? In other words if a driver uses
an unaligned buffer it should be fixed unless we can prove (and
document in a comment :) that it won't cause problems on an arch
without cache coherency and with a writeback cache.

Thanks,
Roland

2002-06-10 16:03:27

by Roland Dreier

Subject: Re: PCI DMA to small buffers on cache-incoherent arch

>>>>> "Roland" == Roland Dreier <[email protected]> writes:

David> For non-cacheline aligned chunks in the range "start" to
David> "end" you must perform a cache writeback and invalidate. To
David> preserve the data outside of the DMA range.

Roland> Doesn't this still have a problem if you touch data in the
Roland> same cache line as the DMA buffer after the pci_map but
Roland> before the DMA takes place? The CPU will pull the cache
Roland> line back in and it might not see the data the DMA brought
Roland> in.

Roland> It seems to me that to be totally safe, pci_unmap would
Roland> have to save the non-aligned part outside the buffer to
Roland> temporary storage, do an invalidate, and then copy back
Roland> the non-aligned part.

Replying to myself.... Anyway, I realized that even my idea above is
wrong. I don't see _any_ safe way to share a cache line between a DMA
buffer and other data. Access to the cache line might pull the cache
line back in and write it back at any time, which could corrupt the
DMA'ed data. I don't see a way to hide the existence of cache lines
etc. from the driver.

Best,
Roland

2002-06-10 17:03:18

by Tom Rini

Subject: Re: PCI DMA to small buffers on cache-incoherent arch

On Mon, Jun 10, 2002 at 08:59:48AM -0700, Roland Dreier wrote:
> >>>>> "David" == David S Miller <[email protected]> writes:
>
> Paul> This is the problem scenario. Suppose we are doing DMA to a
> Paul> buffer B and also independently accessing a variable X which
> Paul> is not part of B. Suppose that X and the beginning of B are
> Paul> both in cache line C.
>
> David> I see what the problem is. Hmmm...
>
> David> I'm trying to specify this such that knowledge of
> David> cachelines and whatnot don't escape the arch specific code,
> David> ho hum... Looks like that isn't possible.
>
> So is the consensus now that in general drivers should make sure any
> buffers passed to pci_map/unmap are aligned to SMP_CACHE_BYTES (and a
> multiple of SMP_CACHE_BYTES in size)? In other words if a driver uses
> an unaligned buffer it should be fixed unless we can prove (and
> document in a comment :) that it won't cause problems on an arch
> without cache coherency and with a writeback cache.

And how about we don't call it SMP_CACHE_BYTES too? The processors
where this matters certainly aren't doing SMP...

--
Tom Rini (TR1265)
http://gate.crashing.org/~trini/

2002-06-10 17:22:56

by Oliver Neukum

Subject: Re: PCI DMA to small buffers on cache-incoherent arch


> > So is the consensus now that in general drivers should make sure any
> > buffers passed to pci_map/unmap are aligned to SMP_CACHE_BYTES (and a
> > multiple of SMP_CACHE_BYTES in size)? In other words if a driver uses
> > an unaligned buffer it should be fixed unless we can prove (and
> > document in a comment :) that it won't cause problems on an arch
> > without cache coherency and with a writeback cache.
>
> And how about we don't call it SMP_CACHE_BYTES too? The processors
> where this matters certainly aren't doing SMP...

Definitely we should call it something different so we can limit it to
architectures that need it.

Regards
Oliver

2002-06-10 17:29:01

by Roland Dreier

Subject: Re: PCI DMA to small buffers on cache-incoherent arch

>>>>> "Tom" == Tom Rini <[email protected]> writes:

Roland> So is the consensus now that in general drivers should
Roland> make sure any buffers passed to pci_map/unmap are aligned
Roland> to SMP_CACHE_BYTES (and a multiple of SMP_CACHE_BYTES in
Roland> size)? In other words if a driver uses an unaligned
Roland> buffer it should be fixed unless we can prove (and
Roland> document in a comment :) that it won't cause problems on
Roland> an arch without cache coherency and with a writeback
Roland> cache.

Tom> And how about we don't call it SMP_CACHE_BYTES too? The
Tom> processors where this matters certainly aren't doing SMP...

Fair enough... there is of course L1_CACHE_BYTES but I'm not positive
that's always the right thing. If we want to introduce a new constant
then we will have to touch every arch (which is not necessarily a
killer but it means "fixed" drivers won't compile for everyone until
their arch is fixed).

What would you propose? I don't have strong feelings about the exact
form of a solution but I would like to decide something so we can have
a standard way of fixing drivers that use unaligned DMA buffers (and
convincing maintainers to apply the patches).

Thanks,
Roland



2002-06-10 17:30:21

by Tom Rini

Subject: Re: PCI DMA to small buffers on cache-incoherent arch

On Mon, Jun 10, 2002 at 07:22:26PM +0200, Oliver Neukum wrote:
>
> > > So is the consensus now that in general drivers should make sure any
> > > buffers passed to pci_map/unmap are aligned to SMP_CACHE_BYTES (and a
> > > multiple of SMP_CACHE_BYTES in size)? In other words if a driver uses
> > > an unaligned buffer it should be fixed unless we can prove (and
> > > document in a comment :) that it won't cause problems on an arch
> > > without cache coherency and with a writeback cache.
> >
> > And how about we don't call it SMP_CACHE_BYTES too? The processors
> > where this matters certainly aren't doing SMP...
>
> Definitely we should call it something different so we can limit it to
> architectures that need it.

No. We should just make it come out to a nop for arches that don't need
it. Otherwise we'll end up with ugly things like:
#ifdef CONFIG_NOT_CACHE_COHERENT
...
#else
...
#endif

All over things like USB...

--
Tom Rini (TR1265)
http://gate.crashing.org/~trini/

2002-06-10 17:40:11

by Oliver Neukum

Subject: Re: PCI DMA to small buffers on cache-incoherent arch

Am Montag, 10. Juni 2002 19:29 schrieb Tom Rini:
> On Mon, Jun 10, 2002 at 07:22:26PM +0200, Oliver Neukum wrote:
> > > > So is the consensus now that in general drivers should make sure
> > > > any buffers passed to pci_map/unmap are aligned to SMP_CACHE_BYTES
> > > > (and a multiple of SMP_CACHE_BYTES in size)? In other words if a
> > > > driver uses an unaligned buffer it should be fixed unless we can
> > > > prove (and document in a comment :) that it won't cause problems
> > > > on an arch without cache coherency and with a writeback cache.
> > >
> > > And how about we don't call it SMP_CACHE_BYTES too? The processors
> > > where this matters certainly aren't doing SMP...
> >
> > Definitely we should call it something different so we can limit it to
> > architectures that need it.
>
> No. We should just make it come out to a nop for arches that don't need
> it. Otherwise we'll end up with ugly things like:
> #ifdef CONFIG_NOT_CACHE_COHERENT
> ...
> #else
> ...
> #endif
>
> All over things like USB...

You are right.

Regards
Oliver


2002-06-10 17:58:52

by Russell King

Subject: Re: PCI DMA to small buffers on cache-incoherent arch

On Mon, Jun 10, 2002 at 07:22:26PM +0200, Oliver Neukum wrote:
> Definitely we should call it something different so we can limit it to
> architectures that need it.

DMA_ALIGN, DMA_ALIGN_BYTES or DMA_CACHE_BYTES.

Hmm, thinking about this some more (and just rambling). On some ARMs
(maybe other CPUs as well) each cache line has two dirty bits - one
for each half. If only half the cache line is dirty, it will only
write out the dirty half when evicting it. In this case, we'd want:

#define L1_CACHE_BYTES 32
#define DMA_CACHE_BYTES 16

which will probably work as long as we handle stuff carefully in the
architecture specific layer.

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2002-06-10 18:10:26

by William Jhun

Subject: Re: PCI DMA to small buffers on cache-incoherent arch

On Sun, Jun 09, 2002 at 09:27:05PM -0700, David S. Miller wrote:
> From: Paul Mackerras <[email protected]>
> Date: Sun, 9 Jun 2002 19:51:58 +1000 (EST)
>
> This is the problem scenario. Suppose we are doing DMA to a buffer B
> and also independently accessing a variable X which is not part of B.
> Suppose that X and the beginning of B are both in cache line C.
>
> I see what the problem is. Hmmm...
>
> I'm trying to specify this such that knowledge of cachelines and
> whatnot don't escape the arch specific code, ho hum... Looks like
> that isn't possible.

Perhaps provide macros in asm/pci.h that will:

- Take a buffer size and add an appropriate amount (one cache line for
alignment and the remainder to fill out the last cache line) to be
used for kmalloc(), etc, eg:

#define DMA_SIZE_ROUNDUP(size) \
((size + 2*SMP_CACHE_BYTES - 1) & ~(SMP_CACHE_BYTES - 1))

- Take a buffer address (as returned from kmalloc() with the modified
size from above) and round it up to a cacheline boundary, eg:

#define DMA_BUFFER_ALIGN(ptr) \
(((unsigned long)ptr + SMP_CACHE_BYTES - 1) & ~(SMP_CACHE_BYTES - 1))

These two, in conjunction, would provide a buffer that's aligned on a
cacheline boundary and ends on a cacheline boundary. Kind of ugly, but
would be sufficient and would hide the cacheline size specifics.
Cache-coherent platforms would just return the original argument.
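
Usage would be something along these lines (BUF_SIZE standing in for
whatever the driver actually needs):

        void *raw = kmalloc(DMA_SIZE_ROUNDUP(BUF_SIZE), GFP_KERNEL);
        unsigned char *buf = (unsigned char *) DMA_BUFFER_ALIGN(raw);

        /* buf is cacheline aligned and padded out to whole cachelines;
           map buf for DMA, and kfree(raw) -- not buf -- when done */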

Thanks,
Will

2002-06-10 18:31:46

by William Jhun

Subject: Re: PCI DMA to small buffers on cache-incoherent arch

On Mon, Jun 10, 2002 at 11:07:40AM -0700, William Jhun wrote:
> Perhaps provide macros in asm/pci.h that will:
>
> - Take a buffer size and add an appropriate amount (one cache line for
> alignment and the remainder to fill out the last cache line) to be
> used for kmalloc(), etc, eg:

Err, I should say any case where the buffer may not be cache-aligned
(this would enable the use of DMA to stack...). For allocation routines
that give a cache-aligned buffer, this is obviously not needed (but
would just add one cacheline to the size).

>
> #define DMA_SIZE_ROUNDUP(size) \
> ((size + 2*SMP_CACHE_BYTES - 1) & ~(SMP_CACHE_BYTES - 1))
>

2002-06-10 18:33:33

by Mark Zealey

Subject: Re: PCI DMA to small buffers on cache-incoherent arch

On Mon, Jun 10, 2002 at 11:07:40AM -0700, William Jhun wrote:

> On Sun, Jun 09, 2002 at 09:27:05PM -0700, David S. Miller wrote:
> > From: Paul Mackerras <[email protected]>
> > Date: Sun, 9 Jun 2002 19:51:58 +1000 (EST)
> >
> > This is the problem scenario. Suppose we are doing DMA to a buffer B
> > and also independently accessing a variable X which is not part of B.
> > Suppose that X and the beginning of B are both in cache line C.
> >
> > I see what the problem is. Hmmm...
> >
> > I'm trying to specify this such that knowledge of cachelines and
> > whatnot don't escape the arch specific code, ho hum... Looks like
> > that isn't possible.
>
> Perhaps provide macros in asm/pci.h that will:
>
> - Take a buffer size and add an appropriate amount (one cache line for
> alignment and the remainder to fill out the last cache line) to be
> used for kmalloc(), etc, eg:
>
> #define DMA_SIZE_ROUNDUP(size) \
> ((size + 2*SMP_CACHE_BYTES - 1) & ~(SMP_CACHE_BYTES - 1))
>
> - Take a buffer address (as returned from kmalloc() with the modified
> size from above) and round it up to a cacheline boundary, eg:
>
> #define DMA_BUFFER_ALIGN(ptr) \
> (((unsigned long)ptr + SMP_CACHE_BYTES - 1) & ~(SMP_CACHE_BYTES - 1))
>
> These two, in conjunction, would provide a buffer that's aligned on a
> cacheline boundary and ends on a cacheline boundary. Kind of ugly, but
> would be sufficient and would hide the cacheline size specifics.
> Cache-coherent platforms would just return the original argument.

Why not just make some dmalloc() macro in pci.h which will do the necessary
magic resizing and alignments? Seems a lot easier to do...

--

Mark Zealey
[email protected]; [email protected]

2002-06-10 18:44:37

by Oliver Neukum

Subject: Re: PCI DMA to small buffers on cache-incoherent arch


> > These two, in conjunction, would provide a buffer that's aligned on a
> > cacheline boundary and ends on a cacheline boundary. Kind of ugly, but
> > would be sufficient and would hide the cacheline size specifics.
> > Cache-coherent platforms would just return the original argument.
>
> Why not just make some dmalloc() macro in pci.h which will do the
> necessary magic resizing and alignments? Seems a lot easier to do...

That won't be enough. We need a solution for buffers that are parts of
structures.

Regards
Oliver


2002-06-10 19:03:27

by Roland Dreier

Subject: Re: PCI DMA to small buffers on cache-incoherent arch

>>>>> "Tom" == Tom Rini <[email protected]> writes:

Tom> No. We should just make it come out to a nop for arches that
Tom> don't need it. Otherwise we'll end up with ugly things like:
Tom> #ifdef CONFIG_NOT_CACHE_COHERENT ... #else ... #endif
Tom> All over things like USB...

Good point. How about the following: add a file to each arch named
say, <asm/dma_buffer.h>, that defines a macro __dma_buffer. This
macro would be used as follows to mark DMA buffers (example taken from
<linux/usb.h>):

struct usb_device {
        /* ... stuff deleted ... */

        struct usb_bus *bus;                    /* Bus we're part of */

        struct usb_device_descriptor descriptor __dma_buffer; /* Descriptor */
        struct usb_config_descriptor *config;   /* All of the configs */

        /* ... more stuff deleted ... */
};

Then cache-coherent architectures like i386 can just do

#define __dma_buffer

while PPC can do

#ifdef CONFIG_NOT_CACHE_COHERENT

#define __dma_buffer __dma_buffer_line(__LINE__)
#define __dma_buffer_line(line) __dma_buffer_expand_line(line)
#define __dma_buffer_expand_line(line) \
__attribute__ ((aligned(SMP_CACHE_BYTES))); \
char __dma_pad_ ## line [0] __attribute__ ((aligned(SMP_CACHE_BYTES)))

#else /* CONFIG_NOT_CACHE_COHERENT */

#define __dma_buffer

#endif /* CONFIG_NOT_CACHE_COHERENT */
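
For a member declared on, say, line 42 of a header (line number chosen
just for illustration), this expands to roughly:

        struct usb_device_descriptor descriptor
                __attribute__ ((aligned(SMP_CACHE_BYTES)));
        char __dma_pad_42[0] __attribute__ ((aligned(SMP_CACHE_BYTES)));

i.e. the marked member starts on a cache line boundary, and the
zero-length padding array forces whatever follows up to the next cache
line boundary, so nothing else shares the buffer's cache lines.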

C purists will point out that this is not guaranteed to work since the
compiler can reorder structure members. However I'm sure that there
are many, many other places in the kernel where we are counting on gcc
not to reorder structures.

Comments?

Thanks,
Roland

2002-06-10 19:15:18

by Tom Rini

Subject: Re: PCI DMA to small buffers on cache-incoherent arch

On Mon, Jun 10, 2002 at 12:03:19PM -0700, Roland Dreier wrote:
> >>>>> "Tom" == Tom Rini <[email protected]> writes:
>
> Tom> No. We should just make it come out to a nop for arches that
> Tom> don't need it. Otherwise we'll end up with ugly things like:
> Tom> #ifdef CONFIG_NOT_CACHE_COHERENT ... #else ... #endif
> Tom> All over things like USB...
>
> Good point. How about the following: add a file to each arch named
> say, <asm/dma_buffer.h>, that defines a macro __dma_buffer. This
> macro would be used as follows to mark DMA buffers (example taken from
> <linux/usb.h>):
[snip]
> Comments?

SMP_CACHE_BYTES is non-sensical on 4xx (and 8xx) since they don't do
SMP..

--
Tom Rini (TR1265)
http://gate.crashing.org/~trini/

2002-06-10 19:21:49

by Roland Dreier

Subject: Re: PCI DMA to small buffers on cache-incoherent arch

>>>>> "Tom" == Tom Rini <[email protected]> writes:

Tom> SMP_CACHE_BYTES is non-sensical on 4xx (and 8xx) since they
Tom> don't do SMP..

True but <asm/cache.h> defines it anyway... of course it would be no
problem to use L1_CACHE_BYTES and in fact that probably makes sense
because we're talking about PPC-only macros (other arches would have
their own definition).

Best,
Roland

2002-06-10 19:27:29

by Tom Rini

Subject: Re: PCI DMA to small buffers on cache-incoherent arch

On Mon, Jun 10, 2002 at 12:21:44PM -0700, Roland Dreier wrote:
> >>>>> "Tom" == Tom Rini <[email protected]> writes:
>
> Tom> SMP_CACHE_BYTES is non-sensical on 4xx (and 8xx) since they
> Tom> don't do SMP..
>
> True but <asm/cache.h> defines it anyway...

Right, to L1_CACHE_BYTES even. :) So we should probably use that
directly.

> of course it would be no
> problem to use L1_CACHE_BYTES and in fact that probably makes sense
> because we're talking about PPC-only macros (other arches would have
> their own definition).

Well, ARM (from whom we (PPC) borrowed some ideas) will set it to
L1_CACHE_BYTES too, from the sound of rmk. So it might even be
consistent among the non-coherent processors. :)

--
Tom Rini (TR1265)
http://gate.crashing.org/~trini/

2002-06-11 03:14:40

by David Miller

Subject: Re: PCI DMA to small buffers on cache-incoherent arch


Wait a second, forget all of this cache alignment crap. If we can
avoid drivers seeing it, we should by all means necessary.

We should just tell people to use PCI pools and be done with it. That
way all the complexity about buffer alignment and all this other
crapola lives strictly inside of the PCI pool code.

2002-06-11 04:04:36

by Roland Dreier

Subject: Re: PCI DMA to small buffers on cache-incoherent arch

>>>>> "David" == David S Miller <[email protected]> writes:

David> Wait a second, forget all of this cache alignment crap. If
David> we can avoid drivers seeing it, we should by all means
David> necessary.

David> We should just tell people to use PCI pools and be done
David> with it. That way all the complexity about buffer
David> alignment and all this other crapola lives strictly inside
David> of the PCI pool code.

That's fine but there are drivers (USB, etc) doing

struct something {
        int field1;
        char dma_buffer[SMALLER_THAN_CACHE_LINE];
        int field2;
};

struct something *dev = kmalloc(sizeof *dev, GFP_KERNEL);

Do they have to change to

struct something {
        int field1;
        char *dma_buffer;
        int field2;
};

struct something *dev = kmalloc(sizeof *dev, GFP_KERNEL);
dev->dma_buffer = kmalloc(SMALLER_THAN_CACHE_LINE, GFP_KERNEL);

(This is always safe because as you said kmalloc can never return a
slab that's not safe for DMA) I don't see how PCI pools help here.

Best,
Roland

2002-06-11 04:19:10

by Brad Hards

Subject: Re: PCI DMA to small buffers on cache-incoherent arch

On Tue, 11 Jun 2002 14:04, Roland Dreier wrote:
> >>>>> "David" == David S Miller <[email protected]> writes:
>
> David> Wait a second, forget all of this cache alignment crap. If
> David> we can avoid drivers seeing it, we should by all means
> David> necessary.
>
> David> We should just tell people to use PCI pools and be done
> David> with it. That way all the complexity about buffer
> David> alignment and all this other crapola lives strictly inside
> David> of the PCI pool code.
>
> That's fine but there are drivers (USB, etc) doing
>
> struct something {
> int field1;
> char dma_buffer[SMALLER_THAN_CACHE_LINE];
> int field2;
> };
>
> struct something *dev = kmalloc(sizeof *dev, GFP_KERNEL);
>
> Do they have to change to
>
> struct something {
> int field1;
> char *dma_buffer;
> int field2;
> };
>
> struct something *dev = kmalloc(sizeof *dev, GFP_KERNEL);
> dev->dma_buffer = kmalloc(SMALLER_THAN_CACHE_LINE, GFP_KERNEL);
>
> (This is always safe because as you said kmalloc can never return a
> slab that's not safe for DMA) I don't see how PCI pools help here.
In the USB case, the "something" represents a device, and the "dma_buffer" is
something you want to send-to / receive-from the device.

kmallocing the transfer buffers is a problem for suspend. Doing the "you get
the transfer buffer with the device" approach is really useful, because you know that
the device configuration will be returned. But we might need to re-init the
device after a suspend, and [I've been told that] memory allocation may
deadlock under these circumstances.

Would it be enough to move the transfer buffers to the start of the device
struct, and then pad it up to a cacheline?

--
http://conf.linux.org.au. 22-25Jan2003. Perth, Australia. Birds in Black.

2002-06-11 04:24:40

by Roland Dreier

Subject: Re: PCI DMA to small buffers on cache-incoherent arch

>>>>> "Brad" == Brad Hards <[email protected]> writes:

Brad> Would it be enough to move the transfer buffers to the
Brad> start of the device struct, and then pad it up to a
Brad> cacheline?

Something like the __dma_buffer macro I posted earlier makes it even
simpler. I'll make a patch that adds <asm/dma_buffer.h> so we have
something concrete to discuss.

Best,
Roland

2002-06-11 04:25:44

by David Miller

Subject: Re: PCI DMA to small buffers on cache-incoherent arch

From: Roland Dreier <[email protected]>
Date: 10 Jun 2002 21:04:30 -0700

That's fine but there are drivers (USB, etc) doing

struct something {
int field1;
char dma_buffer[SMALLER_THAN_CACHE_LINE];
int field2;
};

struct something *dev = kmalloc(sizeof *dev, GFP_KERNEL);

Do they have to change to

How about allocating struct something using pci_pool?

2002-06-11 04:28:17

by David Miller

Subject: Re: PCI DMA to small buffers on cache-incoherent arch

From: Roland Dreier <[email protected]>
Date: 10 Jun 2002 21:24:35 -0700

Something like the __dma_buffer macro I posted earlier makes it even
simpler. I'll make a patch that adds <asm/dma_buffer.h> so we have
something concrete to discuss.

Why don't you just allocate the "struct something" from PCI pools?
If you don't have the pci_dev from that point, make some callback into
the host adapter driver so you can get at it.

2002-06-11 04:39:31

by Roland Dreier

Subject: Re: PCI DMA to small buffers on cache-incoherent arch


struct something {
int field1;
char dma_buffer[SMALLER_THAN_CACHE_LINE];
int field2;
};

struct something *dev = kmalloc(sizeof *dev, GFP_KERNEL);

David> How about allocating struct something using pci_pool?

The problem is the driver can't safely touch field1 or field2 near the
DMA (it might pull the cache line back in too soon, or dirty the cache
line and have it written back on top of DMA'ed data)

Best,
Roland

2002-06-11 04:42:58

by David Miller

Subject: Re: PCI DMA to small buffers on cache-incoherent arch

From: Roland Dreier <[email protected]>
Date: 10 Jun 2002 21:39:27 -0700

David> How about allocating struct something using pci_pool?

The problem is the driver can't safely touch field1 or field2 near the
DMA (it might pull the cache line back in too soon, or dirty the cache
line and have it written back on top of DMA'ed data)

Oh duh, I see, then go to making the thing a pointer to the
dma area.

2002-06-11 05:30:43

by David Brownell

Subject: Re: PCI DMA to small buffers on cache-incoherent arch

> From: Roland Dreier <[email protected]>
> Date: 10 Jun 2002 21:39:27 -0700
>
> David> How about allocating struct something using pci_pool?
>
> The problem is the driver can't safely touch field1 or field2 near the
> DMA (it might pull the cache line back in too soon, or dirty the cache
> line and have it written back on top of DMA'ed data)
>
> Oh duh, I see, then go to making the thing a pointer to the
> dma area.

If it's going to be a pointer, it might as well be normal kmalloc data,
avoiding obfuscation of the memory model used by USB device drivers.
(They don't have to worry about DMA addresses, which IMO is a feature.)

Seems like there are two basic options for how to handle all this.

(a) Expose the dma/cache interaction to device drivers, so
they can manage (and prevent) bad ones like the cacheline
sharing scenarios.

(b) Require drivers to provide I/O buffers that are slab/page/...
pointers from some API that prevents such problems.

I think (a) is more or less a variant of the existing alignment
requirements that get embedded in ABIs, but which still show up
in code that's very close to the hardware. Many device driver
folk are familiar with them, or performance tuners, but not so
many folk doing USB device drivers would be. (They are thankfully
far from host controller hardware and its quirks!)

Personally I'd avoid (b) since I like to allocate memory all at
once in most cases ... lots fewer error cases to cope with (or,
more typically _not_ cope with :).

One question is whether to support only one of them, or allow both.
In either case the DMA-mapping.txt will need to touch on the issue.

- Dave




2002-06-11 05:48:07

by David Miller

Subject: Re: PCI DMA to small buffers on cache-incoherent arch

From: David Brownell <[email protected]>
Date: Mon, 10 Jun 2002 22:31:45 -0700

One question is whether to support only one of them, or allow both.
In either case the DMA-mapping.txt will need to touch on the issue.

Another important issue is from where do these issues originate?

This stuff rarely happens most of the time because block buffers and
networking buffers are what are fed to the chip.

I still think it is crucial that we not put this alignment garbage
into the drivers themselves. Driver writers are going to get it
wrong or be confused by it.

2002-06-11 06:24:06

by Oliver Neukum

[permalink] [raw]
Subject: Re: PCI DMA to small buffers on cache-incoherent arch

On Tuesday, 11 June 2002 06:38, David S. Miller wrote:
> From: Roland Dreier <[email protected]>
> Date: 10 Jun 2002 21:39:27 -0700
>
> David> How about allocating struct something using pci_pool?
>
> The problem is the driver can't safely touch field1 or field2 near
> the DMA (it might pull the cache line back in too soon, or dirty the
> cache line and have it written back on top of DMA'ed data)
>
> Oh duh, I see, then go to making the thing a pointer to the
> dma area.

That would mean doing things like memory allocations for one single
byte. Also it would affect things like the scsi layer (sense buffer in
the structure).
And it would be additional overhead for everybody for the benefit
of a small minority. An alignment macro could be turned into a nop
on some architectures.

Regards
Oliver

2002-06-11 06:43:01

by David Miller

[permalink] [raw]
Subject: Re: PCI DMA to small buffers on cache-incoherent arch

From: Oliver Neukum <[email protected]>
Date: Tue, 11 Jun 2002 08:23:52 +0200

That would mean doing things like memory allocations for one single
byte. Also it would affect things like the scsi layer (sense buffer in
the structure).
And it would be additional overhead for everybody for the benefit
of a small minority. An alignment macro could be turned into a nop
on some architectures.

The problem is people are going to get it wrong if we expose all of
this cacheline crap to the drivers.

2002-06-11 07:39:53

by Oliver Neukum

[permalink] [raw]
Subject: Re: PCI DMA to small buffers on cache-incoherent arch

On Tuesday, 11 June 2002 08:38, David S. Miller wrote:
> From: Oliver Neukum <[email protected]>
> Date: Tue, 11 Jun 2002 08:23:52 +0200
>
> That would mean doing things like memory allocations for one single
> byte. Also it would affect things like the scsi layer (sense buffer
> in the structure).
> And it would be additional overhead for everybody for the benefit
> of a small minority. An alignment macro could be turned into a nop
> on some architectures.
>
> The problem is people are going to get it wrong if we expose all of
> this cacheline crap to the drivers.

You don't have to fully expose it. We make a simple rule like:
If you want to do DMA to a variable it needs "DMA_ALIGN after the
name in the declaration". Architectures could define it, with generic nop.
People will get it wrong by doing DMA to members of structures
otherwise. There's no really painless way to solve this.
You have to introduce a new rule anyway.
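
(Purely as illustration, such an annotation might look something like the
sketch below; DMA_ALIGN is the name proposed above, and the sketch assumes
the worst-case line size is a compile-time constant, which is exactly what
the following replies question:)

#ifdef CONFIG_NOT_COHERENT_CACHE	/* placeholder config symbol */
#define DMA_ALIGN __attribute__ ((aligned (L1_CACHE_BYTES)))
#else
#define DMA_ALIGN	/* no cost on cache-coherent architectures */
#endif

struct something {
	int field1;
	char dma_buffer[SMALLER_THAN_CACHE_LINE] DMA_ALIGN;
	int field2;
};

(Aligning only the start of dma_buffer still leaves field2 sharing the
buffer's last cache line; the __dma_buffer macro proposed later in the
thread also pads after the array for that reason.)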

Regards
Oliver

2002-06-11 07:42:43

by David Miller

[permalink] [raw]
Subject: Re: PCI DMA to small buffers on cache-incoherent arch

From: Oliver Neukum <[email protected]>
Date: Tue, 11 Jun 2002 09:38:52 +0200

You don't have to fully expose it. We make a simple rule like:
If you want to do DMA to a variable it needs "DMA_ALIGN after the
name in the declaration". Architectures could define it, with generic nop.
People will get it wrong by doing DMA to members of structures
otherwise. There's no really painless way to solve this.
You have to introduce a new rule anyway.

The DMA_ALIGN attribute doesn't work, on some systems the PCI
cacheline size is determined at boot time not compile time.

2002-06-11 07:49:42

by David Miller

[permalink] [raw]
Subject: Re: PCI DMA to small buffers on cache-incoherent arch

From: "David S. Miller" <[email protected]>
Date: Tue, 11 Jun 2002 00:36:25 -0700 (PDT)

The DMA_ALIGN attribute doesn't work, on some systems the PCI
cacheline size is determined at boot time not compile time.

Another note, it could be per-PCI controller what this cacheline size
is. We'll need to pass in a pdev to the alignment interfaces to
do this correctly.

So none of this can be done at compile time folks.

2002-06-11 08:08:08

by Oliver Neukum

[permalink] [raw]
Subject: Re: PCI DMA to small buffers on cache-incoherent arch

On Tuesday, 11 June 2002 09:43, David S. Miller wrote:
> From: "David S. Miller" <[email protected]>
> Date: Tue, 11 Jun 2002 00:36:25 -0700 (PDT)
>
> The DMA_ALIGN attribute doesn't work, on some systems the PCI
> cacheline size is determined at boot time not compile time.
>
> Another note, it could be per-PCI controller what this cacheline size
> is. We'll need to pass in a pdev to the alignment interfaces to
> do this correctly.

Could you please explain this ?

I thought this was a problem of a CPU dirtying a cache line
that overlaps with an area being DMAed into. So the determining
factor should be the granularity of the dirty status of the CPU.

Are there really PCI controllers which have to physically write
much more than is transferred?

Now really puzzled
Oliver

2002-06-11 08:19:36

by David Miller

[permalink] [raw]
Subject: Re: PCI DMA to small buffers on cache-incoherent arch

From: Oliver Neukum <[email protected]>
Date: Tue, 11 Jun 2002 10:07:19 +0200

Are there really PCI controllers which have to physically write
much more than is transferred?

On sparc64 the cacheline size can be either 64 or 128 bytes.
It's a bus characteristic, so we have to get at the PCI
controller info.

2002-06-11 12:06:29

by Oliver Neukum

[permalink] [raw]
Subject: Re: PCI DMA to small buffers on cache-incoherent arch

On Tuesday, 11 June 2002 10:15, David S. Miller wrote:
> From: Oliver Neukum <[email protected]>
> Date: Tue, 11 Jun 2002 10:07:19 +0200
>
> Are there really PCI controllers which have to physically write
> much more than is transferred?
>
> On sparc64 the cacheline size can be either 64 or 128 bytes.
> It's a bus characteristic, so we have to get at the PCI
> controller info.

A sparc64 is unlikely to be short on memory, or is it?
What's wrong with always aligning on 128 bytes on sparc64?
A runtime check would be expensive.

Regards
Oliver


2002-06-11 12:08:45

by David Miller

[permalink] [raw]
Subject: Re: PCI DMA to small buffers on cache-incoherent arch

From: Oliver Neukum <[email protected]>
Date: Tue, 11 Jun 2002 14:06:14 +0200

A sparc64 is unlikely to be short on memory, or is it ?
What's wrong with always aligning on 128 bytes on sparc64 ?
A runtime check would be expensive.

Maybe on arch FOO, target X needs no alignment when using PCI
controller Y, but for PCI controller Z it does need alignment.

2002-06-11 14:04:43

by David Woodhouse

[permalink] [raw]
Subject: Re: PCI DMA to small buffers on cache-incoherent arch


[email protected] said:
> Replying to myself.... Anyway, I realized that even my idea above is
> wrong. I don't see _any_ safe way to share a cache line between a DMA
> buffer and other data. Access to the cache line might pull the cache
> line back in and write it back at any time, which could corrupt the
> DMA'ed data. I don't see a way to hide the existence of cache lines
> etc. from the driver.

Access to a cache line can't be shared safely. Disable the cache for that
page while its ownership is in doubt and you can happily share the RAM
though. If we find another CPU which fetches cache lines preemptively, we
may have to do that anyway.

--
dwmw2


2002-06-11 15:11:31

by David Brownell

[permalink] [raw]
Subject: Re: PCI DMA to small buffers on cache-incoherent arch

> One question is whether to support only one of them, or allow both.
> In either case the DMA-mapping.txt will need to touch on the issue.
>
> Another important issue is from where do these issues originate?
>
> This stuff rarely happens most of the time because block buffers and
> networking buffers are what are fed to the chip.

I think the examples Oliver found related mostly to device control
and status tracking.


> I still think it is crucial that we not put this alignment garbage
> into the drivers themselves. Driver writers are going to get it
> wrong or be confused by it.

I guess I don't see a way around exposing some of it. Do you?

Even the (b) option requires drivers to know they can't pass pointers
to the inside of a structure, AND that memory right after the end of
an I/O buffer is unavailable. That exposes the issue. Although it
doesn't provide help to manage it, as (a) does, or detect it.

Should the dma mapping APIs try to detect the "DMA buffer starts in
middle of non-coherent cacheline" case, and fail? That might be
worth checking, catching some of these errors, even if it ignores
the corresponding "ends in middle of non-coherent cacheline" case.
And it'd handle that "it's a runtime issue on some HW" concern.
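
(A rough sketch of what such a check might look like in a non-coherent
architecture's pci_map_single(); dma_cache_align() and BAD_DMA_ADDR are
hypothetical names here, since, as noted in the reply below, the mapping
calls currently have no way to return failure at all:)

dma_addr_t pci_map_single(struct pci_dev *pdev, void *ptr,
			  size_t size, int direction)
{
	unsigned long mask = dma_cache_align(pdev) - 1;	/* hypothetical */

	/* refuse buffers that start in the middle of a cache line;
	 * a stricter check would also look at size for the "ends in
	 * the middle of a cache line" case */
	if ((unsigned long) ptr & mask)
		return BAD_DMA_ADDR;		/* hypothetical sentinel */

	/* ... the usual flush/invalidate and bus address translation ... */
	return virt_to_bus(ptr);
}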

Or then there's David Woodhouse's option (disable caching on those
pages while the DMA mapping is active) which seems good, except for
the fact that this issue is most common for buffers that are a lot
smaller than one page ... so lots of otherwise cacheable data would
suddenly get very slow. :)

- Dave

2002-06-11 15:45:10

by Oliver Neukum

[permalink] [raw]
Subject: Re: PCI DMA to small buffers on cache-incoherent arch

On Tuesday, 11 June 2002 17:12, David Brownell wrote:
> > One question is whether to support only one of them, or allow both.
> > In either case the DMA-mapping.txt will need to touch on the issue.
> >
> > Another important issue is from where do these issues originate?
> >
> > This stuff rarely happens most of the time because block buffers and
> > networking buffers are what are fed to the chip.
>
> I think the examples Oliver found related mostly to device control
> and status tracking.

I did not look for others. It would seem that SCSI and networking are
affected as well. SCSI due to the sense buffer.

> Or then there's David Woodhouse's option (disable caching on those
> pages while the DMA mapping is active) which seems good, except for
> the fact that this issue is most common for buffers that are a lot
> smaller than one page ... so lots of otherwise cacheable data would
> suddenly get very slow. :)

There might be several buffers in one page. You'd have to count
DMA users in struct page.
Secondly, are you sure that a physical page won't be accessed
by several different virtual addresses? On some architectures
that would kill us.

Regards
Oliver

2002-06-11 15:57:23

by William Jhun

[permalink] [raw]
Subject: Re: PCI DMA to small buffers on cache-incoherent arch

On Tue, Jun 11, 2002 at 12:43:05AM -0700, David S. Miller wrote:
> From: "David S. Miller" <[email protected]>
> Date: Tue, 11 Jun 2002 00:36:25 -0700 (PDT)
>
> The DMA_ALIGN attribute doesn't work, on some systems the PCI
> cacheline size is determined at boot time not compile time.
>
> Another note, it could be per-PCI controller what this cacheline size
> is. We'll need to pass in a pdev to the alignment interfaces to
> do this correctly.
>
> So none of this can be done at compile time folks.

Why is this a problem? So you just make a static inline that takes the
pdev and does the Right Thing at runtime... It's implementation-
dependent whether it's a compile-time macro or a more elaborate
inline...

2002-06-11 17:54:45

by Oliver Neukum

[permalink] [raw]
Subject: Re: PCI DMA to small buffers on cache-incoherent arch

On Tuesday, 11 June 2002 14:04, David S. Miller wrote:
> From: Oliver Neukum <[email protected]>
> Date: Tue, 11 Jun 2002 14:06:14 +0200
>
> A sparc64 is unlikely to be short on memory, or is it ?
> What's wrong with always aligning on 128 bytes on sparc64 ?
> A runtime check would be expensive.
>
> Maybe on arch FOO, target X needs no alignment when using PCI
> controller Y, but for PCI controller Z it does need alignment.

Still, does that justify the overhead and the complications?
Couldn't we provide for the worst case in a generic kernel
and make it a compile-time option?

If I understand you correctly, we couldn't even use kmalloc()
for allocating the buffers. IMHO you cannot expose that to
driver writers and hope to get a useful result.
So what are the alternatives?
We could either use a bounce buffer or disable caching for the
page in question, which has its own set of problems.

Regards
Oliver


2002-06-11 18:26:19

by Roland Dreier

[permalink] [raw]
Subject: Re: PCI DMA to small buffers on cache-incoherent arch

>>>>> "David" == David S Miller <[email protected]> writes:

David> Maybe on arch FOO, target X needs no alignment when using
David> PCI controller Y, but for PCI controller Z it does need
David> alignment.

I'm not sure I understand this objection. kmalloc() will return a
buffer that is safe for DMA for all controllers. Any compile-time
static alignment of structure members would similarly do worst-case
alignment. Also I have to admit that I don't quite understand the
issue you're raising here. Obviously the CPU's cache line size
doesn't change based on the PCI controller. Are you referring to the PCI
device's CLS register? I don't see how that could matter since the
bus master shouldn't ever write beyond the buffer the CPU gives it.

In any case, given drivers that have:

struct something {
	int field1;
	char dma_buffer[SMALLER_THAN_CACHELINE];
	int field2;
};

struct something *dev = kmalloc(sizeof *dev, GFP_KERNEL);
/* do DMA into dev->dma_buffer */

I know of several ways to make this safe:

1) Add a static alignment macro and change the declaration to

struct something {
	int field1;
	char dma_buffer[SMALLER_THAN_CACHELINE] __dma_buffer;
	int field2;
};

__dma_buffer would be a NOP on cache-coherent architectures (i386,
etc.) but might introduce some alignment that is not strictly necessary
on some architectures (???)

2) Change the code to

struct something {
	int field1;
	char *dma_buffer;
	int field2;
};

struct something *dev = kmalloc(sizeof *dev, GFP_KERNEL);
dev->dma_buffer = kmalloc(SMALLER_THAN_CACHELINE, GFP_KERNEL);
/* do DMA into dev->dma_buffer */

This always uses strictly more memory than 1) above, complicates the
code, may introduce bugs (do we know if anyone did *dev_copy = *dev
anywhere?).

3) Change the code to

struct something {
	int field1;
	char *dma_buffer;
	int field2;
};

struct something *dev = kmalloc(sizeof *dev, GFP_KERNEL);
/* find pci_device */
dev->dma_buffer = aligned_pci_alloc(pci_device, SMALLER_THAN_CACHELINE);
/* do DMA into dev->dma_buffer */

This may use less memory than 1) or 2) above on some architectures but
will use more than 1) on cache-coherent architectures. It makes the
code even more complex since now the code that allocates the dma
buffer has to know which PCI device will use it (for example, in USB,
the hub driver is separated from the HCD driver, which is who knows
about the PCI bus).

4) David Woodhouse's suggestion of turning off caching for pages where
unaligned DMA is in progress. This may be doable but seems quite
complex. Also, drivers will probably want some way of aligning their
buffers to avoid turning off caching, which means that this approach
and the above are complementary.

Best,
Roland

2002-06-11 23:01:13

by Thunder from the hill

[permalink] [raw]
Subject: Re: PCI DMA to small buffers on cache-incoherent arch

Hi,

On 11 Jun 2002, Roland Dreier wrote:
> 3) Change the code to
>
> struct something {
> int field1;
> char *dma_buffer;
> int field2;
> };
>
> struct something *dev = kmalloc(sizeof *dev, GFP_KERNEL);
> /* find pci_device */
> dev->dma_buffer = aligned_pci_alloc(pci_device, SMALLER_THAN_CACHELINE);
> /* do DMA into dev->dma_buffer */

You introduce a possible null pointer dereference here, don't you?

// Assume this fails:
struct something *dev = kmalloc(sizeof(struct something), GFP_KERNEL);
// dev is a NULL pointer now.
dev->dma_buffer = aligned_pci_alloc(pci_device, SMALLER_THAN_CACHELINE);
// Big bang

Just wondering...

Regards,
Thunder
--
German attitude becoming | Thunder from the hill at ngforever
rightaway popular: |
"Get outa my way, | free inhabitant not directly
for I got a mobile phone!" | belonging anywhere

2002-06-11 23:56:21

by Roland Dreier

[permalink] [raw]
Subject: Re: PCI DMA to small buffers on cache-incoherent arch

>>>>> "Thunder" == Thunder from the hill <[email protected]> writes:

Thunder> You introduce a possible null pointer dereference here,
Thunder> don't you?

I left out the error checking for the allocations everywhere in my
email. It wasn't real code and I thought that testing for NULL all
over the place would obscure the real point.

R.

2002-06-12 00:09:36

by Thunder from the hill

[permalink] [raw]
Subject: Re: PCI DMA to small buffers on cache-incoherent arch

Hi,

On 11 Jun 2002, Roland Dreier wrote:
> I left out the error checking for the allocations everywhere in my
> email. It wasn't real code and I thought that testing for NULL all
> over the place would obscure the real point.

Sure thing. It was just my newly introduced paranoia, I have some things
like them to fix.

Regards,
Thunder
--
German attitude becoming | Thunder from the hill at ngforever
rightaway popular: |
"Get outa my way, | free inhabitant not directly
for I got a mobile phone!" | belonging anywhere

2002-06-12 03:30:12

by David Miller

[permalink] [raw]
Subject: Re: PCI DMA to small buffers on cache-incoherent arch

From: David Brownell <[email protected]>
Date: Tue, 11 Jun 2002 08:12:35 -0700

Should the dma mapping APIs try to detect the "DMA buffer starts in
middle of non-coherent cacheline" case, and fail? That might be
worth checking, catching some of these errors, even if it ignores
the corresponding "ends in middle of non-coherent cacheline" case.
And it'd handle that "it's a runtime issue on some HW" concern.

This brings back another issue, returning failure from pci_map_*()
and friends which currently cannot happen.

Or then there's David Woodhouse's option (disable caching on those
pages while the DMA mapping is active) which seems good, except for
the fact that this issue is most common for buffers that are a lot
smaller than one page ... so lots of otherwise cacheable data would
suddenly get very slow. :)

Remember please that specifically the DMA mapping APIs encourage use
of consistent memory for small data objects. It is specifically
because non-consistent DMA accesses to small bits are going to be very
slow (ie. the PCI controller is going to prefetch further cache lines
for no reason, for example). The non-consistent end of the APIs is
meant for long contiguous buffers, not small chunks.

This is one of the reasons I want to fix this by making people use
either consistent memory or PCI pools (which is consistent memory
too).
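
(For concreteness, the sort of usage being advocated here; pci_pool_create()
and pci_pool_alloc() are the existing 2.4 interfaces from DMA-mapping.txt,
while hcd->buffer_pool, hub->buffer_dma and HUB_BUFFER_SIZE are only
illustrative names:)

	/* one pool per host controller, created at HC init time */
	hcd->buffer_pool = pci_pool_create("small_buf", hcd->pdev,
					   HUB_BUFFER_SIZE, 32 /* align */,
					   0, 0);
	if (!hcd->buffer_pool)
		return -ENOMEM;

	/* later, instead of a stack or kmalloc() buffer: consistent
	 * memory, so no cache line is ever shared with unrelated data */
	hub->buffer = pci_pool_alloc(hcd->buffer_pool, SLAB_KERNEL,
				     &hub->buffer_dma);
	if (!hub->buffer)
		return -ENOMEM;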

2002-06-12 06:24:01

by David Brownell

[permalink] [raw]
Subject: Re: PCI DMA to small buffers on cache-incoherent arch

David S. Miller wrote:
> From: David Brownell <[email protected]>
>
> Should the dma mapping APIs try to detect the "DMA buffer starts in
> middle of non-coherent cacheline" case, and fail?
>
> This brings back another issue, returning failure from pci_map_*()
> and friends which currently cannot happen.

An issue that'll get fixed, yes? :)

I'd suspect ((dma_addr_t)0) would be a reasonable error return.
At least some hardware treats such values like software would
treat null pointers. No call syntax change necessary, which
might be good or bad depending on how you feel tomorrow.


> Remember please that specifically the DMA mapping APIs encourage use
> of consistent memory for small data objects. ...
> ... The non-consistent end of the APIs is
> meant for long contiguous buffers, not small chunks.

And in between, a nice huge grey area to play with and argue about!


> This is one of the reasons I want to fix this by making people use
> either consistent memory or PCI pools (which is consistent memory
> too).

For that model, I would prefer tools more like a kmalloc than the
pci_pool, which is most like a kmem_cache_t. The particular objects
in question are a bit small to use page-or-bigger allocators, too.

The problem for APIs like USB is that they haven't yet exposed DMA
addresses. Doing that, giving folk a choice from the "non-consistent
end of the APIs", would be a big change.


Oh no -- I just had an evil thought. Now that we have the device model
code partially in place, shouldn't we have DMA memory calls talk in
terms of "struct device" not "struct pci_device"? That'd be the way
to have _one_ API for dma mapping (and consistent memory allocation),
working for PCI, USB, and any other bus framework that comes along.

Nah ... forget I said that. Too logical, it could never happen! ;)

- Dave






2002-06-12 06:28:47

by David Miller

[permalink] [raw]
Subject: Re: PCI DMA to small buffers on cache-incoherent arch

From: David Brownell <[email protected]>
Date: Tue, 11 Jun 2002 23:25:09 -0700

I'd suspect ((dma_addr_t)0) would be a reasonable error return.
At least some hardware treats such values like software would
treat null pointers. No call syntax change necessary, which
might be good or bad depending on how you feel tomorrow.

0 is a valid PCI dma address on many platforms. This is part
of the problem.

> Remember please that specifically the DMA mapping APIs encourage use
> of consistent memory for small data objects. ...
> ... The non-consistent end of the APIs is
> meant for long contiguous buffers, not small chunks.

And in between, a nice huge grey area to play with and argue about!

Not gray area, fully intentional! From the beginning that was meant
to be one of the distinctions between consistent and streaming DMA
memory.

For that model, I would prefer tools more like a kmalloc than the
pci_pool, which is most like a kmem_cache_t. The particular objects
in question are a bit small to use page-or-bigger allocators, too.

Huh? The whole idea is that it is memory for PCI dma, it has to be
PCI in nature. If you want a kmalloc'ish thing, simply use
pci_alloc_consistent and carve up the pages you get internally.

The problem for APIs like USB is that they haven't yet exposed DMA
addresses. Doing that, giving folk a choice from the "non-consistent
end of the APIs", would be a big change.

But that is the direction I'd like things to go in. A lot of problems
have arisen because the USB layer likes to internally let drivers do
DMA on any gob of memory whatsoever. That has to stop, it really does.

Oh no -- I just had an evil thought. Now that we have the device model
code partially in place, shouldn't we have DMA memory calls talk in
terms of "struct device" not "struct pci_device"? That'd be the way
to have _one_ API for dma mapping (and consistent memory allocation),
working for PCI, USB, and any other bus framework that comes along.

Sure, that's the idea. Just change pci_alloc_consistent to
dev_alloc_consistent whatever. It's all still the same problem
though. The USB drivers have to stop DMA'ing to arbitrary little gobs
of memory.


2002-06-12 07:05:30

by David Brownell

[permalink] [raw]
Subject: Re: PCI DMA to small buffers on cache-incoherent arch

David S. Miller wrote:
> From: David Brownell <[email protected]>
> Date: Tue, 11 Jun 2002 23:25:09 -0700
>
> I'd suspect ((dma_addr_t)0) would be a reasonable error return.
> At least some hardware treats such values like software would
> treat null pointers. No call syntax change necessary, which
> might be good or bad depending on how you feel tomorrow.
>
> 0 is a valid PCI dma address on many platforms. This is part
> of the problem.

OK, so a new call syntax would be desirable. Or reserving page zero
even on such platforms.


> > Remember please that specifically the DMA mapping APIs encourage use
> > of consistent memory for small data objects. ...
> > ... The non-consistent end of the APIs is
> > meant for long contiguous buffers, not small chunks.
>
> And in between, a nice huge grey area to play with and argue about!
>
> Not gray area, fully intentional! From the beginning that was meant
> to be one of the distinctions between consistent and streaming DMA
> memory.

I realize that, but it's still a grey area ... meaning only that the
specific choice, in a set of systems, is one of the design tradeoffs.
Costs can vary a lot, but near either end the tradeoffs get clearer.


> For that model, I would prefer tools more like a kmalloc than the
> pci_pool, which is most like a kmem_cache_t. The particular objects
> in question are a bit small to use page-or-bigger allocators, too.
>
> Huh? The whole idea is that it is memory for PCI dma, it has to be
> PCI in nature. If you want a kmalloc'ish thing, simply use
> pci_alloc_consistent and carve up the pages you get internally.

And the reason that logic doesn't lead us to rip kmalloc() out of
the kernel (in favor of the page allocator) is ... what? Every driver
shouldn't _need_ to write yet another malloc() clone, that's all I'm
saying; that'd be a waste.


> The problem for APIs like USB is that they haven't yet exposed DMA
> addresses. Doing that, giving folk a choice from the "non-consistent
> end of the APIs", would be a big change.
>
> But that is the direction I'd like things to go in. A lot of problems
> have arisen because the USB layer likes to internally let drivers do
> DMA on any gob of memory whatsoever. That has to stop, it really does.

I'll accept that for argument -- though I'll ask what other problems
(than this shared-cacheline scenario) you're thinking of. This is the
only one that I recall seeming like it might argue for an API change.

(And USB is just one example, too. Oliver turned up similar cases
in the SCSI layer, as I recall. Likely a lot of it can be traced
down to widely-available hardware that wouldn't recognize a NUMA if
it tripped on one.)


> Oh no -- I just had an evil thought. Now that we have the device model
> code partially in place, shouldn't we have DMA memory calls talk in
> terms of "struct device" not "struct pci_device"? That'd be the way
> to have _one_ API for dma mapping (and consistent memory allocation),
> working for PCI, USB, and any other bus framework that comes along.
>
> Sure, that's the idea. Just change pci_alloc_consistent to
> dev_alloc_consistent whatever.

Change, and fix up that lack of error codes on the mapping calls, and
maybe a few other things ... :)

Certainly hanging some sort of memory allocator object off struct device
should be pretty cheap. At least for USB, all devices on the same bus
could share the same allocator. Not that I know PCI that well, but I
suspect that could be true there too ... more sharing, less internal
fragmentation, and more efficient use of various resources.

- Dave





2002-06-12 09:28:17

by David Miller

[permalink] [raw]
Subject: Re: PCI DMA to small buffers on cache-incoherent arch

From: David Brownell <[email protected]>
Date: Wed, 12 Jun 2002 00:06:39 -0700

David S. Miller wrote:
> 0 is a valid PCI dma address on many platforms. This is part
> of the problem.

OK, so a new call syntax would be desirable. Or reserving page zero
even on such platforms.

Or, better yet, define a PCI_BAD_ADDR per-arch.

> Huh? The whole idea is that it is memory for PCI dma, it has to be
> PCI in nature. If you want a kmalloc'ish thing, simply use
> pci_alloc_consistent and carve up the pages you get internally.

And the reason that logic doesn't lead us to rip kmalloc() out of
the kernel (in favor of the page allocator) is ... what? Every driver
shouldn't _need_ to write yet another malloc() clone, that's all I'm
saying; that'd be a waste.

That is why we have the PCI pool thing, it's PCI alloc consistent +
SLAB carving.

Certainly hanging some sort of memory allocator object off struct device
should be pretty cheap. At least for USB, all devices on the same bus
could share the same allocator. Not that I know PCI that well, but I
suspect that could be true there too ... more sharing, less internal
fragmentation, and more efficient use of various resources.

These instances we are talking about want to allocate multiple
instances of the same object (thus the same size). They are not
allocating things like variable length strings and whatnot.

That is why I keep suggesting PCI pools, it's the perfect kind of
setup (and the easiest to implement). A generic kmalloc() out of
pages is more to implement, certainly.

If you want to hide this behind a usb_dma_pool_create,
usb_dma_pool_alloc etc. kind of interface that calls back into the
host controller layer to get the pci_dev etc., that's fine.

Now enough talking and someone implement this, talk is cheap.
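
(Roughly what such a wrapper could look like; hcd_to_pci() stands in for
whatever callback the host controller driver would provide to expose its
pci_dev, and is not an existing interface:)

struct pci_pool *usb_dma_pool_create(const char *name,
				     struct usb_device *udev,
				     size_t size, size_t align)
{
	/* the HCD, not the USB device driver, knows about the PCI bus */
	return pci_pool_create(name, hcd_to_pci(udev->bus),
			       size, align, 0, 0);
}

#define usb_dma_pool_alloc(pool, mem_flags, dma) \
		pci_pool_alloc(pool, mem_flags, dma)
#define usb_dma_pool_free(pool, vaddr, dma) \
		pci_pool_free(pool, vaddr, dma)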

2002-06-12 09:39:05

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: PCI DMA to small buffers on cache-incoherent arch

>This may use less memory than 1) or 2) above on some architectures but
>will use more than 1) on cache-coherent architectures. It makes the
>code even more complex since now the code that allocates the dma
>buffer has to know which PCI device will use it (for example, in USB,
>the hub driver is separated from the HCD driver, which is who knows
>about the PCI bus).

What about an arch that can be either coherent or incoherent (the same
kernel binary would boot both)?

I can also imagine quite a bunch of embedded setups where you may have
both coherent and non-coherent devices depending on the bus they live
on or on bridge bugs.

For your example, I don't buy it. You could well design the USB urb
allocation in such a way that they are passed down to the controller of
a given device.

Ben.



2002-06-12 09:43:25

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: PCI DMA to small buffers on cache-incoherent arch

>Remember please that specifically the DMA mapping APIs encourage use
>of consistent memory for small data objects. It is specifically
>because non-consistent DMA accesses to small bits are going to be very
>slow (ie. the PCI controller is going to prefetch further cache lines
>for no reason, for example). The non-consistent end of the APIs is
>meant for long contiguous buffers, not small chunks.
>
>This is one of the reasons I want to fix this by making people use
>either consistent memory or PCI pools (which is consistent memory
>too).

Please don't limit the API design to PCI ;)

There are more and more embedded CPUs out there with their own
bunch of on-chip devices that are neither consistent nor PCI based,
their drivers will have the exact same problem to deal with though.

Ben.

2002-06-12 09:46:43

by David Miller

[permalink] [raw]
Subject: Re: PCI DMA to small buffers on cache-incoherent arch

From: Benjamin Herrenschmidt <[email protected]>
Date: Tue, 11 Jun 2002 19:33:47 +0200

>This is one of the reasons I want to fix this by making people use
>either consistent memory or PCI pools (which is consistent memory
>too).

Please don't limit the API design to PCI ;)

When I say "PCI pools" think "struct device pools" because that
is what it will be in the end.

That is Linus's long range plan, everything you see as pci_*
will become dev_*.

So I'm not really limiting the design in any way.

2002-06-12 11:52:19

by David Miller

[permalink] [raw]
Subject: Re: PCI DMA to small buffers on cache-incoherent arch

From: William Jhun <[email protected]>
Date: Mon, 10 Jun 2002 11:07:40 -0700

On Sun, Jun 09, 2002 at 09:27:05PM -0700, David S. Miller wrote:
> I'm trying to specify this such that knowledge of cachelines and
> whatnot don't escape the arch specific code, ho hum... Looks like
> that isn't possible.

Perhaps provide macros in asm/pci.h that will:

You don't understand, I think. I want to avoid the drivers doing
any of the "align this, align that" stuff. I want the allocation
to do it for them, that way the code is in one place.

2002-06-12 12:03:20

by Oliver Neukum

[permalink] [raw]
Subject: Re: PCI DMA to small buffers on cache-incoherent arch


> For your example, I don't buy it. You could well design the USB urb
> allocation in such a way that they are passed down the controller of
> a given device.

Urbs are not the problem. An urb, abstractly speaking, is just a description
of IO. It does not contain a buffer, just a pointer to it.

However, many drivers allocate some of these buffers together with their
device descriptors, which would, if a special allocator must be used,
become impossible.
Usbcore could allocate bounce buffers, but performance would suck.

If I understand both Davids correctly, this is the solution:
buffers for DMA must be allocated separately using a special allocation
function which is given the device so it can allocate correctly.
David B wants a bus-specific pointer to a function in the generic
driver structure, right?

Regards
Oliver

2002-06-12 12:06:48

by David Miller

[permalink] [raw]
Subject: Re: PCI DMA to small buffers on cache-incoherent arch

From: Oliver Neukum <[email protected]>
Date: Wed, 12 Jun 2002 14:02:53 +0200

If I understand both Davids correctly, this is the solution:
buffers for DMA must be allocated separately using a special allocation
function which is given the device so it can allocate correctly.
David B wants a bus-specific pointer to a function in the generic
driver structure, right?

Eventually it could be exactly that, via the generic driver struct.

For now it can be a usb specific thing, and when the generic device
stuff is ready we just go:

#define usb_pool_alloc(...) dev_pool_alloc(...)

without having to touch any of the driver call sites.

2002-06-12 12:09:05

by Oliver Neukum

[permalink] [raw]
Subject: Re: PCI DMA to small buffers on cache-incoherent arch

On Wednesday, 12 June 2002 13:47, David S. Miller wrote:
> From: William Jhun <[email protected]>
> Date: Mon, 10 Jun 2002 11:07:40 -0700
>
> On Sun, Jun 09, 2002 at 09:27:05PM -0700, David S. Miller wrote:
> > I'm trying to specify this such that knowledge of cachelines and
> > whatnot don't escape the arch specific code, ho hum... Looks like
> > that isn't possible.
>
> Perhaps provide macros in asm/pci.h that will:
>
> You don't understand, I think. I want to avoid the drivers doing
> any of the "align this, align that" stuff. I want the allocation
> to do it for them, that way the code is in one place.

That means that all buffers must be allocated separately.
Is it worth that? How about skbuffs?

Regards
Oliver

2002-06-12 12:15:39

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: PCI DMA to small buffers on cache-incoherent arch

>If I understand both Davids correctly, this is the solution:
>buffers for DMA must be allocated separately using a special allocation
>function which is given the device so it can allocate correctly.
>David B wants a bus-specific pointer to a function in the generic
>driver structure, right?

Then let's have those as part of the generic device struct, with
the default ones pointing to the parent bus ones.

That way, a couple of generic ones could be set at the root of the
device tree for fully coherent or fully incoherent archs, and
bus drivers would have the ability to override them for their child
devices.

Ben.


2002-06-12 14:13:07

by David Brownell

[permalink] [raw]
Subject: Re: PCI DMA to small buffers on cache-incoherent arch

> Please don't limit the API design to PCI ;)
>
> When I say "PCI pools" think "struct device pools" because that
> is what it will be in the end.
>
> That is Linus's long range plan, everything you see as pci_*
> will become dev_*.

I think the guts of it could happen in the 2.5 series soonish
though, so it'd only be "long range" in that quaint Wall Street
sense of "finishes sometime after next quarter". :)

Just to put some flesh on the discussion ... here's the rough
shape of what I was thinking, as more wood for the fire:

- Associated with a "struct device" would be some kind of
memory allocator/mapper object that would expose every (?)
primitive in DMA-mapping.txt, but not PCI-specific. That
would basically be a vtable, using the device for state.

	struct dev_dma_whatsit {
		/* page-level "consistent" allocators
		 * should they continue to be nonblocking?
		 */
		void *(*alloc_consistent)(struct device *dev,
				size_t *size, dma_addr_t *dma);
		void (*free_consistent)(struct device *dev,
				size_t size, void *addr, dma_addr_t dma);

		/* "streaming mapping" calls
		 * return BAD_DMA_ADDR on error
		 */
		dma_addr_t (*map_single)(struct device *dev,
				void *addr, size_t size, int dir);
		void (*unmap_single)(struct device *dev,
				dma_addr_t dma, size_t size, int dir);
		void (*sync_single)(struct device *dev,
				dma_addr_t dma, size_t size, int dir);

		/* Also: dma masks, page and scatterlist
		 * map/unmap/sync, dma64_addr_t, ...
		 */
	};

Those would be designed so they could be shared between
devices, invisibly to code talking to a struct device.

Example: all USB devices connected to a given bus would
normally delegate to the PCI device underlying that bus.
(Except for non-PCI host controllers, of course!)

And the PCI versions of those methods have rather obvious
implementations ... :)

- There'd be dev_*() routines that in many cases just call
directly to those methods, or fail cleanly if the whatsit
wasn't set up ("used the default allocator"), or omitted
key methods. Many could be inlinable functions:

	dev_alloc_consistent(...)
	dev_free_consistent(...)
	dev_map_single(...)
	... etc

In some cases they'd mostly be layered over those primitives:

	/* kmalloc analogue */
	void *dev_kmalloc(struct device *dev,
			size_t size, int mem_flags,
			dma_addr_t *dma);
	void dev_kfree(struct device *dev, void *mem,
			dma_addr_t dma);

	/* pci_pool/kmem_cache_t analogues */
	struct dev_pool *dev_pool_create(const char *name,
			struct device *dev, size_t size,
			size_t allocation, int mem_flags);
	void dev_pool_destroy(struct dev_pool *pool);

	void *dev_pool_alloc(struct dev_pool *pool,
			int mem_flags, dma_addr_t *dma);
	void dev_pool_free(struct dev_pool *pool,
			void *mem, dma_addr_t dma);

- Some of the other driver APIs would want to change to
leverage these primitives. They might want to provide
wrappers to make the dev_*() calls given the subsystem
device API.

Again using USB as an example, one _possible_ change would be
to make urb->transfer_buffer be a dma_addr_t ... where
today it's a void*. (Discussions of such details belong
on the linux-usb-devel list, for the most part.) That'd
be a pretty major change to the USB API, since it'd force
device drivers to manage DMA mappings. While usb-storage
might benefit from sglist support, other drivers might
not see much of a win.

OK, that's enough trouble for this morning... ;)

- Dave



2002-06-12 15:00:55

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: PCI DMA to small buffers on cache-incoherent arch

> Those would be designed so they could be shared between
> devices, invisibly to code talking to a struct device.
>
> Example: all USB devices connected to a given bus would
> normally delegate to the PCI device underlying that bus.
> (Except for non-PCI host controllers, of course!)
>
> And the PCI versions of those methods have rather obvious
> implementations ... :)

Yup. The simplest way I see here is to have a device automatically
inherit, by default, its parent device's ops. The arch will provide
default "root" allocators, so a fully coherent or fully non-coherent
arch may not have to do anything but these.

Now, if a given bus in the arch needs specific quirks, it's up to
the bus device node to put its own versions in place to be used by its
children.
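
(A sketch of that inheritance rule, reusing the hypothetical per-device ops
structure from the earlier message; dma_ops, the parent walk and
arch_root_dma_ops are all made-up names:)

static inline struct dev_dma_whatsit *dev_dma_ops(struct device *dev)
{
	/* walk up the device tree until some ancestor installed its own
	 * ops; otherwise fall back to the arch-provided "root" allocator */
	while (dev) {
		if (dev->dma_ops)
			return dev->dma_ops;
		dev = dev->parent;
	}
	return &arch_root_dma_ops;
}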

Ben.


2002-06-12 16:09:18

by William Jhun

[permalink] [raw]
Subject: Re: PCI DMA to small buffers on cache-incoherent arch

On Wed, Jun 12, 2002 at 04:47:59AM -0700, David S. Miller wrote:
> From: William Jhun <[email protected]>
> Date: Mon, 10 Jun 2002 11:07:40 -0700
>
> On Sun, Jun 09, 2002 at 09:27:05PM -0700, David S. Miller wrote:
> > I'm trying to specify this such that knowledge of cachelines and
> > whatnot don't escape the arch specific code, ho hum... Looks like
> > that isn't possible.
>
> Perhaps provide macros in asm/pci.h that will:
>
> You don't understand, I think. I want to avoid the drivers doing
> any of the "align this, align that" stuff. I want the allocation
> to do it for them, that way the code is in one place.

No, I do understand your point. And this does not bring "knowledge of
cachelines and whatnot" into the driver; those "macros" could similarly
be calls to arch-specific code that acts based on a pdev. I was simply
trying to think of a compromise between that and massively changing the
interface by which a driver obtains buffers. And I assume alloc_skb()
and others would need to change otherwise. How would you specify if your
skb data needs to be PCI DMA-able? What about net drivers not using DMA
at all?

Thanks,
Will

2002-06-12 18:29:21

by Pavel Machek

[permalink] [raw]
Subject: Re: PCI DMA to small buffers on cache-incoherent arch

Hi!

> The DMA_ALIGN attribute doesn't work, on some systems the PCI
> cacheline size is determined at boot time not compile time.
>
> Another note, it could be per-PCI controller what this cacheline size
> is. We'll need to pass in a pdev to the alignment interfaces to
> do this correctly.
>
> So none of this can be done at compile time folks.

But the upper bound is certainly known at compile time, right?
Pavel
--
(about SSSCA) "I don't say this lightly. However, I really think that the U.S.
no longer is classifiable as a democracy, but rather as a plutocracy." --hpa

2002-06-12 18:44:49

by Roland Dreier

[permalink] [raw]
Subject: Re: PCI DMA to small buffers on cache-incoherent arch

>>>>> "David" == David Brownell <[email protected]> writes:

David> Again using USB for an, one _possible_ change would be to
David> make urb->transfer_buffer be a dma_addr_t ... where today
David> it's a void*. (Discussions of such details belong on the
David> linux-usb-devel list, for the most part.) That'd be a
David> pretty major change to the USB API, since it'd force device
David> drivers to manage DMA mappings. While usb-storage might
David> benefit from sglist support, other drivers might not see
David> much of a win.

Let's go back to the beginning of this discussion since I think we're
losing sight of the original problem. In general I certainly support
the idea of making the DMA mapping stuff device generic instead of
tied to PCI. For example, this would really clean up the handling of
on-chip peripherals for some of the PowerPC 4xx CPUs I work with (in
fact on the linuxppc-embedded list there has been discussion about the
abuse of constants like PCI_DMA_FROMDEVICE in contexts that have
nothing to do with PCI).

However, this discussion started when I fixed some problems I was
having with USB on my IBM PowerPC 440GP system (which is not cache
coherent). I posted a patch to linux-usb-devel that got rid of the
use of unaligned buffers for DMA. Most of the changes got rid of DMA
into variables on the stack, which everyone agreed were clear bugs. I
also changed some code that did DMA into the middle of a struct in
kmalloc()'ed memory. As a concrete example, in drivers/usb/hub.h I
made the change:

struct usb_hub {
struct usb_device *dev;

struct urb *urb; /* Interrupt polling pipe */

- char buffer[(USB_MAXCHILDREN + 1 + 7) / 8]; /* add 1 bit for hub status change */
- /* and add 7 bits to round up to byte boundary */
+ char *buffer;
int error;
int nerrors;

and then after the kmalloc of struct usb_hub *hub, I added a kmalloc
of hub->buffer.

There were two objections to this change. First, there were questions
about whether the change was really needed. I think we have all
agreed that DMA into buffer is not necessarily safe. Second, people
felt that splitting the allocation of hub in two introduced needless
complication and that it would be better to align buffer within the
structure. To this end I proposed a __dma_buffer macro; cache
coherent architectures would just do

#define __dma_buffer

while non-cache-coherent architectures would define

#define __dma_buffer __dma_buffer_line(__LINE__)
#define __dma_buffer_line(line) __dma_buffer_expand_line(line)
#define __dma_buffer_expand_line(line) \
__attribute__ ((aligned(L1_CACHE_BYTES))); \
char __dma_pad_ ## line [0] __attribute__ ((aligned(L1_CACHE_BYTES)))

Then the USB hub fix would just amount to changing the declaration of
buffer to

char buffer[(USB_MAXCHILDREN + 1 + 7) / 8] __dma_buffer; /* add 1 bit for hub status change */

and no other code would change. This seems to me to be pretty close
to the minimal change to make things correct. You would get zero
bloat on cache coherent architectures, and only one line of code needs
to be touched. We can debate about whether this puts too much
knowledge about DMA into the driver; I would argue that
device-specific allocators are worse.
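
(To make the effect concrete: with the non-cache-coherent definition above
and 32-byte cache lines, the hub declaration would expand to roughly the
following, where 123 stands in for whatever __LINE__ happens to be:)

	struct usb_hub {
		struct usb_device *dev;
		struct urb *urb;

		/* buffer starts on a cache line boundary ... */
		char buffer[(USB_MAXCHILDREN + 1 + 7) / 8]
			__attribute__ ((aligned(32)));
		/* ... and the zero-length pad forces the next member onto
		 * a fresh line, so nothing else shares buffer's lines */
		char __dma_pad_123[0] __attribute__ ((aligned(32)));

		int error;
		int nerrors;
		/* ... */
	};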

The other objection to __dma_buffer seems to be that alignment
requirements can't be calculated at compile time; however this problem
seems to exist for kmalloc() as well (since slab caches have to be
initialized well before every bus is discovered, not to mention the
fact that we might have new alignment requirements coming from
hot-plugged devices). It seems to me that both kmalloc() and
__dma_buffer both need to align for the worst possible case, and no
one objects to kmalloc(). (Well, you might say that device drivers
should use device specific allocators, but skbuffs and other generic
buffers don't know what device they'll end up at so they need to come
from a generic allocator)

The current USB driver design seems pretty reasonable: only the HCD
drivers need to know about DMA mappings, and other USB drivers just
pass buffer addresses. I don't think you would get much support for
forcing every driver to handle its own DMA mapping.

I would like to see both dev_map_xxx etc. and something like
__dma_buffer go into the kernel. I think they both have their uses.

Best,
Roland

2002-06-12 19:11:56

by David Brownell

[permalink] [raw]
Subject: Re: PCI DMA to small buffers on cache-incoherent arch

> Let's go back to the beginning of this discussion since I think we're
> losing sight of the original problem.

I think actually a couple were unearthed, but you're right to focus.


> In general I certainly support
> the idea of making the DMA mapping stuff device generic instead of
> tied to PCI.

PCI was a good place to start (focus ... :) but clearly it shouldn't
be the only bus architecture with such support. Note that there are
actually two related approaches in the DMA-mapping.txt APIs ... one is
DMA mapping, the other is "consistent memory". Both should be made
generic rather than PCI-specific, not just mapping APIs.


> However, this discussion started when I fixed some problems I was
> having with USB on my IBM PowerPC 440GP system (which is not cache
> coherent).

... which basically led to discussion of options I summarized a while
back as (a) expose the issues to drivers via macros, or (b) expose
them through some other API.

And DaveM wanted to focus on (b) options that involve exposing
consistent memory to drivers as buffers (sized in multiples of
cachelines, where that matters) rather than DMA-mapping them.

Based on the discussion, I think the answer for now is to go with
the (b) variant you had originally started with, using kmalloc for
the buffers. The __dma_buffer style macro didn't seem popular;
though I agree that it's not clear kmalloc() really solves it
today. (Given DaveM's SPARC example, the minimum size value it
returns would need to be 128 bytes ... which clearly isn't so.)


> The current USB driver design seems pretty reasonable: only the HCD
> drivers need to know about DMA mappings, and other USB drivers just
> pass buffer addresses. I don't think you would get much support for
> forcing every driver to handle its own DMA mapping.

Me either. But I suspect that it'd be good to have that as an option;
maybe just add a transfer_dma field to the URB, and have the HCD use
that, instead of creating a mapping, when transfer_buffer is null.
That'd certainly be a better approach for supporting sglist in the
usb-storage code than the alternatives I've heard so far.


> I would like to see both dev_map_xxx etc. and something like
> __dma_buffer go into the kernel. I think they both have their uses.

Got Patch? Actually, I know you do, I shouldn't ask. :)

- Dave




2002-06-12 20:21:24

by David Miller

[permalink] [raw]
Subject: Re: PCI DMA to small buffers on cache-incoherent arch

From: Pavel Machek <[email protected]>
Date: Wed, 12 Jun 2002 11:06:39 +0200

But the upper bound is certainly known at compile time, right?

That's true.

2002-06-12 20:23:02

by Oliver Neukum

[permalink] [raw]
Subject: Re: PCI DMA to small buffers on cache-incoherent arch


> Based on the discussion, I think the answer for now is to go with
> the (b) variant you had originally started with, using kmalloc for
> the buffers. The __dma_buffer style macro didn't seem popular;
> though I agree that it's not clear kmalloc() really solves it
> today. (Given DaveM's SPARC example, the minimum size value it
> returns would need to be 128 bytes ... which clearly isn't so.)

Perhaps I might point out that we need a solution for 2.4.
We certainly could go for kmalloc. But given the sparc64 example,
it won't work. If you do it with alignment in your own buffer, you can
control which fields are accessed.
We'd need to export a worst-case minimum size for kmalloc.
Ugly, very ugly. Tons of kmalloc(size > MIN_DMA ? size : MIN_DMA, ...

But it seems that this would mean quite a painful change in many drivers,
which the alignment macros wouldn't. In fact in the common case
there'd be no code change at all.
So if we take stability as a guideline, the alignment macros win without
question.

[..]
> That'd certainly be a better approach for supporting sglist in the
> usb-storage code than the alternatives I've heard so far.

Could you elaborate ? This sounds interesting.

> > I would like to see both dev_map_xxx etc. and something like
> > __dma_buffer go into the kernel. I think they both have their uses.
>
> Got Patch? Actually, I know you do, I shouldn't ask. :)

So perhaps something could be done in time for 2.4.20 ?

Regards
Oliver

2002-06-12 22:50:57

by David Miller

[permalink] [raw]
Subject: Re: PCI DMA to small buffers on cache-incoherent arch

From: David Brownell <[email protected]>
Date: Wed, 12 Jun 2002 12:13:08 -0700

> The current USB driver design seems pretty reasonable: only the HCD
> drivers need to know about DMA mappings, and other USB drivers just
> pass buffer addresses. I don't think you would get much support for
> forcing every driver to handle its own DMA mapping.

Me either. But I suspect that it'd be good to have that as an option;
maybe just add a transfer_dma field to the URB, and have the HCD use
that, instead of creating a mapping, when transfer_buffer is null.
That'd certainly be a better approach for supporting sglist in the
usb-storage code than the alternatives I've heard so far.

Another solution for the "small chunks" issue is to in fact keep all
of the real DMA buffers in the host controller driver, i.e. a pool
of tiny consistent DMA memory bounce buffers. That is an idea for
discussion, I have no particular preference.
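
(Illustrative only: the kind of bounce scheme being described, with a
pci_pool of small consistent buffers owned by the HCD; hcd->small_pool,
urb->bounce and urb->bounce_dma are placeholder names:)

	/* submit path in the HCD, for a short OUT transfer */
	urb->bounce = pci_pool_alloc(hcd->small_pool, SLAB_ATOMIC,
				     &urb->bounce_dma);
	if (!urb->bounce)
		return -ENOMEM;
	memcpy(urb->bounce, urb->transfer_buffer,
	       urb->transfer_buffer_length);
	/* ... the hardware is pointed at urb->bounce_dma ... */

	/* completion path, for an IN transfer: copy back and release */
	memcpy(urb->transfer_buffer, urb->bounce, urb->actual_length);
	pci_pool_free(hcd->small_pool, urb->bounce, urb->bounce_dma);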

This could actually improve performance for things like the input
layer USB drivers. On several systems multiple cache lines are
fetched when a keyboard key is typed, and this is because we use
streaming buffers instead of consistent ones for these transfers.

We have two problems we want to solve, the DMA alignment stuff and
using consistent memory for these small buffers. Therefore moving to
consistent memory (by whatever mechanism the USB desires to implement
this) is the way to go.

2002-06-12 22:56:10

by David Miller

[permalink] [raw]
Subject: Re: PCI DMA to small buffers on cache-incoherent arch

From: David Brownell <[email protected]>
Date: Wed, 12 Jun 2002 12:13:08 -0700

> In general I certainly support
> the idea of making the DMA mapping stuff device generic instead of
> tied to PCI.

PCI was a good place to start (focus ... :) but clearly it shouldn't
be the only bus architecture with such support. Note that there are
actually two related approaches in the DMA-mapping.txt APIs ... one is
DMA mapping, the other is "consistent memory". Both should be made
generic rather than PCI-specific, not just mapping APIs.

I tried to imply this, if I didn't it is my error.

The intention is that every interface in DMA-mapping.txt will have
a dev_* counterpart when we move away from pci_dev to the generic
device struct.

One idea I want people to avoid is that we end up trying to do DMA
operations from a USB generic device struct. This will lead to
multiple levels of indirection to do the PCI dma operation (USB device
--> USB bus --> PCI bus --> PCI bus DMA ops) and that will be ugly as
hell.

Based on the discussion, I think the answer for now is to go with
the (b) variant you had originally started with, using kmalloc for
the buffers. The __dma_buffer style macro didn't seem popular;
though I agree that it's not clear kmalloc() really solves it
today. (Given DaveM's SPARC example, the minimum size value it
returns would need to be 128 bytes ... which clearly isn't so.)

Let's ignore the sparc64 issue for now. It's really a red herring
because as Pavel mentioned I could just use the largest size.

In actuality the "sparc64 issue" was partially a straw-man. Sparc64
has none of the DMA alignment issues as the PCI controller provides
coherency of partial cacheline transfers, we just need to flush
pending writes and prefetched reads out of the PCI controllers around
non-consistent DMA transfers.

Franks a lot,
David S. Miller
[email protected]

2002-06-12 22:57:54

by David Miller

[permalink] [raw]
Subject: Re: PCI DMA to small buffers on cache-incoherent arch

From: Oliver Neukum <[email protected]>
Date: Wed, 12 Jun 2002 21:58:55 +0200

Perhaps I might point out that we need a solution for 2.4.

For 2.4.x just do the DMA alignment macro idea. That would
be the easiest change to verify.

2002-06-12 23:18:04

by Oliver Neukum

[permalink] [raw]
Subject: Re: PCI DMA to small buffers on cache-incoherent arch

On Thursday, 13 June 2002 00:51, David S. Miller wrote:
> From: Oliver Neukum <[email protected]>
> Date: Wed, 12 Jun 2002 21:58:55 +0200
>
> Perhaps I might point out that we need a solution for 2.4.
>
> For 2.4.x just do the DMA alignment macro idea. That would
> be the easiest change to verify.

OK, there we have something to work on.
Apart from the drivers I've already mailed, struct scsi_cmnd
is affected. kmalloc might be a problem.

Could we agree on a name for the macro ?

Regards
Oliver


2002-06-13 00:23:29

by Roland Dreier

[permalink] [raw]
Subject: [PATCH] 2.4 add __dma_buffer alignment macro

Following up on the "PCI DMA to small buffers on cache-incoherent
arch" discussion, here is a patch that adds <asm/dma_buffer.h>, which
contains a __dma_buffer alignment macro for DMA into buffers in the
middle of structures.

I only implemented for i386 and ppc, I don't know enough about other
architectures to get them right.

Thanks,
Roland

Documentation/DMA-mapping.txt | 27 +++++++++++++++++++++++++++
include/asm-alpha/dma_buffer.h | 22 ++++++++++++++++++++++
include/asm-arm/dma_buffer.h | 22 ++++++++++++++++++++++
include/asm-cris/dma_buffer.h | 22 ++++++++++++++++++++++
include/asm-i386/dma_buffer.h | 23 +++++++++++++++++++++++
include/asm-ia64/dma_buffer.h | 22 ++++++++++++++++++++++
include/asm-m68k/dma_buffer.h | 22 ++++++++++++++++++++++
include/asm-mips/dma_buffer.h | 22 ++++++++++++++++++++++
include/asm-mips64/dma_buffer.h | 22 ++++++++++++++++++++++
include/asm-parisc/dma_buffer.h | 22 ++++++++++++++++++++++
include/asm-ppc/dma_buffer.h | 38 ++++++++++++++++++++++++++++++++++++++
include/asm-s390/dma_buffer.h | 22 ++++++++++++++++++++++
include/asm-s390x/dma_buffer.h | 22 ++++++++++++++++++++++
include/asm-sh/dma_buffer.h | 22 ++++++++++++++++++++++
include/asm-sparc/dma_buffer.h | 22 ++++++++++++++++++++++
include/asm-sparc64/dma_buffer.h | 22 ++++++++++++++++++++++
16 files changed, 374 insertions(+)

diff -Naur linux-2.4.18.orig/Documentation/DMA-mapping.txt linux-2.4.18/Documentation/DMA-mapping.txt
--- linux-2.4.18.orig/Documentation/DMA-mapping.txt Mon Feb 25 11:37:51 2002
+++ linux-2.4.18/Documentation/DMA-mapping.txt Wed Jun 12 13:38:30 2002
@@ -67,6 +67,33 @@
networking subsystems make sure that the buffers they use are valid
for you to DMA from/to.

+Note that on non-cache-coherent architectures, having a DMA buffer
+that shares a cache line with other data can lead to memory
+corruption. The __dma_buffer macro defined in <asm/dma_buffer.h>
+exists to allow safe DMA buffers to be declared easily and portably as
+part of larger structures without causing bloat on cache-coherent
+architectures. Of course these structures must be contained in memory
+that can be used for DMA as described above.
+
+To use __dma_buffer, just declare a struct like:
+
+ struct mydevice {
+ int field1;
+ char buffer[BUFFER_SIZE] __dma_buffer;
+ int field2;
+ };
+
+If this is used in code like:
+
+ struct mydevice *dev;
+ dev = kmalloc(sizeof *dev, GFP_KERNEL);
+
+then dev->buffer will be safe for DMA on all architectures. On a
+cache-coherent architecture the members of dev will be aligned exactly
+as they would have been without __dma_buffer; on a non-cache-coherent
+architecture buffer and field2 will be aligned so that buffer does not
+share a cache line with any other data.
+
DMA addressing limitations

Does your device have any DMA addressing limitations? For example, is
diff -Naur linux-2.4.18.orig/include/asm-alpha/dma_buffer.h linux-2.4.18/include/asm-alpha/dma_buffer.h
--- linux-2.4.18.orig/include/asm-alpha/dma_buffer.h Wed Dec 31 16:00:00 1969
+++ linux-2.4.18/include/asm-alpha/dma_buffer.h Wed Jun 12 13:38:30 2002
@@ -0,0 +1,22 @@
+/*
+ * __dma_buffer macro for safe DMA into struct members
+ *
+ * See "What memory is DMA'able?" in Documentation/DMA-mapping.txt
+ * for more information.
+ *
+ * Copyright (c) 2002 Roland Dreier <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifdef __KERNEL__
+#ifndef _ASM_ALPHA_DMA_BUFFER_H
+#define _ASM_ALPHA_DMA_BUFFER_H
+
+#warning <asm/dma_buffer.h> is not implemented yet for alpha
+
+#endif /* _ASM_ALPHA_DMA_BUFFER_H */
+#endif /* __KERNEL__ */
diff -Naur linux-2.4.18.orig/include/asm-arm/dma_buffer.h linux-2.4.18/include/asm-arm/dma_buffer.h
--- linux-2.4.18.orig/include/asm-arm/dma_buffer.h Wed Dec 31 16:00:00 1969
+++ linux-2.4.18/include/asm-arm/dma_buffer.h Wed Jun 12 13:45:45 2002
@@ -0,0 +1,22 @@
+/*
+ * __dma_buffer macro for safe DMA into struct members
+ *
+ * See "What memory is DMA'able?" in Documentation/DMA-mapping.txt
+ * for more information.
+ *
+ * Copyright (c) 2002 Roland Dreier <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifdef __KERNEL__
+#ifndef _ASM_ARM_DMA_BUFFER_H
+#define _ASM_ARM_DMA_BUFFER_H
+
+#warning <asm/dma_buffer.h> is not implemented yet for arm
+
+#endif /* _ASM_ARM_DMA_BUFFER_H */
+#endif /* __KERNEL__ */
diff -Naur linux-2.4.18.orig/include/asm-cris/dma_buffer.h linux-2.4.18/include/asm-cris/dma_buffer.h
--- linux-2.4.18.orig/include/asm-cris/dma_buffer.h Wed Dec 31 16:00:00 1969
+++ linux-2.4.18/include/asm-cris/dma_buffer.h Wed Jun 12 13:38:30 2002
@@ -0,0 +1,22 @@
+/*
+ * __dma_buffer macro for safe DMA into struct members
+ *
+ * See "What memory is DMA'able?" in Documentation/DMA-mapping.txt
+ * for more information.
+ *
+ * Copyright (c) 2002 Roland Dreier <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifdef __KERNEL__
+#ifndef _ASM_CRIS_DMA_BUFFER_H
+#define _ASM_CRIS_DMA_BUFFER_H
+
+#warning <asm/dma_buffer.h> is not implemented yet for cris
+
+#endif /* _ASM_CRIS_DMA_BUFFER_H */
+#endif /* __KERNEL__ */
diff -Naur linux-2.4.18.orig/include/asm-i386/dma_buffer.h linux-2.4.18/include/asm-i386/dma_buffer.h
--- linux-2.4.18.orig/include/asm-i386/dma_buffer.h Wed Dec 31 16:00:00 1969
+++ linux-2.4.18/include/asm-i386/dma_buffer.h Wed Jun 12 13:38:30 2002
@@ -0,0 +1,23 @@
+/*
+ * __dma_buffer macro for safe DMA into struct members
+ *
+ * See "What memory is DMA'able?" in Documentation/DMA-mapping.txt
+ * for more information.
+ *
+ * Copyright (c) 2002 Roland Dreier <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifdef __KERNEL__
+#ifndef _ASM_I386_DMA_BUFFER_H
+#define _ASM_I386_DMA_BUFFER_H
+
+/* We have cache-coherent DMA so __dma_buffer is defined to be a NOP */
+#define __dma_buffer
+
+#endif /* _ASM_I386_DMA_BUFFER_H */
+#endif /* __KERNEL__ */
diff -Naur linux-2.4.18.orig/include/asm-ia64/dma_buffer.h linux-2.4.18/include/asm-ia64/dma_buffer.h
--- linux-2.4.18.orig/include/asm-ia64/dma_buffer.h Wed Dec 31 16:00:00 1969
+++ linux-2.4.18/include/asm-ia64/dma_buffer.h Wed Jun 12 13:38:30 2002
@@ -0,0 +1,22 @@
+/*
+ * __dma_buffer macro for safe DMA into struct members
+ *
+ * See "What memory is DMA'able?" in Documentation/DMA-mapping.txt
+ * for more information.
+ *
+ * Copyright (c) 2002 Roland Dreier <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifdef __KERNEL__
+#ifndef _ASM_IA64_DMA_BUFFER_H
+#define _ASM_IA64_DMA_BUFFER_H
+
+#warning <asm/dma_buffer.h> is not implemented yet for ia64
+
+#endif /* _ASM_IA64_DMA_BUFFER_H */
+#endif /* __KERNEL__ */
diff -Naur linux-2.4.18.orig/include/asm-m68k/dma_buffer.h linux-2.4.18/include/asm-m68k/dma_buffer.h
--- linux-2.4.18.orig/include/asm-m68k/dma_buffer.h Wed Dec 31 16:00:00 1969
+++ linux-2.4.18/include/asm-m68k/dma_buffer.h Wed Jun 12 13:38:30 2002
@@ -0,0 +1,22 @@
+/*
+ * __dma_buffer macro for safe DMA into struct members
+ *
+ * See "What memory is DMA'able?" in Documentation/DMA-mapping.txt
+ * for more information.
+ *
+ * Copyright (c) 2002 Roland Dreier <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifdef __KERNEL__
+#ifndef _ASM_M68K_DMA_BUFFER_H
+#define _ASM_M68K_DMA_BUFFER_H
+
+#warning <asm/dma_buffer.h> is not implemented yet for m68k
+
+#endif /* _ASM_M68K_DMA_BUFFER_H */
+#endif /* __KERNEL__ */
diff -Naur linux-2.4.18.orig/include/asm-mips/dma_buffer.h linux-2.4.18/include/asm-mips/dma_buffer.h
--- linux-2.4.18.orig/include/asm-mips/dma_buffer.h Wed Dec 31 16:00:00 1969
+++ linux-2.4.18/include/asm-mips/dma_buffer.h Wed Jun 12 13:38:30 2002
@@ -0,0 +1,22 @@
+/*
+ * __dma_buffer macro for safe DMA into struct members
+ *
+ * See "What memory is DMA'able?" in Documentation/DMA-mapping.txt
+ * for more information.
+ *
+ * Copyright (c) 2002 Roland Dreier <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifdef __KERNEL__
+#ifndef _ASM_MIPS_DMA_BUFFER_H
+#define _ASM_MIPS_DMA_BUFFER_H
+
+#warning <asm/dma_buffer.h> is not implemented yet for mips
+
+#endif /* _ASM_MIPS_DMA_BUFFER_H */
+#endif /* __KERNEL__ */
diff -Naur linux-2.4.18.orig/include/asm-mips64/dma_buffer.h linux-2.4.18/include/asm-mips64/dma_buffer.h
--- linux-2.4.18.orig/include/asm-mips64/dma_buffer.h Wed Dec 31 16:00:00 1969
+++ linux-2.4.18/include/asm-mips64/dma_buffer.h Wed Jun 12 13:38:30 2002
@@ -0,0 +1,22 @@
+/*
+ * __dma_buffer macro for safe DMA into struct members
+ *
+ * See "What memory is DMA'able?" in Documentation/DMA-mapping.txt
+ * for more information.
+ *
+ * Copyright (c) 2002 Roland Dreier <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifdef __KERNEL__
+#ifndef _ASM_MIPS64_DMA_BUFFER_H
+#define _ASM_MIPS64_DMA_BUFFER_H
+
+#warning <asm/dma_buffer.h> is not implemented yet for mips64
+
+#endif /* _ASM_MIPS64_DMA_BUFFER_H */
+#endif /* __KERNEL__ */
diff -Naur linux-2.4.18.orig/include/asm-parisc/dma_buffer.h linux-2.4.18/include/asm-parisc/dma_buffer.h
--- linux-2.4.18.orig/include/asm-parisc/dma_buffer.h Wed Dec 31 16:00:00 1969
+++ linux-2.4.18/include/asm-parisc/dma_buffer.h Wed Jun 12 13:38:30 2002
@@ -0,0 +1,22 @@
+/*
+ * __dma_buffer macro for safe DMA into struct members
+ *
+ * See "What memory is DMA'able?" in Documentation/DMA-mapping.txt
+ * for more information.
+ *
+ * Copyright (c) 2002 Roland Dreier <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifdef __KERNEL__
+#ifndef _ASM_PARISC_DMA_BUFFER_H
+#define _ASM_PARISC_DMA_BUFFER_H
+
+#warning <asm/dma_buffer.h> is not implemented yet for parisc
+
+#endif /* _ASM_PARISC_DMA_BUFFER_H */
+#endif /* __KERNEL__ */
diff -Naur linux-2.4.18.orig/include/asm-ppc/dma_buffer.h linux-2.4.18/include/asm-ppc/dma_buffer.h
--- linux-2.4.18.orig/include/asm-ppc/dma_buffer.h Wed Dec 31 16:00:00 1969
+++ linux-2.4.18/include/asm-ppc/dma_buffer.h Wed Jun 12 13:38:30 2002
@@ -0,0 +1,38 @@
+/*
+ * __dma_buffer macro for safe DMA into struct members
+ *
+ * See "What memory is DMA'able?" in Documentation/DMA-mapping.txt
+ * for more information.
+ *
+ * Copyright (c) 2002 Roland Dreier <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version. */
+
+#ifdef __KERNEL__
+#ifndef _ASM_PPC_DMA_BUFFER_H
+#define _ASM_PPC_DMA_BUFFER_H
+
+#include <linux/config.h>
+
+#ifdef CONFIG_NOT_COHERENT_CACHE
+
+#include <asm/cache.h>
+
+#define __dma_buffer __dma_buffer_line(__LINE__)
+#define __dma_buffer_line(line) __dma_buffer_expand_line(line)
+#define __dma_buffer_expand_line(line) \
+ __attribute__ ((aligned(L1_CACHE_BYTES))); \
+ char __dma_pad_ ## line [0] __attribute__ ((aligned(L1_CACHE_BYTES)))
+
+#else /* CONFIG_NOT_COHERENT_CACHE */
+
+/* We have cache-coherent DMA so __dma_buffer is defined to be a NOP */
+#define __dma_buffer
+
+#endif /* CONFIG_NOT_COHERENT_CACHE */
+
+#endif /* _ASM_PPC_DMA_BUFFER_H */
+#endif /* __KERNEL__ */
diff -Naur linux-2.4.18.orig/include/asm-s390/dma_buffer.h linux-2.4.18/include/asm-s390/dma_buffer.h
--- linux-2.4.18.orig/include/asm-s390/dma_buffer.h Wed Dec 31 16:00:00 1969
+++ linux-2.4.18/include/asm-s390/dma_buffer.h Wed Jun 12 13:38:30 2002
@@ -0,0 +1,22 @@
+/*
+ * __dma_buffer macro for safe DMA into struct members
+ *
+ * See "What memory is DMA'able?" in Documentation/DMA-mapping.txt
+ * for more information.
+ *
+ * Copyright (c) 2002 Roland Dreier <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifdef __KERNEL__
+#ifndef _ASM_S390_DMA_BUFFER_H
+#define _ASM_S390_DMA_BUFFER_H
+
+#warning <asm/dma_buffer.h> is not implemented yet for s390
+
+#endif /* _ASM_S390_DMA_BUFFER_H */
+#endif /* __KERNEL__ */
diff -Naur linux-2.4.18.orig/include/asm-s390x/dma_buffer.h linux-2.4.18/include/asm-s390x/dma_buffer.h
--- linux-2.4.18.orig/include/asm-s390x/dma_buffer.h Wed Dec 31 16:00:00 1969
+++ linux-2.4.18/include/asm-s390x/dma_buffer.h Wed Jun 12 13:38:30 2002
@@ -0,0 +1,22 @@
+/*
+ * __dma_buffer macro for safe DMA into struct members
+ *
+ * See "What memory is DMA'able?" in Documentation/DMA-mapping.txt
+ * for more information.
+ *
+ * Copyright (c) 2002 Roland Dreier <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifdef __KERNEL__
+#ifndef _ASM_S390X_DMA_BUFFER_H
+#define _ASM_S390X_DMA_BUFFER_H
+
+#warning <asm/dma_buffer.h> is not implemented yet for s390x
+
+#endif /* _ASM_S390X_DMA_BUFFER_H */
+#endif /* __KERNEL__ */
diff -Naur linux-2.4.18.orig/include/asm-sh/dma_buffer.h linux-2.4.18/include/asm-sh/dma_buffer.h
--- linux-2.4.18.orig/include/asm-sh/dma_buffer.h Wed Dec 31 16:00:00 1969
+++ linux-2.4.18/include/asm-sh/dma_buffer.h Wed Jun 12 13:38:30 2002
@@ -0,0 +1,22 @@
+/*
+ * __dma_buffer macro for safe DMA into struct members
+ *
+ * See "What memory is DMA'able?" in Documentation/DMA-mapping.txt
+ * for more information.
+ *
+ * Copyright (c) 2002 Roland Dreier <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifdef __KERNEL__
+#ifndef _ASM_SH_DMA_BUFFER_H
+#define _ASM_SH_DMA_BUFFER_H
+
+#warning <asm/dma_buffer.h> is not implemented yet for sh
+
+#endif /* _ASM_SH_DMA_BUFFER_H */
+#endif /* __KERNEL__ */
diff -Naur linux-2.4.18.orig/include/asm-sparc/dma_buffer.h linux-2.4.18/include/asm-sparc/dma_buffer.h
--- linux-2.4.18.orig/include/asm-sparc/dma_buffer.h Wed Dec 31 16:00:00 1969
+++ linux-2.4.18/include/asm-sparc/dma_buffer.h Wed Jun 12 13:38:30 2002
@@ -0,0 +1,22 @@
+/*
+ * __dma_buffer macro for safe DMA into struct members
+ *
+ * See "What memory is DMA'able?" in Documentation/DMA-mapping.txt
+ * for more information.
+ *
+ * Copyright (c) 2002 Roland Dreier <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifdef __KERNEL__
+#ifndef _ASM_SPARC_DMA_BUFFER_H
+#define _ASM_SPARC_DMA_BUFFER_H
+
+#warning <asm/dma_buffer.h> is not implemented yet for sparc
+
+#endif /* _ASM_SPARC_DMA_BUFFER_H */
+#endif /* __KERNEL__ */
diff -Naur linux-2.4.18.orig/include/asm-sparc64/dma_buffer.h linux-2.4.18/include/asm-sparc64/dma_buffer.h
--- linux-2.4.18.orig/include/asm-sparc64/dma_buffer.h Wed Dec 31 16:00:00 1969
+++ linux-2.4.18/include/asm-sparc64/dma_buffer.h Wed Jun 12 13:38:30 2002
@@ -0,0 +1,22 @@
+/*
+ * __dma_buffer macro for safe DMA into struct members
+ *
+ * See "What memory is DMA'able?" in Documentation/DMA-mapping.txt
+ * for more information.
+ *
+ * Copyright (c) 2002 Roland Dreier <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifdef __KERNEL__
+#ifndef _ASM_SPARC64_DMA_BUFFER_H
+#define _ASM_SPARC64_DMA_BUFFER_H
+
+#warning <asm/dma_buffer.h> is not implemented yet for sparc64
+
+#endif /* _ASM_SPARC64_DMA_BUFFER_H */
+#endif /* __KERNEL__ */
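
To illustrate how a driver might combine __dma_buffer with the usual 2.4
streaming mapping calls, here is a small sketch; the struct, function and
buffer-size names are made up for the example and are not part of the patch:

    /* Hypothetical driver fragment (illustrative only). */
    #include <linux/pci.h>
    #include <linux/slab.h>
    #include <asm/dma_buffer.h>

    #define MYDEV_BUF_SIZE 8        /* smaller than a cache line on many CPUs */

    struct mydevice {
            int  field1;
            char buffer[MYDEV_BUF_SIZE] __dma_buffer;
            int  field2;
    };

    /* dev itself must come from kmalloc() or other DMA'able memory. */
    static void mydevice_read_status(struct pci_dev *pdev, struct mydevice *dev)
    {
            dma_addr_t handle;

            /* buffer is cache-line aligned and padded on non-coherent
             * machines, so invalidating its cache lines cannot corrupt
             * field1 or field2. */
            handle = pci_map_single(pdev, dev->buffer, MYDEV_BUF_SIZE,
                                    PCI_DMA_FROMDEVICE);

            /* ... start the device's DMA and wait for it to complete ... */

            pci_unmap_single(pdev, handle, MYDEV_BUF_SIZE, PCI_DMA_FROMDEVICE);
    }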

2002-06-13 01:11:53

by David Miller

Subject: Re: [PATCH] 2.4 add __dma_buffer alignment macro


Just use asm/dma.h, no need to make a new file.

2002-06-13 01:19:17

by Roland Dreier

Subject: Re: [PATCH] 2.4 add __dma_buffer alignment macro

>>>>> "David" == David S Miller <[email protected]> writes:

David> Just use asm/dma.h, no need to make a new file.

I could do that; however, my thinking was that if I added it to an
existing file then it would add to the existing nested include bloat.
For example <asm/dma.h> for i386 has 10 inline functions and includes
<linux/config.h>, <linux/spinlock.h>, <asm/io.h> and <linux/delay.h>.
Seems kind of excessive just to get one macro that ends up being the
empty string, and I would prefer not to force people to include all
that just to declare a structure.

Best,
Roland

2002-06-13 04:56:37

by David Brownell

Subject: Re: PCI DMA to small buffers on cache-incoherent arch

>>That'd certainly be a better approach for supporting sglist in the
>>usb-storage code than the alternatives I've heard so far.
>
> Could you elaborate ? This sounds interesting.

It's been discussed on linux-usb-devel more than once.

It's really a secondary or tertiary consideration. The top
performance issue for usb-storage in USB 2.0 is that it doesn't
yet use bulk queueing for its requests ... which means that it
wastes at least half the bandwidth available to it (staying in
the low tens of MByte/sec vs twenties or even higher). [*]

The sglist issue is a "how to create fewer DMA mappings" issue.
There are specific sglist DMA mapping calls, but they can't
be used without adding to the usbcore API ... either sglist,
usable only by usb-storage AFAICT, or something more general
like dma_addr_t.

In any case it's an optimization issue that doesn't seem like it
should be at the front of anyone's list for some time.

- Dave

[*] Yes, patches have been available to fix it. Not updated
after the block I/O changes in 2.5, but not hard to do so.

Worth updating and integrating after the HCD stuff settles
down so we can make sure that all three major HCDs do the
bulk queuing correctly in both success and failure paths.
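
(For reference, a sketch of the generic sglist mapping calls mentioned
above; pci_map_sg()/pci_unmap_sg() and the sg_dma_address()/sg_dma_len()
accessors are the standard 2.4 interface, while the request-handling
function and its error handling are invented for the example.)

    #include <linux/pci.h>
    #include <linux/kernel.h>
    #include <linux/errno.h>
    #include <asm/scatterlist.h>

    static int map_request(struct pci_dev *pdev, struct scatterlist *sg,
                           int nents)
    {
            int i, count;

            /* The returned count may be smaller than nents if the
             * implementation merged consecutive entries. */
            count = pci_map_sg(pdev, sg, nents, PCI_DMA_FROMDEVICE);
            if (count == 0)
                    return -ENOMEM;

            for (i = 0; i < count; i++)
                    printk(KERN_DEBUG "seg %d: bus 0x%lx len %u\n", i,
                           (unsigned long) sg_dma_address(&sg[i]),
                           sg_dma_len(&sg[i]));

            /* ... program the controller, wait for completion ... */

            pci_unmap_sg(pdev, sg, nents, PCI_DMA_FROMDEVICE);
            return 0;
    }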

2002-06-13 09:37:56

by David Brownell

Subject: Re: PCI DMA to small buffers on cache-incoherent arch

David S. Miller wrote:
> From: David Brownell <[email protected]>
> Date: Wed, 12 Jun 2002 12:13:08 -0700
>
> > The current USB driver design seems pretty reasonable: only the HCD
> > drivers need to know about DMA mappings, and other USB drivers just
> > pass buffer addresses. I don't think you would get much support for
> > forcing every driver to handle its own DMA mapping.
>
> Me either. But I suspect that it'd be good to have that as an option;
> maybe just add a transfer_dma field to the URB, and have the HCD use
> that, instead of creating a mapping, when transfer_buffer is null.
> That'd certainly be a better approach for supporting sglist in the
> usb-storage code than the alternatives I've heard so far.
>
> Another solution for the "small chunks" issue is to in fact keep all
> of the real DMA buffers in the host controller driver. Ie. a pool
> of tiny consistent DMA memory bounce buffers. That is an idea for
> discussion, I have no particular preference.

Interesting idea. I'd rather have the "hcd framework" layer
manage such stuff than have each of N host controller drivers
trying to do the same thing though ... less likely to turn
into bug heaven.


> This could actually improve performance for things like the input
> layer USB drivers. On several systems multiple cache lines are
> fetched when a keyboard key is typed, and this is because we use
> streaming buffers instead of consistent ones for these transfers.

And that's exactly the problem scenario, since it uses the "automagic
resubmit" model with interrupt transfers. Each of the HCDs needs to
handle that differently -- different behaviors, different bugs, yeech.
I won't go into it now, but a queued transfer model would work a lot
better.


> We have two problems we want to solve, the DMA alignment stuff and
> using consistent memory for these small buffers. Therefore moving to
> consistent memory (by whatever mechanism the USB desires to implement
> this) is the way to go.

Right, the alignment stuff is a correctness issue, the consistent
memory issue is a performance concern. I like to think that 2.5
will have a lot fewer correctness issues in the USB stack, so it
can start to pay more attention to performance concerns.

- Dave




2002-06-13 09:43:29

by David Miller

Subject: Re: PCI DMA to small buffers on cache-incoherent arch

From: David Brownell <[email protected]>
Date: Wed, 12 Jun 2002 22:13:17 -0700

David S. Miller wrote:
> We have two problems we want to solve, the DMA alignment stuff and
> using consistent memory for these small buffers. Therefore moving to
> consistent memory (by whatever mechanism the USB desires to implement
> this) is the way to go.

Right, the alignment stuff is a correctness issue, the consistent
memory issue is a performance concern. I like to think that 2.5
will have a lot fewer correctness issues in the USB stack, so it
can start to pay more attention to performance concerns.

I want to reemphasize that by going to consistent memory we solve
both problems, in particular the alignment stuff becomes a non-issue.
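
(A minimal sketch of the consistent-memory approach being advocated here,
using the standard 2.4 pci_alloc_consistent()/pci_free_consistent() calls;
the struct and function names are made up. Because consistent memory needs
no cache flushing or invalidation, the buffer's alignment relative to
neighbouring data stops mattering.)

    #include <linux/pci.h>
    #include <linux/errno.h>

    #define SMALL_BUF_SIZE 8

    struct small_buf {
            void       *cpu;   /* address the driver reads and writes */
            dma_addr_t  dma;   /* address the device DMAs to/from */
    };

    static int small_buf_alloc(struct pci_dev *pdev, struct small_buf *buf)
    {
            buf->cpu = pci_alloc_consistent(pdev, SMALL_BUF_SIZE, &buf->dma);
            if (!buf->cpu)
                    return -ENOMEM;
            return 0;
    }

    static void small_buf_free(struct pci_dev *pdev, struct small_buf *buf)
    {
            pci_free_consistent(pdev, SMALL_BUF_SIZE, buf->cpu, buf->dma);
    }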

2002-06-14 04:19:17

by David Miller

Subject: Re: PCI DMA to small buffers on cache-incoherent arch

From: Oliver Neukum <[email protected]>
Date: Tue, 11 Jun 2002 16:23:30 +0200

Still, does that justify the overhead and the complications?
Couldn't we provide for the worst case in a generic kernel
and make it a compile-time option?

I already posted that using the worst-case scenario would be an
acceptable solution to this problem; I consider it a closed issue
and you can ignore my earlier gripes about it.

2002-06-14 04:46:16

by David Miller

Subject: Re: PCI DMA to small buffers on cache-incoherent arch

From: Oliver Neukum <[email protected]>
Date: Wed, 12 Jun 2002 14:08:26 +0200

That means that all buffers must be allocated separately.
Is it worth that? How about skbuffs?

skbuffs are fine, they are already minimally aligned to
SMP_CACHE_BYTES and also have the same minimum length.

The only problematic case is a driver like eepro100.c, which tries to
use part of the skbuff data area for the receive descriptor.
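
(To illustrate the distinction, here is a rough sketch of how a receive
path can keep its descriptors out of the skbuff data area: descriptors
live in consistent memory, and only the packet data is mapped as a
streaming buffer. The descriptor layout and all names are invented for
the example and are not taken from eepro100.c.)

    #include <linux/pci.h>
    #include <linux/skbuff.h>
    #include <linux/errno.h>
    #include <linux/types.h>
    #include <asm/byteorder.h>

    struct rx_desc {                /* device-visible descriptor */
            u32 status;
            u32 buf_addr;
    };

    struct rx_ring {
            struct rx_desc *desc;   /* consistent memory, no cache tricks needed */
            dma_addr_t      desc_dma;
    };

    static int rx_ring_init(struct pci_dev *pdev, struct rx_ring *ring, int n)
    {
            ring->desc = pci_alloc_consistent(pdev, n * sizeof(struct rx_desc),
                                              &ring->desc_dma);
            return ring->desc ? 0 : -ENOMEM;
    }

    static void rx_refill(struct pci_dev *pdev, struct rx_ring *ring, int i,
                          struct sk_buff *skb)
    {
            /* Only the packet data is a streaming mapping; the descriptor
             * never shares a cache line with it. */
            dma_addr_t data = pci_map_single(pdev, skb->data,
                                             skb_tailroom(skb),
                                             PCI_DMA_FROMDEVICE);

            ring->desc[i].buf_addr = cpu_to_le32(data);
            ring->desc[i].status   = 0;
    }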