2009-06-28 14:39:32

by Ming Lei

[permalink] [raw]
Subject: [PATCH][RFC] asm-generic:remove calling flush_write_buffers() in dma_sync_*_for_cpu

From: Ming Lei <[email protected]>

dma_sync_*_for_cpu() is introduced to make cpu access dma buffers safely when
dma transfer is over, it seems there is nothing to do with cpu write buffer,
so remove it.

Signed-off-by: Ming Lei <[email protected]>
---
include/asm-generic/dma-mapping-common.h | 3 ---
1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/include/asm-generic/dma-mapping-common.h b/include/asm-generic/dma-mapping-common.h
index 5406a60..73411ab 100644
--- a/include/asm-generic/dma-mapping-common.h
+++ b/include/asm-generic/dma-mapping-common.h
@@ -103,7 +103,6 @@ static inline void dma_sync_single_for_cpu(struct device *dev, dma_addr_t addr,
if (ops->sync_single_for_cpu)
ops->sync_single_for_cpu(dev, addr, size, dir);
debug_dma_sync_single_for_cpu(dev, addr, size, dir);
- flush_write_buffers();
}

static inline void dma_sync_single_for_device(struct device *dev,
@@ -132,7 +131,6 @@ static inline void dma_sync_single_range_for_cpu(struct device *dev,
ops->sync_single_range_for_cpu(dev, addr, offset, size, dir);
debug_dma_sync_single_range_for_cpu(dev, addr, offset, size, dir);

- flush_write_buffers();
} else
dma_sync_single_for_cpu(dev, addr, size, dir);
}
@@ -165,7 +163,6 @@ dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
if (ops->sync_sg_for_cpu)
ops->sync_sg_for_cpu(dev, sg, nelems, dir);
debug_dma_sync_sg_for_cpu(dev, sg, nelems, dir);
- flush_write_buffers();
}

static inline void
--
1.6.0.GIT


2009-06-28 18:15:27

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH][RFC] asm-generic:remove calling flush_write_buffers() in dma_sync_*_for_cpu

On Sunday 28 June 2009 14:39:19 [email protected] wrote:
> From: Ming Lei <[email protected]>
>
> dma_sync_*_for_cpu() is introduced to make cpu access dma buffers safely when
> dma transfer is over, it seems there is nothing to do with cpu write buffer,
> so remove it.
>
> Signed-off-by: Ming Lei <[email protected]>

Right, this looks correct. On a related note, flush_write_buffers is
architecture specific right now: only x86 and frv implement it at all,
though and with slightly different semantics.

Maybe it would be more consistent to change the dma_map_* and
dma_sync_*_for_device stuff there to wmb() to make it portable
to other architectures.

Arnd <><

2009-06-29 12:32:16

by Joerg Roedel

[permalink] [raw]
Subject: Re: [PATCH][RFC] asm-generic:remove calling flush_write_buffers() in dma_sync_*_for_cpu

On Sun, Jun 28, 2009 at 03:34:35PM +0000, Arnd Bergmann wrote:
> On Sunday 28 June 2009 14:39:19 [email protected] wrote:
> > From: Ming Lei <[email protected]>
> >
> > dma_sync_*_for_cpu() is introduced to make cpu access dma buffers safely when
> > dma transfer is over, it seems there is nothing to do with cpu write buffer,
> > so remove it.
> >
> > Signed-off-by: Ming Lei <[email protected]>
>
> Right, this looks correct. On a related note, flush_write_buffers is
> architecture specific right now: only x86 and frv implement it at all,
> though and with slightly different semantics.

This doen't look correct to me. The sync functions may do bounce buffering
which is all about copying data from one place in main memory to another. So we
need these flush_write_buffer() calls in the _for_cpu path too.

> Maybe it would be more consistent to change the dma_map_* and
> dma_sync_*_for_device stuff there to wmb() to make it portable
> to other architectures.

If we change it to wmb() it would be executed every time there even if the
processor doesn't require it. Other architectures could simply add a
flush_write_buffers() implemention if they want to adapt the common dma-mapping
implementation, no?

Joerg

--
| Advanced Micro Devices GmbH
Operating | Karl-Hammerschmidt-Str. 34, 85609 Dornach bei M?nchen
System |
Research | Gesch?ftsf?hrer: Thomas M. McCoy, Giuliano Meroni
Center | Sitz: Dornach, Gemeinde Aschheim, Landkreis M?nchen
| Registergericht M?nchen, HRB Nr. 43632

2009-06-29 13:58:25

by Ming Lei

[permalink] [raw]
Subject: Re: [PATCH][RFC] asm-generic:remove calling flush_write_buffers() in dma_sync_*_for_cpu

2009/6/29 Joerg Roedel <[email protected]>:
> On Sun, Jun 28, 2009 at 03:34:35PM +0000, Arnd Bergmann wrote:
>> On Sunday 28 June 2009 14:39:19 [email protected] wrote:
>> > From: Ming Lei <[email protected]>
>> >
>> > dma_sync_*_for_cpu() is introduced to make cpu access dma buffers safely when
>> > dma transfer is over, it seems there is nothing to do with cpu write buffer,
>> > so remove it.
>> >
>> > Signed-off-by: Ming Lei <[email protected]>
>>
>> Right, this looks correct. On a related note, flush_write_buffers is
>> architecture specific right now: only x86 and frv implement it at all,
>> though and with slightly different semantics.
>
> This doen't look correct to me. The sync functions may do bounce buffering
> which is all about copying data from one place in main memory to another. So we
> need these flush_write_buffer() calls in the _for_cpu path too.

IMHO, even we do not call flush_write_buffer(), CPU can read correct
data from the
dma buffer since write buffer can't affect cache, right?

Thanks.

>
>> Maybe it would be more consistent to change the dma_map_* and
>> dma_sync_*_for_device stuff there to wmb() to make it ?portable
>> to other architectures.
>
> If we change it to wmb() it would be executed every time there even if the
> processor doesn't require it. Other architectures could simply add a
> flush_write_buffers() implemention if they want to adapt the common dma-mapping
> implementation, no?
>
> ? ? ? ?Joerg
>
> --
> ? ? ? ? ? | Advanced Micro Devices GmbH
> ?Operating | Karl-Hammerschmidt-Str. 34, 85609 Dornach bei M?nchen
> ?System ? ?|
> ?Research ?| Gesch?ftsf?hrer: Thomas M. McCoy, Giuliano Meroni
> ?Center ? ?| Sitz: Dornach, Gemeinde Aschheim, Landkreis M?nchen
> ? ? ? ? ? | Registergericht M?nchen, HRB Nr. 43632
>
>



--
Lei Ming

2009-06-29 14:46:02

by Joerg Roedel

[permalink] [raw]
Subject: Re: [PATCH][RFC] asm-generic:remove calling flush_write_buffers() in dma_sync_*_for_cpu

On Mon, Jun 29, 2009 at 09:51:34PM +0800, Ming Lei wrote:
> 2009/6/29 Joerg Roedel <[email protected]>:
> > On Sun, Jun 28, 2009 at 03:34:35PM +0000, Arnd Bergmann wrote:
> >> On Sunday 28 June 2009 14:39:19 [email protected] wrote:
> >> > From: Ming Lei <[email protected]>
> >> >
> >> > dma_sync_*_for_cpu() is introduced to make cpu access dma buffers safely when
> >> > dma transfer is over, it seems there is nothing to do with cpu write buffer,
> >> > so remove it.
> >> >
> >> > Signed-off-by: Ming Lei <[email protected]>
> >>
> >> Right, this looks correct. On a related note, flush_write_buffers is
> >> architecture specific right now: only x86 and frv implement it at all,
> >> though and with slightly different semantics.
> >
> > This doen't look correct to me. The sync functions may do bounce buffering
> > which is all about copying data from one place in main memory to another. So we
> > need these flush_write_buffer() calls in the _for_cpu path too.
>
> IMHO, even we do not call flush_write_buffer(), CPU can read correct
> data from the
> dma buffer since write buffer can't affect cache, right?

flush_write_buffer is not about cache flushing. It is about read/write
reordering in the CPU. Think of it as a memory barrier. On most x86
systems this function is therefore a nop. Cache flushing for
architectures without cache-coherent DMA is typically handled in their
low-level dma-api code (I have seen that at least in parisc32).

Joerg

--
| Advanced Micro Devices GmbH
Operating | Karl-Hammerschmidt-Str. 34, 85609 Dornach bei M?nchen
System |
Research | Gesch?ftsf?hrer: Thomas M. McCoy, Giuliano Meroni
Center | Sitz: Dornach, Gemeinde Aschheim, Landkreis M?nchen
| Registergericht M?nchen, HRB Nr. 43632

2009-06-29 14:55:12

by Ming Lei

[permalink] [raw]
Subject: Re: [PATCH][RFC] asm-generic:remove calling flush_write_buffers() in dma_sync_*_for_cpu

2009/6/29 Joerg Roedel <[email protected]>:
>>
>> IMHO, even we do not call flush_write_buffer(), CPU can read correct
>> data from the
>> dma buffer since write buffer can't affect cache, right?
>
> flush_write_buffer is not about cache flushing. It is about read/write
> reordering in the CPU. Think of it as a memory barrier. On most x86

You mean we may need a memory barrier between writing data from bouncing buffer
to dma buffer and reading data from dma buffer, do you?

> systems this function is therefore a nop. Cache flushing for
> architectures without cache-coherent DMA is typically handled in their
> low-level dma-api code (I have seen that at least in parisc32).



--
Lei Ming

2009-06-29 15:45:02

by Joerg Roedel

[permalink] [raw]
Subject: Re: [PATCH][RFC] asm-generic:remove calling flush_write_buffers() in dma_sync_*_for_cpu

On Mon, Jun 29, 2009 at 10:54:51PM +0800, Ming Lei wrote:
> 2009/6/29 Joerg Roedel <[email protected]>:
> >>
> >> IMHO, even we do not call flush_write_buffer(), CPU can read correct
> >> data from the
> >> dma buffer since write buffer can't affect cache, right?
> >
> > flush_write_buffer is not about cache flushing. It is about read/write
> > reordering in the CPU. Think of it as a memory barrier. On most x86
>
> You mean we may need a memory barrier between writing data from bouncing buffer
> to dma buffer and reading data from dma buffer, do you?

CONFIG_X86_OOSTORE (when set flush_write_buffers is a memory barrier and not a
nop on x86) is defined for WINCHIP. So it seems necessary on some chips.

Joerg

--
| Advanced Micro Devices GmbH
Operating | Karl-Hammerschmidt-Str. 34, 85609 Dornach bei M?nchen
System |
Research | Gesch?ftsf?hrer: Thomas M. McCoy, Giuliano Meroni
Center | Sitz: Dornach, Gemeinde Aschheim, Landkreis M?nchen
| Registergericht M?nchen, HRB Nr. 43632

2009-06-29 16:22:49

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH][RFC] asm-generic:remove calling flush_write_buffers() in dma_sync_*_for_cpu

On Monday 29 June 2009, Joerg Roedel wrote:
> On Sun, Jun 28, 2009 at 03:34:35PM +0000, Arnd Bergmann wrote:
> > On Sunday 28 June 2009 14:39:19 [email protected] wrote:
> > > From: Ming Lei <[email protected]>
> > >
> > > dma_sync_*_for_cpu() is introduced to make cpu access dma buffers safely when
> > > dma transfer is over, it seems there is nothing to do with cpu write buffer,
> > > so remove it.
> > >
> > > Signed-off-by: Ming Lei <[email protected]>
> >
> > Right, this looks correct. On a related note, flush_write_buffers is
> > architecture specific right now: only x86 and frv implement it at all,
> > though and with slightly different semantics.
>
> This doen't look correct to me. The sync functions may do bounce buffering
> which is all about copying data from one place in main memory to another. So we
> need these flush_write_buffer() calls in the _for_cpu path too.

Right, I didn't consider that.

Wouldn't it be better to put the flush_write_buffer in the specific
operation (swiotlb_sync_*_for_*) rather than the multiplexer?

Maybe in that case, smp_wmb() would be more appropriate because
it is defined on all architectures.

> > Maybe it would be more consistent to change the dma_map_* and
> > dma_sync_*_for_device stuff there to wmb() to make it portable
> > to other architectures.
>
> If we change it to wmb() it would be executed every time there even if the
> processor doesn't require it. Other architectures could simply add a
> flush_write_buffers() implemention if they want to adapt the common dma-mapping
> implementation, no?

As mentioned, the definition of flush_write_buffers() seems a little dodgy,
I would feel much more comfortable with putting it into the architecture
specific code or using one of the existing common barriers, since we already
have so many of them.

Arnd <><

2009-06-29 16:30:56

by Alan

[permalink] [raw]
Subject: Re: [PATCH][RFC] asm-generic:remove calling flush_write_buffers() in dma_sync_*_for_cpu

> Wouldn't it be better to put the flush_write_buffer in the specific
> operation (swiotlb_sync_*_for_*) rather than the multiplexer?
>
> Maybe in that case, smp_wmb() would be more appropriate because
> it is defined on all architectures.

smp_wmb() is stronger and it would slow down x86 if we did that (we'd go
from no-op on a coherent platform to using mfence/lfence etc)

2009-06-29 16:45:59

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH][RFC] asm-generic:remove calling flush_write_buffers() in dma_sync_*_for_cpu

On Monday 29 June 2009, Alan Cox wrote:
> > Wouldn't it be better to put the flush_write_buffer in the specific
> > operation (swiotlb_sync_*_for_*) rather than the multiplexer?
> >
> > Maybe in that case, smp_wmb() would be more appropriate because
> > it is defined on all architectures.
>
> smp_wmb() is stronger and it would slow down x86 if we did that (we'd go
> from no-op on a coherent platform to using mfence/lfence etc)
>
Really? In my copy of system.h, I read

#ifdef CONFIG_SMP
# ifdef CONFIG_X86_OOSTORE
# define smp_wmb() wmb()
# else
# define smp_wmb() barrier()
# endif
#else
# define smp_wmb() barrier()
#endif

That actually looks weaker than flush_write_buffer, as it would turn into
a barrier() in case of !SMP or !X86_OOSTORE, and into an sfence instead of
lock addl on all modern CPUs in case of SMP && X86_OOSTORE.

Of course that raises the question of whether smp_wmb() is too weak in case of
!SMP or X86_PPRO_FENCE, but with the described scenario, I don't think
it does.

Arnd <><

2009-06-29 17:14:51

by Alan

[permalink] [raw]
Subject: Re: [PATCH][RFC] asm-generic:remove calling flush_write_buffers() in dma_sync_*_for_cpu

> Of course that raises the question of whether smp_wmb() is too weak in case of
> !SMP or X86_PPRO_FENCE, but with the described scenario, I don't think
> it does.

I'm pretty sure it is for the Winchip -but we don't care as there are no
SMP ones. OTOH with DMA we do care

2009-06-29 18:47:56

by Joerg Roedel

[permalink] [raw]
Subject: Re: [PATCH][RFC] asm-generic:remove calling flush_write_buffers() in dma_sync_*_for_cpu

On Mon, Jun 29, 2009 at 06:45:15PM +0200, Arnd Bergmann wrote:
> On Monday 29 June 2009, Alan Cox wrote:
> > > Wouldn't it be better to put the flush_write_buffer in the specific
> > > operation (swiotlb_sync_*_for_*) rather than the multiplexer?
> > >
> > > Maybe in that case, smp_wmb() would be more appropriate because
> > > it is defined on all architectures.
> >
> > smp_wmb() is stronger and it would slow down x86 if we did that (we'd go
> > from no-op on a coherent platform to using mfence/lfence etc)
> >
> Really? In my copy of system.h, I read
>
> #ifdef CONFIG_SMP
> # ifdef CONFIG_X86_OOSTORE
> # define smp_wmb() wmb()
> # else
> # define smp_wmb() barrier()
> # endif
> #else
> # define smp_wmb() barrier()
> #endif

With that definition an smp_wmb() would do the right job on x86. If thats also
true for other architectures using this generic header we can remove
flush_write_buffer().

>
> That actually looks weaker than flush_write_buffer, as it would turn into
> a barrier() in case of !SMP or !X86_OOSTORE, and into an sfence instead of
> lock addl on all modern CPUs in case of SMP && X86_OOSTORE.

X86_OOSTORE is defined for (MWINCHIP3D || MWINCHIPC6) && MTRR. I am not sure if
these chips have sfence.
It would also help to know what OOSTORE exactly means in the context of that
chip.

Joerg

--
| Advanced Micro Devices GmbH
Operating | Karl-Hammerschmidt-Str. 34, 85609 Dornach bei M?nchen
System |
Research | Gesch?ftsf?hrer: Thomas M. McCoy, Giuliano Meroni
Center | Sitz: Dornach, Gemeinde Aschheim, Landkreis M?nchen
| Registergericht M?nchen, HRB Nr. 43632

2009-06-29 18:48:35

by Joerg Roedel

[permalink] [raw]
Subject: Re: [PATCH][RFC] asm-generic:remove calling flush_write_buffers() in dma_sync_*_for_cpu

On Mon, Jun 29, 2009 at 06:22:17PM +0200, Arnd Bergmann wrote:
> As mentioned, the definition of flush_write_buffers() seems a little dodgy,
> I would feel much more comfortable with putting it into the architecture
> specific code or using one of the existing common barriers, since we already
> have so many of them.

Yeah true. We should remove this function if we see that it is not required.

Joerg

--
| Advanced Micro Devices GmbH
Operating | Karl-Hammerschmidt-Str. 34, 85609 Dornach bei M?nchen
System |
Research | Gesch?ftsf?hrer: Thomas M. McCoy, Giuliano Meroni
Center | Sitz: Dornach, Gemeinde Aschheim, Landkreis M?nchen
| Registergericht M?nchen, HRB Nr. 43632

2009-06-29 19:09:48

by Alan

[permalink] [raw]
Subject: Re: [PATCH][RFC] asm-generic:remove calling flush_write_buffers() in dma_sync_*_for_cpu

> X86_OOSTORE is defined for (MWINCHIP3D || MWINCHIPC6) && MTRR. I am not sure if
> these chips have sfence.
> It would also help to know what OOSTORE exactly means in the context of that
> chip.

Non uncached stores may be visible out of order on the bus.

Alan

2009-06-29 19:26:39

by Joerg Roedel

[permalink] [raw]
Subject: Re: [PATCH][RFC] asm-generic:remove calling flush_write_buffers() in dma_sync_*_for_cpu

On Mon, Jun 29, 2009 at 08:10:26PM +0100, Alan Cox wrote:
> > X86_OOSTORE is defined for (MWINCHIP3D || MWINCHIPC6) && MTRR. I am not sure if
> > these chips have sfence.
> > It would also help to know what OOSTORE exactly means in the context of that
> > chip.
>
> Non uncached stores may be visible out of order on the bus.

Ah ok, thanks. I think we should document that somewhere. A help text for the
Kconfig entry comes to my mind for that purpose. I will send a patch for this.

Joerg

--
| Advanced Micro Devices GmbH
Operating | Karl-Hammerschmidt-Str. 34, 85609 Dornach bei M?nchen
System |
Research | Gesch?ftsf?hrer: Thomas M. McCoy, Giuliano Meroni
Center | Sitz: Dornach, Gemeinde Aschheim, Landkreis M?nchen
| Registergericht M?nchen, HRB Nr. 43632

2009-06-30 12:34:40

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH][RFC] asm-generic:remove calling flush_write_buffers() in dma_sync_*_for_cpu

On Monday 29 June 2009, Alan Cox wrote:
> > Of course that raises the question of whether smp_wmb() is too weak in case of
> > !SMP or X86_PPRO_FENCE, but with the described scenario, I don't think
> > it does.
>
> I'm pretty sure it is for the Winchip -but we don't care as there are no
> SMP ones. OTOH with DMA we do care

Ok. The Winchip also does not have an IOMMU or the need for SWIOTLB, so
I guess it would be ok to move the flush_write_buffers() out of the common
code into the x86 pci-nommu implementation. That one is also the only
place that calls flush_write_buffers() in dma_map_().

This would need to get revisited if we want to implement OOSTORE for
VIA Nano, which could then need it in swiotlb.
---
[PATCH] asm-generic: remove calling flush_write_buffers() in dma_sync_*

Flushing the write buffers is only needed on x86 processors using OOSTORE
(currently only pre-VIA Centaur processors) when giving a buffer to the
device, but not for handing it back.

Based on an earlier patch from Ming Lei <[email protected]>

Cc: Joerg Roedel <[email protected]>
Cc: FUJITA Tomonori <[email protected]>
Cc: Alan Cox <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>

--- a/arch/x86/kernel/pci-nommu.c
+++ b/arch/x86/kernel/pci-nommu.c
@@ -79,12 +79,29 @@ static void nommu_free_coherent(struct device *dev, size_t size, void *vaddr,
free_pages((unsigned long)vaddr, get_order(size));
}

+static void nommu_sync_single_for_device(struct device *dev,
+ dma_addr_t addr, size_t size,
+ enum dma_data_direction dir)
+{
+ flush_write_buffers();
+}
+
+
+static void nommu_sync_sg_for_device(struct device *dev,
+ struct scatterlist *sg, int nelems,
+ enum dma_data_direction dir)
+{
+ flush_write_buffers();
+}
+
struct dma_map_ops nommu_dma_ops = {
- .alloc_coherent = dma_generic_alloc_coherent,
- .free_coherent = nommu_free_coherent,
- .map_sg = nommu_map_sg,
- .map_page = nommu_map_page,
- .is_phys = 1,
+ .alloc_coherent = dma_generic_alloc_coherent,
+ .free_coherent = nommu_free_coherent,
+ .map_sg = nommu_map_sg,
+ .map_page = nommu_map_page,
+ .sync_single_for_device = nommu_sync_single_for_device,
+ .sync_sg_for_device = nommu_sync_sg_for_device,
+ .is_phys = 1,
};

void __init no_iommu_init(void)
diff --git a/include/asm-generic/dma-mapping-common.h b/include/asm-generic/dma-mapping-common.h
index 5406a60..e694263 100644
--- a/include/asm-generic/dma-mapping-common.h
+++ b/include/asm-generic/dma-mapping-common.h
@@ -103,7 +103,6 @@ static inline void dma_sync_single_for_cpu(struct device *dev, dma_addr_t addr,
if (ops->sync_single_for_cpu)
ops->sync_single_for_cpu(dev, addr, size, dir);
debug_dma_sync_single_for_cpu(dev, addr, size, dir);
- flush_write_buffers();
}

static inline void dma_sync_single_for_device(struct device *dev,
@@ -116,7 +115,6 @@ static inline void dma_sync_single_for_device(struct device *dev,
if (ops->sync_single_for_device)
ops->sync_single_for_device(dev, addr, size, dir);
debug_dma_sync_single_for_device(dev, addr, size, dir);
- flush_write_buffers();
}

static inline void dma_sync_single_range_for_cpu(struct device *dev,
@@ -132,7 +130,6 @@ static inline void dma_sync_single_range_for_cpu(struct device *dev,
ops->sync_single_range_for_cpu(dev, addr, offset, size, dir);
debug_dma_sync_single_range_for_cpu(dev, addr, offset, size, dir);

- flush_write_buffers();
} else
dma_sync_single_for_cpu(dev, addr, size, dir);
}
@@ -150,7 +147,6 @@ static inline void dma_sync_single_range_for_device(struct device *dev,
ops->sync_single_range_for_device(dev, addr, offset, size, dir);
debug_dma_sync_single_range_for_device(dev, addr, offset, size, dir);

- flush_write_buffers();
} else
dma_sync_single_for_device(dev, addr, size, dir);
}
@@ -165,7 +161,6 @@ dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
if (ops->sync_sg_for_cpu)
ops->sync_sg_for_cpu(dev, sg, nelems, dir);
debug_dma_sync_sg_for_cpu(dev, sg, nelems, dir);
- flush_write_buffers();
}

static inline void
@@ -179,7 +174,6 @@ dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
ops->sync_sg_for_device(dev, sg, nelems, dir);
debug_dma_sync_sg_for_device(dev, sg, nelems, dir);

- flush_write_buffers();
}

#define dma_map_single(d, a, s, r) dma_map_single_attrs(d, a, s, r, NULL)

2009-06-30 12:38:49

by Alan

[permalink] [raw]
Subject: Re: [PATCH][RFC] asm-generic:remove calling flush_write_buffers() in dma_sync_*_for_cpu

> Ok. The Winchip also does not have an IOMMU or the need for SWIOTLB, so
> I guess it would be ok to move the flush_write_buffers() out of the common
> code into the x86 pci-nommu implementation. That one is also the only
> place that calls flush_write_buffers() in dma_map_().

What about non x86 - this is asm-generic you are playing with and its the
kind of change that causes evil really hard to track down and subtle
corruptions and user data loss if you get it wrong.

2009-06-30 12:49:26

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH][RFC] asm-generic:remove calling flush_write_buffers() in dma_sync_*_for_cpu

On Tuesday 30 June 2009, Alan Cox wrote:
> > Ok. The Winchip also does not have an IOMMU or the need for SWIOTLB, so
> > I guess it would be ok to move the flush_write_buffers() out of the common
> > code into the x86 pci-nommu implementation. That one is also the only
> > place that calls flush_write_buffers() in dma_map_().
>
> What about non x86 - this is asm-generic you are playing with and its the
> kind of change that causes evil really hard to track down and subtle
> corruptions and user data loss if you get it wrong.

Non-x86 is the real motivation for the patch, because the flush_write_buffers
call in this file is currently not implemented and causes build errors
on everything but x86, frv, ia64 and m32r, where the latter two implement
it as an empty macro.

The only users of the file right now are x86 and ia64, and ia64 only added
the empty flush_write_buffers() definition in order to use it.

Arnd <><

2009-06-30 13:08:31

by Alan

[permalink] [raw]
Subject: Re: [PATCH][RFC] asm-generic:remove calling flush_write_buffers() in dma_sync_*_for_cpu

> > What about non x86 - this is asm-generic you are playing with and its the
> > kind of change that causes evil really hard to track down and subtle
> > corruptions and user data loss if you get it wrong.
>
> Non-x86 is the real motivation for the patch, because the flush_write_buffers
> call in this file is currently not implemented and causes build errors
> on everything but x86, frv, ia64 and m32r, where the latter two implement
> it as an empty macro.

Ok so the FRV would grow a similar no iommu patch to the x86.

> The only users of the file right now are x86 and ia64, and ia64 only added
> the empty flush_write_buffers() definition in order to use it.

That now all makes sense.

2009-06-30 13:38:59

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH][RFC] asm-generic:remove calling flush_write_buffers() in dma_sync_*_for_cpu

On Tuesday 30 June 2009, Alan Cox wrote:
> > > What about non x86 - this is asm-generic you are playing with and its the
> > > kind of change that causes evil really hard to track down and subtle
> > > corruptions and user data loss if you get it wrong.
> >
> > Non-x86 is the real motivation for the patch, because the flush_write_buffers
> > call in this file is currently not implemented and causes build errors
> > on everything but x86, frv, ia64 and m32r, where the latter two implement
> > it as an empty macro.
>
> Ok so the FRV would grow a similar no iommu patch to the x86.

Well, not even that. dma-mapping-common.h only makes sense on architectures
that have multiple dma-mapping implementations (parisc, mips, arm, powerpc,
sparc, ia64 and x86, possibly alpha). All others including frv only need a
nommu case anyway and would not use dma-mapping-common.h but could be
changed to use something like the dma-mapping-linear.h I worked on recently.

Arnd <><

2009-07-07 01:54:24

by Ming Lei

[permalink] [raw]
Subject: Re: [PATCH][RFC] asm-generic:remove calling flush_write_buffers() in dma_sync_*_for_cpu

2009/6/30 Arnd Bergmann <[email protected]>:
> On Tuesday 30 June 2009, Alan Cox wrote:
> Well, not even that. dma-mapping-common.h only makes sense on architectures
> that have multiple dma-mapping implementations (parisc, mips, arm, powerpc,

It seems that there is only one dma-mmaping implementation on ARM, doesn't it?
Is it necessary that using dma-mapping-common.h on ARM?

> sparc, ia64 and x86, possibly alpha). All others including frv only need a
> nommu case anyway and would not use dma-mapping-common.h but could be
> changed to use something like the dma-mapping-linear.h I worked on recently.




--
Lei Ming

2009-07-07 07:49:04

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [PATCH][RFC] asm-generic:remove calling flush_write_buffers() in dma_sync_*_for_cpu

On Tue, Jul 07, 2009 at 09:54:20AM +0800, Ming Lei wrote:
> 2009/6/30 Arnd Bergmann <[email protected]>:
> > On Tuesday 30 June 2009, Alan Cox wrote:
> > Well, not even that. dma-mapping-common.h only makes sense on architectures
> > that have multiple dma-mapping implementations (parisc, mips, arm, powerpc,
>
> It seems that there is only one dma-mmaping implementation on ARM, doesn't it?
> Is it necessary that using dma-mapping-common.h on ARM?

ARM has two (normal, and dma bounce), and in the long run we need to do
cache handling on unmap as well as map due to CPU speculative fetches.

2009-07-07 13:43:48

by Ming Lei

[permalink] [raw]
Subject: Re: [PATCH][RFC] asm-generic:remove calling flush_write_buffers() in dma_sync_*_for_cpu

2009/7/7 Russell King - ARM Linux <[email protected]>:
> On Tue, Jul 07, 2009 at 09:54:20AM +0800, Ming Lei wrote:
>> 2009/6/30 Arnd Bergmann <[email protected]>:
>> > On Tuesday 30 June 2009, Alan Cox wrote:
>> > Well, not even that. dma-mapping-common.h only makes sense on architectures
>> > that have multiple dma-mapping implementations (parisc, mips, arm, powerpc,
>>
>> It seems that there is only one dma-mmaping implementation on ARM, doesn't it?
>> Is it necessary that using dma-mapping-common.h on ARM?
>
> ARM has two (normal, and dma bounce), and in the long run we need to do

OK, Can we use dma-mapping-common.h on ARM?

> cache handling on unmap as well as map due to CPU speculative fetches.

IMHO, it seems we can fix this problem now.

For DMA_TO_DEVICE transfer, clean cache in dma map, but does nothing in
dma unmap;

For DMA_FROM_DEVICE, we may do nothing in dma map, but invaliate cache
in dma unmap.

Is it doable?

Thanks.
--
Lei Ming

2009-07-07 14:07:21

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH][RFC] asm-generic:remove calling flush_write_buffers() in dma_sync_*_for_cpu

On Tuesday 07 July 2009, Ming Lei wrote:
> > ARM has two (normal, and dma bounce), and in the long run we need to do
>
> OK, Can we use dma-mapping-common.h on ARM?

It should work in principle. It may be a good idea to also move to
the generic swiotlb instead of the traditional dma bounce at the
same time.

Note that dma-mapping-common.h is only needed if you want to support
two or more different DMA implementations in a single kernel, which
I'm not sure is needed for ARM.

> > cache handling on unmap as well as map due to CPU speculative fetches.
>
> IMHO, it seems we can fix this problem now.
>
> For DMA_TO_DEVICE transfer, clean cache in dma map, but does nothing in
> dma unmap;
>
> For DMA_FROM_DEVICE, we may do nothing in dma map, but invaliate cache
> in dma unmap.

A number of other architectures do this already. You also need to
have dma_sync_*_for_cpu and dma_sync_*_for_device, where the *_for_device
operation needs to do the same flushing as dma_map_* and *_for_cpu
does the same as dma_unmap_*.

Note that actually you need to do writeback+invalidate in DMA_TO_DEVICE
and at least an invalidate in DMA_FROM_DEVICE during dma_map_*.
For the unmap, I don't think you ever need to invalidate the cache.
If you invalidate only at unmap time for DMA_FROM_DEVICE, a dirty
cache line might be accidentally flushed to the buffer after
the device has written to it.

Arnd <><

2009-07-07 14:55:16

by Ming Lei

[permalink] [raw]
Subject: Re: [PATCH][RFC] asm-generic:remove calling flush_write_buffers() in dma_sync_*_for_cpu

2009/7/7 Arnd Bergmann <[email protected]>:
> On Tuesday 07 July 2009, Ming Lei wrote:
>> > ARM has two (normal, and dma bounce), and in the long run we need to do
>>
>> OK, Can we use dma-mapping-common.h on ARM?
>
> It should work in principle. It may be a good idea to also move to
> the generic swiotlb instead of the traditional dma bounce at the
> same time.
>
> Note that dma-mapping-common.h is only needed if you want to support
> two or more different DMA implementations in a single kernel, which
> I'm not sure is needed for ARM.
>
>> > cache handling on unmap as well as map due to CPU speculative fetches.
>>
>> IMHO, it seems we can fix this problem now.
>>
>> For DMA_TO_DEVICE transfer, clean cache in dma map, but does nothing in
>> dma unmap;
>>
>> For ?DMA_FROM_DEVICE, we may do nothing in dma map, but invaliate cache
>> in dma unmap.
>
> A number of other architectures do this already. You also need to
> have dma_sync_*_for_cpu and dma_sync_*_for_device, where the *_for_device
> operation needs to do the same flushing as dma_map_* and *_for_cpu
> does the same as dma_unmap_*.
>
> Note that actually you need to do writeback+invalidate in DMA_TO_DEVICE

IMHO, writeback in dma map is enough for DMA_TO_DEVICE transfer.

> and at least an invalidate in DMA_FROM_DEVICE during dma_map_*.
> For the unmap, I don't think you ever need to invalidate the cache.

No, as Russell said that CPU speculative fetches can make cache "dirty" after
dma map but before dma is over, then CPU may read inconsistent data
after DMA_FROM_DEVICE transfer is finished if does not invalidate
cache in
dma unmap.

> If you invalidate only at unmap time for DMA_FROM_DEVICE, a dirty
> cache line might be accidentally flushed to the buffer after
> the device has written to it.

It seems you are reasonable, and we do need to invalidate cache in both dma
map and dma unmap for DMA_FROM_DEVICE, don't we?

--
Lei Ming

2009-07-07 15:31:21

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH][RFC] asm-generic:remove calling flush_write_buffers() in dma_sync_*_for_cpu

On Tuesday 07 July 2009, Ming Lei wrote:
> > Note that actually you need to do writeback+invalidate in DMA_TO_DEVICE
>
> IMHO, writeback in dma map is enough for DMA_TO_DEVICE transfer.

Right. writeback+invalidate would only be needed for bidirectional
transfers but not for DMA_TO_DEVICE.

> > and at least an invalidate in DMA_FROM_DEVICE during dma_map_*.
> > For the unmap, I don't think you ever need to invalidate the cache.
>
> No, as Russell said that CPU speculative fetches can make cache "dirty" after
> dma map but before dma is over, then CPU may read inconsistent data
> after DMA_FROM_DEVICE transfer is finished if does not invalidate
> cache in
> dma unmap.

Hmm, I don't think any of the other architectures currently
try to prevent this. Is ARM special in this regard, or does that
mean that the others are broken? What kind of code would cause
a speculative read on a data section that the kernel (by convention)
must not access? Would it be enough to have an rmb() in dma_unmap_*
and dma_sync_*_for_cpu to prevent speculative accesses?

> > If you invalidate only at unmap time for DMA_FROM_DEVICE, a dirty
> > cache line might be accidentally flushed to the buffer after
> > the device has written to it.
>
> It seems you are reasonable, and we do need to invalidate cache in both dma
> map and dma unmap for DMA_FROM_DEVICE, don't we?

Is it guaranteed that a cache line from a speculative fetch will
never be written back? Otherwise it would still be broken.

Arnd <><

2009-07-07 17:34:20

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [PATCH][RFC] asm-generic:remove calling flush_write_buffers() in dma_sync_*_for_cpu

On Tue, Jul 07, 2009 at 09:43:37PM +0800, Ming Lei wrote:
> 2009/7/7 Russell King - ARM Linux <[email protected]>:
> > On Tue, Jul 07, 2009 at 09:54:20AM +0800, Ming Lei wrote:
> >> 2009/6/30 Arnd Bergmann <[email protected]>:
> >> > On Tuesday 30 June 2009, Alan Cox wrote:
> >> > Well, not even that. dma-mapping-common.h only makes sense on architectures
> >> > that have multiple dma-mapping implementations (parisc, mips, arm, powerpc,
> >>
> >> It seems that there is only one dma-mmaping implementation on ARM, doesn't it?
> >> Is it necessary that using dma-mapping-common.h on ARM?
> >
> > ARM has two (normal, and dma bounce), and in the long run we need to do
>
> OK, Can we use dma-mapping-common.h on ARM?

No.

> > cache handling on unmap as well as map due to CPU speculative fetches.
>
> IMHO, it seems we can fix this problem now.
>
> For DMA_TO_DEVICE transfer, clean cache in dma map, but does nothing in
> dma unmap;
>
> For DMA_FROM_DEVICE, we may do nothing in dma map, but invaliate cache
> in dma unmap.

No. If there are dirty lines, they can be written back, overwriting the
data which has been DMA'd there.

Basically, with speculative prefetching, we need to:

- for DMA_TO_DEVICE, clean the cache in the DMA map and do nothing on DMA unmap
- for DMA_FROM_DEVICE, invalidate the cache on DMA map and again on DMA unmap

Moreover, this isn't going to be trivial. In order to do these cache
operations, we need the virtual address. However, the unmap APIs aren't
given this information (though dma_to_virt() will do what's required for
the dma_unmap_single case.)

dma_unmap_page() on the other hand, we don't have any API existing to
turn a DMA address back into a page struct...

If only we had an architecture specific struct to contain whatever
information needs to be carried from map to unmap...

2009-07-07 17:36:32

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [PATCH][RFC] asm-generic:remove calling flush_write_buffers() in dma_sync_*_for_cpu

On Tue, Jul 07, 2009 at 04:06:48PM +0200, Arnd Bergmann wrote:
> Note that actually you need to do writeback+invalidate in DMA_TO_DEVICE

Rubbish. DMA _to_ the device just needs any data held in the cache to
be pushed out into the hardware RAM. There's absolutely no point in
invalidating it - that's a complete waste of time.

> and at least an invalidate in DMA_FROM_DEVICE during dma_map_*.
> For the unmap, I don't think you ever need to invalidate the cache.

As I've already explained, you do if you have speculative prefetching
which can bring in data into the cache from _any_ memory location at
_any_ time. We _are_ going to have to do this, no questions asked.