2010-06-17 01:14:01

by James Bottomley

[permalink] [raw]
Subject: bnx2 fails to compile on parisc because of missing get_dma_ops()

I'm not quite sure whose fault this one is.

However, this code in bnx2.c:

if (!get_dma_ops(&pdev->dev)->sync_single_for_cpu) {
next_rx_buf =
&rxr->rx_buf_ring[
RX_RING_IDX(NEXT_RX_BD(sw_cons))];
prefetch(next_rx_buf->desc);
}

Looks remarkably fragile: what exactly is it trying to do?

The commit that causes the problem:

commit a33fa66bcf365ffe5b79d1ae1d3582cc261ae56e
Author: Michael Chan <[email protected]>
Date: Thu May 6 08:58:13 2010 +0000

bnx2: Add prefetches to rx path.

Looks fairly innocuous by the description.

Should parisc have a get_dma_ops()? We don't need one because our dma
ops are per platform not per bus.

James


2010-06-17 01:16:22

by David Miller

[permalink] [raw]
Subject: Re: bnx2 fails to compile on parisc because of missing get_dma_ops()

From: James Bottomley <[email protected]>
Date: Wed, 16 Jun 2010 20:13:49 -0500

> However, this code in bnx2.c:
>
> if (!get_dma_ops(&pdev->dev)->sync_single_for_cpu) {
> next_rx_buf =
> &rxr->rx_buf_ring[
> RX_RING_IDX(NEXT_RX_BD(sw_cons))];
> prefetch(next_rx_buf->desc);
> }
>
> Looks remarkably fragile: what exactly is it trying to do?
>
> The commit that causes the problem:
>
> commit a33fa66bcf365ffe5b79d1ae1d3582cc261ae56e
> Author: Michael Chan <[email protected]>
> Date: Thu May 6 08:58:13 2010 +0000
>
> bnx2: Add prefetches to rx path.
>
> Looks fairly innocuous by the description.
>
> Should parisc have a get_dma_ops()? We don't need one because our dma
> ops are per platform not per bus.

I think asking for get_dma_ops() directly in a driver is dodgy at
best, especially one that is meant to compile on any PCI supporting
system. At least right now.

2010-06-17 01:17:19

by Mike Frysinger

[permalink] [raw]
Subject: Re: bnx2 fails to compile on parisc because of missing get_dma_ops()

On Wed, Jun 16, 2010 at 9:13 PM, James Bottomley wrote:
> I'm not quite sure whose fault this one is.
>
> However, this code in bnx2.c:
>
> ? ? ? ? ? ? ? ?if (!get_dma_ops(&pdev->dev)->sync_single_for_cpu) {
> ? ? ? ? ? ? ? ? ? ? ? ?next_rx_buf =
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?&rxr->rx_buf_ring[
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?RX_RING_IDX(NEXT_RX_BD(sw_cons))];
> ? ? ? ? ? ? ? ? ? ? ? ?prefetch(next_rx_buf->desc);
> ? ? ? ? ? ? ? ?}
>
> Looks remarkably fragile: what exactly is it trying to do?
>
> The commit that causes the problem:
>
> commit a33fa66bcf365ffe5b79d1ae1d3582cc261ae56e
> Author: Michael Chan <[email protected]>
> Date: ? Thu May 6 08:58:13 2010 +0000
>
> ? ?bnx2: Add prefetches to rx path.
>
> Looks fairly innocuous by the description.
>
> Should parisc have a get_dma_ops()? ?We don't need one because our dma
> ops are per platform not per bus.

looks like it'll be broken on more than just parisc:
$ grep get_dma_ops arch/*/include/asm/ -rl | cut -d/ -f 2
alpha
ia64
microblaze
powerpc
sh
sparc
x86
-mike

2010-06-17 03:54:11

by Michael Chan

[permalink] [raw]
Subject: Re: bnx2 fails to compile on parisc because of missing get_dma_ops()

Mike Frysinger wrote:

> On Wed, Jun 16, 2010 at 9:13 PM, James Bottomley wrote:
> > I'm not quite sure whose fault this one is.
> >
> > However, this code in bnx2.c:
> >
> > if (!get_dma_ops(&pdev->dev)->sync_single_for_cpu) {
> > next_rx_buf =
> > &rxr->rx_buf_ring[
> > RX_RING_IDX(NEXT_RX_BD(sw_cons))];
> > prefetch(next_rx_buf->desc);
> > }
> >
> > Looks remarkably fragile: what exactly is it trying to do?

If sync_single is not defined, that means the CPU has a consistent
view of next_rx_buf and so it makes sense to prefetch it.

> >
> > The commit that causes the problem:
> >
> > commit a33fa66bcf365ffe5b79d1ae1d3582cc261ae56e
> > Author: Michael Chan <[email protected]>
> > Date: Thu May 6 08:58:13 2010 +0000
> >
> > bnx2: Add prefetches to rx path.
> >
> > Looks fairly innocuous by the description.
> >
> > Should parisc have a get_dma_ops()? We don't need one
> because our dma
> > ops are per platform not per bus.
>
> looks like it'll be broken on more than just parisc:
> $ grep get_dma_ops arch/*/include/asm/ -rl | cut -d/ -f 2
> alpha
> ia64
> microblaze
> powerpc
> sh
> sparc
> x86

Most of these archs use the dma functions in:

<asm-genric/dma-mapping-common.h>

so it's not a problem.

I think I'll send in a patch to remove that part of the code
from bnx2.c for now.

Thanks.

2010-06-17 04:00:06

by Mike Frysinger

[permalink] [raw]
Subject: Re: bnx2 fails to compile on parisc because of missing get_dma_ops()

On Wed, Jun 16, 2010 at 11:53 PM, Michael Chan wrote:
> Mike Frysinger wrote:
>> > The commit that causes the problem:
>> >
>> > commit a33fa66bcf365ffe5b79d1ae1d3582cc261ae56e
>> > Author: Michael Chan <[email protected]>
>> > Date: ? Thu May 6 08:58:13 2010 +0000
>> >
>> > ? ?bnx2: Add prefetches to rx path.
>> >
>> > Looks fairly innocuous by the description.
>> >
>> > Should parisc have a get_dma_ops()? ?We don't need one
>> because our dma
>> > ops are per platform not per bus.
>>
>> looks like it'll be broken on more than just parisc:
>> $ grep get_dma_ops arch/*/include/asm/ -rl | cut -d/ -f 2
>> alpha
>> ia64
>> microblaze
>> powerpc
>> sh
>> sparc
>> x86
>
> Most of these archs use the dma functions in:
>
> <asm-genric/dma-mapping-common.h>
>
> so it's not a problem.

the grep is showing only the arches that define get_dma_ops (and so
the new code works). you'd have to invert the list to see the ones
which do not define get_dma_ops(), and the inverted list is larger.
that was merely my point.
-mike

2010-06-17 04:04:16

by Paul Mundt

[permalink] [raw]
Subject: Re: bnx2 fails to compile on parisc because of missing get_dma_ops()

On Wed, Jun 16, 2010 at 08:53:57PM -0700, Michael Chan wrote:
> Mike Frysinger wrote:
>
> > On Wed, Jun 16, 2010 at 9:13 PM, James Bottomley wrote:
> > > I'm not quite sure whose fault this one is.
> > >
> > > However, this code in bnx2.c:
> > >
> > > if (!get_dma_ops(&pdev->dev)->sync_single_for_cpu) {
> > > next_rx_buf =
> > > &rxr->rx_buf_ring[
> > > RX_RING_IDX(NEXT_RX_BD(sw_cons))];
> > > prefetch(next_rx_buf->desc);
> > > }
> > >
> > > Looks remarkably fragile: what exactly is it trying to do?
>
> If sync_single is not defined, that means the CPU has a consistent
> view of next_rx_buf and so it makes sense to prefetch it.
>
Except that's not a valid assertion, there are platforms that implement
it for sanity checks yet still have consistent DMA. You are making
inherently non-portable assumptions for a PCI driver, which is a good
example of why drivers should never be side-stepping the API in the first
place. If you want to have a micro-optimization for the consistent DMA
case, you can check dma_is_consistent(), which is part of the API and
will be variable on certain platform configurations (ie, some may be
consistent with PCI but not on other busses, etc.)

2010-06-17 04:11:08

by Michael Chan

[permalink] [raw]
Subject: Re: bnx2 fails to compile on parisc because of missing get_dma_ops()

Paul Mundt wrote:

> On Wed, Jun 16, 2010 at 08:53:57PM -0700, Michael Chan wrote:
> > If sync_single is not defined, that means the CPU has a consistent
> > view of next_rx_buf and so it makes sense to prefetch it.
> >
> Except that's not a valid assertion, there are platforms that
> implement
> it for sanity checks yet still have consistent DMA. You are making
> inherently non-portable assumptions for a PCI driver, which is a good
> example of why drivers should never be side-stepping the API
> in the first
> place. If you want to have a micro-optimization for the consistent DMA
> case, you can check dma_is_consistent(), which is part of the API and
> will be variable on certain platform configurations (ie, some may be
> consistent with PCI but not on other busses, etc.)
>
>

Thanks for the tip. I didn't know about the dma_is_consistent() API.
I'll use this to fix it then.

2010-06-17 04:20:47

by James Bottomley

[permalink] [raw]
Subject: Re: bnx2 fails to compile on parisc because of missing get_dma_ops()

On Wed, 2010-06-16 at 20:53 -0700, Michael Chan wrote:
> Mike Frysinger wrote:
>
> > On Wed, Jun 16, 2010 at 9:13 PM, James Bottomley wrote:
> > > I'm not quite sure whose fault this one is.
> > >
> > > However, this code in bnx2.c:
> > >
> > > if (!get_dma_ops(&pdev->dev)->sync_single_for_cpu) {
> > > next_rx_buf =
> > > &rxr->rx_buf_ring[
> > > RX_RING_IDX(NEXT_RX_BD(sw_cons))];
> > > prefetch(next_rx_buf->desc);
> > > }
> > >
> > > Looks remarkably fragile: what exactly is it trying to do?
>
> If sync_single is not defined, that means the CPU has a consistent
> view of next_rx_buf and so it makes sense to prefetch it.

That's not entirely a correct statement. Many architectures make a DMA
area coherent by turning off the CPU cache over it. In that case,
prefetching makes absolutely no sense (although it usually works but is
a nop).

> > > The commit that causes the problem:
> > >
> > > commit a33fa66bcf365ffe5b79d1ae1d3582cc261ae56e
> > > Author: Michael Chan <[email protected]>
> > > Date: Thu May 6 08:58:13 2010 +0000
> > >
> > > bnx2: Add prefetches to rx path.
> > >
> > > Looks fairly innocuous by the description.
> > >
> > > Should parisc have a get_dma_ops()? We don't need one
> > because our dma
> > > ops are per platform not per bus.
> >
> > looks like it'll be broken on more than just parisc:
> > $ grep get_dma_ops arch/*/include/asm/ -rl | cut -d/ -f 2
> > alpha
> > ia64
> > microblaze
> > powerpc
> > sh
> > sparc
> > x86
>
> Most of these archs use the dma functions in:
>
> <asm-genric/dma-mapping-common.h>
>
> so it's not a problem.

Parisc begs to differ.

Plus you're making assumptions about the contents of the ops structure
which is an internal architecture object ... that's bound to run into
portability problems even if we make it compile on all platform.

> I think I'll send in a patch to remove that part of the code
> from bnx2.c for now.

I think that's the best solution.

James

2010-06-17 06:24:59

by Michael Chan

[permalink] [raw]
Subject: Re: bnx2 fails to compile on parisc because of missing get_dma_ops()

Michael Chan wrote:
>
> Paul Mundt wrote:
>
> > If you want to have a micro-optimization for the consistent DMA
> > case, you can check dma_is_consistent(), which is part of the API and
> > will be variable on certain platform configurations (ie, some may be
> > consistent with PCI but not on other busses, etc.)
> >
> >
>
> Thanks for the tip. I didn't know about the dma_is_consistent() API.
> I'll use this to fix it then.
>

David, why is dma_is_consistent() always returning 1 on sparc? The
streaming DMA is not consistent.

2010-06-17 12:13:51

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: bnx2 fails to compile on parisc because of missing get_dma_ops()

On Wed, 16 Jun 2010 20:13:49 -0500
James Bottomley <[email protected]> wrote:

> I'm not quite sure whose fault this one is.
>
> However, this code in bnx2.c:
>
> if (!get_dma_ops(&pdev->dev)->sync_single_for_cpu) {
> next_rx_buf =
> &rxr->rx_buf_ring[
> RX_RING_IDX(NEXT_RX_BD(sw_cons))];
> prefetch(next_rx_buf->desc);
> }
>
> Looks remarkably fragile: what exactly is it trying to do?
>
> The commit that causes the problem:
>
> commit a33fa66bcf365ffe5b79d1ae1d3582cc261ae56e
> Author: Michael Chan <[email protected]>
> Date: Thu May 6 08:58:13 2010 +0000
>
> bnx2: Add prefetches to rx path.
>
> Looks fairly innocuous by the description.
>
> Should parisc have a get_dma_ops()? We don't need one because our dma
> ops are per platform not per bus.

Shouldn't. Only the architectures that use
include/dma-generic/dma-mapping.common.h need to have get_dma_ops().

Using include/dma-generic/dma-mapping.common.h for parisc is not a bad
idea though.

2010-06-17 12:14:00

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: bnx2 fails to compile on parisc because of missing get_dma_ops()

On Wed, 16 Jun 2010 20:53:57 -0700
"Michael Chan" <[email protected]> wrote:

> > > The commit that causes the problem:
> > >
> > > commit a33fa66bcf365ffe5b79d1ae1d3582cc261ae56e
> > > Author: Michael Chan <[email protected]>
> > > Date: Thu May 6 08:58:13 2010 +0000
> > >
> > > bnx2: Add prefetches to rx path.
> > >
> > > Looks fairly innocuous by the description.
> > >
> > > Should parisc have a get_dma_ops()? We don't need one
> > because our dma
> > > ops are per platform not per bus.
> >
> > looks like it'll be broken on more than just parisc:
> > $ grep get_dma_ops arch/*/include/asm/ -rl | cut -d/ -f 2
> > alpha
> > ia64
> > microblaze
> > powerpc
> > sh
> > sparc
> > x86
>
> Most of these archs use the dma functions in:
>
> <asm-genric/dma-mapping-common.h>
>
> so it's not a problem.

No, it's wrong assumption. asm-genric/dma-mapping-common.h is the
helper code to simplify architecture's DMA core code. Some
architecture uses it and some don't.

You can't expect every architectures to use it.


> I think I'll send in a patch to remove that part of the code
> from bnx2.c for now.

Yeah. I'm not sure you already sent a patch.

=
From: FUJITA Tomonori <[email protected]>
Date: Thu, 17 Jun 2010 13:06:15 +0900
Subject: [PATCH] bnx2: fix dma_get_ops compilation breakage

This removes dma_get_ops() prefetch optimization in bnx2.

bnx2 uses dma_get_ops() to see if dma_sync_single_for_cpu() is
noop. bnx2 does prefetch if it's noop.

But dma_get_ops() isn't available on all the architectures (only the
architectures that uses dma_map_ops struct have it). Using
dma_get_ops() in drivers leads to compilation breakage on many
archtectures.

Currently, we don't have a way to see if dma_sync_single_for_cpu() is
noop. If it can improve the performance notably, we can add the new
DMA API for it.

Signed-off-by: FUJITA Tomonori <[email protected]>
---
drivers/net/bnx2.c | 10 +---------
1 files changed, 1 insertions(+), 9 deletions(-)

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 949d7a9..b3305fc 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -3073,7 +3073,6 @@ bnx2_rx_int(struct bnx2 *bp, struct bnx2_napi *bnapi, int budget)
u16 hw_cons, sw_cons, sw_ring_cons, sw_prod, sw_ring_prod;
struct l2_fhdr *rx_hdr;
int rx_pkt = 0, pg_ring_used = 0;
- struct pci_dev *pdev = bp->pdev;

hw_cons = bnx2_get_hw_rx_cons(bnapi);
sw_cons = rxr->rx_cons;
@@ -3086,7 +3085,7 @@ bnx2_rx_int(struct bnx2 *bp, struct bnx2_napi *bnapi, int budget)
while (sw_cons != hw_cons) {
unsigned int len, hdr_len;
u32 status;
- struct sw_bd *rx_buf, *next_rx_buf;
+ struct sw_bd *rx_buf;
struct sk_buff *skb;
dma_addr_t dma_addr;
u16 vtag = 0;
@@ -3098,13 +3097,6 @@ bnx2_rx_int(struct bnx2 *bp, struct bnx2_napi *bnapi, int budget)
rx_buf = &rxr->rx_buf_ring[sw_ring_cons];
skb = rx_buf->skb;
prefetchw(skb);
-
- if (!get_dma_ops(&pdev->dev)->sync_single_for_cpu) {
- next_rx_buf =
- &rxr->rx_buf_ring[
- RX_RING_IDX(NEXT_RX_BD(sw_cons))];
- prefetch(next_rx_buf->desc);
- }
rx_buf->skb = NULL;

dma_addr = dma_unmap_addr(rx_buf, mapping);
--
1.5.6.5

2010-06-17 12:13:56

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: bnx2 fails to compile on parisc because of missing get_dma_ops()

On Wed, 16 Jun 2010 18:16:32 -0700 (PDT)
David Miller <[email protected]> wrote:

> From: James Bottomley <[email protected]>
> Date: Wed, 16 Jun 2010 20:13:49 -0500
>
> > However, this code in bnx2.c:
> >
> > if (!get_dma_ops(&pdev->dev)->sync_single_for_cpu) {
> > next_rx_buf =
> > &rxr->rx_buf_ring[
> > RX_RING_IDX(NEXT_RX_BD(sw_cons))];
> > prefetch(next_rx_buf->desc);
> > }
> >
> > Looks remarkably fragile: what exactly is it trying to do?
> >
> > The commit that causes the problem:
> >
> > commit a33fa66bcf365ffe5b79d1ae1d3582cc261ae56e
> > Author: Michael Chan <[email protected]>
> > Date: Thu May 6 08:58:13 2010 +0000
> >
> > bnx2: Add prefetches to rx path.
> >
> > Looks fairly innocuous by the description.
> >
> > Should parisc have a get_dma_ops()? We don't need one because our dma
> > ops are per platform not per bus.
>
> I think asking for get_dma_ops() directly in a driver is dodgy at
> best, especially one that is meant to compile on any PCI supporting
> system. At least right now.

Yeah, it should be used only in the architecture core
code. get_dma_ops() is not mentioned in DMA-API-* docs.

2010-06-17 12:21:31

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: bnx2 fails to compile on parisc because of missing get_dma_ops()

On Wed, 16 Jun 2010 23:24:44 -0700
"Michael Chan" <[email protected]> wrote:

> Michael Chan wrote:
> >
> > Paul Mundt wrote:
> >
> > > If you want to have a micro-optimization for the consistent DMA
> > > case, you can check dma_is_consistent(), which is part of the API and
> > > will be variable on certain platform configurations (ie, some may be
> > > consistent with PCI but not on other busses, etc.)
> > >
> > >
> >
> > Thanks for the tip. I didn't know about the dma_is_consistent() API.
> > I'll use this to fix it then.
> >
>
> David, why is dma_is_consistent() always returning 1 on sparc? The
> streaming DMA is not consistent.

I think that there are some confusion about dma_is_consistent(). Some
architectures think that dma_is_consistent() is supposed to return 1
if they can allocate coherent memory (note that some architectures
can't allocate coherent memory).

2010-06-17 12:54:42

by Michael Chan

[permalink] [raw]
Subject: Re: bnx2 fails to compile on parisc because of missing get_dma_ops()

FUJITA Tomonori wrote:

> From: FUJITA Tomonori <[email protected]>
> Date: Thu, 17 Jun 2010 13:06:15 +0900
> Subject: [PATCH] bnx2: fix dma_get_ops compilation breakage
>
> This removes dma_get_ops() prefetch optimization in bnx2.
>
> bnx2 uses dma_get_ops() to see if dma_sync_single_for_cpu() is
> noop. bnx2 does prefetch if it's noop.
>
> But dma_get_ops() isn't available on all the architectures (only the
> architectures that uses dma_map_ops struct have it). Using
> dma_get_ops() in drivers leads to compilation breakage on many
> archtectures.
>
> Currently, we don't have a way to see if dma_sync_single_for_cpu() is
> noop. If it can improve the performance notably, we can add the new
> DMA API for it.

This prefetch improves performance noticeably when the driver is
handling incoming 64-byte packets at a sustained rate.

>
> Signed-off-by: FUJITA Tomonori <[email protected]>

Acked-by: Michael Chan <[email protected]>

Thanks.

> ---
> drivers/net/bnx2.c | 10 +---------
> 1 files changed, 1 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
> index 949d7a9..b3305fc 100644
> --- a/drivers/net/bnx2.c
> +++ b/drivers/net/bnx2.c
> @@ -3073,7 +3073,6 @@ bnx2_rx_int(struct bnx2 *bp, struct
> bnx2_napi *bnapi, int budget)
> u16 hw_cons, sw_cons, sw_ring_cons, sw_prod, sw_ring_prod;
> struct l2_fhdr *rx_hdr;
> int rx_pkt = 0, pg_ring_used = 0;
> - struct pci_dev *pdev = bp->pdev;
>
> hw_cons = bnx2_get_hw_rx_cons(bnapi);
> sw_cons = rxr->rx_cons;
> @@ -3086,7 +3085,7 @@ bnx2_rx_int(struct bnx2 *bp, struct
> bnx2_napi *bnapi, int budget)
> while (sw_cons != hw_cons) {
> unsigned int len, hdr_len;
> u32 status;
> - struct sw_bd *rx_buf, *next_rx_buf;
> + struct sw_bd *rx_buf;
> struct sk_buff *skb;
> dma_addr_t dma_addr;
> u16 vtag = 0;
> @@ -3098,13 +3097,6 @@ bnx2_rx_int(struct bnx2 *bp, struct
> bnx2_napi *bnapi, int budget)
> rx_buf = &rxr->rx_buf_ring[sw_ring_cons];
> skb = rx_buf->skb;
> prefetchw(skb);
> -
> - if (!get_dma_ops(&pdev->dev)->sync_single_for_cpu) {
> - next_rx_buf =
> - &rxr->rx_buf_ring[
> -
> RX_RING_IDX(NEXT_RX_BD(sw_cons))];
> - prefetch(next_rx_buf->desc);
> - }
> rx_buf->skb = NULL;
>
> dma_addr = dma_unmap_addr(rx_buf, mapping);
> --
> 1.5.6.5
>
>
>

2010-06-17 13:12:25

by James Bottomley

[permalink] [raw]
Subject: Re: bnx2 fails to compile on parisc because of missing get_dma_ops()

On Thu, 2010-06-17 at 05:54 -0700, Michael Chan wrote:
> FUJITA Tomonori wrote:
>
> > From: FUJITA Tomonori <[email protected]>
> > Date: Thu, 17 Jun 2010 13:06:15 +0900
> > Subject: [PATCH] bnx2: fix dma_get_ops compilation breakage
> >
> > This removes dma_get_ops() prefetch optimization in bnx2.
> >
> > bnx2 uses dma_get_ops() to see if dma_sync_single_for_cpu() is
> > noop. bnx2 does prefetch if it's noop.
> >
> > But dma_get_ops() isn't available on all the architectures (only the
> > architectures that uses dma_map_ops struct have it). Using
> > dma_get_ops() in drivers leads to compilation breakage on many
> > archtectures.
> >
> > Currently, we don't have a way to see if dma_sync_single_for_cpu() is
> > noop. If it can improve the performance notably, we can add the new
> > DMA API for it.
>
> This prefetch improves performance noticeably when the driver is
> handling incoming 64-byte packets at a sustained rate.

So why not do it unconditionally? The worst that can happen is that you
pull in a stale cache line which will get cleaned in the dma_sync, thus
slightly degrading performance on incoherent architectures.

Alternatively, come up with a dma prefetch infrastructure ... all you're
really doing is hinting to the architecture that you'll sync this region
next.

James

2010-06-17 13:30:48

by Michael Chan

[permalink] [raw]
Subject: Re: bnx2 fails to compile on parisc because of missing get_dma_ops()

James Bottomley wrote:

> On Thu, 2010-06-17 at 05:54 -0700, Michael Chan wrote:
> > This prefetch improves performance noticeably when the driver is
> > handling incoming 64-byte packets at a sustained rate.
>
> So why not do it unconditionally? The worst that can happen
> is that you
> pull in a stale cache line which will get cleaned in the
> dma_sync, thus
> slightly degrading performance on incoherent architectures.

The original patch was an unconditional prefetch. There was
some discussion that it might not be correct if the DMA wasn't
sync'ed yet on some archs. If the concensus is that it is ok to
do so, that would be the simplest solution.

>
> Alternatively, come up with a dma prefetch infrastructure ...
> all you're
> really doing is hinting to the architecture that you'll sync
> this region
> next.
>
> James
>
>
>
>

2010-06-17 13:36:59

by James Bottomley

[permalink] [raw]
Subject: Re: bnx2 fails to compile on parisc because of missing get_dma_ops()

On Thu, 2010-06-17 at 06:30 -0700, Michael Chan wrote:
> James Bottomley wrote:
>
> > On Thu, 2010-06-17 at 05:54 -0700, Michael Chan wrote:
> > > This prefetch improves performance noticeably when the driver is
> > > handling incoming 64-byte packets at a sustained rate.
> >
> > So why not do it unconditionally? The worst that can happen
> > is that you
> > pull in a stale cache line which will get cleaned in the
> > dma_sync, thus
> > slightly degrading performance on incoherent architectures.
>
> The original patch was an unconditional prefetch. There was
> some discussion that it might not be correct if the DMA wasn't
> sync'ed yet on some archs. If the concensus is that it is ok to
> do so, that would be the simplest solution.

It's definitely not "correct" in that it may pull in stale data. But it
should be harmless in that if it does, the subsequent sync will destroy
the cache line (even if it actually pulled in correct data) and prevent
the actual use of the prefetched data being wrong (or indeed being
prefetched at all).

James


2010-06-17 14:06:31

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: bnx2 fails to compile on parisc because of missing get_dma_ops()

On Thu, 17 Jun 2010 06:30:39 -0700
"Michael Chan" <[email protected]> wrote:

> James Bottomley wrote:
>
> > On Thu, 2010-06-17 at 05:54 -0700, Michael Chan wrote:
> > > This prefetch improves performance noticeably when the driver is
> > > handling incoming 64-byte packets at a sustained rate.
> >
> > So why not do it unconditionally? The worst that can happen
> > is that you
> > pull in a stale cache line which will get cleaned in the
> > dma_sync, thus
> > slightly degrading performance on incoherent architectures.
>
> The original patch was an unconditional prefetch. There was
> some discussion that it might not be correct if the DMA wasn't
> sync'ed yet on some archs. If the concensus is that it is ok to
> do so, that would be the simplest solution.

As James said, it just adds useless prefetch on incoherent
architectures. sync_single_for_cpu is called later so we can see the
correct data. One useless prefetch is unlikely to lead performance
drop.

You might prefer this v2.

=
From: FUJITA Tomonori <[email protected]>
Subject: [PATCH v2] bnx2: fix dma_get_ops compilation breakage

This removes dma_get_ops() prefetch optimization in bnx2.

bnx2 uses dma_get_ops() to see if dma_sync_single_for_cpu() is
noop. bnx2 does prefetch if it's noop.

But dma_get_ops() isn't available on all the architectures (only the
architectures that uses dma_map_ops struct have it). Using
dma_get_ops() in drivers leads to compilation breakage on many
archtectures.

This patch removes dma_get_ops() and changes bnx2 to do prefetch on
all the architectures. This adds useless prefetch on incoherent
architectures but this is harmless. It is also unlikely to cause the
performance drop.

Signed-off-by: FUJITA Tomonori <[email protected]>
---
drivers/net/bnx2.c | 10 ++++------
1 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 949d7a9..85f1692 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -3099,12 +3099,10 @@ bnx2_rx_int(struct bnx2 *bp, struct bnx2_napi *bnapi, int budget)
skb = rx_buf->skb;
prefetchw(skb);

- if (!get_dma_ops(&pdev->dev)->sync_single_for_cpu) {
- next_rx_buf =
- &rxr->rx_buf_ring[
- RX_RING_IDX(NEXT_RX_BD(sw_cons))];
- prefetch(next_rx_buf->desc);
- }
+ next_rx_buf =
+ &rxr->rx_buf_ring[RX_RING_IDX(NEXT_RX_BD(sw_cons))];
+ prefetch(next_rx_buf->desc);
+
rx_buf->skb = NULL;

dma_addr = dma_unmap_addr(rx_buf, mapping);
--
1.6.5

2010-06-17 14:36:43

by David Miller

[permalink] [raw]
Subject: Re: bnx2 fails to compile on parisc because of missing get_dma_ops()

From: FUJITA Tomonori <[email protected]>
Date: Thu, 17 Jun 2010 21:21:13 +0900

> On Wed, 16 Jun 2010 23:24:44 -0700
> "Michael Chan" <[email protected]> wrote:
>
>> David, why is dma_is_consistent() always returning 1 on sparc? The
>> streaming DMA is not consistent.
>
> I think that there are some confusion about dma_is_consistent(). Some
> architectures think that dma_is_consistent() is supposed to return 1
> if they can allocate coherent memory (note that some architectures
> can't allocate coherent memory).

Right, and that's why it's defined this way.

If the desired meaning is different, just me know and I'll fix the
sparc definition.

2010-06-17 14:42:48

by Michael Chan

[permalink] [raw]
Subject: Re: bnx2 fails to compile on parisc because of missing get_dma_ops()

FUJITA Tomonori wrote:

> On Thu, 17 Jun 2010 06:30:39 -0700
> "Michael Chan" <[email protected]> wrote:
>
> > James Bottomley wrote:
> >
> > > On Thu, 2010-06-17 at 05:54 -0700, Michael Chan wrote:
> > > > This prefetch improves performance noticeably when the driver is
> > > > handling incoming 64-byte packets at a sustained rate.
> > >
> > > So why not do it unconditionally? The worst that can happen
> > > is that you
> > > pull in a stale cache line which will get cleaned in the
> > > dma_sync, thus
> > > slightly degrading performance on incoherent architectures.
> >
> > The original patch was an unconditional prefetch. There was
> > some discussion that it might not be correct if the DMA wasn't
> > sync'ed yet on some archs. If the concensus is that it is ok to
> > do so, that would be the simplest solution.
>
> As James said, it just adds useless prefetch on incoherent
> architectures. sync_single_for_cpu is called later so we can see the
> correct data. One useless prefetch is unlikely to lead performance
> drop.
>
> You might prefer this v2.

Yes, thanks.
Acked-by: Michael Chan <[email protected]>

>
> =
> From: FUJITA Tomonori <[email protected]>
> Subject: [PATCH v2] bnx2: fix dma_get_ops compilation breakage
>
> This removes dma_get_ops() prefetch optimization in bnx2.
>
> bnx2 uses dma_get_ops() to see if dma_sync_single_for_cpu() is
> noop. bnx2 does prefetch if it's noop.
>
> But dma_get_ops() isn't available on all the architectures (only the
> architectures that uses dma_map_ops struct have it). Using
> dma_get_ops() in drivers leads to compilation breakage on many
> archtectures.
>
> This patch removes dma_get_ops() and changes bnx2 to do prefetch on
> all the architectures. This adds useless prefetch on incoherent
> architectures but this is harmless. It is also unlikely to cause the
> performance drop.
>
> Signed-off-by: FUJITA Tomonori <[email protected]>
> ---
> drivers/net/bnx2.c | 10 ++++------
> 1 files changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
> index 949d7a9..85f1692 100644
> --- a/drivers/net/bnx2.c
> +++ b/drivers/net/bnx2.c
> @@ -3099,12 +3099,10 @@ bnx2_rx_int(struct bnx2 *bp, struct
> bnx2_napi *bnapi, int budget)
> skb = rx_buf->skb;
> prefetchw(skb);
>
> - if (!get_dma_ops(&pdev->dev)->sync_single_for_cpu) {
> - next_rx_buf =
> - &rxr->rx_buf_ring[
> -
> RX_RING_IDX(NEXT_RX_BD(sw_cons))];
> - prefetch(next_rx_buf->desc);
> - }
> + next_rx_buf =
> +
> &rxr->rx_buf_ring[RX_RING_IDX(NEXT_RX_BD(sw_cons))];
> + prefetch(next_rx_buf->desc);
> +
> rx_buf->skb = NULL;
>
> dma_addr = dma_unmap_addr(rx_buf, mapping);
> --
> 1.6.5
>
>
>

2010-06-17 14:50:51

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: bnx2 fails to compile on parisc because of missing get_dma_ops()

On Thu, 17 Jun 2010 07:36:53 -0700 (PDT)
David Miller <[email protected]> wrote:

> From: FUJITA Tomonori <[email protected]>
> Date: Thu, 17 Jun 2010 21:21:13 +0900
>
> > On Wed, 16 Jun 2010 23:24:44 -0700
> > "Michael Chan" <[email protected]> wrote:
> >
> >> David, why is dma_is_consistent() always returning 1 on sparc? The
> >> streaming DMA is not consistent.
> >
> > I think that there are some confusion about dma_is_consistent(). Some
> > architectures think that dma_is_consistent() is supposed to return 1
> > if they can allocate coherent memory (note that some architectures
> > can't allocate coherent memory).
>
> Right, and that's why it's defined this way.
>
> If the desired meaning is different, just me know and I'll fix the
> sparc definition.

I think that there are some other architectures do the same. We need
to make sure that all the architectures define dma_is_consistent() in
the same meaning if drivers need it. However, I'm not sure we really
need dma_is_consistent(). There is only one user of it (and I think we
could remove it).

In the bnx2 case, we can simply prefetch on all the archs (or just
remove the optimization).

2010-06-17 14:50:56

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: bnx2 fails to compile on parisc because of missing get_dma_ops()

Oops, some typos.

On Thu, 17 Jun 2010 23:05:59 +0900
FUJITA Tomonori <[email protected]> wrote:

> On Thu, 17 Jun 2010 06:30:39 -0700
> "Michael Chan" <[email protected]> wrote:
>
> > James Bottomley wrote:
> >
> > > On Thu, 2010-06-17 at 05:54 -0700, Michael Chan wrote:
> > > > This prefetch improves performance noticeably when the driver is
> > > > handling incoming 64-byte packets at a sustained rate.
> > >
> > > So why not do it unconditionally? The worst that can happen
> > > is that you
> > > pull in a stale cache line which will get cleaned in the
> > > dma_sync, thus
> > > slightly degrading performance on incoherent architectures.
> >
> > The original patch was an unconditional prefetch. There was
> > some discussion that it might not be correct if the DMA wasn't
> > sync'ed yet on some archs. If the concensus is that it is ok to
> > do so, that would be the simplest solution.
>
> As James said, it just adds useless prefetch on incoherent
> architectures. sync_single_for_cpu is called later so we can see the
> correct data. One useless prefetch is unlikely to lead performance
> drop.
>
> You might prefer this v2.
>
> =
> From: FUJITA Tomonori <[email protected]>
> Subject: [PATCH v2] bnx2: fix dma_get_ops compilation breakage
>
> This removes dma_get_ops() prefetch optimization in bnx2.
>
> bnx2 uses dma_get_ops() to see if dma_sync_single_for_cpu() is
> noop. bnx2 does prefetch if it's noop.
>
> But dma_get_ops() isn't available on all the architectures (only the
> architectures that uses dma_map_ops struct have it). Using
> dma_get_ops() in drivers leads to compilation breakage on many
> archtectures.

s/archtectures/architectures/

> This patch removes dma_get_ops() and changes bnx2 to do prefetch on
> all the architectures. This adds useless prefetch on incoherent
~~~~~~~~~~
s/incoherent/non-coherent/
(thanks to Alan)

> architectures but this is harmless. It is also unlikely to cause the
> performance drop.

2010-06-17 15:31:10

by Paul Mundt

[permalink] [raw]
Subject: Re: bnx2 fails to compile on parisc because of missing get_dma_ops()

On Thu, Jun 17, 2010 at 11:50:35PM +0900, FUJITA Tomonori wrote:
> On Thu, 17 Jun 2010 07:36:53 -0700 (PDT)
> David Miller <[email protected]> wrote:
>
> > From: FUJITA Tomonori <[email protected]>
> > Date: Thu, 17 Jun 2010 21:21:13 +0900
> >
> > > On Wed, 16 Jun 2010 23:24:44 -0700
> > > "Michael Chan" <[email protected]> wrote:
> > >
> > >> David, why is dma_is_consistent() always returning 1 on sparc? The
> > >> streaming DMA is not consistent.
> > >
> > > I think that there are some confusion about dma_is_consistent(). Some
> > > architectures think that dma_is_consistent() is supposed to return 1
> > > if they can allocate coherent memory (note that some architectures
> > > can't allocate coherent memory).
> >
> > Right, and that's why it's defined this way.
> >
> > If the desired meaning is different, just me know and I'll fix the
> > sparc definition.
>
> I think that there are some other architectures do the same. We need
> to make sure that all the architectures define dma_is_consistent() in
> the same meaning if drivers need it. However, I'm not sure we really
> need dma_is_consistent(). There is only one user of it (and I think we
> could remove it).
>
> In the bnx2 case, we can simply prefetch on all the archs (or just
> remove the optimization).

I think its worthwhile keeping, especially since the consistency can vary
on a per struct device level. If there's a benefit with these sorts of
prefetch micro-optimizations in drivers when it doesn't cost us that much
to provide the hint, I don't really see the harm. If dma_is_consistent()
is suddenly supposed to take on other meanings, or it's supposed to mean
something entirely different, then this is something we should deal with
separately.

I don't see any harm in letting drivers know whether we can support
consistent DMA allocs for a given struct device or not though, even if
the micro-optimization is marginal at best.

At least I've conditionalized the definition on SH, and it seems other
archictures have done so too. It's not clear what we'd gain from throwing
that away as long as they're generally in agreement on what it means.

2010-06-17 15:52:49

by David Miller

[permalink] [raw]
Subject: Re: bnx2 fails to compile on parisc because of missing get_dma_ops()

From: FUJITA Tomonori <[email protected]>
Date: Thu, 17 Jun 2010 23:50:36 +0900

> Oops, some typos.

I've applied the patch with the typos fixed, thanks!

2010-06-22 06:30:27

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: bnx2 fails to compile on parisc because of missing get_dma_ops()

On Fri, 18 Jun 2010 00:30:51 +0900
Paul Mundt <[email protected]> wrote:

> On Thu, Jun 17, 2010 at 11:50:35PM +0900, FUJITA Tomonori wrote:
> > On Thu, 17 Jun 2010 07:36:53 -0700 (PDT)
> > David Miller <[email protected]> wrote:
> >
> > > From: FUJITA Tomonori <[email protected]>
> > > Date: Thu, 17 Jun 2010 21:21:13 +0900
> > >
> > > > On Wed, 16 Jun 2010 23:24:44 -0700
> > > > "Michael Chan" <[email protected]> wrote:
> > > >
> > > >> David, why is dma_is_consistent() always returning 1 on sparc? The
> > > >> streaming DMA is not consistent.
> > > >
> > > > I think that there are some confusion about dma_is_consistent(). Some
> > > > architectures think that dma_is_consistent() is supposed to return 1
> > > > if they can allocate coherent memory (note that some architectures
> > > > can't allocate coherent memory).
> > >
> > > Right, and that's why it's defined this way.
> > >
> > > If the desired meaning is different, just me know and I'll fix the
> > > sparc definition.
> >
> > I think that there are some other architectures do the same. We need
> > to make sure that all the architectures define dma_is_consistent() in
> > the same meaning if drivers need it. However, I'm not sure we really
> > need dma_is_consistent(). There is only one user of it (and I think we
> > could remove it).
> >
> > In the bnx2 case, we can simply prefetch on all the archs (or just
> > remove the optimization).
>
> I think its worthwhile keeping, especially since the consistency can vary
> on a per struct device level. If there's a benefit with these sorts of
> prefetch micro-optimizations in drivers when it doesn't cost us that much
> to provide the hint, I don't really see the harm. If dma_is_consistent()
> is suddenly supposed to take on other meanings, or it's supposed to mean
> something entirely different, then this is something we should deal with
> separately.
>
> I don't see any harm in letting drivers know whether we can support
> consistent DMA allocs for a given struct device or not though, even if
> the micro-optimization is marginal at best.

I'm happier with exporting less DMA APIs to drivers because looks like
new original ways to use the APIs wrongly can be always invented.


> At least I've conditionalized the definition on SH, and it seems other
> archictures have done so too. It's not clear what we'd gain from throwing

>From a quick look, except for SH and POWERPC (and always-coherent
architectures), everyone does differently?

There are architectures that need to turn off the CPU cache for
coherent memory, I can't find none of them that see if an address is
coherent or not in dma_is_consistent().

As I wrote, there is only one user of this API and we can remove it
easily. Then I'm not sure it's worth fixing dma_is_consistent() in
many architectures. I prefer to add this to
feature-removal-schedule.txt to see if driver writers oppose.


> that away as long as they're generally in agreement on what it means.

2010-06-22 17:14:56

by Grant Grundler

[permalink] [raw]
Subject: Re: bnx2 fails to compile on parisc because of missing get_dma_ops()

On Tue, Jun 22, 2010 at 03:30:08PM +0900, FUJITA Tomonori wrote:
...
> > I don't see any harm in letting drivers know whether we can support
> > consistent DMA allocs for a given struct device or not though, even if
> > the micro-optimization is marginal at best.
>
> I'm happier with exporting less DMA APIs to drivers because looks like
> new original ways to use the APIs wrongly can be always invented.

Agree.

...
> There are architectures that need to turn off the CPU cache for
> coherent memory, I can't find none of them that see if an address is
> coherent or not in dma_is_consistent().

parisc "knows" primarily based on chipset and then checks CPU model.
We hook in the correct dma_ops early in boot before any device drivers
are probed.

hth,
grant

2010-06-22 17:26:10

by James Bottomley

[permalink] [raw]
Subject: Re: bnx2 fails to compile on parisc because of missing get_dma_ops()

On Tue, 2010-06-22 at 15:30 +0900, FUJITA Tomonori wrote:
> On Fri, 18 Jun 2010 00:30:51 +0900
> Paul Mundt <[email protected]> wrote:
>
> > On Thu, Jun 17, 2010 at 11:50:35PM +0900, FUJITA Tomonori wrote:
> > > On Thu, 17 Jun 2010 07:36:53 -0700 (PDT)
> > > David Miller <[email protected]> wrote:
> > >
> > > > From: FUJITA Tomonori <[email protected]>
> > > > Date: Thu, 17 Jun 2010 21:21:13 +0900
> > > >
> > > > > On Wed, 16 Jun 2010 23:24:44 -0700
> > > > > "Michael Chan" <[email protected]> wrote:
> > > > >
> > > > >> David, why is dma_is_consistent() always returning 1 on sparc? The
> > > > >> streaming DMA is not consistent.
> > > > >
> > > > > I think that there are some confusion about dma_is_consistent(). Some
> > > > > architectures think that dma_is_consistent() is supposed to return 1
> > > > > if they can allocate coherent memory (note that some architectures
> > > > > can't allocate coherent memory).
> > > >
> > > > Right, and that's why it's defined this way.
> > > >
> > > > If the desired meaning is different, just me know and I'll fix the
> > > > sparc definition.
> > >
> > > I think that there are some other architectures do the same. We need
> > > to make sure that all the architectures define dma_is_consistent() in
> > > the same meaning if drivers need it. However, I'm not sure we really
> > > need dma_is_consistent(). There is only one user of it (and I think we
> > > could remove it).
> > >
> > > In the bnx2 case, we can simply prefetch on all the archs (or just
> > > remove the optimization).
> >
> > I think its worthwhile keeping, especially since the consistency can vary
> > on a per struct device level. If there's a benefit with these sorts of
> > prefetch micro-optimizations in drivers when it doesn't cost us that much
> > to provide the hint, I don't really see the harm. If dma_is_consistent()
> > is suddenly supposed to take on other meanings, or it's supposed to mean
> > something entirely different, then this is something we should deal with
> > separately.
> >
> > I don't see any harm in letting drivers know whether we can support
> > consistent DMA allocs for a given struct device or not though, even if
> > the micro-optimization is marginal at best.
>
> I'm happier with exporting less DMA APIs to drivers because looks like
> new original ways to use the APIs wrongly can be always invented.
>
>
> > At least I've conditionalized the definition on SH, and it seems other
> > archictures have done so too. It's not clear what we'd gain from throwing
>
> >From a quick look, except for SH and POWERPC (and always-coherent
> architectures), everyone does differently?
>
> There are architectures that need to turn off the CPU cache for
> coherent memory, I can't find none of them that see if an address is
> coherent or not in dma_is_consistent().

Yes, I fear ... parisc. We have a class of machines where this is the
only way (and we also have a class of machines where the cache disable
doesn't work properly and we can't manufacture coherent memory at all).
All our pa2.0 systems are fully integrated between the iommu cache and
the CPU cache, so they can manufacture coherent memory properly, but the
pa1.0 systems are a mixed bag of dirty tricks.

> As I wrote, there is only one user of this API and we can remove it
> easily. Then I'm not sure it's worth fixing dma_is_consistent() in
> many architectures. I prefer to add this to
> feature-removal-schedule.txt to see if driver writers oppose.

Let me check our two drivers: lasi and 53c700; they're the only ones we
support on the architecture that can't do any coherence. I think we
don't need to tell because the dma_sync_cache calls which replace
coherent memory handling are indirected on the platform so we don't need
a global dma_is_coherent() flag.

James

2010-06-23 00:38:45

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: bnx2 fails to compile on parisc because of missing get_dma_ops()

On Tue, 22 Jun 2010 12:26:04 -0500
James Bottomley <[email protected]> wrote:

> > As I wrote, there is only one user of this API and we can remove it
> > easily. Then I'm not sure it's worth fixing dma_is_consistent() in
> > many architectures. I prefer to add this to
> > feature-removal-schedule.txt to see if driver writers oppose.
>
> Let me check our two drivers: lasi and 53c700; they're the only ones we
> support on the architecture that can't do any coherence. I think we
> don't need to tell because the dma_sync_cache calls which replace
> coherent memory handling are indirected on the platform so we don't need
> a global dma_is_coherent() flag.

There is only one place where 53c700 uses dma_is_consistent() (lasi
doesn't use it):

BUG_ON(!dma_is_consistent(hostdata->dev, pScript) && L1_CACHE_BYTES < dma_get_cache_alignment());

I think that we can remove the above checking since the existing
parisc systems that can't allocate coherent memory pass this checking.

53c700 and lasi call dma_cache_sync() unconditionally so we can live
without dma_is_consistent().