2014-10-13 10:52:58

by Lluís Batlle i Rossell

Subject: Regarding tx-nocache-copy in the Sheevaplug

Hello,

on the 7th of January 2014 this patch was applied:
https://lkml.org/lkml/2014/1/7/307

[PATCH v2] net: Do not enable tx-nocache-copy by default

On the Sheevaplug (ARM Feroceon 88FR131 from Marvell) this caused packets to be
sent corrupted. I think this machine has something special about its cache.

Re-enabling tx-nocache-copy (as it was before the patch) makes the
transfers work fine again. I think that most people who encounter this problem
disable tx offload completely instead of re-enabling this setting.

Is this an ARM kernel problem regarding this platform?

Thank you,
Lluís
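
For readers who want to check which offload settings are active before and after toggling tx-nocache-copy, here is a minimal sketch in Python. It assumes the `ethtool` utility is installed and that `ethtool -k` prints one `name: on|off` pair per line; the function names and the interface name `eth0` in the usage note are illustrative and do not come from the thread.

```python
import subprocess


def parse_features(ethtool_output):
    """Turn the text printed by `ethtool -k` into a {feature: bool} dict."""
    features = {}
    for line in ethtool_output.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a "name: value" line
        words = value.split()
        # Skip the "Features for ethX:" header, which has no value.
        if words and words[0] in ("on", "off"):
            features[name.strip()] = (words[0] == "on")
    return features


def read_features(ifname):
    """Run `ethtool -k <ifname>` and return the parsed feature states."""
    result = subprocess.run(
        ["ethtool", "-k", ifname],
        check=True, capture_output=True, text=True,
    )
    return parse_features(result.stdout)


def set_feature(ifname, feature, enabled):
    """Run `ethtool -K <ifname> <feature> on|off` (normally needs root)."""
    state = "on" if enabled else "off"
    subprocess.run(["ethtool", "-K", ifname, feature, state], check=True)
```

On the affected machine one might call `set_feature("eth0", "tx-nocache-copy", True)` and then confirm the change with `read_features("eth0")["tx-nocache-copy"]`.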


2014-10-13 12:26:17

by Eric Dumazet

Subject: Re: Regarding tx-nocache-copy in the Sheevaplug

On Mon, 2014-10-13 at 12:52 +0200, Lluís Batlle i Rossell wrote:
> Hello,
>
> on the 7th of January 2014 ths patch was applied:
> https://lkml.org/lkml/2014/1/7/307
>
> [PATCH v2] net: Do not enable tx-nocache-copy by default
>
> In the Sheevaplug (ARM Feroceon 88FR131 from Marvell) this made packets to be
> sent corrupted. I think this machine has something special about the cache.
>
> Enabling back this tx-nocache-copy (as it used to be before the patch) the
> transfers work fine again. I think that most people, encountering this problem,
> completely disable the tx offload instead of enabling back this setting.
>
> Is this an ARM kernel problem regarding this platform?

Which NIC and driver is this exactly ?


2014-10-13 12:32:29

by Lluís Batlle i Rossell

Subject: Re: Regarding tx-nocache-copy in the Sheevaplug

On Mon, Oct 13, 2014 at 05:26:11AM -0700, Eric Dumazet wrote:
> On Mon, 2014-10-13 at 12:52 +0200, Lluís Batlle i Rossell wrote:
> > Hello,
> >
> > on the 7th of January 2014 ths patch was applied:
> > https://lkml.org/lkml/2014/1/7/307
> >
> > [PATCH v2] net: Do not enable tx-nocache-copy by default
> >
> > In the Sheevaplug (ARM Feroceon 88FR131 from Marvell) this made packets to be
> > sent corrupted. I think this machine has something special about the cache.
> >
> > Enabling back this tx-nocache-copy (as it used to be before the patch) the
> > transfers work fine again. I think that most people, encountering this problem,
> > completely disable the tx offload instead of enabling back this setting.
> >
> > Is this an ARM kernel problem regarding this platform?
>
> Which NIC and driver is this exactly ?

According to dmesg in 3.10.1:
[ 7.858872] mv643xx_eth: MV-643xx 10/100/1000 ethernet driver version 1.4
[ 7.866001] mv643xx_eth_port mv643xx_eth_port.0 eth0: port 0 with MAC address 00:50:43:01:d1:bb

Regards,
Lluís.

2014-10-13 14:22:32

by Andrew Lunn

Subject: Re: Regarding tx-nocache-copy in the Sheevaplug

On Mon, Oct 13, 2014 at 12:52:46PM +0200, Lluís Batlle i Rossell wrote:
> Hello,
>
> on the 7th of January 2014 ths patch was applied:
> https://lkml.org/lkml/2014/1/7/307
>
> [PATCH v2] net: Do not enable tx-nocache-copy by default
>
> In the Sheevaplug (ARM Feroceon 88FR131 from Marvell) this made packets to be
> sent corrupted. I think this machine has something special about the cache.

Hi Lluís

Please could you describe your test setup. I would like to try to
reproduce the problem. I have a machine based on kirkwood 6282 and the
same ethernet.

Thanks
Andrew

2014-10-13 14:32:02

by Lluís Batlle i Rossell

Subject: Re: Regarding tx-nocache-copy in the Sheevaplug

With tx offload enabled and tx-nocache-copy disabled, making the machine *send* a
lot of ssh traffic (sftp, for example) makes ssh fail its HMAC check. It's quite
easy to reproduce here.

As for the hardware, it's an old sheevaplug board.
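
A way to reproduce this kind of corruption without ssh is to send a known byte stream over plain TCP and compare digests at both ends. The sketch below is only an illustration in Python; the function names are hypothetical. It relies on the assumption that, with tx checksum offload active, a payload corrupted before transmission can still carry a valid TCP checksum, so an end-to-end digest is needed to notice it (which is presumably why ssh's HMAC check is what trips).

```python
import hashlib
import os
import socket
import threading

CHUNK = 64 * 1024  # bytes per send/recv call


def serve_once(listener, result):
    """Accept one connection, hash everything received, store the digest."""
    conn, _ = listener.accept()
    digest = hashlib.sha256()
    with conn:
        while True:
            data = conn.recv(CHUNK)
            if not data:  # peer closed the connection
                break
            digest.update(data)
    result.append(digest.hexdigest())


def send_and_hash(host, port, total_bytes):
    """Send total_bytes of random data; return the sender-side digest."""
    digest = hashlib.sha256()
    with socket.create_connection((host, port)) as sock:
        sent = 0
        while sent < total_bytes:
            block = os.urandom(min(CHUNK, total_bytes - sent))
            digest.update(block)
            sock.sendall(block)
            sent += len(block)
    return digest.hexdigest()


def loopback_self_test(total_bytes=1 << 20):
    """Run both ends on 127.0.0.1. This only validates the script itself,
    since loopback traffic never goes through the NIC."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick one
    listener.listen(1)
    port = listener.getsockname()[1]
    result = []
    receiver = threading.Thread(target=serve_once, args=(listener, result))
    receiver.start()
    sent_digest = send_and_hash("127.0.0.1", port, total_bytes)
    receiver.join()
    listener.close()
    return sent_digest == result[0]
```

For a real test, `serve_once` would run on a second machine and `send_and_hash` on the Sheevaplug, with the two digests compared by hand; a mismatch means the payload changed somewhere on the transmit path.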

On Mon, Oct 13, 2014 at 04:21:56PM +0200, Andrew Lunn wrote:
> On Mon, Oct 13, 2014 at 12:52:46PM +0200, Lluís Batlle i Rossell wrote:
> > Hello,
> >
> > on the 7th of January 2014 ths patch was applied:
> > https://lkml.org/lkml/2014/1/7/307
> >
> > [PATCH v2] net: Do not enable tx-nocache-copy by default
> >
> > In the Sheevaplug (ARM Feroceon 88FR131 from Marvell) this made packets to be
> > sent corrupted. I think this machine has something special about the cache.
>
> Hi Lluís
>
> Please could you describe your test setup. I would like to try to
> reproduce the problem. I have a machine based on kirkwood 6282 and the
> same ethernet.
>
> Thanks
> Andrew

2014-10-13 14:49:24

by Eric Dumazet

Subject: Re: Regarding tx-nocache-copy in the Sheevaplug

On Mon, 2014-10-13 at 16:31 +0200, Lluís Batlle i Rossell wrote:
> Enabling tx offload and disabling tx-nocache-copy, making the machine *send* a
> lot of ssh traffic (sftp for example) makes ssh fail HMAC. It's quite easy to
> reproduce here.
>
> As for the hardware, it's an old sheevaplug board.


Have you tried disabling TSO only, and are you using the latest kernel ?

Ezequiel Garcia added a lot of changes recently.

2014-10-13 15:48:28

by Lluís Batlle i Rossell

Subject: Re: Regarding tx-nocache-copy in the Sheevaplug

On Mon, Oct 13, 2014 at 07:49:19AM -0700, Eric Dumazet wrote:
> On Mon, 2014-10-13 at 16:31 +0200, Lluís Batlle i Rossell wrote:
> > Enabling tx offload and disabling tx-nocache-copy, making the machine *send* a
> > lot of ssh traffic (sftp for example) makes ssh fail HMAC. It's quite easy to
> > reproduce here.
> >
> > As for the hardware, it's an old sheevaplug board.
>
>
> Have you tried disabling TSO only, and are you using the latest kernel ?
>
> Ezequiel Garcia added lot of changes recently.
>
>

Is TSO TCP segmentation offload? It's disabled. The kernel is 3.16.3 (debian).
https://packages.debian.org/testing/kernel/linux-image-3.16-2-kirkwood

2014-10-15 21:57:07

by Benjamin Poirier

Subject: Re: Regarding tx-nocache-copy in the Sheevaplug

On 2014/10/13 12:52, Lluís Batlle i Rossell wrote:
> Hello,
>
> on the 7th of January 2014 ths patch was applied:
> https://lkml.org/lkml/2014/1/7/307
>
> [PATCH v2] net: Do not enable tx-nocache-copy by default
>
> In the Sheevaplug (ARM Feroceon 88FR131 from Marvell) this made packets to be
> sent corrupted. I think this machine has something special about the cache.
>
> Enabling back this tx-nocache-copy (as it used to be before the patch) the
> transfers work fine again. I think that most people, encountering this problem,
> completely disable the tx offload instead of enabling back this setting.
>
> Is this an ARM kernel problem regarding this platform?

This is odd, only x86 defines ARCH_HAS_NOCACHE_UACCESS. On arm,
skb_do_copy_data_nocache() should end up using __copy_from_user()
regardless of tx-nocache-copy.

2014-10-15 22:45:30

by Eric Dumazet

Subject: Re: Regarding tx-nocache-copy in the Sheevaplug

On Wed, 2014-10-15 at 14:57 -0700, Benjamin Poirier wrote:
> On 2014/10/13 12:52, Lluís Batlle i Rossell wrote:
> > Hello,
> >
> > on the 7th of January 2014 ths patch was applied:
> > https://lkml.org/lkml/2014/1/7/307
> >
> > [PATCH v2] net: Do not enable tx-nocache-copy by default
> >
> > In the Sheevaplug (ARM Feroceon 88FR131 from Marvell) this made packets to be
> > sent corrupted. I think this machine has something special about the cache.
> >
> > Enabling back this tx-nocache-copy (as it used to be before the patch) the
> > transfers work fine again. I think that most people, encountering this problem,
> > completely disable the tx offload instead of enabling back this setting.
> >
> > Is this an ARM kernel problem regarding this platform?
>
> This is odd, only x86 defines ARCH_HAS_NOCACHE_UACCESS. On arm,
> skb_do_copy_data_nocache() should end up using __copy_from_user()
> regardless of tx-nocache-copy.

kmap_atomic()/kunmap_atomic() is missing, so we lack
__cpuc_flush_dcache_area() operations.




2014-10-16 17:34:08

by Benjamin Poirier

Subject: Re: Regarding tx-nocache-copy in the Sheevaplug

On 2014/10/15 15:45, Eric Dumazet wrote:
> On Wed, 2014-10-15 at 14:57 -0700, Benjamin Poirier wrote:
> > On 2014/10/13 12:52, Lluís Batlle i Rossell wrote:
> > > Hello,
> > >
> > > on the 7th of January 2014 ths patch was applied:
> > > https://lkml.org/lkml/2014/1/7/307
> > >
> > > [PATCH v2] net: Do not enable tx-nocache-copy by default
> > >
> > > In the Sheevaplug (ARM Feroceon 88FR131 from Marvell) this made packets to be
> > > sent corrupted. I think this machine has something special about the cache.
> > >
> > > Enabling back this tx-nocache-copy (as it used to be before the patch) the
> > > transfers work fine again. I think that most people, encountering this problem,
> > > completely disable the tx offload instead of enabling back this setting.
> > >
> > > Is this an ARM kernel problem regarding this platform?
> >
> > This is odd, only x86 defines ARCH_HAS_NOCACHE_UACCESS. On arm,
> > skb_do_copy_data_nocache() should end up using __copy_from_user()
> > regardless of tx-nocache-copy.
>
> kmap_atomic()/kunmap_atomic() is missing, so we lack
> __cpuc_flush_dcache_area() operations.
>

You lost me there.
1) I don't see the link
2) It seems kmap_atomic and so on are there:
$ grep kmap_atomic System.map-3.16-2-kirkwood
c0014838 T kmap_atomic
c001491c T kmap_atomic_pfn
c00149a4 T kmap_atomic_to_page

MACH_KIRKWOOD selects CPU_FEROCEON which has
__cpuc_flush_dcache_area ->
cpu_cache.flush_kern_dcache_area ->
feroceon_flush_kern_dcache_area

2014-10-16 17:46:41

by Lluís Batlle i Rossell

Subject: Re: Regarding tx-nocache-copy in the Sheevaplug

On Thu, Oct 16, 2014 at 10:34:01AM -0700, Benjamin Poirier wrote:
> On 2014/10/15 15:45, Eric Dumazet wrote:
> > On Wed, 2014-10-15 at 14:57 -0700, Benjamin Poirier wrote:
> > > On 2014/10/13 12:52, Lluís Batlle i Rossell wrote:
> > > > Hello,
> > > >
> > > > on the 7th of January 2014 ths patch was applied:
> > > > https://lkml.org/lkml/2014/1/7/307
> > > >
> > > > [PATCH v2] net: Do not enable tx-nocache-copy by default
> > > >
> > > > In the Sheevaplug (ARM Feroceon 88FR131 from Marvell) this made packets to be
> > > > sent corrupted. I think this machine has something special about the cache.
> > > >
> > > > Enabling back this tx-nocache-copy (as it used to be before the patch) the
> > > > transfers work fine again. I think that most people, encountering this problem,
> > > > completely disable the tx offload instead of enabling back this setting.
> > > >
> > > > Is this an ARM kernel problem regarding this platform?
> > >
> > > This is odd, only x86 defines ARCH_HAS_NOCACHE_UACCESS. On arm,
> > > skb_do_copy_data_nocache() should end up using __copy_from_user()
> > > regardless of tx-nocache-copy.
> >
> > kmap_atomic()/kunmap_atomic() is missing, so we lack
> > __cpuc_flush_dcache_area() operations.
> >
>
> You lost me there.
> 1) I don't see the link
> 2) It seems kmap_atomic and so on are there:
> $ grep kmap_atomic System.map-3.16-2-kirkwood
> c0014838 T kmap_atomic
> c001491c T kmap_atomic_pfn
> c00149a4 T kmap_atomic_to_page
>
> MACH_KIRKWOOD selects CPU_FEROCEON which has
> __cpuc_flush_dcache_area ->
> cpu_cache.flush_kern_dcache_area ->
> feroceon_flush_kern_dcache_area

Hello all,

it seems I was a bit wrong: although re-enabling tx-nocache-copy makes the
tx errors (ssh complaining about HMAC) happen much less often, they still
happen. It seems that something introduced in a recent kernel broke
tx offload.

I have no idea what it can be, but from 2.6 until at least 3.10 the network
driver worked fine with tx offload on this sheevaplug board.

Regards,
Lluís.

2014-10-16 17:48:28

by Eric Dumazet

Subject: Re: Regarding tx-nocache-copy in the Sheevaplug

On Thu, 2014-10-16 at 10:34 -0700, Benjamin Poirier wrote:
> On 2014/10/15 15:45, Eric Dumazet wrote:

> > kmap_atomic()/kunmap_atomic() is missing, so we lack
> > __cpuc_flush_dcache_area() operations.
> >
>
> You lost me there.
> 1) I don't see the link
> 2) It seems kmap_atomic and so on are there:
> $ grep kmap_atomic System.map-3.16-2-kirkwood
> c0014838 T kmap_atomic
> c001491c T kmap_atomic_pfn
> c00149a4 T kmap_atomic_to_page
>
> MACH_KIRKWOOD selects CPU_FEROCEON which has
> __cpuc_flush_dcache_area ->
> cpu_cache.flush_kern_dcache_area ->
> feroceon_flush_kern_dcache_area

I meant to put a '?' instead of a '.'

Note that tcp does a copy, using :

2014-10-17 20:55:35

by Benjamin Poirier

Subject: Re: Regarding tx-nocache-copy in the Sheevaplug

On 2014/10/16 19:46, Lluís Batlle i Rossell wrote:
[...]
>
> Hello all,
>
> it seems I was a bit wrong - although enabling back tx-nocache-copy makes the
> tx-errors happen much less often (ssh complaining about HMAC), they still
> happen. It seems that something was introduced in some recent kernels that broke
> the tx offload.
>
> I have no idea what it can be, but since 2.6 until at least 3.10 the network
> driver worked fine with tx offload in this sheevaplug board.

It's not the most pleasant alternative but if you can be sure enough
whether the problem is occurring or not, you could try bisecting,
possibly limiting the bisection to the mv643xx driver:

$ git bisect start v3.16.3 v3.10 -- drivers/net/ethernet/marvell/mv643xx_eth.c
Bisecting: 16 revisions left to test after this (roughly 4 steps)

The problem might be outside of the driver though.
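
Because the corruption reported in this thread is intermittent, a single clean transfer is weak evidence that a revision is good. The following Python sketch shows one way to turn several trial results into a bisect verdict. The exit codes follow the `git bisect run` convention (0 for good, 125 for skip, 1 to 127 otherwise for bad); the function name and the `min_trials` threshold are assumptions made for illustration.

```python
# Exit codes understood by `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125


def verdict(trial_results, min_trials=5):
    """Map per-transfer pass/fail booleans to a bisect exit code.

    Any failed transfer marks the revision bad. A revision is only
    called good after at least min_trials clean transfers; with fewer
    trials the result is inconclusive and the revision is skipped.
    """
    if any(not ok for ok in trial_results):
        return BAD
    if len(trial_results) < min_trials:
        return SKIP
    return GOOD
```

When each step needs a kernel build and a reboot of the board, the same rule can be applied by hand with `git bisect good`, `git bisect bad` and `git bisect skip`.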