2018-10-19 15:24:53

by Richard Genoud

Subject: CRC errors between mvneta and macb

Hi all,

I've been struggling with a strange behavior between a clearfog-pro
and an at91sam9g35-ek board.

TL;DR: ethernet frames are received with a CRC error on the clearfog
ETH0, but seem perfectly all right. Add a switch between the 2
boards, and the ethernet frames are accepted.


I've got a clearfog pro and an at91sam9g35-ek, both with kernel
4.19-rc8.
An RJ45 cable is plugged between the clearfog (on the solo port (eth0))
and the g35-ek board (100Mb/s).

They are configured with autoneg and a fixed IP address.

I start the 2 boards and, from the clearfog, I ping the g35-ek.
If it succeeds, it keeps succeeding until the g35-ek is rebooted.
If it fails, it also keeps failing until the g35-ek is rebooted.

Rebooting the clearfog doesn't change anything.
Resetting the g35-ek PHY (mii-diag -R) doesn't change anything either.

When the ping fails, it's actually because the mvneta returns a CRC
error:
mvneta f1070000.ethernet eth0: bad rx status 0cc10000 (crc error), size=66
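
For reference, the status word in that message can be decoded by hand. The sketch below is a minimal Python illustration, assuming the RX descriptor bit layout as I recall it from the MVNETA_RXD_* defines in drivers/net/ethernet/marvell/mvneta.c (error summary in bit 16, error code in bits 17-18). The function name and the table are made up for the example and are not taken from Marvell documentation.

```python
# Assumed layout, recalled from the mvneta driver source (not from a datasheet).
RXD_ERR_SUMMARY = 1 << 16
RXD_ERR_CODE_MASK = (1 << 17) | (1 << 18)

# Hypothetical lookup table for the 2-bit error code field.
ERR_CODES = {
    0: "crc error",
    1 << 17: "overrun error",
    1 << 18: "max frame length error",
    (1 << 17) | (1 << 18): "resource error",
}


def decode_rx_status(status):
    """Return the error name, or None if the error summary bit is clear."""
    if not status & RXD_ERR_SUMMARY:
        return None
    return ERR_CODES[status & RXD_ERR_CODE_MASK]


print(decode_rx_status(0x0CC10000))
```

With 0cc10000, bit 16 is set and the error code field is 0, which is the value the driver reports as a CRC error.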

And, if I plug the RJ45 cable between the clearfog's matrix and the
g35-ek, everything works well, always.

To ease the debugging, instead of a ping I used:
https://gist.github.com/austinmarton/1922600
from the g35-ek in order to have the same frame every time.
So I checked the ethernet CRC with the scope (on the g35-ek PHY TXD[0-1]
(DM9161A)).
And the CRC is all right.

I also managed to trigger this bug by simply doing:
rmmod macb ; insmod macb.ko on the g35-ek.
Then, frames are accepted, or not.

I checked all PHY/macb register values on the g35-ek, they are the same.

The only thing I could find is related to the TXCLK on the PHY.

When there's a CRC error, the TXCLK has its polarity inverted...
That's a clue !

But this TXCLK (25MHz) is not used on the g35-ek.
Only the REFCLK/XT2 (50MHz) is used to synchronise the PHY and the macb.
So I guess that the TXCLK has a role to play to generate TX+/TX-

And I also guess that when the signal is converted back on the clearfog,
the clock polarity is somehow responsible for the CRC errors.

I was about to put my scope on the clearfog's PHY to see what it
received, but Marvell's documentation is not as freely available as
Atmel's, so I'm quite stuck at this point.

Any idea ?

NB: I also managed to trigger this with an at91sam9g20-ek (but not with
a sama5d2)


Regards,
Richard


2018-10-19 15:45:29

by Willy Tarreau

Subject: Re: CRC errors between mvneta and macb

On Fri, Oct 19, 2018 at 05:15:03PM +0200, Richard Genoud wrote:
> When there's a CRC error, the TXCLK has its polarity inverted...
> That's a clue !
>
> But this TXCLK (25MHz) is not used on the g35-ek.
> Only the REFCLK/XT2 (50MHz) is used to synchronise the PHY and the macb.
> So I guess that the TXCLK has a role to play to generate TX+/TX-

Well, just a stupid idea, maybe when this signal is inverted, the TX+/TX-
are desynchronized by half a clock and are not always properly interpreted
on the other side ?

Willy

2018-10-22 06:54:07

by Richard Genoud

Subject: Re: CRC errors between mvneta and macb

On 19/10/2018 at 17:44, Willy Tarreau wrote:
> On Fri, Oct 19, 2018 at 05:15:03PM +0200, Richard Genoud wrote:
>> When there's a CRC error, the TXCLK has its polarity inverted...
>> That's a clue !
>>
>> But this TXCLK (25MHz) is not used on the g35-ek.
>> Only the REFCLK/XT2 (50MHz) is used to synchronise the PHY and the macb.
>> So I guess that the TXCLK has a role to play to generate TX+/TX-
>
> Well, just a stupid idea, maybe when this signal is inverted, the TX+/TX-
> are desynchronized by half a clock and are not always properly interpreted
> on the other side ?
>
> Willy
>

I must admit that I'm not familiar with the PHY internals, I'll have to
dig into that.

Richard.

2018-10-22 15:16:17

by Richard Genoud

Subject: Re: CRC errors between mvneta and macb

On 22/10/2018 at 08:51, Richard Genoud wrote:
> On 19/10/2018 at 17:44, Willy Tarreau wrote:
>> On Fri, Oct 19, 2018 at 05:15:03PM +0200, Richard Genoud wrote:
>>> When there's a CRC error, the TXCLK has its polarity inverted...
>>> That's a clue !
>>>
>>> But this TXCLK (25MHz) is not used on the g35-ek.
>>> Only the REFCLK/XT2 (50MHz) is used to synchronise the PHY and the macb.
>>> So I guess that the TXCLK has a role to play to generate TX+/TX-
>>
>> Well, just a stupid idea, maybe when this signal is inverted, the TX+/TX-
>> are desynchronized by half a clock and are not always properly interpreted
>> on the other side ?
>>
>> Willy
>>
>
> I must admit that I'm not familiar with the PHY internals, I'll have to
> dig into that.
>
> Richard.
>

I dug more on the subject, and I think I found what Marvell's PHY/MAC
doesn't like.

First of all, I forced the link to 10Mbits full duplex on both sides,
as the Manchester code is "easier" to decode than the 4B5B-MLT3 used for
fast ethernet.
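
To illustrate why Manchester is easier to read off a scope trace: every bit carries a transition in the middle of its bit time, so each bit can be read locally without tracking any state. A minimal sketch, assuming the IEEE 802.3 convention (1 is low-to-high, 0 is high-to-low) and using made-up helper names:

```python
def manchester_encode(bits):
    """Turn each bit into two half-bit levels: 1 -> (0, 1), 0 -> (1, 0)."""
    levels = []
    for bit in bits:
        levels += [0, 1] if bit else [1, 0]
    return levels


def manchester_decode(levels):
    """Recover bits from half-bit levels; reject cells without a transition."""
    if len(levels) % 2:
        raise ValueError("odd number of half-bit samples")
    bits = []
    for i in range(0, len(levels), 2):
        pair = (levels[i], levels[i + 1])
        if pair == (0, 1):
            bits.append(1)
        elif pair == (1, 0):
            bits.append(0)
        else:
            raise ValueError("no mid-bit transition at sample %d" % i)
    return bits


# One preamble pair (1-0) as it would appear on the line.
print(manchester_encode([1, 0]))
```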

Fortunately, the FCS errors are still present on 10Mbits/s.

After analyzing the ethernet frame on the Davicom PHY's output (pin
TX+), I found that the FCS errors occur when the ethernet preamble
is longer than 56 bits (something like 58 or 60 bits).

To say this in another way, instead of having 28 times 1-0 followed by
the SFD (10101011), I see 29 or 30 times 1-0 followed by the SFD.
(sometimes 29, sometimes 30)
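
The counting above can be sketched as follows. This is only an illustration on an idealised bit string (already decoded, starting exactly at the first preamble bit); the helper names are made up for the example.

```python
SFD = "10101011"  # start frame delimiter, as quoted above


def build_header(pairs):
    """Preamble made of `pairs` repetitions of 1-0, followed by the SFD."""
    return "10" * pairs + SFD


def preamble_length(bits):
    """Number of preamble bits seen before the SFD."""
    index = bits.find(SFD)
    if index < 0:
        raise ValueError("no SFD found")
    return index


for pairs in (28, 29, 30):
    print(pairs, preamble_length(build_header(pairs)))
```

The search works because the SFD is the only place where two consecutive 1 bits appear, so the first match is always at the end of the alternating run.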


Should a longer preamble be considered as an FCS error ? It seems a
little harsh since the point of the preamble is to synchronize the frame.

I don't know what the 802.3 standard says about that.


Regards,
Richard.

2018-10-22 16:38:33

by Willy Tarreau

Subject: Re: CRC errors between mvneta and macb

On Mon, Oct 22, 2018 at 05:15:21PM +0200, Richard Genoud wrote:
> After analyzing the ethernet frame on the Davicom PHY's output (pin
> TX+), I found that the FCS errors occur when the ethernet preamble
> is longer than 56bits. (something like 58 or 60 bits)
>
> To say this in another way, instead of having 28 times 1-0 followed by
> the SFD (10101011), I see 29 or 30 times 1-0 followed by the SFD.
> (sometimes 29, sometimes 30)
>
>
> Should a longer preamble be considered as an FCS error ? It seems a
> little harsh since the point of the preamble is to synchronize the frame.

That indeed seems a bit strange considering that you're not supposed to
know what is before the preamble so it would very well contain random
noise looking a lot like alternating bits.

> I don't know what the 802.3 standard says about that.

Just found it :-)

https://www.trincoll.edu/Academics/MajorsAndMinors/Engineering/Documents/IEEE%20Standard%20for%20Ethernet.pdf

Page 132, #7.2.3.2 :

The DTE is required to supply at least 56 bits of preamble in
order to satisfy system requirements. System components consume
preamble bits in order to perform their functions. The number
of preamble bits sourced ensures an adequate number of bits are
provided to each system component to correctly implement its
function.

So that totally makes sense since the purpose is to enable signal
detection at the hardware level, hence the problem definitely is on
the receiver in your case.

Willy

2018-10-22 19:08:39

by Andrew Lunn

Subject: Re: CRC errors between mvneta and macb

> I dug more on the subject, and I think I found what Marvell's PHY/MAC
> doesn't like.

Hi Richard

What PHY is being used?

> After analyzing the ethernet frame on the Davicom PHY's output (pin
> TX+), I found that the FCS errors occur when the ethernet preamble
> is longer than 56bits. (something like 58 or 60 bits)

Some Marvell PHYs have a register bit which might be of interest: Page
2, register 16, bit 6.

0 = Pad odd nibble preambles in copper receive packets.
1 = Pass as is and do not pad odd nibble preambles in
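
A sketch of the paged read-modify-write this implies. It assumes the usual Marvell 88E1xxx scheme where register 22 selects the page. The FakeMdio class is a hypothetical in-memory stand-in, not a real MDIO API, and would have to be replaced by whatever accessor is available on the target.

```python
MARVELL_PAGE_REG = 22  # assumed page-select register on Marvell 88E1xxx PHYs


class FakeMdio:
    """Hypothetical in-memory model of a paged PHY register file."""

    def __init__(self):
        self.page = 0
        self.regs = {}

    def read(self, reg):
        if reg == MARVELL_PAGE_REG:
            return self.page
        return self.regs.get((self.page, reg), 0)

    def write(self, reg, value):
        if reg == MARVELL_PAGE_REG:
            self.page = value & 0xFF
        else:
            self.regs[(self.page, reg)] = value & 0xFFFF


def set_paged_bit(bus, page, reg, bit, enable):
    """Set or clear one bit in a paged register, then restore the page."""
    old_page = bus.read(MARVELL_PAGE_REG)
    bus.write(MARVELL_PAGE_REG, page)
    try:
        value = bus.read(reg)
        if enable:
            value |= 1 << bit
        else:
            value &= ~(1 << bit)
        bus.write(reg, value & 0xFFFF)
    finally:
        bus.write(MARVELL_PAGE_REG, old_page)
    return value & 0xFFFF


bus = FakeMdio()
print(hex(set_paged_bit(bus, page=2, reg=16, bit=6, enable=True)))
```

Restoring the previous page matters because other code accessing the PHY generally assumes the page it last selected is still active.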

Andrew

2018-10-23 06:58:55

by Richard Genoud

Subject: Re: CRC errors between mvneta and macb

On 22/10/2018 at 20:19, Andrew Lunn wrote:
>> I dug more on the subject, and I think I found what Marvell's PHY/MAC
>> doesn't like.
>
> Hi Richard
>
> What PHY is being used?
>
88E1512-NNP2

>> After analyzing the ethernet frame on the Davicom PHY's output (pin
>> TX+), I found that the FCS errors occur when the ethernet preamble
>> is longer than 56bits. (something like 58 or 60 bits)
>
> Some Marvell PHYs have a register bit which might be of interest: Page
> 2, register 16, bit 6.
>
> 0 = Pad odd nibble preambles in copper receive packets.
> 1 = Pass as is and do not pad odd nibble preambles in
>
> Andrew
>

Thanks, I'll look into that.

Richard

2018-10-23 07:04:16

by Richard Genoud

Subject: Re: CRC errors between mvneta and macb

On 22/10/2018 at 18:34, Willy Tarreau wrote:
> On Mon, Oct 22, 2018 at 05:15:21PM +0200, Richard Genoud wrote:
>> After analyzing the ethernet frame on the Davicom PHY's output (pin
>> TX+), I found that the FCS errors occur when the ethernet preamble
>> is longer than 56bits. (something like 58 or 60 bits)
>>
>> To say this in another way, instead of having 28 times 1-0 followed by
>> the SFD (10101011), I see 29 or 30 times 1-0 followed by the SFD.
>> (sometimes 29, sometimes 30)
>>
>>
>> Should a longer preamble be considered as an FCS error ? It seems a
>> little harsh since the point of the preamble is to synchronize the frame.
>
> That indeed seems a bit strange considering that you're not supposed to
> know what is before the preamble so it would very well contain random
> noise looking a lot like alternating bits.
>
>> I don't know what the 802.3 standard says about that.
>
> Just found it :-)
>
> https://www.trincoll.edu/Academics/MajorsAndMinors/Engineering/Documents/IEEE%20Standard%20for%20Ethernet.pdf
>
> Page 132, #7.2.3.2 :
>
> The DTE is required to supply at least 56 bits of preamble in
> order to satisfy system requirements. System components consume
> preamble bits in order to perform their functions. The number
> of preamble bits sourced ensures an adequate number of bits are
> provided to each system component to correctly implement its
> function.
>
> So that totally makes sense since the purpose is to enable signal
> detection at the hardware level, hence the problem definitely is on
> the receiver in your case.
>
> Willy
>
Great ! Thanks !
I'll check on the Marvell side

Richard

2018-10-23 12:40:27

by Richard Genoud

Subject: Re: CRC errors between mvneta and macb

On 22/10/2018 at 20:19, Andrew Lunn wrote:
>> I dug more on the subject, and I think I found what Marvell's PHY/MAC
>> doesn't like.
>
> Hi Richard
>
> What PHY is being used?
>
>> After analyzing the ethernet frame on the Davicom PHY's output (pin
>> TX+), I found that the FCS errors occur when the ethernet preamble
>> is longer than 56bits. (something like 58 or 60 bits)
>
> Some Marvell PHYs have a register bit which might be of interest: Page
> 2, register 16, bit 6.
>
> 0 = Pad odd nibble preambles in copper receive packets.
> 1 = Pass as is and do not pad odd nibble preambles in

It doesn't seem to change anything.

But the problem really seems to be between the 88E1512 and mvneta.

In mvneta_rx_swbm() I dumped the data received, in both cases, I've got
the same thing:
0000 0000 0000 0000 0004 a3d2 a7ef 0800
dead beef 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 8a86 ce78
The first 2 bytes are the Marvell header, and the last 4 are the CRC.
In one case the MVNETA_RXD_ERR_SUMMARY status bit is set, and not in the
other case.

But I don't have access to the Marvell documentation to know exactly
what the "MVNETA_RXD_ERR_CRC" status means.
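
One way to double-check the FCS in software is to recompute it over the dumped bytes. The sketch below is a minimal Python illustration, assuming the dump is in on-wire byte order, that the first 2 bytes are the Marvell header and the last 4 are the FCS, and that the Ethernet FCS equals the standard CRC-32 (as computed by zlib.crc32) stored least significant byte first. The function name is made up for the example.

```python
import zlib

# Bytes copied from the dump above.
DUMP = """
0000 0000 0000 0000 0004 a3d2 a7ef 0800
dead beef 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 8a86 ce78
"""


def fcs_matches(buf, header_len=2):
    """Compare the trailing 4-byte FCS with a CRC-32 of the frame itself."""
    frame, fcs = buf[header_len:-4], buf[-4:]
    return zlib.crc32(frame) == int.from_bytes(fcs, "little")


buf = bytes.fromhex("".join(DUMP.split()))
print(len(buf), "bytes, FCS matches:", fcs_matches(buf))
```

If this check passes for the buffer from both the accepted and the rejected case, the payload itself is intact and the error flag has to come from something the MAC sees before the data reaches memory.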

Richard