2009-09-18 13:41:44

by Grozdan

Subject: sky2 rx length errors

Hi,

I have a Marvell onboard NIC (88E8053) and I've been noticing for a
while now a bit weird behavior with the sky2 driver. This mostly
occurs with newer kernels (2.6.30, 2.6.31) and my older distro kernel
(2.6.27.21) does not seem to have the same problem. Basically, the
sky2 driver will randomly and unpredictably spew rx length error
messages and reboot itself. I also noticed in dmesg that this mostly
occurs after "martian destination" messages. After this message, sky2
starts spewing messages as shown below and then reboots itself. It is
not really a big problem for me, but since I'm virtually always logged
in to IRC, the client always loses connection, waits for a few minutes
to get a response from the server and then logs me in again. I do not
think it's a HW problem as the Marvell NIC otherwise works perfectly
and I've checked my cable modem too which operates without a problem.
Any ideas?

PS: please cc me as I'm not subscribed to the mailing list

sky2 driver version 1.23
sky2 0000:05:00.0: PCI INT A -> GSI 36 (level, low) -> IRQ 36
sky2 0000:05:00.0: setting latency timer to 64
sky2 0000:05:00.0: PCI: Disallowing DAC for device
sky2 0000:05:00.0: Yukon-2 EC chip revision 2
sky2 0000:05:00.0: irq 53 for MSI/MSI-X
sky2 0000:05:00.0: No interrupt generated using MSI, switching to INTx mode.
sky2 eth0: addr 00:11:d8:a1:5b:0e
sky2 eth0: enabling interface
sky2 eth0: Link is up at 100 Mbps, full duplex, flow control rx
.....
.....
martian destination 0.0.0.0 from 172.23.204.1, dev eth0
sky2 eth0: rx length error: status 0x4420100 length 598
sky2 eth0: rx length error: status 0x5ea0100 length 598
sky2 eth0: rx length error: status 0x5ea0100 length 598
sky2 eth0: rx length error: status 0x4420100 length 598
sky2 eth0: rx length error: status 0x4420100 length 598
sky2 eth0: rx length error: status 0x4420100 length 598
sky2 eth0: rx length error: status 0x4420100 length 598
sky2 eth0: rx length error: status 0x4420100 length 598
sky2 eth0: rx length error: status 0x4420100 length 598
sky2 eth0: rx length error: status 0x4420100 length 598
sky2 eth0: rx length error: status 0x4420100 length 598
sky2 eth0: rx length error: status 0x5ea0100 length 598
sky2 eth0: rx length error: status 0x5ea0100 length 598
sky2 eth0: rx length error: status 0x4420100 length 598
sky2 eth0: rx length error: status 0x4420100 length 598
sky2 eth0: rx length error: status 0x4420100 length 598
sky2 eth0: rx length error: status 0x4420100 length 598
sky2 eth0: rx length error: status 0x4420100 length 598
sky2 eth0: rx length error: status 0x4420100 length 598
sky2 eth0: rx length error: status 0x4420100 length 598
sky2 eth0: rx length error: status 0x4420100 length 598
sky2 eth0: rx length error: status 0x4420100 length 598
sky2 eth0: rx length error: status 0x4420100 length 598
sky2 eth0: rx length error: status 0x4420100 length 598
sky2 eth0: rx length error: status 0x4420100 length 598
sky2 eth0: rx length error: status 0x4420100 length 598
sky2 eth0: rx length error: status 0x4420100 length 598
sky2 eth0: rx length error: status 0x4420100 length 598
sky2 eth0: rx length error: status 0x4420100 length 598
sky2 eth0: rx length error: status 0x4420100 length 598
sky2 eth0: rx length error: status 0x4420100 length 598
sky2 eth0: rx length error: status 0x4420100 length 598
sky2 eth0: rx length error: status 0x4420100 length 598
sky2 eth0: rx length error: status 0x4420100 length 598


2009-09-20 06:36:09

by Andrew Morton

Subject: Re: sky2 rx length errors

(added cc's from the MAINTAINERS file)

On Fri, 18 Sep 2009 15:41:45 +0200 Grozdan <[email protected]> wrote:

> Hi,
>
> I have a Marvell onboard NIC (88E8053) and I've been noticing for a
> while now a bit weird behavior with the sky2 driver. This mostly
> occurs with newer kernels (2.6.30, 2.6.31) and my older distro kernel
> (2.6.27.21) does not seem to have the same problem. Basically, the
> sky2 driver will randomly and unpredictably spew rx length error
> messages and reboot itself. I also noticed in dmesg that this mostly
> occurs after "martian destination" messages. After this message, sky2
> starts spewing messages as shown below and then reboots itself. It is
> not really a big problem for me, but since I'm virtually always logged
> in to IRC, the client always loses connection, waits for a few minutes
> to get a response from the server and then logs me in again. I do not
> think it's a HW problem as the Marvell NIC otherwise works perfectly
> and I've checked my cable modem too which operates without a problem.
> Any ideas?
>
> PS: please cc me as I'm not subscribed to the mailing list
>
> sky2 driver version 1.23
> sky2 0000:05:00.0: PCI INT A -> GSI 36 (level, low) -> IRQ 36
> sky2 0000:05:00.0: setting latency timer to 64
> sky2 0000:05:00.0: PCI: Disallowing DAC for device
> sky2 0000:05:00.0: Yukon-2 EC chip revision 2
> sky2 0000:05:00.0: irq 53 for MSI/MSI-X
> sky2 0000:05:00.0: No interrupt generated using MSI, switching to INTx mode.
> sky2 eth0: addr 00:11:d8:a1:5b:0e
> sky2 eth0: enabling interface
> sky2 eth0: Link is up at 100 Mbps, full duplex, flow control rx
> .....
> .....
> martian destination 0.0.0.0 from 172.23.204.1, dev eth0
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x5ea0100 length 598
> sky2 eth0: rx length error: status 0x5ea0100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x5ea0100 length 598
> sky2 eth0: rx length error: status 0x5ea0100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598

2009-09-20 18:05:33

by Stephen Hemminger

Subject: Re: sky2 rx length errors

On Sat, 19 Sep 2009 23:35:36 -0700
Andrew Morton <[email protected]> wrote:

> (added cc's from the MAINTAINERS file)
>
> On Fri, 18 Sep 2009 15:41:45 +0200 Grozdan <[email protected]> wrote:
>
> > Hi,
> >
> > I have a Marvell onboard NIC (88E8053) and I've been noticing for a
> > while now a bit weird behavior with the sky2 driver. This mostly
> > occurs with newer kernels (2.6.30, 2.6.31) and my older distro kernel
> > (2.6.27.21) does not seem to have the same problem. Basically, the
> > sky2 driver will randomly and unpredictably spew rx length error
> > messages and reboot itself. I also noticed in dmesg that this mostly
> > occurs after "martian destination" messages. After this message, sky2
> > starts spewing messages as shown below and then reboots itself. It is
> > not really a big problem for me, but since I'm virtually always logged
> > in to IRC, the client always loses connection, waits for a few minutes
> > to get a response from the server and then logs me in again. I do not
> > think it's a HW problem as the Marvell NIC otherwise works perfectly
> > and I've checked my cable modem too which operates without a problem.
> > Any ideas?
> >
> > PS: please cc me as I'm not subscribed to the mailing list
> >
> > sky2 driver version 1.23
> > sky2 0000:05:00.0: PCI INT A -> GSI 36 (level, low) -> IRQ 36
> > sky2 0000:05:00.0: setting latency timer to 64
> > sky2 0000:05:00.0: PCI: Disallowing DAC for device
> > sky2 0000:05:00.0: Yukon-2 EC chip revision 2
> > sky2 0000:05:00.0: irq 53 for MSI/MSI-X
> > sky2 0000:05:00.0: No interrupt generated using MSI, switching to INTx mode.
> > sky2 eth0: addr 00:11:d8:a1:5b:0e
> > sky2 eth0: enabling interface
> > sky2 eth0: Link is up at 100 Mbps, full duplex, flow control rx
> > .....
> > .....
> > martian destination 0.0.0.0 from 172.23.204.1, dev eth0
> > sky2 eth0: rx length error: status 0x4420100 length 598
> > sky2 eth0: rx length error: status 0x5ea0100 length 598

This error status occurs if the length reported by the PHY does not
match the len reported by the DMA engine. The error status is:
0x4420100 = length 1090 + broadcast packet...

No idea what is on your network, but perhaps there is some MTU confusion?
Since martian destination seems related, knowing more about that packet
might help.
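
For anyone reading the log lines, the decoding above can be sketched in a few lines of Python. This is only an illustration, assuming the frame length sits in bits 16 and above of the status word, which is what the "0x4420100 = length 1090" reading implies. The helper name rx_frame_length is made up for this example and is not a driver function.

```python
def rx_frame_length(status: int) -> int:
    """Return the length carried in the upper bits of an rx status word.

    Assumption: the length occupies bits 16 and above, as implied by
    "0x4420100 = length 1090". This is not taken from the driver source.
    """
    return status >> 16


# The two status values seen in the log, each paired with the
# length (598) printed in the same error message.
for status, logged_len in ((0x4420100, 598), (0x5EA0100, 598)):
    status_len = rx_frame_length(status)
    print(f"status {status:#x}: length in status {status_len}, "
          f"length in message {logged_len}, "
          f"mismatch={status_len != logged_len}")
```

Under that assumption the second status value, 0x5ea0100, decodes to 1514, which is a 1500-byte payload plus the 14-byte Ethernet header. Neither value agrees with the 598 printed next to it.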

2009-09-20 18:16:03

by Grozdan

Subject: Re: sky2 rx length errors

2009/9/20 Stephen Hemminger <[email protected]>:

>
> This error status occurs if the length reported by the PHY does not
> match the len reported by the DMA engine. The error status is:
> 0x4420100 = length 1090 + broadcast packet...
>
> No idea what is on your network, but perhaps there is some MTU confusion?
> Since martian destination seems related, knowing more about that packet
> might help.
>

Hi,

Thanks for the reply. There's nothing on my home network here. It is
just a direct connection from my PC to my cable modem and there's
nothing in between. I've googled a bit and it seems others also
encounter this problem. I've read a few posts on the Ubuntu bugzilla
where people change the MTU from 1500 to 1492 and this fixes the
problem. However, even with this, some report that the problem is
still there. I did the same and it didn't change anything for me. So I
disabled my onboard NIC and added a 3Com one which has been working
perfectly so far and I think I'll just keep using it instead of the
Marvell one.

2009-09-20 18:34:58

by Willy Tarreau

Subject: Re: sky2 rx length errors

Hi guys,

On Sun, Sep 20, 2009 at 08:16:02PM +0200, Grozdan wrote:
> 2009/9/20 Stephen Hemminger <[email protected]>:
>
> >
> > This error status occurs if the length reported by the PHY does not
> > match the len reported by the DMA engine. The error status is:
> > 0x4420100 = length 1090 + broadcast packet...
> >
> > No idea what is on your network, but perhaps there is some MTU confusion?
> > Since martian destination seems related, knowing more about that packet
> > might help.
> >
>
> Hi,
>
> Thanks for the reply. There's nothing on my home network here. It is
> just a direct connection from my PC to my cable modem and there's
> nothing in between. I've googled a bit and it seems others also
> encounter this problem.

I've encountered similar issues on early 8053 chips too. Those were
soldered on motherboard of network servers bought about 4 years ago.
No matter what trick I could try, change drivers, enable/disable flow
control, change negotiation speed, etc... the PHY would occasionally
and randomly get mad and start shifting received frames by a few bytes,
thus causing loss of network connectivity. The logs would also display
martians, depending on the bytes in the frame which appeared in the
IP header once shifted.
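
A minimal sketch of why shifted frames can show up as martians, assuming a plain Ethernet + IPv4 frame with no VLAN tag. The frame contents, the addresses and the 4-byte shift are all made up for illustration; this does not model what the chip actually does.

```python
import ipaddress

ETH_HEADER_LEN = 14   # destination MAC + source MAC + EtherType
IP_DST_OFFSET = 16    # offset of the destination address in an IPv4 header


def ip_destination(frame: bytes) -> str:
    """Read the IPv4 destination from the fixed offset a receiver expects."""
    start = ETH_HEADER_LEN + IP_DST_OFFSET
    return str(ipaddress.IPv4Address(frame[start:start + 4]))


# Hypothetical frame: zeroed MAC addresses, EtherType 0x0800 (IPv4),
# a 20-byte IPv4 header and 20 zero bytes of payload.
eth_header = bytes(12) + b"\x08\x00"
ip_header = (
    bytes([0x45, 0, 0, 40, 0, 0, 0, 0, 64, 6, 0, 0])
    + bytes([172, 23, 204, 1])   # source address
    + bytes([192, 0, 2, 10])     # destination address
)
frame = eth_header + ip_header + bytes(20)

# Simulate a receive path that loses the first 4 bytes, so every
# later field is read 4 bytes further into the frame than intended.
shifted = frame[4:]

print(ip_destination(frame))    # destination as sent
print(ip_destination(shifted))  # payload bytes misread as the destination
```

With this made-up frame the unshifted read gives 192.0.2.10, while the shifted read picks up four zero payload bytes and reports 0.0.0.0. Which bogus address appears in practice depends on whatever bytes land in that position.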

Sometimes it would automatically get back after a chip reset, sometimes
not. It seemed that disabling flow control helped a bit, but it was not
fantastic. It would randomly hang every 1-30 days, which made the issue
rather hard to debug.

I don't precisely remember the rev. of the chip, but I remember that
it was pretty old and that more recent machines had a much larger
number that never exhibited the issue. Also, my desktop right here
runs off a 88E8056 (~= two 8053s) and has never failed yet.

So I really think that there was a horrible batch of chips in its
early days.

> I've read a few posts on the Ubuntu bugzilla
> where people change the MTU from 1500 to 1492 and this fixes the
> problem. However, even with this, some report that the problem is
> still there. I did the same and it didn't change anything for me.

Did not help for me either.

> So I
> disabled my onboard NIC and added a 3Com one which has been working
> perfectly so far and I think I'll just keep using it instead of the
> Marvell one.

That's the best you can do if you happen to have one of those buggy
chips. We had to stuff intel NICs in the servers causing trouble at
the customer's and it solved the issue too.

Regards,
Willy

2009-09-20 18:46:59

by Grozdan

Subject: Re: sky2 rx length errors

2009/9/20 Willy Tarreau <[email protected]>:
> Hi guys,
>
> On Sun, Sep 20, 2009 at 08:16:02PM +0200, Grozdan wrote:
>> 2009/9/20 Stephen Hemminger <[email protected]>:
>>
>> >
>> > This error status occurs if the length reported by the PHY does not
>> > match the len reported by the DMA engine. The error status is:
>> > 0x4420100 = length 1090 + broadcast packet...
>> >
>> > No idea what is on your network, but perhaps there is some MTU confusion?
>> > Since martian destination seems related, knowing more about that packet
>> > might help.
>> >
>>
>> Hi,
>>
>> Thanks for the reply. There's nothing on my home network here. It is
>> just a direct connection from my PC to my cable modem and there's
>> nothing in between. I've googled a bit and it seems others also
>> encounter this problem.
>
> I've encountered similar issues on early 8053 chips too. Those were
> soldered on motherboard of network servers bought about 4 years ago.
> No matter what trick I could try, change drivers, enable/disable flow
> control, change negotiation speed, etc... the PHY would occasionally
> and randomly get mad and start shifting received frames by a few bytes,
> thus causing loss of network connectivity. The logs would also display
> martians, depending on the bytes in the frame which appeared in the
> IP header once shifted.
>
> Sometimes it would automatically get back after a chip reset, sometimes
> not. It seemed that disabling flow control helped a bit, but it was not
> fantastic. It would randomly hang every 1-30 days, which made the issue
> rather hard to debug.
>
> I don't precisely remember the rev. of the chip, but I remember that
> it was pretty old and that more recent machines had a much larger
> number that never exhibited the issue. Also, my desktop right here
> runs off a 88E8056 (~= two 8053s) and has never failed yet.
>
> So I really think that there was a horrible batch of chips in its
> early days.
>
>> I've read a few posts on the Ubuntu bugzilla
>> where people change the MTU from 1500 to 1492 and this fixes the
>> problem. However, even with this, some report that the problem is
>> still there. I did the same and it didn't change anything for me.
>
> Did not help for me either.
>
>> So I
>> disabled my onboard NIC and added a 3Com one which has been working
>> perfectly so far and I think I'll just keep using it instead of the
>> Marvell one.
>
> That's the best you can do if you happen to have one of those buggy
> chips. We had to stuff intel NICs in the servers causing trouble at
> the customer's and it solved the issue too.
>
> Regards,
> Willy
>
>

Thanks Willy :)

What I'm still wondering a bit though is the fact that I've never seen
it behave like that for the past 3 years I've been using it. Only
recently, with upgrading my kernel to 2.6.30 and later on to 2.6.31
(self-compiled, sources taken from the openSUSE build service) it
started to behave like that. In the past I also used older kernels (of
course) like 2.6.27.x and 2.6.29 and never encountered this. So I'm a
bit uncertain as to whether it's actually something in the kernel that
makes it behave like that or that there's a HW problem that suddenly
occurred or got exposed...

2009-09-20 18:54:24

by Willy Tarreau

Subject: Re: sky2 rx length errors

On Sun, Sep 20, 2009 at 08:46:59PM +0200, Grozdan wrote:
(...)
> Thanks Willy :)
>
> What I'm still wondering a bit though is the fact that I've never seen
> it behave like that for the past 3 years I've been using it. Only
> recently, with upgrading my kernel to 2.6.30 and later on to 2.6.31
> (self-compiled, sources taken from the openSUSE build service) it
> started to behave like that. In the past I also used older kernels (of
> course) like 2.6.27.x and 2.6.29 and never encountered this. So I'm a
> bit uncertain as to whether it's actually something in the kernel that
> makes it behave like that or that there's a HW problem that suddenly
> occurred or got exposed...

Unless you changed the switch port it is connected to, I agree this
sounds strange. I have also wondered if those issues could be caused
by temperature rising on the chip. I don't know if any recent change
could cause such environmental differences to occur :-/

Regards,
Willy

2009-09-21 01:46:10

by Stephen Hemminger

Subject: Re: sky2 rx length errors

On Mon, 21 Sep 2009 07:11:21 +0900
Mike McCormack <[email protected]> wrote:

> 2009/9/21 Stephen Hemminger <[email protected]>
>
> > On Sat, 19 Sep 2009 23:35:36 -0700
> > Andrew Morton <[email protected]> wrote:
> >
> > > (added cc's from the MAINTAINERS file)
> > >
> > > On Fri, 18 Sep 2009 15:41:45 +0200 Grozdan <[email protected]> wrote:
> >
>
> <snip>
>
> > > martian destination 0.0.0.0 from 172.23.204.1, dev eth0
> > > > sky2 eth0: rx length error: status 0x4420100 length 598
> > > > sky2 eth0: rx length error: status 0x5ea0100 length 598
> >
> > This error status occurs if the length reported by the PHY does not
> > match the len reported by the DMA engine. The error status is:
> > 0x4420100 = length 1090 + broadcast packet...
> >
> > No idea what is on your network, but perhaps there is some MTU confusion?
> > Since martian destination seems related, knowing more about that packet
> > might help.
> >
> >
> This appears to be the same problem reported at:
>
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/292445
>
> Mike

This really looks like multiple packets are getting smashed
together into one DMA, i.e. a hardware timing-related issue.
It might be possible to work around the problem
by separating them.

--