2007-10-03 00:42:29

by Ian Kumlien

Subject: [BUG] sky2 errors in 2.6.23-rc9-git1

Hi,

Sorry about this, but the latest sky2 seems damned odd.
I have been running with jumbo frames at home for quite some time, but
with this kernel that doesn't work; instead I get loads of:
sky2 eth0: rx length error: status 0x5e60500 length 1510
sky2 eth0: rx length error: status 0x5e60500 length 1510
sky2 eth0: rx length error: status 0x5ea0500 length 1514
sky2 eth0: rx length error: status 0x5ea0500 length 1514

The length can be just about anything from 800 up to the MTU.

That is not all, though. For some reason I also got several hangs:
sky2 eth0: hung mac 0:68 fifo 143 (133:76)
sky2 eth0: receiver hang detected
sky2 eth0: disabling interface
sky2 eth0: enabling interface
sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control rx
sky2 eth0: hung mac 0:125 fifo 195 (93:88)
sky2 eth0: receiver hang detected
sky2 eth0: disabling interface
sky2 eth0: enabling interface
sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control rx
sky2 eth0: hung mac 0:124 fifo 98 (10:108)
sky2 eth0: receiver hang detected
sky2 eth0: disabling interface
sky2 eth0: enabling interface
sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control rx
sky2 eth0: hung mac 0:41 fifo 30 (187:17)
sky2 eth0: receiver hang detected
sky2 eth0: disabling interface
sky2 eth0: enabling interface
sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control rx
...

All of this happened within about 2 minutes.

Could this be related to [sky2: sky2 FE+ receive status workaround]:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blobdiff;f=drivers/net/sky2.c;h=a3de0b6127ebb537b87a1849e207909fcc333ee4;hp=0792031a5cf959a1543f32f4e0f2ab4ccb7b0ec2;hb=3b12e0141f7a97c3b84731b5f935ed738bb6f960;hpb=ff0ce6845bc18292e80ea40d11c3d3a539a3fc5e

The chips being used are:
sky2 0000:02:00.0: v1.18 addr 0xdbffc000 irq 17 Yukon-EC (0xb6) rev 2
sky2 0000:02:00.0: v1.18 addr 0xfddfc000 irq 17 Yukon-EC (0xb6) rev 1

The receiver hang only happens on the rev 2 chip, which also reports:
sky2 0000:02:00.0: No interrupt generated using MSI, switching to INTx mode.

Ifconfig reports:
REV 2 chip:
RX packets:30492 errors:0 dropped:646 overruns:0 frame:646
TX packets:29229 errors:0 dropped:0 overruns:0 carrier:0

REV 1 chip:
RX packets:19795 errors:0 dropped:131 overruns:0 frame:131
TX packets:18588 errors:0 dropped:0 overruns:0 carrier:0

Let me know when jumbo frames work again, just mail me patches =)
(too tired to look into it more closely at the moment)

--
Ian Kumlien <pomac () vapor ! com> -- http://pomac.netswarm.net



2007-10-03 01:02:44

by Stephen Hemminger

Subject: [PATCH] sky2: jumbo frame regression fix

Remove unneeded check that caused problems with jumbo frame sizes.
The check was recently added and is wrong.
When using jumbo frames the sky2 driver does fragmentation, so
rx_data_size is less than mtu.

Signed-off-by: Stephen Hemminger <[email protected]>

--- a/drivers/net/sky2.c 2007-10-02 17:56:31.000000000 -0700
+++ b/drivers/net/sky2.c 2007-10-02 17:58:56.000000000 -0700
@@ -2163,9 +2163,6 @@ static struct sk_buff *sky2_receive(stru
 	sky2->rx_next = (sky2->rx_next + 1) % sky2->rx_pending;
 	prefetch(sky2->rx_ring + sky2->rx_next);
 
-	if (length < ETH_ZLEN || length > sky2->rx_data_size)
-		goto len_error;
-
 	/* This chip has hardware problems that generates bogus status.
 	 * So do only marginal checking and expect higher level protocols
 	 * to handle crap frames.
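
[Editor's note] The effect of the removed check can be sketched in user space. This is only an illustration, not driver code: the value 832 for rx_data_size is an assumed figure for a 9000-byte MTU setup (the thread only says it is less than the MTU), and mtu_check_rejects() is a hypothetical helper added for contrast, not something the patch introduces.

```c
#include <stdbool.h>

/* Minimum Ethernet frame length excluding the FCS. This matches ETH_ZLEN
 * in the kernel's <linux/if_ether.h>; redefined so the sketch builds in
 * user space. */
#define ETH_ZLEN 60

/* Ethernet header length (ETH_HLEN in the kernel). */
#define ETH_HLEN 14

/* The check removed by the patch: reject a frame that is shorter than the
 * Ethernet minimum or longer than the linear receive buffer. */
static bool old_check_rejects(unsigned int length, unsigned int rx_data_size)
{
	return length < ETH_ZLEN || length > rx_data_size;
}

/* Hypothetical contrast (an assumption, not part of the patch): bound the
 * length by the largest frame the configured MTU allows. */
static bool mtu_check_rejects(unsigned int length, unsigned int mtu)
{
	return length < ETH_ZLEN || length > mtu + ETH_HLEN;
}
```

With the assumed rx_data_size of 832, old_check_rejects(1510, 832) is true, which is consistent with the "rx length error ... length 1510" lines in the report, while mtu_check_rejects(1510, 9000) is false: the frame is perfectly legal for a 9000-byte MTU.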

2007-10-03 01:07:41

by Jeff Garzik

Subject: Re: [PATCH] sky2: jumbo frame regression fix

Stephen Hemminger wrote:
> Remove unneeded check that caused problems with jumbo frame sizes.
> The check was recently added and is wrong.
> When using jumbo frames the sky2 driver does fragmentation, so
> rx_data_size is less than mtu.
>
> Signed-off-by: Stephen Hemminger <[email protected]>
>
> --- a/drivers/net/sky2.c 2007-10-02 17:56:31.000000000 -0700
> +++ b/drivers/net/sky2.c 2007-10-02 17:58:56.000000000 -0700
> @@ -2163,9 +2163,6 @@ static struct sk_buff *sky2_receive(stru
> sky2->rx_next = (sky2->rx_next + 1) % sky2->rx_pending;
> prefetch(sky2->rx_ring + sky2->rx_next);
>
> - if (length < ETH_ZLEN || length > sky2->rx_data_size)
> - goto len_error;
> -

2.6.23? 2.6.24? enquiring minds want to know...


2007-10-03 01:35:17

by Ian Kumlien

Subject: Re: [PATCH] sky2: jumbo frame regression fix

On tis, 2007-10-02 at 18:02 -0700, Stephen Hemminger wrote:
> Remove unneeded check that caused problems with jumbo frame sizes.
> The check was recently added and is wrong.
> When using jumbo frames the sky2 driver does fragmentation, so
> rx_data_size is less than mtu.

Confirmed working.

Now running with a 9k MTU with no errors =)

It also seems that the FIFO bug was the one that affected me before;
that one is a damned odd race.

> Signed-off-by: Stephen Hemminger <[email protected]>
Tested-by: Ian Kumlien <[email protected]>

(if that tag exists now)

By the way, sorry, but all mail sent directly to you will be blocked. I have
yet to fix relaying properly, with ISPs blocking port 25 etc., so for some of
you this mail will only show up on the mailing list.

> --- a/drivers/net/sky2.c 2007-10-02 17:56:31.000000000 -0700
> +++ b/drivers/net/sky2.c 2007-10-02 17:58:56.000000000 -0700
> @@ -2163,9 +2163,6 @@ static struct sk_buff *sky2_receive(stru
> sky2->rx_next = (sky2->rx_next + 1) % sky2->rx_pending;
> prefetch(sky2->rx_ring + sky2->rx_next);
>
> - if (length < ETH_ZLEN || length > sky2->rx_data_size)
> - goto len_error;
> -
> /* This chip has hardware problems that generates bogus status.
> * So do only marginal checking and expect higher level protocols
> * to handle crap frames.
--
Ian Kumlien <pomac () vapor ! com> -- http://pomac.netswarm.net



2007-10-03 04:54:13

by Stephen Hemminger

Subject: Re: [PATCH] sky2: jumbo frame regression fix

On Tue, 02 Oct 2007 21:07:22 -0400
Jeff Garzik <[email protected]> wrote:

> Stephen Hemminger wrote:
> > Remove unneeded check that caused problems with jumbo frame sizes.
> > The check was recently added and is wrong.
> > When using jumbo frames the sky2 driver does fragmentation, so
> > rx_data_size is less than mtu.
> >
> > Signed-off-by: Stephen Hemminger <[email protected]>
> >
> > --- a/drivers/net/sky2.c 2007-10-02 17:56:31.000000000 -0700
> > +++ b/drivers/net/sky2.c 2007-10-02 17:58:56.000000000 -0700
> > @@ -2163,9 +2163,6 @@ static struct sk_buff *sky2_receive(stru
> > sky2->rx_next = (sky2->rx_next + 1) % sky2->rx_pending;
> > prefetch(sky2->rx_ring + sky2->rx_next);
> >
> > - if (length < ETH_ZLEN || length > sky2->rx_data_size)
> > - goto len_error;
> > -
>
> 2.6.23? 2.6.24? enquiring minds want to know...

2.6.23, since it is a regression

--
Stephen Hemminger <[email protected]>

2007-10-03 04:59:16

by Jeff Garzik

Subject: Re: [PATCH] sky2: jumbo frame regression fix

Stephen Hemminger wrote:
> On Tue, 02 Oct 2007 21:07:22 -0400
> Jeff Garzik <[email protected]> wrote:
>
>> Stephen Hemminger wrote:
>>> Remove unneeded check that caused problems with jumbo frame sizes.
>>> The check was recently added and is wrong.
>>> When using jumbo frames the sky2 driver does fragmentation, so
>>> rx_data_size is less than mtu.
>>>
>>> Signed-off-by: Stephen Hemminger <[email protected]>
>>>
>>> --- a/drivers/net/sky2.c 2007-10-02 17:56:31.000000000 -0700
>>> +++ b/drivers/net/sky2.c 2007-10-02 17:58:56.000000000 -0700
>>> @@ -2163,9 +2163,6 @@ static struct sk_buff *sky2_receive(stru
>>> sky2->rx_next = (sky2->rx_next + 1) % sky2->rx_pending;
>>> prefetch(sky2->rx_ring + sky2->rx_next);
>>>
>>> - if (length < ETH_ZLEN || length > sky2->rx_data_size)
>>> - goto len_error;
>>> -
>> 2.6.23? 2.6.24? enquiring minds want to know...
>
> 2.6.23, since it is a regression

You can have regressions in behavior in net-2.6.24.git, too. _Please_
be specific about where you want your patches to go. Thanks.

Jeff



2007-10-03 05:02:19

by Stephen Hemminger

Subject: Re: [PATCH] sky2: jumbo frame regression fix

On Wed, 03 Oct 2007 03:34:34 +0200
Ian Kumlien <[email protected]> wrote:

> On tis, 2007-10-02 at 18:02 -0700, Stephen Hemminger wrote:
> > Remove unneeded check that caused problems with jumbo frame sizes.
> > The check was recently added and is wrong.
> > When using jumbo frames the sky2 driver does fragmentation, so
> > rx_data_size is less than mtu.
>
> Confirmed working.
>
> Now running with 9k mtu with no errors, =)
>
> It also seems that the FIFO bug was the one that affected me before,
> damn odd race that one.

Does the workaround (forced reset) work? Ian, you are the first person to
report triggering it. I haven't found a way to make it happen.
What combination of flow control and speeds are you using?


--
Stephen Hemminger <[email protected]>

2007-10-03 07:38:05

by Ian Kumlien

Subject: Re: [PATCH] sky2: jumbo frame regression fix

On tis, 2007-10-02 at 21:59 -0700, Stephen Hemminger wrote:
> On Wed, 03 Oct 2007 03:34:34 +0200
> Ian Kumlien <[email protected]> wrote:
>
> > On tis, 2007-10-02 at 18:02 -0700, Stephen Hemminger wrote:
> > > Remove unneeded check that caused problems with jumbo frame sizes.
> > > The check was recently added and is wrong.
> > > When using jumbo frames the sky2 driver does fragmentation, so
> > > rx_data_size is less than mtu.
> >
> > Confirmed working.
> >
> > Now running with 9k mtu with no errors, =)
> >
> > It also seems that the FIFO bug was the one that affected me before,
> > damn odd race that one.
>
> Does the workaround (forced reset) work? Ian, you are the first person to
> report triggering it. I haven't found a way to make it happen.
> What combination of flow control and speeds are you using?

Yes, it works; it's the problem I had all along =)

As to how to make it happen, that's a bit harder...
To me it seems like it takes a combination of several connections and
somewhat high bandwidth, and you have to be sending data for it to happen.

For me it usually happens when seeding files via BitTorrent, but it seems
to take somewhat special circumstances to actually trigger it.

I use jumbo frames on my gigabit LAN, up to my firewall. Beyond the firewall
it's the usual 1500 MTU at 100 Mbit, and I doubt that this has anything to do
with it (unless it's a 'number of frames that can be stored' problem, in
which case the large MTU limits that number to a really small value, making
it easier to trigger).

Well, that's my thinking at least, but then I just got up after having
slept 5 hours, so =)

--
Ian Kumlien <pomac () vapor ! com> -- http://pomac.netswarm.net



2007-10-03 08:30:30

by Ian Kumlien

Subject: Re: [PATCH] sky2: jumbo frame regression fix

On Tue, Oct 02, 2007 at 09:59:14PM -0700, Stephen Hemminger wrote:
> On Wed, 03 Oct 2007 03:34:34 +0200
> Ian Kumlien <[email protected]> wrote:
>
> > On tis, 2007-10-02 at 18:02 -0700, Stephen Hemminger wrote:
> > > Remove unneeded check that caused problems with jumbo frame sizes.
> > > The check was recently added and is wrong.
> > > When using jumbo frames the sky2 driver does fragmentation, so
> > > rx_data_size is less than mtu.
> >
> > Confirmed working.
> >
> > Now running with 9k mtu with no errors, =)
> >
> > It also seems that the FIFO bug was the one that affected me before,
> > damn odd race that one.
>
> Does the workaround (forced reset) work? Ian, you are the first person to
> report triggering it. I haven't found a way to make it happen.
> What combination of flow control and speeds are you using?

I forgot to add: the last time was with -rc8-git2 or -git3, using Westwood
flow control.

> --
> Stephen Hemminger <[email protected]>

2007-10-03 17:40:16

by Jeff Garzik

Subject: Re: [PATCH] sky2: jumbo frame regression fix

Stephen Hemminger wrote:
> Remove unneeded check that caused problems with jumbo frame sizes.
> The check was recently added and is wrong.
> When using jumbo frames the sky2 driver does fragmentation, so
> rx_data_size is less than mtu.
>
> Signed-off-by: Stephen Hemminger <[email protected]>

applied


2007-10-03 17:59:19

by Bill Davidsen

Subject: Re: [PATCH] sky2: jumbo frame regression fix

Ian Kumlien wrote:
> On tis, 2007-10-02 at 18:02 -0700, Stephen Hemminger wrote:
>> Remove unneeded check that caused problems with jumbo frame sizes.
>> The check was recently added and is wrong.
>> When using jumbo frames the sky2 driver does fragmentation, so
>> rx_data_size is less than mtu.
>
> Confirmed working.
>
> Now running with 9k mtu with no errors, =)

Have you verified that you are actually getting jumbo packets out of the
NIC? I had one machine which did standard packets silently using sky2
and jumbo using sk98lin. I was looking for something else with tcpdump
and got one of those WTF moments when I saw all the tiny packets.

--
Bill Davidsen <[email protected]>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot

2007-10-03 19:03:52

by Ian Kumlien

Subject: Re: [PATCH] sky2: jumbo frame regression fix

On ons, 2007-10-03 at 14:04 -0400, Bill Davidsen wrote:
> Ian Kumlien wrote:
> > On tis, 2007-10-02 at 18:02 -0700, Stephen Hemminger wrote:
> >> Remove unneeded check that caused problems with jumbo frame sizes.
> >> The check was recently added and is wrong.
> >> When using jumbo frames the sky2 driver does fragmentation, so
> >> rx_data_size is less than mtu.
> >
> > Confirmed working.
> >
> > Now running with 9k mtu with no errors, =)
>
> Have you verified that you are actually getting jumbo packets out of the
> NIC? I had one machine which did standard packets silently using sky2
> and jumbo using sk98lin. I was looking for something else with tcpdump
> and got one of those WTF moments when I saw all the tiny packets.

20:27:06.542461 IP pi.local > blue.local: ICMP echo request, id 27173, seq 42, length 8008
20:27:06.543136 IP blue.local > pi.local: ICMP echo reply, id 27173, seq 42, length 8008

That should solve it for us, right? =)

--
Ian Kumlien <pomac () vapor ! com> -- http://pomac.netswarm.net
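
[Editor's note] The arithmetic behind reading that tcpdump output can be sketched as follows. It assumes a plain IPv4 header without options, and that the "length" tcpdump prints for an ICMP message counts the ICMP header plus payload; the helper names are made up for this illustration.

```c
#include <stdbool.h>

#define IPV4_HDR_LEN 20 /* IPv4 header without options (assumption) */

/* Size of the IPv4 datagram that carries an ICMP message of the given
 * length, where icmp_len already includes the 8-byte ICMP header. */
static unsigned int ip_datagram_len(unsigned int icmp_len)
{
	return icmp_len + IPV4_HDR_LEN;
}

/* True when the datagram fits in one frame for the given MTU, so no IP
 * fragmentation is needed. */
static bool fits_without_fragmentation(unsigned int icmp_len, unsigned int mtu)
{
	return ip_datagram_len(icmp_len) <= mtu;
}
```

For the echo request above, ip_datagram_len(8008) is 8028 bytes. That fits in a 9000-byte MTU but not in a 1500-byte one, where it would have to be split into IP fragments, and a capture would normally show those fragments rather than a single packet.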



2007-11-27 22:40:37

by Ian Kumlien

Subject: [SKY2] Problems (2.6.24-rc3-git1)

[Repost, no reply has been received]

Hi,

A little while ago, something went horribly wrong.

I could still use my mouse and the desktop was still alive more or
less... everything using networking was dead AND the keyboard was
dead... So I composed commands using existing text on the screen.

The device:
sky2 0000:02:00.0: v1.20 addr 0xdbffc000 irq 17 Yukon-EC (0xb6) rev 2
sky2 0000:02:00.0: PCI Express Advanced Error Reporting not configured or MMCONFIG problem?
sky2 0000:02:00.0: No interrupt generated using MSI, switching to INTx mode.
sky2 eth0: addr 00:15:f2:aa:8b:3e

From dmesg:
sky2 eth0: hung mac 124:39 fifo 195 (185:180)
sky2 eth0: receiver hang detected
sky2 eth0: disabling interface
NETDEV WATCHDOG: eth0: transmit timed out
sky2 eth0: tx timeout
sky2 eth0: transmit ring 442 .. 461 report=442 done=442
NETDEV WATCHDOG: eth0: transmit timed out
sky2 eth0: tx timeout
sky2 eth0: transmit ring 442 .. 461 report=442 done=442
NETDEV WATCHDOG: eth0: transmit timed out
sky2 eth0: tx timeout
sky2 eth0: transmit ring 442 .. 461 report=442 done=442
NETDEV WATCHDOG: eth0: transmit timed out
sky2 eth0: tx timeout
sky2 eth0: transmit ring 442 .. 461 report=442 done=442
NETDEV WATCHDOG: eth0: transmit timed out
sky2 eth0: tx timeout
sky2 eth0: transmit ring 442 .. 461 report=442 done=442
NETDEV WATCHDOG: eth0: transmit timed out
sky2 eth0: tx timeout
sky2 eth0: transmit ring 442 .. 461 report=442 done=442
NETDEV WATCHDOG: eth0: transmit timed out
sky2 eth0: tx timeout
sky2 eth0: transmit ring 442 .. 461 report=442 done=442
NETDEV WATCHDOG: eth0: transmit timed out
sky2 eth0: tx timeout
sky2 eth0: transmit ring 442 .. 461 report=442 done=442

And it continues until I press the reset button.


--
Ian Kumlien <pomac () vapor ! com> -- http://pomac.netswarm.net



2007-11-27 22:58:24

by Stephen Hemminger

Subject: Re: [SKY2] Problems (2.6.24-rc3-git1)

On Tue, 27 Nov 2007 23:40:22 +0100
Ian Kumlien <[email protected]> wrote:

> [Repost, no reply has been received]
>
> Hi,
>
> A little while ago, something went horribly wrong.
>
> I could still use my mouse and the desktop was still alive more or
> less... everything using networking was dead AND the keyboard was
> dead... So i composed commands using existing text on the screen.
>
> The device:
> sky2 0000:02:00.0: v1.20 addr 0xdbffc000 irq 17 Yukon-EC (0xb6) rev 2
> sky2 0000:02:00.0: PCI Express Advanced Error Reporting not configured or MMCONFIG problem?
> sky2 0000:02:00.0: No interrupt generated using MSI, switching to INTx mode.
> sky2 eth0: addr 00:15:f2:aa:8b:3e

The recovery logic for hung FIFO no longer works in 2.6.24-rc1+, I'm looking
into it.


--
Stephen Hemminger <[email protected]>

2007-11-27 23:08:16

by Ian Kumlien

Subject: [SKY2] Problems (2.6.24-rc3-git1)

From lkml:
On Tue, 27 Nov 2007 23:40:22 +0100
Ian Kumlien <[email protected]> wrote:

> [Repost, no reply has been received]
>
> Hi,
>
> A little while ago, something went horribly wrong.
>
> I could still use my mouse and the desktop was still alive more or
> less... everything using networking was dead AND the keyboard was
> dead... So i composed commands using existing text on the screen.
>
> The device:
> sky2 0000:02:00.0: v1.20 addr 0xdbffc000 irq 17 Yukon-EC (0xb6) rev 2
> sky2 0000:02:00.0: PCI Express Advanced Error Reporting not configured or MMCONFIG problem?
> sky2 0000:02:00.0: No interrupt generated using MSI, switching to INTx mode.
> sky2 eth0: addr 00:15:f2:aa:8b:3e

The recovery logic for hung FIFO no longer works in 2.6.24-rc1+, I'm looking
into it.
---

Ahh, OK, thanks. I don't know whether your reply got caught in greylisting...
If not, please always CC me =)
(Did you get the email on netdev as well? In that case, I'm very sorry
for reposting... I only follow linux-kernel on a regular basis.)

When you have something ready for testing, mail me a patch =)

--
Ian Kumlien <pomac () vapor ! com> -- http://pomac.netswarm.net

