2009-04-13 22:48:36

by Yan Seiner

Subject: mcp55 forcedeth woes

I have a few Asus M2N-SLI deluxe mobos. These mobos have the MCP55
chipset and two 1gb ethernet ports. Occasionally, and for no reason that
I can figure out, these ports will die. There are various ways to try and
fix these; they seem to be about 50% effective, and approach something
akin to voodoo.

Based on this discussion here:

http://patchwork.kernel.org/patch/16212/

I've gotten the ability to turn the ports on and off somewhat.

For port 0,

ethtool -s eth0 autoneg off speed 10 duplex full

turns on the link, and gets me half-duplex, 10mb/sec. Not much, granted.

ethtool -s eth0 autoneg off speed 100 duplex full

causes the link to go up and down on about a 2 second cycle.

ethtool -s eth0 autoneg on

causes the link to drop.

For port 1, the behavior is similar, except that I can get a stable 100
mbit connection.
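
The trials above could be scripted so that each setting is applied and the resulting link state is recorded the same way every time. This is only a minimal sketch: the helper names (link_cmd, link_state, try_settings) and the DRY_RUN variable are hypothetical, and the 5 second settle time is an arbitrary guess. It relies only on the `ethtool -s` options already shown and on the "Link detected:" line that `ethtool IFACE` prints.

```shell
#!/bin/sh
# Hypothetical helper: print the ethtool command for one forced-link trial.
# Usage: link_cmd IFACE SPEED   (SPEED is 10, 100, or "auto")
link_cmd() {
    iface=$1
    speed=$2
    if [ "$speed" = "auto" ]; then
        echo "ethtool -s $iface autoneg on"
    else
        echo "ethtool -s $iface autoneg off speed $speed duplex full"
    fi
}

# Hypothetical helper: read `ethtool IFACE` output on stdin and print
# only the value of its "Link detected:" line (yes or no).
link_state() {
    sed -n 's/^[[:space:]]*Link detected:[[:space:]]*//p'
}

# Try each setting in turn. With DRY_RUN=1 the commands are only printed;
# otherwise each one is run (needs root) and the link state is reported
# after an arbitrary 5 second settle time.
try_settings() {
    iface=$1
    for speed in 10 100 auto; do
        cmd=$(link_cmd "$iface" "$speed")
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$cmd"
        else
            $cmd && sleep 5 && echo "$speed: $(ethtool "$iface" | link_state)"
        fi
    done
}
```

Running `DRY_RUN=1 try_settings eth0` prints the three commands without touching the interface, and `try_settings eth1` as root repeats the same sequence on the second port.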

So the problem appears to be in the autoneg code. I suspect a driver issue,
since this hardware is widely reported to work under various flavors of
Windows.

I'm running 2.6.29.1; I'm ok with patching and building kernels, but I'm
not a kernel hacker.

What, if anything, can I provide and do to fix this?


--
Yan Seiner, PE

Support my bid for the 4J School Board
http://www.seiner.com


2009-04-13 23:49:00

by Andrew Morton

Subject: Re: mcp55 forcedeth woes

(suitable cc's added)

On Mon, 13 Apr 2009 15:13:05 -0700 (PDT)
"Yan Seiner" <[email protected]> wrote:

> I have a few Asus M2N-SLI deluxe mobos. These mobos have the MCP55
> chipset and two 1gb ethernet ports. Occasionally, and for no reason that
> I can figure out, these ports will die. There are various ways to try and
> fix these; they seem to be about 50% effective, and approach something
> akin to voodoo.
>
> Based on this discussion here:
>
> http://patchwork.kernel.org/patch/16212/
>
> I've gotten the ability to turn the ports on and off somewhat.
>
> For port 0,
>
> ethtool -s eth0 autoneg off speed 10 duplex full
>
> turns on the link, and gets me half-duplex, 10mb/sec. Not much, granted.
>
> ethtool -s eth0 autoneg off speed 100 duplex full
>
> causes the link to go up and down on about a 2 second cycle.
>
> ethtool -s eth0 autoneg on
>
> causes the link to drop.
>
> For port 1, the behavior is similar, except that I can get a stable 100
> mbit connection.
>
> So the problem is in the autoneg code. It's a driver issue as this is
> reported widely to work under windows of various flavors.
>
> I'm running 2.6.29.1; I'm ok with patching and building kernels, but I'm
> not a kernel hacker.
>
> What, if anything, can I provide and do to fix this?
>

2009-04-14 14:37:00

by Yan Seiner

Subject: Re: mcp55 forcedeth woes

Gene Heskett wrote:
> On Monday 13 April 2009, Yan Seiner wrote:
>
>> Gene Heskett wrote:
>>
>>> On Monday 13 April 2009, Yan Seiner wrote:
>>>
>>>> I have a few Asus M2N-SLI deluxe mobos. These mobos have the MCP55
>>>> chipset and two 1gb ethernet ports. Occasionally, and for no reason that
>>>> I can figure out, these ports will die. There are various ways to try
>>>> and fix these; they seem to be about 50% effective, and approach
>>>> something akin to voodoo.
>>>>
>>>> Based on this discussion here:
>>>>
>>>> http://patchwork.kernel.org/patch/16212/
>>>>
>>>> I've gotten the ability to turn the ports on and off somewhat.
>>>>
>>>> For port 0,
>>>>
>>>> ethtool -s eth0 autoneg off speed 10 duplex full
>>>>
>>>> turns on the link, and gets me half-duplex, 10mb/sec. Not much, granted.
>>>>
>>>> ethtool -s eth0 autoneg off speed 100 duplex full
>>>>
>>>> causes the link to go up and down on about a 2 second cycle.
>>>>
>>>> ethtool -s eth0 autoneg on
>>>>
>>>> causes the link to drop.
>>>>
>>>> For port 1, the behavior is similar, except that I can get a stable 100
>>>> mbit connection.
>>>>
>>>> So the problem is in the autoneg code. It's a driver issue as this is
>>>> reported widely to work under windows of various flavors.
>>>>
>>>> I'm running 2.6.29.1; I'm ok with patching and building kernels, but I'm
>>>> not a kernel hacker.
>>>>
>>>> What, if anything, can I provide and do to fix this?
>>>>
>>> It was in 2.6.29-rc8 or 9 that they finally got the ability to turn them
>>> back on on my identical mobo. Through most of the 29-rcx series we had
>>> the choice of rebooting with the reset button, or powering everything
>>> associated with the ports down for about 2 minutes so they would forget
>>> they were turned off by a graceful shutdown. Now they are turned off (and
>>> I've still NDI why) and back on like they are supposed to be. I called
>>> that a PIMA.
>>>
>> Yeah, I've been fighting this for a while.... The boards are rock-solid
>> under load, which is why I like them.... But this is a PITA.
>>
>> This board worked fine, then I shutdown and the ports have not come back
>> since. I'm running 2.6.29.1 - no joy on the ports.
>>
>>
>
> Shut it down, including removing the power cord, and unplug all ethernet
> cables attached, give it time to fully discharge all stored power in the caps,
> at least 30 secs, I usually go make a cup of tea in the microwave, so its
> about 3 minutes. Plug everything back in and power it up, they should work
> again.
>
> However, I've been running 2.6.29.1-rc2, and that has not been a problem, I
> can see the leds on the ports go plumb dark at it runs the shutdown, and come
> back on about 15 seconds before the init.d/network script runs as it boots up.
>
>
>> I'm building forcedeth from .30-rc1 - we'll see if that helps.
>> Something like 15% of the driver changed, so it's still in very heavy
>> development.
>>
>
> Wow! Like you, I'm using forcedeth. 2.6.29.1-rc1 did have a short uptime for
> forcedeth bug IIRC, but so far, -rc2 has lasted longer than KDE-4.2.1 on this
> F10 system will, I had an 8 day uptime at first, and I'm in the 5th day again.
> I killed it the first time screwing with some worthless bluetooth dongles I
> got from USBGear. Locked it up tighter than a Nebraska bulls ass in flytime.
> Had to use the reset button. :(
>
I followed Gene's advice and got some progress....

port 0 is now fully functional. Port 1, however, remains in its
semi-zombie state. Maybe I'll take the machine off-line longer next
time. Maybe I'll sacrifice a black and white chicken at the same time.

I also back-ported 2.6.30-rc1 forcedeth.c to my 2.6.29.1 kernel; no
difference whatsoever.

One other observation:

On the switch, the status LEDs glow with half-brightness when I enable
port 1 using ethtool... This leads me to suspect that the voltage
levels on the port aren't normal. Not being a hardware engineer, I have
no idea what importance this has; I am offering this as an observation.

--Yan

--
Yan Seiner

Support my bid for the 4J School Board.
Visit http://www.seiner.com/schoolboard

2009-04-14 16:35:23

by John Stoffel

Subject: Re: mcp55 forcedeth woes

>>>>> "Yan" == Yan Seiner <[email protected]> writes:

Yan> Gene Heskett wrote:
>> On Monday 13 April 2009, Yan Seiner wrote:
>>
>>> Gene Heskett wrote:
>>>
>>>> On Monday 13 April 2009, Yan Seiner wrote:
>>>>
>>>>> I have a few Asus M2N-SLI deluxe mobos. These mobos have the MCP55
>>>>> chipset and two 1gb ethernet ports. Occasionally, and for no reason that
>>>>> I can figure out, these ports will die. There are various ways to try
>>>>> and fix these; they seem to be about 50% effective, and approach
>>>>> something akin to voodoo.
>>>>>
>>>>> Based on this discussion here:
>>>>>
>>>>> http://patchwork.kernel.org/patch/16212/
>>>>>
>>>>> I've gotten the ability to turn the ports on and off somewhat.
>>>>>
>>>>> For port 0,
>>>>>
>>>>> ethtool -s eth0 autoneg off speed 10 duplex full
>>>>>
>>>>> turns on the link, and gets me half-duplex, 10mb/sec. Not much, granted.
>>>>>
>>>>> ethtool -s eth0 autoneg off speed 100 duplex full
>>>>>
>>>>> causes the link to go up and down on about a 2 second cycle.
>>>>>
>>>>> ethtool -s eth0 autoneg on
>>>>>
>>>>> causes the link to drop.
>>>>>
>>>>> For port 1, the behavior is similar, except that I can get a stable 100
>>>>> mbit connection.
>>>>>
>>>>> So the problem is in the autoneg code. It's a driver issue as this is
>>>>> reported widely to work under windows of various flavors.
>>>>>
>>>>> I'm running 2.6.29.1; I'm ok with patching and building kernels, but I'm
>>>>> not a kernel hacker.
>>>>>
>>>>> What, if anything, can I provide and do to fix this?
>>>>>
>>>> It was in 2.6.29-rc8 or 9 that they finally got the ability to turn them
>>>> back on on my identical mobo. Through most of the 29-rcx series we had
>>>> the choice of rebooting with the reset button, or powering everything
>>>> associated with the ports down for about 2 minutes so they would forget
>>>> they were turned off by a graceful shutdown. Now they are turned off (and
>>>> I've still NDI why) and back on like they are supposed to be. I called
>>>> that a PIMA.
>>>>
>>> Yeah, I've been fighting this for a while.... The boards are rock-solid
>>> under load, which is why I like them.... But this is a PITA.
>>>
>>> This board worked fine, then I shutdown and the ports have not come back
>>> since. I'm running 2.6.29.1 - no joy on the ports.
>>>
>>>
>>
>> Shut it down, including removing the power cord, and unplug all ethernet
>> cables attached, give it time to fully discharge all stored power in the caps,
>> at least 30 secs, I usually go make a cup of tea in the microwave, so its
>> about 3 minutes. Plug everything back in and power it up, they should work
>> again.
>>
>> However, I've been running 2.6.29.1-rc2, and that has not been a problem, I
>> can see the leds on the ports go plumb dark at it runs the shutdown, and come
>> back on about 15 seconds before the init.d/network script runs as it boots up.
>>
>>
>>> I'm building forcedeth from .30-rc1 - we'll see if that helps.
>>> Something like 15% of the driver changed, so it's still in very heavy
>>> development.
>>>
>>
>> Wow! Like you, I'm using forcedeth. 2.6.29.1-rc1 did have a short uptime for
>> forcedeth bug IIRC, but so far, -rc2 has lasted longer than KDE-4.2.1 on this
>> F10 system will, I had an 8 day uptime at first, and I'm in the 5th day again.
>> I killed it the first time screwing with some worthless bluetooth dongles I
>> got from USBGear. Locked it up tighter than a Nebraska bulls ass in flytime.
>> Had to use the reset button. :(
>>
Yan> I followed Gene's advice and got some progress....

Yan> port 0 is now fully functional. Port 1, however, remains in its
Yan> semi-zombie state. Maybe I'll take the machine off-line longer next
Yan> time. Maybe I'll sacrifice a black and white chicken at the same time.

Yan> I also back-ported 2.6.30-rc1 forcedeth.c to my 2.6.29.1 kernel; no
Yan> difference whatsoever.

Yan> One other observation:

Yan> On the switch, the status LEDs glow with half-brightness when I enable
Yan> port 1 using ethtool... This leads me to suspect that the voltage
Yan> levels on the port aren't normal. Not being a hardware engineer, I have
Yan> no idea what importance this has; I am offering this as an observation.

I've also got one of these boards. I seem to recall that 2.6.28 and
higher worked just fine, but when I was chasing another problem
(complete system crashes when running tcpdump), I found the dead
port problem as well.

Thanks for posting your workarounds; they'll be a big help, especially
if I put them early in the system boot process as a quick hack.

The M2N-SLI Deluxe boards are nice though, stable stable stable. I've
been very happy with mine. I wonder if there's a newer BIOS which
might address some of these issues as well, since it seems to me that
the BIOS, esp on power up, should be resetting the ports to something
more sane.

John

2009-04-14 16:48:39

by Yan Seiner

Subject: Re: mcp55 forcedeth woes


On Tue, April 14, 2009 9:34 am, John Stoffel wrote:
>
> I've also got one of these boards, but I seem to recall that 2.6.28
> and higher worked just fine, but when I was chasing another problem
> with complete system crashes when running tcpdump, I found the dead
> port problem as well.

Yeah, this one surfaced after I had a memory stick go bad and lock up the
machine several times.

>
> Thanks for posting your workaround, they'll be a big help. Esp if I
> put them in early to the system boot process as a quick hack.
>
> The M2N-SLI Deluxe boards are nice though, stable stable stable. I've
> been very happy with mine. I wonder if there's a newer BIOS which
> might address some of these issues as well, since it seems to me that
> the BIOS, esp on power up, should be resetting the ports to something
> more sane.

There is a newer BIOS, 1702, that may address these issues - at least the
writeup for the 17xx BIOS for the m2n32 talks about a network lockup fix.
I haven't tried yet. These are the most stable boards I've found; one of
mine runs 6 SATA drives locally, 9 via eSATA, 2 dual DVI video cards, and
2 15K scsi drives, all without a hitch. It never glitches - except for
the d*mn network, this board is a piece of perfection.

--
Yan Seiner, PE

Support my bid for the 4J School Board
http://www.seiner.com

2009-04-15 03:01:28

by Sid Boyce

Subject: Re: mcp55 forcedeth woes

Yan Seiner wrote:
> There is a newer bios, 1702, that may address these issues - at least the
> writeup for the 17xx bios for the m2n32 talks about a network lockup fix.
> I haven't tried yet. These are the most stable boards I've found; one of
> mine runs 6 SATA drives locally, 9 via eSATA, 2 dual DVI video cards, and
> 2 15K scsi drives, all without a hitch. It never glitches - except for
> the d*mn network, this board is a piece of perfection.
For the M2N-SLI Deluxe I found only BIOS 1701, which causes an oops when
booting any kernel on a 64 X2 6000. For the M2N32-SLI I found 2205, which
I haven't tried so far.

Regards
Sid.
--
Sid Boyce ... Hamradio License G3VBV, Licensed Private Pilot
Emeritus IBM/Amdahl Mainframes and Sun/Fujitsu Servers Tech Support
Specialist, Cricket Coach
Microsoft Windows Free Zone - Linux used for all Computing Tasks

2009-04-15 13:54:59

by Yan Seiner

Subject: Re: mcp55 forcedeth woes

Andrew Morton wrote:
> (suitable cc's added)
>
> On Mon, 13 Apr 2009 15:13:05 -0700 (PDT)
> "Yan Seiner" <[email protected]> wrote:
>
>
>> I have a few Asus M2N-SLI deluxe mobos. These mobos have the MCP55
>> chipset and two 1gb ethernet ports. Occasionally, and for no reason that
>> I can figure out, these ports will die. There are various ways to try and
>> fix these; they seem to be about 50% effective, and approach something
>> akin to voodoo.
>>
>> Based on this discussion here:
>>
>> http://patchwork.kernel.org/patch/16212/
>>
>> I've gotten the ability to turn the ports on and off somewhat.
>>
>> For port 0,
>>
>> ethtool -s eth0 autoneg off speed 10 duplex full
>>
>> turns on the link, and gets me half-duplex, 10mb/sec. Not much, granted.
>>
>> ethtool -s eth0 autoneg off speed 100 duplex full
>>
>> causes the link to go up and down on about a 2 second cycle.
>>
>> ethtool -s eth0 autoneg on
>>
>> causes the link to drop.
>>
>> For port 1, the behavior is similar, except that I can get a stable 100
>> mbit connection.
>>
>> So the problem is in the autoneg code. It's a driver issue as this is
>> reported widely to work under windows of various flavors.
>>
>> I'm running 2.6.29.1; I'm ok with patching and building kernels, but I'm
>> not a kernel hacker.
>>
>> What, if anything, can I provide and do to fix this?
>>
>>
>
>
>
I've been reading the forcedeth.c code.

So far I've established that it does recognize the Marvell PHY. AFAIK,
the PHY on the board is 88E1116, so that much is right. OK, not much
progress.

In nv_probe, there are some lines that should turn on the PHY. I stuck
some printk in there to see what's happening. Turns out nothing is
happening..... powerstate remains 0? Does this mean that the power is
already on? Or that we're reading/writing the wrong register?

    if (id->driver_data & DEV_HAS_POWER_CNTRL) {

        /* take phy and nic out of low power mode */
        powerstate = readl(base + NvRegPowerState2);
        printk(KERN_INFO "Turning on power: 0x%04x.\n", powerstate);
        powerstate &= ~NVREG_POWERSTATE2_POWERUP_MASK;
        if ((id->device == PCI_DEVICE_ID_NVIDIA_NVENET_12 ||
             id->device == PCI_DEVICE_ID_NVIDIA_NVENET_13) &&
            pci_dev->revision >= 0xA3)
            powerstate |= NVREG_POWERSTATE2_POWERUP_REV_A3;
        printk(KERN_INFO "Writing powerstate: 0x%04x.\n", powerstate);
        writel(powerstate, base + NvRegPowerState2);
        powerstate = readl(base + NvRegPowerState2);
        printk(KERN_INFO "Powerstate 0x%04x.\n", powerstate);
    }

[ 3936.277196] forcedeth: Reverse Engineered nForce ethernet driver. Version 0.64.
[ 3936.277223] forcedeth 0000:00:08.0: PCI INT A -> Link[APCH] -> GSI 22 (level, low) -> IRQ 22
[ 3936.277230] forcedeth 0000:00:08.0: setting latency timer to 64
[ 3936.277331] nv_probe: set workaround bit for reversed mac addr
[ 3936.277337] Turning on power: 0x0000.
[ 3936.277339] Writing powerstate: 0x0000.
[ 3936.277343] Powerstate 0x0000.
[ 3936.278352] 0000:00:08.0: open: Found PHY 5040:0003 at address 19.
[ 3936.796061] 0000:00:08.0: phy reset
[ 3937.316809] forcedeth 0000:00:08.0: ifname eth0, PHY OUI 0x5043 @ 19, addr 00:1e:8c:6f:a5:27
[ 3937.316816] forcedeth 0000:00:08.0: highdma csum vlan pwrctl mgmt gbit lnktim msi desc-v3
[ 3937.317113] forcedeth 0000:00:09.0: PCI INT A -> Link[AMC1] -> GSI 20 (level, low) -> IRQ 20
[ 3937.317120] forcedeth 0000:00:09.0: setting latency timer to 64
[ 3937.317200] nv_probe: set workaround bit for reversed mac addr
[ 3937.317206] Turning on power: 0x0000.
[ 3937.317209] Writing powerstate: 0x0000.
[ 3937.317213] Powerstate 0x0000.
[ 3937.318317] 0000:00:09.0: open: Found PHY 5040:0003 at address 19.
[ 3937.379044] forcedeth 0000:00:08.0: irq 29 for MSI/MSI-X
[ 3937.379236] eth0: no link during initialization.
[ 3937.836062] 0000:00:09.0: phy reset
[ 3938.356857] forcedeth 0000:00:09.0: ifname eth1, PHY OUI 0x5043 @ 19, addr 00:1e:8c:6f:be:40
[ 3938.356864] forcedeth 0000:00:09.0: highdma csum vlan pwrctl mgmt gbit lnktim msi desc-v3
[ 3938.417202] forcedeth 0000:00:09.0: irq 30 for MSI/MSI-X
[ 3938.417396] eth1: no link during initialization.

(this is getting more urgent; I rebooted the machine and lost both
network ports. Instead of dual-gigabit I'm running off an ancient PCI
card. I'm willing to test/build/debug, but I need some input from those
who understand the forcedeth code....)
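
One thing the printk output by itself cannot settle: a register that reads 0 will be written back as 0 by that snippet whenever the revision A3 branch is not taken, so the "Writing powerstate: 0x0000" line follows from the arithmetic alone. The sketch below replays the masking step in shell arithmetic. The two constants are assumptions recalled from forcedeth.c and should be checked against the defines in the tree being built; next_powerstate is a hypothetical helper, not driver code.

```shell
#!/bin/sh
# Assumed values; verify against the #defines in your forcedeth.c.
POWERUP_MASK=0x0F11      # NVREG_POWERSTATE2_POWERUP_MASK (assumed)
POWERUP_REV_A3=0x0001    # NVREG_POWERSTATE2_POWERUP_REV_A3 (assumed)

# Hypothetical helper mirroring the snippet's arithmetic.
# Usage: next_powerstate CURRENT IS_REV_A3
#   CURRENT   - value read from the register (hex or decimal)
#   IS_REV_A3 - 1 if the device/revision test in the snippet is true
# Prints the value the snippet would write back.
next_powerstate() {
    val=$(( $1 & ~$POWERUP_MASK ))         # clear the power-up bits
    if [ "$2" = "1" ]; then
        val=$(( val | $POWERUP_REV_A3 ))   # set the rev A3 bit
    fi
    printf '0x%04x\n' "$val"
}
```

With these assumed constants, `next_powerstate 0x0000 0` prints 0x0000, matching the log, and only the A3 branch would produce a nonzero write. So the log by itself does not distinguish "power is already on" from "wrong register".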

--
Yan Seiner

Support my bid for the 4J School Board.
Visit http://www.seiner.com/schoolboard