2005-12-19 19:54:58

by Carsten Otto

Subject: Intel e1000 fails after RAM upgrade

Hi there!

First the basic system specs:
Athlon64 3500+ S939, Winchester
Kernel 2.6.14.4, X86_64
4*1 GB RAM DDR 333, Dual Channel [before: 2*1 GB RAM DDR 400, Dual Channel]
Intel Gigabit PCI (Intel Corporation 82540EM Gigabit Ethernet Controller (rev 02))
Abit AV8

After upgrading the memory to 4 GB I noticed my e1000 did not work.
dmesg shows:

e1000: eth0: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex
e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
TDH <2000>
TDT <2000>
next_to_use <6>
next_to_clean <0>
buffer_info[next_to_clean]
dma <13024c002>
time_stamp <ffffd8c7>
next_to_watch <0>
jiffies <ffffe096>
next_to_watch.status <0>
e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
TDH <2000>
TDT <2000>
next_to_use <6>
next_to_clean <0>
buffer_info[next_to_clean]
dma <13024c002>
time_stamp <ffffd8c7>
next_to_watch <0>
jiffies <ffffe28a>
next_to_watch.status <0>
eth0: no IPv6 routers present
e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
TDH <2000>
TDT <2000>
next_to_use <6>
next_to_clean <0>
buffer_info[next_to_clean]
dma <13024c002>
time_stamp <ffffd8c7>
next_to_watch <0>
jiffies <ffffe47e>
next_to_watch.status <0>
e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
TDH <2000>
TDT <2000>
next_to_use <6>
next_to_clean <0>
buffer_info[next_to_clean]
dma <13024c002>
time_stamp <ffffd8c7>
next_to_watch <0>
jiffies <ffffe672>
next_to_watch.status <0>
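All four Tx Unit Hang reports above show the same time_stamp while jiffies keeps increasing. The sketch below subtracts the two to get how long the stuck buffer has been waiting. It assumes that time_stamp is the jiffies value recorded when the buffer was queued. That comes from my reading of the e1000 driver, not from this thread. The values are printed as 32-bit numbers close to the wrap-around point, so the subtraction is done modulo 2^32. `pending_age` is a hypothetical helper name.

```python
# Values copied from the four "Detected Tx Unit Hang" reports above.
TIME_STAMP = 0xFFFFD8C7  # identical in every report
JIFFIES = [0xFFFFE096, 0xFFFFE28A, 0xFFFFE47E, 0xFFFFE672]


def pending_age(jiffies: int, time_stamp: int, bits: int = 32) -> int:
    """Jiffies elapsed since time_stamp, allowing for counter wrap-around."""
    return (jiffies - time_stamp) % (1 << bits)


ages = [pending_age(j, TIME_STAMP) for j in JIFFIES]
gaps = [later - earlier for earlier, later in zip(ages, ages[1:])]
print(ages)  # [1999, 2499, 2999, 3499]
print(gaps)  # [500, 500, 500]
```

The reports arrive 500 jiffies apart, and the same buffer stays stuck throughout. The equivalent in seconds depends on the kernel's HZ setting, which the thread does not give.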

ethtool -t eth0 offline:
The test result is FAIL
The test extra info:
Register test (offline) 40
Eeprom test (offline) 0
Interrupt test (offline) 4
Loopback test (offline) 13
Link test (on/offline) 1
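The `ethtool -t` output above gives one result code per test. The sketch below picks out the failing tests. It assumes the usual convention that 0 means a test passed. What each non-zero code means is driver-specific and is not explained here. `failed_tests` is a hypothetical helper.

```python
import re

ETHTOOL_OUTPUT = """\
The test result is FAIL
The test extra info:
Register test (offline) 40
Eeprom test (offline) 0
Interrupt test (offline) 4
Loopback test (offline) 13
Link test (on/offline) 1
"""

# "<test name> (<mode>) <code>", e.g. "Register test (offline) 40"
LINE = re.compile(r"^(?P<name>.+?)\s+\((?P<mode>[^)]*)\)\s+(?P<code>-?\d+)$")


def failed_tests(output: str) -> dict[str, int]:
    """Return {test name: result code} for every test with a non-zero code."""
    failures = {}
    for line in output.splitlines():
        m = LINE.match(line.strip())
        if m and int(m.group("code")) != 0:
            failures[m.group("name")] = int(m.group("code"))
    return failures


print(failed_tests(ETHTOOL_OUTPUT))
# {'Register test': 40, 'Interrupt test': 4, 'Loopback test': 13, 'Link test': 1}
```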

I have two of these cards. Both run fine in my (old, 32-bit) server. I
tested both cards in both systems. The error occurs only in my 64-bit
machine, with either card.

Please tell me what to do. In the meantime I have to make do with the
onboard VIA NIC, and that is not the best network card...

Thanks a lot,
--
Carsten Otto
[email protected]
http://www.c-otto.de



2005-12-19 22:38:36

by Carsten Otto

Subject: Re: Intel e1000 fails after RAM upgrade

On Mon, Dec 19, 2005 at 08:54:58PM +0100, Carsten Otto wrote:
> After upgrading the memory to 4 GB I noticed my e1000 did not work.

The problem also exists when I remove 2 GB. So it must have something
to do with the kernel update I did in between. I will downgrade the
kernel now until this is solved.
--
Carsten Otto
[email protected]
http://www.c-otto.de



2005-12-19 22:50:03

by Bonilla, Alejandro

Subject: RE: Intel e1000 fails after RAM upgrade


|-----Original Message-----
|From: [email protected]
|[mailto:[email protected]] On Behalf Of Carsten Otto
|Sent: Monday, December 19, 2005 4:39 PM
|To: [email protected]
|Subject: Re: Intel e1000 fails after RAM upgrade
|
|On Mon, Dec 19, 2005 at 08:54:58PM +0100, Carsten Otto wrote:
|> After upgrading the memory to 4 GB I noticed my e1000 did not work.
|
|The problem also exists when I remove 2 GB. So it must have something
|to do with the kernel update I did in between. I will downgrade the
|kernel now until this is solved.
|--
|Carsten Otto
|[email protected]
|http://www.c-otto.de
|

I wish I could add something to the thread here. Sorry for not chiming
in earlier; I did not see it.

I remember the 100VE and the 1000MT giving issues, whether as a LOM or
even as a PCI adapter. I used to fix a lot of these problems by
removing all power from the board (the Molex connector, maybe also the
motherboard "battery") for a few minutes and then plugging everything
back in.

Basically, the NIC wouldn't negotiate or would act funky: no link, or
no real connectivity.

Dunno if you already tried it; hope it helps.

.Alejandro

2005-12-19 23:09:08

by Carsten Otto

Subject: Re: Intel e1000 fails after RAM upgrade

On Mon, Dec 19, 2005 at 04:49:58PM -0600, Bonilla, Alejandro wrote:
> I remember the 100VE and the 1000MT giving issues, whether as a LOM or
> even as a PCI adapter. I used to fix a lot of these problems by
> removing all power from the board (the Molex connector, maybe also the
> motherboard "battery") for a few minutes and then plugging everything
> back in.

I carried the card around the house for a few minutes (although the
computer was powered for most of that time). I will try a BIOS reset
(without the battery) now.

The kernel downgrade did not help.

e1000: eth0: e1000_reg_test: pattern test reg 0028 failed: got 0xA5A585A5 expected 0xA5A5A5A5

This is shown when testing with ethtool.
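The register test message compares the pattern written to the register with the value read back. XOR-ing the two isolates the bits that differ. The helper name `differing_bits` is illustrative only.

```python
def differing_bits(got: int, expected: int, width: int = 32) -> list[int]:
    """Return the indices of the bits that differ between two register values."""
    diff = got ^ expected
    return [bit for bit in range(width) if diff >> bit & 1]


got, expected = 0xA5A585A5, 0xA5A5A5A5
print(hex(got ^ expected))            # 0x2000
print(differing_bits(got, expected))  # [13]
# Bit 13 should be set but reads back as clear.
print(bool(expected >> 13 & 1), bool(got >> 13 & 1))  # True False
```

Only bit 13 (0x2000) differs. The value 0x2000 also appears as TDH and TDT in the first hang report.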

Bye,
--
Carsten Otto
[email protected]
http://www.c-otto.de



2005-12-19 23:36:00

by Jesse Brandeburg

Subject: Re: Intel e1000 fails after RAM upgrade

On 12/19/05, Carsten Otto <[email protected]> wrote:
> Hi there!
>
> First the basic system specs:
> Athlon64 3500+ S939, Winchester
> Kernel 2.6.14.4, X86_64
> 4*1 GB RAM DDR 333, Dual Channel [before: 2*1 GB RAM DDR 400, Dual Channel]
> Intel Gigabit PCI (Intel Corporation 82540EM Gigabit Ethernet Controller (rev 02))
> Abit AV8
>
> After upgrading the memory to 4 GB I noticed my e1000 did not work.
> dmesg shows:
>
> e1000: eth0: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex
> e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
> TDH <2000>
> TDT <2000>
> next_to_use <6>
> next_to_clean <0>
> buffer_info[next_to_clean]
> dma <13024c002>
> time_stamp <ffffd8c7>
> next_to_watch <0>
> jiffies <ffffe096>
> next_to_watch.status <0>

Are you using 4096 Tx descriptors? What is your MTU configured to?
I'm confused because it appears you have 8192 (0x2000) descriptors, but
the driver only allows 4096.
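TDH and TDT are printed in hex. The sketch below assumes they are head and tail slot indices into the Tx descriptor ring, which is how I understand the e1000 driver uses them. Under that assumption a valid value must be strictly smaller than the ring size. It checks the logged value against the 4096 limit mentioned above. `DRIVER_MAX_TXD` and `index_fits_ring` are hypothetical names.

```python
# Hypothetical name for the 4096-descriptor limit mentioned in the reply above.
DRIVER_MAX_TXD = 4096


def index_fits_ring(index: int, ring_size: int) -> bool:
    """A head/tail index must name an existing slot: 0 <= index < ring_size."""
    return 0 <= index < ring_size


tdh = tdt = 0x2000  # TDH <2000> and TDT <2000> from the log, read as hex
print(tdh)                                   # 8192
print(index_fits_ring(tdh, DRIVER_MAX_TXD))  # False
```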

> e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
> TDH <2000>
> TDT <2000>
> next_to_use <6>
> next_to_clean <0>
> buffer_info[next_to_clean]
> dma <13024c002>
> time_stamp <ffffd8c7>
> next_to_watch <0>
> jiffies <ffffe28a>
> next_to_watch.status <0>
> eth0: no IPv6 routers present
> e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
> TDH <2000>
> TDT <2000>
> next_to_use <6>
> next_to_clean <0>
> buffer_info[next_to_clean]
> dma <13024c002>
> time_stamp <ffffd8c7>
> next_to_watch <0>
> jiffies <ffffe47e>
> next_to_watch.status <0>
> e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
> TDH <2000>
> TDT <2000>
> next_to_use <6>
> next_to_clean <0>
> buffer_info[next_to_clean]
> dma <13024c002>
> time_stamp <ffffd8c7>
> next_to_watch <0>
> jiffies <ffffe672>
> next_to_watch.status <0>
>
> ethtool -t eth0 offline:
> The test result is FAIL
> The test extra info:
> Register test (offline) 40
> Eeprom test (offline) 0
> Interrupt test (offline) 4
> Loopback test (offline) 13
> Link test (on/offline) 1
>
> I have two of these cards. Both run fine in my (old, 32-bit) server. I
> tested both cards in both systems. The error occurs only in my 64-bit
> machine, with either card.
>
> Please tell me what to do. In the meantime I have to make do with the
> onboard VIA NIC, and that is not the best network card...

Well, let's work out what is occurring, because this should work just fine.

2005-12-20 00:47:55

by Carsten Otto

Subject: Re: Intel e1000 fails after RAM upgrade

On Mon, Dec 19, 2005 at 03:35:57PM -0800, Jesse Brandeburg wrote:
> > TDH <2000>
> > TDT <2000>
> > next_to_use <6>
> > next_to_clean <0>
> > buffer_info[next_to_clean]
> > dma <13024c002>
> > time_stamp <ffffd8c7>
> > next_to_watch <0>
> > jiffies <ffffe096>
> > next_to_watch.status <0>
>
> Are you using 4096 Tx descriptors? What is your MTU configured to?
> I'm confused because it appears you have 8192 (0x2000) descriptors, but
> the driver only allows 4096.

I am not. My MTU is 1500. At the moment the error message is:

TDH <0>
TDT <3>
next_to_use <3>
next_to_clean <0>
buffer_info[next_to_clean]
dma <128d61202>
time_stamp <100026052>
next_to_watch <0>
jiffies <10002667c>
next_to_watch.status <0>

I can't reproduce the 2000 value.
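In the latest report, the head (TDH) is 0 while the tail (TDT) and next_to_use are both 3. The sketch below assumes the usual head/tail ring semantics, where the hardware consumes descriptors from head up to tail. From that it computes how many descriptors are queued but not yet processed. It also computes the age of the stuck buffer. These jiffies values have nine hex digits, so no wrap-around handling is applied. Finally, it checks whether the logged DMA addresses lie above the 4 GiB (32-bit) boundary. `RING_SIZE = 256` is a hypothetical placeholder, since the real ring size is not stated. The helper names are illustrative too. These checks only restate what the log shows and do not by themselves explain the hang.

```python
FOUR_GIB = 1 << 32
RING_SIZE = 256  # hypothetical placeholder: the real Tx ring size is not stated


def pending_descriptors(tdh: int, tdt: int, ring_size: int) -> int:
    """Descriptors handed to the NIC (up to tail) but not yet processed (from head)."""
    return (tdt - tdh) % ring_size


def above_4gib(dma: int) -> bool:
    """True if a bus address does not fit in 32 bits."""
    return dma >= FOUR_GIB


# Values from the latest report (TDH <0>, TDT <3>, time_stamp, jiffies).
pending = pending_descriptors(0x0, 0x3, RING_SIZE)
age = 0x10002667C - 0x100026052  # no wrap-around handling applied
print(pending)  # 3
print(age)      # 1578
# DMA addresses from the latest report and from the first one.
print(above_4gib(0x128D61202), above_4gib(0x13024C002))  # True True
```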

PS: The problem does not occur with Knoppix 3.9.

Thanks for the help,
--
Carsten Otto
[email protected]
http://www.c-otto.de



2006-01-02 12:17:46

by Carsten Otto

Subject: Re: Intel e1000 fails after RAM upgrade

On Mon, Dec 19, 2005 at 08:54:58PM +0100, Carsten Otto wrote:
> e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang

Is there anything I can do to make this work, or to give you a better
chance of fixing it? I can provide plenty of debugging information;
just tell me what you need. Unfortunately I am unable to solve the
problem myself.

Short summary:
Every e1000 card I tried (Desktop MT) produces the above error in my
computer but works in other PCs. Changing the memory, BIOS or kernel
version does not change this. Either the kernel has been broken for
several versions or my system (Gentoo) does something ugly to the
driver (I doubt that). Changing some settings with ethtool does not help.

Thanks,
--
Carsten Otto
[email protected]
http://www.c-otto.de

2006-01-04 23:38:57

by Jesse Brandeburg

Subject: Re: Intel e1000 fails after RAM upgrade

On 1/2/06, Carsten Otto <[email protected]> wrote:
> On Mon, Dec 19, 2005 at 08:54:58PM +0100, Carsten Otto wrote:
> > e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
>
> Is there anything I can do to make this work, or to give you a better
> chance of fixing it? I can provide plenty of debugging information;
> just tell me what you need. Unfortunately I am unable to solve the
> problem myself.
>
> Short summary:
> Every e1000 card I tried (Desktop MT) produces the above error in my
> computer but works in other PCs. Changing the memory, BIOS or kernel
> version does not change this. Either the kernel has been broken for
> several versions or my system (Gentoo) does something ugly to the
> driver (I doubt that). Changing some settings with ethtool does not help.

I'm not sure it's the e1000. Is there any chance you can try a
different network adapter (one that is not e1000 based)? Given the
ethtool diagnostics error, something is corrupting memory in your
system or, most likely, on your PCI bus.

My best recommendation is to check whether there are any BIOS updates
for your system, make sure you aren't overclocking the PCI bus, try
different slots, and try a different network adapter. Maybe you can
also run memtest86 overnight?

Honestly, right now it sounds like a system problem rather than a
network problem.

Jesse

2006-01-04 23:44:07

by Carsten Otto

Subject: Re: Intel e1000 fails after RAM upgrade

On Wed, Jan 04, 2006 at 03:38:55PM -0800, Jesse Brandeburg wrote:
> I'm not sure it's the e1000. Is there any chance you can try a
> different network adapter (one that is not e1000 based)? Given the
> ethtool diagnostics error, something is corrupting memory in your
> system or, most likely, on your PCI bus.
>
> My best recommendation is to check whether there are any BIOS updates
> for your system, make sure you aren't overclocking the PCI bus, try
> different slots, and try a different network adapter. Maybe you can
> also run memtest86 overnight?
>
> Honestly, right now it sounds like a system problem rather than a
> network problem.

Sorry, I forgot to reply here.
The problem is solved; see:
http://lkml.org/lkml/2006/1/2/21
--
Carsten Otto
[email protected]
http://www.c-otto.de

