LinuxLists.cc - Intel 82559 NIC corrupted EEPROM

Subject: Re: Intel 82559 NIC corrupted EEPROM

John wrote:
>
> Several people have reported the same error. Intel's Auke Kok has
> stated that ignoring the error is a BAD idea.
>
> http://lkml.org/lkml/2006/7/10/215
>
> What tool is used to reprogram the EEPROM? ethtool?
> I suppose I'll have to ask the manufacturer for an updated EEPROM?
>
> # ethtool -e eth0
> Cannot get EEPROM data: Operation not supported
>
> I'm not sure why I can't dump the contents of the EEPROM.
> Does the driver need to be loaded?
>

Yes, the driver needs to be loaded.

Basically, Auke wants you to throw away your NIC and/or motherboard.
Since you're effectively dead, the only damage you can do by disabling
the check has already been done. This unfortunately seems to be fairly
common with e100, especially for the on-motherboard version, and you
basically have two options: either disable the check or write an offline
tool to reprogram the EEPROM.

The latest netdev tree (if it's not in Linus' tree already, which it
might be) does add back the option to ignore the check so you can update
the EEPROM, which will automatically fix the checksum.

-hpa
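
For readers wondering what "fix the checksum" amounts to: the dumps quoted later in this thread are consistent with the rule that all 16-bit EEPROM words, the final checksum word included, add up to 0xBABA modulo 2^16. A minimal sketch under that assumption follows. The function name `checksum_word` is made up for illustration and is not driver code; the image values are copied from the eth1 dump further down.

```python
EEPROM_SUM = 0xBABA  # assumed target: sum of all 16-bit words, modulo 2**16


def checksum_word(words):
    """Return the value the final word must hold, given all preceding words."""
    return (EEPROM_SUM - sum(words)) & 0xFFFF


# First 63 words of the eth1 image quoted later in the thread
# (little-endian 16-bit words; every word not listed is zero).
image = [0] * 63
image[0:3] = [0x3000, 0x0464, 0xE5E6]   # station address 00:30:64:04:E6:E5
image[3] = 0x0E03
image[5:7] = [0x0201, 0x4701]
image[8:13] = [0x7213, 0x8310, 0x40A2, 0x0001, 0x8086]
image[0x30] = 0x0128

print(hex(checksum_word(image)))  # 0x91f7, stored in the dump as bytes f7 91
```

Changing any word, for example the last octet of the station address, changes the required checksum word by the same amount in the opposite direction.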

2006-11-04 06:22:52

by Tim Hockin

Subject: Re: Intel 82559 NIC corrupted EEPROM

On Fri, Nov 03, 2006 at 05:46:25PM -0800, H. Peter Anvin wrote:
> Basically, Auke wants you to throw away your NIC and/or motherboard.
> Since you're effectively dead, the only damage you can do by disabling
> the check has already been done. This unfortunately seems to be fairly
> common with e100, especially for the on-motherboard version, and you
> basically have two options: either disable the check or write an offline
> tool to reprogram the EEPROM.

I have a tool to write the eepro100 EEPROM. Let me see if I can find it.
It even had all the default data coded, ready to restore a NIC to default.

However - back in the eepro100.c days, it was considered a warning only if
the EEPROM had a bad checksum. There were two "supported" formats for the
EEPROM, one of which was just the MAC address. And it worked!

Tim

2006-11-04 06:28:10

by Tim Hockin

Subject: Re: Intel 82559 NIC corrupted EEPROM

On Fri, Nov 03, 2006 at 10:22:51PM -0800, Tim Hockin wrote:
> On Fri, Nov 03, 2006 at 05:46:25PM -0800, H. Peter Anvin wrote:
> > Basically, Auke wants you to throw away your NIC and/or motherboard.
> > Since you're effectively dead, the only damage you can do by disabling
> > the check has already been done. This unfortunately seems to be fairly
> > common with e100, especially for the on-motherboard version, and you
> > basically have two options: either disable the check or write an offline
> > tool to reprogram the EEPROM.
>
> I have a tool to write the eepro100 EEPROM. Let me see if I can find it.
> It even had all the default data coded, ready to restore a NIC to default.
>
> However - back in the eepro100.c days, it was considered a warning only if
> the EEPROM had a bad checksum. There were two "supported" formats for the
> EEPROM, one of which was just the MAC address. And it worked!

One from the vaults: http://www.hockin.org/~thockin/enet_eeprom/

It's pretty simple, but easily hacked. ifdown your interface first! :)

Tim

2006-11-07 11:22:40

by John

Subject: Re: Intel 82559 NIC corrupted EEPROM

H. Peter Anvin wrote:

> John wrote:
>
>> Several people have reported the same error. Intel's Auke Kok has
>> stated that ignoring the error is a BAD idea.
>>
>> http://lkml.org/lkml/2006/7/10/215
>>
>> What tool is used to reprogram the EEPROM? ethtool?
>> I suppose I'll have to ask the manufacturer for an updated EEPROM?
>>
>> # ethtool -e eth0
>> Cannot get EEPROM data: Operation not supported
>>
>> I'm not sure why I can't dump the contents of the EEPROM.
>> Does the driver need to be loaded?
>
> Yes, the driver needs to be loaded.
>
> Basically, Auke wants you to throw away your NIC and/or motherboard.
> Since you're effectively dead, the only damage you can do by
> disabling the check has already been done. This unfortunately seems
> to be fairly common with e100, especially for the on-motherboard
> version, and you basically have two options: either disable the check
> or write an offline tool to reprogram the EEPROM.
>
> The latest netdev tree (if it's not in Linus' tree already, which it
> might be) does add back the option to ignore the check so you can
> update the EEPROM, which will automatically fix the checksum.

I have investigated further.

I changed e100_eeprom_load() to return 0 even when the checksum fails.

Loading e100.ko reports:

e100: Intel(R) PRO/100 Network Driver, 3.4.14-k2-NAPI
e100: Copyright(c) 1999-2005 Intel Corporation
ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 11
PCI: setting IRQ 11 as level-triggered
ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LNKA] -> GSI 11 (level,
low) -> IRQ 11
e100: 0000:00:08.0: e100_eeprom_load: EEPROM corrupted
e100: eth0: e100_probe: addr 0xe6302000, irq 11, MAC addr FF:FF:FF:FF:FF:FF
ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 12
PCI: setting IRQ 12 as level-triggered
ACPI: PCI Interrupt 0000:00:09.0[A] -> Link [LNKB] -> GSI 12 (level,
low) -> IRQ 12
e100: eth1: e100_probe: addr 0xe6301000, irq 12, MAC addr 00:30:64:04:E6:E5
ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 10
PCI: setting IRQ 10 as level-triggered
ACPI: PCI Interrupt 0000:00:0a.0[A] -> Link [LNKC] -> GSI 10 (level,
low) -> IRQ 10
e100: eth2: e100_probe: addr 0xe6300000, irq 10, MAC addr 00:30:64:04:E6:E6

I had thought all cards would have the same problem, but only the
first NIC seems affected.

The MAC address for eth0 should be 00:30:64:04:E6:E4
(0x003064 is an ADLINK OUI.)

# ip addr
1: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
5: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop qlen 1000
link/ether ff:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
6: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop qlen 1000
link/ether 00:30:64:04:e6:e5 brd ff:ff:ff:ff:ff:ff
7: eth2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop qlen 1000
link/ether 00:30:64:04:e6:e6 brd ff:ff:ff:ff:ff:ff

I then used ethtool to dump the contents of the EEPROMs.

# ethtool -e eth0
Offset Values
------ ------
0x0000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0010 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0020 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0030 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0040 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0050 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0060 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0070 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

# ethtool -e eth1
Offset Values
------ ------
0x0000 00 30 64 04 e6 e5 03 0e 00 00 01 02 01 47 00 00
0x0010 13 72 10 83 a2 40 01 00 86 80 00 00 00 00 00 00
0x0020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0060 28 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f7 91

# ethtool -e eth2
Offset Values
------ ------
0x0000 00 30 64 04 e6 e6 03 0e 00 00 01 02 01 47 00 00
0x0010 13 72 10 83 a2 40 01 00 86 80 00 00 00 00 00 00
0x0020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0060 28 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f7 90

Either the EEPROM image on eth0 is corrupted, or ethtool is not
able to read the contents of the EEPROM.
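
To tell those two cases apart from dump text alone, one rough approach is to parse the `ethtool -e` output and check whether every byte is 0xff, which is what an erased or unreadable part typically returns. The sketch below is an illustration only; the helper names `parse_ethtool_dump` and `describe` are hypothetical, and the input is rebuilt from the dumps above.

```python
def parse_ethtool_dump(text):
    """Turn `ethtool -e` output into bytes, skipping the header lines."""
    data = bytearray()
    for line in text.splitlines():
        fields = line.split()
        if not fields or not fields[0].startswith("0x"):
            continue  # "Offset Values", "------ ------", blank lines
        data.extend(int(byte, 16) for byte in fields[1:])
    return bytes(data)


def describe(image):
    """Classify an image as empty, blank (all 0xff), or populated."""
    if not image:
        return "empty"
    if all(byte == 0xFF for byte in image):
        return "blank or unreadable (all 0xff)"
    mac = ":".join(f"{byte:02x}" for byte in image[:6])
    return f"populated, station address {mac}"


# eth0 as dumped above: eight rows of sixteen 0xff bytes.
eth0 = "Offset Values\n------ ------\n" + "".join(
    f"0x{off:04x} " + " ".join(["ff"] * 16) + "\n"
    for off in range(0, 0x80, 0x10)
)
# First row of the eth1 dump.
eth1_first_row = "0x0000 00 30 64 04 e6 e5 03 0e 00 00 01 02 01 47 00 00\n"

print(describe(parse_ethtool_dump(eth0)))
print(describe(parse_ethtool_dump(eth1_first_row)))
```

An all-0xff result does not say which of the two explanations holds; it only shows that nothing meaningful reached the driver.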

So I tried the other driver, eepro100.c which, AFAIU, e100.c is
supposed to supersede.

Loading eepro100.ko reports:

eepro100.c:v1.09j-t 9/29/99 Donald Becker
http://www.scyld.com/network/eepro100.html
eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin
<[email protected]> and others
ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 11
PCI: setting IRQ 11 as level-triggered
ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LNKA] -> GSI 11 (level,
low) -> IRQ 11
eth0: 0000:00:08.0, 00:30:64:04:E6:E4, IRQ 11.
Board assembly 721383-016, Physical connectors present: RJ45
Primary interface chip i82555 PHY #1.
General self-test: passed.
Serial sub-system self-test: passed.
Internal registers self-test: passed.
ROM checksum self-test: passed (0x04f4518b).
ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 12
PCI: setting IRQ 12 as level-triggered
ACPI: PCI Interrupt 0000:00:09.0[A] -> Link [LNKB] -> GSI 12 (level,
low) -> IRQ 12
eth1: 0000:00:09.0, 00:30:64:04:E6:E5, IRQ 12.
Board assembly 721383-016, Physical connectors present: RJ45
Primary interface chip i82555 PHY #1.
General self-test: passed.
Serial sub-system self-test: passed.
Internal registers self-test: passed.
ROM checksum self-test: passed (0x04f4518b).
ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 10
PCI: setting IRQ 10 as level-triggered
ACPI: PCI Interrupt 0000:00:0a.0[A] -> Link [LNKC] -> GSI 10 (level,
low) -> IRQ 10
eth2: 0000:00:0a.0, 00:30:64:04:E6:E6, IRQ 10.
Board assembly 721383-016, Physical connectors present: RJ45
Primary interface chip i82555 PHY #1.
General self-test: passed.
Serial sub-system self-test: passed.
Internal registers self-test: passed.
ROM checksum self-test: passed (0x04f4518b).

NOTE: eepro100.ko found the correct MAC address for eth0.

# ip addr
1: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop qlen 1000
link/ether 00:30:64:04:e6:e4 brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop qlen 1000
link/ether 00:30:64:04:e6:e5 brd ff:ff:ff:ff:ff:ff
4: eth2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop qlen 1000
link/ether 00:30:64:04:e6:e6 brd ff:ff:ff:ff:ff:ff

I then used Donald Becker's program to dump the contents of all
the EEPROMs. ( ftp://http://www.scyld.com/pub/diag/ )

# eepro100-diag -ee
eepro100-diag.c:v2.13 2/28/2005 Donald Becker ([email protected])
http://www.scyld.com/diag/index.html

Index #1: Found a Intel i82557/8/9 EtherExpressPro100 adapter at 0xd800.
EEPROM contents, size 64x16:
00: 3000 0464 e4e6 0e03 0000 0201 4701 0000 _0d__________G__
0x08: 7213 8310 40a2 0001 8086 0000 0000 0000 _r___@__________
...
0x30: 0128 0000 0000 0000 0000 0000 0000 0000 (_______________
0x38: 0000 0000 0000 0000 0000 0000 0000 92f7 ________________
The EEPROM checksum is correct.
Intel EtherExpress Pro 10/100 EEPROM contents:
Station address 00:30:64:04:E6:E4.
Board assembly 721383-016, Physical connectors present: RJ45
Primary interface chip i82555 PHY #1.
Sleep mode is enabled. This is not recommended.
Under high load the card may not respond to
PCI requests, and thus cause a master abort.
To clear sleep mode use the '-G 0 -w -w -f' options.

Index #2: Found a Intel i82557/8/9 EtherExpressPro100 adapter at 0xdc00.
EEPROM contents, size 64x16:
00: 3000 0464 e5e6 0e03 0000 0201 4701 0000 _0d__________G__
0x08: 7213 8310 40a2 0001 8086 0000 0000 0000 _r___@__________
...
0x30: 0128 0000 0000 0000 0000 0000 0000 0000 (_______________
0x38: 0000 0000 0000 0000 0000 0000 0000 91f7 ________________
The EEPROM checksum is correct.
Intel EtherExpress Pro 10/100 EEPROM contents:
Station address 00:30:64:04:E6:E5.
Board assembly 721383-016, Physical connectors present: RJ45
Primary interface chip i82555 PHY #1.
Sleep mode is enabled. This is not recommended.
Under high load the card may not respond to
PCI requests, and thus cause a master abort.
To clear sleep mode use the '-G 0 -w -w -f' options.

Index #3: Found a Intel i82557/8/9 EtherExpressPro100 adapter at 0xe000.
EEPROM contents, size 64x16:
00: 3000 0464 e6e6 0e03 0000 0201 4701 0000 _0d__________G__
0x08: 7213 8310 40a2 0001 8086 0000 0000 0000 _r___@__________
...
0x30: 0128 0000 0000 0000 0000 0000 0000 0000 (_______________
0x38: 0000 0000 0000 0000 0000 0000 0000 90f7 ________________
The EEPROM checksum is correct.
Intel EtherExpress Pro 10/100 EEPROM contents:
Station address 00:30:64:04:E6:E6.
Board assembly 721383-016, Physical connectors present: RJ45
Primary interface chip i82555 PHY #1.
Sleep mode is enabled. This is not recommended.
Under high load the card may not respond to
PCI requests, and thus cause a master abort.
To clear sleep mode use the '-G 0 -w -w -f' options.

Apparently, eepro100.ko is able to read the contents of the EEPROM on
eth0 and it declares the checksum correct. Is it possible that there is
a bug in e100.c that makes it fail to read the EEPROM on eth0?
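
The two tools print the same data differently: eepro100-diag shows 16-bit words, while ethtool shows bytes with the low byte of each word first. A small sketch that converts the Index #1 words into the byte layout and re-checks the sum is shown here. It assumes the rows elided with "..." are all zero and that a good image sums to 0xBABA; both are assumptions, not taken from either tool's source.

```python
import struct

# Words for Index #1 (eth0) as printed by eepro100-diag above.
# Rows hidden behind "..." are assumed to be all zero.
rows = {
    0x00: "3000 0464 e4e6 0e03 0000 0201 4701 0000",
    0x08: "7213 8310 40a2 0001 8086 0000 0000 0000",
    0x30: "0128 0000 0000 0000 0000 0000 0000 0000",
    0x38: "0000 0000 0000 0000 0000 0000 0000 92f7",
}

words = [0] * 64
for base, text in rows.items():
    for index, word in enumerate(text.split()):
        words[base + index] = int(word, 16)

# Pack as little-endian words to get the byte order ethtool would print.
image = struct.pack("<64H", *words)
station_address = ":".join(f"{byte:02x}" for byte in image[:6])

print(station_address)            # 00:30:64:04:e6:e4
print(hex(sum(words) & 0xFFFF))   # 0xbaba
```

Under those assumptions the image that e100 reports as corrupted is internally consistent, which fits the suspicion that the read path, not the data, is at fault.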

Regards,

John

2006-11-07 17:19:23

Subject: Re: Intel 82559 NIC corrupted EEPROM

John wrote:
>
> I then used ethtool to dump the contents of the EEPROMs.
>
> # ethtool -e eth0
> Offset Values
> ------ ------
> 0x0000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 0x0010 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 0x0020 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 0x0030 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 0x0040 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 0x0050 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 0x0060 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 0x0070 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>
> Either the EEPROM image on eth0 is corrupted, or ethtool is not
> able to read the contents of the EEPROM.
>

[...]

>
> I then used Donald Becker's program to dump the contents of all
> the EEPROMs. ( ftp://http://www.scyld.com/pub/diag/ )
>
> # eepro100-diag -ee
> eepro100-diag.c:v2.13 2/28/2005 Donald Becker ([email protected])
> http://www.scyld.com/diag/index.html
>
> Index #1: Found a Intel i82557/8/9 EtherExpressPro100 adapter at 0xd800.
> EEPROM contents, size 64x16:
> 00: 3000 0464 e4e6 0e03 0000 0201 4701 0000 _0d__________G__
> 0x08: 7213 8310 40a2 0001 8086 0000 0000 0000 _r___@__________
> ...
> 0x30: 0128 0000 0000 0000 0000 0000 0000 0000 (_______________
> 0x38: 0000 0000 0000 0000 0000 0000 0000 92f7 ________________
> The EEPROM checksum is correct.
> Intel EtherExpress Pro 10/100 EEPROM contents:
> Station address 00:30:64:04:E6:E4.
> Board assembly 721383-016, Physical connectors present: RJ45
> Primary interface chip i82555 PHY #1.
> Sleep mode is enabled. This is not recommended.
> Under high load the card may not respond to
> PCI requests, and thus cause a master abort.
> To clear sleep mode use the '-G 0 -w -w -f' options.
>
> Index #2: Found a Intel i82557/8/9 EtherExpressPro100 adapter at 0xdc00.
> EEPROM contents, size 64x16:
> 00: 3000 0464 e5e6 0e03 0000 0201 4701 0000 _0d__________G__
> 0x08: 7213 8310 40a2 0001 8086 0000 0000 0000 _r___@__________
> ...
> 0x30: 0128 0000 0000 0000 0000 0000 0000 0000 (_______________
> 0x38: 0000 0000 0000 0000 0000 0000 0000 91f7 ________________
> The EEPROM checksum is correct.
> Intel EtherExpress Pro 10/100 EEPROM contents:
> Station address 00:30:64:04:E6:E5.
> Board assembly 721383-016, Physical connectors present: RJ45
> Primary interface chip i82555 PHY #1.
> Sleep mode is enabled. This is not recommended.
> Under high load the card may not respond to
> PCI requests, and thus cause a master abort.
> To clear sleep mode use the '-G 0 -w -w -f' options.
>
> Index #3: Found a Intel i82557/8/9 EtherExpressPro100 adapter at 0xe000.
> EEPROM contents, size 64x16:
> 00: 3000 0464 e6e6 0e03 0000 0201 4701 0000 _0d__________G__
> 0x08: 7213 8310 40a2 0001 8086 0000 0000 0000 _r___@__________
> ...
> 0x30: 0128 0000 0000 0000 0000 0000 0000 0000 (_______________
> 0x38: 0000 0000 0000 0000 0000 0000 0000 90f7 ________________
> The EEPROM checksum is correct.
> Intel EtherExpress Pro 10/100 EEPROM contents:
> Station address 00:30:64:04:E6:E6.
> Board assembly 721383-016, Physical connectors present: RJ45
> Primary interface chip i82555 PHY #1.
> Sleep mode is enabled. This is not recommended.
> Under high load the card may not respond to
> PCI requests, and thus cause a master abort.
> To clear sleep mode use the '-G 0 -w -w -f' options.
>
> Apparently, eepro100.ko is able to read the contents of the EEPROM on
> eth0 and it declares the checksum correct. Is it possible that there is
> a bug in e100.c that makes it fail to read the EEPROM on eth0?
>

Sure as heck sounds like it.

-hpa

2006-11-07 17:43:52

Subject: Re: Intel 82559 NIC corrupted EEPROM

H. Peter Anvin wrote:
> John wrote:
>>
>> I then used ethtool to dump the contents of the EEPROMs.
>>
>> # ethtool -e eth0
>> Offset Values
>> ------ ------
>> 0x0000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>> 0x0010 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>> 0x0020 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>> 0x0030 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>> 0x0040 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>> 0x0050 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>> 0x0060 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>> 0x0070 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>
>> Either the EEPROM image on eth0 is corrupted, or ethtool is not
>> able to read the contents of the EEPROM.
>>
>
> [...]
>
>>
>> I then used Donald Becker's program to dump the contents of all
>> the EEPROMs. ( ftp://http://www.scyld.com/pub/diag/ )
>>
>> # eepro100-diag -ee
>> eepro100-diag.c:v2.13 2/28/2005 Donald Becker ([email protected])
>> http://www.scyld.com/diag/index.html
>>
>> Index #1: Found a Intel i82557/8/9 EtherExpressPro100 adapter at 0xd800.
>> EEPROM contents, size 64x16:
>> 00: 3000 0464 e4e6 0e03 0000 0201 4701 0000 _0d__________G__
>> 0x08: 7213 8310 40a2 0001 8086 0000 0000 0000 _r___@__________
>> ...
>> 0x30: 0128 0000 0000 0000 0000 0000 0000 0000 (_______________
>> 0x38: 0000 0000 0000 0000 0000 0000 0000 92f7 ________________
>> The EEPROM checksum is correct.
>> Intel EtherExpress Pro 10/100 EEPROM contents:
>> Station address 00:30:64:04:E6:E4.
>> Board assembly 721383-016, Physical connectors present: RJ45
>> Primary interface chip i82555 PHY #1.
>> Sleep mode is enabled. This is not recommended.
>> Under high load the card may not respond to
>> PCI requests, and thus cause a master abort.
>> To clear sleep mode use the '-G 0 -w -w -f' options.
>>
>> Index #2: Found a Intel i82557/8/9 EtherExpressPro100 adapter at 0xdc00.
>> EEPROM contents, size 64x16:
>> 00: 3000 0464 e5e6 0e03 0000 0201 4701 0000 _0d__________G__
>> 0x08: 7213 8310 40a2 0001 8086 0000 0000 0000 _r___@__________
>> ...
>> 0x30: 0128 0000 0000 0000 0000 0000 0000 0000 (_______________
>> 0x38: 0000 0000 0000 0000 0000 0000 0000 91f7 ________________
>> The EEPROM checksum is correct.
>> Intel EtherExpress Pro 10/100 EEPROM contents:
>> Station address 00:30:64:04:E6:E5.
>> Board assembly 721383-016, Physical connectors present: RJ45
>> Primary interface chip i82555 PHY #1.
>> Sleep mode is enabled. This is not recommended.
>> Under high load the card may not respond to
>> PCI requests, and thus cause a master abort.
>> To clear sleep mode use the '-G 0 -w -w -f' options.
>>
>> Index #3: Found a Intel i82557/8/9 EtherExpressPro100 adapter at 0xe000.
>> EEPROM contents, size 64x16:
>> 00: 3000 0464 e6e6 0e03 0000 0201 4701 0000 _0d__________G__
>> 0x08: 7213 8310 40a2 0001 8086 0000 0000 0000 _r___@__________
>> ...
>> 0x30: 0128 0000 0000 0000 0000 0000 0000 0000 (_______________
>> 0x38: 0000 0000 0000 0000 0000 0000 0000 90f7 ________________
>> The EEPROM checksum is correct.
>> Intel EtherExpress Pro 10/100 EEPROM contents:
>> Station address 00:30:64:04:E6:E6.
>> Board assembly 721383-016, Physical connectors present: RJ45
>> Primary interface chip i82555 PHY #1.
>> Sleep mode is enabled. This is not recommended.
>> Under high load the card may not respond to
>> PCI requests, and thus cause a master abort.
>> To clear sleep mode use the '-G 0 -w -w -f' options.
>>
>> Apparently, eepro100.ko is able to read the contents of the EEPROM on
>> eth0 and it declares the checksum correct. Is it possible that there
>> is a bug in e100.c that makes it fail to read the EEPROM on eth0?
>>
>
> Sure as heck sounds like it.

(Please CC either me or netdev on anything concerning the Intel NIC drivers.
Thanks. I removed `[email protected]` since it bounces, and [email protected] is a
support address only and doesn't reach us developers.)

How did you do the first `ethtool` EEPROM dump? Did you have the `e100` module loaded at
that time? Did you use the new `override` mechanism graciously donated by David M?

Cheers,

Auke

2006-11-07 18:29:30

Subject: Re: Intel 82559 NIC corrupted EEPROM

Auke Kok wrote:
>
> (Please CC either me or at netdev on all intel nic drivers. thanks. I
> removed `[email protected]` since it throws a bounce, and
> [email protected] is a support address only, doesn't reach us
> developers)
>

I think John <[email protected]> is the one who can actually answer your
questions...

-hpa

2006-11-07 18:34:45

Subject: Re: Intel 82559 NIC corrupted EEPROM

H. Peter Anvin wrote:
> Auke Kok wrote:
>>
>> (Please CC either me or at netdev on all intel nic drivers. thanks. I
>> removed `[email protected]` since it throws a bounce, and
>> [email protected] is a support address only, doesn't reach us
>> developers)
>>
>
> I think John <[email protected]> is the one who can actually answer your
> questions...

his original mail reads:

"Please note, email address is a bit-bucket.
I do monitor the mailing list. "

2006-11-07 18:40:27

Subject: Re: Intel 82559 NIC corrupted EEPROM

Auke Kok wrote:
> H. Peter Anvin wrote:
>> Auke Kok wrote:
>>>
>>> (Please CC either me or at netdev on all intel nic drivers. thanks. I
>>> removed `[email protected]` since it throws a bounce, and
>>> [email protected] is a support address only, doesn't reach us
>>> developers)
>>>
>>
>> I think John <[email protected]> is the one who can actually answer your
>> questions...
>
> his original mail reads:
>
> "Please note, email address is a bit-bucket.
> I do monitor the mailing list. "

Ah. So it makes no difference either way. It definitely doesn't bounce.

-hpa

2006-11-08 10:55:28

by John

Subject: Re: Intel 82559 NIC corrupted EEPROM

Hello all,

[ E-mail address is a bit-bucket. I *do* monitor the mailing lists. ]

I will try and summarize the problem as I understand it at this point.

I've written two messages so far:
http://groups.google.com/group/linux.kernel/msg/3a05d819c66474db
http://groups.google.com/group/linux.kernel/msg/391aebbb3dfd6039

And here is a link to the complete thread:
http://lkml.org/lkml/fancy/2006/11/3/124

I have a motherboard with three on-board 82559 NICs.

o eepro100.ko properly initializes all three NICs
o e100.ko fails to initialize one of them

NOTE: With kernel 2.6.14, e100.ko fails to initialize the NIC with MAC
address 00:30:64:04:E6:E4. With kernel 2.6.18 e100.ko fails to
initialize the NIC with MAC address 00:30:64:04:E6:E5.

The problem is not an incorrect checksum. (Donald Becker's dump utility
reports a correct checksum for all three NICs.) The problem seems to be
that e100.ko fails to read the contents of one of the EEPROMs.

Auke wrote:
> How did you do the first `ethtool` eeprom dump? did you have the
> `e100` module loaded at that time? Did you use the new `override`
> mechanism graciously donated by David M?

These tests were performed on a 2.6.14 kernel. I hacked
e100_eeprom_load() to return 0 even when the checksum
fails. Thus the driver did not refuse to load, and I was
able to use ethtool to dump the contents of the 3 EEPROMs.

Here are additional examples running a 2.6.18.1-hrt kernel.

'insmod e100.ko' reports:

e100: Intel(R) PRO/100 Network Driver, 3.5.10-k2-NAPI
e100: Copyright(c) 1999-2005 Intel Corporation
ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 12
PCI: setting IRQ 12 as level-triggered
ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LNKA] -> GSI 12 (level,
low) -> IRQ 12
e100: eth0: e100_probe: addr 0xe5300000, irq 12, MAC addr 00:30:64:04:E6:E4
ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 10
PCI: setting IRQ 10 as level-triggered
ACPI: PCI Interrupt 0000:00:09.0[A] -> Link [LNKB] -> GSI 10 (level,
low) -> IRQ 10
e100: 0000:00:09.0: e100_eeprom_load: EEPROM corrupted
ACPI: PCI interrupt for device 0000:00:09.0 disabled
e100: probe of 0000:00:09.0 failed with error -11
ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11
PCI: setting IRQ 11 as level-triggered
ACPI: PCI Interrupt 0000:00:0a.0[A] -> Link [LNKC] -> GSI 11 (level,
low) -> IRQ 11
e100: eth1: e100_probe: addr 0xe5301000, irq 11, MAC addr 00:30:64:04:E6:E6

'insmod e100.ko eeprom_bad_csum_allow=1' reports:

e100: Intel(R) PRO/100 Network Driver, 3.5.10-k2-NAPI
e100: Copyright(c) 1999-2005 Intel Corporation
ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 12
PCI: setting IRQ 12 as level-triggered
ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LNKA] -> GSI 12 (level,
low) -> IRQ 12
e100: eth0: e100_probe: addr 0xe5300000, irq 12, MAC addr 00:30:64:04:E6:E4
ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 10
PCI: setting IRQ 10 as level-triggered
ACPI: PCI Interrupt 0000:00:09.0[A] -> Link [LNKB] -> GSI 10 (level,
low) -> IRQ 10
e100: 0000:00:09.0: e100_eeprom_load: EEPROM corrupted
e100: 0000:00:09.0: e100_probe: Invalid MAC address from EEPROM, aborting.
ACPI: PCI interrupt for device 0000:00:09.0 disabled
e100: probe of 0000:00:09.0 failed with error -11
ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11
PCI: setting IRQ 11 as level-triggered
ACPI: PCI Interrupt 0000:00:0a.0[A] -> Link [LNKC] -> GSI 11 (level,
low) -> IRQ 11
e100: eth1: e100_probe: addr 0xe5301000, irq 11, MAC addr 00:30:64:04:E6:E6
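
The "Invalid MAC address from EEPROM" abort is what one would expect if the EEPROM again reads back as all 0xff, as it did in the 2.6.14 dump: FF:FF:FF:FF:FF:FF is the broadcast address. The usual validity rule for a station address is "not multicast and not all zeros". The Python below is a sketch of that rule for illustration, not the driver's own check.

```python
def is_valid_ether_addr(mac):
    """Usual station-address rule: not multicast/broadcast, not all zeros."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected six octets")
    is_multicast = bool(octets[0] & 0x01)  # group bit; broadcast sets it too
    is_zero = not any(octets)
    return not is_multicast and not is_zero


for mac in ("FF:FF:FF:FF:FF:FF", "00:30:64:04:E6:E5", "00:00:00:00:00:00"):
    print(mac, is_valid_ether_addr(mac))
```

So even with the checksum test bypassed, a device whose EEPROM cannot be read still fails, just one step later.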

'insmod e100.ko debug=16' reports:

e100: Intel(R) PRO/100 Network Driver, 3.5.10-k2-NAPI
e100: Copyright(c) 1999-2005 Intel Corporation
ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 12
PCI: setting IRQ 12 as level-triggered
ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LNKA] -> GSI 12 (level,
low) -> IRQ 12
e100: 0000:00:08.0: mdio_ctrl: READ:addr=1, reg=0, data_in=0x0000,
data_out=0x18203000
e100: 0000:00:08.0: mdio_ctrl: READ:addr=1, reg=1, data_in=0x0000,
data_out=0x18217809
e100: 0000:00:08.0: mdio_ctrl: READ:addr=1, reg=1, data_in=0x0000,
data_out=0x18217809
e100: 0000:00:08.0: e100_phy_init: phy_addr = 1
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=0, reg=0, data_in=0x0400,
data_out=0x14000400
e100: 0000:00:08.0: mdio_ctrl: READ:addr=1, reg=0, data_in=0x0000,
data_out=0x18203000
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=1, reg=0, data_in=0x3000,
data_out=0x14203000
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=2, reg=0, data_in=0x0400,
data_out=0x14400400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=3, reg=0, data_in=0x0400,
data_out=0x14600400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=4, reg=0, data_in=0x0400,
data_out=0x14800400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=5, reg=0, data_in=0x0400,
data_out=0x14A00400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=6, reg=0, data_in=0x0400,
data_out=0x14C00400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=7, reg=0, data_in=0x0400,
data_out=0x14E00400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=8, reg=0, data_in=0x0400,
data_out=0x15000400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=9, reg=0, data_in=0x0400,
data_out=0x15200400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=10, reg=0, data_in=0x0400,
data_out=0x15400400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=11, reg=0, data_in=0x0400,
data_out=0x15600400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=12, reg=0, data_in=0x0400,
data_out=0x15800400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=13, reg=0, data_in=0x0400,
data_out=0x15A00400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=14, reg=0, data_in=0x0400,
data_out=0x15C00400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=15, reg=0, data_in=0x0400,
data_out=0x15E00400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=16, reg=0, data_in=0x0400,
data_out=0x16000400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=17, reg=0, data_in=0x0400,
data_out=0x16200400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=18, reg=0, data_in=0x0400,
data_out=0x16400400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=19, reg=0, data_in=0x0400,
data_out=0x16600400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=20, reg=0, data_in=0x0400,
data_out=0x16800400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=21, reg=0, data_in=0x0400,
data_out=0x16A00400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=22, reg=0, data_in=0x0400,
data_out=0x16C00400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=23, reg=0, data_in=0x0400,
data_out=0x16E00400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=24, reg=0, data_in=0x0400,
data_out=0x17000400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=25, reg=0, data_in=0x0400,
data_out=0x17200400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=26, reg=0, data_in=0x0400,
data_out=0x17400400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=27, reg=0, data_in=0x0400,
data_out=0x17600400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=28, reg=0, data_in=0x0400,
data_out=0x17800400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=29, reg=0, data_in=0x0400,
data_out=0x17A00400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=30, reg=0, data_in=0x0400,
data_out=0x17C00400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=31, reg=0, data_in=0x0400,
data_out=0x17E00400
e100: 0000:00:08.0: mdio_ctrl: READ:addr=1, reg=2, data_in=0x0000,
data_out=0x182202A8
e100: 0000:00:08.0: mdio_ctrl: READ:addr=1, reg=3, data_in=0x0000,
data_out=0x18230154
e100: 0000:00:08.0: e100_phy_init: phy ID = 0x015402A8
e100: eth0: e100_probe: addr 0xe5300000, irq 12, MAC addr 00:30:64:04:E6:E4
ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 10
PCI: setting IRQ 10 as level-triggered
ACPI: PCI Interrupt 0000:00:09.0[A] -> Link [LNKB] -> GSI 10 (level,
low) -> IRQ 10
e100: 0000:00:09.0: e100_eeprom_load: EEPROM corrupted
ACPI: PCI interrupt for device 0000:00:09.0 disabled
e100: probe of 0000:00:09.0 failed with error -11
ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11
PCI: setting IRQ 11 as level-triggered
ACPI: PCI Interrupt 0000:00:0a.0[A] -> Link [LNKC] -> GSI 11 (level,
low) -> IRQ 11
e100: 0000:00:0a.0: mdio_ctrl: READ:addr=1, reg=0, data_in=0x0000,
data_out=0x18203000
e100: 0000:00:0a.0: mdio_ctrl: READ:addr=1, reg=1, data_in=0x0000,
data_out=0x18217809
e100: 0000:00:0a.0: mdio_ctrl: READ:addr=1, reg=1, data_in=0x0000,
data_out=0x18217809
e100: 0000:00:0a.0: e100_phy_init: phy_addr = 1
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=0, reg=0, data_in=0x0400,
data_out=0x14000400
e100: 0000:00:0a.0: mdio_ctrl: READ:addr=1, reg=0, data_in=0x0000,
data_out=0x18203000
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=1, reg=0, data_in=0x3000,
data_out=0x14203000
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=2, reg=0, data_in=0x0400,
data_out=0x14400400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=3, reg=0, data_in=0x0400,
data_out=0x14600400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=4, reg=0, data_in=0x0400,
data_out=0x14800400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=5, reg=0, data_in=0x0400,
data_out=0x14A00400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=6, reg=0, data_in=0x0400,
data_out=0x14C00400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=7, reg=0, data_in=0x0400,
data_out=0x14E00400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=8, reg=0, data_in=0x0400,
data_out=0x15000400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=9, reg=0, data_in=0x0400,
data_out=0x15200400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=10, reg=0, data_in=0x0400,
data_out=0x15400400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=11, reg=0, data_in=0x0400,
data_out=0x15600400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=12, reg=0, data_in=0x0400,
data_out=0x15800400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=13, reg=0, data_in=0x0400,
data_out=0x15A00400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=14, reg=0, data_in=0x0400,
data_out=0x15C00400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=15, reg=0, data_in=0x0400,
data_out=0x15E00400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=16, reg=0, data_in=0x0400,
data_out=0x16000400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=17, reg=0, data_in=0x0400,
data_out=0x16200400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=18, reg=0, data_in=0x0400,
data_out=0x16400400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=19, reg=0, data_in=0x0400,
data_out=0x16600400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=20, reg=0, data_in=0x0400,
data_out=0x16800400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=21, reg=0, data_in=0x0400,
data_out=0x16A00400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=22, reg=0, data_in=0x0400,
data_out=0x16C00400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=23, reg=0, data_in=0x0400,
data_out=0x16E00400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=24, reg=0, data_in=0x0400,
data_out=0x17000400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=25, reg=0, data_in=0x0400,
data_out=0x17200400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=26, reg=0, data_in=0x0400,
data_out=0x17400400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=27, reg=0, data_in=0x0400,
data_out=0x17600400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=28, reg=0, data_in=0x0400,
data_out=0x17800400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=29, reg=0, data_in=0x0400,
data_out=0x17A00400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=30, reg=0, data_in=0x0400,
data_out=0x17C00400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=31, reg=0, data_in=0x0400,
data_out=0x17E00400
e100: 0000:00:0a.0: mdio_ctrl: READ:addr=1, reg=2, data_in=0x0000,
data_out=0x182202A8
e100: 0000:00:0a.0: mdio_ctrl: READ:addr=1, reg=3, data_in=0x0000,
data_out=0x18230154
e100: 0000:00:0a.0: e100_phy_init: phy ID = 0x015402A8
e100: eth1: e100_probe: addr 0xe5301000, irq 11, MAC addr 00:30:64:04:E6:E6

'insmod e100.ko eeprom_bad_csum_allow=1 debug=16' reports:

e100: Intel(R) PRO/100 Network Driver, 3.5.10-k2-NAPI
e100: Copyright(c) 1999-2005 Intel Corporation
PCI: Enabling device 0000:00:08.0 (0000 -> 0003)
ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LNKA] -> GSI 12 (level,
low) -> IRQ 12
e100: 0000:00:08.0: mdio_ctrl: READ:addr=1, reg=0, data_in=0x0000,
data_out=0x18203000
e100: 0000:00:08.0: mdio_ctrl: READ:addr=1, reg=1, data_in=0x0000,
data_out=0x18217809
e100: 0000:00:08.0: mdio_ctrl: READ:addr=1, reg=1, data_in=0x0000,
data_out=0x18217809
e100: 0000:00:08.0: e100_phy_init: phy_addr = 1
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=0, reg=0, data_in=0x0400,
data_out=0x14000400
e100: 0000:00:08.0: mdio_ctrl: READ:addr=1, reg=0, data_in=0x0000,
data_out=0x18203000
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=1, reg=0, data_in=0x3000,
data_out=0x14203000
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=2, reg=0, data_in=0x0400,
data_out=0x14400400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=3, reg=0, data_in=0x0400,
data_out=0x14600400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=4, reg=0, data_in=0x0400,
data_out=0x14800400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=5, reg=0, data_in=0x0400,
data_out=0x14A00400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=6, reg=0, data_in=0x0400,
data_out=0x14C00400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=7, reg=0, data_in=0x0400,
data_out=0x14E00400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=8, reg=0, data_in=0x0400,
data_out=0x15000400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=9, reg=0, data_in=0x0400,
data_out=0x15200400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=10, reg=0, data_in=0x0400,
data_out=0x15400400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=11, reg=0, data_in=0x0400,
data_out=0x15600400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=12, reg=0, data_in=0x0400,
data_out=0x15800400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=13, reg=0, data_in=0x0400,
data_out=0x15A00400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=14, reg=0, data_in=0x0400,
data_out=0x15C00400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=15, reg=0, data_in=0x0400,
data_out=0x15E00400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=16, reg=0, data_in=0x0400,
data_out=0x16000400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=17, reg=0, data_in=0x0400,
data_out=0x16200400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=18, reg=0, data_in=0x0400,
data_out=0x16400400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=19, reg=0, data_in=0x0400,
data_out=0x16600400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=20, reg=0, data_in=0x0400,
data_out=0x16800400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=21, reg=0, data_in=0x0400,
data_out=0x16A00400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=22, reg=0, data_in=0x0400,
data_out=0x16C00400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=23, reg=0, data_in=0x0400,
data_out=0x16E00400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=24, reg=0, data_in=0x0400,
data_out=0x17000400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=25, reg=0, data_in=0x0400,
data_out=0x17200400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=26, reg=0, data_in=0x0400,
data_out=0x17400400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=27, reg=0, data_in=0x0400,
data_out=0x17600400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=28, reg=0, data_in=0x0400,
data_out=0x17800400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=29, reg=0, data_in=0x0400,
data_out=0x17A00400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=30, reg=0, data_in=0x0400,
data_out=0x17C00400
e100: 0000:00:08.0: mdio_ctrl: WRITE:addr=31, reg=0, data_in=0x0400,
data_out=0x17E00400
e100: 0000:00:08.0: mdio_ctrl: READ:addr=1, reg=2, data_in=0x0000,
data_out=0x182202A8
e100: 0000:00:08.0: mdio_ctrl: READ:addr=1, reg=3, data_in=0x0000,
data_out=0x18230154
e100: 0000:00:08.0: e100_phy_init: phy ID = 0x015402A8
e100: eth0: e100_probe: addr 0xe5300000, irq 12, MAC addr 00:30:64:04:E6:E4
PCI: Enabling device 0000:00:09.0 (0000 -> 0003)
ACPI: PCI Interrupt 0000:00:09.0[A] -> Link [LNKB] -> GSI 10 (level,
low) -> IRQ 10
e100: 0000:00:09.0: e100_eeprom_load: EEPROM corrupted
e100: 0000:00:09.0: mdio_ctrl: READ:addr=1, reg=0, data_in=0x0000,
data_out=0x18203000
e100: 0000:00:09.0: mdio_ctrl: READ:addr=1, reg=1, data_in=0x0000,
data_out=0x18217829
e100: 0000:00:09.0: mdio_ctrl: READ:addr=1, reg=1, data_in=0x0000,
data_out=0x1821782D
e100: 0000:00:09.0: e100_phy_init: phy_addr = 1
e100: 0000:00:09.0: mdio_ctrl: WRITE:addr=0, reg=0, data_in=0x0400,
data_out=0x14000400
e100: 0000:00:09.0: mdio_ctrl: READ:addr=1, reg=0, data_in=0x0000,
data_out=0x18203000
e100: 0000:00:09.0: mdio_ctrl: WRITE:addr=1, reg=0, data_in=0x3000,
data_out=0x14203000
e100: 0000:00:09.0: mdio_ctrl: WRITE:addr=2, reg=0, data_in=0x0400,
data_out=0x14400400
e100: 0000:00:09.0: mdio_ctrl: WRITE:addr=3, reg=0, data_in=0x0400,
data_out=0x14600400
e100: 0000:00:09.0: mdio_ctrl: WRITE:addr=4, reg=0, data_in=0x0400,
data_out=0x14800400
e100: 0000:00:09.0: mdio_ctrl: WRITE:addr=5, reg=0, data_in=0x0400,
data_out=0x14A00400
e100: 0000:00:09.0: mdio_ctrl: WRITE:addr=6, reg=0, data_in=0x0400,
data_out=0x14C00400
e100: 0000:00:09.0: mdio_ctrl: WRITE:addr=7, reg=0, data_in=0x0400,
data_out=0x14E00400
e100: 0000:00:09.0: mdio_ctrl: WRITE:addr=8, reg=0, data_in=0x0400,
data_out=0x15000400
e100: 0000:00:09.0: mdio_ctrl: WRITE:addr=9, reg=0, data_in=0x0400,
data_out=0x15200400
e100: 0000:00:09.0: mdio_ctrl: WRITE:addr=10, reg=0, data_in=0x0400,
data_out=0x15400400
e100: 0000:00:09.0: mdio_ctrl: WRITE:addr=11, reg=0, data_in=0x0400,
data_out=0x15600400
e100: 0000:00:09.0: mdio_ctrl: WRITE:addr=12, reg=0, data_in=0x0400,
data_out=0x15800400
e100: 0000:00:09.0: mdio_ctrl: WRITE:addr=13, reg=0, data_in=0x0400,
data_out=0x15A00400
e100: 0000:00:09.0: mdio_ctrl: WRITE:addr=14, reg=0, data_in=0x0400,
data_out=0x15C00400
e100: 0000:00:09.0: mdio_ctrl: WRITE:addr=15, reg=0, data_in=0x0400,
data_out=0x15E00400
e100: 0000:00:09.0: mdio_ctrl: WRITE:addr=16, reg=0, data_in=0x0400,
data_out=0x16000400
e100: 0000:00:09.0: mdio_ctrl: WRITE:addr=17, reg=0, data_in=0x0400,
data_out=0x16200400
e100: 0000:00:09.0: mdio_ctrl: WRITE:addr=18, reg=0, data_in=0x0400,
data_out=0x16400400
e100: 0000:00:09.0: mdio_ctrl: WRITE:addr=19, reg=0, data_in=0x0400,
data_out=0x16600400
e100: 0000:00:09.0: mdio_ctrl: WRITE:addr=20, reg=0, data_in=0x0400,
data_out=0x16800400
e100: 0000:00:09.0: mdio_ctrl: WRITE:addr=21, reg=0, data_in=0x0400,
data_out=0x16A00400
e100: 0000:00:09.0: mdio_ctrl: WRITE:addr=22, reg=0, data_in=0x0400,
data_out=0x16C00400
e100: 0000:00:09.0: mdio_ctrl: WRITE:addr=23, reg=0, data_in=0x0400,
data_out=0x16E00400
e100: 0000:00:09.0: mdio_ctrl: WRITE:addr=24, reg=0, data_in=0x0400,
data_out=0x17000400
e100: 0000:00:09.0: mdio_ctrl: WRITE:addr=25, reg=0, data_in=0x0400,
data_out=0x17200400
e100: 0000:00:09.0: mdio_ctrl: WRITE:addr=26, reg=0, data_in=0x0400,
data_out=0x17400400
e100: 0000:00:09.0: mdio_ctrl: WRITE:addr=27, reg=0, data_in=0x0400,
data_out=0x17600400
e100: 0000:00:09.0: mdio_ctrl: WRITE:addr=28, reg=0, data_in=0x0400,
data_out=0x17800400
e100: 0000:00:09.0: mdio_ctrl: WRITE:addr=29, reg=0, data_in=0x0400,
data_out=0x17A00400
e100: 0000:00:09.0: mdio_ctrl: WRITE:addr=30, reg=0, data_in=0x0400,
data_out=0x17C00400
e100: 0000:00:09.0: mdio_ctrl: WRITE:addr=31, reg=0, data_in=0x0400,
data_out=0x17E00400
e100: 0000:00:09.0: mdio_ctrl: READ:addr=1, reg=2, data_in=0x0000,
data_out=0x182202A8
e100: 0000:00:09.0: mdio_ctrl: READ:addr=1, reg=3, data_in=0x0000,
data_out=0x18230154
e100: 0000:00:09.0: e100_phy_init: phy ID = 0x015402A8
e100: 0000:00:09.0: e100_probe: Invalid MAC address from EEPROM, aborting.
ACPI: PCI interrupt for device 0000:00:09.0 disabled
e100: probe of 0000:00:09.0 failed with error -11
PCI: Enabling device 0000:00:0a.0 (0000 -> 0003)
ACPI: PCI Interrupt 0000:00:0a.0[A] -> Link [LNKC] -> GSI 11 (level,
low) -> IRQ 11
e100: 0000:00:0a.0: mdio_ctrl: READ:addr=1, reg=0, data_in=0x0000,
data_out=0x18203000
e100: 0000:00:0a.0: mdio_ctrl: READ:addr=1, reg=1, data_in=0x0000,
data_out=0x18217809
e100: 0000:00:0a.0: mdio_ctrl: READ:addr=1, reg=1, data_in=0x0000,
data_out=0x18217809
e100: 0000:00:0a.0: e100_phy_init: phy_addr = 1
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=0, reg=0, data_in=0x0400,
data_out=0x14000400
e100: 0000:00:0a.0: mdio_ctrl: READ:addr=1, reg=0, data_in=0x0000,
data_out=0x18203000
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=1, reg=0, data_in=0x3000,
data_out=0x14203000
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=2, reg=0, data_in=0x0400,
data_out=0x14400400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=3, reg=0, data_in=0x0400,
data_out=0x14600400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=4, reg=0, data_in=0x0400,
data_out=0x14800400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=5, reg=0, data_in=0x0400,
data_out=0x14A00400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=6, reg=0, data_in=0x0400,
data_out=0x14C00400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=7, reg=0, data_in=0x0400,
data_out=0x14E00400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=8, reg=0, data_in=0x0400,
data_out=0x15000400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=9, reg=0, data_in=0x0400,
data_out=0x15200400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=10, reg=0, data_in=0x0400,
data_out=0x15400400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=11, reg=0, data_in=0x0400,
data_out=0x15600400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=12, reg=0, data_in=0x0400,
data_out=0x15800400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=13, reg=0, data_in=0x0400,
data_out=0x15A00400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=14, reg=0, data_in=0x0400,
data_out=0x15C00400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=15, reg=0, data_in=0x0400,
data_out=0x15E00400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=16, reg=0, data_in=0x0400,
data_out=0x16000400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=17, reg=0, data_in=0x0400,
data_out=0x16200400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=18, reg=0, data_in=0x0400,
data_out=0x16400400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=19, reg=0, data_in=0x0400,
data_out=0x16600400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=20, reg=0, data_in=0x0400,
data_out=0x16800400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=21, reg=0, data_in=0x0400,
data_out=0x16A00400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=22, reg=0, data_in=0x0400,
data_out=0x16C00400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=23, reg=0, data_in=0x0400,
data_out=0x16E00400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=24, reg=0, data_in=0x0400,
data_out=0x17000400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=25, reg=0, data_in=0x0400,
data_out=0x17200400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=26, reg=0, data_in=0x0400,
data_out=0x17400400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=27, reg=0, data_in=0x0400,
data_out=0x17600400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=28, reg=0, data_in=0x0400,
data_out=0x17800400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=29, reg=0, data_in=0x0400,
data_out=0x17A00400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=30, reg=0, data_in=0x0400,
data_out=0x17C00400
e100: 0000:00:0a.0: mdio_ctrl: WRITE:addr=31, reg=0, data_in=0x0400,
data_out=0x17E00400
e100: 0000:00:0a.0: mdio_ctrl: READ:addr=1, reg=2, data_in=0x0000,
data_out=0x182202A8
e100: 0000:00:0a.0: mdio_ctrl: READ:addr=1, reg=3, data_in=0x0000,
data_out=0x18230154
e100: 0000:00:0a.0: e100_phy_init: phy ID = 0x015402A8
e100: eth1: e100_probe: addr 0xe5301000, irq 11, MAC addr 00:30:64:04:E6:E6

i.e. e100.ko initializes only two NICs:

# ip addr
1: lo: <LOOPBACK,UP,10000> mtu 16436 qdisc noqueue
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop
link/ether e2:18:f7:f8:88:4e brd ff:ff:ff:ff:ff:ff
6: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop qlen 1000
link/ether 00:30:64:04:e6:e4 brd ff:ff:ff:ff:ff:ff
7: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop qlen 1000
link/ether 00:30:64:04:e6:e6 brd ff:ff:ff:ff:ff:ff

Contrast this with eepro100.ko...

'insmod eepro100.ko debug=6' reports:

eepro100.c:v1.09j-t 9/29/99 Donald Becker
http://www.scyld.com/network/eepro100.html
eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin
<[email protected]> and others
ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 12
PCI: setting IRQ 12 as level-triggered
ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LNKA] -> GSI 12 (level,
low) -> IRQ 12
Found Intel i82557 PCI Speedo at 0xe5300000, IRQ 12.
eth0: 0000:00:08.0, 00:30:64:04:E6:E4, IRQ 12.
Board assembly 721383-016, Physical connectors present: RJ45
Primary interface chip i82555 PHY #1.
General self-test: passed.
Serial sub-system self-test: passed.
Internal registers self-test: passed.
ROM checksum self-test: passed (0x04f4518b).
ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 10
PCI: setting IRQ 10 as level-triggered
ACPI: PCI Interrupt 0000:00:09.0[A] -> Link [LNKB] -> GSI 10 (level,
low) -> IRQ 10
Found Intel i82557 PCI Speedo at 0xe5302000, IRQ 10.
eth1: 0000:00:09.0, 00:30:64:04:E6:E5, IRQ 10.
Board assembly 721383-016, Physical connectors present: RJ45
Primary interface chip i82555 PHY #1.
General self-test: passed.
Serial sub-system self-test: passed.
Internal registers self-test: passed.
ROM checksum self-test: passed (0x04f4518b).
ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11
PCI: setting IRQ 11 as level-triggered
ACPI: PCI Interrupt 0000:00:0a.0[A] -> Link [LNKC] -> GSI 11 (level,
low) -> IRQ 11
Found Intel i82557 PCI Speedo at 0xe5301000, IRQ 11.
eth2: 0000:00:0a.0, 00:30:64:04:E6:E6, IRQ 11.
Board assembly 721383-016, Physical connectors present: RJ45
Primary interface chip i82555 PHY #1.
General self-test: passed.
Serial sub-system self-test: passed.
Internal registers self-test: passed.
ROM checksum self-test: passed (0x04f4518b).

# ip addr
1: lo: <LOOPBACK,UP,10000> mtu 16436 qdisc noqueue
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop
link/ether e2:18:f7:f8:88:4e brd ff:ff:ff:ff:ff:ff
3: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop qlen 1000
link/ether 00:30:64:04:e6:e4 brd ff:ff:ff:ff:ff:ff
4: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop qlen 1000
link/ether 00:30:64:04:e6:e5 brd ff:ff:ff:ff:ff:ff
5: eth2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop qlen 1000
link/ether 00:30:64:04:e6:e6 brd ff:ff:ff:ff:ff:ff

# eepro100-diag -aa -ee
eepro100-diag.c:v2.13 2/28/2005 Donald Becker ([email protected])
http://www.scyld.com/diag/index.html
Index #1: Found a Intel i82557/8/9 EtherExpressPro100 adapter at 0xd800.
i82557 chip registers at 0xd800:
00000000 00000000 00000000 00080002 10000000 00000000
No interrupt sources are pending.
The transmit unit state is 'Idle'.
The receive unit state is 'Idle'.
This status is unusual for an activated interface.
EEPROM contents, size 64x16:
00: 3000 0464 e4e6 0e03 0000 0201 4701 0000 _0d__________G__
0x08: 7213 8310 40a2 0001 8086 0000 0000 0000 _r___@__________
...
0x30: 0128 0000 0000 0000 0000 0000 0000 0000 (_______________
0x38: 0000 0000 0000 0000 0000 0000 0000 92f7 ________________
The EEPROM checksum is correct.
Intel EtherExpress Pro 10/100 EEPROM contents:
Station address 00:30:64:04:E6:E4.
Board assembly 721383-016, Physical connectors present: RJ45
Primary interface chip i82555 PHY #1.
Sleep mode is enabled. This is not recommended.
Under high load the card may not respond to
PCI requests, and thus cause a master abort.
To clear sleep mode use the '-G 0 -w -w -f' options.
Index #2: Found a Intel i82557/8/9 EtherExpressPro100 adapter at 0xdc00.
i82557 chip registers at 0xdc00:
00000000 00000000 00000000 00080002 10000000 00000000
No interrupt sources are pending.
The transmit unit state is 'Idle'.
The receive unit state is 'Idle'.
This status is unusual for an activated interface.
EEPROM contents, size 64x16:
00: 3000 0464 e5e6 0e03 0000 0201 4701 0000 _0d__________G__
0x08: 7213 8310 40a2 0001 8086 0000 0000 0000 _r___@__________
...
0x30: 0128 0000 0000 0000 0000 0000 0000 0000 (_______________
0x38: 0000 0000 0000 0000 0000 0000 0000 91f7 ________________
The EEPROM checksum is correct.
Intel EtherExpress Pro 10/100 EEPROM contents:
Station address 00:30:64:04:E6:E5.
Board assembly 721383-016, Physical connectors present: RJ45
Primary interface chip i82555 PHY #1.
Sleep mode is enabled. This is not recommended.
Under high load the card may not respond to
PCI requests, and thus cause a master abort.
To clear sleep mode use the '-G 0 -w -w -f' options.
Index #3: Found a Intel i82557/8/9 EtherExpressPro100 adapter at 0xe000.
i82557 chip registers at 0xe000:
00000000 00000000 00000000 00080002 10000000 00000000
No interrupt sources are pending.
The transmit unit state is 'Idle'.
The receive unit state is 'Idle'.
This status is unusual for an activated interface.
EEPROM contents, size 64x16:
00: 3000 0464 e6e6 0e03 0000 0201 4701 0000 _0d__________G__
0x08: 7213 8310 40a2 0001 8086 0000 0000 0000 _r___@__________
...
0x30: 0128 0000 0000 0000 0000 0000 0000 0000 (_______________
0x38: 0000 0000 0000 0000 0000 0000 0000 90f7 ________________
The EEPROM checksum is correct.
Intel EtherExpress Pro 10/100 EEPROM contents:
Station address 00:30:64:04:E6:E6.
Board assembly 721383-016, Physical connectors present: RJ45
Primary interface chip i82555 PHY #1.
Sleep mode is enabled. This is not recommended.
Under high load the card may not respond to
PCI requests, and thus cause a master abort.
To clear sleep mode use the '-G 0 -w -w -f' options.
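
The station addresses printed above can be cross-checked against the raw dump: the first three EEPROM words hold the MAC address, each word stored low byte first. A minimal Python sketch, with a hypothetical helper name (mac_from_eeprom) that is not part of eepro100-diag:

```python
def mac_from_eeprom(words):
    """Build the station address from EEPROM words 0..2 (low byte first)."""
    octets = []
    for word in words[:3]:
        octets.append(word & 0xFF)          # low byte is the earlier octet
        octets.append((word >> 8) & 0xFF)   # high byte is the later octet
    return ":".join(f"{octet:02X}" for octet in octets)


# First three words of the dump for the adapter at 0xd800.
print(mac_from_eeprom([0x3000, 0x0464, 0xE4E6]))  # 00:30:64:04:E6:E4
```

Only the third word differs between the three dumps (e4e6, e5e6, e6e6), which matches the three consecutive addresses reported.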

On a related note, I am concerned by this message:

Sleep mode is enabled. This is not recommended.
Under high load the card may not respond to
PCI requests, and thus cause a master abort.
To clear sleep mode use the '-G 0 -w -w -f' options.

Intel 82559 EEPROM Map and Programming Information (AP-394) states:
http://www.intel.com/design/network/applnots/ap394.htm

The Standby Enable bit enables the 82559 to enter standby mode. When
this bit equals 1b, the 82559 is able to recognize an idle state and can
enter standby mode (some internal clocks are stopped for power saving
purposes). The 82559 does not require a PCI clock signal in standby
mode. If this bit equals 0b, the idle recognition circuit is disabled
and the 82559 always remains in an active state. Thus, the 82559 always
requests PCI CLK using the Clockrun mechanism.

Auke, do you agree with Donald Becker's warning?

If I disable STB, the NICs will waste a bit more power when idle,
is that correct? Are there other implications?

Thanks for reading this far!

John

2006-11-08 16:18:40

Subject: Re: Intel 82559 NIC corrupted EEPROM

John wrote:
> I have a motherboard with three on-board 82559 NICs.
>
> o eepro100.ko properly initializes all three NICs
> o e100.ko fails to initialize one of them
>
> NOTE: With kernel 2.6.14, e100.ko fails to initialize the NIC with MAC
> address 00:30:64:04:E6:E4. With kernel 2.6.18 e100.ko fails to
> initialize the NIC with MAC address 00:30:64:04:E6:E5.
>
> The problem is not an incorrect checksum. (Donald Becker's dump utility
> reports a correct checksum for all three NICs.) The problem seems to be
> that e100.ko fails to read the contents of one of the EEPROMs.

[snip]

> 'insmod e100.ko eeprom_bad_csum_allow=1' reports:
>
> e100: Intel(R) PRO/100 Network Driver, 3.5.10-k2-NAPI
> e100: Copyright(c) 1999-2005 Intel Corporation
> ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 12
> PCI: setting IRQ 12 as level-triggered
> ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LNKA] -> GSI 12 (level,
> low) -> IRQ 12
> e100: eth0: e100_probe: addr 0xe5300000, irq 12, MAC addr 00:30:64:04:E6:E4
> ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 10
> PCI: setting IRQ 10 as level-triggered
> ACPI: PCI Interrupt 0000:00:09.0[A] -> Link [LNKB] -> GSI 10 (level,
> low) -> IRQ 10
> e100: 0000:00:09.0: e100_eeprom_load: EEPROM corrupted
> e100: 0000:00:09.0: e100_probe: Invalid MAC address from EEPROM, aborting.
> ACPI: PCI interrupt for device 0000:00:09.0 disabled
> e100: probe of 0000:00:09.0 failed with error -11

This is what I was afraid of: even though the code allows you to bypass the EEPROM
checksum, the probe fails on a further check to see if the MAC address is valid.

Since something with this NIC specifically made the EEPROM return all 0xff's, the MAC
address is automatically invalid, and thus probe fails.
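
To illustrate why an all-0xff read fails the address check, here is a minimal Python sketch of the usual validity rule: the address must not be all zeros and must not have the multicast (group) bit set. The function name is borrowed from the kernel helper, but this is an illustration under that assumption, not the driver's code.

```python
def is_valid_ether_addr(mac: bytes) -> bool:
    """Return True for a usable unicast station address."""
    if len(mac) != 6:
        return False
    is_zero = all(octet == 0 for octet in mac)
    # The group bit is the least significant bit of the first octet.
    # It is set for ff:ff:ff:ff:ff:ff, so an all-ones read is rejected.
    is_multicast = bool(mac[0] & 0x01)
    return not is_zero and not is_multicast


all_ones = bytes([0xFF] * 6)                 # what an unreadable EEPROM yields
expected = bytes.fromhex("00306404e6e5")     # address eepro100 reports for 00:09.0

print(is_valid_ether_addr(all_ones))   # False
print(is_valid_ether_addr(expected))   # True
```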

It seems that the driver has more problems with this NIC than just the eeprom checksum
being bad. Needless to say this might need fixing.

Can you load the eepro driver and send me the full eeprom dump? Perhaps I can duplicate
things over here.

[snip]

> On a related note, I am concerned by this message:
>
> Sleep mode is enabled. This is not recommended.
> Under high load the card may not respond to
> PCI requests, and thus cause a master abort.
> To clear sleep mode use the '-G 0 -w -w -f' options.
>
> Intel 82559 EEPROM Map and Programming Information (AP-394) states:
> http://www.intel.com/design/network/applnots/ap394.htm
>
> The Standby Enable bit enables the 82559 to enter standby mode. When
> this bit equals 1b, the 82559 is able to recognize an idle state and can
> enter standby mode (some internal clocks are stopped for power saving
> purposes). The 82559 does not require a PCI clock signal in standby
> mode. If this bit equals 0b, the idle recognition circuit is disabled
> and the 82559 always remains in an active state. Thus, the 82559 always
> requests PCI CLK using the Clockrun mechanism.
>
> Auke, do you agree with Donald Becker's warning?

If you are using the e100 in a performance situation, I would certainly switch it off :)

> If I disable STB, the NICs will waste a bit more power when idle,
> is that correct? Are there other implications?

hm, I don't know the power specs of e100 that well, so I can't say that it saves
significant amounts of power, but I suspect it would.

Power management on nics is hairy business. As suggested, it can take time before the
nic powers back up, performance can be impacted, and some commands might return an
invalid or unknown value. OTOH our labs here test these things pretty well before they
get sent out to customers and resale agents, so Becker's cautious wording catches the
severity pretty well (recommended).

I would say that under most circumstances, it's safe to enable STB, but you might want
to disable it for use in routing and other server applications, where most of the time
the NIC is active anyway.

hth

Auke

>
> Thanks for reading this far!
>
> John

2006-11-08 17:27:00

Subject: Re: Intel 82559 NIC corrupted EEPROM

On 11/8/06, John wrote:
> Hello all,
>
> [ E-mail address is a bit-bucket. I *do* monitor the mailing lists. ]
>
> I will try and summarize the problem as I understand it at this point.
>
> I've written two messages so far:
> http://groups.google.com/group/linux.kernel/msg/3a05d819c66474db
> http://groups.google.com/group/linux.kernel/msg/391aebbb3dfd6039
>
> And here is a link to the complete thread:
> http://lkml.org/lkml/fancy/2006/11/3/124
>
> I have a motherboard with three on-board 82559 NICs.
>
> o eepro100.ko properly initializes all three NICs
> o e100.ko fails to initialize one of them
>
> NOTE: With kernel 2.6.14, e100.ko fails to initialize the NIC with MAC
> address 00:30:64:04:E6:E4. With kernel 2.6.18 e100.ko fails to
> initialize the NIC with MAC address 00:30:64:04:E6:E5.
>
> The problem is not an incorrect checksum. (Donald Becker's dump utility
> reports a correct checksum for all three NICs.) The problem seems to be
> that e100.ko fails to read the contents of one of the EEPROMs.

<snip>

Thanks for the report, I have some thoughts.
I suspect that one reason Becker's code works is that it uses IO based
access (slower, and different method) to the adapter rather than
memory mapped access.

The second thought is that the adapter is in D3, and something about
your kernel or the driver doesn't successfully wake it up to D0. An
indication of this would be looking at lspci -vv before/after loading
the driver. Also, after loading/unloading eepro100 does the e100
driver work?

A third idea is to look for a master abort in lspci after e100 fails to load.

And a last idea is for us to instrument the reads/writes from/to the
device during init and see if everything is returning 0xffffffff, as
that indicates the I/O and/or memory bar is not enabled, or the
address returned from ioremap is invalid.
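
As a rough sketch of that last check: the PCI command register (config offset 0x04) has one enable bit each for I/O decoding, memory decoding and bus mastering, and a register read of all ones is the classic sign that nothing is decoding the access. The constant names below follow the kernel's pci_regs.h; the helper functions are hypothetical and only decode values, they do not touch hardware.

```python
PCI_COMMAND_IO = 0x0001      # respond to I/O space accesses
PCI_COMMAND_MEMORY = 0x0002  # respond to memory space accesses
PCI_COMMAND_MASTER = 0x0004  # allow the device to act as bus master


def decode_command(command):
    """Map a 16-bit PCI command register value to the flags lspci shows."""
    return {
        "I/O": bool(command & PCI_COMMAND_IO),
        "Mem": bool(command & PCI_COMMAND_MEMORY),
        "BusMaster": bool(command & PCI_COMMAND_MASTER),
    }


def looks_unmapped(value, width_bits=32):
    """True when a register read came back as all ones."""
    return value == (1 << width_bits) - 1


print(decode_command(0x0007))      # all three enabled
print(decode_command(0x0000))      # all three disabled
print(looks_unmapped(0xFFFFFFFF))  # True
```

A value with all three bits clear corresponds to "I/O- Mem- BusMaster-" in lspci -vv output.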

Jesse

2006-11-09 12:16:44

by John

Subject: Re: Intel 82559 NIC corrupted EEPROM

Auke Kok wrote:

> This is what I was afraid of: even though the code allows you to bypass
> the EEPROM checksum, the probe fails on a further check to see if the
> MAC address is valid.
>
> Since something with this NIC specifically made the EEPROM return all
> 0xff's, the MAC address is automatically invalid, and thus probe fails.

I don't understand why you think there is something wrong with a
specific NIC?

In 2.6.14.7, e100.ko fails to read the EEPROM on 0000:00:08.0 (eth0)
In 2.6.18.1, e100.ko fails to read the EEPROM on 0000:00:09.0 (eth1)
In both kernels, eepro100.ko successfully reads all the EEPROMs.

> It seems that the driver has more problems with this NIC than just the
> eeprom checksum being bad. Needless to say this might need fixing.
>
> Can you load the eepro driver and send me the full eeprom dump?
> Perhaps I can duplicate things over here.

00:08.0 EEPROM contents, size 64x16

3000 0464 e4e6 0e03 0000 0201 4701 0000
7213 8310 40a2 0001 8086 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0128 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 92f7

00:09.0 EEPROM contents, size 64x16

3000 0464 e5e6 0e03 0000 0201 4701 0000
7213 8310 40a2 0001 8086 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0128 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 91f7

00:0a.0 EEPROM contents, size 64x16

3000 0464 e6e6 0e03 0000 0201 4701 0000
7213 8310 40a2 0001 8086 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0128 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 90f7
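
For anyone who wants to check these dumps by hand: the convention used by the e100/eepro100 code is that all 64 words, checksum word included, sum to 0xBABA modulo 2^16. A minimal Python sketch, using the 00:08.0 dump above as input (the helper names are hypothetical):

```python
EEPROM_MAGIC_SUM = 0xBABA  # expected 16-bit sum of all words, checksum included


def parse_dump(text):
    """Turn a whitespace-separated hex dump into a list of 16-bit words."""
    return [int(token, 16) for token in text.split()]


def checksum_ok(words):
    """True when the words sum to the magic value modulo 2**16."""
    return (sum(words) & 0xFFFF) == EEPROM_MAGIC_SUM


DUMP_00_08_0 = """
3000 0464 e4e6 0e03 0000 0201 4701 0000
7213 8310 40a2 0001 8086 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0128 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 92f7
"""

words = parse_dump(DUMP_00_08_0)
print(len(words), checksum_ok(words))  # 64 True
print(checksum_ok([0xFFFF] * 64))      # False: an all-ones read cannot pass
```

The other two dumps differ only in the third word (+0x0100 each) and the last word (-0x0100 each), so they sum to the same value.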

2006-11-09 14:15:01

by John

Subject: Re: Intel 82559 NIC corrupted EEPROM

Jesse Brandeburg wrote:

> I suspect that one reason Becker's code works is that it uses IO
> based access (slower, and different method) to the adapter rather
> than memory mapped access.

I've noticed this difference.

> The second thought is that the adapter is in D3, and something about
> your kernel or the driver doesn't successfully wake it up to D0.

On my NICs, the EEPROM ID (Word 0Ah) is set to 0x40a2.
Thus DDPD (bit 6) is set to 0.

DDPD is the "Disable Deep Power Down while PME is disabled" bit.
0 - Deep Power Down is enabled in D3 state while PME-disabled.
1 - Deep Power Down disabled in D3 state while PME-disabled.
This bit should be set to 1b if a TCO controller is being used via the
SMB because it requires receive functionality at all power states.
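
The bit arithmetic behind that reading, as a small Python sketch. The bit position comes from the AP-394 text above; the helper names are hypothetical.

```python
DDPD_BIT = 6  # bit position as stated in the AP-394 excerpt


def bit_is_set(word, bit):
    """True when the given bit of a 16-bit EEPROM word is 1."""
    return bool((word >> bit) & 1)


def with_bit_set(word, bit):
    """Return the word with the given bit forced to 1."""
    return word | (1 << bit)


eeprom_id = 0x40A2  # Word 0Ah as read from my NICs
print(bit_is_set(eeprom_id, DDPD_BIT))         # False
print(hex(with_bit_set(eeprom_id, DDPD_BIT)))  # 0x40e2
```

Changing this word would presumably also require updating the checksum word at the end of the EEPROM.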

Are you suggesting I try and set DDPD to 1?
Or is this completely unrelated?

> An indication of this would be looking at lspci -vv before/after
> loading the driver.

$ diff -u lspci_vv_before_e100.txt lspci_vv_after_e100.txt
--- lspci_vv_before_e100.txt 2006-11-09 14:51:30.000000000 +0100
+++ lspci_vv_after_e100.txt 2006-11-09 14:51:30.000000000 +0100
@@ -74,21 +74,20 @@
Expansion ROM at 20000000 [disabled] [size=1M]
Capabilities: [dc] Power Management version 2
Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA
PME(D0+,D1+,D2+,D3hot+,D3cold+)
- Status: D0 PME-Enable+ DSel=0 DScale=2 PME-
+ Status: D0 PME-Enable- DSel=0 DScale=2 PME-

00:09.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro
100] (rev 08)
Subsystem: Intel Corporation EtherExpress PRO/100B (TX)
- Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
+ Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort- >SERR- <PERR-
- Latency: 32 (2000ns min, 14000ns max), cache line size 08
Interrupt: pin A routed to IRQ 10
- Region 0: Memory at e5302000 (32-bit, non-prefetchable) [size=4K]
- Region 1: I/O ports at dc00 [size=64]
- Region 2: Memory at e5100000 (32-bit, non-prefetchable) [size=1M]
+ Region 0: Memory at e5302000 (32-bit, non-prefetchable)
[disabled] [size=4K]
+ Region 1: I/O ports at dc00 [disabled] [size=64]
+ Region 2: Memory at e5100000 (32-bit, non-prefetchable)
[disabled] [size=1M]
Expansion ROM at 20100000 [disabled] [size=1M]
Capabilities: [dc] Power Management version 2
Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA
PME(D0+,D1+,D2+,D3hot+,D3cold+)
- Status: D0 PME-Enable+ DSel=0 DScale=2 PME-
+ Status: D0 PME-Enable- DSel=0 DScale=2 PME-

00:0a.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro
100] (rev 08)
Subsystem: Intel Corporation EtherExpress PRO/100B (TX)

> Also, after loading/unloading eepro100 does the e100 driver work?

No.

> A third idea is look for a master abort in lspci after e100 fails to
> load.

I don't understand that one.

2006-11-09 17:04:16

Subject: Re: Intel 82559 NIC corrupted EEPROM

John wrote:
> Auke Kok wrote:
>
>> This is what I was afraid of: even though the code allows you to
>> bypass the EEPROM checksum, the probe fails on a further check to see
>> if the MAC address is valid.
>>
>> Since something with this NIC specifically made the EEPROM return all
>> 0xff's, the MAC address is automatically invalid, and thus probe fails.
>
> I don't understand why you think there is something wrong with a
> specific NIC?

that was completely not my point - I was merely trying to point out that the original
problem causes a cascade of error events later on, and bypassing the eeprom check in
this case didn't help you at all. Something is wrong in the driver, but I don't
understand yet why it only affects one of the 3 nics in your system.

> In 2.6.14.7, e100.ko fails to read the EEPROM on 0000:00:08.0 (eth0)
> In 2.6.18.1, e100.ko fails to read the EEPROM on 0000:00:09.0 (eth1)

It almost sounds like a bug got fixed and the fix introduced a regression.
Wouldn't this be the right time to pull out git-bisect? Even loading
2.6.15, 2.6.16, and 2.6.17 on it would give us some good information.

Cheers,

Auke

2006-11-10 00:19:25

Subject: Re: Intel 82559 NIC corrupted EEPROM

On 11/9/06, John wrote:
> > The second thought is that the adapter is in D3, and something about
> > your kernel or the driver doesn't successfully wake it up to D0.
>
> On my NICs, the EEPROM ID (Word 0Ah) is set to 0x40a2.
> Thus DDPD (bit 6) is set to 0.
>
> DDPD is the "Disable Deep Power Down while PME is disabled" bit.
> 0 - Deep Power Down is enabled in D3 state while PME-disabled.
> 1 - Deep Power Down disabled in D3 state while PME-disabled.
> This bit should be set to 1b if a TCO controller is being used via the
> SMB because it requires receive functionality at all power states.
>
> Are you suggesting I try and set DDPD to 1?
> Or is this completely unrelated?

This may be related, but I doubt it. Something is strange about how
memory is being mapped in your system. Whatever is creating the
problem moved when you changed the kernel version. I'm wondering if
there is a device collision at e5302000. I'm not convinced at this
point that it is e100's fault.

Can you send the output of cat /proc/iomem?
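
For reference, the DDPD reading quoted above can be double-checked with
a few lines of Python. This is only a sketch: the bit position (bit 6 of
EEPROM word 0Ah) and the value 0x40a2 come from the quoted text, while
the function name is made up for illustration.

```python
# Sketch: decode the DDPD bit from the EEPROM ID word quoted above.
# DDPD_BIT follows the quoted description (bit 6 of word 0Ah);
# ddpd_is_set is a hypothetical helper name, not part of any driver.
DDPD_BIT = 6


def ddpd_is_set(eeprom_id_word: int) -> bool:
    """Return True if the DDPD bit is 1 in the given 16-bit word."""
    return bool((eeprom_id_word >> DDPD_BIT) & 1)


if __name__ == "__main__":
    word = 0x40A2
    print(f"{word:#06x}: DDPD={int(ddpd_is_set(word))}")
```

For 0x40a2 this reports DDPD=0, which agrees with the reading that Deep
Power Down is enabled in D3 while PME is disabled.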

> > An indication of this would be looking at lspci -vv before/after
> > loading the driver.
>
> $ diff -u lspci_vv_before_e100.txt lspci_vv_after_e100.txt
> --- lspci_vv_before_e100.txt 2006-11-09 14:51:30.000000000 +0100
> +++ lspci_vv_after_e100.txt 2006-11-09 14:51:30.000000000 +0100
> @@ -74,21 +74,20 @@
> Expansion ROM at 20000000 [disabled] [size=1M]
> Capabilities: [dc] Power Management version 2
> Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA
> PME(D0+,D1+,D2+,D3hot+,D3cold+)
> - Status: D0 PME-Enable+ DSel=0 DScale=2 PME-
> + Status: D0 PME-Enable- DSel=0 DScale=2 PME-

Okay, when the driver loads it clears PME-Enable, but it does not
re-enable it when it unloads. That is pretty much expected.

> 00:09.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 08)
> Subsystem: Intel Corporation EtherExpress PRO/100B (TX)
> - Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
> + Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
> Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium
> >TAbort- <TAbort- <MAbort- >SERR- <PERR-

pci_enable_device should be enabling I/O and Mem, and pci_set_master
should be enabling BusMaster; they are probably being disabled when the
driver errors out of init. Maybe you should add a call to
pci_set_power_state(dev, PCI_D0); before the call to e100_reset.
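
The before/after comparison of the Control line can also be done
mechanically. The following is a rough sketch, not a general lspci
parser: it assumes the wrapped Control line has been joined into a
single line, and the function names are invented for illustration.

```python
import re


def parse_flags(line: str) -> dict:
    """Map each 'Name+' / 'Name-' token on an lspci line to True / False."""
    tokens = re.findall(r"([\w/<>]+)([+-])(?=\s|$)", line)
    return {name: sign == "+" for name, sign in tokens}


def flags_cleared(before: str, after: str) -> list:
    """Return the flags that were set before and are clear afterwards."""
    b, a = parse_flags(before), parse_flags(after)
    return [name for name in b if b[name] and not a.get(name, False)]


if __name__ == "__main__":
    # Control lines taken from the quoted diff, with the wrap removed.
    before = ("Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- "
              "ParErr- Stepping- SERR- FastB2B-")
    after = ("Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- "
             "ParErr- Stepping- SERR- FastB2B-")
    print(flags_cleared(before, after))
```

On the two Control lines from the diff this lists I/O, Mem, and
BusMaster, i.e. exactly the three decode/master bits that end up
disabled after the failed load.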

> > Also, after loading/unloading eepro100 does the e100 driver work?
>
> No.

Now that is really odd.

> > A third idea is look for a master abort in lspci after e100 fails to
> > load.
>
> I don't understand that one.

There isn't a master abort; if there were, MAbort+ would be showing in
the lspci output above.

Getting all 0xffffffff back when you read registers is a sure sign that
the hardware either isn't at the address specified or is in a power-down
state. The only other option I can think of is that something else is
intercepting memory reads and writes.

Try something like the attached patch (compile-tested only):

Attachments:

(No filename) (3.04 kB)
e100_debug.patch (969.00 B)

2006-11-10 12:02:59

by John

Subject: Re: Intel 82559 NIC corrupted EEPROM

Linux version 2.6.18.1-hrt (john@venus) (gcc version 3.4.4) #1 PREEMPT Tue Nov 7 18:00:05 CET 2006
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000000fff0000 (usable)
BIOS-e820: 000000000fff0000 - 000000000fff3000 (ACPI NVS)
BIOS-e820: 000000000fff3000 - 0000000010000000 (ACPI data)
BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
255MB LOWMEM available.
On node 0 totalpages: 65520
DMA zone: 4096 pages, LIFO batch:0
Normal zone: 61424 pages, LIFO batch:15
DMI 2.3 present.
ACPI: RSDP (v000 VIA601 ) @ 0x000f6950
ACPI: RSDT (v001 VIA601 AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x0fff3000
ACPI: FADT (v001 VIA601 AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x0fff3040
ACPI: DSDT (v001 VIA601 AWRDACPI 0x00001000 MSFT 0x0100000c) @ 0x00000000
ACPI: PM-Timer IO Port: 0x4008
Allocating PCI resources starting at 20000000 (gap: 10000000:efff0000)
Detected 1266.766 MHz processor.
VSYSCALL: consistency checks...passed...mapping...done.
VSYSCALL: fixmap virt addr: 0xffffd000
Built 1 zonelists. Total pages: 65520
Kernel command line: ro root=/dev/hda1 console=ttyS0,57600n8 console=tty0 panic=3
Local APIC disabled by BIOS -- you can enable it with "lapic"
mapped APIC to ffffc000 (01201000)
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
Clock event device pit configured with caps set: 07
PID hash table entries: 1024 (order: 10, 4096 bytes)
Console: colour dummy device 80x25
Dentry cache hash table entries: 32768 (order: 5, 131072 bytes)
Inode-cache hash table entries: 16384 (order: 4, 65536 bytes)
Memory: 256840k/262080k available (1626k kernel code, 4752k reserved, 532k data, 160k init, 0k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay using timer specific routine.. 2535.90 BogoMIPS (lpj=5071805)
Mount-cache hash table entries: 512
CPU: After generic identify, caps: 0383f9ff 00000000 00000000 00000000 00000000 00000000 00000000
CPU: After vendor identify, caps: 0383f9ff 00000000 00000000 00000000 00000000 00000000 00000000
CPU: L1 I cache: 16K, L1 D cache: 16K
CPU: L2 cache: 512K
CPU: After all inits, caps: 0383f9ff 00000000 00000000 00000040 00000000 00000000 00000000
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
Compat vDSO mapped to ffffe000.
CPU: Intel(R) Pentium(R) III CPU - S 1266MHz stepping 04
Checking 'hlt' instruction... OK.
ACPI: Core revision 20060707
ACPI: setting ELCR to 0200 (from 1e20)
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: PCI BIOS revision 2.10 entry at 0xfb210, last bus=2
PCI: Using configuration type 1
Setting up standard PCI resources
ACPI: Interpreter enabled
ACPI: Using PIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
PCI: Probing PCI hardware (bus 00)
ACPI: Assume root bridge [\_SB_.PCI0] bus is 0
PCI quirk: region 6000-607f claimed by vt82c686 HW-mon
PCI quirk: region 5000-500f claimed by vt82c686 SMB
PCI: Firmware left 0000:00:08.0 e100 interrupts enabled, disabling
PCI: Firmware left 0000:00:09.0 e100 interrupts enabled, disabling
PCI: Firmware left 0000:00:0a.0 e100 interrupts enabled, disabling
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Link [LNKA] (IRQs 1 3 4 5 6 7 10 11 *12 14 15)
ACPI: PCI Interrupt Link [LNKB] (IRQs 1 3 4 5 6 7 *10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKC] (IRQs 1 3 4 5 6 7 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LNKD] (IRQs 1 3 4 *5 6 7 10 11 12 14 15)
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI init
pnp: PnP ACPI: found 13 devices
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report
PCI: Bridge: 0000:00:01.0
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Bridge: 0000:00:0d.0
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Setting latency timer of device 0000:00:01.0 to 64
NET: Registered protocol family 2
IP route cache hash table entries: 2048 (order: 1, 8192 bytes)
TCP established hash table entries: 8192 (order: 3, 32768 bytes)
TCP bind hash table entries: 4096 (order: 2, 16384 bytes)
TCP: Hash tables configured (established 8192 bind 4096)
TCP reno registered
Machine check exception polling timer started.
IA-32 Microcode Update Driver: v1.14a <[email protected]>
io scheduler noop registered
io scheduler cfq registered (default)
PCI: Disabling Via external APIC routing
ACPI: Power Button (FF) [PWRF]
ACPI: Power Button (CM) [PWRB]
ACPI: CPU0 (power states: C1[C1] C2[C2])
ACPI: Processor [CPU0] (supports 2 throttling states)
Non-volatile memory driver v1.2
Linux agpgart interface v0.101 (c) Dave Jones
agpgart: Detected VIA Apollo Pro 133 chipset
agpgart: AGP aperture is 64M @ 0xe0000000
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
serial8250: ttyS2 at I/O 0x3e8 (irq = 4) is a 16550A
00:08: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
00:09: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
00:0c: ttyS2 at I/O 0x3e8 (irq = 4) is a 16550A
loop: loaded (max 8 devices)
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: IDE controller at PCI slot 0000:00:07.1
PCI: VIA IRQ fixup for 0000:00:07.1, from 255 to 0
VP_IDE: chipset revision 6
VP_IDE: not 100% native mode: will probe irqs later
VP_IDE: VIA vt82c686b (rev 40) IDE UDMA100 controller on pci0000:00:07.1
ide0: BM-DMA at 0xd000-0xd007, BIOS settings: hda:pio, hdb:pio
ide1: BM-DMA at 0xd008-0xd00f, BIOS settings: hdc:pio, hdd:pio
Probing IDE interface ide0...
hda: PQI IDE DiskOnModule, ATA DISK drive
hda: set_drive_speed_status: status=0x51 { DriveReady SeekComplete Error }
hda: set_drive_speed_status: error=0x04 { DriveStatusError }
hda: set_drive_speed_status: status=0x51 { DriveReady SeekComplete Error }
hda: set_drive_speed_status: error=0x04 { DriveStatusError }
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
Probing IDE interface ide1...
hda: max request size: 128KiB
hda: 256000 sectors (131 MB) w/1KiB Cache, CHS=500/16/32
hda: hda1 hda2 hda3 hda4
PNP: PS/2 Controller [PNP0303:PS2K] at 0x60,0x64 irq 1
PNP: PS/2 controller doesn't have AUX irq; using default 12
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
mice: PS/2 mouse device common for all mice
oprofile: using timer interrupt.
netem: version 1.2
TCP bic registered
NET: Registered protocol family 1
NET: Registered protocol family 17
802.1Q VLAN Support v1.8 Ben Greear <[email protected]>
All bugs added by David S. Miller <[email protected]>
Using IPI Shortcut mode
VFS: Mounted root (ext2 filesystem) readonly.
Time: tsc clocksource has been installed.
Freeing unused kernel memory: 160k freed
Time: acpi_pm clocksource has been installed.
process `syslogd' is using obsolete setsockopt SO_BSDCOMPAT

Attachments:

config-2.6.18.1-adlink (20.43 kB)
dmesg.txt (7.05 kB)

2006-11-15 08:33:55

by John

Subject: Re: Intel 82559 NIC corrupted EEPROM

John wrote:

> 00000000-0009ffff : System RAM
> 000a0000-000bffff : Video RAM area
> 000f0000-000fffff : System ROM
> 00100000-0ffeffff : System RAM
> 00100000-00296a1a : Kernel code
> 00296a1b-0031bbe7 : Kernel data
> 0fff0000-0fff2fff : ACPI Non-volatile Storage
> 0fff3000-0fffffff : ACPI Tables
> 20000000-200fffff : 0000:00:08.0
> 20100000-201fffff : 0000:00:09.0
> 20200000-202fffff : 0000:00:0a.0
> e0000000-e3ffffff : 0000:00:00.0
> e5000000-e50fffff : 0000:00:08.0
> e5100000-e51fffff : 0000:00:09.0
> e5200000-e52fffff : 0000:00:0a.0
> e5300000-e5300fff : 0000:00:08.0
> e5301000-e5301fff : 0000:00:0a.0
> e5302000-e5302fff : 0000:00:09.0
> ffff0000-ffffffff : reserved
>
> I've also attached:
>
> o config-2.6.18.1-adlink used to compile this kernel
> o dmesg output after the machine boots

I suppose the information I've sent is not enough to locate the
root of the problem. Is there more I can provide?
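
For what it is worth, the collision theory can be checked mechanically
against the listing above. The sketch below is an assumption-laden
helper, not an existing tool: it only looks at entries named after a
PCI address (dddd:bb:dd.f) and reports any two such ranges that overlap.
It cannot see a device that decodes an address without claiming it in
/proc/iomem.

```python
import re

# Entries such as "0000:00:09.0"; other names (System RAM, ...) are skipped.
PCI_NAME = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")
IOMEM_LINE = re.compile(r"\s*([0-9a-f]+)-([0-9a-f]+)\s*:\s*(.+?)\s*$")


def pci_ranges(iomem_text: str) -> list:
    """Extract (start, end, name) for every PCI-named /proc/iomem entry."""
    ranges = []
    for line in iomem_text.splitlines():
        m = IOMEM_LINE.match(line)
        if m and PCI_NAME.match(m.group(3)):
            ranges.append((int(m.group(1), 16), int(m.group(2), 16), m.group(3)))
    return sorted(ranges)


def overlaps(ranges: list) -> list:
    """Return (name, name) pairs whose address ranges overlap."""
    found = []
    widest = None  # range with the highest end address seen so far
    for cur in sorted(ranges):
        if widest is not None and cur[0] <= widest[1]:
            found.append((widest[2], cur[2]))
        if widest is None or cur[1] > widest[1]:
            widest = cur
    return found


if __name__ == "__main__":
    listing = """\
e5000000-e50fffff : 0000:00:08.0
e5100000-e51fffff : 0000:00:09.0
e5200000-e52fffff : 0000:00:0a.0
e5300000-e5300fff : 0000:00:08.0
e5301000-e5301fff : 0000:00:0a.0
e5302000-e5302fff : 0000:00:09.0
"""
    print(overlaps(pci_ranges(listing)))
```

Run on the PCI entries above, it finds no overlapping pair, so whatever
is happening at e5302000 is not visible as a conflict between the
ranges the kernel has assigned.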

2006-11-27 14:16:53

by John

Subject: Re: Intel 82559 NIC corrupted EEPROM

John wrote:

>> 00000000-0009ffff : System RAM
>> 000a0000-000bffff : Video RAM area
>> 000f0000-000fffff : System ROM
>> 00100000-0ffeffff : System RAM
>> 00100000-00296a1a : Kernel code
>> 00296a1b-0031bbe7 : Kernel data
>> 0fff0000-0fff2fff : ACPI Non-volatile Storage
>> 0fff3000-0fffffff : ACPI Tables
>> 20000000-200fffff : 0000:00:08.0
>> 20100000-201fffff : 0000:00:09.0
>> 20200000-202fffff : 0000:00:0a.0
>> e0000000-e3ffffff : 0000:00:00.0
>> e5000000-e50fffff : 0000:00:08.0
>> e5100000-e51fffff : 0000:00:09.0
>> e5200000-e52fffff : 0000:00:0a.0
>> e5300000-e5300fff : 0000:00:08.0
>> e5301000-e5301fff : 0000:00:0a.0
>> e5302000-e5302fff : 0000:00:09.0
>> ffff0000-ffffffff : reserved
>>
>> I've also attached:
>>
>> o config-2.6.18.1-adlink used to compile this kernel
>> o dmesg output after the machine boots
>
> I suppose the information I've sent is not enough to locate the
> root of the problem. Is there more I can provide?

Here is some context for those who have been added to the CC list:
http://groups.google.com/group/linux.kernel/browse_frm/thread/bdc8fd08fb601c26

As far as I understand, some consider the eepro100 driver to be
obsolete, and it has been considered for removal.

What is the current status?

Unfortunately, e100 does not work out-of-the-box on this system.

Is there something I can do to improve the situation?

--
Regards,

John

[ E-mail address is a bit-bucket. I *do* monitor the mailing lists. ]

2006-11-27 20:34:11

Subject: Re: Intel 82559 NIC corrupted EEPROM

On 11/27/06, John wrote:
> John wrote:
>
> >> 00000000-0009ffff : System RAM
> >> 000a0000-000bffff : Video RAM area
> >> 000f0000-000fffff : System ROM
> >> 00100000-0ffeffff : System RAM
> >> 00100000-00296a1a : Kernel code
> >> 00296a1b-0031bbe7 : Kernel data
> >> 0fff0000-0fff2fff : ACPI Non-volatile Storage
> >> 0fff3000-0fffffff : ACPI Tables
> >> 20000000-200fffff : 0000:00:08.0
> >> 20100000-201fffff : 0000:00:09.0
> >> 20200000-202fffff : 0000:00:0a.0
> >> e0000000-e3ffffff : 0000:00:00.0
> >> e5000000-e50fffff : 0000:00:08.0
> >> e5100000-e51fffff : 0000:00:09.0
> >> e5200000-e52fffff : 0000:00:0a.0
> >> e5300000-e5300fff : 0000:00:08.0
> >> e5301000-e5301fff : 0000:00:0a.0
> >> e5302000-e5302fff : 0000:00:09.0
> >> ffff0000-ffffffff : reserved
> >>
> >> I've also attached:
> >>
> >> o config-2.6.18.1-adlink used to compile this kernel
> >> o dmesg output after the machine boots
> >
> > I suppose the information I've sent is not enough to locate the
> > root of the problem. Is there more I can provide?
>
> Here is some context for those who have been added to the CC list:
> http://groups.google.com/group/linux.kernel/browse_frm/thread/bdc8fd08fb601c26
>
> As far as I understand, some consider the eepro100 driver to be
> obsolete, and it has been considered for removal.
>
> What is the current status?
>
> Unfortunately, e100 does not work out-of-the-box on this system.
>
> Is there something I can do to improve the situation?

Let's go ahead and print the output from e100_eeprom_load.
Debug patch attached.

Attachments:

(No filename) (1.52 kB)
e100_debug.patch (849.00 B)

2006-11-29 11:27:48

by John

Subject: Re: Intel 82559 NIC corrupted EEPROM

Jesse Brandeburg wrote:

> John wrote:
>
>> Here is some context for those who have been added to the CC list:
>> http://groups.google.com/group/linux.kernel/browse_frm/thread/bdc8fd08fb601c26
>>
>> As far as I understand, some consider the eepro100 driver to be
>> obsolete, and it has been considered for removal.
>>
>> What is the current status?
>>
>> Unfortunately, e100 does not work out-of-the-box on this system.
>>
>> Is there something I can do to improve the situation?
>
> Let's go ahead and print the output from e100_eeprom_load.
> Debug patch attached.

Loading (then unloading) e100.ko fails the first few times (i.e. the
driver claims one of the EEPROMs is corrupted). Thereafter, sometimes it
fails, other times it works. Sounds like a race, no?

$ cat load_unload
: > /var/log/kern.log
insmod e100.ko debug=16
sleep 1
cp /var/log/kern.log insmod_$I.txt
ip link > ip_link_$I.txt
sleep 2
rmmod e100
let "I=I+1"

(cf. attached compressed archive)

FAILURE:
insmod_100.txt
insmod_101.txt
insmod_102.txt
insmod_105.txt
insmod_107.txt
insmod_108.txt
insmod_110.txt
insmod_111.txt
insmod_114.txt

SUCCESS:
insmod_103.txt
insmod_104.txt
insmod_106.txt
insmod_109.txt
insmod_112.txt
insmod_113.txt
insmod_115.txt
insmod_116.txt

On an unrelated note, insmod_100.txt is truncated at the beginning, and
insmod_110.txt is truncated in the middle (!!) cf. line 14. What would
cause klogd to behave like that?

Regards.

Attachments:

TEST-e100.tar.bz2 (14.09 kB)

2006-11-29 18:55:58

Subject: Re: Intel 82559 NIC corrupted EEPROM

On 11/29/06, John wrote:
> > Let's go ahead and print the output from e100_eeprom_load.
> > Debug patch attached.
>
> Loading (then unloading) e100.ko fails the first few times (i.e. the
> driver claims one of the EEPROMs is corrupted). Thereafter, sometimes it
> fails, other times it works. Sounds like a race, no?

Yes, or something like that. I think you may have a piece of EEPROM
hardware that is either "slow" or slightly out of spec. I wonder if
the hrt kernel makes udelay(4) much closer to 4 us than the regular
kernels do.

Can you try adding mdelay(100); in e100_eeprom_load before the for loop,
and then changing the multiple udelay(4) calls to mdelay(1) in
e100_eeprom_read?

> On an unrelated note, insmod_100.txt is truncated at the beginning, and
> insmod_110.txt is truncated in the middle (!!) cf. line 14. What would
> cause klogd to behave like that?

Usually it's because whatever is printing is printing too fast or too
much at a time.

2006-12-04 23:26:17

Subject: Re: Intel 82559 NIC corrupted EEPROM

On 12/1/06, John wrote:
> > can you try adding mdelay(100); in e100_eeprom_load before the for loop,
> > and then change the multiple udelay(4) to mdelay(1) in e100_eeprom_read
>
> I applied the attached patch.
>
> Loading the driver now takes around one minute :-)

Ouch, but yep, that's what happens when you use "super extra delay".
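
A back-of-the-envelope estimate shows why the load time jumps to about a
minute. The numbers below are assumptions, not facts from this thread:
roughly 66 delay calls per 16-bit word in e100_eeprom_read (one after
chip select, two per bit for 32 clocked bits, one after deselect), 256
words per EEPROM (suggested by the 256-line counts reported further
down), and three NICs read one after another.

```python
# Rough estimate of EEPROM read time per module load.
# All three constants are assumptions made for illustration only.
DELAYS_PER_WORD = 1 + 2 * 32 + 1  # assumed bit-banging pattern
WORDS_PER_EEPROM = 256            # assumed EEPROM size in 16-bit words
NIC_COUNT = 3                     # three on-board 82559s


def load_seconds(delay_s: float, extra_s: float = 0.0) -> float:
    """Total time spent in delays while reading every EEPROM once.

    delay_s  -- length of each per-bit delay, in seconds
    extra_s  -- extra one-off delay per NIC (e.g. the added mdelay(100))
    """
    per_nic = extra_s + WORDS_PER_EEPROM * DELAYS_PER_WORD * delay_s
    return NIC_COUNT * per_nic


if __name__ == "__main__":
    print(f"udelay(4):  {load_seconds(4e-6):.2f} s")
    print(f"mdelay(1):  {load_seconds(1e-3, 0.1):.2f} s")
```

Under these assumptions the original udelay(4) costs about 0.2 s in
total, while mdelay(1) plus the extra mdelay(100) costs about 51 s,
which is in line with "around one minute".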

> I ran 'source load_unload' 25 times in a loop.
>
> The first 12 times were successful. The last 13 times failed.
> (cf. attached archive)
>
> I noticed something very strange.
>
> The number of words obviously in error (0xFFFF) returned by the EEPROM
> on 00:09.0 is not constant.

That is very strange. I would think that maybe you have something else
on the bus with the e100 that is hogging bus cycles, or that you have
failing hardware (maybe a bad EEPROM, or possibly a bad MAC chip).

> $ grep -c 0xFFFF insmod*
> insmod_300.txt:0
> insmod_301.txt:0
> insmod_302.txt:0
> insmod_303.txt:0
> insmod_304.txt:0
> insmod_305.txt:0
> insmod_306.txt:0
> insmod_307.txt:0
> insmod_308.txt:0
> insmod_309.txt:0
> insmod_310.txt:0
> insmod_311.txt:0
> insmod_312.txt:1
> insmod_313.txt:5
> insmod_314.txt:24
> insmod_315.txt:45
> insmod_316.txt:243
> insmod_317.txt:256
> insmod_318.txt:256
> insmod_319.txt:256
> insmod_320.txt:256
> insmod_321.txt:256
> insmod_322.txt:256
> insmod_323.txt:253
> insmod_324.txt:240

This is even stranger. Does it cycle back down (like a sine wave) to
zero again? The delays did seem to work, at least sometimes. This
indicates that something needs that extra delay to successfully read
the EEPROM. I might try changing all the udelay(4) calls to udelay(40)
(a 10x increase) and see if that gives you a happy medium of "most times
the driver loads without error".

John, this problem seems to be very specific to your hardware. I know
that you have put in a lot of time debugging this, but I'm not sure
what we can do from here. If this were a generic code problem more
people would be reporting the issue.

What would you like to do? At this stage I would like e100 to work
better than it is, but I'm not sure what to do next.

Thanks for your patience on this issue,
Jesse

2007-02-07 11:08:40

by John Sigler

Subject: Re: Intel 82559 NIC corrupted EEPROM

Jesse Brandeburg wrote:

> John wrote:
>
>> Jesse Brandeburg wrote:
>>
>>> can you try adding mdelay(100); in e100_eeprom_load before the for loop,
>>> and then change the multiple udelay(4) to mdelay(1) in e100_eeprom_read
>>
>> I applied the attached patch.
>>
>> Loading the driver now takes around one minute :-)
>
> Ouch, but yep, that's what happens when you use "super extra delay".
>
>> I ran 'source load_unload' 25 times in a loop.
>>
>> The first 12 times were successful. The last 13 times failed.
>> (cf. attached archive)
>>
>> I noticed something very strange.
>>
>> The number of words obviously in error (0xFFFF) returned by the EEPROM
>> on 00:09.0 is not constant.
>
> That is very strange. I would think that maybe you have something else
> on the bus with the e100 that is hogging bus cycles, or that you have
> failing hardware (maybe a bad EEPROM, or possibly a bad MAC chip).
>
>> $ grep -c 0xFFFF insmod*
>> insmod_300.txt:0
>> insmod_301.txt:0
>> insmod_302.txt:0
>> insmod_303.txt:0
>> insmod_304.txt:0
>> insmod_305.txt:0
>> insmod_306.txt:0
>> insmod_307.txt:0
>> insmod_308.txt:0
>> insmod_309.txt:0
>> insmod_310.txt:0
>> insmod_311.txt:0
>> insmod_312.txt:1
>> insmod_313.txt:5
>> insmod_314.txt:24
>> insmod_315.txt:45
>> insmod_316.txt:243
>> insmod_317.txt:256
>> insmod_318.txt:256
>> insmod_319.txt:256
>> insmod_320.txt:256
>> insmod_321.txt:256
>> insmod_322.txt:256
>> insmod_323.txt:253
>> insmod_324.txt:240
>
> This is even stranger. Does it cycle back down (like a sine wave) to
> zero again? The delays did seem to work, at least sometimes. This
> indicates that something needs that extra delay to successfully read
> the EEPROM. I might try changing all the udelay(4) calls to udelay(40)
> (a 10x increase) and see if that gives you a happy medium of "most times
> the driver loads without error".
>
> John, this problem seems to be very specific to your hardware. I know
> that you have put in a lot of time debugging this, but I'm not sure
> what we can do from here. If this were a generic code problem more
> people would be reporting the issue.
>
> What would you like to do? At this stage I would like e100 to work
> better than it is, but I'm not sure what to do next.

Hello everyone,

I'm resurrecting this thread because it appears we'll need to support
these motherboards for several months to come, yet Adrian Bunk has
scheduled the removal of eepro100 in January 2007.

To recap, we have to support ~30 EBC-2000T motherboards.
http://www.adlinktech.com/PD/web/PD_detail.php?pid=213
These motherboards come with three on-board Intel 82559 NICs.

Last time I checked, i.e. two months ago, e100 did not correctly
initialize all three NICs on these motherboards. Therefore, we've been
using eepro100.

I will be testing the latest 2.6.20 kernel to see if the situation has
changed, but I wanted to let you all know that there are still some
eepro100 users out there, out of necessity.

Regards,

John

2007-02-13 19:45:48