2001-10-11 12:29:39

by Robbert Kouprie

Subject: eepro100.c bug on 10Mbit half duplex (kernels 2.4.5 / 2.4.10 / 2.4.11pre6 / 2.4.11 / 2.4.10ac11)

Hi all,

I can confirm that the known bug in the Intel EtherExpress Pro/100
adapter is still not worked around in recent kernels. The bug only
manifests itself when the card is operating on 10 Mbit half duplex. On
100 Mbit there are no problems. The problem is that after the device
has received a certain amount of traffic (between 80 and 130 Mb in my
tests), it locks up on new connections. Processes start to hang after
this and logging in is impossible. The only solution is to reset the
interface (using a previously logged in root session) and reboot the
system.

This was tested on:
Linus kernels:
- 2.4.5
- 2.4.10-pre2
- 2.4.10
- 2.4.11pre6
- 2.4.11
- 2.4.11 with eepro100.c from ac kernel 2.4.10ac11 (minor diffs)

Again, on 100 Mbit there are no more problems.

This is my relevant dmesg output:

eepro100.c:v1.09j-t 9/29/99 Donald Becker http://cesdis.gsfc.nasa.gov/linux/drivers/eepro100.html
eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin <[email protected]> and others
eth0: OEM i82557/i82558 10/100 Ethernet, 00:D0:B7:E8:A2:02, IRQ 17.
Board assembly 749658-005, Physical connectors present: RJ45
Primary interface chip i82555 PHY #1.
General self-test: passed.
Serial sub-system self-test: passed.
Internal registers self-test: passed.
ROM checksum self-test: passed (0xdbd8681d).

cat /proc/pci part:
Bus 0, device 13, function 0:
Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 9).
IRQ 17.
Master Capable. Latency=32. Min Gnt=8.Max Lat=56.
Non-prefetchable 32 bit memory at 0xda020000 [0xda020fff].
I/O at 0xc800 [0xc83f].
Non-prefetchable 32 bit memory at 0xda000000 [0xda01ffff].

These are taken from the current setup on 100 Mbit.

Regards,
- Robbert Kouprie, System Administrator, The Netherlands


2001-10-11 16:29:31

by John Gluck

Subject: Re: eepro100.c bug on 10Mbit half duplex (kernels 2.4.5 / 2.4.10 / 2.4.11pre6 / 2.4.11 / 2.4.10ac11)

Hi

I haven't noticed this problem on my system...

I have an 82558 that uses this driver on a Tyan S1836DLUAN motherboard.
I have Dual PIIIs and the system is multi homed. I have a 2nd interface that
uses the NE2000 PCI driver.

The system is my workstation but also acts as an internet gateway (via cable
modem) and firewall for 2 other computers. My workstation is on 24/7 and has
never hung.

I am currently using 2.4.10 for the kernel but I've used most kernels since
the 2.4.0testX days.

In your tests, is the 80 - 130 megs a single file or is it an aggregate? In
my use I far exceed that amount of data, but it's web surfing so the files are
much smaller than what you mention.

John


2001-10-11 16:52:40

by Robbert Kouprie

Subject: RE: eepro100.c bug on 10Mbit half duplex (kernels 2.4.5 / 2.4.10 / 2.4.11pre6 / 2.4.11 / 2.4.10ac11)

Hi,

My testcase was indeed one large file. I did not test lots of small
files. It would always lock up after said amount of traffic, but only in
10 Mbit half duplex mode. Also, I have the 82557, not the 82558 chip.

The problem looks a lot like what should be fixed in this changelog line
from 2.4.9-ac13:

- Work around eepro100 bug with some chip versions on 10Mbit half duplex
  (Arjan van de Ven)

Only 2.4.10ac11 still had the problem...

- Robbert


2001-10-11 17:10:22

by Alan

Subject: Re: eepro100.c bug on 10Mbit half duplex (kernels 2.4.5 / 2.4.10 / 2.4.11pre6 / 2.4.11 / 2.4.10ac11)

> files. It would always lockup after said amount of traffic, but only in
> 10 Mbit half duplex mode. Also, I have the 82557, not the 82558 chip.
>
> The problem looks a lot like what should be fixed in this changelog line
> from 2.4.9-ac13:

Check that the workaround is being activated for your eepro100.

2001-10-11 17:28:56

by Robbert Kouprie

Subject: RE: eepro100.c bug on 10Mbit half duplex (kernels 2.4.5 / 2.4.10 / 2.4.11pre6 / 2.4.11 / 2.4.10ac11)

I assume that should print out a message at bootup? That didn't happen:

Oct 7 18:29:18 radium eepro100.c:v1.09j-t 9/29/99 Donald Becker http://cesdis.gsfc.nasa.gov/linux/drivers/eepro100.html
Oct 7 18:29:18 radium eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin <[email protected]> and others
Oct 7 18:29:18 radium eth0: OEM i82557/i82558 10/100 Ethernet, 00:D0:B7:E8:A2:02, IRQ 17.
Oct 7 18:29:18 radium Board assembly 749658-005, Physical connectors present: RJ45
Oct 7 18:29:18 radium Primary interface chip i82555 PHY #1.
Oct 7 18:29:18 radium General self-test: passed.
Oct 7 18:29:18 radium Serial sub-system self-test: passed.
Oct 7 18:29:18 radium Internal registers self-test: passed.
Oct 7 18:29:18 radium ROM checksum self-test: passed (0xdbd8681d).

- Robbert


2001-10-11 17:46:17

by Matthew S. Hallacy

Subject: Re: eepro100.c bug on 10Mbit half duplex (kernels 2.4.5 / 2.4.10 / 2.4.11pre6 / 2.4.11 / 2.4.10ac11)

On Thu, Oct 11, 2001 at 02:29:56PM +0200, Robbert Kouprie wrote:
> Hi all,
>
> I can confirm that the known bug in the Intel EtherExpress Pro/100
> adapter is still not worked around in recent kernels. The bug only
> manifests itself when the card is operating on 10 Mbit half duplex. On
> 100 Mbit there are no problems. The problem is that after the device
> received certain amount of traffic (between 80 and 130 Mb in my tests)
> the device will lockup on new connections. Processes start to hang after
> this and logging in is impossible. The only solution is to reset the
> interface (using a previously logged in root session) and reboot the
> system.
[snip]

I currently have the equivalent of 8 of these in my system (Compaq NC3131,
quad ethernet..)

Bus 2, device 4, function 0:
Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 5).
^^^^
IRQ 10.
Master Capable. Latency=64. Min Gnt=8.Max Lat=56.
Prefetchable 32 bit memory at 0xcb7ff000 [0xcb7fffff].
I/O at 0x7c00 [0x7c1f].
Non-prefetchable 32 bit memory at 0xcfe00000 [0xcfefffff].

It is the same chip. This particular interface is 10 Mbit half duplex, and
all the interfaces transfer 1G+/day (some small files, some larger than 500 megs)
with no problems. I should note this:

eth0: OEM i82557/i82558 10/100 Ethernet, DE:AD:BA:BE:CA:FE, IRQ 10.
Receiver lock-up bug exists -- enabling work-around.
^^^^^^^^^^^^^^^^^^^^
Board assembly 009542-001, Physical connectors present: RJ45
Primary interface chip i82555 PHY #1.
General self-test: passed.
Serial sub-system self-test: passed.
Internal registers self-test: passed.
ROM checksum self-test: passed (0x24c9f043).
Receiver lock-up workaround activated.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
eth1: OEM i82557/i82558 10/100 Ethernet, DE:AD:BE:EF:CA:FE, IRQ 10.
Receiver lock-up bug exists -- enabling work-around.
^^^^^^^^^^^^^^^^^^^^
Board assembly 009542-001, Physical connectors present: RJ45
Primary interface chip i82555 PHY #1.
General self-test: passed.
Serial sub-system self-test: passed.
Internal registers self-test: passed.
ROM checksum self-test: passed (0x24c9f043).
Receiver lock-up workaround activated.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

etc etc, for every interface.

I'm up to 2.4.10, it's worked fine on 2.4.2, 2.4.4, 2.4.5-8, and 2.4.10 so far.
(I didn't use the kernels not mentioned)


Good luck.
-poptix

2001-10-11 19:19:27

by Robbert Kouprie

Subject: RE: eepro100.c bug on 10Mbit half duplex (kernels 2.4.5 / 2.4.10 / 2.4.11pre6 / 2.4.11 / 2.4.10ac11)

I looked a bit deeper. In linux-2.4.10-ac11/drivers/net/eepro100.c:

line 802:
    if ((pdev->device=0x2449) || ( (pdev->device > 0x1030) &&
        (pdev->device < 0x1039) ))
            sp->chip_id = 1;

line 1358:
    /* workaround for hardware bug on 10 mbit half duplex */

    if ((sp->partner==0) && (sp->chip_id==1)) {
            wait_for_cmd_done(ioaddr + SCBCmd);
            outb(0 , ioaddr + SCBCmd);
    }

Maybe we need another device id at line 802? The work-around seems to
stay untriggered this way.

My device's id is:   8086:1229 - Intel, 82557 [Ethernet Pro 100]
The present ids are: 8086:1030 - 82559 InBusiness 10/100
                     8086:1031-1039 - are not listed in my db
                     8086:2449 - 82820 820 (Camino 2) Chipset Ethernet

For one thing, in Linus' 2.4.12 the if condition at line 802 isn't
present at all, so that sure isn't gonna work.

Regards,
- Robbert


2001-10-11 19:29:07

by Alan

Subject: Re: eepro100.c bug on 10Mbit half duplex (kernels 2.4.5 / 2.4.10 / 2.4.11pre6 / 2.4.11 / 2.4.10ac11)

> if ((pdev->device=0x2449) || ( (pdev->device > 0x1030) &&
^^^^^^^

Well, that's a bug (just fixed).

> My device's id is: 8086:1229 - Intel, 82557 [Ethernet Pro 100]
> The present ids are: 8086:1030 - 82559 InBusiness 10/100
> 8086:1031-1039 - are not listed in my db
> 8086:2449 - 82820 820 (Camino 2) Chipset
> Ethernet
>
> For one thing, in Linus' 2.4.12 the if condition at line 802 isn't
> present at all, so that sure isn't gonna work.

Try enabling the test regardless and seeing if it helps on your box
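
The "=" flagged above is an assignment rather than a comparison, so the quoted condition is true for every device. A minimal sketch of the difference follows; the struct and function names are hypothetical stand-ins, not code from eepro100.c:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the single pci_dev field the check reads. */
struct fake_pci_dev {
    uint16_t device;
};

/* Condition as quoted from 2.4.10-ac11. The "=" stores 0x2449 into the
 * field, and the nonzero result makes the whole expression true, so the
 * range test on the right is never reached. */
static int selects_workaround_as_quoted(struct fake_pci_dev *pdev)
{
    assert(pdev != NULL);
    return (pdev->device = 0x2449) ||
           ((pdev->device > 0x1030) && (pdev->device < 0x1039));
}

/* Same condition with "==": true only for 0x2449 and for ids strictly
 * between 0x1030 and 0x1039, that is 0x1031 through 0x1038. */
static int selects_workaround_compared(const struct fake_pci_dev *pdev)
{
    assert(pdev != NULL);
    return (pdev->device == 0x2449) ||
           ((pdev->device > 0x1030) && (pdev->device < 0x1039));
}
```

For a device id of 0x1229, the first function reports a match and the second does not. Note that the first one also overwrites the stored device id as a side effect.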

2001-10-11 20:08:40

by Ion Badulescu

Subject: Re: eepro100.c bug on 10Mbit half duplex (kernels 2.4.5 / 2.4.10 / 2.4.11pre6 / 2.4.11 / 2.4.10ac11)

On Thu, 11 Oct 2001 11:42:08 -0600, Matthew S. Hallacy <[email protected]> wrote:

> I currently have the equivalent of 8 of these in my system (Compaq NC3131,
> quad ethernet..)
>
> Bus 2, device 4, function 0:
> Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 5).
> ^^^^

Umm, no, that's actually an 82558 rev B. pci.ids should be updated to
have "Intel Corporation 8255[7-9]" for this id, because Intel can't make
up their minds to change the PCI id when they release a new product.

rev 1-3 are 82557, rev 4-5 are 82558, rev 6-8 are 82559.

> it is the same chip, this particular interface is 10mbit/half duplex, and
> all the interfaces transfer 1G+/day (some small files, some larger than 500 megs)
> with no problems, I should note this:
>
> eth0: OEM i82557/i82558 10/100 Ethernet, DE:AD:BA:BE:CA:FE, IRQ 10.
> Receiver lock-up bug exists -- enabling work-around.
> ^^^^^^^^^^^^^^^^^^^^

The OEM probably forgot to initialize the eeprom correctly, because
82558 rev B and higher don't have this bug. Anyway, the workaround is
pretty harmless.

Ion

--
It is better to keep your mouth shut and be thought a fool,
than to open it and remove all doubt.

2001-10-11 20:17:00

by Dan Hollis

Subject: Re: eepro100.c bug on 10Mbit half duplex (kernels 2.4.5 / 2.4.10 / 2.4.11pre6 / 2.4.11 / 2.4.10ac11)

On Thu, 11 Oct 2001, Ion Badulescu wrote:
> Umm, no, that's actually an 82558 rev B. pci.ids should be updated to
> have "Intel Corporation 8255[7-9]" for this id, because Intel can't make
> up their minds to change the PCI id when they release a new product.
> rev 1-3 are 82557, rev 4-5 are 82558, rev 6-8 are 82559.

lspci should be changed to take into account rev numbers...
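
A revision-aware lookup of the kind suggested here could be sketched roughly as follows. This is not lspci code: the struct and function names are hypothetical, the rev 1-8 ranges are taken from Ion's message above, and the rev 9 entry comes from a later message in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record holding the PCI config fields of interest. */
struct pci_ident {
    uint16_t vendor;
    uint16_t device;
    uint8_t  revision;
};

/* Map an 8086:1229 device to a chip name using its revision number. */
static const char *i8255x_name(const struct pci_ident *id)
{
    assert(id != NULL);

    if (id->vendor != 0x8086 || id->device != 0x1229)
        return "not an 8086:1229 device";

    if (id->revision >= 1 && id->revision <= 3)
        return "82557";
    if (id->revision >= 4 && id->revision <= 5)
        return "82558";
    if (id->revision >= 6 && id->revision <= 8)
        return "82559";
    if (id->revision == 9)
        return "82559ER";   /* per a later message in this thread */

    return "unknown 8255x revision";
}
```

With this mapping, the "rev 5" card quoted above resolves to 82558, while revisions the thread does not cover fall through to the "unknown" string instead of being guessed.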

-Dan
--
[-] Omae no subete no kichi wa ore no mono da. [-]

2001-10-11 21:02:35

by Robbert Kouprie

Subject: RE: eepro100.c bug on 10Mbit half duplex (kernels 2.4.5 / 2.4.10 / 2.4.11pre6 / 2.4.11 / 2.4.10ac11)

Alan,

Your fix seems to have eliminated the problem. I found this strange, as
the device ids still did not match mine. So I added a PRINTK line in the
test, and found that _with_ your fix it DOES NOT get triggered. The ac
kernel WITH the bug however DOES trigger the test.

So, as I have tested both your and Linus' driver (in which the whole
"if" was missing), one has to conclude that both the bug in the ac driver
AND the whole missing line in Linus' kernel made the test succeed, where
it actually SHOULD NOT succeed. So actually my NIC is perfectly ok, but
not in combination with a workaround for a bug it doesn't have ;) This
was what broke things.

So, the 10Mbit half-duplex workaround breaks stuff on the devices that
do not suffer from the bug. This is dangerous... ;)

Anyway, I've upgraded to 100 Mbit now, and the bug is fixed, so I'm happy
:)
Thanks for your help.

Regards,
- Robbert


2001-10-12 08:40:49

by Robbert Kouprie

Subject: Re: eepro100.c bug on 10Mbit half duplex (kernels 2.4.5 / 2.4.10 / 2.4.11pre6 / 2.4.11 / 2.4.10ac11)



> > Bus 2, device 4, function 0:
> > Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 5).
> > ^^^^
>
> Umm, no, that's actually an 82558 rev B. pci.ids should be updated to
> have "Intel Corporation 8255[7-9]" for this id, because Intel can't make
> up their minds to change the PCI id when they release a new product.
> rev 1-3 are 82557, rev 4-5 are 82558, rev 6-8 are 82559.

Mine says rev 9 :)

radium:/# lspci -v -d 8086:1229
00:0d.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100]
(rev 09)
Subsystem: Intel Corporation: Unknown device 0011
Flags: bus master, medium devsel, latency 32, IRQ 17
Memory at da020000 (32-bit, non-prefetchable) [size=4K]
I/O ports at c800 [size=64]
Memory at da000000 (32-bit, non-prefetchable) [size=128K]
Expansion ROM at <unassigned> [disabled] [size=1M]
Capabilities: [dc] Power Management version 2

radium:/# lspci -nv -d 8086:1229
00:0d.0 Class 0200: 8086:1229 (rev 09)
Subsystem: 8086:0011
Flags: bus master, medium devsel, latency 32, IRQ 17
Memory at da020000 (32-bit, non-prefetchable) [size=4K]
I/O ports at c800 [size=64]
Memory at da000000 (32-bit, non-prefetchable) [size=128K]
Expansion ROM at <unassigned> [disabled] [size=1M]
Capabilities: [dc] Power Management version 2


> > eth0: OEM i82557/i82558 10/100 Ethernet, DE:AD:BA:BE:CA:FE, IRQ 10.
> > Receiver lock-up bug exists -- enabling work-around.
> > ^^^^^^^^^^^^^^^^^^^^
> The OEM probably forgot to initialize the eeprom correctly, because
> 82558 rev B and higher don't have this bug. Anyway, the workaround is
> pretty harmless.

My card DOES NOT have the receiver lock-up bug and also DOES NOT have the
10 Mbit half duplex bug, which was the one I was referring to. The device
detection for the workaround for the latter bug turned out to be
wrong.

- Robbert

2001-10-12 13:40:35

by Ion Badulescu

Subject: Re: eepro100.c bug on 10Mbit half duplex (kernels 2.4.5 / 2.4.10 / 2.4.11pre6 / 2.4.11 / 2.4.10ac11)

On Fri, 12 Oct 2001, Robbert Kouprie wrote:

> Mine says rev 9 :)
>
> radium:/# lspci -v -d 8086:1229
> 00:0d.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100]
> (rev 09)
> Subsystem: Intel Corporation: Unknown device 0011
> Flags: bus master, medium devsel, latency 32, IRQ 17
> Memory at da020000 (32-bit, non-prefetchable) [size=4K]
> I/O ports at c800 [size=64]
> Memory at da000000 (32-bit, non-prefetchable) [size=128K]
> Expansion ROM at <unassigned> [disabled] [size=1M]
> Capabilities: [dc] Power Management version 2

That's an 82559ER step A.

> > eth0: OEM i82557/i82558 10/100 Ethernet, DE:AD:BA:BE:CA:FE, IRQ 10.
> > Receiver lock-up bug exists -- enabling work-around.
> > ^^^^^^^^^^^^^^^^^^^^
>
> My card DOES NOT have the receiver lock-up bug

Your card's eeprom claims otherwise. The eeprom is most likely wrong, but
again, the workaround for *this* bug is pretty harmless, whether the bug
exists or not.

Ion

--
It is better to keep your mouth shut and be thought a fool,
than to open it and remove all doubt.




2001-10-13 18:16:06

by Robbert Kouprie

Subject: RE: eepro100.c bug on 10Mbit half duplex (kernels 2.4.5 / 2.4.10 / 2.4.11pre6 / 2.4.11 / 2.4.10ac11)

On Fri, 12 Oct 2001, Ion Badulescu wrote:

> > > Receiver lock-up bug exists -- enabling work-around.
> > > ^^^^^^^^^^^^^^^^^^^^
> >
> > My card DOES NOT have the receiver lock-up bug
>
> Your card's eeprom claims otherwise. The eeprom is most
> likely wrong, but
> again, the workaround for *this* bug is pretty harmless,
> whether the bug
> exists or not.
>
> Ion
>

Sorry, this was kind of an unlucky paste, because this line:

> > > Receiver lock-up bug exists -- enabling work-around.
> > > ^^^^^^^^^^^^^^^^^^^^

is from the previous sender's dmesg (Matthew S. Hallacy). *My card* does
not give this message, but it surely has a bug (which is not the
receiver lock-up bug).

Earlier, I was somewhat too quick with my conclusions. Since I upgraded
the link to 100 Mbit, also half duplex, the problem seemed gone. This
was NOT the case. The problem now only takes about 10 times as much
traffic to trigger.

* With vanilla kernel-2.4.13-pre2 the problem exists.
* With vanilla kernel-2.4.12-ac1 the problem exists.

So I added my device id to the 10 Mbit half-duplex workaround check, and
the problem went away. For now. ;) I am gonna test this for some days and if
it stays put I will post the patch. Anyway, I should've stuck with 3com
:)
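
The patch itself was not posted in this message, so the following is only a guess at its shape: the device check quoted earlier in the thread with 0x1229 added, plus the run-time test from line 1358. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the driver's private state: only the two
 * fields that the quoted workaround code tests. */
struct speedo_sketch {
    uint16_t partner;   /* the value the quoted code compares against 0 */
    int      chip_id;   /* set to 1 when the device id check matches */
};

/* Device id check with "==" and with 0x1229 added.
 * Assumption: this is how the id was added; the real change may differ. */
static int id_selects_workaround(uint16_t device)
{
    return device == 0x2449 ||
           (device > 0x1030 && device < 0x1039) ||
           device == 0x1229;
}

/* Run-time condition as quoted from line 1358 of the ac driver. */
static int workaround_fires(const struct speedo_sketch *sp)
{
    assert(sp != NULL);
    return sp->partner == 0 && sp->chip_id == 1;
}
```

Since 8086:1229 is shared by the 82557, 82558 and 82559 (see the revision discussion above), a check like this would enable the workaround on all of them, not just on the card tested here.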

Regards,
- Robbert