2001-02-08 05:53:01

by Augustin Vidovic

Subject: [PATCH] eepro100.c, kernel 2.4.1

Patch for drivers/net/eepro100.c in kernel 2.4.1 (and before).
For some of the bugged Intel EtherExpress Pro 100 network cards,
although the driver diagnoses the receiver lock-up bug, the workaround
is not enabled. It appears that the test for the diagnostic and the test
for the workaround activation are different. I assumed the diagnostic
test is OK and I changed the work-around activation test. I had several
Intel ISP1100 boxes with the bug diagnosed, but the workaround not enabled,
and after the patch, the workaround is activated and the boxes seem to
be alright even under very high network traffic (they were failing before,
due to the card bug, I think).

Attached is the tarball of the patch, which I believe conforms to the list
FAQ guidelines. Since the patch is only one line, I also include it
in the body of this message as plain text.

--- linux-2.4.1/drivers/net/eepro100.c Sun Jan 28 03:40:14 2001
+++ linux-2.4.1-vido1/drivers/net/eepro100.c Thu Feb 8 14:08:49 2001
@@ -815,7 +815,7 @@

sp->phy[0] = eeprom[6];
sp->phy[1] = eeprom[7];
- sp->rx_bug = (eeprom[3] & 0x03) == 3 ? 0 : 1;
+ sp->rx_bug = eeprom[3] & 0x03;

if (sp->rx_bug)
printk(KERN_INFO " Receiver lock-up workaround activated.\n");

I don't understand why the tests for the diagnostic and for the
workaround activation were different. Maybe a simple bug, but maybe there
was an obscure reason. If someone knows...

--
Augustin Vidovic http://www.vidovic.org/augustin/
"Nous sommes tous quelque chose de naissance, musicien ou assassin,
mais il faut apprendre le maniement de la harpe ou du couteau."


Attachments:
(No filename) (1.62 kB)
patch-eepro100-vido1.tar (10.00 kB)

2001-02-08 07:24:49

by Ion Badulescu

Subject: Re: [PATCH] eepro100.c, kernel 2.4.1

On Thu, 8 Feb 2001 14:53:55 +0900, Augustin Vidovic <[email protected]> wrote:

> --- linux-2.4.1/drivers/net/eepro100.c Sun Jan 28 03:40:14 2001
> +++ linux-2.4.1-vido1/drivers/net/eepro100.c Thu Feb 8 14:08:49 2001
> @@ -815,7 +815,7 @@
>
> sp->phy[0] = eeprom[6];
> sp->phy[1] = eeprom[7];
> - sp->rx_bug = (eeprom[3] & 0x03) == 3 ? 0 : 1;
> + sp->rx_bug = eeprom[3] & 0x03;
>
> if (sp->rx_bug)
> printk(KERN_INFO " Receiver lock-up workaround activated.\n");

This patch is wrong, please DON'T apply it.

It's the printk that gets it wrong, although that's harmless.
Intel's documentation states that the bug does NOT exist if the
bits 0 and 1 in eeprom[3] are 1. Thus, the workaround is correct,
the printk is wrong.

The correct patch for 2.4.1 is attached. 2.2.18 needs something
similar, the same patch can be applied with some fuzz.

Thanks,
Ion

--
It is better to keep your mouth shut and be thought a fool,
than to open it and remove all doubt.
--------------------------------
--- /usr/src/local/linux-2.4.vanilla/drivers/net/eepro100.c Wed Feb 7 15:45:16 2001
+++ linux-2.4/drivers/net/eepro100.c Wed Feb 7 23:07:29 2001
@@ -725,7 +725,7 @@
/* The self-test results must be paragraph aligned. */
volatile s32 *self_test_results;
int boguscnt = 16000; /* Timeout for set-test. */
- if (eeprom[3] & 0x03)
+ if ((eeprom[3] & 0x03) != 0x03)
printk(KERN_INFO " Receiver lock-up bug exists -- enabling"
" work-around.\n");
printk(KERN_INFO " Board assembly %4.4x%2.2x-%3.3d, Physical"

2001-02-08 07:36:02

by Augustin Vidovic

Subject: Re: [PATCH] eepro100.c, kernel 2.4.1

On Wed, Feb 07, 2001 at 11:23:01PM -0800, Ion Badulescu wrote:
> Intel's documentation states that the bug does NOT exist if the
> bits 0 and 1 in eeprom[3] are 1. Thus, the workaround is correct,
> the printk is wrong.

I wonder whether it is Intel's documentation that is wrong: it seems
that the bug also showed up with the network cards used in my boxes,
and the patch I proposed seemed to fix that problem.

--
Augustin Vidovic http://www.vidovic.org/augustin/
"Nous sommes tous quelque chose de naissance, musicien ou assassin,
mais il faut apprendre le maniement de la harpe ou du couteau."

2001-02-08 07:45:15

by Alan Cox

Subject: Re: [PATCH] eepro100.c, kernel 2.4.1

> It's the printk that gets it wrong, although that's harmless.
> Intel's documentation states that the bug does NOT exist if the
> bits 0 and 1 in eeprom[3] are 1. Thus, the workaround is correct,
> the printk is wrong.

So why does it fix the problem for him? His report and your reply don't
make sense viewed together.

2001-02-08 07:55:51

by Andrey Savochkin

Subject: Re: eepro100.c, kernel 2.4.1

On Thu, Feb 08, 2001 at 02:42:52AM -0500, Alan Cox wrote:
> > It's the printk that gets it wrong, although that's harmless.
> > Intel's documentation states that the bug does NOT exist if the
> > bits 0 and 1 in eeprom[3] are 1. Thus, the workaround is correct,
> > the printk is wrong.
>
> So why does it fix the problem for him. His report and your reply don't
> make sense viewed together

First of all, I have information that the bug may be in 82557 only.

Augustin, could you provide full information about your cards (including the
text printed by the driver at the initialization) and elaborate on "failing
under high load"?

Best regards
Andrey

2001-02-08 07:59:42

by Ion Badulescu

Subject: Re: [PATCH] eepro100.c, kernel 2.4.1

On Thu, 8 Feb 2001, Alan Cox wrote:

> > It's the printk that gets it wrong, although that's harmless.
> > Intel's documentation states that the bug does NOT exist if the
> > bits 0 and 1 in eeprom[3] are 1. Thus, the workaround is correct,
> > the printk is wrong.
>
> So why does it fix the problem for him. His report and your reply don't
> make sense viewed together

I don't think it fixes *this* bug. However, the bug workaround effectively
reinitializes the chip, so it might serve as a generic 'reset and try
again' kind of workaround. In that case, we might as well enable it
unconditionally... but I don't see it as a good solution. It's a stop-gap
measure at best.

We need to find out what exactly happens. Until he tells us more about how
his boxes "were failing before", there really isn't much we can diagnose.

I happen to also have an Intel ISP1100 box here, and I know what's inside
-- i82559 C-step chips which definitely don't have this bug. The bug is an
i82557-only bug; what makes things confusing is Intel's idea of giving
multiple chips the same PCI id. They can be identified via the PCI rev:

i82557 step A-C: rev 1-3
i82558 step A-B: rev 4-5
i82559 step A-C: rev 6-8
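
As a rough sketch, the revision table above could be expressed in C as follows. The function names are hypothetical and are not taken from eepro100.c; revisions outside the table are reported as unknown rather than guessed.

```c
#include <stddef.h>

/* Map the PCI revision ID of an 8086:1229 device to a chip name,
 * using only the ranges listed in the table above.
 * Returns NULL for revisions the table does not cover. */
const char *speedo_chip_name(unsigned int pci_rev)
{
    if (pci_rev >= 1 && pci_rev <= 3)
        return "i82557";
    if (pci_rev >= 4 && pci_rev <= 5)
        return "i82558";
    if (pci_rev >= 6 && pci_rev <= 8)
        return "i82559";
    return NULL;
}

/* Per the statement above, the receiver lock-up bug is i82557-only,
 * so only revisions 1-3 can be affected. */
int chip_may_have_rx_bug(unsigned int pci_rev)
{
    return pci_rev >= 1 && pci_rev <= 3;
}
```

With these ranges, a device reporting rev 08 maps to the i82559 and is not a candidate for the receiver lock-up bug.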

Ion

--
It is better to keep your mouth shut and be thought a fool,
than to open it and remove all doubt.


2001-02-08 10:40:38

by Augustin Vidovic

Subject: Re: [PATCH] eepro100.c, kernel 2.4.1

On Wed, Feb 07, 2001 at 11:59:05PM -0800, Ion Badulescu wrote:
> I don't think it fixes *this* bug. However, the bug workaround effectively
> reinitializes the chip, so it might serve as a generic 'reset and try
> again' kind of workaround. In that case, we might as well enable it
> unconditionally... but I don't see it as a good solution. It's a stop-gap
> measure at best.
>
> We need to find out what exactly happens. Until he tells us more about how
> his boxes "were failing before", there really isn't much we can diagnose.


OK, then let's go into a bit more detail.

First, the part of the dmesg concerning the network interfaces:

eepro100.c:v1.09j-t 9/29/99 Donald Becker http://cesdis.gsfc.nasa.gov/linux/drivers/eepro100.html
eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin <[email protected]> and others
PCI: Found IRQ 5 for device 00:0c.0
PCI: The same IRQ used for device 00:0d.0
eth0: PCI device 8086:1229, 00:D0:B7:00:BE:00, IRQ 5.
Receiver lock-up bug exists -- enabling work-around.
Board assembly 000000-000, Physical connectors present: RJ45
Primary interface chip i82555 PHY #1.
General self-test: passed.
Serial sub-system self-test: passed.
Internal registers self-test: passed.
ROM checksum self-test: passed (0x04f4518b).
Receiver lock-up workaround activated.
PCI: Found IRQ 5 for device 00:0d.0
PCI: The same IRQ used for device 00:0c.0
eth1: PCI device 8086:1229, 00:D0:B7:00:BE:01, IRQ 5.
Receiver lock-up bug exists -- enabling work-around.
Board assembly 000000-000, Physical connectors present: RJ45
Primary interface chip i82555 PHY #1.
General self-test: passed.
Serial sub-system self-test: passed.
Internal registers self-test: passed.
ROM checksum self-test: passed (0x04f4518b).
Receiver lock-up workaround activated.


Please note: the "Receiver lock-up workaround activated." message has only been
printed since I applied my patch. Before, only the "enabling work-around." part
appeared, which is a bit misleading.

Second, attached to this mail is an mrtg graph (PNG). Beware that the timeline runs
from right to left. It covers the past week. Every day the big peak is the
midnight "masturbation rush" when nearly everyone connects at the same time to
browse pr0n sites. You'll notice that the midnight peak was suddenly cut off
last Friday. This happened 3 times the previous week. Kind of frustrating.

You can see a sudden blackout which lasts about 3 hours, after which the
situation returns to normal.

At the same time, the /var/log/messages receives thousands of messages from the
NET: subsystem.

A rather long search through the various networking mailing lists and newsgroups
shows that systems using a bugged Intel EtherExpress Pro 100 network card behave
the same way.

Since the kernel's dmesg mentions a work-around for such a bug, I was assuming
that the work-around was activated, but I had a doubt and after looking at the source,
I discovered that it wasn't.

On Saturday I patched the kernels, and since then the midnight peaks are no longer
broken and there are no more desperate messages from the NET subsystem in the logs,
so maybe the problem has been fixed.

Now, as Ion says, maybe it is not the "receiver lock-up bug" itself which is
worked-around, frankly I don't know.


--
Augustin Vidovic http://www.vidovic.org/augustin/
"Nous sommes tous quelque chose de naissance, musicien ou assassin,
mais il faut apprendre le maniement de la harpe ou du couteau."


Attachments:
(No filename) (3.46 kB)
mrtg.png (4.15 kB)

2001-02-08 11:00:45

by Ion Badulescu

Subject: Re: [PATCH] eepro100.c, kernel 2.4.1

On Thu, 8 Feb 2001 19:41:56 +0900, Augustin Vidovic <[email protected]> wrote:

> You can see a kind of sudden blackout which lasts about 3 hours, and then the
> situation resumes to normality.
>
> At the same time, the /var/log/messages receives thousands of messages from the
> NET: subsystem.

So what _were_ those messages? Can you post them?

> Since the dmesg of the kernel tells about a work-around for such a bug, I was assuming
> that the work around was activated, but I had a doubt and after looking at the source,
> I discovered that it wasn't.

Well, your patch disables the work-around exactly for those (really old) cards
that actually need it and enables it for those that don't need it.

> Now, as Ion says, maybe it is not the "receiver lock-up bug" itself which is
> worked-around, frankly I don't know.

There is a very simple way to tell. Check your logs for messages like:

eth0: Sending a multicast list set command from a timer routine........."

If you find such messages, the work-around really did something. Otherwise,
it's the placebo effect...


Ion

--
It is better to keep your mouth shut and be thought a fool,
than to open it and remove all doubt.

2001-02-08 11:14:20

by Augustin Vidovic

Subject: Re: [PATCH] eepro100.c, kernel 2.4.1

On Thu, Feb 08, 2001 at 03:00:10AM -0800, Ion Badulescu wrote:
> > At the same time, the /var/log/messages receives thousands of messages from the
> > NET: subsystem.
>
> So what _were_ those messages? Can you post them?

No I can't because they were suppressed by the syslogd (DOS protection), only
their number being reported (several thousands every few seconds).

> > Since the dmesg of the kernel tells about a work-around for such a bug, I was assuming
> > that the work around was activated, but I had a doubt and after looking at the source,
> > I discovered that it wasn't.
>
> Well, your patch disables the work-around exactly for those (really old) cards
> that actually need it and enables it for those that don't need it.

No, because the test used for the activation is now the same as the one used
for the diagnostic, which means that every card which is diagnosed to have the
bug gets the workaround activated.

> > Now, as Ion says, maybe it is not the "receiver lock-up bug" itself which is
> > worked-around, frankly I don't know.
>
> eth0: Sending a multicast list set command from a timer routine........."
>
> If you find such messages, the work-around really did something. Otherwise,
> it's the placebo effect...

Now, I do not get _any_ message in the logs, which means that the network
cards' activity is closer to normal than before the patch.

--
Augustin Vidovic http://www.vidovic.org/augustin/
"Nous sommes tous quelque chose de naissance, musicien ou assassin,
mais il faut apprendre le maniement de la harpe ou du couteau."

2001-02-08 11:27:21

by Ion Badulescu

Subject: Re: [PATCH] eepro100.c, kernel 2.4.1

On Thu, 8 Feb 2001 20:15:39 +0900, Augustin Vidovic <[email protected]> wrote:

>> So what _were_ those messages? Can you post them?
>
> No I can't because they were suppressed by the syslogd (DOS protection), only
> their number being reported (several thousands every few seconds).

syslogd does not suppress messages, it suppresses *identical* messages.
So what was the *first* message logged by syslogd, the one followed by
"last message repeated XXX times"?

>> Well, your patch disables the work-around exactly for those (really old) cards
>> that actually need it and enables it for those that don't need it.
>
> No, because the test used for the activation is now the same as the one used
> for the diagnostic, which means that every card which is diagnosed to have the
> bug gets the workaround activated.

Umm, no. With your patch, both the diagnostic and the activation are wrong,
whereas before only the diagnostic was wrong.
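
To make the comparison concrete, here is a minimal sketch that evaluates the four expressions discussed in this thread for every value of the two low bits of eeprom[3]. The function names are hypothetical; the expressions themselves are taken from the diffs posted above, with the patched activation written as `!= 0` because the driver only tests `sp->rx_bug` for truth.

```c
#include <stdio.h>

/* Activation test in stock 2.4.1: work-around on unless both bits are set. */
int activation_original(unsigned int eeprom3)
{
    return (eeprom3 & 0x03) == 3 ? 0 : 1;
}

/* Activation test after the proposed one-line patch. */
int activation_patched(unsigned int eeprom3)
{
    return (eeprom3 & 0x03) != 0;
}

/* Diagnostic printk test in stock 2.4.1. */
int diagnostic_original(unsigned int eeprom3)
{
    return (eeprom3 & 0x03) != 0;
}

/* Diagnostic test from the corrected printk patch. */
int diagnostic_fixed(unsigned int eeprom3)
{
    return (eeprom3 & 0x03) != 0x03;
}

/* Print one row per value of the two low bits. */
void print_table(void)
{
    unsigned int v;

    printf("bits  act(orig)  act(patched)  diag(orig)  diag(fixed)\n");
    for (v = 0; v < 4; v++)
        printf(" %u%u       %d           %d            %d           %d\n",
               (v >> 1) & 1, v & 1,
               activation_original(v), activation_patched(v),
               diagnostic_original(v), diagnostic_fixed(v));
}
```

The original and patched activation tests agree for 01 and 10 and disagree for 00 and 11, while the corrected diagnostic matches the original activation test for all four values.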

>> eth0: Sending a multicast list set command from a timer routine........."
>>
>> If you find such messages, the work-around really did something. Otherwise,
>> it's the placebo effect...
>
> Now, I do not get _any_ message in the logs, which means that the network
> cards activity is closer to normality than before the patch.

So your patch did not do you any good. Case closed, as far as the work-around
is concerned.

If you post the original log messages, we might be able to find the real
bug...

[and please don't drop the Cc:]

Ion

--
It is better to keep your mouth shut and be thought a fool,
than to open it and remove all doubt.

2001-02-08 11:44:31

by Augustin Vidovic

Subject: Re: [PATCH] eepro100.c, kernel 2.4.1

On Thu, Feb 08, 2001 at 03:26:51AM -0800, Ion Badulescu wrote:
> syslogd does not suppress messages, it suppresses *identical* messages.
> So what was the *first* message logged by syslogd, the one followed by
> "last message repeated XXX times"?

It's not "last message repeated XXX times", it's:
...
Jan 30 00:01:18 XXX kernel: NET: 8298 messages suppressed.
Jan 30 00:01:24 XXX kernel: NET: 2929 messages suppressed.
Jan 30 00:01:38 XXX kernel: NET: 1225 messages suppressed.
Jan 30 00:01:43 XXX kernel: NET: 4397 messages suppressed.
Jan 30 00:01:48 XXX kernel: NET: 2342 messages suppressed.
...
(ad nauseam)

This suppression of thousands of lines was described as a DOS-protection
in the docs I read.
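
The "NET: N messages suppressed." lines appear to come from the kernel's own rate limiting of network warnings (net_ratelimit() in the 2.4 sources) rather than from syslogd, which would be consistent with only a count surviving. The sketch below is a simplified userspace illustration of that kind of limiter, not the kernel's implementation; the struct and function names are hypothetical.

```c
#include <stdio.h>

/* Simplified rate limiter: a message costs `cost` ticks of budget,
 * the budget refills with elapsed time and is capped at `burst`. */
struct ratelimit {
    unsigned long last;    /* tick of the previous call */
    unsigned long tokens;  /* accumulated budget, in ticks */
    unsigned long cost;    /* budget one message consumes */
    unsigned long burst;   /* maximum budget */
    unsigned long missed;  /* messages dropped since the last allowed one */
};

/* Returns 1 if the caller may log at time `now`, 0 if it must stay quiet.
 * Dropped messages are only counted; their text is never stored. */
int ratelimit_allow(struct ratelimit *rl, unsigned long now)
{
    rl->tokens += now - rl->last;
    rl->last = now;
    if (rl->tokens > rl->burst)
        rl->tokens = rl->burst;
    if (rl->tokens >= rl->cost) {
        rl->tokens -= rl->cost;
        if (rl->missed)
            printf("NET: %lu messages suppressed.\n", rl->missed);
        rl->missed = 0;
        return 1;
    }
    rl->missed++;
    return 0;
}
```

With a limiter of this shape, the text of a dropped message is lost for good, so at most the messages that got through before the flood started could still be found in the log.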

> Umm, no. With your patch, both the diagnostic and the activation are wrong,
> whereas before only the diagnostic was wrong.

With my patch, the test becomes (eeprom[3] & 0x03), which is nonzero
for every possible nonzero value of the two lower bits:

bit1  bit0  [bit1,bit0]&[1,1]
 0     0    00
 0     1    01
 1     0    10
 1     1    11

Whereas the other test is more restrictive, because it excludes the "11"
from the results.
The old cards still get the workaround enabled with this wider test.

> > Now, I do not get _any_ message in the logs, which means that the network
> > cards activity is closer to normality than before the patch.
>
> So your patch did not do you any good. Case closed, as far as the work-around
> is concerned.

On the contrary, it seems to do a lot of good, because the NET subsystem
does not send any more panic messages to the kernel, and the cluster has
not melted down again so far.

> If you post the original log messages, we might be able to find the real
> bug...

Sorry, I can't, as they were suppressed (as you can see in the example
I copy-pasted earlier in this mail), and now I don't get any others.

> [and please don't drop the Cc:]

Ok, if you insist.

--
Augustin Vidovic http://www.vidovic.org/augustin/
"Nous sommes tous quelque chose de naissance, musicien ou assassin,
mais il faut apprendre le maniement de la harpe ou du couteau."

2001-02-08 11:53:32

by Ion Badulescu

Subject: Re: [PATCH] eepro100.c, kernel 2.4.1

On Thu, 8 Feb 2001, Augustin Vidovic wrote:

> This suppression of thousands of lines was described as a DOS-protection
> in the docs I read.

Still, there should be something before these suppressed messages started.

> With my patch, the test becomes (eeprom[3] & 0x03), which is not null
> for every possible non-null value of the two lower bits :
>
> bit1 bit0 [bit1,bit0]&[1,1]
> 0 0 00
> 0 1 01
> 1 0 10
> 1 1 11
>
> Whereas the other test is more restrictive, because it excludes the "11"
> from the results.
> The old cards still get the workaround enabled with this wider test.

No, they don't.

It goes like this:

bit0 = 1 means the workaround may be omitted when operating at 10 Mbit
bit1 = 1 means the workaround may be omitted when operating at 100 Mbit

So the workaround needs to be activated when at least one bit is zero, and
may be omitted when both bits are 1. That's exactly what the original code
does.
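
A minimal sketch of that reading of the two bits, assuming the bit meanings given above (the macro and function names are hypothetical, not from the driver):

```c
/* Bit meanings as described above: a set bit means the work-around
 * may be omitted at that link speed. */
#define RX_FIX_OPTIONAL_10MBIT   0x01  /* bit 0 */
#define RX_FIX_OPTIONAL_100MBIT  0x02  /* bit 1 */

/* Work-around required when running at 10 Mbit? */
int workaround_needed_at_10(unsigned int eeprom3)
{
    return !(eeprom3 & RX_FIX_OPTIONAL_10MBIT);
}

/* Work-around required when running at 100 Mbit? */
int workaround_needed_at_100(unsigned int eeprom3)
{
    return !(eeprom3 & RX_FIX_OPTIONAL_100MBIT);
}

/* Speed-independent decision: needed if at least one bit is zero.
 * Equivalent to the original (eeprom[3] & 0x03) == 3 ? 0 : 1. */
int workaround_needed(unsigned int eeprom3)
{
    return workaround_needed_at_10(eeprom3) ||
           workaround_needed_at_100(eeprom3);
}
```

For the value 00, workaround_needed() returns 1, whereas a plain `eeprom[3] & 0x03` test yields 0.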

> > So your patch did not do you any good. Case closed, as far as the work-around
> > is concerned.
>
> To the contrary, it seems to do a lot of good, because the NET subsystem
> does not send any more panic messages to the kernel, and the cluster has
> not meltdown again so far.

"Yesterday, a brick fell upon my head while I was walking on the street.
Today, I put my hat on before leaving home, and no brick fell on my head
anymore. So the hat must have helped!"

Please read the code if you don't believe me.


Ion

--
It is better to keep your mouth shut and be thought a fool,
than to open it and remove all doubt.

2001-02-08 12:08:26

by Augustin Vidovic

Subject: Re: [PATCH] eepro100.c, kernel 2.4.1

On Thu, Feb 08, 2001 at 03:53:10AM -0800, Ion Badulescu wrote:
> Still, there should be something before these suppressed messages started.

No, sorry, but absolutely nothing since the boot.

> It goes like this:
>
> bit0 = 1 means the workaround may be omitted when operating at 10 Mbit
> bit1 = 1 means the workaround may be omitted when operating at 100 Mbit
>
> So the workaround needs to be activated when at least one bit is zero, and
> may be omitted when both bits are 1. That's exactly what the original code
> does.

Ah ok.

> "Yesterday, a brick fell upon my head while I was walking on the street.
> Today, I put my hat on before leaving home, and no brick fell on my head
> anymore. So the hat must have helped!"

You're absolutely right. I still don't know if activating the workaround
helped, it just seemed to help.

> Please read the code if you don't believe me.

I read it, but I don't have the Intel docs, so I lack the information you
have.

Thank you for spending time on this problem.

--
Augustin Vidovic http://www.vidovic.org/augustin/
"Nous sommes tous quelque chose de naissance, musicien ou assassin,
mais il faut apprendre le maniement de la harpe ou du couteau."

2001-02-09 14:38:45

by Peter Lund

Subject: Re: [PATCH] eepro100.c, kernel 2.4.1

Alan Cox, Thu Feb 08 2001 - 02:42:52 EST:

> > It's the printk that gets it wrong, although that's harmless.
> > Intel's documentation states that the bug does NOT exist if the
> > bits 0 and 1 in eeprom[3] are 1. Thus, the workaround is correct,
> > the printk is wrong.
>
> So why does it fix the problem for him. His report and your reply don't
> make sense viewed together

Wish I'd seen this patch about a month and a half ago. I had borrowed two
machines from IBM Denmark for evaluation and their motherboard-mounted eepro100
cards (I forget which exact chip version it was) didn't quite work with the driver
in the standard RH 6.2.

On boot up it said something about the Receiver lock up bug (only one of the two
messages, I think) and then it locked up anyway half an hour and a couple of
hundred ethernet packets later. I didn't have time to look really closely at
the source code at the time :/

Just another data point indicating that the current receiver lock up enabling
code isn't good enough on newish chips.

-Peter

2001-02-12 05:38:58

by Andrey Savochkin

Subject: Re: eepro100.c, kernel 2.4.1

Ion,

On Thu, Feb 08, 2001 at 03:26:51AM -0800, Ion Badulescu wrote:
> On Thu, 8 Feb 2001 20:15:39 +0900, Augustin Vidovic <[email protected]> wrote:
>
> >> eth0: Sending a multicast list set command from a timer routine........."
> >>
> >> If you find such messages, the work-around really did something. Otherwise,
> >> it's the placebo effect...
> >
> > Now, I do not get _any_ message in the logs, which means that the network
> > cards activity is closer to normality than before the patch.
>
> So your patch did not do you any good. Case closed, as far as the work-around
> is concerned.

I've just checked: "Sending a multicast list set command" is printed only on
high debug levels, so Augustin might not see them.
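
As an illustration of how such level-gated messages behave, here is a generic userspace sketch. The variable and function names are hypothetical, and the threshold used in the driver is not reproduced here.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical verbosity setting; a real driver exposes its own. */
int debug_level = 1;

/* Print the message only when debug_level exceeds min_level.
 * Returns 1 if the message was printed, 0 if it was filtered out. */
int debug_printf(int min_level, const char *fmt, ...)
{
    va_list ap;

    if (debug_level <= min_level)
        return 0;
    va_start(ap, fmt);
    vprintf(fmt, ap);
    va_end(ap);
    return 1;
}
```

At a low debug level the call returns 0 without printing, so the absence of the message in the logs says nothing about whether the code path ran.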

If "Receiver lock-up workaround activated" message is printed, then the
workaround is really activated.
I doubt that the real reason is that RX bug, but periodic multicast list set
commands may certainly affect the behavior.

Augustin, could you send the output of `lspci' and `eepro100-diag -ee', please?
(The latter may be taken from ftp://scyld.com/pub/diag/)

Best regards
Andrey

2001-02-12 09:01:13

by Ion Badulescu

Subject: Re: eepro100.c, kernel 2.4.1

On Mon, 12 Feb 2001, Andrey Savochkin wrote:

> I've just checked: "Sending a multicast list set command" is printed only on
> high debug levels, so Augustin might not see them.

I could have sworn that I saw the message being printed unconditionally.
But you're right, so we're back to square one..

BTW, stalling Rx for 2 seconds is enough to trigger even NFS warnings. It
might not be enough to show up on performance graphs, though.

> If "Receiver lock-up workaround activated" message is printed, then the
> workaround is really activated.

What's more worrying is that lately I've seen an (unrelated) report with
the workaround activated even on i82559 chips -- even though in theory the
i82559 cannot possibly have this bug. It might be that the eeprom was
improperly initialized, but that's something I'd expect from an OEM, not
from Intel:

00:0d.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 09)
Subsystem: Intel Corporation: Unknown device 2411

kernel: eth0: Intel PCI EtherExpress Pro100 82557, 00:10:A4:0E:7F:EC, IRQ 10.
kernel: Board assembly 000695-001, Physical connectors present: RJ45
kernel: Primary interface chip i82555 PHY #1.
kernel: General self-test: passed.
kernel: Serial sub-system self-test: passed.
kernel: Internal registers self-test: passed.
kernel: ROM checksum self-test: passed (0xdbd8681d).
kernel: Receiver lock-up workaround activated.

This was with 2.2.18 (not my report, taken from the list).

> I doubt that the real reason is that RX bug, but periodic multicast list set
> commands may certainly affect the behavior.

Full chip re-initialization, only the descriptor lists are untouched..

> Augustin, could you send the output of `lspci' and `eepro100-diag -ee', please?
> (The latter may be taken from ftp://scyld.com/pub/diag/)

I'd be curious to see them too.

Thanks,
Ion

--
It is better to keep your mouth shut and be thought a fool,
than to open it and remove all doubt.

2001-02-20 06:31:14

by Augustin Vidovic

Subject: Re: eepro100.c, kernel 2.4.1

On Mon, Feb 12, 2001 at 01:00:34AM -0800, Ion Badulescu wrote:
> > Augustin, could you send the output of `lspci' and `eepro100-diag -ee', please?
> > (The latter may be taken from ftp://scyld.com/pub/diag/)
>
> I'd be curious to see them too.

Ok, here is the output (the status is displayed only if the interface
is down, so I had to go and run this manually on the machines):


eepro100-diag.c:v2.02 7/19/2000 Donald Becker ([email protected])
http://www.scyld.com/diag/index.html
Index #1: Found a Intel i82557 (or i82558) EtherExpressPro100B adapter at 0xef00.
i82557 chip registers at 0xef00:
00000000 00000000 00000000 00080002 182541e1 00000600
No interrupt sources are pending.
The transmit unit state is 'Idle'.
The receive unit state is 'Idle'.
This status is unusual for an activated interface.
Index #2: Found a Intel i82557 (or i82558) EtherExpressPro100B adapter at 0xee80.
i82557 chip registers at 0xee80:
00000000 00000000 00000000 00080002 183f0000 00000000
No interrupt sources are pending.
The transmit unit state is 'Idle'.
The receive unit state is 'Idle'.
This status is unusual for an activated interface.
eepro100-diag.c:v2.02 7/19/2000 Donald Becker ([email protected])
http://www.scyld.com/diag/index.html
Index #1: Found a Intel i82557 (or i82558) EtherExpressPro100B adapter at 0xef00.
i82557 chip registers at 0xef00:
00000000 00000000 00000000 00080002 182541e1 00000600
No interrupt sources are pending.
The transmit unit state is 'Idle'.
The receive unit state is 'Idle'.
This status is unusual for an activated interface.
Index #2: Found a Intel i82557 (or i82558) EtherExpressPro100B adapter at 0xee80.
i82557 chip registers at 0xee80:
00000000 00000000 00000000 00080002 183f0000 00000000
No interrupt sources are pending.
The transmit unit state is 'Idle'.
The receive unit state is 'Idle'.
This status is unusual for an activated interface.
eepro100-diag.c:v2.02 7/19/2000 Donald Becker ([email protected])
http://www.scyld.com/diag/index.html
Index #1: Found a Intel i82557 (or i82558) EtherExpressPro100B adapter at 0xef00.
Intel EtherExpress Pro 10/100 EEPROM contents:
Station address 00:D0:B7:00:BE:00.
Board assembly 000000-000, Physical connectors present: RJ45
Primary interface chip i82555 PHY #1.
Index #2: Found a Intel i82557 (or i82558) EtherExpressPro100B adapter at 0xee80.
Intel EtherExpress Pro 10/100 EEPROM contents:
Station address 00:D0:B7:00:BE:01.
Board assembly 000000-000, Physical connectors present: RJ45
Primary interface chip i82555 PHY #1.
eepro100-diag.c:v2.02 7/19/2000 Donald Becker ([email protected])
http://www.scyld.com/diag/index.html
Index #1: Found a Intel i82557 (or i82558) EtherExpressPro100B adapter at 0xef00.
EEPROM contents, size 64x16:
00: d000 00b7 00be 0c03 0003 0201 4701 0000
0x08: 0000 0000 40a2 3000 8086 0000 0000 0000
...
0x38: 0000 0000 0000 0000 0000 0000 0000 a315
The EEPROM checksum is correct.
Intel EtherExpress Pro 10/100 EEPROM contents:
Station address 00:D0:B7:00:BE:00.
Board assembly 000000-000, Physical connectors present: RJ45
Primary interface chip i82555 PHY #1.
Index #2: Found a Intel i82557 (or i82558) EtherExpressPro100B adapter at 0xee80.
EEPROM contents, size 64x16:
00: d000 00b7 01be 0c03 0003 0201 4701 0000
0x08: 0000 0000 40a2 3000 8086 0000 0000 0000
...
0x38: 0000 0000 0000 0000 0000 0000 0000 a215
The EEPROM checksum is correct.
Intel EtherExpress Pro 10/100 EEPROM contents:
Station address 00:D0:B7:00:BE:01.
Board assembly 000000-000, Physical connectors present: RJ45
Primary interface chip i82555 PHY #1.
eepro100-diag.c:v2.02 7/19/2000 Donald Becker ([email protected])
http://www.scyld.com/diag/index.html
Index #1: Found a Intel i82557 (or i82558) EtherExpressPro100B adapter at 0xef00.
MII PHY #1 transceiver registers:
3000 782d 02a8 0154 05e1 41e1 0003 0000
0000 0000 0000 0000 0000 0000 0000 0000
0203 0000 0001 ffff 0000 0001 ffff 0001
0004 0000 0000 0000 0000 0000 0000 0000.
Index #2: Found a Intel i82557 (or i82558) EtherExpressPro100B adapter at 0xee80.
MII PHY #1 transceiver registers:
3000 7809 02a8 0154 05e1 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0001 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000.
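
The first row of the EEPROM dump for the adapter at 0xef00 can be decoded with a short sketch like the one below. The helper names are hypothetical, and the byte order of the station address (low byte of each word first) is an assumption inferred from the dump matching the printed address, not something taken from Intel's documentation.

```c
#include <stdio.h>

/* Words 0..7 of the EEPROM dump for the adapter at 0xef00, copied from above. */
const unsigned short eeprom_words[8] = {
    0xd000, 0x00b7, 0x00be, 0x0c03, 0x0003, 0x0201, 0x4701, 0x0000
};

/* Format the station address from words 0..2, low byte of each word first. */
void format_station_address(const unsigned short *ee, char out[18])
{
    snprintf(out, 18, "%02X:%02X:%02X:%02X:%02X:%02X",
             (unsigned int)(ee[0] & 0xff), (unsigned int)(ee[0] >> 8),
             (unsigned int)(ee[1] & 0xff), (unsigned int)(ee[1] >> 8),
             (unsigned int)(ee[2] & 0xff), (unsigned int)(ee[2] >> 8));
}

/* The two low bits of word 3, which the driver tests for the receiver bug. */
unsigned int rx_flag_bits(const unsigned short *ee)
{
    return ee[3] & 0x03;
}
```

For this dump, word 3 is 0x0c03, so both low bits are set. The stock driver therefore leaves the work-around off for these cards, while the one-line patch turns it on.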

2001-02-20 07:22:03

by Andrey Savochkin

Subject: Re: eepro100.c, kernel 2.4.1

On Tue, Feb 20, 2001 at 03:30:48PM +0900, Augustin Vidovic wrote:
> On Mon, Feb 12, 2001 at 01:00:34AM -0800, Ion Badulescu wrote:
> > > Augustin, could you send the output of `lspci' and `eepro100-diag -ee', please?
> > > (The latter may be taken from ftp://scyld.com/pub/diag/)
> >
> > I'd be curious to see them too.
>
> Ok, here is the output (the status are displayed only if the interface
> is down, so I had to go execute this manually on the machines) :
>
>
> eepro100-diag.c:v2.02 7/19/2000 Donald Becker ([email protected])
> http://www.scyld.com/diag/index.html
> Index #1: Found a Intel i82557 (or i82558) EtherExpressPro100B adapter at 0xef00.
[snip]

What about lspci?

Andrey

2001-02-20 08:18:22

by Augustin Vidovic

Subject: Re: eepro100.c, kernel 2.4.1

On Mon, Feb 19, 2001 at 11:21:36PM -0800, Andrey Savochkin wrote:
> On Tue, Feb 20, 2001 at 03:30:48PM +0900, Augustin Vidovic wrote:
> > On Mon, Feb 12, 2001 at 01:00:34AM -0800, Ion Badulescu wrote:
> > > > Augustin, could you send the output of `lspci' and `eepro100-diag -ee', please?
> > > > (The latter may be taken from ftp://scyld.com/pub/diag/)
> > >
> > > I'd be curious to see them too.
> >
> > Ok, here is the output (the status are displayed only if the interface
> > is down, so I had to go execute this manually on the machines) :
> >
> >
> > eepro100-diag.c:v2.02 7/19/2000 Donald Becker ([email protected])
> > http://www.scyld.com/diag/index.html
> > Index #1: Found a Intel i82557 (or i82558) EtherExpressPro100B adapter at 0xef00.
> [snip]
>
> What about lspci?

Ah, yes, here it is:

00:00.0 Host bridge: Intel Corporation 440BX/ZX - 82443BX/ZX Host bridge (AGP disabled) (rev 03)
00:07.0 ISA bridge: Intel Corporation 82371AB PIIX4 ISA (rev 02)
00:07.1 IDE interface: Intel Corporation 82371AB PIIX4 IDE (rev 01)
00:07.2 USB Controller: Intel Corporation 82371AB PIIX4 USB (rev 01)
00:07.3 Bridge: Intel Corporation 82371AB PIIX4 ACPI (rev 02)
00:0c.0 Ethernet controller: Intel Corporation 82557 (rev 08)
00:0d.0 Ethernet controller: Intel Corporation 82557 (rev 08)
00:0e.0 VGA compatible controller: ATI Technologies Inc 215GP [Mach64 GP] (rev 5c)

2001-02-20 23:39:22

by Andrey Savochkin

Subject: Re: eepro100.c, kernel 2.4.1

On Tue, Feb 20, 2001 at 05:18:37PM +0900, Augustin Vidovic wrote:
> 00:0c.0 Ethernet controller: Intel Corporation 82557 (rev 08)
> 00:0d.0 Ethernet controller: Intel Corporation 82557 (rev 08)

It's an i82559.
It can't have the original bug, the one that those EEPROM bits indicate and
that the workaround was implemented for.
You probably have another one :-)

What are the symptoms of the lock-ups?
Does TX timeout happen?
Does the card recover and resume operations after that?

Best regards
Andrey