2002-06-18 17:20:46

by Joshua Newton

Subject: Incredible weirdness with eepro100?

I'm posting here as a last resort, having spent about forty hours trying
to root out this particular problem. First up, my hardware and software
configuration on the problem machine:

ABIT KT7A-RAID v1.0 w/ 1GHz Athlon Thunderbird
256MB SDR SDRAM
/dev/hda: Maxtor 2B020H1 (20GB, 5400rpm)
/dev/hdc: Maxtor 5T040H4 (40GB, 7200rpm)
/dev/hdd: Sony CDU5211 (52x ATAPI)
eth0: 3Com 3c905B @ 100Mbps/FDX
eth1: Intel PILA8470B (eepro100) @ 100Mbps/FDX

uname -a:

Linux rapier.corona 2.4.19-pre9 #2 Tue Jun 18 11:34:52 EDT 2002 i686 unknown

cat /etc/mandrake-release:

Mandrake Linux release 8.3 (Cooker) for i586


Okay, the problem, as simply as I can state it: uploading really fast to
this particular box works okay for a bit, and then suddenly stops hard.
For example, a quick cp(1) across NFSv3 -- from a known working box to
the whacked-out box -- yields the following:

cp: writing `/scratch/test/chaos12m.sf2': Input/output error

A quick check of dmesg(8) turns up zilch, as does a quick tail on
/var/log/messages. As I'm watching the lights on the switch, they start
off blinking madly, as is proper for a 100Mbps/FDX transmission between
machines connected to the same switch, sitting ~6ft apart, and then the
lights stop blinking entirely.
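
Since dmesg and the logs are silent, another place worth checking is the
per-interface counters in /proc/net/dev. Below is a minimal sketch in
Python that parses two snapshots of that file and reports which counters
moved in between. The function names are hypothetical, chosen only for
illustration, and the column order is assumed to be the usual 16-field
layout (8 receive counters, then 8 transmit counters).

```python
def parse_net_dev(text):
    """Parse the text of /proc/net/dev into {iface: {counter: value}}.

    Assumes the usual layout: two header lines, then one line per
    interface with 8 receive and 8 transmit counters.
    """
    rx = ["bytes", "packets", "errs", "drop",
          "fifo", "frame", "compressed", "multicast"]
    tx = ["bytes", "packets", "errs", "drop",
          "fifo", "colls", "carrier", "compressed"]
    names = ["rx_" + f for f in rx] + ["tx_" + f for f in tx]
    stats = {}
    for line in text.splitlines()[2:]:      # skip the two header lines
        if ":" not in line:
            continue
        # The name may be glued to the first counter ("eth1:1500"),
        # so split on the colon rather than on whitespace.
        iface, _, rest = line.partition(":")
        values = [int(v) for v in rest.split()]
        stats[iface.strip()] = dict(zip(names, values))
    return stats


def counter_deltas(before, after, iface):
    """Return only the counters of iface that changed between snapshots."""
    return {name: after[iface][name] - before[iface][name]
            for name in after[iface]
            if after[iface][name] != before[iface][name]}
```

Read /proc/net/dev once before starting the copy and once after it
stalls, then pass both parsed snapshots to counter_deltas() for eth1. If
rx_errs, rx_fifo or rx_frame climb while the transfer dies, that would
point at the receive path rather than at NFS or FTP.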

The behaviour can be repeated across FTP and Samba as well, both with
known-working boxen, one running Linux 2.4.18 stock and the other
running WinXP with all recent patches applied, both using eepro100
cards. Transfer between these two working boxen (only tested via ftp so
far) works beautifully, with the files screaming across at a healthy
~8MB/s.

I am pulling my hair out on this problem. I've tried various different
kernel versions on this box (2.4.19-pre2-ac2, 2.4.18 stock,
2.4.19-pre2), different drivers (in addition to the stock eepro100
drivers, the official Intel e100 driver gives the same results with both
v1.8.37 and v2.0.30), different switch ports (on an Intel Express 520T
Switch -- nice hardware), different NICs (I had a spare eepro100, and
tried that), different protocols, and on and on. I'm just about at the
kicking and screaming stage now.

So far, I've found precisely THREE clues to the problem:

(1) Downloading files is no problem. This ONLY occurs on upload.

(2) The problem is only triggered by certain files being transferred. On
the WinXP box, I was trying to transfer across some .wav files I'd
produced in the course of running some MIDIs thru wavetable synthesis on
the SB Dead! card on the XP box. The first .wav transferred fine, and
every other one stopped dead. This was over FTP, and it was repeatable.
The second time, I was transferring the Chaos 12MB SoundFont bank from
the working Linux 2.4.18 box to the goofy one, and got I/O errors across
NFS. If I copied a bunch of other files first, they'd go across fine,
and then it would choke on the SoundFont. This was repeatable with both
NFS and FTP.

(3) Here's the really good clue: if I transfer the SoundFont bank from
the good box to the goofy box with scp -- hey, it works instantly!

My working guess is that something somewhere in the networking code on
this particular machine -- or in both the e100 and eepro100 drivers --
is seeing SOMETHING it doesn't like in certain files, and it's exploding
on takeoff.
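
If the trigger really is something in the file contents, it should be
possible to narrow it down by bisection. The sketch below is one way to
do that in Python. It assumes a caller-supplied function transfer_ok()
(hypothetical; it would wrap a real upload with a timeout and return
False on a stall) and that the failure depends only on the bytes sent,
which may not hold if timing or driver state is involved.

```python
def narrow_failing_range(data, transfer_ok, min_size=64):
    """Bisect data down to a small slice that still fails to transfer.

    transfer_ok(chunk) must return True if chunk uploads cleanly and
    False if it stalls. Returns (start, end) offsets into data, or None
    if the whole payload transfers fine.
    """
    if transfer_ok(data):
        return None
    start, end = 0, len(data)
    while end - start > min_size:
        mid = (start + end) // 2
        if not transfer_ok(data[start:mid]):
            end = mid            # the first half alone still fails
        elif not transfer_ok(data[mid:end]):
            start = mid          # the second half alone still fails
        else:
            break                # trigger straddles the midpoint; stop here
    return start, end
```

Running this against the SoundFont would yield a byte range that can be
dumped with a hex viewer and replayed on its own. If no small range
reproduces the stall, the "bad data pattern" theory becomes less likely.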

Any assistance whatsoever on this problem would be greatly appreciated.


--

"However, Science People like to believe in laws, even when such laws
can be circumvented by their own Science. They become most displeased
if you suggest it would be more accurate to speak of the Generally
Good Idea of Gravity or the Three Useful Guidelines of
Thermodynamics."

-- James Alan Gardner, /Ascending/


2002-06-19 13:48:13

by Sylvain Le Briero

Subject: Re: Incredible weirdness with eepro100?

It seems that some older kernels do the same...

I recently discovered that I have the same problem with an HP Netserver LH
3000, which works fine in every other case:

uname -a
Linux databaseserver 2.4.10 #1 Tue Nov 13 17:28:13 CET 2001 i686 unknown

excerpt of dmesg :
eepro100.c:v1.09j-t 9/29/99 Donald Becker http://cesdis.gsfc.nasa.gov/linux/drivers/eepro100.html
eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin <[email protected]> and others
eth0: OEM i82557/i82558 10/100 Ethernet, 00:30:6E:00:1A:49, IRQ 18.
Receiver lock-up bug exists -- enabling work-around.
Board assembly 506477-150, Physical connectors present: RJ45
Primary interface chip i82555 PHY #1.
General self-test: passed.
Serial sub-system self-test: passed.
Internal registers self-test: passed.
ROM checksum self-test: passed (0x04f4518b).
Receiver lock-up workaround activated.

on Slackware 8.0 running PostgreSQL.

When I mount an SMB share and copy a large file to an NT Server (PostgreSQL
database files: 2-3 GBytes), all network connections are closed and a
reboot is needed.

It also seems that this problem is related.

This server has been in production for almost a year and is very
stable as long as you don't transfer large files to a network mount point.

Hope this helps



2002-06-19 13:57:15

by Dave Gilbert (Home)

Subject: Re: Incredible weirdness with eepro100?

Yep, a me-too, I'm afraid. We had the problem with an eepro100 on board a
motherboard. It worked fine except when we copied large files, and then it
would start randomly timing out on SMB/NFS.

Tried new kernels (2.4.16 I think was the last I tried); was mainly
using the Intel drivers that were in SuSE kernels.

In the end we gave up and put a 3com 3c905 in - it has been fine ever since.

Dave

P.S. I'm not in a position to try anything more with it since it is a
production server.




2002-06-19 16:39:05

by Joshua Newton

Subject: Re: Incredible weirdness with eepro100?

Well, the fix is sadly simple: switching to a spare 3c905C-TX seems to
have eliminated the problem. Copying the contents of claymore:/pack (a
directory with lots of big RPMs and tarballs and other miscellany) to
rapier:/scratch is super-fast and error-free. This directory includes a
few files (such as the Chaos SoundFont) that killed the transfer
yesterday.
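
To back up "error-free" with more than the absence of I/O errors, the
two directory trees can be compared by checksum. A minimal sketch in
Python, assuming both trees are reachable as local paths (for example
with claymore:/pack mounted over NFS); the function names are
hypothetical.

```python
import hashlib
import os


def tree_digests(root, block_size=1 << 20):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    digests = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(path, "rb") as f:
                # Read in blocks so large RPMs and tarballs do not
                # have to fit in memory.
                for block in iter(lambda: f.read(block_size), b""):
                    h.update(block)
            digests[os.path.relpath(path, root)] = h.hexdigest()
    return digests


def differing_files(src, dst):
    """Return sorted relative paths that are missing or differ between trees."""
    a, b = tree_digests(src), tree_digests(dst)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

An empty list from differing_files() means every file arrived intact.
Any path it does report is a candidate for the same kind of bad-file
test that the SoundFont triggered.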

Once I receive my new chassis, I'll move rapier's guts to it and start
testing with FreeBSD 4.6 to try to determine whether the problem is
linked to this particular /hardware/ configuration, or if it's something
in the Linux kernel and/or drivers. I'll also swap around network cards
to try and determine if this is some sort of strange interaction between
the 3c59x and e100/eepro100 drivers.
