2002-05-26 13:06:12

by jason andrade

[permalink] [raw]
Subject: e1000 intel driver bug (which impacts nfs)


Hi,

I've spent many hours trying to diagnose and fix what I thought was an
nfs performance bug. I'm now 99% sure it is actually a bug in the Intel
E1000 driver for the Intel 1000T (or, for me anyway, any Intel Gigabit
Ethernet adapter in copper). It's present in both the older 3.X drivers
and the new 4.X drivers, including 4.1.7 (the current version).

The symptom is that nfs will simply "hang": clients start to queue
requests, and at first we could not find anything on the server that
would clear this except a reboot. With some more testing we were able to
reproduce the problem and clear it without a reboot by stopping nfs,
downing the gigabit interface, unloading the driver, reloading it,
reconfiguring the interface and restarting nfs. Within 2 minutes the
clients would start responding again.
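
A rough sketch of that recovery sequence as a script, for illustration
only. The interface name (eth1), address, netmask and init script path are
assumptions, not details from this report, and the script only prints the
commands unless dry_run is turned off:

```python
import subprocess

def recovery_commands(iface="eth1", address="192.168.0.1",
                      netmask="255.255.255.0",
                      nfs_init="/etc/init.d/nfs", module="e1000"):
    """Return the recovery steps, in order, as argument lists.

    All defaults are hypothetical placeholders; adjust them for the
    server in question.
    """
    return [
        [nfs_init, "stop"],                    # stop nfs
        ["ifconfig", iface, "down"],           # down the gigabit interface
        ["rmmod", module],                     # unload the driver
        ["modprobe", module],                  # reload it
        ["ifconfig", iface, address,
         "netmask", netmask, "up"],            # reconfigure the interface
        [nfs_init, "start"],                   # restart nfs
    ]

def run_recovery(dry_run=True, **kwargs):
    """Print each step; execute it only when dry_run is False."""
    for cmd in recovery_commands(**kwargs):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)

run_recovery()  # dry run: prints the six commands without running them
```

Keeping the steps as a plain list makes the ordering easy to check before
anything is run on a production server.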

Someone else has told me he can achieve the same effect with an ifconfig
down, pause, ifconfig up on that interface, but to date this has not
worked for me.

I hope this helps anyone else trying to debug mysterious "nfs hangs" under
2.4.X. It doesn't seem to be tickled unless you are doing quite large
amounts of nfs traffic (we're pushing 1-1.5T a day on this interface)
and it's quite random (i've had a lockup from 4 hours to 10 days after
a reboot)
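
For scale, 1-1.5T a day is a fairly modest average rate for a gigabit
link. A quick calculation, assuming "T" means 10^12 bytes and the traffic
is spread evenly over 24 hours (real traffic is presumably burstier):

```python
SECONDS_PER_DAY = 24 * 60 * 60

def average_mbit_per_s(bytes_per_day):
    """Average rate in Mbit/s for a given daily byte count."""
    return bytes_per_day * 8 / SECONDS_PER_DAY / 1e6

for terabytes in (1.0, 1.5):
    rate = average_mbit_per_s(terabytes * 1e12)
    print(f"{terabytes} T/day ~ {rate:.0f} Mbit/s average")
```

That comes to roughly 93 and 139 Mbit/s on average.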

I am still trying to work out why 8K nfs mounts over UDP do not work for
us (we're back to 1K now), and I plan to try 8/16/32K mounts over TCP
instead.
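
For reference, these combinations map onto the rsize/wsize and udp/tcp
mount options. A small sketch that only builds the mount command lines;
the server name, export path and mount point are made-up placeholders:

```python
def nfs_mount_command(block_kib, proto, server="server", export="/export",
                      mountpoint="/mnt/nfs"):
    """Build a mount command for a given block size (KiB) and transport."""
    if proto not in ("udp", "tcp"):
        raise ValueError("proto must be 'udp' or 'tcp'")
    size = block_kib * 1024
    options = f"rsize={size},wsize={size},{proto}"
    return ["mount", "-t", "nfs", "-o", options,
            f"{server}:{export}", mountpoint]

print(" ".join(nfs_mount_command(1, "udp")))        # the current fallback
for kib in (8, 16, 32):
    print(" ".join(nfs_mount_command(kib, "tcp")))  # the sizes still to try
```

Whether a given kernel accepts the larger sizes, or TCP at all, depends on
the NFS patches applied to it.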

Since i now finally have a pure gigE network with a 9000 MTU for the
backend between servers i'm hoping this might work a bit better.
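
One reason a 9000 MTU could help with 8K mounts over UDP: each 8K request
or reply travels as a single UDP datagram, which IP has to fragment to fit
the link MTU, and losing any one fragment loses the whole datagram. A
back-of-the-envelope sketch, assuming a 20-byte IPv4 header without
options, an 8-byte UDP header, and a guessed 150 bytes of RPC/NFS header
overhead (that last figure is an assumption, not a measured value):

```python
import math

IP_HEADER = 20      # IPv4 header without options
UDP_HEADER = 8
RPC_OVERHEAD = 150  # assumed RPC/NFS header size, illustrative only

def ip_fragments(nfs_block, mtu):
    """Number of IP fragments needed for one NFS-over-UDP datagram."""
    datagram = UDP_HEADER + RPC_OVERHEAD + nfs_block
    # Every fragment except the last carries a multiple of 8 bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return math.ceil(datagram / per_fragment)

for mtu in (1500, 9000):
    print(f"MTU {mtu}: {ip_fragments(8192, mtu)} fragment(s) per 8K datagram")
```

This does not show that fragmentation is what breaks the 8K mounts; it
only shows what changes with the larger MTU.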



I'd also like to second Seth Vidal's comments about getting Neil, Trond
and co to provide a definitive (revised weekly? monthly?) "this is what
our recommended patchlist is, against which kernels, and why" on the nfs
list and/or as part of the faq.

It is increasingly hard to track the major nfs patch contributors to work
out what should be applied and what can wait, as well as to figure out the
patch dependencies.


cheers,

-jason


_______________________________________________________________

Don't miss the 2002 Sprint PCS Application Developer's Conference
August 25-28 in Las Vegas -- http://devcon.sprintpcs.com/adp/index.cfm

_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2002-06-22 21:26:26

by Rock Gordon

[permalink] [raw]
Subject: tar returns errors

Hi all,

When untarring files over an NFS mounted filesystem
(client and server both 2.4.18 vanilla), tar prints
errors like "Cannot change mode to rwxr-xr-x". When I
run strace on tar, it prints this message:

----
utime("Documentation/", [2002/06/22-17:11:43, 2002/06/20-19:29:20]) = 72
chmod("Documentation/", 0755) = 72
write(2, "tar: ", 5tar: ) = 5
write(2, "Documentation/: Cannot change mo"..., 47Documentation/: Cannot change mode to rwxr-xr-x) = 47
write(2, "\n", 1
----

Now, the return values from utime and chmod are both 72, but the man
pages don't say anything about such a value ...

Any clues?
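
For comparison, the documented behaviour of chmod(2) and utime(2) is to
return 0 on success and -1 on failure with errno set, so 72 is outside what
a caller expects to see. A small sketch that calls the C library's chmod
directly and prints the raw return value; it assumes a Linux-like system
where ctypes.CDLL(None) exposes libc, and the temporary directory is only a
stand-in for the NFS path:

```python
import ctypes
import errno
import os
import tempfile

libc = ctypes.CDLL(None, use_errno=True)
libc.chmod.argtypes = [ctypes.c_char_p, ctypes.c_uint]
libc.chmod.restype = ctypes.c_int

def raw_chmod(path, mode):
    """Call libc chmod() and return (return value, errno)."""
    ctypes.set_errno(0)
    rc = libc.chmod(os.fsencode(path), mode)
    return rc, ctypes.get_errno()

with tempfile.TemporaryDirectory() as tmp:
    # Success case: an existing directory.
    print(raw_chmod(tmp, 0o755)[0])
    # Failure case: a path that does not exist.
    rc, err = raw_chmod(os.path.join(tmp, "missing"), 0o755)
    print(rc, errno.errorcode.get(err))
```

Running the same check against a directory on the NFS mount would show
whether the odd value is also visible outside tar.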


__________________________________________________
Do You Yahoo!?
Yahoo! - Official partner of 2002 FIFA World Cup
http://fifaworldcup.yahoo.com


-------------------------------------------------------
Sponsored by:
ThinkGeek at http://www.ThinkGeek.com/

2002-06-02 22:28:24

by Thomas Langås

[permalink] [raw]
Subject: Re: e1000 intel driver bug (which impacts nfs)

jason andrade:
> I hope this helps anyone else trying to debug mysterious "nfs hangs" under
> 2.4.X. It doesn't seem to be tickled unless you are doing quite large
> amounts of nfs traffic (we're pushing 1-1.5T a day on this interface)
> and it's quite random (i've had a lockup from 4 hours to 10 days after
> a reboot)

We've also got problems with nfs hangs when transferring large files
(i.e. files around 300-400M; sometimes we have to go a bit higher, like
2GB-3GB files, but it's always possible to trigger this). However, we
don't need to jump through hoops to "fix it": after a minute or so,
it's ok again. It seems to me like there's a VM problem or something.

We've got 2GB of memory on the machines which are suffering from these
problems.

--
Thomas


2002-06-03 02:33:01

by seth vidal

[permalink] [raw]
Subject: Re: e1000 intel driver bug (which impacts nfs)

On Sun, 2002-06-02 at 18:28, Thomas Langås wrote:
> jason andrade:
> > I hope this helps anyone else trying to debug mysterious "nfs hangs" under
> > 2.4.X. It doesn't seem to be tickled unless you are doing quite large
> > amounts of nfs traffic (we're pushing 1-1.5T a day on this interface)
> > and it's quite random (i've had a lockup from 4 hours to 10 days after
> > a reboot)
>
> We've also got problems with nfs hangs when transferring large files
> (i.e. files around 300-400M; sometimes we have to go a bit higher, like
> 2GB-3GB files, but it's always possible to trigger this). However, we
> don't need to jump through hoops to "fix it": after a minute or so,
> it's ok again. It seems to me like there's a VM problem or something.
>
> We've got 2GB of memory on the machines which are suffering from these
> problems.

Could this be a VM problem - something like a problem flushing the write
cache?

it sounds like that from your description.

-sv


Attachments:
signature.asc (232.00 B)
This is a digitally signed message part

2002-06-03 07:48:32

by Trond Myklebust

[permalink] [raw]
Subject: Re: e1000 intel driver bug (which impacts nfs)

>>>>> " " == Thomas Langås <[email protected]> writes:

> We've also got problems with nfs hangs when transferring large
> files (i.e. files around 300-400M; sometimes we have to go a bit
> higher, like 2GB-3GB files, but it's always possible to
> trigger this). However, we don't need to jump through hoops
> to "fix it": after a minute or so, it's ok again. It seems
> to me like there's a VM problem or something.

> We've got 2GB of memory on the machines which are suffering
> from these problems.

There is a problem with highmem that I'm just barely getting to grips
with: NFS is able to starve the kernel of highmem bounce buffer
resources because we kmap() the pages for too long (surprise: NFS
predates the highmem code by several years and so nobody ever
considered kmap() when the design was made).
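
The effect can be pictured with a toy model (this is not kernel code, and
the numbers are invented): a fixed pool of mapping slots, one request
arriving per tick, and each request holding its slot for a given number of
ticks. With a short hold time the pool never runs dry; when slots are held
across something slow, later requests find nothing free and stall.

```python
import heapq

def count_stalled(num_slots, hold_ticks, num_requests):
    """Count requests that found no free slot when they arrived.

    One request arrives per tick. A stalled request waits for the
    earliest slot to be released and then holds it for hold_ticks.
    """
    release_times = []  # min-heap of ticks at which busy slots free up
    stalled = 0
    for arrival in range(num_requests):
        # Drop slots whose hold has already finished.
        while release_times and release_times[0] <= arrival:
            heapq.heappop(release_times)
        if len(release_times) < num_slots:
            start = arrival
        else:
            stalled += 1
            start = heapq.heappop(release_times)
        heapq.heappush(release_times, start + hold_ticks)
    return stalled

print(count_stalled(num_slots=4, hold_ticks=2, num_requests=10))  # short holds
print(count_stalled(num_slots=4, hold_ticks=8, num_requests=10))  # long holds
```

In this model the short-hold run stalls no requests and the long-hold run
stalls 6 of the 10.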

An attempt at a fix has been merged into the development kernels as
of 2.5.19. The same patches are available for 2.4.19-pre8 +
NFS_ALL. If you'd like to test it out, you will need

http://www.fys.uio.no/~trondmy/src/2.4.19-pre8/linux-2.4.19-NFS_ALL.dif

+ the 4 patches from

http://www.fys.uio.no/~trondmy/src/2.4.19-pre8/alpha

Cheers,
Trond


2002-06-03 08:49:48

by Ryan Sweet

[permalink] [raw]
Subject: Re: e1000 intel driver bug (which impacts nfs)

On Mon, 3 Jun 2002, Thomas Langås wrote:

> We've also got problems with nfs hangs when transferring large files
> (i.e. files around 300-400M; sometimes we have to go a bit higher, like
> 2GB-3GB files, but it's always possible to trigger this). However, we
> don't need to jump through hoops to "fix it": after a minute or so,
> it's ok again. It seems to me like there's a VM problem or something.

Does the problem happen with local I/O also, or only with nfs?



--
Ryan Sweet <[email protected]>
Atos Origin Engineering Services
http://www.aoes.nl



2002-06-03 08:56:12

by Mark Manuel Cruz Ramos

[permalink] [raw]
Subject: RE: e1000 intel driver bug (which impacts nfs)


I also had this problem using the standard install of RH 7.2, but after
upgrading the kernel to 2.4.18, everything is fine.

-----Original Message-----
From: Ryan Sweet [mailto:[email protected]]
Sent: Monday, June 03, 2002 4:37 PM
To: [email protected]
Cc: jason andrade
Subject: Re: [NFS] e1000 intel driver bug (which impacts nfs)


On Mon, 3 Jun 2002, Thomas Langås wrote:

> We've also got problems with nfs hangs when transferring large files
> (i.e. files around 300-400M; sometimes we have to go a bit higher, like
> 2GB-3GB files, but it's always possible to trigger this). However, we
> don't need to jump through hoops to "fix it": after a minute or so,
> it's ok again. It seems to me like there's a VM problem or something.

Does the problem happen with local I/O also, or only with nfs?



--
Ryan Sweet <[email protected]>
Atos Origin Engineering Services
http://www.aoes.nl



2002-06-05 15:04:15

by darren.miller

[permalink] [raw]
Subject: Re: e1000 intel driver bug (which impacts nfs)

I have recently been doing heavy investigation into using Linux for NAS
with an Intel E1000 and Slackware 8.

I found this problem too; however, it disappears if you use the PowerTweak
daemon and get the latest driver for the Intel card.

But PowerTweak cured a lot of my issues...

http://powertweak.sourceforge.net/

Hope this helps

Darren

==============================================================================
Darren Miller
Senior Systems Support Engineer
Microsoft Certified Professional
SCO Advanced Certified Engineer
Information Systems Department (Core Server Support)
Philips Semiconductors, Milbrook Industrial Estate, Southampton, SO15 0DJ,
England




Thomas Langås <[email protected]>
Sent by: [email protected]
2002-06-02 23:28
Please respond to nfs


To: jason andrade <[email protected]>
cc: [email protected]
(bcc: Darren Miller/SOU/SC/PHILIPS)
Subject: Re: [NFS] e1000 intel driver bug (which impacts nfs)
Classification:



jason andrade:
> I hope this helps anyone else trying to debug mysterious "nfs hangs" under
> 2.4.X. It doesn't seem to be tickled unless you are doing quite large
> amounts of nfs traffic (we're pushing 1-1.5T a day on this interface)
> and it's quite random (i've had a lockup from 4 hours to 10 days after
> a reboot)

We've also got problems with nfs hangs when transferring large files
(i.e. files around 300-400M; sometimes we have to go a bit higher, like
2GB-3GB files, but it's always possible to trigger this). However, we
don't need to jump through hoops to "fix it": after a minute or so,
it's ok again. It seems to me like there's a VM problem or something.

We've got 2GB of memory on the machines which are suffering from these
problems.

--
Thomas



2002-09-22 20:44:46

by Allen Day

[permalink] [raw]
Subject: RE: e1000 intel driver bug (which impacts nfs)

I found this thread describing NFS hanging with the Intel E1000 gigabit
ethernet adapter, and was wondering if any more progress has been made on
the problem.

I'm using a Redhat 7.3 2.4.18-3smp kernel, nfs-utils-0.3.3-5, on a dual
Xeon 1.8GHz / 4GB / SuperMicro P4DP6.

The symptoms I'm experiencing are that the NFS server behaves fine for a
few hours, then mounts are no longer available from it. Trying to mount
using a remote host or localhost as the NFS client gives an error in
dmesg: "nfs: task xxxx can't get a request slot". It isn't possible to
stop NFS via its init.d script, or even to get the machine to respond to
a 'shutdown' -- the only way I've been able to temporarily solve the
problem is to push the reset button on the box.

It may also be worth mentioning that once I notice the NFS problem has
started, I can no longer do a 'df' without making the terminal hang.
Also, the NFS share isn't very small... it's about 450GB.

Any idea what's going on here? Am I looking at an E1000 driver problem or
an NFS problem?

-Allen


