2004-10-13 16:50:46

by Dan Stromberg

Subject: NFS write errors


We are occasionally getting NFS write errors when writing terabytes of
data from an AIX 5.1 system to an RHEL 3 system, using the version of
in-kernel NFS that comes with RHEL 3. We're using 8k rsize, 8k wsize,
NFS v3, and TCP presently, but this isn't set in stone.

Has anyone else seen this?

Has anyone found a workaround?

Does anyone have any suggestions, speculative or otherwise?

Thanks!

--
Dan Stromberg DCS/NACS/UCI <[email protected]>




-------------------------------------------------------
This SF.net email is sponsored by: IT Product Guide on ITManagersJournal
Use IT products in your business? Tell us what you think of them. Give us
Your Opinions, Get Free ThinkGeek Gift Certificates! Click to find out more
http://productguide.itmanagersjournal.com/guidepromo.tmpl
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2004-10-13 17:33:22

by Roger Heflin

Subject: RE: NFS write errors


I have seen a fair number of issues with RH3.0 update 2, on either
the client or the server side.

On the client side, a core dump written across NFS will crash an
Opteron client. On IA64s we get a variety of odd errors, including
editing an existing file in vi and having the file disappear entirely
on the :wq exit. If you are using update 2, I would
strongly suggest getting update 3.

I don't know if there were issues before update 2. I did some testing
on update 1 and did not see anything bad, but we moved to update 2
before the serious production work started, and it was bad.

Roger

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Dan Stromberg
Sent: Wednesday, October 13, 2004 11:50 AM
To: [email protected]
Cc: Dan Stromberg
Subject: [NFS] NFS write errors


We are occasionally getting NFS write errors when writing terabytes of data
from an AIX 5.1 system to an RHEL 3 system, using the version of in-kernel
NFS that comes with RHEL 3. We're using 8k rsize, 8k wsize, NFS v3, and TCP
presently, but this isn't set in stone.

Has anyone else seen this?

Has anyone found a workaround?

Does anyone have any suggestions, speculative or otherwise?

Thanks!

--
Dan Stromberg DCS/NACS/UCI <[email protected]>





2004-10-13 20:13:02

by Steve Dickson

Subject: Re: NFS write errors

Dan Stromberg wrote:

>We are occasionally getting NFS write errors when writing terabytes of
>data from an AIX 5.1 system to an RHEL 3 system, using the version of
>in-kernel NFS that comes with RHEL 3. We're using 8k rsize, 8k wsize,
>NFS v3, and TCP presently, but this isn't set in stone.
>
>
What kind of errors? Is there anything in either system's /var/log/messages
that talks about some error condition? What kernel are we talking about?

SteveD.



2004-10-13 20:39:39

by Dan Stromberg

Subject: Re: NFS write errors

On Wed, 2004-10-13 at 13:12, Steve Dickson wrote:
> Dan Stromberg wrote:
>
> >We are occasionally getting NFS write errors when writing terabytes of
> >data from an AIX 5.1 system to an RHEL 3 system, using the version of
> >in-kernel NFS that comes with RHEL 3. We're using 8k rsize, 8k wsize,
> >NFS v3, and TCP presently, but this isn't set in stone.
> >
> >
> What kind of errors? Is there anything in either system's /var/log/messages
> that talks about some error condition? What kernel are we talking about?
>
> SteveD.

Both of the write errors I've documented were during a huge rsync.

The only errors that look at all relevant in /var/log/messages are:

Oct 11 14:44:37 esmft1 rpc.rquotad: No correct mountpoint specified.
Oct 11 14:44:37 esmft1 rpc.rquotad: Can't find filesystem mountpoint for directory /data/gfs044
Oct 11 14:44:37 esmft1 rpc.rquotad: No correct mountpoint specified.
Oct 11 14:44:37 esmft1 rpc.rquotad: Can't find filesystem mountpoint for directory /data/gfs045
Oct 11 14:44:37 esmft2-2 rpc.rquotad: Can't find filesystem mountpoint for directory /foo
Oct 11 14:44:37 esmft1 rpc.rquotad: No correct mountpoint specified.
Oct 11 14:44:37 esmft2-2 rpc.rquotad: No correct mountpoint specified.
Oct 11 14:44:37 esmft2-2 rpc.rquotad: Can't find filesystem mountpoint for directory /mnt/lustre
Oct 11 14:44:37 esmft2-2 rpc.rquotad: No correct mountpoint specified.

esmft2 is the relevant NFS server. esmft1 is not germane, but it's
interesting that it had such similar errors. The time of the failure
was 14:46 on Oct 11 - two minutes after these errors.

The other write error I documented had nothing at all relevant looking
in /var/log/messages.
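
For anyone repeating this kind of check, here is a minimal Python sketch (not something used in this thread) that pulls out the syslog lines stamped within a few minutes of a known failure time. It assumes the classic "Oct 11 14:44:37" timestamp prefix shown above, an English locale for month names, and a year supplied by the caller, since these stamps omit it. The function name and the second sample line are hypothetical.

```python
from datetime import datetime, timedelta


def lines_near(log_lines, failure, window_minutes=5, year=2004):
    """Return syslog lines stamped within window_minutes of the failure time."""
    window = timedelta(minutes=window_minutes)
    hits = []
    for line in log_lines:
        # Classic syslog stamps occupy the first 15 characters and omit the year.
        stamp = line[:15]
        try:
            when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # wrapped continuation line or unrecognised format
        if abs(when - failure) <= window:
            hits.append(line.rstrip("\n"))
    return hits


sample = [
    "Oct 11 14:44:37 esmft2-2 rpc.rquotad: No correct mountpoint specified.",
    "Oct 11 09:00:00 esmft2-2 example: hypothetical unrelated line",
]
failure = datetime(2004, 10, 11, 14, 46)  # failure time reported above
for line in lines_near(sample, failure):
    print(line)
```

On a real system the first argument would be an open file such as /var/log/messages rather than a list.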

The linux NFS server has:

[root@esmft2 log]# cat /etc/redhat-release
Red Hat Enterprise Linux WS release 3 (Taroon Update 3)
[root@esmft2 log]# cat /proc/version
Linux version 2.4.21-15.0.4.EL.lustre1.3.9.1 (root@esmft2) (gcc version 3.2.3 20030502 (Red Hat Linux 3.2.3-42)) #7 SMP Mon Sep 13 15:49:25 PDT 2004
[root@esmft2 log]#

The Oct 11th write error had nothing relevant in AIX's errpt -a. The
Oct 8th write error is no longer in errpt -a's data (?).

--
Dan Stromberg DCS/NACS/UCI <[email protected]>





2004-10-14 12:44:23

by Steve Dickson

Subject: Re: NFS write errors



Dan Stromberg wrote:

>Both of the write errors I've documented were during a huge rsync.
>
>
hmm... are you using soft mounts?

SteveD.



2004-10-14 16:06:35

by Dan Stromberg

Subject: Re: NFS write errors

On Thu, 2004-10-14 at 05:44, Steve Dickson wrote:
> Dan Stromberg wrote:
>
> >Both of the write errors I've documented were during a huge rsync.
> >
> >
> hmm... are you using soft mounts?
>
> SteveD.

I didn't specify hard/soft or intr/nointr, and mount | grep doesn't
indicate anything either way. I assume it's defaulting to hard +
nointr.

I believe the fact that the mount remains mounted after the error
supports the idea that it isn't mounted soft.
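
Rather than inferring the mode, one option is to read the effective option string the client reports (for example from nfsstat -m, or from /proc/mounts on a Linux client) and classify it. The Python sketch below is only an illustration under that assumption: it expects a comma-separated option string, and the function name and sample strings are hypothetical rather than output captured from these systems.

```python
def nfs_retry_mode(options):
    """Classify a comma-separated NFS option string as soft, hard, or unspecified."""
    flags = {opt.strip() for opt in options.split(",")}
    if "soft" in flags:
        return "soft"
    if "hard" in flags:
        return "hard"
    # Neither keyword listed: the platform default applies, so check its docs.
    return "unspecified"


# Hypothetical option strings, for illustration only.
print(nfs_retry_mode("rw,vers=3,proto=tcp,rsize=8192,wsize=8192"))
print(nfs_retry_mode("rw,vers=3,proto=tcp,hard,rsize=8192,wsize=8192"))
```

An "unspecified" result means the listing does not settle the question, which matches the situation described above.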

Thanks!

--
Dan Stromberg DCS/NACS/UCI <[email protected]>

