2004-07-23 16:20:39

by John Roberts

[permalink] [raw]
Subject: Linux NFS writes to Solaris very, very slow


Hi there,

I work in Hillsboro, Oregon, USA for Credence Systems Corporation (as a
software engineer) and we use the Redhat Enterprise 3 (2.4.21 kernel,
a Redhat hodgepodge with some 2.6 stuff) distribution on our
x86 PCs.

We're on a network with lots of Sun systems and a central file
server that is a Veritas cluster of twin SunFire servers running
Solaris 2.8.

What we've observed is that NFS writes from our Linux boxes to
the Solaris server (and other Solaris workstations) are _very_
slow. Reads seem to be operating at a reasonable speed, and
FTP speeds are blazing (which rules out the layers below NFS).

On a completely separate note, Linux-to-Linux NFS file writes
only seem to be fast if we export the serving Linux volume
as asynchronous (the default setting is synchronous, which is slow).
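For reference, that server-side difference lives in /etc/exports; a minimal sketch (the path and client spec here are hypothetical, and only one of the two lines would be present at a time):

```
/export/data  *(rw,async)   # fast: server ACKs writes before they reach disk
/export/data  *(rw,sync)    # slower but safe: data is on disk before the ACK
```

The async form trades crash safety for speed, which is exactly the choice discussed later in this thread.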

I'm curious whether anyone has ideas on what I should do about
the slow writes from Linux clients to Solaris servers over NFS.
I don't know if anybody has encountered this or if anyone in the
Linux kernel community is looking at it.

Any thoughts/advice would be _greatly_ appreciated.

thanks,


John Roberts
[email protected]




-------------------------------------------------------
This SF.Net email is sponsored by BEA Weblogic Workshop
FREE Java Enterprise J2EE developer tools!
Get your free copy of BEA WebLogic Workshop 8.1 today.
http://ads.osdn.com/?ad_id=4721&alloc_id=10040&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2004-07-23 16:35:36

by Olaf Kirch

[permalink] [raw]
Subject: Re: Linux NFS writes to Solaris very, very slow

On Fri, Jul 23, 2004 at 09:20:34AM -0700, John Roberts wrote:
> What we've observed is that NFS writes from our Linux boxes to
> the Solaris server (and other Solaris workstations) is _very_
> slow. Reads seem to be operating at a reasonable speed.
> FTP speeds are blazing (protocol below NFS).

What are your mount options? You should be using NFSv3; v2 is
of course terribly slow on writes.
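One quick way to check which NFS version a Linux client actually negotiated is the options field of /proc/mounts. A sketch, parsing a sample line in the format the 2.4 kernels use (on a real client you would read /proc/mounts itself):

```shell
# Sample /proc/mounts entry (device, mountpoint, fstype, options,
# dump, pass); the options field carries "v3" or "vers=3" when
# NFSv3 was negotiated.
line='orfsrv2:/export/vol/engr/viper /ims/viper nfs rw,v3,rsize=32768,wsize=32768,hard,tcp,lock,addr=orfsrv2 0 0'
opts=$(echo "$line" | awk '{print $4}')
case ",$opts," in
  *,v3,*|*,vers=3,*) ver="NFSv3" ;;
  *)                 ver="NFSv2 (or unspecified)" ;;
esac
echo "$ver"
```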

> On a completely separate note, Linux-to-Linux NFS file writes
> only seem to be fast if we publish the serving Linux volume
> as asynchronous (default setting is synchronous which is slow).

As long as your client uses NFSv2, that is to be expected. In
NFSv2 the (mandated) default is to use sync mode, i.e. write
each and every blob of data to disk before acknowledging the
RPC operation.

Olaf
--
Olaf Kirch | The Hardware Gods hate me.
[email protected] |
---------------+



2004-07-23 17:04:25

by John Roberts

[permalink] [raw]
Subject: Re: Linux NFS writes to Solaris very, very slow



>On Fri, Jul 23, 2004 at 09:20:34AM -0700, John Roberts wrote:
>> What we've observed is that NFS writes from our Linux boxes to
>> the Solaris server (and other Solaris workstations) is _very_
>> slow. Reads seem to be operating at a reasonable speed.
>> FTP speeds are blazing (protocol below NFS).
>
>What are your mount options? you should be using nfsv3; v2 is
>of course terribly slow on writes.

Our Solaris server is orfsrv2, which is mounted via
autofs and "mount" returns:

orfsrv2:/export/vol/ims/apps/xemacs/2.4.21-4.0.1.EL on /opt2/xemacs type nfs
(rw,addr=10.4.10.154)
orfsrv2:/export/vol/engr/cobalt on /ims/cobalt type nfs (rw,tcp,addr=10.4.10.154)
orfsrv2:/export/vol/engr/viper on /ims/viper type nfs (rw,addr=10.4.10.154)

Full mount list at end of this email/post, along with /etc/fstab.

I believe that we are using nfsv3. How can I tell?
When I use your wonderful nfsstat utility it reports only
Client nfs v3 stats (although it gives both Server nfs v2
and Server nfs v3 stats).

Our distro is Redhat Enterprise Linux 3 (2.4.21-4.0.1.EL #1).

As for the Solaris server, it's running Solaris 2.8.
I would assume that's NFSv3. If I run nfsstat on another Sun
(Solaris 2.8) workstation, it reports back fileserver
statistics like:

/usr/local from orfsrv2:/export/vol/ims/apps/local/5.8
Flags:
vers=3,proto=tcp,sec=sys,hard,intr,link,symlink,acl,rsize=32768,wsize=32768,retrans=5,timeo=600
Attr cache: acregmin=3,acregmax=60,acdirmin=30,acdirmax=60



>> On a completely separate note, Linux-to-Linux NFS file writes
>> only seem to be fast if we publish the serving Linux volume
>> as asynchronous (default setting is synchronous which is slow).
>
>as long as your client uses nfsv2, that is to be expected. In
>NFSv2 the (mandated) default is to use sync mode, i.e. write
>each and every blob of data to disk before acknowledging the
>RPC operation.

So are you saying nfsv3 defaults to async?



Thanks for the reply!


John Roberts
[email protected]

--------------------------mount output----------------------------
/dev/sda3 on / type ext3 (rw)
none on /proc type proc (rw)
usbdevfs on /proc/bus/usb type usbdevfs (rw)
/dev/sda1 on /boot type ext3 (rw)
none on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/sda7 on /export type ext3 (rw)
/dev/sda2 on /export/home type ext3 (rw)
none on /dev/shm type tmpfs (rw)
/dev/sda5 on /var type ext3 (rw)
automount(pid2825) on /hosts type autofs
(rw,fd=5,pgrp=2825,minproto=2,maxproto=4)
automount(pid2807) on /csc/dept type autofs
(rw,fd=5,pgrp=2807,minproto=2,maxproto=4)
automount(pid2777) on /csc/viewstorage type autofs
(rw,fd=5,pgrp=2777,minproto=2,maxproto=4)
automount(pid2784) on /csc/vobs type autofs
(rw,fd=5,pgrp=2784,minproto=2,maxproto=4)
automount(pid2786) on /csc/proj type autofs
(rw,fd=5,pgrp=2786,minproto=2,maxproto=4)
automount(pid2782) on /csc/tools type autofs
(rw,fd=5,pgrp=2782,minproto=2,maxproto=4)
automount(pid2890) on /home type autofs (rw,fd=5,pgrp=2890,minproto=2,maxproto=4)
automount(pid2886) on /opt2 type autofs (rw,fd=5,pgrp=2886,minproto=2,maxproto=4)
automount(pid2991) on /u type autofs (rw,fd=5,pgrp=2991,minproto=2,maxproto=4)
automount(pid2981) on /et type autofs (rw,fd=5,pgrp=2981,minproto=2,maxproto=4)
automount(pid2929) on /mfg type autofs (rw,fd=5,pgrp=2929,minproto=2,maxproto=4)
automount(pid2979) on /ims type autofs (rw,fd=5,pgrp=2979,minproto=2,maxproto=4)
automount(pid2927) on /net type autofs (rw,fd=5,pgrp=2927,minproto=2,maxproto=4)
algebra.hillsboro.credence.com:/export/home/jroberts on /u/jroberts type nfs
(rw,nosuid,tcp,soft,addr=10.4.13.147)
algebra.hillsboro.credence.com:/export/home/jroberts on /home/jroberts type nfs
(rw,nosuid,tcp,soft,addr=10.4.13.147)
orfsrv2:/export/vol/ims/apps/xemacs/2.4.21-4.0.1.EL on /opt2/xemacs type nfs
(rw,addr=10.4.10.154)
orfsrv2:/export/vol/engr/cobalt on /ims/cobalt type nfs (rw,tcp,addr=10.4.10.154)
orfsrv2:/export/vol/engr/viper on /ims/viper type nfs (rw,addr=10.4.10.154)


---------------------------/etc/fstab-------------------------------------
LABEL=/ / ext3 defaults 1 1
LABEL=/boot /boot ext3 defaults 1 2
none /dev/pts devpts gid=5,mode=620 0 0
LABEL=/export /export ext3 defaults 1 2
LABEL=/export/home /export/home ext3 defaults 1 2
none /proc proc defaults 0 0
none /dev/shm tmpfs defaults 0 0
LABEL=/var /var ext3 defaults 1 2
/dev/sda6 swap swap defaults 0 0
/dev/cdrom /mnt/cdrom udf,iso9660 noauto,owner,kudzu,ro
0 0
/dev/fd0 /mnt/floppy auto noauto,owner,kudzu 0 0





2004-07-23 17:50:44

by John Roberts

[permalink] [raw]
Subject: RE: Linux NFS writes to Solaris very, very slow


>I would ensure you are using NFSv3 and TCP (as opposed to UDP) mounts.

I believe I am, but still getting poor Linux client-to-Solaris server
NFS write speed. Read performance seems okay.


>On our solaris NFS server our automount maps have the following options:
>
>-proto=tcp,vers=3

Ditto here. For my server volume I have:
vers=3,proto=tcp,sec=sys,hard,intr,link,symlink,acl,rsize=32768,wsize=32768,retrans=5,timeo=600


>And on the Linux clients I can see this is the case by doing a 'cat
>/proc/mounts':
>
>nfs rw,v3,hard,intr,tcp,lock, 0 0 (edited to remove irrelevant info)

I have similar settings, but not the same. We also use autofs to
mount the file server for us:

nfs rw,v3,rsize=32768,wsize=32768,hard,tcp,lock,addr=orfsrv2 0 0



Thanks for the reply!


John Roberts
[email protected]





2004-07-26 15:18:14

by Bernd Schubert

[permalink] [raw]
Subject: async vs. sync


> On a completely separate note, Linux-to-Linux NFS file writes
> only seem to be fast if we publish the serving Linux volume
> as asynchronous (default setting is synchronous which is slow).
>

Yeah, we just observed something similar.

Here are some numbers for the write speed of the clients:

linux-2.6.7:
async: 11MB/s
sync, wdelay: 2-3MB/s
sync, no_wdelay: 7MB/s

linux-2.4.27-rc3:
async: 11MB/s
sync, wdelay: 2-3MB/s
sync, no_wdelay: 2-3MB/s
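For anyone wanting to reproduce numbers like these, a minimal write-throughput probe (a sketch: /tmp stands in for the NFS mount point, the 10 MB size is arbitrary, and conv=fsync is GNU dd):

```shell
# Write 10 MB and force it to stable storage before dd exits,
# so any timing reflects committed data rather than cached data.
dd if=/dev/zero of=/tmp/nfstest.dat bs=1M count=10 conv=fsync 2>/dev/null
bytes=$(wc -c < /tmp/nfstest.dat)
echo "$bytes bytes written"
rm -f /tmp/nfstest.dat
```

Run with `time` against a file on the NFS mount to get MB/s figures comparable to the ones above.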

The kernel version corresponds to the server kernel; all clients still run
2.4.26.

Unfortunately, switching the server to 2.6.7 makes it crash every
morning when the cron jobs run, with page allocation errors, so keeping
2.6.x is currently not an option. At least not as long as those errors
seem to be tolerated by the kernel maintainers :(

Any ideas what's causing the sync vs. async difference in 2.4.x?

Thanks,
Bernd




--
Bernd Schubert
Physikalisch Chemisches Institut / Theoretische Chemie
Universität Heidelberg
INF 229
69120 Heidelberg
e-mail: [email protected]





2004-07-26 17:05:51

by Bernd Schubert

[permalink] [raw]
Subject: Re: async vs. sync

On Monday 26 July 2004 18:26, you wrote:
> > Any ideas whats the issue with sync vs async in 2.4.x?
>
> <Sonorus>
> Read The FAQ, Luke
> </Sonorus>
>
> the async export option opens a window that allows silent data
> corruption. did you have a specific question about it?

Exactly, and therefore I would like to enable the sync export option again.
However, reducing the speed to about 1/5 of the async speed for large files,
and to about 1/20 for smaller files, is not an option. People have already
complained that compiling their projects takes ages now...

D'oh, sorry, I forgot to mention that we use tcp mounts and NFSv3. So those
numbers really shouldn't happen, should they?


Cheers,
Bernd



2004-07-26 19:48:07

by Jan Bruvoll

[permalink] [raw]
Subject: Re: async vs. sync

Bernd Schubert wrote:

> Exactly, therefore I would like to enable the sync-export option again.
>
>However, reducing the speed to about 1/5 of async speed for large files and
>to about 1/20 for smaller files is not an option. The people already
>complained that compiling their projects takes ages now...
>
>D'oh, sorry, I forgot to write that we use the tcp-mounts and nfs3. So those
>numbers really shouldn't happen, should they?
>
>

Hi Bernd,

are you doing anything "interesting" underneath NFS, i.e. what are you
storing your files on?

My set-up has about the same performance data; however, my big problem is
that the server dies when I hook all my clients up... Raw throughput
seemed OK, but when you started complaining about the speed, I started
thinking that my figures weren't all that great anymore.

The set-up is a dual 3GHz Xeon, 2GB RAM, and a DRBD-mirrored partition
sitting on a 3Ware Escalade 7500 with 7200rpm disks. Kernel
2.4.26-gentoo-r5, everything else the newest version. The annoying thing is
that the server this new set-up is replacing is an old P3/650 that
easily copes with the load... :-\

Best regards
Jan




2004-07-26 21:28:20

by Bernd Schubert

[permalink] [raw]
Subject: Re: async vs. sync

On Monday 26 July 2004 19:21, you wrote:
> sorry i'm not copying the list... our local exchange server changed my
> outgoing To: address a couple of weeks ago, and i still haven't gotten
> the list moderator's attention to fix up my e-mail address.

No problem, I will send a full quote to the list.

>
> > Exactly, therefore I would like to enable the sync-export
> > option again.
> > However, reducing the speed to about 1/5 of async speed for
> > large files and
> > to about 1/20 for smaller files is not an option. The people already
> > complained that compiling their projects takes ages now...
> >
> > D'oh, sorry, I forgot to write that we use the tcp-mounts and
> > nfs3. So those
> > numbers really shouldn't happen, should they?
>
> if the server is working correctly, theoretically this should not
> happen.
>
> are you using strange mount options on the client, such as "noac" or
> "sync" ? "async" on the client is perfectly OK to use; it's the server
> side export option that is badness.

No, I think not. Here's the fstab line for /home:
hamilton:/home /home nfs rsize=8192,wsize=8192,hard,intr,tcp 0 0

Here's something that I find pretty interesting (since I don't completely
understand what's causing it):

1.) The export option is changed on the server from sync to async,
'exportfs -rav' is executed.

2.) One client had mounted with the async mount option, all other clients
without it. No client remounted /home after the export option was
changed on the server.

3.) All but that one client can now write to the server at 5MB/s for large
*and* small files. However, the client which had mounted with the async
option can now write at the full 100MBit (i.e. 11MB/s) to the server.

4.) Of course, remounting /home on any client, even without the
async mount option, now gives back full write speed on all clients.

Is the behaviour in 3) intended? Actually I thought that a remount
on the clients would be necessary to regain full write speed.

Thanks,
Bernd



2004-07-26 22:06:18

by Bernd Schubert

[permalink] [raw]
Subject: Re: async vs. sync

> are you doing anything "interesting" underneath NFS, ie. what are you
> storing your files on?

It's similar to your configuration ;)

> My set-up has about the same performance data, however my big problem is
> that the server dies when I hook all my clients up... Raw throughput

Fortunately that doesn't seem to happen in our case.

> seemed ok, but when you started complaining about the speed, I started
> thinking that my figures weren't all that great anymore.

At least it's good to know that we are not alone. Well, I have known about the
performance decrease of sync exports for a pretty long time, but I was never
sure whether it wasn't just a problem of our server (the previous one was a
PII-450).

>
> The set-up is a dual 3Ghz Xeon, 2Gb RAM, and a DRBD-mirrored partition
> sitting on a 3Ware Escalade 7500 to 7200rpm disks. Kernel

Here it's a dual Opteron, 3GB RAM, Adaptec 79xx PCI-X SCSI, Transtec SCSI-IDE
RAID. Local disk I/O is over 70MB/s. The server is connected with GBit, the
clients only with 100MBit. Here are some performance numbers from when the
server was still running 2.6.7:

writing to /worka (async exported): 4 clients, all at full 11MB/s

writing to /home, (sync,no_wdelay exported): 4 clients at 7MB/s

(When I did the tests I was simply too lazy to test with more clients. Actually
I thought that 4 times 7MB/s is more than sufficient for usual work.)

The /home partition is also mirrored via drbd to a failover server; /worka is
not mirrored (mirroring 1.7TB is not as easy as mirroring 200GB ;) ).

So I really don't think that the server performance is the problem.

> 2.4.26-gentoo-r5, everything else newest version. The annoying thing is
> that the server this new set-up is replacing is an old P3/650 that
> easily copes with the load... :-\

Well, as I said, when I tested our old server with sync mounts, it had the
same problem. Are you sure that your old server did not export asynchronously?

Cheers,
Bernd



2004-07-26 23:05:22

by John Roberts

[permalink] [raw]
Subject: Re: async vs. sync


>At least its good to know that we are not alone. Well, I do know about the
>performance decrease of sync-exports for a pretty long time, but I was never
>sure if its not only a problem of our server (the previous one was a
>PII-450).

We definitely see the same problem of sync exports being _much_ slower (4x to
7x-ish) on our network (Dell 340 workstations with 2.2 GHz Pentium IVs).

We've switched to using async when serving up our Linux volumes.


>So I really don't think that the server performance is the problem.

I absolutely agree. In fact, I've observed that Linux clients
are _faster_ writing to slower Solaris servers than fast ones.

A 400 MHz Sun Ultra-5 I was using as a server was 4x _faster_
than a 1.5 GHz SunBlade 2500 system.

My observation is the faster the Solaris server, the slower
the Linux NFS client (writes only).

I don't know if the whole Linux-to-Solaris write issue
can be explained by the number of COMMIT requests.
Solaris clients only issue a COMMIT request when the
file is closed or when the client's buffer cache is
flushed by memory-management operations (like pageout).

However, the Linux client issues one COMMIT for every 32
writes; with a 32K block size, that's one COMMIT
for every 1MB of data.
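The arithmetic behind that last figure, as a quick sanity check (numbers taken from the paragraph above):

```shell
# One COMMIT per 32 WRITEs, each WRITE carrying a 32 KB block:
wsize_kb=32
writes_per_commit=32
kb_per_commit=$(( wsize_kb * writes_per_commit ))
echo "${kb_per_commit} KB per COMMIT"   # 1024 KB, i.e. 1 MB
```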

Those extra COMMITs must affect performance, but I still
think there's something else going on.



John Roberts

[email protected]





2004-07-27 12:00:22

by Jan Bruvoll

[permalink] [raw]
Subject: Re: async vs. sync

Bernd Schubert wrote:

>>are you doing anything "interesting" underneath NFS, ie. what are you
>>storing your files on?
>>
>>
>
>Its similar to your configuration ;)
>
>

Good :-)

>>My set-up has about the same performance data, however my big problem is
>>that the server dies when I hook all my clients up... Raw throughput
>>
>>
>
>Fortunately that doesn't seem to happen in our case.
>
>

Let me clarify - the server doesn't quite die, but it slows down
drastically, with a load of ~7 while 99% CPU idle. The clients clog
up, and for instance trying to mount anything just times out. If I
unmount the busy share from the clients, everything settles down again.

>>seemed ok, but when you started complaining about the speed, I started
>>thinking that my figures weren't all that great anymore.
>>
>>
>
>At least its good to know that we are not alone. Well, I do know about the
>performance decrease of sync-exports for a pretty long time, but I was never
>sure if its not only a problem of our server (the previous one was a
>PII-450).
>
>
>
>>The set-up is a dual 3Ghz Xeon, 2Gb RAM, and a DRBD-mirrored partition
>>sitting on a 3Ware Escalade 7500 to 7200rpm disks. Kernel
>>
>>
>
>Here its a dual opteron, 3GB RAM, adaptec 79xx PCI-X scsi, transtec scsi-ide
>raid. Local disk i/o is over 70MB/s. The server is connected with GBit, the
>clients only with 100MBit. Here are some performace numbers when the server
>was still running 2.6.7:
>
>writing to /worka (async exported): 4 clients, all at full 11MB/s
>
>writing to /home, (sync,no_wdelay exported): 4 clients at 7MB/s
>
>(When I did the tests I was simply to lazy to test with more clients. Actually
>I thought that 4 times 7 MB/s is more than sufficient for usual work).
>
>The /home partition is also mirrored via drbd to a failover server, /worka is
>not mirrored (mirroring 1.7TB is not that easy as mirroring 200GB ;) ).
>
>So I really don't think that the server performance is the problem.
>
>

I tried a little more performance number hunting:
- NFS copy and write to same device - 3Mb/s sustained (500Mb file)
- piozone direct write to RAID device - ~34-50Mb/s (depending on file size)
- drbd network device - 933Mb/s

Nothing really wrong here I'd say - not blazingly quick, but surely
nothing to explain why everything just stops when I hook up my 12
clients to this set-up?

>>2.4.26-gentoo-r5, everything else newest version. The annoying thing is
>>that the server this new set-up is replacing is an old P3/650 that
>>easily copes with the load... :-\
>>
>>
>
>Well, as I said, when I tested our old server with sync mounts, it had the
>same problem. Are you sure that your old server did not export asynchronous?
>
>

Doesn't seem so, no - the /etc/exports file is -extremely- simple:

/export client1(rw,no_root_squash)

Similarly, I'm not trying to be clever on the client side (/etc/fstab
excerpt):

server:/export /export nfs rw,hard,intr 0 0

This set-up works wonderfully on the old 2.4.17 box, but grinds to a
halt on the new ones :-(

Regards
Jan




2004-07-27 13:00:53

by Bernd Schubert

[permalink] [raw]
Subject: Re: async vs. sync

> >>My set-up has about the same performance data, however my big problem is
> >>that the server dies when I hook all my clients up... Raw throughput
> >
> >Fortunately that doesn't seem to happen in our case.
>
> Let me clarify - the server doesn't quite die, however it slows down
> drastically, having a load of ~7 while 99% CPU idle. The clients clog
> up, and for instance trying to mount anything just times out. If I
> unmount the busy share from the clients, everything settles down again.

We noticed a higher load with sync mounts: usually we have 0-2, but with 2.4
and sync mounts it went up to 3-4. But we also only used sync mounts for less
than 24 hours, so we changed to async mounts as soon as people complained.

[snip]

>
> I tried a little more performance number hunting:
> - NFS copy and write to same device - 3Mb/s sustained (500Mb file)
> - piozone direct write to RAID device - ~34-50Mb/s (depending on file
> size) - drbd network device - 933Mb/s
>
> Nothing really wrong here I'd say - not blazingly quick, but surely
> nothing to explain why everything just stops when I hook up my 12
> clients to this set-up?

Certainly not. We have 45 diskless booting clients.

[snip]

> >Well, as I said, when I tested our old server with sync mounts, it had the
> >same problem. Are you sure that your old server did not export
> > asynchronous?
>
> Doesn't seem so, no, the /etc/export is -extremely- simple:
>
> /export client1(rw,no_root_squash)

Which nfs-utils version is running on that server? With 1.0 the default
changed from async to sync. I guess you only did security updates from your
distribution, and so <=0.3 is installed.

With 2.6.7 it helped us a lot to set the no_wdelay option; maybe you should
try it as well.

>
> Similarly, I'm not trying to be clever on the client side (/etc/fstab
> excerpt):
>
> server:/export /export nfs rw,hard,intr 0 0

You should make sure that rsize and wsize are set to 8192 (cat /proc/mounts
shows the current values). Also, just try tcp mounts; if you have
an asymmetric net (e.g. the server and switch have a GBit interface, but
the clients only 100MBit) this is necessary anyway. The same applies if the
clients or server are using linux-2.6.x.


Cheers,
Bernd



2004-07-27 13:58:01

by Ian Kent

[permalink] [raw]
Subject: Re: async vs. sync

On Tue, 27 Jul 2004, Bernd Schubert wrote:

> > >>My set-up has about the same performance data, however my big problem is
> > >>that the server dies when I hook all my clients up... Raw throughput
> > >
> > >Fortunately that doesn't seem to happen in our case.
> >
> > Let me clarify - the server doesn't quite die, however it slows down
> > drastically, having a load of ~7 while 99% CPU idle. The clients clog
> > up, and for instance trying to mount anything just times out. If I
> > unmount the busy share from the clients, everything settles down again.
>
> We noticed a higher load with sync-mounts, usually we have 0-2, with 2.4. and
> sync-mounts it went until 3-4. But we also only used sync mounts for less
> than 24, so changed to async mounts as soon as the people complained.
>
> [snip]
>
> >
> > I tried a little more performance number hunting:
> > - NFS copy and write to same device - 3Mb/s sustained (500Mb file)
> > - piozone direct write to RAID device - ~34-50Mb/s (depending on file
> > size) - drbd network device - 933Mb/s
> >
> > Nothing really wrong here I'd say - not blazingly quick, but surely
> > nothing to explain why everything just stops when I hook up my 12
> > clients to this set-up?
>
> Certainly not. We have 45 diskless booting clients.

How many threads is the server running?

Ian




2004-07-27 14:04:40

by Jan Bruvoll

[permalink] [raw]
Subject: Re: async vs. sync

Hi Ian,

[email protected] wrote:

> How many threads is the server running?


In my desperation I bumped it all the way up to 128, which is about 50
more than the number of simultaneous mounts.
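For reference, the running thread count can be read from the "th" line of /proc/net/rpc/nfsd. A sketch parsing a sample line (the field layout is assumed from 2.4-era knfsd; on a real server you would read the file directly):

```shell
# The first field after "th" is the number of nfsd threads; the
# remaining fields are thread-utilisation histogram buckets.
th_line='th 128 0 1.230 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000'
threads=$(echo "$th_line" | awk '$1 == "th" { print $2 }')
echo "$threads nfsd threads"
```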

Regards
Jan




2004-07-27 14:11:42

by Jan Bruvoll

[permalink] [raw]
Subject: Re: async vs. sync

Hi again,

Bernd Schubert wrote:

> Certainly not. We have 45 diskless booting clients.

I should maybe add that these 12 clients are a web server farm for one
of the largest national web sites. However, raw throughput is not the
problem.

Weirder still, the server seems to lose interest every now and
then - I've been running tests with 2-3 clients and everything is OK,
then I add another client or incur some write activity on the shared
disk and suddenly the server just goes quiet - the clients hang, the
server does nothing (not even disk writes), and it continues like that
for 10-15 secs; then everything clears up again until the next clog-up
30 secs down the line.

Are there any known issues with DRBD, perhaps? I didn't think it would be
such a throughput hog.

> Which nfs-utils version is running on that server? With 1.0 the
> default changed from async to sync. I guess you only did security
> updates from your distribution and so there's <=0.3 installed


This is of course highly interesting. You're quite correct, our old
server runs nfs-utils-0.3.1.

>With 2.6.7 it helped us a lot to give the no_wdelay option, maybe you should
>try it as well.
>
>

Haven't made the big leap to 2.6.x yet, especially not after reading
your own comment on stability issues with 2.6.7.

>You should make sure, that rsize and wsize are set to 8192 (cat /proc/mounts
>tells you the current value). Also, just try to use tcp-mounts, if you have
>an asynchronous net (e.g. the server and switch have a GBit interface, but
>the clients only 100MBit) this is necessary anyway. The same if the clients
>or server are using linux-2.6.x.
>
>

Ok - will try these things too.

Thanks
Jan




2004-07-27 15:07:41

by Bernd Schubert

[permalink] [raw]
Subject: Re: async vs. sync

On Tuesday 27 July 2004 16:16, you wrote:
> bernd-
>
> you didn't mention how many server processes were running. if the
> server load goes up to 8, then i would guess you have only 8. you
> should probably boost it to a much larger number, like 128 or more.

Currently 64; top usually shows a usage between 1 and 10, but not more. I can
try increasing it to 128 or more in the evening to see if it helps to speed up
sync mounts.


Cheers,
Bernd




2004-07-28 08:59:35

by Olaf Kirch

[permalink] [raw]
Subject: Re: async vs. sync

Hi,

the way the sync export option affects NFSv3 writes is limited to
COMMITs, so if you see a slow-down here it must be bottle-necking in
that part of the code.

Quite possibly, this is a problem of the underlying file system.
You're using a journaled file system, right? So what seems to happen
is that on every n-th commit call or so all nfsd processes stall
as the file system tries to write its journal. Note that the
VM currently allows dirty data to accumulate for up to 30 seconds
before it is forcibly written to disk (dirty_expire_centisecs sysctl).

A good way to simulate this is to run several iozone processes on the
server and tell them to sync() every 1 MB or so.

iozone -s 1g -r 1m -o -i 0

This takes NFS out of the equation.

Maybe it would help to play with the dirty writeback strategy, e.g. by
lowering /proc/sys/vm/dirty_writeback_centisecs (to e.g. 250), increasing
dirty_background_ratio, or lowering dirty_ratio.

It may also be interesting to compare ext3 vs reiser here.

Olaf
--
Olaf Kirch | The Hardware Gods hate me.
[email protected] |
---------------+



2004-07-28 12:36:07

by Bernd Schubert

[permalink] [raw]
Subject: Re: async vs. sync

Hello Olaf,

> the way the sync export option affects NFSv3 writes is limited to
> COMMITs, so if you see a slow-down here it must be bottle-necking in
> that part of the code.
>
> Quite possibly, this is a problem of the underlying file system.
> You're using a journaled file system, right? So what seems to happen
> is that on every n-th commit call or so all nfsd processes stall
> as the file system tries to write its journal. Note that the
> VM currently allows dirty data to accumulate for up to 30 seconds
> before it is forcibly written to disk (dirty_expire_centisecs sysctl).
>
> A good way to simulate this is to run several iozone processes on the
> server and tell them to sync() every 1 MB or so.
>
> iozone -s 1g -r 1m -o -i 0

I just did some tests:

/home: 10MB/s (mirrored via drbd)
/worka: 60MB/s (not mirrored)

The filesystem is reiserfs in both cases, but it seems drbd has a terrible
performance problem here. I also tested writing to a drbd-mirrored ext2
partition, and it has the same problem, so I think this is filesystem
independent.

Well, during the testing period of our server I also tested the drbd
performance, but unfortunately I did most tests with linux-2.6.7, where I got
more than 30MB/s over drbd. I also did some tests with 2.4, but that was with
drbd-0.6.12; now we are using drbd-0.7.0. As far as I remember, the numbers
with drbd-0.6 + linux-2.4 were similar to or even faster than
drbd-0.7 + linux-2.6.

>
> This takes NFS out of the equation.

It seems you are right; it's not an NFS issue.

>
> Maybe it would help to play with the dirty writeback strategy, e.g. by
> lowering /proc/sys/vm/dirty_writeback_centisecs (to e.g. 250), increasing
> dirty_background_ratio or lowering vm_dirty_ratio.

Well, I only see these tunables on 2.6 systems, not on 2.4; are there similar
triggers in 2.4? With 2.6 we don't have the problem at all.


Thanks a lot for pointing me in the right direction!


Best regards,
Bernd




2004-07-28 12:54:15

by Olaf Kirch

[permalink] [raw]
Subject: Re: async vs. sync

On Wed, Jul 28, 2004 at 02:35:51PM +0200, Bernd Schubert wrote:
> Well, I only see these tunables on 2.6 systems, not on 2.4; are there
> similar triggers in 2.4? With 2.6 we don't have the problem at all.

On 2.4 you'd have to play with the bdflush tunables. I can't
remember what all those numbers mean; consult the documentation.
But there are also some tunables there that let you configure
when bdflush starts writing out dirty pages.

Olaf
--
Olaf Kirch | The Hardware Gods hate me.
[email protected] |
---------------+



2004-11-22 15:37:01

by Olaf Kirch

[permalink] [raw]
Subject: Re: async vs. sync

On Tue, Nov 16, 2004 at 10:48:12AM -0800, Lever, Charles wrote:
> i'm just looking for clarification so i can provide a good explanation
> in the Linux NFS FAQ about the evils of using "async." i'll cruise
> through the server code.

Just about the only reason for async I can think of is if you have an
incoming data stream you need to write at a constant rate (think of a
diskless set-top box writing an MPEG-2 stream).

Olaf
--
Olaf Kirch | Things that make Monday morning interesting, #2:
[email protected] | "We have 8,000 NFS mount points, why do we keep
---------------+ running out of privileged ports?"


-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://productguide.itmanagersjournal.com/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2004-11-22 17:55:41

by jehan.procaccia

[permalink] [raw]
Subject: Re: async vs. sync

Olaf Kirch wrote:

> On Tue, Nov 16, 2004 at 10:48:12AM -0800, Lever, Charles wrote:
>
>> i'm just looking for clarification so i can provide a good explanation
>> in the Linux NFS FAQ about the evils of using "async." i'll cruise
>> through the server code.
>
> Just about the only reason for async I can think of is if you have an
> incoming data stream you need to write at a constant rate (think of a
> diskless set top box writing an mpeg2 stream)
>
> Olaf
OK, but using sync at my site is really, really slow compared to async!
Here's a detailed demonstration (options printed) of an untar
operation that takes 13 minutes in sync mode and only 14 seconds in
async mode!!

1) Export in sync mode
NFS server (RedHat ES3 kernel 2.4.21-4.ELsmp) options for that export:
$ cat /proc/fs/nfs/exports | grep arvouin
/p2v5f1 arvouin.int-evry.fr(rw,no_root_squash,sync,wdelay,acl) #
157.159.21.55
$ cat /var/lib/nfs/xtab | grep arvouin
/p2v5f1
arvouin.int-evry.fr(rw,sync,wdelay,hide,nocrossmnt,secure,no_root_squash,no_all_squash,subtree_check,secure_locks,acl,mapping=identity,anonuid=-2,anongid=-2)

Client running Fedora Core 2, kernel 2.6.8-1.521
[root@arvouin /mnt/cobra3/mci/test/Test-sync]
$cat /proc/mounts
cobra3:/p2v5f1 /mnt/cobra3 nfs
rw,v3,rsize=8192,wsize=8192,hard,tcp,lock,addr=cobra3 0 0
$time tar xvfz /usr/src/redhat/SOURCES/httpd-2.0.51.tar.gz
real 13m3.686s
user 0m1.055s
sys 0m4.354s

2) Export in async mode:
Same NFS server, options for that export:
$ cat /proc/fs/nfs/exports | grep arvouin
/p2v5f1 arvouin.int-evry.fr(rw,no_root_squash,async,wdelay,acl) #
157.159.21.55
$ cat /var/lib/nfs/xtab | grep arvouin
/p2v5f2
arvouin.int-evry.fr(rw,async,wdelay,hide,nocrossmnt,secure,no_root_squash,no_all_squash,subtree_check,secure_locks,acl,mapping=identity,anonuid=-2,anongid=-2)

Same client running Fedora Core 2, kernel 2.6.8-1.521
cobra3:/p2v5f1 /mnt/cobra3 nfs
rw,v3,rsize=8192,wsize=8192,hard,tcp,lock,addr=cobra3 0 0
[root@arvouin /mnt/cobra3/mci/test/Test-sync]
$time tar xvfz /usr/src/redhat/SOURCES/httpd-2.0.51.tar.gz
real 0m14.802s
user 0m0.867s
sys 0m2.886s

My users won't accept the sync performance! I seem to have no choice, but is
running in async mode really as evil as you mentioned? Is there a way to
get better performance with sync in my case? Has anyone seen the same
gap in performance as me (here, 55 times longer in sync mode)?
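For reference, the ratio implied by the timings above is about 53, in the
ballpark of the 55 quoted:

```python
# Timings taken from the two tar runs earlier in this message.
sync_seconds = 13 * 60 + 3.686    # real time with the sync export (13m3.686s)
async_seconds = 14.802            # real time with the async export (0m14.802s)

ratio = sync_seconds / async_seconds
print(f"sync export is {ratio:.1f}x slower")   # sync export is 52.9x slower
```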

Thanks.



2004-11-22 18:07:51

by Roger Heflin

[permalink] [raw]
Subject: RE: async vs. sync


This might be the issue (and someone correct this if I am incorrect);
I know I ran into it a couple of years ago, and it is not the easiest
thing to understand exactly what is actually going on.

There are 2 places where you can put sync and async: one is the exports
and one is on the mount command. They are different.

You want sync on the exports; this will allow a client to survive without
data loss if the server reboots. You want async on the client mount end,
and this will generally give you the speed. With async on the client end,
the client keeps track of what is outstanding, so if the server crashes
you won't lose data. With async on both ends, the server tells the client
that the data is safe when it is not, and if the server crashes the
client thinks the data was safe when it really was not.

If you put sync in both locations then your NFS disk is fully synced and
the application won't even start another write until the last one is
confirmed finished and on the actual disk. With async on the client end
the next write will start before the client has received an ack from the
server, and this will be reasonably fast.

So basically:

exports   mount
sync      sync    -> really safe and really slow
sync      async   -> safe and fast
async     either  -> unsafe and fast
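The difference between the rows in that table comes down to how often data
must reach stable storage before the next operation proceeds. A toy
local-disk sketch (no NFS involved; the file count and size are arbitrary)
contrasting deferred flushing with an fsync() per file, which roughly models
what a sync export does for each client COMMIT:

```python
import os
import time
import tempfile

def create_files(dirpath, n=100, fsync_each=False):
    """Create n small files; optionally fsync() each one before closing,
    roughly what a sync export does for every client COMMIT."""
    start = time.monotonic()
    for i in range(n):
        fd = os.open(os.path.join(dirpath, f"f{i:03d}"),
                     os.O_WRONLY | os.O_CREAT, 0o644)
        os.write(fd, b"x" * 4096)
        if fsync_each:
            os.fsync(fd)
        os.close(fd)
    return time.monotonic() - start

with tempfile.TemporaryDirectory() as d1, tempfile.TemporaryDirectory() as d2:
    t_deferred = create_files(d1, fsync_each=False)
    t_per_file = create_files(d2, fsync_each=True)

print(f"deferred flushing: {t_deferred:.3f}s  fsync per file: {t_per_file:.3f}s")
```

On a real disk the per-file fsync run is typically much slower, which is the
same penalty an untar pays against a sync export.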

Running async exports and async mount did not appear (under my testing) to
be faster under a sustained load than sync exports and async mount. When
the initial test was started, async/async was faster, but that quickly
changed once the buffer cache filled up.

Roger






2004-11-22 18:09:29

by Trond Myklebust

[permalink] [raw]
Subject: Re: async vs. sync

On 22.11.2004 at 18:55 (+0100), jehan.procaccia wrote:

> My users won't accept the sync performances !

Will they accept data loss in case of a server crash? If they don't
care, then hey - you're safe...

BTW: I've never seen a slowdown as bad as that. Just out of curiosity,
what happens if you turn off ACL support and/or use a client from
ftp.kernel.org, with no ACL support?

(NOTE: the reason I've rejected ACLs so far in the mainline client is
the lack of support for caching in the patches I've seen so far)

Cheers,
Trond
--
Trond Myklebust <[email protected]>




2004-11-22 18:32:17

by Lever, Charles

[permalink] [raw]
Subject: RE: async vs. sync

> If you put sync in both locations then your NFS disk is fully
> synced and the application won't even start another write until the
> last one is confirmed finished and on the actual disk. With async on
> the client end the next write will start before the client has
> received an ack from the server, and this will be reasonably fast.
>
> So basically:
>
> exports   mount
> sync      sync    -> really safe and really slow
> sync      async   -> safe and fast
> async     either  -> unsafe and fast
>
> Running async exports and async mount did not appear (under
> my testing) to be faster under a sustained load than did sync
> exports and async mount. When the initial test was started
> async/async was faster but that quickly changed once the buffer
> cache filled up.

the "async" setting on the client side is the default. using the "sync"
mount option on the client side will be slow no matter what.

the recommended settings are:

1. don't use the "sync" mount option on the client unless your
application requires it

2. always use the "sync" export option (with newer nfs-utils, this is
the default).
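As a concrete sketch of those two recommendations, reusing the paths and
hostnames from the examples earlier in this thread as placeholders:

```shell
# Server side, /etc/exports: spell out "sync" explicitly.
/p2v5f1  arvouin.int-evry.fr(rw,sync,wdelay,no_root_squash)

# Client side: do NOT pass "-o sync"; the client default is asynchronous
# writeback, which is what you want for speed.
mount -t nfs -o rw,hard,tcp,rsize=8192,wsize=8192 cobra3:/p2v5f1 /mnt/cobra3
```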



2004-11-22 18:47:13

by jehan.procaccia

[permalink] [raw]
Subject: Re: async vs. sync

OK, thanks for the explanation. So now I try:

exports   mount
sync      async   -> safe and fast

[root@arvouin ~]
$mount cobra3:/p2v5f1/ -o async /mnt/cobra3

Unfortunately the async mount option doesn't show up in /proc/mounts,
so I am not sure my client is using async; how can I check that?

[root@arvouin ~]
$cat /proc/mounts
cobra3:/p2v5f1/ /mnt/cobra3 nfs
rw,v3,rsize=8192,wsize=8192,hard,tcp,lock,addr=cobra3 0 0

Anyway, I untarred my httpd again; I'm late, so I didn't wait the 13
minutes or so, but it started out as slow as before :-(

Now on the server I removed ACLs (never asked for them though !?)
$ cat /proc/fs/nfs/exports | grep arvouin
/p2v5f1 arvouin.int-evry.fr(rw,no_root_squash,sync,wdelay,no_acl) #
157.159.21.55
Again I cannot check on the client that this option is removed. Anyway,
I untar my httpd again; I'm still late ... but seeing each file (tar -v!)
showing up every second or so in the tty tells me that it will take at
least 10 minutes ... :-(

Any other idea? What is wrong in my config?

thanks.








2004-11-22 19:02:46

by Lever, Charles

[permalink] [raw]
Subject: RE: async vs. sync

> in my old solaris 7 nfs it was async I suppose ! although we
> didn't lose data, maybe we were risking it ... ?

the Solaris NFS server doesn't support an "async" mode.

> anyway now I move from an old solaris NFS server to a brand new
> linux one with a SAN (AX100) Storage Processor in RAID 5 and Fibre
> Channel attachment; how can I justify performance more than 50 times
> slower :-( with that config? users and managers will tell me that I
> wasted money on that new server! there must be a misconfiguration
> somewhere?

take a walk through the NFS how-to http://nfs.sourceforge.net/howto/ to
see if there is anything useful there.

also, you could measure your file system performance locally on the NFS
server instead of via an NFS client, to see if your RAID and local file
system are configured correctly.

finally, you should look at raw network performance between your clients
and server to make sure you are not losing any performance there.
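For the local measurement, a crude streaming-write probe along these lines
(run on the server against the exported file system; the 32 MB size is
arbitrary and far too small for a serious benchmark, and dd or bonnie would
do the same job) gives a first number to compare against the NFS figures:

```python
import os
import time
import tempfile

def streaming_write_rate(path, total_mb=32):
    """Rough streaming-write rate in MB/s, with a final fsync() so the
    page cache cannot hide the disk."""
    buf = b"\0" * (1024 * 1024)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    start = time.monotonic()
    for _ in range(total_mb):
        os.write(fd, buf)
    os.fsync(fd)
    os.close(fd)
    return total_mb / (time.monotonic() - start)

with tempfile.TemporaryDirectory() as d:
    rate = streaming_write_rate(os.path.join(d, "probe.dat"))

print(f"{rate:.1f} MB/s")
```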



2004-11-22 18:57:59

by jehan.procaccia

[permalink] [raw]
Subject: Re: async vs. sync

Trond Myklebust wrote:

> On 22.11.2004 at 18:55 (+0100), jehan.procaccia wrote:
>
>> My users won't accept the sync performance!
>
> Will they accept data loss in case of a server crash? If they don't
> care, then hey - you're safe...
>
In my old Solaris 7 NFS setup it was async, I suppose! Although we didn't
lose data, maybe we were risking it ... ? Anyway, now I move from an old
Solaris NFS server to a brand new Linux one with a SAN (AX100) Storage
Processor in RAID 5 and Fibre Channel attachment; how can I justify
performance more than 50 times slower :-( with that config? Users and
managers will tell me that I wasted money on that new server! There
must be a misconfiguration somewhere?

> BTW: I've never seen a slowdown as bad as that. Just out of curiosity,
> what happens if you turn off ACL support and/or use a client from
> ftp.kernel.org, with no ACL support?
>
I did it in the server export options (no_acl), with no better performance.
However, I don't know why I have ACLs; I never asked for them !?
/etc/fstab on the server:
/dev/emcpowerp1 /p2v5f1 ext3 defaults,usrquota,grpquota 1 2

> (NOTE: the reason I've rejected ACLs so far in the mainline client is
> the lack of support for caching in the patches I've seen so far)
>
> Cheers,
>   Trond




2004-11-22 19:05:14

by Roger Heflin

[permalink] [raw]
Subject: RE: async vs. sync


Solaris has always been sync on the export and async on the mount;
that is, unless you had a Prestoserve-type module (which made some
kernel adjustments), once available for the servers, which backed the
memory NFS was using as cache with a battery. Without that hardware
there was no way to make Solaris async.

Roger





2004-11-22 19:10:30

by Roger Heflin

[permalink] [raw]
Subject: RE: async vs. sync


After looking a bit closer, I am not sure that there is anything
wrong with your mount options. The one thing I would suggest
is changing your rsize/wsize block size and seeing what that
does to your performance. You should be able to go as high
as 32k.

Your test may also be somewhat misleading, as it performs a lot of
file creates, and those take a lot of time no matter how you do
them. Do the normal user applications do a lot of file
creates/opens, or do they do a lot of large file writes? How many
files are in that tar file? And what is their average size? File
creates and file opens are one place where NFS tends to show a
large difference in speed.
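Roger's two questions about the tarball can be answered mechanically; here is
a small sketch using Python's tarfile module, with a tiny in-memory archive
standing in for httpd-2.0.51.tar.gz:

```python
import io
import statistics
import tarfile

def tar_profile(tar_bytes):
    """Return (file_count, average_size) for the regular files in a tar archive."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes)) as tf:
        sizes = [m.size for m in tf.getmembers() if m.isfile()]
    return len(sizes), (statistics.mean(sizes) if sizes else 0)

# Build a tiny in-memory archive standing in for the real tarball.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tf:
    for i in range(5):
        data = b"x" * (100 * (i + 1))
        info = tarfile.TarInfo(name=f"httpd/file{i}.c")
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))

count, avg_size = tar_profile(buf.getvalue())
print(count, avg_size)   # 5 files, 300 bytes on average
```

Pointed at the real httpd tarball, a count in the thousands with small
average sizes would confirm that the workload is create-dominated.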

Roger

> -----Original Message-----
> From: jehan.procaccia [mailto:[email protected]]
> Sent: Monday, November 22, 2004 12:47 PM
> To: Roger Heflin
> Cc: 'Olaf Kirch'; [email protected];
> [email protected]; [email protected]
> Subject: Re: [NFS] async vs. sync
>
> OK , thanks for the explanation . So now I try:
>
> exports mount
> sync async -> Safe and fast
>
> [root@arvouin ~]
> $mount cobra3:/p2v5f1/ -o async /mnt/cobra3
>
> unfortunatly the async mount option doesn't shows up in
> /proc/mounts, so I am not sure my client is using async, how
> can I check that ?
>
> [root@arvouin ~]
> $cat /proc/mounts
> cobra3:/p2v5f1/ /mnt/cobra3 nfs
> rw,v3,rsize=8192,wsize=8192,hard,tcp,lock,addr=cobra3 0 0
>
> Anyway, I untar again my httpd, I'am late so I didn't wait 13
> minute or so but it started to be as long as it was :-(
>
> Now on the server I removed ACL (nerver asked for it though
> !?) $ cat /proc/fs/nfs/exports | grep arvouin
> /p2v5f1 arvouin.int-evry.fr(rw,no_root_squash,sync,wdelay,no_acl) #
> 157.159.21.55
> again I cannot check on the client that this option is
> removed ?, anyway again I untar my httpd, I'am still late
> ... but seeing each file (tar -v !) shouwing up every second
> or so in the tty tells me that it will take at least 10
> minutes ... :-(
>
> Any other idea ? what is wrong in my config ?
>
> thanks.
>
>
>
>
> Roger Heflin wrote:
>
> >This might be the issue, (and someone correct this if I am
> incorrect),
> >I know I ran into it a couple of years ago, and it is not
> the easiest
> >to understand exactly what is actually going on.
> >
> >There are 2 places where you can put sync and async: one is the exports
> >and one is on the mount command. They are different.
> >
> >You want sync on the exports; this will allow a client to survive without
> >data loss if the server reboots. You want async on the client mount end,
> >and this will generally give you the speed. With async on the client end
> >the client is keeping track of what is outstanding if the server
> >crashes, so you won't lose data. With async on both ends the server
> >tells one that the data is safe when it is not, and if the server
> >crashes the client thinks that the data was safe when it really was not.
> >
> >If you put sync in both locations then your NFS disk is fully synced
> >and the application won't even start another write until the last one
> >is confirmed finished and on the actual disk. With async on the client
> >end the next write will start before the client has received an ack from
> >the server, and this will be reasonably fast.
> >
> >So basically:
> >
> >exports mount
> >sync sync -> really safe and really slow
> >sync async -> Safe and fast
> >async either -> unsafe and fast.
> >
> >Running async exports and async mount did not appear (under my testing)
> >to be faster under a sustained load than did sync exports and async mount.
> >When the initial test was started async/async was faster, but that quickly
> >changed once the buffer cache filled up.
> >
> > Roger
> >
> >
> >
> >
> >>-----Original Message-----
> >>From: [email protected]
> >>[mailto:[email protected]] On Behalf Of
> jehan.procaccia
> >>Sent: Monday, November 22, 2004 11:55 AM
> >>To: Olaf Kirch
> >>Cc: [email protected]; [email protected];
> >>[email protected]
> >>Subject: Re: [NFS] async vs. sync
> >>
> >>Olaf Kirch wrote:
> >>
> >>
> >>
> >>>On Tue, Nov 16, 2004 at 10:48:12AM -0800, Lever, Charles wrote:
> >>>
> >>>
> >>>
> >>>
> >>>>i'm just looking for clarification so i can provide a good explanation
> >>>>in the Linux NFS FAQ about the evils of using "async." i'll cruise
> >>>>through the server code.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>Just about the only reason for async I can think of is if you have an
> >>>incoming data stream you need to write at a constant rate (think of a
> >>>diskless set top box writing an mpeg2 stream)
> >>>
> >>>Olaf
> >>>
> >>>
> >>>
> >>>
> >>OK, but using sync at my site is really really slow ...
> >>compared to async! Here's a detailed (options printed) demonstration
> >>for an untar operation that takes 13 minutes in sync mode and only 14
> >>seconds in async mode!!
> >>
> >>1) Export in sync mode
> >>NFS server (RedHat ES3 kernel 2.4.21-4.ELsmp) options for that
> >>export:
> >>$ cat /proc/fs/nfs/exports | grep arvouin
> >>/p2v5f1 arvouin.int-evry.fr(rw,no_root_squash,sync,wdelay,acl) #
> >>157.159.21.55
> >>$ cat /var/lib/nfs/xtab | grep arvouin
> >>/p2v5f1
> >>arvouin.int-evry.fr(rw,sync,wdelay,hide,nocrossmnt,secure,no_root_squash,no_all_squash,subtree_check,secure_locks,acl,mapping=identity,anonuid=-2,anongid=-2)
> >
> >
> >>Client running Fedora Core 2, kernel 2.6.8-1.521 [root@arvouin
> >>/mnt/cobra3/mci/test/Test-sync] $cat /proc/mounts
> >>cobra3:/p2v5f1 /mnt/cobra3 nfs
> >>rw,v3,rsize=8192,wsize=8192,hard,tcp,lock,addr=cobra3 0 0 $time tar
> >>xvfz /usr/src/redhat/SOURCES/httpd-2.0.51.tar.gz
> >>real 13m3.686s
> >>user 0m1.055s
> >>sys 0m4.354s
> >>
> >>2) Export in async mode:
> >>Same NFS server, options for that export:
> >>$ cat /proc/fs/nfs/exports | grep arvouin
> >>/p2v5f1 arvouin.int-evry.fr(rw,no_root_squash,async,wdelay,acl) #
> >>157.159.21.55
> >>$ cat /var/lib/nfs/xtab | grep arvouin
> >>/p2v5f2
> >>arvouin.int-evry.fr(rw,async,wdelay,hide,nocrossmnt,secure,no_root_squash,no_all_squash,subtree_check,secure_locks,acl,mapping=identity,anonuid=-2,anongid=-2)
> >
> >
> >>Same client running Fedora Core 2, kernel 2.6.8-1.521
> >>cobra3:/p2v5f1 /mnt/cobra3 nfs
> >>rw,v3,rsize=8192,wsize=8192,hard,tcp,lock,addr=cobra3 0 0
> >>[root@arvouin /mnt/cobra3/mci/test/Test-sync] $time tar xvfz
> >>/usr/src/redhat/SOURCES/httpd-2.0.51.tar.gz
> >>real 0m14.802s
> >>user 0m0.867s
> >>sys 0m2.886s
> >>
> >>My users won't accept the sync performance! I have no choice, but
> >>is running in async mode really as evil as you mentioned? Is there
> >>a way to get better performance with sync in my case? Has anyone had
> >>the same gap in performance as me (here 55 times longer in sync mode!)?
> >>
> >>Thanks.
> >>
> >>
> >>
> >>
> >>
> >
> >
> >
>



-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://productguide.itmanagersjournal.com/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs
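Roger's recommended combination above (sync on the export, async on the client mount) can be sketched concretely. The hostname, export path, and mount point are taken from the thread; the exact option set is an illustrative assumption, not a verified configuration:

```shell
# Server side: /etc/exports entry -- "sync" makes the server confirm a
# write only once the data has reached disk (illustrative host/path):
#   /p2v5f1  arvouin.int-evry.fr(rw,sync,no_root_squash)
# Re-export after editing:
exportfs -ra

# Client side: async is the kernel's default for the generic mount
# flags, which is why it never appears in /proc/mounts -- only "sync"
# is printed when it is explicitly set:
mount -t nfs -o rw,hard,tcp,rsize=8192,wsize=8192 cobra3:/p2v5f1 /mnt/cobra3
```

With this split, the client batches writes for speed while the server never acknowledges data it has not committed.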

2004-11-22 20:15:01

by Trond Myklebust

[permalink] [raw]
Subject: Re: async vs. sync

On Monday 22.11.2004 at 19:57 (+0100), jehan.procaccia wrote:
> >
> in my old Solaris 7 NFS it was async, I suppose! Although we didn't lose
> data, maybe we were risking it ...? Anyway, now I move from an old
> Solaris NFS server to a brand new Linux one with a SAN (AX100) Storage
> Processor in RAID 5 and Fibre Channel attachment. How can I justify
> performance more than 50 times slower :-( with that config? Users and
> managers will tell me that I wasted money on that new server! There
> must be a misconfiguration somewhere.
>

Solaris has no equivalent to the "async" export option. AFAIK that is a
Linux creation.

Cheers,
Trond
--
Trond Myklebust <[email protected]>




2004-11-22 21:05:14

by Paul Cunningham

[permalink] [raw]
Subject: Re: async vs. sync

It has been a few years, but I remember some of the async details. I always
used async for performance reasons, and much testing was performed to assure
no data would be lost. If a client sent an async write to the server, the
call could return prior to data being flushed to disk. The data would make
it to disk once the server decided to write it or the client sends in a
COMMIT. At some point in time the client will attempt to close the file;
this is when a COMMIT must be sent. The hope is that the server has already
written the dirty pages to disk while the client was busy doing other
things, and will respond with an OK quickly. If any dirty pages remain,
they must be flushed prior to responding OK. Data should never be lost as
long as the NFSPROC3_COMMIT procedure is adhered to.

This is my recollection of how NFS v3 works (former kernel programmer), not
necessarily how Linux implements the protocol (but I hope these rules are
followed).

Paul Cunningham

----- Original Message -----
From: "Trond Myklebust" <[email protected]>
To: "jehan.procaccia" <[email protected]>
Cc: "Olaf Kirch" <[email protected]>; <[email protected]>;
<[email protected]>
Sent: Monday, November 22, 2004 3:14 PM
Subject: Re: [NFS] async vs. sync


On Monday 22.11.2004 at 19:57 (+0100), jehan.procaccia wrote:
> >
> in my old Solaris 7 NFS it was async, I suppose! Although we didn't lose
> data, maybe we were risking it ...? Anyway, now I move from an old
> Solaris NFS server to a brand new Linux one with a SAN (AX100) Storage
> Processor in RAID 5 and Fibre Channel attachment. How can I justify
> performance more than 50 times slower :-( with that config? Users and
> managers will tell me that I wasted money on that new server! There
> must be a misconfiguration somewhere.
>

Solaris has no equivalent to the "async" export option. AFAIK that is a
Linux creation.

Cheers,
Trond
--
Trond Myklebust <[email protected]>








2004-11-22 21:14:46

by Trond Myklebust

[permalink] [raw]
Subject: Re: async vs. sync

On Monday 22.11.2004 at 16:04 (-0500), Paul Cunningham wrote:

> It has been a few years, but I remember some of the async details. I
> always used async for performance reasons, and much testing was performed
> to assure no data would be lost. If a client sent an async write to the
> server, the call could return prior to data being flushed to disk. The
> data would make it to disk once the server decided to write it or the
> client sends in a COMMIT. At some point the client will attempt to close
> the file; this is when a COMMIT must be sent. The hope is that the server
> has already written the dirty pages to disk while the client was busy
> doing other things, and will respond with an OK quickly. If any dirty
> pages remain, they must be flushed prior to responding OK. Data should
> never be lost as long as the NFSPROC3_COMMIT procedure is adhered to.
>

Sure. This is how the Linux client works. The problem is the "async"
*export* option on the server.

--
Trond Myklebust <[email protected]>




2004-11-22 21:26:01

by jehan.procaccia

[permalink] [raw]
Subject: Re: async vs. sync

Lever, Charles wrote:

>>in my old Solaris 7 NFS it was async I suppose! although we
>>didn't lose data, maybe we were risking it ...?
>>
>>
>
>the Solaris NFS server doesn't support an "async" mode.
>
>
bad news for me !

>
>
>>anyway now I move from an old Solaris NFS server to a brand new
>>Linux one with a SAN (AX100) Storage Processor in RAID 5 and Fibre
>>Channel attachment. How can I justify performance more than 50 times
>>slower :-( with that config? Users and managers will tell me that I
>>wasted money on that new server! There must be a misconfiguration
>>somewhere.
>>
>>
>
>take a walk through the NFS how-to http://nfs.sourceforge.net/howto/ to
>see if there is anything useful there.
>
>
I did that already, but maybe I should go through it again ...

>also, you could measure your file system performance locally on the NFS
>server instead of via an NFS client to see if your RAID and local file
>system is configured correctly.
>
>
No problem, it goes very fast! 4s, versus 13 minutes in NFS async mode :-(
[root@cobra3 /p2v5f1/mci/test/Test-sync]
$ time tar xvfz /tmp/httpd-2.0.51.tar.gz
real 0m4.404s
user 0m0.450s
sys 0m0.770s

>finally, you should look at raw network performance between your clients
>and server to make sure you are not losing any performance there.
>
>
No; anyway in async mode it's OK, and the proof is here with iperf in UDP
mode -> 93.7 Mbit/s on a 100 Mbit/s Ethernet network!
[root@arvouin Test-sync]# iperf -c cobra3 -i 1 -u -b 100000000

[root@cobra3 ~]
$ iperf -s -i 1 -u
[ 3] 0.0-10.0 sec 112 MBytes 93.7 Mbits/sec 0.053 ms 0/79767 (0%)

I've also tried stopping the iptables firewall on both sides
(client/server); no luck :-( .
Is anyone getting the same performance as me in server sync mode? If
someone could just tar xvfz any recent httpd.tar.gz file (nearly 3000
source files) and let me know how long it takes on their NFS configuration?

Thanks.






2004-11-22 21:44:51

by jehan.procaccia

[permalink] [raw]
Subject: Re: async vs. sync

Roger Heflin wrote:

>
>After looking a bit closer, I am not sure that there is anything
>wrong with your mount options. The one thing I would suggest
>is changing your rsize/wsize blocksize and seeing what that
>does to your performance. You should be able to go as high
>as 32k.
>
>
[root@arvouin tmp]# mount cobra3:/p2v5f1 -o async,wsize=32768,rsize=32768,soft /mnt/cobra3
[root@arvouin /mnt/cobra3/mci/test/Test-sync]
$time tar xvfz /usr/src/redhat/SOURCES/httpd-2.0.51.tar.gz

Sorry, I don't want to wait more than 10 minutes to send that mail, but
again, seeing the files appearing very slowly on the tty, it seems not to
be the solution :-( .

>Your test may also be somewhat misleading as it is performing
>a lot of file creates and those take a lot of time no matter
>how you do it, do the normal user applications do a lot of
>file creates/opens or do the normal users do a lot of large
>file writes? How many files are in that tar file? And what
>is their average size? File creates and file opens are one
>place that NFS tends to have a large difference in speed.
>
>
3000 files, often less than 100 KB or even 10 KB in size, in the httpd
source tarball. Sure, it won't be the day-to-day usage of our students,
although they are taught computer science in our school and might have
that kind of usage; anyway, "untar, gunzip, make, make install" is one
of my favorite usages ;-), maybe I'm too selfish!

> Roger

2004-11-22 21:50:52

by Lever, Charles

[permalink] [raw]
Subject: RE: async vs. sync

> [root@arvouin tmp]# mount cobra3:/p2v5f1 -o async,wsize=32768,rsize=32768,soft /mnt/cobra3

um. you're not using NFS version 3?

not to mention "soft" mounts are also truly the spawn of satan.
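A mount line that addresses both points (pinning NFS version 3 and using a hard mount instead of soft) might look like the following; the server name and path come from the thread, but the option set is a sketch, not a tested configuration:

```shell
# nfsvers=3 makes the protocol version explicit; hard (with intr)
# retries indefinitely instead of silently failing writes the way a
# soft mount can, while intr still lets you interrupt a dead server:
mount -t nfs -o nfsvers=3,rw,hard,intr,tcp,rsize=32768,wsize=32768 \
    cobra3:/p2v5f1 /mnt/cobra3
```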



2004-11-22 21:52:43

by jehan.procaccia

[permalink] [raw]
Subject: Re: async vs. sync

jehan procaccia wrote:

> [root@arvouin tmp]# mount cobra3:/p2v5f1 -o async,wsize=32768,rsize=32768,soft /mnt/cobra3
> [root@arvouin /mnt/cobra3/mci/test/Test-sync]
> $time tar xvfz /usr/src/redhat/SOURCES/httpd-2.0.51.tar.gz
>
> Sorry, I don't want to wait more than 10 minutes to send that mail, but
> again, seeing the files appearing very slowly on the tty, it seems not
> to be the solution :-( .

For the record ... it finally ended, not much better than the original
13 minutes!
real 12m53.520s
user 0m1.070s
sys 0m4.268s

>
>> Your test may also be somewhat misleading as it is performing
>> a lot of file creates and those take a lot of time no matter
>> how you do it, do the normal user applications do a lot of file
>> creates/opens or do the normal users do a lot of large
>> file writes? How many files are in that tar file? And what
>> is their average size? File creates and file opens are one
>> place that NFS tends to have a large difference in speed.
>>
>>
> 3000 files, often less than 100 KB or even 10 KB in size, in the
> httpd source tarball. Sure, it won't be the day-to-day usage of our
> students, although they are taught computer science in our school and
> might have that kind of usage; anyway, "untar, gunzip, make, make
> install" is one of my favorite usages ;-), maybe I'm too selfish!
>
>> Roger
>>
>>
>>
>>> -----Original Message-----
>>> From: jehan.procaccia [mailto:[email protected]] Sent:
>>> Monday, November 22, 2004 12:47 PM
>>> To: Roger Heflin
>>> Cc: 'Olaf Kirch'; [email protected];
>>> [email protected]; [email protected]
>>> Subject: Re: [NFS] async vs. sync
>>>
>>> OK , thanks for the explanation . So now I try:
>>>
>>> exports mount
>>> sync async -> Safe and fast
>>>
>>> [root@arvouin ~]
>>> $mount cobra3:/p2v5f1/ -o async /mnt/cobra3
>>>
>>> unfortunatly the async mount option doesn't shows up in
>>> /proc/mounts, so I am not sure my client is using async, how can I
>>> check that ?
>>>
>>> [root@arvouin ~]
>>> $cat /proc/mounts
>>> cobra3:/p2v5f1/ /mnt/cobra3 nfs
>>> rw,v3,rsize=8192,wsize=8192,hard,tcp,lock,addr=cobra3 0 0
>>>
>>> Anyway, I untar again my httpd, I'am late so I didn't wait 13 minute
>>> or so but it started to be as long as it was :-(
>>>
>>> Now on the server I removed ACL (nerver asked for it though !?) $
>>> cat /proc/fs/nfs/exports | grep arvouin
>>> /p2v5f1 arvouin.int-evry.fr(rw,no_root_squash,sync,wdelay,no_acl) #
>>> 157.159.21.55
>>> again I cannot check on the client that this option is removed ?,
>>> anyway again I untar my httpd, I'am still late ... but seeing each
>>> file (tar -v !) shouwing up every second or so in the tty tells me
>>> that it will take at least 10 minutes ... :-(
>>>
>>> Any other idea ? what is wrong in my config ?
>>>
>>> thanks.
>>>
>>>
>>>
>>>
>>> Roger Heflin wrote:
>>>
>>>
>>>
>>>> This might be the issue, (and someone correct this if I am
>>>
>>> incorrect),
>>>
>>>> I know I ran into it a couple of years ago, and it is not
>>>
>>> the easiest
>>>
>>>> to understand exactly what is actually going on.
>>>>
>>>> There are 2 places where you can put sync and async, one is
>>>
>>> the exports
>>>
>>>> and one is on the mount command. They are different.
>>>>
>>>> You want sync on the exports, this will allow a client to
>>>
>>> survive without
>>>
>>>
>>>> data loss if the server reboots. You want async on the
>>>
>>> client mount end
>>>
>>>
>>>> and
>>>> this will generally give you the speed. With async on the
>>>
>>> client end
>>>
>>>
>>>> the client is keeping track of what is outstanding if the server
>>>> crashes, so you won't lose data. With async on both ends the
>>>> server tells one that the data is safe when it is not, and if the
>>>> server crashes the client things that the data was safe when it
>>>
>>> really was not.
>>>
>>>
>>>> If you put sync in both locations then your NFS disk is fully
>>>> synced and the application won't even start another write until the
>>>>
>>>
>>> last one
>>>
>>>> is confirmed
>>>> finished and on the actual disk. With async on the client
>>>
>>> end the next
>>>
>>>
>>>> write
>>>> will start before the client has received an ack from the
>>>
>>> server, and
>>>
>>>> this will be reasonably fast.
>>>>
>>>> So basically:
>>>>
>>>> exports mount
>>>> sync sync -> really safe and really slow
>>>> sync async -> Safe and fast
>>>> async either -> unsafe and fast.
>>>>
>>>> Running async exports and async mount did not appear (under
>>>
>>> my testing)
>>>
>>>> to be faster under a sustained load than did sync exports
>>>
>>> and async mount.
>>>
>>>
>>>> When
>>>> the initial test was started async/async was faster but that quick
>>>> changed once the buffer cache filled up.
>>>>
>>>> Roger
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: [email protected]
>>>>> [mailto:[email protected]] On Behalf Of
>>>>
>>> jehan.procaccia
>>>
>>>
>>>>> Sent: Monday, November 22, 2004 11:55 AM
>>>>> To: Olaf Kirch
>>>>> Cc: [email protected]; [email protected];
>>>>> [email protected]
>>>>> Subject: Re: [NFS] async vs. sync
>>>>>
>>>>> Olaf Kirch wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> On Tue, Nov 16, 2004 at 10:48:12AM -0800, Lever, Charles wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> i'm just looking for clarification so i can provide a good
>>>>>>>
>>>>>>>
>>>>>>
>>>>> explanation
>>>>>
>>>>>
>>>>>
>>>>>>> in the Linux NFS FAQ about the evils of using "async."
>>>>>>
>>> i'll cruise
>>>
>>>>>>> through the server code.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> Just about the only reason for async I can think of is if
>>>>>>
>>>>>>
>>>>>
>>>>> you have an
>>>>>
>>>>>
>>>>>
>>>>>> incoming data stream you need to write at a constant rate
>>>>>>
>>>>>>
>>>>>
>>>>> (think of a
>>>>>
>>>>>
>>>>>
>>>>>> diskless set top box writing an mpeg2 stream)
>>>>>>
>>>>>> Olaf
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> OK, but using sync at my site is really really slow ... compare to
>>>>> async ! here's a detailed (options printed)
>>>>
>>> demonstration
>>>
>>>>> for an untar operation that takes 13 minutes in async mode
>>>>
>>> and only 14
>>>
>>>>> secondes in sync mode !!
>>>>>
>>>>> 1) Export in sync mode
>>>>> NFS server (RedHat ES3 kernel 2.4.21-4.ELsmp) options for that
>>>>> export:
>>>>> $ cat /proc/fs/nfs/exports | grep arvouin
>>>>> /p2v5f1 arvouin.int-evry.fr(rw,no_root_squash,sync,wdelay,acl) #
>>>>> 157.159.21.55
>>>>> $ cat /var/lib/nfs/xtab | grep arvouin
>>>>> /p2v5f1
>>>>> arvouin.int-evry.fr(rw,sync,wdelay,hide,nocrossmnt,secure,no_r
>>>>>
>>>>>
>>>>
>>>> oot_squash,no_all_squash,subtree_check,secure_locks,acl,mappi
>>>>
>>>
>>> ng=identit
>>>
>>>
>>>> y,ano
>>>> nuid=-2,anongid=-2)
>>>>
>>>>
>>>>
>>>>
>>>>> Client running Fedora Core 2, kernel 2.6.8-1.521 [root@arvouin
>>>>> /mnt/cobra3/mci/test/Test-sync] $cat /proc/mounts
>>>>> cobra3:/p2v5f1 /mnt/cobra3 nfs
>>>>> rw,v3,rsize=8192,wsize=8192,hard,tcp,lock,addr=cobra3 0 0 $time
>>>>> tar xvfz /usr/src/redhat/SOURCES/httpd-2.0.51.tar.gz
>>>>> real 13m3.686s
>>>>> user 0m1.055s
>>>>> sys 0m4.354s
>>>>>
>>>>> 2) Export in async mode:
>>>>> Same NFS server, options for that export:
>>>>> $ cat /proc/fs/nfs/exports | grep arvouin
>>>>> /p2v5f1 arvouin.int-evry.fr(rw,no_root_squash,async,wdelay,acl) #
>>>>> 157.159.21.55
>>>>> $ cat /var/lib/nfs/xtab | grep arvouin
>>>>> /p2v5f2
>>>>> arvouin.int-evry.fr(rw,async,wdelay,hide,nocrossmnt,secure,no_
>>>>>
>>>>>
>>>>
>>>> root_squash,no_all_squash,subtree_check,secure_locks,acl,mapp
>>>>
>>>
>>> ing=identi
>>>
>>>
>>>> ty,an
>>>> onuid=-2,anongid=-2)
>>>>
>>>>
>>>>
>>>>
>>>>> Same client running Fedora Core 2, kernel 2.6.8-1.521
>>>>> cobra3:/p2v5f1 /mnt/cobra3 nfs
>>>>> rw,v3,rsize=8192,wsize=8192,hard,tcp,lock,addr=cobra3 0 0
>>>>> [root@arvouin /mnt/cobra3/mci/test/Test-sync] $time tar xvfz
>>>>> /usr/src/redhat/SOURCES/httpd-2.0.51.tar.gz
>>>>> real 0m14.802s
>>>>> user 0m0.867s
>>>>> sys 0m2.886s
>>>>>
>>>>> My users won't accept the sync performances ! . I have no
>>>>
>>> choice, but
>>>
>>>>> is running in async mode is really evil as you mentioned it
>>>>
>>> ? is there
>>>
>>>>> a way to have better performances in sync in my case ? As
>>>>
>>> anyone had
>>>
>>>>> the same gap in performance as me ( here 55 times longer in
>>>>
>>> sync mode
>>>
>>>>> !) ?
>>>>>
>>>>> Thanks.
>>>>>
>>>>>
>>>>> -------------------------------------------------------
>>>>> SF email is sponsored by - The IT Product Guide Read honest
>>>>
>>> & candid
>>>
>>>>> reviews on hundreds of IT Products from real users.
>>>>> Discover which products truly live up to the hype. Start
>>>>
>>> reading now.
>>>
>>>>> http://productguide.itmanagersjournal.com/
>>>>> _______________________________________________
>>>>> NFS maillist - [email protected]
>>>>> https://lists.sourceforge.net/lists/listinfo/nfs
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>
>
>





2004-11-22 21:57:39

by Joshua Baker-LePain

[permalink] [raw]
Subject: Re: async vs. sync

On Mon, 22 Nov 2004 at 1:45pm, [email protected] wrote

> Is anyone getting the same performance as me in server sync mode? If
> someone could just tar xvfz any recent httpd.tar.gz file (nearly 3000
> source files) and let me know how long it takes on their NFS configuration?

Server: RH7.3, sync export
Client: Centos 3.3 (RHEL 3 U3 rebuild), async mount, UDP, 8K r/wsize

[jlb@chaos tmp]$ time tar xzf ~/tmp/httpd-2.0.52.tar.gz

real 0m25.522s
user 0m0.520s
sys 0m0.560s

--
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University



2004-11-22 22:06:13

by Nicolas Kowalski

[permalink] [raw]
Subject: Re: async vs. sync

On Mon, 22 Nov 2004, jehan procaccia wrote:

> Is anyone getting the same performance as me in server sync mode? If
> someone could just tar xvfz any recent httpd.tar.gz file (nearly 3000
> source files) and let me know how long it takes on their NFS configuration?

Using a 2.4.27 Linux client, without any options for the NFS mount, on a
100Mbps network, I get this when untarring httpd-2.0.52.tar.gz:

real 1m33.781s
user 0m0.610s
sys 0m0.980s

Server information:

- Dual-Xeon 2.4Ghz, 1GB RAM, U160 73GB SCSI disks

- Linux 2.4.27, XFS filesystems, quotas (user,group) on, no ACL, sync
exports.

Regards.

--
Nicolas




2004-11-22 22:06:28

by jehan.procaccia

[permalink] [raw]
Subject: Re: async vs. sync

Lever, Charles wrote:

>>[root@arvouin tmp]# mount cobra3:/p2v5f1 -o
>>async,wsize=32768,rsize=32768,soft /mnt/cobra3
>>
>>
>
>um. you're not using NFS version 3?
>
>
>
I thought it was the default !?

>not to mention "soft" mounts are also truly the spawn of satan.
>
>
OK I remove it, and force nfs v3
$mount cobra3:/p2v5f1 -o async,wsize=32768,rsize=32768,nfsvers=3 /mnt/cobra3

It takes about the same very long time :-(

However, what strikes me is that although I asked for an r&wsize of 32K,
looking at /proc/mounts shows:
$cat /proc/mounts
cobra3:/p2v5f1 /mnt/cobra3 nfs
rw,v3,rsize=8192,wsize=8192,hard,tcp,lock,addr=cobra3 0 0

8K r&wsize ??? I also tried with 16K; it still shows 8K. However,
moving from soft to hard did show up in /proc/mounts.
Again, how can we check every option of an NFS-mounted filesystem on
the client side (other than /proc/mounts)?

thanks .






2004-11-22 22:09:01

by Paul Cunningham

[permalink] [raw]
Subject: Re: async vs. sync

Sorry, I must have missed the export option in a previous message. As I
shake the cobwebs, I cannot remember an async export option being used to
increase performance, with a warning from the man page that it "allows the
NFS server to violate the NFS protocol".

Sounds like a dangerous option. Perhaps I'm too far removed from this now,
but the applicable mount options were async and v3. The server then should
adhere to the mount request from the client regardless of an export option.
This all assumes v3 is available on the server. Scanning the man page on my
various Linux boxes I cannot find a v3 mount option, the issue is becoming
much clearer now!

So how does a client mount NFS v3 with Linux?

Paul Cunningham


----- Original Message -----
From: "Trond Myklebust" <[email protected]>
To: "Paul Cunningham" <[email protected]>
Cc: <[email protected]>
Sent: Monday, November 22, 2004 4:14 PM
Subject: Re: [NFS] async vs. sync


On Monday 22.11.2004 at 16:04 (-0500), Paul Cunningham wrote:

> It has been a few years, but I remember some of the async details. I
> always used async for performance reasons, and much testing was
> performed to assure no data would be lost. If a client sent an async
> write to the server, the call could return prior to data being flushed
> to disk. The data would make it to disk once the server decided to
> write it or the client sends in a COMMIT. At some point in time the
> client will attempt to close the file; this is when a COMMIT must be
> sent. The hope is that the server has already written the dirty pages
> to disk while the client was busy doing other things, and will respond
> with an OK quickly. If any dirty pages remain, they must be flushed
> prior to responding OK. Data should never be lost as long as the
> NFSPROC3_COMMIT procedure is adhered to.
>

Sure. This is how the Linux client works. The problem is the "async"
*export* option on the server.

--
Trond Myklebust <[email protected]>








2004-11-22 22:15:01

by Lever, Charles

[permalink] [raw]
Subject: RE: async vs. sync


> >>[root@arvouin tmp]# mount cobra3:/p2v5f1 -o
> >>async,wsize=32768,rsize=32768,soft /mnt/cobra3
> >
> >um. you're not using NFS version 3?
> >
> I though it was a default !?

in Linux, only on more recent mount command / kernel combinations is
version 3 the default.

> >not to mention "soft" mounts are also truly the spawn of satan.
> >
> >
> OK I remove it, and force nfs v3

note: soft is probably not a performance issue here, but it will
definitely be a source of silent data corruption.

> However what strikes me is that although I asked for r&wsize
> of 32 K,
> looking at /proc/mounts show :
> $cat /proc/mounts
> cobra3:/p2v5f1 /mnt/cobra3 nfs
> rw,v3,rsize=8192,wsize=8192,hard,tcp,lock,addr=cobra3 0 0
>
> 8 K r&wsize ??? I also tried with 16K it's still shows 8K ?

that means your server supports only 8KB transfer sizes. the client and
server negotiate the maximum size of reads and writes at mount time.

have you read some of the excellent reference textbooks listed in the
NFS FAQ? that might help you to become a little more fluent with the
operation of NFS.

> however
> moving from soft to hard did show up in /proc/mounts .
> Again, How can we check every options of an NFS mounted filesystem on
> the client side ? (other than /proc/mounts) .

some options don't show up when they are at their default settings.
no-one ever specifies "async" as a mount option, as it is the default,
so it is not included in /proc/mounts. NFS version isn't listed in
/proc/mounts unless it was specified on the original mount command.
otherwise, i quite agree with you that every setting should be spelled
out in /proc/mounts, but unfortunately that's not the way it works
today.

the only way to truly see what's going on is to capture a network trace
and load it up in ethereal. in fact, i recommend that as your next step
-- capture about 30 seconds of your test workload with tcpdump to see
what's going over the wire. ethereal also has a nice RPC round trip
average calculator.
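For concreteness, the capture step being suggested might look something like
the following sketch. The interface name and output file are placeholders
(not from this thread), and the commands need root privileges, so treat this
as an illustration rather than a tested recipe:

```shell
# Hypothetical sketch of capturing ~30 seconds of NFS traffic.
# -s 0 captures full packets so ethereal can decode complete RPCs;
# "port 2049" limits the trace to NFS traffic.
tcpdump -i eth0 -s 0 -w /tmp/nfs-trace.pcap port 2049 &
sleep 30          # run the test workload during this window
kill %1           # stop the capture
# then open /tmp/nfs-trace.pcap in ethereal and use its
# RPC round-trip statistics
```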



2004-11-22 22:20:57

by Trond Myklebust

[permalink] [raw]
Subject: Re: async vs. sync

On Monday 22.11.2004 at 22:52 (+0100), jehan procaccia wrote:
> > [root@arvouin tmp]# mount cobra3:/p2v5f1 -o
> > async,wsize=32768,rsize=32768,soft /mnt/cobra3
> > [root@arvouin /mnt/cobra3/mci/test/Test-sync]
> > $time tar xvfz /usr/src/redhat/SOURCES/httpd-2.0.51.tar.gz
> >
> > sorry I don't want to wait more than 10 minutes to send that mail, but
> > again seeing the files appearing very slowly on the tty it seems not
> > to be the solution :-( .

The following script may help you understand why things are slower than
in the case of "async" on an untar. It basically just creates a bunch of
files: in the first test it does not sync the directory to disk after
each file creation, in the second case it does. The test does no
reads/writes to the file.

Run it on the server and you will see a clear difference in time
between test1 and test2. Run it on the client, and there should be
little difference between test1 and test2 (but there will be a heavy
dependency on the "async" vs "sync" export flag on the server).

NFSv3 mandates that all directory-related operations should behave as in
test 2. Only writes to ordinary files may be cached by the server, and
when the client sends a COMMIT request, the server should do an fsync()
on that file.

Cheers,
Trond
--
Trond Myklebust <[email protected]>


Attachments:
script.sh (447.00 B)
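The attachment itself is not reproduced in the archive. A minimal sketch of
the kind of two-phase test Trond describes might look like the following;
the directory name and file count are invented, and sync(1) stands in for a
per-directory fsync() (which plain sh cannot issue directly), so this is an
approximation, not the original script:

```shell
#!/bin/sh
# Phase 1 creates files without syncing the directory; phase 2 forces
# the new directory entries to disk after each creation.
DIR=./nfs-sync-test
N=50
mkdir -p "$DIR"

echo "Test 1: no sync after file creation"
start=$(date +%s)
i=0
while [ "$i" -lt "$N" ]; do : > "$DIR/nosync.$i"; i=$((i+1)); done
echo "elapsed: $(( $(date +%s) - start ))s"

echo "Test 2: sync after each file creation"
start=$(date +%s)
i=0
while [ "$i" -lt "$N" ]; do : > "$DIR/sync.$i"; sync; i=$((i+1)); done
echo "elapsed: $(( $(date +%s) - start ))s"
```

Run locally on the server, phase 2 should be visibly slower; run on an NFS
client, the difference should mostly depend on the server's sync/async
export flag, as described above.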

2004-11-22 22:26:44

by Trond Myklebust

[permalink] [raw]
Subject: Re: async vs. sync

On Monday 22.11.2004 at 17:07 (-0500), Paul Cunningham wrote:

> So how does a client mount NFS v3 with Linux?
>

The default is to do just as you described: writes are asynchronous
using the caching model described in RFC1813 (NFSv3 unstable writes are
the default, and a COMMIT is sent on close and/or if memory pressure
forces the client to release the cached writes early).

There is also a "sync" mount option available that forces the client to
use NFSv3 stable writes.

The problem is the server side "async" option which basically causes
knfsd to ignore COMMIT requests, as well as causing it to fail to
fsync() the directory after file CREATE/LINK/UNLINK/RENAME/...
i.e. it turns off all consistency guarantees on the server.

Cheers,
Trond

--
Trond Myklebust <[email protected]>
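In /etc/exports terms, the server-side flag being discussed is just one word
per export line. A hypothetical sketch (paths and hostnames invented, not
from this thread):

```shell
# /etc/exports -- "sync" honours COMMIT and fsync()s directory updates;
# "async" replies before data or metadata reach disk.
/export/safe    client.example.com(rw,sync,no_root_squash)
/export/fast    client.example.com(rw,async,no_root_squash)
# after editing, re-export with:
#   exportfs -ra
```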




2004-11-22 22:57:59

by jehan.procaccia

[permalink] [raw]
Subject: Re: async vs. sync

Trond Myklebust wrote:

>On Monday 22.11.2004 at 22:52 (+0100), jehan procaccia wrote:
>
>
>>>[root@arvouin tmp]# mount cobra3:/p2v5f1 -o
>>>async,wsize=32768,rsize=32768,soft /mnt/cobra3
>>>[root@arvouin /mnt/cobra3/mci/test/Test-sync]
>>>$time tar xvfz /usr/src/redhat/SOURCES/httpd-2.0.51.tar.gz
>>>
>>>sorry I don't want to wait more than 10 minutes to send that mail, but
>>>again seeing the files appearing very slowly on the tty it seems not
>>>to be the solution :-( .
>>>
>>>
>
>The following script may help you understand why things are slower in
>the case of "async" on an untar. It basically just creates a bunch of
>files: in the first test it does not sync the directory to disk after
>each file creation, in the second case it does. The test does no
>reads/writes to the file.
>
>Run it on the server and you will see a clear difference in time
>between test1 and test2. Run it on the client, and there should be
>little difference between test1 and test2 (but there will be a heavy
>dependency on the "async" vs "sync" export flag on the server).
>
>
These predictions were perfect; this is exactly what happened.
I reduced the loop from 1000 to 100; it was too long on the client in
sync mode ....

[root@cobra3 /p2v5f1/mci/test/Test-sync]
$ ./test2.sh
Test without directory sync after file creation

real 0m0.037s
user 0m0.010s
sys 0m0.000s
Test2 with directory sync after file creation

real 0m6.040s
user 0m0.000s
sys 0m0.000s

NFS client, while server exports in sync mode:
cobra3:/p2v5f1 /mnt/cobra3 nfs
rw,v3,rsize=8192,wsize=8192,soft,tcp,lock,addr=cobra3 0 0
$./test2.sh
Test without directory sync after file creation

real 0m31.144s
user 0m0.042s
sys 0m0.373s
Test2 with directory sync after file creation

real 0m49.030s
user 0m0.073s
sys 0m0.694s

Now the NFS server exports in async mode to the client; performance is
far better, even better than directly on the server when forcing a
sync call after every creation!
$./test2.sh
Test without directory sync after file creation

real 0m0.446s
user 0m0.026s
sys 0m0.078s
Test2 with directory sync after file creation

real 0m0.785s
user 0m0.076s
sys 0m0.305s

Hopefully I'll try to capture traffic next Wednesday when I'm back
at work ..

Thanks.

>NFSv3 mandates that all directory-related operations should behave as in
>test 2. Only writes to ordinary files may be cached by the server, and
>when the client sends a COMMIT request, the server should do an fsync()
>on that file.
>
>Cheers,
> Trond
>
>




2004-11-22 23:51:24

by jehan.procaccia

[permalink] [raw]
Subject: Re: async vs. sync

Lucky guys !
real 1m33.781s
and
tar xzf ~/tmp/httpd-2.0.52.tar.gz
real 0m25.522s
These are the kind of values I am expecting ... (although in the 25s
case I see that you didn't use the -v option of tar, which does waste a
lot of time ...). I still cannot explain to myself why I need async mode
on the export server to approach that kind of performance ... ?

[email protected] wrote:

> On Mon, 22 Nov 2004, jehan procaccia wrote:
>
>> Is anyone getting the same performance as me in server sync mode?
>> If someone could just tar xvfz any recent httpd.tar.gz file (nearly
>> 3000 source files) and let me know how long it takes on their NFS
>> configuration?
>
>
> Using a 2.4.27 Linux client, without any options for the NFS mount, on
> a 100Mbps network, I get this when untarring httpd-2.0.52.tar.gz:
>
> real 1m33.781s
> user 0m0.610s
> sys 0m0.980s
>
> Server information:
>
> - Dual-Xeon 2.4Ghz, 1GB RAM, U160 73GB SCSI disks
>
> - Linux 2.4.27, XFS filesystems, quotas (user,group) on, no ACL, sync
> exports.
>
> Regards.
>




2004-11-23 01:10:00

by Dan Stromberg

[permalink] [raw]
Subject: RE: async vs. sync

On Mon, 2004-11-22 at 13:50, Lever, Charles wrote:
> > [root@arvouin tmp]# mount cobra3:/p2v5f1 -o
> > async,wsize=32768,rsize=32768,soft /mnt/cobra3
>
> um. you're not using NFS version 3?
>
> not to mention "soft" mounts are also truly the spawn of satan.

Unless your filesystem is also readonly...



Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2004-11-23 03:53:27

by Lever, Charles

[permalink] [raw]
Subject: RE: async vs. sync

> > not to mention "soft" mounts are also truly the spawn of satan.
>
> Unless your filesystem is also readonly...

unfortunately that is a myth. even a readonly file system can fall
victim to the evils of soft mounts, i'm told.

the cached version of the file on the client can become corrupted if the
file is changing remotely and one or more cache-refresh reads times out.



2004-11-23 09:51:12

by jehan.procaccia

[permalink] [raw]
Subject: Re: async vs. sync

I found something interesting! If I export an internal SCSI disk of my
NFS server (/home) I get good performance. My problem is with exports
from the Dell/EMC Storage Processor; I will open a ticket about that on
the hotline ...

NFS server exporting an internal SCSI disk in sync mode:
$ cat /var/lib/nfs/xtab
/home arvouin.int-evry.fr(rw,sync,no_wdelay,hide,nocrossmnt,secure,no_root_squash,no_all_squash,subtree_check,secure_locks,no_acl,mapping=identity,anonuid=-2,anongid=-2)

Client mounts in sync an NFS export that is in sync also:
cobra3:/home /mnt/cobra3home nfs
rw,sync,v3,rsize=8192,wsize=8192,hard,tcp,lock,addr=cobra3 0 0
[root@arvouin /mnt/cobra3home/Nfs-test]
$time tar xvfz /usr/src/redhat/SOURCES/httpd-2.0.51.tar.gz
real 0m27.719s
user 0m1.002s
sys 0m4.296s

[root@arvouin /mnt/cobra3home/Nfs-test]
$./test2.sh
Test without directory sync after file creation

real 0m0.568s
user 0m0.031s
sys 0m0.114s
Test2 with directory sync after file creation

real 0m0.904s
user 0m0.094s
sys 0m0.266s


This is what I expect in terms of performance. I will continue my
requests on the Dell/EMC hotline, but maybe the security of that AX100
storage processor (RAID-5, spare disk, double fiber attachment, UPS)
allows me to use async export mode in such a case?

thanks .

jehan procaccia wrote:

> Trond Myklebust wrote:
>
>> On Monday 22.11.2004 at 22:52 (+0100), jehan procaccia wrote:
>>
>>
>>>> [root@arvouin tmp]# mount cobra3:/p2v5f1 -o
>>>> async,wsize=32768,rsize=32768,soft /mnt/cobra3
>>>> [root@arvouin /mnt/cobra3/mci/test/Test-sync]
>>>> $time tar xvfz /usr/src/redhat/SOURCES/httpd-2.0.51.tar.gz
>>>>
>>>> sorry I don't want to wait more than 10 minutes to send that mail,
>>>> but again seeing the files apperaing very slowly on the tty it
>>>> seems not to be the solution :-( .
>>>>
>>>
>>
>> The following script may help you understand why things are slower in
>> the case of "async" on an untar. It basically just creates a bunch of
>> files: in the first test it does not sync the directory to disk after
>> each file creation, in the second case it does. The test does no
>> reads/writes to the file.
>>
>> Run it on the server and you will see a clear difference in time
>> between test1 and test2. Run it on the client, and there should be
>> little difference between test1 and test2 (but there will be a heavy
>> dependency on the "async" vs "sync" export flag on the server).
>>
>>
> These prediction are perfect, this is exaclty what happened:
> I reduce the loop from 1000 to 100, it was too long on the client in
> sync mode ....
>
> [root@cobra3 /p2v5f1/mci/test/Test-sync]
> $ ./test2.sh
> Test without directory sync after file creation
>
> real 0m0.037s
> user 0m0.010s
> sys 0m0.000s
> Test2 with directory sync after file creation
>
> real 0m6.040s
> user 0m0.000s
> sys 0m0.000s
>
> NFS client, while server export in sync mode
> cobra3:/p2v5f1 /mnt/cobra3 nfs
> rw,v3,rsize=8192,wsize=8192,soft,tcp,lock,addr=cobra3 0 0
> $./test2.sh
> Test without directory sync after file creation
>
> real 0m31.144s
> user 0m0.042s
> sys 0m0.373s
> Test2 with directory sync after file creation
>
> real 0m49.030s
> user 0m0.073s
> sys 0m0.694s
>
> Now NFS server exports in async mode to the client, performances are
> far better , even better than direclty on the server when forcing a
> sync call after every creation !
> $./test2.sh
> Test without directory sync after file creation
>
> real 0m0.446s
> user 0m0.026s
> sys 0m0.078s
> Test2 with directory sync after file creation
>
> real 0m0.785s
> user 0m0.076s
> sys 0m0.305s
>
> Hopefully I'll try to capture traffic next wednesday when I'll be back
> at work ..
>
> Thanks.
>
>> NFSv3 mandates that all directory-related operations should behave as in
>> test 2. Only writes to ordinary files may be cached by the server, and
>> when the client sends a COMMIT request, the server should do an fsync()
>> on that file.
>>
>> Cheers,
>> Trond
>>
>>
>
>
>





2004-11-23 14:30:22

by Lever, Charles

[permalink] [raw]
Subject: RE: async vs. sync

> This is what I expect in term of performances . I will continue my
> requests on the DEll/EMC hotline , but maybe the security of
> that AX100
> storage Processor (raid5, spare disk, double fiber attachement, UPS)
> allows me to use async export mode in such a case ?

the "async" export option changes the behavior of the NFS server
daemons, not of the underlying local file system or storage subsystem.
the problem is that changes made by clients will remain in your NFS
server's memory and not get flushed onto permanent storage.

so, i really don't think the storage subsystem will have any effect on
the safety of your data before the data reaches permanent storage. as
someone else pointed out earlier, the solution is to use battery-backed
main memory when using "async" (prestoserve for solaris?).

as trond said, if your users and backup facilities can tolerate the loss
of data during a crash, then it is perfectly fine to use "async." most
don't, however.

btw, it is fairly well understood that RAID-5 and NFS servers don't mix
well. RAID-5's weakest point is that it doesn't handle small random
writes very well, and that's exactly what is required of it when
handling NFS traffic that consists mostly of metadata changes (file
creates, deletes, and so on). neil explained clearly how to make the
best use of a RAID-5 with NFS: do your local file system journaling
somewhere else.

when trying your workload locally on the NFS server, realize that there
are some optimizations that local file systems make, like caching and
coalescing metadata updates, that the NFS protocol does not allow. this
affects especially workloads with lots of metadata change operations,
because the NFS protocol requires each metadata update to reside on
permanent storage before the NFS server replies to the client,
effectively serializing the workload with storage activity.



2004-11-23 14:57:24

by J. Bruce Fields

[permalink] [raw]
Subject: Re: async vs. sync

On Tue, Nov 23, 2004 at 10:50:55AM +0100, jehan procaccia wrote:
> This is what I expect in term of performances . I will continue my
> requests on the DEll/EMC hotline , but maybe the security of that AX100
> storage Processor (raid5, spare disk, double fiber attachement, UPS)
> allows me to use async export mode in such a case ?

No. The problem with the async export option is that it allows the
server to tell the client that data is safely committed to disk when it
hasn't actually been. So the risk is that the nfs server will fail
while valuable data is still sitting in its memory. It's the server
itself that's the weak point, not the storage.

--Bruce Fields



2004-11-23 16:33:06

by Dan Stromberg

[permalink] [raw]
Subject: RE: async vs. sync

On Mon, 2004-11-22 at 19:53, Lever, Charles wrote:
> > > not to mention "soft" mounts are also truly the spawn of satan.
> >
> > Unless your filesystem is also readonly...
>
> unfortunately that is a myth. even a readonly file system can fall
> victim to the evils of soft mounts, i'm told.

Ummmm... How?

About all that can happen, is you lose the mount. You pretty much
-can't- toast any data.

> the cached version of the file on the client can become corrupted if the
> file is changing remotely and one or more cache-refresh reads times out.

In which case it wasn't truly read only....


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2004-11-23 16:36:51

by Lever, Charles

[permalink] [raw]
Subject: RE: async vs. sync

> On Mon, 2004-11-22 at 19:53, Lever, Charles wrote:
> > > > not to mention "soft" mounts are also truly the spawn of satan.
> > >
> > > Unless your filesystem is also readonly...
> >
> > unfortunately that is a myth. even a readonly file system can fall
> > victim to the evils of soft mounts, i'm told.
>
> Ummmm... How?
>
> About all that can happen, is you lose the mount. You pretty much
> -can't- toast any data.
>
> > the cached version of the file on the client can become
> > corrupted if
> > the file is changing remotely and one or more cache-refresh reads
> > times out.
>
> In which case it wasn't truly read only....

let's be precise.

there is only one case where a soft mount won't cause data corruption.
that is when the local file system on the server is mounted read only
and exported read only, and the client mounts the NFS export read only.

maybe good for sharing CD-ROMs via NFS, but otherwise not terribly
useful.



2004-11-23 18:16:47

by Dan Stromberg

[permalink] [raw]
Subject: RE: async vs. sync

On Tue, 2004-11-23 at 08:36, Lever, Charles wrote:
> > On Mon, 2004-11-22 at 19:53, Lever, Charles wrote:
> > > > > not to mention "soft" mounts are also truly the spawn of satan.
> > > >
> > > > Unless your filesystem is also readonly...
> > >
> > > unfortunately that is a myth. even a readonly file system can fall
> > > victim to the evils of soft mounts, i'm told.
> >
> > Ummmm... How?
> >
> > About all that can happen, is you lose the mount. You pretty much
> > -can't- toast any data.
> >
> > > the cached version of the file on the client can become
> > corrupted if
> > > the file is changing remotely and one or more cache-refresh reads
> > > times out.
> >
> > In which case it wasn't truly read only....
>
> let's be precise.
>
> there is only one case where a soft mount won't cause data corruption.
> that is when the local file system on the server is mounted read only
> and exported read only, and the client mounts the NFS export read only.
>
> maybe good for sharing CD-ROMs via NFS, but otherwise not terribly
> useful.

Yes, that's nice and precise. :)

We used to have a filesystem that contained netboot info (ethers,
bootptab, bootparams, tftp), mounted soft on a bunch of hosts. We never
had a problem with it, other than it disappearing when the NFS server
would go down for a while. The data changed very infrequently, so we
were fine.



Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2004-11-23 21:47:05

by jehan.procaccia

[permalink] [raw]
Subject: Re: async vs. sync

Lever, Charles wrote:

>>This is what I expect in term of performances . I will continue my
>>requests on the DEll/EMC hotline , but maybe the security of
>>that AX100
>>storage Processor (raid5, spare disk, double fiber attachement, UPS)
>>allows me to use async export mode in such a case ?
>>
>>
>
>the "async" export option changes the behavior of the NFS server
>daemons, not of the underlying local file system or storage subsystem.
>the problem is that changes made by clients will remain in your NFS
>server's memory and not get flushed onto permanent storage.
>
>so, i really don't think the storage subsystem will have any effect on
>the safety of your data before the data reaches permanent storage. as
>someone else pointed out earlier, the solution is to use battery-backed
>main memory when using "async" (prestoserve for solaris?).
>
>as trond said, if your users and backup facilities can tolerate the loss
>of data during a crash, then it is perfectly fine to use "async." most
>don't, however.
>
>btw, it is fairly well understood that RAID-5 and NFS servers don't mix
>well. RAID-5's weakest point is that it doesn't handle small random
>writes very well, and that's exactly what is required of it when
>handling NFS traffic that consists mostly of metadata changes (file
>creates, deletes, and so on). neil explained clearly how to make the
>best use of a RAID-5 with NFS: do your local file system journaling
>somewhere else.
>
>
No, not yet, but if it is safer and increases performance, maybe I
should do it!

Perhaps this is not the place to talk about ext3, but if someone on
the list has already put their journal on a separate device, please
confirm these points for me:
From what I read in man mke2fs, for an ext3 FS I can create a journal on a
separate device:
mke2fs -O journal_dev external-journal
creates the journal device, but on which device? -> an internal SCSI drive of my
server, or better placed on the Dell/EMC SP?

mke2fs -J device=/dev/external-journal /dev/emcpower
formats the FS and uses the external journal created above, but what is the
recommended size of the external journal? When the journal is internal, it is
said that it must be at least 1024 filesystem blocks (in my case blocks are
4K in size), so the journal is at least 4 MB, but should it be bigger?

Finally, can I "externalize" an already internal journal on a production FS
(convert the journal from inside to outside without reformatting the FS)?

thanks.


>when trying your workload locally on the NFS server, realize that there
>are some optimizations that local file systems make, like caching and
>coalescing metadata updates, that the NFS protocol does not allow. this
>affects especially workloads with lots of metadata change operations,
>because the NFS protocol requires each metadata update to reside on
>permanent storage before the NFS server replies to the client,
>effectively serializing the workload with storage activity.
>
>



-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://productguide.itmanagersjournal.com/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2004-11-24 18:45:41

by jehan.procaccia

[permalink] [raw]
Subject: Re: async vs. sync

Good advice :-) Externalizing the journal from my FS on the AX100 SP to
an internal SCSI disk on the NFS server is very good in terms of
performance; it now takes 1.5 minutes instead of 12m to untar the httpd
archive!

Journal on a 16 MB partition, export sync, mount async
[root@arvouin /mnt/cobra3extjournaldata16/Nfs-test16]
$time tar xvfz /usr/src/redhat/SOURCES/httpd-2.0.51.tar.gz
real 1m35.097s
user 0m0.872s
sys 0m2.409s

However, the tar extraction now goes very fast but stops for 1 or 2
seconds and then restarts fast -> there are some hangs. Here with a 16 MB
journal I got 15 hangs of 1-2 seconds; with 128 MB I get only 3 hangs but
they last 4 or 5 seconds. I checked on the NFS server with iostat at the
moment of a hang, and disk utilisation goes from a few % to 316 % in the
example below (for the 128 MB journal, within the 4-5 second hangs it goes
to 4700 %!)
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s
avgrq-sz avgqu-sz await svctm %util
/dev/emcpowerl2
0.00 150.67 97.33 224.00 768.00 3018.67 384.00
1509.33 11.78 33.33 19.79 9.83 316.00

Maybe it hangs because the journal commits on the SP!?

Well, finally, is externalizing the journal a safer way to get this
performance than using an async export?
And is it possible to externalize the journal of an already existing ext3
FS, or do we need to reformat it?

Thanks.

jehan procaccia wrote:

> Lever, Charles wrote:
>
>>
>> btw, it is fairly well understood that RAID-5 and NFS servers don't mix
>> well. RAID-5's weakest point is that it doesn't handle small random
>> writes very well, and that's exactly what is required of it when
>> handling NFS traffic that consists mostly of metadata changes (file
>> creates, deletes, and so on). neil explained clearly how to make the
>> best use of a RAID-5 with NFS: do your local file system journaling
>> somewhere else.
>>
>>
> No, not yet, but if it is safer and increase performances maybe I
> should do it !
>
> Perhaps it's not the place to talk about ext3 here, but if someone on
> the list did already put their journal on a separate device, please
> confirm me those points:
> From what I read on man mkefs for ext3 FS I can create a journal on a
> separate FS :
> mke2fs -O journal_dev external-journal
> creates the journal FS, on which device ? -> internal scsi drive of my
> server or better placed on the dell/EMC SP ?
>
> mke2fs -J device=/dev/external-journal /dev/emcpower
> Format the FS and use the external journal just create above, but what
> is the recommended size of the external journal ? when journal is
> internal it is said the size of the journal must be at least 1024
> filesystem blocks
> (in my case blocks a 4K size) so journal is at least 4 Mb, but should
> it be bigger ?
>
> Finally, can I "externalize" an already internal journal from
> production FS (convert journal from inside to outside without
> reformating the FS ) ?
>
> thanks.
>
>
>> when trying your workload locally on the NFS server, realize that there
>> are some optimizations that local file systems make, like caching and
>> coalescing metadata updates, that the NFS protocol does not allow. this
>> affects especially workloads with lots of metadata change operations,
>> because the NFS protocol requires each metadata update to reside on
>> permanent storage before the NFS server replies to the client,
>> effectively serializing the workload with storage activity.
>>
>>
>
>
>





2004-11-24 23:14:59

by jehan.procaccia

[permalink] [raw]
Subject: Re: async vs. sync

Neil Brown wrote:

>On Wednesday November 24, [email protected] wrote:
>
>
>>However now the tar extraction goes very fast but stops 1 or 2 or and
>>restart fast -> there are some hangs. Here with a 16MB journal I got 15
>>hangs of 1-2 seconds, with a 128 MB I get only 3 hangs but they last 4or
>>5 seconds. I checked at a momment of an hang on the nfs server with
>>iostat, and disk utilisation goes from a few % to 316 % in the exemple
>>below (for 128 MB journal withing the 4 seconds hangs it goes to 4700 % !)
>>Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s
>>avgrq-sz avgqu-sz await svctm %util
>>/dev/emcpowerl2
>> 0.00 150.67 97.33 224.00 768.00 3018.67 384.00
>>1509.33 11.78 33.33 19.79 9.83 316.00
>>
>>Maybe it hangs because the journal commits on the SP ! ?
>>
>>
>>
>
>It hangs because of some clumsy code in ext3 that no-one has bothered
>to fix yet - I had a look once but it was a little beyond the time I
>had to spare.
>
>When information is written to the journal, it stays in memory as well
>and is eventually written out to the main filesystem using normal
>lazy-flushing mechanisms (data is pushed out either due to memory
>pressure or because it has been idle for too long).
>When ext3 wants to add information to the head of the journal, it
>needs to clean up the tail to make space.
>If it finds that the data that was written to the tail is already
>safe in the main filesystem, it just frees up some of the tail and
>starts using it for a new head.
>HOWEVER, if it finds that the data in the tail hasn't made it to the
>main filesystem, it flushes *ALL* of the data in the journal out to
>the main filesystem. (It should only flush some fraction or fixed
>number of blocks or something). This flushing causes a very
>noticeable pause. The larger the journal, the less often the flush is
>needed, but the longer the flush lasts for.
>
>There are two ways to avoid this pause. One I have tested and works
>well. The other only just occurred to me and I haven't tried.
>
>The untested one involves making the journal larger than main memory.
>If it is that large, then memory pressure should flush out journal
>blocks before the journal wraps back to them, and so the flush should
>never happen. However such a large journal may cause other problems
>(slow replay) as mentioned in my other email.
>
>The way that works if to adjust the "bdflush" parameters so that data
>is flushed to disk more quickly. The default is to flush data once it
>is 30 seconds old. If you reduce that to 5 seconds, the problem goes
>away.
>
>For 2.4, I put
>vm.bdflush = 30 500 0 0 100 500 60 20 0
>
>in my /etc/sysctl.conf, which is equivalent to running
> echo 30 500 0 0 100 500 60 20 0 > /proc/sys/vm/bdflush
>
>

$ uname -r
2.4.21-4.ELsmp
here's what I had before setting the above:
$ cat /proc/sys/vm/bdflush
50 500 0 0 500 3000 80 50 0

Now indeed the pauses seem to be shorter (I've seen 12 instead of 15,
and they lasted less than 1s).
[root@arvouin Nfs-test]# time tar xvfz
/usr/src/redhat/SOURCES/httpd-2.0.51.tar.gz
real 1m22.504s
user 0m0.898s
sys 0m2.846s

On a 128MB journal it's even better; I don't see any pauses (I had at
least 3 of 4-5 seconds each before).
[root@arvouin Nfs-test]# time tar xvfz
/usr/src/redhat/SOURCES/httpd-2.0.51.tar.gz
real 0m25.038s
user 0m0.914s
sys 0m2.477s

Very good :-)

Just for the record, so that I'm sure how I got that performance, here
are the server's export options (data=journal in /etc/fstab for that FS!):
$ cat /var/lib/nfs/xtab
/mnt/emcpowerm1
arvouin.int-evry.fr(rw,sync,no_wdelay,hide,nocrossmnt,secure,no_root_squash,no_all_squash,subtree_check,secure_locks,no_acl,mapping=identity,anonuid=-2,anongid=-2)
and client mount option
[root@arvouin Nfs-test]# cat /proc/mounts
cobra3:/mnt/emcpowerm1 /mnt/cobra3extjournal nfs
rw,v3,rsize=8192,wsize=8192,hard,tcp,lock,addr=cobra3 0 0

To be sure of the improvement from the "hack" on /proc/sys/vm/bdflush,
I've set it back to the original values:
$ echo 50 500 0 0 500 3000 80 50 0 > /proc/sys/vm/bdflush

and dynamically (without unmounting or remounting anything on either side) tested again:

[root@arvouin Nfs-test]# time tar xvfz
/usr/src/redhat/SOURCES/httpd-2.0.51.tar.gz
real 1m19.655s
user 0m0.860s
sys 0m2.612s

The time is longer and the pauses are worse than I thought -> 3 pauses of
approximately 10 to 15 seconds each!

So it seems to be very good advice to echo 30 500 0 0 100 500 60 20 0
> /proc/sys/vm/bdflush :-)
However, this is a system-wide setting; will it disturb other devices?
What does each figure mean? And why were they set to a non-optimal
value in the first place?

PS: a different optimisation:
I've read that "the maximum block size is
defined by the value of the kernel constant *NFSSVC_MAXBLKSIZE*,
found in the Linux kernel source file ./include/linux/nfsd/const.h".
Is there a way to change my actual 8K buffer size to 32K without
recompiling the kernel?

thanks.


>For 2.6, I assume you would
> echo 500 > /proc/sys/vm/dirty_expire_centisecs
>but I haven't tested this.
>
>
>
>
>>Well, finally, is this safer in terms of performances to externalize
>>journal than using async export ?
>>
>>
>
>Absolutely, providing you trust the hardware that you are storing your
>journal on.
>An external journal is perfectly safe.
>async export is not.
>
>NeilBrown
>
>



-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://productguide.itmanagersjournal.com/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2004-11-24 23:35:24

by NeilBrown

[permalink] [raw]
Subject: Re: async vs. sync

On Thursday November 25, [email protected] wrote:
>
> So it seem to be a very good advice to echo 30 500 0 0 100 500 60 20 0
> > /proc/sys/vm/bdflush :-)
> however this is a general configuration, will it disturb other devices ?
> what means every figures here ? why where they set to an non optimal
> value iin the 1st place ?

The values are fairly optimal for most usages. They just happen to be
not so good for an NFS server using ext3 with data=journal....

The effect of the change is that data is flushed to disc more
quickly. This provides fewer opportunities to coalesce nearby data into
large blocks, and means that blocks that are being changed frequently
get written out more often. So it could cause a bit more disk IO, but
I doubt that it would be much.

As for the meanings of the numbers: fs/buffer.c contains:
/* The dummy values in this structure are left in there for compatibility
* with old programs that play with the /proc entries.
*/
union bdflush_param {
struct {
int nfract; /* Percentage of buffer cache dirty to
activate bdflush */
int ndirty; /* Maximum number of dirty blocks to write out per
wake-cycle */
int dummy2; /* old "nrefill" */
int dummy3; /* unused */
int interval; /* jiffies delay between kupdate flushes */
int age_buffer; /* Time for normal buffer to age before we flush it */
int nfract_sync;/* Percentage of buffer cache dirty to
activate bdflush synchronously */
int nfract_stop_bdflush; /* Percetange of buffer cache dirty to stop bdflush */
int dummy5; /* unused */
} b_un;
unsigned int data[N_PARAM];
} bdf_prm = {{30, 500, 0, 0, 5*HZ, 30*HZ, 60, 20, 0}};

/* These are the min and max parameter values that we will allow to be assigned */
int bdflush_min[N_PARAM] = { 0, 1, 0, 0, 0, 1*HZ, 0, 0, 0};
int bdflush_max[N_PARAM] = {100,50000, 20000, 20000,10000*HZ, 10000*HZ, 100, 100, 0};

so a lot of them don't mean anything. The key ones for this exercise
are "age_buffer" which we set to 500 (5 seconds) and "interval" which
needs to be less than age_buffer.


>
> PS: different optimisation:
> I've read this "the maximum block size is
> defined by the value of the kernel constant *NFSSVC_MAXBLKSIZE*,
> found in the Linux kernel source file ./include/linux/nfsd/const.h"
> is there a way to change my actual 8K buffer size to 32 K without
> recompiling the kernel ?

No, and you probably shouldn't. knfsd in 2.4 is unlikely to handle
requests larger than a couple of pages very well, which is why the
size is set to 2 pages by default. 2.6 handles larger requests much
better and so sets a larger maximum size.

NeilBrown


-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://productguide.itmanagersjournal.com/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2004-11-24 22:24:24

by NeilBrown

[permalink] [raw]
Subject: Re: async vs. sync

On Wednesday November 24, [email protected] wrote:
>
> However now the tar extraction goes very fast but stops 1 or 2 or and
> restart fast -> there are some hangs. Here with a 16MB journal I got 15
> hangs of 1-2 seconds, with a 128 MB I get only 3 hangs but they last 4or
> 5 seconds. I checked at a momment of an hang on the nfs server with
> iostat, and disk utilisation goes from a few % to 316 % in the exemple
> below (for 128 MB journal withing the 4 seconds hangs it goes to 4700 % !)
> Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s
> avgrq-sz avgqu-sz await svctm %util
> /dev/emcpowerl2
> 0.00 150.67 97.33 224.00 768.00 3018.67 384.00
> 1509.33 11.78 33.33 19.79 9.83 316.00
>
> Maybe it hangs because the journal commits on the SP ! ?
>

It hangs because of some clumsy code in ext3 that no-one has bothered
to fix yet - I had a look once but it was a little beyond the time I
had to spare.

When information is written to the journal, it stays in memory as well
and is eventually written out to the main filesystem using normal
lazy-flushing mechanisms (data is pushed out either due to memory
pressure or because it has been idle for too long).
When ext3 wants to add information to the head of the journal, it
needs to clean up the tail to make space.
If it finds that the data that was written to the tail is already
safe in the main filesystem, it just frees up some of the tail and
starts using it for a new head.
HOWEVER, if it finds that the data in the tail hasn't made it to the
main filesystem, it flushes *ALL* of the data in the journal out to
the main filesystem. (It should only flush some fraction or fixed
number of blocks or something). This flushing causes a very
noticeable pause. The larger the journal, the less often the flush is
needed, but the longer the flush lasts for.

There are two ways to avoid this pause. One I have tested and works
well. The other only just occurred to me and I haven't tried.

The untested one involves making the journal larger than main memory.
If it is that large, then memory pressure should flush out journal
blocks before the journal wraps back to them, and so the flush should
never happen. However such a large journal may cause other problems
(slow replay) as mentioned in my other email.

The way that works is to adjust the "bdflush" parameters so that data
is flushed to disk more quickly. The default is to flush data once it
is 30 seconds old. If you reduce that to 5 seconds, the problem goes
away.

For 2.4, I put
vm.bdflush = 30 500 0 0 100 500 60 20 0

in my /etc/sysctl.conf, which is equivalent to running
echo 30 500 0 0 100 500 60 20 0 > /proc/sys/vm/bdflush

For 2.6, I assume you would
echo 500 > /proc/sys/vm/dirty_expire_centisecs
but I haven't tested this.


> Well, finally, is this safer in terms of performances to externalize
> journal than using async export ?

Absolutely, providing you trust the hardware that you are storing your
journal on.
An external journal is perfectly safe.
async export is not.

NeilBrown


-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://productguide.itmanagersjournal.com/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2004-11-24 19:05:35

by Lever, Charles

[permalink] [raw]
Subject: RE: async vs. sync

> However now the tar extraction goes very fast but stops 1 or 2 or and
> restart fast -> there are some hangs. Here with a 16MB journal I got 15
> hangs of 1-2 seconds, with a 128 MB I get only 3 hangs but
> they last 4 or 5 seconds. I checked at a momment of an hang on the nfs server with
> iostat, and disk utilisation goes from a few % to 316 % in the exemple
> below (for 128 MB journal withing the 4 seconds hangs it goes
> to 4700 % !)
> Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s
> avgrq-sz avgqu-sz await svctm %util
> /dev/emcpowerl2
> 0.00 150.67 97.33 224.00 768.00 3018.67 384.00
> 1509.33 11.78 33.33 19.79 9.83 316.00
>
> Maybe it hangs because the journal commits on the SP ! ?

i'll leave that to NFS performance experts. in general, increasing the
size of the journal means that the physical file system can handle a
higher transaction rate, so you are going in the right direction. but i
don't have any specific knowledge about journal sizing best practices.

> Well, finally, is this safer in terms of performances to externalize=20
> journal than using async export ?

neil recommends mirroring the journal, but imho, that may not be
necessary. if the journal disk goes bad, you can just fsck the RAID5
array and replace the journal disk. otherwise, yes, you should use the
"sync" export option and a separate journal disk for best data integrity
and good performance.

> And is it possible to externalize a journal on an already=20
> existing ext3 FS or do we need to reformat it ?

i'm just guessing, but i think you can do this. you should be able to
disable journaling on the existing FS, then re-enable it with the new
journal device. naturally you should back up your file system before
trying anything.

> jehan procaccia wrote:
>
> > Lever, Charles wrote:
> >
> >>
> >> btw, it is fairly well understood that RAID-5 and NFS servers don't mix
> >> well. RAID-5's weakest point is that it doesn't handle small random
> >> writes very well, and that's exactly what is required of it when
> >> handling NFS traffic that consists mostly of metadata changes (file
> >> creates, deletes, and so on). neil explained clearly how to make the
> >> best use of a RAID-5 with NFS: do your local file system journaling
> >> somewhere else.
> >>
> >>
> > No, not yet, but if it is safer and increase performances maybe I
> > should do it !
> >
> > Perhaps it's not the place to talk about ext3 here, but if someone on
> > the list did already put their journal on a separate device, please
> > confirm me those points:
> > From what I read on man mkefs for ext3 FS I can create a journal on a
> > separate FS :
> > mke2fs -O journal_dev external-journal
> > creates the journal FS, on which device ? -> internal scsi drive of my
> > server or better placed on the dell/EMC SP ?
> >
> > mke2fs -J device=/dev/external-journal /dev/emcpower
> > Format the FS and use the external journal just create above, but what
> > is the recommended size of the external journal ? when journal is
> > internal it is said the size of the journal must be at least 1024
> > filesystem blocks
> > (in my case blocks a 4K size) so journal is at least 4 Mb, but should
> > it be bigger ?
> >
> > Finally, can I "externalize" an already internal journal from
> > production FS (convert journal from inside to outside without
> > reformating the FS ) ?
> >
> > thanks.
> >
> >
> >> when trying your workload locally on the NFS server, realize that there
> >> are some optimizations that local file systems make, like caching and
> >> coalescing metadata updates, that the NFS protocol does not allow. this
> >> affects especially workloads with lots of metadata change operations,
> >> because the NFS protocol requires each metadata update to reside on
> >> permanent storage before the NFS server replies to the client,
> >> effectively serializing the workload with storage activity.
> >>
> >>
> >
> >
> >
>
>
>



2004-11-24 22:09:42

by NeilBrown

[permalink] [raw]
Subject: Re: async vs. sync

On Tuesday November 23, [email protected] wrote:
>
> Perhaps it's not the place to talk about ext3 here, but if someone on
> the list did already put their journal on a separate device, please
> confirm me those points:
> From what I read on man mkefs for ext3 FS I can create a journal on a
> separate FS :
> mke2fs -O journal_dev external-journal
> creates the journal FS, on which device ? -> internal scsi drive of my
> server or better placed on the dell/EMC SP ?

Any device that is reliable and will respond very quickly to
sequential writes is fine. A dedicated mirrored pair of fast SCSI drives is
ideal if you can afford it.
If the "dell/EMC SP" can be configured to provide a mirrored pair of
otherwise lightly used drives, then a partition on that would be
fine.
If you put the journal on a single drive, then a failure of that drive
will cost you some data loss, and a full 'fsck' of the main storage.

I put my journals on internal drives. One cost of this is if the
server dies (e.g. bad power supply) then you need to move the internal
drives to some other computer before you can get your data back. This
can be trivial, or frustratingly difficult(*).

>
> mke2fs -J device=/dev/external-journal /dev/emcpower
> Format the FS and use the external journal just create above, but what is the recommended size of the
> external journal ? when journal is internal it is said the size of the journal must be at least 1024 filesystem blocks
> (in my case blocks a 4K size) so journal is at least 4 Mb, but
> should it be bigger ?

I think 1gigabyte is a good size - probably much bigger than needed,
but not too big. I wouldn't go below 128Meg myself.

I once had a 6gig journal (because there was plenty of space, so "why
not").
I found out why not: On restart ext3 has to scan the whole journal.
For a 6gig journal that takes a while. Not as long as fsck, but still
too long.

>
> Finally, can I "externalize" an already internal journal from
> production FS (convert journal from inside to outside without
> reformating the FS ) ?

Yes, but not while the filesystem is mounted.

Use tune2fs to remove the internal journal:
tune2fs -O ^has_journal /dev/emcpower
create an external journal:
mke2fs -O journal_dev /dev/external-journal
and attach the journal to the fs:
tune2fs -J device=/dev/external-journal /dev/emcpower

then re-mount.

NeilBrown

(*) I recently needed to move a large filesystem that was on an
external drive box from a failing machine to another machine. Both
had hot-swap scsi drives for the internal system discs though they
were from different manufacturers.
The drive caddies were different but unscrewing a drive from one and
putting it in the other didn't seem like too much trouble .... until
we tried it. The caddy from one manufacturer was designed so there
was no tolerance. A drive 0.5mm larger than the one they provided
simply didn't fit. The drive I wanted to put in was 0.5mm larger :-(

Fortunately, as it was a controlled change over, we could detach the
journal on the system discs of one machine and re-attach a new journal
on the other. If the first computer had blown up instead of started
failing occasionally, it would have been *much* more awkward.



2004-11-16 16:16:12

by Lever, Charles

[permalink] [raw]
Subject: RE: async vs. sync

is the effect of the "sync" export option limited to NFSv3 COMMIT, or
does it apply to both NFSv3 COMMIT and NFSv3 FILE_SYNC/DATA_SYNC WRITE?

what are the effects on NFSv4 writes and commits?

> -----Original Message-----
> From: Olaf Kirch [mailto:[email protected]]
> Sent: Wednesday, July 28, 2004 4:57 AM
> To: Bernd Schubert
> Cc: [email protected]
> Subject: Re: [NFS] async vs. sync
>
>
> Hi,
>
> the way the sync export option affects NFSv3 writes is
> limited to COMMITs, so if you see a slow-down here it must be
> bottle-necking in that part of the code.



2004-11-16 16:33:12

by Trond Myklebust

[permalink] [raw]
Subject: RE: async vs. sync

On Tue, 16 Nov 2004 at 08:15 (-0800), Lever, Charles wrote:
> is the effect of the "sync" export option limited to NFSv3 COMMIT, or is
> it limited to both NFSv3 COMMIT and NFSv3 FILE_SYNC/DATA_SYNC WRITE?
>
> what are the effects on NFSv4 writes and commits?

No! It is clearly not just limited to writes and commits.

Look at the code in fs/nfsd/vfs.c: there are EX_ISSYNC() checks that
wrap calls to nfsd_sync_dir() in nfsd*_create(), nfsd_symlink(),
nfsd_link(), nfsd_rename(), and nfsd_unlink().

Cheers,
Trond

--
Trond Myklebust <[email protected]>




2004-11-16 17:19:05

by jehan.procaccia

[permalink] [raw]
Subject: Re: async vs. sync

Trond Myklebust wrote:

>ty den 16.11.2004 Klokka 08:15 (-0800) skreiv Lever, Charles:
>
>
>>is the effect of the "sync" export option limited to NFSv3 COMMIT, or is
>>it limited to both NFSv3 COMMIT and NFSv3 FILE_SYNC/DATA_SYNC WRITE?
>>
>>what are the effects on NFSv4 writes and commits?
>>
>>
>
>No! It is clearly not just limited to writes and commits.
>
>Look at the code in fs/nfs/vfs.c: there are EX_ISSYNC() exceptions that
>wrap calls to nfsd_sync_dir() in nfsd*_create(), nfsd_symlink(),
>nfsd_link(), nfsd_rename(), and nfsd_unlink().
>
>Cheers,
> Trond
>
>
>
By the way, I noticed a performance factor of 30 (!) between a sync and
an async NFSv3 export of the same FS:

sync export:

$time tar xvfz linux-2.6.8.tar.gz
real 64m18.618s
user 0m5.742s
sys 0m15.658s

async export:

$time tar xvfz linux-2.6.8.tar.gz
real 2m0.552s
user 0m5.838s
sys 0m15.678s


Is it really dangerous to use async? Why do recent OSes use sync by
default? (My client is a Fedora Core 2 (kernel 2.6) and the server a
RedHat Enterprise server 3 (kernel 2.4).)

thanks.



2004-11-16 18:08:32

by Trond Myklebust

[permalink] [raw]
Subject: Re: async vs. sync

On Tue, 16 Nov 2004 at 18:18 (+0100), jehan.procaccia wrote:
> >
> By the way, I noticed a performance factor of 30 (!) between a sync and
> an async NFS v3 export:
>
> sync export:
>
> $time tar xvfz linux-2.6.8.tar.gz
> real 64m18.618s
> user 0m5.742s
> sys 0m15.658s
>
> async export:
>
> $time tar xvfz linux-2.6.8.tar.gz
> real 2m0.552s
> user 0m5.838s
> sys 0m15.678s

What mount options are you using? I don't see a factor 30 difference
using my setup.

> Is it really dangerous to use async? Why do recent OSes use sync by
> default? (My client is Fedora Core 2 (kernel 2.6) and my server is
> Red Hat Enterprise 3 (kernel 2.4).)

"async" is bad because it lies to you about whether or not the data is
on disk. Type "sync", and it will happily return and tell your
application that all is well, but you will still lose your data if the
server crashes on you...

Cheers,
Trond

--
Trond Myklebust <[email protected]>
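
[Editor's note: a small shell illustration of the window Trond describes,
using only a scratch temp file. The first dd returns as soon as the page
cache holds the data, which is what an "async" server ack corresponds to;
conv=fsync makes dd wait until the data is on stable storage, which is the
"sync" behaviour.]

```shell
# Illustration only: async-style vs sync-style write completion.
f=$(mktemp)

# "async"-style: success is reported once the kernel's page cache has the
# data; a crash before writeback would lose it.
dd if=/dev/zero of="$f" bs=4k count=1 2>/dev/null

# "sync"-style: conv=fsync forces the data to stable storage before dd exits.
dd if=/dev/zero of="$f" bs=4k count=1 conv=fsync 2>/dev/null

rm -f "$f"
echo "done"
```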




2004-11-16 18:45:37

by Lever, Charles

Subject: RE: async vs. sync

> Is it really dangerous to use async? Why do recent OSes use sync
> by default? (My client is Fedora Core 2 (kernel 2.6) and my
> server is Red Hat Enterprise 3 (kernel 2.4).)

see the NFS FAQ:

http://nfs.sourceforge.net/

questions B4 and B5, at least.



2004-11-16 18:48:21

by Lever, Charles

Subject: RE: async vs. sync

> On Tue, 16 Nov 2004 at 08:15 (-0800), Lever, Charles wrote:
> > is the effect of the "sync" export option limited to NFSv3 COMMIT, or is
> > it limited to both NFSv3 COMMIT and NFSv3 FILE_SYNC/DATA_SYNC WRITE?
> >
> > what are the effects on NFSv4 writes and commits?
>
> No! It is clearly not just limited to writes and commits.
>
> Look at the code in fs/nfsd/vfs.c: there are EX_ISSYNC()
> exceptions that wrap calls to nfsd_sync_dir() in nfsd*_create(),
> nfsd_symlink(), nfsd_link(), nfsd_rename(), and nfsd_unlink().

ulp. wow. this clearly affects all versions of NFS on the Linux
server, then.

i'm just looking for clarification so i can provide a good explanation
in the Linux NFS FAQ about the evils of using "async." i'll cruise
through the server code.




2004-11-17 07:00:16

by Oliver Beowulf Friedrich

Subject: Problem with NFS-Exports

Greetings @ all,

I'm new here on the list, but already have a simple question for you.

I'm using Gentoo Linux, and currently have a working server at home
running Gentoo which exports Gentoo's temporary folders out to my
workstations.

server root # cat /etc/exports
/home 192.168.0.0/24(rw,async)
/usr/portage 192.168.0.0/24(rw,no_root_squash,async,mp)
/tmp 192.168.0.0/24(rw,no_root_squash,insecure_locks,async,mp)
/mnt/data 192.168.0.0/24(rw,no_root_squash,mp,sync)
/mnt/backup 192.168.0.0/24(rw,no_root_squash,mp,sync)


One workstation is now compiling a fresh Gentoo Linux install, so I
started by booting from the Gentoo-2004.3 liveCD, mounting local
filesystems, mounting NFS filesystems, chrooting and starting to
compile.

So far so good, but while compiling, I get the following error messages
printed out quite often:

lockd: failed to monitor 192.168.0.2
lockd: cannot monitor 192.168.0.2

So what's going on there? Sometimes it goes so far that my new
workstation cannot compile, because of:

lockd: server is not responding
lockd: still trying OK

How do I get this solved? With my previous installation it worked
fine; I had only NFS-Server support compiled into the kernel, now I
have NFSv3-Server support compiled in.

My server is not used too heavily, only a little Samba server for
my (only) Windows box that MP3s are shared to...

On my last installation on my server I used distcc while compiling,
and even that did not lead to such error messages...

Thanks for reading

BeowulfOF
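
[Editor's note: these lockd messages usually mean rpc.statd (the NSM
"status" service lockd uses to monitor peers) is not registered with the
portmapper on one side; a chrooted liveCD install in particular often has
no statd running inside the chroot. A hedged diagnostic sketch follows;
the host name is a placeholder, and mounting with -o nolock is the common
workaround when NFS locking is not actually needed.]

```shell
# Check whether a "status" (rpc.statd) registration is visible via the
# portmapper on a given host; print a hint if it is missing.
check_statd() {
    rpcinfo -p "$1" 2>/dev/null | grep -qw status
}

if check_statd localhost; then
    echo "statd registered"
else
    echo "statd missing - start rpc.statd, or mount with -o nolock"
fi
```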





2004-12-01 17:27:28

by jehan.procaccia

Subject: Re: async vs. sync

Yes, I made a write-up of this long thread to summarize it. It is mainly
in French but mixes in a lot of English, and the captures are also in
English, so it should be easy for you to read.
Here it is:

http://www.int-evry.fr/mci/user/procacci/Doc/nfs.html

Check mainly from chapter 8 to the end.
Thanks everyone for your help.

Alex Mccoll wrote:

>Hi Jehan,
>I've been following your discussion with Neil Brown, thank you
>for your insights! I'm roughly in the same situation, and it's
>been very helpful for me to find your thread!
>Say, did you figure out how to create the journal on a separate
>device? If you have any notes on that, could you send them
>my way?
>
>Thanks, in advance,
>Alex.
>
>
>On Tue, 23 Nov 2004, jehan procaccia wrote:
>
>
>
>>Date: Tue, 23 Nov 2004 22:46:44 +0100
>>From: jehan procaccia <[email protected]>
>>To: "Lever, Charles" <[email protected]>
>>Cc: [email protected]
>>Subject: Re: [NFS] async vs. sync
>>
>>Lever, Charles wrote:
>>
>>
>>
>>>>This is what I expect in terms of performance. I will continue my
>>>>requests on the Dell/EMC hotline, but maybe the security of
>>>>that AX100
>>>>storage processor (RAID-5, spare disk, double fiber attachment, UPS)
>>>>allows me to use async export mode in such a case?
>>>>
>>>>
>>>>
>>>>
>>>the "async" export option changes the behavior of the NFS server
>>>daemons, not of the underlying local file system or storage subsystem.
>>>the problem is that changes made by clients will remain in your NFS
>>>server's memory and not get flushed onto permanent storage.
>>>
>>>so, i really don't think the storage subsystem will have any effect on
>>>the safety of your data before the data reaches permanent storage. as
>>>someone else pointed out earlier, the solution is to use battery-backed
>>>main memory when using "async" (prestoserve for solaris?).
>>>
>>>as trond said, if your users and backup facilities can tolerate the loss
>>>of data during a crash, then it is perfectly fine to use "async." most
>>>don't, however.
>>>
>>>btw, it is fairly well understood that RAID-5 and NFS servers don't mix
>>>well. RAID-5's weakest point is that it doesn't handle small random
>>>writes very well, and that's exactly what is required of it when
>>>handling NFS traffic that consists mostly of metadata changes (file
>>>creates, deletes, and so on). neil explained clearly how to make the
>>>best use of a RAID-5 with NFS: do your local file system journaling
>>>somewhere else.
>>>
>>>
>>>
>>>
>>No, not yet, but if it is safer and increases performance maybe I
>>should do it!
>>
>>Perhaps it's not the place to talk about ext3 here, but if someone on
>>the list has already put their journal on a separate device, please
>>confirm these points for me:
>> From what I read in man mke2fs, for an ext3 FS I can create a journal
>>on a separate device:
>>mke2fs -O journal_dev external-journal
>>creates the journal FS. On which device? -> an internal SCSI drive of
>>my server, or better placed on the Dell/EMC SP?
>>
>>mke2fs -J device=/dev/external-journal /dev/emcpower
>>formats the FS and uses the external journal just created above. But
>>what is the recommended size of the external journal? When the journal
>>is internal, it is said the size of the journal must be at least 1024
>>filesystem blocks (in my case blocks are 4K), so the journal is at
>>least 4 MB, but should it be bigger?
>>
>>Finally, can I "externalize" an already internal journal from a
>>production FS (convert the journal from inside to outside without
>>reformatting the FS)?
>>
>>thanks.
>>
>>thanks.
>>
>>
>>
>>
>>>when trying your workload locally on the NFS server, realize that there
>>>are some optimizations that local file systems make, like caching and
>>>coalescing metadata updates, that the NFS protocol does not allow. this
>>>affects especially workloads with lots of metadata change operations,
>>>because the NFS protocol requires each metadata update to reside on
>>>permanent storage before the NFS server replies to the client,
>>>effectively serializing the workload with storage activity.
>>>
>>>
>>>
>>>
>>
>>-------------------------------------------------------
>>SF email is sponsored by - The IT Product Guide
>>Read honest & candid reviews on hundreds of IT Products from real users.
>>Discover which products truly live up to the hype. Start reading now.
>>http://productguide.itmanagersjournal.com/
>>_______________________________________________
>>NFS maillist - [email protected]
>>https://lists.sourceforge.net/lists/listinfo/nfs
>>
>>
>>


