2009-09-02 18:13:55

by Jason Legate

Subject: NFS for millions of files

Hi, I'm trying to set up a server that we can create millions of files on over
NFS. When I run our creation benchmark locally I can get around 3000 files/
second in the configuration we're using now, but only around 300/second over
NFS. It's mounted like this:

rw,nosuid,nodev,noatime,nodiratime,hard,bg,nointr,rsize=32768,wsize=32768,tcp,
nfsvers=3,timeo=600,actimeo=600,nocto
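
For reference, the full mount invocation looks roughly like this (server
name, export path and mount point below are placeholders):

  mount -t nfs -o rw,nosuid,nodev,noatime,nodiratime,hard,bg,nointr,\
    rsize=32768,wsize=32768,tcp,nfsvers=3,timeo=600,actimeo=600,nocto \
    server:/export /mnt/nfs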

When I mount the same FS over localhost instead of across the LAN, it performs
at about full speed (the 3000/sec). Anyone have any ideas what I might tweak or
look at?

We're going to be testing various XFS/LVM configs to get the best performance,
but right out of the gate, a 10:1 performance penalty for NFS doesn't bode
well.

Thanks in advance,
Jason


2009-09-02 19:27:06

by Peter Chacko

Subject: Re: NFS for millions of files

Of course, fragmentation doesn't matter if there is no data IO... you're
right. I have usually seen significant performance gains with rsize/wsize
around 4K while the MTU was at 1.5K... as you said, it's the right balance
between the performance gains at the RPC/UDP stack and the fragmentation
overhead at the L3/L2 stack.

Maybe you can try adding more threads on the server, and if it is MPIO
(multipath IO)-capable, you may find better performance. And lastly, NFS is
not built for performance, until you use NFSoRDMA or pNFS. As Chuck
mentioned, an NFS server backed by a sequential, log-based file system
offers better performance at the server storage level.

And if you have a real need: for my previous employer I created an NFS
load balancer that delivers superior performance (better than a local FS
for large metadata-intensive workloads, or where IO is spread across
different files). You can find more details here. It's not open source,
but I can have them contact you and give you a trial version:

http://www.calsoftlabs.com/whitepapers/net-nfs.html

thanks

On Thu, Sep 3, 2009 at 12:40 AM, Aaron Wiebe<[email protected]> wrote:
> On Wed, Sep 2, 2009 at 2:35 PM, Peter Chacko<[email protected]> wrote:
>> Is this NFSv4? rsize and wsize larger than the MTU will cause fragmentation
>> and performance issues... try making them around 4k. You used 1<<15
>> in your example. If you don't do writes, then this shouldn't
>> matter... and of course, NFS is "Not For Scalability". You cannot
>> get the same performance on NFS as you would get from a local FS. Maybe
>> you can try 10GbE... there is still TCP/UDP/IP stack overhead.
>
> Fragmentation won't hurt you that much, and it doesn't even apply to
> opens, since those operations are significantly smaller. The
> performance gains higher in the stack from a larger rsize/wsize
> generally outweigh the savings from frame size optimization. Of course,
> jumbo frames are always a good idea, but again, they don't help here.
>
> -Aaron
>



--
Best regards,
Peter Chacko

NetDiox computing systems,
Network storage & OS training and research.
Bangalore, India.
http://www.netdiox.com
080 2664 0708

2009-09-02 18:19:15

by Aaron Wiebe

Subject: Re: NFS for millions of files

Have a look at these two kernel params - I'd recommend bumping them up
to 128 (they're 16 by default).

sunrpc.tcp_slot_table_entries
sunrpc.udp_slot_table_entries
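
Something like this should do it (to persist across reboots, the same keys
can go in /etc/sysctl.conf; note the sysctls only exist once the sunrpc
module is loaded):

  # raise the number of concurrent in-flight RPC requests per transport
  sysctl -w sunrpc.tcp_slot_table_entries=128
  sysctl -w sunrpc.udp_slot_table_entries=128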

Keep in mind that this could also be a serialization issue. If you've
got a 3ms latency, and you're performing all of your opens serially,
you aren't going to get much faster. If you do the work in parallel
you'll likely get substantially better numbers.
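
As a rough sketch of what I mean by parallel (assumes GNU seq/xargs; the
target directory is a placeholder):

  # create 100k empty files with up to 10 concurrent workers,
  # batching 1000 file names per touch invocation
  seq -f "/mnt/nfs/testdir/file%06g" 1 100000 | xargs -P 10 -n 1000 touch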

-Aaron


On Wed, Sep 2, 2009 at 2:08 PM, Jason Legate<[email protected]> wrote:
> Hi, I'm trying to set up a server that we can create millions of files on over
> NFS. When I run our creation benchmark locally I can get around 3000 files/
> second in the configuration we're using now, but only around 300/second over
> NFS. It's mounted like this:
>
> rw,nosuid,nodev,noatime,nodiratime,hard,bg,nointr,rsize=32768,wsize=32768,tcp,
> nfsvers=3,timeo=600,actimeo=600,nocto
>
> When I mount the same FS over localhost instead of across the LAN, it performs
> at about full speed (the 3000/sec). Anyone have any ideas what I might tweak or
> look at?
>
> We're going to be testing various XFS/LVM configs to get the best performance,
> but right out of the gate, a 10:1 performance penalty for NFS doesn't bode
> well.
>
> Thanks in advance,
> Jason

2009-09-02 19:10:46

by Aaron Wiebe

Subject: Re: NFS for millions of files

On Wed, Sep 2, 2009 at 2:35 PM, Peter Chacko<[email protected]> wrote:
> Is this NFSv4? rsize and wsize larger than the MTU will cause fragmentation
> and performance issues... try making them around 4k. You used 1<<15
> in your example. If you don't do writes, then this shouldn't
> matter... and of course, NFS is "Not For Scalability". You cannot
> get the same performance on NFS as you would get from a local FS. Maybe
> you can try 10GbE... there is still TCP/UDP/IP stack overhead.

Fragmentation won't hurt you that much, and it doesn't even apply to
opens, since those operations are significantly smaller. The
performance gains higher in the stack from a larger rsize/wsize
generally outweigh the savings from frame size optimization. Of course,
jumbo frames are always a good idea, but again, they don't help here.
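
For what it's worth, jumbo frames are typically enabled along these lines
(assuming the NIC and every switch in the path support them; eth0 is a
placeholder):

  # raise the interface MTU to a jumbo frame size
  ip link set dev eth0 mtu 9000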

-Aaron

2009-09-03 18:15:37

by J. Bruce Fields

Subject: Re: NFS for millions of files

On Wed, Sep 02, 2009 at 02:37:31PM -0400, Peter Staubach wrote:
> Please keep in mind that the NFS stable storage requirements are
> probably causing a significant slowdown in activities such as this.

My first thought too, but:

> Jason Legate wrote:
> > When I run our creation benchmark locally I can get around 3000
> > files/ second in the configuration we're using now, but only around
> > 300/second over NFS. It's mounted as this:
...
> > When I mount the same FS over localhost instead of across the lan,
> > it performs about full speed (the 3000/sec).

The localhost NFS mount would be incurring the same sync latency, so all
his latency must be due to network. (And with those numbers I guess
he's either got lots of disk spindles, or an ssd, or (uh-oh) has the
async option set?)

--b.

2009-09-02 19:02:06

by Jason Legate

Subject: Re: NFS for millions of files

NFSv3, and I will try various other rsize/wsize values.

Apologies for the accidental double post...

We are using XFS with a separate logdev on dedicated RAID 1 15k SAS drives,
and are already using write caching. I guess I can try playing with a few
other things. Thanks for all the pointers.

Jason

It is said that Peter Chacko, at Thu, 03 Sep 2009, wrote:

> Is this NFSv4? rsize and wsize larger than the MTU will cause fragmentation
> and performance issues... try making them around 4k. You used 1<<15
> in your example. If you don't do writes, then this shouldn't
> matter... and of course, NFS is "Not For Scalability". You cannot
> get the same performance on NFS as you would get from a local FS. Maybe
> you can try 10GbE... there is still TCP/UDP/IP stack overhead.
>
>
> On Wed, Sep 2, 2009 at 11:49 PM, Aaron Wiebe<[email protected]> wrote:
> > Have a look at these two kernel params - I'd recommend bumping them up
> > to 128 (they're 16 by default).
> >
> > sunrpc.tcp_slot_table_entries
> > sunrpc.udp_slot_table_entries
> >
> > Keep in mind that this could also be a serialization issue. If you've
> > got a 3ms latency, and you're performing all of your opens serially,
> > you aren't going to get much faster. If you do the work in parallel
> > you'll likely get substantially better numbers.
> >
> > -Aaron
> >
> >
> > On Wed, Sep 2, 2009 at 2:08 PM, Jason Legate<[email protected]> wrote:
> >> Hi, I'm trying to set up a server that we can create millions of files on over
> >> NFS. When I run our creation benchmark locally I can get around 3000 files/
> >> second in the configuration we're using now, but only around 300/second over
> >> NFS. It's mounted like this:
> >>
> >> rw,nosuid,nodev,noatime,nodiratime,hard,bg,nointr,rsize=32768,wsize=32768,tcp,
> >> nfsvers=3,timeo=600,actimeo=600,nocto
> >>
> >> When I mount the same FS over localhost instead of across the LAN, it performs
> >> at about full speed (the 3000/sec). Anyone have any ideas what I might tweak or
> >> look at?
> >>
> >> We're going to be testing various XFS/LVM configs to get the best performance,
> >> but right out of the gate, a 10:1 performance penalty for NFS doesn't bode
> >> well.
> >>
> >> Thanks in advance,
> >> Jason
>
>
>
> --
> Best regards,
> Peter Chacko
>
> NetDiox computing systems,
> Network storage & OS training and research.
> Bangalore, India.
> http://www.netdiox.com
> 080 2664 0708

2009-09-03 18:35:49

by Ric Wheeler

Subject: Re: NFS for millions of files

On 09/03/2009 02:15 PM, J. Bruce Fields wrote:
> On Wed, Sep 02, 2009 at 02:37:31PM -0400, Peter Staubach wrote:
>
>> Please keep in mind that the NFS stable storage requirements are
>> probably causing a significant slowdown in activities such as this.
>>
> My first thought too, but:
>
>
>> Jason Legate wrote:
>>
>>> When I run our creation benchmark locally I can get around 3000
>>> files/ second in the configuration we're using now, but only around
>>> 300/second over NFS. It's mounted as this:
>>>
> ...
>
>>> When I mount the same FS over localhost instead of across the lan,
>>> it performs about full speed (the 3000/sec).
>>>
> The localhost NFS mount would be incurring the same sync latency, so all
> his latency must be due to network. (And with those numbers I guess
> he's either got lots of disk spindles, or an ssd, or (uh-oh) has the
> async option set?)
>
> --b.
>

For small files without doing an fsync per file, getting 3000 files/sec
is not that much. Ext3 can do it with a local SATA disk. I suspect that
Jason would run much slower if he ran with local fsync()s enabled
(similar to what NFS servers have to do).

Some quick testing on a local F12 (rawhide) box:

No fsync: 3729 files/sec

[root@ricdesktop rwheeler]# fs_mark -s 20480 -n 10000 -S 0 -d /test/test_dir

# fs_mark -s 20480 -n 10000 -S 0 -d /test/test_dir
# Version 3.3, 1 thread(s) starting at Thu Sep 3 14:26:26 2009
# Sync method: NO SYNC: Test does not issue sync() or fsync() calls.
# Directories: no subdirectories used
# File names: 40 bytes long, (16 initial bytes of time stamp with 24
random bytes at end of name)
# Files info: size 20480 bytes, written with an IO size of 16384
bytes per write
# App overhead is time in microseconds spent in the test not doing
file writing related system calls.

FSUse% Count Size Files/sec App Overhead
5 10000 20480 3729.0 315859

Fsync & working barriers: 24.8 files/sec

[root@ricdesktop rwheeler]# fs_mark -s 20480 -n 10000 -S 1 -d /test/test_dir

# fs_mark -s 20480 -n 10000 -S 1 -d /test/test_dir
# Version 3.3, 1 thread(s) starting at Thu Sep 3 14:27:00 2009
# Sync method: INBAND FSYNC: fsync() per file in write loop.
# Directories: no subdirectories used
# File names: 40 bytes long, (16 initial bytes of time stamp with 24
random bytes at end of name)
# Files info: size 20480 bytes, written with an IO size of 16384
bytes per write
# App overhead is time in microseconds spent in the test not doing
file writing related system calls.

FSUse% Count Size Files/sec App Overhead
5 10000 20480 24.8 350322

Fsync/no write barriers: 377.1 files/sec

[root@ricdesktop rwheeler]# umount /test/
[root@ricdesktop rwheeler]# mount -o barrier=0 /dev/sdb /test/
[root@ricdesktop rwheeler]# fs_mark -s 20480 -n 10000 -S 1 -d /test/test_dir

# fs_mark -s 20480 -n 10000 -S 1 -d /test/test_dir
# Version 3.3, 1 thread(s) starting at Thu Sep 3 14:36:27 2009
# Sync method: INBAND FSYNC: fsync() per file in write loop.
# Directories: no subdirectories used
# File names: 40 bytes long, (16 initial bytes of time stamp with 24
random bytes at end of name)
# Files info: size 20480 bytes, written with an IO size of 16384
bytes per write
# App overhead is time in microseconds spent in the test not doing
file writing related system calls.

FSUse% Count Size Files/sec App Overhead
5 10000 20480 377.1 328472


Ric

2009-09-02 18:55:46

by Jason Legate

Subject: Re: NFS for millions of files

It is said that Aaron Wiebe, at Wed, 02 Sep 2009, wrote:

> Have a look at these two kernel params - I'd recommend bumping them up
> to 128 (they're 16 by default).
>
> sunrpc.tcp_slot_table_entries
> sunrpc.udp_slot_table_entries

They were indeed 16. I've bumped them up to 128, and am trying again.

> Keep in mind that this could also be a serialization issue. If you've
> got a 3ms latency, and you're performing all of your opens serially,
> you aren't going to get much faster. If you do the work in parallel
> you'll likely get substantially better numbers.

It appears it might just be serialization. When I run more than 1 thread,
I can consistently get between 200 and 250 files/thread/second (up to about
10 threads, at which point I get diminishing returns per thread).

I seem to be doing a lot of GETATTRs, and it appears to be driven by Perl.
We are open()ing files with +>filename, and apparently Perl does an fstat
for us after opening. Thanks for the tip, Aaron!
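
In case it helps anyone else, one way to see the per-op traffic is to
snapshot the client-side counters around a run, something like:

  # compare NFSv3 client op counts before and after the benchmark;
  # the delta shows how many GETATTRs the workload generates
  nfsstat -c -3 > before.txt
  # ... run the creation benchmark ...
  nfsstat -c -3 > after.txt
  diff before.txt after.txt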

-j

> -Aaron
>
>
> On Wed, Sep 2, 2009 at 2:08 PM, Jason Legate<[email protected]> wrote:
> > Hi, I'm trying to set up a server that we can create millions of files on over
> > NFS. When I run our creation benchmark locally I can get around 3000 files/
> > second in the configuration we're using now, but only around 300/second over
> > NFS. It's mounted like this:
> >
> > rw,nosuid,nodev,noatime,nodiratime,hard,bg,nointr,rsize=32768,wsize=32768,tcp,
> > nfsvers=3,timeo=600,actimeo=600,nocto
> >
> > When I mount the same FS over localhost instead of across the LAN, it performs
> > at about full speed (the 3000/sec). Anyone have any ideas what I might tweak or
> > look at?
> >
> > We're going to be testing various XFS/LVM configs to get the best performance,
> > but right out of the gate, a 10:1 performance penalty for NFS doesn't bode
> > well.
> >
> > Thanks in advance,
> > Jason

2009-09-03 18:54:11

by Trond Myklebust

Subject: Re: NFS for millions of files

On Thu, 2009-09-03 at 14:37 -0400, Ric Wheeler wrote:
> On 09/03/2009 02:15 PM, J. Bruce Fields wrote:
> > On Wed, Sep 02, 2009 at 02:37:31PM -0400, Peter Staubach wrote:
> >
> >> Please keep in mind that the NFS stable storage requirements are
> >> probably causing a significant slowdown in activities such as this.
> >>
> > My first thought too, but:
> >
> >
> >> Jason Legate wrote:
> >>
> >>> When I run our creation benchmark locally I can get around 3000
> >>> files/ second in the configuration we're using now, but only around
> >>> 300/second over NFS. It's mounted as this:
> >>>
> > ...
> >
> >>> When I mount the same FS over localhost instead of across the lan,
> >>> it performs about full speed (the 3000/sec).
> >>>
> > The localhost NFS mount would be incurring the same sync latency, so all
> > his latency must be due to network. (And with those numbers I guess
> > he's either got lots of disk spindles, or an ssd, or (uh-oh) has the
> > async option set?)
> >
> > --b.
> >
>
> For small files without doing an fsync per file, getting 3000 files/sec
> is not that much. Ext3 can do it with a local SATA disk. I suspect that
> Jason would run much slower if he ran with local fsync()s enabled
> (similar to what NFS servers have to do).

Actually, NFS servers have to do _two_ fsyncs. One when creating the
file, and one when the client closes it. NFSv4 with write delegations
could allow the client to delay the fsync on close, but does not allow
it to eliminate the fsync-on-create.
In order to help speed up those workloads, we have therefore recently
submitted a proposal to also delay the fsync-on-create. The proposal is
for inclusion in the IETF's NFSv4 working group charter, for NFSv4.2...

Cheers
Trond


2009-09-02 18:38:22

by Chuck Lever

Subject: Re: NFS for millions of files

On Sep 2, 2009, at 2:08 PM, Jason Legate wrote:
> Hi, I'm trying to set up a server that we can create millions of files on over
> NFS. When I run our creation benchmark locally I can get around 3000 files/
> second in the configuration we're using now, but only around 300/second over
> NFS. It's mounted like this:
>
> rw,nosuid,nodev,noatime,nodiratime,hard,bg,nointr,rsize=32768,wsize=32768,tcp,
> nfsvers=3,timeo=600,actimeo=600,nocto
>
> When I mount the same FS over localhost instead of across the LAN, it performs
> at about full speed (the 3000/sec). Anyone have any ideas what I might tweak or
> look at?
>
> We're going to be testing various XFS/LVM configs to get the best performance,
> but right out of the gate, a 10:1 performance penalty for NFS doesn't bode
> well.

If you are using a slow LAN (like 100Mb/s) that might be a problem.

Metadata operations (like file creation) are always slower on NFS than
on local file systems. There is significantly more serialization
involved for NFS since access to the file system is shared across
multiple systems.

You might consider a cluster file system instead of NFS if you are
driving metadata-intensive workloads while sharing amongst only local
clients. Or, if you can install a fast log device for your server
file system, that might mitigate the disk waits during each file
creation.
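
For XFS, that would be an external log along these lines (device names are
hypothetical; the log device should be something fast, like NVRAM or a
dedicated spindle):

  # build the filesystem with its journal on a separate fast device
  mkfs.xfs -l logdev=/dev/fastlog,size=128m /dev/sdb1
  # the same log device must be given again at mount time
  mount -o logdev=/dev/fastlog /dev/sdb1 /export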

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

2009-09-02 18:37:37

by Peter Staubach

Subject: Re: NFS for millions of files

Jason Legate wrote:
> Hi, I'm trying to set up a server that we can create millions of files on over
> NFS. When I run our creation benchmark locally I can get around 3000 files/
> second in the configuration we're using now, but only around 300/second over
> NFS. It's mounted like this:
>
> rw,nosuid,nodev,noatime,nodiratime,hard,bg,nointr,rsize=32768,wsize=32768,tcp,
> nfsvers=3,timeo=600,actimeo=600,nocto
>
> When I mount the same FS over localhost instead of across the LAN, it performs
> at about full speed (the 3000/sec). Anyone have any ideas what I might tweak or
> look at?
>
> We're going to be testing various XFS/LVM configs to get the best performance,
> but right out of the gate, a 10:1 performance penalty for NFS doesn't bode
> well.
>

Hi.

Please keep in mind that the NFS stable storage requirements are
probably causing a significant slowdown in activities such as this.
All of the files being created over NFS are being flushed to stable
storage, including modified directory contents and inode information,
before the server responds to the client. All of those local file
creates are simply manipulating in-memory buffers and are not being
flushed to stable storage until some later time.

There really aren't any good solutions, except perhaps utilizing
write caching on the server, and I would only recommend doing that
if you have a good solution, from top to bottom, in the file system
and storage stack on the server. Most aren't.
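
For illustration, the relevant server-side knob is the sync/async flag in
/etc/exports (client spec and path are examples only; async violates the
protocol's stable storage guarantee and can lose data on a server crash):

  # safe default: each create is committed before the reply goes out
  /export  192.168.0.0/24(rw,sync,no_subtree_check)
  # faster but unsafe alternative:
  # /export  192.168.0.0/24(rw,async,no_subtree_check)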

ps

2009-09-02 18:35:45

by Peter Chacko

Subject: Re: NFS for millions of files

Is this NFSv4? rsize and wsize larger than the MTU will cause fragmentation
and performance issues... try making them around 4k. You used 1<<15
in your example. If you don't do writes, then this shouldn't
matter... and of course, NFS is "Not For Scalability". You cannot
get the same performance on NFS as you would get from a local FS. Maybe
you can try 10GbE... there is still TCP/UDP/IP stack overhead.
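
Something along these lines could be tried for comparison (export and
mount point are placeholders):

  # remount with 4k transfer sizes
  umount /mnt/nfs
  mount -t nfs -o rw,hard,tcp,nfsvers=3,rsize=4096,wsize=4096 \
    server:/export /mnt/nfs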


On Wed, Sep 2, 2009 at 11:49 PM, Aaron Wiebe<[email protected]> wrote:
> Have a look at these two kernel params - I'd recommend bumping them up
> to 128 (they're 16 by default).
>
> sunrpc.tcp_slot_table_entries
> sunrpc.udp_slot_table_entries
>
> Keep in mind that this could also be a serialization issue. If you've
> got a 3ms latency, and you're performing all of your opens serially,
> you aren't going to get much faster. If you do the work in parallel
> you'll likely get substantially better numbers.
>
> -Aaron
>
>
> On Wed, Sep 2, 2009 at 2:08 PM, Jason Legate<[email protected]> wrote:
>> Hi, I'm trying to set up a server that we can create millions of files on over
>> NFS. When I run our creation benchmark locally I can get around 3000 files/
>> second in the configuration we're using now, but only around 300/second over
>> NFS. It's mounted like this:
>>
>> rw,nosuid,nodev,noatime,nodiratime,hard,bg,nointr,rsize=32768,wsize=32768,tcp,
>> nfsvers=3,timeo=600,actimeo=600,nocto
>>
>> When I mount the same FS over localhost instead of across the LAN, it performs
>> at about full speed (the 3000/sec). Anyone have any ideas what I might tweak or
>> look at?
>>
>> We're going to be testing various XFS/LVM configs to get the best performance,
>> but right out of the gate, a 10:1 performance penalty for NFS doesn't bode
>> well.
>>
>> Thanks in advance,
>> Jason



--
Best regards,
Peter Chacko

NetDiox computing systems,
Network storage & OS training and research.
Bangalore, India.
http://www.netdiox.com
080 2664 0708

2009-09-03 19:05:26

by J. Bruce Fields

Subject: Re: NFS for millions of files

On Thu, Sep 03, 2009 at 02:37:51PM -0400, Ric Wheeler wrote:
> On 09/03/2009 02:15 PM, J. Bruce Fields wrote:
>> On Wed, Sep 02, 2009 at 02:37:31PM -0400, Peter Staubach wrote:
>>
>>> Please keep in mind that the NFS stable storage requirements are
>>> probably causing a significant slowdown in activities such as this.
>>>
>> My first thought too, but:
>>
>>
>>> Jason Legate wrote:
>>>
>>>> When I run our creation benchmark locally I can get around 3000
>>>> files/ second in the configuration we're using now, but only around
>>>> 300/second over NFS. It's mounted as this:
>>>>
>> ...
>>
>>>> When I mount the same FS over localhost instead of across the lan,
>>>> it performs about full speed (the 3000/sec).
>>>>
>> The localhost NFS mount would be incurring the same sync latency, so all
>> his latency must be due to network. (And with those numbers I guess
>> he's either got lots of disk spindles, or an ssd, or (uh-oh) has the
>> async option set?)
>>
>> --b.
>>
>
> For small files without doing an fsync per file, getting 3000 files/sec
> is not that much. Ext3 can do it with a local SATA disk. I suspect that
> Jason would run much slower if he ran with local fsync()s enabled
> (similar to what NFS servers have to do).

Yeah, which is why I was suspecting they set "async" on the export. In
which case I hope they know what they're doing....
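
A quick way to check is to list the effective export options on the server
and look for sync/async (the output line below is illustrative):

  exportfs -v
  # e.g.:  /export  <world>(rw,async,wdelay,root_squash,no_subtree_check)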

--b.