2008-12-30 10:42:50

by Krishna Kumar2

[permalink] [raw]
Subject: [RFC PATCH 0/1] nfsd: Improve NFS server performance

From: Krishna Kumar <[email protected]>

Patch summary:
--------------
Change the readahead caching on the server to a file handle caching model.
Since file handles are unique, this patch removes all dependencies on the
kernel readahead parameters/implementation and instead caches files based
on file handles. This change allows the server to not have to open/close
a file multiple times when the client reads it, and results in faster lookup
times. Also, readahead is automatically taken care of since the file is not
closed while it is getting read (quickly) by the client.


Read algo change:
------------------
The new nfsd_read() is changed to:
    if file {
        Old code
    } else {
        Check if this FH is cached
        if fh && fh has cached file pointer:
            Get file pointer
            Update fields in fhp from cache
            call fh_verify
        else:
            Nothing in the cache, call nfsd_open as usual

        nfsd_vfs_read

        if fh {
            If this is a new fh entry:
                Save cached values
            Drop our reference to fh
        } else
            Close file
    }
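
As a rough illustration of that flow, the user-space C model below reuses a
cached open file keyed by file handle. Every name in it (fh_entry,
fh_cache_find, model_nfsd_read) is a hypothetical stand-in for the kernel
objects the patch actually touches (svc_fh, struct file, nfsd_open,
nfsd_vfs_read); it is a sketch of the idea, not code from the patch.

/*
 * User-space model of the cached read path sketched above.  All types
 * and helpers here are hypothetical stand-ins for the kernel objects
 * (svc_fh, struct file, nfsd_open, nfsd_vfs_read).
 */
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <sys/types.h>

#define FH_CACHE_SIZE 256

struct fh_entry {
    char fh[64];        /* opaque file handle, used as the cache key */
    FILE *filp;         /* cached open file, NULL until first read   */
    time_t last_used;   /* consulted by the expiry daemon            */
    int in_use;
};

static struct fh_entry fh_cache[FH_CACHE_SIZE];

/* Find the entry for a handle; optionally claim a free slot for it. */
static struct fh_entry *fh_cache_find(const char *fh, int create)
{
    struct fh_entry *free_slot = NULL;

    for (int i = 0; i < FH_CACHE_SIZE; i++) {
        if (fh_cache[i].in_use && strcmp(fh_cache[i].fh, fh) == 0)
            return &fh_cache[i];
        if (!fh_cache[i].in_use && !free_slot)
            free_slot = &fh_cache[i];
    }
    if (create && free_slot) {
        free_slot->in_use = 1;
        snprintf(free_slot->fh, sizeof(free_slot->fh), "%s", fh);
        free_slot->filp = NULL;
        return free_slot;
    }
    return NULL;
}

/* Model of the new nfsd_read(): reuse a cached open file when possible. */
static ssize_t model_nfsd_read(const char *fh, const char *path,
                               char *buf, size_t len, long off)
{
    struct fh_entry *e = fh_cache_find(fh, 1);
    int hit = (e && e->filp);
    FILE *filp;

    if (hit) {
        filp = e->filp;            /* cache hit: no lookup, no open      */
    } else {
        filp = fopen(path, "r");   /* cache miss: plays "nfsd_open" here */
        if (!filp)
            return -1;
    }

    if (fseek(filp, off, SEEK_SET) != 0) {
        if (!hit)
            fclose(filp);
        return -1;
    }
    size_t n = fread(buf, 1, len, filp);   /* plays "nfsd_vfs_read" */

    if (e) {
        if (!hit)
            e->filp = filp;        /* new entry: save the open file      */
        e->last_used = time(NULL); /* the expiry daemon uses this later  */
    } else {
        fclose(filp);              /* no cache slot: old open/close path */
    }
    return (ssize_t)n;
}

int main(void)
{
    char buf[4096];

    /* First call opens and caches the file; the second call with the
     * same handle reuses the cached FILE * and skips the open. */
    model_nfsd_read("fh-0001", "/etc/hostname", buf, sizeof(buf), 0);
    model_nfsd_read("fh-0001", "/etc/hostname", buf, sizeof(buf), 0);
    return 0;
}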


Performance:
-------------
This patch was tested with clients running 1, 4, 8, 16 --- 256 test processes,
each doing reads of different files. Each test includes different I/O sizes.
Many individual tests (16% of test cases) got throughput improvement in the
9 to 15% range. The full results are provided at the end of this post.

Please review. Any comments or improvement ideas are greatly appreciated.

Signed-off-by: Krishna Kumar <[email protected]>
---

(#Test Processes on Client == #NFSD's on Server)
--------------------------------------------------------------
#Test Processes  I/O Size (bytes)  Org BW KB/s  New BW KB/s  %
--------------------------------------------------------------
4 256 48151.09 50328.70 4.52
4 4096 47700.05 49760.34 4.31
4 8192 47553.34 48509.00 2.00
4 16384 48764.87 51208.54 5.01
4 32768 49306.11 50141.59 1.69
4 65536 48681.46 49491.32 1.66
4 131072 48378.02 49971.95 3.29

8 256 38906.95 42444.95 9.09
8 4096 38141.46 42154.24 10.52
8 8192 37058.55 41241.78 11.28
8 16384 37446.56 40573.70 8.35
8 32768 36655.91 42159.85 15.01
8 65536 38776.11 40619.20 4.75
8 131072 38187.85 41119.04 7.67

16 256 36274.49 36143.00 -0.36
16 4096 34320.56 37664.35 9.74
16 8192 35489.65 34555.43 -2.63
16 16384 35647.32 36289.72 1.80
16 32768 37037.31 36874.33 -0.44
16 65536 36388.14 36991.56 1.65
16 131072 35729.34 37588.85 5.20

32 256 30838.89 32811.47 6.39
32 4096 31291.93 33439.83 6.86
32 8192 29885.57 33337.10 11.54
32 16384 30020.23 31795.97 5.91
32 32768 32805.03 33860.68 3.21
32 65536 31275.12 32997.34 5.50
32 131072 33391.85 34209.86 2.44

64 256 26729.46 28077.13 5.04
64 4096 25705.01 27339.37 6.35
64 8192 27757.06 27488.04 -0.96
64 16384 22927.44 23938.79 4.41
64 32768 26956.16 27848.52 3.31
64 65536 27419.59 29228.76 6.59
64 131072 27623.29 27651.99 0.10

128 256 22463.63 22437.45 -0.11
128 4096 22039.69 22554.03 2.33
128 8192 22218.42 24010.64 8.06
128 16384 15295.59 16745.28 9.47
128 32768 23319.54 23450.46 0.56
128 65536 22942.03 24169.26 5.34
128 131072 23845.27 23894.14 0.20

256 256 15659.17 16266.38 3.87
256 4096 15614.72 16362.25 4.78
256 8192 16950.24 17092.50 0.83
256 16384 9253.25 10274.28 11.03
256 32768 17872.89 17792.93 -0.44
256 65536 18459.78 18641.68 0.98
256 131072 19408.01 20538.80 5.82
--------------------------------------------------------------


2009-02-05 20:24:00

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [RFC PATCH 0/1] nfsd: Improve NFS server performance

On Thu, Feb 05, 2009 at 08:38:19PM +0530, Krishna Kumar2 wrote:
> Hi Bruce,
>
> Thanks for your comments (also please refer to REV2 of patch as that is
> much simpler).

Yes, apologies, I only noticed I had a later version after responding to
the wrong one....

> > I think of open and lookup as fairly fast, so I'm surprised this
> > makes a great difference; do you have profile results or something
> > to confirm that this is in fact what made the difference?
>
> Beyond saving the open/lookup times, the cache is updated only once.
> Hence no lock plus update is required for subsequent reads - the code
> does a single lock on every read operation instead of two. The time to
> get the cache is approximately the same for old vs new code; but in
> the new code we get file/dentry and svc_exp.
>
> I used to have counters in nfsd_open - something like dbg_num_opens,
> dbg_open_jiffies, dgb_close_jiffies, dbg_read_jiffies,
> dgb_cache_jiffies, etc. I can reintroduce those debugs and get a run
> and see how those numbers looks like, is that what you are looking
> for?

I'm not sure what you mean by dbg_open_jiffies--surely a single open of
a file already in the dentry cache is too fast to be measurable in
jiffies?

> > When do items get removed from this cache?
>
> At the first open, the item is kept at the end of a global list (which is
> manipulated by the new daemon). After some jiffies are over, the daemon
> goes through the list till it comes to the first entry that has not
> expired; and frees up all the earlier entries. If the file is being used,
> it is not freed. If file is used after free, a new entry is added to the
> end of the list. So very minimal list manipulation is required - no sorting
> and moving entries in the list.

OK, yeah, I just wondered whether you could end up with a reference to a
file hanging around indefinitely even after it had been deleted, for
example.

I've heard of someone updating read-only block snapshots by stopping
mountd, flushing the export cache, unmounting the old snapshot, then
mounting the new one and restarting mountd. A bit of a hack, but I
guess it works, as long as no clients hold locks or NFSv4 opens on the
filesystem.

An open cache may break that by holding references to the filesystem
they want to unmount. But perhaps we should give such users a proper
interface that tells nfsd to temporarily drop state it holds on a
filesystem, and tell them to use that instead.

> Please let me know if you would like me to write up a small text about how
> this patch works.

Any explanation always welcome.

> > Could you provide details sufficient to reproduce this test if
> > necessary? (At least: what was the test code, how many clients were
> > used, what was the client and server hardware, and what filesystem was
> > the server exporting?)
>
> Sure - I will send the test code in a day (don't have access to the system
> right
> now, sorry. But this is a script that runs a C program that forks and then
> reads
> a file till it is killed and it prints the amount of data read and the
> amount of
> time it ran).
>
> The other details are:
> #Clients: 1
> Hardware Configuration (both systems):
> Two Dual-Core AMD Opteron (4 cpus) at 3GH.
> 1GB memory
> 10gbps private network
> Filesystem: ext3 (one filesystem)

OK, thanks! And what sort of disk on the server?

--b.

2009-02-05 15:08:39

by Krishna Kumar2

[permalink] [raw]
Subject: Re: [RFC PATCH 0/1] nfsd: Improve NFS server performance

Hi Bruce,

Thanks for your comments (also please refer to REV2 of patch as that is
much simpler).

> >
> > Patch summary:
> > --------------
> > Change the readahead caching on the server to a file handle caching model.
> > Since file handles are unique, this patch removes all dependencies on the
> > kernel readahead parameters/implementation and instead caches files based
> > on file handles. This change allows the server to not have to open/close
> > a file multiple times when the client reads it, and results in faster
> > lookup times.
>
> I think of open and lookup as fairly fast, so I'm surprised this makes a
> great difference; do you have profile results or something to confirm
> that this is in fact what made the difference?

Beyond saving the open/lookup times, the cache is updated only once. Hence
no lock plus update is required for subsequent reads - the code does a
single lock on every read operation instead of two. The time to get the
cache is approximately the same for old vs new code; but in the new code
we get file/dentry and svc_exp.

I used to have counters in nfsd_open - something like dbg_num_opens,
dbg_open_jiffies, dgb_close_jiffies, dbg_read_jiffies, dgb_cache_jiffies,
etc. I can reintroduce those debugs and get a run and see what those
numbers look like, is that what you are looking for?

> > Also, readahead is automatically taken care of since the file is not
> > closed while it is getting read (quickly) by the client.
> >
> >
> > Read algo change:
> > ------------------
> > The new nfsd_read() is changed to:
> > if file {
> > Old code
> > } else {
> > Check if this FH is cached
> > if fh && fh has cached file pointer:
> > Get file pointer
> > Update fields in fhp from cache
> > call fh_verify
> > else:
> > Nothing in the cache, call nfsd_open as usual
> >
> > nfsd_vfs_read
> >
> > if fh {
> > If this is a new fh entry:
> > Save cached values
> > Drop our reference to fh
> > } else
> > Close file
> > }
>
> When do items get removed from this cache?

At the first open, the item is kept at the end of a global list (which is
manipulated by the new daemon). After some jiffies are over, the daemon
goes through the list till it comes to the first entry that has not
expired; and frees up all the earlier entries. If the file is being used,
it is not freed. If file is used after free, a new entry is added to the
end of the list. So very minimal list manipulation is required - no sorting
and moving entries in the list.
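
A minimal user-space sketch of that sweep, assuming a FIFO list ordered by
insertion time; the names here (fh_entry, fh_cache_sweep) are hypothetical
and this is not the patch's actual daemon code:

#include <stdlib.h>
#include <time.h>

/* Hypothetical cache entry, kept on a singly linked FIFO list in
 * insertion order (newest entries are appended at the tail). */
struct fh_entry {
    struct fh_entry *next;
    time_t expires;     /* insertion time + cache timeout           */
    int in_use;         /* non-zero while a read is still in flight */
    /* ... cached file pointer, dentry, export, ... */
};

static struct fh_entry *cache_head;   /* oldest entry */

/*
 * One pass of the expiry daemon.  Because the list is ordered by
 * insertion time, the walk can stop at the first entry that has not
 * expired; everything before that point is freed.  An entry that is
 * still in use is not freed (this sketch just stops the sweep there).
 */
static void fh_cache_sweep(void)
{
    time_t now = time(NULL);

    while (cache_head && cache_head->expires <= now) {
        struct fh_entry *e = cache_head;

        if (e->in_use)
            break;              /* file still being read: leave it  */

        cache_head = e->next;   /* unlink the expired entry         */
        free(e);                /* real code would also fput() the
                                 * file and release the dentry      */
    }
}

int main(void)
{
    /* Tiny demo list: one already-expired entry, then one live one. */
    struct fh_entry *expired_e = calloc(1, sizeof(*expired_e));
    struct fh_entry *live_e = calloc(1, sizeof(*live_e));

    expired_e->expires = time(NULL) - 1;
    live_e->expires = time(NULL) + 60;
    expired_e->next = live_e;
    cache_head = expired_e;

    fh_cache_sweep();           /* frees expired_e, keeps live_e */
    free(live_e);
    return 0;
}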

Please let me know if you would like me to write up a small text about how
this patch works.

> > Performance:
> > -------------
> > This patch was tested with clients running 1, 4, 8, 16 --- 256 test
> > processes, each doing reads of different files. Each test includes
> > different I/O sizes. Many individual tests (16% of test cases) got
> > throughput improvement in the 9 to 15% range. The full results are
> > provided at the end of this post.
>
> Could you provide details sufficient to reproduce this test if
> necessary? (At least: what was the test code, how many clients were
> used, what was the client and server hardware, and what filesystem was
> the server exporting?)

Sure - I will send the test code in a day (don't have access to the system
right now, sorry. But this is a script that runs a C program that forks and
then reads a file till it is killed and it prints the amount of data read
and the amount of time it ran).
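
Purely as a hypothetical reconstruction (not the actual test code), the
reader half of such a script might look something like the program below;
the driving script would presumably fork N copies of it, one per file, and
kill them after a fixed interval:

/* Hypothetical reader: loops over one file until killed, then reports
 * how much it read and for how long it ran.  Not the original test code. */
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t stop;

static void on_term(int sig)
{
    (void)sig;
    stop = 1;
}

int main(int argc, char **argv)
{
    if (argc < 3) {
        fprintf(stderr, "usage: %s <file> <iosize>\n", argv[0]);
        return 1;
    }

    size_t iosize = (size_t)atol(argv[2]);
    char *buf = malloc(iosize);
    long long total = 0;
    time_t start = time(NULL);

    signal(SIGTERM, on_term);

    while (!stop) {
        int fd = open(argv[1], O_RDONLY);
        ssize_t n;

        if (fd < 0)
            break;
        while (!stop && (n = read(fd, buf, iosize)) > 0)
            total += n;      /* read the whole file ...            */
        close(fd);           /* ... then reopen it and start again */
    }

    printf("%lld bytes read in %ld seconds\n",
           total, (long)(time(NULL) - start));
    free(buf);
    return 0;
}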

The other details are:
#Clients: 1
Hardware Configuration (both systems):
Two Dual-Core AMD Opteron (4 CPUs) at 3GHz.
1GB memory
10gbps private network
Filesystem: ext3 (one filesystem)

Thanks,

- KK

> > Please review. Any comments or improvement ideas are greatly appreciated.
> >
> > Signed-off-by: Krishna Kumar <[email protected]>
> > ---
> >
> > (#Test Processes on Client == #NFSD's on Server)
> > --------------------------------------------------------------
> > #Test Processes Org BW KB/s New BW KB/s %
> > --------------------------------------------------------------
>
> What's the second column?
>
> > 4 256 48151.09 50328.70 4.52
> > 4 4096 47700.05 49760.34 4.31
> > 4 8192 47553.34 48509.00 2.00
> > 4 16384 48764.87 51208.54 5.01
> > 4 32768 49306.11 50141.59 1.69
> > 4 65536 48681.46 49491.32 1.66
> > 4 131072 48378.02 49971.95 3.29
> >
> > 8 256 38906.95 42444.95 9.09
> > 8 4096 38141.46 42154.24 10.52
> > 8 8192 37058.55 41241.78 11.28
> > 8 16384 37446.56 40573.70 8.35
> > 8 32768 36655.91 42159.85 15.01
> > 8 65536 38776.11 40619.20 4.75
> > 8 131072 38187.85 41119.04 7.67
> >
> > 16 256 36274.49 36143.00 -0.36
> > 16 4096 34320.56 37664.35 9.74
> > 16 8192 35489.65 34555.43 -2.63
> > 16 16384 35647.32 36289.72 1.80
> > 16 32768 37037.31 36874.33 -0.44
> > 16 65536 36388.14 36991.56 1.65
> > 16 131072 35729.34 37588.85 5.20
> >
> > 32 256 30838.89 32811.47 6.39
> > 32 4096 31291.93 33439.83 6.86
> > 32 8192 29885.57 33337.10 11.54
> > 32 16384 30020.23 31795.97 5.91
> > 32 32768 32805.03 33860.68 3.21
> > 32 65536 31275.12 32997.34 5.50
> > 32 131072 33391.85 34209.86 2.44
> >
> > 64 256 26729.46 28077.13 5.04
> > 64 4096 25705.01 27339.37 6.35
> > 64 8192 27757.06 27488.04 -0.96
> > 64 16384 22927.44 23938.79 4.41
> > 64 32768 26956.16 27848.52 3.31
> > 64 65536 27419.59 29228.76 6.59
> > 64 131072 27623.29 27651.99 .10
> >
> > 128 256 22463.63 22437.45 -.11
> > 128 4096 22039.69 22554.03 2.33
> > 128 8192 22218.42 24010.64 8.06
> > 128 16384 15295.59 16745.28 9.47
> > 128 32768 23319.54 23450.46 0.56
> > 128 65536 22942.03 24169.26 5.34
> > 128 131072 23845.27 23894.14 0.20
> >
> > 256 256 15659.17 16266.38 3.87
> > 256 4096 15614.72 16362.25 4.78
> > 256 8192 16950.24 17092.50 0.83
> > 256 16384 9253.25 10274.28 11.03
> > 256 32768 17872.89 17792.93 -.44
> > 256 65536 18459.78 18641.68 0.98
> > 256 131072 19408.01 20538.80 5.82
> > --------------------------------------------------------------


2009-02-04 23:19:55

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [RFC PATCH 0/1] nfsd: Improve NFS server performance

On Tue, Dec 30, 2008 at 04:12:45PM +0530, Krishna Kumar wrote:
> From: Krishna Kumar <[email protected]>

Thanks for the work, and apologies for the slow response.

>
> Patch summary:
> --------------
> Change the readahead caching on the server to a file handle caching model.
> Since file handles are unique, this patch removes all dependencies on the
> kernel readahead parameters/implementation and instead caches files based
> on file handles. This change allows the server to not have to open/close
> a file multiple times when the client reads it, and results in faster lookup
> times.

I think of open and lookup as fairly fast, so I'm surprised this makes a
great difference; do you have profile results or something to confirm
that this is in fact what made the difference?

> Also, readahead is automatically taken care of since the file is not
> closed while it is getting read (quickly) by the client.
>
>
> Read algo change:
> ------------------
> The new nfsd_read() is changed to:
> if file {
> Old code
> } else {
> Check if this FH is cached
> if fh && fh has cached file pointer:
> Get file pointer
> Update fields in fhp from cache
> call fh_verify
> else:
> Nothing in the cache, call nfsd_open as usual
>
> nfsd_vfs_read
>
> if fh {
> If this is a new fh entry:
> Save cached values
> Drop our reference to fh
> } else
> Close file
> }

When do items get removed from this cache?

>
>
> Performance:
> -------------
> This patch was tested with clients running 1, 4, 8, 16 --- 256 test processes,
> each doing reads of different files. Each test includes different I/O sizes.
> Many individual tests (16% of test cases) got throughput improvement in the
> 9 to 15% range. The full results are provided at the end of this post.

Could you provide details sufficient to reproduce this test if
necessary? (At least: what was the test code, how many clients were
used, what was the client and server hardware, and what filesystem was
the server exporting?)

--b.

>
> Please review. Any comments or improvement ideas are greatly appreciated.
>
> Signed-off-by: Krishna Kumar <[email protected]>
> ---
>
> (#Test Processes on Client == #NFSD's on Server)
> --------------------------------------------------------------
> #Test Processes Org BW KB/s New BW KB/s %
> --------------------------------------------------------------

What's the second column?

> 4 256 48151.09 50328.70 4.52
> 4 4096 47700.05 49760.34 4.31
> 4 8192 47553.34 48509.00 2.00
> 4 16384 48764.87 51208.54 5.01
> 4 32768 49306.11 50141.59 1.69
> 4 65536 48681.46 49491.32 1.66
> 4 131072 48378.02 49971.95 3.29
>
> 8 256 38906.95 42444.95 9.09
> 8 4096 38141.46 42154.24 10.52
> 8 8192 37058.55 41241.78 11.28
> 8 16384 37446.56 40573.70 8.35
> 8 32768 36655.91 42159.85 15.01
> 8 65536 38776.11 40619.20 4.75
> 8 131072 38187.85 41119.04 7.67
>
> 16 256 36274.49 36143.00 -0.36
> 16 4096 34320.56 37664.35 9.74
> 16 8192 35489.65 34555.43 -2.63
> 16 16384 35647.32 36289.72 1.80
> 16 32768 37037.31 36874.33 -0.44
> 16 65536 36388.14 36991.56 1.65
> 16 131072 35729.34 37588.85 5.20
>
> 32 256 30838.89 32811.47 6.39
> 32 4096 31291.93 33439.83 6.86
> 32 8192 29885.57 33337.10 11.54
> 32 16384 30020.23 31795.97 5.91
> 32 32768 32805.03 33860.68 3.21
> 32 65536 31275.12 32997.34 5.50
> 32 131072 33391.85 34209.86 2.44
>
> 64 256 26729.46 28077.13 5.04
> 64 4096 25705.01 27339.37 6.35
> 64 8192 27757.06 27488.04 -0.96
> 64 16384 22927.44 23938.79 4.41
> 64 32768 26956.16 27848.52 3.31
> 64 65536 27419.59 29228.76 6.59
> 64 131072 27623.29 27651.99 .10
>
> 128 256 22463.63 22437.45 -.11
> 128 4096 22039.69 22554.03 2.33
> 128 8192 22218.42 24010.64 8.06
> 128 16384 15295.59 16745.28 9.47
> 128 32768 23319.54 23450.46 0.56
> 128 65536 22942.03 24169.26 5.34
> 128 131072 23845.27 23894.14 0.20
>
> 256 256 15659.17 16266.38 3.87
> 256 4096 15614.72 16362.25 4.78
> 256 8192 16950.24 17092.50 0.83
> 256 16384 9253.25 10274.28 11.03
> 256 32768 17872.89 17792.93 -.44
> 256 65536 18459.78 18641.68 0.98
> 256 131072 19408.01 20538.80 5.82
> --------------------------------------------------------------

2009-02-07 09:17:10

by Krishna Kumar2

[permalink] [raw]
Subject: Re: [RFC PATCH 0/1] nfsd: Improve NFS server performance

Hi Bruce,

> > I used to have counters in nfsd_open - something like dbg_num_opens,
> > dbg_open_jiffies, dgb_close_jiffies, dbg_read_jiffies,
> > dgb_cache_jiffies, etc. I can reintroduce those debugs and get a run
> > and see how those numbers looks like, is that what you are looking
> > for?
>
> I'm not sure what you mean by dbg_open_jiffies--surely a single open of
> a file already in the dentry cache is too fast to be measurable in
> jiffies?

When dbg_number_of_opens is very high, I see a big difference in the open
times for original vs new (almost zero) code. I am running 8, 64, 256, etc.
processes and each of them reads files up to 500MB (a lot of open/read/close
per file per process), so the jiffies add up (contention between parallel
opens, some processing in open, etc). To clarify this, I will reintroduce
the debugs and get some values (it was done a long time back and I don't
remember how much difference was there), and post it along with what the
debug code is doing.

> OK, yeah, I just wondered whether you could end up with a reference to a
> file hanging around indefinitely even after it had been deleted, for
> example.

If the client deletes a file, the server immediately locates and removes the
cached entry. If the server deletes a file, my original intention was to use
inotify to inform the NFS server to delete the cache entry, but that ran into
some problems. So my solution was to fall back to the cache entry getting
deleted by the daemon after the short timeout; till then the space for the
inode is not freed. So in both cases, references to the file will not hang
around indefinitely.
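
For illustration, a minimal user-space sketch of the client-delete case,
using the same kind of hypothetical list entry as the earlier sketches (not
the patch's actual code; the server-side-delete case has no hook here and
simply waits for the daemon's timeout):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cache entry; only the fields this sketch needs. */
struct fh_entry {
    struct fh_entry *next;
    char fh[64];        /* opaque file handle (key) */
    int in_use;
};

static struct fh_entry *cache_head;

/*
 * Invoked from the REMOVE path when the client deletes a file: find
 * the entry for that handle and drop it immediately instead of
 * waiting for the expiry daemon.
 */
static void fh_cache_invalidate(const char *fh)
{
    struct fh_entry **pp = &cache_head;

    while (*pp) {
        struct fh_entry *e = *pp;

        if (!e->in_use && strcmp(e->fh, fh) == 0) {
            *pp = e->next;   /* unlink from the list                 */
            free(e);         /* real code would also fput() the file */
            return;
        }
        pp = &e->next;
    }
}

int main(void)
{
    struct fh_entry *e = calloc(1, sizeof(*e));

    snprintf(e->fh, sizeof(e->fh), "fh-0001");
    cache_head = e;

    fh_cache_invalidate("fh-0001");   /* unlinks and frees the entry */
    return 0;
}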

> I've heard of someone updating read-only block snapshots by stopping
> mountd, flushing the export cache, unmounting the old snapshot, then
> mounting the new one and restarting mountd. A bit of a hack, but I
> guess it works, as long as no clients hold locks or NFSv4 opens on the
> filesystem.
>
> An open cache may break that by holding references to the filesystem
> they want to unmount. But perhaps we should give such users a proper
> interface that tells nfsd to temporarily drop state it holds on a
> filesystem, and tell them to use that instead.

I must admit that I am lost in this scenario - I was assuming that the
filesystem can be unmounted only after nfs services are stopped, hence I
added cache cleanup on nfsd_shutdown. Is there some hook to catch the
unmount where I should clean the cache for that filesystem?

> > Please let me know if you would like me to write up a small text about
> > how this patch works.
>
> Any explanation always welcome.

Sure. I will send this text soon, along with the test program.

> > The other details are:
> > #Clients: 1
> > Hardware Configuration (both systems):
> > Two Dual-Core AMD Opteron (4 cpus) at 3GH.
> > 1GB memory
> > 10gbps private network
> > Filesystem: ext3 (one filesystem)
>
> OK, thanks! And what sort of disk on the server?

133 GB ServeRAID (I think ST9146802SS Seagate disk), containing 256 files,
each of 500MB size.

Thanks,

- KK


2009-02-09 19:06:45

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [RFC PATCH 0/1] nfsd: Improve NFS server performance

On Sat, Feb 07, 2009 at 02:43:55PM +0530, Krishna Kumar2 wrote:
> Hi Bruce,
>
> > > I used to have counters in nfsd_open - something like dbg_num_opens,
> > > dbg_open_jiffies, dgb_close_jiffies, dbg_read_jiffies,
> > > dgb_cache_jiffies, etc. I can reintroduce those debugs and get a run
> > > and see how those numbers looks like, is that what you are looking
> > > for?
> >
> > I'm not sure what you mean by dbg_open_jiffies--surely a single open of
> > a file already in the dentry cache is too fast to be measurable in
> > jiffies?
>
> When dbg_number_of_opens is very high, I see a big difference in the open
> times
> for original vs new (almost zero) code. I am running 8, 64, 256, etc,
> processes and each of them reads files upto 500MB (a lot of open/read/close
> per file per process), so the jiffies adds up (contention between parallel
> opens, some processing in open, etc). To clarify this, I will reintroduce
> the debugs and get some values (it was done a long time back and I don't
> remember how much difference was there), and post it along with what the
> debug code is doing.
>
> > OK, yeah, I just wondered whether you could end up with a reference to a
> > file hanging around indefinitely even after it had been deleted, for
> > example.
>
> If client deletes a file, the server immediately locates and removes the
> cached
> entry. If server deletes a file, my original intention was to use inotify
> to
> inform NFS server to delete the cache but that ran into some problems. So
> my
> solution was to fallback to the cache getting deleted by the daemon after
> the
> short timeout, till then the space for the inode is not freed. So in both
> cases,
> references to the file will not hang around indefinitely.
>
> > I've heard of someone updating read-only block snapshots by stopping
> > mountd, flushing the export cache, unmounting the old snapshot, then
> > mounting the new one and restarting mountd. A bit of a hack, but I
> > guess it works, as long as no clients hold locks or NFSv4 opens on the
> > filesystem.
> >
> > An open cache may break that by holding references to the filesystem
> > they want to unmount. But perhaps we should give such users a proper
> > interface that tells nfsd to temporarily drop state it holds on a
> > filesystem, and tell them to use that instead.
>
> I must admit that I am lost in this scenario - I was assuming that the
> filesystem can be unmounted only after nfs services are stopped, hence I
> added
> cache cleanup on nfsd_shutdown. Is there some hook to catch for the unmount
> where I should clean the cache for that filesystem?

No. People have talked about doing that, but it hasn't happened.

But I think I'd prefer some separate operation (probably just triggered
by a write to some new file in the nfsd filesystem) that told nfsd to
release all its references to a given filesystem. An administrator
would have to know to do this before unmounting (or maybe mount could be
patched to do this).

Since we don't have a way to tell clients (at least v2/v3 clients) that
we've lost their state on just one filesystem, we'd have to save nfsd's
state internally but drop any hard references to filesystem objects,
then reacquire them afterward.

I'm not sure how best to do that.

That's not necessarily a prerequisite for this change; it depends on how
common that sort of use is.

--b.

>
> > > Please let me know if you would like me to write up a small text about
> > > how this patch works.
> >
> > Any explanation always welcome.
>
> Sure. I will send this text soon, along with test program.
>
> > > The other details are:
> > > #Clients: 1
> > > Hardware Configuration (both systems):
> > > Two Dual-Core AMD Opteron (4 cpus) at 3GH.
> > > 1GB memory
> > > 10gbps private network
> > > Filesystem: ext3 (one filesystem)
> >
> > OK, thanks! And what sort of disk on the server?
>
> 133 GB ServeRAID (I think ST9146802SS Seagate disk), containing 256 files,
> each
> of 500MB size.
>
> Thanks,
>
> - KK
>

2009-02-09 20:56:34

by Chuck Lever

[permalink] [raw]
Subject: Re: [RFC PATCH 0/1] nfsd: Improve NFS server performance

On Feb 9, 2009, at 2:06 PM, J. Bruce Fields wrote:
> On Sat, Feb 07, 2009 at 02:43:55PM +0530, Krishna Kumar2 wrote:
>> Hi Bruce,
>>
>>>> I used to have counters in nfsd_open - something like
>>>> dbg_num_opens,
>>>> dbg_open_jiffies, dgb_close_jiffies, dbg_read_jiffies,
>>>> dgb_cache_jiffies, etc. I can reintroduce those debugs and get a
>>>> run
>>>> and see how those numbers looks like, is that what you are looking
>>>> for?
>>>
>>> I'm not sure what you mean by dbg_open_jiffies--surely a single
>>> open of
>>> a file already in the dentry cache is too fast to be measurable in
>>> jiffies?
>>
>> When dbg_number_of_opens is very high, I see a big difference in
>> the open
>> times
>> for original vs new (almost zero) code. I am running 8, 64, 256, etc,
>> processes and each of them reads files upto 500MB (a lot of open/
>> read/close
>> per file per process), so the jiffies adds up (contention between
>> parallel
>> opens, some processing in open, etc). To clarify this, I will
>> reintroduce
>> the debugs and get some values (it was done a long time back and I
>> don't
>> remember how much difference was there), and post it along with
>> what the
>> debug code is doing.
>>
>>> OK, yeah, I just wondered whether you could end up with a
>>> reference to a
>>> file hanging around indefinitely even after it had been deleted, for
>>> example.
>>
>> If client deletes a file, the server immediately locates and
>> removes the
>> cached
>> entry. If server deletes a file, my original intention was to use
>> inotify
>> to
>> inform NFS server to delete the cache but that ran into some
>> problems. So
>> my
>> solution was to fallback to the cache getting deleted by the daemon
>> after
>> the
>> short timeout, till then the space for the inode is not freed. So
>> in both
>> cases,
>> references to the file will not hang around indefinitely.
>>
>>> I've heard of someone updating read-only block snapshots by stopping
>>> mountd, flushing the export cache, unmounting the old snapshot, then
>>> mounting the new one and restarting mountd. A bit of a hack, but I
>>> guess it works, as long as no clients hold locks or NFSv4 opens on
>>> the
>>> filesystem.
>>>
>>> An open cache may break that by holding references to the filesystem
>>> they want to unmount. But perhaps we should give such users a
>>> proper
>>> interface that tells nfsd to temporarily drop state it holds on a
>>> filesystem, and tell them to use that instead.
>>
>> I must admit that I am lost in this scenario - I was assuming that
>> the
>> filesystem can be unmounted only after nfs services are stopped,
>> hence I
>> added
>> cache cleanup on nfsd_shutdown. Is there some hook to catch for the
>> unmount
>> where I should clean the cache for that filesystem?
>
> No. People have talked about doing that, but it hasn't happened.

It should be noted that mountd's UMNT and UMNT_ALL requests (used by
NFSv2/v3) are advisory, and that our NFSv4 client doesn't contact the
server at unmount time.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

2009-02-09 21:04:23

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [RFC PATCH 0/1] nfsd: Improve NFS server performance

On Mon, Feb 09, 2009 at 03:56:16PM -0500, Chuck Lever wrote:
> On Feb 9, 2009, at 2:06 PM, J. Bruce Fields wrote:
>>
>> No. People have talked about doing that, but it hasn't happened.
>
> It should be noted that mountd's UMNT and UMNT_ALL requests (used by
> NFSv2/v3) are advisory, and that our NFSv4 client doesn't contact the
> server at unmount time.

We're not talking about a client unmounting a server, but about a server
unmounting an exported filesystem.

--b.