2013-04-09 12:46:12

by Miquel van Smoorenburg

Subject: [PATCH 0/2] RFC: nfs client: lower number of NFS ops in some circumstances

These NFS client patches are posted as an RFC - request for comment.

The idea behind these patches is to make it possible to cut down on
the number of NFS operations used when opening a file in certain
circumstances and environments (esp. mostly-readonly environments).

1/2: "noaccesscheck" mount option

If this option is enabled, the nfs client will not send any
NFS ACCESS calls to the server, except for UID 0. For all other
uids, access is checked locally using generic_permission().

2/2: "sloppycto=N" mount option

This mount option is a bit like "nocto" - it suppresses
a GETATTR call when a file is opened if we still have valid
attribute data in the cache. The difference is that 1) we
only do this for files that are opened read-only, and 2) we
only do it when the last attribute update was 'N' seconds or less ago.

We've been using these patches in production for a couple of months.
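
For concreteness, here's a sketch of what mounting with both options
could look like via mount(2); the server name, export, mountpoint and
address are all made up, and in practice mount.nfs(8) resolves the
hostname and appends addr= for you:

#include <stdio.h>
#include <sys/mount.h>

int main(void)
{
        /*
         * Sketch only: hypothetical server/export/mountpoint, needs
         * root.  mount.nfs(8) normally supplies addr= after resolving
         * the hostname; we pass it by hand here.
         */
        if (mount("filer:/export/web", "/mnt/web", "nfs", MS_NOATIME,
                  "vers=3,proto=tcp,noaccesscheck,sloppycto=3,addr=192.0.2.1") < 0) {
                perror("mount");
                return 1;
        }
        return 0;
}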

Background:

On our webhosting setup, all our customers' data is stored on
NFS server appliances. A cluster of Linux webservers mounts those
volumes (using NFSv3 over TCP, Unix auth) and serves HTTP,
using a reasonably standard Apache setup.

That works pretty well; the problem is that our customers like to run
"modern" PHP CMSes like Joomla that are built 'modular' and
include hundreds of PHP files to generate just one page. To add
insult to injury, PHP itself stat()s every file before it open()s it.

This means for every pageview with such a CMS we get hundreds of:

stat(): GETATTR
open(): ACCESS + GETATTR (for close-to-open consistency).

Obviously we also get a few hundred read() requests, but only
the very first time, and the content of these files is cached
pretty well after that.
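
To make that concrete, here's a minimal userspace sketch of what a
single PHP include boils down to at the syscall level; the filename is
hypothetical, and the NFS ops in the comments assume a cold cache on a
close-to-open mount:

#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
        struct stat st;
        char buf[4096];
        int fd;

        /* PHP stat()s the file first: GETATTR */
        if (stat("/var/www/site/lib/module.php", &st) < 0)
                return 1;

        /* ... then open()s it: ACCESS + GETATTR (close-to-open) */
        fd = open("/var/www/site/lib/module.php", O_RDONLY);
        if (fd < 0)
                return 1;

        /* READs go over the wire only the first time; after that
           the page cache satisfies them */
        while (read(fd, buf, sizeof(buf)) > 0)
                ;

        return close(fd);
}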

If a second pageview is within the nfsi->attrtimeo timeout, we
may see:

stat(): (GETATTR cached)
open(): ACCESS + GETATTR

or

stat(): (GETATTR cached)
open(): (ACCESS cached) + GETATTR

But after the attribute timeout it's 3 NFS operations again.

With the 'noaccesscheck' mount option, this gets reduced to two operations:

stat(): GETATTR
open(): GETATTR

And after adding the 'sloppycto=3' mount option it becomes:

stat(): GETATTR
open(): (GETATTR cached)

This really cuts down on the number of NFS operations. That's good
for us, as the NFS server solution we're using appears to support a
high, but still limited, maximum number of NFS ops/sec.

I can think of a few enhancements/adjustments to these patches:

- a clearer name than "noaccesscheck" ?
- instead of "sloppycto=N", perhaps "nocto=ro,acctomin=X,acctomax=Y" ?
- only allow mounting with 'noaccesscheck' when sec=sys
- in namei.h, switch LOOKUP_WRITE and LOOKUP_RENAME_TARGET values
for cosmetic reasons
- ....

Thoughts? Comments? Ridicule? Other solutions?

Mike.


2013-04-09 13:00:35

by Miquel van Smoorenburg

Subject: [PATCH 1/2] "noaccesscheck" mount option

1/2: "noaccesscheck" mount option

If this option is enabled, the nfs client will not send any
NFS ACCESS calls to the server, except for UID 0. For all other
uids, access is checked locally using generic_permission().
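
For reference, the owner/group/other mode-bit test that
generic_permission() ends up doing looks roughly like the userspace
sketch below; the real routine also handles POSIX ACLs, capability
overrides and the caller's fsuid/supplementary groups, all left out
here:

#include <errno.h>
#include <sys/stat.h>
#include <unistd.h>

/* mask is built from R_OK, W_OK, X_OK */
static int may_access(const struct stat *st, int mask)
{
        mode_t mode = st->st_mode;

        /* pick the owner, group or other permission bits */
        if (st->st_uid == getuid())
                mode >>= 6;
        else if (st->st_gid == getgid())
                mode >>= 3;

        return ((mode & 7) & mask) == (mode_t)mask ? 0 : -EACCES;
}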

diff -ruN linux-3.9-rc6.orig/include/uapi/linux/nfs_mount.h linux-3.9-rc6/include/uapi/linux/nfs_mount.h
--- linux-3.9-rc6.orig/include/uapi/linux/nfs_mount.h 2013-04-08 05:49:54.000000000 +0200
+++ linux-3.9-rc6/include/uapi/linux/nfs_mount.h 2013-04-08 15:58:38.590470728 +0200
@@ -74,4 +74,6 @@
#define NFS_MOUNT_LOCAL_FLOCK 0x100000
#define NFS_MOUNT_LOCAL_FCNTL 0x200000

+#define NFS_MOUNT_NOACCESSCHECK 0x400000
+
#endif
diff -ruN linux-3.9-rc6.orig/fs/nfs/dir.c linux-3.9-rc6/fs/nfs/dir.c
--- linux-3.9-rc6.orig/fs/nfs/dir.c 2013-04-08 05:49:54.000000000 +0200
+++ linux-3.9-rc6/fs/nfs/dir.c 2013-04-08 15:59:04.674471048 +0200
@@ -2165,6 +2165,22 @@
struct nfs_access_entry cache;
int status;

+ if (NFS_SERVER(inode)->flags & NFS_MOUNT_NOACCESSCHECK) {
+ /*
+ * We could also check
+ * NFS_SERVER(inode)->client->cl_auth->au_ops->au_flavor
+ * to see if this is RPC_AUTH_UNIX, which is the only
+ * auth flavor where this makes sense, but that's way
+ * too much pointer chasing.
+ */
+ if (cred->cr_uid != 0) {
+ status = nfs_revalidate_inode(NFS_SERVER(inode), inode);
+ if (status == 0)
+ status = generic_permission(inode, mask);
+ return status;
+ }
+ }
+
status = nfs_access_get_cached(inode, cred, &cache);
if (status == 0)
goto out;
diff -ruN linux-3.9-rc6.orig/fs/nfs/super.c linux-3.9-rc6/fs/nfs/super.c
--- linux-3.9-rc6.orig/fs/nfs/super.c 2013-04-08 05:49:54.000000000 +0200
+++ linux-3.9-rc6/fs/nfs/super.c 2013-04-08 15:59:04.678470794 +0200
@@ -91,6 +91,7 @@
Opt_resvport, Opt_noresvport,
Opt_fscache, Opt_nofscache,
Opt_migration, Opt_nomigration,
+ Opt_accesscheck, Opt_noaccesscheck,

/* Mount options that take integer arguments */
Opt_port,
@@ -152,6 +153,8 @@
{ Opt_nofscache, "nofsc" },
{ Opt_migration, "migration" },
{ Opt_nomigration, "nomigration" },
+ { Opt_accesscheck, "accesscheck" },
+ { Opt_noaccesscheck, "noaccesscheck" },

{ Opt_port, "port=%s" },
{ Opt_rsize, "rsize=%s" },
@@ -635,6 +638,7 @@
{ NFS_MOUNT_NORDIRPLUS, ",nordirplus", "" },
{ NFS_MOUNT_UNSHARED, ",nosharecache", "" },
{ NFS_MOUNT_NORESVPORT, ",noresvport", "" },
+ { NFS_MOUNT_NOACCESSCHECK, ",noaccesscheck", "" },
{ 0, NULL, NULL }
};
const struct proc_nfs_info *nfs_infop;
@@ -1261,6 +1265,12 @@
case Opt_nomigration:
mnt->options &= NFS_OPTION_MIGRATION;
break;
+ case Opt_accesscheck:
+ mnt->flags &= ~NFS_MOUNT_NOACCESSCHECK;
+ break;
+ case Opt_noaccesscheck:
+ mnt->flags |= NFS_MOUNT_NOACCESSCHECK;
+ break;

/*
* options that take numeric values


2013-04-24 15:01:04

by Miquel van Smoorenburg

Subject: Re: [PATCH 0/2] RFC: nfs client: lower number of NFS ops in some circumstances

On 10/04/13 17:18, Peter Staubach wrote:
> Miquel van Smoorenburg wrote:
>>
>> These NFS client patches are posted as an RFC - request for comment.
>>
>> The idea behind these patches is to make it possible to cut down on
>> the number of NFS operations used when opening a file in certain
>> circumstances and environments (esp. mostly-readonly environments).
>
> Outside of possibly reducing the overall RPC count between the NFS
> client and the NFS server, was a difference measured in the overall
> performance of the website, from the customer's viewpoint?

(Sorry, it took a while before I could run these tests)

Yes - mostly tests with a web browser, as in "does it 'feel' faster",
which is hard to quantify objectively.

We also ran some more objective tests, which I re-ran just now.

I took one of our servers out of the cluster and did some tests on it.
I ran "wget http://customer-site.nl/" on it repeatedly with different
mount options. The customer test site is a Joomla site (one of the
'modular' CMSes that stats/opens a lot of files for each page request).

Two sessions are shown below; each session was started with an empty cache.


1. Mount options: rw,nosuid,tcp,noatime,vers=3

request request time (wget)
1 0m1.287s
2 0m0.550s
3 0m0.453s
4 0m0.343s
5 0m0.379s

nfsstat:

Client rpc stats:
calls retrans authrefrsh
2612 0 2612

Client nfs v3:
null getattr setattr lookup access readlink
0 0% 1308 50% 0 0% 416 15% 667 25% 3 0%
read write create mkdir symlink mknod
216 8% 0 0% 0 0% 0 0% 0 0% 0 0%
remove rmdir rename link readdir readdirplus
0 0% 0 0% 0 0% 0 0% 0 0% 2 0%
fsstat fsinfo pathconf commit
0 0% 0 0% 0 0% 0 0%

2. Mount options: rw,nosuid,tcp,noatime,noaccesscheck,sloppycto=3,vers=3

request request time (wget)
1 0m0.787s
2 0m0.376s
3 0m0.385s
4 0m0.324s
5 0m0.338s

nfsstat:

Client rpc stats:
calls retrans authrefrsh
1539 0 1539

Client nfs v3:
null getattr setattr lookup access readlink
0 0% 902 58% 0 0% 416 27% 0 0% 3 0%
read write create mkdir symlink mknod
216 14% 0 0% 0 0% 0 0% 0 0% 0 0%
remove rmdir rename link readdir readdirplus
0 0% 0 0% 0 0% 0 0% 0 0% 2 0%
fsstat fsinfo pathconf commit
0 0% 0 0% 0 0% 0 0%


You can clearly see that with the new mount options the number of NFS requests
is way lower (1539 vs 2612 RPC calls, a 41% reduction), and page loading time
(esp. for the first request) has gone down. The number of lookups and reads is
exactly the same, but the second nfsstat shows zero 'access' calls and 31%
fewer 'getattr' calls (902 vs 1308).

Mike.

2013-04-10 15:26:10

by Peter Staubach

Subject: RE: [PATCH 0/2] RFC: nfs client: lower number of NFS ops in some circumstances

Outside of possibly reducing the overall RPC count between the NFS client and the NFS server, was a difference measured in the overall performance of the website, from the customer's viewpoint?

Thanx...

ps


-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Miquel van Smoorenburg
Sent: Tuesday, April 09, 2013 8:46 AM
To: Trond Myklebust
Cc: [email protected]
Subject: [PATCH 0/2] RFC: nfs client: lower number of NFS ops in some circumstances

These NFS client patches are posted as an RFC - request for comment.

[...]

2013-04-09 13:01:35

by Miquel van Smoorenburg

Subject: [PATCH 2/2] "sloppycto=N" mount option

2/2: "sloppycto=N" mount option

This mount option is a bit like "nocto" - it suppresses
a GETATTR call when a file is opened if we still have valid
attribute data in the cache. The difference is that 1) we
only do this for files that are opened read-only, and 2) we
only do it when the last attribute update was 'N' seconds or less ago.
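
As a rough model of the freshness window (time_in_range_open() in the
dir.c hunk below), the check amounts to this, with plain unsigned
arithmetic standing in for jiffies:

/*
 * Sketch of the window check done by sloppycto() below: skip the
 * open-time GETATTR only while the cached attributes were refreshed
 * less than 'window' ticks ago, i.e. stamp <= now < stamp + window
 * (half-open, like time_in_range_open()).
 */
static int attrs_fresh(unsigned long now, unsigned long stamp,
                       unsigned long window)
{
        return now - stamp < window;
}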

diff -ruN linux-3.9-rc6.orig/include/linux/namei.h linux-3.9-rc6/include/linux/namei.h
--- linux-3.9-rc6.orig/include/linux/namei.h 2013-04-08 05:49:54.000000000 +0200
+++ linux-3.9-rc6/include/linux/namei.h 2013-04-08 15:53:44.546470854 +0200
@@ -55,6 +55,7 @@
#define LOOKUP_JUMPED 0x1000
#define LOOKUP_ROOT 0x2000
#define LOOKUP_EMPTY 0x4000
+#define LOOKUP_WRITE 0x8000

extern int user_path_at(int, const char __user *, unsigned, struct path *);
extern int user_path_at_empty(int, const char __user *, unsigned, struct path *, int *empty);
diff -ruN linux-3.9-rc6.orig/include/linux/nfs_fs_sb.h linux-3.9-rc6/include/linux/nfs_fs_sb.h
--- linux-3.9-rc6.orig/include/linux/nfs_fs_sb.h 2013-04-08 05:49:54.000000000 +0200
+++ linux-3.9-rc6/include/linux/nfs_fs_sb.h 2013-04-08 15:44:15.726470599 +0200
@@ -124,6 +124,7 @@
unsigned int acregmax;
unsigned int acdirmin;
unsigned int acdirmax;
+ unsigned int sloppycto;
unsigned int namelen;
unsigned int options; /* extra options enabled by mount */
#define NFS_OPTION_FSCACHE 0x00000001 /* - local caching enabled */
diff -ruN linux-3.9-rc6.orig/fs/nfs/client.c linux-3.9-rc6/fs/nfs/client.c
--- linux-3.9-rc6.orig/fs/nfs/client.c 2013-04-08 05:49:54.000000000 +0200
+++ linux-3.9-rc6/fs/nfs/client.c 2013-04-08 15:44:15.714470488 +0200
@@ -779,6 +779,7 @@
server->acregmax = data->acregmax * HZ;
server->acdirmin = data->acdirmin * HZ;
server->acdirmax = data->acdirmax * HZ;
+ server->sloppycto = data->sloppycto * HZ;

/* Start lockd here, before we might error out */
error = nfs_start_lockd(server);
@@ -859,6 +860,7 @@
if (server->flags & NFS_MOUNT_NOAC) {
server->acregmin = server->acregmax = 0;
server->acdirmin = server->acdirmax = 0;
+ server->sloppycto = 0;
}

server->maxfilesize = fsinfo->maxfilesize;
@@ -926,6 +928,7 @@
target->acregmax = source->acregmax;
target->acdirmin = source->acdirmin;
target->acdirmax = source->acdirmax;
+ target->sloppycto = source->sloppycto;
target->caps = source->caps;
target->options = source->options;
}
diff -ruN linux-3.9-rc6.orig/fs/nfs/dir.c linux-3.9-rc6/fs/nfs/dir.c
--- linux-3.9-rc6.orig/fs/nfs/dir.c 2013-04-08 05:49:54.000000000 +0200
+++ linux-3.9-rc6/fs/nfs/dir.c 2013-04-08 15:52:33.602470466 +0200
@@ -972,6 +972,22 @@
}

/*
+ * See if we allow sloppy close-to-open consistency.
+ */
+static inline
+int sloppycto(struct inode *inode, int flags)
+{
+ struct nfs_server *server = NFS_SERVER(inode);
+ struct nfs_inode *nfsi = NFS_I(inode);
+
+ return S_ISREG(inode->i_mode) &&
+ !(flags & LOOKUP_WRITE) &&
+ server->sloppycto &&
+ time_in_range_open(jiffies, nfsi->attrtimeo_timestamp,
+ nfsi->attrtimeo_timestamp + server->sloppycto);
+}
+
+/*
* Inode and filehandle revalidation for lookups.
*
* We force revalidation in the cases where the VFS sets LOOKUP_REVAL,
@@ -991,7 +1007,8 @@
if (flags & LOOKUP_REVAL)
goto out_force;
/* This is an open(2) */
- if ((flags & LOOKUP_OPEN) && !(server->flags & NFS_MOUNT_NOCTO) &&
+ if ((flags & LOOKUP_OPEN) &&
+ !(server->flags & NFS_MOUNT_NOCTO) && !sloppycto(inode, flags) &&
(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode)))
goto out_force;
out:
diff -ruN linux-3.9-rc6.orig/fs/nfs/fscache.c linux-3.9-rc6/fs/nfs/fscache.c
--- linux-3.9-rc6.orig/fs/nfs/fscache.c 2013-04-08 05:49:54.000000000 +0200
+++ linux-3.9-rc6/fs/nfs/fscache.c 2013-04-08 15:44:15.722470552 +0200
@@ -89,6 +89,7 @@
key->key.nfs_server.acregmax = nfss->acregmax;
key->key.nfs_server.acdirmin = nfss->acdirmin;
key->key.nfs_server.acdirmax = nfss->acdirmax;
+ key->key.nfs_server.sloppycto = nfss->sloppycto;
key->key.nfs_server.fsid = nfss->fsid;
key->key.rpc_auth.au_flavor = nfss->client->cl_auth->au_flavor;

diff -ruN linux-3.9-rc6.orig/fs/nfs/fscache.h linux-3.9-rc6/fs/nfs/fscache.h
--- linux-3.9-rc6.orig/fs/nfs/fscache.h 2013-04-08 05:49:54.000000000 +0200
+++ linux-3.9-rc6/fs/nfs/fscache.h 2013-04-08 15:44:15.722470552 +0200
@@ -43,6 +43,7 @@
unsigned int acregmax;
unsigned int acdirmin;
unsigned int acdirmax;
+ unsigned int sloppycto;
} nfs_server;

struct {
diff -ruN linux-3.9-rc6.orig/fs/nfs/internal.h linux-3.9-rc6/fs/nfs/internal.h
--- linux-3.9-rc6.orig/fs/nfs/internal.h 2013-04-08 05:49:54.000000000 +0200
+++ linux-3.9-rc6/fs/nfs/internal.h 2013-04-08 15:44:15.726470599 +0200
@@ -82,6 +82,7 @@
int flags;
unsigned int rsize, wsize;
unsigned int timeo, retrans;
+ unsigned int sloppycto;
unsigned int acregmin, acregmax,
acdirmin, acdirmax;
unsigned int namlen;
diff -ruN linux-3.9-rc6.orig/fs/nfs/nfs4client.c linux-3.9-rc6/fs/nfs/nfs4client.c
--- linux-3.9-rc6.orig/fs/nfs/nfs4client.c 2013-04-08 05:49:54.000000000 +0200
+++ linux-3.9-rc6/fs/nfs/nfs4client.c 2013-04-08 15:44:15.726470599 +0200
@@ -795,6 +795,7 @@
server->acregmax = data->acregmax * HZ;
server->acdirmin = data->acdirmin * HZ;
server->acdirmax = data->acdirmax * HZ;
+ server->sloppycto = data->sloppycto * HZ;

server->port = data->nfs_server.port;

diff -ruN linux-3.9-rc6.orig/fs/nfs/super.c linux-3.9-rc6/fs/nfs/super.c
--- linux-3.9-rc6.orig/fs/nfs/super.c 2013-04-08 05:49:54.000000000 +0200
+++ linux-3.9-rc6/fs/nfs/super.c 2013-04-08 15:44:15.726470599 +0200
@@ -98,6 +98,7 @@
Opt_timeo, Opt_retrans,
Opt_acregmin, Opt_acregmax,
Opt_acdirmin, Opt_acdirmax,
+ Opt_sloppycto,
Opt_actimeo,
Opt_namelen,
Opt_mountport,
@@ -163,6 +164,7 @@
{ Opt_acregmax, "acregmax=%s" },
{ Opt_acdirmin, "acdirmin=%s" },
{ Opt_acdirmax, "acdirmax=%s" },
+ { Opt_sloppycto, "sloppycto=%s" },
{ Opt_actimeo, "actimeo=%s" },
{ Opt_namelen, "namlen=%s" },
{ Opt_mountport, "mountport=%s" },
@@ -656,6 +658,8 @@
seq_printf(m, ",acdirmin=%u", nfss->acdirmin/HZ);
if (nfss->acdirmax != NFS_DEF_ACDIRMAX*HZ || showdefaults)
seq_printf(m, ",acdirmax=%u", nfss->acdirmax/HZ);
+ if (nfss->sloppycto != 0)
+ seq_printf(m, ",sloppycto=%u", nfss->sloppycto/HZ);
for (nfs_infop = nfs_info; nfs_infop->flag; nfs_infop++) {
if (nfss->flags & nfs_infop->flag)
seq_puts(m, nfs_infop->str);
@@ -1316,6 +1320,11 @@
goto out_invalid_value;
mnt->acdirmax = option;
break;
+ case Opt_sloppycto:
+ if (nfs_get_option_ul(args, &option))
+ goto out_invalid_value;
+ mnt->sloppycto = option;
+ break;
case Opt_actimeo:
if (nfs_get_option_ul(args, &option))
goto out_invalid_value;
@@ -2074,6 +2083,7 @@
data->acregmax != nfss->acregmax / HZ ||
data->acdirmin != nfss->acdirmin / HZ ||
data->acdirmax != nfss->acdirmax / HZ ||
+ data->sloppycto != nfss->sloppycto / HZ ||
data->timeo != (10U * nfss->client->cl_timeout->to_initval / HZ) ||
data->nfs_server.port != nfss->port ||
data->nfs_server.addrlen != nfss->nfs_client->cl_addrlen ||
@@ -2245,6 +2255,8 @@
goto Ebusy;
if (a->acdirmax != b->acdirmax)
goto Ebusy;
+ if (a->sloppycto != b->sloppycto)
+ goto Ebusy;
if (clnt_a->cl_auth->au_flavor != clnt_b->cl_auth->au_flavor)
goto Ebusy;
return 1;
diff -ruN linux-3.9-rc6.orig/fs/open.c linux-3.9-rc6/fs/open.c
--- linux-3.9-rc6.orig/fs/open.c 2013-04-08 05:49:54.000000000 +0200
+++ linux-3.9-rc6/fs/open.c 2013-04-08 15:57:34.818470456 +0200
@@ -899,6 +899,8 @@
if (flags & O_EXCL)
op->intent |= LOOKUP_EXCL;
}
+ if (flags & (O_RDWR|O_WRONLY|O_CREAT))
+ op->intent |= LOOKUP_WRITE;

if (flags & O_DIRECTORY)
lookup_flags |= LOOKUP_DIRECTORY;