2023-10-23 02:11:46

by NeilBrown

Subject: [PATCH 0/6 nfs-utils v2] fixes for error handling in nfsd_fh

Hi,
this is a revised version of my previous series with the same name.
The first two patches are unchanged.
The third patch, which was an RFC, has been replaced with the last
patch which actually addresses the issue rather than skirting
around it.

Patch 3 here is a revert of a change I noticed while exploring the
code. cache_open() must be called BEFORE forking workers, as explained
in that patch.
Patches 4 and 5 factor out common code, which makes the final patch
simpler.

The core issue is that sometimes mountd (or exportd) cannot give a
definite "yes" or "no" to a request to map an fsid to a path name.
In these cases the only safe option is to delay and try again.

This only becomes relevant if a filesystem is mounted by a client, then
the server restarts (or the export cache is flushed) and the client
tries to use a filehandle that it already has, but the server cannot
find it and cannot be sure it doesn't exist. This can happen when an
export is marked "mountpoint" or when a re-exported NFS filesystem
cannot contact the server and reports an ETIMEDOUT error. In these
cases we want the client to continue waiting (which it does) and also
want mountd/exportd to periodically check if the target filesystem has
come back (which it currently does not).
With the current code, once this situation happens and the client is
waiting, the client will continue to wait indefinitely even if the
target filesystem becomes available. The client can only continue if
the NFS server is restarted or the export cache is flushed. After this
patch, within 2 minutes of the target filesystem becoming available
again, mountd will tell the kernel, and when the client asks again it
will be allowed to proceed.
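
For orientation, the shape of the retry logic added by the final patch
is roughly as follows - a simplified sketch only, with error handling
and the full upcall parsing omitted (the real code is in patch 6):

#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <sys/time.h>

#define RETRY_SEC 120

/* One queued request that could not be answered definitively. */
struct delayed {
        char *message;          /* saved upcall text */
        time_t last_attempt;    /* when it was last tried */
        int f;                  /* cache channel fd to reply on */
        struct delayed *next;
};
static struct delayed *delayed;

/* Queue a request whose lookup could neither confirm nor deny the fsid. */
static void defer_request(int f, const char *inbuf)
{
        struct delayed *d = malloc(sizeof(*d)), **dp;

        if (!d)
                return;
        d->message = strdup(inbuf);
        if (!d->message) {
                free(d);
                return;
        }
        d->f = f;
        d->last_attempt = time(NULL);
        d->next = NULL;
        for (dp = &delayed; *dp; dp = &(*dp)->next)
                ;
        *dp = d;
}

/* In the main select() loop: shorten the timeout so the oldest queued
 * request is retried about RETRY_SEC after its last attempt.
 */
static void set_retry_timeout(struct timeval *tv)
{
        time_t now = time(NULL);

        if (!delayed)
                return;                         /* keep the long default timeout */
        if (delayed->last_attempt > now)        /* clock jumped backwards */
                delayed->last_attempt = now - RETRY_SEC;
        tv->tv_sec = delayed->last_attempt + RETRY_SEC - now;
        if (tv->tv_sec < 0)
                tv->tv_sec = 0;
}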

NeilBrown


[PATCH 1/6] export: fix handling of error from match_fsid()
[PATCH 2/6] export: add EACCES to the list of known
[PATCH 3/6] export: move cache_open() before workers are forked.
[PATCH 4/6] Move fork_workers() and wait_for_workers() in cache.c
[PATCH 5/6] Share process_loop code between mountd and exportd.
[PATCH 6/6] cache: periodically retry requests that couldn't be


2023-10-23 02:11:50

by NeilBrown

Subject: [PATCH 1/6] export: fix handling of error from match_fsid()

If match_fsid() returns -1 we shouldn't assume that the path definitely
doesn't match the fsid; it might, or it might not.
This is a similar situation to where an export is expected to be a mount
point, but is found not to be one. So it can be handled the same way,
by setting 'dev_missing'.
This will only have an effect if no other path matched the fsid, which
is what we want.

The current code results in nothing being exported if any export point,
or any mount point beneath a crossmnt export point, fails a 'stat'
request, which is too harsh.

Signed-off-by: NeilBrown <[email protected]>
---
support/export/cache.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/support/export/cache.c b/support/export/cache.c
index 19bbba556060..e4595020f43f 100644
--- a/support/export/cache.c
+++ b/support/export/cache.c
@@ -858,7 +858,8 @@ static void nfsd_fh(int f)
case 0:
continue;
case -1:
- goto out;
+ dev_missing ++;
+ continue;
}
if (is_ipaddr_client(dom)
&& !ipaddr_client_matches(exp, ai))
--
2.42.0

2023-10-23 02:11:53

by NeilBrown

Subject: [PATCH 2/6] export: add EACCES to the list of known path_lookup_error() errors.

If a 'stat' results in EACCES (for root), then it is likely a permanent
problem. One possible cause is a 'fuse' filesystem which only gives
access to the user which mounted it.

So it is reasonable for EACCES to be a "path lookup error".

Signed-off-by: NeilBrown <[email protected]>
---
support/export/cache.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/support/export/cache.c b/support/export/cache.c
index e4595020f43f..5307f6c8d872 100644
--- a/support/export/cache.c
+++ b/support/export/cache.c
@@ -77,6 +77,7 @@ static bool path_lookup_error(int err)
case ENAMETOOLONG:
case ENOENT:
case ENOTDIR:
+ case EACCES:
return 1;
}
return 0;
--
2.42.0

2023-10-23 02:12:01

by NeilBrown

Subject: [PATCH 3/6] export: move cache_open() before workers are forked.

If each worker has a separate open on a cache channel, then each worker
will potentially receive every upcall request resulting in duplicated
work.

The only way a worker will miss a request that another worker sees is
if that other worker answers the request before this worker gets a
chance to read it.

To avoid duplicate effort between threads and so get maximum benefit
from multiple threads, open the cache channels before forking.

Note that the kernel provides locking so that only one thread can be
reading or writing to any channel at any given moment.
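
As a simplified illustration (not part of this patch, and with error
handling omitted - the real channel handling lives in cache.c), opening
the channel once and then forking means all workers share a single open
file description, so each upcall is consumed by exactly one reader:

#include <fcntl.h>
#include <unistd.h>

static int channel_fd = -1;

/* Open the nfsd.fh channel once, then fork the workers.  They all
 * inherit the same open file description, and the kernel's locking on
 * the channel means a given upcall line is handed to only one reader.
 */
static void start_workers(int num_threads)
{
        int i;

        channel_fd = open("/proc/net/rpc/nfsd.fh/channel", O_RDWR);

        for (i = 1; i < num_threads; i++)
                if (fork() == 0)
                        break;          /* worker child: go read upcalls */

        /* If each worker instead called open() after fork(), every
         * worker would have its own file description and could read
         * (and answer) the same request, duplicating the effort.
         */
}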

Fixes: 5fc3bac9e0c3 ("mountd: Ensure we don't share cache file descriptors among processes.")
Signed-off-by: NeilBrown <[email protected]>
---
utils/exportd/exportd.c | 8 ++++++--
utils/mountd/mountd.c | 8 ++++++--
2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/utils/exportd/exportd.c b/utils/exportd/exportd.c
index 2dd12cb6015b..6f866445efc2 100644
--- a/utils/exportd/exportd.c
+++ b/utils/exportd/exportd.c
@@ -289,12 +289,16 @@ main(int argc, char **argv)
else if (num_threads > MAX_THREADS)
num_threads = MAX_THREADS;

+ /* Open cache channel files BEFORE forking so each upcall is
+ * only handled by one thread. Kernel provides locking for both
+ * read and write.
+ */
+ cache_open();
+
if (num_threads > 1)
fork_workers();


- /* Open files now to avoid sharing descriptors among forked processes */
- cache_open();
v4clients_init();

/* Process incoming upcalls */
diff --git a/utils/mountd/mountd.c b/utils/mountd/mountd.c
index bcf749fabbb3..f9c62cded66c 100644
--- a/utils/mountd/mountd.c
+++ b/utils/mountd/mountd.c
@@ -916,12 +916,16 @@ main(int argc, char **argv)
else if (num_threads > MAX_THREADS)
num_threads = MAX_THREADS;

+ /* Open cache channel files BEFORE forking so each upcall is
+ * only handled by one thread. Kernel provides locking for both
+ * read and write.
+ */
+ cache_open();
+
if (num_threads > 1)
fork_workers();

nfsd_path_init();
- /* Open files now to avoid sharing descriptors among forked processes */
- cache_open();
v4clients_init();

xlog(L_NOTICE, "Version " VERSION " starting");
--
2.42.0

2023-10-23 02:12:08

by NeilBrown

Subject: [PATCH 4/6] Move fork_workers() and wait_for_workers() in cache.c

Both mountd and exportd have fork_workers() and wait_for_workers()
which are nearly identical.
Move this code into cache.c (adding a cache_ prefix to the function
names) and leave the minor differences in the two callers.

Also remove duplicate declarations from mountd.h.

Signed-off-by: NeilBrown <[email protected]>
---
support/export/cache.c | 75 ++++++++++++++++++++++++++++++++-
support/export/export.h | 2 +
utils/exportd/exportd.c | 90 ++++++----------------------------------
utils/mountd/mountd.c | 91 ++++++-----------------------------------
utils/mountd/mountd.h | 9 ----
5 files changed, 99 insertions(+), 168 deletions(-)

diff --git a/support/export/cache.c b/support/export/cache.c
index 5307f6c8d872..1874156af5e5 100644
--- a/support/export/cache.c
+++ b/support/export/cache.c
@@ -1,10 +1,9 @@
-
/*
* Handle communication with knfsd internal cache
*
* We open /proc/net/rpc/{auth.unix.ip,nfsd.export,nfsd.fh}/channel
* and listen for requests (using my_svc_run)
- *
+ *
*/

#ifdef HAVE_CONFIG_H
@@ -16,6 +15,7 @@
#include <sys/select.h>
#include <sys/stat.h>
#include <sys/vfs.h>
+#include <sys/wait.h>
#include <time.h>
#include <netinet/in.h>
#include <arpa/inet.h>
@@ -1775,3 +1775,74 @@ cache_get_filehandle(nfs_export *exp, int len, char *p)
fh.fh_size = qword_get(&bp, (char *)fh.fh_handle, NFS3_FHSIZE);
return &fh;
}
+
+/* Wait for all worker child processes to exit and reap them */
+void
+cache_wait_for_workers(char *prog)
+{
+ int status;
+ pid_t pid;
+
+ for (;;) {
+
+ pid = waitpid(0, &status, 0);
+
+ if (pid < 0) {
+ if (errno == ECHILD)
+ return; /* no more children */
+ xlog(L_FATAL, "%s: can't wait: %s\n", prog,
+ strerror(errno));
+ }
+
+ /* Note: because we SIG_IGN'd SIGCHLD earlier, this
+ * does not happen on 2.6 kernels, and waitpid() blocks
+ * until all the children are dead then returns with
+ * -ECHILD. But, we don't need to do anything on the
+ * death of individual workers, so we don't care. */
+ xlog(L_NOTICE, "%s: reaped child %d, status %d\n",
+ prog, (int)pid, status);
+ }
+}
+
+/* Fork num_threads worker children and wait for them */
+int
+cache_fork_workers(char *prog, int num_threads)
+{
+ int i;
+ pid_t pid;
+
+ if (num_threads <= 1)
+ return 1;
+
+ xlog(L_NOTICE, "%s: starting %d threads\n", prog, num_threads);
+
+ for (i = 0 ; i < num_threads ; i++) {
+ pid = fork();
+ if (pid < 0) {
+ xlog(L_FATAL, "%s: cannot fork: %s\n", prog,
+ strerror(errno));
+ }
+ if (pid == 0) {
+ /* worker child */
+
+ /* Re-enable the default action on SIGTERM et al
+ * so that workers die naturally when sent them.
+ * Only the parent unregisters with pmap and
+ * hence needs to do special SIGTERM handling. */
+ struct sigaction sa;
+ sa.sa_handler = SIG_DFL;
+ sa.sa_flags = 0;
+ sigemptyset(&sa.sa_mask);
+ sigaction(SIGHUP, &sa, NULL);
+ sigaction(SIGINT, &sa, NULL);
+ sigaction(SIGTERM, &sa, NULL);
+
+ /* fall into my_svc_run in caller */
+ return 1;
+ }
+ }
+
+ /* in parent */
+ cache_wait_for_workers(prog);
+ return 0;
+}
diff --git a/support/export/export.h b/support/export/export.h
index 8d5a0d3004ef..ce561f9fbd3e 100644
--- a/support/export/export.h
+++ b/support/export/export.h
@@ -29,6 +29,8 @@ int v4clients_process(fd_set *fdset);
struct nfs_fh_len *
cache_get_filehandle(nfs_export *exp, int len, char *p);
int cache_export(nfs_export *exp, char *path);
+int cache_fork_workers(char *prog, int num_threads);
+void cache_wait_for_workers(char *prog);

bool ipaddr_client_matches(nfs_export *exp, struct addrinfo *ai);
bool namelist_client_matches(nfs_export *exp, char *dom);
diff --git a/utils/exportd/exportd.c b/utils/exportd/exportd.c
index 6f866445efc2..d07a885c6763 100644
--- a/utils/exportd/exportd.c
+++ b/utils/exportd/exportd.c
@@ -16,7 +16,6 @@
#include <string.h>
#include <getopt.h>
#include <errno.h>
-#include <wait.h>

#include "nfslib.h"
#include "conffile.h"
@@ -54,90 +53,19 @@ static char shortopts[] = "d:fghs:t:liT:";
*/
inline static void set_signals(void);

-/* Wait for all worker child processes to exit and reap them */
-static void
-wait_for_workers (void)
-{
- int status;
- pid_t pid;
-
- for (;;) {
-
- pid = waitpid(0, &status, 0);
-
- if (pid < 0) {
- if (errno == ECHILD)
- return; /* no more children */
- xlog(L_FATAL, "mountd: can't wait: %s\n",
- strerror(errno));
- }
-
- /* Note: because we SIG_IGN'd SIGCHLD earlier, this
- * does not happen on 2.6 kernels, and waitpid() blocks
- * until all the children are dead then returns with
- * -ECHILD. But, we don't need to do anything on the
- * death of individual workers, so we don't care. */
- xlog(L_NOTICE, "mountd: reaped child %d, status %d\n",
- (int)pid, status);
- }
-}
-
inline void
cleanup_lockfiles (void)
{
unlink(etab.lockfn);
}

-/* Fork num_threads worker children and wait for them */
static void
-fork_workers(void)
-{
- int i;
- pid_t pid;
-
- xlog(L_NOTICE, "mountd: starting %d threads\n", num_threads);
-
- for (i = 0 ; i < num_threads ; i++) {
- pid = fork();
- if (pid < 0) {
- xlog(L_FATAL, "mountd: cannot fork: %s\n",
- strerror(errno));
- }
- if (pid == 0) {
- /* worker child */
-
- /* Re-enable the default action on SIGTERM et al
- * so that workers die naturally when sent them.
- * Only the parent unregisters with pmap and
- * hence needs to do special SIGTERM handling. */
- struct sigaction sa;
- sa.sa_handler = SIG_DFL;
- sa.sa_flags = 0;
- sigemptyset(&sa.sa_mask);
- sigaction(SIGHUP, &sa, NULL);
- sigaction(SIGINT, &sa, NULL);
- sigaction(SIGTERM, &sa, NULL);
-
- /* fall into my_svc_run in caller */
- return;
- }
- }
-
- /* in parent */
- wait_for_workers();
- cleanup_lockfiles();
- free_state_path_names(&etab);
- xlog(L_NOTICE, "exportd: no more workers, exiting\n");
- exit(0);
-}
-
-static void
killer (int sig)
{
if (num_threads > 1) {
/* play Kronos and eat our children */
kill(0, SIGTERM);
- wait_for_workers();
+ cache_wait_for_workers("exportd");
}
cleanup_lockfiles();
free_state_path_names(&etab);
@@ -145,6 +73,7 @@ killer (int sig)

exit(0);
}
+
static void
sig_hup (int UNUSED(sig))
{
@@ -152,8 +81,9 @@ sig_hup (int UNUSED(sig))
xlog (L_NOTICE, "Received SIGHUP... Ignoring.\n");
return;
}
-inline static void
-set_signals(void)
+
+inline static void
+set_signals(void)
{
struct sigaction sa;

@@ -295,9 +225,13 @@ main(int argc, char **argv)
*/
cache_open();

- if (num_threads > 1)
- fork_workers();
-
+ if (cache_fork_workers(progname, num_threads) == 0) {
+ /* We forked, waited, and now need to clean up */
+ cleanup_lockfiles();
+ free_state_path_names(&etab);
+ xlog(L_NOTICE, "%s: no more workers, exiting\n", progname);
+ exit(0);
+ }

v4clients_init();

diff --git a/utils/mountd/mountd.c b/utils/mountd/mountd.c
index f9c62cded66c..dbd5546df61c 100644
--- a/utils/mountd/mountd.c
+++ b/utils/mountd/mountd.c
@@ -21,7 +21,6 @@
#include <errno.h>
#include <fcntl.h>
#include <sys/resource.h>
-#include <sys/wait.h>

#include "conffile.h"
#include "xmalloc.h"
@@ -119,90 +118,17 @@ cleanup_lockfiles (void)
unlink(rmtab.lockfn);
}

-/* Wait for all worker child processes to exit and reap them */
-static void
-wait_for_workers (void)
-{
- int status;
- pid_t pid;
-
- for (;;) {
-
- pid = waitpid(0, &status, 0);
-
- if (pid < 0) {
- if (errno == ECHILD)
- return; /* no more children */
- xlog(L_FATAL, "mountd: can't wait: %s\n",
- strerror(errno));
- }
-
- /* Note: because we SIG_IGN'd SIGCHLD earlier, this
- * does not happen on 2.6 kernels, and waitpid() blocks
- * until all the children are dead then returns with
- * -ECHILD. But, we don't need to do anything on the
- * death of individual workers, so we don't care. */
- xlog(L_NOTICE, "mountd: reaped child %d, status %d\n",
- (int)pid, status);
- }
-}
-
-/* Fork num_threads worker children and wait for them */
-static void
-fork_workers(void)
-{
- int i;
- pid_t pid;
-
- xlog(L_NOTICE, "mountd: starting %d threads\n", num_threads);
-
- for (i = 0 ; i < num_threads ; i++) {
- pid = fork();
- if (pid < 0) {
- xlog(L_FATAL, "mountd: cannot fork: %s\n",
- strerror(errno));
- }
- if (pid == 0) {
- /* worker child */
-
- /* Re-enable the default action on SIGTERM et al
- * so that workers die naturally when sent them.
- * Only the parent unregisters with pmap and
- * hence needs to do special SIGTERM handling. */
- struct sigaction sa;
- sa.sa_handler = SIG_DFL;
- sa.sa_flags = 0;
- sigemptyset(&sa.sa_mask);
- sigaction(SIGHUP, &sa, NULL);
- sigaction(SIGINT, &sa, NULL);
- sigaction(SIGTERM, &sa, NULL);
-
- /* fall into my_svc_run in caller */
- return;
- }
- }
-
- /* in parent */
- wait_for_workers();
- unregister_services();
- cleanup_lockfiles();
- free_state_path_names(&etab);
- free_state_path_names(&rmtab);
- xlog(L_NOTICE, "mountd: no more workers, exiting\n");
- exit(0);
-}
-
/*
* Signal handler.
*/
-static void
+static void
killer (int sig)
{
unregister_services();
if (num_threads > 1) {
/* play Kronos and eat our children */
kill(0, SIGTERM);
- wait_for_workers();
+ cache_wait_for_workers("mountd");
}
cleanup_lockfiles();
free_state_path_names(&etab);
@@ -220,7 +146,7 @@ sig_hup (int UNUSED(sig))
}

bool_t
-mount_null_1_svc(struct svc_req *rqstp, void *UNUSED(argp),
+mount_null_1_svc(struct svc_req *rqstp, void *UNUSED(argp),
void *UNUSED(resp))
{
struct sockaddr *sap = nfs_getrpccaller(rqstp->rq_xprt);
@@ -922,8 +848,15 @@ main(int argc, char **argv)
*/
cache_open();

- if (num_threads > 1)
- fork_workers();
+ if (cache_fork_workers("mountd", num_threads) == 0) {
+ /* We forked, waited, and now need to clean up */
+ unregister_services();
+ cleanup_lockfiles();
+ free_state_path_names(&etab);
+ free_state_path_names(&rmtab);
+ xlog(L_NOTICE, "mountd: no more workers, exiting\n");
+ exit(0);
+ }

nfsd_path_init();
v4clients_init();
diff --git a/utils/mountd/mountd.h b/utils/mountd/mountd.h
index d30775313f66..bd5c9576d8ad 100644
--- a/utils/mountd/mountd.h
+++ b/utils/mountd/mountd.h
@@ -51,13 +51,4 @@ void mountlist_del(char *host, const char *path);
void mountlist_del_all(const struct sockaddr *sap);
mountlist mountlist_list(void);

-void cache_open(void);
-struct nfs_fh_len *
- cache_get_filehandle(nfs_export *exp, int len, char *p);
-int cache_export(nfs_export *exp, char *path);
-
-bool ipaddr_client_matches(nfs_export *exp, struct addrinfo *ai);
-bool namelist_client_matches(nfs_export *exp, char *dom);
-bool client_matches(nfs_export *exp, char *dom, struct addrinfo *ai);
-
#endif /* MOUNTD_H */
--
2.42.0

2023-10-23 02:12:13

by NeilBrown

Subject: [PATCH 5/6] Share process_loop code between mountd and exportd.

There is substantial commonality between cache_process_loop() used by
exportd and my_svc_run() used by mountd.

Remove the looping from cache_process_loop(), renaming it to
cache_process(), and call it in a loop from exportd.
my_svc_run() now calls cache_process() for all the common functionality
and adds code specific to being an RPC server.

Signed-off-by: NeilBrown <[email protected]>
---
support/export/cache.c | 49 +++++++++++++++++++++--------------------
support/export/export.h | 1 +
utils/exportd/exportd.c | 5 +++--
utils/mountd/svc_run.c | 23 ++++---------------
4 files changed, 33 insertions(+), 45 deletions(-)

diff --git a/support/export/cache.c b/support/export/cache.c
index 1874156af5e5..a01eba4f6619 100644
--- a/support/export/cache.c
+++ b/support/export/cache.c
@@ -1602,40 +1602,41 @@ int cache_process_req(fd_set *readfds)
}

/**
- * cache_process_loop - process incoming upcalls
+ * cache_process - process incoming upcalls
+ * Returns -ve on error, or number of fds in svc_fds
+ * that might need processing.
*/
-void cache_process_loop(void)
+int cache_process(fd_set *readfds)
{
- fd_set readfds;
+ fd_set fdset;
int selret;

- FD_ZERO(&readfds);
-
- for (;;) {
-
- cache_set_fds(&readfds);
- v4clients_set_fds(&readfds);
-
- selret = select(FD_SETSIZE, &readfds,
- (void *) 0, (void *) 0, (struct timeval *) 0);
+ if (!readfds) {
+ FD_ZERO(&fdset);
+ readfds = &fdset;
+ }
+ cache_set_fds(readfds);
+ v4clients_set_fds(readfds);

+ selret = select(FD_SETSIZE, readfds,
+ (void *) 0, (void *) 0, (struct timeval *) 0);

- switch (selret) {
- case -1:
- if (errno == EINTR || errno == ECONNREFUSED
- || errno == ENETUNREACH || errno == EHOSTUNREACH)
- continue;
- xlog(L_ERROR, "my_svc_run() - select: %m");
- return;
+ switch (selret) {
+ case -1:
+ if (errno == EINTR || errno == ECONNREFUSED
+ || errno == ENETUNREACH || errno == EHOSTUNREACH)
+ return 0;
+ return -1;

- default:
- cache_process_req(&readfds);
- v4clients_process(&readfds);
- }
+ default:
+ selret -= cache_process_req(readfds);
+ selret -= v4clients_process(readfds);
+ if (selret < 0)
+ selret = 0;
}
+ return selret;
}

-
/*
* Give IP->domain and domain+path->options to kernel
* % echo nfsd $IP $[now+DEFAULT_TTL] $domain > /proc/net/rpc/auth.unix.ip/channel
diff --git a/support/export/export.h b/support/export/export.h
index ce561f9fbd3e..e2009ccdc443 100644
--- a/support/export/export.h
+++ b/support/export/export.h
@@ -31,6 +31,7 @@ struct nfs_fh_len *
int cache_export(nfs_export *exp, char *path);
int cache_fork_workers(char *prog, int num_threads);
void cache_wait_for_workers(char *prog);
+int cache_process(fd_set *readfds);

bool ipaddr_client_matches(nfs_export *exp, struct addrinfo *ai);
bool namelist_client_matches(nfs_export *exp, char *dom);
diff --git a/utils/exportd/exportd.c b/utils/exportd/exportd.c
index d07a885c6763..a2e370ac506f 100644
--- a/utils/exportd/exportd.c
+++ b/utils/exportd/exportd.c
@@ -236,9 +236,10 @@ main(int argc, char **argv)
v4clients_init();

/* Process incoming upcalls */
- cache_process_loop();
+ while (cache_process(NULL) >= 0)
+ ;

- xlog(L_ERROR, "%s: process loop terminated unexpectedly. Exiting...\n",
+ xlog(L_ERROR, "%s: process loop terminated unexpectedly(%m). Exiting...\n",
progname);

free_state_path_names(&etab);
diff --git a/utils/mountd/svc_run.c b/utils/mountd/svc_run.c
index 167b9757bde2..2aaf3756bbb1 100644
--- a/utils/mountd/svc_run.c
+++ b/utils/mountd/svc_run.c
@@ -97,28 +97,13 @@ my_svc_run(void)
int selret;

for (;;) {
-
readfds = svc_fdset;
- cache_set_fds(&readfds);
- v4clients_set_fds(&readfds);
-
- selret = select(FD_SETSIZE, &readfds,
- (void *) 0, (void *) 0, (struct timeval *) 0);
-
-
- switch (selret) {
- case -1:
- if (errno == EINTR || errno == ECONNREFUSED
- || errno == ENETUNREACH || errno == EHOSTUNREACH)
- continue;
+ selret = cache_process(&readfds);
+ if (selret < 0) {
xlog(L_ERROR, "my_svc_run() - select: %m");
return;
-
- default:
- selret -= cache_process_req(&readfds);
- selret -= v4clients_process(&readfds);
- if (selret)
- svc_getreqset(&readfds);
}
+ if (selret)
+ svc_getreqset(&readfds);
}
}
--
2.42.0

2023-10-23 02:12:18

by NeilBrown

Subject: [PATCH 6/6] cache: periodically retry requests that couldn't be answered.

Requests from the kernel to map the fsid from a filehandle to a path
name sometimes cannot be answered because the filesystem isn't
available now but might be available later.

This happens if an export is marked "mountpoint" but the mountpoint
isn't currently mounted. In this case it might get mounted in the
future.

It also happens if an NFS filesystem is being re-exported and the server
is unresponsive. In that case (if it was mounted "softerr") we get
ETIMEDOUT from a stat() attempt and so cannot give either a positive or
negative response.

These cases are currently handled poorly. No answer is returned to the
kernel so it will continue waiting for an answer - and never get one
even if the NFS server comes back or the mountpoint is mounted.

We cannot report a soft error to the kernel, so we must retry ourselves.

With this patch we record the request when the lookup fails with
dev_missing or similar and retry every 2 minutes.

Signed-off-by: NeilBrown <[email protected]>
---
support/export/cache.c | 121 +++++++++++++++++++++++++++++++++++------
1 file changed, 103 insertions(+), 18 deletions(-)

diff --git a/support/export/cache.c b/support/export/cache.c
index a01eba4f6619..6c0a44a3a209 100644
--- a/support/export/cache.c
+++ b/support/export/cache.c
@@ -759,7 +759,15 @@ static struct addrinfo *lookup_client_addr(char *dom)
return ret;
}

-static void nfsd_fh(int f)
+#define RETRY_SEC 120
+struct delayed {
+ char *message;
+ time_t last_attempt;
+ int f;
+ struct delayed *next;
+} *delayed;
+
+static int nfsd_handle_fh(int f, char *bp, int blen)
{
/* request are:
* domain fsidtype fsid
@@ -777,21 +785,13 @@ static void nfsd_fh(int f)
nfs_export *exp;
int i;
int dev_missing = 0;
- char buf[RPC_CHAN_BUF_SIZE], *bp;
- int blen;
+ char buf[RPC_CHAN_BUF_SIZE];
int did_uncover = 0;
-
- blen = cache_read(f, buf, sizeof(buf));
- if (blen <= 0 || buf[blen-1] != '\n') return;
- buf[blen-1] = 0;
-
- xlog(D_CALL, "nfsd_fh: inbuf '%s'", buf);
-
- bp = buf;
+ int ret = 0;

dom = malloc(blen);
if (dom == NULL)
- return;
+ return ret;
if (qword_get(&bp, dom, blen) <= 0)
goto out;
if (qword_get_int(&bp, &fsidtype) != 0)
@@ -893,8 +893,10 @@ static void nfsd_fh(int f)
/* The missing dev could be what we want, so just be
* quiet rather than returning stale yet
*/
- if (dev_missing)
+ if (dev_missing) {
+ ret = 1;
goto out;
+ }
} else if (found->e_mountpoint &&
!is_mountpoint(found->e_mountpoint[0]?
found->e_mountpoint:
@@ -904,7 +906,7 @@ static void nfsd_fh(int f)
xlog(L_WARNING, "%s not exported as %d not a mountpoint",
found->e_path, found->e_mountpoint);
*/
- /* FIXME we need to make sure we re-visit this later */
+ ret = 1;
goto out;
}

@@ -933,7 +935,68 @@ out:
free(found_path);
nfs_freeaddrinfo(ai);
free(dom);
- xlog(D_CALL, "nfsd_fh: found %p path %s", found, found ? found->e_path : NULL);
+ if (!ret)
+ xlog(D_CALL, "nfsd_fh: found %p path %s",
+ found, found ? found->e_path : NULL);
+ return ret;
+}
+
+static void nfsd_fh(int f)
+{
+ struct delayed *d, **dp;
+ char inbuf[RPC_CHAN_BUF_SIZE];
+ int blen;
+
+ blen = cache_read(f, inbuf, sizeof(inbuf));
+ if (blen <= 0 || inbuf[blen-1] != '\n') return;
+ inbuf[blen-1] = 0;
+
+ xlog(D_CALL, "nfsd_fh: inbuf '%s'", inbuf);
+
+ if (nfsd_handle_fh(f, inbuf, blen) == 0)
+ return;
+ /* We don't have a definitive answer to give the kernel.
+ * This is because an export marked "mountpoint" isn't a
+ * mountpoint, or because a stat of a mountpoint fails with
+ * a strange error like ETIMEDOUT as is possible with an
+ * NFS mount marked "softerr" which is being re-exported.
+ *
+ * We cannot tell the kernel to retry, so we have to
+ * retry ourselves.
+ */
+ d = malloc(sizeof(*d));
+
+ if (!d)
+ return;
+ d->message = strndup(inbuf, blen);
+ if (!d->message) {
+ free(d);
+ return;
+ }
+ d->f = f;
+ d->last_attempt = time(NULL);
+ d->next = NULL;
+ dp = &delayed;
+ while (*dp)
+ dp = &(*dp)->next;
+ *dp = d;
+}
+
+static void nfsd_retry_fh(struct delayed *d)
+{
+ struct delayed **dp;
+
+ if (nfsd_handle_fh(d->f, d->message, strlen(d->message)+1) == 0) {
+ free(d->message);
+ free(d);
+ return;
+ }
+ d->last_attempt = time(NULL);
+ d->next = NULL;
+ dp = &delayed;
+ while (*dp)
+ dp = &(*dp)->next;
+ *dp = d;
}

#ifdef HAVE_JUNCTION_SUPPORT
@@ -1512,7 +1575,7 @@ static void nfsd_export(int f)
* This will cause it not to appear in the V4 Pseudo-root
* and so a "mount" of this path will fail, just like with
* V3.
- * And filehandle for this mountpoint from an earlier
+ * Any filehandle for this mountpoint from an earlier
* mount will block in nfsd.fh lookup.
*/
xlog(L_WARNING,
@@ -1610,6 +1673,7 @@ int cache_process(fd_set *readfds)
{
fd_set fdset;
int selret;
+ struct timeval tv = { 24*3600, 0 };

if (!readfds) {
FD_ZERO(&fdset);
@@ -1618,8 +1682,29 @@ int cache_process(fd_set *readfds)
cache_set_fds(readfds);
v4clients_set_fds(readfds);

- selret = select(FD_SETSIZE, readfds,
- (void *) 0, (void *) 0, (struct timeval *) 0);
+ if (delayed) {
+ time_t now = time(NULL);
+ time_t delay;
+ if (delayed->last_attempt > now)
+ /* Clock updated - retry immediately */
+ delayed->last_attempt = now - RETRY_SEC;
+ delay = delayed->last_attempt + RETRY_SEC - now;
+ if (delay < 0)
+ delay = 0;
+ tv.tv_sec = delay;
+ }
+ selret = select(FD_SETSIZE, readfds, NULL, NULL, &tv);
+
+ if (delayed) {
+ time_t now = time(NULL);
+ struct delayed *d = delayed;
+
+ if (d->last_attempt + RETRY_SEC <= now) {
+ delayed = d->next;
+ d->next = NULL;
+ nfsd_retry_fh(d);
+ }
+ }

switch (selret) {
case -1:
--
2.42.0

2023-10-25 17:38:45

by Steve Dickson

Subject: Re: [PATCH 0/6 nfs-utils v2] fixes for error handling in nfsd_fh



On 10/22/23 9:58 PM, NeilBrown wrote:
> Hi,
> this is a revised version of my previous series with the same name.
> The first two patches are unchanged.
> The third patch, which was an RFC, has been replaced with the last
> patch which actually addresses the issue rather than skirting
> around it.
>
> Patch 3 here is a revert of a change I noticed while exploring the
> code. cache_open() must be called BEFORE forking workers, as explained
> in that patch.
> Patches 4 and 5 factor out common code, which makes the final patch
> simpler.
>
> The core issue is that sometimes mountd (or exportd) cannot give a
> definite "yes" or "no" to a request to map an fsid to a path name.
> In these cases the only safe option is to delay and try again.
>
> This only becomes relevant if a filesystem is mounted by a client, then
> the server restarts (or the export cache is flushed) and the client
> tries to use a filehandle that it already has, but the server cannot
> find it and cannot be sure it doesn't exist. This can happen when an
> export is marked "mountpoint" or when a re-exported NFS filesystem
> cannot contact the server and reports an ETIMEDOUT error. In these
> cases we want the client to continue waiting (which it does) and also
> want mountd/exportd to periodically check if the target filesystem has
> come back (which it currently does not).
> With the current code, once this situation happens and the client is
> waiting, the client will continue to wait indefinitely even if the
> target filesystem becomes available. The client can only continue if
> the NFS server is restarted or the export cache is flushed. After this
> patch, within 2 minutes of the target filesystem becoming available
> again, mountd will tell the kernel, and when the client asks again it
> will be allowed to proceed.
>
> NeilBrown
>
>
> [PATCH 1/6] export: fix handling of error from match_fsid()
> [PATCH 2/6] export: add EACCES to the list of known
> [PATCH 3/6] export: move cache_open() before workers are forked.
> [PATCH 4/6] Move fork_workers() and wait_for_workers() in cache.c
> [PATCH 5/6] Share process_loop code between mountd and exportd.
> [PATCH 6/6] cache: periodically retry requests that couldn't be
>
Committed... (tag: nfs-utils-2-6-4-rc5)

steved.