LinuxLists.cc - [PATCH 0/3 nfs-utils] fixes for error handling in nfsd

2023-10-11 05:16:13

Subject: [PATCH 0/3 nfs-utils] fixes for error handling in nfsd_fh

When the kernel ask mountd to map the fsid from a filehandle to a path
name the request is handled by nfsd_fh().
The kernel expects to get a definitive answer - either a path name or
an clear statement that there is no path name. However mountd sometimes cannot
give a definitive answer - it wants to say "ask again later", but it cannot.

One case that it wants to say "ask again" is when a 'stat' of a
mountpoint fails with a non-path-name error. This might mean that a
re-exported NFS mount is having problems.

The code for handling this case is incorrect. mountd potentially scan
every moint point. If any mount point matches the fsid, it should
succeed even if some other mountpoint appears to have problems. Instead
assumes failure is *any* mountpoint has problems. The first patch fixes
this - so now the error handling is only triggered if no matching mount
can be found.

This list of errors that are considered "path-name errors" does not
included EACCES. This error can be returned, e.g., by fuse filesystem
that only allow a single use to have access. The should be treated as a
path-name error. The second patch fixes this.

However this still leave the error handling to be wrong. The response
when no match is found and at least one questionable path is seen, is to
not send a response to the kernel. The result of this is that the
kernel never asks again. It return NFS4ERR_DELAY (for NFSv4) to the
client, but doesn't ever re-queue the request as the request is already
queued. There is no way to tell the kernel "dequeue this request treat
the lookup as failure".

The third patch proposes a machanism to do this. It returns a negative
result to the kernel with an expiry time in the past. This could
reasonably be interpreted as "dequeue the request but do return a
definitive result". The kernel doesn't currently treat it like that.
Rather it treats it as definitive just once, and will ask again for the
next request. But that will only come after the client has seen a failure.

We could easily change the kernel to respond differently, but we would
need to think carefully about the behaviour on older kernels, and about
the possibility of a much higher rate of fsid lookup requests being sent
from kernel to mountd.

So the third patch is just an RFC. The first two patches are, I think,
suitable for immediate release (subject to normal review).

Thanks,
NeilBrown

2023-10-11 05:16:18

by NeilBrown

[permalink] [raw]

Subject: [PATCH 1/3] export: fix handling of error from match_fsid()

If match_fsid() returns -1 we shouldn't assume that the path definitely
doesn't match the fsid, though it might not.
This is a similar situation to where an export is expected to be a mount
point, but is found not to be one. So it can be handled the same way,
by setting 'dev_missing'.
This will only have an effect is no other path matched the fsid, which
is what we want.

The current code results in nothing being exported and any export point,
or any mount point beneath a crossmnt export point fail a 'stat'
request, which is too harsh.

Signed-off-by: NeilBrown <[email protected]>
---
support/export/cache.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/support/export/cache.c b/support/export/cache.c
index 19bbba556060..e4595020f43f 100644
--- a/support/export/cache.c
+++ b/support/export/cache.c
@@ -858,7 +858,8 @@ static void nfsd_fh(int f)
case 0:
continue;
case -1:
- goto out;
+ dev_missing ++;
+ continue;
}
if (is_ipaddr_client(dom)
&& !ipaddr_client_matches(exp, ai))
--
2.42.0

2023-10-11 05:16:25

by NeilBrown

[permalink] [raw]

Subject: [PATCH 2/3] export: add EACCES to the list of known path_lookup_error() errors.

If a 'stat' results in EACCES (for root), then it is likely a permanent
problem. One possible cause is a 'fuser' filesystem which only gives
any access to the user which mounted it.

So it is reasonable for EACCES to be a "path lookup error"

Signed-off-by: NeilBrown <[email protected]>
---
support/export/cache.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/support/export/cache.c b/support/export/cache.c
index e4595020f43f..5307f6c8d872 100644
--- a/support/export/cache.c
+++ b/support/export/cache.c
@@ -77,6 +77,7 @@ static bool path_lookup_error(int err)
case ENAMETOOLONG:
case ENOENT:
case ENOTDIR:
+ case EACCES:
return 1;
}
return 0;
--
2.42.0

2023-10-11 05:16:30

by NeilBrown

[permalink] [raw]

Subject: [PATCH 3/3 RFC] export: nfsd_fh - always an answer to a well-formed question.

When the kernel asks mountd for some information it is important that
mountd reply as the kernel will not ask again.
When we don't have a useful reply we want the kernel to see this
as a transient failure. This not currently (v6.6) any way to
communicate a transient failure. The best we can do is give a
negative answer which is already expired. This will at least
allow the kernel to ask again.

The kernel needs to be enhanced to not treat an entry that is already
expired as ever reliable.

Signed-off-by: NeilBrown <[email protected]>
---
support/export/cache.c | 25 ++++++++++++++++++++++---
1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/support/export/cache.c b/support/export/cache.c
index 5307f6c8d872..74cacea9f0cc 100644
--- a/support/export/cache.c
+++ b/support/export/cache.c
@@ -894,7 +894,7 @@ static void nfsd_fh(int f)
* quiet rather than returning stale yet
*/
if (dev_missing)
- goto out;
+ goto out_delay;
} else if (found->e_mountpoint &&
!is_mountpoint(found->e_mountpoint[0]?
found->e_mountpoint:
@@ -904,8 +904,7 @@ static void nfsd_fh(int f)
xlog(L_WARNING, "%s not exported as %d not a mountpoint",
found->e_path, found->e_mountpoint);
*/
- /* FIXME we need to make sure we re-visit this later */
- goto out;
+ goto out_delay;
}

bp = buf; blen = sizeof(buf);
@@ -934,6 +933,26 @@ out:
nfs_freeaddrinfo(ai);
free(dom);
xlog(D_CALL, "nfsd_fh: found %p path %s", found, found ? found->e_path : NULL);
+ return;
+
+out_delay:
+ /* We don't have a definitely answer to give the kernel - maybe we will later.
+ * This could be because an export marked "mountpoint" isn't a mountpoint, or
+ * because a mountpoint fails with a strange error like ETIMEDOUT as is possible
+ * with an NFS mount marked "softerr" which is being re-exported.
+ * If we tell the kernel nothing, it will never ask again, so we have
+ * to give some answer. A negative answer that has already expired
+ * is the best we can do.
+ */
+ bp = buf; blen = sizeof(buf);
+ qword_add(&bp, &blen, dom);
+ qword_addint(&bp, &blen, fsidtype);
+ qword_addhex(&bp, &blen, fsid, fsidlen);
+ qword_addint(&bp, &blen, time(NULL) - 1);
+ qword_addeol(&bp, &blen);
+ if (blen <= 0 || cache_write(f, buf, bp - buf) != bp - buf)
+ xlog(L_ERROR, "nfsd_fh: error writing reply");
+ xlog(D_AUTH, "unknown access to %s", *dom == '$' ? dom+1 : dom);
}

#ifdef HAVE_JUNCTION_SUPPORT
--
2.42.0