2022-02-01 16:26:10

by Richard Weinberger

[permalink] [raw]
Subject: [RFC PATCH] mountd: export: Deal with NFS filesystems

When a NFS filesystem is exported usually NFSD refuses
to export the filesystem because mountd was unable to provide
an UUID and the kernel cannot derive a dev id from the NFS client
super block.

To deal with this situation, teach uuid_by_path() how to generate
an UUID from a NFS filesystem.

Using /proc/fs/nfsfs/volumes it is possible to find the NFS fsid
from the backend and use it as seed for mountd's UUID mechanism.

Signed-off-by: Richard Weinberger <[email protected]>
---
Hello,

This is the userspace side to make cross mounts work for NFS re-exports.
The basic idea is that mountd takes the NFS fsid from the backend server
and feeds into mountd'd existing UUID mechanism.

Getting the NFS fsid is currently a bit clumsy, mountd has to scan
/proc/fs/nfsfs/volumes. With the upcoming fsinfo() system call this
information could be transferred in a much simpler way.
e.g. NFS client stores the fsid in its sb->s_uuid and fsinfo() exposes
sb->s_uuid to userspace.

This approach works for v3-to-v4 and v4-to-v4 re-exports.
v3-to-v3 does *not* work, the reason is that nfs_encode_fh() aborts due
to not enough space in the fhandle.
I'm still investigating on how to improve this case.

Currently I’m toying around with a NFS fsid to export FSID_NUM mappings
scheme which are stored to disk.
Another possibility is a new fsid type for NFSD, where the raw NFS fsid
from the backend server is mirrored. Although such a mirror mode makes
local exports impossible.
At the end of the day this is something the admin has to decide.
I don't think there will be a generic solution which satisfies all needs.

Looking forward for your feedback. :-)

Thanks,
//richard
---
support/export/cache.c | 65 +++++++++++++++++++++++++++++++++++++++---
1 file changed, 61 insertions(+), 4 deletions(-)

diff --git a/support/export/cache.c b/support/export/cache.c
index a5823e92e9f2..053ad86619b8 100644
--- a/support/export/cache.c
+++ b/support/export/cache.c
@@ -331,6 +331,51 @@ static const unsigned long nonblkid_filesystems[] = {
0 /* last */
};

+static int get_uuid_from_fsid(char *path, char *uuid_str, size_t len)
+{
+ unsigned int min_dev, maj_dev, min_fsid, maj_fsid;
+ int rc, n, found = 0, header_seen = 0;
+ struct stat stb;
+ FILE *nfsfs_fd;
+ char line[128];
+
+ rc = nfsd_path_stat(path, &stb);
+ if (rc) {
+ xlog(L_WARNING, "Unable to stat %s", path);
+ return 0;
+ }
+
+ nfsfs_fd = fopen("/proc/fs/nfsfs/volumes", "r");
+ if (nfsfs_fd == NULL) {
+ xlog(L_WARNING, "Unable to open nfsfs volume file: %m");
+ return 0;
+ }
+
+ while (fgets(line, sizeof(line), nfsfs_fd) != NULL) {
+ if (!header_seen) {
+ header_seen = 1;
+ continue;
+ }
+ n = sscanf(line, "v%*u %*x %*u %u:%u %x:%x %*s", &maj_dev,
+ &min_dev, &maj_fsid, &min_fsid);
+
+ if (n != 4) {
+ xlog(L_WARNING, "Unable to parse nfsfs volume line: %d, %s", n, line);
+ continue;
+ }
+
+ if (makedev(maj_dev, min_dev) == stb.st_dev) {
+ found = 1;
+ snprintf(uuid_str, len, "%08x%08x", maj_fsid, min_fsid);
+ break;
+ }
+ }
+
+ fclose(nfsfs_fd);
+
+ return found;
+}
+
static int uuid_by_path(char *path, int type, size_t uuidlen, char *uuid)
{
/* get a uuid for the filesystem found at 'path'.
@@ -362,7 +407,7 @@ static int uuid_by_path(char *path, int type, size_t uuidlen, char *uuid)
*/
struct statfs64 st;
char fsid_val[17];
- const char *blkid_val = NULL;
+ const char *fsuuid_val = NULL;
const char *val;
int rc;

@@ -375,7 +420,19 @@ static int uuid_by_path(char *path, int type, size_t uuidlen, char *uuid)
break;
}
if (*bad == 0)
- blkid_val = get_uuid_blkdev(path);
+ fsuuid_val = get_uuid_blkdev(path);
+ else if (*bad == 0x6969 /* NFS_SUPER_MAGIC */) {
+ char tmp[17];
+ int ret = get_uuid_from_fsid(path, tmp, sizeof(tmp));
+
+ if (ret < 0) {
+ xlog(L_WARNING, "Unable to read nfsfs volume file: %i", ret);
+ } else if (ret == 0) {
+ xlog(L_WARNING, "Unable to find nfsfs volume entry for %s", path);
+ } else {
+ fsuuid_val = tmp;
+ }
+ }
}

if (rc == 0 &&
@@ -385,8 +442,8 @@ static int uuid_by_path(char *path, int type, size_t uuidlen, char *uuid)
else
fsid_val[0] = 0;

- if (blkid_val && (type--) == 0)
- val = blkid_val;
+ if (fsuuid_val && (type--) == 0)
+ val = fsuuid_val;
else if (fsid_val[0] && (type--) == 0)
val = fsid_val;
else
--
2.26.2


2022-02-01 20:45:25

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [RFC PATCH] mountd: export: Deal with NFS filesystems

Thanks for continuing to look into this, it's interesting work.

On Mon, Jan 31, 2022 at 11:43:16AM +0100, Richard Weinberger wrote:
> When a NFS filesystem is exported usually NFSD refuses
> to export the filesystem because mountd was unable to provide
> an UUID and the kernel cannot derive a dev id from the NFS client
> super block.
>
> To deal with this situation, teach uuid_by_path() how to generate
> an UUID from a NFS filesystem.
>
> Using /proc/fs/nfsfs/volumes it is possible to find the NFS fsid
> from the backend and use it as seed for mountd's UUID mechanism.

Sorry, I haven't checked--what is that number, really? It's probably
not the fsid returned from the server, as that wouldn't be guaranteed
unique across multiple servers. But that means it's probably generated
in a way that doesn't guarantee it's stable across reboots. And we need
filehandles to work across reboots.

I think we've got to generate fsid's ourself and keep an on-disk
database.

> This is the userspace side to make cross mounts work for NFS re-exports.
> The basic idea is that mountd takes the NFS fsid from the backend server
> and feeds into mountd'd existing UUID mechanism.
>
> Getting the NFS fsid is currently a bit clumsy, mountd has to scan
> /proc/fs/nfsfs/volumes. With the upcoming fsinfo() system call this
> information could be transferred in a much simpler way.
> e.g. NFS client stores the fsid in its sb->s_uuid and fsinfo() exposes
> sb->s_uuid to userspace.
>
> This approach works for v3-to-v4 and v4-to-v4 re-exports.
> v3-to-v3 does *not* work, the reason is that nfs_encode_fh() aborts due
> to not enough space in the fhandle.
> I'm still investigating on how to improve this case.

We might be able to do better, but I think a v4-only solution to start
would be OK.

> Currently I’m toying around with a NFS fsid to export FSID_NUM mappings
> scheme which are stored to disk.

I think something like that's more like what we need.

> Another possibility is a new fsid type for NFSD, where the raw NFS fsid
> from the backend server is mirrored. Although such a mirror mode makes
> local exports impossible.
> At the end of the day this is something the admin has to decide.
> I don't think there will be a generic solution which satisfies all needs.

We can provide options if we need to, but the costs of options can be
underestimated. Ideally we'd at least have a default that's safe and
works for most cases.

--b.

>
> Looking forward for your feedback. :-)
>
> Thanks,
> //richard
> ---
> support/export/cache.c | 65 +++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 61 insertions(+), 4 deletions(-)
>
> diff --git a/support/export/cache.c b/support/export/cache.c
> index a5823e92e9f2..053ad86619b8 100644
> --- a/support/export/cache.c
> +++ b/support/export/cache.c
> @@ -331,6 +331,51 @@ static const unsigned long nonblkid_filesystems[] = {
> 0 /* last */
> };
>
> +static int get_uuid_from_fsid(char *path, char *uuid_str, size_t len)
> +{
> + unsigned int min_dev, maj_dev, min_fsid, maj_fsid;
> + int rc, n, found = 0, header_seen = 0;
> + struct stat stb;
> + FILE *nfsfs_fd;
> + char line[128];
> +
> + rc = nfsd_path_stat(path, &stb);
> + if (rc) {
> + xlog(L_WARNING, "Unable to stat %s", path);
> + return 0;
> + }
> +
> + nfsfs_fd = fopen("/proc/fs/nfsfs/volumes", "r");
> + if (nfsfs_fd == NULL) {
> + xlog(L_WARNING, "Unable to open nfsfs volume file: %m");
> + return 0;
> + }
> +
> + while (fgets(line, sizeof(line), nfsfs_fd) != NULL) {
> + if (!header_seen) {
> + header_seen = 1;
> + continue;
> + }
> + n = sscanf(line, "v%*u %*x %*u %u:%u %x:%x %*s", &maj_dev,
> + &min_dev, &maj_fsid, &min_fsid);
> +
> + if (n != 4) {
> + xlog(L_WARNING, "Unable to parse nfsfs volume line: %d, %s", n, line);
> + continue;
> + }
> +
> + if (makedev(maj_dev, min_dev) == stb.st_dev) {
> + found = 1;
> + snprintf(uuid_str, len, "%08x%08x", maj_fsid, min_fsid);
> + break;
> + }
> + }
> +
> + fclose(nfsfs_fd);
> +
> + return found;
> +}
> +
> static int uuid_by_path(char *path, int type, size_t uuidlen, char *uuid)
> {
> /* get a uuid for the filesystem found at 'path'.
> @@ -362,7 +407,7 @@ static int uuid_by_path(char *path, int type, size_t uuidlen, char *uuid)
> */
> struct statfs64 st;
> char fsid_val[17];
> - const char *blkid_val = NULL;
> + const char *fsuuid_val = NULL;
> const char *val;
> int rc;
>
> @@ -375,7 +420,19 @@ static int uuid_by_path(char *path, int type, size_t uuidlen, char *uuid)
> break;
> }
> if (*bad == 0)
> - blkid_val = get_uuid_blkdev(path);
> + fsuuid_val = get_uuid_blkdev(path);
> + else if (*bad == 0x6969 /* NFS_SUPER_MAGIC */) {
> + char tmp[17];
> + int ret = get_uuid_from_fsid(path, tmp, sizeof(tmp));
> +
> + if (ret < 0) {
> + xlog(L_WARNING, "Unable to read nfsfs volume file: %i", ret);
> + } else if (ret == 0) {
> + xlog(L_WARNING, "Unable to find nfsfs volume entry for %s", path);
> + } else {
> + fsuuid_val = tmp;
> + }
> + }
> }
>
> if (rc == 0 &&
> @@ -385,8 +442,8 @@ static int uuid_by_path(char *path, int type, size_t uuidlen, char *uuid)
> else
> fsid_val[0] = 0;
>
> - if (blkid_val && (type--) == 0)
> - val = blkid_val;
> + if (fsuuid_val && (type--) == 0)
> + val = fsuuid_val;
> else if (fsid_val[0] && (type--) == 0)
> val = fsid_val;
> else
> --
> 2.26.2

2022-02-01 20:45:37

by Richard Weinberger

[permalink] [raw]
Subject: Re: [RFC PATCH] mountd: export: Deal with NFS filesystems

----- Ursprüngliche Mail -----
> Von: "bfields" <[email protected]>
>> Using /proc/fs/nfsfs/volumes it is possible to find the NFS fsid
>> from the backend and use it as seed for mountd's UUID mechanism.
>
> Sorry, I haven't checked--what is that number, really? It's probably
> not the fsid returned from the server, as that wouldn't be guaranteed
> unique across multiple servers. But that means it's probably generated
> in a way that doesn't guarantee it's stable across reboots. And we need
> filehandles to work across reboots.

Unless I badly misunderstood the code it comes from fs/nfs/client.c's
nfs_create_server() where the NFS client fetches NFS_ATTR_FATTR via getattr().
So it should be unique:
https://datatracker.ietf.org/doc/html/rfc7530#section-5.8.1.9

Thanks,
//richard

2022-02-01 20:49:47

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [RFC PATCH] mountd: export: Deal with NFS filesystems

On Mon, Jan 31, 2022 at 06:15:41PM +0100, Richard Weinberger wrote:
> ----- Ursprüngliche Mail -----
> > Von: "bfields" <[email protected]>
> >> Using /proc/fs/nfsfs/volumes it is possible to find the NFS fsid
> >> from the backend and use it as seed for mountd's UUID mechanism.
> >
> > Sorry, I haven't checked--what is that number, really? It's probably
> > not the fsid returned from the server, as that wouldn't be guaranteed
> > unique across multiple servers. But that means it's probably generated
> > in a way that doesn't guarantee it's stable across reboots. And we need
> > filehandles to work across reboots.
>
> Unless I badly misunderstood the code it comes from fs/nfs/client.c's
> nfs_create_server() where the NFS client fetches NFS_ATTR_FATTR via getattr().
> So it should be unique:
> https://datatracker.ietf.org/doc/html/rfc7530#section-5.8.1.9

Unique within one server, but not across different servers.

Depending on how your servers assign fsids, collisions might be quite
likely. Results of a collision would be pretty bad.

--b.

2022-02-01 20:54:00

by Richard Weinberger

[permalink] [raw]
Subject: Re: [RFC PATCH] mountd: export: Deal with NFS filesystems

----- Ursprüngliche Mail -----
> Von: "bfields" <[email protected]>
>> Unless I badly misunderstood the code it comes from fs/nfs/client.c's
>> nfs_create_server() where the NFS client fetches NFS_ATTR_FATTR via getattr().
>> So it should be unique:
>> https://datatracker.ietf.org/doc/html/rfc7530#section-5.8.1.9
>
> Unique within one server, but not across different servers.
>
> Depending on how your servers assign fsids, collisions might be quite
> likely. Results of a collision would be pretty bad.

Hmm. So this means when you have a setup where the same file system is exported
via multiple servers (the load balancing use case), getattr on the same file can
report a different fsid at any time?
This renders the fsid attribute close to useless :-(

Anyway, this is not a show stopper just makes more clear that we should
take the FSID_NUM mapping path.
Please stay tuned, I'll send ASAP another patch.

Thanks,
//richard