LinuxLists.cc - [RFC PATCH 0/3] Dealing with NFS re-export and cross mounts

2022-01-10 18:44:37

Subject: [RFC PATCH 0/3] Dealing with NFS re-export and cross mounts

Currently, when re-exporting an NFS share, the NFS cross mount feature does
not work [0].
This RFC patch series outlines an approach to address the problem.

Crossing mounts does not work for two reasons:

1. As soon as the NFS client (on the re-exporting server) sees a different
filesystem id, it installs an automount. That way the other filesystem
will be mounted automatically when someone enters the directory.
But the cross mount logic of KNFSD does not know about automounts.
The three patches in this series address the problem and teach both KNFSD
and the exportfs logic of NFS to deal with automounts.

2. When KNFSD detects the crossing of a mount point, it asks rpc.mountd to install
a new export for the target mount point. Besides authentication, rpc.mountd
also has to find a filesystem id for the new export. If the filesystem to be
exported is an NFS share, rpc.mountd cannot derive a filesystem id from it and
refuses to export it. In the logs you'll see errors such as:
mountd: Cannot export /srv/nfs/vol0, possibly unsupported filesystem or fsid= required
To deal with that, I changed rpc.mountd to use an arbitrary fsid.
Since this is a gross hack, we need to agree on an approach to derive filesystem
ids for NFS mounts.

rpc.mountd could:
a) re-use the fsid from the original NFS server.
Besides having to request this information, the problem with that approach is
that the original fsid might conflict with an existing export.
b) derive the fsid from stat->st_dev.
c) allocate a free fsid.
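
Option b) could be sketched as below. fsid_from_st_dev() is a hypothetical helper, not existing nfs-utils code; it packs the major and minor numbers the way the kernel's internal MKDEV() encoding does (12-bit major, 20-bit minor). Assuming the NFS client gives its superblocks anonymous device numbers allocated at mount time, the result depends on mount order and is not guaranteed to be stable across reboots or identical on several load-balanced servers.

```c
/*
 * Sketch of option b): derive a 32-bit fsid from stat()'s st_dev.
 * fsid_from_st_dev() is hypothetical, not part of nfs-utils.
 */
#include <stdint.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Fill @fsid for @path. Returns 0 on success, -1 if stat() fails. */
static int fsid_from_st_dev(const char *path, uint32_t *fsid)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;
	/* Same layout as the kernel's MKDEV(): major << 20 | minor. */
	*fsid = (((uint32_t)major(st.st_dev) & 0xfffu) << 20) |
		((uint32_t)minor(st.st_dev) & 0xfffffu);
	return 0;
}
```

The value is only as stable as st_dev itself, which is exactly the property the load-balancing use case needs.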

One use case to consider is load balancing. When multiple NFS servers re-export
an NFS mount, they need to use the same fsid for crossed mounts.
So I'm a little unsure which approach is best. What do you think?

Known issues:
- Only tested with NFSv3 (both server and client) so far.

[0] https://marc.info/?l=linux-nfs&m=161653016627277&w=2

Richard Weinberger (3):
NFSD: Teach nfsd_mountpoint() auto mounts
fs: namei: Allow follow_down() to uncover auto mounts
NFS: nfs_encode_fh: Remove S_AUTOMOUNT check

fs/namei.c | 2 +-
fs/nfs/export.c | 5 -----
fs/nfsd/vfs.c | 2 +-
3 files changed, 2 insertions(+), 7 deletions(-)

--
2.26.2

2022-01-10 18:44:39

by Richard Weinberger


Subject: [RFC PATCH 1/3] NFSD: Teach nfsd_mountpoint() auto mounts

Currently nfsd_mountpoint() tests for mount points using d_mountpoint(),
which works only when a mount point is already uncovered.
In our case the mount point is an automount point and can still be covered,
i.e. ->d_automount() has not been called yet.

Using d_managed(), nfsd_mountpoint() can test whether a mount point is
either already uncovered or can be uncovered later.

Signed-off-by: Richard Weinberger <[email protected]>
---
fs/nfsd/vfs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index c99857689e2c..2f3352a99de6 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -160,7 +160,7 @@ int nfsd_mountpoint(struct dentry *dentry, struct svc_export *exp)
 		return 1;
 	if (nfsd4_is_junction(dentry))
 		return 1;
-	if (d_mountpoint(dentry))
+	if (d_managed(dentry))
 		/*
 		 * Might only be a mountpoint in a different namespace,
 		 * but we need to check.
--
2.26.2
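
The difference between d_mountpoint() and d_managed() described in this patch can be modelled in userspace as below. In include/linux/dcache.h, d_mountpoint() tests DCACHE_MOUNTED, while d_managed() tests DCACHE_MANAGED_DENTRY, which combines DCACHE_MOUNTED, DCACHE_NEED_AUTOMOUNT and DCACHE_MANAGE_TRANSIT. The model_* names and the bit values are placeholders for illustration, not the kernel's.

```c
/*
 * Userspace model of the two dentry tests. Flag names mirror
 * include/linux/dcache.h; the bit values are placeholders.
 */
#include <stdbool.h>

#define MODEL_DCACHE_MOUNTED		0x1u /* something is mounted here */
#define MODEL_DCACHE_NEED_AUTOMOUNT	0x2u /* handle automount on this dir */
#define MODEL_DCACHE_MANAGE_TRANSIT	0x4u /* transit managed by ->d_manage() */
#define MODEL_DCACHE_MANAGED_DENTRY \
	(MODEL_DCACHE_MOUNTED | MODEL_DCACHE_NEED_AUTOMOUNT | \
	 MODEL_DCACHE_MANAGE_TRANSIT)

struct model_dentry {
	unsigned int d_flags;
};

/* Like d_mountpoint(): true only once a mount is attached. */
static bool model_d_mountpoint(const struct model_dentry *d)
{
	return d->d_flags & MODEL_DCACHE_MOUNTED;
}

/* Like d_managed(): also true for a still-covered automount point. */
static bool model_d_managed(const struct model_dentry *d)
{
	return d->d_flags & MODEL_DCACHE_MANAGED_DENTRY;
}
```

An NFS automount point that nobody has entered yet carries only the automount flag, so the old d_mountpoint() test misses it while d_managed() catches it.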

2022-01-10 18:44:40

by Richard Weinberger


Subject: [RFC PATCH 2/3] fs: namei: Allow follow_down() to uncover auto mounts

This function is only used by NFSD to cross mount points.
If a mount point is of type auto mount, follow_down() will
not uncover it. Add LOOKUP_AUTOMOUNT to the lookup flags
to have ->d_automount() called when NFSD walks down the
mount tree.

Signed-off-by: Richard Weinberger <[email protected]>
---
fs/namei.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/namei.c b/fs/namei.c
index 1f9d2187c765..b9de9fc4bfed 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1410,7 +1410,7 @@ int follow_down(struct path *path)
 {
 	struct vfsmount *mnt = path->mnt;
 	bool jumped;
-	int ret = traverse_mounts(path, &jumped, NULL, 0);
+	int ret = traverse_mounts(path, &jumped, NULL, LOOKUP_AUTOMOUNT);
 
 	if (path->mnt != mnt)
 		mntput(mnt);
--
2.26.2
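
Why follow_down() previously stopped short of the automount can be modelled with the early check in follow_automount() in fs/namei.c, as I read the kernels this series targets: on a positive dentry, the automount is only triggered if the lookup flags include LOOKUP_PARENT, LOOKUP_DIRECTORY, LOOKUP_OPEN, LOOKUP_CREATE or LOOKUP_AUTOMOUNT. Otherwise it returns -EISDIR, which traverse_mounts() treats as success without mounting anything. The model_* names and flag values below are placeholders, not the kernel's.

```c
/*
 * Userspace model of the early check in follow_automount()
 * (fs/namei.c). Flag names mirror include/linux/namei.h; the
 * MODEL_* values are placeholders, not the kernel's.
 */
#include <stdbool.h>

#define MODEL_LOOKUP_PARENT	0x01u
#define MODEL_LOOKUP_DIRECTORY	0x02u
#define MODEL_LOOKUP_OPEN	0x04u
#define MODEL_LOOKUP_CREATE	0x08u
#define MODEL_LOOKUP_AUTOMOUNT	0x10u

/*
 * Would an automount point be triggered for these lookup flags?
 * @positive: the dentry already has an inode (typically true for NFS
 * automount points, which are directories on the re-exporting client).
 */
static bool model_triggers_automount(unsigned int lookup_flags, bool positive)
{
	const unsigned int intent = MODEL_LOOKUP_PARENT |
				    MODEL_LOOKUP_DIRECTORY |
				    MODEL_LOOKUP_OPEN |
				    MODEL_LOOKUP_CREATE |
				    MODEL_LOOKUP_AUTOMOUNT;

	/*
	 * The kernel returns -EISDIR here, which traverse_mounts() turns
	 * into success without mounting anything.
	 */
	if (!(lookup_flags & intent) && positive)
		return false;
	return true;
}
```

With flags == 0, as follow_down() passed before this patch, a positive automount dentry stays covered; passing LOOKUP_AUTOMOUNT lets ->d_automount() run.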

2022-01-10 18:44:40

by Richard Weinberger


Subject: [RFC PATCH 3/3] NFS: nfs_encode_fh: Remove S_AUTOMOUNT check

Now with NFSD being able to cross into auto mounts,
the check can be removed.

Signed-off-by: Richard Weinberger <[email protected]>
---
fs/nfs/export.c | 5 -----
1 file changed, 5 deletions(-)

diff --git a/fs/nfs/export.c b/fs/nfs/export.c
index 8c8028959863..6d56a52c424a 100644
--- a/fs/nfs/export.c
+++ b/fs/nfs/export.c
@@ -54,11 +54,6 @@ nfs_encode_fh(struct inode *inode, __u32 *p, int *max_len, struct inode *parent)
 	dprintk("%s: max fh len %d inode %p parent %p",
 		__func__, *max_len, inode, parent);
 
-	if (IS_AUTOMOUNT(inode)) {
-		dprintk("%s: refusing to create fh for automount inode %p\n",
-			__func__, inode);
-		return FILEID_INVALID;
-	}
 	if (*max_len < len) {
 		dprintk("%s: fh len %d too small, required %d\n",
 			__func__, *max_len, len);
--
2.26.2

2022-01-11 19:43:41

by J. Bruce Fields


Subject: Re: [RFC PATCH 0/3] Dealing with NFS re-export and cross mounts

On Mon, Jan 10, 2022 at 07:44:16PM +0100, Richard Weinberger wrote:
> Currently, when re-exporting an NFS share, the NFS cross mount feature does
> not work [0].
> This RFC patch series outlines an approach to address the problem.
>
> Crossing mounts does not work for two reasons:
>
> 1. As soon as the NFS client (on the re-exporting server) sees a different
> filesystem id, it installs an automount. That way the other filesystem
> will be mounted automatically when someone enters the directory.
> But the cross mount logic of KNFSD does not know about automounts.
> The three patches in this series address the problem and teach both KNFSD
> and the exportfs logic of NFS to deal with automounts.
>
> 2. When KNFSD detects the crossing of a mount point, it asks rpc.mountd to install
> a new export for the target mount point. Besides authentication, rpc.mountd
> also has to find a filesystem id for the new export. If the filesystem to be
> exported is an NFS share, rpc.mountd cannot derive a filesystem id from it and
> refuses to export it. In the logs you'll see errors such as:
> mountd: Cannot export /srv/nfs/vol0, possibly unsupported filesystem or fsid= required
> To deal with that, I changed rpc.mountd to use an arbitrary fsid.
> Since this is a gross hack, we need to agree on an approach to derive filesystem
> ids for NFS mounts.

The toughest problem to deal with is a reboot of the re-export server. If
you want this to work across reboots, then you need to pick an fsid that
will be the same across reboots.

Also, you need to deal with getting an fsid for a filesystem that isn't
mounted yet. That's because, if you reboot while a client is using
/srv/nfs/vol0, when you come back up, the client *isn't* going to look
up the path /srv/nfs/vol0 again--it's just going to give you a
filehandle for some object under there, and you're going to have to
figure out what to do with that.

The simplest approach might be recording the fsids you use in an on-disk database.
knfsd makes an upcall to rpc.mountd each time it encounters a new fsid,
so maybe that'd mean you could do all the management of that database in
rpc.mountd and minimize required kernel patches.
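
For the reboot case, such a database needs to map an fsid found in a client's filehandle back to the filesystem it was assigned to, without the client looking up the path again. A minimal sketch, assuming a hypothetical one-entry-per-line "<fsid in hex> <path>" format; fsiddb_lookup_path() is illustrative, not existing nfs-utils code.

```c
/*
 * Hypothetical fsid database for rpc.mountd, one entry per line:
 * "<fsid in hex> <path>". Not nfs-utils code.
 */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Find the path recorded for @fsid, e.g. one taken from a client's
 * filehandle after a reboot. Returns 0 and fills @path if found.
 */
static int fsiddb_lookup_path(FILE *db, uint32_t fsid, char *path, size_t len)
{
	char line[4096];

	rewind(db);
	while (fgets(line, sizeof(line), db)) {
		uint32_t id;
		int off = -1;

		line[strcspn(line, "\n")] = '\0';
		if (sscanf(line, "%" SCNx32 " %n", &id, &off) != 1 || off < 0)
			continue;	/* skip malformed lines */
		if (id != fsid)
			continue;
		if (strlen(line + off) >= len)
			return -1;	/* caller's buffer too small */
		strcpy(path, line + off);
		return 0;
	}
	return -1;
}
```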

Maybe a last-resort option would be just to not support reboot of the
re-export server. That's already what we do for locking. I'm not happy
about that, and have some vague ideas how it might be fixed, but not
anything that's likely to be done soon.

Then I think a random fsid might be OK. I believe fsids can be up to 32
bits, so there's effectively no chance of collisions.
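
To put the collision risk in numbers: assuming each fsid is drawn independently and uniformly from 2^32 values, the birthday bound gives the probability that any two of n fsids collide as 1 - prod(i = 1 .. n-1) (1 - i/2^32). fsid_collision_probability() is a hypothetical helper that evaluates this product directly.

```c
/*
 * Birthday-bound estimate for random 32-bit fsids, assuming each fsid
 * is drawn independently and uniformly. Hypothetical helper.
 */

/* Probability that at least two of @n random fsids are equal. */
static double fsid_collision_probability(unsigned int n)
{
	const double space = 4294967296.0;	/* 2^32 possible fsids */
	double p_distinct = 1.0;

	/* The (i+1)-th fsid must avoid the i already chosen. */
	for (unsigned int i = 1; i < n; i++)
		p_distinct *= 1.0 - (double)i / space;
	return 1.0 - p_distinct;
}
```

By this estimate, 100 random fsids collide with probability of roughly 1.2e-6 and 1000 with roughly 1.2e-4, while 65536 of them collide with probability of about 0.39. For typical export counts a collision is very unlikely, but checking a new random fsid against the ones already recorded is cheap.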

But I can't remember: can those NFS automounts expire? An export that
looks idle from the server's point of view might still be in use by a
client, so we can't drop that mount and then get stuck returning ESTALE
when the client does eventually try to use it.

--b.


2022-01-11 20:01:39

by J. Bruce Fields


Subject: Re: [RFC PATCH 0/3] Dealing with NFS re-export and cross mounts

On Tue, Jan 11, 2022 at 02:43:37PM -0500, J. Bruce Fields wrote:
> On Mon, Jan 10, 2022 at 07:44:16PM +0100, Richard Weinberger wrote:
> > rpc.mountd could:
> > a) re-use the fsid from the original NFS server.
> > Besides having to request this information, the problem with that approach is
> > that the original fsid might conflict with an existing export.
> > b) derive the fsid from stat->st_dev.
> > c) allocate a free fsid.
> >
> > One use case to consider is load balancing. When multiple NFS servers re-export
> > an NFS mount, they need to use the same fsid for crossed mounts.

I guess if rpc.mountd kept an on-disk database of fsids, it wouldn't be
too big a deal to later enhance that with the option of a distributed
database.

So I'm leaning towards picking a random fsid and sticking it in a
database. When you encounter a new filesystem, you'd need to make sure
the addition of a new entry is atomic and persistent before returning to
knfsd.
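
A minimal sketch of such an atomic, persistent insert, assuming a one-line-per-entry text database: the existing file is copied to a temporary file together with the new entry, fsync()ed, rename()d over the old database, and the containing directory is fsync()ed so the rename itself is durable. After a crash the database is then either the old or the new version, never a torn one. fsid_random(), fsiddb_add() and the file layout are hypothetical; getrandom() needs glibc 2.25 or later. Zero is skipped because fsid=0 designates the NFSv4 root export.

```c
/*
 * Hypothetical sketch: persist a new "<fsid> <path>" entry before
 * answering knfsd. Not nfs-utils code.
 */
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/random.h>
#include <unistd.h>

/* Random non-zero 32-bit fsid; 0 is reserved for the NFSv4 root. */
static int fsid_random(uint32_t *fsid)
{
	do {
		if (getrandom(fsid, sizeof(*fsid), 0) != (ssize_t)sizeof(*fsid))
			return -1;
	} while (*fsid == 0);
	return 0;
}

static int copy_stream(FILE *in, FILE *out)
{
	char buf[4096];
	size_t n;

	while ((n = fread(buf, 1, sizeof(buf), in)) > 0)
		if (fwrite(buf, 1, n, out) != n)
			return -1;
	return ferror(in) ? -1 : 0;
}

/*
 * Rewrite @dbpath (inside directory @dir) with one extra entry:
 * copy to "<dbpath>.tmp", append, fsync, rename, fsync the directory.
 * Returns 0 once the new entry is durable, -1 on error.
 */
static int fsiddb_add(const char *dir, const char *dbpath,
		      uint32_t fsid, const char *path)
{
	char tmppath[4096];
	FILE *in, *out;
	int dfd, ok = 1;

	if (snprintf(tmppath, sizeof(tmppath), "%s.tmp", dbpath) >=
	    (int)sizeof(tmppath))
		return -1;
	out = fopen(tmppath, "w");
	if (!out)
		return -1;

	in = fopen(dbpath, "r");
	if (in) {
		ok = copy_stream(in, out) == 0;
		fclose(in);
	} else if (errno != ENOENT) {
		ok = 0;		/* don't silently drop existing entries */
	}
	ok = ok && fprintf(out, "%08" PRIx32 " %s\n", fsid, path) > 0;
	ok = ok && fflush(out) == 0 && fsync(fileno(out)) == 0;
	ok = (fclose(out) == 0) && ok;	/* always close */
	ok = ok && rename(tmppath, dbpath) == 0;
	if (!ok) {
		unlink(tmppath);
		return -1;
	}

	/* Make the rename itself durable. */
	dfd = open(dir, O_RDONLY | O_DIRECTORY);
	if (dfd < 0)
		return -1;
	ok = fsync(dfd) == 0;
	close(dfd);
	return ok ? 0 : -1;
}
```

This assumes a single writer. With several mountd threads or processes, the copy-and-rename sequence would need to be serialized (for example with flock() on a lock file), and a real implementation would also reject a random fsid that is already recorded.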

It'd be nice if mountd had an easy way to query the on-the-wire fsid
from userspace, and then you could index entries on the fsid. Absent
that, maybe just indexing on server and path would be good enough.

I'm not sure how NFS's st_dev values are generated. I think they might
depend on things that aren't necessarily the same on each boot (like the
order in which the NFS filesystems were mounted), so they wouldn't work.

--b.


2022-01-11 20:02:21

by J. Bruce Fields
