2021-12-09 21:12:07

by Richard Weinberger

Subject: Improving NFS re-export

Hello NFS list,

I'd like to improve the NFS re-export feature, especially wrt. crossmounts.
Currently an NFS client will face EIO when crossing a mount point on the re-exporting server.
This was discussed here[0]. While the assumption in that discussion was that check_export()
in fs/nfsd/export.c emits EIO, I did further experiments and realized that EIO actually
comes from the NFS client side of the re-exporting server.

nfs_encode_fh() in fs/nfs/export.c checks for IS_AUTOMOUNT(inode); if this is the case,
it refuses to create a new file handle.
So while accessing /files/disk2 directly on the re-exporting server triggers an automount,
when accessing it via nfsd the export function of the client side gives up.

AFAIU the suggested proxy-only-mode[1] will not address this problem, right?

One workaround is manually adding an export for each volume on the re-exporting server.
This kinda works but is tedious and error prone.

I have a crazy idea how to automate this:
Since nfs_encode_fh() in the NFS client side of the re-exporting server can detect
crossing mounts, we could install a new export on the server side as soon as the
IS_AUTOMOUNT(inode) case arises. We could even use the same fsid.
What do you think?

Another obstacle is file handle wrapping.
When re-exporting, the NFS client side adds inode and file type information to each file handle,
and the server side adds information of its own. In my test setup this enlarges a 16-byte file handle
to 40 bytes.
The proxy-only-mode won't help us here either.

Did you consider using the opaque file handle from the server as lookup key in a
(persisted) data structure?
That way at least the client side of the re-exporting server no longer has to enlarge
the file handle with inode and file type information.
If the re-exporting server re-exports just one server (proxy-only-mode) we could also
skip adding the fsid to the handle.
What do you think?

I'm looking forward to hearing your comments.

Thanks,
//richard

[0] https://marc.info/?l=linux-nfs&m=161670807413876&w=2
[1] https://linux-nfs.org/wiki/index.php/NFS_proxy-only_mode


2021-12-09 21:41:41

by J. Bruce Fields

Subject: Re: Improving NFS re-export

On Thu, Dec 09, 2021 at 10:05:48PM +0100, Richard Weinberger wrote:
> Hello NFS list,
>
> I'd like to improve the NFS re-export feature, especially wrt. crossmounts.
> Currently an NFS client will face EIO when crossing a mount point on the re-exporting server.
> This was discussed here[0]. While the assumption in that discussion was that check_export()
> in fs/nfsd/export.c emits EIO, I did further experiments and realized that EIO actually
> comes from the NFS client side of the re-exporting server.
>
> nfs_encode_fh() in fs/nfs/export.c checks for IS_AUTOMOUNT(inode); if this is the case,
> it refuses to create a new file handle.
> So while accessing /files/disk2 directly on the re-exporting server triggers an automount,
> when accessing it via nfsd the export function of the client side gives up.
>
> AFAIU the suggested proxy-only-mode[1] will not address this problem, right?

That's how I was thinking of addressing the problem, actually. I
haven't figured out how to make that proxy-only mode work, though.

> One workaround is manually adding an export for each volume on the re-exporting server.
> This kinda works but is tedious and error prone.
>
> I have a crazy idea how to automate this:
> Since nfs_encode_fh() in the NFS client side of the re-exporting server can detect
> crossing mounts, we could install a new export on the server side as soon as the
> IS_AUTOMOUNT(inode) case arises. We could even use the same fsid.
> What do you think?

Something like that might work.

I'm not sure what you mean by the same fsid. I think you'd need to make
up a new fsid each time you encounter a new filesystem. And you'd also
want to persist it on disk if you want this to keep working across
reboots of the proxy.

I think you could patch rpc.mountd to do that.
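
As a rough illustration of "make up a new fsid and persist it", here is a minimal sketch in Python rather than rpc.mountd's C. The `FsidAllocator` class, the JSON state file, and the string used as the backend filesystem key are all hypothetical; nothing like this exists in nfs-utils today. It only shows the bookkeeping: an unseen filesystem gets a fresh identifier, and the mapping is written out so the same filesystem gets the same fsid after a reboot of the proxy.

```python
import json
import os
import tempfile
import uuid


class FsidAllocator:
    """Hypothetical helper: maps a backend filesystem key to a proxy fsid."""

    def __init__(self, state_path):
        self.state_path = state_path
        self.table = {}
        # Reload earlier assignments so fsids stay stable across restarts.
        if os.path.exists(state_path):
            with open(state_path) as f:
                self.table = json.load(f)

    def fsid_for(self, backend_key):
        """Return the fsid for backend_key, making up a new one if unseen."""
        if backend_key not in self.table:
            self.table[backend_key] = str(uuid.uuid4())
            self._persist()
        return self.table[backend_key]

    def _persist(self):
        # Write to a temp file and rename it into place, so a crash
        # mid-write cannot leave a half-written state file behind.
        directory = os.path.dirname(os.path.abspath(self.state_path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(self.table, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.state_path)
```

Calling `fsid_for()` twice with the same key returns the same value, and a second `FsidAllocator` opened on the same state file returns it as well, which is the property needed for clients to keep working across proxy reboots.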

> Another obstacle is file handle wrapping.
> When re-exporting, the NFS client side adds inode and file type information to each file handle,
> and the server side adds information of its own. In my test setup this enlarges a 16-byte file handle
> to 40 bytes.
> The proxy-only-mode won't help us here either.

Part of my motivation for a proxy-only mode was to remove that wrapping.

Since you're dedicating the host to reexporting one single backend
server, in theory you don't need any of the information in the wrapper.
When you (the proxy) get a filehandle from a client, you know which
server that filehandle originally came from, so you can go ask that
server for whatever you need to know about the filehandle (like an
fsid).

> Did you consider using the opaque file handle from the server as
> lookup key in a (persisted) data structure?

A little, but I don't think it works.

If you do this, you do need to require that you only export one server.
Otherwise there may be collisions (two different servers could return
filehandles that happen to have the same value).

The database would store every filehandle the client has ever seen.
That could be a lot. It may also include filehandles for since-deleted
files. The only way to prune such entries would be to try using them
and see if the server gives you STALE errors.
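
To make the two problems concrete, here is a small Python sketch of such a table. The `HandleTable` class and the `is_stale` callback are hypothetical; the callback stands in for a real round trip to the backend server. Keying on the server as well as the raw filehandle is what avoids the collision case, and `prune()` shows why cleanup is expensive: it has to probe every stored handle.

```python
import itertools


class HandleTable:
    """Hypothetical table mapping backend filehandles to short local IDs."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.by_handle = {}   # (server, raw_fh) -> short id
        self.by_id = {}       # short id -> (server, raw_fh)

    def intern(self, server, raw_fh):
        """Return the short ID for a handle, allocating one on first sight."""
        # The server is part of the key: two servers may hand out
        # filehandles with identical bytes.
        key = (server, bytes(raw_fh))
        if key not in self.by_handle:
            short_id = next(self._ids)
            self.by_handle[key] = short_id
            self.by_id[short_id] = key
        return self.by_handle[key]

    def prune(self, is_stale):
        """Drop entries the backend reports as stale; return how many went.

        is_stale(server, raw_fh) is an assumed callback that asks the
        backend about the handle and returns True on a STALE error.
        """
        dead = [key for key in self.by_handle if is_stale(*key)]
        for key in dead:
            del self.by_id[self.by_handle.pop(key)]
        return len(dead)
```

Nothing in the table itself says which entries are dead, so the cost of `prune()` grows with the number of filehandles ever seen, not with the number of files that were actually deleted.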

--b.

> That way at least the client side of the re-exporting server no longer has to enlarge
> the file handle with inode and file type information.
> If the re-exporting server re-exports just one server (proxy-only-mode) we could also
> skip adding the fsid to the handle.
> What do you think?
>
> I'm looking forward to hearing your comments.
>
> Thanks,
> //richard
>
> [0] https://marc.info/?l=linux-nfs&m=161670807413876&w=2
> [1] https://linux-nfs.org/wiki/index.php/NFS_proxy-only_mode

2021-12-09 22:03:27

by Richard Weinberger

Subject: Re: Improving NFS re-export

----- Original Message -----
> On Thu, Dec 09, 2021 at 10:05:48PM +0100, Richard Weinberger wrote:
>> nfs_encode_fh() in fs/nfs/export.c checks for IS_AUTOMOUNT(inode); if this is
>> the case, it refuses to create a new file handle.
>> So while accessing /files/disk2 directly on the re-exporting server triggers an
>> automount, when accessing it via nfsd the export function of the client side
>> gives up.
>>
>> AFAIU the suggested proxy-only-mode[1] will not address this problem, right?
>
> That's how I was thinking of addressing the problem, actually. I
> haven't figured out how to make that proxy-only mode work, though.
>
>> One workaround is manually adding an export for each volume on the re-exporting
>> server.
>> This kinda works but is tedious and error prone.
>>
>> I have a crazy idea how to automate this:
>> Since nfs_encode_fh() in the NFS client side of the re-exporting server can
>> detect crossing mounts, we could install a new export on the server side as
>> soon as the IS_AUTOMOUNT(inode) case arises. We could even use the same fsid.
>> What do you think?
>
> Something like that might work.
>
> I'm not sure what you mean by the same fsid. I think you'd need to make
> up a new fsid each time you encounter a new filesystem. And you'd also
> want to persist it on disk if you want this to keep working across
> reboots of the proxy.

By same fsid I meant reusing the fsid from the backend server.

> I think you could patch rpc.mountd to do that.

Okay, I need to dig into this.

>> Another obstacle is file handle wrapping.
>> When re-exporting, the NFS client side adds inode and file type information to
>> each file handle, and the server side adds information of its own. In my test
>> setup this enlarges a 16-byte file handle to 40 bytes.
>> The proxy-only-mode won't help us here either.
>
> Part of my motivation for a proxy-only mode was to remove that wrapping.
>
> Since you're dedicating the host to reexporting one single backend
> server, in theory you don't need any of the information in the wrapper.
> When you (the proxy) get a filehandle from a client, you know which
> server that filehandle originally came from, so you can go ask that
> server for whatever you need to know about the filehandle (like an
> fsid).

I see. That way we could get rid of file handle wrapping but lose the
NFS client inode cache on the re-exporting server, I think.

>> Did you consider using the opaque file handle from the server as
>> lookup key in a (persisted) data structure?
>
> A little, but I don't think it works.
>
> If you do this, you do need to require that you only export one server.
> Otherwise there may be collisions (two different servers could return
> filehandles that happen to have the same value).
>
> The database would store every filehandle the client has ever seen.
> That could be a lot. It may also include filehandles for since-deleted
> files. The only way to prune such entries would be to try using them
> and see if the server gives you STALE errors.

True. I didn't think about the pruning case.

Thanks a lot for the prompt reply and your valuable input.
//richard

2021-12-21 14:31:26

by Daire Byrne

Subject: Re: Improving NFS re-export

On Thu, 9 Dec 2021 at 22:03, Richard Weinberger wrote:
>
> I see. That way we could get rid of file handle wrapping but lose the
> NFS client inode cache on the re-exporting server, I think.

As an avid user of re-exporting over the WAN, we do like to be able to
selectively cache as much of the metadata lookups as possible
(actimeo=3600, vfs_cache_pressure=1).

I'm not sure if losing the re-export server's client inode cache would
affect that ability?

And on the subject of the "proxy" server and a server per export: if,
like us, you have 30 servers or mountpoints to re-export but
only actively use 5-10 of those at any one time, it is more
resource efficient (CPU, RAM, fscache storage) to use a single
re-export server for more than one mountpoint re-export. But in the
proxy case, maybe the same thing could be achieved with a
containerised knfsd with all the proxy servers running on the same
server?

I'm not sure if you could have shared storage and have multiple
fs-cache/cachefilesd in containers though.

Either way, I'm interested to see what you come up with. Always happy
to test new variations on re-exporting.

Daire

2021-12-21 17:21:10

by J. Bruce Fields

Subject: Re: Improving NFS re-export

On Tue, Dec 21, 2021 at 02:30:45PM +0000, Daire Byrne wrote:
> On Thu, 9 Dec 2021 at 22:03, Richard Weinberger wrote:
> >
> > I see. That way we could get rid of file handle wrapping but lose the
> > NFS client inode cache on the re-exporting server, I think.
>
> As an avid user of re-exporting over the WAN, we do like to be able to
> selectively cache as much of the metadata lookups as possible
> (actimeo=3600, vfs_cache_pressure=1).
>
> I'm not sure if losing the re-export server's client inode cache would
> affect that ability?

A proxy without an inode cache wouldn't be good.

So the inode cache would have to be indexed just on (a hash of) the raw
filehandle.
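
A toy userspace model of that idea, in Python, might look like the following. The `FhInodeCache` class is hypothetical and is not how the kernel's inode cache is implemented; it only illustrates looking up cached attributes by a hash of the raw filehandle, with the stored handle compared on lookup so that a hash collision counts as a miss.

```python
import hashlib
from collections import OrderedDict


class FhInodeCache:
    """Hypothetical LRU cache of inode attributes keyed by filehandle hash."""

    def __init__(self, capacity=1024):
        self.capacity = capacity
        self._entries = OrderedDict()  # digest -> (raw_fh, attrs)

    @staticmethod
    def _key(raw_fh):
        # The index is derived from the raw filehandle bytes alone.
        return hashlib.sha256(raw_fh).digest()

    def put(self, raw_fh, attrs):
        key = self._key(raw_fh)
        self._entries[key] = (bytes(raw_fh), attrs)
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used

    def get(self, raw_fh):
        key = self._key(raw_fh)
        entry = self._entries.get(key)
        # Compare the stored handle too, so a hash collision is a miss.
        if entry is None or entry[0] != raw_fh:
            return None
        self._entries.move_to_end(key)
        return entry[1]
```

The point of the sketch is that no wrapper fields (inode number, file type) are needed to find the cached entry; the opaque handle from the backend is enough.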

> And on the subject of the "proxy" server and a server per export: if,
> like us, you have 30 servers or mountpoints to re-export but
> only actively use 5-10 of those at any one time, it is more
> resource efficient (CPU, RAM, fscache storage) to use a single
> re-export server for more than one mountpoint re-export.

That's useful to know, thanks.

> But in the proxy case, maybe the same thing could be achieved with a
> containerised knfsd with all the proxy servers running on the same
> server?

Yes, that's what I was thinking.

> I'm not sure if you could have shared storage and have multiple
> fs-cache/cachefilesd in containers though.

Seems like there should be a few ways to do that.

> Either way, I'm interested to see what you come up with. Always happy
> to test new variations on re-exporting.

I haven't managed to come up with a plan for making a proxy-only mode
work, though, so I'm not feeling too optimistic about that particular
idea.

--b.

2021-12-21 21:39:55

by Richard Weinberger

Subject: Re: Improving NFS re-export

Daire,

----- Original Message -----
> From: "Daire Byrne"
> Either way, I'm interested to see what you come up with. Always happy
> to test new variations on re-exporting.

David and I will share patches soon. We're quite happy with the kernel side,
but our rpc.mountd changes are still hacky.
We have a proof-of-concept fix for cross mounts and some crazy ideas on how to
reduce the fhandle overhead when re-exporting.

Thanks,
//richard