LinuxLists.cc - Clients mounting subdirectories with NFSv3 can prevent unmounts on server

2019-08-04 21:10:42

Subject: Clients mounting subdirectories with NFSv3 can prevent unmounts on server

If a client mounts a subdirectory of an export using NFSv3, the server can end
up with invalid (CACHE_VALID bit unset) entries in its export cache. These
entries indirectly have a reference to the struct mount* containing the export,
preventing the filesystem containing the export from being unmounted, even
after the export has been unexported:

/tmp# mount --bind foo bar
/tmp# exportfs -o fsid=7 '*:/tmp/bar'
/tmp# mount -t nfs -o vers=3 localhost:/tmp/bar/a /mnt/a
/tmp# umount /mnt/a
/tmp# exportfs -u '*:/tmp/bar'
/tmp# umount /tmp/bar
umount: /tmp/bar: target is busy.
/tmp# cat /proc/net/rpc/nfsd.export/content
#path domain(flags)
# /tmp/bar/a *()

It looks like what's happening is that when rpc.mountd does a downcall to get a
filehandle corresponding to a particular path, exp_parent() traverses the
elements in the given path looking for one which is in the export cache. If
there isn't a valid entry in the cache for the given path, which there won't be
for a subdirectory of an export, then sunrpc_cache_lookup_rcu() inserts an
new, invalid entry.

Are the invalid entries inserted while executing exp_parent() useful? Could we
prevent them from being added?

Thanks,
John

2019-08-22 03:40:53

by John Gallagher

[permalink] [raw]

Subject: Re: Clients mounting subdirectories with NFSv3 can prevent unmounts on server

On Sun, Aug 4, 2019 at 2:09 PM John Gallagher
<[email protected]> wrote:
>
> If a client mounts a subdirectory of an export using NFSv3, the server can end
> up with invalid (CACHE_VALID bit unset) entries in its export cache. These
> entries indirectly have a reference to the struct mount* containing the export,
> preventing the filesystem containing the export from being unmounted, even
> after the export has been unexported:
>
> /tmp# mount --bind foo bar
> /tmp# exportfs -o fsid=7 '*:/tmp/bar'
> /tmp# mount -t nfs -o vers=3 localhost:/tmp/bar/a /mnt/a
> /tmp# umount /mnt/a
> /tmp# exportfs -u '*:/tmp/bar'
> /tmp# umount /tmp/bar
> umount: /tmp/bar: target is busy.
> /tmp# cat /proc/net/rpc/nfsd.export/content
> #path domain(flags)
> # /tmp/bar/a *()
>
> It looks like what's happening is that when rpc.mountd does a downcall to get a
> filehandle corresponding to a particular path, exp_parent() traverses the
> elements in the given path looking for one which is in the export cache. If
> there isn't a valid entry in the cache for the given path, which there won't be
> for a subdirectory of an export, then sunrpc_cache_lookup_rcu() inserts an
> new, invalid entry.

Looking into this a bit further, it seems that this is a regression, probably
introduced by d6fc8821c2d2aba4cc18447a467f543e46e7367d in 4.13. That commit
adds the additional check of the CACHE_VALID bit in cache_is_expired() which
prevents these entries from ever being considered expired, and therefore from
ever being flushed. I can think of at least a few possible approaches for
fixing this:

1. Prevent exp_parent() from adding these entries. I suspect they aren't
very useful anyway, since, being invalid, they aren't actually caching any
info from userspace. Perhaps we could add a new helper function for caches
which does lookups without adding a new entry when it doesn't find an
existing entry.
2. Find a way to make the check in cache_is_expired() more specific, so that
it solves the issue from d6fc8821, but still allows these entries to be
considered expired when we flush the cache.
3. Find some alternate way to solve the issue from d6fc8821 which doesn't
affect the way caches are flushed.

I haven't looked at the issue solved by d6fc8821 yet, so I don't know how
practical 2 and 3 are. Suggestions on what approach might be best would be
welcome. Once I get an idea on how best to proceed, I'd be happy to put a patch
together.

-John