2016-02-06 00:20:00

by Kjetil Joergensen

Subject: nfsd and containers

Hi,

Trying to fit everything into the same mold, we're running the
in-kernel "nfs server" inside of docker containers. It works great,
with one exception: an "unclean shutdown" of the container itself
leaves behind the knfsd threads, which hold on to references to, e.g.,
the mount-namespace that the exported filesystem lives within.

We've done some patching of docker, so we use ceph rbd devices,
mounted into the docker container, and a veth pair for networking. The
"init" process in the docker container's pid-namespace has a notion of
graceful shutdown, where it echoes 0 into /proc/fs/nfsd/threads.
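
A minimal sketch of that graceful path, assuming a shell-based init.
The function name stop_nfsd and the NFSD_THREADS override are
hypothetical and exist only so the snippet can be tried against a
scratch file; the one thing taken from the setup above is the write of
0 to /proc/fs/nfsd/threads.

```shell
#!/bin/sh
# Hypothetical sketch of a container init's graceful-shutdown hook.
# NFSD_THREADS is an assumed override so the function can be exercised
# against a scratch file; the real target is /proc/fs/nfsd/threads.
NFSD_THREADS="${NFSD_THREADS:-/proc/fs/nfsd/threads}"

stop_nfsd() {
    # Writing 0 asks the kernel to stop all knfsd threads.
    echo 0 > "$NFSD_THREADS"
}

# In a real init this would be wired to trappable signals, e.g.:
#   trap 'stop_nfsd; exit 0' TERM INT
# SIGKILL cannot be trapped, so this hook never runs in that case.
```

The hook only helps while the init process is alive to run it.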

In the case where the container init process gets an un-trappable
signal, the kernel threads, not really being part of the pid-namespace,
are left behind. The knfsd threads hold on to references to the
mount-namespace, which leaves the filesystem mounted.

Yes - we can signal the kernel nfsd threads from the outside, which
does let them terminate, but it's not ideal.
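
As a rough illustration of the "from the outside" step, the sketch
below picks out candidate knfsd threads from a process listing. The
helper name filter_nfsd_kthreads is hypothetical, and it assumes that
kernel threads are children of kthreadd (PID 2) and that the threads
show up with the name "nfsd". What to send to the resulting PIDs is
left out here.

```shell
#!/bin/sh
# Hypothetical helper: read "PID PPID COMM" lines on stdin and print the
# PIDs that look like knfsd kernel threads. Assumes kernel threads are
# children of kthreadd (PID 2) and that the threads are named "nfsd".
filter_nfsd_kthreads() {
    awk '$2 == 2 && $3 == "nfsd" { print $1 }'
}

# Possible use from the host (not run here):
#   ps -e -o pid= -o ppid= -o comm= | filter_nfsd_kthreads
```

Note that this lists every nfsd thread on the host; it cannot tell
which container a given thread was started from.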

A simple-ish test case: unshare -n -p -m -f --mount-proc -- /usr/sbin/rpc.nfsd

Wishful thinking: the kernel nfsd threads that were spawned by
rpc.nfsd go away with the pid-namespace
Actual outcome: the kernel nfsd threads stick around until signalled

The actual question(s):
- Am I missing something?
- Is this folly that should be abandoned post-haste? (In essence, go
find a userspace nfs implementation)
- In the case where this is folly but we still decide to plow ahead,
is there any way I can determine which namespaces the kernel nfsd
threads hold references to? (That would make signalling the "right"
nfsd threads easier, as I lost my reference to the correct
/proc/fs/nfsd along with the process I had in the corresponding
pid-namespace)

Cheers,
--
Kjetil Joergensen <[email protected]>
Medallia Inc