2013-07-28 20:15:01

by Dave Reisner

Subject: First NFS mount is delayed in 3.10

Hi,

I use an extremely simple NFSv4 setup to share files between my desktop
and some KVM guests. Noteworthy, I suppose, is that I'm not passing any
sec= parameter with the mount options. According to the documentation,
this means I'm using sec=sys.

Starting with the Linux 3.10 series, there's a 15 second delay when
mounting an NFS share for the first time, followed by a pair of messages
generated by the sunrpc module:

[ 22.206788] RPC: AUTH_GSS upcall timed out.
[ 22.206788] Please check user daemon is running.

The mount does succeed after this timeout, and everything behaves as
expected. Subsequent mounts are fast. The delay seems to be a result of
Linux commit abfdbd53a4e28. Additionally, if I start rpc.gssd on the
client, the delay naturally goes away, as the requisite userspace daemon
is responsive. I can confirm that this same behavior exists on the
latest from Linus's repo, git describe'd as v3.11-rc2-355-g6c504ec.
I've not made any attempts to bisect this yet.
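
For anyone trying to reproduce this, it helps to confirm whether rpc.gssd
is actually running before the first mount. The following is a minimal
sketch, not part of the original report. It assumes a Linux client where
/proc/<pid>/comm holds the process name; the function name daemon_running
and its proc_root parameter are hypothetical, chosen for illustration.

```python
import os


def daemon_running(name="rpc.gssd", proc_root="/proc"):
    """Return True if any process under proc_root reports `name` in its comm file.

    Assumption: the daemon's process name is exactly "rpc.gssd". The kernel
    truncates comm to 15 characters, which does not affect this name.
    """
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return False  # no procfs at this path
    for entry in entries:
        if not entry.isdigit():
            continue  # only numeric directories are PIDs
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == name:
                    return True
        except OSError:
            continue  # process exited or is not readable
    return False


if __name__ == "__main__":
    print("rpc.gssd running:", daemon_running())
```

If this reports False and the first mount stalls before printing the
AUTH_GSS message, that would be consistent with the behavior described
above.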

Is this expected behavior? The aforementioned commit seems to claim that
there's now a "stronger dependency on rpc.gssd", but the manpage for
rpc.gssd seems to indicate that it's really only used for Kerberized
NFS. Am I really going to need to use rpc.gssd on my clients now, even
though it'll effectively be doing nothing?

Any help is greatly appreciated. Please CC me on replies as I'm not
subscribed to the list.

Thanks,
Dave



2013-07-28 20:55:16

by Chuck Lever

Subject: Re: First NFS mount is delayed in 3.10


On Jul 28, 2013, at 10:14 PM, Dave Reisner wrote:

> Hi,
>
> I use an extremely simple NFSv4 setup to share files between my desktop
> and some KVM guests. Noteworthy, I suppose, is that I'm not passing any
> sec= parameter with the mount options. According to the documentation,
> this means I'm using sec=sys.
>
> Starting with the Linux 3.10 series, there's a 15 second delay when
> mounting an NFS share for the first time, followed by a pair of messages
> generated by the sunrpc module:
>
> [ 22.206788] RPC: AUTH_GSS upcall timed out.
> [ 22.206788] Please check user daemon is running.
>
> The mount does succeed after this timeout, and everything behaves as
> expected. Subsequent mounts are fast. The delay seems to be a result of
> Linux commit abfdbd53a4e28. Additionally, if I start rpc.gssd on the
> client, the delay naturally goes away, as the requisite userspace daemon
> is responsive. I can confirm that this same behavior exists on the
> latest from Linus's repo, git describe'd as v3.11-rc2-355-g6c504ec.
> I've not made any attempts to bisect this yet.
>
> Is this expected behavior?

Yes.

> The aforementioned commit seems to claim that
> there's now a "stronger dependency on rpc.gssd", but the manpage for
> rpc.gssd seems to indicate that it's really only used for Kerberized
> NFS.

The NFS client now attempts to use Kerberos for certain tasks on NFSv4 mounts, even if sec=sys is in effect.

The delay is due to a long-standing bug in the kernel -- it currently has no secure way to tell if rpc.gssd is running before it tries an upcall. The recent change simply exposes this delay more often.

A complete remedy may involve changes in both user space and kernel space. Once the kernel can tell for sure when rpc.gssd isn't running, it can skip the upcall and avoid the delay.
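
The reasoning above can be illustrated with a toy model. This is only a sketch under stated assumptions, not kernel code: the 15-second timeout is taken from the delay reported in this thread, and the function name first_mount_delay and its parameters are hypothetical.

```python
UPCALL_TIMEOUT_S = 15.0  # assumption: the delay observed in the original report


def first_mount_delay(daemon_running, kernel_can_detect, timeout=UPCALL_TIMEOUT_S):
    """Toy model: seconds the first mount spends waiting on the GSS upcall."""
    if daemon_running:
        # The daemon answers the upcall; its response time is ignored here.
        return 0.0
    if kernel_can_detect:
        # The kernel knows nobody is listening, so it skips the upcall.
        return 0.0
    # The kernel cannot tell, so it tries the upcall and waits for the timeout.
    return timeout


if __name__ == "__main__":
    for running in (True, False):
        for detect in (True, False):
            delay = first_mount_delay(running, detect)
            print(f"rpc.gssd running={running!s:5} kernel can detect={detect!s:5} "
                  f"delay={delay:.0f}s")
```

Only one of the four cases incurs the delay: rpc.gssd is not running and the kernel has no way to know it. That is the case a complete remedy would eliminate.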

> Am I really going to need to use rpc.gssd on my clients now, even
> though it'll effectively be doing nothing?

"Need" is a little strong in this case: as you point out, everything works as expected, except for the delay, if rpc.gssd is not running.

Also, a large number of components of any distribution are "effectively doing nothing" for most people, who don't use every last feature that is available.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com