Subject: Re: Configuring NFSv4.0 Kerberos on a multi-homed Linux NFS server
From: Chuck Lever
Date: Fri, 6 May 2016 09:23:40 -0400
To: "J. Bruce Fields"
Cc: Linux NFS Mailing List

> On May 5, 2016, at 10:44 PM, Bruce Fields wrote:
>
> On Thu, May 05, 2016 at 05:01:58PM -0400, Chuck Lever wrote:
>> After some IRC discussion with Bruce, we think the answer
>> is "this is not supported in the current Linux NFS server."
>>
>> The server does not have a way to determine which service
>> principal to use for NFSv4.0 callback operations. It picks
>> (probably) the first nfs/ service principal in the server's
>> keytab for all callback operations.
>>
>> Thus if a Linux NFS server has a keytab, clients can mount
>> it with NFSv4.0 (and any security flavor) only on the i/f
>> whose hostname matches the name of the nfs/ service
>> principal in that server's keytab.
>
> One correction: the mount should still work correctly. The server just
> won't grant any delegations to the client.

Unfortunately this is not the case.

The CB_NULLs the server uses to validate the backchannel
connection work, and a GSS context is correctly established.
The server starts to hand out delegations.

Operation continues until the server tries to recall a
delegation. The CB_COMPOUND / CB_RECALL fails for the reasons
described above. Operation stalls for tens of seconds while
the server waits for the client to respond to the CB_RECALL.
Requests against the file whose delegation is being recalled
get NFS4ERR_DELAY.

After some period, the client happens to perform a RENEW, and
the server reports NFS4ERR_CB_PATH_DOWN. The client returns
its delegations and performs another SETCLIENTID. The server
destroys the backchannel GSS context and closes the backchannel
connection. The server creates a new backchannel connection
and establishes a fresh GSS context for the backchannel.
Operation continues until the server tries to recall another
delegation.

So, operation is correct and no data corruption occurs. But
the mount is not usable in any production sense because
operation can stall for tens of seconds whenever a delegation
recall is attempted. Depending on the workload, that can be
frequent, or it may not be noticeable.

This is the behavior when the client discards callback
operations that are not properly authenticated. If the client
behavior is changed to respond with RPCAUTH_BADCRED, the
server can recognize that the client received the request and
responded. The server will have to change its behavior in this
case. Today it continues to attempt to use the backchannel,
and each attempt fails. Somehow it needs to mark that client
so that it stops trying to issue CB operations to it.

>> In other words, if the server has a keytab with the
>> principals:
>>
>> nfs/server-a
>> nfs/server-b
>> nfs/server-c
>>
>> NFSv4.0 will operate correctly only when mounting the
>> server via server-a.
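To make that concrete, here is a rough sketch of what mounting
such a server might look like. The export path /export, the
mount point /mnt, and the choice of sec=krb5 are invented for
illustration; vers= and sec= are the usual mount options for
selecting NFSv4.0 and a Kerberos flavor:

   # Server keytab holds nfs/server-a, nfs/server-b, nfs/server-c.

   # Mounting via the name that matches the nfs/ principal the
   # server picks for the backchannel: delegation recalls work.
   mount -t nfs -o vers=4.0,sec=krb5 server-a:/export /mnt

   # Mounting via another interface: the mount and the backchannel
   # GSS context still come up, but the client rejects each
   # CB_RECALL, so recalls stall as described above.
   mount -t nfs -o vers=4.0,sec=krb5 server-b:/export /mnt
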
>>
>> Clients that do not have a keytab should be able to mount
>> with NFSv4.0 via the other interfaces. This is because
>> they will not try to negotiate krb5i for lease management,
>> and the server will not attempt to use krb5i for callback
>> operations.
>>
>> Bruce feels this is a corner case, would be difficult to
>> address, and is adequately worked around by using NFSv3
>> or NFSv4.1 or higher. So currently this is a WONTFIX.
>
> Right, so if there's somebody who really needs delegations in the
> multi-homed NFSv4.0/krb5 case, they're welcome to look into it--I
> can't say I'd turn down good patches (maybe it's not even that
> hard--may depend on whether the gss-proxy protocol does what we
> need?). But it doesn't seem like a priority.

During happy hour, Marcus claimed it should be straightforward
to fix.

> --b.
>
>>
>> Copied Bruce to correct anything I might have summarized
>> incorrectly.
>>
>> --
>> Chuck Lever

--
Chuck Lever