Subject: Re: Configuring NFSv4.0 Kerberos on a multi-homed Linux NFS server
From: Chuck Lever
Date: Fri, 6 May 2016 09:23:40 -0400
To: "J. Bruce Fields"
Cc: Linux NFS Mailing List

> On May 5, 2016, at 10:44 PM, Bruce Fields wrote:
>
> On Thu, May 05, 2016 at 05:01:58PM -0400, Chuck Lever wrote:
>> After some IRC discussion with Bruce, we think the answer
>> is "this is not supported in the current Linux NFS server."
>>
>> The server does not have a way to determine which service
>> principal to use for NFSv4.0 callback operations. It picks
>> (probably) the first nfs/ service principal in the server's
>> keytab for all callback operations.
>>
>> Thus if a Linux NFS server has a keytab, clients can mount
>> it with NFSv4.0 (and any security flavor) only on the i/f
>> whose hostname matches the name of the nfs/ service
>> principal in that server's keytab.
>
> One correction: the mount should still work correctly. The server just
> won't grant any delegations to the client.

Unfortunately this is not the case.

The CB_NULLs the server uses to validate the backchannel
connection work, and a GSS context is correctly established.
The server starts to hand out delegations.

Operation continues until the server tries to recall a
delegation. The CB_COMPOUND / CB_RECALL fails for the reasons
described above. Operation stalls for tens of seconds while
the server waits for the client to respond to the CB_RECALL.
Requests against the file whose delegation is being recalled
get NFS4ERR_DELAY.

After some period, the client happens to perform a RENEW, and
the server reports NFS4ERR_CB_PATH_DOWN. The client returns
its delegations and performs another SETCLIENTID. The server
destroys the backchannel GSS context and closes the backchannel
connection. The server creates a new backchannel connection
and establishes a fresh GSS context for the backchannel.
Operation continues until the server tries to recall another
delegation.

So, operation is correct and no data corruption occurs. But
the mount is not usable in any production sense because
operation can stall for tens of seconds whenever a delegation
recall is attempted. Depending on the workload, that can be
frequent, or it may not be noticeable.

This is the behavior when the client discards callback
operations that are not properly authenticated. If the client
behavior is changed to respond with RPCAUTH_BADCRED, the
server can recognize that the client received the request and
responded. The server will have to change its behavior in this
case. Today it continues to attempt to use the backchannel,
and each attempt fails. Somehow it needs to mark that client
so that it stops trying to issue CB operations to it.

>> In other words, if the server has a keytab with the
>> principals:
>>
>> nfs/server-a
>> nfs/server-b
>> nfs/server-c
>>
>> NFSv4.0 will operate correctly only when mounting the
>> server via server-a.
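To make that concrete, here is a rough sketch of what mounting
such a server might look like. The export path /export, the
mount point /mnt, and the choice of sec=krb5 are invented for
illustration; vers= and sec= are the usual mount options for
selecting NFSv4.0 and a Kerberos flavor:

   # Server keytab holds nfs/server-a, nfs/server-b, nfs/server-c.

   # Mounting via the name that matches the nfs/ principal the
   # server picks for the backchannel: delegation recalls work.
   mount -t nfs -o vers=4.0,sec=krb5 server-a:/export /mnt

   # Mounting via another interface: the mount and the backchannel
   # GSS context still come up, but the client rejects each
   # CB_RECALL, so recalls stall as described above.
   mount -t nfs -o vers=4.0,sec=krb5 server-b:/export /mnt
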
>>
>> Clients that do not have a keytab should be able to mount
>> with NFSv4.0 via the other interfaces. This is because
>> they will not try to negotiate krb5i for lease management,
>> and the server will not attempt to use krb5i for callback
>> operations.
>>
>> Bruce feels this is a corner case, would be difficult to
>> address, and is adequately worked around by using NFSv3
>> or NFSv4.1 or higher. So currently this is a WONTFIX.
>
> Right, so if there's somebody who really needs delegations in the
> multi-homed NFSv4.0/krb5 case, they're welcome to look into it--I
> can't say I'd turn down good patches (maybe it's not even that
> hard--may depend on whether the gss-proxy protocol does what we
> need?). But it doesn't seem like a priority.

During happy hour, Marcus claimed it should be straightforward
to fix.

> --b.
>
>>
>> Copied Bruce to correct anything I might have summarized
>> incorrectly.
>>
>> --
>> Chuck Lever

--
Chuck Lever