Message-ID: <527D0CE6.2020401@RedHat.com>
Date: Fri, 08 Nov 2013 11:10:14 -0500
From: Steve Dickson
To: Jeff Layton
Cc: Chuck Lever, Trond Myklebust, Linux NFS Mailing list
Subject: Re: [PATCH] Adding the nfs4_use_min_auth module parameter
References: <1383851364-8370-1-git-send-email-steved@redhat.com> <527C07B4.800@RedHat.com> <44CA89EA-8B5E-4B83-A622-78A78F760FF1@oracle.com> <527CDBFC.3070903@RedHat.com> <20131108082202.4032f1a2@tlielax.poochiereds.net> <527CFC72.2030907@RedHat.com> <20131108101232.10d49851@tlielax.poochiereds.net>
In-Reply-To: <20131108101232.10d49851@tlielax.poochiereds.net>

On 08/11/13 10:12, Jeff Layton wrote:
> On Fri, 08 Nov 2013 10:00:02 -0500
> Steve Dickson wrote:
>
>>
>>
>> On 08/11/13 08:22, Jeff Layton wrote:
>>> On Fri, 08 Nov 2013 07:41:32 -0500
>>> Steve Dickson wrote:
>>>
>>>>
>>>>
>>>> On 07/11/13 18:05, Chuck Lever wrote:
>>>>>
>>>>> On Nov 7, 2013, at 1:35 PM, Steve Dickson wrote:
>>>>>
>>>>>> Hey mrchuck...
>>>>>>
>>>>>> On 07/11/13 14:25, Chuck Lever wrote:
>>>>>>> Hi Steve-
>>>>>>>
>>>>>>> On Nov 7, 2013, at 11:09 AM, Steve Dickson wrote:
>>>>>>>
>>>>>>>> This new module parameter makes the v4 client
>>>>>>>> use the minimal authentication flavor (AUTH_UNIX)
>>>>>>>> when establishing NFSv4 state and doing the
>>>>>>>> pseudoroot lookup
>>>>>>>
>>>>>>> The patch description doesn't say, but is this change to work
>>>>>>> around the 15 second GSSD upcall timeout?
>>>>>> Yes. A 15 second delay on every mount due to security that
>>>>>> nobody is requesting is just not good... IMHO.
>>>>>
>>>>> One thing we haven't discussed is reducing the upcall timeout to 5 seconds or less,
>>>>> as a form of immediate relief. 15 seconds is arbitrary, and is onerous even when
>>>>> you expect the mount to work (i.e., why would it be good for any properly configured
>>>>> environment to take 15 seconds to establish a GSS context?).
>>>>>
>>>>> In other words, there are still cases where users wait 15 seconds unnecessarily,
>>>>> and not because of the use of krb5i for lease management. Aren't those of concern?
>>>> No. I think the concern here, at least my concern, is the lack of management.
>>>> We are forcing admins to use krb5i in lease management when it's not necessary
>>>> and there is no way to turn it off.
>>>>
>>>
>>> I don't think that's really the case. The idea was to have the client
>>> attempt to use krb5i if it's available, and then to fall back to
>>> AUTH_SYS if it isn't. This would be *absolutely* no big deal if the
>>> GSSAPI upcall succeeded or failed immediately instead of requiring this
>>> timeout when the daemon isn't running.
>> What server makes krb5i available today in state setup and pseudoroot lookups?
>>
>
> That I don't know... sorry...
Then what is the justification for taking all these extra steps when
they are going to fail 100% of the time?
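Just to make concrete the kind of switch I'm asking for, here is a rough
sketch of the idea (illustrative only -- the nfs4_choose_lease_flavor()
helper is made up for this example, it is not the code in the actual patch):

    #include <linux/module.h>
    #include <linux/sunrpc/auth.h>

    /* Sketch only: a boolean switch that lets an admin tell the v4 client
     * to use AUTH_UNIX for state establishment and the pseudoroot lookup,
     * so no gssd upcall (and no 15 second timeout) ever happens. */
    static bool nfs4_use_min_auth;
    module_param(nfs4_use_min_auth, bool, 0644);
    MODULE_PARM_DESC(nfs4_use_min_auth,
                     "Use AUTH_UNIX for NFSv4 state management and pseudoroot lookups");

    /* Hypothetical helper, not the actual patch code. */
    static rpc_authflavor_t nfs4_choose_lease_flavor(void)
    {
            if (nfs4_use_min_auth)
                    return RPC_AUTH_UNIX;      /* skip the GSS upcall entirely */
            return RPC_AUTH_GSS_KRB5I;         /* current behavior: try krb5i first */
    }

Something that small is all the management I'm asking for; everything
else stays exactly as it is today.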
>
>>>
>>>>>
>>>>>> Also running
>>>>>> a security daemon for non-secure mounts just seems wrong to me.
>>>>> It seems wrong to me to keep auth_rpcgss loaded if no mounts use Kerberos security.
>>>>> What's the difference between that and running gssd all the time?
>>>> You can say the same thing about 99% of all the modules loaded in a Fedora kernel today.
>>>> lsmod | wc -l shows there are 107 modules loaded on an f19 system. Do you think
>>>> all those modules are used? Of course not, but they are not spewing 25
>>>> log messages per mount or sucking up CPU cycles unnecessarily.
>>>>
>>>
>>> rpc.gssd spends most of its time sleeping, unless there is
>>> NFS mount or "real" GSSAPI activity. CPU cycles shouldn't be an issue.
>>> The RSS for the process on my 64-bit box is ~2k so I wouldn't even be
>>> concerned about memory usage. For comparison, we don't worry much about
>>> running rpc.statd these days and it's of similar size and duty cycle.
>> rpc.statd is needed by the v3 protocol to support file locking, which
>> is a bit more necessary than an rpc.gssd to do unsecured mounts.
>>
>
> Right, but we've defaulted to v4 for *years* now. Most people run statd
> and don't actually use it. No one seems to get bent out of shape about
> that though.
So you are using that to justify another needless daemon?
>
>>>
>>> The log messages are another matter. rpc.gssd is just too chatty by
>>> default. That's fixable, but it would mean that we'd need to tell
>>> people to run it in verbose mode when tracking down problems instead of
>>> assuming that all of those messages would go to the logs. That seems
>>> like a reasonable thing to do anyway. Most people don't care about
>>> rpc.gssd log messages once they have kerberized NFS working.
>>
>> So we should remove error and warning messages that in the
>> past helped us debug very difficult code, just so we can run rpc.gssd
>> for normal unsecured mounts?
>>
>
> Yes. Because in the case where you don't have a keytab or a credcache,
> failure is expected. There's no need to log messages due to those sorts
> of problems. Once you have it working correctly, there's just no need to
> keep logging that junk. Most of those log messages should only be
> enabled when rpc.gssd is run with -v(vvv). Admins can (and often do)
> run rpc.gssd in foreground with those flags while they sort through the
> setup.
When there is no keytab or credcache there should be no need to start
rpc.gssd. Non-secure mounts should not depend on the existence of a
user-level daemon... When has it ever been the case that an NFS mount
was dependent on a user-level daemon?
>
>> My apologies... but that makes absolutely no sense to me.
>>
>> Let's talk about scalability... Does anybody have any idea what
>> this needless upcall will cost on a client that does
>> a very large number of mounts all at once?
>>
>
> Virtually nothing -- a couple of pipe reads/writes.
Until rpc.gssd dies and everything comes to a halt... ;-)
>
>> Please let me reiterate my point. The new securing of
>> SETCLIENTIDs and pseudoroot lookups is good! I have
>> no problem with the actual technology. What I'm having
>> a problem with is that I cannot manage this new technology.
>>
>> Could you imagine the push back there would have been if,
>> back in the day when secure mounts became available,
>> we had required *everyone* to use secure mounts. The
>> only way you could use NFS was through a secure mount.
>>
>> That's basically what we are doing today! We are requiring
>> every mount to try a secure flavor that fails 100% of the time
>> because there is no server support for this technology.
>>
>> Again, it's not the technology, it's the management of the
>> technology that this new module parameter addresses. All
>> I'm looking for is a switch that enables or disables
>> this new technology...
>>
>
> No one is requiring anyone to do anything. IMO, running rpc.gssd by
> default is simply a reasonable workaround for this long delay in
> mounting until we have a real solution for this problem.
The client code is hard coded to try the secure flavor on every mount.
There is no way around it. That is what I mean by requiring all mounts
to use a secure flavor.
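In other words, the selection today looks roughly like this (a sketch
only -- nfs4_establish_lease() is a made-up stand-in for the real
state-setup path, not actual kernel code):

    #include <linux/kernel.h>
    #include <linux/sunrpc/auth.h>

    struct nfs_client;

    /* Hypothetical helper standing in for the real lease-establishment path. */
    static int nfs4_establish_lease(struct nfs_client *clp, rpc_authflavor_t flavor);

    /* Sketch of the behavior being discussed: krb5i is always tried first
     * for lease management, and the fallback to AUTH_UNIX only happens
     * after the GSS upcall fails -- which, with no gssd running, means
     * waiting out the upcall timeout on every mount. */
    static const rpc_authflavor_t lease_flavors[] = {
            RPC_AUTH_GSS_KRB5I,     /* always attempted, no way to opt out */
            RPC_AUTH_UNIX,          /* fallback once the upcall gives up */
    };

    static int nfs4_try_lease_flavors(struct nfs_client *clp)
    {
            int i, err = -EACCES;

            for (i = 0; i < ARRAY_SIZE(lease_flavors); i++) {
                    err = nfs4_establish_lease(clp, lease_flavors[i]);
                    if (err != -EACCES)
                            break;
            }
            return err;
    }

The point being that the loop always starts at krb5i; nothing the admin
sets changes that.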
>
> I'd like to see us transition to a more sensible upcall in the future
> that doesn't require this timeout, but we're not there yet. Once we
> have that, we won't need to run rpc.gssd at all anymore.
So we all agree this code needs work... So why not have a way to disable
trying the secure flavor until it's ready for prime time? We've done that
in the past. Or maybe this... disable it until there is server support!
What is wrong with that?

steved.
>
> FWIW, I'm perfectly happy to do the work on such an upcall. I think
> call_usermodehelper (or maybe keys API) makes a lot of sense for this.
> The only real problem I can see with doing this today is the damnable
> namespaces. The nfs client is currently net-namespacified, but running
> something like call_usermodehelper sort of requires that you switch the
> mnt namespace too.
>
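For what it's worth, the shape of what you're describing seems simple
enough -- roughly something like this (a sketch only; the helper binary
path and the nfs_gss_umh_upcall() name are invented here, and it ignores
the mnt namespace problem you point out):

    #include <linux/kmod.h>

    /* Rough sketch of a call_usermodehelper based upcall (illustrative
     * only; the helper and its argument convention are made up). */
    static int nfs_gss_umh_upcall(const char *clnt_dir)
    {
            char *argv[] = { "/usr/sbin/nfs-gss-helper",    /* hypothetical helper */
                             (char *)clnt_dir, NULL };
            char *envp[] = { "HOME=/",
                             "PATH=/sbin:/usr/sbin:/bin:/usr/bin", NULL };

            /* The exec either works or fails right away, so there is no
             * 15 second pipefs timeout to sit through when nothing is
             * listening on the upcall pipe. */
            return call_usermodehelper(argv[0], argv, envp, UMH_WAIT_PROC);
    }

That would at least make the no-keytab case fail fast instead of hanging
every mount on a daemon that has nothing to do.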