2016-06-21 15:03:27

by Steve Dickson

Subject: [PATCH 1/1 v2] systemd: Only start the gssd daemons when they are enabled

When Kerberos is enabled, /etc/krb5.keytab exists,
which causes both gssd daemons to start automatically.

With rpc.gssd running, an upcall is made on every
NFS mount to get a GSS security context for the SETCLIENTID procedure.

When Kerberos is not configured for NFS, meaning
there is no host/hostname@REALM principal in
the keytab, those upcalls always fail, causing
the mount to hang for several seconds.

This patch adds an [Install] section to both
services so they can be enabled and disabled.
The README was also updated.

Signed-off-by: Steve Dickson <[email protected]>
---
systemd/README | 14 +++++---------
systemd/rpc-gssd.service | 6 ++++++
systemd/rpc-svcgssd.service | 7 +++++++
3 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/systemd/README b/systemd/README
index 7c43df8..58dae42 100644
--- a/systemd/README
+++ b/systemd/README
@@ -59,13 +59,9 @@ information such as in /etc/sysconfig/nfs or /etc/defaults/nfs.
It is run once by nfs-config.service.

rpc.gssd and rpc.svcgssd are assumed to be needed if /etc/krb5.keytab
-is present.
-If a site needs this file present but does not want the gss daemons
-running, it should create
- /etc/systemd/system/rpc-gssd.service.d/01-disable.conf
-and
- /etc/systemd/system/rpc-svcgssd.service.d/01-disable.conf
+is present. If a site needs this file present but does not want
+the gss daemons running, they can be disabled by doing
+
+ systemctl disable rpc-gssd
+ systemctl disable rpc-svcgssd

-containing
- [Unit]
- ConditionNull=false
diff --git a/systemd/rpc-gssd.service b/systemd/rpc-gssd.service
index d4a3819..681f26a 100644
--- a/systemd/rpc-gssd.service
+++ b/systemd/rpc-gssd.service
@@ -17,3 +17,9 @@ EnvironmentFile=-/run/sysconfig/nfs-utils

Type=forking
ExecStart=/usr/sbin/rpc.gssd $GSSDARGS
+
+# Only start if the service is enabled
+# and /etc/krb5.keytab exists
+[Install]
+WantedBy=multi-user.target
+
diff --git a/systemd/rpc-svcgssd.service b/systemd/rpc-svcgssd.service
index 41177b6..4433ed7 100644
--- a/systemd/rpc-svcgssd.service
+++ b/systemd/rpc-svcgssd.service
@@ -18,3 +18,10 @@ After=nfs-config.service
EnvironmentFile=-/run/sysconfig/nfs-utils
Type=forking
ExecStart=/usr/sbin/rpc.svcgssd $SVCGSSDARGS
+
+# Only start if the service is enabled
+# and /etc/krb5.keytab exists
+# and when gss-proxy is not running
+[Install]
+WantedBy=multi-user.target
+
--
2.5.5
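For reference, a sketch of what rpc-gssd.service looks like with this change, reconstructed from the hunks above rather than quoted verbatim; in particular the ConditionPathExists line is an assumption inferred from the "/etc/krb5.keytab exists" comment, since that part of the unit is not shown in the diff:

```ini
# Sketch of rpc-gssd.service after this patch -- reconstructed, not verbatim.
[Unit]
# Assumed from the comment in the hunk; not visible in the diff itself.
ConditionPathExists=/etc/krb5.keytab

[Service]
EnvironmentFile=-/run/sysconfig/nfs-utils
Type=forking
ExecStart=/usr/sbin/rpc.gssd $GSSDARGS

# Only start if the service is enabled
# and /etc/krb5.keytab exists
[Install]
WantedBy=multi-user.target
```

With an [Install] section present, `systemctl enable rpc-gssd` creates the multi-user.target wants-link and `systemctl disable rpc-gssd` removes it; without an [Install] section, disable has nothing to remove, which is presumably why the drop-in override method was needed before.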



2016-06-21 15:48:06

by Chuck Lever

Subject: Re: [PATCH 1/1 v2] systemd: Only start the gssd daemons when they are enabled


> On Jun 21, 2016, at 11:43 AM, Steve Dickson <[email protected]> wrote:
>
>
>
> On 06/21/2016 11:26 AM, Chuck Lever wrote:
>>
>>> On Jun 21, 2016, at 10:53 AM, Steve Dickson <[email protected]> wrote:
>>>
>>> When Kerberos is enabled, the /etc/krb5.keytab exists
>>> which causes the both gssd daemons to start, automatically.
>>>
>>> With rpc.gssd running, on all NFS mounts, an upcall
>>> is done to get GSS security context for SETCLIENTID procedure.
>>>
>>> When Kerberos is not configured for NFS, meaning
>>> there is no host/hostname@REALM principal in
>>> the key tab, those upcalls always fall causing
>>> the mount to hang for several seconds.
>>
>> What is the root cause of the temporary hang?
> All the upcalls to rpc.gssd... I think there are
> three for every mount.
>
>>
>> When you say "the upcall fails" do you mean there is
>> no reply, or that there is a negative reply after a
>> delay, or there is an immediate negative reply?
> Good point.. the upcalls did not fail, they
> just received negative replies.

I would say that the upcalls themselves are not the
root cause of the delay if they all return immediately.

Are you saying that each negative reply takes a moment?
If that's the case, is there something that gssd should
do to reply more quickly when there's no host or nfs
service principal in the keytab?

Adding administrative interface complexity to work around
an underlying implementation problem might not be the best
long term choice.


> steved.
>
>>
>>
>>> This patch added an [Install] section to both
>>> services so the services can be enable and disable.
>>> The README was also updated.
>>>
>>> Signed-off-by: Steve Dickson <[email protected]>
>>> ---
>>> systemd/README | 14 +++++---------
>>> systemd/rpc-gssd.service | 6 ++++++
>>> systemd/rpc-svcgssd.service | 7 +++++++
>>> 3 files changed, 18 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/systemd/README b/systemd/README
>>> index 7c43df8..58dae42 100644
>>> --- a/systemd/README
>>> +++ b/systemd/README
>>> @@ -59,13 +59,9 @@ information such as in /etc/sysconfig/nfs or /etc/defaults/nfs.
>>> It is run once by nfs-config.service.
>>>
>>> rpc.gssd and rpc.svcgssd are assumed to be needed if /etc/krb5.keytab
>>> -is present.
>>> -If a site needs this file present but does not want the gss daemons
>>> -running, it should create
>>> - /etc/systemd/system/rpc-gssd.service.d/01-disable.conf
>>> -and
>>> - /etc/systemd/system/rpc-svcgssd.service.d/01-disable.conf
>>> +is present. If a site needs this file present but does not want
>>> +the gss daemons running, they can be disabled by doing
>>> +
>>> + systemctl disable rpc-gssd
>>> + systemctl disable rpc-svcgssd
>>>
>>> -containing
>>> - [Unit]
>>> - ConditionNull=false
>>> diff --git a/systemd/rpc-gssd.service b/systemd/rpc-gssd.service
>>> index d4a3819..681f26a 100644
>>> --- a/systemd/rpc-gssd.service
>>> +++ b/systemd/rpc-gssd.service
>>> @@ -17,3 +17,9 @@ EnvironmentFile=-/run/sysconfig/nfs-utils
>>>
>>> Type=forking
>>> ExecStart=/usr/sbin/rpc.gssd $GSSDARGS
>>> +
>>> +# Only start if the service is enabled
>>> +# and /etc/krb5.keytab exists
>>> +[Install]
>>> +WantedBy=multi-user.target
>>> +
>>> diff --git a/systemd/rpc-svcgssd.service b/systemd/rpc-svcgssd.service
>>> index 41177b6..4433ed7 100644
>>> --- a/systemd/rpc-svcgssd.service
>>> +++ b/systemd/rpc-svcgssd.service
>>> @@ -18,3 +18,10 @@ After=nfs-config.service
>>> EnvironmentFile=-/run/sysconfig/nfs-utils
>>> Type=forking
>>> ExecStart=/usr/sbin/rpc.svcgssd $SVCGSSDARGS
>>> +
>>> +# Only start if the service is enabled
>>> +# and /etc/krb5.keytab exists
>>> +# and when gss-proxy is not runing
>>> +[Install]
>>> +WantedBy=multi-user.target
>>> +
>>> --
>>> 2.5.5
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>> the body of a message to [email protected]
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>> --
>> Chuck Lever

--
Chuck Lever




2016-06-21 15:54:01

by Steve Dickson

Subject: Re: [PATCH 1/1 v2] systemd: Only start the gssd daemons when they are enabled



On 06/21/2016 11:26 AM, Chuck Lever wrote:
>
>> On Jun 21, 2016, at 10:53 AM, Steve Dickson <[email protected]> wrote:
>>
>> When Kerberos is enabled, the /etc/krb5.keytab exists
>> which causes the both gssd daemons to start, automatically.
>>
>> With rpc.gssd running, on all NFS mounts, an upcall
>> is done to get GSS security context for SETCLIENTID procedure.
>>
>> When Kerberos is not configured for NFS, meaning
>> there is no host/hostname@REALM principal in
>> the key tab, those upcalls always fall causing
>> the mount to hang for several seconds.
>
> What is the root cause of the temporary hang?
All the upcalls to rpc.gssd... I think there are
three for every mount.

>
> When you say "the upcall fails" do you mean there is
> no reply, or that there is a negative reply after a
> delay, or there is an immediate negative reply?
Good point.. the upcalls did not fail, they
just received negative replies.

steved.

>
>
>> This patch added an [Install] section to both
>> services so the services can be enable and disable.
>> The README was also updated.
>>
>> Signed-off-by: Steve Dickson <[email protected]>
>> ---
>> systemd/README | 14 +++++---------
>> systemd/rpc-gssd.service | 6 ++++++
>> systemd/rpc-svcgssd.service | 7 +++++++
>> 3 files changed, 18 insertions(+), 9 deletions(-)
>>
>> diff --git a/systemd/README b/systemd/README
>> index 7c43df8..58dae42 100644
>> --- a/systemd/README
>> +++ b/systemd/README
>> @@ -59,13 +59,9 @@ information such as in /etc/sysconfig/nfs or /etc/defaults/nfs.
>> It is run once by nfs-config.service.
>>
>> rpc.gssd and rpc.svcgssd are assumed to be needed if /etc/krb5.keytab
>> -is present.
>> -If a site needs this file present but does not want the gss daemons
>> -running, it should create
>> - /etc/systemd/system/rpc-gssd.service.d/01-disable.conf
>> -and
>> - /etc/systemd/system/rpc-svcgssd.service.d/01-disable.conf
>> +is present. If a site needs this file present but does not want
>> +the gss daemons running, they can be disabled by doing
>> +
>> + systemctl disable rpc-gssd
>> + systemctl disable rpc-svcgssd
>>
>> -containing
>> - [Unit]
>> - ConditionNull=false
>> diff --git a/systemd/rpc-gssd.service b/systemd/rpc-gssd.service
>> index d4a3819..681f26a 100644
>> --- a/systemd/rpc-gssd.service
>> +++ b/systemd/rpc-gssd.service
>> @@ -17,3 +17,9 @@ EnvironmentFile=-/run/sysconfig/nfs-utils
>>
>> Type=forking
>> ExecStart=/usr/sbin/rpc.gssd $GSSDARGS
>> +
>> +# Only start if the service is enabled
>> +# and /etc/krb5.keytab exists
>> +[Install]
>> +WantedBy=multi-user.target
>> +
>> diff --git a/systemd/rpc-svcgssd.service b/systemd/rpc-svcgssd.service
>> index 41177b6..4433ed7 100644
>> --- a/systemd/rpc-svcgssd.service
>> +++ b/systemd/rpc-svcgssd.service
>> @@ -18,3 +18,10 @@ After=nfs-config.service
>> EnvironmentFile=-/run/sysconfig/nfs-utils
>> Type=forking
>> ExecStart=/usr/sbin/rpc.svcgssd $SVCGSSDARGS
>> +
>> +# Only start if the service is enabled
>> +# and /etc/krb5.keytab exists
>> +# and when gss-proxy is not runing
>> +[Install]
>> +WantedBy=multi-user.target
>> +
>> --
>> 2.5.5
>>
>
> --
> Chuck Lever
>
>
>

2016-06-21 17:20:31

by Steve Dickson

Subject: Re: [PATCH 1/1 v2] systemd: Only start the gssd daemons when they are enabled

Hey,

On 06/21/2016 11:47 AM, Chuck Lever wrote:
>>> >> When you say "the upcall fails" do you mean there is
>>> >> no reply, or that there is a negative reply after a
>>> >> delay, or there is an immediate negative reply?
>> > Good point.. the upcalls did not fail, they
>> > just received negative replies.
> I would say that the upcalls themselves are not the
> root cause of the delay if they all return immediately.
Well, when rpc.gssd is not running (i.e., no upcalls),
the delays stop happening.

>
> Are you saying that each negative reply takes a moment?
Yes. Even on sec=sys mounts. Which is the issue.

> If that's the case, is there something that gssd should
> do to reply more quickly when there's no host or nfs
> service principal in the keytab?
I don't think so... unless we start caching negative
responses or something like that, which is way
overkill, especially since the problem is solved
by not starting rpc.gssd.

>
> Adding administrative interface complexity to work around
> an underlying implementation problem might not be the best
> long term choice.
Well, there already was a way to stop gssd from starting when
Kerberos is configured but not for NFS. From the systemd/README:

rpc.gssd and rpc.svcgssd are assumed to be needed if /etc/krb5.keytab
is present.
If a site needs this file present but does not want the gss daemons
running, it should create
/etc/systemd/system/rpc-gssd.service.d/01-disable.conf
and
/etc/systemd/system/rpc-svcgssd.service.d/01-disable.conf

containing
[Unit]
ConditionNull=false

Which does work and will still work... but I'm thinking it is
much simpler to disable the service via the systemd command

systemctl disable rpc-gssd

than to create and edit those .conf files.

steved
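The pre-patch drop-in method quoted above can be scripted. This is a sketch that writes under a ROOT prefix (an assumption for illustration, so it can be tried outside /etc); on a real system ROOT would be empty and a `systemctl daemon-reload` would follow:

```shell
# Create the ConditionNull=false drop-ins described in the systemd/README.
# ROOT is a test prefix (illustrative assumption); real systems write to /etc.
ROOT=${ROOT:-$(mktemp -d)}
for svc in rpc-gssd rpc-svcgssd; do
    dir="$ROOT/etc/systemd/system/$svc.service.d"
    mkdir -p "$dir"
    # ConditionNull=false makes the unit's start condition always fail.
    printf '[Unit]\nConditionNull=false\n' > "$dir/01-disable.conf"
done
echo "drop-ins written under $ROOT"
```

Either way the daemons stay off; the [Install] section just lets the standard enable/disable workflow do the same job.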

2016-06-21 19:32:09

by Chuck Lever

Subject: Re: [PATCH 1/1 v2] systemd: Only start the gssd daemons when they are enabled


> On Jun 21, 2016, at 1:20 PM, Steve Dickson <[email protected]> wrote:
>
> Hey,
>
> On 06/21/2016 11:47 AM, Chuck Lever wrote:
>>>>>> When you say "the upcall fails" do you mean there is
>>>>>> no reply, or that there is a negative reply after a
>>>>>> delay, or there is an immediate negative reply?
>>>> Good point.. the upcalls did not fail, they
>>>> just received negative replies.
>> I would say that the upcalls themselves are not the
>> root cause of the delay if they all return immediately.
> Well when rpc.gssd is not running (aka no upcalls)
> the delays stop happening.

Well let me say it a different way: the mechanism of
performing an upcall should be fast. The stuff that gssd
is doing as a result of the upcall request may be taking
longer than expected, though.

If gssd is up, and has nothing to do (which I think is
the case here?) then IMO that upcall should be unnoticeable.
I don't expect there to be any difference between the kernel
squelching an upcall, and an upcall completing immediately.


>> Are you saying that each negative reply takes a moment?
> Yes. Even on sec=sys mounts. Which is the issue.

Yep, I get that. I've seen that behavior on occasion,
and agree it should be addressed somehow.


>> If that's the case, is there something that gssd should
>> do to reply more quickly when there's no host or nfs
>> service principal in the keytab?
> I don't think so... unless we start caching negative
> negative response or something like which is way
> overkill especially since the problem is solved
> by not starting rpc.gssd.

I'd like to understand why this upcall, which should be
equivalent to a no-op, is not returning an immediate
answer. Three of these in a row shouldn't take more than
a dozen milliseconds.

How long does the upcall take when there is a service
principal versus how long it takes when there isn't one?
Try running gssd under strace to get some timings.

Is gssd waiting for syslog or something?


>> Adding administrative interface complexity to work around
>> an underlying implementation problem might not be the best
>> long term choice.
> Well there already was way to stop gssd from starting when
> kerberos is configured but not for NFS. From the systemd/README:
>
> rpc.gssd and rpc.svcgssd are assumed to be needed if /etc/krb5.keytab
> is present.
> If a site needs this file present but does not want the gss daemons
> running, it should create
> /etc/systemd/system/rpc-gssd.service.d/01-disable.conf
> and
> /etc/systemd/system/rpc-svcgssd.service.d/01-disable.conf
>
> containing
> [Unit]
> ConditionNull=false
>
> Which does work and will still work... but I'm thinking it is
> much similar to disable the service via systemd command
> systemctl disable rpc-gssd
>
> than creating and editing those .conf files.

This should all be automatic, IMO.

On Solaris, drop in a keytab and a krb5.conf, and add sec=krb5
to your mounts. No reboot, nothing to restart. Linux should be
that simple.


--
Chuck Lever
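Chuck's suggestion above to run gssd under strace for timings can be sketched as follows, using standard strace flags (-f follow forks, -tt wall-clock timestamps, -T time spent in each syscall); the sketch is guarded so it degrades gracefully on a machine where rpc.gssd is not running:

```shell
# Sketch: attach strace to a running rpc.gssd to see where upcall time goes.
pid=$(pidof rpc.gssd 2>/dev/null || true)
if [ -n "$pid" ]; then
    out="tracing pid $pid"
    # Trace for a few seconds while reproducing the slow mount elsewhere.
    strace -f -tt -T -p "$pid" -o /tmp/gssd.strace &
    sleep 5
    kill $! 2>/dev/null
else
    out="rpc.gssd not running; nothing to trace"
fi
echo "$out"
```

The resulting /tmp/gssd.strace shows, per syscall, whether the second-long gaps are spent in keytab reads, KDC network traffic, or something else.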




2016-06-21 21:20:35

by Chuck Lever

Subject: Re: [PATCH 1/1 v2] systemd: Only start the gssd daemons when they are enabled


> On Jun 21, 2016, at 10:53 AM, Steve Dickson <[email protected]> wrote:
>
> When Kerberos is enabled, the /etc/krb5.keytab exists
> which causes the both gssd daemons to start, automatically.
>
> With rpc.gssd running, on all NFS mounts, an upcall
> is done to get GSS security context for SETCLIENTID procedure.
>
> When Kerberos is not configured for NFS, meaning
> there is no host/hostname@REALM principal in
> the key tab, those upcalls always fall causing
> the mount to hang for several seconds.

What is the root cause of the temporary hang?

When you say "the upcall fails" do you mean there is
no reply, or that there is a negative reply after a
delay, or there is an immediate negative reply?


> This patch added an [Install] section to both
> services so the services can be enable and disable.
> The README was also updated.
>
> Signed-off-by: Steve Dickson <[email protected]>
> ---
> systemd/README | 14 +++++---------
> systemd/rpc-gssd.service | 6 ++++++
> systemd/rpc-svcgssd.service | 7 +++++++
> 3 files changed, 18 insertions(+), 9 deletions(-)
>
> diff --git a/systemd/README b/systemd/README
> index 7c43df8..58dae42 100644
> --- a/systemd/README
> +++ b/systemd/README
> @@ -59,13 +59,9 @@ information such as in /etc/sysconfig/nfs or /etc/defaults/nfs.
> It is run once by nfs-config.service.
>
> rpc.gssd and rpc.svcgssd are assumed to be needed if /etc/krb5.keytab
> -is present.
> -If a site needs this file present but does not want the gss daemons
> -running, it should create
> - /etc/systemd/system/rpc-gssd.service.d/01-disable.conf
> -and
> - /etc/systemd/system/rpc-svcgssd.service.d/01-disable.conf
> +is present. If a site needs this file present but does not want
> +the gss daemons running, they can be disabled by doing
> +
> + systemctl disable rpc-gssd
> + systemctl disable rpc-svcgssd
>
> -containing
> - [Unit]
> - ConditionNull=false
> diff --git a/systemd/rpc-gssd.service b/systemd/rpc-gssd.service
> index d4a3819..681f26a 100644
> --- a/systemd/rpc-gssd.service
> +++ b/systemd/rpc-gssd.service
> @@ -17,3 +17,9 @@ EnvironmentFile=-/run/sysconfig/nfs-utils
>
> Type=forking
> ExecStart=/usr/sbin/rpc.gssd $GSSDARGS
> +
> +# Only start if the service is enabled
> +# and /etc/krb5.keytab exists
> +[Install]
> +WantedBy=multi-user.target
> +
> diff --git a/systemd/rpc-svcgssd.service b/systemd/rpc-svcgssd.service
> index 41177b6..4433ed7 100644
> --- a/systemd/rpc-svcgssd.service
> +++ b/systemd/rpc-svcgssd.service
> @@ -18,3 +18,10 @@ After=nfs-config.service
> EnvironmentFile=-/run/sysconfig/nfs-utils
> Type=forking
> ExecStart=/usr/sbin/rpc.svcgssd $SVCGSSDARGS
> +
> +# Only start if the service is enabled
> +# and /etc/krb5.keytab exists
> +# and when gss-proxy is not runing
> +[Install]
> +WantedBy=multi-user.target
> +
> --
> 2.5.5
>

--
Chuck Lever




2016-06-23 15:57:56

by Steve Dickson

Subject: Re: [PATCH 1/1 v2] systemd: Only start the gssd daemons when they are enabled

Sorry for the delayed response... PTO yesterday.

On 06/21/2016 01:57 PM, Chuck Lever wrote:
>
>> On Jun 21, 2016, at 1:20 PM, Steve Dickson <[email protected]> wrote:
>>
>> Hey,
>>
>> On 06/21/2016 11:47 AM, Chuck Lever wrote:
>>>>>>> When you say "the upcall fails" do you mean there is
>>>>>>> no reply, or that there is a negative reply after a
>>>>>>> delay, or there is an immediate negative reply?
>>>>> Good point.. the upcalls did not fail, they
>>>>> just received negative replies.
>>> I would say that the upcalls themselves are not the
>>> root cause of the delay if they all return immediately.
>> Well when rpc.gssd is not running (aka no upcalls)
>> the delays stop happening.
>
> Well let me say it a different way: the mechanism of
> performing an upcall should be fast. The stuff that gssd
> is doing as a result of the upcall request may be taking
> longer than expected, though.
I'm pretty sure it's not the actual mechanism causing the
delay... It's the act of failing (reading keytabs, maybe even
pinging the KDC) that is taking the time, at least that's
what the syslogs show.

>
> If gssd is up, and has nothing to do (which I think is
> the case here?) then IMO that upcall should be unnoticeable.
Well, it's not... It is causing a delay.

> I don't expect there to be any difference between the kernel
> squelching an upcall, and an upcall completing immediately.
The kernel will always make the upcall when rpc.gssd
is running... I don't see how the kernel can squelch the upcall
with rpc.gssd running. Not starting rpc.gssd is the only
way to squelch the upcall.

>
>
>>> Are you saying that each negative reply takes a moment?
>> Yes. Even on sec=sys mounts. Which is the issue.
>
> Yep, I get that. I've seen that behavior on occasion,
> and agree it should be addressed somehow.
>
>
>>> If that's the case, is there something that gssd should
>>> do to reply more quickly when there's no host or nfs
>>> service principal in the keytab?
>> I don't think so... unless we start caching negative
>> negative response or something like which is way
>> overkill especially since the problem is solved
>> by not starting rpc.gssd.
>
> I'd like to understand why this upcall, which should be
> equivalent to a no-op, is not returning an immediate
> answer. Three of these in a row shouldn't take more than
> a dozen milliseconds.
It looks like, from the syslog timestamps, each upcall
is taking ~1 sec.

>
> How long does the upcall take when there is a service
> principal versus how long it takes when there isn't one?
> Try running gssd under strace to get some timings.
the keytab does have an nfs/hostname@REALM entry. So the
call to the KDC is probably failing... which
could be construed as a misconfiguration, but
that misconfiguration should not even come into
play with sec=sys mounts... IMHO...


>
> Is gssd waiting for syslog or something?
No... it's just failing to get the machine creds for root

[snip]

>> Which does work and will still work... but I'm thinking it is
>> much similar to disable the service via systemd command
>> systemctl disable rpc-gssd
>>
>> than creating and editing those .conf files.
>
> This should all be automatic, IMO.
>
> On Solaris, drop in a keytab and a krb5.conf, and add sec=krb5
> to your mounts. No reboot, nothing to restart. Linux should be
> that simple.
The only extra step with Linux is to 'systemctl start rpc-gssd'.
I don't think there is much we can do about that.... But of
course... Patches are always welcomed!! 8-)

TBL... When Kerberos is configured correctly for NFS, everything
works just fine. When Kerberos is configured, but not for NFS,
it causes delays on all NFS mounts.

Today, there is a method to stop rpc-gssd from blindly starting
when Kerberos is configured, to eliminate that delay.
This patch just tweaks that method to make things easier.

To address your concern about covering up a bug: I just don't
see it... The code is doing exactly what it's asked to do.
By default the kernel asks for a krb5i context (when rpc.gssd
is running). rpc.gssd looks for a principal in the keytab;
when one is found, the KDC is called...

Everything is working just like it should, and it is
failing just like it should. I'm just trying to
eliminate all this processing when it's not needed, in
an easier way...

steved.
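The ~1 sec per upcall figure above comes from syslog timestamps; extracting such deltas can be sketched with a little awk over HH:MM:SS timestamps. The log lines below are fabricated examples for illustration, not real rpc.gssd output:

```shell
# Estimate per-upcall latency from HH:MM:SS syslog timestamps.
# The log lines are fabricated examples, not real rpc.gssd output.
log='15:03:01 rpc.gssd: handling krb5 upcall
15:03:02 rpc.gssd: ERR: no credentials found
15:03:04 rpc.gssd: handling krb5 upcall
15:03:05 rpc.gssd: ERR: no credentials found'

out=$(printf '%s\n' "$log" | awk '
    # Convert "HH:MM:SS" to seconds since midnight.
    function secs(ts, t) { split(ts, t, ":"); return t[1]*3600 + t[2]*60 + t[3] }
    /handling krb5 upcall/ { start = secs($1) }
    /ERR:/                 { print "upcall took " (secs($1) - start) " s" }')
echo "$out"
```

Three such upcalls per mount at ~1 sec each would account for the several-second hang described in the patch.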

2016-06-24 01:30:30

by Chuck Lever

Subject: Re: [PATCH 1/1 v2] systemd: Only start the gssd daemons when they are enabled


> On Jun 23, 2016, at 11:57 AM, Steve Dickson <[email protected]> wrote:
>
> Sorry for the delayed response... PTO yesterday.
>
>> On 06/21/2016 01:57 PM, Chuck Lever wrote:
>>
>>> On Jun 21, 2016, at 1:20 PM, Steve Dickson <[email protected]> wrote:
>>>
>>> Hey,
>>>
>>> On 06/21/2016 11:47 AM, Chuck Lever wrote:
>>>>>>>> When you say "the upcall fails" do you mean there is
>>>>>>>> no reply, or that there is a negative reply after a
>>>>>>>> delay, or there is an immediate negative reply?
>>>>>> Good point.. the upcalls did not fail, they
>>>>>> just received negative replies.
>>>> I would say that the upcalls themselves are not the
>>>> root cause of the delay if they all return immediately.
>>> Well when rpc.gssd is not running (aka no upcalls)
>>> the delays stop happening.
>>
>> Well let me say it a different way: the mechanism of
>> performing an upcall should be fast. The stuff that gssd
>> is doing as a result of the upcall request may be taking
>> longer than expected, though.
> I'm pretty sure its not the actual mechanism causing the
> delay... Its the act of failing (read keytabs maybe even
> ping the KDC) is what taking the time at least that's
> what the sys logs show.
>
>>
>> If gssd is up, and has nothing to do (which I think is
>> the case here?) then IMO that upcall should be unnoticeable.
> Well its not... It is causing a delay.
>
>> I don't expect there to be any difference between the kernel
>> squelching an upcall, and an upcall completing immediately.
> There kernel will always make the upcall when rpc.gssd
> is running... I don't see how the kernel can squelch the upcall
> with rpc.gssd running. Not starting rpc.gssd is the only
> way to squelch the upcall.
>
>>
>>
>>>> Are you saying that each negative reply takes a moment?
>>> Yes. Even on sec=sys mounts. Which is the issue.
>>
>> Yep, I get that. I've seen that behavior on occasion,
>> and agree it should be addressed somehow.
>>
>>
>>>> If that's the case, is there something that gssd should
>>>> do to reply more quickly when there's no host or nfs
>>>> service principal in the keytab?
>>> I don't think so... unless we start caching negative
>>> negative response or something like which is way
>>> overkill especially since the problem is solved
>>> by not starting rpc.gssd.
>>
>> I'd like to understand why this upcall, which should be
>> equivalent to a no-op, is not returning an immediate
>> answer. Three of these in a row shouldn't take more than
>> a dozen milliseconds.
> It looks like, from the systlog timestamps, each upcall
> is taking a ~1 sec.
>
>>
>> How long does the upcall take when there is a service
>> principal versus how long it takes when there isn't one?
>> Try running gssd under strace to get some timings.
> the key tab does have a nfs/hosname@REALM entry. So the
> call to the KDC is probably failing... which
> could be construed as a misconfiguration, but
> that misconfiguration should not even come into
> play with sec=sys mounts... IMHO...

I disagree, of course. sec=sys means the client is not going
to use Kerberos to authenticate individual user requests,
and users don't need a Kerberos ticket to access their files.
That's still the case.

I'm not aware of any promise that sec=sys means there is
no Kerberos within 50 miles of that mount.

If there are valid keytabs on both systems, they need to
be set up correctly. If there's a misconfiguration, then
gssd needs to report it precisely instead of time out.
And it's just as easy to add a service principal to a keytab
as it is to disable a systemd service in that case.


>> Is gssd waiting for syslog or something?
> No... its just failing to get the machine creds for root

Clearly more is going on than that, and so far we have only
some speculation. Can you provide an strace of rpc.gssd or
a network capture so we can confirm what's going on?


> [snip]
>
>>> Which does work and will still work... but I'm thinking it is
>>> much similar to disable the service via systemd command
>>> systemctl disable rpc-gssd
>>>
>>> than creating and editing those .conf files.
>>
>> This should all be automatic, IMO.
>>
>> On Solaris, drop in a keytab and a krb5.conf, and add sec=krb5
>> to your mounts. No reboot, nothing to restart. Linux should be
>> that simple.
> The only extra step with Linux is to 'sysctmctl start rpc-gssd'
> I don't there is much would can do about that....

Sure there is. Leave gssd running, and make sure it can respond
quickly in every reasonable case. :-p


> But of
> course... Patches are always welcomed!! 8-)
>
> TBL... When kerberos is configured correctly for NFS everything
> works just fine. When kerberos is configured, but not for NFS,
> causes delays on all NFS mounts.

This convinces me even more that there is a gssd issue here.


> Today, there is a method to stop rpc-gssd from blindly starting
> when kerberos is configured to eliminate that delay.

I can fix my broken TV by not turning it on, and I don't
notice the problem. But the problem is still there any
time I want to watch TV.

The problem is not fixed by disabling gssd, it's just
hidden in some cases.


> This patch just tweaking that method to make things easier.

It makes one thing easier, and other things more difficult.
As a community, I thought our goal was to make Kerberos
easier to use, not easier to turn off.


> To address your concern about covering up a bug. I just don't
> see it... The code is doing exactly what its asked to do.
> By default the kernel asks krb5i context (when rpc.gssd
> is run). rpc.gssd looking for a principle in the key tab,
> when found the KDC is called...
>
> Everything is working just like it should and it is
> failing just like it should. I'm just trying to
> eliminate all this process when not needed, in
> an easier way..

I'm not even sure now what the use case is. The client has
proper principals, but the server doesn't? The server
should refuse the init sec context immediately. Is gssd
even running on the server?

Suppose there are a thousand clients and one broken
server. An administrator would fix that one server by
adding an extra service principal, rather than log
into a thousand clients to change a setting on each.

Suppose your client wants both sys and krb5 mounts of
a group of servers, and some are "misconfigured."
You have to enable gssd on the client but there are still
delays on the sec=sys mounts.

In fact, I think that's going to be pretty common. Why add
an NFS service principal on a client if you don't expect
to use sec=krb5 some of the time?



2016-06-28 14:27:57

by Steve Dickson

Subject: Re: [PATCH 1/1 v2] systemd: Only start the gssd daemons when they are enabled

Again, sorry for the delay... That darn flux capacitor broke... again!!! :-)

On 06/23/2016 09:30 PM, Chuck Lever wrote:
>
>> On Jun 23, 2016, at 11:57 AM, Steve Dickson <[email protected]> wrote:

[snip]

>> the key tab does have a nfs/hosname@REALM entry. So the
>> call to the KDC is probably failing... which
>> could be construed as a misconfiguration, but
>> that misconfiguration should not even come into
>> play with sec=sys mounts... IMHO...
>
> I disagree, of course. sec=sys means the client is not going
> to use Kerberos to authenticate individual user requests,
> and users don't need a Kerberos ticket to access their files.
> That's still the case.
>
> I'm not aware of any promise that sec=sys means there is
> no Kerberos within 50 miles of that mount.
I think that is the assumption... no Kerberos will be
needed for sec=sys mounts. At least not when Kerberos is
not configured for NFS.

>
> If there are valid keytabs on both systems, they need to
> be set up correctly. If there's a misconfiguration, then
> gssd needs to report it precisely instead of time out.
> And it's just as easy to add a service principal to a keytab
> as it is to disable a systemd service in that case.
I think it's more straightforward to disable a service
that is not needed than to have to add a principal to a
keytab for a service that's not being used or needed.

>
>
>>> Is gssd waiting for syslog or something?
>> No... its just failing to get the machine creds for root
>
> Clearly more is going on than that, and so far we have only
> some speculation. Can you provide an strace of rpc.gssd or
> a network capture so we can confirm what's going on?
Yes... Yes... and Yes.. I added you to the bz...

>
>
>> [snip]
>>
>>>> Which does work and will still work... but I'm thinking it is
>>>> much similar to disable the service via systemd command
>>>> systemctl disable rpc-gssd
>>>>
>>>> than creating and editing those .conf files.
>>>
>>> This should all be automatic, IMO.
>>>
>>> On Solaris, drop in a keytab and a krb5.conf, and add sec=krb5
>>> to your mounts. No reboot, nothing to restart. Linux should be
>>> that simple.
>> The only extra step with Linux is to 'sysctmctl start rpc-gssd'
>> I don't there is much would can do about that....
>
> Sure there is. Leave gssd running, and make sure it can respond
> quickly in every reasonable case. :-p
>
>
>> But of
>> course... Patches are always welcomed!! 8-)
>>
>> TBL... When Kerberos is configured correctly for NFS, everything
>> works just fine. When Kerberos is configured, but not for NFS,
>> it causes delays on all NFS mounts.
>
> This convinces me even more that there is a gssd issue here.
>
>
>> Today, there is a method to stop rpc-gssd from blindly starting
>> when kerberos is configured to eliminate that delay.
>
> I can fix my broken TV by not turning it on, and I don't
> notice the problem. But the problem is still there any
> time I want to watch TV.
>
> The problem is not fixed by disabling gssd, it's just
> hidden in some cases.
I agree with this 100%... All I'm saying is there should be
a way to disable it when the daemon is not needed or used.

Having it automatically started just because there is a
keytab at first seemed like a good idea, but it turns out
people really don't want miscellaneous daemons running.
Case in point: gssproxy automatically comes up, but there
is a way to disable it. With rpc.gssd there is no easy way.
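For reference, the mechanism in this patch is nothing exotic: each
unit gains an [Install] section so the normal enable/disable verbs
apply. A rough sketch (the WantedBy target below is my guess, not
the literal patch text; the keytab condition exists today):

```
# rpc-gssd.service (sketch)
[Unit]
ConditionPathExists=/etc/krb5.keytab

[Install]
WantedBy=nfs-client.target
```

With that in place, per the updated README, a site that wants the
keytab present but not the daemons just runs
'systemctl disable rpc-gssd' and 'systemctl disable rpc-svcgssd'.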

>
>
>> This patch is just tweaking that method to make things easier.
>
> It makes one thing easier, and other things more difficult.
> As a community, I thought our goal was to make Kerberos
> easier to use, not easier to turn off.
Again I can't agree with you more! But this is the case
where Kerberos is *not* being used for NFS... we should
make that case work as well...

>
>
>> To address your concern about covering up a bug: I just don't
>> see it... The code is doing exactly what it's asked to do.
>> By default the kernel asks for a krb5i context (when rpc.gssd
>> is run). rpc.gssd looks for a principal in the keytab;
>> when one is found, the KDC is called...
>>
>> Everything is working just like it should and it is
>> failing just like it should. I'm just trying to
>> eliminate all this process when not needed, in
>> an easier way..
>
> I'm not even sure now what the use case is. The client has
> proper principals, but the server doesn't? The server
> should refuse the init sec context immediately. Is gssd
> even running on the server?
No they don't because they are not using Kerberos for NFS...

So I guess this is what we are saying:

If you want to use Kerberos for anything at all,
you must configure it for NFS for your clients
to work properly... I'm not sure we really want to
say this.

>
> Suppose there are a thousand clients and one broken
> server. An administrator would fix that one server by
> adding an extra service principal, rather than log
> into a thousand clients to change a setting on each.
>
> Suppose your client wants both sys and krb5 mounts of
> a group of servers, and some are "misconfigured."
> You have to enable gssd on the client but there are still
> delays on the sec=sys mounts.
In both these cases you are assuming Kerberos mounts
are being used and so Kerberos should be configured
for NFS. That is just not the case.

>
> In fact, I think that's going to be pretty common. Why add
> an NFS service principal on a client if you don't expect
> to use sec=krb5 some of the time?
In that case adding the principal does make sense. But...

Why *must* you add a principal when you know only sec=sys
mounts will be used?

steved.

2016-06-28 16:28:01

by Chuck Lever

Subject: Re: [PATCH 1/1 v2] systemd: Only start the gssd daemons when they are enabled


> On Jun 28, 2016, at 10:27 AM, Steve Dickson <[email protected]> wrote:
>
> Again, sorry for the delay... That darn flux capacitor broke... again!!! :-)
>
> On 06/23/2016 09:30 PM, Chuck Lever wrote:
>>
>>> On Jun 23, 2016, at 11:57 AM, Steve Dickson <[email protected]> wrote:
>
> [snip]
>
>>> the keytab does have a nfs/hostname@REALM entry. So the
>>> call to the KDC is probably failing... which
>>> could be construed as a misconfiguration, but
>>> that misconfiguration should not even come into
>>> play with sec=sys mounts... IMHO...
>>
>> I disagree, of course. sec=sys means the client is not going
>> to use Kerberos to authenticate individual user requests,
>> and users don't need a Kerberos ticket to access their files.
>> That's still the case.
>>
>> I'm not aware of any promise that sec=sys means there is
>> no Kerberos within 50 miles of that mount.
> I think that is the assumption... No Kerberos will be
> needed for sec=sys mounts. It's not needed when Kerberos
> is not configured.

NFSv3 sec=sys happens to mean that no Kerberos is needed.
This hasn't changed either.

NFSv4 sec=sys is different. Just like NFSv4 ACLs, and
NFSv4 ID mapping, and NFSv4 locking, and so on.

Note though that Kerberos isn't needed for NFSv4 sec=sys
even when there is a keytab. The client negotiates and
operates without it.


>> If there are valid keytabs on both systems, they need to
>> be set up correctly. If there's a misconfiguration, then
>> gssd needs to report it precisely instead of time out.
>> And it's just as easy to add a service principal to a keytab
>> as it is to disable a systemd service in that case.
> I think its more straightforward to disable a service
> that is not needed than to have to add a principal to a
> keytab for a service that's not being used or needed.

IMO automating NFS setup so that it chooses the most
secure possible settings without intervention is the
best possible solution.


>>>> Is gssd waiting for syslog or something?
>>> No... its just failing to get the machine creds for root
>>
>> Clearly more is going on than that, and so far we have only
>> some speculation. Can you provide an strace of rpc.gssd or
>> a network capture so we can confirm what's going on?
> Yes... Yes... and Yes.. I added you to the bz...

Thanks! I'll have a look at it.


>>> [snip]
>>>
>>>>> Which does work and will still work... but I'm thinking it is
>>>>> much simpler to disable the service via the systemd command
>>>>> systemctl disable rpc-gssd
>>>>>
>>>>> than creating and editing those .conf files.
>>>>
>>>> This should all be automatic, IMO.
>>>>
>>>> On Solaris, drop in a keytab and a krb5.conf, and add sec=krb5
>>>> to your mounts. No reboot, nothing to restart. Linux should be
>>>> that simple.
>>> The only extra step with Linux is to 'systemctl start rpc-gssd'
>>> I don't think there is much we can do about that....
>>
>> Sure there is. Leave gssd running, and make sure it can respond
>> quickly in every reasonable case. :-p
>>
>>
>>> But of
>>> course... Patches are always welcomed!! 8-)
>>>
>>> TBL... When kerberos is configured correctly for NFS everything
>>> works just fine. When kerberos is configured, but not for NFS,
>>> causes delays on all NFS mounts.
>>
>> This convinces me even more that there is a gssd issue here.
>>
>>
>>> Today, there is a method to stop rpc-gssd from blindly starting
>>> when kerberos is configured to eliminate that delay.
>>
>> I can fix my broken TV by not turning it on, and I don't
>> notice the problem. But the problem is still there any
>> time I want to watch TV.
>>
>> The problem is not fixed by disabling gssd, it's just
>> hidden in some cases.
> I agree with this 100%... All I'm saying is there should be
> a way to disable it when the daemon is not needed or used.

NFSv4 sec=sys *does* use Kerberos, when it is available.
It has for years.

Documentation should be updated to state that if Kerberos
is configured on clients, they will attempt to use it to
manage some operations that are common to all NFSv4 mount
points on that client, even when a mount point uses sec=sys.

Kerberos will be used for user authentication only if the
client administrator has not specified a sec= setting, but
the server export allows the use of Kerberos; or if the
client administrator has specified a sec=krb5, sec=krb5i,
or sec=krb5p setting.

The reason for using Kerberos for common operations is
that a client may have just one lease management principal.
If the client uses sec=sys and sec=krb5 mounts, and the
sec=sys mount is done first, then lease management would use
sys as well. The client cannot change this principal after
it has established a lease and files are open.

A subsequent sec=krb5 mount will also use sec=sys for
lease management. This will be surprising and insecure
behavior. Therefore, all mounts from this client attempt
to set up a krb5 lease management transport.
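To make the hazard concrete, consider a hypothetical /etc/fstab
(server name and paths are placeholders) where both flavors are
mounted from the same server:

```
# /etc/fstab (hypothetical)
nfs1.example.com:/export/scratch  /scratch  nfs4  sec=sys   0 0
nfs1.example.com:/export/secure   /secure   nfs4  sec=krb5  0 0
```

If the /scratch mount happens first and lease management were
allowed to settle on sys, the /secure mount would be stuck with
sys-protected lease management too.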

The server should have an nfs/ service principal. It
doesn't _require_ one, but it's a best practice to have
one in place.

Administrators that have Kerberos available should use
it. There's no overhead to enabling it on NFS servers,
as long as the list of security flavors the server
returns for each export does not include Kerberos
flavors.


> Having it automatically started just because there is a
> keytab at first seemed like a good idea, but it turns out
> people really don't want miscellaneous daemons running.
> Case in point: gssproxy automatically comes up, but there
> is a way to disable it. With rpc.gssd there is no easy way.

There are good reasons to disable daemons:

- The daemon consumes a lot of resources.
- The daemon exposes an attack surface.

gssd does neither.

There are good reasons not to disable daemons:

- It enables simpler administration.
- It keeps the test matrix narrow (because you
have to test just one configuration, not
multiple ones: gssd enabled, gssd disabled,
and so on).

Always enabling gssd provides both of these benefits.


>>> This patch just tweaking that method to make things easier.
>>
>> It makes one thing easier, and other things more difficult.
>> As a community, I thought our goal was to make Kerberos
>> easier to use, not easier to turn off.
> Again I can't agree with you more! But this is the case
> where Kerberos is *not* being used for NFS... we should
> make that case work as well...

Agreed.

But NFSv4 sec=sys *does* use Kerberos when Kerberos is
configured on the system. It's a fact, and we now need to
make it convenient and natural and bug-free. The choice is
between increasing security and just making it work, or
adding one more knob that administrators have to Google for.


>>> To address your concern about covering up a bug. I just don't
>>> see it... The code is doing exactly what its asked to do.
>>> By default the kernel asks krb5i context (when rpc.gssd
>>> is run). rpc.gssd looking for a principle in the key tab,
>>> when found the KDC is called...
>>>
>>> Everything is working just like it should and it is
>>> failing just like it should. I'm just trying to
>>> eliminate all this process when not needed, in
>>> an easier way..
>>
>> I'm not even sure now what the use case is. The client has
>> proper principals, but the server doesn't? The server
>> should refuse the init sec context immediately. Is gssd
>> even running on the server?
> No they don't because they are not using Kerberos for NFS...

OK, let's state clearly what's going on here:


The client has a host/ principal. gssd is started
automatically.
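Concretely, the client keytab in this scenario might look like
this (illustrative output; host name and realm are placeholders):

```
# klist -k /etc/krb5.keytab
Keytab name: FILE:/etc/krb5.keytab
KVNO Principal
---- ------------------------------------
   2 host/client.example.com@EXAMPLE.COM
```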


The server has what?

If the server has a keytab and an nfs/ principal,
gss-proxy should be running, and there are no delays.

If the server has a keytab and no nfs/ principal,
gss-proxy should be running, and any init sec
context should fail immediately. There should be no
delay. (If there is a delay, that needs to be
troubleshot).

If the server does not have a keytab, gss-proxy will
not be running, and NFSv4 clients will have to sense
this. It takes a moment for each sniff. Otherwise,
there's no operational difference.


I'm assuming then that the problem is that Kerberos
is not set up on the _server_. Can you confirm this?

Also, this negotiation should be done only during
the first contact of each server after a client
reboot, thus the delay happens only during the first
mount, not during subsequent ones. Can that also be
confirmed?


> So I guess this is what we are saying:
>
> If you want to use Kerberos for anything at all,
> you must configure it for NFS for your clients
> to work properly... I'm not sure we really want to
> say this.

Well, the clients are working properly without the
server principal in place. They just have an extra
delay at mount time. (you yourself pointed out in
an earlier e-mail that the client is doing everything
correctly, and no mention has been made of any other
operational issue).

We should encourage customers to set up in the most
secure way possible. In this case:

- Kerberos is already available in the environment

- It's not _required_, only _recommended_, for the server
to enable Kerberos (clients can still use sec=sys
without it), but it's a best practice

I'm guessing that if gssd and gss-proxy are running on
the server all the time, even when there is no keytab,
that delay should go away for everyone. So:

- Always run a gssd service on servers that export NFSv4
(I assume this will address the delay problem)

- Recommend the NFS server be provisioned with an nfs/
principal, and explicitly specify sec=sys on exports
to prevent clients from negotiating an unwanted Kerberos
security setting
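The second recommendation is a one-line change on the server
(path and client list are placeholders):

```
# /etc/exports (hypothetical)
/export  *(rw,sec=sys)
```

With sec=sys pinned on the export, clients have no Kerberos
flavor to negotiate for that export.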

I far prefer these fixes to adding another administrative
setting on the client. It encourages better security, and
it addresses the problem for all NFS clients that might
want to try using Kerberos against Linux NFS servers, for
whatever reason.


>> Suppose there are a thousand clients and one broken
>> server. An administrator would fix that one server by
>> adding an extra service principal, rather than log
>> into a thousand clients to change a setting on each.
>>
>> Suppose your client wants both sys and krb5 mounts of
>> a group of servers, and some are "misconfigured."
>> You have to enable gssd on the client but there are still
>> delays on the sec=sys mounts.
> In both these cases you are assuming Kerberos mounts
> are being used and so Kerberos should be configured
> for NFS. That is just not the case.

My assumption is that administrators would prefer automatic
client set up, and good security by default.

There's no way to know in advance whether an administrator
will want sec=sys and sec=krb5 mounts on the same system.
/etc/fstab can be changed at any time, mounts can be done
by hand, or the administrator can add or remove principals
from /etc/krb5.keytab.

Our clients have to work when there are just sec=sys
mounts, or when there are sec=sys and sec=krb5 mounts.
They must allow on-demand configuration of sec=krb5. They
must attempt to provide the best possible level of security
at all times.

The out-of-the-shrinkwrap configuration must assume a mix
of capabilities.


>> In fact, I think that's going to be pretty common. Why add
>> an NFS service principal on a client if you don't expect
>> to use sec=krb5 some of the time?
> In that case adding the principal does make sense. But...
>
> Why *must* you add a principal when you know only sec=sys
> mounts will be used?

Explained in detail above (and this is only for NFSv4, and
is not at all a _must_). But in summary:

A client will attempt to use Kerberos for NFSv4 sec=sys when
there is a host/ or nfs/ principal in its keytab. That needs
to be documented.

Our _recommendation_ is that the server be provisioned with
an nfs/ principal as well when NFSv4 is used in an environment
where Kerberos is present. This eliminates a costly per-mount
security negotiation, and enables cryptographically strong
authentication of each client that mounts that server. NFSv4
sec=sys works properly otherwise without this principal.
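And for anyone following that recommendation, provisioning the
principal is a two-step with MIT Kerberos (host name and realm
are placeholders; your kadmin invocation may differ):

```
# on the KDC, or via kadmin with admin credentials:
kadmin -q "addprinc -randkey nfs/server.example.com@EXAMPLE.COM"

# on the NFS server, pull the key into its keytab:
kadmin -q "ktadd -k /etc/krb5.keytab nfs/server.example.com@EXAMPLE.COM"
```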


--
Chuck Lever




2016-06-28 17:23:30

by Weston Andros Adamson

Subject: Re: [PATCH 1/1 v2] systemd: Only start the gssd daemons when they are enabled


> On Jun 28, 2016, at 12:27 PM, Chuck Lever <[email protected]> wrote:
>
>>
>> On Jun 28, 2016, at 10:27 AM, Steve Dickson <[email protected]> wrote:
>>
>> Again, sorry for the delay... That darn flux capacitor broke... again!!! :-)
>>
>> On 06/23/2016 09:30 PM, Chuck Lever wrote:
>>>
>>>> On Jun 23, 2016, at 11:57 AM, Steve Dickson <[email protected]> wrote:
>>
>> [snip]
>>
>>>> the keytab does have a nfs/hostname@REALM entry. So the
>>>> call to the KDC is probably failing... which
>>>> could be construed as a misconfiguration, but
>>>> that misconfiguration should not even come into
>>>> play with sec=sys mounts... IMHO...
>>>
>>> I disagree, of course. sec=sys means the client is not going
>>> to use Kerberos to authenticate individual user requests,
>>> and users don't need a Kerberos ticket to access their files.
>>> That's still the case.
>>>
>>> I'm not aware of any promise that sec=sys means there is
>>> no Kerberos within 50 miles of that mount.
>> I think that is the assumption... No Kerberos will be
>> needed for sec=sys mounts. It's not needed when Kerberos
>> is not configured.
>
> NFSv3 sec=sys happens to mean that no Kerberos is needed.
> This hasn't changed either.
>
> NFSv4 sec=sys is different. Just like NFSv4 ACLs, and
> NFSv4 ID mapping, and NFSv4 locking, and so on.
>
> Note though that Kerberos isn't needed for NFSv4 sec=sys
> even when there is a keytab. The client negotiates and
> operates without it.
>
>
>>> If there are valid keytabs on both systems, they need to
>>> be set up correctly. If there's a misconfiguration, then
>>> gssd needs to report it precisely instead of time out.
>>> And it's just as easy to add a service principal to a keytab
>>> as it is to disable a systemd service in that case.
>> I think its more straightforward to disable a service
>> that is not needed than to have to add a principal to a
>> keytab for a service that's not being used or needed.
>
> IMO automating NFS setup so that it chooses the most
> secure possible settings without intervention is the
> best possible solution.
>
>
>>>>> Is gssd waiting for syslog or something?
>>>> No... its just failing to get the machine creds for root
>>>
>>> Clearly more is going on than that, and so far we have only
>>> some speculation. Can you provide an strace of rpc.gssd or
>>> a network capture so we can confirm what's going on?
>> Yes... Yes... and Yes.. I added you to the bz...
>
> Thanks! I'll have a look at it.
>
>
>>>> [snip]
>>>>
>>>>>> Which does work and will still work... but I'm thinking it is
>>>>>> much simpler to disable the service via the systemd command
>>>>>> systemctl disable rpc-gssd
>>>>>>
>>>>>> than creating and editing those .conf files.
>>>>>
>>>>> This should all be automatic, IMO.
>>>>>
>>>>> On Solaris, drop in a keytab and a krb5.conf, and add sec=krb5
>>>>> to your mounts. No reboot, nothing to restart. Linux should be
>>>>> that simple.
>>>> The only extra step with Linux is to 'systemctl start rpc-gssd'
>>>> I don't think there is much we can do about that....
>>>
>>> Sure there is. Leave gssd running, and make sure it can respond
>>> quickly in every reasonable case. :-p
>>>
>>>
>>>> But of
>>>> course... Patches are always welcomed!! 8-)
>>>>
>>>> TBL... When kerberos is configured correctly for NFS everything
>>>> works just fine. When kerberos is configured, but not for NFS,
>>>> causes delays on all NFS mounts.
>>>
>>> This convinces me even more that there is a gssd issue here.
>>>
>>>
>>>> Today, there is a method to stop rpc-gssd from blindly starting
>>>> when kerberos is configured to eliminate that delay.
>>>
>>> I can fix my broken TV by not turning it on, and I don't
>>> notice the problem. But the problem is still there any
>>> time I want to watch TV.
>>>
>>> The problem is not fixed by disabling gssd, it's just
>>> hidden in some cases.
>> I agree with this 100%... All I'm saying is there should be
>> a way to disable it when the daemon is not needed or used.
>
> NFSv4 sec=sys *does* use Kerberos, when it is available.
> It has for years.
>
> Documentation should be updated to state that if Kerberos
> is configured on clients, they will attempt to use it to
> manage some operations that are common to all NFSv4 mount
> points on that client, even when a mount point uses sec=sys.
>
> Kerberos will be used for user authentication only if the
> client administrator has not specified a sec= setting, but
> the server export allows the use of Kerberos; or if the
> client administrator has specified a sec=krb5, sec=krb5i,
> or sec=krb5p setting.
>
> The reason for using Kerberos for common operations is
> that a client may have just one lease management principal.
> If the client uses sec=sys and sec=krb5 mounts, and the
> sec=sys mount is done first, then lease management would use
> sys as well. The client cannot change this principal after
> it has established a lease and files are open.
>
> A subsequent sec=krb5 mount will also use sec=sys for
> lease management. This will be surprising and insecure
> behavior. Therefore, all mounts from this client attempt
> to set up a krb5 lease management transport.

Chuck,

Thanks for explaining this so well! This definitely should make
its way into documentation - we should have added something
like this a long time ago.

I'm definitely guilty of having figured out why the client worked
this way and not documenting it...

-dros

>
> The server should have an nfs/ service principal. It
> doesn't _require_ one, but it's a best practice to have
> one in place.
>
> Administrators that have Kerberos available should use
> it. There's no overhead to enabling it on NFS servers,
> as long as the list of security flavors the server
> returns for each export does not include Kerberos
> flavors.
>
>
>> Having it automatically started just because there is a
>> keytab at first seemed like a good idea, but it turns out
>> people really don't want miscellaneous daemons running.
>> Case in point: gssproxy automatically comes up, but there
>> is a way to disable it. With rpc.gssd there is no easy way.
>
> There are good reasons to disable daemons:
>
> - The daemon consumes a lot of resources.
> - The daemon exposes an attack surface.
>
> gssd does neither.
>
> There are good reasons not to disable daemons:
>
> - It enables simpler administration.
> - It keeps the test matrix narrow (because you
> have to test just one configuration, not
> multiple ones: gssd enabled, gssd disabled,
> and so on).
>
> Always enabling gssd provides both of these benefits.
>
>
>>>> This patch just tweaking that method to make things easier.
>>>
>>> It makes one thing easier, and other things more difficult.
>>> As a community, I thought our goal was to make Kerberos
>>> easier to use, not easier to turn off.
>> Again I can't agree with you more! But this is the case
>> where Kerberos is *not* being used for NFS... we should
>> make that case work as well...
>
> Agreed.
>
> But NFSv4 sec=sys *does* use Kerberos when Kerberos is
> configured on the system. It's a fact, and we now need to
> make it convenient and natural and bug-free. The choice is
> between increasing security and just making it work, or
> adding one more knob that administrators have to Google for.
>
>
>>>> To address your concern about covering up a bug. I just don't
>>>> see it... The code is doing exactly what its asked to do.
>>>> By default the kernel asks krb5i context (when rpc.gssd
>>>> is run). rpc.gssd looking for a principle in the key tab,
>>>> when found the KDC is called...
>>>>
>>>> Everything is working just like it should and it is
>>>> failing just like it should. I'm just trying to
>>>> eliminate all this process when not needed, in
>>>> an easier way..
>>>
>>> I'm not even sure now what the use case is. The client has
>>> proper principals, but the server doesn't? The server
>>> should refuse the init sec context immediately. Is gssd
>>> even running on the server?
>> No they don't because they are not using Kerberos for NFS...
>
> OK, let's state clearly what's going on here:
>
>
> The client has a host/ principal. gssd is started
> automatically.
>
>
> The server has what?
>
> If the server has a keytab and an nfs/ principal,
> gss-proxy should be running, and there are no delays.
>
> If the server has a keytab and no nfs/ principal,
> gss-proxy should be running, and any init sec
> context should fail immediately. There should be no
> delay. (If there is a delay, that needs to be
> troubleshot).
>
> If the server does not have a keytab, gss-proxy will
> not be running, and NFSv4 clients will have to sense
> this. It takes a moment for each sniff. Otherwise,
> there's no operational difference.
>
>
> I'm assuming then that the problem is that Kerberos
> is not set up on the _server_. Can you confirm this?
>
> Also, this negotiation should be done only during
> the first contact of each server after a client
> reboot, thus the delay happens only during the first
> mount, not during subsequent ones. Can that also be
> confirmed?
>
>
>> So I guess this is what we are saying:
>>
>> If you want to use Kerberos for anything at all,
>> you must configure it for NFS for your clients
>> to work properly... I'm not sure we really want to
>> say this.
>
> Well, the clients are working properly without the
> server principal in place. They just have an extra
> delay at mount time. (you yourself pointed out in
> an earlier e-mail that the client is doing everything
> correctly, and no mention has been made of any other
> operational issue).
>
> We should encourage customers to set up in the most
> secure way possible. In this case:
>
> - Kerberos is already available in the environment
>
> - It's not _required_ only _recommended_ (clients can
> still use sec=sys without it) for the server to
> enable Kerberos, but it's a best practice
>
> I'm guessing that if gssd and gss-proxy are running on
> the server all the time, even when there is no keytab,
> that delay should go away for everyone. So:
>
> - Always run a gssd service on servers that export NFSv4
> (I assume this will address the delay problem)
>
> - Recommend the NFS server be provisioned with an nfs/
> principal, and explicitly specify sec=sys on exports
> to prevent clients from negotiating an unwanted Kerberos
> security setting
>
> I far prefer these fixes to adding another administrative
> setting on the client. It encourages better security, and
> it addresses the problem for all NFS clients that might
> want to try using Kerberos against Linux NFS servers, for
> whatever reason.
>
>
>>> Suppose there are a thousand clients and one broken
>>> server. An administrator would fix that one server by
>>> adding an extra service principal, rather than log
>>> into a thousand clients to change a setting on each.
>>>
>>> Suppose your client wants both sys and krb5 mounts of
>>> a group of servers, and some are "misconfigured."
>>> You have to enable gssd on the client but there are still
>>> delays on the sec=sys mounts.
>> In both these cases you are assuming Kerberos mounts
>> are being used and so Kerberos should be configured
>> for NFS. That is just not the case.
>
> My assumption is that administrators would prefer automatic
> client set up, and good security by default.
>
> There's no way to know in advance whether an administrator
> will want sec=sys and sec=krb5 mounts on the same system.
> /etc/fstab can be changed at any time, mounts can be done
> by hand, or the administrator can add or remove principals
> from /etc/krb5.keytab.
>
> Our clients have to work when there are just sec=sys
> mounts, or when there are sec=sys and sec=krb5 mounts.
> They must allow on-demand configuration of sec=krb5. They
> must attempt to provide the best possible level of security
> at all times.
>
> The out-of-the-shrinkwrap configuration must assume a mix
> of capabilities.
>
>
>>> In fact, I think that's going to be pretty common. Why add
>>> an NFS service principal on a client if you don't expect
>>> to use sec=krb5 some of the time?
>> In that case adding the principal does make sense. But...
>>
>> Why *must* you add a principal when you know only sec=sys
>> mounts will be used?
>
> Explained in detail above (and this is only for NFSv4, and
> is not at all a _must_). But in summary:
>
> A client will attempt to use Kerberos for NFSv4 sec=sys when
> there is a host/ or nfs/ principal in its keytab. That needs
> to be documented.
>
> Our _recommendation_ is that the server be provisioned with
> an nfs/ principal as well when NFSv4 is used in an environment
> where Kerberos is present. This eliminates a costly per-mount
> security negotiation, and enables cryptographically strong
> authentication of each client that mounts that server. NFSv4
> sec=sys works properly otherwise without this principal.
>
>
> --
> Chuck Lever
>
>
>


2016-06-28 18:11:50

by Steve Dickson

Subject: Re: [PATCH 1/1 v2] systemd: Only start the gssd daemons when they are enabled



On 06/28/2016 12:27 PM, Chuck Lever wrote:
>
>> On Jun 28, 2016, at 10:27 AM, Steve Dickson <[email protected]> wrote:
>>
>> Again, sorry for the delay... That darn flux capacitor broke... again!!! :-)
>>
>> On 06/23/2016 09:30 PM, Chuck Lever wrote:
>>>
>>>> On Jun 23, 2016, at 11:57 AM, Steve Dickson <[email protected]> wrote:
>>
>> [snip]
>>
>>>> the keytab does have a nfs/hostname@REALM entry. So the
>>>> call to the KDC is probably failing... which
>>>> could be construed as a misconfiguration, but
>>>> that misconfiguration should not even come into
>>>> play with sec=sys mounts... IMHO...
>>>
>>> I disagree, of course. sec=sys means the client is not going
>>> to use Kerberos to authenticate individual user requests,
>>> and users don't need a Kerberos ticket to access their files.
>>> That's still the case.
>>>
>>> I'm not aware of any promise that sec=sys means there is
>>> no Kerberos within 50 miles of that mount.
>> I think that is the assumption... No Kerberos will be
>> needed for sec=sys mounts. It's not needed when Kerberos
>> is not configured.
>
> NFSv3 sec=sys happens to mean that no Kerberos is needed.
> This hasn't changed either.
>
> NFSv4 sec=sys is different. Just like NFSv4 ACLs, and
> NFSv4 ID mapping, and NFSv4 locking, and so on.
>
> Note though that Kerberos isn't needed for NFSv4 sec=sys
> even when there is a keytab. The client negotiates and
> operates without it.
If there is a keytab... there will be rpc.gssd running,
which will cause an upcall... and the negotiation starts
with krb5i. So yes, it's not needed, but it will be tried.

>
>
>>> If there are valid keytabs on both systems, they need to
>>> be set up correctly. If there's a misconfiguration, then
>>> gssd needs to report it precisely instead of time out.
>>> And it's just as easy to add a service principal to a keytab
>>> as it is to disable a systemd service in that case.
>> I think its more straightforward to disable a service
>> that is not needed than to have to add a principal to a
>> keytab for a service that's not being used or needed.
>
> IMO automating NFS setup so that it chooses the most
> secure possible settings without intervention is the
> best possible solution.
Sure... now back to the point. ;-)


>>>
>>> The problem is not fixed by disabling gssd, it's just
>>> hidden in some cases.
>> I agree with this 100%... All I'm saying is there should be
>> a way to disable it when the daemon is not needed or used.
>
> NFSv4 sec=sys *does* use Kerberos, when it is available.
> It has for years.
Right... let's define "available": Kerberos is available when
rpc.gssd is running. When rpc.gssd is not running, Kerberos is
not available.

>
> Documentation should be updated to state that if Kerberos
> is configured on clients, they will attempt to use it to
> manage some operations that are common to all NFSv4 mount
> points on that client, even when a mount point uses sec=sys.
>
> Kerberos will be used for user authentication only if the
> client administrator has not specified a sec= setting, but
> the server export allows the use of Kerberos; or if the
> client administrator has specified a sec=krb5, sec=krb5i,
> or sec=krb5p setting.
>
> The reason for using Kerberos for common operations is
> that a client may have just one lease management principal.
> If the client uses sec=sys and sec=krb5 mounts, and the
> sec=sys mount is done first, then lease management would use
> sys as well. The client cannot change this principal after
> it has established a lease and files are open.
>
> A subsequent sec=krb5 mount will also use sec=sys for
> lease management. This will be surprising and insecure
> behavior. Therefore, all mounts from this client attempt
> to set up a krb5 lease management transport.
>
> The server should have an nfs/ service principal. It
> doesn't _require_ one, but it's a best practice to have
> one in place.
Yeah our documentation is lacking in this area...
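For completeness, provisioning the nfs/ service principal Chuck describes looks roughly like this with MIT Kerberos (a sketch, not runnable outside a Kerberos realm; the hostname and realm are placeholders):

```shell
# On the KDC: create the service principal with a random key.
kadmin.local -q "addprinc -randkey nfs/server.example.com@EXAMPLE.COM"

# On the NFS server: pull its key into the default keytab.
kadmin -q "ktadd -k /etc/krb5.keytab nfs/server.example.com@EXAMPLE.COM"
```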

>
> Administrators that have Kerberos available should use
> it. There's no overhead to enabling it on NFS servers,
> as long as the list of security flavors the server
> returns for each export does not include Kerberos
> flavors.
Admins are going to do what they want no matter
what we say... IMHO...

>
>
>> Having it automatically started just because there is a
>> keytab, at first, I thought was a good idea; now it
>> turns out people really don't want miscellaneous
>> daemons running. Case in point: gssproxy... It automatically
>> comes up, but there is a way to disable it. With rpc.gssd
>> there is not (easily).
>
> There are good reasons to disable daemons:
>
> - The daemon consumes a lot of resources.
> - The daemon exposes an attack surface.
>
> gssd does neither.
How about not needed? No rpc.gssd... no upcall... no problem... ;-)

>
> There are good reasons not to disable daemons:
I'm assuming you meant "to disable" or "not to enable" here.

>
> - It enables simpler administration.
> - It keeps the test matrix narrow (because you
> have to test just one configuration, not
> multiple ones: gssd enabled, gssd disabled,
> and so on).
>
> Always enabling gssd provides both of these benefits.
This is a production environment, so there is no testing;
but simpler admin is never a bad thing.

>
>
>>>> This patch is just tweaking that method to make things easier.
>>>
>>> It makes one thing easier, and other things more difficult.
>>> As a community, I thought our goal was to make Kerberos
>>> easier to use, not easier to turn off.
>> Again, I can't agree with you more! But this is the case
>> where Kerberos is *not* being used for NFS... we should
>> make that case work as well...
>
> Agreed.
>
> But NFSv4 sec=sys *does* use Kerberos when Kerberos is
> configured on the system. It's a fact, and we now need to
> make it convenient and natural and bug-free. The choice is
> between increasing security and just making it work, or
> adding one more knob that administrators have to Google for.
If they do not want to use Kerberos for NFS, whether that is a good
idea or not, we cannot force them to... Or can we?

>
>
>>>> To address your concern about covering up a bug. I just don't
>>>> see it... The code is doing exactly what it's asked to do.
>>>> By default the kernel asks for a krb5i context (when rpc.gssd
>>>> is run). rpc.gssd looks for a principal in the keytab;
>>>> when found, the KDC is called...
>>>>
>>>> Everything is working just like it should and it is
>>>> failing just like it should. I'm just trying to
>>>> eliminate all this process when not needed, in
>>>> an easier way..
>>>
>>> I'm not even sure now what the use case is. The client has
>>> proper principals, but the server doesn't? The server
>>> should refuse the init sec context immediately. Is gssd
>>> even running on the server?
>> No they don't because they are not using Kerberos for NFS...
>
> OK, let's state clearly what's going on here:
>
>
> The client has a host/ principal. gssd is started
> automatically.
>
>
> The server has what?
No info on the server other than it's Linux and the
NFS server is running.

>
> If the server has a keytab and an nfs/ principal,
> gss-proxy should be running, and there are no delays.
In my testing, when gss-proxy is not running the mount
hangs.

>
> If the server has a keytab and no nfs/ principal,
> gss-proxy should be running, and any init sec
> context should fail immediately. There should be no
> delay. (If there is a delay, that needs to be
> troubleshot).
>
> If the server does not have a keytab, gss-proxy will
> not be running, and NFSv4 clients will have to sense
> this. It takes a moment for each sniff. Otherwise,
> there's no operational difference.
>
>
> I'm assuming then that the problem is that Kerberos
> is not set up on the _server_. Can you confirm this?
I'll try... but we should not have to force people to
set up Kerberos on a server they are not going to use.
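The three server-side cases Chuck enumerates above can be boiled down mechanically; here is a small sketch (the state names are invented for illustration) of what the quoted text predicts for each configuration:

```shell
# Sketch: map the server's Kerberos state to the behavior described above.
# "keytab"/"no-keytab" and "nfs-principal"/"no-nfs-principal" are made-up
# labels for this example only.
expected_behavior() {
    case "$1:$2" in
        keytab:nfs-principal)    echo "gss-proxy running, no delay" ;;
        keytab:no-nfs-principal) echo "init sec context fails immediately" ;;
        no-keytab:*)             echo "client must sense the absence (brief delay)" ;;
    esac
}

expected_behavior keytab nfs-principal        # -> gss-proxy running, no delay
expected_behavior no-keytab no-nfs-principal  # -> client must sense the absence (brief delay)
```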

>
> Also, this negotiation should be done only during
> the first contact of each server after a client
> reboot, thus the delay happens only during the first
> mount, not during subsequent ones. Can that also be
> confirmed?
It appears it happens on all of them.

>
>
>> So I guess this is what we are saying:
>>
>> If you want to use Kerberos for anything at all,
>> you must configure it for NFS for your clients
>> to work properly... I'm not sure we really want to
>> say this.
>
> Well, the clients are working properly without the
> server principal in place. They just have an extra
> delay at mount time. (you yourself pointed out in
> an earlier e-mail that the client is doing everything
> correctly, and no mention has been made of any other
> operational issue).
This appeared to be the case.

>
> We should encourage customers to set up in the most
> secure way possible. In this case:
>
> - Kerberos is already available in the environment
>
> - It's not _required_ only _recommended_ (clients can
> still use sec=sys without it) for the server to
> enable Kerberos, but it's a best practice
>
> I'm guessing that if gssd and gss-proxy are running on
> the server all the time, even when there is no keytab,
> that delay should go away for everyone. So:
>
> - Always run a gssd service on servers that export NFSv4
> (I assume this will address the delay problem)
>
> - Recommend the NFS server be provisioned with an nfs/
> principal, and explicitly specify sec=sys on exports
> to prevent clients from negotiating an unwanted Kerberos
> security setting
Or don't start rpc.gssd... ;-)

>
> I far prefer these fixes to adding another administrative
> setting on the client. It encourages better security, and
> it addresses the problem for all NFS clients that might
> want to try using Kerberos against Linux NFS servers, for
> whatever reason.
As you say, we can only recommend... If they don't
want to use secure mounts in a Kerberos environment,
we should not make them, is all I'm saying.

>
>
>>> Suppose there are a thousand clients and one broken
>>> server. An administrator would fix that one server by
>>> adding an extra service principal, rather than log
>>> into a thousand clients to change a setting on each.
>>>
>>> Suppose your client wants both sys and krb5 mounts of
>>> a group of servers, and some are "misconfigured."
>>> You have to enable gssd on the client but there are still
>>> delays on the sec=sys mounts.
>> In both these cases you are assuming Kerberos mounts
>> are being used and so Kerberos should be configured
>> for NFS. That is just not the case.
>
> My assumption is that administrators would prefer automatic
> client set up, and good security by default.
I don't think we can make any assumptions about what admins want.
They want strong security, but not with NFS... That's
their choice, not ours.

>
> There's no way to know in advance whether an administrator
> will want sec=sys and sec=krb5 mounts on the same system.
> /etc/fstab can be changed at any time, mounts can be done
> by hand, or the administrator can add or remove principals
> from /etc/krb5.keytab.
>
> Our clients have to work when there are just sec=sys
> mounts, or when there are sec=sys and sec=krb5 mounts.
> They must allow on-demand configuration of sec=krb5. They
> must attempt to provide the best possible level of security
> at all times.
>
> The out-of-the-shrinkwrap configuration must assume a mix
> of capabilities.
I agree... And they are... But if they know for a fact that
their client(s) will never want to use secure mounts, and
I'm sure there are a few out there, I see no problem in
not starting a service they will never use.

>
>
>>> In fact, I think that's going to be pretty common. Why add
>>> an NFS service principal on a client if you don't expect
>>> to use sec=krb5 some of the time?
>> In that case adding the principal does make sense. But...
>>
>> Why *must* you add a principal when you know only sec=sys
>> mounts will be used?
>
> Explained in detail above (and this is only for NFSv4, and
> is not at all a _must_). But in summary:
>
> A client will attempt to use Kerberos for NFSv4 sec=sys when
> there is a host/ or nfs/ principal in its keytab. That needs
> to be documented.
>
> Our _recommendation_ is that the server be provisioned with
> an nfs/ principal as well when NFSv4 is used in an environment
> where Kerberos is present. This eliminates a costly per-mount
> security negotiation, and enables cryptographically strong
> authentication of each client that mounts that server. NFSv4
> sec=sys works properly otherwise without this principal.
That was beautifully said... and I agree with all of it...
But the customer is going to turn around and tell me to go pound
sand... because they are not about to touch their server!!! :-)
Especially when all they have to do is disable a service on the client
where the hang is occurring.

steved.

2016-06-28 18:13:00

by Steve Dickson

Subject: Re: [PATCH 1/1 v2] systemd: Only start the gssd daemons when they are enabled



On 06/28/2016 01:23 PM, Weston Andros Adamson wrote:
>
>> On Jun 28, 2016, at 12:27 PM, Chuck Lever <[email protected]> wrote:
>>
>>>
>>> On Jun 28, 2016, at 10:27 AM, Steve Dickson <[email protected]> wrote:
>>>
>>> Again, sorry for the delay... That darn flux capacitor broke... again!!! :-)
>>>
>>> On 06/23/2016 09:30 PM, Chuck Lever wrote:
>>>>
>>>>> On Jun 23, 2016, at 11:57 AM, Steve Dickson <[email protected]> wrote:
>>>
>>> [snip]
>>>
>>>>> the keytab does have a nfs/hostname@REALM entry. So the
>>>>> call to the KDC is probably failing... which
>>>>> could be construed as a misconfiguration, but
>>>>> that misconfiguration should not even come into
>>>>> play with sec=sys mounts... IMHO...
>>>>
>>>> I disagree, of course. sec=sys means the client is not going
>>>> to use Kerberos to authenticate individual user requests,
>>>> and users don't need a Kerberos ticket to access their files.
>>>> That's still the case.
>>>>
>>>> I'm not aware of any promise that sec=sys means there is
>>>> no Kerberos within 50 miles of that mount.
>>> I think that is the assumption... No Kerberos will be
>>> needed for sec=sys mounts. It's not needed when Kerberos is
>>> not configured.
>>
>> NFSv3 sec=sys happens to mean that no Kerberos is needed.
>> This hasn't changed either.
>>
>> NFSv4 sec=sys is different. Just like NFSv4 ACLs, and
>> NFSv4 ID mapping, and NFSv4 locking, and so on.
>>
>> Note though that Kerberos isn't needed for NFSv4 sec=sys
>> even when there is a keytab. The client negotiates and
>> operates without it.
>>
>>
>>>> If there are valid keytabs on both systems, they need to
>>>> be set up correctly. If there's a misconfiguration, then
>>>> gssd needs to report it precisely instead of time out.
>>>> And it's just as easy to add a service principal to a keytab
>>>> as it is to disable a systemd service in that case.
>>> I think it's more straightforward to disable a service
>>> that is not needed than to have to add a principal to a
>>> keytab for a service that's not being used or needed.
>>
>> IMO automating NFS setup so that it chooses the most
>> secure possible settings without intervention is the
>> best possible solution.
>>
>>
>>>>>> Is gssd waiting for syslog or something?
>>>>> No... its just failing to get the machine creds for root
>>>>
>>>> Clearly more is going on than that, and so far we have only
>>>> some speculation. Can you provide an strace of rpc.gssd or
>>>> a network capture so we can confirm what's going on?
>>> Yes... Yes... and Yes.. I added you to the bz...
>>
>> Thanks! I'll have a look at it.
>>
>>
>>>>> [snip]
>>>>>
>>>>>>> Which does work and will still work... but I'm thinking it is
>>>>>>> much simpler to disable the service via the systemd command
>>>>>>> systemctl disable rpc-gssd
>>>>>>>
>>>>>>> than creating and editing those .conf files.
>>>>>>
>>>>>> This should all be automatic, IMO.
>>>>>>
>>>>>> On Solaris, drop in a keytab and a krb5.conf, and add sec=krb5
>>>>>> to your mounts. No reboot, nothing to restart. Linux should be
>>>>>> that simple.
>>>>> The only extra step with Linux is to 'systemctl start rpc-gssd'.
>>>>> I don't think there is much we can do about that....
>>>>
>>>> Sure there is. Leave gssd running, and make sure it can respond
>>>> quickly in every reasonable case. :-p
>>>>
>>>>
>>>>> But of
>>>>> course... Patches are always welcomed!! 8-)
>>>>>
>>>>> TBL... When Kerberos is configured correctly for NFS, everything
>>>>> works just fine. When Kerberos is configured, but not for NFS, it
>>>>> causes delays on all NFS mounts.
>>>>
>>>> This convinces me even more that there is a gssd issue here.
>>>>
>>>>
>>>>> Today, there is a method to stop rpc-gssd from blindly starting
>>>>> when kerberos is configured to eliminate that delay.
>>>>
>>>> I can fix my broken TV by not turning it on, and I don't
>>>> notice the problem. But the problem is still there any
>>>> time I want to watch TV.
>>>>
>>>> The problem is not fixed by disabling gssd, it's just
>>>> hidden in some cases.
>>> I agree with this 100%... All I'm saying is there should be a
>>> way to disable it when the daemon is not needed or used.
>>
>> NFSv4 sec=sys *does* use Kerberos, when it is available.
>> It has for years.
>>
>> Documentation should be updated to state that if Kerberos
>> is configured on clients, they will attempt to use it to
>> manage some operations that are common to all NFSv4 mount
>> points on that client, even when a mount point uses sec=sys.
>>
>> Kerberos will be used for user authentication only if the
>> client administrator has not specified a sec= setting, but
>> the server export allows the use of Kerberos; or if the
>> client administrator has specified a sec=krb5, sec=krb5i,
>> or sec=krb5p setting.
>>
>> The reason for using Kerberos for common operations is
>> that a client may have just one lease management principal.
>> If the client uses sec=sys and sec=krb5 mounts, and the
>> sec=sys mount is done first, then lease management would use
>> sys as well. The client cannot change this principal after
>> it has established a lease and files are open.
>>
>> A subsequent sec=krb5 mount will also use sec=sys for
>> lease management. This will be surprising and insecure
>> behavior. Therefore, all mounts from this client attempt
>> to set up a krb5 lease management transport.
>
> Chuck,
>
> Thanks for explaining this so well! This definitely should make
> its way into documentation - we should have added something
> like this a long time ago.
I agree... Where should it go? The mount.nfs man page?

steved.

2016-06-28 18:19:34

by Chuck Lever

Subject: Re: [PATCH 1/1 v2] systemd: Only start the gssd daemons when they are enabled


> On Jun 28, 2016, at 2:12 PM, Steve Dickson <[email protected]> wrote:
>
>
>
>> On 06/28/2016 01:23 PM, Weston Andros Adamson wrote:
>>
>>>> On Jun 28, 2016, at 12:27 PM, Chuck Lever <[email protected]> wrote:
>>>>
>>>>
>>>> On Jun 28, 2016, at 10:27 AM, Steve Dickson <[email protected]> wrote:
>>>>
>>>> Again, sorry for the delay... That darn flux capacitor broke... again!!! :-)
>>>>
>>>>> On 06/23/2016 09:30 PM, Chuck Lever wrote:
>>>>>
>>>>>> On Jun 23, 2016, at 11:57 AM, Steve Dickson <[email protected]> wrote:
>>>>
>>>> [snip]
>>>>
>>>>>> the keytab does have a nfs/hostname@REALM entry. So the
>>>>>> call to the KDC is probably failing... which
>>>>>> could be construed as a misconfiguration, but
>>>>>> that misconfiguration should not even come into
>>>>>> play with sec=sys mounts... IMHO...
>>>>>
>>>>> I disagree, of course. sec=sys means the client is not going
>>>>> to use Kerberos to authenticate individual user requests,
>>>>> and users don't need a Kerberos ticket to access their files.
>>>>> That's still the case.
>>>>>
>>>>> I'm not aware of any promise that sec=sys means there is
>>>>> no Kerberos within 50 miles of that mount.
>>>> I think that is the assumption... No Kerberos will be
>>>> needed for sec=sys mounts. It's not needed when Kerberos is
>>>> not configured.
>>>
>>> NFSv3 sec=sys happens to mean that no Kerberos is needed.
>>> This hasn't changed either.
>>>
>>> NFSv4 sec=sys is different. Just like NFSv4 ACLs, and
>>> NFSv4 ID mapping, and NFSv4 locking, and so on.
>>>
>>> Note though that Kerberos isn't needed for NFSv4 sec=sys
>>> even when there is a keytab. The client negotiates and
>>> operates without it.
>>>
>>>
>>>>> If there are valid keytabs on both systems, they need to
>>>>> be set up correctly. If there's a misconfiguration, then
>>>>> gssd needs to report it precisely instead of time out.
>>>>> And it's just as easy to add a service principal to a keytab
>>>>> as it is to disable a systemd service in that case.
>>>> I think it's more straightforward to disable a service
>>>> that is not needed than to have to add a principal to a
>>>> keytab for a service that's not being used or needed.
>>>
>>> IMO automating NFS setup so that it chooses the most
>>> secure possible settings without intervention is the
>>> best possible solution.
>>>
>>>
>>>>>>> Is gssd waiting for syslog or something?
>>>>>> No... its just failing to get the machine creds for root
>>>>>
>>>>> Clearly more is going on than that, and so far we have only
>>>>> some speculation. Can you provide an strace of rpc.gssd or
>>>>> a network capture so we can confirm what's going on?
>>>> Yes... Yes... and Yes.. I added you to the bz...
>>>
>>> Thanks! I'll have a look at it.
>>>
>>>
>>>>>> [snip]
>>>>>>
>>>>>>>> Which does work and will still work... but I'm thinking it is
>>>>>>>> much simpler to disable the service via the systemd command
>>>>>>>> systemctl disable rpc-gssd
>>>>>>>>
>>>>>>>> than creating and editing those .conf files.
>>>>>>>
>>>>>>> This should all be automatic, IMO.
>>>>>>>
>>>>>>> On Solaris, drop in a keytab and a krb5.conf, and add sec=krb5
>>>>>>> to your mounts. No reboot, nothing to restart. Linux should be
>>>>>>> that simple.
>>>>>> The only extra step with Linux is to 'systemctl start rpc-gssd'.
>>>>>> I don't think there is much we can do about that....
>>>>>
>>>>> Sure there is. Leave gssd running, and make sure it can respond
>>>>> quickly in every reasonable case. :-p
>>>>>
>>>>>
>>>>>> But of
>>>>>> course... Patches are always welcomed!! 8-)
>>>>>>
>>>>>> TBL... When Kerberos is configured correctly for NFS, everything
>>>>>> works just fine. When Kerberos is configured, but not for NFS, it
>>>>>> causes delays on all NFS mounts.
>>>>>
>>>>> This convinces me even more that there is a gssd issue here.
>>>>>
>>>>>
>>>>>> Today, there is a method to stop rpc-gssd from blindly starting
>>>>>> when kerberos is configured to eliminate that delay.
>>>>>
>>>>> I can fix my broken TV by not turning it on, and I don't
>>>>> notice the problem. But the problem is still there any
>>>>> time I want to watch TV.
>>>>>
>>>>> The problem is not fixed by disabling gssd, it's just
>>>>> hidden in some cases.
>>>> I agree with this 100%... All I'm saying is there should be a
>>>> way to disable it when the daemon is not needed or used.
>>>
>>> NFSv4 sec=sys *does* use Kerberos, when it is available.
>>> It has for years.
>>>
>>> Documentation should be updated to state that if Kerberos
>>> is configured on clients, they will attempt to use it to
>>> manage some operations that are common to all NFSv4 mount
>>> points on that client, even when a mount point uses sec=sys.
>>>
>>> Kerberos will be used for user authentication only if the
>>> client administrator has not specified a sec= setting, but
>>> the server export allows the use of Kerberos; or if the
>>> client administrator has specified a sec=krb5, sec=krb5i,
>>> or sec=krb5p setting.
>>>
>>> The reason for using Kerberos for common operations is
>>> that a client may have just one lease management principal.
>>> If the client uses sec=sys and sec=krb5 mounts, and the
>>> sec=sys mount is done first, then lease management would use
>>> sys as well. The client cannot change this principal after
>>> it has established a lease and files are open.
>>>
>>> A subsequent sec=krb5 mount will also use sec=sys for
>>> lease management. This will be surprising and insecure
>>> behavior. Therefore, all mounts from this client attempt
>>> to set up a krb5 lease management transport.
>>
>> Chuck,
>>
>> Thanks for explaining this so well! This definitely should make
>> its way into documentation - we should have added something
>> like this a long time ago.
> I agree... where should it go? the mount.nfs man page??

nfs(5) is where this kind of thing typically goes.



2016-06-28 20:38:52

by Chuck Lever

Subject: Re: [PATCH 1/1 v2] systemd: Only start the gssd daemons when they are enabled


> On Jun 28, 2016, at 2:11 PM, Steve Dickson <[email protected]> wrote:
>
>
>
>> On 06/28/2016 12:27 PM, Chuck Lever wrote:
>>
>>> On Jun 28, 2016, at 10:27 AM, Steve Dickson <[email protected]> wrote:
>>>
>>> Again, sorry for the delay... That darn flux capacitor broke... again!!! :-)
>>>
>>>> On 06/23/2016 09:30 PM, Chuck Lever wrote:
>>>>
>>>>> On Jun 23, 2016, at 11:57 AM, Steve Dickson <[email protected]> wrote:
>>>
>>> [snip]
>>>
>>>>> the keytab does have a nfs/hostname@REALM entry. So the
>>>>> call to the KDC is probably failing... which
>>>>> could be construed as a misconfiguration, but
>>>>> that misconfiguration should not even come into
>>>>> play with sec=sys mounts... IMHO...
>>>>
>>>> I disagree, of course. sec=sys means the client is not going
>>>> to use Kerberos to authenticate individual user requests,
>>>> and users don't need a Kerberos ticket to access their files.
>>>> That's still the case.
>>>>
>>>> I'm not aware of any promise that sec=sys means there is
>>>> no Kerberos within 50 miles of that mount.
>>> I think that is the assumption... No Kerberos will be
>>> needed for sec=sys mounts. It's not needed when Kerberos is
>>> not configured.
>>
>> NFSv3 sec=sys happens to mean that no Kerberos is needed.
>> This hasn't changed either.
>>
>> NFSv4 sec=sys is different. Just like NFSv4 ACLs, and
>> NFSv4 ID mapping, and NFSv4 locking, and so on.
>>
>> Note though that Kerberos isn't needed for NFSv4 sec=sys
>> even when there is a keytab. The client negotiates and
>> operates without it.
> If there is a keytab... there will be rpc.gssd running
> which will cause an upcall... and the negotiation starts
> with krb5i. So yes, it's not needed, but it will be tried.
>
>>
>>
>>>> If there are valid keytabs on both systems, they need to
>>>> be set up correctly. If there's a misconfiguration, then
>>>> gssd needs to report it precisely instead of time out.
>>>> And it's just as easy to add a service principal to a keytab
>>>> as it is to disable a systemd service in that case.
>>> I think it's more straightforward to disable a service
>>> that is not needed than to have to add a principal to a
>>> keytab for a service that's not being used or needed.
>>
>> IMO automating NFS setup so that it chooses the most
>> secure possible settings without intervention is the
>> best possible solution.
> Sure... now back to the point. ;-)
>
>
>>>>
>>>> The problem is not fixed by disabling gssd, it's just
>>>> hidden in some cases.
>>> I agree with this 100%... All I'm saying is there should be a
>>> way to disable it when the daemon is not needed or used.
>>
>> NFSv4 sec=sys *does* use Kerberos, when it is available.
>> It has for years.
> Right... let's define "available" as when rpc.gssd is running.
> When rpc.gssd is not running, Kerberos is not available.

OK, but now whenever a change to the Kerberos
configuration on the host is made (the keytab is
created or destroyed, or a principal is added or
removed from the keytab), an extra step is needed
to ensure secure NFS is working properly.

Should we go farther and say that, if there happen
to be no sec=krb5[ip] mounts on the system, gssd
should be shut down? I mean, it's not being used,
so let's turn it off!

There is a host/ principal in the keytab. That means
Kerberos is active on that system, and gssd can use it.
That means it is possible that an administrator (or
automounter) may specify sec=krb5 at some point
during the life of this client. For me that means gssd
should be running on this system.
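Checking whether a client is in that state is cheap; here is a self-contained sketch (it parses canned `klist -k` output so the snippet needs no real keytab; on a live system you would pipe `klist -k /etc/krb5.keytab` instead):

```shell
# Does the keytab listing contain a host/ or nfs/ principal?
# Canned sample output stands in for: klist -k /etc/krb5.keytab
listing='Keytab name: FILE:/etc/krb5.keytab
KVNO Principal
---- ------------------------------------------------------------
   2 host/client.example.com@EXAMPLE.COM'

if printf '%s\n' "$listing" | grep -Eq '^ *[0-9]+ +(host|nfs)/'; then
    echo "machine principal present; gssd has something to work with"
else
    echo "no machine principal; gssd has nothing to negotiate with"
fi
```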

Another way to achieve your goal is to add a command
line option to gssd which specifies which principal in the
local keytab to use as the machine credential. Specify
a principal that is not in the keytab, and gssd should do
no negotiation at all, and will return immediately.

There may already be a command line option to do this
(I'm not at liberty to confirm my memory at the moment).

That would be an immediate solution for this customer,
if provisioning an nfs/ service principal on their server
is still anathema, and no other code change is needed.
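For reference, rpc.gssd(8) does document a `-k keytab` option for reading an alternate keytab, so something along these lines might approximate the behavior described above. This is purely speculative: whether an empty or alternate keytab actually makes gssd return immediately has not been verified in this thread.

```shell
# Speculative sketch: point rpc.gssd at an alternate (possibly empty)
# keytab so no machine credential is found. Unverified as a workaround
# for the mount delay discussed here.
rpc.gssd -k /etc/krb5-nfs-disabled.keytab
```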


>> Documentation should be updated to state that if Kerberos
>> is configured on clients, they will attempt to use it to
>> manage some operations that are common to all NFSv4 mount
>> points on that client, even when a mount point uses sec=sys.
>>
>> Kerberos will be used for user authentication only if the
>> client administrator has not specified a sec= setting, but
>> the server export allows the use of Kerberos; or if the
>> client administrator has specified a sec=krb5, sec=krb5i,
>> or sec=krb5p setting.
>>
>> The reason for using Kerberos for common operations is
>> that a client may have just one lease management principal.
>> If the client uses sec=sys and sec=krb5 mounts, and the
>> sec=sys mount is done first, then lease management would use
>> sys as well. The client cannot change this principal after
>> it has established a lease and files are open.
>>
>> A subsequent sec=krb5 mount will also use sec=sys for
>> lease management. This will be surprising and insecure
>> behavior. Therefore, all mounts from this client attempt
>> to set up a krb5 lease management transport.
>>
>> The server should have an nfs/ service principal. It
>> doesn't _require_ one, but it's a best practice to have
>> one in place.
> Yeah our documentation is lacking in this area...
>
>>
>> Administrators that have Kerberos available should use
>> it. There's no overhead to enabling it on NFS servers,
>> as long as the list of security flavors the server
>> returns for each export does not include Kerberos
>> flavors.
> Admins are going to do what they want to no matter
> what we say... IMHO...
>
>>
>>
>>> Having it automatically started just because there is a
>>> keytab, at first, I thought was a good idea; now it
>>> turns out people really don't want miscellaneous
>>> daemons running. Case in point: gssproxy... It automatically
>>> comes up, but there is a way to disable it. With rpc.gssd
>>> there is not (easily).
>>
>> There are good reasons to disable daemons:
>>
>> - The daemon consumes a lot of resources.
>> - The daemon exposes an attack surface.
>>
>> gssd does neither.
> How about not needed? no rpc.gssd.. no upcall... no problem... ;-)

>> There are good reasons not to disable daemons:
> I'm assuming you meant "to disable" or "not to enable" here.

No, I meant exactly what I wrote. Let's rewrite it as
"good reasons to leave a daemon enabled".


>> - It enables simpler administration.
>> - It keeps the test matrix narrow (because you
>> have to test just one configuration, not
>> multiple ones: gssd enabled, gssd disabled,
>> and so on).
>>
>> Always enabling gssd provides both of these benefits.
> This is a production environment, so there is no testing

I meant QA testing by the distributor. Without the extra
knob, the QA tester has to test only the configuration
where gssd is enabled.

Whenever you add a knob like this, you have to double
your QA test matrix.


> but simpler admin is never a bad thing.
>
>>
>>
>>>>> This patch is just tweaking that method to make things easier.
>>>>
>>>> It makes one thing easier, and other things more difficult.
>>>> As a community, I thought our goal was to make Kerberos
>>>> easier to use, not easier to turn off.
>>> Again, I can't agree with you more! But this is the case
>>> where Kerberos is *not* being used for NFS... we should
>>> make that case work as well...
>>
>> Agreed.
>>
>> But NFSv4 sec=sys *does* use Kerberos when Kerberos is
>> configured on the system. It's a fact, and we now need to
>> make it convenient and natural and bug-free. The choice is
>> between increasing security and just making it work, or
>> adding one more knob that administrators have to Google for.
> If they do not want to use Kerberos for NFS, whether that is a good
> idea or not, we cannot force them to... Or can we?

No-one is forcing anyone to do anything.


>>>>> To address your concern about covering up a bug. I just don't
>>>>> see it... The code is doing exactly what it's asked to do.
>>>>> By default the kernel asks for a krb5i context (when rpc.gssd
>>>>> is run). rpc.gssd looks for a principal in the keytab;
>>>>> when found, the KDC is called...
>>>>>
>>>>> Everything is working just like it should and it is
>>>>> failing just like it should. I'm just trying to
>>>>> eliminate all this process when not needed, in
>>>>> an easier way..
>>>>
>>>> I'm not even sure now what the use case is. The client has
>>>> proper principals, but the server doesn't? The server
>>>> should refuse the init sec context immediately. Is gssd
>>>> even running on the server?
>>> No they don't because they are not using Kerberos for NFS...
>>
>> OK, let's state clearly what's going on here:
>>
>>
>> The client has a host/ principal. gssd is started
>> automatically.
>>
>>
>> The server has what?
> No info on the server other than it's Linux and the
> NFS server is running.
>
>>
>> If the server has a keytab and an nfs/ principal,
>> gss-proxy should be running, and there are no delays.
> In my testing, when gss-proxy is not running the mount
> hangs.
>
>>
>> If the server has a keytab and no nfs/ principal,
>> gss-proxy should be running, and any init sec
>> context should fail immediately. There should be no
>> delay. (If there is a delay, that needs to be
>> troubleshot).
>>
>> If the server does not have a keytab, gss-proxy will
>> not be running, and NFSv4 clients will have to sense
>> this. It takes a moment for each sniff. Otherwise,
>> there's no operational difference.
>>
>>
>> I'm assuming then that the problem is that Kerberos
>> is not set up on the _server_. Can you confirm this?
> I'll try... but we should not have to force people to
> set up Kerberos on a server they are not going to use.

I say one more time: no-one is forcing anyone to
do anything.


>> Also, this negotiation should be done only during
>> the first contact of each server after a client
>> reboot, thus the delay happens only during the first
>> mount, not during subsequent ones. Can that also be
>> confirmed?
> It appears it happens on all of them.

Can this customer's observed behavior be reproduced
in vitro? Seems like there are many unknowns here, and
it would make sense to get more answers before
proposing a long-term change to our administrative
interfaces.


>>> So I guess this is what we are saying:
>>>
>>> If you want to use Kerberos for anything at all,
>>> you must configure it for NFS for your clients
>>> to work properly... I'm not sure we really want to
>>> say this.
>>
>> Well, the clients are working properly without the
>> server principal in place. They just have an extra
>> delay at mount time. (you yourself pointed out in
>> an earlier e-mail that the client is doing everything
>> correctly, and no mention has been made of any other
>> operational issue).
> This appeared to be the case.
>
>>
>> We should encourage customers to set up in the most
>> secure way possible. In this case:
>>
>> - Kerberos is already available in the environment
>>
>> - It's not _required_ only _recommended_ (clients can
>> still use sec=sys without it) for the server to
>> enable Kerberos, but it's a best practice
>>
>> I'm guessing that if gssd and gss-proxy are running on
>> the server all the time, even when there is no keytab,
>> that delay should go away for everyone. So:
>>
>> - Always run a gssd service on servers that export NFSv4
>> (I assume this will address the delay problem)
>>
>> - Recommend the NFS server be provisioned with an nfs/
>> principal, and explicitly specify sec=sys on exports
>> to prevent clients from negotiating an unwanted Kerberos
>> security setting
> Or don't start rpc.gssd... ;-)
>
>>
>> I far prefer these fixes to adding another administrative
>> setting on the client. It encourages better security, and
>> it addresses the problem for all NFS clients that might
>> want to try using Kerberos against Linux NFS servers, for
>> whatever reason.
> As you say, we can only recommend... If they don't
> want to use secure mounts in a Kerberos environment,
> we should not make them, is all I'm saying.

I don't see that I'm proposing otherwise. I've simply
described the recommended best practice.

NFSv4 sec=sys works fine with or without Kerberos
present.

However, if there is a KDC available, and the client
is provisioned with a host/ principal, we recommend
adding an nfs/ service principal to the NFS server.
sec=sys still works in the absence of said principal.
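
For reference, provisioning that principal is typically a
one-time step on the server (a sketch, assuming an MIT KDC
and kadmin access; the hostname is a placeholder):

```shell
# On the NFS server (or via kadmin from an admin host):
# create the nfs/ service principal and add its key to
# the server's default keytab.
kadmin -q "addprinc -randkey nfs/server.example.com"
kadmin -q "ktadd nfs/server.example.com"
```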

How is that forcing anything?

In the specific case for your customer, it's simply not
clear why the delays occur. More information is needed
before it makes sense to propose a code change.


>>>> Suppose there are a thousand clients and one broken
>>>> server. An administrator would fix that one server by
>>>> adding an extra service principal, rather than log
>>>> into a thousand clients to change a setting on each.
>>>>
>>>> Suppose your client wants both sys and krb5 mounts of
>>>> a group of servers, and some are "misconfigured."
>>>> You have to enable gssd on the client but there are still
>>>> delays on the sec=sys mounts.
>>> In both these cases you are assuming Kerberos mounts
>>> are being used and so Kerberos should be configured
>>> for NFS. That is just not the case.
>>
>> My assumption is that administrators would prefer automatic
>> client set up, and good security by default.
> I don't think we can make any assumptions about what admins want.
> They want strong security, but not with NFS... That's
> their choice, not ours.

>> There's no way to know in advance whether an administrator
>> will want sec=sys and sec=krb5 mounts on the same system.
>> /etc/fstab can be changed at any time, mounts can be done
>> by hand, or the administrator can add or remove principals
>> from /etc/krb5.keytab.
>>
>> Our clients have to work when there are just sec=sys
>> mounts, or when there are sec=sys and sec=krb5 mounts.
>> They must allow on-demand configuration of sec=krb5. They
>> must attempt to provide the best possible level of security
>> at all times.
>>
>> The out-of-the-shrinkwrap configuration must assume a mix
>> of capabilities.
> I agree... And they are... But if they know for a fact that
> their client(s) will never want to use secure mounts, which
> I'm sure is true for a few out there, I see no problem in
> not starting a service they will never use.

Why "force" an admin to worry about whether some
random service is running or not?

IMO the mechanism (one or more daemons, a systemctl
service, the use of a keyring, or using The Force) should
be transparent to the administrator, who should care only
about security policy settings.

The whole idea of having separate services for enabling
NFS security is confusing IMO. The default is sec=sys,
but as soon as you vary from that, things get wonky.

It also makes it much harder for distributors or upstream
developers to make alterations to this mechanism while
not altering the administrative interfaces.

I have to check whether "SECURE=YES" is uncommented
in /etc/sysconfig/nfs. I have to check whether nfs.target
includes nfs-secure.service. None of this is obvious or
desirable, and after all is said and done I usually miss
something and have to Google anyway before a valid
krb5.conf and a "sec=krb5" mount work properly.

And the only reason we have this complication is because
someone complained once about extra daemons running.
It's just superstition.

Why can't it be simple for all sec= settings?


>>>> In fact, I think that's going to be pretty common. Why add
>>>> an NFS service principal on a client if you don't expect
>>>> to use sec=krb5 some of the time?
>>> In that case adding the principal does make sense. But...
>>>
>>> Why *must* you add a principal when you know only sec=sys
>>> mounts will be used?
>>
>> Explained in detail above (and this is only for NFSv4, and
>> is not at all a _must_). But in summary:
>>
>> A client will attempt to use Kerberos for NFSv4 sec=sys when
>> there is a host/ or nfs/ principal in its keytab. That needs
>> to be documented.
>>
>> Our _recommendation_ is that the server be provisioned with
>> an nfs/ principal as well when NFSv4 is used in an environment
>> where Kerberos is present. This eliminates a costly per-mount
>> security negotiation, and enables cryptographically strong
>> authentication of each client that mounts that server. NFSv4
>> sec=sys works properly otherwise without this principal.
> That was beautifully said... and I agree with all of it...
> But the customer is going to turn around and tell me to go pound
> sand... Because they are not about to touch their server!!! :-)

What if this customer came back and said "We also
want this to work with NFSv2 on UDP?" Would you
still want to accommodate them?

If they don't want to provision an nfs/ service principal
it would be really helpful for us to know why. IMO the
community should not accommodate anyone who
refuses to use a best practice without a reason. Is
there a reason?


> Esp when all they have to do is disable a service on the client
> where the hang is occurring.

They could also use NFSv3.