2022-09-19 15:44:31

by Chuck Lever

Subject: NFSv4.0 callback with Kerberos not working

Hi-

I rediscovered recently that NFSv4.0 with Kerberos does not work on
multi-homed hosts. This is true even with sec=sys because the client
attempts to establish a GSS context when there is a keytab present.

Basically my test environment has to work for sec=sys and sec=krb*
and for all NFS versions and minor versions. Thus I keep a keytab
on it.

Now, I have three network interfaces on my client: one RoCE, one
IB, and one GbE. They are each on their own subnet and each has
a unique hostname (that varies in the domain part).

But mounting one of my IB or RoCE test servers with NFSv4.0 results
in the infamous "NFSv4: Invalid callback credential" message. The
client always uses the principal for the GbE interface.

This was working at one point, but seems to have devolved over time.


Here are some of the problems I found:

1. The kernel always asks for service=* .

If your system's keytab has only "nfs" service principals in it,
that should be OK. If it has a "host" principal in it, that's
going to be the first one that gssd picks up.

NFSv4.0 callback does not work with a host@ acceptor -- it wants
nfs@.

There are two possible workarounds:

a. Remove all but the nfs@ keys from your system's keytab.

b. Modify the kernel to use "service=nfs" in the upcall.

I favor b. The NFS specifications do not appear to require it,
but they suggest that an "nfs@" principal is always to be used
for protecting NFS with GSS.

But more importantly, other subsystems share the keytab with
NFS. They might want a root@ or host@ key in there too, and
that will break NFSv4.0.
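
A rough sketch of why the keytab contents matter here, under the
simplifying assumption that the first key matching the requested
service wins. The real gssd search order is more involved; the keytab
entries and the helper name below are made up for illustration.

```python
# Hypothetical keytab contents, in file order. Names are illustrative only.
KEYTAB = [
    "host/client.example.net@EXAMPLE.NET",
    "nfs/client.example.net@EXAMPLE.NET",
]


def pick_principal(keytab, service="*"):
    """Return the first key whose service part matches; '*' matches anything.

    This is a simplified model, not the selection logic gssd implements.
    """
    for principal in keytab:
        svc = principal.split("/", 1)[0]
        if service == "*" or svc == service:
            return principal
    return None


print(pick_principal(KEYTAB))         # service=* lands on the host/ key
print(pick_principal(KEYTAB, "nfs"))  # service=nfs lands on the nfs/ key
```

With service=* the host/ key is returned simply because it comes
first, which is the situation described above.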


2. nsswitch.conf::hosts now has a "myhostname" service, and it's
placed before the "resolve" service by default.

I enabled systemd-resolved on my systems, to be part of the future.
Yeah, I know, right?

Now, a DNS query for the hostname associated with any of my system's
IP addresses (and there are several) always resolves to the One True
hostname. So gssd always gets the wrong principal when mounting via
alternate network interfaces.

Moving "myhostname" after "resolve" seems to address this issue, but
I'm told that this will be reverted if I reconfigure the resolver or
update the system?

The bugs I found that document this issue keep getting closed because
they target a specific Fedora version which always gets EOL'd after
a year.
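
A quick way to check which order a given system has. This is only a
sketch: the function name is hypothetical, and the sample "hosts:"
lines are illustrative rather than copied from any distribution.

```python
def myhostname_before_resolve(hosts_line):
    """True if 'myhostname' is listed before 'resolve' in an nsswitch hosts line."""
    _, _, sources = hosts_line.partition(":")
    # Keep only service names; drop action items such as [!UNAVAIL=return].
    services = [
        tok for tok in sources.split()
        if not tok.startswith("[") and "=" not in tok
    ]
    if "myhostname" not in services or "resolve" not in services:
        return False
    return services.index("myhostname") < services.index("resolve")


# Illustrative lines only.
print(myhostname_before_resolve("hosts: files myhostname resolve [!UNAVAIL=return] dns"))
print(myhostname_before_resolve("hosts: files resolve [!UNAVAIL=return] myhostname dns"))
```

The first line reports True (the problematic order), the second False.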


3. gssproxy gets the acceptor name wrong.

It has the same problem as in 2, even with the nsswitch.conf
workaround in place. So gssproxy returns the same principal for every
network interface on the system, and that breaks NFSv4.0 callback.

Note also that adding "use-gss-proxy=0" to /etc/nfs.conf does not
appear to disable gssproxy. I had to boot up and then "sudo systemctl
stop gssproxy" and even then, the kernel still tries to make upcalls
to it.

I noticed that setting the gssd debugging options in /etc/nfs.conf
also has no effect. I had to edit the gssd service files to get
debugging information.

I'm not sure how to fix this one -- I'd like to see gssproxy
fixed to deal with this correctly, but also whatever reads
/etc/nfs.conf needs to get fixed so that the gssd settings in
that file are observed.


Any opinions or guidance appreciated, especially from maintainers
(like, aw hell naw, or yep that's broken, send a patch).


--
Chuck Lever




2022-09-19 17:46:23

by Chuck Lever

Subject: Re: NFSv4.0 callback with Kerberos not working

Clarification:


> On Sep 19, 2022, at 11:31 AM, Chuck Lever III <[email protected]> wrote:
>
> Hi-
>
> I rediscovered recently that NFSv4.0 with Kerberos does not work on
> multi-homed hosts. This is true even with sec=sys because the client
> attempts to establish a GSS context when there is a keytab present.
>
> Basically my test environment has to work for sec=sys and sec=krb*
> and for all NFS versions and minor versions. Thus I keep a keytab
> on it.
>
> Now, I have three network interfaces on my client: one RoCE, one
> IB, and one GbE. They are each on their own subnet and each has
> a unique hostname (that varies in the domain part).
>
> But mounting one of my IB or RoCE test servers with NFSv4.0 results
> in the infamous "NFSv4: Invalid callback credential" message. The
> client always uses the principal for GbE interface...

... for the forward channel, but it expects the backchannel
principal to be the acceptor that the server saw on the forward
channel.

Currently, when a Linux client mounts server.ib.example.net:

- the client uses the acceptor [email protected]
(if the keytab happens to have a host@ principal)

- the server authenticates to the principal [email protected]

- the client expects to see the server authenticate to
[email protected] as the principal on the
backchannel, but gets [email protected] instead,
and check_gss_callback_principal() fails

IIUC, the NFS protocol expects:
- the client uses the acceptor [email protected]

- the server uses the principal [email protected]

- the client should see [email protected] as the
principal on the backchannel


> This was working at one point, but seems to have devolved over time.
>
>
> Here are some of the problems I found:
>
> 1. The kernel always asks for service=* .
>
> If your system's keytab has only "nfs" service principals in it,
> that should be OK. If it has a "host" principal in it, that's
> going to be the first one that gssd picks up.
>
> NFSv4.0 callback does not work with a host@ acceptor -- it wants
> nfs@.
>
> There are two possible workarounds:
>
> a. Remove all but the nfs@ keys from your system's keytab.
>
> b. Modify the kernel to use "service=nfs" in the upcall.
>
> I favor b. The NFS specifications do not appear to require it,
> but they suggest that an "nfs@" principal is always to be used
> for protecting NFS with GSS.

And: the NFS callback channel is an NFS service that needs to
use an nfs@ service principal. So when the server attempts to
authenticate to the client's callback service, it always needs
to use nfs@.


> But more importantly, other subsystems share the keytab with
> NFS. They might want a root@ or host@ key in there too, and
> that will break NFSv4.0.
>
>
> 2. nsswitch.conf::hosts now has a "myhostname" service, and it's
> placed before the "resolve" service by default.
>
> I enabled systemd-resolved on my systems, to be part of the future.
> Yeah, I know, right?
>
> Now, a DNS query for the hostname associated with any of my system's
> IP addresses (and there are several) always resolves to the One True
> hostname. So gssd always gets the wrong principal when mounting via
> alternate network interfaces.
>
> Moving "myhostname" after "resolve" seems to address this issue, but
> I'm told that this will be reverted if I reconfigure the resolver or
> update the system?
>
> The bugs I found that document this issue keep getting closed because
> they target a specific Fedora version which always gets EOL'd after
> a year.
>
>
> 3. gssproxy gets the acceptor name wrong.
>
> It has the same problem as in 2, even with the nsswitch.conf
> workaround in place. So gssproxy returns the same principal for every
> network interface on the system, and that breaks NFSv4.0 callback.
>
> Note also that adding "use-gss-proxy=0" to /etc/nfs.conf does not
> appear to disable gssproxy. I had to boot up and then "sudo systemctl
> stop gssproxy" and even then, the kernel still tries to make upcalls
> to it.
>
> I noticed that setting the gssd debugging options in /etc/nfs.conf
> also has no effect. I had to edit the gssd service files to get
> debugging information
>
> I'm not sure how to fix this one -- I'd like to see gssproxy
> fixed to deal with this correctly, but also whatever reads
> /etc/nfs.conf needs to get fixed so that the gssd settings in
> that file are observed.
>
>
> Any opinions or guidance appreciated, especially from maintainers
> (like, aw hell naw, or yep that's broken, send a patch).

Another possibility would be to make check_gss_callback_principal()
more flexible.


--
Chuck Lever



2022-09-19 18:08:10

by Trond Myklebust

Subject: Re: NFSv4.0 callback with Kerberos not working

On Mon, 2022-09-19 at 15:31 +0000, Chuck Lever III wrote:
> Hi-
>
> I rediscovered recently that NFSv4.0 with Kerberos does not work on
> multi-homed hosts. This is true even with sec=sys because the client
> attempts to establish a GSS context when there is a keytab present.
>
> Basically my test environment has to work for sec=sys and sec=krb*
> and for all NFS versions and minor versions. Thus I keep a keytab
> on it.
>
> Now, I have three network interfaces on my client: one RoCE, one
> IB, and one GbE. They are each on their own subnet and each has
> a unique hostname (that varies in the domain part).
>
> But mounting one of my IB or RoCE test servers with NFSv4.0 results
> in the infamous "NFSv4: Invalid callback credential" message. The
> client always uses the principal for GbE interface.
>
> This was working at one point, but seems to have devolved over time.
>
>
> Here are some of the problems I found:
>
> 1. The kernel always asks for service=* .
>
> If your system's keytab has only "nfs" service principals in it,
> that should be OK. If it has a "host" principal in it, that's
> going to be the first one that gssd picks up.
>
> NFSv4.0 callback does not work with a host@ acceptor -- it wants
> nfs@.
>
> There are two possible workarounds:
>
> a. Remove all but the nfs@ keys from your system's keytab.
>
> b. Modify the kernel to use "service=nfs" in the upcall.
>

There's also

c. Put the nfs service principal in its own keytab and use the '-k'
option to tell rpc.gssd where to find it.

However note that 'host/<hostname@REALM>' is normally the expected
principal name for authenticating as a specific hostname. So I'd expect
clients to want to authenticate using that credential so that it is
matched to the hostname entry in /etc/exports on the server.

The 'nfs/<hostname@REALM>' would normally be considered an NFS service
principal name, so should really be used by the NFSv4 server to
identify its service (see RFC5661 Section 2.2.1.1.1.3.) rather than
being used by the NFS client.
The same principal is also used by the NFSv4 server to identify itself
when acting as a client to the NFS callback service according to
RFC7530 section 3.3.3.

So what I'm saying is that for the standard NFS client, '*' is
probably the right thing to use (with a slight preference for 'host/'),
but for the NFS server use case of connecting to the callback service,
it should specify the 'nfs/' prefix. It can do that right now by
setting the clnt->cl_principal. As far as I can tell, the current
behaviour in knfsd is to set it to the same prefix as the server
svc_cred, and to default to 'nfs/' if the server svc_cred doesn't have
such a prefix.
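
As a rough model of that prefix selection (a sketch of the behaviour
described above, not the knfsd code; the function name and the sample
principals are hypothetical):

```python
def callback_service_prefix(svc_cred_principal, default="nfs"):
    """Pick the service prefix for the callback, falling back to 'nfs'.

    Accepts either 'service/host@REALM' or 'service@host' forms. A string
    with no separator is treated as having no service prefix.
    """
    if not svc_cred_principal:
        return default
    for sep in ("/", "@"):
        head, found, _ = svc_cred_principal.partition(sep)
        if found and head:
            return head
    return default


# Sample principals are made up for illustration.
print(callback_service_prefix("nfs/server.example.net@EXAMPLE.NET"))
print(callback_service_prefix("host@server.example.net"))
print(callback_service_prefix(None))
```

These print nfs, host, and nfs respectively.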

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2022-09-19 18:16:13

by Chuck Lever

Subject: Re: NFSv4.0 callback with Kerberos not working



> On Sep 19, 2022, at 1:59 PM, Trond Myklebust <[email protected]> wrote:
>
> On Mon, 2022-09-19 at 15:31 +0000, Chuck Lever III wrote:
>> Hi-
>>
>> I rediscovered recently that NFSv4.0 with Kerberos does not work on
>> multi-homed hosts. This is true even with sec=sys because the client
>> attempts to establish a GSS context when there is a keytab present.
>>
>> Basically my test environment has to work for sec=sys and sec=krb*
>> and for all NFS versions and minor versions. Thus I keep a keytab
>> on it.
>>
>> Now, I have three network interfaces on my client: one RoCE, one
>> IB, and one GbE. They are each on their own subnet and each has
>> a unique hostname (that varies in the domain part).
>>
>> But mounting one of my IB or RoCE test servers with NFSv4.0 results
>> in the infamous "NFSv4: Invalid callback credential" message. The
>> client always uses the principal for GbE interface.
>>
>> This was working at one point, but seems to have devolved over time.
>>
>>
>> Here are some of the problems I found:
>>
>> 1. The kernel always asks for service=* .
>>
>> If your system's keytab has only "nfs" service principals in it,
>> that should be OK. If it has a "host" principal in it, that's
>> going to be the first one that gssd picks up.
>>
>> NFSv4.0 callback does not work with a host@ acceptor -- it wants
>> nfs@.
>>
>> There are two possible workarounds:
>>
>> a. Remove all but the nfs@ keys from your system's keytab.
>>
>> b. Modify the kernel to use "service=nfs" in the upcall.
>>
>
> There's also
>
> c. Put the nfs service principal in its own keytab and use the '-k'
> option to tell rpc.gssd where to find it.
>
> However note that 'host/<hostname@REALM>' is normally the expected
> principal name for authenticating as a specific hostname. So I'd expect
> clients to want to authenticate using that credential so that it is
> matched to the hostname entry in /etc/exports on the server.
>
> The 'nfs/<hostname@REALM>' would normally be considered a NFS service
> principal name, so should really be used by the NFSv4 server to
> identify its service (see RFC5661 Section 2.2.1.1.1.3.) rather than
> being used by the NFS client.

Fair enough, we can leave the client's service name alone.


> The same principal is also used by the NFSv4 server to identify itself
> when acting as a client to the NFS callback service according to
> RFC7530 section 3.3.3.
> So what I'm saying is that for the standard NFS client, then '*' is
> probably the right thing to use (with a slight preference for 'host/'),
> but for the NFS server use case of connecting to the callback service,
> it should specify the 'nfs/' prefix. It can do that right now by
> setting the clnt->cl_principal. As far as I can tell, the current
> behaviour in knfsd is to set it to the same prefix as the server
> svc_cred, and to default to 'nfs/' if the server svc_cred doesn't have
> such a prefix.

The server uses the client-provided service name in this case.
If the client authenticates as "host@" then the server will
authenticate to the "host@" service on the backchannel.

Maybe the only mismatch is that my server is using
"[email protected]" on the backchannel, and it should
be using "[email protected]" instead?


--
Chuck Lever



2022-09-19 19:39:00

by Olga Kornievskaia

Subject: Re: NFSv4.0 callback with Kerberos not working

On Mon, Sep 19, 2022 at 2:16 PM Chuck Lever III <[email protected]> wrote:
>
>
>
> > On Sep 19, 2022, at 1:59 PM, Trond Myklebust <[email protected]> wrote:
> >
> > On Mon, 2022-09-19 at 15:31 +0000, Chuck Lever III wrote:
> >> Hi-
> >>
> >> I rediscovered recently that NFSv4.0 with Kerberos does not work on
> >> multi-homed hosts. This is true even with sec=sys because the client
> >> attempts to establish a GSS context when there is a keytab present.
> >>
> >> Basically my test environment has to work for sec=sys and sec=krb*
> >> and for all NFS versions and minor versions. Thus I keep a keytab
> >> on it.
> >>
> >> Now, I have three network interfaces on my client: one RoCE, one
> >> IB, and one GbE. They are each on their own subnet and each has
> >> a unique hostname (that varies in the domain part).
> >>
> >> But mounting one of my IB or RoCE test servers with NFSv4.0 results
> >> in the infamous "NFSv4: Invalid callback credential" message. The
> >> client always uses the principal for GbE interface.
> >>
> >> This was working at one point, but seems to have devolved over time.
> >>
> >>
> >> Here are some of the problems I found:
> >>
> >> 1. The kernel always asks for service=* .
> >>
> >> If your system's keytab has only "nfs" service principals in it,
> >> that should be OK. If it has a "host" principal in it, that's
> >> going to be the first one that gssd picks up.
> >>
> >> NFSv4.0 callback does not work with a host@ acceptor -- it wants
> >> nfs@.
> >>
> >> There are two possible workarounds:
> >>
> >> a. Remove all but the nfs@ keys from your system's keytab.
> >>
> >> b. Modify the kernel to use "service=nfs" in the upcall.
> >>
> >
> > There's also
> >
> > c. Put the nfs service principal in its own keytab and use the '-k'
> > option to tell rpc.gssd where to find it.
> >
> > However note that 'host/<hostname@REALM>' is normally the expected
> > principal name for authenticating as a specific hostname. So I'd expect
> > clients to want to authenticate using that credential so that it is
> > matched to the hostname entry in /etc/exports on the server.
> >
> > The 'nfs/<hostname@REALM>' would normally be considered a NFS service
> > principal name, so should really be used by the NFSv4 server to
> > identify its service (see RFC5661 Section 2.2.1.1.1.3.) rather than
> > being used by the NFS client.
>
> Fair enough, we can leave the client's service name alone.
>
>
> > The same principal is also used by the NFSv4 server to identify itself
> > when acting as a client to the NFS callback service according to
> > RFC7530 section 3.3.3.
> > So what I'm saying is that for the standard NFS client, then '*' is
> > probably the right thing to use (with a slight preference for 'host/'),
> > but for the NFS server use case of connecting to the callback service,
> > it should specify the 'nfs/' prefix. It can do that right now by
> > setting the clnt->cl_principal. As far as I can tell, the current
> > behaviour in knfsd is to set it to the same prefix as the server
> > svc_cred, and to default to 'nfs/' if the server svc_cred doesn't have
> > such a prefix.
>
> The server uses the client-provided service name in this case.
> If the client authenticates as "host@" then the server will
> authenticate to the "host@" service on the backchannel.
>
> Maybe the only mismatch is that my server is using
> "[email protected]" on the backchannel, and it should
> be using "[email protected]" instead?

Given that the spec says: "therefore, the realm name for the server
principal must be the same for the callback as it was for the
SETCLIENTID." Doesn't it mean that the server needs to use the same
domain/realm name as what the client authenticated to in the
forechannel (i.e., the server should be using @client.ib.example.net realm
for the callback channel)?

>
>
> --
> Chuck Lever
>
>
>

2022-09-19 20:21:46

by Chuck Lever

Subject: Re: NFSv4.0 callback with Kerberos not working



> On Sep 19, 2022, at 3:32 PM, Olga Kornievskaia <[email protected]> wrote:
>
> On Mon, Sep 19, 2022 at 2:16 PM Chuck Lever III <[email protected]> wrote:
>>
>>
>>
>>> On Sep 19, 2022, at 1:59 PM, Trond Myklebust <[email protected]> wrote:
>>>
>>> On Mon, 2022-09-19 at 15:31 +0000, Chuck Lever III wrote:
>>>> Hi-
>>>>
>>>> I rediscovered recently that NFSv4.0 with Kerberos does not work on
>>>> multi-homed hosts. This is true even with sec=sys because the client
>>>> attempts to establish a GSS context when there is a keytab present.
>>>>
>>>> Basically my test environment has to work for sec=sys and sec=krb*
>>>> and for all NFS versions and minor versions. Thus I keep a keytab
>>>> on it.
>>>>
>>>> Now, I have three network interfaces on my client: one RoCE, one
>>>> IB, and one GbE. They are each on their own subnet and each has
>>>> a unique hostname (that varies in the domain part).
>>>>
>>>> But mounting one of my IB or RoCE test servers with NFSv4.0 results
>>>> in the infamous "NFSv4: Invalid callback credential" message. The
>>>> client always uses the principal for GbE interface.
>>>>
>>>> This was working at one point, but seems to have devolved over time.
>>>>
>>>>
>>>> Here are some of the problems I found:
>>>>
>>>> 1. The kernel always asks for service=* .
>>>>
>>>> If your system's keytab has only "nfs" service principals in it,
>>>> that should be OK. If it has a "host" principal in it, that's
>>>> going to be the first one that gssd picks up.
>>>>
>>>> NFSv4.0 callback does not work with a host@ acceptor -- it wants
>>>> nfs@.
>>>>
>>>> There are two possible workarounds:
>>>>
>>>> a. Remove all but the nfs@ keys from your system's keytab.
>>>>
>>>> b. Modify the kernel to use "service=nfs" in the upcall.
>>>>
>>>
>>> There's also
>>>
>>> c. Put the nfs service principal in its own keytab and use the '-k'
>>> option to tell rpc.gssd where to find it.
>>>
>>> However note that 'host/<hostname@REALM>' is normally the expected
>>> principal name for authenticating as a specific hostname. So I'd expect
>>> clients to want to authenticate using that credential so that it is
>>> matched to the hostname entry in /etc/exports on the server.
>>>
>>> The 'nfs/<hostname@REALM>' would normally be considered a NFS service
>>> principal name, so should really be used by the NFSv4 server to
>>> identify its service (see RFC5661 Section 2.2.1.1.1.3.) rather than
>>> being used by the NFS client.
>>
>> Fair enough, we can leave the client's service name alone.
>>
>>
>>> The same principal is also used by the NFSv4 server to identify itself
>>> when acting as a client to the NFS callback service according to
>>> RFC7530 section 3.3.3.
>>> So what I'm saying is that for the standard NFS client, then '*' is
>>> probably the right thing to use (with a slight preference for 'host/'),
>>> but for the NFS server use case of connecting to the callback service,
>>> it should specify the 'nfs/' prefix. It can do that right now by
>>> setting the clnt->cl_principal. As far as I can tell, the current
>>> behaviour in knfsd is to set it to the same prefix as the server
>>> svc_cred, and to default to 'nfs/' if the server svc_cred doesn't have
>>> such a prefix.
>>
>> The server uses the client-provided service name in this case.
>> If the client authenticates as "host@" then the server will
>> authenticate to the "host@" service on the backchannel.
>>
>> Maybe the only mismatch is that my server is using
>> "[email protected]" on the backchannel, and it should
>> be using "[email protected]" instead?
>
> Given that the spec says: "therefore, the realm name for the server
> principal must be the same for the callback as it was for the
> SETCLIENTID." Doesn't it mean that the server needs to use the same
> domain/realm name as what the client authenticated to in the
> forechannel (ie server should be using @client.ib.example.net realm
> for the callback channel)?

Yes.

If the server is using the client's acceptor, then it should
authenticate to whatever the client sent it. The server should
use @client.ib.example.net only if that's what the client sent.

The service name component was a red herring.

I'm looking into the Linux server's behavior now, but I have to
revert all my debugging crap to get a clear picture.


--
Chuck Lever



2022-09-20 16:56:09

by Chuck Lever

Subject: Re: NFSv4.0 callback with Kerberos not working


> On Sep 19, 2022, at 2:15 PM, Chuck Lever III <[email protected]> wrote:
>
>> On Sep 19, 2022, at 1:59 PM, Trond Myklebust <[email protected]> wrote:
>
>> The same principal is also used by the NFSv4 server to identify itself
>> when acting as a client to the NFS callback service according to
>> RFC7530 section 3.3.3.
>> So what I'm saying is that for the standard NFS client, then '*' is
>> probably the right thing to use (with a slight preference for 'host/'),
>> but for the NFS server use case of connecting to the callback service,
>> it should specify the 'nfs/' prefix. It can do that right now by
>> setting the clnt->cl_principal. As far as I can tell, the current
>> behaviour in knfsd is to set it to the same prefix as the server
>> svc_cred, and to default to 'nfs/' if the server svc_cred doesn't have
>> such a prefix.
>
> The server uses the client-provided service name in this case.
> If the client authenticates as "host@" then the server will
> authenticate to the "host@" service on the backchannel.
>
> Maybe the only mismatch is that my server is using
> "[email protected]" on the backchannel, and it should
> be using "[email protected]" instead?

The Linux NFS server uses gssproxy to acquire a credential for
the NFS4_CB context. It appears to be using "uname -n" instead
of the hostname bound to its InfiniBand network interface --
the latter is what matches the acceptor in the context
established by SETCLIENTID, and the former does not.
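
A simplified model of the mismatch, assuming the client-side check is
an exact comparison of names. The "nfs" service name, the IB hostname,
and both helper functions are hypothetical; this is not what gssproxy
or the kernel actually runs.

```python
import os


def hostbased_name(service, hostname):
    """Format a GSS host-based service name such as 'nfs@host.example.net'."""
    return f"{service}@{hostname}"


def callback_matches(forechannel_acceptor, callback_principal):
    """Simplified check: the backchannel name must equal the forward-channel acceptor."""
    return forechannel_acceptor == callback_principal


# Hypothetical name bound to the server's InfiniBand interface.
IB_HOSTNAME = "server.ib.example.net"

# Acceptor in the context established by SETCLIENTID.
acceptor = hostbased_name("nfs", IB_HOSTNAME)

# Credential built from the node name, i.e. what "uname -n" reports.
from_nodename = hostbased_name("nfs", os.uname().nodename)

print(callback_matches(acceptor, hostbased_name("nfs", IB_HOSTNAME)))
print(callback_matches(acceptor, from_nodename))
```

The first comparison succeeds. The second fails on any machine whose
node name is not the IB hostname, which is the failure seen here.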

I've filed an issue against gssproxy to get help understanding
what's going wrong:

https://github.com/gssapi/gssproxy/issues/65


--
Chuck Lever