2018-05-09 21:19:44

by Chuck Lever III

Subject: SETCLIENTID acceptor

I'm right on the edge of my understanding of how this all works.

I've re-keyed my NFS server. Now on my client, I'm seeing this on
vers=4.0,sec=sys mounts:

May 8 16:40:30 manet kernel: NFS: NFSv4 callback contains invalid cred
May 8 16:40:30 manet kernel: NFS: NFSv4 callback contains invalid cred
May 8 16:40:30 manet kernel: NFS: NFSv4 callback contains invalid cred

manet is my client, and klimt is my server. I'm mounting with
NFS/RDMA, so I'm mounting hostname klimt.ib, not klimt.

Because the client is using krb5i for lease management, the server
is required to use krb5i for the callback channel (S 3.3.3 of RFC
7530).

After a SETCLIENTID, the client copies the acceptor from the GSS
context it set up, and uses that to check incoming callback
requests. I instrumented the client's SETCLIENTID proc, and I see
this:

check_gss_callback_principal: [email protected], [email protected]

The principal strings are not equal, and that's why the client
believes the callback credential is bogus. Now I'm trying to
figure out whether it is the server's callback client or the
client's callback server that is misbehaving.

To me, the server's callback principal (host@klimt) seems like it
is correct. The client would identify as host@manet when making
calls to the server, for example, so I'd expect the server to
behave similarly when performing callbacks.

Can anyone shed more light on this?


--
Chuck Lever





2018-05-10 17:40:36

by Olga Kornievskaia

Subject: Re: SETCLIENTID acceptor

On Wed, May 9, 2018 at 5:19 PM, Chuck Lever <[email protected]> wrote:
> I'm right on the edge of my understanding of how this all works.
>
> I've re-keyed my NFS server. Now on my client, I'm seeing this on
> vers=4.0,sec=sys mounts:
>
> May 8 16:40:30 manet kernel: NFS: NFSv4 callback contains invalid cred
> May 8 16:40:30 manet kernel: NFS: NFSv4 callback contains invalid cred
> May 8 16:40:30 manet kernel: NFS: NFSv4 callback contains invalid cred
>
> manet is my client, and klimt is my server. I'm mounting with
> NFS/RDMA, so I'm mounting hostname klimt.ib, not klimt.
>
> Because the client is using krb5i for lease management, the server
> is required to use krb5i for the callback channel (S 3.3.3 of RFC
> 7530).
>
> After a SETCLIENTID, the client copies the acceptor from the GSS
> context it set up, and uses that to check incoming callback
> requests. I instrumented the client's SETCLIENTID proc, and I see
> this:
>
> check_gss_callback_principal: [email protected], [email protected]
>
> The principal strings are not equal, and that's why the client
> believes the callback credential is bogus. Now I'm trying to
> figure out whether it is the server's callback client or the
> client's callback server that is misbehaving.
>
> To me, the server's callback principal (host@klimt) seems like it
> is correct. The client would identify as host@manet when making
> calls to the server, for example, so I'd expect the server to
> behave similarly when performing callbacks.
>
> Can anyone shed more light on this?

What are the full hostnames of each machine, and does the reverse
lookup from the IP to hostname on each machine give you what you
expect?

Sounds like all of them need to resolve to <>.ib.1015granger.net,
but somewhere you are getting <>.1015granger.net instead.

2018-05-10 18:09:22

by Chuck Lever III

Subject: Re: SETCLIENTID acceptor



> On May 10, 2018, at 1:40 PM, Olga Kornievskaia <[email protected]> wrote:
>
> On Wed, May 9, 2018 at 5:19 PM, Chuck Lever <[email protected]> wrote:
>> I'm right on the edge of my understanding of how this all works.
>>
>> I've re-keyed my NFS server. Now on my client, I'm seeing this on
>> vers=4.0,sec=sys mounts:
>>
>> May 8 16:40:30 manet kernel: NFS: NFSv4 callback contains invalid cred
>> May 8 16:40:30 manet kernel: NFS: NFSv4 callback contains invalid cred
>> May 8 16:40:30 manet kernel: NFS: NFSv4 callback contains invalid cred
>>
>> manet is my client, and klimt is my server. I'm mounting with
>> NFS/RDMA, so I'm mounting hostname klimt.ib, not klimt.
>>
>> Because the client is using krb5i for lease management, the server
>> is required to use krb5i for the callback channel (S 3.3.3 of RFC
>> 7530).
>>
>> After a SETCLIENTID, the client copies the acceptor from the GSS
>> context it set up, and uses that to check incoming callback
>> requests. I instrumented the client's SETCLIENTID proc, and I see
>> this:
>>
>> check_gss_callback_principal: [email protected], [email protected]
>>
>> The principal strings are not equal, and that's why the client
>> believes the callback credential is bogus. Now I'm trying to
>> figure out whether it is the server's callback client or the
>> client's callback server that is misbehaving.
>>
>> To me, the server's callback principal (host@klimt) seems like it
>> is correct. The client would identify as host@manet when making
>> calls to the server, for example, so I'd expect the server to
>> behave similarly when performing callbacks.
>>
>> Can anyone shed more light on this?
>
> What are the full hostnames of each machine, and does the reverse
> lookup from the IP to hostname on each machine give you what you
> expect?
>
> Sounds like all of them need to resolve to <>.ib.1015granger.net,
> but somewhere you are getting <>.1015granger.net instead.

The forward and reverse mappings are consistent, and rdns is
disabled in my krb5.conf files. My server is multi-homed; it
has a 1GbE interface (klimt.1015granger.net); an FDR IB
interface (klimt.ib.1015granger.net); and a 25 GbE interface
(klimt.roce.1015granger.net).

My theory is that the server needs to use the same principal
for callback operations that the client used for lease
establishment. The last paragraph of S3.3.3 seems to state
that requirement, though it's not especially clear; and the
client has required it since commit f11b2a1cfbf5 (2014).

So the server should authenticate as [email protected] and not
host@klimt, in this case, when performing callback requests.

This seems to mean that the server stack is going to need to
expose the SName in each GSS context so that it can dig that
out to create a proper callback credential for each callback
transport.

I guess I've reported this issue before, but now I'm tucking
in and trying to address it correctly.


--
Chuck Lever




2018-05-10 19:07:18

by Olga Kornievskaia

Subject: Re: SETCLIENTID acceptor

On Thu, May 10, 2018 at 2:09 PM, Chuck Lever <[email protected]> wrote:
>
>
>> On May 10, 2018, at 1:40 PM, Olga Kornievskaia <[email protected]> wrote:
>>
>> On Wed, May 9, 2018 at 5:19 PM, Chuck Lever <[email protected]> wrote:
>>> I'm right on the edge of my understanding of how this all works.
>>>
>>> I've re-keyed my NFS server. Now on my client, I'm seeing this on
>>> vers=4.0,sec=sys mounts:
>>>
>>> May 8 16:40:30 manet kernel: NFS: NFSv4 callback contains invalid cred
>>> May 8 16:40:30 manet kernel: NFS: NFSv4 callback contains invalid cred
>>> May 8 16:40:30 manet kernel: NFS: NFSv4 callback contains invalid cred
>>>
>>> manet is my client, and klimt is my server. I'm mounting with
>>> NFS/RDMA, so I'm mounting hostname klimt.ib, not klimt.
>>>
>>> Because the client is using krb5i for lease management, the server
>>> is required to use krb5i for the callback channel (S 3.3.3 of RFC
>>> 7530).
>>>
>>> After a SETCLIENTID, the client copies the acceptor from the GSS
>>> context it set up, and uses that to check incoming callback
>>> requests. I instrumented the client's SETCLIENTID proc, and I see
>>> this:
>>>
>>> check_gss_callback_principal: [email protected], [email protected]
>>>
>>> The principal strings are not equal, and that's why the client
>>> believes the callback credential is bogus. Now I'm trying to
>>> figure out whether it is the server's callback client or the
>>> client's callback server that is misbehaving.
>>>
>>> To me, the server's callback principal (host@klimt) seems like it
>>> is correct. The client would identify as host@manet when making
>>> calls to the server, for example, so I'd expect the server to
>>> behave similarly when performing callbacks.
>>>
>>> Can anyone shed more light on this?
>>
>> What are the full hostnames of each machine, and does the reverse
>> lookup from the IP to hostname on each machine give you what you
>> expect?
>>
>> Sounds like all of them need to resolve to <>.ib.1015granger.net,
>> but somewhere you are getting <>.1015granger.net instead.
>
> The forward and reverse mappings are consistent, and rdns is
> disabled in my krb5.conf files. My server is multi-homed; it
> has a 1GbE interface (klimt.1015granger.net); an FDR IB
> interface (klimt.ib.1015granger.net); and a 25 GbE interface
> (klimt.roce.1015granger.net).

Ah, so you are keeping it very interesting...

> My theory is that the server needs to use the same principal
> for callback operations that the client used for lease
> establishment. The last paragraph of S3.3.3 seems to state
> that requirement, though it's not especially clear; and the
> client has required it since commit f11b2a1cfbf5 (2014).
>
> So the server should authenticate as [email protected] and not
> host@klimt, in this case, when performing callback requests.

Yes, I agree that the server should have authenticated as [email protected],
and that's what I see in my (simple) single-homed setup.

In nfs-utils there is code that deals with the callback, and a comment
about the choice of principal:
 * Restricting gssd to use "nfs" service name is needed for when
 * the NFS server is doing a callback to the NFS client. In this
 * case, the NFS server has to authenticate itself as "nfs" --
 * even if there are other service keys such as "host" or "root"
 * in the keytab.
So the upcall for the callback should have specifically requested
"nfs" to look for nfs/<hostname>. The question is: if your keytab has
both nfs/klimt and nfs/klimt.ib, how does it choose which one to
take? I'm not sure. But I guess in your case you are seeing that it
chose "host/<>", which would really be an nfs-utils bug.

What's in your server's keytab?

An output from gssd -vvv would be interesting.

> This seems to mean that the server stack is going to need to
> expose the SName in each GSS context so that it can dig that
> out to create a proper callback credential for each callback
> transport.
>
> I guess I've reported this issue before, but now I'm tucking
> in and trying to address it correctly.
>
>
> --
> Chuck Lever
>
>
>

2018-05-10 19:23:18

by Chuck Lever III

Subject: Re: SETCLIENTID acceptor



> On May 10, 2018, at 3:07 PM, Olga Kornievskaia <[email protected]> wrote:
>
> On Thu, May 10, 2018 at 2:09 PM, Chuck Lever <[email protected]> wrote:
>>
>>
>>> On May 10, 2018, at 1:40 PM, Olga Kornievskaia <[email protected]> wrote:
>>>
>>> On Wed, May 9, 2018 at 5:19 PM, Chuck Lever <[email protected]> wrote:
>>>> I'm right on the edge of my understanding of how this all works.
>>>>
>>>> I've re-keyed my NFS server. Now on my client, I'm seeing this on
>>>> vers=4.0,sec=sys mounts:
>>>>
>>>> May 8 16:40:30 manet kernel: NFS: NFSv4 callback contains invalid cred
>>>> May 8 16:40:30 manet kernel: NFS: NFSv4 callback contains invalid cred
>>>> May 8 16:40:30 manet kernel: NFS: NFSv4 callback contains invalid cred
>>>>
>>>> manet is my client, and klimt is my server. I'm mounting with
>>>> NFS/RDMA, so I'm mounting hostname klimt.ib, not klimt.
>>>>
>>>> Because the client is using krb5i for lease management, the server
>>>> is required to use krb5i for the callback channel (S 3.3.3 of RFC
>>>> 7530).
>>>>
>>>> After a SETCLIENTID, the client copies the acceptor from the GSS
>>>> context it set up, and uses that to check incoming callback
>>>> requests. I instrumented the client's SETCLIENTID proc, and I see
>>>> this:
>>>>
>>>> check_gss_callback_principal: [email protected], [email protected]
>>>>
>>>> The principal strings are not equal, and that's why the client
>>>> believes the callback credential is bogus. Now I'm trying to
>>>> figure out whether it is the server's callback client or the
>>>> client's callback server that is misbehaving.
>>>>
>>>> To me, the server's callback principal (host@klimt) seems like it
>>>> is correct. The client would identify as host@manet when making
>>>> calls to the server, for example, so I'd expect the server to
>>>> behave similarly when performing callbacks.
>>>>
>>>> Can anyone shed more light on this?
>>>
>>> What are the full hostnames of each machine, and does the reverse
>>> lookup from the IP to hostname on each machine give you what you
>>> expect?
>>>
>>> Sounds like all of them need to resolve to <>.ib.1015granger.net,
>>> but somewhere you are getting <>.1015granger.net instead.
>>
>> The forward and reverse mappings are consistent, and rdns is
>> disabled in my krb5.conf files. My server is multi-homed; it
>> has a 1GbE interface (klimt.1015granger.net); an FDR IB
>> interface (klimt.ib.1015granger.net); and a 25 GbE interface
>> (klimt.roce.1015granger.net).
>
> Ah, so you are keeping it very interesting...
>
>> My theory is that the server needs to use the same principal
>> for callback operations that the client used for lease
>> establishment. The last paragraph of S3.3.3 seems to state
>> that requirement, though it's not especially clear; and the
>> client has required it since commit f11b2a1cfbf5 (2014).
>>
>> So the server should authenticate as [email protected] and not
>> host@klimt, in this case, when performing callback requests.
>
> Yes, I agree that the server should have authenticated as [email protected],
> and that's what I see in my (simple) single-homed setup.
>
> In nfs-utils there is code that deals with the callback, and a comment
> about the choice of principal:
> * Restricting gssd to use "nfs" service name is needed for when
> * the NFS server is doing a callback to the NFS client. In this
> * case, the NFS server has to authenticate itself as "nfs" --
> * even if there are other service keys such as "host" or "root"
> * in the keytab.
> So the upcall for the callback should have specifically requested
> "nfs" to look for nfs/<hostname>. The question is: if your keytab has
> both nfs/klimt and nfs/klimt.ib, how does it choose which one to
> take? I'm not sure. But I guess in your case you are seeing that it
> chose "host/<>", which would really be an nfs-utils bug.

I think the upcall is correctly requesting an nfs/ principal
(see below).

Not only does it need to choose an nfs/ principal, but it also
has to pick the correct domain name. The domain name does not
seem to be passed up to gssd. fs/nfsd/nfs4state.c has this:

749 static struct rpc_cred *callback_cred;
750
751 int set_callback_cred(void)
752 {
753 	if (callback_cred)
754 		return 0;
755 	callback_cred = rpc_lookup_machine_cred("nfs");
756 	if (!callback_cred)
757 		return -ENOMEM;
758 	return 0;
759 }
760
761 void cleanup_callback_cred(void)
762 {
763 	if (callback_cred) {
764 		put_rpccred(callback_cred);
765 		callback_cred = NULL;
766 	}
767 }
768
769 static struct rpc_cred *get_backchannel_cred(struct nfs4_client *clp, struct rpc_clnt *client, struct nfsd4_session *ses)
770 {
771 	if (clp->cl_minorversion == 0) {
772 		return get_rpccred(callback_cred);
773 	} else {
774 		struct rpc_auth *auth = client->cl_auth;
775 		struct auth_cred acred = {};
776
777 		acred.uid = ses->se_cb_sec.uid;
778 		acred.gid = ses->se_cb_sec.gid;
779 		return auth->au_ops->lookup_cred(client->cl_auth, &acred, 0);
780 	}
781 }

rpc_lookup_machine_cred("nfs"); should request an "nfs/" service
principal, shouldn't it?

Though I think this approach is incorrect. The server should not
use the machine cred here; it should use a credential based on
the principal the client used to establish its lease.


> What's in your server's key tab?

[root@klimt ~]# klist -ke /etc/krb5.keytab
Keytab name: FILE:/etc/krb5.keytab
KVNO Principal
---- --------------------------------------------------------------------------
   4 host/[email protected] (aes256-cts-hmac-sha1-96)
   4 host/[email protected] (aes128-cts-hmac-sha1-96)
   4 host/[email protected] (des3-cbc-sha1)
   4 host/[email protected] (arcfour-hmac)
   3 nfs/[email protected] (aes256-cts-hmac-sha1-96)
   3 nfs/[email protected] (aes128-cts-hmac-sha1-96)
   3 nfs/[email protected] (des3-cbc-sha1)
   3 nfs/[email protected] (arcfour-hmac)
   3 nfs/[email protected] (aes256-cts-hmac-sha1-96)
   3 nfs/[email protected] (aes128-cts-hmac-sha1-96)
   3 nfs/[email protected] (des3-cbc-sha1)
   3 nfs/[email protected] (arcfour-hmac)
   3 nfs/[email protected] (aes256-cts-hmac-sha1-96)
   3 nfs/[email protected] (aes128-cts-hmac-sha1-96)
   3 nfs/[email protected] (des3-cbc-sha1)
   3 nfs/[email protected] (arcfour-hmac)
[root@klimt ~]#

As a workaround, I bet moving the keys for nfs/klimt.ib to
the front of the keytab file would allow Kerberos to work
with the klimt.ib interface.


> An output from gssd -vvv would be interesting.

May 10 14:43:24 klimt rpc.gssd[1191]: #012handle_gssd_upcall: 'mech=krb5 uid=0 [email protected] service=nfs enctypes=18,17,16,23,3,1,2 ' (nfsd4_cb/clnt0)
May 10 14:43:24 klimt rpc.gssd[1191]: krb5_use_machine_creds: uid 0 tgtname [email protected]
May 10 14:43:24 klimt rpc.gssd[1191]: Full hostname for 'manet.1015granger.net' is 'manet.1015granger.net'
May 10 14:43:24 klimt rpc.gssd[1191]: Full hostname for 'klimt.1015granger.net' is 'klimt.1015granger.net'
May 10 14:43:24 klimt rpc.gssd[1191]: Success getting keytab entry for 'nfs/[email protected]'
May 10 14:43:24 klimt rpc.gssd[1191]: gssd_get_single_krb5_cred: principal 'nfs/[email protected]' ccache:'FILE:/tmp/krb5ccmachine_1015GRANGER.NET'
May 10 14:43:24 klimt rpc.gssd[1191]: INFO: Credentials in CC 'FILE:/tmp/krb5ccmachine_1015GRANGER.NET' are good until 1526064204
May 10 14:43:24 klimt rpc.gssd[1191]: creating tcp client for server manet.1015granger.net
May 10 14:43:24 klimt rpc.gssd[1191]: creating context with server [email protected]
May 10 14:43:24 klimt rpc.gssd[1191]: doing downcall: lifetime_rec=76170 [email protected]
May 10 14:44:31 klimt rpc.gssd[1191]: #012handle_gssd_upcall: 'mech=krb5 uid=0 [email protected] service=nfs enctypes=18,17,16,23,3,1,2 ' (nfsd4_cb/clnt1)
May 10 14:44:31 klimt rpc.gssd[1191]: krb5_use_machine_creds: uid 0 tgtname [email protected]
May 10 14:44:31 klimt rpc.gssd[1191]: Full hostname for 'manet.1015granger.net' is 'manet.1015granger.net'
May 10 14:44:31 klimt rpc.gssd[1191]: Full hostname for 'klimt.1015granger.net' is 'klimt.1015granger.net'
May 10 14:44:31 klimt rpc.gssd[1191]: Success getting keytab entry for 'nfs/[email protected]'
May 10 14:44:31 klimt rpc.gssd[1191]: INFO: Credentials in CC 'FILE:/tmp/krb5ccmachine_1015GRANGER.NET' are good until 1526064204
May 10 14:44:31 klimt rpc.gssd[1191]: INFO: Credentials in CC 'FILE:/tmp/krb5ccmachine_1015GRANGER.NET' are good until 1526064204
May 10 14:44:31 klimt rpc.gssd[1191]: creating tcp client for server manet.1015granger.net
May 10 14:44:31 klimt rpc.gssd[1191]: creating context with server [email protected]
May 10 14:44:31 klimt rpc.gssd[1191]: doing downcall: lifetime_rec=76103 [email protected]


>> This seems to mean that the server stack is going to need to
>> expose the SName in each GSS context so that it can dig that
>> out to create a proper callback credential for each callback
>> transport.
>>=20
>> I guess I've reported this issue before, but now I'm tucking
>> in and trying to address it correctly.

--
Chuck Lever




2018-05-10 20:58:40

by Olga Kornievskaia

Subject: Re: SETCLIENTID acceptor

On Thu, May 10, 2018 at 3:23 PM, Chuck Lever <[email protected]> wrote:
>
>
>> On May 10, 2018, at 3:07 PM, Olga Kornievskaia <[email protected]> wrote:
>>
>> On Thu, May 10, 2018 at 2:09 PM, Chuck Lever <[email protected]> wrote:
>>>
>>>
>>>> On May 10, 2018, at 1:40 PM, Olga Kornievskaia <[email protected]> wrote:
>>>>
>>>> On Wed, May 9, 2018 at 5:19 PM, Chuck Lever <[email protected]> wrote:
>>>>> I'm right on the edge of my understanding of how this all works.
>>>>>
>>>>> I've re-keyed my NFS server. Now on my client, I'm seeing this on
>>>>> vers=4.0,sec=sys mounts:
>>>>>
>>>>> May 8 16:40:30 manet kernel: NFS: NFSv4 callback contains invalid cred
>>>>> May 8 16:40:30 manet kernel: NFS: NFSv4 callback contains invalid cred
>>>>> May 8 16:40:30 manet kernel: NFS: NFSv4 callback contains invalid cred
>>>>>
>>>>> manet is my client, and klimt is my server. I'm mounting with
>>>>> NFS/RDMA, so I'm mounting hostname klimt.ib, not klimt.
>>>>>
>>>>> Because the client is using krb5i for lease management, the server
>>>>> is required to use krb5i for the callback channel (S 3.3.3 of RFC
>>>>> 7530).
>>>>>
>>>>> After a SETCLIENTID, the client copies the acceptor from the GSS
>>>>> context it set up, and uses that to check incoming callback
>>>>> requests. I instrumented the client's SETCLIENTID proc, and I see
>>>>> this:
>>>>>
>>>>> check_gss_callback_principal: [email protected], [email protected]
>>>>>
>>>>> The principal strings are not equal, and that's why the client
>>>>> believes the callback credential is bogus. Now I'm trying to
>>>>> figure out whether it is the server's callback client or the
>>>>> client's callback server that is misbehaving.
>>>>>
>>>>> To me, the server's callback principal (host@klimt) seems like it
>>>>> is correct. The client would identify as host@manet when making
>>>>> calls to the server, for example, so I'd expect the server to
>>>>> behave similarly when performing callbacks.
>>>>>
>>>>> Can anyone shed more light on this?
>>>>
>>>> What are the full hostnames of each machine, and does the reverse
>>>> lookup from the IP to hostname on each machine give you what you
>>>> expect?
>>>>
>>>> Sounds like all of them need to resolve to <>.ib.1015granger.net,
>>>> but somewhere you are getting <>.1015granger.net instead.
>>>
>>> The forward and reverse mappings are consistent, and rdns is
>>> disabled in my krb5.conf files. My server is multi-homed; it
>>> has a 1GbE interface (klimt.1015granger.net); an FDR IB
>>> interface (klimt.ib.1015granger.net); and a 25 GbE interface
>>> (klimt.roce.1015granger.net).
>>
>> Ah, so you are keeping it very interesting...
>>
>>> My theory is that the server needs to use the same principal
>>> for callback operations that the client used for lease
>>> establishment. The last paragraph of S3.3.3 seems to state
>>> that requirement, though it's not especially clear; and the
>>> client has required it since commit f11b2a1cfbf5 (2014).
>>>
>>> So the server should authenticate as [email protected] and not
>>> host@klimt, in this case, when performing callback requests.
>>
>> Yes I agree that server should have authenticated as [email protected] and
>> that's what I see in my (simple) single home setup.
>>
>> In nfs-utils there is code that deals with the callback, and a comment
>> about the choice of principal:
>> * Restricting gssd to use "nfs" service name is needed for when
>> * the NFS server is doing a callback to the NFS client. In this
>> * case, the NFS server has to authenticate itself as "nfs" --
>> * even if there are other service keys such as "host" or "root"
>> * in the keytab.
>> So the upcall for the callback should have specifically requested
>> "nfs" to look for nfs/<hostname>. The question is: if your keytab has
>> both nfs/klimt and nfs/klimt.ib, how does it choose which one to
>> take? I'm not sure. But I guess in your case you are seeing that it
>> chose "host/<>", which would really be an nfs-utils bug.
>
> I think the upcall is correctly requesting an nfs/ principal
> (see below).
>
> Not only does it need to choose an nfs/ principal, but it also
> has to pick the correct domain name. The domain name does not
> seem to be passed up to gssd. fs/nfsd/nfs4state.c has this:
>
> 749 static struct rpc_cred *callback_cred;
> 750
> 751 int set_callback_cred(void)
> 752 {
> 753 if (callback_cred)
> 754 return 0;
> 755 callback_cred = rpc_lookup_machine_cred("nfs");
> 756 if (!callback_cred)
> 757 return -ENOMEM;
> 758 return 0;
> 759 }
> 760
> 761 void cleanup_callback_cred(void)
> 762 {
> 763 if (callback_cred) {
> 764 put_rpccred(callback_cred);
> 765 callback_cred = NULL;
> 766 }
> 767 }
> 768
> 769 static struct rpc_cred *get_backchannel_cred(struct nfs4_client *clp, struct rpc_clnt *client, struct nfsd4_session *ses)
> 770 {
> 771 if (clp->cl_minorversion == 0) {
> 772 return get_rpccred(callback_cred);
> 773 } else {
> 774 struct rpc_auth *auth = client->cl_auth;
> 775 struct auth_cred acred = {};
> 776
> 777 acred.uid = ses->se_cb_sec.uid;
> 778 acred.gid = ses->se_cb_sec.gid;
> 779 return auth->au_ops->lookup_cred(client->cl_auth, &acred, 0);
> 780 }
> 781 }
>
> rpc_lookup_machine_cred("nfs"); should request an "nfs/" service
> principal, shouldn't it?
>
> Though I think this approach is incorrect. The server should not
> use the machine cred here; it should use a credential based on
> the principal the client used to establish its lease.
>
>
>> What's in your server's keytab?
>
> [root@klimt ~]# klist -ke /etc/krb5.keytab
> Keytab name: FILE:/etc/krb5.keytab
> KVNO Principal
> ---- --------------------------------------------------------------------------
> 4 host/[email protected] (aes256-cts-hmac-sha1-96)
> 4 host/[email protected] (aes128-cts-hmac-sha1-96)
> 4 host/[email protected] (des3-cbc-sha1)
> 4 host/[email protected] (arcfour-hmac)
> 3 nfs/[email protected] (aes256-cts-hmac-sha1-96)
> 3 nfs/[email protected] (aes128-cts-hmac-sha1-96)
> 3 nfs/[email protected] (des3-cbc-sha1)
> 3 nfs/[email protected] (arcfour-hmac)
> 3 nfs/[email protected] (aes256-cts-hmac-sha1-96)
> 3 nfs/[email protected] (aes128-cts-hmac-sha1-96)
> 3 nfs/[email protected] (des3-cbc-sha1)
> 3 nfs/[email protected] (arcfour-hmac)
> 3 nfs/[email protected] (aes256-cts-hmac-sha1-96)
> 3 nfs/[email protected] (aes128-cts-hmac-sha1-96)
> 3 nfs/[email protected] (des3-cbc-sha1)
> 3 nfs/[email protected] (arcfour-hmac)
> [root@klimt ~]#
>
> As a workaround, I bet moving the keys for nfs/klimt.ib to
> the front of the keytab file would allow Kerberos to work
> with the klimt.ib interface.
>
>
>> An output from gssd -vvv would be interesting.
>
> May 10 14:43:24 klimt rpc.gssd[1191]: #012handle_gssd_upcall: 'mech=krb5 uid=0 [email protected] service=nfs enctypes=18,17,16,23,3,1,2 ' (nfsd4_cb/clnt0)
> May 10 14:43:24 klimt rpc.gssd[1191]: krb5_use_machine_creds: uid 0 tgtname [email protected]
> May 10 14:43:24 klimt rpc.gssd[1191]: Full hostname for 'manet.1015granger.net' is 'manet.1015granger.net'
> May 10 14:43:24 klimt rpc.gssd[1191]: Full hostname for 'klimt.1015granger.net' is 'klimt.1015granger.net'

I think that's the problem. This should have been
klimt.ib.1015granger.net. nfs-utils just calls gethostname() to get
the local host name, and that is what it'll match against the keytab
entry. So I think even if you move the keytab entries around, it
probably will still pick [email protected].

Honestly, I'm also surprised that it's "[email protected]"
and not "[email protected]". What principal name
did the client use to authenticate to the server? I had somehow
assumed that it would have been
"[email protected]".

> May 10 14:43:24 klimt rpc.gssd[1191]: Success getting keytab entry for 'nfs/[email protected]'
> May 10 14:43:24 klimt rpc.gssd[1191]: gssd_get_single_krb5_cred: principal 'nfs/[email protected]' ccache:'FILE:/tmp/krb5ccmachine_1015GRANGER.NET'
> May 10 14:43:24 klimt rpc.gssd[1191]: INFO: Credentials in CC 'FILE:/tmp/krb5ccmachine_1015GRANGER.NET' are good until 1526064204
> May 10 14:43:24 klimt rpc.gssd[1191]: creating tcp client for server manet.1015granger.net
> May 10 14:43:24 klimt rpc.gssd[1191]: creating context with server [email protected]
> May 10 14:43:24 klimt rpc.gssd[1191]: doing downcall: lifetime_rec=76170 [email protected]
> May 10 14:44:31 klimt rpc.gssd[1191]: #012handle_gssd_upcall: 'mech=krb5 uid=0 [email protected] service=nfs enctypes=18,17,16,23,3,1,2 ' (nfsd4_cb/clnt1)
> May 10 14:44:31 klimt rpc.gssd[1191]: krb5_use_machine_creds: uid 0 tgtname [email protected]
> May 10 14:44:31 klimt rpc.gssd[1191]: Full hostname for 'manet.1015granger.net' is 'manet.1015granger.net'
> May 10 14:44:31 klimt rpc.gssd[1191]: Full hostname for 'klimt.1015granger.net' is 'klimt.1015granger.net'
> May 10 14:44:31 klimt rpc.gssd[1191]: Success getting keytab entry for 'nfs/[email protected]'
> May 10 14:44:31 klimt rpc.gssd[1191]: INFO: Credentials in CC 'FILE:/tmp/krb5ccmachine_1015GRANGER.NET' are good until 1526064204
> May 10 14:44:31 klimt rpc.gssd[1191]: INFO: Credentials in CC 'FILE:/tmp/krb5ccmachine_1015GRANGER.NET' are good until 1526064204
> May 10 14:44:31 klimt rpc.gssd[1191]: creating tcp client for server manet.1015granger.net
> May 10 14:44:31 klimt rpc.gssd[1191]: creating context with server [email protected]
> May 10 14:44:31 klimt rpc.gssd[1191]: doing downcall: lifetime_rec=76103 [email protected]

Going back to the original mail where you wrote:

check_gss_callback_principal: [email protected],
[email protected]

Where does this output appear: in the client kernel or the server
kernel?

According to the gssd output, in the callback authentication
[email protected] is authenticating to
[email protected]. Neither of those matches the
"check_gss_callback_principal" output. So I'm confused...


>
>
>>> This seems to mean that the server stack is going to need to
>>> expose the SName in each GSS context so that it can dig that
>>> out to create a proper callback credential for each callback
>>> transport.
>>>
>>> I guess I've reported this issue before, but now I'm tucking
>>> in and trying to address it correctly.
>
> --
> Chuck Lever
>
>
>

2018-05-10 21:11:17

by Chuck Lever III

Subject: Re: SETCLIENTID acceptor



> On May 10, 2018, at 4:58 PM, Olga Kornievskaia <[email protected]> wrote:
>
> On Thu, May 10, 2018 at 3:23 PM, Chuck Lever <[email protected]> wrote:
>>
>>
>>> On May 10, 2018, at 3:07 PM, Olga Kornievskaia <[email protected]> wrote:
>>>
>>> On Thu, May 10, 2018 at 2:09 PM, Chuck Lever <[email protected]> wrote:
>>>>
>>>>
>>>>> On May 10, 2018, at 1:40 PM, Olga Kornievskaia <[email protected]> wrote:
>>>>>
>>>>> On Wed, May 9, 2018 at 5:19 PM, Chuck Lever <[email protected]> wrote:
>>>>>> I'm right on the edge of my understanding of how this all works.
>>>>>>
>>>>>> I've re-keyed my NFS server. Now on my client, I'm seeing this on
>>>>>> vers=4.0,sec=sys mounts:
>>>>>>
>>>>>> May 8 16:40:30 manet kernel: NFS: NFSv4 callback contains invalid cred
>>>>>> May 8 16:40:30 manet kernel: NFS: NFSv4 callback contains invalid cred
>>>>>> May 8 16:40:30 manet kernel: NFS: NFSv4 callback contains invalid cred
>>>>>>
>>>>>> manet is my client, and klimt is my server. I'm mounting with
>>>>>> NFS/RDMA, so I'm mounting hostname klimt.ib, not klimt.
>>>>>>
>>>>>> Because the client is using krb5i for lease management, the server
>>>>>> is required to use krb5i for the callback channel (S 3.3.3 of RFC
>>>>>> 7530).
>>>>>>
>>>>>> After a SETCLIENTID, the client copies the acceptor from the GSS
>>>>>> context it set up, and uses that to check incoming callback
>>>>>> requests. I instrumented the client's SETCLIENTID proc, and I see
>>>>>> this:
>>>>>>
>>>>>> check_gss_callback_principal: [email protected], [email protected]
>>>>>>
>>>>>> The principal strings are not equal, and that's why the client
>>>>>> believes the callback credential is bogus. Now I'm trying to
>>>>>> figure out whether it is the server's callback client or the
>>>>>> client's callback server that is misbehaving.
>>>>>>
>>>>>> To me, the server's callback principal (host@klimt) seems like it
>>>>>> is correct. The client would identify as host@manet when making
>>>>>> calls to the server, for example, so I'd expect the server to
>>>>>> behave similarly when performing callbacks.
>>>>>>
>>>>>> Can anyone shed more light on this?
>>>>>
>>>>> What are your full hostnames of each machine and does the reverse
>>>>> lookup from the ip to hostname on each machine give you what you
>>>>> expect?
>>>>>
>>>>> Sounds like all of them need to be resolved to <>.ib.1015granger.net
>>>>> but somewhere you are getting <>.1015granger.net instead.
>>>>
>>>> The forward and reverse mappings are consistent, and rdns is
>>>> disabled in my krb5.conf files. My server is multi-homed; it
>>>> has a 1GbE interface (klimt.1015granger.net); an FDR IB
>>>> interface (klimt.ib.1015granger.net); and a 25 GbE interface
>>>> (klimt.roce.1015granger.net).
>>>
>>> Ah, so you are keeping it very interesting...
>>>
>>>> My theory is that the server needs to use the same principal
>>>> for callback operations that the client used for lease
>>>> establishment. The last paragraph of S3.3.3 seems to state
>>>> that requirement, though it's not especially clear; and the
>>>> client has required it since commit f11b2a1cfbf5 (2014).
>>>>
>>>> So the server should authenticate as [email protected] and not
>>>> host@klimt, in this case, when performing callback requests.
>>>
>>> Yes I agree that server should have authenticated as [email protected] and
>>> that's what I see in my (simple) single home setup.
>>>
>>> In nfs-utils there is code that deals with the callback and a comment
>>> about choices for the principal:
>>> * Restricting gssd to use "nfs" service name is needed for when
>>> * the NFS server is doing a callback to the NFS client. In this
>>> * case, the NFS server has to authenticate itself as "nfs" --
>>> * even if there are other service keys such as "host" or "root"
>>> * in the keytab.
>>> So the upcall for the callback should have specifically specified
>>> "nfs" to look for the nfs/<hostname>. The question is, if your keytab has
>>> both nfs/klimt and nfs/klimt.ib, how does it choose which one to take? I'm
>>> not sure. But I guess in your case you are seeing that it chose
>>> "host/<>", which would really be an nfs-utils bug.
>>
>> I think the upcall is correctly requesting an nfs/ principal
>> (see below).
>>
>> Not only does it need to choose an nfs/ principal, but it also
>> has to pick the correct domain name. The domain name does not
>> seem to be passed up to gssd. fs/nfsd/nfs4state.c has this:

Sorry, this is fs/nfsd/nfs4callback.c


>> 749 static struct rpc_cred *callback_cred;
>> 750
>> 751 int set_callback_cred(void)
>> 752 {
>> 753         if (callback_cred)
>> 754                 return 0;
>> 755         callback_cred = rpc_lookup_machine_cred("nfs");
>> 756         if (!callback_cred)
>> 757                 return -ENOMEM;
>> 758         return 0;
>> 759 }
>> 760
>> 761 void cleanup_callback_cred(void)
>> 762 {
>> 763         if (callback_cred) {
>> 764                 put_rpccred(callback_cred);
>> 765                 callback_cred = NULL;
>> 766         }
>> 767 }
>> 768
>> 769 static struct rpc_cred *get_backchannel_cred(struct nfs4_client *clp, struct rpc_clnt *client, struct nfsd4_session *ses)
>> 770 {
>> 771         if (clp->cl_minorversion == 0) {
>> 772                 return get_rpccred(callback_cred);
>> 773         } else {
>> 774                 struct rpc_auth *auth = client->cl_auth;
>> 775                 struct auth_cred acred = {};
>> 776
>> 777                 acred.uid = ses->se_cb_sec.uid;
>> 778                 acred.gid = ses->se_cb_sec.gid;
>> 779                 return auth->au_ops->lookup_cred(client->cl_auth, &acred, 0);
>> 780         }
>> 781 }
>>
>> rpc_lookup_machine_cred("nfs"); should request an "nfs/" service
>> principal, shouldn't it?

It doesn't seem to generate an upcall.


>> Though I think this approach is incorrect. The server should not
>> use the machine cred here, it should use a credential based on
>> the principal the client used to establish its lease.
>>
>>
>>> What's in your server's key tab?
>>
>> [root@klimt ~]# klist -ke /etc/krb5.keytab
>> Keytab name: FILE:/etc/krb5.keytab
>> KVNO Principal
>> ---- --------------------------------------------------------------------------
>>    4 host/[email protected] (aes256-cts-hmac-sha1-96)
>>    4 host/[email protected] (aes128-cts-hmac-sha1-96)
>>    4 host/[email protected] (des3-cbc-sha1)
>>    4 host/[email protected] (arcfour-hmac)
>>    3 nfs/[email protected] (aes256-cts-hmac-sha1-96)
>>    3 nfs/[email protected] (aes128-cts-hmac-sha1-96)
>>    3 nfs/[email protected] (des3-cbc-sha1)
>>    3 nfs/[email protected] (arcfour-hmac)
>>    3 nfs/[email protected] (aes256-cts-hmac-sha1-96)
>>    3 nfs/[email protected] (aes128-cts-hmac-sha1-96)
>>    3 nfs/[email protected] (des3-cbc-sha1)
>>    3 nfs/[email protected] (arcfour-hmac)
>>    3 nfs/[email protected] (aes256-cts-hmac-sha1-96)
>>    3 nfs/[email protected] (aes128-cts-hmac-sha1-96)
>>    3 nfs/[email protected] (des3-cbc-sha1)
>>    3 nfs/[email protected] (arcfour-hmac)
>> [root@klimt ~]#
>>
>> As a workaround, I bet moving the keys for nfs/klimt.ib to
>> the front of the keytab file would allow Kerberos to work
>> with the klimt.ib interface.
>>
>>
>>> An output from gssd -vvv would be interesting.
>>
>> May 10 14:43:24 klimt rpc.gssd[1191]: #012handle_gssd_upcall: 'mech=krb5 uid=0 [email protected] service=nfs enctypes=18,17,16,23,3,1,2 ' (nfsd4_cb/clnt0)
>> May 10 14:43:24 klimt rpc.gssd[1191]: krb5_use_machine_creds: uid 0 tgtname [email protected]
>> May 10 14:43:24 klimt rpc.gssd[1191]: Full hostname for 'manet.1015granger.net' is 'manet.1015granger.net'
>> May 10 14:43:24 klimt rpc.gssd[1191]: Full hostname for 'klimt.1015granger.net' is 'klimt.1015granger.net'
>
> I think that's the problem. This should have been
> klimt.ib.1015granger.net. nfs-utils just calls gethostname() to get
> the local domain name. And this is what it'll match against the key
> tab entry. So I think even if you move the key tabs around it probably
> will still pick [email protected].
>
> Honestly, I'm also surprised that "[email protected]"
> and not "[email protected]". What principal name
> did the client use to authenticate to the server? I also somehow
> assumed that this should have been
> "[email protected]".

Likely for the same reason you state, nfs-utils on the client
will use gethostname(3) to do the keytab lookup. And I didn't
put any nfs/ principals in my client keytab:

[root@manet ~]# klist -ke /etc/krb5.keytab
Keytab name: FILE:/etc/krb5.keytab
KVNO Principal
---- --------------------------------------------------------------------------
   2 host/[email protected] (aes256-cts-hmac-sha1-96)
   2 host/[email protected] (aes128-cts-hmac-sha1-96)
   2 host/[email protected] (des3-cbc-sha1)
   2 host/[email protected] (arcfour-hmac)
[root@manet ~]#


>> May 10 14:43:24 klimt rpc.gssd[1191]: Success getting keytab entry for 'nfs/[email protected]'
>> May 10 14:43:24 klimt rpc.gssd[1191]: gssd_get_single_krb5_cred: principal 'nfs/[email protected]' ccache:'FILE:/tmp/krb5ccmachine_1015GRANGER.NET'
>> May 10 14:43:24 klimt rpc.gssd[1191]: INFO: Credentials in CC 'FILE:/tmp/krb5ccmachine_1015GRANGER.NET' are good until 1526064204
>> May 10 14:43:24 klimt rpc.gssd[1191]: creating tcp client for server manet.1015granger.net
>> May 10 14:43:24 klimt rpc.gssd[1191]: creating context with server [email protected]
>> May 10 14:43:24 klimt rpc.gssd[1191]: doing downcall: lifetime_rec=76170 [email protected]
>> May 10 14:44:31 klimt rpc.gssd[1191]: #012handle_gssd_upcall: 'mech=krb5 uid=0 [email protected] service=nfs enctypes=18,17,16,23,3,1,2 ' (nfsd4_cb/clnt1)
>> May 10 14:44:31 klimt rpc.gssd[1191]: krb5_use_machine_creds: uid 0 tgtname [email protected]
>> May 10 14:44:31 klimt rpc.gssd[1191]: Full hostname for 'manet.1015granger.net' is 'manet.1015granger.net'
>> May 10 14:44:31 klimt rpc.gssd[1191]: Full hostname for 'klimt.1015granger.net' is 'klimt.1015granger.net'
>> May 10 14:44:31 klimt rpc.gssd[1191]: Success getting keytab entry for 'nfs/[email protected]'
>> May 10 14:44:31 klimt rpc.gssd[1191]: INFO: Credentials in CC 'FILE:/tmp/krb5ccmachine_1015GRANGER.NET' are good until 1526064204
>> May 10 14:44:31 klimt rpc.gssd[1191]: INFO: Credentials in CC 'FILE:/tmp/krb5ccmachine_1015GRANGER.NET' are good until 1526064204
>> May 10 14:44:31 klimt rpc.gssd[1191]: creating tcp client for server manet.1015granger.net
>> May 10 14:44:31 klimt rpc.gssd[1191]: creating context with server [email protected]
>> May 10 14:44:31 klimt rpc.gssd[1191]: doing downcall: lifetime_rec=76103 [email protected]
>
> Going back to the original mail where you wrote:
>
> check_gss_callback_principal: [email protected],
> [email protected]
>
> Where is this output on the client kernel or server kernel?
>
> According to the gssd output. In the callback authentication
> [email protected] is authenticating to
> [email protected]. None of them match the
> "check_gss_callback_principal" output. So I'm confused...

This is instrumentation I added to the check_gss_callback_principal
function on the client. The above is gssd output on the server.

The client seems to be checking the acceptor ([email protected]) of
the forward channel GSS context against the principal the server
actually uses (host@klimt) to establish the backchannel GSS
context.


>>>> This seems to mean that the server stack is going to need to
>>>> expose the SName in each GSS context so that it can dig that
>>>> out to create a proper callback credential for each callback
>>>> transport.
>>>>
>>>> I guess I've reported this issue before, but now I'm tucking
>>>> in and trying to address it correctly.

--
Chuck Lever




2018-05-10 21:34:18

by Olga Kornievskaia

Subject: Re: SETCLIENTID acceptor

On Thu, May 10, 2018 at 5:11 PM, Chuck Lever <[email protected]> wrote:
>
>
>> On May 10, 2018, at 4:58 PM, Olga Kornievskaia <[email protected]> wrote:
>>
>> On Thu, May 10, 2018 at 3:23 PM, Chuck Lever <[email protected]> wrote:
>>>
>>>
>>>> On May 10, 2018, at 3:07 PM, Olga Kornievskaia <[email protected]> wrote:
>>>>
>>>> On Thu, May 10, 2018 at 2:09 PM, Chuck Lever <[email protected]> wrote:
>>>>>
>>>>>
>>>>>> On May 10, 2018, at 1:40 PM, Olga Kornievskaia <[email protected]> wrote:
>>>>>>
>>>>>> On Wed, May 9, 2018 at 5:19 PM, Chuck Lever <[email protected]> wrote:
>>>>>>> I'm right on the edge of my understanding of how this all works.
>>>>>>>
>>>>>>> I've re-keyed my NFS server. Now on my client, I'm seeing this on
>>>>>>> vers=4.0,sec=sys mounts:
>>>>>>>
>>>>>>> May 8 16:40:30 manet kernel: NFS: NFSv4 callback contains invalid cred
>>>>>>> May 8 16:40:30 manet kernel: NFS: NFSv4 callback contains invalid cred
>>>>>>> May 8 16:40:30 manet kernel: NFS: NFSv4 callback contains invalid cred
>>>>>>>
>>>>>>> manet is my client, and klimt is my server. I'm mounting with
>>>>>>> NFS/RDMA, so I'm mounting hostname klimt.ib, not klimt.
>>>>>>>
>>>>>>> Because the client is using krb5i for lease management, the server
>>>>>>> is required to use krb5i for the callback channel (S 3.3.3 of RFC
>>>>>>> 7530).
>>>>>>>
>>>>>>> After a SETCLIENTID, the client copies the acceptor from the GSS
>>>>>>> context it set up, and uses that to check incoming callback
>>>>>>> requests. I instrumented the client's SETCLIENTID proc, and I see
>>>>>>> this:
>>>>>>>
>>>>>>> check_gss_callback_principal: [email protected], [email protected]
>>>>>>>
>>>>>>> The principal strings are not equal, and that's why the client
>>>>>>> believes the callback credential is bogus. Now I'm trying to
>>>>>>> figure out whether it is the server's callback client or the
>>>>>>> client's callback server that is misbehaving.
>>>>>>>
>>>>>>> To me, the server's callback principal (host@klimt) seems like it
>>>>>>> is correct. The client would identify as host@manet when making
>>>>>>> calls to the server, for example, so I'd expect the server to
>>>>>>> behave similarly when performing callbacks.
>>>>>>>
>>>>>>> Can anyone shed more light on this?
>>>>>>
>>>>>> What are your full hostnames of each machine and does the reverse
>>>>>> lookup from the ip to hostname on each machine give you what you
>>>>>> expect?
>>>>>>
>>>>>> Sounds like all of them need to be resolved to <>.ib.1015granger.net
>>>>>> but somewhere you are getting <>.1015granger.net instead.
>>>>>
>>>>> The forward and reverse mappings are consistent, and rdns is
>>>>> disabled in my krb5.conf files. My server is multi-homed; it
>>>>> has a 1GbE interface (klimt.1015granger.net); an FDR IB
>>>>> interface (klimt.ib.1015granger.net); and a 25 GbE interface
>>>>> (klimt.roce.1015granger.net).
>>>>
>>>> Ah, so you are keeping it very interesting...
>>>>
>>>>> My theory is that the server needs to use the same principal
>>>>> for callback operations that the client used for lease
>>>>> establishment. The last paragraph of S3.3.3 seems to state
>>>>> that requirement, though it's not especially clear; and the
>>>>> client has required it since commit f11b2a1cfbf5 (2014).
>>>>>
>>>>> So the server should authenticate as [email protected] and not
>>>>> host@klimt, in this case, when performing callback requests.
>>>>
>>>> Yes I agree that server should have authenticated as [email protected] and
>>>> that's what I see in my (simple) single home setup.
>>>>
>>>> In nfs-utils there is code that deals with the callback and a comment
>>>> about choices for the principal:
>>>> * Restricting gssd to use "nfs" service name is needed for when
>>>> * the NFS server is doing a callback to the NFS client. In this
>>>> * case, the NFS server has to authenticate itself as "nfs" --
>>>> * even if there are other service keys such as "host" or "root"
>>>> * in the keytab.
>>>> So the upcall for the callback should have specifically specified
>>>> "nfs" to look for the nfs/<hostname>. The question is, if your keytab has
>>>> both nfs/klimt and nfs/klimt.ib, how does it choose which one to take? I'm
>>>> not sure. But I guess in your case you are seeing that it chose
>>>> "host/<>", which would really be an nfs-utils bug.
>>>
>>> I think the upcall is correctly requesting an nfs/ principal
>>> (see below).
>>>
>>> Not only does it need to choose an nfs/ principal, but it also
>>> has to pick the correct domain name. The domain name does not
>>> seem to be passed up to gssd. fs/nfsd/nfs4state.c has this:
>
> Sorry, this is fs/nfsd/nfs4callback.c
>
>
>>> 749 static struct rpc_cred *callback_cred;
>>> 750
>>> 751 int set_callback_cred(void)
>>> 752 {
>>> 753         if (callback_cred)
>>> 754                 return 0;
>>> 755         callback_cred = rpc_lookup_machine_cred("nfs");
>>> 756         if (!callback_cred)
>>> 757                 return -ENOMEM;
>>> 758         return 0;
>>> 759 }
>>> 760
>>> 761 void cleanup_callback_cred(void)
>>> 762 {
>>> 763         if (callback_cred) {
>>> 764                 put_rpccred(callback_cred);
>>> 765                 callback_cred = NULL;
>>> 766         }
>>> 767 }
>>> 768
>>> 769 static struct rpc_cred *get_backchannel_cred(struct nfs4_client *clp, struct rpc_clnt *client, struct nfsd4_session *ses)
>>> 770 {
>>> 771         if (clp->cl_minorversion == 0) {
>>> 772                 return get_rpccred(callback_cred);
>>> 773         } else {
>>> 774                 struct rpc_auth *auth = client->cl_auth;
>>> 775                 struct auth_cred acred = {};
>>> 776
>>> 777                 acred.uid = ses->se_cb_sec.uid;
>>> 778                 acred.gid = ses->se_cb_sec.gid;
>>> 779                 return auth->au_ops->lookup_cred(client->cl_auth, &acred, 0);
>>> 780         }
>>> 781 }
>>>
>>> rpc_lookup_machine_cred("nfs"); should request an "nfs/" service
>>> principal, shouldn't it?
>
> It doesn't seem to generate an upcall.
>
>
>>> Though I think this approach is incorrect. The server should not
>>> use the machine cred here, it should use a credential based on
>>> the principal the client used to establish its lease.
>>>
>>>
>>>> What's in your server's key tab?
>>>
>>> [root@klimt ~]# klist -ke /etc/krb5.keytab
>>> Keytab name: FILE:/etc/krb5.keytab
>>> KVNO Principal
>>> ---- --------------------------------------------------------------------------
>>> 4 host/[email protected] (aes256-cts-hmac-sha1-96)
>>> 4 host/[email protected] (aes128-cts-hmac-sha1-96)
>>> 4 host/[email protected] (des3-cbc-sha1)
>>> 4 host/[email protected] (arcfour-hmac)
>>> 3 nfs/[email protected] (aes256-cts-hmac-sha1-96)
>>> 3 nfs/[email protected] (aes128-cts-hmac-sha1-96)
>>> 3 nfs/[email protected] (des3-cbc-sha1)
>>> 3 nfs/[email protected] (arcfour-hmac)
>>> 3 nfs/[email protected] (aes256-cts-hmac-sha1-96)
>>> 3 nfs/[email protected] (aes128-cts-hmac-sha1-96)
>>> 3 nfs/[email protected] (des3-cbc-sha1)
>>> 3 nfs/[email protected] (arcfour-hmac)
>>> 3 nfs/[email protected] (aes256-cts-hmac-sha1-96)
>>> 3 nfs/[email protected] (aes128-cts-hmac-sha1-96)
>>> 3 nfs/[email protected] (des3-cbc-sha1)
>>> 3 nfs/[email protected] (arcfour-hmac)
>>> [root@klimt ~]#
>>>
>>> As a workaround, I bet moving the keys for nfs/klimt.ib to
>>> the front of the keytab file would allow Kerberos to work
>>> with the klimt.ib interface.
>>>
>>>
>>>> An output from gssd -vvv would be interesting.
>>>
>>> May 10 14:43:24 klimt rpc.gssd[1191]: #012handle_gssd_upcall: 'mech=krb5 uid=0 [email protected] service=nfs enctypes=18,17,16,2
>>> 3,3,1,2 ' (nfsd4_cb/clnt0)
>>> May 10 14:43:24 klimt rpc.gssd[1191]: krb5_use_machine_creds: uid 0 tgtname [email protected]
>>> May 10 14:43:24 klimt rpc.gssd[1191]: Full hostname for 'manet.1015granger.net' is 'manet.1015granger.net'
>>> May 10 14:43:24 klimt rpc.gssd[1191]: Full hostname for 'klimt.1015granger.net' is 'klimt.1015granger.net'
>>
>> I think that's the problem. This should have been
>> klimt.ib.1015granger.net. nfs-utils just calls gethostname() to get
>> the local domain name. And this is what it'll match against the key
>> tab entry. So I think even if you move the key tabs around it probably
>> will still pick [email protected].
>>
>> Honestly, I'm also surprised that "[email protected]"
>> and not "[email protected]". What principal name
>> did the client use to authenticate to the server? I also somehow
>> assumed that this should have been
>> "[email protected]".
>
> Likely for the same reason you state, nfs-utils on the client
> will use gethostname(3) to do the keytab lookup. And I didn't
> put any nfs/ principals in my client keytab:
>
> [root@manet ~]# klist -ke /etc/krb5.keytab
> Keytab name: FILE:/etc/krb5.keytab
> KVNO Principal
> ---- --------------------------------------------------------------------------
> 2 host/[email protected] (aes256-cts-hmac-sha1-96)
> 2 host/[email protected] (aes128-cts-hmac-sha1-96)
> 2 host/[email protected] (des3-cbc-sha1)
> 2 host/[email protected] (arcfour-hmac)
> [root@manet ~]#
>
>
>>> May 10 14:43:24 klimt rpc.gssd[1191]: Success getting keytab entry for 'nfs/[email protected]'
>>> May 10 14:43:24 klimt rpc.gssd[1191]: gssd_get_single_krb5_cred: principal 'nfs/[email protected]' ccache:'FILE:/tmp/krb5ccmachine_1015GRANGER.NET'
>>> May 10 14:43:24 klimt rpc.gssd[1191]: INFO: Credentials in CC 'FILE:/tmp/krb5ccmachine_1015GRANGER.NET' are good until 1526064204
>>> May 10 14:43:24 klimt rpc.gssd[1191]: creating tcp client for server manet.1015granger.net
>>> May 10 14:43:24 klimt rpc.gssd[1191]: creating context with server [email protected]
>>> May 10 14:43:24 klimt rpc.gssd[1191]: doing downcall: lifetime_rec=76170 [email protected]
>>> May 10 14:44:31 klimt rpc.gssd[1191]: #012handle_gssd_upcall: 'mech=krb5 uid=0 [email protected] service=nfs enctypes=18,17,16,23,3,1,2 ' (nfsd4_cb/clnt1)
>>> May 10 14:44:31 klimt rpc.gssd[1191]: krb5_use_machine_creds: uid 0 tgtname [email protected]
>>> May 10 14:44:31 klimt rpc.gssd[1191]: Full hostname for 'manet.1015granger.net' is 'manet.1015granger.net'
>>> May 10 14:44:31 klimt rpc.gssd[1191]: Full hostname for 'klimt.1015granger.net' is 'klimt.1015granger.net'
>>> May 10 14:44:31 klimt rpc.gssd[1191]: Success getting keytab entry for 'nfs/[email protected]'
>>> May 10 14:44:31 klimt rpc.gssd[1191]: INFO: Credentials in CC 'FILE:/tmp/krb5ccmachine_1015GRANGER.NET' are good until 1526064204
>>> May 10 14:44:31 klimt rpc.gssd[1191]: INFO: Credentials in CC 'FILE:/tmp/krb5ccmachine_1015GRANGER.NET' are good until 1526064204
>>> May 10 14:44:31 klimt rpc.gssd[1191]: creating tcp client for server manet.1015granger.net
>>> May 10 14:44:31 klimt rpc.gssd[1191]: creating context with server [email protected]
>>> May 10 14:44:31 klimt rpc.gssd[1191]: doing downcall: lifetime_rec=76103 [email protected]
>>
>> Going back to the original mail where you wrote:
>>
>> check_gss_callback_principal: [email protected],
>> [email protected]
>>
>> Where is this output on the client kernel or server kernel?
>>
>> According to the gssd output. In the callback authentication
>> [email protected] is authenticating to
>> [email protected]. None of them match the
>> "check_gss_callback_principal" output. So I'm confused...
>
> This is instrumentation I added to the check_gss_callback_principal
> function on the client. The above is gssd output on the server.
>
> The client seems to be checking the acceptor ([email protected]) of
> the forward channel GSS context against the principal the server
> actually uses (host@klimt) to establish the backchannel GSS
> context.
>

But according to the gssd output on the server, the server uses
'nfs/[email protected]', not "host@klimt", as the
principal. If that output had differed only in the domain part, it
would match my understanding.


>
>>>>> This seems to mean that the server stack is going to need to
>>>>> expose the SName in each GSS context so that it can dig that
>>>>> out to create a proper callback credential for each callback
>>>>> transport.
>>>>>
>>>>> I guess I've reported this issue before, but now I'm tucking
>>>>> in and trying to address it correctly.
>
> --
> Chuck Lever
>
>
>

2018-05-11 14:34:59

by Chuck Lever III

Subject: Re: SETCLIENTID acceptor



> On May 10, 2018, at 5:34 PM, Olga Kornievskaia <[email protected]> wrote:
>
> On Thu, May 10, 2018 at 5:11 PM, Chuck Lever <[email protected]> wrote:
>>
>>
>>> On May 10, 2018, at 4:58 PM, Olga Kornievskaia <[email protected]> wrote:
>>>
>>> On Thu, May 10, 2018 at 3:23 PM, Chuck Lever <[email protected]> wrote:
>>>> May 10 14:43:24 klimt rpc.gssd[1191]: Success getting keytab entry for 'nfs/[email protected]'
>>>> May 10 14:43:24 klimt rpc.gssd[1191]: gssd_get_single_krb5_cred: principal 'nfs/[email protected]' ccache:'FILE:/tmp/krb5ccmachine_1015GRANGER.NET'
>>>> May 10 14:43:24 klimt rpc.gssd[1191]: INFO: Credentials in CC 'FILE:/tmp/krb5ccmachine_1015GRANGER.NET' are good until 1526064204
>>>> May 10 14:43:24 klimt rpc.gssd[1191]: creating tcp client for server manet.1015granger.net
>>>> May 10 14:43:24 klimt rpc.gssd[1191]: creating context with server [email protected]
>>>> May 10 14:43:24 klimt rpc.gssd[1191]: doing downcall: lifetime_rec=76170 [email protected]
>>>> May 10 14:44:31 klimt rpc.gssd[1191]: #012handle_gssd_upcall: 'mech=krb5 uid=0 [email protected] service=nfs enctypes=18,17,16,23,3,1,2 ' (nfsd4_cb/clnt1)
>>>> May 10 14:44:31 klimt rpc.gssd[1191]: krb5_use_machine_creds: uid 0 tgtname [email protected]
>>>> May 10 14:44:31 klimt rpc.gssd[1191]: Full hostname for 'manet.1015granger.net' is 'manet.1015granger.net'
>>>> May 10 14:44:31 klimt rpc.gssd[1191]: Full hostname for 'klimt.1015granger.net' is 'klimt.1015granger.net'
>>>> May 10 14:44:31 klimt rpc.gssd[1191]: Success getting keytab entry for 'nfs/[email protected]'
>>>> May 10 14:44:31 klimt rpc.gssd[1191]: INFO: Credentials in CC 'FILE:/tmp/krb5ccmachine_1015GRANGER.NET' are good until 1526064204
>>>> May 10 14:44:31 klimt rpc.gssd[1191]: INFO: Credentials in CC 'FILE:/tmp/krb5ccmachine_1015GRANGER.NET' are good until 1526064204
>>>> May 10 14:44:31 klimt rpc.gssd[1191]: creating tcp client for server manet.1015granger.net
>>>> May 10 14:44:31 klimt rpc.gssd[1191]: creating context with server [email protected]
>>>> May 10 14:44:31 klimt rpc.gssd[1191]: doing downcall: lifetime_rec=76103 [email protected]
>>>
>>> Going back to the original mail where you wrote:
>>>
>>> check_gss_callback_principal: [email protected],
>>> [email protected]
>>>
>>> Where is this output on the client kernel or server kernel?
>>>
>>> According to the gssd output. In the callback authentication
>>> [email protected] is authenticating to
>>> [email protected]. None of them match the
>>> "check_gss_callback_principal" output. So I'm confused...
>>
>> This is instrumentation I added to the check_gss_callback_principal
>> function on the client. The above is gssd output on the server.
>>
>> The client seems to be checking the acceptor ([email protected]) of
>> the forward channel GSS context against the principal the server
>> actually uses (host@klimt) to establish the backchannel GSS
>> context.
>>
>
> But according to the gssd output on the server, the server uses
> 'nfs/[email protected]' not "host@klimt" as the
> principal.
> So if that output would have been a difference but only in the domain,
> then that would be matching my understanding.

I can't even get this to work with NFS/TCP on klimt.1015granger.net,
and a single "nfs/klimt.1015granger.net" entry in the server's keytab.
The client complains the server is using "[email protected]"
as the callback principal.

I'm looking into it.

--
Chuck Lever




2018-05-11 19:44:03

by Chuck Lever III

Subject: Re: SETCLIENTID acceptor



> On May 11, 2018, at 10:34 AM, Chuck Lever <[email protected]> wrote:
>
>
>
>> On May 10, 2018, at 5:34 PM, Olga Kornievskaia <[email protected]> wrote:
>>
>> On Thu, May 10, 2018 at 5:11 PM, Chuck Lever <[email protected]> wrote:
>>>
>>>
>>>> On May 10, 2018, at 4:58 PM, Olga Kornievskaia <[email protected]> wrote:
>>>>
>>>> On Thu, May 10, 2018 at 3:23 PM, Chuck Lever <[email protected]> wrote:
>>>>> May 10 14:43:24 klimt rpc.gssd[1191]: Success getting keytab entry for 'nfs/[email protected]'
>>>>> May 10 14:43:24 klimt rpc.gssd[1191]: gssd_get_single_krb5_cred: principal 'nfs/[email protected]' ccache:'FILE:/tmp/krb5ccmachine_1015GRANGER.NET'
>>>>> May 10 14:43:24 klimt rpc.gssd[1191]: INFO: Credentials in CC 'FILE:/tmp/krb5ccmachine_1015GRANGER.NET' are good until 1526064204
>>>>> May 10 14:43:24 klimt rpc.gssd[1191]: creating tcp client for server manet.1015granger.net
>>>>> May 10 14:43:24 klimt rpc.gssd[1191]: creating context with server [email protected]
>>>>> May 10 14:43:24 klimt rpc.gssd[1191]: doing downcall: lifetime_rec=76170 [email protected]
>>>>> May 10 14:44:31 klimt rpc.gssd[1191]: #012handle_gssd_upcall: 'mech=krb5 uid=0 [email protected] service=nfs enctypes=18,17,16,23,3,1,2 ' (nfsd4_cb/clnt1)
>>>>> May 10 14:44:31 klimt rpc.gssd[1191]: krb5_use_machine_creds: uid 0 tgtname [email protected]
>>>>> May 10 14:44:31 klimt rpc.gssd[1191]: Full hostname for 'manet.1015granger.net' is 'manet.1015granger.net'
>>>>> May 10 14:44:31 klimt rpc.gssd[1191]: Full hostname for 'klimt.1015granger.net' is 'klimt.1015granger.net'
>>>>> May 10 14:44:31 klimt rpc.gssd[1191]: Success getting keytab entry for 'nfs/[email protected]'
>>>>> May 10 14:44:31 klimt rpc.gssd[1191]: INFO: Credentials in CC 'FILE:/tmp/krb5ccmachine_1015GRANGER.NET' are good until 1526064204
>>>>> May 10 14:44:31 klimt rpc.gssd[1191]: INFO: Credentials in CC 'FILE:/tmp/krb5ccmachine_1015GRANGER.NET' are good until 1526064204
>>>>> May 10 14:44:31 klimt rpc.gssd[1191]: creating tcp client for server manet.1015granger.net
>>>>> May 10 14:44:31 klimt rpc.gssd[1191]: creating context with server [email protected]
>>>>> May 10 14:44:31 klimt rpc.gssd[1191]: doing downcall: lifetime_rec=76103 [email protected]
>>>>
>>>> Going back to the original mail where you wrote:
>>>>
>>>> check_gss_callback_principal: [email protected],
>>>> [email protected]
>>>>
>>>> Where is this output on the client kernel or server kernel?
>>>>
>>>> According to the gssd output. In the callback authentication
>>>> [email protected] is authenticating to
>>>> [email protected]. None of them match the
>>>> "check_gss_callback_principal" output. So I'm confused...
>>>
>>> This is instrumentation I added to the check_gss_callback_principal
>>> function on the client. The above is gssd output on the server.
>>>
>>> The client seems to be checking the acceptor ([email protected]) of
>>> the forward channel GSS context against the principal the server
>>> actually uses (host@klimt) to establish the backchannel GSS
>>> context.
>>>
>>
>> But according to the gssd output on the server, the server uses
>> 'nfs/[email protected]' not "host@klimt" as the
>> principal.
>> So if that output would have been a difference but only in the domain,
>> then that would be matching my understanding.
>
> I can't even get this to work with NFS/TCP on klimt.1015granger.net,
> and a single "nfs/klimt.1015granger.net" entry in the server's keytab.
> The client complains the server is using "[email protected]"
> as the callback principal.
>
> I'm looking into it.

It appears that gssproxy caches the credential on persistent storage.
See /var/lib/gssproxy/clients/*


--
Chuck Lever




2018-05-11 20:04:19

by Olga Kornievskaia

Subject: Re: SETCLIENTID acceptor

On Fri, May 11, 2018 at 3:43 PM, Chuck Lever <[email protected]> wrote:
>
>
>> On May 11, 2018, at 10:34 AM, Chuck Lever <[email protected]> wrote:
>>
>>
>>
>>> On May 10, 2018, at 5:34 PM, Olga Kornievskaia <[email protected]> wrote:
>>>
>>> On Thu, May 10, 2018 at 5:11 PM, Chuck Lever <[email protected]> wrote:
>>>>
>>>>
>>>>> On May 10, 2018, at 4:58 PM, Olga Kornievskaia <[email protected]> wrote:
>>>>>
>>>>> On Thu, May 10, 2018 at 3:23 PM, Chuck Lever <[email protected]> wrote:
>>>>>> May 10 14:43:24 klimt rpc.gssd[1191]: Success getting keytab entry for 'nfs/[email protected]'
>>>>>> May 10 14:43:24 klimt rpc.gssd[1191]: gssd_get_single_krb5_cred: principal 'nfs/[email protected]' ccache:'FILE:/tmp/krb5ccmachine_1015GRANGER.NET'
>>>>>> May 10 14:43:24 klimt rpc.gssd[1191]: INFO: Credentials in CC 'FILE:/tmp/krb5ccmachine_1015GRANGER.NET' are good until 1526064204
>>>>>> May 10 14:43:24 klimt rpc.gssd[1191]: creating tcp client for server manet.1015granger.net
>>>>>> May 10 14:43:24 klimt rpc.gssd[1191]: creating context with server [email protected]
>>>>>> May 10 14:43:24 klimt rpc.gssd[1191]: doing downcall: lifetime_rec=76170 [email protected]
>>>>>> May 10 14:44:31 klimt rpc.gssd[1191]: #012handle_gssd_upcall: 'mech=krb5 uid=0 [email protected] service=nfs enctypes=18,17,16,23,3,1,2 ' (nfsd4_cb/clnt1)
>>>>>> May 10 14:44:31 klimt rpc.gssd[1191]: krb5_use_machine_creds: uid 0 tgtname [email protected]
>>>>>> May 10 14:44:31 klimt rpc.gssd[1191]: Full hostname for 'manet.1015granger.net' is 'manet.1015granger.net'
>>>>>> May 10 14:44:31 klimt rpc.gssd[1191]: Full hostname for 'klimt.1015granger.net' is 'klimt.1015granger.net'
>>>>>> May 10 14:44:31 klimt rpc.gssd[1191]: Success getting keytab entry for 'nfs/[email protected]'
>>>>>> May 10 14:44:31 klimt rpc.gssd[1191]: INFO: Credentials in CC 'FILE:/tmp/krb5ccmachine_1015GRANGER.NET' are good until 1526064204
>>>>>> May 10 14:44:31 klimt rpc.gssd[1191]: INFO: Credentials in CC 'FILE:/tmp/krb5ccmachine_1015GRANGER.NET' are good until 1526064204
>>>>>> May 10 14:44:31 klimt rpc.gssd[1191]: creating tcp client for server manet.1015granger.net
>>>>>> May 10 14:44:31 klimt rpc.gssd[1191]: creating context with server [email protected]
>>>>>> May 10 14:44:31 klimt rpc.gssd[1191]: doing downcall: lifetime_rec=76103 [email protected]
>>>>>
>>>>> Going back to the original mail where you wrote:
>>>>>
>>>>> check_gss_callback_principal: [email protected],
>>>>> [email protected]
>>>>>
>>>>> Where is this output on the client kernel or server kernel?
>>>>>
>>>>> According to the gssd output. In the callback authentication
>>>>> [email protected] is authenticating to
>>>>> [email protected]. None of them match the
>>>>> "check_gss_callback_principal" output. So I'm confused...
>>>>
>>>> This is instrumentation I added to the check_gss_callback_principal
>>>> function on the client. The above is gssd output on the server.
>>>>
>>>> The client seems to be checking the acceptor ([email protected]) of
>>>> the forward channel GSS context against the principal the server
>>>> actually uses (host@klimt) to establish the backchannel GSS
>>>> context.
>>>>
>>>
>>> But according to the gssd output on the server, the server uses
>>> 'nfs/[email protected]' not "host@klimt" as the
>>> principal.
>>> So if that output had shown a difference only in the domain,
>>> then that would match my understanding.
>>
>> I can't even get this to work with NFS/TCP on klimt.1015granger.net,
>> and a single "nfs/klimt.1015granger.net" entry in the server's keytab.
>> The client complains the server is using "[email protected]"
>> as the callback principal.
>>
>> I'm looking into it.
>
> It appears that gssproxy caches the credential on persistent storage.
> See /var/lib/gssproxy/clients/*

gssproxy has given me so many problems. I always turn it off.

2018-05-11 20:57:51

by Chuck Lever III

Subject: Re: SETCLIENTID acceptor



> On May 10, 2018, at 4:58 PM, Olga Kornievskaia <[email protected]> wrote:
>
> On Thu, May 10, 2018 at 3:23 PM, Chuck Lever <[email protected]> wrote:
>>
>>
>>> On May 10, 2018, at 3:07 PM, Olga Kornievskaia <[email protected]> wrote:
>>>
>>> On Thu, May 10, 2018 at 2:09 PM, Chuck Lever <[email protected]> wrote:
>>>>
>>>>
>>>>> On May 10, 2018, at 1:40 PM, Olga Kornievskaia <[email protected]> wrote:
>>>>>
>>>>> On Wed, May 9, 2018 at 5:19 PM, Chuck Lever <[email protected]> wrote:
>>>>>> I'm right on the edge of my understanding of how this all works.
>>>>>>
>>>>>> I've re-keyed my NFS server. Now on my client, I'm seeing this on
>>>>>> vers=4.0,sec=sys mounts:
>>>>>>
>>>>>> May 8 16:40:30 manet kernel: NFS: NFSv4 callback contains invalid cred
>>>>>> May 8 16:40:30 manet kernel: NFS: NFSv4 callback contains invalid cred
>>>>>> May 8 16:40:30 manet kernel: NFS: NFSv4 callback contains invalid cred
>>>>>>
>>>>>> manet is my client, and klimt is my server. I'm mounting with
>>>>>> NFS/RDMA, so I'm mounting hostname klimt.ib, not klimt.
>>>>>>
>>>>>> Because the client is using krb5i for lease management, the server
>>>>>> is required to use krb5i for the callback channel (S 3.3.3 of RFC
>>>>>> 7530).
>>>>>>
>>>>>> After a SETCLIENTID, the client copies the acceptor from the GSS
>>>>>> context it set up, and uses that to check incoming callback
>>>>>> requests. I instrumented the client's SETCLIENTID proc, and I see
>>>>>> this:
>>>>>>
>>>>>> check_gss_callback_principal: [email protected], [email protected]
>>>>>>
>>>>>> The principal strings are not equal, and that's why the client
>>>>>> believes the callback credential is bogus. Now I'm trying to
>>>>>> figure out whether it is the server's callback client or the
>>>>>> client's callback server that is misbehaving.
>>>>>>
>>>>>> To me, the server's callback principal (host@klimt) seems like it
>>>>>> is correct. The client would identify as host@manet when making
>>>>>> calls to the server, for example, so I'd expect the server to
>>>>>> behave similarly when performing callbacks.
>>>>>>
>>>>>> Can anyone shed more light on this?
>>>>>
>>>>> What are your full hostnames of each machine and does the reverse
>>>>> lookup from the ip to hostname on each machine give you what you
>>>>> expect?
>>>>>
>>>>> Sounds like all of them need to be resolved to <>.ib.1015grager.net
>>>>> but somewhere you are getting <>.1015grager.net instead.
>>>>
>>>> The forward and reverse mappings are consistent, and rdns is
>>>> disabled in my krb5.conf files. My server is multi-homed; it
>>>> has a 1GbE interface (klimt.1015granger.net); an FDR IB
>>>> interface (klimt.ib.1015granger.net); and a 25 GbE interface
>>>> (klimt.roce.1015granger.net).
>>>
>>> Ah, so you are keeping it very interesting...
>>>
>>>> My theory is that the server needs to use the same principal
>>>> for callback operations that the client used for lease
>>>> establishment. The last paragraph of S3.3.3 seems to state
>>>> that requirement, though it's not especially clear; and the
>>>> client has required it since commit f11b2a1cfbf5 (2014).
>>>>
>>>> So the server should authenticate as [email protected] and not
>>>> host@klimt, in this case, when performing callback requests.
>>>
>>> Yes I agree that server should have authenticated as [email protected] and
>>> that's what I see in my (simple) single home setup.
>>>
>>> In nfs-utils there is code that deals with the callback and comment
>>> about choices for the principal:
>>> * Restricting gssd to use "nfs" service name is needed for when
>>> * the NFS server is doing a callback to the NFS client. In this
>>> * case, the NFS server has to authenticate itself as "nfs" --
>>> * even if there are other service keys such as "host" or "root"
>>> * in the keytab.
>>> So the upcall for the callback should have specifically specified
>>> "nfs" to look for the nfs/<hostname>. Question is if you key tab has
>>> both:
>>> nfs/klmit and nfs/klmit.ib how does it choose which one to take. I'm
>>> not sure. But I guess in your case you are seeing that it choose
>>> "host/<>" which would really be a nfs-utils bug.
>>
>> I think the upcall is correctly requesting an nfs/ principal
>> (see below).
>>
>> Not only does it need to choose an nfs/ principal, but it also
>> has to pick the correct domain name. The domain name does not
>> seem to be passed up to gssd. fs/nfsd/nfs4state.c has this:
>>
>> 749 static struct rpc_cred *callback_cred;
>> 750
>> 751 int set_callback_cred(void)
>> 752 {
>> 753 if (callback_cred)
>> 754 return 0;
>> 755 callback_cred = rpc_lookup_machine_cred("nfs");
>> 756 if (!callback_cred)
>> 757 return -ENOMEM;
>> 758 return 0;
>> 759 }
>> 760
>> 761 void cleanup_callback_cred(void)
>> 762 {
>> 763 if (callback_cred) {
>> 764 put_rpccred(callback_cred);
>> 765 callback_cred = NULL;
>> 766 }
>> 767 }
>> 768
>> 769 static struct rpc_cred *get_backchannel_cred(struct nfs4_client *clp, struct rpc_clnt *client, struct nfsd4_session *ses)
>> 770 {
>> 771 if (clp->cl_minorversion == 0) {
>> 772 return get_rpccred(callback_cred);
>> 773 } else {
>> 774 struct rpc_auth *auth = client->cl_auth;
>> 775 struct auth_cred acred = {};
>> 776
>> 777 acred.uid = ses->se_cb_sec.uid;
>> 778 acred.gid = ses->se_cb_sec.gid;
>> 779 return auth->au_ops->lookup_cred(client->cl_auth, &acred, 0);
>> 780 }
>> 781 }
>>
>> rpc_lookup_machine_cred("nfs"); should request an "nfs/" service
>> principal, shouldn't it?
>>
>> Though I think this approach is incorrect. The server should not
>> use the machine cred here, it should use a credential based on
>> the principal the client used to establish it's lease.
>>
>>
>>> What's in your server's key tab?
>>
>> [root@klimt ~]# klist -ke /etc/krb5.keytab
>> Keytab name: FILE:/etc/krb5.keytab
>> KVNO Principal
>> ---- --------------------------------------------------------------------------
>> 4 host/[email protected] (aes256-cts-hmac-sha1-96)
>> 4 host/[email protected] (aes128-cts-hmac-sha1-96)
>> 4 host/[email protected] (des3-cbc-sha1)
>> 4 host/[email protected] (arcfour-hmac)
>> 3 nfs/[email protected] (aes256-cts-hmac-sha1-96)
>> 3 nfs/[email protected] (aes128-cts-hmac-sha1-96)
>> 3 nfs/[email protected] (des3-cbc-sha1)
>> 3 nfs/[email protected] (arcfour-hmac)
>> 3 nfs/[email protected] (aes256-cts-hmac-sha1-96)
>> 3 nfs/[email protected] (aes128-cts-hmac-sha1-96)
>> 3 nfs/[email protected] (des3-cbc-sha1)
>> 3 nfs/[email protected] (arcfour-hmac)
>> 3 nfs/[email protected] (aes256-cts-hmac-sha1-96)
>> 3 nfs/[email protected] (aes128-cts-hmac-sha1-96)
>> 3 nfs/[email protected] (des3-cbc-sha1)
>> 3 nfs/[email protected] (arcfour-hmac)
>> [root@klimt ~]#
>>
>> As a workaround, I bet moving the keys for nfs/klimt.ib to
>> the front of the keytab file would allow Kerberos to work
>> with the klimt.ib interface.
>>
>>
>>> An output from gssd -vvv would be interesting.
>>
>> May 10 14:43:24 klimt rpc.gssd[1191]: #012handle_gssd_upcall: 'mech=krb5 uid=0 [email protected] service=nfs enctypes=18,17,16,2
>> 3,3,1,2 ' (nfsd4_cb/clnt0)
>> May 10 14:43:24 klimt rpc.gssd[1191]: krb5_use_machine_creds: uid 0 tgtname [email protected]
>> May 10 14:43:24 klimt rpc.gssd[1191]: Full hostname for 'manet.1015granger.net' is 'manet.1015granger.net'
>> May 10 14:43:24 klimt rpc.gssd[1191]: Full hostname for 'klimt.1015granger.net' is 'klimt.1015granger.net'
>
> I think that's the problem. This should have been
> klimt.ib.1015granger.net. nfs-utils just calls gethostname() to get
> the local domain name. And this is what it'll match against the key
> tab entry. So I think even if you move the key tabs around it probably
> will still pick [email protected].

mount.nfs has a helper function called nfs_ca_sockname() that does a
connect/getsockname dance to derive the local host's hostname as it
is seen by the other end of the connection. So in this case, the
server's gssd would get the client's name, "manet.ib.1015granger.net"
and the "nfs" service name, and would correctly derive the service
principal "nfs/klimt.ib.1015granger.net" based on that.

Would it work if gssd did this instead of using gethostname(3) ? Then
the kernel wouldn't have to pass the correct principal up to gssd, it
would be able to derive it by itself.


--
Chuck Lever




2018-05-14 17:26:18

by Olga Kornievskaia

Subject: Re: SETCLIENTID acceptor

On Fri, May 11, 2018 at 4:57 PM, Chuck Lever <[email protected]> wrote:
>
>
>> On May 10, 2018, at 4:58 PM, Olga Kornievskaia <[email protected]> wrote:
>>
>> On Thu, May 10, 2018 at 3:23 PM, Chuck Lever <[email protected]> wrote:
>>>
>>>
>>>> On May 10, 2018, at 3:07 PM, Olga Kornievskaia <[email protected]> wrote:
>>>>
>>>> On Thu, May 10, 2018 at 2:09 PM, Chuck Lever <[email protected]> wrote:
>>>>>
>>>>>
>>>>>> On May 10, 2018, at 1:40 PM, Olga Kornievskaia <[email protected]> wrote:
>>>>>>
>>>>>> On Wed, May 9, 2018 at 5:19 PM, Chuck Lever <[email protected]> wrote:
>>>>>>> I'm right on the edge of my understanding of how this all works.
>>>>>>>
>>>>>>> I've re-keyed my NFS server. Now on my client, I'm seeing this on
>>>>>>> vers=4.0,sec=sys mounts:
>>>>>>>
>>>>>>> May 8 16:40:30 manet kernel: NFS: NFSv4 callback contains invalid cred
>>>>>>> May 8 16:40:30 manet kernel: NFS: NFSv4 callback contains invalid cred
>>>>>>> May 8 16:40:30 manet kernel: NFS: NFSv4 callback contains invalid cred
>>>>>>>
>>>>>>> manet is my client, and klimt is my server. I'm mounting with
>>>>>>> NFS/RDMA, so I'm mounting hostname klimt.ib, not klimt.
>>>>>>>
>>>>>>> Because the client is using krb5i for lease management, the server
>>>>>>> is required to use krb5i for the callback channel (S 3.3.3 of RFC
>>>>>>> 7530).
>>>>>>>
>>>>>>> After a SETCLIENTID, the client copies the acceptor from the GSS
>>>>>>> context it set up, and uses that to check incoming callback
>>>>>>> requests. I instrumented the client's SETCLIENTID proc, and I see
>>>>>>> this:
>>>>>>>
>>>>>>> check_gss_callback_principal: [email protected], [email protected]
>>>>>>>
>>>>>>> The principal strings are not equal, and that's why the client
>>>>>>> believes the callback credential is bogus. Now I'm trying to
>>>>>>> figure out whether it is the server's callback client or the
>>>>>>> client's callback server that is misbehaving.
>>>>>>>
>>>>>>> To me, the server's callback principal (host@klimt) seems like it
>>>>>>> is correct. The client would identify as host@manet when making
>>>>>>> calls to the server, for example, so I'd expect the server to
>>>>>>> behave similarly when performing callbacks.
>>>>>>>
>>>>>>> Can anyone shed more light on this?
>>>>>>
>>>>>> What are your full hostnames of each machine and does the reverse
>>>>>> lookup from the ip to hostname on each machine give you what you
>>>>>> expect?
>>>>>>
>>>>>> Sounds like all of them need to be resolved to <>.ib.1015grager.net
>>>>>> but somewhere you are getting <>.1015grager.net instead.
>>>>>
>>>>> The forward and reverse mappings are consistent, and rdns is
>>>>> disabled in my krb5.conf files. My server is multi-homed; it
>>>>> has a 1GbE interface (klimt.1015granger.net); an FDR IB
>>>>> interface (klimt.ib.1015granger.net); and a 25 GbE interface
>>>>> (klimt.roce.1015granger.net).
>>>>
>>>> Ah, so you are keeping it very interesting...
>>>>
>>>>> My theory is that the server needs to use the same principal
>>>>> for callback operations that the client used for lease
>>>>> establishment. The last paragraph of S3.3.3 seems to state
>>>>> that requirement, though it's not especially clear; and the
>>>>> client has required it since commit f11b2a1cfbf5 (2014).
>>>>>
>>>>> So the server should authenticate as [email protected] and not
>>>>> host@klimt, in this case, when performing callback requests.
>>>>
>>>> Yes I agree that server should have authenticated as [email protected] and
>>>> that's what I see in my (simple) single home setup.
>>>>
>>>> In nfs-utils there is code that deals with the callback and comment
>>>> about choices for the principal:
>>>> * Restricting gssd to use "nfs" service name is needed for when
>>>> * the NFS server is doing a callback to the NFS client. In this
>>>> * case, the NFS server has to authenticate itself as "nfs" --
>>>> * even if there are other service keys such as "host" or "root"
>>>> * in the keytab.
>>>> So the upcall for the callback should have specifically specified
>>>> "nfs" to look for the nfs/<hostname>. Question is if you key tab has
>>>> both:
>>>> nfs/klmit and nfs/klmit.ib how does it choose which one to take. I'm
>>>> not sure. But I guess in your case you are seeing that it choose
>>>> "host/<>" which would really be a nfs-utils bug.
>>>
>>> I think the upcall is correctly requesting an nfs/ principal
>>> (see below).
>>>
>>> Not only does it need to choose an nfs/ principal, but it also
>>> has to pick the correct domain name. The domain name does not
>>> seem to be passed up to gssd. fs/nfsd/nfs4state.c has this:
>>>
>>> 749 static struct rpc_cred *callback_cred;
>>> 750
>>> 751 int set_callback_cred(void)
>>> 752 {
>>> 753 if (callback_cred)
>>> 754 return 0;
>>> 755 callback_cred = rpc_lookup_machine_cred("nfs");
>>> 756 if (!callback_cred)
>>> 757 return -ENOMEM;
>>> 758 return 0;
>>> 759 }
>>> 760
>>> 761 void cleanup_callback_cred(void)
>>> 762 {
>>> 763 if (callback_cred) {
>>> 764 put_rpccred(callback_cred);
>>> 765 callback_cred = NULL;
>>> 766 }
>>> 767 }
>>> 768
>>> 769 static struct rpc_cred *get_backchannel_cred(struct nfs4_client *clp, struct rpc_clnt *client, struct nfsd4_session *ses)
>>> 770 {
>>> 771 if (clp->cl_minorversion == 0) {
>>> 772 return get_rpccred(callback_cred);
>>> 773 } else {
>>> 774 struct rpc_auth *auth = client->cl_auth;
>>> 775 struct auth_cred acred = {};
>>> 776
>>> 777 acred.uid = ses->se_cb_sec.uid;
>>> 778 acred.gid = ses->se_cb_sec.gid;
>>> 779 return auth->au_ops->lookup_cred(client->cl_auth, &acred, 0);
>>> 780 }
>>> 781 }
>>>
>>> rpc_lookup_machine_cred("nfs"); should request an "nfs/" service
>>> principal, shouldn't it?
>>>
>>> Though I think this approach is incorrect. The server should not
>>> use the machine cred here, it should use a credential based on
>>> the principal the client used to establish it's lease.
>>>
>>>
>>>> What's in your server's key tab?
>>>
>>> [root@klimt ~]# klist -ke /etc/krb5.keytab
>>> Keytab name: FILE:/etc/krb5.keytab
>>> KVNO Principal
>>> ---- --------------------------------------------------------------------------
>>> 4 host/[email protected] (aes256-cts-hmac-sha1-96)
>>> 4 host/[email protected] (aes128-cts-hmac-sha1-96)
>>> 4 host/[email protected] (des3-cbc-sha1)
>>> 4 host/[email protected] (arcfour-hmac)
>>> 3 nfs/[email protected] (aes256-cts-hmac-sha1-96)
>>> 3 nfs/[email protected] (aes128-cts-hmac-sha1-96)
>>> 3 nfs/[email protected] (des3-cbc-sha1)
>>> 3 nfs/[email protected] (arcfour-hmac)
>>> 3 nfs/[email protected] (aes256-cts-hmac-sha1-96)
>>> 3 nfs/[email protected] (aes128-cts-hmac-sha1-96)
>>> 3 nfs/[email protected] (des3-cbc-sha1)
>>> 3 nfs/[email protected] (arcfour-hmac)
>>> 3 nfs/[email protected] (aes256-cts-hmac-sha1-96)
>>> 3 nfs/[email protected] (aes128-cts-hmac-sha1-96)
>>> 3 nfs/[email protected] (des3-cbc-sha1)
>>> 3 nfs/[email protected] (arcfour-hmac)
>>> [root@klimt ~]#
>>>
>>> As a workaround, I bet moving the keys for nfs/klimt.ib to
>>> the front of the keytab file would allow Kerberos to work
>>> with the klimt.ib interface.
>>>
>>>
>>>> An output from gssd -vvv would be interesting.
>>>
>>> May 10 14:43:24 klimt rpc.gssd[1191]: #012handle_gssd_upcall: 'mech=krb5 uid=0 [email protected] service=nfs enctypes=18,17,16,2
>>> 3,3,1,2 ' (nfsd4_cb/clnt0)
>>> May 10 14:43:24 klimt rpc.gssd[1191]: krb5_use_machine_creds: uid 0 tgtname [email protected]
>>> May 10 14:43:24 klimt rpc.gssd[1191]: Full hostname for 'manet.1015granger.net' is 'manet.1015granger.net'
>>> May 10 14:43:24 klimt rpc.gssd[1191]: Full hostname for 'klimt.1015granger.net' is 'klimt.1015granger.net'
>>
>> I think that's the problem. This should have been
>> klimt.ib.1015granger.net. nfs-utils just calls gethostname() to get
>> the local domain name. And this is what it'll match against the key
>> tab entry. So I think even if you move the key tabs around it probably
>> will still pick [email protected].
>
> mount.nfs has a helper function called nfs_ca_sockname() that does a
> connect/getsockname dance to derive the local host's hostname as it
> is seen by the other end of the connection. So in this case, the
> server's gssd would get the client's name, "manet.ib.1015granger.net"
> and the "nfs" service name, and would correctly derive the service
> principal "nfs/klimt.ib.1015granger.net" based on that.
>
> Would it work if gssd did this instead of using gethostname(3) ? Then
> the kernel wouldn't have to pass the correct principal up to gssd, it
> would be able to derive it by itself.

I'd need to remind myself of how all of this works before I could
confidently answer this. We are currently passing "target=" from the
kernel as well as doing gethostbyname() in the gssd. Why? I don't know
and need to figure out what each piece really accomplishes.

I would think if the kernel could provide us with the correct domain
name (as it knows over which interface the request came in), then gssd
should just be using that instead querying the domain on its own.

Btw, what happened after you turned off the gssproxy? Did you get
further in getting the "nfs" and not "host" identity used?

2018-05-14 18:02:26

by Chuck Lever III

Subject: Re: SETCLIENTID acceptor



> On May 14, 2018, at 1:26 PM, Olga Kornievskaia <[email protected]> wrote:
>
> On Fri, May 11, 2018 at 4:57 PM, Chuck Lever <[email protected]> wrote:
>>
>>
>>> On May 10, 2018, at 4:58 PM, Olga Kornievskaia <[email protected]> wrote:
>>>
>>> On Thu, May 10, 2018 at 3:23 PM, Chuck Lever <[email protected]> wrote:
>>>>
>>>>
>>>>> On May 10, 2018, at 3:07 PM, Olga Kornievskaia <[email protected]> wrote:
>>>>>
>>>>> On Thu, May 10, 2018 at 2:09 PM, Chuck Lever <[email protected]> wrote:
>>>>>>
>>>>>>
>>>>>>> On May 10, 2018, at 1:40 PM, Olga Kornievskaia <[email protected]> wrote:
>>>>>>>
>>>>>>> On Wed, May 9, 2018 at 5:19 PM, Chuck Lever <[email protected]> wrote:
>>>>>>>> I'm right on the edge of my understanding of how this all works.
>>>>>>>>
>>>>>>>> I've re-keyed my NFS server. Now on my client, I'm seeing this on
>>>>>>>> vers=4.0,sec=sys mounts:
>>>>>>>>
>>>>>>>> May 8 16:40:30 manet kernel: NFS: NFSv4 callback contains invalid cred
>>>>>>>> May 8 16:40:30 manet kernel: NFS: NFSv4 callback contains invalid cred
>>>>>>>> May 8 16:40:30 manet kernel: NFS: NFSv4 callback contains invalid cred
>>>>>>>>
>>>>>>>> manet is my client, and klimt is my server. I'm mounting with
>>>>>>>> NFS/RDMA, so I'm mounting hostname klimt.ib, not klimt.
>>>>>>>>
>>>>>>>> Because the client is using krb5i for lease management, the server
>>>>>>>> is required to use krb5i for the callback channel (S 3.3.3 of RFC
>>>>>>>> 7530).
>>>>>>>>
>>>>>>>> After a SETCLIENTID, the client copies the acceptor from the GSS
>>>>>>>> context it set up, and uses that to check incoming callback
>>>>>>>> requests. I instrumented the client's SETCLIENTID proc, and I see
>>>>>>>> this:
>>>>>>>>
>>>>>>>> check_gss_callback_principal: [email protected], [email protected]
>>>>>>>>
>>>>>>>> The principal strings are not equal, and that's why the client
>>>>>>>> believes the callback credential is bogus. Now I'm trying to
>>>>>>>> figure out whether it is the server's callback client or the
>>>>>>>> client's callback server that is misbehaving.
>>>>>>>>
>>>>>>>> To me, the server's callback principal (host@klimt) seems like it
>>>>>>>> is correct. The client would identify as host@manet when making
>>>>>>>> calls to the server, for example, so I'd expect the server to
>>>>>>>> behave similarly when performing callbacks.
>>>>>>>>
>>>>>>>> Can anyone shed more light on this?
>>>>>>>
>>>>>>> What are your full hostnames of each machine and does the reverse
>>>>>>> lookup from the ip to hostname on each machine give you what you
>>>>>>> expect?
>>>>>>>
>>>>>>> Sounds like all of them need to be resolved to <>.ib.1015grager.net
>>>>>>> but somewhere you are getting <>.1015grager.net instead.
>>>>>>
>>>>>> The forward and reverse mappings are consistent, and rdns is
>>>>>> disabled in my krb5.conf files. My server is multi-homed; it
>>>>>> has a 1GbE interface (klimt.1015granger.net); an FDR IB
>>>>>> interface (klimt.ib.1015granger.net); and a 25 GbE interface
>>>>>> (klimt.roce.1015granger.net).
>>>>>
>>>>> Ah, so you are keeping it very interesting...
>>>>>
>>>>>> My theory is that the server needs to use the same principal
>>>>>> for callback operations that the client used for lease
>>>>>> establishment. The last paragraph of S3.3.3 seems to state
>>>>>> that requirement, though it's not especially clear; and the
>>>>>> client has required it since commit f11b2a1cfbf5 (2014).
>>>>>>
>>>>>> So the server should authenticate as [email protected] and not
>>>>>> host@klimt, in this case, when performing callback requests.
>>>>>
>>>>> Yes I agree that server should have authenticated as [email protected] and
>>>>> that's what I see in my (simple) single home setup.
>>>>>
>>>>> In nfs-utils there is code that deals with the callback and comment
>>>>> about choices for the principal:
>>>>> * Restricting gssd to use "nfs" service name is needed for when
>>>>> * the NFS server is doing a callback to the NFS client. In this
>>>>> * case, the NFS server has to authenticate itself as "nfs" --
>>>>> * even if there are other service keys such as "host" or "root"
>>>>> * in the keytab.
>>>>> So the upcall for the callback should have specifically specified
>>>>> "nfs" to look for the nfs/<hostname>. Question is if you key tab has
>>>>> both:
>>>>> nfs/klmit and nfs/klmit.ib how does it choose which one to take. I'm
>>>>> not sure. But I guess in your case you are seeing that it choose
>>>>> "host/<>" which would really be a nfs-utils bug.
>>>>
>>>> I think the upcall is correctly requesting an nfs/ principal
>>>> (see below).
>>>>
>>>> Not only does it need to choose an nfs/ principal, but it also
>>>> has to pick the correct domain name. The domain name does not
>>>> seem to be passed up to gssd. fs/nfsd/nfs4state.c has this:
>>>>
>>>> 749 static struct rpc_cred *callback_cred;
>>>> 750
>>>> 751 int set_callback_cred(void)
>>>> 752 {
>>>> 753 if (callback_cred)
>>>> 754 return 0;
>>>> 755 callback_cred = rpc_lookup_machine_cred("nfs");
>>>> 756 if (!callback_cred)
>>>> 757 return -ENOMEM;
>>>> 758 return 0;
>>>> 759 }
>>>> 760
>>>> 761 void cleanup_callback_cred(void)
>>>> 762 {
>>>> 763 if (callback_cred) {
>>>> 764 put_rpccred(callback_cred);
>>>> 765 callback_cred = NULL;
>>>> 766 }
>>>> 767 }
>>>> 768
>>>> 769 static struct rpc_cred *get_backchannel_cred(struct nfs4_client *clp, struct rpc_clnt *client, struct nfsd4_session *ses)
>>>> 770 {
>>>> 771 if (clp->cl_minorversion == 0) {
>>>> 772 return get_rpccred(callback_cred);
>>>> 773 } else {
>>>> 774 struct rpc_auth *auth = client->cl_auth;
>>>> 775 struct auth_cred acred = {};
>>>> 776
>>>> 777 acred.uid = ses->se_cb_sec.uid;
>>>> 778 acred.gid = ses->se_cb_sec.gid;
>>>> 779 return auth->au_ops->lookup_cred(client->cl_auth, &acred, 0);
>>>> 780 }
>>>> 781 }
>>>>
>>>> rpc_lookup_machine_cred("nfs"); should request an "nfs/" service
>>>> principal, shouldn't it?
>>>>
>>>> Though I think this approach is incorrect. The server should not
>>>> use the machine cred here, it should use a credential based on
>>>> the principal the client used to establish it's lease.
>>>>
>>>>
>>>>> What's in your server's key tab?
>>>>
>>>> [root@klimt ~]# klist -ke /etc/krb5.keytab
>>>> Keytab name: FILE:/etc/krb5.keytab
>>>> KVNO Principal
>>>> ---- --------------------------------------------------------------------------
>>>> 4 host/[email protected] (aes256-cts-hmac-sha1-96)
>>>> 4 host/[email protected] (aes128-cts-hmac-sha1-96)
>>>> 4 host/[email protected] (des3-cbc-sha1)
>>>> 4 host/[email protected] (arcfour-hmac)
>>>> 3 nfs/[email protected] (aes256-cts-hmac-sha1-96)
>>>> 3 nfs/[email protected] (aes128-cts-hmac-sha1-96)
>>>> 3 nfs/[email protected] (des3-cbc-sha1)
>>>> 3 nfs/[email protected] (arcfour-hmac)
>>>> 3 nfs/[email protected] (aes256-cts-hmac-sha1-96)
>>>> 3 nfs/[email protected] (aes128-cts-hmac-sha1-96)
>>>> 3 nfs/[email protected] (des3-cbc-sha1)
>>>> 3 nfs/[email protected] (arcfour-hmac)
>>>> 3 nfs/[email protected] (aes256-cts-hmac-sha1-96)
>>>> 3 nfs/[email protected] (aes128-cts-hmac-sha1-96)
>>>> 3 nfs/[email protected] (des3-cbc-sha1)
>>>> 3 nfs/[email protected] (arcfour-hmac)
>>>> [root@klimt ~]#
>>>>
>>>> As a workaround, I bet moving the keys for nfs/klimt.ib to
>>>> the front of the keytab file would allow Kerberos to work
>>>> with the klimt.ib interface.
>>>>
>>>>
>>>>> An output from gssd -vvv would be interesting.
>>>>
>>>> May 10 14:43:24 klimt rpc.gssd[1191]: #012handle_gssd_upcall: 'mech=krb5 uid=0 [email protected] service=nfs enctypes=18,17,16,2
>>>> 3,3,1,2 ' (nfsd4_cb/clnt0)
>>>> May 10 14:43:24 klimt rpc.gssd[1191]: krb5_use_machine_creds: uid 0 tgtname [email protected]
>>>> May 10 14:43:24 klimt rpc.gssd[1191]: Full hostname for 'manet.1015granger.net' is 'manet.1015granger.net'
>>>> May 10 14:43:24 klimt rpc.gssd[1191]: Full hostname for 'klimt.1015granger.net' is 'klimt.1015granger.net'
>>>
>>> I think that's the problem. This should have been
>>> klimt.ib.1015granger.net. nfs-utils just calls gethostname() to get
>>> the local domain name. And this is what it'll match against the key
>>> tab entry. So I think even if you move the key tabs around it probably
>>> will still pick [email protected].
>>
>> mount.nfs has a helper function called nfs_ca_sockname() that does a
>> connect/getsockname dance to derive the local host's hostname as it
>> is seen by the other end of the connection. So in this case, the
>> server's gssd would get the client's name, "manet.ib.1015granger.net"
>> and the "nfs" service name, and would correctly derive the service
>> principal "nfs/klimt.ib.1015granger.net" based on that.
>>
>> Would it work if gssd did this instead of using gethostname(3) ? Then
>> the kernel wouldn't have to pass the correct principal up to gssd, it
>> would be able to derive it by itself.
>
> I'd need to remind myself of how all of this works before I could
> confidently answer this. We are currently passing "target=" from the
> kernel as well as doing gethostbyname() in the gssd. Why? I don't know
> and need to figure out what each piece really accomplishes.
>
> I would think if the kernel could provide us with the correct domain
> name (as it knows over which interface the request came in), then gssd
> should just be using that instead querying the domain on its own.

I didn't see a target field, but I didn't look that closely.

The credential created by the kernel for this purpose does
not appear to provide more than "nfs" as the service
principal. Changing gssd as I describe above seems to help
the situation (on the server at least; I don't know what it
would do to the client).

It looks like the same cred is used for all NFSv4.0 callback
channels. That at least will need a code change to make
multi-homing work properly with Kerberos.

I'm not claiming that I have a long term solution here. I'm
just reporting my experimental results :-)


> Btw, what happened after you turned off the gssproxy? Did you get
> further in getting the "nfs" and not "host" identity used?

I erased the gssproxy cache, and that appears to have fixed
the client misbehavior. I'm still using gssproxy, and I was
able to use NFSv4.0 with Kerberos on my TCP-only i/f, then
on my IB i/f, then on my RoCE i/f without notable problems.

Since gssproxy is the default configuration on RHEL 7-based
systems, I think we want to make gssproxy work rather than
disabling it -- unless there is some serious structural
problem that will prevent it from ever working right.


--
Chuck Lever




2018-05-14 21:00:05

by J. Bruce Fields

Subject: Re: SETCLIENTID acceptor

On Wed, May 09, 2018 at 05:19:41PM -0400, Chuck Lever wrote:
> I'm right on the edge of my understanding of how this all works.
>
> I've re-keyed my NFS server. Now on my client, I'm seeing this on
> vers=4.0,sec=sys mounts:
>
> May 8 16:40:30 manet kernel: NFS: NFSv4 callback contains invalid cred
> May 8 16:40:30 manet kernel: NFS: NFSv4 callback contains invalid cred
> May 8 16:40:30 manet kernel: NFS: NFSv4 callback contains invalid cred
>
> manet is my client, and klimt is my server. I'm mounting with
> NFS/RDMA, so I'm mounting hostname klimt.ib, not klimt.
>
> Because the client is using krb5i for lease management, the server
> is required to use krb5i for the callback channel (S 3.3.3 of RFC
> 7530).
>
> After a SETCLIENTID, the client copies the acceptor from the GSS
> context it set up, and uses that to check incoming callback
> requests. I instrumented the client's SETCLIENTID proc, and I see
> this:
>
> check_gss_callback_principal: [email protected], [email protected]
>
> The principal strings are not equal, and that's why the client
> believes the callback credential is bogus. Now I'm trying to
> figure out whether it is the server's callback client or the
> client's callback server that is misbehaving.
>
> To me, the server's callback principal (host@klimt) seems like it
> is correct. The client would identify as host@manet when making
> calls to the server, for example, so I'd expect the server to
> behave similarly when performing callbacks.

2018-05-14 21:07:40

by J. Bruce Fields

Subject: Re: SETCLIENTID acceptor

On Mon, May 14, 2018 at 02:02:19PM -0400, Chuck Lever wrote:
> I erased the gssproxy cache, and that appears to have fixed
> the client misbehavior. I'm still using gssproxy, and I was
> able to use NFSv4.0 with Kerberos on my TCP-only i/f, then
> on my IB i/f, then on my RoCE i/f without notable problems.
>
> Since gssproxy is the default configuration on RHEL 7-based
> systems, I think we want to make gssproxy work rather than
> disabling it -- unless there is some serious structural
> problem that will prevent it from ever working right.

Yeah. Maybe discuss it with Simo or someone if we've figured out what's
actually going on.

--g.