2011-10-20 23:39:14

by Arno Schuring

[permalink] [raw]
Subject: sec=krb5 mounts never return

Hello list,

tl;dr: mount calls with sec=krb5 never return. They appear to get
stalled in the kernel on "svc: transport is dead". I need to know
if this is a configuration issue, a kernel problem or a userland
problem, and how it can be resolved.


I've been running a succesful NFS4 setup at home for a year now, but
incorporating krb5 security has so far proven fruitless. I believe the
Kerberos side of the equation is no longer causing problems; it is used
for user authentication too, and nfs security contexts seem to work
properly. As said above, the mount request for any Kerberos mount gets
halted somewhere in flight:

# mount -vvv -t nfs4 -o sec=krb5 genie:/media /mnt
mount: fstab path: "/etc/fstab"
mount: mtab path: "/etc/mtab"
mount: lock path: "/etc/mtab~"
mount: temp path: "/etc/mtab.tmp"
mount: UID: 0
mount: eUID: 0
mount: spec: "genie:/media"
mount: node: "/mnt"
mount: types: "nfs4"
mount: opts: "sec=krb5"
mount: external mount: argv[0] = "/sbin/mount.nfs4"
mount: external mount: argv[1] = "genie:/media"
mount: external mount: argv[2] = "/mnt"
mount: external mount: argv[3] = "-v"
mount: external mount: argv[4] = "-o"
mount: external mount: argv[5] = "rw,sec=krb5"
mount.nfs4: timeout set for Sun Oct 16 22:52:10 2011
mount.nfs4: trying text-based options
'sec=krb5,addr=172.22.21.8,clientaddr=172.22.21.5'

And there it just seems to hang (the timeout is ignored completely)
until killed via signal or Ctrl-C. The same mount without sec=krb5
completes without issue:
# mount -v -t nfs4 genie:/media /mnt
mount.nfs4: timeout set for Thu Oct 20 23:48:55 2011
mount.nfs4: trying text-based options
'addr=172.22.21.8,clientaddr=172.22.21.8'
genie:/media on /mnt type nfs4 (rw)

The machine is running a fully updated Debian 6.0 (Squeeze). So far I
have only tried with one other client (also Squeeze) with similar
results. Relevant software versions: kernel 2.6.32 (armel) and
nfs-utils 1.2.2.


The only suggestion that Google has been able to give me was to verify
that the rpcsec_gss_krb5 was loaded, and it is:
$ lsmod|grep gss
rpcsec_gss_krb5 9026 5
auth_rpcgss 33334 4 rpcsec_gss_krb5,nfsd,nfs
sunrpc 171216 21 [..]

Reproducible with only these exports:
# exportfs -v
/srv/nfs 172.22.21.8(ro,wdelay,root_squash,no_subtree_check,fsid=0)
/srv/nfs gss/krb5(ro,wdelay,root_squash,no_subtree_check,fsid=0)


I can provide a full rpc_debug/svc_debug log if required. I'm not
attaching it now because it is somewhat large (well, 60k) and I'm not
even sure that user issues are on-topic for this list. What looks
suspicious to me is the following excerpt:
[600472.772226] nfsd: connect from 172.22.21.8, port=46257
[600472.772300] svc: svc_setup_socket created deda1a00 (inet df948900)
[600473.431966] svc_recv: found XPT_CLOSE
[600473.431982] svc: svc_delete_xprt(deda1a00)
[600473.432114] svc: transport deda1a00 is dead, not enqueued



I'd be grateful for any pointers,
Arno
(not subscribed, please keep me in CC)


2011-10-25 22:00:31

by Arno Schuring

[permalink] [raw]
Subject: Re: sec=krb5 mounts never return

Arno Schuring ([email protected] on 2011-10-21 01:23 +0200):
> Hello list,
>
> tl;dr: mount calls with sec=krb5 never return. They appear to get
> stalled in the kernel on "svc: transport is dead". I need to know
> if this is a configuration issue, a kernel problem or a userland
> problem, and how it can be resolved.

This appears to have been reported two months ago against Debian:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=638157

And fixed over a year ago on this list:
http://www.spinics.net/lists/linux-nfs/msg14723.html


It's a wonderful world, isn't it?

Best wishes,
Arno

2011-10-21 22:05:47

by Douglas E. Engert

[permalink] [raw]
Subject: Re: sec=krb5 mounts never return

If the enctype=4 and length= 8 that sounds like DES.
You may need to add to the krb5.conf file,

allow_weak_crypto = yes


On 10/21/2011 3:58 PM, Kevin Coffman wrote:
> On Fri, Oct 21, 2011 at 4:10 PM, Arno Schuring<[email protected]> wrote:
>> Hi Kevin,
>>
>> thanks for your reply. Attached is all the information I have been able
>> to gather thus far (kernel and daemon logs).
>>
>> Kevin Coffman ([email protected] on 2011-10-21 10:33 -0400):
>>> On Thu, Oct 20, 2011 at 7:23 PM, Arno Schuring
>>> <[email protected]> wrote:
>>>>
>>>> I've been running a succesful NFS4 setup at home for a year now, but
>>>> incorporating krb5 security has so far proven fruitless. I believe
>>>> the Kerberos side of the equation is no longer causing problems; it
>>>> is used for user authentication too, and nfs security contexts seem
>>>> to work properly. As said above, the mount request for any Kerberos
>>>> mount gets halted somewhere in flight:
>> [..]
>>>> [600472.772226] nfsd: connect from 172.22.21.8, port=46257
>>>> [600472.772300] svc: svc_setup_socket created deda1a00 (inet
>>>> df948900) [600473.431966] svc_recv: found XPT_CLOSE
>>>> [600473.431982] svc: svc_delete_xprt(deda1a00)
>>>> [600473.432114] svc: transport deda1a00 is dead, not enqueued
>>>>
>>>
>>> You should be seeing syslog messages if not, but I'll ask anyway.
>>> You've got rpc.gssd configured and running on the client, and if this
>>> is a linux server, rpc.svcgssd configured and running on the server.
>>> ("Configured" more or less means you've got a keytab.) If they are
>>> running, what does their output look like? (Perhaps using "-vvv" to
>>> get detailed output.)
>>
>> In this case I'm trying with a local mount, so client==server. The gssd
>> logs invariably end with the following lines:
>> rpc.gssd[26133]: creating context with server [email protected]
>> rpc.gssd[28189]: DEBUG: serialize_krb5_ctx: lucid version!
>> rpc.gssd[28189]: prepare_krb5_rfc1964_buffer: serializing keys with
>> enctype 4 and length 8
>> rpc.gssd[28189]: doing downcall
>> [ then nothing until I kill the mount process ]
>>
>> In the svcgssd logs, nothing stands out to me. It all appears proper
>> to the untrained eye:
>>
>> rpc.svcgssd[26188]: handling null request
>> rpc.svcgssd[26188]: sname = nfs/[email protected]
>> rpc.svcgssd[26188]: libnfsidmap: using (default) domain: loos.site
>> rpc.svcgssd[26188]: DEBUG: serialize_krb5_ctx: lucid version!
>> rpc.svcgssd[26188]: prepare_krb5_rfc1964_buffer: serializing keys with
>> enctype 4 and length 8
>> rpc.svcgssd[26188]: doing downcall
>> rpc.svcgssd[26188]: mech: krb5, hndl len: 4, ctx len 85, timeout:
>> 1319183270 (35999 from now), clnt: [email protected], uid: -1, gid:
>> -1, num aux grps: 0:
>> rpc.svcgssd[26188]: sending null reply
>> rpc.svcgssd[26188]: finished handling null request
>>
>>
>> Regards,
>> Arno
>
> The userland/daemon stuff all looks fine to me. I'm not as familiar
> with the kernel logs. I believe the following kernel messages are
> from the ^C:
>
> Oct 20 23:48:18 genie kernel: [600500.662524] RPC: 437 return -512,
> status -512
> Oct 20 23:48:18 genie kernel: [600500.662539] RPC: 437 release task
> Oct 20 23:48:18 genie kernel: [600500.662558] RPC: freeing
> buffer of size 3712 at d6915000
> Oct 20 23:48:18 genie kernel: [600500.662578] RPC: 437 release
> request d6911000
> Oct 20 23:48:18 genie kernel: [600500.662594] RPC:
> wake_up_next(c3966234 "xprt_backlog")
> Oct 20 23:48:18 genie kernel: [600500.662613] RPC: 437 releasing
> RPCSEC_GSS cred df1f9300
> Oct 20 23:48:18 genie kernel: [600500.662631] RPC:
> rpc_release_client(de639e00)
> Oct 20 23:48:18 genie kernel: [600500.662648] RPC: 437 freeing task
> Oct 20 23:48:18 genie kernel: [600500.662664] nfs4_get_root: getroot error = 512
>
> I don't see anything obviously wrong.
>
> K.C.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>

--

Douglas E. Engert <[email protected]>
Argonne National Laboratory
9700 South Cass Avenue
Argonne, Illinois 60439
(630) 252-5444

2011-10-21 14:33:17

by Kevin Coffman

[permalink] [raw]
Subject: Re: sec=krb5 mounts never return

On Thu, Oct 20, 2011 at 7:23 PM, Arno Schuring <[email protected]> wrote:
> Hello list,
>
> tl;dr: mount calls with sec=krb5 never return. They appear to get
> stalled in the kernel on "svc: transport is dead". I need to know
> if this is a configuration issue, a kernel problem or a userland
> problem, and how it can be resolved.
>
>
> I've been running a succesful NFS4 setup at home for a year now, but
> incorporating krb5 security has so far proven fruitless. I believe the
> Kerberos side of the equation is no longer causing problems; it is used
> for user authentication too, and nfs security contexts seem to work
> properly. As said above, the mount request for any Kerberos mount gets
> halted somewhere in flight:
>
> # mount -vvv -t nfs4 -o sec=krb5 genie:/media /mnt
> mount: fstab path: "/etc/fstab"
> mount: mtab path: ?"/etc/mtab"
> mount: lock path: ?"/etc/mtab~"
> mount: temp path: ?"/etc/mtab.tmp"
> mount: UID: ? ? ? ?0
> mount: eUID: ? ? ? 0
> mount: spec: ?"genie:/media"
> mount: node: ?"/mnt"
> mount: types: "nfs4"
> mount: opts: ?"sec=krb5"
> mount: external mount: argv[0] = "/sbin/mount.nfs4"
> mount: external mount: argv[1] = "genie:/media"
> mount: external mount: argv[2] = "/mnt"
> mount: external mount: argv[3] = "-v"
> mount: external mount: argv[4] = "-o"
> mount: external mount: argv[5] = "rw,sec=krb5"
> mount.nfs4: timeout set for Sun Oct 16 22:52:10 2011
> mount.nfs4: trying text-based options
> 'sec=krb5,addr=172.22.21.8,clientaddr=172.22.21.5'
>
> And there it just seems to hang (the timeout is ignored completely)
> until killed via signal or Ctrl-C. The same mount without sec=krb5
> completes without issue:
> # mount -v -t nfs4 genie:/media /mnt
> mount.nfs4: timeout set for Thu Oct 20 23:48:55 2011
> mount.nfs4: trying text-based options
> 'addr=172.22.21.8,clientaddr=172.22.21.8'
> genie:/media on /mnt type nfs4 (rw)
>
> The machine is running a fully updated Debian 6.0 (Squeeze). So far I
> have only tried with one other client (also Squeeze) with similar
> results. Relevant software versions: kernel 2.6.32 (armel) and
> nfs-utils 1.2.2.
>
>
> The only suggestion that Google has been able to give me was to verify
> that the rpcsec_gss_krb5 was loaded, and it is:
> $ lsmod|grep gss
> rpcsec_gss_krb5 ? ? ? ? 9026 ?5
> auth_rpcgss ? ? ? ? ? ?33334 ?4 rpcsec_gss_krb5,nfsd,nfs
> sunrpc ? ? ? ? ? ? ? ?171216 ?21 [..]
>
> Reproducible with only these exports:
> # exportfs -v
> /srv/nfs ?172.22.21.8(ro,wdelay,root_squash,no_subtree_check,fsid=0)
> /srv/nfs ?gss/krb5(ro,wdelay,root_squash,no_subtree_check,fsid=0)
>
>
> I can provide a full rpc_debug/svc_debug log if required. I'm not
> attaching it now because it is somewhat large (well, 60k) and I'm not
> even sure that user issues are on-topic for this list. What looks
> suspicious to me is the following excerpt:
> [600472.772226] nfsd: connect from 172.22.21.8, port=46257
> [600472.772300] svc: svc_setup_socket created deda1a00 (inet df948900)
> [600473.431966] svc_recv: found XPT_CLOSE
> [600473.431982] svc: svc_delete_xprt(deda1a00)
> [600473.432114] svc: transport deda1a00 is dead, not enqueued
>
>
>
> I'd be grateful for any pointers,
> Arno
> (not subscribed, please keep me in CC)

You should be seeing syslog messages if not, but I'll ask anyway.
You've got rpc.gssd configured and running on the client, and if this
is a linux server, rpc.svcgssd configured and running on the server.
("Configured" more or less means you've got a keytab.) If they are
running, what does their output look like? (Perhaps using "-vvv" to
get detailed output.)

K.C.

2011-10-21 20:10:18

by Arno Schuring

[permalink] [raw]
Subject: Re: sec=krb5 mounts never return

Hi Kevin,

thanks for your reply. Attached is all the information I have been able
to gather thus far (kernel and daemon logs).

Kevin Coffman ([email protected] on 2011-10-21 10:33 -0400):
> On Thu, Oct 20, 2011 at 7:23 PM, Arno Schuring
> <[email protected]> wrote:
> >
> > I've been running a succesful NFS4 setup at home for a year now, but
> > incorporating krb5 security has so far proven fruitless. I believe
> > the Kerberos side of the equation is no longer causing problems; it
> > is used for user authentication too, and nfs security contexts seem
> > to work properly. As said above, the mount request for any Kerberos
> > mount gets halted somewhere in flight:
[..]
> > [600472.772226] nfsd: connect from 172.22.21.8, port=46257
> > [600472.772300] svc: svc_setup_socket created deda1a00 (inet
> > df948900) [600473.431966] svc_recv: found XPT_CLOSE
> > [600473.431982] svc: svc_delete_xprt(deda1a00)
> > [600473.432114] svc: transport deda1a00 is dead, not enqueued
> >
>
> You should be seeing syslog messages if not, but I'll ask anyway.
> You've got rpc.gssd configured and running on the client, and if this
> is a linux server, rpc.svcgssd configured and running on the server.
> ("Configured" more or less means you've got a keytab.) If they are
> running, what does their output look like? (Perhaps using "-vvv" to
> get detailed output.)

In this case I'm trying with a local mount, so client==server. The gssd
logs invariably end with the following lines:
rpc.gssd[26133]: creating context with server [email protected]
rpc.gssd[28189]: DEBUG: serialize_krb5_ctx: lucid version!
rpc.gssd[28189]: prepare_krb5_rfc1964_buffer: serializing keys with
enctype 4 and length 8
rpc.gssd[28189]: doing downcall
[ then nothing until I kill the mount process ]

In the svcgssd logs, nothing stands out to me. It all appears proper
to the untrained eye:

rpc.svcgssd[26188]: handling null request
rpc.svcgssd[26188]: sname = nfs/[email protected]
rpc.svcgssd[26188]: libnfsidmap: using (default) domain: loos.site
rpc.svcgssd[26188]: DEBUG: serialize_krb5_ctx: lucid version!
rpc.svcgssd[26188]: prepare_krb5_rfc1964_buffer: serializing keys with
enctype 4 and length 8
rpc.svcgssd[26188]: doing downcall
rpc.svcgssd[26188]: mech: krb5, hndl len: 4, ctx len 85, timeout:
1319183270 (35999 from now), clnt: [email protected], uid: -1, gid:
-1, num aux grps: 0:
rpc.svcgssd[26188]: sending null reply
rpc.svcgssd[26188]: finished handling null request


Regards,
Arno


Attachments:
(No filename) (2.44 kB)
daemon.log (6.61 kB)
kern.log (52.38 kB)
Download all attachments

2011-10-21 20:58:22

by Kevin Coffman

[permalink] [raw]
Subject: Re: sec=krb5 mounts never return

On Fri, Oct 21, 2011 at 4:10 PM, Arno Schuring <[email protected]> wrote:
> Hi Kevin,
>
> thanks for your reply. Attached is all the information I have been able
> to gather thus far (kernel and daemon logs).
>
> Kevin Coffman ([email protected] on 2011-10-21 10:33 -0400):
>> On Thu, Oct 20, 2011 at 7:23 PM, Arno Schuring
>> <[email protected]> wrote:
>> >
>> > I've been running a succesful NFS4 setup at home for a year now, but
>> > incorporating krb5 security has so far proven fruitless. I believe
>> > the Kerberos side of the equation is no longer causing problems; it
>> > is used for user authentication too, and nfs security contexts seem
>> > to work properly. As said above, the mount request for any Kerberos
>> > mount gets halted somewhere in flight:
> [..]
>> > [600472.772226] nfsd: connect from 172.22.21.8, port=46257
>> > [600472.772300] svc: svc_setup_socket created deda1a00 (inet
>> > df948900) [600473.431966] svc_recv: found XPT_CLOSE
>> > [600473.431982] svc: svc_delete_xprt(deda1a00)
>> > [600473.432114] svc: transport deda1a00 is dead, not enqueued
>> >
>>
>> You should be seeing syslog messages if not, but I'll ask anyway.
>> You've got rpc.gssd configured and running on the client, and if this
>> is a linux server, rpc.svcgssd configured and running on the server.
>> ("Configured" more or less means you've got a keytab.) ?If they are
>> running, what does their output look like? ?(Perhaps using "-vvv" to
>> get detailed output.)
>
> In this case I'm trying with a local mount, so client==server. The gssd
> logs invariably end with the following lines:
> rpc.gssd[26133]: creating context with server [email protected]
> rpc.gssd[28189]: DEBUG: serialize_krb5_ctx: lucid version!
> rpc.gssd[28189]: prepare_krb5_rfc1964_buffer: serializing keys with
> enctype 4 and length 8
> rpc.gssd[28189]: doing downcall
> [ then nothing until I kill the mount process ]
>
> In the svcgssd logs, nothing stands out to me. It all appears proper
> to the untrained eye:
>
> rpc.svcgssd[26188]: handling null request
> rpc.svcgssd[26188]: sname = nfs/[email protected]
> rpc.svcgssd[26188]: libnfsidmap: using (default) domain: loos.site
> rpc.svcgssd[26188]: DEBUG: serialize_krb5_ctx: lucid version!
> rpc.svcgssd[26188]: prepare_krb5_rfc1964_buffer: serializing keys with
> enctype 4 and length 8
> rpc.svcgssd[26188]: doing downcall
> rpc.svcgssd[26188]: mech: krb5, hndl len: 4, ctx len 85, timeout:
> 1319183270 (35999 from now), clnt: [email protected], uid: -1, gid:
> -1, num aux grps: 0:
> rpc.svcgssd[26188]: sending null reply
> rpc.svcgssd[26188]: finished handling null request
>
>
> Regards,
> Arno

The userland/daemon stuff all looks fine to me. I'm not as familiar
with the kernel logs. I believe the following kernel messages are
from the ^C:

Oct 20 23:48:18 genie kernel: [600500.662524] RPC: 437 return -512,
status -512
Oct 20 23:48:18 genie kernel: [600500.662539] RPC: 437 release task
Oct 20 23:48:18 genie kernel: [600500.662558] RPC: freeing
buffer of size 3712 at d6915000
Oct 20 23:48:18 genie kernel: [600500.662578] RPC: 437 release
request d6911000
Oct 20 23:48:18 genie kernel: [600500.662594] RPC:
wake_up_next(c3966234 "xprt_backlog")
Oct 20 23:48:18 genie kernel: [600500.662613] RPC: 437 releasing
RPCSEC_GSS cred df1f9300
Oct 20 23:48:18 genie kernel: [600500.662631] RPC:
rpc_release_client(de639e00)
Oct 20 23:48:18 genie kernel: [600500.662648] RPC: 437 freeing task
Oct 20 23:48:18 genie kernel: [600500.662664] nfs4_get_root: getroot error = 512

I don't see anything obviously wrong.

K.C.

2011-10-22 11:12:53

by Arno Schuring

[permalink] [raw]
Subject: Re: sec=krb5 mounts never return

Douglas E. Engert ([email protected] on 2011-10-21 16:43 -0500):
> If the enctype=4 and length= 8 that sounds like DES.
> You may need to add to the krb5.conf file,
>
> allow_weak_crypto = yes

That's there. I'm not particularly happy about it, but it is my
understanding that 2.6.32 does not allow strong crypto for nfs4.

>
> On 10/21/2011 3:58 PM, Kevin Coffman wrote:
> >>
> >> In this case I'm trying with a local mount, so client==server. The
> >> gssd logs invariably end with the following lines:
> >> rpc.gssd[26133]: creating context with server [email protected]
> >> rpc.gssd[28189]: DEBUG: serialize_krb5_ctx: lucid version!
> >> rpc.gssd[28189]: prepare_krb5_rfc1964_buffer: serializing keys with
> >> enctype 4 and length 8
> >> rpc.gssd[28189]: doing downcall
> >> [ then nothing until I kill the mount process ]
> >>
> >> In the svcgssd logs, nothing stands out to me. It all appears
> >> proper to the untrained eye:
> >>
> >> rpc.svcgssd[26188]: handling null request
> >> rpc.svcgssd[26188]: sname = nfs/[email protected]
> >> rpc.svcgssd[26188]: libnfsidmap: using (default) domain: loos.site
> >> rpc.svcgssd[26188]: DEBUG: serialize_krb5_ctx: lucid version!
> >> rpc.svcgssd[26188]: prepare_krb5_rfc1964_buffer: serializing keys
> >> with enctype 4 and length 8
> >> rpc.svcgssd[26188]: doing downcall
> >> rpc.svcgssd[26188]: mech: krb5, hndl len: 4, ctx len 85, timeout:
> >> 1319183270 (35999 from now), clnt: [email protected], uid: -1,
> >> gid: -1, num aux grps: 0:
> >> rpc.svcgssd[26188]: sending null reply
> >> rpc.svcgssd[26188]: finished handling null request
> >>
> > The userland/daemon stuff all looks fine to me. I'm not as familiar
> > with the kernel logs. I believe the following kernel messages are
> > from the ^C:
> >
Indeed. 23:47 is the mount command, 23:48 is the ^C. I see I did not
even allow 30s before killing, but I've had similar results even when
waiting minutes.

> >
> > I don't see anything obviously wrong.
Thanks. Any kernel wizards that want to take a stab at this? I can
provide more input if required, but kernel traces are hard to get
because this is a NAS without console.


Regards,
Arno