2011-03-22 23:27:31

by Brian J. Murrell

[permalink] [raw]
Subject: different kernels mean NFS4/GSSAPI works or doesn't

I was hoping I could bring a kernel.org ticket that I filed to your
attention in the hopes that somebody might have an epiphany.

https://bugzilla.kernel.org/show_bug.cgi?id=31442

This is a strange problem where simply booting to a different kernel,
even within the same release stream (2.6.32) can result in an NFS server
that doesn't seem to want to respond to GSSAPI mount requests.

I was working with Trond on it and it got as far as my reporting what
rpc.gssd is doing when a failed (blocked in fact) mount request happens:

pc# rpc.gssd with the -f -vvv
beginning poll
handling gssd upcall (/var/lib/nfs/rpc_pipefs/nfs/clnt6e1)
handle_gssd_upcall: 'mech=krb5 uid=0 enctypes=18,17,16,23,3,1,2 '
handling krb5 upcall (/var/lib/nfs/rpc_pipefs/nfs/clnt6e1)
process_krb5_upcall: service is '<null>'
Full hostname for 'linux.example.com' is 'linux.example.com'
Full hostname for 'pc' is 'pc'
Key table entry not found while getting keytab entry for 'root/pc@ILINX'
Key table entry not found while getting keytab entry for 'nfs/pc@ILINX'
Key table entry not found while getting keytab entry for 'host/pc@ILINX'
Success getting keytab entry for nfs/*@ILINX
WARNING: Key table entry not found while getting initial ticket for
principal
'nfs/pc.example.com@ILINX' using keytab 'WRFILE:/etc/krb5.keytab'
ERROR: No credentials found for connection to server linux.example.com
doing error downcall
destroying client /var/lib/nfs/rpc_pipefs/nfs/clnt6e1
destroying client /var/lib/nfs/rpc_pipefs/nfs/clnt6e0
destroying client /var/lib/nfs/rpc_pipefs/nfs/clnt6df
destroying client /var/lib/nfs/rpc_pipefs/nfs/clnt6e4
destroying client /var/lib/nfs/rpc_pipefs/nfs/clnt6e3

pc$ sudo mount -t nfs4 -o sec=krb5 linux:/tmp /mnt/tmp
mount.nfs4: access denied by server while mounting linux:/tmp

Now granted, this isn't a block/hang on the mount, but this was also
after having removed 3des entries from my keytabs. I wasn't getting
access denied before removing the 3des keytab entries but was getting
blocked mount.nfs4 commands on the client.

More gory details are in the ticket.

Any next debugging steps?

Cheers,
b.


Attachments:
signature.asc (262.00 B)
OpenPGP digital signature

2011-03-23 02:41:12

by Kevin Coffman

[permalink] [raw]
Subject: Re: different kernels mean NFS4/GSSAPI works or doesn't

Hi Brian,
Can you tell me how you removed the 3des keys from the keytab? (If
you simply used ktutil, this isn't good enough. The KDC will still
issue a ticket with 3DES because, as far as it knows, the service
still has a 3des key in the KDC and supports 3des.)

(Sorry if this was already provided and I missed it.) Do you have
output from gssd and svcgssd in the original failure/hang case?

K.C.

On Tue, Mar 22, 2011 at 7:27 PM, Brian J. Murrell <[email protected]> wrote:
> I was hoping I could bring a kernel.org ticket that I filed to your
> attention in the hopes that somebody might have an epiphany.
>
> https://bugzilla.kernel.org/show_bug.cgi?id=31442
>
> This is a strange problem where simply booting to a different kernel,
> even within the same release stream (2.6.32) can result in an NFS server
> that doesn't seem to want to respond to GSSAPI mount requests.
>
> I was working with Trond on it and it got as far as my reporting what
> rpc.gssd is doing when a failed (blocked in fact) mount request happens:
>
> pc# rpc.gssd with the -f -vvv
> beginning poll
> handling gssd upcall (/var/lib/nfs/rpc_pipefs/nfs/clnt6e1)
> handle_gssd_upcall: 'mech=krb5 uid=0 enctypes=18,17,16,23,3,1,2 '
> handling krb5 upcall (/var/lib/nfs/rpc_pipefs/nfs/clnt6e1)
> process_krb5_upcall: service is '<null>'
> Full hostname for 'linux.example.com' is 'linux.example.com'
> Full hostname for 'pc' is 'pc'
> Key table entry not found while getting keytab entry for 'root/pc@ILINX'
> Key table entry not found while getting keytab entry for 'nfs/pc@ILINX'
> Key table entry not found while getting keytab entry for 'host/pc@ILINX'
> Success getting keytab entry for nfs/*@ILINX
> WARNING: Key table entry not found while getting initial ticket for
> principal
> 'nfs/pc.example.com@ILINX' using keytab 'WRFILE:/etc/krb5.keytab'
> ERROR: No credentials found for connection to server linux.example.com
> doing error downcall
> destroying client /var/lib/nfs/rpc_pipefs/nfs/clnt6e1
> destroying client /var/lib/nfs/rpc_pipefs/nfs/clnt6e0
> destroying client /var/lib/nfs/rpc_pipefs/nfs/clnt6df
> destroying client /var/lib/nfs/rpc_pipefs/nfs/clnt6e4
> destroying client /var/lib/nfs/rpc_pipefs/nfs/clnt6e3
>
> pc$ sudo mount -t nfs4 -o sec=krb5 linux:/tmp /mnt/tmp
> mount.nfs4: access denied by server while mounting linux:/tmp
>
> Now granted, this isn't a block/hang on the mount, but this was also
> after having removed 3des entries from my keytabs. ?I wasn't getting
> access denied before removing the 3des keytab entries but was getting
> blocked mount.nfs4 commands on the client.
>
> More gory details are in the ticket.
>
> Any next debugging steps?
>
> Cheers,
> b.
>
>

2011-03-23 17:07:17

by Myklebust, Trond

[permalink] [raw]
Subject: Re: different kernels mean NFS4/GSSAPI works or doesn't

Please don't remove people from the Cc list.

On Wed, 2011-03-23 at 11:07 -0400, Brian J. Murrell wrote:
> On 11-03-23 10:29 AM, Kevin Coffman wrote:
> >
> > However, that doesn't appear to be the case from looking at the
> > gssd/svcgssd output. Have you checked the server's syslog for
> > messages about rpc.gssd not running?
>
> Yup. Nothing there.
>
> > OK. Thanks. There is enough info there, even with the redaction.
>
> Ahhh. Good.
>
> > This shows that a des (enctype 4) session key is being negotiated and
> > delivered to both kernels, so I don't think any of the Kerberos issues
> > should be involved here. (At least from the user-land perspective.)
>
> Hrm. So this gets even more interesting (for some values of "interesting").
>
> > I'm not sure what kernel change would be causing your hang ...
>
> Thanks for looking though.
>
> I'm open to any more ideas, debugging info requests, etc. that anyone
> might have.

You said this happens only with RPCSEC_GSS?

Have you tried bisecting, to see which changes caused it? You can
probably restrict yourself to bisecting changes to include/linux/sunrpc,
net/sunrpc and net/sunrpc/auth_gss.

Cheers
Trond
--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com


2011-03-27 00:07:26

by Brian J. Murrell

[permalink] [raw]
Subject: Re: different kernels mean NFS4/GSSAPI works or doesn't

On 11-03-23 01:07 PM, Trond Myklebust wrote:
> Please don't remove people from the Cc list.

Fair enough. Personally, I dislike getting two copies, but I can
respect others' wishes.

> You said this happens only with RPCSEC_GSS?

Indeed. Hrm. Well, to be honest my two scenarios are NFS4 with
RPCSEC_GSS and NFSv3. I've never really looked into NFS4 without
RPCSEC_GSS. Is that even possible? What's the point? I guess there
are protocol improvements in NFS4 aside from RPCSEC_GSS?

> Have you tried bisecting, to see which changes caused it?

I just did, using Ubuntu's "mainline" kernels, which AFAIU are upstream
kernel source without any of their special sauce, but built with their
.configs.

It seems the problem was introduced in 2.6.32.12. I don't really know
which commit between 2.6.32.11 and 2.6.32.12 though. In fact I don't
even know where to find these so-called .11 and .12 4th order of
releases to see what all went into it.

I did find
http://www.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.32.11 but a
gitweb interface to the changes would be much more useful I think.
Although the number of changes that show up there for matches of /nfs/
or /gss/ are quite small (only one for the latter).

> You can
> probably restrict yourself to bisecting changes to include/linux/sunrpc,
> net/sunrpc and net/sunrpc/auth_gss.

If I could find a gitweb tree to grovel through. :-)

b.




Attachments:
signature.asc (262.00 B)
OpenPGP digital signature

2011-03-30 12:09:14

by Brian J. Murrell

[permalink] [raw]
Subject: Re: different kernels mean NFS4/GSSAPI works or doesn't

On 11-03-27 08:42 AM, Brian J. Murrell wrote:
>
> It seems the problem was introduced in 2.6.32.12. I don't really know
> which commit between 2.6.32.11 and 2.6.32.12 though. In fact I don't
> even know where to find these so-called .11 and .12 4th order of
> releases to see what all went into it.

OK. So I have linux-2.6.32.11.tar.bz2 and patch-2.6.32.12.bz2 from
kernel.org.

patch-2.6.32.12.bz2 doesn't apply cleanly to the linux-2.6.32.11 tree
that's in linux-2.6.32.11.tar.bz2 though:

# bzcat ../patch-2.6.32.12.bz2 | patch -p1 --dry-run
patching file Documentation/Changes
Reversed (or previously applied) patch detected! Assume -R? [n]

Looking at the patch and the tree, indeed, the hunk the patch is trying
to apply:

--- a/Documentation/Changes
+++ b/Documentation/Changes
@@ -49,6 +49,8 @@ o oprofile 0.9 #
oprofiled --version
o udev 081 # udevinfo -V
o grub 0.93 # grub --version
o mcelog 0.6
+o iptables 1.4.1 # iptables -V
+

Is already in 2.6.32.11:

o udev 081 # udevinfo -V
o grub 0.93 # grub --version
o mcelog 0.6
o iptables 1.4.1 # iptables -V


Kernel compilation
==================

What am I missing about this patch file?

>> You can
>> probably restrict yourself to bisecting changes to include/linux/sunrpc,
>> net/sunrpc and net/sunrpc/auth_gss.

In any case, that .12 patch file has the following changes to files
matching /sunrpc/ (apologies for the wrapping -- stupid thunderbird has
no way to turn that off unless I compose HTML mails):

diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
index fc6a43c..2370ab4 100644
--- a/net/sunrpc/auth_gss/auth_gss.c
+++ b/net/sunrpc/auth_gss/auth_gss.c
@@ -485,7 +485,7 @@ gss_refresh_upcall(struct rpc_task *task)
dprintk("RPC: %5u gss_refresh_upcall for uid %u\n", task->tk_pid,
cred->cr_uid);
gss_msg = gss_setup_upcall(task->tk_client, gss_auth, cred);
- if (IS_ERR(gss_msg) == -EAGAIN) {
+ if (PTR_ERR(gss_msg) == -EAGAIN) {
/* XXX: warning on the first, under the assumption we
* shouldn't normally hit this case on a refresh. */
warn_gssd();
@@ -644,7 +644,22 @@ gss_pipe_downcall(struct file *filp, const char
__user *src, size_t mlen)
p = gss_fill_context(p, end, ctx, gss_msg->auth->mech);
if (IS_ERR(p)) {
err = PTR_ERR(p);
- gss_msg->msg.errno = (err == -EAGAIN) ? -EAGAIN : -EACCES;
+ switch (err) {
+ case -EACCES:
+ gss_msg->msg.errno = err;
+ err = mlen;
+ break;
+ case -EFAULT:
+ case -ENOMEM:
+ case -EINVAL:
+ case -ENOSYS:
+ gss_msg->msg.errno = -EAGAIN;
+ break;
+ default:
+ printk(KERN_CRIT "%s: bad return from "
+ "gss_fill_context: %ld\n", __func__, err);
+ BUG();
+ }
goto err_release_msg;
}
gss_msg->ctx = gss_get_ctx(ctx);
@@ -1258,9 +1273,8 @@ alloc_enc_pages(struct rpc_rqst *rqstp)
rqstp->rq_release_snd_buf = priv_release_snd_buf;
return 0;
out_free:
- for (i--; i >= 0; i--) {
- __free_page(rqstp->rq_enc_pages[i]);
- }
+ rqstp->rq_enc_pages_num = i;
+ priv_release_snd_buf(rqstp);
out:
return -EAGAIN;
}
diff --git a/net/sunrpc/auth_gss/gss_krb5_mech.c
b/net/sunrpc/auth_gss/gss_krb5_mech.c
index ef45eba..2deb0ed 100644
--- a/net/sunrpc/auth_gss/gss_krb5_mech.c
+++ b/net/sunrpc/auth_gss/gss_krb5_mech.c
@@ -131,8 +131,10 @@ gss_import_sec_context_kerberos(const void *p,
struct krb5_ctx *ctx;
int tmp;

- if (!(ctx = kzalloc(sizeof(*ctx), GFP_NOFS)))
+ if (!(ctx = kzalloc(sizeof(*ctx), GFP_NOFS))) {
+ p = ERR_PTR(-ENOMEM);
goto out_err;
+ }

p = simple_get_bytes(p, end, &ctx->initiate, sizeof(ctx->initiate));
if (IS_ERR(p))
diff --git a/net/sunrpc/auth_gss/gss_mech_switch.c
b/net/sunrpc/auth_gss/gss_mech_switch.c
index 6efbb0c..76e4c6f 100644
--- a/net/sunrpc/auth_gss/gss_mech_switch.c
+++ b/net/sunrpc/auth_gss/gss_mech_switch.c
@@ -252,7 +252,7 @@ gss_import_sec_context(const void *input_token,
size_t bufsize,
struct gss_ctx **ctx_id)
{
if (!(*ctx_id = kzalloc(sizeof(**ctx_id), GFP_KERNEL)))
- return GSS_S_FAILURE;
+ return -ENOMEM;
(*ctx_id)->mech_type = gss_mech_get(mech);

return mech->gm_ops
diff --git a/net/sunrpc/rpc_pipe.c b/net/sunrpc/rpc_pipe.c
index 49278f8..27a2378 100644
--- a/net/sunrpc/rpc_pipe.c
+++ b/net/sunrpc/rpc_pipe.c
@@ -587,6 +587,8 @@ static struct dentry
*__rpc_lookup_create_exclusive(struct dentry *parent,
struct dentry *dentry;

dentry = __rpc_lookup_create(parent, name);
+ if (IS_ERR(dentry))
+ return dentry;
if (dentry->d_inode == NULL)
return dentry;
dput(dentry);
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index df124f7..3fbd6ba 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -711,7 +711,10 @@ int svc_recv(struct svc_rqst *rqstp, long timeout)
spin_unlock_bh(&pool->sp_lock);

len = 0;
- if (test_bit(XPT_LISTENER, &xprt->xpt_flags)) {
+ if (test_bit(XPT_CLOSE, &xprt->xpt_flags)) {
+ dprintk("svc_recv: found XPT_CLOSE\n");
+ svc_delete_xprt(xprt);
+ } else if (test_bit(XPT_LISTENER, &xprt->xpt_flags)) {
struct svc_xprt *newxpt;
newxpt = xprt->xpt_ops->xpo_accept(xprt);
if (newxpt) {
@@ -737,7 +740,7 @@ int svc_recv(struct svc_rqst *rqstp, long timeout)
svc_xprt_received(newxpt);
}
svc_xprt_received(xprt);
- } else if (!test_bit(XPT_CLOSE, &xprt->xpt_flags)) {
+ } else {
dprintk("svc: server %p, pool %u, transport %p, inuse=%d\n",
rqstp, pool->sp_id, xprt,
atomic_read(&xprt->xpt_ref.refcount));
@@ -750,11 +753,6 @@ int svc_recv(struct svc_rqst *rqstp, long timeout)
dprintk("svc: got len=%d\n", len);
}

- if (test_bit(XPT_CLOSE, &xprt->xpt_flags)) {
- dprintk("svc_recv: found XPT_CLOSE\n");
- svc_delete_xprt(xprt);
- }
-
/* No data, incomplete (TCP) read, or accept() */
if (len == 0 || len == -EAGAIN) {
rqstp->rq_res.len = 0;
@@ -900,11 +898,8 @@ void svc_delete_xprt(struct svc_xprt *xprt)
if (test_bit(XPT_TEMP, &xprt->xpt_flags))
serv->sv_tmpcnt--;

- for (dr = svc_deferred_dequeue(xprt); dr;
- dr = svc_deferred_dequeue(xprt)) {
- svc_xprt_put(xprt);
+ while ((dr = svc_deferred_dequeue(xprt)) != NULL)
kfree(dr);
- }

svc_xprt_put(xprt);
spin_unlock_bh(&serv->sv_lock);
diff --git a/net/sunrpc/svcauth_unix.c b/net/sunrpc/svcauth_unix.c
index 117f68a..97cc3de 100644
--- a/net/sunrpc/svcauth_unix.c
+++ b/net/sunrpc/svcauth_unix.c
@@ -655,23 +655,25 @@ static struct unix_gid *unix_gid_lookup(uid_t uid)
return NULL;
}

-static int unix_gid_find(uid_t uid, struct group_info **gip,
- struct svc_rqst *rqstp)
+static struct group_info *unix_gid_find(uid_t uid, struct svc_rqst *rqstp)
{
- struct unix_gid *ug = unix_gid_lookup(uid);
+ struct unix_gid *ug;
+ struct group_info *gi;
+ int ret;
+
+ ug = unix_gid_lookup(uid);
if (!ug)
- return -EAGAIN;
- switch (cache_check(&unix_gid_cache, &ug->h, &rqstp->rq_chandle)) {
+ return ERR_PTR(-EAGAIN);
+ ret = cache_check(&unix_gid_cache, &ug->h, &rqstp->rq_chandle);
+ switch (ret) {
case -ENOENT:
- *gip = NULL;
- return 0;
+ return ERR_PTR(-ENOENT);
case 0:
- *gip = ug->gi;
- get_group_info(*gip);
+ gi = get_group_info(ug->gi);
cache_put(&ug->h, &unix_gid_cache);
- return 0;
+ return gi;
default:
- return -EAGAIN;
+ return ERR_PTR(-EAGAIN);
}
}

@@ -681,6 +683,8 @@ svcauth_unix_set_client(struct svc_rqst *rqstp)
struct sockaddr_in *sin;
struct sockaddr_in6 *sin6, sin6_storage;
struct ip_map *ipm;
+ struct group_info *gi;
+ struct svc_cred *cred = &rqstp->rq_cred;

switch (rqstp->rq_addr.ss_family) {
case AF_INET:
@@ -722,6 +726,17 @@ svcauth_unix_set_client(struct svc_rqst *rqstp)
ip_map_cached_put(rqstp, ipm);
break;
}
+
+ gi = unix_gid_find(cred->cr_uid, rqstp);
+ switch (PTR_ERR(gi)) {
+ case -EAGAIN:
+ return SVC_DROP;
+ case -ENOENT:
+ break;
+ default:
+ put_group_info(cred->cr_group_info);
+ cred->cr_group_info = gi;
+ }
return SVC_OK;
}

@@ -818,19 +833,11 @@ svcauth_unix_accept(struct svc_rqst *rqstp, __be32
*authp)
slen = svc_getnl(argv); /* gids length */
if (slen > 16 || (len -= (slen + 2)*4) < 0)
goto badcred;
- if (unix_gid_find(cred->cr_uid, &cred->cr_group_info, rqstp)
- == -EAGAIN)
+ cred->cr_group_info = groups_alloc(slen);
+ if (cred->cr_group_info == NULL)
return SVC_DROP;
- if (cred->cr_group_info == NULL) {
- cred->cr_group_info = groups_alloc(slen);
- if (cred->cr_group_info == NULL)
- return SVC_DROP;
- for (i = 0; i < slen; i++)
- GROUP_AT(cred->cr_group_info, i) = svc_getnl(argv);
- } else {
- for (i = 0; i < slen ; i++)
- svc_getnl(argv);
- }
+ for (i = 0; i < slen; i++)
+ GROUP_AT(cred->cr_group_info, i) = svc_getnl(argv);
if (svc_getu32(argv) != htonl(RPC_AUTH_NULL) || svc_getu32(argv) != 0) {
*authp = rpc_autherr_badverf;
return SVC_DENIED;
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 1c246a4..70b0a22 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -968,6 +968,7 @@ static int svc_tcp_recv_record(struct svc_sock
*svsk, struct svc_rqst *rqstp)
return len;
err_delete:
set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
+ svc_xprt_received(&svsk->sk_xprt);
err_again:
return -EAGAIN;
}
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 37c5475..5cdbf7c 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1926,6 +1926,11 @@ static void xs_tcp_setup_socket(struct rpc_xprt
*xprt,
case -EALREADY:
xprt_clear_connecting(xprt);
return;
+ case -EINVAL:
+ /* Happens, for instance, if the user specified a link
+ * local IPv6 address without a scope-id.
+ */
+ goto out;
}
out_eagain:
status = -EAGAIN;

Anything jump out at you?

b.


Attachments:
signature.asc (262.00 B)
OpenPGP digital signature

2011-03-23 11:03:27

by Brian J. Murrell

[permalink] [raw]
Subject: Re: different kernels mean NFS4/GSSAPI works or doesn't

On 11-03-22 10:41 PM, Kevin Coffman wrote:
> Hi Brian,

Hi Kevin,

> Can you tell me how you removed the 3des keys from the keytab? (If
> you simply used ktutil, this isn't good enough. The KDC will still
> issue a ticket with 3DES because, as far as it knows, the service
> still has a 3des key in the KDC and supports 3des.)

Ahh. Well, I did use ktutil. I opened the keytabs (/etc/krb5.keytab on
both the client and server) then while they were open in ktutil, removed
the files from the respective filesystems, deleted the keys in ktutil
then wrote it back out from ktutil to the filesystem.

But you say this is not good enough, yes? Should I be removing keys
from the KDC also?

kadmin.local: getprinc nfs/pc.example.com@ILINX
Principal: nfs/pc.example.com@ILINX
Expiration date: [never]
Last password change: Sun Mar 14 20:51:08 EDT 2010
Password expiration date: [none]
Maximum ticket life: 0 days 10:00:00
Maximum renewable life: 7 days 00:00:00
Last modified: Sun Mar 14 20:51:08 EDT 2010 (brian/admin@ILINX)
Last successful authentication: Sat Mar 19 11:39:41 EDT 2011
Last failed authentication: [never]
Failed password attempts: 0
Number of keys: 2
Key: vno 2, Triple DES cbc mode with HMAC/sha1, no salt
Key: vno 2, DES cbc mode with CRC-32, no salt
MKey: vno 1
Attributes: REQUIRES_PRE_AUTH
Policy: [none]

and/or

kadmin.local: getprinc nfs/linux.example.com@ILINX
Principal: nfs/linux.example.com@ILINX
Expiration date: [never]
Last password change: Sat Nov 01 00:37:11 EDT 2008
Password expiration date: [none]
Maximum ticket life: 0 days 10:00:00
Maximum renewable life: 7 days 00:00:00
Last modified: Sat Nov 01 00:37:11 EDT 2008 (root/admin@ILINX)
Last successful authentication: [never]
Last failed authentication: [never]
Failed password attempts: 0
Number of keys: 1
Key: vno 4, DES cbc mode with CRC-32, no salt
MKey: vno 1
Attributes:
Policy: [none]

Ahhhh. But it seems that the server doesn't have a 3des key in the KDC,
only the client does.

> (Sorry if this was already provided and I missed it.)

No, I don't think we have been here yet.

> Do you have
> output from gssd and svcgssd in the original failure/hang case?

Here they are:

pc# rpc.gssd with the -f -vvv
beginning poll
handling gssd upcall (/var/lib/nfs/rpc_pipefs/nfs/clnt4b90)
handle_gssd_upcall: 'mech=krb5 uid=0 enctypes=18,17,16,23,3,1,2 '
handling krb5 upcall (/var/lib/nfs/rpc_pipefs/nfs/clnt4b90)
process_krb5_upcall: service is '<null>'
Full hostname for 'linux.example.com' is 'linux.example.com'
Full hostname for 'pc' is 'pc'
Key table entry not found while getting keytab entry for 'root/pc@ILINX'
Key table entry not found while getting keytab entry for 'nfs/pc@ILINX'
Key table entry not found while getting keytab entry for 'host/pc@ILINX'
Success getting keytab entry for nfs/*@ILINX
Successfully obtained machine credentials for principal
'nfs/pc.example.com@ILINX' stored in ccache 'FILE:/tmp/krb5cc_machine_ILINX'
INFO: Credentials in CC 'FILE:/tmp/krb5cc_machine_ILINX' are good until
1300913838
using FILE:/tmp/krb5cc_machine_ILINX as credentials cache for machine creds
using environment variable to select krb5 ccache
FILE:/tmp/krb5cc_machine_ILINX
creating context using fsuid 0 (save_uid 0)
creating tcp client for server linux.example.com
DEBUG: port already set to 2049
creating context with server [email protected]
DEBUG: serialize_krb5_ctx: lucid version!
prepare_krb5_rfc1964_buffer: serializing keys with enctype 4 and length 8
doing downcall

linux# /usr/sbin/rpc.svcgssd -f -vvv
entering poll
leaving poll
handling null request
sname = nfs/pc.example.com@ILINX
DEBUG: serialize_krb5_ctx: lucid version!
prepare_krb5_rfc1964_buffer: serializing keys with enctype 4 and length 8
doing downcall
mech: krb5, hndl len: 4, ctx len 85, timeout: 1300913838 (35999 from
now), uid: -1, gid: -1, num aux grps: 0:
sending null reply
writing message: \x \x60...99 0 0 \x01000000 \x60......b2
finished handling null request
entering poll

pc# mount -t nfs4 -o sec=krb5 linux.example.com:/tmp /mnt/tmp
[hung/blocked]

I don't know if the contents of the "writing message: " were important
but I also don't know if there is sensitive information in it or not, so
being cautious, I did some redacting. Please let me know if that's a
problem.

All of the above is with my original keytabs, before 3des key removal.

Cheers,
b.


Attachments:
signature.asc (262.00 B)
OpenPGP digital signature

2011-03-23 14:29:43

by Kevin Coffman

[permalink] [raw]
Subject: Re: different kernels mean NFS4/GSSAPI works or doesn't

On Wed, Mar 23, 2011 at 7:03 AM, Brian J. Murrell <[email protected]> wrote:
> On 11-03-22 10:41 PM, Kevin Coffman wrote:
>> Hi Brian,
>
> Hi Kevin,
>
>> Can you tell me how you removed the 3des keys from the keytab? ?(If
>> you simply used ktutil, this isn't good enough. ?The KDC will still
>> issue a ticket with 3DES because, as far as it knows, the service
>> still has a 3des key in the KDC and supports 3des.)
>
> Ahh. ?Well, I did use ktutil. ?I opened the keytabs (/etc/krb5.keytab on
> both the client and server) then while they were open in ktutil, removed
> the files from the respective filesystems, deleted the keys in ktutil
> then wrote it back out from ktutil to the filesystem.
>
> But you say this is not good enough, yes? ?Should I be removing keys
> from the KDC also?
>
> kadmin.local: ?getprinc nfs/pc.example.com@ILINX
> Principal: nfs/pc.example.com@ILINX
> Expiration date: [never]
> Last password change: Sun Mar 14 20:51:08 EDT 2010
> Password expiration date: [none]
> Maximum ticket life: 0 days 10:00:00
> Maximum renewable life: 7 days 00:00:00
> Last modified: Sun Mar 14 20:51:08 EDT 2010 (brian/admin@ILINX)
> Last successful authentication: Sat Mar 19 11:39:41 EDT 2011
> Last failed authentication: [never]
> Failed password attempts: 0
> Number of keys: 2
> Key: vno 2, Triple DES cbc mode with HMAC/sha1, no salt
> Key: vno 2, DES cbc mode with CRC-32, no salt
> MKey: vno 1
> Attributes: REQUIRES_PRE_AUTH
> Policy: [none]
>
> and/or
>
> kadmin.local: ?getprinc nfs/linux.example.com@ILINX
> Principal: nfs/linux.example.com@ILINX
> Expiration date: [never]
> Last password change: Sat Nov 01 00:37:11 EDT 2008
> Password expiration date: [none]
> Maximum ticket life: 0 days 10:00:00
> Maximum renewable life: 7 days 00:00:00
> Last modified: Sat Nov 01 00:37:11 EDT 2008 (root/admin@ILINX)
> Last successful authentication: [never]
> Last failed authentication: [never]
> Failed password attempts: 0
> Number of keys: 1
> Key: vno 4, DES cbc mode with CRC-32, no salt
> MKey: vno 1
> Attributes:
> Policy: [none]
>
> Ahhhh. ?But it seems that the server doesn't have a 3des key in the KDC,
> only the client does.

Which should be fine unless there is some callback stuff going. In
that case, the NFS client becomes a server and its keytab would need
to have only a des key as well. You would need to re-issue the client
machine's keytab on the KDC with only a des key. See
http://www.citi.umich.edu/projects/nfsv4/linux/krb5-setup.html

However, that doesn't appear to be the case from looking at the
gssd/svcgssd output. Have you checked the server's syslog for
messages about rpc.gssd not running? (That would indicate it is
trying to initiate a callback and the upcall is failing because
rpc.gssd is not running on the NFS server.) I don't recall that
situation causing the mount to hang though.

>> (Sorry if this was already provided and I missed it.)
>
> No, I don't think we have been here yet.
>
>> Do you have
>> output from gssd and svcgssd in the original failure/hang case?
>
> Here they are:
>
> pc# rpc.gssd with the -f -vvv
> beginning poll
> handling gssd upcall (/var/lib/nfs/rpc_pipefs/nfs/clnt4b90)
> handle_gssd_upcall: 'mech=krb5 uid=0 enctypes=18,17,16,23,3,1,2 '
> handling krb5 upcall (/var/lib/nfs/rpc_pipefs/nfs/clnt4b90)
> process_krb5_upcall: service is '<null>'
> Full hostname for 'linux.example.com' is 'linux.example.com'
> Full hostname for 'pc' is 'pc'
> Key table entry not found while getting keytab entry for 'root/pc@ILINX'
> Key table entry not found while getting keytab entry for 'nfs/pc@ILINX'
> Key table entry not found while getting keytab entry for 'host/pc@ILINX'
> Success getting keytab entry for nfs/*@ILINX
> Successfully obtained machine credentials for principal
> 'nfs/pc.example.com@ILINX' stored in ccache 'FILE:/tmp/krb5cc_machine_ILINX'
> INFO: Credentials in CC 'FILE:/tmp/krb5cc_machine_ILINX' are good until
> 1300913838
> using FILE:/tmp/krb5cc_machine_ILINX as credentials cache for machine creds
> using environment variable to select krb5 ccache
> FILE:/tmp/krb5cc_machine_ILINX
> creating context using fsuid 0 (save_uid 0)
> creating tcp client for server linux.example.com
> DEBUG: port already set to 2049
> creating context with server [email protected]
> DEBUG: serialize_krb5_ctx: lucid version!
> prepare_krb5_rfc1964_buffer: serializing keys with enctype 4 and length 8
> doing downcall
>
> linux# /usr/sbin/rpc.svcgssd -f -vvv
> entering poll
> leaving poll
> handling null request
> sname = nfs/pc.example.com@ILINX
> DEBUG: serialize_krb5_ctx: lucid version!
> prepare_krb5_rfc1964_buffer: serializing keys with enctype 4 and length 8
> doing downcall
> mech: krb5, hndl len: 4, ctx len 85, timeout: 1300913838 (35999 from
> now), uid: -1, gid: -1, num aux grps: 0:
> sending null reply
> writing message: \x \x60...99 0 0 \x01000000 \x60......b2
> finished handling null request
> entering poll
>
> pc# mount -t nfs4 -o sec=krb5 linux.example.com:/tmp /mnt/tmp
> [hung/blocked]
>
> I don't know if the contents of the "writing message: " were important
> but I also don't know if there is sensitive information in it or not, so
> being cautious, I did some redacting. ?Please let me know if that's a
> problem.
>
> All of the above is with my original keytabs, before 3des key removal.
>
> Cheers,
> b.
>
>

OK. Thanks. There is enough info there, even with the redaction.

> prepare_krb5_rfc1964_buffer: serializing keys with enctype 4 and length 8

This shows that a des (enctype 4) session key is being negotiated and
delivered to both kernels, so I don't think any of the Kerberos issues
should be involved here. (At least from the user-land perspective.)

I'm not sure what kernel change would be causing your hang ...

K.C.

2011-03-27 12:52:18

by Brian J. Murrell

[permalink] [raw]
Subject: Re: different kernels mean NFS4/GSSAPI works or doesn't

On 11-03-23 01:07 PM, Trond Myklebust wrote:
> Please don't remove people from the Cc list.

Fair enough. Personally, I dislike getting two copies, but I can
respect others' wishes.

> You said this happens only with RPCSEC_GSS?

Indeed. Hrm. Well, to be honest my two scenarios are NFS4 with
RPCSEC_GSS and NFSv3. I've never really looked into NFS4 without
RPCSEC_GSS. Is that even possible? What's the point? I guess there
are protocol improvements in NFS4 aside from RPCSEC_GSS?

> Have you tried bisecting, to see which changes caused it?

I just did, using Ubuntu's "mainline" kernels, which AFAIU are upstream
kernel source without any of their special sauce, but built with their
.configs.

It seems the problem was introduced in 2.6.32.12. I don't really know
which commit between 2.6.32.11 and 2.6.32.12 though. In fact I don't
even know where to find these so-called .11 and .12 4th order of
releases to see what all went into it.

I did find
http://www.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.32.11 but a
gitweb interface to the changes would be much more useful I think.
Although the number of changes that show up there for matches of /nfs/
or /gss/ are quite small (only one for the latter).

> You can
> probably restrict yourself to bisecting changes to include/linux/sunrpc,
> net/sunrpc and net/sunrpc/auth_gss.

If I could find a gitweb tree to grovel through. :-)

b.






Attachments:
signature.asc (262.00 B)
OpenPGP digital signature

2011-03-23 15:07:29

by Brian J. Murrell

[permalink] [raw]
Subject: Re: different kernels mean NFS4/GSSAPI works or doesn't

On 11-03-23 10:29 AM, Kevin Coffman wrote:
>
> However, that doesn't appear to be the case from looking at the
> gssd/svcgssd output. Have you checked the server's syslog for
> messages about rpc.gssd not running?

Yup. Nothing there.

> OK. Thanks. There is enough info there, even with the redaction.

Ahhh. Good.

> This shows that a des (enctype 4) session key is being negotiated and
> delivered to both kernels, so I don't think any of the Kerberos issues
> should be involved here. (At least from the user-land perspective.)

Hrm. So this gets even more interesting (for some values of "interesting").

> I'm not sure what kernel change would be causing your hang ...

Thanks for looking though.

I'm open to any more ideas, debugging info requests, etc. that anyone
might have.

b.


Attachments:
signature.asc (262.00 B)
OpenPGP digital signature

2011-04-01 13:02:36

by Brian J. Murrell

[permalink] [raw]
Subject: Re: different kernels mean NFS4/GSSAPI works or doesn't

On 11-03-30 08:09 AM, Brian J. Murrell wrote:
>
> OK. So I have linux-2.6.32.11.tar.bz2 and patch-2.6.32.12.bz2 from
> kernel.org.
>
> patch-2.6.32.12.bz2 doesn't apply cleanly to the linux-2.6.32.11 tree
> that's in linux-2.6.32.11.tar.bz2 though:
>
> # bzcat ../patch-2.6.32.12.bz2 | patch -p1 --dry-run
> patching file Documentation/Changes
> Reversed (or previously applied) patch detected! Assume -R? [n]

OK. I figured this out. .12 patch is to 2.6.32 not 2.6.32.11.

Any ideas on which of the below I should try backing out in my .12 build
to see which might be causing the problem?

> diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
> index fc6a43c..2370ab4 100644
> --- a/net/sunrpc/auth_gss/auth_gss.c
> +++ b/net/sunrpc/auth_gss/auth_gss.c
> @@ -485,7 +485,7 @@ gss_refresh_upcall(struct rpc_task *task)
> dprintk("RPC: %5u gss_refresh_upcall for uid %u\n", task->tk_pid,
> cred->cr_uid);
> gss_msg = gss_setup_upcall(task->tk_client, gss_auth, cred);
> - if (IS_ERR(gss_msg) == -EAGAIN) {
> + if (PTR_ERR(gss_msg) == -EAGAIN) {
> /* XXX: warning on the first, under the assumption we
> * shouldn't normally hit this case on a refresh. */
> warn_gssd();
> @@ -644,7 +644,22 @@ gss_pipe_downcall(struct file *filp, const char
> __user *src, size_t mlen)
> p = gss_fill_context(p, end, ctx, gss_msg->auth->mech);
> if (IS_ERR(p)) {
> err = PTR_ERR(p);
> - gss_msg->msg.errno = (err == -EAGAIN) ? -EAGAIN : -EACCES;
> + switch (err) {
> + case -EACCES:
> + gss_msg->msg.errno = err;
> + err = mlen;
> + break;
> + case -EFAULT:
> + case -ENOMEM:
> + case -EINVAL:
> + case -ENOSYS:
> + gss_msg->msg.errno = -EAGAIN;
> + break;
> + default:
> + printk(KERN_CRIT "%s: bad return from "
> + "gss_fill_context: %ld\n", __func__, err);
> + BUG();
> + }
> goto err_release_msg;
> }
> gss_msg->ctx = gss_get_ctx(ctx);
> @@ -1258,9 +1273,8 @@ alloc_enc_pages(struct rpc_rqst *rqstp)
> rqstp->rq_release_snd_buf = priv_release_snd_buf;
> return 0;
> out_free:
> - for (i--; i >= 0; i--) {
> - __free_page(rqstp->rq_enc_pages[i]);
> - }
> + rqstp->rq_enc_pages_num = i;
> + priv_release_snd_buf(rqstp);
> out:
> return -EAGAIN;
> }
> diff --git a/net/sunrpc/auth_gss/gss_krb5_mech.c
> b/net/sunrpc/auth_gss/gss_krb5_mech.c
> index ef45eba..2deb0ed 100644
> --- a/net/sunrpc/auth_gss/gss_krb5_mech.c
> +++ b/net/sunrpc/auth_gss/gss_krb5_mech.c
> @@ -131,8 +131,10 @@ gss_import_sec_context_kerberos(const void *p,
> struct krb5_ctx *ctx;
> int tmp;
>
> - if (!(ctx = kzalloc(sizeof(*ctx), GFP_NOFS)))
> + if (!(ctx = kzalloc(sizeof(*ctx), GFP_NOFS))) {
> + p = ERR_PTR(-ENOMEM);
> goto out_err;
> + }
>
> p = simple_get_bytes(p, end, &ctx->initiate, sizeof(ctx->initiate));
> if (IS_ERR(p))
> diff --git a/net/sunrpc/auth_gss/gss_mech_switch.c
> b/net/sunrpc/auth_gss/gss_mech_switch.c
> index 6efbb0c..76e4c6f 100644
> --- a/net/sunrpc/auth_gss/gss_mech_switch.c
> +++ b/net/sunrpc/auth_gss/gss_mech_switch.c
> @@ -252,7 +252,7 @@ gss_import_sec_context(const void *input_token,
> size_t bufsize,
> struct gss_ctx **ctx_id)
> {
> if (!(*ctx_id = kzalloc(sizeof(**ctx_id), GFP_KERNEL)))
> - return GSS_S_FAILURE;
> + return -ENOMEM;
> (*ctx_id)->mech_type = gss_mech_get(mech);
>
> return mech->gm_ops
> diff --git a/net/sunrpc/rpc_pipe.c b/net/sunrpc/rpc_pipe.c
> index 49278f8..27a2378 100644
> --- a/net/sunrpc/rpc_pipe.c
> +++ b/net/sunrpc/rpc_pipe.c
> @@ -587,6 +587,8 @@ static struct dentry
> *__rpc_lookup_create_exclusive(struct dentry *parent,
> struct dentry *dentry;
>
> dentry = __rpc_lookup_create(parent, name);
> + if (IS_ERR(dentry))
> + return dentry;
> if (dentry->d_inode == NULL)
> return dentry;
> dput(dentry);
> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> index df124f7..3fbd6ba 100644
> --- a/net/sunrpc/svc_xprt.c
> +++ b/net/sunrpc/svc_xprt.c
> @@ -711,7 +711,10 @@ int svc_recv(struct svc_rqst *rqstp, long timeout)
> spin_unlock_bh(&pool->sp_lock);
>
> len = 0;
> - if (test_bit(XPT_LISTENER, &xprt->xpt_flags)) {
> + if (test_bit(XPT_CLOSE, &xprt->xpt_flags)) {
> + dprintk("svc_recv: found XPT_CLOSE\n");
> + svc_delete_xprt(xprt);
> + } else if (test_bit(XPT_LISTENER, &xprt->xpt_flags)) {
> struct svc_xprt *newxpt;
> newxpt = xprt->xpt_ops->xpo_accept(xprt);
> if (newxpt) {
> @@ -737,7 +740,7 @@ int svc_recv(struct svc_rqst *rqstp, long timeout)
> svc_xprt_received(newxpt);
> }
> svc_xprt_received(xprt);
> - } else if (!test_bit(XPT_CLOSE, &xprt->xpt_flags)) {
> + } else {
> dprintk("svc: server %p, pool %u, transport %p, inuse=%d\n",
> rqstp, pool->sp_id, xprt,
> atomic_read(&xprt->xpt_ref.refcount));
> @@ -750,11 +753,6 @@ int svc_recv(struct svc_rqst *rqstp, long timeout)
> dprintk("svc: got len=%d\n", len);
> }
>
> - if (test_bit(XPT_CLOSE, &xprt->xpt_flags)) {
> - dprintk("svc_recv: found XPT_CLOSE\n");
> - svc_delete_xprt(xprt);
> - }
> -
> /* No data, incomplete (TCP) read, or accept() */
> if (len == 0 || len == -EAGAIN) {
> rqstp->rq_res.len = 0;
> @@ -900,11 +898,8 @@ void svc_delete_xprt(struct svc_xprt *xprt)
> if (test_bit(XPT_TEMP, &xprt->xpt_flags))
> serv->sv_tmpcnt--;
>
> - for (dr = svc_deferred_dequeue(xprt); dr;
> - dr = svc_deferred_dequeue(xprt)) {
> - svc_xprt_put(xprt);
> + while ((dr = svc_deferred_dequeue(xprt)) != NULL)
> kfree(dr);
> - }
>
> svc_xprt_put(xprt);
> spin_unlock_bh(&serv->sv_lock);
> diff --git a/net/sunrpc/svcauth_unix.c b/net/sunrpc/svcauth_unix.c
> index 117f68a..97cc3de 100644
> --- a/net/sunrpc/svcauth_unix.c
> +++ b/net/sunrpc/svcauth_unix.c
> @@ -655,23 +655,25 @@ static struct unix_gid *unix_gid_lookup(uid_t uid)
> return NULL;
> }
>
> -static int unix_gid_find(uid_t uid, struct group_info **gip,
> - struct svc_rqst *rqstp)
> +static struct group_info *unix_gid_find(uid_t uid, struct svc_rqst *rqstp)
> {
> - struct unix_gid *ug = unix_gid_lookup(uid);
> + struct unix_gid *ug;
> + struct group_info *gi;
> + int ret;
> +
> + ug = unix_gid_lookup(uid);
> if (!ug)
> - return -EAGAIN;
> - switch (cache_check(&unix_gid_cache, &ug->h, &rqstp->rq_chandle)) {
> + return ERR_PTR(-EAGAIN);
> + ret = cache_check(&unix_gid_cache, &ug->h, &rqstp->rq_chandle);
> + switch (ret) {
> case -ENOENT:
> - *gip = NULL;
> - return 0;
> + return ERR_PTR(-ENOENT);
> case 0:
> - *gip = ug->gi;
> - get_group_info(*gip);
> + gi = get_group_info(ug->gi);
> cache_put(&ug->h, &unix_gid_cache);
> - return 0;
> + return gi;
> default:
> - return -EAGAIN;
> + return ERR_PTR(-EAGAIN);
> }
> }
>
> @@ -681,6 +683,8 @@ svcauth_unix_set_client(struct svc_rqst *rqstp)
> struct sockaddr_in *sin;
> struct sockaddr_in6 *sin6, sin6_storage;
> struct ip_map *ipm;
> + struct group_info *gi;
> + struct svc_cred *cred = &rqstp->rq_cred;
>
> switch (rqstp->rq_addr.ss_family) {
> case AF_INET:
> @@ -722,6 +726,17 @@ svcauth_unix_set_client(struct svc_rqst *rqstp)
> ip_map_cached_put(rqstp, ipm);
> break;
> }
> +
> + gi = unix_gid_find(cred->cr_uid, rqstp);
> + switch (PTR_ERR(gi)) {
> + case -EAGAIN:
> + return SVC_DROP;
> + case -ENOENT:
> + break;
> + default:
> + put_group_info(cred->cr_group_info);
> + cred->cr_group_info = gi;
> + }
> return SVC_OK;
> }
>
> @@ -818,19 +833,11 @@ svcauth_unix_accept(struct svc_rqst *rqstp, __be32
> *authp)
> slen = svc_getnl(argv); /* gids length */
> if (slen > 16 || (len -= (slen + 2)*4) < 0)
> goto badcred;
> - if (unix_gid_find(cred->cr_uid, &cred->cr_group_info, rqstp)
> - == -EAGAIN)
> + cred->cr_group_info = groups_alloc(slen);
> + if (cred->cr_group_info == NULL)
> return SVC_DROP;
> - if (cred->cr_group_info == NULL) {
> - cred->cr_group_info = groups_alloc(slen);
> - if (cred->cr_group_info == NULL)
> - return SVC_DROP;
> - for (i = 0; i < slen; i++)
> - GROUP_AT(cred->cr_group_info, i) = svc_getnl(argv);
> - } else {
> - for (i = 0; i < slen ; i++)
> - svc_getnl(argv);
> - }
> + for (i = 0; i < slen; i++)
> + GROUP_AT(cred->cr_group_info, i) = svc_getnl(argv);
> if (svc_getu32(argv) != htonl(RPC_AUTH_NULL) || svc_getu32(argv) != 0) {
> *authp = rpc_autherr_badverf;
> return SVC_DENIED;
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index 1c246a4..70b0a22 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -968,6 +968,7 @@ static int svc_tcp_recv_record(struct svc_sock
> *svsk, struct svc_rqst *rqstp)
> return len;
> err_delete:
> set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
> + svc_xprt_received(&svsk->sk_xprt);
> err_again:
> return -EAGAIN;
> }
> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> index 37c5475..5cdbf7c 100644
> --- a/net/sunrpc/xprtsock.c
> +++ b/net/sunrpc/xprtsock.c
> @@ -1926,6 +1926,11 @@ static void xs_tcp_setup_socket(struct rpc_xprt
> *xprt,
> case -EALREADY:
> xprt_clear_connecting(xprt);
> return;
> + case -EINVAL:
> + /* Happens, for instance, if the user specified a link
> + * local IPv6 address without a scope-id.
> + */
> + goto out;
> }
> out_eagain:
> status = -EAGAIN;
>
> Anything jump out at you?
>
> b.
>



Attachments:
signature.asc (262.00 B)
OpenPGP digital signature