2012-03-16 15:46:45

by Sachin Prabhu

Subject: Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server

We have a user report that they see the following messages
in /var/log/messages and that the NFS share hangs when a user's Kerberos
credentials expire.

kernel: Error: state manager encountered RPCSEC_GSS session expired
against NFSv4 server vm140-31.

The reproducer is as follows:

1. Configure NFSv4 + Kerberos and mount the NFSv4 share on the client
using sec=krb5.

2. Create two NFS users, log in as user1, obtain a Kerberos ticket with a
short lifetime and open a file on the NFS share. Leave this file open:
# su - user1
$ kinit -l 5m
$ cd /home/user1
$ touch file1.txt
$ sleep 100000 < file1.txt &

3. After 300 seconds, on a different terminal, log in as user2, obtain a
Kerberos ticket and attempt to open a file:
# su - user2
$ kinit
$ cd /home/user2
$ touch myfile1.txt
.
.
At this point, the process hangs and /var/log/messages fills up
with the following message:
kernel: Error: state manager encountered RPCSEC_GSS session expired
against NFSv4 server $(hostname)

On further debugging, we found the cause to be that the state
manager uses the credentials of the first state owner with open files it
finds. These are returned by nfs4_get_renew_cred_locked() ->
nfs4_get_renew_cred_server_locked() and used to send the RENEW.
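As a rough illustration, the selection rule described above can be modeled in user space as below. This is not kernel code: `struct model_cred`, `struct model_state_owner` and `pick_renew_cred()` are hypothetical stand-ins, and the sketch assumes state owners are visited in a fixed order. The point it shows is that the rule checks only for open files, never for credential expiry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a credential; not the kernel's struct rpc_cred. */
struct model_cred {
	const char *principal;
	bool expired;		/* true once the Kerberos ticket has expired */
};

/* Hypothetical model of a state owner (one per user holding NFSv4 state). */
struct model_state_owner {
	struct model_cred *cred;
	int open_files;		/* number of files this owner has open */
};

/*
 * Return the credential of the first state owner that has open files,
 * or NULL if none does. Expiry is deliberately not considered, which is
 * the behaviour described in the report.
 */
static struct model_cred *
pick_renew_cred(struct model_state_owner *owners, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (owners[i].open_files > 0)
			return owners[i].cred;
	}
	return NULL;
}
```

If user1 (expired ticket) is visited before user2, the sketch returns user1's credential even though user2 holds a valid ticket.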

1) Before it opens a file, the client needs to establish a client ID. It
does this by sending a SET_CLIENTID call, and the server responds with a
client ID.
Since kernel 2.6.29 (commit a7b721037f898b29a8083da59b1dccd3da385b07), the
SET_CLIENTID call is made using the machine credentials.

2) However, for all subsequent RENEW calls for that clientid, the client
uses the first credential it finds that is held by an open file on that
machine. In our test case, this is the user with the expired ticket.
When the ticket expires, the call to refresh the credentials, made at
call_refresh -> rpcauth_refreshcred -> gss_refresh(),
returns EKEYEXPIRED.
This means that the RENEW call fails before it can be sent over the
wire, and the clientid on the server eventually expires.

3) When the user with the valid ticket then attempts to open a file, the
server returns NFS4ERR_EXPIRED, which indicates that the clientid is no
longer valid at the server. A warning message is printed at this time.
To recover, the client attempts to RENEW, which hits the problem
described in step 2.

Steps 2 and 3 now repeat continuously, and no RENEW calls are sent over
the wire.
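The interaction between steps 2 and 3 can be sketched as a small state model. The names (`struct model_client`, `try_renew()`, `try_open()`) are hypothetical, and the model simplifies recovery: it assumes any call that reaches the server restores the lease, whereas a real client whose clientid has expired would need to re-establish it with SET_CLIENTID. It only illustrates why nothing is ever sent while the chosen credential stays expired.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of one renewal attempt in this model. */
enum renew_result {
	RENEW_SENT,		/* credential usable: request reaches the server */
	RENEW_FAILED_LOCALLY	/* credential refresh failed (EKEYEXPIRED) */
};

/* Hypothetical model of the client's lease state. */
struct model_client {
	bool renew_cred_expired;	/* has the credential chosen for RENEW expired? */
	bool lease_valid;		/* does the server still recognise our clientid? */
	int renews_on_wire;		/* RENEW calls actually transmitted */
};

/* Step 2: with an expired credential the call fails before it is sent. */
static enum renew_result try_renew(struct model_client *c)
{
	if (c->renew_cred_expired)
		return RENEW_FAILED_LOCALLY;
	c->renews_on_wire++;
	/* Simplification: a call that reaches the server restores the lease. */
	c->lease_valid = true;
	return RENEW_SENT;
}

/*
 * Step 3: an OPEN against an expired lease gets NFS4ERR_EXPIRED, and
 * recovery retries the renewal. Returns true if the OPEN succeeds within
 * max_attempts rounds, false if the loop never makes progress.
 */
static bool try_open(struct model_client *c, int max_attempts)
{
	for (int i = 0; i < max_attempts; i++) {
		if (c->lease_valid)
			return true;
		(void)try_renew(c);
	}
	return false;
}
```

With an expired renewal credential, try_open() gives up after max_attempts rounds with renews_on_wire still at 0; with a valid one, a single RENEW is sent and the OPEN proceeds.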

The SET_CLIENTID calls are made using the machine creds. Why don't we
simply use the machine creds to renew the clientid?

Something similar to the patch below should do the trick.

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index ec9f6ef..607ba50 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -6194,7 +6194,7 @@ struct nfs4_state_recovery_ops nfs41_nograce_recovery_ops = {
 
 struct nfs4_state_maintenance_ops nfs40_state_renewal_ops = {
 	.sched_state_renewal = nfs4_proc_async_renew,
-	.get_state_renewal_cred_locked = nfs4_get_renew_cred_locked,
+	.get_state_renewal_cred_locked = nfs4_get_setclientid_cred,
 	.renew_lease = nfs4_proc_renew,
 };
 

Sachin Prabhu



2012-03-16 19:03:34

by Myklebust, Trond

Subject: Re: Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server

On Fri, 2012-03-16 at 13:42 -0400, Daniel Kahn Gillmor wrote:
> On 03/16/2012 01:36 PM, Myklebust, Trond wrote:
> > The problem is that if the client doesn't have a machine cred, then you
> > end up taking a random user credential that may not currently be holding
> > any OPEN files. In that case too the RENEW will fail.
>
> So if i'm understanding this right:
>
> Sachin's proposal fails when the machine has no machine creds.
>
> The current implementation fails when the logged-in user's credentials
> are expired.
>
> Can the NFS client's logic test for those different cases, use the
> appropriate creds for RENEW in their difference, and reduce the failure
> case to their intersection?

Simpler solution: add the call to get machine creds to the existing
get_renew_creds function.
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

2012-03-16 19:26:58

by Sachin Prabhu

Subject: Re: Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server

On Fri, 2012-03-16 at 19:03 +0000, Myklebust, Trond wrote:
> On Fri, 2012-03-16 at 13:42 -0400, Daniel Kahn Gillmor wrote:
> > On 03/16/2012 01:36 PM, Myklebust, Trond wrote:
> > > The problem is that if the client doesn't have a machine cred, then you
> > > end up taking a random user credential that may not currently be holding
> > > any OPEN files. In that case too the RENEW will fail.
> >
> > So if i'm understanding this right:
> >
> > Sachin's proposal fails when the machine has no machine creds.
> >
> > The current implementation fails when the logged-in user's credentials
> > are expired.
> >
> > Can the NFS client's logic test for those different cases, use the
> > appropriate creds for RENEW in their difference, and reduce the failure
> > case to their intersection?
>
> Simpler solution: add the call to get machine creds to the existing
> get_renew_creds function.

Hello Trond,

I've sent a patch to the mailing list using your suggestion above.

Thanks
Sachin Prabhu


2012-03-16 19:26:11

by Sachin Prabhu

Subject: [PATCH] Try using machine credentials for RENEW calls

Using user credentials for RENEW calls will fail when the user
credentials have expired.

To avoid this, try using the machine credentials when making RENEW
calls. If no machine credentials have been set, fall back to using user
credentials as before.

Signed-off-by: Sachin Prabhu <[email protected]>

diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 4539203..10194eb 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -146,6 +146,11 @@ struct rpc_cred *nfs4_get_renew_cred_locked(struct nfs_client *clp)
 	struct rpc_cred *cred = NULL;
 	struct nfs_server *server;
 
+	/* Use machine credentials if available */
+	cred = nfs4_get_machine_cred_locked(clp);
+	if (cred != NULL)
+		goto out;
+
 	rcu_read_lock();
 	list_for_each_entry_rcu(server, &clp->cl_superblocks, client_link) {
 		cred = nfs4_get_renew_cred_server_locked(server);
@@ -153,6 +158,8 @@ struct rpc_cred *nfs4_get_renew_cred_locked(struct nfs_client *clp)
 			break;
 	}
 	rcu_read_unlock();
+
+out:
 	return cred;
 }
 
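The selection order introduced by this patch can be modeled as follows. This is a hypothetical user-space sketch, not the kernel implementation: `pick_renew_cred_patched()` and the model structs are illustrative names, and a NULL `machine_cred` stands for a client with no machine credential (the case Trond raised).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; not the kernel's struct rpc_cred. */
struct model_cred {
	const char *principal;
	bool expired;
};

struct model_state_owner {
	struct model_cred *cred;
	int open_files;
};

/*
 * Prefer the machine credential when one exists; otherwise fall back to
 * the old rule (first state owner with open files). machine_cred is NULL
 * when the client has no machine credential.
 */
static struct model_cred *
pick_renew_cred_patched(struct model_cred *machine_cred,
			struct model_state_owner *owners, size_t n)
{
	if (machine_cred != NULL)
		return machine_cred;

	for (size_t i = 0; i < n; i++) {
		if (owners[i].open_files > 0)
			return owners[i].cred;
	}
	return NULL;
}
```

If no machine credential is configured, the sketch behaves exactly like the original selection rule, so the expired-ticket failure remains possible in that configuration.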



2012-03-16 17:42:02

by Daniel Kahn Gillmor

Subject: Re: Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server

On 03/16/2012 01:36 PM, Myklebust, Trond wrote:
> The problem is that if the client doesn't have a machine cred, then you
> end up taking a random user credential that may not currently be holding
> any OPEN files. In that case too the RENEW will fail.

So if i'm understanding this right:

Sachin's proposal fails when the machine has no machine creds.

The current implementation fails when the logged-in user's credentials
are expired.

Can the NFS client's logic test for those different cases, use the
appropriate creds for RENEW in their difference, and reduce the failure
case to their intersection?
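The "difference and intersection" argument can be checked with a small truth table. In this hypothetical model each strategy is reduced to two booleans (is there a machine credential; is the chosen user credential still valid), which ignores Trond's separate point that a user credential may not hold any open files.

```c
#include <assert.h>
#include <stdbool.h>

/* Can a RENEW be sent if only the machine credential is ever used? */
static bool machine_only_ok(bool have_machine_cred, bool user_cred_valid)
{
	(void)user_cred_valid;
	return have_machine_cred;
}

/* Can a RENEW be sent if only the user credential is ever used? */
static bool user_only_ok(bool have_machine_cred, bool user_cred_valid)
{
	(void)have_machine_cred;
	return user_cred_valid;
}

/* Try the machine credential first, then fall back to the user credential. */
static bool combined_ok(bool have_machine_cred, bool user_cred_valid)
{
	return have_machine_cred || user_cred_valid;
}

/* Count the input combinations (out of 4) in which a strategy fails. */
static int failure_cases(bool (*ok)(bool, bool))
{
	int failures = 0;

	for (int m = 0; m <= 1; m++)
		for (int u = 0; u <= 1; u++)
			if (!ok(m, u))
				failures++;
	return failures;
}
```

Each single-credential strategy fails in two of the four cases, while trying the machine credential first and falling back to the user credential fails only when both are unavailable.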

--dkg

2012-03-16 17:36:33

by Myklebust, Trond

Subject: Re: Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server

On Fri, 2012-03-16 at 15:46 +0000, Sachin Prabhu wrote:
> The SET_CLIENTID calls are made using the machine creds. Why don't we
> simply use the machine creds to renew the clientid?

The problem is that if the client doesn't have a machine cred, then you
end up taking a random user credential that may not currently be holding
any OPEN files. In that case too the RENEW will fail.

Cheers
  Trond
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

2016-07-27 13:50:08

by Hari Krishnan

Subject: Re: Error: state manager encountered RPCSEC_GSS session expired against NFSv4 server

Have you got a fix? I'm having the same issue.

Myklebust, Trond <Trond.Myklebust@...> writes: