2012-09-06 19:54:08

by Andy Adamson

[permalink] [raw]
Subject: [PATCH 0/4] RFC Avoid expired credential keys for buffered writes

From: Andy Adamson <[email protected]>

This patch set is a Request for Comments: I think it does a fair job of
handling the issue - but it is complicated, and other brains might find a
better way to get the job done.

We must avoid buffering a WRITE that is using a credential key (e.g. a GSS
context key) that is about to expire or has expired. We currently will
paint ourselves into a corner by returning success to the applciation
for such a buffered WRITE, only to discover that we do not have permission when
we attempt to flush the WRITE (and potentially associated COMMIT) to disk.
This results in data corruption.

First, a couple of "setup" patches:
1) Patch SUNRPC handle EKEYEXPIRED in call_refreshresult returns EACCES to the
application on an expired or non-existent gss context when the users (Kerberos)
credentials have also expired or are non-existent. Current behavior is to
retry upcalls to GSSD forever. Please see patch comment for detail.

2) Patch SUNRPC set gss gc_expiry to full lifetime works in conjunction with
the gssd patch "0001-GSSD-Pass-GSS_context-lifetime-to-the-kernel". The
gssd patch passes the actual remaining TGT lifetime in the downcall, and
this kernel patch sets the gss context gc_expiry to this lifetime.

Then the two patches that avoid using an expired credential key:
3) Patch SUNRPC new rpc_credops to test credential expiry is the heart of this
work. It provides the RPC layer helper functions to allow NFS to manage
data in the face of expired credentials

4) Patch NFS avoid expired credential keys for buffered writes calls the
RPC helper functions.

Pages for buffered WRITEs are allocated in nfs_write_begin where we have an
nfs_open_context and associated rpc_cred. This is a generic rpc_cred, NOT
the gss_cred used in the actual WRITE RPC. Each WRITE RPC call takes the generic
rpc_cred (or uses the 'current_cred') uid and uses it to lookup the associated
gss_cred and gss_context in the call_refresh RPC state. So, there is a
one-to-one association between the nfs_open_context generic_cred and a
gss_cred with a matching uid and a valid non expired gss context.

We need to check the nfs_open_context generic cred 'underlying' gss_cred
gss_context gc_expiry in nfs_write_begin to determine if there is enough time
left in the gss_context lifetime to complete the buffered WRITE.

I started by adding a "key_timeout" rpc_authops routine only set by the generic
auth to do this work, called by rpcauth_key_timeout_notify, called from
nfs_write_begin. It does the lookup of the gss_cred (see the patch for
fast-tracking of non-gss underlying creds) and then tests the gss_context
gc_expiry against timeouts by calling a new crkey_timeout rpc_credops set
only for the gss_cred.

I came up with the idea of two credential key timeout watermarks:
1) A high watermark, RPC_KEY_EXPIRE_TIMEO set to 90 seconds.
2) A low watermark, RPC_KEY_EXPIRE_FAIL set to 30 seconds.
NOTE: these timeouts are guesses that work in a VM environment. We may want to
make them adjustable via module parameters.

If key_timeout is called on a credential with an underlying credential key that
will expire within high watermark seconds, we set the RPC_CRED_KEY_EXPIRE_SOON
flag in the generic_cred acred so that the NFS layer can clean up prior to
key expiration It does this by calling a new crkey_to_expire rpc_credop set
only for generic creds that tests for the RPC_CRED_KEY_EXPIRE_SOON flag.

If the RPC_CRED_KEY_EXPIRE_SOON flag is set in the nfs_open_context generic
cred (the acred portion), then nfs_write_end will call nfs_wb_all
on EVERY WRITE CALL it sees, and nfs_write_rpcsetup will send NFS_FILE_SYNC
WRITEs. The idea of the high watermark is to give time to flush all buffered
WRITEs and COMMITs, as well as to continue to WRITE, but only with
NFS_FILE_SYNC, allowing the application to try to finish writing before the
gss context expires. NOTE that this means EACH WRITE within the high watermark
timeout is a singe PAGE of NFS_FILE_SYNC. I think this is fine, because we are
in a failure mode - the most important thing is to NOT fail a buffered WRITE..

If key_timeout is called on a credential key that will expire within low
watermark seconds, we return EACCES from nfs_write_begin to stop
the buffered WRITE using the credential key.

Checking a generic credential's underlying credential involves a cred lookup.
To avoid this lookup in the normal case when the underlying credential has
a key that is valid (before the high watermark), a notify flag is set in
the generic credential the first time the key_timeout is called. The
generic credential then stops checking the underlying credential key expiry, and
the underlying credential (gss_cred) match routine then checks the key
expiration upon each normal use and sets a flag in the associated generic
credential only when the key expiration is within the high watermark.
This in turn signals the generic credential key_timeout to perform the extra
credetial lookup thereafter.

TING:

I have tested these patches with a TGT of 5 minutes, and a modified
Connectathon special test: bigfile.c that only writes and never flushes.
I also increased the file size to 524288000 bytes to give me time.
I named the test buffered-write.

I kinit, and run 6 instances of buffered write with the first and maybe second
test completing prior to TGT expiration, and the third/fourth tests spanning
the high water mark. The fifth test starts within the high water mark, and the
sixth test starts within the low water mark.

I have seen the expected behavior: test 1,2 succeed. Tests 3-6 fail with
Permission Denied. Tests 3 and 4 start with normal buffered UNSTABLE writes
then switch to single page NFS_FILE_SYNC writes with a COMMIT for the
normal buffered writes. Test 5 has only single page NFS_FILE_SYNC writes,
test 6 simply fails.

None of the WRITEs fail.

ISSUES
1) Once in a while I see "short write" instead of "Permission
denied". I cannot reproduce this. Even then wireshark shows all writes succeeding.

2) Although I have tested re-establishing Kerberos credentials after the
test run, I have not done any testing re-estabishing Kerberos credentials
during the a test run that is within the high watermark.

3) General corner case and other testing

-->Andy

Andy Adamson (4):
SUNRPC handle EKEYEXPIRED in call_refreshresult
SUNRPC set gss gc_expiry to full lifetime
SUNRPC new rpc_credops to test credential expiry
NFS avoid expired credential keys for buffered writes

fs/nfs/file.c | 10 ++++
fs/nfs/internal.h | 2 +
fs/nfs/nfs3proc.c | 6 +-
fs/nfs/nfs4filelayout.c | 1 -
fs/nfs/nfs4proc.c | 18 --------
fs/nfs/nfs4state.c | 22 ---------
fs/nfs/proc.c | 43 ------------------
fs/nfs/write.c | 29 ++++++++++++
include/linux/sunrpc/auth.h | 27 ++++++++++++
net/sunrpc/auth.c | 21 +++++++++
net/sunrpc/auth_generic.c | 93 ++++++++++++++++++++++++++++++++++++++++
net/sunrpc/auth_gss/auth_gss.c | 51 +++++++++++++++++++---
net/sunrpc/clnt.c | 1 +
13 files changed, 231 insertions(+), 93 deletions(-)

--
1.7.7.6



2012-09-06 19:54:11

by Andy Adamson

[permalink] [raw]
Subject: [PATCH 4/4] NFS avoid expired credential keys for buffered writes

From: Andy Adamson <[email protected]>

We must avoid buffering a WRITE that is using a credential key (e.g. a GSS
context key) that is about to expire or has expired. We currently will
paint ourselves into a corner by returning success to the applciation
for such a buffered WRITE, only to discover that we do not have permission when
we attempt to flush the WRITE (and potentially associated COMMIT) to disk.

Use the RPC layer credential key timeout and expire routines which use a
high and and a low watermark - RPC_KEY_EXPIRE_TIMEO and RPC_KEY_EXPIRE_FAIL.
We test the key in nfs_write_begin.

If a buffered WRITE is using a credential with a key that will expire within
high watermark seconds, flush the inode in nfs_write_end and send only
NFS_FILE_SYNC WRITEs. Note that this results in single page NFS_FILE_SYNC WRITEs.

If the buffered WRITE is using a credential key that will expire within low
watermark seconds, fail the WRITE in nfs_write_begin _before_ the WRITE is
buffered and return -EACCES to the application.

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/file.c | 10 ++++++++++
fs/nfs/internal.h | 2 ++
fs/nfs/write.c | 29 +++++++++++++++++++++++++++++
3 files changed, 41 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 75d6d0a..df776e5 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -345,6 +345,7 @@ static int nfs_write_begin(struct file *file, struct address_space *mapping,
int ret;
pgoff_t index = pos >> PAGE_CACHE_SHIFT;
struct page *page;
+ struct nfs_open_context *ctx = nfs_file_open_context(file);
int once_thru = 0;

dfprintk(PAGECACHE, "NFS: write_begin(%s/%s(%ld), %u@%lld)\n",
@@ -362,6 +363,10 @@ start:
if (ret)
return ret;

+ ret = nfs_ctx_key_timeout_notify(ctx);
+ if (ret)
+ return ret;
+
page = grab_cache_page_write_begin(mapping, index, flags);
if (!page)
return -ENOMEM;
@@ -387,6 +392,7 @@ static int nfs_write_end(struct file *file, struct address_space *mapping,
struct page *page, void *fsdata)
{
unsigned offset = pos & (PAGE_CACHE_SIZE - 1);
+ struct nfs_open_context *ctx = nfs_file_open_context(file);
int status;

dfprintk(PAGECACHE, "NFS: write_end(%s/%s(%ld), %u@%lld)\n",
@@ -422,6 +428,10 @@ static int nfs_write_end(struct file *file, struct address_space *mapping,
if (status < 0)
return status;
NFS_I(mapping->host)->write_io += copied;
+
+ if (nfs_ctx_key_to_expire(ctx))
+ nfs_wb_all(mapping->host);
+
return copied;
}

diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 31fdb03..cf4764e 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -449,6 +449,8 @@ void nfs_request_remove_commit_list(struct nfs_page *req,
void nfs_init_cinfo(struct nfs_commit_info *cinfo,
struct inode *inode,
struct nfs_direct_req *dreq);
+int nfs_ctx_key_timeout_notify(struct nfs_open_context *ctx);
+bool nfs_ctx_key_to_expire(struct nfs_open_context *ctx);

#ifdef CONFIG_MIGRATION
extern int nfs_migrate_page(struct address_space *,
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index e3b5537..9fa538f 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -872,6 +872,32 @@ int nfs_flush_incompatible(struct file *file, struct page *page)
}

/*
+ * Avoid buffered writes when a open context credential's key would
+ * expire soon.
+ *
+ * Returns -EACCES if the key will expire within RPC_KEY_EXPIRE_FAIL.
+ *
+ * Return 0 and set a credential flag which triggers the inode to flush
+ * and performs NFS_FILE_SYNC writes if the key will expired within
+ * RPC_KEY_EXPIRE_TIMEO.
+ */
+int
+nfs_ctx_key_timeout_notify(struct nfs_open_context *ctx)
+{
+ struct rpc_auth *auth = NFS_SERVER(ctx->state->inode)->client->cl_auth;
+
+ return rpcauth_key_timeout_notify(auth, ctx->cred);
+}
+
+/*
+ * Test if the open context credential key is marked to expire soon.
+ */
+bool nfs_ctx_key_to_expire(struct nfs_open_context *ctx)
+{
+ return rpcauth_cred_key_to_expire(ctx->cred);
+}
+
+/*
* If the page cache is marked as unsafe or invalid, then we can't rely on
* the PageUptodate() flag. In this case, we will need to turn off
* write optimisations that depend on the page contents being correct.
@@ -1024,6 +1050,9 @@ static void nfs_write_rpcsetup(struct nfs_write_data *data,
data->args.stable = NFS_FILE_SYNC;
}

+ if (nfs_ctx_key_to_expire(data->args.context))
+ data->args.stable = NFS_FILE_SYNC;
+
data->res.fattr = &data->fattr;
data->res.count = count;
data->res.verf = &data->verf;
--
1.7.7.6


2012-09-10 20:15:07

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [PATCH 0/4] RFC Avoid expired credential keys for buffered writes

T24gTW9uLCAyMDEyLTA5LTEwIGF0IDE1OjU2IC0wNDAwLCBKaW0gUmVlcyB3cm90ZToNCj4gSmVm
ZiBMYXl0b24gd3JvdGU6DQo+IA0KPiAgIE15IG9ubHkgY29uY2VybiBpcyB0aGlzIGJlaGF2aW9y
IGRlc2NyaWJlZCBpbiB0aGUgbGFzdCBwYXRjaDoNCj4gICANCj4gICAiSWYgdGhlIGJ1ZmZlcmVk
IFdSSVRFIGlzIHVzaW5nIGEgY3JlZGVudGlhbCBrZXkgdGhhdCB3aWxsIGV4cGlyZQ0KPiAgIHdp
dGhpbiBsb3cgd2F0ZXJtYXJrIHNlY29uZHMsIGZhaWwgdGhlIFdSSVRFIGluIG5mc193cml0ZV9i
ZWdpbg0KPiAgIF9iZWZvcmVfIHRoZSBXUklURSBpcyBidWZmZXJlZCBhbmQgcmV0dXJuIC1FQUND
RVMgdG8gdGhlIGFwcGxpY2F0aW9uLiINCj4gICANCj4gICBTaG91bGRuJ3QgcmVqZWN0aW5nIHRo
ZSB3cml0ZSBhdHRlbXB0IGJlIHRoZSBwdXJ2aWV3IG9mIHRoZSBzZXJ2ZXI/DQo+ICAgDQo+ICAg
SXQgc2VlbXMgdG8gbWUgdGhhdCB3ZSdkIGJlIGJlc3Qgb2ZmIGp1c3Qgc3dpdGNoaW5nIHRvIHN5
bmNocm9ub3VzDQo+ICAgd3JpdGVzIHdoZW4gd2Ugc3RhcnQgYXBwcm9hY2hpbmcgY3JlZGVudGlh
bCBleHBpcmF0aW9uLCBhbmQgbGV0dGluZyB0aGUNCj4gICBzZXJ2ZXIgaGFuZGxlIHRoZSBjYXNl
IHdoZXJlIHRoZSBjcmVkZW50aWFsIGV4cGlyZXMuDQo+ICAgDQo+ICAgSXQgc2VlbXMgdG8gbWUg
dGhhdCB0aGUgY2xpZW50IHNob3VsZCBqdXN0IGJlIGluIHRoZSBidXNpbmVzcyBvZg0KPiAgIHJl
Y29nbml6aW5nIHdoZW4gdGhlIGNyZWRlbnRpYWwgbWlnaHQgYmUgYWJvdXQgdG8gZXhwaXJlIGFu
ZCBhdHRlbXB0IHRvDQo+ICAgZW5zdXJlIHRoYXQgd2UgZG9uJ3QgZW5kIHVwIHdpdGggYSBidW5j
aCBvZiBidWZmZXJlZCBkYXRhIGluIHRoYXQgY2FzZS4NCj4gDQo+IEkgZG9uJ3QgbGlrZSBpdC4g
IEl0J3MgZWFzeSB0byB1bmRlcnN0YW5kIHdoeSB3cml0ZXMgbWlnaHQgZmFpbCBpZiB0aGUgY3Jl
ZHMNCj4gYXJlIGdldHRpbmcgc3RhbGUuICBJdCdzIG5vdCBzbyBlYXN5IHRvIHVuZGVyc3RhbmQg
d2h5IGNsaWVudCBiZWhhdmlvcg0KPiBjaGFuZ2VzIGluIGEgc3VidGxlIHdheSwgd2l0aCBwZXJm
b3JtYW5jZSBpbXBsaWNhdGlvbnMuDQoNCldlIGRvbid0IGtub3cgdGhhdCB0aGUgY3JlZHMgYXJl
IHN0YWxlLiBXZSBqdXN0IGtub3cgdGhhdCB0aGUgR1NTDQpzZXNzaW9uIGlzIGFib3V0IHRvIGV4
cGlyZS4NCg0KLS0gDQpUcm9uZCBNeWtsZWJ1c3QNCkxpbnV4IE5GUyBjbGllbnQgbWFpbnRhaW5l
cg0KDQpOZXRBcHANClRyb25kLk15a2xlYnVzdEBuZXRhcHAuY29tDQp3d3cubmV0YXBwLmNvbQ0K
DQo=

2012-09-12 15:13:08

by Adamson, Andy

[permalink] [raw]
Subject: Re: [PATCH 0/4] RFC Avoid expired credential keys for buffered writes


On Sep 10, 2012, at 4:08 PM, Jeff Layton wrote:

> On Mon, 10 Sep 2012 19:52:35 +0000
> "Adamson, Andy" <[email protected]> wrote:
>
>>
>> On Sep 10, 2012, at 2:57 PM, Jeff Layton wrote:
>>
>>> On Thu, 6 Sep 2012 15:54:07 -0400
>>> [email protected] wrote:
>>>
>>>> From: Andy Adamson <[email protected]>
>>>>
>>>> This patch set is a Request for Comments: I think it does a fair job of
>>>> handling the issue - but it is complicated, and other brains might find a
>>>> better way to get the job done.
>>>>
>>>> We must avoid buffering a WRITE that is using a credential key (e.g. a GSS
>>>> context key) that is about to expire or has expired. We currently will
>>>> paint ourselves into a corner by returning success to the applciation
>>>> for such a buffered WRITE, only to discover that we do not have permission when
>>>> we attempt to flush the WRITE (and potentially associated COMMIT) to disk.
>>>> This results in data corruption.
>>>>
>>>> First, a couple of "setup" patches:
>>>> 1) Patch SUNRPC handle EKEYEXPIRED in call_refreshresult returns EACCES to the
>>>> application on an expired or non-existent gss context when the users (Kerberos)
>>>> credentials have also expired or are non-existent. Current behavior is to
>>>> retry upcalls to GSSD forever. Please see patch comment for detail.
>>>>
>>>> 2) Patch SUNRPC set gss gc_expiry to full lifetime works in conjunction with
>>>> the gssd patch "0001-GSSD-Pass-GSS_context-lifetime-to-the-kernel". The
>>>> gssd patch passes the actual remaining TGT lifetime in the downcall, and
>>>> this kernel patch sets the gss context gc_expiry to this lifetime.
>>>>
>>>> Then the two patches that avoid using an expired credential key:
>>>> 3) Patch SUNRPC new rpc_credops to test credential expiry is the heart of this
>>>> work. It provides the RPC layer helper functions to allow NFS to manage
>>>> data in the face of expired credentials
>>>>
>>>> 4) Patch NFS avoid expired credential keys for buffered writes calls the
>>>> RPC helper functions.
>>>>
>>>> Pages for buffered WRITEs are allocated in nfs_write_begin where we have an
>>>> nfs_open_context and associated rpc_cred. This is a generic rpc_cred, NOT
>>>> the gss_cred used in the actual WRITE RPC. Each WRITE RPC call takes the generic
>>>> rpc_cred (or uses the 'current_cred') uid and uses it to lookup the associated
>>>> gss_cred and gss_context in the call_refresh RPC state. So, there is a
>>>> one-to-one association between the nfs_open_context generic_cred and a
>>>> gss_cred with a matching uid and a valid non expired gss context.
>>>>
>>>> We need to check the nfs_open_context generic cred 'underlying' gss_cred
>>>> gss_context gc_expiry in nfs_write_begin to determine if there is enough time
>>>> left in the gss_context lifetime to complete the buffered WRITE.
>>>>
>>>> I started by adding a "key_timeout" rpc_authops routine only set by the generic
>>>> auth to do this work, called by rpcauth_key_timeout_notify, called from
>>>> nfs_write_begin. It does the lookup of the gss_cred (see the patch for
>>>> fast-tracking of non-gss underlying creds) and then tests the gss_context
>>>> gc_expiry against timeouts by calling a new crkey_timeout rpc_credops set
>>>> only for the gss_cred.
>>>>
>>>> I came up with the idea of two credential key timeout watermarks:
>>>> 1) A high watermark, RPC_KEY_EXPIRE_TIMEO set to 90 seconds.
>>>> 2) A low watermark, RPC_KEY_EXPIRE_FAIL set to 30 seconds.
>>>> NOTE: these timeouts are guesses that work in a VM environment. We may want to
>>>> make them adjustable via module parameters.
>>>>
>>>> If key_timeout is called on a credential with an underlying credential key that
>>>> will expire within high watermark seconds, we set the RPC_CRED_KEY_EXPIRE_SOON
>>>> flag in the generic_cred acred so that the NFS layer can clean up prior to
>>>> key expiration It does this by calling a new crkey_to_expire rpc_credop set
>>>> only for generic creds that tests for the RPC_CRED_KEY_EXPIRE_SOON flag.
>>>>
>>>> If the RPC_CRED_KEY_EXPIRE_SOON flag is set in the nfs_open_context generic
>>>> cred (the acred portion), then nfs_write_end will call nfs_wb_all
>>>> on EVERY WRITE CALL it sees, and nfs_write_rpcsetup will send NFS_FILE_SYNC
>>>> WRITEs. The idea of the high watermark is to give time to flush all buffered
>>>> WRITEs and COMMITs, as well as to continue to WRITE, but only with
>>>> NFS_FILE_SYNC, allowing the application to try to finish writing before the
>>>> gss context expires. NOTE that this means EACH WRITE within the high watermark
>>>> timeout is a singe PAGE of NFS_FILE_SYNC. I think this is fine, because we are
>>>> in a failure mode - the most important thing is to NOT fail a buffered WRITE..
>>>>
>>>> If key_timeout is called on a credential key that will expire within low
>>>> watermark seconds, we return EACCES from nfs_write_begin to stop
>>>> the buffered WRITE using the credential key.
>>>>
>>>> Checking a generic credential's underlying credential involves a cred lookup.
>>>> To avoid this lookup in the normal case when the underlying credential has
>>>> a key that is valid (before the high watermark), a notify flag is set in
>>>> the generic credential the first time the key_timeout is called. The
>>>> generic credential then stops checking the underlying credential key expiry, and
>>>> the underlying credential (gss_cred) match routine then checks the key
>>>> expiration upon each normal use and sets a flag in the associated generic
>>>> credential only when the key expiration is within the high watermark.
>>>> This in turn signals the generic credential key_timeout to perform the extra
>>>> credetial lookup thereafter.
>>>>
>>>> TING:
>>>>
>>>> I have tested these patches with a TGT of 5 minutes, and a modified
>>>> Connectathon special test: bigfile.c that only writes and never flushes.
>>>> I also increased the file size to 524288000 bytes to give me time.
>>>> I named the test buffered-write.
>>>>
>>>> I kinit, and run 6 instances of buffered write with the first and maybe second
>>>> test completing prior to TGT expiration, and the third/fourth tests spanning
>>>> the high water mark. The fifth test starts within the high water mark, and the
>>>> sixth test starts within the low water mark.
>>>>
>>>> I have seen the expected behavior: test 1,2 succeed. Tests 3-6 fail with
>>>> Permission Denied. Tests 3 and 4 start with normal buffered UNSTABLE writes
>>>> then switch to single page NFS_FILE_SYNC writes with a COMMIT for the
>>>> normal buffered writes. Test 5 has only single page NFS_FILE_SYNC writes,
>>>> test 6 simply fails.
>>>>
>>>> None of the WRITEs fail.
>>>>
>>>> ISSUES
>>>> 1) Once in a while I see "short write" instead of "Permission
>>>> denied". I cannot reproduce this. Even then wireshark shows all writes succeeding.
>>>>
>>>> 2) Although I have tested re-establishing Kerberos credentials after the
>>>> test run, I have not done any testing re-estabishing Kerberos credentials
>>>> during the a test run that is within the high watermark.
>>>>
>>>> 3) General corner case and other testing
>>>>
>>>> -->Andy
>>>>
>>>> Andy Adamson (4):
>>>> SUNRPC handle EKEYEXPIRED in call_refreshresult
>>>> SUNRPC set gss gc_expiry to full lifetime
>>>> SUNRPC new rpc_credops to test credential expiry
>>>> NFS avoid expired credential keys for buffered writes
>>>>
>>>> fs/nfs/file.c | 10 ++++
>>>> fs/nfs/internal.h | 2 +
>>>> fs/nfs/nfs3proc.c | 6 +-
>>>> fs/nfs/nfs4filelayout.c | 1 -
>>>> fs/nfs/nfs4proc.c | 18 --------
>>>> fs/nfs/nfs4state.c | 22 ---------
>>>> fs/nfs/proc.c | 43 ------------------
>>>> fs/nfs/write.c | 29 ++++++++++++
>>>> include/linux/sunrpc/auth.h | 27 ++++++++++++
>>>> net/sunrpc/auth.c | 21 +++++++++
>>>> net/sunrpc/auth_generic.c | 93 ++++++++++++++++++++++++++++++++++++++++
>>>> net/sunrpc/auth_gss/auth_gss.c | 51 +++++++++++++++++++---
>>>> net/sunrpc/clnt.c | 1 +
>>>> 13 files changed, 231 insertions(+), 93 deletions(-)
>>>>
>>>
>>> In general, I like this set...
>>>
>>> My only concern is this behavior described in the last patch:
>>>
>>> "If the buffered WRITE is using a credential key that will expire
>>> within low watermark seconds, fail the WRITE in nfs_write_begin
>>> _before_ the WRITE is buffered and return -EACCES to the application."
>>>
>>> Shouldn't rejecting the write attempt be the purview of the server?
>>
>> The TGT lifetime is shared by the client and the server - it's the same, so I don't think we'd gain anything by doing that.
>>
>>>
>>> It seems to me that we'd be best off just switching to synchronous
>>> writes when we start approaching credential expiration, and letting the
>>> server handle the case where the credential expires.
>>
>> You mean switching to direct IO? I believe that means taking a different path in the VFS layer which I found difficult to do from nfs_write_begin. But maybe I just didn't see a good way to do this.
>>
>>
>>> It seems to me that the client should just be in the business of
>>> recognizing when the credential might be about to expire and attempt to
>>> ensure that we don't end up with a bunch of buffered data in that case.
>>
>> Yes.
>>
>> -->Andy
>>
>>>
>>> --
>>> Jeff Layton <[email protected]>
>>
>
>
> I meant that I didn't really see the point of the two different states
> when we're approaching ticket expiry.
>
> IIUC, at the "high watermark", you'll switch to synchronous writes, and
> presumably will start syncing out any buffered data. At that point,
> what are you buying by rejecting writes altogether with EACCES at the
> low watermark?
>
> It seems to me that the high watermark should be all you need to deal
> with as a client. Once you're past that point you can just let the
> client attempt the writes. If they fail, then no big deal -- you were
> doing synchronous writes already so you've already avoided having a
> bunch of data buffered up.

Hi Jeff

After doing more test verification, here are the reasons for the low watermark. Reason #2 is the strongest justification.

1) It's a bad idea to "just let the client attempt the writes".

The client needs a valid GSS context to send an RPCSEC_GSS RPC because of the checksum of the RPC header (up to and including the credential) that is computed using the GSS_GetMIC() call with enctype and the QOP from the GSS context.

Plus, the client and the server see the same TGT lifetime, and thus the same GSS context expiration.

Therefore, the only time the client can send an RPCSEC_GSS Kerberos GSS security mechanism RPC to the server and have it reject the RPC with -EACCES (or the RPC level RPCSEC_GSS_CTXPROBLEM) is in the case where the GSS context expires while the RPC is in flight or prior to the RPCSEC_GSS processing by the server. 99.999% of the time, the client determines whether or not the RPCSEC_GSS call fails prior to putting an RPC on the wire based on it's GSS context expiry. So it is not an option to wait for the server to reject the RPC.

"The client attempting writes" means that with my patch set, the buffered write is assigned pages in nfs_write_begin, and we don't find out that the client can't send the WRITE until nfs_write_end calls nfs_wb_all() which tries to flush the newly assigned pages, and the RPC layer fails to find a valid GSS context, so performs multiple up calls to GSSD which all fail, and nfs_wb_all() returns an error - which I have not seen ever it do - so that nfs_write_end fails with -EACCES. Then the pages need to be cleaned up ?.

We might as well fail the call in nfs_write_begin before the pages are assigned, it's a lot cleaner as we save a bunch of up calls to GSSD and we don't have to clean up any pages nor ensure that nfs_write_end returns -EACCES.

With the low water mark, we fail in nfs_write_begin, and we're not hitting corner cases in the GSS code such as a GSS context being assigned with a lifetime of 0 ( I think a GSS context creation can succeed with the resultant lifetime of 0 because there was a lifetime of 1 when the context was created, but it fell to 0 by the time GSSD called gss_inquire_context ).

2) It's a really really good idea to give the client time to clean up by sending a CLOSE to let the server know what is going on. Note for pNFS the CLOSE also carries an implicit LAYOUTRETURN for the return_on_close case. Without the CLOSE, the server simply sees WRITEs stop. BTW this is a good reason to do the same for READ.

3) Extra time to flush buffered data. With my patch set, data is allowed to be buffered, but then flushed prior to nfs_write_end's return. IIUC, a (multithreaded) application can be writing a very large amount of data, which during the high water mark is sent 1 PAGE at a time NFS_FILE_SYNC. This takes significantly longer than coalesced buffered UNSTABLE WRITEs and many more RPCs (not continuations). The purpose of this code is to totally drain the buffered writes. The low water mark provides a period of no new WRITEs giving more RPC resources to the exiting single PAGE draining. I suspect that we will need to make the watermark(s) tunable via module parameters for different workloads.

-->Andy

>
> --
> Jeff Layton <[email protected]>


2012-09-10 18:57:35

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH 0/4] RFC Avoid expired credential keys for buffered writes

On Thu, 6 Sep 2012 15:54:07 -0400
[email protected] wrote:

> From: Andy Adamson <[email protected]>
>
> This patch set is a Request for Comments: I think it does a fair job of
> handling the issue - but it is complicated, and other brains might find a
> better way to get the job done.
>
> We must avoid buffering a WRITE that is using a credential key (e.g. a GSS
> context key) that is about to expire or has expired. We currently will
> paint ourselves into a corner by returning success to the applciation
> for such a buffered WRITE, only to discover that we do not have permission when
> we attempt to flush the WRITE (and potentially associated COMMIT) to disk.
> This results in data corruption.
>
> First, a couple of "setup" patches:
> 1) Patch SUNRPC handle EKEYEXPIRED in call_refreshresult returns EACCES to the
> application on an expired or non-existent gss context when the users (Kerberos)
> credentials have also expired or are non-existent. Current behavior is to
> retry upcalls to GSSD forever. Please see patch comment for detail.
>
> 2) Patch SUNRPC set gss gc_expiry to full lifetime works in conjunction with
> the gssd patch "0001-GSSD-Pass-GSS_context-lifetime-to-the-kernel". The
> gssd patch passes the actual remaining TGT lifetime in the downcall, and
> this kernel patch sets the gss context gc_expiry to this lifetime.
>
> Then the two patches that avoid using an expired credential key:
> 3) Patch SUNRPC new rpc_credops to test credential expiry is the heart of this
> work. It provides the RPC layer helper functions to allow NFS to manage
> data in the face of expired credentials
>
> 4) Patch NFS avoid expired credential keys for buffered writes calls the
> RPC helper functions.
>
> Pages for buffered WRITEs are allocated in nfs_write_begin where we have an
> nfs_open_context and associated rpc_cred. This is a generic rpc_cred, NOT
> the gss_cred used in the actual WRITE RPC. Each WRITE RPC call takes the generic
> rpc_cred (or uses the 'current_cred') uid and uses it to lookup the associated
> gss_cred and gss_context in the call_refresh RPC state. So, there is a
> one-to-one association between the nfs_open_context generic_cred and a
> gss_cred with a matching uid and a valid non expired gss context.
>
> We need to check the nfs_open_context generic cred 'underlying' gss_cred
> gss_context gc_expiry in nfs_write_begin to determine if there is enough time
> left in the gss_context lifetime to complete the buffered WRITE.
>
> I started by adding a "key_timeout" rpc_authops routine only set by the generic
> auth to do this work, called by rpcauth_key_timeout_notify, called from
> nfs_write_begin. It does the lookup of the gss_cred (see the patch for
> fast-tracking of non-gss underlying creds) and then tests the gss_context
> gc_expiry against timeouts by calling a new crkey_timeout rpc_credops set
> only for the gss_cred.
>
> I came up with the idea of two credential key timeout watermarks:
> 1) A high watermark, RPC_KEY_EXPIRE_TIMEO set to 90 seconds.
> 2) A low watermark, RPC_KEY_EXPIRE_FAIL set to 30 seconds.
> NOTE: these timeouts are guesses that work in a VM environment. We may want to
> make them adjustable via module parameters.
>
> If key_timeout is called on a credential with an underlying credential key that
> will expire within high watermark seconds, we set the RPC_CRED_KEY_EXPIRE_SOON
> flag in the generic_cred acred so that the NFS layer can clean up prior to
> key expiration It does this by calling a new crkey_to_expire rpc_credop set
> only for generic creds that tests for the RPC_CRED_KEY_EXPIRE_SOON flag.
>
> If the RPC_CRED_KEY_EXPIRE_SOON flag is set in the nfs_open_context generic
> cred (the acred portion), then nfs_write_end will call nfs_wb_all
> on EVERY WRITE CALL it sees, and nfs_write_rpcsetup will send NFS_FILE_SYNC
> WRITEs. The idea of the high watermark is to give time to flush all buffered
> WRITEs and COMMITs, as well as to continue to WRITE, but only with
> NFS_FILE_SYNC, allowing the application to try to finish writing before the
> gss context expires. NOTE that this means EACH WRITE within the high watermark
> timeout is a singe PAGE of NFS_FILE_SYNC. I think this is fine, because we are
> in a failure mode - the most important thing is to NOT fail a buffered WRITE..
>
> If key_timeout is called on a credential key that will expire within low
> watermark seconds, we return EACCES from nfs_write_begin to stop
> the buffered WRITE using the credential key.
>
> Checking a generic credential's underlying credential involves a cred lookup.
> To avoid this lookup in the normal case when the underlying credential has
> a key that is valid (before the high watermark), a notify flag is set in
> the generic credential the first time the key_timeout is called. The
> generic credential then stops checking the underlying credential key expiry, and
> the underlying credential (gss_cred) match routine then checks the key
> expiration upon each normal use and sets a flag in the associated generic
> credential only when the key expiration is within the high watermark.
> This in turn signals the generic credential key_timeout to perform the extra
> credetial lookup thereafter.
>
> TING:
>
> I have tested these patches with a TGT of 5 minutes, and a modified
> Connectathon special test: bigfile.c that only writes and never flushes.
> I also increased the file size to 524288000 bytes to give me time.
> I named the test buffered-write.
>
> I kinit, and run 6 instances of buffered write with the first and maybe second
> test completing prior to TGT expiration, and the third/fourth tests spanning
> the high water mark. The fifth test starts within the high water mark, and the
> sixth test starts within the low water mark.
>
> I have seen the expected behavior: test 1,2 succeed. Tests 3-6 fail with
> Permission Denied. Tests 3 and 4 start with normal buffered UNSTABLE writes
> then switch to single page NFS_FILE_SYNC writes with a COMMIT for the
> normal buffered writes. Test 5 has only single page NFS_FILE_SYNC writes,
> test 6 simply fails.
>
> None of the WRITEs fail.
>
> ISSUES
> 1) Once in a while I see "short write" instead of "Permission
> denied". I cannot reproduce this. Even then wireshark shows all writes succeeding.
>
> 2) Although I have tested re-establishing Kerberos credentials after the
> test run, I have not done any testing re-estabishing Kerberos credentials
> during the a test run that is within the high watermark.
>
> 3) General corner case and other testing
>
> -->Andy
>
> Andy Adamson (4):
> SUNRPC handle EKEYEXPIRED in call_refreshresult
> SUNRPC set gss gc_expiry to full lifetime
> SUNRPC new rpc_credops to test credential expiry
> NFS avoid expired credential keys for buffered writes
>
> fs/nfs/file.c | 10 ++++
> fs/nfs/internal.h | 2 +
> fs/nfs/nfs3proc.c | 6 +-
> fs/nfs/nfs4filelayout.c | 1 -
> fs/nfs/nfs4proc.c | 18 --------
> fs/nfs/nfs4state.c | 22 ---------
> fs/nfs/proc.c | 43 ------------------
> fs/nfs/write.c | 29 ++++++++++++
> include/linux/sunrpc/auth.h | 27 ++++++++++++
> net/sunrpc/auth.c | 21 +++++++++
> net/sunrpc/auth_generic.c | 93 ++++++++++++++++++++++++++++++++++++++++
> net/sunrpc/auth_gss/auth_gss.c | 51 +++++++++++++++++++---
> net/sunrpc/clnt.c | 1 +
> 13 files changed, 231 insertions(+), 93 deletions(-)
>

In general, I like this set...

My only concern is this behavior described in the last patch:

"If the buffered WRITE is using a credential key that will expire
within low watermark seconds, fail the WRITE in nfs_write_begin
_before_ the WRITE is buffered and return -EACCES to the application."

Shouldn't rejecting the write attempt be the purview of the server?

It seems to me that we'd be best off just switching to synchronous
writes when we start approaching credential expiration, and letting the
server handle the case where the credential expires.

It seems to me that the client should just be in the business of
recognizing when the credential might be about to expire and attempt to
ensure that we don't end up with a bunch of buffered data in that case.

--
Jeff Layton <[email protected]>

2012-09-10 19:58:11

by Jim Rees

[permalink] [raw]
Subject: Re: [PATCH 0/4] RFC Avoid expired credential keys for buffered writes

Jeff Layton wrote:

My only concern is this behavior described in the last patch:

"If the buffered WRITE is using a credential key that will expire
within low watermark seconds, fail the WRITE in nfs_write_begin
_before_ the WRITE is buffered and return -EACCES to the application."

Shouldn't rejecting the write attempt be the purview of the server?

It seems to me that we'd be best off just switching to synchronous
writes when we start approaching credential expiration, and letting the
server handle the case where the credential expires.

It seems to me that the client should just be in the business of
recognizing when the credential might be about to expire and attempt to
ensure that we don't end up with a bunch of buffered data in that case.

I don't like it. It's easy to understand why writes might fail if the creds
are getting stale. It's not so easy to understand why client behavior
changes in a subtle way, with performance implications.

2012-09-13 17:57:57

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 0/4] RFC Avoid expired credential keys for buffered writes

On Wed, Sep 12, 2012 at 04:14:38PM +0000, Adamson, Andy wrote:
>
> On Sep 12, 2012, at 11:21 AM, Myklebust, Trond wrote:
>
> > On Wed, 2012-09-12 at 15:13 +0000, Adamson, Andy wrote:
> >> After doing more test verification, here are the reasons for the low watermark. Reason #2 is the strongest justification.

1 and 2 don't sound right. What exactly were the test failures?

The client and server's gss code already check the context expiry for
us--we don't want an extra check in an upper layer on the client.

The context *will* expire unexpectedly sometimes, and we do have to
handle that. (The server's clock could be a tad faster than the
server's, or the server could reboot, etc., etc.)

I agree with all the suggestions for trying to anticipate expiry in the
normal cases and preparing to minimize the damage, that's fine. But
once the expiry finally comes we should leave the existing mechanisms to
do their job.

--b.

2012-09-06 19:54:09

by Andy Adamson

[permalink] [raw]
Subject: [PATCH 2/4] SUNRPC set gss gc_expiry to full lifetime

From: Andy Adamson <[email protected]>

Only use the default GSSD_MIN_TIMEOUT if the gss downcall timeout is zero.
Store the full lifetime in gc_expiry (not 3/4 of the lifetime) as subsequent
patches will use the gc_expiry to determine buffered WRITE behavior in the
face of expired or soon to be expired gss credentials.

Signed-off-by: Andy Adamson <[email protected]>
---
net/sunrpc/auth_gss/auth_gss.c | 15 +++++++++++----
1 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
index 34c5220..c59f5ed 100644
--- a/net/sunrpc/auth_gss/auth_gss.c
+++ b/net/sunrpc/auth_gss/auth_gss.c
@@ -192,17 +192,21 @@ gss_fill_context(const void *p, const void *end, struct gss_cl_ctx *ctx, struct
const void *q;
unsigned int seclen;
unsigned int timeout;
+ unsigned long now = jiffies;
u32 window_size;
int ret;

- /* First unsigned int gives the lifetime (in seconds) of the cred */
+ /* First unsigned int gives the remaining lifetime in seconds of the
+ * credential - e.g. the remaining TGT lifetime for Kerberos or
+ * the -t value passed to GSSD. */
p = simple_get_bytes(p, end, &timeout, sizeof(timeout));
if (IS_ERR(p))
goto err;
if (timeout == 0)
timeout = GSSD_MIN_TIMEOUT;
- ctx->gc_expiry = jiffies + (unsigned long)timeout * HZ * 3 / 4;
- /* Sequence number window. Determines the maximum number of simultaneous requests */
+ ctx->gc_expiry = now + ((unsigned long)timeout * HZ);
+ /* Sequence number window. Determines the maximum number of
+ * simultaneous requests */
p = simple_get_bytes(p, end, &window_size, sizeof(window_size));
if (IS_ERR(p))
goto err;
@@ -237,9 +241,12 @@ gss_fill_context(const void *p, const void *end, struct gss_cl_ctx *ctx, struct
p = ERR_PTR(ret);
goto err;
}
+ dprintk("RPC: %s Success. gc_expiry %lu now %lu timeout %u\n",
+ __func__, ctx->gc_expiry, now, timeout);
return q;
err:
- dprintk("RPC: gss_fill_context returning %ld\n", -PTR_ERR(p));
+ dprintk("RPC: %s returns %ld gc_expiry %lu now %lu timeout %u\n",
+ __func__, -PTR_ERR(p), ctx->gc_expiry, now, timeout);
return p;
}

--
1.7.7.6


2012-09-12 15:21:04

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [PATCH 0/4] RFC Avoid expired credential keys for buffered writes

T24gV2VkLCAyMDEyLTA5LTEyIGF0IDE1OjEzICswMDAwLCBBZGFtc29uLCBBbmR5IHdyb3RlOg0K
PiBPbiBTZXAgMTAsIDIwMTIsIGF0IDQ6MDggUE0sIEplZmYgTGF5dG9uIHdyb3RlOg0KPiANCj4g
PiBPbiBNb24sIDEwIFNlcCAyMDEyIDE5OjUyOjM1ICswMDAwDQo+ID4gIkFkYW1zb24sIEFuZHki
IDxXaWxsaWFtLkFkYW1zb25AbmV0YXBwLmNvbT4gd3JvdGU6DQo+ID4gDQo+ID4+IA0KPiA+PiBP
biBTZXAgMTAsIDIwMTIsIGF0IDI6NTcgUE0sIEplZmYgTGF5dG9uIHdyb3RlOg0KPiA+PiANCj4g
Pj4+IE9uIFRodSwgIDYgU2VwIDIwMTIgMTU6NTQ6MDcgLTA0MDANCj4gPj4+IGFuZHJvc0BuZXRh
cHAuY29tIHdyb3RlOg0KPiA+Pj4gDQo+ID4+Pj4gRnJvbTogQW5keSBBZGFtc29uIDxhbmRyb3NA
bmV0YXBwLmNvbT4NCj4gPj4+PiANCj4gPj4+PiBUaGlzIHBhdGNoIHNldCBpcyBhIFJlcXVlc3Qg
Zm9yIENvbW1lbnRzOiBJIHRoaW5rIGl0IGRvZXMgYSBmYWlyIGpvYiBvZg0KPiA+Pj4+IGhhbmRs
aW5nIHRoZSBpc3N1ZSAtIGJ1dCBpdCBpcyBjb21wbGljYXRlZCwgYW5kIG90aGVyIGJyYWlucyBt
aWdodCBmaW5kIGENCj4gPj4+PiBiZXR0ZXIgd2F5IHRvIGdldCB0aGUgam9iIGRvbmUuDQo+ID4+
Pj4gDQo+ID4+Pj4gV2UgbXVzdCBhdm9pZCBidWZmZXJpbmcgYSBXUklURSB0aGF0IGlzIHVzaW5n
IGEgY3JlZGVudGlhbCBrZXkgKGUuZy4gYSBHU1MNCj4gPj4+PiBjb250ZXh0IGtleSkgdGhhdCBp
cyBhYm91dCB0byBleHBpcmUgb3IgaGFzIGV4cGlyZWQuICBXZSBjdXJyZW50bHkgd2lsbA0KPiA+
Pj4+IHBhaW50IG91cnNlbHZlcyBpbnRvIGEgY29ybmVyIGJ5IHJldHVybmluZyBzdWNjZXNzIHRv
IHRoZSBhcHBsY2lhdGlvbg0KPiA+Pj4+IGZvciBzdWNoIGEgYnVmZmVyZWQgV1JJVEUsIG9ubHkg
dG8gZGlzY292ZXIgdGhhdCB3ZSBkbyBub3QgaGF2ZSBwZXJtaXNzaW9uIHdoZW4NCj4gPj4+PiB3
ZSBhdHRlbXB0IHRvIGZsdXNoIHRoZSBXUklURSAoYW5kIHBvdGVudGlhbGx5IGFzc29jaWF0ZWQg
Q09NTUlUKSB0byBkaXNrLg0KPiA+Pj4+IFRoaXMgcmVzdWx0cyBpbiBkYXRhIGNvcnJ1cHRpb24u
DQo+ID4+Pj4gDQo+ID4+Pj4gRmlyc3QsIGEgY291cGxlIG9mICJzZXR1cCIgcGF0Y2hlczoNCj4g
Pj4+PiAxKSBQYXRjaCBTVU5SUEMgaGFuZGxlIEVLRVlFWFBJUkVEIGluIGNhbGxfcmVmcmVzaHJl
c3VsdCByZXR1cm5zIEVBQ0NFUyB0byB0aGUNCj4gPj4+PiBhcHBsaWNhdGlvbiBvbiBhbiBleHBp
cmVkIG9yIG5vbi1leGlzdGVudCBnc3MgY29udGV4dCB3aGVuIHRoZSB1c2VycyAoS2VyYmVyb3Mp
DQo+ID4+Pj4gY3JlZGVudGlhbHMgaGF2ZSBhbHNvIGV4cGlyZWQgb3IgYXJlIG5vbi1leGlzdGVu
dC4gQ3VycmVudCBiZWhhdmlvciBpcyB0bw0KPiA+Pj4+IHJldHJ5IHVwY2FsbHMgdG8gR1NTRCBm
b3JldmVyLiBQbGVhc2Ugc2VlIHBhdGNoIGNvbW1lbnQgZm9yIGRldGFpbC4NCj4gPj4+PiANCj4g
Pj4+PiAyKSBQYXRjaCBTVU5SUEMgc2V0IGdzcyBnY19leHBpcnkgdG8gZnVsbCBsaWZldGltZSB3
b3JrcyBpbiBjb25qdW5jdGlvbiB3aXRoDQo+ID4+Pj4gdGhlIGdzc2QgcGF0Y2ggIjAwMDEtR1NT
RC1QYXNzLUdTU19jb250ZXh0LWxpZmV0aW1lLXRvLXRoZS1rZXJuZWwiLiBUaGUNCj4gPj4+PiBn
c3NkIHBhdGNoIHBhc3NlcyB0aGUgYWN0dWFsIHJlbWFpbmluZyBUR1QgbGlmZXRpbWUgaW4gdGhl
IGRvd25jYWxsLCBhbmQNCj4gPj4+PiB0aGlzIGtlcm5lbCBwYXRjaCBzZXRzIHRoZSBnc3MgY29u
dGV4dCBnY19leHBpcnkgdG8gdGhpcyBsaWZldGltZS4NCj4gPj4+PiANCj4gPj4+PiBUaGVuIHRo
ZSB0d28gcGF0Y2hlcyB0aGF0IGF2b2lkIHVzaW5nIGFuIGV4cGlyZWQgY3JlZGVudGlhbCBrZXk6
DQo+ID4+Pj4gMykgUGF0Y2ggU1VOUlBDIG5ldyBycGNfY3JlZG9wcyB0byB0ZXN0IGNyZWRlbnRp
YWwgZXhwaXJ5IGlzIHRoZSBoZWFydCBvZiB0aGlzDQo+ID4+Pj4gd29yay4gSXQgcHJvdmlkZXMg
dGhlIFJQQyBsYXllciBoZWxwZXIgZnVuY3Rpb25zIHRvIGFsbG93IE5GUyB0byBtYW5hZ2UNCj4g
Pj4+PiBkYXRhIGluIHRoZSBmYWNlIG9mIGV4cGlyZWQgY3JlZGVudGlhbHMNCj4gPj4+PiANCj4g
Pj4+PiA0KSBQYXRjaCBORlMgYXZvaWQgZXhwaXJlZCBjcmVkZW50aWFsIGtleXMgZm9yIGJ1ZmZl
cmVkIHdyaXRlcyBjYWxscyB0aGUNCj4gPj4+PiBSUEMgaGVscGVyIGZ1bmN0aW9ucy4NCj4gPj4+
PiANCj4gPj4+PiBQYWdlcyBmb3IgYnVmZmVyZWQgV1JJVEVzIGFyZSBhbGxvY2F0ZWQgaW4gbmZz
X3dyaXRlX2JlZ2luIHdoZXJlIHdlIGhhdmUgYW4NCj4gPj4+PiBuZnNfb3Blbl9jb250ZXh0IGFu
ZCBhc3NvY2lhdGVkIHJwY19jcmVkLiBUaGlzIGlzIGEgZ2VuZXJpYyBycGNfY3JlZCwgTk9UDQo+
ID4+Pj4gdGhlIGdzc19jcmVkIHVzZWQgaW4gdGhlIGFjdHVhbCBXUklURSBSUEMuIEVhY2ggV1JJ
VEUgUlBDIGNhbGwgdGFrZXMgdGhlIGdlbmVyaWMNCj4gPj4+PiBycGNfY3JlZCAob3IgdXNlcyB0
aGUgJ2N1cnJlbnRfY3JlZCcpIHVpZCBhbmQgdXNlcyBpdCB0byBsb29rdXAgdGhlIGFzc29jaWF0
ZWQNCj4gPj4+PiBnc3NfY3JlZCBhbmQgZ3NzX2NvbnRleHQgaW4gdGhlIGNhbGxfcmVmcmVzaCBS
UEMgc3RhdGUuIFNvLCB0aGVyZSBpcyBhDQo+ID4+Pj4gb25lLXRvLW9uZSBhc3NvY2lhdGlvbiBi
ZXR3ZWVuIHRoZSBuZnNfb3Blbl9jb250ZXh0IGdlbmVyaWNfY3JlZCBhbmQgYQ0KPiA+Pj4+IGdz
c19jcmVkIHdpdGggYSBtYXRjaGluZyB1aWQgYW5kIGEgdmFsaWQgbm9uIGV4cGlyZWQgZ3NzIGNv
bnRleHQuDQo+ID4+Pj4gDQo+ID4+Pj4gV2UgbmVlZCB0byBjaGVjayB0aGUgbmZzX29wZW5fY29u
dGV4dCBnZW5lcmljIGNyZWQgJ3VuZGVybHlpbmcnIGdzc19jcmVkDQo+ID4+Pj4gZ3NzX2NvbnRl
eHQgZ2NfZXhwaXJ5IGluIG5mc193cml0ZV9iZWdpbiB0byBkZXRlcm1pbmUgaWYgdGhlcmUgaXMg
ZW5vdWdoIHRpbWUNCj4gPj4+PiBsZWZ0IGluIHRoZSBnc3NfY29udGV4dCBsaWZldGltZSB0byBj
b21wbGV0ZSB0aGUgYnVmZmVyZWQgV1JJVEUuDQo+ID4+Pj4gDQo+ID4+Pj4gSSBzdGFydGVkIGJ5
IGFkZGluZyBhICJrZXlfdGltZW91dCIgcnBjX2F1dGhvcHMgcm91dGluZSBvbmx5IHNldCBieSB0
aGUgZ2VuZXJpYw0KPiA+Pj4+IGF1dGggdG8gZG8gdGhpcyB3b3JrLCBjYWxsZWQgYnkgcnBjYXV0
aF9rZXlfdGltZW91dF9ub3RpZnksIGNhbGxlZCBmcm9tDQo+ID4+Pj4gbmZzX3dyaXRlX2JlZ2lu
LiBJdCBkb2VzIHRoZSBsb29rdXAgb2YgdGhlIGdzc19jcmVkIChzZWUgdGhlIHBhdGNoIGZvcg0K
PiA+Pj4+IGZhc3QtdHJhY2tpbmcgb2Ygbm9uLWdzcyB1bmRlcmx5aW5nIGNyZWRzKSBhbmQgdGhl
biB0ZXN0cyB0aGUgZ3NzX2NvbnRleHQNCj4gPj4+PiBnY19leHBpcnkgYWdhaW5zdCB0aW1lb3V0
cyBieSBjYWxsaW5nIGEgbmV3IGNya2V5X3RpbWVvdXQgcnBjX2NyZWRvcHMgc2V0DQo+ID4+Pj4g
b25seSBmb3IgdGhlIGdzc19jcmVkLg0KPiA+Pj4+IA0KPiA+Pj4+IEkgY2FtZSB1cCB3aXRoIHRo
ZSBpZGVhIG9mIHR3byBjcmVkZW50aWFsIGtleSB0aW1lb3V0IHdhdGVybWFya3M6DQo+ID4+Pj4g
MSkgQSBoaWdoIHdhdGVybWFyaywgUlBDX0tFWV9FWFBJUkVfVElNRU8gc2V0IHRvIDkwIHNlY29u
ZHMuDQo+ID4+Pj4gMikgQSBsb3cgd2F0ZXJtYXJrLCBSUENfS0VZX0VYUElSRV9GQUlMIHNldCB0
byAzMCBzZWNvbmRzLg0KPiA+Pj4+IE5PVEU6IHRoZXNlIHRpbWVvdXRzIGFyZSBndWVzc2VzIHRo
YXQgd29yayBpbiBhIFZNIGVudmlyb25tZW50LiBXZSBtYXkgd2FudCB0bw0KPiA+Pj4+IG1ha2Ug
dGhlbSBhZGp1c3RhYmxlIHZpYSBtb2R1bGUgcGFyYW1ldGVycy4NCj4gPj4+PiANCj4gPj4+PiBJ
ZiBrZXlfdGltZW91dCBpcyBjYWxsZWQgb24gYSBjcmVkZW50aWFsIHdpdGggYW4gdW5kZXJseWlu
ZyBjcmVkZW50aWFsIGtleSB0aGF0DQo+ID4+Pj4gd2lsbCBleHBpcmUgd2l0aGluIGhpZ2ggd2F0
ZXJtYXJrIHNlY29uZHMsIHdlIHNldCB0aGUgUlBDX0NSRURfS0VZX0VYUElSRV9TT09ODQo+ID4+
Pj4gZmxhZyBpbiB0aGUgZ2VuZXJpY19jcmVkIGFjcmVkIHNvIHRoYXQgdGhlIE5GUyBsYXllciBj
YW4gY2xlYW4gdXAgcHJpb3IgdG8NCj4gPj4+PiBrZXkgZXhwaXJhdGlvbiBJdCBkb2VzIHRoaXMg
YnkgY2FsbGluZyBhIG5ldyBjcmtleV90b19leHBpcmUgcnBjX2NyZWRvcCBzZXQNCj4gPj4+PiBv
bmx5IGZvciBnZW5lcmljIGNyZWRzIHRoYXQgdGVzdHMgZm9yIHRoZSBSUENfQ1JFRF9LRVlfRVhQ
SVJFX1NPT04gZmxhZy4NCj4gPj4+PiANCj4gPj4+PiBJZiB0aGUgUlBDX0NSRURfS0VZX0VYUElS
RV9TT09OIGZsYWcgaXMgc2V0IGluIHRoZSBuZnNfb3Blbl9jb250ZXh0IGdlbmVyaWMNCj4gPj4+
PiBjcmVkICh0aGUgYWNyZWQgcG9ydGlvbiksIHRoZW4gbmZzX3dyaXRlX2VuZCB3aWxsIGNhbGwg
bmZzX3diX2FsbA0KPiA+Pj4+IG9uIEVWRVJZIFdSSVRFIENBTEwgaXQgc2VlcywgYW5kIG5mc193
cml0ZV9ycGNzZXR1cCB3aWxsIHNlbmQgTkZTX0ZJTEVfU1lOQw0KPiA+Pj4+IFdSSVRFcy4gIFRo
ZSBpZGVhIG9mIHRoZSBoaWdoIHdhdGVybWFyayBpcyB0byBnaXZlIHRpbWUgdG8gZmx1c2ggYWxs
IGJ1ZmZlcmVkDQo+ID4+Pj4gV1JJVEVzIGFuZCBDT01NSVRzLCBhcyB3ZWxsIGFzIHRvIGNvbnRp
bnVlIHRvIFdSSVRFLCBidXQgb25seSB3aXRoDQo+ID4+Pj4gTkZTX0ZJTEVfU1lOQywgYWxsb3dp
bmcgdGhlIGFwcGxpY2F0aW9uIHRvIHRyeSB0byBmaW5pc2ggd3JpdGluZyBiZWZvcmUgdGhlDQo+
ID4+Pj4gZ3NzIGNvbnRleHQgZXhwaXJlcy4gTk9URSB0aGF0IHRoaXMgbWVhbnMgRUFDSCBXUklU
RSB3aXRoaW4gdGhlIGhpZ2ggd2F0ZXJtYXJrDQo+ID4+Pj4gdGltZW91dCBpcyBhIHNpbmdlIFBB
R0Ugb2YgTkZTX0ZJTEVfU1lOQy4gSSB0aGluayB0aGlzIGlzIGZpbmUsIGJlY2F1c2Ugd2UgYXJl
DQo+ID4+Pj4gaW4gYSBmYWlsdXJlIG1vZGUgLSB0aGUgbW9zdCBpbXBvcnRhbnQgdGhpbmcgaXMg
dG8gTk9UIGZhaWwgYSBidWZmZXJlZCBXUklURS4uDQo+ID4+Pj4gDQo+ID4+Pj4gSWYga2V5X3Rp
bWVvdXQgaXMgY2FsbGVkIG9uIGEgY3JlZGVudGlhbCBrZXkgdGhhdCB3aWxsIGV4cGlyZSB3aXRo
aW4gbG93DQo+ID4+Pj4gd2F0ZXJtYXJrIHNlY29uZHMsIHdlIHJldHVybiBFQUNDRVMgZnJvbSBu
ZnNfd3JpdGVfYmVnaW4gdG8gc3RvcA0KPiA+Pj4+IHRoZSBidWZmZXJlZCBXUklURSB1c2luZyB0
aGUgY3JlZGVudGlhbCBrZXkuDQo+ID4+Pj4gDQo+ID4+Pj4gQ2hlY2tpbmcgYSBnZW5lcmljIGNy
ZWRlbnRpYWwncyB1bmRlcmx5aW5nIGNyZWRlbnRpYWwgaW52b2x2ZXMgYSBjcmVkIGxvb2t1cC4N
Cj4gPj4+PiBUbyBhdm9pZCB0aGlzIGxvb2t1cCBpbiB0aGUgbm9ybWFsIGNhc2Ugd2hlbiB0aGUg
dW5kZXJseWluZyBjcmVkZW50aWFsIGhhcw0KPiA+Pj4+IGEga2V5IHRoYXQgaXMgdmFsaWQgKGJl
Zm9yZSB0aGUgaGlnaCB3YXRlcm1hcmspLCBhIG5vdGlmeSBmbGFnIGlzIHNldCBpbg0KPiA+Pj4+
IHRoZSBnZW5lcmljIGNyZWRlbnRpYWwgdGhlIGZpcnN0IHRpbWUgdGhlIGtleV90aW1lb3V0IGlz
IGNhbGxlZC4gVGhlDQo+ID4+Pj4gZ2VuZXJpYyBjcmVkZW50aWFsIHRoZW4gc3RvcHMgY2hlY2tp
bmcgdGhlIHVuZGVybHlpbmcgY3JlZGVudGlhbCBrZXkgZXhwaXJ5LCBhbmQNCj4gPj4+PiB0aGUg
dW5kZXJseWluZyBjcmVkZW50aWFsIChnc3NfY3JlZCkgbWF0Y2ggcm91dGluZSB0aGVuIGNoZWNr
cyB0aGUga2V5DQo+ID4+Pj4gZXhwaXJhdGlvbiB1cG9uIGVhY2ggbm9ybWFsIHVzZSBhbmQgc2V0
cyBhIGZsYWcgaW4gdGhlIGFzc29jaWF0ZWQgZ2VuZXJpYw0KPiA+Pj4+IGNyZWRlbnRpYWwgb25s
eSB3aGVuIHRoZSBrZXkgZXhwaXJhdGlvbiBpcyB3aXRoaW4gdGhlIGhpZ2ggd2F0ZXJtYXJrLg0K
PiA+Pj4+IFRoaXMgaW4gdHVybiBzaWduYWxzIHRoZSBnZW5lcmljIGNyZWRlbnRpYWwga2V5X3Rp
bWVvdXQgdG8gcGVyZm9ybSB0aGUgZXh0cmENCj4gPj4+PiBjcmVkZXRpYWwgbG9va3VwIHRoZXJl
YWZ0ZXIuDQo+ID4+Pj4gDQo+ID4+Pj4gVElORzoNCj4gPj4+PiANCj4gPj4+PiBJIGhhdmUgdGVz
dGVkIHRoZXNlIHBhdGNoZXMgd2l0aCBhIFRHVCBvZiA1IG1pbnV0ZXMsIGFuZCBhIG1vZGlmaWVk
DQo+ID4+Pj4gQ29ubmVjdGF0aG9uIHNwZWNpYWwgdGVzdDogYmlnZmlsZS5jIHRoYXQgb25seSB3
cml0ZXMgYW5kIG5ldmVyIGZsdXNoZXMuDQo+ID4+Pj4gSSBhbHNvIGluY3JlYXNlZCB0aGUgZmls
ZSBzaXplIHRvIDUyNDI4ODAwMCBieXRlcyB0byBnaXZlIG1lIHRpbWUuDQo+ID4+Pj4gSSBuYW1l
ZCB0aGUgdGVzdCBidWZmZXJlZC13cml0ZS4NCj4gPj4+PiANCj4gPj4+PiBJIGtpbml0LCBhbmQg
cnVuIDYgaW5zdGFuY2VzIG9mIGJ1ZmZlcmVkIHdyaXRlIHdpdGggdGhlIGZpcnN0IGFuZCBtYXli
ZSBzZWNvbmQNCj4gPj4+PiB0ZXN0IGNvbXBsZXRpbmcgcHJpb3IgdG8gVEdUIGV4cGlyYXRpb24s
IGFuZCB0aGUgdGhpcmQvZm91cnRoIHRlc3RzIHNwYW5uaW5nDQo+ID4+Pj4gdGhlIGhpZ2ggd2F0
ZXIgbWFyay4gVGhlIGZpZnRoIHRlc3Qgc3RhcnRzIHdpdGhpbiB0aGUgaGlnaCB3YXRlciBtYXJr
LCBhbmQgdGhlDQo+ID4+Pj4gc2l4dGggdGVzdCBzdGFydHMgd2l0aGluIHRoZSBsb3cgd2F0ZXIg
bWFyay4NCj4gPj4+PiANCj4gPj4+PiBJIGhhdmUgc2VlbiB0aGUgZXhwZWN0ZWQgYmVoYXZpb3I6
IHRlc3QgMSwyIHN1Y2NlZWQuIFRlc3RzIDMtNiBmYWlsIHdpdGgNCj4gPj4+PiBQZXJtaXNzaW9u
IERlbmllZC4gVGVzdHMgMyBhbmQgNCBzdGFydCB3aXRoIG5vcm1hbCBidWZmZXJlZCBVTlNUQUJM
RSB3cml0ZXMNCj4gPj4+PiB0aGVuIHN3aXRjaCB0byBzaW5nbGUgcGFnZSBORlNfRklMRV9TWU5D
IHdyaXRlcyB3aXRoIGEgQ09NTUlUIGZvciB0aGUNCj4gPj4+PiBub3JtYWwgYnVmZmVyZWQgd3Jp
dGVzLiBUZXN0IDUgaGFzIG9ubHkgc2luZ2xlIHBhZ2UgTkZTX0ZJTEVfU1lOQyB3cml0ZXMsDQo+
ID4+Pj4gdGVzdCA2IHNpbXBseSBmYWlscy4NCj4gPj4+PiANCj4gPj4+PiBOb25lIG9mIHRoZSBX
UklURXMgZmFpbC4NCj4gPj4+PiANCj4gPj4+PiBJU1NVRVMNCj4gPj4+PiAxKSBPbmNlIGluIGEg
d2hpbGUgSSBzZWUgInNob3J0IHdyaXRlIiBpbnN0ZWFkIG9mICJQZXJtaXNzaW9uDQo+ID4+Pj4g
ZGVuaWVkIi4gSSBjYW5ub3QgcmVwcm9kdWNlIHRoaXMuIEV2ZW4gdGhlbiB3aXJlc2hhcmsgc2hv
d3MgYWxsIHdyaXRlcyBzdWNjZWVkaW5nLg0KPiA+Pj4+IA0KPiA+Pj4+IDIpIEFsdGhvdWdoIEkg
aGF2ZSB0ZXN0ZWQgcmUtZXN0YWJsaXNoaW5nIEtlcmJlcm9zIGNyZWRlbnRpYWxzIGFmdGVyIHRo
ZQ0KPiA+Pj4+IHRlc3QgcnVuLCBJIGhhdmUgbm90IGRvbmUgYW55IHRlc3RpbmcgcmUtZXN0YWJp
c2hpbmcgS2VyYmVyb3MgY3JlZGVudGlhbHMgDQo+ID4+Pj4gZHVyaW5nIHRoZSBhIHRlc3QgcnVu
IHRoYXQgaXMgd2l0aGluIHRoZSBoaWdoIHdhdGVybWFyay4NCj4gPj4+PiANCj4gPj4+PiAzKSBH
ZW5lcmFsIGNvcm5lciBjYXNlIGFuZCBvdGhlciB0ZXN0aW5nDQo+ID4+Pj4gDQo+ID4+Pj4gLS0+
QW5keQ0KPiA+Pj4+IA0KPiA+Pj4+IEFuZHkgQWRhbXNvbiAoNCk6DQo+ID4+Pj4gU1VOUlBDIGhh
bmRsZSBFS0VZRVhQSVJFRCBpbiBjYWxsX3JlZnJlc2hyZXN1bHQNCj4gPj4+PiBTVU5SUEMgc2V0
IGdzcyBnY19leHBpcnkgdG8gZnVsbCBsaWZldGltZQ0KPiA+Pj4+IFNVTlJQQyBuZXcgcnBjX2Ny
ZWRvcHMgdG8gdGVzdCBjcmVkZW50aWFsIGV4cGlyeQ0KPiA+Pj4+IE5GUyBhdm9pZCBleHBpcmVk
IGNyZWRlbnRpYWwga2V5cyBmb3IgYnVmZmVyZWQgd3JpdGVzDQo+ID4+Pj4gDQo+ID4+Pj4gZnMv
bmZzL2ZpbGUuYyAgICAgICAgICAgICAgICAgIHwgICAxMCArKysrDQo+ID4+Pj4gZnMvbmZzL2lu
dGVybmFsLmggICAgICAgICAgICAgIHwgICAgMiArDQo+ID4+Pj4gZnMvbmZzL25mczNwcm9jLmMg
ICAgICAgICAgICAgIHwgICAgNiArLQ0KPiA+Pj4+IGZzL25mcy9uZnM0ZmlsZWxheW91dC5jICAg
ICAgICB8ICAgIDEgLQ0KPiA+Pj4+IGZzL25mcy9uZnM0cHJvYy5jICAgICAgICAgICAgICB8ICAg
MTggLS0tLS0tLS0NCj4gPj4+PiBmcy9uZnMvbmZzNHN0YXRlLmMgICAgICAgICAgICAgfCAgIDIy
IC0tLS0tLS0tLQ0KPiA+Pj4+IGZzL25mcy9wcm9jLmMgICAgICAgICAgICAgICAgICB8ICAgNDMg
LS0tLS0tLS0tLS0tLS0tLS0tDQo+ID4+Pj4gZnMvbmZzL3dyaXRlLmMgICAgICAgICAgICAgICAg
IHwgICAyOSArKysrKysrKysrKysNCj4gPj4+PiBpbmNsdWRlL2xpbnV4L3N1bnJwYy9hdXRoLmgg
ICAgfCAgIDI3ICsrKysrKysrKysrKw0KPiA+Pj4+IG5ldC9zdW5ycGMvYXV0aC5jICAgICAgICAg
ICAgICB8ICAgMjEgKysrKysrKysrDQo+ID4+Pj4gbmV0L3N1bnJwYy9hdXRoX2dlbmVyaWMuYyAg
ICAgIHwgICA5MyArKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrDQo+ID4+
Pj4gbmV0L3N1bnJwYy9hdXRoX2dzcy9hdXRoX2dzcy5jIHwgICA1MSArKysrKysrKysrKysrKysr
KysrLS0tDQo+ID4+Pj4gbmV0L3N1bnJwYy9jbG50LmMgICAgICAgICAgICAgIHwgICAgMSArDQo+
ID4+Pj4gMTMgZmlsZXMgY2hhbmdlZCwgMjMxIGluc2VydGlvbnMoKyksIDkzIGRlbGV0aW9ucygt
KQ0KPiA+Pj4+IA0KPiA+Pj4gDQo+ID4+PiBJbiBnZW5lcmFsLCBJIGxpa2UgdGhpcyBzZXQuLi4N
Cj4gPj4+IA0KPiA+Pj4gTXkgb25seSBjb25jZXJuIGlzIHRoaXMgYmVoYXZpb3IgZGVzY3JpYmVk
IGluIHRoZSBsYXN0IHBhdGNoOg0KPiA+Pj4gDQo+ID4+PiAiSWYgdGhlIGJ1ZmZlcmVkIFdSSVRF
IGlzIHVzaW5nIGEgY3JlZGVudGlhbCBrZXkgdGhhdCB3aWxsIGV4cGlyZQ0KPiA+Pj4gd2l0aGlu
IGxvdyB3YXRlcm1hcmsgc2Vjb25kcywgZmFpbCB0aGUgV1JJVEUgaW4gbmZzX3dyaXRlX2JlZ2lu
DQo+ID4+PiBfYmVmb3JlXyB0aGUgV1JJVEUgaXMgYnVmZmVyZWQgYW5kIHJldHVybiAtRUFDQ0VT
IHRvIHRoZSBhcHBsaWNhdGlvbi4iDQo+ID4+PiANCj4gPj4+IFNob3VsZG4ndCByZWplY3Rpbmcg
dGhlIHdyaXRlIGF0dGVtcHQgYmUgdGhlIHB1cnZpZXcgb2YgdGhlIHNlcnZlcj8NCj4gPj4gDQo+
ID4+IFRoZSBUR1QgbGlmZXRpbWUgaXMgc2hhcmVkIGJ5IHRoZSBjbGllbnQgYW5kIHRoZSBzZXJ2
ZXIgLSBpdCdzIHRoZSBzYW1lLCBzbyBJIGRvbid0IHRoaW5rIHdlJ2QgZ2FpbiBhbnl0aGluZyBi
eSBkb2luZyB0aGF0Lg0KPiA+PiANCj4gPj4+IA0KPiA+Pj4gSXQgc2VlbXMgdG8gbWUgdGhhdCB3
ZSdkIGJlIGJlc3Qgb2ZmIGp1c3Qgc3dpdGNoaW5nIHRvIHN5bmNocm9ub3VzDQo+ID4+PiB3cml0
ZXMgd2hlbiB3ZSBzdGFydCBhcHByb2FjaGluZyBjcmVkZW50aWFsIGV4cGlyYXRpb24sIGFuZCBs
ZXR0aW5nIHRoZQ0KPiA+Pj4gc2VydmVyIGhhbmRsZSB0aGUgY2FzZSB3aGVyZSB0aGUgY3JlZGVu
dGlhbCBleHBpcmVzLg0KPiA+PiANCj4gPj4gWW91IG1lYW4gc3dpdGNoaW5nIHRvIGRpcmVjdCBJ
Tz8gSSBiZWxpZXZlIHRoYXQgbWVhbnMgdGFraW5nIGEgZGlmZmVyZW50IHBhdGggaW4gdGhlIFZG
UyBsYXllciB3aGljaCBJIGZvdW5kIGRpZmZpY3VsdCB0byBkbyBmcm9tIG5mc193cml0ZV9iZWdp
bi4gQnV0IG1heWJlIEkganVzdCBkaWRuJ3Qgc2VlIGEgZ29vZCB3YXkgdG8gZG8gdGhpcy4NCj4g
Pj4gDQo+ID4+IA0KPiA+Pj4gSXQgc2VlbXMgdG8gbWUgdGhhdCB0aGUgY2xpZW50IHNob3VsZCBq
dXN0IGJlIGluIHRoZSBidXNpbmVzcyBvZg0KPiA+Pj4gcmVjb2duaXppbmcgd2hlbiB0aGUgY3Jl
ZGVudGlhbCBtaWdodCBiZSBhYm91dCB0byBleHBpcmUgYW5kIGF0dGVtcHQgdG8NCj4gPj4+IGVu
c3VyZSB0aGF0IHdlIGRvbid0IGVuZCB1cCB3aXRoIGEgYnVuY2ggb2YgYnVmZmVyZWQgZGF0YSBp
biB0aGF0IGNhc2UuDQo+ID4+IA0KPiA+PiBZZXMuDQo+ID4+IA0KPiA+PiAtLT5BbmR5DQo+ID4+
IA0KPiA+Pj4gDQo+ID4+PiAtLSANCj4gPj4+IEplZmYgTGF5dG9uIDxqbGF5dG9uQHJlZGhhdC5j
b20+DQo+ID4+IA0KPiA+IA0KPiA+IA0KPiA+IEkgbWVhbnQgdGhhdCBJIGRpZG4ndCByZWFsbHkg
c2VlIHRoZSBwb2ludCBvZiB0aGUgdHdvIGRpZmZlcmVudCBzdGF0ZXMNCj4gPiB3aGVuIHdlJ3Jl
IGFwcHJvYWNoaW5nIHRpY2tldCBleHBpcnkuDQo+ID4gDQo+ID4gSUlVQywgYXQgdGhlICJoaWdo
IHdhdGVybWFyayIsIHlvdSdsbCBzd2l0Y2ggdG8gc3luY2hyb25vdXMgd3JpdGVzLCBhbmQNCj4g
PiBwcmVzdW1hYmx5IHdpbGwgc3RhcnQgc3luY2luZyBvdXQgYW55IGJ1ZmZlcmVkIGRhdGEuIEF0
IHRoYXQgcG9pbnQsDQo+ID4gd2hhdCBhcmUgeW91IGJ1eWluZyBieSByZWplY3Rpbmcgd3JpdGVz
IGFsdG9nZXRoZXIgd2l0aCBFQUNDRVMgYXQgdGhlDQo+ID4gbG93IHdhdGVybWFyaz8NCj4gPiAN
Cj4gPiBJdCBzZWVtcyB0byBtZSB0aGF0IHRoZSBoaWdoIHdhdGVybWFyayBzaG91bGQgYmUgYWxs
IHlvdSBuZWVkIHRvIGRlYWwNCj4gPiB3aXRoIGFzIGEgY2xpZW50LiBPbmNlIHlvdSdyZSBwYXN0
IHRoYXQgcG9pbnQgeW91IGNhbiBqdXN0IGxldCB0aGUNCj4gPiBjbGllbnQgYXR0ZW1wdCB0aGUg
d3JpdGVzLiBJZiB0aGV5IGZhaWwsIHRoZW4gbm8gYmlnIGRlYWwgLS0geW91IHdlcmUNCj4gPiBk
b2luZyBzeW5jaHJvbm91cyB3cml0ZXMgYWxyZWFkeSBzbyB5b3UndmUgYWxyZWFkeSBhdm9pZGVk
IGhhdmluZyBhDQo+ID4gYnVuY2ggb2YgZGF0YSBidWZmZXJlZCB1cC4NCj4gDQo+IEhpIEplZmYN
Cj4gDQo+IEFmdGVyIGRvaW5nIG1vcmUgdGVzdCB2ZXJpZmljYXRpb24sIGhlcmUgYXJlIHRoZSBy
ZWFzb25zIGZvciB0aGUgbG93IHdhdGVybWFyay4gUmVhc29uICMyIGlzIHRoZSBzdHJvbmdlc3Qg
anVzdGlmaWNhdGlvbi4NCj4gDQo+IDEpIEl0J3MgYSBiYWQgaWRlYSB0byAianVzdCBsZXQgdGhl
IGNsaWVudCBhdHRlbXB0IHRoZSB3cml0ZXMiLg0KPiANCj4gVGhlIGNsaWVudCBuZWVkcyBhIHZh
bGlkIEdTUyBjb250ZXh0IHRvIHNlbmQgYW4gUlBDU0VDX0dTUyBSUEMgYmVjYXVzZSBvZiB0aGUg
Y2hlY2tzdW0gb2YgdGhlIFJQQyBoZWFkZXIgKHVwIHRvIGFuZCBpbmNsdWRpbmcgdGhlIGNyZWRl
bnRpYWwpIHRoYXQgaXMgY29tcHV0ZWQgdXNpbmcgdGhlIEdTU19HZXRNSUMoKSBjYWxsIHdpdGgg
ZW5jdHlwZSBhbmQgdGhlIFFPUCBmcm9tIHRoZSBHU1MgY29udGV4dC4NCj4gDQo+IFBsdXMsIHRo
ZSBjbGllbnQgYW5kIHRoZSBzZXJ2ZXIgc2VlIHRoZSBzYW1lIFRHVCBsaWZldGltZSwgYW5kIHRo
dXMgdGhlIHNhbWUgR1NTIGNvbnRleHQgZXhwaXJhdGlvbi4NCj4gDQo+IFRoZXJlZm9yZSwgdGhl
IG9ubHkgdGltZSB0aGUgY2xpZW50IGNhbiBzZW5kIGFuIFJQQ1NFQ19HU1MgS2VyYmVyb3MgR1NT
IHNlY3VyaXR5IG1lY2hhbmlzbSBSUEMgdG8gdGhlIHNlcnZlciBhbmQgaGF2ZSBpdCByZWplY3Qg
dGhlIFJQQyB3aXRoIC1FQUNDRVMgKG9yIHRoZSBSUEMgbGV2ZWwgUlBDU0VDX0dTU19DVFhQUk9C
TEVNKSBpcyBpbiB0aGUgY2FzZSB3aGVyZSB0aGUgR1NTIGNvbnRleHQgZXhwaXJlcyB3aGlsZSB0
aGUgUlBDIGlzIGluIGZsaWdodCBvciBwcmlvciB0byB0aGUgUlBDU0VDX0dTUyBwcm9jZXNzaW5n
IGJ5IHRoZSBzZXJ2ZXIuICA5OS45OTklIG9mIHRoZSB0aW1lLCB0aGUgY2xpZW50IGRldGVybWlu
ZXMgd2hldGhlciBvciBub3QgdGhlIFJQQ1NFQ19HU1MgIGNhbGwgZmFpbHMgcHJpb3IgdG8gcHV0
dGluZyBhbiBSUEMgb24gdGhlIHdpcmUgYmFzZWQgb24gaXQncyBHU1MgY29udGV4dCBleHBpcnku
ICBTbyBpdCBpcyBub3QgYW4gb3B0aW9uIHRvIHdhaXQgZm9yIHRoZSBzZXJ2ZXIgdG8gcmVqZWN0
IHRoZSBSUEMuDQoNCk5BQ0suIFRoYXQncyBub3QgYW4gYWNjZXB0YWJsZSBzdHJhdGVneS4NCg0K
TmVpdGhlciB0aGUgTkZTIGNsaWVudCBub3IgdGhlIFJQQyBjbGllbnQga25vd3Mgd2hldGhlciBv
ciBub3QgdGhlIFRHVA0KaGFzIGJlZW4gcmVuZXdlZCB1bnRpbCBpdCB0cmllcyB0byBlc3RhYmxp
c2ggYSBuZXcgUlBDU0VDX0dTUyBzZXNzaW9uLg0KDQpJZiB3ZSBkb24ndCBhdHRlbXB0IHRoZSBX
UklURSwgdGhlbiB3ZSBkb24ndCBrbm93IGlmIGl0IHdpbGwgc3VjY2VlZCBvcg0KZmFpbC4NCg0K
PiAyKSBJdCdzIGEgcmVhbGx5IHJlYWxseSBnb29kIGlkZWEgdG8gZ2l2ZSB0aGUgY2xpZW50IHRp
bWUgdG8gY2xlYW4gdXAgYnkgc2VuZGluZyBhIENMT1NFIHRvIGxldCB0aGUgc2VydmVyIGtub3cg
d2hhdCBpcyBnb2luZyBvbi4gTm90ZSBmb3IgcE5GUyB0aGUgQ0xPU0UgYWxzbyBjYXJyaWVzIGFu
IGltcGxpY2l0IExBWU9VVFJFVFVSTiBmb3IgdGhlIHJldHVybl9vbl9jbG9zZSBjYXNlLiAgV2l0
aG91dCB0aGUgQ0xPU0UsIHRoZSBzZXJ2ZXIgc2ltcGx5IHNlZXMgV1JJVEVzIHN0b3AuICBCVFcg
dGhpcyBpcyBhIGdvb2QgcmVhc29uIHRvIGRvIHRoZSBzYW1lIGZvciBSRUFELg0KDQpOby4gVGhp
cyB0b28gaXMgYSBiYWQgaWRlYS4gWW91J3JlIHByZS1lbXB0aW5nIGFueSBhY3Rpb24gdGhhdCB0
aGUNCmFwcGxpY2F0aW9uIG9yIHVzZXIgbWF5IHRha2UuDQoNCj4gMykgRXh0cmEgdGltZSB0byBm
bHVzaCBidWZmZXJlZCBkYXRhLiBXaXRoIG15IHBhdGNoIHNldCwgZGF0YSBpcyBhbGxvd2VkIHRv
IGJlIGJ1ZmZlcmVkLCBidXQgdGhlbiBmbHVzaGVkIHByaW9yIHRvIG5mc193cml0ZV9lbmQncyBy
ZXR1cm4uIElJVUMsIGEgKG11bHRpdGhyZWFkZWQpIGFwcGxpY2F0aW9uIGNhbiBiZSB3cml0aW5n
IGEgdmVyeSBsYXJnZSBhbW91bnQgb2YgZGF0YSwgd2hpY2ggZHVyaW5nIHRoZSBoaWdoIHdhdGVy
IG1hcmsgaXMgc2VudCAxIFBBR0UgYXQgYSB0aW1lIE5GU19GSUxFX1NZTkMuICBUaGlzIHRha2Vz
IHNpZ25pZmljYW50bHkgbG9uZ2VyIHRoYW4gY29hbGVzY2VkIGJ1ZmZlcmVkIFVOU1RBQkxFIFdS
SVRFcyBhbmQgbWFueSBtb3JlIFJQQ3MgKG5vdCBjb250aW51YXRpb25zKS4gICBUaGUgcHVycG9z
ZSBvZiB0aGlzIGNvZGUgaXMgdG8gdG90YWxseSBkcmFpbiB0aGUgYnVmZmVyZWQgd3JpdGVzLiBU
aGUgbG93IHdhdGVyIG1hcmsgcHJvdmlkZXMgYSBwZXJpb2Qgb2Ygbm8gbmV3IFdSSVRFcyBnaXZp
bmcgbW9yZSBSUEMgcmVzb3VyY2VzIHRvIHRoZSBleGl0aW5nIHNpbmdsZSBQQUdFIGRyYWluaW5n
LiAgSSBzdXNwZWN0IHRoYXQgd2Ugd2lsbCBuZWVkIHRvIG1ha2UgdGhlIHdhdGVybWFyayhzKSB0
dW5hYmxlIHZpYSBtb2R1bGUgcGFyYW1ldGVycyBmb3IgZGlmZmVyZW50IHdvcmtsb2Fkcy4NCg0K
Rmx1c2hpbmcgdGhlIGJ1ZmZlciBpZiB3ZSB0aGluayB0aGF0IHRoZSBHU1Mgc2Vzc2lvbiBpcyBh
Ym91dCB0byBmYWlsDQpfaXNfIGEgZ29vZCBzdHJhdGVneS4gSSdtIGZpbmUgd2l0aCB0aGlzIHBh
cnQgb2YgdGhlIHBhdGNoZXMuDQoNCi0tIA0KVHJvbmQgTXlrbGVidXN0DQpMaW51eCBORlMgY2xp
ZW50IG1haW50YWluZXINCg0KTmV0QXBwDQpUcm9uZC5NeWtsZWJ1c3RAbmV0YXBwLmNvbQ0Kd3d3
Lm5ldGFwcC5jb20NCg==

2012-09-12 16:14:42

by Adamson, Andy

[permalink] [raw]
Subject: Re: [PATCH 0/4] RFC Avoid expired credential keys for buffered writes


On Sep 12, 2012, at 11:21 AM, Myklebust, Trond wrote:

> On Wed, 2012-09-12 at 15:13 +0000, Adamson, Andy wrote:
>> On Sep 10, 2012, at 4:08 PM, Jeff Layton wrote:
>>
>>> On Mon, 10 Sep 2012 19:52:35 +0000
>>> "Adamson, Andy" <[email protected]> wrote:
>>>
>>>>
>>>> On Sep 10, 2012, at 2:57 PM, Jeff Layton wrote:
>>>>
>>>>> On Thu, 6 Sep 2012 15:54:07 -0400
>>>>> [email protected] wrote:
>>>>>
>>>>>> From: Andy Adamson <[email protected]>
>>>>>>
>>>>>> This patch set is a Request for Comments: I think it does a fair job of
>>>>>> handling the issue - but it is complicated, and other brains might find a
>>>>>> better way to get the job done.
>>>>>>
>>>>>> We must avoid buffering a WRITE that is using a credential key (e.g. a GSS
>>>>>> context key) that is about to expire or has expired. We currently will
>>>>>> paint ourselves into a corner by returning success to the applciation
>>>>>> for such a buffered WRITE, only to discover that we do not have permission when
>>>>>> we attempt to flush the WRITE (and potentially associated COMMIT) to disk.
>>>>>> This results in data corruption.
>>>>>>
>>>>>> First, a couple of "setup" patches:
>>>>>> 1) Patch SUNRPC handle EKEYEXPIRED in call_refreshresult returns EACCES to the
>>>>>> application on an expired or non-existent gss context when the users (Kerberos)
>>>>>> credentials have also expired or are non-existent. Current behavior is to
>>>>>> retry upcalls to GSSD forever. Please see patch comment for detail.
>>>>>>
>>>>>> 2) Patch SUNRPC set gss gc_expiry to full lifetime works in conjunction with
>>>>>> the gssd patch "0001-GSSD-Pass-GSS_context-lifetime-to-the-kernel". The
>>>>>> gssd patch passes the actual remaining TGT lifetime in the downcall, and
>>>>>> this kernel patch sets the gss context gc_expiry to this lifetime.
>>>>>>
>>>>>> Then the two patches that avoid using an expired credential key:
>>>>>> 3) Patch SUNRPC new rpc_credops to test credential expiry is the heart of this
>>>>>> work. It provides the RPC layer helper functions to allow NFS to manage
>>>>>> data in the face of expired credentials
>>>>>>
>>>>>> 4) Patch NFS avoid expired credential keys for buffered writes calls the
>>>>>> RPC helper functions.
>>>>>>
>>>>>> Pages for buffered WRITEs are allocated in nfs_write_begin where we have an
>>>>>> nfs_open_context and associated rpc_cred. This is a generic rpc_cred, NOT
>>>>>> the gss_cred used in the actual WRITE RPC. Each WRITE RPC call takes the generic
>>>>>> rpc_cred (or uses the 'current_cred') uid and uses it to lookup the associated
>>>>>> gss_cred and gss_context in the call_refresh RPC state. So, there is a
>>>>>> one-to-one association between the nfs_open_context generic_cred and a
>>>>>> gss_cred with a matching uid and a valid non expired gss context.
>>>>>>
>>>>>> We need to check the nfs_open_context generic cred 'underlying' gss_cred
>>>>>> gss_context gc_expiry in nfs_write_begin to determine if there is enough time
>>>>>> left in the gss_context lifetime to complete the buffered WRITE.
>>>>>>
>>>>>> I started by adding a "key_timeout" rpc_authops routine only set by the generic
>>>>>> auth to do this work, called by rpcauth_key_timeout_notify, called from
>>>>>> nfs_write_begin. It does the lookup of the gss_cred (see the patch for
>>>>>> fast-tracking of non-gss underlying creds) and then tests the gss_context
>>>>>> gc_expiry against timeouts by calling a new crkey_timeout rpc_credops set
>>>>>> only for the gss_cred.
>>>>>>
>>>>>> I came up with the idea of two credential key timeout watermarks:
>>>>>> 1) A high watermark, RPC_KEY_EXPIRE_TIMEO set to 90 seconds.
>>>>>> 2) A low watermark, RPC_KEY_EXPIRE_FAIL set to 30 seconds.
>>>>>> NOTE: these timeouts are guesses that work in a VM environment. We may want to
>>>>>> make them adjustable via module parameters.
>>>>>>
>>>>>> If key_timeout is called on a credential with an underlying credential key that
>>>>>> will expire within high watermark seconds, we set the RPC_CRED_KEY_EXPIRE_SOON
>>>>>> flag in the generic_cred acred so that the NFS layer can clean up prior to
>>>>>> key expiration It does this by calling a new crkey_to_expire rpc_credop set
>>>>>> only for generic creds that tests for the RPC_CRED_KEY_EXPIRE_SOON flag.
>>>>>>
>>>>>> If the RPC_CRED_KEY_EXPIRE_SOON flag is set in the nfs_open_context generic
>>>>>> cred (the acred portion), then nfs_write_end will call nfs_wb_all
>>>>>> on EVERY WRITE CALL it sees, and nfs_write_rpcsetup will send NFS_FILE_SYNC
>>>>>> WRITEs. The idea of the high watermark is to give time to flush all buffered
>>>>>> WRITEs and COMMITs, as well as to continue to WRITE, but only with
>>>>>> NFS_FILE_SYNC, allowing the application to try to finish writing before the
>>>>>> gss context expires. NOTE that this means EACH WRITE within the high watermark
>>>>>> timeout is a singe PAGE of NFS_FILE_SYNC. I think this is fine, because we are
>>>>>> in a failure mode - the most important thing is to NOT fail a buffered WRITE..
>>>>>>
>>>>>> If key_timeout is called on a credential key that will expire within low
>>>>>> watermark seconds, we return EACCES from nfs_write_begin to stop
>>>>>> the buffered WRITE using the credential key.
>>>>>>
>>>>>> Checking a generic credential's underlying credential involves a cred lookup.
>>>>>> To avoid this lookup in the normal case when the underlying credential has
>>>>>> a key that is valid (before the high watermark), a notify flag is set in
>>>>>> the generic credential the first time the key_timeout is called. The
>>>>>> generic credential then stops checking the underlying credential key expiry, and
>>>>>> the underlying credential (gss_cred) match routine then checks the key
>>>>>> expiration upon each normal use and sets a flag in the associated generic
>>>>>> credential only when the key expiration is within the high watermark.
>>>>>> This in turn signals the generic credential key_timeout to perform the extra
>>>>>> credetial lookup thereafter.
>>>>>>
>>>>>> TING:
>>>>>>
>>>>>> I have tested these patches with a TGT of 5 minutes, and a modified
>>>>>> Connectathon special test: bigfile.c that only writes and never flushes.
>>>>>> I also increased the file size to 524288000 bytes to give me time.
>>>>>> I named the test buffered-write.
>>>>>>
>>>>>> I kinit, and run 6 instances of buffered write with the first and maybe second
>>>>>> test completing prior to TGT expiration, and the third/fourth tests spanning
>>>>>> the high water mark. The fifth test starts within the high water mark, and the
>>>>>> sixth test starts within the low water mark.
>>>>>>
>>>>>> I have seen the expected behavior: test 1,2 succeed. Tests 3-6 fail with
>>>>>> Permission Denied. Tests 3 and 4 start with normal buffered UNSTABLE writes
>>>>>> then switch to single page NFS_FILE_SYNC writes with a COMMIT for the
>>>>>> normal buffered writes. Test 5 has only single page NFS_FILE_SYNC writes,
>>>>>> test 6 simply fails.
>>>>>>
>>>>>> None of the WRITEs fail.
>>>>>>
>>>>>> ISSUES
>>>>>> 1) Once in a while I see "short write" instead of "Permission
>>>>>> denied". I cannot reproduce this. Even then wireshark shows all writes succeeding.
>>>>>>
>>>>>> 2) Although I have tested re-establishing Kerberos credentials after the
>>>>>> test run, I have not done any testing re-estabishing Kerberos credentials
>>>>>> during the a test run that is within the high watermark.
>>>>>>
>>>>>> 3) General corner case and other testing
>>>>>>
>>>>>> -->Andy
>>>>>>
>>>>>> Andy Adamson (4):
>>>>>> SUNRPC handle EKEYEXPIRED in call_refreshresult
>>>>>> SUNRPC set gss gc_expiry to full lifetime
>>>>>> SUNRPC new rpc_credops to test credential expiry
>>>>>> NFS avoid expired credential keys for buffered writes
>>>>>>
>>>>>> fs/nfs/file.c | 10 ++++
>>>>>> fs/nfs/internal.h | 2 +
>>>>>> fs/nfs/nfs3proc.c | 6 +-
>>>>>> fs/nfs/nfs4filelayout.c | 1 -
>>>>>> fs/nfs/nfs4proc.c | 18 --------
>>>>>> fs/nfs/nfs4state.c | 22 ---------
>>>>>> fs/nfs/proc.c | 43 ------------------
>>>>>> fs/nfs/write.c | 29 ++++++++++++
>>>>>> include/linux/sunrpc/auth.h | 27 ++++++++++++
>>>>>> net/sunrpc/auth.c | 21 +++++++++
>>>>>> net/sunrpc/auth_generic.c | 93 ++++++++++++++++++++++++++++++++++++++++
>>>>>> net/sunrpc/auth_gss/auth_gss.c | 51 +++++++++++++++++++---
>>>>>> net/sunrpc/clnt.c | 1 +
>>>>>> 13 files changed, 231 insertions(+), 93 deletions(-)
>>>>>>
>>>>>
>>>>> In general, I like this set...
>>>>>
>>>>> My only concern is this behavior described in the last patch:
>>>>>
>>>>> "If the buffered WRITE is using a credential key that will expire
>>>>> within low watermark seconds, fail the WRITE in nfs_write_begin
>>>>> _before_ the WRITE is buffered and return -EACCES to the application."
>>>>>
>>>>> Shouldn't rejecting the write attempt be the purview of the server?
>>>>
>>>> The TGT lifetime is shared by the client and the server - it's the same, so I don't think we'd gain anything by doing that.
>>>>
>>>>>
>>>>> It seems to me that we'd be best off just switching to synchronous
>>>>> writes when we start approaching credential expiration, and letting the
>>>>> server handle the case where the credential expires.
>>>>
>>>> You mean switching to direct IO? I believe that means taking a different path in the VFS layer which I found difficult to do from nfs_write_begin. But maybe I just didn't see a good way to do this.
>>>>
>>>>
>>>>> It seems to me that the client should just be in the business of
>>>>> recognizing when the credential might be about to expire and attempt to
>>>>> ensure that we don't end up with a bunch of buffered data in that case.
>>>>
>>>> Yes.
>>>>
>>>> -->Andy
>>>>
>>>>>
>>>>> --
>>>>> Jeff Layton <[email protected]>
>>>>
>>>
>>>
>>> I meant that I didn't really see the point of the two different states
>>> when we're approaching ticket expiry.
>>>
>>> IIUC, at the "high watermark", you'll switch to synchronous writes, and
>>> presumably will start syncing out any buffered data. At that point,
>>> what are you buying by rejecting writes altogether with EACCES at the
>>> low watermark?
>>>
>>> It seems to me that the high watermark should be all you need to deal
>>> with as a client. Once you're past that point you can just let the
>>> client attempt the writes. If they fail, then no big deal -- you were
>>> doing synchronous writes already so you've already avoided having a
>>> bunch of data buffered up.
>>
>> Hi Jeff
>>
>> After doing more test verification, here are the reasons for the low watermark. Reason #2 is the strongest justification.
>>
>> 1) It's a bad idea to "just let the client attempt the writes".
>>
>> The client needs a valid GSS context to send an RPCSEC_GSS RPC because of the checksum of the RPC header (up to and including the credential) that is computed using the GSS_GetMIC() call with enctype and the QOP from the GSS context.
>>
>> Plus, the client and the server see the same TGT lifetime, and thus the same GSS context expiration.
>>
>> Therefore, the only time the client can send an RPCSEC_GSS Kerberos GSS security mechanism RPC to the server and have it reject the RPC with -EACCES (or the RPC level RPCSEC_GSS_CTXPROBLEM) is in the case where the GSS context expires while the RPC is in flight or prior to the RPCSEC_GSS processing by the server. 99.999% of the time, the client determines whether or not the RPCSEC_GSS call fails prior to putting an RPC on the wire based on it's GSS context expiry. So it is not an option to wait for the server to reject the RPC.
>
> NACK. That's not an acceptable strategy.

>
> Neither the NFS client nor the RPC client knows whether or not the TGT
> has been renewed until it tries to establish a new RPCSEC_GSS session.

Well, it is unknown until an upcall to GSSD is done which currently is only triggered by trying to establish a new RPCSEC_GSS session.

>
> If we don't attempt the WRITE, then we don't know if it will succeed or
> fail.

I disagree with that strategy, because of all the reasons stated as justification of the low watermark.

I propose that we provide a strategy of checking for the renewal of the TGT once we are within the watermark(s) independent from the current "refresh only if it's failed" in the RPC layer. Once we are in the watermark(s) we are preparing to FAIL. If we find the TGT refreshed, then we can back off the nfs_wb_all/NFS_FILE_SYNC in the high watermark.

For starters, I propose that we do a refresh upcall once the first time we see we are within the high water mark, and once the first time we see we are in the the low water mark prior to setting -EACCES. This could simply be a flag that triggers a refresh upcall in generic_key_timeout.

>
>> 2) It's a really really good idea to give the client time to clean up by sending a CLOSE to let the server know what is going on. Note for pNFS the CLOSE also carries an implicit LAYOUTRETURN for the return_on_close case. Without the CLOSE, the server simply sees WRITEs stop. BTW this is a good reason to do the same for READ.
>
> No. This too is a bad idea. You're pre-empting any action that the
> application or user may take.

So does letting the GSS context expire prior to close. If the application gets an -EACCES on a write and is coded to clean up after such a failure e.g. call CLOSE, and then CLOSE fails due to expired GSS context which mean all UNLOCKs also fail? This pre-empts the action of recovery that the application wants to take.

Applications are written for AUTH_SYS. They are not used to not being able to clean up after an error.

I think at least a short (5-10 sec) low water mark makes sense along with the proposed additional refresh up calls.

I guess what we really need is a way for TGT renewal (kinit) to notify the RPC layer. I'll think about this WRT using the keyring.

>
>> 3) Extra time to flush buffered data. With my patch set, data is allowed to be buffered, but then flushed prior to nfs_write_end's return. IIUC, a (multithreaded) application can be writing a very large amount of data, which during the high water mark is sent 1 PAGE at a time NFS_FILE_SYNC. This takes significantly longer than coalesced buffered UNSTABLE WRITEs and many more RPCs (not continuations). The purpose of this code is to totally drain the buffered writes. The low water mark provides a period of no new WRITEs giving more RPC resources to the exiting single PAGE draining. I suspect that we will need to make the watermark(s) tunable via module parameters for different workloads.
>
> Flushing the buffer if we think that the GSS session is about to fail
> _is_ a good strategy. I'm fine with this part of the patches.

Well, that part of the patches does not work by simply removing the low water mark. I'll find out why i'm never seeing any return other than 0 from nfs_wb_all(). Do you know why?

I'll leave the low water mark in for my testing, and include the ability of setting the water marks via module parameter - so I can include setting the low water mark to 0. Then we can see how various applications actually respond to having the rug pulled out from under them with/without an opportunity to recover.

-->Andy

>
> --
> Trond Myklebust
> Linux NFS client maintainer
>
> NetApp
> [email protected]
> http://www.netapp.com


2012-09-13 18:12:37

by Adamson, Andy

[permalink] [raw]
Subject: Re: [PATCH 0/4] RFC Avoid expired credential keys for buffered writes


On Sep 13, 2012, at 1:57 PM, J. Bruce Fields wrote:

> On Wed, Sep 12, 2012 at 04:14:38PM +0000, Adamson, Andy wrote:
>>
>> On Sep 12, 2012, at 11:21 AM, Myklebust, Trond wrote:
>>
>>> On Wed, 2012-09-12 at 15:13 +0000, Adamson, Andy wrote:
>>>> After doing more test verification, here are the reasons for the low watermark. Reason #2 is the strongest justification.
>
> 1 and 2 don't sound right. What exactly were the test failures?
>
> The client and server's gss code already check the context expiry for
> us--we don't want an extra check in an upper layer on the client.

Yeah, I agree that another upcall is not really the right way to go.

What we want is to be notified (on the client) when a user's credential has changed - either destroyed or renewed. I have some ideas which I will propose after these patches are resolved.

>
> The context *will* expire unexpectedly sometimes, and we do have to
> handle that. (The server's clock could be a tad faster than the
> server's, or the server could reboot, etc., etc.)

Agreed, and the patch set needs to recognize (handle) this. But this is not the normal case.

>
> I agree with all the suggestions for trying to anticipate expiry in the
> normal cases and preparing to minimize the damage, that's fine. But
> once the expiry finally comes we should leave the existing mechanisms to
> do their job.

I'm not changing what happens at GSS context expiry, but what happens just before the context expires.

I do hear the arguments against the low watermark, and will submit patches with a single (the so-called high) watermark. We can then proceed from there. :)

-->Andy

>
> --b.


2012-09-13 18:21:59

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 0/4] RFC Avoid expired credential keys for buffered writes

On Thu, Sep 13, 2012 at 06:09:05PM +0000, Myklebust, Trond wrote:
> On Thu, 2012-09-13 at 13:57 -0400, J. Bruce Fields wrote:
> > On Wed, Sep 12, 2012 at 04:14:38PM +0000, Adamson, Andy wrote:
> > >
> > > On Sep 12, 2012, at 11:21 AM, Myklebust, Trond wrote:
> > >
> > > > On Wed, 2012-09-12 at 15:13 +0000, Adamson, Andy wrote:
> > > >> After doing more test verification, here are the reasons for the low watermark. Reason #2 is the strongest justification.
> >
> > 1 and 2 don't sound right. What exactly were the test failures?
> >
> > The client and server's gss code already check the context expiry for
> > us--we don't want an extra check in an upper layer on the client.
> >
> > The context *will* expire unexpectedly sometimes, and we do have to
> > handle that. (The server's clock could be a tad faster than the
> > server's, or the server could reboot, etc., etc.)
> >
> > I agree with all the suggestions for trying to anticipate expiry in the
> > normal cases and preparing to minimize the damage, that's fine. But
> > once the expiry finally comes we should leave the existing mechanisms to
> > do their job.
>
> Right, but the problem that the existing mechanisms have is that due to
> asynchronous reads and writes, the application can end up eating
> sizeable chunks of memory. It can also end up grabbing locks without
> being able to free them afterwards.
>
> SP4_MACH_CRED solves most of these issues, but NFSv4.1 is less pervasive
> than NFSv4 at this point, so it would be nice to have a solution for the
> latter too.

Yep, understood.

--b.

2012-09-06 19:54:09

by Andy Adamson

[permalink] [raw]
Subject: [PATCH 1/4] SUNRPC handle EKEYEXPIRED in call_refreshresult

From: Andy Adamson <[email protected]>

Currently, when an RPCSEC_GSS context has expired or is non-existent
and the users (Kerberos) credentials have also expired or are non-existent,
the client receives the -EKEYEXPIRED error and tries to refresh the context
forever. If an application is performing I/O, or other work against the share,
the application hangs, and the user is not prompted to refresh/establish their
credentials. This can result in a denial of service for other users.

Users are expected to manage their Kerberos credential lifetimes to mitigate
this issue.

Move the -EKEYEXPIRED handling into the RPC layer. Try tk_cred_retry number
of times to refresh the gss_context, and then return -EACCES to the application.

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfs/nfs3proc.c | 6 +++---
fs/nfs/nfs4filelayout.c | 1 -
fs/nfs/nfs4proc.c | 18 ------------------
fs/nfs/nfs4state.c | 22 ----------------------
fs/nfs/proc.c | 43 -------------------------------------------
net/sunrpc/clnt.c | 1 +
6 files changed, 4 insertions(+), 87 deletions(-)

diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c
index d6b3b5f..29e5155 100644
--- a/fs/nfs/nfs3proc.c
+++ b/fs/nfs/nfs3proc.c
@@ -24,14 +24,14 @@

#define NFSDBG_FACILITY NFSDBG_PROC

-/* A wrapper to handle the EJUKEBOX and EKEYEXPIRED error messages */
+/* A wrapper to handle the EJUKEBOX error messages */
static int
nfs3_rpc_wrapper(struct rpc_clnt *clnt, struct rpc_message *msg, int flags)
{
int res;
do {
res = rpc_call_sync(clnt, msg, flags);
- if (res != -EJUKEBOX && res != -EKEYEXPIRED)
+ if (res != -EJUKEBOX)
break;
freezable_schedule_timeout_killable(NFS_JUKEBOX_RETRY_TIME);
res = -ERESTARTSYS;
@@ -44,7 +44,7 @@ nfs3_rpc_wrapper(struct rpc_clnt *clnt, struct rpc_message *msg, int flags)
static int
nfs3_async_handle_jukebox(struct rpc_task *task, struct inode *inode)
{
- if (task->tk_status != -EJUKEBOX && task->tk_status != -EKEYEXPIRED)
+ if (task->tk_status != -EJUKEBOX)
return 0;
if (task->tk_status == -EJUKEBOX)
nfs_inc_stats(inode, NFSIOS_DELAY);
diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 53f94d9..000e01e 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -169,7 +169,6 @@ static int filelayout_async_handle_error(struct rpc_task *task,
break;
case -NFS4ERR_DELAY:
case -NFS4ERR_GRACE:
- case -EKEYEXPIRED:
rpc_delay(task, FILELAYOUT_POLL_RETRY_MAX);
break;
case -NFS4ERR_RETRY_UNCACHED_REP:
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 6352741..48b5cfe 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -344,7 +344,6 @@ static int nfs4_handle_exception(struct nfs_server *server, int errorcode, struc
}
case -NFS4ERR_GRACE:
case -NFS4ERR_DELAY:
- case -EKEYEXPIRED:
ret = nfs4_delay(server->client, &exception->timeout);
if (ret != 0)
break;
@@ -1374,13 +1373,6 @@ int nfs4_open_delegation_recall(struct nfs_open_context *ctx, struct nfs4_state
nfs_inode_find_state_and_recover(state->inode,
stateid);
nfs4_schedule_stateid_recovery(server, state);
- case -EKEYEXPIRED:
- /*
- * User RPCSEC_GSS context has expired.
- * We cannot recover this stateid now, so
- * skip it and allow recovery thread to
- * proceed.
- */
case -ENOMEM:
err = 0;
goto out;
@@ -3975,7 +3967,6 @@ nfs4_async_handle_error(struct rpc_task *task, const struct nfs_server *server,
case -NFS4ERR_DELAY:
nfs_inc_server_stats(server, NFSIOS_DELAY);
case -NFS4ERR_GRACE:
- case -EKEYEXPIRED:
rpc_delay(task, NFS4_POLL_RETRY_MAX);
task->tk_status = 0;
return -EAGAIN;
@@ -4949,15 +4940,6 @@ int nfs4_lock_delegation_recall(struct nfs4_state *state, struct file_lock *fl)
nfs4_schedule_stateid_recovery(server, state);
err = 0;
goto out;
- case -EKEYEXPIRED:
- /*
- * User RPCSEC_GSS context has expired.
- * We cannot recover this stateid now, so
- * skip it and allow recovery thread to
- * proceed.
- */
- err = 0;
- goto out;
case -ENOMEM:
case -NFS4ERR_DENIED:
/* kill_proc(fl->fl_pid, SIGLOST, 1); */
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 55148de..8e7207e 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1315,14 +1315,6 @@ restart:
/* Mark the file as being 'closed' */
state->state = 0;
break;
- case -EKEYEXPIRED:
- /*
- * User RPCSEC_GSS context has expired.
- * We cannot recover this stateid now, so
- * skip it and allow recovery thread to
- * proceed.
- */
- break;
case -NFS4ERR_ADMIN_REVOKED:
case -NFS4ERR_STALE_STATEID:
case -NFS4ERR_BAD_STATEID:
@@ -1475,14 +1467,6 @@ static void nfs4_state_start_reclaim_nograce(struct nfs_client *clp)
nfs4_state_mark_reclaim_helper(clp, nfs4_state_mark_reclaim_nograce);
}

-static void nfs4_warn_keyexpired(const char *s)
-{
- printk_ratelimited(KERN_WARNING "Error: state manager"
- " encountered RPCSEC_GSS session"
- " expired against NFSv4 server %s.\n",
- s);
-}
-
static int nfs4_recovery_handle_error(struct nfs_client *clp, int error)
{
switch (error) {
@@ -1516,10 +1500,6 @@ static int nfs4_recovery_handle_error(struct nfs_client *clp, int error)
case -NFS4ERR_CONN_NOT_BOUND_TO_SESSION:
set_bit(NFS4CLNT_BIND_CONN_TO_SESSION, &clp->cl_state);
break;
- case -EKEYEXPIRED:
- /* Nothing we can do */
- nfs4_warn_keyexpired(clp->cl_hostname);
- break;
default:
dprintk("%s: failed to handle error %d for server %s\n",
__func__, error, clp->cl_hostname);
@@ -1632,8 +1612,6 @@ static int nfs4_handle_reclaim_lease_error(struct nfs_client *clp, int status)
dprintk("%s: exit with error %d for server %s\n",
__func__, -EPROTONOSUPPORT, clp->cl_hostname);
return -EPROTONOSUPPORT;
- case -EKEYEXPIRED:
- nfs4_warn_keyexpired(clp->cl_hostname);
case -NFS4ERR_NOT_SAME: /* FixMe: implement recovery
* in nfs4_exchange_id */
default:
diff --git a/fs/nfs/proc.c b/fs/nfs/proc.c
index 50a88c3..f084dac 100644
--- a/fs/nfs/proc.c
+++ b/fs/nfs/proc.c
@@ -47,39 +47,6 @@
#define NFSDBG_FACILITY NFSDBG_PROC

/*
- * wrapper to handle the -EKEYEXPIRED error message. This should generally
- * only happen if using krb5 auth and a user's TGT expires. NFSv2 doesn't
- * support the NFSERR_JUKEBOX error code, but we handle this situation in the
- * same way that we handle that error with NFSv3.
- */
-static int
-nfs_rpc_wrapper(struct rpc_clnt *clnt, struct rpc_message *msg, int flags)
-{
- int res;
- do {
- res = rpc_call_sync(clnt, msg, flags);
- if (res != -EKEYEXPIRED)
- break;
- freezable_schedule_timeout_killable(NFS_JUKEBOX_RETRY_TIME);
- res = -ERESTARTSYS;
- } while (!fatal_signal_pending(current));
- return res;
-}
-
-#define rpc_call_sync(clnt, msg, flags) nfs_rpc_wrapper(clnt, msg, flags)
-
-static int
-nfs_async_handle_expired_key(struct rpc_task *task)
-{
- if (task->tk_status != -EKEYEXPIRED)
- return 0;
- task->tk_status = 0;
- rpc_restart_call(task);
- rpc_delay(task, NFS_JUKEBOX_RETRY_TIME);
- return 1;
-}
-
-/*
* Bare-bones access to getattr: this is for nfs_read_super.
*/
static int
@@ -364,8 +331,6 @@ static void nfs_proc_unlink_rpc_prepare(struct rpc_task *task, struct nfs_unlink

static int nfs_proc_unlink_done(struct rpc_task *task, struct inode *dir)
{
- if (nfs_async_handle_expired_key(task))
- return 0;
nfs_mark_for_revalidate(dir);
return 1;
}
@@ -385,8 +350,6 @@ static int
nfs_proc_rename_done(struct rpc_task *task, struct inode *old_dir,
struct inode *new_dir)
{
- if (nfs_async_handle_expired_key(task))
- return 0;
nfs_mark_for_revalidate(old_dir);
nfs_mark_for_revalidate(new_dir);
return 1;
@@ -642,9 +605,6 @@ static int nfs_read_done(struct rpc_task *task, struct nfs_read_data *data)
{
struct inode *inode = data->header->inode;

- if (nfs_async_handle_expired_key(task))
- return -EAGAIN;
-
nfs_invalidate_atime(inode);
if (task->tk_status >= 0) {
nfs_refresh_inode(inode, data->res.fattr);
@@ -671,9 +631,6 @@ static int nfs_write_done(struct rpc_task *task, struct nfs_write_data *data)
{
struct inode *inode = data->header->inode;

- if (nfs_async_handle_expired_key(task))
- return -EAGAIN;
-
if (task->tk_status >= 0)
nfs_post_op_update_inode_force_wcc(inode, data->res.fattr);
return 0;
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index fa48c60..a15b96c 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1343,6 +1343,7 @@ call_refreshresult(struct rpc_task *task)
return;
case -ETIMEDOUT:
rpc_delay(task, 3*HZ);
+ case -EKEYEXPIRED:
case -EAGAIN:
status = -EACCES;
if (!task->tk_cred_retry)
--
1.7.7.6


2012-09-10 19:52:38

by Adamson, Andy

[permalink] [raw]
Subject: Re: [PATCH 0/4] RFC Avoid expired credential keys for buffered writes


On Sep 10, 2012, at 2:57 PM, Jeff Layton wrote:

> On Thu, 6 Sep 2012 15:54:07 -0400
> [email protected] wrote:
>
>> From: Andy Adamson <[email protected]>
>>
>> This patch set is a Request for Comments: I think it does a fair job of
>> handling the issue - but it is complicated, and other brains might find a
>> better way to get the job done.
>>
>> We must avoid buffering a WRITE that is using a credential key (e.g. a GSS
>> context key) that is about to expire or has expired. We currently will
>> paint ourselves into a corner by returning success to the applciation
>> for such a buffered WRITE, only to discover that we do not have permission when
>> we attempt to flush the WRITE (and potentially associated COMMIT) to disk.
>> This results in data corruption.
>>
>> First, a couple of "setup" patches:
>> 1) Patch SUNRPC handle EKEYEXPIRED in call_refreshresult returns EACCES to the
>> application on an expired or non-existent gss context when the users (Kerberos)
>> credentials have also expired or are non-existent. Current behavior is to
>> retry upcalls to GSSD forever. Please see patch comment for detail.
>>
>> 2) Patch SUNRPC set gss gc_expiry to full lifetime works in conjunction with
>> the gssd patch "0001-GSSD-Pass-GSS_context-lifetime-to-the-kernel". The
>> gssd patch passes the actual remaining TGT lifetime in the downcall, and
>> this kernel patch sets the gss context gc_expiry to this lifetime.
>>
>> Then the two patches that avoid using an expired credential key:
>> 3) Patch SUNRPC new rpc_credops to test credential expiry is the heart of this
>> work. It provides the RPC layer helper functions to allow NFS to manage
>> data in the face of expired credentials
>>
>> 4) Patch NFS avoid expired credential keys for buffered writes calls the
>> RPC helper functions.
>>
>> Pages for buffered WRITEs are allocated in nfs_write_begin where we have an
>> nfs_open_context and associated rpc_cred. This is a generic rpc_cred, NOT
>> the gss_cred used in the actual WRITE RPC. Each WRITE RPC call takes the generic
>> rpc_cred (or uses the 'current_cred') uid and uses it to lookup the associated
>> gss_cred and gss_context in the call_refresh RPC state. So, there is a
>> one-to-one association between the nfs_open_context generic_cred and a
>> gss_cred with a matching uid and a valid non expired gss context.
>>
>> We need to check the nfs_open_context generic cred 'underlying' gss_cred
>> gss_context gc_expiry in nfs_write_begin to determine if there is enough time
>> left in the gss_context lifetime to complete the buffered WRITE.
>>
>> I started by adding a "key_timeout" rpc_authops routine only set by the generic
>> auth to do this work, called by rpcauth_key_timeout_notify, called from
>> nfs_write_begin. It does the lookup of the gss_cred (see the patch for
>> fast-tracking of non-gss underlying creds) and then tests the gss_context
>> gc_expiry against timeouts by calling a new crkey_timeout rpc_credops set
>> only for the gss_cred.
>>
>> I came up with the idea of two credential key timeout watermarks:
>> 1) A high watermark, RPC_KEY_EXPIRE_TIMEO set to 90 seconds.
>> 2) A low watermark, RPC_KEY_EXPIRE_FAIL set to 30 seconds.
>> NOTE: these timeouts are guesses that work in a VM environment. We may want to
>> make them adjustable via module parameters.
>>
>> If key_timeout is called on a credential with an underlying credential key that
>> will expire within high watermark seconds, we set the RPC_CRED_KEY_EXPIRE_SOON
>> flag in the generic_cred acred so that the NFS layer can clean up prior to
>> key expiration It does this by calling a new crkey_to_expire rpc_credop set
>> only for generic creds that tests for the RPC_CRED_KEY_EXPIRE_SOON flag.
>>
>> If the RPC_CRED_KEY_EXPIRE_SOON flag is set in the nfs_open_context generic
>> cred (the acred portion), then nfs_write_end will call nfs_wb_all
>> on EVERY WRITE CALL it sees, and nfs_write_rpcsetup will send NFS_FILE_SYNC
>> WRITEs. The idea of the high watermark is to give time to flush all buffered
>> WRITEs and COMMITs, as well as to continue to WRITE, but only with
>> NFS_FILE_SYNC, allowing the application to try to finish writing before the
>> gss context expires. NOTE that this means EACH WRITE within the high watermark
>> timeout is a singe PAGE of NFS_FILE_SYNC. I think this is fine, because we are
>> in a failure mode - the most important thing is to NOT fail a buffered WRITE..
>>
>> If key_timeout is called on a credential key that will expire within low
>> watermark seconds, we return EACCES from nfs_write_begin to stop
>> the buffered WRITE using the credential key.
>>
>> Checking a generic credential's underlying credential involves a cred lookup.
>> To avoid this lookup in the normal case when the underlying credential has
>> a key that is valid (before the high watermark), a notify flag is set in
>> the generic credential the first time the key_timeout is called. The
>> generic credential then stops checking the underlying credential key expiry, and
>> the underlying credential (gss_cred) match routine then checks the key
>> expiration upon each normal use and sets a flag in the associated generic
>> credential only when the key expiration is within the high watermark.
>> This in turn signals the generic credential key_timeout to perform the extra
>> credetial lookup thereafter.
>>
>> TING:
>>
>> I have tested these patches with a TGT of 5 minutes, and a modified
>> Connectathon special test: bigfile.c that only writes and never flushes.
>> I also increased the file size to 524288000 bytes to give me time.
>> I named the test buffered-write.
>>
>> I kinit, and run 6 instances of buffered write with the first and maybe second
>> test completing prior to TGT expiration, and the third/fourth tests spanning
>> the high water mark. The fifth test starts within the high water mark, and the
>> sixth test starts within the low water mark.
>>
>> I have seen the expected behavior: test 1,2 succeed. Tests 3-6 fail with
>> Permission Denied. Tests 3 and 4 start with normal buffered UNSTABLE writes
>> then switch to single page NFS_FILE_SYNC writes with a COMMIT for the
>> normal buffered writes. Test 5 has only single page NFS_FILE_SYNC writes,
>> test 6 simply fails.
>>
>> None of the WRITEs fail.
>>
>> ISSUES
>> 1) Once in a while I see "short write" instead of "Permission
>> denied". I cannot reproduce this. Even then wireshark shows all writes succeeding.
>>
>> 2) Although I have tested re-establishing Kerberos credentials after the
>> test run, I have not done any testing re-estabishing Kerberos credentials
>> during the a test run that is within the high watermark.
>>
>> 3) General corner case and other testing
>>
>> -->Andy
>>
>> Andy Adamson (4):
>> SUNRPC handle EKEYEXPIRED in call_refreshresult
>> SUNRPC set gss gc_expiry to full lifetime
>> SUNRPC new rpc_credops to test credential expiry
>> NFS avoid expired credential keys for buffered writes
>>
>> fs/nfs/file.c | 10 ++++
>> fs/nfs/internal.h | 2 +
>> fs/nfs/nfs3proc.c | 6 +-
>> fs/nfs/nfs4filelayout.c | 1 -
>> fs/nfs/nfs4proc.c | 18 --------
>> fs/nfs/nfs4state.c | 22 ---------
>> fs/nfs/proc.c | 43 ------------------
>> fs/nfs/write.c | 29 ++++++++++++
>> include/linux/sunrpc/auth.h | 27 ++++++++++++
>> net/sunrpc/auth.c | 21 +++++++++
>> net/sunrpc/auth_generic.c | 93 ++++++++++++++++++++++++++++++++++++++++
>> net/sunrpc/auth_gss/auth_gss.c | 51 +++++++++++++++++++---
>> net/sunrpc/clnt.c | 1 +
>> 13 files changed, 231 insertions(+), 93 deletions(-)
>>
>
> In general, I like this set...
>
> My only concern is this behavior described in the last patch:
>
> "If the buffered WRITE is using a credential key that will expire
> within low watermark seconds, fail the WRITE in nfs_write_begin
> _before_ the WRITE is buffered and return -EACCES to the application."
>
> Shouldn't rejecting the write attempt be the purview of the server?

The TGT lifetime is shared by the client and the server - it's the same, so I don't think we'd gain anything by doing that.

>
> It seems to me that we'd be best off just switching to synchronous
> writes when we start approaching credential expiration, and letting the
> server handle the case where the credential expires.

You mean switching to direct IO? I believe that means taking a different path in the VFS layer which I found difficult to do from nfs_write_begin. But maybe I just didn't see a good way to do this.

>
> It seems to me that the client should just be in the business of
> recognizing when the credential might be about to expire and attempt to
> ensure that we don't end up with a bunch of buffered data in that case.

Yes.

-->Andy

>
> --
> Jeff Layton <[email protected]>


2012-09-10 20:13:48

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [PATCH 0/4] RFC Avoid expired credential keys for buffered writes

T24gTW9uLCAyMDEyLTA5LTEwIGF0IDE5OjUyICswMDAwLCBBZGFtc29uLCBBbmR5IHdyb3RlOg0K
PiBPbiBTZXAgMTAsIDIwMTIsIGF0IDI6NTcgUE0sIEplZmYgTGF5dG9uIHdyb3RlOiANCj4gPiBN
eSBvbmx5IGNvbmNlcm4gaXMgdGhpcyBiZWhhdmlvciBkZXNjcmliZWQgaW4gdGhlIGxhc3QgcGF0
Y2g6DQo+ID4gDQo+ID4gIklmIHRoZSBidWZmZXJlZCBXUklURSBpcyB1c2luZyBhIGNyZWRlbnRp
YWwga2V5IHRoYXQgd2lsbCBleHBpcmUNCj4gPiB3aXRoaW4gbG93IHdhdGVybWFyayBzZWNvbmRz
LCBmYWlsIHRoZSBXUklURSBpbiBuZnNfd3JpdGVfYmVnaW4NCj4gPiBfYmVmb3JlXyB0aGUgV1JJ
VEUgaXMgYnVmZmVyZWQgYW5kIHJldHVybiAtRUFDQ0VTIHRvIHRoZSBhcHBsaWNhdGlvbi4iDQo+
ID4gDQo+ID4gU2hvdWxkbid0IHJlamVjdGluZyB0aGUgd3JpdGUgYXR0ZW1wdCBiZSB0aGUgcHVy
dmlldyBvZiB0aGUgc2VydmVyPw0KPiANCj4gVGhlIFRHVCBsaWZldGltZSBpcyBzaGFyZWQgYnkg
dGhlIGNsaWVudCBhbmQgdGhlIHNlcnZlciAtIGl0J3MgdGhlIHNhbWUsIHNvIEkgZG9uJ3QgdGhp
bmsgd2UnZCBnYWluIGFueXRoaW5nIGJ5IGRvaW5nIHRoYXQuDQo+IA0KPiA+IA0KPiA+IEl0IHNl
ZW1zIHRvIG1lIHRoYXQgd2UnZCBiZSBiZXN0IG9mZiBqdXN0IHN3aXRjaGluZyB0byBzeW5jaHJv
bm91cw0KPiA+IHdyaXRlcyB3aGVuIHdlIHN0YXJ0IGFwcHJvYWNoaW5nIGNyZWRlbnRpYWwgZXhw
aXJhdGlvbiwgYW5kIGxldHRpbmcgdGhlDQo+ID4gc2VydmVyIGhhbmRsZSB0aGUgY2FzZSB3aGVy
ZSB0aGUgY3JlZGVudGlhbCBleHBpcmVzLg0KPiANCj4gWW91IG1lYW4gc3dpdGNoaW5nIHRvIGRp
cmVjdCBJTz8gSSBiZWxpZXZlIHRoYXQgbWVhbnMgdGFraW5nIGEgZGlmZmVyZW50IHBhdGggaW4g
dGhlIFZGUyBsYXllciB3aGljaCBJIGZvdW5kIGRpZmZpY3VsdCB0byBkbyBmcm9tIG5mc193cml0
ZV9iZWdpbi4gQnV0IG1heWJlIEkganVzdCBkaWRuJ3Qgc2VlIGEgZ29vZCB3YXkgdG8gZG8gdGhp
cy4NCg0KU3luY2hyb25vdXMgSS9PIGlzIGEgY29tcGxldGVseSBkaWZmZXJlbnQgY29uY2VwdCB0
aGFuIGRpcmVjdCBJL08uDQoNCkZvciBvcmRpbmFyeSB3cml0ZXMsIG1ha2luZyB0aGVtIHN5bmNo
cm9ub3VzIGJhc2ljYWxseSBpbnZvbHZlcyBjYWxsaW5nDQp2ZnNfZnN5bmMoKSBiZWZvcmUgcmV0
dXJuaW5nIGZyb20gdGhlIHdyaXRlKCkgc3lzY2FsbCAoc2VlDQpuZnNfZmlsZV93cml0ZSgpKS4N
Cg0KRm9yIGRpcmVjdCBJL08sIG1ha2luZyBpdCBzeW5jaHJvbm91cyBtZWFucyBlbnN1cmluZyB0
aGF0IHdlIGRvIHdhaXQgaW4NCm5mc19kaXJlY3Rfd2FpdCgpLg0KDQpGb3IgbW1hcCgpLCB0aGlu
Z3MgYXJlIGEgYml0IG1vcmUgdHJpY2t5IHNpbmNlIHRoZSBkaXJ0eWluZyBwcm9jZXNzDQpoYXBw
ZW5zIHZpYSBhIGNhbGxiYWNrIGZyb20gdGhlIHBhZ2UgZmF1bHQuIEknbSBub3Qgc3VyZSBob3cg
eW91IGNvdWxkDQpmb3JjZSB0aGF0IHRvIGJlIHN5bmNocm9ub3VzLg0KDQotLSANClRyb25kIE15
a2xlYnVzdA0KTGludXggTkZTIGNsaWVudCBtYWludGFpbmVyDQoNCk5ldEFwcA0KVHJvbmQuTXlr
bGVidXN0QG5ldGFwcC5jb20NCnd3dy5uZXRhcHAuY29tDQoNCg==

2012-09-10 20:09:01

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH 0/4] RFC Avoid expired credential keys for buffered writes

On Mon, 10 Sep 2012 19:52:35 +0000
"Adamson, Andy" <[email protected]> wrote:

>
> On Sep 10, 2012, at 2:57 PM, Jeff Layton wrote:
>
> > On Thu, 6 Sep 2012 15:54:07 -0400
> > [email protected] wrote:
> >
> >> From: Andy Adamson <[email protected]>
> >>
> >> This patch set is a Request for Comments: I think it does a fair job of
> >> handling the issue - but it is complicated, and other brains might find a
> >> better way to get the job done.
> >>
> >> We must avoid buffering a WRITE that is using a credential key (e.g. a GSS
> >> context key) that is about to expire or has expired. We currently will
> >> paint ourselves into a corner by returning success to the applciation
> >> for such a buffered WRITE, only to discover that we do not have permission when
> >> we attempt to flush the WRITE (and potentially associated COMMIT) to disk.
> >> This results in data corruption.
> >>
> >> First, a couple of "setup" patches:
> >> 1) Patch SUNRPC handle EKEYEXPIRED in call_refreshresult returns EACCES to the
> >> application on an expired or non-existent gss context when the users (Kerberos)
> >> credentials have also expired or are non-existent. Current behavior is to
> >> retry upcalls to GSSD forever. Please see patch comment for detail.
> >>
> >> 2) Patch SUNRPC set gss gc_expiry to full lifetime works in conjunction with
> >> the gssd patch "0001-GSSD-Pass-GSS_context-lifetime-to-the-kernel". The
> >> gssd patch passes the actual remaining TGT lifetime in the downcall, and
> >> this kernel patch sets the gss context gc_expiry to this lifetime.
> >>
> >> Then the two patches that avoid using an expired credential key:
> >> 3) Patch SUNRPC new rpc_credops to test credential expiry is the heart of this
> >> work. It provides the RPC layer helper functions to allow NFS to manage
> >> data in the face of expired credentials
> >>
> >> 4) Patch NFS avoid expired credential keys for buffered writes calls the
> >> RPC helper functions.
> >>
> >> Pages for buffered WRITEs are allocated in nfs_write_begin where we have an
> >> nfs_open_context and associated rpc_cred. This is a generic rpc_cred, NOT
> >> the gss_cred used in the actual WRITE RPC. Each WRITE RPC call takes the generic
> >> rpc_cred (or uses the 'current_cred') uid and uses it to lookup the associated
> >> gss_cred and gss_context in the call_refresh RPC state. So, there is a
> >> one-to-one association between the nfs_open_context generic_cred and a
> >> gss_cred with a matching uid and a valid non expired gss context.
> >>
> >> We need to check the nfs_open_context generic cred 'underlying' gss_cred
> >> gss_context gc_expiry in nfs_write_begin to determine if there is enough time
> >> left in the gss_context lifetime to complete the buffered WRITE.
> >>
> >> I started by adding a "key_timeout" rpc_authops routine only set by the generic
> >> auth to do this work, called by rpcauth_key_timeout_notify, called from
> >> nfs_write_begin. It does the lookup of the gss_cred (see the patch for
> >> fast-tracking of non-gss underlying creds) and then tests the gss_context
> >> gc_expiry against timeouts by calling a new crkey_timeout rpc_credops set
> >> only for the gss_cred.
> >>
> >> I came up with the idea of two credential key timeout watermarks:
> >> 1) A high watermark, RPC_KEY_EXPIRE_TIMEO set to 90 seconds.
> >> 2) A low watermark, RPC_KEY_EXPIRE_FAIL set to 30 seconds.
> >> NOTE: these timeouts are guesses that work in a VM environment. We may want to
> >> make them adjustable via module parameters.
> >>
> >> If key_timeout is called on a credential with an underlying credential key that
> >> will expire within high watermark seconds, we set the RPC_CRED_KEY_EXPIRE_SOON
> >> flag in the generic_cred acred so that the NFS layer can clean up prior to
> >> key expiration It does this by calling a new crkey_to_expire rpc_credop set
> >> only for generic creds that tests for the RPC_CRED_KEY_EXPIRE_SOON flag.
> >>
> >> If the RPC_CRED_KEY_EXPIRE_SOON flag is set in the nfs_open_context generic
> >> cred (the acred portion), then nfs_write_end will call nfs_wb_all
> >> on EVERY WRITE CALL it sees, and nfs_write_rpcsetup will send NFS_FILE_SYNC
> >> WRITEs. The idea of the high watermark is to give time to flush all buffered
> >> WRITEs and COMMITs, as well as to continue to WRITE, but only with
> >> NFS_FILE_SYNC, allowing the application to try to finish writing before the
> >> gss context expires. NOTE that this means EACH WRITE within the high watermark
> >> timeout is a singe PAGE of NFS_FILE_SYNC. I think this is fine, because we are
> >> in a failure mode - the most important thing is to NOT fail a buffered WRITE..
> >>
> >> If key_timeout is called on a credential key that will expire within low
> >> watermark seconds, we return EACCES from nfs_write_begin to stop
> >> the buffered WRITE using the credential key.
> >>
> >> Checking a generic credential's underlying credential involves a cred lookup.
> >> To avoid this lookup in the normal case when the underlying credential has
> >> a key that is valid (before the high watermark), a notify flag is set in
> >> the generic credential the first time the key_timeout is called. The
> >> generic credential then stops checking the underlying credential key expiry, and
> >> the underlying credential (gss_cred) match routine then checks the key
> >> expiration upon each normal use and sets a flag in the associated generic
> >> credential only when the key expiration is within the high watermark.
> >> This in turn signals the generic credential key_timeout to perform the extra
> >> credetial lookup thereafter.
> >>
> >> TING:
> >>
> >> I have tested these patches with a TGT of 5 minutes, and a modified
> >> Connectathon special test: bigfile.c that only writes and never flushes.
> >> I also increased the file size to 524288000 bytes to give me time.
> >> I named the test buffered-write.
> >>
> >> I kinit, and run 6 instances of buffered write with the first and maybe second
> >> test completing prior to TGT expiration, and the third/fourth tests spanning
> >> the high water mark. The fifth test starts within the high water mark, and the
> >> sixth test starts within the low water mark.
> >>
> >> I have seen the expected behavior: test 1,2 succeed. Tests 3-6 fail with
> >> Permission Denied. Tests 3 and 4 start with normal buffered UNSTABLE writes
> >> then switch to single page NFS_FILE_SYNC writes with a COMMIT for the
> >> normal buffered writes. Test 5 has only single page NFS_FILE_SYNC writes,
> >> test 6 simply fails.
> >>
> >> None of the WRITEs fail.
> >>
> >> ISSUES
> >> 1) Once in a while I see "short write" instead of "Permission
> >> denied". I cannot reproduce this. Even then wireshark shows all writes succeeding.
> >>
> >> 2) Although I have tested re-establishing Kerberos credentials after the
> >> test run, I have not done any testing re-estabishing Kerberos credentials
> >> during the a test run that is within the high watermark.
> >>
> >> 3) General corner case and other testing
> >>
> >> -->Andy
> >>
> >> Andy Adamson (4):
> >> SUNRPC handle EKEYEXPIRED in call_refreshresult
> >> SUNRPC set gss gc_expiry to full lifetime
> >> SUNRPC new rpc_credops to test credential expiry
> >> NFS avoid expired credential keys for buffered writes
> >>
> >> fs/nfs/file.c | 10 ++++
> >> fs/nfs/internal.h | 2 +
> >> fs/nfs/nfs3proc.c | 6 +-
> >> fs/nfs/nfs4filelayout.c | 1 -
> >> fs/nfs/nfs4proc.c | 18 --------
> >> fs/nfs/nfs4state.c | 22 ---------
> >> fs/nfs/proc.c | 43 ------------------
> >> fs/nfs/write.c | 29 ++++++++++++
> >> include/linux/sunrpc/auth.h | 27 ++++++++++++
> >> net/sunrpc/auth.c | 21 +++++++++
> >> net/sunrpc/auth_generic.c | 93 ++++++++++++++++++++++++++++++++++++++++
> >> net/sunrpc/auth_gss/auth_gss.c | 51 +++++++++++++++++++---
> >> net/sunrpc/clnt.c | 1 +
> >> 13 files changed, 231 insertions(+), 93 deletions(-)
> >>
> >
> > In general, I like this set...
> >
> > My only concern is this behavior described in the last patch:
> >
> > "If the buffered WRITE is using a credential key that will expire
> > within low watermark seconds, fail the WRITE in nfs_write_begin
> > _before_ the WRITE is buffered and return -EACCES to the application."
> >
> > Shouldn't rejecting the write attempt be the purview of the server?
>
> The TGT lifetime is shared by the client and the server - it's the same, so I don't think we'd gain anything by doing that.
>
> >
> > It seems to me that we'd be best off just switching to synchronous
> > writes when we start approaching credential expiration, and letting the
> > server handle the case where the credential expires.
>
> You mean switching to direct IO? I believe that means taking a different path in the VFS layer which I found difficult to do from nfs_write_begin. But maybe I just didn't see a good way to do this.
>
>
> > It seems to me that the client should just be in the business of
> > recognizing when the credential might be about to expire and attempt to
> > ensure that we don't end up with a bunch of buffered data in that case.
>
> Yes.
>
> -->Andy
>
> >
> > --
> > Jeff Layton <[email protected]>
>


I meant that I didn't really see the point of the two different states
when we're approaching ticket expiry.

IIUC, at the "high watermark", you'll switch to synchronous writes, and
presumably will start syncing out any buffered data. At that point,
what are you buying by rejecting writes altogether with EACCES at the
low watermark?

It seems to me that the high watermark should be all you need to deal
with as a client. Once you're past that point you can just let the
client attempt the writes. If they fail, then no big deal -- you were
doing synchronous writes already so you've already avoided having a
bunch of data buffered up.

--
Jeff Layton <[email protected]>

2012-09-10 21:03:59

by Adamson, Andy

[permalink] [raw]
Subject: Re: [PATCH 0/4] RFC Avoid expired credential keys for buffered writes

Jeff, Jim, Trond - Thanks for all the comments. I'm looking again at the code and will incorporate your suggestions and do some more testing. Should have results and a modified patch set tomorrow.

-->Andy

On Sep 10, 2012, at 4:14 PM, Myklebust, Trond wrote:

> On Mon, 2012-09-10 at 15:56 -0400, Jim Rees wrote:
>> Jeff Layton wrote:
>>
>> My only concern is this behavior described in the last patch:
>>
>> "If the buffered WRITE is using a credential key that will expire
>> within low watermark seconds, fail the WRITE in nfs_write_begin
>> _before_ the WRITE is buffered and return -EACCES to the application."
>>
>> Shouldn't rejecting the write attempt be the purview of the server?
>>
>> It seems to me that we'd be best off just switching to synchronous
>> writes when we start approaching credential expiration, and letting the
>> server handle the case where the credential expires.
>>
>> It seems to me that the client should just be in the business of
>> recognizing when the credential might be about to expire and attempt to
>> ensure that we don't end up with a bunch of buffered data in that case.
>>
>> I don't like it. It's easy to understand why writes might fail if the creds
>> are getting stale. It's not so easy to understand why client behavior
>> changes in a subtle way, with performance implications.
>
> We don't know that the creds are stale. We just know that the GSS
> session is about to expire.
>
> --
> Trond Myklebust
> Linux NFS client maintainer
>
> NetApp
> [email protected]
> http://www.netapp.com
>


2012-09-07 01:36:25

by Jim Rees

[permalink] [raw]
Subject: Re: [PATCH 0/4] RFC Avoid expired credential keys for buffered writes

[email protected] wrote:

From: Andy Adamson <[email protected]>

This patch set is a Request for Comments: I think it does a fair job of
handling the issue - but it is complicated, and other brains might find a
better way to get the job done.

I really like it. What we're doing now is nuts. It even (mostly) solves
the old afs problem of write succeeding then having the close fail.

2012-09-06 19:54:11

by Andy Adamson

[permalink] [raw]
Subject: [PATCH 3/4] SUNRPC new rpc_credops to test credential expiry

From: Andy Adamson <[email protected]>

This patch provides the RPC layer helper functions to allow NFS to manage
data in the face of expired credentials - such as avoiding buffered WRITEs
and COMMITs when the gss context will expire before the WRITEs are flushed
and COMMITs are sent.

These helper functions enable checking the expiration of an underlying
credential key for a generic rpc credential, e.g. the gss_cred gss context
gc_expiry which for Kerberos is set to the remaining TGT lifetime.

A new rpc_authops key_timeout is only defined for the generic auth.
A new rpc_credops crkey_to_expire is only defined for the generic cred.
A new rpc_credops crkey_timeout is only defined for the gss cred.

Set two credential key expiry watermarks:
1) A high watermark, RPC_KEY_EXPIRE_TIMEO set to 90 seconds.
2) A low watermark, RPC_KEY_EXPIRE_FAIL set to 30 seconds.
NOTE: these timeouts are guesses that work in a VM environment. We may want to
make them adjustable via module parameters.

If key_timeout is called on a credential with an underlying credential key that
will expire within high watermark seconds, we set the RPC_CRED_KEY_EXPIRE_SOON
flag in the generic_cred acred so that the NFS layer can clean up prior to
key expiration.

If key_timeout is called on a credential key that will expire within low
watermark seconds, we return EACCES to allow NFS to stop any new data
operations on the credential key.

Checking a generic credential's underlying credential involves a cred lookup.
To avoid this lookup in the normal case when the underlying credential has
a key that is valid (before the high watermark), a notify flag is set in
the generic credential the first time the key_timeout is called. The
generic credential then stops checking the underlying credential key expiry, and
the underlying credential (gss_cred) match routine then checks the key
expiration upon each normal use and sets a flag in the associated generic
credential only when the key expiration is within the high watermark.
This in turn signals the generic credential key_timeout to perform the extra
credetial lookup thereafter.

Signed-off-by: Andy Adamson <[email protected]>
---
include/linux/sunrpc/auth.h | 27 ++++++++++++
net/sunrpc/auth.c | 21 +++++++++
net/sunrpc/auth_generic.c | 93 ++++++++++++++++++++++++++++++++++++++++
net/sunrpc/auth_gss/auth_gss.c | 36 ++++++++++++++-
4 files changed, 175 insertions(+), 2 deletions(-)

diff --git a/include/linux/sunrpc/auth.h b/include/linux/sunrpc/auth.h
index f25ba92..cd01476 100644
--- a/include/linux/sunrpc/auth.h
+++ b/include/linux/sunrpc/auth.h
@@ -21,12 +21,30 @@
/* size of the nodename buffer */
#define UNX_MAXNODENAME 32

+/* Set RPC_CRED_KEY_EXPIRE_SOON auth_cred flag if called RPC_CRED_EXPIRE_TIMEO
+ * miliseconds before the credential key will expire */
+#define RPC_KEY_EXPIRE_TIMEO (90 * HZ)
+
+/* Return -EACCES if called RPC_CRED_EXPIRE_FAIL
+ * miliseconds before the credential key will expire */
+#define RPC_KEY_EXPIRE_FAIL (30 * HZ)
+
+/* auth_cred ac_flags bits */
+enum {
+ RPC_CRED_NO_CRKEY_TIMEOUT = 0, /* underlying cred has no key timeout */
+ RPC_CRED_KEY_EXPIRE_SOON = 1, /* underlying cred key will expire soon */
+ RPC_CRED_NOTIFY_TIMEOUT = 2, /* nofity generic cred when underlying
+ key will expire soon */
+};
+
+
/* Work around the lack of a VFS credential */
struct auth_cred {
uid_t uid;
gid_t gid;
struct group_info *group_info;
const char *principal;
+ unsigned long ac_flags;
unsigned char machine_cred : 1;
};

@@ -102,6 +120,8 @@ struct rpc_authops {
int (*pipes_create)(struct rpc_auth *);
void (*pipes_destroy)(struct rpc_auth *);
int (*list_pseudoflavors)(rpc_authflavor_t *, int);
+ int (*key_timeout)(struct rpc_auth *,
+ struct rpc_cred *);
};

struct rpc_credops {
@@ -118,6 +138,9 @@ struct rpc_credops {
void *, __be32 *, void *);
int (*crunwrap_resp)(struct rpc_task *, kxdrdproc_t,
void *, __be32 *, void *);
+ int (*crkey_timeout)(struct rpc_cred *,
+ unsigned long);
+ bool (*crkey_to_expire)(struct rpc_cred *);
};

extern const struct rpc_authops authunix_ops;
@@ -152,6 +175,10 @@ int rpcauth_uptodatecred(struct rpc_task *);
int rpcauth_init_credcache(struct rpc_auth *);
void rpcauth_destroy_credcache(struct rpc_auth *);
void rpcauth_clear_credcache(struct rpc_cred_cache *);
+int rpcauth_key_timeout_notify(struct rpc_auth *,
+ struct rpc_cred *);
+bool rpcauth_cred_key_to_expire(struct rpc_cred *);
+

static inline
struct rpc_cred * get_rpccred(struct rpc_cred *cred)
diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c
index b5c067b..841facb 100644
--- a/net/sunrpc/auth.c
+++ b/net/sunrpc/auth.c
@@ -270,6 +270,27 @@ out_nocache:
EXPORT_SYMBOL_GPL(rpcauth_init_credcache);

/*
+ * Setup a credential key lifetime timeout notification
+ */
+int
+rpcauth_key_timeout_notify(struct rpc_auth *auth, struct rpc_cred *cred)
+{
+ if (!cred->cr_auth->au_ops->key_timeout)
+ return 0;
+ return cred->cr_auth->au_ops->key_timeout(auth, cred);
+}
+EXPORT_SYMBOL_GPL(rpcauth_key_timeout_notify);
+
+bool
+rpcauth_cred_key_to_expire(struct rpc_cred *cred)
+{
+ if (!cred->cr_ops->crkey_to_expire)
+ return false;
+ return cred->cr_ops->crkey_to_expire(cred);
+}
+EXPORT_SYMBOL_GPL(rpcauth_cred_key_to_expire);
+
+/*
* Destroy a list of credentials
*/
static inline
diff --git a/net/sunrpc/auth_generic.c b/net/sunrpc/auth_generic.c
index 6ed6f20..cc44b70 100644
--- a/net/sunrpc/auth_generic.c
+++ b/net/sunrpc/auth_generic.c
@@ -89,6 +89,7 @@ generic_create_cred(struct rpc_auth *auth, struct auth_cred *acred, int flags)
gcred->acred.uid = acred->uid;
gcred->acred.gid = acred->gid;
gcred->acred.group_info = acred->group_info;
+ gcred->acred.ac_flags = 0;
if (gcred->acred.group_info != NULL)
get_group_info(gcred->acred.group_info);
gcred->acred.machine_cred = acred->machine_cred;
@@ -180,11 +181,88 @@ void rpc_destroy_generic_auth(void)
rpcauth_destroy_credcache(&generic_auth);
}

+/*
+ * Test the the current time (now) against the underlying credential key expiry
+ * minus a timeout and setup notification.
+ *
+ * The normal case:
+ * If 'now' is before the key expiry minus RPC_KEY_EXPIRE_TIMEO, set
+ * the RPC_CRED_NOTIFY_TIMEOUT flag to setup the underlying credential
+ * rpc_credops crmatch routine to notify this generic cred when it's key
+ * expiration is within RPC_KEY_EXPIRE_TIMEO, and return 0.
+ *
+ * The error case:
+ * If 'now' is within key expiry minus RPC_KEY_EXPIRE_FAIL, or the underlying
+ * cred lookup fails, return -EACCES.
+ *
+ * The 'almost' error case:
+ * If 'now' is within key expiry minus RPC_KEY_EXPIRE_TIMEO, but not within
+ * key expiry minus RPC_KEY_EXPIRE_FAIL, set the RPC_CRED_EXPIRE_SOON bit
+ * on the acred ac_flags and return 0.
+ */
+static int
+generic_key_timeout(struct rpc_auth *auth, struct rpc_cred *cred)
+{
+ struct auth_cred *acred = &container_of(cred, struct generic_cred,
+ gc_base)->acred;
+ struct rpc_cred *tcred;
+ int ret = 0;
+
+
+ dprintk("--> %s cred %p acred %p acred->ac_flags %lu\n", __func__,
+ cred, acred, acred->ac_flags);
+
+ /* Fast track for non crkey_timeout (no key) underlying credentials */
+ if (test_bit(RPC_CRED_NO_CRKEY_TIMEOUT, &acred->ac_flags))
+ return 0;
+
+ /* Fast track for the normal case */
+ if (test_bit(RPC_CRED_NOTIFY_TIMEOUT, &acred->ac_flags))
+ return 0;
+
+ /* lookup_cred either returns a valid referenced rpc_cred, or PTR_ERR */
+ tcred = auth->au_ops->lookup_cred(auth, acred, 0);
+ if (IS_ERR(tcred))
+ return -EACCES;
+
+ if (!tcred->cr_ops->crkey_timeout) {
+ set_bit(RPC_CRED_NO_CRKEY_TIMEOUT, &acred->ac_flags);
+ ret = 0;
+ goto out_put;
+ }
+
+ /* Test for the error case */
+ ret = tcred->cr_ops->crkey_timeout(tcred, RPC_KEY_EXPIRE_FAIL);
+ if (ret != 0) { /* -EACCESS */
+ ret = -EACCES;
+ goto out_put;
+ }
+
+ /* Test for the almost error case */
+ ret = tcred->cr_ops->crkey_timeout(tcred, RPC_KEY_EXPIRE_TIMEO);
+ if (ret != 0) {
+ set_bit(RPC_CRED_KEY_EXPIRE_SOON, &acred->ac_flags);
+ ret = 0;
+ } else {
+ /* In case underlying cred key has been reset */
+ test_and_clear_bit(RPC_CRED_KEY_EXPIRE_SOON, &acred->ac_flags);
+ /* set up fasttrack for the normal case */
+ set_bit(RPC_CRED_NOTIFY_TIMEOUT, &acred->ac_flags);
+ }
+
+out_put:
+ put_rpccred(tcred);
+ dprintk("%s return %d acred->ac_flags %lu\n", __func__, ret,
+ acred->ac_flags);
+ return ret;
+}
+
static const struct rpc_authops generic_auth_ops = {
.owner = THIS_MODULE,
.au_name = "Generic",
.lookup_cred = generic_lookup_cred,
.crcreate = generic_create_cred,
+ .key_timeout = generic_key_timeout,
};

static struct rpc_auth generic_auth = {
@@ -192,9 +270,24 @@ static struct rpc_auth generic_auth = {
.au_count = ATOMIC_INIT(0),
};

+static bool generic_key_to_expire(struct rpc_cred *cred)
+{
+ struct auth_cred *acred = &container_of(cred, struct generic_cred,
+ gc_base)->acred;
+ bool ret;
+
+ get_rpccred(cred);
+ ret = test_bit(RPC_CRED_KEY_EXPIRE_SOON, &acred->ac_flags);
+ put_rpccred(cred);
+
+ return ret;
+}
+
static const struct rpc_credops generic_credops = {
.cr_name = "Generic cred",
.crdestroy = generic_destroy_cred,
.crbind = generic_bind_cred,
.crmatch = generic_match,
+ .crkey_to_expire = generic_key_to_expire,
};
+
diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
index c59f5ed..51307a2 100644
--- a/net/sunrpc/auth_gss/auth_gss.c
+++ b/net/sunrpc/auth_gss/auth_gss.c
@@ -1093,10 +1093,24 @@ gss_cred_init(struct rpc_auth *auth, struct rpc_cred *cred)
return err;
}

+/* Returns -EACCES if GSS context will expire within timeout (miliseconds) */
+static int
+gss_key_timeout(struct rpc_cred *rc, unsigned long timeout)
+{
+ struct gss_cred *gss_cred = container_of(rc, struct gss_cred, gc_base);
+ unsigned long now = jiffies;
+ unsigned long expire = gss_cred->gc_ctx->gc_expiry - timeout;
+
+ if (time_after(now, expire))
+ return -EACCES;
+ return 0;
+}
+
static int
gss_match(struct auth_cred *acred, struct rpc_cred *rc, int flags)
{
struct gss_cred *gss_cred = container_of(rc, struct gss_cred, gc_base);
+ int ret;

if (test_bit(RPCAUTH_CRED_NEW, &rc->cr_flags))
goto out;
@@ -1109,11 +1123,28 @@ out:
if (acred->principal != NULL) {
if (gss_cred->gc_principal == NULL)
return 0;
- return strcmp(acred->principal, gss_cred->gc_principal) == 0;
+ ret = strcmp(acred->principal, gss_cred->gc_principal) == 0;
+ goto check_expire;
}
if (gss_cred->gc_principal != NULL)
return 0;
- return rc->cr_uid == acred->uid;
+ ret = rc->cr_uid == acred->uid;
+
+check_expire:
+ if (ret == 0)
+ return ret;
+
+ /* Notify acred users of GSS context expiration timeout */
+ if (test_bit(RPC_CRED_NOTIFY_TIMEOUT, &acred->ac_flags) &&
+ (gss_key_timeout(rc, RPC_KEY_EXPIRE_TIMEO) != 0)) {
+ /* test will now be done from generic cred */
+ test_and_clear_bit(RPC_CRED_NOTIFY_TIMEOUT, &acred->ac_flags);
+ /* tell NFS layer that key will expire soon */
+ set_bit(RPC_CRED_KEY_EXPIRE_SOON, &acred->ac_flags);
+ pr_warn("RPC: GSS context %p to expire within %d ms.\n",
+ gss_cred->gc_ctx, RPC_KEY_EXPIRE_TIMEO);
+ }
+ return ret;
}

/*
@@ -1640,6 +1671,7 @@ static const struct rpc_credops gss_credops = {
.crvalidate = gss_validate,
.crwrap_req = gss_wrap_req,
.crunwrap_resp = gss_unwrap_resp,
+ .crkey_timeout = gss_key_timeout,
};

static const struct rpc_credops gss_nullops = {
--
1.7.7.6


2012-09-13 18:09:26

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [PATCH 0/4] RFC Avoid expired credential keys for buffered writes

T24gVGh1LCAyMDEyLTA5LTEzIGF0IDEzOjU3IC0wNDAwLCBKLiBCcnVjZSBGaWVsZHMgd3JvdGU6
DQo+IE9uIFdlZCwgU2VwIDEyLCAyMDEyIGF0IDA0OjE0OjM4UE0gKzAwMDAsIEFkYW1zb24sIEFu
ZHkgd3JvdGU6DQo+ID4gDQo+ID4gT24gU2VwIDEyLCAyMDEyLCBhdCAxMToyMSBBTSwgTXlrbGVi
dXN0LCBUcm9uZCB3cm90ZToNCj4gPiANCj4gPiA+IE9uIFdlZCwgMjAxMi0wOS0xMiBhdCAxNTox
MyArMDAwMCwgQWRhbXNvbiwgQW5keSB3cm90ZToNCj4gPiA+PiBBZnRlciBkb2luZyBtb3JlIHRl
c3QgdmVyaWZpY2F0aW9uLCBoZXJlIGFyZSB0aGUgcmVhc29ucyBmb3IgdGhlIGxvdyB3YXRlcm1h
cmsuIFJlYXNvbiAjMiBpcyB0aGUgc3Ryb25nZXN0IGp1c3RpZmljYXRpb24uDQo+IA0KPiAxIGFu
ZCAyIGRvbid0IHNvdW5kIHJpZ2h0LiAgV2hhdCBleGFjdGx5IHdlcmUgdGhlIHRlc3QgZmFpbHVy
ZXM/DQo+IA0KPiBUaGUgY2xpZW50IGFuZCBzZXJ2ZXIncyBnc3MgY29kZSBhbHJlYWR5IGNoZWNr
IHRoZSBjb250ZXh0IGV4cGlyeSBmb3INCj4gdXMtLXdlIGRvbid0IHdhbnQgYW4gZXh0cmEgY2hl
Y2sgaW4gYW4gdXBwZXIgbGF5ZXIgb24gdGhlIGNsaWVudC4NCj4gDQo+IFRoZSBjb250ZXh0ICp3
aWxsKiBleHBpcmUgdW5leHBlY3RlZGx5IHNvbWV0aW1lcywgYW5kIHdlIGRvIGhhdmUgdG8NCj4g
aGFuZGxlIHRoYXQuICAoVGhlIHNlcnZlcidzIGNsb2NrIGNvdWxkIGJlIGEgdGFkIGZhc3RlciB0
aGFuIHRoZQ0KPiBzZXJ2ZXIncywgb3IgdGhlIHNlcnZlciBjb3VsZCByZWJvb3QsIGV0Yy4sIGV0
Yy4pDQo+IA0KPiBJIGFncmVlIHdpdGggYWxsIHRoZSBzdWdnZXN0aW9ucyBmb3IgdHJ5aW5nIHRv
IGFudGljaXBhdGUgZXhwaXJ5IGluIHRoZQ0KPiBub3JtYWwgY2FzZXMgYW5kIHByZXBhcmluZyB0
byBtaW5pbWl6ZSB0aGUgZGFtYWdlLCB0aGF0J3MgZmluZS4gIEJ1dA0KPiBvbmNlIHRoZSBleHBp
cnkgZmluYWxseSBjb21lcyB3ZSBzaG91bGQgbGVhdmUgdGhlIGV4aXN0aW5nIG1lY2hhbmlzbXMg
dG8NCj4gZG8gdGhlaXIgam9iLg0KDQpSaWdodCwgYnV0IHRoZSBwcm9ibGVtIHRoYXQgdGhlIGV4
aXN0aW5nIG1lY2hhbmlzbXMgaGF2ZSBpcyB0aGF0IGR1ZSB0bw0KYXN5bmNocm9ub3VzIHJlYWRz
IGFuZCB3cml0ZXMsIHRoZSBhcHBsaWNhdGlvbiBjYW4gZW5kIHVwIGVhdGluZw0Kc2l6ZWFibGUg
Y2h1bmtzIG9mIG1lbW9yeS4gSXQgY2FuIGFsc28gZW5kIHVwIGdyYWJiaW5nIGxvY2tzIHdpdGhv
dXQNCmJlaW5nIGFibGUgdG8gZnJlZSB0aGVtIGFmdGVyd2FyZHMuDQoNClNQNF9NQUNIX0NSRUQg
c29sdmVzIG1vc3Qgb2YgdGhlc2UgaXNzdWVzLCBidXQgTkZTdjQuMSBpcyBsZXNzIHBlcnZhc2l2
ZQ0KdGhhbiBORlN2NCBhdCB0aGlzIHBvaW50LCBzbyBpdCB3b3VsZCBiZSBuaWNlIHRvIGhhdmUg
YSBzb2x1dGlvbiBmb3IgdGhlDQpsYXR0ZXIgdG9vLg0KDQotLSANClRyb25kIE15a2xlYnVzdA0K
TGludXggTkZTIGNsaWVudCBtYWludGFpbmVyDQoNCk5ldEFwcA0KVHJvbmQuTXlrbGVidXN0QG5l
dGFwcC5jb20NCnd3dy5uZXRhcHAuY29tDQo=