Hi,
I have a report of an NFS server which runs out of kernel memory when
it gets heavy rpcsec_gss traffic (auth_sys doesn't trigger the
problem so it must be gss related).

From looking at /proc/slab_allocators it seems that the main user of
memory is the rsc and rsi caches.
It appears entries are inserted into these caches with an expiry of
'forever' so they grow but never shrink.
We should fix this.

For the rsi (init) cache I assume the entry is only needed once so a
short expiry of (say) one minute should be plenty.
For the rsc (context) cache, the entry could be needed repeatedly
during the lifetime of a 'session'.  However eventually it will
become stale and should be allowed to expire.

I assume that if the kernel requests a particular entry a second
time, an hour later, it will get the same answer - is that correct?

In that case, setting the expiry to something largish seems
appropriate.

Hence the following patch (untested yet - but I will get it tested in
due course).

Does this seem reasonable?

Thanks,
NeilBrown

diff --git a/utils/gssd/svcgssd_proc.c b/utils/gssd/svcgssd_proc.c
index 794c2f4..088a007 100644
--- a/utils/gssd/svcgssd_proc.c
+++ b/utils/gssd/svcgssd_proc.c
@@ -86,7 +86,9 @@ do_svc_downcall(gss_buffer_desc *out_handle, struct svc_cred *cred,
}
qword_printhex(f, out_handle->value, out_handle->length);
/* XXX are types OK for the rest of this? */
- qword_printint(f, 0x7fffffff); /*XXX need a better timeout */
+
+ /* 'context' could be needed for a while. */
+ qword_printint(f, time(0) + 60*60);
qword_printint(f, cred->cr_uid);
qword_printint(f, cred->cr_gid);
qword_printint(f, cred->cr_ngroups);
@@ -130,7 +132,8 @@ send_response(FILE *f, gss_buffer_desc *in_handle, gss_buffer_desc *in_token,

qword_addhex(&bp, &blen, in_handle->value, in_handle->length);
qword_addhex(&bp, &blen, in_token->value, in_token->length);
- qword_addint(&bp, &blen, 0x7fffffff); /*XXX need a better timeout */
+ /* INIT context info will only be needed for a short while */
+ qword_addint(&bp, &blen, time(0) + 60);
qword_adduint(&bp, &blen, maj_stat);
qword_adduint(&bp, &blen, min_stat);
qword_addhex(&bp, &blen, out_handle->value, out_handle->length);
Hi Neil,
This seems reasonable.
I have a patch somewhere that gets the actual Kerberos expiration that
could be used for the rsc timeout. But I think this should be fine
for now. (Perhaps at the cost of requiring clients to negotiate a new
context every hour?)
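Roughly, the idea is to ask the GSS-API for the remaining lifetime of the
established context and use that as the rsc expiry.  A sketch only (not my
actual patch; it assumes the established gss_ctx_id_t is available as 'ctx'
where do_svc_downcall() is called, and keeps the one-hour fallback):

/*
 * Sketch: derive the rsc expiry from the remaining GSS context lifetime
 * via the standard GSS-API call gss_inquire_context().  'ctx' is assumed
 * to be the established context handle.
 */
#include <time.h>
#include <gssapi/gssapi.h>

static time_t
rsc_expiry(gss_ctx_id_t ctx)
{
	OM_uint32 maj, min, lifetime = 0;

	maj = gss_inquire_context(&min, ctx, NULL, NULL,
				  &lifetime, NULL, NULL, NULL, NULL);
	if (maj != GSS_S_COMPLETE || lifetime == GSS_C_INDEFINITE)
		return time(0) + 60*60;	/* fall back to one hour */

	return time(0) + lifetime;	/* remaining lifetime in seconds */
}

The downcall would then write rsc_expiry(ctx) in place of the fixed
time(0) + 60*60.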
K.C.
On Tue, Dec 2, 2008 at 12:18 AM, Neil Brown <[email protected]> wrote:
>
>
> Hi,
> I have a report of an NFS server which runs out of kernel memory when
> it gets heavy rpcsec_gss traffic (auth_sys doesn't trigger the
> problem so it must be gss related).
>
> From looking at /proc/slab_allocators it seems that the main user of
> memory is the rsc and rsi caches.
> It appears entries are inserted into these caches with an expiry of
> 'forever' so they grow but never shrink.
> We should fix this.
>
> For the rsi (init) cache I assume the entry is only needed once so a
> short expiry of (say) one minute should be plenty.
> For the rsc (context) cache, the entry could be needed repeatedly
> during the lifetime of a 'session'. However eventually it will
> become stale and should be allowed to expire.
>
> I assume that if the kernel requests a particular entry a second
> time, an hour later, it will get the same answer - is that correct?
>
> In that case, setting the expiry to something largish seems
> appropriate.
>
> Hence the following patch (untested yet - but I will get it tested in
> due course).
>
> Does this seem reasonable?
>
> Thanks,
> NeilBrown
>
>
> diff --git a/utils/gssd/svcgssd_proc.c b/utils/gssd/svcgssd_proc.c
> index 794c2f4..088a007 100644
> --- a/utils/gssd/svcgssd_proc.c
> +++ b/utils/gssd/svcgssd_proc.c
> @@ -86,7 +86,9 @@ do_svc_downcall(gss_buffer_desc *out_handle, struct svc_cred *cred,
> }
> qword_printhex(f, out_handle->value, out_handle->length);
> /* XXX are types OK for the rest of this? */
> - qword_printint(f, 0x7fffffff); /*XXX need a better timeout */
> +
> + /* 'context' could be needed for a while. */
> + qword_printint(f, time(0) + 60*60);
> qword_printint(f, cred->cr_uid);
> qword_printint(f, cred->cr_gid);
> qword_printint(f, cred->cr_ngroups);
> @@ -130,7 +132,8 @@ send_response(FILE *f, gss_buffer_desc *in_handle, gss_buffer_desc *in_token,
>
> qword_addhex(&bp, &blen, in_handle->value, in_handle->length);
> qword_addhex(&bp, &blen, in_token->value, in_token->length);
> - qword_addint(&bp, &blen, 0x7fffffff); /*XXX need a better timeout */
> + /* INIT context info will only be needed for a short while */
> + qword_addint(&bp, &blen, time(0) + 60);
> qword_adduint(&bp, &blen, maj_stat);
> qword_adduint(&bp, &blen, min_stat);
> qword_addhex(&bp, &blen, out_handle->value, out_handle->length);
>
Kevin Coffman wrote:
> Hi Neil,
> This seems reasonable.
>
> I have a patch somewhere that gets the actual Kerberos expiration that
> could be used for the rsc timeout. But I think this should be fine
> for now. (Perhaps at the cost of requiring clients to negotiate a new
> context every hour?)
This question is a bit worrisome, imho... I understand the need to release
memory consumed by dead contexts, but on the other hand, renegotiating
contexts every hour on the hour seems a bit costly as well...

Does it make sense to make this timeout configurable? Yes, it would be
a very obscure knob, but in the unlikely event there is a bug in the
renegotiating code or renegotiating simply becomes too costly, I think
it would be good to have a way to dial the timeout back up as a workaround.
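Something along these lines is what I have in mind (sketch only; the '-T'
option and the 'context_timeout' variable are hypothetical, not an existing
svcgssd interface):

/*
 * Hypothetical sketch: a '-T <seconds>' override for the rsc expiry.
 * qword_printint() is the existing nfs-utils helper; everything else
 * here is made up for illustration.
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

static int context_timeout = 60 * 60;	/* default: one hour */

int main(int argc, char **argv)
{
	int opt;

	while ((opt = getopt(argc, argv, "T:")) != -1) {
		if (opt == 'T')
			context_timeout = atoi(optarg);
	}

	/* do_svc_downcall() would then use the configurable value, e.g.
	 *	qword_printint(f, time(0) + context_timeout);
	 */
	printf("rsc entries would expire at %ld\n",
	       (long)(time(0) + context_timeout));
	return 0;
}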
steved.
On Tue, Dec 2, 2008 at 11:04 AM, Steve Dickson <[email protected]> wrote:
> Kevin Coffman wrote:
>> Hi Neil,
>> This seems reasonable.
>>
>> I have a patch somewhere that gets the actual Kerberos expiration that
>> could be used for the rsc timeout. But I think this should be fine
>> for now. (Perhaps at the cost of requiring clients to negotiate a new
>> context every hour?)
> This question is a bit worrisome, imho... I understand the need to release
> memory consumed by dead contexts, but on the other hand, renegotiating
> contexts every hour on the hour seems a bit costly as well...
>
> Does it make sense to make this timeout configurable? Yes, it would be
> a very obscure knob, but in the unlikely event there is a bug in the
> renegotiating code or renegotiating simply becomes too costly, I think
> it would be good to have a way to dial the timeout back up as a workaround.
>
> steved.
Hi Steve,
Rather than adding another config knob, how 'bout I dust off my patch
to get the "right" timeout value? I should be able to make that
available tomorrow. (I have limited resources to work on this at the
moment.)
K.C.
On Tue, Dec 02, 2008 at 04:18:17PM +1100, Neil Brown wrote:
> I have a report of an NFS server which runs out of kernel memory when
> it gets heavy rpcsec_gss traffic (auth_sys doesn't trigger the
> problem so it must be gss related).
>
> From looking at /proc/slab_allocators it seems that the main user of
> memory is the rsc and rsi caches.
> It appears entries are inserted into these caches with an expiry of
> 'forever' so they grow but never shrink.
> We should fix this.
Yes, definitely a leak. But it's funny, I have a strong memory of a
conversation on one of these lists with someone who looked into this
problem and found that even with some silly artificial tests that
established as many contexts per second as possible, they weren't able
to see significant memory consumption--but I can't find the thread now.
Some limit on the size might also be nice depending on what the worst
case looks like with expiries of about a day.
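(Purely hypothetical numbers, just to size that worst case: if a client could
push, say, 100 new contexts per second and each rsc entry cost on the order of
1KB, a one-day expiry would allow roughly 100 * 86400 = 8,640,000 entries,
i.e. somewhere around 8GB, so a cap might be worthwhile even with sane
expiries.)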
--b.
>
> For the rsi (init) cache I assume the entry is only needed once so a
> short expiry of (say) one minute should be plenty.
> For the rsc (context) cache, the entry could be needed repeatedly
> during the lifetime of a 'session'. However eventually it will
> become stale and should be allowed to expire.
>
> I assume that if the kernel requests a particular entry a second
> time, an hour later, it will get the same answer - is that correct?
>
> In that case, setting the expiry to something largish seems
> appropriate.
>
> Hence the following patch (untested yet - but I will get it tested in
> due course).
>
> Does this seem reasonable?
On Tue, Dec 2, 2008 at 12:40 PM, Kevin Coffman <[email protected]> wrote:
> On Tue, Dec 2, 2008 at 11:04 AM, Steve Dickson <[email protected]> wrote:
>> Kevin Coffman wrote:
>>> Hi Neil,
>>> This seems reasonable.
>>>
>>> I have a patch somewhere that gets the actual Kerberos expiration that
>>> could be used for the rsc timeout. But I think this should be fine
>>> for now. (Perhaps at the cost of requiring clients to negotiate a new
>>> context every hour?)
>> This question is a bit worrisome, imho... I understand the need to release
>> memory consumed by dead contexts, but on the other hand, renegotiating
>> contexts every hour on the hour seems a bit costly as well...
>>
>> Does it make sense to make this timeout configurable? Yes, it would be
>> a very obscure knob, but in the unlikely event there is a bug in the
>> renegotiating code or renegotiating simply becomes too costly, I think
>> it would be good to have a way to dial the timeout back up as a workaround.
>>
>> steved.
>
> Hi Steve,
>
> Rather than adding another config knob, how 'bout I dust off my patch
> to get the "right" timeout value? I should be able to make that
> available tomorrow. (I have limited resources to work on this at the
> moment.)
>
> K.C.
It took me a bit longer than I anticipated to retrofit my changes for
this. I have patches that I will send out for review later tonight or
tomorrow.
K.C.