Dear all,

we have found a memory leak on an OpenVz7 node and believe it affects
mainline too.

sunrpc_cache_lookup() removes an expired cache_head from the hash;
however, if that cache_head is still waiting for a reply to a submitted
cache_request, both of them can leak forever, because nobody cleans up
unhashed cache_heads.

Originally we had a report about a busy loop device of a stopped
container that had run an NFS server inside. The device was kept busy
by a mount that had been detached from an already destroyed mount
namespace. Using a crash search we found a structure containing a
struct path related to our mount. Eventually we determined that it was
a still-live svc_export struct referenced by a still-live cache_request;
however, both of them pointed to an already freed cache_detail.

We concluded that the cache_detail had been correctly freed during
destruction of the net namespace, while the svc_export with its pinned
struct path, the cache_request and some other structures appear to have
been leaked forever.

This could only happen if the cache_head of the svc_export was removed
from the cache_detail's hash before the cache_detail was destroyed.
Eventually we found that this can happen when sunrpc_cache_lookup()
removes an expired cache_head from the hash.

Usually this works correctly and cache_put(freeme) frees the expired
cache_head. In our case, however, the cache_head holds an extra
reference from a stalled cache_request. Because the cache_head was
removed from the cache_detail's hash, it can no longer be found by
cache_clean(), and its cache_request can never be freed by
cache_dequeue(). The memory leaks forever, exactly as we observed.
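
For reference, the problematic path looks roughly like this (a
simplified sketch of the "check if entry appeared while we slept" pass
in sunrpc_cache_lookup(), now sunrpc_cache_add_entry() in mainline; not
verbatim source):

	hlist_for_each_entry_rcu(tmp, head, cache_list) {
		if (detail->match(tmp, key)) {
			if (cache_is_expired(detail, tmp)) {
				/* expired entry is unhashed here */
				hlist_del_init_rcu(&tmp->cache_list);
				detail->entries--;
				freeme = tmp;
				break;
			}
			cache_get(tmp);
			...
			return tmp;
		}
	}
	...
	if (freeme)
		cache_put(freeme, detail);
	/*
	 * If a queued cache_request still holds a reference on freeme,
	 * this cache_put() does not actually free it.  Since the entry
	 * is no longer hashed, cache_clean() will never visit it again
	 * and cache_dequeue() will never release its request: both the
	 * cache_head and the cache_request leak.
	 */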

After many attempts we have reproduced this situation on an OpenVz7
kernel, but our reproducer is quite long and complex. Unfortunately we
have not yet reproduced the problem on a mainline kernel and have not
validated the patch yet.

It would be great if someone could suggest a simple way to trigger the
described scenario.

We are not sure that our patch is correct; please let us know if our
analysis missed something.

Vasily Averin (1):
  sunrpc: cache_head leak due to queued requests

 net/sunrpc/cache.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)
--
2.17.1
On Wed, Nov 28, 2018 at 11:45:46AM +0300, Vasily Averin wrote:
> [...]
>
> It would be great if someone could suggest a simple way to trigger the
> described scenario.

I think you should be able to produce hung upcalls by flushing the cache
(exportfs -f), then stopping mountd, then trying to access the
filesystem from a client. Does that help?
> We are not sure that our patch is correct; please let us know if our
> analysis missed something.
It looks OK to me, but it would be helpful to have Neil's review too.
I think I'd also copy some of the above into the changelog--e.g. it
might be useful to document that this can manifest as a stray reference
count on a mount.
--b.
On Wed, Nov 28 2018, J. Bruce Fields wrote:
> On Wed, Nov 28, 2018 at 11:45:46AM +0300, Vasily Averin wrote:
>> [...]
>>
>> We are not sure that our patch is correct; please let us know if our
>> analysis missed something.
>
> It looks OK to me, but it would be helpful to have Neil's review too.
Yes, it makes sense to me.
Reviewed-by: NeilBrown <[email protected]>
Thanks,
NeilBrown
On Thu, Nov 29, 2018 at 04:35:12PM +1100, NeilBrown wrote:
> On Wed, Nov 28 2018, J. Bruce Fields wrote:
>
> > [...]
> >
> > It looks OK to me, but it would be helpful to have Neil's review too.
>
> Yes, it makes sense to me.
> Reviewed-by: NeilBrown <[email protected]>
OK, applied for 4.21.

--b.
On Thu, Nov 29 2018, NeilBrown wrote:
> On Wed, Nov 28 2018, J. Bruce Fields wrote:
>
>> [...]
>
> Yes, it makes sense to me.
> Reviewed-by: NeilBrown <[email protected]>
Unfortunately I was wrong. See below.
NeilBrown
From: NeilBrown <[email protected]>
Subject: [PATCH] sunrpc: don't mark uninitialised items as VALID.
A recent commit added a call to cache_fresh_locked()
when an expired item was found.
The call sets the CACHE_VALID flag, so it is important
that the item actually is valid.
There are two ways it could be valid:
1/ If ->update has been called to fill in relevant content
2/ If CACHE_NEGATIVE is set, to say that content doesn't exist.
An expired item that is waiting for an update will be neither.
Setting CACHE_VALID will mean that a subsequent call to cache_put()
will be likely to dereference uninitialised pointers.
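
For reference, this is because a cache type's release callback typically
only releases its content when the item was really filled in; roughly
like this simplified sketch, modelled on expkey_put() in
fs/nfsd/export.c (not verbatim source):

	static void expkey_put(struct kref *ref)
	{
		struct svc_expkey *key = container_of(ref, struct svc_expkey, h.ref);

		/*
		 * ek_path is only initialised once ->update has run, which is
		 * exactly the VALID-and-not-NEGATIVE case.  Marking a
		 * never-updated item VALID makes this path_put() run on an
		 * uninitialised struct path.
		 */
		if (test_bit(CACHE_VALID, &key->h.flags) &&
		    !test_bit(CACHE_NEGATIVE, &key->h.flags))
			path_put(&key->ek_path);
		auth_domain_put(key->ek_client);
		kfree(key);	/* simplified; mainline frees via RCU */
	}
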
So we must make sure the item is valid, and we already have code to do
that in try_to_negate_entry(). This takes the hash lock and so cannot
be used directly, so take out the two lines that we need and use them.
Now cache_fresh_locked() is certain to be called only on
a valid item.
Cc: [email protected] # 2.6.35
Fixes: 4ecd55ea0742 ("sunrpc: fix cache_head leak due to queued request")
Signed-off-by: NeilBrown <[email protected]>
---
net/sunrpc/cache.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index 12bb23b8e0c5..f5f8ce8c3443 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -105,6 +105,8 @@ static struct cache_head *sunrpc_cache_add_entry(struct cache_detail *detail,
 		if (cache_is_expired(detail, tmp)) {
 			hlist_del_init_rcu(&tmp->cache_list);
 			detail->entries --;
+			if (cache_is_valid(tmp) == -EAGAIN)
+				set_bit(CACHE_NEGATIVE, &h->flags);
 			cache_fresh_locked(tmp, 0, detail);
 			freeme = tmp;
 			break;
--
2.14.0.rc0.dirty
On Fri, Apr 05 2019, NeilBrown wrote:
>
> Unfortunately I was wrong. See below.
>
Arrghh.. Ignore that patch. Use this one.
NeilBrown
From b7a8dced1a34869710a5d68a64e8bd37bf2ab426 Mon Sep 17 00:00:00 2001
From: NeilBrown <[email protected]>
Date: Fri, 5 Apr 2019 09:04:41 +1100
Subject: [PATCH] sunrpc: don't mark uninitialised items as VALID.
A recent commit added a call to cache_fresh_locked()
when an expired item was found.
The call sets the CACHE_VALID flag, so it is important
that the item actually is valid.
There are two ways it could be valid:
1/ If ->update has been called to fill in relevant content
2/ if CACHE_NEGATIVE is set, to say that content doesn't exist.
An expired item that is waiting for an update will be neither.
Setting CACHE_VALID will mean that a subsequent call to cache_put()
will be likely to dereference uninitialised pointers.
So we must make sure the item is valid, and we already have code to do
that in try_to_negate_entry(). This takes the hash lock and so cannot
be used directly, so take out the two lines that we need and use them.
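
For reference, try_to_negate_entry() looks roughly like this (simplified
sketch, not verbatim source); the two lines borrowed here are the
cache_is_valid() check and the set_bit(CACHE_NEGATIVE):

	static int try_to_negate_entry(struct cache_detail *detail,
				       struct cache_head *h)
	{
		int rv;

		/* takes the hash lock, so it cannot be called from the
		 * lookup path, which already holds it */
		spin_lock(&detail->hash_lock);
		rv = cache_is_valid(h);
		if (rv == -EAGAIN) {
			set_bit(CACHE_NEGATIVE, &h->flags);
			cache_fresh_locked(h, seconds_since_boot() + CACHE_NEW_EXPIRY,
					   detail);
			rv = -ENOENT;
		}
		spin_unlock(&detail->hash_lock);
		cache_fresh_unlocked(h, detail);
		return rv;
	}
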
Now cache_fresh_locked() is certain to be called only on
a valid item.
Cc: [email protected] # 2.6.35
Fixes: 4ecd55ea0742 ("sunrpc: fix cache_head leak due to queued request")
Signed-off-by: NeilBrown <[email protected]>
---
net/sunrpc/cache.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index 12bb23b8e0c5..261131dfa1f1 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -54,6 +54,7 @@ static void cache_init(struct cache_head *h, struct cache_detail *detail)
 	h->last_refresh = now;
 }
 
+static inline int cache_is_valid(struct cache_head *h);
 static void cache_fresh_locked(struct cache_head *head, time_t expiry,
 			       struct cache_detail *detail);
 static void cache_fresh_unlocked(struct cache_head *head,
@@ -105,6 +106,8 @@ static struct cache_head *sunrpc_cache_add_entry(struct cache_detail *detail,
 		if (cache_is_expired(detail, tmp)) {
 			hlist_del_init_rcu(&tmp->cache_list);
 			detail->entries --;
+			if (cache_is_valid(tmp) == -EAGAIN)
+				set_bit(CACHE_NEGATIVE, &tmp->flags);
 			cache_fresh_locked(tmp, 0, detail);
 			freeme = tmp;
 			break;
--
2.14.0.rc0.dirty
On Fri, Apr 05, 2019 at 11:34:40AM +1100, NeilBrown wrote:
> On Fri, Apr 05 2019, NeilBrown wrote:
> >
> > Unfortunately I was wrong. See below.
> >
>
> Arrghh.. Ignore that patch. Use this one.
Took me a moment to spot the difference. OK! Queuing that up for
5.1....
--b.