2013-06-13 02:54:44

by NeilBrown

[permalink] [raw]
Subject: [PATCH 0/5] Fix assorted races in the sunrpc cache.

There are a few races possible in the sunrpc cache which can result in
messages being dropped or memory not being freed.

These patches have been tested extensively and appear to remove all
dropped-message errors and memory leaks.

NeilBrown


---

NeilBrown (5):
sunrpc/cache: remove races with queuing an upcall.
sunrpc/cache: use cache_fresh_unlocked consistently and correctly.
sunrpc/cache: ensure items removed from cache do not have pending upcalls.
net/sunrpc: xpt_auth_cache should be ignored when expired.
sunrpc: Don't schedule an upcall on a replaced cache entry.


include/linux/sunrpc/cache.h | 49 ++++++++++++---------------
net/sunrpc/cache.c | 75 ++++++++++++++++++++++++------------------
net/sunrpc/svcauth_unix.c | 4 +-
3 files changed, 67 insertions(+), 61 deletions(-)

--
Signature



2013-06-13 02:54:51

by NeilBrown

[permalink] [raw]
Subject: [PATCH 1/5] sunrpc/cache: remove races with queuing an upcall.

We currently queue an upcall after setting CACHE_PENDING,
and dequeue after clearing CACHE_PENDING.
So a request should only be present when CACHE_PENDING is set.

However we don't combine the test and the enqueue/dequeue in
a protected region, so it is possible (if unlikely) for a race
to result in a request being queued without CACHE_PENDING set,
or a request to be absent despite CACHE_PENDING.

So: include a test for CACHE_PENDING inside the regions of
enqueue and dequeue where queue_lock is held, and abort
the operation if the value is not as expected.

Also remove the early 'return' from cache_dequeue() to ensure that it
always removes all entries: As there is no locking between setting
CACHE_PENDING and calling sunrpc_cache_pipe_upcall it is not
inconceivable for some other thread to clear CACHE_PENDING and then
someone else to set it and call sunrpc_cache_pipe_upcall, both before
the original threads completed the call.

With this, it perfectly safe and correct to:
- call cache_dequeue() if and only if we have just
cleared CACHE_PENDING
- call sunrpc_cache_pipe_upcall() (via cache_make_upcall)
if and only if we have just set CACHE_PENDING.

Reported-by: Bodo Stroesser <[email protected]>
Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: Bodo Stroesser <[email protected]>
---
net/sunrpc/cache.c | 40 +++++++++++++++++++++++++++++-----------
1 file changed, 29 insertions(+), 11 deletions(-)

diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index 80fe5c8..ce47f45 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -1036,23 +1036,32 @@ static int cache_release(struct inode *inode, struct file *filp,

static void cache_dequeue(struct cache_detail *detail, struct cache_head *ch)
{
- struct cache_queue *cq;
+ struct cache_queue *cq, *tmp;
+ struct cache_request *cr;
+ struct list_head dequeued;
+
+ INIT_LIST_HEAD(&dequeued);
spin_lock(&queue_lock);
- list_for_each_entry(cq, &detail->queue, list)
+ list_for_each_entry_safe(cq, tmp, &detail->queue, list)
if (!cq->reader) {
- struct cache_request *cr = container_of(cq, struct cache_request, q);
+ cr = container_of(cq, struct cache_request, q);
if (cr->item != ch)
continue;
+ if (test_bit(CACHE_PENDING, &ch->flags))
+ /* Lost a race and it is pending again */
+ break;
if (cr->readers != 0)
continue;
- list_del(&cr->q.list);
- spin_unlock(&queue_lock);
- cache_put(cr->item, detail);
- kfree(cr->buf);
- kfree(cr);
- return;
+ list_move(&cr->q.list, &dequeued);
}
spin_unlock(&queue_lock);
+ while (!list_empty(&dequeued)) {
+ cr = list_entry(dequeued.next, struct cache_request, q.list);
+ list_del(&cr->q.list);
+ cache_put(cr->item, detail);
+ kfree(cr->buf);
+ kfree(cr);
+ }
}

/*
@@ -1166,6 +1175,7 @@ int sunrpc_cache_pipe_upcall(struct cache_detail *detail, struct cache_head *h)

char *buf;
struct cache_request *crq;
+ int ret = 0;

if (!detail->cache_request)
return -EINVAL;
@@ -1191,10 +1201,18 @@ int sunrpc_cache_pipe_upcall(struct cache_detail *detail, struct cache_head *h)
crq->len = 0;
crq->readers = 0;
spin_lock(&queue_lock);
- list_add_tail(&crq->q.list, &detail->queue);
+ if (test_bit(CACHE_PENDING, &h->flags))
+ list_add_tail(&crq->q.list, &detail->queue);
+ else
+ /* Lost a race, no longer PENDING, so don't enqueue */
+ ret = -EAGAIN;
spin_unlock(&queue_lock);
wake_up(&queue_wait);
- return 0;
+ if (ret == -EAGAIN) {
+ kfree(buf);
+ kfree(crq);
+ }
+ return ret;
}
EXPORT_SYMBOL_GPL(sunrpc_cache_pipe_upcall);




2013-06-13 02:55:05

by NeilBrown

[permalink] [raw]
Subject: [PATCH 3/5] sunrpc/cache: ensure items removed from cache do not have pending upcalls.

It is possible for a race to set CACHE_PENDING after cache_clean()
has removed a cache entry from the cache.
If CACHE_PENDING is still set when the entry is finally 'put',
the cache_dequeue() will never happen and we can leak memory.

So set a new flag 'CACHE_CLEANED' when we remove something from
the cache, and don't queue any upcall if it is set.

If CACHE_PENDING is set before CACHE_CLEANED, the call that
cache_clean() makes to cache_fresh_unlocked() will free memory
as needed. If CACHE_PENDING is set after CACHE_CLEANED, the
test in sunrpc_cache_pipe_upcall will ensure that the memory
is not allocated.

Reported-by: <[email protected]>
Signed-off-by: NeilBrown <[email protected]>
---
include/linux/sunrpc/cache.h | 1 +
net/sunrpc/cache.c | 6 +++++-
2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/linux/sunrpc/cache.h b/include/linux/sunrpc/cache.h
index 303399b..8419f7d 100644
--- a/include/linux/sunrpc/cache.h
+++ b/include/linux/sunrpc/cache.h
@@ -57,6 +57,7 @@ struct cache_head {
#define CACHE_VALID 0 /* Entry contains valid data */
#define CACHE_NEGATIVE 1 /* Negative entry - there is no match for the key */
#define CACHE_PENDING 2 /* An upcall has been sent but no reply received yet*/
+#define CACHE_CLEANED 3 /* Entry has been cleaned from cache */

#define CACHE_NEW_EXPIRY 120 /* keep new things pending confirmation for 120 seconds */

diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index 4940be0..454e23c 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -306,7 +306,7 @@ EXPORT_SYMBOL_GPL(cache_check);
* a current pointer into that list and into the table
* for that entry.
*
- * Each time clean_cache is called it finds the next non-empty entry
+ * Each time cache_clean is called it finds the next non-empty entry
* in the current table and walks the list in that entry
* looking for entries that can be removed.
*
@@ -453,6 +453,7 @@ static int cache_clean(void)
current_index ++;
spin_unlock(&cache_list_lock);
if (ch) {
+ set_bit(CACHE_CLEANED, &ch->flags);
cache_fresh_unlocked(ch, d);
cache_put(ch, d);
}
@@ -1178,6 +1179,9 @@ int sunrpc_cache_pipe_upcall(struct cache_detail *detail, struct cache_head *h)
warn_no_listener(detail);
return -EINVAL;
}
+ if (test_bit(CACHE_CLEANED, &h->flags))
+ /* Too late to make an upcall */
+ return -EAGAIN;

buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
if (!buf)



2013-06-13 02:55:19

by NeilBrown

[permalink] [raw]
Subject: [PATCH 5/5] sunrpc: Don't schedule an upcall on a replaced cache entry.

When a cache entry is replaced, the "expiry_time" get set to
zero by a call to "cache_fresh_locked(..., 0)" at the end of
"sunrpc_cache_update".

This low expiry time makes cache_check() think that the 'refresh_age'
is negative, so the 'age' is comparatively large and a refresh is
triggered.
However refreshing a replaced entry it pointless, it cannot achieve
anything useful.

So teach cache_check to ignore a low refresh_age when expiry_time
is zero.

Reported-by: Bodo Stroesser <[email protected]>
Signed-off-by: NeilBrown <[email protected]>
---
net/sunrpc/cache.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index d01eb07..13dad67 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -262,7 +262,8 @@ int cache_check(struct cache_detail *detail,
if (rqstp == NULL) {
if (rv == -EAGAIN)
rv = -ENOENT;
- } else if (rv == -EAGAIN || age > refresh_age/2) {
+ } else if (rv == -EAGAIN ||
+ (h->expiry_time != 0 && age > refresh_age/2)) {
dprintk("RPC: Want update, refage=%ld, age=%ld\n",
refresh_age, age);
if (!test_and_set_bit(CACHE_PENDING, &h->flags)) {



2013-06-13 02:54:58

by NeilBrown

[permalink] [raw]
Subject: [PATCH 2/5] sunrpc/cache: use cache_fresh_unlocked consistently and correctly.

cache_fresh_unlocked() is called when a cache entry
has been updated and ensures that if there were any
pending upcalls, they are cleared.

So every time we update a cache entry, we should call this,
and this should be the only way that we try to clear
pending calls (that sort of uniformity makes code sooo much
easier to read).

try_to_negate_entry() will (possibly) mark an entry as
negative. If it doesn't, it is because the entry already
is VALID.
So the entry will be valid on exit, so it is appropriate to
call cache_fresh_unlocked().
So tidy up try_to_negate_entry() to do that, and remove
partial open-coded cache_fresh_unlocked() from the one
call-site of try_to_negate_entry().

In the other branch of the 'switch(cache_make_upcall())',
we again have a partial open-coded version of cache_fresh_unlocked().
Replace that with a real call.

And again in cache_clean(), use a real call to cache_fresh_unlocked().

These call sites might previously have called
cache_revisit_request() if CACHE_PENDING wasn't set.
This is never necessary because cache_revisit_request() can
only do anything if the item is in the cache_defer_hash,
However any time that an item is added to the cache_defer_hash
(setup_deferral), the code immediately tests CACHE_PENDING,
and removes the entry again if it is clear. So all other
places we only need to 'cache_revisit_request' if we've
just cleared CACHE_PENDING.

Reported-by: Bodo Stroesser <[email protected]>
Signed-off-by: NeilBrown <[email protected]>
---
net/sunrpc/cache.c | 20 +++++++-------------
1 file changed, 7 insertions(+), 13 deletions(-)

diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index ce47f45..4940be0 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -228,15 +228,14 @@ static int try_to_negate_entry(struct cache_detail *detail, struct cache_head *h

write_lock(&detail->hash_lock);
rv = cache_is_valid(detail, h);
- if (rv != -EAGAIN) {
- write_unlock(&detail->hash_lock);
- return rv;
+ if (rv == -EAGAIN) {
+ set_bit(CACHE_NEGATIVE, &h->flags);
+ cache_fresh_locked(h, seconds_since_boot()+CACHE_NEW_EXPIRY);
+ rv = -ENOENT;
}
- set_bit(CACHE_NEGATIVE, &h->flags);
- cache_fresh_locked(h, seconds_since_boot()+CACHE_NEW_EXPIRY);
write_unlock(&detail->hash_lock);
cache_fresh_unlocked(h, detail);
- return -ENOENT;
+ return rv;
}

/*
@@ -275,13 +274,10 @@ int cache_check(struct cache_detail *detail,
if (!test_and_set_bit(CACHE_PENDING, &h->flags)) {
switch (cache_make_upcall(detail, h)) {
case -EINVAL:
- clear_bit(CACHE_PENDING, &h->flags);
- cache_revisit_request(h);
rv = try_to_negate_entry(detail, h);
break;
case -EAGAIN:
- clear_bit(CACHE_PENDING, &h->flags);
- cache_revisit_request(h);
+ cache_fresh_unlocked(h, detail);
break;
}
}
@@ -457,9 +453,7 @@ static int cache_clean(void)
current_index ++;
spin_unlock(&cache_list_lock);
if (ch) {
- if (test_and_clear_bit(CACHE_PENDING, &ch->flags))
- cache_dequeue(current_detail, ch);
- cache_revisit_request(ch);
+ cache_fresh_unlocked(ch, d);
cache_put(ch, d);
}
} else



2013-06-13 02:55:12

by NeilBrown

[permalink] [raw]
Subject: [PATCH 4/5] net/sunrpc: xpt_auth_cache should be ignored when expired.

commit d202cce8963d9268ff355a386e20243e8332b308
sunrpc: never return expired entries in sunrpc_cache_lookup

moved the 'entry is expired' test from cache_check to
sunrpc_cache_lookup, so that it happened early and some races could
safely be ignored.

However the ip_map (in svcauth_unix.c) has a separate single-item
cache which allows quick lookup without locking. An entry in this
case would not be subject to the expiry test and so could be used
well after it has expired.

This is not normally a big problem because the first time it is used
after it is expired an up-call will be scheduled to refresh the entry
(if it hasn't been scheduled already) and the old entry will then
be invalidated. So on the second attempt to use it after it has
expired, ip_map_cached_get will discard it.

However that is subtle and not ideal, so replace the "!cache_valid"
test with "cache_is_expired".
In doing this we drop the test on the "CACHE_VALID" bit. This is
unnecessary as the bit is never cleared, and an entry will only
be cached if the bit is set.

Reported-by: Bodo Stroesser <[email protected]>
Signed-off-by: NeilBrown <[email protected]>
---
include/linux/sunrpc/cache.h | 48 ++++++++++++++++++------------------------
net/sunrpc/cache.c | 6 -----
net/sunrpc/svcauth_unix.c | 4 ++--
3 files changed, 23 insertions(+), 35 deletions(-)

diff --git a/include/linux/sunrpc/cache.h b/include/linux/sunrpc/cache.h
index 8419f7d..6ce690d 100644
--- a/include/linux/sunrpc/cache.h
+++ b/include/linux/sunrpc/cache.h
@@ -149,6 +149,24 @@ struct cache_deferred_req {
int too_many);
};

+/*
+ * timestamps kept in the cache are expressed in seconds
+ * since boot. This is the best for measuring differences in
+ * real time.
+ */
+static inline time_t seconds_since_boot(void)
+{
+ struct timespec boot;
+ getboottime(&boot);
+ return get_seconds() - boot.tv_sec;
+}
+
+static inline time_t convert_to_wallclock(time_t sinceboot)
+{
+ struct timespec boot;
+ getboottime(&boot);
+ return boot.tv_sec + sinceboot;
+}

extern const struct file_operations cache_file_operations_pipefs;
extern const struct file_operations content_file_operations_pipefs;
@@ -182,15 +200,10 @@ static inline void cache_put(struct cache_head *h, struct cache_detail *cd)
kref_put(&h->ref, cd->cache_put);
}

-static inline int cache_valid(struct cache_head *h)
+static inline int cache_is_expired(struct cache_detail *detail, struct cache_head *h)
{
- /* If an item has been unhashed pending removal when
- * the refcount drops to 0, the expiry_time will be
- * set to 0. We don't want to consider such items
- * valid in this context even though CACHE_VALID is
- * set.
- */
- return (h->expiry_time != 0 && test_bit(CACHE_VALID, &h->flags));
+ return (h->expiry_time < seconds_since_boot()) ||
+ (detail->flush_time > h->last_refresh);
}

extern int cache_check(struct cache_detail *detail,
@@ -251,25 +264,6 @@ static inline int get_uint(char **bpp, unsigned int *anint)
return 0;
}

-/*
- * timestamps kept in the cache are expressed in seconds
- * since boot. This is the best for measuring differences in
- * real time.
- */
-static inline time_t seconds_since_boot(void)
-{
- struct timespec boot;
- getboottime(&boot);
- return get_seconds() - boot.tv_sec;
-}
-
-static inline time_t convert_to_wallclock(time_t sinceboot)
-{
- struct timespec boot;
- getboottime(&boot);
- return boot.tv_sec + sinceboot;
-}
-
static inline time_t get_expiry(char **bpp)
{
int rv;
diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index 454e23c..d01eb07 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -50,12 +50,6 @@ static void cache_init(struct cache_head *h)
h->last_refresh = now;
}

-static inline int cache_is_expired(struct cache_detail *detail, struct cache_head *h)
-{
- return (h->expiry_time < seconds_since_boot()) ||
- (detail->flush_time > h->last_refresh);
-}
-
struct cache_head *sunrpc_cache_lookup(struct cache_detail *detail,
struct cache_head *key, int hash)
{
diff --git a/net/sunrpc/svcauth_unix.c b/net/sunrpc/svcauth_unix.c
index 06bdf5a..a98853d 100644
--- a/net/sunrpc/svcauth_unix.c
+++ b/net/sunrpc/svcauth_unix.c
@@ -347,13 +347,13 @@ ip_map_cached_get(struct svc_xprt *xprt)
spin_lock(&xprt->xpt_lock);
ipm = xprt->xpt_auth_cache;
if (ipm != NULL) {
- if (!cache_valid(&ipm->h)) {
+ sn = net_generic(xprt->xpt_net, sunrpc_net_id);
+ if (cache_is_expired(sn->ip_map_cache, &ipm->h)) {
/*
* The entry has been invalidated since it was
* remembered, e.g. by a second mount from the
* same IP address.
*/
- sn = net_generic(xprt->xpt_net, sunrpc_net_id);
xprt->xpt_auth_cache = NULL;
spin_unlock(&xprt->xpt_lock);
cache_put(&ipm->h, sn->ip_map_cache);



2013-07-02 01:53:26

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH 0/5] Fix assorted races in the sunrpc cache.

On Mon, 1 Jul 2013 20:39:58 -0400 "J. Bruce Fields" <[email protected]>
wrote:

> On Thu, Jun 13, 2013 at 12:53:42PM +1000, NeilBrown wrote:
> > There are a few races possible in the sunrpc cache which can result in
> > messages being dropped or memory not being freed.
> >
> > These patches have been tested extensively and appear to remove all
> > dropped-message errors and memory leaks.
>
> OK, thanks very much to both of you for keeping at these. Applied to
>
> git://linux-nfs.org/~bfields/linux.git for-3.11
>
> Some trivial fixup to the second patch was needed due to
> b6040f9706c4c81cc50b50855ed70840f022bebb "sunrpc: the cache_detail in
> cache_is_valid is unused any more" (but you might want to double-check
> to make sure I didn't typo something).
>
> --b.

Thanks. I couldn't see any typos. And if the compiler doesn't either, that
settles it :-)

NeilBrown


Attachments:
signature.asc (828.00 B)

2013-07-02 00:40:02

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 0/5] Fix assorted races in the sunrpc cache.

On Thu, Jun 13, 2013 at 12:53:42PM +1000, NeilBrown wrote:
> There are a few races possible in the sunrpc cache which can result in
> messages being dropped or memory not being freed.
>
> These patches have been tested extensively and appear to remove all
> dropped-message errors and memory leaks.

OK, thanks very much to both of you for keeping at these. Applied to

git://linux-nfs.org/~bfields/linux.git for-3.11

Some trivial fixup to the second patch was needed due to
b6040f9706c4c81cc50b50855ed70840f022bebb "sunrpc: the cache_detail in
cache_is_valid is unused any more" (but you might want to double-check
to make sure I didn't typo something).

--b.