2013-12-05 11:01:00

by Jeff Layton

Subject: [PATCH RFC 0/3] nfsd: convert nfsd DRC code to use list_lru infrastructure

This patchset converts the LRU list in the nfsd duplicate reply cache to
use the new list_lru infrastructure. Note that this is based on top of
the patch that I sent to Bruce earlier this week that fixes the
svc_cacherep direct reclaim bug. I'm sending this as an RFC since I'm
not 100% convinced it's an improvement.

The majorly unintuitive thing about list_lru that I've found is that
you can't call list_lru_del from list_lru_walk. So, you need
different routines to free an object depending on how it's being freed.

Using list_lru also means extra spinlocking since we'd be moving to a
per-node LRU, but that's probably not a big deal since all of this is
done under the cache_lock anyway.

In any case, here's what a conversion to list_lru would look like...

Discuss!

Jeff Layton (3):
nfsd: don't try to reuse an expired DRC entry off the list
list_lru: add a new LRU_SKIP_REST lru_status value and handling
nfsd: convert DRC code to use list_lru

fs/nfsd/nfscache.c | 130 +++++++++++++++++++++++++----------------------
include/linux/list_lru.h | 2 +
mm/list_lru.c | 4 +-
3 files changed, 75 insertions(+), 61 deletions(-)

--
1.8.4.2



2013-12-05 13:42:23

by Jeff Layton

Subject: Re: [PATCH RFC 1/3] nfsd: don't try to reuse an expired DRC entry off the list

On Thu, 5 Dec 2013 05:29:19 -0800
Christoph Hellwig <[email protected]> wrote:

> On Thu, Dec 05, 2013 at 06:00:51AM -0500, Jeff Layton wrote:
> > Currently when we are processing a request, we try to scrape an expired
> > or over-limit entry off the list in preference to allocating a new one
> > from the slab.
> >
> > This is unnecessarily complicated. Just use the slab layer.
>
> I really don't think pruning caches from lookup functions is a good
> idea. Too many chances for locking inversions, unbounded runtime and
> other issues. So I'd prefer my earlier version of the patch to
> remove it entirely if we want to change anything beyond your minimal
> fix.
>

It's not likely to hit a locking inversion here since we don't have a
lot of different locks, but point taken...

If we take that out, then that does mean that the cache may grow larger
than the max...possibly much larger since the pruner workqueue job
only runs every 120s and you can put a lot more entries into the cache
in that period.

That said, I don't feel too strongly about it, so if you think leaving
it to the workqueue job and the shrinker is the right thing to do, then
so be it.

--
Jeff Layton <[email protected]>

2013-12-05 13:29:20

by Christoph Hellwig

Subject: Re: [PATCH RFC 1/3] nfsd: don't try to reuse an expired DRC entry off the list

On Thu, Dec 05, 2013 at 06:00:51AM -0500, Jeff Layton wrote:
> Currently when we are processing a request, we try to scrape an expired
> or over-limit entry off the list in preference to allocating a new one
> from the slab.
>
> This is unnecessarily complicated. Just use the slab layer.

I really don't think pruning caches from lookup functions is a good
idea. Too many chances for locking inversions, unbounded runtime and
other issues. So I'd prefer my earlier version of the patch to
remove it entirely if we want to change anything beyond your minimal
fix.


2013-12-11 16:31:53

by J. Bruce Fields

Subject: Re: [PATCH RFC 3/3] nfsd: convert DRC code to use list_lru

On Thu, Dec 05, 2013 at 10:58:23AM -0500, J. Bruce Fields wrote:
> On Thu, Dec 05, 2013 at 08:48:25AM -0500, Jeff Layton wrote:
> > On Thu, 5 Dec 2013 05:41:29 -0800
> > Christoph Hellwig <[email protected]> wrote:
> > > All together it doesn't seem too helpful as long as the DRC keeps
> > > its own timing-based purge and similar bits, unfortunately.
> >
> > Agreed. It might make sense to do this in the context of a larger
> > redesign but otherwise I can't get too excited about this set, TBH...
>
> Well, patch 1/3 still deletes some code and doesn't have any obvious
> downsides.

I'm applying that one patch and dropping the rest for now.

I understand Christoph's request to remove the one-off cache pruning but
first want to be convinced that's safe.

--b.

2013-12-05 15:58:25

by J. Bruce Fields

Subject: Re: [PATCH RFC 3/3] nfsd: convert DRC code to use list_lru

On Thu, Dec 05, 2013 at 08:48:25AM -0500, Jeff Layton wrote:
> On Thu, 5 Dec 2013 05:41:29 -0800
> Christoph Hellwig <[email protected]> wrote:
> > All together it doesn't seem too helpful as long as the DRC keeps
> > its own timing-based purge and similar bits, unfortunately.
>
> Agreed. It might make sense to do this in the context of a larger
> redesign but otherwise I can't get too excited about this set, TBH...

Well, patch 1/3 still deletes some code and doesn't have any obvious
downsides.

--b.

2013-12-05 11:01:03

by Jeff Layton

Subject: [PATCH RFC 1/3] nfsd: don't try to reuse an expired DRC entry off the list

Currently when we are processing a request, we try to scrape an expired
or over-limit entry off the list in preference to allocating a new one
from the slab.

This is unnecessarily complicated. Just use the slab layer.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/nfscache.c | 36 ++++--------------------------------
1 file changed, 4 insertions(+), 32 deletions(-)

diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c
index b6af150..f8f060f 100644
--- a/fs/nfsd/nfscache.c
+++ b/fs/nfsd/nfscache.c
@@ -132,13 +132,6 @@ nfsd_reply_cache_alloc(void)
}

static void
-nfsd_reply_cache_unhash(struct svc_cacherep *rp)
-{
- hlist_del_init(&rp->c_hash);
- list_del_init(&rp->c_lru);
-}
-
-static void
nfsd_reply_cache_free_locked(struct svc_cacherep *rp)
{
if (rp->c_type == RC_REPLBUFF && rp->c_replvec.iov_base) {
@@ -416,22 +409,8 @@ nfsd_cache_lookup(struct svc_rqst *rqstp)

/*
* Since the common case is a cache miss followed by an insert,
- * preallocate an entry. First, try to reuse the first entry on the LRU
- * if it works, then go ahead and prune the LRU list.
+ * preallocate an entry.
*/
- spin_lock(&cache_lock);
- if (!list_empty(&lru_head)) {
- rp = list_first_entry(&lru_head, struct svc_cacherep, c_lru);
- if (nfsd_cache_entry_expired(rp) ||
- num_drc_entries >= max_drc_entries) {
- nfsd_reply_cache_unhash(rp);
- prune_cache_entries();
- goto search_cache;
- }
- }
-
- /* No expired ones available, allocate a new one. */
- spin_unlock(&cache_lock);
rp = nfsd_reply_cache_alloc();
spin_lock(&cache_lock);
if (likely(rp)) {
@@ -439,7 +418,9 @@ nfsd_cache_lookup(struct svc_rqst *rqstp)
drc_mem_usage += sizeof(*rp);
}

-search_cache:
+ /* go ahead and prune the cache */
+ prune_cache_entries();
+
found = nfsd_cache_search(rqstp, csum);
if (found) {
if (likely(rp))
@@ -453,15 +434,6 @@ search_cache:
goto out;
}

- /*
- * We're keeping the one we just allocated. Are we now over the
- * limit? Prune one off the tip of the LRU in trade for the one we
- * just allocated if so.
- */
- if (num_drc_entries >= max_drc_entries)
- nfsd_reply_cache_free_locked(list_first_entry(&lru_head,
- struct svc_cacherep, c_lru));
-
nfsdstats.rcmisses++;
rqstp->rq_cacherep = rp;
rp->c_state = RC_INPROG;
--
1.8.4.2


2013-12-05 13:41:29

by Christoph Hellwig

Subject: Re: [PATCH RFC 3/3] nfsd: convert DRC code to use list_lru

> static void
> -nfsd_reply_cache_free_locked(struct svc_cacherep *rp)
> +__nfsd_reply_cache_free_locked(struct svc_cacherep *rp)
> {
> if (rp->c_type == RC_REPLBUFF && rp->c_replvec.iov_base) {
> drc_mem_usage -= rp->c_replvec.iov_len;
> @@ -140,13 +141,26 @@ nfsd_reply_cache_free_locked(struct svc_cacherep *rp)
> }
> if (!hlist_unhashed(&rp->c_hash))
> hlist_del(&rp->c_hash);
> - list_del(&rp->c_lru);
> --num_drc_entries;
> drc_mem_usage -= sizeof(*rp);
> kmem_cache_free(drc_slab, rp);

It would be better to move the hash list deletion out of this and
keep this as a plain nfsd_reply_cache_free(). E.g. in the lookup
cache hit case we've never linked any item and would want this low-level
function.

> }
>
> static void
> +nfsd_reply_cache_free_locked(struct svc_cacherep *rp)
> +{
> + list_lru_del(&lru_head, &rp->c_lru);
> + __nfsd_reply_cache_free_locked(rp);
> +}
> +
> +static void
> +nfsd_reply_cache_free_isolate(struct svc_cacherep *rp)
> +{
> + list_del(&rp->c_lru);
> + __nfsd_reply_cache_free_locked(rp);
> +}

Should be merged into the only caller.

> +
> +static void
> nfsd_reply_cache_free(struct svc_cacherep *rp)
> {
> spin_lock(&cache_lock);
> @@ -156,50 +170,66 @@ nfsd_reply_cache_free(struct svc_cacherep *rp)
>

> +static enum lru_status
> +nfsd_purge_lru_entry(struct list_head *item, spinlock_t *lock, void *cb_arg)
> {
> - struct svc_cacherep *rp;
> + struct svc_cacherep *rp = list_entry(item, struct svc_cacherep, c_lru);
>
> + nfsd_reply_cache_free_locked(rp);
> + return LRU_REMOVED;
> +}
> +
> +void nfsd_reply_cache_shutdown(void)
> +{
> unregister_shrinker(&nfsd_reply_cache_shrinker);
> cancel_delayed_work_sync(&cache_cleaner);
>
> - while (!list_empty(&lru_head)) {
> - rp = list_entry(lru_head.next, struct svc_cacherep, c_lru);
> - nfsd_reply_cache_free_locked(rp);
> - }
> + /* In principle, nothing should be altering the list now, but... */
> + spin_lock(&cache_lock);
> + list_lru_walk(&lru_head, nfsd_purge_lru_entry, NULL, ULONG_MAX);
> + spin_unlock(&cache_lock);

This needs the version that does the list_del internally too, doesn't
it? I'd suggest just using nfsd_purge_expired_entry with a forced
flag in the argument.

> + memset(&lru_head, 0, sizeof(lru_head));

don't think there's much of a point.


All together it doesn't seem too helpful as long as the DRC keeps
its own timing-based purge and similar bits, unfortunately.

2013-12-05 11:01:04

by Jeff Layton

Subject: [PATCH RFC 3/3] nfsd: convert DRC code to use list_lru

Rather than keeping our own LRU list, convert the nfsd DRC code to use
the new list_lru routines.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/nfscache.c | 94 ++++++++++++++++++++++++++++++++++++++----------------
1 file changed, 66 insertions(+), 28 deletions(-)

diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c
index f8f060f..7f5480f 100644
--- a/fs/nfsd/nfscache.c
+++ b/fs/nfsd/nfscache.c
@@ -13,6 +13,7 @@
#include <linux/highmem.h>
#include <linux/log2.h>
#include <linux/hash.h>
+#include <linux/list_lru.h>
#include <net/checksum.h>

#include "nfsd.h"
@@ -28,7 +29,7 @@
#define TARGET_BUCKET_SIZE 64

static struct hlist_head * cache_hash;
-static struct list_head lru_head;
+static struct list_lru lru_head;
static struct kmem_cache *drc_slab;

/* max number of entries allowed in the cache */
@@ -132,7 +133,7 @@ nfsd_reply_cache_alloc(void)
}

static void
-nfsd_reply_cache_free_locked(struct svc_cacherep *rp)
+__nfsd_reply_cache_free_locked(struct svc_cacherep *rp)
{
if (rp->c_type == RC_REPLBUFF && rp->c_replvec.iov_base) {
drc_mem_usage -= rp->c_replvec.iov_len;
@@ -140,13 +141,26 @@ nfsd_reply_cache_free_locked(struct svc_cacherep *rp)
}
if (!hlist_unhashed(&rp->c_hash))
hlist_del(&rp->c_hash);
- list_del(&rp->c_lru);
--num_drc_entries;
drc_mem_usage -= sizeof(*rp);
kmem_cache_free(drc_slab, rp);
}

static void
+nfsd_reply_cache_free_locked(struct svc_cacherep *rp)
+{
+ list_lru_del(&lru_head, &rp->c_lru);
+ __nfsd_reply_cache_free_locked(rp);
+}
+
+static void
+nfsd_reply_cache_free_isolate(struct svc_cacherep *rp)
+{
+ list_del(&rp->c_lru);
+ __nfsd_reply_cache_free_locked(rp);
+}
+
+static void
nfsd_reply_cache_free(struct svc_cacherep *rp)
{
spin_lock(&cache_lock);
@@ -156,50 +170,66 @@ nfsd_reply_cache_free(struct svc_cacherep *rp)

int nfsd_reply_cache_init(void)
{
+ int ret;
unsigned int hashsize;

- INIT_LIST_HEAD(&lru_head);
max_drc_entries = nfsd_cache_size_limit();
num_drc_entries = 0;
hashsize = nfsd_hashsize(max_drc_entries);
maskbits = ilog2(hashsize);

register_shrinker(&nfsd_reply_cache_shrinker);
+
+ ret = list_lru_init(&lru_head);
+ if (ret)
+ goto out_error;
+
+ ret = -ENOMEM;
drc_slab = kmem_cache_create("nfsd_drc", sizeof(struct svc_cacherep),
0, 0, NULL);
if (!drc_slab)
- goto out_nomem;
+ goto out_error;

cache_hash = kcalloc(hashsize, sizeof(struct hlist_head), GFP_KERNEL);
if (!cache_hash)
- goto out_nomem;
+ goto out_error;

return 0;
-out_nomem:
- printk(KERN_ERR "nfsd: failed to allocate reply cache\n");
+out_error:
+ printk(KERN_ERR "nfsd: failed to setup reply cache: %d\n", ret);
nfsd_reply_cache_shutdown();
- return -ENOMEM;
+ return ret;
}

-void nfsd_reply_cache_shutdown(void)
+static enum lru_status
+nfsd_purge_lru_entry(struct list_head *item, spinlock_t *lock, void *cb_arg)
{
- struct svc_cacherep *rp;
+ struct svc_cacherep *rp = list_entry(item, struct svc_cacherep, c_lru);

+ nfsd_reply_cache_free_locked(rp);
+ return LRU_REMOVED;
+}
+
+void nfsd_reply_cache_shutdown(void)
+{
unregister_shrinker(&nfsd_reply_cache_shrinker);
cancel_delayed_work_sync(&cache_cleaner);

- while (!list_empty(&lru_head)) {
- rp = list_entry(lru_head.next, struct svc_cacherep, c_lru);
- nfsd_reply_cache_free_locked(rp);
- }
+ /* In principle, nothing should be altering the list now, but... */
+ spin_lock(&cache_lock);
+ list_lru_walk(&lru_head, nfsd_purge_lru_entry, NULL, ULONG_MAX);
+ spin_unlock(&cache_lock);

- kfree (cache_hash);
+ kfree(cache_hash);
cache_hash = NULL;

if (drc_slab) {
kmem_cache_destroy(drc_slab);
drc_slab = NULL;
}
+
+ list_lru_destroy(&lru_head);
+ memset(&lru_head, 0, sizeof(lru_head));
}

/*
@@ -210,7 +240,8 @@ static void
lru_put_end(struct svc_cacherep *rp)
{
rp->c_timestamp = jiffies;
- list_move_tail(&rp->c_lru, &lru_head);
+ list_lru_del(&lru_head, &rp->c_lru);
+ list_lru_add(&lru_head, &rp->c_lru);
schedule_delayed_work(&cache_cleaner, RC_EXPIRE);
}

@@ -231,23 +262,30 @@ nfsd_cache_entry_expired(struct svc_cacherep *rp)
time_after(jiffies, rp->c_timestamp + RC_EXPIRE);
}

+static enum lru_status
+nfsd_purge_expired_entry(struct list_head *item, spinlock_t *lock, void *cb_arg)
+{
+ struct svc_cacherep *rp = list_entry(item, struct svc_cacherep, c_lru);
+
+ if (!nfsd_cache_entry_expired(rp) &&
+ num_drc_entries <= max_drc_entries)
+ return LRU_SKIP_REST;
+
+ nfsd_reply_cache_free_isolate(rp);
+ return LRU_REMOVED;
+}
+
/*
* Walk the LRU list and prune off entries that are older than RC_EXPIRE.
* Also prune the oldest ones when the total exceeds the max number of entries.
*/
-static long
+static unsigned long
prune_cache_entries(void)
{
- struct svc_cacherep *rp, *tmp;
- long freed = 0;
+ unsigned long freed;

- list_for_each_entry_safe(rp, tmp, &lru_head, c_lru) {
- if (!nfsd_cache_entry_expired(rp) &&
- num_drc_entries <= max_drc_entries)
- break;
- nfsd_reply_cache_free_locked(rp);
- freed++;
- }
+ freed = list_lru_walk(&lru_head, nfsd_purge_expired_entry, NULL,
+ ULONG_MAX);

/*
* Conditionally rearm the job. If we cleaned out the list, then
@@ -255,7 +293,7 @@ prune_cache_entries(void)
* Otherwise, we rearm the job or modify the existing one to run in
* RC_EXPIRE since we just ran the pruner.
*/
- if (list_empty(&lru_head))
+ if (num_drc_entries == 0)
cancel_delayed_work(&cache_cleaner);
else
mod_delayed_work(system_wq, &cache_cleaner, RC_EXPIRE);
--
1.8.4.2


2013-12-05 13:48:52

by Jeff Layton

Subject: Re: [PATCH RFC 3/3] nfsd: convert DRC code to use list_lru

On Thu, 5 Dec 2013 05:41:29 -0800
Christoph Hellwig <[email protected]> wrote:

> > static void
> > -nfsd_reply_cache_free_locked(struct svc_cacherep *rp)
> > +__nfsd_reply_cache_free_locked(struct svc_cacherep *rp)
> > {
> > if (rp->c_type == RC_REPLBUFF && rp->c_replvec.iov_base) {
> > drc_mem_usage -= rp->c_replvec.iov_len;
> > @@ -140,13 +141,26 @@ nfsd_reply_cache_free_locked(struct svc_cacherep *rp)
> > }
> > if (!hlist_unhashed(&rp->c_hash))
> > hlist_del(&rp->c_hash);
> > - list_del(&rp->c_lru);
> > --num_drc_entries;
> > drc_mem_usage -= sizeof(*rp);
> > kmem_cache_free(drc_slab, rp);
>
> It would be better to move the hash list deletion out of this and
> keep this as a plain nfsd_reply_cache_free(). E.g. in the lookup
> cache hit case we've never linked any item and would want this low-level
> function.
>

Ok.

> > }
> >
> > static void
> > +nfsd_reply_cache_free_locked(struct svc_cacherep *rp)
> > +{
> > + list_lru_del(&lru_head, &rp->c_lru);
> > + __nfsd_reply_cache_free_locked(rp);
> > +}
> > +
> > +static void
> > +nfsd_reply_cache_free_isolate(struct svc_cacherep *rp)
> > +{
> > + list_del(&rp->c_lru);
> > + __nfsd_reply_cache_free_locked(rp);
> > +}
>
> Should be merged into the only caller.
>

Ok.

> > +
> > +static void
> > nfsd_reply_cache_free(struct svc_cacherep *rp)
> > {
> > spin_lock(&cache_lock);
> > @@ -156,50 +170,66 @@ nfsd_reply_cache_free(struct svc_cacherep *rp)
> >
>
> > +static enum lru_status
> > +nfsd_purge_lru_entry(struct list_head *item, spinlock_t *lock, void *cb_arg)
> > {
> > - struct svc_cacherep *rp;
> > + struct svc_cacherep *rp = list_entry(item, struct svc_cacherep, c_lru);
> >
> > + nfsd_reply_cache_free_locked(rp);
> > + return LRU_REMOVED;
> > +}
> > +
> > +void nfsd_reply_cache_shutdown(void)
> > +{
> > unregister_shrinker(&nfsd_reply_cache_shrinker);
> > cancel_delayed_work_sync(&cache_cleaner);
> >
> > - while (!list_empty(&lru_head)) {
> > - rp = list_entry(lru_head.next, struct svc_cacherep, c_lru);
> > - nfsd_reply_cache_free_locked(rp);
> > - }
> > + /* In principle, nothing should be altering the list now, but... */
> > + spin_lock(&cache_lock);
> > + list_lru_walk(&lru_head, nfsd_purge_lru_entry, NULL, ULONG_MAX);
> > + spin_unlock(&cache_lock);
>
> This needs the version that does the list_del internally too, doesn't
> it? I'd suggest just using nfsd_purge_expired_entry with a forced
> flag in the argument.
>

Yes it does. Good catch.

> > + memset(&lru_head, 0, sizeof(lru_head));
>
> don't think there's much of a point.
>

Yeah, I got spooked by the fact that lru->node isn't zeroed out in
list_lru_destroy, but on list_lru_init it should get clobbered anyway
so the memset is pointless.

>
> All together it doesn't seem too helpful as long as the DRC keeps
> its own timing-based purge and similar bits, unfortunately.

Agreed. It might make sense to do this in the context of a larger
redesign but otherwise I can't get too excited about this set, TBH...

--
Jeff Layton <[email protected]>

2013-12-05 15:50:35

by J. Bruce Fields

Subject: Re: [PATCH RFC 1/3] nfsd: don't try to reuse an expired DRC entry off the list

On Thu, Dec 05, 2013 at 08:41:56AM -0500, Jeff Layton wrote:
> On Thu, 5 Dec 2013 05:29:19 -0800
> Christoph Hellwig <[email protected]> wrote:
>
> > On Thu, Dec 05, 2013 at 06:00:51AM -0500, Jeff Layton wrote:
> > > Currently when we are processing a request, we try to scrape an expired
> > > or over-limit entry off the list in preference to allocating a new one
> > > from the slab.
> > >
> > > This is unnecessarily complicated. Just use the slab layer.
> >
> > I really don't think pruning caches from lookup functions is a good
> > idea. Too many chances for locking inversions, unbounded runtime and
> > other issues. So I'd prefer my earlier version of the patch to
> > remove it entirely if we want to change anything beyond your minimal
> > fix.
> >
>
> It's not likely to hit a locking inversion here since we don't have a
> lot of different locks, but point taken...
>
> If we take that out, then that does mean that the cache may grow larger
> than the max...possibly much larger since the pruner workqueue job
> only runs every 120s and you can put a lot more entries into the cache
> in that period.

Yeah, this scares me.

I looked through my old mail but couldn't find your previous results:
how long did you find the hash buckets needed to be before the
difference was noticeable on your setup?

We typically do a failed-lookup-and-insert on every cached operation so
I wouldn't be surprised if a workload of small random writes, for
example, could produce an unreasonably large cache in that time.

--b.

>
> That said, I don't feel too strongly about it, so if you think leaving
> it to the workqueue job and the shrinker is the right thing to do, then
> so be it.
>
> --
> Jeff Layton <[email protected]>

2013-12-05 13:30:08

by Christoph Hellwig

Subject: Re: [PATCH RFC 2/3] list_lru: add a new LRU_SKIP_REST lru_status value and handling

On Thu, Dec 05, 2013 at 06:00:52AM -0500, Jeff Layton wrote:
> The current list_lru_walk_node implementation always walks the entire
> list. In some cases, we can be sure that when an entry isn't freeable,
> none of the rest of the entries on the list will be either.
>
> Create a new LRU_SKIP_REST return value that not only indicates that
> the current entry was skipped, but that the caller should stop scanning
> the current node.
>
> Signed-off-by: Jeff Layton <[email protected]>

I'd call it LRU_DONE, but otherwise this looks fine to me.


2013-12-05 13:36:59

by Jeff Layton

Subject: Re: [PATCH RFC 2/3] list_lru: add a new LRU_SKIP_REST lru_status value and handling

On Thu, 5 Dec 2013 05:30:08 -0800
Christoph Hellwig <[email protected]> wrote:

> On Thu, Dec 05, 2013 at 06:00:52AM -0500, Jeff Layton wrote:
> > The current list_lru_walk_node implementation always walks the entire
> > list. In some cases, we can be sure that when an entry isn't freeable,
> > none of the rest of the entries on the list will be either.
> >
> > Create a new LRU_SKIP_REST return value that not only indicates that
> > the current entry was skipped, but that the caller should stop scanning
> > the current node.
> >
> > Signed-off-by: Jeff Layton <[email protected]>
>
> I'd call it LRU_DONE, but otherwise this looks fine to me.
>

Ok, if we end up taking this in I'll change the name...

Thanks,
--
Jeff Layton <[email protected]>

2013-12-05 13:37:55

by Jeff Layton

Subject: Re: [PATCH RFC 0/3] nfsd: convert nfsd DRC code to use list_lru infrastructure

On Thu, 5 Dec 2013 05:27:17 -0800
Christoph Hellwig <[email protected]> wrote:

> On Thu, Dec 05, 2013 at 06:00:50AM -0500, Jeff Layton wrote:
> > This patchset converts the LRU list in the nfsd duplicate reply cache to
> > use the new list_lru infrastructure. Note that this is based on top of
> > the patch that I sent to Bruce earlier this week that fixes the
> > svc_cacherep direct reclaim bug. I'm sending this as an RFC since I'm
> > not 100% convinced it's an improvement.
> >
> > The majorly unintuitive thing about list_lru that I've found is that
> > you can't call list_lru_del from list_lru_walk. So, you need
> > different routines to free an object depending on how it's being freed.
>
> That's because list_lru_walk already does all the accounting. You can
> simply do a list_del_init and then return LRU_REMOVED and you're done.
>

Well no...we have to take it off the hlist too, kmem_cache_free the
object and fix up the accounting for the stats.

list_lru handles the accounting for the lru list itself, but not for
the rest of the housekeeping we have to do.

--
Jeff Layton <[email protected]>

2013-12-05 16:22:53

by Jeff Layton

Subject: Re: [PATCH RFC 1/3] nfsd: don't try to reuse an expired DRC entry off the list

On Thu, 5 Dec 2013 10:50:24 -0500
"J. Bruce Fields" <[email protected]> wrote:

> On Thu, Dec 05, 2013 at 08:41:56AM -0500, Jeff Layton wrote:
> > On Thu, 5 Dec 2013 05:29:19 -0800
> > Christoph Hellwig <[email protected]> wrote:
> >
> > > On Thu, Dec 05, 2013 at 06:00:51AM -0500, Jeff Layton wrote:
> > > > Currently when we are processing a request, we try to scrape an expired
> > > > or over-limit entry off the list in preference to allocating a new one
> > > > from the slab.
> > > >
> > > > This is unnecessarily complicated. Just use the slab layer.
> > >
> > > I really don't think pruning caches from lookup functions is a good
> > > idea. Too many chances for locking inversions, unbounded runtime and
> > > other issues. So I'd prefer my earlier version of the patch to
> > > remove it entirely if we want to change anything beyond your minimal
> > > fix.
> > >
> >
> > It's not likely to hit a locking inversion here since we don't have a
> > lot of different locks, but point taken...
> >
> > If we take that out, then that does mean that the cache may grow larger
> > than the max...possibly much larger since the pruner workqueue job
> > only runs every 120s and you can put a lot more entries into the cache
> > in that period.
>
> Yeah, this scares me.
>
> I looked through my old mail but couldn't find your previous results:
> how long did you find the hash buckets needed to be before the
> difference was noticeable on your setup?
>
> We typically do a failed-lookup-and-insert on every cached operation so
> I wouldn't be surprised if a workload of small random writes, for
> example, could produce an unreasonably large cache in that time.
>

I don't think I ever measured that. I did some measurement of how
long it took to generate checksums, but that's a different phase of
this stuff. I just sort of went with "too long a chain == bad, so keep
them as short as possible".


>
> >
> > That said, I don't feel too strongly about it, so if you think leaving
> > it to the workqueue job and the shrinker is the right thing to do, then
> > so be it.
> >
> > --
> > Jeff Layton <[email protected]>


--
Jeff Layton <[email protected]>

2013-12-05 13:27:19

by Christoph Hellwig

Subject: Re: [PATCH RFC 0/3] nfsd: convert nfsd DRC code to use list_lru infrastructure

On Thu, Dec 05, 2013 at 06:00:50AM -0500, Jeff Layton wrote:
> This patchset converts the LRU list in the nfsd duplicate reply cache to
> use the new list_lru infrastructure. Note that this is based on top of
> the patch that I sent to Bruce earlier this week that fixes the
> svc_cacherep direct reclaim bug. I'm sending this as an RFC since I'm
> not 100% convinced it's an improvement.
>
> The majorly unintuitive thing about list_lru that I've found is that
> you can't call list_lru_del from list_lru_walk. So, you need
> different routines to free an object depending on how it's being freed.

That's because list_lru_walk already does all the accounting. You can
simply do a list_del_init and then return LRU_REMOVED and you're done.


2013-12-05 11:01:03

by Jeff Layton

Subject: [PATCH RFC 2/3] list_lru: add a new LRU_SKIP_REST lru_status value and handling

The current list_lru_walk_node implementation always walks the entire
list. In some cases, we can be sure that when an entry isn't freeable,
none of the rest of the entries on the list will be either.

Create a new LRU_SKIP_REST return value that not only indicates that
the current entry was skipped, but that the caller should stop scanning
the current node.

Signed-off-by: Jeff Layton <[email protected]>
---
include/linux/list_lru.h | 2 ++
mm/list_lru.c | 4 +++-
2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h
index 3ce5417..6140c3a 100644
--- a/include/linux/list_lru.h
+++ b/include/linux/list_lru.h
@@ -15,6 +15,8 @@ enum lru_status {
LRU_REMOVED, /* item removed from list */
LRU_ROTATE, /* item referenced, give another pass */
LRU_SKIP, /* item cannot be locked, skip */
+ LRU_SKIP_REST, /* item isn't freeable, and none of the rest
+ on the list will be either */
LRU_RETRY, /* item not freeable. May drop the lock
internally, but has to return locked. */
};
diff --git a/mm/list_lru.c b/mm/list_lru.c
index 72f9dec..abec7fb 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -98,6 +98,8 @@ restart:
break;
case LRU_SKIP:
break;
+ case LRU_SKIP_REST:
+ goto done;
case LRU_RETRY:
/*
* The lru lock has been dropped, our list traversal is
@@ -108,7 +110,7 @@ restart:
BUG();
}
}
-
+done:
spin_unlock(&nlru->lock);
return isolated;
}
--
1.8.4.2