2006-05-24 01:24:28

by David Chinner

Subject: [PATCH] Per-superblock unused dentry LRU lists.


As originally described in:

http://marc.theaimsgroup.com/?l=linux-kernel&m=114491661527656&w=2

shrink_dcache_sb() becomes a severe bottleneck when the unused dentry
list becomes long and mounts and unmounts occur frequently on the
machine.

My initial attempt to solve the problem using per-node LRUs
(http://marc.theaimsgroup.com/?l=linux-kernel&m=114537118504917&w=2)
fell apart under the complexity of locking required. This approach can
probably be made to work in the long term, but we need a robust fix
for now....

The patch attached below is based on the suggestion made by Andrew
Morton here:

http://marc.theaimsgroup.com/?l=linux-fsdevel&m=114499224412427&w=2

This approach does not change any of the locking required, so avoids
the locking problems of the per-node lru implementation.

I've attempted to make reclaim fair by keeping track of the last superblock
we pruned, and starting from the next one in the list each time.

Signed-off-by: Dave Chinner <[email protected]>

---
The patch has been stress tested on single and multiple filesystems,
using dbench, postmark and parallel create/unlink load tests. It has also
been running on the problem server since last Saturday which currently
has ~11 million cached dentries, and a dentry_cache slab size of ~27 million
objects.

% diffstat patches/dcache-per-sb-lru
fs/dcache.c | 214 +++++++++++++++++++++++++++++------------------------
fs/super.c | 1
include/linux/fs.h | 1
3 files changed, 121 insertions(+), 95 deletions(-)


Index: 2.6.x-xfs-new/fs/dcache.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/dcache.c 2006-05-15 16:24:44.212207654 +1000
+++ 2.6.x-xfs-new/fs/dcache.c 2006-05-16 14:00:46.327462206 +1000
@@ -62,7 +62,6 @@ static kmem_cache_t *dentry_cache;
static unsigned int d_hash_mask;
static unsigned int d_hash_shift;
static struct hlist_head *dentry_hashtable;
-static LIST_HEAD(dentry_unused);

/* Statistics gathering. */
struct dentry_stat_t dentry_stat = {
@@ -114,6 +113,38 @@ static void dentry_iput(struct dentry *
}
}

+static void
+dentry_lru_add(struct dentry *dentry)
+{
+ list_add(&dentry->d_lru, &dentry->d_sb->s_dentry_lru);
+ dentry_stat.nr_unused++;
+}
+
+static void
+dentry_lru_add_tail(struct dentry *dentry)
+{
+ list_add_tail(&dentry->d_lru, &dentry->d_sb->s_dentry_lru);
+ dentry_stat.nr_unused++;
+}
+
+static void
+dentry_lru_del(struct dentry *dentry)
+{
+ if (!list_empty(&dentry->d_lru)) {
+ list_del(&dentry->d_lru);
+ dentry_stat.nr_unused--;
+ }
+}
+
+static void
+dentry_lru_del_init(struct dentry *dentry)
+{
+ if (likely(!list_empty(&dentry->d_lru))) {
+ list_del_init(&dentry->d_lru);
+ dentry_stat.nr_unused--;
+ }
+}
+
/*
* This is dput
*
@@ -173,8 +204,7 @@ repeat:
goto kill_it;
if (list_empty(&dentry->d_lru)) {
dentry->d_flags |= DCACHE_REFERENCED;
- list_add(&dentry->d_lru, &dentry_unused);
- dentry_stat.nr_unused++;
+ dentry_lru_add(dentry);
}
spin_unlock(&dentry->d_lock);
spin_unlock(&dcache_lock);
@@ -186,13 +216,8 @@ unhash_it:
kill_it: {
struct dentry *parent;

- /* If dentry was on d_lru list
- * delete it from there
- */
- if (!list_empty(&dentry->d_lru)) {
- list_del(&dentry->d_lru);
- dentry_stat.nr_unused--;
- }
+ /* If dentry was on d_lru list delete it from there */
+ dentry_lru_del(dentry);
list_del(&dentry->d_u.d_child);
dentry_stat.nr_dentry--; /* For d_free, below */
/*drops the locks, at that point nobody can reach this dentry */
@@ -268,10 +293,7 @@ int d_invalidate(struct dentry * dentry)
static inline struct dentry * __dget_locked(struct dentry *dentry)
{
atomic_inc(&dentry->d_count);
- if (!list_empty(&dentry->d_lru)) {
- dentry_stat.nr_unused--;
- list_del_init(&dentry->d_lru);
- }
+ dentry_lru_del_init(dentry);
return dentry;
}

@@ -377,6 +399,48 @@ static inline void prune_one_dentry(stru
spin_lock(&dcache_lock);
}

+/*
+ * Shrink the dentry LRU on æ given superblock.
+ */
+static void
+__shrink_dcache_sb(struct super_block *sb, int *count, int flags)
+{
+ struct dentry *dentry;
+ int cnt = (count == NULL) ? -1 : *count;
+
+ spin_lock(&dcache_lock);
+ while (!list_empty(&sb->s_dentry_lru) && cnt--) {
+ dentry = list_entry(sb->s_dentry_lru.prev,
+ struct dentry, d_lru);
+ dentry_lru_del_init(dentry);
+ BUG_ON(dentry->d_sb != sb);
+ prefetch(sb->s_dentry_lru.prev);
+
+ spin_lock(&dentry->d_lock);
+ /*
+ * We found an inuse dentry which was not removed from
+ * the LRU because of laziness during lookup. Do not free
+ * it - just keep it off the LRU list.
+ */
+ if (atomic_read(&dentry->d_count)) {
+ spin_unlock(&dentry->d_lock);
+ continue;
+ }
+ /* If the dentry matches the flags passed in, don't free it.
+ * clear the flags and put it back on the LRU */
+ if (flags && (dentry->d_flags & flags)) {
+ dentry->d_flags &= ~flags;
+ dentry_lru_add(dentry);
+ spin_unlock(&dentry->d_lock);
+ continue;
+ }
+ prune_one_dentry(dentry);
+ }
+ spin_unlock(&dcache_lock);
+ if (count)
+ *count = cnt;
+}
+
/**
* prune_dcache - shrink the dcache
* @count: number of entries to try and free
@@ -392,42 +456,44 @@ static inline void prune_one_dentry(stru

static void prune_dcache(int count)
{
- spin_lock(&dcache_lock);
- for (; count ; count--) {
- struct dentry *dentry;
- struct list_head *tmp;
-
- cond_resched_lock(&dcache_lock);
+ struct super_block *sb;
+ static struct super_block *sb_hand = NULL;
+ int work_done = 0;
+
+ spin_lock(&sb_lock);
+ if (sb_hand == NULL)
+ sb_hand = sb_entry(super_blocks.next);
+restart:
+ list_for_each_entry(sb, &super_blocks, s_list) {
+ if (sb != sb_hand)
+ continue;
+ /* Found the next superblock to work on.
+ * Move the hand forwards so that parallel
+ * pruners will work on a different sb */
+ work_done++;
+ sb_hand = sb_entry(sb->s_list.next);
+ sb->s_count++;
+ spin_unlock(&sb_lock);
+
+ /* we don't take the s_umount lock here because
+ * because we can be called already holding a
+ * write lock on a superblock */
+ if (!list_empty(&sb->s_dentry_lru))
+ __shrink_dcache_sb(sb, &count, DCACHE_REFERENCED);

- tmp = dentry_unused.prev;
- if (tmp == &dentry_unused)
+ spin_lock(&sb_lock);
+ if (__put_super_and_need_restart(sb) && count)
+ goto restart;
+ if (count <= 0)
break;
- list_del_init(tmp);
- prefetch(dentry_unused.prev);
- dentry_stat.nr_unused--;
- dentry = list_entry(tmp, struct dentry, d_lru);
-
- spin_lock(&dentry->d_lock);
- /*
- * We found an inuse dentry which was not removed from
- * dentry_unused because of laziness during lookup. Do not free
- * it - just keep it off the dentry_unused list.
- */
- if (atomic_read(&dentry->d_count)) {
- spin_unlock(&dentry->d_lock);
- continue;
- }
- /* If the dentry was recently referenced, don't free it. */
- if (dentry->d_flags & DCACHE_REFERENCED) {
- dentry->d_flags &= ~DCACHE_REFERENCED;
- list_add(&dentry->d_lru, &dentry_unused);
- dentry_stat.nr_unused++;
- spin_unlock(&dentry->d_lock);
- continue;
- }
- prune_one_dentry(dentry);
}
- spin_unlock(&dcache_lock);
+ if (!work_done) {
+ /* sb_hand is stale. Start and the beginning of the
+ * list again. */
+ sb_hand = sb_entry(super_blocks.next);
+ goto restart;
+ }
+ spin_unlock(&sb_lock);
}

/*
@@ -454,41 +520,7 @@ static void prune_dcache(int count)

void shrink_dcache_sb(struct super_block * sb)
{
- struct list_head *tmp, *next;
- struct dentry *dentry;
-
- /*
- * Pass one ... move the dentries for the specified
- * superblock to the most recent end of the unused list.
- */
- spin_lock(&dcache_lock);
- list_for_each_safe(tmp, next, &dentry_unused) {
- dentry = list_entry(tmp, struct dentry, d_lru);
- if (dentry->d_sb != sb)
- continue;
- list_del(tmp);
- list_add(tmp, &dentry_unused);
- }
-
- /*
- * Pass two ... free the dentries for this superblock.
- */
-repeat:
- list_for_each_safe(tmp, next, &dentry_unused) {
- dentry = list_entry(tmp, struct dentry, d_lru);
- if (dentry->d_sb != sb)
- continue;
- dentry_stat.nr_unused--;
- list_del_init(tmp);
- spin_lock(&dentry->d_lock);
- if (atomic_read(&dentry->d_count)) {
- spin_unlock(&dentry->d_lock);
- continue;
- }
- prune_one_dentry(dentry);
- goto repeat;
- }
- spin_unlock(&dcache_lock);
+ __shrink_dcache_sb(sb, NULL, 0);
}

/*
@@ -572,17 +604,13 @@ resume:
struct dentry *dentry = list_entry(tmp, struct dentry, d_u.d_child);
next = tmp->next;

- if (!list_empty(&dentry->d_lru)) {
- dentry_stat.nr_unused--;
- list_del_init(&dentry->d_lru);
- }
+ dentry_lru_del_init(dentry);
/*
* move only zero ref count dentries to the end
* of the unused list for prune_dcache
*/
if (!atomic_read(&dentry->d_count)) {
- list_add(&dentry->d_lru, dentry_unused.prev);
- dentry_stat.nr_unused++;
+ dentry_lru_add_tail(dentry);
found++;
}

@@ -657,18 +685,14 @@ void shrink_dcache_anon(struct hlist_hea
spin_lock(&dcache_lock);
hlist_for_each(lp, head) {
struct dentry *this = hlist_entry(lp, struct dentry, d_hash);
- if (!list_empty(&this->d_lru)) {
- dentry_stat.nr_unused--;
- list_del_init(&this->d_lru);
- }

+ dentry_lru_del_init(this);
/*
* move only zero ref count dentries to the end
* of the unused list for prune_dcache
*/
if (!atomic_read(&this->d_count)) {
- list_add_tail(&this->d_lru, &dentry_unused);
- dentry_stat.nr_unused++;
+ dentry_lru_add_tail(this);
found++;
}
}
@@ -1019,7 +1043,7 @@ struct dentry *d_splice_alias(struct ino
* rcu_read_lock() and rcu_read_unlock() are used to disable preemption while
* lookup is going on.
*
- * dentry_unused list is not updated even if lookup finds the required dentry
+ * The dentry unused LRU is not updated even if lookup finds the required dentry
* in there. It is updated in places such as prune_dcache, shrink_dcache_sb,
* select_parent and __dget_locked. This laziness saves lookup from dcache_lock
* acquisition.
Index: 2.6.x-xfs-new/include/linux/fs.h
===================================================================
--- 2.6.x-xfs-new.orig/include/linux/fs.h 2006-03-17 13:16:16.000000000 +1100
+++ 2.6.x-xfs-new/include/linux/fs.h 2006-05-15 17:14:53.532277564 +1000
@@ -831,6 +831,7 @@ struct super_block {
struct list_head s_io; /* parked for writeback */
struct hlist_head s_anon; /* anonymous dentries for (nfs) exporting */
struct list_head s_files;
+ struct list_head s_dentry_lru; /* unused dentry lru */

struct block_device *s_bdev;
struct list_head s_instances;
Index: 2.6.x-xfs-new/fs/super.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/super.c 2006-03-17 13:16:12.000000000 +1100
+++ 2.6.x-xfs-new/fs/super.c 2006-05-15 17:49:59.009147060 +1000
@@ -71,6 +71,7 @@ static struct super_block *alloc_super(v
INIT_LIST_HEAD(&s->s_instances);
INIT_HLIST_HEAD(&s->s_anon);
INIT_LIST_HEAD(&s->s_inodes);
+ INIT_LIST_HEAD(&s->s_dentry_lru);
init_rwsem(&s->s_umount);
mutex_init(&s->s_lock);
down_write(&s->s_umount);


2006-05-24 01:41:34

by Randy Dunlap

Subject: Re: [PATCH] Per-superblock unused dentry LRU lists.

On Wed, 24 May 2006 11:24:12 +1000 David Chinner wrote:

> Index: 2.6.x-xfs-new/fs/dcache.c
> ===================================================================
> --- 2.6.x-xfs-new.orig/fs/dcache.c 2006-05-15 16:24:44.212207654 +1000
> +++ 2.6.x-xfs-new/fs/dcache.c 2006-05-16 14:00:46.327462206 +1000
> @@ -114,6 +113,38 @@ static void dentry_iput(struct dentry *
> }
> }
>
> +static void
> +dentry_lru_add(struct dentry *dentry)
> +{
> + list_add(&dentry->d_lru, &dentry->d_sb->s_dentry_lru);
> + dentry_stat.nr_unused++;
> +}
> +
> +static void
> +dentry_lru_add_tail(struct dentry *dentry)
> +{
> + list_add_tail(&dentry->d_lru, &dentry->d_sb->s_dentry_lru);
> + dentry_stat.nr_unused++;
> +}
> +
> +static void
> +dentry_lru_del(struct dentry *dentry)
> +{
> + if (!list_empty(&dentry->d_lru)) {
> + list_del(&dentry->d_lru);
> + dentry_stat.nr_unused--;
> + }
> +}
> +
> +static void
> +dentry_lru_del_init(struct dentry *dentry)
> +{
> + if (likely(!list_empty(&dentry->d_lru))) {
> + list_del_init(&dentry->d_lru);
> + dentry_stat.nr_unused--;
> + }
> +}
> +
> /*
> * This is dput
> *

Please use regular kernel coding style for functions:
don't put qualifiers and names on separate lines.
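
E.g. for the first helper above - same code, just the usual one-line form:

static void dentry_lru_add(struct dentry *dentry)
{
	list_add(&dentry->d_lru, &dentry->d_sb->s_dentry_lru);
	dentry_stat.nr_unused++;
}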

> @@ -377,6 +399,48 @@ static inline void prune_one_dentry(stru
> spin_lock(&dcache_lock);
> }
>
> +/*
> + * Shrink the dentry LRU on æ given superblock.

on ? given superblock ?

> + */
> +static void
> +__shrink_dcache_sb(struct super_block *sb, int *count, int flags)
> +{


---
~Randy

2006-05-24 01:48:58

by David Chinner

Subject: Re: [PATCH] Per-superblock unused dentry LRU lists.

On Tue, May 23, 2006 at 06:44:07PM -0700, Randy.Dunlap wrote:
> On Wed, 24 May 2006 11:24:12 +1000 David Chinner wrote:
> > +static void
> > +dentry_lru_del_init(struct dentry *dentry)
> > +{
> > + if (likely(!list_empty(&dentry->d_lru))) {
> > + list_del_init(&dentry->d_lru);
> > + dentry_stat.nr_unused--;
> > + }
> > +}
> > +
> > /*
> > * This is dput
> > *
>
> Please use regular kernel coding style for functions:
> don't put qualifiers and names on separate lines.

Oops - I'm too used to the XFS coding style. Fixed.

> > @@ -377,6 +399,48 @@ static inline void prune_one_dentry(stru
> > spin_lock(&dcache_lock);
> > }
> >
> > +/*
> > + * Shrink the dentry LRU on æ given superblock.
>
> on ? given superblock ?

"a". Fixed.

I'll wait for other comments before sending out an updated patch.

Cheers,

Dave.
--
Dave Chinner
R&D Software Engineer
SGI Australian Software Group

2006-05-24 01:59:47

by Arjan van de Ven

Subject: Re: [PATCH] Per-superblock unused dentry LRU lists.

On Wed, 2006-05-24 at 11:24 +1000, David Chinner wrote:


> http://marc.theaimsgroup.com/?l=linux-kernel&m=114491661527656&w=2
>
> shrink_dcache_sb() becomes a severe bottleneck when the unused dentry
> list becomes long and mounts and unmounts occur frequently on the
> machine.

how does a per SB list deal with umounts that occur frequently? I
suppose it means destroying just your list...

> I've attempted to make reclaim fair by keeping track of the last superblock
> we pruned, and starting from the next one in the list each time.

how fair is this in the light of non-equal sizes? say you have a 4Gb
fs and a 1Tb fs. Will your approach result in trying to prune 1000
from the 4Gb, then 1000 from the 1Tb, then 1000 from the 4Gb etc?
while that is fair in absolute terms, in relative terms it's highly
unfair to the small filesystem....
(I'm not sure there is ONE right answer here, well maybe by scaling the
count to the per fs count rather than to the total...)
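
Something vaguely like the sketch below is what I have in mind -
completely untested, and s_nr_dentry_unused is a made-up per-sb counter
that isn't in your patch:

/*
 * Untested sketch: give each sb a prune quota proportional to its
 * share of the global unused-dentry count, so a small fs doesn't get
 * pruned as hard as a huge one.  s_nr_dentry_unused is hypothetical,
 * and overflow of the multiply is ignored for simplicity.
 */
static int sb_prune_quota(struct super_block *sb, int total_count)
{
	unsigned long sb_unused = sb->s_nr_dentry_unused;
	unsigned long total_unused = dentry_stat.nr_unused;
	int quota;

	if (!total_unused || !sb_unused)
		return 0;

	quota = total_count * sb_unused / total_unused;
	if (quota < 1)
		quota = 1;	/* always make a little progress */
	return quota;
}

prune_dcache() could then hand sb_prune_quota(sb, count) to
__shrink_dcache_sb() instead of the raw count.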



this makes me wonder btw if we can do a per superblock slab for these
dentries, that may decrease fragmentation of slabs in combination with
your patch....
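
(Untested thought on what that could look like at mount time, once s_id
is known - s_dentry_cachep is a made-up field, and d_alloc() would
obviously have to allocate from it too:)

	/*
	 * Hypothetical per-sb dentry slab, so an unmount hands whole
	 * pages back instead of leaving holes in the global
	 * dentry_cache.  The name string must outlive the cache,
	 * hence the kmalloc.
	 */
	char *name = kmalloc(32, GFP_KERNEL);

	if (name) {
		snprintf(name, 32, "dentry_%s", sb->s_id);
		sb->s_dentry_cachep = kmem_cache_create(name,
					sizeof(struct dentry), 0,
					SLAB_RECLAIM_ACCOUNT, NULL, NULL);
	}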


2006-05-24 04:02:12

by David Chinner

Subject: Re: [PATCH] Per-superblock unused dentry LRU lists.

On Wed, May 24, 2006 at 03:59:40AM +0200, Arjan van de Ven wrote:
> On Wed, 2006-05-24 at 11:24 +1000, David Chinner wrote:
>
>
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=114491661527656&w=2
> >
> > shrink_dcache_sb() becomes a severe bottleneck when the unused dentry
> > list becomes long and mounts and unmounts occur frequently on the
> > machine.
>
> how does a per SB list deal with umounts that occur frequently? I
> suppose it means destroying just your list...

shrink_dcache_sb no longer walks the single unused dentry list
with the dcache lock held. With many millions of dentries on the
unused list, this meant that all dentry operations came to a
halt while one process executed shrink_dcache_sb().

With a per-sb lru, we no longer have to walk a global list to find
all the dentries that belong to the superblock we are unmounting.
Hence we only hold the dcache_lock for as long as it takes to pull
a dentry off the list and free it.

> > I've attempted to make reclaim fair by keeping track of the last superblock
> > we pruned, and starting from the next one in the list each time.
>
> how fair is this in the light of non-equal sizes?

It's not. If you read the thread I pointed at, you'll note that
I commented at the time this method introduces "interesting" reclaim
fairness issues, and this is one of the reasons I didn't implement
this method straight away.

> say you have a 4Gb
> fs and a 1Tb fs. Will your approach result in trying to prune 1000
> from the 4Gb, then 1000 from the 1Tb, then 1000 from the 4Gb etc?
> while that is fair in absolute terms, in relative terms it's highly
> unfair to the small filesystem....

Yup, that is what the current code I've written will do. I just
wanted something that worked over all superblocks to begin with.
It's not very smart, but improving it can be done incrementally.

> (I'm not sure there is ONE right answer here, well maybe by scaling the
> count to the per fs count rather than to the total...)

Agreed - scaling is probably going to result in the fairest reclaim,
but I preferred to focus on getting the mechanism working properly
and testing that before considering optimising the reclaim algorithm.

Also, I'd prefer to address deficiencies as they arise, rather than
make optimisations based on assumptions that may be incorrect....

> this makes me wonder btw if we can do a per superblock slab for these
> dentries, that may decrease fragmentation of slabs in combination with
> your patch....

That doesn't prevent slab fragmentation per filesystem. And most of
the fragmentation problems I see are in the inode caches so at best
this would be a bandaid rather than a real solution to the fundamental
problems....

Cheers,

Dave.
--
Dave Chinner
R&D Software Engineer
SGI Australian Software Group

2006-05-24 04:23:14

by Arjan van de Ven

Subject: Re: [PATCH] Per-superblock unused dentry LRU lists.

On Wed, 2006-05-24 at 14:01 +1000, David Chinner wrote:
> On Wed, May 24, 2006 at 03:59:40AM +0200, Arjan van de Ven wrote:
> > On Wed, 2006-05-24 at 11:24 +1000, David Chinner wrote:
> >
> >
> > > http://marc.theaimsgroup.com/?l=linux-kernel&m=114491661527656&w=2
> > >
> > > shrink_dcache_sb() becomes a severe bottleneck when the unused dentry
> > > list becomes long and mounts and unmounts occur frequently on the
> > > machine.
> >
> > how does a per SB list deal with umounts that occur frequently? I
> > suppose it means destroying just your list...
>
> shrink_dcache_sb no longer walks the single unused dentry list
> with the dcache lock held. With many millions of dentries on the
> unused list, this meant that all dentry operations came to a
> halt while one process executed shrink_dcache_sb().
>
> With a per-sb lru, we no longer have to walk a global list to find
> all the dentries that belong to the superblock we are unmounting.
> Hence we only hold the dcache_lock for as long as it takes to pull
> a dentry off the list and free it.

list_splice is great for that. I can for sure see this benefit
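
Roughly along these lines - an untested sketch against the per-sb list
from your patch, with a made-up function name:

/*
 * Untested sketch: splice the whole per-sb LRU onto a private list in
 * one go, so the scan under dcache_lock starts with a single list
 * operation.  Note prune_one_dentry() still drops and retakes
 * dcache_lock, so entries on the private list can be re-referenced in
 * the meantime - hence the d_count check stays.
 */
static void shrink_dcache_sb_splice(struct super_block *sb)
{
	LIST_HEAD(tmp);
	struct dentry *dentry;

	spin_lock(&dcache_lock);
	list_splice_init(&sb->s_dentry_lru, &tmp);
	while (!list_empty(&tmp)) {
		dentry = list_entry(tmp.prev, struct dentry, d_lru);
		list_del_init(&dentry->d_lru);
		dentry_stat.nr_unused--;

		spin_lock(&dentry->d_lock);
		if (atomic_read(&dentry->d_count)) {
			/* raced with a lookup; leave it off the LRU */
			spin_unlock(&dentry->d_lock);
			continue;
		}
		prune_one_dentry(dentry);
	}
	spin_unlock(&dcache_lock);
}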

> Yup, that is what the current code I've written will do. I just
> wanted something that worked over all superblocks to begin with.
> It's not very smart, but improving it can be done incrementally.

I think that if you say A you should say B, I mean, if you make the list
per SB you probably just should do the step and make at least the
counter per SB as well. That will also save in cacheline bounces I
suppose... but more importantly you keep the counters next to the list.
Which you'll also need to do any kind of scaling I suppose later on, so
might as well keep the stats already.
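
i.e. roughly this on top of your helpers - untested, and
s_nr_dentry_unused is just a name I made up for a new int in struct
super_block next to s_dentry_lru, updated under dcache_lock like the
global counter:

static void dentry_lru_add(struct dentry *dentry)
{
	list_add(&dentry->d_lru, &dentry->d_sb->s_dentry_lru);
	dentry->d_sb->s_nr_dentry_unused++;
	dentry_stat.nr_unused++;
}

static void dentry_lru_del(struct dentry *dentry)
{
	if (!list_empty(&dentry->d_lru)) {
		list_del(&dentry->d_lru);
		/* keep the per-sb count in step with the global one */
		dentry->d_sb->s_nr_dentry_unused--;
		dentry_stat.nr_unused--;
	}
}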


> > this makes me wonder btw if we can do a per superblock slab for these
> > dentries, that may decrease fragmentation of slabs in combination with
> > your patch....
>
> That doesn't prevent slab fragmentation per filesystem. And most of
> the fragmentation problems I see are in the inode caches so at best
> this would be a bandaid rather than a real solution to the fundamental
> problems....

well what it would do is make sure an unmount causes a lot of free
pages, not just fragmented slabs.

Also it will give some more locality to group related dentries together,
so pinning wise it will help some.. but yeah at this point this is just
hand holding. I'm just sort of wondering how much stuff can be made per
SB.. if we go this way with the list, maybe just more can follow...


2006-05-24 05:06:32

by Balbir Singh

Subject: Re: [PATCH] Per-superblock unused dentry LRU lists.

On Wed, May 24, 2006 at 11:24:12AM +1000, David Chinner wrote:
>
> As originally described in:
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=114491661527656&w=2
>
> shrink_dcache_sb() becomes a severe bottleneck when the unused dentry
> list becomes long and mounts and unmounts occur frequently on the
> machine.
>
> My initial attempt to solve the problem using per-node LRUs
> (http://marc.theaimsgroup.com/?l=linux-kernel&m=114537118504917&w=2)
> fell apart under the complexity of locking required. This approach can
> probably be made to work in the long term, but we need a robust fix
> for now....
>
> The patch attached below is based on the suggestion made by Andrew
> Morton here:
>
> http://marc.theaimsgroup.com/?l=linux-fsdevel&m=114499224412427&w=2
>
> This approach does not change any of the locking required, so avoids
> the locking problems of the per-node lru implementation.
>
> I've attempted to make reclaim fair by keeping track of the last superblock
> we pruned, and starting from the next one in the list each time.
>
> Signed-off-by: Dave Chinner <[email protected]>
>
> ---
> The patch has been stress tested on single and multiple filesystems,
> using dbench, postmark and parallel create/unlink load tests. It has also
> been running on the problem server since last Saturday which currently
> has ~11 million cached dentries, and a dentry_cache slab size of ~27 million
> objects.

I like the patch already :)

<snip>

> +static void
> +dentry_lru_add(struct dentry *dentry)
> +{
> + list_add(&dentry->d_lru, &dentry->d_sb->s_dentry_lru);
> + dentry_stat.nr_unused++;
> +}
> +
> +static void
> +dentry_lru_add_tail(struct dentry *dentry)
> +{
> + list_add_tail(&dentry->d_lru, &dentry->d_sb->s_dentry_lru);
> + dentry_stat.nr_unused++;
> +}
> +
> +static void
> +dentry_lru_del(struct dentry *dentry)
> +{
> + if (!list_empty(&dentry->d_lru)) {
> + list_del(&dentry->d_lru);
> + dentry_stat.nr_unused--;
> + }
> +}
> +
> +static void
> +dentry_lru_del_init(struct dentry *dentry)
> +{
> + if (likely(!list_empty(&dentry->d_lru))) {
> + list_del_init(&dentry->d_lru);
> + dentry_stat.nr_unused--;
> + }
> +}
> +

I wonder if there should be a per-sb count of the number of used dentries.
This will help us while unmounting. Instead of passing count as NULL, the
number of dentries in the super block can be passed. Maybe we should have
per-dentry stats along with global statistics.

<snip>

>
> +/*
> + * Shrink the dentry LRU on æ given superblock.
> + */
> +static void
> +__shrink_dcache_sb(struct super_block *sb, int *count, int flags)
> +{
> + struct dentry *dentry;
> + int cnt = (count == NULL) ? -1 : *count;

This code seems hacky. Some comments please. Negative counters are generally
not good IMHO. This can be avoided by per-sb dentry stats (please see
the comment above)
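
For example, with a (hypothetical) per-sb s_nr_dentry_unused counter the
sentinel isn't needed at all:

	/* untested: when the caller passes NULL ("prune everything"),
	 * start cnt at the sb's own unused count instead of -1 */
	int cnt = count ? *count : sb->s_nr_dentry_unused;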

> +
> + spin_lock(&dcache_lock);
> + while (!list_empty(&sb->s_dentry_lru) && cnt--) {
> + dentry = list_entry(sb->s_dentry_lru.prev,
> + struct dentry, d_lru);
> + dentry_lru_del_init(dentry);
> + BUG_ON(dentry->d_sb != sb);
> + prefetch(sb->s_dentry_lru.prev);
> +
> + spin_lock(&dentry->d_lock);
> + /*
> + * We found an inuse dentry which was not removed from
> + * the LRU because of laziness during lookup. Do not free
> + * it - just keep it off the LRU list.
> + */
> + if (atomic_read(&dentry->d_count)) {
> + spin_unlock(&dentry->d_lock);
> + continue;
> + }
> + /* If the dentry matches the flags passed in, don't free it.
> + * clear the flags and put it back on the LRU */

The comment does not follow coding style conventions. What do these flags
typically contain - DCACHE_REFERENCED? Could you clean up this comment
please.

> + if (flags && (dentry->d_flags & flags)) {
> + dentry->d_flags &= ~flags;
> + dentry_lru_add(dentry);
> + spin_unlock(&dentry->d_lock);
> + continue;
> + }
> + prune_one_dentry(dentry);
> + }
> + spin_unlock(&dcache_lock);
> + if (count)
> + *count = cnt;
> +}
> +
> /**
> * prune_dcache - shrink the dcache
> * @count: number of entries to try and free
> @@ -392,42 +456,44 @@ static inline void prune_one_dentry(stru
>
> static void prune_dcache(int count)
> {
> - spin_lock(&dcache_lock);
> - for (; count ; count--) {
> - struct dentry *dentry;
> - struct list_head *tmp;
> -
> - cond_resched_lock(&dcache_lock);
> + struct super_block *sb;
> + static struct super_block *sb_hand = NULL;
> + int work_done = 0;
> +
> + spin_lock(&sb_lock);
> + if (sb_hand == NULL)
> + sb_hand = sb_entry(super_blocks.next);
> +restart:
> + list_for_each_entry(sb, &super_blocks, s_list) {
> + if (sb != sb_hand)
> + continue;
> + /* Found the next superblock to work on.
> + * Move the hand forwards so that parallel
> + * pruners will work on a different sb */
> + work_done++;
> + sb_hand = sb_entry(sb->s_list.next);
> + sb->s_count++;
> + spin_unlock(&sb_lock);
> +
> + /* we don't take the s_umount lock here because
> + * because we can be called already holding a
> + * write lock on a superblock */

You can use trylock. Please see the patches in -mm to fix the umount
race.
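
i.e. something like this around the call site (untested sketch):

	/* untested: only prune this sb if s_umount can be taken without
	 * blocking; if it is being unmounted, just skip it */
	if (down_read_trylock(&sb->s_umount)) {
		if (!list_empty(&sb->s_dentry_lru))
			__shrink_dcache_sb(sb, &count, DCACHE_REFERENCED);
		up_read(&sb->s_umount);
	}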

> + if (!list_empty(&sb->s_dentry_lru))
> + __shrink_dcache_sb(sb, &count, DCACHE_REFERENCED);
>
> - tmp = dentry_unused.prev;
> - if (tmp == &dentry_unused)
> + spin_lock(&sb_lock);
> + if (__put_super_and_need_restart(sb) && count)
> + goto restart;

Comment please.

> + if (count <= 0)
> break;
> - list_del_init(tmp);
> - prefetch(dentry_unused.prev);
> - dentry_stat.nr_unused--;
> - dentry = list_entry(tmp, struct dentry, d_lru);
> -
> - spin_lock(&dentry->d_lock);
> - /*
> - * We found an inuse dentry which was not removed from
> - * dentry_unused because of laziness during lookup. Do not free
> - * it - just keep it off the dentry_unused list.
> - */
> - if (atomic_read(&dentry->d_count)) {
> - spin_unlock(&dentry->d_lock);
> - continue;
> - }
> - /* If the dentry was recently referenced, don't free it. */
> - if (dentry->d_flags & DCACHE_REFERENCED) {
> - dentry->d_flags &= ~DCACHE_REFERENCED;
> - list_add(&dentry->d_lru, &dentry_unused);
> - dentry_stat.nr_unused++;
> - spin_unlock(&dentry->d_lock);
> - continue;
> - }
> - prune_one_dentry(dentry);
> }
> - spin_unlock(&dcache_lock);
> + if (!work_done) {
> + /* sb_hand is stale. Start and the beginning of the
> + * list again. */

Please fix the comment style

> + sb_hand = sb_entry(super_blocks.next);
> + goto restart;
> + }
> + spin_unlock(&sb_lock);
> }
>
> /*
> @@ -454,41 +520,7 @@ static void prune_dcache(int count)
>
> void shrink_dcache_sb(struct super_block * sb)
> {
> - struct list_head *tmp, *next;
> - struct dentry *dentry;
> -
> - /*
> - * Pass one ... move the dentries for the specified
> - * superblock to the most recent end of the unused list.
> - */
> - spin_lock(&dcache_lock);
> - list_for_each_safe(tmp, next, &dentry_unused) {
> - dentry = list_entry(tmp, struct dentry, d_lru);
> - if (dentry->d_sb != sb)
> - continue;
> - list_del(tmp);
> - list_add(tmp, &dentry_unused);
> - }
> -
> - /*
> - * Pass two ... free the dentries for this superblock.
> - */
> -repeat:
> - list_for_each_safe(tmp, next, &dentry_unused) {
> - dentry = list_entry(tmp, struct dentry, d_lru);
> - if (dentry->d_sb != sb)
> - continue;
> - dentry_stat.nr_unused--;
> - list_del_init(tmp);
> - spin_lock(&dentry->d_lock);
> - if (atomic_read(&dentry->d_count)) {
> - spin_unlock(&dentry->d_lock);
> - continue;
> - }
> - prune_one_dentry(dentry);
> - goto repeat;
> - }
> - spin_unlock(&dcache_lock);
> + __shrink_dcache_sb(sb, NULL, 0);
> }
>
> /*
> @@ -572,17 +604,13 @@ resume:
> struct dentry *dentry = list_entry(tmp, struct dentry, d_u.d_child);
> next = tmp->next;
>
> - if (!list_empty(&dentry->d_lru)) {
> - dentry_stat.nr_unused--;
> - list_del_init(&dentry->d_lru);
> - }
> + dentry_lru_del_init(dentry);
> /*
> * move only zero ref count dentries to the end
> * of the unused list for prune_dcache
> */
> if (!atomic_read(&dentry->d_count)) {
> - list_add(&dentry->d_lru, dentry_unused.prev);
> - dentry_stat.nr_unused++;
> + dentry_lru_add_tail(dentry);
> found++;
> }

This should not be required with per super-block dentries. The only
reason, I think we moved dentries to the tail is to club all entries
from the sb together (to free them all at once).

>
> @@ -657,18 +685,14 @@ void shrink_dcache_anon(struct hlist_hea
> spin_lock(&dcache_lock);
> hlist_for_each(lp, head) {
> struct dentry *this = hlist_entry(lp, struct dentry, d_hash);
> - if (!list_empty(&this->d_lru)) {
> - dentry_stat.nr_unused--;
> - list_del_init(&this->d_lru);
> - }
>
> + dentry_lru_del_init(this);
> /*
> * move only zero ref count dentries to the end
> * of the unused list for prune_dcache
> */
> if (!atomic_read(&this->d_count)) {
> - list_add_tail(&this->d_lru, &dentry_unused);
> - dentry_stat.nr_unused++;
> + dentry_lru_add_tail(this);
> found++;
> }
> }

This might still be required. Do you want to split out the anonymous dcache
entries as well? I am not sure what their superblock points to.




Balbir Singh,
Linux Technology Center,
IBM Software Labs

2006-05-24 05:47:45

by David Chinner

Subject: Re: [PATCH] Per-superblock unused dentry LRU lists.

On Wed, May 24, 2006 at 06:23:09AM +0200, Arjan van de Ven wrote:
> On Wed, 2006-05-24 at 14:01 +1000, David Chinner wrote:
> > Yup, that is what the current code I've written will do. I just
> > wanted something that worked over all superblocks to begin with.
> > It's not very smart, but improving it can be done incrementally.
>
> I think that if you say A you should say B, I mean, if you make the list
> per SB you probably just should do the step and make at least the
> counter per SB as well. That will also save in cacheline bounces I
> suppose... but more importantly you keep the counters next to the list.

But it doesn't remove the need for the global counter. The
dcache_lock is far more heavily contended so the counter cacheline
bouncing is lost in the noise here. No to mention the counter can
only be updated while holding the dcache_lock. Hence at this point,
adding per-sb counters is pure overhead unless the reclaim method
is made to use them.

> Which you'll also need to do any kind of scaling I suppose later on, so
> might as well keep the stats already.

The per-sb list improves scalability only by reducing the maximum
length of time the dcache_lock is held. Scalability for further
parallelism (and therefore better reclaim performance) is going to
require locking changes so that's when I'd expect the counters to
need changing. I don't want to over-optimise before we know what we
actually need to optimise...

Cheers,

Dave.
--
Dave Chinner
R&D Software Engineer
SGI Australian Software Group

2006-05-24 06:20:00

by David Chinner

Subject: Re: [PATCH] Per-superblock unused dentry LRU lists.

On Wed, May 24, 2006 at 10:32:14AM +0530, Balbir Singh wrote:
> On Wed, May 24, 2006 at 11:24:12AM +1000, David Chinner wrote:
> > +}
> > +
> > +static void
> > +dentry_lru_del_init(struct dentry *dentry)
> > +{
> > + if (likely(!list_empty(&dentry->d_lru))) {
> > + list_del_init(&dentry->d_lru);
> > + dentry_stat.nr_unused--;
> > + }
> > +}
> > +
>
> I wonder if there should be a per-sb count of the number of used dentries.
> This will help us while unmounting. Instead of passing count as NULL, the
> number of dentries in the super block can be passed.

Possibly.

> Maybe we should have
> per-dentry stats along with global statistics.

per-sb, you mean? ;)

>
> <snip>
>
> >
> > +/*
> > + * Shrink the dentry LRU on æ given superblock.
> > + */
> > +static void
> > +__shrink_dcache_sb(struct super_block *sb, int *count, int flags)
> > +{
> > + struct dentry *dentry;
> > + int cnt = (count == NULL) ? -1 : *count;
>
> This code seems hacky. Some comments please. Negative counters are generally
> not good IMHO. This can be avoided by per-sb dentry stats (please see
> the comment above)

I'll fix that up...

> > +
> > + spin_lock(&dcache_lock);
> > + while (!list_empty(&sb->s_dentry_lru) && cnt--) {
> > + dentry = list_entry(sb->s_dentry_lru.prev,
> > + struct dentry, d_lru);
> > + dentry_lru_del_init(dentry);
> > + BUG_ON(dentry->d_sb != sb);
> > + prefetch(sb->s_dentry_lru.prev);
> > +
> > + spin_lock(&dentry->d_lock);
> > + /*
> > + * We found an inuse dentry which was not removed from
> > + * the LRU because of laziness during lookup. Do not free
> > + * it - just keep it off the LRU list.
> > + */
> > + if (atomic_read(&dentry->d_count)) {
> > + spin_unlock(&dentry->d_lock);
> > + continue;
> > + }
> > + /* If the dentry matches the flags passed in, don't free it.
> > + * clear the flags and put it back on the LRU */
>
> The comment does not follow coding style conventions. What do these flags
> typically contain - DCACHE_REFERENCED? Could you clean up this comment
> please.

DCACHE_REFERENCED is the only one at the moment. I'll make that more obvious.

> > + if (flags && (dentry->d_flags & flags)) {
> > + dentry->d_flags &= ~flags;
> > + dentry_lru_add(dentry);
> > + spin_unlock(&dentry->d_lock);
> > + continue;
> > + }
> > + prune_one_dentry(dentry);
> > + }
> > + spin_unlock(&dcache_lock);
> > + if (count)
> > + *count = cnt;
> > +}
> > +
> > /**
> > * prune_dcache - shrink the dcache
> > * @count: number of entries to try and free
> > @@ -392,42 +456,44 @@ static inline void prune_one_dentry(stru
> >
> > static void prune_dcache(int count)
> > {
> > - spin_lock(&dcache_lock);
> > - for (; count ; count--) {
> > - struct dentry *dentry;
> > - struct list_head *tmp;
> > -
> > - cond_resched_lock(&dcache_lock);
> > + struct super_block *sb;
> > + static struct super_block *sb_hand = NULL;
> > + int work_done = 0;
> > +
> > + spin_lock(&sb_lock);
> > + if (sb_hand == NULL)
> > + sb_hand = sb_entry(super_blocks.next);
> > +restart:
> > + list_for_each_entry(sb, &super_blocks, s_list) {
> > + if (sb != sb_hand)
> > + continue;
> > + /* Found the next superblock to work on.
> > + * Move the hand forwards so that parallel
> > + * pruners will work on a different sb */
> > + work_done++;
> > + sb_hand = sb_entry(sb->s_list.next);
> > + sb->s_count++;
> > + spin_unlock(&sb_lock);
> > +
> > + /* we don't take the s_umount lock here because
> > + * because we can be called already holding a
> > + * write lock on a superblock */
>
> You can use trylock. Please see the patches in -mm to fix the umount
> race.

I'm not sure what unmount race you are talking about.

AFAICT, there is no race here - we've got a reference on the superblock so it
can't go away and the lru list is protected by the dcache_lock, so there's
nothing else we can race with. However, we can deadlock by taking the
s_umount lock here. So why even bother to try to take the lock when we don't
actually need it?

> > + if (!list_empty(&sb->s_dentry_lru))
> > + __shrink_dcache_sb(sb, &count, DCACHE_REFERENCED);
> >
> > - tmp = dentry_unused.prev;
> > - if (tmp == &dentry_unused)
> > + spin_lock(&sb_lock);
> > + if (__put_super_and_need_restart(sb) && count)
> > + goto restart;
>
> Comment please.

I'm not sure what a comment needs to explain that the code doesn't.
Which bit do you think needs commenting?

> > - spin_unlock(&dcache_lock);
> > + if (!work_done) {
> > + /* sb_hand is stale. Start and the beginning of the
> > + * list again. */

*nod*

> > }
> >
> > /*
> > @@ -454,41 +520,7 @@ static void prune_dcache(int count)
> >
> > void shrink_dcache_sb(struct super_block * sb)
> > {
> > - struct list_head *tmp, *next;
> > - struct dentry *dentry;
> > -
> > - /*
> > - * Pass one ... move the dentries for the specified
> > - * superblock to the most recent end of the unused list.
> > - */
> > - spin_lock(&dcache_lock);
> > - list_for_each_safe(tmp, next, &dentry_unused) {
> > - dentry = list_entry(tmp, struct dentry, d_lru);
> > - if (dentry->d_sb != sb)
> > - continue;
> > - list_del(tmp);
> > - list_add(tmp, &dentry_unused);
> > - }
> > -
> > - /*
> > - * Pass two ... free the dentries for this superblock.
> > - */
> > -repeat:
> > - list_for_each_safe(tmp, next, &dentry_unused) {
> > - dentry = list_entry(tmp, struct dentry, d_lru);
> > - if (dentry->d_sb != sb)
> > - continue;
> > - dentry_stat.nr_unused--;
> > - list_del_init(tmp);
> > - spin_lock(&dentry->d_lock);
> > - if (atomic_read(&dentry->d_count)) {
> > - spin_unlock(&dentry->d_lock);
> > - continue;
> > - }
> > - prune_one_dentry(dentry);
> > - goto repeat;
> > - }
> > - spin_unlock(&dcache_lock);
> > + __shrink_dcache_sb(sb, NULL, 0);
> > }
> >
> > /*
> > @@ -572,17 +604,13 @@ resume:
> > struct dentry *dentry = list_entry(tmp, struct dentry, d_u.d_child);
> > next = tmp->next;
> >
> > - if (!list_empty(&dentry->d_lru)) {
> > - dentry_stat.nr_unused--;
> > - list_del_init(&dentry->d_lru);
> > - }
> > + dentry_lru_del_init(dentry);
> > /*
> > * move only zero ref count dentries to the end
> > * of the unused list for prune_dcache
> > */
> > if (!atomic_read(&dentry->d_count)) {
> > - list_add(&dentry->d_lru, dentry_unused.prev);
> > - dentry_stat.nr_unused++;
> > + dentry_lru_add_tail(dentry);
> > found++;
> > }
>
> This should not be required with per super-block dentries. The only
> reason, I think we moved dentries to the tail is to club all entries
> from the sb together (to free them all at once).

I think we still need to do that. We get called in contexts that aren't
related to unmounting, so we want these dentries to be the first
to be reclaimed from that superblock when we next call prune_dcache().

> > @@ -657,18 +685,14 @@ void shrink_dcache_anon(struct hlist_hea
> > spin_lock(&dcache_lock);
> > hlist_for_each(lp, head) {
> > struct dentry *this = hlist_entry(lp, struct dentry, d_hash);
> > - if (!list_empty(&this->d_lru)) {
> > - dentry_stat.nr_unused--;
> > - list_del_init(&this->d_lru);
> > - }
> >
> > + dentry_lru_del_init(this);
> > /*
> > * move only zero ref count dentries to the end
> > * of the unused list for prune_dcache
> > */
> > if (!atomic_read(&this->d_count)) {
> > - list_add_tail(&this->d_lru, &dentry_unused);
> > - dentry_stat.nr_unused++;
> > + dentry_lru_add_tail(this);
> > found++;
> > }
> > }
>
> This might still be required. Do you want to split out the anonymous dcache
> entries as well? I am not sure what their superblock points to.

No. Right now I just want to fix the problem that has been reported with
shrink_dcache_sb().

>
>
>
>
> Balbir Singh,
> Linux Technology Center,
> IBM Software Labs

--
Dave Chinner
R&D Software Engineer
SGI Australian Software Group

2006-05-24 07:10:23

by Balbir Singh

Subject: Re: [PATCH] Per-superblock unused dentry LRU lists.


> per-sb, you mean? ;)

Yes.

>
> > You can use trylock. Please see the patches in -mm to fix the umount
> > race.
>
> I'm not sure what unmount race you are talking about.
>
> AFAICT, there is no race here - we've got a reference on the superblock so it
> can't go away and the lru list is protected by the dcache_lock, so there's
> nothing else we can race with. However, we can deadlock by taking the
> s_umount lock here. So why even bother to try to take the lock when we don't
> actually need it?

Please read the thread at http://lkml.org/lkml/2006/4/2/101.

>
> > > + if (__put_super_and_need_restart(sb) && count)
> > > + goto restart;
> >
> > Comment please.
>
> I'm not sure what a comment needs to explain that the code doesn't.
> Which bit do you think needs commenting?

I was referring to the __put_super_and_need_restart() part.

> > This should not be required with per super-block dentries. The only
> > reason, I think we moved dentries to the tail is to club all entries
> > from the sb together (to free them all at once).
>
> I think we still need to do that. We get called in contexts that aren't
> related to unmounting, so we want these dentries to be the first
> to be reclaimed from that superblock when we next call prune_dcache().

Is it? I quickly checked the callers of shrink_dcache_sb() and all of
them seem to be mount related. shrink_dcache_parent() is another story.
Am I missing something? A code reference would be particularly useful.

>
> No. Right now I just want to fix the problem that has been reported with
> shrink_dcache_sb().

Sure.

Balbir Singh,
Linux Technology Center,
IBM Software Labs

2006-05-24 08:16:12

by David Chinner

Subject: Re: [PATCH] Per-superblock unused dentry LRU lists.

On Wed, May 24, 2006 at 12:36:06PM +0530, Balbir Singh wrote:
> > > You can use trylock. Please see the patches in -mm to fix the umount
> > > race.
> >
> > I'm not sure what unmount race you are talking about.
> >
> > AFAICT, there is no race here - we've got a reference on the superblock so it
> > can't go away and the lru list is protected by the dcache_lock, so there's
> > nothing else we can race with. However, we can deadlock by taking the
> > s_umount lock here. So why even bother to try to take the lock when we don't
> > actually need it?
>
> Please read the thread at http://lkml.org/lkml/2006/4/2/101.

Ok, thanks for the pointer. It's a completely different sort of race you're
talking about ;)

I'll have to look at the -mm tree and how the locking is different
there before commenting further.

> > > > + if (__put_super_and_need_restart(sb) && count)
> > > > + goto restart;
> > >
> > > Comment please.
> >
> > I'm not sure what a comment needs to explain that the code doesn't.
> > Which bit do you think needs commenting?
>
> I was referring to the __put_super_and_need_restart() part.

Ok. I didn't think it necessary given its use in several other places,
and the comment at the head of __put_super_and_need_restart() explains
exactly what it does and when to use it. Comment added anyway.

> > > This should not be required with per super-block dentries. The only
> > > reason, I think we moved dentries to the tail is to club all entries
> > > from the sb together (to free them all at once).
> >
> > I think we still need to do that. We get called in contexts that aren't
> > related to unmounting, so we want these dentries to be the first
> > to be reclaimed from that superblock when we next call prune_dcache().
>
> Is it? I quickly checked the callers of shrink_dcache_sb() and all of
> them seem to be mount related. shrink_dcache_parent() is another story.
> Am I missing something? Code reference will be particularly useful.

I thought you were referring to shrink_dcache_parent(). At least, I
was, and the hunk of diff that your comment followed after was from
select_parent(). Please correct me if I'm wrong, but I think we're
agreeing that it's doing the right thing in select_parent().

The modified shrink_dcache_sb() doesn't do any list moving at all,
it simply frees all the dentries on the superblock in a single pass.....

Cheers,

Dave.
--
Dave Chinner
R&D Software Engineer
SGI Australian Software Group

2006-05-24 08:32:40

by Balbir Singh

Subject: Re: [PATCH] Per-superblock unused dentry LRU lists.

> I thought you were referring to shrink_dcache_parent(). At least, I
> was, and the hunk of diff that your comment followed after was from
> select_parent(). Please correct me if I'm wrong, but I think we're
> agreeing that it's doing the right thing in select_parent().
>
> The modified shrink_dcache_sb() doesn't do any list moving at all,
> it simply frees all the dentries on the superblock in a single pass.....

I think I misread the code, I thought I was still at an older
function.

>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> R&D Software Engineer
> SGI Australian Software Group

--

Balbir Singh,
Linux Technology Center,
IBM Software Labs