2006-10-25 12:31:08

by Vasily Averin

[permalink] [raw]
Subject: [Q] missing unused dentry in prune_dcache()?

Hello folks,

I would like to ask you clarify me one question in the the following patch:
http://linux.bkbits.net:8080/linux-2.6/gnupatch@449b144ecSF1rYskg3q-SeR2vf88zg
# ChangeSet
# 2006/06/22 15:05:57-07:00 [email protected]
# [PATCH] Fix dcache race during umount

# If prune_dcache finds a dentry that it cannot free, it leaves it where it
# is (at the tail of the list) and exits, on the assumption that some other
# thread will be removing that dentry soon.

However as far as I see this comment is not correct: when we cannot take
s_umount rw_semaphore (for example because it was taken in do_remount) this
dentry is already extracted from dentry_unused list and we do not add it into
the list again. Therefore dentry will not be found by prune_dcache() and
shrink_dcache_sb() and will leave in memory very long time until the partition
will be unmounted.

Am I probably err?

The patch adds this dentry into tail of the dentry_unused list.

Signed-off-by: Vasily Averin <[email protected]>

--- linux-2.6.19-rc3/fs/dcache.c.prdch 2006-10-25 16:09:19.000000000 +0400
+++ linux-2.6.19-rc3/fs/dcache.c 2006-10-25 16:08:20.000000000 +0400
@@ -477,6 +477,8 @@ static void prune_dcache(int count, stru
}
up_read(s_umount);
}
+ list_add_tail(&dentry->d_lru, &dentry_unused);
+ dentry_stat.nr_unused++;
spin_unlock(&dentry->d_lock);
/* Cannot remove the first dentry, and it isn't appropriate
* to move it to the head of the list, so give up, and try


2006-10-25 13:52:41

by David Howells

[permalink] [raw]
Subject: Re: [Q] missing unused dentry in prune_dcache()?

Vasily Averin <[email protected]> wrote:

> # If prune_dcache finds a dentry that it cannot free, it leaves it where it
> # is (at the tail of the list) and exits, on the assumption that some other
> # thread will be removing that dentry soon.
>
> However as far as I see this comment is not correct: when we cannot take
> s_umount rw_semaphore (for example because it was taken in do_remount) this
> dentry is already extracted from dentry_unused list and we do not add it into
> the list again.

You would seem to be correct.

> Therefore dentry will not be found by prune_dcache() and shrink_dcache_sb()
> and will leave in memory very long time until the partition will be
> unmounted.

And here too:-/

> Am I probably err?

Unfortunately not. I wonder if remount should be getting a writelock on the
s_umount sem, but I don't see why not. grab_super() also gets a writelock on
it, and so that could cause problems too.

shrink_dcache_for_umount_subtree() doesn't care because it doesn't scan the
dcache_unused list, but as you say, other things are affected.

> The patch adds this dentry into tail of the dentry_unused list.

I think that's reasonable. I wonder if we can avoid removing it from the list
in the first place, but I suspect it's less optimal.

Acked-By: David Howells <[email protected]>

2006-10-25 13:58:49

by Vasily Averin

[permalink] [raw]
Subject: Re: [Q] missing unused dentry in prune_dcache()?

David Howells wrote:
> Vasily Averin <[email protected]> wrote:
>> The patch adds this dentry into tail of the dentry_unused list.
>
> I think that's reasonable. I wonder if we can avoid removing it from the list
> in the first place, but I suspect it's less optimal.

Could you please explain this place in details, I do not understand why tail of
the list is better than head.
Also I do not understand why we should go to out in this case. Why we cannot use
next dentry in the list instead?

> Acked-By: David Howells <[email protected]>


Thank you,
Vasily Averin

2006-10-25 14:29:50

by David Howells

[permalink] [raw]
Subject: Re: [Q] missing unused dentry in prune_dcache()?

Vasily Averin <[email protected]> wrote:

> >> The patch adds this dentry into tail of the dentry_unused list.
> >
> > I think that's reasonable. I wonder if we can avoid removing it from the
> > list in the first place, but I suspect it's less optimal.
>
> Could you please explain this place in details, I do not understand why tail
> of the list is better than head. Also I do not understand why we should go
> to out in this case. Why we cannot use next dentry in the list instead?

I meant adding it back into the list is reasonable; I didn't actually consider
where you were adding it back.

However, you've just raised a good point: moving a dentry to the head of the
list is what happens when we see it's been referenced recently. The most
recently used entry is at the head and the least at the tail.

If you look at the list scan algorithm, you can see it works from the tail
towards the head.

We could re-insert the dentry at the position from which we removed it, but
that would mean retaining a pointer to one of its neighbouring dentries. We
can do that - it's just stack space after all...

I think the main point though, is generally that the superblock that owns the
dentry is very busy in some critical way; either it's being unmounted (in which
case we may as well put the dentry at the MRU end), or it's being remounted (in
which case, it's probably best putting it back where we found it).

You also have to consider how often superblocks are going to be unmounted or
remounted vs the dcache being pruned...

There is also grab_super(), but that's only used by sget() when a superblock is
being mounted again - also unlikely.

So, given that the three ops that affect this are very unlikely to be happening
at any one time, it's probably worth just slapping it at the head and ignoring
it. The main thing is that it's reinserted somewhere.

Hope that helps...

David

2006-10-26 11:37:01

by Vasily Averin

[permalink] [raw]
Subject: Re: [Q] missing unused dentry in prune_dcache()?

Hello David

David Howells wrote:
> Vasily Averin <[email protected]> wrote:
>>>> The patch adds this dentry into tail of the dentry_unused list.
>>> I think that's reasonable. I wonder if we can avoid removing it from the
>>> list in the first place, but I suspect it's less optimal.
>> Could you please explain this place in details, I do not understand why tail
>> of the list is better than head. Also I do not understand why we should go
>> to out in this case. Why we cannot use next dentry in the list instead?
>
> I meant adding it back into the list is reasonable; I didn't actually consider
> where you were adding it back.
>
> So, given that the three ops that affect this are very unlikely to be happening
> at any one time, it's probably worth just slapping it at the head and ignoring
> it. The main thing is that it's reinserted somewhere.

I don't like to insert this dentry into tail of list because of it prevents
shrink_dcache_memory. It finds "skipped" dentry at the tail of the list, does
not free it and goes to out without any progress.

Therefore I've removed break of cycle and insert this dentry to head of the
list. Theoretically it can lead to the second using of the same dentry, however
I do not think that it is a big problem.

Signed-off-by: Vasily Averin <[email protected]>

--- linux-2.6.19-rc3/fs/dcache.c.prdch 2006-10-25 16:09:19.000000000 +0400
+++ linux-2.6.19-rc3/fs/dcache.c 2006-10-26 15:14:51.000000000 +0400
@@ -478,11 +478,9 @@ static void prune_dcache(int count, stru
up_read(s_umount);
}
spin_unlock(&dentry->d_lock);
- /* Cannot remove the first dentry, and it isn't appropriate
- * to move it to the head of the list, so give up, and try
- * later
- */
- break;
+ /* Inserting dentry to tail of the list leads to cycle */
+ list_add(&dentry->d_lru, &dentry_unused);
+ dentry_stat.nr_unused++;
}
spin_unlock(&dcache_lock);
}

2006-10-26 11:49:32

by Vasily Averin

[permalink] [raw]
Subject: Re: [Q] missing unused dentry in prune_dcache()?

Hello David,

I've noticed one more minor issue in your patch: in
shrink_dcache_for_umount_subtree() function you decrement dentry_stat.nr_dentry
without dcache_lock.

Thank you,
Vasily Averin

2006-10-26 12:35:22

by David Howells

[permalink] [raw]
Subject: Re: [Q] missing unused dentry in prune_dcache()?

Vasily Averin <[email protected]> wrote:

> I've noticed one more minor issue in your patch: in
> shrink_dcache_for_umount_subtree() function you decrement
> dentry_stat.nr_dentry without dcache_lock.

Hmmm... that's a very good point:-/ I wonder if I can batch them.

David

2006-10-26 13:25:18

by David Howells

[permalink] [raw]
Subject: Re: [Q] missing unused dentry in prune_dcache()?

Vasily Averin <[email protected]> wrote:

> I've noticed one more minor issue in your patch: in
> shrink_dcache_for_umount_subtree() function you decrement
> dentry_stat.nr_dentry without dcache_lock.

How about the attached patch?

David
---
VFS: Fix an error in unused dentry counting

From: David Howells <[email protected]>

Fix an error in unused dentry counting in shrink_dcache_for_umount_subtree() in
which the count is modified without the dcache_lock held.

Signed-Off-By: David Howells <[email protected]>
---

fs/dcache.c | 22 +++++++++++++++-------
1 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index a1ff91e..eab1bf4 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -553,16 +553,17 @@ repeat:
* - see the comments on shrink_dcache_for_umount() for a description of the
* locking
*/
-static void shrink_dcache_for_umount_subtree(struct dentry *dentry)
+static unsigned shrink_dcache_for_umount_subtree(struct dentry *dentry)
{
struct dentry *parent;
+ unsigned detached = 0;

BUG_ON(!IS_ROOT(dentry));

/* detach this root from the system */
spin_lock(&dcache_lock);
if (!list_empty(&dentry->d_lru)) {
- dentry_stat.nr_unused--;
+ detached++;
list_del_init(&dentry->d_lru);
}
__d_drop(dentry);
@@ -579,7 +580,7 @@ static void shrink_dcache_for_umount_sub
list_for_each_entry(loop, &dentry->d_subdirs,
d_u.d_child) {
if (!list_empty(&loop->d_lru)) {
- dentry_stat.nr_unused--;
+ detached++;
list_del_init(&loop->d_lru);
}

@@ -620,7 +621,7 @@ static void shrink_dcache_for_umount_sub
atomic_dec(&parent->d_count);

list_del(&dentry->d_u.d_child);
- dentry_stat.nr_dentry--; /* For d_free, below */
+ detached++; /* For d_free, below */

inode = dentry->d_inode;
if (inode) {
@@ -638,7 +639,7 @@ static void shrink_dcache_for_umount_sub
* otherwise we ascend to the parent and move to the
* next sibling if there is one */
if (!parent)
- return;
+ return detached;

dentry = parent;

@@ -663,6 +664,7 @@ static void shrink_dcache_for_umount_sub
void shrink_dcache_for_umount(struct super_block *sb)
{
struct dentry *dentry;
+ unsigned detached = 0;

if (down_read_trylock(&sb->s_umount))
BUG();
@@ -670,11 +672,17 @@ void shrink_dcache_for_umount(struct sup
dentry = sb->s_root;
sb->s_root = NULL;
atomic_dec(&dentry->d_count);
- shrink_dcache_for_umount_subtree(dentry);
+ detached = shrink_dcache_for_umount_subtree(dentry);

while (!hlist_empty(&sb->s_anon)) {
dentry = hlist_entry(sb->s_anon.first, struct dentry, d_hash);
- shrink_dcache_for_umount_subtree(dentry);
+ detached += shrink_dcache_for_umount_subtree(dentry);
+ }
+
+ if (detached) {
+ spin_lock(&dcache_lock);
+ dentry_stat.nr_unused -= detached;
+ spin_unlock(&dcache_lock);
}
}

2006-10-26 13:26:57

by David Howells

[permalink] [raw]
Subject: Re: [Q] missing unused dentry in prune_dcache()?

Vasily Averin <[email protected]> wrote:

> Therefore I've removed break of cycle and insert this dentry to head of the
> list. Theoretically it can lead to the second using of the same dentry,
> however I do not think that it is a big problem.

Hmmm... Or maybe it could be a problem. If whenever we find the dentry we
stick it back on the head of the list, this could be a problem as we're
traversing the list from tail to head.

David

2006-10-26 13:58:30

by Vasily Averin

[permalink] [raw]
Subject: Re: [Q] missing unused dentry in prune_dcache()?

David Howells wrote:
> Vasily Averin <[email protected]> wrote:
>
>> I've noticed one more minor issue in your patch: in
>> shrink_dcache_for_umount_subtree() function you decrement
>> dentry_stat.nr_dentry without dcache_lock.
>
> How about the attached patch?

I would note that in first two hunks you have decremented dentry_stat.nr_unused
correctly, under dcache_lock. Probably it's better to count freed dentries only
in third case and corrects dentry_stat.nr_unused value inside
shrink_dcache_for_umount_subtree() function before return.

Thank you,
Vasily Averin
> David
> ---
> VFS: Fix an error in unused dentry counting
>
> From: David Howells <[email protected]>
>
> Fix an error in unused dentry counting in shrink_dcache_for_umount_subtree() in
> which the count is modified without the dcache_lock held.
>
> Signed-Off-By: David Howells <[email protected]>
> ---
>
> fs/dcache.c | 22 +++++++++++++++-------
> 1 files changed, 15 insertions(+), 7 deletions(-)
>
> diff --git a/fs/dcache.c b/fs/dcache.c
> index a1ff91e..eab1bf4 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -553,16 +553,17 @@ repeat:
> * - see the comments on shrink_dcache_for_umount() for a description of the
> * locking
> */
> -static void shrink_dcache_for_umount_subtree(struct dentry *dentry)
> +static unsigned shrink_dcache_for_umount_subtree(struct dentry *dentry)
> {
> struct dentry *parent;
> + unsigned detached = 0;
>
> BUG_ON(!IS_ROOT(dentry));
>
> /* detach this root from the system */
> spin_lock(&dcache_lock);
> if (!list_empty(&dentry->d_lru)) {
> - dentry_stat.nr_unused--;
> + detached++;
> list_del_init(&dentry->d_lru);
> }
> __d_drop(dentry);
> @@ -579,7 +580,7 @@ static void shrink_dcache_for_umount_sub
> list_for_each_entry(loop, &dentry->d_subdirs,
> d_u.d_child) {
> if (!list_empty(&loop->d_lru)) {
> - dentry_stat.nr_unused--;
> + detached++;
> list_del_init(&loop->d_lru);
> }
>
> @@ -620,7 +621,7 @@ static void shrink_dcache_for_umount_sub
> atomic_dec(&parent->d_count);
>
> list_del(&dentry->d_u.d_child);
> - dentry_stat.nr_dentry--; /* For d_free, below */
> + detached++; /* For d_free, below */
>
> inode = dentry->d_inode;
> if (inode) {
> @@ -638,7 +639,7 @@ static void shrink_dcache_for_umount_sub
> * otherwise we ascend to the parent and move to the
> * next sibling if there is one */
> if (!parent)
> - return;
> + return detached;
>
> dentry = parent;
>
> @@ -663,6 +664,7 @@ static void shrink_dcache_for_umount_sub
> void shrink_dcache_for_umount(struct super_block *sb)
> {
> struct dentry *dentry;
> + unsigned detached = 0;
>
> if (down_read_trylock(&sb->s_umount))
> BUG();
> @@ -670,11 +672,17 @@ void shrink_dcache_for_umount(struct sup
> dentry = sb->s_root;
> sb->s_root = NULL;
> atomic_dec(&dentry->d_count);
> - shrink_dcache_for_umount_subtree(dentry);
> + detached = shrink_dcache_for_umount_subtree(dentry);
>
> while (!hlist_empty(&sb->s_anon)) {
> dentry = hlist_entry(sb->s_anon.first, struct dentry, d_hash);
> - shrink_dcache_for_umount_subtree(dentry);
> + detached += shrink_dcache_for_umount_subtree(dentry);
> + }
> +
> + if (detached) {
> + spin_lock(&dcache_lock);
> + dentry_stat.nr_unused -= detached;
> + spin_unlock(&dcache_lock);
> }
> }
>
>

2006-10-26 14:09:05

by David Howells

[permalink] [raw]
Subject: Re: [Q] missing unused dentry in prune_dcache()?

Vasily Averin <[email protected]> wrote:

> I would note that in first two hunks you have decremented
> dentry_stat.nr_unused correctly, under dcache_lock. Probably it's better to
> count freed dentries only in third case and corrects dentry_stat.nr_unused
> value inside shrink_dcache_for_umount_subtree() function before return.

Maybe, maybe not. This way we only touch the dentry_stat cacheline once, if
at all. I'm not sure it'd make much difference though.

David

2006-10-27 06:32:42

by Vasily Averin

[permalink] [raw]
Subject: Re: [Q] missing unused dentry in prune_dcache()?

David Howells wrote:
> Vasily Averin <[email protected]> wrote:
>
>> I've noticed one more minor issue in your patch: in
>> shrink_dcache_for_umount_subtree() function you decrement
>> dentry_stat.nr_dentry without dcache_lock.
>
> How about the attached patch?
I'm sorry, but your patch is wrong:
you have mixed calculation of 2 variables:
dentry_stat.nr_unused -- were correct, it was decremented under dcache_lock.
dentry_stat.nr_dentry -- were incorrect, it was decremented without dcache_lock.

You should correct dentry_stat.nr_dentry, but instead you broke calculation of
dentry_stat.nr_unused.

I've fixed this issue by following patch.

Thank you,
Vasily Averin
---
VFS: Fix an error in dentry_stat.nr_dentry counting

From: Vasily Averin <[email protected]>

Fix an error in dentry_stat.nr_dentry counting in
shrink_dcache_for_umount_subtree() in which the count is modified without the
dcache_lock held.

Signed-Off-By: Vasily Averin <[email protected]>

--- linux-2.6.19-rc3/fs/dcache.c.nrdntr 2006-10-26 15:14:51.000000000 +0400
+++ linux-2.6.19-rc3/fs/dcache.c 2006-10-27 10:24:07.000000000 +0400
@@ -554,6 +554,7 @@ repeat:
static void shrink_dcache_for_umount_subtree(struct dentry *dentry)
{
struct dentry *parent;
+ unsigned detached = 0;

BUG_ON(!IS_ROOT(dentry));

@@ -618,7 +619,7 @@ static void shrink_dcache_for_umount_sub
atomic_dec(&parent->d_count);

list_del(&dentry->d_u.d_child);
- dentry_stat.nr_dentry--; /* For d_free, below */
+ detached++;

inode = dentry->d_inode;
if (inode) {
@@ -635,8 +636,8 @@ static void shrink_dcache_for_umount_sub
/* finished when we fall off the top of the tree,
* otherwise we ascend to the parent and move to the
* next sibling if there is one */
- if (!parent)
- return;
+ if (!parent)
+ goto out;

dentry = parent;

@@ -645,6 +646,11 @@ static void shrink_dcache_for_umount_sub
dentry = list_entry(dentry->d_subdirs.next,
struct dentry, d_u.d_child);
}
+out:
+ /* several dentries were freed, need to correct nr_dentry */
+ spin_lock(&dcache_lock);
+ dentry_stat.nr_dentry -= detached;
+ spin_unlock(&dcache_lock);
}

/*

2006-10-27 06:50:51

by Vasily Averin

[permalink] [raw]
Subject: Re: [Q] missing unused dentry in prune_dcache()?

Vasily Averin wrote:
> David Howells wrote:
>> Vasily Averin <[email protected]> wrote:
>>
>>> I've noticed one more minor issue in your patch: in
>>> shrink_dcache_for_umount_subtree() function you decrement
>>> dentry_stat.nr_dentry without dcache_lock.
>> How about the attached patch?
> I'm sorry, but your patch is wrong:
> you have mixed calculation of 2 variables:
> dentry_stat.nr_unused -- were correct, it was decremented under dcache_lock.
> dentry_stat.nr_dentry -- were incorrect, it was decremented without dcache_lock.
>
> You should correct dentry_stat.nr_dentry, but instead you broke calculation of
> dentry_stat.nr_unused.
>
> I've fixed this issue by following patch.
corrected version, extra space were removed

Thank you,
Vasily Averin

---
VFS: Fix an error in dentry_stat.nr_dentry counting

From: Vasily Averin <[email protected]>

Fix an error in dentry_stat.nr_dentry counting in
shrink_dcache_for_umount_subtree() in which the count is modified without the
dcache_lock held.

Signed-Off-By: Vasily Averin <[email protected]>

--- linux-2.6.19-rc3/fs/dcache.c.nrdntr 2006-10-26 15:14:51.000000000 +0400
+++ linux-2.6.19-rc3/fs/dcache.c 2006-10-27 10:45:11.000000000 +0400
@@ -554,6 +554,7 @@ repeat:
static void shrink_dcache_for_umount_subtree(struct dentry *dentry)
{
struct dentry *parent;
+ unsigned detached = 0;

BUG_ON(!IS_ROOT(dentry));

@@ -618,7 +619,7 @@ static void shrink_dcache_for_umount_sub
atomic_dec(&parent->d_count);

list_del(&dentry->d_u.d_child);
- dentry_stat.nr_dentry--; /* For d_free, below */
+ detached++;

inode = dentry->d_inode;
if (inode) {
@@ -636,7 +637,7 @@ static void shrink_dcache_for_umount_sub
* otherwise we ascend to the parent and move to the
* next sibling if there is one */
if (!parent)
- return;
+ goto out;

dentry = parent;

@@ -645,6 +646,11 @@ static void shrink_dcache_for_umount_sub
dentry = list_entry(dentry->d_subdirs.next,
struct dentry, d_u.d_child);
}
+out:
+ /* several dentries were freed, need to correct nr_dentry */
+ spin_lock(&dcache_lock);
+ dentry_stat.nr_dentry -= detached;
+ spin_unlock(&dcache_lock);
}

/*

2006-10-27 08:06:06

by Vasily Averin

[permalink] [raw]
Subject: Re: [Q] missing unused dentry in prune_dcache()?

David Howells wrote:
> Vasily Averin <[email protected]> wrote:
>
>> Therefore I've removed break of cycle and insert this dentry to head of the
>> list. Theoretically it can lead to the second using of the same dentry,
>> however I do not think that it is a big problem.
>
> Hmmm... Or maybe it could be a problem. If whenever we find the dentry we
> stick it back on the head of the list, this could be a problem as we're
> traversing the list from tail to head.

I still do not think that it could be a problem:
count argument of prune_dcache is 128 if prune_dcache is called from
shrink_dcache_memory() function and even lesser if prune_dcache is called from
shrink_dcache_parent() function. I think usually the size of unused_dentry list
(nr_usnused) is much greater and may be compared with these values only in case
of very hard memory shortage. From my point of view it is very rare situation
and I even not sure that it can really happen at all.

However even if this situation will happen I do not see here any seriously
troubles: Yes, we will try to free the same dentries, much probably without
success again, but why it is bad in case of hard memory shortage?

If my arguments are not convincing, I can protect second use of the same dentry:
we can compare counter of the skipped dentries with nr_unused value.

And what the alternatives we have?
1) We can move this dentry to end of list? But it is bad because it will prevent
shrink_dcache_memory().
2) we can move these dentries into some temporal list, and insert it back to
unused_list later? But it is bad because of we can be rescheduled and drop
dcache_lock and may confuse someone who will assume that these dentries are in
unused_list.

Therefore I believe that my patch is optimal solution.

Thank you,
Vasily Averin

2006-10-27 09:38:30

by David Howells

[permalink] [raw]
Subject: Re: [Q] missing unused dentry in prune_dcache()?

Vasily Averin <[email protected]> wrote:

> > You should correct dentry_stat.nr_dentry, but instead you broke
> > calculation of dentry_stat.nr_unused.
> >
> > I've fixed this issue by following patch.
> corrected version, extra space were removed

Oops. You're right.

Acked-By: David Howells <[email protected]>

2006-10-27 10:44:22

by David Howells

[permalink] [raw]
Subject: Re: [Q] missing unused dentry in prune_dcache()?


Vasily Averin <[email protected]> wrote:

> Therefore I believe that my patch is optimal solution.

I'm not sure that prune_dcache() is particularly optimal. If we're looking to
prune for a specific superblock, it may scan most of the dentry_unused list
several times, once for each dentry it eliminates.

Imagine the list with a million dentries on it. Imagine further that all the
dentries you're trying to eliminate are up near the head end: you're going to
have to scan most of the list several times unnecessarily; if you're asked to
kill 128 dentries, you might wind up examining on the order of 100,000,000
dentries, 99% of which you scan 128 times.

I wonder if this could be improved by making the assumption that there won't be
any entries inserted tailwards of where we've just looked. The problem is that
if dcache_lock is dropped, we've no way of keeping track of the current
position without inserting a marker into the list.

Now we could do the marker thing quite easily. We'd have to insert a dummy
dcache entry, probably with d_sb pointing to some special location that is
recognised as saying "that dentry is a marker".

We could do something like the attached patch, for example. Note that the
patch compiles, but I haven't tested it. It also uses a big chunk of stack
space for the marker. It ought to be safe enough with respect to the other
functions that touch that list - all of those deal with specific dentries or
look for dentries by superblock.

David
---
diff --git a/fs/dcache.c b/fs/dcache.c
index eab1bf4..a1cae74 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -395,18 +395,27 @@ static void prune_one_dentry(struct dent
* This function may fail to free any resources if
* all the dentries are in use.
*/
-
+
static void prune_dcache(int count, struct super_block *sb)
{
+ struct dentry marker = {
+ .d_sb = (struct super_block *) &prune_dcache,
+ };
+
+ struct list_head *tmp;
+
spin_lock(&dcache_lock);
+ list_add_tail(&marker.d_lru, &dentry_unused);
+
for (; count ; count--) {
struct dentry *dentry;
- struct list_head *tmp;
struct rw_semaphore *s_umount;

cond_resched_lock(&dcache_lock);

- tmp = dentry_unused.prev;
+ tmp = marker.d_lru.prev;
+ list_del_init(&marker.d_lru);
+
if (sb) {
/* Try to find a dentry for this sb, but don't try
* too hard, if they aren't near the tail they will
@@ -418,9 +427,18 @@ static void prune_dcache(int count, stru
skip--;
tmp = tmp->prev;
}
+ } else {
+ /* We may not be the only pruner */
+ while (tmp != &dentry_unused) {
+ dentry = list_entry(tmp, struct dentry, d_lru);
+ if (dentry->d_sb !=
+ (struct super_block *) &prune_dcache)
+ break;
+ }
}
if (tmp == &dentry_unused)
break;
+ list_add(&marker.d_lru, tmp);
list_del_init(tmp);
prefetch(dentry_unused.prev);
dentry_stat.nr_unused--;
@@ -439,7 +457,7 @@ static void prune_dcache(int count, stru
/* If the dentry was recently referenced, don't free it. */
if (dentry->d_flags & DCACHE_REFERENCED) {
dentry->d_flags &= ~DCACHE_REFERENCED;
- list_add(&dentry->d_lru, &dentry_unused);
+ list_add(&dentry->d_lru, &marker.d_lru);
dentry_stat.nr_unused++;
spin_unlock(&dentry->d_lock);
continue;
@@ -478,12 +496,10 @@ static void prune_dcache(int count, stru
up_read(s_umount);
}
spin_unlock(&dentry->d_lock);
- /* Cannot remove the first dentry, and it isn't appropriate
- * to move it to the head of the list, so give up, and try
- * later
- */
- break;
+ list_add(&dentry->d_lru, &marker.d_lru);
+ dentry_stat.nr_unused++;
}
+ list_del(&marker.d_lru);
spin_unlock(&dcache_lock);
}

2006-10-27 11:51:09

by Vasily Averin

[permalink] [raw]
Subject: Re: [Q] missing unused dentry in prune_dcache()?

David,

David Howells wrote:
> Vasily Averin <[email protected]> wrote:
>> Therefore I believe that my patch is optimal solution.
> I'm not sure that prune_dcache() is particularly optimal.

I means that my patch is optimal for problem in subject. I would like to ask you
to approve it and we will go to next issue.

> If we're looking to
> prune for a specific superblock, it may scan most of the dentry_unused list
> several times, once for each dentry it eliminates.
>
> Imagine the list with a million dentries on it. Imagine further that all the
> dentries you're trying to eliminate are up near the head end: you're going to
> have to scan most of the list several times unnecessarily; if you're asked to
> kill 128 dentries, you might wind up examining on the order of 100,000,000
> dentries, 99% of which you scan 128 times.

I would note that we (Virtuozzo/OpenVZ team) have seen similar issue on praxis.
We have kernel that handles a few dozens Virtual servers, and each of them have
the several isolated filesystems. We have seen that umount (and remount) can
work very slowly, it was cycled inside shrink_dcache_sb() up to several hours
with taken s_umount semaphore.

We are trying to resolve this issue by using per-sb lru list. I'm preparing the
patch for 2.6.19-rc3 right now and going to send it soon.

thank you,
Vasily Averin

2006-10-27 12:13:34

by David Howells

[permalink] [raw]
Subject: Re: [Q] missing unused dentry in prune_dcache()?

Vasily Averin <[email protected]> wrote:

> > Vasily Averin <[email protected]> wrote:
> >> Therefore I believe that my patch is optimal solution.
> > I'm not sure that prune_dcache() is particularly optimal.
>
> I means that my patch is optimal for problem in subject.

I didn't say it wasn't.

> I would like to ask you to approve it and we will go to next issue.

I did ack it didn't I? I must fix my mail client so that it doesn't
automatically remove my email address from the To/Cc fields when I'm replying
to a message:-/

> We have seen that umount (and remount) can work very slowly, it was cycled
> inside shrink_dcache_sb() up to several hours with taken s_umount semaphore.

umount at least should be fixed as that should no longer use
shrink_dcache_sb().

> We are trying to resolve this issue by using per-sb lru list. I'm preparing
> the patch for 2.6.19-rc3 right now and going to send it soon.

That sounds tricky; you have to check all your LRU lists to find the LRU
dentry.

David

2006-10-27 13:42:19

by Vasily Averin

[permalink] [raw]
Subject: [PATCH 2.6.19-rc3] VFS: missing unused dentry in prune_dcache()

Andrew,

could you please use this patch instead of
missing-unused-dentry-in-prune_dcache.patch

As far as I understand David Howells have not any objections

Thank you,
Vasily Averin

---
VFS: missing unused dentry in prune_dcache()

From: Vasily Averin <[email protected]>

If we cannot delete dentry we should insert it back to the dentry_unused list.
To prevent cycle and do not blocks prune_dcache() call from
shrink_dcache_memory() we adding this dentry to head of list.

Signed-off-by: Vasily Averin <[email protected]>

--- linux-2.6.19-rc3/fs/dcache.c.prdch 2006-10-25 16:09:19.000000000 +0400
+++ linux-2.6.19-rc3/fs/dcache.c 2006-10-26 15:14:51.000000000 +0400
@@ -478,11 +478,9 @@ static void prune_dcache(int count, stru
up_read(s_umount);
}
spin_unlock(&dentry->d_lock);
- /* Cannot remove the first dentry, and it isn't appropriate
- * to move it to the head of the list, so give up, and try
- * later
- */
- break;
+ /* Inserting dentry to tail of the list leads to cycle */
+ list_add(&dentry->d_lru, &dentry_unused);
+ dentry_stat.nr_unused++;
}
spin_unlock(&dcache_lock);
}

2006-10-27 13:47:07

by Vasily Averin

[permalink] [raw]
Subject: Re: [Q] missing unused dentry in prune_dcache()?

David Howells wrote:
> Vasily Averin <[email protected]> wrote:
>> I would like to ask you to approve it and we will go to next issue.
>
> I did ack it didn't I? I must fix my mail client so that it doesn't
> automatically remove my email address from the To/Cc fields when I'm replying
> to a message:-/

To prevent any mistake I've resend this patch again to you and akpm@.

>> We have seen that umount (and remount) can work very slowly, it was cycled
>> inside shrink_dcache_sb() up to several hours with taken s_umount semaphore.
>
> umount at least should be fixed as that should no longer use
> shrink_dcache_sb().

Umount calls shrink_dcache_sb in "Special case for "unmounting" root".
Usually it happen only once, but in case OpenVZ it happens every time when any
Virtual server is stopped, each of them have own isolated root partition.

>> We are trying to resolve this issue by using per-sb lru list. I'm preparing
>> the patch for 2.6.19-rc3 right now and going to send it soon.

Please wait until my following email, it will be ready soon

thank you,
Vasily Averin

2006-10-27 14:05:57

by Vasily Averin

[permalink] [raw]
Subject: [PATCH 2.6.19-rc3] VFS: per-sb dentry lru list

From: Vasily Averin <[email protected]>

Virtuozzo/OpenVZ linux kernel team has discovered that umount/remount can last
for hours looping in shrink_dcache_sb() without much successes. Since during
shrinking s_umount semaphore is taken lots of other unrelated operations like
sync can stop working until shrink finished.

It happens due to very long unused dentries list (>1 million of dentries),
so that it takes shrink_dcache_sb() longer then 1/HZ seconds to get to the
required dentry and after freeing this single dentry it reschedules and restarts.

The proposed fix prevents this issue by using per-sb dentry LRU list. It
provides very quickly search for the unused dentries for given super block thus
forcing shrinking always making good progress.

It was well tested on 2.6.9-based Virtuozzo/OpenVZ kernels, but the port to the
latest mainstream kernel is not tested.

Signed-off-by: Kirill Korotaev <[email protected]>
Signed-off-by: Vasily Averin <[email protected]>

--- linux-2.6.19-rc3/fs/dcache.c.shrsb 2006-10-27 10:45:11.000000000 +0400
+++ linux-2.6.19-rc3/fs/dcache.c 2006-10-27 15:49:54.000000000 +0400
@@ -173,6 +173,7 @@ repeat:
if (list_empty(&dentry->d_lru)) {
dentry->d_flags |= DCACHE_REFERENCED;
list_add(&dentry->d_lru, &dentry_unused);
+ list_add(&dentry->d_sb_lru, &dentry->d_sb->s_dentry_unused);
dentry_stat.nr_unused++;
}
spin_unlock(&dentry->d_lock);
@@ -190,6 +191,7 @@ kill_it: {
*/
if (!list_empty(&dentry->d_lru)) {
list_del(&dentry->d_lru);
+ list_del(&dentry->d_sb_lru);
dentry_stat.nr_unused--;
}
list_del(&dentry->d_u.d_child);
@@ -270,6 +272,7 @@ static inline struct dentry * __dget_loc
if (!list_empty(&dentry->d_lru)) {
dentry_stat.nr_unused--;
list_del_init(&dentry->d_lru);
+ list_del_init(&dentry->d_sb_lru);
}
return dentry;
}
@@ -398,6 +401,10 @@ static void prune_one_dentry(struct dent

static void prune_dcache(int count, struct super_block *sb)
{
+ struct list_head *lru_head;
+
+ lru_head = sb ? &sb->s_dentry_unused : &dentry_unused;
+
spin_lock(&dcache_lock);
for (; count ; count--) {
struct dentry *dentry;
@@ -406,25 +413,16 @@ static void prune_dcache(int count, stru

cond_resched_lock(&dcache_lock);

- tmp = dentry_unused.prev;
- if (sb) {
- /* Try to find a dentry for this sb, but don't try
- * too hard, if they aren't near the tail they will
- * be moved down again soon
- */
- int skip = count;
- while (skip && tmp != &dentry_unused &&
- list_entry(tmp, struct dentry, d_lru)->d_sb != sb) {
- skip--;
- tmp = tmp->prev;
- }
- }
- if (tmp == &dentry_unused)
+ tmp = lru_head->prev;
+ if (tmp == lru_head)
break;
- list_del_init(tmp);
- prefetch(dentry_unused.prev);
+
+ prefetch(lru_head->prev);
dentry_stat.nr_unused--;
- dentry = list_entry(tmp, struct dentry, d_lru);
+ dentry = sb ? list_entry(tmp, struct dentry, d_sb_lru) :
+ list_entry(tmp, struct dentry, d_lru);
+ list_del_init(&dentry->d_lru);
+ list_del_init(&dentry->d_sb_lru);

spin_lock(&dentry->d_lock);
/*
@@ -440,6 +438,8 @@ static void prune_dcache(int count, stru
if (dentry->d_flags & DCACHE_REFERENCED) {
dentry->d_flags &= ~DCACHE_REFERENCED;
list_add(&dentry->d_lru, &dentry_unused);
+ list_add(&dentry->d_sb_lru,
+ &dentry->d_sb->s_dentry_unused);
dentry_stat.nr_unused++;
spin_unlock(&dentry->d_lock);
continue;
@@ -455,7 +455,8 @@ static void prune_dcache(int count, stru
* If this dentry is for "my" filesystem, then I can prune it
* without taking the s_umount lock (I already hold it).
*/
- if (sb && dentry->d_sb == sb) {
+ if (sb) {
+ BUG_ON(dentry->d_sb != sb);
prune_one_dentry(dentry);
continue;
}
@@ -480,6 +481,8 @@ static void prune_dcache(int count, stru
spin_unlock(&dentry->d_lock);
/* Inserting dentry to tail of the list leads to cycle */
list_add(&dentry->d_lru, &dentry_unused);
+ list_add(&dentry->d_sb_lru,
+ &dentry->d_sb->s_dentry_unused);
dentry_stat.nr_unused++;
}
spin_unlock(&dcache_lock);
@@ -509,31 +512,15 @@ static void prune_dcache(int count, stru

void shrink_dcache_sb(struct super_block * sb)
{
- struct list_head *tmp, *next;
- struct dentry *dentry;
-
- /*
- * Pass one ... move the dentries for the specified
- * superblock to the most recent end of the unused list.
- */
spin_lock(&dcache_lock);
- list_for_each_safe(tmp, next, &dentry_unused) {
- dentry = list_entry(tmp, struct dentry, d_lru);
- if (dentry->d_sb != sb)
- continue;
- list_move(tmp, &dentry_unused);
- }
+ while (!list_empty(&sb->s_dentry_unused)) {
+ struct dentry *dentry;

- /*
- * Pass two ... free the dentries for this superblock.
- */
-repeat:
- list_for_each_safe(tmp, next, &dentry_unused) {
- dentry = list_entry(tmp, struct dentry, d_lru);
- if (dentry->d_sb != sb)
- continue;
+ dentry = list_entry((&sb->s_dentry_unused)->next,
+ struct dentry, d_sb_lru);
dentry_stat.nr_unused--;
- list_del_init(tmp);
+ list_del_init(&dentry->d_lru);
+ list_del_init(&dentry->d_sb_lru);
spin_lock(&dentry->d_lock);
if (atomic_read(&dentry->d_count)) {
spin_unlock(&dentry->d_lock);
@@ -541,7 +528,6 @@ repeat:
}
prune_one_dentry(dentry);
cond_resched_lock(&dcache_lock);
- goto repeat;
}
spin_unlock(&dcache_lock);
}
@@ -563,6 +549,7 @@ static void shrink_dcache_for_umount_sub
if (!list_empty(&dentry->d_lru)) {
dentry_stat.nr_unused--;
list_del_init(&dentry->d_lru);
+ list_del_init(&dentry->d_sb_lru);
}
__d_drop(dentry);
spin_unlock(&dcache_lock);
@@ -580,6 +567,7 @@ static void shrink_dcache_for_umount_sub
if (!list_empty(&loop->d_lru)) {
dentry_stat.nr_unused--;
list_del_init(&loop->d_lru);
+ list_del_init(&loop->d_sb_lru);
}

__d_drop(loop);
@@ -766,6 +754,7 @@ resume:
if (!list_empty(&dentry->d_lru)) {
dentry_stat.nr_unused--;
list_del_init(&dentry->d_lru);
+ list_del_init(&dentry->d_sb_lru);
}
/*
* move only zero ref count dentries to the end
@@ -773,6 +762,8 @@ resume:
*/
if (!atomic_read(&dentry->d_count)) {
list_add_tail(&dentry->d_lru, &dentry_unused);
+ list_add_tail(&dentry->d_sb_lru,
+ &dentry->d_sb->s_dentry_unused);
dentry_stat.nr_unused++;
found++;
}
@@ -892,6 +883,7 @@ struct dentry *d_alloc(struct dentry * p
#endif
INIT_HLIST_NODE(&dentry->d_hash);
INIT_LIST_HEAD(&dentry->d_lru);
+ INIT_LIST_HEAD(&dentry->d_sb_lru);
INIT_LIST_HEAD(&dentry->d_subdirs);
INIT_LIST_HEAD(&dentry->d_alias);

--- linux-2.6.19-rc3/fs/super.c.shrsb 2006-10-24 10:29:13.000000000 +0400
+++ linux-2.6.19-rc3/fs/super.c 2006-10-27 15:36:33.000000000 +0400
@@ -69,6 +69,7 @@ static struct super_block *alloc_super(s
INIT_LIST_HEAD(&s->s_io);
INIT_LIST_HEAD(&s->s_files);
INIT_LIST_HEAD(&s->s_instances);
+ INIT_LIST_HEAD(&s->s_dentry_unused);
INIT_HLIST_HEAD(&s->s_anon);
INIT_LIST_HEAD(&s->s_inodes);
init_rwsem(&s->s_umount);
--- linux-2.6.19-rc3/include/linux/dcache.h.shrsb 2006-10-24 10:29:14.000000000
+0400
+++ linux-2.6.19-rc3/include/linux/dcache.h 2006-10-27 15:36:33.000000000 +0400
@@ -94,6 +94,7 @@ struct dentry {
struct qstr d_name;

struct list_head d_lru; /* LRU list */
+ struct list_head d_sb_lru; /* per-sb LRU list */
/*
* d_child and d_rcu can share memory
*/
--- linux-2.6.19-rc3/include/linux/fs.h.shrsb 2006-10-24 10:29:14.000000000 +0400
+++ linux-2.6.19-rc3/include/linux/fs.h 2006-10-27 15:36:33.000000000 +0400
@@ -939,6 +939,7 @@ struct super_block {
struct list_head s_dirty; /* dirty inodes */
struct list_head s_io; /* parked for writeback */
struct hlist_head s_anon; /* anonymous dentries for (nfs) exporting */
+ struct list_head s_dentry_unused;
struct list_head s_files;

struct block_device *s_bdev;

2006-10-27 14:25:57

by David Howells

[permalink] [raw]
Subject: Re: [PATCH 2.6.19-rc3] VFS: missing unused dentry in prune_dcache()

Vasily Averin <[email protected]> wrote:

> + /* Inserting dentry to tail of the list leads to cycle */
> + list_add(&dentry->d_lru, &dentry_unused);
> + dentry_stat.nr_unused++;

I'd phrase that comment differently: "Insert dentry at the head of the list as
inserting at the tail leads to a cycle".

But other than that:

Acked-By: David Howells <[email protected]>

2006-10-27 14:30:37

by David Howells

[permalink] [raw]
Subject: Re: [Q] missing unused dentry in prune_dcache()?

Vasily Averin <[email protected]> wrote:

> Umount calls shrink_dcache_sb in "Special case for "unmounting" root".
> Usually it happen only once, but in case OpenVZ it happens every time when any
> Virtual server is stopped, each of them have own isolated root partition.

Where is that? Are you talking about do_remount_sb()?

David

2006-10-27 14:32:25

by Kirill Korotaev

[permalink] [raw]
Subject: Re: [Q] missing unused dentry in prune_dcache()?

> Vasily Averin <[email protected]> wrote:
>
>
>>Umount calls shrink_dcache_sb in "Special case for "unmounting" root".
>>Usually it happen only once, but in case OpenVZ it happens every time when any
>>Virtual server is stopped, each of them have own isolated root partition.
>
>
> Where is that? Are you talking about do_remount_sb()?
AFAIR yes.

Kirill

2006-10-27 18:07:53

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 2.6.19-rc3] VFS: per-sb dentry lru list

On Fri, 27 Oct 2006 18:05:50 +0400
Vasily Averin <[email protected]> wrote:

> Virtuozzo/OpenVZ linux kernel team has discovered that umount/remount can last
> for hours looping in shrink_dcache_sb() without much successes. Since during
> shrinking s_umount semaphore is taken lots of other unrelated operations like
> sync can stop working until shrink finished.

Did you consider altering shrink_dcache_sb() so that it holds onto
dcache_lock and moves all the to-be-pruned dentries onto a private list in
a single pass, then prunes them all outside the lock?

2006-10-30 04:26:16

by David Chinner

[permalink] [raw]
Subject: Re: [PATCH 2.6.19-rc3] VFS: per-sb dentry lru list

On Fri, Oct 27, 2006 at 06:05:50PM +0400, Vasily Averin wrote:
>
> The proposed fix prevents this issue by using per-sb dentry LRU list. It
> provides very quickly search for the unused dentries for given super block thus
> forcing shrinking always making good progress.

We've been down this path before:

http://marc.theaimsgroup.com/?l=linux-kernel&m=114861109717260&w=2

A lot of comments on per-sb unused dentry lists were made in
the threads associated with the above. other solutions were
found to the problem that the above patch addressed, but I don't
think any of them have made it to mainline yet. You might want
to have a bit of a read of these threads first...

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group

2006-10-30 06:20:56

by Kirill Korotaev

[permalink] [raw]
Subject: Re: [Devel] Re: [PATCH 2.6.19-rc3] VFS: per-sb dentry lru list

David,

>>The proposed fix prevents this issue by using per-sb dentry LRU list. It
>>provides very quickly search for the unused dentries for given super block thus
>>forcing shrinking always making good progress.
>
>
> We've been down this path before:
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=114861109717260&w=2
>
> A lot of comments on per-sb unused dentry lists were made in
> the threads associated with the above. other solutions were
> found to the problem that the above patch addressed, but I don't
> think any of them have made it to mainline yet. You might want
> to have a bit of a read of these threads first...
The major difference between our patch and the one discussed in the link
it that we keep both global and per-sb dentry LRU lists.
Thus, when needed normal LRU is used and prune logic is unchanged,
while umount/remount use per-sb list and do its job faster.

Thanks,
Kirill

2006-10-30 14:24:30

by Vasily Averin

[permalink] [raw]
Subject: Re: [PATCH 2.6.19-rc3] VFS: per-sb dentry lru list

Andrew Morton wrote:
> On Fri, 27 Oct 2006 18:05:50 +0400
> Vasily Averin <[email protected]> wrote:
>
>> Virtuozzo/OpenVZ linux kernel team has discovered that umount/remount can last
>> for hours looping in shrink_dcache_sb() without much successes. Since during
>> shrinking s_umount semaphore is taken lots of other unrelated operations like
>> sync can stop working until shrink finished.
>
> Did you consider altering shrink_dcache_sb() so that it holds onto
> dcache_lock and moves all the to-be-pruned dentries onto a private list in
> a single pass, then prunes them all outside the lock?

At the first glance it is wrong because of 2 reasons:
1) it continues to check the whole global LRU list (we propose to use per-sb
LRU, it will provide very quick search)
2) we have not any guarantee that someone will add new unused dentries to the
list when we prune it outside the lock. And to the contrary, some of unused
dentries can be used again. As far as I understand we should hold dcache_lock
beginning at the removing dentry from unused_list until dentry_iput() call.

David did it inside shrink_dcache_for_umount() just because it have guarantee
that all the filesystem operations are finished and new ones cannot be started.

Thank you,
Vasily Averin

2006-10-30 15:06:26

by Kirill Korotaev

[permalink] [raw]
Subject: Re: [PATCH 2.6.19-rc3] VFS: per-sb dentry lru list

Andrew,

>>Virtuozzo/OpenVZ linux kernel team has discovered that umount/remount can last
>>for hours looping in shrink_dcache_sb() without much successes. Since during
>>shrinking s_umount semaphore is taken lots of other unrelated operations like
>>sync can stop working until shrink finished.
>
>
> Did you consider altering shrink_dcache_sb() so that it holds onto
> dcache_lock and moves all the to-be-pruned dentries onto a private list in
> a single pass, then prunes them all outside the lock?

moving dentries from global list to the local one can take arbitrary number
of milliseconds (with huge amount of memory), so nothing good here from latency
view point.

Thanks,
Kirill

2006-10-30 15:08:28

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH 2.6.19-rc3] VFS: per-sb dentry lru list

On Monday 30 October 2006 15:24, Vasily Averin wrote:
> Andrew Morton wrote:
> > On Fri, 27 Oct 2006 18:05:50 +0400
> >
> > Vasily Averin <[email protected]> wrote:
> >> Virtuozzo/OpenVZ linux kernel team has discovered that umount/remount
> >> can last for hours looping in shrink_dcache_sb() without much successes.
> >> Since during shrinking s_umount semaphore is taken lots of other
> >> unrelated operations like sync can stop working until shrink finished.
> >
> > Did you consider altering shrink_dcache_sb() so that it holds onto
> > dcache_lock and moves all the to-be-pruned dentries onto a private list
> > in a single pass, then prunes them all outside the lock?
>
> At the first glance it is wrong because of 2 reasons:
> 1) it continues to check the whole global LRU list (we propose to use
> per-sb LRU, it will provide very quick search)

Quick search maybe, but your patch adds 2 pointers to each dentry in the
system... That's pretty expensive, as dentries are already using a *lot* of
ram.

Maybe an alternative would be to not have anymore a global dentry_unused, but
only per-sb unused dentries lists ?


> 2) we have not any guarantee that someone will add new unused dentries to
> the list when we prune it outside the lock. And to the contrary, some of
> unused dentries can be used again. As far as I understand we should hold
> dcache_lock beginning at the removing dentry from unused_list until
> dentry_iput() call.
>
> David did it inside shrink_dcache_for_umount() just because it have
> guarantee that all the filesystem operations are finished and new ones
> cannot be started.

2006-10-30 15:27:17

by Kirill Korotaev

[permalink] [raw]
Subject: Re: [PATCH 2.6.19-rc3] VFS: per-sb dentry lru list

> Quick search maybe, but your patch adds 2 pointers to each dentry in the
> system... That's pretty expensive, as dentries are already using a *lot* of
> ram.
I don't see much problems with it... it is cache and it can be pruned if needed.
Some time ago, for example, my patch introducing the same list for inodes
was commited.

> Maybe an alternative would be to not have anymore a global dentry_unused, but
> only per-sb unused dentries lists ?
I don't know global LRU implementation based on per-sb lists, do you?
If someone suggest the algorithm for more or less fair global LRU
based on non-global list we will implement it. However, so far,
AFAICS there were problems with it.

Thanks,
Kirill

2006-10-30 16:09:35

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH 2.6.19-rc3] VFS: per-sb dentry lru list

On Monday 30 October 2006 16:34, Kirill Korotaev wrote:
> > Quick search maybe, but your patch adds 2 pointers to each dentry in the
> > system... That's pretty expensive, as dentries are already using a *lot*
> > of ram.
>
> I don't see much problems with it... it is cache and it can be pruned if
> needed. Some time ago, for example, my patch introducing the same list for
> inodes was commited.

The ratio of dentries/PAGE is higher than ration of inodes/PAGE.
(On a x86_64 machine, 19 dentries per PAGE, 5 ext3 inodes per PAGE=

Therefore I suspect that the number of pages locked in dentry_cache because of
one inuse dentry is higher.

>
> > Maybe an alternative would be to not have anymore a global dentry_unused,
> > but only per-sb unused dentries lists ?
>
> I don't know global LRU implementation based on per-sb lists, do you?
> If someone suggest the algorithm for more or less fair global LRU
> based on non-global list we will implement it. However, so far,
> AFAICS there were problems with it.

Using 32 bytes per dentry for unused LRUs sounds too much, maybe we should
revert LRUS handling to timestamps or things like that.

2006-10-31 13:24:42

by Vasily Averin

[permalink] [raw]
Subject: [Q] missing ->d_delete() in shrink_dcache_for_umount()?

David,

Vasily Averin wrote:
> Hello folks,
>
> I would like to ask you clarify me one question in the the following patch:
> http://linux.bkbits.net:8080/linux-2.6/gnupatch@449b144ecSF1rYskg3q-SeR2vf88zg
> # ChangeSet
> # 2006/06/22 15:05:57-07:00 [email protected]
> # [PATCH] Fix dcache race during umount
>
> # If prune_dcache finds a dentry that it cannot free, it leaves it where it
> # is (at the tail of the list) and exits, on the assumption that some other
> # thread will be removing that dentry soon.

It looks like I've noticed yet one suspicious place in your patch. As far as I
see you have removed dput(root) call in shrink_dcache_for_umount() function.
However I would note that dput contains ->d_delete() call that is missing in
your function:

if (dentry->d_op && dentry->d_op->d_delete) {
if (dentry->d_op->d_delete(dentry))
goto unhash_it;

I'm not sure but it seems to me some (probably out-of-tree) filesystems can do
something useful in this place.

Thank you,
Vasily Averin

2006-10-31 15:08:56

by David Howells

[permalink] [raw]
Subject: Re: [Q] missing ->d_delete() in shrink_dcache_for_umount()?


Vasily Averin <[email protected]> wrote:

> It looks like I've noticed yet one suspicious place in your patch. As far as I
> see you have removed dput(root) call in shrink_dcache_for_umount() function.
> However I would note that dput contains ->d_delete() call that is missing in
> your function:
>
> if (dentry->d_op && dentry->d_op->d_delete) {
> if (dentry->d_op->d_delete(dentry))
> goto unhash_it;
>
> I'm not sure but it seems to me some (probably out-of-tree) filesystems can do
> something useful in this place.

Can they though? I'm not so sure. I've added Al to the To list to get his
opinion.

It seems to me that d_op->d_delete() is asking a question of whether the dentry
should be discarded immediately upon dput() reducing the usage count to 0. The
documentation in vfs.txt isn't entirely clear on this point, but what it does
say is that that op isn't allowed to sleep, so there's a limit as to what it
can do.

Furthermore, d_op->d_delete() is probably the wrong hook to call. It returns
an indication of whether the dentry should be retained or not, but we aren't
interested in that at this point: we have to get rid of the dentry anyway, so
the fs's opinion is irrelevant.

Finally, we do still call d_op->d_release() by way of d_free(), so it's not as
if the fs doesn't get a chance to clean up the dentry.

David