2006-03-08 14:51:08

by Jan Blunck

[permalink] [raw]
Subject: [PATCH] Fix shrink_dcache_parent() against shrink_dcache_memory() race (updated patch)

Andrew, I have test this patch for a while now and none of the users has seen
the "busy inodes" message for a while now. Can you please apply and test it in
-mm?

This is an updated version of the patch which adresses some issues that came
up during discussion. Although sb->prunes usually is 0, I'm testing it now
before calling wake_up(). Besides that, the shrink_dcache_parent() is only
waiting for prunes if we are called through generic_shutdown_super() when
sb->s_root is NULL.

Original patch description:

Kirill Korotaev <[email protected]> discovered a race between shrink_dcache_parent()
and shrink_dcache_memory() which leads to "Busy inodes after unmount".
When unmounting a file system shrink_dcache_parent() is racing against a
possible shrink_dcache_memory(). This might lead to the situation that
shrink_dcache_parent() is returning too early. In this situation the
super_block is destroyed before shrink_dcache_memory() could put the inode.

This patch fixes the problem through introducing a prunes counter which is
incremented when a dentry is pruned but the corresponding inoded isn't put
yet.When the prunes counter is not null, shrink_dcache_parent() is waiting and
restarting its work.

Signed-off-by: Jan Blunck <[email protected]>


Attachments:
(No filename) (1.22 kB)
umount-prune_one_dentry-fix.diff (3.51 kB)
Download all attachments

2006-03-09 06:33:42

by Balbir Singh

[permalink] [raw]
Subject: Re: [PATCH] Fix shrink_dcache_parent() against shrink_dcache_memory() race (updated patch)

On Wed, Mar 08, 2006 at 03:51:05PM +0100, Jan Blunck wrote:
> Andrew, I have test this patch for a while now and none of the users has seen
> the "busy inodes" message for a while now. Can you please apply and test it in
> -mm?
>
> This is an updated version of the patch which adresses some issues that came
> up during discussion. Although sb->prunes usually is 0, I'm testing it now
> before calling wake_up(). Besides that, the shrink_dcache_parent() is only
> waiting for prunes if we are called through generic_shutdown_super() when
> sb->s_root is NULL.
>
> Original patch description:
>
> Kirill Korotaev <[email protected]> discovered a race between shrink_dcache_parent()
> and shrink_dcache_memory() which leads to "Busy inodes after unmount".
> When unmounting a file system shrink_dcache_parent() is racing against a
> possible shrink_dcache_memory(). This might lead to the situation that
> shrink_dcache_parent() is returning too early. In this situation the
> super_block is destroyed before shrink_dcache_memory() could put the inode.
>
> This patch fixes the problem through introducing a prunes counter which is
> incremented when a dentry is pruned but the corresponding inoded isn't put
> yet.When the prunes counter is not null, shrink_dcache_parent() is waiting and
> restarting its work.
>
> Signed-off-by: Jan Blunck <[email protected]>

Looks good, small cosmetic comment below

<snip>

> +/*
> + * If we slept on waiting for other prunes to finish, there maybe are
> + * some dentries the d_lru list that we have "overlooked" the last
> + * time we called select_parent(). Therefor lets restart in this case.
> + */
> void shrink_dcache_parent(struct dentry * parent)
> {
> int found;
> + struct super_block *sb = parent->d_sb;
>
> + again:
> while ((found = select_parent(parent)) != 0)
> prune_dcache(found);
> +
> + /* If we are called from generic_shutdown_super() during
> + * umount of a filesystem, we want to check for other prunes */
> + if (!sb->s_root && wait_on_prunes(sb))
> + goto again;
> }

cosmetic - could we do this with a do { } while() loop instead of a goto?

I think though that after select_parent(), wait_on_prunes() can sleep just
once, so we do not need a goto. Just calling wait_on_prunes() should
fix the race. For all the dentries missed in the race, wait_on_parent()
will ensure that they are dput() by prune_one_dentry() before wait_on_parent()
returns.

But, I do not have anything against the goto, so this patch should be just
fine.

<snip>

> if (root) {
> sb->s_root = NULL;
> - shrink_dcache_parent(root);
> shrink_dcache_anon(&sb->s_anon);
> + shrink_dcache_parent(root);
> dput(root);

This change might conflict with the NFS patches in -mm.

<snip>

Thanks,
Balbir

2006-03-09 11:00:30

by Jan Blunck

[permalink] [raw]
Subject: Re: [PATCH] Fix shrink_dcache_parent() against shrink_dcache_memory() race (updated patch)

On Thu, Mar 09, Balbir Singh wrote:

> > +/*
> > + * If we slept on waiting for other prunes to finish, there maybe are
> > + * some dentries the d_lru list that we have "overlooked" the last
> > + * time we called select_parent(). Therefor lets restart in this case.
> > + */
> > void shrink_dcache_parent(struct dentry * parent)
> > {
> > int found;
> > + struct super_block *sb = parent->d_sb;
> >
> > + again:
> > while ((found = select_parent(parent)) != 0)
> > prune_dcache(found);
> > +
> > + /* If we are called from generic_shutdown_super() during
> > + * umount of a filesystem, we want to check for other prunes */
> > + if (!sb->s_root && wait_on_prunes(sb))
> > + goto again;
> > }
>
> cosmetic - could we do this with a do { } while() loop instead of a goto?
>
> I think though that after select_parent(), wait_on_prunes() can sleep just
> once, so we do not need a goto. Just calling wait_on_prunes() should
> fix the race. For all the dentries missed in the race, wait_on_parent()
> will ensure that they are dput() by prune_one_dentry() before wait_on_parent()
> returns.
>
> But, I do not have anything against the goto, so this patch should be just
> fine.
>

No. Think about a dentry which parent isn't unhashed. This parent will end up
on the lru-list after the wait_on_prunes(). Therefore we have to do the
select_parent()/prune_dcache() again to get rid of all dentries.

And I missed the "goto vs. do...while()" against my colleagues here, too ;)
I'll send a followup.

> > if (root) {
> > sb->s_root = NULL;
> > - shrink_dcache_parent(root);
> > shrink_dcache_anon(&sb->s_anon);
> > + shrink_dcache_parent(root);
> > dput(root);
>
> This change might conflict with the NFS patches in -mm.
>

Hmm, right. Andrew, if you want a rediff against -mm just tell me. I'm
actually diff'ing against lates linux-2.6.git.

Regards,
Jan

--
Jan Blunck [email protected]
SuSE LINUX AG - A Novell company
Maxfeldstr. 5 +49-911-74053-608
D-90409 N?rnberg http://www.suse.de

2006-03-09 11:24:06

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] Fix shrink_dcache_parent() against shrink_dcache_memory() race (updated patch)

Jan Blunck <[email protected]> wrote:
>
> > This change might conflict with the NFS patches in -mm.
> >
>
> Hmm, right. Andrew, if you want a rediff against -mm just tell me. I'm
> actually diff'ing against lates linux-2.6.git.

I'll work it out.

Are we all happy with this patch now?

<looks at it>

Cosmetically, I don't think wait_on_prunes() should be concerned about
whether or not it "slept". That action is not significant and preemptible
kernels can "sleep" at just about any stage. So I think the concept of
"slept" in there should be replaced with, say, "prunes_remaining" or
something like that. Consequently the all-important comment over
wait_on_prunes() should be updated to provide a bit more information about
the significance of its return value, please.

Also I think there should be some explanation somewhere which describes why
we can continue to assume that there aren't any prunes left to do after
wait_on_prunes() has dropped dcache_lock. I mean, once you've dropped the
lock it's usually the case that anything which you examined while holding
that lock now becomes out-of-date and invalid. I assume the thinking is
that because there's an unmount in progress, nothing can come in and add
new dentries?

IOW: why isn't there a race between wait_on_prunes() and prune_one_dentry()?

2006-03-09 11:36:47

by Balbir Singh

[permalink] [raw]
Subject: Re: [PATCH] Fix shrink_dcache_parent() against shrink_dcache_memory() race (updated patch)

<snip>
> No. Think about a dentry which parent isn't unhashed. This parent will end up
> on the lru-list after the wait_on_prunes(). Therefore we have to do the
> select_parent()/prune_dcache() again to get rid of all dentries.
>

Just checked, yes prune_one_dentry() needs to be called once the parent
gets on to the LRU list, so we must call prune_dcache() again.
In -mm, the generic_shutdown_super() calls shrink_dcache_sb() which takes care
of looping till the lru-list is empty.

Your fix is correct, but the looping is probably not be required for -mm.

Regards,
Balbir

<snip>

2006-03-09 11:58:33

by Jan Blunck

[permalink] [raw]
Subject: Re: [PATCH] Fix shrink_dcache_parent() against shrink_dcache_memory() race (updated patch)

On Thu, Mar 09, Andrew Morton wrote:

> > Hmm, right. Andrew, if you want a rediff against -mm just tell me. I'm
> > actually diff'ing against lates linux-2.6.git.
>
> I'll work it out.
>

Ok, so I'll just send an updated patch against linux-2.6.git adressing your
issues later.

> Cosmetically, I don't think wait_on_prunes() should be concerned about
> whether or not it "slept". That action is not significant and preemptible
> kernels can "sleep" at just about any stage. So I think the concept of
> "slept" in there should be replaced with, say, "prunes_remaining" or
> something like that. Consequently the all-important comment over
> wait_on_prunes() should be updated to provide a bit more information about
> the significance of its return value, please.

Ok, will do.

> Also I think there should be some explanation somewhere which describes why
> we can continue to assume that there aren't any prunes left to do after
> wait_on_prunes() has dropped dcache_lock. I mean, once you've dropped the
> lock it's usually the case that anything which you examined while holding
> that lock now becomes out-of-date and invalid. I assume the thinking is
> that because there's an unmount in progress, nothing can come in and add
> new dentries?
>
> IOW: why isn't there a race between wait_on_prunes() and prune_one_dentry()?

Because outside dcache_lock, either the refcount on the parent dentry is wrong
(therefore select_parent() might return 0) and sb_prunes != 0 or the refcount
is correct and the sb_prunes == 0. Since sb_root == NULL when unmounting the
filesystem there are no new dentries coming in.

> Are we all happy with this patch now?

I'll update the comments and come back to you with an updated patch. Then I'm
fine with the patch.

Regards,
Jan

--
Jan Blunck [email protected]
SuSE LINUX AG - A Novell company
Maxfeldstr. 5 +49-911-74053-608
D-90409 N?rnberg http://www.suse.de

2006-03-09 12:49:13

by Kirill Korotaev

[permalink] [raw]
Subject: Re: [PATCH] Fix shrink_dcache_parent() against shrink_dcache_memory() race (updated patch)

>>>This change might conflict with the NFS patches in -mm.
>>
>> >
>>
>> Hmm, right. Andrew, if you want a rediff against -mm just tell me. I'm
>> actually diff'ing against lates linux-2.6.git.
>
>
> I'll work it out.
>
> Are we all happy with this patch now?

I can't see why we fix shrink_dcache_parent() only, why
shrink_dcache_anon() is totally missed?

Thanks,
Kirill

2006-03-09 14:08:30

by Jan Blunck

[permalink] [raw]
Subject: Re: [PATCH] Fix shrink_dcache_parent() against shrink_dcache_memory() race (updated patch)

On Thu, Mar 09, Kirill Korotaev wrote:

> >
> >Are we all happy with this patch now?
>
> I can't see why we fix shrink_dcache_parent() only, why
> shrink_dcache_anon() is totally missed?
>

First of all because anon-dentries don't have a parent. So they are not a real
problem in don't restarting the shrink_dcache_anon() if we waited for prunes.
Since I've reordered the calls to shrink_dcache_anon() and
shrink_dcache_parent() in generic_shutdown_super() they are handled as normal
dentries if they are pruned through shrink_dcache_memory() from the d_lru list.

Regards,
Jan

--
Jan Blunck [email protected]
SuSE LINUX AG - A Novell company
Maxfeldstr. 5 +49-911-74053-608
D-90409 N?rnberg http://www.suse.de

2006-03-09 14:31:40

by Kirill Korotaev

[permalink] [raw]
Subject: Re: [PATCH] Fix shrink_dcache_parent() against shrink_dcache_memory() race (updated patch)

>>>Are we all happy with this patch now?
>>
>>I can't see why we fix shrink_dcache_parent() only, why
>>shrink_dcache_anon() is totally missed?
>>
>
>
> First of all because anon-dentries don't have a parent. So they are not a real
> problem in don't restarting the shrink_dcache_anon() if we waited for prunes.
> Since I've reordered the calls to shrink_dcache_anon() and
> shrink_dcache_parent() in generic_shutdown_super() they are handled as normal
> dentries if they are pruned through shrink_dcache_memory() from the d_lru list.
This looks a bad idea to reorder calls to achieve such a behvaiour.
I would move the loop outside of shrink_dcache_parent to
generic_shutdown_super instead.

Thanks,
Kirill


2006-03-09 14:38:19

by Kirill Korotaev

[permalink] [raw]
Subject: Re: [PATCH] Fix shrink_dcache_parent() against shrink_dcache_memory() race (updated patch)

commented your patch a bit.
and attached a corrected version. please review it.

> Andrew, I have test this patch for a while now and none of the users has seen
> the "busy inodes" message for a while now. Can you please apply and test it in
> -mm?
>
> This is an updated version of the patch which adresses some issues that came
> up during discussion. Although sb->prunes usually is 0, I'm testing it now
> before calling wake_up(). Besides that, the shrink_dcache_parent() is only
> waiting for prunes if we are called through generic_shutdown_super() when
> sb->s_root is NULL.
>
> Original patch description:
>
> Kirill Korotaev <[email protected]> discovered a race between shrink_dcache_parent()
> and shrink_dcache_memory() which leads to "Busy inodes after unmount".
> When unmounting a file system shrink_dcache_parent() is racing against a
> possible shrink_dcache_memory(). This might lead to the situation that
> shrink_dcache_parent() is returning too early. In this situation the
> super_block is destroyed before shrink_dcache_memory() could put the inode.
>
> This patch fixes the problem through introducing a prunes counter which is
> incremented when a dentry is pruned but the corresponding inoded isn't put
> yet.When the prunes counter is not null, shrink_dcache_parent() is waiting and
> restarting its work.
>
> Signed-off-by: Jan Blunck <[email protected]>
>
>
> ------------------------------------------------------------------------
>
> ---
>
> fs/dcache.c | 48 +++++++++++++++++++++++++++++++++++++++++++++++-
> fs/super.c | 4 +++-
> include/linux/fs.h | 3 +++
> 3 files changed, 53 insertions(+), 2 deletions(-)
>
> Index: linux-2.6/fs/dcache.c
> ===================================================================
> --- linux-2.6.orig/fs/dcache.c
> +++ linux-2.6/fs/dcache.c
> @@ -364,17 +364,22 @@ restart:
> */
> static inline void prune_one_dentry(struct dentry * dentry)
> {
> + struct super_block *sb = dentry->d_sb;
> struct dentry * parent;
>
> __d_drop(dentry);
> list_del(&dentry->d_u.d_child);
> dentry_stat.nr_dentry--; /* For d_free, below */
> + sb->s_prunes++;
> dentry_iput(dentry);
> parent = dentry->d_parent;
> d_free(dentry);
> if (parent != dentry)
> dput(parent);
> spin_lock(&dcache_lock);
> + sb->s_prunes--;
> + if (likely(!sb->s_prunes))
<<< Is it possibe to do something like:
if (unlikely(!sb->s_root && !sb->s_prunes))
?

> + wake_up(&sb->s_wait_prunes);
> }
>
> /**
> @@ -623,19 +628,60 @@ out:
> return found;
> }
>
> +/*
> + * A special version of wait_event(!sb->s_prunes) which takes the dcache_lock
> + * when checking the condition and gives feedback if we slept.
> + */
> +static int wait_on_prunes(struct super_block *sb)
> +{
> + DEFINE_WAIT(wait);
> + int slept = 0;
> +
> +#ifdef DCACHE_DEBUG
> + printk(KERN_DEBUG "%s: waiting for %d prunes\n", __FUNCTION__,
> + sb->s_prunes);
> +#endif
> +
> + spin_lock(&dcache_lock);
> + for (;;) {
> + prepare_to_wait(&sb->s_wait_prunes, &wait,
> + TASK_UNINTERRUPTIBLE);
> + if (!sb->s_prunes)
> + break;
> + spin_unlock(&dcache_lock);
> + schedule();
> + slept = 1;
> + spin_lock(&dcache_lock);
> + }
> + spin_unlock(&dcache_lock);
> + finish_wait(&sb->s_wait_prunes, &wait);
> + return slept;
> +}
> +
> /**
> * shrink_dcache_parent - prune dcache
> * @parent: parent of entries to prune
> *
> * Prune the dcache to remove unused children of the parent dentry.
> */
> -
> +/*
> + * If we slept on waiting for other prunes to finish, there maybe are
> + * some dentries the d_lru list that we have "overlooked" the last
> + * time we called select_parent(). Therefor lets restart in this case.
> + */
> void shrink_dcache_parent(struct dentry * parent)
> {
> int found;
> + struct super_block *sb = parent->d_sb;
>
> + again:
> while ((found = select_parent(parent)) != 0)
> prune_dcache(found);
> +
> + /* If we are called from generic_shutdown_super() during
> + * umount of a filesystem, we want to check for other prunes */
> + if (!sb->s_root && wait_on_prunes(sb))
> + goto again;
<<<< I don't like this loop here as it looks like a hack for some
special case.
better to move it to generic_shutdown() and omit sb->s_root check at all.

> }
>
> /**
> Index: linux-2.6/fs/super.c
> ===================================================================
> --- linux-2.6.orig/fs/super.c
> +++ linux-2.6/fs/super.c
> @@ -80,6 +80,8 @@ static struct super_block *alloc_super(v
> sema_init(&s->s_dquot.dqio_sem, 1);
> sema_init(&s->s_dquot.dqonoff_sem, 1);
> init_rwsem(&s->s_dquot.dqptr_sem);
> + s->s_prunes = 0;
> + init_waitqueue_head(&s->s_wait_prunes);
> init_waitqueue_head(&s->s_wait_unfrozen);
> s->s_maxbytes = MAX_NON_LFS;
> s->dq_op = sb_dquot_ops;
> @@ -230,8 +232,8 @@ void generic_shutdown_super(struct super
>
> if (root) {
> sb->s_root = NULL;
> - shrink_dcache_parent(root);
> shrink_dcache_anon(&sb->s_anon);
> + shrink_dcache_parent(root);
<<<< As I noted in another email it is better to move the loop outside
of shrink_dcache_parent...

> dput(root);
<<<< just FYI: your patch looks better then original mine here, as you
don't need to check for race with root, while we check for specific
dentry race and had to check for the race after dput(root) (see my patch
if interested).

> fsync_super(sb);
> lock_super(sb);
> Index: linux-2.6/include/linux/fs.h
> ===================================================================
> --- linux-2.6.orig/include/linux/fs.h
> +++ linux-2.6/include/linux/fs.h
> @@ -835,6 +835,9 @@ struct super_block {
> struct list_head s_instances;
> struct quota_info s_dquot; /* Diskquota specific options */
>
> + unsigned int s_prunes; /* protected by dcache_lock */
> + wait_queue_head_t s_wait_prunes;
> +
> int s_frozen;
> wait_queue_head_t s_wait_unfrozen;
>


2006-03-09 16:09:24

by Jan Blunck

[permalink] [raw]
Subject: Re: [PATCH] Fix shrink_dcache_parent() against shrink_dcache_memory() race (updated patch)

On Thu, Mar 09, Kirill Korotaev wrote:

> commented your patch a bit.
> and attached a corrected version. please review it.
>

Thanks! I'll send the corrected patch.

So, everythings fine now?


> > d_free(dentry);
> > if (parent != dentry)
> > dput(parent);
> > spin_lock(&dcache_lock);
> >+ sb->s_prunes--;
> >+ if (likely(!sb->s_prunes))
> <<< Is it possibe to do something like:
> if (unlikely(!sb->s_root && !sb->s_prunes))
> ?

Uh, I forgot about that one. You already complained about that before :(

> > void shrink_dcache_parent(struct dentry * parent)
> > {
> > int found;
> >+ struct super_block *sb = parent->d_sb;
> >
> >+ again:
> > while ((found = select_parent(parent)) != 0)
> > prune_dcache(found);
> >+
> >+ /* If we are called from generic_shutdown_super() during
> >+ * umount of a filesystem, we want to check for other prunes */
> >+ if (!sb->s_root && wait_on_prunes(sb))
> >+ goto again;
> <<<< I don't like this loop here as it looks like a hack for some
> special case.
> better to move it to generic_shutdown() and omit sb->s_root check at all.
>

Yes, looks a little cleaner though.

Regards,
Jan

--
Jan Blunck [email protected]
SuSE LINUX AG - A Novell company
Maxfeldstr. 5 +49-911-74053-608
D-90409 N?rnberg http://www.suse.de

2006-03-09 16:14:00

by Kirill Korotaev

[permalink] [raw]
Subject: Re: [PATCH] Fix shrink_dcache_parent() against shrink_dcache_memory() race (updated patch)

> Thanks! I'll send the corrected patch.
> So, everythings fine now?
looks so! Will be glad to Ack/Sign or whatever needed :)))

>>> d_free(dentry);
>>> if (parent != dentry)
>>> dput(parent);
>>> spin_lock(&dcache_lock);
>>>+ sb->s_prunes--;
>>>+ if (likely(!sb->s_prunes))
>>
>><<< Is it possibe to do something like:
>>if (unlikely(!sb->s_root && !sb->s_prunes))
>>?
>
>
> Uh, I forgot about that one. You already complained about that before :(
But I'm not sure it is that simple... s_root is set to NULL w/o locks,
so I wonder whether it is safe to check it here or we can miss some
wakeups...

Thanks,
Kirill

2006-03-09 16:39:50

by Jan Blunck

[permalink] [raw]
Subject: Re: [PATCH] Fix shrink_dcache_parent() against shrink_dcache_memory() race (updated patch)

On Thu, Mar 09, Kirill Korotaev wrote:

> >Thanks! I'll send the corrected patch.
> >So, everythings fine now?
> looks so! Will be glad to Ack/Sign or whatever needed :)))
>

Ok.

> >>> d_free(dentry);
> >>> if (parent != dentry)
> >>> dput(parent);
> >>> spin_lock(&dcache_lock);
> >>>+ sb->s_prunes--;
> >>>+ if (likely(!sb->s_prunes))
> >>
> >><<< Is it possibe to do something like:
> >>if (unlikely(!sb->s_root && !sb->s_prunes))
> >>?
> >
> >
> >Uh, I forgot about that one. You already complained about that before :(
> But I'm not sure it is that simple... s_root is set to NULL w/o locks,
> so I wonder whether it is safe to check it here or we can miss some
> wakeups...

No, it's not. We need to down_read(&sb->s_umount) for that which is
deadlocking because we down_write() it before calling ->kill_sb(). So this
isn't safe. For now I'll keep it like before and live with the overhead of
calling wake_up() on an empty wait-queue.

Regards,
Jan

--
Jan Blunck [email protected]
SuSE LINUX AG - A Novell company
Maxfeldstr. 5 +49-911-74053-608
D-90409 N?rnberg http://www.suse.de