2007-01-12 20:45:57

by Eric Sandeen

[permalink] [raw]
Subject: [PATCH] [RFC] remove ext3 inode from orphan list when link and unlink race

I've been looking at a case where many threads are opening, unlinking, and
hardlinking files on ext3 . At unmount time I see an oops, because the superblock's
orphan list points to a freed inode.

I did some tracing of the inodes, and it looks like this:

ext3_unlink():[/src/linux-2.6.18/fs/ext3/namei.c:2123] adding orphan
i_state:0x7 cpu:1 i_count:2 i_nlink:0

ext3_orphan_add():[/src/linux-2.6.18/fs/ext3/namei.c:1890] ext3_orphan_add
i_state:0x7 cpu:1 i_count:2 i_nlink:0

iput():[/src/linux-2.6.18/fs/inode.c:1139] iput enter
i_state:0x7 cpu:1 i_count:2 i_nlink:0

ext3_link():[/src/linux-2.6.18/fs/ext3/namei.c:2202] ext3_link enter
i_state:0x7 cpu:3 i_count:1 i_nlink:0

ext3_inc_count():[/src/linux-2.6.18/fs/ext3/namei.c:1627] done
i_state:0x7 cpu:3 i_count:1 i_nlink:1

The unlink gets there first, finds i_count > 0 (in use) but nlink goes to 0, so
it puts it on the orphan inode list. Then link comes along, and bumps the link
back up to 1. So now we are on the orphan inode list, but we are not unlinked.

Eventually when count goes to 0, and we still have 1 link, again no action is
taken to remove the inode from the orphan list, because it is still linked (i.e.
we don't go through ext3_delete())

When this inode is eventually freed, the sb orphan list gets corrupted, because
we have freed it without first removing it from the orphan list.

I think the simple solution is to remove the inode from the orphan list
when we bump the link back up from 0 to 1. I put that test in there because
there are other potential reasons that we might be on the list (truncates,
direct IO).

Comments?

Thanks,
-Eric

p.s. ext3_inc_count and ext3_dec_count seem misnamed, have an unused
arg, and are very infrequently called. I'll probably submit a patch
to just put the single line of code into the caller, too.

---

Remove inode from the orphan list in ext3_link() if we might have
raced with ext3_unlink(), which potentially put it on the list.
If we're on the list with nlink > 0, we'll never get cleaned up
properly and eventually may corrupt the list.

Signed-off-by: Eric Sandeen <[email protected]>

Index: linux-2.6.19/fs/ext3/namei.c
===================================================================
--- linux-2.6.19.orig/fs/ext3/namei.c
+++ linux-2.6.19/fs/ext3/namei.c
@@ -2204,6 +2204,9 @@ retry:
inode->i_ctime = CURRENT_TIME_SEC;
ext3_inc_count(handle, inode);
atomic_inc(&inode->i_count);
+ /* did we race w/ unlink? */
+ if (inode->i_nlink == 1)
+ ext3_orphan_del(handle, inode);

err = ext3_add_nondir(handle, dentry, inode);
ext3_journal_stop(handle);


2007-01-12 21:02:17

by Alex Tomas

[permalink] [raw]
Subject: Re: [PATCH] [RFC] remove ext3 inode from orphan list when link and unlink race


interesting ..

I thought VFS doesn't allow concurrent operations.
if unlink goes first, then link should wait on the
parent's i_mutex and then found no source name.

thanks, Alex

>>>>> Eric Sandeen (ES) writes:

ES> )
ES> I've been looking at a case where many threads are opening, unlinking, and
ES> hardlinking files on ext3 . At unmount time I see an oops, because the superblock's
ES> orphan list points to a freed inode.

ES> I did some tracing of the inodes, and it looks like this:

ES> ext3_unlink():[/src/linux-2.6.18/fs/ext3/namei.c:2123] adding orphan
ES> i_state:0x7 cpu:1 i_count:2 i_nlink:0

ES> ext3_orphan_add():[/src/linux-2.6.18/fs/ext3/namei.c:1890] ext3_orphan_add
ES> i_state:0x7 cpu:1 i_count:2 i_nlink:0

ES> iput():[/src/linux-2.6.18/fs/inode.c:1139] iput enter
ES> i_state:0x7 cpu:1 i_count:2 i_nlink:0

ES> ext3_link():[/src/linux-2.6.18/fs/ext3/namei.c:2202] ext3_link enter
ES> i_state:0x7 cpu:3 i_count:1 i_nlink:0

ES> ext3_inc_count():[/src/linux-2.6.18/fs/ext3/namei.c:1627] done
ES> i_state:0x7 cpu:3 i_count:1 i_nlink:1

ES> The unlink gets there first, finds i_count > 0 (in use) but nlink goes to 0, so
ES> it puts it on the orphan inode list. Then link comes along, and bumps the link
ES> back up to 1. So now we are on the orphan inode list, but we are not unlinked.

ES> Eventually when count goes to 0, and we still have 1 link, again no action is
ES> taken to remove the inode from the orphan list, because it is still linked (i.e.
ES> we don't go through ext3_delete())

ES> When this inode is eventually freed, the sb orphan list gets corrupted, because
ES> we have freed it without first removing it from the orphan list.

ES> I think the simple solution is to remove the inode from the orphan list
ES> when we bump the link back up from 0 to 1. I put that test in there because
ES> there are other potential reasons that we might be on the list (truncates,
ES> direct IO).

ES> Comments?

ES> Thanks,
ES> -Eric

ES> p.s. ext3_inc_count and ext3_dec_count seem misnamed, have an unused
ES> arg, and are very infrequently called. I'll probably submit a patch
ES> to just put the single line of code into the caller, too.

ES> ---

ES> Remove inode from the orphan list in ext3_link() if we might have
ES> raced with ext3_unlink(), which potentially put it on the list.
ES> If we're on the list with nlink > 0, we'll never get cleaned up
ES> properly and eventually may corrupt the list.

ES> Signed-off-by: Eric Sandeen <[email protected]>

ES> Index: linux-2.6.19/fs/ext3/namei.c
ES> ===================================================================
ES> --- linux-2.6.19.orig/fs/ext3/namei.c
ES> +++ linux-2.6.19/fs/ext3/namei.c
ES> @@ -2204,6 +2204,9 @@ retry:
inode-> i_ctime = CURRENT_TIME_SEC;
ES> ext3_inc_count(handle, inode);
ES> atomic_inc(&inode->i_count);
ES> + /* did we race w/ unlink? */
ES> + if (inode->i_nlink == 1)
ES> + ext3_orphan_del(handle, inode);

ES> err = ext3_add_nondir(handle, dentry, inode);
ES> ext3_journal_stop(handle);


ES> -
ES> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
ES> the body of a message to [email protected]
ES> More majordomo info at http://vger.kernel.org/majordomo-info.html

2007-01-12 21:14:49

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH] [RFC] remove ext3 inode from orphan list when link and unlink race

Alex Tomas wrote:
> interesting ..
>
> I thought VFS doesn't allow concurrent operations.
> if unlink goes first, then link should wait on the
> parent's i_mutex and then found no source name.
>
> thanks, Alex

Well... I was wondering that myself, whether this race should even
happen. But the bottom of do_unlinkat looks like:

mutex_unlock(&nd.dentry->d_inode->i_mutex);
if (inode)
iput(inode); /* truncate the inode here */
exit1:
path_release(&nd);
exit:
putname(name);
return error;

so I think it's possible that link can sneak in there & find it after
the mutex is dropped...? Is this ok? :) It's certainly -happening-
anyway....

-Eric

2007-01-12 21:17:45

by Alex Tomas

[permalink] [raw]
Subject: Re: [PATCH] [RFC] remove ext3 inode from orphan list when link and unlink race


ah, it seems vfs_link() doesn't check whether source is still alive.
for example, in mkdir case vfs_mkdir() calls may_create() and
checks the parent is still there:
if (IS_DEADDIR(dir))
return -ENOENT;

VFS doesn't set S_DEAD on regular files, but we could check i_nlink.

thanks, Alex

>>>>> Alex Tomas (AT) writes:

AT> interesting ..

AT> I thought VFS doesn't allow concurrent operations.
AT> if unlink goes first, then link should wait on the
AT> parent's i_mutex and then found no source name.

AT> thanks, Alex

>>>>> Eric Sandeen (ES) writes:

ES> )
ES> I've been looking at a case where many threads are opening, unlinking, and
ES> hardlinking files on ext3 . At unmount time I see an oops, because the superblock's
ES> orphan list points to a freed inode.

ES> I did some tracing of the inodes, and it looks like this:

ES> ext3_unlink():[/src/linux-2.6.18/fs/ext3/namei.c:2123] adding orphan
ES> i_state:0x7 cpu:1 i_count:2 i_nlink:0

ES> ext3_orphan_add():[/src/linux-2.6.18/fs/ext3/namei.c:1890] ext3_orphan_add
ES> i_state:0x7 cpu:1 i_count:2 i_nlink:0

ES> iput():[/src/linux-2.6.18/fs/inode.c:1139] iput enter
ES> i_state:0x7 cpu:1 i_count:2 i_nlink:0

ES> ext3_link():[/src/linux-2.6.18/fs/ext3/namei.c:2202] ext3_link enter
ES> i_state:0x7 cpu:3 i_count:1 i_nlink:0

ES> ext3_inc_count():[/src/linux-2.6.18/fs/ext3/namei.c:1627] done
ES> i_state:0x7 cpu:3 i_count:1 i_nlink:1

ES> The unlink gets there first, finds i_count > 0 (in use) but nlink goes to 0, so
ES> it puts it on the orphan inode list. Then link comes along, and bumps the link
ES> back up to 1. So now we are on the orphan inode list, but we are not unlinked.

ES> Eventually when count goes to 0, and we still have 1 link, again no action is
ES> taken to remove the inode from the orphan list, because it is still linked (i.e.
ES> we don't go through ext3_delete())

ES> When this inode is eventually freed, the sb orphan list gets corrupted, because
ES> we have freed it without first removing it from the orphan list.

ES> I think the simple solution is to remove the inode from the orphan list
ES> when we bump the link back up from 0 to 1. I put that test in there because
ES> there are other potential reasons that we might be on the list (truncates,
ES> direct IO).

ES> Comments?

ES> Thanks,
ES> -Eric

ES> p.s. ext3_inc_count and ext3_dec_count seem misnamed, have an unused
ES> arg, and are very infrequently called. I'll probably submit a patch
ES> to just put the single line of code into the caller, too.

ES> ---

ES> Remove inode from the orphan list in ext3_link() if we might have
ES> raced with ext3_unlink(), which potentially put it on the list.
ES> If we're on the list with nlink > 0, we'll never get cleaned up
ES> properly and eventually may corrupt the list.

ES> Signed-off-by: Eric Sandeen <[email protected]>

ES> Index: linux-2.6.19/fs/ext3/namei.c
ES> ===================================================================
ES> --- linux-2.6.19.orig/fs/ext3/namei.c
ES> +++ linux-2.6.19/fs/ext3/namei.c
ES> @@ -2204,6 +2204,9 @@ retry:
inode-> i_ctime = CURRENT_TIME_SEC;
ES> ext3_inc_count(handle, inode);
ES> atomic_inc(&inode->i_count);
ES> + /* did we race w/ unlink? */
ES> + if (inode->i_nlink == 1)
ES> + ext3_orphan_del(handle, inode);

ES> err = ext3_add_nondir(handle, dentry, inode);
ES> ext3_journal_stop(handle);


ES> -
ES> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
ES> the body of a message to [email protected]
ES> More majordomo info at http://vger.kernel.org/majordomo-info.html
AT> -
AT> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
AT> the body of a message to [email protected]
AT> More majordomo info at http://vger.kernel.org/majordomo-info.html

2007-01-12 21:24:42

by Dave Kleikamp

[permalink] [raw]
Subject: Re: [PATCH] [RFC] remove ext3 inode from orphan list when link and unlink race

On Sat, 2007-01-13 at 00:02 +0300, Alex Tomas wrote:
> interesting ..
>
> I thought VFS doesn't allow concurrent operations.
> if unlink goes first, then link should wait on the
> parent's i_mutex and then found no source name.

I don't think the VFS ever takes the source's parent's i_mutex. Unless
the source and destination's parent is the same, in which case the
i_mutex is taken after the source has already been looked up.

> thanks, Alex
>
> >>>>> Eric Sandeen (ES) writes:
>
> ES> )
> ES> I've been looking at a case where many threads are opening, unlinking, and
> ES> hardlinking files on ext3 . At unmount time I see an oops, because the superblock's
> ES> orphan list points to a freed inode.
>
> ES> I did some tracing of the inodes, and it looks like this:
>
> ES> ext3_unlink():[/src/linux-2.6.18/fs/ext3/namei.c:2123] adding orphan
> ES> i_state:0x7 cpu:1 i_count:2 i_nlink:0
>
> ES> ext3_orphan_add():[/src/linux-2.6.18/fs/ext3/namei.c:1890] ext3_orphan_add
> ES> i_state:0x7 cpu:1 i_count:2 i_nlink:0
>
> ES> iput():[/src/linux-2.6.18/fs/inode.c:1139] iput enter
> ES> i_state:0x7 cpu:1 i_count:2 i_nlink:0
>
> ES> ext3_link():[/src/linux-2.6.18/fs/ext3/namei.c:2202] ext3_link enter
> ES> i_state:0x7 cpu:3 i_count:1 i_nlink:0
>
> ES> ext3_inc_count():[/src/linux-2.6.18/fs/ext3/namei.c:1627] done
> ES> i_state:0x7 cpu:3 i_count:1 i_nlink:1
>
> ES> The unlink gets there first, finds i_count > 0 (in use) but nlink goes to 0, so
> ES> it puts it on the orphan inode list. Then link comes along, and bumps the link
> ES> back up to 1. So now we are on the orphan inode list, but we are not unlinked.
>
> ES> Eventually when count goes to 0, and we still have 1 link, again no action is
> ES> taken to remove the inode from the orphan list, because it is still linked (i.e.
> ES> we don't go through ext3_delete())
>
> ES> When this inode is eventually freed, the sb orphan list gets corrupted, because
> ES> we have freed it without first removing it from the orphan list.
>
> ES> I think the simple solution is to remove the inode from the orphan list
> ES> when we bump the link back up from 0 to 1. I put that test in there because
> ES> there are other potential reasons that we might be on the list (truncates,
> ES> direct IO).
>
> ES> Comments?
>
> ES> Thanks,
> ES> -Eric
>
> ES> p.s. ext3_inc_count and ext3_dec_count seem misnamed, have an unused
> ES> arg, and are very infrequently called. I'll probably submit a patch
> ES> to just put the single line of code into the caller, too.
>
> ES> ---
>
> ES> Remove inode from the orphan list in ext3_link() if we might have
> ES> raced with ext3_unlink(), which potentially put it on the list.
> ES> If we're on the list with nlink > 0, we'll never get cleaned up
> ES> properly and eventually may corrupt the list.
>
> ES> Signed-off-by: Eric Sandeen <[email protected]>
>
> ES> Index: linux-2.6.19/fs/ext3/namei.c
> ES> ===================================================================
> ES> --- linux-2.6.19.orig/fs/ext3/namei.c
> ES> +++ linux-2.6.19/fs/ext3/namei.c
> ES> @@ -2204,6 +2204,9 @@ retry:
> inode-> i_ctime = CURRENT_TIME_SEC;
> ES> ext3_inc_count(handle, inode);
> ES> atomic_inc(&inode->i_count);
> ES> + /* did we race w/ unlink? */
> ES> + if (inode->i_nlink == 1)
> ES> + ext3_orphan_del(handle, inode);
>
> ES> err = ext3_add_nondir(handle, dentry, inode);
> ES> ext3_journal_stop(handle);

--
David Kleikamp
IBM Linux Technology Center

2007-01-12 21:26:45

by Alex Tomas

[permalink] [raw]
Subject: Re: [PATCH] [RFC] remove ext3 inode from orphan list when link and unlink race

>>>>> Eric Sandeen (ES) writes:

ES> so I think it's possible that link can sneak in there & find it after
ES> the mutex is dropped...? Is this ok? :) It's certainly -happening-
ES> anyway....

yes, but it shouldn't allow to re-link such inode back, IMHO.
a filesystem may start some non-revertable activity in its
unlink method.

thanks, Alex

2007-01-12 21:48:20

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH] [RFC] remove ext3 inode from orphan list when link and unlink race

Alex Tomas wrote:
>>>>>> Eric Sandeen (ES) writes:
>
> ES> so I think it's possible that link can sneak in there & find it after
> ES> the mutex is dropped...? Is this ok? :) It's certainly -happening-
> ES> anyway....
>
> yes, but it shouldn't allow to re-link such inode back, IMHO.
> a filesystem may start some non-revertable activity in its
> unlink method.
>
> thanks, Alex

I tend to agree, chatting w/ Al I think he does too. :) I'll test
a patch that kicks out ext3_link() with -ENOENT at the top, and resubmit
that if things go well.

-Eric

2007-01-12 21:55:58

by Alex Tomas

[permalink] [raw]
Subject: Re: [PATCH] [RFC] remove ext3 inode from orphan list when link and unlink race

>>>>> Eric Sandeen (ES) writes:
ES> I tend to agree, chatting w/ Al I think he does too. :) I'll test
ES> a patch that kicks out ext3_link() with -ENOENT at the top, and resubmit
ES> that if things go well.

shouldn't VFS do that?

thanks, Alex

2007-01-12 22:01:53

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH] [RFC] remove ext3 inode from orphan list when link and unlink race

Alex Tomas wrote:
>>>>>> Eric Sandeen (ES) writes:
> ES> I tend to agree, chatting w/ Al I think he does too. :) I'll test
> ES> a patch that kicks out ext3_link() with -ENOENT at the top, and resubmit
> ES> that if things go well.
>
> shouldn't VFS do that?

Al says "no" and I'm not arguing. :)

Apparently this may be OK with some filesystems, and Al says he doesn't
want to know about i_nlink in the vfs in any case.

But I suppose there may be other filesystems which DO care, and should
be checking if they're not.

-Eric

2007-01-12 22:07:59

by Alex Tomas

[permalink] [raw]
Subject: Re: [PATCH] [RFC] remove ext3 inode from orphan list when link and unlink race

>>>>> Eric Sandeen (ES) writes:

ES> Al says "no" and I'm not arguing. :)

ES> Apparently this may be OK with some filesystems, and Al says he doesn't
ES> want to know about i_nlink in the vfs in any case.

well, generic_drop_inode() uses i_nlink ...

ES> But I suppose there may be other filesystems which DO care, and should
ES> be checking if they're not.

this is why I thought VFS could take care.

thanks, Alex

2007-01-12 22:35:20

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH] [RFC] remove ext3 inode from orphan list when link and unlink race

Eric Sandeen wrote:

> Alex Tomas wrote:
>
>> yes, but it shouldn't allow to re-link such inode back, IMHO.
>> a filesystem may start some non-revertable activity in its
>> unlink method.
>>
>> thanks, Alex
>>
>
> I tend to agree, chatting w/ Al I think he does too. :) I'll test
> a patch that kicks out ext3_link() with -ENOENT at the top, and resubmit
> that if things go well.
>
Well this seems to fix things up for ext3 (and ext4 by extension):

---

Return -ENOENT from ext[34]_link if we've raced with unlink and
i_nlink is 0. Doing otherwise has the potential to corrupt the
orphan inode list, because we'd wind up with an inode with a
non-zero link count on the list, and it will never get properly
cleaned up.

Signed-off-by: Eric Sandeen <[email protected]>

Index: linux-2.6.19/fs/ext3/namei.c
===================================================================
--- linux-2.6.19.orig/fs/ext3/namei.c
+++ linux-2.6.19/fs/ext3/namei.c
@@ -2191,6 +2191,8 @@ static int ext3_link (struct dentry * ol

if (inode->i_nlink >= EXT3_LINK_MAX)
return -EMLINK;
+ if (inode->i_nlink == 0)
+ return -ENOENT;

retry:
handle = ext3_journal_start(dir, EXT3_DATA_TRANS_BLOCKS(dir->i_sb) +
Index: linux-2.6.19/fs/ext4/namei.c
===================================================================
--- linux-2.6.19.orig/fs/ext4/namei.c
+++ linux-2.6.19/fs/ext4/namei.c
@@ -2189,6 +2189,8 @@ static int ext4_link (struct dentry * ol

if (inode->i_nlink >= EXT4_LINK_MAX)
return -EMLINK;
+ if (inode->i_nlink == 0)
+ return -ENOENT;

retry:
handle = ext4_journal_start(dir, EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +

2007-01-14 11:57:56

by Dmitri Monakhov

[permalink] [raw]
Subject: Re: [PATCH] [RFC] remove ext3 inode from orphan list when link and unlink race

Eric Sandeen <[email protected]> writes:

> I've been looking at a case where many threads are opening, unlinking, and
> hardlinking files on ext3 .
How many concurent threads do you use and how long does it takes to trigger
this race? I've tried to reproduce this with two threads, but not succeed.
<thread 1>
fd = create("src")
close(fd)
unlink("src")
<thread 2>
link("src", "dst")
unlink("dst")

Original testcase will be the best answer :).
Thanks.
> At unmount time I see an oops, because the superblock's
> orphan list points to a freed inode.
>
> I did some tracing of the inodes, and it looks like this:
>
> ext3_unlink():[/src/linux-2.6.18/fs/ext3/namei.c:2123] adding orphan
> i_state:0x7 cpu:1 i_count:2 i_nlink:0
>
> ext3_orphan_add():[/src/linux-2.6.18/fs/ext3/namei.c:1890] ext3_orphan_add
> i_state:0x7 cpu:1 i_count:2 i_nlink:0
>
> iput():[/src/linux-2.6.18/fs/inode.c:1139] iput enter
> i_state:0x7 cpu:1 i_count:2 i_nlink:0
>
> ext3_link():[/src/linux-2.6.18/fs/ext3/namei.c:2202] ext3_link enter
> i_state:0x7 cpu:3 i_count:1 i_nlink:0
>
> ext3_inc_count():[/src/linux-2.6.18/fs/ext3/namei.c:1627] done
> i_state:0x7 cpu:3 i_count:1 i_nlink:1
>
> The unlink gets there first, finds i_count > 0 (in use) but nlink goes to 0, so
> it puts it on the orphan inode list. Then link comes along, and bumps the link
> back up to 1. So now we are on the orphan inode list, but we are not unlinked.
>
> Eventually when count goes to 0, and we still have 1 link, again no action is
> taken to remove the inode from the orphan list, because it is still linked (i.e.
> we don't go through ext3_delete())
>
> When this inode is eventually freed, the sb orphan list gets corrupted, because
> we have freed it without first removing it from the orphan list.
>
> I think the simple solution is to remove the inode from the orphan list
> when we bump the link back up from 0 to 1. I put that test in there because
> there are other potential reasons that we might be on the list (truncates,
> direct IO).
>
> Comments?
>
> Thanks,
> -Eric
>
> p.s. ext3_inc_count and ext3_dec_count seem misnamed, have an unused
> arg, and are very infrequently called. I'll probably submit a patch
> to just put the single line of code into the caller, too.
>
> ---
>
> Remove inode from the orphan list in ext3_link() if we might have
> raced with ext3_unlink(), which potentially put it on the list.
> If we're on the list with nlink > 0, we'll never get cleaned up
> properly and eventually may corrupt the list.
>
> Signed-off-by: Eric Sandeen <[email protected]>
>
> Index: linux-2.6.19/fs/ext3/namei.c
> ===================================================================
> --- linux-2.6.19.orig/fs/ext3/namei.c
> +++ linux-2.6.19/fs/ext3/namei.c
> @@ -2204,6 +2204,9 @@ retry:
> inode->i_ctime = CURRENT_TIME_SEC;
> ext3_inc_count(handle, inode);
> atomic_inc(&inode->i_count);
> + /* did we race w/ unlink? */
> + if (inode->i_nlink == 1)
> + ext3_orphan_del(handle, inode);
>
> err = ext3_add_nondir(handle, dentry, inode);
> ext3_journal_stop(handle);
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2007-01-14 15:42:44

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH] [RFC] remove ext3 inode from orphan list when link and unlink race

Dmitriy Monakhov wrote:
> Eric Sandeen <[email protected]> writes:
>
>> I've been looking at a case where many threads are opening, unlinking, and
>> hardlinking files on ext3 .
> How many concurent threads do you use and how long does it takes to trigger
> this race? I've tried to reproduce this with two threads, but not succeed.
> <thread 1>
> fd = create("src")
> close(fd)
> unlink("src")
> <thread 2>
> link("src", "dst")
> unlink("dst")
>
> Original testcase will be the best answer :).

Sure :) Though I didn't write it... see this collection of bash scripts:

http://people.redhat.com/esandeen/testcases/orphan-repro.tar.bz2

I didn't write it, but it exposed the bug for me. The VAR file contains
variables to specify mountpoint and a device, which the script starts by
mkfs'ing, so be warned.

It spawns -many- threads, and on my 4 CPU opteron I can hit it in a
reasonable amount of time. It would probably be nice to have a more
targeted testcase but it did the trick for me.

Thanks,
-Eric

> Thanks.

2007-01-17 22:43:11

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH] [RFC] remove ext3 inode from orphan list when link and unlink race

On Jan 12, 2007 14:45 -0600, Eric Sandeen wrote:
> p.s. ext3_inc_count and ext3_dec_count seem misnamed, have an unused
> arg, and are very infrequently called. I'll probably submit a patch
> to just put the single line of code into the caller, too.

I use those functions in the > 32000 subdirs patch to handle the case
where a directory overflows the i_nlinks counter and needs to be set
to have i_nlinks = 1.

Patch was posted previously, but rejected because there is a potential
for directory loss if the filesystem is mounted on an older kernel
(i_nlinks == 1, subdir is removed, directory is truncated). I need to
add an ROCOMPAT feature in order to handle this, but haven't gotten
around to doing it.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

2007-01-17 22:48:11

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH] [RFC] remove ext3 inode from orphan list when link and unlink race

Andreas Dilger wrote:
> On Jan 12, 2007 14:45 -0600, Eric Sandeen wrote:
>> p.s. ext3_inc_count and ext3_dec_count seem misnamed, have an unused
>> arg, and are very infrequently called. I'll probably submit a patch
>> to just put the single line of code into the caller, too.
>
> I use those functions in the > 32000 subdirs patch to handle the case
> where a directory overflows the i_nlinks counter and needs to be set
> to have i_nlinks = 1.
>
> Patch was posted previously, but rejected because there is a potential
> for directory loss if the filesystem is mounted on an older kernel
> (i_nlinks == 1, subdir is removed, directory is truncated). I need to
> add an ROCOMPAT feature in order to handle this, but haven't gotten
> around to doing it.

Ah, ok. Sorry I'd missed that. Can we at least rename them to
something that makes sense? :)

-Eric

> Cheers, Andreas
> --
> Andreas Dilger
> Principal Software Engineer
> Cluster File Systems, Inc.
>

2007-01-18 11:17:14

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH] [RFC] remove ext3 inode from orphan list when link and unlink race

On Jan 17, 2007 16:47 -0600, Eric Sandeen wrote:
> Andreas Dilger wrote:
> > On Jan 12, 2007 14:45 -0600, Eric Sandeen wrote:
> >> p.s. ext3_inc_count and ext3_dec_count seem misnamed, have an unused
> >> arg, and are very infrequently called. I'll probably submit a patch
> >> to just put the single line of code into the caller, too.
> >
> > I use those functions in the > 32000 subdirs patch to handle the case
> > where a directory overflows the i_nlinks counter and needs to be set
> > to have i_nlinks = 1.
> >
> > Patch was posted previously, but rejected because there is a potential
> > for directory loss if the filesystem is mounted on an older kernel
> > (i_nlinks == 1, subdir is removed, directory is truncated). I need to
> > add an ROCOMPAT feature in order to handle this, but haven't gotten
> > around to doing it.
>
> Ah, ok. Sorry I'd missed that. Can we at least rename them to
> something that makes sense? :)

I have no objection to renaming them, but had previously left them as-is
because it meant a smaller patch.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

2007-01-18 15:29:29

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH] [RFC] remove ext3 inode from orphan list when link and unlink race

Andreas Dilger wrote:
>> Ah, ok. Sorry I'd missed that. Can we at least rename them to
>> something that makes sense? :)
>
> I have no objection to renaming them, but had previously left them as-is
> because it meant a smaller patch.

*shrug* not a big deal; I can wait. Was just one of those things I came
across while looking in this area.

Thanks,

-Eric