2008-06-24 14:17:37

by Louis Rilling

[permalink] [raw]
Subject: configfs: Q: item leak in a failing configfs_attach_group()?

Hi,

I'd like an opinion on the following scenario:

process 1: process 2:
configfs_mkdir("A")
attach_group("A")
attach_item("A")
d_instantiate("A")
populate_groups("A")
mutex_lock("A")
attach_group("A/B")
attach_item("A")
d_instantiate("A/B")
mkdir("A/B/C")
do_path_lookup("A/B/C", LOOKUP_PARENT)
ok
lookup_create("A/B/C")
mutex_lock("A/B")
ok
configfs_mkdir("A/B/C")
ok
attach_group("A/C")
attach_item("A/C")
d_instantiate("A/C")
populate_groups("A/C")
mutex_lock("A/C")
attach_group("A/C/D")
attach_item("A/C/D")
failure
mutex_unlock("A/C")
detach_groups("A/C")
nothing to do
mkdir("A/C/E")
do_path_lookup("A/C/E", LOOKUP_PARENT)
ok
lookup_create("A/C/E")
mutex_lock("A/C")
ok
configfs_mkdir("A/C/E")
ok
detach_item("A/C")
d_delete("A/C")
mutex_unlock("A")
detach_groups("A")
mutex_lock("A/B")
detach_group("A/B")
detach_groups("A/B")
nothing since no _default_ group
detach_item("A/B")
mutex_unlock("A/B")
d_delete("A/B")
detach_item("A")
d_delete("A")

Two bugs (if the scenario is possible):

1/ "A/B/C" and "A/C/E" are created, but never removed while their parent are
removed in the end.

2/ "A" and "A/C" inodes are not locked while detach_item() is called on them,
which may probably confuse VFS.

Is there something that prevents such scenario? I'd say that once dentries
are instantiated, the dcache does not need to lock their inode to traverse them,
so the scenario is possible.

Where am I wrong?

If I'm right, two kinds of solutions for issue 1:
i/ tag new directories with CONFIGFS_USET_NEW before calling d_instantiate, and
validate the whole group+default groups hierarchy in a second pass by clearing
CONFIGFS_USET_NEW

ii/ do not call d_instantiate() immediately in configfs_create() if called from
configfs_create_dir(), and d_instantitate() the group+default groups hierarchy
in a second pass. Problem: is it correct to add children to a dentry which is
not yet instantiated?

For issue 2/, locking the inode before calling detach_item() (as is done from
configfs_rmdir()), plus a solution for 1/ should be sufficient.

Thanks for your explanations.

Louis

--
Dr Louis Rilling Kerlabs
Skype: louis.rilling Batiment Germanium
Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes
http://www.kerlabs.com/ 35700 Rennes


Attachments:
(No filename) (2.58 kB)
signature.asc (189.00 B)
Digital signature
Download all attachments

2008-06-24 17:11:34

by Joel Becker

[permalink] [raw]
Subject: Re: configfs: Q: item leak in a failing configfs_attach_group()?

On Tue, Jun 24, 2008 at 04:16:49PM +0200, Louis Rilling wrote:
> Hi,
>
> I'd like an opinion on the following scenario:
>
> process 1: process 2:
> configfs_mkdir("A")
> attach_group("A")
> attach_item("A")
> d_instantiate("A")
> populate_groups("A")
> mutex_lock("A")
> attach_group("A/B")
> attach_item("A")
> d_instantiate("A/B")
> mkdir("A/B/C")
> do_path_lookup("A/B/C", LOOKUP_PARENT)

This has to sleep until
configfs_mkdir("A") finishes.
It's waiting on A->d_parent's
i_mutex, which is held by
sys_mkdirat().

Joel

--

"Sometimes one pays most for the things one gets for nothing."
- Albert Einstein

Joel Becker
Principal Software Developer
Oracle
E-mail: [email protected]
Phone: (650) 506-8127

2008-06-24 18:05:11

by Louis Rilling

[permalink] [raw]
Subject: Re: configfs: Q: item leak in a failing configfs_attach_group()?

On Tue, Jun 24, 2008 at 10:10:51AM -0700, Joel Becker wrote:
> On Tue, Jun 24, 2008 at 04:16:49PM +0200, Louis Rilling wrote:
> > Hi,
> >
> > I'd like an opinion on the following scenario:
> >
> > process 1: process 2:
> > configfs_mkdir("A")
> > attach_group("A")
> > attach_item("A")
> > d_instantiate("A")
> > populate_groups("A")
> > mutex_lock("A")
> > attach_group("A/B")
> > attach_item("A")
> > d_instantiate("A/B")
> > mkdir("A/B/C")
> > do_path_lookup("A/B/C", LOOKUP_PARENT)
>
> This has to sleep until
> configfs_mkdir("A") finishes.
> It's waiting on A->d_parent's
> i_mutex, which is held by
> sys_mkdirat().

Can you be more precise? I don't see where do_path_lookup() locks an inode
outside of real_lookup(), and what I understand is that real_lookup() is neither
called for "A", nor called for "A/B" because __d_lookup() will find positive
dentries (they were d_instantiated). Since do_path_lookup() is called with
LOOKUP_PARENT, it stops here. Then lookup_create() locks "A/B"'s inode, without
waiting, and calls lookup_hash("A/B", "C"), which ends calling
configfs_lookup("A/B", "C") because "A/B" has no child "C" in the dcache.

What am I missing?

Thanks,

Louis

--
Dr Louis Rilling Kerlabs
Skype: louis.rilling Batiment Germanium
Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes
http://www.kerlabs.com/ 35700 Rennes


Attachments:
(No filename) (1.42 kB)
signature.asc (189.00 B)
Digital signature
Download all attachments

2008-06-24 21:35:44

by Joel Becker

[permalink] [raw]
Subject: Re: configfs: Q: item leak in a failing configfs_attach_group()?

On Tue, Jun 24, 2008 at 08:04:56PM +0200, Louis Rilling wrote:
> On Tue, Jun 24, 2008 at 10:10:51AM -0700, Joel Becker wrote:
> > On Tue, Jun 24, 2008 at 04:16:49PM +0200, Louis Rilling wrote:
> > > Hi,
> > >
> > > I'd like an opinion on the following scenario:
> > >
> > > process 1: process 2:
> > > configfs_mkdir("A")
> > > attach_group("A")
> > > attach_item("A")
> > > d_instantiate("A")
> > > populate_groups("A")
> > > mutex_lock("A")
> > > attach_group("A/B")
> > > attach_item("A")
> > > d_instantiate("A/B")
> > > mkdir("A/B/C")
> > > do_path_lookup("A/B/C", LOOKUP_PARENT)
> >
> > This has to sleep until
> > configfs_mkdir("A") finishes.
> > It's waiting on A->d_parent's
> > i_mutex, which is held by
> > sys_mkdirat().
>
> Can you be more precise? I don't see where do_path_lookup() locks an inode

It doesn't. It's in lookup_create(), which takes the mutex on the
parent of 'A'. Note that the end of sys_mkdirat() explicitly drops that
mutex - it couldn't do so if it hadn't been taken :-)

Joel

--

"You cannot bring about prosperity by discouraging thrift. You cannot
strengthen the weak by weakening the strong. You cannot help the wage
earner by pulling down the wage payer. You cannot further the
brotherhood of man by encouraging class hatred. You cannot help the
poor by destroying the rich. You cannot build character and courage by
taking away a man's initiative and independence. You cannot help men
permanently by doing for them what they could and should do for
themselves."
- Abraham Lincoln

Joel Becker
Principal Software Developer
Oracle
E-mail: [email protected]
Phone: (650) 506-8127

2008-06-25 09:55:38

by Louis Rilling

[permalink] [raw]
Subject: Re: configfs: Q: item leak in a failing configfs_attach_group()?

On Tue, Jun 24, 2008 at 02:34:39PM -0700, Joel Becker wrote:
> On Tue, Jun 24, 2008 at 08:04:56PM +0200, Louis Rilling wrote:
> > On Tue, Jun 24, 2008 at 10:10:51AM -0700, Joel Becker wrote:
> > > On Tue, Jun 24, 2008 at 04:16:49PM +0200, Louis Rilling wrote:
> > > > Hi,
> > > >
> > > > I'd like an opinion on the following scenario:
> > > >
> > > > process 1: process 2:
> > > > configfs_mkdir("A")
> > > > attach_group("A")
> > > > attach_item("A")
> > > > d_instantiate("A")
> > > > populate_groups("A")
> > > > mutex_lock("A")
> > > > attach_group("A/B")
> > > > attach_item("A")
> > > > d_instantiate("A/B")
> > > > mkdir("A/B/C")
> > > > do_path_lookup("A/B/C", LOOKUP_PARENT)
> > >
> > > This has to sleep until
> > > configfs_mkdir("A") finishes.
> > > It's waiting on A->d_parent's
> > > i_mutex, which is held by
> > > sys_mkdirat().
> >
> > Can you be more precise? I don't see where do_path_lookup() locks an inode
>
> It doesn't. It's in lookup_create(), which takes the mutex on the
> parent of 'A'. Note that the end of sys_mkdirat() explicitly drops that
> mutex - it couldn't do so if it hadn't been taken :-)

So, my scenario is realistic. Process 2 only locks "B"'s inode in
lookup_create() ("B" is the parent of the new directory "C"), and never has to
lock "A" or "A"'s parent. IOW, process 2 does not have to wait on any i_mutex
locked by process 1.

Back to the two solutions that I've suggested (copy-pasted below), which one
would you prefer?

If I'm right, two kinds of solutions for issue 1 (new item created while
attaching a default group hierarchy):
i/ tag new directories with CONFIGFS_USET_NEW before calling d_instantiate, and
validate the whole group+default groups hierarchy in a second pass by clearing
CONFIGFS_USET_NEW

ii/ do not call d_instantiate() immediately in configfs_create() if called from
configfs_create_dir(), and d_instantitate() the group+default groups hierarchy
in a second pass. Problem: is it correct to add children to a dentry which is
not yet instantiated?

For issue 2/ (detach_item() called without locking the detached item's inode),
locking the inode before calling detach_item() (as is done from
configfs_rmdir()), plus a solution for 1/ should be sufficient.

Louis

--
Dr Louis Rilling Kerlabs
Skype: louis.rilling Batiment Germanium
Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes
http://www.kerlabs.com/ 35700 Rennes


Attachments:
(No filename) (2.45 kB)
signature.asc (189.00 B)
Digital signature
Download all attachments

2008-06-25 20:22:22

by Joel Becker

[permalink] [raw]
Subject: Re: configfs: Q: item leak in a failing configfs_attach_group()?

On Wed, Jun 25, 2008 at 11:55:27AM +0200, Louis Rilling wrote:
> On Tue, Jun 24, 2008 at 02:34:39PM -0700, Joel Becker wrote:
> > On Tue, Jun 24, 2008 at 08:04:56PM +0200, Louis Rilling wrote:
> > > On Tue, Jun 24, 2008 at 10:10:51AM -0700, Joel Becker wrote:
> > > > On Tue, Jun 24, 2008 at 04:16:49PM +0200, Louis Rilling wrote:
> > > > > Hi,
> > > > >
> > > > > I'd like an opinion on the following scenario:
> > > > >
> > > > > process 1: process 2:
> > > > > configfs_mkdir("A")
> > > > > attach_group("A")
> > > > > attach_item("A")
> > > > > d_instantiate("A")
> > > > > populate_groups("A")
> > > > > mutex_lock("A")
> > > > > attach_group("A/B")
> > > > > attach_item("A")
> > > > > d_instantiate("A/B")
> > > > > mkdir("A/B/C")
> > > > > do_path_lookup("A/B/C", LOOKUP_PARENT)
> > > >
> > > > This has to sleep until
> > > > configfs_mkdir("A") finishes.
> > > > It's waiting on A->d_parent's
> > > > i_mutex, which is held by
> > > > sys_mkdirat().
> > >
> > > Can you be more precise? I don't see where do_path_lookup() locks an inode
> >
> > It doesn't. It's in lookup_create(), which takes the mutex on the
> > parent of 'A'. Note that the end of sys_mkdirat() explicitly drops that
> > mutex - it couldn't do so if it hadn't been taken :-)
>
> So, my scenario is realistic. Process 2 only locks "B"'s inode in
> lookup_create() ("B" is the parent of the new directory "C"), and never has to
> lock "A" or "A"'s parent. IOW, process 2 does not have to wait on any i_mutex
> locked by process 1.

Um, 'A' hasn't appeared yet. I don't see how it looks up 'A'
until we're done.

Joel

--

"When ideas fail, words come in very handy."
- Goethe

Joel Becker
Principal Software Developer
Oracle
E-mail: [email protected]
Phone: (650) 506-8127

2008-06-25 20:31:02

by Joel Becker

[permalink] [raw]
Subject: Re: configfs: Q: item leak in a failing configfs_attach_group()?

On Wed, Jun 25, 2008 at 01:20:58PM -0700, Joel Becker wrote:
> Um, 'A' hasn't appeared yet. I don't see how it looks up 'A'
> until we're done.

Oh, I think you mean cached_lookup isn't blocked by i_mutex.

Joel

--

Life's Little Instruction Book #497

"Go down swinging."

Joel Becker
Principal Software Developer
Oracle
E-mail: [email protected]
Phone: (650) 506-8127

2008-06-26 02:13:21

by Joel Becker

[permalink] [raw]
Subject: Re: configfs: Q: item leak in a failing configfs_attach_group()?

On Wed, Jun 25, 2008 at 11:55:27AM +0200, Louis Rilling wrote:
> Back to the two solutions that I've suggested (copy-pasted below), which one
> would you prefer?

God, this is all ugly. You've found so many ugly cases :-(

> If I'm right, two kinds of solutions for issue 1 (new item created while
> attaching a default group hierarchy):
> i/ tag new directories with CONFIGFS_USET_NEW before calling d_instantiate, and
> validate the whole group+default groups hierarchy in a second pass by clearing
> CONFIGFS_USET_NEW

I think this is the right way. We can't d_instantiate() later,
because lower callers use dentry->d_inode, and trying to work around
that would be even uglier!
But can't we just propagate CONFIGFS_USET_MKDIR? That's what's
happening actually. Just set it down in the paths. Then, change like
so:

if (group)
ret = configfs_attach_group(parent_item, item, dentry);
else
ret = configfs_attach_item(parent_item, item, dentry);

spin_lock(&configfs_dirent_lock);
sd->s_type &= ~CONFIGFS_USET_IN_MKDIR;
+ if (!ret)
+ configfs_clear_mkdir_flag(dentry);
spin_unlock(&configfs_dirent_lock);

Right?

> For issue 2/ (detach_item() called without locking the detached item's inode),
> locking the inode before calling detach_item() (as is done from
> configfs_rmdir()), plus a solution for 1/ should be sufficient.

Make sure you lock/unlock in the right place (mirror the
teardown path).

Joel

--

A good programming language should have features that make the
kind of people who use the phrase "software engineering" shake
their heads disapprovingly.
- Paul Graham

Joel Becker
Principal Software Developer
Oracle
E-mail: [email protected]
Phone: (650) 506-8127