LinuxLists.cc - [BUGFIX][PATCH 0/2] configfs: Fix cleanup after mkdir() failure

2008-06-26 18:06:04

Subject: [BUGFIX][PATCH 0/2] configfs: Fix cleanup after mkdir() failure

[ applies on top of http://lkml.org/lkml/2008/6/23/145 aka symlink() fixes ]

Hi,

This patchset fixes two kinds of bugs happening when
configfs_attach_group()/configfs_attach_item() fail and userspace races with
mkdir() or symlink().

Please read the first patch header for a detailed scenario explaining the bugs.

Louis

Summary (2):
configfs: Prevent userspace from creating new entries under attaching
directories
configfs: Lock new directory inodes before removing on cleanup after
failure

fs/configfs/configfs_internal.h | 1 +
fs/configfs/dir.c | 127 +++++++++++++++++++++++++++++++--------
fs/configfs/symlink.c | 17 +++++-
3 files changed, 118 insertions(+), 27 deletions(-)

--
Dr Louis Rilling Kerlabs
Skype: louis.rilling Batiment Germanium
Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes
http://www.kerlabs.com/ 35700 Rennes

2008-06-26 18:06:27

by Louis Rilling

[permalink] [raw]

Subject: [BUGFIX][PATCH 2/2] configfs: Lock new directory inodes before removing on cleanup after failure

Once a new configfs directory is created by configfs_attach_item() or
configfs_attach_group(), a failure in the remaining initialization steps leads
to removing a directory which inode the VFS may have already accessed.

This commit adds the necessary inode locking to safely remove configfs
directories while cleaning up after a failure. As an advantage, the locking
rules of populate_groups() and detach_groups() become the same: the caller must
have the group's inode mutex locked.

Signed-off-by: Louis Rilling <[email protected]>
---
fs/configfs/dir.c | 50 +++++++++++++++++++++++++++++---------------------
1 files changed, 29 insertions(+), 21 deletions(-)

diff --git a/fs/configfs/dir.c b/fs/configfs/dir.c
index bfd2620..c11bc1b 100644
--- a/fs/configfs/dir.c
+++ b/fs/configfs/dir.c
@@ -303,6 +303,8 @@ static void remove_dir(struct dentry * d)
* The only thing special about this is that we remove any files in
* the directory before we remove the directory, and we've inlined
* what used to be configfs_rmdir() below, instead of calling separately.
+ *
+ * Caller holds the mutex of the item's inode
*/

static void configfs_remove_dir(struct config_item * item)
@@ -594,36 +596,20 @@ static int create_default_group(struct config_group *parent_group,
static int populate_groups(struct config_group *group)
{
struct config_group *new_group;
- struct dentry *dentry = group->cg_item.ci_dentry;
int ret = 0;
int i;

- if (group->default_groups) {
- /*
- * FYI, we're faking mkdir here
- * I'm not sure we need this semaphore, as we're called
- * from our parent's mkdir. That holds our parent's
- * i_mutex, so afaik lookup cannot continue through our
- * parent to find us, let alone mess with our tree.
- * That said, taking our i_mutex is closer to mkdir
- * emulation, and shouldn't hurt.
- */
- mutex_lock_nested(&dentry->d_inode->i_mutex, I_MUTEX_CHILD);
-
+ if (group->default_groups)
for (i = 0; group->default_groups[i]; i++) {
new_group = group->default_groups[i];

ret = create_default_group(group, new_group);
- if (ret)
+ if (ret) {
+ detach_groups(group);
break;
+ }
}

- mutex_unlock(&dentry->d_inode->i_mutex);
- }
-
- if (ret)
- detach_groups(group);
-
return ret;
}

@@ -738,7 +724,14 @@ static int configfs_attach_item(struct config_item *parent_item,
if (!ret) {
ret = populate_attrs(item);
if (ret) {
+ /*
+ * We are going to remove an inode and its dentry but
+ * the VFS may already have hit and used them.
+ */
+ mutex_lock(&dentry->d_inode->i_mutex);
configfs_remove_dir(item);
+ dentry->d_inode->i_flags |= S_DEAD;
+ mutex_unlock(&dentry->d_inode->i_mutex);
d_delete(dentry);
}
}
@@ -746,6 +739,7 @@ static int configfs_attach_item(struct config_item *parent_item,
return ret;
}

+/* Caller holds the mutex of the item's inode */
static void configfs_detach_item(struct config_item *item)
{
detach_attrs(item);
@@ -764,16 +758,30 @@ static int configfs_attach_group(struct config_item *parent_item,
sd = dentry->d_fsdata;
sd->s_type |= CONFIGFS_USET_DIR;

+ /*
+ * FYI, we're faking mkdir in populate_groups()
+ * We must lock the group's inode to avoid races with the VFS
+ * which can already hit the inode and try to add/remove entries
+ * under it.
+ *
+ * We must also lock the inode to remove it safely in case of
+ * error.
+ */
+ mutex_lock_nested(&dentry->d_inode->i_mutex, I_MUTEX_CHILD);
ret = populate_groups(to_config_group(item));
if (ret) {
configfs_detach_item(item);
- d_delete(dentry);
+ dentry->d_inode->i_flags |= S_DEAD;
}
+ mutex_unlock(&dentry->d_inode->i_mutex);
+ if (ret)
+ d_delete(dentry);
}

return ret;
}

+/* Caller holds the mutex of the group's inode */
static void configfs_detach_group(struct config_item *item)
{
detach_groups(to_config_group(item));
--
1.5.5.3

2008-06-26 18:06:45

by Louis Rilling

[permalink] [raw]

Subject: [BUGFIX][PATCH 1/2] configfs: Prevent userspace from creating new entries under attaching directories

process 1: process 2:
configfs_mkdir("A")
attach_group("A")
attach_item("A")
d_instantiate("A")
populate_groups("A")
mutex_lock("A")
attach_group("A/B")
attach_item("A")
d_instantiate("A/B")
mkdir("A/B/C")
do_path_lookup("A/B/C", LOOKUP_PARENT)
ok
lookup_create("A/B/C")
mutex_lock("A/B")
ok
configfs_mkdir("A/B/C")
ok
attach_group("A/C")
attach_item("A/C")
d_instantiate("A/C")
populate_groups("A/C")
mutex_lock("A/C")
attach_group("A/C/D")
attach_item("A/C/D")
failure
mutex_unlock("A/C")
detach_groups("A/C")
nothing to do
mkdir("A/C/E")
do_path_lookup("A/C/E", LOOKUP_PARENT)
ok
lookup_create("A/C/E")
mutex_lock("A/C")
ok
configfs_mkdir("A/C/E")
ok
detach_item("A/C")
d_delete("A/C")
mutex_unlock("A")
detach_groups("A")
mutex_lock("A/B")
detach_group("A/B")
detach_groups("A/B")
nothing since no _default_ group
detach_item("A/B")
mutex_unlock("A/B")
d_delete("A/B")
detach_item("A")
d_delete("A")

Two bugs:

1/ "A/B/C" and "A/C/E" are created, but never removed while their parent are
removed in the end. The same could happen with symlink() instead of mkdir().

2/ "A" and "A/C" inodes are not locked while detach_item() is called on them,
which may probably confuse VFS.

This commit fixes 1/, tagging new directories with CONFIGFS_USET_CREATING before
building the inode and instantiating the dentry, and validating the whole
group+default groups hierarchy in a second pass by clearing
CONFIGFS_USET_CREATING.
mkdir(), symlink(), lookup(), and dir_open() simply return -ENOENT if
called in (or linking to) a directory tagged with CONFIGFS_USET_CREATING. This
does not prevent userspace from calling stat() successfuly on such directories,
but this prevents userspace from adding (children to | symlinking from/to |
read/write attributes of | listing the contents of) not validated items. In
other words, userspace will not interact with the subsystem on a new item until
the new item creation completes correctly.
It was first proposed to re-use CONFIGFS_USET_IN_MKDIR instead of a new
flag CONFIGFS_USET_CREATING, but this generated conflicts when checking the
target of a new symlink: a valid target directory in the middle of attaching
a new user-created child item could be wrongly detected as being attached.

2/ is fixed by next commit.

Signed-off-by: Louis Rilling <[email protected]>
---
fs/configfs/configfs_internal.h | 1 +
fs/configfs/dir.c | 77 ++++++++++++++++++++++++++++++++++++---
fs/configfs/symlink.c | 17 ++++++++-
3 files changed, 89 insertions(+), 6 deletions(-)

diff --git a/fs/configfs/configfs_internal.h b/fs/configfs/configfs_internal.h
index 5f61b26..51d76bf 100644
--- a/fs/configfs/configfs_internal.h
+++ b/fs/configfs/configfs_internal.h
@@ -49,6 +49,7 @@ struct configfs_dirent {
#define CONFIGFS_USET_DEFAULT 0x0080
#define CONFIGFS_USET_DROPPING 0x0100
#define CONFIGFS_USET_IN_MKDIR 0x0200
+#define CONFIGFS_USET_CREATING 0x0400
#define CONFIGFS_NOT_PINNED (CONFIGFS_ITEM_ATTR)

extern struct mutex configfs_symlink_mutex;
diff --git a/fs/configfs/dir.c b/fs/configfs/dir.c
index 2c873fd..bfd2620 100644
--- a/fs/configfs/dir.c
+++ b/fs/configfs/dir.c
@@ -185,7 +185,7 @@ static int create_dir(struct config_item * k, struct dentry * p,
error = configfs_dirent_exists(p->d_fsdata, d->d_name.name);
if (!error)
error = configfs_make_dirent(p->d_fsdata, d, k, mode,
- CONFIGFS_DIR);
+ CONFIGFS_DIR | CONFIGFS_USET_CREATING);
if (!error) {
error = configfs_create(d, mode, init_dir);
if (!error) {
@@ -209,6 +209,9 @@ static int create_dir(struct config_item * k, struct dentry * p,
* configfs_create_dir - create a directory for an config_item.
* @item: config_itemwe're creating directory for.
* @dentry: config_item's dentry.
+ *
+ * Note: user-created entries won't be allowed under this new directory
+ * until it is validated by configfs_validate_dir()
*/

static int configfs_create_dir(struct config_item * item, struct dentry *dentry)
@@ -231,6 +234,23 @@ static int configfs_create_dir(struct config_item * item, struct dentry *dentry)
return error;
}

+/*
+ * Allow userspace to create new entries under a new directory created with
+ * configfs_create_dir(), and under all of its chidlren directories recursively.
+ * @sd configfs_dirent of the new directory to validate
+ *
+ * Caller must hold configfs_dirent_lock.
+ */
+static void configfs_validate_dir(struct configfs_dirent *sd)
+{
+ struct configfs_dirent *child_sd;
+
+ sd->s_type &= ~CONFIGFS_USET_CREATING;
+ list_for_each_entry(child_sd, &sd->s_children, s_sibling)
+ if (child_sd->s_type & CONFIGFS_USET_CREATING)
+ configfs_validate_dir(child_sd);
+}
+
int configfs_create_link(struct configfs_symlink *sl,
struct dentry *parent,
struct dentry *dentry)
@@ -332,6 +352,21 @@ static struct dentry * configfs_lookup(struct inode *dir,
int found = 0;
int err = 0;

+ /*
+ * Fake invisibility if dir belongs to a group/default groups hierarchy
+ * being attached
+ *
+ * This forbids userspace to read/write attributes of items which may
+ * not complete their initialization, since the dentries of the
+ * attributes won't be instantiated.
+ */
+ spin_lock(&configfs_dirent_lock);
+ if (parent_sd->s_type & CONFIGFS_USET_CREATING)
+ err = -ENOENT;
+ spin_unlock(&configfs_dirent_lock);
+ if (err)
+ goto out;
+
list_for_each_entry(sd, &parent_sd->s_children, s_sibling) {
if (sd->s_type & CONFIGFS_NOT_PINNED) {
const unsigned char * name = configfs_get_name(sd);
@@ -353,6 +388,7 @@ static struct dentry * configfs_lookup(struct inode *dir,
return simple_lookup(dir, dentry, nd);
}

+out:
return ERR_PTR(err);
}

@@ -1027,7 +1063,7 @@ EXPORT_SYMBOL(configfs_undepend_item);

static int configfs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
{
- int ret, module_got = 0;
+ int ret = 0, module_got = 0;
struct config_group *group;
struct config_item *item;
struct config_item *parent_item;
@@ -1043,6 +1079,18 @@ static int configfs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
}

sd = dentry->d_parent->d_fsdata;
+
+ /*
+ * Fake invisibility if dir belongs to a group/default groups hierarchy
+ * being attached
+ */
+ spin_lock(&configfs_dirent_lock);
+ if (sd->s_type & CONFIGFS_USET_CREATING)
+ ret = -ENOENT;
+ spin_unlock(&configfs_dirent_lock);
+ if (ret)
+ goto out;
+
if (!(sd->s_type & CONFIGFS_USET_DIR)) {
ret = -EPERM;
goto out;
@@ -1137,6 +1185,8 @@ static int configfs_mkdir(struct inode *dir, struct dentry *dentry, int mode)

spin_lock(&configfs_dirent_lock);
sd->s_type &= ~CONFIGFS_USET_IN_MKDIR;
+ if (!ret)
+ configfs_validate_dir(dentry->d_fsdata);
spin_unlock(&configfs_dirent_lock);

out_unlink:
@@ -1317,13 +1367,26 @@ static int configfs_dir_open(struct inode *inode, struct file *file)
{
struct dentry * dentry = file->f_path.dentry;
struct configfs_dirent * parent_sd = dentry->d_fsdata;
+ int err = 0;

mutex_lock(&dentry->d_inode->i_mutex);
- file->private_data = configfs_new_dirent(parent_sd, NULL);
- mutex_unlock(&dentry->d_inode->i_mutex);
+ /*
+ * Fake invisibility if dir belongs to a group/default groups hierarchy
+ * being attached
+ */
+ spin_lock(&configfs_dirent_lock);
+ if (parent_sd->s_type & CONFIGFS_USET_CREATING)
+ err = -ENOENT;
+ spin_unlock(&configfs_dirent_lock);

- return IS_ERR(file->private_data) ? PTR_ERR(file->private_data) : 0;
+ if (!err) {
+ file->private_data = configfs_new_dirent(parent_sd, NULL);
+ if (IS_ERR(file->private_data))
+ err = PTR_ERR(file->private_data);
+ }
+ mutex_unlock(&dentry->d_inode->i_mutex);

+ return err;
}

static int configfs_dir_close(struct inode *inode, struct file *file)
@@ -1494,6 +1557,10 @@ int configfs_register_subsystem(struct configfs_subsystem *subsys)
if (err) {
d_delete(dentry);
dput(dentry);
+ } else {
+ spin_lock(&configfs_dirent_lock);
+ configfs_validate_dir(dentry->d_fsdata);
+ spin_unlock(&configfs_dirent_lock);
}
}

diff --git a/fs/configfs/symlink.c b/fs/configfs/symlink.c
index 21bd751..65623fb 100644
--- a/fs/configfs/symlink.c
+++ b/fs/configfs/symlink.c
@@ -81,7 +81,7 @@ static int create_link(struct config_item *parent_item,
if (sl) {
sl->sl_target = config_item_get(item);
spin_lock(&configfs_dirent_lock);
- if (target_sd->s_type & CONFIGFS_USET_DROPPING) {
+ if (target_sd->s_type & (CONFIGFS_USET_CREATING | CONFIGFS_USET_DROPPING)) {
spin_unlock(&configfs_dirent_lock);
config_item_put(item);
kfree(sl);
@@ -129,6 +129,7 @@ int configfs_symlink(struct inode *dir, struct dentry *dentry, const char *symna
{
int ret;
struct nameidata nd;
+ struct configfs_dirent *sd;
struct config_item *parent_item;
struct config_item *target_item;
struct config_item_type *type;
@@ -137,9 +138,23 @@ int configfs_symlink(struct inode *dir, struct dentry *dentry, const char *symna
if (dentry->d_parent == configfs_sb->s_root)
goto out;

+ sd = dentry->d_parent->d_fsdata;
+ /*
+ * Fake invisibility if dir belongs to a group/default groups hierarchy
+ * being attached
+ */
+ ret = 0;
+ spin_lock(&configfs_dirent_lock);
+ if (sd->s_type & CONFIGFS_USET_CREATING)
+ ret = -ENOENT;
+ spin_unlock(&configfs_dirent_lock);
+ if (ret)
+ goto out;
+
parent_item = configfs_get_config_item(dentry->d_parent);
type = parent_item->ci_type;

+ ret = -EPERM;
if (!type || !type->ct_item_ops ||
!type->ct_item_ops->allow_link)
goto out_put;
--
1.5.5.3

2008-07-03 21:59:44

by Joel Becker

[permalink] [raw]

Subject: Re: [Ocfs2-devel] [BUGFIX][PATCH 1/2] configfs: Prevent userspace from creating new entries under attaching directories

On Thu, Jun 26, 2008 at 08:05:48PM +0200, Louis Rilling wrote:
> This commit fixes 1/, tagging new directories with CONFIGFS_USET_CREATING before
> building the inode and instantiating the dentry, and validating the whole
> group+default groups hierarchy in a second pass by clearing
> CONFIGFS_USET_CREATING.

Man, I'm wary of all these in-flight flags. I hope they are all
orthogonal :-)

> mkdir(), symlink(), lookup(), and dir_open() simply return -ENOENT if
> called in (or linking to) a directory tagged with CONFIGFS_USET_CREATING. This

Why not block until the create is done?

> + /*
> + * Fake invisibility if dir belongs to a group/default groups hierarchy
> + * being attached
> + *
> + * This forbids userspace to read/write attributes of items which may
> + * not complete their initialization, since the dentries of the
> + * attributes won't be instantiated.
> + */
int configfs_dirent_is_ready(struct configfs_dirent *sd)
{
int err = 0;
> + spin_lock(&configfs_dirent_lock);
> + if (parent_sd->s_type & CONFIGFS_USET_CREATING)
> + err = -ENOENT;
> + spin_unlock(&configfs_dirent_lock);
return err;
}

Then use is_ready() in the five places you check it ;-) Perhaps
change configfs_validate_dir() to configfs_dir_set_ready(). I do like
the way validate_dir() is coded.

Joel

--

"There are only two ways to live your life. One is as though nothing
is a miracle. The other is as though everything is a miracle."
- Albert Einstein

Joel Becker
Principal Software Developer
Oracle
E-mail: [email protected]
Phone: (650) 506-8127

2008-07-03 22:08:08

by Joel Becker

[permalink] [raw]

Subject: Re: [Ocfs2-devel] [BUGFIX][PATCH 2/2] configfs: Lock new directory inodes before removing on cleanup after failure

On Thu, Jun 26, 2008 at 08:05:49PM +0200, Louis Rilling wrote:
> Once a new configfs directory is created by configfs_attach_item() or
> configfs_attach_group(), a failure in the remaining initialization steps leads
> to removing a directory which inode the VFS may have already accessed.
>
> This commit adds the necessary inode locking to safely remove configfs
> directories while cleaning up after a failure. As an advantage, the locking
> rules of populate_groups() and detach_groups() become the same: the caller must
> have the group's inode mutex locked.

I like this latter symmetry.

> static void configfs_remove_dir(struct config_item * item)
> @@ -594,36 +596,20 @@ static int create_default_group(struct config_group *parent_group,
> static int populate_groups(struct config_group *group)
> {
> struct config_group *new_group;
> - struct dentry *dentry = group->cg_item.ci_dentry;
> int ret = 0;
> int i;
>
> - if (group->default_groups) {
> - /*
> - * FYI, we're faking mkdir here
> - * I'm not sure we need this semaphore, as we're called
> - * from our parent's mkdir. That holds our parent's
> - * i_mutex, so afaik lookup cannot continue through our
> - * parent to find us, let alone mess with our tree.
> - * That said, taking our i_mutex is closer to mkdir
> - * emulation, and shouldn't hurt.
> - */
> - mutex_lock_nested(&dentry->d_inode->i_mutex, I_MUTEX_CHILD);
> -
> + if (group->default_groups)

Put the {} around this if. It may only have the for() as a
child, but it's 10 lines.

> for (i = 0; group->default_groups[i]; i++) {
> new_group = group->default_groups[i];
>
> ret = create_default_group(group, new_group);
> - if (ret)
> + if (ret) {
> + detach_groups(group);
> break;
> + }
> }
>
<snip>
> @@ -738,7 +724,14 @@ static int configfs_attach_item(struct config_item *parent_item,
> if (!ret) {
> ret = populate_attrs(item);
> if (ret) {
> + /*
> + * We are going to remove an inode and its dentry but
> + * the VFS may already have hit and used them.
> + */
> + mutex_lock(&dentry->d_inode->i_mutex);
> configfs_remove_dir(item);
> + dentry->d_inode->i_flags |= S_DEAD;
> + mutex_unlock(&dentry->d_inode->i_mutex);

This emulates how rmdir() would lock it, which is quite nice.
Perhaps add to the comment "thus, we must lock them as rmdir() would".

Joel

--

"There is shadow under this red rock.
(Come in under the shadow of this red rock)
And I will show you something different from either
Your shadow at morning striding behind you
Or your shadow at evening rising to meet you.
I will show you fear in a handful of dust."

Joel Becker
Principal Software Developer
Oracle
E-mail: [email protected]
Phone: (650) 506-8127

2008-07-04 11:12:41

by Louis Rilling

[permalink] [raw]

Subject: Re: [Ocfs2-devel] [BUGFIX][PATCH 1/2] configfs: Prevent userspace from creating new entries under attaching directories

On Thu, Jul 03, 2008 at 02:58:56PM -0700, Joel Becker wrote:
> On Thu, Jun 26, 2008 at 08:05:48PM +0200, Louis Rilling wrote:
> > This commit fixes 1/, tagging new directories with CONFIGFS_USET_CREATING before
> > building the inode and instantiating the dentry, and validating the whole
> > group+default groups hierarchy in a second pass by clearing
> > CONFIGFS_USET_CREATING.
>
> Man, I'm wary of all these in-flight flags. I hope they are all
> orthogonal :-)

Yes they are :)

>
> > mkdir(), symlink(), lookup(), and dir_open() simply return -ENOENT if
> > called in (or linking to) a directory tagged with CONFIGFS_USET_CREATING. This
>
> Why not block until the create is done?

Hm, I think that we can't without risking deadlocks.

- mkdir(): we can only hit CONFIGFS_USET_CREATING when called inside a default
group A/.../B of a new group A being attached. We hold B's i_mutex from
sys_mkdirat(). We must not block because this could deadlock with
detach_groups() in case the new group A fails to attach all its default
groups.
- symlink(): same issue as mkdir(), plus the fact that it is not possible to
block on the target of symlink(), because of potential deadlocks with
lock_rename().
- lookup(): same issue as mkdir().
- dir_open(): same issue as mkdir().

>
> > + /*
> > + * Fake invisibility if dir belongs to a group/default groups hierarchy
> > + * being attached
> > + *
> > + * This forbids userspace to read/write attributes of items which may
> > + * not complete their initialization, since the dentries of the
> > + * attributes won't be instantiated.
> > + */
> int configfs_dirent_is_ready(struct configfs_dirent *sd)
> {
> int err = 0;
> > + spin_lock(&configfs_dirent_lock);
> > + if (parent_sd->s_type & CONFIGFS_USET_CREATING)
> > + err = -ENOENT;
> > + spin_unlock(&configfs_dirent_lock);
> return err;
> }
>
> Then use is_ready() in the five places you check it ;-) Perhaps
> change configfs_validate_dir() to configfs_dir_set_ready(). I do like
> the way validate_dir() is coded.

Ok. I'll resend the patchset with this change.

Louis

--
Dr Louis Rilling Kerlabs
Skype: louis.rilling Batiment Germanium
Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes
http://www.kerlabs.com/ 35700 Rennes

Attachments:

(No filename) (2.23 kB)
signature.asc (189.00 B)
Digital signature Download all attachments