2008-06-12 13:44:27

by Louis Rilling

[permalink] [raw]
Subject: [PATCH 1/3][BUGFIX] configfs: Introduce configfs_dirent_lock

This patch introduces configfs_dirent_lock spinlock to protect configfs_dirent
traversals against linkage mutations (add/del/move). This will allow
configfs_detach_prep() to avoid locking i_mutexes.

Locking rules for configfs_dirent linkage mutations are the same plus the
requirement of taking configfs_dirent_lock. For configfs_dirent walking, one can
either take appropriate i_mutex as before, or take configfs_dirent_lock.

The spinlock could actually be a mutex, but the critical sections are either
O(1) or should not be too long (default groups walking in last patch).

Signed-off-by: Louis Rilling <[email protected]>
---
fs/configfs/configfs_internal.h | 3 +++
fs/configfs/dir.c | 25 +++++++++++++++++++++++++
fs/configfs/inode.c | 2 ++
fs/configfs/symlink.c | 2 ++
4 files changed, 32 insertions(+)

Index: b/fs/configfs/configfs_internal.h
===================================================================
--- a/fs/configfs/configfs_internal.h 2008-06-12 13:44:27.000000000 +0200
+++ b/fs/configfs/configfs_internal.h 2008-06-12 13:44:34.000000000 +0200
@@ -26,6 +26,7 @@

#include <linux/slab.h>
#include <linux/list.h>
+#include <linux/spinlock.h>

struct configfs_dirent {
atomic_t s_count;
@@ -49,6 +50,8 @@ struct configfs_dirent {
#define CONFIGFS_USET_DROPPING 0x0100
#define CONFIGFS_NOT_PINNED (CONFIGFS_ITEM_ATTR)

+extern spinlock_t configfs_dirent_lock;
+
extern struct vfsmount * configfs_mount;
extern struct kmem_cache *configfs_dir_cachep;

Index: b/fs/configfs/dir.c
===================================================================
--- a/fs/configfs/dir.c 2008-06-12 13:44:27.000000000 +0200
+++ b/fs/configfs/dir.c 2008-06-12 15:20:26.000000000 +0200
@@ -35,6 +35,11 @@
#include "configfs_internal.h"

DECLARE_RWSEM(configfs_rename_sem);
+/*
+ * Protects configfs_dirent traversals against linkage mutations
+ * Can be used as an alternative to taking the concerned i_mutex
+ */
+DEFINE_SPINLOCK(configfs_dirent_lock);

static void configfs_d_iput(struct dentry * dentry,
struct inode * inode)
@@ -79,7 +84,9 @@ static struct configfs_dirent *configfs_
atomic_set(&sd->s_count, 1);
INIT_LIST_HEAD(&sd->s_links);
INIT_LIST_HEAD(&sd->s_children);
+ spin_lock(&configfs_dirent_lock);
list_add(&sd->s_sibling, &parent_sd->s_children);
+ spin_unlock(&configfs_dirent_lock);
sd->s_element = element;

return sd;
@@ -173,7 +180,9 @@ static int create_dir(struct config_item
} else {
struct configfs_dirent *sd = d->d_fsdata;
if (sd) {
+ spin_lock(&configfs_dirent_lock);
list_del_init(&sd->s_sibling);
+ spin_unlock(&configfs_dirent_lock);
configfs_put(sd);
}
}
@@ -224,7 +233,9 @@ int configfs_create_link(struct configfs
else {
struct configfs_dirent *sd = dentry->d_fsdata;
if (sd) {
+ spin_lock(&configfs_dirent_lock);
list_del_init(&sd->s_sibling);
+ spin_unlock(&configfs_dirent_lock);
configfs_put(sd);
}
}
@@ -238,7 +249,9 @@ static void remove_dir(struct dentry * d
struct configfs_dirent * sd;

sd = d->d_fsdata;
+ spin_lock(&configfs_dirent_lock);
list_del_init(&sd->s_sibling);
+ spin_unlock(&configfs_dirent_lock);
configfs_put(sd);
if (d->d_inode)
simple_rmdir(parent->d_inode,d);
@@ -410,7 +423,9 @@ static void detach_attrs(struct config_i
list_for_each_entry_safe(sd, tmp, &parent_sd->s_children, s_sibling) {
if (!sd->s_element || !(sd->s_type & CONFIGFS_NOT_PINNED))
continue;
+ spin_lock(&configfs_dirent_lock);
list_del_init(&sd->s_sibling);
+ spin_unlock(&configfs_dirent_lock);
configfs_drop_dentry(sd, dentry);
configfs_put(sd);
}
@@ -1268,7 +1283,9 @@ static int configfs_dir_close(struct ino
struct configfs_dirent * cursor = file->private_data;

mutex_lock(&dentry->d_inode->i_mutex);
+ spin_lock(&configfs_dirent_lock);
list_del_init(&cursor->s_sibling);
+ spin_unlock(&configfs_dirent_lock);
mutex_unlock(&dentry->d_inode->i_mutex);

release_configfs_dirent(cursor);
@@ -1308,7 +1325,9 @@ static int configfs_readdir(struct file
/* fallthrough */
default:
if (filp->f_pos == 2) {
+ spin_lock(&configfs_dirent_lock);
list_move(q, &parent_sd->s_children);
+ spin_unlock(&configfs_dirent_lock);
}
for (p=q->next; p!= &parent_sd->s_children; p=p->next) {
struct configfs_dirent *next;
@@ -1331,7 +1350,9 @@ static int configfs_readdir(struct file
dt_type(next)) < 0)
return 0;

+ spin_lock(&configfs_dirent_lock);
list_move(q, p);
+ spin_unlock(&configfs_dirent_lock);
p = q;
filp->f_pos++;
}
@@ -1362,7 +1383,9 @@ static loff_t configfs_dir_lseek(struct
struct list_head *p;
loff_t n = file->f_pos - 2;

+ spin_lock(&configfs_dirent_lock);
list_del(&cursor->s_sibling);
+ spin_unlock(&configfs_dirent_lock);
p = sd->s_children.next;
while (n && p != &sd->s_children) {
struct configfs_dirent *next;
@@ -1372,7 +1395,9 @@ static loff_t configfs_dir_lseek(struct
n--;
p = p->next;
}
+ spin_lock(&configfs_dirent_lock);
list_add_tail(&cursor->s_sibling, p);
+ spin_unlock(&configfs_dirent_lock);
}
}
mutex_unlock(&dentry->d_inode->i_mutex);
Index: b/fs/configfs/inode.c
===================================================================
--- a/fs/configfs/inode.c 2008-06-12 13:44:27.000000000 +0200
+++ b/fs/configfs/inode.c 2008-06-12 13:44:34.000000000 +0200
@@ -247,7 +247,9 @@ void configfs_hash_and_remove(struct den
if (!sd->s_element)
continue;
if (!strcmp(configfs_get_name(sd), name)) {
+ spin_lock(&configfs_dirent_lock);
list_del_init(&sd->s_sibling);
+ spin_unlock(&configfs_dirent_lock);
configfs_drop_dentry(sd, dir);
configfs_put(sd);
break;
Index: b/fs/configfs/symlink.c
===================================================================
--- a/fs/configfs/symlink.c 2008-06-12 13:44:27.000000000 +0200
+++ b/fs/configfs/symlink.c 2008-06-12 13:44:34.000000000 +0200
@@ -169,7 +169,9 @@ int configfs_unlink(struct inode *dir, s
parent_item = configfs_get_config_item(dentry->d_parent);
type = parent_item->ci_type;

+ spin_lock(&configfs_dirent_lock);
list_del_init(&sd->s_sibling);
+ spin_unlock(&configfs_dirent_lock);
configfs_drop_dentry(sd, dentry->d_parent);
dput(dentry);
configfs_put(sd);

--
Dr Louis Rilling Kerlabs
Skype: louis.rilling Batiment Germanium
Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes
http://www.kerlabs.com/ 35700 Rennes


2008-06-12 19:14:48

by Joel Becker

[permalink] [raw]
Subject: Re: [PATCH 1/3][BUGFIX] configfs: Introduce configfs_dirent_lock

On Thu, Jun 12, 2008 at 03:31:27PM +0200, Louis Rilling wrote:
> This patch introduces configfs_dirent_lock spinlock to protect configfs_dirent
> traversals against linkage mutations (add/del/move). This will allow
> configfs_detach_prep() to avoid locking i_mutexes.

I like that you expanded the scope to cover all configfs_dirent
linkages. These are all protected by i_mutex in the current code, but
your rename fix removes that protection.

> Locking rules for configfs_dirent linkage mutations are the same plus the
> requirement of taking configfs_dirent_lock. For configfs_dirent walking, one can
> either take appropriate i_mutex as before, or take configfs_dirent_lock.

Nope, you *must* take configfs_dirent_lock now. You've removed
i_mutex protection in the last patch.


> The spinlock could actually be a mutex, but the critical sections are either
> O(1) or should not be too long (default groups walking in last patch).

I'm wary of someone's reasonably deep groups. Discussing it
yesterday I'd been convinced that a mutex was good to start with. But
given your increased scope of the lock, let's try the spinlock and see
what happens.

> +extern spinlock_t configfs_dirent_lock;

Boy I wish this could be static to one file :-(

> @@ -79,7 +84,9 @@ static struct configfs_dirent *configfs_
> atomic_set(&sd->s_count, 1);
> INIT_LIST_HEAD(&sd->s_links);
> INIT_LIST_HEAD(&sd->s_children);
> + spin_lock(&configfs_dirent_lock);
> list_add(&sd->s_sibling, &parent_sd->s_children);
> + spin_unlock(&configfs_dirent_lock);
> sd->s_element = element;

You need to set s_element either under the lock or before taking
the lock. Once you've dropped the lock, someone can reach this dirent
from the parent, and should see s_element.

> @@ -173,7 +180,9 @@ static int create_dir(struct config_item
> } else {
> struct configfs_dirent *sd = d->d_fsdata;
> if (sd) {
> + spin_lock(&configfs_dirent_lock);
> list_del_init(&sd->s_sibling);
> + spin_unlock(&configfs_dirent_lock);
> configfs_put(sd);
> }
> }
> @@ -224,7 +233,9 @@ int configfs_create_link(struct configfs
> else {
> struct configfs_dirent *sd = dentry->d_fsdata;
> if (sd) {
> + spin_lock(&configfs_dirent_lock);
> list_del_init(&sd->s_sibling);
> + spin_unlock(&configfs_dirent_lock);
> configfs_put(sd);
> }
> }

These strike me as wrong - I would think that one should either
see the old configfs_dirent tree or the new one. That is, one would
take the dirent lock at the beginning of configfs_mkdir() and release it
at the end - so any other code that looks at the dirent tree will see an
atomic change. Here, some other path could see the new dirent after
configfs_make_dirent() but then it disappears when configfs_create()
fails.
If you did that, though, it'd have to be a mutex.
Now, the only thing that sees this intermediate condition is
configfs itself. Everyone else is protected by i_mutex. I guess it's
OK - but can you comment that fact? i_mutex does *not* protect
traversal of the configfs_dirent tree, but it does prevent the outside
world from seeing the intermediate states.

> @@ -238,7 +249,9 @@ static void remove_dir(struct dentry * d
> struct configfs_dirent * sd;
>
> sd = d->d_fsdata;
> + spin_lock(&configfs_dirent_lock);
> list_del_init(&sd->s_sibling);
> + spin_unlock(&configfs_dirent_lock);
> configfs_put(sd);
> if (d->d_inode)
> simple_rmdir(parent->d_inode,d);
> @@ -410,7 +423,9 @@ static void detach_attrs(struct config_i
> list_for_each_entry_safe(sd, tmp, &parent_sd->s_children, s_sibling) {
> if (!sd->s_element || !(sd->s_type & CONFIGFS_NOT_PINNED))
> continue;
> + spin_lock(&configfs_dirent_lock);
> list_del_init(&sd->s_sibling);
> + spin_unlock(&configfs_dirent_lock);
> configfs_drop_dentry(sd, dentry);
> configfs_put(sd);
> }
> @@ -1268,7 +1283,9 @@ static int configfs_dir_close(struct ino
> struct configfs_dirent * cursor = file->private_data;
>
> mutex_lock(&dentry->d_inode->i_mutex);
> + spin_lock(&configfs_dirent_lock);
> list_del_init(&cursor->s_sibling);
> + spin_unlock(&configfs_dirent_lock);
> mutex_unlock(&dentry->d_inode->i_mutex);
>
> release_configfs_dirent(cursor);
> @@ -1362,7 +1383,9 @@ static loff_t configfs_dir_lseek(struct
> struct list_head *p;
> loff_t n = file->f_pos - 2;
>
> + spin_lock(&configfs_dirent_lock);
> list_del(&cursor->s_sibling);
> + spin_unlock(&configfs_dirent_lock);
> p = sd->s_children.next;
> while (n && p != &sd->s_children) {
> struct configfs_dirent *next;
> @@ -1372,7 +1395,9 @@ static loff_t configfs_dir_lseek(struct
> n--;
> p = p->next;
> }
> + spin_lock(&configfs_dirent_lock);
> list_add_tail(&cursor->s_sibling, p);
> + spin_unlock(&configfs_dirent_lock);
> }
> }
> mutex_unlock(&dentry->d_inode->i_mutex);
> Index: b/fs/configfs/inode.c
> ===================================================================
> --- a/fs/configfs/inode.c 2008-06-12 13:44:27.000000000 +0200
> +++ b/fs/configfs/inode.c 2008-06-12 13:44:34.000000000 +0200
> @@ -247,7 +247,9 @@ void configfs_hash_and_remove(struct den
> if (!sd->s_element)
> continue;
> if (!strcmp(configfs_get_name(sd), name)) {
> + spin_lock(&configfs_dirent_lock);
> list_del_init(&sd->s_sibling);
> + spin_unlock(&configfs_dirent_lock);
> configfs_drop_dentry(sd, dir);
> configfs_put(sd);
> break;
> Index: b/fs/configfs/symlink.c
> ===================================================================
> --- a/fs/configfs/symlink.c 2008-06-12 13:44:27.000000000 +0200
> +++ b/fs/configfs/symlink.c 2008-06-12 13:44:34.000000000 +0200
> @@ -169,7 +169,9 @@ int configfs_unlink(struct inode *dir, s
> parent_item = configfs_get_config_item(dentry->d_parent);
> type = parent_item->ci_type;
>
> + spin_lock(&configfs_dirent_lock);
> list_del_init(&sd->s_sibling);
> + spin_unlock(&configfs_dirent_lock);
> configfs_drop_dentry(sd, dentry->d_parent);
> dput(dentry);
> configfs_put(sd);

You do the lock,del(sibling),unlock so often, maybe it should be
a helper. Then you can make configfs_dirent_lock static to dir.c.
Well, you use dirent_lock in your s_links patch, so maybe not static.

Joel

--

"Copy from one, it's plagiarism; copy from two, it's research."
- Wilson Mizner

Joel Becker
Principal Software Developer
Oracle
E-mail: [email protected]
Phone: (650) 506-8127

2008-06-12 22:26:18

by Louis Rilling

[permalink] [raw]
Subject: Re: [PATCH 1/3][BUGFIX] configfs: Introduce configfs_dirent_lock

On Thu, Jun 12, 2008 at 12:13:48PM -0700, Joel Becker wrote:
> On Thu, Jun 12, 2008 at 03:31:27PM +0200, Louis Rilling wrote:
> > This patch introduces configfs_dirent_lock spinlock to protect configfs_dirent
> > traversals against linkage mutations (add/del/move). This will allow
> > configfs_detach_prep() to avoid locking i_mutexes.
>
> I like that you expanded the scope to cover all configfs_dirent
> linkages. These are all protected by i_mutex in the current code, but
> your rename fix removes that protection.
>
> > Locking rules for configfs_dirent linkage mutations are the same plus the
> > requirement of taking configfs_dirent_lock. For configfs_dirent walking, one can
> > either take appropriate i_mutex as before, or take configfs_dirent_lock.
>
> Nope, you *must* take configfs_dirent_lock now. You've removed
> i_mutex protection in the last patch.

Oh well. Do you mean because of CONFIGFS_USET_DROPPING being set without
i_mutex locked? This is the only mutation (except in the s_links patch) done
without i_mutex locked. I thought that actually either other
configfs_dirent traversals like readdir() and dir_lseek() would prevent
detach_prep() from succeeding because they add dirents before, or are done in
places where detach_prep() cannot do harm because new_dirent() fails whenever it
sees CONFIGFS_USET_DROPPING: detach_attrs() and detach_groups()
must ignore CONFIGFS_USET_DROPPING, depend_prep() is protected since the
whole path is locked from configfs root, lookup() can succeed since at worst its
result will be invalidated when actually detaching the default groups. The only
function for which I can not figure out is configfs_hash_and_remove(), but it is
not used.
I admit that the case of symlink() needs an extra check to ensure
that the target is not about to be removed. The bug was already there
though, right?
Anyway, if it looks conceptually simpler to use
configfs_dirent_lock (probably better a mutex in that case) wherever
i_mutex are supposed to protect configfs_dirent traversals, I'm ok with it.

>
>
> > The spinlock could actually be a mutex, but the critical sections are either
> > O(1) or should not be too long (default groups walking in last patch).
>
> I'm wary of someone's reasonably deep groups. Discussing it
> yesterday I'd been convinced that a mutex was good to start with. But
> given your increased scope of the lock, let's try the spinlock and see
> what happens.
>
> > +extern spinlock_t configfs_dirent_lock;
>
> Boy I wish this could be static to one file :-(
>
> > @@ -79,7 +84,9 @@ static struct configfs_dirent *configfs_
> > atomic_set(&sd->s_count, 1);
> > INIT_LIST_HEAD(&sd->s_links);
> > INIT_LIST_HEAD(&sd->s_children);
> > + spin_lock(&configfs_dirent_lock);
> > list_add(&sd->s_sibling, &parent_sd->s_children);
> > + spin_unlock(&configfs_dirent_lock);
> > sd->s_element = element;
>
> You need to set s_element either under the lock or before taking
> the lock. Once you've dropped the lock, someone can reach this dirent
> from the parent, and should see s_element.

Will do.

>
> > @@ -173,7 +180,9 @@ static int create_dir(struct config_item
> > } else {
> > struct configfs_dirent *sd = d->d_fsdata;
> > if (sd) {
> > + spin_lock(&configfs_dirent_lock);
> > list_del_init(&sd->s_sibling);
> > + spin_unlock(&configfs_dirent_lock);
> > configfs_put(sd);
> > }
> > }
> > @@ -224,7 +233,9 @@ int configfs_create_link(struct configfs
> > else {
> > struct configfs_dirent *sd = dentry->d_fsdata;
> > if (sd) {
> > + spin_lock(&configfs_dirent_lock);
> > list_del_init(&sd->s_sibling);
> > + spin_unlock(&configfs_dirent_lock);
> > configfs_put(sd);
> > }
> > }
>
> These strike me as wrong - I would think that one should either
> see the old configfs_dirent tree or the new one. That is, one would
> take the dirent lock at the beginning of configfs_mkdir() and release it
> at the end - so any other code that looks at the dirent tree will see an
> atomic change. Here, some other path could see the new dirent after
> configfs_make_dirent() but then it disappears when configfs_create()
> fails.
> If you did that, though, it'd have to be a mutex.

And we should not take other i_mutex in populate_groups() and
populate_attrs(), otherwise deadlocks could happen.

> Now, the only thing that sees this intermediate condition is
> configfs itself. Everyone else is protected by i_mutex. I guess it's
> OK - but can you comment that fact? i_mutex does *not* protect
> traversal of the configfs_dirent tree, but it does prevent the outside
> world from seeing the intermediate states.

The only intermediate conditions that may hurt one's mind is that an
mkdir() (resp. symlink()) racing with an rmdir() can successfuly call
make_item()/make_group() (resp. allow_link()) and immediately fail when
finalizing with attach_item()/attach_group() (resp. create_link()). So, from
userspace and the VFS this seems like "mkdir foo/bar/baz" simply failed because
of "rmdir foo", while at the same time from the subsystem point of view this
seems like userspace did "mkdir foo/bar/baz; rmdir foo/bar/baz; rmdir foo".
As I pointed out in the rename fix, this however can already happen when
attach_item()/attach_group() (resp. create_link()) fails because of
memory pressure for instance.
For the other intermediate conditions, see above about locking rules.

>
> > @@ -238,7 +249,9 @@ static void remove_dir(struct dentry * d
> > struct configfs_dirent * sd;
> >
> > sd = d->d_fsdata;
> > + spin_lock(&configfs_dirent_lock);
> > list_del_init(&sd->s_sibling);
> > + spin_unlock(&configfs_dirent_lock);
> > configfs_put(sd);
> > if (d->d_inode)
> > simple_rmdir(parent->d_inode,d);
> > @@ -410,7 +423,9 @@ static void detach_attrs(struct config_i
> > list_for_each_entry_safe(sd, tmp, &parent_sd->s_children, s_sibling) {
> > if (!sd->s_element || !(sd->s_type & CONFIGFS_NOT_PINNED))
> > continue;
> > + spin_lock(&configfs_dirent_lock);
> > list_del_init(&sd->s_sibling);
> > + spin_unlock(&configfs_dirent_lock);
> > configfs_drop_dentry(sd, dentry);
> > configfs_put(sd);
> > }
> > @@ -1268,7 +1283,9 @@ static int configfs_dir_close(struct ino
> > struct configfs_dirent * cursor = file->private_data;
> >
> > mutex_lock(&dentry->d_inode->i_mutex);
> > + spin_lock(&configfs_dirent_lock);
> > list_del_init(&cursor->s_sibling);
> > + spin_unlock(&configfs_dirent_lock);
> > mutex_unlock(&dentry->d_inode->i_mutex);
> >
> > release_configfs_dirent(cursor);
> > @@ -1362,7 +1383,9 @@ static loff_t configfs_dir_lseek(struct
> > struct list_head *p;
> > loff_t n = file->f_pos - 2;
> >
> > + spin_lock(&configfs_dirent_lock);
> > list_del(&cursor->s_sibling);
> > + spin_unlock(&configfs_dirent_lock);
> > p = sd->s_children.next;
> > while (n && p != &sd->s_children) {
> > struct configfs_dirent *next;
> > @@ -1372,7 +1395,9 @@ static loff_t configfs_dir_lseek(struct
> > n--;
> > p = p->next;
> > }
> > + spin_lock(&configfs_dirent_lock);
> > list_add_tail(&cursor->s_sibling, p);
> > + spin_unlock(&configfs_dirent_lock);
> > }
> > }
> > mutex_unlock(&dentry->d_inode->i_mutex);
> > Index: b/fs/configfs/inode.c
> > ===================================================================
> > --- a/fs/configfs/inode.c 2008-06-12 13:44:27.000000000 +0200
> > +++ b/fs/configfs/inode.c 2008-06-12 13:44:34.000000000 +0200
> > @@ -247,7 +247,9 @@ void configfs_hash_and_remove(struct den
> > if (!sd->s_element)
> > continue;
> > if (!strcmp(configfs_get_name(sd), name)) {
> > + spin_lock(&configfs_dirent_lock);
> > list_del_init(&sd->s_sibling);
> > + spin_unlock(&configfs_dirent_lock);
> > configfs_drop_dentry(sd, dir);
> > configfs_put(sd);
> > break;
> > Index: b/fs/configfs/symlink.c
> > ===================================================================
> > --- a/fs/configfs/symlink.c 2008-06-12 13:44:27.000000000 +0200
> > +++ b/fs/configfs/symlink.c 2008-06-12 13:44:34.000000000 +0200
> > @@ -169,7 +169,9 @@ int configfs_unlink(struct inode *dir, s
> > parent_item = configfs_get_config_item(dentry->d_parent);
> > type = parent_item->ci_type;
> >
> > + spin_lock(&configfs_dirent_lock);
> > list_del_init(&sd->s_sibling);
> > + spin_unlock(&configfs_dirent_lock);
> > configfs_drop_dentry(sd, dentry->d_parent);
> > dput(dentry);
> > configfs_put(sd);
>
> You do the lock,del(sibling),unlock so often, maybe it should be
> a helper. Then you can make configfs_dirent_lock static to dir.c.
> Well, you use dirent_lock in your s_links patch, so maybe not static.

Yes I need it not static, or implement helpers for s_children/s_sibling
and for s_links/sl_list. And if the scope should be extended to all
cases of configfs_dirent traversals plus whatever would fix symlink()'s
target not disappearing, it should definitely not be static.

Louis

--
Dr Louis Rilling Kerlabs - IRISA
Skype: louis.rilling Campus Universitaire de Beaulieu
Phone: (+33|0) 2 99 84 71 52 Avenue du General Leclerc
Fax: (+33|0) 2 99 84 71 71 35042 Rennes CEDEX - France
http://www.kerlabs.com/

2008-06-13 02:41:52

by Joel Becker

[permalink] [raw]
Subject: Re: [PATCH 1/3][BUGFIX] configfs: Introduce configfs_dirent_lock

On Fri, Jun 13, 2008 at 12:25:58AM +0200, Louis Rilling wrote:
> On Thu, Jun 12, 2008 at 12:13:48PM -0700, Joel Becker wrote:
> > On Thu, Jun 12, 2008 at 03:31:27PM +0200, Louis Rilling wrote:
> > > Locking rules for configfs_dirent linkage mutations are the same plus the
> > > requirement of taking configfs_dirent_lock. For configfs_dirent walking, one can
> > > either take appropriate i_mutex as before, or take configfs_dirent_lock.
> >
> > Nope, you *must* take configfs_dirent_lock now. You've removed
> > i_mutex protection in the last patch.
>
> Oh well. Do you mean because of CONFIGFS_USET_DROPPING being set without
> i_mutex locked? This is the only mutation (except in the s_links patch) done
> without i_mutex locked. I thought that actually either other
> configfs_dirent traversals like readdir() and dir_lseek() would prevent
> detach_prep() from succeeding because they add dirents before, or are done in
> places where detach_prep() cannot do harm because new_dirent() fails whenever it
> sees CONFIGFS_USET_DROPPING: detach_attrs() and detach_groups()
> must ignore CONFIGFS_USET_DROPPING, depend_prep() is protected since the
> whole path is locked from configfs root, lookup() can succeed since at worst its
> result will be invalidated when actually detaching the default groups. The only
> function for which I can not figure out is configfs_hash_and_remove(), but it is
> not used.

I don't mean that your code is wrong, I mean that the comment is
unclear. The locking rules aren't "you can use i_mutex or dirent_lock,
take your pick". I think you are right that configfs_detach_prep() is
safe to set dropping as it does without i_mutex.
This is related to the discussion below about VFS visible
changes (i_mutex protection) vs subsystem internal changes (dirent_lock
protection). The protections have different scope, but your comment
made them sound interchangable.

> I admit that the case of symlink() needs an extra check to ensure
> that the target is not about to be removed. The bug was already there
> though, right?
> Anyway, if it looks conceptually simpler to use
> configfs_dirent_lock (probably better a mutex in that case) wherever
> i_mutex are supposed to protect configfs_dirent traversals, I'm ok with it.

Leave it as a spinlock.
Going over the changes, I was pretty convinced your detach_prep
was safe vis-a-vis mkdir. You're under i_mutex for the immediate
directory, and both attach_* and detach_* are under the immediate
i_mutex when they make the change. Also, you have your readdir and
lookup walking s_children without a lock. I *think* that's safe, because
it's also against the immediate directory, and thus the vfs is holding
i_mutex for you.

> And we should not take other i_mutex in populate_groups() and
> populate_attrs(), otherwise deadlocks could happen.

Huh, we certainly should. perhaps you are speaking as if we
were turning dirent_lock into a mutex. We're not turning dirent_lock
into a mutex yet.

> > Now, the only thing that sees this intermediate condition is
> > configfs itself. Everyone else is protected by i_mutex. I guess it's
> > OK - but can you comment that fact? i_mutex does *not* protect
> > traversal of the configfs_dirent tree, but it does prevent the outside
> > world from seeing the intermediate states.
>
> The only intermediate conditions that may hurt one's mind is that an
> mkdir() (resp. symlink()) racing with an rmdir() can successfuly call
> make_item()/make_group() (resp. allow_link()) and immediately fail when
> finalizing with attach_item()/attach_group() (resp. create_link()). So, from
> userspace and the VFS this seems like "mkdir foo/bar/baz" simply failed because
> of "rmdir foo", while at the same time from the subsystem point of view this
> seems like userspace did "mkdir foo/bar/baz; rmdir foo/bar/baz; rmdir foo".
> As I pointed out in the rename fix, this however can already happen when
> attach_item()/attach_group() (resp. create_link()) fails because of
> memory pressure for instance.

I'm not even sure what you said here :-)

Joel

--

"Egotist: a person more interested in himself than in me."
- Ambrose Bierce

Joel Becker
Principal Software Developer
Oracle
E-mail: [email protected]
Phone: (650) 506-8127

2008-06-13 10:45:29

by Louis Rilling

[permalink] [raw]
Subject: Re: [PATCH 1/3][BUGFIX] configfs: Introduce configfs_dirent_lock

On Thu, Jun 12, 2008 at 07:41:31PM -0700, Joel Becker wrote:
> On Fri, Jun 13, 2008 at 12:25:58AM +0200, Louis Rilling wrote:
> > On Thu, Jun 12, 2008 at 12:13:48PM -0700, Joel Becker wrote:
> > > On Thu, Jun 12, 2008 at 03:31:27PM +0200, Louis Rilling wrote:
> > > > Locking rules for configfs_dirent linkage mutations are the same plus the
> > > > requirement of taking configfs_dirent_lock. For configfs_dirent walking, one can
> > > > either take appropriate i_mutex as before, or take configfs_dirent_lock.
> > >
> > > Nope, you *must* take configfs_dirent_lock now. You've removed
> > > i_mutex protection in the last patch.
> >
> > Oh well. Do you mean because of CONFIGFS_USET_DROPPING being set without
> > i_mutex locked? This is the only mutation (except in the s_links patch) done
> > without i_mutex locked. I thought that actually either other
> > configfs_dirent traversals like readdir() and dir_lseek() would prevent
> > detach_prep() from succeeding because they add dirents before, or are done in
> > places where detach_prep() cannot do harm because new_dirent() fails whenever it
> > sees CONFIGFS_USET_DROPPING: detach_attrs() and detach_groups()
> > must ignore CONFIGFS_USET_DROPPING, depend_prep() is protected since the
> > whole path is locked from configfs root, lookup() can succeed since at worst its
> > result will be invalidated when actually detaching the default groups. The only
> > function for which I can not figure out is configfs_hash_and_remove(), but it is
> > not used.
>
> I don't mean that your code is wrong, I mean that the comment is
> unclear. The locking rules aren't "you can use i_mutex or dirent_lock,
> take your pick". I think you are right that configfs_detach_prep() is
> safe to set dropping as it does without i_mutex.
> This is related to the discussion below about VFS visible
> changes (i_mutex protection) vs subsystem internal changes (dirent_lock
> protection). The protections have different scope, but your comment
> made them sound interchangable.

Agreed.

>
> > I admit that the case of symlink() needs an extra check to ensure
> > that the target is not about to be removed. The bug was already there
> > though, right?
> > Anyway, if it looks conceptually simpler to use
> > configfs_dirent_lock (probably better a mutex in that case) wherever
> > i_mutex are supposed to protect configfs_dirent traversals, I'm ok with it.
>
> Leave it as a spinlock.
> Going over the changes, I was pretty convinced your detach_prep
> was safe vis-a-vis mkdir. You're under i_mutex for the immediate
> directory, and both attach_* and detach_* are under the immediate
> i_mutex when they make the change. Also, you have your readdir and
> lookup walking s_children without a lock. I *think* that's safe, because
> it's also against the immediate directory, and thus the vfs is holding
> i_mutex for you.

Unfortunately, thinking a bit more about it I found some issues with
i_mutex lock free detach_prep(), but nothing that can't be fixed ;)
Between detach_prep() in A and mkdir() in a default group A/B:
detach_prep() can be called in the middle of attach_group(), for instance after
having attached A/B/C, but attach_group() may then fail (because of memory
pressure for instance) while attaching C's default group A/B/C/D. This would
lead to both mkdir(A/B/C) and rmdir(A) failing, the reason for rmdir failure
being at best obscure: the user would have expected to either see mkdir succeed
and rmdir fail because of the new A/B/C group, or see mkdir fail and rmdir
succeed because no user-created group lived under A. Solution: tag A/B with
USET_IN_MKDIR on mkdir entrance, remove that tag on mkdir exit, and retry
detach_prep() as long as USET_IN_MKDIR is found under A/*.
Between rmdir() and readdir(): dir_open() might add a configfs_dirent
to a default group A/B that detach_prep() already marked with USET_DROPPING.
This could result in detach_groups() dropping the dirent and make readdir() in
A/B crash afterwards. Solution: check USET_DROPPING in dir_open() and fail if
it is set.
Between rmdir() and lookup(): several lookup() called under A/* while
rmdir(A) in the middle of detach_groups() could return inconsistent results (for
instance some default groups being there and some other ones not). Solution:
lock dirent_lock for the whole lookup() duration, check USET_DROPPING of current
dir, and fail with ENOENT if it is set.

I'll send a corrected patch later.

>
> > And we should not take other i_mutex in populate_groups() and
> > populate_attrs(), otherwise deadlocks could happen.
>
> Huh, we certainly should. perhaps you are speaking as if we
> were turning dirent_lock into a mutex. We're not turning dirent_lock
> into a mutex yet.

I was speaking as if we replaced i_mutex protection with dirent_lock
protection for a whole mkdir(), that is taking the lock before attach_* and
releasing it after.

>
> > > Now, the only thing that sees this intermediate condition is
> > > configfs itself. Everyone else is protected by i_mutex. I guess it's
> > > OK - but can you comment that fact? i_mutex does *not* protect
> > > traversal of the configfs_dirent tree, but it does prevent the outside
> > > world from seeing the intermediate states.
> >
> > The only intermediate conditions that may hurt one's mind is that an
> > mkdir() (resp. symlink()) racing with an rmdir() can successfuly call
> > make_item()/make_group() (resp. allow_link()) and immediately fail when
> > finalizing with attach_item()/attach_group() (resp. create_link()). So, from
> > userspace and the VFS this seems like "mkdir foo/bar/baz" simply failed because
> > of "rmdir foo", while at the same time from the subsystem point of view this
> > seems like userspace did "mkdir foo/bar/baz; rmdir foo/bar/baz; rmdir foo".
> > As I pointed out in the rename fix, this however can already happen when
> > attach_item()/attach_group() (resp. create_link()) fails because of
> > memory pressure for instance.
>
> I'm not even sure what you said here :-)

I was just saying that with i_mutex lock free detach_prep(), we have kind of
optimistic mkdir(), with conflicts resolved as error cases of attach_*.

The intermediate conditions that really matter are:
1/ the existence of partial default groups trees (I mean configfs_dirent trees)
in the middle of attach_group() and detach_group(),
2/ the existence of default group trees that are tagged as USET_DROPPING and
should be treated as not existing anymore.
These intermediate conditions lead to the issues pointed out above, and
will be correctly handled (I hope) in the coming corrected patch.

Louis

--
Dr Louis Rilling Kerlabs
Skype: louis.rilling Batiment Germanium
Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes
http://www.kerlabs.com/ 35700 Rennes


Attachments:
(No filename) (6.66 kB)
signature.asc (189.00 B)
Digital signature
Download all attachments

2008-06-13 12:09:33

by Louis Rilling

[permalink] [raw]
Subject: Re: [PATCH 1/3][BUGFIX] configfs: Introduce configfs_dirent_lock

On Fri, Jun 13, 2008 at 12:45:13PM +0200, Louis Rilling wrote:
>
> Unfortunately, thinking a bit more about it I found some issues with
> i_mutex lock free detach_prep(), but nothing that can't be fixed ;)
> Between detach_prep() in A and mkdir() in a default group A/B:
> detach_prep() can be called in the middle of attach_group(), for instance after
> having attached A/B/C, but attach_group() may then fail (because of memory
> pressure for instance) while attaching C's default group A/B/C/D. This would
> lead to both mkdir(A/B/C) and rmdir(A) failing, the reason for rmdir failure
> being at best obscure: the user would have expected to either see mkdir succeed
> and rmdir fail because of the new A/B/C group, or see mkdir fail and rmdir
> succeed because no user-created group lived under A. Solution: tag A/B with
> USET_IN_MKDIR on mkdir entrance, remove that tag on mkdir exit, and retry
> detach_prep() as long as USET_IN_MKDIR is found under A/*.
> Between rmdir() and readdir(): dir_open() might add a configfs_dirent
> to a default group A/B that detach_prep() already marked with USET_DROPPING.
> This could result in detach_groups() dropping the dirent and make readdir() in
> A/B crash afterwards. Solution: check USET_DROPPING in dir_open() and fail if
> it is set.
> Between rmdir() and lookup(): several lookup() called under A/* while
> rmdir(A) in the middle of detach_groups() could return inconsistent results (for
> instance some default groups being there and some other ones not). Solution:
> lock dirent_lock for the whole lookup() duration, check USET_DROPPING of current
> dir, and fail with ENOENT if it is set.

Oh, should probably provide some d_revalidate() also, which would return
-ENOENT for a dentry under a directory flagged with USET_DROPPING. But I'm
realizing that such "inconsistencies" (some default groups being valid in the
d_cache and some other ones not) already happen between the time detach_prep()
has flagged a default group with USET_DROPPING and the default
group is actually detached. Am I wrong?

Louis

--
Dr Louis Rilling Kerlabs
Skype: louis.rilling Batiment Germanium
Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes
http://www.kerlabs.com/ 35700 Rennes


Attachments:
(No filename) (2.19 kB)
signature.asc (189.00 B)
Digital signature
Download all attachments

2008-06-13 20:18:10

by Joel Becker

[permalink] [raw]
Subject: Re: [PATCH 1/3][BUGFIX] configfs: Introduce configfs_dirent_lock

Louis,
Can I just say, you're the first person to do serious review
other than myself, and I really appreciate it :-)

On Fri, Jun 13, 2008 at 12:45:13PM +0200, Louis Rilling wrote:
> On Thu, Jun 12, 2008 at 07:41:31PM -0700, Joel Becker wrote:
> Unfortunately, thinking a bit more about it I found some issues with
> i_mutex lock free detach_prep(), but nothing that can't be fixed ;)
> Between detach_prep() in A and mkdir() in a default group A/B:
> detach_prep() can be called in the middle of attach_group(), for instance after
> having attached A/B/C, but attach_group() may then fail (because of memory
> pressure for instance) while attaching C's default group A/B/C/D. This would
> lead to both mkdir(A/B/C) and rmdir(A) failing, the reason for rmdir failure
> being at best obscure: the user would have expected to either see mkdir succeed
> and rmdir fail because of the new A/B/C group, or see mkdir fail and rmdir
> succeed because no user-created group lived under A. Solution: tag A/B with
> USET_IN_MKDIR on mkdir entrance, remove that tag on mkdir exit, and retry
> detach_prep() as long as USET_IN_MKDIR is found under A/*.

I see what you are saying here. I'm not sure if that is worth
the complexity - we can say "it was kind of there". No one will ever
hit it :-) But let me think about it more.

> Between rmdir() and readdir(): dir_open() might add a configfs_dirent
> to a default group A/B that detach_prep() already marked with USET_DROPPING.
> This could result in detach_groups() dropping the dirent and make readdir() in
> A/B crash afterwards. Solution: check USET_DROPPING in dir_open() and fail if
> it is set.

I was trying to see why this could happen, given that we can
come to this from other places - the dir could have been open before we
set USET_DROPPING. Oh! We actually fail rmdir with ENOTEMPTY when the
dir is open? That's wrong. Ignore it though - we'll fix it later.
But back to your concern. configfs_readdir() can't crash for
two reasons. First, detach_groups() won't remove this dirent. A
readdir placeholder has s_element==NULL. Note the check in
detach_groups():

if (!sd->s_element ||
!(sd->s_type & CONFIGFS_USET_DEFAULT))
continue;

It skips our readdir placeholder, allowing us to free it in dir_close().
There's another reason this can't be a problem. If we get into
detach_groups(), we take i_mutex, locking out readdir(). Then we delete
the directory, setting S_DEAD. In vfs_readdir(), they check
IS_DEADDIR() after getting i_mutex. So they will see S_DEAD and not
call our ->readdir(). S_DEAD is important. Someone could actually have
our default_group as their cwd. S_DEAD prevents them from doing
anything :-)

> Between rmdir() and lookup(): several lookup() called under A/* while
> rmdir(A) in the middle of detach_groups() could return inconsistent results (for
> instance some default groups being there and some other ones not). Solution:
> lock dirent_lock for the whole lookup() duration, check USET_DROPPING of current
> dir, and fail with ENOENT if it is set.

Nah, we don't care about the spurious lookups. This is a normal
race of i_mutex. USET_DROPPING is not a way to prevent VFS views from
changing - it's only a way to prevent new children.
Remember, ->lookup() comes with i_mutex locking. We hold
i_mutex during the entire delete, so they can't call ->lookup() until
we're done with a directory. Conversely, if they win i_mutex and ->lookup()
a default group, then try to use it after we've removed it, they'll just
ENOENT. This is evident back in do_rename(). They call lookup, which
takes and drops locks, then call lock_rename() to get the locks back.
And they can handle ENOENT at that point.

> I was speaking as if we replaced i_mutex protection with dirent_lock
> protection for a whole mkdir(), that is taking the lock before attach_* and
> releasing it after.

Ok. I think that's not the way to go, what you currently have
is better.

> > I'm not even sure what you said here :-)
>
> I was just saying that with i_mutex lock free detach_prep(), we have kind of
> optimistic mkdir(), with conflicts resolved as error cases of attach_*.

Basically, the concerns you had above.

> The intermediate conditions that really matter are:
> 1/ the existence of partial default groups trees (I mean configfs_dirent trees)
> in the middle of attach_group() and detach_group(),

This is your first case, the mkdir ENOMEM vs rmdir ENOTEMPTY.

> 2/ the existence of default group trees that are tagged as USET_DROPPING and
> should be treated as not existing anymore.

This is not an issue. USET_DROPPING does *not* mean it went
away. It means we're safe to make it go away. We protect the actual
going-away with i_mutex. And that's normal VFS behavior.

Joel

--

"I don't know anything about music. In my line you don't have
to."
- Elvis Presley

Joel Becker
Principal Software Developer
Oracle
E-mail: [email protected]
Phone: (650) 506-8127

2008-06-13 20:20:08

by Joel Becker

[permalink] [raw]
Subject: Re: [PATCH 1/3][BUGFIX] configfs: Introduce configfs_dirent_lock

On Fri, Jun 13, 2008 at 02:09:23PM +0200, Louis Rilling wrote:
> Oh, should probably provide some d_revalidate() also, which would return
> -ENOENT for a dentry under a directory flagged with USET_DROPPING. But I'm
> realizing that such "inconsistencies" (some default groups being valid in the
> d_cache and some other ones not) already happen between the time detach_prep()
> has flagged a default group with USET_DROPPING and the default
> group is actually detached. Am I wrong?

We don't need d_revalidate(). As I stated at the end of my last
email, USET_DROPPING does not mean 'It already went away'. It just
means we're safe to do so, because we prevent new children. We actually
make it go away underneath i_mutex.
The VFS handles inconsistencies between lookup and action. It's
part of normal operation. Otherwise, they'd have to hold all the
i_mutexes around lookup and action.

Joel

--

"When choosing between two evils, I always like to try the one
I've never tried before."
- Mae West

Joel Becker
Principal Software Developer
Oracle
E-mail: [email protected]
Phone: (650) 506-8127

2008-06-13 21:54:23

by Louis Rilling

[permalink] [raw]
Subject: Re: [PATCH 1/3][BUGFIX] configfs: Introduce configfs_dirent_lock

On Fri, Jun 13, 2008 at 01:17:46PM -0700, Joel Becker wrote:
> Louis,
> Can I just say, you're the first person to do serious review
> other than myself, and I really appreciate it :-)

It's just that I use configfs in my own work, and I'm playing hard with
it, especially with modules. So I need to understand exactly what it
does, and what is possible with it.

>
> On Fri, Jun 13, 2008 at 12:45:13PM +0200, Louis Rilling wrote:
> > On Thu, Jun 12, 2008 at 07:41:31PM -0700, Joel Becker wrote:
> > Unfortunately, thinking a bit more about it I found some issues with
> > i_mutex lock free detach_prep(), but nothing that can't be fixed ;)
> > Between detach_prep() in A and mkdir() in a default group A/B:
> > detach_prep() can be called in the middle of attach_group(), for instance after
> > having attached A/B/C, but attach_group() may then fail (because of memory
> > pressure for instance) while attaching C's default group A/B/C/D. This would
> > lead to both mkdir(A/B/C) and rmdir(A) failing, the reason for rmdir failure
> > being at best obscure: the user would have expected to either see mkdir succeed
> > and rmdir fail because of the new A/B/C group, or see mkdir fail and rmdir
> > succeed because no user-created group lived under A. Solution: tag A/B with
> > USET_IN_MKDIR on mkdir entrance, remove that tag on mkdir exit, and retry
> > detach_prep() as long as USET_IN_MKDIR is found under A/*.
>
> I see what you are saying here. I'm not sure if that is worth
> the complexity - we can say "it was kind of there". No one will ever
> hit it :-) But let me think about it more.

To me it's an issue only if we want to provide some atomic view to
userspace: either userspace sees a group with all of its default groups,
or it sees none. So the question is: does userspace need such atomicity?
Currently configfs provides it, so this would be a userspace visible
change if we break it.

>
> > Between rmdir() and readdir(): dir_open() might add a configfs_dirent
> > to a default group A/B that detach_prep() already marked with USET_DROPPING.
> > This could result in detach_groups() dropping the dirent and make readdir() in
> > A/B crash afterwards. Solution: check USET_DROPPING in dir_open() and fail if
> > it is set.
>
> I was trying to see why this could happen, given that we can
> come to this from other places - the dir could have been open before we
> set USET_DROPPING. Oh! We actually fail rmdir with ENOTEMPTY when the
> dir is open? That's wrong. Ignore it though - we'll fix it later.
> But back to your concern. configfs_readdir() can't crash for
> two reasons. First, detach_groups() won't remove this dirent. A
> readdir placeholder has s_element==NULL. Note the check in
> detach_groups():
>
> if (!sd->s_element ||
> !(sd->s_type & CONFIGFS_USET_DEFAULT))
> continue;
>
> It skips our readdir placeholder, allowing us to free it in dir_close().

I had not noticed this. Thanks for pointed it out.

> There's another reason this can't be a problem. If we get into
> detach_groups(), we take i_mutex, locking out readdir(). Then we delete
> the directory, setting S_DEAD. In vfs_readdir(), they check
> IS_DEADDIR() after getting i_mutex. So they will see S_DEAD and not
> call our ->readdir(). S_DEAD is important. Someone could actually have
> our default_group as their cwd. S_DEAD prevents them from doing
> anything :-)

As I told you in a previous email, I'm missing some VFS skills, so
thanks again for the explanation.

>
> > Between rmdir() and lookup(): several lookup() called under A/* while
> > rmdir(A) in the middle of detach_groups() could return inconsistent results (for
> > instance some default groups being there and some other ones not). Solution:
> > lock dirent_lock for the whole lookup() duration, check USET_DROPPING of current
> > dir, and fail with ENOENT if it is set.
>
> Nah, we don't care about the spurious lookups. This is a normal
> race of i_mutex. USET_DROPPING is not a way to prevent VFS views from
> changing - it's only a way to prevent new children.
> Remember, ->lookup() comes with i_mutex locking. We hold
> i_mutex during the entire delete, so they can't call ->lookup() until
> we're done with a directory. Conversely, if they win i_mutex and ->lookup()
> a default group, then try to use it after we've removed it, they'll just
> ENOENT. This is evident back in do_rename(). They call lookup, which
> takes and drops locks, then call lock_rename() to get the locks back.
> And they can handle ENOENT at that point.

Sure, my only concern is the atomic view of userspace: can userspace
tolerate that (pwd=A/B, with B a default group of A, B having default groups C
and D, and A being removed) 'ls C' returns error because default group C is
already removed and 'ls D' is ok because default group D is not removed yet?

>
> > I was speaking as if we replaced i_mutex protection with dirent_lock
> > protection for a whole mkdir(), that is taking the lock before attach_* and
> > releasing it after.
>
> Ok. I think that's not the way to go, what you currently have
> is better.

Agreed.

>
> > The intermediate conditions that really matter are:
> > 1/ the existence of partial default groups trees (I mean configfs_dirent trees)
> > in the middle of attach_group() and detach_group(),
>
> This is your first case, the mkdir ENOMEM vs rmdir ENOTEMPTY.

Exactly.

>
> > 2/ the existence of default group trees that are tagged as USET_DROPPING and
> > should be treated as not existing anymore.
>
> This is not an issue. USET_DROPPING does *not* mean it went
> away. It means we're safe to make it go away. We protect the actual
> going-away with i_mutex. And that's normal VFS behavior.

Again this is the concern of atomicity from userspace point of view: to
provide such atomic view, mkdir(), lookup(), readdir(), and probably
attributes open() should just fail when done in a default group flagged with
USET_DROPPING.

Anyway, if atomicity from userspace point of view is not a concern, this
just makes things simpler, and I'm ok with it.

Louis

--
Dr Louis Rilling Kerlabs - IRISA
Skype: louis.rilling Campus Universitaire de Beaulieu
Phone: (+33|0) 2 99 84 71 52 Avenue du General Leclerc
Fax: (+33|0) 2 99 84 71 71 35042 Rennes CEDEX - France
http://www.kerlabs.com/

2008-06-13 22:35:19

by Joel Becker

[permalink] [raw]
Subject: Re: [PATCH 1/3][BUGFIX] configfs: Introduce configfs_dirent_lock

On Fri, Jun 13, 2008 at 11:54:01PM +0200, Louis Rilling wrote:
> On Fri, Jun 13, 2008 at 01:17:46PM -0700, Joel Becker wrote:
> > Louis,
> > Can I just say, you're the first person to do serious review
> > other than myself, and I really appreciate it :-)
>
> It's just that I use configfs in my own work, and I'm playing hard with
> it, especially with modules. So I need to understand exactly what it
> does, and what is possible with it.

I'm all for it. And see my other email responding to Luis about
the current concerns with configfs.

> > I see what you are saying here. I'm not sure if that is worth
> > the complexity - we can say "it was kind of there". No one will ever
> > hit it :-) But let me think about it more.
>
> To me it's an issue only if we want to provide some atomic view to
> userspace: either userspace sees a group with all of its default groups,
> or it sees none. So the question is: does userspace need such atomicity?
> Currently configfs provides it, so this would be a userspace visible
> change if we break it.

People *won't* see that. default groups are populated and
cleaned under i_mutex. The race of mkdir vs rmdir isn't about seeing
partial default groups, it's about the ENOMEM racing the ENOTEMPTY. It
doesn't impact lookup or other operations. We can fix it. I'm just not
sure it's worth the complexity (and this is an open question).

> > There's another reason this can't be a problem. If we get into
> > detach_groups(), we take i_mutex, locking out readdir(). Then we delete
> > the directory, setting S_DEAD. In vfs_readdir(), they check
> > IS_DEADDIR() after getting i_mutex. So they will see S_DEAD and not
> > call our ->readdir(). S_DEAD is important. Someone could actually have
> > our default_group as their cwd. S_DEAD prevents them from doing
> > anything :-)
>
> As I told you in a previous email, I'm missing some VFS skills, so
> thanks again for the explanation.

Oh, don't worry about that. There's nothing wrong with not
knowing - that's why you asked me. Now you do :-)

> Sure, my only concern is the atomic view of userspace: can userspace
> tolerate that (pwd=A/B, with B a default group of A, B having default groups C
> and D, and A being removed) 'ls C' returns error because default group C is
> already removed and 'ls D' is ok because default group D is not removed yet?

They can't see that. We take i_mutex in detach_group. This
locks out lookup and readdir. When we're done with detach_group, all
default groups are gone.

> > > 2/ the existence of default group trees that are tagged as USET_DROPPING and
> > > should be treated as not existing anymore.
> >
> > This is not an issue. USET_DROPPING does *not* mean it went
> > away. It means we're safe to make it go away. We protect the actual
> > going-away with i_mutex. And that's normal VFS behavior.
>
> Again this is the concern of atomicity from userspace point of view: to
> provide such atomic view, mkdir(), lookup(), readdir(), and probably
> attributes open() should just fail when done in a default group flagged with
> USET_DROPPING.

It's not atomic, though, and never has been. I'm not quite sure
what you are unsure of here. Let me try to clarify a little.
Are you worried about two separate runs of the ls(1) command?

# ls A/B/C
# ls A/B/D

These can't be atomic, because someone else could rmdir(1) in the
middle:

# ls A/B/C
# rmdir A/B
# ls A/B/D
ls: No such file or directory

This is perfectly normal, and there is no way to prevent it - it is
separate entrances to the system call.
Do you mean inside one call? That is "ls A/B" would print "C"
but not "D"? That cannot happen, because we hold B's i_mutex during
detach_group. So, if readdir beat us to i_mutex, it lists "C D". If we
win, we remove both before releasing B's i_Mutex, and readdir errors
with ENOENT - we removed B.
I'm not quite sure what inconsistency you are asking about here.

Joel

--

"The one important thing i have learned over the years is the
difference between taking one's work seriously and taking one's self
seriously. The first is imperative and the second is disastrous."
-Margot Fonteyn

Joel Becker
Principal Software Developer
Oracle
E-mail: [email protected]
Phone: (650) 506-8127

2008-06-16 11:30:29

by Louis Rilling

[permalink] [raw]
Subject: Re: [PATCH 1/3][BUGFIX] configfs: Introduce configfs_dirent_lock

On Fri, Jun 13, 2008 at 03:34:41PM -0700, Joel Becker wrote:
> > To me it's an issue only if we want to provide some atomic view to
> > userspace: either userspace sees a group with all of its default groups,
> > or it sees none. So the question is: does userspace need such atomicity?
> > Currently configfs provides it, so this would be a userspace visible
> > change if we break it.
>
> People *won't* see that. default groups are populated and
> cleaned under i_mutex. The race of mkdir vs rmdir isn't about seeing
> partial default groups, it's about the ENOMEM racing the ENOTEMPTY. It
> doesn't impact lookup or other operations. We can fix it. I'm just not
> sure it's worth the complexity (and this is an open question).

It's not that difficult to implement, you may just find it a bit ugly... I hope
to send you a corrected "rename fix" today.

>
> > Sure, my only concern is the atomic view of userspace: can userspace
> > tolerate that (pwd=A/B, with B a default group of A, B having default groups C
> > and D, and A being removed) 'ls C' returns error because default group C is
> > already removed and 'ls D' is ok because default group D is not removed yet?
>
> They can't see that. We take i_mutex in detach_group. This
> locks out lookup and readdir. When we're done with detach_group, all
> default groups are gone.

If I understand correctly, lookup() is not called each time userspace does ls,
and in configfs case, it is never called for existing items since the d_cache is
populated as soon as the user creates items. So lookup() does not block 'ls'
during rmdir() (unless it is a lookup for a never accessed attribute). I think
that this is the point that invalidates all my theory about atomicity :)

>
> > > > 2/ the existence of default group trees that are tagged as USET_DROPPING and
> > > > should be treated as not existing anymore.
> > >
> > > This is not an issue. USET_DROPPING does *not* mean it went
> > > away. It means we're safe to make it go away. We protect the actual
> > > going-away with i_mutex. And that's normal VFS behavior.
> >
> > Again this is the concern of atomicity from userspace point of view: to
> > provide such atomic view, mkdir(), lookup(), readdir(), and probably
> > attributes open() should just fail when done in a default group flagged with
> > USET_DROPPING.
>
> It's not atomic, though, and never has been. I'm not quite sure
> what you are unsure of here. Let me try to clarify a little.
> Are you worried about two separate runs of the ls(1) command?
>
> # ls A/B/C
> # ls A/B/D
>
> These can't be atomic, because someone else could rmdir(1) in the
> middle:
>
> # ls A/B/C
> # rmdir A/B
> # ls A/B/D
> ls: No such file or directory
>
> This is perfectly normal, and there is no way to prevent it - it is
> separate entrances to the system call.
> Do you mean inside one call? That is "ls A/B" would print "C"
> but not "D"? That cannot happen, because we hold B's i_mutex during
> detach_group. So, if readdir beat us to i_mutex, it lists "C D". If we
> win, we remove both before releasing B's i_Mutex, and readdir errors
> with ENOENT - we removed B.
> I'm not quite sure what inconsistency you are asking about here.

The scenario that made me worry was more:
process 1:
/* PWD=A/B */
# ls C
ls: No such file or directory
/* some sync between process 1 and 2 */
process 2:
/* PWD=A/D */
# ls E
/* ok */
# ls E
ls: No such file or directory

From a user's point of view, this looks as if somebody did 'rmdir A; mkdir A;
rmdir A', while there actually were only 'rmdir A'.

If there were no d_cache, this would be impossible with the current
implementation of detach_prep() locking all default groups. But with the d_cache
this has always been possible.

Anyway, I give up with this (wrong) atomicity concern.

Louis

--
Dr Louis Rilling Kerlabs
Skype: louis.rilling Batiment Germanium
Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes
http://www.kerlabs.com/ 35700 Rennes


Attachments:
(No filename) (3.92 kB)
signature.asc (189.00 B)
Digital signature
Download all attachments