2011-02-21 17:37:46

by Lino Sanfilippo

[permalink] [raw]
Subject: [PATCH 0/5] fsnotify: decouple mark lock and marks fsobject lock

1. mark lock
2. inode/vfsmount lock

With these patches only one of both locks are taken at the same time, which
removes the need for a strict locking order.

These patches depend on patch "inotify: fix race when adding a new watch" sent to lkml
on Feb. 11.


2011-02-21 17:37:51

by Lino Sanfilippo

[permalink] [raw]
Subject: [PATCH 3/5] fsnotify: dont call fsnotify_add_[inode|vfsmount]_mark() with mark lock held.

Dont call fsnotify_add_[inode|vfsmount]_mark() with the mark lock already held,
but take the lock in these functions.

Signed-off-by: Lino Sanfilippo <[email protected]>
---
fs/notify/inode_mark.c | 8 ++++++--
fs/notify/mark.c | 8 --------
fs/notify/vfsmount_mark.c | 8 ++++++--
3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/fs/notify/inode_mark.c b/fs/notify/inode_mark.c
index a9b8661..b357e4b 100644
--- a/fs/notify/inode_mark.c
+++ b/fs/notify/inode_mark.c
@@ -186,12 +186,13 @@ int fsnotify_add_inode_mark(struct fsnotify_mark *mark,
struct hlist_node *node, *last = NULL;
int ret = 0;

+ spin_lock(&mark->lock);
+ mark->group = group;
+ mark->flags |= FSNOTIFY_MARK_FLAG_ALIVE;
mark->flags |= FSNOTIFY_MARK_FLAG_INODE;
mark->i.inode = igrab(inode);
mark->flags |= FSNOTIFY_MARK_FLAG_OBJECT_PINNED;

- assert_spin_locked(&mark->lock);
-
spin_lock(&inode->i_lock);
/* is mark the first mark? */
if (hlist_empty(&inode->i_fsnotify_marks)) {
@@ -231,7 +232,10 @@ out:
iput(mark->i.inode);
mark->i.inode = NULL;
mark->flags &= ~FSNOTIFY_MARK_FLAG_INODE;
+ mark->flags &= ~FSNOTIFY_MARK_FLAG_ALIVE;
+ mark->group = NULL;
}
+ spin_unlock(&mark->lock);

return ret;
}
diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index 6d9bf1d..57f8dd6 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -202,10 +202,7 @@ int fsnotify_add_mark_locked(struct fsnotify_mark *mark,
{
int ret;

- spin_lock(&mark->lock);

- mark->flags |= FSNOTIFY_MARK_FLAG_ALIVE;
- mark->group = group;
list_add(&mark->g_list, &group->marks_list);
fsnotify_get_mark(mark); /* for i_list and g_list */
atomic_inc(&group->num_marks);
@@ -220,20 +217,15 @@ int fsnotify_add_mark_locked(struct fsnotify_mark *mark,
if (ret)
goto err;

- spin_unlock(&mark->lock);

if (inode)
__fsnotify_update_child_dentry_flags(inode);

return 0;
err:
- mark->flags &= ~FSNOTIFY_MARK_FLAG_ALIVE;
list_del_init(&mark->g_list);
- mark->group = NULL;
atomic_dec(&group->num_marks);

- spin_unlock(&mark->lock);
-
spin_lock(&destroy_lock);
list_add(&mark->destroy_list, &destroy_list);
spin_unlock(&destroy_lock);
diff --git a/fs/notify/vfsmount_mark.c b/fs/notify/vfsmount_mark.c
index b33e78b..a6fe562 100644
--- a/fs/notify/vfsmount_mark.c
+++ b/fs/notify/vfsmount_mark.c
@@ -143,11 +143,12 @@ int fsnotify_add_vfsmount_mark(struct fsnotify_mark *mark,
struct hlist_node *node, *last = NULL;
int ret = 0;

+ spin_lock(&mark->lock);
+ mark->group = group;
+ mark->flags |= FSNOTIFY_MARK_FLAG_ALIVE;
mark->flags |= FSNOTIFY_MARK_FLAG_VFSMOUNT;
mark->m.mnt = mnt;

- assert_spin_locked(&mark->lock);
-
spin_lock(&mnt->mnt_root->d_lock);
/* is mark the first mark? */
if (hlist_empty(&mnt->mnt_fsnotify_marks)) {
@@ -185,7 +186,10 @@ out:
if (ret) {
mark->m.mnt = NULL;
mark->flags &= ~FSNOTIFY_MARK_FLAG_VFSMOUNT;
+ mark->flags &= ~FSNOTIFY_MARK_FLAG_ALIVE;
+ mark->group = NULL;
}
+ spin_unlock(&mark->lock);

return ret;
}
--
1.7.1

2011-02-21 17:37:55

by Lino Sanfilippo

[permalink] [raw]
Subject: [PATCH 5/5] fsnotify: dont lock fsobject with mark lock held

With this patch we dont lock a fsobject (inode/vfsmount) as long as we hold
the mark lock. Thus there is no strict locking order

1. mark lock
2. fsobject lock

any more.

Signed-off-by: Lino Sanfilippo <[email protected]>
---
fs/notify/inode_mark.c | 7 ++++---
fs/notify/vfsmount_mark.c | 7 ++++---
2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/fs/notify/inode_mark.c b/fs/notify/inode_mark.c
index ea015c7..245828a 100644
--- a/fs/notify/inode_mark.c
+++ b/fs/notify/inode_mark.c
@@ -61,8 +61,6 @@ void fsnotify_recalc_inode_mask(struct inode *inode)
void fsnotify_destroy_inode_mark(struct fsnotify_mark *mark,
struct inode *inode)
{
- spin_lock(&mark->lock);
-
spin_lock(&inode->i_lock);
hlist_del_init_rcu(&mark->i.i_list);
/*
@@ -73,6 +71,7 @@ void fsnotify_destroy_inode_mark(struct fsnotify_mark *mark,
fsnotify_recalc_inode_mask_locked(inode);
spin_unlock(&inode->i_lock);

+ spin_lock(&mark->lock);
if (mark->flags & FSNOTIFY_MARK_FLAG_OBJECT_PINNED) {
/*
* __fsnotify_update_child_dentry_flags(inode);
@@ -209,6 +208,7 @@ int fsnotify_add_inode_mark(struct fsnotify_mark *mark,
mark->flags |= FSNOTIFY_MARK_FLAG_INODE;
mark->i.inode = igrab(inode);
mark->flags |= FSNOTIFY_MARK_FLAG_OBJECT_PINNED;
+ spin_unlock(&mark->lock);

spin_lock(&inode->i_lock);
/* is mark the first mark? */
@@ -245,14 +245,15 @@ out:
spin_unlock(&inode->i_lock);

if (ret) {
+ spin_lock(&mark->lock);
mark->flags &= ~FSNOTIFY_MARK_FLAG_OBJECT_PINNED;
iput(mark->i.inode);
mark->i.inode = NULL;
mark->flags &= ~FSNOTIFY_MARK_FLAG_INODE;
mark->flags &= ~FSNOTIFY_MARK_FLAG_ALIVE;
mark->group = NULL;
+ spin_unlock(&mark->lock);
}
- spin_unlock(&mark->lock);

return ret;
}
diff --git a/fs/notify/vfsmount_mark.c b/fs/notify/vfsmount_mark.c
index 71b6f64..bf50de1 100644
--- a/fs/notify/vfsmount_mark.c
+++ b/fs/notify/vfsmount_mark.c
@@ -85,13 +85,12 @@ void fsnotify_recalc_vfsmount_mask(struct vfsmount *mnt)
void fsnotify_destroy_vfsmount_mark(struct fsnotify_mark *mark,
struct vfsmount *mnt)
{
- spin_lock(&mark->lock);
-
spin_lock(&mnt->mnt_root->d_lock);
hlist_del_init_rcu(&mark->m.m_list);
fsnotify_recalc_vfsmount_mask_locked(mnt);
spin_unlock(&mnt->mnt_root->d_lock);

+ spin_lock(&mark->lock);
mark->m.mnt = NULL;
mark->flags &= ~FSNOTIFY_MARK_FLAG_VFSMOUNT;
spin_unlock(&mark->lock);
@@ -148,6 +147,7 @@ int fsnotify_add_vfsmount_mark(struct fsnotify_mark *mark,
mark->flags |= FSNOTIFY_MARK_FLAG_ALIVE;
mark->flags |= FSNOTIFY_MARK_FLAG_VFSMOUNT;
mark->m.mnt = mnt;
+ spin_unlock(&mark->lock);

spin_lock(&mnt->mnt_root->d_lock);
/* is mark the first mark? */
@@ -184,12 +184,13 @@ out:
spin_unlock(&mnt->mnt_root->d_lock);

if (ret) {
+ spin_lock(&mark->lock);
mark->m.mnt = NULL;
mark->flags &= ~FSNOTIFY_MARK_FLAG_VFSMOUNT;
mark->flags &= ~FSNOTIFY_MARK_FLAG_ALIVE;
mark->group = NULL;
+ spin_unlock(&mark->lock);
}
- spin_unlock(&mark->lock);

return ret;
}
--
1.7.1

2011-02-21 17:37:49

by Lino Sanfilippo

[permalink] [raw]
Subject: [PATCH 1/5] fsnotify: grab inode ref in add_inode_mark() instead of add_mark()

Currently the type of a mark is checked in fsnotify_add_mark() and if its an
inode mark a ref is grabbed. Since this part is only needed for inode marks it
is shiftet to add_inode_mark().
Beside this the fsobject is assigned to the mark outside of the fsobject lock.

Signed-off-by: Lino Sanfilippo <[email protected]>
---
fs/notify/inode_mark.c | 12 +++++++++---
fs/notify/mark.c | 2 --
fs/notify/vfsmount_mark.c | 9 ++++++---
3 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/fs/notify/inode_mark.c b/fs/notify/inode_mark.c
index d438f11..168a242 100644
--- a/fs/notify/inode_mark.c
+++ b/fs/notify/inode_mark.c
@@ -188,13 +188,12 @@ int fsnotify_add_inode_mark(struct fsnotify_mark *mark,
int ret = 0;

mark->flags |= FSNOTIFY_MARK_FLAG_INODE;
+ mark->i.inode = igrab(inode);
+ mark->flags |= FSNOTIFY_MARK_FLAG_OBJECT_PINNED;

assert_spin_locked(&mark->lock);

spin_lock(&inode->i_lock);
-
- mark->i.inode = inode;
-
/* is mark the first mark? */
if (hlist_empty(&inode->i_fsnotify_marks)) {
hlist_add_head_rcu(&mark->i.i_list, &inode->i_fsnotify_marks);
@@ -228,6 +227,13 @@ out:
fsnotify_recalc_inode_mask_locked(inode);
spin_unlock(&inode->i_lock);

+ if (ret) {
+ mark->flags &= ~FSNOTIFY_MARK_FLAG_OBJECT_PINNED;
+ iput(mark->i.inode);
+ mark->i.inode = NULL;
+ mark->flags &= ~FSNOTIFY_MARK_FLAG_INODE;
+ }
+
return ret;
}

diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index 11cfedc..422de6e 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -219,8 +219,6 @@ int fsnotify_add_mark_locked(struct fsnotify_mark *mark,
BUG();
if (ret)
goto err;
- /* this will pin the object if appropriate */
- fsnotify_set_mark_mask_locked(mark, mark->mask);

spin_unlock(&mark->lock);

diff --git a/fs/notify/vfsmount_mark.c b/fs/notify/vfsmount_mark.c
index 04f5929..c43c310 100644
--- a/fs/notify/vfsmount_mark.c
+++ b/fs/notify/vfsmount_mark.c
@@ -145,13 +145,11 @@ int fsnotify_add_vfsmount_mark(struct fsnotify_mark *mark,
int ret = 0;

mark->flags |= FSNOTIFY_MARK_FLAG_VFSMOUNT;
+ mark->m.mnt = mnt;

assert_spin_locked(&mark->lock);

spin_lock(&mnt->mnt_root->d_lock);
-
- mark->m.mnt = mnt;
-
/* is mark the first mark? */
if (hlist_empty(&mnt->mnt_fsnotify_marks)) {
hlist_add_head_rcu(&mark->m.m_list, &mnt->mnt_fsnotify_marks);
@@ -185,5 +183,10 @@ out:
fsnotify_recalc_vfsmount_mask_locked(mnt);
spin_unlock(&mnt->mnt_root->d_lock);

+ if (ret) {
+ mark->m.mnt = NULL;
+ mark->flags &= ~FSNOTIFY_MARK_FLAG_VFSMOUNT;
+ }
+
return ret;
}
--
1.7.1

2011-02-21 17:38:21

by Lino Sanfilippo

[permalink] [raw]
Subject: [PATCH 4/5] fsnotify: dont call fsnotify_destroy_[inode|vfsmount]_mark() with mark lock held.

Dont call fsnotify_destroy_[inode|vfsmount]_mark() with the mark lock already held,
but take the lock within these functions.

Also dont put inode in mark_destroy() but in inode_mark_destroy().

Signed-off-by: Lino Sanfilippo <[email protected]>
---
fs/notify/inode_mark.c | 27 ++++++++++++++---
fs/notify/mark.c | 72 +++++++++++++-------------------------------
fs/notify/vfsmount_mark.c | 10 +++---
3 files changed, 48 insertions(+), 61 deletions(-)

diff --git a/fs/notify/inode_mark.c b/fs/notify/inode_mark.c
index b357e4b..ea015c7 100644
--- a/fs/notify/inode_mark.c
+++ b/fs/notify/inode_mark.c
@@ -61,21 +61,38 @@ void fsnotify_recalc_inode_mask(struct inode *inode)
void fsnotify_destroy_inode_mark(struct fsnotify_mark *mark,
struct inode *inode)
{
- assert_spin_locked(&mark->lock);
+ spin_lock(&mark->lock);

spin_lock(&inode->i_lock);
-
hlist_del_init_rcu(&mark->i.i_list);
- mark->i.inode = NULL;
-
/*
* this mark is now off the inode->i_fsnotify_marks list and we
* hold the inode->i_lock, so this is the perfect time to update the
* inode->i_fsnotify_mask
*/
fsnotify_recalc_inode_mask_locked(inode);
-
spin_unlock(&inode->i_lock);
+
+ if (mark->flags & FSNOTIFY_MARK_FLAG_OBJECT_PINNED) {
+ /*
+ * __fsnotify_update_child_dentry_flags(inode);
+ *
+ * I really want to call that, but we can't, we have no idea if
+ * the inode still exists the second we drop the mark->lock.
+ *
+ * The next time an event arrive to this inode from one of it's
+ * children
+ * __fsnotify_parent will see that the inode doesn't care about
+ * it's children and will update all of these flags then. So
+ * really this is just a lazy update (and could be a perf
+ * win...)
+ */
+ iput(inode);
+ mark->flags &= ~FSNOTIFY_MARK_FLAG_OBJECT_PINNED;
+ }
+ mark->i.inode = NULL;
+ mark->flags &= ~FSNOTIFY_MARK_FLAG_INODE;
+ spin_unlock(&mark->lock);
}

/*
diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index 57f8dd6..8743532 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -112,6 +112,7 @@ void fsnotify_destroy_mark(struct fsnotify_mark *mark)
{
struct fsnotify_group *group;
struct inode *inode = NULL;
+ struct vfsmount *mnt = NULL;

spin_lock(&mark->lock);
/* something else already called this function on this mark */
@@ -119,27 +120,26 @@ void fsnotify_destroy_mark(struct fsnotify_mark *mark)
spin_unlock(&mark->lock);
return;
}
+ if (mark->flags & FSNOTIFY_MARK_FLAG_INODE)
+ inode = mark->i.inode;
+ else if (mark->flags & FSNOTIFY_MARK_FLAG_VFSMOUNT)
+ mnt = mark->m.mnt;
+ else
+ BUG();
mark->flags &= ~FSNOTIFY_MARK_FLAG_ALIVE;
group = mark->group;
spin_unlock(&mark->lock);

mutex_lock(&group->mark_mutex);
- spin_lock(&mark->lock);
-
/* 1 from caller and 1 for being on i_list/g_list */
BUG_ON(atomic_read(&mark->refcnt) < 2);

- if (mark->flags & FSNOTIFY_MARK_FLAG_INODE) {
- inode = mark->i.inode;
+ if (inode)
fsnotify_destroy_inode_mark(mark, inode);
- } else if (mark->flags & FSNOTIFY_MARK_FLAG_VFSMOUNT)
- fsnotify_destroy_vfsmount_mark(mark, mark->m.mnt);
- else
- BUG();
- spin_unlock(&mark->lock);
+ else /* mount */
+ fsnotify_destroy_vfsmount_mark(mark, mnt);

list_del_init(&mark->g_list);
-
mutex_unlock(&group->mark_mutex);

spin_lock(&destroy_lock);
@@ -156,21 +156,6 @@ void fsnotify_destroy_mark(struct fsnotify_mark *mark)
group->ops->freeing_mark(mark, group);

/*
- * __fsnotify_update_child_dentry_flags(inode);
- *
- * I really want to call that, but we can't, we have no idea if the inode
- * still exists the second we drop the mark->lock.
- *
- * The next time an event arrive to this inode from one of it's children
- * __fsnotify_parent will see that the inode doesn't care about it's
- * children and will update all of these flags then. So really this
- * is just a lazy update (and could be a perf win...)
- */
-
- if (inode && (mark->flags & FSNOTIFY_MARK_FLAG_OBJECT_PINNED))
- iput(inode);
-
- /*
* it's possible that this group tried to destroy itself, but this
* this mark was simultaneously being freed by inode. If that's the
* case, we finish freeing the group here.
@@ -217,7 +202,6 @@ int fsnotify_add_mark_locked(struct fsnotify_mark *mark,
if (ret)
goto err;

-
if (inode)
__fsnotify_update_child_dentry_flags(inode);

@@ -265,29 +249,30 @@ void fsnotify_remove_mark_locked(struct fsnotify_mark *mark,
struct fsnotify_group *group)
{
struct inode *inode = NULL;
+ struct vfsmount *mnt = NULL;

spin_lock(&mark->lock);
-
/* something else already called this function on this mark */
if (!(mark->flags & FSNOTIFY_MARK_FLAG_ALIVE)) {
spin_unlock(&mark->lock);
return;
}
-
+ if (mark->flags & FSNOTIFY_MARK_FLAG_INODE)
+ inode = mark->i.inode;
+ else if (mark->flags & FSNOTIFY_MARK_FLAG_VFSMOUNT)
+ mnt = mark->m.mnt;
+ else
+ BUG();
mark->flags &= ~FSNOTIFY_MARK_FLAG_ALIVE;
+ spin_unlock(&mark->lock);

/* 1 from caller and 1 for being on i_list/g_list */
BUG_ON(atomic_read(&mark->refcnt) < 2);

-
- if (mark->flags & FSNOTIFY_MARK_FLAG_INODE) {
- inode = mark->i.inode;
+ if (inode)
fsnotify_destroy_inode_mark(mark, inode);
- } else if (mark->flags & FSNOTIFY_MARK_FLAG_VFSMOUNT)
- fsnotify_destroy_vfsmount_mark(mark, mark->m.mnt);
- else
- BUG();
- spin_unlock(&mark->lock);
+ else /* mount */
+ fsnotify_destroy_vfsmount_mark(mark, mnt);

list_del_init(&mark->g_list);

@@ -305,21 +290,6 @@ void fsnotify_remove_mark_locked(struct fsnotify_mark *mark,
group->ops->freeing_mark(mark, group);

/*
- * __fsnotify_update_child_dentry_flags(inode);
- *
- * I really want to call that, but we can't, we have no idea if the inode
- * still exists the second we drop the mark->lock.
- *
- * The next time an event arrive to this inode from one of it's children
- * __fsnotify_parent will see that the inode doesn't care about it's
- * children and will update all of these flags then. So really this
- * is just a lazy update (and could be a perf win...)
- */
-
- if (inode && (mark->flags & FSNOTIFY_MARK_FLAG_OBJECT_PINNED))
- iput(inode);
-
- /*
* it's possible that this group tried to destroy itself, but this
* this mark was simultaneously being freed by inode. If that's the
* case, we finish freeing the group here.
diff --git a/fs/notify/vfsmount_mark.c b/fs/notify/vfsmount_mark.c
index a6fe562..71b6f64 100644
--- a/fs/notify/vfsmount_mark.c
+++ b/fs/notify/vfsmount_mark.c
@@ -85,16 +85,16 @@ void fsnotify_recalc_vfsmount_mask(struct vfsmount *mnt)
void fsnotify_destroy_vfsmount_mark(struct fsnotify_mark *mark,
struct vfsmount *mnt)
{
- assert_spin_locked(&mark->lock);
+ spin_lock(&mark->lock);

spin_lock(&mnt->mnt_root->d_lock);
-
hlist_del_init_rcu(&mark->m.m_list);
- mark->m.mnt = NULL;
-
fsnotify_recalc_vfsmount_mask_locked(mnt);
-
spin_unlock(&mnt->mnt_root->d_lock);
+
+ mark->m.mnt = NULL;
+ mark->flags &= ~FSNOTIFY_MARK_FLAG_VFSMOUNT;
+ spin_unlock(&mark->lock);
}

static struct fsnotify_mark *fsnotify_find_vfsmount_mark_locked(struct fsnotify_group *group,
--
1.7.1

2011-02-21 17:38:40

by Lino Sanfilippo

[permalink] [raw]
Subject: [PATCH 2/5] fsnotify: pass inode/mount as a parameter to fsnotify_destroy_[inode|vfsmount]_mark()

Pass the fsobject (inode/mount) as a parameter to fsnotify_destroy_[inode|vfsmount]_mark()
instead of taking it from the passed mark.

Signed-off-by: Lino Sanfilippo <[email protected]>
---
fs/notify/fsnotify.h | 6 ++++--
fs/notify/inode_mark.c | 5 ++---
fs/notify/mark.c | 8 ++++----
fs/notify/vfsmount_mark.c | 5 ++---
4 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/fs/notify/fsnotify.h b/fs/notify/fsnotify.h
index 85e7d2b..8e2fc42 100644
--- a/fs/notify/fsnotify.h
+++ b/fs/notify/fsnotify.h
@@ -27,9 +27,11 @@ extern int fsnotify_add_vfsmount_mark(struct fsnotify_mark *mark,
extern void fsnotify_final_destroy_group(struct fsnotify_group *group);

/* vfsmount specific destruction of a mark */
-extern void fsnotify_destroy_vfsmount_mark(struct fsnotify_mark *mark);
+extern void fsnotify_destroy_vfsmount_mark(struct fsnotify_mark *mark,
+ struct vfsmount *mnt);
/* inode specific destruction of a mark */
-extern void fsnotify_destroy_inode_mark(struct fsnotify_mark *mark);
+extern void fsnotify_destroy_inode_mark(struct fsnotify_mark *mark,
+ struct inode *inode);
/* run the list of all marks associated with inode and flag them to be freed */
extern void fsnotify_clear_marks_by_inode(struct inode *inode);
/* run the list of all marks associated with vfsmount and flag them to be freed */
diff --git a/fs/notify/inode_mark.c b/fs/notify/inode_mark.c
index 168a242..a9b8661 100644
--- a/fs/notify/inode_mark.c
+++ b/fs/notify/inode_mark.c
@@ -58,10 +58,9 @@ void fsnotify_recalc_inode_mask(struct inode *inode)
__fsnotify_update_child_dentry_flags(inode);
}

-void fsnotify_destroy_inode_mark(struct fsnotify_mark *mark)
+void fsnotify_destroy_inode_mark(struct fsnotify_mark *mark,
+ struct inode *inode)
{
- struct inode *inode = mark->i.inode;
-
assert_spin_locked(&mark->lock);

spin_lock(&inode->i_lock);
diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index 422de6e..6d9bf1d 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -131,9 +131,9 @@ void fsnotify_destroy_mark(struct fsnotify_mark *mark)

if (mark->flags & FSNOTIFY_MARK_FLAG_INODE) {
inode = mark->i.inode;
- fsnotify_destroy_inode_mark(mark);
+ fsnotify_destroy_inode_mark(mark, inode);
} else if (mark->flags & FSNOTIFY_MARK_FLAG_VFSMOUNT)
- fsnotify_destroy_vfsmount_mark(mark);
+ fsnotify_destroy_vfsmount_mark(mark, mark->m.mnt);
else
BUG();
spin_unlock(&mark->lock);
@@ -290,9 +290,9 @@ void fsnotify_remove_mark_locked(struct fsnotify_mark *mark,

if (mark->flags & FSNOTIFY_MARK_FLAG_INODE) {
inode = mark->i.inode;
- fsnotify_destroy_inode_mark(mark);
+ fsnotify_destroy_inode_mark(mark, inode);
} else if (mark->flags & FSNOTIFY_MARK_FLAG_VFSMOUNT)
- fsnotify_destroy_vfsmount_mark(mark);
+ fsnotify_destroy_vfsmount_mark(mark, mark->m.mnt);
else
BUG();
spin_unlock(&mark->lock);
diff --git a/fs/notify/vfsmount_mark.c b/fs/notify/vfsmount_mark.c
index c43c310..b33e78b 100644
--- a/fs/notify/vfsmount_mark.c
+++ b/fs/notify/vfsmount_mark.c
@@ -82,10 +82,9 @@ void fsnotify_recalc_vfsmount_mask(struct vfsmount *mnt)
spin_unlock(&mnt->mnt_root->d_lock);
}

-void fsnotify_destroy_vfsmount_mark(struct fsnotify_mark *mark)
+void fsnotify_destroy_vfsmount_mark(struct fsnotify_mark *mark,
+ struct vfsmount *mnt)
{
- struct vfsmount *mnt = mark->m.mnt;
-
assert_spin_locked(&mark->lock);

spin_lock(&mnt->mnt_root->d_lock);
--
1.7.1

2011-02-21 19:36:36

by Lino Sanfilippo

[permalink] [raw]
Subject: Re: [PATCH 0/5] fsnotify: decouple mark lock and marks fsobject lock


The intro was somehow truncated. It should be:

This patch series removes the dependency between holding the lock of a mark and
holding the lock of the marks fsobject (inode/vfsmount).
Right now we have to use the following locking order

1. mark lock
2. inode/vfsmount lock

With these patches only one of both locks are taken at the same time, which
removes the need for a strict locking order.

These patches depend on patch "inotify: fix race when adding a new watch" sent to lkml
on Feb. 11.

Lino

2011-02-25 18:36:57

by Eric Paris

[permalink] [raw]
Subject: Re: [PATCH 1/5] fsnotify: grab inode ref in add_inode_mark() instead of add_mark()

On Mon, 2011-02-21 at 18:33 +0100, Lino Sanfilippo wrote:
> Currently the type of a mark is checked in fsnotify_add_mark() and if its an
> inode mark a ref is grabbed. Since this part is only needed for inode marks it
> is shiftet to add_inode_mark().
> Beside this the fsobject is assigned to the mark outside of the fsobject lock.
>
> Signed-off-by: Lino Sanfilippo <[email protected]>

I think your patch series is making a noticeable change which I don't
think I'm comfortable with. The current code does not pin inodes in
core if they only have ignored masks. Under memory pressure those
inodes can get eviced and the mark will get cleaned up. My glance at
this code leads me to believe that all inodes with any mark is going to
be pinned in core. I don't think that's a good idea for AV vendor use
where they might want to watch everything on the system with mount point
marks and put ignored marks on everything that comes along to cache
allows.

The fact that inodes might not be pinned in core is the reason for some
of the stupid difficulty. There is probably some way to work it out but
I think it's something I'm going to need to think about.

Am I miss-reading your changes? I think they will work, just maybe not
quite how I originally intended. I probably should write an fanotify
stress tester that stresses both marks with real masks and with only
ignored_masks...

-Eric

> ---
> fs/notify/inode_mark.c | 12 +++++++++---
> fs/notify/mark.c | 2 --
> fs/notify/vfsmount_mark.c | 9 ++++++---
> 3 files changed, 15 insertions(+), 8 deletions(-)
>
> diff --git a/fs/notify/inode_mark.c b/fs/notify/inode_mark.c
> index d438f11..168a242 100644
> --- a/fs/notify/inode_mark.c
> +++ b/fs/notify/inode_mark.c
> @@ -188,13 +188,12 @@ int fsnotify_add_inode_mark(struct fsnotify_mark *mark,
> int ret = 0;
>
> mark->flags |= FSNOTIFY_MARK_FLAG_INODE;
> + mark->i.inode = igrab(inode);
> + mark->flags |= FSNOTIFY_MARK_FLAG_OBJECT_PINNED;
>
> assert_spin_locked(&mark->lock);
>
> spin_lock(&inode->i_lock);
> -
> - mark->i.inode = inode;
> -
> /* is mark the first mark? */
> if (hlist_empty(&inode->i_fsnotify_marks)) {
> hlist_add_head_rcu(&mark->i.i_list, &inode->i_fsnotify_marks);
> @@ -228,6 +227,13 @@ out:
> fsnotify_recalc_inode_mask_locked(inode);
> spin_unlock(&inode->i_lock);
>
> + if (ret) {
> + mark->flags &= ~FSNOTIFY_MARK_FLAG_OBJECT_PINNED;
> + iput(mark->i.inode);
> + mark->i.inode = NULL;
> + mark->flags &= ~FSNOTIFY_MARK_FLAG_INODE;
> + }
> +
> return ret;
> }
>
> diff --git a/fs/notify/mark.c b/fs/notify/mark.c
> index 11cfedc..422de6e 100644
> --- a/fs/notify/mark.c
> +++ b/fs/notify/mark.c
> @@ -219,8 +219,6 @@ int fsnotify_add_mark_locked(struct fsnotify_mark *mark,
> BUG();
> if (ret)
> goto err;
> - /* this will pin the object if appropriate */
> - fsnotify_set_mark_mask_locked(mark, mark->mask);
>
> spin_unlock(&mark->lock);
>
> diff --git a/fs/notify/vfsmount_mark.c b/fs/notify/vfsmount_mark.c
> index 04f5929..c43c310 100644
> --- a/fs/notify/vfsmount_mark.c
> +++ b/fs/notify/vfsmount_mark.c
> @@ -145,13 +145,11 @@ int fsnotify_add_vfsmount_mark(struct fsnotify_mark *mark,
> int ret = 0;
>
> mark->flags |= FSNOTIFY_MARK_FLAG_VFSMOUNT;
> + mark->m.mnt = mnt;
>
> assert_spin_locked(&mark->lock);
>
> spin_lock(&mnt->mnt_root->d_lock);
> -
> - mark->m.mnt = mnt;
> -
> /* is mark the first mark? */
> if (hlist_empty(&mnt->mnt_fsnotify_marks)) {
> hlist_add_head_rcu(&mark->m.m_list, &mnt->mnt_fsnotify_marks);
> @@ -185,5 +183,10 @@ out:
> fsnotify_recalc_vfsmount_mask_locked(mnt);
> spin_unlock(&mnt->mnt_root->d_lock);
>
> + if (ret) {
> + mark->m.mnt = NULL;
> + mark->flags &= ~FSNOTIFY_MARK_FLAG_VFSMOUNT;
> + }
> +
> return ret;
> }

2011-03-01 18:34:12

by Lino Sanfilippo

[permalink] [raw]
Subject: Re: [PATCH 1/5] fsnotify: grab inode ref in add_inode_mark() instead of add_mark()

On Fri, Feb 25, 2011 at 01:36:35PM -0500, Eric Paris wrote:
> I think your patch series is making a noticeable change which I don't
> think I'm comfortable with. The current code does not pin inodes in
> core if they only have ignored masks. Under memory pressure those
> inodes can get eviced and the mark will get cleaned up. My glance at
> this code leads me to believe that all inodes with any mark is going to
> be pinned in core. I don't think that's a good idea for AV vendor use
> where they might want to watch everything on the system with mount point
> marks and put ignored marks on everything that comes along to cache
> allows.
>
> The fact that inodes might not be pinned in core is the reason for some
> of the stupid difficulty. There is probably some way to work it out but
> I think it's something I'm going to need to think about.

Youre right, the inode is now also pinned if there is no mask set. This is
a change i did not on purpose. Whether the inode is pinned or not does not
affect the original purpose of the patch series, which was the decoupling
of the mark lock and the fsobject lock. I simply forgot to check the mask.
I could replace that part with something like

- mark->i.inode = igrab(inode);
- mark->flags |= FSNOTIFY_MARK_FLAG_OBJECT_PINNED;
+ mark->i.inode = inode;
+ if (mark->mask) { /* only pin if mask is set */
+ igrab(inode);
+ mark->flags |= FSNOTIFY_MARK_FLAG_OBJECT_PINNED;
+ }

Should i just fix it and resend the patches? Or do you have any other
concerns?

2011-03-01 21:08:01

by Eric Paris

[permalink] [raw]
Subject: Re: [PATCH 1/5] fsnotify: grab inode ref in add_inode_mark() instead of add_mark()

On Tue, 2011-03-01 at 19:30 +0100, Lino Sanfilippo wrote:
> On Fri, Feb 25, 2011 at 01:36:35PM -0500, Eric Paris wrote:
> > I think your patch series is making a noticeable change which I don't
> > think I'm comfortable with. The current code does not pin inodes in
> > core if they only have ignored masks. Under memory pressure those
> > inodes can get eviced and the mark will get cleaned up. My glance at
> > this code leads me to believe that all inodes with any mark is going to
> > be pinned in core. I don't think that's a good idea for AV vendor use
> > where they might want to watch everything on the system with mount point
> > marks and put ignored marks on everything that comes along to cache
> > allows.
> >
> > The fact that inodes might not be pinned in core is the reason for some
> > of the stupid difficulty. There is probably some way to work it out but
> > I think it's something I'm going to need to think about.
>
> Youre right, the inode is now also pinned if there is no mask set. This is
> a change i did not on purpose. Whether the inode is pinned or not does not
> affect the original purpose of the patch series, which was the decoupling
> of the mark lock and the fsobject lock. I simply forgot to check the mask.
> I could replace that part with something like
>
> - mark->i.inode = igrab(inode);
> - mark->flags |= FSNOTIFY_MARK_FLAG_OBJECT_PINNED;
> + mark->i.inode = inode;
> + if (mark->mask) { /* only pin if mask is set */
> + igrab(inode);
> + mark->flags |= FSNOTIFY_MARK_FLAG_OBJECT_PINNED;
> + }
>
> Should i just fix it and resend the patches? Or do you have any other
> concerns?

No, but it's a rather serious concern to me. You need to make sure that
mark->i.inode is actually valid before you use it and cannot disappear
underneath you. I haven't relooked at your code (and clearly will after
you fix) but what I'm most concerned about is some place where we are
trying to delete a mark, either explicitly or by group, and then race
with the inode being evicted from cache. The inode being evicted from
cache is going to eventually hit the destroy_by_inode code. When that
function returns we cannot use mark->i.inode any more. In today's code
we prevent this situation by never dereferencing or using the
mark->i.inode outside of mark->lock. You're going to have to remember
that after a 'destroy_by_inode' call you can't ever use the inode
again....

-Eric