2010-08-18 16:29:59

by Eric Paris

[permalink] [raw]
Subject: [PATCH 1/5] fanotify: do not dereference inode_mark when it is unset

The fanotify code is supposed to get the group from the mark. It accidentally
only used the inode_mark. If the vfsmount_mark was set but not the inode_mark
it would deref the NULL inode_mark. Get the group from the correct place.

Reported-by: Tvrtko Ursulin <[email protected]>
Signed-off-by: Eric Paris <[email protected]>
---

fs/notify/fsnotify.c | 15 +++++++++++----
1 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index 3970392..f3e3b35 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -148,13 +148,14 @@ static int send_to_group(struct inode *to_tell, struct vfsmount *mnt,
const unsigned char *file_name,
struct fsnotify_event **event)
{
- struct fsnotify_group *group = inode_mark->group;
+ struct fsnotify_group *group = NULL;
__u32 inode_test_mask = (mask & ~FS_EVENT_ON_CHILD);
__u32 vfsmount_test_mask = (mask & ~FS_EVENT_ON_CHILD);

- pr_debug("%s: group=%p to_tell=%p mnt=%p mark=%p mask=%x data=%p"
- " data_is=%d cookie=%d event=%p\n", __func__, group, to_tell,
- mnt, inode_mark, mask, data, data_is, cookie, *event);
+ if (unlikely(!inode_mark && !vfsmount_mark)) {
+ BUG();
+ return 0;
+ }

/* clear ignored on inode modification */
if (mask & FS_MODIFY) {
@@ -168,18 +169,24 @@ static int send_to_group(struct inode *to_tell, struct vfsmount *mnt,

/* does the inode mark tell us to do something? */
if (inode_mark) {
+ group = inode_mark->group;
inode_test_mask &= inode_mark->mask;
inode_test_mask &= ~inode_mark->ignored_mask;
}

/* does the vfsmount_mark tell us to do something? */
if (vfsmount_mark) {
+ group = vfsmount_mark->group;
vfsmount_test_mask &= vfsmount_mark->mask;
vfsmount_test_mask &= ~vfsmount_mark->ignored_mask;
if (inode_mark)
vfsmount_test_mask &= ~inode_mark->ignored_mask;
}

+ pr_debug("%s: group=%p to_tell=%p mnt=%p mark=%p mask=%x data=%p"
+ " data_is=%d cookie=%d event=%p\n", __func__, group, to_tell,
+ mnt, inode_mark, mask, data, data_is, cookie, *event);
+
if (!inode_test_mask && !vfsmount_test_mask)
return 0;


2010-08-18 16:30:05

by Eric Paris

[permalink] [raw]
Subject: [PATCH 2/5] fsnotify: reset used_inode and used_vfsmount on each pass

The fsnotify main loop has 2 booleans which tell if a particular mark was
sent to the listeners or if it should be processed in the next pass. The
problem is that the booleans were not reset on each traversal of the loop.
So marks could get skipped even when they were not sent to the notifiers.

Reported-by: Tvrtko Ursulin <[email protected]>
Signed-off-by: Eric Paris <[email protected]>
---

fs/notify/fsnotify.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index f3e3b35..59dc7a0 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -220,7 +220,7 @@ int fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is,
struct fsnotify_event *event = NULL;
struct vfsmount *mnt;
int idx, ret = 0;
- bool used_inode = false, used_vfsmount = false;
+ bool used_inode, used_vfsmount;
/* global tests shouldn't care about events on child only the specific event */
__u32 test_mask = (mask & ~FS_EVENT_ON_CHILD);

@@ -261,6 +261,8 @@ int fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is,
}

while (inode_node || vfsmount_node) {
+ used_inode = used_vfsmount = false;
+
if (inode_node) {
inode_mark = hlist_entry(srcu_dereference(inode_node, &fsnotify_mark_srcu),
struct fsnotify_mark, i.i_list);

2010-08-18 16:30:11

by Eric Paris

[permalink] [raw]
Subject: [PATCH 3/5] fanotify: add MAINTAINERS entry

add myself as the maintainer.

Reported-by: Andy Gospodarek <[email protected]>
Signed-off-by: Eric Paris <[email protected]
---

MAINTAINERS | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 2c07d55..fb321c2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2299,6 +2299,12 @@ S: Maintained
F: Documentation/hwmon/f71805f
F: drivers/hwmon/f71805f.c

+FANOTIFY
+M: Eric Paris <[email protected]>
+S: Maintained
+F: fs/notify/fanotify/
+F: include/linux/fanotify.h
+
FARSYNC SYNCHRONOUS DRIVER
M: Kevin Curtis <[email protected]>
W: http://www.farsite.co.uk/

2010-08-18 16:30:17

by Eric Paris

[permalink] [raw]
Subject: [PATCH 4/5] fsnotify: fix ignored mask handling between inode and vfsmount marks

The interesting 2 list lockstep walking didn't quite work out if the inode
marks only had ignores and the vfsmount list requested events. The code to
shortcut list traversal would not run the inode list since it didn't have real
event requests. This code forces inode list traversal when a vfsmount mark
matches the event type. Maybe we could add an i_fsnotify_ignored_mask field
to struct inode to get the shortcut back, but it doesn't seem worth it to grow
struct inode again.

I bet with the recent changes to lock the way we do now it would actually not
be a major perf hit to just drop i_fsnotify_mark_mask altogether. But that is
for another day.

Signed-off-by: Eric Paris <[email protected]>
---

fs/notify/fsnotify.c | 35 +++++++++++++++++------------------
1 files changed, 17 insertions(+), 18 deletions(-)

diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index 59dc7a0..6f2777c 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -149,8 +149,8 @@ static int send_to_group(struct inode *to_tell, struct vfsmount *mnt,
struct fsnotify_event **event)
{
struct fsnotify_group *group = NULL;
- __u32 inode_test_mask = (mask & ~FS_EVENT_ON_CHILD);
- __u32 vfsmount_test_mask = (mask & ~FS_EVENT_ON_CHILD);
+ __u32 inode_test_mask = 0;
+ __u32 vfsmount_test_mask = 0;

if (unlikely(!inode_mark && !vfsmount_mark)) {
BUG();
@@ -170,12 +170,14 @@ static int send_to_group(struct inode *to_tell, struct vfsmount *mnt,
/* does the inode mark tell us to do something? */
if (inode_mark) {
group = inode_mark->group;
+ inode_test_mask = (mask & ~FS_EVENT_ON_CHILD);
inode_test_mask &= inode_mark->mask;
inode_test_mask &= ~inode_mark->ignored_mask;
}

/* does the vfsmount_mark tell us to do something? */
if (vfsmount_mark) {
+ vfsmount_test_mask = (mask & ~FS_EVENT_ON_CHILD);
group = vfsmount_mark->group;
vfsmount_test_mask &= vfsmount_mark->mask;
vfsmount_test_mask &= ~vfsmount_mark->ignored_mask;
@@ -183,9 +185,12 @@ static int send_to_group(struct inode *to_tell, struct vfsmount *mnt,
vfsmount_test_mask &= ~inode_mark->ignored_mask;
}

- pr_debug("%s: group=%p to_tell=%p mnt=%p mark=%p mask=%x data=%p"
- " data_is=%d cookie=%d event=%p\n", __func__, group, to_tell,
- mnt, inode_mark, mask, data, data_is, cookie, *event);
+ pr_debug("%s: group=%p to_tell=%p mnt=%p mask=%x inode_mark=%p"
+ " inode_test_mask=%x vfsmount_mark=%p vfsmount_test_mask=%x"
+ " data=%p data_is=%d cookie=%d event=%p\n",
+ __func__, group, to_tell, mnt, mask, inode_mark,
+ inode_test_mask, vfsmount_mark, vfsmount_test_mask, data,
+ data_is, cookie, *event);

if (!inode_test_mask && !vfsmount_test_mask)
return 0;
@@ -214,7 +219,7 @@ static int send_to_group(struct inode *to_tell, struct vfsmount *mnt,
int fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is,
const unsigned char *file_name, u32 cookie)
{
- struct hlist_node *inode_node, *vfsmount_node;
+ struct hlist_node *inode_node = NULL, *vfsmount_node = NULL;
struct fsnotify_mark *inode_mark = NULL, *vfsmount_mark = NULL;
struct fsnotify_group *inode_group, *vfsmount_group;
struct fsnotify_event *event = NULL;
@@ -245,19 +250,13 @@ int fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is,
(test_mask & to_tell->i_fsnotify_mask))
inode_node = srcu_dereference(to_tell->i_fsnotify_marks.first,
&fsnotify_mark_srcu);
- else
- inode_node = NULL;

- if (mnt) {
- if ((mask & FS_MODIFY) ||
- (test_mask & mnt->mnt_fsnotify_mask))
- vfsmount_node = srcu_dereference(mnt->mnt_fsnotify_marks.first,
- &fsnotify_mark_srcu);
- else
- vfsmount_node = NULL;
- } else {
- mnt = NULL;
- vfsmount_node = NULL;
+ if (mnt && ((mask & FS_MODIFY) ||
+ (test_mask & mnt->mnt_fsnotify_mask))) {
+ vfsmount_node = srcu_dereference(mnt->mnt_fsnotify_marks.first,
+ &fsnotify_mark_srcu);
+ inode_node = srcu_dereference(to_tell->i_fsnotify_marks.first,
+ &fsnotify_mark_srcu);
}

while (inode_node || vfsmount_node) {

2010-08-18 16:30:23

by Eric Paris

[permalink] [raw]
Subject: [PATCH 5/5] fanotify: flush outstanding perm requests on group destroy

fanotify should flush (and allow) all outstanding permission requests when
the group is being torn down. The most logical place for this flushing
was the fsnotify free_group_priv hook but that hook can't work. When fanotify
is waiting on a permission response from userspace it is holding the
fsnotify mark srcu lock. Group tear down to get to the free_group_priv hook
requires syncronizing the srcu lock.

The solution entered here is to add an atomic which is set on fanotify_release
which will prevent any further permissions actions from being taken. We then
flush all outstanding permission events, which will cause the original side to
release the srcu lock. The group destruction code then proceeds to sync the
srcu lock and finish cleaning up normally.

Signed-off-by: Eric Paris <[email protected]>
---

fs/notify/fanotify/fanotify.c | 3 +++
fs/notify/fanotify/fanotify_user.c | 21 +++++++++++++++++++++
include/linux/fanotify.h | 7 -------
include/linux/fsnotify_backend.h | 1 +
4 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 756566f..fe7845e 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -92,6 +92,9 @@ static int fanotify_get_response_from_access(struct fsnotify_group *group,

pr_debug("%s: group=%p event=%p\n", __func__, group, event);

+ if (unlikely(atomic_read(&group->fanotify_data.bypass_perm)))
+ return 0;
+
wait_event(group->fanotify_data.access_waitq, event->response);

/* userspace responded, convert to something usable */
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 032b837..425ec89 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -187,6 +187,9 @@ static int prepare_for_access_response(struct fsnotify_group *group,
if (!(event->mask & FAN_ALL_PERM_EVENTS))
return 0;

+ if (unlikely(atomic_read(&group->fanotify_data.bypass_perm)))
+ return 0;
+
re = kmem_cache_alloc(fanotify_response_event_cache, GFP_KERNEL);
if (!re)
return -ENOMEM;
@@ -364,9 +367,27 @@ static ssize_t fanotify_write(struct file *file, const char __user *buf, size_t
static int fanotify_release(struct inode *ignored, struct file *file)
{
struct fsnotify_group *group = file->private_data;
+ struct fanotify_response_event *re, *lre;

pr_debug("%s: file=%p group=%p\n", __func__, file, group);

+#ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
+ atomic_inc(&group->fanotify_data.bypass_perm);
+
+ mutex_lock(&group->fanotify_data.access_mutex);
+ list_for_each_entry_safe(re, lre, &group->fanotify_data.access_list, list) {
+ pr_debug("%s: found group=%p re=%p event=%p\n", __func__, group,
+ re, re->event);
+
+ list_del_init(&re->list);
+ re->event->response = FAN_ALLOW;
+
+ kmem_cache_free(fanotify_response_event_cache, re);
+ }
+ mutex_unlock(&group->fanotify_data.access_mutex);
+
+ wake_up(&group->fanotify_data.access_waitq);
+#endif
/* matches the fanotify_init->fsnotify_alloc_group */
fsnotify_put_group(group);

diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
index f0949a5..9854356 100644
--- a/include/linux/fanotify.h
+++ b/include/linux/fanotify.h
@@ -95,11 +95,4 @@ struct fanotify_response {
(long)(meta)->event_len >= (long)FAN_EVENT_METADATA_LEN && \
(long)(meta)->event_len <= (long)(len))

-#ifdef __KERNEL__
-
-struct fanotify_wait {
- struct fsnotify_event *event;
- __s32 fd;
-};
-#endif /* __KERNEL__ */
#endif /* _LINUX_FANOTIFY_H */
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index ed36fb5..3d5b07c 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -156,6 +156,7 @@ struct fsnotify_group {
struct mutex access_mutex;
struct list_head access_list;
wait_queue_head_t access_waitq;
+ atomic_t bypass_perm;
#endif /* CONFIG_FANOTIFY_ACCESS_PERMISSIONS */
int f_flags;
} fanotify_data;

2010-08-19 14:07:46

by Eric Paris

[permalink] [raw]
Subject: Re: [PATCH 5/5] fanotify: flush outstanding perm requests on group destroy

I'm pretty sure this patch is still racy, just a much smaller race.
I'm going to rework this today.

-Eric

On Wed, Aug 18, 2010 at 12:30 PM, Eric Paris <[email protected]> wrote:
> fanotify should flush (and allow) all outstanding permission requests when
> the group is being torn down. ?The most logical place for this flushing
> was the fsnotify free_group_priv hook but that hook can't work. ?When fanotify
> is waiting on a permission response from userspace it is holding the
> fsnotify mark srcu lock. ?Group tear down to get to the free_group_priv hook
> requires syncronizing the srcu lock.
>
> The solution entered here is to add an atomic which is set on fanotify_release
> which will prevent any further permissions actions from being taken. ?We then
> flush all outstanding permission events, which will cause the original side to
> release the srcu lock. ?The group destruction code then proceeds to sync the
> srcu lock and finish cleaning up normally.
>
> Signed-off-by: Eric Paris <[email protected]>
> ---
>
> ?fs/notify/fanotify/fanotify.c ? ? ?| ? ?3 +++
> ?fs/notify/fanotify/fanotify_user.c | ? 21 +++++++++++++++++++++
> ?include/linux/fanotify.h ? ? ? ? ? | ? ?7 -------
> ?include/linux/fsnotify_backend.h ? | ? ?1 +
> ?4 files changed, 25 insertions(+), 7 deletions(-)
>
> diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> index 756566f..fe7845e 100644
> --- a/fs/notify/fanotify/fanotify.c
> +++ b/fs/notify/fanotify/fanotify.c
> @@ -92,6 +92,9 @@ static int fanotify_get_response_from_access(struct fsnotify_group *group,
>
> ? ? ? ?pr_debug("%s: group=%p event=%p\n", __func__, group, event);
>
> + ? ? ? if (unlikely(atomic_read(&group->fanotify_data.bypass_perm)))
> + ? ? ? ? ? ? ? return 0;
> +
> ? ? ? ?wait_event(group->fanotify_data.access_waitq, event->response);
>
> ? ? ? ?/* userspace responded, convert to something usable */
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index 032b837..425ec89 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -187,6 +187,9 @@ static int prepare_for_access_response(struct fsnotify_group *group,
> ? ? ? ?if (!(event->mask & FAN_ALL_PERM_EVENTS))
> ? ? ? ? ? ? ? ?return 0;
>
> + ? ? ? if (unlikely(atomic_read(&group->fanotify_data.bypass_perm)))
> + ? ? ? ? ? ? ? return 0;
> +
> ? ? ? ?re = kmem_cache_alloc(fanotify_response_event_cache, GFP_KERNEL);
> ? ? ? ?if (!re)
> ? ? ? ? ? ? ? ?return -ENOMEM;
> @@ -364,9 +367,27 @@ static ssize_t fanotify_write(struct file *file, const char __user *buf, size_t
> ?static int fanotify_release(struct inode *ignored, struct file *file)
> ?{
> ? ? ? ?struct fsnotify_group *group = file->private_data;
> + ? ? ? struct fanotify_response_event *re, *lre;
>
> ? ? ? ?pr_debug("%s: file=%p group=%p\n", __func__, file, group);
>
> +#ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
> + ? ? ? atomic_inc(&group->fanotify_data.bypass_perm);
> +
> + ? ? ? mutex_lock(&group->fanotify_data.access_mutex);
> + ? ? ? list_for_each_entry_safe(re, lre, &group->fanotify_data.access_list, list) {
> + ? ? ? ? ? ? ? pr_debug("%s: found group=%p re=%p event=%p\n", __func__, group,
> + ? ? ? ? ? ? ? ? ? ? ? ?re, re->event);
> +
> + ? ? ? ? ? ? ? list_del_init(&re->list);
> + ? ? ? ? ? ? ? re->event->response = FAN_ALLOW;
> +
> + ? ? ? ? ? ? ? kmem_cache_free(fanotify_response_event_cache, re);
> + ? ? ? }
> + ? ? ? mutex_unlock(&group->fanotify_data.access_mutex);
> +
> + ? ? ? wake_up(&group->fanotify_data.access_waitq);
> +#endif
> ? ? ? ?/* matches the fanotify_init->fsnotify_alloc_group */
> ? ? ? ?fsnotify_put_group(group);
>
> diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
> index f0949a5..9854356 100644
> --- a/include/linux/fanotify.h
> +++ b/include/linux/fanotify.h
> @@ -95,11 +95,4 @@ struct fanotify_response {
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?(long)(meta)->event_len >= (long)FAN_EVENT_METADATA_LEN && \
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?(long)(meta)->event_len <= (long)(len))
>
> -#ifdef __KERNEL__
> -
> -struct fanotify_wait {
> - ? ? ? struct fsnotify_event *event;
> - ? ? ? __s32 fd;
> -};
> -#endif /* __KERNEL__ */
> ?#endif /* _LINUX_FANOTIFY_H */
> diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
> index ed36fb5..3d5b07c 100644
> --- a/include/linux/fsnotify_backend.h
> +++ b/include/linux/fsnotify_backend.h
> @@ -156,6 +156,7 @@ struct fsnotify_group {
> ? ? ? ? ? ? ? ? ? ? ? ?struct mutex access_mutex;
> ? ? ? ? ? ? ? ? ? ? ? ?struct list_head access_list;
> ? ? ? ? ? ? ? ? ? ? ? ?wait_queue_head_t access_waitq;
> + ? ? ? ? ? ? ? ? ? ? ? atomic_t bypass_perm;
> ?#endif /* CONFIG_FANOTIFY_ACCESS_PERMISSIONS */
> ? ? ? ? ? ? ? ? ? ? ? ?int f_flags;
> ? ? ? ? ? ? ? ?} fanotify_data;
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at ?http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at ?http://www.tux.org/lkml/
>