2008-11-25 19:42:49

by Evgeniy Polyakov

[permalink] [raw]
Subject: [take2] Inotify: nested attributes support.

Hi.

Attached patch (also attached test application) implementes
scalable nested attributes which are transferred on behalf the inotify
mechanism. It uses inotify_init1() to show that new event format should
be returned when reading from inotify file descriptor.

Attributes are attached to given inotify instance via TIOCSETD ioctl,
where ID of the attribute is transferred. If entry with given ID exists,
it is removed and -EEXIST is returned.

There is a set of callbacks indexed by ID (currently 4 attributes are
supported: pid, tid, io details (start/size of the written block) and
name), which are attached to given inotify device.
When new event is created, each callback is executed and may queue some
data into event structure, based on its internal details. When event is
read from the userspace, first its header is transferred (old
inotify_event structure) and then all attributes data in the order of
its registration via above ioctl. Nested attributes have following
structure:

struct inotify_attribute
{
unsigned int id;
unsigned int size;
unsigned char data[0];
};

where size is nuber of attached bytes for given attribute.
inotify_event.len contains number of bytes used to store all attached
attributes and data.

Attribute callback can check if mask contains its bits and do some steps
based on that information, for example io details attribute does not add
own data (and header) if event mask does not contain IN_MODIFY bit.

It is possible to infinitely extend number of attributes processed, so
we will be no longer limited by inotify API and ABI.

Test application usage:
$ gcc ./iotest.c -W -Wall -I/path/to/kernel/include/ -o iotest
$ ./iotest -f /tmp/ -t -n -p -i
2008-11-25 22:29:47.8477 pid: 1850, tid: 1850, name: /tmp/, wd: 1, mask: 303, attributes: pid: 1, tid: 1, io: 1, name: 1.
event: 2, wd: 1, cookie: 0, len: 61.
tid: 1672.
pid: 1672.
io details: start: 0, size: 0.
name: test.
event: 2, wd: 1, cookie: 0, len: 61.
tid: 1672.
pid: 1672.
io details: start: 0, size: 6.
name: test.

Example with only tid and io details:
$ $ ./iotest -f /tmp/ -t -i
2008-11-25 22:40:30.201286 pid: 1928, tid: 1928, name: /tmp/, wd: 1,
mask: 303, attributes: pid: 0, tid: 1, io: 1, name: 0.
event: 2, wd: 1, cookie: 0, len: 36.
tid: 1672.
io details: start: 0, size: 0.


while in parallel I did:
$ echo qwe11 > /tmp/test

Two events were sent because of two calls for fsnotify_modify(), invoked
lkely from attribute change and write. Events are not combined since
because attributes may be different. Not having a name always combines
(read: skips next event) events because of old logic.

Signed-off-by: Evgeniy Polyakov <[email protected]>

fs/compat.c | 2 +-
fs/inotify.c | 23 ++--
fs/inotify_user.c | 300 ++++++++++++++++++++++++++++++++++++++++------
fs/nfsd/vfs.c | 2 +-
fs/read_write.c | 4 +-
include/linux/fsnotify.h | 105 ++++++++++-------
include/linux/inotify.h | 50 +++++++-
7 files changed, 382 insertions(+), 104 deletions(-)

diff --git a/fs/compat.c b/fs/compat.c
index e5f49f5..43f4d73 100644
--- a/fs/compat.c
+++ b/fs/compat.c
@@ -1162,7 +1162,7 @@ out:
if (type == READ)
fsnotify_access(dentry);
else
- fsnotify_modify(dentry);
+ fsnotify_modify(dentry, *pos - ret, ret);
}
return ret;
}
diff --git a/fs/inotify.c b/fs/inotify.c
index 7bbed1b..b56cefc 100644
--- a/fs/inotify.c
+++ b/fs/inotify.c
@@ -24,6 +24,7 @@
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/spinlock.h>
+#include <linux/fsnotify.h>
#include <linux/idr.h>
#include <linux/slab.h>
#include <linux/fs.h>
@@ -251,8 +252,9 @@ static void remove_watch_no_event(struct inotify_watch *watch,
void inotify_remove_watch_locked(struct inotify_handle *ih,
struct inotify_watch *watch)
{
+ INOTIFY_DETAILS(det, IN_IGNORED, 0, NULL);
remove_watch_no_event(watch, ih);
- ih->in_ops->handle_event(watch, watch->wd, IN_IGNORED, 0, NULL, NULL);
+ ih->in_ops->handle_event(watch, watch->wd, &det, NULL);
}
EXPORT_SYMBOL_GPL(inotify_remove_watch_locked);

@@ -297,8 +299,8 @@ void inotify_d_move(struct dentry *entry)
* @name: filename, if any
* @n_inode: inode associated with name
*/
-void inotify_inode_queue_event(struct inode *inode, u32 mask, u32 cookie,
- const char *name, struct inode *n_inode)
+void inotify_inode_queue_event(struct inode *inode, struct inotify_details *det,
+ struct inode *n_inode)
{
struct inotify_watch *watch, *next;

@@ -308,13 +310,12 @@ void inotify_inode_queue_event(struct inode *inode, u32 mask, u32 cookie,
mutex_lock(&inode->inotify_mutex);
list_for_each_entry_safe(watch, next, &inode->inotify_watches, i_list) {
u32 watch_mask = watch->mask;
- if (watch_mask & mask) {
+ if (watch_mask & det->mask) {
struct inotify_handle *ih= watch->ih;
mutex_lock(&ih->mutex);
if (watch_mask & IN_ONESHOT)
remove_watch_no_event(watch, ih);
- ih->in_ops->handle_event(watch, watch->wd, mask, cookie,
- name, n_inode);
+ ih->in_ops->handle_event(watch, watch->wd, det, n_inode);
mutex_unlock(&ih->mutex);
}
}
@@ -329,8 +330,7 @@ EXPORT_SYMBOL_GPL(inotify_inode_queue_event);
* @cookie: cookie for synchronization, or zero
* @name: filename, if any
*/
-void inotify_dentry_parent_queue_event(struct dentry *dentry, u32 mask,
- u32 cookie, const char *name)
+void inotify_dentry_parent_queue_event(struct dentry *dentry, struct inotify_details *det)
{
struct dentry *parent;
struct inode *inode;
@@ -345,8 +345,7 @@ void inotify_dentry_parent_queue_event(struct dentry *dentry, u32 mask,
if (inotify_inode_watched(inode)) {
dget(parent);
spin_unlock(&dentry->d_lock);
- inotify_inode_queue_event(inode, mask, cookie, name,
- dentry->d_inode);
+ inotify_inode_queue_event(inode, det, dentry->d_inode);
dput(parent);
} else
spin_unlock(&dentry->d_lock);
@@ -428,9 +427,9 @@ void inotify_unmount_inodes(struct list_head *list)
watches = &inode->inotify_watches;
list_for_each_entry_safe(watch, next_w, watches, i_list) {
struct inotify_handle *ih= watch->ih;
+ INOTIFY_DETAILS(det, IN_UNMOUNT, 0, NULL);
mutex_lock(&ih->mutex);
- ih->in_ops->handle_event(watch, watch->wd, IN_UNMOUNT, 0,
- NULL, NULL);
+ ih->in_ops->handle_event(watch, watch->wd, &det, NULL);
inotify_remove_watch_locked(ih, watch);
mutex_unlock(&ih->mutex);
}
diff --git a/fs/inotify_user.c b/fs/inotify_user.c
index d367e9b..4d03a19 100644
--- a/fs/inotify_user.c
+++ b/fs/inotify_user.c
@@ -83,6 +83,9 @@ struct inotify_device {
unsigned int queue_size; /* size of the queue (bytes) */
unsigned int event_count; /* number of pending events */
unsigned int max_events; /* maximum number of events */
+ unsigned int flags; /* inotify_init1() flags */
+ struct mutex callback_lock; /* attribute callback lock */
+ struct list_head callback_list; /* attribute callback list */
};

/*
@@ -96,6 +99,7 @@ struct inotify_device {
struct inotify_kernel_event {
struct inotify_event event; /* the user-space event */
struct list_head list; /* entry in inotify_device's list */
+ struct list_head attribute_list; /* List of the attributes */
char *name; /* filename, if any */
};

@@ -108,6 +112,25 @@ struct inotify_user_watch {
struct inotify_watch wdata; /* inotify watch data */
};

+struct inotify_attribute_callback {
+ /* entry in the inotify attribute list */
+ struct list_head callback_entry;
+
+ /* attribute id used to setup callback from userspace */
+ unsigned int id;
+
+ /* callback which is used to attach attribute to given event */
+ int (* callback)(struct inotify_device *dev,
+ struct inotify_kernel_event *ev,
+ struct inotify_details *det);
+};
+
+struct inotify_kernel_attribute
+{
+ struct list_head attribute_entry;
+ struct inotify_attribute attr;
+};
+
#ifdef CONFIG_SYSCTL

#include <linux/sysctl.h>
@@ -179,13 +202,47 @@ static void free_inotify_user_watch(struct inotify_watch *w)
kmem_cache_free(watch_cachep, watch);
}

+static int inotify_kernel_fill_attributes(struct inotify_device *dev,
+ struct inotify_kernel_event *ev, struct inotify_details *det)
+{
+ struct inotify_attribute_callback *attr;
+ int err = 0;
+
+ mutex_lock(&dev->callback_lock);
+ list_for_each_entry(attr, &dev->callback_list, callback_entry) {
+ err = attr->callback(dev, ev, det);
+ if (err < 0)
+ break;
+ ev->event.len += err;
+ err = 0;
+ }
+ mutex_unlock(&dev->callback_lock);
+
+ return err;
+}
+
+/*
+ * free_kevent - frees the given kevent.
+ */
+static void free_kevent(struct inotify_kernel_event *kevent)
+{
+ struct inotify_kernel_attribute *attr, *tmp;
+
+ list_for_each_entry_safe(attr, tmp, &kevent->attribute_list, attribute_entry) {
+ list_del(&attr->attribute_entry);
+ kfree(attr);
+ }
+ kfree(kevent->name);
+ kmem_cache_free(event_cachep, kevent);
+}
+
/*
* kernel_event - create a new kernel event with the given parameters
*
* This function can sleep.
*/
-static struct inotify_kernel_event * kernel_event(s32 wd, u32 mask, u32 cookie,
- const char *name)
+static struct inotify_kernel_event * kernel_event(struct inotify_device *dev,
+ s32 wd, struct inotify_details *det)
{
struct inotify_kernel_event *kevent;

@@ -197,12 +254,23 @@ static struct inotify_kernel_event * kernel_event(s32 wd, u32 mask, u32 cookie,
memset(&kevent->event, 0, sizeof(struct inotify_event));

kevent->event.wd = wd;
- kevent->event.mask = mask;
- kevent->event.cookie = cookie;
+ kevent->event.mask = det->mask;
+ kevent->event.cookie = det->cookie;
+ kevent->event.len = 0;
+ kevent->name = NULL;
+
+ INIT_LIST_HEAD(&kevent->attribute_list);
+
+ if (dev->flags & IN_ATTRS) {
+ if (inotify_kernel_fill_attributes(dev, kevent, det)) {
+ free_kevent(kevent);
+ return NULL;
+ }

- INIT_LIST_HEAD(&kevent->list);
+ return kevent;
+ }

- if (name) {
+ if (det->name) {
size_t len, rem, event_size = sizeof(struct inotify_event);

/*
@@ -212,7 +280,7 @@ static struct inotify_kernel_event * kernel_event(s32 wd, u32 mask, u32 cookie,
* up to the next multiple of the structure's sizeof. This is
* simple and safe for all architectures.
*/
- len = strlen(name) + 1;
+ len = strlen(det->name) + 1;
rem = event_size - len;
if (len > event_size) {
rem = event_size - (len % event_size);
@@ -225,13 +293,10 @@ static struct inotify_kernel_event * kernel_event(s32 wd, u32 mask, u32 cookie,
kmem_cache_free(event_cachep, kevent);
return NULL;
}
- memcpy(kevent->name, name, len);
+ memcpy(kevent->name, det->name, len);
if (rem)
memset(kevent->name + len, 0, rem);
kevent->event.len = len + rem;
- } else {
- kevent->event.len = 0;
- kevent->name = NULL;
}

return kevent;
@@ -267,9 +332,8 @@ inotify_dev_get_last_event(struct inotify_device *dev)
*
* Can sleep (calls kernel_event()).
*/
-static void inotify_dev_queue_event(struct inotify_watch *w, u32 wd, u32 mask,
- u32 cookie, const char *name,
- struct inode *ignored)
+static void inotify_dev_queue_event(struct inotify_watch *w, u32 wd,
+ struct inotify_details *det, struct inode *ignored)
{
struct inotify_user_watch *watch;
struct inotify_device *dev;
@@ -283,18 +347,19 @@ static void inotify_dev_queue_event(struct inotify_watch *w, u32 wd, u32 mask,
/* we can safely put the watch as we don't reference it while
* generating the event
*/
- if (mask & IN_IGNORED || w->mask & IN_ONESHOT)
+ if (det->mask & IN_IGNORED || w->mask & IN_ONESHOT)
put_inotify_watch(w); /* final put */

/* coalescing: drop this event if it is a dupe of the previous */
last = inotify_dev_get_last_event(dev);
- if (last && last->event.mask == mask && last->event.wd == wd &&
- last->event.cookie == cookie) {
+ if (last && last->event.mask == det->mask &&
+ last->event.wd == wd &&
+ last->event.cookie == det->cookie) {
const char *lastname = last->name;

- if (!name && !lastname)
+ if (!det->name && !lastname)
goto out;
- if (name && lastname && !strcmp(lastname, name))
+ if (det->name && lastname && !strcmp(lastname, det->name))
goto out;
}

@@ -303,10 +368,12 @@ static void inotify_dev_queue_event(struct inotify_watch *w, u32 wd, u32 mask,
goto out;

/* if the queue overflows, we need to notify user space */
- if (unlikely(dev->event_count == dev->max_events))
- kevent = kernel_event(-1, IN_Q_OVERFLOW, cookie, NULL);
- else
- kevent = kernel_event(wd, mask, cookie, name);
+ if (unlikely(dev->event_count == dev->max_events)) {
+ det->mask = IN_Q_OVERFLOW;
+ det->name = NULL;
+ kevent = kernel_event(dev, -1, det);
+ } else
+ kevent = kernel_event(dev, wd, det);

if (unlikely(!kevent))
goto out;
@@ -337,15 +404,6 @@ static void remove_kevent(struct inotify_device *dev,
}

/*
- * free_kevent - frees the given kevent.
- */
-static void free_kevent(struct inotify_kernel_event *kevent)
-{
- kfree(kevent->name);
- kmem_cache_free(event_cachep, kevent);
-}
-
-/*
* inotify_dev_event_dequeue - destroy an event on the given device
*
* Caller must hold dev->ev_mutex.
@@ -427,6 +485,30 @@ static unsigned int inotify_poll(struct file *file, poll_table *wait)
return ret;
}

+static int inotify_copy_user_attributes(struct inotify_kernel_event *ev,
+ void __user *buf)
+{
+ struct inotify_kernel_attribute *attr, *tmp;
+ int err = 0;
+
+ list_for_each_entry_safe(attr, tmp, &ev->attribute_list,
+ attribute_entry) {
+ unsigned int size = attr->attr.size +
+ sizeof(struct inotify_attribute);
+
+ err = copy_to_user(buf, &attr->attr, size);
+ if (err)
+ return -EFAULT;
+
+ buf += size;
+
+ list_del(&attr->attribute_entry);
+ kfree(attr);
+ }
+
+ return 0;
+}
+
static ssize_t inotify_read(struct file *file, char __user *buf,
size_t count, loff_t *pos)
{
@@ -500,14 +582,18 @@ static ssize_t inotify_read(struct file *file, char __user *buf,
buf += event_size;
count -= event_size;

- if (kevent->name) {
+ if (dev->flags & IN_ATTRS) {
+ ret = inotify_copy_user_attributes(kevent, buf);
+ if (ret < 0)
+ break;
+ } else if (kevent->name) {
if (copy_to_user(buf, kevent->name, kevent->event.len)){
ret = -EFAULT;
break;
}
- buf += kevent->event.len;
- count -= kevent->event.len;
}
+ buf += kevent->event.len;
+ count -= kevent->event.len;

free_kevent(kevent);

@@ -525,6 +611,19 @@ static int inotify_fasync(int fd, struct file *file, int on)
return fasync_helper(fd, file, on, &dev->fa) >= 0 ? 0 : -EIO;
}

+static void inotify_drop_attribute_list(struct inotify_device *dev)
+{
+ struct inotify_attribute_callback *attr, *tmp;
+
+ mutex_lock(&dev->callback_lock);
+ list_for_each_entry_safe(attr, tmp, &dev->callback_list,
+ callback_entry) {
+ list_del(&attr->callback_entry);
+ kfree(attr);
+ }
+ mutex_unlock(&dev->callback_lock);
+}
+
static int inotify_release(struct inode *ignored, struct file *file)
{
struct inotify_device *dev = file->private_data;
@@ -537,23 +636,145 @@ static int inotify_release(struct inode *ignored, struct file *file)
inotify_dev_event_dequeue(dev);
mutex_unlock(&dev->ev_mutex);

+ inotify_drop_attribute_list(dev);
+
/* free this device: the put matching the get in inotify_init() */
put_inotify_dev(dev);

return 0;
}

+static int inotify_create_attr(struct inotify_kernel_event *ev,
+ unsigned int id, void *data, unsigned int size)
+{
+ struct inotify_kernel_attribute *attr;
+
+ attr = kmalloc(sizeof(struct inotify_kernel_attribute) + size,
+ GFP_NOFS);
+ if (!attr)
+ return -ENOMEM;
+
+ attr->attr.id = id;
+ attr->attr.size = size;
+ memcpy(attr->attr.data, data, size);
+
+ list_add_tail(&attr->attribute_entry, &ev->attribute_list);
+ return size + sizeof(struct inotify_attribute);
+}
+
+static int inotify_add_pid_attr(struct inotify_device *dev,
+ struct inotify_kernel_event *ev,
+ struct inotify_details *det)
+{
+ pid_t pid = task_tgid_vnr(current);
+
+ return inotify_create_attr(ev, INOTIFY_ATTR_PID, &pid, sizeof(pid_t));
+}
+
+static int inotify_add_tid_attr(struct inotify_device *dev,
+ struct inotify_kernel_event *ev,
+ struct inotify_details *det)
+{
+ pid_t pid = task_pid_vnr(current);
+
+ return inotify_create_attr(ev, INOTIFY_ATTR_TID, &pid, sizeof(pid_t));
+}
+
+static int inotify_add_io_details_attr(struct inotify_device *dev,
+ struct inotify_kernel_event *ev,
+ struct inotify_details *det)
+{
+ struct inotify_io_details io;
+
+ if (!(det->mask & IN_MODIFY))
+ return 0;
+
+ io.start = det->start;
+ io.size = det->size;
+
+ return inotify_create_attr(ev, INOTIFY_ATTR_IO,
+ &io, sizeof(struct inotify_io_details));
+}
+
+static int inotify_add_name_attr(struct inotify_device *dev,
+ struct inotify_kernel_event *ev,
+ struct inotify_details *det)
+{
+ return inotify_create_attr(ev, INOTIFY_ATTR_NAME,
+ (void *)det->name, strlen(det->name) + 1);
+}
+
+static struct inotify_attribute_callback inotify_supported_attrs[] = {
+ { .id = INOTIFY_ATTR_PID, .callback = inotify_add_pid_attr, },
+ { .id = INOTIFY_ATTR_TID, .callback = inotify_add_tid_attr, },
+ { .id = INOTIFY_ATTR_IO, .callback = inotify_add_io_details_attr, },
+ { .id = INOTIFY_ATTR_NAME, .callback = inotify_add_name_attr, },
+};
+
+static int inotify_attach_attribute_callback(struct inotify_device *dev,
+ unsigned int a)
+{
+ struct inotify_attribute_callback *attr, *new;
+ unsigned int i;
+ int err = -ENOENT;
+
+ for (i=0; i<ARRAY_SIZE(inotify_supported_attrs); ++i) {
+ attr = &inotify_supported_attrs[i];
+
+ if (attr->id != a)
+ continue;
+
+ mutex_lock(&dev->callback_lock);
+
+ err = 0;
+ list_for_each_entry(attr, &dev->callback_list,
+ callback_entry) {
+ if (attr->id == a) {
+ list_del(&attr->callback_entry);
+ kfree(attr);
+ err = -EEXIST;
+ break;
+ }
+ }
+
+ if (!err) {
+ err = -ENOMEM;
+ new = kmalloc(sizeof(struct inotify_attribute_callback),
+ GFP_KERNEL);
+ if (new) {
+ new->callback = inotify_supported_attrs[i].callback;
+ new->id = a;
+
+ list_add_tail(&new->callback_entry,
+ &dev->callback_list);
+ err = 0;
+ }
+ }
+ mutex_unlock(&dev->callback_lock);
+
+ break;
+ }
+
+ return err;
+}
+
static long inotify_ioctl(struct file *file, unsigned int cmd,
unsigned long arg)
{
struct inotify_device *dev;
void __user *p;
int ret = -ENOTTY;
+ unsigned int attr;

dev = file->private_data;
p = (void __user *) arg;

switch (cmd) {
+ case TIOCSETD:
+ ret = get_user(attr, (unsigned int __user *) p);
+ if (!ret)
+ ret = inotify_attach_attribute_callback(dev, attr);
+ break;
case FIONREAD:
ret = put_user(dev->queue_size, (int __user *) p);
break;
@@ -583,14 +804,14 @@ asmlinkage long sys_inotify_init1(int flags)
struct user_struct *user;
struct file *filp;
int fd, ret;
+ unsigned int iflags = flags & ~(IN_CLOEXEC | IN_NONBLOCK);
+
+ flags &= IN_CLOEXEC | IN_NONBLOCK;

/* Check the IN_* constants for consistency. */
BUILD_BUG_ON(IN_CLOEXEC != O_CLOEXEC);
BUILD_BUG_ON(IN_NONBLOCK != O_NONBLOCK);

- if (flags & ~(IN_CLOEXEC | IN_NONBLOCK))
- return -EINVAL;
-
fd = get_unused_fd_flags(flags & O_CLOEXEC);
if (fd < 0)
return fd;
@@ -639,6 +860,9 @@ asmlinkage long sys_inotify_init1(int flags)
dev->max_events = inotify_max_queued_events;
dev->user = user;
atomic_set(&dev->count, 0);
+ dev->flags = iflags;
+ mutex_init(&dev->callback_lock);
+ INIT_LIST_HEAD(&dev->callback_list);

get_inotify_dev(dev);
atomic_inc(&user->inotify_devs);
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 4433c8f..d3e4adc 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1007,7 +1007,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file,
set_fs(oldfs);
if (host_err >= 0) {
nfsdstats.io_write += cnt;
- fsnotify_modify(file->f_path.dentry);
+ fsnotify_modify(file->f_path.dentry, offset-cnt, cnt);
}

/* clear setuid/setgid flag after write */
diff --git a/fs/read_write.c b/fs/read_write.c
index 969a6d9..5c521c2 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -335,7 +335,7 @@ ssize_t vfs_write(struct file *file, const char __user *buf, size_t count, loff_
else
ret = do_sync_write(file, buf, count, pos);
if (ret > 0) {
- fsnotify_modify(file->f_path.dentry);
+ fsnotify_modify(file->f_path.dentry, *pos - ret, ret);
add_wchar(current, ret);
}
inc_syscw(current);
@@ -628,7 +628,7 @@ out:
if (type == READ)
fsnotify_access(file->f_path.dentry);
else
- fsnotify_modify(file->f_path.dentry);
+ fsnotify_modify(file->f_path.dentry, *pos - ret, ret);
}
return ret;
}
diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index 00fbd5b..0c5171c 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -15,6 +15,9 @@
#include <linux/inotify.h>
#include <linux/audit.h>

+#define INOTIFY_DETAILS(det, __mask, __cookie, __name) \
+ struct inotify_details det = { .mask = __mask, .cookie = __cookie, .name = __name, }
+
/*
* fsnotify_d_instantiate - instantiate a dentry for inode
* Called with dcache_lock held.
@@ -42,7 +45,7 @@ static inline void fsnotify_move(struct inode *old_dir, struct inode *new_dir,
int isdir, struct inode *target, struct dentry *moved)
{
struct inode *source = moved->d_inode;
- u32 cookie = inotify_get_cookie();
+ INOTIFY_DETAILS(det, IN_MOVED_FROM|isdir, inotify_get_cookie(), old_name);

if (old_dir == new_dir)
inode_dir_notify(old_dir, DN_RENAME);
@@ -53,18 +56,22 @@ static inline void fsnotify_move(struct inode *old_dir, struct inode *new_dir,

if (isdir)
isdir = IN_ISDIR;
- inotify_inode_queue_event(old_dir, IN_MOVED_FROM|isdir,cookie,old_name,
- source);
- inotify_inode_queue_event(new_dir, IN_MOVED_TO|isdir, cookie, new_name,
- source);
+ inotify_inode_queue_event(old_dir, &det, source);
+ det.mask = IN_MOVED_TO|isdir;
+ det.name = (const unsigned char *)new_name;
+ inotify_inode_queue_event(new_dir, &det, source);

+ det.name = NULL;
+ det.cookie = 0;
if (target) {
- inotify_inode_queue_event(target, IN_DELETE_SELF, 0, NULL, NULL);
+ det.mask = IN_DELETE_SELF;
+ inotify_inode_queue_event(target, &det, NULL);
inotify_inode_is_dead(target);
}

if (source) {
- inotify_inode_queue_event(source, IN_MOVE_SELF, 0, NULL, NULL);
+ det.mask = IN_MOVE_SELF;
+ inotify_inode_queue_event(source, &det, NULL);
}
audit_inode_child(new_name, moved, new_dir);
}
@@ -74,10 +81,11 @@ static inline void fsnotify_move(struct inode *old_dir, struct inode *new_dir,
*/
static inline void fsnotify_nameremove(struct dentry *dentry, int isdir)
{
+ INOTIFY_DETAILS(det, IN_DELETE|isdir, 0, dentry->d_name.name);
if (isdir)
isdir = IN_ISDIR;
dnotify_parent(dentry, DN_DELETE);
- inotify_dentry_parent_queue_event(dentry, IN_DELETE|isdir, 0, dentry->d_name.name);
+ inotify_dentry_parent_queue_event(dentry, &det);
}

/*
@@ -85,7 +93,8 @@ static inline void fsnotify_nameremove(struct dentry *dentry, int isdir)
*/
static inline void fsnotify_inoderemove(struct inode *inode)
{
- inotify_inode_queue_event(inode, IN_DELETE_SELF, 0, NULL, NULL);
+ INOTIFY_DETAILS(det, IN_DELETE_SELF, 0, NULL);
+ inotify_inode_queue_event(inode, &det, NULL);
inotify_inode_is_dead(inode);
}

@@ -94,7 +103,8 @@ static inline void fsnotify_inoderemove(struct inode *inode)
*/
static inline void fsnotify_link_count(struct inode *inode)
{
- inotify_inode_queue_event(inode, IN_ATTRIB, 0, NULL, NULL);
+ INOTIFY_DETAILS(det, IN_ATTRIB, 0, NULL);
+ inotify_inode_queue_event(inode, &det, NULL);
}

/*
@@ -102,9 +112,9 @@ static inline void fsnotify_link_count(struct inode *inode)
*/
static inline void fsnotify_create(struct inode *inode, struct dentry *dentry)
{
+ INOTIFY_DETAILS(det, IN_CREATE, 0, dentry->d_name.name);
inode_dir_notify(inode, DN_CREATE);
- inotify_inode_queue_event(inode, IN_CREATE, 0, dentry->d_name.name,
- dentry->d_inode);
+ inotify_inode_queue_event(inode, &det, dentry->d_inode);
audit_inode_child(dentry->d_name.name, dentry, inode);
}

@@ -115,9 +125,9 @@ static inline void fsnotify_create(struct inode *inode, struct dentry *dentry)
*/
static inline void fsnotify_link(struct inode *dir, struct inode *inode, struct dentry *new_dentry)
{
+ INOTIFY_DETAILS(det, IN_CREATE, 0, new_dentry->d_name.name);
inode_dir_notify(dir, DN_CREATE);
- inotify_inode_queue_event(dir, IN_CREATE, 0, new_dentry->d_name.name,
- inode);
+ inotify_inode_queue_event(dir, &det, inode);
fsnotify_link_count(inode);
audit_inode_child(new_dentry->d_name.name, new_dentry, dir);
}
@@ -127,9 +137,9 @@ static inline void fsnotify_link(struct inode *dir, struct inode *inode, struct
*/
static inline void fsnotify_mkdir(struct inode *inode, struct dentry *dentry)
{
+ INOTIFY_DETAILS(det, IN_CREATE | IN_ISDIR, 0, dentry->d_name.name);
inode_dir_notify(inode, DN_CREATE);
- inotify_inode_queue_event(inode, IN_CREATE | IN_ISDIR, 0,
- dentry->d_name.name, dentry->d_inode);
+ inotify_inode_queue_event(inode, &det, dentry->d_inode);
audit_inode_child(dentry->d_name.name, dentry, inode);
}

@@ -139,30 +149,35 @@ static inline void fsnotify_mkdir(struct inode *inode, struct dentry *dentry)
static inline void fsnotify_access(struct dentry *dentry)
{
struct inode *inode = dentry->d_inode;
- u32 mask = IN_ACCESS;
+ INOTIFY_DETAILS(det, IN_ACCESS, 0, dentry->d_name.name);

if (S_ISDIR(inode->i_mode))
- mask |= IN_ISDIR;
+ det.mask |= IN_ISDIR;

dnotify_parent(dentry, DN_ACCESS);
- inotify_dentry_parent_queue_event(dentry, mask, 0, dentry->d_name.name);
- inotify_inode_queue_event(inode, mask, 0, NULL, NULL);
+ inotify_dentry_parent_queue_event(dentry, &det);
+ det.name = NULL;
+ inotify_inode_queue_event(inode, &det, NULL);
}

/*
* fsnotify_modify - file was modified
*/
-static inline void fsnotify_modify(struct dentry *dentry)
+static inline void fsnotify_modify(struct dentry *dentry, loff_t start, size_t size)
{
struct inode *inode = dentry->d_inode;
- u32 mask = IN_MODIFY;
+ INOTIFY_DETAILS(det, IN_MODIFY, 0, dentry->d_name.name);
+
+ det.start = start;
+ det.size = size;

if (S_ISDIR(inode->i_mode))
- mask |= IN_ISDIR;
+ det.mask |= IN_ISDIR;

dnotify_parent(dentry, DN_MODIFY);
- inotify_dentry_parent_queue_event(dentry, mask, 0, dentry->d_name.name);
- inotify_inode_queue_event(inode, mask, 0, NULL, NULL);
+ inotify_dentry_parent_queue_event(dentry, &det);
+ det.name = NULL;
+ inotify_inode_queue_event(inode, &det, NULL);
}

/*
@@ -171,13 +186,14 @@ static inline void fsnotify_modify(struct dentry *dentry)
static inline void fsnotify_open(struct dentry *dentry)
{
struct inode *inode = dentry->d_inode;
- u32 mask = IN_OPEN;
+ INOTIFY_DETAILS(det, IN_OPEN, 0, dentry->d_name.name);

if (S_ISDIR(inode->i_mode))
- mask |= IN_ISDIR;
+ det.mask |= IN_ISDIR;

- inotify_dentry_parent_queue_event(dentry, mask, 0, dentry->d_name.name);
- inotify_inode_queue_event(inode, mask, 0, NULL, NULL);
+ inotify_dentry_parent_queue_event(dentry, &det);
+ det.name = NULL;
+ inotify_inode_queue_event(inode, &det, NULL);
}

/*
@@ -187,15 +203,15 @@ static inline void fsnotify_close(struct file *file)
{
struct dentry *dentry = file->f_path.dentry;
struct inode *inode = dentry->d_inode;
- const char *name = dentry->d_name.name;
- fmode_t mode = file->f_mode;
- u32 mask = (mode & FMODE_WRITE) ? IN_CLOSE_WRITE : IN_CLOSE_NOWRITE;
+ INOTIFY_DETAILS(det, (file->f_mode & FMODE_WRITE) ? IN_CLOSE_WRITE : IN_CLOSE_NOWRITE,
+ 0, dentry->d_name.name);

if (S_ISDIR(inode->i_mode))
- mask |= IN_ISDIR;
+ det.mask |= IN_ISDIR;

- inotify_dentry_parent_queue_event(dentry, mask, 0, name);
- inotify_inode_queue_event(inode, mask, 0, NULL, NULL);
+ inotify_dentry_parent_queue_event(dentry, &det);
+ det.name = NULL;
+ inotify_inode_queue_event(inode, &det, NULL);
}

/*
@@ -204,13 +220,14 @@ static inline void fsnotify_close(struct file *file)
static inline void fsnotify_xattr(struct dentry *dentry)
{
struct inode *inode = dentry->d_inode;
- u32 mask = IN_ATTRIB;
+ INOTIFY_DETAILS(det, IN_ATTRIB, 0, dentry->d_name.name);

if (S_ISDIR(inode->i_mode))
- mask |= IN_ISDIR;
+ det.mask |= IN_ISDIR;

- inotify_dentry_parent_queue_event(dentry, mask, 0, dentry->d_name.name);
- inotify_inode_queue_event(inode, mask, 0, NULL, NULL);
+ inotify_dentry_parent_queue_event(dentry, &det);
+ det.name = NULL;
+ inotify_inode_queue_event(inode, &det, NULL);
}

/*
@@ -255,11 +272,13 @@ static inline void fsnotify_change(struct dentry *dentry, unsigned int ia_valid)
if (dn_mask)
dnotify_parent(dentry, dn_mask);
if (in_mask) {
+ INOTIFY_DETAILS(det, in_mask, 0, NULL);
if (S_ISDIR(inode->i_mode))
- in_mask |= IN_ISDIR;
- inotify_inode_queue_event(inode, in_mask, 0, NULL, NULL);
- inotify_dentry_parent_queue_event(dentry, in_mask, 0,
- dentry->d_name.name);
+ det.mask |= IN_ISDIR;
+ det.size = i_size_read(inode);
+ inotify_inode_queue_event(inode, &det, NULL);
+ det.name = dentry->d_name.name;
+ inotify_dentry_parent_queue_event(dentry, &det);
}
}

diff --git a/include/linux/inotify.h b/include/linux/inotify.h
index 37ea289..ce35d9a 100644
--- a/include/linux/inotify.h
+++ b/include/linux/inotify.h
@@ -68,6 +68,33 @@ struct inotify_event {
/* Flags for sys_inotify_init1. */
#define IN_CLOEXEC O_CLOEXEC
#define IN_NONBLOCK O_NONBLOCK
+#define IN_ATTRS 0x00000001
+
+struct inotify_attribute
+{
+ unsigned int id;
+ unsigned int size;
+ unsigned char data[0];
+};
+
+enum inotify_attribute_types {
+ /* Attach PID of the IO origin process to the event */
+ INOTIFY_ATTR_PID = 1,
+
+ /* Attach thread ID of the IO origin */
+ INOTIFY_ATTR_TID,
+
+ /* Attach offset and size of the reported IO block */
+ INOTIFY_ATTR_IO,
+
+ /* Attach name of the object to be accessed */
+ INOTIFY_ATTR_NAME,
+};
+
+struct inotify_io_details
+{
+ __u64 start, size;
+};

#ifdef __KERNEL__

@@ -95,9 +122,17 @@ struct inotify_watch {
__u32 mask; /* event mask for this watch */
};

+struct inotify_details
+{
+ u32 mask, cookie;
+ const unsigned char *name;
+ loff_t start;
+ size_t size;
+};
+
struct inotify_operations {
- void (*handle_event)(struct inotify_watch *, u32, u32, u32,
- const char *, struct inode *);
+ void (*handle_event)(struct inotify_watch *, u32,
+ struct inotify_details *, struct inode *);
void (*destroy_watch)(struct inotify_watch *);
};

@@ -107,10 +142,10 @@ struct inotify_operations {

extern void inotify_d_instantiate(struct dentry *, struct inode *);
extern void inotify_d_move(struct dentry *);
-extern void inotify_inode_queue_event(struct inode *, __u32, __u32,
- const char *, struct inode *);
-extern void inotify_dentry_parent_queue_event(struct dentry *, __u32, __u32,
- const char *);
+extern void inotify_inode_queue_event(struct inode *, struct inotify_details *,
+ struct inode *);
+extern void inotify_dentry_parent_queue_event(struct dentry *,
+ struct inotify_details *);
extern void inotify_unmount_inodes(struct list_head *);
extern void inotify_inode_is_dead(struct inode *);
extern u32 inotify_get_cookie(void);
@@ -157,7 +192,8 @@ static inline void inotify_inode_queue_event(struct inode *inode,

static inline void inotify_dentry_parent_queue_event(struct dentry *dentry,
__u32 mask, __u32 cookie,
- const char *filename)
+ const char *filename,
+ void *data)
{
}



--
Evgeniy Polyakov


2008-11-25 19:44:32

by Evgeniy Polyakov

[permalink] [raw]
Subject: [take2] Inotify: nested attributes test application.

$ ./iotest -h
Usage: ./iotest -f path -t <tid> -p <pid> -i <io details> -n <name> -h <help>

$ ./iotest -f /tmp/ -t -i
2008-11-25 22:40:30.201286 pid: 1928, tid: 1928, name: /tmp/, wd: 1, mask: 303, attributes: pid: 0, tid: 1, io: 1, name: 0.
event: 2, wd: 1, cookie: 0, len: 36.
tid: 1672.
io details: start: 0, size: 0.

$ echo qwe11 > /tmp/test

--
Evgeniy Polyakov


Attachments:
(No filename) (373.00 B)
iotest.c (4.58 kB)
Download all attachments

2008-11-26 00:26:31

by Andrew Morton

[permalink] [raw]
Subject: Re: [take2] Inotify: nested attributes support.

On Tue, 25 Nov 2008 22:42:35 +0300
Evgeniy Polyakov <[email protected]> wrote:

> Hi.
>
> Attached patch (also attached test application) implementes
> scalable nested attributes which are transferred on behalf the inotify
> mechanism. It uses inotify_init1() to show that new event format should
> be returned when reading from inotify file descriptor.
>
> Attributes are attached to given inotify instance via TIOCSETD ioctl,
> where ID of the attribute is transferred. If entry with given ID exists,
> it is removed and -EEXIST is returned.
>
> There is a set of callbacks indexed by ID (currently 4 attributes are
> supported: pid, tid, io details (start/size of the written block) and
> name), which are attached to given inotify device.
> When new event is created, each callback is executed and may queue some
> data into event structure, based on its internal details. When event is
> read from the userspace, first its header is transferred (old
> inotify_event structure) and then all attributes data in the order of
> its registration via above ioctl. Nested attributes have following
> structure:
>
> struct inotify_attribute
> {
> unsigned int id;
> unsigned int size;
> unsigned char data[0];
> };
>
> where size is nuber of attached bytes for given attribute.
> inotify_event.len contains number of bytes used to store all attached
> attributes and data.
>
> Attribute callback can check if mask contains its bits and do some steps
> based on that information, for example io details attribute does not add
> own data (and header) if event mask does not contain IN_MODIFY bit.
>
> It is possible to infinitely extend number of attributes processed, so
> we will be no longer limited by inotify API and ABI.
>
> Test application usage:
> $ gcc ./iotest.c -W -Wall -I/path/to/kernel/include/ -o iotest
> $ ./iotest -f /tmp/ -t -n -p -i
> 2008-11-25 22:29:47.8477 pid: 1850, tid: 1850, name: /tmp/, wd: 1, mask: 303, attributes: pid: 1, tid: 1, io: 1, name: 1.
> event: 2, wd: 1, cookie: 0, len: 61.
> tid: 1672.
> pid: 1672.
> io details: start: 0, size: 0.
> name: test.
> event: 2, wd: 1, cookie: 0, len: 61.
> tid: 1672.
> pid: 1672.
> io details: start: 0, size: 6.
> name: test.
>
> Example with only tid and io details:
> $ $ ./iotest -f /tmp/ -t -i
> 2008-11-25 22:40:30.201286 pid: 1928, tid: 1928, name: /tmp/, wd: 1,
> mask: 303, attributes: pid: 0, tid: 1, io: 1, name: 0.
> event: 2, wd: 1, cookie: 0, len: 36.
> tid: 1672.
> io details: start: 0, size: 0.
>
>
> while in parallel I did:
> $ echo qwe11 > /tmp/test
>
> Two events were sent because of two calls for fsnotify_modify(), invoked
> lkely from attribute change and write. Events are not combined since
> because attributes may be different. Not having a name always combines
> (read: skips next event) events because of old logic.
>

I guess I'm being more than usually thick, but I don't understand what
this is all about, why it was implemented, what value it provides to
users, etc, etc? Why do I want scalable nested attributes in inotify??

I'm buried in patches which I don't understand lately, and having
hundreds of people send patches at one guy who doesn't understand them
isn't a good system. Eric Paris is working on inotify-type things as
well. It would be neat if you guys were to understand and review each
other's work. Please.

2008-11-26 07:42:51

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [take2] Inotify: nested attributes support.

Hi Andrew.

On Tue, Nov 25, 2008 at 04:24:34PM -0800, Andrew Morton ([email protected]) wrote:
> I guess I'm being more than usually thick, but I don't understand what
> this is all about, why it was implemented, what value it provides to
> users, etc, etc? Why do I want scalable nested attributes in inotify??

Originally I just wanted to have a PID value in the inotify events, so
reused cookie for that, but people rather vocally rised against this. So
solution is to extend its structure. It would be possible just to add
couple more bytes and store data there, but if we will want to add some
more data into event later, we will have to implement inotify3 and so
on. So I implemented a way to put essentially any number of new and old
events in any order, turn then on and off, and do not care about
possible limitation of the structure. As example I added PID, TID,
write IO start/size and name attributes.

> I'm buried in patches which I don't understand lately, and having
> hundreds of people send patches at one guy who doesn't understand them
> isn't a good system. Eric Paris is working on inotify-type things as
> well. It would be neat if you guys were to understand and review each
> other's work. Please.

No problem, I will review patches if added to the copy. I'm not
subscsribed to linux-kernel@ so will miss them otherwise.

--
Evgeniy Polyakov

2008-11-26 08:16:28

by Andrew Morton

[permalink] [raw]
Subject: Re: [take2] Inotify: nested attributes support.

On Wed, 26 Nov 2008 10:42:39 +0300 Evgeniy Polyakov <[email protected]> wrote:

> Hi Andrew.
>
> On Tue, Nov 25, 2008 at 04:24:34PM -0800, Andrew Morton ([email protected]) wrote:
> > I guess I'm being more than usually thick, but I don't understand what
> > this is all about, why it was implemented, what value it provides to
> > users, etc, etc? Why do I want scalable nested attributes in inotify??
>
> Originally I just wanted to have a PID value in the inotify events, so
> reused cookie for that, but people rather vocally rised against this. So
> solution is to extend its structure. It would be possible just to add
> couple more bytes and store data there, but if we will want to add some
> more data into event later, we will have to implement inotify3 and so
> on. So I implemented a way to put essentially any number of new and old
> events in any order, turn then on and off, and do not care about
> possible limitation of the structure. As example I added PID, TID,
> write IO start/size and name attributes.

OK, so we have a super-duper framework which will allow us to add pids
(and other things) to inotify messages.

This still doesn't provide a reason for anyone to be interested in the
code! Why do we want pids in inotify messages?

And how does this work give that pids are (no longer) system-wide unique?

> > I'm buried in patches which I don't understand lately, and having
> > hundreds of people send patches at one guy who doesn't understand them
> > isn't a good system. Eric Paris is working on inotify-type things as
> > well. It would be neat if you guys were to understand and review each
> > other's work. Please.
>
> No problem, I will review patches if added to the copy. I'm not
> subscsribed to linux-kernel@ so will miss them otherwise.

Thanks.

2008-11-26 08:29:48

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [take2] Inotify: nested attributes support.

On Wed, Nov 26, 2008 at 12:15:38AM -0800, Andrew Morton ([email protected]) wrote:
> OK, so we have a super-duper framework which will allow us to add pids
> (and other things) to inotify messages.

Yup :)

> This still doesn't provide a reason for anyone to be interested in the
> code! Why do we want pids in inotify messages?

I actually cared only about myself :)
I started the thread and implementation, because my application has to
differentiate IO made by itself and any IO made by system (another
users, crons, whatever else), inotify did not give me that info, so I
extended it. As of others: PID/TID may be used by watching applications
to reduce own load to not process own IO, things like beagle may show
who actually made changes into the file.

> And how does this work give that pids are (no longer) system-wide unique?

It gets pids from the caller's task_struct (via current), so its data is
as unique as process calling getpid() or syscall(__NR_gettid).

--
Evgeniy Polyakov

2008-11-26 08:39:43

by Andrew Morton

[permalink] [raw]
Subject: Re: [take2] Inotify: nested attributes support.

On Wed, 26 Nov 2008 11:29:36 +0300 Evgeniy Polyakov <[email protected]> wrote:

> On Wed, Nov 26, 2008 at 12:15:38AM -0800, Andrew Morton ([email protected]) wrote:
> > OK, so we have a super-duper framework which will allow us to add pids
> > (and other things) to inotify messages.
>
> Yup :)
>
> > This still doesn't provide a reason for anyone to be interested in the
> > code! Why do we want pids in inotify messages?
>
> I actually cared only about myself :)
> I started the thread and implementation, because my application has to
> differentiate IO made by itself and any IO made by system (another
> users, crons, whatever else), inotify did not give me that info, so I
> extended it. As of others: PID/TID may be used by watching applications
> to reduce own load to not process own IO, things like beagle may show
> who actually made changes into the file.

hrm. Well this is the sort of information which reviewers want to know
all about before looking at an implementation.

> > And how does this work give that pids are (no longer) system-wide unique?
>
> It gets pids from the caller's task_struct (via current), so its data is
> as unique as process calling getpid() or syscall(__NR_gettid).

That means that the code delivers non-unique process identifiers to
userspace. A client gets pid=42 but there are seven processes on the
machine with that pid. Problem.

2008-11-26 08:51:53

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [take2] Inotify: nested attributes support.

On Wed, Nov 26, 2008 at 12:38:25AM -0800, Andrew Morton ([email protected]) wrote:
> That means that the code delivers non-unique process identifiers to
> userspace. A client gets pid=42 but there are seven processes on the
> machine with that pid. Problem.

Because of namespaces I suppose? This is not a problem for those who
run in the same namespace, otherwise we can add nsproxy ID if needed for
anyone.

--
Evgeniy Polyakov

2008-11-26 12:47:43

by David Newall

[permalink] [raw]
Subject: Re: [take2] Inotify: nested attributes support.

Evgeniy Polyakov wrote:
> On Wed, Nov 26, 2008 at 12:15:38AM -0800, Andrew Morton ([email protected]) wrote:
>
>> This still doesn't provide a reason for anyone to be interested in the
>> code! Why do we want pids in inotify messages?
>>
>
> my application has to
> differentiate IO made by itself and any IO made by system (another
> users, crons, whatever else)

I don't think so. As discussed,
(http://marc.info/?l=linux-kernel&m=122735413932519), you already can
differentiate I/O made by local users, so you don't need to modify inotify.

This change violates my first rule of programming: If there's two or
more ways of solving a problem, pick one; don't pick them all.

2008-11-26 12:51:25

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [take2] Inotify: nested attributes support.

On Wed, Nov 26, 2008 at 11:17:28PM +1030, David Newall ([email protected]) wrote:
> I don't think so. As discussed,
> (http://marc.info/?l=linux-kernel&m=122735413932519), you already can
> differentiate I/O made by local users, so you don't need to modify inotify.

And as was shown it does not work for all cases and introduces unneded
performance overhead.

> This change violates my first rule of programming: If there's two or
> more ways of solving a problem, pick one; don't pick them all.

Then we should go back to caves, raw meat was so tasty...

--
Evgeniy Polyakov

2008-11-26 13:05:01

by David Newall

[permalink] [raw]
Subject: Re: [take2] Inotify: nested attributes support.

Evgeniy Polyakov wrote:
> On Wed, Nov 26, 2008 at 11:17:28PM +1030, David Newall ([email protected]) wrote:
>
>> I don't think so. As discussed,
>> (http://marc.info/?l=linux-kernel&m=122735413932519), you already can
>> differentiate I/O made by local users, so you don't need to modify inotify.
>>
>
> And as was shown it does not work for all cases and introduces unneded
> performance overhead.
>

If the performance using a local loopback is insufficient, or even when
using a UNIX domain socket, then it's going to be even worse for the
actual network-connected clients. And the cases were it doesn't work
amount to poor system administration, which is a solvable problem.


>> This change violates my first rule of programming: If there's two or
>> more ways of solving a problem, pick one; don't pick them all.
>>
>
> Then we should go back to caves, raw meat was so tasty...

Amusing, but irrelevant to programming.

2008-11-26 13:15:51

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [take2] Inotify: nested attributes support.

On Wed, Nov 26, 2008 at 11:33:35PM +1030, David Newall ([email protected]) wrote:
> > And as was shown it does not work for all cases and introduces unneded
> > performance overhead.
>
> If the performance using a local loopback is insufficient, or even when
> using a UNIX domain socket, then it's going to be even worse for the
> actual network-connected clients. And the cases were it doesn't work
> amount to poor system administration, which is a solvable problem.

Proposing to introduce additional overhead when it can be avoided is not
the best way to deal with the problem.
Another small detail, that it does not completely solve the problem.

> >> This change violates my first rule of programming: If there's two or
> >> more ways of solving a problem, pick one; don't pick them all.
> >
> > Then we should go back to caves, raw meat was so tasty...
>
> Amusing, but irrelevant to programming.

Programming is so special, that new things which help having a progress
should not be developed there? Care to recall how to work with the punch
card? Following your logic inotify should be returned from the kernel,
since all applications which may do IO can be modified to first register
it in some special database and not checking IO at run-time. And we will
not care about others who did not do that.

Your 'rule' does not apply to the case, where is no way to solve the
problem completely using existing methods.

--
Evgeniy Polyakov

2008-12-04 08:41:37

by Pavel Machek

[permalink] [raw]
Subject: Re: [take2] Inotify: nested attributes support.

On Wed 2008-11-26 11:29:36, Evgeniy Polyakov wrote:
> On Wed, Nov 26, 2008 at 12:15:38AM -0800, Andrew Morton ([email protected]) wrote:
> > OK, so we have a super-duper framework which will allow us to add pids
> > (and other things) to inotify messages.
>
> Yup :)
>
> > This still doesn't provide a reason for anyone to be interested in the
> > code! Why do we want pids in inotify messages?
>
> I actually cared only about myself :)
> I started the thread and implementation, because my application has to
> differentiate IO made by itself and any IO made by system (another
> users, crons, whatever else), inotify did not give me that info, so I
> extended it. As of others: PID/TID may be used by watching applications
> to reduce own load to not process own IO, things like beagle may show
> who actually made changes into the file.

Actually, does the kernel even know who initiated the i/o?

Take two threads, both mapping /etc/something , both of them writing
through the mmap. Kernel sees dirty pages so it writes them back, but
which thread is repsonsible for the write?

> > And how does this work give that pids are (no longer) system-wide unique?
>
> It gets pids from the caller's task_struct (via current), so its data is
> as unique as process calling getpid() or syscall(__NR_gettid).

What happens on mmap()?
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2008-12-04 09:09:22

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [take2] Inotify: nested attributes support.

On Thu, Dec 04, 2008 at 09:43:10AM +0100, Pavel Machek ([email protected]) wrote:
> Actually, does the kernel even know who initiated the i/o?
>
> Take two threads, both mapping /etc/something , both of them writing
> through the mmap. Kernel sees dirty pages so it writes them back, but
> which thread is repsonsible for the write?

It depends. If write will be done with write() syscall, caller will be
stored. Otherwise inotify does not hook into page fault code afaics.

> > > And how does this work give that pids are (no longer) system-wide unique?
> >
> > It gets pids from the caller's task_struct (via current), so its data is
> > as unique as process calling getpid() or syscall(__NR_gettid).
>
> What happens on mmap()?

mmap() syscall is not processed by inotify.

--
Evgeniy Polyakov