2005-03-04 18:55:14

by Robert Love

Subject: [patch] inotify for 2.6.11

Below is inotify, diffed against 2.6.11.

I greatly reworked much of the data structures and their interactions,
to lay the groundwork for sanitizing the locking. I then, I hope,
sanitized the locking. It looks right, I am happy. Comments welcome.
I surely could have missed something. Maybe even something big.

But, regardless, this release is a huge jump from the previous, fixing
all known issues and greatly improving the locking.

Best,

Robert Love


inotify!

inotify is intended to correct the deficiencies of dnotify, particularly
its inability to scale and its terrible user interface:

* dnotify requires the opening of one fd per directory
that you intend to watch. This quickly results in too many
open files and pins removable media, preventing unmount.
* dnotify is directory-based. You only learn about changes to
directories. Sure, a change to a file in a directory affects
the directory, but you are then forced to keep a cache of
stat structures.
* dnotify's interface to user-space is awful. Signals?

inotify provides a more usable, simple, powerful solution to file change
notification:

* inotify's interface is a device node, not SIGIO. You open a
single fd to the device node, which is select()-able.
* inotify has an event that says "the filesystem that the item
you were watching is on was unmounted."
* inotify can watch directories or files.
* inotify provides fine-grained event control.

Inotify is currently used by Beagle (a desktop search infrastructure)
and Gamin (a FAM replacement).
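
To make the read() side of the interface concrete, here is a rough user-space sketch of walking a buffer of events. It assumes that each event is a fixed-size header followed by `len` bytes of NUL-padded filename, which is how inotify_read() in the patch copies them out. The struct `demo_event` and the function `demo_walk_events()` are hypothetical stand-ins, and the field widths are an assumption, not the real definitions from include/linux/inotify.h.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Stand-in for struct inotify_event. The field names follow their use in
 * fs/inotify.c; the field widths are an assumption.
 */
struct demo_event {
	int32_t  wd;      /* watch descriptor, -1 for IN_Q_OVERFLOW */
	uint32_t mask;    /* event mask */
	uint32_t cookie;  /* ties related events together, or zero */
	uint32_t len;     /* bytes of NUL-padded name after the header */
};

/*
 * Walk a buffer laid out the way read() fills it: header, then len bytes
 * of name, repeated. Returns the number of complete events found.
 */
static int demo_walk_events(const char *buf, size_t n)
{
	size_t off = 0;
	int seen = 0;

	assert(buf != NULL);

	while (n - off >= sizeof(struct demo_event)) {
		struct demo_event ev;
		const char *name;

		/* copy the header out so alignment of buf does not matter */
		memcpy(&ev, buf + off, sizeof(ev));
		if (ev.len > n - off - sizeof(ev))
			break;	/* name does not fit in what is left: stop */

		name = buf + off + sizeof(ev);
		printf("wd=%d mask=0x%x cookie=%u name=%.*s\n",
		       (int)ev.wd, (unsigned)ev.mask, (unsigned)ev.cookie,
		       (int)ev.len, name);

		off += sizeof(ev) + ev.len;
		seen++;
	}
	return seen;
}
```

In a real client the buffer would come from read() on the fd opened on /dev/inotify, after select() or poll() reports it readable. The sketch only covers the parsing step.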

Signed-off-by: Robert Love <[email protected]>

drivers/char/Makefile | 1
fs/Kconfig | 13
fs/Makefile | 1
fs/attr.c | 33 -
fs/compat.c | 14
fs/file_table.c | 4
fs/inode.c | 4
fs/inotify.c | 1013 +++++++++++++++++++++++++++++++++++++++++++++
fs/namei.c | 38 -
fs/open.c | 9
fs/read_write.c | 24 -
fs/super.c | 2
include/linux/fs.h | 8
include/linux/fsnotify.h | 235 ++++++++++
include/linux/inotify.h | 113 +++++
include/linux/miscdevice.h | 1
include/linux/sched.h | 2
kernel/user.c | 2
18 files changed, 1455 insertions(+), 62 deletions(-)

diff -urN linux-2.6.11/drivers/char/Makefile linux/drivers/char/Makefile
--- linux-2.6.11/drivers/char/Makefile 2005-03-02 02:38:26.000000000 -0500
+++ linux/drivers/char/Makefile 2005-03-04 13:11:27.414110056 -0500
@@ -9,6 +9,7 @@

obj-y += mem.o random.o tty_io.o n_tty.o tty_ioctl.o

+
obj-$(CONFIG_LEGACY_PTYS) += pty.o
obj-$(CONFIG_UNIX98_PTYS) += pty.o
obj-y += misc.o
diff -urN linux-2.6.11/fs/attr.c linux/fs/attr.c
--- linux-2.6.11/fs/attr.c 2005-03-02 02:37:48.000000000 -0500
+++ linux/fs/attr.c 2005-03-04 13:13:00.689929976 -0500
@@ -10,7 +10,7 @@
#include <linux/mm.h>
#include <linux/string.h>
#include <linux/smp_lock.h>
-#include <linux/dnotify.h>
+#include <linux/fsnotify.h>
#include <linux/fcntl.h>
#include <linux/quotaops.h>
#include <linux/security.h>
@@ -107,31 +107,8 @@
out:
return error;
}
-
EXPORT_SYMBOL(inode_setattr);

-int setattr_mask(unsigned int ia_valid)
-{
- unsigned long dn_mask = 0;
-
- if (ia_valid & ATTR_UID)
- dn_mask |= DN_ATTRIB;
- if (ia_valid & ATTR_GID)
- dn_mask |= DN_ATTRIB;
- if (ia_valid & ATTR_SIZE)
- dn_mask |= DN_MODIFY;
- /* both times implies a utime(s) call */
- if ((ia_valid & (ATTR_ATIME|ATTR_MTIME)) == (ATTR_ATIME|ATTR_MTIME))
- dn_mask |= DN_ATTRIB;
- else if (ia_valid & ATTR_ATIME)
- dn_mask |= DN_ACCESS;
- else if (ia_valid & ATTR_MTIME)
- dn_mask |= DN_MODIFY;
- if (ia_valid & ATTR_MODE)
- dn_mask |= DN_ATTRIB;
- return dn_mask;
-}
-
int notify_change(struct dentry * dentry, struct iattr * attr)
{
struct inode *inode = dentry->d_inode;
@@ -194,11 +171,9 @@
if (ia_valid & ATTR_SIZE)
up_write(&dentry->d_inode->i_alloc_sem);

- if (!error) {
- unsigned long dn_mask = setattr_mask(ia_valid);
- if (dn_mask)
- dnotify_parent(dentry, dn_mask);
- }
+ if (!error)
+ fsnotify_change(dentry, ia_valid);
+
return error;
}

diff -urN linux-2.6.11/fs/compat.c linux/fs/compat.c
--- linux-2.6.11/fs/compat.c 2005-03-02 02:38:08.000000000 -0500
+++ linux/fs/compat.c 2005-03-04 13:11:31.336513760 -0500
@@ -36,7 +36,7 @@
#include <linux/ctype.h>
#include <linux/module.h>
#include <linux/dirent.h>
-#include <linux/dnotify.h>
+#include <linux/fsnotify.h>
#include <linux/highuid.h>
#include <linux/sunrpc/svc.h>
#include <linux/nfsd/nfsd.h>
@@ -1233,9 +1233,15 @@
out:
if (iov != iovstack)
kfree(iov);
- if ((ret + (type == READ)) > 0)
- dnotify_parent(file->f_dentry,
- (type == READ) ? DN_ACCESS : DN_MODIFY);
+ if ((ret + (type == READ)) > 0) {
+ struct dentry *dentry = file->f_dentry;
+ if (type == READ)
+ fsnotify_access(dentry, dentry->d_inode,
+ dentry->d_name.name);
+ else
+ fsnotify_modify(dentry, dentry->d_inode,
+ dentry->d_name.name);
+ }
return ret;
}

diff -urN linux-2.6.11/fs/file_table.c linux/fs/file_table.c
--- linux-2.6.11/fs/file_table.c 2005-03-02 02:37:47.000000000 -0500
+++ linux/fs/file_table.c 2005-03-04 13:11:31.337513608 -0500
@@ -16,6 +16,7 @@
#include <linux/eventpoll.h>
#include <linux/mount.h>
#include <linux/cdev.h>
+#include <linux/fsnotify.h>

/* sysctl tunables... */
struct files_stat_struct files_stat = {
@@ -122,6 +123,9 @@
struct vfsmount *mnt = file->f_vfsmnt;
struct inode *inode = dentry->d_inode;

+
+ fsnotify_close(dentry, inode, file->f_mode, dentry->d_name.name);
+
might_sleep();
/*
* The function eventpoll_release() should be the first called
diff -urN linux-2.6.11/fs/inode.c linux/fs/inode.c
--- linux-2.6.11/fs/inode.c 2005-03-02 02:38:33.000000000 -0500
+++ linux/fs/inode.c 2005-03-04 13:11:31.339513304 -0500
@@ -130,6 +130,10 @@
#ifdef CONFIG_QUOTA
memset(&inode->i_dquot, 0, sizeof(inode->i_dquot));
#endif
+#ifdef CONFIG_INOTIFY
+ INIT_LIST_HEAD(&inode->inotify_watches);
+ spin_lock_init(&inode->inotify_lock);
+#endif
inode->i_pipe = NULL;
inode->i_bdev = NULL;
inode->i_cdev = NULL;
diff -urN linux-2.6.11/fs/inotify.c linux/fs/inotify.c
--- linux-2.6.11/fs/inotify.c 1969-12-31 19:00:00.000000000 -0500
+++ linux/fs/inotify.c 2005-03-04 13:11:31.340513152 -0500
@@ -0,0 +1,1013 @@
+/*
+ * fs/inotify.c - inode-based file event notifications
+ *
+ * Authors:
+ * John McCutchan <[email protected]>
+ * Robert Love <[email protected]>
+ *
+ * Copyright (C) 2005 John McCutchan
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2, or (at your option) any
+ * later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/spinlock.h>
+#include <linux/idr.h>
+#include <linux/slab.h>
+#include <linux/fs.h>
+#include <linux/namei.h>
+#include <linux/poll.h>
+#include <linux/device.h>
+#include <linux/miscdevice.h>
+#include <linux/init.h>
+#include <linux/list.h>
+#include <linux/writeback.h>
+#include <linux/inotify.h>
+
+#include <asm/ioctls.h>
+
+static atomic_t inotify_cookie;
+
+static kmem_cache_t *watch_cachep;
+static kmem_cache_t *event_cachep;
+
+static int max_user_devices;
+static int max_user_watches;
+static unsigned int max_queued_events;
+
+/*
+ * Lock ordering:
+ *
+ * inode_lock (used to safely walk inode_in_use list)
+ * inode->inotify_lock (protects inode->inotify_watches and watches->i_list)
+ * inotify_dev->lock (protects inotify_device and watches->d_list)
+ */
+
+/*
+ * Lifetimes of the three main data structures -- inotify_device, inode, and
+ * inotify_watch -- are managed by reference count.
+ *
+ * inotify_device: Lifetime is from open until release. Additional references
+ * can bump the count via get_inotify_dev() and drop the count via
+ * put_inotify_dev().
+ *
+ * inotify_watch: Lifetime is from create_watch() to destroy_watch().
+ * Additional references can bump the count via get_inotify_watch() and drop
+ * the count via put_inotify_watch().
+ *
+ * inode: Pinned so long as the inode is associated with a watch, from
+ * create_watch() to put_inotify_watch().
+ */
+
+/*
+ * struct inotify_device - represents an open instance of an inotify device
+ *
+ * This structure is protected by 'lock'.
+ */
+struct inotify_device {
+ wait_queue_head_t wq; /* wait queue for i/o */
+ struct idr idr; /* idr mapping wd -> watch */
+ struct list_head events; /* list of queued events */
+ struct list_head watches; /* list of watches */
+ spinlock_t lock; /* protects this bad boy */
+ atomic_t count; /* reference count */
+ struct user_struct *user; /* user who opened this dev */
+ unsigned int queue_size; /* size of the queue (bytes) */
+ unsigned int event_count; /* number of pending events */
+ unsigned int max_events; /* maximum number of events */
+};
+
+/*
+ * struct inotify_kernel_event - An inotify event, originating from a watch and
+ * queued for user-space. A list of these is attached to each instance of the
+ * device. In read(), this list is walked and all events that can fit in the
+ * buffer are returned.
+ *
+ * Protected by dev->lock of the device in which we are queued.
+ */
+struct inotify_kernel_event {
+ struct inotify_event event; /* the user-space event */
+ struct list_head list; /* entry in inotify_device's list */
+ char *name; /* filename, if any */
+};
+
+/*
+ * struct inotify_watch - represents a watch request on a specific inode
+ *
+ * d_list is protected by dev->lock of the associated dev->watches.
+ * i_list and mask are protected by inode->inotify_lock of the associated inode.
+ * dev, inode, and wd are never written to once the watch is created.
+ */
+struct inotify_watch {
+ struct list_head d_list; /* entry in inotify_device's list */
+ struct list_head i_list; /* entry in inode's list */
+ atomic_t count; /* reference count */
+ struct inotify_device *dev; /* associated device */
+ struct inode *inode; /* associated inode */
+ s32 wd; /* watch descriptor */
+ u32 mask; /* event mask for this watch */
+};
+
+static ssize_t show_max_queued_events(struct class_device *class, char *buf)
+{
+ return sprintf(buf, "%u\n", max_queued_events);
+}
+
+static ssize_t store_max_queued_events(struct class_device *class,
+ const char *buf, size_t count)
+{
+ unsigned int max;
+
+ if (sscanf(buf, "%u", &max) > 0 && max > 0) {
+ max_queued_events = max;
+ return strlen(buf);
+ }
+ return -EINVAL;
+}
+
+static ssize_t show_max_user_devices(struct class_device *class, char *buf)
+{
+ return sprintf(buf, "%d\n", max_user_devices);
+}
+
+static ssize_t store_max_user_devices(struct class_device *class,
+ const char *buf, size_t count)
+{
+ int max;
+
+ if (sscanf(buf, "%d", &max) > 0 && max > 0) {
+ max_user_devices = max;
+ return strlen(buf);
+ }
+ return -EINVAL;
+}
+
+static ssize_t show_max_user_watches(struct class_device *class, char *buf)
+{
+ return sprintf(buf, "%d\n", max_user_watches);
+}
+
+static ssize_t store_max_user_watches(struct class_device *class,
+ const char *buf, size_t count)
+{
+ int max;
+
+ if (sscanf(buf, "%d", &max) > 0 && max > 0) {
+ max_user_watches = max;
+ return strlen(buf);
+ }
+ return -EINVAL;
+}
+
+static CLASS_DEVICE_ATTR(max_queued_events, S_IRUGO | S_IWUSR,
+ show_max_queued_events, store_max_queued_events);
+static CLASS_DEVICE_ATTR(max_user_devices, S_IRUGO | S_IWUSR,
+ show_max_user_devices, store_max_user_devices);
+static CLASS_DEVICE_ATTR(max_user_watches, S_IRUGO | S_IWUSR,
+ show_max_user_watches, store_max_user_watches);
+
+static inline void get_inotify_dev(struct inotify_device *dev)
+{
+ atomic_inc(&dev->count);
+}
+
+static inline void put_inotify_dev(struct inotify_device *dev)
+{
+ if (atomic_dec_and_test(&dev->count)) {
+ atomic_dec(&dev->user->inotify_devs);
+ free_uid(dev->user);
+ kfree(dev);
+ }
+}
+
+static inline void get_inotify_watch(struct inotify_watch *watch)
+{
+ atomic_inc(&watch->count);
+}
+
+static inline void put_inotify_watch(struct inotify_watch *watch)
+{
+ if (atomic_dec_and_test(&watch->count)) {
+ put_inotify_dev(watch->dev);
+ iput(watch->inode);
+ kmem_cache_free(watch_cachep, watch);
+ }
+}
+
+/*
+ * kernel_event - create a new kernel event with the given parameters
+ */
+static struct inotify_kernel_event * kernel_event(s32 wd, u32 mask, u32 cookie,
+ const char *name)
+{
+ struct inotify_kernel_event *kevent;
+
+ /* XXX: optimally, we should use GFP_KERNEL */
+ kevent = kmem_cache_alloc(event_cachep, GFP_ATOMIC);
+ if (unlikely(!kevent))
+ return NULL;
+
+ /* we hand this out to user-space, so zero it just in case */
+ memset(&kevent->event, 0, sizeof(struct inotify_event));
+
+ kevent->event.wd = wd;
+ kevent->event.mask = mask;
+ kevent->event.cookie = cookie;
+
+ INIT_LIST_HEAD(&kevent->list);
+
+ if (name) {
+ size_t len, rem, event_size = sizeof(struct inotify_event);
+
+ /*
+ * We need to pad the filename so as to properly align an
+ * array of inotify_event structures. Because the structure is
+ * small and the common case is a small filename, we just round
+ * up to the next multiple of the structure's sizeof. This is
+ * simple and safe for all architectures.
+ */
+ len = strlen(name) + 1;
+ rem = event_size - len;
+ if (len > event_size) {
+ rem = event_size - (len % event_size);
+ if (len % event_size == 0)
+ rem = 0;
+ }
+ len += rem;
+
+ /* XXX: optimally, we should use GFP_KERNEL */
+ kevent->name = kmalloc(len, GFP_ATOMIC);
+ if (unlikely(!kevent->name)) {
+ kmem_cache_free(event_cachep, kevent);
+ return NULL;
+ }
+ memset(kevent->name, 0, len);
+ strncpy(kevent->name, name, strlen(name));
+ kevent->event.len = len;
+ } else {
+ kevent->event.len = 0;
+ kevent->name = NULL;
+ }
+
+ return kevent;
+}
+
+/*
+ * inotify_dev_get_event - return the next event in the given dev's queue
+ *
+ * Caller must hold dev->lock.
+ */
+static inline struct inotify_kernel_event *
+inotify_dev_get_event(struct inotify_device *dev)
+{
+ return list_entry(dev->events.next, struct inotify_kernel_event, list);
+}
+
+/*
+ * inotify_dev_queue_event - add a new event to the given device
+ *
+ * Caller must hold dev->lock.
+ */
+static void inotify_dev_queue_event(struct inotify_device *dev,
+ struct inotify_watch *watch, u32 mask,
+ u32 cookie, const char *name)
+{
+ struct inotify_kernel_event *kevent, *last;
+
+ /* coalescing: drop this event if it is a dupe of the previous */
+ last = list_entry(dev->events.prev, struct inotify_kernel_event, list);
+ if (dev->event_count && last->event.mask == mask &&
+ last->event.wd == watch->wd) {
+ const char *lastname = last->name;
+
+ if (!name && !lastname)
+ return;
+ if (name && lastname && !strcmp(lastname, name))
+ return;
+ }
+
+ /*
+ * The queue has already overflowed and we have already sent the
+ * Q_OVERFLOW event.
+ */
+ if (unlikely(dev->event_count > dev->max_events))
+ return;
+
+ /* if the queue overflows, we need to notify user space */
+ if (unlikely(dev->event_count == dev->max_events))
+ kevent = kernel_event(-1, IN_Q_OVERFLOW, cookie, NULL);
+ else
+ kevent = kernel_event(watch->wd, mask, cookie, name);
+
+ if (unlikely(!kevent))
+ return;
+
+ /* queue the event and wake up anyone waiting */
+ dev->event_count++;
+ dev->queue_size += sizeof(struct inotify_event) + kevent->event.len;
+ list_add_tail(&kevent->list, &dev->events);
+ wake_up_interruptible(&dev->wq);
+}
+
+static inline int inotify_dev_has_events(struct inotify_device *dev)
+{
+ return !list_empty(&dev->events);
+}
+
+/*
+ * remove_kevent - cleans up and ultimately frees the given kevent
+ */
+static void remove_kevent(struct inotify_device *dev,
+ struct inotify_kernel_event *kevent)
+{
+ BUG_ON(!dev);
+ BUG_ON(!kevent);
+
+ list_del(&kevent->list);
+
+ dev->event_count--;
+ dev->queue_size -= sizeof(struct inotify_event) + kevent->event.len;
+
+ if (kevent->name)
+ kfree(kevent->name);
+ kmem_cache_free(event_cachep, kevent);
+}
+
+/*
+ * inotify_dev_event_dequeue - destroy an event on the given device
+ *
+ * Caller must hold dev->lock.
+ */
+static void inotify_dev_event_dequeue(struct inotify_device *dev)
+{
+ if (inotify_dev_has_events(dev)) {
+ struct inotify_kernel_event *kevent;
+ kevent = inotify_dev_get_event(dev);
+ remove_kevent(dev, kevent);
+ }
+}
+
+/*
+ * inotify_dev_get_wd - returns the next WD for use by the given dev
+ *
+ * Grabs dev->lock. This function can sleep.
+ */
+static int inotify_dev_get_wd(struct inotify_device *dev,
+ struct inotify_watch *watch)
+{
+ int ret;
+
+repeat:
+ if (unlikely(!idr_pre_get(&dev->idr, GFP_KERNEL)))
+ return -ENOSPC;
+ spin_lock(&dev->lock);
+ ret = idr_get_new(&dev->idr, watch, &watch->wd);
+ spin_unlock(&dev->lock);
+ if (ret == -EAGAIN) /* more memory is required, try again */
+ goto repeat;
+ else if (ret) /* the idr is full! */
+ return -ENOSPC;
+
+ return 0;
+}
+
+/*
+ * create_watch - creates a watch on the given device.
+ *
+ * Calls inotify_dev_get_wd(), so it both grabs dev->lock and may sleep.
+ * Both 'dev' and 'inode' (by way of nameidata) need to be pinned.
+ */
+static struct inotify_watch *create_watch(struct inotify_device *dev,
+ u32 mask, struct inode *inode)
+{
+ struct inotify_watch *watch;
+
+ if (atomic_read(&dev->user->inotify_watches) >= max_user_watches)
+ return NULL;
+
+ watch = kmem_cache_alloc(watch_cachep, GFP_KERNEL);
+ if (unlikely(!watch))
+ return NULL;
+
+ if (unlikely(inotify_dev_get_wd(dev, watch))) {
+ kmem_cache_free(watch_cachep, watch);
+ return NULL;
+ }
+
+ watch->mask = mask;
+ atomic_set(&watch->count, 0);
+ INIT_LIST_HEAD(&watch->d_list);
+ INIT_LIST_HEAD(&watch->i_list);
+
+ /* save a reference to device and bump the count to make it official */
+ get_inotify_dev(dev);
+ watch->dev = dev;
+
+ /*
+ * Save a reference to the inode and bump the ref count to make it
+ * official. We hold a reference to nameidata, which makes this safe.
+ */
+ watch->inode = inode;
+ spin_lock(&inode_lock);
+ __iget(inode);
+ spin_unlock(&inode_lock);
+
+ /* bump our own count, corresponding to our entry in dev->watches */
+ get_inotify_watch(watch);
+
+ atomic_inc(&dev->user->inotify_watches);
+
+ return watch;
+}
+
+/*
+ * inode_find_dev - find the watch associated with the given inode and dev
+ *
+ * Callers must hold inode->inotify_lock.
+ */
+static struct inotify_watch *inode_find_dev(struct inode *inode,
+ struct inotify_device *dev)
+{
+ struct inotify_watch *watch;
+
+ list_for_each_entry(watch, &inode->inotify_watches, i_list) {
+ if (watch->dev == dev)
+ return watch;
+ }
+
+ return NULL;
+}
+
+/*
+ * inotify_dev_is_watching_inode - is this device watching this inode?
+ *
+ * Requires 'dev' and 'inode' to both be pinned and dev->lock be held.
+ */
+static inline int inotify_dev_is_watching_inode(struct inotify_device *dev,
+ struct inode *inode)
+{
+ struct inotify_watch *watch;
+
+ list_for_each_entry(watch, &dev->watches, d_list) {
+ if (watch->inode == inode)
+ return 1;
+ }
+
+ return 0;
+}
+
+/*
+ * remove_watch_no_event - remove_watch() without the IN_IGNORED event.
+ */
+static void remove_watch_no_event(struct inotify_watch *watch,
+ struct inotify_device *dev)
+{
+ BUG_ON(!dev);
+ BUG_ON(!watch);
+
+ list_del(&watch->i_list);
+ list_del(&watch->d_list);
+
+ atomic_dec(&dev->user->inotify_watches);
+ idr_remove(&dev->idr, watch->wd);
+ put_inotify_watch(watch);
+}
+
+/*
+ * remove_watch - Remove a watch from both the device and the inode. Sends
+ * the IN_IGNORED event to the given device signifying that the inode is no
+ * longer watched.
+ *
+ * Callers must hold both inode->inotify_lock and dev->lock. We drop a
+ * reference to the inode before returning.
+ */
+static void remove_watch(struct inotify_watch *watch,
+ struct inotify_device *dev)
+{
+ inotify_dev_queue_event(dev, watch, IN_IGNORED, 0, NULL);
+ remove_watch_no_event(watch, dev);
+}
+
+/* Kernel API */
+
+/**
+ * inotify_inode_queue_event - queue an event to all watches on this inode
+ * @inode: inode event is originating from
+ * @mask: event mask describing this event
+ * @cookie: cookie for synchronization, or zero
+ * @name: filename, if any
+ */
+void inotify_inode_queue_event(struct inode *inode, u32 mask, u32 cookie,
+ const char *name)
+{
+ struct inotify_watch *watch;
+
+ spin_lock(&inode->inotify_lock);
+ list_for_each_entry(watch, &inode->inotify_watches, i_list) {
+ if (watch->mask & mask) {
+ struct inotify_device *dev = watch->dev;
+ spin_lock(&dev->lock);
+ inotify_dev_queue_event(dev, watch, mask, cookie, name);
+ spin_unlock(&dev->lock);
+ }
+ }
+ spin_unlock(&inode->inotify_lock);
+}
+EXPORT_SYMBOL_GPL(inotify_inode_queue_event);
+
+/**
+ * inotify_dentry_parent_queue_event - queue an event to a dentry's parent
+ * @dentry: the dentry in question, we queue against this dentry's parent
+ * @mask: event mask describing this event
+ * @cookie: cookie for synchronization, or zero
+ * @name: filename, if any
+ */
+void inotify_dentry_parent_queue_event(struct dentry *dentry, u32 mask,
+ u32 cookie, const char *name)
+{
+ struct dentry *parent;
+ struct inode *inode;
+
+ spin_lock(&dentry->d_lock);
+ parent = dentry->d_parent;
+ inode = parent->d_inode;
+ if (!list_empty(&inode->inotify_watches)) {
+ dget(parent);
+ spin_unlock(&dentry->d_lock);
+ inotify_inode_queue_event(inode, mask, cookie, name);
+ dput(parent);
+ } else
+ spin_unlock(&dentry->d_lock);
+}
+EXPORT_SYMBOL_GPL(inotify_dentry_parent_queue_event);
+
+/**
+ * inotify_get_cookie - return a unique cookie for use in synchronizing events
+ *
+ * Returns the unique cookie.
+ */
+u32 inotify_get_cookie(void)
+{
+ return atomic_inc_return(&inotify_cookie);
+}
+EXPORT_SYMBOL_GPL(inotify_get_cookie);
+
+/**
+ * inotify_super_block_umount - process watches on an unmounted fs
+ * @sb: the super_block of the filesystem in question
+ */
+void inotify_super_block_umount(struct super_block *sb)
+{
+ struct inode *inode;
+
+ /* Walk the list of inodes and find those on this superblock */
+ spin_lock(&inode_lock);
+ list_for_each_entry(inode, &inode_in_use, i_list) {
+ struct inotify_watch *watch, *next;
+ struct list_head *watches;
+
+ if (inode->i_sb != sb)
+ continue;
+
+ /* for each watch, send IN_UNMOUNT and then remove it */
+ spin_lock(&inode->inotify_lock);
+ watches = &inode->inotify_watches;
+ list_for_each_entry_safe(watch, next, watches, i_list) {
+ struct inotify_device *dev = watch->dev;
+ spin_lock(&dev->lock);
+ inotify_dev_queue_event(dev, watch, IN_UNMOUNT,0,NULL);
+ remove_watch(watch, dev);
+ spin_unlock(&dev->lock);
+ }
+ spin_unlock(&inode->inotify_lock);
+ }
+ spin_unlock(&inode_lock);
+}
+EXPORT_SYMBOL_GPL(inotify_super_block_umount);
+
+/**
+ * inotify_inode_is_dead - an inode has been deleted, cleanup any watches
+ * @inode: inode that is about to be removed
+ */
+void inotify_inode_is_dead(struct inode *inode)
+{
+ struct inotify_watch *watch, *next;
+
+ spin_lock(&inode->inotify_lock);
+ list_for_each_entry_safe(watch, next, &inode->inotify_watches, i_list) {
+ struct inotify_device *dev = watch->dev;
+ spin_lock(&dev->lock);
+ remove_watch(watch, dev);
+ spin_unlock(&dev->lock);
+ }
+ spin_unlock(&inode->inotify_lock);
+}
+EXPORT_SYMBOL_GPL(inotify_inode_is_dead);
+
+/* Device Interface */
+
+static unsigned int inotify_poll(struct file *file, poll_table *wait)
+{
+ struct inotify_device *dev;
+ int ret = 0;
+
+ dev = file->private_data;
+ get_inotify_dev(dev);
+
+ poll_wait(file, &dev->wq, wait);
+ spin_lock(&dev->lock);
+ if (inotify_dev_has_events(dev))
+ ret = POLLIN | POLLRDNORM;
+ spin_unlock(&dev->lock);
+
+ put_inotify_dev(dev);
+ return ret;
+}
+
+static ssize_t inotify_read(struct file *file, char __user *buf,
+ size_t count, loff_t *pos)
+{
+ size_t event_size;
+ struct inotify_device *dev;
+ char __user *start;
+ int ret;
+ DEFINE_WAIT(wait);
+
+ start = buf;
+ dev = file->private_data;
+
+ /* we only hand out full inotify events */
+ event_size = sizeof(struct inotify_event);
+ if (count < event_size)
+ return 0;
+
+ while (1) {
+ int events;
+
+ prepare_to_wait(&dev->wq, &wait, TASK_INTERRUPTIBLE);
+
+ spin_lock(&dev->lock);
+ events = inotify_dev_has_events(dev);
+ spin_unlock(&dev->lock);
+ if (events) {
+ ret = 0;
+ break;
+ }
+
+ if (file->f_flags & O_NONBLOCK) {
+ ret = -EAGAIN;
+ break;
+ }
+
+ if (signal_pending(current)) {
+ ret = -EINTR;
+ break;
+ }
+
+ schedule();
+ }
+
+ finish_wait(&dev->wq, &wait);
+ if (ret)
+ return ret;
+
+ while (1) {
+ struct inotify_kernel_event *kevent;
+
+ spin_lock(&dev->lock);
+ if (!inotify_dev_has_events(dev)) {
+ spin_unlock(&dev->lock);
+ break;
+ }
+ kevent = inotify_dev_get_event(dev);
+ if (event_size + kevent->event.len > count) {
+ spin_unlock(&dev->lock);
+ break;
+ }
+ list_del_init(&kevent->list);
+ spin_unlock(&dev->lock);
+
+ if (copy_to_user(buf, &kevent->event, event_size)) {
+ /* put the event back on the queue */
+ spin_lock(&dev->lock);
+ list_add(&kevent->list, &dev->events);
+ spin_unlock(&dev->lock);
+ return -EFAULT;
+ }
+ buf += event_size;
+ count -= event_size;
+
+ if (kevent->name) {
+ if (copy_to_user(buf, kevent->name, kevent->event.len)){
+ /* put the event back on the queue */
+ spin_lock(&dev->lock);
+ list_add(&kevent->list, &dev->events);
+ spin_unlock(&dev->lock);
+ return -EFAULT;
+ }
+ buf += kevent->event.len;
+ count -= kevent->event.len;
+ }
+
+ /*
+ * We made it here, so the event was copied to the user. It is
+ * already removed from the event list, just free it.
+ */
+ spin_lock(&dev->lock);
+ remove_kevent(dev, kevent);
+ spin_unlock(&dev->lock);
+ }
+
+ return buf - start;
+}
+
+static int inotify_open(struct inode *inode, struct file *file)
+{
+ struct inotify_device *dev;
+ struct user_struct *user;
+ int ret;
+
+ user = get_uid(current->user);
+
+ if (unlikely(atomic_read(&user->inotify_devs) >= max_user_devices)) {
+ ret = -EMFILE;
+ goto out_err;
+ }
+
+ dev = kmalloc(sizeof(struct inotify_device), GFP_KERNEL);
+ if (unlikely(!dev)) {
+ ret = -ENOMEM;
+ goto out_err;
+ }
+
+ idr_init(&dev->idr);
+ INIT_LIST_HEAD(&dev->events);
+ INIT_LIST_HEAD(&dev->watches);
+ init_waitqueue_head(&dev->wq);
+ spin_lock_init(&dev->lock);
+
+ dev->event_count = 0;
+ dev->queue_size = 0;
+ dev->max_events = max_queued_events;
+ dev->user = user;
+ atomic_set(&dev->count, 0);
+
+ get_inotify_dev(dev);
+ atomic_inc(&current->user->inotify_devs);
+
+ file->private_data = dev;
+
+ return 0;
+out_err:
+ free_uid(current->user);
+ return ret;
+}
+
+static int inotify_release(struct inode *inode, struct file *file)
+{
+ struct inotify_device *dev;
+
+ dev = file->private_data;
+ BUG_ON(!dev);
+
+ /*
+ * Destroy all of the watches on this device. Unfortunately, not very
+ * pretty. We cannot do a simple iteration over the list, because we
+ * do not know the inode until we iterate to the watch. But we need to
+ * hold inode->inotify_lock before dev->lock. The following works.
+ */
+ while (1) {
+ struct inotify_watch *watch;
+ struct list_head *watches;
+ struct inode *inode;
+
+ spin_lock(&dev->lock);
+ watches = &dev->watches;
+ if (list_empty(watches)) {
+ spin_unlock(&dev->lock);
+ break;
+ }
+ watch = list_entry(watches->next, struct inotify_watch, d_list);
+ get_inotify_watch(watch);
+ spin_unlock(&dev->lock);
+
+ inode = watch->inode;
+ spin_lock(&inode->inotify_lock);
+ spin_lock(&dev->lock);
+ remove_watch_no_event(watch, dev);
+ spin_unlock(&dev->lock);
+ spin_unlock(&inode->inotify_lock);
+ put_inotify_watch(watch);
+ }
+
+ /* destroy all of the events on this device */
+ spin_lock(&dev->lock);
+ while (inotify_dev_has_events(dev))
+ inotify_dev_event_dequeue(dev);
+ spin_unlock(&dev->lock);
+
+ /* free this device: the put matching the get in inotify_open() */
+ put_inotify_dev(dev);
+
+ return 0;
+}
+
+static int inotify_add_watch(struct inotify_device *dev,
+ struct inotify_watch_request *request)
+{
+ struct inode *inode;
+ struct inotify_watch *watch, *old;
+ struct nameidata nd;
+ int ret;
+
+ ret = __user_walk(request->name, LOOKUP_FOLLOW, &nd);
+ if (unlikely(ret))
+ return ret;
+
+ /* you can only watch an inode if you have read permissions on it */
+ ret = permission(nd.dentry->d_inode, MAY_READ, NULL);
+ if (unlikely(ret))
+ goto nd_out;
+
+ /* inode is held in place by a reference on nd */
+ inode = nd.dentry->d_inode;
+
+ /*
+ * Handle the case of re-adding a watch on an (inode,dev) pair that we
+ * are already watching. We just update the mask and return its wd.
+ */
+ spin_lock(&inode->inotify_lock);
+ spin_lock(&dev->lock);
+ old = inode_find_dev(inode, dev);
+ if (unlikely(old)) {
+ old->mask = request->mask;
+ ret = old->wd;
+ spin_unlock(&dev->lock);
+ spin_unlock(&inode->inotify_lock);
+ goto nd_out;
+ }
+ spin_unlock(&dev->lock);
+ spin_unlock(&inode->inotify_lock);
+
+ /*
+ * We do this lockless, for both scalability and so we can allocate
+ * with GFP_KERNEL. But that means we can race here and add this watch
+ * twice. We fix up that case below, by again checking for the watch
+ * after we reacquire the locks.
+ */
+ watch = create_watch(dev, request->mask, inode);
+ if (unlikely(!watch)) {
+ ret = -ENOSPC;
+ goto nd_out;
+ }
+
+ spin_lock(&inode->inotify_lock);
+ spin_lock(&dev->lock);
+ old = inode_find_dev(inode, dev);
+ if (unlikely(old)) {
+ /* We raced! Destroy this watch and return. */
+ put_inotify_watch(watch);
+ ret = -EBUSY;
+ } else {
+ /* Add the watch to the device's and the inode's list */
+ list_add(&watch->d_list, &dev->watches);
+ list_add(&watch->i_list, &inode->inotify_watches);
+ ret = watch->wd;
+ }
+ spin_unlock(&dev->lock);
+ spin_unlock(&inode->inotify_lock);
+
+nd_out:
+ path_release(&nd);
+
+ return ret;
+}
+
+/*
+ * inotify_ignore - handle the INOTIFY_IGNORE ioctl, asking that a given wd be
+ * removed from the device.
+ */
+static int inotify_ignore(struct inotify_device *dev, s32 wd)
+{
+ struct inotify_watch *watch;
+ struct inode *inode;
+
+ spin_lock(&dev->lock);
+ watch = idr_find(&dev->idr, wd);
+ spin_unlock(&dev->lock);
+
+ if (unlikely(!watch))
+ return -EINVAL;
+
+ inode = watch->inode;
+ spin_lock(&inode->inotify_lock);
+ spin_lock(&dev->lock);
+ remove_watch(watch, dev);
+ spin_unlock(&dev->lock);
+ spin_unlock(&inode->inotify_lock);
+
+ return 0;
+}
+
+static int inotify_ioctl(struct inode *ip, struct file *file, unsigned int cmd,
+ unsigned long arg)
+{
+ struct inotify_device *dev;
+ struct inotify_watch_request request;
+ void __user *p;
+ int ret = -ENOTTY;
+ s32 wd;
+
+ dev = file->private_data;
+ p = (void __user *) arg;
+
+ get_inotify_dev(dev);
+
+ switch (cmd) {
+ case INOTIFY_WATCH:
+ if (unlikely(copy_from_user(&request, p, sizeof (request)))) {
+ ret = -EFAULT;
+ break;
+ }
+ ret = inotify_add_watch(dev, &request);
+ break;
+ case INOTIFY_IGNORE:
+ if (unlikely(copy_from_user(&wd, p, sizeof (wd)))) {
+ ret = -EFAULT;
+ break;
+ }
+ ret = inotify_ignore(dev, wd);
+ break;
+ case FIONREAD:
+ ret = put_user(dev->queue_size, (int __user *) p);
+ break;
+ }
+
+ put_inotify_dev(dev);
+
+ return ret;
+}
+
+static struct file_operations inotify_fops = {
+ .owner = THIS_MODULE,
+ .poll = inotify_poll,
+ .read = inotify_read,
+ .open = inotify_open,
+ .release = inotify_release,
+ .ioctl = inotify_ioctl,
+};
+
+static struct miscdevice inotify_device = {
+ .minor = MISC_DYNAMIC_MINOR,
+ .name = "inotify",
+ .fops = &inotify_fops,
+};
+
+/*
+ * inotify_init - Our initialization function. Note that we cannot return
+ * error because we have compiled-in VFS hooks. So an (unlikely) failure here
+ * must result in panic().
+ */
+static int __init inotify_init(void)
+{
+ struct class_device *class;
+ int ret;
+
+ ret = misc_register(&inotify_device);
+ if (unlikely(ret))
+ panic("inotify: misc_register returned %d\n", ret);
+
+ max_queued_events = 512;
+ max_user_devices = 64;
+ max_user_watches = 16384;
+
+ class = inotify_device.class;
+ class_device_create_file(class, &class_device_attr_max_queued_events);
+ class_device_create_file(class, &class_device_attr_max_user_devices);
+ class_device_create_file(class, &class_device_attr_max_user_watches);
+
+ atomic_set(&inotify_cookie, 0);
+
+ watch_cachep = kmem_cache_create("inotify_watch_cache",
+ sizeof(struct inotify_watch),
+ 0, SLAB_PANIC, NULL, NULL);
+ event_cachep = kmem_cache_create("inotify_event_cache",
+ sizeof(struct inotify_kernel_event),
+ 0, SLAB_PANIC, NULL, NULL);
+
+ printk(KERN_INFO "inotify device minor=%d\n", inotify_device.minor);
+
+ return 0;
+}
+
+module_init(inotify_init);
diff -urN linux-2.6.11/fs/Kconfig linux/fs/Kconfig
--- linux-2.6.11/fs/Kconfig 2005-03-02 02:38:10.000000000 -0500
+++ linux/fs/Kconfig 2005-03-04 13:11:31.342512848 -0500
@@ -339,6 +339,19 @@
If you don't know whether you need it, then you don't need it:
answer N.

+config INOTIFY
+ bool "Inotify file change notification support"
+ default y
+ ---help---
+ Say Y here to enable inotify support and the /dev/inotify character
+ device. Inotify is a file change notification system and a
+ replacement for dnotify. Inotify fixes numerous shortcomings in
+ dnotify and introduces several new features. It allows monitoring
+ of both files and directories via a single open fd. Multiple file
+ events are supported.
+
+ If unsure, say Y.
+
config QUOTA
bool "Quota support"
help
diff -urN linux-2.6.11/fs/Makefile linux/fs/Makefile
--- linux-2.6.11/fs/Makefile 2005-03-02 02:38:10.000000000 -0500
+++ linux/fs/Makefile 2005-03-04 13:11:31.343512696 -0500
@@ -11,6 +11,7 @@
attr.o bad_inode.o file.o filesystems.o namespace.o aio.o \
seq_file.o xattr.o libfs.o fs-writeback.o mpage.o direct-io.o \

+obj-$(CONFIG_INOTIFY) += inotify.o
obj-$(CONFIG_EPOLL) += eventpoll.o
obj-$(CONFIG_COMPAT) += compat.o

diff -urN linux-2.6.11/fs/namei.c linux/fs/namei.c
--- linux-2.6.11/fs/namei.c 2005-03-02 02:37:55.000000000 -0500
+++ linux/fs/namei.c 2005-03-04 13:11:31.345512392 -0500
@@ -21,7 +21,7 @@
#include <linux/namei.h>
#include <linux/quotaops.h>
#include <linux/pagemap.h>
-#include <linux/dnotify.h>
+#include <linux/fsnotify.h>
#include <linux/smp_lock.h>
#include <linux/personality.h>
#include <linux/security.h>
@@ -1252,7 +1252,7 @@
DQUOT_INIT(dir);
error = dir->i_op->create(dir, dentry, mode, nd);
if (!error) {
- inode_dir_notify(dir, DN_CREATE);
+ fsnotify_create(dir, dentry->d_name.name);
security_inode_post_create(dir, dentry, mode);
}
return error;
@@ -1557,7 +1557,7 @@
DQUOT_INIT(dir);
error = dir->i_op->mknod(dir, dentry, mode, dev);
if (!error) {
- inode_dir_notify(dir, DN_CREATE);
+ fsnotify_create(dir, dentry->d_name.name);
security_inode_post_mknod(dir, dentry, mode, dev);
}
return error;
@@ -1630,7 +1630,7 @@
DQUOT_INIT(dir);
error = dir->i_op->mkdir(dir, dentry, mode);
if (!error) {
- inode_dir_notify(dir, DN_CREATE);
+ fsnotify_mkdir(dir, dentry->d_name.name);
security_inode_post_mkdir(dir,dentry, mode);
}
return error;
@@ -1724,10 +1724,8 @@
}
}
up(&dentry->d_inode->i_sem);
- if (!error) {
- inode_dir_notify(dir, DN_DELETE);
- d_delete(dentry);
- }
+ if (!error)
+ fsnotify_rmdir(dentry, dentry->d_inode, dir);
dput(dentry);

return error;
@@ -1797,10 +1795,9 @@
up(&dentry->d_inode->i_sem);

/* We don't d_delete() NFS sillyrenamed files--they still exist. */
- if (!error && !(dentry->d_flags & DCACHE_NFSFS_RENAMED)) {
- d_delete(dentry);
- inode_dir_notify(dir, DN_DELETE);
- }
+ if (!error && !(dentry->d_flags & DCACHE_NFSFS_RENAMED))
+ fsnotify_unlink(dentry->d_inode, dir, dentry);
+
return error;
}

@@ -1874,7 +1871,7 @@
DQUOT_INIT(dir);
error = dir->i_op->symlink(dir, dentry, oldname);
if (!error) {
- inode_dir_notify(dir, DN_CREATE);
+ fsnotify_create(dir, dentry->d_name.name);
security_inode_post_symlink(dir, dentry, oldname);
}
return error;
@@ -1947,7 +1944,7 @@
error = dir->i_op->link(old_dentry, dir, new_dentry);
up(&old_dentry->d_inode->i_sem);
if (!error) {
- inode_dir_notify(dir, DN_CREATE);
+ fsnotify_create(dir, new_dentry->d_name.name);
security_inode_post_link(old_dentry, dir, new_dentry);
}
return error;
@@ -2111,6 +2108,7 @@
{
int error;
int is_dir = S_ISDIR(old_dentry->d_inode->i_mode);
+ char *old_name;

if (old_dentry->d_inode == new_dentry->d_inode)
return 0;
@@ -2132,18 +2130,18 @@
DQUOT_INIT(old_dir);
DQUOT_INIT(new_dir);

+ old_name = fsnotify_oldname_init(old_dentry);
+
if (is_dir)
error = vfs_rename_dir(old_dir,old_dentry,new_dir,new_dentry);
else
error = vfs_rename_other(old_dir,old_dentry,new_dir,new_dentry);
if (!error) {
- if (old_dir == new_dir)
- inode_dir_notify(old_dir, DN_RENAME);
- else {
- inode_dir_notify(old_dir, DN_DELETE);
- inode_dir_notify(new_dir, DN_CREATE);
- }
+ const char *new_name = old_dentry->d_name.name;
+ fsnotify_move(old_dir, new_dir, old_name, new_name);
}
+ fsnotify_oldname_free(old_name);
+
return error;
}

diff -urN linux-2.6.11/fs/open.c linux/fs/open.c
--- linux-2.6.11/fs/open.c 2005-03-02 02:37:47.000000000 -0500
+++ linux/fs/open.c 2005-03-04 13:11:31.346512240 -0500
@@ -10,7 +10,7 @@
#include <linux/file.h>
#include <linux/smp_lock.h>
#include <linux/quotaops.h>
-#include <linux/dnotify.h>
+#include <linux/fsnotify.h>
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/tty.h>
@@ -944,9 +944,14 @@
fd = get_unused_fd();
if (fd >= 0) {
struct file *f = filp_open(tmp, flags, mode);
+ struct dentry *dentry;
+
error = PTR_ERR(f);
if (IS_ERR(f))
goto out_error;
+ dentry = f->f_dentry;
+ fsnotify_open(dentry, dentry->d_inode,
+ dentry->d_name.name);
fd_install(fd, f);
}
out:
@@ -998,7 +1003,7 @@
retval = err;
}

- dnotify_flush(filp, id);
+ fsnotify_flush(filp, id);
locks_remove_posix(filp, id);
fput(filp);
return retval;
diff -urN linux-2.6.11/fs/read_write.c linux/fs/read_write.c
--- linux-2.6.11/fs/read_write.c 2005-03-02 02:38:12.000000000 -0500
+++ linux/fs/read_write.c 2005-03-04 13:18:56.283871480 -0500
@@ -10,7 +10,7 @@
#include <linux/file.h>
#include <linux/uio.h>
#include <linux/smp_lock.h>
-#include <linux/dnotify.h>
+#include <linux/fsnotify.h>
#include <linux/security.h>
#include <linux/module.h>
#include <linux/syscalls.h>
@@ -239,7 +239,10 @@
else
ret = do_sync_read(file, buf, count, pos);
if (ret > 0) {
- dnotify_parent(file->f_dentry, DN_ACCESS);
+ struct dentry *dentry = file->f_dentry;
+ struct inode *inode = dentry->d_inode;
+ fsnotify_access(dentry, inode,
+ dentry->d_name.name);
current->rchar += ret;
}
current->syscr++;
@@ -287,7 +290,10 @@
else
ret = do_sync_write(file, buf, count, pos);
if (ret > 0) {
- dnotify_parent(file->f_dentry, DN_MODIFY);
+ struct dentry *dentry = file->f_dentry;
+ struct inode *inode = dentry->d_inode;
+ fsnotify_modify(dentry, inode,
+ dentry->d_name.name);
current->wchar += ret;
}
current->syscw++;
@@ -523,9 +529,15 @@
out:
if (iov != iovstack)
kfree(iov);
- if ((ret + (type == READ)) > 0)
- dnotify_parent(file->f_dentry,
- (type == READ) ? DN_ACCESS : DN_MODIFY);
+ if ((ret + (type == READ)) > 0) {
+ struct dentry *dentry = file->f_dentry;
+ struct inode *inode = dentry->d_inode;
+
+ if (type == READ)
+ fsnotify_access(dentry, inode, dentry->d_name.name);
+ else
+ fsnotify_modify(dentry, inode, dentry->d_name.name);
+ }
return ret;
Efault:
ret = -EFAULT;
diff -urN linux-2.6.11/fs/super.c linux/fs/super.c
--- linux-2.6.11/fs/super.c 2005-03-02 02:38:08.000000000 -0500
+++ linux/fs/super.c 2005-03-04 13:14:36.168415040 -0500
@@ -36,6 +36,7 @@
#include <linux/vfs.h>
#include <linux/writeback.h> /* for the emergency remount stuff */
#include <linux/idr.h>
+#include <linux/fsnotify.h>
#include <linux/kobject.h>
#include <asm/uaccess.h>

@@ -229,6 +230,7 @@

if (root) {
sb->s_root = NULL;
+ fsnotify_sb_umount(sb);
shrink_dcache_parent(root);
shrink_dcache_anon(&sb->s_anon);
dput(root);
diff -urN linux-2.6.11/include/linux/fs.h linux/include/linux/fs.h
--- linux-2.6.11/include/linux/fs.h 2005-03-02 02:37:50.000000000 -0500
+++ linux/include/linux/fs.h 2005-03-04 13:11:31.352511328 -0500
@@ -224,6 +224,7 @@
struct kstatfs;
struct vm_area_struct;
struct vfsmount;
+struct inotify_inode_data;

/* Used to be a macro which just called the function, now just a function */
extern void update_atime (struct inode *);
@@ -471,6 +472,11 @@
struct dnotify_struct *i_dnotify; /* for directory notifications */
#endif

+#ifdef CONFIG_INOTIFY
+ struct list_head inotify_watches; /* watches on this inode */
+ spinlock_t inotify_lock; /* protects the watches list */
+#endif
+
unsigned long i_state;
unsigned long dirtied_when; /* jiffies of first dirtying */

@@ -1365,7 +1371,7 @@
extern int do_remount_sb(struct super_block *sb, int flags,
void *data, int force);
extern sector_t bmap(struct inode *, sector_t);
-extern int setattr_mask(unsigned int);
+extern void setattr_mask(unsigned int, int *, u32 *);
extern int notify_change(struct dentry *, struct iattr *);
extern int permission(struct inode *, int, struct nameidata *);
extern int generic_permission(struct inode *, int,
diff -urN linux-2.6.11/include/linux/fsnotify.h linux/include/linux/fsnotify.h
--- linux-2.6.11/include/linux/fsnotify.h 1969-12-31 19:00:00.000000000 -0500
+++ linux/include/linux/fsnotify.h 2005-03-04 13:11:31.353511176 -0500
@@ -0,0 +1,235 @@
+#ifndef _LINUX_FS_NOTIFY_H
+#define _LINUX_FS_NOTIFY_H
+
+/*
+ * include/linux/fsnotify.h - generic hooks for filesystem notification, to
+ * reduce in-source duplication from both dnotify and inotify.
+ *
+ * We don't compile any of this away in some complicated menagerie of ifdefs.
+ * Instead, we rely on the code inside to optimize away as needed.
+ *
+ * (C) Copyright 2005 Robert Love
+ */
+
+#ifdef __KERNEL__
+
+#include <linux/dnotify.h>
+#include <linux/inotify.h>
+
+/*
+ * fsnotify_move - file old_name at old_dir was moved to new_name at new_dir
+ */
+static inline void fsnotify_move(struct inode *old_dir, struct inode *new_dir,
+ const char *old_name, const char *new_name)
+{
+ u32 cookie;
+
+ if (old_dir == new_dir)
+ inode_dir_notify(old_dir, DN_RENAME);
+ else {
+ inode_dir_notify(old_dir, DN_DELETE);
+ inode_dir_notify(new_dir, DN_CREATE);
+ }
+
+ cookie = inotify_get_cookie();
+
+ inotify_inode_queue_event(old_dir, IN_MOVED_FROM, cookie, old_name);
+ inotify_inode_queue_event(new_dir, IN_MOVED_TO, cookie, new_name);
+}
+
+/*
+ * fsnotify_unlink - file was unlinked
+ */
+static inline void fsnotify_unlink(struct inode *inode, struct inode *dir,
+ struct dentry *dentry)
+{
+ inode_dir_notify(dir, DN_DELETE);
+ inotify_inode_queue_event(dir, IN_DELETE_FILE, 0, dentry->d_name.name);
+ inotify_inode_queue_event(inode, IN_DELETE_SELF, 0, NULL);
+
+ inotify_inode_is_dead(inode);
+ d_delete(dentry);
+}
+
+/*
+ * fsnotify_rmdir - directory was removed
+ */
+static inline void fsnotify_rmdir(struct dentry *dentry, struct inode *inode,
+ struct inode *dir)
+{
+ inode_dir_notify(dir, DN_DELETE);
+ inotify_inode_queue_event(dir, IN_DELETE_SUBDIR,0,dentry->d_name.name);
+ inotify_inode_queue_event(inode, IN_DELETE_SELF, 0, NULL);
+
+ inotify_inode_is_dead(inode);
+ d_delete(dentry);
+}
+
+/*
+ * fsnotify_create - filename was linked in
+ */
+static inline void fsnotify_create(struct inode *inode, const char *filename)
+{
+ inode_dir_notify(inode, DN_CREATE);
+ inotify_inode_queue_event(inode, IN_CREATE_FILE, 0, filename);
+}
+
+/*
+ * fsnotify_mkdir - directory 'name' was created
+ */
+static inline void fsnotify_mkdir(struct inode *inode, const char *name)
+{
+ inode_dir_notify(inode, DN_CREATE);
+ inotify_inode_queue_event(inode, IN_CREATE_SUBDIR, 0, name);
+}
+
+/*
+ * fsnotify_access - file was read
+ */
+static inline void fsnotify_access(struct dentry *dentry, struct inode *inode,
+ const char *filename)
+{
+ dnotify_parent(dentry, DN_ACCESS);
+ inotify_dentry_parent_queue_event(dentry, IN_ACCESS, 0,
+ filename);
+ inotify_inode_queue_event(inode, IN_ACCESS, 0, NULL);
+}
+
+/*
+ * fsnotify_modify - file was modified
+ */
+static inline void fsnotify_modify(struct dentry *dentry, struct inode *inode,
+ const char *filename)
+{
+ dnotify_parent(dentry, DN_MODIFY);
+ inotify_dentry_parent_queue_event(dentry, IN_MODIFY, 0, filename);
+ inotify_inode_queue_event(inode, IN_MODIFY, 0, NULL);
+}
+
+/*
+ * fsnotify_open - file was opened
+ */
+static inline void fsnotify_open(struct dentry *dentry, struct inode *inode,
+ const char *filename)
+{
+ inotify_inode_queue_event(inode, IN_OPEN, 0, NULL);
+ inotify_dentry_parent_queue_event(dentry, IN_OPEN, 0, filename);
+}
+
+/*
+ * fsnotify_close - file was closed
+ */
+static inline void fsnotify_close(struct dentry *dentry, struct inode *inode,
+ mode_t mode, const char *filename)
+{
+ u32 mask;
+
+ mask = (mode & FMODE_WRITE) ? IN_CLOSE_WRITE : IN_CLOSE_NOWRITE;
+ inotify_dentry_parent_queue_event(dentry, mask, 0, filename);
+ inotify_inode_queue_event(inode, mask, 0, NULL);
+}
+
+/*
+ * fsnotify_change - notify_change event. file was modified and/or metadata
+ * was changed.
+ */
+static inline void fsnotify_change(struct dentry *dentry, unsigned int ia_valid)
+{
+ int dn_mask = 0;
+ u32 in_mask = 0;
+
+ if (ia_valid & ATTR_UID) {
+ in_mask |= IN_ATTRIB;
+ dn_mask |= DN_ATTRIB;
+ }
+ if (ia_valid & ATTR_GID) {
+ in_mask |= IN_ATTRIB;
+ dn_mask |= DN_ATTRIB;
+ }
+ if (ia_valid & ATTR_SIZE) {
+ in_mask |= IN_MODIFY;
+ dn_mask |= DN_MODIFY;
+ }
+ /* both times implies a utime(s) call */
+ if ((ia_valid & (ATTR_ATIME | ATTR_MTIME)) == (ATTR_ATIME | ATTR_MTIME))
+ {
+ in_mask |= IN_ATTRIB;
+ dn_mask |= DN_ATTRIB;
+ } else if (ia_valid & ATTR_ATIME) {
+ in_mask |= IN_ACCESS;
+ dn_mask |= DN_ACCESS;
+ } else if (ia_valid & ATTR_MTIME) {
+ in_mask |= IN_MODIFY;
+ dn_mask |= DN_MODIFY;
+ }
+ if (ia_valid & ATTR_MODE) {
+ in_mask |= IN_ATTRIB;
+ dn_mask |= DN_ATTRIB;
+ }
+
+ if (dn_mask)
+ dnotify_parent(dentry, dn_mask);
+ if (in_mask) {
+ inotify_inode_queue_event(dentry->d_inode, in_mask, 0, NULL);
+ inotify_dentry_parent_queue_event(dentry, in_mask, 0,
+ dentry->d_name.name);
+ }
+}
+
+/*
+ * fsnotify_sb_umount - filesystem unmount
+ */
+static inline void fsnotify_sb_umount(struct super_block *sb)
+{
+ inotify_super_block_umount(sb);
+}
+
+/*
+ * fsnotify_flush - flush time!
+ */
+static inline void fsnotify_flush(struct file *filp, fl_owner_t id)
+{
+ dnotify_flush(filp, id);
+}
+
+#ifdef CONFIG_INOTIFY /* inotify helpers */
+
+/*
+ * fsnotify_oldname_init - save off the old filename before we change it
+ *
+ * this could be kstrdup if only we could add that to lib/string.c
+ */
+static inline char *fsnotify_oldname_init(struct dentry *old_dentry)
+{
+ char *old_name;
+
+ old_name = kmalloc(strlen(old_dentry->d_name.name) + 1, GFP_KERNEL);
+ if (old_name)
+ strcpy(old_name, old_dentry->d_name.name);
+ return old_name;
+}
+
+/*
+ * fsnotify_oldname_free - free the name we got from fsnotify_oldname_init
+ */
+static inline void fsnotify_oldname_free(const char *old_name)
+{
+ kfree(old_name);
+}
+
+#else /* CONFIG_INOTIFY */
+
+static inline char *fsnotify_oldname_init(struct dentry *old_dentry)
+{
+ return NULL;
+}
+
+static inline void fsnotify_oldname_free(const char *old_name)
+{
+}
+
+#endif /* ! CONFIG_INOTIFY */
+
+#endif /* __KERNEL__ */
+
+#endif /* _LINUX_FS_NOTIFY_H */
diff -urN linux-2.6.11/include/linux/inotify.h linux/include/linux/inotify.h
--- linux-2.6.11/include/linux/inotify.h 1969-12-31 19:00:00.000000000 -0500
+++ linux/include/linux/inotify.h 2005-03-04 13:11:31.354511024 -0500
@@ -0,0 +1,113 @@
+/*
+ * Inode based directory notification for Linux
+ *
+ * Copyright (C) 2005 John McCutchan
+ */
+
+#ifndef _LINUX_INOTIFY_H
+#define _LINUX_INOTIFY_H
+
+#include <linux/types.h>
+#include <linux/limits.h>
+
+/*
+ * struct inotify_event - structure read from the inotify device for each event
+ *
+ * When you are watching a directory, you will receive the filename for events
+ * such as IN_CREATE_FILE, IN_DELETE_FILE, IN_OPEN, IN_CLOSE, ..., relative to the wd.
+ */
+struct inotify_event {
+ __s32 wd; /* watch descriptor */
+ __u32 mask; /* watch mask */
+ __u32 cookie; /* cookie to synchronize two events */
+ size_t len; /* length (including nulls) of name */
+ char name[0]; /* stub for possible name */
+};
+
+/*
+ * struct inotify_watch_request - represents a watch request
+ *
+ * Pass to the inotify device via the INOTIFY_WATCH ioctl
+ */
+struct inotify_watch_request {
+ char *name; /* filename */
+ __u32 mask; /* event mask */
+};
+
+/* the following are legal, implemented events */
+#define IN_ACCESS 0x00000001 /* File was accessed */
+#define IN_MODIFY 0x00000002 /* File was modified */
+#define IN_ATTRIB 0x00000004 /* File changed attributes */
+#define IN_CLOSE_WRITE 0x00000008 /* Writable file was closed */
+#define IN_CLOSE_NOWRITE 0x00000010 /* Unwritable file closed */
+#define IN_OPEN 0x00000020 /* File was opened */
+#define IN_MOVED_FROM 0x00000040 /* File was moved from X */
+#define IN_MOVED_TO 0x00000080 /* File was moved to Y */
+#define IN_DELETE_SUBDIR 0x00000100 /* Subdir was deleted */
+#define IN_DELETE_FILE 0x00000200 /* Subfile was deleted */
+#define IN_CREATE_SUBDIR 0x00000400 /* Subdir was created */
+#define IN_CREATE_FILE 0x00000800 /* Subfile was created */
+#define IN_DELETE_SELF 0x00001000 /* Self was deleted */
+#define IN_UNMOUNT 0x00002000 /* Backing fs was unmounted */
+#define IN_Q_OVERFLOW 0x00004000 /* Event queue overflowed */
+#define IN_IGNORED 0x00008000 /* File was ignored */
+
+/* special flags */
+#define IN_ALL_EVENTS 0xffffffff /* All the events */
+#define IN_CLOSE (IN_CLOSE_WRITE | IN_CLOSE_NOWRITE)
+
+#define INOTIFY_IOCTL_MAGIC 'Q'
+#define INOTIFY_IOCTL_MAXNR 2
+
+#define INOTIFY_WATCH _IOR(INOTIFY_IOCTL_MAGIC, 1, struct inotify_watch_request)
+#define INOTIFY_IGNORE _IOR(INOTIFY_IOCTL_MAGIC, 2, int)
+
+#ifdef __KERNEL__
+
+#include <linux/dcache.h>
+#include <linux/fs.h>
+#include <linux/config.h>
+#include <asm/atomic.h>
+
+#ifdef CONFIG_INOTIFY
+
+extern void inotify_inode_queue_event(struct inode *, __u32, __u32,
+ const char *);
+extern void inotify_dentry_parent_queue_event(struct dentry *, __u32, __u32,
+ const char *);
+extern void inotify_super_block_umount(struct super_block *);
+extern void inotify_inode_is_dead(struct inode *);
+extern u32 inotify_get_cookie(void);
+
+#else
+
+static inline void inotify_inode_queue_event(struct inode *inode,
+ __u32 mask, __u32 cookie,
+ const char *filename)
+{
+}
+
+static inline void inotify_dentry_parent_queue_event(struct dentry *dentry,
+ __u32 mask, __u32 cookie,
+ const char *filename)
+{
+}
+
+static inline void inotify_super_block_umount(struct super_block *sb)
+{
+}
+
+static inline void inotify_inode_is_dead(struct inode *inode)
+{
+}
+
+static inline u32 inotify_get_cookie(void)
+{
+ return 0;
+}
+
+#endif /* CONFIG_INOTIFY */
+
+#endif /* __KERNEL__ */
+
+#endif /* _LINUX_INOTIFY_H */
diff -urN linux-2.6.11/include/linux/miscdevice.h linux/include/linux/miscdevice.h
--- linux-2.6.11/include/linux/miscdevice.h 2005-03-02 02:38:10.000000000 -0500
+++ linux/include/linux/miscdevice.h 2005-03-04 13:11:31.355510872 -0500
@@ -2,6 +2,7 @@
#define _LINUX_MISCDEVICE_H
#include <linux/module.h>
#include <linux/major.h>
+#include <linux/device.h>

#define PSMOUSE_MINOR 1
#define MS_BUSMOUSE_MINOR 2
diff -urN linux-2.6.11/include/linux/sched.h linux/include/linux/sched.h
--- linux-2.6.11/include/linux/sched.h 2005-03-02 02:37:48.000000000 -0500
+++ linux/include/linux/sched.h 2005-03-04 13:11:31.356510720 -0500
@@ -368,6 +368,8 @@
atomic_t processes; /* How many processes does this user have? */
atomic_t files; /* How many open files does this user have? */
atomic_t sigpending; /* How many pending signals does this user have? */
+ atomic_t inotify_watches; /* How many inotify watches does this user have? */
+ atomic_t inotify_devs; /* How many inotify devs does this user have opened? */
/* protected by mq_lock */
unsigned long mq_bytes; /* How many bytes can be allocated to mqueue? */
unsigned long locked_shm; /* How many pages of mlocked shm ? */
diff -urN linux-2.6.11/kernel/user.c linux/kernel/user.c
--- linux-2.6.11/kernel/user.c 2005-03-02 02:37:58.000000000 -0500
+++ linux/kernel/user.c 2005-03-04 13:11:31.357510568 -0500
@@ -119,6 +119,8 @@
atomic_set(&new->processes, 0);
atomic_set(&new->files, 0);
atomic_set(&new->sigpending, 0);
+ atomic_set(&new->inotify_watches, 0);
+ atomic_set(&new->inotify_devs, 0);

new->mq_bytes = 0;
new->locked_shm = 0;



2005-03-04 19:41:18

by Robert Love

Subject: [patch] inotify for 2.6.11-mm1

On Fri, 2005-03-04 at 13:37 -0500, Robert Love wrote:

Hey, Andrew.

> I greatly reworked much of the data structures and their interactions,
> to lay the groundwork for sanitizing the locking. I then, I hope,
> sanitized the locking. It looks right, I am happy. Comments welcome.
> I surely could have missed something. Maybe even something big.
>
> But, regardless, this release is a huge jump from the previous, fixing
> all known issues and greatly improving the locking.

Attached is inotify, replacing the current version of inotify in 2.6-mm.
The patch is diffed against 2.6.11-mm1, modulo the two inotify patches
already in-tree.

I'd like to start moving forward on this now that the locking is greatly
improved, and resolve any new issues as they come up.

Please, apply.

Your humble servant,

Robert Love


inotify!

inotify is intended to correct the deficiencies of dnotify, particularly
its inability to scale and its terrible user interface:

* dnotify requires the opening of one fd for each directory
that you intend to watch. This quickly results in too many
open files and pins removable media, preventing unmount.
* dnotify is directory-based. You only learn about changes to
directories. Sure, a change to a file in a directory affects
the directory, but you are then forced to keep a cache of
stat structures.
* dnotify's interface to user-space is awful. Signals?

inotify provides a more usable, simple, powerful solution to file change
notification:

* inotify's interface is a device node, not SIGIO. You open a
single fd to the device node, which is select()-able.
* inotify has an event that says "the filesystem that the item
you were watching is on was unmounted."
* inotify can watch directories or files.
* inotify supports much finer-grained events.
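
To make the interface concrete, here is a rough user-space sketch of
walking a buffer filled by read() on the inotify fd. It assumes each
record is a struct inotify_event header followed by len bytes of
NUL-padded name, which is what the header in this patch suggests but is
not spelled out here. struct ev_hdr and next_event() are hypothetical
names used only for illustration; they are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical user-space mirror of the fixed part of struct inotify_event
 * as declared in this patch (wd, mask, cookie, len). The name bytes, if
 * any, are assumed to follow the header directly.
 */
struct ev_hdr {
	int32_t  wd;      /* watch descriptor */
	uint32_t mask;    /* event mask */
	uint32_t cookie;  /* ties IN_MOVED_FROM to IN_MOVED_TO */
	size_t   len;     /* padded length of the name, 0 if none */
};

/*
 * Fetch the next complete record from buf[0..n). On success, copy the
 * header to *hdr, point *name at the name bytes (NULL when len is 0),
 * advance *off past the record, and return 1. Return 0 when no complete
 * record remains; *off is left unchanged in that case.
 */
static int next_event(const char *buf, size_t n, size_t *off,
		      struct ev_hdr *hdr, const char **name)
{
	assert(buf && off && hdr && name);

	if (*off > n || n - *off < sizeof(*hdr))
		return 0;
	memcpy(hdr, buf + *off, sizeof(*hdr));
	if (hdr->len > n - *off - sizeof(*hdr))
		return 0;	/* truncated record, wait for more data */

	*name = hdr->len ? buf + *off + sizeof(*hdr) : NULL;
	*off += sizeof(*hdr) + hdr->len;
	return 1;
}
```

A caller would read() into buf, then loop on next_event() until it
returns 0, dispatching on hdr.mask (for example IN_MODIFY or IN_CLOSE).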

Inotify is currently used by Beagle (a desktop search infrastructure)
and Gamin (a FAM replacement).
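
One detail that user-space authors may want spelled out: kernel_event()
in the patch below pads each filename so that the reported len is a
multiple of sizeof(struct inotify_event). A minimal sketch of that
rounding rule as a stand-alone helper follows. padded_name_len() is a
hypothetical name, not a function in the patch, and event_size stands in
for sizeof(struct inotify_event), which differs between 32-bit and
64-bit builds.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Round the length of name, including its terminating NUL, up to the
 * next multiple of event_size. This mirrors the len/rem arithmetic in
 * kernel_event(): a name that already fills a whole multiple gets no
 * extra padding, anything else is padded up to the next multiple.
 */
static size_t padded_name_len(const char *name, size_t event_size)
{
	size_t len = strlen(name) + 1;	/* count the terminating NUL */

	assert(event_size > 0);
	return (len + event_size - 1) / event_size * event_size;
}
```

With a 16-byte event structure, "a.txt" (6 bytes with its NUL) pads to
16, and a 16-character name (17 bytes with its NUL) pads to 32.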

Signed-off-by: Robert Love <[email protected]>

fs/Kconfig | 13
fs/Makefile | 1
fs/attr.c | 33 -
fs/compat.c | 14
fs/file_table.c | 4
fs/inode.c | 4
fs/inotify.c | 1013 +++++++++++++++++++++++++++++++++++++++++++++
fs/namei.c | 38 -
fs/open.c | 9
fs/read_write.c | 24 -
fs/super.c | 2
include/linux/fs.h | 8
include/linux/fsnotify.h | 235 ++++++++++
include/linux/inotify.h | 113 +++++
include/linux/sched.h | 2
kernel/user.c | 2
18 files changed, 1453 insertions(+), 62 deletions(-)

diff -urN linux-2.6.11-mm1/fs/attr.c linux/fs/attr.c
--- linux-2.6.11-mm1/fs/attr.c 2005-03-04 14:06:21.732297568 -0500
+++ linux/fs/attr.c 2005-03-04 13:27:05.560490128 -0500
@@ -10,7 +10,7 @@
#include <linux/mm.h>
#include <linux/string.h>
#include <linux/smp_lock.h>
-#include <linux/dnotify.h>
+#include <linux/fsnotify.h>
#include <linux/fcntl.h>
#include <linux/quotaops.h>
#include <linux/security.h>
@@ -107,31 +107,8 @@
out:
return error;
}
-
EXPORT_SYMBOL(inode_setattr);

-int setattr_mask(unsigned int ia_valid)
-{
- unsigned long dn_mask = 0;
-
- if (ia_valid & ATTR_UID)
- dn_mask |= DN_ATTRIB;
- if (ia_valid & ATTR_GID)
- dn_mask |= DN_ATTRIB;
- if (ia_valid & ATTR_SIZE)
- dn_mask |= DN_MODIFY;
- /* both times implies a utime(s) call */
- if ((ia_valid & (ATTR_ATIME|ATTR_MTIME)) == (ATTR_ATIME|ATTR_MTIME))
- dn_mask |= DN_ATTRIB;
- else if (ia_valid & ATTR_ATIME)
- dn_mask |= DN_ACCESS;
- else if (ia_valid & ATTR_MTIME)
- dn_mask |= DN_MODIFY;
- if (ia_valid & ATTR_MODE)
- dn_mask |= DN_ATTRIB;
- return dn_mask;
-}
-
int notify_change(struct dentry * dentry, struct iattr * attr)
{
struct inode *inode = dentry->d_inode;
@@ -194,11 +171,9 @@
if (ia_valid & ATTR_SIZE)
up_write(&dentry->d_inode->i_alloc_sem);

- if (!error) {
- unsigned long dn_mask = setattr_mask(ia_valid);
- if (dn_mask)
- dnotify_parent(dentry, dn_mask);
- }
+ if (!error)
+ fsnotify_change(dentry, ia_valid);
+
return error;
}

diff -urN linux-2.6.11-mm1/fs/compat.c linux/fs/compat.c
--- linux-2.6.11-mm1/fs/compat.c 2005-03-04 14:06:21.734297264 -0500
+++ linux/fs/compat.c 2005-03-04 13:27:05.562489824 -0500
@@ -36,7 +36,7 @@
#include <linux/ctype.h>
#include <linux/module.h>
#include <linux/dirent.h>
-#include <linux/dnotify.h>
+#include <linux/fsnotify.h>
#include <linux/highuid.h>
#include <linux/sunrpc/svc.h>
#include <linux/nfsd/nfsd.h>
@@ -1233,9 +1233,15 @@
out:
if (iov != iovstack)
kfree(iov);
- if ((ret + (type == READ)) > 0)
- dnotify_parent(file->f_dentry,
- (type == READ) ? DN_ACCESS : DN_MODIFY);
+ if ((ret + (type == READ)) > 0) {
+ struct dentry *dentry = file->f_dentry;
+ if (type == READ)
+ fsnotify_access(dentry, dentry->d_inode,
+ dentry->d_name.name);
+ else
+ fsnotify_modify(dentry, dentry->d_inode,
+ dentry->d_name.name);
+ }
return ret;
}

diff -urN linux-2.6.11-mm1/fs/file_table.c linux/fs/file_table.c
--- linux-2.6.11-mm1/fs/file_table.c 2005-03-04 14:06:21.734297264 -0500
+++ linux/fs/file_table.c 2005-03-04 13:27:05.563489672 -0500
@@ -16,6 +16,7 @@
#include <linux/eventpoll.h>
#include <linux/mount.h>
#include <linux/cdev.h>
+#include <linux/fsnotify.h>

/* sysctl tunables... */
struct files_stat_struct files_stat = {
@@ -122,6 +123,9 @@
struct vfsmount *mnt = file->f_vfsmnt;
struct inode *inode = dentry->d_inode;

+
+ fsnotify_close(dentry, inode, file->f_mode, dentry->d_name.name);
+
might_sleep();
/*
* The function eventpoll_release() should be the first called
diff -urN linux-2.6.11-mm1/fs/inode.c linux/fs/inode.c
--- linux-2.6.11-mm1/fs/inode.c 2005-03-04 14:06:21.737296808 -0500
+++ linux/fs/inode.c 2005-03-04 13:27:05.565489368 -0500
@@ -132,6 +132,10 @@
#ifdef CONFIG_QUOTA
memset(&inode->i_dquot, 0, sizeof(inode->i_dquot));
#endif
+#ifdef CONFIG_INOTIFY
+ INIT_LIST_HEAD(&inode->inotify_watches);
+ spin_lock_init(&inode->inotify_lock);
+#endif
inode->i_pipe = NULL;
inode->i_bdev = NULL;
inode->i_cdev = NULL;
diff -urN linux-2.6.11-mm1/fs/inotify.c linux/fs/inotify.c
--- linux-2.6.11-mm1/fs/inotify.c 1969-12-31 19:00:00.000000000 -0500
+++ linux/fs/inotify.c 2005-03-04 13:27:05.568488912 -0500
@@ -0,0 +1,1013 @@
+/*
+ * fs/inotify.c - inode-based file event notifications
+ *
+ * Authors:
+ * John McCutchan <[email protected]>
+ * Robert Love <[email protected]>
+ *
+ * Copyright (C) 2005 John McCutchan
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2, or (at your option) any
+ * later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/spinlock.h>
+#include <linux/idr.h>
+#include <linux/slab.h>
+#include <linux/fs.h>
+#include <linux/namei.h>
+#include <linux/poll.h>
+#include <linux/device.h>
+#include <linux/miscdevice.h>
+#include <linux/init.h>
+#include <linux/list.h>
+#include <linux/writeback.h>
+#include <linux/inotify.h>
+
+#include <asm/ioctls.h>
+
+static atomic_t inotify_cookie;
+
+static kmem_cache_t *watch_cachep;
+static kmem_cache_t *event_cachep;
+
+static int max_user_devices;
+static int max_user_watches;
+static unsigned int max_queued_events;
+
+/*
+ * Lock ordering:
+ *
+ * inode_lock (used to safely walk inode_in_use list)
+ * inode->inotify_lock (protects inode->inotify_watches and watches->i_list)
+ * inotify_dev->lock (protects inotify_device and watches->d_list)
+ */
+
+/*
+ * Lifetimes of the three main data structures -- inotify_device, inode, and
+ * inotify_watch -- are managed by reference count.
+ *
+ * inotify_device: Lifetime is from open until release. Additional references
+ * can bump the count via get_inotify_dev() and drop the count via
+ * put_inotify_dev().
+ *
+ * inotify_watch: Lifetime is from create_watch() to destroy_watch().
+ * Additional references can bump the count via get_inotify_watch() and drop
+ * the count via put_inotify_watch().
+ *
+ * inode: Pinned so long as the inode is associated with a watch, from
+ * create_watch() to put_inotify_watch().
+ */
+
+/*
+ * struct inotify_device - represents an open instance of an inotify device
+ *
+ * This structure is protected by 'lock'.
+ */
+struct inotify_device {
+ wait_queue_head_t wq; /* wait queue for i/o */
+ struct idr idr; /* idr mapping wd -> watch */
+ struct list_head events; /* list of queued events */
+ struct list_head watches; /* list of watches */
+ spinlock_t lock; /* protects this bad boy */
+ atomic_t count; /* reference count */
+ struct user_struct *user; /* user who opened this dev */
+ unsigned int queue_size; /* size of the queue (bytes) */
+ unsigned int event_count; /* number of pending events */
+ unsigned int max_events; /* maximum number of events */
+};
+
+/*
+ * struct inotify_kernel_event - An inotify event, originating from a watch and
+ * queued for user-space. A list of these is attached to each instance of the
+ * device. In read(), this list is walked and all events that can fit in the
+ * buffer are returned.
+ *
+ * Protected by dev->lock of the device in which we are queued.
+ */
+struct inotify_kernel_event {
+ struct inotify_event event; /* the user-space event */
+ struct list_head list; /* entry in inotify_device's list */
+ char *name; /* filename, if any */
+};
+
+/*
+ * struct inotify_watch - represents a watch request on a specific inode
+ *
+ * d_list is protected by dev->lock of the associated dev->watches.
+ * i_list and mask are protected by inode->inotify_lock of the associated inode.
+ * dev, inode, and wd are never written to once the watch is created.
+ */
+struct inotify_watch {
+ struct list_head d_list; /* entry in inotify_device's list */
+ struct list_head i_list; /* entry in inode's list */
+ atomic_t count; /* reference count */
+ struct inotify_device *dev; /* associated device */
+ struct inode *inode; /* associated inode */
+ s32 wd; /* watch descriptor */
+ u32 mask; /* event mask for this watch */
+};
+
+static ssize_t show_max_queued_events(struct class_device *class, char *buf)
+{
+ return sprintf(buf, "%u\n", max_queued_events);
+}
+
+static ssize_t store_max_queued_events(struct class_device *class,
+ const char *buf, size_t count)
+{
+ unsigned int max;
+
+ if (sscanf(buf, "%u", &max) > 0 && max > 0) {
+ max_queued_events = max;
+ return count;
+ }
+ return -EINVAL;
+}
+
+static ssize_t show_max_user_devices(struct class_device *class, char *buf)
+{
+ return sprintf(buf, "%d\n", max_user_devices);
+}
+
+static ssize_t store_max_user_devices(struct class_device *class,
+ const char *buf, size_t count)
+{
+ int max;
+
+ if (sscanf(buf, "%d", &max) > 0 && max > 0) {
+ max_user_devices = max;
+ return count;
+ }
+ return -EINVAL;
+}
+
+static ssize_t show_max_user_watches(struct class_device *class, char *buf)
+{
+ return sprintf(buf, "%d\n", max_user_watches);
+}
+
+static ssize_t store_max_user_watches(struct class_device *class,
+ const char *buf, size_t count)
+{
+ int max;
+
+ if (sscanf(buf, "%d", &max) > 0 && max > 0) {
+ max_user_watches = max;
+ return count;
+ }
+ return -EINVAL;
+}
+
+static CLASS_DEVICE_ATTR(max_queued_events, S_IRUGO | S_IWUSR,
+ show_max_queued_events, store_max_queued_events);
+static CLASS_DEVICE_ATTR(max_user_devices, S_IRUGO | S_IWUSR,
+ show_max_user_devices, store_max_user_devices);
+static CLASS_DEVICE_ATTR(max_user_watches, S_IRUGO | S_IWUSR,
+ show_max_user_watches, store_max_user_watches);
+
+static inline void get_inotify_dev(struct inotify_device *dev)
+{
+ atomic_inc(&dev->count);
+}
+
+static inline void put_inotify_dev(struct inotify_device *dev)
+{
+ if (atomic_dec_and_test(&dev->count)) {
+ atomic_dec(&dev->user->inotify_devs);
+ free_uid(dev->user);
+ kfree(dev);
+ }
+}
+
+static inline void get_inotify_watch(struct inotify_watch *watch)
+{
+ atomic_inc(&watch->count);
+}
+
+static inline void put_inotify_watch(struct inotify_watch *watch)
+{
+ if (atomic_dec_and_test(&watch->count)) {
+ put_inotify_dev(watch->dev);
+ iput(watch->inode);
+ kmem_cache_free(watch_cachep, watch);
+ }
+}
+
+/*
+ * kernel_event - create a new kernel event with the given parameters
+ */
+static struct inotify_kernel_event * kernel_event(s32 wd, u32 mask, u32 cookie,
+ const char *name)
+{
+ struct inotify_kernel_event *kevent;
+
+ /* XXX: optimally, we should use GFP_KERNEL */
+ kevent = kmem_cache_alloc(event_cachep, GFP_ATOMIC);
+ if (unlikely(!kevent))
+ return NULL;
+
+ /* we hand this out to user-space, so zero it just in case */
+ memset(&kevent->event, 0, sizeof(struct inotify_event));
+
+ kevent->event.wd = wd;
+ kevent->event.mask = mask;
+ kevent->event.cookie = cookie;
+
+ INIT_LIST_HEAD(&kevent->list);
+
+ if (name) {
+ size_t len, rem, event_size = sizeof(struct inotify_event);
+
+ /*
+ * We need to pad the filename so as to properly align an
+ * array of inotify_event structures. Because the structure is
+ * small and the common case is a small filename, we just round
+ * up to the next multiple of the structure's sizeof. This is
+ * simple and safe for all architectures.
+ */
+ len = strlen(name) + 1;
+ rem = event_size - len;
+ if (len > event_size) {
+ rem = event_size - (len % event_size);
+ if (len % event_size == 0)
+ rem = 0;
+ }
+ len += rem;
+
+ /* XXX: optimally, we should use GFP_KERNEL */
+ kevent->name = kmalloc(len, GFP_ATOMIC);
+ if (unlikely(!kevent->name)) {
+ kmem_cache_free(event_cachep, kevent);
+ return NULL;
+ }
+ memset(kevent->name, 0, len);
+ strncpy(kevent->name, name, strlen(name));
+ kevent->event.len = len;
+ } else {
+ kevent->event.len = 0;
+ kevent->name = NULL;
+ }
+
+ return kevent;
+}
+
+/*
+ * inotify_dev_get_event - return the next event in the given dev's queue
+ *
+ * Caller must hold dev->lock.
+ */
+static inline struct inotify_kernel_event *
+inotify_dev_get_event(struct inotify_device *dev)
+{
+ return list_entry(dev->events.next, struct inotify_kernel_event, list);
+}
+
+/*
+ * inotify_dev_queue_event - add a new event to the given device
+ *
+ * Caller must hold dev->lock.
+ */
+static void inotify_dev_queue_event(struct inotify_device *dev,
+ struct inotify_watch *watch, u32 mask,
+ u32 cookie, const char *name)
+{
+ struct inotify_kernel_event *kevent, *last;
+
+ /* coalescing: drop this event if it is a dupe of the previous */
+ last = list_entry(dev->events.prev, struct inotify_kernel_event, list);
+ if (dev->event_count && last->event.mask == mask &&
+ last->event.wd == watch->wd) {
+ const char *lastname = last->name;
+
+ if (!name && !lastname)
+ return;
+ if (name && lastname && !strcmp(lastname, name))
+ return;
+ }
+
+ /*
+ * The queue has already overflowed and we have already sent the
+ * Q_OVERFLOW event.
+ */
+ if (unlikely(dev->event_count > dev->max_events))
+ return;
+
+ /* if the queue overflows, we need to notify user space */
+ if (unlikely(dev->event_count == dev->max_events))
+ kevent = kernel_event(-1, IN_Q_OVERFLOW, cookie, NULL);
+ else
+ kevent = kernel_event(watch->wd, mask, cookie, name);
+
+ if (unlikely(!kevent))
+ return;
+
+ /* queue the event and wake up anyone waiting */
+ dev->event_count++;
+ dev->queue_size += sizeof(struct inotify_event) + kevent->event.len;
+ list_add_tail(&kevent->list, &dev->events);
+ wake_up_interruptible(&dev->wq);
+}
+
+static inline int inotify_dev_has_events(struct inotify_device *dev)
+{
+ return !list_empty(&dev->events);
+}
+
+/*
+ * remove_kevent - cleans up and ultimately frees the given kevent
+ */
+static void remove_kevent(struct inotify_device *dev,
+ struct inotify_kernel_event *kevent)
+{
+ BUG_ON(!dev);
+ BUG_ON(!kevent);
+
+ list_del(&kevent->list);
+
+ dev->event_count--;
+ dev->queue_size -= sizeof(struct inotify_event) + kevent->event.len;
+
+ if (kevent->name)
+ kfree(kevent->name);
+ kmem_cache_free(event_cachep, kevent);
+}
+
+/*
+ * inotify_dev_event_dequeue - destroy an event on the given device
+ *
+ * Caller must hold dev->lock.
+ */
+static void inotify_dev_event_dequeue(struct inotify_device *dev)
+{
+ if (inotify_dev_has_events(dev)) {
+ struct inotify_kernel_event *kevent;
+ kevent = inotify_dev_get_event(dev);
+ remove_kevent(dev, kevent);
+ }
+}
+
+/*
+ * inotify_dev_get_wd - returns the next WD for use by the given dev
+ *
+ * Grabs dev->lock. This function can sleep.
+ */
+static int inotify_dev_get_wd(struct inotify_device *dev,
+ struct inotify_watch *watch)
+{
+ int ret;
+
+repeat:
+ if (unlikely(!idr_pre_get(&dev->idr, GFP_KERNEL)))
+ return -ENOSPC;
+ spin_lock(&dev->lock);
+ ret = idr_get_new(&dev->idr, watch, &watch->wd);
+ spin_unlock(&dev->lock);
+ if (ret == -EAGAIN) /* more memory is required, try again */
+ goto repeat;
+ else if (ret) /* the idr is full! */
+ return -ENOSPC;
+
+ return 0;
+}
+
+/*
+ * create_watch - creates a watch on the given device.
+ *
+ * Calls inotify_dev_get_wd(), so it both grabs dev->lock and may sleep.
+ * Both 'dev' and 'inode' (by way of nameidata) need to be pinned.
+ */
+static struct inotify_watch *create_watch(struct inotify_device *dev,
+ u32 mask, struct inode *inode)
+{
+ struct inotify_watch *watch;
+
+ if (atomic_read(&dev->user->inotify_watches) >= max_user_watches)
+ return NULL;
+
+ watch = kmem_cache_alloc(watch_cachep, GFP_KERNEL);
+ if (unlikely(!watch))
+ return NULL;
+
+ if (unlikely(inotify_dev_get_wd(dev, watch))) {
+ kmem_cache_free(watch_cachep, watch);
+ return NULL;
+ }
+
+ watch->mask = mask;
+ atomic_set(&watch->count, 0);
+ INIT_LIST_HEAD(&watch->d_list);
+ INIT_LIST_HEAD(&watch->i_list);
+
+ /* save a reference to device and bump the count to make it official */
+ get_inotify_dev(dev);
+ watch->dev = dev;
+
+ /*
+ * Save a reference to the inode and bump the ref count to make it
+ * official. We hold a reference to nameidata, which makes this safe.
+ */
+ watch->inode = inode;
+ spin_lock(&inode_lock);
+ __iget(inode);
+ spin_unlock(&inode_lock);
+
+ /* bump our own count, corresponding to our entry in dev->watches */
+ get_inotify_watch(watch);
+
+ atomic_inc(&dev->user->inotify_watches);
+
+ return watch;
+}
+
+/*
+ * inode_find_dev - find the watch associated with the given inode and dev
+ *
+ * Callers must hold inode->inotify_lock.
+ */
+static struct inotify_watch *inode_find_dev(struct inode *inode,
+ struct inotify_device *dev)
+{
+ struct inotify_watch *watch;
+
+ list_for_each_entry(watch, &inode->inotify_watches, i_list) {
+ if (watch->dev == dev)
+ return watch;
+ }
+
+ return NULL;
+}
+
+/*
+ * inotify_dev_is_watching_inode - is this device watching this inode?
+ *
+ * Requires 'dev' and 'inode' to both be pinned and dev->lock be held.
+ */
+static inline int inotify_dev_is_watching_inode(struct inotify_device *dev,
+ struct inode *inode)
+{
+ struct inotify_watch *watch;
+
+ list_for_each_entry(watch, &dev->watches, d_list) {
+ if (watch->inode == inode)
+ return 1;
+ }
+
+ return 0;
+}
+
+/*
+ * remove_watch_no_event - remove_watch() without the IN_IGNORED event.
+ */
+static void remove_watch_no_event(struct inotify_watch *watch,
+ struct inotify_device *dev)
+{
+ BUG_ON(!dev);
+ BUG_ON(!watch);
+
+ list_del(&watch->i_list);
+ list_del(&watch->d_list);
+
+ atomic_dec(&dev->user->inotify_watches);
+ idr_remove(&dev->idr, watch->wd);
+ put_inotify_watch(watch);
+}
+
+/*
+ * remove_watch - Remove a watch from both the device and the inode. Sends
+ * the IN_IGNORED event to the given device signifying that the inode is no
+ * longer watched.
+ *
+ * Callers must hold both inode->inotify_lock and dev->lock. We drop a
+ * reference to the inode before returning.
+ */
+static void remove_watch(struct inotify_watch *watch,
+ struct inotify_device *dev)
+{
+ inotify_dev_queue_event(dev, watch, IN_IGNORED, 0, NULL);
+ remove_watch_no_event(watch, dev);
+}
+
+/* Kernel API */
+
+/**
+ * inotify_inode_queue_event - queue an event to all watches on this inode
+ * @inode: inode event is originating from
+ * @mask: event mask describing this event
+ * @cookie: cookie for synchronization, or zero
+ * @name: filename, if any
+ */
+void inotify_inode_queue_event(struct inode *inode, u32 mask, u32 cookie,
+ const char *name)
+{
+ struct inotify_watch *watch;
+
+ spin_lock(&inode->inotify_lock);
+ list_for_each_entry(watch, &inode->inotify_watches, i_list) {
+ if (watch->mask & mask) {
+ struct inotify_device *dev = watch->dev;
+ spin_lock(&dev->lock);
+ inotify_dev_queue_event(dev, watch, mask, cookie, name);
+ spin_unlock(&dev->lock);
+ }
+ }
+ spin_unlock(&inode->inotify_lock);
+}
+EXPORT_SYMBOL_GPL(inotify_inode_queue_event);
+
+/**
+ * inotify_dentry_parent_queue_event - queue an event to a dentry's parent
+ * @dentry: the dentry in question, we queue against this dentry's parent
+ * @mask: event mask describing this event
+ * @cookie: cookie for synchronization, or zero
+ * @name: filename, if any
+ */
+void inotify_dentry_parent_queue_event(struct dentry *dentry, u32 mask,
+ u32 cookie, const char *name)
+{
+ struct dentry *parent;
+ struct inode *inode;
+
+ spin_lock(&dentry->d_lock);
+ parent = dentry->d_parent;
+ inode = parent->d_inode;
+ if (!list_empty(&inode->inotify_watches)) {
+ dget(parent);
+ spin_unlock(&dentry->d_lock);
+ inotify_inode_queue_event(inode, mask, cookie, name);
+ dput(parent);
+ } else
+ spin_unlock(&dentry->d_lock);
+}
+EXPORT_SYMBOL_GPL(inotify_dentry_parent_queue_event);
+
+/**
+ * inotify_get_cookie - return a unique cookie for use in synchronizing events
+ *
+ * Returns the unique cookie.
+ */
+u32 inotify_get_cookie(void)
+{
+ return atomic_inc_return(&inotify_cookie);
+}
+EXPORT_SYMBOL_GPL(inotify_get_cookie);
+
+/**
+ * inotify_super_block_umount - process watches on an unmounted fs
+ * @sb: the super_block of the filesystem in question
+ */
+void inotify_super_block_umount(struct super_block *sb)
+{
+ struct inode *inode;
+
+ /* Walk the list of inodes and find those on this superblock */
+ spin_lock(&inode_lock);
+ list_for_each_entry(inode, &inode_in_use, i_list) {
+ struct inotify_watch *watch, *next;
+ struct list_head *watches;
+
+ if (inode->i_sb != sb)
+ continue;
+
+ /* for each watch, send IN_UNMOUNT and then remove it */
+ spin_lock(&inode->inotify_lock);
+ watches = &inode->inotify_watches;
+ list_for_each_entry_safe(watch, next, watches, i_list) {
+ struct inotify_device *dev = watch->dev;
+ spin_lock(&dev->lock);
+ inotify_dev_queue_event(dev, watch, IN_UNMOUNT,0,NULL);
+ remove_watch(watch, dev);
+ spin_unlock(&dev->lock);
+ }
+ spin_unlock(&inode->inotify_lock);
+ }
+ spin_unlock(&inode_lock);
+}
+EXPORT_SYMBOL_GPL(inotify_super_block_umount);
+
+/**
+ * inotify_inode_is_dead - an inode has been deleted, cleanup any watches
+ * @inode: inode that is about to be removed
+ */
+void inotify_inode_is_dead(struct inode *inode)
+{
+ struct inotify_watch *watch, *next;
+
+ spin_lock(&inode->inotify_lock);
+ list_for_each_entry_safe(watch, next, &inode->inotify_watches, i_list) {
+ struct inotify_device *dev = watch->dev;
+ spin_lock(&dev->lock);
+ remove_watch(watch, dev);
+ spin_unlock(&dev->lock);
+ }
+ spin_unlock(&inode->inotify_lock);
+}
+EXPORT_SYMBOL_GPL(inotify_inode_is_dead);
+
+/* Device Interface */
+
+static unsigned int inotify_poll(struct file *file, poll_table *wait)
+{
+ struct inotify_device *dev;
+ int ret = 0;
+
+ dev = file->private_data;
+ get_inotify_dev(dev);
+
+ poll_wait(file, &dev->wq, wait);
+ spin_lock(&dev->lock);
+ if (inotify_dev_has_events(dev))
+ ret = POLLIN | POLLRDNORM;
+ spin_unlock(&dev->lock);
+
+ put_inotify_dev(dev);
+ return ret;
+}
+
+static ssize_t inotify_read(struct file *file, char __user *buf,
+ size_t count, loff_t *pos)
+{
+ size_t event_size;
+ struct inotify_device *dev;
+ char __user *start;
+ int ret;
+ DEFINE_WAIT(wait);
+
+ start = buf;
+ dev = file->private_data;
+
+ /* we only hand out full inotify events */
+ event_size = sizeof(struct inotify_event);
+ if (count < event_size)
+ return 0;
+
+ while (1) {
+ int events;
+
+ prepare_to_wait(&dev->wq, &wait, TASK_INTERRUPTIBLE);
+
+ spin_lock(&dev->lock);
+ events = inotify_dev_has_events(dev);
+ spin_unlock(&dev->lock);
+ if (events) {
+ ret = 0;
+ break;
+ }
+
+ if (file->f_flags & O_NONBLOCK) {
+ ret = -EAGAIN;
+ break;
+ }
+
+ if (signal_pending(current)) {
+ ret = -EINTR;
+ break;
+ }
+
+ schedule();
+ }
+
+ finish_wait(&dev->wq, &wait);
+ if (ret)
+ return ret;
+
+ while (1) {
+ struct inotify_kernel_event *kevent;
+
+ spin_lock(&dev->lock);
+ if (!inotify_dev_has_events(dev)) {
+ spin_unlock(&dev->lock);
+ break;
+ }
+ kevent = inotify_dev_get_event(dev);
+ if (event_size + kevent->event.len > count) {
+ spin_unlock(&dev->lock);
+ break;
+ }
+ list_del_init(&kevent->list);
+ spin_unlock(&dev->lock);
+
+ if (copy_to_user(buf, &kevent->event, event_size)) {
+ /* put the event back on the queue */
+ spin_lock(&dev->lock);
+ list_add(&kevent->list, &dev->events);
+ spin_unlock(&dev->lock);
+ return -EFAULT;
+ }
+ buf += event_size;
+ count -= event_size;
+
+ if (kevent->name) {
+ if (copy_to_user(buf, kevent->name, kevent->event.len)){
+ /* put the event back on the queue */
+ spin_lock(&dev->lock);
+ list_add(&kevent->list, &dev->events);
+ spin_unlock(&dev->lock);
+ return -EFAULT;
+ }
+ buf += kevent->event.len;
+ count -= kevent->event.len;
+ }
+
+ /*
+ * We made it here, so the event was copied to the user. It is
+ * already removed from the event list, just free it.
+ */
+ spin_lock(&dev->lock);
+ remove_kevent(dev, kevent);
+ spin_unlock(&dev->lock);
+ }
+
+ return buf - start;
+}
+
+static int inotify_open(struct inode *inode, struct file *file)
+{
+ struct inotify_device *dev;
+ struct user_struct *user;
+ int ret;
+
+ user = get_uid(current->user);
+
+ if (unlikely(atomic_read(&user->inotify_devs) >= max_user_devices)) {
+ ret = -EMFILE;
+ goto out_err;
+ }
+
+ dev = kmalloc(sizeof(struct inotify_device), GFP_KERNEL);
+ if (unlikely(!dev)) {
+ ret = -ENOMEM;
+ goto out_err;
+ }
+
+ idr_init(&dev->idr);
+ INIT_LIST_HEAD(&dev->events);
+ INIT_LIST_HEAD(&dev->watches);
+ init_waitqueue_head(&dev->wq);
+ spin_lock_init(&dev->lock);
+
+ dev->event_count = 0;
+ dev->queue_size = 0;
+ dev->max_events = max_queued_events;
+ dev->user = user;
+ atomic_set(&dev->count, 0);
+
+ get_inotify_dev(dev);
+ atomic_inc(&current->user->inotify_devs);
+
+ file->private_data = dev;
+
+ return 0;
+out_err:
+ free_uid(current->user);
+ return ret;
+}
+
+static int inotify_release(struct inode *inode, struct file *file)
+{
+ struct inotify_device *dev;
+
+ dev = file->private_data;
+ BUG_ON(!dev);
+
+ /*
+ * Destroy all of the watches on this device. Unfortunately, not very
+ * pretty. We cannot do a simple iteration over the list, because we
+ * do not know the inode until we iterate to the watch. But we need to
+ * hold inode->inotify_lock before dev->lock. The following works.
+ */
+ while (1) {
+ struct inotify_watch *watch;
+ struct list_head *watches;
+ struct inode *inode;
+
+ spin_lock(&dev->lock);
+ watches = &dev->watches;
+ if (list_empty(watches)) {
+ spin_unlock(&dev->lock);
+ break;
+ }
+ watch = list_entry(watches->next, struct inotify_watch, d_list);
+ get_inotify_watch(watch);
+ spin_unlock(&dev->lock);
+
+ inode = watch->inode;
+ spin_lock(&inode->inotify_lock);
+ spin_lock(&dev->lock);
+ remove_watch_no_event(watch, dev);
+ spin_unlock(&dev->lock);
+ spin_unlock(&inode->inotify_lock);
+ put_inotify_watch(watch);
+ }
+
+ /* destroy all of the events on this device */
+ spin_lock(&dev->lock);
+ while (inotify_dev_has_events(dev))
+ inotify_dev_event_dequeue(dev);
+ spin_unlock(&dev->lock);
+
+ /* free this device: the put matching the get in inotify_open() */
+ put_inotify_dev(dev);
+
+ return 0;
+}
+
+static int inotify_add_watch(struct inotify_device *dev,
+ struct inotify_watch_request *request)
+{
+ struct inode *inode;
+ struct inotify_watch *watch, *old;
+ struct nameidata nd;
+ int ret;
+
+ ret = __user_walk(request->name, LOOKUP_FOLLOW, &nd);
+ if (unlikely(ret))
+ return ret;
+
+ /* you can only watch an inode if you have read permissions on it */
+ ret = permission(nd.dentry->d_inode, MAY_READ, NULL);
+ if (unlikely(ret))
+ goto nd_out;
+
+ /* inode is held in place by a reference on nd */
+ inode = nd.dentry->d_inode;
+
+ /*
+ * Handle the case of re-adding a watch on an (inode,dev) pair that we
+ * are already watching. We just update the mask and return its wd.
+ */
+ spin_lock(&inode->inotify_lock);
+ spin_lock(&dev->lock);
+ old = inode_find_dev(inode, dev);
+ if (unlikely(old)) {
+ old->mask = request->mask;
+ ret = old->wd;
+ spin_unlock(&dev->lock);
+ spin_unlock(&inode->inotify_lock);
+ goto nd_out;
+ }
+ spin_unlock(&dev->lock);
+ spin_unlock(&inode->inotify_lock);
+
+ /*
+ * We do this lockless, for both scalability and so we can allocate
+ * with GFP_KERNEL. But that means we can race here and add this watch
+ * twice. We fix up that case below, by again checking for the watch
+ * after we reacquire the locks.
+ */
+ watch = create_watch(dev, request->mask, inode);
+ if (unlikely(!watch)) {
+ ret = -ENOSPC;
+ goto nd_out;
+ }
+
+ spin_lock(&inode->inotify_lock);
+ spin_lock(&dev->lock);
+ old = inode_find_dev(inode, dev);
+ if (unlikely(old)) {
+ /* We raced! Destroy this watch and return. */
+ put_inotify_watch(watch);
+ ret = -EBUSY;
+ } else {
+ /* Add the watch to the device's and the inode's list */
+ list_add(&watch->d_list, &dev->watches);
+ list_add(&watch->i_list, &inode->inotify_watches);
+ ret = watch->wd;
+ }
+ spin_unlock(&dev->lock);
+ spin_unlock(&inode->inotify_lock);
+
+nd_out:
+ path_release(&nd);
+
+ return ret;
+}
+
+/*
+ * inotify_ignore - handle the INOTIFY_IGNORE ioctl, asking that a given wd be
+ * removed from the device.
+ */
+static int inotify_ignore(struct inotify_device *dev, s32 wd)
+{
+ struct inotify_watch *watch;
+ struct inode *inode;
+
+ spin_lock(&dev->lock);
+ watch = idr_find(&dev->idr, wd);
+ spin_unlock(&dev->lock);
+
+ if (unlikely(!watch))
+ return -EINVAL;
+
+ inode = watch->inode;
+ spin_lock(&inode->inotify_lock);
+ spin_lock(&dev->lock);
+ remove_watch(watch, dev);
+ spin_unlock(&dev->lock);
+ spin_unlock(&inode->inotify_lock);
+
+ return 0;
+}
+
+static int inotify_ioctl(struct inode *ip, struct file *file, unsigned int cmd,
+ unsigned long arg)
+{
+ struct inotify_device *dev;
+ struct inotify_watch_request request;
+ void __user *p;
+ int ret = -ENOTTY;
+ s32 wd;
+
+ dev = file->private_data;
+ p = (void __user *) arg;
+
+ get_inotify_dev(dev);
+
+ switch (cmd) {
+ case INOTIFY_WATCH:
+ if (unlikely(copy_from_user(&request, p, sizeof (request)))) {
+ ret = -EFAULT;
+ break;
+ }
+ ret = inotify_add_watch(dev, &request);
+ break;
+ case INOTIFY_IGNORE:
+ if (unlikely(copy_from_user(&wd, p, sizeof (wd)))) {
+ ret = -EFAULT;
+ break;
+ }
+ ret = inotify_ignore(dev, wd);
+ break;
+ case FIONREAD:
+ ret = put_user(dev->queue_size, (int __user *) p);
+ break;
+ }
+
+ put_inotify_dev(dev);
+
+ return ret;
+}
+
+static struct file_operations inotify_fops = {
+ .owner = THIS_MODULE,
+ .poll = inotify_poll,
+ .read = inotify_read,
+ .open = inotify_open,
+ .release = inotify_release,
+ .ioctl = inotify_ioctl,
+};
+
+static struct miscdevice inotify_device = {
+ .minor = MISC_DYNAMIC_MINOR,
+ .name = "inotify",
+ .fops = &inotify_fops,
+};
+
+/*
+ * inotify_init - Our initialization function. Note that we cannot return
+ * error because we have compiled-in VFS hooks. So an (unlikely) failure here
+ * must result in panic().
+ */
+static int __init inotify_init(void)
+{
+ struct class_device *class;
+ int ret;
+
+ ret = misc_register(&inotify_device);
+ if (unlikely(ret))
+ panic("inotify: misc_register returned %d\n", ret);
+
+ max_queued_events = 512;
+ max_user_devices = 64;
+ max_user_watches = 16384;
+
+ class = inotify_device.class;
+ class_device_create_file(class, &class_device_attr_max_queued_events);
+ class_device_create_file(class, &class_device_attr_max_user_devices);
+ class_device_create_file(class, &class_device_attr_max_user_watches);
+
+ atomic_set(&inotify_cookie, 0);
+
+ watch_cachep = kmem_cache_create("inotify_watch_cache",
+ sizeof(struct inotify_watch),
+ 0, SLAB_PANIC, NULL, NULL);
+ event_cachep = kmem_cache_create("inotify_event_cache",
+ sizeof(struct inotify_kernel_event),
+ 0, SLAB_PANIC, NULL, NULL);
+
+ printk(KERN_INFO "inotify device minor=%d\n", inotify_device.minor);
+
+ return 0;
+}
+
+module_init(inotify_init);
diff -urN linux-2.6.11-mm1/fs/Kconfig linux/fs/Kconfig
--- linux-2.6.11-mm1/fs/Kconfig 2005-03-04 13:23:55.609367088 -0500
+++ linux/fs/Kconfig 2005-03-04 13:27:05.570488608 -0500
@@ -344,6 +344,19 @@
If you don't know whether you need it, then you don't need it:
answer N.

+config INOTIFY
+ bool "Inotify file change notification support"
+ default y
+ ---help---
+ Say Y here to enable inotify support and the /dev/inotify character
+ device. Inotify is a file change notification system and a
+ replacement for dnotify. Inotify fixes numerous shortcomings in
+ dnotify and introduces several new features. It allows monitoring
+ of both files and directories via a single open fd. Multiple file
+ events are supported.
+
+ If unsure, say Y.
+
config QUOTA
bool "Quota support"
help
diff -urN linux-2.6.11-mm1/fs/Makefile linux/fs/Makefile
--- linux-2.6.11-mm1/fs/Makefile 2005-03-04 13:23:55.616366024 -0500
+++ linux/fs/Makefile 2005-03-04 13:27:05.571488456 -0500
@@ -11,6 +11,7 @@
attr.o bad_inode.o file.o filesystems.o namespace.o aio.o \
seq_file.o xattr.o libfs.o fs-writeback.o mpage.o direct-io.o \

+obj-$(CONFIG_INOTIFY) += inotify.o
obj-$(CONFIG_EPOLL) += eventpoll.o
obj-$(CONFIG_COMPAT) += compat.o

diff -urN linux-2.6.11-mm1/fs/namei.c linux/fs/namei.c
--- linux-2.6.11-mm1/fs/namei.c 2005-03-04 14:06:21.750294832 -0500
+++ linux/fs/namei.c 2005-03-04 13:27:05.574488000 -0500
@@ -21,7 +21,7 @@
#include <linux/namei.h>
#include <linux/quotaops.h>
#include <linux/pagemap.h>
-#include <linux/dnotify.h>
+#include <linux/fsnotify.h>
#include <linux/smp_lock.h>
#include <linux/personality.h>
#include <linux/security.h>
@@ -1261,7 +1261,7 @@
DQUOT_INIT(dir);
error = dir->i_op->create(dir, dentry, mode, nd);
if (!error) {
- inode_dir_notify(dir, DN_CREATE);
+ fsnotify_create(dir, dentry->d_name.name);
security_inode_post_create(dir, dentry, mode);
}
return error;
@@ -1566,7 +1566,7 @@
DQUOT_INIT(dir);
error = dir->i_op->mknod(dir, dentry, mode, dev);
if (!error) {
- inode_dir_notify(dir, DN_CREATE);
+ fsnotify_create(dir, dentry->d_name.name);
security_inode_post_mknod(dir, dentry, mode, dev);
}
return error;
@@ -1639,7 +1639,7 @@
DQUOT_INIT(dir);
error = dir->i_op->mkdir(dir, dentry, mode);
if (!error) {
- inode_dir_notify(dir, DN_CREATE);
+ fsnotify_mkdir(dir, dentry->d_name.name);
security_inode_post_mkdir(dir,dentry, mode);
}
return error;
@@ -1729,10 +1729,8 @@
}
}
up(&dentry->d_inode->i_sem);
- if (!error) {
- inode_dir_notify(dir, DN_DELETE);
- d_delete(dentry);
- }
+ if (!error)
+ fsnotify_rmdir(dentry, dentry->d_inode, dir);
dput(dentry);

return error;
@@ -1802,10 +1800,9 @@
up(&dentry->d_inode->i_sem);

/* We don't d_delete() NFS sillyrenamed files--they still exist. */
- if (!error && !(dentry->d_flags & DCACHE_NFSFS_RENAMED)) {
- d_delete(dentry);
- inode_dir_notify(dir, DN_DELETE);
- }
+ if (!error && !(dentry->d_flags & DCACHE_NFSFS_RENAMED))
+ fsnotify_unlink(dentry->d_inode, dir, dentry);
+
return error;
}

@@ -1879,7 +1876,7 @@
DQUOT_INIT(dir);
error = dir->i_op->symlink(dir, dentry, oldname);
if (!error) {
- inode_dir_notify(dir, DN_CREATE);
+ fsnotify_create(dir, dentry->d_name.name);
security_inode_post_symlink(dir, dentry, oldname);
}
return error;
@@ -1952,7 +1949,7 @@
error = dir->i_op->link(old_dentry, dir, new_dentry);
up(&old_dentry->d_inode->i_sem);
if (!error) {
- inode_dir_notify(dir, DN_CREATE);
+ fsnotify_create(dir, new_dentry->d_name.name);
security_inode_post_link(old_dentry, dir, new_dentry);
}
return error;
@@ -2116,6 +2113,7 @@
{
int error;
int is_dir = S_ISDIR(old_dentry->d_inode->i_mode);
+ char *old_name;

if (old_dentry->d_inode == new_dentry->d_inode)
return 0;
@@ -2137,18 +2135,18 @@
DQUOT_INIT(old_dir);
DQUOT_INIT(new_dir);

+ old_name = fsnotify_oldname_init(old_dentry);
+
if (is_dir)
error = vfs_rename_dir(old_dir,old_dentry,new_dir,new_dentry);
else
error = vfs_rename_other(old_dir,old_dentry,new_dir,new_dentry);
if (!error) {
- if (old_dir == new_dir)
- inode_dir_notify(old_dir, DN_RENAME);
- else {
- inode_dir_notify(old_dir, DN_DELETE);
- inode_dir_notify(new_dir, DN_CREATE);
- }
+ const char *new_name = old_dentry->d_name.name;
+ fsnotify_move(old_dir, new_dir, old_name, new_name);
}
+ fsnotify_oldname_free(old_name);
+
return error;
}

diff -urN linux-2.6.11-mm1/fs/open.c linux/fs/open.c
--- linux-2.6.11-mm1/fs/open.c 2005-03-04 14:06:21.751294680 -0500
+++ linux/fs/open.c 2005-03-04 13:27:05.576487696 -0500
@@ -10,7 +10,7 @@
#include <linux/file.h>
#include <linux/smp_lock.h>
#include <linux/quotaops.h>
-#include <linux/dnotify.h>
+#include <linux/fsnotify.h>
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/tty.h>
@@ -944,9 +944,14 @@
fd = get_unused_fd();
if (fd >= 0) {
struct file *f = filp_open(tmp, flags, mode);
+ struct dentry *dentry;
+
error = PTR_ERR(f);
if (IS_ERR(f))
goto out_error;
+ dentry = f->f_dentry;
+ fsnotify_open(dentry, dentry->d_inode,
+ dentry->d_name.name);
fd_install(fd, f);
}
out:
@@ -998,7 +1003,7 @@
retval = err;
}

- dnotify_flush(filp, id);
+ fsnotify_flush(filp, id);
locks_remove_posix(filp, id);
fput(filp);
return retval;
diff -urN linux-2.6.11-mm1/fs/read_write.c linux/fs/read_write.c
--- linux-2.6.11-mm1/fs/read_write.c 2005-03-04 14:06:21.753294376 -0500
+++ linux/fs/read_write.c 2005-03-04 13:27:05.577487544 -0500
@@ -10,7 +10,7 @@
#include <linux/file.h>
#include <linux/uio.h>
#include <linux/smp_lock.h>
-#include <linux/dnotify.h>
+#include <linux/fsnotify.h>
#include <linux/security.h>
#include <linux/module.h>
#include <linux/syscalls.h>
@@ -239,7 +239,10 @@
else
ret = do_sync_read(file, buf, count, pos);
if (ret > 0) {
- dnotify_parent(file->f_dentry, DN_ACCESS);
+ struct dentry *dentry = file->f_dentry;
+ struct inode *inode = dentry->d_inode;
+ fsnotify_access(dentry, inode,
+ dentry->d_name.name);
current->rchar += ret;
}
current->syscr++;
@@ -287,7 +290,10 @@
else
ret = do_sync_write(file, buf, count, pos);
if (ret > 0) {
- dnotify_parent(file->f_dentry, DN_MODIFY);
+ struct dentry *dentry = file->f_dentry;
+ struct inode *inode = dentry->d_inode;
+ fsnotify_modify(dentry, inode,
+ dentry->d_name.name);
current->wchar += ret;
}
current->syscw++;
@@ -523,9 +529,15 @@
out:
if (iov != iovstack)
kfree(iov);
- if ((ret + (type == READ)) > 0)
- dnotify_parent(file->f_dentry,
- (type == READ) ? DN_ACCESS : DN_MODIFY);
+ if ((ret + (type == READ)) > 0) {
+ struct dentry *dentry = file->f_dentry;
+ struct inode *inode = dentry->d_inode;
+
+ if (type == READ)
+ fsnotify_access(dentry, inode, dentry->d_name.name);
+ else
+ fsnotify_modify(dentry, inode, dentry->d_name.name);
+ }
return ret;
Efault:
ret = -EFAULT;
diff -urN linux-2.6.11-mm1/fs/super.c linux/fs/super.c
--- linux-2.6.11-mm1/fs/super.c 2005-03-04 14:06:21.754294224 -0500
+++ linux/fs/super.c 2005-03-04 13:27:05.578487392 -0500
@@ -36,6 +36,7 @@
#include <linux/vfs.h>
#include <linux/writeback.h> /* for the emergency remount stuff */
#include <linux/idr.h>
+#include <linux/fsnotify.h>
#include <linux/kobject.h>
#include <asm/uaccess.h>

@@ -229,6 +230,7 @@

if (root) {
sb->s_root = NULL;
+ fsnotify_sb_umount(sb);
shrink_dcache_parent(root);
shrink_dcache_anon(&sb->s_anon);
dput(root);
diff -urN linux-2.6.11-mm1/include/linux/fs.h linux/include/linux/fs.h
--- linux-2.6.11-mm1/include/linux/fs.h 2005-03-04 14:06:21.755294072 -0500
+++ linux/include/linux/fs.h 2005-03-04 13:27:05.581486936 -0500
@@ -223,6 +223,7 @@
struct kstatfs;
struct vm_area_struct;
struct vfsmount;
+struct inotify_inode_data;

/* Used to be a macro which just called the function, now just a function */
extern void update_atime (struct inode *);
@@ -473,6 +474,11 @@
struct dnotify_struct *i_dnotify; /* for directory notifications */
#endif

+#ifdef CONFIG_INOTIFY
+ struct list_head inotify_watches; /* watches on this inode */
+ spinlock_t inotify_lock; /* protects the watches list */
+#endif
+
unsigned long i_state;
unsigned long dirtied_when; /* jiffies of first dirtying */

@@ -1368,7 +1374,7 @@
extern int do_remount_sb(struct super_block *sb, int flags,
void *data, int force);
extern sector_t bmap(struct inode *, sector_t);
-extern int setattr_mask(unsigned int);
+extern void setattr_mask(unsigned int, int *, u32 *);
extern int notify_change(struct dentry *, struct iattr *);
extern int permission(struct inode *, int, struct nameidata *);
extern int generic_permission(struct inode *, int,
diff -urN linux-2.6.11-mm1/include/linux/fsnotify.h linux/include/linux/fsnotify.h
--- linux-2.6.11-mm1/include/linux/fsnotify.h 1969-12-31 19:00:00.000000000 -0500
+++ linux/include/linux/fsnotify.h 2005-03-04 13:27:05.583486632 -0500
@@ -0,0 +1,235 @@
+#ifndef _LINUX_FS_NOTIFY_H
+#define _LINUX_FS_NOTIFY_H
+
+/*
+ * include/linux/fsnotify.h - generic hooks for filesystem notification, to
+ * reduce in-source duplication from both dnotify and inotify.
+ *
+ * We don't compile any of this away in some complicated menagerie of ifdefs.
+ * Instead, we rely on the code inside to optimize away as needed.
+ *
+ * (C) Copyright 2005 Robert Love
+ */
+
+#ifdef __KERNEL__
+
+#include <linux/dnotify.h>
+#include <linux/inotify.h>
+
+/*
+ * fsnotify_move - file old_name at old_dir was moved to new_name at new_dir
+ */
+static inline void fsnotify_move(struct inode *old_dir, struct inode *new_dir,
+ const char *old_name, const char *new_name)
+{
+ u32 cookie;
+
+ if (old_dir == new_dir)
+ inode_dir_notify(old_dir, DN_RENAME);
+ else {
+ inode_dir_notify(old_dir, DN_DELETE);
+ inode_dir_notify(new_dir, DN_CREATE);
+ }
+
+ cookie = inotify_get_cookie();
+
+ inotify_inode_queue_event(old_dir, IN_MOVED_FROM, cookie, old_name);
+ inotify_inode_queue_event(new_dir, IN_MOVED_TO, cookie, new_name);
+}
+
+/*
+ * fsnotify_unlink - file was unlinked
+ */
+static inline void fsnotify_unlink(struct inode *inode, struct inode *dir,
+ struct dentry *dentry)
+{
+ inode_dir_notify(dir, DN_DELETE);
+ inotify_inode_queue_event(dir, IN_DELETE_FILE, 0, dentry->d_name.name);
+ inotify_inode_queue_event(inode, IN_DELETE_SELF, 0, NULL);
+
+ inotify_inode_is_dead(inode);
+ d_delete(dentry);
+}
+
+/*
+ * fsnotify_rmdir - directory was removed
+ */
+static inline void fsnotify_rmdir(struct dentry *dentry, struct inode *inode,
+ struct inode *dir)
+{
+ inode_dir_notify(dir, DN_DELETE);
+ inotify_inode_queue_event(dir, IN_DELETE_SUBDIR,0,dentry->d_name.name);
+ inotify_inode_queue_event(inode, IN_DELETE_SELF, 0, NULL);
+
+ inotify_inode_is_dead(inode);
+ d_delete(dentry);
+}
+
+/*
+ * fsnotify_create - filename was linked in
+ */
+static inline void fsnotify_create(struct inode *inode, const char *filename)
+{
+ inode_dir_notify(inode, DN_CREATE);
+ inotify_inode_queue_event(inode, IN_CREATE_FILE, 0, filename);
+}
+
+/*
+ * fsnotify_mkdir - directory 'name' was created
+ */
+static inline void fsnotify_mkdir(struct inode *inode, const char *name)
+{
+ inode_dir_notify(inode, DN_CREATE);
+ inotify_inode_queue_event(inode, IN_CREATE_SUBDIR, 0, name);
+}
+
+/*
+ * fsnotify_access - file was read
+ */
+static inline void fsnotify_access(struct dentry *dentry, struct inode *inode,
+ const char *filename)
+{
+ dnotify_parent(dentry, DN_ACCESS);
+ inotify_dentry_parent_queue_event(dentry, IN_ACCESS, 0,
+ dentry->d_name.name);
+ inotify_inode_queue_event(inode, IN_ACCESS, 0, NULL);
+}
+
+/*
+ * fsnotify_modify - file was modified
+ */
+static inline void fsnotify_modify(struct dentry *dentry, struct inode *inode,
+ const char *filename)
+{
+ dnotify_parent(dentry, DN_MODIFY);
+ inotify_dentry_parent_queue_event(dentry, IN_MODIFY, 0, filename);
+ inotify_inode_queue_event(inode, IN_MODIFY, 0, NULL);
+}
+
+/*
+ * fsnotify_open - file was opened
+ */
+static inline void fsnotify_open(struct dentry *dentry, struct inode *inode,
+ const char *filename)
+{
+ inotify_inode_queue_event(inode, IN_OPEN, 0, NULL);
+ inotify_dentry_parent_queue_event(dentry, IN_OPEN, 0, filename);
+}
+
+/*
+ * fsnotify_close - file was closed
+ */
+static inline void fsnotify_close(struct dentry *dentry, struct inode *inode,
+ mode_t mode, const char *filename)
+{
+ u32 mask;
+
+ mask = (mode & FMODE_WRITE) ? IN_CLOSE_WRITE : IN_CLOSE_NOWRITE;
+ inotify_dentry_parent_queue_event(dentry, mask, 0, filename);
+ inotify_inode_queue_event(inode, mask, 0, NULL);
+}
+
+/*
+ * fsnotify_change - notify_change event. file was modified and/or metadata
+ * was changed.
+ */
+static inline void fsnotify_change(struct dentry *dentry, unsigned int ia_valid)
+{
+ int dn_mask = 0;
+ u32 in_mask = 0;
+
+ if (ia_valid & ATTR_UID) {
+ in_mask |= IN_ATTRIB;
+ dn_mask |= DN_ATTRIB;
+ }
+ if (ia_valid & ATTR_GID) {
+ in_mask |= IN_ATTRIB;
+ dn_mask |= DN_ATTRIB;
+ }
+ if (ia_valid & ATTR_SIZE) {
+ in_mask |= IN_MODIFY;
+ dn_mask |= DN_MODIFY;
+ }
+ /* both times implies a utime(s) call */
+ if ((ia_valid & (ATTR_ATIME | ATTR_MTIME)) == (ATTR_ATIME | ATTR_MTIME))
+ {
+ in_mask |= IN_ATTRIB;
+ dn_mask |= DN_ATTRIB;
+ } else if (ia_valid & ATTR_ATIME) {
+ in_mask |= IN_ACCESS;
+ dn_mask |= DN_ACCESS;
+ } else if (ia_valid & ATTR_MTIME) {
+ in_mask |= IN_MODIFY;
+ dn_mask |= DN_MODIFY;
+ }
+ if (ia_valid & ATTR_MODE) {
+ in_mask |= IN_ATTRIB;
+ dn_mask |= DN_ATTRIB;
+ }
+
+ if (dn_mask)
+ dnotify_parent(dentry, dn_mask);
+ if (in_mask) {
+ inotify_inode_queue_event(dentry->d_inode, in_mask, 0, NULL);
+ inotify_dentry_parent_queue_event(dentry, in_mask, 0,
+ dentry->d_name.name);
+ }
+}
+
+/*
+ * fsnotify_sb_umount - filesystem unmount
+ */
+static inline void fsnotify_sb_umount(struct super_block *sb)
+{
+ inotify_super_block_umount(sb);
+}
+
+/*
+ * fsnotify_flush - flush time!
+ */
+static inline void fsnotify_flush(struct file *filp, fl_owner_t id)
+{
+ dnotify_flush(filp, id);
+}
+
+#ifdef CONFIG_INOTIFY /* inotify helpers */
+
+/*
+ * fsnotify_oldname_init - save off the old filename before we change it
+ *
+ * this could be kstrdup if only we could add that to lib/string.c
+ */
+static inline char *fsnotify_oldname_init(struct dentry *old_dentry)
+{
+ char *old_name;
+
+ old_name = kmalloc(strlen(old_dentry->d_name.name) + 1, GFP_KERNEL);
+ if (old_name)
+ strcpy(old_name, old_dentry->d_name.name);
+ return old_name;
+}
+
+/*
+ * fsnotify_oldname_free - free the name we got from fsnotify_oldname_init
+ */
+static inline void fsnotify_oldname_free(const char *old_name)
+{
+ kfree(old_name);
+}
+
+#else /* CONFIG_INOTIFY */
+
+static inline char *fsnotify_oldname_init(struct dentry *old_dentry)
+{
+ return NULL;
+}
+
+static inline void fsnotify_oldname_free(const char *old_name)
+{
+}
+
+#endif /* ! CONFIG_INOTIFY */
+
+#endif /* __KERNEL__ */
+
+#endif /* _LINUX_FS_NOTIFY_H */
diff -urN linux-2.6.11-mm1/include/linux/inotify.h linux/include/linux/inotify.h
--- linux-2.6.11-mm1/include/linux/inotify.h 1969-12-31 19:00:00.000000000 -0500
+++ linux/include/linux/inotify.h 2005-03-04 13:27:05.584486480 -0500
@@ -0,0 +1,113 @@
+/*
+ * Inode based directory notification for Linux
+ *
+ * Copyright (C) 2005 John McCutchan
+ */
+
+#ifndef _LINUX_INOTIFY_H
+#define _LINUX_INOTIFY_H
+
+#include <linux/types.h>
+#include <linux/limits.h>
+
+/*
+ * struct inotify_event - structure read from the inotify device for each event
+ *
+ * When you are watching a directory, you will receive the filename for events
+ * such as IN_CREATE, IN_DELETE, IN_OPEN, IN_CLOSE, ..., relative to the wd.
+ */
+struct inotify_event {
+ __s32 wd; /* watch descriptor */
+ __u32 mask; /* watch mask */
+ __u32 cookie; /* cookie to synchronize two events */
+ size_t len; /* length (including nulls) of name */
+ char name[0]; /* stub for possible name */
+};
+
+/*
+ * struct inotify_watch_request - represents a watch request
+ *
+ * Pass to the inotify device via the INOTIFY_WATCH ioctl
+ */
+struct inotify_watch_request {
+ char *name; /* filename */
+ __u32 mask; /* event mask */
+};
+
+/* the following are legal, implemented events */
+#define IN_ACCESS 0x00000001 /* File was accessed */
+#define IN_MODIFY 0x00000002 /* File was modified */
+#define IN_ATTRIB 0x00000004 /* File changed attributes */
+#define IN_CLOSE_WRITE 0x00000008 /* Writable file was closed */
+#define IN_CLOSE_NOWRITE 0x00000010 /* Unwritable file closed */
+#define IN_OPEN 0x00000020 /* File was opened */
+#define IN_MOVED_FROM 0x00000040 /* File was moved from X */
+#define IN_MOVED_TO 0x00000080 /* File was moved to Y */
+#define IN_DELETE_SUBDIR 0x00000100 /* Subdir was deleted */
+#define IN_DELETE_FILE 0x00000200 /* Subfile was deleted */
+#define IN_CREATE_SUBDIR 0x00000400 /* Subdir was created */
+#define IN_CREATE_FILE 0x00000800 /* Subfile was created */
+#define IN_DELETE_SELF 0x00001000 /* Self was deleted */
+#define IN_UNMOUNT 0x00002000 /* Backing fs was unmounted */
+#define IN_Q_OVERFLOW 0x00004000 /* Event queue overflowed */
+#define IN_IGNORED 0x00008000 /* File was ignored */
+
+/* special flags */
+#define IN_ALL_EVENTS 0xffffffff /* All the events */
+#define IN_CLOSE (IN_CLOSE_WRITE | IN_CLOSE_NOWRITE)
+
+#define INOTIFY_IOCTL_MAGIC 'Q'
+#define INOTIFY_IOCTL_MAXNR 2
+
+#define INOTIFY_WATCH _IOR(INOTIFY_IOCTL_MAGIC, 1, struct inotify_watch_request)
+#define INOTIFY_IGNORE _IOR(INOTIFY_IOCTL_MAGIC, 2, int)
+
+#ifdef __KERNEL__
+
+#include <linux/dcache.h>
+#include <linux/fs.h>
+#include <linux/config.h>
+#include <asm/atomic.h>
+
+#ifdef CONFIG_INOTIFY
+
+extern void inotify_inode_queue_event(struct inode *, __u32, __u32,
+ const char *);
+extern void inotify_dentry_parent_queue_event(struct dentry *, __u32, __u32,
+ const char *);
+extern void inotify_super_block_umount(struct super_block *);
+extern void inotify_inode_is_dead(struct inode *);
+extern u32 inotify_get_cookie(void);
+
+#else
+
+static inline void inotify_inode_queue_event(struct inode *inode,
+ __u32 mask, __u32 cookie,
+ const char *filename)
+{
+}
+
+static inline void inotify_dentry_parent_queue_event(struct dentry *dentry,
+ __u32 mask, __u32 cookie,
+ const char *filename)
+{
+}
+
+static inline void inotify_super_block_umount(struct super_block *sb)
+{
+}
+
+static inline void inotify_inode_is_dead(struct inode *inode)
+{
+}
+
+static inline u32 inotify_get_cookie(void)
+{
+ return 0;
+}
+
+#endif /* CONFIG_INOTIFY */
+
+#endif /* __KERNEL__ */
+
+#endif /* _LINUX_INOTIFY_H */
diff -urN linux-2.6.11-mm1/include/linux/sched.h linux/include/linux/sched.h
--- linux-2.6.11-mm1/include/linux/sched.h 2005-03-04 14:06:21.766292400 -0500
+++ linux/include/linux/sched.h 2005-03-04 13:27:05.586486176 -0500
@@ -411,6 +411,8 @@
atomic_t processes; /* How many processes does this user have? */
atomic_t files; /* How many open files does this user have? */
atomic_t sigpending; /* How many pending signals does this user have? */
+ atomic_t inotify_watches; /* How many inotify watches does this user have? */
+ atomic_t inotify_devs; /* How many inotify devs does this user have opened? */
/* protected by mq_lock */
unsigned long mq_bytes; /* How many bytes can be allocated to mqueue? */
unsigned long locked_shm; /* How many pages of mlocked shm ? */
diff -urN linux-2.6.11-mm1/kernel/user.c linux/kernel/user.c
--- linux-2.6.11-mm1/kernel/user.c 2005-03-04 14:06:21.766292400 -0500
+++ linux/kernel/user.c 2005-03-04 13:27:05.588485872 -0500
@@ -120,6 +120,8 @@
atomic_set(&new->processes, 0);
atomic_set(&new->files, 0);
atomic_set(&new->sigpending, 0);
+ atomic_set(&new->inotify_watches, 0);
+ atomic_set(&new->inotify_devs, 0);

new->mq_bytes = 0;
new->locked_shm = 0;


2005-03-04 23:43:19

by Robert Love

Subject: Re: [patch] inotify for 2.6.11

On Fri, 2005-03-04 at 15:38 -0600, Timothy R. Chavez wrote:

Hi, Mr. Chavez.

> Are there plans of reworking the "generic" hooking infrastructure
> (fsnotify.h) to be more like the security hooking framework (+
> stacking)? I think it'd be nice to be able to have a fs_notify struct
> of function pointers, point at the one's I've chosen to implement, and
> then register / unregister with the framework. Maybe this is an
> overly complicated approach, but these don't seem like they're generic
> hooks in anyway.

Personally, I think it is overkill. I don't think we are going to have
the myriad of file notification systems that we have for security layers
(indeed, the goal is to have just inotify).

That said, we could always make the layer more pluggable once inotify is
in. I would not fight that. But, personally I don't see any real
benefit, just additional complexity and overhead.

> + * include/linux/fs_notify.h - >generic< hooks for filesystem notification, to
> + * reduce in-source duplication from both >dnotify and inotify<.
>
> I guess I don't fully understand that comment. Just quickly glancing
> at it, all you've done is added a level of indirection and shifted the
> same redundant code from the VFS to fs_notify.h -- Please correct me
> if I'm wrong (not at all uncommon).

No, you are right. The "generic" part is supposed to be what is in the
VFS. E.g., the fsnotify_foo() calls are supposed to be the generic
interface.

The body of these calls, as you can see, is static code, a simple copy
and cleanup of the inotify + dnotify hooks.

The idea, spurred by Christoph Hellwig's suggestion, was to keep the VFS
clean. Not make a super neat pluggable notification system. I think
the layers ARE generic, though, in the sense that foonotify could
probably drop some static code into fsnotify.h and work.

> As you already know, there's work being done on the audit subsystem
> that also needs notifications from the filesystem and would require
> yet another set of hooks. However, where we get notified might differ
> from where inotify and dnotify get notified and it seems like
> fs_notify is tailored specifically for inotify (and accommodates
> dnotify out of obligation) and openly implements the "generic" hooks
> it requires.
>
> Regardless, if this is the way it's going to be done. We'll expand
> fs_notify.h to meet our needs as well.

If we end up duplicating stuff and making a big mess, then the audit
layer and the notification layer should DEFINITELY look at merging and
consolidating. But I think that we need to wait until one or the other
gets more traction and into the mainline kernel.

> Also, FYI:
> I just purchased the 2nd edition of your book, looking forward to reading it.

Great. Hope you enjoy it! ;-)

Best,

Robert Love


2005-03-04 23:43:22

by Timothy R. Chavez

Subject: Re: [patch] inotify for 2.6.11

On Fri, 04 Mar 2005 13:37:24 -0500, Robert Love <[email protected]> wrote:
> Below is inotify, diffed against 2.6.11.
>
> I greatly reworked much of the data structures and their interactions,
> to lay the groundwork for sanitizing the locking. I then, I hope,
> sanitized the locking. It looks right, I am happy. Comments welcome.
> I surely could of missed something. Maybe even something big.
>
> But, regardless, this release is a huge jump from the previous, fixing
> all known issues and greatly improving the locking.
>
> Best,
>
> Robert Love

Hey Robert,

Are there plans of reworking the "generic" hooking infrastructure
(fsnotify.h) to be more like the security hooking framework (+
stacking)? I think it'd be nice to be able to have a fs_notify struct
of function pointers, point at the one's I've chosen to implement, and
then register / unregister with the framework. Maybe this is an
overly complicated approach, but these don't seem like they're generic
hooks in anyway.
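
A registration scheme along these lines might look roughly like the sketch below. It is a user-space illustration only: `struct fs_notify_ops`, `fs_notify_register()`, `fs_notify_unregister()`, the fixed-size table, and the `demo_*` backend are all hypothetical names that do not appear in the patch, and a kernel version would also need locking around the table.

```c
#include <stddef.h>

/* Hypothetical hook table: a backend fills in only the hooks it needs. */
struct fs_notify_ops {
	void (*create)(const char *name);
	void (*unlink)(const char *name);
};

#define FS_NOTIFY_MAX 4

static const struct fs_notify_ops *fs_notify_backends[FS_NOTIFY_MAX];

/* Register a backend; returns 0 on success, -1 if the table is full. */
static int fs_notify_register(const struct fs_notify_ops *ops)
{
	for (size_t i = 0; i < FS_NOTIFY_MAX; i++) {
		if (!fs_notify_backends[i]) {
			fs_notify_backends[i] = ops;
			return 0;
		}
	}
	return -1;
}

/* Unregister a backend; returns 0 on success, -1 if it was not found. */
static int fs_notify_unregister(const struct fs_notify_ops *ops)
{
	for (size_t i = 0; i < FS_NOTIFY_MAX; i++) {
		if (fs_notify_backends[i] == ops) {
			fs_notify_backends[i] = NULL;
			return 0;
		}
	}
	return -1;
}

/* Generic hook: fan out to every backend that implements 'create'. */
static int fs_notify_create(const char *name)
{
	int called = 0;

	for (size_t i = 0; i < FS_NOTIFY_MAX; i++) {
		const struct fs_notify_ops *ops = fs_notify_backends[i];

		if (ops && ops->create) {
			ops->create(name);
			called++;
		}
	}
	return called;
}

/* Example backend that only counts create events. */
static int demo_creates;

static void demo_create(const char *name)
{
	(void)name;
	demo_creates++;
}

static const struct fs_notify_ops demo_ops = { .create = demo_create };
```

Compared with the static inline hooks in fsnotify.h, every event here costs a table walk and an indirect call per backend, which is the extra complexity and overhead the proposal would have to justify.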

+ * include/linux/fs_notify.h - >generic< hooks for filesystem notification, to
+ * reduce in-source duplication from both >dnotify and inotify<.

I guess I don't fully understand that comment. Just quickly glancing
at it, all you've done is added a level of indirection and shifted the
same redundant code from the VFS to fs_notify.h -- Please correct me
if I'm wrong (not at all uncommon).

As you already know, there's work being done on the audit subsystem
that also needs notifications from the filesystem and would require
yet another set of hooks. However, where we get notified might differ
from where inotify and dnotify get notified and it seems like
fs_notify is tailored specifically for inotify (and accommodates
dnotify out of obligation) and openly implements the "generic" hooks
it requires.

Regardless, if this is the way it's going to be done. We'll expand
fs_notify.h to meet our needs as well.

Also, FYI:
I just purchased the 2nd edition of your book, looking forward to reading it.

<snip>

--
- Timothy R. Chavez

2005-03-06 00:10:30

by Christoph Hellwig

Subject: Re: [patch] inotify for 2.6.11

On Fri, Mar 04, 2005 at 01:37:24PM -0500, Robert Love wrote:
> Below is inotify, diffed against 2.6.11.
>
> I greatly reworked much of the data structures and their interactions,
> to lay the groundwork for sanitizing the locking. I then, I hope,
> sanitized the locking. It looks right, I am happy. Comments welcome.
> I surely could of missed something. Maybe even something big.
>
> But, regardless, this release is a huge jump from the previous, fixing
> all known issues and greatly improving the locking.
>

The user interface is still bogus. Also now version of it has stayed in
-mm long enough because bad bugs pop up almost weekly.

2005-03-06 00:37:39

by Robert Love

Subject: Re: [patch] inotify for 2.6.11

On Sun, 2005-03-06 at 00:04 +0000, Christoph Hellwig wrote:

> The user interface is still bogus.

I presume you are talking about the ioctl. I have tried to engage you
and others on what exactly you prefer instead. I have said that moving
to a write interface is fine but I don't see how it is _any_ better than
the ioctl. Write is less typed, in fact, since we lose the command
versus argument delineation.

But if it is a unanimous decision, I'll switch it. Or take patches. ;-)
It isn't a big deal.

> Also now version of it has stayed in -mm long enough because bad
> bugs pop up almost weekly.

I don't follow this sentence.

Best,

Robert Love


2005-03-07 01:21:02

by Christoph Hellwig

Subject: Re: [patch] inotify for 2.6.11-mm1

> - if ((ret + (type == READ)) > 0)
> - dnotify_parent(file->f_dentry,
> - (type == READ) ? DN_ACCESS : DN_MODIFY);
> + if ((ret + (type == READ)) > 0) {
> + struct dentry *dentry = file->f_dentry;
> + if (type == READ)
> + fsnotify_access(dentry, dentry->d_inode,
> + dentry->d_name.name);
> + else
> + fsnotify_modify(dentry, dentry->d_inode,
> + dentry->d_name.name);
> + }

Both fsnotify_access and fsnotify_modify always get the second argument
as ->d_inode and third argument as ->d_name.name of the first, please fix
this blatantly stupid API.

> + fsnotify_close(dentry, inode, file->f_mode, dentry->d_name.name);

ditto for the second and last argument here. In fact fsnotify_close
should just get a struct file *

> might_sleep();

this one seems totally unrelated.

> +/*
> + * struct inotify_device - represents an open instance of an inotify device
> + *
> + * This structure is protected by 'lock'.
> + */
> +struct inotify_device {
> + wait_queue_head_t wq; /* wait queue for i/o */
> + struct idr idr; /* idr mapping wd -> watch */
> + struct list_head events; /* list of queued events */
> + struct list_head watches; /* list of watches */
> + spinlock_t lock; /* protects this bad boy */
> + atomic_t count; /* reference count */
> + struct user_struct *user; /* user who opened this dev */
> + unsigned int queue_size; /* size of the queue (bytes) */
> + unsigned int event_count; /* number of pending events */
> + unsigned int max_events; /* maximum number of events */
> +};

> +/*
> + * kernel_event - create a new kernel event with the given parameters
> + */
> +static struct inotify_kernel_event * kernel_event(s32 wd, u32 mask, u32 cookie,
> + const char *name)
> +{
> + struct inotify_kernel_event *kevent;
> +
> + /* XXX: optimally, we should use GFP_KERNEL */
> + kevent = kmem_cache_alloc(event_cachep, GFP_ATOMIC);

indeed. having a new atomic memory allocation in every filesystem operation
sounds like a really bad idea.

> + /* XXX: optimally, we should use GFP_KERNEL */
> + kevent->name = kmalloc(len, GFP_ATOMIC);

and two is even worse. A notification that fails under slight load
isn't that helpful.

> +static inline int inotify_dev_has_events(struct inotify_device *dev)
> +{
> + return !list_empty(&dev->events);
> +}

just remove this bit of obfuscation.

> + if (kevent->name)
> + kfree(kevent->name);

kfree(NULL) is just fine.

> +
> +repeat:
> + if (unlikely(!idr_pre_get(&dev->idr, GFP_KERNEL)))
> + return -ENOSPC;
> + spin_lock(&dev->lock);
> + ret = idr_get_new(&dev->idr, watch, &watch->wd);
> + spin_unlock(&dev->lock);
> + if (ret == -EAGAIN) /* more memory is required, try again */
> + goto repeat;

what's the problem with a proper do { } while loop here?
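
For illustration, the retry in the quoted hunk could be written as a loop along the lines of the sketch below. It is user-space C, so `fake_idr_pre_get()` and `fake_idr_get_new()` are hypothetical stand-ins for the kernel's `idr_pre_get()` and `idr_get_new()`, and `get_watch_descriptor()` is an invented wrapper name; the locking from the patch is only noted in a comment.

```c
#include <errno.h>

/* Counts calls so the stand-in can fail a couple of times first. */
static int fake_attempts;

/* Hypothetical stand-in for idr_pre_get(): nonzero means success. */
static int fake_idr_pre_get(void)
{
	return 1;
}

/* Hypothetical stand-in for idr_get_new(): -EAGAIN twice, then success. */
static int fake_idr_get_new(int *id)
{
	if (++fake_attempts < 3)
		return -EAGAIN;
	*id = 7;
	return 0;
}

/* Same retry logic as the goto version, expressed as do { } while. */
static int get_watch_descriptor(int *wd)
{
	int ret;

	do {
		if (!fake_idr_pre_get())
			return -ENOSPC;
		/* the patch holds dev->lock around this call */
		ret = fake_idr_get_new(wd);
	} while (ret == -EAGAIN);

	return ret;
}
```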

> + watch->inode = inode;
> + spin_lock(&inode_lock);
> + __iget(inode);
> + spin_unlock(&inode_lock);

do you ever call this when igrab() would fail? Don't think so,
and in that case I'd prefer you to use igrab instead of using these
lowlevel functions.

> +void inotify_dentry_parent_queue_event(struct dentry *dentry, u32 mask,
> + u32 cookie, const char *name)
> +{
> + struct dentry *parent;
> + struct inode *inode;
> +
> + spin_lock(&dentry->d_lock);
> + parent = dentry->d_parent;
> + inode = parent->d_inode;
> + if (!list_empty(&inode->inotify_watches)) {

So what's protecting ->inotify_watches? There can be multiple dentries
so d_lock is not sufficient.

> +
> +/**
> + * inotify_super_block_umount - process watches on an unmounted fs
> + * @sb: the super_block of the filesystem in question
> + */
> +void inotify_super_block_umount(struct super_block *sb)
> +{
> + struct inode *inode;
> +
> + /* Walk the list of inodes and find those on this superblock */
> + spin_lock(&inode_lock);
> + list_for_each_entry(inode, &inode_in_use, i_list) {
> + struct inotify_watch *watch, *next;
> + struct list_head *watches;
> +
> + if (inode->i_sb != sb)
> + continue;
> +
> + /* for each watch, send IN_UNMOUNT and then remove it */
> + spin_lock(&inode->inotify_lock);
> + watches = &inode->inotify_watches;
> + list_for_each_entry_safe(watch, next, watches, i_list) {
> + struct inotify_device *dev = watch->dev;
> + spin_lock(&dev->lock);
> + inotify_dev_queue_event(dev, watch, IN_UNMOUNT,0,NULL);
> + remove_watch(watch, dev);
> + spin_unlock(&dev->lock);
> + }
> + spin_unlock(&inode->inotify_lock);
> + }
> + spin_unlock(&inode_lock);
> +}
> +EXPORT_SYMBOL_GPL(inotify_super_block_umount);

-mm has a list of inodes per superblock, which Andrew said he'd send
along to Linus, you should probably use that one. In fact I wonder
whether you'd be better off just calling a per-inode removal hook from
invalidate_inodes.

> +static unsigned int inotify_poll(struct file *file, poll_table *wait)
> +{
> + struct inotify_device *dev;
> + int ret = 0;
> +
> + dev = file->private_data;
> + get_inotify_dev(dev);

tabs vs spaces messups.

> +static int inotify_add_watch(struct inotify_device *dev,
> + struct inotify_watch_request *request)
> +{
> + struct inode *inode;
> + struct inotify_watch *watch, *old;
> + struct nameidata nd;
> + int ret;
> +
> + ret = __user_walk(request->name, LOOKUP_FOLLOW, &nd);
> + if (unlikely(ret))
> + return ret;
> +
> + /* you can only watch an inode if you have read permissions on it */
> + ret = permission(nd.dentry->d_inode, MAY_READ, NULL);

permission takes a nameidata as last parameter, so please do so as well.

> +
> + switch (cmd) {
> + case INOTIFY_WATCH:
> + if (unlikely(copy_from_user(&request, p, sizeof (request)))) {
> + ret = -EFAULT;
> + break;
> + }
> + ret = inotify_add_watch(dev, &request);
> + break;
> + case INOTIFY_IGNORE:
> + if (unlikely(copy_from_user(&wd, p, sizeof (wd)))) {
> + ret = -EFAULT;
> + break;
> + }
> + ret = inotify_ignore(dev, wd);
> + break;
> + case FIONREAD:
> + ret = put_user(dev->queue_size, (int __user *) p);
> + break;
> + }
> +
> + put_inotify_dev(dev);
> +
> + return ret;

Both INOTIFY_WATCH and INOTIFY_IGNORE could easily be implemented as
writes to /dev/inotify, echo "watch /path/to/foo 42", echo "ignore 23".

> +static struct miscdevice inotify_device = {
> + .minor = MISC_DYNAMIC_MINOR,
> + .name = "inotify",
> + .fops = &inotify_fops,
> +};

Should probably use the /dev/mem major.

> + default y

please don't default a new and experimental facility to y. In fact
default is totally overused.

> + if (!error && !(dentry->d_flags & DCACHE_NFSFS_RENAMED))
> + fsnotify_unlink(dentry->d_inode, dir, dentry);

first argument can be deduced from the last.

> + fsnotify_open(dentry, dentry->d_inode,
> + dentry->d_name.name);

second and third argument can be taken from the first

> +#ifdef CONFIG_INOTIFY
> + struct list_head inotify_watches; /* watches on this inode */
> + spinlock_t inotify_lock; /* protects the watches list */
> +#endif

do you really need a spinlock of your own in every inode? Inode memory
usage is a quite big problem.

> +static inline void fsnotify_rmdir(struct dentry *dentry, struct inode *inode,
> + struct inode *dir)
> +{
> + inode_dir_notify(dir, DN_DELETE);
> + inotify_inode_queue_event(dir, IN_DELETE_SUBDIR,0,dentry->d_name.name);
> + inotify_inode_queue_event(inode, IN_DELETE_SELF, 0, NULL);
> +
> + inotify_inode_is_dead(inode);
> + d_delete(dentry);

why is the d_delete in here?

> +/*
> + * fsnotify_change - notify_change event. file was modified and/or metadata
> + * was changed.
> + */
> +static inline void fsnotify_change(struct dentry *dentry, unsigned int ia_valid)

this one is far too large to be inlined.

> +/*
> + * struct inotify_event - structure read from the inotify device for each event
> + *
> + * When you are watching a directory, you will receive the filename for events
> + * such as IN_CREATE, IN_DELETE, IN_OPEN, IN_CLOSE, ..., relative to the wd.
> + */
> +struct inotify_event {
> + __s32 wd; /* watch descriptor */
> + __u32 mask; /* watch mask */
> + __u32 cookie; /* cookie to synchronize two events */
> + size_t len; /* length (including nulls) of name */

please don't use size_t in a user ABI. Just use __u32
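
To make the ABI concern concrete: with size_t, the size and offset of 'len' follow the word size of whoever compiles the header, so a 32-bit process on a 64-bit kernel would see a different record layout. The sketch below is user-space C with stand-in struct names (`event_with_size_t` and `event_fixed` are illustrative, not from the patch), and it leaves out the trailing name member.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout as in the patch: 'len' is 4 or 8 bytes depending on the build. */
struct event_with_size_t {
	int32_t  wd;
	uint32_t mask;
	uint32_t cookie;
	size_t   len;
};

/* Layout with a fixed-width length: the same in 32-bit and 64-bit builds. */
struct event_fixed {
	int32_t  wd;
	uint32_t mask;
	uint32_t cookie;
	uint32_t len;
};
```

On a typical LP64 build the first struct is padded so that 'len' sits on an 8-byte boundary, while the fixed-width variant stays at four 4-byte fields in either build.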

> diff -urN linux-2.6.11-mm1/include/linux/sched.h linux/include/linux/sched.h
> --- linux-2.6.11-mm1/include/linux/sched.h 2005-03-04 14:06:21.766292400 -0500
> +++ linux/include/linux/sched.h 2005-03-04 13:27:05.586486176 -0500
> @@ -411,6 +411,8 @@
> atomic_t processes; /* How many processes does this user have? */
> atomic_t files; /* How many open files does this user have? */
> atomic_t sigpending; /* How many pending signals does this user have? */
> + atomic_t inotify_watches; /* How many inotify watches does this user have? */
> + atomic_t inotify_devs; /* How many inotify devs does this user have opened? */

Why are these unconditional, unlike the inode ones?

2005-03-07 01:23:59

by Christoph Hellwig

Subject: Re: [patch] inotify for 2.6.11

On Sat, Mar 05, 2005 at 07:40:06PM -0500, Robert Love wrote:
> On Sun, 2005-03-06 at 00:04 +0000, Christoph Hellwig wrote:
>
> > The user interface is still bogus.
>
> I presume you are talking about the ioctl. I have tried to engage you
> and others on what exactly you prefer instead. I have said that moving
> to a write interface is fine but I don't see how it is _any_ better than
> the ioctl. Write is less typed, in fact, since we lose the command
> versus argument delineation.
>
> But if it is a unanimous decision, I'll switch it. Or take patches. ;-)
> It isn't a big deal.

See the review I sent. Write is exactly the right interface for that kind
of thing. For command vs argument either put the number first so we don't
have the problem of finding a delimiter that isn't a valid filename, or
use '\0' as such.
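
A number-first command format could be parsed roughly as in the sketch below. The exact grammar ("watch <number> <path>" and "ignore <number>"), the `parse_cmd()` function, and `struct parsed_cmd` are assumptions made for illustration; nothing like them exists in the patch. The sketch also assumes the buffer is NUL-terminated with no trailing newline.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

enum cmd_type { CMD_WATCH, CMD_IGNORE };

struct parsed_cmd {
	enum cmd_type type;
	unsigned long number;	/* event mask for watch, wd for ignore */
	const char *path;	/* points into the input; NULL for ignore */
};

/*
 * Parse "watch <number> <path>" or "ignore <number>".
 * Because the number comes first, the path is simply the rest of the
 * buffer and may itself contain spaces.
 * Returns 0 on success, -EINVAL on malformed input.
 */
static int parse_cmd(const char *buf, struct parsed_cmd *out)
{
	char *end;

	if (!strncmp(buf, "watch ", 6)) {
		out->type = CMD_WATCH;
		buf += 6;
	} else if (!strncmp(buf, "ignore ", 7)) {
		out->type = CMD_IGNORE;
		buf += 7;
	} else {
		return -EINVAL;
	}

	out->number = strtoul(buf, &end, 0);
	if (end == buf)
		return -EINVAL;		/* no digits where a number belongs */

	if (out->type == CMD_IGNORE) {
		out->path = NULL;
		return *end == '\0' ? 0 : -EINVAL;
	}

	if (*end != ' ' || end[1] == '\0')
		return -EINVAL;		/* missing path */
	out->path = end + 1;
	return 0;
}
```

For example, "watch 42 /path/to/my file" yields number 42 and the path "/path/to/my file", whereas the path-first form from the review would need an escaping rule for names that contain spaces.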

> > Also now version of it has stayed in -mm long enough because bad
> > bugs pop up almost weekly.
>
> I don't follow this sentence.

It means that every revision of inotify so far has been buggy in some
respect and it got dropped from -mm again and again. It should get some
more testing there and not be sent directly for mainline.

2005-03-07 01:38:38

by Andrew Morton

Subject: Re: [patch] inotify for 2.6.11-mm1

Christoph Hellwig <[email protected]> wrote:
>
> -mm has a list of inodes per superblock, which Andrew said he'd send
> along to Linus, you should probably use that one.

That was merged a month or two ago.

superblock.s_inodes, linked via inode.i_sb_list, protected by inode_lock.
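
As a rough picture of what that buys the unmount path: instead of scanning a global in-use list and skipping inodes that belong to other filesystems, the walk touches only the unmounting superblock's own list. The sketch below is simplified user-space C; the struct layouts, the singly linked `i_sb_next` pointer (standing in for inode.i_sb_list), and the helper names are assumptions for illustration, and the inode_lock locking is omitted.

```c
#include <stddef.h>

struct sb;

/* Simplified inode, linked onto its superblock's private list. */
struct inode {
	struct sb *i_sb;
	struct inode *i_sb_next;	/* stands in for inode.i_sb_list */
	int watched;			/* nonzero if the inode has watches */
	int got_unmount;		/* set where IN_UNMOUNT would be queued */
};

/* Simplified superblock: head of its own inode list. */
struct sb {
	struct inode *s_inodes;
};

/* Link an inode onto the front of its superblock's list. */
static void sb_add_inode(struct sb *sb, struct inode *inode)
{
	inode->i_sb = sb;
	inode->i_sb_next = sb->s_inodes;
	sb->s_inodes = inode;
}

/* Visit only this superblock's inodes; returns how many were notified. */
static int sb_umount_notify(struct sb *sb)
{
	int notified = 0;

	for (struct inode *i = sb->s_inodes; i; i = i->i_sb_next) {
		if (!i->watched)
			continue;
		i->got_unmount = 1;
		notified++;
	}
	return notified;
}
```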

2005-03-07 03:26:51

by Albert Cahalan

Subject: Re: [patch] inotify for 2.6.11

Christoph Hellwig writes:
> On Sat, Mar 05, 2005 at 07:40:06PM -0500, Robert Love wrote:
>> On Sun, 2005-03-06 at 00:04 +0000, Christoph Hellwig wrote:

>>> The user interface is still bogus.
>>
>> I presume you are talking about the ioctl. I have tried to engage you
>> and others on what exactly you prefer instead. I have said that moving
>> to a write interface is fine but I don't see how it is _any_ better than
>> the ioctl. Write is less typed, in fact, since we lose the command
>> versus argument delineation.
>>
>> But if it is a unanimous decision, I'll switch it. Or take patches. ;-)
>> It isn't a big deal.
>
> See the review I sent. Write is exactly the right interface for that kind
> of thing. For command vs argument either put the number first so we don't
> have the problem of finding a delimiter that isn't a valid filename, or
> use '\0' as such.

That's just putrid. You've proposed an interface that
combines the worst of ASCII with the worst of binary.

It is now well-established that ASCII interfaces are
horribly slow. This one will be no exception... but
with the '\0' in there, you have a binary interface.
So, it's an evil hybrid.

An ioctl() is a syscall with scope restricting it to a
single fd. This is a fine user interface, not a bogus one.
(keep 32-on-64 operation in mind to be polite)

If you'd rather have a normal (global) system call though,
that'll do too, likely leading to a bit more type checking
in the glibc-provided headers.

Adding plain old syscalls is rather nice actually.
It's only a pain at first, while waiting for glibc
to catch up.


2005-03-07 04:30:38

by Robert Love

Subject: Re: [patch] inotify for 2.6.11

On Mon, 2005-03-07 at 01:23 +0000, Christoph Hellwig wrote:

> It means that every revision of inotify so far has been buggy in some
> respect and it got dropped from -mm again and again. It should get some
> more testing there and not be sent directly for mainline.

It was dropped from 2.6-mm once.

Robert Love


2005-03-07 21:49:09

by Robert Love

Subject: [patch] inotify for 2.6.11, updated

On Fri, 2005-03-04 at 13:37 -0500, Robert Love wrote:

> I greatly reworked much of the data structures and their interactions,
> to lay the groundwork for sanitizing the locking. I then, I hope,
> sanitized the locking. It looks right, I am happy. Comments welcome.
> I surely could of missed something. Maybe even something big.
>
> But, regardless, this release is a huge jump from the previous, fixing
> all known issues and greatly improving the locking.

Updated inotify, against 2.6.11, addressing hch's concerns (see other
thread).

Love,

Robert Love


inotify!

inotify is intended to correct the deficiencies of dnotify, particularly
its inability to scale and its terrible user interface:

* dnotify requires the opening of one fd per each directory
that you intend to watch. This quickly results in too many
open files and pins removable media, preventing unmount.
* dnotify is directory-based. You only learn about changes to
directories. Sure, a change to a file in a directory affects
the directory, but you are then forced to keep a cache of
stat structures.
* dnotify's interface to user-space is awful. Signals?

inotify provides a more usable, simple, powerful solution to file change
notification:

* inotify's interface is a device node, not SIGIO. You open a
single fd to the device node, which is select()-able.
* inotify has an event that says "the filesystem that the item
you were watching is on was unmounted."
* inotify can watch directories or files.

Inotify is currently used by Beagle (a desktop search infrastructure)
and Gamin (a FAM replacement).

Signed-off-by: Robert Love <[email protected]>

fs/Kconfig | 13
fs/Makefile | 1
fs/attr.c | 33 -
fs/compat.c | 14
fs/file_table.c | 3
fs/inode.c | 4
fs/inotify.c | 1014 +++++++++++++++++++++++++++++++++++++++++++++++
fs/namei.c | 30 -
fs/open.c | 6
fs/read_write.c | 15
fs/super.c | 2
include/linux/fs.h | 8
include/linux/fsnotify.h | 236 ++++++++++
include/linux/inotify.h | 113 +++++
include/linux/sched.h | 4
kernel/user.c | 4
16 files changed, 1444 insertions(+), 56 deletions(-)

diff -urN linux-2.6.11/fs/attr.c linux/fs/attr.c
--- linux-2.6.11/fs/attr.c 2005-03-02 02:37:48.000000000 -0500
+++ linux/fs/attr.c 2005-03-07 16:10:46.854669712 -0500
@@ -10,7 +10,7 @@
#include <linux/mm.h>
#include <linux/string.h>
#include <linux/smp_lock.h>
-#include <linux/dnotify.h>
+#include <linux/fsnotify.h>
#include <linux/fcntl.h>
#include <linux/quotaops.h>
#include <linux/security.h>
@@ -107,31 +107,8 @@
out:
return error;
}
-
EXPORT_SYMBOL(inode_setattr);

-int setattr_mask(unsigned int ia_valid)
-{
- unsigned long dn_mask = 0;
-
- if (ia_valid & ATTR_UID)
- dn_mask |= DN_ATTRIB;
- if (ia_valid & ATTR_GID)
- dn_mask |= DN_ATTRIB;
- if (ia_valid & ATTR_SIZE)
- dn_mask |= DN_MODIFY;
- /* both times implies a utime(s) call */
- if ((ia_valid & (ATTR_ATIME|ATTR_MTIME)) == (ATTR_ATIME|ATTR_MTIME))
- dn_mask |= DN_ATTRIB;
- else if (ia_valid & ATTR_ATIME)
- dn_mask |= DN_ACCESS;
- else if (ia_valid & ATTR_MTIME)
- dn_mask |= DN_MODIFY;
- if (ia_valid & ATTR_MODE)
- dn_mask |= DN_ATTRIB;
- return dn_mask;
-}
-
int notify_change(struct dentry * dentry, struct iattr * attr)
{
struct inode *inode = dentry->d_inode;
@@ -194,11 +171,9 @@
if (ia_valid & ATTR_SIZE)
up_write(&dentry->d_inode->i_alloc_sem);

- if (!error) {
- unsigned long dn_mask = setattr_mask(ia_valid);
- if (dn_mask)
- dnotify_parent(dentry, dn_mask);
- }
+ if (!error)
+ fsnotify_change(dentry, ia_valid);
+
return error;
}

diff -urN linux-2.6.11/fs/compat.c linux/fs/compat.c
--- linux-2.6.11/fs/compat.c 2005-03-02 02:38:08.000000000 -0500
+++ linux/fs/compat.c 2005-03-07 16:10:14.152641176 -0500
@@ -36,7 +36,7 @@
#include <linux/ctype.h>
#include <linux/module.h>
#include <linux/dirent.h>
-#include <linux/dnotify.h>
+#include <linux/fsnotify.h>
#include <linux/highuid.h>
#include <linux/sunrpc/svc.h>
#include <linux/nfsd/nfsd.h>
@@ -1233,9 +1233,15 @@
out:
if (iov != iovstack)
kfree(iov);
- if ((ret + (type == READ)) > 0)
- dnotify_parent(file->f_dentry,
- (type == READ) ? DN_ACCESS : DN_MODIFY);
+ if ((ret + (type == READ)) > 0) {
+ struct dentry *dentry = file->f_dentry;
+ if (type == READ)
+ fsnotify_access(dentry, dentry->d_inode,
+ dentry->d_name.name);
+ else
+ fsnotify_modify(dentry, dentry->d_inode,
+ dentry->d_name.name);
+ }
return ret;
}

diff -urN linux-2.6.11/fs/file_table.c linux/fs/file_table.c
--- linux-2.6.11/fs/file_table.c 2005-03-02 02:37:47.000000000 -0500
+++ linux/fs/file_table.c 2005-03-07 16:10:14.153641024 -0500
@@ -16,6 +16,7 @@
#include <linux/eventpoll.h>
#include <linux/mount.h>
#include <linux/cdev.h>
+#include <linux/fsnotify.h>

/* sysctl tunables... */
struct files_stat_struct files_stat = {
@@ -123,6 +124,8 @@
struct inode *inode = dentry->d_inode;

might_sleep();
+
+ fsnotify_close(file);
/*
* The function eventpoll_release() should be the first called
* in the file cleanup chain.
diff -urN linux-2.6.11/fs/inode.c linux/fs/inode.c
--- linux-2.6.11/fs/inode.c 2005-03-02 02:38:33.000000000 -0500
+++ linux/fs/inode.c 2005-03-07 16:10:14.155640720 -0500
@@ -130,6 +130,10 @@
#ifdef CONFIG_QUOTA
memset(&inode->i_dquot, 0, sizeof(inode->i_dquot));
#endif
+#ifdef CONFIG_INOTIFY
+ INIT_LIST_HEAD(&inode->inotify_watches);
+ spin_lock_init(&inode->inotify_lock);
+#endif
inode->i_pipe = NULL;
inode->i_bdev = NULL;
inode->i_cdev = NULL;
diff -urN linux-2.6.11/fs/inotify.c linux/fs/inotify.c
--- linux-2.6.11/fs/inotify.c 1969-12-31 19:00:00.000000000 -0500
+++ linux/fs/inotify.c 2005-03-07 16:10:14.158640264 -0500
@@ -0,0 +1,1014 @@
+/*
+ * fs/inotify.c - inode-based file event notifications
+ *
+ * Authors:
+ * John McCutchan <[email protected]>
+ * Robert Love <[email protected]>
+ *
+ * Copyright (C) 2005 John McCutchan
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2, or (at your option) any
+ * later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/spinlock.h>
+#include <linux/idr.h>
+#include <linux/slab.h>
+#include <linux/fs.h>
+#include <linux/namei.h>
+#include <linux/poll.h>
+#include <linux/device.h>
+#include <linux/miscdevice.h>
+#include <linux/init.h>
+#include <linux/list.h>
+#include <linux/writeback.h>
+#include <linux/inotify.h>
+
+#include <asm/ioctls.h>
+
+static atomic_t inotify_cookie;
+
+static kmem_cache_t *watch_cachep;
+static kmem_cache_t *event_cachep;
+
+static int max_user_devices;
+static int max_user_watches;
+static unsigned int max_queued_events;
+
+/*
+ * Lock ordering:
+ *
+ * inode_lock (used to safely walk the super_block->s_inodes list)
+ * dentry->d_lock (used to keep d_move() away from dentry->d_parent)
+ * inode->inotify_lock (protects inode->inotify_watches and watches->i_list)
+ * inotify_dev->lock (protects inotify_device and watches->d_list)
+ */
+
+/*
+ * Lifetimes of the three main data structures -- inotify_device, inode, and
+ * inotify_watch -- are managed by reference count.
+ *
+ * inotify_device: Lifetime is from open until release. Additional references
+ * can bump the count via get_inotify_dev() and drop the count via
+ * put_inotify_dev().
+ *
+ * inotify_watch: Lifetime is from create_watch() to destroy_watch().
+ * Additional references can bump the count via get_inotify_watch() and drop
+ * the count via put_inotify_watch().
+ *
+ * inode: Pinned so long as the inode is associated with a watch, from
+ * create_watch() to put_inotify_watch().
+ */
+
+/*
+ * struct inotify_device - represents an open instance of an inotify device
+ *
+ * This structure is protected by 'lock'.
+ */
+struct inotify_device {
+ wait_queue_head_t wq; /* wait queue for i/o */
+ struct idr idr; /* idr mapping wd -> watch */
+ struct list_head events; /* list of queued events */
+ struct list_head watches; /* list of watches */
+ spinlock_t lock; /* protects this bad boy */
+ atomic_t count; /* reference count */
+ struct user_struct *user; /* user who opened this dev */
+ unsigned int queue_size; /* size of the queue (bytes) */
+ unsigned int event_count; /* number of pending events */
+ unsigned int max_events; /* maximum number of events */
+};
+
+/*
+ * struct inotify_kernel_event - An inotify event, originating from a watch and
+ * queued for user-space. A list of these is attached to each instance of the
+ * device. In read(), this list is walked and all events that can fit in the
+ * buffer are returned.
+ *
+ * Protected by dev->lock of the device in which we are queued.
+ */
+struct inotify_kernel_event {
+ struct inotify_event event; /* the user-space event */
+ struct list_head list; /* entry in inotify_device's list */
+ char *name; /* filename, if any */
+};
+
+/*
+ * struct inotify_watch - represents a watch request on a specific inode
+ *
+ * d_list is protected by dev->lock of the associated dev->watches.
+ * i_list and mask are protected by inode->inotify_lock of the associated inode.
+ * dev, inode, and wd are never written to once the watch is created.
+ */
+struct inotify_watch {
+ struct list_head d_list; /* entry in inotify_device's list */
+ struct list_head i_list; /* entry in inode's list */
+ atomic_t count; /* reference count */
+ struct inotify_device *dev; /* associated device */
+ struct inode *inode; /* associated inode */
+ s32 wd; /* watch descriptor */
+ u32 mask; /* event mask for this watch */
+};
+
+static ssize_t show_max_queued_events(struct class_device *class, char *buf)
+{
+ return sprintf(buf, "%u\n", max_queued_events);
+}
+
+static ssize_t store_max_queued_events(struct class_device *class,
+ const char *buf, size_t count)
+{
+ unsigned int max;
+
+ if (sscanf(buf, "%u", &max) > 0 && max > 0) {
+ max_queued_events = max;
+ return strlen(buf);
+ }
+ return -EINVAL;
+}
+
+static ssize_t show_max_user_devices(struct class_device *class, char *buf)
+{
+ return sprintf(buf, "%d\n", max_user_devices);
+}
+
+static ssize_t store_max_user_devices(struct class_device *class,
+ const char *buf, size_t count)
+{
+ int max;
+
+ if (sscanf(buf, "%d", &max) > 0 && max > 0) {
+ max_user_devices = max;
+ return strlen(buf);
+ }
+ return -EINVAL;
+}
+
+static ssize_t show_max_user_watches(struct class_device *class, char *buf)
+{
+ return sprintf(buf, "%d\n", max_user_watches);
+}
+
+static ssize_t store_max_user_watches(struct class_device *class,
+ const char *buf, size_t count)
+{
+ int max;
+
+ if (sscanf(buf, "%d", &max) > 0 && max > 0) {
+ max_user_watches = max;
+ return strlen(buf);
+ }
+ return -EINVAL;
+}
+
+static CLASS_DEVICE_ATTR(max_queued_events, S_IRUGO | S_IWUSR,
+ show_max_queued_events, store_max_queued_events);
+static CLASS_DEVICE_ATTR(max_user_devices, S_IRUGO | S_IWUSR,
+ show_max_user_devices, store_max_user_devices);
+static CLASS_DEVICE_ATTR(max_user_watches, S_IRUGO | S_IWUSR,
+ show_max_user_watches, store_max_user_watches);
+
+static inline void get_inotify_dev(struct inotify_device *dev)
+{
+ atomic_inc(&dev->count);
+}
+
+static inline void put_inotify_dev(struct inotify_device *dev)
+{
+ if (atomic_dec_and_test(&dev->count)) {
+ atomic_dec(&dev->user->inotify_devs);
+ free_uid(dev->user);
+ kfree(dev);
+ }
+}
+
+static inline void get_inotify_watch(struct inotify_watch *watch)
+{
+ atomic_inc(&watch->count);
+}
+
+static inline void put_inotify_watch(struct inotify_watch *watch)
+{
+ if (atomic_dec_and_test(&watch->count)) {
+ put_inotify_dev(watch->dev);
+ iput(watch->inode);
+ kmem_cache_free(watch_cachep, watch);
+ }
+}
+
+/*
+ * kernel_event - create a new kernel event with the given parameters
+ */
+static struct inotify_kernel_event * kernel_event(s32 wd, u32 mask, u32 cookie,
+ const char *name)
+{
+ struct inotify_kernel_event *kevent;
+
+ /* XXX: optimally, we should use GFP_KERNEL */
+ kevent = kmem_cache_alloc(event_cachep, GFP_ATOMIC);
+ if (unlikely(!kevent))
+ return NULL;
+
+ /* we hand this out to user-space, so zero it just in case */
+ memset(&kevent->event, 0, sizeof(struct inotify_event));
+
+ kevent->event.wd = wd;
+ kevent->event.mask = mask;
+ kevent->event.cookie = cookie;
+
+ INIT_LIST_HEAD(&kevent->list);
+
+ if (name) {
+ size_t len, rem, event_size = sizeof(struct inotify_event);
+
+ /*
+ * We need to pad the filename so as to properly align an
+ * array of inotify_event structures. Because the structure is
+ * small and the common case is a small filename, we just round
+ * up to the next multiple of the structure's sizeof. This is
+ * simple and safe for all architectures.
+ */
+ len = strlen(name) + 1;
+ rem = event_size - len;
+ if (len > event_size) {
+ rem = event_size - (len % event_size);
+ if (len % event_size == 0)
+ rem = 0;
+ }
+ len += rem;
+
+ /* XXX: optimally, we should use GFP_KERNEL */
+ kevent->name = kmalloc(len, GFP_ATOMIC);
+ if (unlikely(!kevent->name)) {
+ kmem_cache_free(event_cachep, kevent);
+ return NULL;
+ }
+ memset(kevent->name, 0, len);
+ strncpy(kevent->name, name, strlen(name));
+ kevent->event.len = len;
+ } else {
+ kevent->event.len = 0;
+ kevent->name = NULL;
+ }
+
+ return kevent;
+}
+
+/*
+ * inotify_dev_get_event - return the next event in the given dev's queue
+ *
+ * Caller must hold dev->lock.
+ */
+static inline struct inotify_kernel_event *
+inotify_dev_get_event(struct inotify_device *dev)
+{
+ return list_entry(dev->events.next, struct inotify_kernel_event, list);
+}
+
+/*
+ * inotify_dev_queue_event - add a new event to the given device
+ *
+ * Caller must hold dev->lock.
+ */
+static void inotify_dev_queue_event(struct inotify_device *dev,
+ struct inotify_watch *watch, u32 mask,
+ u32 cookie, const char *name)
+{
+ struct inotify_kernel_event *kevent, *last;
+
+ /* coalescing: drop this event if it is a dupe of the previous */
+ last = list_entry(dev->events.prev, struct inotify_kernel_event, list);
+ if (dev->event_count && last->event.mask == mask &&
+ last->event.wd == watch->wd) {
+ const char *lastname = last->name;
+
+ if (!name && !lastname)
+ return;
+ if (name && lastname && !strcmp(lastname, name))
+ return;
+ }
+
+ /*
+ * The queue has already overflowed and we have already sent the
+ * Q_OVERFLOW event.
+ */
+ if (unlikely(dev->event_count > dev->max_events))
+ return;
+
+ /* if the queue overflows, we need to notify user space */
+ if (unlikely(dev->event_count == dev->max_events))
+ kevent = kernel_event(-1, IN_Q_OVERFLOW, cookie, NULL);
+ else
+ kevent = kernel_event(watch->wd, mask, cookie, name);
+
+ if (unlikely(!kevent))
+ return;
+
+ /* queue the event and wake up anyone waiting */
+ dev->event_count++;
+ dev->queue_size += sizeof(struct inotify_event) + kevent->event.len;
+ list_add_tail(&kevent->list, &dev->events);
+ wake_up_interruptible(&dev->wq);
+}
+
+/*
+ * remove_kevent - cleans up and ultimately frees the given kevent
+ */
+static void remove_kevent(struct inotify_device *dev,
+ struct inotify_kernel_event *kevent)
+{
+ BUG_ON(!dev);
+ BUG_ON(!kevent);
+
+ list_del(&kevent->list);
+
+ dev->event_count--;
+ dev->queue_size -= sizeof(struct inotify_event) + kevent->event.len;
+
+ kfree(kevent->name);
+ kmem_cache_free(event_cachep, kevent);
+}
+
+/*
+ * inotify_dev_event_dequeue - destroy an event on the given device
+ *
+ * Caller must hold dev->lock.
+ */
+static void inotify_dev_event_dequeue(struct inotify_device *dev)
+{
+ if (!list_empty(&dev->events)) {
+ struct inotify_kernel_event *kevent;
+ kevent = inotify_dev_get_event(dev);
+ remove_kevent(dev, kevent);
+ }
+}
+
+/*
+ * inotify_dev_get_wd - returns the next WD for use by the given dev
+ *
+ * Grabs dev->lock. This function can sleep.
+ */
+static int inotify_dev_get_wd(struct inotify_device *dev,
+ struct inotify_watch *watch)
+{
+ int ret;
+
+ do {
+ if (unlikely(!idr_pre_get(&dev->idr, GFP_KERNEL)))
+ return -ENOSPC;
+ spin_lock(&dev->lock);
+ ret = idr_get_new(&dev->idr, watch, &watch->wd);
+ spin_unlock(&dev->lock);
+ } while (ret == -EAGAIN);
+
+ return ret;
+}
+
+/*
+ * create_watch - creates a watch on the given device.
+ *
+ * Calls inotify_dev_get_wd(), so it both grabs dev->lock and may sleep.
+ * Both 'dev' and 'inode' (by way of nameidata) need to be pinned.
+ */
+static struct inotify_watch *create_watch(struct inotify_device *dev,
+ u32 mask, struct inode *inode)
+{
+ struct inotify_watch *watch;
+
+ if (atomic_read(&dev->user->inotify_watches) >= max_user_watches)
+ return NULL;
+
+ watch = kmem_cache_alloc(watch_cachep, GFP_KERNEL);
+ if (unlikely(!watch))
+ return NULL;
+
+ if (unlikely(inotify_dev_get_wd(dev, watch))) {
+ kmem_cache_free(watch_cachep, watch);
+ return NULL;
+ }
+
+ watch->mask = mask;
+ atomic_set(&watch->count, 0);
+ INIT_LIST_HEAD(&watch->d_list);
+ INIT_LIST_HEAD(&watch->i_list);
+
+ /* save a reference to device and bump the count to make it official */
+ get_inotify_dev(dev);
+ watch->dev = dev;
+
+ /*
+ * Save a reference to the inode and bump the ref count to make it
+ * official. We hold a reference to nameidata, which makes this safe.
+ */
+ watch->inode = igrab(inode);
+
+ /* bump our own count, corresponding to our entry in dev->watches */
+ get_inotify_watch(watch);
+
+ atomic_inc(&dev->user->inotify_watches);
+
+ return watch;
+}
+
+/*
+ * inode_find_dev - find the watch associated with the given inode and dev
+ *
+ * Callers must hold inode->inotify_lock.
+ */
+static struct inotify_watch *inode_find_dev(struct inode *inode,
+ struct inotify_device *dev)
+{
+ struct inotify_watch *watch;
+
+ list_for_each_entry(watch, &inode->inotify_watches, i_list) {
+ if (watch->dev == dev)
+ return watch;
+ }
+
+ return NULL;
+}
+
+/*
+ * inotify_dev_is_watching_inode - is this device watching this inode?
+ *
+ * Requires 'dev' and 'inode' to both be pinned and dev->lock be held.
+ */
+static inline int inotify_dev_is_watching_inode(struct inotify_device *dev,
+ struct inode *inode)
+{
+ struct inotify_watch *watch;
+
+ list_for_each_entry(watch, &dev->watches, d_list) {
+ if (watch->inode == inode)
+ return 1;
+ }
+
+ return 0;
+}
+
+/*
+ * remove_watch_no_event - remove_watch() without the IN_IGNORED event.
+ */
+static void remove_watch_no_event(struct inotify_watch *watch,
+ struct inotify_device *dev)
+{
+ BUG_ON(!dev);
+ BUG_ON(!watch);
+
+ list_del(&watch->i_list);
+ list_del(&watch->d_list);
+
+ atomic_dec(&dev->user->inotify_watches);
+ idr_remove(&dev->idr, watch->wd);
+ put_inotify_watch(watch);
+}
+
+/*
+ * remove_watch - Remove a watch from both the device and the inode. Sends
+ * the IN_IGNORED event to the given device signifying that the inode is no
+ * longer watched.
+ *
+ * Callers must hold both inode->inotify_lock and dev->lock. We drop a
+ * reference to the inode before returning.
+ */
+static void remove_watch(struct inotify_watch *watch,
+ struct inotify_device *dev)
+{
+ inotify_dev_queue_event(dev, watch, IN_IGNORED, 0, NULL);
+ remove_watch_no_event(watch, dev);
+}
+
+/*
+ * __inode_queue_event - internal helper for inotify_inode_queue_event()
+ *
+ * Caller must hold inode->inotify_lock.
+ */
+void __inode_queue_event(struct inode *inode, u32 mask, u32 cookie,
+ const char *name)
+{
+ struct inotify_watch *watch;
+
+ list_for_each_entry(watch, &inode->inotify_watches, i_list) {
+ if (watch->mask & mask) {
+ struct inotify_device *dev = watch->dev;
+ spin_lock(&dev->lock);
+ inotify_dev_queue_event(dev, watch, mask, cookie, name);
+ spin_unlock(&dev->lock);
+ }
+ }
+}
+
+/* Kernel API */
+
+/**
+ * inotify_inode_queue_event - queue an event to all watches on this inode
+ * @inode: inode event is originating from
+ * @mask: event mask describing this event
+ * @cookie: cookie for synchronization, or zero
+ * @name: filename, if any
+ */
+void inotify_inode_queue_event(struct inode *inode, u32 mask, u32 cookie,
+ const char *name)
+{
+ spin_lock(&inode->inotify_lock);
+ __inode_queue_event(inode, mask, cookie, name);
+ spin_unlock(&inode->inotify_lock);
+}
+EXPORT_SYMBOL_GPL(inotify_inode_queue_event);
+
+/**
+ * inotify_dentry_parent_queue_event - queue an event to a dentry's parent
+ * @dentry: the dentry in question, we queue against this dentry's parent
+ * @mask: event mask describing this event
+ * @cookie: cookie for synchronization, or zero
+ * @name: filename, if any
+ */
+void inotify_dentry_parent_queue_event(struct dentry *dentry, u32 mask,
+ u32 cookie, const char *name)
+{
+ struct dentry *parent;
+ struct inode *inode;
+
+ spin_lock(&dentry->d_lock);
+ parent = dentry->d_parent;
+ inode = parent->d_inode;
+ spin_lock(&inode->inotify_lock);
+ if (!list_empty(&inode->inotify_watches)) {
+ dget(parent);
+ spin_unlock(&dentry->d_lock);
+ __inode_queue_event(inode, mask, cookie, name);
+ spin_unlock(&inode->inotify_lock);
+ dput(parent);
+ } else {
+ spin_unlock(&inode->inotify_lock);
+ spin_unlock(&dentry->d_lock);
+ }
+}
+EXPORT_SYMBOL_GPL(inotify_dentry_parent_queue_event);
+
+/**
+ * inotify_get_cookie - return a unique cookie for use in synchronizing events
+ *
+ * Returns the unique cookie.
+ */
+u32 inotify_get_cookie(void)
+{
+ return atomic_inc_return(&inotify_cookie);
+}
+EXPORT_SYMBOL_GPL(inotify_get_cookie);
+
+/**
+ * inotify_super_block_umount - process watches on an unmounted fs
+ * @sb: the super_block of the filesystem in question
+ */
+void inotify_super_block_umount(struct super_block *sb)
+{
+ struct inode *inode;
+
+ /* walk the list of inodes on this superblock */
+ spin_lock(&inode_lock);
+ list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+ struct inotify_watch *watch, *next;
+ struct list_head *watches;
+
+ /* for each watch, send IN_UNMOUNT and then remove it */
+ spin_lock(&inode->inotify_lock);
+ watches = &inode->inotify_watches;
+ list_for_each_entry_safe(watch, next, watches, i_list) {
+ struct inotify_device *dev = watch->dev;
+ spin_lock(&dev->lock);
+ inotify_dev_queue_event(dev, watch, IN_UNMOUNT,0,NULL);
+ remove_watch(watch, dev);
+ spin_unlock(&dev->lock);
+ }
+ spin_unlock(&inode->inotify_lock);
+ }
+ spin_unlock(&inode_lock);
+}
+EXPORT_SYMBOL_GPL(inotify_super_block_umount);
+
+/**
+ * inotify_inode_is_dead - an inode has been deleted, cleanup any watches
+ * @inode: inode that is about to be removed
+ */
+void inotify_inode_is_dead(struct inode *inode)
+{
+ struct inotify_watch *watch, *next;
+
+ spin_lock(&inode->inotify_lock);
+ list_for_each_entry_safe(watch, next, &inode->inotify_watches, i_list) {
+ struct inotify_device *dev = watch->dev;
+ spin_lock(&dev->lock);
+ remove_watch(watch, dev);
+ spin_unlock(&dev->lock);
+ }
+ spin_unlock(&inode->inotify_lock);
+}
+EXPORT_SYMBOL_GPL(inotify_inode_is_dead);
+
+/* Device Interface */
+
+static unsigned int inotify_poll(struct file *file, poll_table *wait)
+{
+ struct inotify_device *dev;
+ int ret = 0;
+
+ dev = file->private_data;
+ get_inotify_dev(dev);
+
+ poll_wait(file, &dev->wq, wait);
+ spin_lock(&dev->lock);
+ if (!list_empty(&dev->events))
+ ret = POLLIN | POLLRDNORM;
+ spin_unlock(&dev->lock);
+
+ put_inotify_dev(dev);
+ return ret;
+}
+
+static ssize_t inotify_read(struct file *file, char __user *buf,
+ size_t count, loff_t *pos)
+{
+ size_t event_size;
+ struct inotify_device *dev;
+ char __user *start;
+ int ret;
+ DEFINE_WAIT(wait);
+
+ start = buf;
+ dev = file->private_data;
+
+ /* we only hand out full inotify events */
+ event_size = sizeof(struct inotify_event);
+ if (count < event_size)
+ return 0;
+
+ while (1) {
+ int events;
+
+ prepare_to_wait(&dev->wq, &wait, TASK_INTERRUPTIBLE);
+
+ spin_lock(&dev->lock);
+ events = !list_empty(&dev->events);
+ spin_unlock(&dev->lock);
+ if (events) {
+ ret = 0;
+ break;
+ }
+
+ if (file->f_flags & O_NONBLOCK) {
+ ret = -EAGAIN;
+ break;
+ }
+
+ if (signal_pending(current)) {
+ ret = -EINTR;
+ break;
+ }
+
+ schedule();
+ }
+
+ finish_wait(&dev->wq, &wait);
+ if (ret)
+ return ret;
+
+ while (1) {
+ struct inotify_kernel_event *kevent;
+
+ spin_lock(&dev->lock);
+ if (list_empty(&dev->events)) {
+ spin_unlock(&dev->lock);
+ break;
+ }
+ kevent = inotify_dev_get_event(dev);
+ if (event_size + kevent->event.len > count) {
+ spin_unlock(&dev->lock);
+ break;
+ }
+ list_del_init(&kevent->list);
+ spin_unlock(&dev->lock);
+
+ if (copy_to_user(buf, &kevent->event, event_size)) {
+ /* put the event back on the queue */
+ spin_lock(&dev->lock);
+ list_add(&kevent->list, &dev->events);
+ spin_unlock(&dev->lock);
+ return -EFAULT;
+ }
+ buf += event_size;
+ count -= event_size;
+
+ if (kevent->name) {
+ if (copy_to_user(buf, kevent->name, kevent->event.len)){
+ /* put the event back on the queue */
+ spin_lock(&dev->lock);
+ list_add(&kevent->list, &dev->events);
+ spin_unlock(&dev->lock);
+ return -EFAULT;
+ }
+ buf += kevent->event.len;
+ count -= kevent->event.len;
+ }
+
+ /*
+ * We made it here, so the event was copied to the user. It is
+ * already removed from the event list, just free it.
+ */
+ spin_lock(&dev->lock);
+ remove_kevent(dev, kevent);
+ spin_unlock(&dev->lock);
+ }
+
+ return buf - start;
+}
+
+static int inotify_open(struct inode *inode, struct file *file)
+{
+ struct inotify_device *dev;
+ struct user_struct *user;
+ int ret;
+
+ user = get_uid(current->user);
+
+ if (unlikely(atomic_read(&user->inotify_devs) >= max_user_devices)) {
+ ret = -EMFILE;
+ goto out_err;
+ }
+
+ dev = kmalloc(sizeof(struct inotify_device), GFP_KERNEL);
+ if (unlikely(!dev)) {
+ ret = -ENOMEM;
+ goto out_err;
+ }
+
+ idr_init(&dev->idr);
+ INIT_LIST_HEAD(&dev->events);
+ INIT_LIST_HEAD(&dev->watches);
+ init_waitqueue_head(&dev->wq);
+ spin_lock_init(&dev->lock);
+
+ dev->event_count = 0;
+ dev->queue_size = 0;
+ dev->max_events = max_queued_events;
+ dev->user = user;
+ atomic_set(&dev->count, 0);
+
+ get_inotify_dev(dev);
+ atomic_inc(&current->user->inotify_devs);
+
+ file->private_data = dev;
+
+ return 0;
+out_err:
+ free_uid(user);
+ return ret;
+}
+
+static int inotify_release(struct inode *inode, struct file *file)
+{
+ struct inotify_device *dev;
+
+ dev = file->private_data;
+ BUG_ON(!dev);
+
+ /*
+ * Destroy all of the watches on this device. Unfortunately, not very
+ * pretty. We cannot do a simple iteration over the list, because we
+ * do not know the inode until we iterate to the watch. But we need to
+ * hold inode->inotify_lock before dev->lock. The following works.
+ */
+ while (1) {
+ struct inotify_watch *watch;
+ struct list_head *watches;
+ struct inode *inode;
+
+ spin_lock(&dev->lock);
+ watches = &dev->watches;
+ if (list_empty(watches)) {
+ spin_unlock(&dev->lock);
+ break;
+ }
+ watch = list_entry(watches->next, struct inotify_watch, d_list);
+ get_inotify_watch(watch);
+ spin_unlock(&dev->lock);
+
+ inode = watch->inode;
+ spin_lock(&inode->inotify_lock);
+ spin_lock(&dev->lock);
+ remove_watch_no_event(watch, dev);
+ spin_unlock(&dev->lock);
+ spin_unlock(&inode->inotify_lock);
+ put_inotify_watch(watch);
+ }
+
+ /* destroy all of the events on this device */
+ spin_lock(&dev->lock);
+ while (!list_empty(&dev->events))
+ inotify_dev_event_dequeue(dev);
+ spin_unlock(&dev->lock);
+
+ /* free this device: the put matching the get in inotify_open() */
+ put_inotify_dev(dev);
+
+ return 0;
+}
+
+static int inotify_add_watch(struct inotify_device *dev,
+ struct inotify_watch_request *request)
+{
+ struct inode *inode;
+ struct inotify_watch *watch, *old;
+ struct nameidata nd;
+ int ret;
+
+ ret = __user_walk(request->name, LOOKUP_FOLLOW, &nd);
+ if (unlikely(ret))
+ return ret;
+
+ /* you can only watch an inode if you have read permissions on it */
+ ret = permission(nd.dentry->d_inode, MAY_READ, &nd);
+ if (unlikely(ret))
+ goto nd_out;
+
+ /* inode is held in place by a reference on nd */
+ inode = nd.dentry->d_inode;
+
+ /*
+ * Handle the case of re-adding a watch on an (inode,dev) pair that we
+ * are already watching. We just update the mask and return its wd.
+ */
+ spin_lock(&inode->inotify_lock);
+ spin_lock(&dev->lock);
+ old = inode_find_dev(inode, dev);
+ if (unlikely(old)) {
+ old->mask = request->mask;
+ ret = old->wd;
+ spin_unlock(&dev->lock);
+ spin_unlock(&inode->inotify_lock);
+ goto nd_out;
+ }
+ spin_unlock(&dev->lock);
+ spin_unlock(&inode->inotify_lock);
+
+ /*
+ * We do this lockless, for both scalability and so we can allocate
+ * with GFP_KERNEL. But that means we can race here and add this watch
+ * twice. We fix up that case below, by again checking for the watch
+ * after we reacquire the locks.
+ */
+ watch = create_watch(dev, request->mask, inode);
+ if (unlikely(!watch)) {
+ ret = -ENOSPC;
+ goto nd_out;
+ }
+
+ spin_lock(&inode->inotify_lock);
+ spin_lock(&dev->lock);
+ old = inode_find_dev(inode, dev);
+ if (unlikely(old)) {
+ /* We raced! Destroy this watch and return. */
+ put_inotify_watch(watch);
+ ret = -EBUSY;
+ } else {
+ /* Add the watch to the device's and the inode's list */
+ list_add(&watch->d_list, &dev->watches);
+ list_add(&watch->i_list, &inode->inotify_watches);
+ ret = watch->wd;
+ }
+ spin_unlock(&dev->lock);
+ spin_unlock(&inode->inotify_lock);
+
+nd_out:
+ path_release(&nd);
+
+ return ret;
+}
+
+/*
+ * inotify_ignore - handle the INOTIFY_IGNORE ioctl, asking that a given wd be
+ * removed from the device.
+ */
+static int inotify_ignore(struct inotify_device *dev, s32 wd)
+{
+ struct inotify_watch *watch;
+ struct inode *inode;
+
+ spin_lock(&dev->lock);
+ watch = idr_find(&dev->idr, wd);
+ spin_unlock(&dev->lock);
+
+ if (unlikely(!watch))
+ return -EINVAL;
+
+ inode = watch->inode;
+ spin_lock(&inode->inotify_lock);
+ spin_lock(&dev->lock);
+ remove_watch(watch, dev);
+ spin_unlock(&dev->lock);
+ spin_unlock(&inode->inotify_lock);
+
+ return 0;
+}
+
+static int inotify_ioctl(struct inode *ip, struct file *file, unsigned int cmd,
+ unsigned long arg)
+{
+ struct inotify_device *dev;
+ struct inotify_watch_request request;
+ void __user *p;
+ int ret = -ENOTTY;
+ s32 wd;
+
+ dev = file->private_data;
+ p = (void __user *) arg;
+
+ get_inotify_dev(dev);
+
+ switch (cmd) {
+ case INOTIFY_WATCH:
+ if (unlikely(copy_from_user(&request, p, sizeof (request)))) {
+ ret = -EFAULT;
+ break;
+ }
+ ret = inotify_add_watch(dev, &request);
+ break;
+ case INOTIFY_IGNORE:
+ if (unlikely(copy_from_user(&wd, p, sizeof (wd)))) {
+ ret = -EFAULT;
+ break;
+ }
+ ret = inotify_ignore(dev, wd);
+ break;
+ case FIONREAD:
+ ret = put_user(dev->queue_size, (int __user *) p);
+ break;
+ }
+
+ put_inotify_dev(dev);
+
+ return ret;
+}
+
+static struct file_operations inotify_fops = {
+ .owner = THIS_MODULE,
+ .poll = inotify_poll,
+ .read = inotify_read,
+ .open = inotify_open,
+ .release = inotify_release,
+ .ioctl = inotify_ioctl,
+};
+
+static struct miscdevice inotify_device = {
+ .minor = MISC_DYNAMIC_MINOR,
+ .name = "inotify",
+ .fops = &inotify_fops,
+};
+
+/*
+ * inotify_init - Our initialization function. Note that we cannot return
+ * error because we have compiled-in VFS hooks. So an (unlikely) failure here
+ * must result in panic().
+ */
+static int __init inotify_init(void)
+{
+ struct class_device *class;
+ int ret;
+
+ ret = misc_register(&inotify_device);
+ if (unlikely(ret))
+ panic("inotify: misc_register returned %d\n", ret);
+
+ max_queued_events = 512;
+ max_user_devices = 64;
+ max_user_watches = 16384;
+
+ class = inotify_device.class;
+ class_device_create_file(class, &class_device_attr_max_queued_events);
+ class_device_create_file(class, &class_device_attr_max_user_devices);
+ class_device_create_file(class, &class_device_attr_max_user_watches);
+
+ atomic_set(&inotify_cookie, 0);
+
+ watch_cachep = kmem_cache_create("inotify_watch_cache",
+ sizeof(struct inotify_watch),
+ 0, SLAB_PANIC, NULL, NULL);
+ event_cachep = kmem_cache_create("inotify_event_cache",
+ sizeof(struct inotify_kernel_event),
+ 0, SLAB_PANIC, NULL, NULL);
+
+ printk(KERN_INFO "inotify device minor=%d\n", inotify_device.minor);
+
+ return 0;
+}
+
+module_init(inotify_init);
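
A note on the name padding in kernel_event() above: the arithmetic rounds
strlen(name) + 1 up to the next multiple of sizeof(struct inotify_event).
The following is a minimal user-space sketch of that computation, not part
of the patch. It assumes a 16-byte event structure purely for illustration
(the real size comes from include/linux/inotify.h), and the names
padded_name_len() and round_up_len() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Assumption: stand-in for sizeof(struct inotify_event). The real value is
 * whatever include/linux/inotify.h defines; 16 is used here for illustration.
 */
#define EVENT_SIZE 16

/*
 * Same arithmetic as kernel_event(): pad strlen(name) + 1 so that the
 * result is a multiple of event_size.
 */
static size_t padded_name_len(const char *name, size_t event_size)
{
	size_t len, rem;

	assert(event_size > 0);

	len = strlen(name) + 1;		/* include the terminating NUL */
	rem = event_size - len;		/* only used when len <= event_size */
	if (len > event_size) {
		rem = event_size - (len % event_size);
		if (len % event_size == 0)
			rem = 0;
	}

	return len + rem;
}

/* Closed form of the same thing: round len up to a multiple of event_size. */
static size_t round_up_len(size_t len, size_t event_size)
{
	return (len + event_size - 1) / event_size * event_size;
}
```

With EVENT_SIZE of 16, a 3-character name occupies 16 bytes and a
16-character name (17 bytes with the NUL) occupies 32. The two functions
should agree for any non-empty length, which suggests the branchy version
in kernel_event() could be replaced by the closed form if desired.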
diff -urN linux-2.6.11/fs/Kconfig linux/fs/Kconfig
--- linux-2.6.11/fs/Kconfig 2005-03-02 02:38:10.000000000 -0500
+++ linux/fs/Kconfig 2005-03-07 16:10:14.160639960 -0500
@@ -339,6 +339,19 @@
If you don't know whether you need it, then you don't need it:
answer N.

+config INOTIFY
+ bool "Inotify file change notification support"
+ default y
+ ---help---
+ Say Y here to enable inotify support and the /dev/inotify character
+ device. Inotify is a file change notification system and a
+ replacement for dnotify. Inotify fixes numerous shortcomings in
+ dnotify and introduces several new features. It allows monitoring
+ of both files and directories via a single open fd. Multiple file
+ events are supported.
+
+ If unsure, say Y.
+
config QUOTA
bool "Quota support"
help
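
For readers wondering what a consumer of the single fd looks like:
inotify_read() in fs/inotify.c hands out whole records, each a fixed-size
event header followed by event.len bytes of padded name. Below is a rough
user-space sketch of walking such a buffer. It is not part of the patch.
The struct ev_header layout is an assumption that mirrors the fields
inotify.c fills in (wd, mask, cookie, len); the authoritative definition is
struct inotify_event in include/linux/inotify.h. The names ev_header,
ev_callback and walk_events() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Assumed header layout, for illustration only. Use struct inotify_event
 * from include/linux/inotify.h in real code.
 */
struct ev_header {
	int32_t wd;		/* watch descriptor */
	uint32_t mask;		/* event mask */
	uint32_t cookie;	/* ties related events together, or zero */
	uint32_t len;		/* bytes of padded name after the header */
};

/* Hypothetical per-event callback; name is NULL when len is zero. */
typedef void (*ev_callback)(const struct ev_header *hdr, const char *name,
			    void *ctx);

/*
 * Walk nbytes of data as returned by read() on the device. Returns the
 * number of complete records seen; stops at the first truncated record.
 */
static size_t walk_events(const char *buf, size_t nbytes, ev_callback cb,
			  void *ctx)
{
	size_t off = 0, n = 0;

	assert(buf != NULL);

	while (nbytes - off >= sizeof(struct ev_header)) {
		struct ev_header hdr;

		/* copy out the header to avoid unaligned accesses */
		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.len > nbytes - off - sizeof(hdr))
			break;	/* name does not fit: truncated record */

		if (cb)
			cb(&hdr, hdr.len ? buf + off + sizeof(hdr) : NULL, ctx);

		off += sizeof(hdr) + hdr.len;
		n++;
	}

	return n;
}
```

A caller would read() into a buffer, pass the returned byte count as
nbytes, and match hdr.wd against the watch descriptors it got back from
the INOTIFY_WATCH ioctl. Since the kernel only returns whole events, the
truncation check is defensive rather than something expected to trigger.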
diff -urN linux-2.6.11/fs/Makefile linux/fs/Makefile
--- linux-2.6.11/fs/Makefile 2005-03-02 02:38:10.000000000 -0500
+++ linux/fs/Makefile 2005-03-07 16:10:14.162639656 -0500
@@ -11,6 +11,7 @@
attr.o bad_inode.o file.o filesystems.o namespace.o aio.o \
seq_file.o xattr.o libfs.o fs-writeback.o mpage.o direct-io.o \

+obj-$(CONFIG_INOTIFY) += inotify.o
obj-$(CONFIG_EPOLL) += eventpoll.o
obj-$(CONFIG_COMPAT) += compat.o

diff -urN linux-2.6.11/fs/namei.c linux/fs/namei.c
--- linux-2.6.11/fs/namei.c 2005-03-02 02:37:55.000000000 -0500
+++ linux/fs/namei.c 2005-03-07 16:10:14.165639200 -0500
@@ -21,7 +21,7 @@
#include <linux/namei.h>
#include <linux/quotaops.h>
#include <linux/pagemap.h>
-#include <linux/dnotify.h>
+#include <linux/fsnotify.h>
#include <linux/smp_lock.h>
#include <linux/personality.h>
#include <linux/security.h>
@@ -1252,7 +1252,7 @@
DQUOT_INIT(dir);
error = dir->i_op->create(dir, dentry, mode, nd);
if (!error) {
- inode_dir_notify(dir, DN_CREATE);
+ fsnotify_create(dir, dentry->d_name.name);
security_inode_post_create(dir, dentry, mode);
}
return error;
@@ -1557,7 +1557,7 @@
DQUOT_INIT(dir);
error = dir->i_op->mknod(dir, dentry, mode, dev);
if (!error) {
- inode_dir_notify(dir, DN_CREATE);
+ fsnotify_create(dir, dentry->d_name.name);
security_inode_post_mknod(dir, dentry, mode, dev);
}
return error;
@@ -1630,7 +1630,7 @@
DQUOT_INIT(dir);
error = dir->i_op->mkdir(dir, dentry, mode);
if (!error) {
- inode_dir_notify(dir, DN_CREATE);
+ fsnotify_mkdir(dir, dentry->d_name.name);
security_inode_post_mkdir(dir,dentry, mode);
}
return error;
@@ -1725,7 +1725,7 @@
}
up(&dentry->d_inode->i_sem);
if (!error) {
- inode_dir_notify(dir, DN_DELETE);
+ fsnotify_rmdir(dentry, dentry->d_inode, dir);
d_delete(dentry);
}
dput(dentry);
@@ -1798,9 +1798,10 @@

/* We don't d_delete() NFS sillyrenamed files--they still exist. */
if (!error && !(dentry->d_flags & DCACHE_NFSFS_RENAMED)) {
+ fsnotify_unlink(dentry, dir);
d_delete(dentry);
- inode_dir_notify(dir, DN_DELETE);
}
+
return error;
}

@@ -1874,7 +1875,7 @@
DQUOT_INIT(dir);
error = dir->i_op->symlink(dir, dentry, oldname);
if (!error) {
- inode_dir_notify(dir, DN_CREATE);
+ fsnotify_create(dir, dentry->d_name.name);
security_inode_post_symlink(dir, dentry, oldname);
}
return error;
@@ -1947,7 +1948,7 @@
error = dir->i_op->link(old_dentry, dir, new_dentry);
up(&old_dentry->d_inode->i_sem);
if (!error) {
- inode_dir_notify(dir, DN_CREATE);
+ fsnotify_create(dir, new_dentry->d_name.name);
security_inode_post_link(old_dentry, dir, new_dentry);
}
return error;
@@ -2111,6 +2112,7 @@
{
int error;
int is_dir = S_ISDIR(old_dentry->d_inode->i_mode);
+ char *old_name;

if (old_dentry->d_inode == new_dentry->d_inode)
return 0;
@@ -2132,18 +2134,18 @@
DQUOT_INIT(old_dir);
DQUOT_INIT(new_dir);

+ old_name = fsnotify_oldname_init(old_dentry);
+
if (is_dir)
error = vfs_rename_dir(old_dir,old_dentry,new_dir,new_dentry);
else
error = vfs_rename_other(old_dir,old_dentry,new_dir,new_dentry);
if (!error) {
- if (old_dir == new_dir)
- inode_dir_notify(old_dir, DN_RENAME);
- else {
- inode_dir_notify(old_dir, DN_DELETE);
- inode_dir_notify(new_dir, DN_CREATE);
- }
+ const char *new_name = old_dentry->d_name.name;
+ fsnotify_move(old_dir, new_dir, old_name, new_name);
}
+ fsnotify_oldname_free(old_name);
+
return error;
}

diff -urN linux-2.6.11/fs/open.c linux/fs/open.c
--- linux-2.6.11/fs/open.c 2005-03-02 02:37:47.000000000 -0500
+++ linux/fs/open.c 2005-03-07 16:10:14.167638896 -0500
@@ -10,7 +10,7 @@
#include <linux/file.h>
#include <linux/smp_lock.h>
#include <linux/quotaops.h>
-#include <linux/dnotify.h>
+#include <linux/fsnotify.h>
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/tty.h>
@@ -944,9 +944,11 @@
fd = get_unused_fd();
if (fd >= 0) {
struct file *f = filp_open(tmp, flags, mode);
+
error = PTR_ERR(f);
if (IS_ERR(f))
goto out_error;
+ fsnotify_open(f->f_dentry);
fd_install(fd, f);
}
out:
@@ -998,7 +1000,7 @@
retval = err;
}

- dnotify_flush(filp, id);
+ fsnotify_flush(filp, id);
locks_remove_posix(filp, id);
fput(filp);
return retval;
diff -urN linux-2.6.11/fs/read_write.c linux/fs/read_write.c
--- linux-2.6.11/fs/read_write.c 2005-03-02 02:38:12.000000000 -0500
+++ linux/fs/read_write.c 2005-03-07 16:11:36.806075944 -0500
@@ -10,7 +10,7 @@
#include <linux/file.h>
#include <linux/uio.h>
#include <linux/smp_lock.h>
-#include <linux/dnotify.h>
+#include <linux/fsnotify.h>
#include <linux/security.h>
#include <linux/module.h>
#include <linux/syscalls.h>
@@ -239,7 +239,7 @@
else
ret = do_sync_read(file, buf, count, pos);
if (ret > 0) {
- dnotify_parent(file->f_dentry, DN_ACCESS);
+ fsnotify_access(file->f_dentry);
current->rchar += ret;
}
current->syscr++;
@@ -287,7 +287,7 @@
else
ret = do_sync_write(file, buf, count, pos);
if (ret > 0) {
- dnotify_parent(file->f_dentry, DN_MODIFY);
+ fsnotify_modify(file->f_dentry);
current->wchar += ret;
}
current->syscw++;
@@ -523,9 +523,12 @@
out:
if (iov != iovstack)
kfree(iov);
- if ((ret + (type == READ)) > 0)
- dnotify_parent(file->f_dentry,
- (type == READ) ? DN_ACCESS : DN_MODIFY);
+ if ((ret + (type == READ)) > 0) {
+ if (type == READ)
+ fsnotify_access(file->f_dentry);
+ else
+ fsnotify_modify(file->f_dentry);
+ }
return ret;
Efault:
ret = -EFAULT;
diff -urN linux-2.6.11/fs/super.c linux/fs/super.c
--- linux-2.6.11/fs/super.c 2005-03-02 02:38:08.000000000 -0500
+++ linux/fs/super.c 2005-03-07 16:11:51.305871640 -0500
@@ -37,6 +37,7 @@
#include <linux/writeback.h> /* for the emergency remount stuff */
#include <linux/idr.h>
#include <linux/kobject.h>
+#include <linux/fsnotify.h>
#include <asm/uaccess.h>


@@ -229,6 +230,7 @@

if (root) {
sb->s_root = NULL;
+ fsnotify_sb_umount(sb);
shrink_dcache_parent(root);
shrink_dcache_anon(&sb->s_anon);
dput(root);
diff -urN linux-2.6.11/include/linux/fs.h linux/include/linux/fs.h
--- linux-2.6.11/include/linux/fs.h 2005-03-02 02:37:50.000000000 -0500
+++ linux/include/linux/fs.h 2005-03-07 16:10:14.173637984 -0500
@@ -224,6 +224,7 @@
struct kstatfs;
struct vm_area_struct;
struct vfsmount;
+struct inotify_inode_data;

/* Used to be a macro which just called the function, now just a function */
extern void update_atime (struct inode *);
@@ -471,6 +472,11 @@
struct dnotify_struct *i_dnotify; /* for directory notifications */
#endif

+#ifdef CONFIG_INOTIFY
+ struct list_head inotify_watches; /* watches on this inode */
+ spinlock_t inotify_lock; /* protects the watches list */
+#endif
+
unsigned long i_state;
unsigned long dirtied_when; /* jiffies of first dirtying */

@@ -1365,7 +1371,7 @@
extern int do_remount_sb(struct super_block *sb, int flags,
void *data, int force);
extern sector_t bmap(struct inode *, sector_t);
-extern int setattr_mask(unsigned int);
+extern void setattr_mask(unsigned int, int *, u32 *);
extern int notify_change(struct dentry *, struct iattr *);
extern int permission(struct inode *, int, struct nameidata *);
extern int generic_permission(struct inode *, int,
diff -urN linux-2.6.11/include/linux/fsnotify.h linux/include/linux/fsnotify.h
--- linux-2.6.11/include/linux/fsnotify.h 1969-12-31 19:00:00.000000000 -0500
+++ linux/include/linux/fsnotify.h 2005-03-07 16:10:14.174637832 -0500
@@ -0,0 +1,236 @@
+#ifndef _LINUX_FS_NOTIFY_H
+#define _LINUX_FS_NOTIFY_H
+
+/*
+ * include/linux/fsnotify.h - generic hooks for filesystem notification, to
+ * reduce in-source duplication from both dnotify and inotify.
+ *
+ * We don't compile any of this away in some complicated menagerie of ifdefs.
+ * Instead, we rely on the code inside to optimize away as needed.
+ *
+ * (C) Copyright 2005 Robert Love
+ */
+
+#ifdef __KERNEL__
+
+#include <linux/dnotify.h>
+#include <linux/inotify.h>
+
+/*
+ * fsnotify_move - file old_name at old_dir was moved to new_name at new_dir
+ */
+static inline void fsnotify_move(struct inode *old_dir, struct inode *new_dir,
+ const char *old_name, const char *new_name)
+{
+ u32 cookie;
+
+ if (old_dir == new_dir)
+ inode_dir_notify(old_dir, DN_RENAME);
+ else {
+ inode_dir_notify(old_dir, DN_DELETE);
+ inode_dir_notify(new_dir, DN_CREATE);
+ }
+
+ cookie = inotify_get_cookie();
+
+ inotify_inode_queue_event(old_dir, IN_MOVED_FROM, cookie, old_name);
+ inotify_inode_queue_event(new_dir, IN_MOVED_TO, cookie, new_name);
+}
+
+/*
+ * fsnotify_unlink - file was unlinked
+ */
+static inline void fsnotify_unlink(struct dentry *dentry, struct inode *dir)
+{
+ struct inode *inode = dentry->d_inode;
+
+ inode_dir_notify(dir, DN_DELETE);
+ inotify_inode_queue_event(dir, IN_DELETE_FILE, 0, dentry->d_name.name);
+ inotify_inode_queue_event(inode, IN_DELETE_SELF, 0, NULL);
+
+ inotify_inode_is_dead(inode);
+}
+
+/*
+ * fsnotify_rmdir - directory was removed
+ */
+static inline void fsnotify_rmdir(struct dentry *dentry, struct inode *inode,
+ struct inode *dir)
+{
+ inode_dir_notify(dir, DN_DELETE);
+ inotify_inode_queue_event(dir, IN_DELETE_SUBDIR,0,dentry->d_name.name);
+ inotify_inode_queue_event(inode, IN_DELETE_SELF, 0, NULL);
+
+ inotify_inode_is_dead(inode);
+}
+
+/*
+ * fsnotify_create - filename was linked in
+ */
+static inline void fsnotify_create(struct inode *inode, const char *filename)
+{
+ inode_dir_notify(inode, DN_CREATE);
+ inotify_inode_queue_event(inode, IN_CREATE_FILE, 0, filename);
+}
+
+/*
+ * fsnotify_mkdir - directory 'name' was created
+ */
+static inline void fsnotify_mkdir(struct inode *inode, const char *name)
+{
+ inode_dir_notify(inode, DN_CREATE);
+ inotify_inode_queue_event(inode, IN_CREATE_SUBDIR, 0, name);
+}
+
+/*
+ * fsnotify_access - file was read
+ */
+static inline void fsnotify_access(struct dentry *dentry)
+{
+ dnotify_parent(dentry, DN_ACCESS);
+ inotify_dentry_parent_queue_event(dentry, IN_ACCESS, 0,
+ dentry->d_name.name);
+ inotify_inode_queue_event(dentry->d_inode, IN_ACCESS, 0, NULL);
+}
+
+/*
+ * fsnotify_modify - file was modified
+ */
+static inline void fsnotify_modify(struct dentry *dentry)
+{
+ dnotify_parent(dentry, DN_MODIFY);
+ inotify_dentry_parent_queue_event(dentry, IN_MODIFY, 0,
+ dentry->d_name.name);
+ inotify_inode_queue_event(dentry->d_inode, IN_MODIFY, 0, NULL);
+}
+
+/*
+ * fsnotify_open - file was opened
+ */
+static inline void fsnotify_open(struct dentry *dentry)
+{
+ inotify_inode_queue_event(dentry->d_inode, IN_OPEN, 0, NULL);
+ inotify_dentry_parent_queue_event(dentry, IN_OPEN, 0,
+ dentry->d_name.name);
+}
+
+/*
+ * fsnotify_close - file was closed
+ */
+static inline void fsnotify_close(struct file *file)
+{
+ struct dentry *dentry = file->f_dentry;
+ struct inode *inode = dentry->d_inode;
+ const char *filename = dentry->d_name.name;
+ mode_t mode = file->f_mode;
+ u32 mask;
+
+ mask = (mode & FMODE_WRITE) ? IN_CLOSE_WRITE : IN_CLOSE_NOWRITE;
+ inotify_dentry_parent_queue_event(dentry, mask, 0, filename);
+ inotify_inode_queue_event(inode, mask, 0, NULL);
+}
+
+/*
+ * fsnotify_change - notify_change event. file was modified and/or metadata
+ * was changed.
+ */
+static inline void fsnotify_change(struct dentry *dentry, unsigned int ia_valid)
+{
+ int dn_mask = 0;
+ u32 in_mask = 0;
+
+ if (ia_valid & ATTR_UID) {
+ in_mask |= IN_ATTRIB;
+ dn_mask |= DN_ATTRIB;
+ }
+ if (ia_valid & ATTR_GID) {
+ in_mask |= IN_ATTRIB;
+ dn_mask |= DN_ATTRIB;
+ }
+ if (ia_valid & ATTR_SIZE) {
+ in_mask |= IN_MODIFY;
+ dn_mask |= DN_MODIFY;
+ }
+ /* both times implies a utime(s) call */
+ if ((ia_valid & (ATTR_ATIME | ATTR_MTIME)) == (ATTR_ATIME | ATTR_MTIME))
+ {
+ in_mask |= IN_ATTRIB;
+ dn_mask |= DN_ATTRIB;
+ } else if (ia_valid & ATTR_ATIME) {
+ in_mask |= IN_ACCESS;
+ dn_mask |= DN_ACCESS;
+ } else if (ia_valid & ATTR_MTIME) {
+ in_mask |= IN_MODIFY;
+ dn_mask |= DN_MODIFY;
+ }
+ if (ia_valid & ATTR_MODE) {
+ in_mask |= IN_ATTRIB;
+ dn_mask |= DN_ATTRIB;
+ }
+
+ if (dn_mask)
+ dnotify_parent(dentry, dn_mask);
+ if (in_mask) {
+ inotify_inode_queue_event(dentry->d_inode, in_mask, 0, NULL);
+ inotify_dentry_parent_queue_event(dentry, in_mask, 0,
+ dentry->d_name.name);
+ }
+}
+
+/*
+ * fsnotify_sb_umount - filesystem unmount
+ */
+static inline void fsnotify_sb_umount(struct super_block *sb)
+{
+ inotify_super_block_umount(sb);
+}
+
+/*
+ * fsnotify_flush - flush time!
+ */
+static inline void fsnotify_flush(struct file *filp, fl_owner_t id)
+{
+ dnotify_flush(filp, id);
+}
+
+#ifdef CONFIG_INOTIFY /* inotify helpers */
+
+/*
+ * fsnotify_oldname_init - save off the old filename before we change it
+ *
+ * this could be kstrdup if only we could add that to lib/string.c
+ */
+static inline char *fsnotify_oldname_init(struct dentry *old_dentry)
+{
+ char *old_name;
+
+ old_name = kmalloc(strlen(old_dentry->d_name.name) + 1, GFP_KERNEL);
+ if (old_name)
+ strcpy(old_name, old_dentry->d_name.name);
+ return old_name;
+}
+
+/*
+ * fsnotify_oldname_free - free the name we got from fsnotify_oldname_init
+ */
+static inline void fsnotify_oldname_free(const char *old_name)
+{
+ kfree(old_name);
+}
+
+#else /* CONFIG_INOTIFY */
+
+static inline char *fsnotify_oldname_init(struct dentry *old_dentry)
+{
+ return NULL;
+}
+
+static inline void fsnotify_oldname_free(const char *old_name)
+{
+}
+
+#endif /* ! CONFIG_INOTIFY */
+
+#endif /* __KERNEL__ */
+
+#endif /* _LINUX_FS_NOTIFY_H */
diff -urN linux-2.6.11/include/linux/inotify.h linux/include/linux/inotify.h
--- linux-2.6.11/include/linux/inotify.h 1969-12-31 19:00:00.000000000 -0500
+++ linux/include/linux/inotify.h 2005-03-07 16:10:14.174637832 -0500
@@ -0,0 +1,113 @@
+/*
+ * Inode based directory notification for Linux
+ *
+ * Copyright (C) 2005 John McCutchan
+ */
+
+#ifndef _LINUX_INOTIFY_H
+#define _LINUX_INOTIFY_H
+
+#include <linux/types.h>
+#include <linux/limits.h>
+
+/*
+ * struct inotify_event - structure read from the inotify device for each event
+ *
+ * When you are watching a directory, you will receive the filename for events
+ * such as IN_CREATE_FILE, IN_DELETE_FILE, IN_OPEN, IN_CLOSE, ..., relative to the wd.
+ */
+struct inotify_event {
+ __s32 wd; /* watch descriptor */
+ __u32 mask; /* watch mask */
+ __u32 cookie; /* cookie to synchronize two events */
+ __u32 len; /* length (including nulls) of name */
+ char name[0]; /* stub for possible name */
+};
+
+/*
+ * struct inotify_watch_request - represents a watch request
+ *
+ * Pass to the inotify device via the INOTIFY_WATCH ioctl
+ */
+struct inotify_watch_request {
+ char *name; /* filename */
+ __u32 mask; /* event mask */
+};
+
+/* the following are legal, implemented events */
+#define IN_ACCESS 0x00000001 /* File was accessed */
+#define IN_MODIFY 0x00000002 /* File was modified */
+#define IN_ATTRIB 0x00000004 /* File changed attributes */
+#define IN_CLOSE_WRITE 0x00000008 /* Writable file was closed */
+#define IN_CLOSE_NOWRITE 0x00000010 /* Unwritable file closed */
+#define IN_OPEN 0x00000020 /* File was opened */
+#define IN_MOVED_FROM 0x00000040 /* File was moved from X */
+#define IN_MOVED_TO 0x00000080 /* File was moved to Y */
+#define IN_DELETE_SUBDIR 0x00000100 /* Subdir was deleted */
+#define IN_DELETE_FILE 0x00000200 /* Subfile was deleted */
+#define IN_CREATE_SUBDIR 0x00000400 /* Subdir was created */
+#define IN_CREATE_FILE 0x00000800 /* Subfile was created */
+#define IN_DELETE_SELF 0x00001000 /* Self was deleted */
+#define IN_UNMOUNT 0x00002000 /* Backing fs was unmounted */
+#define IN_Q_OVERFLOW 0x00004000 /* Event queue overflowed */
+#define IN_IGNORED 0x00008000 /* File was ignored */
+
+/* special flags */
+#define IN_ALL_EVENTS 0xffffffff /* All the events */
+#define IN_CLOSE (IN_CLOSE_WRITE | IN_CLOSE_NOWRITE)
+
+#define INOTIFY_IOCTL_MAGIC 'Q'
+#define INOTIFY_IOCTL_MAXNR 2
+
+#define INOTIFY_WATCH _IOR(INOTIFY_IOCTL_MAGIC, 1, struct inotify_watch_request)
+#define INOTIFY_IGNORE _IOR(INOTIFY_IOCTL_MAGIC, 2, int)
+
+#ifdef __KERNEL__
+
+#include <linux/dcache.h>
+#include <linux/fs.h>
+#include <linux/config.h>
+#include <asm/atomic.h>
+
+#ifdef CONFIG_INOTIFY
+
+extern void inotify_inode_queue_event(struct inode *, __u32, __u32,
+ const char *);
+extern void inotify_dentry_parent_queue_event(struct dentry *, __u32, __u32,
+ const char *);
+extern void inotify_super_block_umount(struct super_block *);
+extern void inotify_inode_is_dead(struct inode *);
+extern u32 inotify_get_cookie(void);
+
+#else
+
+static inline void inotify_inode_queue_event(struct inode *inode,
+ __u32 mask, __u32 cookie,
+ const char *filename)
+{
+}
+
+static inline void inotify_dentry_parent_queue_event(struct dentry *dentry,
+ __u32 mask, __u32 cookie,
+ const char *filename)
+{
+}
+
+static inline void inotify_super_block_umount(struct super_block *sb)
+{
+}
+
+static inline void inotify_inode_is_dead(struct inode *inode)
+{
+}
+
+static inline u32 inotify_get_cookie(void)
+{
+ return 0;
+}
+
+#endif /* CONFIG_INOTIFY */
+
+#endif /* __KERNEL__ */
+
+#endif /* _LINUX_INOTIFY_H */
diff -urN linux-2.6.11/include/linux/sched.h linux/include/linux/sched.h
--- linux-2.6.11/include/linux/sched.h 2005-03-02 02:37:48.000000000 -0500
+++ linux/include/linux/sched.h 2005-03-07 16:10:14.175637680 -0500
@@ -368,6 +368,10 @@
atomic_t processes; /* How many processes does this user have? */
atomic_t files; /* How many open files does this user have? */
atomic_t sigpending; /* How many pending signals does this user have? */
+#ifdef CONFIG_INOTIFY
+ atomic_t inotify_watches; /* How many inotify watches does this user have? */
+ atomic_t inotify_devs; /* How many inotify devs does this user have opened? */
+#endif
/* protected by mq_lock */
unsigned long mq_bytes; /* How many bytes can be allocated to mqueue? */
unsigned long locked_shm; /* How many pages of mlocked shm ? */
diff -urN linux-2.6.11/kernel/user.c linux/kernel/user.c
--- linux-2.6.11/kernel/user.c 2005-03-02 02:37:58.000000000 -0500
+++ linux/kernel/user.c 2005-03-07 16:10:14.176637528 -0500
@@ -119,6 +119,10 @@
atomic_set(&new->processes, 0);
atomic_set(&new->files, 0);
atomic_set(&new->sigpending, 0);
+#ifdef CONFIG_INOTIFY
+ atomic_set(&new->inotify_watches, 0);
+ atomic_set(&new->inotify_devs, 0);
+#endif

new->mq_bytes = 0;
new->locked_shm = 0;

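fsnotify_move() above queues IN_MOVED_FROM and IN_MOVED_TO with the same
cookie, so a reader can tie the two halves of a rename together. What
follows is only a user-space sketch of that matching, not code from the
patch: struct move_ev, find_move_target() and demo_pair() are hypothetical
names, and the events are assumed to have already been copied out of the
read buffer into an array. The two mask values are the ones defined in
include/linux/inotify.h above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IN_MOVED_FROM 0x00000040 /* values as defined in the patch */
#define IN_MOVED_TO   0x00000080

/* Hypothetical, already-decoded view of one event. */
struct move_ev {
	uint32_t mask;
	uint32_t cookie;
	const char *name;
};

/*
 * Return the index of the IN_MOVED_TO event whose cookie matches the
 * IN_MOVED_FROM event at evs[from], or -1 if there is none in the array.
 */
static int find_move_target(const struct move_ev *evs, size_t n, size_t from)
{
	size_t i;

	if (from >= n || !(evs[from].mask & IN_MOVED_FROM))
		return -1;

	for (i = from + 1; i < n; i++) {
		if ((evs[i].mask & IN_MOVED_TO) &&
		    evs[i].cookie == evs[from].cookie)
			return (int)i;
	}
	return -1;
}

/* Tiny demonstration: a rename with an unrelated event in between. */
static int demo_pair(void)
{
	const struct move_ev evs[] = {
		{ IN_MOVED_FROM, 7, "old.txt" },
		{ 0x00000002,    0, "other"   }, /* IN_MODIFY, unrelated */
		{ IN_MOVED_TO,   7, "new.txt" },
	};

	return find_move_target(evs, sizeof(evs) / sizeof(evs[0]), 0);
}
```

A result of -1 is not necessarily an error: if only one of the two
directories is being watched, only one half of the pair is ever queued.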

2005-03-07 22:05:00

by Robert Love

Subject: [patch] inotify for 2.6.11-mm1, updated

On Mon, 2005-03-07 at 01:19 +0000, Christoph Hellwig wrote:

Hi, hch.

I went ahead and implemented all of your suggestions, save for the ones
below, where I have comments or disagree. Most of your comments were
straightforward and I made the changes as you suggested. See the
following patch, against 2.6.11-mm1.

We will try out a write()-based interface. John is working on that.
I'd like Andrew and others to chime in on whether they really prefer
that to an ioctl.

> > might_sleep();
>
> this one seems totally unrelated.

Eh? We did not add that. ;)

> > + /* XXX: optimally, we should use GFP_KERNEL */
> > + kevent = kmem_cache_alloc(event_cachep, GFP_ATOMIC);
>
> indeed. having a new atomic memory allocation in every filesystem operation
> sounds like a really bad idea.

Obviously we know that--the XXX comment is there to signify as much.
Anyhow, the allocation is not on every operation, just every event.

> > +static struct miscdevice inotify_device = {
> > + .minor = MISC_DYNAMIC_MINOR,
> > + .name = "inotify",
> > + .fops = &inotify_fops,
> > +};
>
> Should probably use the /dev/mem major.

Hrm, should we?

Also, the memory class stuff is all local to mem.c. For example, I
cannot get at /sys/class/mem. The misc. device stuff is exported.

> > + default y
>
> please don't default a new and experimental facility to y. In fact
> default is totally overused.

I'd agree when we go to mainline, but for 2.6-mm more testing is
welcome. Besides, they don't have to use inotify. This just gets the
hooks compiled in.

I will definitely remove 'default' altogether before we go to mainline.

> > +#ifdef CONFIG_INOTIFY
> > + struct list_head inotify_watches; /* watches on this inode */
> > + spinlock_t inotify_lock; /* protects the watches list */
> > +#endif
>
> do you really need a spinlock of your own in every inode? Inode memory
> usage is a quite big problem.

Yah, we do. For a couple of reasons. First, by introducing our own
lock, we never need touch i_lock, and avoid that scalability mess
altogether. Second, and most importantly, i_lock is an outermost lock.
We need our lock to be nestable, because we walk inode -> inotify_watch
-> inotify_device. I've tried various rewrites to not need our own
lock. None are pretty.
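
A minimal single-threaded model of the nesting described above, for
illustration only. Nothing here is taken from fs/inotify.c: struct
model_lock merely records whether a lock is held (it stands in for
spinlock_t), every model_* name is hypothetical, and the mask test is an
assumption about how a watch is matched to an event. The point is just
the order: the inode's inotify lock is taken first, and the device lock
is taken inside it while walking inode -> watch -> device.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for spinlock_t: only tracks "held or not". */
struct model_lock { int held; };

static void lock_acquire(struct model_lock *l) { assert(!l->held); l->held = 1; }
static void lock_release(struct model_lock *l) { assert(l->held);  l->held = 0; }

struct model_dev {
	struct model_lock lock;       /* plays the role of inotify_device.lock */
	unsigned int event_count;
};

struct model_watch {
	struct model_dev *dev;        /* associated device */
	struct model_watch *next;     /* next watch on the same inode */
	unsigned int mask;            /* events this watch cares about */
};

struct model_inode {
	struct model_lock inotify_lock; /* plays the role of inode->inotify_lock */
	struct model_watch *watches;
};

/*
 * Walk inode -> watch -> device.  The device lock nests inside the
 * inode's inotify lock.  Returns the number of events queued.
 */
static unsigned int model_queue_event(struct model_inode *inode,
				      unsigned int mask)
{
	struct model_watch *w;
	unsigned int queued = 0;

	lock_acquire(&inode->inotify_lock);
	for (w = inode->watches; w; w = w->next) {
		if (!(w->mask & mask))
			continue;
		assert(inode->inotify_lock.held); /* outer lock still held */
		lock_acquire(&w->dev->lock);      /* inner, nested lock */
		w->dev->event_count++;
		queued++;
		lock_release(&w->dev->lock);
	}
	lock_release(&inode->inotify_lock);

	return queued;
}

/* Two watches on one inode, only one of which matches the event. */
static unsigned int model_demo(void)
{
	struct model_dev dev_a = { {0}, 0 }, dev_b = { {0}, 0 };
	struct model_watch w2 = { &dev_b, NULL, 0x1 };
	struct model_watch w1 = { &dev_a, &w2, 0x2 };
	struct model_inode inode = { {0}, &w1 };

	return model_queue_event(&inode, 0x2);
}
```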

I can offer to the "inode memory worries me" people that they can always
disable CONFIG_INOTIFY.

> > +/*
> > + * fsnotify_change - notify_change event. file was modified and/or metadata
> > + * was changed.
> > + */
> > +static inline void fsnotify_change(struct dentry *dentry, unsigned int ia_valid)
>
> this one is far too large to be inlined.

I'd agree, but it is only called from one place. And this way everything
stays in fsnotify.h.

Best,

Robert Love


inotify!

inotify is intended to correct the deficiencies of dnotify, particularly
its inability to scale and its terrible user interface:

* dnotify requires the opening of one fd per each directory
that you intend to watch. This quickly results in too many
open files and pins removable media, preventing unmount.
* dnotify is directory-based. You only learn about changes to
directories. Sure, a change to a file in a directory affects
the directory, but you are then forced to keep a cache of
stat structures.
* dnotify's interface to user-space is awful. Signals?

inotify provides a more usable, simple, powerful solution to file change
notification:

* inotify's interface is a device node, not SIGIO. You open a
single fd to the device node, which is select()-able.
* inotify has an event that says "the filesystem that the item
you were watching is on was unmounted."
* inotify can watch directories or files.

Inotify is currently used by Beagle (a desktop search infrastructure)
and Gamin (a FAM replacement).
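
As a rough illustration of what a consumer of the device sees, here is a
user-space sketch that walks a buffer of event records. It is a sketch
under assumptions, not the patch's code: struct ev_hdr is a local mirror
of struct inotify_event from include/linux/inotify.h below, the buffer is
built in memory rather than read from the device, and padded_name_len(),
put_event(), walk_events() and demo_walk() are hypothetical helpers. The
padding follows the rule stated in kernel_event(): the name length,
terminator included, is rounded up to a multiple of the structure's size.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Local mirror of struct inotify_event, minus the trailing name[0]. */
struct ev_hdr {
	int32_t  wd;      /* watch descriptor */
	uint32_t mask;    /* event mask */
	uint32_t cookie;  /* ties two related events together */
	uint32_t len;     /* bytes of name that follow, padding included */
};

/* Round strlen(name) + 1 up to a multiple of the header size. */
static uint32_t padded_name_len(const char *name)
{
	size_t unit = sizeof(struct ev_hdr);
	size_t len = strlen(name) + 1;

	return (uint32_t)(((len + unit - 1) / unit) * unit);
}

/* Append one record to buf at offset off; return the new offset. */
static size_t put_event(char *buf, size_t off, int32_t wd, uint32_t mask,
			const char *name)
{
	struct ev_hdr h = { wd, mask, 0, name ? padded_name_len(name) : 0 };

	memcpy(buf + off, &h, sizeof(h));
	off += sizeof(h);
	if (name) {
		memset(buf + off, 0, h.len);            /* zero the padding */
		memcpy(buf + off, name, strlen(name));
		off += h.len;
	}
	return off;
}

/* Walk n bytes of records, printing each; return how many were seen. */
static int walk_events(const char *buf, size_t n)
{
	size_t off = 0;
	int count = 0;

	while (off + sizeof(struct ev_hdr) <= n) {
		struct ev_hdr h;

		memcpy(&h, buf + off, sizeof(h));
		off += sizeof(h);
		if (h.len > n - off)
			break;                          /* truncated record */
		printf("wd=%d mask=0x%08x name=%s\n", (int)h.wd,
		       (unsigned int)h.mask, h.len ? buf + off : "(none)");
		off += h.len;
		count++;
	}
	return count;
}

/* Build two records in memory and walk them. */
static int demo_walk(void)
{
	char buf[256];
	size_t n = 0;

	n = put_event(buf, n, 1, 0x00000800, "notes.txt"); /* IN_CREATE_FILE */
	n = put_event(buf, n, 2, 0x00000002, NULL);        /* IN_MODIFY */
	return walk_events(buf, n);
}
```

With a real device, the buffer would instead be filled by read() on the
fd, and the fd could first be passed to select() to wait for events.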

Signed-off-by: Robert Love <[email protected]>

fs/Kconfig | 13
fs/Makefile | 1
fs/attr.c | 33 -
fs/compat.c | 14
fs/file_table.c | 3
fs/inode.c | 4
fs/inotify.c | 1014 +++++++++++++++++++++++++++++++++++++++++++++++
fs/namei.c | 30 -
fs/open.c | 6
fs/read_write.c | 15
fs/super.c | 2
include/linux/fs.h | 8
include/linux/fsnotify.h | 236 ++++++++++
include/linux/inotify.h | 113 +++++
include/linux/sched.h | 4
kernel/user.c | 4
16 files changed, 1444 insertions(+), 56 deletions(-)

diff -urN linux-2.6.11-mm1/fs/attr.c linux/fs/attr.c
--- linux-2.6.11-mm1/fs/attr.c 2005-03-04 14:06:21.000000000 -0500
+++ linux/fs/attr.c 2005-03-07 16:20:02.213242376 -0500
@@ -10,7 +10,7 @@
#include <linux/mm.h>
#include <linux/string.h>
#include <linux/smp_lock.h>
-#include <linux/dnotify.h>
+#include <linux/fsnotify.h>
#include <linux/fcntl.h>
#include <linux/quotaops.h>
#include <linux/security.h>
@@ -107,31 +107,8 @@
out:
return error;
}
-
EXPORT_SYMBOL(inode_setattr);

-int setattr_mask(unsigned int ia_valid)
-{
- unsigned long dn_mask = 0;
-
- if (ia_valid & ATTR_UID)
- dn_mask |= DN_ATTRIB;
- if (ia_valid & ATTR_GID)
- dn_mask |= DN_ATTRIB;
- if (ia_valid & ATTR_SIZE)
- dn_mask |= DN_MODIFY;
- /* both times implies a utime(s) call */
- if ((ia_valid & (ATTR_ATIME|ATTR_MTIME)) == (ATTR_ATIME|ATTR_MTIME))
- dn_mask |= DN_ATTRIB;
- else if (ia_valid & ATTR_ATIME)
- dn_mask |= DN_ACCESS;
- else if (ia_valid & ATTR_MTIME)
- dn_mask |= DN_MODIFY;
- if (ia_valid & ATTR_MODE)
- dn_mask |= DN_ATTRIB;
- return dn_mask;
-}
-
int notify_change(struct dentry * dentry, struct iattr * attr)
{
struct inode *inode = dentry->d_inode;
@@ -194,11 +171,9 @@
if (ia_valid & ATTR_SIZE)
up_write(&dentry->d_inode->i_alloc_sem);

- if (!error) {
- unsigned long dn_mask = setattr_mask(ia_valid);
- if (dn_mask)
- dnotify_parent(dentry, dn_mask);
- }
+ if (!error)
+ fsnotify_change(dentry, ia_valid);
+
return error;
}

diff -urN linux-2.6.11-mm1/fs/compat.c linux/fs/compat.c
--- linux-2.6.11-mm1/fs/compat.c 2005-03-04 14:06:21.000000000 -0500
+++ linux/fs/compat.c 2005-03-07 16:20:02.216241920 -0500
@@ -36,7 +36,7 @@
#include <linux/ctype.h>
#include <linux/module.h>
#include <linux/dirent.h>
-#include <linux/dnotify.h>
+#include <linux/fsnotify.h>
#include <linux/highuid.h>
#include <linux/sunrpc/svc.h>
#include <linux/nfsd/nfsd.h>
@@ -1233,9 +1233,15 @@
out:
if (iov != iovstack)
kfree(iov);
- if ((ret + (type == READ)) > 0)
- dnotify_parent(file->f_dentry,
- (type == READ) ? DN_ACCESS : DN_MODIFY);
+ if ((ret + (type == READ)) > 0) {
+ struct dentry *dentry = file->f_dentry;
+ if (type == READ)
+ fsnotify_access(dentry, dentry->d_inode,
+ dentry->d_name.name);
+ else
+ fsnotify_modify(dentry, dentry->d_inode,
+ dentry->d_name.name);
+ }
return ret;
}

diff -urN linux-2.6.11-mm1/fs/file_table.c linux/fs/file_table.c
--- linux-2.6.11-mm1/fs/file_table.c 2005-03-04 14:06:21.000000000 -0500
+++ linux/fs/file_table.c 2005-03-07 16:20:02.217241768 -0500
@@ -16,6 +16,7 @@
#include <linux/eventpoll.h>
#include <linux/mount.h>
#include <linux/cdev.h>
+#include <linux/fsnotify.h>

/* sysctl tunables... */
struct files_stat_struct files_stat = {
@@ -123,6 +124,8 @@
struct inode *inode = dentry->d_inode;

might_sleep();
+
+ fsnotify_close(file);
/*
* The function eventpoll_release() should be the first called
* in the file cleanup chain.
diff -urN linux-2.6.11-mm1/fs/inode.c linux/fs/inode.c
--- linux-2.6.11-mm1/fs/inode.c 2005-03-04 14:06:21.000000000 -0500
+++ linux/fs/inode.c 2005-03-07 16:20:02.219241464 -0500
@@ -132,6 +132,10 @@
#ifdef CONFIG_QUOTA
memset(&inode->i_dquot, 0, sizeof(inode->i_dquot));
#endif
+#ifdef CONFIG_INOTIFY
+ INIT_LIST_HEAD(&inode->inotify_watches);
+ spin_lock_init(&inode->inotify_lock);
+#endif
inode->i_pipe = NULL;
inode->i_bdev = NULL;
inode->i_cdev = NULL;
diff -urN linux-2.6.11-mm1/fs/inotify.c linux/fs/inotify.c
--- linux-2.6.11-mm1/fs/inotify.c 1969-12-31 19:00:00.000000000 -0500
+++ linux/fs/inotify.c 2005-03-07 16:20:02.222241008 -0500
@@ -0,0 +1,1014 @@
+/*
+ * fs/inotify.c - inode-based file event notifications
+ *
+ * Authors:
+ * John McCutchan <[email protected]>
+ * Robert Love <[email protected]>
+ *
+ * Copyright (C) 2005 John McCutchan
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2, or (at your option) any
+ * later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/spinlock.h>
+#include <linux/idr.h>
+#include <linux/slab.h>
+#include <linux/fs.h>
+#include <linux/namei.h>
+#include <linux/poll.h>
+#include <linux/device.h>
+#include <linux/miscdevice.h>
+#include <linux/init.h>
+#include <linux/list.h>
+#include <linux/writeback.h>
+#include <linux/inotify.h>
+
+#include <asm/ioctls.h>
+
+static atomic_t inotify_cookie;
+
+static kmem_cache_t *watch_cachep;
+static kmem_cache_t *event_cachep;
+
+static int max_user_devices;
+static int max_user_watches;
+static unsigned int max_queued_events;
+
+/*
+ * Lock ordering:
+ *
+ * inode_lock (used to safely walk the super_block->s_inodes list)
+ * dentry->d_lock (used to keep d_move() away from dentry->d_parent)
+ * inode->inotify_lock (protects inode->inotify_watches and watches->i_list)
+ * inotify_dev->lock (protects inotify_device and watches->d_list)
+ */
+
+/*
+ * Lifetimes of the three main data structures -- inotify_device, inode, and
+ * inotify_watch -- are managed by reference count.
+ *
+ * inotify_device: Lifetime is from open until release. Additional references
+ * can bump the count via get_inotify_dev() and drop the count via
+ * put_inotify_dev().
+ *
+ * inotify_watch: Lifetime is from create_watch() to destroy_watch().
+ * Additional references can bump the count via get_inotify_watch() and drop
+ * the count via put_inotify_watch().
+ *
+ * inode: Pinned so long as the inode is associated with a watch, from
+ * create_watch() to put_inotify_watch().
+ */
+
+/*
+ * struct inotify_device - represents an open instance of an inotify device
+ *
+ * This structure is protected by 'lock'.
+ */
+struct inotify_device {
+ wait_queue_head_t wq; /* wait queue for i/o */
+ struct idr idr; /* idr mapping wd -> watch */
+ struct list_head events; /* list of queued events */
+ struct list_head watches; /* list of watches */
+ spinlock_t lock; /* protects this bad boy */
+ atomic_t count; /* reference count */
+ struct user_struct *user; /* user who opened this dev */
+ unsigned int queue_size; /* size of the queue (bytes) */
+ unsigned int event_count; /* number of pending events */
+ unsigned int max_events; /* maximum number of events */
+};
+
+/*
+ * struct inotify_kernel_event - An inotify event, originating from a watch and
+ * queued for user-space. A list of these is attached to each instance of the
+ * device. In read(), this list is walked and all events that can fit in the
+ * buffer are returned.
+ *
+ * Protected by dev->lock of the device in which we are queued.
+ */
+struct inotify_kernel_event {
+ struct inotify_event event; /* the user-space event */
+ struct list_head list; /* entry in inotify_device's list */
+ char *name; /* filename, if any */
+};
+
+/*
+ * struct inotify_watch - represents a watch request on a specific inode
+ *
+ * d_list is protected by dev->lock of the associated dev->watches.
+ * i_list and mask are protected by inode->inotify_lock of the associated inode.
+ * dev, inode, and wd are never written to once the watch is created.
+ */
+struct inotify_watch {
+ struct list_head d_list; /* entry in inotify_device's list */
+ struct list_head i_list; /* entry in inode's list */
+ atomic_t count; /* reference count */
+ struct inotify_device *dev; /* associated device */
+ struct inode *inode; /* associated inode */
+ s32 wd; /* watch descriptor */
+ u32 mask; /* event mask for this watch */
+};
+
+static ssize_t show_max_queued_events(struct class_device *class, char *buf)
+{
+ return sprintf(buf, "%u\n", max_queued_events);
+}
+
+static ssize_t store_max_queued_events(struct class_device *class,
+ const char *buf, size_t count)
+{
+ unsigned int max;
+
+ if (sscanf(buf, "%u", &max) > 0 && max > 0) {
+ max_queued_events = max;
+ return strlen(buf);
+ }
+ return -EINVAL;
+}
+
+static ssize_t show_max_user_devices(struct class_device *class, char *buf)
+{
+ return sprintf(buf, "%d\n", max_user_devices);
+}
+
+static ssize_t store_max_user_devices(struct class_device *class,
+ const char *buf, size_t count)
+{
+ int max;
+
+ if (sscanf(buf, "%d", &max) > 0 && max > 0) {
+ max_user_devices = max;
+ return strlen(buf);
+ }
+ return -EINVAL;
+}
+
+static ssize_t show_max_user_watches(struct class_device *class, char *buf)
+{
+ return sprintf(buf, "%d\n", max_user_watches);
+}
+
+static ssize_t store_max_user_watches(struct class_device *class,
+ const char *buf, size_t count)
+{
+ int max;
+
+ if (sscanf(buf, "%d", &max) > 0 && max > 0) {
+ max_user_watches = max;
+ return strlen(buf);
+ }
+ return -EINVAL;
+}
+
+static CLASS_DEVICE_ATTR(max_queued_events, S_IRUGO | S_IWUSR,
+ show_max_queued_events, store_max_queued_events);
+static CLASS_DEVICE_ATTR(max_user_devices, S_IRUGO | S_IWUSR,
+ show_max_user_devices, store_max_user_devices);
+static CLASS_DEVICE_ATTR(max_user_watches, S_IRUGO | S_IWUSR,
+ show_max_user_watches, store_max_user_watches);
+
+static inline void get_inotify_dev(struct inotify_device *dev)
+{
+ atomic_inc(&dev->count);
+}
+
+static inline void put_inotify_dev(struct inotify_device *dev)
+{
+ if (atomic_dec_and_test(&dev->count)) {
+ atomic_dec(&dev->user->inotify_devs);
+ free_uid(dev->user);
+ kfree(dev);
+ }
+}
+
+static inline void get_inotify_watch(struct inotify_watch *watch)
+{
+ atomic_inc(&watch->count);
+}
+
+static inline void put_inotify_watch(struct inotify_watch *watch)
+{
+ if (atomic_dec_and_test(&watch->count)) {
+ put_inotify_dev(watch->dev);
+ iput(watch->inode);
+ kmem_cache_free(watch_cachep, watch);
+ }
+}
+
+/*
+ * kernel_event - create a new kernel event with the given parameters
+ */
+static struct inotify_kernel_event * kernel_event(s32 wd, u32 mask, u32 cookie,
+ const char *name)
+{
+ struct inotify_kernel_event *kevent;
+
+ /* XXX: optimally, we should use GFP_KERNEL */
+ kevent = kmem_cache_alloc(event_cachep, GFP_ATOMIC);
+ if (unlikely(!kevent))
+ return NULL;
+
+ /* we hand this out to user-space, so zero it just in case */
+ memset(&kevent->event, 0, sizeof(struct inotify_event));
+
+ kevent->event.wd = wd;
+ kevent->event.mask = mask;
+ kevent->event.cookie = cookie;
+
+ INIT_LIST_HEAD(&kevent->list);
+
+ if (name) {
+ size_t len, rem, event_size = sizeof(struct inotify_event);
+
+ /*
+ * We need to pad the filename so as to properly align an
+ * array of inotify_event structures. Because the structure is
+ * small and the common case is a small filename, we just round
+ * up to the next multiple of the structure's sizeof. This is
+ * simple and safe for all architectures.
+ */
+ len = strlen(name) + 1;
+ rem = event_size - len;
+ if (len > event_size) {
+ rem = event_size - (len % event_size);
+ if (len % event_size == 0)
+ rem = 0;
+ }
+ len += rem;
+
+ /* XXX: optimally, we should use GFP_KERNEL */
+ kevent->name = kmalloc(len, GFP_ATOMIC);
+ if (unlikely(!kevent->name)) {
+ kmem_cache_free(event_cachep, kevent);
+ return NULL;
+ }
+ memset(kevent->name, 0, len);
+ strncpy(kevent->name, name, strlen(name));
+ kevent->event.len = len;
+ } else {
+ kevent->event.len = 0;
+ kevent->name = NULL;
+ }
+
+ return kevent;
+}
+
+/*
+ * inotify_dev_get_event - return the next event in the given dev's queue
+ *
+ * Caller must hold dev->lock.
+ */
+static inline struct inotify_kernel_event *
+inotify_dev_get_event(struct inotify_device *dev)
+{
+ return list_entry(dev->events.next, struct inotify_kernel_event, list);
+}
+
+/*
+ * inotify_dev_queue_event - add a new event to the given device
+ *
+ * Caller must hold dev->lock.
+ */
+static void inotify_dev_queue_event(struct inotify_device *dev,
+ struct inotify_watch *watch, u32 mask,
+ u32 cookie, const char *name)
+{
+ struct inotify_kernel_event *kevent, *last;
+
+ /* coalescing: drop this event if it is a dupe of the previous */
+ last = inotify_dev_get_event(dev);
+ if (dev->event_count && last->event.mask == mask &&
+ last->event.wd == watch->wd) {
+ const char *lastname = last->name;
+
+ if (!name && !lastname)
+ return;
+ if (name && lastname && !strcmp(lastname, name))
+ return;
+ }
+
+ /*
+ * The queue has already overflowed and we have already sent the
+ * Q_OVERFLOW event.
+ */
+ if (unlikely(dev->event_count > dev->max_events))
+ return;
+
+ /* if the queue overflows, we need to notify user space */
+ if (unlikely(dev->event_count == dev->max_events))
+ kevent = kernel_event(-1, IN_Q_OVERFLOW, cookie, NULL);
+ else
+ kevent = kernel_event(watch->wd, mask, cookie, name);
+
+ if (unlikely(!kevent))
+ return;
+
+ /* queue the event and wake up anyone waiting */
+ dev->event_count++;
+ dev->queue_size += sizeof(struct inotify_event) + kevent->event.len;
+ list_add_tail(&kevent->list, &dev->events);
+ wake_up_interruptible(&dev->wq);
+}
+
+/*
+ * remove_kevent - cleans up and ultimately frees the given kevent
+ */
+static void remove_kevent(struct inotify_device *dev,
+ struct inotify_kernel_event *kevent)
+{
+ BUG_ON(!dev);
+ BUG_ON(!kevent);
+
+ list_del(&kevent->list);
+
+ dev->event_count--;
+ dev->queue_size -= sizeof(struct inotify_event) + kevent->event.len;
+
+ kfree(kevent->name);
+ kmem_cache_free(event_cachep, kevent);
+}
+
+/*
+ * inotify_dev_event_dequeue - destroy an event on the given device
+ *
+ * Caller must hold dev->lock.
+ */
+static void inotify_dev_event_dequeue(struct inotify_device *dev)
+{
+ if (!list_empty(&dev->events)) {
+ struct inotify_kernel_event *kevent;
+ kevent = inotify_dev_get_event(dev);
+ remove_kevent(dev, kevent);
+ }
+}
+
+/*
+ * inotify_dev_get_wd - returns the next WD for use by the given dev
+ *
+ * Grabs dev->lock. This function can sleep.
+ */
+static int inotify_dev_get_wd(struct inotify_device *dev,
+ struct inotify_watch *watch)
+{
+ int ret;
+
+ do {
+ if (unlikely(!idr_pre_get(&dev->idr, GFP_KERNEL)))
+ return -ENOSPC;
+ spin_lock(&dev->lock);
+ ret = idr_get_new(&dev->idr, watch, &watch->wd);
+ spin_unlock(&dev->lock);
+ } while (ret == -EAGAIN);
+
+ return ret;
+}
+
+/*
+ * create_watch - creates a watch on the given device.
+ *
+ * Calls inotify_dev_get_wd(), so it both grabs dev->lock and may sleep.
+ * Both 'dev' and 'inode' (by way of nameidata) need to be pinned.
+ */
+static struct inotify_watch *create_watch(struct inotify_device *dev,
+ u32 mask, struct inode *inode)
+{
+ struct inotify_watch *watch;
+
+ if (atomic_read(&dev->user->inotify_watches) >= max_user_watches)
+ return NULL;
+
+ watch = kmem_cache_alloc(watch_cachep, GFP_KERNEL);
+ if (unlikely(!watch))
+ return NULL;
+
+ if (unlikely(inotify_dev_get_wd(dev, watch))) {
+ kmem_cache_free(watch_cachep, watch);
+ return NULL;
+ }
+
+ watch->mask = mask;
+ atomic_set(&watch->count, 0);
+ INIT_LIST_HEAD(&watch->d_list);
+ INIT_LIST_HEAD(&watch->i_list);
+
+ /* save a reference to device and bump the count to make it official */
+ get_inotify_dev(dev);
+ watch->dev = dev;
+
+ /*
+ * Save a reference to the inode and bump the ref count to make it
+ * official. We hold a reference to nameidata, which makes this safe.
+ */
+ watch->inode = igrab(inode);
+
+ /* bump our own count, corresponding to our entry in dev->watches */
+ get_inotify_watch(watch);
+
+ atomic_inc(&dev->user->inotify_watches);
+
+ return watch;
+}
+
+/*
+ * inode_find_dev - find the watch associated with the given inode and dev
+ *
+ * Callers must hold inode->inotify_lock.
+ */
+static struct inotify_watch *inode_find_dev(struct inode *inode,
+ struct inotify_device *dev)
+{
+ struct inotify_watch *watch;
+
+ list_for_each_entry(watch, &inode->inotify_watches, i_list) {
+ if (watch->dev == dev)
+ return watch;
+ }
+
+ return NULL;
+}
+
+/*
+ * inotify_dev_is_watching_inode - is this device watching this inode?
+ *
+ * Requires 'dev' and 'inode' to both be pinned and dev->lock be held.
+ */
+static inline int inotify_dev_is_watching_inode(struct inotify_device *dev,
+ struct inode *inode)
+{
+ struct inotify_watch *watch;
+
+ list_for_each_entry(watch, &dev->watches, d_list) {
+ if (watch->inode == inode)
+ return 1;
+ }
+
+ return 0;
+}
+
+/*
+ * remove_watch_no_event - remove_watch() without the IN_IGNORED event.
+ */
+static void remove_watch_no_event(struct inotify_watch *watch,
+ struct inotify_device *dev)
+{
+ BUG_ON(!dev);
+ BUG_ON(!watch);
+
+ list_del(&watch->i_list);
+ list_del(&watch->d_list);
+
+ atomic_dec(&dev->user->inotify_watches);
+ idr_remove(&dev->idr, watch->wd);
+ put_inotify_watch(watch);
+}
+
+/*
+ * remove_watch - Remove a watch from both the device and the inode. Sends
+ * the IN_IGNORED event to the given device signifying that the inode is no
+ * longer watched.
+ *
+ * Callers must hold both inode->inotify_lock and dev->lock. We drop a
+ * reference to the inode before returning.
+ */
+static void remove_watch(struct inotify_watch *watch,
+ struct inotify_device *dev)
+{
+ inotify_dev_queue_event(dev, watch, IN_IGNORED, 0, NULL);
+ remove_watch_no_event(watch, dev);
+}
+
+/*
+ * __inode_queue_event - internal helper for inotify_inode_queue_event()
+ *
+ * Caller must hold inode->inotify_lock.
+ */
+void __inode_queue_event(struct inode *inode, u32 mask, u32 cookie,
+ const char *name)
+{
+ struct inotify_watch *watch;
+
+ list_for_each_entry(watch, &inode->inotify_watches, i_list) {
+ if (watch->mask & mask) {
+ struct inotify_device *dev = watch->dev;
+ spin_lock(&dev->lock);
+ inotify_dev_queue_event(dev, watch, mask, cookie, name);
+ spin_unlock(&dev->lock);
+ }
+ }
+}
+
+/* Kernel API */
+
+/**
+ * inotify_inode_queue_event - queue an event to all watches on this inode
+ * @inode: inode event is originating from
+ * @mask: event mask describing this event
+ * @cookie: cookie for synchronization, or zero
+ * @name: filename, if any
+ */
+void inotify_inode_queue_event(struct inode *inode, u32 mask, u32 cookie,
+ const char *name)
+{
+ spin_lock(&inode->inotify_lock);
+ __inode_queue_event(inode, mask, cookie, name);
+ spin_unlock(&inode->inotify_lock);
+}
+EXPORT_SYMBOL_GPL(inotify_inode_queue_event);
+
+/**
+ * inotify_dentry_parent_queue_event - queue an event to a dentry's parent
+ * @dentry: the dentry in question, we queue against this dentry's parent
+ * @mask: event mask describing this event
+ * @cookie: cookie for synchronization, or zero
+ * @name: filename, if any
+ */
+void inotify_dentry_parent_queue_event(struct dentry *dentry, u32 mask,
+ u32 cookie, const char *name)
+{
+ struct dentry *parent;
+ struct inode *inode;
+
+ spin_lock(&dentry->d_lock);
+ parent = dentry->d_parent;
+ inode = parent->d_inode;
+ spin_lock(&inode->inotify_lock);
+ if (!list_empty(&inode->inotify_watches)) {
+ dget(parent);
+ spin_unlock(&dentry->d_lock);
+ __inode_queue_event(inode, mask, cookie, name);
+ spin_unlock(&inode->inotify_lock);
+ dput(parent);
+ } else {
+ spin_unlock(&inode->inotify_lock);
+ spin_unlock(&dentry->d_lock);
+ }
+}
+EXPORT_SYMBOL_GPL(inotify_dentry_parent_queue_event);
+
+/**
+ * inotify_get_cookie - return a unique cookie for use in synchronizing events
+ *
+ * Returns the unique cookie.
+ */
+u32 inotify_get_cookie(void)
+{
+ return atomic_inc_return(&inotify_cookie);
+}
+EXPORT_SYMBOL_GPL(inotify_get_cookie);
+
+/**
+ * inotify_super_block_umount - process watches on an unmounted fs
+ * @sb: the super_block of the filesystem in question
+ */
+void inotify_super_block_umount(struct super_block *sb)
+{
+ struct inode *inode;
+
+ /* walk the list of inodes on this superblock */
+ spin_lock(&inode_lock);
+ list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+ struct inotify_watch *watch, *next;
+ struct list_head *watches;
+
+ /* for each watch, send IN_UNMOUNT and then remove it */
+ spin_lock(&inode->inotify_lock);
+ watches = &inode->inotify_watches;
+ list_for_each_entry_safe(watch, next, watches, i_list) {
+ struct inotify_device *dev = watch->dev;
+ spin_lock(&dev->lock);
+ inotify_dev_queue_event(dev, watch, IN_UNMOUNT,0,NULL);
+ remove_watch(watch, dev);
+ spin_unlock(&dev->lock);
+ }
+ spin_unlock(&inode->inotify_lock);
+ }
+ spin_unlock(&inode_lock);
+}
+EXPORT_SYMBOL_GPL(inotify_super_block_umount);
+
+/**
+ * inotify_inode_is_dead - an inode has been deleted, cleanup any watches
+ * @inode: inode that is about to be removed
+ */
+void inotify_inode_is_dead(struct inode *inode)
+{
+ struct inotify_watch *watch, *next;
+
+ spin_lock(&inode->inotify_lock);
+ list_for_each_entry_safe(watch, next, &inode->inotify_watches, i_list) {
+ struct inotify_device *dev = watch->dev;
+ spin_lock(&dev->lock);
+ remove_watch(watch, dev);
+ spin_unlock(&dev->lock);
+ }
+ spin_unlock(&inode->inotify_lock);
+}
+EXPORT_SYMBOL_GPL(inotify_inode_is_dead);
+
+/* Device Interface */
+
+static unsigned int inotify_poll(struct file *file, poll_table *wait)
+{
+ struct inotify_device *dev;
+ int ret = 0;
+
+ dev = file->private_data;
+ get_inotify_dev(dev);
+
+ poll_wait(file, &dev->wq, wait);
+ spin_lock(&dev->lock);
+ if (!list_empty(&dev->events))
+ ret = POLLIN | POLLRDNORM;
+ spin_unlock(&dev->lock);
+
+ put_inotify_dev(dev);
+ return ret;
+}
+
+static ssize_t inotify_read(struct file *file, char __user *buf,
+ size_t count, loff_t *pos)
+{
+ size_t event_size;
+ struct inotify_device *dev;
+ char __user *start;
+ int ret;
+ DEFINE_WAIT(wait);
+
+ start = buf;
+ dev = file->private_data;
+
+ /* we only hand out full inotify events */
+ event_size = sizeof(struct inotify_event);
+ if (count < event_size)
+ return 0;
+
+ while (1) {
+ int events;
+
+ prepare_to_wait(&dev->wq, &wait, TASK_INTERRUPTIBLE);
+
+ spin_lock(&dev->lock);
+ events = !list_empty(&dev->events);
+ spin_unlock(&dev->lock);
+ if (events) {
+ ret = 0;
+ break;
+ }
+
+ if (file->f_flags & O_NONBLOCK) {
+ ret = -EAGAIN;
+ break;
+ }
+
+ if (signal_pending(current)) {
+ ret = -EINTR;
+ break;
+ }
+
+ schedule();
+ }
+
+ finish_wait(&dev->wq, &wait);
+ if (ret)
+ return ret;
+
+ while (1) {
+ struct inotify_kernel_event *kevent;
+
+ spin_lock(&dev->lock);
+ if (list_empty(&dev->events)) {
+ spin_unlock(&dev->lock);
+ break;
+ }
+ kevent = inotify_dev_get_event(dev);
+ if (event_size + kevent->event.len > count) {
+ spin_unlock(&dev->lock);
+ break;
+ }
+ list_del_init(&kevent->list);
+ spin_unlock(&dev->lock);
+
+ if (copy_to_user(buf, &kevent->event, event_size)) {
+ /* put the event back on the queue */
+ spin_lock(&dev->lock);
+ list_add(&kevent->list, &dev->events);
+ spin_unlock(&dev->lock);
+ return -EFAULT;
+ }
+ buf += event_size;
+ count -= event_size;
+
+ if (kevent->name) {
+ if (copy_to_user(buf, kevent->name, kevent->event.len)){
+ /* put the event back on the queue */
+ spin_lock(&dev->lock);
+ list_add(&kevent->list, &dev->events);
+ spin_unlock(&dev->lock);
+ return -EFAULT;
+ }
+ buf += kevent->event.len;
+ count -= kevent->event.len;
+ }
+
+ /*
+ * We made it here, so the event was copied to the user. It is
+ * already removed from the event list, just free it.
+ */
+ spin_lock(&dev->lock);
+ remove_kevent(dev, kevent);
+ spin_unlock(&dev->lock);
+ }
+
+ return buf - start;
+}
+
+static int inotify_open(struct inode *inode, struct file *file)
+{
+ struct inotify_device *dev;
+ struct user_struct *user;
+ int ret;
+
+ user = get_uid(current->user);
+
+ if (unlikely(atomic_read(&user->inotify_devs) >= max_user_devices)) {
+ ret = -EMFILE;
+ goto out_err;
+ }
+
+ dev = kmalloc(sizeof(struct inotify_device), GFP_KERNEL);
+ if (unlikely(!dev)) {
+ ret = -ENOMEM;
+ goto out_err;
+ }
+
+ idr_init(&dev->idr);
+ INIT_LIST_HEAD(&dev->events);
+ INIT_LIST_HEAD(&dev->watches);
+ init_waitqueue_head(&dev->wq);
+ spin_lock_init(&dev->lock);
+
+ dev->event_count = 0;
+ dev->queue_size = 0;
+ dev->max_events = max_queued_events;
+ dev->user = user;
+ atomic_set(&dev->count, 0);
+
+ get_inotify_dev(dev);
+ atomic_inc(&current->user->inotify_devs);
+
+ file->private_data = dev;
+
+ return 0;
+out_err:
+ free_uid(current->user);
+ return ret;
+}
+
+static int inotify_release(struct inode *inode, struct file *file)
+{
+ struct inotify_device *dev;
+
+ dev = file->private_data;
+ BUG_ON(!dev);
+
+ /*
+ * Destroy all of the watches on this device. Unfortunately, not very
+ * pretty. We cannot do a simple iteration over the list, because we
+ * do not know the inode until we iterate to the watch. But we need to
+ * hold inode->inotify_lock before dev->lock. The following works.
+ */
+ while (1) {
+ struct inotify_watch *watch;
+ struct list_head *watches;
+ struct inode *inode;
+
+ spin_lock(&dev->lock);
+ watches = &dev->watches;
+ if (list_empty(watches)) {
+ spin_unlock(&dev->lock);
+ break;
+ }
+ watch = list_entry(watches->next, struct inotify_watch, d_list);
+ get_inotify_watch(watch);
+ spin_unlock(&dev->lock);
+
+ inode = watch->inode;
+ spin_lock(&inode->inotify_lock);
+ spin_lock(&dev->lock);
+ remove_watch_no_event(watch, dev);
+ spin_unlock(&dev->lock);
+ spin_unlock(&inode->inotify_lock);
+ put_inotify_watch(watch);
+ }
+
+ /* destroy all of the events on this device */
+ spin_lock(&dev->lock);
+ while (!list_empty(&dev->events))
+ inotify_dev_event_dequeue(dev);
+ spin_unlock(&dev->lock);
+
+ /* free this device: the put matching the get in inotify_open() */
+ put_inotify_dev(dev);
+
+ return 0;
+}
+
+static int inotify_add_watch(struct inotify_device *dev,
+ struct inotify_watch_request *request)
+{
+ struct inode *inode;
+ struct inotify_watch *watch, *old;
+ struct nameidata nd;
+ int ret;
+
+ ret = __user_walk(request->name, LOOKUP_FOLLOW, &nd);
+ if (unlikely(ret))
+ return ret;
+
+ /* you can only watch an inode if you have read permissions on it */
+ ret = permission(nd.dentry->d_inode, MAY_READ, &nd);
+ if (unlikely(ret))
+ goto nd_out;
+
+ /* inode is held in place by a reference on nd */
+ inode = nd.dentry->d_inode;
+
+ /*
+ * Handle the case of re-adding a watch on an (inode,dev) pair that we
+ * are already watching. We just update the mask and return its wd.
+ */
+ spin_lock(&inode->inotify_lock);
+ spin_lock(&dev->lock);
+ old = inode_find_dev(inode, dev);
+ if (unlikely(old)) {
+ old->mask = request->mask;
+ ret = old->wd;
+ spin_unlock(&dev->lock);
+ spin_unlock(&inode->inotify_lock);
+ goto nd_out;
+ }
+ spin_unlock(&dev->lock);
+ spin_unlock(&inode->inotify_lock);
+
+ /*
+ * We do this lockless, for both scalability and so we can allocate
+ * with GFP_KERNEL. But that means we can race here and add this watch
+ * twice. We fix up that case below, by again checking for the watch
+ * after we reacquire the locks.
+ */
+ watch = create_watch(dev, request->mask, inode);
+ if (unlikely(!watch)) {
+ ret = -ENOSPC;
+ goto nd_out;
+ }
+
+ spin_lock(&inode->inotify_lock);
+ spin_lock(&dev->lock);
+ old = inode_find_dev(inode, dev);
+ if (unlikely(old)) {
+ /* We raced! Destroy this watch and return. */
+ put_inotify_watch(watch);
+ ret = -EBUSY;
+ } else {
+ /* Add the watch to the device's and the inode's list */
+ list_add(&watch->d_list, &dev->watches);
+ list_add(&watch->i_list, &inode->inotify_watches);
+ ret = watch->wd;
+ }
+ spin_unlock(&dev->lock);
+ spin_unlock(&inode->inotify_lock);
+
+nd_out:
+ path_release(&nd);
+
+ return ret;
+}
+
+/*
+ * inotify_ignore - handle the INOTIFY_IGNORE ioctl, asking that a given wd be
+ * removed from the device.
+ */
+static int inotify_ignore(struct inotify_device *dev, s32 wd)
+{
+ struct inotify_watch *watch;
+ struct inode *inode;
+
+ spin_lock(&dev->lock);
+ watch = idr_find(&dev->idr, wd);
+ spin_unlock(&dev->lock);
+
+ if (unlikely(!watch))
+ return -EINVAL;
+
+ inode = watch->inode;
+ spin_lock(&inode->inotify_lock);
+ spin_lock(&dev->lock);
+ remove_watch(watch, dev);
+ spin_unlock(&dev->lock);
+ spin_unlock(&inode->inotify_lock);
+
+ return 0;
+}
+
+static int inotify_ioctl(struct inode *ip, struct file *file, unsigned int cmd,
+ unsigned long arg)
+{
+ struct inotify_device *dev;
+ struct inotify_watch_request request;
+ void __user *p;
+ int ret = -ENOTTY;
+ s32 wd;
+
+ dev = file->private_data;
+ p = (void __user *) arg;
+
+ get_inotify_dev(dev);
+
+ switch (cmd) {
+ case INOTIFY_WATCH:
+ if (unlikely(copy_from_user(&request, p, sizeof (request)))) {
+ ret = -EFAULT;
+ break;
+ }
+ ret = inotify_add_watch(dev, &request);
+ break;
+ case INOTIFY_IGNORE:
+ if (unlikely(copy_from_user(&wd, p, sizeof (wd)))) {
+ ret = -EFAULT;
+ break;
+ }
+ ret = inotify_ignore(dev, wd);
+ break;
+ case FIONREAD:
+ ret = put_user(dev->queue_size, (int __user *) p);
+ break;
+ }
+
+ put_inotify_dev(dev);
+
+ return ret;
+}
+
+static struct file_operations inotify_fops = {
+ .owner = THIS_MODULE,
+ .poll = inotify_poll,
+ .read = inotify_read,
+ .open = inotify_open,
+ .release = inotify_release,
+ .ioctl = inotify_ioctl,
+};
+
+static struct miscdevice inotify_device = {
+ .minor = MISC_DYNAMIC_MINOR,
+ .name = "inotify",
+ .fops = &inotify_fops,
+};
+
+/*
+ * inotify_init - Our initialization function. Note that we cannot return
+ * error because we have compiled-in VFS hooks. So an (unlikely) failure here
+ * must result in panic().
+ */
+static int __init inotify_init(void)
+{
+ struct class_device *class;
+ int ret;
+
+ ret = misc_register(&inotify_device);
+ if (unlikely(ret))
+ panic("inotify: misc_register returned %d\n", ret);
+
+ max_queued_events = 512;
+ max_user_devices = 64;
+ max_user_watches = 16384;
+
+ class = inotify_device.class;
+ class_device_create_file(class, &class_device_attr_max_queued_events);
+ class_device_create_file(class, &class_device_attr_max_user_devices);
+ class_device_create_file(class, &class_device_attr_max_user_watches);
+
+ atomic_set(&inotify_cookie, 0);
+
+ watch_cachep = kmem_cache_create("inotify_watch_cache",
+ sizeof(struct inotify_watch),
+ 0, SLAB_PANIC, NULL, NULL);
+ event_cachep = kmem_cache_create("inotify_event_cache",
+ sizeof(struct inotify_kernel_event),
+ 0, SLAB_PANIC, NULL, NULL);
+
+ printk(KERN_INFO "inotify device minor=%d\n", inotify_device.minor);
+
+ return 0;
+}
+
+module_init(inotify_init);
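
The queueing policy at the top of this file (coalesce a duplicate of the previous event, emit IN_Q_OVERFLOW exactly once, then drop until the reader drains) can be tried in user space. What follows is only a sketch under stated assumptions: `sketch_queue`, `sketch_event`, `sketch_queue_event` and the two demo functions are hypothetical names, the queue stores only its most recent event, and no locking is modelled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define IN_Q_OVERFLOW 0x00004000u	/* value from the patch's inotify.h */

/* Hypothetical stand-ins: only the most recent event is remembered. */
struct sketch_event {
	int32_t wd;
	uint32_t mask;
	int has_name;
	char name[32];
};

struct sketch_queue {
	unsigned int count;	/* events currently queued */
	unsigned int max;	/* plays the role of dev->max_events */
	struct sketch_event last;
};

/* Returns 1 if something was queued, 0 if the event was dropped. */
static int sketch_queue_event(struct sketch_queue *q, int32_t wd,
			      uint32_t mask, const char *name)
{
	assert(q);

	/* coalescing: drop a duplicate of the previous event */
	if (q->count && q->last.mask == mask && q->last.wd == wd) {
		if (!name && !q->last.has_name)
			return 0;
		if (name && q->last.has_name && !strcmp(q->last.name, name))
			return 0;
	}

	/* the overflow marker was already queued: drop everything else */
	if (q->count > q->max)
		return 0;

	/* queue is full: replace this event with the overflow marker */
	if (q->count == q->max) {
		wd = -1;
		mask = IN_Q_OVERFLOW;
		name = NULL;
	}

	q->last.wd = wd;
	q->last.mask = mask;
	q->last.has_name = (name != NULL);
	q->last.name[0] = '\0';
	if (name) {
		strncpy(q->last.name, name, sizeof(q->last.name) - 1);
		q->last.name[sizeof(q->last.name) - 1] = '\0';
	}
	q->count++;
	return 1;
}

/* Ten distinct events into a queue of four: four events plus one marker. */
static unsigned int demo_overflow(void)
{
	struct sketch_queue q = { .count = 0, .max = 4 };
	int i;

	for (i = 0; i < 10; i++)
		sketch_queue_event(&q, i, 0x2, NULL);
	return q.count;
}

/* "a", "a", "b" on the same watch: the second "a" is coalesced away. */
static unsigned int demo_coalesce(void)
{
	struct sketch_queue q = { .count = 0, .max = 4 };

	sketch_queue_event(&q, 1, 0x2, "a");
	sketch_queue_event(&q, 1, 0x2, "a");
	sketch_queue_event(&q, 1, 0x2, "b");
	return q.count;
}
```

Note that the queue can hold max + 1 entries: the overflow marker is queued on top of a full queue rather than in place of the last real event.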
diff -urN linux-2.6.11-mm1/fs/Kconfig linux/fs/Kconfig
--- linux-2.6.11-mm1/fs/Kconfig 2005-03-04 13:23:55.000000000 -0500
+++ linux/fs/Kconfig 2005-03-07 16:20:02.224240704 -0500
@@ -344,6 +344,19 @@
If you don't know whether you need it, then you don't need it:
answer N.

+config INOTIFY
+ bool "Inotify file change notification support"
+ default y
+ ---help---
+ Say Y here to enable inotify support and the /dev/inotify character
+ device. Inotify is a file change notification system and a
+ replacement for dnotify. Inotify fixes numerous shortcomings in
+ dnotify and introduces several new features. It allows monitoring
+ of both files and directories via a single open fd. Multiple file
+ events are supported.
+
+ If unsure, say Y.
+
config QUOTA
bool "Quota support"
help
diff -urN linux-2.6.11-mm1/fs/Makefile linux/fs/Makefile
--- linux-2.6.11-mm1/fs/Makefile 2005-03-04 13:23:55.000000000 -0500
+++ linux/fs/Makefile 2005-03-07 16:20:02.225240552 -0500
@@ -11,6 +11,7 @@
attr.o bad_inode.o file.o filesystems.o namespace.o aio.o \
seq_file.o xattr.o libfs.o fs-writeback.o mpage.o direct-io.o \

+obj-$(CONFIG_INOTIFY) += inotify.o
obj-$(CONFIG_EPOLL) += eventpoll.o
obj-$(CONFIG_COMPAT) += compat.o

diff -urN linux-2.6.11-mm1/fs/namei.c linux/fs/namei.c
--- linux-2.6.11-mm1/fs/namei.c 2005-03-04 14:06:21.000000000 -0500
+++ linux/fs/namei.c 2005-03-07 16:20:02.229239944 -0500
@@ -21,7 +21,7 @@
#include <linux/namei.h>
#include <linux/quotaops.h>
#include <linux/pagemap.h>
-#include <linux/dnotify.h>
+#include <linux/fsnotify.h>
#include <linux/smp_lock.h>
#include <linux/personality.h>
#include <linux/security.h>
@@ -1261,7 +1261,7 @@
DQUOT_INIT(dir);
error = dir->i_op->create(dir, dentry, mode, nd);
if (!error) {
- inode_dir_notify(dir, DN_CREATE);
+ fsnotify_create(dir, dentry->d_name.name);
security_inode_post_create(dir, dentry, mode);
}
return error;
@@ -1566,7 +1566,7 @@
DQUOT_INIT(dir);
error = dir->i_op->mknod(dir, dentry, mode, dev);
if (!error) {
- inode_dir_notify(dir, DN_CREATE);
+ fsnotify_create(dir, dentry->d_name.name);
security_inode_post_mknod(dir, dentry, mode, dev);
}
return error;
@@ -1639,7 +1639,7 @@
DQUOT_INIT(dir);
error = dir->i_op->mkdir(dir, dentry, mode);
if (!error) {
- inode_dir_notify(dir, DN_CREATE);
+ fsnotify_mkdir(dir, dentry->d_name.name);
security_inode_post_mkdir(dir,dentry, mode);
}
return error;
@@ -1730,7 +1730,7 @@
}
up(&dentry->d_inode->i_sem);
if (!error) {
- inode_dir_notify(dir, DN_DELETE);
+ fsnotify_rmdir(dentry, dentry->d_inode, dir);
d_delete(dentry);
}
dput(dentry);
@@ -1803,9 +1803,10 @@

/* We don't d_delete() NFS sillyrenamed files--they still exist. */
if (!error && !(dentry->d_flags & DCACHE_NFSFS_RENAMED)) {
+ fsnotify_unlink(dentry, dir);
d_delete(dentry);
- inode_dir_notify(dir, DN_DELETE);
}
+
return error;
}

@@ -1879,7 +1880,7 @@
DQUOT_INIT(dir);
error = dir->i_op->symlink(dir, dentry, oldname);
if (!error) {
- inode_dir_notify(dir, DN_CREATE);
+ fsnotify_create(dir, dentry->d_name.name);
security_inode_post_symlink(dir, dentry, oldname);
}
return error;
@@ -1952,7 +1953,7 @@
error = dir->i_op->link(old_dentry, dir, new_dentry);
up(&old_dentry->d_inode->i_sem);
if (!error) {
- inode_dir_notify(dir, DN_CREATE);
+ fsnotify_create(dir, new_dentry->d_name.name);
security_inode_post_link(old_dentry, dir, new_dentry);
}
return error;
@@ -2116,6 +2117,7 @@
{
int error;
int is_dir = S_ISDIR(old_dentry->d_inode->i_mode);
+ char *old_name;

if (old_dentry->d_inode == new_dentry->d_inode)
return 0;
@@ -2137,18 +2139,18 @@
DQUOT_INIT(old_dir);
DQUOT_INIT(new_dir);

+ old_name = fsnotify_oldname_init(old_dentry);
+
if (is_dir)
error = vfs_rename_dir(old_dir,old_dentry,new_dir,new_dentry);
else
error = vfs_rename_other(old_dir,old_dentry,new_dir,new_dentry);
if (!error) {
- if (old_dir == new_dir)
- inode_dir_notify(old_dir, DN_RENAME);
- else {
- inode_dir_notify(old_dir, DN_DELETE);
- inode_dir_notify(new_dir, DN_CREATE);
- }
+ const char *new_name = old_dentry->d_name.name;
+ fsnotify_move(old_dir, new_dir, old_name, new_name);
}
+ fsnotify_oldname_free(old_name);
+
return error;
}

diff -urN linux-2.6.11-mm1/fs/open.c linux/fs/open.c
--- linux-2.6.11-mm1/fs/open.c 2005-03-04 14:06:21.000000000 -0500
+++ linux/fs/open.c 2005-03-07 16:20:02.230239792 -0500
@@ -10,7 +10,7 @@
#include <linux/file.h>
#include <linux/smp_lock.h>
#include <linux/quotaops.h>
-#include <linux/dnotify.h>
+#include <linux/fsnotify.h>
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/tty.h>
@@ -944,9 +944,11 @@
fd = get_unused_fd();
if (fd >= 0) {
struct file *f = filp_open(tmp, flags, mode);
+
error = PTR_ERR(f);
if (IS_ERR(f))
goto out_error;
+ fsnotify_open(f->f_dentry);
fd_install(fd, f);
}
out:
@@ -998,7 +1000,7 @@
retval = err;
}

- dnotify_flush(filp, id);
+ fsnotify_flush(filp, id);
locks_remove_posix(filp, id);
fput(filp);
return retval;
diff -urN linux-2.6.11-mm1/fs/read_write.c linux/fs/read_write.c
--- linux-2.6.11-mm1/fs/read_write.c 2005-03-04 14:06:21.000000000 -0500
+++ linux/fs/read_write.c 2005-03-07 16:20:02.232239488 -0500
@@ -10,7 +10,7 @@
#include <linux/file.h>
#include <linux/uio.h>
#include <linux/smp_lock.h>
-#include <linux/dnotify.h>
+#include <linux/fsnotify.h>
#include <linux/security.h>
#include <linux/module.h>
#include <linux/syscalls.h>
@@ -239,7 +239,7 @@
else
ret = do_sync_read(file, buf, count, pos);
if (ret > 0) {
- dnotify_parent(file->f_dentry, DN_ACCESS);
+ fsnotify_access(file->f_dentry);
current->rchar += ret;
}
current->syscr++;
@@ -287,7 +287,7 @@
else
ret = do_sync_write(file, buf, count, pos);
if (ret > 0) {
- dnotify_parent(file->f_dentry, DN_MODIFY);
+ fsnotify_modify(file->f_dentry);
current->wchar += ret;
}
current->syscw++;
@@ -523,9 +523,12 @@
out:
if (iov != iovstack)
kfree(iov);
- if ((ret + (type == READ)) > 0)
- dnotify_parent(file->f_dentry,
- (type == READ) ? DN_ACCESS : DN_MODIFY);
+ if ((ret + (type == READ)) > 0) {
+ if (type == READ)
+ fsnotify_access(file->f_dentry);
+ else
+ fsnotify_modify(file->f_dentry);
+ }
return ret;
Efault:
ret = -EFAULT;
diff -urN linux-2.6.11-mm1/fs/super.c linux/fs/super.c
--- linux-2.6.11-mm1/fs/super.c 2005-03-04 14:06:21.000000000 -0500
+++ linux/fs/super.c 2005-03-07 16:20:02.233239336 -0500
@@ -37,6 +37,7 @@
#include <linux/writeback.h> /* for the emergency remount stuff */
#include <linux/idr.h>
#include <linux/kobject.h>
+#include <linux/fsnotify.h>
#include <asm/uaccess.h>


@@ -229,6 +230,7 @@

if (root) {
sb->s_root = NULL;
+ fsnotify_sb_umount(sb);
shrink_dcache_parent(root);
shrink_dcache_anon(&sb->s_anon);
dput(root);
diff -urN linux-2.6.11-mm1/include/linux/fs.h linux/include/linux/fs.h
--- linux-2.6.11-mm1/include/linux/fs.h 2005-03-04 14:06:21.000000000 -0500
+++ linux/include/linux/fs.h 2005-03-07 16:20:02.237238728 -0500
@@ -223,6 +223,7 @@
struct kstatfs;
struct vm_area_struct;
struct vfsmount;
+struct inotify_inode_data;

/* Used to be a macro which just called the function, now just a function */
extern void update_atime (struct inode *);
@@ -473,6 +474,11 @@
struct dnotify_struct *i_dnotify; /* for directory notifications */
#endif

+#ifdef CONFIG_INOTIFY
+ struct list_head inotify_watches; /* watches on this inode */
+ spinlock_t inotify_lock; /* protects the watches list */
+#endif
+
unsigned long i_state;
unsigned long dirtied_when; /* jiffies of first dirtying */

@@ -1368,7 +1374,7 @@
extern int do_remount_sb(struct super_block *sb, int flags,
void *data, int force);
extern sector_t bmap(struct inode *, sector_t);
-extern int setattr_mask(unsigned int);
+extern void setattr_mask(unsigned int, int *, u32 *);
extern int notify_change(struct dentry *, struct iattr *);
extern int permission(struct inode *, int, struct nameidata *);
extern int generic_permission(struct inode *, int,
diff -urN linux-2.6.11-mm1/include/linux/fsnotify.h linux/include/linux/fsnotify.h
--- linux-2.6.11-mm1/include/linux/fsnotify.h 1969-12-31 19:00:00.000000000 -0500
+++ linux/include/linux/fsnotify.h 2005-03-07 16:20:02.238238576 -0500
@@ -0,0 +1,236 @@
+#ifndef _LINUX_FS_NOTIFY_H
+#define _LINUX_FS_NOTIFY_H
+
+/*
+ * include/linux/fsnotify.h - generic hooks for filesystem notification, to
+ * reduce in-source duplication from both dnotify and inotify.
+ *
+ * We don't compile any of this away in some complicated menagerie of ifdefs.
+ * Instead, we rely on the code inside to optimize away as needed.
+ *
+ * (C) Copyright 2005 Robert Love
+ */
+
+#ifdef __KERNEL__
+
+#include <linux/dnotify.h>
+#include <linux/inotify.h>
+
+/*
+ * fsnotify_move - file old_name at old_dir was moved to new_name at new_dir
+ */
+static inline void fsnotify_move(struct inode *old_dir, struct inode *new_dir,
+ const char *old_name, const char *new_name)
+{
+ u32 cookie;
+
+ if (old_dir == new_dir)
+ inode_dir_notify(old_dir, DN_RENAME);
+ else {
+ inode_dir_notify(old_dir, DN_DELETE);
+ inode_dir_notify(new_dir, DN_CREATE);
+ }
+
+ cookie = inotify_get_cookie();
+
+ inotify_inode_queue_event(old_dir, IN_MOVED_FROM, cookie, old_name);
+ inotify_inode_queue_event(new_dir, IN_MOVED_TO, cookie, new_name);
+}
+
+/*
+ * fsnotify_unlink - file was unlinked
+ */
+static inline void fsnotify_unlink(struct dentry *dentry, struct inode *dir)
+{
+ struct inode *inode = dentry->d_inode;
+
+ inode_dir_notify(dir, DN_DELETE);
+ inotify_inode_queue_event(dir, IN_DELETE_FILE, 0, dentry->d_name.name);
+ inotify_inode_queue_event(inode, IN_DELETE_SELF, 0, NULL);
+
+ inotify_inode_is_dead(inode);
+}
+
+/*
+ * fsnotify_rmdir - directory was removed
+ */
+static inline void fsnotify_rmdir(struct dentry *dentry, struct inode *inode,
+ struct inode *dir)
+{
+ inode_dir_notify(dir, DN_DELETE);
+ inotify_inode_queue_event(dir, IN_DELETE_SUBDIR,0,dentry->d_name.name);
+ inotify_inode_queue_event(inode, IN_DELETE_SELF, 0, NULL);
+
+ inotify_inode_is_dead(inode);
+}
+
+/*
+ * fsnotify_create - filename was linked in
+ */
+static inline void fsnotify_create(struct inode *inode, const char *filename)
+{
+ inode_dir_notify(inode, DN_CREATE);
+ inotify_inode_queue_event(inode, IN_CREATE_FILE, 0, filename);
+}
+
+/*
+ * fsnotify_mkdir - directory 'name' was created
+ */
+static inline void fsnotify_mkdir(struct inode *inode, const char *name)
+{
+ inode_dir_notify(inode, DN_CREATE);
+ inotify_inode_queue_event(inode, IN_CREATE_SUBDIR, 0, name);
+}
+
+/*
+ * fsnotify_access - file was read
+ */
+static inline void fsnotify_access(struct dentry *dentry)
+{
+ dnotify_parent(dentry, DN_ACCESS);
+ inotify_dentry_parent_queue_event(dentry, IN_ACCESS, 0,
+ dentry->d_name.name);
+ inotify_inode_queue_event(dentry->d_inode, IN_ACCESS, 0, NULL);
+}
+
+/*
+ * fsnotify_modify - file was modified
+ */
+static inline void fsnotify_modify(struct dentry *dentry)
+{
+ dnotify_parent(dentry, DN_MODIFY);
+ inotify_dentry_parent_queue_event(dentry, IN_MODIFY, 0,
+ dentry->d_name.name);
+ inotify_inode_queue_event(dentry->d_inode, IN_MODIFY, 0, NULL);
+}
+
+/*
+ * fsnotify_open - file was opened
+ */
+static inline void fsnotify_open(struct dentry *dentry)
+{
+ inotify_inode_queue_event(dentry->d_inode, IN_OPEN, 0, NULL);
+ inotify_dentry_parent_queue_event(dentry, IN_OPEN, 0,
+ dentry->d_name.name);
+}
+
+/*
+ * fsnotify_close - file was closed
+ */
+static inline void fsnotify_close(struct file *file)
+{
+ struct dentry *dentry = file->f_dentry;
+ struct inode *inode = dentry->d_inode;
+ const char *filename = dentry->d_name.name;
+ mode_t mode = file->f_mode;
+ u32 mask;
+
+ mask = (mode & FMODE_WRITE) ? IN_CLOSE_WRITE : IN_CLOSE_NOWRITE;
+ inotify_dentry_parent_queue_event(dentry, mask, 0, filename);
+ inotify_inode_queue_event(inode, mask, 0, NULL);
+}
+
+/*
+ * fsnotify_change - notify_change event. file was modified and/or metadata
+ * was changed.
+ */
+static inline void fsnotify_change(struct dentry *dentry, unsigned int ia_valid)
+{
+ int dn_mask = 0;
+ u32 in_mask = 0;
+
+ if (ia_valid & ATTR_UID) {
+ in_mask |= IN_ATTRIB;
+ dn_mask |= DN_ATTRIB;
+ }
+ if (ia_valid & ATTR_GID) {
+ in_mask |= IN_ATTRIB;
+ dn_mask |= DN_ATTRIB;
+ }
+ if (ia_valid & ATTR_SIZE) {
+ in_mask |= IN_MODIFY;
+ dn_mask |= DN_MODIFY;
+ }
+ /* both times implies a utime(s) call */
+ if ((ia_valid & (ATTR_ATIME | ATTR_MTIME)) == (ATTR_ATIME | ATTR_MTIME))
+ {
+ in_mask |= IN_ATTRIB;
+ dn_mask |= DN_ATTRIB;
+ } else if (ia_valid & ATTR_ATIME) {
+ in_mask |= IN_ACCESS;
+ dn_mask |= DN_ACCESS;
+ } else if (ia_valid & ATTR_MTIME) {
+ in_mask |= IN_MODIFY;
+ dn_mask |= DN_MODIFY;
+ }
+ if (ia_valid & ATTR_MODE) {
+ in_mask |= IN_ATTRIB;
+ dn_mask |= DN_ATTRIB;
+ }
+
+ if (dn_mask)
+ dnotify_parent(dentry, dn_mask);
+ if (in_mask) {
+ inotify_inode_queue_event(dentry->d_inode, in_mask, 0, NULL);
+ inotify_dentry_parent_queue_event(dentry, in_mask, 0,
+ dentry->d_name.name);
+ }
+}
+
+/*
+ * fsnotify_sb_umount - filesystem unmount
+ */
+static inline void fsnotify_sb_umount(struct super_block *sb)
+{
+ inotify_super_block_umount(sb);
+}
+
+/*
+ * fsnotify_flush - flush time!
+ */
+static inline void fsnotify_flush(struct file *filp, fl_owner_t id)
+{
+ dnotify_flush(filp, id);
+}
+
+#ifdef CONFIG_INOTIFY /* inotify helpers */
+
+/*
+ * fsnotify_oldname_init - save off the old filename before we change it
+ *
+ * this could be kstrdup if only we could add that to lib/string.c
+ */
+static inline char *fsnotify_oldname_init(struct dentry *old_dentry)
+{
+ char *old_name;
+
+ old_name = kmalloc(strlen(old_dentry->d_name.name) + 1, GFP_KERNEL);
+ if (old_name)
+ strcpy(old_name, old_dentry->d_name.name);
+ return old_name;
+}
+
+/*
+ * fsnotify_oldname_free - free the name we got from fsnotify_oldname_init
+ */
+static inline void fsnotify_oldname_free(const char *old_name)
+{
+ kfree(old_name);
+}
+
+#else /* CONFIG_INOTIFY */
+
+static inline char *fsnotify_oldname_init(struct dentry *old_dentry)
+{
+ return NULL;
+}
+
+static inline void fsnotify_oldname_free(const char *old_name)
+{
+}
+
+#endif /* ! CONFIG_INOTIFY */
+
+#endif /* __KERNEL__ */
+
+#endif /* _LINUX_FS_NOTIFY_H */
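
The attribute-to-mask mapping in fsnotify_change() is easy to get subtly wrong, so here is a small user-space sketch of the inotify half of it. Assumptions: the `SK_ATTR_*` bits are illustrative stand-ins for the kernel's ATTR_* flags (their values are not taken from this patch), and `sketch_in_mask` and `demo_change_masks` are hypothetical names. The IN_* values are the ones defined in the patch's inotify.h.

```c
#include <assert.h>
#include <stdint.h>

/* inotify masks, values as defined in the patch's include/linux/inotify.h */
#define IN_ACCESS	0x00000001u
#define IN_MODIFY	0x00000002u
#define IN_ATTRIB	0x00000004u

/* Illustrative stand-ins for the kernel's ATTR_* bits. */
#define SK_ATTR_MODE	(1u << 0)
#define SK_ATTR_UID	(1u << 1)
#define SK_ATTR_GID	(1u << 2)
#define SK_ATTR_SIZE	(1u << 3)
#define SK_ATTR_ATIME	(1u << 4)
#define SK_ATTR_MTIME	(1u << 5)

/* Mirror of the in_mask computation in fsnotify_change(). */
static uint32_t sketch_in_mask(unsigned int ia_valid)
{
	const unsigned int both = SK_ATTR_ATIME | SK_ATTR_MTIME;
	uint32_t in_mask = 0;

	if (ia_valid & (SK_ATTR_UID | SK_ATTR_GID | SK_ATTR_MODE))
		in_mask |= IN_ATTRIB;
	if (ia_valid & SK_ATTR_SIZE)
		in_mask |= IN_MODIFY;

	/* both times set implies a utime(s) call: an attribute change */
	if ((ia_valid & both) == both)
		in_mask |= IN_ATTRIB;
	else if (ia_valid & SK_ATTR_ATIME)
		in_mask |= IN_ACCESS;
	else if (ia_valid & SK_ATTR_MTIME)
		in_mask |= IN_MODIFY;

	return in_mask;
}

/* A few cases worth checking by hand. */
static void demo_change_masks(void)
{
	assert(sketch_in_mask(0) == 0);
	assert(sketch_in_mask(SK_ATTR_ATIME) == IN_ACCESS);
	assert(sketch_in_mask(SK_ATTR_ATIME | SK_ATTR_MTIME) == IN_ATTRIB);
	assert(sketch_in_mask(SK_ATTR_SIZE | SK_ATTR_MODE) ==
	       (IN_MODIFY | IN_ATTRIB));
}
```

The one non-obvious case is that setting both timestamps at once is reported as IN_ATTRIB, not as IN_ACCESS plus IN_MODIFY.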
diff -urN linux-2.6.11-mm1/include/linux/inotify.h linux/include/linux/inotify.h
--- linux-2.6.11-mm1/include/linux/inotify.h 1969-12-31 19:00:00.000000000 -0500
+++ linux/include/linux/inotify.h 2005-03-07 16:20:02.240238272 -0500
@@ -0,0 +1,113 @@
+/*
+ * Inode based directory notification for Linux
+ *
+ * Copyright (C) 2005 John McCutchan
+ */
+
+#ifndef _LINUX_INOTIFY_H
+#define _LINUX_INOTIFY_H
+
+#include <linux/types.h>
+#include <linux/limits.h>
+
+/*
+ * struct inotify_event - structure read from the inotify device for each event
+ *
+ * When watching a directory, you will receive the filename for events such as
+ * IN_CREATE_FILE, IN_DELETE_FILE, IN_OPEN, IN_CLOSE, ..., relative to the wd.
+ */
+struct inotify_event {
+ __s32 wd; /* watch descriptor */
+ __u32 mask; /* watch mask */
+ __u32 cookie; /* cookie to synchronize two events */
+ __u32 len; /* length (including nulls) of name */
+ char name[0]; /* stub for possible name */
+};
+
+/*
+ * struct inotify_watch_request - represents a watch request
+ *
+ * Pass to the inotify device via the INOTIFY_WATCH ioctl
+ */
+struct inotify_watch_request {
+ char *name; /* filename */
+ __u32 mask; /* event mask */
+};
+
+/* the following are legal, implemented events */
+#define IN_ACCESS 0x00000001 /* File was accessed */
+#define IN_MODIFY 0x00000002 /* File was modified */
+#define IN_ATTRIB 0x00000004 /* File changed attributes */
+#define IN_CLOSE_WRITE 0x00000008 /* Writable file was closed */
+#define IN_CLOSE_NOWRITE 0x00000010 /* Unwritable file closed */
+#define IN_OPEN 0x00000020 /* File was opened */
+#define IN_MOVED_FROM 0x00000040 /* File was moved from X */
+#define IN_MOVED_TO 0x00000080 /* File was moved to Y */
+#define IN_DELETE_SUBDIR 0x00000100 /* Subdir was deleted */
+#define IN_DELETE_FILE 0x00000200 /* Subfile was deleted */
+#define IN_CREATE_SUBDIR 0x00000400 /* Subdir was created */
+#define IN_CREATE_FILE 0x00000800 /* Subfile was created */
+#define IN_DELETE_SELF 0x00001000 /* Self was deleted */
+#define IN_UNMOUNT 0x00002000 /* Backing fs was unmounted */
+#define IN_Q_OVERFLOW 0x00004000 /* Event queue overflowed */
+#define IN_IGNORED 0x00008000 /* File was ignored */
+
+/* special flags */
+#define IN_ALL_EVENTS 0xffffffff /* All the events */
+#define IN_CLOSE (IN_CLOSE_WRITE | IN_CLOSE_NOWRITE)
+
+#define INOTIFY_IOCTL_MAGIC 'Q'
+#define INOTIFY_IOCTL_MAXNR 2
+
+#define INOTIFY_WATCH _IOR(INOTIFY_IOCTL_MAGIC, 1, struct inotify_watch_request)
+#define INOTIFY_IGNORE _IOR(INOTIFY_IOCTL_MAGIC, 2, int)
+
+#ifdef __KERNEL__
+
+#include <linux/dcache.h>
+#include <linux/fs.h>
+#include <linux/config.h>
+#include <asm/atomic.h>
+
+#ifdef CONFIG_INOTIFY
+
+extern void inotify_inode_queue_event(struct inode *, __u32, __u32,
+ const char *);
+extern void inotify_dentry_parent_queue_event(struct dentry *, __u32, __u32,
+ const char *);
+extern void inotify_super_block_umount(struct super_block *);
+extern void inotify_inode_is_dead(struct inode *);
+extern u32 inotify_get_cookie(void);
+
+#else
+
+static inline void inotify_inode_queue_event(struct inode *inode,
+ __u32 mask, __u32 cookie,
+ const char *filename)
+{
+}
+
+static inline void inotify_dentry_parent_queue_event(struct dentry *dentry,
+ __u32 mask, __u32 cookie,
+ const char *filename)
+{
+}
+
+static inline void inotify_super_block_umount(struct super_block *sb)
+{
+}
+
+static inline void inotify_inode_is_dead(struct inode *inode)
+{
+}
+
+static inline u32 inotify_get_cookie(void)
+{
+ return 0;
+}
+
+#endif /* CONFIG_INOTIFY */
+
+#endif /* __KERNEL__ */
+
+#endif /* _LINUX_INOTIFY_H */
diff -urN linux-2.6.11-mm1/include/linux/sched.h linux/include/linux/sched.h
--- linux-2.6.11-mm1/include/linux/sched.h 2005-03-04 14:06:21.000000000 -0500
+++ linux/include/linux/sched.h 2005-03-07 16:20:02.242237968 -0500
@@ -411,6 +411,10 @@
atomic_t processes; /* How many processes does this user have? */
atomic_t files; /* How many open files does this user have? */
atomic_t sigpending; /* How many pending signals does this user have? */
+#ifdef CONFIG_INOTIFY
+ atomic_t inotify_watches; /* How many inotify watches does this user have? */
+ atomic_t inotify_devs; /* How many inotify devs does this user have opened? */
+#endif
/* protected by mq_lock */
unsigned long mq_bytes; /* How many bytes can be allocated to mqueue? */
unsigned long locked_shm; /* How many pages of mlocked shm ? */
diff -urN linux-2.6.11-mm1/kernel/user.c linux/kernel/user.c
--- linux-2.6.11-mm1/kernel/user.c 2005-03-04 14:06:21.000000000 -0500
+++ linux/kernel/user.c 2005-03-07 16:20:02.243237816 -0500
@@ -120,6 +120,10 @@
atomic_set(&new->processes, 0);
atomic_set(&new->files, 0);
atomic_set(&new->sigpending, 0);
+#ifdef CONFIG_INOTIFY
+ atomic_set(&new->inotify_watches, 0);
+ atomic_set(&new->inotify_devs, 0);
+#endif

new->mq_bytes = 0;
new->locked_shm = 0;
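
For readers who want to see what the read() format defined in include/linux/inotify.h looks like from user space, here is a minimal sketch. It assumes the layout of struct inotify_event shown in the patch (wd, mask, cookie, len, then len bytes of name) and the padding rule used by kernel_event() in fs/inotify.c, which rounds the name up to a multiple of the structure size. The names ev_hdr, padded_name_len(), put_event() and walk_events() are made up for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* User-space mirror of the patch's struct inotify_event (assumed layout). */
struct ev_hdr {
	int32_t  wd;		/* watch descriptor */
	uint32_t mask;		/* event mask */
	uint32_t cookie;	/* pairs up related events */
	uint32_t len;		/* bytes of name that follow, padding included */
};

/* Round strlen(name) + 1 up to a multiple of the header size. */
static uint32_t padded_name_len(const char *name)
{
	size_t unit = sizeof(struct ev_hdr);
	size_t len = strlen(name) + 1;

	return (uint32_t)((len + unit - 1) / unit * unit);
}

/* Hypothetical demo helper: append one record to buf, return its size. */
static size_t put_event(char *buf, int32_t wd, uint32_t mask, const char *name)
{
	struct ev_hdr h = { wd, mask, 0, 0 };

	if (name)
		h.len = padded_name_len(name);
	memcpy(buf, &h, sizeof(h));
	memset(buf + sizeof(h), 0, h.len);	/* zero padding, as the kernel does */
	if (name)
		memcpy(buf + sizeof(h), name, strlen(name));
	return sizeof(h) + h.len;
}

/*
 * Walk n bytes as they would come back from read() on the inotify fd.
 * Prints each record and returns the number of complete records seen.
 */
static int walk_events(const char *buf, size_t n)
{
	size_t off = 0;
	int seen = 0;

	while (n - off >= sizeof(struct ev_hdr)) {
		struct ev_hdr h;

		memcpy(&h, buf + off, sizeof(h));	/* avoid unaligned access */
		if (h.len > n - off - sizeof(h))
			break;				/* truncated record */
		printf("wd=%d mask=0x%08x cookie=%u name=%.*s\n",
		       (int)h.wd, (unsigned int)h.mask, (unsigned int)h.cookie,
		       (int)h.len, h.len ? buf + off + sizeof(h) : "");
		off += sizeof(h) + h.len;
		seen++;
	}
	return seen;
}
```

A real consumer would fill the buffer with read() on the device fd instead of put_event(); the walking loop would be the same, as long as the layout assumption above holds.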


2005-03-08 01:21:42

by Arnd Bergmann

Subject: Re: [patch] inotify for 2.6.11

On Monday 07 March 2005 04:13, Albert Cahalan wrote:
> Christoph Hellwig writes:
> > See the review I sent. Write is exactly the right interface for that kind
> > of thing. For comment vs argument either put the number first so we don't
> > have the problem of finding a delimiter that isn't a valid filename, or
> > use '\0' as such.
>
> That's just putrid. You've proposed an interface that
> combines the worst of ASCII with the worst of binary.

I guess it's possible to avoid the need for passing the command at all.
The read data already has a format that mixes binary and variable-length
ascii data, so write could use a data structure similar (or even identical)
to the one used there, e.g.

struct inotify_watch_request {
__u32 mask; /* watch mask */
__u32 len; /* length (including nulls) of name */
char name[0]; /* stub for name */
};

This can replace both INOTIFY_WATCH and INOTIFY_IGNORE, if you simply
define a zero mask as a special value for ignore. FIONREAD is a
well-established interface, so I don't think it's necessary to replace
this.
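
As a rough illustration of the write()-based request proposed above, the sketch below packs the fixed header and the variable-length name into one contiguous buffer. It is only a sketch under the assumption that the kernel would accept exactly this layout; watch_req_hdr and build_watch_req() are hypothetical names, and no such write interface exists in the posted patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Fixed part of the proposed request; the name bytes follow it directly. */
struct watch_req_hdr {
	uint32_t mask;	/* watch mask; zero would mean "ignore" */
	uint32_t len;	/* length of name, including the trailing NUL */
};

/*
 * Hypothetical helper: build one request in a malloc'd buffer that could be
 * handed to a single write(). The total size is stored in *out_size.
 * Returns NULL on allocation failure; the caller frees the buffer.
 */
static void *build_watch_req(const char *path, uint32_t mask, size_t *out_size)
{
	struct watch_req_hdr h;
	size_t name_len;
	char *buf;

	assert(path != NULL && out_size != NULL);

	name_len = strlen(path) + 1;
	buf = malloc(sizeof(h) + name_len);
	if (!buf)
		return NULL;

	h.mask = mask;
	h.len = (uint32_t)name_len;
	memcpy(buf, &h, sizeof(h));
	memcpy(buf + sizeof(h), path, name_len);	/* copies the NUL too */
	*out_size = sizeof(h) + name_len;
	return buf;
}
```

The caller would then pass the buffer and *out_size to write() on the inotify fd and free the buffer afterwards.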

> Adding plain old syscalls is rather nice actually.
> It's only a pain at first, while waiting for glibc
> to catch up.

Yes, that might be a workable interface as well, but don't mix syscalls
with a misc device then. Instead, you might build on something like
futexfs, with syscalls replacing both ioctl and read:

int inotify_open(int flags);
int inotify_watch(int fd, unsigned mask, char *name);
int inotify_ignore(int fd, int wd);
int inotify_getevents(int fd, int max_events, struct inotify_event *);

Arnd <><

2005-03-08 04:40:24

by Christoph Hellwig

Subject: Re: [patch] inotify for 2.6.11-mm1, updated

> > this one seems totally unrelated.
>
> Eh? We did not add that. ;)

Sorry, I thought I saw a + somewhere there at the beginning of the line,
my fault.

> > Should probably use the /dev/mem major.
>
> Hrm, should we?
>
> Also, the memory class stuff is all local to mem.c. For example, I
> cannot get at /sys/class/mem. The misc. device stuff is exported.

Why do you need the class device? I'm really not too eager about adding
tons of new misc devices now that we can route directly to individual majors
with cdev_add & stuff. Especially when you're actually relying on a class
device you should have your own one instead of relying on an obsolete
layer.

> > do you really need a spinlock of your own in every inode? Inode memory
> > usage is a quite big problem.
>
> Yah, we do. For a couple of reasons. First, by introducing our own
> lock, we never need touch i_lock, and avoid that scalability mess
> altogether. Second, and most importantly, i_lock is an outermost lock.
> We need our lock to be nestable, because we walk inode -> inotify_watch
> -> inotify_device. I've tried various rewrites to not need our own
> lock. None are pretty.
>
> I can offer to the "inode memory worries me" people that they can always
> disable CONFIG_INOTIFY.

They're bound to use distro kernels, unfortunately. These people are anyone
doing big-scale fileserving, at least.
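
To make the nesting argument concrete, here is a small user-space sketch of the lock order the patch documents in fs/inotify.c: inode->inotify_lock is taken first, and each device's lock is taken inside it while walking the inode's watches. The pthread mutexes stand in for kernel spinlocks, and struct node, struct watch, struct dev and queue_to_watchers() are illustrative names only, not the patch's code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct dev {			/* stands in for struct inotify_device */
	pthread_mutex_t lock;	/* inner lock */
	int event_count;
};

struct watch {			/* stands in for struct inotify_watch */
	struct dev *dev;
	unsigned int mask;
	struct watch *next;	/* next watch on the same inode */
};

struct node {			/* stands in for the inode */
	pthread_mutex_t lock;	/* outer lock, like inode->inotify_lock */
	struct watch *watches;
};

/*
 * Walk inode -> watch -> device. The outer lock is held for the whole walk
 * and each device lock nests inside it; the order is never reversed.
 * Returns the number of devices that were sent the event.
 */
static int queue_to_watchers(struct node *n, unsigned int mask)
{
	struct watch *w;
	int queued = 0;

	assert(n != NULL);

	pthread_mutex_lock(&n->lock);
	for (w = n->watches; w; w = w->next) {
		if (!(w->mask & mask))
			continue;	/* this watch did not ask for the event */
		pthread_mutex_lock(&w->dev->lock);
		w->dev->event_count++;
		pthread_mutex_unlock(&w->dev->lock);
		queued++;
	}
	pthread_mutex_unlock(&n->lock);
	return queued;
}
```

Any path that starts from the device side has to respect the same order, which is why inotify_release() in the patch drops dev->lock before taking inode->inotify_lock and then retakes dev->lock.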

> + if ((ret + (type == READ)) > 0) {
> + struct dentry *dentry = file->f_dentry;
> + if (type == READ)
> + fsnotify_access(dentry, dentry->d_inode,
> + dentry->d_name.name);
> + else
> + fsnotify_modify(dentry, dentry->d_inode,
> + dentry->d_name.name);
> + }

Arguments two and three are still redundant.

Actually, you fixed that in read_write.c, just compat.c is still missing.
Looks like you forgot to fix that one and didn't have a chance to compile-test
the 32bit compat layer?
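
As an aside on the guard in the quoted hunk: (ret + (type == READ)) > 0 is terse, so here is a tiny sketch of what it evaluates to. It assumes the kernel's usual definitions READ == 0 and WRITE == 1; should_notify() is a made-up name for illustration.

```c
#include <assert.h>

enum { READ = 0, WRITE = 1 };	/* mirrors the kernel's definitions */

/*
 * (type == READ) is 1 for reads and 0 for writes, so:
 *   reads  notify when ret >= 0 (a zero-byte read still counts as an access)
 *   writes notify when ret >  0 (only when bytes were actually written)
 * Errors (ret < 0) never notify.
 */
static int should_notify(long ret, int type)
{
	return (ret + (type == READ)) > 0;
}
```

The hooks keep exactly this behaviour from the dnotify_parent() call they replace.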

2005-03-08 04:48:50

by Robert Love

Subject: Re: [patch] inotify for 2.6.11-mm1, updated

On Tue, 2005-03-08 at 04:40 +0000, Christoph Hellwig wrote:

> Why do you need the class device? I'm really not too eager about adding
> tons of new misc devices now that we can route directly to individual majors
> with cdev_add & stuff. Especially when you're actually relying on a class
> device you should have your own one instead of relying on an obsolete
> layer.

We have sysfs knobs and /sys/class/misc/inotify makes sense.

> Actually, you fixed that in read_write.c, just compat.c is still missing.
> Looks like you forgot to fix that one and didn't have a chance to compile-test
> the 32bit compat layer?

Yah, I just missed it. It is fixed in my tree.

Thanks,

Robert Love


2005-03-08 17:17:44

by Robert Love

Subject: Re: [patch] inotify for 2.6.11-mm1, updated

On Mon, 2005-03-07 at 23:50 -0500, Robert Love wrote:

> Yah, I just missed it. It is fixed in my tree.

Following patch, against 2.6.11-mm1, fixes the hooks in fs/compat.c.

Otherwise unchanged from the previous patch.

Robert Love


inotify!

inotify is intended to correct the deficiencies of dnotify, particularly
its inability to scale and its terrible user interface:

* dnotify requires the opening of one fd per each directory
that you intend to watch. This quickly results in too many
open files and pins removable media, preventing unmount.
* dnotify is directory-based. You only learn about changes to
directories. Sure, a change to a file in a directory affects
the directory, but you are then forced to keep a cache of
stat structures.
* dnotify's interface to user-space is awful. Signals?

inotify provides a more usable, simple, powerful solution to file change
notification:

* inotify's interface is a device node, not SIGIO. You open a
single fd to the device node, which is select()-able.
* inotify has an event that says "the filesystem that the item
you were watching is on was unmounted."
* inotify can watch directories or files.

Inotify is currently used by Beagle (a desktop search infrastructure)
and Gamin (a FAM replacement).

Signed-off-by: Robert Love <[email protected]>

fs/Kconfig | 13
fs/Makefile | 1
fs/attr.c | 33 -
fs/compat.c | 12
fs/file_table.c | 3
fs/inode.c | 4
fs/inotify.c | 1014 +++++++++++++++++++++++++++++++++++++++++++++++
fs/namei.c | 30 -
fs/open.c | 6
fs/read_write.c | 15
fs/super.c | 2
include/linux/fs.h | 8
include/linux/fsnotify.h | 236 ++++++++++
include/linux/inotify.h | 113 +++++
include/linux/sched.h | 4
kernel/user.c | 4
16 files changed, 1442 insertions(+), 56 deletions(-)

diff -urN linux-2.6.11-mm1/fs/attr.c linux/fs/attr.c
--- linux-2.6.11-mm1/fs/attr.c 2005-03-04 14:06:21.000000000 -0500
+++ linux/fs/attr.c 2005-03-08 12:02:28.216810448 -0500
@@ -10,7 +10,7 @@
#include <linux/mm.h>
#include <linux/string.h>
#include <linux/smp_lock.h>
-#include <linux/dnotify.h>
+#include <linux/fsnotify.h>
#include <linux/fcntl.h>
#include <linux/quotaops.h>
#include <linux/security.h>
@@ -107,31 +107,8 @@
out:
return error;
}
-
EXPORT_SYMBOL(inode_setattr);

-int setattr_mask(unsigned int ia_valid)
-{
- unsigned long dn_mask = 0;
-
- if (ia_valid & ATTR_UID)
- dn_mask |= DN_ATTRIB;
- if (ia_valid & ATTR_GID)
- dn_mask |= DN_ATTRIB;
- if (ia_valid & ATTR_SIZE)
- dn_mask |= DN_MODIFY;
- /* both times implies a utime(s) call */
- if ((ia_valid & (ATTR_ATIME|ATTR_MTIME)) == (ATTR_ATIME|ATTR_MTIME))
- dn_mask |= DN_ATTRIB;
- else if (ia_valid & ATTR_ATIME)
- dn_mask |= DN_ACCESS;
- else if (ia_valid & ATTR_MTIME)
- dn_mask |= DN_MODIFY;
- if (ia_valid & ATTR_MODE)
- dn_mask |= DN_ATTRIB;
- return dn_mask;
-}
-
int notify_change(struct dentry * dentry, struct iattr * attr)
{
struct inode *inode = dentry->d_inode;
@@ -194,11 +171,9 @@
if (ia_valid & ATTR_SIZE)
up_write(&dentry->d_inode->i_alloc_sem);

- if (!error) {
- unsigned long dn_mask = setattr_mask(ia_valid);
- if (dn_mask)
- dnotify_parent(dentry, dn_mask);
- }
+ if (!error)
+ fsnotify_change(dentry, ia_valid);
+
return error;
}

diff -urN linux-2.6.11-mm1/fs/compat.c linux/fs/compat.c
--- linux-2.6.11-mm1/fs/compat.c 2005-03-04 14:06:21.000000000 -0500
+++ linux/fs/compat.c 2005-03-08 12:02:30.518460544 -0500
@@ -36,7 +36,7 @@
#include <linux/ctype.h>
#include <linux/module.h>
#include <linux/dirent.h>
-#include <linux/dnotify.h>
+#include <linux/fsnotify.h>
#include <linux/highuid.h>
#include <linux/sunrpc/svc.h>
#include <linux/nfsd/nfsd.h>
@@ -1233,9 +1233,13 @@
out:
if (iov != iovstack)
kfree(iov);
- if ((ret + (type == READ)) > 0)
- dnotify_parent(file->f_dentry,
- (type == READ) ? DN_ACCESS : DN_MODIFY);
+ if ((ret + (type == READ)) > 0) {
+ struct dentry *dentry = file->f_dentry;
+ if (type == READ)
+ fsnotify_access(dentry);
+ else
+ fsnotify_modify(dentry);
+ }
return ret;
}

diff -urN linux-2.6.11-mm1/fs/file_table.c linux/fs/file_table.c
--- linux-2.6.11-mm1/fs/file_table.c 2005-03-04 14:06:21.000000000 -0500
+++ linux/fs/file_table.c 2005-03-08 12:02:28.219809992 -0500
@@ -16,6 +16,7 @@
#include <linux/eventpoll.h>
#include <linux/mount.h>
#include <linux/cdev.h>
+#include <linux/fsnotify.h>

/* sysctl tunables... */
struct files_stat_struct files_stat = {
@@ -123,6 +124,8 @@
struct inode *inode = dentry->d_inode;

might_sleep();
+
+ fsnotify_close(file);
/*
* The function eventpoll_release() should be the first called
* in the file cleanup chain.
diff -urN linux-2.6.11-mm1/fs/inode.c linux/fs/inode.c
--- linux-2.6.11-mm1/fs/inode.c 2005-03-04 14:06:21.000000000 -0500
+++ linux/fs/inode.c 2005-03-08 12:02:28.221809688 -0500
@@ -132,6 +132,10 @@
#ifdef CONFIG_QUOTA
memset(&inode->i_dquot, 0, sizeof(inode->i_dquot));
#endif
+#ifdef CONFIG_INOTIFY
+ INIT_LIST_HEAD(&inode->inotify_watches);
+ spin_lock_init(&inode->inotify_lock);
+#endif
inode->i_pipe = NULL;
inode->i_bdev = NULL;
inode->i_cdev = NULL;
diff -urN linux-2.6.11-mm1/fs/inotify.c linux/fs/inotify.c
--- linux-2.6.11-mm1/fs/inotify.c 1969-12-31 19:00:00.000000000 -0500
+++ linux/fs/inotify.c 2005-03-08 12:02:28.224809232 -0500
@@ -0,0 +1,1014 @@
+/*
+ * fs/inotify.c - inode-based file event notifications
+ *
+ * Authors:
+ * John McCutchan <[email protected]>
+ * Robert Love <[email protected]>
+ *
+ * Copyright (C) 2005 John McCutchan
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2, or (at your option) any
+ * later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/spinlock.h>
+#include <linux/idr.h>
+#include <linux/slab.h>
+#include <linux/fs.h>
+#include <linux/namei.h>
+#include <linux/poll.h>
+#include <linux/device.h>
+#include <linux/miscdevice.h>
+#include <linux/init.h>
+#include <linux/list.h>
+#include <linux/writeback.h>
+#include <linux/inotify.h>
+
+#include <asm/ioctls.h>
+
+static atomic_t inotify_cookie;
+
+static kmem_cache_t *watch_cachep;
+static kmem_cache_t *event_cachep;
+
+static int max_user_devices;
+static int max_user_watches;
+static unsigned int max_queued_events;
+
+/*
+ * Lock ordering:
+ *
+ * inode_lock (used to safely walk the super_block->s_inodes list)
+ * dentry->d_lock (used to keep d_move() away from dentry->d_parent)
+ * inode->inotify_lock (protects inode->inotify_watches and watches->i_list)
+ * inotify_dev->lock (protects inotify_device and watches->d_list)
+ */
+
+/*
+ * Lifetimes of the three main data structures -- inotify_device, inode, and
+ * inotify_watch -- are managed by reference count.
+ *
+ * inotify_device: Lifetime is from open until release. Additional references
+ * can bump the count via get_inotify_dev() and drop the count via
+ * put_inotify_dev().
+ *
+ * inotify_watch: Lifetime is from create_watch() to destroy_watch().
+ * Additional references can bump the count via get_inotify_watch() and drop
+ * the count via put_inotify_watch().
+ *
+ * inode: Pinned so long as the inode is associated with a watch, from
+ * create_watch() to put_inotify_watch().
+ */
+
+/*
+ * struct inotify_device - represents an open instance of an inotify device
+ *
+ * This structure is protected by 'lock'.
+ */
+struct inotify_device {
+ wait_queue_head_t wq; /* wait queue for i/o */
+ struct idr idr; /* idr mapping wd -> watch */
+ struct list_head events; /* list of queued events */
+ struct list_head watches; /* list of watches */
+ spinlock_t lock; /* protects this bad boy */
+ atomic_t count; /* reference count */
+ struct user_struct *user; /* user who opened this dev */
+ unsigned int queue_size; /* size of the queue (bytes) */
+ unsigned int event_count; /* number of pending events */
+ unsigned int max_events; /* maximum number of events */
+};
+
+/*
+ * struct inotify_kernel_event - An inotify event, originating from a watch and
+ * queued for user-space. A list of these is attached to each instance of the
+ * device. In read(), this list is walked and all events that can fit in the
+ * buffer are returned.
+ *
+ * Protected by dev->lock of the device in which we are queued.
+ */
+struct inotify_kernel_event {
+ struct inotify_event event; /* the user-space event */
+ struct list_head list; /* entry in inotify_device's list */
+ char *name; /* filename, if any */
+};
+
+/*
+ * struct inotify_watch - represents a watch request on a specific inode
+ *
+ * d_list is protected by dev->lock of the associated dev->watches.
+ * i_list and mask are protected by inode->inotify_lock of the associated inode.
+ * dev, inode, and wd are never written to once the watch is created.
+ */
+struct inotify_watch {
+ struct list_head d_list; /* entry in inotify_device's list */
+ struct list_head i_list; /* entry in inode's list */
+ atomic_t count; /* reference count */
+ struct inotify_device *dev; /* associated device */
+ struct inode *inode; /* associated inode */
+ s32 wd; /* watch descriptor */
+ u32 mask; /* event mask for this watch */
+};
+
+static ssize_t show_max_queued_events(struct class_device *class, char *buf)
+{
+ return sprintf(buf, "%u\n", max_queued_events);
+}
+
+static ssize_t store_max_queued_events(struct class_device *class,
+ const char *buf, size_t count)
+{
+ unsigned int max;
+
+ if (sscanf(buf, "%u", &max) > 0 && max > 0) {
+ max_queued_events = max;
+ return strlen(buf);
+ }
+ return -EINVAL;
+}
+
+static ssize_t show_max_user_devices(struct class_device *class, char *buf)
+{
+ return sprintf(buf, "%d\n", max_user_devices);
+}
+
+static ssize_t store_max_user_devices(struct class_device *class,
+ const char *buf, size_t count)
+{
+ int max;
+
+ if (sscanf(buf, "%d", &max) > 0 && max > 0) {
+ max_user_devices = max;
+ return strlen(buf);
+ }
+ return -EINVAL;
+}
+
+static ssize_t show_max_user_watches(struct class_device *class, char *buf)
+{
+ return sprintf(buf, "%d\n", max_user_watches);
+}
+
+static ssize_t store_max_user_watches(struct class_device *class,
+ const char *buf, size_t count)
+{
+ int max;
+
+ if (sscanf(buf, "%d", &max) > 0 && max > 0) {
+ max_user_watches = max;
+ return strlen(buf);
+ }
+ return -EINVAL;
+}
+
+static CLASS_DEVICE_ATTR(max_queued_events, S_IRUGO | S_IWUSR,
+ show_max_queued_events, store_max_queued_events);
+static CLASS_DEVICE_ATTR(max_user_devices, S_IRUGO | S_IWUSR,
+ show_max_user_devices, store_max_user_devices);
+static CLASS_DEVICE_ATTR(max_user_watches, S_IRUGO | S_IWUSR,
+ show_max_user_watches, store_max_user_watches);
+
+static inline void get_inotify_dev(struct inotify_device *dev)
+{
+ atomic_inc(&dev->count);
+}
+
+static inline void put_inotify_dev(struct inotify_device *dev)
+{
+ if (atomic_dec_and_test(&dev->count)) {
+ atomic_dec(&dev->user->inotify_devs);
+ free_uid(dev->user);
+ kfree(dev);
+ }
+}
+
+static inline void get_inotify_watch(struct inotify_watch *watch)
+{
+ atomic_inc(&watch->count);
+}
+
+static inline void put_inotify_watch(struct inotify_watch *watch)
+{
+ if (atomic_dec_and_test(&watch->count)) {
+ put_inotify_dev(watch->dev);
+ iput(watch->inode);
+ kmem_cache_free(watch_cachep, watch);
+ }
+}
+
+/*
+ * kernel_event - create a new kernel event with the given parameters
+ */
+static struct inotify_kernel_event * kernel_event(s32 wd, u32 mask, u32 cookie,
+ const char *name)
+{
+ struct inotify_kernel_event *kevent;
+
+ /* XXX: optimally, we should use GFP_KERNEL */
+ kevent = kmem_cache_alloc(event_cachep, GFP_ATOMIC);
+ if (unlikely(!kevent))
+ return NULL;
+
+ /* we hand this out to user-space, so zero it just in case */
+ memset(&kevent->event, 0, sizeof(struct inotify_event));
+
+ kevent->event.wd = wd;
+ kevent->event.mask = mask;
+ kevent->event.cookie = cookie;
+
+ INIT_LIST_HEAD(&kevent->list);
+
+ if (name) {
+ size_t len, rem, event_size = sizeof(struct inotify_event);
+
+ /*
+ * We need to pad the filename so as to properly align an
+ * array of inotify_event structures. Because the structure is
+ * small and the common case is a small filename, we just round
+ * up to the next multiple of the structure's sizeof. This is
+ * simple and safe for all architectures.
+ */
+ len = strlen(name) + 1;
+ rem = event_size - len;
+ if (len > event_size) {
+ rem = event_size - (len % event_size);
+ if (len % event_size == 0)
+ rem = 0;
+ }
+ len += rem;
+
+ /* XXX: optimally, we should use GFP_KERNEL */
+ kevent->name = kmalloc(len, GFP_ATOMIC);
+ if (unlikely(!kevent->name)) {
+ kmem_cache_free(event_cachep, kevent);
+ return NULL;
+ }
+ memset(kevent->name, 0, len);
+ strncpy(kevent->name, name, strlen(name));
+ kevent->event.len = len;
+ } else {
+ kevent->event.len = 0;
+ kevent->name = NULL;
+ }
+
+ return kevent;
+}
+
+/*
+ * inotify_dev_get_event - return the next event in the given dev's queue
+ *
+ * Caller must hold dev->lock.
+ */
+static inline struct inotify_kernel_event *
+inotify_dev_get_event(struct inotify_device *dev)
+{
+ return list_entry(dev->events.next, struct inotify_kernel_event, list);
+}
+
+/*
+ * inotify_dev_queue_event - add a new event to the given device
+ *
+ * Caller must hold dev->lock.
+ */
+static void inotify_dev_queue_event(struct inotify_device *dev,
+ struct inotify_watch *watch, u32 mask,
+ u32 cookie, const char *name)
+{
+ struct inotify_kernel_event *kevent, *last;
+
+ /* coalescing: drop this event if it is a dupe of the previous */
+ last = inotify_dev_get_event(dev);
+ if (dev->event_count && last->event.mask == mask &&
+ last->event.wd == watch->wd) {
+ const char *lastname = last->name;
+
+ if (!name && !lastname)
+ return;
+ if (name && lastname && !strcmp(lastname, name))
+ return;
+ }
+
+ /*
+ * The queue has already overflowed and we have already sent the
+ * Q_OVERFLOW event.
+ */
+ if (unlikely(dev->event_count > dev->max_events))
+ return;
+
+ /* if the queue overflows, we need to notify user space */
+ if (unlikely(dev->event_count == dev->max_events))
+ kevent = kernel_event(-1, IN_Q_OVERFLOW, cookie, NULL);
+ else
+ kevent = kernel_event(watch->wd, mask, cookie, name);
+
+ if (unlikely(!kevent))
+ return;
+
+ /* queue the event and wake up anyone waiting */
+ dev->event_count++;
+ dev->queue_size += sizeof(struct inotify_event) + kevent->event.len;
+ list_add_tail(&kevent->list, &dev->events);
+ wake_up_interruptible(&dev->wq);
+}
+
+/*
+ * remove_kevent - cleans up and ultimately frees the given kevent
+ */
+static void remove_kevent(struct inotify_device *dev,
+ struct inotify_kernel_event *kevent)
+{
+ BUG_ON(!dev);
+ BUG_ON(!kevent);
+
+ list_del(&kevent->list);
+
+ dev->event_count--;
+ dev->queue_size -= sizeof(struct inotify_event) + kevent->event.len;
+
+ kfree(kevent->name);
+ kmem_cache_free(event_cachep, kevent);
+}
+
+/*
+ * inotify_dev_event_dequeue - destroy an event on the given device
+ *
+ * Caller must hold dev->lock.
+ */
+static void inotify_dev_event_dequeue(struct inotify_device *dev)
+{
+ if (!list_empty(&dev->events)) {
+ struct inotify_kernel_event *kevent;
+ kevent = inotify_dev_get_event(dev);
+ remove_kevent(dev, kevent);
+ }
+}
+
+/*
+ * inotify_dev_get_wd - returns the next WD for use by the given dev
+ *
+ * Grabs dev->lock. This function can sleep.
+ */
+static int inotify_dev_get_wd(struct inotify_device *dev,
+ struct inotify_watch *watch)
+{
+ int ret;
+
+ do {
+ if (unlikely(!idr_pre_get(&dev->idr, GFP_KERNEL)))
+ return -ENOSPC;
+ spin_lock(&dev->lock);
+ ret = idr_get_new(&dev->idr, watch, &watch->wd);
+ spin_unlock(&dev->lock);
+ } while (ret == -EAGAIN);
+
+ return ret;
+}
+
+/*
+ * create_watch - creates a watch on the given device.
+ *
+ * Calls inotify_dev_get_wd(), so it both grabs dev->lock and may sleep.
+ * Both 'dev' and 'inode' (by way of nameidata) need to be pinned.
+ */
+static struct inotify_watch *create_watch(struct inotify_device *dev,
+ u32 mask, struct inode *inode)
+{
+ struct inotify_watch *watch;
+
+ if (atomic_read(&dev->user->inotify_watches) >= max_user_watches)
+ return NULL;
+
+ watch = kmem_cache_alloc(watch_cachep, GFP_KERNEL);
+ if (unlikely(!watch))
+ return NULL;
+
+ if (unlikely(inotify_dev_get_wd(dev, watch))) {
+ kmem_cache_free(watch_cachep, watch);
+ return NULL;
+ }
+
+ watch->mask = mask;
+ atomic_set(&watch->count, 0);
+ INIT_LIST_HEAD(&watch->d_list);
+ INIT_LIST_HEAD(&watch->i_list);
+
+ /* save a reference to device and bump the count to make it official */
+ get_inotify_dev(dev);
+ watch->dev = dev;
+
+ /*
+ * Save a reference to the inode and bump the ref count to make it
+ * official. We hold a reference to nameidata, which makes this safe.
+ */
+ watch->inode = igrab(inode);
+
+ /* bump our own count, corresponding to our entry in dev->watches */
+ get_inotify_watch(watch);
+
+ atomic_inc(&dev->user->inotify_watches);
+
+ return watch;
+}
+
+/*
+ * inode_find_dev - find the watch associated with the given inode and dev
+ *
+ * Callers must hold inode->inotify_lock.
+ */
+static struct inotify_watch *inode_find_dev(struct inode *inode,
+ struct inotify_device *dev)
+{
+ struct inotify_watch *watch;
+
+ list_for_each_entry(watch, &inode->inotify_watches, i_list) {
+ if (watch->dev == dev)
+ return watch;
+ }
+
+ return NULL;
+}
+
+/*
+ * inotify_dev_is_watching_inode - is this device watching this inode?
+ *
+ * Requires 'dev' and 'inode' to both be pinned and dev->lock be held.
+ */
+static inline int inotify_dev_is_watching_inode(struct inotify_device *dev,
+ struct inode *inode)
+{
+ struct inotify_watch *watch;
+
+ list_for_each_entry(watch, &dev->watches, d_list) {
+ if (watch->inode == inode)
+ return 1;
+ }
+
+ return 0;
+}
+
+/*
+ * remove_watch_no_event - remove_watch() without the IN_IGNORED event.
+ */
+static void remove_watch_no_event(struct inotify_watch *watch,
+ struct inotify_device *dev)
+{
+ BUG_ON(!dev);
+ BUG_ON(!watch);
+
+ list_del(&watch->i_list);
+ list_del(&watch->d_list);
+
+ atomic_dec(&dev->user->inotify_watches);
+ idr_remove(&dev->idr, watch->wd);
+ put_inotify_watch(watch);
+}
+
+/*
+ * remove_watch - Remove a watch from both the device and the inode. Sends
+ * the IN_IGNORED event to the given device signifying that the inode is no
+ * longer watched.
+ *
+ * Callers must hold both inode->inotify_lock and dev->lock. We drop a
+ * reference to the inode before returning.
+ */
+static void remove_watch(struct inotify_watch *watch,
+ struct inotify_device *dev)
+{
+ inotify_dev_queue_event(dev, watch, IN_IGNORED, 0, NULL);
+ remove_watch_no_event(watch, dev);
+}
+
+/*
+ * __inode_queue_event - internal helper for inotify_inode_queue_event()
+ *
+ * Caller must hold inode->inotify_lock.
+ */
+void __inode_queue_event(struct inode *inode, u32 mask, u32 cookie,
+ const char *name)
+{
+ struct inotify_watch *watch;
+
+ list_for_each_entry(watch, &inode->inotify_watches, i_list) {
+ if (watch->mask & mask) {
+ struct inotify_device *dev = watch->dev;
+ spin_lock(&dev->lock);
+ inotify_dev_queue_event(dev, watch, mask, cookie, name);
+ spin_unlock(&dev->lock);
+ }
+ }
+}
+
+/* Kernel API */
+
+/**
+ * inotify_inode_queue_event - queue an event to all watches on this inode
+ * @inode: inode event is originating from
+ * @mask: event mask describing this event
+ * @cookie: cookie for synchronization, or zero
+ * @name: filename, if any
+ */
+void inotify_inode_queue_event(struct inode *inode, u32 mask, u32 cookie,
+ const char *name)
+{
+ spin_lock(&inode->inotify_lock);
+ __inode_queue_event(inode, mask, cookie, name);
+ spin_unlock(&inode->inotify_lock);
+}
+EXPORT_SYMBOL_GPL(inotify_inode_queue_event);
+
+/**
+ * inotify_dentry_parent_queue_event - queue an event to a dentry's parent
+ * @dentry: the dentry in question, we queue against this dentry's parent
+ * @mask: event mask describing this event
+ * @cookie: cookie for synchronization, or zero
+ * @name: filename, if any
+ */
+void inotify_dentry_parent_queue_event(struct dentry *dentry, u32 mask,
+ u32 cookie, const char *name)
+{
+ struct dentry *parent;
+ struct inode *inode;
+
+ spin_lock(&dentry->d_lock);
+ parent = dentry->d_parent;
+ inode = parent->d_inode;
+ spin_lock(&inode->inotify_lock);
+ if (!list_empty(&inode->inotify_watches)) {
+ dget(parent);
+ spin_unlock(&dentry->d_lock);
+ __inode_queue_event(inode, mask, cookie, name);
+ spin_unlock(&inode->inotify_lock);
+ dput(parent);
+ } else {
+ spin_unlock(&inode->inotify_lock);
+ spin_unlock(&dentry->d_lock);
+ }
+}
+EXPORT_SYMBOL_GPL(inotify_dentry_parent_queue_event);
+
+/**
+ * inotify_get_cookie - return a unique cookie for use in synchronizing events
+ *
+ * Returns the unique cookie.
+ */
+u32 inotify_get_cookie(void)
+{
+ return atomic_inc_return(&inotify_cookie);
+}
+EXPORT_SYMBOL_GPL(inotify_get_cookie);
+
+/**
+ * inotify_super_block_umount - process watches on an unmounted fs
+ * @sb: the super_block of the filesystem in question
+ */
+void inotify_super_block_umount(struct super_block *sb)
+{
+ struct inode *inode;
+
+ /* walk the list of inodes on this superblock */
+ spin_lock(&inode_lock);
+ list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+ struct inotify_watch *watch, *next;
+ struct list_head *watches;
+
+ /* for each watch, send IN_UNMOUNT and then remove it */
+ spin_lock(&inode->inotify_lock);
+ watches = &inode->inotify_watches;
+ list_for_each_entry_safe(watch, next, watches, i_list) {
+ struct inotify_device *dev = watch->dev;
+ spin_lock(&dev->lock);
+ inotify_dev_queue_event(dev, watch, IN_UNMOUNT,0,NULL);
+ remove_watch(watch, dev);
+ spin_unlock(&dev->lock);
+ }
+ spin_unlock(&inode->inotify_lock);
+ }
+ spin_unlock(&inode_lock);
+}
+EXPORT_SYMBOL_GPL(inotify_super_block_umount);
+
+/**
+ * inotify_inode_is_dead - an inode has been deleted, cleanup any watches
+ * @inode: inode that is about to be removed
+ */
+void inotify_inode_is_dead(struct inode *inode)
+{
+ struct inotify_watch *watch, *next;
+
+ spin_lock(&inode->inotify_lock);
+ list_for_each_entry_safe(watch, next, &inode->inotify_watches, i_list) {
+ struct inotify_device *dev = watch->dev;
+ spin_lock(&dev->lock);
+ remove_watch(watch, dev);
+ spin_unlock(&dev->lock);
+ }
+ spin_unlock(&inode->inotify_lock);
+}
+EXPORT_SYMBOL_GPL(inotify_inode_is_dead);
+
+/* Device Interface */
+
+static unsigned int inotify_poll(struct file *file, poll_table *wait)
+{
+ struct inotify_device *dev;
+ int ret = 0;
+
+ dev = file->private_data;
+ get_inotify_dev(dev);
+
+ poll_wait(file, &dev->wq, wait);
+ spin_lock(&dev->lock);
+ if (!list_empty(&dev->events))
+ ret = POLLIN | POLLRDNORM;
+ spin_unlock(&dev->lock);
+
+ put_inotify_dev(dev);
+ return ret;
+}
+
+static ssize_t inotify_read(struct file *file, char __user *buf,
+ size_t count, loff_t *pos)
+{
+ size_t event_size;
+ struct inotify_device *dev;
+ char __user *start;
+ int ret;
+ DEFINE_WAIT(wait);
+
+ start = buf;
+ dev = file->private_data;
+
+ /* we only hand out full inotify events */
+ event_size = sizeof(struct inotify_event);
+ if (count < event_size)
+ return 0;
+
+ while (1) {
+ int events;
+
+ prepare_to_wait(&dev->wq, &wait, TASK_INTERRUPTIBLE);
+
+ spin_lock(&dev->lock);
+ events = !list_empty(&dev->events);
+ spin_unlock(&dev->lock);
+ if (events) {
+ ret = 0;
+ break;
+ }
+
+ if (file->f_flags & O_NONBLOCK) {
+ ret = -EAGAIN;
+ break;
+ }
+
+ if (signal_pending(current)) {
+ ret = -EINTR;
+ break;
+ }
+
+ schedule();
+ }
+
+ finish_wait(&dev->wq, &wait);
+ if (ret)
+ return ret;
+
+ while (1) {
+ struct inotify_kernel_event *kevent;
+
+ spin_lock(&dev->lock);
+ if (list_empty(&dev->events)) {
+ spin_unlock(&dev->lock);
+ break;
+ }
+ kevent = inotify_dev_get_event(dev);
+ if (event_size + kevent->event.len > count) {
+ spin_unlock(&dev->lock);
+ break;
+ }
+ list_del_init(&kevent->list);
+ spin_unlock(&dev->lock);
+
+ if (copy_to_user(buf, &kevent->event, event_size)) {
+ /* put the event back on the queue */
+ spin_lock(&dev->lock);
+ list_add(&kevent->list, &dev->events);
+ spin_unlock(&dev->lock);
+ return -EFAULT;
+ }
+ buf += event_size;
+ count -= event_size;
+
+ if (kevent->name) {
+ if (copy_to_user(buf, kevent->name, kevent->event.len)){
+ /* put the event back on the queue */
+ spin_lock(&dev->lock);
+ list_add(&kevent->list, &dev->events);
+ spin_unlock(&dev->lock);
+ return -EFAULT;
+ }
+ buf += kevent->event.len;
+ count -= kevent->event.len;
+ }
+
+ /*
+ * We made it here, so the event was copied to the user. It is
+ * already removed from the event list, just free it.
+ */
+ spin_lock(&dev->lock);
+ remove_kevent(dev, kevent);
+ spin_unlock(&dev->lock);
+ }
+
+ return buf - start;
+}
+
+static int inotify_open(struct inode *inode, struct file *file)
+{
+ struct inotify_device *dev;
+ struct user_struct *user;
+ int ret;
+
+ user = get_uid(current->user);
+
+ if (unlikely(atomic_read(&user->inotify_devs) >= max_user_devices)) {
+ ret = -EMFILE;
+ goto out_err;
+ }
+
+ dev = kmalloc(sizeof(struct inotify_device), GFP_KERNEL);
+ if (unlikely(!dev)) {
+ ret = -ENOMEM;
+ goto out_err;
+ }
+
+ idr_init(&dev->idr);
+ INIT_LIST_HEAD(&dev->events);
+ INIT_LIST_HEAD(&dev->watches);
+ init_waitqueue_head(&dev->wq);
+ spin_lock_init(&dev->lock);
+
+ dev->event_count = 0;
+ dev->queue_size = 0;
+ dev->max_events = max_queued_events;
+ dev->user = user;
+ atomic_set(&dev->count, 0);
+
+ get_inotify_dev(dev);
+ atomic_inc(&current->user->inotify_devs);
+
+ file->private_data = dev;
+
+ return 0;
+out_err:
+ free_uid(current->user);
+ return ret;
+}
+
+static int inotify_release(struct inode *inode, struct file *file)
+{
+ struct inotify_device *dev;
+
+ dev = file->private_data;
+ BUG_ON(!dev);
+
+ /*
+ * Destroy all of the watches on this device. Unfortunately, not very
+ * pretty. We cannot do a simple iteration over the list, because we
+ * do not know the inode until we iterate to the watch. But we need to
+ * hold inode->inotify_lock before dev->lock. The following works.
+ */
+ while (1) {
+ struct inotify_watch *watch;
+ struct list_head *watches;
+ struct inode *inode;
+
+ spin_lock(&dev->lock);
+ watches = &dev->watches;
+ if (list_empty(watches)) {
+ spin_unlock(&dev->lock);
+ break;
+ }
+ watch = list_entry(watches->next, struct inotify_watch, d_list);
+ get_inotify_watch(watch);
+ spin_unlock(&dev->lock);
+
+ inode = watch->inode;
+ spin_lock(&inode->inotify_lock);
+ spin_lock(&dev->lock);
+ remove_watch_no_event(watch, dev);
+ spin_unlock(&dev->lock);
+ spin_unlock(&inode->inotify_lock);
+ put_inotify_watch(watch);
+ }
+
+ /* destroy all of the events on this device */
+ spin_lock(&dev->lock);
+ while (!list_empty(&dev->events))
+ inotify_dev_event_dequeue(dev);
+ spin_unlock(&dev->lock);
+
+ /* free this device: the put matching the get in inotify_open() */
+ put_inotify_dev(dev);
+
+ return 0;
+}
+
+static int inotify_add_watch(struct inotify_device *dev,
+ struct inotify_watch_request *request)
+{
+ struct inode *inode;
+ struct inotify_watch *watch, *old;
+ struct nameidata nd;
+ int ret;
+
+ ret = __user_walk(request->name, LOOKUP_FOLLOW, &nd);
+ if (unlikely(ret))
+ return ret;
+
+ /* you can only watch an inode if you have read permissions on it */
+ ret = permission(nd.dentry->d_inode, MAY_READ, &nd);
+ if (unlikely(ret))
+ goto nd_out;
+
+ /* inode is held in place by a reference on nd */
+ inode = nd.dentry->d_inode;
+
+ /*
+ * Handle the case of re-adding a watch on an (inode,dev) pair that we
+ * are already watching. We just update the mask and return its wd.
+ */
+ spin_lock(&inode->inotify_lock);
+ spin_lock(&dev->lock);
+ old = inode_find_dev(inode, dev);
+ if (unlikely(old)) {
+ old->mask = request->mask;
+ ret = old->wd;
+ spin_unlock(&dev->lock);
+ spin_unlock(&inode->inotify_lock);
+ goto nd_out;
+ }
+ spin_unlock(&dev->lock);
+ spin_unlock(&inode->inotify_lock);
+
+ /*
+ * We do this lockless, for both scalability and so we can allocate
+ * with GFP_KERNEL. But that means we can race here and add this watch
+ * twice. We fix up that case below, by again checking for the watch
+ * after we reacquire the locks.
+ */
+ watch = create_watch(dev, request->mask, inode);
+ if (unlikely(!watch)) {
+ ret = -ENOSPC;
+ goto nd_out;
+ }
+
+ spin_lock(&inode->inotify_lock);
+ spin_lock(&dev->lock);
+ old = inode_find_dev(inode, dev);
+ if (unlikely(old)) {
+ /* We raced! Destroy this watch and return. */
+ put_inotify_watch(watch);
+ ret = -EBUSY;
+ } else {
+ /* Add the watch to the device's and the inode's list */
+ list_add(&watch->d_list, &dev->watches);
+ list_add(&watch->i_list, &inode->inotify_watches);
+ ret = watch->wd;
+ }
+ spin_unlock(&dev->lock);
+ spin_unlock(&inode->inotify_lock);
+
+nd_out:
+ path_release(&nd);
+
+ return ret;
+}
+
+/*
+ * inotify_ignore - handle the INOTIFY_IGNORE ioctl, asking that a given wd be
+ * removed from the device.
+ */
+static int inotify_ignore(struct inotify_device *dev, s32 wd)
+{
+ struct inotify_watch *watch;
+ struct inode *inode;
+
+ spin_lock(&dev->lock);
+ watch = idr_find(&dev->idr, wd);
+ spin_unlock(&dev->lock);
+
+ if (unlikely(!watch))
+ return -EINVAL;
+
+ inode = watch->inode;
+ spin_lock(&inode->inotify_lock);
+ spin_lock(&dev->lock);
+ remove_watch(watch, dev);
+ spin_unlock(&dev->lock);
+ spin_unlock(&inode->inotify_lock);
+
+ return 0;
+}
+
+static int inotify_ioctl(struct inode *ip, struct file *file, unsigned int cmd,
+ unsigned long arg)
+{
+ struct inotify_device *dev;
+ struct inotify_watch_request request;
+ void __user *p;
+ int ret = -ENOTTY;
+ s32 wd;
+
+ dev = file->private_data;
+ p = (void __user *) arg;
+
+ get_inotify_dev(dev);
+
+ switch (cmd) {
+ case INOTIFY_WATCH:
+ if (unlikely(copy_from_user(&request, p, sizeof (request)))) {
+ ret = -EFAULT;
+ break;
+ }
+ ret = inotify_add_watch(dev, &request);
+ break;
+ case INOTIFY_IGNORE:
+ if (unlikely(copy_from_user(&wd, p, sizeof (wd)))) {
+ ret = -EFAULT;
+ break;
+ }
+ ret = inotify_ignore(dev, wd);
+ break;
+ case FIONREAD:
+ ret = put_user(dev->queue_size, (int __user *) p);
+ break;
+ }
+
+ put_inotify_dev(dev);
+
+ return ret;
+}
+
+static struct file_operations inotify_fops = {
+ .owner = THIS_MODULE,
+ .poll = inotify_poll,
+ .read = inotify_read,
+ .open = inotify_open,
+ .release = inotify_release,
+ .ioctl = inotify_ioctl,
+};
+
+static struct miscdevice inotify_device = {
+ .minor = MISC_DYNAMIC_MINOR,
+ .name = "inotify",
+ .fops = &inotify_fops,
+};
+
+/*
+ * inotify_init - Our initialization function. Note that we cannot return
+ * error because we have compiled-in VFS hooks. So an (unlikely) failure here
+ * must result in panic().
+ */
+static int __init inotify_init(void)
+{
+ struct class_device *class;
+ int ret;
+
+ ret = misc_register(&inotify_device);
+ if (unlikely(ret))
+ panic("inotify: misc_register returned %d\n", ret);
+
+ max_queued_events = 512;
+ max_user_devices = 64;
+ max_user_watches = 16384;
+
+ class = inotify_device.class;
+ class_device_create_file(class, &class_device_attr_max_queued_events);
+ class_device_create_file(class, &class_device_attr_max_user_devices);
+ class_device_create_file(class, &class_device_attr_max_user_watches);
+
+ atomic_set(&inotify_cookie, 0);
+
+ watch_cachep = kmem_cache_create("inotify_watch_cache",
+ sizeof(struct inotify_watch),
+ 0, SLAB_PANIC, NULL, NULL);
+ event_cachep = kmem_cache_create("inotify_event_cache",
+ sizeof(struct inotify_kernel_event),
+ 0, SLAB_PANIC, NULL, NULL);
+
+ printk(KERN_INFO "inotify device minor=%d\n", inotify_device.minor);
+
+ return 0;
+}
+
+module_init(inotify_init);
diff -urN linux-2.6.11-mm1/fs/Kconfig linux/fs/Kconfig
--- linux-2.6.11-mm1/fs/Kconfig 2005-03-04 13:23:55.000000000 -0500
+++ linux/fs/Kconfig 2005-03-08 12:02:28.227808776 -0500
@@ -344,6 +344,19 @@
If you don't know whether you need it, then you don't need it:
answer N.

+config INOTIFY
+ bool "Inotify file change notification support"
+ default y
+ ---help---
+ Say Y here to enable inotify support and the /dev/inotify character
+ device. Inotify is a file change notification system and a
+ replacement for dnotify. Inotify fixes numerous shortcomings in
+ dnotify and introduces several new features. It allows monitoring
+ of both files and directories via a single open fd. Multiple file
+ events are supported.
+
+ If unsure, say Y.
+
config QUOTA
bool "Quota support"
help
diff -urN linux-2.6.11-mm1/fs/Makefile linux/fs/Makefile
--- linux-2.6.11-mm1/fs/Makefile 2005-03-04 13:23:55.000000000 -0500
+++ linux/fs/Makefile 2005-03-08 12:02:28.228808624 -0500
@@ -11,6 +11,7 @@
attr.o bad_inode.o file.o filesystems.o namespace.o aio.o \
seq_file.o xattr.o libfs.o fs-writeback.o mpage.o direct-io.o \

+obj-$(CONFIG_INOTIFY) += inotify.o
obj-$(CONFIG_EPOLL) += eventpoll.o
obj-$(CONFIG_COMPAT) += compat.o

diff -urN linux-2.6.11-mm1/fs/namei.c linux/fs/namei.c
--- linux-2.6.11-mm1/fs/namei.c 2005-03-04 14:06:21.000000000 -0500
+++ linux/fs/namei.c 2005-03-08 12:02:28.230808320 -0500
@@ -21,7 +21,7 @@
#include <linux/namei.h>
#include <linux/quotaops.h>
#include <linux/pagemap.h>
-#include <linux/dnotify.h>
+#include <linux/fsnotify.h>
#include <linux/smp_lock.h>
#include <linux/personality.h>
#include <linux/security.h>
@@ -1261,7 +1261,7 @@
DQUOT_INIT(dir);
error = dir->i_op->create(dir, dentry, mode, nd);
if (!error) {
- inode_dir_notify(dir, DN_CREATE);
+ fsnotify_create(dir, dentry->d_name.name);
security_inode_post_create(dir, dentry, mode);
}
return error;
@@ -1566,7 +1566,7 @@
DQUOT_INIT(dir);
error = dir->i_op->mknod(dir, dentry, mode, dev);
if (!error) {
- inode_dir_notify(dir, DN_CREATE);
+ fsnotify_create(dir, dentry->d_name.name);
security_inode_post_mknod(dir, dentry, mode, dev);
}
return error;
@@ -1639,7 +1639,7 @@
DQUOT_INIT(dir);
error = dir->i_op->mkdir(dir, dentry, mode);
if (!error) {
- inode_dir_notify(dir, DN_CREATE);
+ fsnotify_mkdir(dir, dentry->d_name.name);
security_inode_post_mkdir(dir,dentry, mode);
}
return error;
@@ -1730,7 +1730,7 @@
}
up(&dentry->d_inode->i_sem);
if (!error) {
- inode_dir_notify(dir, DN_DELETE);
+ fsnotify_rmdir(dentry, dentry->d_inode, dir);
d_delete(dentry);
}
dput(dentry);
@@ -1803,9 +1803,10 @@

/* We don't d_delete() NFS sillyrenamed files--they still exist. */
if (!error && !(dentry->d_flags & DCACHE_NFSFS_RENAMED)) {
+ fsnotify_unlink(dentry, dir);
d_delete(dentry);
- inode_dir_notify(dir, DN_DELETE);
}
+
return error;
}

@@ -1879,7 +1880,7 @@
DQUOT_INIT(dir);
error = dir->i_op->symlink(dir, dentry, oldname);
if (!error) {
- inode_dir_notify(dir, DN_CREATE);
+ fsnotify_create(dir, dentry->d_name.name);
security_inode_post_symlink(dir, dentry, oldname);
}
return error;
@@ -1952,7 +1953,7 @@
error = dir->i_op->link(old_dentry, dir, new_dentry);
up(&old_dentry->d_inode->i_sem);
if (!error) {
- inode_dir_notify(dir, DN_CREATE);
+ fsnotify_create(dir, new_dentry->d_name.name);
security_inode_post_link(old_dentry, dir, new_dentry);
}
return error;
@@ -2116,6 +2117,7 @@
{
int error;
int is_dir = S_ISDIR(old_dentry->d_inode->i_mode);
+ char *old_name;

if (old_dentry->d_inode == new_dentry->d_inode)
return 0;
@@ -2137,18 +2139,18 @@
DQUOT_INIT(old_dir);
DQUOT_INIT(new_dir);

+ old_name = fsnotify_oldname_init(old_dentry);
+
if (is_dir)
error = vfs_rename_dir(old_dir,old_dentry,new_dir,new_dentry);
else
error = vfs_rename_other(old_dir,old_dentry,new_dir,new_dentry);
if (!error) {
- if (old_dir == new_dir)
- inode_dir_notify(old_dir, DN_RENAME);
- else {
- inode_dir_notify(old_dir, DN_DELETE);
- inode_dir_notify(new_dir, DN_CREATE);
- }
+ const char *new_name = old_dentry->d_name.name;
+ fsnotify_move(old_dir, new_dir, old_name, new_name);
}
+ fsnotify_oldname_free(old_name);
+
return error;
}

diff -urN linux-2.6.11-mm1/fs/open.c linux/fs/open.c
--- linux-2.6.11-mm1/fs/open.c 2005-03-04 14:06:21.000000000 -0500
+++ linux/fs/open.c 2005-03-08 12:02:28.232808016 -0500
@@ -10,7 +10,7 @@
#include <linux/file.h>
#include <linux/smp_lock.h>
#include <linux/quotaops.h>
-#include <linux/dnotify.h>
+#include <linux/fsnotify.h>
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/tty.h>
@@ -944,9 +944,11 @@
fd = get_unused_fd();
if (fd >= 0) {
struct file *f = filp_open(tmp, flags, mode);
+
error = PTR_ERR(f);
if (IS_ERR(f))
goto out_error;
+ fsnotify_open(f->f_dentry);
fd_install(fd, f);
}
out:
@@ -998,7 +1000,7 @@
retval = err;
}

- dnotify_flush(filp, id);
+ fsnotify_flush(filp, id);
locks_remove_posix(filp, id);
fput(filp);
return retval;
diff -urN linux-2.6.11-mm1/fs/read_write.c linux/fs/read_write.c
--- linux-2.6.11-mm1/fs/read_write.c 2005-03-04 14:06:21.000000000 -0500
+++ linux/fs/read_write.c 2005-03-08 12:02:28.233807864 -0500
@@ -10,7 +10,7 @@
#include <linux/file.h>
#include <linux/uio.h>
#include <linux/smp_lock.h>
-#include <linux/dnotify.h>
+#include <linux/fsnotify.h>
#include <linux/security.h>
#include <linux/module.h>
#include <linux/syscalls.h>
@@ -239,7 +239,7 @@
else
ret = do_sync_read(file, buf, count, pos);
if (ret > 0) {
- dnotify_parent(file->f_dentry, DN_ACCESS);
+ fsnotify_access(file->f_dentry);
current->rchar += ret;
}
current->syscr++;
@@ -287,7 +287,7 @@
else
ret = do_sync_write(file, buf, count, pos);
if (ret > 0) {
- dnotify_parent(file->f_dentry, DN_MODIFY);
+ fsnotify_modify(file->f_dentry);
current->wchar += ret;
}
current->syscw++;
@@ -523,9 +523,12 @@
out:
if (iov != iovstack)
kfree(iov);
- if ((ret + (type == READ)) > 0)
- dnotify_parent(file->f_dentry,
- (type == READ) ? DN_ACCESS : DN_MODIFY);
+ if ((ret + (type == READ)) > 0) {
+ if (type == READ)
+ fsnotify_access(file->f_dentry);
+ else
+ fsnotify_modify(file->f_dentry);
+ }
return ret;
Efault:
ret = -EFAULT;
diff -urN linux-2.6.11-mm1/fs/super.c linux/fs/super.c
--- linux-2.6.11-mm1/fs/super.c 2005-03-04 14:06:21.000000000 -0500
+++ linux/fs/super.c 2005-03-08 12:02:28.234807712 -0500
@@ -37,6 +37,7 @@
#include <linux/writeback.h> /* for the emergency remount stuff */
#include <linux/idr.h>
#include <linux/kobject.h>
+#include <linux/fsnotify.h>
#include <asm/uaccess.h>


@@ -229,6 +230,7 @@

if (root) {
sb->s_root = NULL;
+ fsnotify_sb_umount(sb);
shrink_dcache_parent(root);
shrink_dcache_anon(&sb->s_anon);
dput(root);
diff -urN linux-2.6.11-mm1/include/linux/fs.h linux/include/linux/fs.h
--- linux-2.6.11-mm1/include/linux/fs.h 2005-03-04 14:06:21.000000000 -0500
+++ linux/include/linux/fs.h 2005-03-08 12:02:28.237807256 -0500
@@ -223,6 +223,7 @@
struct kstatfs;
struct vm_area_struct;
struct vfsmount;
+struct inotify_inode_data;

/* Used to be a macro which just called the function, now just a function */
extern void update_atime (struct inode *);
@@ -473,6 +474,11 @@
struct dnotify_struct *i_dnotify; /* for directory notifications */
#endif

+#ifdef CONFIG_INOTIFY
+ struct list_head inotify_watches; /* watches on this inode */
+ spinlock_t inotify_lock; /* protects the watches list */
+#endif
+
unsigned long i_state;
unsigned long dirtied_when; /* jiffies of first dirtying */

@@ -1368,7 +1374,7 @@
extern int do_remount_sb(struct super_block *sb, int flags,
void *data, int force);
extern sector_t bmap(struct inode *, sector_t);
-extern int setattr_mask(unsigned int);
+extern void setattr_mask(unsigned int, int *, u32 *);
extern int notify_change(struct dentry *, struct iattr *);
extern int permission(struct inode *, int, struct nameidata *);
extern int generic_permission(struct inode *, int,
diff -urN linux-2.6.11-mm1/include/linux/fsnotify.h linux/include/linux/fsnotify.h
--- linux-2.6.11-mm1/include/linux/fsnotify.h 1969-12-31 19:00:00.000000000 -0500
+++ linux/include/linux/fsnotify.h 2005-03-08 12:02:28.238807104 -0500
@@ -0,0 +1,236 @@
+#ifndef _LINUX_FS_NOTIFY_H
+#define _LINUX_FS_NOTIFY_H
+
+/*
+ * include/linux/fsnotify.h - generic hooks for filesystem notification, to
+ * reduce in-source duplication from both dnotify and inotify.
+ *
+ * We don't compile any of this away in some complicated menagerie of ifdefs.
+ * Instead, we rely on the code inside to optimize away as needed.
+ *
+ * (C) Copyright 2005 Robert Love
+ */
+
+#ifdef __KERNEL__
+
+#include <linux/dnotify.h>
+#include <linux/inotify.h>
+
+/*
+ * fsnotify_move - file old_name at old_dir was moved to new_name at new_dir
+ */
+static inline void fsnotify_move(struct inode *old_dir, struct inode *new_dir,
+ const char *old_name, const char *new_name)
+{
+ u32 cookie;
+
+ if (old_dir == new_dir)
+ inode_dir_notify(old_dir, DN_RENAME);
+ else {
+ inode_dir_notify(old_dir, DN_DELETE);
+ inode_dir_notify(new_dir, DN_CREATE);
+ }
+
+ cookie = inotify_get_cookie();
+
+ inotify_inode_queue_event(old_dir, IN_MOVED_FROM, cookie, old_name);
+ inotify_inode_queue_event(new_dir, IN_MOVED_TO, cookie, new_name);
+}
+
+/*
+ * fsnotify_unlink - file was unlinked
+ */
+static inline void fsnotify_unlink(struct dentry *dentry, struct inode *dir)
+{
+ struct inode *inode = dentry->d_inode;
+
+ inode_dir_notify(dir, DN_DELETE);
+ inotify_inode_queue_event(dir, IN_DELETE_FILE, 0, dentry->d_name.name);
+ inotify_inode_queue_event(inode, IN_DELETE_SELF, 0, NULL);
+
+ inotify_inode_is_dead(inode);
+}
+
+/*
+ * fsnotify_rmdir - directory was removed
+ */
+static inline void fsnotify_rmdir(struct dentry *dentry, struct inode *inode,
+ struct inode *dir)
+{
+ inode_dir_notify(dir, DN_DELETE);
+ inotify_inode_queue_event(dir, IN_DELETE_SUBDIR,0,dentry->d_name.name);
+ inotify_inode_queue_event(inode, IN_DELETE_SELF, 0, NULL);
+
+ inotify_inode_is_dead(inode);
+}
+
+/*
+ * fsnotify_create - filename was linked in
+ */
+static inline void fsnotify_create(struct inode *inode, const char *filename)
+{
+ inode_dir_notify(inode, DN_CREATE);
+ inotify_inode_queue_event(inode, IN_CREATE_FILE, 0, filename);
+}
+
+/*
+ * fsnotify_mkdir - directory 'name' was created
+ */
+static inline void fsnotify_mkdir(struct inode *inode, const char *name)
+{
+ inode_dir_notify(inode, DN_CREATE);
+ inotify_inode_queue_event(inode, IN_CREATE_SUBDIR, 0, name);
+}
+
+/*
+ * fsnotify_access - file was read
+ */
+static inline void fsnotify_access(struct dentry *dentry)
+{
+ dnotify_parent(dentry, DN_ACCESS);
+ inotify_dentry_parent_queue_event(dentry, IN_ACCESS, 0,
+ dentry->d_name.name);
+ inotify_inode_queue_event(dentry->d_inode, IN_ACCESS, 0, NULL);
+}
+
+/*
+ * fsnotify_modify - file was modified
+ */
+static inline void fsnotify_modify(struct dentry *dentry)
+{
+ dnotify_parent(dentry, DN_MODIFY);
+ inotify_dentry_parent_queue_event(dentry, IN_MODIFY, 0,
+ dentry->d_name.name);
+ inotify_inode_queue_event(dentry->d_inode, IN_MODIFY, 0, NULL);
+}
+
+/*
+ * fsnotify_open - file was opened
+ */
+static inline void fsnotify_open(struct dentry *dentry)
+{
+ inotify_inode_queue_event(dentry->d_inode, IN_OPEN, 0, NULL);
+ inotify_dentry_parent_queue_event(dentry, IN_OPEN, 0,
+ dentry->d_name.name);
+}
+
+/*
+ * fsnotify_close - file was closed
+ */
+static inline void fsnotify_close(struct file *file)
+{
+ struct dentry *dentry = file->f_dentry;
+ struct inode *inode = dentry->d_inode;
+ const char *filename = dentry->d_name.name;
+ mode_t mode = file->f_mode;
+ u32 mask;
+
+ mask = (mode & FMODE_WRITE) ? IN_CLOSE_WRITE : IN_CLOSE_NOWRITE;
+ inotify_dentry_parent_queue_event(dentry, mask, 0, filename);
+ inotify_inode_queue_event(inode, mask, 0, NULL);
+}
+
+/*
+ * fsnotify_change - notify_change event. file was modified and/or metadata
+ * was changed.
+ */
+static inline void fsnotify_change(struct dentry *dentry, unsigned int ia_valid)
+{
+ int dn_mask = 0;
+ u32 in_mask = 0;
+
+ if (ia_valid & ATTR_UID) {
+ in_mask |= IN_ATTRIB;
+ dn_mask |= DN_ATTRIB;
+ }
+ if (ia_valid & ATTR_GID) {
+ in_mask |= IN_ATTRIB;
+ dn_mask |= DN_ATTRIB;
+ }
+ if (ia_valid & ATTR_SIZE) {
+ in_mask |= IN_MODIFY;
+ dn_mask |= DN_MODIFY;
+ }
+ /* both times implies a utime(s) call */
+ if ((ia_valid & (ATTR_ATIME | ATTR_MTIME)) == (ATTR_ATIME | ATTR_MTIME))
+ {
+ in_mask |= IN_ATTRIB;
+ dn_mask |= DN_ATTRIB;
+ } else if (ia_valid & ATTR_ATIME) {
+ in_mask |= IN_ACCESS;
+ dn_mask |= DN_ACCESS;
+ } else if (ia_valid & ATTR_MTIME) {
+ in_mask |= IN_MODIFY;
+ dn_mask |= DN_MODIFY;
+ }
+ if (ia_valid & ATTR_MODE) {
+ in_mask |= IN_ATTRIB;
+ dn_mask |= DN_ATTRIB;
+ }
+
+ if (dn_mask)
+ dnotify_parent(dentry, dn_mask);
+ if (in_mask) {
+ inotify_inode_queue_event(dentry->d_inode, in_mask, 0, NULL);
+ inotify_dentry_parent_queue_event(dentry, in_mask, 0,
+ dentry->d_name.name);
+ }
+}
+
+/*
+ * fsnotify_sb_umount - filesystem unmount
+ */
+static inline void fsnotify_sb_umount(struct super_block *sb)
+{
+ inotify_super_block_umount(sb);
+}
+
+/*
+ * fsnotify_flush - flush time!
+ */
+static inline void fsnotify_flush(struct file *filp, fl_owner_t id)
+{
+ dnotify_flush(filp, id);
+}
+
+#ifdef CONFIG_INOTIFY /* inotify helpers */
+
+/*
+ * fsnotify_oldname_init - save off the old filename before we change it
+ *
+ * this could be kstrdup if only we could add that to lib/string.c
+ */
+static inline char *fsnotify_oldname_init(struct dentry *old_dentry)
+{
+ char *old_name;
+
+ old_name = kmalloc(strlen(old_dentry->d_name.name) + 1, GFP_KERNEL);
+ if (old_name)
+ strcpy(old_name, old_dentry->d_name.name);
+ return old_name;
+}
+
+/*
+ * fsnotify_oldname_free - free the name we got from fsnotify_oldname_init
+ */
+static inline void fsnotify_oldname_free(const char *old_name)
+{
+ kfree(old_name);
+}
+
+#else /* CONFIG_INOTIFY */
+
+static inline char *fsnotify_oldname_init(struct dentry *old_dentry)
+{
+ return NULL;
+}
+
+static inline void fsnotify_oldname_free(const char *old_name)
+{
+}
+
+#endif /* ! CONFIG_INOTIFY */
+
+#endif /* __KERNEL__ */
+
+#endif /* _LINUX_FS_NOTIFY_H */
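
A note for user-space readers of the hooks above: fsnotify_move() hands the same cookie to both inotify_inode_queue_event() calls, and that cookie is what ties an IN_MOVED_FROM to its IN_MOVED_TO. Below is a minimal sketch of pairing the two on the receiving side. The names struct move_tracker and track_move() are hypothetical and not part of the patch, the two mask values are copied from the inotify.h in this patch, and the sketch assumes only one move is outstanding at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values as defined in this patch's include/linux/inotify.h */
#define IN_MOVED_FROM 0x00000040
#define IN_MOVED_TO   0x00000080

/* Hypothetical helper state: remembers one outstanding IN_MOVED_FROM. */
struct move_tracker {
	uint32_t cookie;	/* cookie of the pending IN_MOVED_FROM */
	int pending;		/* non-zero while a FROM awaits its TO */
};

/*
 * Feed each event's mask and cookie through here. Returns 1 when the
 * event is an IN_MOVED_TO whose cookie matches the pending IN_MOVED_FROM,
 * i.e. the pair describes a single rename. Returns 0 otherwise.
 */
static int track_move(struct move_tracker *t, uint32_t mask, uint32_t cookie)
{
	assert(t != NULL);

	if (mask & IN_MOVED_FROM) {
		t->cookie = cookie;
		t->pending = 1;
		return 0;
	}
	if ((mask & IN_MOVED_TO) && t->pending && t->cookie == cookie) {
		t->pending = 0;
		return 1;
	}
	return 0;
}
```

An IN_MOVED_TO with no matching pending cookie can then be treated as a plain create, and an IN_MOVED_FROM that never sees its partner as a delete, which is roughly what happens when only one of the two directories is watched.
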
diff -urN linux-2.6.11-mm1/include/linux/inotify.h linux/include/linux/inotify.h
--- linux-2.6.11-mm1/include/linux/inotify.h 1969-12-31 19:00:00.000000000 -0500
+++ linux/include/linux/inotify.h 2005-03-08 12:02:28.240806800 -0500
@@ -0,0 +1,113 @@
+/*
+ * Inode based directory notification for Linux
+ *
+ * Copyright (C) 2005 John McCutchan
+ */
+
+#ifndef _LINUX_INOTIFY_H
+#define _LINUX_INOTIFY_H
+
+#include <linux/types.h>
+#include <linux/limits.h>
+
+/*
+ * struct inotify_event - structure read from the inotify device for each event
+ *
+ * When you are watching a directory, you will receive the filename for events
+ * such as IN_CREATE, IN_DELETE, IN_OPEN, IN_CLOSE, ..., relative to the wd.
+ */
+struct inotify_event {
+ __s32 wd; /* watch descriptor */
+ __u32 mask; /* watch mask */
+ __u32 cookie; /* cookie to synchronize two events */
+ __u32 len; /* length (including nulls) of name */
+ char name[0]; /* stub for possible name */
+};
+
+/*
+ * struct inotify_watch_request - represents a watch request
+ *
+ * Pass to the inotify device via the INOTIFY_WATCH ioctl
+ */
+struct inotify_watch_request {
+ char *name; /* filename */
+ __u32 mask; /* event mask */
+};
+
+/* the following are legal, implemented events */
+#define IN_ACCESS 0x00000001 /* File was accessed */
+#define IN_MODIFY 0x00000002 /* File was modified */
+#define IN_ATTRIB 0x00000004 /* File changed attributes */
+#define IN_CLOSE_WRITE 0x00000008 /* Writable file was closed */
+#define IN_CLOSE_NOWRITE 0x00000010 /* Unwritable file was closed */
+#define IN_OPEN 0x00000020 /* File was opened */
+#define IN_MOVED_FROM 0x00000040 /* File was moved from X */
+#define IN_MOVED_TO 0x00000080 /* File was moved to Y */
+#define IN_DELETE_SUBDIR 0x00000100 /* Subdir was deleted */
+#define IN_DELETE_FILE 0x00000200 /* Subfile was deleted */
+#define IN_CREATE_SUBDIR 0x00000400 /* Subdir was created */
+#define IN_CREATE_FILE 0x00000800 /* Subfile was created */
+#define IN_DELETE_SELF 0x00001000 /* Self was deleted */
+#define IN_UNMOUNT 0x00002000 /* Backing fs was unmounted */
+#define IN_Q_OVERFLOW 0x00004000 /* Event queue overflowed */
+#define IN_IGNORED 0x00008000 /* File was ignored */
+
+/* special flags */
+#define IN_ALL_EVENTS 0xffffffff /* All the events */
+#define IN_CLOSE (IN_CLOSE_WRITE | IN_CLOSE_NOWRITE)
+
+#define INOTIFY_IOCTL_MAGIC 'Q'
+#define INOTIFY_IOCTL_MAXNR 2
+
+#define INOTIFY_WATCH _IOR(INOTIFY_IOCTL_MAGIC, 1, struct inotify_watch_request)
+#define INOTIFY_IGNORE _IOR(INOTIFY_IOCTL_MAGIC, 2, int)
+
+#ifdef __KERNEL__
+
+#include <linux/dcache.h>
+#include <linux/fs.h>
+#include <linux/config.h>
+#include <asm/atomic.h>
+
+#ifdef CONFIG_INOTIFY
+
+extern void inotify_inode_queue_event(struct inode *, __u32, __u32,
+ const char *);
+extern void inotify_dentry_parent_queue_event(struct dentry *, __u32, __u32,
+ const char *);
+extern void inotify_super_block_umount(struct super_block *);
+extern void inotify_inode_is_dead(struct inode *);
+extern u32 inotify_get_cookie(void);
+
+#else
+
+static inline void inotify_inode_queue_event(struct inode *inode,
+ __u32 mask, __u32 cookie,
+ const char *filename)
+{
+}
+
+static inline void inotify_dentry_parent_queue_event(struct dentry *dentry,
+ __u32 mask, __u32 cookie,
+ const char *filename)
+{
+}
+
+static inline void inotify_super_block_umount(struct super_block *sb)
+{
+}
+
+static inline void inotify_inode_is_dead(struct inode *inode)
+{
+}
+
+static inline u32 inotify_get_cookie(void)
+{
+ return 0;
+}
+
+#endif /* CONFIG_INOTIFY */
+
+#endif /* __KERNEL__ */
+
+#endif /* _LINUX_INOTIFY_H */
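
Since inotify_read() only hands out whole events, each one a struct inotify_event followed by event.len bytes of name, a reader has to walk the returned buffer record by record. Here is a rough sketch of that walk. It is not code from the patch: struct ev_header is a local mirror of struct inotify_event (assuming the same field order and widths), and walk_events() and ev_callback are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct inotify_event; the name bytes follow it. */
struct ev_header {
	int32_t  wd;		/* watch descriptor */
	uint32_t mask;		/* event mask */
	uint32_t cookie;	/* ties related events together */
	uint32_t len;		/* bytes of name that follow, nulls included */
};

/* Hypothetical per-event callback; 'name' is NULL when len is zero. */
typedef void (*ev_callback)(const struct ev_header *ev, const char *name,
			    void *ctx);

/*
 * Walk 'nbytes' of data as returned by read() on the device and call
 * 'cb' (if non-NULL) once per complete event. Returns the number of
 * complete events found and stops at the first truncated record.
 */
static size_t walk_events(const char *buf, size_t nbytes, ev_callback cb,
			  void *ctx)
{
	size_t off = 0, n = 0;

	assert(buf != NULL || nbytes == 0);

	while (nbytes - off >= sizeof(struct ev_header)) {
		struct ev_header ev;

		/* copy out the header to avoid unaligned access */
		memcpy(&ev, buf + off, sizeof(ev));
		off += sizeof(ev);
		if (ev.len > nbytes - off)
			break;	/* name would run past the data we have */
		if (cb)
			cb(&ev, ev.len ? buf + off : NULL, ctx);
		off += ev.len;
		n++;
	}
	return n;
}
```

A caller would pass the buffer and the byte count that read() returned. The truncation check is defensive only, since the kernel side above never copies out a partial event.
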
diff -urN linux-2.6.11-mm1/include/linux/sched.h linux/include/linux/sched.h
--- linux-2.6.11-mm1/include/linux/sched.h 2005-03-04 14:06:21.000000000 -0500
+++ linux/include/linux/sched.h 2005-03-08 12:02:28.241806648 -0500
@@ -411,6 +411,10 @@
atomic_t processes; /* How many processes does this user have? */
atomic_t files; /* How many open files does this user have? */
atomic_t sigpending; /* How many pending signals does this user have? */
+#ifdef CONFIG_INOTIFY
+ atomic_t inotify_watches; /* How many inotify watches does this user have? */
+ atomic_t inotify_devs; /* How many inotify devs does this user have opened? */
+#endif
/* protected by mq_lock */
unsigned long mq_bytes; /* How many bytes can be allocated to mqueue? */
unsigned long locked_shm; /* How many pages of mlocked shm ? */
diff -urN linux-2.6.11-mm1/kernel/user.c linux/kernel/user.c
--- linux-2.6.11-mm1/kernel/user.c 2005-03-04 14:06:21.000000000 -0500
+++ linux/kernel/user.c 2005-03-08 12:02:28.242806496 -0500
@@ -120,6 +120,10 @@
atomic_set(&new->processes, 0);
atomic_set(&new->files, 0);
atomic_set(&new->sigpending, 0);
+#ifdef CONFIG_INOTIFY
+ atomic_set(&new->inotify_watches, 0);
+ atomic_set(&new->inotify_devs, 0);
+#endif

new->mq_bytes = 0;
new->locked_shm = 0;
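
When testing a patch like this it helps to print event masks symbolically. The following is a small sketch of a decoder, not part of the patch: the bit values are copied from the IN_* definitions in the inotify.h above, while struct mask_name, in_names and describe_mask() are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct mask_name {
	uint32_t bit;
	const char *name;
};

/* Bit values copied from the patch's include/linux/inotify.h */
static const struct mask_name in_names[] = {
	{ 0x00000001, "IN_ACCESS" },
	{ 0x00000002, "IN_MODIFY" },
	{ 0x00000004, "IN_ATTRIB" },
	{ 0x00000008, "IN_CLOSE_WRITE" },
	{ 0x00000010, "IN_CLOSE_NOWRITE" },
	{ 0x00000020, "IN_OPEN" },
	{ 0x00000040, "IN_MOVED_FROM" },
	{ 0x00000080, "IN_MOVED_TO" },
	{ 0x00000100, "IN_DELETE_SUBDIR" },
	{ 0x00000200, "IN_DELETE_FILE" },
	{ 0x00000400, "IN_CREATE_SUBDIR" },
	{ 0x00000800, "IN_CREATE_FILE" },
	{ 0x00001000, "IN_DELETE_SELF" },
	{ 0x00002000, "IN_UNMOUNT" },
	{ 0x00004000, "IN_Q_OVERFLOW" },
	{ 0x00008000, "IN_IGNORED" },
};

/*
 * Write a '|'-separated list of the event names set in 'mask' into 'out'.
 * Returns how many names were written; stops early if 'out' fills up,
 * always leaving 'out' null-terminated.
 */
static int describe_mask(uint32_t mask, char *out, size_t outlen)
{
	size_t used = 0, i;
	int named = 0;

	assert(out != NULL);
	if (outlen == 0)
		return 0;
	out[0] = '\0';

	for (i = 0; i < sizeof(in_names) / sizeof(in_names[0]); i++) {
		int w;

		if (!(mask & in_names[i].bit))
			continue;
		w = snprintf(out + used, outlen - used, "%s%s",
			     named ? "|" : "", in_names[i].name);
		if (w < 0 || (size_t)w >= outlen - used) {
			out[used] = '\0';	/* drop the partial name */
			break;
		}
		used += (size_t)w;
		named++;
	}
	return named;
}
```

Bits above 0x8000 are silently ignored here, which matters because IN_ALL_EVENTS is defined as 0xffffffff.
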


2005-04-05 08:10:11

by Prakash Punnoor

Subject: Re: [patch] inotify for 2.6.11

Hi,

I am having a little trouble with inotify 0.22. Previous version worked w/o
trouble (even with nvidia and nvsound loaded) with 2.6.12-rc1-kb2 and gamin

Now I use 2.6.12-rc2 with inotify 0.22 and got this after a few minutes of
uptime (compiling some stuff):

Apr 5 09:40:43 tachyon Unable to handle kernel NULL pointer dereference at
virtual address 00000010
Apr 5 09:40:43 tachyon printing eip:
Apr 5 09:40:43 tachyon b0181d83
Apr 5 09:40:43 tachyon *pde = 00000000
Apr 5 09:40:43 tachyon Oops: 0002 [#1]
Apr 5 09:40:43 tachyon PREEMPT
Apr 5 09:40:43 tachyon Modules linked in: nvidia nvsound
Apr 5 09:40:43 tachyon CPU: 0
Apr 5 09:40:43 tachyon EIP: 0060:[<b0181d83>] Tainted: P VLI
Apr 5 09:40:43 tachyon EFLAGS: 00210286 (2.6.12-rc2)
Apr 5 09:40:43 tachyon EIP is at inotify_ignore+0x23/0xd0
Apr 5 09:40:43 tachyon eax: 00000000 ebx: b998ce60 ecx: 00000000 edx:
00000000
Apr 5 09:40:43 tachyon esi: 00000000 edi: b998ce60 ebp: 80045102 esp:
b336bf48
Apr 5 09:40:43 tachyon ds: 007b es: 007b ss: 0068
Apr 5 09:40:43 tachyon Process gam_server (pid: 4144, threadinfo=b336a000
task=ccf84aa0)
Apr 5 09:40:43 tachyon Stack: b998ce60 b998ce98 08062484 b0181ebf b336bf78
b336bf74 0000003e b0181e30
Apr 5 09:40:43 tachyon c1168b00 08062484 80045102 b016c40b 00000000 c1168b00
c1168b00 00000003
Apr 5 09:40:43 tachyon b336a000 b016c640 00000000 080623d8 c1168b00 00000003
fffffff7 b336a000
Apr 5 09:40:43 tachyon Call Trace:
Apr 5 09:40:43 tachyon [<b0181ebf>] inotify_ioctl+0x8f/0xd0
Apr 5 09:40:43 tachyon [<b0181e30>] inotify_ioctl+0x0/0xd0
Apr 5 09:40:43 tachyon [<b016c40b>] do_ioctl+0x2b/0xa0
Apr 5 09:40:43 tachyon [<b016c640>] vfs_ioctl+0x60/0x210
Apr 5 09:40:43 tachyon [<b016c82d>] sys_ioctl+0x3d/0x70
Apr 5 09:40:43 tachyon [<b0103127>] sysenter_past_esp+0x54/0x75
Apr 5 09:40:43 tachyon Code: 89 58 20 eb c2 8d 76 00 83 ec 0c 89 7c 24 08 89
1c 24 89 c7 89 74 24 04 ff 4f 28 0f 88 2b 03 00 00 8d 47 08 e8 ef f9 12 00 89
c6 <ff> 40 10 ff 47 28 0f 8e 22 03 00 00 85 f6 ba ea ff ff ff 74 65

Cheers,

Prakash


2005-04-05 16:15:41

by Robert Love

Subject: Re: [patch] inotify for 2.6.11

On Tue, 2005-04-05 at 09:58 +0200, Prakash Punnoor wrote:

> I am having a little trouble with inotify 0.22. Previous version worked w/o
> trouble (even with nvidia and nvsound loaded) with 2.6.12-rc1-kb2 and gamin
>
> Now I use 2.6.12-rc2 with inotify 0.22 and got this after a few minutes of
> uptime (compiling some stuff):

Ah, thanks. That was not even related to the semaphore rewrite, but a
small bug fix I slipped in. But of course.

Gamin is an interesting test case for us because it does so many
ignores.

Anyhow, this should fix it. Confirm?

Thanks,

Robert Love


inotify!

inotify is intended to correct the deficiencies of dnotify, particularly
its inability to scale and its terrible user interface:

* dnotify requires the opening of one fd per each directory
that you intend to watch. This quickly results in too many
open files and pins removable media, preventing unmount.
* dnotify is directory-based. You only learn about changes to
directories. Sure, a change to a file in a directory affects
the directory, but you are then forced to keep a cache of
stat structures.
* dnotify's interface to user-space is awful. Signals?

inotify provides a more usable, simple, powerful solution to file change
notification:

* inotify's interface is a device node, not SIGIO. You open a
single fd to the device node, which is select()-able.
* inotify has an event that says "the filesystem that the item
you were watching is on was unmounted."
* inotify can watch directories or files.

Inotify is currently used by Beagle (a desktop search infrastructure)
and Gamin (a FAM replacement).

Signed-off-by: Robert Love <[email protected]>

Documentation/filesystems/inotify.txt | 81 ++
fs/Kconfig | 13
fs/Makefile | 1
fs/attr.c | 33 -
fs/compat.c | 12
fs/file_table.c | 3
fs/inode.c | 6
fs/inotify.c | 961 ++++++++++++++++++++++++++++++++++
fs/namei.c | 30 -
fs/open.c | 6
fs/read_write.c | 15
include/linux/fs.h | 6
include/linux/fsnotify.h | 228 ++++++++
include/linux/inotify.h | 111 +++
include/linux/sched.h | 4
kernel/user.c | 4
16 files changed, 1458 insertions(+), 56 deletions(-)

diff -urN linux-2.6.12-rc1/Documentation/filesystems/inotify.txt linux/Documentation/filesystems/inotify.txt
--- linux-2.6.12-rc1/Documentation/filesystems/inotify.txt 1969-12-31 19:00:00.000000000 -0500
+++ linux/Documentation/filesystems/inotify.txt 2005-04-04 16:26:15.000000000 -0400
@@ -0,0 +1,81 @@
+ inotify
+ a powerful yet simple file change notification system
+
+
+
+Document started 15 Mar 2005 by Robert Love <[email protected]>
+
+(i) User Interface
+
+Inotify is controlled by a device node, /dev/inotify. If you do not use udev,
+this device may need to be created manually. The first step is to open it:
+
+ int dev_fd = open ("/dev/inotify", O_RDONLY);
+
+Change events are managed by "watches". A watch is an (object,mask) pair where
+the object is a file or directory and the mask is a bitmask of one or more
+inotify events that the application wishes to receive. See <linux/inotify.h>
+for valid events. A watch is referenced by a watch descriptor, or wd.
+
+Watches are added via a file descriptor.
+
+Watches on a directory will return events on any files inside of the directory.
+
+Adding a watch is simple:
+
+ /* 'wd' represents the watch on fd with mask */
+ struct inotify_request req = { fd, mask };
+ int wd = ioctl (dev_fd, INOTIFY_WATCH, &req);
+
+You can add a large number of files via something like
+
+ for each file to watch {
+ struct inotify_request req;
+ int file_fd;
+
+ file_fd = open (file, O_RDONLY);
+ if (file_fd < 0) {
+ perror ("open");
+ break;
+ }
+
+ req.fd = file_fd;
+ req.mask = mask;
+
+ wd = ioctl (dev_fd, INOTIFY_WATCH, &req);
+
+ close (file_fd);
+ }
+
+You can update an existing watch in the same manner, by passing in a new mask.
+
+An existing watch is removed via the INOTIFY_IGNORE ioctl, for example
+
+ ioctl (dev_fd, INOTIFY_IGNORE, wd);
+
+Events are provided in the form of an inotify_event structure that is read(2)
+from /dev/inotify. The filename is of dynamic length and follows the struct.
+It is of size len. The filename is padded with null bytes to ensure proper
+alignment. This padding is reflected in len.
+
+You can slurp multiple events by passing a large buffer, for example:
+
+ ssize_t len = read (dev_fd, buf, BUF_LEN);
+
+This returns as many events as are available and fit in BUF_LEN.
+
+/dev/inotify also works with select() and poll().
+
+You can get the size of the event queue, in bytes, via the FIONREAD ioctl.
+
+All watches are destroyed and cleaned up on close.
+
+
+(ii) Internal Kernel Implementation
+
+Each open inotify device is associated with an inotify_device structure.
+
+Each watch is associated with an inotify_watch structure. Watches are chained
+off of each associated device and each associated inode.
+
+See fs/inotify.c for the locking and lifetime rules.
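The read(2) layout described in inotify.txt above (a fixed header followed by len bytes of null-padded filename) can be walked in user space roughly as follows. This is only a sketch: struct example_event and example_next_event() are hypothetical names, and while the field names wd, mask, cookie and len are the ones fs/inotify.c uses, the field types are an assumption. Real code would use struct inotify_event from <linux/inotify.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical mirror of the event header. Field names follow the
 * accesses in fs/inotify.c; the exact types are assumed for this sketch.
 */
struct example_event {
	int32_t  wd;      /* watch descriptor */
	uint32_t mask;    /* event bits */
	uint32_t cookie;  /* ties related events together, or zero */
	uint32_t len;     /* bytes of padded name that follow the header */
};

/*
 * Decode the event that starts at *off in a buffer holding nbytes of
 * read() output. On success, fills *ev, points *name at the filename
 * (NULL when len is zero), advances *off past the record and returns 1.
 * Returns 0 when no complete event remains.
 */
static int example_next_event(const char *buf, size_t nbytes, size_t *off,
			      struct example_event *ev, const char **name)
{
	size_t pos;

	assert(buf && off && ev && name);
	pos = *off;

	if (pos > nbytes || nbytes - pos < sizeof(*ev))
		return 0;

	/* memcpy avoids unaligned loads from the byte buffer */
	memcpy(ev, buf + pos, sizeof(*ev));
	pos += sizeof(*ev);

	if (ev->len > nbytes - pos)
		return 0;	/* name would run past the data we have */

	*name = ev->len ? buf + pos : NULL;
	*off = pos + ev->len;	/* len already includes the null padding */
	return 1;
}
```

A caller would typically start with an offset of zero after each read() and call example_next_event() in a loop until it returns 0.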
diff -urN linux-2.6.12-rc1/fs/attr.c linux/fs/attr.c
--- linux-2.6.12-rc1/fs/attr.c 2005-03-23 11:23:20.000000000 -0500
+++ linux/fs/attr.c 2005-04-04 16:26:15.000000000 -0400
@@ -10,7 +10,7 @@
#include <linux/mm.h>
#include <linux/string.h>
#include <linux/smp_lock.h>
-#include <linux/dnotify.h>
+#include <linux/fsnotify.h>
#include <linux/fcntl.h>
#include <linux/quotaops.h>
#include <linux/security.h>
@@ -107,31 +107,8 @@
out:
return error;
}
-
EXPORT_SYMBOL(inode_setattr);

-int setattr_mask(unsigned int ia_valid)
-{
- unsigned long dn_mask = 0;
-
- if (ia_valid & ATTR_UID)
- dn_mask |= DN_ATTRIB;
- if (ia_valid & ATTR_GID)
- dn_mask |= DN_ATTRIB;
- if (ia_valid & ATTR_SIZE)
- dn_mask |= DN_MODIFY;
- /* both times implies a utime(s) call */
- if ((ia_valid & (ATTR_ATIME|ATTR_MTIME)) == (ATTR_ATIME|ATTR_MTIME))
- dn_mask |= DN_ATTRIB;
- else if (ia_valid & ATTR_ATIME)
- dn_mask |= DN_ACCESS;
- else if (ia_valid & ATTR_MTIME)
- dn_mask |= DN_MODIFY;
- if (ia_valid & ATTR_MODE)
- dn_mask |= DN_ATTRIB;
- return dn_mask;
-}
-
int notify_change(struct dentry * dentry, struct iattr * attr)
{
struct inode *inode = dentry->d_inode;
@@ -194,11 +171,9 @@
if (ia_valid & ATTR_SIZE)
up_write(&dentry->d_inode->i_alloc_sem);

- if (!error) {
- unsigned long dn_mask = setattr_mask(ia_valid);
- if (dn_mask)
- dnotify_parent(dentry, dn_mask);
- }
+ if (!error)
+ fsnotify_change(dentry, ia_valid);
+
return error;
}

diff -urN linux-2.6.12-rc1/fs/compat.c linux/fs/compat.c
--- linux-2.6.12-rc1/fs/compat.c 2005-03-23 11:24:04.000000000 -0500
+++ linux/fs/compat.c 2005-04-04 16:26:15.000000000 -0400
@@ -36,7 +36,7 @@
#include <linux/ctype.h>
#include <linux/module.h>
#include <linux/dirent.h>
-#include <linux/dnotify.h>
+#include <linux/fsnotify.h>
#include <linux/highuid.h>
#include <linux/sunrpc/svc.h>
#include <linux/nfsd/nfsd.h>
@@ -1233,9 +1233,13 @@
out:
if (iov != iovstack)
kfree(iov);
- if ((ret + (type == READ)) > 0)
- dnotify_parent(file->f_dentry,
- (type == READ) ? DN_ACCESS : DN_MODIFY);
+ if ((ret + (type == READ)) > 0) {
+ struct dentry *dentry = file->f_dentry;
+ if (type == READ)
+ fsnotify_access(dentry);
+ else
+ fsnotify_modify(dentry);
+ }
return ret;
}

diff -urN linux-2.6.12-rc1/fs/file_table.c linux/fs/file_table.c
--- linux-2.6.12-rc1/fs/file_table.c 2005-03-23 11:23:20.000000000 -0500
+++ linux/fs/file_table.c 2005-04-04 16:26:15.000000000 -0400
@@ -16,6 +16,7 @@
#include <linux/eventpoll.h>
#include <linux/mount.h>
#include <linux/cdev.h>
+#include <linux/fsnotify.h>

/* sysctl tunables... */
struct files_stat_struct files_stat = {
@@ -123,6 +124,8 @@
struct inode *inode = dentry->d_inode;

might_sleep();
+
+ fsnotify_close(file);
/*
* The function eventpoll_release() should be the first called
* in the file cleanup chain.
diff -urN linux-2.6.12-rc1/fs/inode.c linux/fs/inode.c
--- linux-2.6.12-rc1/fs/inode.c 2005-03-23 11:24:04.000000000 -0500
+++ linux/fs/inode.c 2005-04-04 16:26:15.000000000 -0400
@@ -21,6 +21,7 @@
#include <linux/pagemap.h>
#include <linux/cdev.h>
#include <linux/bootmem.h>
+#include <linux/inotify.h>

/*
* This is needed for the following functions:
@@ -130,6 +131,10 @@
#ifdef CONFIG_QUOTA
memset(&inode->i_dquot, 0, sizeof(inode->i_dquot));
#endif
+#ifdef CONFIG_INOTIFY
+ INIT_LIST_HEAD(&inode->inotify_watches);
+ sema_init(&inode->inotify_sem, 1);
+#endif
inode->i_pipe = NULL;
inode->i_bdev = NULL;
inode->i_cdev = NULL;
@@ -356,6 +361,7 @@

down(&iprune_sem);
spin_lock(&inode_lock);
+ inotify_unmount_inodes(&sb->s_inodes);
busy = invalidate_list(&sb->s_inodes, &throw_away);
spin_unlock(&inode_lock);

diff -urN linux-2.6.12-rc1/fs/inotify.c linux/fs/inotify.c
--- linux-2.6.12-rc1/fs/inotify.c 1969-12-31 19:00:00.000000000 -0500
+++ linux/fs/inotify.c 2005-04-05 12:08:28.000000000 -0400
@@ -0,0 +1,961 @@
+/*
+ * fs/inotify.c - inode-based file event notifications
+ *
+ * Authors:
+ * John McCutchan <[email protected]>
+ * Robert Love <[email protected]>
+ *
+ * Copyright (C) 2005 John McCutchan
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2, or (at your option) any
+ * later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/spinlock.h>
+#include <linux/idr.h>
+#include <linux/slab.h>
+#include <linux/fs.h>
+#include <linux/file.h>
+#include <linux/namei.h>
+#include <linux/poll.h>
+#include <linux/device.h>
+#include <linux/miscdevice.h>
+#include <linux/init.h>
+#include <linux/list.h>
+#include <linux/writeback.h>
+#include <linux/inotify.h>
+
+#include <asm/ioctls.h>
+
+static atomic_t inotify_cookie;
+
+static kmem_cache_t *watch_cachep;
+static kmem_cache_t *event_cachep;
+
+static int max_user_devices;
+static int max_user_watches;
+static unsigned int max_queued_events;
+
+/*
+ * Lock ordering:
+ *
+ * dentry->d_lock (used to keep d_move() away from dentry->d_parent)
+ * iprune_sem (synchronize versus shrink_icache_memory())
+ * inode_lock (protects the super_block->s_inodes list)
+ * inode->inotify_sem (protects inode->inotify_watches and watches->i_list)
+ * inotify_dev->sem (protects inotify_device and watches->d_list)
+ */
+
+/*
+ * Lifetimes of the three main data structures--inotify_device, inode, and
+ * inotify_watch--are managed by reference count.
+ *
+ * inotify_device: Lifetime is from open until release. Additional references
+ * can bump the count via get_inotify_dev() and drop the count via
+ * put_inotify_dev().
+ *
+ * inotify_watch: Lifetime is from create_watch() to remove_watch().
+ * Additional references can bump the count via get_inotify_watch() and drop
+ * the count via put_inotify_watch().
+ *
+ * inode: Pinned so long as the inode is associated with a watch, from
+ * create_watch() to put_inotify_watch().
+ */
+
+/*
+ * struct inotify_device - represents an open instance of an inotify device
+ *
+ * This structure is protected by the semaphore 'sem'.
+ */
+struct inotify_device {
+ wait_queue_head_t wq; /* wait queue for i/o */
+ struct idr idr; /* idr mapping wd -> watch */
+ struct semaphore sem; /* protects this bad boy */
+ struct list_head events; /* list of queued events */
+ struct list_head watches; /* list of watches */
+ atomic_t count; /* reference count */
+ struct user_struct *user; /* user who opened this dev */
+ unsigned int queue_size; /* size of the queue (bytes) */
+ unsigned int event_count; /* number of pending events */
+ unsigned int max_events; /* maximum number of events */
+};
+
+/*
+ * struct inotify_kernel_event - An inotify event, originating from a watch and
+ * queued for user-space. A list of these is attached to each instance of the
+ * device. In read(), this list is walked and all events that can fit in the
+ * buffer are returned.
+ *
+ * Protected by dev->sem of the device in which we are queued.
+ */
+struct inotify_kernel_event {
+ struct inotify_event event; /* the user-space event */
+ struct list_head list; /* entry in inotify_device's list */
+ char *name; /* filename, if any */
+};
+
+/*
+ * struct inotify_watch - represents a watch request on a specific inode
+ *
+ * d_list is protected by dev->sem of the associated watch->dev.
+ * i_list and mask are protected by inode->inotify_sem of the associated inode.
+ * dev, inode, and wd are never written to once the watch is created.
+ */
+struct inotify_watch {
+ struct list_head d_list; /* entry in inotify_device's list */
+ struct list_head i_list; /* entry in inode's list */
+ atomic_t count; /* reference count */
+ struct inotify_device *dev; /* associated device */
+ struct inode *inode; /* associated inode */
+ s32 wd; /* watch descriptor */
+ u32 mask; /* event mask for this watch */
+};
+
+static ssize_t show_max_queued_events(struct class_device *class, char *buf)
+{
+ return sprintf(buf, "%u\n", max_queued_events);
+}
+
+static ssize_t store_max_queued_events(struct class_device *class,
+ const char *buf, size_t count)
+{
+ unsigned int max;
+
+ if (sscanf(buf, "%u", &max) > 0 && max > 0) {
+ max_queued_events = max;
+ return strlen(buf);
+ }
+ return -EINVAL;
+}
+
+static ssize_t show_max_user_devices(struct class_device *class, char *buf)
+{
+ return sprintf(buf, "%d\n", max_user_devices);
+}
+
+static ssize_t store_max_user_devices(struct class_device *class,
+ const char *buf, size_t count)
+{
+ int max;
+
+ if (sscanf(buf, "%d", &max) > 0 && max > 0) {
+ max_user_devices = max;
+ return strlen(buf);
+ }
+ return -EINVAL;
+}
+
+static ssize_t show_max_user_watches(struct class_device *class, char *buf)
+{
+ return sprintf(buf, "%d\n", max_user_watches);
+}
+
+static ssize_t store_max_user_watches(struct class_device *class,
+ const char *buf, size_t count)
+{
+ int max;
+
+ if (sscanf(buf, "%d", &max) > 0 && max > 0) {
+ max_user_watches = max;
+ return strlen(buf);
+ }
+ return -EINVAL;
+}
+
+static CLASS_DEVICE_ATTR(max_queued_events, S_IRUGO | S_IWUSR,
+ show_max_queued_events, store_max_queued_events);
+static CLASS_DEVICE_ATTR(max_user_devices, S_IRUGO | S_IWUSR,
+ show_max_user_devices, store_max_user_devices);
+static CLASS_DEVICE_ATTR(max_user_watches, S_IRUGO | S_IWUSR,
+ show_max_user_watches, store_max_user_watches);
+
+static inline void get_inotify_dev(struct inotify_device *dev)
+{
+ atomic_inc(&dev->count);
+}
+
+static inline void put_inotify_dev(struct inotify_device *dev)
+{
+ if (atomic_dec_and_test(&dev->count)) {
+ atomic_dec(&dev->user->inotify_devs);
+ free_uid(dev->user);
+ kfree(dev);
+ }
+}
+
+static inline void get_inotify_watch(struct inotify_watch *watch)
+{
+ atomic_inc(&watch->count);
+}
+
+/*
+ * put_inotify_watch - decrements the ref count on a given watch. cleans up
+ * the watch and its references if the count reaches zero.
+ */
+static inline void put_inotify_watch(struct inotify_watch *watch)
+{
+ if (atomic_dec_and_test(&watch->count)) {
+ put_inotify_dev(watch->dev);
+ iput(watch->inode);
+ kmem_cache_free(watch_cachep, watch);
+ }
+}
+
+/*
+ * kernel_event - create a new kernel event with the given parameters
+ *
+ * This function can sleep.
+ */
+static struct inotify_kernel_event * kernel_event(s32 wd, u32 mask, u32 cookie,
+ const char *name)
+{
+ struct inotify_kernel_event *kevent;
+
+ kevent = kmem_cache_alloc(event_cachep, GFP_KERNEL);
+ if (unlikely(!kevent))
+ return NULL;
+
+ /* we hand this out to user-space, so zero it just in case */
+ memset(&kevent->event, 0, sizeof(struct inotify_event));
+
+ kevent->event.wd = wd;
+ kevent->event.mask = mask;
+ kevent->event.cookie = cookie;
+
+ INIT_LIST_HEAD(&kevent->list);
+
+ if (name) {
+ size_t len, rem, event_size = sizeof(struct inotify_event);
+
+ /*
+ * We need to pad the filename so as to properly align an
+ * array of inotify_event structures. Because the structure is
+ * small and the common case is a small filename, we just round
+ * up to the next multiple of the structure's sizeof. This is
+ * simple and safe for all architectures.
+ */
+ len = strlen(name) + 1;
+ rem = event_size - len;
+ if (len > event_size) {
+ rem = event_size - (len % event_size);
+ if (len % event_size == 0)
+ rem = 0;
+ }
+ len += rem;
+
+ kevent->name = kmalloc(len, GFP_KERNEL);
+ if (unlikely(!kevent->name)) {
+ kmem_cache_free(event_cachep, kevent);
+ return NULL;
+ }
+ memset(kevent->name, 0, len);
+ strncpy(kevent->name, name, strlen(name));
+ kevent->event.len = len;
+ } else {
+ kevent->event.len = 0;
+ kevent->name = NULL;
+ }
+
+ return kevent;
+}
+
+/*
+ * inotify_dev_get_event - return the next event in the given dev's queue
+ *
+ * Caller must hold dev->sem.
+ */
+static inline struct inotify_kernel_event *
+inotify_dev_get_event(struct inotify_device *dev)
+{
+ return list_entry(dev->events.next, struct inotify_kernel_event, list);
+}
+
+/*
+ * inotify_dev_queue_event - add a new event to the given device
+ *
+ * Caller must hold dev->sem. Can sleep (calls kernel_event()).
+ */
+static void inotify_dev_queue_event(struct inotify_device *dev,
+ struct inotify_watch *watch, u32 mask,
+ u32 cookie, const char *name)
+{
+ struct inotify_kernel_event *kevent, *last;
+
+ /* coalescing: drop this event if it is a dupe of the previous */
+ last = list_entry(dev->events.prev, struct inotify_kernel_event, list);
+ if (dev->event_count && last->event.mask == mask &&
+ last->event.cookie == cookie &&
+ last->event.wd == watch->wd) {
+ const char *lastname = last->name;
+
+ if (!name && !lastname)
+ return;
+ if (name && lastname && !strcmp(lastname, name))
+ return;
+ }
+
+ /* the queue overflowed and we already sent the Q_OVERFLOW event */
+ if (unlikely(dev->event_count > dev->max_events))
+ return;
+
+ /* if the queue overflows, we need to notify user space */
+ if (unlikely(dev->event_count == dev->max_events))
+ kevent = kernel_event(-1, IN_Q_OVERFLOW, cookie, NULL);
+ else
+ kevent = kernel_event(watch->wd, mask, cookie, name);
+
+ if (unlikely(!kevent))
+ return;
+
+ /* queue the event and wake up anyone waiting */
+ dev->event_count++;
+ dev->queue_size += sizeof(struct inotify_event) + kevent->event.len;
+ list_add_tail(&kevent->list, &dev->events);
+ wake_up_interruptible(&dev->wq);
+}
+
+/*
+ * remove_kevent - cleans up and ultimately frees the given kevent
+ *
+ * Caller must hold dev->sem.
+ */
+static void remove_kevent(struct inotify_device *dev,
+ struct inotify_kernel_event *kevent)
+{
+ list_del(&kevent->list);
+
+ dev->event_count--;
+ dev->queue_size -= sizeof(struct inotify_event) + kevent->event.len;
+
+ kfree(kevent->name);
+ kmem_cache_free(event_cachep, kevent);
+}
+
+/*
+ * inotify_dev_event_dequeue - destroy an event on the given device
+ *
+ * Caller must hold dev->sem.
+ */
+static void inotify_dev_event_dequeue(struct inotify_device *dev)
+{
+ if (!list_empty(&dev->events)) {
+ struct inotify_kernel_event *kevent;
+ kevent = inotify_dev_get_event(dev);
+ remove_kevent(dev, kevent);
+ }
+}
+
+/*
+ * inotify_dev_get_wd - returns the next WD for use by the given dev
+ *
+ * Callers must hold dev->sem. This function can sleep.
+ */
+static int inotify_dev_get_wd(struct inotify_device *dev,
+ struct inotify_watch *watch)
+{
+ int ret;
+
+ do {
+ if (unlikely(!idr_pre_get(&dev->idr, GFP_KERNEL)))
+ return -ENOSPC;
+ ret = idr_get_new(&dev->idr, watch, &watch->wd);
+ } while (ret == -EAGAIN);
+
+ return ret;
+}
+
+/*
+ * create_watch - creates a watch on the given device.
+ *
+ * Callers must hold dev->sem. Calls inotify_dev_get_wd() so may sleep.
+ * Both 'dev' and 'inode' (by way of nameidata) need to be pinned.
+ */
+static struct inotify_watch *create_watch(struct inotify_device *dev,
+ u32 mask, struct inode *inode)
+{
+ struct inotify_watch *watch;
+ int ret;
+
+ if (atomic_read(&dev->user->inotify_watches) >= max_user_watches)
+ return ERR_PTR(-ENOSPC);
+
+ watch = kmem_cache_alloc(watch_cachep, GFP_KERNEL);
+ if (unlikely(!watch))
+ return ERR_PTR(-ENOMEM);
+
+ ret = inotify_dev_get_wd(dev, watch);
+ if (unlikely(ret)) {
+ kmem_cache_free(watch_cachep, watch);
+ return ERR_PTR(ret);
+ }
+
+ watch->mask = mask;
+ atomic_set(&watch->count, 0);
+ INIT_LIST_HEAD(&watch->d_list);
+ INIT_LIST_HEAD(&watch->i_list);
+
+ /* save a reference to device and bump the count to make it official */
+ get_inotify_dev(dev);
+ watch->dev = dev;
+
+ /*
+ * Save a reference to the inode and bump the ref count to make it
+ * official. We hold a reference to nameidata, which makes this safe.
+ */
+ watch->inode = igrab(inode);
+
+ /* bump our own count, corresponding to our entry in dev->watches */
+ get_inotify_watch(watch);
+
+ atomic_inc(&dev->user->inotify_watches);
+
+ return watch;
+}
+
+/*
+ * inode_find_dev - find the watch associated with the given inode and dev
+ *
+ * Callers must hold inode->inotify_sem.
+ */
+static struct inotify_watch *inode_find_dev(struct inode *inode,
+ struct inotify_device *dev)
+{
+ struct inotify_watch *watch;
+
+ list_for_each_entry(watch, &inode->inotify_watches, i_list) {
+ if (watch->dev == dev)
+ return watch;
+ }
+
+ return NULL;
+}
+
+/*
+ * inotify_dev_is_watching_inode - is this device watching this inode?
+ *
+ * Callers must hold dev->sem.
+ */
+static inline int inotify_dev_is_watching_inode(struct inotify_device *dev,
+ struct inode *inode)
+{
+ struct inotify_watch *watch;
+
+ list_for_each_entry(watch, &dev->watches, d_list) {
+ if (watch->inode == inode)
+ return 1;
+ }
+
+ return 0;
+}
+
+/*
+ * remove_watch_no_event - remove_watch() without the IN_IGNORED event.
+ */
+static void remove_watch_no_event(struct inotify_watch *watch,
+ struct inotify_device *dev)
+{
+ list_del(&watch->i_list);
+ list_del(&watch->d_list);
+
+ atomic_dec(&dev->user->inotify_watches);
+ idr_remove(&dev->idr, watch->wd);
+ put_inotify_watch(watch);
+}
+
+/*
+ * remove_watch - Remove a watch from both the device and the inode. Sends
+ * the IN_IGNORED event to the given device signifying that the inode is no
+ * longer watched.
+ *
+ * Callers must hold both inode->inotify_sem and dev->sem. We drop the
+ * reference that corresponds to the watch's entry in dev->watches.
+ *
+ * The inode itself is released by the iput() in put_inotify_watch() once the
+ * last reference to the watch goes away, which may be after we return.
+ */
+static void remove_watch(struct inotify_watch *watch,struct inotify_device *dev)
+{
+ inotify_dev_queue_event(dev, watch, IN_IGNORED, 0, NULL);
+ remove_watch_no_event(watch, dev);
+}
+
+/* Kernel API */
+
+/**
+ * inotify_inode_queue_event - queue an event to all watches on this inode
+ * @inode: inode event is originating from
+ * @mask: event mask describing this event
+ * @cookie: cookie for synchronization, or zero
+ * @name: filename, if any
+ */
+void inotify_inode_queue_event(struct inode *inode, u32 mask, u32 cookie,
+ const char *name)
+{
+ struct inotify_watch *watch;
+
+ down(&inode->inotify_sem);
+ list_for_each_entry(watch, &inode->inotify_watches, i_list) {
+ if (watch->mask & mask) {
+ struct inotify_device *dev = watch->dev;
+ down(&dev->sem);
+ inotify_dev_queue_event(dev, watch, mask, cookie, name);
+ up(&dev->sem);
+ }
+ }
+ up(&inode->inotify_sem);
+}
+EXPORT_SYMBOL_GPL(inotify_inode_queue_event);
+
+/**
+ * inotify_dentry_parent_queue_event - queue an event to a dentry's parent
+ * @dentry: the dentry in question, we queue against this dentry's parent
+ * @mask: event mask describing this event
+ * @cookie: cookie for synchronization, or zero
+ * @name: filename, if any
+ */
+void inotify_dentry_parent_queue_event(struct dentry *dentry, u32 mask,
+ u32 cookie, const char *name)
+{
+ struct dentry *parent;
+ struct inode *inode;
+
+ spin_lock(&dentry->d_lock);
+ parent = dentry->d_parent;
+ inode = parent->d_inode;
+
+ /* We intentionally do the list_empty() lockless. The race is fine. */
+ if (!list_empty(&inode->inotify_watches)) {
+ dget(parent);
+ spin_unlock(&dentry->d_lock);
+ inotify_inode_queue_event(inode, mask, cookie, name);
+ dput(parent);
+ } else
+ spin_unlock(&dentry->d_lock);
+}
+EXPORT_SYMBOL_GPL(inotify_dentry_parent_queue_event);
+
+/**
+ * inotify_get_cookie - return a unique cookie for use in synchronizing events.
+ */
+u32 inotify_get_cookie(void)
+{
+ return atomic_inc_return(&inotify_cookie);
+}
+EXPORT_SYMBOL_GPL(inotify_get_cookie);
+
+/**
+ * inotify_unmount_inodes - an sb is unmounting. handle any watched inodes.
+ * @list: list of inodes being unmounted (sb->s_inodes)
+ *
+ * Called with inode_lock held, protecting the unmounting super block's list
+ * of inodes, and with iprune_sem held, keeping shrink_icache_memory() at bay.
+ * We temporarily drop inode_lock, however, and CAN block.
+ */
+void inotify_unmount_inodes(struct list_head *list)
+{
+ struct inode *inode, *next_i;
+
+ list_for_each_entry_safe(inode, next_i, list, i_sb_list) {
+ struct inotify_watch *watch, *next_w;
+ struct list_head *watches;
+
+ /*
+ * We can safely drop inode_lock here because the per-sb list
+ * of inodes must not change during unmount and iprune_sem
+ * keeps shrink_icache_memory() away.
+ */
+ spin_unlock(&inode_lock);
+
+ /* for each watch, send IN_UNMOUNT and then remove it */
+ down(&inode->inotify_sem);
+ watches = &inode->inotify_watches;
+ list_for_each_entry_safe(watch, next_w, watches, i_list) {
+ struct inotify_device *dev = watch->dev;
+ down(&dev->sem);
+ inotify_dev_queue_event(dev, watch, IN_UNMOUNT,0,NULL);
+ remove_watch(watch, dev);
+ up(&dev->sem);
+ }
+ up(&inode->inotify_sem);
+
+ spin_lock(&inode_lock);
+ }
+}
+EXPORT_SYMBOL_GPL(inotify_unmount_inodes);
+
+/**
+ * inotify_inode_is_dead - an inode has been deleted, cleanup any watches
+ * @inode: inode that is about to be removed
+ */
+void inotify_inode_is_dead(struct inode *inode)
+{
+ struct inotify_watch *watch, *next;
+
+ down(&inode->inotify_sem);
+ list_for_each_entry_safe(watch, next, &inode->inotify_watches, i_list) {
+ struct inotify_device *dev = watch->dev;
+ down(&dev->sem);
+ remove_watch(watch, dev);
+ up(&dev->sem);
+ }
+ up(&inode->inotify_sem);
+}
+EXPORT_SYMBOL_GPL(inotify_inode_is_dead);
+
+/* Device Interface */
+
+static unsigned int inotify_poll(struct file *file, poll_table *wait)
+{
+ struct inotify_device *dev = file->private_data;
+ int ret = 0;
+
+ poll_wait(file, &dev->wq, wait);
+ down(&dev->sem);
+ if (!list_empty(&dev->events))
+ ret = POLLIN | POLLRDNORM;
+ up(&dev->sem);
+
+ return ret;
+}
+
+static ssize_t inotify_read(struct file *file, char __user *buf,
+ size_t count, loff_t *pos)
+{
+ size_t event_size = sizeof (struct inotify_event);
+ struct inotify_device *dev;
+ char __user *start;
+ int ret;
+ DEFINE_WAIT(wait);
+
+ start = buf;
+ dev = file->private_data;
+
+ while (1) {
+ int events;
+
+ prepare_to_wait(&dev->wq, &wait, TASK_INTERRUPTIBLE);
+
+ down(&dev->sem);
+ events = !list_empty(&dev->events);
+ up(&dev->sem);
+ if (events) {
+ ret = 0;
+ break;
+ }
+
+ if (file->f_flags & O_NONBLOCK) {
+ ret = -EAGAIN;
+ break;
+ }
+
+ if (signal_pending(current)) {
+ ret = -EINTR;
+ break;
+ }
+
+ schedule();
+ }
+
+ finish_wait(&dev->wq, &wait);
+ if (ret)
+ return ret;
+
+ down(&dev->sem);
+ while (1) {
+ struct inotify_kernel_event *kevent;
+
+ ret = buf - start;
+ if (list_empty(&dev->events))
+ break;
+
+ kevent = inotify_dev_get_event(dev);
+ if (event_size + kevent->event.len > count)
+ break;
+
+ if (copy_to_user(buf, &kevent->event, event_size)) {
+ ret = -EFAULT;
+ break;
+ }
+ buf += event_size;
+ count -= event_size;
+
+ if (kevent->name) {
+ if (copy_to_user(buf, kevent->name, kevent->event.len)){
+ ret = -EFAULT;
+ break;
+ }
+ buf += kevent->event.len;
+ count -= kevent->event.len;
+ }
+
+ remove_kevent(dev, kevent);
+ }
+ up(&dev->sem);
+
+ return ret;
+}
+
+static int inotify_open(struct inode *inode, struct file *file)
+{
+ struct inotify_device *dev;
+ struct user_struct *user;
+ int ret;
+
+ user = get_uid(current->user);
+
+ if (unlikely(atomic_read(&user->inotify_devs) >= max_user_devices)) {
+ ret = -EMFILE;
+ goto out_err;
+ }
+
+ dev = kmalloc(sizeof(struct inotify_device), GFP_KERNEL);
+ if (unlikely(!dev)) {
+ ret = -ENOMEM;
+ goto out_err;
+ }
+
+ idr_init(&dev->idr);
+ INIT_LIST_HEAD(&dev->events);
+ INIT_LIST_HEAD(&dev->watches);
+ init_waitqueue_head(&dev->wq);
+ sema_init(&dev->sem, 1);
+
+ dev->event_count = 0;
+ dev->queue_size = 0;
+ dev->max_events = max_queued_events;
+ dev->user = user;
+ atomic_set(&dev->count, 0);
+
+ get_inotify_dev(dev);
+ atomic_inc(&current->user->inotify_devs);
+
+ file->private_data = dev;
+
+ return 0;
+out_err:
+ free_uid(user);
+ return ret;
+}
+
+static int inotify_release(struct inode *ignored, struct file *file)
+{
+ struct inotify_device *dev = file->private_data;
+
+ /*
+ * Destroy all of the watches on this device. Unfortunately, not very
+ * pretty. We cannot do a simple iteration over the list, because we
+ * do not know the inode until we iterate to the watch. But we need to
+ * hold inode->inotify_sem before dev->sem. The following works.
+ */
+ while (1) {
+ struct inotify_watch *watch;
+ struct list_head *watches;
+ struct inode *inode;
+
+ down(&dev->sem);
+ watches = &dev->watches;
+ if (list_empty(watches)) {
+ up(&dev->sem);
+ break;
+ }
+ watch = list_entry(watches->next, struct inotify_watch, d_list);
+ get_inotify_watch(watch);
+ up(&dev->sem);
+
+ inode = watch->inode;
+ down(&inode->inotify_sem);
+ down(&dev->sem);
+ remove_watch_no_event(watch, dev);
+ up(&dev->sem);
+ up(&inode->inotify_sem);
+ put_inotify_watch(watch);
+ }
+
+ /* destroy all of the events on this device */
+ down(&dev->sem);
+ while (!list_empty(&dev->events))
+ inotify_dev_event_dequeue(dev);
+ up(&dev->sem);
+
+ /* free this device: the put matching the get in inotify_open() */
+ put_inotify_dev(dev);
+
+ return 0;
+}
+
+static int inotify_add_watch(struct inotify_device *dev,
+ int fd, u32 mask)
+{
+ struct inotify_watch *watch, *old;
+ struct inode *inode;
+ struct file *filp;
+ int ret;
+
+ filp = fget(fd);
+ if (!filp)
+ return -EBADF;
+ inode = filp->f_dentry->d_inode;
+
+ down(&inode->inotify_sem);
+ down(&dev->sem);
+
+ /*
+ * Handle the case of re-adding a watch on an (inode,dev) pair that we
+ * are already watching. We just update the mask and return its wd.
+ */
+ old = inode_find_dev(inode, dev);
+ if (unlikely(old)) {
+ old->mask = mask;
+ ret = old->wd;
+ goto out;
+ }
+
+ watch = create_watch(dev, mask, inode);
+ if (unlikely(IS_ERR(watch))) {
+ ret = PTR_ERR(watch);
+ goto out;
+ }
+
+ /* Add the watch to the device's and the inode's list */
+ list_add(&watch->d_list, &dev->watches);
+ list_add(&watch->i_list, &inode->inotify_watches);
+ ret = watch->wd;
+
+out:
+ up(&dev->sem);
+ up(&inode->inotify_sem);
+ fput(filp);
+
+ return ret;
+}
+
+/*
+ * inotify_ignore - handle the INOTIFY_IGNORE ioctl, asking that a given wd be
+ * removed from the device.
+ *
+ * Can sleep.
+ */
+static int inotify_ignore(struct inotify_device *dev, s32 wd)
+{
+ struct inotify_watch *watch;
+ struct inode *inode;
+
+ down(&dev->sem);
+ watch = idr_find(&dev->idr, wd);
+ if (unlikely(!watch)) {
+ up(&dev->sem);
+ return -EINVAL;
+ }
+ get_inotify_watch(watch);
+ up(&dev->sem);
+
+ inode = watch->inode;
+ down(&inode->inotify_sem);
+ down(&dev->sem);
+ remove_watch(watch, dev);
+ up(&dev->sem);
+ up(&inode->inotify_sem);
+ put_inotify_watch(watch);
+
+ return 0;
+}
+
+static long inotify_ioctl(struct file *file, unsigned int cmd,
+ unsigned long arg)
+{
+ struct inotify_device *dev;
+ struct inotify_watch_request request;
+ void __user *p;
+ int ret = -ENOTTY;
+ s32 wd;
+
+ dev = file->private_data;
+ p = (void __user *) arg;
+
+ switch (cmd) {
+ case INOTIFY_WATCH:
+ if (unlikely(copy_from_user(&request, p, sizeof (request)))) {
+ ret = -EFAULT;
+ break;
+ }
+ ret = inotify_add_watch(dev, request.fd, request.mask);
+ break;
+ case INOTIFY_IGNORE:
+ if (unlikely(get_user(wd, (int __user *) p))) {
+ ret = -EFAULT;
+ break;
+ }
+ ret = inotify_ignore(dev, wd);
+ break;
+ case FIONREAD:
+ ret = put_user(dev->queue_size, (int __user *) p);
+ break;
+ }
+
+ return ret;
+}
+
+static struct file_operations inotify_fops = {
+ .owner = THIS_MODULE,
+ .poll = inotify_poll,
+ .read = inotify_read,
+ .open = inotify_open,
+ .release = inotify_release,
+ .unlocked_ioctl = inotify_ioctl,
+ .compat_ioctl = inotify_ioctl,
+};
+
+static struct miscdevice inotify_device = {
+ .minor = MISC_DYNAMIC_MINOR,
+ .name = "inotify",
+ .fops = &inotify_fops,
+};
+
+/*
+ * inotify_init - Our initialization function. Note that we cannot return
+ * error because we have compiled-in VFS hooks. So an (unlikely) failure here
+ * must result in panic().
+ */
+static int __init inotify_init(void)
+{
+ struct class_device *class;
+ int ret;
+
+ ret = misc_register(&inotify_device);
+ if (unlikely(ret))
+ panic("inotify: misc_register returned %d\n", ret);
+
+ max_queued_events = 512;
+ max_user_devices = 64;
+ max_user_watches = 16384;
+
+ class = inotify_device.class;
+ class_device_create_file(class, &class_device_attr_max_queued_events);
+ class_device_create_file(class, &class_device_attr_max_user_devices);
+ class_device_create_file(class, &class_device_attr_max_user_watches);
+
+ atomic_set(&inotify_cookie, 0);
+
+ watch_cachep = kmem_cache_create("inotify_watch_cache",
+ sizeof(struct inotify_watch),
+ 0, SLAB_PANIC, NULL, NULL);
+ event_cachep = kmem_cache_create("inotify_event_cache",
+ sizeof(struct inotify_kernel_event),
+ 0, SLAB_PANIC, NULL, NULL);
+
+ printk(KERN_INFO "inotify device minor=%d\n", inotify_device.minor);
+
+ return 0;
+}
+
+module_init(inotify_init);
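The filename padding in kernel_event() above rounds the name length (including the terminating null) up to the next multiple of sizeof(struct inotify_event). The user-space sketch below replays those steps next to an equivalent closed form. Both function names are hypothetical, and event_size is passed as a parameter because the real structure size depends on <linux/inotify.h>, which is not shown in this part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Same steps as the padding code in kernel_event(). 'len' is
 * strlen(name) + 1, so it is always at least 1.
 */
static size_t example_padded_len(size_t len, size_t event_size)
{
	/* wraps when len > event_size, but is overwritten below in that case */
	size_t rem = event_size - len;

	if (len > event_size) {
		rem = event_size - (len % event_size);
		if (len % event_size == 0)
			rem = 0;
	}
	return len + rem;
}

/* Closed form: round len up to the next multiple of event_size. */
static size_t example_round_up(size_t len, size_t event_size)
{
	assert(len > 0 && event_size > 0);
	return (len + event_size - 1) / event_size * event_size;
}
```

For any len of at least 1 the two functions agree, so a short name always occupies exactly one event_size worth of bytes and a name that is already a multiple of event_size gets no extra padding.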
diff -urN linux-2.6.12-rc1/fs/Kconfig linux/fs/Kconfig
--- linux-2.6.12-rc1/fs/Kconfig 2005-03-23 11:23:20.000000000 -0500
+++ linux/fs/Kconfig 2005-04-04 16:26:15.000000000 -0400
@@ -339,6 +339,19 @@
If you don't know whether you need it, then you don't need it:
answer N.

+config INOTIFY
+ bool "Inotify file change notification support"
+ default y
+ ---help---
+ Say Y here to enable inotify support and the /dev/inotify character
+ device. Inotify is a file change notification system and a
+ replacement for dnotify. Inotify fixes numerous shortcomings in
+ dnotify and introduces several new features. It allows monitoring
+ of both files and directories via a single open fd. Multiple file
+ events are supported.
+
+ If unsure, say Y.
+
config QUOTA
bool "Quota support"
help
diff -urN linux-2.6.12-rc1/fs/Makefile linux/fs/Makefile
--- linux-2.6.12-rc1/fs/Makefile 2005-03-23 11:23:20.000000000 -0500
+++ linux/fs/Makefile 2005-04-04 16:26:15.000000000 -0400
@@ -11,6 +11,7 @@
attr.o bad_inode.o file.o filesystems.o namespace.o aio.o \
seq_file.o xattr.o libfs.o fs-writeback.o mpage.o direct-io.o \

+obj-$(CONFIG_INOTIFY) += inotify.o
obj-$(CONFIG_EPOLL) += eventpoll.o
obj-$(CONFIG_COMPAT) += compat.o

diff -urN linux-2.6.12-rc1/fs/namei.c linux/fs/namei.c
--- linux-2.6.12-rc1/fs/namei.c 2005-03-23 11:24:04.000000000 -0500
+++ linux/fs/namei.c 2005-04-04 16:26:15.000000000 -0400
@@ -21,7 +21,7 @@
#include <linux/namei.h>
#include <linux/quotaops.h>
#include <linux/pagemap.h>
-#include <linux/dnotify.h>
+#include <linux/fsnotify.h>
#include <linux/smp_lock.h>
#include <linux/personality.h>
#include <linux/security.h>
@@ -1292,7 +1292,7 @@
DQUOT_INIT(dir);
error = dir->i_op->create(dir, dentry, mode, nd);
if (!error) {
- inode_dir_notify(dir, DN_CREATE);
+ fsnotify_create(dir, dentry->d_name.name);
security_inode_post_create(dir, dentry, mode);
}
return error;
@@ -1597,7 +1597,7 @@
DQUOT_INIT(dir);
error = dir->i_op->mknod(dir, dentry, mode, dev);
if (!error) {
- inode_dir_notify(dir, DN_CREATE);
+ fsnotify_create(dir, dentry->d_name.name);
security_inode_post_mknod(dir, dentry, mode, dev);
}
return error;
@@ -1670,7 +1670,7 @@
DQUOT_INIT(dir);
error = dir->i_op->mkdir(dir, dentry, mode);
if (!error) {
- inode_dir_notify(dir, DN_CREATE);
+ fsnotify_mkdir(dir, dentry->d_name.name);
security_inode_post_mkdir(dir,dentry, mode);
}
return error;
@@ -1761,7 +1761,7 @@
}
up(&dentry->d_inode->i_sem);
if (!error) {
- inode_dir_notify(dir, DN_DELETE);
+ fsnotify_rmdir(dentry, dentry->d_inode, dir);
d_delete(dentry);
}
dput(dentry);
@@ -1834,9 +1834,10 @@

/* We don't d_delete() NFS sillyrenamed files--they still exist. */
if (!error && !(dentry->d_flags & DCACHE_NFSFS_RENAMED)) {
+ fsnotify_unlink(dentry, dir);
d_delete(dentry);
- inode_dir_notify(dir, DN_DELETE);
}
+
return error;
}

@@ -1910,7 +1911,7 @@
DQUOT_INIT(dir);
error = dir->i_op->symlink(dir, dentry, oldname);
if (!error) {
- inode_dir_notify(dir, DN_CREATE);
+ fsnotify_create(dir, dentry->d_name.name);
security_inode_post_symlink(dir, dentry, oldname);
}
return error;
@@ -1983,7 +1984,7 @@
error = dir->i_op->link(old_dentry, dir, new_dentry);
up(&old_dentry->d_inode->i_sem);
if (!error) {
- inode_dir_notify(dir, DN_CREATE);
+ fsnotify_create(dir, new_dentry->d_name.name);
security_inode_post_link(old_dentry, dir, new_dentry);
}
return error;
@@ -2147,6 +2148,7 @@
{
int error;
int is_dir = S_ISDIR(old_dentry->d_inode->i_mode);
+ char *old_name;

if (old_dentry->d_inode == new_dentry->d_inode)
return 0;
@@ -2168,18 +2170,18 @@
DQUOT_INIT(old_dir);
DQUOT_INIT(new_dir);

+ old_name = fsnotify_oldname_init(old_dentry);
+
if (is_dir)
error = vfs_rename_dir(old_dir,old_dentry,new_dir,new_dentry);
else
error = vfs_rename_other(old_dir,old_dentry,new_dir,new_dentry);
if (!error) {
- if (old_dir == new_dir)
- inode_dir_notify(old_dir, DN_RENAME);
- else {
- inode_dir_notify(old_dir, DN_DELETE);
- inode_dir_notify(new_dir, DN_CREATE);
- }
+ const char *new_name = old_dentry->d_name.name;
+ fsnotify_move(old_dir, new_dir, old_name, new_name);
}
+ fsnotify_oldname_free(old_name);
+
return error;
}

diff -urN linux-2.6.12-rc1/fs/open.c linux/fs/open.c
--- linux-2.6.12-rc1/fs/open.c 2005-03-23 11:23:20.000000000 -0500
+++ linux/fs/open.c 2005-04-04 16:26:15.000000000 -0400
@@ -10,7 +10,7 @@
#include <linux/file.h>
#include <linux/smp_lock.h>
#include <linux/quotaops.h>
-#include <linux/dnotify.h>
+#include <linux/fsnotify.h>
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/tty.h>
@@ -944,9 +944,11 @@
fd = get_unused_fd();
if (fd >= 0) {
struct file *f = filp_open(tmp, flags, mode);
+
error = PTR_ERR(f);
if (IS_ERR(f))
goto out_error;
+ fsnotify_open(f->f_dentry);
fd_install(fd, f);
}
out:
@@ -998,7 +1000,7 @@
retval = err;
}

- dnotify_flush(filp, id);
+ fsnotify_flush(filp, id);
locks_remove_posix(filp, id);
fput(filp);
return retval;
diff -urN linux-2.6.12-rc1/fs/read_write.c linux/fs/read_write.c
--- linux-2.6.12-rc1/fs/read_write.c 2005-03-23 11:23:20.000000000 -0500
+++ linux/fs/read_write.c 2005-04-04 16:26:15.000000000 -0400
@@ -10,7 +10,7 @@
#include <linux/file.h>
#include <linux/uio.h>
#include <linux/smp_lock.h>
-#include <linux/dnotify.h>
+#include <linux/fsnotify.h>
#include <linux/security.h>
#include <linux/module.h>
#include <linux/syscalls.h>
@@ -239,7 +239,7 @@
else
ret = do_sync_read(file, buf, count, pos);
if (ret > 0) {
- dnotify_parent(file->f_dentry, DN_ACCESS);
+ fsnotify_access(file->f_dentry);
current->rchar += ret;
}
current->syscr++;
@@ -287,7 +287,7 @@
else
ret = do_sync_write(file, buf, count, pos);
if (ret > 0) {
- dnotify_parent(file->f_dentry, DN_MODIFY);
+ fsnotify_modify(file->f_dentry);
current->wchar += ret;
}
current->syscw++;
@@ -523,9 +523,12 @@
out:
if (iov != iovstack)
kfree(iov);
- if ((ret + (type == READ)) > 0)
- dnotify_parent(file->f_dentry,
- (type == READ) ? DN_ACCESS : DN_MODIFY);
+ if ((ret + (type == READ)) > 0) {
+ if (type == READ)
+ fsnotify_access(file->f_dentry);
+ else
+ fsnotify_modify(file->f_dentry);
+ }
return ret;
Efault:
ret = -EFAULT;
diff -urN linux-2.6.12-rc1/include/linux/fs.h linux/include/linux/fs.h
--- linux-2.6.12-rc1/include/linux/fs.h 2005-03-23 11:24:05.000000000 -0500
+++ linux/include/linux/fs.h 2005-04-04 16:26:15.000000000 -0400
@@ -472,6 +472,11 @@
struct dnotify_struct *i_dnotify; /* for directory notifications */
#endif

+#ifdef CONFIG_INOTIFY
+ struct list_head inotify_watches; /* watches on this inode */
+ struct semaphore inotify_sem; /* protects the watches list */
+#endif
+
unsigned long i_state;
unsigned long dirtied_when; /* jiffies of first dirtying */

@@ -1366,7 +1371,6 @@
extern int do_remount_sb(struct super_block *sb, int flags,
void *data, int force);
extern sector_t bmap(struct inode *, sector_t);
-extern int setattr_mask(unsigned int);
extern int notify_change(struct dentry *, struct iattr *);
extern int permission(struct inode *, int, struct nameidata *);
extern int generic_permission(struct inode *, int,
diff -urN linux-2.6.12-rc1/include/linux/fsnotify.h linux/include/linux/fsnotify.h
--- linux-2.6.12-rc1/include/linux/fsnotify.h 1969-12-31 19:00:00.000000000 -0500
+++ linux/include/linux/fsnotify.h 2005-04-04 16:26:15.000000000 -0400
@@ -0,0 +1,228 @@
+#ifndef _LINUX_FS_NOTIFY_H
+#define _LINUX_FS_NOTIFY_H
+
+/*
+ * include/linux/fsnotify.h - generic hooks for filesystem notification, to
+ * reduce in-source duplication from both dnotify and inotify.
+ *
+ * We don't compile any of this away in some complicated menagerie of ifdefs.
+ * Instead, we rely on the code inside to optimize away as needed.
+ *
+ * (C) Copyright 2005 Robert Love
+ */
+
+#ifdef __KERNEL__
+
+#include <linux/dnotify.h>
+#include <linux/inotify.h>
+
+/*
+ * fsnotify_move - file old_name at old_dir was moved to new_name at new_dir
+ */
+static inline void fsnotify_move(struct inode *old_dir, struct inode *new_dir,
+ const char *old_name, const char *new_name)
+{
+ u32 cookie;
+
+ if (old_dir == new_dir)
+ inode_dir_notify(old_dir, DN_RENAME);
+ else {
+ inode_dir_notify(old_dir, DN_DELETE);
+ inode_dir_notify(new_dir, DN_CREATE);
+ }
+
+ cookie = inotify_get_cookie();
+
+ inotify_inode_queue_event(old_dir, IN_MOVED_FROM, cookie, old_name);
+ inotify_inode_queue_event(new_dir, IN_MOVED_TO, cookie, new_name);
+}
+
+/*
+ * fsnotify_unlink - file was unlinked
+ */
+static inline void fsnotify_unlink(struct dentry *dentry, struct inode *dir)
+{
+ struct inode *inode = dentry->d_inode;
+
+ inode_dir_notify(dir, DN_DELETE);
+ inotify_inode_queue_event(dir, IN_DELETE_FILE, 0, dentry->d_name.name);
+ inotify_inode_queue_event(inode, IN_DELETE_SELF, 0, NULL);
+
+ inotify_inode_is_dead(inode);
+}
+
+/*
+ * fsnotify_rmdir - directory was removed
+ */
+static inline void fsnotify_rmdir(struct dentry *dentry, struct inode *inode,
+ struct inode *dir)
+{
+ inode_dir_notify(dir, DN_DELETE);
+ inotify_inode_queue_event(dir, IN_DELETE_SUBDIR,0,dentry->d_name.name);
+ inotify_inode_queue_event(inode, IN_DELETE_SELF, 0, NULL);
+
+ inotify_inode_is_dead(inode);
+}
+
+/*
+ * fsnotify_create - filename was linked in
+ */
+static inline void fsnotify_create(struct inode *inode, const char *filename)
+{
+ inode_dir_notify(inode, DN_CREATE);
+ inotify_inode_queue_event(inode, IN_CREATE_FILE, 0, filename);
+}
+
+/*
+ * fsnotify_mkdir - directory 'name' was created
+ */
+static inline void fsnotify_mkdir(struct inode *inode, const char *name)
+{
+ inode_dir_notify(inode, DN_CREATE);
+ inotify_inode_queue_event(inode, IN_CREATE_SUBDIR, 0, name);
+}
+
+/*
+ * fsnotify_access - file was read
+ */
+static inline void fsnotify_access(struct dentry *dentry)
+{
+ dnotify_parent(dentry, DN_ACCESS);
+ inotify_dentry_parent_queue_event(dentry, IN_ACCESS, 0,
+ dentry->d_name.name);
+ inotify_inode_queue_event(dentry->d_inode, IN_ACCESS, 0, NULL);
+}
+
+/*
+ * fsnotify_modify - file was modified
+ */
+static inline void fsnotify_modify(struct dentry *dentry)
+{
+ dnotify_parent(dentry, DN_MODIFY);
+ inotify_dentry_parent_queue_event(dentry, IN_MODIFY, 0,
+ dentry->d_name.name);
+ inotify_inode_queue_event(dentry->d_inode, IN_MODIFY, 0, NULL);
+}
+
+/*
+ * fsnotify_open - file was opened
+ */
+static inline void fsnotify_open(struct dentry *dentry)
+{
+ inotify_inode_queue_event(dentry->d_inode, IN_OPEN, 0, NULL);
+ inotify_dentry_parent_queue_event(dentry, IN_OPEN, 0,
+ dentry->d_name.name);
+}
+
+/*
+ * fsnotify_close - file was closed
+ */
+static inline void fsnotify_close(struct file *file)
+{
+ struct dentry *dentry = file->f_dentry;
+ struct inode *inode = dentry->d_inode;
+ const char *filename = dentry->d_name.name;
+ mode_t mode = file->f_mode;
+ u32 mask;
+
+ mask = (mode & FMODE_WRITE) ? IN_CLOSE_WRITE : IN_CLOSE_NOWRITE;
+ inotify_dentry_parent_queue_event(dentry, mask, 0, filename);
+ inotify_inode_queue_event(inode, mask, 0, NULL);
+}
+
+/*
+ * fsnotify_change - notify_change event. file was modified and/or metadata
+ * was changed.
+ */
+static inline void fsnotify_change(struct dentry *dentry, unsigned int ia_valid)
+{
+ int dn_mask = 0;
+ u32 in_mask = 0;
+
+ if (ia_valid & ATTR_UID) {
+ in_mask |= IN_ATTRIB;
+ dn_mask |= DN_ATTRIB;
+ }
+ if (ia_valid & ATTR_GID) {
+ in_mask |= IN_ATTRIB;
+ dn_mask |= DN_ATTRIB;
+ }
+ if (ia_valid & ATTR_SIZE) {
+ in_mask |= IN_MODIFY;
+ dn_mask |= DN_MODIFY;
+ }
+ /* both times implies a utime(s) call */
+ if ((ia_valid & (ATTR_ATIME | ATTR_MTIME)) == (ATTR_ATIME | ATTR_MTIME))
+ {
+ in_mask |= IN_ATTRIB;
+ dn_mask |= DN_ATTRIB;
+ } else if (ia_valid & ATTR_ATIME) {
+ in_mask |= IN_ACCESS;
+ dn_mask |= DN_ACCESS;
+ } else if (ia_valid & ATTR_MTIME) {
+ in_mask |= IN_MODIFY;
+ dn_mask |= DN_MODIFY;
+ }
+ if (ia_valid & ATTR_MODE) {
+ in_mask |= IN_ATTRIB;
+ dn_mask |= DN_ATTRIB;
+ }
+
+ if (dn_mask)
+ dnotify_parent(dentry, dn_mask);
+ if (in_mask) {
+ inotify_inode_queue_event(dentry->d_inode, in_mask, 0, NULL);
+ inotify_dentry_parent_queue_event(dentry, in_mask, 0,
+ dentry->d_name.name);
+ }
+}
+
+/*
+ * fsnotify_flush - flush time!
+ */
+static inline void fsnotify_flush(struct file *filp, fl_owner_t id)
+{
+ dnotify_flush(filp, id);
+}
+
+#ifdef CONFIG_INOTIFY /* inotify helpers */
+
+/*
+ * fsnotify_oldname_init - save off the old filename before we change it
+ *
+ * this could be kstrdup if only we could add that to lib/string.c
+ */
+static inline char *fsnotify_oldname_init(struct dentry *old_dentry)
+{
+ char *old_name;
+
+ old_name = kmalloc(strlen(old_dentry->d_name.name) + 1, GFP_KERNEL);
+ if (old_name)
+ strcpy(old_name, old_dentry->d_name.name);
+ return old_name;
+}
+
+/*
+ * fsnotify_oldname_free - free the name we got from fsnotify_oldname_init
+ */
+static inline void fsnotify_oldname_free(const char *old_name)
+{
+ kfree(old_name);
+}
+
+#else /* CONFIG_INOTIFY */
+
+static inline char *fsnotify_oldname_init(struct dentry *old_dentry)
+{
+ return NULL;
+}
+
+static inline void fsnotify_oldname_free(const char *old_name)
+{
+}
+
+#endif /* ! CONFIG_INOTIFY */
+
+#endif /* __KERNEL__ */
+
+#endif /* _LINUX_FS_NOTIFY_H */
diff -urN linux-2.6.12-rc1/include/linux/inotify.h linux/include/linux/inotify.h
--- linux-2.6.12-rc1/include/linux/inotify.h 1969-12-31 19:00:00.000000000 -0500
+++ linux/include/linux/inotify.h 2005-04-04 18:01:13.000000000 -0400
@@ -0,0 +1,111 @@
+/*
+ * Inode based directory notification for Linux
+ *
+ * Copyright (C) 2005 John McCutchan
+ */
+
+#ifndef _LINUX_INOTIFY_H
+#define _LINUX_INOTIFY_H
+
+#include <linux/types.h>
+
+/*
+ * struct inotify_event - structure read from the inotify device for each event
+ *
+ * When you are watching a directory, you will receive the filename for events
+ * such as IN_CREATE, IN_DELETE, IN_OPEN, IN_CLOSE, ..., relative to the wd.
+ */
+struct inotify_event {
+ __s32 wd; /* watch descriptor */
+ __u32 mask; /* watch mask */
+ __u32 cookie; /* cookie to synchronize two events */
+ __u32 len; /* length (including nulls) of name */
+ char name[0]; /* stub for possible name */
+};
+
+/*
+ * struct inotify_watch_request - represents a watch request
+ *
+ * Pass to the inotify device via the INOTIFY_WATCH ioctl
+ */
+struct inotify_watch_request {
+ int fd; /* fd of filename to watch */
+ __u32 mask; /* event mask */
+};
+
+/* the following are legal, implemented events */
+#define IN_ACCESS 0x00000001 /* File was accessed */
+#define IN_MODIFY 0x00000002 /* File was modified */
+#define IN_ATTRIB 0x00000004 /* File changed attributes */
+#define IN_CLOSE_WRITE 0x00000008 /* Writable file was closed */
+#define IN_CLOSE_NOWRITE 0x00000010 /* Unwritable file closed */
+#define IN_OPEN 0x00000020 /* File was opened */
+#define IN_MOVED_FROM 0x00000040 /* File was moved from X */
+#define IN_MOVED_TO 0x00000080 /* File was moved to Y */
+#define IN_DELETE_SUBDIR 0x00000100 /* Subdir was deleted */
+#define IN_DELETE_FILE 0x00000200 /* Subfile was deleted */
+#define IN_CREATE_SUBDIR 0x00000400 /* Subdir was created */
+#define IN_CREATE_FILE 0x00000800 /* Subfile was created */
+#define IN_DELETE_SELF 0x00001000 /* Self was deleted */
+#define IN_UNMOUNT 0x00002000 /* Backing fs was unmounted */
+#define IN_Q_OVERFLOW 0x00004000 /* Event queue overflowed */
+#define IN_IGNORED 0x00008000 /* File was ignored */
+
+/* special flags */
+#define IN_ALL_EVENTS 0xffffffff /* All the events */
+#define IN_CLOSE (IN_CLOSE_WRITE | IN_CLOSE_NOWRITE)
+
+#define INOTIFY_IOCTL_MAGIC 'Q'
+#define INOTIFY_IOCTL_MAXNR 2
+
+#define INOTIFY_WATCH _IOR(INOTIFY_IOCTL_MAGIC, 1, struct inotify_watch_request)
+#define INOTIFY_IGNORE _IOR(INOTIFY_IOCTL_MAGIC, 2, int)
+
+#ifdef __KERNEL__
+
+#include <linux/dcache.h>
+#include <linux/fs.h>
+#include <linux/config.h>
+
+#ifdef CONFIG_INOTIFY
+
+extern void inotify_inode_queue_event(struct inode *, __u32, __u32,
+ const char *);
+extern void inotify_dentry_parent_queue_event(struct dentry *, __u32, __u32,
+ const char *);
+extern void inotify_unmount_inodes(struct list_head *);
+extern void inotify_inode_is_dead(struct inode *);
+extern u32 inotify_get_cookie(void);
+
+#else
+
+static inline void inotify_inode_queue_event(struct inode *inode,
+ __u32 mask, __u32 cookie,
+ const char *filename)
+{
+}
+
+static inline void inotify_dentry_parent_queue_event(struct dentry *dentry,
+ __u32 mask, __u32 cookie,
+ const char *filename)
+{
+}
+
+static inline void inotify_unmount_inodes(struct list_head *list)
+{
+}
+
+static inline void inotify_inode_is_dead(struct inode *inode)
+{
+}
+
+static inline u32 inotify_get_cookie(void)
+{
+ return 0;
+}
+
+#endif /* CONFIG_INOTIFY */
+
+#endif /* __KERNEL__ */
+
+#endif /* _LINUX_INOTIFY_H */
diff -urN linux-2.6.12-rc1/include/linux/sched.h linux/include/linux/sched.h
--- linux-2.6.12-rc1/include/linux/sched.h 2005-03-23 11:24:05.000000000 -0500
+++ linux/include/linux/sched.h 2005-04-04 16:26:15.000000000 -0400
@@ -394,6 +394,10 @@
atomic_t processes; /* How many processes does this user have? */
atomic_t files; /* How many open files does this user have? */
atomic_t sigpending; /* How many pending signals does this user have? */
+#ifdef CONFIG_INOTIFY
+ atomic_t inotify_watches; /* How many inotify watches does this user have? */
+ atomic_t inotify_devs; /* How many inotify devs does this user have opened? */
+#endif
/* protected by mq_lock */
unsigned long mq_bytes; /* How many bytes can be allocated to mqueue? */
unsigned long locked_shm; /* How many pages of mlocked shm ? */
diff -urN linux-2.6.12-rc1/kernel/user.c linux/kernel/user.c
--- linux-2.6.12-rc1/kernel/user.c 2005-03-23 11:24:05.000000000 -0500
+++ linux/kernel/user.c 2005-04-04 16:26:15.000000000 -0400
@@ -120,6 +120,10 @@
atomic_set(&new->processes, 0);
atomic_set(&new->files, 0);
atomic_set(&new->sigpending, 0);
+#ifdef CONFIG_INOTIFY
+ atomic_set(&new->inotify_watches, 0);
+ atomic_set(&new->inotify_devs, 0);
+#endif

new->mq_bytes = 0;
new->locked_shm = 0;
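
For readers skimming the patch: the attribute handling in
fsnotify_change() above boils down to a small mapping from the ATTR_*
bits in ia_valid to inotify event bits. The following is only a
user-space sketch of the inotify half of that mapping, not kernel code.
The helper name attr_to_in_mask() is made up for illustration, and the
ATTR_* and IN_* values are copied in by hand (ATTR_* from
include/linux/fs.h, IN_* from the inotify.h in this patch) so that the
snippet stands alone.

```c
#include <stdint.h>

/* Attribute flags, repeated here so the sketch stands alone
 * (assumed to match ATTR_* in include/linux/fs.h). */
#define ATTR_MODE   1
#define ATTR_UID    2
#define ATTR_GID    4
#define ATTR_SIZE   8
#define ATTR_ATIME  16
#define ATTR_MTIME  32

/* Event bits as declared in this patch's include/linux/inotify.h. */
#define IN_ACCESS   0x00000001
#define IN_MODIFY   0x00000002
#define IN_ATTRIB   0x00000004

/* Hypothetical helper: the inotify half of fsnotify_change(). */
uint32_t attr_to_in_mask(unsigned int ia_valid)
{
	uint32_t mask = 0;

	/* Ownership and mode changes are plain attribute changes. */
	if (ia_valid & (ATTR_UID | ATTR_GID | ATTR_MODE))
		mask |= IN_ATTRIB;

	/* A size change (truncate) counts as a modification. */
	if (ia_valid & ATTR_SIZE)
		mask |= IN_MODIFY;

	/* Both times at once implies utime(s): report an attribute change.
	 * A single time is reported as an access or a modification. */
	if ((ia_valid & (ATTR_ATIME | ATTR_MTIME)) == (ATTR_ATIME | ATTR_MTIME))
		mask |= IN_ATTRIB;
	else if (ia_valid & ATTR_ATIME)
		mask |= IN_ACCESS;
	else if (ia_valid & ATTR_MTIME)
		mask |= IN_MODIFY;

	return mask;
}
```

With these definitions a chmod (ATTR_MODE) maps to IN_ATTRIB, a lone
atime update maps to IN_ACCESS, and a utime()-style update of both
times maps to IN_ATTRIB rather than IN_ACCESS | IN_MODIFY.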


2005-04-05 17:53:51

by Robert Love

Subject: Re: [patch] inotify for 2.6.11

On Tue, 2005-04-05 at 19:20 +0200, Prakash Punnoor wrote:

> BTW, what else could I use to make use of inotify? I know fam, which afaik
> only uses dnotify.

Beagle, a desktop search infrastructure. Check out
http://www.gnome.org/projects/beagle

Some other little projects. If anyone else is using it, please let us
know!

The main problem is that dnotify sucks so bad now that no one uses it.
So we don't have any existing applications to convert, besides FAM, and
we did that (via Gamin).

I've been meaning to write some sample GNOME code to show how easy it is
to use Inotify, even directly. I'll get on that.

> > Anyhow, this should fix it. Confirm?
>
> So far no problems. Interesting enough the previous patch worked w/o problem
> the last hours...

It might have been caused by a bug in Gamin, so it took a while to
expose. It should only happen when the user asks to remove a watch on a
wd that does not exist (I just forgot to check that error case in a bug
fix I added).

Keep pounding. It ought to be fixed, but please let me know if not!

Robert Love
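
As a rough idea of what using the interface directly looks like, here
is a minimal sketch against the ioctl interface declared in this
patch's include/linux/inotify.h. Assumptions, none of which are
confirmed in this thread: the device node lives at /dev/inotify,
INOTIFY_WATCH returns the new watch descriptor on success, and the fd
in the request may be closed once the ioctl returns. add_watch() is a
hypothetical helper name; the struct and the ioctl number are copied
from the patch.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Request layout and ioctl number as declared in the patch's inotify.h. */
struct inotify_watch_request {
	int fd;		/* fd of the file or directory to watch */
	uint32_t mask;	/* events of interest */
};

#define INOTIFY_IOCTL_MAGIC 'Q'
#define INOTIFY_WATCH _IOR(INOTIFY_IOCTL_MAGIC, 1, struct inotify_watch_request)

#define IN_MODIFY       0x00000002
#define IN_CREATE_FILE  0x00000800

/*
 * Hypothetical helper: add a watch for 'path' on an already open inotify
 * device fd. Returns the ioctl result (assumed to be the watch descriptor)
 * or -1 on any failure.
 */
int add_watch(int dev_fd, const char *path, uint32_t mask)
{
	struct inotify_watch_request req;
	int wd;

	req.fd = open(path, O_RDONLY);
	if (req.fd < 0)
		return -1;
	req.mask = mask;

	wd = ioctl(dev_fd, INOTIFY_WATCH, &req);

	/* Assumption: the kernel holds its own reference after the ioctl. */
	close(req.fd);
	return wd < 0 ? -1 : wd;
}

/*
 * Intended use (device path is an assumption):
 *
 *	int dev = open("/dev/inotify", O_RDONLY);
 *	int wd  = add_watch(dev, "/tmp", IN_CREATE_FILE | IN_MODIFY);
 */
```

The single device fd can then be handed to select() or read() for all
watches, which is the point of the design described at the top of the
thread.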



2005-04-05 17:45:51

by Prakash Punnoor

Subject: Re: [patch] inotify for 2.6.11

Robert Love wrote:
> On Tue, 2005-04-05 at 09:58 +0200, Prakash Punnoor wrote:
>
>>I am having a little trouble with inotify 0.22. Previous version worked w/o
>>trouble (even with nvidia and nvsound loaded) with 2.6.12-rc1-kb2 and gamin
>>
>>Now I use 2.6.12-rc2 with inotify 0.22 and got this after a few minutes of
>>uptime (compiling some stuff):
>
>
> Ah, thanks. That was not even related to the semaphore rewrite, but a
> small bug fix I slipped in. But of course.
>
> Gamin is an interesting test case for us because it does so many
> ignores.

BTW, what else could I use to make use of inotify? I know fam, which afaik
only uses dnotify.

> Anyhow, this should fix it. Confirm?

So far no problems. Interestingly enough, the previous patch worked w/o
problems for the last few hours...

Cheers

--
Prakash Punnoor



2005-04-05 22:13:55

by Robert Love

Subject: Re: [patch] inotify for 2.6.11

On Tue, 2005-04-05 at 19:20 +0200, Prakash Punnoor wrote:

> BTW, what else could I use to make use of inotify? I know fam, which afaik
> only uses dnotify.

Here is a little sample glib application that shows the ease-yet-power
of inotify.

http://www.kernel.org/pub/linux/kernel/people/rml/inotify/glib/

It integrates inotify watches into the glib mainloop via GIOChannel.
Everything is abstracted behind simple interfaces, so this might prove a
nice start for curious inotify developers.

Robert Love
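
The glib sample linked above is the place to look for a complete
program. As a much smaller illustration of what any such reader has to
do with the bytes it gets back from the inotify fd, the sketch below
steps through a buffer of variable-length events. The header layout
follows struct inotify_event from the patch (wd, mask, cookie, len,
then len bytes of name); the names inotify_event_hdr and next_event()
are invented here, and whether read() can ever hand back a partial
record is an assumption the code merely guards against.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fixed part of struct inotify_event from the patch; name[] follows it. */
struct inotify_event_hdr {
	int32_t  wd;		/* watch descriptor */
	uint32_t mask;		/* event mask */
	uint32_t cookie;	/* pairs IN_MOVED_FROM with IN_MOVED_TO */
	uint32_t len;		/* bytes of name that follow, NULs included */
};

#define IN_MOVED_FROM 0x00000040
#define IN_MOVED_TO   0x00000080

/*
 * Decode the event at *off in buf[0..n). On success, fill *ev, point *name
 * at the NUL-terminated name (or NULL if len is 0), advance *off past the
 * record and return 1. Return 0 when no complete event is left.
 */
int next_event(const char *buf, size_t n, size_t *off,
	       struct inotify_event_hdr *ev, const char **name)
{
	size_t left;

	if (*off > n)
		return 0;
	left = n - *off;
	if (left < sizeof(*ev))
		return 0;

	/* memcpy avoids unaligned access into the byte buffer. */
	memcpy(ev, buf + *off, sizeof(*ev));
	if (ev->len > left - sizeof(*ev))
		return 0;	/* name is cut short: wait for more data */

	*name = ev->len ? buf + *off + sizeof(*ev) : NULL;
	*off += sizeof(*ev) + ev->len;
	return 1;
}
```

A rename shows up as two such records, IN_MOVED_FROM and IN_MOVED_TO,
carrying the same cookie, so a caller can match them up by comparing
ev.cookie across successive next_event() calls.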


2005-04-06 01:21:45

by Robert Love

Subject: Re: [patch] inotify for 2.6.11

On Tue, 2005-04-05 at 17:53 -0700, Rusty Lynch wrote:

> From just a casual look, it seems like this could be used to monitor the
> comings and goings of processes by monitoring /proc. Unfortunately
> inotify doesn't seem to be getting all the events on the proc filesystem
> like it does on a real filesystem because I am not seeing new events every
> time a new process is added or removed. The same is true if you attempt
> to monitor something like /sys/bus/usb/devices/ and add/remove a usb
> device.
>
> On a side note, it's still rather interesting to monitor /proc and watch
> all the traffic.

Yah, I agree. I looked into doing this awhile back, when I noticed
inotify did not generate events for /proc. We just need to add calls to
the fsnotify hooks to the proc_create() and proc_delete_foo() stuff.

The interfaces are capable, i.e. we can add support at any time, even
after inotify is merged. I'd be for it.

Robert Love


2005-04-06 03:14:55

by Adam Kropelin

Subject: Re: [patch] inotify for 2.6.11

I've been meaning to play with inotify for a while now and finally made
time for it tonight. I'm not much of a GUI guy, so I'm mostly interested
in exploring the command line applications of inotify --i.e., what sort of
havoc can I wreak with it in a script.

To that end I sat down tonight and threw together a simple command line
interface. Before I tell you where the code is, please note that I wrote
it while half asleep and more than a little high on cough syrup, so it's
undoubtedly chock full of buffer overflows, infinite loops, segfaults,
and other gremlins just waiting to eat your data, so PLEASE FOR THE LOVE
OF MIKE don't use it on, around, under, or in the general vicinity of,
anything important.

http://www.kroptech.com/~adk0212/watchf.c

The basic usage is...

watchf [-i] [-e<events>] <file-to-watch> [-- <command-to-run>...]

-i says stay in an infinite loop, don't exit after one event
-e gives the list of events to wait for (see the code)
<file-to-watch> is the file or directory to be watched
Everything after -- is taken as a command to run each time an
event occurs with any occurrences of {} being replaced with the
name of the affected file, as returned in the inotify_event
structure.
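
The {} replacement described above could be done along these lines.
This is a sketch only and not taken from watchf.c: expand_braces() is
a made-up name, and it simply returns a freshly allocated copy of one
argument with every "{}" replaced by the file name.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Return a newly allocated copy of 'tmpl' with every "{}" replaced by
 * 'name'. The caller frees the result. Returns NULL if malloc() fails.
 */
char *expand_braces(const char *tmpl, const char *name)
{
	size_t name_len = strlen(name);
	size_t count = 0, out_len;
	const char *p;
	char *out, *q;

	/* First pass: count placeholders so the output is sized exactly. */
	for (p = tmpl; (p = strstr(p, "{}")) != NULL; p += 2)
		count++;

	out_len = strlen(tmpl) - 2 * count + name_len * count;
	out = malloc(out_len + 1);
	if (!out)
		return NULL;

	/* Second pass: copy literal text and splice in the name. */
	q = out;
	for (p = tmpl; *p; ) {
		if (p[0] == '{' && p[1] == '}') {
			memcpy(q, name, name_len);
			q += name_len;
			p += 2;
		} else {
			*q++ = *p++;
		}
	}
	*q = '\0';
	return out;
}
```

Applying this to each argv element separately and then calling one of
the exec functions, rather than building a single string for a shell,
keeps file names containing spaces or shell metacharacters from being
reinterpreted.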

So what's it good for? Well, besides making fun of my coding skills,
it can be used to watch a directory and run a command when something
changes. For example, a one-line auto-gzip daemon that will haul off and
gzip anything you drop into ./gzipme:

watchf -i -eWT gzipme -- gzip gzipme/{}

Where to go from here? The code is relatively half-baked at the moment,
but I imagine this could be turned into a relatively useful generic
tool. For example, it should send the filename to stdout and return the
event mask when in single-shot mode. That would make it useful as part
of a longer pipeline. You should be able to use it to wait for a specific
file to be created --although that will be interesting if one or more
segments of the path don't exist yet. Ultimately I'd like to get rid of
the <command-to-run> argument(s), but I can't think of any way to do it
that isn't going to end up missing events.

--Adam

2005-04-08 01:52:39

by Robert Love

Subject: Re: [patch] inotify for 2.6.11

On Thu, 2005-04-07 at 18:37 -0700, Rusty Lynch wrote:

> Looking into this a little more I realized that the lack of /proc
> notifications (for processes coming and going) is a common problem anytime
> a file is modified without going through the VFS. Other examples are
> remote file changes on a mounted NFS partition, remote file changes on a
> mounted cluster filesystem (like ocfs or gfs), and just about any virtual
> file system where the kernel is adding/deleting/modifying files from below
> the VFS.

Indeed it is. But none of those are anything that we care about (except
maybe /proc).

The problem of changes on remote filesystems is solved by FAM.

Robert Love


2005-04-10 17:04:49

by Erik Meitner

Subject: Re: [patch] inotify for 2.6.11

Robert Love wrote:
> Below is inotify, diffed against 2.6.11.
>
> I greatly reworked much of the data structures and their interactions,
> to lay the groundwork for sanitizing the locking. I then, I hope,
> sanitized the locking. It looks right, I am happy. Comments welcome.
> I surely could of missed something. Maybe even something big.
>
> But, regardless, this release is a huge jump from the previous, fixing
> all known issues and greatly improving the locking.
>

Inotify 0.22 severely destabilized my 2.6.11 system when it was in use.
The system would get into a state where it was unable to spawn new
processes, the desktop would hang, and the only way to shut down would
be to do a hard reset. This only happened when inotify was in use after
gamin started up. I was only able to get the following out of the
logs (not every time, though):

c017b901
Modules linked in: radeon deflate zlib_deflate twofish serpent aes_i586
blowfish sha256 sha1 crypto_null af_key parport_pc lp parport 8250_pci
8250 serial_core usbhid ohci_hcd ohci1394 ieee1394 yenta_socket
rsrc_nonstatic
suspend_text
CPU: 0
EIP: 0060:[inotify_ignore+49/224] Not tainted VLI
EFLAGS: 00010246 (2.6.11.6-nx9005)
EIP is at inotify_ignore+0x31/0xe0
eax: 00000000 ebx: e88544e0 ecx: 00000000 edx: 00000000
esi: 00000000 edi: e88544e0 ebp: e8760000 esp: e8761f20
ds: 007b es: 007b ss: 0068
Process gam_server (pid: 6640, threadinfo=e8760000 task=ed19e060)
Stack: e88544e8 000000f6 e88544e0 fffffff2 fffffff7 c017bace e88544e0
000000f6
00000004 000000f6 00000292 00000000 08098e54 80045102 fffffff7
c01685a8
e5d584c0 80045102 08098e54 c0456118 00000000 e5d584c0 c0168785
e5d584c0
Call Trace:
[inotify_ioctl+286/288] inotify_ioctl+0x11e/0x120
[do_ioctl+104/128] do_ioctl+0x68/0x80
[vfs_ioctl+101/464] vfs_ioctl+0x65/0x1d0
[sys_ioctl+69/144] sys_ioctl+0x45/0x90
[syscall_call+7/11] syscall_call+0x7/0xb
Code: 10 89 5c 24 08 89 74 24 0c 8b 7c 24 18 ff 4f 28 0f 88 95 03 00 00
8b 44 24 1c 89 44 24 04 8d 47 08 89 04 24 e8 11 89 0b 00 89 c6 <ff> 40
10 ff 47 28 0f 8e 81 03 00 00 85 f6 b8 ea ff ff ff 74 6e

Thanks.