2004-11-01 18:46:29

by John McCutchan

Subject: [RFC][PATCH] inotify 0.15

Hello,

In this release of inotify I have added a new ulimit for the
number of watches a particular user can have. I am interested
in people's response to this addition and any alternative solutions.

Here is release 0.15.0 of inotify. Attached is a patch to 2.6.9

--New in this version--
-uses idr to manage watch descriptors (rml)
-drop limit on inotify instances (me)
-make watch limit a ulimit (me)
-dropped dnotify patch (me)
-misc cleanups (me,rml,Bryan Rittmeyer)

John McCutchan

Inotify documentation:

--HOWTO USE--
Inotify is a character device that, when opened, offers 2 ioctls.

INOTIFY_WATCH:
Which takes a path and event mask and returns a unique
(to the instance of the driver) integer (wd [watch descriptor]
from here on) that is a 1:1 mapping to the path passed.
What happens is inotify refs the inode for the path and
adds an inotify_watch structure to the inode. Each
inotify device instance can only watch an inode once.

INOTIFY_IGNORE:
Which takes an integer (that you got from INOTIFY_WATCH)
representing a wd that you are not interested in watching
anymore. This will:

* send an IGNORED event to the device
* remove the inotify_watch structure from the device and
  from the inode, and unref the inode

After you are watching 1 or more paths, you can read from the fd
and get events. The events are struct inotify_event. If you are
watching a directory and something happens to a file in the directory
the event will contain the filename (just the filename not the full
path).

-- EVENTS --
IN_ACCESS - Sent when file is accessed.
IN_MODIFY - Sent when file is modified.
IN_ATTRIB - Sent when file is chmod'ed.
IN_CLOSE_WRITE - Sent when file is closed and was opened for writing.
IN_CLOSE_NOWRITE - Sent when file is closed but was not opened for writing.
IN_CLOSE - Either of the two above events.
IN_OPEN - Sent when file is opened.
IN_MOVED_FROM - Sent to the source folder of a move.
IN_MOVED_TO - Sent to the destination folder of a move.
IN_DELETE_SUBDIR - Sent when a sub directory is deleted. (When watching parent)
IN_DELETE_FILE - Sent when a file is deleted. (When watching parent)
IN_CREATE_SUBDIR - Sent when a sub directory is created. (When watching parent)
IN_CREATE_FILE - Sent when a file is created. (When watching parent)
IN_DELETE_SELF - Sent when file is deleted.
IN_UNMOUNT - Sent when the filesystem is being unmounted.
IN_Q_OVERFLOW - Sent when your event queue has overflowed.

The MOVED_FROM/MOVED_TO events are always sent in pairs. They are also
sent when a file is renamed. The cookie field in the event structure
allows you to pair MOVED_FROM/MOVED_TO events. These two events are not
guaranteed to be successive in the event stream, so you must rely on the
cookie to pair them up.

If you aren't watching both the source and destination folders of a
move/rename, you will only get MOVED_TO or MOVED_FROM. In this case,
MOVED_TO is equivalent to a CREATE and MOVED_FROM is equivalent to a
DELETE.

--Why Not dnotify and Why inotify (By Robert Love)--

Everyone seems quick to deride the blunder known as "dnotify" and
applaud a replacement, any replacement, man, anything but that current
mess, but in the name of fairness I present my treatise on why dnotify
is what one might call not good:

* dnotify requires the opening of one fd per directory that you intend
  to watch.
    o The file descriptor pins the directory, disallowing the backing
      device to be unmounted, which absolutely wreaks havoc with
      removable media.
    o Watching many directories results in many open file descriptors,
      possibly hitting a per-process fd limit.
* dnotify is directory-based. You only learn about changes to
  directories. Sure, a change to a file in a directory affects the
  directory, but you are then forced to keep a cache of stat structures
  around to compare things in order to find out which file.
* dnotify's interface to user-space is awful.
    o dnotify uses signals to communicate with user-space.
    o Specifically, dnotify uses SIGIO.
    o But then you can pick a different signal! So by "signals," I
      really meant you need to use real-time signals if you want to
      queue the events.
* dnotify basically ignores any problems that would arise in the VFS
  from hard links.
* Rumor is that the "d" in "dnotify" does not stand for "directory"
  but for "suck."

A suitable replacement is "inotify." And now, my tract on what inotify
brings to the table:

* inotify's interface is a device node, not SIGIO.
    o You open only a single fd, to the device node. No more pinning
      directories or opening a million file descriptors.
    o Usage is nice: open the device, issue simple commands via
      ioctl(), and then block on the device. It returns events when,
      well, there are events to be returned.
    o You can select() on the device node and so it integrates with
      main loops like coffee mixed with vanilla milkshake.
* inotify has an event that says "the filesystem that the item you were
  watching is on was unmounted" (this is particularly cool).
* inotify can watch directories or files.
* The "i" in inotify does not stand for "suck" but for "inode" -- the
  logical choice since inotify is inode-based.

--COMPLEXITY--

I have been asked what the complexity of inotify is. Inotify has
2 code paths where complexity could be an issue:

Adding a watch to a device
    This code has to check if the inode is already being watched
    by the device. This is O(N), where N is the number of devices
    watching the inode.

Removing a watch from a device
    This code has to do a search of all watches on the device to
    find the watch descriptor that it is being asked to remove.
    This involves a linear search, but should not really be an issue
    because it is limited to 8192 entries. If this does turn into
    a concern, I would replace the list of watches on the device
    with a sorted binary tree, so that the search could be done
    very quickly.

In the near future this will be based on idr, and the complexity
will be O(log n).

The calls to inotify from the VFS code have a complexity of O(1).

--MEMORY USAGE--

The inotify data structures are lightweight:

inotify watch is 40 bytes
inotify device is 68 bytes
inotify event is 272 bytes

So assuming a device has 8192 watches, the structures are only going
to consume 320KB of memory. With a maximum number of 8 devices allowed
to exist at a time, this is still only 2.5 MB.

Each device can also have 256 events queued at a time, which sums to
68KB per device, and only 0.5 MB if all devices are opened and have
a full event queue.

So approximately 3 MB of memory are used in the rare case of
everything open and full.

Each inotify watch pins the inode of a directory/file in memory.
The size of an inode is different per file system, but let's assume
that it is 512 bytes.

So assuming the maximum number of global watches is active, this would
pin down 32 MB of inodes in the inode cache. Again not a problem
on a modern system.

On smaller systems, the maximum watches / events could be lowered
to provide a smaller footprint.

Keep in mind that this is an absolute worst case memory analysis.
In reality it will most likely cost approximately 5MB.

--KERNEL CHANGES--
inotify char device driver.

Adding calls to inotify_inode_queue_event and
inotify_dentry_parent_queue_event from VFS operations.
Dnotify has the same function calls.

Adding a call to inotify_super_block_umount from
generic_shutdown_superblock.

inotify_super_block_umount:
    find all of the inodes that are on the super block being shut down
    for each of these inodes:
        remove the watchers on them
        send UNMOUNT and IGNORED events
        unref the inode

--
John McCutchan <[email protected]>


Attachments:
inotify-0.15.diff (41.85 kB)

2004-11-01 22:33:25

by Robert Love

Subject: Re: [RFC][PATCH] inotify 0.15

On Mon, 2004-11-01 at 12:31 -0500, John McCutchan wrote:

> In this release of inotify I have added a new ulimit for the
> number of watches a particular user can have. I am interested
> in people's response to this addition and any alternative solutions.
>
> Here is release 0.15.0 of inotify. Attached is a patch to 2.6.9

By and by, people using 2.6.10-rc1 or -mm or whatever will find that
several interfaces used by this patch coincidentally changed.

So I thought I'd point you all at an updated 2.6.10-rc1 patch:

http://www.kernel.org/pub/linux/kernel/people/rml/inotify/v2.6/0.15/

Which might be easier.

Robert Love


2004-11-01 22:46:25

by Robert Love

Subject: [patch] inotify: change default number of queued events

Hola, John.

The default of 16384 maximum queued events is way too high. That is a
lot of physical memory that the user can pin.

Attached patch takes it to 512.

Robert Love


change the default number of queued events down from 16384 to 512

Signed-Off-By: Robert Love <[email protected]>

drivers/char/inotify.c | 2 +-
1 files changed, 1 insertion(+), 1 deletion(-)

diff -urN linux-2.6.10-rc1-inotify/drivers/char/inotify.c linux/drivers/char/inotify.c
--- linux-2.6.10-rc1-inotify/drivers/char/inotify.c 2004-11-01 15:50:36.163496688 -0500
+++ linux/drivers/char/inotify.c 2004-11-01 15:48:15.380898896 -0500
@@ -921,7 +921,7 @@
if (ret)
return ret;

- sysfs_attrib_max_queued_events = 16384;
+ sysfs_attrib_max_queued_events = 512;

for (i = 0;inotify_device_attrs[i];i++)
device_create_file(inotify_device.dev, inotify_device_attrs[i]);


2004-11-01 22:46:38

by Robert Love

Subject: [patch] inotify: misc. style cleanup

John!

Misc. style clean up.

Robert Love


misc. style cleanup

Signed-Off-By: Robert Love <[email protected]>

drivers/char/inotify.c | 46 ++++++++++++++++------------------------------
1 files changed, 16 insertions(+), 30 deletions(-)

diff -urN linux-2.6.10-rc1-inotify/drivers/char/inotify.c linux/drivers/char/inotify.c
--- linux-2.6.10-rc1-inotify/drivers/char/inotify.c 2004-11-01 15:04:12.039747832 -0500
+++ linux/drivers/char/inotify.c 2004-11-01 15:38:22.277064384 -0500
@@ -80,21 +80,20 @@
#define inotify_watch_i_list(pos) list_entry((pos), struct inotify_watch, i_list)
#define inotify_watch_u_list(pos) list_entry((pos), struct inotify_watch, u_list)

-static ssize_t show_max_queued_events (struct device *dev, char *buf)
+static ssize_t show_max_queued_events(struct device *dev, char *buf)
{
sprintf(buf, "%d", sysfs_attrib_max_queued_events);
-
- return strlen(buf)+1;
+ return strlen(buf) + 1;
}

-static ssize_t store_max_queued_events (struct device *dev,
- const char *buf,
- size_t count)
+static ssize_t store_max_queued_events(struct device *dev, const char *buf,
+ size_t count)
{
return 0;
}

-static DEVICE_ATTR(max_queued_events, S_IRUSR|S_IWUSR|S_IRGRP|S_IROTH, show_max_queued_events, store_max_queued_events);
+static DEVICE_ATTR(max_queued_events, S_IRUGO | S_IWUSR, show_max_queued_events,
+ store_max_queued_events);

static struct device_attribute *inotify_device_attrs[] = {
&dev_attr_max_queued_events,
@@ -203,9 +202,7 @@
{
struct inotify_kernel_event *kevent, *last;

- /*
- * Check if the new event is a duplicate of the last event queued.
- */
+ /* check if the new event is a duplicate of the last event queued. */
last = inotify_dev_get_event(dev);
if (dev->event_count && last->event.mask == mask &&
last->event.wd == watch->wd) {
@@ -272,17 +269,16 @@
* This function can sleep.
*/
static int inotify_dev_get_wd(struct inotify_device *dev,
- struct inotify_watch *watch)
+ struct inotify_watch *watch)
{
int ret;

- if (!dev || !watch || dev->nr_watches >= current->signal->rlim[RLIMIT_IWATCHES].rlim_cur)
+ if (dev->nr_watches >= current->rlim[RLIMIT_IWATCHES].rlim_cur)
return -1;

repeat:
- if (!idr_pre_get(&dev->idr, GFP_KERNEL)) {
+ if (!idr_pre_get(&dev->idr, GFP_KERNEL))
return -ENOSPC;
- }
spin_lock(&dev->lock);
ret = idr_get_new(&dev->idr, watch, &watch->wd);
spin_unlock(&dev->lock);
@@ -394,9 +390,8 @@
struct inotify_watch *watch;

list_for_each_entry(watch, &dev->watches, d_list) {
- if (watch->inode == inode) {
+ if (watch->inode == inode)
return 1;
- }
}

return 0;
@@ -410,9 +405,8 @@
static int inotify_dev_add_watch(struct inotify_device *dev,
struct inotify_watch *watch)
{
- if (!dev || !watch) {
+ if (!dev || !watch)
return -EINVAL;
- }

list_add(&watch->d_list, &dev->watches);
return 0;
@@ -477,9 +471,7 @@
}

if (inode_find_dev (inode, watch->dev))
- {
return -EINVAL;
- }

list_add(&watch->i_list, &inode->inotify_data->watches);
inode->inotify_data->watch_count++;
@@ -678,7 +670,7 @@
if (count < event_size)
return -EINVAL;

- while(1) {
+ while (1) {
int has_events;

spin_lock(&dev->lock);
@@ -760,9 +752,8 @@
{
struct inotify_watch *watch,*next;

- list_for_each_entry_safe(watch, next, &dev->watches, d_list) {
+ list_for_each_entry_safe(watch, next, &dev->watches, d_list)
ignore_helper(watch, 0);
- }
}

/*
@@ -797,9 +788,8 @@
int ret;

inode = find_inode(request->dirname);
- if (IS_ERR(inode)) {
+ if (IS_ERR(inode))
return PTR_ERR(inode);
- }

spin_lock(&inode->i_lock);
spin_lock(&dev->lock);
@@ -934,9 +924,7 @@
sysfs_attrib_max_queued_events = 16384;

for (i = 0;inotify_device_attrs[i];i++)
- {
- device_create_file (inotify_device.dev, inotify_device_attrs[i]);
- }
+ device_create_file(inotify_device.dev, inotify_device_attrs[i]);

atomic_set(&watch_count, 0);
atomic_set(&inotify_cookie, 0);
@@ -958,6 +946,4 @@
return 0;
}

-
-
module_init(inotify_init);


2004-11-02 05:00:57

by Robert Love

Subject: [patch] inotify: idr stuff

John.

MAX_ID_MASK is already set to the maximum IDR value, so we don't need to
add IDR_MAX_ID. I know this was my idea--I was, uh, being
metaphoric. :)

Plus, MAX_ID_MASK is computed based on IDR_BITS, so it is "always
right."

Also add an IWATCHES_HARD_LIMIT, set to MAX_ID_MASK, instead of using
MAX_ID_MASK directly.

Finally, this uncovers that idr.h is not protected against double
inclusion, so wrap it in ifndef's.

Best,

Robert Love


idr stuff

Signed-Off-By: Robert Love <[email protected]>

include/asm-i386/resource.h | 4 ++--
include/linux/idr.h | 5 ++++-
include/linux/inotify.h | 4 +++-
3 files changed, 9 insertions(+), 4 deletions(-)

diff -urN linux-2.6.10-rc1-inotify/include/asm-i386/resource.h linux/include/asm-i386/resource.h
--- linux-2.6.10-rc1-inotify/include/asm-i386/resource.h 2004-11-01 15:04:29.307122792 -0500
+++ linux/include/asm-i386/resource.h 2004-11-01 15:21:33.374440896 -0500
@@ -18,7 +18,7 @@
#define RLIMIT_LOCKS 10 /* maximum file locks held */
#define RLIMIT_SIGPENDING 11 /* max number of pending signals */
#define RLIMIT_MSGQUEUE 12 /* maximum bytes in POSIX mqueues */
-#define RLIMIT_IWATCHES 13 /* maximum number of inotify watches */
+#define RLIMIT_IWATCHES 13 /* maximum number of inotify watches */

#define RLIM_NLIMITS 14

@@ -46,7 +46,7 @@
{ RLIM_INFINITY, RLIM_INFINITY }, \
{ MAX_SIGPENDING, MAX_SIGPENDING }, \
{ MQ_BYTES_MAX, MQ_BYTES_MAX }, \
- { IWATCHES_SOFT_LIMIT, IDR_MAX_ID }, \
+ { IWATCHES_SOFT_LIMIT, IWATCHES_HARD_LIMIT }, \
}

#endif /* __KERNEL__ */
diff -urN linux-2.6.10-rc1-inotify/include/linux/idr.h linux/include/linux/idr.h
--- linux-2.6.10-rc1-inotify/include/linux/idr.h 2004-11-01 15:04:28.453252600 -0500
+++ linux/include/linux/idr.h 2004-11-01 15:24:01.538916472 -0500
@@ -1,3 +1,5 @@
+#ifndef _LINUX_IDR_H
+#define _LINUX_IDR_H
/*
* include/linux/idr.h
*
@@ -29,7 +31,6 @@
# error "BITS_PER_LONG is not 32 or 64"
#endif

-#define IDR_MAX_ID 0x7fffffff
#define IDR_SIZE (1 << IDR_BITS)
#define IDR_MASK ((1 << IDR_BITS)-1)

@@ -77,3 +78,5 @@
int idr_get_new_above(struct idr *idp, void *ptr, int starting_id, int *id);
void idr_remove(struct idr *idp, int id);
void idr_init(struct idr *idp);
+
+#endif /* _LINUX_IDR_H */
diff -urN linux-2.6.10-rc1-inotify/include/linux/inotify.h linux/include/linux/inotify.h
--- linux-2.6.10-rc1-inotify/include/linux/inotify.h 2004-11-01 15:04:28.684217488 -0500
+++ linux/include/linux/inotify.h 2004-11-01 15:21:48.714108912 -0500
@@ -72,8 +72,10 @@
#include <linux/dcache.h>
#include <linux/fs.h>
#include <linux/config.h>
+#include <linux/idr.h>

-#define IWATCHES_SOFT_LIMIT 16384
+#define IWATCHES_SOFT_LIMIT 16384
+#define IWATCHES_HARD_LIMIT MAX_ID_MASK

struct inotify_inode_data {
struct list_head watches;



2004-11-02 05:14:58

by Robert Love

Subject: [patch] inotify: fix dnotify compile

Hi, John.

Looks like the dnotify patch was only half removed. The declaration of
dir_notify_enable in include/linux/fs.h is still removed by the patch.
So e.g. sysctl.c does not compile.

You can remove that hunk from the patch or commit the following.

Thanks,

Robert Love


fix CONFIG_DNOTIFY compile

Signed-Off-By: Robert Love <[email protected]>

include/linux/fs.h | 2 +-
1 files changed, 1 insertion(+), 1 deletion(-)

diff -u linux-poop/include/linux/fs.h linux/include/linux/fs.h
--- linux-poop/include/linux/fs.h 2004-10-26 22:13:06.000000000 -0400
+++ linux/include/linux/fs.h 2004-11-01 14:52:58.630121656 -0500
@@ -62,7 +62,7 @@
};
extern struct inodes_stat_t inodes_stat;

-extern int leases_enable, lease_break_time;
+extern int leases_enable, dir_notify_enable, lease_break_time;

#define NR_FILE 8192 /* this can well be larger on a larger system */
#define NR_RESERVED_FILES 10 /* reserved for root */


2004-11-03 02:52:38

by Gene Heskett

Subject: Re: [patch] inotify: idr stuff

On Monday 01 November 2004 15:28, Robert Love wrote:
>John.
>
>MAX_ID_MASK is already set to the maximum IDR value, so we don't
> need to add IDR_MAX_ID. I know this was my idea--I was, uh, being
>metaphoric. :)
>
>Plus, MAX_ID_MASK is computed based on IDR_BITS, so it is "always
>right."
>
>Also add an IWATCHES_HARD_LIMIT, set to MAX_ID_MASK, instead of
> using MAX_ID_MASK directly.
>
>Finally, this uncovers that idr.h is not protected against double
>inclusion, so wrap it in ifndef's.
>
>Best,
>
> Robert Love

Anybody know what's going on? I just got 8 copies of this
[...]

--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.28% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.