Hello,
Here is release 0.9.2 of inotify. Attached is a patch to 2.6.8.1
--New in this version--
Works with 4K stack kernels
Stops timers properly when using SMP
A couple of things to note,
The memory usage analysis below is worst case, when inotify is watching
65536 inodes.
Inotify does pin inodes, but this is no different than dnotify.
Inotify is designed as a replacement for dnotify.
The key difference's are:
Inotify does not require the file to be opened to watch it
When a path you are watching is unmounted you will recieve an UNMOUNT
event.
Events are devilered over a FD that is select()-able not with signals.
--COMPLEXITY--
I have been asked what the complexity of inotify is. Inotify has
2 path codes where complexity could be an issue:
Adding a watcher to a device
This code has to check if the inode is already being watched
by the device, this is O(1) since the maximum number of
devices is limited to 8.
Removing a watch from a device
This code has to do a search of all watches on the device to
find the watch descriptor that is being asked to remove.
This involves a linear search, but should not really be an issue
because it is limited to 8192 entries. If this does turn in to
a concern, I would replace the list of watches on the device
with a sorted binary tree, so that the search could be done
very quickly.
The calls to inotify from the VFS code has a complexity of O(1) so
inotify does not affect the speed of VFS operations.
--MEMORY USAGE--
The inotify data structures are light weight:
inotify watch is 40 bytes
inotify device is 68 bytes
inotify event is 272 bytes
So assuming a device has 8192 watches, the structures are only going
to consume 320KB of memory. With a maximum number of 8 devices allowed
to exist at a time, this is still only 2.5 MB
Each device can also have 256 events queued at a time, which sums to
68KB per device. And only .5 MB if all devices are opened and have
a full event queue.
So approximately 3 MB of memory are used in the rare case of
everything open and full.
Each inotify watch pins the inode of a directory/file in memory,
the size of an inode is different per file system but lets assume
that it is 512 byes.
So assuming the maximum number of global watches are active, this would
pin down 32 MB of inodes in the inode cache. Again not a problem
on a modern system.
On smaller systems, the maximum watches / events could be lowered
to provide a smaller foot print.
Older release notes:
I am resubmitting inotify for comments and review. Inotify has
changed drastically from the earlier proposal that Al Viro did not
approve of. There is no longer any use of (device number, inode number)
pairs. Please give this version of inotify a fresh view.
Inotify is a character device that when opened offers 2 IOCTL's.
(It actually has 4 but the other 2 are used for debugging)
INOTIFY_WATCH:
Which takes a path and event mask and returns a unique
(to the instance of the driver) integer (wd [watch descriptor]
from here on) that is a 1:1 mapping to the path passed.
What happens is inotify gets the inode (and ref's the inode)
for the path and adds a inotify_watcher structure to the inodes
list of watchers. If this instance of the driver is already
watching the path, the event mask will be updated and
the original wd will be returned.
INOTIFY_IGNORE:
Which takes an integer (that you got from INOTIFY_WATCH)
representing a wd that you are not interested in watching
anymore. This will:
send an IGNORE event to the device
remove the inotify_watcher structure from the device and
from the inode and unref the inode.
After you are watching 1 or more paths, you can read from the fd
and get events. The events are struct inotify_event. If you are
watching a directory and something happens to a file in the directory
the event will contain the filename (just the filename not the full
path).
Aside from the inotify character device driver.
The changes to the kernel are very minor.
The first change is adding calls to inotify_inode_queue_event and
inotify_dentry_parent_queue_event from the various vfs functions. This
is identical to dnotify.
The second change is more serious, it adds a call to
inotify_super_block_umount
inside generic_shutdown_superblock. What inotify_super_block_umount does
is:
find all of the inodes that are on the super block being shut down,
sends each watcher on each inode the UNMOUNT and IGNORED event
removes the watcher structures from each instance of the device driver
and each inode.
unref's the inode.
I would appreciate design review, code review and testing.
John
On Sun, 2004-09-19 at 23:56 -0400, John McCutchan wrote:
> I would appreciate design review, code review and testing.
Hi, John.
When you pass SLAB_PANIC to kmem_cache_create(), the slab layer will
panic the kernel and thus halt the machine. So there is no reason to
check the return value.
We could remove the SLAB_PANIC, but I think that it makes sense, so
instead I removed the code checking the returns, saving a few bytes.
Patch is against your inotify patch.
Robert Love
On Sun, 2004-09-19 at 23:56 -0400, John McCutchan wrote:
> I would appreciate design review, code review and testing.
Hey, John!
Attached patch, against inotify 0.9.2, cleans up the inotify.h header
file a bit. The patch is a little noisy because I reformat some lines
for readability or coding style concerns, but there are a few important
changes:
- Use PATH_MAX, the actual maximum path length, not 256, for the
size of "filename" in "struct inotify_event"
- Separate with __KERNEL__ the kernel from the user API
- Instead of forward declaring the inode, superblock, and
dentry structures, include the appropriate header file
Thanks!
Robert Love
On Sun, 2004-09-19 at 23:56 -0400, John McCutchan wrote:
> I would appreciate design review, code review and testing.
Hi, John.
Attached patches fixes two compile warnings I receive in
drivers/char/inotify.c:
- Declaration after code in inotify_watch()
- Uncasted conversion from pointer to integer
Also, wrap some arithmetic in parenthesis, to be safe.
Best,
Robert Love
On Sun, 2004-09-19 at 23:56 -0400, John McCutchan wrote:
> I would appreciate design review, code review and testing.
Hello sir!
kmalloc() can fail, so check the return value and handle appropriately.
Patch is against inotify 0.9.2 plus the last patches I sent.
Thanks,
Robert Love
Robert Love wrote:
>
> +/*
> + * struct inotify_event - structure read from the inotify device for each event
> + *
> + * When you are watching a directory, you will receive the filename for events
> + * such as IN_CREATE, IN_DELETE, IN_OPEN, IN_CLOSE, and so on ...
> + *
> + * Note: When reading from the device you must provide a buffer that is a
> + * multiple of sizeof(struct inotify_event)
> + */
> struct inotify_event {
> int wd;
> int mask;
> - char filename[256];
> + char filename[PATH_MAX];
> };
You really want to shove >4kB per event to userspace???
Ciao, ET.
Edgar Toernig wrote:
> Robert Love wrote:
>
>> struct inotify_event {
>> int wd;
>> int mask;
>>- char filename[256];
>>+ char filename[PATH_MAX];
>> };
>
>
> You really want to shove >4kB per event to userspace???
Ouch.
Maybe make it variable-size? On average it would likely be shorter.
struct inotify_event {
int wd;
int mask;
short namelen;
char filename[0];
};
Chris
On Tue, 2004-09-21 at 17:34 +0200, Edgar Toernig wrote:
> You really want to shove >4kB per event to userspace???
Eh, good point. And I guess this is just the filename.
Probably best to keep it at 256, then, until we think of something
better.
Attached patch is the header cleanup minus the PATH_MAX bit.
Robert Love
On Sun, 2004-09-19 at 23:56 -0400, John McCutchan wrote:
> I would appreciate design review, code review and testing.
Hey, John.
We are seeing an oops when monitoring a large number of directories.
The system keeps running, but I/O gets flaky and eventually processes
start getting stuck.
Also, the ioctl() stops returning new WD after 1024. Thereafter, it
keeps returning the same value.
I have attached the relevant bits from the syslog. I will debug it, but
I thought that perhaps you would immediately see the issue.
Thanks,
Robert Love
On Tue, 2004-09-21 at 12:04 -0400, Robert Love wrote:
> Hey, John.
>
> We are seeing an oops when monitoring a large number of directories.
> The system keeps running, but I/O gets flaky and eventually processes
> start getting stuck.
>
> Also, the ioctl() stops returning new WD after 1024. Thereafter, it
> keeps returning the same value.
>
> I have attached the relevant bits from the syslog. I will debug it, but
> I thought that perhaps you would immediately see the issue.
OK. I fixed the problem with ioctl() failing after 1024 WD's. This may
also fix the oopses. Still checking on that.
The problem was that we were passing the size of dev->bitmask in _bytes_
to find_first_zero_bit(). But find_first_zero_bit()'s second parameter
is the size in _bits_.
I then went ahead and just made dev->bitmask an array, since we know the
size at compile time.
Comments?
Best,
Robert Love
On Tue, 2004-09-21 at 14:56 -0400, Robert Love wrote:
> I then went ahead and just made dev->bitmask an array, since we know the
> size at compile time.
Forgot to remove the kfree() there, which I did not hit in testing until
now.
This patch applies on top of the previous.
Robert Love
Signed-Off-By: Robert "Cheese Taste Good" Love <[email protected]>
drivers/char/inotify.c | 1 -
1 files changed, 1 deletion(-)
--- linux-inotify-rml/drivers/char/inotify.c.orig 2004-09-21 16:50:00.643770936 -0400
+++ linux-inotify-rml/drivers/char/inotify.c 2004-09-21 16:50:45.118009824 -0400
@@ -711,7 +711,6 @@
inotify_release_all_events(dev);
- kfree (dev->bitmask);
kfree (dev);
}
On Tue, 2004-09-21 at 11:43, Chris Friesen wrote:
> Edgar Toernig wrote:
> > Robert Love wrote:
> >
> >> struct inotify_event {
> >> int wd;
> >> int mask;
> >>- char filename[256];
> >>+ char filename[PATH_MAX];
> >> };
> >
> >
> > You really want to shove >4kB per event to userspace???
>
> Ouch.
>
> Maybe make it variable-size? On average it would likely be shorter.
>
> struct inotify_event {
> int wd;
> int mask;
> short namelen;
> char filename[0];
> };
This makes reading events from inotify a pain, first you need to read up
to namelen, then read namelen more bytes.
John
On Tue, 2004-09-21 at 14:56, Robert Love wrote:
> On Tue, 2004-09-21 at 12:04 -0400, Robert Love wrote:
>
> > Hey, John.
> >
> > We are seeing an oops when monitoring a large number of directories.
> > The system keeps running, but I/O gets flaky and eventually processes
> > start getting stuck.
> >
> > Also, the ioctl() stops returning new WD after 1024. Thereafter, it
> > keeps returning the same value.
> >
> > I have attached the relevant bits from the syslog. I will debug it, but
> > I thought that perhaps you would immediately see the issue.
>
> OK. I fixed the problem with ioctl() failing after 1024 WD's. This may
> also fix the oopses. Still checking on that.
>
I hope it fixes the oopses, I have only just started looking at the oops
you sent me, and nothing jumped out at me.
> The problem was that we were passing the size of dev->bitmask in _bytes_
> to find_first_zero_bit(). But find_first_zero_bit()'s second parameter
> is the size in _bits_.
>
Good to know, I wasn't sure when I wrote this code.
> I then went ahead and just made dev->bitmask an array, since we know the
> size at compile time.
>
> Comments?
Sounds good.
John
On Tue, 2004-09-21 at 22:32 -0400, John McCutchan wrote:
> I hope it fixes the oopses, I have only just started looking at the oops
> you sent me, and nothing jumped out at me.
It seems to have fixed the oops.
We are seeing nothing but perfection with the updated patch that I
posted. ;-)
Robert Love
On Tue, 2004-09-21 at 19:27, John McCutchan wrote:
> On Tue, 2004-09-21 at 11:43, Chris Friesen wrote:
> > Edgar Toernig wrote:
> > > Robert Love wrote:
> > >
> > >> struct inotify_event {
> > >> int wd;
> > >> int mask;
> > >>- char filename[256];
> > >>+ char filename[PATH_MAX];
> > >> };
> > >
> > >
> > > You really want to shove >4kB per event to userspace???
> >
> > Ouch.
> >
> > Maybe make it variable-size? On average it would likely be shorter.
> >
> > struct inotify_event {
> > int wd;
> > int mask;
> > short namelen;
> > char filename[0];
> > };
>
> This makes reading events from inotify a pain, first you need to read up
> to namelen, then read namelen more bytes.
>
Not the case. A mildly smarter userspace program would merely read
everything outstanding (or everything up to a fixed buffer length), and
then unserialize the events based on the boundaries it can figure out
from the first portion of the structure.
This is a way common technique.
At least in code I write, anyway :-).
Ray
On Wed, 2004-09-22 at 21:46, Ray Lee wrote:
> On Tue, 2004-09-21 at 19:27, John McCutchan wrote:
> > On Tue, 2004-09-21 at 11:43, Chris Friesen wrote:
> > > Edgar Toernig wrote:
> > > > Robert Love wrote:
> > > >
> > > >> struct inotify_event {
> > > >> int wd;
> > > >> int mask;
> > > >>- char filename[256];
> > > >>+ char filename[PATH_MAX];
> > > >> };
> > > >
> > > >
> > > > You really want to shove >4kB per event to userspace???
> > >
> > > Ouch.
> > >
> > > Maybe make it variable-size? On average it would likely be shorter.
> > >
> > > struct inotify_event {
> > > int wd;
> > > int mask;
> > > short namelen;
> > > char filename[0];
> > > };
> >
> > This makes reading events from inotify a pain, first you need to read up
> > to namelen, then read namelen more bytes.
> >
>
> Not the case. A mildly smarter userspace program would merely read
> everything outstanding (or everything up to a fixed buffer length), and
> then unserialize the events based on the boundaries it can figure out
> from the first portion of the structure.
>
> This is a way common technique.
>
> At least in code I write, anyway :-).
Of course this is possible, but the inotify kernel driver only allows userspace
program to read in event sized chunks (So that the event queue
handling is kept simple)
John
Okay, just skimmed the latest patch to try to make sure I'm not talking
crazy talk. No guarantees, though.
On Wed, 2004-09-22 at 20:42, John McCutchan wrote:
> the inotify kernel driver only allows userspace
> program to read in event sized chunks (So that the event queue
> handling is kept simple)
It's not much more code.
Instead of calculating events as:
+static ssize_t inotify_read(struct file *file, __user char *buf,
+ size_t count, loff_t *pos) {
[...]
+ int events;
[...]
+ /* We only hand out full inotify events */
+ if (count < sizeof(struct inotify_event)) {
+ out = -EINVAL;
+ goto out;
+ }
+
+ events = count / sizeof(struct inotify_event);
...just keep track of the actual byte count left in buf, and continue
copying until the next event would overflow buf. Require userspace to
provide a buffer at least NAME_MAX + sizeof(struct inotify_event) [*]
where the last field in the struct is declared as filename[0], which
will guarantee forward progress in passing events.
[*] Here's one of those things that makes me think that I'm
talking out my tush. The comments claim that only the filename
will be returned to userspace, but later on another comment says
that the size might technically fly up to PATH_MAX. Wassup?
Events still arrive at userspace in logical chunks; all is good.
Perhaps I'm missing something. Always a possibility, that.
BTW:
<pedantic>
+ unsigned long bitmask[MAX_INOTIFY_DEV_WATCHERS/BITS_PER_LONG];
would be more correct if written
unsigned long bitmask[(MAX_INOTIFY_DEV_WATCHERS + BITS_PER_LONG - 1) / BITS_PER_LONG];
</pedantic>
BTW #2: 'mask' is variously declared as an unsigned long and other times
as an int. Granted, the two base declarations seem to live in different
structs, but I can't figure out when a mask-like thing would want to be
signed. Please consider either changing the name or, more likely,
changing all usages to unsigned. My single linear reading through the
patch hasn't quite clarified the usage to me.
Ray
P.s. Have I mentioned that I like the inotify idea a heck of a lot
better than dnotify? Ghu save us from people who think signals are a
wonderful way to communicate complex information.
On Wed, 2004-09-22 at 21:52 -0700, Ray Lee wrote:
> [*] Here's one of those things that makes me think that I'm
> talking out my tush. The comments claim that only the filename
> will be returned to userspace, but later on another comment says
> that the size might technically fly up to PATH_MAX. Wassup?
Technically speaking, a single filename can be as large as PATH_MAX-1.
The comment is just a warning, though, to explain the dreary theoretical
side of the world. Pragmatism demands that we just use
INOTIFY_FILENAME_MAX, which is a more reasonable 256.
> BTW:
> <pedantic>
> + unsigned long bitmask[MAX_INOTIFY_DEV_WATCHERS/BITS_PER_LONG];
>
> would be more correct if written
>
> unsigned long bitmask[(MAX_INOTIFY_DEV_WATCHERS + BITS_PER_LONG - 1) / BITS_PER_LONG];
>
> </pedantic>
Indeed! Although we define MAX_INOTIFY_DEV_WATCHERS right above and it
is a power of two.
> BTW #2: 'mask' is variously declared as an unsigned long and other times
> as an int. Granted, the two base declarations seem to live in different
> structs, but I can't figure out when a mask-like thing would want to be
> signed. Please consider either changing the name or, more likely,
> changing all usages to unsigned. My single linear reading through the
> patch hasn't quite clarified the usage to me.
Probably should just be an 'unsigned int' everywhere.
But there are a few variables that have the same name in various
structures. That confuses me to no end, but I am jumpy like that.
> P.s. Have I mentioned that I like the inotify idea a heck of a lot
> better than dnotify? Ghu save us from people who think signals are a
> wonderful way to communicate complex information.
Oh, dude, inotify is a godsend.
Robert Love
I still might be talking crazy talk, but...
On Wed, 2004-09-22 at 22:10, Robert Love wrote:
> The comment is just a warning, though, to explain the dreary theoretical
> side of the world. Pragmatism demands that we just use
> INOTIFY_FILENAME_MAX, which is a more reasonable 256.
Why bother being pragmatic when we can be correct? It's not much more
code to just do it right, and allow up to PATH_MAX filenames to be
passed to userspace.
As an alternate argument, an `ls -1 | wc` on a randomly picked directory
of my filesystem reveals an average filename length of just under 11
characters. We can save some memory (and a lot of syscalls!) by packing
in events more tightly than the 256 character statically sized
rendition.
So, correct *and* efficient. Again, what am I missing here?
> > BTW:
> > <pedantic>
> > + unsigned long bitmask[MAX_INOTIFY_DEV_WATCHERS/BITS_PER_LONG];
> >
> > would be more correct if written
> >
> > unsigned long bitmask[(MAX_INOTIFY_DEV_WATCHERS + BITS_PER_LONG - 1) / BITS_PER_LONG];
> >
> > </pedantic>
>
> Indeed! Although we define MAX_INOTIFY_DEV_WATCHERS right above and it
> is a power of two.
Sure, it's a multiple of BITS_PER_LONG right *now*, but what about in
the future? It only adds compile time overhead. Regardless, it's why I
wrote my comment as 'more correct.' I won't lose any sleep over that
change not going in, but there's no reason to encourage bad coding
habits.
> Oh, dude, inotify is a godsend.
Amen. So let's get it nailed, so that this corner of the problem space
can be considered done, and we can all move on to building bigger and
better things on top of it.
Ray