I have a customer application which sees a significant performance
degradation when run with udevd running. This appears to be due to:
void inotify_dentry_parent_queue_event(struct dentry *dentry, u32 mask,
				       u32 cookie, const char *name)
{
	...
	if (!atomic_read(&inotify_watches))
		return;

	spin_lock(&dentry->d_lock);
The particular application is a 512-thread application. When run with
udevd turned off, the application finishes in about 6 seconds. With it
turned on, the application takes minutes (I think we timed it at around
22 minutes, but we have not been patient enough to wait it through to
the end in some time). This is being run on a 512-CPU system, but there
is a noticeable performance hit on smaller systems as well.
As far as I can tell, the application has multiple threads writing
different portions of the same file, but the customer is still working
on providing a simplified test case.
I know _VERY_ little about filesystems. udevd appears to be looking
at /etc/udev/rules.d. This bumps inotify_watches to 1. The file
being written is on an xfs filesystem mounted at a different mountpoint.
Could the inotify flag be moved from a global to a sb (or something
finer) point and therefore avoid taking the dentry->d_lock when there
is no possibility of a watch event being queued.
Thanks,
Robin Holt
Robin Holt <[email protected]> wrote:
>
>
> I have a customer application which sees a significant performance
> degradation when run with udevd running. This appears to be due to:
>
>
> void inotify_dentry_parent_queue_event(struct dentry *dentry, u32 mask,
> u32 cookie, const char *name)
> {
> ...
> if (!atomic_read (&inotify_watches))
> return;
>
> spin_lock(&dentry->d_lock);
>
> The particular application is a 512-thread application. When run with
> udevd turned off, the application finishes in about 6 seconds. With it
> turned on, the application takes minutes (I think we timed it at around
> 22 minutes, but we have not been patient enough to wait it through to
> the end in some time). This is being run on a 512-CPU system, but there
> is a noticeable performance hit on smaller systems as well.
>
> As far as I can tell, the application has multiple threads writing
> different portions of the same file, but the customer is still working
> on providing a simplified test case.
Are you able to work out whether inotify_inode_watched(inode) is returning
true?
Probably it isn't, and everyone is hammering dentry->d_lock. You'll
probably find that if all those processes were accessing the shared file
via separate filenames (all hardlinked to the same file), things would
improve.
I get a screwed feeling about this. We have to take d_lock so we can get
at the parent dentry. Otherwise we have obscure races with rename and
unlink.
> I know _VERY_ little about filesystems. udevd appears to be looking
> at /etc/udev/rules.d. This bumps inotify_watches to 1. The file
> being written is on an xfs filesystem mounted at a different mountpoint.
> Could the inotify flag be moved from a global to a sb (or something
> finer) point and therefore avoid taking the dentry->d_lock when there
> is no possibility of a watch event being queued.
umm, yes, that's a bit of a palliative, but we could probably move
inotify_watches into dentry->d_inode->i_sb.
On Wed, 2006-22-02 at 07:42 -0600, Robin Holt wrote:
>
> I know _VERY_ little about filesystems. udevd appears to be looking
> at /etc/udev/rules.d. This bumps inotify_watches to 1. The file
> being written is on an xfs filesystem mounted at a different mountpoint.
> Could the inotify flag be moved from a global to a sb (or something
> finer) point and therefore avoid taking the dentry->d_lock when there
> is no possibility of a watch event being queued.
We could do this, and avoid the problem, but only in this specific
scenario. The file being written is on a different mountpoint, but what's
to stop a different app from running inotify on that mount point?
Perhaps the program could be altered instead?
--
John McCutchan <[email protected]>
On Wed, Feb 22, 2006 at 11:48:23AM -0500, John McCutchan wrote:
> On Wed, 2006-22-02 at 07:42 -0600, Robin Holt wrote:
> >
> > I know _VERY_ little about filesystems. udevd appears to be looking
> > at /etc/udev/rules.d. This bumps inotify_watches to 1. The file
> > being written is on an xfs filesystem mounted at a different mountpoint.
> > Could the inotify flag be moved from a global to a sb (or something
> > finer) point and therefore avoid taking the dentry->d_lock when there
> > is no possibility of a watch event being queued.
>
> We could do this, and avoid the problem, but only in this specific
> scenario. The file being written is on a different mountpoint, but what's
> to stop a different app from running inotify on that mount point?
> Perhaps the program could be altered instead?
Looking at fsnotify_access(), I think we could hit the same scenario.
Are you proposing we alter any application where multiple threads read
a single data file to first make a hard link to that data file and each
read from their private copy? I don't think that is a very reasonable
suggestion.
Let me reiterate, I know _VERY_ little about filesystems. Can the
dentry->d_lock be changed to a read/write lock?
Thanks,
Robin
Robin Holt <[email protected]> wrote:
>
> Let me reiterate, I know _VERY_ little about filesystems. Can the
> dentry->d_lock be changed to a read/write lock?
Well, it could, but I suspect that won't help - the hold times in there
will be very short so the problem is more likely acquisition frequency.
However it's a bit strange that this function is the bottleneck. If their
workload is doing large numbers of reads or writes from large numbers of
processes against the same file then they should be hitting heavy
contention on other locks, such as i_sem and/or tree_lock and/or lru_lock
and others.
Can you tell us more about the kernel-visible behaviour of this app?
Andrew Morton wrote:
>Robin Holt <[email protected]> wrote:
>
>
>>Let me reiterate, I know _VERY_ little about filesystems. Can the
>> dentry->d_lock be changed to a read/write lock?
>>
>>
>
>Well, it could, but I suspect that won't help - the hold times in there
>will be very short so the problem is more likely acquisition frequency.
>
>However it's a bit strange that this function is the bottleneck. If their
>workload is doing large numbers of reads or writes from large numbers of
>processes against the same file then they should be hitting heavy
>contention on other locks, such as i_sem and/or tree_lock and/or lru_lock
>and others.
>
>Can you tell us more about the kernel-visible behaviour of this app?
>
>
I have also seen this problem, and it's hard to reproduce. What you will
see is udev getting spawned multiple times, as reported by top. I have
found it's related to intermittent failures of the hard drive, with the
hotplugger for some reason invoking udev multiple times in response to
this. I saw it on a Compaq laptop right before my hard drive croaked, and
it seems BIOS-specific as well, since I have not seen it again or been
able to reproduce it reliably.
Jeff
On Wed, 2006-22-02 at 11:50 -0600, Robin Holt wrote:
> On Wed, Feb 22, 2006 at 11:48:23AM -0500, John McCutchan wrote:
> > On Wed, 2006-22-02 at 07:42 -0600, Robin Holt wrote:
> > >
> > > I know _VERY_ little about filesystems. udevd appears to be looking
> > > at /etc/udev/rules.d. This bumps inotify_watches to 1. The file
> > > being written is on an xfs filesystem mounted at a different mountpoint.
> > > Could the inotify flag be moved from a global to a sb (or something
> > > finer) point and therefore avoid taking the dentry->d_lock when there
> > > is no possibility of a watch event being queued.
> >
> > We could do this, and avoid the problem, but only in this specific
> > scenario. The file being written is on a different mountpoint, but what's
> > to stop a different app from running inotify on that mount point?
> > Perhaps the program could be altered instead?
>
> Looking at fsnotify_access() I think we could hit the same scenario.
> Are you proposing we alter any application where multiple threads read
> a single data file to first make a hard link to that data file and each
> read from their private copy? I don't think that is a very reasonable
> suggestion.
Listen, what I'm saying is that your suggested change will only help in
one specific scenario, and simply having inotify used on the 'wrong'
mountpoint will get you back to square one. So, your suggestion isn't
really a solution, but a way of avoiding the real problem. What I *am*
suggesting is that a real fix be found, instead of your hack.
--
John McCutchan <[email protected]>
John McCutchan <[email protected]> wrote:
>
> On Wed, 2006-22-02 at 11:50 -0600, Robin Holt wrote:
> > On Wed, Feb 22, 2006 at 11:48:23AM -0500, John McCutchan wrote:
> > > On Wed, 2006-22-02 at 07:42 -0600, Robin Holt wrote:
> > > >
> > > > I know _VERY_ little about filesystems. udevd appears to be looking
> > > > at /etc/udev/rules.d. This bumps inotify_watches to 1. The file
> > > > being written is on an xfs filesystem mounted at a different mountpoint.
> > > > Could the inotify flag be moved from a global to a sb (or something
> > > > finer) point and therefore avoid taking the dentry->d_lock when there
> > > > is no possibility of a watch event being queued.
> > >
> > > We could do this, and avoid the problem, but only in this specific
> > > scenario. The file being written is on a different mountpoint, but what's
> > > to stop a different app from running inotify on that mount point?
> > > Perhaps the program could be altered instead?
> >
> > Looking at fsnotify_access() I think we could hit the same scenario.
> > Are you proposing we alter any application where multiple threads read
> > a single data file to first make a hard link to that data file and each
> > read from their private copy? I don't think that is a very reasonable
> > suggestion.
>
> Listen, what I'm saying is that your suggested change will only help in
> one specific scenario, and simply having inotify used on the 'wrong'
> mountpoint will get you back to square one. So, your suggestion isn't
> really a solution, but a way of avoiding the real problem. What I *am*
> suggesting is that a real fix be found,
I have a bad feeling about this one. It'd be nice to have an exact
understanding of the problem's source, but if it's just lots of traffic on
->d_lock we're kinda stuck. I don't expect we'll run off and RCUify
d_parent or turn d_lock into a seq_lock or anything like that.
Then again, maybe making d_lock an rwlock _will_ help - if this workload is
also hitting tree_lock (Robin?) and we're not seeing suckiness due to that
then perhaps the rwlock is magically helping.
> instead of your hack.
It's not a terribly bad hack - it's just poor-man's hashing, and it's
reasonably well-suited to the sorts of machines and workloads which we
expect will hit this problem.
On Wed, 2006-22-02 at 15:12 -0800, Andrew Morton wrote:
> John McCutchan <[email protected]> wrote:
> >
> > On Wed, 2006-22-02 at 11:50 -0600, Robin Holt wrote:
> > > On Wed, Feb 22, 2006 at 11:48:23AM -0500, John McCutchan wrote:
> > > > On Wed, 2006-22-02 at 07:42 -0600, Robin Holt wrote:
> > > > >
> > > > > I know _VERY_ little about filesystems. udevd appears to be looking
> > > > > at /etc/udev/rules.d. This bumps inotify_watches to 1. The file
> > > > > being written is on an xfs filesystem mounted at a different mountpoint.
> > > > > Could the inotify flag be moved from a global to a sb (or something
> > > > > finer) point and therefore avoid taking the dentry->d_lock when there
> > > > > is no possibility of a watch event being queued.
> > > >
> > > > We could do this, and avoid the problem, but only in this specific
> > > > scenario. The file being written is on a different mountpoint, but what's
> > > > to stop a different app from running inotify on that mount point?
> > > > Perhaps the program could be altered instead?
> > >
> > > Looking at fsnotify_access() I think we could hit the same scenario.
> > > Are you proposing we alter any application where multiple threads read
> > > a single data file to first make a hard link to that data file and each
> > > read from their private copy? I don't think that is a very reasonable
> > > suggestion.
> >
> > Listen, what I'm saying is that your suggested change will only help in
> > one specific scenario, and simply having inotify used on the 'wrong'
> > mountpoint will get you back to square one. So, your suggestion isn't
> > really a solution, but a way of avoiding the real problem. What I *am*
> > suggesting is that a real fix be found,
>
> I have a bad feeling about this one. It'd be nice to have an exact
> understanding of the problem's source, but if it's just lots of traffic on
> ->d_lock we're kinda stuck. I don't expect we'll run off and RCUify
> d_parent or turn d_lock into a seq_lock or anything like that.
>
> Then again, maybe making d_lock an rwlock _will_ help - if this workload is
> also hitting tree_lock (Robin?) and we're not seeing suckiness due to that
> then perhaps the rwlock is magically helping.
>
>
> > instead of your hack.
>
> It's not a terribly bad hack - it's just poor-man's hashing, and it's
> reasonably well-suited to the sorts of machines and workloads which we
> expect will hit this problem.
>
If this is as good as it gets, here is a patch (totally untested).
Signed-off-by: John McCutchan <[email protected]>
Index: linux-2.6.16-rc4/fs/inotify.c
===================================================================
--- linux-2.6.16-rc4.orig/fs/inotify.c 2006-02-17 17:23:45.000000000 -0500
+++ linux-2.6.16-rc4/fs/inotify.c 2006-02-22 18:36:29.000000000 -0500
@@ -38,7 +38,6 @@
#include <asm/ioctls.h>
static atomic_t inotify_cookie;
-static atomic_t inotify_watches;
static kmem_cache_t *watch_cachep;
static kmem_cache_t *event_cachep;
@@ -426,7 +425,7 @@
get_inotify_watch(watch);
atomic_inc(&dev->user->inotify_watches);
- atomic_inc(&inotify_watches);
+ atomic_inc(&inode->i_sb->s_inotify_watches);
return watch;
}
@@ -459,7 +458,7 @@
list_del(&watch->d_list);
atomic_dec(&dev->user->inotify_watches);
- atomic_dec(&inotify_watches);
+ atomic_dec(&watch->inode->i_sb->s_inotify_watches);
idr_remove(&dev->idr, watch->wd);
put_inotify_watch(watch);
}
@@ -538,7 +537,7 @@
struct dentry *parent;
struct inode *inode;
- if (!atomic_read (&inotify_watches))
+ if (!atomic_read (&dentry->d_sb->s_inotify_watches))
return;
spin_lock(&dentry->d_lock);
@@ -1065,7 +1064,6 @@
inotify_max_user_watches = 8192;
atomic_set(&inotify_cookie, 0);
- atomic_set(&inotify_watches, 0);
watch_cachep = kmem_cache_create("inotify_watch_cache",
sizeof(struct inotify_watch),
Index: linux-2.6.16-rc4/fs/super.c
===================================================================
--- linux-2.6.16-rc4.orig/fs/super.c 2006-02-17 17:23:45.000000000 -0500
+++ linux-2.6.16-rc4/fs/super.c 2006-02-22 18:34:27.000000000 -0500
@@ -86,6 +86,9 @@
s->s_qcop = sb_quotactl_ops;
s->s_op = &default_op;
s->s_time_gran = 1000000000;
+#ifdef CONFIG_INOTIFY
+ atomic_set(&s->s_inotify_watches, 0);
+#endif
}
out:
return s;
Index: linux-2.6.16-rc4/include/linux/fs.h
===================================================================
--- linux-2.6.16-rc4.orig/include/linux/fs.h 2006-02-22 18:30:14.000000000 -0500
+++ linux-2.6.16-rc4/include/linux/fs.h 2006-02-22 18:32:37.000000000 -0500
@@ -843,7 +843,10 @@
void *s_fs_info; /* Filesystem private info */
+#ifdef CONFIG_INOTIFY
+ atomic_t s_inotify_watches; /* Number of inotify watches */
+#endif
/*
--
John McCutchan <[email protected]>
On Wed, Feb 22, 2006 at 12:05:47PM -0800, Andrew Morton wrote:
> Robin Holt <[email protected]> wrote:
> >
> > Let me reiterate, I know _VERY_ little about filesystems. Can the
> > dentry->d_lock be changed to a read/write lock?
>
> Well, it could, but I suspect that won't help - the hold times in there
> will be very short so the problem is more likely acquisition frequency.
>
> However it's a bit strange that this function is the bottleneck. If their
> workload is doing large numbers of reads or writes from large numbers of
> processes against the same file then they should be hitting heavy
> contention on other locks, such as i_sem and/or tree_lock and/or lru_lock
> and others.
>
> Can you tell us more about the kernel-visible behaviour of this app?
I looked at a little more of the output. All I have to go on is a few
back traces generated by kdb of the entire system a few seconds apart.
In all of the traces, the first chunk of cpus are the only ones doing
writes. I have not counted exactly, but I think it is around 32. There
may be more or less, but that is the feeling (sometimes they are doing
reads as well).
The remainder of the cpus seem to be doing seeks to various parts of
the file and doing reads. The seeks+reads seem to overlap, as I see
a nearly uniform read of 64k, but the seeks seem to be 4k-aligned.
Also, at the time the snapshots were taken, they are all fairly
early in the file (between about 15% and 35% of the file), whereas
the writes are happening toward the early part of the second half
(50% - 70%).
I no longer have access to the test machine. Since we told them
to turn off udevd, they no longer see a problem. I will try to
convince them to allow some more testing. I will also try to
recreate it on our own machine.
Thanks,
Robin
On Thu, Feb 23, 2006 at 06:56:39AM -0600, Robin Holt wrote:
> On Wed, Feb 22, 2006 at 12:05:47PM -0800, Andrew Morton wrote:
> > Robin Holt <[email protected]> wrote:
> > >
> > > Let me reiterate, I know _VERY_ little about filesystems. Can the
> > > dentry->d_lock be changed to a read/write lock?
> >
> > Well, it could, but I suspect that won't help - the hold times in there
> > will be very short so the problem is more likely acquisition frequency.
> >
> > However it's a bit strange that this function is the bottleneck. If their
> > workload is doing large numbers of reads or writes from large numbers of
> > processes against the same file then they should be hitting heavy
> > contention on other locks, such as i_sem and/or tree_lock and/or lru_lock
> > and others.
> >
> > Can you tell us more about the kernel-visible behaviour of this app?
>
> I looked at a little more of the output. All I have to go on is a few
> back traces generated by kdb of the entire system a few seconds apart.
>
> In all of the traces, the first chunk of cpus are the only ones doing
> writes. I have not counted exactly, but I think it is around 32. There
> may be more or less, but that is the feeling (sometimes they are doing
> reads as well).
Robin, is the app doing direct or buffered I/O?
Cheers,
Dave.
--
Dave Chinner
R&D Software Engineer
SGI Australian Software Group
On Wed, Feb 22, 2006 at 03:12:23PM -0800, Andrew Morton wrote:
> It's not a terribly bad hack - it's just poor-man's hashing, and it's
> reasonably well-suited to the sorts of machines and workloads which we
> expect will hit this problem.
The dnotify/inotify wakeups are a problem, namely because the implementation
is braindead: it makes the wrong part of the interface fast (setting up
notify entries) at the expense of making the rest of the kernel slow (adding
locks to read()/write()). read() and write() are incredibly hot paths in
the kernel and should be optimized at the expense of dnotify and inotify,
which are uncommon operations.
-ben
--
"Ladies and gentlemen, I'm sorry to interrupt, but the police are here
and they've asked us to stop the party." Don't Email: <[email protected]>.
John McCutchan <[email protected]> wrote:
>
> ...
> >
> > I have a bad feeling about this one. It'd be nice to have an exact
> > understanding of the problem's source, but if it's just lots of traffic on
> > ->d_lock we're kinda stuck. I don't expect we'll run off and RCUify
> > d_parent or turn d_lock into a seq_lock or anything like that.
> >
> > Then again, maybe making d_lock an rwlock _will_ help - if this workload is
> > also hitting tree_lock (Robin?) and we're not seeing suckiness due to that
> > then perhaps the rwlock is magically helping.
> >
> >
> > > instead of your hack.
> >
> > It's not a terribly bad hack - it's just poor-man's hashing, and it's
> > reasonably well-suited to the sorts of machines and workloads which we
> > expect will hit this problem.
> >
>
> If this is as good as it gets, here is a patch (totally untested).
>
> ...
> @@ -538,7 +537,7 @@
> struct dentry *parent;
> struct inode *inode;
>
> - if (!atomic_read (&inotify_watches))
> + if (!atomic_read (&dentry->d_sb->s_inotify_watches))
> return;
>
What happens here if we're watching a mountpoint - the parent is on a
different fs?
On Thu, Feb 23, 2006 at 04:14:25PM -0800, Andrew Morton wrote:
> John McCutchan <[email protected]> wrote:
> >
> > ...
> > >
> > > I have a bad feeling about this one. It'd be nice to have an exact
> > > understanding of the problem's source, but if it's just lots of traffic on
> > > ->d_lock we're kinda stuck. I don't expect we'll run off and RCUify
> > > d_parent or turn d_lock into a seq_lock or anything like that.
> > >
> > > Then again, maybe making d_lock an rwlock _will_ help - if this workload is
> > > also hitting tree_lock (Robin?) and we're not seeing suckiness due to that
> > > then perhaps the rwlock is magically helping.
> > >
> > >
> > > > instead of your hack.
> > >
> > > It's not a terribly bad hack - it's just poor-man's hashing, and it's
> > > reasonably well-suited to the sorts of machines and workloads which we
> > > expect will hit this problem.
> > >
> >
> > If this is as good as it gets, here is a patch (totally untested).
> >
> > ...
> > @@ -538,7 +537,7 @@
> > struct dentry *parent;
> > struct inode *inode;
> >
> > - if (!atomic_read (&inotify_watches))
> > + if (!atomic_read (&dentry->d_sb->s_inotify_watches))
> > return;
> >
>
> What happens here if we're watching a mountpoint - the parent is on a
> different fs?
There are four cases to consider here.
Case 1: parent fs watched and child fs watched
correct results
Case 2: parent fs watched and child fs not watched
We may not deliver an event that should be delivered.
Case 3: parent fs not watched and child fs watched
We take d_lock when we don't need to
Case 4: parent fs not watched and child fs not watched
correct results
Case 2 screws us. We have to take the lock to even look at the parent's
dentry->d_sb->s_inotify_watches. I don't know of a way around this one.
John
John McCutchan <[email protected]> wrote:
>
> > > @@ -538,7 +537,7 @@
> > > struct dentry *parent;
> > > struct inode *inode;
> > >
> > > - if (!atomic_read (&inotify_watches))
> > > + if (!atomic_read (&dentry->d_sb->s_inotify_watches))
> > > return;
> > >
> >
> > What happens here if we're watching a mountpoint - the parent is on a
> > different fs?
>
> There are four cases to consider here.
>
> Case 1: parent fs watched and child fs watched
> correct results
> Case 2: parent fs watched and child fs not watched
> We may not deliver an event that should be delivered.
> Case 3: parent fs not watched and child fs watched
> We take d_lock when we don't need to
> Case 4: parent fs not watched and child fs not watched
> correct results
>
> Case 2 screws us. We have to take the lock to even look at the parent's
> dentry->d_sb->s_inotify_watches. I don't know of a way around this one.
Yeah. There are a lot of "screw"s in this thread.
I wonder if RCU can save us - if we do an rcu_read_lock() we at least know
that the dentries won't get deallocated. Then we can take a look at
d_parent (which might not be the parent any more). Once in a million years
we might send a false event or miss sending an event, depending on where
our dentry suddenly got moved to. Not very nice, but at least it won't
oops.
(hopefully cc's Dipankar)
Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c
+++ linux-2.6/fs/dcache.c
@@ -799,6 +799,7 @@ void d_instantiate(struct dentry *entry,
if (inode)
list_add(&entry->d_alias, &inode->i_dentry);
entry->d_inode = inode;
+ fsnotify_d_instantiate(entry, inode);
spin_unlock(&dcache_lock);
security_d_instantiate(entry, inode);
}
@@ -850,6 +851,7 @@ struct dentry *d_instantiate_unique(stru
list_add(&entry->d_alias, &inode->i_dentry);
do_negative:
entry->d_inode = inode;
+ fsnotify_d_instantiate(entry, inode);
spin_unlock(&dcache_lock);
security_d_instantiate(entry, inode);
return NULL;
@@ -980,6 +982,7 @@ struct dentry *d_splice_alias(struct ino
new = __d_find_alias(inode, 1);
if (new) {
BUG_ON(!(new->d_flags & DCACHE_DISCONNECTED));
+ fsnotify_d_instantiate(new, inode);
spin_unlock(&dcache_lock);
security_d_instantiate(new, inode);
d_rehash(dentry);
@@ -989,6 +992,7 @@ struct dentry *d_splice_alias(struct ino
/* d_instantiate takes dcache_lock, so we do it by hand */
list_add(&dentry->d_alias, &inode->i_dentry);
dentry->d_inode = inode;
+ fsnotify_d_instantiate(dentry, inode);
spin_unlock(&dcache_lock);
security_d_instantiate(dentry, inode);
d_rehash(dentry);
Index: linux-2.6/fs/inotify.c
===================================================================
--- linux-2.6.orig/fs/inotify.c
+++ linux-2.6/fs/inotify.c
@@ -381,6 +381,41 @@ static int find_inode(const char __user
}
/*
+ * inotify_inode_watched - returns nonzero if there are watches on this inode
+ * and zero otherwise. We call this lockless, we do not care if we race.
+ */
+static inline int inotify_inode_watched(struct inode *inode)
+{
+ return !list_empty(&inode->inotify_watches);
+}
+
+static void set_dentry_child_flags(struct inode *inode, int new_watch)
+{
+ struct dentry *alias;
+
+ if (inotify_inode_watched(inode))
+ return;
+
+ spin_lock(&dcache_lock);
+ list_for_each_entry(alias, &inode->i_dentry, d_alias) {
+ struct dentry *child;
+
+ list_for_each_entry(child, &alias->d_subdirs, d_u.d_child) {
+ spin_lock(&child->d_lock);
+ if (new_watch) {
+ BUG_ON(child->d_flags & DCACHE_INOTIFY_PARENT_WATCHED);
+ child->d_flags |= DCACHE_INOTIFY_PARENT_WATCHED;
+ } else {
+ BUG_ON(!(child->d_flags & DCACHE_INOTIFY_PARENT_WATCHED));
+ child->d_flags &= ~DCACHE_INOTIFY_PARENT_WATCHED;
+ }
+ spin_unlock(&child->d_lock);
+ }
+ }
+ spin_unlock(&dcache_lock);
+}
+
+/*
* create_watch - creates a watch on the given device.
*
* Callers must hold dev->sem. Calls inotify_dev_get_wd() so may sleep.
@@ -406,6 +441,8 @@ static struct inotify_watch *create_watc
return ERR_PTR(ret);
}
+ set_dentry_child_flags(inode, 1);
+
dev->last_wd = watch->wd;
watch->mask = mask;
atomic_set(&watch->count, 0);
@@ -462,6 +499,8 @@ static void remove_watch_no_event(struct
atomic_dec(&inotify_watches);
idr_remove(&dev->idr, watch->wd);
put_inotify_watch(watch);
+
+ set_dentry_child_flags(watch->inode, 0);
}
/*
@@ -481,16 +520,18 @@ static void remove_watch(struct inotify_
remove_watch_no_event(watch, dev);
}
-/*
- * inotify_inode_watched - returns nonzero if there are watches on this inode
- * and zero otherwise. We call this lockless, we do not care if we race.
- */
-static inline int inotify_inode_watched(struct inode *inode)
+/* Kernel API */
+
+void inotify_d_instantiate(struct dentry *entry, struct inode *inode)
{
- return !list_empty(&inode->inotify_watches);
-}
+ struct dentry *parent;
-/* Kernel API */
+ spin_lock(&entry->d_lock);
+ parent = entry->d_parent;
+ if (inotify_inode_watched(parent->d_inode))
+ entry->d_flags |= DCACHE_INOTIFY_PARENT_WATCHED;
+ spin_unlock(&entry->d_lock);
+}
/**
* inotify_inode_queue_event - queue an event to all watches on this inode
@@ -538,7 +579,7 @@ void inotify_dentry_parent_queue_event(s
struct dentry *parent;
struct inode *inode;
- if (!atomic_read (&inotify_watches))
+ if (!(dentry->d_flags & DCACHE_INOTIFY_PARENT_WATCHED))
return;
spin_lock(&dentry->d_lock);
Index: linux-2.6/include/linux/dcache.h
===================================================================
--- linux-2.6.orig/include/linux/dcache.h
+++ linux-2.6/include/linux/dcache.h
@@ -162,6 +162,8 @@ d_iput: no no no yes
#define DCACHE_REFERENCED 0x0008 /* Recently used, don't discard. */
#define DCACHE_UNHASHED 0x0010
+#define DCACHE_INOTIFY_PARENT_WATCHED 0x0020 /* Parent inode is watched */
+
extern spinlock_t dcache_lock;
/**
Index: linux-2.6/include/linux/fsnotify.h
===================================================================
--- linux-2.6.orig/include/linux/fsnotify.h
+++ linux-2.6/include/linux/fsnotify.h
@@ -16,6 +16,12 @@
#include <linux/dnotify.h>
#include <linux/inotify.h>
+static inline void fsnotify_d_instantiate(struct dentry *entry,
+ struct inode *inode)
+{
+ inotify_d_instantiate(entry, inode);
+}
+
/*
* fsnotify_move - file old_name at old_dir was moved to new_name at new_dir
*/
Index: linux-2.6/include/linux/inotify.h
===================================================================
--- linux-2.6.orig/include/linux/inotify.h
+++ linux-2.6/include/linux/inotify.h
@@ -71,6 +71,7 @@ struct inotify_event {
#ifdef CONFIG_INOTIFY
+extern void inotify_d_instantiate(struct dentry *, struct inode *);
extern void inotify_inode_queue_event(struct inode *, __u32, __u32,
const char *);
extern void inotify_dentry_parent_queue_event(struct dentry *, __u32, __u32,
@@ -81,6 +82,10 @@ extern u32 inotify_get_cookie(void);
#else
+static inline void inotify_d_instantiate(struct dentry *, struct inode *)
+{
+}
+
static inline void inotify_inode_queue_event(struct inode *inode,
__u32 mask, __u32 cookie,
const char *filename)
Nick Piggin <[email protected]> wrote:
>
> @@ -538,7 +579,7 @@ void inotify_dentry_parent_queue_event(s
> struct dentry *parent;
> struct inode *inode;
>
> - if (!atomic_read (&inotify_watches))
> + if (!(dentry->d_flags & DCACHE_INOTIFY_PARENT_WATCHED))
Yeah, I think that would work. One would need to wire up d_move() also -
for when a file is moved from a watched to non-watched directory and
vice-versa.
Andrew Morton wrote:
> Nick Piggin <[email protected]> wrote:
>
>>@@ -538,7 +579,7 @@ void inotify_dentry_parent_queue_event(s
>> struct dentry *parent;
>> struct inode *inode;
>>
>> - if (!atomic_read (&inotify_watches))
>> + if (!(dentry->d_flags & DCACHE_INOTIFY_PARENT_WATCHED))
>
>
> Yeah, I think that would work. One would need to wire up d_move() also -
> for when a file is moved from a watched to non-watched directory and
> vice-versa.
>
>
Oh yeah of course, good catch. Are there any other cases missing?
... I'll let the others on this thread digest before spitting out
another patch.
John or Robert, is there some kind of inotify test suite around?
Thanks,
--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com
On Fri, Feb 24, 2006 at 06:07:43PM +1100, Nick Piggin wrote:
> Attached is a first implementation of what was my idea then of how
> to solve it... note it is pretty rough and I never got around to doing
> much testing of it.
>
> Basically: moves work out of inotify event time and to inotify attach
> /detach time while staying out of the core VFS.
The customer called and said they had tried with udevd running and this
patch applied. They said the problem is gone.
Thanks,
Robin Holt
Robin Holt wrote:
> On Fri, Feb 24, 2006 at 06:07:43PM +1100, Nick Piggin wrote:
>
>>Attached is a first implementation of what was my idea then of how
>>to solve it... note it is pretty rough and I never got around to doing
>>much testing of it.
>>
>>Basically: moves work out of inotify event time and to inotify attach
>>/detach time while staying out of the core VFS.
>
>
> The customer called and said they had tried with udevd running and this
> patch applied. They said the problem is gone.
>
OK, well tell them not to run it in production of course...
I'll spend a bit of time to implement Andrew's suggestion, and
do some testing on it.
--
SUSE Labs, Novell Inc.
Previous inotify work avoidance is good when inotify is completely
unused, but it breaks down if even a single watch is in place anywhere
in the system. Robin Holt notices that udev is one such culprit - it
slows down a 512-thread application on a 512 CPU system from 6 seconds
to 22 minutes.
Solve this by adding a flag in the dentry that tells inotify whether
or not its parent inode has a watch on it. Event queueing to parent
will skip taking locks if this flag is cleared. In terms of races,
setting and clearing this flag on all child dentries versus event
delivery is no different from the way inode_watches modifications were
implemented, and that was shown to be equivalent to always performing
the check.
The essential behaviour is that activity occurring _after_ a watch has
been added and _before_ it has been removed will generate events.
Signed-off-by: Nick Piggin <[email protected]>
Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c
+++ linux-2.6/fs/dcache.c
@@ -799,6 +799,7 @@ void d_instantiate(struct dentry *entry,
if (inode)
list_add(&entry->d_alias, &inode->i_dentry);
entry->d_inode = inode;
+ fsnotify_d_instantiate(entry, inode);
spin_unlock(&dcache_lock);
security_d_instantiate(entry, inode);
}
@@ -850,6 +851,7 @@ struct dentry *d_instantiate_unique(stru
list_add(&entry->d_alias, &inode->i_dentry);
do_negative:
entry->d_inode = inode;
+ fsnotify_d_instantiate(entry, inode);
spin_unlock(&dcache_lock);
security_d_instantiate(entry, inode);
return NULL;
@@ -980,6 +982,7 @@ struct dentry *d_splice_alias(struct ino
new = __d_find_alias(inode, 1);
if (new) {
BUG_ON(!(new->d_flags & DCACHE_DISCONNECTED));
+ fsnotify_d_instantiate(new, inode);
spin_unlock(&dcache_lock);
security_d_instantiate(new, inode);
d_rehash(dentry);
@@ -989,6 +992,7 @@ struct dentry *d_splice_alias(struct ino
/* d_instantiate takes dcache_lock, so we do it by hand */
list_add(&dentry->d_alias, &inode->i_dentry);
dentry->d_inode = inode;
+ fsnotify_d_instantiate(dentry, inode);
spin_unlock(&dcache_lock);
security_d_instantiate(dentry, inode);
d_rehash(dentry);
@@ -1339,6 +1343,7 @@ already_unhashed:
list_add(&dentry->d_u.d_child, &dentry->d_parent->d_subdirs);
spin_unlock(&target->d_lock);
+ fsnotify_d_move(dentry);
spin_unlock(&dentry->d_lock);
write_sequnlock(&rename_lock);
spin_unlock(&dcache_lock);
Index: linux-2.6/fs/inotify.c
===================================================================
--- linux-2.6.orig/fs/inotify.c
+++ linux-2.6/fs/inotify.c
@@ -38,7 +38,6 @@
#include <asm/ioctls.h>
static atomic_t inotify_cookie;
-static atomic_t inotify_watches;
static kmem_cache_t *watch_cachep;
static kmem_cache_t *event_cachep;
@@ -381,6 +380,41 @@ static int find_inode(const char __user
}
/*
+ * inotify_inode_watched - returns nonzero if there are watches on this inode
+ * and zero otherwise. We call this lockless, we do not care if we race.
+ */
+static inline int inotify_inode_watched(struct inode *inode)
+{
+ return !list_empty(&inode->inotify_watches);
+}
+
+/*
+ * Get child dentry flag into synch with parent inode.
+ */
+static void set_dentry_child_flags(struct inode *inode, int watched)
+{
+ struct dentry *alias;
+
+ spin_lock(&dcache_lock);
+ list_for_each_entry(alias, &inode->i_dentry, d_alias) {
+ struct dentry *child;
+
+ list_for_each_entry(child, &alias->d_subdirs, d_u.d_child) {
+ spin_lock(&child->d_lock);
+ if (watched) {
+ WARN_ON(child->d_flags & DCACHE_INOTIFY_PARENT_WATCHED);
+ child->d_flags |= DCACHE_INOTIFY_PARENT_WATCHED;
+ } else {
+ WARN_ON(!(child->d_flags & DCACHE_INOTIFY_PARENT_WATCHED));
+ child->d_flags &= ~DCACHE_INOTIFY_PARENT_WATCHED;
+ }
+ spin_unlock(&child->d_lock);
+ }
+ }
+ spin_unlock(&dcache_lock);
+}
+
+/*
* create_watch - creates a watch on the given device.
*
* Callers must hold dev->sem. Calls inotify_dev_get_wd() so may sleep.
@@ -426,7 +460,6 @@ static struct inotify_watch *create_watc
get_inotify_watch(watch);
atomic_inc(&dev->user->inotify_watches);
- atomic_inc(&inotify_watches);
return watch;
}
@@ -458,8 +491,10 @@ static void remove_watch_no_event(struct
list_del(&watch->i_list);
list_del(&watch->d_list);
+ if (!inotify_inode_watched(watch->inode))
+ set_dentry_child_flags(watch->inode, 0);
+
atomic_dec(&dev->user->inotify_watches);
- atomic_dec(&inotify_watches);
idr_remove(&dev->idr, watch->wd);
put_inotify_watch(watch);
}
@@ -481,16 +516,39 @@ static void remove_watch(struct inotify_
remove_watch_no_event(watch, dev);
}
+/* Kernel API */
+
/*
- * inotify_inode_watched - returns nonzero if there are watches on this inode
- * and zero otherwise. We call this lockless, we do not care if we race.
+ * inotify_d_instantiate - instantiate dcache entry for inode
*/
-static inline int inotify_inode_watched(struct inode *inode)
+void inotify_d_instantiate(struct dentry *entry, struct inode *inode)
{
- return !list_empty(&inode->inotify_watches);
+ struct dentry *parent;
+
+ if (!inode)
+ return;
+
+ WARN_ON(entry->d_flags & DCACHE_INOTIFY_PARENT_WATCHED);
+ spin_lock(&entry->d_lock);
+ parent = entry->d_parent;
+ if (inotify_inode_watched(parent->d_inode))
+ entry->d_flags |= DCACHE_INOTIFY_PARENT_WATCHED;
+ spin_unlock(&entry->d_lock);
}
-/* Kernel API */
+/*
+ * inotify_d_move - dcache entry has been moved
+ */
+void inotify_d_move(struct dentry *entry)
+{
+ struct dentry *parent;
+
+ parent = entry->d_parent;
+ if (inotify_inode_watched(parent->d_inode))
+ entry->d_flags |= DCACHE_INOTIFY_PARENT_WATCHED;
+ else
+ entry->d_flags &= ~DCACHE_INOTIFY_PARENT_WATCHED;
+}
/**
* inotify_inode_queue_event - queue an event to all watches on this inode
@@ -538,7 +596,7 @@ void inotify_dentry_parent_queue_event(s
struct dentry *parent;
struct inode *inode;
- if (!atomic_read (&inotify_watches))
+ if (!(dentry->d_flags & DCACHE_INOTIFY_PARENT_WATCHED))
return;
spin_lock(&dentry->d_lock);
@@ -993,6 +1051,9 @@ asmlinkage long sys_inotify_add_watch(in
goto out;
}
+ if (!inotify_inode_watched(inode))
+ set_dentry_child_flags(inode, 1);
+
/* Add the watch to the device's and the inode's list */
list_add(&watch->d_list, &dev->watches);
list_add(&watch->i_list, &inode->inotify_watches);
@@ -1065,7 +1126,6 @@ static int __init inotify_setup(void)
inotify_max_user_watches = 8192;
atomic_set(&inotify_cookie, 0);
- atomic_set(&inotify_watches, 0);
watch_cachep = kmem_cache_create("inotify_watch_cache",
sizeof(struct inotify_watch),
Index: linux-2.6/include/linux/dcache.h
===================================================================
--- linux-2.6.orig/include/linux/dcache.h
+++ linux-2.6/include/linux/dcache.h
@@ -162,6 +162,8 @@ d_iput: no no no yes
#define DCACHE_REFERENCED 0x0008 /* Recently used, don't discard. */
#define DCACHE_UNHASHED 0x0010
+#define DCACHE_INOTIFY_PARENT_WATCHED 0x0020 /* Parent inode is watched */
+
extern spinlock_t dcache_lock;
/**
Index: linux-2.6/include/linux/fsnotify.h
===================================================================
--- linux-2.6.orig/include/linux/fsnotify.h
+++ linux-2.6/include/linux/fsnotify.h
@@ -17,6 +17,25 @@
#include <linux/inotify.h>
/*
+ * fsnotify_d_instantiate - instantiate a dentry for inode
+ * Called with dcache_lock held.
+ */
+static inline void fsnotify_d_instantiate(struct dentry *entry,
+ struct inode *inode)
+{
+ inotify_d_instantiate(entry, inode);
+}
+
+/*
+ * fsnotify_d_move - entry has been moved
+ * Called with dcache_lock and entry->d_lock held.
+ */
+static inline void fsnotify_d_move(struct dentry *entry)
+{
+ inotify_d_move(entry);
+}
+
+/*
* fsnotify_move - file old_name at old_dir was moved to new_name at new_dir
*/
static inline void fsnotify_move(struct inode *old_dir, struct inode *new_dir,
Index: linux-2.6/include/linux/inotify.h
===================================================================
--- linux-2.6.orig/include/linux/inotify.h
+++ linux-2.6/include/linux/inotify.h
@@ -71,6 +71,8 @@ struct inotify_event {
#ifdef CONFIG_INOTIFY
+extern void inotify_d_instantiate(struct dentry *, struct inode *);
+extern void inotify_d_move(struct dentry *);
extern void inotify_inode_queue_event(struct inode *, __u32, __u32,
const char *);
extern void inotify_dentry_parent_queue_event(struct dentry *, __u32, __u32,
@@ -81,6 +83,14 @@ extern u32 inotify_get_cookie(void);
#else
+static inline void inotify_d_instantiate(struct dentry *dentry, struct inode *inode)
+{
+}
+
+static inline void inotify_d_move(struct dentry *dentry)
+{
+}
+
static inline void inotify_inode_queue_event(struct inode *inode,
__u32 mask, __u32 cookie,
const char *filename)
On Fri, 2006-24-02 at 18:07 +1100, Nick Piggin wrote:
> Andrew Morton wrote:
> > John McCutchan <[email protected]> wrote:
> >
> >> > > @@ -538,7 +537,7 @@
> >> > > struct dentry *parent;
> >> > > struct inode *inode;
> >> > >
> >> > > - if (!atomic_read (&inotify_watches))
> >> > > + if (!atomic_read (&dentry->d_sb->s_inotify_watches))
> >> > > return;
> >> > >
> >> >
> >> > What happens here if we're watching a mountpoint - the parent is on a
> >> > different fs?
> >>
> >> There are four cases to consider here.
> >>
> >> Case 1: parent fs watched and child fs watched
> >> correct results
> >> Case 2: parent fs watched and child fs not watched
> >> We may not deliver an event that should be delivered.
> >> Case 3: parent fs not watched and child fs watched
> >> We take d_lock when we don't need to
> >> Case 4: parent fs not watched and child fs not watched
> >> correct results
> >>
> >> Case 2 screws us. We have to take the lock to even look at the parent's
> >> dentry->d_sb->s_inotify_watches. I don't know of a way around this one.
> >
> >
> > Yeah. There are a lot of "screw"s in this thread.
> >
> > I wonder if RCU can save us - if we do an rcu_read_lock() we at least know
> > that the dentries won't get deallocated. Then we can take a look at
> > d_parent (which might not be the parent any more). Once in a million years
> > we might send a false event or miss sending an event, depending on where
> > our dentry suddenly got moved to. Not very nice, but at least it won't
> > oops.
> >
> > (hopefully cc's Dipankar)
>
> I saw this problem when testing my lockless pagecache a while back.
>
> Attached is a first implementation of what was my idea then of how
> to solve it... note it is pretty rough and I never got around to doing
> much testing of it.
>
> Basically: moves work out of inotify event time and to inotify attach
> /detach time while staying out of the core VFS.
This looks really good. There might be some corner cases but it looks
like it will solve this problem nicely.
--
John McCutchan <[email protected]>
On Fri, 2006-24-02 at 18:19 +1100, Nick Piggin wrote:
> Andrew Morton wrote:
> > Nick Piggin <[email protected]> wrote:
> >
> >>@@ -538,7 +579,7 @@ void inotify_dentry_parent_queue_event(s
> >> struct dentry *parent;
> >> struct inode *inode;
> >>
> >> - if (!atomic_read (&inotify_watches))
> >> + if (!(dentry->d_flags & DCACHE_INOTIFY_PARENT_WATCHED))
> >
> >
> > Yeah, I think that would work. One would need to wire up d_move() also -
> > for when a file is moved from a watched to non-watched directory and
> > vice-versa.
> >
> >
>
> Oh yeah of course, good catch. Are there any other cases missing?
>
> ... I'll let the others on this thread digest before spitting out
> another patch.
>
> John or Robert, is there some kind of inotify test suite around?
Unfortunately there isn't. In the past gamin's test suite was a good
start, but it hasn't been working for a while now.
--
John McCutchan <[email protected]>
John McCutchan wrote:
> On Fri, 2006-24-02 at 18:07 +1100, Nick Piggin wrote:
>>I saw this problem when testing my lockless pagecache a while back.
>>
>>Attached is a first implementation of what was my idea then of how
>>to solve it... note it is pretty rough and I never got around to doing
>>much testing of it.
>>
>>Basically: moves work out of inotify event time and to inotify attach
>>/detach time while staying out of the core VFS.
>
>
>
> This looks really good. There might be some corner cases but it looks
> like it will solve this problem nicely.
>
Thanks. You should see I sent a new version which fixes several bugs
and cleans up the code a bit.
There might be some areas of potential problems:
- creating and deleting watches on directories with many entries will
take a long time. Is anyone likely to be creating and destroying
these things at a very high frequency? Probably nobody cares except
it might twist some real-time knickers.
- concurrent operations in the same watched directory will incur the
same scalability penalty. I think this is basically a non-issue since
the sheer number of events coming out will likely be a bigger problem.
Doctor, it hurts when I do this.
--
SUSE Labs, Novell Inc.
On Mon, 2006-27-02 at 21:11 +1100, Nick Piggin wrote:
> John McCutchan wrote:
> > On Fri, 2006-24-02 at 18:07 +1100, Nick Piggin wrote:
>
> >>I saw this problem when testing my lockless pagecache a while back.
> >>
> >>Attached is a first implementation of what was my idea then of how
> >>to solve it... note it is pretty rough and I never got around to doing
> >>much testing of it.
> >>
> >>Basically: moves work out of inotify event time and to inotify attach
> >>/detach time while staying out of the core VFS.
> >
> >
> >
> > This looks really good. There might be some corner cases but it looks
> > like it will solve this problem nicely.
> >
>
> Thanks. You should see I sent a new version which fixes several bugs
> and cleans up the code a bit.
>
Yeah, it looks good. I haven't had time to test it myself, but nothing
jumps out at me as being wrong. I can only say that about the code that
touches inotify -- the rest of the VFS someone else will need to comment
on.
> There might be some areas of potential problems:
> - creating and deleting watches on directories with many entries will
> take a long time. Is anyone likely to be creating and destroying
> these things at a very high frequency? Probably nobody cares except
> it might twist some real-time knickers.
>
That's not a typical inotify usage pattern. Typically a watch is created
and left until the directory is deleted, or the application closes.
> - concurrent operations in the same watched directory will incur the
> same scalability penalty. I think this is basically a non-issue since
> the sheer number of events coming out will likely be a bigger problem.
> Doctor, it hurts when I do this.
>
Again, yeah, I don't think we need to worry.
--
John McCutchan <[email protected]>
Nick Piggin <[email protected]> wrote:
>
> Previous inotify work avoidance is good when inotify is completely
> unused, but it breaks down if even a single watch is in place anywhere
> in the system. Robin Holt notices that udev is one such culprit - it
> slows down a 512-thread application on a 512 CPU system from 6 seconds
> to 22 minutes.
A problem is that the audit tree (believe it or not) adds a pile of new
inotify functionality. I don't know what those changes do, and they might
conflict with the changes you've made (apart from giving us two copies of
inotify_inode_watched()). The audit changes were apparently only
socialised on the linux-audit mailing list, and my twice-sent patch to
make the audit tree compile has been ignored for a couple of weeks.
So I'm going to bitbucket the audit tree until a) it compiles and b) its
inotify changes have been explained and reviewed and c) we've reviewed
those changes against your optimisations. I think fixes will be needed.