I believe I've encountered a scaling problem with the inotify code in
2.6.10-mm3.
vfs_read() and vfs_write() (for example) do:
dnotify_parent(dentry, DN_ACCESS);
inotify_dentry_parent_queue_event(dentry,
IN_ACCESS, 0, dentry->d_name.name);
inotify_inode_queue_event(inode, IN_ACCESS, 0, NULL);
for the optional "notify" functionality.
dnotify_parent() knows how to exit quickly:
if (!dir_notify_enable)
return;
looking at a global flag, and inotify_inode_queue_event() also knows a quick
exit:
if (!inode->inotify_data)
return;
However, inotify_dentry_parent_queue_event() first has to get the parent
dentry:
parent = dget_parent(dentry);
inotify_inode_queue_event(parent->d_inode, mask, cookie, filename);
dput(parent);
and, sadly, dget_parent(dentry) grabs a spinlock (dentry->d_lock) and dirties a cacheline (dentry->dcount). I have an AIM7-like workload that does numerous
concurrent reads and suffers a 40% drop in performance because of this,
primarily due to spinlock contention.
Is it possible for a parent's inode->inotify_data to be enabled when none of
its children's inotify_data are enabled? That would make it easy - just look
at the current inode's inotify_data before walking back to the parent.
John Hawkes
On Thu, 2005-01-13 at 15:56 -0800, John Hawkes wrote:
> I believe I've encountered a scaling problem with the inotify code in
> 2.6.10-mm3.
>
> vfs_read() and vfs_write() (for example) do:
> dnotify_parent(dentry, DN_ACCESS);
> inotify_dentry_parent_queue_event(dentry,
> IN_ACCESS, 0, dentry->d_name.name);
> inotify_inode_queue_event(inode, IN_ACCESS, 0, NULL);
> for the optional "notify" functionality.
>
> dnotify_parent() knows how to exit quickly:
> if (!dir_notify_enable)
> return;
This isn't a "quick exit", though. It is just a termination check in
case dnotify was disabled on boot. The rest of dnotify_parent() is
always executed and it does the equivalent of dget_parent().
So why is inotify showing up on your test and not dnotify?
Hm, dnotify always grabs the lock but does not bump dentry->count unless
there is actually a watch on the dentry. Could that really be the
difference and cause of the slowdown? We could probably do that, too.
> Is it possible for a parent's inode->inotify_data to be enabled when none of
> its children's inotify_data are enabled? That would make it easy - just look
> at the current inode's inotify_data before walking back to the parent.
Unfortunately, no. There is no relationship between the parent and the
child inode's inotify_data structure. The best we can do is exactly
what dnotify does, actually, which is
spin_lock(&dentry->d_lock);
parent = dentry->d_parent;
if (parent->d_inode->i_dnotify_mask & event) {
dget(parent);
spin_unlock(&dentry->d_lock);
__inode_dir_notify(parent->d_inode, event);
dput(parent);
} else {
spin_unlock(&dentry->d_lock);
}
Instead of our current "explicit"
parent = dget_parent(dentry);
inotify_inode_queue_event(parent->d_inode, mask, cookie,
filename);
dput(parent);
E.g., save the ref unless absolutely needed.
I am open to other ideas, too, but I don't see any nice shortcuts like
what we can do in inotify_inode_queue_event().
(Other) John? Any ideas?
Best,
Robert Love
On Thu, 2005-01-13 at 19:49 -0500, Robert Love wrote:
> On Thu, 2005-01-13 at 15:56 -0800, John Hawkes wrote:
> [snip]
>
> I am open to other ideas, too, but I don't see any nice shortcuts like
> what we can do in inotify_inode_queue_event().
>
> (Other) John? Any ideas?
No, you covered things well. This code was really just a straight copy
of the dnotify code. Rob cleaned it up at some point, giving us what we
have today. The only fix I can think of is the one suggested by Rob --
copying the dnotify code again.
--
John McCutchan <[email protected]>
On Thu, 2005-01-13 at 20:31 -0500, John McCutchan wrote:
> No, you covered things well. This code was really just a straight copy
> of the dnotify code. Rob cleaned it up at some point, giving us what we
> have today. The only fix I can think of is the one suggested by Rob --
> copying the dnotify code again.
Oh, did I undo the dnotify optimization? :-)
Eh, no, looks like not exactly: The old code always did a dget().
John Hawkes: Attached patch implements the dget() optimization in the
same manner as dnotify. Compile-tested but not booted.
Let me know!
Robert Love
inotify: don't do dget() unless we have to
drivers/char/inotify.c | 14 +++++++++++---
1 files changed, 11 insertions(+), 3 deletions(-)
diff -urN linux-2.6.10-mm3/drivers/char/inotify.c linux/drivers/char/inotify.c
--- linux-2.6.10-mm3/drivers/char/inotify.c 2005-01-13 21:04:53.108971856 -0500
+++ linux/drivers/char/inotify.c 2005-01-13 21:25:20.052448096 -0500
@@ -607,10 +607,18 @@
u32 cookie, const char *filename)
{
struct dentry *parent;
+ struct inode *inode;
- parent = dget_parent(dentry);
- inotify_inode_queue_event(parent->d_inode, mask, cookie, filename);
- dput(parent);
+ spin_lock(&dentry->d_lock);
+ parent = dentry->d_parent;
+ inode = parent->d_inode;
+ if (inode->inotify_data) {
+ dget(parent);
+ spin_unlock(&dentry->d_lock);
+ inotify_inode_queue_event(inode, mask, cookie, filename);
+ dput(parent);
+ } else
+ spin_unlock(&dentry->d_lock);
}
EXPORT_SYMBOL_GPL(inotify_dentry_parent_queue_event);