2005-01-14 00:02:58

by John Hawkes

[permalink] [raw]
Subject: 2.6.10-mm3 scaling problem with inotify

I believe I've encountered a scaling problem with the inotify code in
2.6.10-mm3.

vfs_read() and vfs_write() (for example) do:
dnotify_parent(dentry, DN_ACCESS);
inotify_dentry_parent_queue_event(dentry,
IN_ACCESS, 0, dentry->d_name.name);
inotify_inode_queue_event(inode, IN_ACCESS, 0, NULL);
for the optional "notify" functionality.

dnotify_parent() knows how to exit quickly:
if (!dir_notify_enable)
return;
looking at a global flag, and inotify_inode_queue_event() also knows a quick
exit:
if (!inode->inotify_data)
return;
However, inotify_dentry_parent_queue_event() first has to get the parent
dentry:
parent = dget_parent(dentry);
inotify_inode_queue_event(parent->d_inode, mask, cookie, filename);
dput(parent);
and, sadly, dget_parent(dentry) grabs a spinlock (dentry->d_lock) and dirties a cacheline (dentry->dcount). I have an AIM7-like workload that does numerous
concurrent reads and suffers a 40% drop in performance because of this,
primarily due to spinlock contention.

Is it possible for a parent's inode->inotify_data to be enabled when none of
its children's inotify_data are enabled? That would make it easy - just look
at the current inode's inotify_data before walking back to the parent.

John Hawkes


2005-01-14 00:52:35

by Robert Love

[permalink] [raw]
Subject: Re: 2.6.10-mm3 scaling problem with inotify

On Thu, 2005-01-13 at 15:56 -0800, John Hawkes wrote:

> I believe I've encountered a scaling problem with the inotify code in
> 2.6.10-mm3.
>
> vfs_read() and vfs_write() (for example) do:
> dnotify_parent(dentry, DN_ACCESS);
> inotify_dentry_parent_queue_event(dentry,
> IN_ACCESS, 0, dentry->d_name.name);
> inotify_inode_queue_event(inode, IN_ACCESS, 0, NULL);
> for the optional "notify" functionality.
>
> dnotify_parent() knows how to exit quickly:
> if (!dir_notify_enable)
> return;

This isn't a "quick exit", though. It is just a termination check in
case dnotify was disabled on boot. The rest of dnotify_parent() is
always executed and it does the equivalent of dget_parent().

So why is inotify showing up on your test and not dnotify?

Hm, dnotify always grabs the lock but does not bump dentry->count unless
there is actually a watch on the dentry. Could that really be the
difference and cause of the slowdown? We could probably do that, too.

> Is it possible for a parent's inode->inotify_data to be enabled when none of
> its children's inotify_data are enabled? That would make it easy - just look
> at the current inode's inotify_data before walking back to the parent.

Unfortunately, no. There is no relationship between the parent and the
child inode's inotify_data structure. The best we can do is exactly
what dnotify does, actually, which is

spin_lock(&dentry->d_lock);
parent = dentry->d_parent;
if (parent->d_inode->i_dnotify_mask & event) {
dget(parent);
spin_unlock(&dentry->d_lock);
__inode_dir_notify(parent->d_inode, event);
dput(parent);
} else {
spin_unlock(&dentry->d_lock);
}

Instead of our current "explicit"

parent = dget_parent(dentry);
inotify_inode_queue_event(parent->d_inode, mask, cookie,
filename);
dput(parent);

E.g., save the ref unless absolutely needed.

I am open to other ideas, too, but I don't see any nice shortcuts like
what we can do in inotify_inode_queue_event().

(Other) John? Any ideas?

Best,

Robert Love


2005-01-14 01:37:14

by John McCutchan

[permalink] [raw]
Subject: Re: 2.6.10-mm3 scaling problem with inotify

On Thu, 2005-01-13 at 19:49 -0500, Robert Love wrote:
> On Thu, 2005-01-13 at 15:56 -0800, John Hawkes wrote:
> [snip]
>
> I am open to other ideas, too, but I don't see any nice shortcuts like
> what we can do in inotify_inode_queue_event().
>
> (Other) John? Any ideas?

No, you covered things well. This code was really just a straight copy
of the dnotify code. Rob cleaned it up at some point, giving us what we
have today. The only fix I can think of is the one suggested by Rob --
copying the dnotify code again.

--
John McCutchan <[email protected]>

2005-01-14 02:29:32

by Robert Love

[permalink] [raw]
Subject: Re: 2.6.10-mm3 scaling problem with inotify

On Thu, 2005-01-13 at 20:31 -0500, John McCutchan wrote:

> No, you covered things well. This code was really just a straight copy
> of the dnotify code. Rob cleaned it up at some point, giving us what we
> have today. The only fix I can think of is the one suggested by Rob --
> copying the dnotify code again.

Oh, did I undo the dnotify optimization? :-)

Eh, no, looks like not exactly: The old code always did a dget().

John Hawkes: Attached patch implements the dget() optimization in the
same manner as dnotify. Compile-tested but not booted.

Let me know!

Robert Love


inotify: don't do dget() unless we have to

drivers/char/inotify.c | 14 +++++++++++---
1 files changed, 11 insertions(+), 3 deletions(-)

diff -urN linux-2.6.10-mm3/drivers/char/inotify.c linux/drivers/char/inotify.c
--- linux-2.6.10-mm3/drivers/char/inotify.c 2005-01-13 21:04:53.108971856 -0500
+++ linux/drivers/char/inotify.c 2005-01-13 21:25:20.052448096 -0500
@@ -607,10 +607,18 @@
u32 cookie, const char *filename)
{
struct dentry *parent;
+ struct inode *inode;

- parent = dget_parent(dentry);
- inotify_inode_queue_event(parent->d_inode, mask, cookie, filename);
- dput(parent);
+ spin_lock(&dentry->d_lock);
+ parent = dentry->d_parent;
+ inode = parent->d_inode;
+ if (inode->inotify_data) {
+ dget(parent);
+ spin_unlock(&dentry->d_lock);
+ inotify_inode_queue_event(inode, mask, cookie, filename);
+ dput(parent);
+ } else
+ spin_unlock(&dentry->d_lock);
}
EXPORT_SYMBOL_GPL(inotify_dentry_parent_queue_event);