2002-07-12 20:36:14

by Paul Menage

[permalink] [raw]
Subject: [RFC] Rearranging struct dentry for cache affinity


Here's a patch that reorganises struct dentry to try to improve the
cache behaviour. Main changes:

- try to separate out volatile data from constant or rarely-changing
data. On a 32-bit system with 64 byte cacheline, the first cacheline is
used for the refcounting/lru data and the "less-referenced" data, mainly
used by remote and exported filesystems. The second cacheline is used
for mostly read-only data.

- for systems where cacheline size = 4 * word size, keep all the data
needed for checking a non-matching dcache hash entry (modulo
hash-collisions) in one cacheline.

- on SMP systems, only set the DCACHE_REFERENCED bit in d_vfs_flags if
it wasn't already set.

Al, would you have any interest in a patch like this, if it has a
noticeable performance effect? If so, I'll put some performance figures
together.

Paul

diff -X /mnt/elbrus/home/pmenage/dontdiff -Naur linux-2.5.25/fs/dcache.c linux-2.5.25-dentry/fs/dcache.c
--- linux-2.5.25/fs/dcache.c Tue Jun 18 19:11:48 2002
+++ linux-2.5.25-dentry/fs/dcache.c Fri Jul 12 12:56:07 2002
@@ -707,7 +707,7 @@
struct dentry *res = NULL;

if (root_inode) {
- res = d_alloc(NULL, &(const struct qstr) { "/", 1, 0 });
+ res = d_alloc(NULL, &(const struct qstr) { name: "/", len: 1});
if (res) {
res->d_sb = root_inode->i_sb;
res->d_parent = res;
@@ -754,7 +754,7 @@
return res;
}

- tmp = d_alloc(NULL, &(const struct qstr) {"",0,0});
+ tmp = d_alloc(NULL, &(const struct qstr) { });
if (!tmp)
return NULL;

@@ -883,7 +883,10 @@
if (memcmp(dentry->d_name.name, str, len))
continue;
}
- dentry->d_vfs_flags |= DCACHE_REFERENCED;
+#ifdef CONFIG_SMP
+ if(!(dentry->d_vfs_flags & DCACHE_REFERENCED))
+#endif
+ dentry->d_vfs_flags |= DCACHE_REFERENCED;
return dentry;
}
return NULL;
diff -X /mnt/elbrus/home/pmenage/dontdiff -Naur linux-2.5.25/include/linux/dcache.h linux-2.5.25-dentry/include/linux/dcache.h
--- linux-2.5.25/include/linux/dcache.h Tue Jun 18 19:11:59 2002
+++ linux-2.5.25-dentry/include/linux/dcache.h Fri Jul 12 13:13:04 2002
@@ -22,12 +22,15 @@

/*
* "quick string" -- eases parameter passing, but more importantly
- * saves "metadata" about the string (ie length and the hash).
+ * saves "metadata" about the string (ie length and the hash).
+ *
+ * "hash" comes first to keep it in the same cacheline as "d_hash" and
+ * "d_parent" for systems where a cacheline is 4 words.
*/
struct qstr {
- const unsigned char * name;
- unsigned int len;
unsigned int hash;
+ unsigned int len;
+ const unsigned char * name;
};

struct dentry_stat_t {
@@ -67,23 +70,45 @@
#define DNAME_INLINE_LEN 16

struct dentry {
- atomic_t d_count;
+ /* First cacheline(s): data that changes frequently and is referenced often */
+
+ /* Keep refcounting/lru data together for 32-bit/16-byte systems */
+ atomic_t d_count;
+ unsigned long d_vfs_flags;
+ struct list_head d_lru; /* d_count = 0 LRU list */
+
+ /* Data that is referenced less often, or only by a few
+ * (mostly remote or exported) filesystems. (effectively
+ * cache padding for 64-byte cachelines) */
unsigned int d_flags;
- struct inode * d_inode; /* Where the name belongs to - NULL is negative */
+ struct list_head d_alias; /* inode alias list */
+
+ unsigned long d_time; /* used by d_revalidate */
+ struct super_block * d_sb; /* The root of the dentry tree */
+ void * d_fsdata; /* fs-specific data */
+
+ /* This probably belongs in the second section, but if we put
+ * it there we spill over into the next cacheline on
+ * 32-bit/64-byte systems */
+
+ struct list_head d_subdirs; /* our children */
+
+ /* Data that rarely changes and is referenced often - start a
+ * new cacheline to avoid SMP cache bounce.
+ *
+ * All the data for a failed hash check is in the same
+ * cacheline on a 32-bit/16-byte systems.
+ */
+
+ struct list_head d_hash ____cacheline_aligned; /* lookup hash list */
struct dentry * d_parent; /* parent directory */
- struct list_head d_hash; /* lookup hash list */
- struct list_head d_lru; /* d_count = 0 LRU list */
- struct list_head d_child; /* child of parent list */
- struct list_head d_subdirs; /* our children */
- struct list_head d_alias; /* inode alias list */
+ struct qstr d_name;
+ unsigned char d_iname[DNAME_INLINE_LEN]; /* small names */
+ struct inode * d_inode; /* Where the name belongs to - NULL is negative */
+ struct dentry_operations *d_op;
int d_mounted;
- struct qstr d_name;
- unsigned long d_time; /* used by d_revalidate */
- struct dentry_operations *d_op;
- struct super_block * d_sb; /* The root of the dentry tree */
- unsigned long d_vfs_flags;
- void * d_fsdata; /* fs-specific data */
- unsigned char d_iname[DNAME_INLINE_LEN]; /* small names */
+
+ struct list_head d_child; /* child of parent list */
};

struct dentry_operations {
diff -X /mnt/elbrus/home/pmenage/dontdiff -Naur linux-2.5.25/kernel/futex.c linux-2.5.25-dentry/kernel/futex.c
--- linux-2.5.25/kernel/futex.c Wed Jun 26 01:07:20 2002
+++ linux-2.5.25-dentry/kernel/futex.c Fri Jul 12 13:01:34 2002
@@ -366,7 +366,7 @@
root->i_uid = root->i_gid = 0;
root->i_atime = root->i_mtime = root->i_ctime = CURRENT_TIME;

- sb->s_root = d_alloc(NULL, &(const struct qstr) { "futex", 5, 0 });
+ sb->s_root = d_alloc(NULL, &(const struct qstr) { name : "futex", len : 5});
sb->s_root->d_sb = sb;
sb->s_root->d_parent = sb->s_root;
d_instantiate(sb->s_root, root);




2002-07-12 20:55:41

by Dave Jones

[permalink] [raw]
Subject: Re: [RFC] Rearranging struct dentry for cache affinity

On Fri, Jul 12, 2002 at 01:38:35PM -0700, Paul Menage wrote:

> - dentry->d_vfs_flags |= DCACHE_REFERENCED;
> +#ifdef CONFIG_SMP
> + if(!(dentry->d_vfs_flags & DCACHE_REFERENCED))
> +#endif
> + dentry->d_vfs_flags |= DCACHE_REFERENCED;

Yuck. Is doing this conditional on UP really a measurable effect?
Somehow I doubt it's worth this ugliness. If you must microoptimise
to this level, at least try and keep the code clean.

Dave

--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs

2002-07-12 21:18:42

by Paul Menage

[permalink] [raw]
Subject: Re: [RFC] Rearranging struct dentry for cache affinity

>On Fri, Jul 12, 2002 at 01:38:35PM -0700, Paul Menage wrote:
>
> > - dentry->d_vfs_flags |= DCACHE_REFERENCED;
> > +#ifdef CONFIG_SMP
> > + if(!(dentry->d_vfs_flags & DCACHE_REFERENCED))
> > +#endif
> > + dentry->d_vfs_flags |= DCACHE_REFERENCED;
>
>Yuck. Is doing this conditional on UP really a measurable effect?

I'm don't know yet - I did some brief tests on x86 UP to see how much
positive/negative effect the branch misses versus the cache dirtying
had, and (using an exponentially decaying distribution of entries being
tested/set) the if() statement did show an improvement for sufficiently
skewed distributions. But I've no idea yet how that matches up to the
distributions that we'd see in the dcache in actual use.

As I said in an earlier email, it might be nice to have an
__ensure_bit_set() function that uses architecture-specific knowledge to
make sure a particular bit is set as efficiently as possible. (e.g.
taking advantage of predicated writes, etc).

>If you must microoptimise
>to this level, at least try and keep the code clean.

Sure - this is just a quick [RFC] hack.

Paul

2002-07-13 09:09:01

by Alexander Viro

[permalink] [raw]
Subject: Re: [RFC] Rearranging struct dentry for cache affinity



On Fri, 12 Jul 2002, Paul Menage wrote:

> - tmp = d_alloc(NULL, &(const struct qstr) {"",0,0});
> + tmp = d_alloc(NULL, &(const struct qstr) { });

Ugh. That changes behaviour and I'm less than sure that it's correct.
> if (!tmp)
> return NULL;
>
> @@ -883,7 +883,10 @@
> if (memcmp(dentry->d_name.name, str, len))
> continue;
> }
> - dentry->d_vfs_flags |= DCACHE_REFERENCED;
> +#ifdef CONFIG_SMP
> + if(!(dentry->d_vfs_flags & DCACHE_REFERENCED))
> +#endif
> + dentry->d_vfs_flags |= DCACHE_REFERENCED;
> return dentry;

Absolutely no way. If anything, lose that CONFIG_SMP - it's too fscking
ugly. But I suspect that moving that line to final dput() (see previous
posting) would work better.

diff -X /mnt/elbrus/home/pmenage/dontdiff -Naur linux-2.5.25/kernel/futex.c linux-2.5.25-dentry/kernel/futex.c
> --- linux-2.5.25/kernel/futex.c Wed Jun 26 01:07:20 2002
> +++ linux-2.5.25-dentry/kernel/futex.c Fri Jul 12 13:01:34 2002
[snip]

Eeek....

futex.c is seriously b0rken.

diff -urN C25/kernel/futex.c C25-current/kernel/futex.c
--- C25/kernel/futex.c Thu Jun 20 20:28:00 2002
+++ C25-current/kernel/futex.c Sat Jul 13 05:07:24 2002
@@ -48,7 +48,7 @@
extern void send_sigio(struct fown_struct *fown, int fd, int band);

/* Everyone needs a dentry and inode */
-static struct dentry *futex_dentry;
+static struct vfsmount *futex_mnt;

/* We use this instead of a normal wait_queue_t, so we can wake only
the relevent ones (hashed queues may be shared) */
@@ -272,7 +272,8 @@
return -ENFILE;
}
filp->f_op = &futex_fops;
- filp->f_dentry = dget(futex_dentry);
+ filp->f_vfsmnt = mntget(futex_mnt);
+ filp->f_dentry = dget(futex_mnt->mnt_root);

if (signal) {
filp->f_owner.pid = current->pid;
@@ -348,46 +349,16 @@
return ret;
}

-/* FIXME: Oh yeah, makes sense to write a filesystem... */
-static struct super_operations futexfs_ops = { statfs: simple_statfs };
-
-/* Don't check error returns: we're dead if they happen */
-static int futexfs_fill_super(struct super_block *sb, void *data, int silent)
-{
- struct inode *root;
-
- sb->s_blocksize = 1024;
- sb->s_blocksize_bits = 10;
- sb->s_magic = 0xBAD1DEA;
- sb->s_op = &futexfs_ops;
-
- root = new_inode(sb);
- root->i_mode = S_IFDIR | S_IRUSR | S_IWUSR;
- root->i_uid = root->i_gid = 0;
- root->i_atime = root->i_mtime = root->i_ctime = CURRENT_TIME;
-
- sb->s_root = d_alloc(NULL, &(const struct qstr) { "futex", 5, 0 });
- sb->s_root->d_sb = sb;
- sb->s_root->d_parent = sb->s_root;
- d_instantiate(sb->s_root, root);
-
- /* We never let this drop to zero. */
- futex_dentry = dget(sb->s_root);
-
- return 0;
-}
-
static struct super_block *
futexfs_get_sb(struct file_system_type *fs_type,
int flags, char *dev_name, void *data)
{
- return get_sb_nodev(fs_type, flags, data, futexfs_fill_super);
+ return get_sb_pseudo(fs_type, "futex", NULL, 0xBAD1DEA);
}

static struct file_system_type futex_fs_type = {
name: "futexfs",
get_sb: futexfs_get_sb,
- kill_sb: kill_anon_super,
};

static int __init init(void)
@@ -395,7 +366,7 @@
unsigned int i;

register_filesystem(&futex_fs_type);
- kern_mount(&futex_fs_type);
+ futex_mnt = kern_mount(&futex_fs_type);

for (i = 0; i < ARRAY_SIZE(futex_queues); i++)
INIT_LIST_HEAD(&futex_queues[i]);

2002-07-16 04:05:59

by Rusty Russell

[permalink] [raw]
Subject: Re: [RFC] Rearranging struct dentry for cache affinity

On Sat, 13 Jul 2002 05:11:46 -0400 (EDT)
Alexander Viro <[email protected]> wrote:

> futex.c is seriously b0rken.

Really? Other than changing over to get_sb_psuedo(), what does your patch
fix? As the filesystem should never be unmounted, what am I missing?

Thanks!
Rusty.

> @@ -272,7 +272,8 @@
> return -ENFILE;
> }
> filp->f_op = &futex_fops;
> - filp->f_dentry = dget(futex_dentry);
> + filp->f_vfsmnt = mntget(futex_mnt);
> + filp->f_dentry = dget(futex_mnt->mnt_root);
>
> if (signal) {
> filp->f_owner.pid = current->pid;
--
there are those who do and those who hang on and you don't see too
many doers quoting their contemporaries. -- Larry McVoy

2002-07-16 04:37:49

by Alexander Viro

[permalink] [raw]
Subject: Re: [RFC] Rearranging struct dentry for cache affinity



On Tue, 16 Jul 2002, Rusty Russell wrote:

> On Sat, 13 Jul 2002 05:11:46 -0400 (EDT)
> Alexander Viro <[email protected]> wrote:
>
> > futex.c is seriously b0rken.
>
> Really? Other than changing over to get_sb_psuedo(), what does your patch
> fix? As the filesystem should never be unmounted, what am I missing?
> > filp->f_op = &futex_fops;
> > - filp->f_dentry = dget(futex_dentry);
> > + filp->f_vfsmnt = mntget(futex_mnt);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > + filp->f_dentry = dget(futex_mnt->mnt_root);

Uninitialized ->f_vfsmnt == quite a few places in the tree very unhappy.
E.g. any access to /proc/<pid>/fd/<n>.

I'd say that "any user can oops the kernel in a couple of lines" does
qualify for serious breakage...

2002-07-17 00:28:25

by Rusty Russell

[permalink] [raw]
Subject: Re: [RFC] Rearranging struct dentry for cache affinity

In message <[email protected]> you wri
te:
> On Tue, 16 Jul 2002, Rusty Russell wrote:
>
> > Really? Other than changing over to get_sb_psuedo(), what does your patch
> > fix? As the filesystem should never be unmounted, what am I missing?
> > > filp->f_op = &futex_fops;
> > > - filp->f_dentry = dget(futex_dentry);
> > > + filp->f_vfsmnt = mntget(futex_mnt);
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > + filp->f_dentry = dget(futex_mnt->mnt_root);
>
> Uninitialized ->f_vfsmnt == quite a few places in the tree very unhappy.
> E.g. any access to /proc/<pid>/fd/<n>.

I figured out which ones I needed to set by rough inspection, always a
dangerous practice.

Thanks!
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.