2004-04-30 19:55:48

by Jose R. Santos

Subject: [PATCH] dentry and inode cache hash algorithm performance changes.

err... Seems I've sent this to the wrong list.
Sending to linux-kernel this time :)

-----------------------------------------------------------------------

Hi Andrew,

Could you consider this patch for inclusion into the mainline kernel? It
alleviates some issues seen with Linux when accessing millions of files
on machines with large amounts of RAM (32GB+). Both algorithms are based
on studies that Dominique Heger has been doing on hash table efficiency
in Linux. The dentry hash function has been tested on small systems with
a single internal IDE hard disk as well as on large SMP machines with many
Fibre Channel disks. Dominique reports that in all the testing done they did
not see a single case where this hash function performed worse than the
default, and in most tests they saw better performance.

The inode hash function was done by me, based on Dominique's original work,
and has only been stress tested with SpecSFS. It provided a 3%
improvement over the default algorithm in the SpecSFS results, and speedups
in the response times of almost all filesystem operations the benchmark
stresses. With the better distribution it is also possible to reduce the
number of inode buckets from 32 million to 16 million and still get slightly
better results.

Anton was nice enough to provide some graphs that show the distribution
before and after the patch at http://samba.org/~anton/linux/sfs/1/
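
For anyone wondering where GOLDEN_RATIO_PRIME comes from: it is provided by
<linux/hash.h>, which the patch now includes in fs/dcache.c. Below is a quick
user-space sketch of the patched d_hash() logic; the constants are the 2.6-era
values as far as I can tell, and the addresses, cache line size and table size
are made-up stand-ins, so treat the whole thing as illustrative rather than
authoritative:

/*
 * Illustrative user-space sketch only (assumes a 64-bit userland).
 * The two constants are what <linux/hash.h> defines for 32- and 64-bit
 * kernels respectively -- double-check the header, it is authoritative.
 */
#include <stdio.h>

#define GOLDEN_RATIO_PRIME_32 0x9e370001UL          /* BITS_PER_LONG == 32 */
#define GOLDEN_RATIO_PRIME_64 0x9e37fffffffc0001UL  /* BITS_PER_LONG == 64 */

/* Same shape as the patched d_hash(): fold the golden-ratio prime into the
 * parent pointer and into the high-bit mixing step. */
static unsigned long example_d_hash(unsigned long parent, unsigned long hash,
                                    unsigned int d_hashbits,
                                    unsigned long d_hashmask)
{
        hash += (parent ^ GOLDEN_RATIO_PRIME_64) / 64;  /* 64 stands in for L1_CACHE_BYTES */
        hash = hash ^ ((hash ^ GOLDEN_RATIO_PRIME_64) >> d_hashbits);
        return hash & d_hashmask;
}

int main(void)
{
        /* Two dentry "addresses" one cache line apart should land in
         * different buckets (a 14-bit table is used just for the example). */
        printf("%lu %lu\n",
               example_d_hash(0xffff810012345000UL, 42, 14, (1UL << 14) - 1),
               example_d_hash(0xffff810012345040UL, 42, 14, (1UL << 14) - 1));
        return 0;
}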

Thanks

-JRS

# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
# ChangeSet 1.1582 -> 1.1583
# fs/dcache.c 1.70 -> 1.71
# fs/inode.c 1.113 -> 1.114
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 04/04/30 [email protected] 1.1583
# Hash functions changes that show improvements on SpecSFS when a large amount of files is used (+20 Million).
# --------------------------------------------
#
diff -Nru a/fs/dcache.c b/fs/dcache.c
--- a/fs/dcache.c Fri Apr 30 12:14:23 2004
+++ b/fs/dcache.c Fri Apr 30 12:14:23 2004
@@ -28,6 +28,7 @@
#include <asm/uaccess.h>
#include <linux/security.h>
#include <linux/seqlock.h>
+#include <linux/hash.h>

#define DCACHE_PARANOIA 1
/* #define DCACHE_DEBUG 1 */
@@ -799,8 +800,8 @@

static inline struct hlist_head * d_hash(struct dentry * parent, unsigned long hash)
{
- hash += (unsigned long) parent / L1_CACHE_BYTES;
- hash = hash ^ (hash >> D_HASHBITS);
+ hash += ((unsigned long) parent ^ GOLDEN_RATIO_PRIME) / L1_CACHE_BYTES;
+ hash = hash ^ ((hash ^ GOLDEN_RATIO_PRIME) >> D_HASHBITS);
return dentry_hashtable + (hash & D_HASHMASK);
}

diff -Nru a/fs/inode.c b/fs/inode.c
--- a/fs/inode.c Fri Apr 30 12:14:23 2004
+++ b/fs/inode.c Fri Apr 30 12:14:23 2004
@@ -671,8 +671,9 @@

static inline unsigned long hash(struct super_block *sb, unsigned long hashval)
{
- unsigned long tmp = hashval + ((unsigned long) sb / L1_CACHE_BYTES);
- tmp = tmp + (tmp >> I_HASHBITS);
+ unsigned long tmp = (hashval + ((unsigned long) sb) ^ (GOLDEN_RATIO_PRIME + hashval)
+ / L1_CACHE_BYTES);
+ tmp = tmp + ((tmp ^ GOLDEN_RATIO_PRIME) >> I_HASHBITS);
return tmp & I_HASHMASK;
}






2004-04-30 21:04:25

by Jose R. Santos

Subject: Re: [PATCH] dentry and inode cache hash algorithm performance changes.

On 04/30/04 15:18:32, Andrew Morton wrote:
> "Jose R. Santos" <[email protected]> wrote:
> >
> > diff -Nru a/fs/inode.c b/fs/inode.c
> > --- a/fs/inode.c Fri Apr 30 12:14:23 2004
> > +++ b/fs/inode.c Fri Apr 30 12:14:23 2004
> > @@ -671,8 +671,9 @@
> >
> > static inline unsigned long hash(struct super_block *sb, unsigned long hashval)
> > {
> > - unsigned long tmp = hashval + ((unsigned long) sb / L1_CACHE_BYTES);
> > - tmp = tmp + (tmp >> I_HASHBITS);
> > + unsigned long tmp = (hashval + ((unsigned long) sb) ^ (GOLDEN_RATIO_PRIME + hashval)
> > + / L1_CACHE_BYTES);
> > + tmp = tmp + ((tmp ^ GOLDEN_RATIO_PRIME) >> I_HASHBITS);
> > return tmp & I_HASHMASK;
> > }
> >
>
> Are you sure about this? It's doing:
>
> tmp = hashval + sb ^ ((GOLDEN_RATIO_PRIME + hashval) / L1_CACHE_BYTES)
>
> should it not be:
>
> tmp = hashval + (sb ^ (GOLDEN_RATIO_PRIME + hashval)) / L1_CACHE_BYTES
>
> ?

err... Wrote the patch too fast. It should read

tmp = (hashval * sb) ^ (GOLDEN_RATIO_PRIME + hashval) / L1_CACHE_BYTES

I screwed up... I'll send a fixed patch in a while.
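
To spell out the grouping Andrew is asking about: in C, '+' and '/' both bind
tighter than '^', so the line in my first patch parses exactly the way he read
it. A quick user-space check (the values below are just stand-ins, not the
real kernel constants or pointers):

/* Standalone precedence check, illustration only: '/' and '+' bind
 * tighter than '^', so the unparenthesized expression groups as
 * (hashval + sb) ^ ((prime + hashval) / cache). */
#include <assert.h>

int main(void)
{
        unsigned long hashval = 12345;
        unsigned long sb = 0xffff8100abcd0000UL;   /* stand-in for the sb pointer */
        unsigned long prime = 0x9e370001UL;        /* stand-in for GOLDEN_RATIO_PRIME */
        unsigned long cache = 64;                  /* stand-in for L1_CACHE_BYTES */

        unsigned long as_written = hashval + sb ^ (prime + hashval) / cache;
        unsigned long as_parsed  = (hashval + sb) ^ ((prime + hashval) / cache);

        assert(as_written == as_parsed);           /* '^' really does bind loosest here */
        return 0;
}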

-JRS


2004-04-30 21:34:02

by Jose R. Santos

Subject: Re: [PATCH] dentry and inode cache hash algorithm performance changes.

On 04/30/04 15:57:01, Jose R. Santos wrote:
> err... Wrote the patch too fast. It should read
>
> tmp = (hashval * sb) ^ (GOLDEN_RATIO_PRIME + hashval) / L1_CACHE_BYTES
>
> I screwed up... I'll send a fixed patch in a while.

Just noticed I've made another error in the inode hash code.

Fixed patch (I hope) with beautification.

-JRS


diff -Nru a/fs/dcache.c b/fs/dcache.c
--- a/fs/dcache.c Fri Apr 30 16:27:41 2004
+++ b/fs/dcache.c Fri Apr 30 16:27:41 2004
@@ -28,6 +28,7 @@
#include <asm/uaccess.h>
#include <linux/security.h>
#include <linux/seqlock.h>
+#include <linux/hash.h>

#define DCACHE_PARANOIA 1
/* #define DCACHE_DEBUG 1 */
@@ -799,8 +800,8 @@

static inline struct hlist_head * d_hash(struct dentry * parent, unsigned long hash)
{
- hash += (unsigned long) parent / L1_CACHE_BYTES;
- hash = hash ^ (hash >> D_HASHBITS);
+ hash += ((unsigned long) parent ^ GOLDEN_RATIO_PRIME) / L1_CACHE_BYTES;
+ hash = hash ^ ((hash ^ GOLDEN_RATIO_PRIME) >> D_HASHBITS);
return dentry_hashtable + (hash & D_HASHMASK);
}

diff -Nru a/fs/inode.c b/fs/inode.c
--- a/fs/inode.c Fri Apr 30 16:27:41 2004
+++ b/fs/inode.c Fri Apr 30 16:27:41 2004
@@ -671,12 +671,13 @@

static inline unsigned long hash(struct super_block *sb, unsigned long hashval)
{
- unsigned long tmp = hashval + ((unsigned long) sb / L1_CACHE_BYTES);
- tmp = tmp + (tmp >> I_HASHBITS);
+ unsigned long tmp;
+
+ tmp = (hashval * (unsigned long)sb) ^ (GOLDEN_RATIO_PRIME + hashval) /
+ L1_CACHE_BYTES;
+ tmp = tmp ^ ((tmp ^ GOLDEN_RATIO_PRIME) >> I_HASHBITS);
return tmp & I_HASHMASK;
}
-
-/* Yeah, I know about quadratic hash. Maybe, later. */

/**
* iunique - get a unique inode number

2004-04-30 22:00:57

by Andrew Morton

Subject: Re: [PATCH] dentry and inode cache hash algorithm performance changes.

"Jose R. Santos" <[email protected]> wrote:
>
> On 04/30/04 15:57:01, Jose R. Santos wrote:
> > err... Wrote the patch too fast. It should read
> >
> > tmp = (hashval * sb) ^ (GOLDEN_RATIO_PRIME + hashval) / L1_CACHE_BYTES
> >
> > I screwed up... I'll send a fixed patch in a while.
>
> Just noticed I've made another error in the inode hash code.
>
> Fixed patch (I hope) with beautification.

Does this mean you need to redo the instrumentation and benchmarking? If
so, please do that and send the numbers along? There's no particular
urgency on that, but we should do it.

Also, I'd be interested in understanding what the input to the hashing
functions looked like in this testing. It could be that the new hash just
happens to work well with one particular test's dataset. Please convince
us otherwise ;)

2004-04-30 23:42:54

by Jose R. Santos

Subject: Re: [PATCH] dentry and inode cache hash algorithm performance changes.

On 04/30/04 17:02:56, Andrew Morton wrote:
> Does this mean you need to redo the instrumentation and benchmarking? If
> so, please do that and send the numbers along? There's no particular
> urgency on that, but we should do it.

No, the screwup happened when I created the patch on a clean source
tree. This patch matches what was actually used to gather the data in those graphs.

> Also, I'd be interested in understanding what the input to the hashing
> functions looked like in this testing. It could be that the new hash just
> happens to work well with one particular test's dataset. Please convince
> us otherwise ;)

hehehe... I knew it couldn't be that easy. :)

For the dentry hash function, some of my coworkers have put this
hash function through various tests and concluded that it is
equal to or better than the default hash function.
These runs were done with a (hopefully soon to be open source)
benchmark called FFSB, which can simulate various IO patterns across
many filesystems and variable file sizes.

The SpecSFS fileset is basically a lot of small files, and it varies
depending on the size of the run. For a not-so-big SMP system
the number of files is in the 20+ million range. Of those
20 million files, only 10% are accessed randomly by the client. The
purpose of this is that the benchmark tries to stress not only
the NFS layer but the VM and filesystem layers as well. The filesets
are also hundreds of gigabytes in size in order to promote disk
head movement by guaranteeing cache misses in memory. In SFS, 27% of
the workload is lookups, and __d_lookup has been showing up high in my
profiles.

For the inode hash, the problem I see is that when running
a benchmark with this huge fileset we end up trying to free a
lot of inode entries during the run while trying to put new
entries into the cache. We end up calling ifind_fast(), which calls
find_inodes_fast() while holding inode_lock. In order to avoid
holding inode_lock for long we needed to avoid having long chains
in that hash table.
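
To make the locking point concrete, here is a schematic of a
find_inodes_fast()-style walk; this is illustration only, not the real
fs/inode.c code, and the pthread mutex just stands in for inode_lock:

/* Schematic only: the whole chain is walked with the lock held, so lock
 * hold time grows linearly with chain length.  A better-distributed hash
 * keeps the chains (and therefore the hold times) short. */
#include <stddef.h>
#include <pthread.h>

struct node {
        struct node *next;
        unsigned long ino;
};

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;  /* stands in for inode_lock */

static struct node *find_in_bucket(struct node *head, unsigned long ino)
{
        struct node *n;

        pthread_mutex_lock(&table_lock);
        for (n = head; n != NULL; n = n->next)   /* O(chain length) with the lock held */
                if (n->ino == ino)
                        break;
        pthread_mutex_unlock(&table_lock);
        return n;
}

int main(void)
{
        struct node c = { NULL, 3 }, b = { &c, 2 }, a = { &b, 1 };

        return find_in_bucket(&a, 3) == &c ? 0 : 1;   /* exit 0 when found */
}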

When I took a look at the original hash function, I found it
to be a bit too simple for any workload. My solution (taking
advantage of Dominique's work) was to create a hash
function that could generate completely different hashes
depending on the hashval and the superblock, in order to have
the hash scale as we add more filesystems to the machine.

Both of these problems can be somewhat tuned away by increasing
the number of buckets in both the dentry and inode caches, but it got to the
point where I had 256MB of inode and 128MB of dentry hash buckets
on a not-so-large SMP. With the hash changes I have been able
to reduce that to 128MB for the inode cache and
32MB for the dentry cache and still get better performance.
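
To put rough numbers on that (my arithmetic, assuming each bucket is a
single 8-byte hlist_head pointer on a 64-bit box):

  inode:  32M buckets * 8 bytes = 256MB  ->  16M buckets * 8 bytes = 128MB
  dentry: 16M buckets * 8 bytes = 128MB  ->   4M buckets * 8 bytes =  32MB

which also lines up with the 32-million-to-16-million inode bucket
reduction mentioned in the first mail.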

If it helps my case... I haven't been running this benchmark
for long, so I haven't been able to find a way to cheat. I
need to come up with generic solutions until I can find a
cheat for the benchmark. :)


-JRS