2010-11-25 13:23:37

by Jozsef Kadlecsik

[permalink] [raw]
Subject: [PATCH 0/2] New jhash function

Hi,

Please consider applying the following patch:

The current jhash.h implements the lookup2() hash function by Bob Jenkins.
However, lookup2() is outdated as Bob wrote a new hash function called
lookup3(). The new hash function

- mixes better than lookup2(): it passes the check that every input bit
changes every output bit 50% of the time, while lookup2() failed it.
- performs better: compiled with -O2 on Core2 Duo, lookup3() 20-40% faster
than lookup2() depending on the key length.

You can read a longer comparison of the two and other hash functions at
http://burtleburtle.net/bob/hash/doobs.html.

jhash is widely used in the kernel and because the functions
are inlined, the cost in size is significant. The new functions
are slightly larger than the previous ones so the new implementation
uses non-inlined fucntions. (See
http://lkml.indiana.edu/hypermail//linux/kernel/0902.2/01149.html).
Therefore the first patch replaces the calls to the internal macros
of jhash with the jhash function calls and the second patch contains
the implementation of the new jhash functions.

Jozsef Kadlecsik (2):
Prepare the tree for un-inlined jhash.
The new jhash implementation

include/linux/jhash.h | 136 ++-------------------------------
lib/Makefile | 2 +-
lib/jhash.c | 153 ++++++++++++++++++++++++++++++++++++++
net/ipv6/inet6_connection_sock.c | 18 ++---
net/ipv6/reassembly.c | 36 ++++-----
5 files changed, 186 insertions(+), 159 deletions(-)
create mode 100644 lib/jhash.c


2010-11-25 13:23:38

by Jozsef Kadlecsik

[permalink] [raw]
Subject: [PATCH 2/2] The new jhash implementation

The current jhash.h implements the lookup2() hash function by Bob Jenkins.
However, lookup2() is outdated as Bob wrote a new hash function called
lookup3(). The new hash function

- mixes better than lookup2(): it passes the check that every input bit
changes every output bit 50% of the time, while lookup2() failed it.
- performs better: compiled with -O2 on Core2 Duo, lookup3() 20-40% faster
than lookup2() depending on the key length.

The patch replaces the lookup2() implementation of the 'jhash*'
functions with that of lookup3().

You can read a longer comparison of the two and other hash functions at
http://burtleburtle.net/bob/hash/doobs.html.

Signed-off-by: Jozsef Kadlecsik <[email protected]>
---
include/linux/jhash.h | 136 +++-----------------------------------------
lib/Makefile | 2 +-
lib/jhash.c | 153 +++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 162 insertions(+), 129 deletions(-)
create mode 100644 lib/jhash.c

diff --git a/include/linux/jhash.h b/include/linux/jhash.h
index ced1159..ca69ac3 100644
--- a/include/linux/jhash.h
+++ b/include/linux/jhash.h
@@ -1,134 +1,14 @@
#ifndef _LINUX_JHASH_H
#define _LINUX_JHASH_H

-/* jhash.h: Jenkins hash support.
- *
- * Copyright (C) 1996 Bob Jenkins ([email protected])
- *
- * http://burtleburtle.net/bob/hash/
- *
- * These are the credits from Bob's sources:
- *
- * lookup2.c, by Bob Jenkins, December 1996, Public Domain.
- * hash(), hash2(), hash3, and mix() are externally useful functions.
- * Routines to test the hash are included if SELF_TEST is defined.
- * You can use this free for any purpose. It has no warranty.
- *
- * Copyright (C) 2003 David S. Miller ([email protected])
- *
- * I've modified Bob's hash to be useful in the Linux kernel, and
- * any bugs present are surely my fault. -DaveM
- */
-
-/* NOTE: Arguments are modified. */
-#define __jhash_mix(a, b, c) \
-{ \
- a -= b; a -= c; a ^= (c>>13); \
- b -= c; b -= a; b ^= (a<<8); \
- c -= a; c -= b; c ^= (b>>13); \
- a -= b; a -= c; a ^= (c>>12); \
- b -= c; b -= a; b ^= (a<<16); \
- c -= a; c -= b; c ^= (b>>5); \
- a -= b; a -= c; a ^= (c>>3); \
- b -= c; b -= a; b ^= (a<<10); \
- c -= a; c -= b; c ^= (b>>15); \
-}
-
-/* The golden ration: an arbitrary value */
-#define JHASH_GOLDEN_RATIO 0x9e3779b9
-
-/* The most generic version, hashes an arbitrary sequence
- * of bytes. No alignment or length assumptions are made about
- * the input key.
- */
-static inline u32 jhash(const void *key, u32 length, u32 initval)
-{
- u32 a, b, c, len;
- const u8 *k = key;
-
- len = length;
- a = b = JHASH_GOLDEN_RATIO;
- c = initval;
-
- while (len >= 12) {
- a += (k[0] +((u32)k[1]<<8) +((u32)k[2]<<16) +((u32)k[3]<<24));
- b += (k[4] +((u32)k[5]<<8) +((u32)k[6]<<16) +((u32)k[7]<<24));
- c += (k[8] +((u32)k[9]<<8) +((u32)k[10]<<16)+((u32)k[11]<<24));
-
- __jhash_mix(a,b,c);
-
- k += 12;
- len -= 12;
- }
-
- c += length;
- switch (len) {
- case 11: c += ((u32)k[10]<<24);
- case 10: c += ((u32)k[9]<<16);
- case 9 : c += ((u32)k[8]<<8);
- case 8 : b += ((u32)k[7]<<24);
- case 7 : b += ((u32)k[6]<<16);
- case 6 : b += ((u32)k[5]<<8);
- case 5 : b += k[4];
- case 4 : a += ((u32)k[3]<<24);
- case 3 : a += ((u32)k[2]<<16);
- case 2 : a += ((u32)k[1]<<8);
- case 1 : a += k[0];
- };
-
- __jhash_mix(a,b,c);
-
- return c;
-}
-
-/* A special optimized version that handles 1 or more of u32s.
- * The length parameter here is the number of u32s in the key.
- */
-static inline u32 jhash2(const u32 *k, u32 length, u32 initval)
-{
- u32 a, b, c, len;
-
- a = b = JHASH_GOLDEN_RATIO;
- c = initval;
- len = length;
-
- while (len >= 3) {
- a += k[0];
- b += k[1];
- c += k[2];
- __jhash_mix(a, b, c);
- k += 3; len -= 3;
- }
-
- c += length * 4;
-
- switch (len) {
- case 2 : b += k[1];
- case 1 : a += k[0];
- };
-
- __jhash_mix(a,b,c);
-
- return c;
-}
-
-
-/* A special ultra-optimized versions that knows they are hashing exactly
- * 3, 2 or 1 word(s).
- *
- * NOTE: In particular the "c += length; __jhash_mix(a,b,c);" normally
- * done at the end is not done here.
- */
-static inline u32 jhash_3words(u32 a, u32 b, u32 c, u32 initval)
-{
- a += JHASH_GOLDEN_RATIO;
- b += JHASH_GOLDEN_RATIO;
- c += initval;
-
- __jhash_mix(a, b, c);
-
- return c;
-}
+/* Best hash sizes are of power of two */
+#define jhash_size(n) ((u32)1<<(n))
+/* Mask the hash value, i.e (value & jhash_mask(n)) instead of (value % n) */
+#define jhash_mask(n) (jhash_size(n)-1)
+
+extern u32 jhash(const void *key, u32 length, u32 initval);
+extern u32 jhash2(const u32 *k, u32 length, u32 initval);
+extern u32 jhash_3words(u32 a, u32 b, u32 c, u32 initval);

static inline u32 jhash_2words(u32 a, u32 b, u32 initval)
{
diff --git a/lib/Makefile b/lib/Makefile
index e6a3763..a1a4932 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -10,7 +10,7 @@ endif
lib-y := ctype.o string.o vsprintf.o cmdline.o \
rbtree.o radix-tree.o dump_stack.o \
idr.o int_sqrt.o extable.o prio_tree.o \
- sha1.o irq_regs.o reciprocal_div.o argv_split.o \
+ jhash.o sha1.o irq_regs.o reciprocal_div.o argv_split.o \
proportions.o prio_heap.o ratelimit.o show_mem.o \
is_single_threaded.o plist.o decompress.o flex_array.o

diff --git a/lib/jhash.c b/lib/jhash.c
new file mode 100644
index 0000000..593cc77
--- /dev/null
+++ b/lib/jhash.c
@@ -0,0 +1,153 @@
+/* jhash.c: Jenkins hash support.
+ *
+ * Copyright (C) 2006. Bob Jenkins ([email protected])
+ *
+ * http://burtleburtle.net/bob/hash/
+ *
+ * These are the credits from Bob's sources:
+ *
+ * lookup3.c, by Bob Jenkins, May 2006, Public Domain.
+ *
+ * These are functions for producing 32-bit hashes for hash table lookup.
+ * hashword(), hashlittle(), hashlittle2(), hashbig(), mix(), and final()
+ * are externally useful functions. Routines to test the hash are included
+ * if SELF_TEST is defined. You can use this free for any purpose. It's in
+ * the public domain. It has no warranty.
+ *
+ * Copyright (C) 2009-2010 Jozsef Kadlecsik ([email protected])
+ *
+ * I've modified Bob's hash to be useful in the Linux kernel, and
+ * any bugs present are my fault.
+ * Jozsef
+ */
+#include <linux/bitops.h>
+#include <linux/module.h>
+#include <linux/jhash.h>
+
+/* __jhash_mix -- mix 3 32-bit values reversibly. */
+#define __jhash_mix(a, b, c) \
+{ \
+ a -= c; a ^= rol32(c, 4); c += b; \
+ b -= a; b ^= rol32(a, 6); a += c; \
+ c -= b; c ^= rol32(b, 8); b += a; \
+ a -= c; a ^= rol32(c, 16); c += b; \
+ b -= a; b ^= rol32(a, 19); a += c; \
+ c -= b; c ^= rol32(b, 4); b += a; \
+}
+
+/* __jhash_final - final mixing of 3 32-bit values (a,b,c) into c */
+#define __jhash_final(a, b, c) \
+{ \
+ c ^= b; c -= rol32(b, 14); \
+ a ^= c; a -= rol32(c, 11); \
+ b ^= a; b -= rol32(a, 25); \
+ c ^= b; c -= rol32(b, 16); \
+ a ^= c; a -= rol32(c, 4); \
+ b ^= a; b -= rol32(a, 14); \
+ c ^= b; c -= rol32(b, 24); \
+}
+
+/* An arbitrary initial parameter */
+#define JHASH_INITVAL 0xdeadbeef
+
+/* jhash - hash an arbitrary key
+ * @k: sequence of bytes as key
+ * @length: the length of the key
+ * @initval: the previous hash, or an arbitray value
+ *
+ * The generic version, hashes an arbitrary sequence of bytes.
+ * No alignment or length assumptions are made about the input key.
+ *
+ * Returns the hash value of the key. The result depends on endianness.
+ */
+u32 jhash(const void *key, u32 length, u32 initval)
+{
+ u32 a, b, c;
+ const u8 *k = key;
+
+ /* Set up the internal state */
+ a = b = c = JHASH_INITVAL + length + initval;
+
+ /* All but the last block: affect some 32 bits of (a,b,c) */
+ while (length > 12) {
+ a += k[0] + ((u32)k[1]<<8) + ((u32)k[2]<<16) + ((u32)k[3]<<24);
+ b += k[4] + ((u32)k[5]<<8) + ((u32)k[6]<<16) + ((u32)k[7]<<24);
+ c += k[8] + ((u32)k[9]<<8) + ((u32)k[10]<<16) + ((u32)k[11]<<24);
+ __jhash_mix(a, b, c);
+ length -= 12;
+ k += 12;
+ }
+ /* Last block: affect all 32 bits of (c) */
+ /* All the case statements fall through */
+ switch (length) {
+ case 12: c += (u32)k[11]<<24;
+ case 11: c += (u32)k[10]<<16;
+ case 10: c += (u32)k[9]<<8;
+ case 9: c += k[8];
+ case 8: b += (u32)k[7]<<24;
+ case 7: b += (u32)k[6]<<16;
+ case 6: b += (u32)k[5]<<8;
+ case 5: b += k[4];
+ case 4: a += (u32)k[3]<<24;
+ case 3: a += (u32)k[2]<<16;
+ case 2: a += (u32)k[1]<<8;
+ case 1: a += k[0];
+ __jhash_final(a, b, c);
+ case 0: /* Nothing left to add */
+ break;
+ }
+
+ return c;
+}
+EXPORT_SYMBOL(jhash);
+
+/* jhash2 - hash an array of u32's
+ * @k: the key which must be an array of u32's
+ * @length: the number of u32's in the key
+ * @initval: the previous hash, or an arbitray value
+ *
+ * Returns the hash value of the key.
+ */
+u32 jhash2(const u32 *k, u32 length, u32 initval)
+{
+ u32 a, b, c;
+
+ /* Set up the internal state */
+ a = b = c = JHASH_INITVAL + (length<<2) + initval;
+
+ /* Handle most of the key */
+ while (length > 3) {
+ a += k[0];
+ b += k[1];
+ c += k[2];
+ __jhash_mix(a, b, c);
+ length -= 3;
+ k += 3;
+ }
+
+ /* Handle the last 3 u32's: all the case statements fall through */
+ switch (length) {
+ case 3: c += k[2];
+ case 2: b += k[1];
+ case 1: a += k[0];
+ __jhash_final(a, b, c);
+ case 0: /* Nothing left to add */
+ break;
+ }
+
+ return c;
+}
+EXPORT_SYMBOL(jhash2);
+
+/* jhash_3words - hash exactly 3, 2 or 1 word(s) */
+u32 jhash_3words(u32 a, u32 b, u32 c, u32 initval)
+{
+ a += JHASH_INITVAL;
+ b += JHASH_INITVAL;
+ c += initval;
+
+ __jhash_mix(a, b, c);
+
+ return c;
+}
+EXPORT_SYMBOL(jhash_3words);
--
1.7.0.4

2010-11-25 13:24:32

by Jozsef Kadlecsik

[permalink] [raw]
Subject: [PATCH 1/2] Prepare the tree for un-inlined jhash.

jhash is widely used in the kernel and because the functions
are inlined, the cost in size is significant. Also, the new jhash
functions are slightly larger than the previous ones so better un-inline.
As a preparation step, the calls to the internal macros are replaced
with the plain jhash function calls.

Signed-off-by: Jozsef Kadlecsik <[email protected]>
---
net/ipv6/inet6_connection_sock.c | 18 ++++++++----------
net/ipv6/reassembly.c | 36 ++++++++++++++++--------------------
2 files changed, 24 insertions(+), 30 deletions(-)

diff --git a/net/ipv6/inet6_connection_sock.c b/net/ipv6/inet6_connection_sock.c
index 8a16280..861d252 100644
--- a/net/ipv6/inet6_connection_sock.c
+++ b/net/ipv6/inet6_connection_sock.c
@@ -60,18 +60,16 @@ EXPORT_SYMBOL_GPL(inet6_csk_bind_conflict);
static u32 inet6_synq_hash(const struct in6_addr *raddr, const __be16 rport,
const u32 rnd, const u16 synq_hsize)
{
- u32 a = (__force u32)raddr->s6_addr32[0];
- u32 b = (__force u32)raddr->s6_addr32[1];
- u32 c = (__force u32)raddr->s6_addr32[2];
-
- a += JHASH_GOLDEN_RATIO;
- b += JHASH_GOLDEN_RATIO;
- c += rnd;
- __jhash_mix(a, b, c);
-
- a += (__force u32)raddr->s6_addr32[3];
- b += (__force u32)rport;
- __jhash_mix(a, b, c);
+ u32 c;
+
+ c = jhash_3words((__force u32)raddr->s6_addr32[0],
+ (__force u32)raddr->s6_addr32[1],
+ (__force u32)raddr->s6_addr32[2],
+ rnd);
+
+ c = jhash_2words((__force u32)raddr->s6_addr32[3],
+ (__force u32)rport,
+ c);

return c & (synq_hsize - 1);
}
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index c7ba314..5e57490 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -104,26 +104,22 @@ static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *prev,
unsigned int inet6_hash_frag(__be32 id, const struct in6_addr *saddr,
const struct in6_addr *daddr, u32 rnd)
{
- u32 a, b, c;
-
- a = (__force u32)saddr->s6_addr32[0];
- b = (__force u32)saddr->s6_addr32[1];
- c = (__force u32)saddr->s6_addr32[2];
-
- a += JHASH_GOLDEN_RATIO;
- b += JHASH_GOLDEN_RATIO;
- c += rnd;
- __jhash_mix(a, b, c);
-
- a += (__force u32)saddr->s6_addr32[3];
- b += (__force u32)daddr->s6_addr32[0];
- c += (__force u32)daddr->s6_addr32[1];
- __jhash_mix(a, b, c);
-
- a += (__force u32)daddr->s6_addr32[2];
- b += (__force u32)daddr->s6_addr32[3];
- c += (__force u32)id;
- __jhash_mix(a, b, c);
+ u32 c;
+
+ c = jhash_3words((__force u32)saddr->s6_addr32[0],
+ (__force u32)saddr->s6_addr32[1],
+ (__force u32)saddr->s6_addr32[2],
+ rnd);
+
+ c = jhash_3words((__force u32)saddr->s6_addr32[3],
+ (__force u32)daddr->s6_addr32[0],
+ (__force u32)daddr->s6_addr32[1],
+ c);
+
+ c = jhash_3words((__force u32)daddr->s6_addr32[2],
+ (__force u32)daddr->s6_addr32[3],
+ (__force u32)id,
+ c);

return c & (INETFRAGS_HASHSZ - 1);
}
--
1.7.0.4

2010-11-25 13:39:52

by Jan Engelhardt

[permalink] [raw]
Subject: Re: [PATCH 1/2] Prepare the tree for un-inlined jhash.

On Thursday 2010-11-25 14:15, Jozsef Kadlecsik wrote:

>jhash is widely used in the kernel and because the functions
>are inlined, the cost in size is significant. Also, the new jhash
>functions are slightly larger than the previous ones so better un-inline.
>As a preparation step, the calls to the internal macros are replaced
>with the plain jhash function calls.

Do you have a non-normative allyesconfig/allmodconfig build whose
size(1) you can run on, to show approximately just how much it differs?


thanks,
Jan

2010-11-25 13:49:15

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH 2/2] The new jhash implementation

Le jeudi 25 novembre 2010 à 14:15 +0100, Jozsef Kadlecsik a écrit :
> The current jhash.h implements the lookup2() hash function by Bob Jenkins.
> However, lookup2() is outdated as Bob wrote a new hash function called
> lookup3(). The new hash function
>
> - mixes better than lookup2(): it passes the check that every input bit
> changes every output bit 50% of the time, while lookup2() failed it.
> - performs better: compiled with -O2 on Core2 Duo, lookup3() 20-40% faster
> than lookup2() depending on the key length.
>
> The patch replaces the lookup2() implementation of the 'jhash*'
> functions with that of lookup3().
>
> You can read a longer comparison of the two and other hash functions at
> http://burtleburtle.net/bob/hash/doobs.html.
>
> Signed-off-by: Jozsef Kadlecsik <[email protected]>
> ---
> include/linux/jhash.h | 136 +++-----------------------------------------
> lib/Makefile | 2 +-
> lib/jhash.c | 153 +++++++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 162 insertions(+), 129 deletions(-)
> create mode 100644 lib/jhash.c
>
...

I agree jhash() should be not be inlined.

I am not sure for other variants.

> +u32 jhash(const void *key, u32 length, u32 initval)
> +{
> + u32 a, b, c;
> + const u8 *k = key;
> +
> + /* Set up the internal state */
> + a = b = c = JHASH_INITVAL + length + initval;
> +
> + /* All but the last block: affect some 32 bits of (a,b,c) */
> + while (length > 12) {
> + a += k[0] + ((u32)k[1]<<8) + ((u32)k[2]<<16) + ((u32)k[3]<<24);

disassembly code on x86_32 for the previous line :

26: 66 90 xchg %ax,%ax
28: 0f b6 72 01 movzbl 0x1(%edx),%esi
2c: 0f b6 4a 02 movzbl 0x2(%edx),%ecx
30: c1 e6 08 shl $0x8,%esi
33: c1 e1 10 shl $0x10,%ecx
36: 8d 0c 0e lea (%esi,%ecx,1),%ecx
39: 0f b6 32 movzbl (%edx),%esi
3c: 8d 34 31 lea (%ecx,%esi,1),%esi
3f: 0f b6 4a 03 movzbl 0x3(%edx),%ecx
43: c1 e1 18 shl $0x18,%ecx
46: 8d 0c 0e lea (%esi,%ecx,1),%ecx

or (CONFIG_CC_OPTIMIZE_FOR_SIZE=y) :

1b: 0f b6 7b 01 movzbl 0x1(%ebx),%edi
1f: c1 e7 08 shl $0x8,%edi
22: 89 7d f0 mov %edi,-0x10(%ebp)
25: 0f b6 7b 02 movzbl 0x2(%ebx),%edi
29: c1 e7 10 shl $0x10,%edi
2c: 03 7d f0 add -0x10(%ebp),%edi
2f: 89 7d f0 mov %edi,-0x10(%ebp)
32: 0f b6 3b movzbl (%ebx),%edi
35: 03 7d f0 add -0x10(%ebp),%edi
38: 89 7d f0 mov %edi,-0x10(%ebp)
3b: 0f b6 7b 03 movzbl 0x3(%ebx),%edi
3f: c1 e7 18 shl $0x18,%edi
42: 03 7d f0 add -0x10(%ebp),%edi


I suggest :

#include <linux/unaligned/packed_struct.h>
...
a += __get_unaligned_cpu32(k);
b += __get_unaligned_cpu32(k+4);
c += __get_unaligned_cpu32(k+8);

Fits nicely in registers.




> + b += k[4] + ((u32)k[5]<<8) + ((u32)k[6]<<16) + ((u32)k[7]<<24);
> + c += k[8] + ((u32)k[9]<<8) + ((u32)k[10]<<16) + ((u32)k[11]<<24);
> + __jhash_mix(a, b, c);
> + length -= 12;
> + k += 12;
> + }
> + /* Last block: affect all 32 bits of (c) */
> + /* All the case statements fall through */
> + switch (length) {
> + case 12: c += (u32)k[11]<<24;
> + case 11: c += (u32)k[10]<<16;
> + case 10: c += (u32)k[9]<<8;
> + case 9: c += k[8];
> + case 8: b += (u32)k[7]<<24;
> + case 7: b += (u32)k[6]<<16;
> + case 6: b += (u32)k[5]<<8;
> + case 5: b += k[4];
> + case 4: a += (u32)k[3]<<24;
> + case 3: a += (u32)k[2]<<16;
> + case 2: a += (u32)k[1]<<8;
> + case 1: a += k[0];
> + __jhash_final(a, b, c);
> + case 0: /* Nothing left to add */
> + break;
> + }
> +
> + return c;
> +}
> +EXPORT_SYMBOL(jhash);


2010-11-25 13:55:59

by Changli Gao

[permalink] [raw]
Subject: Re: [PATCH 2/2] The new jhash implementation

On Thu, Nov 25, 2010 at 9:49 PM, Eric Dumazet <[email protected]> wrote:
> Le jeudi 25 novembre 2010 ? 14:15 +0100, Jozsef Kadlecsik a ?crit :
>> The current jhash.h implements the lookup2() hash function by Bob Jenkins.
>> However, lookup2() is outdated as Bob wrote a new hash function called
>> lookup3(). The new hash function
>>
>> - mixes better than lookup2(): it passes the check that every input bit
>> ? changes every output bit 50% of the time, while lookup2() failed it.
>> - performs better: compiled with -O2 on Core2 Duo, lookup3() 20-40% faster
>> ? than lookup2() depending on the key length.
>>
>> The patch replaces the lookup2() implementation of the 'jhash*'
>> functions with that of lookup3().
>>
>> You can read a longer comparison of the two and other hash functions at
>> http://burtleburtle.net/bob/hash/doobs.html.
>>
>> Signed-off-by: Jozsef Kadlecsik <[email protected]>
>> ---
>> ?include/linux/jhash.h | ?136 +++-----------------------------------------
>> ?lib/Makefile ? ? ? ? ?| ? ?2 +-
>> ?lib/jhash.c ? ? ? ? ? | ?153 +++++++++++++++++++++++++++++++++++++++++++++++++
>> ?3 files changed, 162 insertions(+), 129 deletions(-)
>> ?create mode 100644 lib/jhash.c
>>
> ...
>
> I agree jhash() should be not be inlined.
>
> I am not sure for other variants.
>
>> +u32 jhash(const void *key, u32 length, u32 initval)
>> +{
>> + ? ? u32 a, b, c;
>> + ? ? const u8 *k = key;
>> +
>> + ? ? /* Set up the internal state */
>> + ? ? a = b = c = JHASH_INITVAL + length + initval;
>> +
>> + ? ? /* All but the last block: affect some 32 bits of (a,b,c) */
>> + ? ? while (length > 12) {
>> + ? ? ? ? ? ? a += k[0] + ((u32)k[1]<<8) + ((u32)k[2]<<16) + ((u32)k[3]<<24);
>
> disassembly code on x86_32 for the previous line :
>
> ?26: ? 66 90 ? ? ? ? ? ? ? ? ? xchg ? %ax,%ax
> ?28: ? 0f b6 72 01 ? ? ? ? ? ? movzbl 0x1(%edx),%esi
> ?2c: ? 0f b6 4a 02 ? ? ? ? ? ? movzbl 0x2(%edx),%ecx
> ?30: ? c1 e6 08 ? ? ? ? ? ? ? ?shl ? ?$0x8,%esi
> ?33: ? c1 e1 10 ? ? ? ? ? ? ? ?shl ? ?$0x10,%ecx
> ?36: ? 8d 0c 0e ? ? ? ? ? ? ? ?lea ? ?(%esi,%ecx,1),%ecx
> ?39: ? 0f b6 32 ? ? ? ? ? ? ? ?movzbl (%edx),%esi
> ?3c: ? 8d 34 31 ? ? ? ? ? ? ? ?lea ? ?(%ecx,%esi,1),%esi
> ?3f: ? 0f b6 4a 03 ? ? ? ? ? ? movzbl 0x3(%edx),%ecx
> ?43: ? c1 e1 18 ? ? ? ? ? ? ? ?shl ? ?$0x18,%ecx
> ?46: ? 8d 0c 0e ? ? ? ? ? ? ? ?lea ? ?(%esi,%ecx,1),%ecx
>
> or (CONFIG_CC_OPTIMIZE_FOR_SIZE=y) :
>
> ?1b: ? 0f b6 7b 01 ? ? ? ? ? ? movzbl 0x1(%ebx),%edi
> ?1f: ? c1 e7 08 ? ? ? ? ? ? ? ?shl ? ?$0x8,%edi
> ?22: ? 89 7d f0 ? ? ? ? ? ? ? ?mov ? ?%edi,-0x10(%ebp)
> ?25: ? 0f b6 7b 02 ? ? ? ? ? ? movzbl 0x2(%ebx),%edi
> ?29: ? c1 e7 10 ? ? ? ? ? ? ? ?shl ? ?$0x10,%edi
> ?2c: ? 03 7d f0 ? ? ? ? ? ? ? ?add ? ?-0x10(%ebp),%edi
> ?2f: ? 89 7d f0 ? ? ? ? ? ? ? ?mov ? ?%edi,-0x10(%ebp)
> ?32: ? 0f b6 3b ? ? ? ? ? ? ? ?movzbl (%ebx),%edi
> ?35: ? 03 7d f0 ? ? ? ? ? ? ? ?add ? ?-0x10(%ebp),%edi
> ?38: ? 89 7d f0 ? ? ? ? ? ? ? ?mov ? ?%edi,-0x10(%ebp)
> ?3b: ? 0f b6 7b 03 ? ? ? ? ? ? movzbl 0x3(%ebx),%edi
> ?3f: ? c1 e7 18 ? ? ? ? ? ? ? ?shl ? ?$0x18,%edi
> ?42: ? 03 7d f0 ? ? ? ? ? ? ? ?add ? ?-0x10(%ebp),%edi
>
>
> I suggest :
>
> #include <linux/unaligned/packed_struct.h>
> ...
> ? ? ? ?a += __get_unaligned_cpu32(k);
> ? ? ? ?b += __get_unaligned_cpu32(k+4);
> ? ? ? ?c += __get_unaligned_cpu32(k+8);
>
> Fits nicely in registers.
>

I think you mean get_unaligned_le32().

--
Regards,
Changli Gao([email protected])

2010-11-25 13:57:37

by Jozsef Kadlecsik

[permalink] [raw]
Subject: Re: [PATCH 1/2] Prepare the tree for un-inlined jhash.

On Thu, 25 Nov 2010, Jan Engelhardt wrote:

> On Thursday 2010-11-25 14:15, Jozsef Kadlecsik wrote:
>
> >jhash is widely used in the kernel and because the functions
> >are inlined, the cost in size is significant. Also, the new jhash
> >functions are slightly larger than the previous ones so better un-inline.
> >As a preparation step, the calls to the internal macros are replaced
> >with the plain jhash function calls.
>
> Do you have a non-normative allyesconfig/allmodconfig build whose
> size(1) you can run on, to show approximately just how much it differs?

In the cover mail I referred the link to the message from Ilpo Jarvinen:
"I once looked into inlining cost and jhash functions were among the most
wasteful (kernel-wide). Multiple jhash bodies were 100+ bytes, and the
overall cost was 10k+."

Best regards,
Jozsef
-
E-mail : [email protected], [email protected]
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
H-1525 Budapest 114, POB. 49, Hungary

2010-11-25 14:05:54

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH 2/2] The new jhash implementation

Le jeudi 25 novembre 2010 à 21:55 +0800, Changli Gao a écrit :

> > I suggest :
> >
> > #include <linux/unaligned/packed_struct.h>
> > ...
> > a += __get_unaligned_cpu32(k);
> > b += __get_unaligned_cpu32(k+4);
> > c += __get_unaligned_cpu32(k+8);
> >
> > Fits nicely in registers.
> >
>
> I think you mean get_unaligned_le32().
>

No, I meant __get_unaligned_cpu32()

We do same thing in jhash2() :

a += k[0];
b += k[1];
c += k[2];

We dont care of bit order of the 32bit quantity we are adding to a,b or
c , as long its consistent for the current machine ;)

get_unaligned_le32() would be slow on big endian arches.


2010-11-25 14:41:13

by Jozsef Kadlecsik

[permalink] [raw]
Subject: Re: [PATCH 2/2] The new jhash implementation

On Thu, 25 Nov 2010, Eric Dumazet wrote:

> I agree jhash() should be not be inlined.
>
> I am not sure for other variants.

Yeah, I have got the same feelings. I decided to un-inline all of them
because that way the internal macros are not exposed at all.

> > +u32 jhash(const void *key, u32 length, u32 initval)
> > +{
> > + u32 a, b, c;
> > + const u8 *k = key;
> > +
> > + /* Set up the internal state */
> > + a = b = c = JHASH_INITVAL + length + initval;
> > +
> > + /* All but the last block: affect some 32 bits of (a,b,c) */
> > + while (length > 12) {
> > + a += k[0] + ((u32)k[1]<<8) + ((u32)k[2]<<16) + ((u32)k[3]<<24);
>
> disassembly code on x86_32 for the previous line :
>
> 26: 66 90 xchg %ax,%ax
> 28: 0f b6 72 01 movzbl 0x1(%edx),%esi
> 2c: 0f b6 4a 02 movzbl 0x2(%edx),%ecx
> 30: c1 e6 08 shl $0x8,%esi
> 33: c1 e1 10 shl $0x10,%ecx
> 36: 8d 0c 0e lea (%esi,%ecx,1),%ecx
> 39: 0f b6 32 movzbl (%edx),%esi
> 3c: 8d 34 31 lea (%ecx,%esi,1),%esi
> 3f: 0f b6 4a 03 movzbl 0x3(%edx),%ecx
> 43: c1 e1 18 shl $0x18,%ecx
> 46: 8d 0c 0e lea (%esi,%ecx,1),%ecx
>
> or (CONFIG_CC_OPTIMIZE_FOR_SIZE=y) :
>
> 1b: 0f b6 7b 01 movzbl 0x1(%ebx),%edi
> 1f: c1 e7 08 shl $0x8,%edi
> 22: 89 7d f0 mov %edi,-0x10(%ebp)
> 25: 0f b6 7b 02 movzbl 0x2(%ebx),%edi
> 29: c1 e7 10 shl $0x10,%edi
> 2c: 03 7d f0 add -0x10(%ebp),%edi
> 2f: 89 7d f0 mov %edi,-0x10(%ebp)
> 32: 0f b6 3b movzbl (%ebx),%edi
> 35: 03 7d f0 add -0x10(%ebp),%edi
> 38: 89 7d f0 mov %edi,-0x10(%ebp)
> 3b: 0f b6 7b 03 movzbl 0x3(%ebx),%edi
> 3f: c1 e7 18 shl $0x18,%edi
> 42: 03 7d f0 add -0x10(%ebp),%edi
>
>
> I suggest :
>
> #include <linux/unaligned/packed_struct.h>
> ...
> a += __get_unaligned_cpu32(k);
> b += __get_unaligned_cpu32(k+4);
> c += __get_unaligned_cpu32(k+8);
>
> Fits nicely in registers.

Good idea, thanks!

Here follows the updated second patch:

The current jhash.h implements the lookup2() hash function by Bob Jenkins.
However, lookup2() is outdated as Bob wrote a new hash function called
lookup3(). The new hash function

- mixes better than lookup2(): it passes the check that every input bit
changes every output bit 50% of the time, while lookup2() failed it.
- performs better: compiled with -O2 on Core2 Duo, lookup3() 20-40% faster
than lookup2() depending on the key length.

The patch replaces the lookup2() implementation of the 'jhash*'
functions with that of lookup3().

You can read a longer comparison of the two and other hash functions at
http://burtleburtle.net/bob/hash/doobs.html.

Signed-off-by: Jozsef Kadlecsik <[email protected]>
---
include/linux/jhash.h | 136 +++----------------------------------------
lib/Makefile | 2 +-
lib/jhash.c | 154 +++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 163 insertions(+), 129 deletions(-)
create mode 100644 lib/jhash.c

diff --git a/include/linux/jhash.h b/include/linux/jhash.h
index ced1159..ca69ac3 100644
--- a/include/linux/jhash.h
+++ b/include/linux/jhash.h
@@ -1,134 +1,14 @@
#ifndef _LINUX_JHASH_H
#define _LINUX_JHASH_H

-/* jhash.h: Jenkins hash support.
- *
- * Copyright (C) 1996 Bob Jenkins ([email protected])
- *
- * http://burtleburtle.net/bob/hash/
- *
- * These are the credits from Bob's sources:
- *
- * lookup2.c, by Bob Jenkins, December 1996, Public Domain.
- * hash(), hash2(), hash3, and mix() are externally useful functions.
- * Routines to test the hash are included if SELF_TEST is defined.
- * You can use this free for any purpose. It has no warranty.
- *
- * Copyright (C) 2003 David S. Miller ([email protected])
- *
- * I've modified Bob's hash to be useful in the Linux kernel, and
- * any bugs present are surely my fault. -DaveM
- */
-
-/* NOTE: Arguments are modified. */
-#define __jhash_mix(a, b, c) \
-{ \
- a -= b; a -= c; a ^= (c>>13); \
- b -= c; b -= a; b ^= (a<<8); \
- c -= a; c -= b; c ^= (b>>13); \
- a -= b; a -= c; a ^= (c>>12); \
- b -= c; b -= a; b ^= (a<<16); \
- c -= a; c -= b; c ^= (b>>5); \
- a -= b; a -= c; a ^= (c>>3); \
- b -= c; b -= a; b ^= (a<<10); \
- c -= a; c -= b; c ^= (b>>15); \
-}
-
-/* The golden ration: an arbitrary value */
-#define JHASH_GOLDEN_RATIO 0x9e3779b9
-
-/* The most generic version, hashes an arbitrary sequence
- * of bytes. No alignment or length assumptions are made about
- * the input key.
- */
-static inline u32 jhash(const void *key, u32 length, u32 initval)
-{
- u32 a, b, c, len;
- const u8 *k = key;
-
- len = length;
- a = b = JHASH_GOLDEN_RATIO;
- c = initval;
-
- while (len >= 12) {
- a += (k[0] +((u32)k[1]<<8) +((u32)k[2]<<16) +((u32)k[3]<<24));
- b += (k[4] +((u32)k[5]<<8) +((u32)k[6]<<16) +((u32)k[7]<<24));
- c += (k[8] +((u32)k[9]<<8) +((u32)k[10]<<16)+((u32)k[11]<<24));
-
- __jhash_mix(a,b,c);
-
- k += 12;
- len -= 12;
- }
-
- c += length;
- switch (len) {
- case 11: c += ((u32)k[10]<<24);
- case 10: c += ((u32)k[9]<<16);
- case 9 : c += ((u32)k[8]<<8);
- case 8 : b += ((u32)k[7]<<24);
- case 7 : b += ((u32)k[6]<<16);
- case 6 : b += ((u32)k[5]<<8);
- case 5 : b += k[4];
- case 4 : a += ((u32)k[3]<<24);
- case 3 : a += ((u32)k[2]<<16);
- case 2 : a += ((u32)k[1]<<8);
- case 1 : a += k[0];
- };
-
- __jhash_mix(a,b,c);
-
- return c;
-}
-
-/* A special optimized version that handles 1 or more of u32s.
- * The length parameter here is the number of u32s in the key.
- */
-static inline u32 jhash2(const u32 *k, u32 length, u32 initval)
-{
- u32 a, b, c, len;
-
- a = b = JHASH_GOLDEN_RATIO;
- c = initval;
- len = length;
-
- while (len >= 3) {
- a += k[0];
- b += k[1];
- c += k[2];
- __jhash_mix(a, b, c);
- k += 3; len -= 3;
- }
-
- c += length * 4;
-
- switch (len) {
- case 2 : b += k[1];
- case 1 : a += k[0];
- };
-
- __jhash_mix(a,b,c);
-
- return c;
-}
-
-
-/* A special ultra-optimized versions that knows they are hashing exactly
- * 3, 2 or 1 word(s).
- *
- * NOTE: In particular the "c += length; __jhash_mix(a,b,c);" normally
- * done at the end is not done here.
- */
-static inline u32 jhash_3words(u32 a, u32 b, u32 c, u32 initval)
-{
- a += JHASH_GOLDEN_RATIO;
- b += JHASH_GOLDEN_RATIO;
- c += initval;
-
- __jhash_mix(a, b, c);
-
- return c;
-}
+/* Best hash sizes are of power of two */
+#define jhash_size(n) ((u32)1<<(n))
+/* Mask the hash value, i.e (value & jhash_mask(n)) instead of (value % n) */
+#define jhash_mask(n) (jhash_size(n)-1)
+
+extern u32 jhash(const void *key, u32 length, u32 initval);
+extern u32 jhash2(const u32 *k, u32 length, u32 initval);
+extern u32 jhash_3words(u32 a, u32 b, u32 c, u32 initval);

static inline u32 jhash_2words(u32 a, u32 b, u32 initval)
{
diff --git a/lib/Makefile b/lib/Makefile
index e6a3763..a1a4932 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -10,7 +10,7 @@ endif
lib-y := ctype.o string.o vsprintf.o cmdline.o \
rbtree.o radix-tree.o dump_stack.o \
idr.o int_sqrt.o extable.o prio_tree.o \
- sha1.o irq_regs.o reciprocal_div.o argv_split.o \
+ jhash.o sha1.o irq_regs.o reciprocal_div.o argv_split.o \
proportions.o prio_heap.o ratelimit.o show_mem.o \
is_single_threaded.o plist.o decompress.o flex_array.o

diff --git a/lib/jhash.c b/lib/jhash.c
new file mode 100644
index 0000000..538277b
--- /dev/null
+++ b/lib/jhash.c
@@ -0,0 +1,154 @@
+/* jhash.c: Jenkins hash support.
+ *
+ * Copyright (C) 2006. Bob Jenkins ([email protected])
+ *
+ * http://burtleburtle.net/bob/hash/
+ *
+ * These are the credits from Bob's sources:
+ *
+ * lookup3.c, by Bob Jenkins, May 2006, Public Domain.
+ *
+ * These are functions for producing 32-bit hashes for hash table lookup.
+ * hashword(), hashlittle(), hashlittle2(), hashbig(), mix(), and final()
+ * are externally useful functions. Routines to test the hash are included
+ * if SELF_TEST is defined. You can use this free for any purpose. It's in
+ * the public domain. It has no warranty.
+ *
+ * Copyright (C) 2009-2010 Jozsef Kadlecsik ([email protected])
+ *
+ * I've modified Bob's hash to be useful in the Linux kernel, and
+ * any bugs present are my fault.
+ * Jozsef
+ */
+#include <linux/bitops.h>
+#include <linux/module.h>
+#include <linux/unaligned/packed_struct.h>
+#include <linux/jhash.h>
+
+/* __jhash_mix -- mix 3 32-bit values reversibly. */
+#define __jhash_mix(a, b, c) \
+{ \
+ a -= c; a ^= rol32(c, 4); c += b; \
+ b -= a; b ^= rol32(a, 6); a += c; \
+ c -= b; c ^= rol32(b, 8); b += a; \
+ a -= c; a ^= rol32(c, 16); c += b; \
+ b -= a; b ^= rol32(a, 19); a += c; \
+ c -= b; c ^= rol32(b, 4); b += a; \
+}
+
+/* __jhash_final - final mixing of 3 32-bit values (a,b,c) into c */
+#define __jhash_final(a, b, c) \
+{ \
+ c ^= b; c -= rol32(b, 14); \
+ a ^= c; a -= rol32(c, 11); \
+ b ^= a; b -= rol32(a, 25); \
+ c ^= b; c -= rol32(b, 16); \
+ a ^= c; a -= rol32(c, 4); \
+ b ^= a; b -= rol32(a, 14); \
+ c ^= b; c -= rol32(b, 24); \
+}
+
+/* An arbitrary initial parameter */
+#define JHASH_INITVAL 0xdeadbeef
+
+/* jhash - hash an arbitrary key
+ * @k: sequence of bytes as key
+ * @length: the length of the key
+ * @initval: the previous hash, or an arbitray value
+ *
+ * The generic version, hashes an arbitrary sequence of bytes.
+ * No alignment or length assumptions are made about the input key.
+ *
+ * Returns the hash value of the key. The result depends on endianness.
+ */
+u32 jhash(const void *key, u32 length, u32 initval)
+{
+ u32 a, b, c;
+ const u8 *k = key;
+
+ /* Set up the internal state */
+ a = b = c = JHASH_INITVAL + length + initval;
+
+ /* All but the last block: affect some 32 bits of (a,b,c) */
+ while (length > 12) {
+ a += __get_unaligned_cpu32(k);
+ b += __get_unaligned_cpu32(k + 4);
+ c += __get_unaligned_cpu32(k + 8);
+ __jhash_mix(a, b, c);
+ length -= 12;
+ k += 12;
+ }
+ /* Last block: affect all 32 bits of (c) */
+ /* All the case statements fall through */
+ switch (length) {
+ case 12: c += (u32)k[11]<<24;
+ case 11: c += (u32)k[10]<<16;
+ case 10: c += (u32)k[9]<<8;
+ case 9: c += k[8];
+ case 8: b += (u32)k[7]<<24;
+ case 7: b += (u32)k[6]<<16;
+ case 6: b += (u32)k[5]<<8;
+ case 5: b += k[4];
+ case 4: a += (u32)k[3]<<24;
+ case 3: a += (u32)k[2]<<16;
+ case 2: a += (u32)k[1]<<8;
+ case 1: a += k[0];
+ __jhash_final(a, b, c);
+ case 0: /* Nothing left to add */
+ break;
+ }
+
+ return c;
+}
+EXPORT_SYMBOL(jhash);
+
+/* jhash2 - hash an array of u32's
+ * @k: the key which must be an array of u32's
+ * @length: the number of u32's in the key
+ * @initval: the previous hash, or an arbitray value
+ *
+ * Returns the hash value of the key.
+ */
+u32 jhash2(const u32 *k, u32 length, u32 initval)
+{
+ u32 a, b, c;
+
+ /* Set up the internal state */
+ a = b = c = JHASH_INITVAL + (length<<2) + initval;
+
+ /* Handle most of the key */
+ while (length > 3) {
+ a += k[0];
+ b += k[1];
+ c += k[2];
+ __jhash_mix(a, b, c);
+ length -= 3;
+ k += 3;
+ }
+
+ /* Handle the last 3 u32's: all the case statements fall through */
+ switch (length) {
+ case 3: c += k[2];
+ case 2: b += k[1];
+ case 1: a += k[0];
+ __jhash_final(a, b, c);
+ case 0: /* Nothing left to add */
+ break;
+ }
+
+ return c;
+}
+EXPORT_SYMBOL(jhash2);
+
+/* jhash_3words - hash exactly 3, 2 or 1 word(s) */
+u32 jhash_3words(u32 a, u32 b, u32 c, u32 initval)
+{
+ a += JHASH_INITVAL;
+ b += JHASH_INITVAL;
+ c += initval;
+
+ __jhash_mix(a, b, c);
+
+ return c;
+}
+EXPORT_SYMBOL(jhash_3words);
--
1.7.0.4

Best regards,
Jozsef
-
E-mail : [email protected], [email protected]
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
H-1525 Budapest 114, POB. 49, Hungary

2010-11-25 17:21:16

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH 2/2] The new jhash implementation

Le jeudi 25 novembre 2010 à 15:41 +0100, Jozsef Kadlecsik a écrit :

...

> +/* __jhash_mix -- mix 3 32-bit values reversibly. */
> +#define __jhash_mix(a, b, c) \
> +{ \
> + a -= c; a ^= rol32(c, 4); c += b; \
> + b -= a; b ^= rol32(a, 6); a += c; \
> + c -= b; c ^= rol32(b, 8); b += a; \
> + a -= c; a ^= rol32(c, 16); c += b; \
> + b -= a; b ^= rol32(a, 19); a += c; \
> + c -= b; c ^= rol32(b, 4); b += a; \
> +}
> +
> +/* __jhash_final - final mixing of 3 32-bit values (a,b,c) into c */
> +#define __jhash_final(a, b, c) \
> +{ \
> + c ^= b; c -= rol32(b, 14); \
> + a ^= c; a -= rol32(c, 11); \
> + b ^= a; b -= rol32(a, 25); \
> + c ^= b; c -= rol32(b, 16); \
> + a ^= c; a -= rol32(c, 4); \
> + b ^= a; b -= rol32(a, 14); \
> + c ^= b; c -= rol32(b, 24); \
> +}
> +

So we now have a special __jhash_final(a, b, c) thing for the last
values.



> +/* jhash_3words - hash exactly 3, 2 or 1 word(s) */
> +u32 jhash_3words(u32 a, u32 b, u32 c, u32 initval)
> +{
> + a += JHASH_INITVAL;
> + b += JHASH_INITVAL;
> + c += initval;
> +
> + __jhash_mix(a, b, c);
> +
> + return c;
> +}
> +EXPORT_SYMBOL(jhash_3words);


But you dont use it in jhash_3words().


I do think jhash_3words() should stay inlined, unless maybe
CONFIG_CC_OPTIMIZE_FOR_SIZE=y

We hit it several time per packet in network stack in RX path.

Once in skb_get_rxhash() (unless device fills skb->rxhash)
Once at least in conntrack (if used).
Once in UDP or TCP stack


2010-11-25 21:14:22

by Jozsef Kadlecsik

[permalink] [raw]
Subject: Re: [PATCH 2/2] The new jhash implementation

On Thu, 25 Nov 2010, Eric Dumazet wrote:

> Le jeudi 25 novembre 2010 ? 15:41 +0100, Jozsef Kadlecsik a ?crit :
>
> > +/* __jhash_final - final mixing of 3 32-bit values (a,b,c) into c */
> > +#define __jhash_final(a, b, c) \
> > +{ \
> > + c ^= b; c -= rol32(b, 14); \
> > + a ^= c; a -= rol32(c, 11); \
> > + b ^= a; b -= rol32(a, 25); \
> > + c ^= b; c -= rol32(b, 16); \
> > + a ^= c; a -= rol32(c, 4); \
> > + b ^= a; b -= rol32(a, 14); \
> > + c ^= b; c -= rol32(b, 24); \
> > +}
> > +
>
> So we now have a special __jhash_final(a, b, c) thing for the last
> values.

Yes, and...

> > +/* jhash_3words - hash exactly 3, 2 or 1 word(s) */
> > +u32 jhash_3words(u32 a, u32 b, u32 c, u32 initval)
> > +{
> > + a += JHASH_INITVAL;
> > + b += JHASH_INITVAL;
> > + c += initval;
> > +
> > + __jhash_mix(a, b, c);
> > +
> > + return c;
> > +}
> > +EXPORT_SYMBOL(jhash_3words);
>
> But you dont use it in jhash_3words().

... excellent spotting! That should be __jhash_final in jhash_3words.

> I do think jhash_3words() should stay inlined, unless maybe
> CONFIG_CC_OPTIMIZE_FOR_SIZE=y
>
> We hit it several time per packet in network stack in RX path.
>
> Once in skb_get_rxhash() (unless device fills skb->rxhash)
> Once at least in conntrack (if used).
> Once in UDP or TCP stack

There follows the new version of the second patch: jhash_3words is moved
back to inlined form and the bug in it fixed. Thanks for your thorough
reviewing indeed!

The current jhash.h implements the lookup2() hash function by Bob Jenkins.
However, lookup2() is outdated as Bob wrote a new hash function called
lookup3(). The new hash function

- mixes better than lookup2(): it passes the check that every input bit
changes every output bit 50% of the time, while lookup2() failed it.
- performs better: compiled with -O2 on Core2 Duo, lookup3() 20-40% faster
than lookup2() depending on the key length.

The patch replaces the lookup2() implementation of the 'jhash*'
functions with that of lookup3().

You can read a longer comparison of the two and other hash functions at
http://burtleburtle.net/bob/hash/doobs.html.

Signed-off-by: Jozsef Kadlecsik <[email protected]>
---
include/linux/jhash.h | 140 ++++++++-----------------------------------------
lib/Makefile | 2 +-
lib/jhash.c | 127 ++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 151 insertions(+), 118 deletions(-)
create mode 100644 lib/jhash.c

diff --git a/include/linux/jhash.h b/include/linux/jhash.h
index ced1159..963abad 100644
--- a/include/linux/jhash.h
+++ b/include/linux/jhash.h
@@ -1,131 +1,37 @@
#ifndef _LINUX_JHASH_H
#define _LINUX_JHASH_H

-/* jhash.h: Jenkins hash support.
- *
- * Copyright (C) 1996 Bob Jenkins ([email protected])
- *
- * http://burtleburtle.net/bob/hash/
- *
- * These are the credits from Bob's sources:
- *
- * lookup2.c, by Bob Jenkins, December 1996, Public Domain.
- * hash(), hash2(), hash3, and mix() are externally useful functions.
- * Routines to test the hash are included if SELF_TEST is defined.
- * You can use this free for any purpose. It has no warranty.
- *
- * Copyright (C) 2003 David S. Miller ([email protected])
- *
- * I've modified Bob's hash to be useful in the Linux kernel, and
- * any bugs present are surely my fault. -DaveM
- */
-
-/* NOTE: Arguments are modified. */
-#define __jhash_mix(a, b, c) \
-{ \
- a -= b; a -= c; a ^= (c>>13); \
- b -= c; b -= a; b ^= (a<<8); \
- c -= a; c -= b; c ^= (b>>13); \
- a -= b; a -= c; a ^= (c>>12); \
- b -= c; b -= a; b ^= (a<<16); \
- c -= a; c -= b; c ^= (b>>5); \
- a -= b; a -= c; a ^= (c>>3); \
- b -= c; b -= a; b ^= (a<<10); \
- c -= a; c -= b; c ^= (b>>15); \
-}
-
-/* The golden ration: an arbitrary value */
-#define JHASH_GOLDEN_RATIO 0x9e3779b9
-
-/* The most generic version, hashes an arbitrary sequence
- * of bytes. No alignment or length assumptions are made about
- * the input key.
- */
-static inline u32 jhash(const void *key, u32 length, u32 initval)
-{
- u32 a, b, c, len;
- const u8 *k = key;
-
- len = length;
- a = b = JHASH_GOLDEN_RATIO;
- c = initval;
-
- while (len >= 12) {
- a += (k[0] +((u32)k[1]<<8) +((u32)k[2]<<16) +((u32)k[3]<<24));
- b += (k[4] +((u32)k[5]<<8) +((u32)k[6]<<16) +((u32)k[7]<<24));
- c += (k[8] +((u32)k[9]<<8) +((u32)k[10]<<16)+((u32)k[11]<<24));
-
- __jhash_mix(a,b,c);
-
- k += 12;
- len -= 12;
- }
-
- c += length;
- switch (len) {
- case 11: c += ((u32)k[10]<<24);
- case 10: c += ((u32)k[9]<<16);
- case 9 : c += ((u32)k[8]<<8);
- case 8 : b += ((u32)k[7]<<24);
- case 7 : b += ((u32)k[6]<<16);
- case 6 : b += ((u32)k[5]<<8);
- case 5 : b += k[4];
- case 4 : a += ((u32)k[3]<<24);
- case 3 : a += ((u32)k[2]<<16);
- case 2 : a += ((u32)k[1]<<8);
- case 1 : a += k[0];
- };
-
- __jhash_mix(a,b,c);
-
- return c;
+/* Best hash sizes are of power of two */
+#define jhash_size(n) ((u32)1<<(n))
+/* Mask the hash value, i.e (value & jhash_mask(n)) instead of (value % n) */
+#define jhash_mask(n) (jhash_size(n)-1)
+
+/* __jhash_final - final mixing of 3 32-bit values (a,b,c) into c */
+#define __jhash_final(a, b, c) \
+{ \
+ c ^= b; c -= rol32(b, 14); \
+ a ^= c; a -= rol32(c, 11); \
+ b ^= a; b -= rol32(a, 25); \
+ c ^= b; c -= rol32(b, 16); \
+ a ^= c; a -= rol32(c, 4); \
+ b ^= a; b -= rol32(a, 14); \
+ c ^= b; c -= rol32(b, 24); \
}

-/* A special optimized version that handles 1 or more of u32s.
- * The length parameter here is the number of u32s in the key.
- */
-static inline u32 jhash2(const u32 *k, u32 length, u32 initval)
-{
- u32 a, b, c, len;
-
- a = b = JHASH_GOLDEN_RATIO;
- c = initval;
- len = length;
-
- while (len >= 3) {
- a += k[0];
- b += k[1];
- c += k[2];
- __jhash_mix(a, b, c);
- k += 3; len -= 3;
- }
-
- c += length * 4;
-
- switch (len) {
- case 2 : b += k[1];
- case 1 : a += k[0];
- };
-
- __jhash_mix(a,b,c);
-
- return c;
-}
+/* An arbitrary initial parameter */
+#define JHASH_INITVAL 0xdeadbeef

+extern u32 jhash(const void *key, u32 length, u32 initval);
+extern u32 jhash2(const u32 *k, u32 length, u32 initval);

-/* A special ultra-optimized versions that knows they are hashing exactly
- * 3, 2 or 1 word(s).
- *
- * NOTE: In particular the "c += length; __jhash_mix(a,b,c);" normally
- * done at the end is not done here.
- */
+/* jhash_3words - hash exactly 3, 2 or 1 word(s) */
static inline u32 jhash_3words(u32 a, u32 b, u32 c, u32 initval)
{
- a += JHASH_GOLDEN_RATIO;
- b += JHASH_GOLDEN_RATIO;
+ a += JHASH_INITVAL;
+ b += JHASH_INITVAL;
c += initval;

- __jhash_mix(a, b, c);
+ __jhash_final(a, b, c);

return c;
}
diff --git a/lib/Makefile b/lib/Makefile
index e6a3763..a1a4932 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -10,7 +10,7 @@ endif
lib-y := ctype.o string.o vsprintf.o cmdline.o \
rbtree.o radix-tree.o dump_stack.o \
idr.o int_sqrt.o extable.o prio_tree.o \
- sha1.o irq_regs.o reciprocal_div.o argv_split.o \
+ jhash.o sha1.o irq_regs.o reciprocal_div.o argv_split.o \
proportions.o prio_heap.o ratelimit.o show_mem.o \
is_single_threaded.o plist.o decompress.o flex_array.o

diff --git a/lib/jhash.c b/lib/jhash.c
new file mode 100644
index 0000000..e0c8d11
--- /dev/null
+++ b/lib/jhash.c
@@ -0,0 +1,127 @@
+/* jhash.c: Jenkins hash support.
+ *
+ * Copyright (C) 2006. Bob Jenkins ([email protected])
+ *
+ * http://burtleburtle.net/bob/hash/
+ *
+ * These are the credits from Bob's sources:
+ *
+ * lookup3.c, by Bob Jenkins, May 2006, Public Domain.
+ *
+ * These are functions for producing 32-bit hashes for hash table lookup.
+ * hashword(), hashlittle(), hashlittle2(), hashbig(), mix(), and final()
+ * are externally useful functions. Routines to test the hash are included
+ * if SELF_TEST is defined. You can use this free for any purpose. It's in
+ * the public domain. It has no warranty.
+ *
+ * Copyright (C) 2009-2010 Jozsef Kadlecsik ([email protected])
+ *
+ * I've modified Bob's hash to be useful in the Linux kernel, and
+ * any bugs present are my fault.
+ * Jozsef
+ */
+#include <linux/bitops.h>
+#include <linux/module.h>
+#include <linux/unaligned/packed_struct.h>
+#include <linux/jhash.h>
+
+/* __jhash_mix -- mix 3 32-bit values reversibly. */
+#define __jhash_mix(a, b, c) \
+{ \
+ a -= c; a ^= rol32(c, 4); c += b; \
+ b -= a; b ^= rol32(a, 6); a += c; \
+ c -= b; c ^= rol32(b, 8); b += a; \
+ a -= c; a ^= rol32(c, 16); c += b; \
+ b -= a; b ^= rol32(a, 19); a += c; \
+ c -= b; c ^= rol32(b, 4); b += a; \
+}
+
+/* jhash - hash an arbitrary key
+ * @k: sequence of bytes as key
+ * @length: the length of the key
+ * @initval: the previous hash, or an arbitray value
+ *
+ * The generic version, hashes an arbitrary sequence of bytes.
+ * No alignment or length assumptions are made about the input key.
+ *
+ * Returns the hash value of the key. The result depends on endianness.
+ */
+u32 jhash(const void *key, u32 length, u32 initval)
+{
+ u32 a, b, c;
+ const u8 *k = key;
+
+ /* Set up the internal state */
+ a = b = c = JHASH_INITVAL + length + initval;
+
+ /* All but the last block: affect some 32 bits of (a,b,c) */
+ while (length > 12) {
+ a += __get_unaligned_cpu32(k);
+ b += __get_unaligned_cpu32(k + 4);
+ c += __get_unaligned_cpu32(k + 8);
+ __jhash_mix(a, b, c);
+ length -= 12;
+ k += 12;
+ }
+ /* Last block: affect all 32 bits of (c) */
+ /* All the case statements fall through */
+ switch (length) {
+ case 12: c += (u32)k[11]<<24;
+ case 11: c += (u32)k[10]<<16;
+ case 10: c += (u32)k[9]<<8;
+ case 9: c += k[8];
+ case 8: b += (u32)k[7]<<24;
+ case 7: b += (u32)k[6]<<16;
+ case 6: b += (u32)k[5]<<8;
+ case 5: b += k[4];
+ case 4: a += (u32)k[3]<<24;
+ case 3: a += (u32)k[2]<<16;
+ case 2: a += (u32)k[1]<<8;
+ case 1: a += k[0];
+ __jhash_final(a, b, c);
+ case 0: /* Nothing left to add */
+ break;
+ }
+
+ return c;
+}
+EXPORT_SYMBOL(jhash);
+
+/* jhash2 - hash an array of u32's
+ * @k: the key which must be an array of u32's
+ * @length: the number of u32's in the key
+ * @initval: the previous hash, or an arbitray value
+ *
+ * Returns the hash value of the key.
+ */
+u32 jhash2(const u32 *k, u32 length, u32 initval)
+{
+ u32 a, b, c;
+
+ /* Set up the internal state */
+ a = b = c = JHASH_INITVAL + (length<<2) + initval;
+
+ /* Handle most of the key */
+ while (length > 3) {
+ a += k[0];
+ b += k[1];
+ c += k[2];
+ __jhash_mix(a, b, c);
+ length -= 3;
+ k += 3;
+ }
+
+ /* Handle the last 3 u32's: all the case statements fall through */
+ switch (length) {
+ case 3: c += k[2];
+ case 2: b += k[1];
+ case 1: a += k[0];
+ __jhash_final(a, b, c);
+ case 0: /* Nothing left to add */
+ break;
+ }
+
+ return c;
+}
+EXPORT_SYMBOL(jhash2);
+
--
1.7.0.4

Best regards,
Jozsef
-
E-mail : [email protected], [email protected]
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
H-1525 Budapest 114, POB. 49, Hungary

2010-11-29 23:27:36

by Rusty Russell

[permalink] [raw]
Subject: Re: [PATCH 2/2] The new jhash implementation

On Thu, 25 Nov 2010 11:45:08 pm Jozsef Kadlecsik wrote:
> The current jhash.h implements the lookup2() hash function by Bob Jenkins.
> However, lookup2() is outdated as Bob wrote a new hash function called
> lookup3(). The new hash function

Oops, we discussed this ages ago, looks like it fell through the cracks.

Thanks Jozsef!

> +extern u32 jhash(const void *key, u32 length, u32 initval);
> +extern u32 jhash2(const u32 *k, u32 length, u32 initval);
> +extern u32 jhash_3words(u32 a, u32 b, u32 c, u32 initval);

This last one can be marked __attribute__((const)) for a little
extra compiler help, too.

Cheers,
Rusty.

2010-12-03 12:39:08

by Jozsef Kadlecsik

[permalink] [raw]
Subject: [PATCH 1/2] Remove calls to jhash internals

Because the jhash implementation changes, replace the calls to the jhash
internal macros with calls to the jhash functions.
---
net/ipv6/inet6_connection_sock.c | 18 ++++++++----------
net/ipv6/reassembly.c | 36 ++++++++++++++++--------------------
2 files changed, 24 insertions(+), 30 deletions(-)

diff --git a/net/ipv6/inet6_connection_sock.c b/net/ipv6/inet6_connection_sock.c
index 8a16280..861d252 100644
--- a/net/ipv6/inet6_connection_sock.c
+++ b/net/ipv6/inet6_connection_sock.c
@@ -60,18 +60,16 @@ EXPORT_SYMBOL_GPL(inet6_csk_bind_conflict);
static u32 inet6_synq_hash(const struct in6_addr *raddr, const __be16 rport,
const u32 rnd, const u16 synq_hsize)
{
- u32 a = (__force u32)raddr->s6_addr32[0];
- u32 b = (__force u32)raddr->s6_addr32[1];
- u32 c = (__force u32)raddr->s6_addr32[2];
-
- a += JHASH_GOLDEN_RATIO;
- b += JHASH_GOLDEN_RATIO;
- c += rnd;
- __jhash_mix(a, b, c);
-
- a += (__force u32)raddr->s6_addr32[3];
- b += (__force u32)rport;
- __jhash_mix(a, b, c);
+ u32 c;
+
+ c = jhash_3words((__force u32)raddr->s6_addr32[0],
+ (__force u32)raddr->s6_addr32[1],
+ (__force u32)raddr->s6_addr32[2],
+ rnd);
+
+ c = jhash_2words((__force u32)raddr->s6_addr32[3],
+ (__force u32)rport,
+ c);

return c & (synq_hsize - 1);
}
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index c7ba314..5e57490 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -104,26 +104,22 @@ static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *prev,
unsigned int inet6_hash_frag(__be32 id, const struct in6_addr *saddr,
const struct in6_addr *daddr, u32 rnd)
{
- u32 a, b, c;
-
- a = (__force u32)saddr->s6_addr32[0];
- b = (__force u32)saddr->s6_addr32[1];
- c = (__force u32)saddr->s6_addr32[2];
-
- a += JHASH_GOLDEN_RATIO;
- b += JHASH_GOLDEN_RATIO;
- c += rnd;
- __jhash_mix(a, b, c);
-
- a += (__force u32)saddr->s6_addr32[3];
- b += (__force u32)daddr->s6_addr32[0];
- c += (__force u32)daddr->s6_addr32[1];
- __jhash_mix(a, b, c);
-
- a += (__force u32)daddr->s6_addr32[2];
- b += (__force u32)daddr->s6_addr32[3];
- c += (__force u32)id;
- __jhash_mix(a, b, c);
+ u32 c;
+
+ c = jhash_3words((__force u32)saddr->s6_addr32[0],
+ (__force u32)saddr->s6_addr32[1],
+ (__force u32)saddr->s6_addr32[2],
+ rnd);
+
+ c = jhash_3words((__force u32)saddr->s6_addr32[3],
+ (__force u32)daddr->s6_addr32[0],
+ (__force u32)daddr->s6_addr32[1],
+ c);
+
+ c = jhash_3words((__force u32)daddr->s6_addr32[2],
+ (__force u32)daddr->s6_addr32[3],
+ (__force u32)id,
+ c);

return c & (INETFRAGS_HASHSZ - 1);
}
--
1.7.0.4

2010-12-03 12:39:09

by Jozsef Kadlecsik

[permalink] [raw]
Subject: [PATCH 0/2] New jhash function

Hi,

The current jhash.h implements the lookup2() hash function by Bob Jenkins.
However, lookup2() is outdated as Bob wrote a new hash function called
lookup3(). There is a longer comparison of those two and other hash
functions at http://burtleburtle.net/bob/hash/doobs.html.

Please consider applying the following patches.

Speed

I wrote a small benchmark program to compare jhash2 and jhash3 (you can
download it from http://www.kfki.hu/~kadlec/sw/netfilter/jhash23.tgz).
On a machine with Core2 Duo and compiled with -O2, the ratio of the execution
time for the byte variants of the hash functions were (in parens the different
key sizes):

jhash2/jhash3 (4 bytes): 1.587518
jhash2/jhash3 (8 bytes): 1.282824
jhash2/jhash3 (12 bytes): 2.349628
jhash2/jhash3 (16 bytes): 1.466988
jhash2/jhash3 (32 bytes): 1.501063
jhash2/jhash3 (64 bytes): 1.587527
jhash2/jhash3 (128 bytes): 1.628037
jhash2/jhash3 (256 bytes): 1.648318

Similarly, for the word variants

jhash2/jhash3 (1 words): 1.300904
jhash2/jhash3 (2 words): 1.316108
jhash2/jhash3 (3 words): 2.249708
jhash2/jhash3 (4 words): 1.460192
jhash2/jhash3 (8 words): 1.501438
jhash2/jhash3 (16 words): 1.558039
jhash2/jhash3 (32 words): 1.520082
jhash2/jhash3 (64 words): 1.528347

Sizes

When compiling just the byte variants the sizes are

text data bss dec hex filename
661 0 0 661 295 jhashb2.o
441 0 0 441 1b9 jhashb3.o

and for the word variants

text data bss dec hex filename
354 0 0 354 162 jhashw2.o
248 0 0 248 f8 jhashw3.o

I compiled the kernel with "allyesconfig", in three variants: jhash2, jhash3 and
jhash3 un-inlined

text data bss dec hex filename
69297477 11282083 35904032 116483592 6f16608 vmlinux.jhash2
69293829 11282083 35903728 116479640 6f15698 vmlinux.jhash3
69288578 11282083 35902336 116472997 6f13ca5 vmlinux.jhash3-uninlined

With jhash3 we can gain 3.6k and un-inlining shrinks the code with an additional
5.2k. In the patch I left jhash(3) inlined.

Uniformity

According to Bob Jenkins, lookup3() mixes better than lookup2(): it passes
the check that every input bit changes every output bit 50% of the time, while
lookup2() failed it. In order to verify it I added jhash3 to the cttest program,
which tests hash functions with (artifical, worst case) netfilter conntrack data
and calculates the statistics (chi-square, probability of longest bucket, etc).
You can find the program and the results here:
http://www.kfki.hu/~kadlec/sw/netfilter/ct3/ - to sum up, lookup3() produces
uniform key values and no weakness could be spotted.

Many thanks to Eric Dumazet for his thorough reviewing.

Dave applied the first patch to net-next-2.6.

Jozsef Kadlecsik (2):
Remove calls to jhash internals
The new jhash implementation

include/linux/jhash.h | 183 ++++++++++++++++++++++----------------
net/ipv6/inet6_connection_sock.c | 18 ++--
net/ipv6/reassembly.c | 36 ++++----
3 files changed, 129 insertions(+), 108 deletions(-)

2010-12-03 12:39:07

by Jozsef Kadlecsik

[permalink] [raw]
Subject: [PATCH 2/2] The new jhash implementation

The current jhash.h implements the lookup2() hash function by Bob Jenkins.
However, lookup2() is outdated as Bob wrote a new hash function called
lookup3(). The patch replaces the lookup2() implementation of the 'jhash*'
functions with that of lookup3().

You can read a longer comparison of the two and other hash functions at
http://burtleburtle.net/bob/hash/doobs.html.
---
include/linux/jhash.h | 183 ++++++++++++++++++++++++++++---------------------
1 files changed, 105 insertions(+), 78 deletions(-)

diff --git a/include/linux/jhash.h b/include/linux/jhash.h
index ced1159..47cb09e 100644
--- a/include/linux/jhash.h
+++ b/include/linux/jhash.h
@@ -3,129 +3,156 @@

/* jhash.h: Jenkins hash support.
*
- * Copyright (C) 1996 Bob Jenkins ([email protected])
+ * Copyright (C) 2006. Bob Jenkins ([email protected])
*
* http://burtleburtle.net/bob/hash/
*
* These are the credits from Bob's sources:
*
- * lookup2.c, by Bob Jenkins, December 1996, Public Domain.
- * hash(), hash2(), hash3, and mix() are externally useful functions.
- * Routines to test the hash are included if SELF_TEST is defined.
- * You can use this free for any purpose. It has no warranty.
+ * lookup3.c, by Bob Jenkins, May 2006, Public Domain.
*
- * Copyright (C) 2003 David S. Miller ([email protected])
+ * These are functions for producing 32-bit hashes for hash table lookup.
+ * hashword(), hashlittle(), hashlittle2(), hashbig(), mix(), and final()
+ * are externally useful functions. Routines to test the hash are included
+ * if SELF_TEST is defined. You can use this free for any purpose. It's in
+ * the public domain. It has no warranty.
+ *
+ * Copyright (C) 2009-2010 Jozsef Kadlecsik ([email protected])
*
* I've modified Bob's hash to be useful in the Linux kernel, and
- * any bugs present are surely my fault. -DaveM
+ * any bugs present are my fault.
+ * Jozsef
*/
+#include <linux/bitops.h>
+#include <linux/unaligned/packed_struct.h>
+
+/* Best hash sizes are of power of two */
+#define jhash_size(n) ((u32)1<<(n))
+/* Mask the hash value, i.e (value & jhash_mask(n)) instead of (value % n) */
+#define jhash_mask(n) (jhash_size(n)-1)
+
+/* __jhash_mix -- mix 3 32-bit values reversibly. */
+#define __jhash_mix(a, b, c) \
+{ \
+ a -= c; a ^= rol32(c, 4); c += b; \
+ b -= a; b ^= rol32(a, 6); a += c; \
+ c -= b; c ^= rol32(b, 8); b += a; \
+ a -= c; a ^= rol32(c, 16); c += b; \
+ b -= a; b ^= rol32(a, 19); a += c; \
+ c -= b; c ^= rol32(b, 4); b += a; \
+}

-/* NOTE: Arguments are modified. */
-#define __jhash_mix(a, b, c) \
-{ \
- a -= b; a -= c; a ^= (c>>13); \
- b -= c; b -= a; b ^= (a<<8); \
- c -= a; c -= b; c ^= (b>>13); \
- a -= b; a -= c; a ^= (c>>12); \
- b -= c; b -= a; b ^= (a<<16); \
- c -= a; c -= b; c ^= (b>>5); \
- a -= b; a -= c; a ^= (c>>3); \
- b -= c; b -= a; b ^= (a<<10); \
- c -= a; c -= b; c ^= (b>>15); \
+/* __jhash_final - final mixing of 3 32-bit values (a,b,c) into c */
+#define __jhash_final(a, b, c) \
+{ \
+ c ^= b; c -= rol32(b, 14); \
+ a ^= c; a -= rol32(c, 11); \
+ b ^= a; b -= rol32(a, 25); \
+ c ^= b; c -= rol32(b, 16); \
+ a ^= c; a -= rol32(c, 4); \
+ b ^= a; b -= rol32(a, 14); \
+ c ^= b; c -= rol32(b, 24); \
}

-/* The golden ration: an arbitrary value */
-#define JHASH_GOLDEN_RATIO 0x9e3779b9
+/* An arbitrary initial parameter */
+#define JHASH_INITVAL 0xdeadbeef

-/* The most generic version, hashes an arbitrary sequence
- * of bytes. No alignment or length assumptions are made about
- * the input key.
+/* jhash - hash an arbitrary key
+ * @k: sequence of bytes as key
+ * @length: the length of the key
+ * @initval: the previous hash, or an arbitray value
+ *
+ * The generic version, hashes an arbitrary sequence of bytes.
+ * No alignment or length assumptions are made about the input key.
+ *
+ * Returns the hash value of the key. The result depends on endianness.
*/
static inline u32 jhash(const void *key, u32 length, u32 initval)
{
- u32 a, b, c, len;
+ u32 a, b, c;
const u8 *k = key;

- len = length;
- a = b = JHASH_GOLDEN_RATIO;
- c = initval;
-
- while (len >= 12) {
- a += (k[0] +((u32)k[1]<<8) +((u32)k[2]<<16) +((u32)k[3]<<24));
- b += (k[4] +((u32)k[5]<<8) +((u32)k[6]<<16) +((u32)k[7]<<24));
- c += (k[8] +((u32)k[9]<<8) +((u32)k[10]<<16)+((u32)k[11]<<24));
-
- __jhash_mix(a,b,c);
+ /* Set up the internal state */
+ a = b = c = JHASH_INITVAL + length + initval;

+ /* All but the last block: affect some 32 bits of (a,b,c) */
+ while (length > 12) {
+ a += __get_unaligned_cpu32(k);
+ b += __get_unaligned_cpu32(k + 4);
+ c += __get_unaligned_cpu32(k + 8);
+ __jhash_mix(a, b, c);
+ length -= 12;
k += 12;
- len -= 12;
}
-
- c += length;
- switch (len) {
- case 11: c += ((u32)k[10]<<24);
- case 10: c += ((u32)k[9]<<16);
- case 9 : c += ((u32)k[8]<<8);
- case 8 : b += ((u32)k[7]<<24);
- case 7 : b += ((u32)k[6]<<16);
- case 6 : b += ((u32)k[5]<<8);
- case 5 : b += k[4];
- case 4 : a += ((u32)k[3]<<24);
- case 3 : a += ((u32)k[2]<<16);
- case 2 : a += ((u32)k[1]<<8);
- case 1 : a += k[0];
- };
-
- __jhash_mix(a,b,c);
+ /* Last block: affect all 32 bits of (c) */
+ /* All the case statements fall through */
+ switch (length) {
+ case 12: c += (u32)k[11]<<24;
+ case 11: c += (u32)k[10]<<16;
+ case 10: c += (u32)k[9]<<8;
+ case 9: c += k[8];
+ case 8: b += (u32)k[7]<<24;
+ case 7: b += (u32)k[6]<<16;
+ case 6: b += (u32)k[5]<<8;
+ case 5: b += k[4];
+ case 4: a += (u32)k[3]<<24;
+ case 3: a += (u32)k[2]<<16;
+ case 2: a += (u32)k[1]<<8;
+ case 1: a += k[0];
+ __jhash_final(a, b, c);
+ case 0: /* Nothing left to add */
+ break;
+ }

return c;
}

-/* A special optimized version that handles 1 or more of u32s.
- * The length parameter here is the number of u32s in the key.
+/* jhash2 - hash an array of u32's
+ * @k: the key which must be an array of u32's
+ * @length: the number of u32's in the key
+ * @initval: the previous hash, or an arbitray value
+ *
+ * Returns the hash value of the key.
*/
static inline u32 jhash2(const u32 *k, u32 length, u32 initval)
{
- u32 a, b, c, len;
+ u32 a, b, c;

- a = b = JHASH_GOLDEN_RATIO;
- c = initval;
- len = length;
+ /* Set up the internal state */
+ a = b = c = JHASH_INITVAL + (length<<2) + initval;

- while (len >= 3) {
+ /* Handle most of the key */
+ while (length > 3) {
a += k[0];
b += k[1];
c += k[2];
__jhash_mix(a, b, c);
- k += 3; len -= 3;
+ length -= 3;
+ k += 3;
}

- c += length * 4;
-
- switch (len) {
- case 2 : b += k[1];
- case 1 : a += k[0];
- };
-
- __jhash_mix(a,b,c);
+ /* Handle the last 3 u32's: all the case statements fall through */
+ switch (length) {
+ case 3: c += k[2];
+ case 2: b += k[1];
+ case 1: a += k[0];
+ __jhash_final(a, b, c);
+ case 0: /* Nothing left to add */
+ break;
+ }

return c;
}


-/* A special ultra-optimized versions that knows they are hashing exactly
- * 3, 2 or 1 word(s).
- *
- * NOTE: In particular the "c += length; __jhash_mix(a,b,c);" normally
- * done at the end is not done here.
- */
+/* jhash_3words - hash exactly 3, 2 or 1 word(s) */
static inline u32 jhash_3words(u32 a, u32 b, u32 c, u32 initval)
{
- a += JHASH_GOLDEN_RATIO;
- b += JHASH_GOLDEN_RATIO;
+ a += JHASH_INITVAL;
+ b += JHASH_INITVAL;
c += initval;

- __jhash_mix(a, b, c);
+ __jhash_final(a, b, c);

return c;
}
--
1.7.0.4

2010-12-08 17:09:28

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 0/2] New jhash function

From: Jozsef Kadlecsik <[email protected]>
Date: Fri, 3 Dec 2010 13:38:59 +0100

> The current jhash.h implements the lookup2() hash function by Bob Jenkins.
> However, lookup2() is outdated as Bob wrote a new hash function called
> lookup3(). There is a longer comparison of those two and other hash
> functions at http://burtleburtle.net/bob/hash/doobs.html.
>
> Please consider applying the following patches.

Patch #1 is already in the net-next-2.6 tree, and as long as there are
no major objections to the general crowd (including Rusty et al.) I am
happy to put patch #2 into my tree as well.

Rusty, does the current version of patch #2 look good to you?

Thanks!

2010-12-08 21:23:53

by Rusty Russell

[permalink] [raw]
Subject: Re: [PATCH 0/2] New jhash function

On Thu, 9 Dec 2010 03:39:54 am David Miller wrote:
> From: Jozsef Kadlecsik <[email protected]>
> Date: Fri, 3 Dec 2010 13:38:59 +0100
>
> > The current jhash.h implements the lookup2() hash function by Bob Jenkins.
> > However, lookup2() is outdated as Bob wrote a new hash function called
> > lookup3(). There is a longer comparison of those two and other hash
> > functions at http://burtleburtle.net/bob/hash/doobs.html.
> >
> > Please consider applying the following patches.
>
> Patch #1 is already in the net-next-2.6 tree, and as long as there are
> no major objections to the general crowd (including Rusty et al.) I am
> happy to put patch #2 into my tree as well.
>
> Rusty, does the current version of patch #2 look good to you?

Yes, 2/2 good. Thanks Jozsef!

Acked-by: Rusty Russell <[email protected]>

2010-12-10 04:19:01

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 0/2] New jhash function

From: Rusty Russell <[email protected]>
Date: Thu, 9 Dec 2010 07:53:45 +1030

> On Thu, 9 Dec 2010 03:39:54 am David Miller wrote:
>> From: Jozsef Kadlecsik <[email protected]>
>> Date: Fri, 3 Dec 2010 13:38:59 +0100
>>
>> > The current jhash.h implements the lookup2() hash function by Bob Jenkins.
>> > However, lookup2() is outdated as Bob wrote a new hash function called
>> > lookup3(). There is a longer comparison of those two and other hash
>> > functions at http://burtleburtle.net/bob/hash/doobs.html.
>> >
>> > Please consider applying the following patches.
>>
>> Patch #1 is already in the net-next-2.6 tree, and as long as there are
>> no major objections to the general crowd (including Rusty et al.) I am
>> happy to put patch #2 into my tree as well.
>>
>> Rusty, does the current version of patch #2 look good to you?
>
> Yes, 2/2 good. Thanks Jozsef!
>
> Acked-by: Rusty Russell <[email protected]>

Applied, thanks everyone.