After distilling with hpa's help the results of some weeks-old
numerological experiments^W^Wnumber crunching, I've devised a patch
here for -rmap to make the waitqueue hashing somewhat more palatable
for SPARC and several others.
This patch uses some operator-sparse Fibonacci hashing primes in order
to allow shift/add implementations of the hash function used for hashed
waitqueues.
Dan, Dave, could you take a look here and please comment?
Randy, any chance you could benchmark -rmap with this on top for
comparison against standard -rmap to ensure there is no regression?
Thanks,
Bill
P.S.: Dave: Sorry I took so long to get this thing out, but here it is.
# This is a BitKeeper generated patch for the following project:
# Project Name: Long-term Linux VM development
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
# ChangeSet 1.199 ->
# include/linux/mm_inline.h 1.14 -> 1.15
# mm/filemap.c 1.52 -> 1.53
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 02/02/17 [email protected] 1.200
# Use operator-sparse Fibonacci hashing primes for the hashed waitqueues so as to reduce the cost of
# the hashing computation on several major RISC architectures and most 64-bit machines.
#
# Specifically the search for operator-sparse primes was documented around the definitions of the
# GOLDEN_RATIO_PRIME values, where the number of terms in the sum of powers of 2 was chosen in
# advance and the numbers generated this way were sorted by their continued fraction expansion
# and then filtered for primality.
#
# In turn the definition of page_waitqueue() also needed to be altered so that for 64-bit machines,
# whose compilers cannot perform the transformation of multiplication by a constant to shifts and
# adds, the transformation is explcitly coded, and documentation is included for it describing how
# the shifts and adds correspond to the signed sum of powers of two representation of the golden
# ratio prime.
# --------------------------------------------
#
diff -Nru a/include/linux/mm_inline.h b/include/linux/mm_inline.h
--- a/include/linux/mm_inline.h Sun Feb 17 00:49:26 2002
+++ b/include/linux/mm_inline.h Sun Feb 17 00:49:26 2002
@@ -9,12 +9,43 @@
* Chuck Lever verified the effectiveness of this technique for several
* hash tables in his paper documenting the benchmark results:
* http://www.citi.umich.edu/techreports/reports/citi-tr-00-1.pdf
+ *
+ * These prime numbers were determined by searching all numbers
+ * composed of sums of seven terms of the form 2^n with n decreasing
+ * with the index of the summand and sorting by the continued fraction
+ * expansion of p/2^BITS_PER_LONG, then filtering for primality.
+ * Verifying that the confidence tests on the chi^2 statistics passed
+ * is necessary because this process can produce "bad" factors, but on
+ * the other hand it often produces excellent factors, so it's a very
+ * useful process for generating hash functions.
*/
+
#if BITS_PER_LONG == 32
-#define GOLDEN_RATIO_PRIME 2654435761UL
+/*
+ * 2654404609 / 2^32 has the continued fraction expansion
+ * 0,1,1,1,1,1,1,1,1,1,1,1,1,18,7,1,3,7,18,1,1,1,1,1,1,1,1,1,1,2,0
+ * which is very close to phi.
+ * It is of the form 2^31 + 2^29 - 2^25 + 2^22 - 2^19 - 2^16 + 1
+ * which makes it suitable for compiler optimization to shift/add
+ * implementations on machines where multiplication is a slow operation.
+ * gcc is successfully able to optimize this on 32-bit SPARC and MIPS.
+ */
+#define GOLDEN_RATIO_PRIME 0x9e370001UL
#elif BITS_PER_LONG == 64
-#define GOLDEN_RATIO_PRIME 11400714819323198549UL
+/*
+ * 11400862456688148481 / 2^64 has the continued fraction expansion
+ * 0,1,1,1,1,1,1,1,1,1,1,1,2,1,14,1,1048579,15,1,2,1,1,3,9,1,1,8,3,1,7,
+ * 1,1,9,1,2,1,1,1,1,2,0
+ * which is very close to phi = (sqrt(5)-1)/2 = 0,1,1,1,....
+ *
+ * It is of the form 2^63 + 2^61 - 2^57 + 2^54 - 2^51 - 2^18 + 1
+ * which makes it suitable for shift/add implementations of the hash
+ * function. 64-bit architectures typically have slow multiplies (or
+ * no hardware multiply) and also gcc is unable to optimize 64-bit
+ * multiplies for these bit patterns so explicit expansion is used.
+ */
+#define GOLDEN_RATIO_PRIME 0x9e37fffffffc0001UL
#else
#error Define GOLDEN_RATIO_PRIME in mm_inline.h for your wordsize.
diff -Nru a/mm/filemap.c b/mm/filemap.c
--- a/mm/filemap.c Sun Feb 17 00:49:26 2002
+++ b/mm/filemap.c Sun Feb 17 00:49:26 2002
@@ -787,6 +787,9 @@
* collisions. This cost is great enough that effective hashing
* is necessary to maintain performance.
*/
+
+#if BITS_PER_LONG == 32
+
static inline wait_queue_head_t *page_waitqueue(struct page *page)
{
const zone_t *zone = page_zone(page);
@@ -798,6 +801,50 @@
return &wait[hash];
}
+
+#elif BITS_PER_LONG == 64
+
+static inline wait_queue_head_t *page_waitqueue(struct page *page)
+{
+ const zone_t *zone = page_zone(page);
+ wait_queue_head_t *wait = zone->wait_table;
+ unsigned long hash = (unsigned long)page;
+ unsigned long n;
+
+ /*
+ * The bit-sparse GOLDEN_RATIO_PRIME is a sum of powers of 2
+ * with varying signs: 2^63 + 2^61 - 2^57 + 2^54 - 2^51 - 2^18 + 1
+ * The shifts in the code below correspond to the differences
+ * in the exponents above.
+ * The primary reason why the work of expanding the multiplication
+ * this way is not given to the compiler is because 64-bit code
+ * generation does not seem to be capable of doing it. So the code
+ * here manually expands it.
+ * The differences are used in order to both reduce the number of
+ * variables used and also to reduce the size of the immediates
+ * needed for the shift instructions, whose precision is limited
+ * on some architectures.
+ */
+
+ n = hash;
+ n <<= 18;
+ hash -= n;
+ n <<= 33;
+ hash -= n;
+ n <<= 3;
+ hash += n;
+ n <<= 3;
+ hash -= n;
+ n <<= 4;
+ hash += n;
+ n <<= 2;
+ hash += n;
+
+ hash >>= zone->wait_table_shift;
+ return &wait[hash];
+}
+
+#endif /* BITS_PER_LONG == 64 */
/*
On February 17, 2002 10:01 am, William Lee Irwin III wrote:
> After distilling with hpa's help the results of some weeks-old
> numerological experiments^W^Wnumber crunching, I've devised a patch
> here for -rmap to make the waitqueue hashing somewhat more palatable
> for SPARC and several others.
>
> This patch uses some operator-sparse Fibonacci hashing primes in order
> to allow shift/add implementations of the hash function used for hashed
> waitqueues.
>
> Dan, Dave, could you take a look here and please comment?
Could you explain in very simple terms, suitable for Aunt Tillie (ok, not
*that* simple) how the continued fraction works, how it's notated, and how
the terms of the expansion relate to good performance as a hash?
--
Daniel
> Randy, any chance you could benchmark -rmap with this on top for
> comparison against standard -rmap to ensure there is no regression?
>
>
> Thanks,
> Bill
>
> P.S.: Dave: Sorry I took so long to get this thing out, but here it is.
>
>
> # This is a BitKeeper generated patch for the following project:
> # Project Name: Long-term Linux VM development
> # This patch format is intended for GNU patch command version 2.5 or higher.
> # This patch includes the following deltas:
> # ChangeSet 1.199 ->
> # include/linux/mm_inline.h 1.14 -> 1.15
> # mm/filemap.c 1.52 -> 1.53
> #
> # The following is the BitKeeper ChangeSet Log
> # --------------------------------------------
> # 02/02/17 [email protected] 1.200
> # Use operator-sparse Fibonacci hashing primes for the hashed waitqueues so as to reduce the cost of
> # the hashing computation on several major RISC architectures and most 64-bit machines.
> #
> # Specifically the search for operator-sparse primes was documented around the definitions of the
> # GOLDEN_RATIO_PRIME values, where the number of terms in the sum of powers of 2 was chosen in
> # advance and the numbers generated this way were sorted by their continued fraction expansion
> # and then filtered for primality.
> #
> # In turn the definition of page_waitqueue() also needed to be altered so that for 64-bit machines,
> # whose compilers cannot perform the transformation of multiplication by a constant to shifts and
> # adds, the transformation is explcitly coded, and documentation is included for it describing how
> # the shifts and adds correspond to the signed sum of powers of two representation of the golden
> # ratio prime.
> # --------------------------------------------
> #
> diff -Nru a/include/linux/mm_inline.h b/include/linux/mm_inline.h
> --- a/include/linux/mm_inline.h Sun Feb 17 00:49:26 2002
> +++ b/include/linux/mm_inline.h Sun Feb 17 00:49:26 2002
> @@ -9,12 +9,43 @@
> * Chuck Lever verified the effectiveness of this technique for several
> * hash tables in his paper documenting the benchmark results:
> * http://www.citi.umich.edu/techreports/reports/citi-tr-00-1.pdf
> + *
> + * These prime numbers were determined by searching all numbers
> + * composed of sums of seven terms of the form 2^n with n decreasing
> + * with the index of the summand and sorting by the continued fraction
> + * expansion of p/2^BITS_PER_LONG, then filtering for primality.
> + * Verifying that the confidence tests on the chi^2 statistics passed
> + * is necessary because this process can produce "bad" factors, but on
> + * the other hand it often produces excellent factors, so it's a very
> + * useful process for generating hash functions.
> */
> +
> #if BITS_PER_LONG == 32
> -#define GOLDEN_RATIO_PRIME 2654435761UL
> +/*
> + * 2654404609 / 2^32 has the continued fraction expansion
> + * 0,1,1,1,1,1,1,1,1,1,1,1,1,18,7,1,3,7,18,1,1,1,1,1,1,1,1,1,1,2,0
> + * which is very close to phi.
> + * It is of the form 2^31 + 2^29 - 2^25 + 2^22 - 2^19 - 2^16 + 1
> + * which makes it suitable for compiler optimization to shift/add
> + * implementations on machines where multiplication is a slow operation.
> + * gcc is successfully able to optimize this on 32-bit SPARC and MIPS.
> + */
> +#define GOLDEN_RATIO_PRIME 0x9e370001UL
>
> #elif BITS_PER_LONG == 64
> -#define GOLDEN_RATIO_PRIME 11400714819323198549UL
> +/*
> + * 11400862456688148481 / 2^64 has the continued fraction expansion
> + * 0,1,1,1,1,1,1,1,1,1,1,1,2,1,14,1,1048579,15,1,2,1,1,3,9,1,1,8,3,1,7,
> + * 1,1,9,1,2,1,1,1,1,2,0
> + * which is very close to phi = (sqrt(5)-1)/2 = 0,1,1,1,....
> + *
> + * It is of the form 2^63 + 2^61 - 2^57 + 2^54 - 2^51 - 2^18 + 1
> + * which makes it suitable for shift/add implementations of the hash
> + * function. 64-bit architectures typically have slow multiplies (or
> + * no hardware multiply) and also gcc is unable to optimize 64-bit
> + * multiplies for these bit patterns so explicit expansion is used.
> + */
> +#define GOLDEN_RATIO_PRIME 0x9e37fffffffc0001UL
>
> #else
> #error Define GOLDEN_RATIO_PRIME in mm_inline.h for your wordsize.
> diff -Nru a/mm/filemap.c b/mm/filemap.c
> --- a/mm/filemap.c Sun Feb 17 00:49:26 2002
> +++ b/mm/filemap.c Sun Feb 17 00:49:26 2002
> @@ -787,6 +787,9 @@
> * collisions. This cost is great enough that effective hashing
> * is necessary to maintain performance.
> */
> +
> +#if BITS_PER_LONG == 32
> +
> static inline wait_queue_head_t *page_waitqueue(struct page *page)
> {
> const zone_t *zone = page_zone(page);
> @@ -798,6 +801,50 @@
>
> return &wait[hash];
> }
> +
> +#elif BITS_PER_LONG == 64
> +
> +static inline wait_queue_head_t *page_waitqueue(struct page *page)
> +{
> + const zone_t *zone = page_zone(page);
> + wait_queue_head_t *wait = zone->wait_table;
> + unsigned long hash = (unsigned long)page;
> + unsigned long n;
> +
> + /*
> + * The bit-sparse GOLDEN_RATIO_PRIME is a sum of powers of 2
> + * with varying signs: 2^63 + 2^61 - 2^57 + 2^54 - 2^51 - 2^18 + 1
> + * The shifts in the code below correspond to the differences
> + * in the exponents above.
> + * The primary reason why the work of expanding the multiplication
> + * this way is not given to the compiler is because 64-bit code
> + * generation does not seem to be capable of doing it. So the code
> + * here manually expands it.
> + * The differences are used in order to both reduce the number of
> + * variables used and also to reduce the size of the immediates
> + * needed for the shift instructions, whose precision is limited
> + * on some architectures.
> + */
> +
> + n = hash;
> + n <<= 18;
> + hash -= n;
> + n <<= 33;
> + hash -= n;
> + n <<= 3;
> + hash += n;
> + n <<= 3;
> + hash -= n;
> + n <<= 4;
> + hash += n;
> + n <<= 2;
> + hash += n;
> +
> + hash >>= zone->wait_table_shift;
> + return &wait[hash];
> +}
> +
> +#endif /* BITS_PER_LONG == 64 */
>
>
> /*
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
>
--
Daniel
On February 17, 2002 10:01 am, William Lee Irwin III wrote:
>> After distilling with hpa's help the results of some weeks-old
>> numerological experiments^W^Wnumber crunching, I've devised a patch
>> here for -rmap to make the waitqueue hashing somewhat more palatable
>> for SPARC and several others.
>>
>> This patch uses some operator-sparse Fibonacci hashing primes in order
>> to allow shift/add implementations of the hash function used for hashed
>> waitqueues.
>>
>> Dan, Dave, could you take a look here and please comment?
On Mon, Feb 18, 2002 at 11:31:15PM +0100, Daniel Phillips wrote:
> Could you explain in very simple terms, suitable for Aunt Tillie (ok, not
> *that* simple) how the continued fraction works, how it's notated, and how
> the terms of the expansion relate to good performance as a hash?
Do you want it just in a post or in-line?
Here's the posted brief version:
Numbers have "integer parts" and "fractional parts", for instance, if
you have a number such as 10 1/2 (ten and one half) the integer part
is 10 and the fractional part is 1/2. The fractional part of a number
x is written {x}.
Now, there is something called the "spectrum" of a number, which for
a number x is the set of all the numbers of the form n * x, where n
is an integer. So we have {1*x}, {2*x}, {3*x}, and so on.
If we want to measure how well a number distributes things we can try
to see how uniform the spectrum is as a distribution. There is a
theorem which states the "most uniform" distribution results from the
number phi = (sqrt(5)-1)/2, which is related to Fibonacci numbers.
The continued fraction of phi is
0 + 1
-----
1 + 1
-----
1 + 1
-----
1 + 1
-----
1 + 1
...
where it's 1's all the way down. Some additional study also revealed
that how close the continued fraction of a number is to phi is related
to how uniform the spectrum is. For brevity, I write continued fractions
in-line, for instance, 0,1,1,1,1,... for phi, or 0,1,2,3,4,... for
0 + 1
-----
1 + 2
-----
1 + 3
-----
1 + 4
....
One way to evaluate these is to "chop off" the fraction at some point
(for instance, where I put ...) and then reduce it like an ordinary
fraction expression.
Fibonacci hashing considers the number p/2^n where n is BITS_PER_LONG
and p is a prime number, and this is supposed to have a relationship
to how evenly-distributed all the n-bit numbers multiplied by p in
n-bit arithmetic are. Which is where the hash functions come in, since
you want hash functions to evenly distribute things. There are reasons
why primes are better, too.
And I think that covers most of what you had in mind.
In my own opinion, this stuff borders on numerology, but it seems to be
a convenient supply of hash functions that pass chi^2 tests on the
bucket distributions, so I sort of tolerate it. If I'm not using a strict
enough test then I'm all ears...
Cheers,
Bill
On Mon, Feb 18, 2002 at 04:34:50PM -0800, William Lee Irwin III wrote:
> where it's 1's all the way down. Some additional study also revealed
> that how close the continued fraction of a number is to phi is related
> to how uniform the spectrum is. For brevity, I write continued fractions
> in-line, for instance, 0,1,1,1,1,... for phi, or 0,1,2,3,4,... for
>
> 0 + 1
> -----
> 1 + 2
> -----
> 1 + 3
> -----
> 1 + 4
> ....
doh! Sorry, I just woke up
0 + 1
-----
1 + 1
-----
2 + 1
-----
3 + 1
-----
4 + 1
...
Cheers,
Bill
On February 19, 2002 01:34 am, William Lee Irwin III wrote:
> On February 17, 2002 10:01 am, William Lee Irwin III wrote:
> >> After distilling with hpa's help the results of some weeks-old
> >> numerological experiments^W^Wnumber crunching, I've devised a patch
> >> here for -rmap to make the waitqueue hashing somewhat more palatable
> >> for SPARC and several others.
> >>
> >> This patch uses some operator-sparse Fibonacci hashing primes in order
> >> to allow shift/add implementations of the hash function used for hashed
> >> waitqueues.
> >>
> >> Dan, Dave, could you take a look here and please comment?
>
> On Mon, Feb 18, 2002 at 11:31:15PM +0100, Daniel Phillips wrote:
> > Could you explain in very simple terms, suitable for Aunt Tillie (ok, not
> > *that* simple) how the continued fraction works, how it's notated, and how
> > the terms of the expansion relate to good performance as a hash?
>
> Do you want it just in a post or in-line?
>
> Here's the posted brief version:
>
> Numbers have "integer parts" and "fractional parts", for instance, if
> you have a number such as 10 1/2 (ten and one half) the integer part
> is 10 and the fractional part is 1/2. The fractional part of a number
> x is written {x}.
>
> Now, there is something called the "spectrum" of a number, which for
> a number x is the set of all the numbers of the form n * x, where n
> is an integer. So we have {1*x}, {2*x}, {3*x}, and so on.
>
> If we want to measure how well a number distributes things we can try
> to see how uniform the spectrum is as a distribution. There is a
> theorem which states the "most uniform" distribution results from the
> number phi = (sqrt(5)-1)/2, which is related to Fibonacci numbers.
>
> The continued fraction of phi is
>
> 0 + 1
> -----
> 1 + 1
> -----
> 1 + 1
> -----
> 1 + 1
> -----
> 1 + 1
> ...
>
> where it's 1's all the way down. Some additional study also revealed
> that how close the continued fraction of a number is to phi is related
> to how uniform the spectrum is. For brevity, I write continued fractions
> in-line, for instance, 0,1,1,1,1,... for phi, or 0,1,2,3,4,... for
>
> 0 + 1
> -----
> 1 + 2
> -----
> 1 + 3
> -----
> 1 + 4
> ....
>
> One way to evaluate these is to "chop off" the fraction at some point
> (for instance, where I put ...) and then reduce it like an ordinary
> fraction expression.
>
> Fibonacci hashing considers the number p/2^n where n is BITS_PER_LONG
> and p is a prime number, and this is supposed to have a relationship
> to how evenly-distributed all the n-bit numbers multiplied by p in
> n-bit arithmetic are. Which is where the hash functions come in, since
> you want hash functions to evenly distribute things. There are reasons
> why primes are better, too.
>
> And I think that covers most of what you had in mind.
Yes, it sure does, thanks a lot.
> In my own opinion, this stuff borders on numerology, but it seems to be
> a convenient supply of hash functions that pass chi^2 tests on the
> bucket distributions, so I sort of tolerate it. If I'm not using a strict
> enough test then I'm all ears...
/me resolves to get back to htree hashing very soon.
--
Daniel
On Mon, Feb 18, 2002 at 04:34:50PM -0800, William Lee Irwin III wrote:
> Now, there is something called the "spectrum" of a number, which for
> a number x is the set of all the numbers of the form n * x, where n
> is an integer. So we have {1*x}, {2*x}, {3*x}, and so on.
Argh! Spec(x) is the multiset [1*x], [2*x], [3*x] where [x] is the
integer part of x.
Cheers,
Bill