2017-09-21 23:18:31

by Timofey Titovets

Subject: [PATCH v2 0/2] KSM: Replace jhash2 with xxhash

KSM uses jhash2 for hashing pages;
in 4.14, xxhash was merged into the mainline kernel.

xxhash is much faster than jhash2 on big inputs (32 bytes and up).

xxhash has two versions, one with a 32-bit hash and
one with a 64-bit hash.

The 64-bit version is faster than the 32-bit one on 64-bit arches.

So let's get the best of both worlds and
create an arch-dependent xxhash() function that uses the
fastest algorithm for the current arch.
That is the first patch.

Performance info and the KSM update can be found in the second patch.

Changelog:
v1 -> v2:
- Move xxhash() to xxhash.h/c and separate patches

Timofey Titovets (2):
xxHash: create arch dependent 32/64-bit xxhash()
KSM: Replace jhash2 with xxhash

include/linux/xxhash.h | 24 ++++++++++++++++++++++++
lib/xxhash.c | 10 ++++++++++
mm/Kconfig | 1 +
mm/ksm.c | 14 +++++++-------
4 files changed, 42 insertions(+), 7 deletions(-)

--
2.14.1


2017-09-21 23:18:35

by Timofey Titovets

Subject: [PATCH v2 2/2] KSM: Replace jhash2 with xxhash

jhash2 is used to calculate a checksum
of in-memory pages, to detect
whether a page has changed.

xxhash is much faster than jhash2; some tests:
x86_64 host:
CPU: Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz
PAGE_SIZE: 4096, loop count: 1048576
jhash2: 0xacbc7a5b time: 1907 ms, th: 2251.9 MiB/s
xxhash32: 0x570da981 time: 739 ms, th: 5809.4 MiB/s
xxhash64: 0xa1fa032ab85bbb62 time: 371 ms, th: 11556.6 MiB/s

CPU: Intel(R) Xeon(R) CPU E5-2420 0 @ 1.90GHz
PAGE_SIZE: 4096, loop count: 1048576
jhash2: 0xe680b382 time: 3722 ms, th: 1153.896680 MiB/s
xxhash32: 0x56d00be4 time: 1183 ms, th: 3629.130689 MiB/s
xxhash64: 0x8c194cff29cc4dee time: 725 ms, th: 5918.003401 MiB/s

xxhash64 on x86_32 works at roughly the same speed as jhash2.
xxhash32 on x86_32 works at roughly the same speed as on x86_64.
jhash2 is faster than xxhash on input data smaller than 32 bytes.

So use xxhash(), which will pick the appropriate hash version
for the target arch.

I did some benchmarks (CPU load of ksmd taken from htop):
CPU: Intel(R) Xeon(R) CPU E5-2420 0 @ 1.90GHz
ksm: sleep_millisecs = 1
jhash2: ~18%
xxhash64: ~11%
ksm: sleep_millisecs = 20 - default
jhash2: ~4.7%
xxhash64: ~3.3%

- 11 / 18 ~= 0.6 -> Profit: ~40%
- 3.3/4.7 ~= 0.7 -> Profit: ~30%

Signed-off-by: Timofey Titovets <[email protected]>
Acked-by: Andi Kleen <[email protected]>
---
mm/Kconfig | 1 +
mm/ksm.c | 14 +++++++-------
2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index 9c4bdddd80c2..252ab266ac23 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -305,6 +305,7 @@ config MMU_NOTIFIER
config KSM
bool "Enable KSM for page merging"
depends on MMU
+ select XXHASH
help
Enable Kernel Samepage Merging: KSM periodically scans those areas
of an application's address space that an app has advised may be
diff --git a/mm/ksm.c b/mm/ksm.c
index 15dd7415f7b3..6527fe21aaa3 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -25,7 +25,7 @@
#include <linux/pagemap.h>
#include <linux/rmap.h>
#include <linux/spinlock.h>
-#include <linux/jhash.h>
+#include <linux/xxhash.h>
#include <linux/delay.h>
#include <linux/kthread.h>
#include <linux/wait.h>
@@ -186,7 +186,7 @@ struct rmap_item {
};
struct mm_struct *mm;
unsigned long address; /* + low bits used for flags below */
- unsigned int oldchecksum; /* when unstable */
+ xxhash_t oldchecksum; /* when unstable */
union {
struct rb_node node; /* when node of unstable tree */
struct { /* when listed from stable tree */
@@ -255,7 +255,7 @@ static unsigned int ksm_thread_pages_to_scan = 100;
static unsigned int ksm_thread_sleep_millisecs = 20;

/* Checksum of an empty (zeroed) page */
-static unsigned int zero_checksum __read_mostly;
+static xxhash_t zero_checksum __read_mostly;

/* Whether to merge empty (zeroed) pages with actual zero pages */
static bool ksm_use_zero_pages __read_mostly;
@@ -982,11 +982,11 @@ static int unmerge_and_remove_all_rmap_items(void)
}
#endif /* CONFIG_SYSFS */

-static u32 calc_checksum(struct page *page)
+static xxhash_t calc_checksum(struct page *page)
{
- u32 checksum;
+ xxhash_t checksum;
void *addr = kmap_atomic(page);
- checksum = jhash2(addr, PAGE_SIZE / 4, 17);
+ checksum = xxhash(addr, PAGE_SIZE, 0);
kunmap_atomic(addr);
return checksum;
}
@@ -1994,7 +1994,7 @@ static void cmp_and_merge_page(struct page *page, struct rmap_item *rmap_item)
struct page *tree_page = NULL;
struct stable_node *stable_node;
struct page *kpage;
- unsigned int checksum;
+ xxhash_t checksum;
int err;
bool max_page_sharing_bypass = false;

--
2.14.1

2017-09-21 23:18:55

by Timofey Titovets

Subject: [PATCH v2 1/2] xxHash: create arch dependent 32/64-bit xxhash()

xxh32() - fast on both 32- and 64-bit platforms
xxh64() - fast only on 64-bit platforms

Create xxhash(), which picks the fastest version
at compile time.

Add type xxhash_t to map to the correct hash size.

As the result depends on the CPU word size,
the main purpose of this function is in-memory hashing.

Signed-off-by: Timofey Titovets <[email protected]>
Acked-by: Andi Kleen <[email protected]>
Cc: Linux-kernel <[email protected]>
---
include/linux/xxhash.h | 24 ++++++++++++++++++++++++
lib/xxhash.c | 10 ++++++++++
2 files changed, 34 insertions(+)

diff --git a/include/linux/xxhash.h b/include/linux/xxhash.h
index 9e1f42cb57e9..195a0ae10e9b 100644
--- a/include/linux/xxhash.h
+++ b/include/linux/xxhash.h
@@ -76,6 +76,7 @@
#define XXHASH_H

#include <linux/types.h>
+#include <linux/bitops.h> /* BITS_PER_LONG */

/*-****************************
* Simple Hash Functions
@@ -107,6 +108,29 @@ uint32_t xxh32(const void *input, size_t length, uint32_t seed);
*/
uint64_t xxh64(const void *input, size_t length, uint64_t seed);

+#if BITS_PER_LONG == 64
+typedef u64 xxhash_t;
+#else
+typedef u32 xxhash_t;
+#endif
+
+/**
+ * xxhash() - calculate 32/64-bit hash based on cpu word size
+ *
+ * @input: The data to hash.
+ * @length: The length of the data to hash.
+ * @seed: The seed can be used to alter the result predictably.
+ *
+ * This function always work as xxh32() for 32-bit systems
+ * and as xxh64() for 64-bit systems.
+ * Because result depends on cpu work size,
+ * the main proporse of that function is for in memory hashing.
+ *
+ * Return: 32/64-bit hash of the data.
+ */
+
+xxhash_t xxhash(const void *input, size_t length, uint64_t seed);
+
/*-****************************
* Streaming Hash Functions
*****************************/
diff --git a/lib/xxhash.c b/lib/xxhash.c
index aa61e2a3802f..7dd1105fcc30 100644
--- a/lib/xxhash.c
+++ b/lib/xxhash.c
@@ -236,6 +236,16 @@ uint64_t xxh64(const void *input, const size_t len, const uint64_t seed)
}
EXPORT_SYMBOL(xxh64);

+xxhash_t xxhash(const void *input, size_t length, uint64_t seed)
+{
+#if BITS_PER_LONG == 64
+ return xxh64(input, length, seed);
+#else
+ return xxh32(input, length, seed);
+#endif
+}
+EXPORT_SYMBOL(xxhash);
+
/*-**************************************************
* Advanced Hash Functions
***************************************************/
--
2.14.1

2017-09-25 14:59:35

by Matthew Wilcox

Subject: Re: [PATCH v2 1/2] xxHash: create arch dependent 32/64-bit xxhash()

On Fri, Sep 22, 2017 at 02:18:17AM +0300, Timofey Titovets wrote:
> diff --git a/include/linux/xxhash.h b/include/linux/xxhash.h
> index 9e1f42cb57e9..195a0ae10e9b 100644
> --- a/include/linux/xxhash.h
> +++ b/include/linux/xxhash.h
> @@ -76,6 +76,7 @@
> #define XXHASH_H
>
> #include <linux/types.h>
> +#include <linux/bitops.h> /* BITS_PER_LONG */
>
> /*-****************************
> * Simple Hash Functions

Huh? linux/types.h already brings in BITS_PER_LONG. Look:

linux/types.h
uapi/linux/types.h
uapi/asm/types.h
uapi/asm-generic/types.h
uapi/asm-generic/int-ll64.h
asm/bitsperlong.h

> @@ -107,6 +108,29 @@ uint32_t xxh32(const void *input, size_t length, uint32_t seed);
> */
> uint64_t xxh64(const void *input, size_t length, uint64_t seed);
>
> +#if BITS_PER_LONG == 64
> +typedef u64 xxhash_t;
> +#else
> +typedef u32 xxhash_t;
> +#endif

This is a funny way to spell 'unsigned long' ...

> +/**
> + * xxhash() - calculate 32/64-bit hash based on cpu word size
> + *
> + * @input: The data to hash.
> + * @length: The length of the data to hash.
> + * @seed: The seed can be used to alter the result predictably.
> + *
> + * This function always work as xxh32() for 32-bit systems
> + * and as xxh64() for 64-bit systems.
> + * Because result depends on cpu work size,
> + * the main proporse of that function is for in memory hashing.
> + *
> + * Return: 32/64-bit hash of the data.
> + */
> +

> +xxhash_t xxhash(const void *input, size_t length, uint64_t seed)
> +{
> +#if BITS_PER_LONG == 64
> + return xxh64(input, length, seed);
> +#else
> + return xxh32(input, length, seed);
> +#endif
> +}

Let's move that to the header file and make it a static inline. That way
it doesn't need to be an EXPORT_SYMBOL.

Also, I think the kerneldoc could do with a bit of work. Try this:

/**
* xxhash() - calculate wordsize hash of the input with a given seed
* @input: The data to hash.
* @length: The length of the data to hash.
* @seed: The seed can be used to alter the result predictably.
*
* If the hash does not need to be comparable between machines with
* different word sizes, this function will call whichever of xxh32()
* or xxh64() is faster.
*
* Return: wordsize hash of the data.
*/
static inline
unsigned long xxhash(const void *input, size_t length, unsigned long seed)
{
#if BITS_PER_LONG == 64
return xxh64(input, length, seed);
#else
return xxh32(input, length, seed);
#endif
}

2017-09-25 16:18:30

by Timofey Titovets

Subject: Re: [PATCH v2 1/2] xxHash: create arch dependent 32/64-bit xxhash()

2017-09-25 17:59 GMT+03:00 Matthew Wilcox <[email protected]>:
> On Fri, Sep 22, 2017 at 02:18:17AM +0300, Timofey Titovets wrote:
>> diff --git a/include/linux/xxhash.h b/include/linux/xxhash.h
>> index 9e1f42cb57e9..195a0ae10e9b 100644
>> --- a/include/linux/xxhash.h
>> +++ b/include/linux/xxhash.h
>> @@ -76,6 +76,7 @@
>> #define XXHASH_H
>>
>> #include <linux/types.h>
>> +#include <linux/bitops.h> /* BITS_PER_LONG */
>>
>> /*-****************************
>> * Simple Hash Functions
>
> Huh? linux/types.h already brings in BITS_PER_LONG. Look:
>
> linux/types.h
> uapi/linux/types.h
> uapi/asm/types.h
> uapi/asm-generic/types.h
> uapi/asm-generic/int-ll64.h
> asm/bitsperlong.h

Will fix that, thanks.

>> @@ -107,6 +108,29 @@ uint32_t xxh32(const void *input, size_t length, uint32_t seed);
>> */
>> uint64_t xxh64(const void *input, size_t length, uint64_t seed);
>>
>> +#if BITS_PER_LONG == 64
>> +typedef u64 xxhash_t;
>> +#else
>> +typedef u32 xxhash_t;
>> +#endif
>
> This is a funny way to spell 'unsigned long' ...

I just wanted some strict and obvious types for in-memory hashing,
and it simply looks prettier to my eye (IMHO).
I will of course replace it with 'unsigned long' and drop xxhash_t completely,
since you find it unacceptable.

>> +/**
>> + * xxhash() - calculate 32/64-bit hash based on cpu word size
>> + *
>> + * @input: The data to hash.
>> + * @length: The length of the data to hash.
>> + * @seed: The seed can be used to alter the result predictably.
>> + *
>> + * This function always work as xxh32() for 32-bit systems
>> + * and as xxh64() for 64-bit systems.
>> + * Because result depends on cpu work size,
>> + * the main proporse of that function is for in memory hashing.
>> + *
>> + * Return: 32/64-bit hash of the data.
>> + */
>> +
>
>> +xxhash_t xxhash(const void *input, size_t length, uint64_t seed)
>> +{
>> +#if BITS_PER_LONG == 64
>> + return xxh64(input, length, seed);
>> +#else
>> + return xxh32(input, length, seed);
>> +#endif
>> +}
>
> Let's move that to the header file and make it a static inline. That way
> it doesn't need to be an EXPORT_SYMBOL.

Agreed, thanks.

> Also, I think the kerneldoc could do with a bit of work. Try this:
>
> /**
> * xxhash() - calculate wordsize hash of the input with a given seed
> * @input: The data to hash.
> * @length: The length of the data to hash.
> * @seed: The seed can be used to alter the result predictably.
> *
> * If the hash does not need to be comparable between machines with
> * different word sizes, this function will call whichever of xxh32()
> * or xxh64() is faster.
> *
> * Return: wordsize hash of the data.
> */

I will replace it with your version, thanks.

> static inline
> unsigned long xxhash(const void *input, size_t length, unsigned long seed)
> {
> #if BITS_PER_LONG == 64
> return xxh64(input, length, seed);
> #else
> return xxh32(input, length, seed);
> #endif
> }


--
Have a nice day,
Timofey.