This patchset optimizes cross-socket memory access under the
MPOL_PREFERRED_MANY policy.
To test this patchset, we ran the following test on a three-node system:
Node 0 - 2GB - Tier 1
Node 1 - 11GB - Tier 1
Node 6 - 10GB - Tier 2
The following change was made to memcached to set its memory policy;
it selects Node 0 and Node 1 as the preferred nodes:
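#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <numaif.h>	/* set_mempolicy(), MPOL_*; link with -lnuma */
#include <numa.h>

/* Older <numaif.h> headers may lack these; values from <linux/mempolicy.h>. */
#ifndef MPOL_PREFERRED_MANY
#define MPOL_PREFERRED_MANY	5
#endif
#ifndef MPOL_F_NUMA_BALANCING
#define MPOL_F_NUMA_BALANCING	(1 << 13)
#endif

unsigned long nodemask;
int ret;

/* Bits 0 and 1 set: prefer Node 0 and Node 1 (the Tier 1 nodes). */
nodemask = 0x03;
ret = set_mempolicy(MPOL_PREFERRED_MANY | MPOL_F_NUMA_BALANCING,
		    &nodemask, 10);
/* If MPOL_F_NUMA_BALANCING isn't supported with MPOL_PREFERRED_MANY,
 * set_mempolicy() fails with EINVAL; fall back to MPOL_PREFERRED_MANY. */
if (ret < 0 && errno == EINVAL) {
	printf("set mem policy normal\n");
	ret = set_mempolicy(MPOL_PREFERRED_MANY, &nodemask, 10);
}
if (ret < 0) {
	perror("Failed to call set_mempolicy");
	exit(EXIT_FAILURE);
}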
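In the snippet above, nodemask is a bitmap in which bit N selects node N, so 0x03 selects Node 0 and Node 1. The sketch below builds such a mask from a list of node IDs. build_nodemask() is a hypothetical helper for illustration only. It is not a libnuma or kernel function; for masks wider than one word, libnuma's struct bitmask API (for example numa_bitmask_setbit()) covers the general case.

```c
#include <assert.h>
#include <limits.h>	/* CHAR_BIT */
#include <stddef.h>	/* size_t */

/*
 * Hypothetical helper (not part of libnuma or the kernel): build the
 * single-word nodemask passed to set_mempolicy(). Bit N set means
 * node N is in the mask.
 */
static unsigned long build_nodemask(const int *nodes, size_t count)
{
	unsigned long mask = 0;

	for (size_t i = 0; i < count; i++) {
		/* One unsigned long only describes nodes 0..63 on LP64. */
		assert(nodes[i] >= 0 &&
		       nodes[i] < (int)(sizeof(mask) * CHAR_BIT));
		mask |= 1UL << nodes[i];
	}
	return mask;
}
```

Passing {0, 1} yields 0x03, which is the mask used in the memcached change. Passing {6} yields 0x40 (Node 6 alone).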
Test Procedure:
===============
1. Make sure memory tiering and demotion are enabled.
2. Start memcached.
# ./memcached -b 100000 -m 204800 -u root -c 1000000 -t 7 \
    -d -s "/tmp/memcached.sock"
3. Run memtier_benchmark to store 3200000 keys.
# ./memtier_benchmark -S "/tmp/memcached.sock" --protocol=memcache_binary \
    --threads=1 --pipeline=1 --ratio=1:0 --key-pattern=S:S --key-minimum=1 \
    --key-maximum=3200000 -n allkeys -c 1 -R -x 1 -d 1024
4. Start a memory eater on Nodes 0 and 1. The resulting memory pressure
demotes all memcached pages to Node 6.
5. Verify that all memcached pages were demoted to the lower tier by reading
/proc/<memcached PID>/numa_maps:
# cat /proc/2771/numa_maps
---
default anon=1009 dirty=1009 active=0 N6=1009 kernelpagesize_kB=64
default anon=1009 dirty=1009 active=0 N6=1009 kernelpagesize_kB=64
---
6. Kill the memory eater.
7. Read the pgpromote_success counter of each node.
8. Start reading the keys by running memtier_benchmark:
# ./memtier_benchmark -S "/tmp/memcached.sock" --protocol=memcache_binary \
    --pipeline=1 --distinct-client-seed --ratio=0:3 --key-pattern=R:R \
    --key-minimum=1 --key-maximum=3200000 -n allkeys \
    --threads=64 -c 1 -R -x 6
9. Read the pgpromote_success counters again.
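Steps 5, 7 and 9 can also be checked programmatically. The minimal C sketch below makes two assumptions. First, it assumes the per-node counters are read from /sys/devices/system/node/node<N>/vmstat; the procedure does not name the file. Second, it assumes numa_maps lines carry per-node page counts as N<node>=<pages> fields, as in the output above. numa_maps_split() and vmstat_counter() are hypothetical helpers. They parse text that has already been read into memory, so file I/O is left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: sum the "N<nid>=<pages>" fields of one numa_maps
 * line into pages on `node` and pages on any other node.
 */
static void numa_maps_split(const char *line, int node,
			    long *on_node, long *elsewhere)
{
	const char *p;

	assert(line != NULL && on_node != NULL && elsewhere != NULL);
	*on_node = 0;
	*elsewhere = 0;
	for (p = line; (p = strchr(p, 'N')) != NULL; p++) {
		char *end;
		long nid, pages;

		/* Per-node fields start a token and look like N<digits>=. */
		if ((p != line && p[-1] != ' ') || p[1] < '0' || p[1] > '9')
			continue;
		nid = strtol(p + 1, &end, 10);
		if (*end != '=')
			continue;
		pages = strtol(end + 1, NULL, 10);
		if (nid == node)
			*on_node += pages;
		else
			*elsewhere += pages;
	}
}

/*
 * Hypothetical helper: return the value of counter `name` from
 * vmstat-style text ("<name> <value>" per line), or -1 if absent.
 */
static long vmstat_counter(const char *text, const char *name)
{
	size_t len = strlen(name);
	const char *p = text;

	while (p != NULL && *p != '\0') {
		/* Require an exact name match followed by a space. */
		if (strncmp(p, name, len) == 0 && p[len] == ' ')
			return strtol(p + len + 1, NULL, 10);
		p = strchr(p, '\n');
		if (p != NULL)
			p++;	/* skip to the start of the next line */
	}
	return -1;
}
```

For the numa_maps lines shown in step 5, numa_maps_split(line, 6, &on, &other) reports 1009 pages on Node 6 and none elsewhere, which means every page was demoted. Subtract a node's step 7 vmstat_counter(text, "pgpromote_success") value from its step 9 value. The result is the number of pages promoted to that node during the run.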
Test Results:
=============
Without Patch
------------------
1. pgpromote_success before test
Node 0: pgpromote_success 11
Node 1: pgpromote_success 140974
pgpromote_success after test
Node 0: pgpromote_success 11
Node 1: pgpromote_success 140974
2. memtier_benchmark results
AGGREGATED AVERAGE RESULTS (6 runs)
====================================================================
Type      Ops/sec    Hits/sec  Misses/sec  Avg. Latency  p50 Latency
--------------------------------------------------------------------
Sets         0.00         ---         ---           ---          ---
Gets    305792.03   305791.93        0.10       0.18949      0.16700
Waits        0.00         ---         ---           ---          ---
Totals  305792.03   305791.93        0.10       0.18949      0.16700
=============================================
Type    p99 Latency  p99.9 Latency     KB/sec
---------------------------------------------
Sets            ---            ---       0.00
Gets        0.44700        1.71100   11542.69
Waits           ---            ---        ---
Totals      0.44700        1.71100   11542.69
With Patch
---------------
1. pgpromote_success before test
Node 0: pgpromote_success 5
Node 1: pgpromote_success 89386
pgpromote_success after test
Node 0: pgpromote_success 57895
Node 1: pgpromote_success 141463
2. memtier_benchmark results
AGGREGATED AVERAGE RESULTS (6 runs)
====================================================================
Type      Ops/sec    Hits/sec  Misses/sec  Avg. Latency  p50 Latency
--------------------------------------------------------------------
Sets         0.00         ---         ---           ---          ---
Gets    521942.24   521942.07        0.17       0.11459      0.10300
Waits        0.00         ---         ---           ---          ---
Totals  521942.24   521942.07        0.17       0.11459      0.10300
=============================================
Type    p99 Latency  p99.9 Latency     KB/sec
---------------------------------------------
Sets            ---            ---       0.00
Gets        0.23100        0.31900   19701.68
Waits           ---            ---        ---
Totals      0.23100        0.31900   19701.68
Test Result Analysis:
=====================
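1. With the patch, pages are promoted: between the two readings,
pgpromote_success increases by 57890 on Node 0 and by 52077 on Node 1,
whereas without the patch neither counter changes.
2. The memtier_benchmark results show that, with the patch, throughput
increases by about 71% and average latency drops by about 40%
(0.18949 -> 0.11459).
Ops/sec without patch - 305792.03
Ops/sec with patch    - 521942.24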
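The throughput comparison can be computed directly from the tables as the relative change (after - before) / before x 100. Below is a minimal C sketch of that arithmetic. pct_change() is a hypothetical helper.

```c
#include <assert.h>

/*
 * Hypothetical helper: relative change from `before` to `after`, in
 * percent. Positive means an increase, negative a decrease.
 */
static double pct_change(double before, double after)
{
	assert(before != 0.0);
	return (after - before) / before * 100.0;
}
```

pct_change(305792.03, 521942.24) is about +70.7% (throughput). pct_change(0.18949, 0.11459) is about -39.5% (average latency).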
Changes:
V4:
- Added an example to the "PATCH 2/2" commit message, following the
discussion on V3.
V3:
- Added the "* @vmf: structure describing the fault" kernel-doc comment to
mpol_misplaced() to fix the kernel test robot warning reported at:
https://lore.kernel.org/oe-kbuild-all/[email protected]/
- https://lore.kernel.org/lkml/[email protected]/
V2:
- Rebased onto the latest upstream (v6.8-rc7).
- Used 'numa_node_id()' to get the ID of the node the code is executing
on, and added 'lockdep_assert_held' to make sure that 'mpol_misplaced()'
is called with the PTL held.
- Updated the migration condition: migration now occurs only if the
executing node is set in the policy nodemask.
- https://lore.kernel.org/lkml/[email protected]/

- V1: https://lore.kernel.org/linux-mm/9c3f7b743477560d1c5b12b8c111a584a2cc92ee.1708097962.git.donettom@linux.ibm.com/#t
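The V2 migration condition can be modelled as a small decision function. This is a simplified, hypothetical sketch, not the kernel code. It assumes MPOL_F_NUMA_BALANCING is set on an MPOL_PREFERRED_MANY policy. It takes the executing node (what numa_node_id() returns in the kernel) as a plain int. It also omits the additional NUMA-balancing heuristics the kernel applies before actually migrating a page.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define NO_MIGRATION (-1)	/* hypothetical "leave the page where it is" */

/*
 * Hypothetical model: on a NUMA hinting fault, return the node a page
 * should migrate to, or NO_MIGRATION.
 *   policy_nodes: MPOL_PREFERRED_MANY nodemask (bit N = node N)
 *   this_node:    node the faulting task is running on
 *   page_node:    node the page currently resides on
 */
static int preferred_many_target(unsigned long policy_nodes, int this_node,
				 int page_node)
{
	bool this_node_allowed;

	assert(this_node >= 0 &&
	       this_node < (int)(sizeof(policy_nodes) * CHAR_BIT));
	this_node_allowed = (policy_nodes >> this_node) & 1UL;

	/* Migrate towards the executing node only if the policy allows it. */
	if (!this_node_allowed)
		return NO_MIGRATION;
	/* The page is already local to the executing node. */
	if (page_node == this_node)
		return NO_MIGRATION;
	return this_node;
}
```

Take the test's nodemask of 0x03 and a page demoted to Node 6. If a task on Node 0 touches the page, the target is Node 0. If a task on a node outside the mask touches it, the page stays on Node 6.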
Donet Tom (2):
mm/mempolicy: Use numa_node_id() instead of cpu_to_node()
mm/numa_balancing:Allow migrate on protnone reference with
MPOL_PREFERRED_MANY policy
include/linux/mempolicy.h | 5 +++--
mm/huge_memory.c | 2 +-
mm/internal.h | 2 +-
mm/memory.c | 8 +++++---
mm/mempolicy.c | 36 +++++++++++++++++++++++++++---------
5 files changed, 37 insertions(+), 16 deletions(-)
--
2.39.3
On Mon, 25 Mar 2024 09:24:12 -0500 Donet Tom <[email protected]> wrote:
> V4
> - Added an example in the "PATCH 2/2" commit message as per the discussion
> from V3.
Thanks, I updated the changelogs in place.
Donet Tom <[email protected]> writes:
> This patchset is to optimize the cross-socket memory access with
> MPOL_PREFERRED_MANY policy.
>
> [...]
LGTM, Thanks! Feel free to add
Reviewed-by: "Huang, Ying" <[email protected]>
in the future version.
--
Best Regards,
Huang, Ying