LinuxLists.cc - [RFC PATCH 0/5] NUMA Balancer Suite

2019-04-22 02:13:43

Subject: [RFC PATCH 0/5] NUMA Balancer Suite

We have the NUMA Balancing feature, which always tries to move the pages
of a task to the node where it executes most, but it still has issues:

* page cache can't be handled
* no cgroup level balancing

Suppose we have a box with 4 CPUs and two cgroups, A & B, each running 4 tasks;
the scenario below can easily be observed:

      NODE0         |       NODE1
                    |
CPU0      CPU1      |  CPU2      CPU3
task_A0   task_A1   |  task_A2   task_A3
task_B0   task_B1   |  task_B2   task_B3

and usually with equal memory consumption on each node, when the tasks behave
similarly.

In this case NUMA balancing tries to move the pages of task_A0,1 & task_B0,1 to
node 0 and the pages of task_A2,3 & task_B2,3 to node 1, but the page cache will
be located randomly, depending on which CPU performed the first read/write.

Now suppose another scenario:

      NODE0         |       NODE1
                    |
CPU0      CPU1      |  CPU2      CPU3
task_A0   task_A1   |  task_B0   task_B1
task_A2   task_A3   |  task_B2   task_B3

By switching the CPU & memory resources of task_A2,3 and task_B0,1, the workloads
of cgroup A are now all on node 0 and those of cgroup B all on node 1. Resource
consumption is the same, but related tasks can share a closer CPU cache, while
the page cache is still located randomly.

Now what if the workloads generate lots of page cache, and most of the memory
accesses are page cache writes?

Page cache generated by task_A0 on NODE1 won't follow it to NODE0, but if task_A0
was already on NODE0 before it read or wrote the files, the cache would be there.
So how do we make sure this happens?
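
To make the effect concrete, here is a toy model (not kernel code) of the
behaviour described above. It assumes a simplified first-touch rule: a cache
page stays on the node of the CPU that first touched it. The function name
and the traces are made up for illustration.

```python
# Toy model: a page cache page stays on the node of the CPU that first
# touched it, and does not follow the task when the task migrates.
CPU_TO_NODE = {0: 0, 1: 0, 2: 1, 3: 1}  # 4-CPU, 2-node layout from the diagrams


def local_cache_ratio(cpu_trace, final_cpu):
    """cpu_trace: the CPU the task ran on at each first file access.

    Returns the fraction of its cache pages that sit on the node of final_cpu.
    """
    if not cpu_trace:
        return 0.0
    final_node = CPU_TO_NODE[final_cpu]
    local = sum(1 for cpu in cpu_trace if CPU_TO_NODE[cpu] == final_node)
    return local / len(cpu_trace)


# task_A0 touched 3 files while on CPU2 (NODE1), then moved to CPU0 (NODE0)
# and touched 1 more: only a quarter of its cache is local.
late_move = local_cache_ratio([2, 2, 2, 0], final_cpu=0)

# Had it been on NODE0 before doing any I/O, all of its cache would be local.
early_move = local_cache_ratio([0, 0, 1, 0], final_cpu=0)

print(late_move, early_move)
```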

Usually we could solve this problem by binding the workloads to a single node: if
cgroup A is bound to CPU0,1, then all the cache it generates will be on NODE0,
and the NUMA bonus will be maximal.
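
As a rough sketch of such a static binding, assuming the cgroup v1 cpuset
controller (files cpuset.cpus and cpuset.mems) mounted at
/sys/fs/cgroup/cpuset. The helper name is hypothetical, and setting
cpuset.mems as well is an assumption beyond the CPU binding mentioned above.

```python
import os
import tempfile


def bind_cgroup(cpuset_root, group, cpus, mems):
    """Restrict a cpuset cgroup to the given CPU list and memory nodes."""
    group_dir = os.path.join(cpuset_root, group)
    os.makedirs(group_dir, exist_ok=True)
    with open(os.path.join(group_dir, "cpuset.cpus"), "w") as f:
        f.write(cpus)
    with open(os.path.join(group_dir, "cpuset.mems"), "w") as f:
        f.write(mems)
    return group_dir


# Real use would need root and a mounted cpuset hierarchy (assumed path):
#   bind_cgroup("/sys/fs/cgroup/cpuset", "A", cpus="0-1", mems="0")
# Dry run against a scratch directory, so the sketch runs unprivileged:
scratch = tempfile.mkdtemp()
group_dir = bind_cgroup(scratch, "A", cpus="0-1", mems="0")
with open(os.path.join(group_dir, "cpuset.cpus")) as f:
    bound_cpus = f.read()
print(bound_cpus)
```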

However, this requires very careful administration of the specific workloads. In
our case, where the CPU requirements of A & B vary from 0% to 400%, binding to a
single node would be a bad idea.

So what we need is a way to detect the memory topology at the cgroup level, and to
try to migrate CPU/memory resources to the node holding most of the cache, as long
as resources are plentiful on that node.
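
The selection rule in the previous paragraph could look roughly like the
sketch below. This is only an illustration of the idea, not the algorithm
used by the patches; the function name, the page-count inputs and the
min_free_pages threshold are all assumptions.

```python
def pick_preferred_node(cache_pages, free_pages, min_free_pages):
    """Pick the node holding most of a cgroup's page cache.

    cache_pages / free_pages map node id -> page count. Nodes with fewer
    than min_free_pages free pages are skipped; returns None if no node
    qualifies. Ties go to the lowest node id.
    """
    candidates = [n for n in cache_pages
                  if free_pages.get(n, 0) >= min_free_pages]
    if not candidates:
        return None
    return max(candidates, key=lambda n: (cache_pages[n], -n))


cache_of_A = {0: 9000, 1: 1000}  # most of cgroup A's cache is on node 0

# Node 0 has room: prefer it.
roomy = pick_preferred_node(cache_of_A, {0: 50000, 1: 80000}, 10000)
# Node 0 is short of free memory: fall back to node 1.
tight = pick_preferred_node(cache_of_A, {0: 500, 1: 80000}, 10000)
print(roomy, tight)
```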

This patch set introduces:
* advanced per-cgroup NUMA statistics
* a NUMA preferred node feature
* a NUMA Balancer module

These help to achieve easy and flexible NUMA resource assignment, to gain as much
NUMA bonus as possible.
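
For readers who want to look at per-cgroup, per-node memory today, here is
a small sketch that parses the existing cgroup v1 memory.numa_stat layout
("counter=total N0=pages N1=pages ..."). The per-node execution info this
series appends is not shown in this letter, so the sketch does not try to
parse it and ignores any field it does not recognize. The sample numbers
are made up.

```python
def parse_numa_stat(text):
    """Parse memory.numa_stat text into {counter: {node_id: pages}}."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or "=" not in fields[0]:
            continue
        name = fields[0].split("=", 1)[0]
        per_node = {}
        for field in fields[1:]:
            key, _, value = field.partition("=")
            # Keep only well-formed "N<id>=<pages>" fields.
            if key.startswith("N") and key[1:].isdigit() and value.isdigit():
                per_node[int(key[1:])] = int(value)
        stats[name] = per_node
    return stats


sample = """\
total=44611 N0=32631 N1=7501
file=44428 N0=32614 N1=7335
anon=183 N0=17 N1=166
unevictable=0 N0=0 N1=0
"""

stats = parse_numa_stat(sample)
file_pages = stats["file"]  # file-backed pages, i.e. page cache
busiest_node = max(file_pages, key=file_pages.get)
print(file_pages, busiest_node)
```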

Michael Wang (5):
  numa: introduce per-cgroup numa balancing locality statistic
  numa: append per-node execution info in memory.numa_stat
  numa: introduce per-cgroup preferred numa node
  numa: introduce numa balancer infrastructure
  numa: numa balancer

 drivers/Makefile             |   1 +
 drivers/numa/Makefile        |   1 +
 drivers/numa/numa_balancer.c | 715 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/memcontrol.h   |  99 ++++++
 include/linux/sched.h        |   9 +-
 kernel/sched/debug.c         |   8 +
 kernel/sched/fair.c          |  41 +++
 mm/huge_memory.c             |   7 +-
 mm/memcontrol.c              | 246 +++++++++++++++
 mm/memory.c                  |   9 +-
 mm/mempolicy.c               |   4 +
 11 files changed, 1133 insertions(+), 7 deletions(-)
 create mode 100644 drivers/numa/Makefile
 create mode 100644 drivers/numa/numa_balancer.c

--
2.14.4.44.g2045bb6

Subject: [RFC PATCH 0/5] NUMA Balancer Suite

Subject: [RFC PATCH 1/5] numa: introduce per-cgroup numa balancing locality statistic

Subject: [RFC PATCH 2/5] numa: append per-node execution info in memory.numa_stat

Subject: [RFC PATCH 3/5] numa: introduce per-cgroup preferred numa node

Subject: [RFC PATCH 4/5] numa: introduce numa balancer infrastructure

Subject: [RFC PATCH 5/5] numa: numa balancer

Subject: Re: [RFC PATCH 0/5] NUMA Balancer Suite

Subject: Re: [RFC PATCH 1/5] numa: introduce per-cgroup numa balancing locality statistic

Subject: Re: [RFC PATCH 1/5] numa: introduce per-cgroup numa balancing locality statistic

Subject: Re: [RFC PATCH 1/5] numa: introduce per-cgroup numa balancing locality statistic

Subject: Re: [RFC PATCH 2/5] numa: append per-node execution info in memory.numa_stat

Subject: Re: [RFC PATCH 3/5] numa: introduce per-cgroup preferred numa node

Subject: Re: [RFC PATCH 5/5] numa: numa balancer

Subject: Re: [RFC PATCH 1/5] numa: introduce per-cgroup numa balancing locality statistic

Subject: Re: [RFC PATCH 1/5] numa: introduce per-cgroup numa balancing locality statistic

Subject: Re: [RFC PATCH 1/5] numa: introduce per-cgroup numa balancing locality statistic

Subject: Re: [RFC PATCH 2/5] numa: append per-node execution info in memory.numa_stat

Subject: Re: [RFC PATCH 3/5] numa: introduce per-cgroup preferred numa node

Subject: Re: [RFC PATCH 1/5] numa: introduce per-cgroup numa balancing locality statistic

Subject: Re: [RFC PATCH 2/5] numa: append per-node execution info in memory.numa_stat

Subject: Re: [RFC PATCH 5/5] numa: numa balancer

Subject: Re: [RFC PATCH 2/5] numa: append per-node execution info in memory.numa_stat

Subject: [PATCH 0/4] per cpu cgroup numa suite

Subject: [PATCH 1/4] numa: introduce per-cgroup numa balancing locality statistic

Subject: [PATCH 2/4] numa: append per-node execution info in memory.numa_stat

Subject: [PATCH 3/4] numa: introduce numa group per task group

Subject: [PATCH 4/4] numa: introduce numa cling feature

Subject: [PATCH v2 4/4] numa: introduce numa cling feature

Subject: Re: [PATCH v2 4/4] numa: introduce numa cling feature

Subject: [PATCH v3 4/4] numa: introduce numa cling feature

Subject: Re: [PATCH 0/4] per cgroup numa suite

Subject: Re: [PATCH 1/4] numa: introduce per-cgroup numa balancing locality statistic

Subject: Re: [PATCH 2/4] numa: append per-node execution info in memory.numa_stat

Subject: Re: [PATCH 1/4] numa: introduce per-cgroup numa balancing locality statistic

Subject: Re: [PATCH 3/4] numa: introduce numa group per task group

Subject: Re: [PATCH 4/4] numa: introduce numa cling feature

Subject: Re: [PATCH 4/4] numa: introduce numa cling feature

Subject: Re: [PATCH 1/4] numa: introduce per-cgroup numa balancing locality statistic

Subject: Re: [PATCH 2/4] numa: append per-node execution info in memory.numa_stat

Subject: Re: [PATCH 1/4] numa: introduce per-cgroup numa balancing locality statistic

Subject: Re: [PATCH 3/4] numa: introduce numa group per task group

Subject: Re: [PATCH 4/4] numa: introduce numa cling feature

Subject: Re: [PATCH 1/4] numa: introduce per-cgroup numa balancing locality statistic

Subject: Re: [PATCH 4/4] numa: introduce numa cling feature

Subject: Re: [PATCH 1/4] numa: introduce per-cgroup numa balancing locality statistic

Subject: Re: [PATCH 1/4] numa: introduce per-cgroup numa balancing locality statistic

Subject: Re: [PATCH 1/4] numa: introduce per-cgroup numa balancing locality statistic

Subject: Re: [PATCH 1/4] numa: introduce per-cgroup numa balancing locality statistic

Subject: Re: [PATCH 1/4] numa: introduce per-cgroup numa balancing locality statistic

Subject: Re: [PATCH 1/4] numa: introduce per-cgroup numa balancing locality statistic

Subject: [PATCH v2 0/4] per-cgroup numa suite

Subject: [PATCH v2 1/4] numa: introduce per-cgroup numa balancing locality statistic

Subject: [PATCH v2 3/4] numa: introduce numa group per task group

Subject: [PATCH v2 2/4] numa: append per-node execution time in cpu.numa_stat

Subject: [PATCH v4 4/4] numa: introduce numa cling feature

Subject: Re: [PATCH v2 2/4] numa: append per-node execution time in cpu.numa_stat

Subject: Re: [PATCH 1/4] numa: introduce per-cgroup numa balancing locality statistic

Subject: Re: [PATCH v2 2/4] numa: append per-node execution time in cpu.numa_stat

Subject: [PATCH v5 4/4] numa: introduce numa cling feature

Subject: Re: [PATCH 4/4] numa: introduce numa cling feature

Subject: Re: [PATCH v2 0/4] per-cgroup numa suite

Subject: Re: [PATCH v2 0/4] per-cgroup numa suite