2022-01-23 15:57:18

by Wei Yang

[permalink] [raw]
Subject: [PATCH 1/2] mm/page_alloc: add same penalty is enough to get round-robin order

To make node order in round-robin in the same distance group, we add a
penalty to the first node we got in each round.

To get a round-robin order in the same distance group, we don't need to
decrease the penalty since:

* find_next_best_node() always iterates node in the same order
* distance matters more then penalty in find_next_best_node()
* in nodes with the same distance, the first one would be picked up

So it is fine to increase same penalty when we get the first node in the
same distance group.

Signed-off-by: Wei Yang <[email protected]>
---
mm/page_alloc.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c5952749ad40..f27afd517652 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6245,13 +6245,12 @@ static void build_thisnode_zonelists(pg_data_t *pgdat)
static void build_zonelists(pg_data_t *pgdat)
{
static int node_order[MAX_NUMNODES];
- int node, load, nr_nodes = 0;
+ int node, nr_nodes = 0;
nodemask_t used_mask = NODE_MASK_NONE;
int local_node, prev_node;

/* NUMA-aware ordering of nodes */
local_node = pgdat->node_id;
- load = nr_online_nodes;
prev_node = local_node;

memset(node_order, 0, sizeof(node_order));
@@ -6263,11 +6262,10 @@ static void build_zonelists(pg_data_t *pgdat)
*/
if (node_distance(local_node, node) !=
node_distance(local_node, prev_node))
- node_load[node] += load;
+ node_load[node] += nr_online_nodes;

node_order[nr_nodes++] = node;
prev_node = node;
- load--;
}

build_zonelists_in_node_order(pgdat, node_order, nr_nodes);
--
2.33.1


2022-01-23 15:57:31

by Wei Yang

[permalink] [raw]
Subject: [PATCH 2/2] mm/page_alloc: add penalty to local_node

'Commit 54d032ced983 ("mm/page_alloc: use accumulated load when building
node fallback list")' fix a bug on zonelist order. This let me think
about what would happen if we have a node system with the following
distance matrix.

Node 0 1 2 3 4 5 6 7
----------------------------
0 10 12 12 12 32 32 32 32
1 12 10 12 12 32 32 32 32
2 12 12 10 12 32 32 32 32
3 12 12 12 10 32 32 32 32
4 32 32 32 32 10 12 12 12
5 32 32 32 32 12 10 12 12
6 32 32 32 32 12 12 10 12
7 32 32 32 32 12 12 12 10

Unfortunately for this case, the node fallback list gets built like this:

Node Fallback list
---------------------
0: 0 1 2 3 4 5 6 7
1: 1 0 2 3 5 6 7 4
2: 2 3 0 1 6 7 4 5
3: 3 2 0 1 7 4 5 6
4: 4 5 6 7 0 1 2 3
5: 5 4 6 7 1 2 3 0
6: 6 7 4 5 2 3 0 1
7: 7 6 4 5 3 0 1 2

We found the order in diagonal block is not expected. The reason is we
don't penalty local node.

After penalty local node, the node fallback list gets built like this:

Node Fallback list
---------------------
0: 0 1 2 3 4 5 6 7
1: 1 2 3 0 5 6 7 4
2: 2 3 0 1 6 7 4 5
3: 3 0 1 2 7 4 5 6
4: 4 5 6 7 0 1 2 3
5: 5 6 7 4 1 2 3 0
6: 6 7 4 5 2 3 0 1
7: 7 4 5 6 3 0 1 2

Now the fallback list is in round-robin order.

I am not very familiar with the node distance pattern, while I tried the
following distance matrix. Both of them works with this change.

Node 0 1 2 3
----------------
0 10 10 10 10
1 10 10 10 10
2 10 10 10 10
3 10 10 10 10

Node 0 1 2 3 4 5 6 7
----------------------------
0 10 10 10 10 32 32 32 32
1 10 10 10 10 32 32 32 32
2 10 10 10 10 32 32 32 32
3 10 10 10 10 32 32 32 32
4 32 32 32 32 10 10 10 10
5 32 32 32 32 10 10 10 10
6 32 32 32 32 10 10 10 10
7 32 32 32 32 10 10 10 10

Signed-off-by: Wei Yang <[email protected]>
CC: Krupa Ramakrishnan <[email protected]>
CC: KAMEZAWA Hiroyuki <[email protected]>
CC: Michal Hocko <[email protected]>
---
mm/page_alloc.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f27afd517652..0cc25429a17e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6260,8 +6260,9 @@ static void build_zonelists(pg_data_t *pgdat)
* So adding penalty to the first node in same
* distance group to make it round-robin.
*/
- if (node_distance(local_node, node) !=
- node_distance(local_node, prev_node))
+ if ((node_distance(local_node, node) !=
+ node_distance(local_node, prev_node)) ||
+ node == local_node)
node_load[node] += nr_online_nodes;

node_order[nr_nodes++] = node;
--
2.33.1

2022-03-21 23:34:18

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 1/2] mm/page_alloc: add same penalty is enough to get round-robin order

Could someone please help review this series?

Thanks.

2022-03-27 00:52:34

by Wei Yang

[permalink] [raw]
Subject: Re: [PATCH 1/2] mm/page_alloc: add same penalty is enough to get round-robin order

On Mon, Mar 21, 2022 at 02:17:23PM -0700, Andrew Morton wrote:
>Could someone please help review this series?
>

Andrew,

Sorry for my mistake, I forgot one element in my emulation code for Patch 2.
Current kernel could handle the situation described in commit log properly.

So Patch 2 is not necessary. Sorry for the disturbance.

>Thanks.

--
Wei Yang
Help you, Help me

2022-04-06 11:27:23

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH 1/2] mm/page_alloc: add same penalty is enough to get round-robin order

On 1/23/22 02:35, Wei Yang wrote:
> To make node order in round-robin in the same distance group, we add a
> penalty to the first node we got in each round.
>
> To get a round-robin order in the same distance group, we don't need to
> decrease the penalty since:
>
> * find_next_best_node() always iterates node in the same order
> * distance matters more then penalty in find_next_best_node()
> * in nodes with the same distance, the first one would be picked up
>
> So it is fine to increase same penalty when we get the first node in the
> same distance group.

With that logic I'm not even sure if we need nr_online_nodes as penalty or
it could be just 1. Would you know?

>
> Signed-off-by: Wei Yang <[email protected]>
> ---
> mm/page_alloc.c | 6 ++----
> 1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c5952749ad40..f27afd517652 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6245,13 +6245,12 @@ static void build_thisnode_zonelists(pg_data_t *pgdat)
> static void build_zonelists(pg_data_t *pgdat)
> {
> static int node_order[MAX_NUMNODES];
> - int node, load, nr_nodes = 0;
> + int node, nr_nodes = 0;
> nodemask_t used_mask = NODE_MASK_NONE;
> int local_node, prev_node;
>
> /* NUMA-aware ordering of nodes */
> local_node = pgdat->node_id;
> - load = nr_online_nodes;
> prev_node = local_node;
>
> memset(node_order, 0, sizeof(node_order));
> @@ -6263,11 +6262,10 @@ static void build_zonelists(pg_data_t *pgdat)
> */
> if (node_distance(local_node, node) !=
> node_distance(local_node, prev_node))
> - node_load[node] += load;
> + node_load[node] += nr_online_nodes;
>
> node_order[nr_nodes++] = node;
> prev_node = node;
> - load--;
> }
>
> build_zonelists_in_node_order(pgdat, node_order, nr_nodes);

2022-04-07 14:01:09

by Wei Yang

[permalink] [raw]
Subject: Re: [PATCH 1/2] mm/page_alloc: add same penalty is enough to get round-robin order

On Tue, Apr 05, 2022 at 07:11:12PM +0200, Vlastimil Babka wrote:
>On 1/23/22 02:35, Wei Yang wrote:
>> To make node order in round-robin in the same distance group, we add a
>> penalty to the first node we got in each round.
>>
>> To get a round-robin order in the same distance group, we don't need to
>> decrease the penalty since:
>>
>> * find_next_best_node() always iterates node in the same order
>> * distance matters more then penalty in find_next_best_node()
>> * in nodes with the same distance, the first one would be picked up
>>
>> So it is fine to increase same penalty when we get the first node in the
>> same distance group.
>
>With that logic I'm not even sure if we need nr_online_nodes as penalty or
>it could be just 1. Would you know?

Yes, it has the same effect.

[ 0.031849] Fallback order for Node 0: 0 1 2 3 4 5 6 7
[ 0.031854] Fallback order for Node 1: 1 2 3 0 5 6 7 4
[ 0.031857] Fallback order for Node 2: 2 3 0 1 6 7 4 5
[ 0.031860] Fallback order for Node 3: 3 0 1 2 7 4 5 6
[ 0.031864] Fallback order for Node 4: 4 5 6 7 0 1 2 3
[ 0.031867] Fallback order for Node 5: 5 6 7 4 1 2 3 0
[ 0.031870] Fallback order for Node 6: 6 7 4 5 2 3 0 1
[ 0.031873] Fallback order for Node 7: 7 4 5 6 3 0 1 2

Do you prefer to set it to 1?

--
Wei Yang
Help you, Help me

2022-04-07 18:10:04

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH 1/2] mm/page_alloc: add same penalty is enough to get round-robin order

On 4/7/22 01:47, Wei Yang wrote:
> On Tue, Apr 05, 2022 at 07:11:12PM +0200, Vlastimil Babka wrote:
>>On 1/23/22 02:35, Wei Yang wrote:
>>> To make node order in round-robin in the same distance group, we add a
>>> penalty to the first node we got in each round.
>>>
>>> To get a round-robin order in the same distance group, we don't need to
>>> decrease the penalty since:
>>>
>>> * find_next_best_node() always iterates node in the same order
>>> * distance matters more then penalty in find_next_best_node()
>>> * in nodes with the same distance, the first one would be picked up
>>>
>>> So it is fine to increase same penalty when we get the first node in the
>>> same distance group.
>>
>>With that logic I'm not even sure if we need nr_online_nodes as penalty or
>>it could be just 1. Would you know?
>
> Yes, it has the same effect.

Good.

> [ 0.031849] Fallback order for Node 0: 0 1 2 3 4 5 6 7
> [ 0.031854] Fallback order for Node 1: 1 2 3 0 5 6 7 4
> [ 0.031857] Fallback order for Node 2: 2 3 0 1 6 7 4 5
> [ 0.031860] Fallback order for Node 3: 3 0 1 2 7 4 5 6
> [ 0.031864] Fallback order for Node 4: 4 5 6 7 0 1 2 3
> [ 0.031867] Fallback order for Node 5: 5 6 7 4 1 2 3 0
> [ 0.031870] Fallback order for Node 6: 6 7 4 5 2 3 0 1
> [ 0.031873] Fallback order for Node 7: 7 4 5 6 3 0 1 2
>
> Do you prefer to set it to 1?

Yeah I think it's worth simplyfing as much as feasible, so the code is more
obvious. I think we can also then remove the MAX_NODE_LOAD #define and usage.

Also please Cc at least Oscar and David (added to Cc now) on v2 as they have
been active in memory hotplug area recently.

Thanks,
Vlastimil