2020-06-16 11:54:41

by David Hildenbrand

Subject: [PATCH v1 0/3] mm/shuffle: fix and cleanups

Patch #1 is a fix for overlapping zones and offline sections. Patch #2
stops shuffling the complete zone when onlining a memory block. Patch #3
removes dynamic reconfiguration, which is currently dead code (and it's
unclear if this will be needed in the near future).

David Hildenbrand (3):
mm/shuffle: don't move pages between zones and don't read garbage memmaps
mm/memory_hotplug: don't shuffle complete zone when onlining memory
mm/shuffle: remove dynamic reconfiguration

mm/memory_hotplug.c | 3 ---
mm/shuffle.c | 48 ++++++++++++---------------------------------
mm/shuffle.h | 29 ---------------------------
3 files changed, 12 insertions(+), 68 deletions(-)

--
2.26.2


2020-06-16 11:55:13

by David Hildenbrand

Subject: [PATCH v1 2/3] mm/memory_hotplug: don't shuffle complete zone when onlining memory

Commit e900a918b098 ("mm: shuffle initial free memory to improve
memory-side-cache utilization") introduced shuffling of free pages
during system boot and whenever we online memory blocks.

However, whenever we online memory blocks, all pages that will be
exposed to the buddy end up getting freed via __free_one_page(). In the
general case, we free these pages in MAX_ORDER - 1 chunks, which
corresponds to the shuffle order.

Inside __free_one_page(), we will already shuffle the newly onlined pages
using "to_tail = shuffle_pick_tail();". Drop explicit zone shuffling on
memory hotplug.
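
For reference, the tail of __free_one_page() looks roughly like this at
this point (condensed from mm/page_alloc.c; helper names may differ
between kernel versions):

	if (is_shuffle_order(order))
		/* one random bit per freed chunk picks head vs. tail */
		to_tail = shuffle_pick_tail();
	else
		to_tail = buddy_merge_likely(pfn, buddy_pfn, page, order);

	if (to_tail)
		add_to_free_list_tail(page, zone, order, migratetype);
	else
		add_to_free_list(page, zone, order, migratetype);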

Note: When hotplugging a DIMM, each memory block (e.g., 128MB .. 2G on
x86-64) will get onlined individually, resulting in a shuffle_zone() for
every memory block getting onlined.

Cc: Andrew Morton <[email protected]>
Cc: Alexander Duyck <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Mel Gorman <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
mm/memory_hotplug.c | 3 ---
mm/shuffle.c | 2 +-
mm/shuffle.h | 12 ------------
3 files changed, 1 insertion(+), 16 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 9b34e03e730a4..845a517649c71 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -40,7 +40,6 @@
#include <asm/tlbflush.h>

#include "internal.h"
-#include "shuffle.h"

/*
* online_page_callback contains pointer to current page onlining function.
@@ -822,8 +821,6 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages,
zone->zone_pgdat->node_present_pages += onlined_pages;
pgdat_resize_unlock(zone->zone_pgdat, &flags);

- shuffle_zone(zone);
-
node_states_set_node(nid, &arg);
if (need_zonelists_rebuild)
build_all_zonelists(NULL);
diff --git a/mm/shuffle.c b/mm/shuffle.c
index dd13ab851b3ee..609c26aa57db0 100644
--- a/mm/shuffle.c
+++ b/mm/shuffle.c
@@ -180,7 +180,7 @@ void __meminit __shuffle_free_memory(pg_data_t *pgdat)
struct zone *z;

for (z = pgdat->node_zones; z < pgdat->node_zones + MAX_NR_ZONES; z++)
- shuffle_zone(z);
+ __shuffle_zone(z);
}

bool shuffle_pick_tail(void)
diff --git a/mm/shuffle.h b/mm/shuffle.h
index 4d79f03b6658f..657e2b9ec38dd 100644
--- a/mm/shuffle.h
+++ b/mm/shuffle.h
@@ -30,14 +30,6 @@ static inline void shuffle_free_memory(pg_data_t *pgdat)
__shuffle_free_memory(pgdat);
}

-extern void __shuffle_zone(struct zone *z);
-static inline void shuffle_zone(struct zone *z)
-{
- if (!static_branch_unlikely(&page_alloc_shuffle_key))
- return;
- __shuffle_zone(z);
-}
-
static inline bool is_shuffle_order(int order)
{
if (!static_branch_unlikely(&page_alloc_shuffle_key))
@@ -54,10 +46,6 @@ static inline void shuffle_free_memory(pg_data_t *pgdat)
{
}

-static inline void shuffle_zone(struct zone *z)
-{
-}
-
static inline void page_alloc_shuffle(enum mm_shuffle_ctl ctl)
{
}
--
2.26.2

2020-06-16 11:55:37

by David Hildenbrand

Subject: [PATCH v1 3/3] mm/shuffle: remove dynamic reconfiguration

Commit e900a918b098 ("mm: shuffle initial free memory to improve
memory-side-cache utilization") promised "autodetection of a
memory-side-cache (to be added in a follow-on patch)" over a year ago.

The original series included patches [1], however, they were dropped
during review [2] to be followed-up later.

Let's simplify for now and re-add when really (ever?) needed.

[1] https://lkml.kernel.org/r/154510700291.1941238.817190985966612531.stgit@dwillia2-desk3.amr.corp.intel.com/
[2] https://lkml.kernel.org/r/154690326478.676627.103843791978176914.stgit@dwillia2-desk3.amr.corp.intel.com/

Cc: Andrew Morton <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Huang Ying <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Mel Gorman <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
mm/shuffle.c | 28 ++--------------------------
mm/shuffle.h | 17 -----------------
2 files changed, 2 insertions(+), 43 deletions(-)

diff --git a/mm/shuffle.c b/mm/shuffle.c
index 609c26aa57db0..702c0d5cf276c 100644
--- a/mm/shuffle.c
+++ b/mm/shuffle.c
@@ -10,33 +10,11 @@
#include "shuffle.h"

DEFINE_STATIC_KEY_FALSE(page_alloc_shuffle_key);
-static unsigned long shuffle_state __ro_after_init;
-
-/*
- * Depending on the architecture, module parameter parsing may run
- * before, or after the cache detection. SHUFFLE_FORCE_DISABLE prevents,
- * or reverts the enabling of the shuffle implementation. SHUFFLE_ENABLE
- * attempts to turn on the implementation, but aborts if it finds
- * SHUFFLE_FORCE_DISABLE already set.
- */
-__meminit void page_alloc_shuffle(enum mm_shuffle_ctl ctl)
-{
- if (ctl == SHUFFLE_FORCE_DISABLE)
- set_bit(SHUFFLE_FORCE_DISABLE, &shuffle_state);
-
- if (test_bit(SHUFFLE_FORCE_DISABLE, &shuffle_state)) {
- if (test_and_clear_bit(SHUFFLE_ENABLE, &shuffle_state))
- static_branch_disable(&page_alloc_shuffle_key);
- } else if (ctl == SHUFFLE_ENABLE
- && !test_and_set_bit(SHUFFLE_ENABLE, &shuffle_state))
- static_branch_enable(&page_alloc_shuffle_key);
-}

static bool shuffle_param;
static int shuffle_show(char *buffer, const struct kernel_param *kp)
{
- return sprintf(buffer, "%c\n", test_bit(SHUFFLE_ENABLE, &shuffle_state)
- ? 'Y' : 'N');
+ return sprintf(buffer, "%c\n", shuffle_param ? 'Y' : 'N');
}

static __meminit int shuffle_store(const char *val,
@@ -47,9 +25,7 @@ static __meminit int shuffle_store(const char *val,
if (rc < 0)
return rc;
if (shuffle_param)
- page_alloc_shuffle(SHUFFLE_ENABLE);
- else
- page_alloc_shuffle(SHUFFLE_FORCE_DISABLE);
+ static_branch_enable(&page_alloc_shuffle_key);
return 0;
}
module_param_call(shuffle, shuffle_store, shuffle_show, &shuffle_param, 0400);
diff --git a/mm/shuffle.h b/mm/shuffle.h
index 657e2b9ec38dd..5574cbd4611e3 100644
--- a/mm/shuffle.h
+++ b/mm/shuffle.h
@@ -4,23 +4,10 @@
#define _MM_SHUFFLE_H
#include <linux/jump_label.h>

-/*
- * SHUFFLE_ENABLE is called from the command line enabling path, or by
- * platform-firmware enabling that indicates the presence of a
- * direct-mapped memory-side-cache. SHUFFLE_FORCE_DISABLE is called from
- * the command line path and overrides any previous or future
- * SHUFFLE_ENABLE.
- */
-enum mm_shuffle_ctl {
- SHUFFLE_ENABLE,
- SHUFFLE_FORCE_DISABLE,
-};
-
#define SHUFFLE_ORDER (MAX_ORDER-1)

#ifdef CONFIG_SHUFFLE_PAGE_ALLOCATOR
DECLARE_STATIC_KEY_FALSE(page_alloc_shuffle_key);
-extern void page_alloc_shuffle(enum mm_shuffle_ctl ctl);
extern void __shuffle_free_memory(pg_data_t *pgdat);
extern bool shuffle_pick_tail(void);
static inline void shuffle_free_memory(pg_data_t *pgdat)
@@ -46,10 +33,6 @@ static inline void shuffle_free_memory(pg_data_t *pgdat)
{
}

-static inline void page_alloc_shuffle(enum mm_shuffle_ctl ctl)
-{
-}
-
static inline bool is_shuffle_order(int order)
{
return false;
--
2.26.2

2020-06-16 12:43:34

by Michal Hocko

Subject: Re: [PATCH v1 3/3] mm/shuffle: remove dynamic reconfiguration

[Add Dan]

On Tue 16-06-20 13:52:13, David Hildenbrand wrote:
> Commit e900a918b098 ("mm: shuffle initial free memory to improve
> memory-side-cache utilization") promised "autodetection of a
> memory-side-cache (to be added in a follow-on patch)" over a year ago.
>
> The original series included patches [1], however, they were dropped
> during review [2] to be followed-up later.
>
> Let's simplify for now and re-add when really (ever?) needed.
>
> [1] https://lkml.kernel.org/r/154510700291.1941238.817190985966612531.stgit@dwillia2-desk3.amr.corp.intel.com/
> [2] https://lkml.kernel.org/r/154690326478.676627.103843791978176914.stgit@dwillia2-desk3.amr.corp.intel.com/
>
> Cc: Andrew Morton <[email protected]>
> Cc: Johannes Weiner <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Minchan Kim <[email protected]>
> Cc: Huang Ying <[email protected]>
> Cc: Wei Yang <[email protected]>
> Cc: Keith Busch <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>

While I am not against removing this unused code, I am really curious
what the future of the auto detection is. Has this just fallen through
the cracks, or are there more serious problems with making detection
possible/reliable?

> ---
> mm/shuffle.c | 28 ++--------------------------
> mm/shuffle.h | 17 -----------------
> 2 files changed, 2 insertions(+), 43 deletions(-)
>
> diff --git a/mm/shuffle.c b/mm/shuffle.c
> index 609c26aa57db0..702c0d5cf276c 100644
> --- a/mm/shuffle.c
> +++ b/mm/shuffle.c
> @@ -10,33 +10,11 @@
> #include "shuffle.h"
>
> DEFINE_STATIC_KEY_FALSE(page_alloc_shuffle_key);
> -static unsigned long shuffle_state __ro_after_init;
> -
> -/*
> - * Depending on the architecture, module parameter parsing may run
> - * before, or after the cache detection. SHUFFLE_FORCE_DISABLE prevents,
> - * or reverts the enabling of the shuffle implementation. SHUFFLE_ENABLE
> - * attempts to turn on the implementation, but aborts if it finds
> - * SHUFFLE_FORCE_DISABLE already set.
> - */
> -__meminit void page_alloc_shuffle(enum mm_shuffle_ctl ctl)
> -{
> - if (ctl == SHUFFLE_FORCE_DISABLE)
> - set_bit(SHUFFLE_FORCE_DISABLE, &shuffle_state);
> -
> - if (test_bit(SHUFFLE_FORCE_DISABLE, &shuffle_state)) {
> - if (test_and_clear_bit(SHUFFLE_ENABLE, &shuffle_state))
> - static_branch_disable(&page_alloc_shuffle_key);
> - } else if (ctl == SHUFFLE_ENABLE
> - && !test_and_set_bit(SHUFFLE_ENABLE, &shuffle_state))
> - static_branch_enable(&page_alloc_shuffle_key);
> -}
>
> static bool shuffle_param;
> static int shuffle_show(char *buffer, const struct kernel_param *kp)
> {
> - return sprintf(buffer, "%c\n", test_bit(SHUFFLE_ENABLE, &shuffle_state)
> - ? 'Y' : 'N');
> + return sprintf(buffer, "%c\n", shuffle_param ? 'Y' : 'N');
> }
>
> static __meminit int shuffle_store(const char *val,
> @@ -47,9 +25,7 @@ static __meminit int shuffle_store(const char *val,
> if (rc < 0)
> return rc;
> if (shuffle_param)
> - page_alloc_shuffle(SHUFFLE_ENABLE);
> - else
> - page_alloc_shuffle(SHUFFLE_FORCE_DISABLE);
> + static_branch_enable(&page_alloc_shuffle_key);
> return 0;
> }
> module_param_call(shuffle, shuffle_store, shuffle_show, &shuffle_param, 0400);
> diff --git a/mm/shuffle.h b/mm/shuffle.h
> index 657e2b9ec38dd..5574cbd4611e3 100644
> --- a/mm/shuffle.h
> +++ b/mm/shuffle.h
> @@ -4,23 +4,10 @@
> #define _MM_SHUFFLE_H
> #include <linux/jump_label.h>
>
> -/*
> - * SHUFFLE_ENABLE is called from the command line enabling path, or by
> - * platform-firmware enabling that indicates the presence of a
> - * direct-mapped memory-side-cache. SHUFFLE_FORCE_DISABLE is called from
> - * the command line path and overrides any previous or future
> - * SHUFFLE_ENABLE.
> - */
> -enum mm_shuffle_ctl {
> - SHUFFLE_ENABLE,
> - SHUFFLE_FORCE_DISABLE,
> -};
> -
> #define SHUFFLE_ORDER (MAX_ORDER-1)
>
> #ifdef CONFIG_SHUFFLE_PAGE_ALLOCATOR
> DECLARE_STATIC_KEY_FALSE(page_alloc_shuffle_key);
> -extern void page_alloc_shuffle(enum mm_shuffle_ctl ctl);
> extern void __shuffle_free_memory(pg_data_t *pgdat);
> extern bool shuffle_pick_tail(void);
> static inline void shuffle_free_memory(pg_data_t *pgdat)
> @@ -46,10 +33,6 @@ static inline void shuffle_free_memory(pg_data_t *pgdat)
> {
> }
>
> -static inline void page_alloc_shuffle(enum mm_shuffle_ctl ctl)
> -{
> -}
> -
> static inline bool is_shuffle_order(int order)
> {
> return false;
> --
> 2.26.2

--
Michal Hocko
SUSE Labs

2020-06-16 12:55:06

by Michal Hocko

Subject: Re: [PATCH v1 2/3] mm/memory_hotplug: don't shuffle complete zone when onlining memory

On Tue 16-06-20 13:52:12, David Hildenbrand wrote:
> Commit e900a918b098 ("mm: shuffle initial free memory to improve
> memory-side-cache utilization") introduced shuffling of free pages
> during system boot and whenever we online memory blocks.
>
> However, whenever we online memory blocks, all pages that will be
> exposed to the buddy end up getting freed via __free_one_page(). In the
> general case, we free these pages in MAX_ORDER - 1 chunks, which
> corresponds to the shuffle order.
>
> Inside __free_one_page(), we will already shuffle the newly onlined pages
> using "to_tail = shuffle_pick_tail();". Drop explicit zone shuffling on
> memory hotplug.
>
> Note: When hotplugging a DIMM, each memory block (e.g., 128MB .. 2G on
> x86-64) will get onlined individually, resulting in a shuffle_zone() for
> every memory block getting onlined.
>
> Cc: Andrew Morton <[email protected]>
> Cc: Alexander Duyck <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Kees Cook <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>

Acked-by: Michal Hocko <[email protected]>

> ---
> mm/memory_hotplug.c | 3 ---
> mm/shuffle.c | 2 +-
> mm/shuffle.h | 12 ------------
> 3 files changed, 1 insertion(+), 16 deletions(-)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 9b34e03e730a4..845a517649c71 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -40,7 +40,6 @@
> #include <asm/tlbflush.h>
>
> #include "internal.h"
> -#include "shuffle.h"
>
> /*
> * online_page_callback contains pointer to current page onlining function.
> @@ -822,8 +821,6 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages,
> zone->zone_pgdat->node_present_pages += onlined_pages;
> pgdat_resize_unlock(zone->zone_pgdat, &flags);
>
> - shuffle_zone(zone);
> -
> node_states_set_node(nid, &arg);
> if (need_zonelists_rebuild)
> build_all_zonelists(NULL);
> diff --git a/mm/shuffle.c b/mm/shuffle.c
> index dd13ab851b3ee..609c26aa57db0 100644
> --- a/mm/shuffle.c
> +++ b/mm/shuffle.c
> @@ -180,7 +180,7 @@ void __meminit __shuffle_free_memory(pg_data_t *pgdat)
> struct zone *z;
>
> for (z = pgdat->node_zones; z < pgdat->node_zones + MAX_NR_ZONES; z++)
> - shuffle_zone(z);
> + __shuffle_zone(z);
> }
>
> bool shuffle_pick_tail(void)
> diff --git a/mm/shuffle.h b/mm/shuffle.h
> index 4d79f03b6658f..657e2b9ec38dd 100644
> --- a/mm/shuffle.h
> +++ b/mm/shuffle.h
> @@ -30,14 +30,6 @@ static inline void shuffle_free_memory(pg_data_t *pgdat)
> __shuffle_free_memory(pgdat);
> }
>
> -extern void __shuffle_zone(struct zone *z);
> -static inline void shuffle_zone(struct zone *z)
> -{
> - if (!static_branch_unlikely(&page_alloc_shuffle_key))
> - return;
> - __shuffle_zone(z);
> -}
> -
> static inline bool is_shuffle_order(int order)
> {
> if (!static_branch_unlikely(&page_alloc_shuffle_key))
> @@ -54,10 +46,6 @@ static inline void shuffle_free_memory(pg_data_t *pgdat)
> {
> }
>
> -static inline void shuffle_zone(struct zone *z)
> -{
> -}
> -
> static inline void page_alloc_shuffle(enum mm_shuffle_ctl ctl)
> {
> }
> --
> 2.26.2

--
Michal Hocko
SUSE Labs

2020-06-16 13:48:17

by David Hildenbrand

Subject: Re: [PATCH v1 3/3] mm/shuffle: remove dynamic reconfiguration

On 16.06.20 14:41, Michal Hocko wrote:
> [Add Dan]

Whoops, dropped by mistake. Thanks for adding.

>
> On Tue 16-06-20 13:52:13, David Hildenbrand wrote:
>> Commit e900a918b098 ("mm: shuffle initial free memory to improve
>> memory-side-cache utilization") promised "autodetection of a
>> memory-side-cache (to be added in a follow-on patch)" over a year ago.
>>
>> The original series included patches [1], however, they were dropped
>> during review [2] to be followed-up later.
>>
>> Let's simplify for now and re-add when really (ever?) needed.
>>
>> [1] https://lkml.kernel.org/r/154510700291.1941238.817190985966612531.stgit@dwillia2-desk3.amr.corp.intel.com/
>> [2] https://lkml.kernel.org/r/154690326478.676627.103843791978176914.stgit@dwillia2-desk3.amr.corp.intel.com/
>>
>> Cc: Andrew Morton <[email protected]>
>> Cc: Johannes Weiner <[email protected]>
>> Cc: Michal Hocko <[email protected]>
>> Cc: Minchan Kim <[email protected]>
>> Cc: Huang Ying <[email protected]>
>> Cc: Wei Yang <[email protected]>
>> Cc: Keith Busch <[email protected]>
>> Cc: Mel Gorman <[email protected]>
>> Signed-off-by: David Hildenbrand <[email protected]>
>
> While I am not against removing this unused code, I am really curious
> what the future of the auto detection is. Has this just fallen through
> the cracks, or are there more serious problems with making detection
> possible/reliable?

From the bouncing mails I assume Keith - author of the original patches
in [1] - is no longer working at Intel (or I messed up :) "#5.1.0
Address rejected"). Maybe Dan can clarify what the future of this is.

--
Thanks,

David / dhildenb

2020-06-16 17:03:04

by Dan Williams

Subject: Re: [PATCH v1 3/3] mm/shuffle: remove dynamic reconfiguration

On Tue, Jun 16, 2020 at 6:45 AM David Hildenbrand <[email protected]> wrote:
>
> On 16.06.20 14:41, Michal Hocko wrote:
> > [Add Dan]
>
> Whoops, dropped by mistake. Thanks for adding.
>
> >
> > On Tue 16-06-20 13:52:13, David Hildenbrand wrote:
> >> Commit e900a918b098 ("mm: shuffle initial free memory to improve
> >> memory-side-cache utilization") promised "autodetection of a
> >> memory-side-cache (to be added in a follow-on patch)" over a year ago.
> >>
> >> The original series included patches [1], however, they were dropped
> >> during review [2] to be followed-up later.
> >>
> >> Let's simplify for now and re-add when really (ever?) needed.
> >>
> >> [1] https://lkml.kernel.org/r/154510700291.1941238.817190985966612531.stgit@dwillia2-desk3.amr.corp.intel.com/
> >> [2] https://lkml.kernel.org/r/154690326478.676627.103843791978176914.stgit@dwillia2-desk3.amr.corp.intel.com/
> >>
> >> Cc: Andrew Morton <[email protected]>
> >> Cc: Johannes Weiner <[email protected]>
> >> Cc: Michal Hocko <[email protected]>
> >> Cc: Minchan Kim <[email protected]>
> >> Cc: Huang Ying <[email protected]>
> >> Cc: Wei Yang <[email protected]>
> >> Cc: Keith Busch <[email protected]>
> >> Cc: Mel Gorman <[email protected]>
> >> Signed-off-by: David Hildenbrand <[email protected]>
> >
> > While I am not against removing this unused code, I am really curious
> > what the future of the auto detection is. Has this just fallen through
> > the cracks, or are there more serious problems with making detection
> > possible/reliable?
>
> From the bouncing mails I assume Keith - author of the original patches
> in [1] - is no longer working at Intel (or I messed up :) "#5.1.0
> Address rejected"). Maybe Dan can clarify what the future of this is.

People are actively using this via the command line option. In fact it
has saved some people who have deployed platforms with mismatched
DIMM populations where no HMAT autodetection would be possible. It's
also been useful for mitigating other memory interleaving performance
problems that are sensitive to a given address line.

The reason this is still manual is the dearth of platforms that
publish an HMAT, but again the command line option is actively
mitigating performance issues or otherwise allowing for testing a
"pre-aged" memory allocator free list.

2020-06-16 17:04:02

by Dan Williams

Subject: Re: [PATCH v1 2/3] mm/memory_hotplug: don't shuffle complete zone when onlining memory

On Tue, Jun 16, 2020 at 5:51 AM Michal Hocko <[email protected]> wrote:
>
> On Tue 16-06-20 13:52:12, David Hildenbrand wrote:
> > Commit e900a918b098 ("mm: shuffle initial free memory to improve
> > memory-side-cache utilization") introduced shuffling of free pages
> > during system boot and whenever we online memory blocks.
> >
> > However, whenever we online memory blocks, all pages that will be
> > exposed to the buddy end up getting freed via __free_one_page(). In the
> > general case, we free these pages in MAX_ORDER - 1 chunks, which
> > corresponds to the shuffle order.
> >
> > Inside __free_one_page(), we will already shuffle the newly onlined pages
> > using "to_tail = shuffle_pick_tail();". Drop explicit zone shuffling on
> > memory hotplug.
> >
> > Note: When hotplugging a DIMM, each memory block (e.g., 128MB .. 2G on
> > x86-64) will get onlined individually, resulting in a shuffle_zone() for
> > every memory block getting onlined.
> >
> > Cc: Andrew Morton <[email protected]>
> > Cc: Alexander Duyck <[email protected]>
> > Cc: Dan Williams <[email protected]>
> > Cc: Michal Hocko <[email protected]>
> > Cc: Dave Hansen <[email protected]>
> > Cc: Kees Cook <[email protected]>
> > Cc: Mel Gorman <[email protected]>
> > Signed-off-by: David Hildenbrand <[email protected]>
>
> Acked-by: Michal Hocko <[email protected]>

Nacked-by: Dan Williams <[email protected]>

As explained in another thread this is in active use.

2020-06-16 17:08:16

by Dan Williams

Subject: Re: [PATCH v1 2/3] mm/memory_hotplug: don't shuffle complete zone when onlining memory

On Tue, Jun 16, 2020 at 10:00 AM Dan Williams <[email protected]> wrote:
>
> On Tue, Jun 16, 2020 at 5:51 AM Michal Hocko <[email protected]> wrote:
> >
> > On Tue 16-06-20 13:52:12, David Hildenbrand wrote:
> > > Commit e900a918b098 ("mm: shuffle initial free memory to improve
> > > memory-side-cache utilization") introduced shuffling of free pages
> > > during system boot and whenever we online memory blocks.
> > >
> > > However, whenever we online memory blocks, all pages that will be
> > > exposed to the buddy end up getting freed via __free_one_page(). In the
> > > general case, we free these pages in MAX_ORDER - 1 chunks, which
> > > corresponds to the shuffle order.
> > >
> > > Inside __free_one_page(), we will already shuffle the newly onlined pages
> > > using "to_tail = shuffle_pick_tail();". Drop explicit zone shuffling on
> > > memory hotplug.

This was already explained in the initial patch submission. The
shuffle_pick_tail() shuffling at run time is only sufficient for
maintaining the shuffle. It's not sufficient for effectively
randomizing the free list.

See:

e900a918b098 mm: shuffle initial free memory to improve
memory-side-cache utilization

This initial randomization can be undone over time so a follow-on patch is
introduced to inject entropy on page free decisions. It is reasonable to
ask if the page free entropy is sufficient, but it is not enough due to
the in-order initial freeing of pages. At the start of that process
putting page1 in front or behind page0 still keeps them close together,
page2 is still near page1 and has a high chance of being adjacent. As
more pages are added ordering diversity improves, but there is still high
page locality for the low address pages and this leads to no significant
impact to the cache conflict rate.
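
To make the locality argument concrete, here is a standalone toy model
(illustrative userspace code, not the kernel implementation): "free"
pages 0..N-1 in order with a per-free head/tail coin flip, then count
how many list neighbors are still address neighbors. Roughly half of
the pairs stay adjacent, versus ~2/N for a full shuffle:

	#include <stdio.h>
	#include <stdlib.h>

	#define N 1024

	int main(void)
	{
		static int list[2 * N];	/* grows from the middle in both directions */
		int head = N, tail = N;
		int i, adjacent = 0;

		srand(1);
		for (i = 0; i < N; i++) {
			if (rand() & 1)
				list[--head] = i;	/* insert at head */
			else
				list[tail++] = i;	/* insert at tail */
		}
		for (i = head; i < tail - 1; i++)
			if (abs(list[i] - list[i + 1]) == 1)
				adjacent++;
		printf("%d of %d list neighbors are still address-adjacent\n",
		       adjacent, N - 1);
		return 0;
	}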

2020-06-16 18:27:02

by David Hildenbrand

Subject: Re: [PATCH v1 2/3] mm/memory_hotplug: don't shuffle complete zone when onlining memory

On 16.06.20 19:03, Dan Williams wrote:
> On Tue, Jun 16, 2020 at 10:00 AM Dan Williams <[email protected]> wrote:
>>
>> On Tue, Jun 16, 2020 at 5:51 AM Michal Hocko <[email protected]> wrote:
>>>
>>> On Tue 16-06-20 13:52:12, David Hildenbrand wrote:
>>>> Commit e900a918b098 ("mm: shuffle initial free memory to improve
>>>> memory-side-cache utilization") introduced shuffling of free pages
>>>> during system boot and whenever we online memory blocks.
>>>>
>>>> However, whenever we online memory blocks, all pages that will be
>>>> exposed to the buddy end up getting freed via __free_one_page(). In the
>>>> general case, we free these pages in MAX_ORDER - 1 chunks, which
>>>> corresponds to the shuffle order.
>>>>
>>>> Inside __free_one_page(), we will already shuffle the newly onlined pages
>>>> using "to_tail = shuffle_pick_tail();". Drop explicit zone shuffling on
>>>> memory hotplug.
>
> This was already explained in the initial patch submission. The
> shuffle_pick_tail() shuffling at run time is only sufficient for
> maintaining the shuffle. It's not sufficient for effectively
> randomizing the free list.

Thanks for pointing that out. Something like that should definitely go
into the code in the form of a comment. I'll think of something.

Two things:

1. The "shuffle when onlining" is really sub-optimal. Assume you hotplug
a 64GB DIMM. With 128MB memory blocks you get 512 zone shuffles. I guess
something better would have been to schedule a reshuffle some ms in the
future.

2. You'll run into the same issue whenever you free a bigger,
consecutive range, like free_contig_range()'ing one or multiple
gigantic pages. We might want to reshuffle (or schedule a reshuffle)
here as well.
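
To illustrate 1., a completely untested sketch (assumes a new, made-up
"shuffle_work" delayed_work member in struct zone, initialized during
zone init):

	/* needs linux/workqueue.h; coalesces per-block shuffles into one
	 * deferred zone reshuffle instead of shuffling per onlined block */
	static void zone_shuffle_workfn(struct work_struct *work)
	{
		struct zone *zone = container_of(to_delayed_work(work),
						 struct zone, shuffle_work);

		__shuffle_zone(zone);
	}

	/* in online_pages(), instead of an immediate shuffle_zone(zone) */
	mod_delayed_work(system_unbound_wq, &zone->shuffle_work,
			 msecs_to_jiffies(100));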

--
Thanks,

David / dhildenb

2020-06-17 06:51:21

by Michal Hocko

Subject: Re: [PATCH v1 2/3] mm/memory_hotplug: don't shuffle complete zone when onlining memory

On Tue 16-06-20 10:03:31, Dan Williams wrote:
> On Tue, Jun 16, 2020 at 10:00 AM Dan Williams <[email protected]> wrote:
> >
> > On Tue, Jun 16, 2020 at 5:51 AM Michal Hocko <[email protected]> wrote:
> > >
> > > On Tue 16-06-20 13:52:12, David Hildenbrand wrote:
> > > > Commit e900a918b098 ("mm: shuffle initial free memory to improve
> > > > memory-side-cache utilization") introduced shuffling of free pages
> > > > during system boot and whenever we online memory blocks.
> > > >
> > > > However, whenever we online memory blocks, all pages that will be
> > > > exposed to the buddy end up getting freed via __free_one_page(). In the
> > > > general case, we free these pages in MAX_ORDER - 1 chunks, which
> > > > corresponds to the shuffle order.
> > > >
> > > > Inside __free_one_page(), we will already shuffle the newly onlined pages
> > > > using "to_tail = shuffle_pick_tail();". Drop explicit zone shuffling on
> > > > memory hotplug.
>
> This was already explained in the initial patch submission. The
> shuffle_pick_tail() shuffling at run time is only sufficient for
> maintaining the shuffle. It's not sufficient for effectively
> randomizing the free list.

Yes, the "randomness" of the added memory will be lower. But is this
observable for hotplug scenarios? Is memory hotplug for the normal
memory really a thing in setups which use RAM as a cache?

While I do agree that, code wise, the shuffling per online operation
doesn't really have any overhead, it would be really great to know
whether it matters at all.
--
Michal Hocko
SUSE Labs

2020-06-17 19:08:34

by Dan Williams

Subject: Re: [PATCH v1 2/3] mm/memory_hotplug: don't shuffle complete zone when onlining memory

On Tue, Jun 16, 2020 at 11:48 PM Michal Hocko <[email protected]> wrote:
>
> On Tue 16-06-20 10:03:31, Dan Williams wrote:
> > On Tue, Jun 16, 2020 at 10:00 AM Dan Williams <[email protected]> wrote:
> > >
> > > On Tue, Jun 16, 2020 at 5:51 AM Michal Hocko <[email protected]> wrote:
> > > >
> > > > On Tue 16-06-20 13:52:12, David Hildenbrand wrote:
> > > > > Commit e900a918b098 ("mm: shuffle initial free memory to improve
> > > > > memory-side-cache utilization") introduced shuffling of free pages
> > > > > during system boot and whenever we online memory blocks.
> > > > >
> > > > > However, whenever we online memory blocks, all pages that will be
> > > > > exposed to the buddy end up getting freed via __free_one_page(). In the
> > > > > general case, we free these pages in MAX_ORDER - 1 chunks, which
> > > > > corresponds to the shuffle order.
> > > > >
> > > > > Inside __free_one_page(), we will already shuffle the newly onlined pages
> > > > > using "to_tail = shuffle_pick_tail();". Drop explicit zone shuffling on
> > > > > memory hotplug.
> >
> > This was already explained in the initial patch submission. The
> > shuffle_pick_tail() shuffling at run time is only sufficient for
> > maintaining the shuffle. It's not sufficient for effectively
> > randomizing the free list.
>
> Yes, the "randomness" of the added memory will be lower. But is this
> observable for hotplug scenarios?

I'm not sure of the intersection of platforms using memory hotplug and
shuffling in production.

> Is memory hotplug for the normal
> memory really a thing in setups which use RAM as a cache?

I would point out again though that the utility of shuffling goes
beyond RAM-as-cache. I have seen some cost sensitive customer platform
configurations that asymmetrically populate memory controllers. Think
1 DIMM on controller0 and 2 DIMMs on controller1. In that case Linux
can get into pathological situations where an application is bandwidth
limited because it only accesses the single-DIMM backed memory range.
Shuffling balances accesses across all available memory controllers,
restoring full memory bandwidth for that configuration. So shuffling is
used to solve problems that are otherwise invisible to Linux; there's
no indication from the platform that one memory range has lower
bandwidth than another.

> While I do agree that the code wise the shuffling per online operation
> doesn't really have any overhead really but it would be really great to
> know whether it matters at all.

I agree this is a good test case, especially considering the
"dax_kmem" solution where memory might be reserved from the initial
shuffling and onlined later. I assume there's a cross-over point where
not shuffling hotplugged memory starts to be noticeable. I just don't
have those numbers handy.