2018-02-28 10:55:23

by Chao Fan

[permalink] [raw]
Subject: [PATCH v9 0/5] x86/KASLR: Add parameter kaslr_boot_mem=nn[KMG]@ss[KMG]

Long time no reply, rebase the patchset, change the parameter name
from 'kaslr_mem' to 'kaslr_boot_mem'. There's no more code change.

***Background:
People reported that kaslr may randomly chooses some positions
which are located in movable memory regions. This will break memory
hotplug feature.

And also on kvm guest with 4GB meory, the good unfragmented 1GB could
be occupied by randomized kernel. It will cause hugetlb failing to
allocate 1GB page. While kernel with 'nokaslr' has not such issue.
This causes regression. Please see the discussion mail:
https://lkml.org/lkml/2018/1/4/236

***Solutions:
Introduce a new kernel parameter 'kaslr_boot_mem=nn@ss' to let users to
specify the memory regions where kernel can be allowed to randomize
safely.

E.g if 'movable_node' is spedified, we can use 'kaslr_boot_mem=nn@ss' to
tell KASLR where we can put kernel safely. Then KASLR code can avoid
those movable regions and only choose those immovable regions
specified.

For hugetlb case, users can always add 'kaslr_boot_mem=1G' in kernel
cmdline since the 0~1G is always fragmented region because of BIOS
reserved area. Surely users can specify regions more precisely if
they know system memory very well.

*** Issues need be discussed
There are several issues I am not quite sure, please help review and
give suggestions:

1) Since there's already mem_avoid[] which stores the memory regions
KASLR need avoid. For the regions KASLR can safely use, I name it as
mem_usable[], not sure if it's appropriate. Or kaslr_boot_mem[] directly?

2) In v6, I made 'kaslr_boot_mem=' as a kernel parameter which users can
use to specify memory regions where kenrel can be extracted safely by
'kaslr_boot_mem=nn@ss', or regions where we need avoid to extract kernel by
'kaslr_boot_mem=nn!ss'. While later I rethink about it, seems
'kaslr_boot_mem=nn@ss' can satisfy the current requirement, there's no need
to introduce the 'kaslr_boot_mem=nn!ss'. So I just take that
'kaslr_boot_mem=nn!ss' handling patch off, may add it later if anyone think
it's necessary. Any suggestions?
https://www.spinics.net/lists/kernel/msg2698457.html

***Test results:
- I did some tests for the memory hotplug issues. I specify the memory
region in one node, then I found every time the kernel will be
extracted to the memory of this node.
- Luiz tested this series with a 4GB KVM guest. With kaslr_boot_mem=1G,
got one 1GB page allocated 100% of the time in 85 boots. Without
kaslr_boot_mem=, got 3 failures in only 10 boots (that is, in 3 boots
no 1GB page allocated). So this series solves the 1GB page problem.

***History
v8->v9:
- Rebase in new version.
- Change the name of parameter from 'kaslr_mem' to 'kaslr_boot_mem'

v7->v8:
- Just improve some comments.
- Change the wrong spelling.
- Add the Tested-by and Acked-by.

v6->v7:
- Drop the unnecessary avoid part for now.
- Add document for the new parameter.

v5->v6:
- Add the last patch to save the avoid memory regions.

v4->v5:
- Change the problem reported by LKP
Follow Dou's suggestion:
- Also return if match "movable_node" when parsing kernel commandline
in handle_mem_filter without define CONFIG_MEMORY_HOTPLUG

v3->v4:
Follow Kees's suggestion:
- Put the functions variables of immovable_mem to #ifdef
CONFIG_MEMORY_HOTPLUG and change some code place
- Change the name of "process_mem_region" to "slots_count"
- Reanme the new function "process_immovable_mem" to "process_mem_region"
Follow Baoquan's suggestion:
- Fail KASLR if "movable_node" specified without "immovable_mem"
- Ajust the code place of handling mem_region directely if no
immovable_mem specified
Follow Randy's suggestion:
- Change the mistake and add detailed description for the document.

v2->v3:
Follow Baoquan He's suggestion:
- Change names of several functions.
- Add a new parameter "immovable_mem" instead of extending mvoable_node
- Use the clamp to calculate the memory intersecting, which makes
logical more clear.
- Disable memory mirror if movable_node specified

v1->v2:
Follow Dou Liyang's suggestion:
- Add the parse for movable_node=nn[KMG] without @ss[KMG]
- Fix the bug for more than one "movable_node=" specified
- Drop useless variables and use mem_vector region directely
- Add more comments.

Chao Fan (5):
x86/KASLR: Add kaslr_boot_mem=nn[KMG]@ss[KMG]
x86/KASLR: Handle the memory regions specified in kaslr_boot_mem
x86/KASLR: Give a warning if movable_node specified without
kaslr_boot_mem=
x86/KASLR: Skip memory mirror handling if movable_node specified
document: add document for kaslr_boot_mem

Documentation/admin-guide/kernel-parameters.txt | 10 ++
arch/x86/boot/compressed/kaslr.c | 154 +++++++++++++++++++++---
2 files changed, 150 insertions(+), 14 deletions(-)

--
2.14.3





2018-02-28 10:52:53

by Chao Fan

[permalink] [raw]
Subject: [PATCH v9 5/5] document: add document for kaslr_boot_mem

Cc: [email protected]
Cc: Jonathan Corbet <[email protected]>
Cc: Randy Dunlap <[email protected]>
Signed-off-by: Chao Fan <[email protected]>
---
Documentation/admin-guide/kernel-parameters.txt | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index b37c1c30c16f..23232c9ffa39 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2389,6 +2389,16 @@
allocations which rules out almost all kernel
allocations. Use with caution!

+ kaslr_boot_mem=nn[KMG][@ss[KMG]]
+ [KNL] Force usage of a specific region of memory
+ for KASLR during kernel decompression stage.
+ Region of usable memory is from ss to ss+nn. If ss is
+ omitted, it is equivalent to kaslr_boot_mem=nn[KMG]@0.
+ Multiple regions can be specified, comma delimited.
+ Notice: only support 4 regions at most now.
+ Example:
+ kaslr_boot_mem=1G,500M@2G,1G@4G
+
MTD_Partition= [MTD]
Format: <name>,<region-number>,<size>,<offset>

--
2.14.3




2018-02-28 10:53:06

by Chao Fan

[permalink] [raw]
Subject: [PATCH v9 3/5] x86/KASLR: Give a warning if movable_node specified without kaslr_boot_mem=

Since only 'movable_node' specified without 'kaslr_boot_mem=' may
break memory hotplug, so reconmmend users using 'kaslr_boot_mem='
when 'movable_node' specified.

Signed-off-by: Chao Fan <[email protected]>
Acked-by: Baoquan He <[email protected]>
---
arch/x86/boot/compressed/kaslr.c | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index eb47502fbe54..9fb86248d5c5 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -288,6 +288,16 @@ static int handle_mem_filter(void)
!strstr(args, "kaslr_boot_mem="))
return 0;

+#ifdef CONFIG_MEMORY_HOTPLUG
+ /*
+ * Check if 'kaslr_boot_mem=' specified when 'movable_node' found.
+ * If not, just give a warrning. Otherwise memory hotplug could be
+ * affected if kernel is put on movable memory regions.
+ */
+ if (strstr(args, "movable_node") && !strstr(args, "kaslr_boot_mem="))
+ warn("'kaslr_boot_mem=' should be specified when using 'movable_node'.\n");
+#endif
+
tmp_cmdline = malloc(len + 1);
if (!tmp_cmdline)
error("Failed to allocate space for tmp_cmdline");
--
2.14.3




2018-02-28 10:53:15

by Chao Fan

[permalink] [raw]
Subject: [PATCH v9 4/5] x86/KASLR: Skip memory mirror handling if movable_node specified

In kernel code, if 'movable_node' specified, it will skip the mirror
feature. So also skip mirror feature in KASLR.

Signed-off-by: Chao Fan <[email protected]>
Acked-by: Baoquan He <[email protected]>
---
arch/x86/boot/compressed/kaslr.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index 9fb86248d5c5..d19085dbd6f5 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -698,6 +698,7 @@ static bool
process_efi_entries(unsigned long minimum, unsigned long image_size)
{
struct efi_info *e = &boot_params->efi_info;
+ char *args = (char *)get_cmd_line_ptr();
bool efi_mirror_found = false;
struct mem_vector region;
efi_memory_desc_t *md;
@@ -731,6 +732,12 @@ process_efi_entries(unsigned long minimum, unsigned long image_size)
}
}

+#ifdef CONFIG_MEMORY_HOTPLUG
+ /* Skip memory mirror if 'movabale_node' specified */
+ if (strstr(args, "movable_node"))
+ efi_mirror_found = false;
+#endif
+
for (i = 0; i < nr_desc; i++) {
md = efi_early_memdesc_ptr(pmap, e->efi_memdesc_size, i);

--
2.14.3




2018-02-28 10:53:39

by Chao Fan

[permalink] [raw]
Subject: [PATCH v9 1/5] x86/KASLR: Add kaslr_boot_mem=nn[KMG]@ss[KMG]

Introduce a new kernel parameter kaslr_boot_mem=nn[KMG]@ss[KMG] which is
used by KASLR only during kernel decompression stage.

Users can use it to specify memory regions where kernel can be randomized
into. E.g if movable_node specified in kernel cmdline, kernel could be
extracted into those movable regions, this will make memory hotplug fail.
With the help of 'kaslr_boot_mem=', limit kernel in those immovable regions
specified.

Signed-off-by: Chao Fan <[email protected]>
Tested-by: Luiz Capitulino <[email protected]>
Acked-by: Baoquan He <[email protected]>
---
arch/x86/boot/compressed/kaslr.c | 73 ++++++++++++++++++++++++++++++++++++++--
1 file changed, 70 insertions(+), 3 deletions(-)

diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index 66e42a098d70..e33e5cbf7244 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -114,6 +114,15 @@ enum mem_avoid_index {

static struct mem_vector mem_avoid[MEM_AVOID_MAX];

+/* Only support at most 4 usable memory regions specified for kaslr */
+#define MAX_KASLR_MEM_USABLE 4
+
+/* Store the usable memory regions for kaslr */
+static struct mem_vector mem_usable[MAX_KASLR_MEM_USABLE];
+
+/* The amount of usable regions for kaslr user specify, not more than 4 */
+static int num_usable_region;
+
static bool mem_overlaps(struct mem_vector *one, struct mem_vector *two)
{
/* Item one is entirely before item two. */
@@ -212,7 +221,62 @@ static void mem_avoid_memmap(char *str)
memmap_too_large = true;
}

-static int handle_mem_memmap(void)
+static int parse_kaslr_boot_mem(char *p,
+ unsigned long long *start,
+ unsigned long long *size)
+{
+ char *oldp;
+
+ if (!p)
+ return -EINVAL;
+
+ oldp = p;
+ *size = memparse(p, &p);
+ if (p == oldp)
+ return -EINVAL;
+
+ switch (*p) {
+ case '@':
+ *start = memparse(p + 1, &p);
+ return 0;
+ default:
+ /*
+ * If w/o offset, only size specified, kaslr_boot_mem=nn[KMG]
+ * has the same behaviour as kaslr_boot_mem=nn[KMG]@0. It means
+ * the region starts from 0.
+ */
+ *start = 0;
+ return 0;
+ }
+
+ return -EINVAL;
+}
+
+static void parse_kaslr_boot_mem_regions(char *str)
+{
+ static int i;
+
+ while (str && (i < MAX_KASLR_MEM_USABLE)) {
+ int rc;
+ unsigned long long start, size;
+ char *k = strchr(str, ',');
+
+ if (k)
+ *k++ = 0;
+
+ rc = parse_kaslr_boot_mem(str, &start, &size);
+ if (rc < 0)
+ break;
+ str = k;
+
+ mem_usable[i].start = start;
+ mem_usable[i].size = size;
+ i++;
+ }
+ num_usable_region = i;
+}
+
+static int handle_mem_filter(void)
{
char *args = (char *)get_cmd_line_ptr();
size_t len = strlen((char *)args);
@@ -220,7 +284,8 @@ static int handle_mem_memmap(void)
char *param, *val;
u64 mem_size;

- if (!strstr(args, "memmap=") && !strstr(args, "mem="))
+ if (!strstr(args, "memmap=") && !strstr(args, "mem=") &&
+ !strstr(args, "kaslr_boot_mem="))
return 0;

tmp_cmdline = malloc(len + 1);
@@ -245,6 +310,8 @@ static int handle_mem_memmap(void)

if (!strcmp(param, "memmap")) {
mem_avoid_memmap(val);
+ } else if (!strcmp(param, "kaslr_boot_mem")) {
+ parse_kaslr_boot_mem_regions(val);
} else if (!strcmp(param, "mem")) {
char *p = val;

@@ -384,7 +451,7 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
/* We don't need to set a mapping for setup_data. */

/* Mark the memmap regions we need to avoid */
- handle_mem_memmap();
+ handle_mem_filter();

#ifdef CONFIG_X86_VERBOSE_BOOTUP
/* Make sure video RAM can be used. */
--
2.14.3




2018-02-28 10:54:00

by Chao Fan

[permalink] [raw]
Subject: [PATCH v9 2/5] x86/KASLR: Handle the memory regions specified in kaslr_boot_mem

If no 'kaslr_boot_mem=' specified, just handle the e820/efi entries
directly as before. Otherwise, limit kernel to memory regions
specified in 'kaslr_boot_mem=' commandline.

Rename process_mem_region to slots_count to match
slots_fetch_random, and name new function as process_mem_region.

Signed-off-by: Chao Fan <[email protected]>
Tested-by: Luiz Capitulino <[email protected]>
Acked-by: Baoquan He <[email protected]>
---
arch/x86/boot/compressed/kaslr.c | 64 +++++++++++++++++++++++++++++++++-------
1 file changed, 53 insertions(+), 11 deletions(-)

diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index e33e5cbf7244..eb47502fbe54 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -554,9 +554,9 @@ static unsigned long slots_fetch_random(void)
return 0;
}

-static void process_mem_region(struct mem_vector *entry,
- unsigned long minimum,
- unsigned long image_size)
+static void slots_count(struct mem_vector *entry,
+ unsigned long minimum,
+ unsigned long image_size)
{
struct mem_vector region, overlap;
struct slot_area slot_area;
@@ -633,6 +633,52 @@ static void process_mem_region(struct mem_vector *entry,
}
}

+static bool process_mem_region(struct mem_vector region,
+ unsigned long long minimum,
+ unsigned long long image_size)
+{
+ /*
+ * If 'kaslr_boot_mem=' specified, walk all the regions, and
+ * filter the intersection to slots_count.
+ */
+ if (num_usable_region > 0) {
+ int i;
+
+ for (i = 0; i < num_usable_region; i++) {
+ struct mem_vector entry;
+ unsigned long long start, end, entry_end, region_end;
+
+ start = mem_usable[i].start;
+ end = start + mem_usable[i].size;
+ region_end = region.start + region.size;
+
+ entry.start = clamp(region.start, start, end);
+ entry_end = clamp(region_end, start, end);
+
+ if (entry.start < entry_end) {
+ entry.size = entry_end - entry.start;
+ slots_count(&entry, minimum, image_size);
+ }
+
+ if (slot_area_index == MAX_SLOT_AREA) {
+ debug_putstr("Aborted e820/efi memmap scan (slot_areas full)!\n");
+ return 1;
+ }
+ }
+ return 0;
+ }
+
+ /*
+ * If no kaslr_boot_mem stored, use region directly
+ */
+ slots_count(&region, minimum, image_size);
+ if (slot_area_index == MAX_SLOT_AREA) {
+ debug_putstr("Aborted e820/efi memmap scan (slot_areas full)!\n");
+ return 1;
+ }
+ return 0;
+}
+
#ifdef CONFIG_EFI
/*
* Returns true if mirror region found (and must have been processed
@@ -698,11 +744,9 @@ process_efi_entries(unsigned long minimum, unsigned long image_size)

region.start = md->phys_addr;
region.size = md->num_pages << EFI_PAGE_SHIFT;
- process_mem_region(&region, minimum, image_size);
- if (slot_area_index == MAX_SLOT_AREA) {
- debug_putstr("Aborted EFI scan (slot_areas full)!\n");
+
+ if (process_mem_region(region, minimum, image_size))
break;
- }
}
return true;
}
@@ -729,11 +773,9 @@ static void process_e820_entries(unsigned long minimum,
continue;
region.start = entry->addr;
region.size = entry->size;
- process_mem_region(&region, minimum, image_size);
- if (slot_area_index == MAX_SLOT_AREA) {
- debug_putstr("Aborted e820 scan (slot_areas full)!\n");
+
+ if (process_mem_region(region, minimum, image_size))
break;
- }
}
}

--
2.14.3




2018-03-12 09:38:17

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v9 0/5] x86/KASLR: Add parameter kaslr_boot_mem=nn[KMG]@ss[KMG]


* Chao Fan <[email protected]> wrote:

> Long time no reply, rebase the patchset, change the parameter name
> from 'kaslr_mem' to 'kaslr_boot_mem'. There's no more code change.
>
> ***Background:
> People reported that kaslr may randomly chooses some positions
> which are located in movable memory regions. This will break memory
> hotplug feature.

[...]

> ***Solutions:
> Introduce a new kernel parameter 'kaslr_boot_mem=nn@ss' to let users to
> specify the memory regions where kernel can be allowed to randomize
> safely.

Manual solutions like that are pretty suboptimal to users, aren't they?

In what way does memory hotplug feature 'break'? Does it crash or misbehave? Or
simply does it not allow the movement of the affected memory region, while still
allowing the rest to be moved?

Thanks,

Ingo

2018-03-12 10:12:28

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v9 0/5] x86/KASLR: Add parameter kaslr_boot_mem=nn[KMG]@ss[KMG]

Hi Ingo,

On 03/12/18 at 10:35am, Ingo Molnar wrote:
>
> * Chao Fan <[email protected]> wrote:
>
> > Long time no reply, rebase the patchset, change the parameter name
> > from 'kaslr_mem' to 'kaslr_boot_mem'. There's no more code change.
> >
> > ***Background:
> > People reported that kaslr may randomly chooses some positions
> > which are located in movable memory regions. This will break memory
> > hotplug feature.
>
> [...]
>
> > ***Solutions:
> > Introduce a new kernel parameter 'kaslr_boot_mem=nn@ss' to let users to
> > specify the memory regions where kernel can be allowed to randomize
> > safely.
>
> Manual solutions like that are pretty suboptimal to users, aren't they?
>
> In what way does memory hotplug feature 'break'? Does it crash or misbehave? Or
> simply does it not allow the movement of the affected memory region, while still
> allowing the rest to be moved?

AFAIT, if kernel is randomized into the movable memory region, the
affected memory region can not be hot added/removed since it has kernel
data. Surely, the system can still work, the unaffected part still can
be moved. Still it will cause regression on memory hotplug.

Mainly we parse SRAT table to get the ranges of memory provided by
hot-added memory devices in initmem_init(), that's very late. During boot,
we don't know it. Chao ever posted patches to grab SRAT at decompressing
stage, the code is very complicated and not elegant, ACPI maintainer
NACKed that.

Thanks
Baoquan

2018-03-12 10:58:56

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v9 0/5] x86/KASLR: Add parameter kaslr_boot_mem=nn[KMG]@ss[KMG]


* Baoquan He <[email protected]> wrote:

> Hi Ingo,
>
> On 03/12/18 at 10:35am, Ingo Molnar wrote:
> >
> > * Chao Fan <[email protected]> wrote:
> >
> > > Long time no reply, rebase the patchset, change the parameter name
> > > from 'kaslr_mem' to 'kaslr_boot_mem'. There's no more code change.
> > >
> > > ***Background:
> > > People reported that kaslr may randomly chooses some positions
> > > which are located in movable memory regions. This will break memory
> > > hotplug feature.
> >
> > [...]
> >
> > > ***Solutions:
> > > Introduce a new kernel parameter 'kaslr_boot_mem=nn@ss' to let users to
> > > specify the memory regions where kernel can be allowed to randomize
> > > safely.
> >
> > Manual solutions like that are pretty suboptimal to users, aren't they?
> >
> > In what way does memory hotplug feature 'break'? Does it crash or misbehave? Or
> > simply does it not allow the movement of the affected memory region, while still
> > allowing the rest to be moved?
>
> AFAIT, if kernel is randomized into the movable memory region, the
> affected memory region can not be hot added/removed since it has kernel
> data. Surely, the system can still work, the unaffected part still can
> be moved. Still it will cause regression on memory hotplug.
>
> Mainly we parse SRAT table to get the ranges of memory provided by
> hot-added memory devices in initmem_init(), that's very late. During boot,
> we don't know it. Chao ever posted patches to grab SRAT at decompressing
> stage, the code is very complicated and not elegant, ACPI maintainer
> NACKed that.

So there's apparently a mis-design here:

- KASLR needs to be done very early on during bootup: - it's not realistic to
expect KASLR to be done with a booted up kernel, because pointers to various
KASLR-ed objects are already widely spread out in memory.

- But for some unfathomable reason the memory hotplug attribute of memory
regions is not part of the regular memory map but part of late-init ACPI data
structures.

The right solution would be _not_ to fudge the KASLR location, but to provide the
memory hotplug information to early code, preferably via the primary memory map.
KASLR can then make use of it and avoid those regions, just like it avoids other
memory regions already.

In addition to that hardware makers (including virtualized hardware) should also
fix their systems to provide memory hotplug information to early code.

Thanks,

Ingo

2018-03-12 12:05:50

by Chao Fan

[permalink] [raw]
Subject: Re: [PATCH v9 0/5] x86/KASLR: Add parameter kaslr_boot_mem=nn[KMG]@ss[KMG]

On Mon, Mar 12, 2018 at 11:57:27AM +0100, Ingo Molnar wrote:
>
>* Baoquan He <[email protected]> wrote:
>
>> Hi Ingo,
>>
>> On 03/12/18 at 10:35am, Ingo Molnar wrote:
>> >
>> > * Chao Fan <[email protected]> wrote:
>> >
>> > > Long time no reply, rebase the patchset, change the parameter name
>> > > from 'kaslr_mem' to 'kaslr_boot_mem'. There's no more code change.
>> > >
>> > > ***Background:
>> > > People reported that kaslr may randomly chooses some positions
>> > > which are located in movable memory regions. This will break memory
>> > > hotplug feature.
>> >
>> > [...]
>> >
>> > > ***Solutions:
>> > > Introduce a new kernel parameter 'kaslr_boot_mem=nn@ss' to let users to
>> > > specify the memory regions where kernel can be allowed to randomize
>> > > safely.
>> >
>> > Manual solutions like that are pretty suboptimal to users, aren't they?
>> >
>> > In what way does memory hotplug feature 'break'? Does it crash or misbehave? Or
>> > simply does it not allow the movement of the affected memory region, while still
>> > allowing the rest to be moved?
>>
>> AFAIT, if kernel is randomized into the movable memory region, the
>> affected memory region can not be hot added/removed since it has kernel
>> data. Surely, the system can still work, the unaffected part still can
>> be moved. Still it will cause regression on memory hotplug.
>>
>> Mainly we parse SRAT table to get the ranges of memory provided by
>> hot-added memory devices in initmem_init(), that's very late. During boot,
>> we don't know it. Chao ever posted patches to grab SRAT at decompressing
>> stage, the code is very complicated and not elegant, ACPI maintainer
>> NACKed that.

Thanks for Ingo's suggestion and Baoquan's explaination.

Yes, I did ever try to dig SRAT table in boot period in early RFC PATCH:
https://lkml.org/lkml/2017/9/3/77
But the change is too huge so made this patchset to avoid this bug in a
small change, which will not make the code looks messy.

Thanks,
Chao Fan

>
>So there's apparently a mis-design here:
>
> - KASLR needs to be done very early on during bootup: - it's not realistic to
> expect KASLR to be done with a booted up kernel, because pointers to various
> KASLR-ed objects are already widely spread out in memory.
>
> - But for some unfathomable reason the memory hotplug attribute of memory
> regions is not part of the regular memory map but part of late-init ACPI data
> structures.
>
>The right solution would be _not_ to fudge the KASLR location, but to provide the
>memory hotplug information to early code, preferably via the primary memory map.
>KASLR can then make use of it and avoid those regions, just like it avoids other
>memory regions already.
>
>In addition to that hardware makers (including virtualized hardware) should also
>fix their systems to provide memory hotplug information to early code.
>
>Thanks,
>
> Ingo
>
>



2018-03-13 14:04:22

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v9 0/5] x86/KASLR: Add parameter kaslr_boot_mem=nn[KMG]@ss[KMG]

On 03/12/18 at 08:04pm, Chao Fan wrote:
> On Mon, Mar 12, 2018 at 11:57:27AM +0100, Ingo Molnar wrote:
> >> > > ***Background:
> >> > > People reported that kaslr may randomly chooses some positions
> >> > > which are located in movable memory regions. This will break memory
> >> > > hotplug feature.
> >> >
> >> > [...]
> >> >
> >> > > ***Solutions:
> >> > > Introduce a new kernel parameter 'kaslr_boot_mem=nn@ss' to let users to
> >> > > specify the memory regions where kernel can be allowed to randomize
> >> > > safely.
> >> >
> >> > Manual solutions like that are pretty suboptimal to users, aren't they?
> >> >
> >> > In what way does memory hotplug feature 'break'? Does it crash or misbehave? Or
> >> > simply does it not allow the movement of the affected memory region, while still
> >> > allowing the rest to be moved?
> >>
> >> AFAIT, if kernel is randomized into the movable memory region, the
> >> affected memory region can not be hot added/removed since it has kernel
> >> data. Surely, the system can still work, the unaffected part still can
> >> be moved. Still it will cause regression on memory hotplug.
> >>
> >> Mainly we parse SRAT table to get the ranges of memory provided by
> >> hot-added memory devices in initmem_init(), that's very late. During boot,
> >> we don't know it. Chao ever posted patches to grab SRAT at decompressing
> >> stage, the code is very complicated and not elegant, ACPI maintainer
> >> NACKed that.
>
> Thanks for Ingo's suggestion and Baoquan's explaination.
>
> Yes, I did ever try to dig SRAT table in boot period in early RFC PATCH:
> https://lkml.org/lkml/2017/9/3/77
> But the change is too huge so made this patchset to avoid this bug in a
> small change, which will not make the code looks messy.

ACPI tables are not independent, to parse SRAT to get information of
hotplug memory, we need get RSDP pointer, which points at RSDT or XSDT.
Then find SRAT from them. While RSDP is not in a fixed location, there
are several candidate positions, code can be checked in acpi_find_root_pointer()
of drivers/acpi/acpica/tbxfroot.c . And then iterate RSDT/XSDT to search
SRAT. These codes can not be reused between kaslr.c and drivers/acpi
because acpi code has special handling. So it will bloat kaslr boot
code. This is why both Rafael and I think it might be not good to grab
parse ACPI SRAT table in kaslr boot code.

>
> >
> >So there's apparently a mis-design here:
> >
> > - KASLR needs to be done very early on during bootup: - it's not realistic to
> > expect KASLR to be done with a booted up kernel, because pointers to various
> > KASLR-ed objects are already widely spread out in memory.
> >
> > - But for some unfathomable reason the memory hotplug attribute of memory
> > regions is not part of the regular memory map but part of late-init ACPI data
> > structures.
> >
> >The right solution would be _not_ to fudge the KASLR location, but to provide the
> >memory hotplug information to early code, preferably via the primary memory map.
> >KASLR can then make use of it and avoid those regions, just like it avoids other
> >memory regions already.
> >
> >In addition to that hardware makers (including virtualized hardware) should also
> >fix their systems to provide memory hotplug information to early code.

The hugepage allocation on kvm guest is a different situation. If people
want to allocate n pages of 1G size, they will get one page less in
kaslr enabled kernel than kaslr disabled kernel, casually. Because
kernel might be randomized to those 1G aligned huge pages in kaslr
kernel. While in no kaslr case, kernel will be put at 16M.

default_hugepagesz=1G hugepagesz=1G hugepages='n'

For this issue, unless we use a algorithm to analyze kernel cmdline and
do a flexiable estimate to avoid those 1G aligned huge pages. Still we
can't avoid the case that memblock may break the good 1G page. I can't
think of a good way to fix this in kaslr boot code.

Thanks
Baoquan

> >
> >
>
>

2018-03-19 07:26:23

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v9 0/5] x86/KASLR: Add parameter kaslr_boot_mem=nn[KMG]@ss[KMG]

On 03/12/18 at 08:04pm, Chao Fan wrote:
> On Mon, Mar 12, 2018 at 11:57:27AM +0100, Ingo Molnar wrote:
> >
> >* Baoquan He <[email protected]> wrote:
> >
> >> Hi Ingo,
> >>
> >> On 03/12/18 at 10:35am, Ingo Molnar wrote:
> >> >
> >> > * Chao Fan <[email protected]> wrote:
> >> >
> >> > > Long time no reply, rebase the patchset, change the parameter name
> >> > > from 'kaslr_mem' to 'kaslr_boot_mem'. There's no more code change.
> >> > >
> >> > > ***Background:
> >> > > People reported that kaslr may randomly chooses some positions
> >> > > which are located in movable memory regions. This will break memory
> >> > > hotplug feature.
> >> >
> >> > [...]
> >> >
> >> > > ***Solutions:
> >> > > Introduce a new kernel parameter 'kaslr_boot_mem=nn@ss' to let users to
> >> > > specify the memory regions where kernel can be allowed to randomize
> >> > > safely.
> >> >
> >> > Manual solutions like that are pretty suboptimal to users, aren't they?
> >> >
> >> > In what way does memory hotplug feature 'break'? Does it crash or misbehave? Or
> >> > simply does it not allow the movement of the affected memory region, while still
> >> > allowing the rest to be moved?
> >>
> >> AFAIT, if kernel is randomized into the movable memory region, the
> >> affected memory region can not be hot added/removed since it has kernel
> >> data. Surely, the system can still work, the unaffected part still can
> >> be moved. Still it will cause regression on memory hotplug.
> >>
> >> Mainly we parse SRAT table to get the ranges of memory provided by
> >> hot-added memory devices in initmem_init(), that's very late. During boot,
> >> we don't know it. Chao ever posted patches to grab SRAT at decompressing
> >> stage, the code is very complicated and not elegant, ACPI maintainer
> >> NACKed that.

Hi Chao,

Seems Ingo prefers the handling in kaslr boot code. Maybe you can try
to optimize and split your below patch and post anouther round?

I will see how to sove the hugepage in boot/compressed/kaslr.c .

Thanks
Baoquan

>
> Thanks for Ingo's suggestion and Baoquan's explaination.
>
> Yes, I did ever try to dig SRAT table in boot period in early RFC PATCH:
> https://lkml.org/lkml/2017/9/3/77
> But the change is too huge so made this patchset to avoid this bug in a
> small change, which will not make the code looks messy.
>
> Thanks,
> Chao Fan
>
> >
> >So there's apparently a mis-design here:
> >
> > - KASLR needs to be done very early on during bootup: - it's not realistic to
> > expect KASLR to be done with a booted up kernel, because pointers to various
> > KASLR-ed objects are already widely spread out in memory.
> >
> > - But for some unfathomable reason the memory hotplug attribute of memory
> > regions is not part of the regular memory map but part of late-init ACPI data
> > structures.
> >
> >The right solution would be _not_ to fudge the KASLR location, but to provide the
> >memory hotplug information to early code, preferably via the primary memory map.
> >KASLR can then make use of it and avoid those regions, just like it avoids other
> >memory regions already.
> >
> >In addition to that hardware makers (including virtualized hardware) should also
> >fix their systems to provide memory hotplug information to early code.
> >
> >Thanks,
> >
> > Ingo
> >
> >
>
>

2018-03-19 08:19:15

by Chao Fan

[permalink] [raw]
Subject: Re: [PATCH v9 0/5] x86/KASLR: Add parameter kaslr_boot_mem=nn[KMG]@ss[KMG]

On Mon, Mar 19, 2018 at 03:24:46PM +0800, Baoquan He wrote:
>On 03/12/18 at 08:04pm, Chao Fan wrote:
>> On Mon, Mar 12, 2018 at 11:57:27AM +0100, Ingo Molnar wrote:
>> >
>> >* Baoquan He <[email protected]> wrote:
>> >
>> >> Hi Ingo,
>> >>
>> >> On 03/12/18 at 10:35am, Ingo Molnar wrote:
>> >> >
>> >> > * Chao Fan <[email protected]> wrote:
>> >> >
>> >> > > Long time no reply, rebase the patchset, change the parameter name
>> >> > > from 'kaslr_mem' to 'kaslr_boot_mem'. There's no more code change.
>> >> > >
>> >> > > ***Background:
>> >> > > People reported that kaslr may randomly chooses some positions
>> >> > > which are located in movable memory regions. This will break memory
>> >> > > hotplug feature.
>> >> >
>> >> > [...]
>> >> >
>> >> > > ***Solutions:
>> >> > > Introduce a new kernel parameter 'kaslr_boot_mem=nn@ss' to let users to
>> >> > > specify the memory regions where kernel can be allowed to randomize
>> >> > > safely.
>> >> >
>> >> > Manual solutions like that are pretty suboptimal to users, aren't they?
>> >> >
>> >> > In what way does memory hotplug feature 'break'? Does it crash or misbehave? Or
>> >> > simply does it not allow the movement of the affected memory region, while still
>> >> > allowing the rest to be moved?
>> >>
>> >> AFAIT, if kernel is randomized into the movable memory region, the
>> >> affected memory region can not be hot added/removed since it has kernel
>> >> data. Surely, the system can still work, the unaffected part still can
>> >> be moved. Still it will cause regression on memory hotplug.
>> >>
>> >> Mainly we parse SRAT table to get the ranges of memory provided by
>> >> hot-added memory devices in initmem_init(), that's very late. During boot,
>> >> we don't know it. Chao ever posted patches to grab SRAT at decompressing
>> >> stage, the code is very complicated and not elegant, ACPI maintainer
>> >> NACKed that.
>
>Hi Chao,
>
>Seems Ingo prefers the handling in kaslr boot code. Maybe you can try
>to optimize and split your below patch and post anouther round?
>
>I will see how to sove the hugepage in boot/compressed/kaslr.c .

Yes, seems that I need to pick up the old patch and try to optimize it.

Thanks,
Chao Fan

>
>Thanks
>Baoquan
>
>>
>> Thanks for Ingo's suggestion and Baoquan's explaination.
>>
>> Yes, I did ever try to dig SRAT table in boot period in early RFC PATCH:
>> https://lkml.org/lkml/2017/9/3/77
>> But the change is too huge so made this patchset to avoid this bug in a
>> small change, which will not make the code looks messy.
>>
>> Thanks,
>> Chao Fan
>>
>> >
>> >So there's apparently a mis-design here:
>> >
>> > - KASLR needs to be done very early on during bootup: - it's not realistic to
>> > expect KASLR to be done with a booted up kernel, because pointers to various
>> > KASLR-ed objects are already widely spread out in memory.
>> >
>> > - But for some unfathomable reason the memory hotplug attribute of memory
>> > regions is not part of the regular memory map but part of late-init ACPI data
>> > structures.
>> >
>> >The right solution would be _not_ to fudge the KASLR location, but to provide the
>> >memory hotplug information to early code, preferably via the primary memory map.
>> >KASLR can then make use of it and avoid those regions, just like it avoids other
>> >memory regions already.
>> >
>> >In addition to that hardware makers (including virtualized hardware) should also
>> >fix their systems to provide memory hotplug information to early code.
>> >
>> >Thanks,
>> >
>> > Ingo
>> >
>> >
>>
>>
>
>



2018-03-29 01:48:24

by Dou Liyang

[permalink] [raw]
Subject: Re: [PATCH v9 0/5] x86/KASLR: Add parameter kaslr_boot_mem=nn[KMG]@ss[KMG]

Hi Ingo, Kees, Baoquan and Chao

At 03/12/2018 06:57 PM, Ingo Molnar wrote:
[...]
> So there's apparently a mis-design here:
>
> - KASLR needs to be done very early on during bootup: - it's not realistic to
> expect KASLR to be done with a booted up kernel, because pointers to various
> KASLR-ed objects are already widely spread out in memory.
>
> - But for some unfathomable reason the memory hotplug attribute of memory
> regions is not part of the regular memory map but part of late-init ACPI data
> structures.
>
> The right solution would be _not_ to fudge the KASLR location, but to provide the
> memory hotplug information to early code, preferably via the primary memory map.
> KASLR can then make use of it and avoid those regions, just like it avoids other
> memory regions already.
>
> In addition to that hardware makers (including virtualized hardware) should also
> fix their systems to provide memory hotplug information to early code.
>

Yes, but before this, can we fix this problem by the following patch
which has been sent and reviewed by Kees before[1]. its solution is:

  Extend movable_node option to restrict kernel to be randomized in
  immovable nodes by adding a parameter. this parameter sets up
  the boundaries between the home nodes and other nodes.

My reason is here:

  - What we really want to solve is the KASLR breaks *physical Node
    hotplug*, Keep the decompressed kernel in an immovable node is
    enough.

  - AFAICS, there are not too many systems where physical Node hotplug
actually works in practice, and there mush be one node called *home
    node* which is immovable for storing basic information.

  - the node in modern systems could have double-digit gigabytes memory,
    It can completely satisfy the operation of KASLR.

So, Just restrict kernel to be randomized in the home node, and ignore
other nodes when kernel has the *movable_node* option in the command
line.

Thoughts? may I rebase and resend the patch?

[1] https://lkml.org/lkml/2017/8/3/401

Thanks,

dou