2018-01-04 08:03:01

by Chao Fan

Subject: [PATCH v5 0/4] kaslr: add parameter immovable_mem=nn[KMG]@ss[KMG] to make memory hotplug work well with kaslr


The problem:
On a machine with several NUMA nodes, some of which are hot-pluggable,
the kernel should not be extracted into the memory of a movable node.
With the current code, however, I printed the address chosen by KASLR
and found that it is sometimes placed in a movable node.
To solve this, it would be best to limit the memory regions that KASLR
can choose in kaslr.c to immovable nodes. But the information about
whether memory is hot-pluggable is stored in the ACPI SRAT table, which
is parsed after the kernel is extracted. So the detailed memory
information is not available before the kernel is extracted.

So add a new parameter immovable_mem=nn@ss, in which nn is the size of
a memory region in an *immovable* node and ss is the start address of
that region. KASLR is then limited to choosing memory in these regions.
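
As a rough illustration of the nn[KMG]@ss[KMG] grammar, here is a minimal
userspace sketch. It is not the code from this series: parse_size() is a
hypothetical stand-in for the kernel's memparse(), built on strtoull(),
and struct region and parse_region() are names made up for the example.
An omitted "@ss" is treated as start 0, which is what patch 1/4 does.

```c
#include <stdlib.h>

struct region {
	unsigned long long start;
	unsigned long long size;
};

/* Hypothetical stand-in for memparse(): a number plus optional K/M/G suffix. */
static unsigned long long parse_size(const char *s, char **end)
{
	unsigned long long v = strtoull(s, end, 0);

	switch (**end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;	/* consume the suffix */
		break;
	default:
		break;
	}
	return v;
}

/* Parse "nn[KMG][@ss[KMG]]"; returns 0 on success, -1 if no size is given. */
static int parse_region(const char *s, struct region *r)
{
	char *p;

	r->size = parse_size(s, &p);
	if (p == s)
		return -1;
	/* Assumption: a missing "@ss" means the region starts at 0. */
	r->start = (*p == '@') ? parse_size(p + 1, &p) : 0;
	return 0;
}
```

With this sketch, "500M@2G" yields a 500 MiB region starting at 2 GiB,
and a bare "1G" yields a 1 GiB region starting at 0.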

There are two possible policies:
1. Specify the memory regions in *movable* nodes to avoid:
Then we can use the existing mem_avoid to handle them. But if the memory
of a movable node is split by memory holes, or different movable nodes
are discontiguous, we don't know how many regions need to be avoided.
OTOH, we must avoid all of the movable memory; otherwise, KASLR may
choose the wrong place.
2. Specify the memory regions in *immovable* nodes to select from:
Only 4 regions are supported in this parameter. That lets the user offer
at least two nodes for KASLR to choose from, which is enough for the
kernel to be extracted. At the same time, because only 4 new mem_vector
entries are needed, the extra memory usage is small. So I think this
way is better, and this patchset is based on this policy.

PATCH 1/4 parses the new parameter immovable_mem=nn[KMG]@ss[KMG] and
stores the memory regions.
PATCH 2/4 selects the memory regions in immovable nodes when processing
the memmap.
PATCH 3/4 skips the mirror feature if movable_node or immovable_mem is
specified.
PATCH 4/4 adds the documentation.

v1->v2:
Follow Dou Liyang's suggestion:
- Add parsing for movable_node=nn[KMG] without @ss[KMG]
- Fix the bug when more than one "movable_node=" is specified
- Drop useless variables and use the mem_vector region directly
- Add more comments.

v2->v3:
Follow Baoquan He's suggestion:
- Change names of several functions.
- Add a new parameter "immovable_mem" instead of extending movable_node
- Use clamp to calculate the memory intersection, which makes the
logic clearer.
- Disable memory mirror if movable_node is specified

v3->v4:
Follow Kees's suggestion:
- Put the functions and variables for immovable_mem under #ifdef
CONFIG_MEMORY_HOTPLUG and move some code
- Change the name of "process_mem_region" to "slots_count"
- Rename the new function "process_immovable_mem" to "process_mem_region"
Follow Baoquan's suggestion:
- Fail KASLR if "movable_node" is specified without "immovable_mem"
- Adjust where mem_region is handled directly if no
immovable_mem is specified
Follow Randy's suggestion:
- Fix the mistake and add a detailed description to the document.

v4->v5:
- Fix the problem reported by LKP
Follow Dou's suggestion:
- Also return if "movable_node" is matched when parsing the kernel
command line in handle_mem_filter without CONFIG_MEMORY_HOTPLUG defined

Chao Fan (4):
kaslr: add immovable_mem=nn[KMG]@ss[KMG] to specify extracting memory
kaslr: calculate the memory region in immovable node
kaslr: disable memory mirror feature when movable_node
document: change the document for immovable_mem

Documentation/admin-guide/kernel-parameters.txt | 10 ++
arch/x86/boot/compressed/kaslr.c | 186 ++++++++++++++++++++++--
2 files changed, 182 insertions(+), 14 deletions(-)

--
2.14.3




2018-01-04 08:03:06

by Chao Fan

Subject: [PATCH v5 1/4] kaslr: add immovable_mem=nn[KMG]@ss[KMG] to specify extracting memory

In the current code, KASLR may choose a memory region in a movable
node to extract the kernel, which makes that node impossible to
hot-remove. To solve this, let the user specify the memory regions in
immovable nodes. Create immovable_mem to store these regions, from
which KASLR should choose.

Also rename "handle_mem_memmap" to "handle_mem_filter", since
it no longer handles only the memmap parameter.
"immovable_mem=" only works together with "movable_node"; it has no
effect on its own. If "movable_node" is specified without
"immovable_mem=", disable KASLR.
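
The interaction between the two options can be summarised in a small
sketch. This is an illustration only, assuming CONFIG_MEMORY_HOTPLUG=y;
enum kaslr_mode and kaslr_mode_from_cmdline() are hypothetical names
that do not appear in the patch. Like the patch, it uses plain strstr()
substring matching on the command line.

```c
#include <string.h>

/* Hypothetical summary of the three outcomes described above. */
enum kaslr_mode {
	KASLR_UNRESTRICTED,	/* no movable_node: immovable_mem= is ignored */
	KASLR_IMMOVABLE_ONLY,	/* both given: choose only from listed regions */
	KASLR_DISABLED,		/* movable_node alone: do not randomize */
};

static enum kaslr_mode kaslr_mode_from_cmdline(const char *args)
{
	if (!strstr(args, "movable_node"))
		return KASLR_UNRESTRICTED;
	if (!strstr(args, "immovable_mem="))
		return KASLR_DISABLED;
	return KASLR_IMMOVABLE_ONLY;
}
```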

Multiple regions can be specified, comma delimited.
To limit memory usage, only 4 regions are supported.
4 regions cover at least 2 nodes, which is enough for the kernel to be
extracted.

Signed-off-by: Chao Fan <[email protected]>
---
arch/x86/boot/compressed/kaslr.c | 112 +++++++++++++++++++++++++++++++++++++--
1 file changed, 109 insertions(+), 3 deletions(-)

diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index 8199a6187251..60e5aa28b510 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -108,6 +108,19 @@ enum mem_avoid_index {

static struct mem_vector mem_avoid[MEM_AVOID_MAX];

+#ifdef CONFIG_MEMORY_HOTPLUG
+/* Only supporting at most 4 immovable memory regions with kaslr */
+#define MAX_IMMOVABLE_MEM 4
+
+static bool lack_immovable_mem;
+
+/* Store the memory regions in immovable node */
+static struct mem_vector immovable_mem[MAX_IMMOVABLE_MEM];
+
+/* The immovable regions user specify, not more than 4 */
+static int num_immovable_region;
+#endif
+
static bool mem_overlaps(struct mem_vector *one, struct mem_vector *two)
{
/* Item one is entirely before item two. */
@@ -206,15 +219,97 @@ static void mem_avoid_memmap(char *str)
memmap_too_large = true;
}

-static int handle_mem_memmap(void)
+#ifdef CONFIG_MEMORY_HOTPLUG
+static int parse_immovable_mem(char *p,
+ unsigned long long *start,
+ unsigned long long *size)
+{
+ char *oldp;
+
+ if (!p)
+ return -EINVAL;
+
+ oldp = p;
+ *size = memparse(p, &p);
+ if (p == oldp)
+ return -EINVAL;
+
+ switch (*p) {
+ case '@':
+ *start = memparse(p + 1, &p);
+ return 0;
+ default:
+ /*
+ * If w/o offset, only size specified, immovable_mem=nn[KMG]
+ * has the same behaviour as immovable_mem=nn[KMG]@0. It means
+ * the region starts from 0.
+ */
+ *start = 0;
+ return 0;
+ }
+
+ return -EINVAL;
+}
+
+static void parse_immovable_mem_regions(char *str)
+{
+ static int i;
+
+ while (str && (i < MAX_IMMOVABLE_MEM)) {
+ int rc;
+ unsigned long long start, size;
+ char *k = strchr(str, ',');
+
+ if (k)
+ *k++ = 0;
+
+ rc = parse_immovable_mem(str, &start, &size);
+ if (rc < 0)
+ break;
+ str = k;
+
+ immovable_mem[i].start = start;
+ immovable_mem[i].size = size;
+ i++;
+ }
+ num_immovable_region = i;
+}
+#else
+static inline void parse_immovable_mem_regions(char *str)
+{
+}
+#endif
+
+static int handle_mem_filter(void)
{
char *args = (char *)get_cmd_line_ptr();
size_t len = strlen((char *)args);
+ bool enable_movable_node = false;
char *tmp_cmdline;
char *param, *val;
u64 mem_size;

- if (!strstr(args, "memmap=") && !strstr(args, "mem="))
+#ifdef CONFIG_MEMORY_HOTPLUG
+ if (strstr(args, "movable_node")) {
+ /*
+ * Confirm "movable_node" specified, otherwise
+ * "immovable_mem=" doesn't work.
+ */
+ enable_movable_node = true;
+
+ /*
+ * If only specify "movable_node" without "immovable_mem=",
+ * disable KASLR.
+ */
+ if (!strstr(args, "immovable_mem=")) {
+ lack_immovable_mem = true;
+ return 0;
+ }
+ }
+#endif
+
+ if (!strstr(args, "memmap=") && !strstr(args, "mem=") &&
+ !enable_movable_node)
return 0;

tmp_cmdline = malloc(len + 1);
@@ -239,6 +334,9 @@ static int handle_mem_memmap(void)

if (!strcmp(param, "memmap")) {
mem_avoid_memmap(val);
+ } else if (!strcmp(param, "immovable_mem") &&
+ enable_movable_node) {
+ parse_immovable_mem_regions(val);
} else if (!strcmp(param, "mem")) {
char *p = val;

@@ -378,7 +476,7 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
/* We don't need to set a mapping for setup_data. */

/* Mark the memmap regions we need to avoid */
- handle_mem_memmap();
+ handle_mem_filter();

#ifdef CONFIG_X86_VERBOSE_BOOTUP
/* Make sure video RAM can be used. */
@@ -673,6 +771,14 @@ static unsigned long find_random_phys_addr(unsigned long minimum,
return 0;
}

+#ifdef CONFIG_MEMORY_HOTPLUG
+ /* Check if specify "movable_node" without "immovable_mem=". */
+ if (lack_immovable_mem) {
+ debug_putstr("Fail KASLR when movable_node specified without immovable_mem=.\n");
+ return 0;
+ }
+#endif
+
/* Make sure minimum is aligned. */
minimum = ALIGN(minimum, CONFIG_PHYSICAL_ALIGN);

--
2.14.3



2018-01-04 08:03:14

by Chao Fan

Subject: [PATCH v5 2/4] kaslr: calculate the memory region in immovable node

If no immovable memory region is specified, use the region directly.
This happens in two cases:
1. CONFIG_MEMORY_HOTPLUG is not set to y.
2. immovable_mem= is not specified.

Otherwise, calculate the intersection between each memmap entry and
the immovable memory regions.
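
The intersection step can be sketched in plain C as follows. This is a
userspace illustration under stated assumptions, not the patch itself:
clamp_ull() stands in for the kernel's clamp() macro, and intersect()
is a hypothetical helper name. Clamping both ends of the memmap entry
into the immovable range gives the overlap; if the clamped start is not
below the clamped end, the two ranges do not overlap.

```c
struct mem_vector {
	unsigned long long start;
	unsigned long long size;
};

/* Stand-in for the kernel's clamp(): force v into [lo, hi]. */
static unsigned long long clamp_ull(unsigned long long v,
				    unsigned long long lo,
				    unsigned long long hi)
{
	if (v < lo)
		return lo;
	if (v > hi)
		return hi;
	return v;
}

/*
 * Intersect a memmap region with one immovable range.
 * Returns 1 and fills *out if they overlap, 0 otherwise.
 */
static int intersect(const struct mem_vector *region,
		     const struct mem_vector *immovable,
		     struct mem_vector *out)
{
	unsigned long long lo = immovable->start;
	unsigned long long hi = lo + immovable->size;
	unsigned long long start = clamp_ull(region->start, lo, hi);
	unsigned long long end = clamp_ull(region->start + region->size,
					   lo, hi);

	if (start >= end)
		return 0;

	out->start = start;
	out->size = end - start;
	return 1;
}
```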

Rename process_mem_region to slots_count to match
slots_fetch_random, and name the new function process_mem_region.

Signed-off-by: Chao Fan <[email protected]>
---
arch/x86/boot/compressed/kaslr.c | 67 +++++++++++++++++++++++++++++++++-------
1 file changed, 56 insertions(+), 11 deletions(-)

diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index 60e5aa28b510..8f9398757120 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -579,9 +579,9 @@ static unsigned long slots_fetch_random(void)
return 0;
}

-static void process_mem_region(struct mem_vector *entry,
- unsigned long minimum,
- unsigned long image_size)
+static void slots_count(struct mem_vector *entry,
+ unsigned long minimum,
+ unsigned long image_size)
{
struct mem_vector region, overlap;
struct slot_area slot_area;
@@ -658,6 +658,55 @@ static void process_mem_region(struct mem_vector *entry,
}
}

+static bool process_mem_region(struct mem_vector region,
+ unsigned long long minimum,
+ unsigned long long image_size)
+{
+#ifdef CONFIG_MEMORY_HOTPLUG
+ /*
+ * If immovable_mem= specified, walk all immovable regions, and
+ * filter the intersection to slots_count.
+ */
+ int i;
+
+ if (num_immovable_region > 0) {
+ for (i = 0; i < num_immovable_region; i++) {
+ struct mem_vector entry;
+ unsigned long long start, end, entry_end, region_end;
+
+ start = immovable_mem[i].start;
+ end = start + immovable_mem[i].size;
+ region_end = region.start + region.size;
+
+ entry.start = clamp(region.start, start, end);
+ entry_end = clamp(region_end, start, end);
+
+ if (entry.start < entry_end) {
+ entry.size = entry_end - entry.start;
+ slots_count(&entry, minimum, image_size);
+ }
+
+ if (slot_area_index == MAX_SLOT_AREA) {
+ debug_putstr("Aborted memmap scan (slot_areas full)!\n");
+ return 1;
+ }
+ }
+ return 0;
+ }
+#endif
+
+ /*
+ * If no immovable_mem stored, or CONFIG_MEMORY_HOTPLUG not specified,
+ * use region directly
+ */
+ slots_count(&region, minimum, image_size);
+ if (slot_area_index == MAX_SLOT_AREA) {
+ debug_putstr("Aborted memmap scan (slot_areas full)!\n");
+ return 1;
+ }
+ return 0;
+}
+
#ifdef CONFIG_EFI
/*
* Returns true if mirror region found (and must have been processed
@@ -723,11 +772,9 @@ process_efi_entries(unsigned long minimum, unsigned long image_size)

region.start = md->phys_addr;
region.size = md->num_pages << EFI_PAGE_SHIFT;
- process_mem_region(&region, minimum, image_size);
- if (slot_area_index == MAX_SLOT_AREA) {
- debug_putstr("Aborted EFI scan (slot_areas full)!\n");
+
+ if (process_mem_region(region, minimum, image_size))
break;
- }
}
return true;
}
@@ -754,11 +801,9 @@ static void process_e820_entries(unsigned long minimum,
continue;
region.start = entry->addr;
region.size = entry->size;
- process_mem_region(&region, minimum, image_size);
- if (slot_area_index == MAX_SLOT_AREA) {
- debug_putstr("Aborted e820 scan (slot_areas full)!\n");
+
+ if (process_mem_region(region, minimum, image_size))
break;
- }
}
}

--
2.14.3



2018-01-04 08:03:20

by Chao Fan

Subject: [PATCH v5 3/4] kaslr: disable memory mirror feature when movable_node

In the kernel, the mirror feature is skipped if movable_node is
specified. So KASLR should skip the mirror feature as well.

Signed-off-by: Chao Fan <[email protected]>
---
arch/x86/boot/compressed/kaslr.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index 8f9398757120..f8a925de9436 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -716,6 +716,7 @@ static bool
process_efi_entries(unsigned long minimum, unsigned long image_size)
{
struct efi_info *e = &boot_params->efi_info;
+ char *args = (char *)get_cmd_line_ptr();
bool efi_mirror_found = false;
struct mem_vector region;
efi_memory_desc_t *md;
@@ -749,6 +750,12 @@ process_efi_entries(unsigned long minimum, unsigned long image_size)
}
}

+#ifdef CONFIG_MEMORY_HOTPLUG
+ /* Skip memory mirror if movable_node or immovable_mem specified */
+ if (strstr(args, "movable_node"))
+ efi_mirror_found = false;
+#endif
+
for (i = 0; i < nr_desc; i++) {
md = efi_early_memdesc_ptr(pmap, e->efi_memdesc_size, i);

--
2.14.3



2018-01-04 08:03:19

by Chao Fan

Subject: [PATCH v5 4/4] document: change the document for immovable_mem

Add documentation for the new parameter
immovable_mem=nn[KMG][@ss[KMG]].

Signed-off-by: Chao Fan <[email protected]>
---
Documentation/admin-guide/kernel-parameters.txt | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 7041c6710f22..41b69a010d1a 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2352,6 +2352,16 @@
allocations which rules out almost all kernel
allocations. Use with caution!

+ immovable_mem=nn[KMG][@ss[KMG]]
+ [KNL] Force usage of a specific region of memory.
+ Make memory hotplug work well with KASLR.
+ Region of memory in immovable node is from ss to ss+nn.
+ If ss is omitted, it defaults to 0.
+ Multiple regions can be specified, comma delimited.
+ Notice: we support 4 regions at most now.
+ Example:
+ immovable_mem=1G,500M@2G,1G@4G
+
MTD_Partition= [MTD]
Format: <name>,<region-number>,<size>,<offset>

--
2.14.3



2018-01-04 10:31:08

by Baoquan He

Subject: Re: [PATCH v5 1/4] kaslr: add immovable_mem=nn[KMG]@ss[KMG] to specify extracting memory

On 01/04/18 at 04:02pm, Chao Fan wrote:
> In current code, kaslr may choose the memory region in movable
> nodes to extract kernel, which will make the nodes can't be hot-removed.
> To solve it, we can specify the memory region in immovable node.
> Create immovable_mem to store the regions in immovable_mem, where should
> be chosen by kaslr.
>
> Also change the "handle_mem_memmap" to "handle_mem_filter", since
> it will not only handle memmap parameter now.
> Since "immovable_mem=" only works with "movable_node", so "immovable_mem="
> doesn't work alone. If specify "movable_node" without "immovable_mem=",
> disable KASLR.
>
> Multiple regions can be specified, comma delimited.
> Considering the usage of memory, only support for 4 regions.
> 4 regions contains 2 nodes at least, enough for kernel to extract.
>
> Signed-off-by: Chao Fan <[email protected]>

Hi Chao,

Thanks for your effort on this issue.

Luiz told me they hit a hugetlb issue when KASLR is enabled on a KVM
guest. Please check the bug information below. There is only one
available position from which the hugepage can be allocated. In this
case, if we had a generic parameter to tell the kernel where it may
randomize into, this hugepage issue could be solved: we could restrict
the kernel to randomize outside [0x40000000, 0x7fffffff]. I'm not sure
whether your immovable_mem=nn[KMG]@ss[KMG] can be adjusted to do this.
I am hesitating over whether we should change this or not.

Hi maintainers, Kees,

1) Let's keep Chao's current code, just ask Luiz to use nokaslr to work
around the hugepage allocation failure;
2) Change immovable_mem=nn[KMG]@ss[KMG] to be a generic parameter,
people can use it to restrict kernel to places they want.

Which one is better? Or any other idea or suggestion?

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Luiz said on kvm guest, they add the following to the kernel command-line:
default_hugepagesz=1G hugepagesz=1G hugepages=1

Boot the guest and check number of 1GB pages reserved:

grep HugePages_Total /proc/meminfo

When booting with "nokaslr" HugePages_Total is always 1. When booting
without "nokaslr" sometimes HugePages_Total is zero (that is, reserving
the 1GB page fails).

Over 20 reboots, reserving the single 1G hugepage failed 6 times.

I reproduced this on a KVM guest with Luiz's help, and found it's
because there's only one available position for the 1G hugepage
allocation, [0x40000000, 0x7fffffff]. That's why they saw a roughly 1/4
probability of failure. If the kernel is randomized into
[0x0, 0x3fffffff], [0x80000000, 0xbffdffff], or [0x100000000,
0x13fffffff], the hugepage allocation always succeeds.

dmesg output snippet of kvm guest:
[ +0.000000] e820: BIOS-provided physical RAM map:
[ +0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[ +0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[ +0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[ +0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bffdffff] usable
[ +0.000000] BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved
[ +0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
[ +0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
[ +0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000013fffffff] usable

Thanks
Baoquan


2018-01-04 16:21:09

by Luiz Capitulino

Subject: KASLR may break some kernel features (was Re: [PATCH v5 1/4] kaslr: add immovable_mem=nn[KMG]@ss[KMG] to specify extracting memory)

On Thu, 4 Jan 2018 18:30:57 +0800
Baoquan He <[email protected]> wrote:

> On 01/04/18 at 04:02pm, Chao Fan wrote:
> > In current code, kaslr may choose the memory region in movable
> > nodes to extract kernel, which will make the nodes can't be hot-removed.
> > To solve it, we can specify the memory region in immovable node.
> > Create immovable_mem to store the regions in immovable_mem, where should
> > be chosen by kaslr.

[...]

> Hi Chao,
>
> Thanks for your effort on this issue.
>
> Luiz told me they met a hugetlb issue when kaslr enabled on kvm guest.
> Please check the below bug information. There's only one available
> position which hugepage can use to allocate. In this case, if we have a
> generic parameter to tell kernel where we can randomize into, this
> hugepage issue can be solved. We can restrict kernel to randomize beyond
> [0x40000000, 0x7fffffff]. Not sure if your immovable_mem=nn[KMG]@ss[KMG]
> can be adjusted to do this. I am hesitating on whether we should change
> this or not.

Having a generic kaslr parameter to control where the kernel is extracted
is one solution for this problem.

The general problem statement is that KASLR may break some kernel features
depending on where the kernel is extracted. Two examples are hot-plugged
memory (this series) and 1GB HugeTLB pages.

The 1GB HugeTLB page issue is not specific to KVM guests. It just happens
that there's a bunch of people running guests with up to 5GB of memory, and
with that amount of memory you have only one or two 1GB pages, so it is
easier for KASLR to extract the kernel into a 1GB region and split a 1GB
page. When this happens, you may not get any 1GB pages at all. However, I
can also reproduce this on bare metal with lots of memory, where I can lose
a 1GB page from time to time.

Having a kaslr_range= parameter solves both issues, but it has two major
drawbacks: it breaks existing setups, and I guess users will have a very
hard time choosing good ranges.

Another idea would be to have a CONFIG_KASLR_RANGES, where each arch
could have a list of ranges known to contain holes and/or immovable
memory and only extract the kernel into those ranges.

2018-01-05 02:58:55

by Chao Fan

Subject: Re: [PATCH v5 1/4] kaslr: add immovable_mem=nn[KMG]@ss[KMG] to specify extracting memory

On Thu, Jan 04, 2018 at 06:30:57PM +0800, Baoquan He wrote:
>On 01/04/18 at 04:02pm, Chao Fan wrote:
>> In current code, kaslr may choose the memory region in movable
>> nodes to extract kernel, which will make the nodes can't be hot-removed.
>> To solve it, we can specify the memory region in immovable node.
>> Create immovable_mem to store the regions in immovable_mem, where should
>> be chosen by kaslr.
>>
>> Also change the "handle_mem_memmap" to "handle_mem_filter", since
>> it will not only handle memmap parameter now.
>> Since "immovable_mem=" only works with "movable_node", so "immovable_mem="
>> doesn't work alone. If specify "movable_node" without "immovable_mem=",
>> disable KASLR.
>>
>> Multiple regions can be specified, comma delimited.
>> Considering the usage of memory, only support for 4 regions.
>> 4 regions contains 2 nodes at least, enough for kernel to extract.
>>
>> Signed-off-by: Chao Fan <[email protected]>
>
>Hi Chao,
>
>Thanks for your effort on this issue.
>
>Luiz told me they met a hugetlb issue when kaslr enabled on kvm guest.
>Please check the below bug information. There's only one available
>position which hugepage can use to allocate. In this case, if we have a
>generic parameter to tell kernel where we can randomize into, this
>hugepage issue can be solved. We can restrict kernel to randomize beyond
>[0x40000000, 0x7fffffff]. Not sure if your immovable_mem=nn[KMG]@ss[KMG]
>can be adjusted to do this. I am hesitating on whether we should change
>this or not.
>

Hi Baoquan, Luiz,

As I understand it, there is only one region, [0x40000000, 0x7fffffff],
suitable for the 1G page, so we should keep KASLR from choosing this
region, right?

If my understanding is right, I think this is more similar to mem_avoid:
"immovable_mem=" specifies where KASLR may *choose*, while mem_avoid
specifies what KASLR must *avoid*.
So I wonder if it's OK to extend mem_avoid and add a member like
MEM_AVOID_HUGEPAGE to "enum mem_avoid_index".
But there is a disadvantage: we can only specify a limited number of
regions.
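
For reference, an avoid-style check could look roughly like the sketch
below. The overlap test follows the same logic as mem_overlaps() in
kaslr.c; hugepage_avoid and slot_is_usable() are hypothetical names used
only for illustration, and the avoided range is the
[0x40000000, 0x7fffffff] region from Baoquan's report.

```c
#include <stdbool.h>

struct mem_vector {
	unsigned long long start;
	unsigned long long size;
};

/* True if the two ranges share at least one byte. */
static bool mem_overlaps(const struct mem_vector *one,
			 const struct mem_vector *two)
{
	/* Item one is entirely before item two. */
	if (one->start + one->size <= two->start)
		return false;
	/* Item one is entirely after item two. */
	if (one->start >= two->start + two->size)
		return false;
	return true;
}

/* Hypothetical avoid entry: the only range usable for the 1G huge page. */
static const struct mem_vector hugepage_avoid = {
	.start = 0x40000000ULL,
	.size  = 0x40000000ULL,
};

/* A candidate kernel slot is usable only if it stays out of that range. */
static bool slot_is_usable(unsigned long long start, unsigned long long size)
{
	struct mem_vector slot = { .start = start, .size = size };

	return !mem_overlaps(&slot, &hugepage_avoid);
}
```

A slot that ends exactly at 0x40000000 is still usable, because the end
of a range is exclusive in this check.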

Luiz, I am not familiar with huge pages. How many 1G huge pages does a
system need in general? We may need to limit the number to 2 or 4.

Thanks,
Chao Fan

>Hi maintainers, Kees,
>
>1) Let's keep Chao's current code, just ask Luiz to use nokaslr to work
>around the hugepage allocation failure;
>2) Change immovable_mem=nn[KMG]@ss[KMG] to be a generic parameter,
>people can use it to restrict kernel to places they want.
>
>Which one is better? Or any other idea or suggestion?
>
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>Luiz said on kvm guest, they add the following to the kernel command-line:
>default_hugepagesz=1G hugepagesz=1G hugepages=1
>
>Boot the guest and check number of 1GB pages reserved:
>
>grep HugePages_Total /proc/meminfo
>
>When booting with "nokaslr" HugePages_Total is always 1. When booting
>without "nokaslr" sometimes HugePages_Total is zero (that is, reserving
>the 1GB page fails).
>
>And 20 reboots, 6 failures to mount single 1G hugepage.
>
>I reproduced on kvm guest with Luiz's help, and found it's because there's
>only one available position for 1G hugepage allocation, [0x40000000, 0x7fffffff].
>That's why they saw 1/4 possibility of failure. If kernel randomized to
>[0x0, 0x3fffffff], [0x80000000, 0xbffdffff], or [0x100000000,
>0x13fffffff], hugepage can always succeed to allocate.
>
>dmesg output snippet of kvm guest:
>[ +0.000000] e820: BIOS-provided physical RAM map:
>[ +0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
>[ +0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
>[ +0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
>[ +0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bffdffff] usable
>[ +0.000000] BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved
>[ +0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
>[ +0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
>[ +0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000013fffffff] usable
>
>Thanks
>Baoquan
>
>> ---
>> arch/x86/boot/compressed/kaslr.c | 112 +++++++++++++++++++++++++++++++++++++--
>> 1 file changed, 109 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
>> index 8199a6187251..60e5aa28b510 100644
>> --- a/arch/x86/boot/compressed/kaslr.c
>> +++ b/arch/x86/boot/compressed/kaslr.c
>> @@ -108,6 +108,19 @@ enum mem_avoid_index {
>>
>> static struct mem_vector mem_avoid[MEM_AVOID_MAX];
>>
>> +#ifdef CONFIG_MEMORY_HOTPLUG
>> +/* Only supporting at most 4 immovable memory regions with kaslr */
>> +#define MAX_IMMOVABLE_MEM 4
>> +
>> +static bool lack_immovable_mem;
>> +
>> +/* Store the memory regions in immovable node */
>> +static struct mem_vector immovable_mem[MAX_IMMOVABLE_MEM];
>> +
>> +/* The immovable regions user specify, not more than 4 */
>> +static int num_immovable_region;
>> +#endif
>> +
>> static bool mem_overlaps(struct mem_vector *one, struct mem_vector *two)
>> {
>> /* Item one is entirely before item two. */
>> @@ -206,15 +219,97 @@ static void mem_avoid_memmap(char *str)
>> memmap_too_large = true;
>> }
>>
>> -static int handle_mem_memmap(void)
>> +#ifdef CONFIG_MEMORY_HOTPLUG
>> +static int parse_immovable_mem(char *p,
>> + unsigned long long *start,
>> + unsigned long long *size)
>> +{
>> + char *oldp;
>> +
>> + if (!p)
>> + return -EINVAL;
>> +
>> + oldp = p;
>> + *size = memparse(p, &p);
>> + if (p == oldp)
>> + return -EINVAL;
>> +
>> + switch (*p) {
>> + case '@':
>> + *start = memparse(p + 1, &p);
>> + return 0;
>> + default:
>> + /*
>> + * If w/o offset, only size specified, immovable_mem=nn[KMG]
>> + * has the same behaviour as immovable_mem=nn[KMG]@0. It means
>> + * the region starts from 0.
>> + */
>> + *start = 0;
>> + return 0;
>> + }
>> +
>> + return -EINVAL;
>> +}
>> +
>> +static void parse_immovable_mem_regions(char *str)
>> +{
>> + static int i;
>> +
>> + while (str && (i < MAX_IMMOVABLE_MEM)) {
>> + int rc;
>> + unsigned long long start, size;
>> + char *k = strchr(str, ',');
>> +
>> + if (k)
>> + *k++ = 0;
>> +
>> + rc = parse_immovable_mem(str, &start, &size);
>> + if (rc < 0)
>> + break;
>> + str = k;
>> +
>> + immovable_mem[i].start = start;
>> + immovable_mem[i].size = size;
>> + i++;
>> + }
>> + num_immovable_region = i;
>> +}
>> +#else
>> +static inline void parse_immovable_mem_regions(char *str)
>> +{
>> +}
>> +#endif
>> +
>> +static int handle_mem_filter(void)
>> {
>> char *args = (char *)get_cmd_line_ptr();
>> size_t len = strlen((char *)args);
>> + bool enable_movable_node = false;
>> char *tmp_cmdline;
>> char *param, *val;
>> u64 mem_size;
>>
>> - if (!strstr(args, "memmap=") && !strstr(args, "mem="))
>> +#ifdef CONFIG_MEMORY_HOTPLUG
>> + if (strstr(args, "movable_node")) {
>> + /*
>> + * Confirm "movable_node" specified, otherwise
>> + * "immovable_mem=" doesn't work.
>> + */
>> + enable_movable_node = true;
>> +
>> + /*
>> + * If only specify "movable_node" without "immovable_mem=",
>> + * disable KASLR.
>> + */
>> + if (!strstr(args, "immovable_mem=")) {
>> + lack_immovable_mem = true;
>> + return 0;
>> + }
>> + }
>> +#endif
>> +
>> + if (!strstr(args, "memmap=") && !strstr(args, "mem=") &&
>> + !enable_movable_node)
>> return 0;
>>
>> tmp_cmdline = malloc(len + 1);
>> @@ -239,6 +334,9 @@ static int handle_mem_memmap(void)
>>
>> if (!strcmp(param, "memmap")) {
>> mem_avoid_memmap(val);
>> + } else if (!strcmp(param, "immovable_mem=") &&
>> + enable_movable_node) {
>> + parse_immovable_mem_regions(val);
>> } else if (!strcmp(param, "mem")) {
>> char *p = val;
>>
>> @@ -378,7 +476,7 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
>> /* We don't need to set a mapping for setup_data. */
>>
>> /* Mark the memmap regions we need to avoid */
>> - handle_mem_memmap();
>> + handle_mem_filter();
>>
>> #ifdef CONFIG_X86_VERBOSE_BOOTUP
>> /* Make sure video RAM can be used. */
>> @@ -673,6 +771,14 @@ static unsigned long find_random_phys_addr(unsigned long minimum,
>> return 0;
>> }
>>
>> +#ifdef CONFIG_MEMORY_HOTPLUG
>> + /* Check if specify "movable_node" without "immovable_mem=". */
>> + if (lack_immovable_mem) {
>> + debug_putstr("Fail KASLR when movable_node specified without immovable_mem=.\n");
>> + return 0;
>> + }
>> +#endif
>> +
>> /* Make sure minimum is aligned. */
>> minimum = ALIGN(minimum, CONFIG_PHYSICAL_ALIGN);
>>
>> --
>> 2.14.3
>>
>>
>>
>
>
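
For readers following the parsing logic in the quoted patch, here is a
minimal userspace sketch of how a comma-delimited list of nn[KMG]@ss[KMG]
entries could be parsed. It is not the patch's code: parse_size() is a
simplified stand-in for the kernel's memparse() that only handles the K, M
and G suffixes, and parse_regions(), struct region and MAX_REGIONS are
hypothetical names used for illustration. As in the patch, an entry without
"@ss" is treated as starting at 0, and at most 4 regions are stored.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_REGIONS 4	/* mirrors the 4-region limit described in the patch */

struct region {
	unsigned long long start;
	unsigned long long size;
};

/* Simplified stand-in for memparse(): a number plus an optional K/M/G. */
static unsigned long long parse_size(const char *s, char **end)
{
	unsigned long long v = strtoull(s, end, 0);

	switch (**end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;
		break;
	default:
		break;
	}
	return v;
}

/*
 * Parse "nn[KMG][@ss[KMG]][,...]" into out[]. Returns the number of
 * regions stored; parsing stops at the first malformed entry or once
 * MAX_REGIONS entries have been stored.
 */
static int parse_regions(const char *arg, struct region *out)
{
	const char *p = arg;
	int n = 0;

	assert(arg && out);
	while (*p && n < MAX_REGIONS) {
		char *end;
		unsigned long long start = 0;
		unsigned long long size = parse_size(p, &end);

		if (end == p)
			break;			/* no size given */
		if (*end == '@') {
			const char *s = end + 1;

			start = parse_size(s, &end);
			if (end == s)
				break;		/* '@' without a start */
		}
		if (*end != ',' && *end != '\0')
			break;			/* trailing garbage */

		out[n].start = start;
		out[n].size = size;
		n++;

		if (*end == '\0')
			break;
		p = end + 1;
	}
	return n;
}
```

With this sketch, "1G@4G,512M" yields two regions: 1G starting at 4G, and
512M starting at 0.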


2018-01-08 14:39:57

by Luiz Capitulino

Subject: Re: [PATCH v5 1/4] kaslr: add immovable_mem=nn[KMG]@ss[KMG] to specify extracting memory

On Fri, 5 Jan 2018 10:58:11 +0800
Chao Fan <[email protected]> wrote:

> On Thu, Jan 04, 2018 at 06:30:57PM +0800, Baoquan He wrote:
> >On 01/04/18 at 04:02pm, Chao Fan wrote:
> >> In current code, kaslr may choose the memory region in movable
> >> nodes to extract kernel, which will make the nodes can't be hot-removed.
> >> To solve it, we can specify the memory region in immovable node.
> >> Create immovable_mem to store the regions in immovable_mem, where should
> >> be chosen by kaslr.
> >>
> >> Also change the "handle_mem_memmap" to "handle_mem_filter", since
> >> it will not only handle memmap parameter now.
> >> Since "immovable_mem=" only works with "movable_node", so "immovable_mem="
> >> doesn't work alone. If specify "movable_node" without "immovable_mem=",
> >> disable KASLR.
> >>
> >> Multiple regions can be specified, comma delimited.
> >> Considering the usage of memory, only support for 4 regions.
> >> 4 regions contains 2 nodes at least, enough for kernel to extract.
> >>
> >> Signed-off-by: Chao Fan <[email protected]>
> >
> >Hi Chao,
> >
> >Thanks for your effort on this issue.
> >
> >Luiz told me they met a hugetlb issue when kaslr enabled on kvm guest.
> >Please check the below bug information. There's only one available
> >position which hugepage can use to allocate. In this case, if we have a
> >generic parameter to tell kernel where we can randomize into, this
> >hugepage issue can be solved. We can restrict kernel to randomize beyond
> >[0x40000000, 0x7fffffff]. Not sure if your immovable_mem=nn[KMG]@ss[KMG]
> >can be adjusted to do this. I am hesitating on whether we should change
> >this or not.
> >
>
> Hi Baoquan, Luiz,
>
> In my personal understanding, there is only one region,
> [0x40000000, 0x7fffffff] suitable for the 1G page, so we should avoid
> kaslr to choose this region, right?

For a guest configured with 4GB of memory with the device configuration
we're using, yes.

> If my understanding is right, I think it's more similar with mem_avoid.
> Because we specify where KASLR *choose* in "immovable_mem=", we specify
> where KASLR *avoid* in "mem_avoid".
> So I wonder if it's OK to expand mem_avoid, and add a member like
> MEM_AVOID_HUGEPAGE in "enum mem_avoid_index".
> But there is a disadvantage, we can only specify the limited regions.

Not requiring new command-line options would be great for users,
but I'm not sure it's possible to use mem_avoid because I guess the
free area may vary depending on amount of memory, devices, etc.

> Luiz, I am not familiar with HUGE PAGE, I wonder how many 1G HUGE pages
> does system need in general? We may need to limit it in 2, or 4.

I don't think it's possible to impose a limit. But the case we've
been discussing in this thread is the one that has the greatest
impact: a guest with 4GB of memory which always has a 1GB page with nokaslr,
but may not have any 1GB page without nokaslr.

2018-01-09 01:37:31

by Chao Fan

Subject: Re: [PATCH v5 1/4] kaslr: add immovable_mem=nn[KMG]@ss[KMG] to specify extracting memory

On Mon, Jan 08, 2018 at 09:39:54AM -0500, Luiz Capitulino wrote:
>On Fri, 5 Jan 2018 10:58:11 +0800
>Chao Fan <[email protected]> wrote:
>
>> On Thu, Jan 04, 2018 at 06:30:57PM +0800, Baoquan He wrote:
>> >On 01/04/18 at 04:02pm, Chao Fan wrote:
>> >> In current code, kaslr may choose the memory region in movable
>> >> nodes to extract kernel, which will make the nodes can't be hot-removed.
>> >> To solve it, we can specify the memory region in immovable node.
>> >> Create immovable_mem to store the regions in immovable_mem, where should
>> >> be chosen by kaslr.
>> >>
>> >> Also change the "handle_mem_memmap" to "handle_mem_filter", since
>> >> it will not only handle memmap parameter now.
>> >> Since "immovable_mem=" only works with "movable_node", so "immovable_mem="
>> >> doesn't work alone. If specify "movable_node" without "immovable_mem=",
>> >> disable KASLR.
>> >>
>> >> Multiple regions can be specified, comma delimited.
>> >> Considering the usage of memory, only support for 4 regions.
>> >> 4 regions contains 2 nodes at least, enough for kernel to extract.
>> >>
>> >> Signed-off-by: Chao Fan <[email protected]>
>> >
>> >Hi Chao,
>> >
>> >Thanks for your effort on this issue.
>> >
>> >Luiz told me they met a hugetlb issue when kaslr enabled on kvm guest.
>> >Please check the below bug information. There's only one available
>> >position which hugepage can use to allocate. In this case, if we have a
>> >generic parameter to tell kernel where we can randomize into, this
>> >hugepage issue can be solved. We can restrict kernel to randomize beyond
>> >[0x40000000, 0x7fffffff]. Not sure if your immovable_mem=nn[KMG]@ss[KMG]
>> >can be adjusted to do this. I am hesitating on whether we should change
>> >this or not.
>> >
>>
>> Hi Baoquan, Luiz,
>>
>> In my personal understanding, there is only one region,
>> [0x40000000, 0x7fffffff] suitable for the 1G page, so we should avoid
>> kaslr to choose this region, right?
>
>For a guest configured with 4GB of memory with the device configuration
>we're using yes.
>
>> If my understanding is right, I think it's more similar with mem_avoid.
>> Because we specify where KASLR *choose* in "immovable_mem=", we specify
>> where KASLR *avoid* in "mem_avoid".
>> So I wonder if it's OK to expand mem_avoid, and add a member like
>> MEM_AVOID_HUGEPAGE in "enum mem_avoid_index".
>> But there is a disadvantage, we can only specify the limited regions.
>
>Not requiring new command-line options would be great for users,
>but I'm not sure it's possible to use mem_avoid because I guess the
>free area may vary depending on amount of memory, devices, etc.
>

OK, I got it. So the old command-line options may not suit this issue.
I will try to think about it.
I don't know if other people have a good idea.

Thanks,
Chao Fan

>> Luiz, I am not familiar with HUGE PAGE, I wonder how many 1G HUGE pages
>> does system need in general? We may need to limit it in 2, or 4.
>
>I don't think it's possible to impose a limit. But the case we've
>been discussing in this thread it the case that has the greater
>impact: a guest with 4GB of memory which always has 1GB page with nokaslr,
>but may not have any 1GB page without nokaslr.
>
>


2018-01-11 09:00:40

by Baoquan He

Subject: Re: KASLR may break some kernel features (was Re: [PATCH v5 1/4] kaslr: add immovable_mem=nn[KMG]@ss[KMG] to specify extracting memory)

Hi Luiz,

On 01/04/18 at 11:21am, Luiz Capitulino wrote:
> Having a generic kaslr parameter to control where the kernel is extracted
> is one solution for this problem.
>
> The general problem statement is that KASLR may break some kernel features
> depending on where the kernel is extracted. Two examples are hot-plugged
> memory (this series) and 1GB HugeTLB pages.
>
> The 1GB HugeTLB page issue is not specific to KVM guests. It just happens
> that there's a bunch of people running guests with up to 5GB of memory and
> with that amount of memory you have one or two 1GB pages and is easier for
> KASLR to extract the kernel into a 1GB region and split a 1GB page. So,
> you may not get any 1GB pages at all when this happens. However, I can also
> reproduce this on bare-metal with lots of memory where I can loose a 1GB
> page from time to time.
>
> Having a kaslr_range= parameter solves both issues, but two major drawbacks
> is that it breaks existing setups and I guess users will have a very hard
> time choosing good ranges.
>
> Another idea would be to have a CONFIG_KASLR_RANGES, where each arch
> could have a list of ranges known to contain holes and/or immovable
> memory and only extract the kernel into those ranges.

If we add CONFIG_KASLR_RANGES, then a distro like RHEL will always have
this range, whether people need hugetlb or not.

So in this case, what range do we need to avoid? Only [1G, 2G]?

Thanks
Baoquan

2018-01-11 18:05:00

by Kees Cook

Subject: Re: KASLR may break some kernel features (was Re: [PATCH v5 1/4] kaslr: add immovable_mem=nn[KMG]@ss[KMG] to specify extracting memory)

On Thu, Jan 11, 2018 at 1:00 AM, Baoquan He <[email protected]> wrote:
> Hi Luiz,
>
> On 01/04/18 at 11:21am, Luiz Capitulino wrote:
>> Having a generic kaslr parameter to control where the kernel is extracted
>> is one solution for this problem.
>>
>> The general problem statement is that KASLR may break some kernel features
>> depending on where the kernel is extracted. Two examples are hot-plugged
>> memory (this series) and 1GB HugeTLB pages.
>>
>> The 1GB HugeTLB page issue is not specific to KVM guests. It just happens
>> that there's a bunch of people running guests with up to 5GB of memory and
>> with that amount of memory you have one or two 1GB pages and is easier for
>> KASLR to extract the kernel into a 1GB region and split a 1GB page. So,
>> you may not get any 1GB pages at all when this happens. However, I can also
>> reproduce this on bare-metal with lots of memory where I can loose a 1GB
>> page from time to time.
>>
>> Having a kaslr_range= parameter solves both issues, but two major drawbacks
>> is that it breaks existing setups and I guess users will have a very hard
>> time choosing good ranges.
>>
>> Another idea would be to have a CONFIG_KASLR_RANGES, where each arch
>> could have a list of ranges known to contain holes and/or immovable
>> memory and only extract the kernel into those ranges.
>
> If add CONFIG_KASLR_RANGES, then a distro like RHEL will have this range
> always, whether people need hugetlb or not.
>
> So in this case, what range do we need to avoid? Only [1G, 2G]?

Any ranges like that that need to be avoided should be known at build
time, so they should simply be added to the mem_avoid list that is
already present in the KASLR code...

-Kees

--
Kees Cook
Pixel Security

2018-01-12 02:01:34

by Chao Fan

Subject: Re: KASLR may break some kernel features (was Re: [PATCH v5 1/4] kaslr: add immovable_mem=nn[KMG]@ss[KMG] to specify extracting memory)

On Thu, Jan 11, 2018 at 10:04:56AM -0800, Kees Cook wrote:
>On Thu, Jan 11, 2018 at 1:00 AM, Baoquan He <[email protected]> wrote:
>> Hi Luiz,
>>
>> On 01/04/18 at 11:21am, Luiz Capitulino wrote:
>>> Having a generic kaslr parameter to control where the kernel is extracted
>>> is one solution for this problem.
>>>
>>> The general problem statement is that KASLR may break some kernel features
>>> depending on where the kernel is extracted. Two examples are hot-plugged
>>> memory (this series) and 1GB HugeTLB pages.
>>>
>>> The 1GB HugeTLB page issue is not specific to KVM guests. It just happens
>>> that there's a bunch of people running guests with up to 5GB of memory and
>>> with that amount of memory you have one or two 1GB pages and is easier for
>>> KASLR to extract the kernel into a 1GB region and split a 1GB page. So,
>>> you may not get any 1GB pages at all when this happens. However, I can also
>>> reproduce this on bare-metal with lots of memory where I can loose a 1GB
>>> page from time to time.
>>>
>>> Having a kaslr_range= parameter solves both issues, but two major drawbacks
>>> is that it breaks existing setups and I guess users will have a very hard
>>> time choosing good ranges.
>>>
>>> Another idea would be to have a CONFIG_KASLR_RANGES, where each arch
>>> could have a list of ranges known to contain holes and/or immovable
>>> memory and only extract the kernel into those ranges.
>>
>> If add CONFIG_KASLR_RANGES, then a distro like RHEL will have this range
>> always, whether people need hugetlb or not.
>>
>> So in this case, what range do we need to avoid? Only [1G, 2G]?
>
>Any ranges like that that need to be avoided should be known at build
>time, so they should simply be added to the mem_avoid list that is
>already present in the KASLR code...
>

Hi Kees,

So this issue can be handled in an independent patch.
Does this patch have any problems? If so, please tell me, and I will try
my best to improve it.

Thanks,
Chao Fan

>-Kees
>
>--
>Kees Cook
>Pixel Security
>
>


2018-01-12 02:32:00

by Baoquan He

Subject: Re: KASLR may break some kernel features (was Re: [PATCH v5 1/4] kaslr: add immovable_mem=nn[KMG]@ss[KMG] to specify extracting memory)

On 01/11/18 at 10:04am, Kees Cook wrote:
> On Thu, Jan 11, 2018 at 1:00 AM, Baoquan He <[email protected]> wrote:
> > Hi Luiz,
> >
> > On 01/04/18 at 11:21am, Luiz Capitulino wrote:
> >> Having a generic kaslr parameter to control where the kernel is extracted
> >> is one solution for this problem.
> >>
> >> The general problem statement is that KASLR may break some kernel features
> >> depending on where the kernel is extracted. Two examples are hot-plugged
> >> memory (this series) and 1GB HugeTLB pages.
> >>
> >> The 1GB HugeTLB page issue is not specific to KVM guests. It just happens
> >> that there's a bunch of people running guests with up to 5GB of memory and
> >> with that amount of memory you have one or two 1GB pages and is easier for
> >> KASLR to extract the kernel into a 1GB region and split a 1GB page. So,
> >> you may not get any 1GB pages at all when this happens. However, I can also
> >> reproduce this on bare-metal with lots of memory where I can loose a 1GB
> >> page from time to time.
> >>
> >> Having a kaslr_range= parameter solves both issues, but two major drawbacks
> >> is that it breaks existing setups and I guess users will have a very hard
> >> time choosing good ranges.
> >>
> >> Another idea would be to have a CONFIG_KASLR_RANGES, where each arch
> >> could have a list of ranges known to contain holes and/or immovable
> >> memory and only extract the kernel into those ranges.
> >
> > If add CONFIG_KASLR_RANGES, then a distro like RHEL will have this range
> > always, whether people need hugetlb or not.
> >
> > So in this case, what range do we need to avoid? Only [1G, 2G]?
>
> Any ranges like that that need to be avoided should be known at build
> time, so they should simply be added to the mem_avoid list that is
> already present in the KASLR code...

It seems KASLR doesn't have a solution which allows the user to specify an
avoided range for the kernel text KASLR stage only. memmap="!#$" can add a
range to mem_avoid, but it will also keep that range from being added to e820.

In this hugetlb case, Luiz wants the kernel to avoid the [1G, 2G)
candidate position for hugetlb allocation, but still wants it to be
added to the mm subsystem later.

Thanks
Baoquan
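
To make the "avoid for KASLR only" idea concrete, here is a minimal sketch
of the kind of overlap check an avoid list implies. It is an illustration
under assumptions, not the kernel's code: struct mem_range,
ranges_overlap() and slot_is_allowed() are hypothetical names. The point is
that such a list only filters candidate positions for the kernel image; it
does not touch the memory map, so an avoided range could still be handed to
the mm subsystem later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Half-open range [start, start + size). Hypothetical type. */
struct mem_range {
	unsigned long long start;
	unsigned long long size;
};

/* True if the two ranges share at least one byte. */
static bool ranges_overlap(const struct mem_range *a,
			   const struct mem_range *b)
{
	if (a->start + a->size <= b->start)
		return false;	/* a ends before b begins */
	if (a->start >= b->start + b->size)
		return false;	/* a begins after b ends */
	return true;
}

/* A candidate slot is usable only if it overlaps none of the avoid ranges. */
static bool slot_is_allowed(const struct mem_range *slot,
			    const struct mem_range *avoid, size_t n)
{
	assert(slot && (avoid || n == 0));
	for (size_t i = 0; i < n; i++) {
		if (ranges_overlap(slot, &avoid[i]))
			return false;
	}
	return true;
}
```

For example, with a single avoid range covering [1G, 2G), a 32M slot
starting just below 1G that spills into the range is rejected, while a slot
starting exactly at 2G is accepted.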

2018-01-12 02:49:00

by Chao Fan

Subject: Re: KASLR may break some kernel features (was Re: [PATCH v5 1/4] kaslr: add immovable_mem=nn[KMG]@ss[KMG] to specify extracting memory)

On Fri, Jan 12, 2018 at 10:31:52AM +0800, Baoquan He wrote:
>On 01/11/18 at 10:04am, Kees Cook wrote:
>> On Thu, Jan 11, 2018 at 1:00 AM, Baoquan He <[email protected]> wrote:
>> > Hi Luiz,
>> >
>> > On 01/04/18 at 11:21am, Luiz Capitulino wrote:
>> >> Having a generic kaslr parameter to control where the kernel is extracted
>> >> is one solution for this problem.
>> >>
>> >> The general problem statement is that KASLR may break some kernel features
>> >> depending on where the kernel is extracted. Two examples are hot-plugged
>> >> memory (this series) and 1GB HugeTLB pages.
>> >>
>> >> The 1GB HugeTLB page issue is not specific to KVM guests. It just happens
>> >> that there's a bunch of people running guests with up to 5GB of memory and
>> >> with that amount of memory you have one or two 1GB pages and is easier for
>> >> KASLR to extract the kernel into a 1GB region and split a 1GB page. So,
>> >> you may not get any 1GB pages at all when this happens. However, I can also
>> >> reproduce this on bare-metal with lots of memory where I can loose a 1GB
>> >> page from time to time.
>> >>
>> >> Having a kaslr_range= parameter solves both issues, but two major drawbacks
>> >> is that it breaks existing setups and I guess users will have a very hard
>> >> time choosing good ranges.
>> >>
>> >> Another idea would be to have a CONFIG_KASLR_RANGES, where each arch
>> >> could have a list of ranges known to contain holes and/or immovable
>> >> memory and only extract the kernel into those ranges.
>> >
>> > If add CONFIG_KASLR_RANGES, then a distro like RHEL will have this range
>> > always, whether people need hugetlb or not.
>> >
>> > So in this case, what range do we need to avoid? Only [1G, 2G]?
>>
>> Any ranges like that that need to be avoided should be known at build
>> time, so they should simply be added to the mem_avoid list that is
>> already present in the KASLR code...
>
>Seems KASLR doesn't have an solution which allow user to specify avoided
>range for kernel text KASLR stage only. The memmap="!#$" can add range to
>mem_avoid, while it will make them not added to e820.
>

How about adding a new option, like "huge_page=nn@ss", and filling the
regions into mem_avoid? This parameter would only be parsed in the KASLR
stage; the subsequent handling of memmap would not be executed.

Thanks,
Chao Fan

>Here like this hugetlb case, Luiz wants kernel to avoid the [2G, 3G)
>candidate position for hugetlb allocation, meanwhile wants it to be
>added to mm subsystem later.
>
>Thanks
>Baoquan
>
>
>


2018-01-12 18:52:09

by Luiz Capitulino

Subject: Re: KASLR may break some kernel features (was Re: [PATCH v5 1/4] kaslr: add immovable_mem=nn[KMG]@ss[KMG] to specify extracting memory)

On Fri, 12 Jan 2018 10:47:53 +0800
Chao Fan <[email protected]> wrote:

> On Fri, Jan 12, 2018 at 10:31:52AM +0800, Baoquan He wrote:
> >On 01/11/18 at 10:04am, Kees Cook wrote:
> >> On Thu, Jan 11, 2018 at 1:00 AM, Baoquan He <[email protected]> wrote:
> >> > Hi Luiz,
> >> >
> >> > On 01/04/18 at 11:21am, Luiz Capitulino wrote:
> >> >> Having a generic kaslr parameter to control where the kernel is extracted
> >> >> is one solution for this problem.
> >> >>
> >> >> The general problem statement is that KASLR may break some kernel features
> >> >> depending on where the kernel is extracted. Two examples are hot-plugged
> >> >> memory (this series) and 1GB HugeTLB pages.
> >> >>
> >> >> The 1GB HugeTLB page issue is not specific to KVM guests. It just happens
> >> >> that there's a bunch of people running guests with up to 5GB of memory and
> >> >> with that amount of memory you have one or two 1GB pages and is easier for
> >> >> KASLR to extract the kernel into a 1GB region and split a 1GB page. So,
> >> >> you may not get any 1GB pages at all when this happens. However, I can also
> >> >> reproduce this on bare-metal with lots of memory where I can loose a 1GB
> >> >> page from time to time.
> >> >>
> >> >> Having a kaslr_range= parameter solves both issues, but two major drawbacks
> >> >> is that it breaks existing setups and I guess users will have a very hard
> >> >> time choosing good ranges.
> >> >>
> >> >> Another idea would be to have a CONFIG_KASLR_RANGES, where each arch
> >> >> could have a list of ranges known to contain holes and/or immovable
> >> >> memory and only extract the kernel into those ranges.
> >> >
> >> > If add CONFIG_KASLR_RANGES, then a distro like RHEL will have this range
> >> > always, whether people need hugetlb or not.
> >> >
> >> > So in this case, what range do we need to avoid? Only [1G, 2G]?
> >>
> >> Any ranges like that that need to be avoided should be known at build
> >> time, so they should simply be added to the mem_avoid list that is
> >> already present in the KASLR code...
> >
> >Seems KASLR doesn't have an solution which allow user to specify avoided
> >range for kernel text KASLR stage only. The memmap="!#$" can add range to
> >mem_avoid, while it will make them not added to e820.
> >
>
> How about adding a new option, like "huge_page=nn@ss". Fill the regions
> to mem_avoid. But this parameter will only be parsed in kaslr period.
> The followed handlling of memmap will not be excuted.

If we add a new option, I think we should try to make it general enough
to satisfy both hugepages and the memory hotplug problem. Otherwise
we'll end up adding a new option for each feature KASLR breaks...

However, in the case of the 1GB page problem, I'm starting to think
that it may be possible to know which 1GB areas are already fragmented
and extract the kernel to one of those areas. I don't know if this would
help the memory hotplug issue though.

2018-01-13 04:02:37

by Baoquan He

Subject: Re: KASLR may break some kernel features (was Re: [PATCH v5 1/4] kaslr: add immovable_mem=nn[KMG]@ss[KMG] to specify extracting memory)

On 01/12/18 at 01:52pm, Luiz Capitulino wrote:
> On Fri, 12 Jan 2018 10:47:53 +0800
> Chao Fan <[email protected]> wrote:
>
> > On Fri, Jan 12, 2018 at 10:31:52AM +0800, Baoquan He wrote:
> > >On 01/11/18 at 10:04am, Kees Cook wrote:
> > >> On Thu, Jan 11, 2018 at 1:00 AM, Baoquan He <[email protected]> wrote:
> > >> > Hi Luiz,
> > >> >
> > >> > On 01/04/18 at 11:21am, Luiz Capitulino wrote:
> > >> >> Having a generic kaslr parameter to control where the kernel is extracted
> > >> >> is one solution for this problem.
> > >> >>
> > >> >> The general problem statement is that KASLR may break some kernel features
> > >> >> depending on where the kernel is extracted. Two examples are hot-plugged
> > >> >> memory (this series) and 1GB HugeTLB pages.
> > >> >>
> > >> >> The 1GB HugeTLB page issue is not specific to KVM guests. It just happens
> > >> >> that there's a bunch of people running guests with up to 5GB of memory and
> > >> >> with that amount of memory you have one or two 1GB pages and is easier for
> > >> >> KASLR to extract the kernel into a 1GB region and split a 1GB page. So,
> > >> >> you may not get any 1GB pages at all when this happens. However, I can also
> > >> >> reproduce this on bare-metal with lots of memory where I can loose a 1GB
> > >> >> page from time to time.
> > >> >>
> > >> >> Having a kaslr_range= parameter solves both issues, but two major drawbacks
> > >> >> is that it breaks existing setups and I guess users will have a very hard
> > >> >> time choosing good ranges.
> > >> >>
> > >> >> Another idea would be to have a CONFIG_KASLR_RANGES, where each arch
> > >> >> could have a list of ranges known to contain holes and/or immovable
> > >> >> memory and only extract the kernel into those ranges.
> > >> >
> > >> > If add CONFIG_KASLR_RANGES, then a distro like RHEL will have this range
> > >> > always, whether people need hugetlb or not.
> > >> >
> > >> > So in this case, what range do we need to avoid? Only [1G, 2G]?
> > >>
> > >> Any ranges like that that need to be avoided should be known at build
> > >> time, so they should simply be added to the mem_avoid list that is
> > >> already present in the KASLR code...
> > >
> > >Seems KASLR doesn't have an solution which allow user to specify avoided
> > >range for kernel text KASLR stage only. The memmap="!#$" can add range to
> > >mem_avoid, while it will make them not added to e820.
> > >
> >
> > How about adding a new option, like "huge_page=nn@ss". Fill the regions
> > to mem_avoid. But this parameter will only be parsed in kaslr period.
> > The followed handlling of memmap will not be excuted.
>
> If we add a new option, I think we should try to make general enough
> to satisfy both hugepages and the memory hotplug problem. Otherwise
> we'll end up adding a new option for each feature KASLR breaks...

Yes, this is my concern. We can take advantage of this opportunity to
make it general.

>
> However, in the case of the 1GB page problem, I'm starting to think
> that it may be possible to know which 1GB areas are already fragmented
> and extract the kernel to one of those areas. I don't know if this would
> help the memory hotplug issue though.

This is also what Chao is trying to solve. Since users may not
know how to find the hotpluggable memory regions, Chao is trying to add a
sysfs interface to export them, as extracted from ACPI SRAT.
I wonder if hugetlb could do something similar.

And the hugetlb issue only exists on systems with 4G of memory, right?
On large-memory systems there is no such problem.

Thanks
Baoquan

2018-01-13 05:07:57

by Chao Fan

Subject: Re: KASLR may break some kernel features (was Re: [PATCH v5 1/4] kaslr: add immovable_mem=nn[KMG]@ss[KMG] to specify extracting memory)

On Sat, Jan 13, 2018 at 12:02:26PM +0800, Baoquan He wrote:
>On 01/12/18 at 01:52pm, Luiz Capitulino wrote:
>> On Fri, 12 Jan 2018 10:47:53 +0800
>> Chao Fan <[email protected]> wrote:
>>
>> > On Fri, Jan 12, 2018 at 10:31:52AM +0800, Baoquan He wrote:
>> > >On 01/11/18 at 10:04am, Kees Cook wrote:
>> > >> On Thu, Jan 11, 2018 at 1:00 AM, Baoquan He <[email protected]> wrote:
>> > >> > Hi Luiz,
>> > >> >
>> > >> > On 01/04/18 at 11:21am, Luiz Capitulino wrote:
>> > >> >> Having a generic kaslr parameter to control where the kernel is extracted
>> > >> >> is one solution for this problem.
>> > >> >>
>> > >> >> The general problem statement is that KASLR may break some kernel features
>> > >> >> depending on where the kernel is extracted. Two examples are hot-plugged
>> > >> >> memory (this series) and 1GB HugeTLB pages.
>> > >> >>
>> > >> >> The 1GB HugeTLB page issue is not specific to KVM guests. It just happens
>> > >> >> that there's a bunch of people running guests with up to 5GB of memory and
>> > >> >> with that amount of memory you have one or two 1GB pages and is easier for
>> > >> >> KASLR to extract the kernel into a 1GB region and split a 1GB page. So,
>> > >> >> you may not get any 1GB pages at all when this happens. However, I can also
>> > >> >> reproduce this on bare-metal with lots of memory where I can loose a 1GB
>> > >> >> page from time to time.
>> > >> >>
>> > >> >> Having a kaslr_range= parameter solves both issues, but two major drawbacks
>> > >> >> is that it breaks existing setups and I guess users will have a very hard
>> > >> >> time choosing good ranges.
>> > >> >>
>> > >> >> Another idea would be to have a CONFIG_KASLR_RANGES, where each arch
>> > >> >> could have a list of ranges known to contain holes and/or immovable
>> > >> >> memory and only extract the kernel into those ranges.
>> > >> >
>> > >> > If add CONFIG_KASLR_RANGES, then a distro like RHEL will have this range
>> > >> > always, whether people need hugetlb or not.
>> > >> >
>> > >> > So in this case, what range do we need to avoid? Only [1G, 2G]?
>> > >>
>> > >> Any ranges like that that need to be avoided should be known at build
>> > >> time, so they should simply be added to the mem_avoid list that is
>> > >> already present in the KASLR code...
>> > >
>> > >Seems KASLR doesn't have an solution which allow user to specify avoided
>> > >range for kernel text KASLR stage only. The memmap="!#$" can add range to
>> > >mem_avoid, while it will make them not added to e820.
>> > >
>> >
>> > How about adding a new option, like "huge_page=nn@ss". Fill the regions
>> > to mem_avoid. But this parameter will only be parsed in kaslr period.
>> > The followed handlling of memmap will not be excuted.
>>
>> If we add a new option, I think we should try to make general enough
>> to satisfy both hugepages and the memory hotplug problem. Otherwise
>> we'll end up adding a new option for each feature KASLR breaks...
>
>Yes, this is my concern. We can take advantage of this opportunity to
>make it.
>
>>
>> However, in the case of the 1GB page problem, I'm starting to think
>> that it may be possible to know which 1GB areas are already fragmented
>> and extract the kernel to one of those areas. I don't know if this would
>> help the memory hotplug issue though.

Hi Luiz,

Before this patchset, I tried parsing the ACPI SRAT table to get the
detailed memory information and then filter out the movable regions.
But the code was too heavy, so I changed my method as Baoquan said.

>
>This is also the thing Chao is trying to solve. Since user may not
>know how to get those hotplugable memory region, Chao is trying to add a
>sysfs interface to export them which are extracted from ACPI SRAT.
>Wonder if hugetlb can do the similar.
>
>And the hugetlb issue only exists in 4G memory size of system, right?
>For large memory system, no such problem.
>

Hi Baoquan,

I also wonder about this.
I asked Luiz in the email. Since mem_avoid limits the number of
regions, I asked Luiz how many 1G huge pages a system needs.
He said the free area may vary depending on the amount of memory, devices, etc.

So in my personal understanding, if there is a machine with 6G of
memory and 2 suitable positions for 1G huge pages, and the system needs 2
huge pages, the bug will also happen.

Well, if there is a large amount of memory, there will be many suitable
regions, and KASLR will break only one of them, so we don't need to
care about this bug. But I wonder where the boundary between these two
situations is. What is the threshold for this issue?

Thanks,
Chao Fan

>Thanks
>Baoquan
>
>


2018-01-31 03:33:54

by Baoquan He

Subject: Re: KASLR may break some kernel features (was Re: [PATCH v5 1/4] kaslr: add immovable_mem=nn[KMG]@ss[KMG] to specify extracting memory)

Hi Kees,

On 01/11/18 at 10:04am, Kees Cook wrote:
> On Thu, Jan 11, 2018 at 1:00 AM, Baoquan He <[email protected]> wrote:
> > Hi Luiz,
> >
> > On 01/04/18 at 11:21am, Luiz Capitulino wrote:
> >> Having a generic kaslr parameter to control where the kernel is extracted
> >> is one solution for this problem.
> >>
> >> The general problem statement is that KASLR may break some kernel features
> >> depending on where the kernel is extracted. Two examples are hot-plugged
> >> memory (this series) and 1GB HugeTLB pages.
> >>
> >> The 1GB HugeTLB page issue is not specific to KVM guests. It just happens
> >> that there's a bunch of people running guests with up to 5GB of memory and
> >> with that amount of memory you have one or two 1GB pages and is easier for
> >> KASLR to extract the kernel into a 1GB region and split a 1GB page. So,
> >> you may not get any 1GB pages at all when this happens. However, I can also
> >> reproduce this on bare-metal with lots of memory where I can loose a 1GB
> >> page from time to time.
> >>
> >> Having a kaslr_range= parameter solves both issues, but two major drawbacks
> >> is that it breaks existing setups and I guess users will have a very hard
> >> time choosing good ranges.
> >>
> >> Another idea would be to have a CONFIG_KASLR_RANGES, where each arch
> >> could have a list of ranges known to contain holes and/or immovable
> >> memory and only extract the kernel into those ranges.
> >
> > If add CONFIG_KASLR_RANGES, then a distro like RHEL will have this range
> > always, whether people need hugetlb or not.
> >
> > So in this case, what range do we need to avoid? Only [1G, 2G]?
>
> Any ranges like that that need to be avoided should be known at build
> time, so they should simply be added to the mem_avoid list that is
> already present in the KASLR code...

Sorry, I may have misunderstood your suggestion earlier. Are you suggesting
that we hardcode a specific range into mem_avoid[]?

I may not have described the situation clearly, sorry for that. For this
hugepage issue, Luiz tested in a KVM guest with 4G of memory. hugetlb needs
to allocate 1G with 1G alignment, so only the [1G, 2G] area is a good 1G
huge page for allocation. The other areas have no good 1G page to use:
[0, 1G]: BIOS reserved several pages;
[2G, 3G]: the top is reserved by the system, 0x00000000bffe0000-0x00000000bfffffff
[3G, 4G]: no RAM deployed by firmware
[4G, 5G]: the system allocates from top to bottom

dmesg output snippet of kvm guest:
[ +0.000000] e820: BIOS-provided physical RAM map:
[ +0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[ +0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[ +0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[ +0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bffdffff] usable
[ +0.000000] BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved
[ +0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
[ +0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
[ +0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000013fffffff] usable
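
To make the per-gigabyte reasoning above easy to check, here is a minimal
userspace sketch (not kernel code) that walks the usable e820 entries from
this dmesg snippet and tests which 1G-aligned pages lie wholly inside one
usable range. The names ram_range, guest_usable, gb_page_is_whole() and
count_whole_gb_pages() are made up for illustration. Note that the e820 map
alone also reports [4G, 5G] as wholly usable; that page is lost for the
separate reason given in the list above (early allocations go top-down),
which this sketch does not model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GB (1ULL << 30)

/* Hypothetical helper type: one usable RAM range, end address inclusive. */
struct ram_range {
	uint64_t start;
	uint64_t end;
};

/* The "usable" entries copied from the dmesg snippet of the 4G KVM guest. */
static const struct ram_range guest_usable[] = {
	{ 0x0000000000000000ULL, 0x000000000009fbffULL },
	{ 0x0000000000100000ULL, 0x00000000bffdffffULL },
	{ 0x0000000100000000ULL, 0x000000013fffffffULL },
};

/*
 * Return 1 if the 1G page starting at 'base' lies wholly inside a single
 * usable range, 0 otherwise. Assumes adjacent usable ranges were already
 * merged, as they are in the map above.
 */
static int gb_page_is_whole(const struct ram_range *r, size_t n, uint64_t base)
{
	assert(base % GB == 0);	/* only 1G-aligned candidates make sense */

	for (size_t i = 0; i < n; i++)
		if (r[i].start <= base && base + GB - 1 <= r[i].end)
			return 1;
	return 0;
}

/* Count the wholly usable 1G pages in [0, limit). */
static size_t count_whole_gb_pages(const struct ram_range *r, size_t n,
				   uint64_t limit)
{
	size_t count = 0;

	for (uint64_t base = 0; base < limit; base += GB)
		count += gb_page_is_whole(r, n, base);
	return count;
}
```

Calling gb_page_is_whole(guest_usable, 3, base) for base = 0, 1G, 2G, 3G
and 4G rejects [0, 1G], [2G, 3G] and [3G, 4G] and accepts [1G, 2G] and
[4G, 5G]. Once [4G, 5G] is discounted because of the top-down allocations,
[1G, 2G] is the only candidate left, and that is the page KASLR can break.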

However, this only fails on this system. If Luiz sets up the KVM guest with
5G or more memory, there will be more than one good 1G page, while kernel
randomization can only occupy one of them. So if there is more than one good
1G page, the 1G huge page allocation failure won't occur. It's very much a
corner case, which is why I don't want to hardcode it into mem_avoid[].
The code would not look reasonable with a change that avoids the [1G, 2G]
area, where the code comment has to say that we do this because a system
with 4G of memory can't allocate a 1G huge page successfully. Besides,
systems which don't need the hugetlb feature, or which have more memory,
don't have this issue at all.

These are my thoughts about the current way of fixing this; I'm not sure if
they are persuasive or make sense. I would like to hear any suggestions or
different ideas for solving the problems we have encountered.

Thanks
Baoquan