2018-05-16 10:09:35

by Baoquan He

[permalink] [raw]
Subject: [PATCH 0/2] x86/boot/KASLR: Skip specified number of 1GB huge pages when do physical randomization

This is a regression bug fix. Luiz's team reported that 1GB huge page
allocation will get one less 1GB page randomly when KASLR is enabled. On
their KVM guest with 4GB RAM, which only has one good 1GB huge page,
they found the 1GB huge page allocation sometime failed with below
kernel option adding.

default_hugepagesz=1G hugepagesz=1G hugepages=1

This is because kernel may be randomized into those good 1GB huge pages.

I ever thought to solve this by specifying available memory regions
which kernel KASLR can be randomized into to avoid those good 1GB huge
pages. Chao's patches can be used to fix it:
https://lkml.org/lkml/2018/2/28/217

Later, Ingo suggested avoiding them in boot KASLR code.
https://lkml.org/lkml/2018/3/12/312

So I made this patchset to handle the conflict between 1GB huge pages
allocation and KASLR. Any idea or suggestion about the handling,
function naming is appreciated.

Baoquan He (2):
x86/boot/KASLR: Add two functions for 1GB huge pages handling
x86/boot/KASLR: Skip specified number of 1GB huge pages when do
physical randomization

arch/x86/boot/compressed/kaslr.c | 84 +++++++++++++++++++++++++++++++++++++---
1 file changed, 79 insertions(+), 5 deletions(-)

--
2.13.6



2018-05-16 10:07:19

by Baoquan He

[permalink] [raw]
Subject: [PATCH 1/2] x86/boot/KASLR: Add two functions for 1GB huge pages handling

Functions parse_gb_huge_pages() and process_gb_huge_page() are introduced to
handle conflict between KASLR and huge pages, will be used in the next patch.

Function parse_gb_huge_pages() is used to parse kernel command-line to get
how many 1GB huge pages have been specified. A static global variable
'max_gb_huge_pages' is added to store the number.

And process_gb_huge_page() is used to skip as many 1GB huge pages as possible
from the passed in memory region according to the specified number.

Signed-off-by: Baoquan He <[email protected]>
---
arch/x86/boot/compressed/kaslr.c | 71 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 71 insertions(+)

diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index a0a50b91ecef..13bd879cdc5d 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -215,6 +215,32 @@ static void mem_avoid_memmap(char *str)
memmap_too_large = true;
}

+/* Store the number of 1GB huge pages which user specified.*/
+static unsigned long max_gb_huge_pages;
+
+static int parse_gb_huge_pages(char *param, char* val)
+{
+ char *p;
+ u64 mem_size;
+ static bool gbpage_sz = false;
+
+ if (!strcmp(param, "hugepagesz")) {
+ p = val;
+ mem_size = memparse(p, &p);
+ if (mem_size == PUD_SIZE) {
+ if (gbpage_sz)
+ warn("Repeadly set hugeTLB page size of 1G!\n");
+ gbpage_sz = true;
+ } else
+ gbpage_sz = false;
+ } else if (!strcmp(param, "hugepages") && gbpage_sz) {
+ p = val;
+ max_gb_huge_pages = simple_strtoull(p, &p, 0);
+ debug_putaddr(max_gb_huge_pages);
+ }
+}
+
+
static int handle_mem_memmap(void)
{
char *args = (char *)get_cmd_line_ptr();
@@ -466,6 +492,51 @@ static void store_slot_info(struct mem_vector *region, unsigned long image_size)
}
}

+/* Skip as many 1GB huge pages as possible in the passed region. */
+static void process_gb_huge_page(struct mem_vector *region, unsigned long image_size)
+{
+ int i = 0;
+ unsigned long addr, size;
+ struct mem_vector tmp;
+
+ if (!max_gb_huge_pages) {
+ store_slot_info(region, image_size);
+ return;
+ }
+
+ addr = ALIGN(region->start, PUD_SIZE);
+ /* If Did we raise the address above the passed in memory entry? */
+ if (addr < region->start + region->size)
+ size = region->size - (addr - region->start);
+
+ /* Check how many 1GB huge pages can be filtered out*/
+ while (size > PUD_SIZE && max_gb_huge_pages) {
+ size -= PUD_SIZE;
+ max_gb_huge_pages--;
+ i++;
+ }
+
+ if (!i) {
+ store_slot_info(region, image_size);
+ return;
+ }
+
+ /* Process the remaining regions after filtering out. */
+
+ if (addr >= region->start + image_size) {
+ tmp.start = region->start;
+ tmp.size = addr - region->start;
+ store_slot_info(&tmp, image_size);
+ }
+
+ size = region->size - (addr - region->start) - i * PUD_SIZE;
+ if (size >= image_size) {
+ tmp.start = addr + i*PUD_SIZE;
+ tmp.size = size;
+ store_slot_info(&tmp, image_size);
+ }
+}
+
static unsigned long slots_fetch_random(void)
{
unsigned long slot;
--
2.13.6


2018-05-16 10:07:42

by Baoquan He

[permalink] [raw]
Subject: [PATCH 2/2] x86/boot/KASLR: Skip specified number of 1GB huge pages when do physical randomization

For 1GB huge pages allocation, a regression bug is reported when KASLR
is enabled. On KVM guest with 4GB RAM, and add the following to the
kernel command-line:

'default_hugepagesz=1G hugepagesz=1G hugepages=1'

Then boot the guest and check number of 1GB pages reserved:
grep HugePages_Total /proc/meminfo

When booting with "nokaslr" HugePages_Total is always 1. When booting
without "nokaslr" sometimes HugePages_Total is zero (that is, reserving
the 1GB page fails). It may need to boot a few times to trigger the issue.

After investigation, the root cause is that kernel may be put in the only
good 1GB huge page [0x40000000, 0x7fffffff] randomly. Below is the dmesg
output snippet of the KVM guest. We can see that only
[0x40000000, 0x7fffffff] region is good 1GB huge page,
[0x100000000, 0x13fffffff] will be touched by memblock top-down allocation.

[ +0.000000] e820: BIOS-provided physical RAM map:
[ +0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[ +0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[ +0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[ +0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bffdffff] usable
[ +0.000000] BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved
[ +0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
[ +0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
[ +0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000013fffffff] usable

And also on those bare-metal machines with larger memory, one less 1GB huge
page might be got with KASLR enabled than 'nokaslr' specified case. It's also
because that kernel might be randomized into those good 1GB huge pages.

To fix this, firstly parse kernel command-line to get how many 1GB huge pages
are specified. Then try to skip the specified number of 1GB huge pages when
decide which memory region kernel can be randomized into.

And also change the name of handle_mem_memmap() as handle_mem_options() since
it doesn't only handle 'mem=' and 'memmap=', but include 'hugepagesxxx' now.

Signed-off-by: Baoquan He <[email protected]>
---
arch/x86/boot/compressed/kaslr.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index 13bd879cdc5d..b4819faab602 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -241,7 +241,7 @@ static int parse_gb_huge_pages(char *param, char* val)
}


-static int handle_mem_memmap(void)
+static int handle_mem_options(void)
{
char *args = (char *)get_cmd_line_ptr();
size_t len = strlen((char *)args);
@@ -249,7 +249,8 @@ static int handle_mem_memmap(void)
char *param, *val;
u64 mem_size;

- if (!strstr(args, "memmap=") && !strstr(args, "mem="))
+ if (!strstr(args, "memmap=") && !strstr(args, "mem=") &&
+ !strstr(args,"hugepages"))
return 0;

tmp_cmdline = malloc(len + 1);
@@ -274,6 +275,8 @@ static int handle_mem_memmap(void)

if (!strcmp(param, "memmap")) {
mem_avoid_memmap(val);
+ } else if (strstr(param, "hugepages")) {
+ parse_gb_huge_pages(param, val);
} else if (!strcmp(param, "mem")) {
char *p = val;

@@ -413,7 +416,7 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
/* We don't need to set a mapping for setup_data. */

/* Mark the memmap regions we need to avoid */
- handle_mem_memmap();
+ handle_mem_options();

#ifdef CONFIG_X86_VERBOSE_BOOTUP
/* Make sure video RAM can be used. */
@@ -617,7 +620,7 @@ static void process_mem_region(struct mem_vector *entry,

/* If nothing overlaps, store the region and return. */
if (!mem_avoid_overlap(&region, &overlap)) {
- store_slot_info(&region, image_size);
+ process_gb_huge_page(&region, image_size);
return;
}

@@ -627,7 +630,7 @@ static void process_mem_region(struct mem_vector *entry,

beginning.start = region.start;
beginning.size = overlap.start - region.start;
- store_slot_info(&beginning, image_size);
+ process_gb_huge_page(&beginning, image_size);
}

/* Return if overlap extends to or past end of region. */
--
2.13.6


2018-05-17 03:27:40

by Chao Fan

[permalink] [raw]
Subject: Re: [PATCH 1/2] x86/boot/KASLR: Add two functions for 1GB huge pages handling

Hi Baoquan,

I have reviewed the patch, I think the caculation of address has no
problem. But maybe I miss something, so I have several questions.

On Wed, May 16, 2018 at 06:05:31PM +0800, Baoquan He wrote:
>Functions parse_gb_huge_pages() and process_gb_huge_page() are introduced to
>handle conflict between KASLR and huge pages, will be used in the next patch.
>
>Function parse_gb_huge_pages() is used to parse kernel command-line to get
>how many 1GB huge pages have been specified. A static global variable
>'max_gb_huge_pages' is added to store the number.
>
>And process_gb_huge_page() is used to skip as many 1GB huge pages as possible
>from the passed in memory region according to the specified number.
>
>Signed-off-by: Baoquan He <[email protected]>
>---
> arch/x86/boot/compressed/kaslr.c | 71 ++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 71 insertions(+)
>
>diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
>index a0a50b91ecef..13bd879cdc5d 100644
>--- a/arch/x86/boot/compressed/kaslr.c
>+++ b/arch/x86/boot/compressed/kaslr.c
>@@ -215,6 +215,32 @@ static void mem_avoid_memmap(char *str)
> memmap_too_large = true;
> }
>
>+/* Store the number of 1GB huge pages which user specified.*/
>+static unsigned long max_gb_huge_pages;
>+
>+static int parse_gb_huge_pages(char *param, char* val)
>+{
>+ char *p;
>+ u64 mem_size;
>+ static bool gbpage_sz = false;
>+
>+ if (!strcmp(param, "hugepagesz")) {
>+ p = val;
>+ mem_size = memparse(p, &p);
>+ if (mem_size == PUD_SIZE) {
>+ if (gbpage_sz)
>+ warn("Repeadly set hugeTLB page size of 1G!\n");
>+ gbpage_sz = true;
>+ } else
>+ gbpage_sz = false;
>+ } else if (!strcmp(param, "hugepages") && gbpage_sz) {
>+ p = val;
>+ max_gb_huge_pages = simple_strtoull(p, &p, 0);
>+ debug_putaddr(max_gb_huge_pages);
>+ }
>+}
>+
>+
> static int handle_mem_memmap(void)
> {
> char *args = (char *)get_cmd_line_ptr();
>@@ -466,6 +492,51 @@ static void store_slot_info(struct mem_vector *region, unsigned long image_size)
> }
> }
>
>+/* Skip as many 1GB huge pages as possible in the passed region. */
>+static void process_gb_huge_page(struct mem_vector *region, unsigned long image_size)
>+{
>+ int i = 0;
>+ unsigned long addr, size;
>+ struct mem_vector tmp;
>+
>+ if (!max_gb_huge_pages) {
>+ store_slot_info(region, image_size);
>+ return;
>+ }
>+
>+ addr = ALIGN(region->start, PUD_SIZE);
>+ /* If Did we raise the address above the passed in memory entry? */
>+ if (addr < region->start + region->size)
>+ size = region->size - (addr - region->start);
>+
>+ /* Check how many 1GB huge pages can be filtered out*/
>+ while (size > PUD_SIZE && max_gb_huge_pages) {
>+ size -= PUD_SIZE;
>+ max_gb_huge_pages--;

The global variable 'max_gb_huge_pages' means how many huge pages
user specified when you get it from command line.
But here, everytime we find a position which is good for huge page
allocation, the 'max_gdb_huge_page' decreased. So in my understanding,
it is used to store how many huge pages that we still need to search memory
for good slots to filter out, right?
If it's right, maybe the name 'max_gb_huge_pages' is not very suitable.
If my understanding is wrong, please tell me.

>+ i++;
>+ }
>+
>+ if (!i) {
>+ store_slot_info(region, image_size);
>+ return;
>+ }
>+
>+ /* Process the remaining regions after filtering out. */
>+
This line may be unusable.
>+ if (addr >= region->start + image_size) {
>+ tmp.start = region->start;
>+ tmp.size = addr - region->start;
>+ store_slot_info(&tmp, image_size);
>+ }
>+
>+ size = region->size - (addr - region->start) - i * PUD_SIZE;
>+ if (size >= image_size) {
>+ tmp.start = addr + i*PUD_SIZE;
>+ tmp.size = size;
>+ store_slot_info(&tmp, image_size);
>+ }

I have another question not related to kaslr.
Here you try to avoid the memory from addr to (addr + i * PUD_SIZE),
but I wonder if after walking all memory regions, 'max_gb_huge_pages'
is still more than 0, which means there isn't enough memory slots for
huge page, what will happen?

Thanks,
Chao Fan

>+}
>+
> static unsigned long slots_fetch_random(void)
> {
> unsigned long slot;
>--
>2.13.6
>
>
>



2018-05-17 04:04:40

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH 1/2] x86/boot/KASLR: Add two functions for 1GB huge pages handling

Hi Chao,

On 05/17/18 at 11:27am, Chao Fan wrote:
> >+/* Store the number of 1GB huge pages which user specified.*/
> >+static unsigned long max_gb_huge_pages;
> >+
> >+static int parse_gb_huge_pages(char *param, char* val)
> >+{
> >+ char *p;
> >+ u64 mem_size;
> >+ static bool gbpage_sz = false;
> >+
> >+ if (!strcmp(param, "hugepagesz")) {
> >+ p = val;
> >+ mem_size = memparse(p, &p);
> >+ if (mem_size == PUD_SIZE) {
> >+ if (gbpage_sz)
> >+ warn("Repeadly set hugeTLB page size of 1G!\n");
> >+ gbpage_sz = true;
> >+ } else
> >+ gbpage_sz = false;
> >+ } else if (!strcmp(param, "hugepages") && gbpage_sz) {
> >+ p = val;
> >+ max_gb_huge_pages = simple_strtoull(p, &p, 0);
> >+ debug_putaddr(max_gb_huge_pages);
> >+ }
> >+}
> >+
> >+
> > static int handle_mem_memmap(void)
> > {
> > char *args = (char *)get_cmd_line_ptr();
> >@@ -466,6 +492,51 @@ static void store_slot_info(struct mem_vector *region, unsigned long image_size)
> > }
> > }
> >
> >+/* Skip as many 1GB huge pages as possible in the passed region. */
> >+static void process_gb_huge_page(struct mem_vector *region, unsigned long image_size)
> >+{
> >+ int i = 0;
> >+ unsigned long addr, size;
> >+ struct mem_vector tmp;
> >+
> >+ if (!max_gb_huge_pages) {
> >+ store_slot_info(region, image_size);
> >+ return;
> >+ }
> >+
> >+ addr = ALIGN(region->start, PUD_SIZE);
> >+ /* If Did we raise the address above the passed in memory entry? */
> >+ if (addr < region->start + region->size)
> >+ size = region->size - (addr - region->start);
> >+
> >+ /* Check how many 1GB huge pages can be filtered out*/
> >+ while (size > PUD_SIZE && max_gb_huge_pages) {
> >+ size -= PUD_SIZE;
> >+ max_gb_huge_pages--;
>
> The global variable 'max_gb_huge_pages' means how many huge pages
> user specified when you get it from command line.
> But here, everytime we find a position which is good for huge page
> allocation, the 'max_gdb_huge_page' decreased. So in my understanding,
> it is used to store how many huge pages that we still need to search memory
> for good slots to filter out, right?
> If it's right, maybe the name 'max_gb_huge_pages' is not very suitable.
> If my understanding is wrong, please tell me.

No, you have understood it very right. I finished the draft patch last
week, but changed this variable name and the function names several
time, still I feel they are not good. However I can't get a better name.

Yes, 'max_gb_huge_pages' stores how many 1GB huge pages are expected
from kernel command-line. And in this function it will be decreased. But
we can't define another global variable only for decreasing in this
place.

And you can see that in this patchset I only take cares of 1GB huge
pages. While on x86 we have two kinds of huge pages, 2MB and 1GB, why
1GB only? Because 2MB is not impacted by KASLR, please check the code in
hugetlb_nrpages_setup() of mm/hugetlb.c . Only 1GB huge pages need be
pre-allocated in hugetlb_nrpages_setup(), and if you look into
hugetlb_nrpages_setup(), you will find that it will call
alloc_bootmem_huge_page() to allocate huge pages one by one, but not at
one time. That is why I always add 'gb' in the middle of the global
variable and the newly added functions.

And it will answer your below questions. When walk over all memory
regions, 'max_gb_huge_pages' is still not 0, what should we do? It's
normal and done as expected. Here hugetlb only try its best to allocate
as many as possible according to 'max_gb_huge_pages'. If can't fully
satisfied, it's fine. E.g on bare-metal machine with 16GB RAM, you add
below to command-line:

default_hugepagesz=1G hugepagesz=1G hugepages=20

Then it will get 14 good 1GB huge pages with kaslr disabled since [0,1G)
and [3G,4G) are touched by bios reservation and pci/firmware reservation.
Then this 14 huge pages are maximal value which is expected. It's not a
bug in huge page. But with kaslr enabled, it sometime only get 13 1GB
huge pages because KASLR put kernel into one of those good 1GB huge
pages. This is a bug.

I am not very familiar with huge page handling, just read code recently
because of this kaslr bug. Hope Luiz and people from his team can help
correct and clarify if anything is not right. Especially the function
names, I feel it's not good, if anyone have a better idea, I will really
appreciate that.
>
> >+ i++;
> >+ }
> >+
> >+ if (!i) {
> >+ store_slot_info(region, image_size);
> >+ return;
> >+ }
> >+
> >+ /* Process the remaining regions after filtering out. */
> >+
> This line may be unusable.

Hmm, I made it on purpose. Because 1GB huge pages may be digged out from
the middle, then the remaing head and tail regions still need be
handled. I put it here to mean that it covers below two code blocks.

I can remove it if people think it's not appropriate.

> >+ if (addr >= region->start + image_size) {
> >+ tmp.start = region->start;
> >+ tmp.size = addr - region->start;
> >+ store_slot_info(&tmp, image_size);
> >+ }
> >+
> >+ size = region->size - (addr - region->start) - i * PUD_SIZE;
> >+ if (size >= image_size) {
> >+ tmp.start = addr + i*PUD_SIZE;
> >+ tmp.size = size;
> >+ store_slot_info(&tmp, image_size);
> >+ }
>
> I have another question not related to kaslr.
> Here you try to avoid the memory from addr to (addr + i * PUD_SIZE),
> but I wonder if after walking all memory regions, 'max_gb_huge_pages'
> is still more than 0, which means there isn't enough memory slots for
> huge page, what will happen?

Please check the response at the beginning of response.

Thanks
Baoquan

>
>
> >+}
> >+
> > static unsigned long slots_fetch_random(void)
> > {
> > unsigned long slot;
> >--
> >2.13.6
> >
> >
> >
>
>

2018-05-17 05:13:02

by Damian Tometzki

[permalink] [raw]
Subject: Re: [PATCH 1/2] x86/boot/KASLR: Add two functions for 1GB huge pages handling

On Wed, 16. May 18:05, Baoquan He wrote:
> Functions parse_gb_huge_pages() and process_gb_huge_page() are introduced to
> handle conflict between KASLR and huge pages, will be used in the next patch.
>
> Function parse_gb_huge_pages() is used to parse kernel command-line to get
> how many 1GB huge pages have been specified. A static global variable
> 'max_gb_huge_pages' is added to store the number.
>
> And process_gb_huge_page() is used to skip as many 1GB huge pages as possible
> from the passed in memory region according to the specified number.
>
> Signed-off-by: Baoquan He <[email protected]>
> ---
> arch/x86/boot/compressed/kaslr.c | 71 ++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 71 insertions(+)
>
> diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
> index a0a50b91ecef..13bd879cdc5d 100644
> --- a/arch/x86/boot/compressed/kaslr.c
> +++ b/arch/x86/boot/compressed/kaslr.c
> @@ -215,6 +215,32 @@ static void mem_avoid_memmap(char *str)
> memmap_too_large = true;
> }
>
> +/* Store the number of 1GB huge pages which user specified.*/
> +static unsigned long max_gb_huge_pages;
> +
> +static int parse_gb_huge_pages(char *param, char* val)
> +{
> + char *p;
> + u64 mem_size;
> + static bool gbpage_sz = false;
> +
> + if (!strcmp(param, "hugepagesz")) {
> + p = val;
> + mem_size = memparse(p, &p);
> + if (mem_size == PUD_SIZE) {
> + if (gbpage_sz)
> + warn("Repeadly set hugeTLB page size of 1G!\n");
> + gbpage_sz = true;
> + } else
> + gbpage_sz = false;
> + } else if (!strcmp(param, "hugepages") && gbpage_sz) {
> + p = val;
> + max_gb_huge_pages = simple_strtoull(p, &p, 0);
> + debug_putaddr(max_gb_huge_pages);
> + }
> +}
Hello,

the return value is missing for the function or not ?

regards
Damian

> +
> +
> static int handle_mem_memmap(void)
> {
> char *args = (char *)get_cmd_line_ptr();
> @@ -466,6 +492,51 @@ static void store_slot_info(struct mem_vector *region, unsigned long image_size)
> }
> }
>
> +/* Skip as many 1GB huge pages as possible in the passed region. */
> +static void process_gb_huge_page(struct mem_vector *region, unsigned long image_size)
> +{
> + int i = 0;
> + unsigned long addr, size;
> + struct mem_vector tmp;
> +
> + if (!max_gb_huge_pages) {
> + store_slot_info(region, image_size);
> + return;
> + }
> +
> + addr = ALIGN(region->start, PUD_SIZE);
> + /* If Did we raise the address above the passed in memory entry? */
> + if (addr < region->start + region->size)
> + size = region->size - (addr - region->start);
> +
> + /* Check how many 1GB huge pages can be filtered out*/
> + while (size > PUD_SIZE && max_gb_huge_pages) {
> + size -= PUD_SIZE;
> + max_gb_huge_pages--;
> + i++;
> + }
> +
> + if (!i) {
> + store_slot_info(region, image_size);
> + return;
> + }
> +
> + /* Process the remaining regions after filtering out. */
> +
> + if (addr >= region->start + image_size) {
> + tmp.start = region->start;
> + tmp.size = addr - region->start;
> + store_slot_info(&tmp, image_size);
> + }
> +
> + size = region->size - (addr - region->start) - i * PUD_SIZE;
> + if (size >= image_size) {
> + tmp.start = addr + i*PUD_SIZE;
> + tmp.size = size;
> + store_slot_info(&tmp, image_size);
> + }
> +}
> +
> static unsigned long slots_fetch_random(void)
> {
> unsigned long slot;
> --
> 2.13.6
>

2018-05-17 05:39:28

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH 1/2] x86/boot/KASLR: Add two functions for 1GB huge pages handling

On 05/17/18 at 07:12am, damian wrote:
> On Wed, 16. May 18:05, Baoquan He wrote:
> > Functions parse_gb_huge_pages() and process_gb_huge_page() are introduced to
> > handle conflict between KASLR and huge pages, will be used in the next patch.
> >
> > Function parse_gb_huge_pages() is used to parse kernel command-line to get
> > how many 1GB huge pages have been specified. A static global variable
> > 'max_gb_huge_pages' is added to store the number.
> >
> > And process_gb_huge_page() is used to skip as many 1GB huge pages as possible
> > from the passed in memory region according to the specified number.
> >
> > Signed-off-by: Baoquan He <[email protected]>
> > ---
> > arch/x86/boot/compressed/kaslr.c | 71 ++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 71 insertions(+)
> >
> > diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
> > index a0a50b91ecef..13bd879cdc5d 100644
> > --- a/arch/x86/boot/compressed/kaslr.c
> > +++ b/arch/x86/boot/compressed/kaslr.c
> > @@ -215,6 +215,32 @@ static void mem_avoid_memmap(char *str)
> > memmap_too_large = true;
> > }
> >
> > +/* Store the number of 1GB huge pages which user specified.*/
> > +static unsigned long max_gb_huge_pages;
> > +
> > +static int parse_gb_huge_pages(char *param, char* val)
> > +{
> > + char *p;
> > + u64 mem_size;
> > + static bool gbpage_sz = false;
> > +
> > + if (!strcmp(param, "hugepagesz")) {
> > + p = val;
> > + mem_size = memparse(p, &p);
> > + if (mem_size == PUD_SIZE) {
> > + if (gbpage_sz)
> > + warn("Repeadly set hugeTLB page size of 1G!\n");
> > + gbpage_sz = true;
> > + } else
> > + gbpage_sz = false;
> > + } else if (!strcmp(param, "hugepages") && gbpage_sz) {
> > + p = val;
> > + max_gb_huge_pages = simple_strtoull(p, &p, 0);
> > + debug_putaddr(max_gb_huge_pages);
> > + }
> > +}
> Hello,
>
> the return value is missing for the function or not ?

Thanks, very good catch. It should be 'static void ', no return value
since we didn't check it. Will update when repost.

>
> > +
> > +
> > static int handle_mem_memmap(void)
> > {
> > char *args = (char *)get_cmd_line_ptr();
> > @@ -466,6 +492,51 @@ static void store_slot_info(struct mem_vector *region, unsigned long image_size)
> > }
> > }
> >
> > +/* Skip as many 1GB huge pages as possible in the passed region. */
> > +static void process_gb_huge_page(struct mem_vector *region, unsigned long image_size)
> > +{
> > + int i = 0;
> > + unsigned long addr, size;
> > + struct mem_vector tmp;
> > +
> > + if (!max_gb_huge_pages) {
> > + store_slot_info(region, image_size);
> > + return;
> > + }
> > +
> > + addr = ALIGN(region->start, PUD_SIZE);
> > + /* If Did we raise the address above the passed in memory entry? */
> > + if (addr < region->start + region->size)
> > + size = region->size - (addr - region->start);
> > +
> > + /* Check how many 1GB huge pages can be filtered out*/
> > + while (size > PUD_SIZE && max_gb_huge_pages) {
> > + size -= PUD_SIZE;
> > + max_gb_huge_pages--;
> > + i++;
> > + }
> > +
> > + if (!i) {
> > + store_slot_info(region, image_size);
> > + return;
> > + }
> > +
> > + /* Process the remaining regions after filtering out. */
> > +
> > + if (addr >= region->start + image_size) {
> > + tmp.start = region->start;
> > + tmp.size = addr - region->start;
> > + store_slot_info(&tmp, image_size);
> > + }
> > +
> > + size = region->size - (addr - region->start) - i * PUD_SIZE;
> > + if (size >= image_size) {
> > + tmp.start = addr + i*PUD_SIZE;
> > + tmp.size = size;
> > + store_slot_info(&tmp, image_size);
> > + }
> > +}
> > +
> > static unsigned long slots_fetch_random(void)
> > {
> > unsigned long slot;
> > --
> > 2.13.6
> >

2018-05-17 05:54:39

by Chao Fan

[permalink] [raw]
Subject: Re: [PATCH 1/2] x86/boot/KASLR: Add two functions for 1GB huge pages handling

On Thu, May 17, 2018 at 12:03:43PM +0800, Baoquan He wrote:
>Hi Chao,
>
>On 05/17/18 at 11:27am, Chao Fan wrote:
>> >+/* Store the number of 1GB huge pages which user specified.*/
>> >+static unsigned long max_gb_huge_pages;
>> >+
>> >+static int parse_gb_huge_pages(char *param, char* val)
>> >+{
>> >+ char *p;
>> >+ u64 mem_size;
>> >+ static bool gbpage_sz = false;
>> >+
>> >+ if (!strcmp(param, "hugepagesz")) {
>> >+ p = val;
>> >+ mem_size = memparse(p, &p);
>> >+ if (mem_size == PUD_SIZE) {
>> >+ if (gbpage_sz)
>> >+ warn("Repeadly set hugeTLB page size of 1G!\n");
>> >+ gbpage_sz = true;
>> >+ } else
>> >+ gbpage_sz = false;
>> >+ } else if (!strcmp(param, "hugepages") && gbpage_sz) {
>> >+ p = val;
>> >+ max_gb_huge_pages = simple_strtoull(p, &p, 0);
>> >+ debug_putaddr(max_gb_huge_pages);
>> >+ }
>> >+}
>> >+
>> >+
>> > static int handle_mem_memmap(void)
>> > {
>> > char *args = (char *)get_cmd_line_ptr();
>> >@@ -466,6 +492,51 @@ static void store_slot_info(struct mem_vector *region, unsigned long image_size)
>> > }
>> > }
>> >
>> >+/* Skip as many 1GB huge pages as possible in the passed region. */
>> >+static void process_gb_huge_page(struct mem_vector *region, unsigned long image_size)
>> >+{
>> >+ int i = 0;
>> >+ unsigned long addr, size;
>> >+ struct mem_vector tmp;
>> >+
>> >+ if (!max_gb_huge_pages) {
>> >+ store_slot_info(region, image_size);
>> >+ return;
>> >+ }
>> >+
>> >+ addr = ALIGN(region->start, PUD_SIZE);
>> >+ /* If Did we raise the address above the passed in memory entry? */
>> >+ if (addr < region->start + region->size)
>> >+ size = region->size - (addr - region->start);
>> >+
>> >+ /* Check how many 1GB huge pages can be filtered out*/
>> >+ while (size > PUD_SIZE && max_gb_huge_pages) {
>> >+ size -= PUD_SIZE;
>> >+ max_gb_huge_pages--;
>>
>> The global variable 'max_gb_huge_pages' means how many huge pages
>> user specified when you get it from command line.
>> But here, everytime we find a position which is good for huge page
>> allocation, the 'max_gdb_huge_page' decreased. So in my understanding,
>> it is used to store how many huge pages that we still need to search memory
>> for good slots to filter out, right?
>> If it's right, maybe the name 'max_gb_huge_pages' is not very suitable.
>> If my understanding is wrong, please tell me.
>
>No, you have understood it very right. I finished the draft patch last
>week, but changed this variable name and the function names several
>time, still I feel they are not good. However I can't get a better name.
>
>Yes, 'max_gb_huge_pages' stores how many 1GB huge pages are expected
>from kernel command-line. And in this function it will be decreased. But
>we can't define another global variable only for decreasing in this
>place.
>
>And you can see that in this patchset I only take cares of 1GB huge
>pages. While on x86 we have two kinds of huge pages, 2MB and 1GB, why
>1GB only? Because 2MB is not impacted by KASLR, please check the code in
>hugetlb_nrpages_setup() of mm/hugetlb.c . Only 1GB huge pages need be
>pre-allocated in hugetlb_nrpages_setup(), and if you look into
>hugetlb_nrpages_setup(), you will find that it will call
>alloc_bootmem_huge_page() to allocate huge pages one by one, but not at
>one time. That is why I always add 'gb' in the middle of the global
>variable and the newly added functions.
>
>And it will answer your below questions. When walk over all memory
>regions, 'max_gb_huge_pages' is still not 0, what should we do? It's
>normal and done as expected. Here hugetlb only try its best to allocate
>as many as possible according to 'max_gb_huge_pages'. If can't fully
>satisfied, it's fine. E.g on bare-metal machine with 16GB RAM, you add
>below to command-line:
>
>default_hugepagesz=1G hugepagesz=1G hugepages=20
>
>Then it will get 14 good 1GB huge pages with kaslr disabled since [0,1G)
>and [3G,4G) are touched by bios reservation and pci/firmware reservation.
>Then this 14 huge pages are maximal value which is expected. It's not a
>bug in huge page. But with kaslr enabled, it sometime only get 13 1GB
>huge pages because KASLR put kernel into one of those good 1GB huge
>pages. This is a bug.

Thanks for your explaination, I got it.

>
>I am not very familiar with huge page handling, just read code recently
>because of this kaslr bug. Hope Luiz and people from his team can help
>correct and clarify if anything is not right. Especially the function
>names, I feel it's not good, if anyone have a better idea, I will really
>appreciate that.
>>
>> >+ i++;
>> >+ }
>> >+
>> >+ if (!i) {
>> >+ store_slot_info(region, image_size);
>> >+ return;
>> >+ }
>> >+
>> >+ /* Process the remaining regions after filtering out. */
>> >+
>> This line may be unusable.
>
>Hmm, I made it on purpose. Because 1GB huge pages may be digged out from
>the middle, then the remaing head and tail regions still need be
>handled. I put it here to mean that it covers below two code blocks.
>

Yes, the two parts below are all in the condition when if(!i) is false.
The first part is the memory before good slots for huge pages,
the second part is after.

>I can remove it if people think it's not appropriate.
>
>> >+ if (addr >= region->start + image_size) {
>> >+ tmp.start = region->start;
>> >+ tmp.size = addr - region->start;
>> >+ store_slot_info(&tmp, image_size);
>> >+ }
>> >+
>> >+ size = region->size - (addr - region->start) - i * PUD_SIZE;
>> >+ if (size >= image_size) {
>> >+ tmp.start = addr + i*PUD_SIZE;
>> >+ tmp.size = size;
>> >+ store_slot_info(&tmp, image_size);
>> >+ }
These 5 lines may have a wrong space, you can check it.

Thanks,
Chao Fan

>>
>> I have another question not related to kaslr.
>> Here you try to avoid the memory from addr to (addr + i * PUD_SIZE),
>> but I wonder if after walking all memory regions, 'max_gb_huge_pages'
>> is still more than 0, which means there isn't enough memory slots for
>> huge page, what will happen?
>
>Please check the response at the beginning of response.
>
>Thanks
>Baoquan
>
>>
>>
>> >+}
>> >+
>> > static unsigned long slots_fetch_random(void)
>> > {
>> > unsigned long slot;
>> >--
>> >2.13.6
>> >
>> >
>> >
>>
>>
>
>



2018-05-17 06:15:07

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH 1/2] x86/boot/KASLR: Add two functions for 1GB huge pages handling

On 05/17/18 at 01:53pm, Chao Fan wrote:
> On Thu, May 17, 2018 at 12:03:43PM +0800, Baoquan He wrote:
> >Hi Chao,
> >
> >On 05/17/18 at 11:27am, Chao Fan wrote:
> >> >+/* Store the number of 1GB huge pages which user specified.*/
> >> >+static unsigned long max_gb_huge_pages;
> >> >+
> >> >+static int parse_gb_huge_pages(char *param, char* val)
> >> >+{
> >> >+ char *p;
> >> >+ u64 mem_size;
> >> >+ static bool gbpage_sz = false;
> >> >+
> >> >+ if (!strcmp(param, "hugepagesz")) {
> >> >+ p = val;
> >> >+ mem_size = memparse(p, &p);
> >> >+ if (mem_size == PUD_SIZE) {
> >> >+ if (gbpage_sz)
> >> >+ warn("Repeadly set hugeTLB page size of 1G!\n");
> >> >+ gbpage_sz = true;
> >> >+ } else
> >> >+ gbpage_sz = false;
> >> >+ } else if (!strcmp(param, "hugepages") && gbpage_sz) {
> >> >+ p = val;
> >> >+ max_gb_huge_pages = simple_strtoull(p, &p, 0);
> >> >+ debug_putaddr(max_gb_huge_pages);
> >> >+ }
> >> >+}
> >> >+
> >> >+
> >> > static int handle_mem_memmap(void)
> >> > {
> >> > char *args = (char *)get_cmd_line_ptr();
> >> >@@ -466,6 +492,51 @@ static void store_slot_info(struct mem_vector *region, unsigned long image_size)
> >> > }
> >> > }
> >> >
> >> >+/* Skip as many 1GB huge pages as possible in the passed region. */
> >> >+static void process_gb_huge_page(struct mem_vector *region, unsigned long image_size)
> >> >+{
> >> >+ int i = 0;
> >> >+ unsigned long addr, size;
> >> >+ struct mem_vector tmp;
> >> >+
> >> >+ if (!max_gb_huge_pages) {
> >> >+ store_slot_info(region, image_size);
> >> >+ return;
> >> >+ }
> >> >+
> >> >+ addr = ALIGN(region->start, PUD_SIZE);
> >> >+ /* If Did we raise the address above the passed in memory entry? */
> >> >+ if (addr < region->start + region->size)
> >> >+ size = region->size - (addr - region->start);
> >> >+
> >> >+ /* Check how many 1GB huge pages can be filtered out*/
> >> >+ while (size > PUD_SIZE && max_gb_huge_pages) {
> >> >+ size -= PUD_SIZE;
> >> >+ max_gb_huge_pages--;
> >>
> >> The global variable 'max_gb_huge_pages' means how many huge pages
> >> user specified when you get it from command line.
> >> But here, everytime we find a position which is good for huge page
> >> allocation, the 'max_gdb_huge_page' decreased. So in my understanding,
> >> it is used to store how many huge pages that we still need to search memory
> >> for good slots to filter out, right?
> >> If it's right, maybe the name 'max_gb_huge_pages' is not very suitable.
> >> If my understanding is wrong, please tell me.
> >
> >No, you have understood it very right. I finished the draft patch last
> >week, but changed this variable name and the function names several
> >time, still I feel they are not good. However I can't get a better name.
> >
> >Yes, 'max_gb_huge_pages' stores how many 1GB huge pages are expected
> >from kernel command-line. And in this function it will be decreased. But
> >we can't define another global variable only for decreasing in this
> >place.
> >
> >And you can see that in this patchset I only take cares of 1GB huge
> >pages. While on x86 we have two kinds of huge pages, 2MB and 1GB, why
> >1GB only? Because 2MB is not impacted by KASLR, please check the code in
> >hugetlb_nrpages_setup() of mm/hugetlb.c . Only 1GB huge pages need be
> >pre-allocated in hugetlb_nrpages_setup(), and if you look into
> >hugetlb_nrpages_setup(), you will find that it will call
> >alloc_bootmem_huge_page() to allocate huge pages one by one, but not at
> >one time. That is why I always add 'gb' in the middle of the global
> >variable and the newly added functions.
> >
> >And it will answer your below questions. When walk over all memory
> >regions, 'max_gb_huge_pages' is still not 0, what should we do? It's
> >normal and done as expected. Here hugetlb only try its best to allocate
> >as many as possible according to 'max_gb_huge_pages'. If can't fully
> >satisfied, it's fine. E.g on bare-metal machine with 16GB RAM, you add
> >below to command-line:
> >
> >default_hugepagesz=1G hugepagesz=1G hugepages=20
> >
> >Then it will get 14 good 1GB huge pages with kaslr disabled since [0,1G)
> >and [3G,4G) are touched by bios reservation and pci/firmware reservation.
> >Then this 14 huge pages are maximal value which is expected. It's not a
> >bug in huge page. But with kaslr enabled, it sometime only get 13 1GB
> >huge pages because KASLR put kernel into one of those good 1GB huge
> >pages. This is a bug.
>
> Thanks for your explaination, I got it.
>
> >
> >I am not very familiar with huge page handling, just read code recently
> >because of this kaslr bug. Hope Luiz and people from his team can help
> >correct and clarify if anything is not right. Especially the function
> >names, I feel it's not good, if anyone have a better idea, I will really
> >appreciate that.
> >>
> >> >+ i++;
> >> >+ }
> >> >+
> >> >+ if (!i) {
> >> >+ store_slot_info(region, image_size);
> >> >+ return;
> >> >+ }
> >> >+
> >> >+ /* Process the remaining regions after filtering out. */
> >> >+
> >> This line may be unusable.
> >
> >Hmm, I made it on purpose. Because 1GB huge pages may be digged out from
> >the middle, then the remaing head and tail regions still need be
> >handled. I put it here to mean that it covers below two code blocks.
> >
>
> Yes, the two parts below are all in the condition when if(!i) is false.
> The first part is the memory before good slots for huge pages,
> the second part is after.
>
> >I can remove it if people think it's not appropriate.
> >
> >> >+ if (addr >= region->start + image_size) {
> >> >+ tmp.start = region->start;
> >> >+ tmp.size = addr - region->start;
> >> >+ store_slot_info(&tmp, image_size);
> >> >+ }
> >> >+
> >> >+ size = region->size - (addr - region->start) - i * PUD_SIZE;
> >> >+ if (size >= image_size) {
> >> >+ tmp.start = addr + i*PUD_SIZE;
> >> >+ tmp.size = size;
> >> >+ store_slot_info(&tmp, image_size);
> >> >+ }
> These 5 lines may have a wrong space, you can check it.

Yes, will replace it with tab. Thanks.

>
> >>
> >> I have another question not related to kaslr.
> >> Here you try to avoid the memory from addr to (addr + i * PUD_SIZE),
> >> but I wonder if after walking all memory regions, 'max_gb_huge_pages'
> >> is still more than 0, which means there isn't enough memory slots for
> >> huge page, what will happen?
> >
> >Please check the response at the beginning of response.
> >
> >Thanks
> >Baoquan
> >
> >>
> >>
> >> >+}
> >> >+
> >> > static unsigned long slots_fetch_random(void)
> >> > {
> >> > unsigned long slot;
> >> >--
> >> >2.13.6
> >> >
> >> >
> >> >
> >>
> >>
> >
> >
>
>

2018-05-18 07:02:00

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/2] x86/boot/KASLR: Skip specified number of 1GB huge pages when do physical randomization


* Baoquan He <[email protected]> wrote:

> This is a regression bug fix. Luiz's team reported that 1GB huge page
> allocation will get one less 1GB page randomly when KASLR is enabled. On
> their KVM guest with 4GB RAM, which only has one good 1GB huge page,
> they found the 1GB huge page allocation sometime failed with below
> kernel option adding.
>
> default_hugepagesz=1G hugepagesz=1G hugepages=1
>
> This is because kernel may be randomized into those good 1GB huge pages.
>
> I ever thought to solve this by specifying available memory regions
> which kernel KASLR can be randomized into to avoid those good 1GB huge
> pages. Chao's patches can be used to fix it:
> https://lkml.org/lkml/2018/2/28/217
>
> Later, Ingo suggested avoiding them in boot KASLR code.
> https://lkml.org/lkml/2018/3/12/312

Yes, but these patches don't appear to implement what I suggested:

> So there's apparently a mis-design here:
>
> - KASLR needs to be done very early on during bootup: - it's not realistic to
> expect KASLR to be done with a booted up kernel, because pointers to various
> KASLR-ed objects are already widely spread out in memory.
>
> - But for some unfathomable reason the memory hotplug attribute of memory
> regions is not part of the regular memory map but part of late-init ACPI data
> structures.
>
> The right solution would be _not_ to fudge the KASLR location, but to provide
> the memory hotplug information to early code, preferably via the primary memory
> map. KASLR can then make use of it and avoid those regions, just like it avoids
> other memory regions already.
>
> In addition to that hardware makers (including virtualized hardware) should also
> fix their systems to provide memory hotplug information to early code.

So my question: why don't we pass in the information that these are hotplug pages
that should not be KASLR randomized into?

If that attribute of memory regions was present then KASLR could simply skip the
hotplug regions!

Thanks,

Ingo

2018-05-18 07:44:41

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH 0/2] x86/boot/KASLR: Skip specified number of 1GB huge pages when do physical randomization

On 05/18/18 at 09:00am, Ingo Molnar wrote:
>
> * Baoquan He <[email protected]> wrote:
>
> > This is a regression bug fix. Luiz's team reported that 1GB huge page
> > allocation will get one less 1GB page randomly when KASLR is enabled. On
> > their KVM guest with 4GB RAM, which only has one good 1GB huge page,
> > they found the 1GB huge page allocation sometime failed with below
> > kernel option adding.
> >
> > default_hugepagesz=1G hugepagesz=1G hugepages=1
> >
> > This is because kernel may be randomized into those good 1GB huge pages.
> >
> > I ever thought to solve this by specifying available memory regions
> > which kernel KASLR can be randomized into to avoid those good 1GB huge
> > pages. Chao's patches can be used to fix it:
> > https://lkml.org/lkml/2018/2/28/217
> >
> > Later, Ingo suggested avoiding them in boot KASLR code.
> > https://lkml.org/lkml/2018/3/12/312
>
> Yes, but these patches don't appear to implement what I suggested:
>
> > So there's apparently a mis-design here:
> >
> > - KASLR needs to be done very early on during bootup: - it's not realistic to
> > expect KASLR to be done with a booted up kernel, because pointers to various
> > KASLR-ed objects are already widely spread out in memory.
> >
> > - But for some unfathomable reason the memory hotplug attribute of memory
> > regions is not part of the regular memory map but part of late-init ACPI data
> > structures.
> >
> > The right solution would be _not_ to fudge the KASLR location, but to provide
> > the memory hotplug information to early code, preferably via the primary memory
> > map. KASLR can then make use of it and avoid those regions, just like it avoids
> > other memory regions already.
> >
> > In addition to that hardware makers (including virtualized hardware) should also
> > fix their systems to provide memory hotplug information to early code.
>
> So my question: why don't we pass in the information that these are hotplug pages
> that should not be KASLR randomized into?
>
> If that attribute of memory regions was present then KASLR could simply skip the
> hotplug regions!

OK, I realized my saying above is misled because I didn't explain the
background clearly. Let me add it:

Previously, FJ reported the movable_node issue that KASLR will put
kernel into movable_node. That cause those movable_nodes can't be hot
plugged any more. So finally we plannned to solve it by adding a new
kernel parameter :

kaslr_boot_mem=nn[KMG]@ss[KMG]

We want customer to specify memory regions which KASLR can make use to
randomize kernel into. Outside of the specified regions, we need avoid
to put kernel into those regions even though they are also available
RAM. As for movable_node issue, we can add immovable regions into
kaslr_boot_mem=nn[KMG]@ss[KMG].

During this hotplug issue reviewing, Luiz's team reported this 1GB hugepages
regression bug, I reproduced the bug and found out the root cause, then
realized that I can utilize kaslr_boot_mem=nn[KMG]@ss[KMG] parameter to
fix it too. E.g the KVM guest with 4GB RAM, we have a good 1GB huge
page, then we can add "kaslr_boot_mem=1G@0, kaslr_boot_mem=3G@2G" to
kernel command-line, then the good 1GB region [1G, 2G) won't be taken
into account for kernel physical randomization.

Later, you pointed out that 'kaslr_boot_mem=' way need user to specify
memory region manually, it's not good, suggested to solve them by
getting information and solving them in KASLR boot code. So they are two
issues now, for the movable_node issue, we need get hotplug information
from SRAT table and then avoid them; for this 1GB hugepage issue, we
need get information from kernel command-line, then avoid them.

This patch is for the hugepage issue only. Since FJ reported the hotplug
issue and they assigned engineers to work on it, I would like to wait
for them to post according to your suggestion.

I will add this to cover letter of v2 post.

Thanks
Baoquan

2018-05-18 09:11:41

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/2] x86/boot/KASLR: Skip specified number of 1GB huge pages when do physical randomization


* Baoquan He <[email protected]> wrote:

> OK, I realized my saying above is misled because I didn't explain the
> background clearly. Let me add it:
>
> Previously, FJ reported the movable_node issue that KASLR will put
> kernel into movable_node. That cause those movable_nodes can't be hot
> plugged any more. So finally we plannned to solve it by adding a new
> kernel parameter :
>
> kaslr_boot_mem=nn[KMG]@ss[KMG]
>
> We want customer to specify memory regions which KASLR can make use to
> randomize kernel into.

*WHY* should the "customer" care?

This is a _bug_: movable, hotpluggable zones of physical memory should not be
randomized into.

> [...] Outside of the specified regions, we need avoid to put kernel into those
> regions even though they are also available RAM. As for movable_node issue, we
> can add immovable regions into kaslr_boot_mem=nn[KMG]@ss[KMG].
>
> During this hotplug issue reviewing, Luiz's team reported this 1GB hugepages
> regression bug, I reproduced the bug and found out the root cause, then
> realized that I can utilize kaslr_boot_mem=nn[KMG]@ss[KMG] parameter to
> fix it too. E.g the KVM guest with 4GB RAM, we have a good 1GB huge
> page, then we can add "kaslr_boot_mem=1G@0, kaslr_boot_mem=3G@2G" to
> kernel command-line, then the good 1GB region [1G, 2G) won't be taken
> into account for kernel physical randomization.
>
> Later, you pointed out that 'kaslr_boot_mem=' way need user to specify
> memory region manually, it's not good, suggested to solve them by
> getting information and solving them in KASLR boot code. So they are two
> issues now, for the movable_node issue, we need get hotplug information
> from SRAT table and then avoid them; for this 1GB hugepage issue, we
> need get information from kernel command-line, then avoid them.
>
> This patch is for the hugepage issue only. Since FJ reported the hotplug
> issue and they assigned engineers to work on it, I would like to wait
> for them to post according to your suggestion.

All of this is handling it the wrong way about. This is *not* primarily about
KASLR at all, and the user should not be required to specify some weird KASLR
parameters.

This is a basic _memory map enumeration_ problem in both cases:

- in the hotplug case KASLR doesn't know that it's a movable zone and relocates
into it,

- and in the KVM case KASLR doesn't know that it's a valuable 1GB page that
shouldn't be broken up.

Note that it's not KASLR specific: if we had some other kernel feature that tried
to allocate a piece of memory from what appears to be perfectly usable generic RAM
we'd have the same problems!

We need to fix the real root problem, which is lack of knowledge about crutial
attributes of physical memory. Once that knowledge is properly represented at this
early boot stage both KASLR and other memory allocators can make use of it to
avoid those regions.

Thanks,

Ingo

2018-05-18 11:29:18

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH 0/2] x86/boot/KASLR: Skip specified number of 1GB huge pages when do physical randomization

On 05/18/18 at 10:19am, Ingo Molnar wrote:
>
> * Baoquan He <[email protected]> wrote:
>
> > OK, I realized my saying above is misled because I didn't explain the
> > background clearly. Let me add it:
> >
> > Previously, FJ reported the movable_node issue that KASLR will put
> > kernel into movable_node. That cause those movable_nodes can't be hot
> > plugged any more. So finally we plannned to solve it by adding a new
> > kernel parameter :
> >
> > kaslr_boot_mem=nn[KMG]@ss[KMG]
> >
> > We want customer to specify memory regions which KASLR can make use to
> > randomize kernel into.
>
> *WHY* should the "customer" care?
>
> This is a _bug_: movable, hotpluggable zones of physical memory should not be
> randomized into.

Yes, for movable zones, agreed.

But for huge pages, it's only related to memory layout.

>
> > [...] Outside of the specified regions, we need avoid to put kernel into those
> > regions even though they are also available RAM. As for movable_node issue, we
> > can add immovable regions into kaslr_boot_mem=nn[KMG]@ss[KMG].
> >
> > During this hotplug issue reviewing, Luiz's team reported this 1GB hugepages
> > regression bug, I reproduced the bug and found out the root cause, then
> > realized that I can utilize kaslr_boot_mem=nn[KMG]@ss[KMG] parameter to
> > fix it too. E.g the KVM guest with 4GB RAM, we have a good 1GB huge
> > page, then we can add "kaslr_boot_mem=1G@0, kaslr_boot_mem=3G@2G" to
> > kernel command-line, then the good 1GB region [1G, 2G) won't be taken
> > into account for kernel physical randomization.
> >
> > Later, you pointed out that 'kaslr_boot_mem=' way need user to specify
> > memory region manually, it's not good, suggested to solve them by
> > getting information and solving them in KASLR boot code. So they are two
> > issues now, for the movable_node issue, we need get hotplug information
> > from SRAT table and then avoid them; for this 1GB hugepage issue, we
> > need get information from kernel command-line, then avoid them.
> >
> > This patch is for the hugepage issue only. Since FJ reported the hotplug
> > issue and they assigned engineers to work on it, I would like to wait
> > for them to post according to your suggestion.
>
> All of this is handling it the wrong way about. This is *not* primarily about
> KASLR at all, and the user should not be required to specify some weird KASLR
> parameters.
>
> This is a basic _memory map enumeration_ problem in both cases:
>
> - in the hotplug case KASLR doesn't know that it's a movable zone and relocates
> into it,

Yes, in boot KASLR, we haven't parsed ACPI table to get hotplug
information. If decide to read SRAT table, we can get if memory region
is hotpluggable, then avoid them. This can be consistent with the later
code after entering kernel.

>
> - and in the KVM case KASLR doesn't know that it's a valuable 1GB page that
> shouldn't be broken up.
>
> Note that it's not KASLR specific: if we had some other kernel feature that tried
> to allocate a piece of memory from what appears to be perfectly usable generic RAM
> we'd have the same problems!

Hmm, this may not be the situation for 1GB huge pages. For 1GB huge
pages, the bug is that on KVM guest with 4GB ram, when user adds
'default_hugepagesz=1G hugepagesz=1G hugepages=1' to kernel
command-line, if 'nokaslr' is specified, they can get 1GB huge page
allocated successfully. If remove 'nokaslr', namely KASLR is enabled,
the 1GB huge page allocation failed.

In hugetlb_nrpages_setup(), you can see that the current huge page code
relies on memblock to get 1GB huge pages. Below is the e820 memory
map from Luiz's bug report. In fact there are two good 1GB huge pages,
one is [0x40000000, 0x7fffffff], the 2nd one is
[0x100000000, 0x13fffffff]. by default memblock will allocate top-down
if movable_node is set, then [0x100000000, 0x13fffffff] will be broken
when system initialization goes into hugetlb_nrpages_setup() invocation.
So normally huge page can only get one good 1GB huge page, whether KASLR
is enanled or not. This is not bug, but decided by the current huge page
implementation. In this case, KASLR boot code can see two good 1GB huge
pages, and try to avoid them. Besides, if it's a good 1GB huge page,
it's not defined in memory map and also not attribute. It's only decided
by the memory layout and also decided by the memory usage situation in
the running system. If want to keep all good 1GB huge pages untouched,
we may need to adjust the current memblock allocation code, to avoid
any possibility to step into good 1GB huge pages before huge page
allocation. However this comes to the improvement area of huge pages
implementation, not related to KASLR.

[ +0.000000] e820: BIOS-provided physical RAM map:
[ +0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[ +0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[ +0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[ +0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bffdffff] usable
[ +0.000000] BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved
[ +0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
[ +0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
[ +0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000013fffffff] usable

Furthermore, on bare-metal with large memory, e.g with 100GB memory, if
user specifies 'default_hugepagesz=1G hugepagesz=1G hugepages=2' to only
expect two 1GB huge pages reserved, if we save all those tens of good
1GB huge pages untouched, it seems to be over reactive.

Not sure if I understand your point correctly, this is my thought about
the huge page issue, please help to point out anything wrong if any.

Thanks
Baoquan
>
> We need to fix the real root problem, which is lack of knowledge about crutial
> attributes of physical memory. Once that knowledge is properly represented at this
> early boot stage both KASLR and other memory allocators can make use of it to
> avoid those regions.
>
> Thanks,
>
> Ingo

2018-05-18 12:15:29

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH 0/2] x86/boot/KASLR: Skip specified number of 1GB huge pages when do physical randomization

On 05/18/18 at 07:28pm, Baoquan He wrote:
> On 05/18/18 at 10:19am, Ingo Molnar wrote:
> >
> > * Baoquan He <[email protected]> wrote:
> >
> > > OK, I realized my saying above is misled because I didn't explain the
> > > background clearly. Let me add it:
> > >
> > > Previously, FJ reported the movable_node issue that KASLR will put
> > > kernel into movable_node. That cause those movable_nodes can't be hot
> > > plugged any more. So finally we plannned to solve it by adding a new
> > > kernel parameter :
> > >
> > > kaslr_boot_mem=nn[KMG]@ss[KMG]
> > >
> > > We want customer to specify memory regions which KASLR can make use to
> > > randomize kernel into.
> >
> > *WHY* should the "customer" care?
> >
> > This is a _bug_: movable, hotpluggable zones of physical memory should not be
> > randomized into.
>
> Yes, for movable zones, agreed.
>
> But for huge pages, it's only related to memory layout.
>
> >
> > > [...] Outside of the specified regions, we need avoid to put kernel into those
> > > regions even though they are also available RAM. As for movable_node issue, we
> > > can add immovable regions into kaslr_boot_mem=nn[KMG]@ss[KMG].
> > >
> > > During this hotplug issue reviewing, Luiz's team reported this 1GB hugepages
> > > regression bug, I reproduced the bug and found out the root cause, then
> > > realized that I can utilize kaslr_boot_mem=nn[KMG]@ss[KMG] parameter to
> > > fix it too. E.g the KVM guest with 4GB RAM, we have a good 1GB huge
> > > page, then we can add "kaslr_boot_mem=1G@0, kaslr_boot_mem=3G@2G" to
> > > kernel command-line, then the good 1GB region [1G, 2G) won't be taken
> > > into account for kernel physical randomization.
> > >
> > > Later, you pointed out that 'kaslr_boot_mem=' way need user to specify
> > > memory region manually, it's not good, suggested to solve them by
> > > getting information and solving them in KASLR boot code. So they are two
> > > issues now, for the movable_node issue, we need get hotplug information
> > > from SRAT table and then avoid them; for this 1GB hugepage issue, we
> > > need get information from kernel command-line, then avoid them.
> > >
> > > This patch is for the hugepage issue only. Since FJ reported the hotplug
> > > issue and they assigned engineers to work on it, I would like to wait
> > > for them to post according to your suggestion.
> >
> > All of this is handling it the wrong way about. This is *not* primarily about
> > KASLR at all, and the user should not be required to specify some weird KASLR
> > parameters.
> >
> > This is a basic _memory map enumeration_ problem in both cases:
> >
> > - in the hotplug case KASLR doesn't know that it's a movable zone and relocates
> > into it,
>
> Yes, in boot KASLR, we haven't parsed ACPI table to get hotplug
> information. If decide to read SRAT table, we can get if memory region
> is hotpluggable, then avoid them. This can be consistent with the later
> code after entering kernel.
>
> >
> > - and in the KVM case KASLR doesn't know that it's a valuable 1GB page that
> > shouldn't be broken up.
> >
> > Note that it's not KASLR specific: if we had some other kernel feature that tried
> > to allocate a piece of memory from what appears to be perfectly usable generic RAM
> > we'd have the same problems!
>
> Hmm, this may not be the situation for 1GB huge pages. For 1GB huge
> pages, the bug is that on KVM guest with 4GB ram, when user adds
> 'default_hugepagesz=1G hugepagesz=1G hugepages=1' to kernel
> command-line, if 'nokaslr' is specified, they can get 1GB huge page
> allocated successfully. If remove 'nokaslr', namely KASLR is enabled,
> the 1GB huge page allocation failed.
>
> In hugetlb_nrpages_setup(), you can see that the current huge page code
> relies on memblock to get 1GB huge pages. Below is the e820 memory
> map from Luiz's bug report. In fact there are two good 1GB huge pages,
> one is [0x40000000, 0x7fffffff], the 2nd one is
> [0x100000000, 0x13fffffff]. by default memblock will allocate top-down
> if movable_node is set, then [0x100000000, 0x13fffffff] will be broken
~not
Sorry, missed 'not'.

void __init setup_arch(char **cmdline_p)
{
...
#ifdef CONFIG_MEMORY_HOTPLUG
if (movable_node_is_enabled())
memblock_set_bottom_up(true);
#endif
...
}

> when system initialization goes into hugetlb_nrpages_setup() invocation.
> So normally huge page can only get one good 1GB huge page, whether KASLR
> is enanled or not. This is not bug, but decided by the current huge page
> implementation. In this case, KASLR boot code can see two good 1GB huge
> pages, and try to avoid them. Besides, if it's a good 1GB huge page,
> it's not defined in memory map and also not attribute. It's only decided
> by the memory layout and also decided by the memory usage situation in
> the running system. If want to keep all good 1GB huge pages untouched,
> we may need to adjust the current memblock allocation code, to avoid
> any possibility to step into good 1GB huge pages before huge page
> allocation. However this comes to the improvement area of huge pages
> implementation, not related to KASLR.
>
> [ +0.000000] e820: BIOS-provided physical RAM map:
> [ +0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
> [ +0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
> [ +0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
> [ +0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bffdffff] usable
> [ +0.000000] BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved
> [ +0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
> [ +0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
> [ +0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000013fffffff] usable
>
> Furthermore, on bare-metal with large memory, e.g with 100GB memory, if
> user specifies 'default_hugepagesz=1G hugepagesz=1G hugepages=2' to only
> expect two 1GB huge pages reserved, if we save all those tens of good
> 1GB huge pages untouched, it seems to be over reactive.
>
> Not sure if I understand your point correctly, this is my thought about
> the huge page issue, please help to point out anything wrong if any.
>
> Thanks
> Baoquan
> >
> > We need to fix the real root problem, which is lack of knowledge about crutial
> > attributes of physical memory. Once that knowledge is properly represented at this
> > early boot stage both KASLR and other memory allocators can make use of it to
> > avoid those regions.
> >
> > Thanks,
> >
> > Ingo

2018-05-23 19:10:52

by Luiz Capitulino

[permalink] [raw]
Subject: Re: [PATCH 0/2] x86/boot/KASLR: Skip specified number of 1GB huge pages when do physical randomization

On Fri, 18 May 2018 19:28:36 +0800
Baoquan He <[email protected]> wrote:

> > Note that it's not KASLR specific: if we had some other kernel feature that tried
> > to allocate a piece of memory from what appears to be perfectly usable generic RAM
> > we'd have the same problems!
>
> Hmm, this may not be the situation for 1GB huge pages. For 1GB huge
> pages, the bug is that on KVM guest with 4GB ram, when user adds
> 'default_hugepagesz=1G hugepagesz=1G hugepages=1' to kernel
> command-line, if 'nokaslr' is specified, they can get 1GB huge page
> allocated successfully. If remove 'nokaslr', namely KASLR is enabled,
> the 1GB huge page allocation failed.

Let me clarify that this issue is not specific to KVM in any way. The same
issue happens on bare-metal, but if you have lots of memory you'll hardly
notice it. On the other hand, it's common to create KVM guests with a few
GBs of memory. In those guests, you may not be able to get a 1GB hugepage
at all if kaslr is enabled.

This series is a simple fix for this bug. It hooks up into already existing
KASLR code that scans memory regions to be avoided. The memory hotplug
issue is left for another day.

Now, if I understand what Ingo is saying is that he wants to see all problems
solved with a generic solution vs. a specific solution for each problem.

2018-05-28 09:54:53

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH 0/2] x86/boot/KASLR: Skip specified number of 1GB huge pages when do physical randomization

On 05/23/18 at 03:10pm, Luiz Capitulino wrote:
> On Fri, 18 May 2018 19:28:36 +0800
> Baoquan He <[email protected]> wrote:
>
> > > Note that it's not KASLR specific: if we had some other kernel feature that tried
> > > to allocate a piece of memory from what appears to be perfectly usable generic RAM
> > > we'd have the same problems!
> >
> > Hmm, this may not be the situation for 1GB huge pages. For 1GB huge
> > pages, the bug is that on KVM guest with 4GB ram, when user adds
> > 'default_hugepagesz=1G hugepagesz=1G hugepages=1' to kernel
> > command-line, if 'nokaslr' is specified, they can get 1GB huge page
> > allocated successfully. If remove 'nokaslr', namely KASLR is enabled,
> > the 1GB huge page allocation failed.
>
> Let me clarify that this issue is not specific to KVM in any way. The same
> issue happens on bare-metal, but if you have lots of memory you'll hardly
> notice it. On the other hand, it's common to create KVM guests with a few
> GBs of memory. In those guests, you may not be able to get a 1GB hugepage
> at all if kaslr is enabled.
>
> This series is a simple fix for this bug. It hooks up into already existing
> KASLR code that scans memory regions to be avoided. The memory hotplug
> issue is left for another day.

Exactly.

This issue is about kernel being randomized into good 1GB huge pages to
break later huge page allocation, and we can only scan memory to know
where 1GB huge page is located and avoid them.

The memory hotplug issue is about kernel being randomized into movable
memory regions, and we need read ACPI SRAT table to retrieve the
attribute of memory regions to know if it's movable, then avoid it if
yes.

>
> Now, if I understand what Ingo is saying is that he wants to see all problems
> solved with a generic solution vs. a specific solution for each problem.

Hmm, if we understand Ingo's words correctly, for these two issues,
seems there isn't a generic solution to solve both of them. We can only
fix them separately.

Hi Ingo,

Ping!

Not sure if my above understanding is correct. Could you confirm if I
have understood your comments and if the solution of this patchset is
right?

Thanks
Baoquan

2018-05-29 13:30:24

by Luiz Capitulino

[permalink] [raw]
Subject: Re: [PATCH 0/2] x86/boot/KASLR: Skip specified number of 1GB huge pages when do physical randomization

On Mon, 28 May 2018 17:54:18 +0800
Baoquan He <[email protected]> wrote:

> On 05/23/18 at 03:10pm, Luiz Capitulino wrote:
> > On Fri, 18 May 2018 19:28:36 +0800
> > Baoquan He <[email protected]> wrote:
> >
> > > > Note that it's not KASLR specific: if we had some other kernel feature that tried
> > > > to allocate a piece of memory from what appears to be perfectly usable generic RAM
> > > > we'd have the same problems!
> > >
> > > Hmm, this may not be the situation for 1GB huge pages. For 1GB huge
> > > pages, the bug is that on KVM guest with 4GB ram, when user adds
> > > 'default_hugepagesz=1G hugepagesz=1G hugepages=1' to kernel
> > > command-line, if 'nokaslr' is specified, they can get 1GB huge page
> > > allocated successfully. If remove 'nokaslr', namely KASLR is enabled,
> > > the 1GB huge page allocation failed.
> >
> > Let me clarify that this issue is not specific to KVM in any way. The same
> > issue happens on bare-metal, but if you have lots of memory you'll hardly
> > notice it. On the other hand, it's common to create KVM guests with a few
> > GBs of memory. In those guests, you may not be able to get a 1GB hugepage
> > at all if kaslr is enabled.
> >
> > This series is a simple fix for this bug. It hooks up into already existing
> > KASLR code that scans memory regions to be avoided. The memory hotplug
> > issue is left for another day.
>
> Exactly.
>
> This issue is about kernel being randomized into good 1GB huge pages to
> break later huge page allocation, and we can only scan memory to know
> where 1GB huge page is located and avoid them.
>
> The memory hotplug issue is about kernel being randomized into movable
> memory regions, and we need read ACPI SRAT table to retrieve the
> attribute of memory regions to know if it's movable, then avoid it if
> yes.

Makes sense. Since the KASLR code already scans memory regions looking
for regions to skip and since this series just uses that, I think this
is a good solution to the problem:

Reviewed-and-Tested-by: Luiz Capitulino <[email protected]>

>
> >
> > Now, if I understand what Ingo is saying is that he wants to see all problems
> > solved with a generic solution vs. a specific solution for each problem.
>
> Hmm, if we understand Ingo's words correctly, for these two issues,
> seems there isn't a generic solution to solve both of them. We can only
> fix them separately.
>
> Hi Ingo,
>
> Ping!
>
> Not sure if my above understanding is correct. Could you confirm if I
> have understood your comments and if the solution of this patchset is
> right?
>
> Thanks
> Baoquan
>


2018-06-21 15:03:00

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 1/2] x86/boot/KASLR: Add two functions for 1GB huge pages handling


* Baoquan He <[email protected]> wrote:

> +/* Store the number of 1GB huge pages which user specified.*/
> +static unsigned long max_gb_huge_pages;
> +
> +static int parse_gb_huge_pages(char *param, char* val)
> +{
> + char *p;
> + u64 mem_size;
> + static bool gbpage_sz = false;
> +
> + if (!strcmp(param, "hugepagesz")) {
> + p = val;
> + mem_size = memparse(p, &p);
> + if (mem_size == PUD_SIZE) {
> + if (gbpage_sz)
> + warn("Repeadly set hugeTLB page size of 1G!\n");
> + gbpage_sz = true;
> + } else
> + gbpage_sz = false;
> + } else if (!strcmp(param, "hugepages") && gbpage_sz) {
> + p = val;
> + max_gb_huge_pages = simple_strtoull(p, &p, 0);
> + debug_putaddr(max_gb_huge_pages);
> + }
> +}

This function has at least one style problem and one typo.

Also, we don't put statics in the middle of function variable blocks.

> +/* Skip as many 1GB huge pages as possible in the passed region. */
> +static void process_gb_huge_page(struct mem_vector *region, unsigned long image_size)
> +{
> + int i = 0;
> + unsigned long addr, size;
> + struct mem_vector tmp;
> +
> + if (!max_gb_huge_pages) {
> + store_slot_info(region, image_size);
> + return;
> + }
> +
> + addr = ALIGN(region->start, PUD_SIZE);
> + /* If Did we raise the address above the passed in memory entry? */
> + if (addr < region->start + region->size)
> + size = region->size - (addr - region->start);
> +
> + /* Check how many 1GB huge pages can be filtered out*/
> + while (size > PUD_SIZE && max_gb_huge_pages) {
> + size -= PUD_SIZE;
> + max_gb_huge_pages--;
> + i++;
> + }
> +
> + if (!i) {
> + store_slot_info(region, image_size);
> + return;
> + }
> +
> + /* Process the remaining regions after filtering out. */
> +
> + if (addr >= region->start + image_size) {
> + tmp.start = region->start;
> + tmp.size = addr - region->start;
> + store_slot_info(&tmp, image_size);
> + }
> +
> + size = region->size - (addr - region->start) - i * PUD_SIZE;
> + if (size >= image_size) {
> + tmp.start = addr + i*PUD_SIZE;
> + tmp.size = size;
> + store_slot_info(&tmp, image_size);
> + }
> +}

I counted about 5 style problems and at least one typo in this function ...

Please review the code you are submitting more carefully for small details as
well.

Thanks,

Ingo

2018-06-22 12:16:14

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH 1/2] x86/boot/KASLR: Add two functions for 1GB huge pages handling

On 06/21/18 at 05:01pm, Ingo Molnar wrote:
>
> * Baoquan He <[email protected]> wrote:
>
> > +/* Store the number of 1GB huge pages which user specified.*/
> > +static unsigned long max_gb_huge_pages;
> > +
> > +static int parse_gb_huge_pages(char *param, char* val)
> > +{
> > + char *p;
> > + u64 mem_size;
> > + static bool gbpage_sz = false;
> > +
> > + if (!strcmp(param, "hugepagesz")) {
> > + p = val;
> > + mem_size = memparse(p, &p);
> > + if (mem_size == PUD_SIZE) {
> > + if (gbpage_sz)
> > + warn("Repeadly set hugeTLB page size of 1G!\n");
> > + gbpage_sz = true;
> > + } else
> > + gbpage_sz = false;
> > + } else if (!strcmp(param, "hugepages") && gbpage_sz) {
> > + p = val;
> > + max_gb_huge_pages = simple_strtoull(p, &p, 0);
> > + debug_putaddr(max_gb_huge_pages);
> > + }
> > +}
>
> This function has at least one style problem and one typo.
>
> Also, we don't put statics in the middle of function variable blocks.
>

Sorry, I will check all of them carefully and repost according to your
comments.

> > +/* Skip as many 1GB huge pages as possible in the passed region. */
> > +static void process_gb_huge_page(struct mem_vector *region, unsigned long image_size)
> > +{
> > + int i = 0;
> > + unsigned long addr, size;
> > + struct mem_vector tmp;
> > +
> > + if (!max_gb_huge_pages) {
> > + store_slot_info(region, image_size);
> > + return;
> > + }
> > +
> > + addr = ALIGN(region->start, PUD_SIZE);
> > + /* If Did we raise the address above the passed in memory entry? */
> > + if (addr < region->start + region->size)
> > + size = region->size - (addr - region->start);
> > +
> > + /* Check how many 1GB huge pages can be filtered out*/
> > + while (size > PUD_SIZE && max_gb_huge_pages) {
> > + size -= PUD_SIZE;
> > + max_gb_huge_pages--;
> > + i++;
> > + }
> > +
> > + if (!i) {
> > + store_slot_info(region, image_size);
> > + return;
> > + }
> > +
> > + /* Process the remaining regions after filtering out. */
> > +
> > + if (addr >= region->start + image_size) {
> > + tmp.start = region->start;
> > + tmp.size = addr - region->start;
> > + store_slot_info(&tmp, image_size);
> > + }
> > +
> > + size = region->size - (addr - region->start) - i * PUD_SIZE;
> > + if (size >= image_size) {
> > + tmp.start = addr + i*PUD_SIZE;
> > + tmp.size = size;
> > + store_slot_info(&tmp, image_size);
> > + }
> > +}
>
> I counted about 5 style problems and at least one typo in this function ...
>
> Please review the code you are submitting more carefully for small details as
> well.
>
> Thanks,
>
> Ingo

2018-06-24 07:16:03

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 1/2] x86/boot/KASLR: Add two functions for 1GB huge pages handling


* Baoquan He <[email protected]> wrote:

> On 06/21/18 at 05:01pm, Ingo Molnar wrote:
> >
> > * Baoquan He <[email protected]> wrote:
> >
> > > +/* Store the number of 1GB huge pages which user specified.*/
> > > +static unsigned long max_gb_huge_pages;
> > > +
> > > +static int parse_gb_huge_pages(char *param, char* val)
> > > +{
> > > + char *p;
> > > + u64 mem_size;
> > > + static bool gbpage_sz = false;
> > > +
> > > + if (!strcmp(param, "hugepagesz")) {
> > > + p = val;
> > > + mem_size = memparse(p, &p);
> > > + if (mem_size == PUD_SIZE) {
> > > + if (gbpage_sz)
> > > + warn("Repeadly set hugeTLB page size of 1G!\n");
> > > + gbpage_sz = true;
> > > + } else
> > > + gbpage_sz = false;
> > > + } else if (!strcmp(param, "hugepages") && gbpage_sz) {
> > > + p = val;
> > > + max_gb_huge_pages = simple_strtoull(p, &p, 0);
> > > + debug_putaddr(max_gb_huge_pages);
> > > + }
> > > +}
> >
> > This function has at least one style problem and one typo.
> >
> > Also, we don't put statics in the middle of function variable blocks.
> >
>
> Sorry, I will check all of them carefully and repost according to your
> comments.

Thanks!

Ingo