2019-11-15 14:51:35

by Masayoshi Mizuma

[permalink] [raw]
Subject: [PATCH v5 0/4] Adjust the padding size for KASLR

From: Masayoshi Mizuma <[email protected]>

The system sometimes crashes while memory hot-adding on KASLR
enabled system. The crash happens because the regions pointed by
kaslr_regions[].base are overwritten by the hot-added memory.

It happens because of the padding size for kaslr_regions[].base isn't
enough for the system whose physical memory layout has huge space for
memory hotplug. kaslr_regions[].base points "actual installed
memory size + padding" or higher address. So, if the "actual + padding"
is lower address than the maximum memory address, which means the memory
address reachable by memory hot-add, kaslr_regions[].base is destroyed by
the overwritten.

address
^
|------- maximum memory address (Hotplug)
| ^
|------- kaslr_regions[0].base | Hotadd-able region
| ^ |
| | padding |
| V V
|------- actual memory address (Installed on boot)
|

Fix it by getting the maximum memory address from SRAT and store
the value in boot_param, then set the padding size while KASLR
initializing if the default padding size isn't enough.

Masayoshi Mizuma (4):
x86/boot: Wrap up the SRAT traversing code into subtable_parse()
x86/boot: Add max_addr field in struct boot_params
x86/boot: Get the max address from SRAT
x86/mm/KASLR: Adjust the padding size for the direct mapping.

Documentation/x86/zero-page.rst | 4 ++
arch/x86/boot/compressed/acpi.c | 42 ++++++++++++++++----
arch/x86/include/uapi/asm/bootparam.h | 2 +-
arch/x86/mm/kaslr.c | 57 ++++++++++++++++++++-------
4 files changed, 82 insertions(+), 23 deletions(-)

--
2.20.1


2019-11-15 14:51:40

by Masayoshi Mizuma

[permalink] [raw]
Subject: [PATCH v5 1/4] x86/boot: Wrap up the SRAT traversing code into subtable_parse()

From: Masayoshi Mizuma <[email protected]>

Wrap up the SRAT traversing code into subtable_parse().

Reviewed-by: Baoquan He <[email protected]>
Signed-off-by: Masayoshi Mizuma <[email protected]>
---
arch/x86/boot/compressed/acpi.c | 21 ++++++++++++++-------
1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/arch/x86/boot/compressed/acpi.c b/arch/x86/boot/compressed/acpi.c
index 25019d42ae93..a0f81438a0fd 100644
--- a/arch/x86/boot/compressed/acpi.c
+++ b/arch/x86/boot/compressed/acpi.c
@@ -362,6 +362,19 @@ static unsigned long get_acpi_srat_table(void)
return 0;
}

+static void subtable_parse(struct acpi_subtable_header *sub_table, int *num)
+{
+ struct acpi_srat_mem_affinity *ma;
+
+ ma = (struct acpi_srat_mem_affinity *)sub_table;
+
+ if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) && ma->length) {
+ immovable_mem[*num].start = ma->base_address;
+ immovable_mem[*num].size = ma->length;
+ (*num)++;
+ }
+}
+
/**
* count_immovable_mem_regions - Parse SRAT and cache the immovable
* memory regions into the immovable_mem array.
@@ -395,14 +408,8 @@ int count_immovable_mem_regions(void)
while (table + sizeof(struct acpi_subtable_header) < table_end) {
sub_table = (struct acpi_subtable_header *)table;
if (sub_table->type == ACPI_SRAT_TYPE_MEMORY_AFFINITY) {
- struct acpi_srat_mem_affinity *ma;

- ma = (struct acpi_srat_mem_affinity *)sub_table;
- if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) && ma->length) {
- immovable_mem[num].start = ma->base_address;
- immovable_mem[num].size = ma->length;
- num++;
- }
+ subtable_parse(sub_table, &num);

if (num >= MAX_NUMNODES*2) {
debug_putstr("Too many immovable memory regions, aborting.\n");
--
2.20.1

2019-11-15 14:51:57

by Masayoshi Mizuma

[permalink] [raw]
Subject: [PATCH v5 3/4] x86/boot: Get the max address from SRAT

From: Masayoshi Mizuma <[email protected]>

Get the max address from SRAT and write it into boot_params->max_addr.

Acked-by: Baoquan He <[email protected]>
Signed-off-by: Masayoshi Mizuma <[email protected]>
---
arch/x86/boot/compressed/acpi.c | 33 +++++++++++++++++++++++++++------
1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/arch/x86/boot/compressed/acpi.c b/arch/x86/boot/compressed/acpi.c
index a0f81438a0fd..47db706656e0 100644
--- a/arch/x86/boot/compressed/acpi.c
+++ b/arch/x86/boot/compressed/acpi.c
@@ -362,17 +362,32 @@ static unsigned long get_acpi_srat_table(void)
return 0;
}

-static void subtable_parse(struct acpi_subtable_header *sub_table, int *num)
+/**
+ * subtable_parse - Cache the immovable memory regions into the
+ * immovable_mem array and update the index of the array.
+ *
+ * Return the end physical address of hotpluggable region.
+ * The system memory can be extended to the address by hotplug.
+ */
+static unsigned long subtable_parse(struct acpi_subtable_header *sub_table,
+ int *num)
{
struct acpi_srat_mem_affinity *ma;
+ unsigned long addr = 0;

ma = (struct acpi_srat_mem_affinity *)sub_table;

- if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) && ma->length) {
- immovable_mem[*num].start = ma->base_address;
- immovable_mem[*num].size = ma->length;
- (*num)++;
+ if (ma->length) {
+ if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE)
+ addr = ma->base_address + ma->length;
+ else {
+ immovable_mem[*num].start = ma->base_address;
+ immovable_mem[*num].size = ma->length;
+ (*num)++;
+ }
}
+
+ return addr;
}

/**
@@ -391,6 +406,7 @@ int count_immovable_mem_regions(void)
struct acpi_subtable_header *sub_table;
struct acpi_table_header *table_header;
char arg[MAX_ACPI_ARG_LENGTH];
+ unsigned long max_addr = 0, addr;
int num = 0;

if (cmdline_find_option("acpi", arg, sizeof(arg)) == 3 &&
@@ -409,7 +425,9 @@ int count_immovable_mem_regions(void)
sub_table = (struct acpi_subtable_header *)table;
if (sub_table->type == ACPI_SRAT_TYPE_MEMORY_AFFINITY) {

- subtable_parse(sub_table, &num);
+ addr = subtable_parse(sub_table, &num);
+ if (addr > max_addr)
+ max_addr = addr;

if (num >= MAX_NUMNODES*2) {
debug_putstr("Too many immovable memory regions, aborting.\n");
@@ -418,6 +436,9 @@ int count_immovable_mem_regions(void)
}
table += sub_table->length;
}
+
+ boot_params->max_addr = max_addr;
+
return num;
}
#endif /* CONFIG_RANDOMIZE_BASE && CONFIG_MEMORY_HOTREMOVE */
--
2.20.1

2019-11-15 14:52:21

by Masayoshi Mizuma

[permalink] [raw]
Subject: [PATCH v5 4/4] x86/mm/KASLR: Adjust the padding size for the direct mapping.

From: Masayoshi Mizuma <[email protected]>

The system sometimes crashes while memory hot-adding on KASLR
enabled system. The crash happens because the regions pointed by
kaslr_regions[].base are overwritten by the hot-added memory.

It happens because of the padding size for kaslr_regions[].base isn't
enough for the system whose physical memory layout has huge space for
memory hotplug. kaslr_regions[].base points "actual installed
memory size + padding" or higher address. So, if the "actual + padding"
is lower address than the maximum memory address, which means the memory
address reachable by memory hot-add, kaslr_regions[].base is destroyed by
the overwritten.

address
^
|------- maximum memory address (Hotplug)
| ^
|------- kaslr_regions[0].base | Hotadd-able region
| ^ |
| | padding |
| V V
|------- actual memory address (Installed on boot)
|

Fix it by getting the maximum memory address from SRAT and store
the value in boot_param, then set the padding size while KASLR
initializing if the default padding size isn't enough.

Signed-off-by: Baoquan He <[email protected]>
Signed-off-by: Masayoshi Mizuma <[email protected]>
---
arch/x86/mm/kaslr.c | 57 +++++++++++++++++++++++++++++++++------------
1 file changed, 42 insertions(+), 15 deletions(-)

diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
index dc6182eecefa..1f0aa9e68226 100644
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -70,15 +70,52 @@ static inline bool kaslr_memory_enabled(void)
return kaslr_enabled() && !IS_ENABLED(CONFIG_KASAN);
}

+/*
+ * Even though a huge virtual address space is reserved for the direct
+ * mapping of physical memory, e.g in 4-level paging mode, it's 64TB,
+ * rare system can own enough physical memory to use it up, most are
+ * even less than 1TB. So with KASLR enabled, we adapt the size of
+ * direct mapping area to the size of actual physical memory plus the
+ * configured padding CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING.
+ * The left part will be taken out to join memory randomization.
+ */
+static inline unsigned long calc_direct_mapping_size(void)
+{
+ unsigned long size_tb, memory_tb;
+
+ memory_tb = DIV_ROUND_UP(max_pfn << PAGE_SHIFT, 1UL << TB_SHIFT) +
+ CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+ if (boot_params.max_addr) {
+ unsigned long maximum_tb;
+
+ maximum_tb = DIV_ROUND_UP(boot_params.max_addr,
+ 1UL << TB_SHIFT);
+
+ if (maximum_tb > memory_tb)
+ memory_tb = maximum_tb;
+ }
+#endif
+ size_tb = 1 << (MAX_PHYSMEM_BITS - TB_SHIFT);
+
+ /*
+ * Adapt physical memory region size based on available memory
+ */
+ if (memory_tb < size_tb)
+ size_tb = memory_tb;
+
+ return size_tb;
+}
+
/* Initialize base and padding for each memory region randomized with KASLR */
void __init kernel_randomize_memory(void)
{
- size_t i;
- unsigned long vaddr_start, vaddr;
- unsigned long rand, memory_tb;
- struct rnd_state rand_state;
+ unsigned long vaddr_start, vaddr, rand;
unsigned long remain_entropy;
unsigned long vmemmap_size;
+ struct rnd_state rand_state;
+ size_t i;

vaddr_start = pgtable_l5_enabled() ? __PAGE_OFFSET_BASE_L5 : __PAGE_OFFSET_BASE_L4;
vaddr = vaddr_start;
@@ -95,20 +132,10 @@ void __init kernel_randomize_memory(void)
if (!kaslr_memory_enabled())
return;

- kaslr_regions[0].size_tb = 1 << (MAX_PHYSMEM_BITS - TB_SHIFT);
+ kaslr_regions[0].size_tb = calc_direct_mapping_size();
kaslr_regions[1].size_tb = VMALLOC_SIZE_TB;

- /*
- * Update Physical memory mapping to available and
- * add padding if needed (especially for memory hotplug support).
- */
BUG_ON(kaslr_regions[0].base != &page_offset_base);
- memory_tb = DIV_ROUND_UP(max_pfn << PAGE_SHIFT, 1UL << TB_SHIFT) +
- CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
-
- /* Adapt phyiscal memory region size based on available memory */
- if (memory_tb < kaslr_regions[0].size_tb)
- kaslr_regions[0].size_tb = memory_tb;

/*
* Calculate the vmemmap region size in TBs, aligned to a TB
--
2.20.1

2019-11-15 14:53:44

by Masayoshi Mizuma

[permalink] [raw]
Subject: [PATCH v5 2/4] x86/boot: Add max_addr field in struct boot_params

From: Masayoshi Mizuma <[email protected]>

Add max_addr field in struct boot_params. max_addr shows the
maximum memory address to be reachable by memory hot-add.
max_addr is set by parsing ACPI SRAT.

Reviewed-by: Baoquan He <[email protected]>
Signed-off-by: Masayoshi Mizuma <[email protected]>
---
Documentation/x86/zero-page.rst | 4 ++++
arch/x86/include/uapi/asm/bootparam.h | 2 +-
2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/Documentation/x86/zero-page.rst b/Documentation/x86/zero-page.rst
index f088f5881666..cc3938d68481 100644
--- a/Documentation/x86/zero-page.rst
+++ b/Documentation/x86/zero-page.rst
@@ -19,6 +19,7 @@ Offset/Size Proto Name Meaning
058/008 ALL tboot_addr Physical address of tboot shared page
060/010 ALL ist_info Intel SpeedStep (IST) BIOS support information
(struct ist_info)
+078/010 ALL max_addr The possible maximum physical memory address [1]_
080/010 ALL hd0_info hd0 disk parameter, OBSOLETE!!
090/010 ALL hd1_info hd1 disk parameter, OBSOLETE!!
0A0/010 ALL sys_desc_table System description table (struct sys_desc_table),
@@ -43,3 +44,6 @@ Offset/Size Proto Name Meaning
(array of struct e820_entry)
D00/1EC ALL eddbuf EDD data (array of struct edd_info)
=========== ===== ======================= =================================================
+
+.. [1] max_addr shows the maximum memory address to be reachable by memory
+ hot-add. max_addr is set by parsing ACPI SRAT.
diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h
index c895df5482c5..6efad338bba9 100644
--- a/arch/x86/include/uapi/asm/bootparam.h
+++ b/arch/x86/include/uapi/asm/bootparam.h
@@ -158,7 +158,7 @@ struct boot_params {
__u64 tboot_addr; /* 0x058 */
struct ist_info ist_info; /* 0x060 */
__u64 acpi_rsdp_addr; /* 0x070 */
- __u8 _pad3[8]; /* 0x078 */
+ __u64 max_addr; /* 0x078 */
__u8 hd0_info[16]; /* obsolete! */ /* 0x080 */
__u8 hd1_info[16]; /* obsolete! */ /* 0x090 */
struct sys_desc_table sys_desc_table; /* obsolete! */ /* 0x0a0 */
--
2.20.1

2019-11-17 00:46:08

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v5 4/4] x86/mm/KASLR: Adjust the padding size for the direct mapping.

On 11/15/19 at 09:49am, Masayoshi Mizuma wrote:
> From: Masayoshi Mizuma <[email protected]>
>
> The system sometimes crashes while memory hot-adding on KASLR
> enabled system. The crash happens because the regions pointed by
> kaslr_regions[].base are overwritten by the hot-added memory.
>
> It happens because of the padding size for kaslr_regions[].base isn't
> enough for the system whose physical memory layout has huge space for
> memory hotplug. kaslr_regions[].base points "actual installed
> memory size + padding" or higher address. So, if the "actual + padding"
> is lower address than the maximum memory address, which means the memory
> address reachable by memory hot-add, kaslr_regions[].base is destroyed by
> the overwritten.
>
> address
> ^
> |------- maximum memory address (Hotplug)
> | ^
> |------- kaslr_regions[0].base | Hotadd-able region
> | ^ |
> | | padding |
> | V V
> |------- actual memory address (Installed on boot)
> |
>
> Fix it by getting the maximum memory address from SRAT and store
> the value in boot_param, then set the padding size while KASLR
> initializing if the default padding size isn't enough.
>
> Signed-off-by: Baoquan He <[email protected]>
> Signed-off-by: Masayoshi Mizuma <[email protected]>
> ---
> arch/x86/mm/kaslr.c | 57 +++++++++++++++++++++++++++++++++------------
> 1 file changed, 42 insertions(+), 15 deletions(-)
>
> diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
> index dc6182eecefa..1f0aa9e68226 100644
> --- a/arch/x86/mm/kaslr.c
> +++ b/arch/x86/mm/kaslr.c
> @@ -70,15 +70,52 @@ static inline bool kaslr_memory_enabled(void)
> return kaslr_enabled() && !IS_ENABLED(CONFIG_KASAN);
> }
>
> +/*
> + * Even though a huge virtual address space is reserved for the direct
> + * mapping of physical memory, e.g in 4-level paging mode, it's 64TB,
> + * rare system can own enough physical memory to use it up, most are
> + * even less than 1TB. So with KASLR enabled, we adapt the size of
> + * direct mapping area to the size of actual physical memory plus the
> + * configured padding CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING.
> + * The left part will be taken out to join memory randomization.

The doc doesn't reflect the the case of boot_params.max_addr existed?

Otherwise, it looks good to me and easy to understand. You can remove my
'Signed-off' and feel free to add my ack.

Acked-by: Baoquan He <[email protected]>

Thanks
Baoquan

> + */
> +static inline unsigned long calc_direct_mapping_size(void)
> +{
> + unsigned long size_tb, memory_tb;
> +
> + memory_tb = DIV_ROUND_UP(max_pfn << PAGE_SHIFT, 1UL << TB_SHIFT) +
> + CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
> +
> +#ifdef CONFIG_MEMORY_HOTPLUG
> + if (boot_params.max_addr) {
> + unsigned long maximum_tb;
> +
> + maximum_tb = DIV_ROUND_UP(boot_params.max_addr,
> + 1UL << TB_SHIFT);
> +
> + if (maximum_tb > memory_tb)
> + memory_tb = maximum_tb;
> + }
> +#endif
> + size_tb = 1 << (MAX_PHYSMEM_BITS - TB_SHIFT);
> +
> + /*
> + * Adapt physical memory region size based on available memory
> + */
> + if (memory_tb < size_tb)
> + size_tb = memory_tb;
> +
> + return size_tb;
> +}
> +
> /* Initialize base and padding for each memory region randomized with KASLR */
> void __init kernel_randomize_memory(void)
> {
> - size_t i;
> - unsigned long vaddr_start, vaddr;
> - unsigned long rand, memory_tb;
> - struct rnd_state rand_state;
> + unsigned long vaddr_start, vaddr, rand;
> unsigned long remain_entropy;
> unsigned long vmemmap_size;
> + struct rnd_state rand_state;
> + size_t i;
>
> vaddr_start = pgtable_l5_enabled() ? __PAGE_OFFSET_BASE_L5 : __PAGE_OFFSET_BASE_L4;
> vaddr = vaddr_start;
> @@ -95,20 +132,10 @@ void __init kernel_randomize_memory(void)
> if (!kaslr_memory_enabled())
> return;
>
> - kaslr_regions[0].size_tb = 1 << (MAX_PHYSMEM_BITS - TB_SHIFT);
> + kaslr_regions[0].size_tb = calc_direct_mapping_size();
> kaslr_regions[1].size_tb = VMALLOC_SIZE_TB;
>
> - /*
> - * Update Physical memory mapping to available and
> - * add padding if needed (especially for memory hotplug support).
> - */
> BUG_ON(kaslr_regions[0].base != &page_offset_base);
> - memory_tb = DIV_ROUND_UP(max_pfn << PAGE_SHIFT, 1UL << TB_SHIFT) +
> - CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
> -
> - /* Adapt phyiscal memory region size based on available memory */
> - if (memory_tb < kaslr_regions[0].size_tb)
> - kaslr_regions[0].size_tb = memory_tb;
>
> /*
> * Calculate the vmemmap region size in TBs, aligned to a TB
> --
> 2.20.1
>

2019-12-12 20:19:42

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v5 0/4] Adjust the padding size for KASLR

On Fri, Nov 15, 2019 at 09:49:13AM -0500, Masayoshi Mizuma wrote:
> From: Masayoshi Mizuma <[email protected]>
>
> The system sometimes crashes while memory hot-adding on KASLR
> enabled system. The crash happens because the regions pointed by
> kaslr_regions[].base are overwritten by the hot-added memory.
>
> It happens because of the padding size for kaslr_regions[].base isn't
> enough for the system whose physical memory layout has huge space for
> memory hotplug. kaslr_regions[].base points "actual installed
> memory size + padding" or higher address. So, if the "actual + padding"
> is lower address than the maximum memory address, which means the memory
> address reachable by memory hot-add, kaslr_regions[].base is destroyed by
> the overwritten.

I can only try to guess what this is trying to tell me so please rewrite
this using simple, declarative sentences.

Use a structure like this:

Problem is A.

It happens because of B.

Fix it by doing C.

(Potentially do D).

For more detailed info, see
Documentation/process/submitting-patches.rst, Section "2) Describe your
changes".

Also, to the tone, from Documentation/process/submitting-patches.rst:

"Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
to do frotz", as if you are giving orders to the codebase to change
its behaviour."

Also, when you write your commit messages, always talk about "why"
you're doing a change and not "what" you're doing - the "what" is
visible from the diff.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2019-12-12 20:19:56

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v5 1/4] x86/boot: Wrap up the SRAT traversing code into subtable_parse()

On Fri, Nov 15, 2019 at 09:49:14AM -0500, Masayoshi Mizuma wrote:
> From: Masayoshi Mizuma <[email protected]>
>
> Wrap up the SRAT traversing code into subtable_parse().

Why?

>
> Reviewed-by: Baoquan He <[email protected]>
> Signed-off-by: Masayoshi Mizuma <[email protected]>
> ---
> arch/x86/boot/compressed/acpi.c | 21 ++++++++++++++-------
> 1 file changed, 14 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/boot/compressed/acpi.c b/arch/x86/boot/compressed/acpi.c
> index 25019d42ae93..a0f81438a0fd 100644
> --- a/arch/x86/boot/compressed/acpi.c
> +++ b/arch/x86/boot/compressed/acpi.c
> @@ -362,6 +362,19 @@ static unsigned long get_acpi_srat_table(void)
> return 0;
> }
>
> +static void subtable_parse(struct acpi_subtable_header *sub_table, int *num)
> +{
> + struct acpi_srat_mem_affinity *ma;
> +
> + ma = (struct acpi_srat_mem_affinity *)sub_table;
> +
> + if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) && ma->length) {
> + immovable_mem[*num].start = ma->base_address;
> + immovable_mem[*num].size = ma->length;
> + (*num)++;
> + }
> +}
> +
> /**
> * count_immovable_mem_regions - Parse SRAT and cache the immovable
> * memory regions into the immovable_mem array.
> @@ -395,14 +408,8 @@ int count_immovable_mem_regions(void)
> while (table + sizeof(struct acpi_subtable_header) < table_end) {
> sub_table = (struct acpi_subtable_header *)table;
> if (sub_table->type == ACPI_SRAT_TYPE_MEMORY_AFFINITY) {

So this is checking for the table type being something but calling a
function which looks generic, at least judging by the name.

And that generic function is a function why exactly? It beats me. And
the input/output argument @num is actually begging for this to *not* be
a function at all.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2019-12-12 20:21:53

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v5 4/4] x86/mm/KASLR: Adjust the padding size for the direct mapping.

On Fri, Nov 15, 2019 at 09:49:17AM -0500, Masayoshi Mizuma wrote:
> +/*
> + * Even though a huge virtual address space is reserved for the direct
> + * mapping of physical memory, e.g in 4-level paging mode, it's 64TB,
> + * rare system can own enough physical memory to use it up, most are
> + * even less than 1TB.

This sentence is unparseable.

> So with KASLR enabled, we adapt the size of

Who's "we"?

> + * direct mapping area to the size of actual physical memory plus the
> + * configured padding CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING.
> + * The left part will be taken out to join memory randomization.
> + */
> +static inline unsigned long calc_direct_mapping_size(void)

What direct mapping?!

The code is computing the physical memory regions base address and
sizes.

> +{
> + unsigned long size_tb, memory_tb;
> +
> + memory_tb = DIV_ROUND_UP(max_pfn << PAGE_SHIFT, 1UL << TB_SHIFT) +
> + CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
> +
> +#ifdef CONFIG_MEMORY_HOTPLUG
> + if (boot_params.max_addr) {
> + unsigned long maximum_tb;
> +
> + maximum_tb = DIV_ROUND_UP(boot_params.max_addr,
> + 1UL << TB_SHIFT);

All that jumping through hoops and adding a member to boot_params which
is useless on !hot-add systems - basically the majority out there - just
so that you can use that max address here?!

Did you not find acpi_table_parse_srat()?

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2019-12-13 13:30:34

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v5 4/4] x86/mm/KASLR: Adjust the padding size for the direct mapping.

On 12/12/19 at 09:19pm, Borislav Petkov wrote:
> > + * direct mapping area to the size of actual physical memory plus the
> > + * configured padding CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING.
> > + * The left part will be taken out to join memory randomization.
> > + */
> > +static inline unsigned long calc_direct_mapping_size(void)
>
> What direct mapping?!
>
> The code is computing the physical memory regions base address and
> sizes.

In Documentation/x86/x86_64/mm.rst, the physical memory regions mapping
with page_offset is called as the direct mapping of physical memory.
Seems both is used in kernel and document.

>
> > +{
> > + unsigned long size_tb, memory_tb;
> > +
> > + memory_tb = DIV_ROUND_UP(max_pfn << PAGE_SHIFT, 1UL << TB_SHIFT) +
> > + CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
> > +
> > +#ifdef CONFIG_MEMORY_HOTPLUG
> > + if (boot_params.max_addr) {
> > + unsigned long maximum_tb;
> > +
> > + maximum_tb = DIV_ROUND_UP(boot_params.max_addr,
> > + 1UL << TB_SHIFT);
>
> All that jumping through hoops and adding a member to boot_params which
> is useless on !hot-add systems - basically the majority out there - just
> so that you can use that max address here?!
>
> Did you not find acpi_table_parse_srat()?

kernel_randomize_memory() is invoked much earlier than
acpi_table_parse_srat(). KASLR need know the max address to reserve
space for the direct mapping region (or the physical memory region)
so that it can cover later possible hotplugged memory.

Thanks
Baoquan

2019-12-13 14:16:51

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v5 4/4] x86/mm/KASLR: Adjust the padding size for the direct mapping.

On Fri, Dec 13, 2019 at 09:28:50PM +0800, Baoquan He wrote:
> In Documentation/x86/x86_64/mm.rst, the physical memory regions mapping
> with page_offset is called as the direct mapping of physical memory.

The fact that it happens to compute the *first* region's size, which
*happens* to be the direct mapping of all physical memory is immaterial
here.

It is actually causing more confusion in an already complex piece of
code. You can call this function just as well

calc_region_size()

which won't confuse readers. Because all you care about here is the
region's size - not which region it is.

> kernel_randomize_memory() is invoked much earlier than
> acpi_table_parse_srat().

And? What are we going to do about that?

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2019-12-13 14:57:24

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v5 4/4] x86/mm/KASLR: Adjust the padding size for the direct mapping.

On 12/13/19 at 03:15pm, Borislav Petkov wrote:
> On Fri, Dec 13, 2019 at 09:28:50PM +0800, Baoquan He wrote:
> > In Documentation/x86/x86_64/mm.rst, the physical memory regions mapping
> > with page_offset is called as the direct mapping of physical memory.
>
> The fact that it happens to compute the *first* region's size, which
> *happens* to be the direct mapping of all physical memory is immaterial
> here.
>
> It is actually causing more confusion in an already complex piece of
> code. You can call this function just as well
>
> calc_region_size()
>
> which won't confuse readers. Because all you care about here is the
> region's size - not which region it is.

Won't calc_region_size be too generic? We also have vmalloc and vmemmap,
and here we are specifically calculating the direct mapping of physical
memory.

>
> > kernel_randomize_memory() is invoked much earlier than
> > acpi_table_parse_srat().
>
> And? What are we going to do about that?


void __init setup_arch(char **cmdline_p)
{
...
kernel_randomize_memory();
...
init_mem_mapping();
...
initmem_init();
...
}

In kernel_randomize_memory(), KASLR builds the layout of these regions,
including their starting address and the gap between them. Once
finished, __PAGE_OFFSET, VMALLOC_START, VMEMMAP_START are settled.
If not knowing the max address to cover all the possible hotplugged
memory, later memory hotplug will fail.

Thanks
Baoquan

2019-12-13 16:39:42

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v5 4/4] x86/mm/KASLR: Adjust the padding size for the direct mapping.

On Fri, Dec 13, 2019 at 10:54:48PM +0800, Baoquan He wrote:
> On 12/13/19 at 03:15pm, Borislav Petkov wrote:
> > On Fri, Dec 13, 2019 at 09:28:50PM +0800, Baoquan He wrote:
> > > In Documentation/x86/x86_64/mm.rst, the physical memory regions mapping
> > > with page_offset is called as the direct mapping of physical memory.
> >
> > The fact that it happens to compute the *first* region's size, which
> > *happens* to be the direct mapping of all physical memory is immaterial
> > here.
> >
> > It is actually causing more confusion in an already complex piece of
> > code. You can call this function just as well
> >
> > calc_region_size()
> >
> > which won't confuse readers. Because all you care about here is the
> > region's size - not which region it is.
>
> Won't calc_region_size be too generic? We also have vmalloc and vmemmap,
> and here we are specifically calculating the direct mapping of physical
> memory.

It sounds like you didn't read what I wrote above so read it again pls.

> If not knowing the max address to cover all the possible hotplugged
> memory, later memory hotplug will fail.

You don't have to state the obvious - I can see that in the code.

So let me ask you differently: can the parsing of the SRAT table happen
shortly before kernel_randomize_memory() *without* adding all that gunk
to the compressed stage, and without adding the boot_params member and
done only for memory hot_add machines?

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2019-12-13 23:30:33

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v5 4/4] x86/mm/KASLR: Adjust the padding size for the direct mapping.

On 12/13/19 at 05:38pm, Borislav Petkov wrote:
> On Fri, Dec 13, 2019 at 10:54:48PM +0800, Baoquan He wrote:
> > On 12/13/19 at 03:15pm, Borislav Petkov wrote:
> > > On Fri, Dec 13, 2019 at 09:28:50PM +0800, Baoquan He wrote:
> > > > In Documentation/x86/x86_64/mm.rst, the physical memory regions mapping
> > > > with page_offset is called as the direct mapping of physical memory.
> > >
> > > The fact that it happens to compute the *first* region's size, which
> > > *happens* to be the direct mapping of all physical memory is immaterial
> > > here.
> > >
> > > It is actually causing more confusion in an already complex piece of
> > > code. You can call this function just as well
> > >
> > > calc_region_size()
> > >
> > > which won't confuse readers. Because all you care about here is the
> > > region's size - not which region it is.
> >
> > Won't calc_region_size be too generic? We also have vmalloc and vmemmap,
> > and here we are specifically calculating the direct mapping of physical
> > memory.
>
> It sounds like you didn't read what I wrote above so read it again pls.

Got it, I believe people won't be confused with calc_region_size().
It's fine to me.

>
> > If not knowing the max address to cover all the possible hotplugged
> > memory, later memory hotplug will fail.
>
> You don't have to state the obvious - I can see that in the code.
>
> So let me ask you differently: can the parsing of the SRAT table happen
> shortly before kernel_randomize_memory() *without* adding all that gunk
> to the compressed stage, and without adding the boot_params member and
> done only for memory hot_add machines?

OK, you mean parsing SRAT again before kernel_randomize_memory(). I
think this is what Masa made this patchset to avoid. Then we will have
three times SRAT parsing. Passing the max addr from boot to
kernel_randomize_memory() was raised when review Chao Fan's patchset, I
vaguely remember. Chao didn't take the SRAT parsing way in boot code
firstly, later someone suggested to parse, and said the issue in
kernel_randomize_memory() can be fixed with the parsed value.

Surely, parsing SRAT here is also good. Maybe Masa can make a draft
patch, let people see what it looks like.

Thanks
Baoquan

2019-12-14 03:34:57

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v5 4/4] x86/mm/KASLR: Adjust the padding size for the direct mapping.

On 12/12/19 at 09:19pm, Borislav Petkov wrote:
> On Fri, Nov 15, 2019 at 09:49:17AM -0500, Masayoshi Mizuma wrote:
> > +/*
> > + * Even though a huge virtual address space is reserved for the direct
> > + * mapping of physical memory, e.g in 4-level paging mode, it's 64TB,
> > + * rare system can own enough physical memory to use it up, most are
> > + * even less than 1TB.
>
> This sentence is unparseable.
>
> > So with KASLR enabled, we adapt the size of
>
> Who's "we"?
>
> > + * direct mapping area to the size of actual physical memory plus the
> > + * configured padding CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING.
> > + * The left part will be taken out to join memory randomization.
> > + */
> > +static inline unsigned long calc_direct_mapping_size(void)
>
> What direct mapping?!
>
> The code is computing the physical memory regions base address and
> sizes.
>
> > +{
> > + unsigned long size_tb, memory_tb;
> > +
> > + memory_tb = DIV_ROUND_UP(max_pfn << PAGE_SHIFT, 1UL << TB_SHIFT) +
> > + CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
> > +
> > +#ifdef CONFIG_MEMORY_HOTPLUG
> > + if (boot_params.max_addr) {
> > + unsigned long maximum_tb;
> > +
> > + maximum_tb = DIV_ROUND_UP(boot_params.max_addr,
> > + 1UL << TB_SHIFT);
>
> All that jumping through hoops and adding a member to boot_params which
> is useless on !hot-add systems - basically the majority out there - just

Read this again, one system owns ACPI SRAT tables, doesn't mean it's
a hot-adding system. Currently, most of systems have ACPI and the
affiliated SRAT tables. But to support memory hotplug, it need specific
firmware setting and web management to remotely power on/off the DIMM.
At least, systems which is delived to us for testing is very few.

So in kernel there's no way to check if memory hotplug is supported in
system. Memory hotplug is the sufficient and unnecessary condition of
the SRAT existence, I would say.

Thanks
Baoquan

2019-12-14 07:15:48

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v5 4/4] x86/mm/KASLR: Adjust the padding size for the direct mapping.

On Sat, Dec 14, 2019 at 07:29:28AM +0800, Baoquan He wrote:
> OK, you mean parsing SRAT again before kernel_randomize_memory(). I
> think this is what Masa made this patchset to avoid. Then we will have
> three times SRAT parsing.

Can the SRAT parsing be *moved* up so that it is done before
kernel_randomize_memory() ?

So that it doesn't happen 3 times and acpi_numa_init() can simply use
the already parsed results.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2019-12-14 07:16:29

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v5 4/4] x86/mm/KASLR: Adjust the padding size for the direct mapping.

On Sat, Dec 14, 2019 at 11:33:48AM +0800, Baoquan He wrote:
> So in kernel there's no way to check if memory hotplug is supported in
> system.

Which makes it even worse. We will carry this bunch of code which would
be dead code on most systems. Do you understand now why I'm pushing back
on this to be designed as clean and as lean as possible?

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2019-12-14 10:12:51

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v5 4/4] x86/mm/KASLR: Adjust the padding size for the direct mapping.

On 12/14/19 at 08:13am, Borislav Petkov wrote:
> On Sat, Dec 14, 2019 at 07:29:28AM +0800, Baoquan He wrote:
> > OK, you mean parsing SRAT again before kernel_randomize_memory(). I
> > think this is what Masa made this patchset to avoid. Then we will have
> > three times SRAT parsing.
>
> Can the SRAT parsing be *moved* up so that it is done before
> kernel_randomize_memory() ?
>
> So that it doesn't happen 3 times and acpi_numa_init() can simply use
> the already parsed results.

Not very sure, It could be doable. Maybe Masa can have a try.