2019-01-30 08:23:56

by Juergen Gross

Subject: [PATCH v2 0/2] x86: respect memory size limits

On a customer system running Xen a boot problem was observed due to
the kernel not respecting the memory size limit imposed by the Xen
hypervisor.

During analysis I found that the same problem should also be able to
occur on bare metal when memory is limited via the "mem=" boot
parameter.

The system this problem has been observed on has a large amount of
memory added via PCI. While the memory that is not to be used has been
removed from the E820 map, the additional PCI memory is detected during
the ACPI scan and is added via __add_memory().

This small series tries to repair the issue by testing the imposed
memory limit during the memory hotplug process and refusing to add it
in case the limit is being violated.

I've chosen to refuse adding the complete memory chunk when the limit
is reached instead of adding only some of the memory, as I thought
this would result in fewer problems (e.g. it avoids adding only part
of a 128MB memory bar, which might be difficult to remove later).
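
As a rough illustration of the all-or-nothing behaviour described above,
the following is a user-space sketch only, not the kernel code. The helper
name check_hotplug_range() is hypothetical; max_mem_size here is a plain
stand-in for the global introduced in patch 1, and the comparison mirrors
the one added to register_memory_resource().

```c
#include <errno.h>
#include <stdint.h>

/* Stand-in for the kernel global; UINT64_MAX means "no limit imposed". */
uint64_t max_mem_size = UINT64_MAX;

/*
 * Hypothetical model of the check in register_memory_resource():
 * the whole range is rejected if its end lies above the limit, so a
 * chunk is never added partially. Returns 0 if the range is acceptable
 * and -E2BIG otherwise.
 */
int check_hotplug_range(uint64_t start, uint64_t size)
{
	if (start + size > max_mem_size)
		return -E2BIG;
	return 0;
}
```

With a 4GB limit, a 128MB chunk that ends exactly at the limit is accepted,
while one that straddles the limit is refused as a whole.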

Changes in V2:
- patch 1: set initial allowed size to U64_MAX instead of -1
- patch 2: set initial allowed size to end of E820 RAM

Juergen Gross (2):
x86: respect memory size limiting via mem= parameter
x86/xen: dont add memory above max allowed allocation

arch/x86/kernel/e820.c | 5 +++++
arch/x86/xen/setup.c | 10 ++++++++++
drivers/xen/xen-balloon.c | 6 ++++++
include/linux/memory_hotplug.h | 2 ++
mm/memory_hotplug.c | 6 ++++++
5 files changed, 29 insertions(+)

--
2.16.4



2019-01-30 08:24:26

by Juergen Gross

Subject: [PATCH v2 2/2] x86/xen: dont add memory above max allowed allocation

Don't allow memory to be added above the allowed maximum allocation
limit set by Xen.

Trying to do so would result in cases like the following:

[ 584.559652] ------------[ cut here ]------------
[ 584.564897] WARNING: CPU: 2 PID: 1 at ../arch/x86/xen/multicalls.c:129 xen_alloc_pte+0x1c7/0x390()
[ 584.575151] Modules linked in:
[ 584.578643] Supported: Yes
[ 584.581750] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.4.120-92.70-default #1
[ 584.590000] Hardware name: Cisco Systems Inc UCSC-C460-M4/UCSC-C460-M4, BIOS C460M4.4.0.1b.0.0629181419 06/29/2018
[ 584.601862] 0000000000000000 ffffffff813175a0 0000000000000000 ffffffff8184777c
[ 584.610200] ffffffff8107f4e1 ffff880487eb7000 ffff8801862b79c0 ffff88048608d290
[ 584.618537] 0000000000487eb7 ffffea0000000201 ffffffff81009de7 ffffffff81068561
[ 584.626876] Call Trace:
[ 584.629699] [<ffffffff81019ad9>] dump_trace+0x59/0x340
[ 584.635645] [<ffffffff81019eaa>] show_stack_log_lvl+0xea/0x170
[ 584.642391] [<ffffffff8101ac51>] show_stack+0x21/0x40
[ 584.648238] [<ffffffff813175a0>] dump_stack+0x5c/0x7c
[ 584.654085] [<ffffffff8107f4e1>] warn_slowpath_common+0x81/0xb0
[ 584.660932] [<ffffffff81009de7>] xen_alloc_pte+0x1c7/0x390
[ 584.667289] [<ffffffff810647f0>] pmd_populate_kernel.constprop.6+0x40/0x80
[ 584.675241] [<ffffffff815ecfe8>] phys_pmd_init+0x210/0x255
[ 584.681587] [<ffffffff815ed207>] phys_pud_init+0x1da/0x247
[ 584.687931] [<ffffffff815edb3b>] kernel_physical_mapping_init+0xf5/0x1d4
[ 584.695682] [<ffffffff815e9bdd>] init_memory_mapping+0x18d/0x380
[ 584.702631] [<ffffffff81064699>] arch_add_memory+0x59/0xf0

Signed-off-by: Juergen Gross <[email protected]>
---
arch/x86/xen/setup.c | 10 ++++++++++
drivers/xen/xen-balloon.c | 6 ++++++
2 files changed, 16 insertions(+)

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index d5f303c0e656..fdb184cadaf5 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -12,6 +12,7 @@
#include <linux/memblock.h>
#include <linux/cpuidle.h>
#include <linux/cpufreq.h>
+#include <linux/memory_hotplug.h>

#include <asm/elf.h>
#include <asm/vdso.h>
@@ -825,6 +826,15 @@ char * __init xen_memory_setup(void)
xen_max_p2m_pfn = pfn_s + n_pfns;
} else
discard = true;
+#ifdef CONFIG_MEMORY_HOTPLUG
+ /*
+ * Don't allow adding memory not in E820 map while
+ * booting the system. Once the balloon driver is up
+ * it will remove that restriction again.
+ */
+ max_mem_size = xen_e820_table.entries[i].addr +
+ xen_e820_table.entries[i].size;
+#endif
}

if (!discard)
diff --git a/drivers/xen/xen-balloon.c b/drivers/xen/xen-balloon.c
index 2acbfe104e46..2a960fcc812e 100644
--- a/drivers/xen/xen-balloon.c
+++ b/drivers/xen/xen-balloon.c
@@ -37,6 +37,7 @@
#include <linux/mm_types.h>
#include <linux/init.h>
#include <linux/capability.h>
+#include <linux/memory_hotplug.h>

#include <xen/xen.h>
#include <xen/interface/xen.h>
@@ -63,6 +64,11 @@ static void watch_target(struct xenbus_watch *watch,
static bool watch_fired;
static long target_diff;

+#ifdef CONFIG_MEMORY_HOTPLUG
+ /* The balloon driver will take care of adding memory now. */
+ max_mem_size = U64_MAX;
+#endif
+
err = xenbus_scanf(XBT_NIL, "memory", "target", "%llu", &new_target);
if (err != 1) {
/* This is ok (for domain0 at least) - so just return */
--
2.16.4


2019-01-30 08:24:39

by Juergen Gross

Subject: [PATCH v2 1/2] x86: respect memory size limiting via mem= parameter

When limiting memory size via kernel parameter "mem=" this should be
respected even in case of memory made accessible via a PCI card.

Today this kind of memory won't be made usable in initial memory
setup as the memory won't be visible in E820 map, but it might be
added when adding PCI devices due to corresponding ACPI table entries.

Not respecting "mem=" can be corrected by adding a global max_mem_size
variable set by parse_memopt() which will result in rejecting adding
memory areas resulting in a memory size above the allowed limit.

Signed-off-by: Juergen Gross <[email protected]>
---
arch/x86/kernel/e820.c | 5 +++++
include/linux/memory_hotplug.h | 2 ++
mm/memory_hotplug.c | 6 ++++++
3 files changed, 13 insertions(+)

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 50895c2f937d..e67513e2cbbb 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -14,6 +14,7 @@
#include <linux/acpi.h>
#include <linux/firmware-map.h>
#include <linux/sort.h>
+#include <linux/memory_hotplug.h>

#include <asm/e820/api.h>
#include <asm/setup.h>
@@ -881,6 +882,10 @@ static int __init parse_memopt(char *p)

e820__range_remove(mem_size, ULLONG_MAX - mem_size, E820_TYPE_RAM, 1);

+#ifdef CONFIG_MEMORY_HOTPLUG
+ max_mem_size = mem_size;
+#endif
+
return 0;
}
early_param("mem", parse_memopt);
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 07da5c6c5ba0..fb6bd0022d41 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -98,6 +98,8 @@ extern void __online_page_free(struct page *page);

extern int try_online_node(int nid);

+extern u64 max_mem_size;
+
extern bool memhp_auto_online;
/* If movable_node boot option specified */
extern bool movable_node_enabled;
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index b9a667d36c55..94f81c596151 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -96,10 +96,16 @@ void mem_hotplug_done(void)
cpus_read_unlock();
}

+u64 max_mem_size = U64_MAX;
+
/* add this memory to iomem resource */
static struct resource *register_memory_resource(u64 start, u64 size)
{
struct resource *res, *conflict;
+
+ if (start + size > max_mem_size)
+ return ERR_PTR(-E2BIG);
+
res = kzalloc(sizeof(struct resource), GFP_KERNEL);
if (!res)
return ERR_PTR(-ENOMEM);
--
2.16.4


2019-01-30 11:03:15

by William Kucharski

Subject: Re: [PATCH v2 2/2] x86/xen: dont add memory above max allowed allocation



> On Jan 30, 2019, at 1:22 AM, Juergen Gross <[email protected]> wrote:
>
> +#ifdef CONFIG_MEMORY_HOTPLUG
> + /*
> + * Don't allow adding memory not in E820 map while
> + * booting the system. Once the balloon driver is up
> + * it will remove that restriction again.
> + */
> + max_mem_size = xen_e820_table.entries[i].addr +
> + xen_e820_table.entries[i].size;
> +#endif
> }
>
> if (!discard)

Reviewed-by: William Kucharski <[email protected]>


2019-02-01 18:49:03

by Boris Ostrovsky

Subject: Re: [PATCH v2 2/2] x86/xen: dont add memory above max allowed allocation

On 1/30/19 3:22 AM, Juergen Gross wrote:
> Don't allow memory to be added above the allowed maximum allocation
> limit set by Xen.
>
> Trying to do so would result in cases like the following:
>
> [ 584.559652] ------------[ cut here ]------------
> [ 584.564897] WARNING: CPU: 2 PID: 1 at ../arch/x86/xen/multicalls.c:129 xen_alloc_pte+0x1c7/0x390()
> [ 584.575151] Modules linked in:
> [ 584.578643] Supported: Yes
> [ 584.581750] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.4.120-92.70-default #1
> [ 584.590000] Hardware name: Cisco Systems Inc UCSC-C460-M4/UCSC-C460-M4, BIOS C460M4.4.0.1b.0.0629181419 06/29/2018
> [ 584.601862] 0000000000000000 ffffffff813175a0 0000000000000000 ffffffff8184777c
> [ 584.610200] ffffffff8107f4e1 ffff880487eb7000 ffff8801862b79c0 ffff88048608d290
> [ 584.618537] 0000000000487eb7 ffffea0000000201 ffffffff81009de7 ffffffff81068561
> [ 584.626876] Call Trace:
> [ 584.629699] [<ffffffff81019ad9>] dump_trace+0x59/0x340
> [ 584.635645] [<ffffffff81019eaa>] show_stack_log_lvl+0xea/0x170
> [ 584.642391] [<ffffffff8101ac51>] show_stack+0x21/0x40
> [ 584.648238] [<ffffffff813175a0>] dump_stack+0x5c/0x7c
> [ 584.654085] [<ffffffff8107f4e1>] warn_slowpath_common+0x81/0xb0
> [ 584.660932] [<ffffffff81009de7>] xen_alloc_pte+0x1c7/0x390
> [ 584.667289] [<ffffffff810647f0>] pmd_populate_kernel.constprop.6+0x40/0x80
> [ 584.675241] [<ffffffff815ecfe8>] phys_pmd_init+0x210/0x255
> [ 584.681587] [<ffffffff815ed207>] phys_pud_init+0x1da/0x247
> [ 584.687931] [<ffffffff815edb3b>] kernel_physical_mapping_init+0xf5/0x1d4
> [ 584.695682] [<ffffffff815e9bdd>] init_memory_mapping+0x18d/0x380
> [ 584.702631] [<ffffffff81064699>] arch_add_memory+0x59/0xf0
>
> Signed-off-by: Juergen Gross <[email protected]>
> ---
> arch/x86/xen/setup.c | 10 ++++++++++
> drivers/xen/xen-balloon.c | 6 ++++++
> 2 files changed, 16 insertions(+)
>
> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> index d5f303c0e656..fdb184cadaf5 100644
> --- a/arch/x86/xen/setup.c
> +++ b/arch/x86/xen/setup.c
> @@ -12,6 +12,7 @@
> #include <linux/memblock.h>
> #include <linux/cpuidle.h>
> #include <linux/cpufreq.h>
> +#include <linux/memory_hotplug.h>
>
> #include <asm/elf.h>
> #include <asm/vdso.h>
> @@ -825,6 +826,15 @@ char * __init xen_memory_setup(void)
> xen_max_p2m_pfn = pfn_s + n_pfns;
> } else
> discard = true;
> +#ifdef CONFIG_MEMORY_HOTPLUG
> + /*
> + * Don't allow adding memory not in E820 map while
> + * booting the system. Once the balloon driver is up
> + * it will remove that restriction again.
> + */
> + max_mem_size = xen_e820_table.entries[i].addr +
> + xen_e820_table.entries[i].size;
> +#endif
> }
>
> if (!discard)
> diff --git a/drivers/xen/xen-balloon.c b/drivers/xen/xen-balloon.c
> index 2acbfe104e46..2a960fcc812e 100644
> --- a/drivers/xen/xen-balloon.c
> +++ b/drivers/xen/xen-balloon.c
> @@ -37,6 +37,7 @@
> #include <linux/mm_types.h>
> #include <linux/init.h>
> #include <linux/capability.h>
> +#include <linux/memory_hotplug.h>
>
> #include <xen/xen.h>
> #include <xen/interface/xen.h>
> @@ -63,6 +64,11 @@ static void watch_target(struct xenbus_watch *watch,
> static bool watch_fired;
> static long target_diff;
>
> +#ifdef CONFIG_MEMORY_HOTPLUG
> + /* The balloon driver will take care of adding memory now. */
> + max_mem_size = U64_MAX;
> +#endif


I don't think I understand this. Are you saying the guest should ignore
'mem' boot option?

-boris


> +
> err = xenbus_scanf(XBT_NIL, "memory", "target", "%llu", &new_target);
> if (err != 1) {
> /* This is ok (for domain0 at least) - so just return */


2019-02-07 06:32:54

by Juergen Gross

Subject: Re: [PATCH v2 2/2] x86/xen: dont add memory above max allowed allocation

On 01/02/2019 19:46, Boris Ostrovsky wrote:
> On 1/30/19 3:22 AM, Juergen Gross wrote:
>> Don't allow memory to be added above the allowed maximum allocation
>> limit set by Xen.
>>
>> Trying to do so would result in cases like the following:
>>
>> [ 584.559652] ------------[ cut here ]------------
>> [ 584.564897] WARNING: CPU: 2 PID: 1 at ../arch/x86/xen/multicalls.c:129 xen_alloc_pte+0x1c7/0x390()
>> [ 584.575151] Modules linked in:
>> [ 584.578643] Supported: Yes
>> [ 584.581750] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.4.120-92.70-default #1
>> [ 584.590000] Hardware name: Cisco Systems Inc UCSC-C460-M4/UCSC-C460-M4, BIOS C460M4.4.0.1b.0.0629181419 06/29/2018
>> [ 584.601862] 0000000000000000 ffffffff813175a0 0000000000000000 ffffffff8184777c
>> [ 584.610200] ffffffff8107f4e1 ffff880487eb7000 ffff8801862b79c0 ffff88048608d290
>> [ 584.618537] 0000000000487eb7 ffffea0000000201 ffffffff81009de7 ffffffff81068561
>> [ 584.626876] Call Trace:
>> [ 584.629699] [<ffffffff81019ad9>] dump_trace+0x59/0x340
>> [ 584.635645] [<ffffffff81019eaa>] show_stack_log_lvl+0xea/0x170
>> [ 584.642391] [<ffffffff8101ac51>] show_stack+0x21/0x40
>> [ 584.648238] [<ffffffff813175a0>] dump_stack+0x5c/0x7c
>> [ 584.654085] [<ffffffff8107f4e1>] warn_slowpath_common+0x81/0xb0
>> [ 584.660932] [<ffffffff81009de7>] xen_alloc_pte+0x1c7/0x390
>> [ 584.667289] [<ffffffff810647f0>] pmd_populate_kernel.constprop.6+0x40/0x80
>> [ 584.675241] [<ffffffff815ecfe8>] phys_pmd_init+0x210/0x255
>> [ 584.681587] [<ffffffff815ed207>] phys_pud_init+0x1da/0x247
>> [ 584.687931] [<ffffffff815edb3b>] kernel_physical_mapping_init+0xf5/0x1d4
>> [ 584.695682] [<ffffffff815e9bdd>] init_memory_mapping+0x18d/0x380
>> [ 584.702631] [<ffffffff81064699>] arch_add_memory+0x59/0xf0
>>
>> Signed-off-by: Juergen Gross <[email protected]>
>> ---
>> arch/x86/xen/setup.c | 10 ++++++++++
>> drivers/xen/xen-balloon.c | 6 ++++++
>> 2 files changed, 16 insertions(+)
>>
>> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
>> index d5f303c0e656..fdb184cadaf5 100644
>> --- a/arch/x86/xen/setup.c
>> +++ b/arch/x86/xen/setup.c
>> @@ -12,6 +12,7 @@
>> #include <linux/memblock.h>
>> #include <linux/cpuidle.h>
>> #include <linux/cpufreq.h>
>> +#include <linux/memory_hotplug.h>
>>
>> #include <asm/elf.h>
>> #include <asm/vdso.h>
>> @@ -825,6 +826,15 @@ char * __init xen_memory_setup(void)
>> xen_max_p2m_pfn = pfn_s + n_pfns;
>> } else
>> discard = true;
>> +#ifdef CONFIG_MEMORY_HOTPLUG
>> + /*
>> + * Don't allow adding memory not in E820 map while
>> + * booting the system. Once the balloon driver is up
>> + * it will remove that restriction again.
>> + */
>> + max_mem_size = xen_e820_table.entries[i].addr +
>> + xen_e820_table.entries[i].size;
>> +#endif
>> }
>>
>> if (!discard)
>> diff --git a/drivers/xen/xen-balloon.c b/drivers/xen/xen-balloon.c
>> index 2acbfe104e46..2a960fcc812e 100644
>> --- a/drivers/xen/xen-balloon.c
>> +++ b/drivers/xen/xen-balloon.c
>> @@ -37,6 +37,7 @@
>> #include <linux/mm_types.h>
>> #include <linux/init.h>
>> #include <linux/capability.h>
>> +#include <linux/memory_hotplug.h>
>>
>> #include <xen/xen.h>
>> #include <xen/interface/xen.h>
>> @@ -63,6 +64,11 @@ static void watch_target(struct xenbus_watch *watch,
>> static bool watch_fired;
>> static long target_diff;
>>
>> +#ifdef CONFIG_MEMORY_HOTPLUG
>> + /* The balloon driver will take care of adding memory now. */
>> + max_mem_size = U64_MAX;
>> +#endif
>
>
> I don't think I understand this. Are you saying the guest should ignore
> 'mem' boot option?

No, I just managed to forget thinking about that possibility.

I need to save the old max_mem_size setting in setup.c and restore it here.
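
A minimal user-space sketch of that save-and-restore idea, under the
assumption that a single saved value is enough. This is not the follow-up
patch: saved_max_mem_size, xen_boot_clamp() and balloon_restore_limit()
are hypothetical names, and max_mem_size is a plain stand-in for the
kernel global.

```c
#include <stdint.h>

/* Stand-in for the kernel global from patch 1. */
uint64_t max_mem_size = UINT64_MAX;

/* Hypothetical: the limit that was in effect before Xen setup ran,
 * e.g. a value set by "mem=" or UINT64_MAX if none was given. */
uint64_t saved_max_mem_size = UINT64_MAX;

/* Models the setup.c side: remember the previous limit, then restrict
 * hotplug to the end of E820 RAM while the system is booting. */
void xen_boot_clamp(uint64_t e820_ram_end)
{
	saved_max_mem_size = max_mem_size;
	max_mem_size = e820_ram_end;
}

/* Models the balloon driver side: restore the previous limit instead
 * of unconditionally setting U64_MAX, so "mem=" stays in effect. */
void balloon_restore_limit(void)
{
	max_mem_size = saved_max_mem_size;
}
```

If "mem=" set an 8GB limit and E820 RAM ends at 6GB, the limit is 6GB
during boot and returns to 8GB once the balloon driver is up.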


Juergen

2019-02-11 12:08:18

by Ingo Molnar

Subject: Re: [PATCH v2 1/2] x86: respect memory size limiting via mem= parameter


* Juergen Gross <[email protected]> wrote:

> When limiting memory size via kernel parameter "mem=" this should be
> respected even in case of memory made accessible via a PCI card.
>
> Today this kind of memory won't be made usable in initial memory
> setup as the memory won't be visible in E820 map, but it might be
> added when adding PCI devices due to corresponding ACPI table entries.
>
> Not respecting "mem=" can be corrected by adding a global max_mem_size
> variable set by parse_memopt() which will result in rejecting adding
> memory areas resulting in a memory size above the allowed limit.

So historically 'mem=xxxM' was a way to quickly limit RAM.

If PCI devices had physical mmio memory areas above this range, we'd
still expect them to work - the option was really only meant to limit
RAM.

So I'm wondering what the new logic is here - why should an iomem
resource from a PCI device be ignored? It's a completely separate area
that might or might not be enumerated in the e820 table - the only
requirement we have here I think is that it not overlap RAM areas or each
other (obviously).

So if I understood this new restriction you want mem= to imply, devices
would start failing to initialize on bare metal when mem= is used?

Thanks,

Ingo

2019-02-11 12:16:25

by Juergen Gross

Subject: Re: [Xen-devel] [PATCH v2 1/2] x86: respect memory size limiting via mem= parameter

On 11/02/2019 13:06, Ingo Molnar wrote:
>
> * Juergen Gross <[email protected]> wrote:
>
>> When limiting memory size via kernel parameter "mem=" this should be
>> respected even in case of memory made accessible via a PCI card.
>>
>> Today this kind of memory won't be made usable in initial memory
>> setup as the memory won't be visible in E820 map, but it might be
>> added when adding PCI devices due to corresponding ACPI table entries.
>>
>> Not respecting "mem=" can be corrected by adding a global max_mem_size
>> variable set by parse_memopt() which will result in rejecting adding
>> memory areas resulting in a memory size above the allowed limit.
>
> So historically 'mem=xxxM' was a way to quickly limit RAM.

Right.

> If PCI devices had physical mmio memory areas above this range, we'd
> still expect them to work - the option was really only meant to limit
> RAM.

No, in this case it seems to be real RAM added via PCI. The RAM is
initially present in the E820 map, but the "mem=" will remove it from
there again. During ACPI scan it is found (again) and will be added
via hotplug mechanism, so "mem=" has no effect for that memory.


Juergen

2019-02-11 12:24:15

by Ingo Molnar

Subject: Re: [Xen-devel] [PATCH v2 1/2] x86: respect memory size limiting via mem= parameter


* Juergen Gross <[email protected]> wrote:

> > If PCI devices had physical mmio memory areas above this range, we'd
> > still expect them to work - the option was really only meant to limit
> > RAM.
>
> No, in this case it seems to be real RAM added via PCI. The RAM is
> initially present in the E820 map, but the "mem=" will remove it from
> there again. During ACPI scan it is found (again) and will be added via
> hotplug mechanism, so "mem=" has no effect for that memory.

OK. With that background:

Acked-by: Ingo Molnar <[email protected]>

I suppose you want this to go upstream via the Xen tree, which is the
main testcase for the bug to begin with?

Thanks,

Ingo

2019-02-11 12:35:48

by Juergen Gross

Subject: Re: [Xen-devel] [PATCH v2 1/2] x86: respect memory size limiting via mem= parameter

On 11/02/2019 13:23, Ingo Molnar wrote:
>
> * Juergen Gross <[email protected]> wrote:
>
>>> If PCI devices had physical mmio memory areas above this range, we'd
>>> still expect them to work - the option was really only meant to limit
>>> RAM.
>>
>> No, in this case it seems to be real RAM added via PCI. The RAM is
>> initially present in the E820 map, but the "mem=" will remove it from
>> there again. During ACPI scan it is found (again) and will be added via
>> hotplug mechanism, so "mem=" has no effect for that memory.
>
> OK. With that background:
>
> Acked-by: Ingo Molnar <[email protected]>
>
> I suppose you want this to go upstream via the Xen tree, which is the
> main testcase for the bug to begin with?

Yes, I'd prefer that.


Juergen