2014-01-08 15:26:53

by Baoquan He

Subject: kdump failed because of hotplug memory adding in kdump kernel


Hi,

We found a bug on an Intel NUMA machine with hotplug memory. On this
machine NUMA is on in the 1st kernel but off in the 2nd kernel. With
512M of memory reserved, or even more, kdump always fails. The error log
is attached.

From the log it is clear that hotplug memory causes this. If we set the
memory_device_handler to NULL to skip adding the hotplug memory, kdump
succeeds.

Below is personal analysis:
-------------------------
From the ACPI code: during 1st kernel init, the BIOS/EFI detects the
hardware and forms the E820 table, and ACPI is prepared as well. The E820
map is then built and passed to the 2nd kernel. However, during
initialization of the 2nd kernel, ACPI table init calls
acpi_os_get_root_pointer(). This function first checks whether acpi_rsdp
is set. If not, it checks whether the RSDP can be fetched from EFI.
Finally, it searches the EBDA or the area between E0000 and FFFFF
directly. The 3rd case is what happens in this bug. Once the RSDP is
found, the kernel initializes all ACPI tables and builds the namespace
tree, which includes the hotplug memory namespace object. That's why
hotplug memory can be found. So during acpi_bus_scan the
memory_device_handler is matched and acpi_memory_device_add is called.
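
The lookup order described above can be sketched in user space as follows.
This is only an illustration of the three-step fallback, not the kernel
code: the struct, the enum and pick_rsdp() are made-up names, and each
candidate address is assumed to have been filled in by some earlier step
(0 meaning "not available").

```c
#include <stdint.h>

/* Where the chosen RSDP address came from (illustrative names only). */
enum rsdp_source { RSDP_NONE, RSDP_PRESET, RSDP_EFI, RSDP_LEGACY_SCAN };

/* Candidate addresses; 0 means that source had nothing to offer. */
struct rsdp_lookup {
    uint64_t preset_addr; /* stands in for a preset acpi_rsdp value */
    uint64_t efi_addr;    /* stands in for the address found via EFI */
    uint64_t legacy_addr; /* stands in for the EBDA / E0000-FFFFF scan */
};

/* Return the first available candidate, tried in the order above. */
enum rsdp_source pick_rsdp(const struct rsdp_lookup *l, uint64_t *addr)
{
    if (l->preset_addr) {
        *addr = l->preset_addr;
        return RSDP_PRESET;
    }
    if (l->efi_addr) {
        *addr = l->efi_addr;
        return RSDP_EFI;
    }
    if (l->legacy_addr) {
        *addr = l->legacy_addr;
        return RSDP_LEGACY_SCAN;
    }
    *addr = 0;
    return RSDP_NONE;
}
```

With only legacy_addr set, pick_rsdp() ends up at the third step, which
is the situation described for this machine.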


Now questions:
1) If the ACPI tables are located the way acpi_os_get_root_pointer() does
it, the ACPI regions are still passed to the 2nd kernel by memmap=xx#yy,
but where are they used? I can see these regions are added into E820 in
the kdump kernel.

If the ACPI regions passed from the 1st kernel by memmap=xx#yy are not
used, and ACPI is always detected like this, I guess there must be
something wrong with it.

2) If we disable memory hotplug in the second kernel and the first kernel
reserved memory in a hotplug memory region, will the second kernel still
see that memory?
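
For readers unfamiliar with the memmap=xx#yy notation used in question 1,
here is a small user-space sketch of how such a value can be decoded: a
size, a separator that selects the region type ('@' for RAM, '#' for ACPI
data, '$' for reserved), then a start address. The function and type names
are made up for this example and the addresses in any usage are
hypothetical; the kernel has its own parser.

```c
#include <stdint.h>
#include <stdlib.h>

enum region_type { REGION_RAM, REGION_ACPI, REGION_RESERVED, REGION_INVALID };

struct region {
    uint64_t start;
    uint64_t size;
    enum region_type type;
};

/* Parse a number with an optional K/M/G suffix; *end is set past it. */
uint64_t parse_size(const char *s, char **end)
{
    uint64_t v = strtoull(s, end, 0);

    switch (**end) {
    case 'G': case 'g': v <<= 10; /* fall through */
    case 'M': case 'm': v <<= 10; /* fall through */
    case 'K': case 'k': v <<= 10; (*end)++; break;
    default: break;
    }
    return v;
}

/* Parse the value after "memmap=", e.g. "64K#0x7bff0000". 0 on success. */
int parse_memmap(const char *arg, struct region *r)
{
    char *p;
    const char *q;

    r->size = parse_size(arg, &p);
    if (p == arg)
        return -1; /* no leading number, e.g. "exactmap" */

    switch (*p) {
    case '@': r->type = REGION_RAM; break;
    case '#': r->type = REGION_ACPI; break;
    case '$': r->type = REGION_RESERVED; break;
    default:  r->type = REGION_INVALID; return -1;
    }

    q = p + 1;
    r->start = parse_size(q, &p);
    if (p == q || *p != '\0')
        return -1; /* missing start address or trailing junk */
    return 0;
}
```

A value such as "64K#0x7bff0000" would decode to a 65536-byte ACPI data
region starting at 0x7bff0000.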

Please help check this.

Thanks
Baoquan


Attachments:
error_all.log (69.70 kB)
disable-acpi-mem-hotplug-for-exactmap.patch (1.47 kB)

2014-01-08 15:58:59

by Vivek Goyal

Subject: Re: kdump failed because of hotplug memory adding in kdump kernel

On Wed, Jan 08, 2014 at 11:26:43PM +0800, Baoquan wrote:

[..]
> [ 1.592222] acpi PNP0A03:03: fail to add MMCONFIG information, can't access extended PCI configuration space under this bridge.
> [ 1.605045] PCI host bridge to bus 0000:ff
> [ 1.609615] pci_bus 0000:ff: root bus resource [bus ff]
> [ 1.632117] System RAM resource [mem 0x01000000-0x7bffffff] cannot be added
> [ 1.639892] init_memory_mapping: [mem 0x100000000-0x87fffffff]
> [ 1.717793] swapper/0: page allocation failure: order:9, mode:0x84d0
> [ 1.724884] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.10.0-59.el7.x86_64 #1
> [ 1.732842] Hardware name: QCI QSSC-S4R/QSSC-S4R, BIOS QSSC-S4R.QCI.01.00.S001.032520101647 03/25/2010
> [ 1.743224] 0000000000000000 ffff8800339878c8 ffffffff815b64ad ffff880033987950
> [ 1.751513] ffffffff8113a980 ffff88003673ab28 00000000000001fe 0000000000000001
> [ 1.759804] ffff880000000040 ffffffff810bc28a 0000000000000000 0000000000000200
> [ 1.768096] Call Trace: [348/1928]
> [ 1.770834] [<ffffffff815b64ad>] dump_stack+0x19/0x1b
> [ 1.776561] [<ffffffff8113a980>] warn_alloc_failed+0xf0/0x160
> [ 1.783076] [<ffffffff810bc28a>] ? on_each_cpu_mask+0x2a/0x60
> [ 1.789581] [<ffffffff8113e92f>] __alloc_pages_nodemask+0x7ff/0xa00
> [ 1.796672] [<ffffffff815ada2c>] vmemmap_alloc_block+0x62/0xba
> [ 1.803274] [<ffffffff815ada99>] vmemmap_alloc_block_buf+0x15/0x3b
> [ 1.810263] [<ffffffff815ab8a6>] vmemmap_populate+0xb4/0x21b
> [ 1.816673] [<ffffffff815adecd>] sparse_mem_map_populate+0x27/0x35
> [ 1.823665] [<ffffffff815ad8bf>] sparse_add_one_section+0x7a/0x185
> [ 1.830659] [<ffffffff8159b74f>] __add_pages+0xaf/0x240
> [ 1.836588] [<ffffffff81047359>] arch_add_memory+0x59/0xd0
> [ 1.842804] [<ffffffff8159ba89>] add_memory+0xb9/0x1b0
> [ 1.848638] [<ffffffff8132dd2c>] acpi_memory_device_add+0x18d/0x26d
> [ 1.855728] [<ffffffff81303b91>] acpi_bus_device_attach+0x7d/0xcd
> [ 1.862625] [<ffffffff8131d92d>] acpi_ns_walk_namespace+0xc8/0x17f
> [ 1.869616] [<ffffffff81303b14>] ? acpi_bus_type_and_status+0x90/0x90
> [ 1.876896] [<ffffffff81303b14>] ? acpi_bus_type_and_status+0x90/0x90
> [ 1.884177] [<ffffffff8131de1c>] acpi_walk_namespace+0x95/0xc5
> [ 1.890780] [<ffffffff81304866>] acpi_bus_scan+0x8b/0x9d
> [ 1.896805] [<ffffffff81a14a15>] acpi_scan_init+0x63/0x160
> [ 1.903021] [<ffffffff81a14830>] acpi_init+0x25d/0x2a6

So basically acpi thinks that some memory block is a hot plug memory
and tries to add it. And that consumes lots of memory and we don't have
that memory in second kernel.

For this reason, we pass a custom E820 map to second kernel so that it
only initializes page tables and memmap array for a very small physical
memory range.
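
To put rough numbers on this, here is a back-of-the-envelope sketch. It
assumes 4K pages, a 64-byte struct page and 128M sparsemem sections; these
are typical x86_64 values, not something stated in the log. The range is
the one printed by init_memory_mapping above.

```c
#include <stdint.h>

#define PAGE_BYTES        4096ULL
#define STRUCT_PAGE_BYTES 64ULL          /* assumed size of struct page */
#define SECTION_BYTES     (128ULL << 20) /* assumed sparsemem section size */

/* Bytes of struct page needed to describe [start, end], end inclusive. */
uint64_t memmap_bytes(uint64_t start, uint64_t end)
{
    return (end - start + 1) / PAGE_BYTES * STRUCT_PAGE_BYTES;
}

/* Smallest order such that (PAGE_BYTES << order) >= bytes. */
unsigned int alloc_order(uint64_t bytes)
{
    unsigned int order = 0;

    while ((PAGE_BYTES << order) < bytes)
        order++;
    return order;
}
```

Under those assumptions, memmap_bytes(0x100000000, 0x87fffffff) comes to
480M for the 30G range, which is nearly all of a 512M reservation. One
section needs 2M of memmap, and alloc_order() of 2M is 9, which matches
the order:9 allocation failure in the trace.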

Now the question is, what is hot plug memory? In this case we have not
physically plugged in any memory. So why is ACPI treating this memory as
a hot add operation?

Are there memory hotplug slots, and are these ranges always considered
hot added memory? IOW, what if I hotplug memory and then reboot the
system? Will the new E820 map contain this new memory range or not?

I guess the simplest way to solve this problem might be to disable memory
hot plug in the kdump kernel. Is there any command line parameter to do
that?

If we disable memory hotplug in the second kernel, and hot plug memory
is passed in the E820 map, will it still work? Can I access that memory
in the second kernel?

Thanks
Vivek

2014-01-08 22:53:30

by Rafael J. Wysocki

Subject: Re: kdump failed because of hotplug memory adding in kdump kernel

On Wednesday, January 08, 2014 10:58:29 AM Vivek Goyal wrote:
> On Wed, Jan 08, 2014 at 11:26:43PM +0800, Baoquan wrote:
>
> [..]
> > [ 1.592222] acpi PNP0A03:03: fail to add MMCONFIG information, can't access extended PCI configuration space under this bridge.
> > [ 1.605045] PCI host bridge to bus 0000:ff
> > [ 1.609615] pci_bus 0000:ff: root bus resource [bus ff]
> > [ 1.632117] System RAM resource [mem 0x01000000-0x7bffffff] cannot be added
> > [ 1.639892] init_memory_mapping: [mem 0x100000000-0x87fffffff]
> > [ 1.717793] swapper/0: page allocation failure: order:9, mode:0x84d0
> > [ 1.724884] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.10.0-59.el7.x86_64 #1
> > [ 1.732842] Hardware name: QCI QSSC-S4R/QSSC-S4R, BIOS QSSC-S4R.QCI.01.00.S001.032520101647 03/25/2010
> > [ 1.743224] 0000000000000000 ffff8800339878c8 ffffffff815b64ad ffff880033987950
> > [ 1.751513] ffffffff8113a980 ffff88003673ab28 00000000000001fe 0000000000000001
> > [ 1.759804] ffff880000000040 ffffffff810bc28a 0000000000000000 0000000000000200
> > [ 1.768096] Call Trace: [348/1928]
> > [ 1.770834] [<ffffffff815b64ad>] dump_stack+0x19/0x1b
> > [ 1.776561] [<ffffffff8113a980>] warn_alloc_failed+0xf0/0x160
> > [ 1.783076] [<ffffffff810bc28a>] ? on_each_cpu_mask+0x2a/0x60
> > [ 1.789581] [<ffffffff8113e92f>] __alloc_pages_nodemask+0x7ff/0xa00
> > [ 1.796672] [<ffffffff815ada2c>] vmemmap_alloc_block+0x62/0xba
> > [ 1.803274] [<ffffffff815ada99>] vmemmap_alloc_block_buf+0x15/0x3b
> > [ 1.810263] [<ffffffff815ab8a6>] vmemmap_populate+0xb4/0x21b
> > [ 1.816673] [<ffffffff815adecd>] sparse_mem_map_populate+0x27/0x35
> > [ 1.823665] [<ffffffff815ad8bf>] sparse_add_one_section+0x7a/0x185
> > [ 1.830659] [<ffffffff8159b74f>] __add_pages+0xaf/0x240
> > [ 1.836588] [<ffffffff81047359>] arch_add_memory+0x59/0xd0
> > [ 1.842804] [<ffffffff8159ba89>] add_memory+0xb9/0x1b0
> > [ 1.848638] [<ffffffff8132dd2c>] acpi_memory_device_add+0x18d/0x26d
> > [ 1.855728] [<ffffffff81303b91>] acpi_bus_device_attach+0x7d/0xcd
> > [ 1.862625] [<ffffffff8131d92d>] acpi_ns_walk_namespace+0xc8/0x17f
> > [ 1.869616] [<ffffffff81303b14>] ? acpi_bus_type_and_status+0x90/0x90
> > [ 1.876896] [<ffffffff81303b14>] ? acpi_bus_type_and_status+0x90/0x90
> > [ 1.884177] [<ffffffff8131de1c>] acpi_walk_namespace+0x95/0xc5
> > [ 1.890780] [<ffffffff81304866>] acpi_bus_scan+0x8b/0x9d
> > [ 1.896805] [<ffffffff81a14a15>] acpi_scan_init+0x63/0x160
> > [ 1.903021] [<ffffffff81a14830>] acpi_init+0x25d/0x2a6
>
> So basically acpi thinks that some memory block is a hot plug memory
> and tries to add it. And that consumes lots of memory and we don't have
> that memory in second kernel.

That's not exactly the case. What seems to happen is that there is an ACPI
memory object in the ACPI namespace and the ACPI memory hotplug driver
attempts to bind to it. That driver attempts to find removable memory blocks
associated with that object and to add them to the memory map.

Why don't you simply append acpi=off to the kexec command line? That should
make the problem go away.

Thanks!

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2014-01-09 00:17:45

by Toshi Kani

Subject: Re: kdump failed because of hotplug memory adding in kdump kernel

On Thu, 2014-01-09 at 00:07 +0100, Rafael J. Wysocki wrote:
> On Wednesday, January 08, 2014 10:58:29 AM Vivek Goyal wrote:
> > On Wed, Jan 08, 2014 at 11:26:43PM +0800, Baoquan wrote:
> >
> > [..]
> > > [ 1.592222] acpi PNP0A03:03: fail to add MMCONFIG information, can't access extended PCI configuration space under this bridge.
> > > [ 1.605045] PCI host bridge to bus 0000:ff
> > > [ 1.609615] pci_bus 0000:ff: root bus resource [bus ff]
> > > [ 1.632117] System RAM resource [mem 0x01000000-0x7bffffff] cannot be added
> > > [ 1.639892] init_memory_mapping: [mem 0x100000000-0x87fffffff]
> > > [ 1.717793] swapper/0: page allocation failure: order:9, mode:0x84d0
> > > [ 1.724884] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.10.0-59.el7.x86_64 #1
> > > [ 1.732842] Hardware name: QCI QSSC-S4R/QSSC-S4R, BIOS QSSC-S4R.QCI.01.00.S001.032520101647 03/25/2010
> > > [ 1.743224] 0000000000000000 ffff8800339878c8 ffffffff815b64ad ffff880033987950
> > > [ 1.751513] ffffffff8113a980 ffff88003673ab28 00000000000001fe 0000000000000001
> > > [ 1.759804] ffff880000000040 ffffffff810bc28a 0000000000000000 0000000000000200
> > > [ 1.768096] Call Trace: [348/1928]
> > > [ 1.770834] [<ffffffff815b64ad>] dump_stack+0x19/0x1b
> > > [ 1.776561] [<ffffffff8113a980>] warn_alloc_failed+0xf0/0x160
> > > [ 1.783076] [<ffffffff810bc28a>] ? on_each_cpu_mask+0x2a/0x60
> > > [ 1.789581] [<ffffffff8113e92f>] __alloc_pages_nodemask+0x7ff/0xa00
> > > [ 1.796672] [<ffffffff815ada2c>] vmemmap_alloc_block+0x62/0xba
> > > [ 1.803274] [<ffffffff815ada99>] vmemmap_alloc_block_buf+0x15/0x3b
> > > [ 1.810263] [<ffffffff815ab8a6>] vmemmap_populate+0xb4/0x21b
> > > [ 1.816673] [<ffffffff815adecd>] sparse_mem_map_populate+0x27/0x35
> > > [ 1.823665] [<ffffffff815ad8bf>] sparse_add_one_section+0x7a/0x185
> > > [ 1.830659] [<ffffffff8159b74f>] __add_pages+0xaf/0x240
> > > [ 1.836588] [<ffffffff81047359>] arch_add_memory+0x59/0xd0
> > > [ 1.842804] [<ffffffff8159ba89>] add_memory+0xb9/0x1b0
> > > [ 1.848638] [<ffffffff8132dd2c>] acpi_memory_device_add+0x18d/0x26d
> > > [ 1.855728] [<ffffffff81303b91>] acpi_bus_device_attach+0x7d/0xcd
> > > [ 1.862625] [<ffffffff8131d92d>] acpi_ns_walk_namespace+0xc8/0x17f
> > > [ 1.869616] [<ffffffff81303b14>] ? acpi_bus_type_and_status+0x90/0x90
> > > [ 1.876896] [<ffffffff81303b14>] ? acpi_bus_type_and_status+0x90/0x90
> > > [ 1.884177] [<ffffffff8131de1c>] acpi_walk_namespace+0x95/0xc5
> > > [ 1.890780] [<ffffffff81304866>] acpi_bus_scan+0x8b/0x9d
> > > [ 1.896805] [<ffffffff81a14a15>] acpi_scan_init+0x63/0x160
> > > [ 1.903021] [<ffffffff81a14830>] acpi_init+0x25d/0x2a6
> >
> > So basically acpi thinks that some memory block is a hot plug memory
> > and tries to add it. And that consumes lots of memory and we don't have
> > that memory in second kernel.
>
> That's not exactly the case. What seems to happen is that there is an ACPI
> memory object in the ACPI namespace and the ACPI memory hotplug driver
> attempts to bind to it. That driver attempts to find removable memory blocks
> associated with that object and to add them to the memory map.
>
> Why don't you simply append acpi=off to the kexec command line? That should
> make the problem go away.

Yes, that should work, but Baoquan's approach makes sense to me. When
memmap=exactmap is specified, the kernel should ignore any memory
information from the firmware.

Thanks,
-Toshi

2014-01-09 03:22:41

by Baoquan He

Subject: Re: kdump failed because of hotplug memory adding in kdump kernel

On 01/09/14 at 12:07am, Rafael J. Wysocki wrote:
> On Wednesday, January 08, 2014 10:58:29 AM Vivek Goyal wrote:
> > On Wed, Jan 08, 2014 at 11:26:43PM +0800, Baoquan wrote:
> >
> > [..]
> > > [ 1.592222] acpi PNP0A03:03: fail to add MMCONFIG information, can't access extended PCI configuration space under this bridge.
> > > [ 1.605045] PCI host bridge to bus 0000:ff
> > > [ 1.609615] pci_bus 0000:ff: root bus resource [bus ff]
> > > [ 1.632117] System RAM resource [mem 0x01000000-0x7bffffff] cannot be added
> > > [ 1.639892] init_memory_mapping: [mem 0x100000000-0x87fffffff]
> > > [ 1.717793] swapper/0: page allocation failure: order:9, mode:0x84d0
> > > [ 1.724884] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.10.0-59.el7.x86_64 #1
> > > [ 1.732842] Hardware name: QCI QSSC-S4R/QSSC-S4R, BIOS QSSC-S4R.QCI.01.00.S001.032520101647 03/25/2010
> > > [ 1.743224] 0000000000000000 ffff8800339878c8 ffffffff815b64ad ffff880033987950
> > > [ 1.751513] ffffffff8113a980 ffff88003673ab28 00000000000001fe 0000000000000001
> > > [ 1.759804] ffff880000000040 ffffffff810bc28a 0000000000000000 0000000000000200
> > > [ 1.768096] Call Trace: [348/1928]
> > > [ 1.770834] [<ffffffff815b64ad>] dump_stack+0x19/0x1b
> > > [ 1.776561] [<ffffffff8113a980>] warn_alloc_failed+0xf0/0x160
> > > [ 1.783076] [<ffffffff810bc28a>] ? on_each_cpu_mask+0x2a/0x60
> > > [ 1.789581] [<ffffffff8113e92f>] __alloc_pages_nodemask+0x7ff/0xa00
> > > [ 1.796672] [<ffffffff815ada2c>] vmemmap_alloc_block+0x62/0xba
> > > [ 1.803274] [<ffffffff815ada99>] vmemmap_alloc_block_buf+0x15/0x3b
> > > [ 1.810263] [<ffffffff815ab8a6>] vmemmap_populate+0xb4/0x21b
> > > [ 1.816673] [<ffffffff815adecd>] sparse_mem_map_populate+0x27/0x35
> > > [ 1.823665] [<ffffffff815ad8bf>] sparse_add_one_section+0x7a/0x185
> > > [ 1.830659] [<ffffffff8159b74f>] __add_pages+0xaf/0x240
> > > [ 1.836588] [<ffffffff81047359>] arch_add_memory+0x59/0xd0
> > > [ 1.842804] [<ffffffff8159ba89>] add_memory+0xb9/0x1b0
> > > [ 1.848638] [<ffffffff8132dd2c>] acpi_memory_device_add+0x18d/0x26d
> > > [ 1.855728] [<ffffffff81303b91>] acpi_bus_device_attach+0x7d/0xcd
> > > [ 1.862625] [<ffffffff8131d92d>] acpi_ns_walk_namespace+0xc8/0x17f
> > > [ 1.869616] [<ffffffff81303b14>] ? acpi_bus_type_and_status+0x90/0x90
> > > [ 1.876896] [<ffffffff81303b14>] ? acpi_bus_type_and_status+0x90/0x90
> > > [ 1.884177] [<ffffffff8131de1c>] acpi_walk_namespace+0x95/0xc5
> > > [ 1.890780] [<ffffffff81304866>] acpi_bus_scan+0x8b/0x9d
> > > [ 1.896805] [<ffffffff81a14a15>] acpi_scan_init+0x63/0x160
> > > [ 1.903021] [<ffffffff81a14830>] acpi_init+0x25d/0x2a6
> >
> > So basically acpi thinks that some memory block is a hot plug memory
> > and tries to add it. And that consumes lots of memory and we don't have
> > that memory in second kernel.
>
> That's not exactly the case. What seems to happen is that there is an ACPI
> memory object in the ACPI namespace and the ACPI memory hotplug driver
> attempts to bind to it. That driver attempts to find removable memory blocks
> associated with that object and to add them to the memory map.

Yes. On a legacy machine the kdump kernel finds the RSDP by scanning the
first 1K of the EBDA or the area between E0000 and FFFFF.
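
As an illustration of what such a scan looks for, here is a user-space
sketch that searches a byte buffer for an ACPI 1.0 style RSDP: the 8-byte
signature "RSD PTR " on a 16-byte boundary, with the first 20 bytes
summing to zero modulo 256. The buffer stands in for the EBDA or the
E0000-FFFFF area, and scan_for_rsdp() is a made-up name, not the kernel
function.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RSDP_SIG      "RSD PTR "
#define RSDP_SIG_LEN  8
#define RSDP_V1_LEN   20 /* bytes covered by the ACPI 1.0 checksum */
#define RSDP_ALIGN    16 /* the structure sits on a 16-byte boundary */

/* Sum of len bytes modulo 256. */
static uint8_t byte_sum(const uint8_t *p, size_t len)
{
    uint8_t sum = 0;

    while (len--)
        sum += *p++;
    return sum;
}

/* Return the offset of the first valid RSDP in buf, or -1 if none. */
long scan_for_rsdp(const uint8_t *buf, size_t len)
{
    size_t off;

    for (off = 0; off + RSDP_V1_LEN <= len; off += RSDP_ALIGN) {
        if (memcmp(buf + off, RSDP_SIG, RSDP_SIG_LEN) != 0)
            continue;
        if (byte_sum(buf + off, RSDP_V1_LEN) == 0)
            return (long)off;
    }
    return -1;
}
```

Nothing in this scan depends on the memory map passed on the command
line, which fits the observation that the kdump kernel still finds the
tables this way.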

>
> Why don't you simply append acpi=off to the kexec command line? That should
> make the problem go away.

acpi=off doesn't work; the kdump kernel hangs immediately after the crash
is triggered. The kdump kernel needs ACPI information, so we can't
disable it.

2014-01-09 12:56:44

by Rafael J. Wysocki

Subject: Re: kdump failed because of hotplug memory adding in kdump kernel

On Wednesday, January 08, 2014 05:11:48 PM Toshi Kani wrote:
> On Thu, 2014-01-09 at 00:07 +0100, Rafael J. Wysocki wrote:
> > On Wednesday, January 08, 2014 10:58:29 AM Vivek Goyal wrote:
> > > On Wed, Jan 08, 2014 at 11:26:43PM +0800, Baoquan wrote:
> > >
> > > [..]
> > > > [ 1.592222] acpi PNP0A03:03: fail to add MMCONFIG information, can't access extended PCI configuration space under this bridge.
> > > > [ 1.605045] PCI host bridge to bus 0000:ff
> > > > [ 1.609615] pci_bus 0000:ff: root bus resource [bus ff]
> > > > [ 1.632117] System RAM resource [mem 0x01000000-0x7bffffff] cannot be added
> > > > [ 1.639892] init_memory_mapping: [mem 0x100000000-0x87fffffff]
> > > > [ 1.717793] swapper/0: page allocation failure: order:9, mode:0x84d0
> > > > [ 1.724884] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.10.0-59.el7.x86_64 #1
> > > > [ 1.732842] Hardware name: QCI QSSC-S4R/QSSC-S4R, BIOS QSSC-S4R.QCI.01.00.S001.032520101647 03/25/2010
> > > > [ 1.743224] 0000000000000000 ffff8800339878c8 ffffffff815b64ad ffff880033987950
> > > > [ 1.751513] ffffffff8113a980 ffff88003673ab28 00000000000001fe 0000000000000001
> > > > [ 1.759804] ffff880000000040 ffffffff810bc28a 0000000000000000 0000000000000200
> > > > [ 1.768096] Call Trace: [348/1928]
> > > > [ 1.770834] [<ffffffff815b64ad>] dump_stack+0x19/0x1b
> > > > [ 1.776561] [<ffffffff8113a980>] warn_alloc_failed+0xf0/0x160
> > > > [ 1.783076] [<ffffffff810bc28a>] ? on_each_cpu_mask+0x2a/0x60
> > > > [ 1.789581] [<ffffffff8113e92f>] __alloc_pages_nodemask+0x7ff/0xa00
> > > > [ 1.796672] [<ffffffff815ada2c>] vmemmap_alloc_block+0x62/0xba
> > > > [ 1.803274] [<ffffffff815ada99>] vmemmap_alloc_block_buf+0x15/0x3b
> > > > [ 1.810263] [<ffffffff815ab8a6>] vmemmap_populate+0xb4/0x21b
> > > > [ 1.816673] [<ffffffff815adecd>] sparse_mem_map_populate+0x27/0x35
> > > > [ 1.823665] [<ffffffff815ad8bf>] sparse_add_one_section+0x7a/0x185
> > > > [ 1.830659] [<ffffffff8159b74f>] __add_pages+0xaf/0x240
> > > > [ 1.836588] [<ffffffff81047359>] arch_add_memory+0x59/0xd0
> > > > [ 1.842804] [<ffffffff8159ba89>] add_memory+0xb9/0x1b0
> > > > [ 1.848638] [<ffffffff8132dd2c>] acpi_memory_device_add+0x18d/0x26d
> > > > [ 1.855728] [<ffffffff81303b91>] acpi_bus_device_attach+0x7d/0xcd
> > > > [ 1.862625] [<ffffffff8131d92d>] acpi_ns_walk_namespace+0xc8/0x17f
> > > > [ 1.869616] [<ffffffff81303b14>] ? acpi_bus_type_and_status+0x90/0x90
> > > > [ 1.876896] [<ffffffff81303b14>] ? acpi_bus_type_and_status+0x90/0x90
> > > > [ 1.884177] [<ffffffff8131de1c>] acpi_walk_namespace+0x95/0xc5
> > > > [ 1.890780] [<ffffffff81304866>] acpi_bus_scan+0x8b/0x9d
> > > > [ 1.896805] [<ffffffff81a14a15>] acpi_scan_init+0x63/0x160
> > > > [ 1.903021] [<ffffffff81a14830>] acpi_init+0x25d/0x2a6
> > >
> > > So basically acpi thinks that some memory block is a hot plug memory
> > > and tries to add it. And that consumes lots of memory and we don't have
> > > that memory in second kernel.
> >
> > That's not exactly the case. What seems to happen is that there is an ACPI
> > memory object in the ACPI namespace and the ACPI memory hotplug driver
> > attempts to bind to it. That driver attempts to find removable memory blocks
> > associated with that object and to add them to the memory map.
> >
> > Why don't you simply append acpi=off to the kexec command line? That should
> > make the problem go away.
>
> Yes, that should work, but Baoquan's approach makes sense to me. When
> memmap=exactmap is specified, the kernel should ignore any memory
> information from the firmware.

OK

Baoquan, please modify your patch to get rid of the #ifdef CONFIG_X86 in
acpi_memory_hotplug_init(). For example, you can add a function returning true
if use_exactmap is set and false otherwise and make acpi_memory_hotplug_init()
call that function. Alternatively, you can define arch-independent
no_memory_hotplug (instead of use_exactmap) and set it for memmap=exactmap.
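
A rough user-space sketch of the first option might look like the code
below. All names here are hypothetical stand-ins (the attached patch is
not reproduced in this thread), and the flag is set by hand instead of by
command line parsing. The point is only that the init path asks a helper
instead of testing an x86 variable under #ifdef.

```c
#include <stdbool.h>

/* Stand-in for the flag that would be set when memmap=exactmap is seen. */
static bool use_exactmap;

/* Test hook for this sketch; the kernel would set the flag while parsing. */
void set_use_exactmap(bool value)
{
    use_exactmap = value;
}

/* Arch-neutral question the ACPI code can ask without any #ifdef. */
bool memory_hotplug_disabled(void)
{
    return use_exactmap;
}

/* Returns true if the memory hotplug handler would be registered. */
bool memory_hotplug_init_sketch(void)
{
    if (memory_hotplug_disabled())
        return false; /* skip registration, nothing gets hot added */

    /* The real code would register the ACPI memory device handler here. */
    return true;
}
```

An architecture without such a flag could provide a helper that always
returns false, so the caller stays identical everywhere.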

Thanks!

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2014-01-09 14:49:06

by Vivek Goyal

Subject: Re: kdump failed because of hotplug memory adding in kdump kernel

On Thu, Jan 09, 2014 at 12:07:17AM +0100, Rafael J. Wysocki wrote:
> On Wednesday, January 08, 2014 10:58:29 AM Vivek Goyal wrote:
> > On Wed, Jan 08, 2014 at 11:26:43PM +0800, Baoquan wrote:
> >
> > [..]
> > > [ 1.592222] acpi PNP0A03:03: fail to add MMCONFIG information, can't access extended PCI configuration space under this bridge.
> > > [ 1.605045] PCI host bridge to bus 0000:ff
> > > [ 1.609615] pci_bus 0000:ff: root bus resource [bus ff]
> > > [ 1.632117] System RAM resource [mem 0x01000000-0x7bffffff] cannot be added
> > > [ 1.639892] init_memory_mapping: [mem 0x100000000-0x87fffffff]
> > > [ 1.717793] swapper/0: page allocation failure: order:9, mode:0x84d0
> > > [ 1.724884] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.10.0-59.el7.x86_64 #1
> > > [ 1.732842] Hardware name: QCI QSSC-S4R/QSSC-S4R, BIOS QSSC-S4R.QCI.01.00.S001.032520101647 03/25/2010
> > > [ 1.743224] 0000000000000000 ffff8800339878c8 ffffffff815b64ad ffff880033987950
> > > [ 1.751513] ffffffff8113a980 ffff88003673ab28 00000000000001fe 0000000000000001
> > > [ 1.759804] ffff880000000040 ffffffff810bc28a 0000000000000000 0000000000000200
> > > [ 1.768096] Call Trace: [348/1928]
> > > [ 1.770834] [<ffffffff815b64ad>] dump_stack+0x19/0x1b
> > > [ 1.776561] [<ffffffff8113a980>] warn_alloc_failed+0xf0/0x160
> > > [ 1.783076] [<ffffffff810bc28a>] ? on_each_cpu_mask+0x2a/0x60
> > > [ 1.789581] [<ffffffff8113e92f>] __alloc_pages_nodemask+0x7ff/0xa00
> > > [ 1.796672] [<ffffffff815ada2c>] vmemmap_alloc_block+0x62/0xba
> > > [ 1.803274] [<ffffffff815ada99>] vmemmap_alloc_block_buf+0x15/0x3b
> > > [ 1.810263] [<ffffffff815ab8a6>] vmemmap_populate+0xb4/0x21b
> > > [ 1.816673] [<ffffffff815adecd>] sparse_mem_map_populate+0x27/0x35
> > > [ 1.823665] [<ffffffff815ad8bf>] sparse_add_one_section+0x7a/0x185
> > > [ 1.830659] [<ffffffff8159b74f>] __add_pages+0xaf/0x240
> > > [ 1.836588] [<ffffffff81047359>] arch_add_memory+0x59/0xd0
> > > [ 1.842804] [<ffffffff8159ba89>] add_memory+0xb9/0x1b0
> > > [ 1.848638] [<ffffffff8132dd2c>] acpi_memory_device_add+0x18d/0x26d
> > > [ 1.855728] [<ffffffff81303b91>] acpi_bus_device_attach+0x7d/0xcd
> > > [ 1.862625] [<ffffffff8131d92d>] acpi_ns_walk_namespace+0xc8/0x17f
> > > [ 1.869616] [<ffffffff81303b14>] ? acpi_bus_type_and_status+0x90/0x90
> > > [ 1.876896] [<ffffffff81303b14>] ? acpi_bus_type_and_status+0x90/0x90
> > > [ 1.884177] [<ffffffff8131de1c>] acpi_walk_namespace+0x95/0xc5
> > > [ 1.890780] [<ffffffff81304866>] acpi_bus_scan+0x8b/0x9d
> > > [ 1.896805] [<ffffffff81a14a15>] acpi_scan_init+0x63/0x160
> > > [ 1.903021] [<ffffffff81a14830>] acpi_init+0x25d/0x2a6
> >
> > So basically acpi thinks that some memory block is a hot plug memory
> > and tries to add it. And that consumes lots of memory and we don't have
> > that memory in second kernel.
>
> That's not exactly the case. What seems to happen is that there is an ACPI
> memory object in the ACPI namespace and the ACPI memory hotplug driver
> attempts to bind to it. That driver attempts to find removable memory blocks
> associated with that object and to add them to the memory map.
>
> Why don't you simply append acpi=off to the kexec command line? That should
> make the problem go away.

I think we need to initialize ACPI because we rely on it for other tables
and things. In fact everything in the second kernel re-initializes, so why
should ACPI be an exception? We want the second kernel's boot path to be
as close as possible to the first kernel's so that the chances of a
successful boot are higher.

So I don't think turning off ACPI is the way to go here.

The key question is why this memory is still being considered hotplugged
memory when nothing has been hotplugged. I think ACPI should not treat
this memory as hotplug memory. And if ACPI does not have a way to figure
that out, then disabling memory hotplug functionality makes sense to me.

Thanks
Vivek

2014-01-09 14:51:13

by Vivek Goyal

Subject: Re: kdump failed because of hotplug memory adding in kdump kernel

On Wed, Jan 08, 2014 at 05:11:48PM -0700, Toshi Kani wrote:
> On Thu, 2014-01-09 at 00:07 +0100, Rafael J. Wysocki wrote:
> > On Wednesday, January 08, 2014 10:58:29 AM Vivek Goyal wrote:
> > > On Wed, Jan 08, 2014 at 11:26:43PM +0800, Baoquan wrote:
> > >
> > > [..]
> > > > [ 1.592222] acpi PNP0A03:03: fail to add MMCONFIG information, can't access extended PCI configuration space under this bridge.
> > > > [ 1.605045] PCI host bridge to bus 0000:ff
> > > > [ 1.609615] pci_bus 0000:ff: root bus resource [bus ff]
> > > > [ 1.632117] System RAM resource [mem 0x01000000-0x7bffffff] cannot be added
> > > > [ 1.639892] init_memory_mapping: [mem 0x100000000-0x87fffffff]
> > > > [ 1.717793] swapper/0: page allocation failure: order:9, mode:0x84d0
> > > > [ 1.724884] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.10.0-59.el7.x86_64 #1
> > > > [ 1.732842] Hardware name: QCI QSSC-S4R/QSSC-S4R, BIOS QSSC-S4R.QCI.01.00.S001.032520101647 03/25/2010
> > > > [ 1.743224] 0000000000000000 ffff8800339878c8 ffffffff815b64ad ffff880033987950
> > > > [ 1.751513] ffffffff8113a980 ffff88003673ab28 00000000000001fe 0000000000000001
> > > > [ 1.759804] ffff880000000040 ffffffff810bc28a 0000000000000000 0000000000000200
> > > > [ 1.768096] Call Trace: [348/1928]
> > > > [ 1.770834] [<ffffffff815b64ad>] dump_stack+0x19/0x1b
> > > > [ 1.776561] [<ffffffff8113a980>] warn_alloc_failed+0xf0/0x160
> > > > [ 1.783076] [<ffffffff810bc28a>] ? on_each_cpu_mask+0x2a/0x60
> > > > [ 1.789581] [<ffffffff8113e92f>] __alloc_pages_nodemask+0x7ff/0xa00
> > > > [ 1.796672] [<ffffffff815ada2c>] vmemmap_alloc_block+0x62/0xba
> > > > [ 1.803274] [<ffffffff815ada99>] vmemmap_alloc_block_buf+0x15/0x3b
> > > > [ 1.810263] [<ffffffff815ab8a6>] vmemmap_populate+0xb4/0x21b
> > > > [ 1.816673] [<ffffffff815adecd>] sparse_mem_map_populate+0x27/0x35
> > > > [ 1.823665] [<ffffffff815ad8bf>] sparse_add_one_section+0x7a/0x185
> > > > [ 1.830659] [<ffffffff8159b74f>] __add_pages+0xaf/0x240
> > > > [ 1.836588] [<ffffffff81047359>] arch_add_memory+0x59/0xd0
> > > > [ 1.842804] [<ffffffff8159ba89>] add_memory+0xb9/0x1b0
> > > > [ 1.848638] [<ffffffff8132dd2c>] acpi_memory_device_add+0x18d/0x26d
> > > > [ 1.855728] [<ffffffff81303b91>] acpi_bus_device_attach+0x7d/0xcd
> > > > [ 1.862625] [<ffffffff8131d92d>] acpi_ns_walk_namespace+0xc8/0x17f
> > > > [ 1.869616] [<ffffffff81303b14>] ? acpi_bus_type_and_status+0x90/0x90
> > > > [ 1.876896] [<ffffffff81303b14>] ? acpi_bus_type_and_status+0x90/0x90
> > > > [ 1.884177] [<ffffffff8131de1c>] acpi_walk_namespace+0x95/0xc5
> > > > [ 1.890780] [<ffffffff81304866>] acpi_bus_scan+0x8b/0x9d
> > > > [ 1.896805] [<ffffffff81a14a15>] acpi_scan_init+0x63/0x160
> > > > [ 1.903021] [<ffffffff81a14830>] acpi_init+0x25d/0x2a6
> > >
> > > So basically acpi thinks that some memory block is a hot plug memory
> > > and tries to add it. And that consumes lots of memory and we don't have
> > > that memory in second kernel.
> >
> > That's not exactly the case. What seems to happen is that there is an ACPI
> > memory object in the ACPI namespace and the ACPI memory hotplug driver
> > attempts to bind to it. That driver attempts to find removable memory blocks
> > associated with that object and to add them to the memory map.
> >
> > Why don't you simply append acpi=off to the kexec command line? That should
> > make the problem go away.
>
> Yes, that should work, but Baoquan's approach makes sense to me. When
> memmap=exactmap is specified, the kernel should ignore any memory
> information from the firmware.

memmap=exactmap is only for the E820 map. It does not say that memory
cannot be hotplugged later. So to me specifying exactmap does not imply
that memory hotplugging is disabled.

IMO, it makes sense to have a separate knob to disable memory hotplug
behavior.

Also, from the kdump point of view, I don't want to rely on exactmap, as
in the new implementation I am planning to move away from it. I will
pass the new memory map in bootparams and stop passing it on the command
line.

Thanks
Vivek

2014-01-09 14:54:39

by Vivek Goyal

Subject: Re: kdump failed because of hotplug memory adding in kdump kernel

On Thu, Jan 09, 2014 at 02:10:26PM +0100, Rafael J. Wysocki wrote:
> On Wednesday, January 08, 2014 05:11:48 PM Toshi Kani wrote:
> > On Thu, 2014-01-09 at 00:07 +0100, Rafael J. Wysocki wrote:
> > > On Wednesday, January 08, 2014 10:58:29 AM Vivek Goyal wrote:
> > > > On Wed, Jan 08, 2014 at 11:26:43PM +0800, Baoquan wrote:
> > > >
> > > > [..]
> > > > > [ 1.592222] acpi PNP0A03:03: fail to add MMCONFIG information, can't access extended PCI configuration space under this bridge.
> > > > > [ 1.605045] PCI host bridge to bus 0000:ff
> > > > > [ 1.609615] pci_bus 0000:ff: root bus resource [bus ff]
> > > > > [ 1.632117] System RAM resource [mem 0x01000000-0x7bffffff] cannot be added
> > > > > [ 1.639892] init_memory_mapping: [mem 0x100000000-0x87fffffff]
> > > > > [ 1.717793] swapper/0: page allocation failure: order:9, mode:0x84d0
> > > > > [ 1.724884] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.10.0-59.el7.x86_64 #1
> > > > > [ 1.732842] Hardware name: QCI QSSC-S4R/QSSC-S4R, BIOS QSSC-S4R.QCI.01.00.S001.032520101647 03/25/2010
> > > > > [ 1.743224] 0000000000000000 ffff8800339878c8 ffffffff815b64ad ffff880033987950
> > > > > [ 1.751513] ffffffff8113a980 ffff88003673ab28 00000000000001fe 0000000000000001
> > > > > [ 1.759804] ffff880000000040 ffffffff810bc28a 0000000000000000 0000000000000200
> > > > > [ 1.768096] Call Trace: [348/1928]
> > > > > [ 1.770834] [<ffffffff815b64ad>] dump_stack+0x19/0x1b
> > > > > [ 1.776561] [<ffffffff8113a980>] warn_alloc_failed+0xf0/0x160
> > > > > [ 1.783076] [<ffffffff810bc28a>] ? on_each_cpu_mask+0x2a/0x60
> > > > > [ 1.789581] [<ffffffff8113e92f>] __alloc_pages_nodemask+0x7ff/0xa00
> > > > > [ 1.796672] [<ffffffff815ada2c>] vmemmap_alloc_block+0x62/0xba
> > > > > [ 1.803274] [<ffffffff815ada99>] vmemmap_alloc_block_buf+0x15/0x3b
> > > > > [ 1.810263] [<ffffffff815ab8a6>] vmemmap_populate+0xb4/0x21b
> > > > > [ 1.816673] [<ffffffff815adecd>] sparse_mem_map_populate+0x27/0x35
> > > > > [ 1.823665] [<ffffffff815ad8bf>] sparse_add_one_section+0x7a/0x185
> > > > > [ 1.830659] [<ffffffff8159b74f>] __add_pages+0xaf/0x240
> > > > > [ 1.836588] [<ffffffff81047359>] arch_add_memory+0x59/0xd0
> > > > > [ 1.842804] [<ffffffff8159ba89>] add_memory+0xb9/0x1b0
> > > > > [ 1.848638] [<ffffffff8132dd2c>] acpi_memory_device_add+0x18d/0x26d
> > > > > [ 1.855728] [<ffffffff81303b91>] acpi_bus_device_attach+0x7d/0xcd
> > > > > [ 1.862625] [<ffffffff8131d92d>] acpi_ns_walk_namespace+0xc8/0x17f
> > > > > [ 1.869616] [<ffffffff81303b14>] ? acpi_bus_type_and_status+0x90/0x90
> > > > > [ 1.876896] [<ffffffff81303b14>] ? acpi_bus_type_and_status+0x90/0x90
> > > > > [ 1.884177] [<ffffffff8131de1c>] acpi_walk_namespace+0x95/0xc5
> > > > > [ 1.890780] [<ffffffff81304866>] acpi_bus_scan+0x8b/0x9d
> > > > > [ 1.896805] [<ffffffff81a14a15>] acpi_scan_init+0x63/0x160
> > > > > [ 1.903021] [<ffffffff81a14830>] acpi_init+0x25d/0x2a6
> > > >
> > > > So basically acpi thinks that some memory block is a hot plug memory
> > > > and tries to add it. And that consumes lots of memory and we don't have
> > > > that memory in second kernel.
> > >
> > > That's not exactly the case. What seems to happen is that there is an ACPI
> > > memory object in the ACPI namespace and the ACPI memory hotplug driver
> > > attempts to bind to it. That driver attempts to find removable memory blocks
> > > associated with that object and to add them to the memory map.
> > >
> > > Why don't you simply append acpi=off to the kexec command line? That should
> > > make the problem go away.
> >
> > Yes, that should work, but Baoquan's approach makes sense to me. When
> > memmap=exactmap is specified, the kernel should ignore any memory
> > information from the firmware.
>
> OK
>
> Baoquan, please modify your patch to get rid of the #ifdef CONFIG_X86 in
> acpi_memory_hotplug_init(). For example, you can add a function returning true
> if use_exactmap is set and false otherwise and make acpi_memory_hotplug_init()
> call that function. Alternatively, you can define arch-independent
> no_memory_hotplug (instead of use_exactmap) and set if for memmap=exactmap.
>

Prarit sent a patch to introduce a no_memory_hotplug command line option. I
still think that memmap=exactmap does not necessarily mean that memory
hotplug is disabled.

What about the mem= parameter? If somebody specifies mem=1G, should that
mean there cannot be any hotplugged memory?

I think we should at least define a new command line parameter to disable
memory hotplug. After that, users can specify both memmap=exactmap and
"no_mem_hotplug" on the command line and control the behavior of the kernel.

Thanks
Vivek

2014-01-09 16:10:01

by Toshi Kani

Subject: Re: kdump failed because of hotplug memory adding in kdump kernel

On Thu, 2014-01-09 at 09:50 -0500, Vivek Goyal wrote:
> On Wed, Jan 08, 2014 at 05:11:48PM -0700, Toshi Kani wrote:
> > On Thu, 2014-01-09 at 00:07 +0100, Rafael J. Wysocki wrote:
> > > On Wednesday, January 08, 2014 10:58:29 AM Vivek Goyal wrote:
> > > > On Wed, Jan 08, 2014 at 11:26:43PM +0800, Baoquan wrote:
> > > >
> > > > [..]
> > > > > [ 1.592222] acpi PNP0A03:03: fail to add MMCONFIG information, can't access extended PCI configuration space under this bridge.
> > > > > [ 1.605045] PCI host bridge to bus 0000:ff
> > > > > [ 1.609615] pci_bus 0000:ff: root bus resource [bus ff]
> > > > > [ 1.632117] System RAM resource [mem 0x01000000-0x7bffffff] cannot be added
> > > > > [ 1.639892] init_memory_mapping: [mem 0x100000000-0x87fffffff]
> > > > > [ 1.717793] swapper/0: page allocation failure: order:9, mode:0x84d0
> > > > > [ 1.724884] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.10.0-59.el7.x86_64 #1
> > > > > [ 1.732842] Hardware name: QCI QSSC-S4R/QSSC-S4R, BIOS QSSC-S4R.QCI.01.00.S001.032520101647 03/25/2010
> > > > > [ 1.743224] 0000000000000000 ffff8800339878c8 ffffffff815b64ad ffff880033987950
> > > > > [ 1.751513] ffffffff8113a980 ffff88003673ab28 00000000000001fe 0000000000000001
> > > > > [ 1.759804] ffff880000000040 ffffffff810bc28a 0000000000000000 0000000000000200
> > > > > [ 1.768096] Call Trace: [348/1928]
> > > > > [ 1.770834] [<ffffffff815b64ad>] dump_stack+0x19/0x1b
> > > > > [ 1.776561] [<ffffffff8113a980>] warn_alloc_failed+0xf0/0x160
> > > > > [ 1.783076] [<ffffffff810bc28a>] ? on_each_cpu_mask+0x2a/0x60
> > > > > [ 1.789581] [<ffffffff8113e92f>] __alloc_pages_nodemask+0x7ff/0xa00
> > > > > [ 1.796672] [<ffffffff815ada2c>] vmemmap_alloc_block+0x62/0xba
> > > > > [ 1.803274] [<ffffffff815ada99>] vmemmap_alloc_block_buf+0x15/0x3b
> > > > > [ 1.810263] [<ffffffff815ab8a6>] vmemmap_populate+0xb4/0x21b
> > > > > [ 1.816673] [<ffffffff815adecd>] sparse_mem_map_populate+0x27/0x35
> > > > > [ 1.823665] [<ffffffff815ad8bf>] sparse_add_one_section+0x7a/0x185
> > > > > [ 1.830659] [<ffffffff8159b74f>] __add_pages+0xaf/0x240
> > > > > [ 1.836588] [<ffffffff81047359>] arch_add_memory+0x59/0xd0
> > > > > [ 1.842804] [<ffffffff8159ba89>] add_memory+0xb9/0x1b0
> > > > > [ 1.848638] [<ffffffff8132dd2c>] acpi_memory_device_add+0x18d/0x26d
> > > > > [ 1.855728] [<ffffffff81303b91>] acpi_bus_device_attach+0x7d/0xcd
> > > > > [ 1.862625] [<ffffffff8131d92d>] acpi_ns_walk_namespace+0xc8/0x17f
> > > > > [ 1.869616] [<ffffffff81303b14>] ? acpi_bus_type_and_status+0x90/0x90
> > > > > [ 1.876896] [<ffffffff81303b14>] ? acpi_bus_type_and_status+0x90/0x90
> > > > > [ 1.884177] [<ffffffff8131de1c>] acpi_walk_namespace+0x95/0xc5
> > > > > [ 1.890780] [<ffffffff81304866>] acpi_bus_scan+0x8b/0x9d
> > > > > [ 1.896805] [<ffffffff81a14a15>] acpi_scan_init+0x63/0x160
> > > > > [ 1.903021] [<ffffffff81a14830>] acpi_init+0x25d/0x2a6
> > > >
> > > > So basically acpi thinks that some memory block is a hot plug memory
> > > > and tries to add it. And that consumes lots of memory and we don't have
> > > > that memory in second kernel.
> > >
> > > That's not exactly the case. What seems to happen is that there is an ACPI
> > > memory object in the ACPI namespace and the ACPI memory hotplug driver
> > > attempts to bind to it. That driver attempts to find removable memory blocks
> > > associated with that object and to add them to the memory map.
> > >
> > > Why don't you simply append acpi=off to the kexec command line? That should
> > > make the problem go away.
> >
> > Yes, that should work, but Baoquan's approach makes sense to me. When
> > memmap=exactmap is specified, the kernel should ignore any memory
> > information from the firmware.
>
> memmap=exactmap is only for E820 map. It does not say that later memory
> can not be hotplugged. So to me specifying exactmap does not imply that
> memory hotplugging is disabled.

There are multiple ways to describe memory range info in the firmware;
e820, EFI memory descriptor table, and ACPI memory device objects. They
basically provide the same info.

This problem happens when the firmware implements ACPI memory device
objects, which are necessary to support memory hotplug, but do not mean
that the system always supports hotplug when they exist. They are
optional objects that firmware vendors may choose to implement.
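
For a sense of scale, the allocation failure in the quoted log can be estimated with some arithmetic. The sketch below is only an estimate under stated assumptions: 4 KiB base pages and a 64-byte struct page (typical for x86_64, but not stated anywhere in this thread), and the whole range printed by init_memory_mapping being hot-added. Under those assumptions the struct page metadata alone comes to roughly 480 MiB, which is most of a 512M crash kernel reservation, and each order-9 request is a 2 MiB contiguous block.

```python
# Rough estimate of the vmemmap cost of hot-adding the range from the quoted log.
# Assumptions (not from the thread): 4 KiB base pages and a 64-byte struct page.

PAGE_SIZE = 4096          # bytes per base page (assumed)
STRUCT_PAGE_SIZE = 64     # bytes per struct page (assumed, config dependent)


def vmemmap_bytes(start: int, end_inclusive: int) -> int:
    """Bytes of struct page metadata needed to describe [start, end_inclusive]."""
    length = end_inclusive - start + 1
    return (length // PAGE_SIZE) * STRUCT_PAGE_SIZE


def order_bytes(order: int) -> int:
    """Size in bytes of one page allocation of the given order."""
    return PAGE_SIZE << order


# Range taken from "init_memory_mapping: [mem 0x100000000-0x87fffffff]".
hot_added = vmemmap_bytes(0x100000000, 0x87FFFFFFF)
print(hot_added // (1 << 20), "MiB of struct page metadata")   # 480
print(order_bytes(9) // (1 << 20), "MiB per order-9 allocation")  # 2
```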

While the exactmap option does not imply that memory hotplug is
disabled, it does require that the kernel only consumes user-supplied
memory range information. Hence, Baoquan's approach makes sense to me.

> IMO, it makes sense to have a separate knob to disable memory hotplug
> behavior.

Regular users do not know if their systems implement ACPI memory device
objects or not. So, asking users to specify a separate option when
their systems implement ACPI memory objects is tricky, IMO.

> Also from kdump point of view, I don't want to rely on exactmap as in
> new implementation I am planning to move away from exactmap. I will
> pass new memory map in bootparams and stop passing it on command line.

I think we still need a flag that tells the kernel it can only consume
the new memory map in bootparams, and cannot obtain memory information
from the firmware.

Thanks,
-Toshi

2014-01-09 16:21:22

by Toshi Kani

Subject: Re: kdump failed because of hotplug memory adding in kdump kernel

On Thu, 2014-01-09 at 09:53 -0500, Vivek Goyal wrote:
> On Thu, Jan 09, 2014 at 02:10:26PM +0100, Rafael J. Wysocki wrote:
> > On Wednesday, January 08, 2014 05:11:48 PM Toshi Kani wrote:
> > > On Thu, 2014-01-09 at 00:07 +0100, Rafael J. Wysocki wrote:
> > > > On Wednesday, January 08, 2014 10:58:29 AM Vivek Goyal wrote:
> > > > > On Wed, Jan 08, 2014 at 11:26:43PM +0800, Baoquan wrote:
> > > > >
> > > > > [..]
> > > > > > [ 1.592222] acpi PNP0A03:03: fail to add MMCONFIG information, can't access extended PCI configuration space under this bridge.
> > > > > > [ 1.605045] PCI host bridge to bus 0000:ff
> > > > > > [ 1.609615] pci_bus 0000:ff: root bus resource [bus ff]
> > > > > > [ 1.632117] System RAM resource [mem 0x01000000-0x7bffffff] cannot be added
> > > > > > [ 1.639892] init_memory_mapping: [mem 0x100000000-0x87fffffff]
> > > > > > [ 1.717793] swapper/0: page allocation failure: order:9, mode:0x84d0
> > > > > > [ 1.724884] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.10.0-59.el7.x86_64 #1
> > > > > > [ 1.732842] Hardware name: QCI QSSC-S4R/QSSC-S4R, BIOS QSSC-S4R.QCI.01.00.S001.032520101647 03/25/2010
> > > > > > [ 1.743224] 0000000000000000 ffff8800339878c8 ffffffff815b64ad ffff880033987950
> > > > > > [ 1.751513] ffffffff8113a980 ffff88003673ab28 00000000000001fe 0000000000000001
> > > > > > [ 1.759804] ffff880000000040 ffffffff810bc28a 0000000000000000 0000000000000200
> > > > > > [ 1.768096] Call Trace: [348/1928]
> > > > > > [ 1.770834] [<ffffffff815b64ad>] dump_stack+0x19/0x1b
> > > > > > [ 1.776561] [<ffffffff8113a980>] warn_alloc_failed+0xf0/0x160
> > > > > > [ 1.783076] [<ffffffff810bc28a>] ? on_each_cpu_mask+0x2a/0x60
> > > > > > [ 1.789581] [<ffffffff8113e92f>] __alloc_pages_nodemask+0x7ff/0xa00
> > > > > > [ 1.796672] [<ffffffff815ada2c>] vmemmap_alloc_block+0x62/0xba
> > > > > > [ 1.803274] [<ffffffff815ada99>] vmemmap_alloc_block_buf+0x15/0x3b
> > > > > > [ 1.810263] [<ffffffff815ab8a6>] vmemmap_populate+0xb4/0x21b
> > > > > > [ 1.816673] [<ffffffff815adecd>] sparse_mem_map_populate+0x27/0x35
> > > > > > [ 1.823665] [<ffffffff815ad8bf>] sparse_add_one_section+0x7a/0x185
> > > > > > [ 1.830659] [<ffffffff8159b74f>] __add_pages+0xaf/0x240
> > > > > > [ 1.836588] [<ffffffff81047359>] arch_add_memory+0x59/0xd0
> > > > > > [ 1.842804] [<ffffffff8159ba89>] add_memory+0xb9/0x1b0
> > > > > > [ 1.848638] [<ffffffff8132dd2c>] acpi_memory_device_add+0x18d/0x26d
> > > > > > [ 1.855728] [<ffffffff81303b91>] acpi_bus_device_attach+0x7d/0xcd
> > > > > > [ 1.862625] [<ffffffff8131d92d>] acpi_ns_walk_namespace+0xc8/0x17f
> > > > > > [ 1.869616] [<ffffffff81303b14>] ? acpi_bus_type_and_status+0x90/0x90
> > > > > > [ 1.876896] [<ffffffff81303b14>] ? acpi_bus_type_and_status+0x90/0x90
> > > > > > [ 1.884177] [<ffffffff8131de1c>] acpi_walk_namespace+0x95/0xc5
> > > > > > [ 1.890780] [<ffffffff81304866>] acpi_bus_scan+0x8b/0x9d
> > > > > > [ 1.896805] [<ffffffff81a14a15>] acpi_scan_init+0x63/0x160
> > > > > > [ 1.903021] [<ffffffff81a14830>] acpi_init+0x25d/0x2a6
> > > > >
> > > > > So basically acpi thinks that some memory block is a hot plug memory
> > > > > and tries to add it. And that consumes lots of memory and we don't have
> > > > > that memory in second kernel.
> > > >
> > > > That's not exactly the case. What seems to happen is that there is an ACPI
> > > > memory object in the ACPI namespace and the ACPI memory hotplug driver
> > > > attempts to bind to it. That driver attempts to find removable memory blocks
> > > > associated with that object and to add them to the memory map.
> > > >
> > > > Why don't you simply append acpi=off to the kexec command line? That should
> > > > make the problem go away.
> > >
> > > Yes, that should work, but Baoquan's approach makes sense to me. When
> > > memmap=exactmap is specified, the kernel should ignore any memory
> > > information from the firmware.
> >
> > OK
> >
> > Baoquan, please modify your patch to get rid of the #ifdef CONFIG_X86 in
> > acpi_memory_hotplug_init(). For example, you can add a function returning true
> > if use_exactmap is set and false otherwise and make acpi_memory_hotplug_init()
> > call that function. Alternatively, you can define arch-independent
> > no_memory_hotplug (instead of use_exactmap) and set if for memmap=exactmap.
> >
>
> Prarit sent a patch to introduce no_memory_hotplug command line. I still
> think that memmap=exactmap does not necessarily mean that memory hotplug
> is disabled.
>
> What about mem= parameter. If somebody specifies mem=1G, should that mean
> there can not be any hotplugged memory.

Good point. Yes, I think we need to ignore ACPI memory objects in this
case as well. I suppose the use of this option is limited to specific
test purposes, so disabling memory hotplug is not a big issue here.

Thanks,
-Toshi

2014-01-09 16:25:14

by Vivek Goyal

Subject: Re: kdump failed because of hotplug memory adding in kdump kernel

On Thu, Jan 09, 2014 at 09:03:59AM -0700, Toshi Kani wrote:

[..]
> > > > > So basically acpi thinks that some memory block is a hot plug memory
> > > > > and tries to add it. And that consumes lots of memory and we don't have
> > > > > that memory in second kernel.
> > > >
> > > > That's not exactly the case. What seems to happen is that there is an ACPI
> > > > memory object in the ACPI namespace and the ACPI memory hotplug driver
> > > > attempts to bind to it. That driver attempts to find removable memory blocks
> > > > associated with that object and to add them to the memory map.
> > > >
> > > > Why don't you simply append acpi=off to the kexec command line? That should
> > > > make the problem go away.
> > >
> > > Yes, that should work, but Baoquan's approach makes sense to me. When
> > > memmap=exactmap is specified, the kernel should ignore any memory
> > > information from the firmware.
> >
> > memmap=exactmap is only for E820 map. It does not say that later memory
> > can not be hotplugged. So to me specifying exactmap does not imply that
> > memory hotplugging is disabled.
>
> There are multiple ways to describe memory range info in the firmware;
> e820, EFI memory descriptor table, and ACPI memory device objects. They
> basically provide the same info.

So ACPI memory device objects contain all the memory ranges as exported
in E820?

>
> This problem happens when the firmware implements ACPI memory device
> objects, which are necessary to support memory hotplug, but do not mean
> that the system always supports hotplug when they exist. They are
> optional objects that firmware vendors may choose to implement.

This is confusing. So even if memory hotplug is not supported, ACPI memory
device objects might be present. What's the purpose? How do they help?

If they represent the same info that the firmware provided early using a
BIOS call (E820 map), then how does the system later avoid adding the same
memory ranges?

IOW, in terms of design, what's the objective? Why create this additional
path for getting memory information?

>
> While the exactmap option does not imply that memory hotplug is
> disabled,

But Bao's approach will disable memory hotplug on exactmap.

> it does require that the kernel only consumes user-supplied
> memory range information. Hence, Baoquan's approach makes sense to me.

I am fine with this as long as memmap=exactmap is not the only way to
disable memory hotplug. I need another way too so that users who are
not using exactmap can still disable memory hotplug.

>
> > IMO, it makes sense to have a separate knob to disable memory hotplug
> > behavior.
>
> Regular users do not know if their systems implement ACPI memory device
> objects or not. So, asking users to specify a separate option when
> their systems implement ACPI memory objects is tricky, IMO.

They can always specify no_memory_hotplug, irrespective of whether the
kernel supports memory hotplug or not.

Anyway, I don't mind if memory hotplug is implicitly disabled when
memmap=exactmap or mem=X is specified. It is just a matter of figuring
out what would be the more intuitive behavior from the user's point of view.

But I do want a separate path to disable memory hotplug so that even
if I am not using memmap=exactmap or mem=X, I should be able to disable
memory hotplug.

>
> > Also from kdump point of view, I don't want to rely on exactmap as in
> > new implementation I am planning to move away from exactmap. I will
> > pass new memory map in bootparams and stop passing it on command line.
>
> I think we still need a flag that indicates the kernel can only consume
> the new memory map in bootparams, and cannot to obtain from the
> firmware.

I think creating a new command line option is simpler than creating a
new flag in bootparams which in turn disables memory hotplug. More users
can use that option. For example, if for some reason the hotplug code is
crashing, one can just disable it on the command line as a workaround
and move on.

Thanks
Vivek

2014-01-09 17:38:15

by Toshi Kani

Subject: Re: kdump failed because of hotplug memory adding in kdump kernel

On Thu, 2014-01-09 at 11:24 -0500, Vivek Goyal wrote:
> On Thu, Jan 09, 2014 at 09:03:59AM -0700, Toshi Kani wrote:
>
> [..]
> > > > > > So basically acpi thinks that some memory block is a hot plug memory
> > > > > > and tries to add it. And that consumes lots of memory and we don't have
> > > > > > that memory in second kernel.
> > > > >
> > > > > That's not exactly the case. What seems to happen is that there is an ACPI
> > > > > memory object in the ACPI namespace and the ACPI memory hotplug driver
> > > > > attempts to bind to it. That driver attempts to find removable memory blocks
> > > > > associated with that object and to add them to the memory map.
> > > > >
> > > > > Why don't you simply append acpi=off to the kexec command line? That should
> > > > > make the problem go away.
> > > >
> > > > Yes, that should work, but Baoquan's approach makes sense to me. When
> > > > memmap=exactmap is specified, the kernel should ignore any memory
> > > > information from the firmware.
> > >
> > > memmap=exactmap is only for E820 map. It does not say that later memory
> > > can not be hotplugged. So to me specifying exactmap does not imply that
> > > memory hotplugging is disabled.
> >
> > There are multiple ways to describe memory range info in the firmware;
> > e820, EFI memory descriptor table, and ACPI memory device objects. They
> > basically provide the same info.
>
> So ACPI memory device objects contain all the memory ranges as exported
> in E820?

Yes. (Some vendors might choose to implement some portion of memory
with memory objects, but I think they are special cases.)

> > This problem happens when the firmware implements ACPI memory device
> > objects, which are necessary to support memory hotplug, but do not mean
> > that the system always supports hotplug when they exist. They are
> > optional objects that firmware vendors may choose to implement.
>
> This is confusing. So even if memory hotplug is not supported, ACPI memory
> device objects might be present. What's the purpose? How do they help.

They do not help at this point, but the point is that memory objects can
be present without hotplug support. There is nothing wrong with it per
the spec.

> If they represent same info as firmware provided using a BIOS call early
> (E820 map), then how does system later avoid adding same memory ranges.

It attempts to add, but fails with -EEXIST because it is already there.
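
The behavior described here can be pictured with a toy model. This is a sketch only, not the kernel's resource code: `MemoryMap` and its `add` method are hypothetical names, and the single assumption is that registering a range which overlaps an existing one is rejected with -EEXIST, as stated above.

```python
import errno


class MemoryMap:
    """Toy registry of System RAM ranges, stored as half-open (start, end) pairs."""

    def __init__(self):
        self.ranges = []

    def add(self, start: int, size: int) -> int:
        """Register [start, start + size); return 0, or -EEXIST on any overlap."""
        end = start + size
        for s, e in self.ranges:
            if start < e and s < end:      # the new range overlaps an existing one
                return -errno.EEXIST
        self.ranges.append((start, end))
        return 0


mm = MemoryMap()
# Range already known from the boot-time memory map.
print(mm.add(0x01000000, 0x7B000000))   # 0
# An ACPI memory device object describing the same range is rejected.
print(mm.add(0x01000000, 0x7B000000) == -errno.EEXIST)   # True
# A range that was not in the boot-time map is accepted, i.e. really hot-added.
print(mm.add(0x100000000, 0x40000000))  # 0
```

In the kdump case the second kernel's memory map is deliberately small, so ranges that the first kernel already owned no longer collide and are treated as new memory.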

> IOW, in terms of design, what's the objective. Why to create this
> additional path of getting memory information.

To support memory hot-remove requests, ACPI objects need to be
initialized for the existing memory ranges beforehand.

> > While the exactmap option does not imply that memory hotplug is
> > disabled,
>
> But Bao's approach will disable memory hotplug on exactmap.

Right, but it does not seem worthwhile to add complexity to support
memory hotplug and exactmap at the same time.

> > it does require that the kernel only consumes user-supplied
> > memory range information. Hence, Baoquan's approach makes sense to me.
>
> I am fine with this as long as memmap=exactmap is not the only way to
> disable memory hotplug. I need another way too so that users who are
> not using exactmap can still disable memory hotplug.

There is a config option to enable/disable memory hotplug. You are
right that the exactmap option is not the way to disable memory hotplug.
This option requests the kernel to use user-supplied memory ranges only,
so memory hotplug will not be supported under this constraint.
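
To make "user-supplied memory ranges" concrete, here is a small userspace sketch that pulls memmap= entries out of a command line string. It is not the kernel's parser. It assumes the documented forms size@start (RAM), size#start (ACPI data) and size$start (reserved), accepts only decimal or 0x-prefixed hex numbers with an optional K/M/G suffix, and the function names are made up for this example.

```python
import re

SUFFIX = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}
KIND = {"@": "RAM", "#": "ACPI data", "$": "reserved"}


def parse_size(text: str) -> int:
    """Parse '640K', '0x1000' or '512M' into a byte count."""
    m = re.fullmatch(r"(0[xX][0-9a-fA-F]+|\d+)([KMGkmg]?)", text)
    if not m:
        raise ValueError(f"bad size: {text!r}")
    digits, suffix = m.group(1), m.group(2).upper()
    base = 16 if digits.lower().startswith("0x") else 10
    return int(digits, base) * SUFFIX[suffix]


def parse_memmap(cmdline: str):
    """Return (exactmap_seen, [(start, size, kind), ...]) for all memmap= options."""
    exact = False
    regions = []
    for token in cmdline.split():
        if not token.startswith("memmap="):
            continue
        for item in token[len("memmap="):].split(","):
            if item == "exactmap":
                exact = True
                continue
            m = re.fullmatch(r"([^@#$]+)([@#$])(.+)", item)
            if not m:
                raise ValueError(f"bad memmap entry: {item!r}")
            size, start = parse_size(m.group(1)), parse_size(m.group(3))
            regions.append((start, size, KIND[m.group(2)]))
    return exact, regions


exact, regions = parse_memmap("ro memmap=exactmap memmap=640K@0K memmap=0x1000#0x7f000000")
print(exact)
for start, size, kind in regions:
    print(hex(start), hex(size), kind)
```

With exactmap set, the list returned here is meant to be the whole memory map, which is why adding further ranges from ACPI memory device objects contradicts the option.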

> > > IMO, it makes sense to have a separate knob to disable memory hotplug
> > > behavior.
> >
> > Regular users do not know if their systems implement ACPI memory device
> > objects or not. So, asking users to specify a separate option when
> > their systems implement ACPI memory objects is tricky, IMO.
>
> They can always specify no_memory_hotplug, irrespective of the fact that
> kernel supports memory hotplug or not.
>
> Anyway, I don't mind if one implicitly disables memory hotplug if
> memmap=exactmap or mem=X is specified. It is just a matter of figuring
> how what should be a more intutive behavior from user's point of view.

Since memory hotplug won't work under the constraint of the exactmap
option, it seems natural to disable it.

> But I do want a separate path to disable memory hotplug so that even
> if I am not using memmap=exactmap or mem=X, I should be able to disable
> memory hotplug.

I think this is a separate topic.

> > > Also from kdump point of view, I don't want to rely on exactmap as in
> > > new implementation I am planning to move away from exactmap. I will
> > > pass new memory map in bootparams and stop passing it on command line.
> >
> > I think we still need a flag that indicates the kernel can only consume
> > the new memory map in bootparams, and cannot to obtain from the
> > firmware.
>
> I think creating a new command line option is simpler as compared to
> creating a new flag in bootparam which in turn disables memory hotplug.
> More users can use that option. For example, if for some reason hotplug
> code is crashing, one can just disable it on command line as work around
> and move on.

I do not have a strong opinion about having such an option. However, I
think it is more user friendly to keep the exactmap option working on
its own on any platform.

Thanks,
-Toshi

2014-01-09 18:23:51

by Vivek Goyal

Subject: Re: kdump failed because of hotplug memory adding in kdump kernel

On Thu, Jan 09, 2014 at 10:24:25AM -0700, Toshi Kani wrote:

[..]
> > I think creating a new command line option is simpler as compared to
> > creating a new flag in bootparam which in turn disables memory hotplug.
> > More users can use that option. For example, if for some reason hotplug
> > code is crashing, one can just disable it on command line as work around
> > and move on.
>
> I do not have a strong opinion about having such option. However, I
> think it is more user friendly to keep the exactmap option works alone
> on any platforms.

I think we should create an internal variable which disables memory
hotplug, set that variable based on memmap=exactmap and mem=X, and also
provide a way to disable memory hotplug directly using a command line
option.
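
The proposal amounts to one flag with three possible triggers. A minimal userspace model of that decision follows. It is an illustration of the idea only, not kernel code: the function name is hypothetical, and `no_memory_hotplug` is simply the option name mentioned in this thread, not a confirmed kernel parameter.

```python
def memory_hotplug_disabled(cmdline: str) -> bool:
    """Model of the proposed internal flag, derived from a kernel command line."""
    for token in cmdline.split():
        # Explicit knob proposed in this thread (name is not final).
        if token == "no_memory_hotplug":
            return True
        # mem=X limits usable memory, so hot-added memory is ignored as well.
        if token.startswith("mem="):
            return True
        # memmap=exactmap asks for the user-supplied memory map only.
        if token.startswith("memmap="):
            if "exactmap" in token[len("memmap="):].split(","):
                return True
    return False


print(memory_hotplug_disabled("root=/dev/sda1 memmap=exactmap memmap=640K@0K"))
print(memory_hotplug_disabled("root=/dev/sda1 quiet"))
```

Keeping the explicit option separate from the implicit triggers is what lets a loader that no longer passes memmap=exactmap still turn hotplug off.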

Current kexec-tools can use memmap=exactmap and be happy. I am writing
a new kexec syscall and will not be using memmap=exactmap and would need
to use that command line option to disable memory hotplug behavior.

Thanks
Vivek

2014-01-09 18:40:34

by Toshi Kani

Subject: Re: kdump failed because of hotplug memory adding in kdump kernel

On Thu, 2014-01-09 at 13:23 -0500, Vivek Goyal wrote:
> On Thu, Jan 09, 2014 at 10:24:25AM -0700, Toshi Kani wrote:
>
> [..]
> > > I think creating a new command line option is simpler as compared to
> > > creating a new flag in bootparam which in turn disables memory hotplug.
> > > More users can use that option. For example, if for some reason hotplug
> > > code is crashing, one can just disable it on command line as work around
> > > and move on.
> >
> > I do not have a strong opinion about having such option. However, I
> > think it is more user friendly to keep the exactmap option works alone
> > on any platforms.
>
> I think we should create internally a variable which will disable memory
> hotplug. And set that variable based on memmap=exactmap, mem=X and also
> provide a way to disable memory hotplug directly using command line
> option.
>
> Current kexec-tools can use memmap=exactmap and be happy. I am writing
> a new kexec syscall and will not be using memmap=exactmap and would need
> to use that command line option to disable memory hotplug behavior.

Sounds good to me.

Thanks,
-Toshi

2014-01-09 21:27:56

by Vivek Goyal

Subject: Re: kdump failed because of hotplug memory adding in kdump kernel

On Thu, Jan 09, 2014 at 11:34:30AM -0700, Toshi Kani wrote:
> On Thu, 2014-01-09 at 13:23 -0500, Vivek Goyal wrote:
> > On Thu, Jan 09, 2014 at 10:24:25AM -0700, Toshi Kani wrote:
> >
> > [..]
> > > > I think creating a new command line option is simpler as compared to
> > > > creating a new flag in bootparam which in turn disables memory hotplug.
> > > > More users can use that option. For example, if for some reason hotplug
> > > > code is crashing, one can just disable it on command line as work around
> > > > and move on.
> > >
> > > I do not have a strong opinion about having such option. However, I
> > > think it is more user friendly to keep the exactmap option works alone
> > > on any platforms.
> >
> > I think we should create internally a variable which will disable memory
> > hotplug. And set that variable based on memmap=exactmap, mem=X and also
> > provide a way to disable memory hotplug directly using command line
> > option.
> >
> > Current kexec-tools can use memmap=exactmap and be happy. I am writing
> > a new kexec syscall and will not be using memmap=exactmap and would need
> > to use that command line option to disable memory hotplug behavior.
>
> Sounds good to me.

Nobody responded to my other question, so I will ask it again.

Assume we have disabled hotplug memory in the second kernel. The first
kernel saw hotplug memory, and assume the crash kernel reserved region came
from there. We will pass this memory in bootparams to the second kernel and
it will show up in the E820 map. It should still be accessible in the
second kernel, is that right?

Or is there some dependency on ACPI doing some magic before this memory
range is available in the second kernel?

Thanks
Vivek

2014-01-09 22:02:26

by Toshi Kani

Subject: Re: kdump failed because of hotplug memory adding in kdump kernel

On Thu, 2014-01-09 at 16:27 -0500, Vivek Goyal wrote:
> On Thu, Jan 09, 2014 at 11:34:30AM -0700, Toshi Kani wrote:
> > On Thu, 2014-01-09 at 13:23 -0500, Vivek Goyal wrote:
> > > On Thu, Jan 09, 2014 at 10:24:25AM -0700, Toshi Kani wrote:
> > >
> > > [..]
> > > > > I think creating a new command line option is simpler as compared to
> > > > > creating a new flag in bootparam which in turn disables memory hotplug.
> > > > > More users can use that option. For example, if for some reason hotplug
> > > > > code is crashing, one can just disable it on command line as work around
> > > > > and move on.
> > > >
> > > > I do not have a strong opinion about having such option. However, I
> > > > think it is more user friendly to keep the exactmap option works alone
> > > > on any platforms.
> > >
> > > I think we should create internally a variable which will disable memory
> > > hotplug. And set that variable based on memmap=exactmap, mem=X and also
> > > provide a way to disable memory hotplug directly using command line
> > > option.
> > >
> > > Current kexec-tools can use memmap=exactmap and be happy. I am writing
> > > a new kexec syscall and will not be using memmap=exactmap and would need
> > > to use that command line option to disable memory hotplug behavior.
> >
> > Sounds good to me.
>
> Nobody responded to my other question, so I would ask it again.
>
> Assume we have disabled hotplug memory in second kernel. First kernel
> saw hotplug memory and assume crash kernel reserved region came from
> there. We will pass this memory in bootparams to second kernel and it
> will show up in E820 map. It should still be accessible in second kernel,
> is that right?

Yes.

> Or there is some dependency on ACPI doing some magic before this memory
> range is available in second kernel?

No. The 1st kernel reserves the crash kernel region, which cannot be
hot-deleted. So, this region continues to be accessible by the 2nd
kernel without any operation.

I am more curious to know how makedumpfile decides what memory ranges to
dump. The 1st kernel may have performed memory hot-add / delete
operations before a crash, so it needs to know the valid physical
address range at the time of crash, and may not rely on the E820 map
from BIOS (which is stale). Am I right to assume that makedumpfile gets
it from the page tables of the 1st kernel?

Thanks,
-Toshi

2014-01-10 01:26:25

by Rafael J. Wysocki

Subject: Re: kdump failed because of hotplug memory adding in kdump kernel

On Thursday, January 09, 2014 11:34:30 AM Toshi Kani wrote:
> On Thu, 2014-01-09 at 13:23 -0500, Vivek Goyal wrote:
> > On Thu, Jan 09, 2014 at 10:24:25AM -0700, Toshi Kani wrote:
> >
> > [..]
> > > > I think creating a new command line option is simpler as compared to
> > > > creating a new flag in bootparam which in turn disables memory hotplug.
> > > > More users can use that option. For example, if for some reason hotplug
> > > > code is crashing, one can just disable it on command line as work around
> > > > and move on.
> > >
> > > I do not have a strong opinion about having such option. However, I
> > > think it is more user friendly to keep the exactmap option works alone
> > > on any platforms.
> >
> > I think we should create internally a variable which will disable memory
> > hotplug. And set that variable based on memmap=exactmap, mem=X and also
> > provide a way to disable memory hotplug directly using command line
> > option.
> >
> > Current kexec-tools can use memmap=exactmap and be happy. I am writing
> > a new kexec syscall and will not be using memmap=exactmap and would need
> > to use that command line option to disable memory hotplug behavior.
>
> Sounds good to me.

Agreed.

Thanks!

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2014-01-10 07:12:58

by Baoquan He

Subject: Re: kdump failed because of hotplug memory adding in kdump kernel

On 01/09/14 at 02:56pm, Toshi Kani wrote:
> On Thu, 2014-01-09 at 16:27 -0500, Vivek Goyal wrote:
> > On Thu, Jan 09, 2014 at 11:34:30AM -0700, Toshi Kani wrote:
> > > On Thu, 2014-01-09 at 13:23 -0500, Vivek Goyal wrote:
> > > > On Thu, Jan 09, 2014 at 10:24:25AM -0700, Toshi Kani wrote:
> > > >
> > > > [..]
> > > > > > I think creating a new command line option is simpler as compared to
> > > > > > creating a new flag in bootparam which in turn disables memory hotplug.
> > > > > > More users can use that option. For example, if for some reason hotplug
> > > > > > code is crashing, one can just disable it on command line as work around
> > > > > > and move on.
> > > > >
> > > > > I do not have a strong opinion about having such option. However, I
> > > > > think it is more user friendly to keep the exactmap option works alone
> > > > > on any platforms.
> > > >
> > > > I think we should create internally a variable which will disable memory
> > > > hotplug. And set that variable based on memmap=exactmap, mem=X and also
> > > > provide a way to disable memory hotplug directly using command line
> > > > option.
> > > >
> > > > Current kexec-tools can use memmap=exactmap and be happy. I am writing
> > > > a new kexec syscall and will not be using memmap=exactmap and would need
> > > > to use that command line option to disable memory hotplug behavior.
> > >
> > > Sounds good to me.
> >
> > Nobody responded to my other question, so I would ask it again.
> >
> > Assume we have disabled hotplug memory in second kernel. First kernel
> > saw hotplug memory and assume crash kernel reserved region came from
> > there. We will pass this memory in bootparams to second kernel and it
> > will show up in E820 map. It should still be accessible in second kernel,
> > is that right?
>
> Yes.
>
> > Or there is some dependency on ACPI doing some magic before this memory
> > range is available in second kernel?
>
> No. The 1st kernel reserves the crash kernel region, which cannot be
> hot-deleted. So, this region continues to be accessible by the 2nd
> kernel without any operation.

What I understand now is that if several memory sections are reserved for
the crash kernel, then in the 2nd kernel they are just like normal memory.
In the ACPI namespace (ns) object tree, they are not treated as hotplug
memory.

Any other hotplug memory, which is not reserved for the 2nd kernel, can be
parsed and needs to be added as hotplug memory, and is put into the movable
zone.

Am I right?

The other question: the e820 reservation is done earlier than ACPI
initialization, because acpi_early_init() is invoked very late in
start_kernel(). Does that mean that at the very beginning all memory is in
e820, and later, when acpi_early_init() is called and hotplug memory is
detected, it will be moved to a different place or need to be marked with a
specific flag?



>
> I am more curious to know how makedumpfile decides what memory ranges to
> dump. The 1st kernel may have performed memory hot-add / delete
> operations before a crash, so it needs to know the valid physical
> address range at the time of crash, and may not rely on the E820 map
> from BIOS (which is stale). Am I right to assume that makedumpfile gets
> it from the page tables of the 1st kernel?

makedumpfile only performs the dump; which memory ranges to dump is
decided in the 1st kernel by kexec-tools. When kexec-tools runs in the
1st kernel, it finds all System RAM regions, excluding the regions
reserved for the kdump kernel, and builds a logical ELF file. Each load
segment is one of these System RAM regions, and its address and length
are written into the program header.

makedumpfile then just reads this ELF file, reads all of those regions,
and dumps them.
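
The range selection described above can be sketched roughly as follows.
This is only an illustration, not kexec-tools code: it assumes a
/proc/iomem-style listing as input, and the helper names (parse_iomem,
subtract, load_segments) and the sample addresses are made up.

```python
# Sketch only: input format, helper names and addresses are illustrative,
# not taken from kexec-tools.

def parse_iomem(text):
    """Return (system_ram, crash_kernel) lists of inclusive (start, end) ranges."""
    ram, crash = [], []
    for line in text.splitlines():
        if " : " not in line:
            continue
        span, name = line.strip().split(" : ", 1)
        start, end = (int(x, 16) for x in span.split("-"))
        if name == "System RAM":
            ram.append((start, end))
        elif name == "Crash kernel":
            crash.append((start, end))
    return ram, crash


def subtract(ranges, holes):
    """Cut every hole out of every range; all bounds are inclusive."""
    out = []
    for start, end in ranges:
        pieces = [(start, end)]
        for h_start, h_end in holes:
            remaining = []
            for s, e in pieces:
                if h_end < s or h_start > e:
                    remaining.append((s, e))  # no overlap, keep as is
                    continue
                if s < h_start:
                    remaining.append((s, h_start - 1))  # part below the hole
                if h_end < e:
                    remaining.append((h_end + 1, e))  # part above the hole
            pieces = remaining
        out.extend(pieces)
    return out


def load_segments(text):
    """One (addr, length) pair per dumpable region, as a program header records."""
    ram, crash = parse_iomem(text)
    return [(s, e - s + 1) for s, e in subtract(ram, crash)]


SAMPLE = """\
00001000-0009ffff : System RAM
00100000-7fffffff : System RAM
  20000000-3fffffff : Crash kernel
100000000-17fffffff : System RAM
"""

for addr, length in load_segments(SAMPLE):
    print(hex(addr), hex(length))
```

The System RAM region that contains the crash kernel reservation is split
in two, so the reserved area never appears as a load segment.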

If a hotplug memory is removed after kexec-tools has run and before a
crash, udev detects this and triggers a kdump restart. kexec-tools is
executed again and the System RAM region information is stored afresh.
The logical file header is passed to the 2nd kernel.


>
> Thanks,
> -Toshi
>
>
> _______________________________________________
> kexec mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/kexec

2014-01-10 08:07:47

by Yasuaki Ishimatsu

[permalink] [raw]
Subject: Re: kdump failed because of hotplug memory adding in kdump kernel

(2014/01/10 16:11), Baoquan wrote:
> On 01/09/14 at 02:56pm, Toshi Kani wrote:
>> On Thu, 2014-01-09 at 16:27 -0500, Vivek Goyal wrote:
>>> On Thu, Jan 09, 2014 at 11:34:30AM -0700, Toshi Kani wrote:
>>>> On Thu, 2014-01-09 at 13:23 -0500, Vivek Goyal wrote:
>>>>> On Thu, Jan 09, 2014 at 10:24:25AM -0700, Toshi Kani wrote:
>>>>>
>>>>> [..]
>>>>>>> I think creating a new command line option is simpler as compared to
>>>>>>> creating a new flag in bootparam which in turn disables memory hotplug.
>>>>>>> More users can use that option. For example, if for some reason hotplug
>>>>>>> code is crashing, one can just disable it on command line as work around
>>>>>>> and move on.
>>>>>>
>>>>>> I do not have a strong opinion about having such option. However, I
>>>>>> think it is more user friendly to keep the exactmap option works alone
>>>>>> on any platforms.
>>>>>
>>>>> I think we should create internally a variable which will disable memory
>>>>> hotplug. And set that variable based on memmap=exactmap, mem=X and also
>>>>> provide a way to disable memory hotplug directly using command line
>>>>> option.
>>>>>
>>>>> Current kexec-tools can use memmap=exactmap and be happy. I am writing
>>>>> a new kexec syscall and will not be using memmap=exactmap and would need
>>>>> to use that command line option to disable memory hotplug behavior.
>>>>
>>>> Sounds good to me.
>>>
>>> Nobody responded to my other question, so I would ask it again.
>>>
>>> Assume we have disabled hotplug memory in second kernel. First kernel
>>> saw hotplug memory and assume crash kernel reserved region came from
>>> there. We will pass this memory in bootparams to second kernel and it
>>> will show up in E820 map. It should still be accessible in second kernel,
>>> is that right?
>>
>> Yes.
>>
>>> Or there is some dependency on ACPI doing some magic before this memory
>>> range is available in second kernel?
>>
>> No. The 1st kernel reserves the crash kernel region, which cannot be
>> hot-deleted. So, this region continues to be accessible by the 2nd
>> kernel without any operation.
>

If my understanding is correct:

> Now what I understand is if a several memsection is reserved for
> crashkernel, then in 2nd kernel, they are just like normal memory.

correct.

> In ns
> object tree, they are not treated as hotplug memory.

Wrong.
They are treated as hotplug memory. But the memory cannot be hot-removed
because it contains kernel memory.

> Otherwise, any hotplug memory which is not reserved for 2nd kernel can
> be parsed and need be added as hotplug memory, and add them into movable
> zone.

Wrong.
The memory is added to the normal zone and it is offline.

>
> Am I right?
>

> The other question, e820 reserve is done earlier than acpi
> initialization, because acpi_early_init() invocation is very late in
> start_kernel(). Does that means at the very beginning all memorys are in
> e820, later when acpi_early_init is called, hotplug memory is detected,
> they will be moved to different place or need be marked with a specific
> flag?

No.

Thanks,
Yasuaki Ishimatsu

>
>
>
>>
>> I am more curious to know how makedumpfile decides what memory ranges to
>> dump. The 1st kernel may have performed memory hot-add / delete
>> operations before a crash, so it needs to know the valid physical
>> address range at the time of crash, and may not rely on the E820 map
>> from BIOS (which is stale). Am I right to assume that makedumpfile gets
>> it from the page tables of the 1st kernel?
>
> makedumpfile just do the dump, what memory ranges to dump is decided in
> 1st kernel by kexec-tools. In 1st kernel, if kexec-tools executed, it
> will find all System Ram memorys which exclude the reserved regions for
> kdump kernel, then build a logical elf file, each load segment is one of
> these System Ram memory regions, its addr and length is written into the
> program header.
>
> Then makedumpfile just read this elf file, and read all of them and
> dump.
>
> If after kexec-tools execution and before crash, a hotplug memory is
> removed, udev will check this and trigger a kdump restart, kexec-tools
> is executed again, System Ram region information are stored. The logical
> file header will be passed to 2nd kernel.
>
>
>>
>> Thanks,
>> -Toshi
>>
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2014-01-10 09:16:31

by Baoquan He

[permalink] [raw]
Subject: Re: kdump failed because of hotplug memory adding in kdump kernel


> >In ns
> >object tree, they are not treated as hotplug memory.
>
> wrong.
> They are treated as hotplug memory. But the memory cannot hot removed
> because the memory has kernel memory.
>
> >Otherwise, any hotplug memory which is not reserved for 2nd kernel can
> >be parsed and need be added as hotplug memory, and add them into movable
> >zone.
>
> wrong.
> The memory is allocated as normal zone and it is offline.

Hi,

Thanks for answering.

I am confused. The fact is that in the 1st kernel, memory is reserved for
the crashkernel and passed to the 2nd kernel via exactmap. Then, in the
2nd kernel, the reserved memory regions are added into e820. Later,
hotplug memory still triggers add_memory and causes the bug I reported.


>
> >
> >Am I right?
> >
>
> >The other question, e820 reserve is done earlier than acpi
> >initialization, because acpi_early_init() invocation is very late in
> >start_kernel(). Does that means at the very beginning all memorys are in
> >e820, later when acpi_early_init is called, hotplug memory is detected,
> >they will be moved to different place or need be marked with a specific
> >flag?
>
> No.
>
> Thanks,
> Yasuaki Ishimatsu
>

2014-01-10 09:36:17

by Yasuaki Ishimatsu

[permalink] [raw]
Subject: Re: kdump failed because of hotplug memory adding in kdump kernel

(2014/01/10 18:14), Baoquan wrote:
>
>>> In ns
>>> object tree, they are not treated as hotplug memory.
>>
>> wrong.
>> They are treated as hotplug memory. But the memory cannot hot removed
>> because the memory has kernel memory.
>>
>>> Otherwise, any hotplug memory which is not reserved for 2nd kernel can
>>> be parsed and need be added as hotplug memory, and add them into movable
>>> zone.
>>
>> wrong.
>> The memory is allocated as normal zone and it is offline.
>
> Hi,
>
> Thanks for answering.
>


> I am confused. Now the fact is in 1st kernel memory is reserved for
> crashkernel and passed to 2nd kernel by exactmap. Then in 2nd kernel,
> reserved memory regions are added into e820. Later hotplug memory still
> trigger add_memory, and cause bug I reported.

Does the issue still occur if you apply Prarit's patch below to your
kernel and add the no_memory_hotplug boot option to the 2nd kernel?

http://marc.info/?l=linux-acpi&m=138922019607796&w=2
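
For reference, the decision proposed earlier in this thread (disable
memory hotplug when memmap=exactmap or mem=X is given, or when it is
requested directly) could look roughly like the sketch below. This is
not the kernel's implementation or the patch itself: the function name
is hypothetical, and it assumes each option appears as its own
space-separated argument on the command line.

```python
# Sketch only: memory_hotplug_disabled() is a hypothetical helper that
# mirrors the proposal in this thread, not actual kernel code.

def memory_hotplug_disabled(cmdline):
    """Return True if the boot command line should turn memory hotplug off."""
    for arg in cmdline.split():
        if arg == "no_memory_hotplug":  # explicit request
            return True
        if arg == "memmap=exactmap":    # caller supplies the exact memory map
            return True
        if arg.startswith("mem="):      # caller limits usable memory
            return True
    return False


print(memory_hotplug_disabled("root=/dev/sda1 memmap=exactmap memmap=640K@0K"))
print(memory_hotplug_disabled("root=/dev/sda1 quiet"))
```

With one internal flag, current kexec-tools would get the behavior through
memmap=exactmap, and a loader that does not use exactmap could pass the
explicit option instead.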

Thanks,
Yasuaki Ishimatsu

>
>
>>
>>>
>>> Am I right?
>>>
>>
>>> The other question, e820 reserve is done earlier than acpi
>>> initialization, because acpi_early_init() invocation is very late in
>>> start_kernel(). Does that means at the very beginning all memorys are in
>>> e820, later when acpi_early_init is called, hotplug memory is detected,
>>> they will be moved to different place or need be marked with a specific
>>> flag?
>>
>> No.
>>
>> Thanks,
>> Yasuaki Ishimatsu
>>
>

2014-01-10 10:29:16

by Baoquan He

[permalink] [raw]
Subject: Re: kdump failed because of hotplug memory adding in kdump kernel

On 01/10/14 at 06:35pm, Yasuaki Ishimatsu wrote:
> (2014/01/10 18:14), Baoquan wrote:
> >
> > >In ns
> >>>object tree, they are not treated as hotplug memory.
> >>
> >>wrong.
> >>They are treated as hotplug memory. But the memory cannot hot removed
> >>because the memory has kernel memory.
> >>
> >>>Otherwise, any hotplug memory which is not reserved for 2nd kernel can
> >>>be parsed and need be added as hotplug memory, and add them into movable
> >>>zone.
> >>
> >>wrong.
> >>The memory is allocated as normal zone and it is offline.
> >
> >Hi,
> >
> >Thanks for answering.
> >
>
>
> >I am confused. Now the fact is in 1st kernel memory is reserved for
> >crashkernel and passed to 2nd kernel by exactmap. Then in 2nd kernel,
> >reserved memory regions are added into e820. Later hotplug memory still
> >trigger add_memory, and cause bug I reported.
>
> Does the issue occur even if you apply the following Prarit's patch to
> your kernel and add no_memory_hotplug boot option to 2nd kernel?
>
> http://marc.info/?l=linux-acpi&m=138922019607796&w=2

This issue is the same as Prarit's; he posted the formal patch.

But there are still some questions we would like answered.

>
> Thanks,
> Yasuaki Ishimatsu
>
> >
> >
> >>
> >>>
> >>>Am I right?
> >>>
> >>
> >>>The other question, e820 reserve is done earlier than acpi
> >>>initialization, because acpi_early_init() invocation is very late in
> >>>start_kernel(). Does that means at the very beginning all memorys are in
> >>>e820, later when acpi_early_init is called, hotplug memory is detected,
> >>>they will be moved to different place or need be marked with a specific
> >>>flag?
> >>
> >>No.
> >>
> >>Thanks,
> >>Yasuaki Ishimatsu
> >>
> >
>
>

2014-01-10 15:25:25

by Toshi Kani

[permalink] [raw]
Subject: Re: kdump failed because of hotplug memory adding in kdump kernel

On Fri, 2014-01-10 at 17:14 +0800, Baoquan wrote:
:
> >
> > >Otherwise, any hotplug memory which is not reserved for 2nd kernel can
> > >be parsed and need be added as hotplug memory, and add them into movable
> > >zone.
> >
> > wrong.
> > The memory is allocated as normal zone and it is offline.

This is "logical" offline, which means that the memory is accessible,
but the 1st kernel does not use it.

> Hi,
>
> Thanks for answering.
>
> I am confused. Now the fact is in 1st kernel memory is reserved for
> crashkernel and passed to 2nd kernel by exactmap. Then in 2nd kernel,
> reserved memory regions are added into e820.

Right. And this memory is accessible.

> Later hotplug memory still
> trigger add_memory, and cause bug I reported.

This is because, without your change, the 2nd kernel gets all memory
ranges from ACPI. This is bad: not only does it cause the panic you
reported, but it can also overwrite the 1st kernel's memory.
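
To illustrate the overwrite risk, here is a rough sketch that flags
ACPI-reported ranges which are not fully inside the memory the 2nd kernel
was handed via exactmap. The helper names and all addresses are
hypothetical; this is not kernel code.

```python
# Sketch only: helper names and addresses are made up for illustration.

def contains(outer, inner):
    """True if the (start, size) range inner lies entirely within outer."""
    o_start, o_size = outer
    i_start, i_size = inner
    return o_start <= i_start and i_start + i_size <= o_start + o_size


def unsafe_hot_adds(acpi_ranges, exactmap_ranges):
    """ACPI memory device ranges not fully covered by any exactmap entry."""
    return [r for r in acpi_ranges
            if not any(contains(m, r) for m in exactmap_ranges)]


# The 2nd kernel was given 512M at the 512M mark (hypothetical numbers).
exactmap = [(0x20000000, 0x20000000)]
# ACPI memory devices describe the whole machine as the 1st kernel saw it.
acpi = [(0x0, 0x80000000), (0x100000000, 0x80000000)]

for start, size in unsafe_hot_adds(acpi, exactmap):
    print(hex(start), hex(size))
```

Every range printed here would be hot-added by the 2nd kernel even though
it still holds the 1st kernel's data.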

Thanks,
-Toshi

2014-01-10 16:02:17

by Toshi Kani

[permalink] [raw]
Subject: Re: kdump failed because of hotplug memory adding in kdump kernel

On Fri, 2014-01-10 at 15:11 +0800, Baoquan wrote:
> On 01/09/14 at 02:56pm, Toshi Kani wrote:
:
> > I am more curious to know how makedumpfile decides what memory ranges to
> > dump. The 1st kernel may have performed memory hot-add / delete
> > operations before a crash, so it needs to know the valid physical
> > address range at the time of crash, and may not rely on the E820 map
> > from BIOS (which is stale). Am I right to assume that makedumpfile gets
> > it from the page tables of the 1st kernel?
>
> makedumpfile just do the dump, what memory ranges to dump is decided in
> 1st kernel by kexec-tools. In 1st kernel, if kexec-tools executed, it
> will find all System Ram memorys which exclude the reserved regions for
> kdump kernel, then build a logical elf file, each load segment is one of
> these System Ram memory regions, its addr and length is written into the
> program header.
>
> Then makedumpfile just read this elf file, and read all of them and
> dump.
>
> If after kexec-tools execution and before crash, a hotplug memory is
> removed, udev will check this and trigger a kdump restart, kexec-tools
> is executed again, System Ram region information are stored. The logical
> file header will be passed to 2nd kernel.

Oh, that's how it works. Thanks for the explanation! In the hot-delete
case, ideally, the ELF file should be updated after a memory region is
taken offline but before it is ejected. But coordinating such a sequence
with user space is difficult and fragile. So, the current scheme sounds
good to me.

Thanks,
-Toshi