This patch set is to allow kdump 2nd kernel to wake up multiple CPUs
even if 1st kernel crashs on some AP, a continueing work from:
[PATCH v3 0/2] x86, apic, kdump: Disable BSP if boot cpu is AP
https://lkml.org/lkml/2013/10/16/300.
In this version, basic design has changed. Now users need to figure
out initial APIC ID of BSP in the 1st kernel and configures kernel
parameter for the 2nd kernel manually using disable_cpu_apic kernel
parameter to be newly introduced in this patch set. This design is
more flexible than the previous version in that we no longer have to
rely on ACPI/MP table to get initial APIC ID of BSP.
Sorry, this patch set have not include in-source documentation
requested by Borislav Petkov yet, but I'll post it later separately,
which would be better to focus on documentation reviewing.
ChangeLog
v3 => v4)
- Rebased on top of v3.12-rc6
- Basic design has been changed. Now users need to figure out initial
APIC ID of BSP in the 1st kernel and configures kernel parameter for
the 2nd kernel manually using disable_cpu_apic kernel parameter to
be newly introduced in this patch set. This design is more flexible
than the previous version in that we no longer have to rely on
ACPI/MP table to get initial APIC ID of BSP.
v2 => v3)
- Change default value of boot_cpu_is_bsp to true.
- Before executing rdmsr(MSR_IA32_APICBASE), check if the number of
processor family is larger than or equal to 6 in order to avoid
invalid opcode exception on processors where MSR_IA32_APICBASE is
not supported.
v1 => v2)
- Rebased on top of v3.12-rc5.
- Fix linking time error of boot_cpu_is_bsp_init() in case of
CONFIG_LOCAL_APIC disabled by adding empty static inline function
instead.
- Fix missing feature check by means of cpu_has_apic macro in
boot_cpu_is_bsp_init() before calling rdmsr_safe(MSR_IA32_APICBASE).
NOTE: I've checked local apic-present case only; I don't have any
x86 processor without local apic.
- Add __init annotation to boot_cpu_is_bsp_init().
Test
- built with and without CONFIG_LOCAL_APIC
- tested x86_64 in case of acpi and MP table
---
HATAYAMA Daisuke (3):
x86, apic: Don't count the CPU with BP flag from MP table as booting-up CPU
x86, apic: Add disable_cpu_apicid kernel parameter
Documentation, x86, apic, kexec: Add disable_cpu_apicid kernel parameter
Documentation/kernel-parameters.txt | 9 +++++++++
arch/x86/kernel/apic/apic.c | 29 +++++++++++++++++++++++++++++
arch/x86/kernel/mpparse.c | 1 -
3 files changed, 38 insertions(+), 1 deletion(-)
--
Thanks.
HATAYAMA, Daisuke
If crash occurs on some AP, then kdump 2nd kernel is booted up on the
AP. Therefore, it is not always correct that the CPU that is currently
booting up the kernel is BSP. It's wrong to reflect BSP information in
MP table as for the current booting up CPU.
Also, boot_cpu_physical_apicid has already been initialized before
reaching here, for example, in register_lapic_address().
This is a preparation for next patch that will introduce a new kernel
parameter to disabls specified CPU where boot_cpu_physical_apicid
needs to have apicid for the currently booting up CPU to identify it
to avoid falsely disabling it.
Signed-off-by: HATAYAMA Daisuke <[email protected]>
---
arch/x86/kernel/mpparse.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/x86/kernel/mpparse.c b/arch/x86/kernel/mpparse.c
index d2b5648..969bb9f 100644
--- a/arch/x86/kernel/mpparse.c
+++ b/arch/x86/kernel/mpparse.c
@@ -64,7 +64,6 @@ static void __init MP_processor_info(struct mpc_cpu *m)
if (m->cpuflag & CPU_BOOTPROCESSOR) {
bootup_cpu = " (Bootup-CPU)";
- boot_cpu_physical_apicid = m->apicid;
}
printk(KERN_INFO "Processor #%d%s\n", m->apicid, bootup_cpu);
Add disable_cpu_apicid kernel parameter. To use this kernel parameter,
specify an initial APIC ID of the corresponding CPU you want to
disable.
This is mostly used for the kdump 2nd kernel to disable BSP to wake up
multiple CPUs without causing system reset or hang due to sending INIT
from AP to BSP.
Kdump users first figure out initial APIC ID of the BSP, CPU0 in the
1st kernel, for example from /proc/cpuinfo and then set up this kernel
parameter for the 2nd kernel using the obtained APIC ID.
This design is more flexible than disabling BSP in kernel boot time
automatically in that in kernel boot time we have no choice but
referring to ACPI/MP table to obtain initial APIC ID for BSP, meaning
that the method is not applicable to the systems without such BIOS
tables.
This is designed based on the assumption that users get initial APIC
ID of the BSP in still healthy state and so BSP is uniquely kept in
CPU0; so through this kernel parameter, only one initial APIC ID can
be specified.
Signed-off-by: HATAYAMA Daisuke <[email protected]>
---
arch/x86/kernel/apic/apic.c | 29 +++++++++++++++++++++++++++++
1 file changed, 29 insertions(+)
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index a7eb82d..8cc4180 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -74,6 +74,13 @@ unsigned int max_physical_apicid;
physid_mask_t phys_cpu_present_map;
/*
+ * Processor to be disabled specified by kernel parameter
+ * disable_cpu_apicid=<int>, mostly used for the kdump 2nd kernel to
+ * avoid undefined behaviour caused by sending INIT from AP to BSP.
+ */
+unsigned int disabled_cpu_apicid = BAD_APICID;
+
+/*
* Map cpu index to physical APIC ID
*/
DEFINE_EARLY_PER_CPU_READ_MOSTLY(u16, x86_cpu_to_apicid, BAD_APICID);
@@ -2113,6 +2120,19 @@ void generic_processor_info(int apicid, int version)
bool boot_cpu_detected = physid_isset(boot_cpu_physical_apicid,
phys_cpu_present_map);
+ if (disabled_cpu_apicid != BAD_APICID &&
+ disabled_cpu_apicid != boot_cpu_physical_apicid &&
+ disabled_cpu_apicid == apicid) {
+ int thiscpu = num_processors + disabled_cpus;
+
+ pr_warning("ACPI: Disable specified CPU."
+ " Processor %d/0x%x ignored.\n",
+ thiscpu, apicid);
+
+ disabled_cpus++;
+ return;
+ }
+
/*
* If boot cpu has not been detected yet, then only allow upto
* nr_cpu_ids - 1 processors and keep one slot free for boot cpu
@@ -2589,3 +2609,12 @@ static int __init lapic_insert_resource(void)
* that is using request_resource
*/
late_initcall(lapic_insert_resource);
+
+static int __init apic_set_disabled_cpu_apicid(char *arg)
+{
+ if (!arg || !get_option(&arg, &disabled_cpu_apicid))
+ return -EINVAL;
+
+ return 0;
+}
+early_param("disable_cpu_apicid", apic_set_disabled_cpu_apicid);
Add disable_cpu_apicid kernel parameter to disable the CPU with the
specified number of initial APIC ID, mostly used for the kdump 2nd
kernel to disable BSP to wake up multiple CPUs without causing system
reset or hang due to sending INIT from AP to BSP.
Signed-off-by: HATAYAMA Daisuke <[email protected]>
---
Documentation/kernel-parameters.txt | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index fcbb736..0ca0902 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -774,6 +774,15 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
disable= [IPV6]
See Documentation/networking/ipv6.txt.
+ disable_cpu_apicid= [X86,APIC,KEXEC,SMP]
+ Format: <int>
+ The number of initial APIC ID for the
+ corresponding CPU to be disabled at boot,
+ mostly used for the kdump 2nd kernel to
+ disable BSP to wake up multiple CPUs without
+ causing system reset or hang due to sending
+ INIT from AP to BSP.
+
disable_ddw [PPC/PSERIES]
Disable Dynamic DMA Window support. Use this if
to workaround buggy firmware.
On Wed, Oct 23, 2013 at 12:01:18AM +0900, HATAYAMA Daisuke wrote:
> This patch set is to allow kdump 2nd kernel to wake up multiple CPUs
> even if 1st kernel crashs on some AP, a continueing work from:
>
> [PATCH v3 0/2] x86, apic, kdump: Disable BSP if boot cpu is AP
> https://lkml.org/lkml/2013/10/16/300.
>
> In this version, basic design has changed. Now users need to figure
> out initial APIC ID of BSP in the 1st kernel and configures kernel
> parameter for the 2nd kernel manually using disable_cpu_apic kernel
> parameter to be newly introduced in this patch set. This design is
> more flexible than the previous version in that we no longer have to
> rely on ACPI/MP table to get initial APIC ID of BSP.
>
> Sorry, this patch set have not include in-source documentation
> requested by Borislav Petkov yet, but I'll post it later separately,
> which would be better to focus on documentation reviewing.
>
> ChangeLog
>
> v3 => v4)
>
> - Rebased on top of v3.12-rc6
>
> - Basic design has been changed. Now users need to figure out initial
> APIC ID of BSP in the 1st kernel and configures kernel parameter for
> the 2nd kernel manually using disable_cpu_apic kernel parameter to
> be newly introduced in this patch set. This design is more flexible
> than the previous version in that we no longer have to rely on
> ACPI/MP table to get initial APIC ID of BSP.
Do you literally mean a human at each boot will have to configure
the kdump configuration files for passing disable_cpu_apic?
Or do you envision the setting of disable_cpu_apic being put into
the kdump initialization scripts?
thanks
Jerry
>
> v2 => v3)
>
> - Change default value of boot_cpu_is_bsp to true.
>
> - Before executing rdmsr(MSR_IA32_APICBASE), check if the number of
> processor family is larger than or equal to 6 in order to avoid
> invalid opcode exception on processors where MSR_IA32_APICBASE is
> not supported.
>
> v1 => v2)
>
> - Rebased on top of v3.12-rc5.
>
> - Fix linking time error of boot_cpu_is_bsp_init() in case of
> CONFIG_LOCAL_APIC disabled by adding empty static inline function
> instead.
>
> - Fix missing feature check by means of cpu_has_apic macro in
> boot_cpu_is_bsp_init() before calling rdmsr_safe(MSR_IA32_APICBASE).
>
> NOTE: I've checked local apic-present case only; I don't have any
> x86 processor without local apic.
>
> - Add __init annotation to boot_cpu_is_bsp_init().
>
> Test
>
> - built with and without CONFIG_LOCAL_APIC
> - tested x86_64 in case of acpi and MP table
>
> ---
>
> HATAYAMA Daisuke (3):
> x86, apic: Don't count the CPU with BP flag from MP table as booting-up CPU
> x86, apic: Add disable_cpu_apicid kernel parameter
> Documentation, x86, apic, kexec: Add disable_cpu_apicid kernel parameter
>
>
> Documentation/kernel-parameters.txt | 9 +++++++++
> arch/x86/kernel/apic/apic.c | 29 +++++++++++++++++++++++++++++
> arch/x86/kernel/mpparse.c | 1 -
> 3 files changed, 38 insertions(+), 1 deletion(-)
>
> --
>
> Thanks.
> HATAYAMA, Daisuke
>
> _______________________________________________
> kexec mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/kexec
--
----------------------------------------------------------------------------
Jerry Hoemann Software Engineer Hewlett-Packard/MODL
3404 E Harmony Rd. MS 57 phone: (970) 898-1022
Ft. Collins, CO 80528 FAX: (970) 898-XXXX
email: [email protected]
----------------------------------------------------------------------------
(2013/10/23 7:08), [email protected] wrote:
> On Wed, Oct 23, 2013 at 12:01:18AM +0900, HATAYAMA Daisuke wrote:
>> This patch set is to allow kdump 2nd kernel to wake up multiple CPUs
>> even if 1st kernel crashs on some AP, a continueing work from:
>>
>> [PATCH v3 0/2] x86, apic, kdump: Disable BSP if boot cpu is AP
>> https://lkml.org/lkml/2013/10/16/300.
>>
>> In this version, basic design has changed. Now users need to figure
>> out initial APIC ID of BSP in the 1st kernel and configures kernel
>> parameter for the 2nd kernel manually using disable_cpu_apic kernel
>> parameter to be newly introduced in this patch set. This design is
>> more flexible than the previous version in that we no longer have to
>> rely on ACPI/MP table to get initial APIC ID of BSP.
>>
>> Sorry, this patch set have not include in-source documentation
>> requested by Borislav Petkov yet, but I'll post it later separately,
>> which would be better to focus on documentation reviewing.
>>
>> ChangeLog
>>
>> v3 => v4)
>>
>> - Rebased on top of v3.12-rc6
>>
>> - Basic design has been changed. Now users need to figure out initial
>> APIC ID of BSP in the 1st kernel and configures kernel parameter for
>> the 2nd kernel manually using disable_cpu_apic kernel parameter to
>> be newly introduced in this patch set. This design is more flexible
>> than the previous version in that we no longer have to rely on
>> ACPI/MP table to get initial APIC ID of BSP.
>
>
> Do you literally mean a human at each boot will have to configure
> the kdump configuration files for passing disable_cpu_apic?
> Or do you envision the setting of disable_cpu_apic being put into
> the kdump initialization scripts?
>
> thanks
>
> Jerry
Nearer to the former case, but this is not what a human should do. It's
a cumbersome task. I think, on fedora/RHEL system for example, kdump
service should check at each boot automatically.
--
Thanks.
HATAYAMA, Daisuke
On Wed, Oct 23, 2013 at 09:05:06AM +0900, HATAYAMA Daisuke wrote:
[..]
> >Do you literally mean a human at each boot will have to configure
> >the kdump configuration files for passing disable_cpu_apic?
> >Or do you envision the setting of disable_cpu_apic being put into
> >the kdump initialization scripts?
> >
> >thanks
> >
> >Jerry
>
> Nearer to the former case, but this is not what a human should do. It's
> a cumbersome task. I think, on fedora/RHEL system for example, kdump
> service should check at each boot automatically.
Hi Hatayama,
So what information should I look for to prepare disable_cpu_apic=X in
kdump script?
Is BSP processor info exported to user space somewhere? Or assuming that
processor 0 is BSP and corresponding apicid should be disabled in kdump
kernel is good enough?
I am looking at /proc/cpuinfo and following 3 fields seem interesting.
processor: 0
apicid : 0
initial apicid : 0
What's the difference between apicid and "initial apicid". I guess
initial apicid reflects the apicid number as set by firmware and then
kernel can overwrite it and new number would be reflected in "apicid"?
If that's the case, then I guess we should be looking at "apicid" of
processor "0" and set that in disable_cpu_apic? Because that's the
number kdump kernel boot should see in apic upon boot.
Thanks
Vivek
(2013/10/24 0:51), Vivek Goyal wrote:
> On Wed, Oct 23, 2013 at 09:05:06AM +0900, HATAYAMA Daisuke wrote:
>
> [..]
>>> Do you literally mean a human at each boot will have to configure
>>> the kdump configuration files for passing disable_cpu_apic?
>>> Or do you envision the setting of disable_cpu_apic being put into
>>> the kdump initialization scripts?
>>>
>>> thanks
>>>
>>> Jerry
>>
>> Nearer to the former case, but this is not what a human should do. It's
>> a cumbersome task. I think, on fedora/RHEL system for example, kdump
>> service should check at each boot automatically.
>
> Hi Hatayama,
>
> So what information should I look for to prepare disable_cpu_apic=X in
> kdump script?
>
> Is BSP processor info exported to user space somewhere? Or assuming that
> processor 0 is BSP and corresponding apicid should be disabled in kdump
> kernel is good enough?
>
Yes, this patch set assumes that the processor 0 is BSP and there's no
other BSP. Because this patch cares about only one BSP processor,
the disabled_cpu_apicid variable has unsigned int, not mask.
I think this assumption is reasonable since doing it rigorously requires
additional processing between 1st and 2nd kernels just as I explained in
previous mail.
> I am looking at /proc/cpuinfo and following 3 fields seem interesting.
>
> processor: 0
> apicid : 0
> initial apicid : 0
>
> What's the difference between apicid and "initial apicid". I guess
> initial apicid reflects the apicid number as set by firmware and then
> kernel can overwrite it and new number would be reflected in "apicid"?
>
> If that's the case, then I guess we should be looking at "apicid" of
> processor "0" and set that in disable_cpu_apic? Because that's the
> number kdump kernel boot should see in apic upon boot.
>
Yes, that's fully correct, and please see 10.4.6 Local APIC ID in Intel SPG
for details.
BTW, we can use cpuid instruction in user-space, too. It might be more
flexible to use cpuid than looking up /proc/cpuinfo.
Also, there's one corner case that if we hot-remove cpu0, we cannot
look up /proc/cpuinfo to get cpu0 information since /proc/cpunifo displays
*online* cpus only. We cannot use even cpuid instruction for offline cpu.
So, to address this corner case, we need to prepare new interface to see
cpu0 initial apicid which is always available.
My idea is for example to introduce the following file in sysfs:
/sys/devices/system/cpu/cpu0/initial_apicid
Under the current implementation, cpu0 hot-remove is software one and kernel
must start with cpu0 in boot time. It's enough to assign the value of initial
APIC ID in the boot time. The one in boot_cpu_data?
--
Thanks.
HATAYAMA, Daisuke
Vivek Goyal <[email protected]> writes:
> Hi Hatayama,
>
> So what information should I look for to prepare disable_cpu_apic=X in
> kdump script?
>
> Is BSP processor info exported to user space somewhere? Or assuming that
> processor 0 is BSP and corresponding apicid should be disabled in kdump
> kernel is good enough?
On x86 assuming that processor 0 is BSP should be good enough. Unless
we starting getting SMP BIOSen playing games with us.
> I am looking at /proc/cpuinfo and following 3 fields seem interesting.
>
> processor: 0
> apicid : 0
> initial apicid : 0
>
> What's the difference between apicid and "initial apicid". I guess
> initial apicid reflects the apicid number as set by firmware and then
> kernel can overwrite it and new number would be reflected in "apicid"?
Last I was current initial apicid is the apic id the hardware chooses
and it tells you something about the topology of the processors in the
system.
The apicid is programmed by the BIOS so that the BSP can have apicid 0,
and apicid's can be contiguous etc. In principle any piece of software
can program apicids but there is no advantage.
> If that's the case, then I guess we should be looking at "apicid" of
> processor "0" and set that in disable_cpu_apic? Because that's the
> number kdump kernel boot should see in apic upon boot.
Restricting this to x86 and not Voyager that is correct. Linux cpu 0
is the processor we booted up on as is apparent lots of things special
case the bootstrap processor so using a cpu hotplug remove to make it go
away is silly. Still to handle cazy cpu hotplug I would recomend to
simply force a single cpu if you can't find cpu 0.
Eric
Hi,
I am reviewing this patchset, and found there's a cpu0 hotplug feature
posted by intel which we can borrow an idea from. In that implementation,
CPU0 is waken up by nmi not INIT to avoid the realmode bootstrap code
execution. I tried it by below patch which includes one line of change.
By console printing, I got the boot cpu is always 0(namely cpu=0),
however the apicid related to each processor keeps the same as in 1st
kernel. In my HP Z420 machine, the apicid for BSP is 0, so I just make a
test patch which depends on the fact that apicid for BSP is 0. Maybe
generally the apicid for BSP can't be guaranteed, then passing it from
1st kernel to 2nd kernel in cmdline is very helpful, just as you have
done for disable_cpu_apic.
On my HP z420, I add nr_cpus=4 in /etc/sysconfig/kdump, and then execute
below command, then 3 APs (1 boot cpu and 2 AP) can be waken up
correctly, but BSP failed because NMI received for unknown reason 21 on
CPU0. I think I need further check why BSP failed to wake up by nmi. But
3 processors are brought up successfully and kdump is successful too.
sudo taskset -c 1 sh -c "echo c >/proc/sysrq-trigger"
[ 0.296831] smpboot: Booting Node 0, Processors # 1
[ 0.302095]
*****************************************************cpu=1, apicid=0, wakeup_cpu_via_init_nmi
[ 0.311942] cpu=1, apicid=0, register_nmi_handlercpu=1, apicid=0, wakeup_secondary_cpu_via_nmi
[ 0.320826] Uhhuh. NMI received for unknown reason 21 on CPU 0.
[ 0.327129] Do you have a strange power saving mode enabled?
[ 0.333858] Dazed and confused, but trying to continue
[ 0.339290] cpu=1, apicid=0, wakeup_cpu_via_init_nmi
[ 2.409099] Uhhuh. NMI received for unknown reason 21 on CPU 0.
[ 2.415393] Do you have a strange power saving mode enabled?
[ 2.421142] Dazed and confused, but trying to continue
[ 5.379519] smpboot: CPU1: Not responding
[ 5.383692] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 6cacab6..e45fe5b 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -702,7 +702,7 @@ wakeup_cpu_via_init_nmi(int cpu, unsigned long
start_ip, int apicid,
/*
* Wake up AP by INIT, INIT, STARTUP sequence.
*/
- if (cpu)
+ if (cpu && apicid)
return wakeup_secondary_cpu_via_init(apicid, start_ip);
/*
On 10/23/13 at 12:01am, HATAYAMA Daisuke wrote:
> This patch set is to allow kdump 2nd kernel to wake up multiple CPUs
> even if 1st kernel crashs on some AP, a continueing work from:
>
> [PATCH v3 0/2] x86, apic, kdump: Disable BSP if boot cpu is AP
> https://lkml.org/lkml/2013/10/16/300.
>
> In this version, basic design has changed. Now users need to figure
> out initial APIC ID of BSP in the 1st kernel and configures kernel
> parameter for the 2nd kernel manually using disable_cpu_apic kernel
> parameter to be newly introduced in this patch set. This design is
> more flexible than the previous version in that we no longer have to
> rely on ACPI/MP table to get initial APIC ID of BSP.
>
> Sorry, this patch set have not include in-source documentation
> requested by Borislav Petkov yet, but I'll post it later separately,
> which would be better to focus on documentation reviewing.
>
> ChangeLog
>
> v3 => v4)
>
> - Rebased on top of v3.12-rc6
>
> - Basic design has been changed. Now users need to figure out initial
> APIC ID of BSP in the 1st kernel and configures kernel parameter for
> the 2nd kernel manually using disable_cpu_apic kernel parameter to
> be newly introduced in this patch set. This design is more flexible
> than the previous version in that we no longer have to rely on
> ACPI/MP table to get initial APIC ID of BSP.
>
> v2 => v3)
>
> - Change default value of boot_cpu_is_bsp to true.
>
> - Before executing rdmsr(MSR_IA32_APICBASE), check if the number of
> processor family is larger than or equal to 6 in order to avoid
> invalid opcode exception on processors where MSR_IA32_APICBASE is
> not supported.
>
> v1 => v2)
>
> - Rebased on top of v3.12-rc5.
>
> - Fix linking time error of boot_cpu_is_bsp_init() in case of
> CONFIG_LOCAL_APIC disabled by adding empty static inline function
> instead.
>
> - Fix missing feature check by means of cpu_has_apic macro in
> boot_cpu_is_bsp_init() before calling rdmsr_safe(MSR_IA32_APICBASE).
>
> NOTE: I've checked local apic-present case only; I don't have any
> x86 processor without local apic.
>
> - Add __init annotation to boot_cpu_is_bsp_init().
>
> Test
>
> - built with and without CONFIG_LOCAL_APIC
> - tested x86_64 in case of acpi and MP table
>
> ---
>
> HATAYAMA Daisuke (3):
> x86, apic: Don't count the CPU with BP flag from MP table as booting-up CPU
> x86, apic: Add disable_cpu_apicid kernel parameter
> Documentation, x86, apic, kexec: Add disable_cpu_apicid kernel parameter
>
>
> Documentation/kernel-parameters.txt | 9 +++++++++
> arch/x86/kernel/apic/apic.c | 29 +++++++++++++++++++++++++++++
> arch/x86/kernel/mpparse.c | 1 -
> 3 files changed, 38 insertions(+), 1 deletion(-)
>
> --
>
> Thanks.
> HATAYAMA, Daisuke
>
> _______________________________________________
> kexec mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/kexec
On 10/29/13 at 10:21pm, Baoquan He wrote:
> Hi,
>
> I am reviewing this patchset, and found there's a cpu0 hotplug feature
Forget to attach the link of patch for cpu0 hotplug.
http://lwn.net/Articles/475018/
In this patchset, BSP is also called CPU0.
> posted by intel which we can borrow an idea from. In that implementation,
> CPU0 is waken up by nmi not INIT to avoid the realmode bootstrap code
> execution. I tried it by below patch which includes one line of change.
>
> By console printing, I got the boot cpu is always 0(namely cpu=0),
> however the apicid related to each processor keeps the same as in 1st
> kernel. In my HP Z420 machine, the apicid for BSP is 0, so I just make a
> test patch which depends on the fact that apicid for BSP is 0. Maybe
> generally the apicid for BSP can't be guaranteed, then passing it from
> 1st kernel to 2nd kernel in cmdline is very helpful, just as you have
> done for disable_cpu_apic.
>
> On my HP z420, I add nr_cpus=4 in /etc/sysconfig/kdump, and then execute
> below command, then 3 APs (1 boot cpu and 2 AP) can be waken up
> correctly, but BSP failed because NMI received for unknown reason 21 on
> CPU0. I think I need further check why BSP failed to wake up by nmi. But
> 3 processors are brought up successfully and kdump is successful too.
>
> sudo taskset -c 1 sh -c "echo c >/proc/sysrq-trigger"
>
> [ 0.296831] smpboot: Booting Node 0, Processors # 1
> [ 0.302095]
> *****************************************************cpu=1, apicid=0, wakeup_cpu_via_init_nmi
> [ 0.311942] cpu=1, apicid=0, register_nmi_handlercpu=1, apicid=0, wakeup_secondary_cpu_via_nmi
> [ 0.320826] Uhhuh. NMI received for unknown reason 21 on CPU 0.
> [ 0.327129] Do you have a strange power saving mode enabled?
> [ 0.333858] Dazed and confused, but trying to continue
> [ 0.339290] cpu=1, apicid=0, wakeup_cpu_via_init_nmi
> [ 2.409099] Uhhuh. NMI received for unknown reason 21 on CPU 0.
> [ 2.415393] Do you have a strange power saving mode enabled?
> [ 2.421142] Dazed and confused, but trying to continue
> [ 5.379519] smpboot: CPU1: Not responding
> [ 5.383692] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
>
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 6cacab6..e45fe5b 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -702,7 +702,7 @@ wakeup_cpu_via_init_nmi(int cpu, unsigned long
> start_ip, int apicid,
> /*
> * Wake up AP by INIT, INIT, STARTUP sequence.
> */
> - if (cpu)
> + if (cpu && apicid)
> return wakeup_secondary_cpu_via_init(apicid, start_ip);
>
> /*
>
>
>
> On 10/23/13 at 12:01am, HATAYAMA Daisuke wrote:
> > This patch set is to allow kdump 2nd kernel to wake up multiple CPUs
> > even if 1st kernel crashs on some AP, a continueing work from:
> >
> > [PATCH v3 0/2] x86, apic, kdump: Disable BSP if boot cpu is AP
> > https://lkml.org/lkml/2013/10/16/300.
> >
> > In this version, basic design has changed. Now users need to figure
> > out initial APIC ID of BSP in the 1st kernel and configures kernel
> > parameter for the 2nd kernel manually using disable_cpu_apic kernel
> > parameter to be newly introduced in this patch set. This design is
> > more flexible than the previous version in that we no longer have to
> > rely on ACPI/MP table to get initial APIC ID of BSP.
> >
> > Sorry, this patch set have not include in-source documentation
> > requested by Borislav Petkov yet, but I'll post it later separately,
> > which would be better to focus on documentation reviewing.
> >
> > ChangeLog
> >
> > v3 => v4)
> >
> > - Rebased on top of v3.12-rc6
> >
> > - Basic design has been changed. Now users need to figure out initial
> > APIC ID of BSP in the 1st kernel and configures kernel parameter for
> > the 2nd kernel manually using disable_cpu_apic kernel parameter to
> > be newly introduced in this patch set. This design is more flexible
> > than the previous version in that we no longer have to rely on
> > ACPI/MP table to get initial APIC ID of BSP.
> >
> > v2 => v3)
> >
> > - Change default value of boot_cpu_is_bsp to true.
> >
> > - Before executing rdmsr(MSR_IA32_APICBASE), check if the number of
> > processor family is larger than or equal to 6 in order to avoid
> > invalid opcode exception on processors where MSR_IA32_APICBASE is
> > not supported.
> >
> > v1 => v2)
> >
> > - Rebased on top of v3.12-rc5.
> >
> > - Fix linking time error of boot_cpu_is_bsp_init() in case of
> > CONFIG_LOCAL_APIC disabled by adding empty static inline function
> > instead.
> >
> > - Fix missing feature check by means of cpu_has_apic macro in
> > boot_cpu_is_bsp_init() before calling rdmsr_safe(MSR_IA32_APICBASE).
> >
> > NOTE: I've checked local apic-present case only; I don't have any
> > x86 processor without local apic.
> >
> > - Add __init annotation to boot_cpu_is_bsp_init().
> >
> > Test
> >
> > - built with and without CONFIG_LOCAL_APIC
> > - tested x86_64 in case of acpi and MP table
> >
> > ---
> >
> > HATAYAMA Daisuke (3):
> > x86, apic: Don't count the CPU with BP flag from MP table as booting-up CPU
> > x86, apic: Add disable_cpu_apicid kernel parameter
> > Documentation, x86, apic, kexec: Add disable_cpu_apicid kernel parameter
> >
> >
> > Documentation/kernel-parameters.txt | 9 +++++++++
> > arch/x86/kernel/apic/apic.c | 29 +++++++++++++++++++++++++++++
> > arch/x86/kernel/mpparse.c | 1 -
> > 3 files changed, 38 insertions(+), 1 deletion(-)
> >
> > --
> >
> > Thanks.
> > HATAYAMA, Daisuke
> >
> > _______________________________________________
> > kexec mailing list
> > [email protected]
> > http://lists.infradead.org/mailman/listinfo/kexec
>
> _______________________________________________
> kexec mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/kexec
(2013/10/29 23:21), Baoquan He wrote:
> Hi,
>
> I am reviewing this patchset, and found there's a cpu0 hotplug feature
> posted by intel which we can borrow an idea from. In that implementation,
> CPU0 is waken up by nmi not INIT to avoid the realmode bootstrap code
> execution. I tried it by below patch which includes one line of change.
>
> By console printing, I got the boot cpu is always 0(namely cpu=0),
> however the apicid related to each processor keeps the same as in 1st
> kernel. In my HP Z420 machine, the apicid for BSP is 0, so I just make a
> test patch which depends on the fact that apicid for BSP is 0. Maybe
> generally the apicid for BSP can't be guaranteed, then passing it from
> 1st kernel to 2nd kernel in cmdline is very helpful, just as you have
> done for disable_cpu_apic.
>
> On my HP z420, I add nr_cpus=4 in /etc/sysconfig/kdump, and then execute
> below command, then 3 APs (1 boot cpu and 2 AP) can be waken up
> correctly, but BSP failed because NMI received for unknown reason 21 on
> CPU0. I think I need further check why BSP failed to wake up by nmi. But
> 3 processors are brought up successfully and kdump is successful too.
>
> sudo taskset -c 1 sh -c "echo c >/proc/sysrq-trigger"
>
> [ 0.296831] smpboot: Booting Node 0, Processors # 1
> [ 0.302095]
> *****************************************************cpu=1, apicid=0, wakeup_cpu_via_init_nmi
> [ 0.311942] cpu=1, apicid=0, register_nmi_handlercpu=1, apicid=0, wakeup_secondary_cpu_via_nmi
> [ 0.320826] Uhhuh. NMI received for unknown reason 21 on CPU 0.
> [ 0.327129] Do you have a strange power saving mode enabled?
> [ 0.333858] Dazed and confused, but trying to continue
> [ 0.339290] cpu=1, apicid=0, wakeup_cpu_via_init_nmi
> [ 2.409099] Uhhuh. NMI received for unknown reason 21 on CPU 0.
> [ 2.415393] Do you have a strange power saving mode enabled?
> [ 2.421142] Dazed and confused, but trying to continue
> [ 5.379519] smpboot: CPU1: Not responding
> [ 5.383692] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
>
We've already discussed this approach and concluded this is not applicable
to our issue.
Follow http://lists.infradead.org/pipermail/kexec/2012-October/006905.html.
The reason is:
- The cpu0-hotplugging approach assumes BSP to be halting before initiating
NMI to it while in our case, BSP is halting in the 1st kernel or is
running in arbitrary position of the 1st kernel in catastrophic state.
- In general, NMI modifies stack, which means if throwing NMI to the BSP
in the 1st kernel, stack on the 1st kernel is modified. It's unpermissible
from kdump's perspective.
- On x86_64, there are two cases where stack is changed to another one
when receiving interrupts. One is when receiving interrupt in user mode.
The other is when using Interrupt Stack Table (IST), which is already
used in the current x86_64 implementation.
By using either, it would be possible to wake up BSP in the 1st kernel
by modifying the contexts on the 2nd kernel's NMI stack pushed on when NMI
to the 1st kernel is initiated.
However, this approach depends on the logic in the 1st kernel, there's
no guarantee that it works well. Consider severely buggy situation again.
- To do this approach rigorously, we need to check if states of BSP and APs
are kept in just what we assume in the place where logic is guaranteed to be
sane, i.e., at least after purgatory. However, adding new logic in the
purgatory means we are forced to introduce additional dependency between
kernel and kexec. The process performed in purgatory itself is not so
simple.I don't like this complication.
To sum up, I think the current idea is simple enough approach.
--
Thanks.
HATAYAMA, Daisuke
On 10/30/13 at 09:44am, HATAYAMA Daisuke wrote:
> (2013/10/29 23:21), Baoquan He wrote:
> >Hi,
> >
> >I am reviewing this patchset, and found there's a cpu0 hotplug feature
> >posted by intel which we can borrow an idea from. In that implementation,
> >CPU0 is waken up by nmi not INIT to avoid the realmode bootstrap code
> >execution. I tried it by below patch which includes one line of change.
> >
> >By console printing, I got the boot cpu is always 0(namely cpu=0),
> >however the apicid related to each processor keeps the same as in 1st
> >kernel. In my HP Z420 machine, the apicid for BSP is 0, so I just make a
> >test patch which depends on the fact that apicid for BSP is 0. Maybe
> >generally the apicid for BSP can't be guaranteed, then passing it from
> >1st kernel to 2nd kernel in cmdline is very helpful, just as you have
> >done for disable_cpu_apic.
> >
> >On my HP z420, I add nr_cpus=4 in /etc/sysconfig/kdump, and then execute
> >below command, then 3 APs (1 boot cpu and 2 AP) can be waken up
> >correctly, but BSP failed because NMI received for unknown reason 21 on
> >CPU0. I think I need further check why BSP failed to wake up by nmi. But
> >3 processors are brought up successfully and kdump is successful too.
> >
> >sudo taskset -c 1 sh -c "echo c >/proc/sysrq-trigger"
> >
> >[ 0.296831] smpboot: Booting Node 0, Processors # 1
> >[ 0.302095]
> >*****************************************************cpu=1, apicid=0, wakeup_cpu_via_init_nmi
> >[ 0.311942] cpu=1, apicid=0, register_nmi_handlercpu=1, apicid=0, wakeup_secondary_cpu_via_nmi
> >[ 0.320826] Uhhuh. NMI received for unknown reason 21 on CPU 0.
> >[ 0.327129] Do you have a strange power saving mode enabled?
> >[ 0.333858] Dazed and confused, but trying to continue
> >[ 0.339290] cpu=1, apicid=0, wakeup_cpu_via_init_nmi
> >[ 2.409099] Uhhuh. NMI received for unknown reason 21 on CPU 0.
> >[ 2.415393] Do you have a strange power saving mode enabled?
> >[ 2.421142] Dazed and confused, but trying to continue
> >[ 5.379519] smpboot: CPU1: Not responding
> >[ 5.383692] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
> >
>
> We've already discussed this approach and concluded this is not applicable
> to our issue.
> Follow http://lists.infradead.org/pipermail/kexec/2012-October/006905.html.
>
> The reason is:
>
> - The cpu0-hotplugging approach assumes BSP to be halting before initiating
> NMI to it while in our case, BSP is halting in the 1st kernel or is
> running in arbitrary position of the 1st kernel in catastrophic state.
>
> - In general, NMI modifies stack, which means if throwing NMI to the BSP
> in the 1st kernel, stack on the 1st kernel is modified. It's unpermissible
> from kdump's perspective.
Hi HATAYAMA,
All right. I didn't think of the stack issues NMI will bring. In fact
without this NMI stack problem, using CPU0 Hotplug as a base and sending
nmi to bsp will be good, because BSP failure can be acceptable, then
(N-1)cpus can be used. Later on if possible the contexts modifying can
be changed to let BSP wake up correctly. After all, from the user's
point of view, multiple cpus can be waken up, why not waking up all cpus
including BSP.
Anyway, this issue has been discussed so long time, and it will be great
to run multiple cpus in 2nd kernel. This evolution may be like CPU0 Hotplug,
they let cpus except of BSP hot plug available, then hanle the last cpu -
the BSP finally. From this perspective, I like your patch and hope it
can be merged asap.
Baoquan
Thanks a lot
>
> - On x86_64, there are two cases where stack is changed to another one
> when receiving interrupts. One is when receiving interrupt in user mode.
> The other is when using Interrupt Stack Table (IST), which is already
> used in the current x86_64 implementation.
>
> By using either, it would be possible to wake up BSP in the 1st kernel
> by modifying the contexts on the 2nd kernel's NMI stack pushed on when NMI
> to the 1st kernel is initiated.
>
> However, this approach depends on the logic in the 1st kernel, there's
> no guarantee that it works well. Consider severely buggy situation again.
>
> - To do this approach rigorously, we need to check if states of BSP and APs
> are kept in just what we assume in the place where logic is guaranteed to be
> sane, i.e., at least after purgatory. However, adding new logic in the
> purgatory means we are forced to introduce additional dependency between
> kernel and kexec. The process performed in purgatory itself is not so
> simple.I don't like this complication.
>
> To sum up, I think the current idea is simple enough approach.
>
> --
> Thanks.
> HATAYAMA, Daisuke
>
>
> _______________________________________________
> kexec mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/kexec
(2013/10/30 15:06), Baoquan He wrote:
> On 10/30/13 at 09:44am, HATAYAMA Daisuke wrote:
>> (2013/10/29 23:21), Baoquan He wrote:
>>> Hi,
>>>
>>> I am reviewing this patchset, and found there's a cpu0 hotplug feature
>>> posted by intel which we can borrow an idea from. In that implementation,
>>> CPU0 is waken up by nmi not INIT to avoid the realmode bootstrap code
>>> execution. I tried it by below patch which includes one line of change.
>>>
>>> By console printing, I got the boot cpu is always 0(namely cpu=0),
>>> however the apicid related to each processor keeps the same as in 1st
>>> kernel. In my HP Z420 machine, the apicid for BSP is 0, so I just make a
>>> test patch which depends on the fact that apicid for BSP is 0. Maybe
>>> generally the apicid for BSP can't be guaranteed, then passing it from
>>> 1st kernel to 2nd kernel in cmdline is very helpful, just as you have
>>> done for disable_cpu_apic.
>>>
>>> On my HP z420, I add nr_cpus=4 in /etc/sysconfig/kdump, and then execute
>>> below command, then 3 APs (1 boot cpu and 2 AP) can be waken up
>>> correctly, but BSP failed because NMI received for unknown reason 21 on
>>> CPU0. I think I need further check why BSP failed to wake up by nmi. But
>>> 3 processors are brought up successfully and kdump is successful too.
>>>
>>> sudo taskset -c 1 sh -c "echo c >/proc/sysrq-trigger"
>>>
>>> [ 0.296831] smpboot: Booting Node 0, Processors # 1
>>> [ 0.302095]
>>> *****************************************************cpu=1, apicid=0, wakeup_cpu_via_init_nmi
>>> [ 0.311942] cpu=1, apicid=0, register_nmi_handlercpu=1, apicid=0, wakeup_secondary_cpu_via_nmi
>>> [ 0.320826] Uhhuh. NMI received for unknown reason 21 on CPU 0.
>>> [ 0.327129] Do you have a strange power saving mode enabled?
>>> [ 0.333858] Dazed and confused, but trying to continue
>>> [ 0.339290] cpu=1, apicid=0, wakeup_cpu_via_init_nmi
>>> [ 2.409099] Uhhuh. NMI received for unknown reason 21 on CPU 0.
>>> [ 2.415393] Do you have a strange power saving mode enabled?
>>> [ 2.421142] Dazed and confused, but trying to continue
>>> [ 5.379519] smpboot: CPU1: Not responding
>>> [ 5.383692] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
>>>
>>
>> We've already discussed this approach and concluded this is not applicable
>> to our issue.
>> Follow http://lists.infradead.org/pipermail/kexec/2012-October/006905.html.
>>
>> The reason is:
>>
>> - The cpu0-hotplugging approach assumes BSP to be halting before initiating
>> NMI to it while in our case, BSP is halting in the 1st kernel or is
>> running in arbitrary position of the 1st kernel in catastrophic state.
>>
>> - In general, NMI modifies stack, which means if throwing NMI to the BSP
>> in the 1st kernel, stack on the 1st kernel is modified. It's unpermissible
>> from kdump's perspective.
>
> Hi HATAYAMA,
>
> All right. I didn't think of the stack issues NMI will bring. In fact
> without this NMI stack problem, using CPU0 Hotplug as a base and sending
> nmi to bsp will be good, because BSP failure can be acceptable, then
> (N-1)cpus can be used. Later on if possible the contexts modifying can
> be changed to let BSP wake up correctly. After all, from the user's
> point of view, multiple cpus can be waken up, why not waking up all cpus
> including BSP.
>
> Anyway, this issue has been discussed so long time, and it will be great
> to run multiple cpus in 2nd kernel. This evolution may be like CPU0 Hotplug,
> they let cpus except of BSP hot plug available, then hanle the last cpu -
> the BSP finally. From this perspective, I like your patch and hope it
> can be merged asap.
>
Considering again, I'm now beginning with thinking that making CPU halting
in the 1st kernel to execute the 2nd kernel's NMI handler is impossible.
The address of IDT is saved in IDTR and this is a per-cpu register, and
value of IDTR in the 2nd kernel and the one in the 1st kernel are different.
In other words, to wake up BSP from 2nd kernel using NMI, it's necessary to
tell the address of IDTR in the 2nd kernel to the BSP halting in the 1st
kernel.
--
Thanks.
HATAYAMA, Daisuke
On Wed, Oct 23, 2013 at 09:05:06AM +0900, HATAYAMA Daisuke wrote:
> >>
> >>- Rebased on top of v3.12-rc6
> >>
> >>- Basic design has been changed. Now users need to figure out initial
> >> APIC ID of BSP in the 1st kernel and configures kernel parameter for
> >> the 2nd kernel manually using disable_cpu_apic kernel parameter to
> >> be newly introduced in this patch set. This design is more flexible
> >> than the previous version in that we no longer have to rely on
> >> ACPI/MP table to get initial APIC ID of BSP.
> >
> >
> >Do you literally mean a human at each boot will have to configure
> >the kdump configuration files for passing disable_cpu_apic?
> >Or do you envision the setting of disable_cpu_apic being put into
> >the kdump initialization scripts?
> >
> >thanks
> >
> >Jerry
>
> Nearer to the former case, but this is not what a human should do. It's
> a cumbersome task. I think, on fedora/RHEL system for example, kdump
> service should check at each boot automatically.
>
> HATAYAMA, Daisuke
Daisuke,
Are you planning on making changes to the kexec tools to automate
the setting of disable_cpu_apic to the capture kernel? Or do you
know someone who is planning this?
I back ported the kernel side changes to a 4.2.32 based kernel.
I tested the patch on a prototype system which exhibits a stable
initial_apic_id for CPU 0. While not something that would be suitable
for customers, it does allow me to test the kernel portion of the patch.
I will report the results of the testing later this week.
Thanks
Jerry
--
----------------------------------------------------------------------------
Jerry Hoemann Software Engineer Hewlett-Packard/MODL
3404 E Harmony Rd. MS 57 phone: (970) 898-1022
Ft. Collins, CO 80528 FAX: (970) 898-XXXX
email: [email protected]
----------------------------------------------------------------------------
(2013/10/31 9:58), [email protected] wrote:
> On Wed, Oct 23, 2013 at 09:05:06AM +0900, HATAYAMA Daisuke wrote:
>>>>
>>>> - Rebased on top of v3.12-rc6
>>>>
>>>> - Basic design has been changed. Now users need to figure out initial
>>>> APIC ID of BSP in the 1st kernel and configures kernel parameter for
>>>> the 2nd kernel manually using disable_cpu_apic kernel parameter to
>>>> be newly introduced in this patch set. This design is more flexible
>>>> than the previous version in that we no longer have to rely on
>>>> ACPI/MP table to get initial APIC ID of BSP.
>>>
>>>
>>> Do you literally mean a human at each boot will have to configure
>>> the kdump configuration files for passing disable_cpu_apic?
>>> Or do you envision the setting of disable_cpu_apic being put into
>>> the kdump initialization scripts?
>>>
>>> thanks
>>>
>>> Jerry
>>
>> Nearer to the former case, but this is not what a human should do. It's
>> a cumbersome task. I think, on fedora/RHEL system for example, kdump
>> service should check at each boot automatically.
>>
>> HATAYAMA, Daisuke
>
>
> Daisuke,
>
> Are you planning on making changes to the kexec tools to automate
> the setting of disable_cpu_apic to the capture kernel? Or do you
> know someone who is planning this?
>
> I back ported the kernel side changes to a 4.2.32 based kernel.
> I tested the patch on a prototype system which exhibits a stable
> initial_apic_id for CPU 0. While not something that would be suitable
> for customers, it does allow me to test the kernel portion of the patch.
>
> I will report the results of the testing later this week.
>
I'll do that after this patch is merged in kernel. But it is still under
reviewing.
--
Thanks.
HATAYAMA, Daisuke
On Wed, Oct 30, 2013 at 06:58:13PM -0600, [email protected] wrote:
[..]
> Daisuke,
>
> Are you planning on making changes to the kexec tools to automate
> the setting of disable_cpu_apic to the capture kernel? Or do you
> know someone who is planning this?
I think we should not make this change in kexec-tools and should leave
it to distro scripts to append disable_cpu_apic.
Who knows in future this restriction is not there at all and kexec-tools
will be stuck with always passing disable_cpu_apic. Getting rid of
this parameter in distro scripts will be much easier.
Thanks
Vivek
On Thu, Oct 31, 2013 at 09:27:45AM -0400, Vivek Goyal wrote:
> On Wed, Oct 30, 2013 at 06:58:13PM -0600, [email protected] wrote:
>
> [..]
> > Daisuke,
> >
> > Are you planning on making changes to the kexec tools to automate
> > the setting of disable_cpu_apic to the capture kernel? Or do you
> > know someone who is planning this?
>
> I think we should not make this change in kexec-tools and should leave
> it to distro scripts to append disable_cpu_apic.
>
> Who knows in future this restriction is not there at all and kexec-tools
> will be stuck with always passing disable_cpu_apic. Getting rid of
> this parameter in distro scripts will be much easier.
Hi Vivek, Hi Daisuke,
That approach sounds reasonable to me.