On Mon, 16 Apr 2012 11:21:28 +0900
HATAYAMA Daisuke <[email protected]> wrote:
> Currently, booting up 2nd kernel with multiple CPUs fails in most
> cases since it enters 2nd kernel with AP if the crash happens on the
> AP. The problem is to signal startup IPI from AP to BSP. Typical
> result of the operation I saw is the machine hanging during the 2nd
> kernel boot.
> To solve this issue, always enter 2nd kernel with BSP. To do this, I
> modify logic for shooting down CPUs. I use simple existing logic only
> in this mechanism, not complicating crash path to machine_kexec().
These patches looked pretty good. I seem to recall that Fenghua (from
Intel) had an alternative solution for booting from AP. Unfortunately I
can't find his mails in my kexec mailbox...
Anyway, what's the latest upstream status?
> I ran about 100 stress tests in total on the processors below:
> Intel(R) Xeon(R) CPU E7- 4820 @ 2.00GHz
> Socket x 4, Core x 8, Thread x 16 (64 LCPUS in total)
> Intel(R) Xeon(R) CPU E7- 8870 @ 2.40GHz
> Socket x 8, Core x 10, Thread x 20 (160 LCPUS in total)
> * Motivation of enabling multiple CPUs on the 2nd kernel
> This patch is aimed at doing parallel compression on the 2nd
> kernel. A machine with more than a terabyte of memory requires
> several hours to generate a crash dump.
> There are several ways to reduce the generation time of a crash
> dump, but they have different pros and cons:
> Fast I/O devices
> - Can obtain high speed stably.
> - Big financial cost for high-performance I/O devices. It's
> difficult financially to prepare these for all environments as
> dump devices.
> Filtering
> - No financial cost.
> - Large reduction of crash dump size.
> - Some data is definitely lost. So, we cannot use this in some
> situations:
> 1) High availability configuration where application triggers
> OS to crash and users want to debug the application later by
> retrieving the application's user process image from the
> system's crash dump.
> 2) KVM virtualization configuration where KVM host machine
> contains KVM guest machine images as user processes.
> 3) Page cache is needed for debugging filesystem related bugs.
> Compression
> - No financial cost.
> - No data lost.
> - Compression doesn't always reduce crash dump size.
> - Takes heavy CPU time. Slow if the CPU is slow.
> Machines with large memory tend to have a lot of CPUs, and
> compression is well suited to parallel processing. My goal is to
> make compression as close to free as possible.
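To make the idea above concrete, here is a minimal user-space sketch
of per-page parallel compression. It is only an illustration, not
code from the patch set or from any dump tool: PAGE_SIZE, the
function names and the (is_compressed, data) record layout are all
assumptions made for this example. A page is stored raw when
compressing it does not make it smaller, which is one way to live
with "compression doesn't always reduce crash dump size".

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

PAGE_SIZE = 4096  # assumption: x86 base page size


def compress_page(page):
    """Return (is_compressed, data); keep the raw page if zlib does not shrink it."""
    packed = zlib.compress(page, 1)  # level 1: favour speed over ratio
    if len(packed) < len(page):
        return (True, packed)
    return (False, page)


def compress_dump(image, workers=4):
    """Split a memory image into pages and compress them on a pool of workers."""
    pages = [image[i:i + PAGE_SIZE] for i in range(0, len(image), PAGE_SIZE)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves page order, so records line up with page numbers.
        return list(pool.map(compress_page, pages))


def decompress_dump(records):
    """Inverse of compress_dump: rebuild the original image."""
    return b"".join(zlib.decompress(d) if c else d for c, d in records)
```

Threads are used here only to keep the sketch short. Whether they
give a real speedup depends on the runtime and on the page size, so
a real tool would more likely use native threads or processes, one
per CPU brought up in the 2nd kernel.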
> * TODO
> - Extend 512MB limit of reserved memory size for 2nd kernel for
> multiple CPUs.
> - Intel microcode patch loading on the 2nd kernel is slow for the
> 2nd and later CPUs: a minute or more per CPU.
> - There is a limited number of irq vectors for TLB flush IPI on
> x86_64: 32 for recent 3.x kernels and 8 for around 2.6.x
> kernels. So compression doesn't scale if a lot of page reclaim
> happens when reading a kernel image larger than memory. Special
> handling without page cache could be applicable to a parallel dump
> mechanism, but more investigation is needed.
> HATAYAMA Daisuke (2):
> Enter 2nd kernel with BSP
> Introduce crash ipi helpers to wait for APs to stop
> arch/x86/include/asm/reboot.h | 4 +++
> arch/x86/kernel/crash.c | 15 +++++++++-
> arch/x86/kernel/reboot.c | 63 +++++++++++++++++++++++++++++------------
> 3 files changed, 62 insertions(+), 20 deletions(-)
(2013/04/18 20:41), Petr Tesarik wrote:
> On Mon, 16 Apr 2012 11:21:28 +0900
> HATAYAMA Daisuke <[email protected]> wrote:
>> Currently, booting up 2nd kernel with multiple CPUs fails in most
>> cases since it enters 2nd kernel with AP if the crash happens on the
>> AP. The problem is to signal startup IPI from AP to BSP. Typical
>> result of the operation I saw is the machine hanging during the 2nd
>> kernel boot.
>> To solve this issue, always enter 2nd kernel with BSP. To do this, I
>> modify logic for shooting down CPUs. I use simple existing logic only
>> in this mechanism, not complicating crash path to machine_kexec().
> These patches looked pretty good. I seem to recall that Fenghua (from
> Intel) had an alternative solution for booting from AP. Unfortunately I
> can't find his mails in my kexec mailbox...
> Anyway, what's the latest upstream status?
It's still experimental.
The patch itself was nacked by Erick because switching, through NMI,
the CPU that enters the 2nd kernel reduces the reliability of kdump.
In the discussion of my 2nd patch set, which tried to reset the BSP
flag at boot of the 2nd kernel, Erick pointed out that the BSP flag
can be changed at runtime and that the behaviour when INIT is received
then varies, and suggested that we should first discuss how unsetting
the BSP flag affects the system.
I'm now going in this direction. The patch I posted a month ago is:
[PATCH] x86, apic: Add unset_bsp parameter to unset BSP flag at boot time
According to Fenghua, some firmware assumes that the BSP flag stays
set for as long as the system is running. On my machine, with the
patch applied, I have yet to see any difference in behaviour when the
BSP flag is unset. I think this is system dependent, so it might be
better to let each user decide whether to unset the BSP flag or not.
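For readers unfamiliar with the flag: on x86 the BSP flag is bit 8 of
the IA32_APIC_BASE MSR (address 0x1B). The sketch below only shows
the bit manipulation on a plain integer. It does not touch hardware
and is not taken from the patch; the helper names are hypothetical,
and the constants follow the architectural MSR layout.

```python
IA32_APIC_BASE_MSR = 0x1B      # MSR address (architectural)
APIC_BASE_BSP = 1 << 8         # BSP flag
APIC_BASE_ENABLE = 1 << 11     # APIC global enable


def is_bsp(msr_value):
    """True if the BSP flag is set in an IA32_APIC_BASE value."""
    return bool(msr_value & APIC_BASE_BSP)


def clear_bsp_flag(msr_value):
    """Return the MSR value with only the BSP flag cleared (hypothetical helper)."""
    return msr_value & ~APIC_BASE_BSP
```

For example, a value of 0xFEE00900 (default APIC base, APIC enabled,
BSP flag set) becomes 0xFEE00800 after clear_bsp_flag(); the base
address and the enable bit are left alone.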
BTW, Fenghua's work on software CPU hotplug for the BSP is orthogonal
to my case. His work is for systems, firmware included, that are
affected if the BSP flag is unset, and it assumes a healthy system in
which cpu#0 is always the BSP. Our case, on the other hand, is the
crash kernel: we can no longer assume that cpu#0 is the BSP, and we
can no longer use NMI to wake up the other CPUs, since we cannot use
logic that depends on the state of CPUs sleeping in the 1st kernel.