Zhang Yanfei <[email protected]> writes:
> This patch adds an atomic notifier list named crash_notifier_list.
> Currently, when loading kvm-intel module, a notifier will be registered
> in the list to enable vmcss loaded on all cpus to be VMCLEAR'd if
> needed.
crash_notifier_list ick gag please no. Effectively this makes the kexec
on panic code path undebuggable.
Instead we need to use direct function calls to whatever you are doing.
If a direct function call is too complex then the piece of code you want
to call is almost certainly too complex to be calling on a code path
like this.
Eric
> Signed-off-by: Zhang Yanfei <[email protected]>
> ---
> arch/x86/include/asm/kexec.h | 2 ++
> arch/x86/kernel/crash.c | 9 +++++++++
> 2 files changed, 11 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
> index 317ff17..5e22b00 100644
> --- a/arch/x86/include/asm/kexec.h
> +++ b/arch/x86/include/asm/kexec.h
> @@ -163,6 +163,8 @@ struct kimage_arch {
> };
> #endif
>
> +extern struct atomic_notifier_head crash_notifier_list;
> +
> #endif /* __ASSEMBLY__ */
>
> #endif /* _ASM_X86_KEXEC_H */
> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> index 13ad899..c5b2f70 100644
> --- a/arch/x86/kernel/crash.c
> +++ b/arch/x86/kernel/crash.c
> @@ -16,6 +16,8 @@
> #include <linux/delay.h>
> #include <linux/elf.h>
> #include <linux/elfcore.h>
> +#include <linux/module.h>
> +#include <linux/notifier.h>
>
> #include <asm/processor.h>
> #include <asm/hardirq.h>
> @@ -30,6 +32,9 @@
>
> int in_crash_kexec;
>
> +ATOMIC_NOTIFIER_HEAD(crash_notifier_list);
> +EXPORT_SYMBOL_GPL(crash_notifier_list);
> +
> #if defined(CONFIG_SMP) && defined(CONFIG_X86_LOCAL_APIC)
>
> static void kdump_nmi_callback(int cpu, struct pt_regs *regs)
> @@ -46,6 +51,8 @@ static void kdump_nmi_callback(int cpu, struct pt_regs *regs)
> #endif
> crash_save_cpu(regs, cpu);
>
> + atomic_notifier_call_chain(&crash_notifier_list, 0, NULL);
> +
> /* Disable VMX or SVM if needed.
> *
> * We need to disable virtualization on all CPUs.
> @@ -88,6 +95,8 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
>
> kdump_nmi_shootdown_cpus();
>
> + atomic_notifier_call_chain(&crash_notifier_list, 0, NULL);
> +
> /* Booting kdump kernel with VMX or SVM enabled won't work,
> * because (among other limitations) we can't disable paging
> * with the virt flags.
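For context, the consumer this patch has in mind is kvm-intel: on module load it
would register a handler on the chain that VMCLEARs the vmcss loaded on the local
cpu. A minimal sketch of that registration, assuming a made-up handler name and
the existing vmclear_local_loaded_vmcss() helper that comes up later in the thread:

/* Sketch only: how kvm-intel would hook the proposed notifier chain. */
static int vmclear_crash_notify(struct notifier_block *nb,
                                unsigned long action, void *data)
{
        /* VMCLEAR every vmcs loaded on this cpu so the kdump kernel sees clean memory. */
        vmclear_local_loaded_vmcss();
        return NOTIFY_OK;
}

static struct notifier_block vmclear_crash_nb = {
        .notifier_call = vmclear_crash_notify,
};

/* in vmx_init(): */
atomic_notifier_chain_register(&crash_notifier_list, &vmclear_crash_nb);

/* in vmx_exit(): */
atomic_notifier_chain_unregister(&crash_notifier_list, &vmclear_crash_nb);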
On Mon, Nov 26, 2012 at 09:08:54AM -0600, Eric W. Biederman wrote:
> Zhang Yanfei <[email protected]> writes:
>
> > This patch adds an atomic notifier list named crash_notifier_list.
> > Currently, when loading kvm-intel module, a notifier will be registered
> > in the list to enable vmcss loaded on all cpus to be VMCLEAR'd if
> > needed.
>
> crash_notifier_list ick gag please no. Effectively this makes the kexec
> on panic code path undebuggable.
>
> Instead we need to use direct function calls to whatever you are doing.
>
The code walks a linked list in the kvm-intel module and calls vmclear on
whatever it finds there. Since the function has to reside in the kvm-intel
module, it cannot be called directly. Would a callback pointer that is set
by kvm-intel be more acceptable?
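Something along these lines, say; this is only a sketch, and the crash_vmclear_fn
typedef and crash_vmclear_loaded_vmcss pointer are illustrative names, not taken
from an existing patch:

/* crash.c side: a single hook instead of a notifier chain. */
typedef void (crash_vmclear_fn)(void);

crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss;
EXPORT_SYMBOL_GPL(crash_vmclear_loaded_vmcss);

/* Called on each cpu from the crash shutdown path. */
static void cpu_crash_vmclear_loaded_vmcss(void)
{
        crash_vmclear_fn *do_vmclear;

        rcu_read_lock();
        do_vmclear = rcu_dereference(crash_vmclear_loaded_vmcss);
        if (do_vmclear)
                do_vmclear();
        rcu_read_unlock();
}

kvm-intel would rcu_assign_pointer() its function into the hook on load and clear
it again (followed by synchronize_rcu()) on unload, so the crash path never calls
into an unloaded module.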
> If a direct function call is too complex then the piece of code you want
> to call is almost certainly too complex to be calling on a code path
> like this.
>
> Eric
>
> > Signed-off-by: Zhang Yanfei <[email protected]>
> > ---
> > arch/x86/include/asm/kexec.h | 2 ++
> > arch/x86/kernel/crash.c | 9 +++++++++
> > 2 files changed, 11 insertions(+), 0 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
> > index 317ff17..5e22b00 100644
> > --- a/arch/x86/include/asm/kexec.h
> > +++ b/arch/x86/include/asm/kexec.h
> > @@ -163,6 +163,8 @@ struct kimage_arch {
> > };
> > #endif
> >
> > +extern struct atomic_notifier_head crash_notifier_list;
> > +
> > #endif /* __ASSEMBLY__ */
> >
> > #endif /* _ASM_X86_KEXEC_H */
> > diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> > index 13ad899..c5b2f70 100644
> > --- a/arch/x86/kernel/crash.c
> > +++ b/arch/x86/kernel/crash.c
> > @@ -16,6 +16,8 @@
> > #include <linux/delay.h>
> > #include <linux/elf.h>
> > #include <linux/elfcore.h>
> > +#include <linux/module.h>
> > +#include <linux/notifier.h>
> >
> > #include <asm/processor.h>
> > #include <asm/hardirq.h>
> > @@ -30,6 +32,9 @@
> >
> > int in_crash_kexec;
> >
> > +ATOMIC_NOTIFIER_HEAD(crash_notifier_list);
> > +EXPORT_SYMBOL_GPL(crash_notifier_list);
> > +
> > #if defined(CONFIG_SMP) && defined(CONFIG_X86_LOCAL_APIC)
> >
> > static void kdump_nmi_callback(int cpu, struct pt_regs *regs)
> > @@ -46,6 +51,8 @@ static void kdump_nmi_callback(int cpu, struct pt_regs *regs)
> > #endif
> > crash_save_cpu(regs, cpu);
> >
> > + atomic_notifier_call_chain(&crash_notifier_list, 0, NULL);
> > +
> > /* Disable VMX or SVM if needed.
> > *
> > * We need to disable virtualization on all CPUs.
> > @@ -88,6 +95,8 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
> >
> > kdump_nmi_shootdown_cpus();
> >
> > + atomic_notifier_call_chain(&crash_notifier_list, 0, NULL);
> > +
> > /* Booting kdump kernel with VMX or SVM enabled won't work,
> > * because (among other limitations) we can't disable paging
> > * with the virt flags.
--
Gleb.
Gleb Natapov <[email protected]> writes:
> On Mon, Nov 26, 2012 at 09:08:54AM -0600, Eric W. Biederman wrote:
>> Zhang Yanfei <[email protected]> writes:
>>
>> > This patch adds an atomic notifier list named crash_notifier_list.
>> > Currently, when loading kvm-intel module, a notifier will be registered
>> > in the list to enable vmcss loaded on all cpus to be VMCLEAR'd if
>> > needed.
>>
>> crash_notifier_list ick gag please no. Effectively this makes the kexec
>> on panic code path undebuggable.
>>
>> Instead we need to use direct function calls to whatever you are doing.
>>
> The code walks a linked list in the kvm-intel module and calls vmclear on
> whatever it finds there. Since the function has to reside in the kvm-intel
> module, it cannot be called directly. Would a callback pointer that is set
> by kvm-intel be more acceptable?
Yes, a specific callback function is more acceptable. Looking a little
deeper, vmclear_local_loaded_vmcss is not particularly acceptable. It is
doing a lot of work that is unnecessary for saving the virtual registers
on the kexec on panic path.
In fact I wonder if it might not just be easier to call vmcs_clear to a
fixed per cpu buffer.
Performing list walking in interrupt context without locking in
vmclear_local_loaded_vmcss looks a bit scary. Not that locking would
make it any better, as locking would simply add one more way to deadlock
the system. Only an rcu list walk is at all safe. A list walk that
modifies the list as vmclear_local_loaded_vmcss does is definitely not safe.
Eric
On Mon, Nov 26, 2012 at 11:43:10AM -0600, Eric W. Biederman wrote:
> Gleb Natapov <[email protected]> writes:
>
> > On Mon, Nov 26, 2012 at 09:08:54AM -0600, Eric W. Biederman wrote:
> >> Zhang Yanfei <[email protected]> writes:
> >>
> >> > This patch adds an atomic notifier list named crash_notifier_list.
> >> > Currently, when loading kvm-intel module, a notifier will be registered
> >> > in the list to enable vmcss loaded on all cpus to be VMCLEAR'd if
> >> > needed.
> >>
> >> crash_notifier_list ick gag please no. Effectively this makes the kexec
> >> on panic code path undebuggable.
> >>
> >> Instead we need to use direct function calls to whatever you are doing.
> >>
> > The code walks a linked list in the kvm-intel module and calls vmclear on
> > whatever it finds there. Since the function has to reside in the kvm-intel
> > module, it cannot be called directly. Would a callback pointer that is set
> > by kvm-intel be more acceptable?
>
> Yes a specific callback function is more acceptable. Looking a little
> deeper vmclear_local_loaded_vmcss is not particularly acceptable. It is
> doing a lot of work that is unnecessary to save the virtual registers
> on the kexec on panic path.
>
What work are you referring to in particular that may not be acceptable?
> In fact I wonder if it might not just be easier to call vmcs_clear to a
> fixed per cpu buffer.
>
There may be more than one vmcs loaded on a cpu, hence the list.
> Performing list walking in interrupt context without locking in
> vmclear_local_loaded_vmcss looks a bit scary. Not that locking would
> make it any better, as locking would simply add one more way to deadlock
> the system. Only an rcu list walk is at all safe. A list walk that
> modifies the list as vmclear_local_loaded_vmcss does is definitely not safe.
>
The list vmclear_local_loaded_vmcss walks is per cpu. Zhang's kvm patch
disables the kexec callback while the list is modified.
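Roughly, the guard is a per cpu enable bit that kvm-intel clears around any
manipulation of the list on the local cpu; a sketch, with the bitmap name
illustrative (crash_local_vmclear_enabled is the check that shows up in
Zhang's function quoted later in the thread):

static DECLARE_BITMAP(crash_vmclear_enabled_bitmap, NR_CPUS);

static inline void crash_enable_local_vmclear(int cpu)
{
        cpumask_set_cpu(cpu, to_cpumask(crash_vmclear_enabled_bitmap));
}

static inline void crash_disable_local_vmclear(int cpu)
{
        cpumask_clear_cpu(cpu, to_cpumask(crash_vmclear_enabled_bitmap));
}

static inline int crash_local_vmclear_enabled(int cpu)
{
        return cpumask_test_cpu(cpu, to_cpumask(crash_vmclear_enabled_bitmap));
}

The crash callback checks the bit for the cpu it is running on and simply skips
the list walk if a modification was in progress there.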
--
Gleb.
Gleb Natapov <[email protected]> writes:
> On Mon, Nov 26, 2012 at 11:43:10AM -0600, Eric W. Biederman wrote:
>> Gleb Natapov <[email protected]> writes:
>>
>> > On Mon, Nov 26, 2012 at 09:08:54AM -0600, Eric W. Biederman wrote:
>> >> Zhang Yanfei <[email protected]> writes:
>> >>
>> >> > This patch adds an atomic notifier list named crash_notifier_list.
>> >> > Currently, when loading kvm-intel module, a notifier will be registered
>> >> > in the list to enable vmcss loaded on all cpus to be VMCLEAR'd if
>> >> > needed.
>> >>
>> >> crash_notifier_list ick gag please no. Effectively this makes the kexec
>> >> on panic code path undebuggable.
>> >>
>> >> Instead we need to use direct function calls to whatever you are doing.
>> >>
>> > The code walks a linked list in the kvm-intel module and calls vmclear on
>> > whatever it finds there. Since the function has to reside in the kvm-intel
>> > module, it cannot be called directly. Would a callback pointer that is set
>> > by kvm-intel be more acceptable?
>>
>> Yes a specific callback function is more acceptable. Looking a little
>> deeper vmclear_local_loaded_vmcss is not particularly acceptable. It is
>> doing a lot of work that is unnecessary to save the virtual registers
>> on the kexec on panic path.
>>
> What work are you referring to in particular that may not be
> acceptable?
The unnecessary work that I see is all of the software state
changing: unlinking things from linked lists, flipping variables.
None of that appears related to the fundamental issue of saving cpu
state.

Simply reusing a function that does more than what is strictly required
makes me nervous. What is the chance that the function will grow
with maintenance and add constructs that are not safe in a kexec on
panic situation?
>> In fact I wonder if it might not just be easier to call vmcs_clear to a
>> fixed per cpu buffer.
>>
> There may be more than one vmcs loaded on a cpu, hence the list.
>
>> Performing list walking in interrupt context without locking in
>> vmclear_local_loaded_vmcss looks a bit scary. Not that locking would
>> make it any better, as locking would simply add one more way to deadlock
>> the system. Only an rcu list walk is at all safe. A list walk that
>> modifies the list as vmclear_local_loaded_vmcss does is definitely not safe.
>>
> The list vmclear_local_loaded_vmcss walks is per cpu. Zhang's kvm patch
> disables kexec callback while list is modified.
If the list is only modified on its cpu and we are running on that cpu,
that does look like it will give the necessary protections. It isn't
particularly clear at first glance that this is the case, unfortunately.
Eric
On 2012/11/27 02:18, Eric W. Biederman wrote:
> Gleb Natapov <[email protected]> writes:
>
>> On Mon, Nov 26, 2012 at 11:43:10AM -0600, Eric W. Biederman wrote:
>>> Gleb Natapov <[email protected]> writes:
>>>
>>>> On Mon, Nov 26, 2012 at 09:08:54AM -0600, Eric W. Biederman wrote:
>>>>> Zhang Yanfei <[email protected]> writes:
>>>>>
>>>>>> This patch adds an atomic notifier list named crash_notifier_list.
>>>>>> Currently, when loading kvm-intel module, a notifier will be registered
>>>>>> in the list to enable vmcss loaded on all cpus to be VMCLEAR'd if
>>>>>> needed.
>>>>>
>>>>> crash_notifier_list ick gag please no. Effectively this makes the kexec
>>>>> on panic code path undebuggable.
>>>>>
>>>>> Instead we need to use direct function calls to whatever you are doing.
>>>>>
>>>> The code walks a linked list in the kvm-intel module and calls vmclear on
>>>> whatever it finds there. Since the function has to reside in the kvm-intel
>>>> module, it cannot be called directly. Would a callback pointer that is set
>>>> by kvm-intel be more acceptable?
>>>
>>> Yes a specific callback function is more acceptable. Looking a little
>>> deeper vmclear_local_loaded_vmcss is not particularly acceptable. It is
>>> doing a lot of work that is unnecessary to save the virtual registers
>>> on the kexec on panic path.
>>>
>> What work are you referring to in particular that may not be
>> acceptable?
>
> The unnecessary work that I see is all of the software state
> changing: unlinking things from linked lists, flipping variables.
> None of that appears related to the fundamental issue of saving cpu
> state.
>
> Simply reusing a function that does more than what is strictly required
> makes me nervous. What is the chance that the function will grow
> with maintenance and add constructs that are not safe in a kexec on
> panic situation?
So in summary:

1. a specific callback function instead of a notifier?

2. Instead of calling vmclear_local_loaded_vmcss, the callback will just
call vmclear on every vmcs loaded on the cpu?

Like below:
static void crash_vmclear_local_loaded_vmcss(void)
{
        int cpu = raw_smp_processor_id();
        struct loaded_vmcs *v, *n;

        if (!crash_local_vmclear_enabled(cpu))
                return;

        list_for_each_entry_safe(v, n, &per_cpu(loaded_vmcss_on_cpu, cpu),
                                 loaded_vmcss_on_cpu_link)
                vmcs_clear(v->vmcs);
}
right?
Thanks
Zhang
>
>>> In fact I wonder if it might not just be easier to call vmcs_clear to a
>>> fixed per cpu buffer.
>>>
>> There may be more than one vmcs loaded on a cpu, hence the list.
>>
>>> Performing list walking in interrupt context without locking in
>>> vmclear_local_loaded_vmcss looks a bit scary. Not that locking would
>>> make it any better, as locking would simply add one more way to deadlock
>>> the system. Only an rcu list walk is at all safe. A list walk that
>>> modifies the list as vmclear_local_loaded_vmcss does is definitely not safe.
>>>
>> The list vmclear_local_loaded_vmcss walks is per cpu. Zhang's kvm patch
>> disables kexec callback while list is modified.
>
> If the list is only modified on its cpu and we are running on that cpu
> that does look like it will give the necessary protections. It isn't
> particularly clear at first glance that is the case unfortunately.
>
> Eric
>
Zhang Yanfei <[email protected]> writes:
> So in summary,
>
> 1. a specific callback function instead of a notifier?
Yes.
> 2. Instead of calling vmclear_local_loaded_vmcss, the vmclear operation
> will just call the vmclear on every vmcss loaded on the cpu?
>
> like below:
>
> static void crash_vmclear_local_loaded_vmcss(void)
> {
>         int cpu = raw_smp_processor_id();
>         struct loaded_vmcs *v, *n;
>
>         if (!crash_local_vmclear_enabled(cpu))
>                 return;
>
>         list_for_each_entry_safe(v, n, &per_cpu(loaded_vmcss_on_cpu, cpu),
>                                  loaded_vmcss_on_cpu_link)
>                 vmcs_clear(v->vmcs);
> }
>
> right?
Yeah that looks good. I would do list_for_each_entry because the list
isn't changing.
Eric
On 2012/11/27 09:49, Eric W. Biederman wrote:
> Zhang Yanfei <[email protected]> writes:
>
>> So in summary,
>>
>> 1. a specific callback function instead of a notifier?
>
> Yes.
>
>> 2. Instead of calling vmclear_local_loaded_vmcss, the vmclear operation
>> will just call the vmclear on every vmcss loaded on the cpu?
>>
>> like below:
>>
>> static void crash_vmclear_local_loaded_vmcss(void)
>> {
>>         int cpu = raw_smp_processor_id();
>>         struct loaded_vmcs *v, *n;
>>
>>         if (!crash_local_vmclear_enabled(cpu))
>>                 return;
>>
>>         list_for_each_entry_safe(v, n, &per_cpu(loaded_vmcss_on_cpu, cpu),
>>                                  loaded_vmcss_on_cpu_link)
>>                 vmcs_clear(v->vmcs);
>> }
>>
>> right?
>
> Yeah that looks good. I would do list_for_each_entry because the list
> isn't changing.
OK.
I will update the patch and resend it.
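For reference, with that change the helper from my earlier mail becomes simply
(the same sketch as before, just without the _safe variant and the extra cursor):

static void crash_vmclear_local_loaded_vmcss(void)
{
        int cpu = raw_smp_processor_id();
        struct loaded_vmcs *v;

        if (!crash_local_vmclear_enabled(cpu))
                return;

        list_for_each_entry(v, &per_cpu(loaded_vmcss_on_cpu, cpu),
                            loaded_vmcss_on_cpu_link)
                vmcs_clear(v->vmcs);
}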
Zhang