2020-01-08 10:56:58

by Paolo Bonzini

Subject: Re: discuss about pvpanic

On 08/01/20 09:25, zhenwei pi wrote:
> Hey, Paolo
>
> Currently, pvpanic only supports bit 0 (PVPANIC_PANICKED).
> We usually expect the guest to write to the ioport (typically 0x505) in a panic_notifier_list callback
> while handling a panic, so that we can handle the pvpanic event PVPANIC_PANICKED in QEMU.
>
> On the other hand, if the guest handles the crash with kdump-tools, it reboots without running any
> panic_notifier_list callback. So QEMU only knows that the guest has rebooted (because the guest
> writes to ioport 0xcf9 for an RCR request), but QEMU can't identify why the guest reset.
>
> In our production environment, we see 100+ guest reboot events every day; sadly, we
> can't separate abnormal reboots from normal operation.
>
> We want to add a new bit for the pvpanic event (maybe PVPANIC_CRASHLOADED) to indicate that the guest has crashed
> and that the panic is handled by the guest kernel. (Here is the previous patch: https://lkml.org/lkml/2019/12/14/265)
>
> What do you think about this solution? Or do you have any other suggestions?

Hi Zhenwei,

the kernel-side patch certainly makes sense. I assume that you want the
event to propagate up from QEMU to Libvirt and so on? The QEMU patch
would need to declare a new event (qapi/misc.json) and send it in
handle_event (hw/misc/pvpanic.c). I'm not familiar with Libvirt, so I'm
adding the respective list.
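
To make the QEMU side of this concrete, here is a minimal, self-contained sketch of how a write to the pvpanic port could be classified before an event is sent upward. It is not the code in hw/misc/pvpanic.c. The bit position of PVPANIC_CRASHLOADED, the enum, the function names, and the name "GUEST_CRASHLOADED" are assumptions made for illustration; only bit 0 (PVPANIC_PANICKED) and the GUEST_PANICKED event are taken from the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 0 is the existing event named in the thread. */
#define PVPANIC_PANICKED     (1u << 0)
/* ASSUMPTION: the thread only proposes this bit; position 1 is illustrative. */
#define PVPANIC_CRASHLOADED  (1u << 1)

/* Hypothetical stand-ins for the events QEMU would report to upper layers. */
enum guest_event {
    GUEST_EVENT_NONE,
    GUEST_EVENT_PANICKED,
    GUEST_EVENT_CRASHLOADED,
};

/* Classify the byte the guest wrote to the pvpanic port.
 * If both bits are set, the plain panic takes precedence.
 * Unknown bits are ignored. */
enum guest_event classify_pvpanic_write(uint8_t value)
{
    if (value & PVPANIC_PANICKED) {
        return GUEST_EVENT_PANICKED;
    }
    if (value & PVPANIC_CRASHLOADED) {
        return GUEST_EVENT_CRASHLOADED;
    }
    return GUEST_EVENT_NONE;
}

/* Map an event to the name reported upward.
 * "GUEST_CRASHLOADED" is a hypothetical name for the new event. */
const char *guest_event_name(enum guest_event ev)
{
    switch (ev) {
    case GUEST_EVENT_NONE:        return "";
    case GUEST_EVENT_PANICKED:    return "GUEST_PANICKED";
    case GUEST_EVENT_CRASHLOADED: return "GUEST_CRASHLOADED";
    }
    assert(!"unknown guest_event");
    return "";
}
```

In this sketch a write of 0x1 maps to the existing panic event, a write of 0x2 maps to the proposed crash-loaded event, and anything else is dropped.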

Another possibility is to simply not write to pvpanic if
kexec_crash_loaded() returns true; this would match what xen_panic_event
does for example. The kexec kernel would then log the panic normally,
without the need for MMIO at all. However, I have no problem with
adding a new bit to the pvpanic I/O port, so once you post the QEMU patch
I will certainly ack the kernel side.
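
The two guest-side options can be compared with a small userspace sketch. This is not kernel code: the policy enum and the function name are hypothetical, the boolean parameter stands in for the result of kexec_crash_loaded(), and the bit position of PVPANIC_CRASHLOADED is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PVPANIC_PANICKED     (1u << 0)
/* ASSUMPTION: illustrative bit position for the proposed event. */
#define PVPANIC_CRASHLOADED  (1u << 1)

/* Hypothetical names for the two options discussed above. */
enum crash_policy {
    POLICY_SKIP_WRITE,          /* do not touch pvpanic when a crash kernel is loaded */
    POLICY_REPORT_CRASHLOADED,  /* write the proposed new bit instead */
};

/* Return the value the guest would write to the pvpanic port on panic,
 * or 0 if it should not write at all.
 * crash_kernel_loaded stands in for kexec_crash_loaded(). */
uint8_t pvpanic_value_for_panic(bool crash_kernel_loaded,
                                enum crash_policy policy)
{
    if (!crash_kernel_loaded) {
        return PVPANIC_PANICKED;
    }
    if (policy == POLICY_SKIP_WRITE) {
        return 0;
    }
    assert(policy == POLICY_REPORT_CRASHLOADED);
    return PVPANIC_CRASHLOADED;
}
```

With POLICY_SKIP_WRITE the host sees nothing from pvpanic when kdump takes over, so it still cannot tell a crash reboot from a normal one; with POLICY_REPORT_CRASHLOADED it receives a distinct value.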

Thanks,

Paolo


2020-01-08 11:04:12

by Michal Privoznik

Subject: Re: discuss about pvpanic

On 1/8/20 10:36 AM, Paolo Bonzini wrote:
> On 08/01/20 09:25, zhenwei pi wrote:
>> Hey, Paolo
>>
>> Currently, pvpanic only supports bit 0 (PVPANIC_PANICKED).
>> We usually expect the guest to write to the ioport (typically 0x505) in a panic_notifier_list callback
>> while handling a panic, so that we can handle the pvpanic event PVPANIC_PANICKED in QEMU.
>>
>> On the other hand, if the guest handles the crash with kdump-tools, it reboots without running any
>> panic_notifier_list callback. So QEMU only knows that the guest has rebooted (because the guest
>> writes to ioport 0xcf9 for an RCR request), but QEMU can't identify why the guest reset.
>>
>> In our production environment, we see 100+ guest reboot events every day; sadly, we
>> can't separate abnormal reboots from normal operation.
>>
>> We want to add a new bit for the pvpanic event (maybe PVPANIC_CRASHLOADED) to indicate that the guest has crashed
>> and that the panic is handled by the guest kernel. (Here is the previous patch: https://lkml.org/lkml/2019/12/14/265)
>>
>> What do you think about this solution? Or do you have any other suggestions?
>
> Hi Zhenwei,
>
> the kernel-side patch certainly makes sense. I assume that you want the
> event to propagate up from QEMU to Libvirt and so on? The QEMU patch
> would need to declare a new event (qapi/misc.json) and send it in
> handle_event (hw/misc/pvpanic.c). I'm not familiar with Libvirt, so I'm
> adding the respective list.

Adding an event is fairly easy if all you want libvirt to do is report
the event to upper layers. I volunteer to do it. The question is how
QEMU is going to report this: as extra attributes on the GUEST_PANICKED
event or as a new event. But it is more important to merge the change
into the kernel.

Michal

2020-01-08 12:52:09

by Paolo Bonzini

Subject: Re: discuss about pvpanic

On 08/01/20 10:58, Michal Privoznik wrote:
>> the kernel-side patch certainly makes sense.  I assume that you want the
>> event to propagate up from QEMU to Libvirt and so on?  The QEMU patch
>> would need to declare a new event (qapi/misc.json) and send it in
>> handle_event (hw/misc/pvpanic.c).  I'm not familiar with Libvirt, so I'm
>> adding the respective list.
>
> Adding an event is fairly easy if all you want libvirt to do is report
> the event to upper layers. I volunteer to do it. The question is how
> QEMU is going to report this: as extra attributes on the GUEST_PANICKED
> event or as a new event.

I think it should be a new event; using GUEST_PANICKED could cause upper
layers to react by shutting down or rebooting the guest.
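
The concern can be illustrated with a small sketch of a management layer's reaction logic. Everything here is hypothetical (the enums, the function, and the idea of a single configured on-panic action); it only shows why a separate event avoids triggering the existing panic handling.

```c
#include <assert.h>

/* Hypothetical event and action types for a management layer. */
enum guest_event { EV_GUEST_PANICKED, EV_GUEST_CRASHLOADED };
enum mgmt_action { ACTION_REPORT_ONLY, ACTION_SHUTDOWN, ACTION_REBOOT };

/* Decide what to do with the guest when an event arrives.
 * on_panic is the action the operator configured for a guest panic. */
enum mgmt_action react_to_event(enum guest_event ev, enum mgmt_action on_panic)
{
    switch (ev) {
    case EV_GUEST_PANICKED:
        /* Existing behaviour: apply the configured panic action. */
        return on_panic;
    case EV_GUEST_CRASHLOADED:
        /* The guest's crash kernel is handling the panic, so the guest
         * must be left running; only record that a crash happened. */
        return ACTION_REPORT_ONLY;
    }
    assert(!"unknown guest_event");
    return ACTION_REPORT_ONLY;
}
```

If the crash-loaded case were folded into EV_GUEST_PANICKED, a configured ACTION_SHUTDOWN or ACTION_REBOOT would interrupt the guest before kdump could finish.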

Thanks,

Paolo