On Wed, 04 May 2022 21:00:42 +0100,
"Guilherme G. Piccoli" wrote:
>
> Hi folks, this email is to ask for feedback / trigger a discussion about
> the concept of a custom crash shutdown handler, which is "missing" on
> arm64 while present in many architectures [mips, powerpc, x86, sh (!)].
>
> Currently, when we kexec into a crash kernel on arm64, the function
> machine_crash_shutdown() is called as a handler to disable CPUs and
> (potentially) do extra quiesce work. In the aforementioned architectures,
> there's a way to override this function if, for example, a hypervisor
> wishes to have its guests run their own custom shutdown machinery.
>
> For powerpc/mips, the approach is a generic shutdown function that may
> call other, registered handler functions, whereas x86/sh rely on the
> "machine_ops" structure, with the crash shutdown as a callback in that
> struct.
>
> The usage for that is very broad, but heavy users are hypervisors like
> Hyper-V / KVM (CCed Michael and Vitaly here for this reason). The
> discussion about the need for that on arm64 comes from another thread [0],
> so before I start implementing/playing with that, I'd like to ask the ARM64
> community if there is any feedback and, in case it's positive, what the
> best implementation strategy is (struct callback vs. handler call), etc.
>
> I've CCed ARM64/ARM32 maintainers plus other people I found to be heavily
> involved with the ARM architecture - sorry if I added people I shouldn't
> have or if I forgot somebody (though the ARM mailing list is CCed).
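The two implementation strategies mentioned in the question (a machine_ops-style struct callback as on x86/sh, versus a generic shutdown function that calls registered handlers as on powerpc/mips) can be sketched in userspace C as below. This is an illustrative model under stated assumptions, not kernel code: struct regs, machine_ops_sketch, crash_handler_register(), generic_crash_shutdown(), hv_cleanup() and the other names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pt_regs; not a kernel type. */
struct regs { int cpu; };

static int native_calls; /* how many times the default path ran */
static int hv_calls;     /* how many times the hypervisor clean-up ran */

/* Default arch path: stop the other CPUs, minimal quiescing. */
static void native_crash_shutdown(struct regs *regs)
{
	(void)regs;
	native_calls++;
}

/* Hypothetical hypervisor-specific last-minute clean-up. */
static void hv_cleanup(struct regs *regs)
{
	(void)regs;
	hv_calls++;
}

/* Strategy A (x86/sh style): one overridable callback in an ops struct. */
struct machine_ops_sketch {
	void (*crash_shutdown)(struct regs *regs);
};

static struct machine_ops_sketch mops = {
	.crash_shutdown = native_crash_shutdown,
};

/* An override replaces the whole operation and must chain to the default. */
static void hv_crash_shutdown(struct regs *regs)
{
	hv_cleanup(regs);
	native_crash_shutdown(regs);
}

/* Strategy B (powerpc/mips style): fixed default path plus registered extras. */
#define MAX_CRASH_HANDLERS 4
typedef void (*crash_handler_t)(struct regs *regs);

static crash_handler_t crash_handlers[MAX_CRASH_HANDLERS];
static size_t nr_crash_handlers;

static int crash_handler_register(crash_handler_t fn)
{
	assert(fn != NULL);
	if (nr_crash_handlers == MAX_CRASH_HANDLERS)
		return -1; /* table full */
	crash_handlers[nr_crash_handlers++] = fn;
	return 0;
}

static void generic_crash_shutdown(struct regs *regs)
{
	for (size_t i = 0; i < nr_crash_handlers; i++)
		crash_handlers[i](regs); /* registered extras run first */
	native_crash_shutdown(regs);     /* then the default path */
}
```

With strategy A a platform does `mops.crash_shutdown = hv_crash_shutdown;` and owns the whole operation; with strategy B it calls `crash_handler_register(hv_cleanup)` and the default path stays fixed. Marc's point below is that arm64 deliberately has neither indirection.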
I have the feeling that you are conflating two different things here:

(1) general shutdown/reboot, whether this is because of a crash or not
(2) kexec, for which the whole point is that it is possible to handle
*everything* from within the kernel

On arm64:

(1) is already abstracted via PSCI. The hypervisor can do whatever it
wants there (KVM, not needing anything, just forwards this to
userspace for fun and profit -- if something has to be done, the VMM
is the right spot). I expect other hypervisors to do the same thing
(and that's what the architecture expects anyway).

(2) must, by definition, fit into the architectural envelope. If you
need help from another entity in the system to be able to kexec,
something is broken, because the hypervisor doesn't implement the
architecture correctly (and frankly, we really don't need much to be
able to kexec).

Not having any 'machine_ops' indirection was a conscious decision on
arm64, if only to avoid the nightmare that 32bit was at a time with
every single platform doing their own stuff. Introducing them would
not be an improvement, but simply the admission that hypervisors are
simply too broken for words. And I don't buy the "but x86 has it!"
argument. x86 is a nightmare of PV mess that we can happily ignore,
because we don't do PV for core operations at all.

If something has to be done to quiesce the system, it probably is
related to the system topology, and must be linked to it. We already
have these requirements in order to correctly stop ongoing DMA, shut
down IOMMUs, and other similar stuff. What other requirements does
your favourite hypervisor have?

Thanks,
M.
--
Without deviation from the norm, progress is not possible.
On 05/05/2022 04:29, Marc Zyngier wrote:
> [...]
> Not having any 'machine_ops' indirection was a conscious decision on
> arm64, if only to avoid the nightmare that 32bit was at a time with
> every single platform doing their own stuff. Introducing them would
> not be an improvement, but simply the admission that hypervisors are
> simply too broken for words. And I don't buy the "but x86 has it!"
> argument. x86 is a nightmare of PV mess that we can happily ignore,
> because we don't do PV for core operations at all.
>
> If something has to be done to quiesce the system, it probably is
> related to the system topology, and must be linked to it. We already
> have these requirements in order to correctly stop ongoing DMA, shut
> down IOMMUs, and other similar stuff. What other requirements does
> your favourite hypervisor have?
>
Thanks Marc and Mark for the details. I agree with most of it, and in
fact panic notifiers were the trigger for this discussion (they are
indeed used for this purpose, to some extent, in Hyper-V).

The idea of having this custom kexec handler comes from the Hyper-V
discussion. I think it's easier to show the code, so please take a look
at hv_machine_crash_shutdown() [arch/x86/kernel/cpu/mshyperv.c] and the
function called from there, hv_crash_handler() [drivers/hv/vmbus_drv.c].

These routines perform last-minute clean-ups right before kdump/kexec
happens, but *after* the panic notifiers have run. There seems to be no
way to accomplish that without either architecture involvement or
polluting the core kexec code, heh.
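The ordering described above (panic notifiers first, then a last-minute hypervisor clean-up, then the jump to the crash kernel) can be modelled with a heavily simplified userspace C sketch. It assumes a crash kernel is loaded and that the kernel runs panic notifiers before the crash kexec, as selected by the crash_kexec_post_notifiers boot parameter (by default the crash kexec runs first). Every name here (panic_path(), crash_shutdown_hook, the STEP_* values) is hypothetical; hv_crash_shutdown() only stands in for the role hv_machine_crash_shutdown() plays.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pt_regs; not a kernel type. */
struct regs { int cpu; };

/* Steps of the modelled panic path, recorded in the order they run. */
enum crash_step {
	STEP_PANIC_NOTIFIERS,
	STEP_HV_CLEANUP,
	STEP_NATIVE_SHUTDOWN,
	STEP_JUMP_TO_CRASH_KERNEL,
};

#define MAX_STEPS 8
static enum crash_step steps[MAX_STEPS];
static size_t nr_steps;

static void record(enum crash_step step)
{
	assert(nr_steps < MAX_STEPS);
	steps[nr_steps++] = step;
}

static void run_panic_notifiers(void)
{
	record(STEP_PANIC_NOTIFIERS);
}

/* Default arch path: stop the other CPUs, minimal quiescing. */
static void native_crash_shutdown(struct regs *regs)
{
	(void)regs;
	record(STEP_NATIVE_SHUTDOWN);
}

/* Override in the spirit of hv_machine_crash_shutdown(): hypervisor
 * clean-up first, then the default path. */
static void hv_crash_shutdown(struct regs *regs)
{
	record(STEP_HV_CLEANUP);
	native_crash_shutdown(regs);
}

/* Overridable crash shutdown hook; NULL selects the default. */
static void (*crash_shutdown_hook)(struct regs *regs);

static void crash_kexec(struct regs *regs)
{
	if (crash_shutdown_hook)
		crash_shutdown_hook(regs);
	else
		native_crash_shutdown(regs);
	record(STEP_JUMP_TO_CRASH_KERNEL); /* never returns in a real kernel */
}

/* Simplified panic path; post_notifiers models crash_kexec_post_notifiers. */
static void panic_path(struct regs *regs, bool post_notifiers)
{
	if (!post_notifiers) {
		crash_kexec(regs); /* default order: kexec before notifiers */
		return;
	}
	run_panic_notifiers();
	crash_kexec(regs);
}
```

With the hook set and post_notifiers true, the recorded order is notifiers, hypervisor clean-up, default shutdown, jump; the clean-up lands in the window after the notifiers that the Hyper-V code relies on.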
Anyway, the idea here was to gather feedback on how "receptive" the
arm64 community would be to allowing such customization. Your feedback
is much appreciated =)
Cheers,
Guilherme