2013-07-10 01:55:24

by Simon Horman

Subject: [PATCH] kexec: return error of machine_kexec() fails

From: Stephen Warren <[email protected]>

Prior to commit 3ab8352 "kexec jump", if machine_kexec() returned,
sys_reboot() would return -EINVAL. This patch restores this behaviour
for the non-KEXEC_JUMP case, where machine_kexec() is not expected to
return.

This situation can occur on ARM, where kexec requires disabling all but
one CPU using CPU hotplug. However, if hotplug isn't supported by the
particular HW the kernel is running on, then kexec cannot succeed.

Signed-off-by: Stephen Warren <[email protected]>
Acked-by: Will Deacon <[email protected]>
Acked-by: Zhang Yanfei <[email protected]>
Acked-by: Simon Horman <[email protected]>
---
kernel/kexec.c | 2 ++
1 file changed, 2 insertions(+)

Andrew, could you consider picking up this patch?

diff --git a/kernel/kexec.c b/kernel/kexec.c
index 59f7b55..bde1190 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1702,6 +1702,8 @@ int kernel_kexec(void)
 		pm_restore_console();
 		unlock_system_sleep();
 	}
+#else
+	error = -EINVAL;
 #endif

  Unlock:
--
1.8.2.1
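
For readers following the control flow, here is a minimal userspace sketch of what the two added lines do. It is only a model: model_machine_kexec() and model_kernel_kexec() are hypothetical names, and the build-time #ifdef CONFIG_KEXEC_JUMP is replaced by a runtime flag (an assumption of this sketch) so both configurations can be exercised in one program.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the arch hook. In this model it always
 * returns, which is the failure case the patch is concerned with.
 */
static void model_machine_kexec(void)
{
}

/*
 * Simplified model of kernel_kexec() after the patch. The real code
 * picks a branch at build time; the kexec_jump_built flag is an
 * assumption of this sketch and does not exist in the kernel.
 */
static int model_kernel_kexec(bool kexec_jump_built, bool preserve_context)
{
	int error = 0;

	model_machine_kexec();

	/* Reaching this point means machine_kexec() returned. */
	if (kexec_jump_built) {
		if (preserve_context) {
			/* kexec jump: resume path, a successful return */
		}
	} else {
		error = -EINVAL;	/* the two lines added by the patch */
	}

	return error;
}
```

In this model, model_kernel_kexec(false, false) yields -EINVAL, while any call with kexec_jump_built set still yields 0, because the patch as posted only touches the #else branch.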


2013-07-10 14:36:55

by Eric W. Biederman

Subject: Re: [PATCH] kexec: return error of machine_kexec() fails

Simon Horman <[email protected]> writes:

> From: Stephen Warren <[email protected]>
>
> Prior to commit 3ab8352 "kexec jump", if machine_kexec() returned,
> sys_reboot() would return -EINVAL. This patch restores this behaviour
> for the non-KEXEC_JUMP case, where machine_kexec() is not expected to
> return.
>
> This situation can occur on ARM, where kexec requires disabling all but
> one CPU using CPU hotplug. However, if hotplug isn't supported by the
> particular HW the kernel is running on, then kexec cannot succeed.

Ugh. This reasoning is nonsense. Prior to the kexec jump work
machine_kexec could never return and so could never return -EINVAL.

It is not ok to have an image loaded that we can not kexec. kexec_load
should fail, not machine_shutdown or machine_kexec.

The only time that machine_kexec can validly return is in the kexec_jump
case, and that is a successful return.

So formally:

Nacked-by: "Eric W. Biederman" <[email protected]>

My apologies for not speaking up sooner; the broken ARM multi-cpu
shutdown architecture hurts my brain to think about.

ARM needs to get its act together and stop modifying the generic code
to deal with its broken multi-cpu architecture.

Eric

> Signed-off-by: Stephen Warren <[email protected]>
> Acked-by: Will Deacon <[email protected]>
> Acked-by: Zhang Yanfei <[email protected]>
> Acked-by: Simon Horman <[email protected]>
> ---
> kernel/kexec.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> Andrew, could you consider picking up this patch?
>
> diff --git a/kernel/kexec.c b/kernel/kexec.c
> index 59f7b55..bde1190 100644
> --- a/kernel/kexec.c
> +++ b/kernel/kexec.c
> @@ -1702,6 +1702,8 @@ int kernel_kexec(void)
> pm_restore_console();
> unlock_system_sleep();
> }
> +#else
> + error = -EINVAL;
> #endif
>
> Unlock:

2013-07-10 19:09:20

by Stephen Warren

Subject: Re: [PATCH] kexec: return error of machine_kexec() fails

On 07/10/2013 08:36 AM, Eric W. Biederman wrote:
> Simon Horman <[email protected]> writes:
>
>> From: Stephen Warren <[email protected]>
>>
>> Prior to commit 3ab8352 "kexec jump", if machine_kexec() returned,
>> sys_reboot() would return -EINVAL. This patch restores this behaviour
>> for the non-KEXEC_JUMP case, where machine_kexec() is not expected to
>> return.
>>
>> This situation can occur on ARM, where kexec requires disabling all but
>> one CPU using CPU hotplug. However, if hotplug isn't supported by the
>> particular HW the kernel is running on, then kexec cannot succeed.
>
> Ugh. This reasoning is nonsense. Prior to the kexec jump work
> machine_kexec could never return and so could never return -EINVAL.

Well, any function /can/ return. Perhaps there was some undocumented
requirement that machine_kexec() was not allowed to return? I did test
it, and everything appears to work fine if it does return, aside from
the error code.

> It is not ok to have an image loaded that we can not kexec. kexec_load
> should fail, not machine_shutdown or machine_kexec.

Hmm. I suppose one option is to enhance ARM's machine_kexec_prepare(),
which is called from kexec_load(), and have that fail unless either the
current HW is non-SMP, or full CPU HW/driver hotplug/PM support is
available, so that it's guaranteed that machine_shutdown() will be able
to fully disable all but one CPU.

Would that be acceptable?
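
To make the proposal concrete, a rough userspace sketch of such a load-time check follows. Everything in it is hypothetical: struct model_platform, its fields, the model_* function names, and the choice of -EINVAL as the error are assumptions for illustration, not existing kernel interfaces.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical description of the running platform; not a kernel type. */
struct model_platform {
	unsigned int nr_cpus;		/* 1 means non-SMP hardware */
	bool can_disable_cpus;		/* full CPU hotplug/PM support present */
};

/*
 * Model of an arch prepare hook: refuse the image unless shutdown is
 * guaranteed to be able to leave exactly one CPU running.
 */
static int model_machine_kexec_prepare(const struct model_platform *p)
{
	if (p->nr_cpus > 1 && !p->can_disable_cpus)
		return -EINVAL;	/* assumed error code */
	return 0;
}

/*
 * Model of the load path: the prepare hook runs first, so a platform
 * that cannot complete a kexec never ends up with an image loaded.
 */
static int model_kexec_load(const struct model_platform *p)
{
	int ret = model_machine_kexec_prepare(p);

	if (ret)
		return ret;	/* userspace sees the failure at load time */
	return 0;		/* image accepted */
}
```

With this shape, an SMP platform without hotplug support is rejected when the image is loaded, and the later shutdown path never has to report an error.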

Other alternatives would be:

a) Force the user to disable (hot unplug) the CPUs themselves before
calling kexec_load(). This seems rather onerous, and could be defeated
by replugging them between kexec_load() and kernel_kexec().

b) Actually modifying kexec_load() to disable the CPUs, at the point
where it's legal for it to fail. However, I suspect some use-cases call
kexec_load() a long time before kernel_kexec(), so this would end up
disabling SMP way too early.

> ARM needs to get its act together and stop modifying the generic code
> to deal with its broken multi-cpu architecture.

A standardized in-CPU mechanism for disabling CPUs as part of the ARM
architecture would be nice. However, even if that appears today, it's
not going to help all the already extant systems that don't have it.

2013-07-10 20:42:42

by Eric W. Biederman

Subject: Re: [PATCH] kexec: return error of machine_kexec() fails

Stephen Warren <[email protected]> writes:

> On 07/10/2013 08:36 AM, Eric W. Biederman wrote:
>> Simon Horman <[email protected]> writes:
>>
>>> From: Stephen Warren <[email protected]>
>>>
>>> Prior to commit 3ab8352 "kexec jump", if machine_kexec() returned,
>>> sys_reboot() would return -EINVAL. This patch restores this behaviour
>>> for the non-KEXEC_JUMP case, where machine_kexec() is not expected to
>>> return.
>>>
>>> This situation can occur on ARM, where kexec requires disabling all but
>>> one CPU using CPU hotplug. However, if hotplug isn't supported by the
>>> particular HW the kernel is running on, then kexec cannot succeed.
>>
>> Ugh. This reasoning is nonsense. Prior to the kexec jump work
>> machine_kexec could never return and so could never return -EINVAL.
>
> Well, any function /can/ return. Perhaps there was some undocumented
> requirement that machine_kexec() was not allowed to return?

I think the name and the lack of an error code are in general a strong
indication that machine_kexec should not return, as returning is
semantically wrong (barring kexec_jump). There is the additional fact
that machine_kexec does not return.

> I did test
> it, and everything appears to work fine if it does return, aside from
> the error code.

My point was really that semantically you are failing in the wrong
location.


>> It is not ok to have an image loaded that we can not kexec. kexec_load
>> should fail, not machine_shutdown or machine_kexec.
>
> Hmm. I suppose one option is to enhance ARM's machine_kexec_prepare(),
> which is called from kexec_load(), and have that fail unless either the
> current HW is non-SMP, or full CPU HW/driver hotplug/PM support is
> available, so that it's guaranteed that machine_shutdown() will be able
> to fully disable all but one CPU.
>
> Would that be acceptable?

Yes. Failing in kexec_load via ARM's machine_kexec_prepare seems much
more appropriate, and it is where userspace will expect and be prepared
to deal with a failure.

> Other alternatives would be:
>
> a) Force the user to disable (hot unplug) the CPUs themselves before
> calling kexec_load(). This seems rather onerous, and could be defeated
> by replugging them between kexec_load() and kernel_kexec().
>
> b) Actually modifying kexec_load() to disable the CPUs, at the point
> where it's legal for it to fail. However, I suspect some use-cases call
> kexec_load() a long time before kernel_kexec(), so this would end up
> disabling SMP way too early.
>
>> ARM needs to get its act together and stop modifying the generic code
>> to deal with its broken multi-cpu architecture.
>
> A standardized in-CPU mechanism for disabling CPUs as part of the ARM
> architecture would be nice. However, even if that appears today, it's
> not going to help all the already extant systems that don't have it.

I meant code, not hardware architecture. We keep having code thrown in
the shutdown paths because ARM only supports cpu shutdown via cpu
hotunplug and cpu hotunplug is not universally available.

That is a software architecture BUG with the ARM kernels.

I admit that using cpu hotunplug for everything sounds good on paper, but
in practice cpu hotunplug is a nasty heavyweight monster that is much
harder to support than other cpu shutdown schemes.

Eric

2013-07-10 23:57:25

by Russell King - ARM Linux

Subject: Re: [PATCH] kexec: return error of machine_kexec() fails

On Wed, Jul 10, 2013 at 01:42:17PM -0700, Eric W. Biederman wrote:
> I meant code, not hardware architecture. We keep having code thrown in
> the shutdown paths because ARM only supports cpu shutdown via cpu
> hotunplug and cpu hotunplug is not universally available.
>
> That is a software architecture BUG with the ARM kernels.

There are many problems here:

1) if you can't place a CPU individually into reset, what do you do with
it over a kexec?

Once woken, it executes code. It will never stop executing code. If you
place it inside an infinite loop in the existing kernel, and then
overwrite that loop, it will start executing whatever code is now there
once the new kernel broadcasts the cache flushes.

2) if the CPU itself needs to execute code to shut itself down, how do
you get it to do that on a crash-based kexec when IPI broadcasts may not
work?

3) what about situations where you need the requestor to also do something
to shut down the secondary CPU?

Taking CPUs offline is not an easy thing to do when every platform plays
its own games with doing that, and then you have security crap that
gets in the way too, with the security crap having platform-specific
interfaces.

CPU hotplug is by far the best solution we have to this mess.