2021-11-29 16:56:53

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [External] Re: [PATCH] x86/purgatory: provide config to disable purgatory

Usama Arif <[email protected]> writes:

> Hi,
>
> Thanks for your replies. I have submitted a v2 of the patch with a
> much more detailed commit message including reason for the patch and timing values.
>
> The time taken from reboot to running init process was measured
> with both purgatory enabled and disabled over 20 runs and the
> averages are:
> Purgatory disabled:
> - TSC = 3908766161 cycles
> - ktime = 606.8 ms
> Purgatory enabled:
> - TSC = 5005811885 cycles (28.1% worse)
> - ktime = 843.1 ms (38.9% worse)
>
>
> Our reason for this patch is that it helps reduce the downtime of servers when
> the host kernel managing multiple VMs needs to be updated via kexec,
> but it makes reboot with kexec much faster so should be a general improvement in
> boot time if purgatory is not needed and could have other usecases as well.
> I believe only x86, powerpc and s390 have purgatory supported, other platforms
> like arm64 dont have it implemented yet, so with the reboot time improvement seen,
> it would be a good idea to have the option to disable purgatory completely but set default to y.
> We also have the CONFIG_KEXEC_BZIMAGE_VERIFY_SIG which can be enabled to verify the next
> kernel image to be booted and purgatory can be completely skipped if
> not required.

CONFIG_KEXEC_BZIMAGE_VERIFY_SIG is something totally and completely
different. It's job is to verify that the kernel to be booted comes
from a trusted source. The sha256 verification in purgatory's job
is to verify that memory the kernel cares about was not corrupted
during the kexec process.

I believe when you say purgatory you are really talking about that
sha256 checksum. It really is not possible to disable all of
the code that runs between kernels, as the old and the new kernel may
run at the same addresses. Anything that runs between the two kernels
is what is referred to as purgatory. Even if it is just a small
assembly stub.

That sha256 verification is always needed for kexec on panic, there are
by the nature of a kernel panic too many unknowns to have any confidence
the new kernel will not be corrupted in the process of kexec before it
gets started.

For an ordinary kexec it might be possible to say that you have a
reliable kernel shutdown process and you know for a fact that something
won't come along and corrupt the kernel. I find that a questionable
assertion. I haven't seen anyone yet whose focus when getting an
ordinary kexec to work as anything other than making certain all of the
drivers are shutdown properly.

I have seen countless times when a network packet comes in a the wrong
time and the target kernel's memory is corrupted before it gets far
enough to initialize the network driver.

For a 0.2s speed up you are talking about disabling all of the safety
checks in a very dangerous situation. How much can you can in
performance by optimizing the sha256 implementation instead of using
what is essentially a reference implementation in basic C that I copied
from somewhere long ago.

Optimize the sha256 implementation and the memory copy loop and then
show how the tiny bit of time that is left is on a mission critical path
and must be removed. Then we can reasonably talk about a config option
for disabling the sha256 implementation in the kexec in not-panic case.

That sha256 implementation in part so that we can all sleep at night
because we don't have to deal with very very strange heizenbugs.

Eric



2021-12-01 10:30:14

by Usama Arif

[permalink] [raw]
Subject: Re: [External] Re: [PATCH] x86/purgatory: provide config to disable purgatory

Hi,

(Resending the reply as my email client had updated and inserted html
code and caused a bounceback from the mailing list, sorry about that.)

Thanks for your reply, I have responded with further comments/questions
inline below, and also have provided some context for the patch at the
start:

The patch is not introducing a new CONFIG option, as can be seen in the
v2 patch diff. It is converting an existing CONFIG option that only
enabled purgatory for specific architectures in which it has been
implemented (x86, powerpc and s390) to an option that can that allow
purgatory to be disabled (with default enabled) and only provides code
to disable purgatory for x86 only. From what i see, purgatory is
currently not yet implemented in other architectures like arm64
kexec_file case and riscv, so this would on a high level,
provide only the option to make kexec on x86 similar to other the other
architectures.

As a background to the discussion, the usecase we are aiming is to
update the host kernel with kexec in servers which is managing multiple
VMs, to cut down the downtime of servers as much as possible so that its
not noticeable to VMs. The patch is aimed at x86 purgatory code
specifically. We are targetting other optimizations as well in other
areas in boot path to cut down the 600ms time much further, so that cut
down of 200ms downtime is significant in the usecase described. I did
have a few comments/questions about your reply, please see my responses
below:


On 29/11/2021 16:53, Eric W. Biederman wrote:
> Usama Arif <[email protected]> writes:
>
>> Hi,
>>
>> Thanks for your replies. I have submitted a v2 of the patch with a
>> much more detailed commit message including reason for the patch and timing values.
>>
>> The time taken from reboot to running init process was measured
>> with both purgatory enabled and disabled over 20 runs and the
>> averages are:
>> Purgatory disabled:
>> - TSC = 3908766161 cycles
>> - ktime = 606.8 ms
>> Purgatory enabled:
>> - TSC = 5005811885 cycles (28.1% worse)
>> - ktime = 843.1 ms (38.9% worse)
>>
>>
>> Our reason for this patch is that it helps reduce the downtime of servers when
>> the host kernel managing multiple VMs needs to be updated via kexec,
>> but it makes reboot with kexec much faster so should be a general improvement in
>> boot time if purgatory is not needed and could have other usecases as well.
>> I believe only x86, powerpc and s390 have purgatory supported, other platforms
>> like arm64 dont have it implemented yet, so with the reboot time improvement seen,
>> it would be a good idea to have the option to disable purgatory completely but set default to y.
>> We also have the CONFIG_KEXEC_BZIMAGE_VERIFY_SIG which can be enabled to verify the next
>> kernel image to be booted and purgatory can be completely skipped if
>> not required.
>
> CONFIG_KEXEC_BZIMAGE_VERIFY_SIG is something totally and completely
> different. It's job is to verify that the kernel to be booted comes
> from a trusted source. The sha256 verification in purgatory's job
> is to verify that memory the kernel cares about was not corrupted
> during the kexec process.
>

Thanks, acknowledged.

> I believe when you say purgatory you are really talking about that
> sha256 checksum. It really is not possible to disable all of
> the code that runs between kernels, as the old and the new kernel may
> run at the same addresses. Anything that runs between the two kernels
> is what is referred to as purgatory. Even if it is just a small
> assembly stub.
>
>

With this patch, i am trying to give an option (with default purgatory
enabled)to disable purgatory completely on x86 only, i.e. no code
running between 2 kernels on x86. From my understanding there is no
purgatory in arm64 kexec_file case, in riscv and in some other archs as
well, so I am not sure why its not possible to disable purgatory (all
code running between 2 kernels) in x86? Unless i misunderstood something
about the working of other platforms? In x86 case, from what i see
relocate_kernel is still part of the older kernel and not purgatory, and
with my patch, purgatory.ro is not built and kexec does execute
successfully with purgatory disabled.

About your point for the old and the new kernel may run at the same
addresses,i dont think that can happen as in bzImage64_load function its
loaded using the kbuf struct. This doesnt happen in other architectures
(for e.g. arm64) that dont implement purgatory and any of the tests I
conducted with the patch applied due to the use of kbuf struct.

From what i see in the code, purgatory in x86 specifically, has 2 main
functions, sha256 verification and loading the %rsi register for
bootparam_load_addr. In this patch purgatory was disabled, which
resulted in sha256 verification disabled and bootparam_load_addr moved
to relocate_kernel. Our analysis revealed that most of the time is spent
in the sha256 verification, so I would be ok to reformat the patch
to make the sha verification optional and keep purgatory enabled if
thats preferred? Although in my opinion the current patch of only
providing an option to disable purgatory seems much better.


> That sha256 verification is always needed for kexec on panic, there are
> by the nature of a kernel panic too many unknowns to have any confidence
> the new kernel will not be corrupted in the process of kexec before it
> gets started.
>
> For an ordinary kexec it might be possible to say that you have a
> reliable kernel shutdown process and you know for a fact that something
> won't come along and corrupt the kernel. I find that a questionable
> assertion. I haven't seen anyone yet whose focus when getting an
> ordinary kexec to work as anything other than making certain all of the
> drivers are shutdown properly.
>
> I have seen countless times when a network packet comes in a the wrong
> time and the target kernel's memory is corrupted before it gets far
> enough to initialize the network driver. >

I agree that when doing testing and research, there are things that can
go wrong during kexec process, but i assume that the expectation is that
in production environment, when for e.g. updating a kernel with kexec in
servers,the expectation is that kexec will execute successfully as long
as there is nothing wrong in the old kernel and the new kernel.

Maybe I didn't understand this correctly, but if a network packet comes
in and corrupts the kernel memory, wont it also cause a problem even
when purgatory is enabled? I agree that issue like these would be caught
earlier with purgatory, but then if something like this breaks the kexec
process in a production environment where the old and new kernel are
well tested, trusted and expected to work, wouldn't there be a much more
fundamental issue with the reliability of the current kexec process.

> For a 0.2s speed up you are talking about disabling all of the safety
> checks in a very dangerous situation. How much can you can in
> performance by optimizing the sha256 implementation instead of using
> what is essentially a reference implementation in basic C that I copied
> from somewhere long ago.
>
> Optimize the sha256 implementation and the memory copy loop and then
> show how the tiny bit of time that is left is on a mission critical path
> and must be removed. Then we can reasonably talk about a config option
> for disabling the sha256 implementation in the kexec in not-panic case.
>
Thanks for this, I assume that over here you are suggesting for e.g. in
x86 to replace the existing sha verification implementation in purgatory
with the one in arch/x86/crypto/sha256_ssse3_glue.c? Also could you
point to the memory copy loop to optimize? I can have a look at these 2
options.

Even with these optimizations, i think the v2 patch should still be
considered. The patch in the end is providing only an option to disable
purgatory on x86 only, with the default value of keeping it enabled and
not changing kexec behaviour. The CONFIG already existed, it is just
renamed and now provides user the option to select to disable purgatory
for x86, rather than it being architecture dependent.

Thanks again for the review and comments,
Usama


> That sha256 implementation in part so that we can all sleep at night
> because we don't have to deal with very very strange heizenbugs.
>
> Eric
>

2021-12-06 16:51:14

by Usama Arif

[permalink] [raw]
Subject: Re: [External] Re: [PATCH] x86/purgatory: provide config to disable purgatory

Hi,

I have sent a v3 of the patch with a much clearer commit message, please
let me know if there are any comments for the v3 patch or my responses
below.

Thanks,
Usama

On 01/12/2021 10:29, Usama Arif wrote:
> Hi,
>
> (Resending the reply as my email client had updated and inserted html
> code and caused a bounceback from the mailing list, sorry about that.)
>
> Thanks for your reply, I have responded with further comments/questions
> inline below, and also have provided some context for the patch at the
> start:
>
> The patch is not introducing a new CONFIG option, as can be seen in the
> v2 patch diff. It is converting an existing CONFIG option that only
> enabled purgatory for specific architectures in which it has been
> implemented (x86, powerpc and s390) to an option that can that allow
> purgatory to be disabled (with default enabled) and only provides code
> to disable purgatory for x86 only. From what i see, purgatory is
> currently not yet implemented in other architectures like arm64
> kexec_file case and riscv, so this would on a high level,
> provide only the option to make kexec on x86 similar to other the other
> architectures.
>
> As a background to the discussion, the usecase we are aiming is to
> update the host kernel with kexec in servers which is managing multiple
> VMs, to cut down the downtime of servers as much as possible so that its
> not noticeable to VMs. The patch is aimed at x86 purgatory code
> specifically. We are targetting other optimizations as well in other
> areas in boot path to cut down the 600ms time much further, so that cut
> down of 200ms downtime is significant in the usecase described. I did
> have a few comments/questions about your reply, please see my responses
> below:
>
>
> On 29/11/2021 16:53, Eric W. Biederman wrote:
>> Usama Arif <[email protected]> writes:
>>
>>> Hi,
>>>
>>> Thanks for your replies. I have submitted a v2 of the patch with a
>>> much more detailed commit message including reason for the patch and
>>> timing values.
>>>
>>> The time taken from reboot to running init process was measured
>>> with both purgatory enabled and disabled over 20 runs and the
>>> averages are:
>>> Purgatory disabled:
>>> - TSC = 3908766161 cycles
>>> - ktime = 606.8 ms
>>> Purgatory enabled:
>>> - TSC = 5005811885 cycles (28.1% worse)
>>> - ktime = 843.1 ms (38.9% worse)
>>>
>>>
>>> Our reason for this patch is that it helps reduce the downtime of
>>> servers when
>>> the host kernel managing multiple VMs needs to be updated via kexec,
>>> but it makes reboot with kexec much faster so should be a general
>>> improvement in
>>> boot time if purgatory is not needed and could have other usecases as
>>> well.
>>> I believe only x86, powerpc and s390 have purgatory supported, other
>>> platforms
>>> like arm64 dont have it implemented yet, so with the reboot time
>>> improvement seen,
>>> it would be a good idea to have the option to disable purgatory
>>> completely but set default to y.
>>> We also have the CONFIG_KEXEC_BZIMAGE_VERIFY_SIG which can be enabled
>>> to verify the next
>>> kernel image to be booted and purgatory can be completely skipped if
>>> not required.
>>
>> CONFIG_KEXEC_BZIMAGE_VERIFY_SIG is something totally and completely
>> different.  It's job is to verify that the kernel to be booted comes
>> from a trusted source.   The sha256 verification in purgatory's job
>> is to verify that memory the kernel cares about was not corrupted
>> during the kexec process.
>>
>
> Thanks, acknowledged.
>
>> I believe when you say purgatory you are really talking about that
>> sha256 checksum.  It really is not possible to disable all of
>> the code that runs between kernels, as the old and the new kernel may
>> run at the same addresses.  Anything that runs between the two kernels
>> is what is referred to as purgatory.  Even if it is just a small
>> assembly stub.
>>
> >
>
> With this patch, i am trying to give an option (with default purgatory
> enabled)to disable purgatory completely on x86 only, i.e. no code
> running between 2 kernels on x86. From my understanding there is no
> purgatory in arm64 kexec_file case, in riscv and in some other archs as
> well, so I am not sure why its not possible to disable purgatory (all
> code running between 2 kernels) in x86? Unless i misunderstood something
> about the working of other platforms? In x86 case, from what i see
> relocate_kernel is still part of the older kernel and not purgatory, and
> with my patch, purgatory.ro is not built and kexec does execute
> successfully with purgatory disabled.
>
> About your point for the old and the new kernel may run at the same
> addresses,i dont think that can happen as in bzImage64_load function its
> loaded using the kbuf struct. This doesnt happen in other architectures
> (for e.g. arm64) that dont implement purgatory and any of the tests I
> conducted with the patch applied due to the use of kbuf struct.
>
> From what i see in the code, purgatory in x86 specifically, has 2 main
> functions, sha256 verification and loading the %rsi register for
> bootparam_load_addr. In this patch purgatory was disabled, which
> resulted in sha256 verification disabled and bootparam_load_addr moved
> to relocate_kernel. Our analysis revealed that most of the time is spent
> in the sha256 verification, so I would be ok to reformat the patch
> to make the sha verification optional and keep purgatory enabled if
> thats preferred? Although in my opinion the current patch of only
> providing an option to disable purgatory seems much better.
>
>
>> That sha256 verification is always needed for kexec on panic, there are
>> by the nature of a kernel panic too many unknowns to have any confidence
>> the new kernel will not be corrupted in the process of kexec before it
>> gets started.
>>
>> For an ordinary kexec it might be possible to say that you have a
>> reliable kernel shutdown process and you know for a fact that something
>> won't come along and corrupt the kernel.  I find that a questionable
>> assertion.  I haven't seen anyone yet whose focus when getting an
>> ordinary kexec to work as anything other than making certain all of the
>> drivers are shutdown properly.
>>
>> I have seen countless times when a network packet comes in a the wrong
>> time and the target kernel's memory is corrupted before it gets far
>> enough to initialize the network driver. >
>
> I agree that when doing testing and research, there are things that can
> go wrong during kexec process, but i assume that the expectation is that
> in production environment, when for e.g. updating a kernel with kexec in
> servers,the expectation is that kexec will execute successfully as long
> as there is nothing wrong in the old kernel and the new kernel.
>
> Maybe I didn't understand this correctly, but if a network packet comes
> in and corrupts the kernel memory, wont it also cause a problem even
> when purgatory is enabled? I agree that issue like these would be caught
> earlier with purgatory, but then if something like this breaks the kexec
> process in a production environment where the old and new kernel are
> well tested, trusted and expected to work, wouldn't there be a much more
> fundamental issue with the reliability of the current kexec process.
>
>> For a 0.2s speed up you are talking about disabling all of the safety
>> checks in a very dangerous situation.  How much can you can in
>> performance by optimizing the sha256 implementation instead of using
>> what is essentially a reference implementation in basic C that I copied
>> from somewhere long ago.
>>
>> Optimize the sha256 implementation and the memory copy loop and then
>> show how the tiny bit of time that is left is on a mission critical path
>> and must be removed.  Then we can reasonably talk about a config option
>> for disabling the sha256 implementation in the kexec in not-panic case.
>>
> Thanks for this, I assume that over here you are suggesting for e.g. in
> x86 to replace the existing sha verification implementation in purgatory
> with the one in arch/x86/crypto/sha256_ssse3_glue.c? Also could you
> point to the memory copy loop to optimize? I can have a look at these 2
> options.
>
> Even with these optimizations, i think the v2 patch should still be
> considered. The patch in the end is providing only an option to disable
> purgatory on x86 only, with the default value of keeping it enabled and
> not changing kexec behaviour. The CONFIG already existed, it is just
> renamed and now provides user the option to select to disable purgatory
> for x86, rather than it being architecture dependent.
>
> Thanks again for the review and comments,
> Usama
>
>
>> That sha256 implementation in part so that we can all sleep at night
>> because we don't have to deal with very very strange heizenbugs.
>>
>> Eric
>>

2021-12-16 18:05:07

by Usama Arif

[permalink] [raw]
Subject: Re: [PATCH] x86/purgatory: provide config to disable purgatory

Hi,

Just wanted to check again if there were any comments on my responses
below or the v3 of the patch at
https://lore.kernel.org/lkml/[email protected]/.

Thanks!
Usama

On 06/12/2021 16:51, Usama Arif wrote:
> Hi,
>
> I have sent a v3 of the patch with a much clearer commit message, please
> let me know if there are any comments for the v3 patch or my responses
> below.
>
> Thanks,
> Usama
>
> On 01/12/2021 10:29, Usama Arif wrote:
>> Hi,
>>
>> (Resending the reply as my email client had updated and inserted html
>> code and caused a bounceback from the mailing list, sorry about that.)
>>
>> Thanks for your reply, I have responded with further
>> comments/questions inline below, and also have provided some context
>> for the patch at the start:
>>
>> The patch is not introducing a new CONFIG option, as can be seen in
>> the v2 patch diff. It is converting an existing CONFIG option that
>> only enabled purgatory for specific architectures in which it has been
>> implemented (x86, powerpc and s390) to an option that can that allow
>> purgatory to be disabled (with default enabled) and only provides code
>> to disable purgatory for x86 only. From what i see, purgatory is
>> currently not yet implemented in other architectures like arm64
>> kexec_file case and riscv, so this would on a high level,
>> provide only the option to make kexec on x86 similar to other the
>> other architectures.
>>
>> As a background to the discussion, the usecase we are aiming is to
>> update the host kernel with kexec in servers which is managing
>> multiple VMs, to cut down the downtime of servers as much as possible
>> so that its not noticeable to VMs. The patch is aimed at x86 purgatory
>> code specifically. We are targetting other optimizations as well in
>> other areas in boot path to cut down the 600ms time much further, so
>> that cut down of 200ms downtime is significant in the usecase
>> described. I did have a few comments/questions about your reply,
>> please see my responses below:
>>
>>
>> On 29/11/2021 16:53, Eric W. Biederman wrote:
>>> Usama Arif <[email protected]> writes:
>>>
>>>> Hi,
>>>>
>>>> Thanks for your replies. I have submitted a v2 of the patch with a
>>>> much more detailed commit message including reason for the patch and
>>>> timing values.
>>>>
>>>> The time taken from reboot to running init process was measured
>>>> with both purgatory enabled and disabled over 20 runs and the
>>>> averages are:
>>>> Purgatory disabled:
>>>> - TSC = 3908766161 cycles
>>>> - ktime = 606.8 ms
>>>> Purgatory enabled:
>>>> - TSC = 5005811885 cycles (28.1% worse)
>>>> - ktime = 843.1 ms (38.9% worse)
>>>>
>>>>
>>>> Our reason for this patch is that it helps reduce the downtime of
>>>> servers when
>>>> the host kernel managing multiple VMs needs to be updated via kexec,
>>>> but it makes reboot with kexec much faster so should be a general
>>>> improvement in
>>>> boot time if purgatory is not needed and could have other usecases
>>>> as well.
>>>> I believe only x86, powerpc and s390 have purgatory supported, other
>>>> platforms
>>>> like arm64 dont have it implemented yet, so with the reboot time
>>>> improvement seen,
>>>> it would be a good idea to have the option to disable purgatory
>>>> completely but set default to y.
>>>> We also have the CONFIG_KEXEC_BZIMAGE_VERIFY_SIG which can be
>>>> enabled to verify the next
>>>> kernel image to be booted and purgatory can be completely skipped if
>>>> not required.
>>>
>>> CONFIG_KEXEC_BZIMAGE_VERIFY_SIG is something totally and completely
>>> different.  It's job is to verify that the kernel to be booted comes
>>> from a trusted source.   The sha256 verification in purgatory's job
>>> is to verify that memory the kernel cares about was not corrupted
>>> during the kexec process.
>>>
>>
>> Thanks, acknowledged.
>>
>>> I believe when you say purgatory you are really talking about that
>>> sha256 checksum.  It really is not possible to disable all of
>>> the code that runs between kernels, as the old and the new kernel may
>>> run at the same addresses.  Anything that runs between the two kernels
>>> is what is referred to as purgatory.  Even if it is just a small
>>> assembly stub.
>>>
>>  >
>>
>> With this patch, i am trying to give an option (with default purgatory
>> enabled)to disable purgatory completely on x86 only, i.e. no code
>> running between 2 kernels on x86. From my understanding there is no
>> purgatory in arm64 kexec_file case, in riscv and in some other archs
>> as well, so I am not sure why its not possible to disable purgatory
>> (all code running between 2 kernels) in x86? Unless i misunderstood
>> something about the working of other platforms? In x86 case, from what
>> i see relocate_kernel is still part of the older kernel and not
>> purgatory, and with my patch, purgatory.ro is not built and kexec does
>> execute successfully with purgatory disabled.
>>
>> About your point for the old and the new kernel may run at the same
>> addresses,i dont think that can happen as in bzImage64_load function
>> its loaded using the kbuf struct. This doesnt happen in other
>> architectures (for e.g. arm64) that dont implement purgatory and any
>> of the tests I conducted with the patch applied due to the use of kbuf
>> struct.
>>
>>  From what i see in the code, purgatory in x86 specifically, has 2
>> main functions, sha256 verification and loading the %rsi register for
>> bootparam_load_addr. In this patch purgatory was disabled, which
>> resulted in sha256 verification disabled and bootparam_load_addr moved
>> to relocate_kernel. Our analysis revealed that most of the time is
>> spent in the sha256 verification, so I would be ok to reformat the patch
>> to make the sha verification optional and keep purgatory enabled if
>> thats preferred? Although in my opinion the current patch of only
>> providing an option to disable purgatory seems much better.
>>
>>
>>> That sha256 verification is always needed for kexec on panic, there are
>>> by the nature of a kernel panic too many unknowns to have any confidence
>>> the new kernel will not be corrupted in the process of kexec before it
>>> gets started.
>>>
>>> For an ordinary kexec it might be possible to say that you have a
>>> reliable kernel shutdown process and you know for a fact that something
>>> won't come along and corrupt the kernel.  I find that a questionable
>>> assertion.  I haven't seen anyone yet whose focus when getting an
>>> ordinary kexec to work as anything other than making certain all of the
>>> drivers are shutdown properly.
>>>
>>> I have seen countless times when a network packet comes in a the wrong
>>> time and the target kernel's memory is corrupted before it gets far
>>> enough to initialize the network driver. >
>>
>> I agree that when doing testing and research, there are things that can
>> go wrong during kexec process, but i assume that the expectation is that
>> in production environment, when for e.g. updating a kernel with kexec
>> in servers,the expectation is that kexec will execute successfully as
>> long as there is nothing wrong in the old kernel and the new kernel.
>>
>> Maybe I didn't understand this correctly, but if a network packet
>> comes in and corrupts the kernel memory, wont it also cause a problem
>> even when purgatory is enabled? I agree that issue like these would be
>> caught earlier with purgatory, but then if something like this breaks
>> the kexec process in a production environment where the old and new
>> kernel are well tested, trusted and expected to work, wouldn't there
>> be a much more fundamental issue with the reliability of the current
>> kexec process.
>>
>>> For a 0.2s speed up you are talking about disabling all of the safety
>>> checks in a very dangerous situation.  How much can you can in
>>> performance by optimizing the sha256 implementation instead of using
>>> what is essentially a reference implementation in basic C that I copied
>>> from somewhere long ago.
>>>
>>> Optimize the sha256 implementation and the memory copy loop and then
>>> show how the tiny bit of time that is left is on a mission critical path
>>> and must be removed.  Then we can reasonably talk about a config option
>>> for disabling the sha256 implementation in the kexec in not-panic case.
>>>
>> Thanks for this, I assume that over here you are suggesting for e.g.
>> in x86 to replace the existing sha verification implementation in
>> purgatory with the one in arch/x86/crypto/sha256_ssse3_glue.c? Also
>> could you point to the memory copy loop to optimize? I can have a look
>> at these 2 options.
>>
>> Even with these optimizations, i think the v2 patch should still be
>> considered. The patch in the end is providing only an option to
>> disable purgatory on x86 only, with the default value of keeping it
>> enabled and not changing kexec behaviour. The CONFIG already existed,
>> it is just renamed and now provides user the option to select to
>> disable purgatory for x86, rather than it being architecture dependent.
>>
>> Thanks again for the review and comments,
>> Usama
>>
>>
>>> That sha256 implementation in part so that we can all sleep at night
>>> because we don't have to deal with very very strange heizenbugs.
>>>
>>> Eric
>>>

2021-12-17 00:56:51

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH] x86/purgatory: provide config to disable purgatory

On 12/16/21 at 06:05pm, Usama Arif wrote:
> Hi,
>
> Just wanted to check again if there were any comments on my responses below
> or the v3 of the patch at https://lore.kernel.org/lkml/[email protected]/.

I will have a look. Could you please CC kexec mailing list too next time
when kexec/kdump related patch posted so that more people can help review
if concerned or interested?

I usually don't switch to lkml mail folder unless have to, since it
contains too many mails and take longer time.

>
> Thanks!
> Usama
>
> On 06/12/2021 16:51, Usama Arif wrote:
> > Hi,
> >
> > I have sent a v3 of the patch with a much clearer commit message, please
> > let me know if there are any comments for the v3 patch or my responses
> > below.
> >
> > Thanks,
> > Usama
> >
> > On 01/12/2021 10:29, Usama Arif wrote:
> > > Hi,
> > >
> > > (Resending the reply as my email client had updated and inserted
> > > html code and caused a bounceback from the mailing list, sorry about
> > > that.)
> > >
> > > Thanks for your reply, I have responded with further
> > > comments/questions inline below, and also have provided some context
> > > for the patch at the start:
> > >
> > > The patch is not introducing a new CONFIG option, as can be seen in
> > > the v2 patch diff. It is converting an existing CONFIG option that
> > > only enabled purgatory for specific architectures in which it has
> > > been implemented (x86, powerpc and s390) to an option that can that
> > > allow purgatory to be disabled (with default enabled) and only
> > > provides code to disable purgatory for x86 only. From what i see,
> > > purgatory is currently not yet implemented in other architectures
> > > like arm64 kexec_file case and riscv, so this would on a high level,
> > > provide only the option to make kexec on x86 similar to other the
> > > other architectures.
> > >
> > > As a background to the discussion, the usecase we are aiming is to
> > > update the host kernel with kexec in servers which is managing
> > > multiple VMs, to cut down the downtime of servers as much as
> > > possible so that its not noticeable to VMs. The patch is aimed at
> > > x86 purgatory code specifically. We are targetting other
> > > optimizations as well in other areas in boot path to cut down the
> > > 600ms time much further, so that cut down of 200ms downtime is
> > > significant in the usecase described. I did have a few
> > > comments/questions about your reply, please see my responses below:
> > >
> > >
> > > On 29/11/2021 16:53, Eric W. Biederman wrote:
> > > > Usama Arif <[email protected]> writes:
> > > >
> > > > > Hi,
> > > > >
> > > > > Thanks for your replies. I have submitted a v2 of the patch with a
> > > > > much more detailed commit message including reason for the
> > > > > patch and timing values.
> > > > >
> > > > > The time taken from reboot to running init process was measured
> > > > > with both purgatory enabled and disabled over 20 runs and the
> > > > > averages are:
> > > > > Purgatory disabled:
> > > > > - TSC = 3908766161 cycles
> > > > > - ktime = 606.8 ms
> > > > > Purgatory enabled:
> > > > > - TSC = 5005811885 cycles (28.1% worse)
> > > > > - ktime = 843.1 ms (38.9% worse)
> > > > >
> > > > >
> > > > > Our reason for this patch is that it helps reduce the
> > > > > downtime of servers when
> > > > > the host kernel managing multiple VMs needs to be updated via kexec,
> > > > > but it makes reboot with kexec much faster so should be a
> > > > > general improvement in
> > > > > boot time if purgatory is not needed and could have other
> > > > > usecases as well.
> > > > > I believe only x86, powerpc and s390 have purgatory
> > > > > supported, other platforms
> > > > > like arm64 dont have it implemented yet, so with the reboot
> > > > > time improvement seen,
> > > > > it would be a good idea to have the option to disable
> > > > > purgatory completely but set default to y.
> > > > > We also have the CONFIG_KEXEC_BZIMAGE_VERIFY_SIG which can
> > > > > be enabled to verify the next
> > > > > kernel image to be booted and purgatory can be completely skipped if
> > > > > not required.
> > > >
> > > > CONFIG_KEXEC_BZIMAGE_VERIFY_SIG is something totally and completely
> > > > different.? It's job is to verify that the kernel to be booted comes
> > > > from a trusted source.?? The sha256 verification in purgatory's job
> > > > is to verify that memory the kernel cares about was not corrupted
> > > > during the kexec process.
> > > >
> > >
> > > Thanks, acknowledged.
> > >
> > > > I believe when you say purgatory you are really talking about that
> > > > sha256 checksum.? It really is not possible to disable all of
> > > > the code that runs between kernels, as the old and the new kernel may
> > > > run at the same addresses.? Anything that runs between the two kernels
> > > > is what is referred to as purgatory.? Even if it is just a small
> > > > assembly stub.
> > > >
> > > ?>
> > >
> > > With this patch, i am trying to give an option (with default
> > > purgatory enabled)to disable purgatory completely on x86 only, i.e.
> > > no code running between 2 kernels on x86. From my understanding
> > > there is no purgatory in arm64 kexec_file case, in riscv and in some
> > > other archs as well, so I am not sure why its not possible to
> > > disable purgatory (all code running between 2 kernels) in x86?
> > > Unless i misunderstood something about the working of other
> > > platforms? In x86 case, from what i see relocate_kernel is still
> > > part of the older kernel and not purgatory, and with my patch,
> > > purgatory.ro is not built and kexec does execute successfully with
> > > purgatory disabled.
> > >
> > > About your point for the old and the new kernel may run at the same
> > > addresses,i dont think that can happen as in bzImage64_load function
> > > its loaded using the kbuf struct. This doesnt happen in other
> > > architectures (for e.g. arm64) that dont implement purgatory and any
> > > of the tests I conducted with the patch applied due to the use of
> > > kbuf struct.
> > >
> > > ?From what i see in the code, purgatory in x86 specifically, has 2
> > > main functions, sha256 verification and loading the %rsi register
> > > for bootparam_load_addr. In this patch purgatory was disabled, which
> > > resulted in sha256 verification disabled and bootparam_load_addr
> > > moved to relocate_kernel. Our analysis revealed that most of the
> > > time is spent in the sha256 verification, so I would be ok to
> > > reformat the patch
> > > to make the sha verification optional and keep purgatory enabled if
> > > thats preferred? Although in my opinion the current patch of only
> > > providing an option to disable purgatory seems much better.
> > >
> > >
> > > > That sha256 verification is always needed for kexec on panic, there are
> > > > by the nature of a kernel panic too many unknowns to have any confidence
> > > > the new kernel will not be corrupted in the process of kexec before it
> > > > gets started.
> > > >
> > > > For an ordinary kexec it might be possible to say that you have a
> > > > reliable kernel shutdown process and you know for a fact that something
> > > > won't come along and corrupt the kernel.? I find that a questionable
> > > > assertion.? I haven't seen anyone yet whose focus when getting an
> > > > ordinary kexec to work as anything other than making certain all of the
> > > > drivers are shutdown properly.
> > > >
> > > > I have seen countless times when a network packet comes in a the wrong
> > > > time and the target kernel's memory is corrupted before it gets far
> > > > enough to initialize the network driver. >
> > >
> > > I agree that when doing testing and research, there are things that can
> > > go wrong during kexec process, but i assume that the expectation is that
> > > in production environment, when for e.g. updating a kernel with
> > > kexec in servers,the expectation is that kexec will execute
> > > successfully as long as there is nothing wrong in the old kernel and
> > > the new kernel.
> > >
> > > Maybe I didn't understand this correctly, but if a network packet
> > > comes in and corrupts the kernel memory, wont it also cause a
> > > problem even when purgatory is enabled? I agree that issue like
> > > these would be caught earlier with purgatory, but then if something
> > > like this breaks the kexec process in a production environment where
> > > the old and new kernel are well tested, trusted and expected to
> > > work, wouldn't there be a much more fundamental issue with the
> > > reliability of the current kexec process.
> > >
> > > > For a 0.2s speed up you are talking about disabling all of the safety
> > > > checks in a very dangerous situation.? How much can you can in
> > > > performance by optimizing the sha256 implementation instead of using
> > > > what is essentially a reference implementation in basic C that I copied
> > > > from somewhere long ago.
> > > >
> > > > Optimize the sha256 implementation and the memory copy loop and then
> > > > show how the tiny bit of time that is left is on a mission critical path
> > > > and must be removed.? Then we can reasonably talk about a config option
> > > > for disabling the sha256 implementation in the kexec in not-panic case.
> > > >
> > > Thanks for this, I assume that over here you are suggesting for e.g.
> > > in x86 to replace the existing sha verification implementation in
> > > purgatory with the one in arch/x86/crypto/sha256_ssse3_glue.c? Also
> > > could you point to the memory copy loop to optimize? I can have a
> > > look at these 2 options.
> > >
> > > Even with these optimizations, i think the v2 patch should still be
> > > considered. The patch in the end is providing only an option to
> > > disable purgatory on x86 only, with the default value of keeping it
> > > enabled and not changing kexec behaviour. The CONFIG already
> > > existed, it is just renamed and now provides user the option to
> > > select to disable purgatory for x86, rather than it being
> > > architecture dependent.
> > >
> > > Thanks again for the review and comments,
> > > Usama
> > >
> > >
> > > > That sha256 implementation in part so that we can all sleep at night
> > > > because we don't have to deal with very very strange heizenbugs.
> > > >
> > > > Eric
> > > >
>