On 2023/8/21 18:50, Will Deacon wrote:
> On Sat, Aug 19, 2023 at 06:22:12PM +0800, Shuai Xue wrote:
>> When a process tries to access a page that is already offline
>
> How does this happen?
Case 1:
When a process consume poison, it will trigger a Synchronous External Abort.
The memory-failure.c on arm64 platform does not work as expected, it does NOT
send sigbus to the process consumed poison because GHES handle all notification
as Action Option event. Process will not be collected to be killed unless explicitly
set early kill in mm/memory-failure.c:collect_procs(). Even worse, QEMU relies on
SIGBUS and its code to perform vSEA injection into the Guest Kernel.
The memory-failure.c only offlines the page which unmaps the page from process.
and the process will start from the last PC so that a page fault occurs. Then
a sigbus will send to process in do_page_fault() with BUS_MCEERR_AR.
By the way, I have sent a patch set to solve the memory failure workflow in GHES[1]
collect some reviewed-tags, but there has been no response since the last version
sent 4 months ago. Could you please help to review it? Thank you.
https://lore.kernel.org/all/[email protected]/
Case 2:
When a poison page shared by many processes, the memory-failure.c unmap the poison page
from all processes and only send sigbus to the process which accessing the poison page.
The poison page may be accessed by other processes later and it triggers a page fault,
then a sigbus will send to process in do_page_fault() with BUS_MCEERR_AR.
>
>> the kernel will send a sigbus signal with the BUS_MCEERR_AR code. This
>> signal is typically handled by a registered sigbus handler in the
>> process. However, if the process does not have a registered sigbus
>> handler, it is important for end users to be informed about what
>> happened.
>>
>> To address this, add an error message similar to those implemented on
>> the x86, powerpc, and parisc platforms.
>>
>> Signed-off-by: Shuai Xue <[email protected]>
>> ---
>> arch/arm64/mm/fault.c | 2 ++
>> arch/parisc/mm/fault.c | 5 ++---
>> arch/x86/mm/fault.c | 3 +--
>> 3 files changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
>> index 3fe516b32577..38e2186882bd 100644
>> --- a/arch/arm64/mm/fault.c
>> +++ b/arch/arm64/mm/fault.c
>> @@ -679,6 +679,8 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
>> } else if (fault & (VM_FAULT_HWPOISON_LARGE | VM_FAULT_HWPOISON)) {
>> unsigned int lsb;
>>
>> + pr_err("MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
>> + current->comm, current->pid, far);
>> lsb = PAGE_SHIFT;
>> if (fault & VM_FAULT_HWPOISON_LARGE)
>> lsb = hstate_index_to_shift(VM_FAULT_GET_HINDEX(fault));
>
> Hmm, I'm not convinced by this. We have 'show_unhandled_signals' already,
> and there's plenty of code in memory-failure.c for handling poisoned pages
> reported by e.g. GHES. I don't think dumping extra messages in dmesg from
> the arch code really adds anything.
I see the show_unhandled_signals() will dump the stack but it rely on
/proc/sys/debug/exception-trace be set.
The memory failure is the top issue in our production cloud and also other hyperscalers.
We have received complaints from our operations engineers and end users that processes
are being inexplicably killed :(. Could you please consider add a message?
Thank you.
Best Regards,
Shuai