2019-08-06 17:28:42

by Jane Chu

[permalink] [raw]
Subject: [PATCH v4 0/2] mm/memory-failure: Poison read receives SIGKILL instead of SIGBUS issue

Change in v4:
- remove trailing white space

Changes in v3:
- move **tk cleanup to its own patch

Changes in v2:
- move 'tk' allocations internal to add_to_kill(), suggested by Dan;
- ran checkpatch.pl check, pointed out by Matthew;
- Noaya pointed out that v1 would have missed the SIGKILL
if "tk->addr == -EFAULT", since the code returns early.
Incorporated Noaya's suggestion, also, skip VMAs where
"tk->size_shift == 0" for zone device page, and deliver SIGBUS
when "tk->size_shift != 0" so the payload is helpful;
- added Suggested-by: Naoya Horiguchi <[email protected]>

Jane Chu (2):
mm/memory-failure.c clean up around tk pre-allocation
mm/memory-failure: Poison read receives SIGKILL instead of SIGBUS if
mmaped more than once

mm/memory-failure.c | 62 ++++++++++++++++++++++-------------------------------
1 file changed, 26 insertions(+), 36 deletions(-)

--
1.8.3.1


2019-08-06 17:31:10

by Jane Chu

[permalink] [raw]
Subject: [PATCH v4 2/2] mm/memory-failure: Poison read receives SIGKILL instead of SIGBUS if mmaped more than once

Mmap /dev/dax more than once, then read the poison location using address
from one of the mappings. The other mappings due to not having the page
mapped in will cause SIGKILLs delivered to the process. SIGKILL succeeds
over SIGBUS, so user process looses the opportunity to handle the UE.

Although one may add MAP_POPULATE to mmap(2) to work around the issue,
MAP_POPULATE makes mapping 128GB of pmem several magnitudes slower, so
isn't always an option.

Details -

ndctl inject-error --block=10 --count=1 namespace6.0

./read_poison -x dax6.0 -o 5120 -m 2
mmaped address 0x7f5bb6600000
mmaped address 0x7f3cf3600000
doing local read at address 0x7f3cf3601400
Killed

Console messages in instrumented kernel -

mce: Uncorrected hardware memory error in user-access at edbe201400
Memory failure: tk->addr = 7f5bb6601000
Memory failure: address edbe201: call dev_pagemap_mapping_shift
dev_pagemap_mapping_shift: page edbe201: no PUD
Memory failure: tk->size_shift == 0
Memory failure: Unable to find user space address edbe201 in read_poison
Memory failure: tk->addr = 7f3cf3601000
Memory failure: address edbe201: call dev_pagemap_mapping_shift
Memory failure: tk->size_shift = 21
Memory failure: 0xedbe201: forcibly killing read_poison:22434 because of failure to unmap corrupted page
=> to deliver SIGKILL
Memory failure: 0xedbe201: Killing read_poison:22434 due to hardware memory corruption
=> to deliver SIGBUS

Signed-off-by: Jane Chu <[email protected]>
Suggested-by: Naoya Horiguchi <[email protected]>
---
mm/memory-failure.c | 22 +++++++++++++---------
1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 51d5b20..bd4db33 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -199,7 +199,6 @@ struct to_kill {
struct task_struct *tsk;
unsigned long addr;
short size_shift;
- char addr_valid;
};

/*
@@ -318,22 +317,27 @@ static void add_to_kill(struct task_struct *tsk, struct page *p,
}

tk->addr = page_address_in_vma(p, vma);
- tk->addr_valid = 1;
if (is_zone_device_page(p))
tk->size_shift = dev_pagemap_mapping_shift(p, vma);
else
tk->size_shift = compound_order(compound_head(p)) + PAGE_SHIFT;

/*
- * In theory we don't have to kill when the page was
- * munmaped. But it could be also a mremap. Since that's
- * likely very rare kill anyways just out of paranoia, but use
- * a SIGKILL because the error is not contained anymore.
+ * Send SIGKILL if "tk->addr == -EFAULT". Also, as
+ * "tk->size_shift" is always non-zero for !is_zone_device_page(),
+ * so "tk->size_shift == 0" effectively checks no mapping on
+ * ZONE_DEVICE. Indeed, when a devdax page is mmapped N times
+ * to a process' address space, it's possible not all N VMAs
+ * contain mappings for the page, but at least one VMA does.
+ * Only deliver SIGBUS with payload derived from the VMA that
+ * has a mapping for the page.
*/
- if (tk->addr == -EFAULT || tk->size_shift == 0) {
+ if (tk->addr == -EFAULT) {
pr_info("Memory failure: Unable to find user space address %lx in %s\n",
page_to_pfn(p), tsk->comm);
- tk->addr_valid = 0;
+ } else if (tk->size_shift == 0) {
+ kfree(tk);
+ return;
}

get_task_struct(tsk);
@@ -361,7 +365,7 @@ static void kill_procs(struct list_head *to_kill, int forcekill, bool fail,
* make sure the process doesn't catch the
* signal and then access the memory. Just kill it.
*/
- if (fail || tk->addr_valid == 0) {
+ if (fail || tk->addr == -EFAULT) {
pr_err("Memory failure: %#lx: forcibly killing %s:%d because of failure to unmap corrupted page\n",
pfn, tk->tsk->comm, tk->tsk->pid);
do_send_sig_info(SIGKILL, SEND_SIG_PRIV,
--
1.8.3.1

2019-08-06 21:13:12

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH v4 2/2] mm/memory-failure: Poison read receives SIGKILL instead of SIGBUS if mmaped more than once

On Tue, Aug 6, 2019 at 10:28 AM Jane Chu <[email protected]> wrote:
>
> Mmap /dev/dax more than once, then read the poison location using address
> from one of the mappings. The other mappings due to not having the page
> mapped in will cause SIGKILLs delivered to the process. SIGKILL succeeds
> over SIGBUS, so user process looses the opportunity to handle the UE.
>
> Although one may add MAP_POPULATE to mmap(2) to work around the issue,
> MAP_POPULATE makes mapping 128GB of pmem several magnitudes slower, so
> isn't always an option.
>
> Details -
>
> ndctl inject-error --block=10 --count=1 namespace6.0
>
> ./read_poison -x dax6.0 -o 5120 -m 2
> mmaped address 0x7f5bb6600000
> mmaped address 0x7f3cf3600000
> doing local read at address 0x7f3cf3601400
> Killed
>
> Console messages in instrumented kernel -
>
> mce: Uncorrected hardware memory error in user-access at edbe201400
> Memory failure: tk->addr = 7f5bb6601000
> Memory failure: address edbe201: call dev_pagemap_mapping_shift
> dev_pagemap_mapping_shift: page edbe201: no PUD
> Memory failure: tk->size_shift == 0
> Memory failure: Unable to find user space address edbe201 in read_poison
> Memory failure: tk->addr = 7f3cf3601000
> Memory failure: address edbe201: call dev_pagemap_mapping_shift
> Memory failure: tk->size_shift = 21
> Memory failure: 0xedbe201: forcibly killing read_poison:22434 because of failure to unmap corrupted page
> => to deliver SIGKILL
> Memory failure: 0xedbe201: Killing read_poison:22434 due to hardware memory corruption
> => to deliver SIGBUS
>
> Signed-off-by: Jane Chu <[email protected]>
> Suggested-by: Naoya Horiguchi <[email protected]>

Looks good, ignore the checkpatch warning about too long subject line,
looks appropriate to me:

Reviewed-by: Dan Williams <[email protected]>

2019-10-08 18:16:02

by Jane Chu

[permalink] [raw]
Subject: Re: [PATCH v4 0/2] mm/memory-failure: Poison read receives SIGKILL instead of SIGBUS issue

Hi, Naoya,

What is the status of the patches?
Is there anything I need to do from my end ?

Regards,
-jane

On 8/6/2019 10:25 AM, Jane Chu wrote:
> Change in v4:
> - remove trailing white space
>
> Changes in v3:
> - move **tk cleanup to its own patch
>
> Changes in v2:
> - move 'tk' allocations internal to add_to_kill(), suggested by Dan;
> - ran checkpatch.pl check, pointed out by Matthew;
> - Noaya pointed out that v1 would have missed the SIGKILL
> if "tk->addr == -EFAULT", since the code returns early.
> Incorporated Noaya's suggestion, also, skip VMAs where
> "tk->size_shift == 0" for zone device page, and deliver SIGBUS
> when "tk->size_shift != 0" so the payload is helpful;
> - added Suggested-by: Naoya Horiguchi <[email protected]>
>
> Jane Chu (2):
> mm/memory-failure.c clean up around tk pre-allocation
> mm/memory-failure: Poison read receives SIGKILL instead of SIGBUS if
> mmaped more than once
>
> mm/memory-failure.c | 62 ++++++++++++++++++++++-------------------------------
> 1 file changed, 26 insertions(+), 36 deletions(-)
>

2019-10-08 23:22:32

by Naoya Horiguchi

[permalink] [raw]
Subject: Re: [PATCH v4 0/2] mm/memory-failure: Poison read receives SIGKILL instead of SIGBUS issue

Hi Jane,

I think that this patchset is good enough and ready to be merged.
Andrew, could you consider queuing this series into your tree?

Thanks,
Naoya Horiguchi

On Tue, Oct 08, 2019 at 11:13:23AM -0700, Jane Chu wrote:
> Hi, Naoya,
>
> What is the status of the patches?
> Is there anything I need to do from my end ?
>
> Regards,
> -jane
>
> On 8/6/2019 10:25 AM, Jane Chu wrote:
> > Change in v4:
> > - remove trailing white space
> >
> > Changes in v3:
> > - move **tk cleanup to its own patch
> >
> > Changes in v2:
> > - move 'tk' allocations internal to add_to_kill(), suggested by Dan;
> > - ran checkpatch.pl check, pointed out by Matthew;
> > - Noaya pointed out that v1 would have missed the SIGKILL
> > if "tk->addr == -EFAULT", since the code returns early.
> > Incorporated Noaya's suggestion, also, skip VMAs where
> > "tk->size_shift == 0" for zone device page, and deliver SIGBUS
> > when "tk->size_shift != 0" so the payload is helpful;
> > - added Suggested-by: Naoya Horiguchi <[email protected]>
> >
> > Jane Chu (2):
> > mm/memory-failure.c clean up around tk pre-allocation
> > mm/memory-failure: Poison read receives SIGKILL instead of SIGBUS if
> > mmaped more than once
> >
> > mm/memory-failure.c | 62 ++++++++++++++++++++++-------------------------------
> > 1 file changed, 26 insertions(+), 36 deletions(-)
> >
>

2019-10-09 23:56:07

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v4 0/2] mm/memory-failure: Poison read receives SIGKILL instead of SIGBUS issue

On Tue, 8 Oct 2019 23:18:31 +0000 Naoya Horiguchi <[email protected]> wrote:

> I think that this patchset is good enough and ready to be merged.
> Andrew, could you consider queuing this series into your tree?

I'll treat that as an acked-by:.

Do you think 2/2 should be backported into -stable trees?

2019-10-10 01:23:36

by Naoya Horiguchi

[permalink] [raw]
Subject: Re: [PATCH v4 0/2] mm/memory-failure: Poison read receives SIGKILL instead of SIGBUS issue

On Wed, Oct 09, 2019 at 04:55:10PM -0700, Andrew Morton wrote:
> On Tue, 8 Oct 2019 23:18:31 +0000 Naoya Horiguchi <[email protected]> wrote:
>
> > I think that this patchset is good enough and ready to be merged.
> > Andrew, could you consider queuing this series into your tree?
>
> I'll treat that as an acked-by:.

thanks.

>
> Do you think 2/2 should be backported into -stable trees?

Yes, I think so. Please add Cc: stable.

Thanks,
Naoya Horiguchi