LinuxLists.cc - [PATCH 0/2] Fatal signal handling within uaccess faults

2017-07-11 14:21:00

Subject: [PATCH 0/2] Fatal signal handling within uaccess faults

[resending with LAKML's address corrected]

Hi,

Arch maintainer tl;dr: most arch fault code doesn't handle fatal signals
correctly, allowing unprivileged users to create an unkillable task which can
lock up the system. Please check whether your arch is affected.

AFAICT, most arches don't correctly handle a fatal signal interrupting a
uaccess fault. They attempt to bail out, returning to the faulting context
without bothering to handle the fault, but forget to apply the uaccess fixup.
Consequently, the uaccess gets replayed, and the same thing happens forever.

When this occurs, the relevant task never returns to userspace, never handles
the fatal signal, and is stuck in an unkillable (though interruptible and
preemptible) state. The task can inhibit forward progress of the rest of the
system, leading to RCU stalls and lockups.
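
To make the failure mode concrete, here is a rough userspace toy model of the
loop described above. It is only a sketch, not kernel code: the names
`toy_fault_bailout()` and `toy_uaccess()`, and the `max_replays` cap (a
stand-in for "forever"), are hypothetical and invented for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in the kernel. */
enum toy_outcome {
	TOY_REPLAY,	/* return to the faulting instruction; it faults again */
	TOY_FIXUP,	/* branch to the uaccess fixup; the access fails */
};

/*
 * Models the fault handler bailing out because a fatal signal is pending.
 * 'kernel_mode' models a fault taken from a kernel uaccess;
 * 'apply_fixup' selects whether the bail-out applies the uaccess fixup.
 */
static enum toy_outcome toy_fault_bailout(bool kernel_mode, bool apply_fixup)
{
	if (kernel_mode && apply_fixup)
		return TOY_FIXUP;
	return TOY_REPLAY;
}

/*
 * Models one uaccess to a page that never becomes present while a fatal
 * signal is pending. Returns -EFAULT once the fixup is taken, or -ELOOP
 * if the access was replayed 'max_replays' times without progress.
 */
static int toy_uaccess(bool apply_fixup, int max_replays)
{
	for (int i = 0; i < max_replays; i++) {
		if (toy_fault_bailout(true, apply_fixup) == TOY_FIXUP)
			return -EFAULT;
	}
	return -ELOOP;
}
```

Under these assumptions, `toy_uaccess(false, 1000)` exhausts its replay budget
and returns `-ELOOP`, while `toy_uaccess(true, 1000)` returns `-EFAULT` on the
first fault, which gives the calling code an error path to unwind through.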

It's possible for an unprivileged user to trigger this deliberately using the
userfaultfd syscall, as demonstrated by the test case at the end of this email
(note: requires CONFIG_USERFAULTFD to be selected). I am not sure if this is
the only way of triggering the issue.

I stumbled upon this while fuzzing arm64 with Syzkaller. I've verified that
both arm and arm64 have the issue, and by inspection it seems that the majority
of other architectures are affected.

It looks like this was fixed up for x86 in 2014 with commit:

26178ec11ef3c6c8 ("x86: mm: consolidate VM_FAULT_RETRY handling")

... but most other architectures never received a similar fixup.

The duplication (and divergence) of this logic is unfortunate. It's largely
copy-paste code that could be consolidated under mm/.

Until we end up refactoring this, and so as to be suitable for backporting,
this series fixes arm and arm64 in-place. I've not touched other architectures
as I don't have the relevant hardware or arch knowledge.

Thanks,
Mark.

----
#include <errno.h>
#include <linux/userfaultfd.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/vfs.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
	void *mem;
	long pagesz;
	int uffd, ret;
	struct uffdio_api api = {
		.api = UFFD_API
	};
	struct uffdio_register reg;

	pagesz = sysconf(_SC_PAGESIZE);
	if (pagesz < 0) {
		return errno;
	}

	mem = mmap(NULL, pagesz, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (mem == MAP_FAILED)
		return errno;

	uffd = syscall(__NR_userfaultfd, 0);
	if (uffd < 0)
		return errno;

	ret = ioctl(uffd, UFFDIO_API, &api);
	if (ret < 0)
		return errno;

	reg = (struct uffdio_register) {
		.range = {
			.start = (unsigned long)mem,
			.len = pagesz
		},
		.mode = UFFDIO_REGISTER_MODE_MISSING
	};

	ret = ioctl(uffd, UFFDIO_REGISTER, &reg);
	if (ret < 0)
		return errno;

	/*
	 * Force an arbitrary uaccess to memory monitored by the userfaultfd.
	 * This will block, but when a SIGKILL is sent, will consume all
	 * available CPU time without being killed, and may inhibit forward
	 * progress of the system.
	 */
	ret = fstatfs(0, (struct statfs *)mem);

	return 0;
}
----

Mark Rutland (2):
arm64: mm: abort uaccess retries upon fatal signal
arm: mm: abort uaccess retries upon fatal signal

arch/arm/mm/fault.c | 5 ++++-
arch/arm64/mm/fault.c | 5 ++++-
2 files changed, 8 insertions(+), 2 deletions(-)

--
1.9.1

2017-07-11 14:21:11

by Mark Rutland

Subject: [PATCH 1/2] arm64: mm: abort uaccess retries upon fatal signal

When there's a fatal signal pending, arm64's do_page_fault()
implementation returns 0. The intent is that we'll return to the
faulting userspace instruction, delivering the signal on the way.

However, if we take a fatal signal during fixing up a uaccess, this
results in a return to the faulting kernel instruction, which will be
instantly retried, resulting in the same fault being taken forever. As
the task never reaches userspace, the signal is not delivered, and the
task is left unkillable. While the task is stuck in this state, it can
inhibit the forward progress of the system.

To avoid this, we must ensure that when a fatal signal is pending, we
apply any necessary fixup for a faulting kernel instruction. Thus we
will return to an error path, and it is up to that code to make forward
progress towards delivering the fatal signal.

Signed-off-by: Mark Rutland <[email protected]>
Reviewed-by: Steve Capper <[email protected]>
Tested-by: Steve Capper <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: James Morse <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
---
arch/arm64/mm/fault.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 37b95df..3952d5e 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -397,8 +397,11 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
 	 * signal first. We do not need to release the mmap_sem because it
 	 * would already be released in __lock_page_or_retry in mm/filemap.c.
 	 */
-	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
+	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current)) {
+		if (!user_mode(regs))
+			goto no_context;
 		return 0;
+	}

 	/*
 	 * Major/minor page fault accounting is only done on the initial
--
1.9.1

2017-07-11 14:21:07

by Mark Rutland

Subject: [PATCH 2/2] arm: mm: abort uaccess retries upon fatal signal

When there's a fatal signal pending, arm's do_page_fault()
implementation returns 0. The intent is that we'll return to the
faulting userspace instruction, delivering the signal on the way.

However, if we take a fatal signal during fixing up a uaccess, this
results in a return to the faulting kernel instruction, which will be
instantly retried, resulting in the same fault being taken forever. As
the task never reaches userspace, the signal is not delivered, and the
task is left unkillable. While the task is stuck in this state, it can
inhibit the forward progress of the system.

To avoid this, we must ensure that when a fatal signal is pending, we
apply any necessary fixup for a faulting kernel instruction. Thus we
will return to an error path, and it is up to that code to make forward
progress towards delivering the fatal signal.

Signed-off-by: Mark Rutland <[email protected]>
Reviewed-by: Steve Capper <[email protected]>
Cc: Russell King <[email protected]>
Cc: [email protected]
---
arch/arm/mm/fault.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index ff8b0aa..42f5853 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -315,8 +315,11 @@ static inline bool access_error(unsigned int fsr, struct vm_area_struct *vma)
 	 * signal first. We do not need to release the mmap_sem because
 	 * it would already be released in __lock_page_or_retry in
 	 * mm/filemap.c. */
-	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
+	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current)) {
+		if (!user_mode(regs))
+			goto no_context;
 		return 0;
+	}

 	/*
 	 * Major/minor page fault accounting is only done on the
--
1.9.1

2017-07-11 14:58:50

by Will Deacon

Subject: Re: [PATCH 1/2] arm64: mm: abort uaccess retries upon fatal signal

On Tue, Jul 11, 2017 at 03:19:22PM +0100, Mark Rutland wrote:
> When there's a fatal signal pending, arm64's do_page_fault()
> implementation returns 0. The intent is that we'll return to the
> faulting userspace instruction, delivering the signal on the way.
>
> However, if we take a fatal signal during fixing up a uaccess, this
> results in a return to the faulting kernel instruction, which will be
> instantly retried, resulting in the same fault being taken forever. As
> the task never reaches userspace, the signal is not delivered, and the
> task is left unkillable. While the task is stuck in this state, it can
> inhibit the forward progress of the system.
>
> To avoid this, we must ensure that when a fatal signal is pending, we
> apply any necessary fixup for a faulting kernel instruction. Thus we
> will return to an error path, and it is up to that code to make forward
> progress towards delivering the fatal signal.
>
> Signed-off-by: Mark Rutland <[email protected]>
> Reviewed-by: Steve Capper <[email protected]>
> Tested-by: Steve Capper <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: James Morse <[email protected]>
> Cc: Laura Abbott <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: [email protected]
> ---
> arch/arm64/mm/fault.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index 37b95df..3952d5e 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -397,8 +397,11 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
> * signal first. We do not need to release the mmap_sem because it
> * would already be released in __lock_page_or_retry in mm/filemap.c.
> */
> - if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
> + if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current)) {
> + if (!user_mode(regs))
> + goto no_context;
> return 0;
> + }

This will need rebasing at -rc1 (take a look at current HEAD).

Also, I think it introduces a weird corner case where we take a page fault
when writing the signal frame to the user stack to deliver a SIGSEGV. If
we end up with VM_FAULT_RETRY and somebody has sent a SIGKILL to the task,
then we'll fail setup_sigframe and force an un-handleable SIGSEGV instead
of SIGKILL.

The end result (task is killed) is the same, but the fatal signal is wrong.

Will

2017-07-12 17:19:59

by James Morse

Subject: Re: [PATCH 1/2] arm64: mm: abort uaccess retries upon fatal signal

Hi Mark,

On 11/07/17 15:19, Mark Rutland wrote:
> When there's a fatal signal pending, arm64's do_page_fault()
> implementation returns 0. The intent is that we'll return to the
> faulting userspace instruction, delivering the signal on the way.
>
> However, if we take a fatal signal during fixing up a uaccess, this
> results in a return to the faulting kernel instruction, which will be
> instantly retried, resulting in the same fault being taken forever. As
> the task never reaches userspace, the signal is not delivered, and the
> task is left unkillable. While the task is stuck in this state, it can
> inhibit the forward progress of the system.
>
> To avoid this, we must ensure that when a fatal signal is pending, we
> apply any necessary fixup for a faulting kernel instruction. Thus we
> will return to an error path, and it is up to that code to make forward
> progress towards delivering the fatal signal.

VM_FAULT_RETRY's 'I released your locks' behaviour is pretty nasty, but this
looks right. FWIW:
Reviewed-by: James Morse <[email protected]>

I also gave this a spin through LTP on Juno, based on v4.12-defconfig:
Tested-by: James Morse <[email protected]>

Thanks,

James

2017-08-21 13:43:22

by Mark Rutland

Subject: Re: [PATCH 1/2] arm64: mm: abort uaccess retries upon fatal signal

On Tue, Jul 11, 2017 at 03:58:49PM +0100, Will Deacon wrote:
> On Tue, Jul 11, 2017 at 03:19:22PM +0100, Mark Rutland wrote:
> > When there's a fatal signal pending, arm64's do_page_fault()
> > implementation returns 0. The intent is that we'll return to the
> > faulting userspace instruction, delivering the signal on the way.
> >
> > However, if we take a fatal signal during fixing up a uaccess, this
> > results in a return to the faulting kernel instruction, which will be
> > instantly retried, resulting in the same fault being taken forever. As
> > the task never reaches userspace, the signal is not delivered, and the
> > task is left unkillable. While the task is stuck in this state, it can
> > inhibit the forward progress of the system.
> >
> > To avoid this, we must ensure that when a fatal signal is pending, we
> > apply any necessary fixup for a faulting kernel instruction. Thus we
> > will return to an error path, and it is up to that code to make forward
> > progress towards delivering the fatal signal.
> >
> > Signed-off-by: Mark Rutland <[email protected]>
> > Reviewed-by: Steve Capper <[email protected]>
> > Tested-by: Steve Capper <[email protected]>
> > Cc: Catalin Marinas <[email protected]>
> > Cc: James Morse <[email protected]>
> > Cc: Laura Abbott <[email protected]>
> > Cc: Will Deacon <[email protected]>
> > Cc: [email protected]
> > ---
> > arch/arm64/mm/fault.c | 5 ++++-
> > 1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> > index 37b95df..3952d5e 100644
> > --- a/arch/arm64/mm/fault.c
> > +++ b/arch/arm64/mm/fault.c
> > @@ -397,8 +397,11 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
> > * signal first. We do not need to release the mmap_sem because it
> > * would already be released in __lock_page_or_retry in mm/filemap.c.
> > */
> > - if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
> > + if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current)) {
> > + if (!user_mode(regs))
> > + goto no_context;
> > return 0;
> > + }
>
> This will need rebasing at -rc1 (take a look at current HEAD).
>
> Also, I think it introduces a weird corner case where we take a page fault
> when writing the signal frame to the user stack to deliver a SIGSEGV. If
> we end up with VM_FAULT_RETRY and somebody has sent a SIGKILL to the task,
> then we'll fail setup_sigframe and force an un-handleable SIGSEGV instead
> of SIGKILL.
>
> The end result (task is killed) is the same, but the fatal signal is wrong.

That doesn't seem to be the case, testing on v4.13-rc5.

I used sigaltstack() to use the userfaultfd region as the signal stack,
registered a SIGSEGV handler, and dereferenced NULL. The task locks up,
but when killed with a SIGINT or SIGKILL, the exit status reflects that
signal, rather than the SIGSEGV.

If I move the SIGINT handler onto the userfaultfd-monitored stack, then
delivering SIGINT hangs, but can be killed with SIGKILL, and the exit
status reflects that SIGKILL.
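
For reference, a minimal sketch of how a parent can read back which signal
terminated a child, which is what "exit status reflects that signal" refers
to. This is not the test described above (it involves no userfaultfd or
sigaltstack), and `terminating_signal()` is a hypothetical helper name.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that only sleeps in pause(), send it
 * 'sig', and return the signal number reported in the wait status.
 * Returns -1 on error or if the child did not die from a signal.
 */
static int terminating_signal(int sig)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Child: wait until a signal terminates us. */
		for (;;)
			pause();
	}

	if (kill(pid, sig) < 0) {
		/* Could not send 'sig'; make sure the child is reaped. */
		kill(pid, SIGKILL);
		waitpid(pid, &status, 0);
		return -1;
	}

	if (waitpid(pid, &status, 0) != pid)
		return -1;

	return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

With default signal dispositions, `terminating_signal(SIGKILL)` is expected to
return `SIGKILL`; a shell exposes the same information as an exit status of
128 plus the signal number.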

As you say, it does look like we'd try to set up a deferred SIGSEGV for
the failed signal delivery.

I haven't yet figured out exactly how that works; I'll keep digging.

Thanks,
Mark.

2017-08-22 09:45:26

by Will Deacon

Subject: Re: [PATCH 1/2] arm64: mm: abort uaccess retries upon fatal signal

On Mon, Aug 21, 2017 at 02:42:03PM +0100, Mark Rutland wrote:
> On Tue, Jul 11, 2017 at 03:58:49PM +0100, Will Deacon wrote:
> > On Tue, Jul 11, 2017 at 03:19:22PM +0100, Mark Rutland wrote:
> > > When there's a fatal signal pending, arm64's do_page_fault()
> > > implementation returns 0. The intent is that we'll return to the
> > > faulting userspace instruction, delivering the signal on the way.
> > >
> > > However, if we take a fatal signal during fixing up a uaccess, this
> > > results in a return to the faulting kernel instruction, which will be
> > > instantly retried, resulting in the same fault being taken forever. As
> > > the task never reaches userspace, the signal is not delivered, and the
> > > task is left unkillable. While the task is stuck in this state, it can
> > > inhibit the forward progress of the system.
> > >
> > > To avoid this, we must ensure that when a fatal signal is pending, we
> > > apply any necessary fixup for a faulting kernel instruction. Thus we
> > > will return to an error path, and it is up to that code to make forward
> > > progress towards delivering the fatal signal.
> > >
> > > Signed-off-by: Mark Rutland <[email protected]>
> > > Reviewed-by: Steve Capper <[email protected]>
> > > Tested-by: Steve Capper <[email protected]>
> > > Cc: Catalin Marinas <[email protected]>
> > > Cc: James Morse <[email protected]>
> > > Cc: Laura Abbott <[email protected]>
> > > Cc: Will Deacon <[email protected]>
> > > Cc: [email protected]
> > > ---
> > > arch/arm64/mm/fault.c | 5 ++++-
> > > 1 file changed, 4 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> > > index 37b95df..3952d5e 100644
> > > --- a/arch/arm64/mm/fault.c
> > > +++ b/arch/arm64/mm/fault.c
> > > @@ -397,8 +397,11 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
> > > * signal first. We do not need to release the mmap_sem because it
> > > * would already be released in __lock_page_or_retry in mm/filemap.c.
> > > */
> > > - if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
> > > + if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current)) {
> > > + if (!user_mode(regs))
> > > + goto no_context;
> > > return 0;
> > > + }
> >
> > This will need rebasing at -rc1 (take a look at current HEAD).
> >
> > Also, I think it introduces a weird corner case where we take a page fault
> > when writing the signal frame to the user stack to deliver a SIGSEGV. If
> > we end up with VM_FAULT_RETRY and somebody has sent a SIGKILL to the task,
> > then we'll fail setup_sigframe and force an un-handleable SIGSEGV instead
> > of SIGKILL.
> >
> > The end result (task is killed) is the same, but the fatal signal is wrong.
>
> That doesn't seem to be the case, testing on v4.13-rc5.
>
> I used sigaltstack() to use the userfaultfd region as the signal stack,
> registered a SIGSEGV handler, and dereferenced NULL. The task locks up,
> but when killed with a SIGINT or SIGKILL, the exit status reflects that
> signal, rather than the SIGSEGV.
>
> If I move the SIGINT handler onto the userfaultfd-monitored stack, then
> delivering SIGINT hangs, but can be killed with SIGKILL, and the exit
> status reflects that SIGKILL.
>
> As you say, it does look like we'd try to set up a deferred SIGSEGV for
> the failed signal delivery.
>
> I haven't yet figured out exactly how that works; I'll keep digging.

The SEGV makes it all the way into do_group_exit, but then signal_group_exit
is set and the exit_code is overridden with SIGKILL at the last minute (see
complete_signal).

So I'm happy that your patch is doing the right thing -- could you send a
rebased version please?

Thanks,

Will

2017-08-22 10:42:12

by Mark Rutland

Subject: Re: [PATCH 2/2] arm: mm: abort uaccess retries upon fatal signal

On Tue, Jul 11, 2017 at 03:19:23PM +0100, Mark Rutland wrote:
> When there's a fatal signal pending, arm's do_page_fault()
> implementation returns 0. The intent is that we'll return to the
> faulting userspace instruction, delivering the signal on the way.
>
> However, if we take a fatal signal during fixing up a uaccess, this
> results in a return to the faulting kernel instruction, which will be
> instantly retried, resulting in the same fault being taken forever. As
> the task never reaches userspace, the signal is not delivered, and the
> task is left unkillable. While the task is stuck in this state, it can
> inhibit the forward progress of the system.
>
> To avoid this, we must ensure that when a fatal signal is pending, we
> apply any necessary fixup for a faulting kernel instruction. Thus we
> will return to an error path, and it is up to that code to make forward
> progress towards delivering the fatal signal.
>
> Signed-off-by: Mark Rutland <[email protected]>
> Reviewed-by: Steve Capper <[email protected]>
> Cc: Russell King <[email protected]>
> Cc: [email protected]

Russell, on the assumption that you're happy with this as-is, I've
dropped it into the patch system as 8692/1.

Thanks,
Mark.

> ---
> arch/arm/mm/fault.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
> index ff8b0aa..42f5853 100644
> --- a/arch/arm/mm/fault.c
> +++ b/arch/arm/mm/fault.c
> @@ -315,8 +315,11 @@ static inline bool access_error(unsigned int fsr, struct vm_area_struct *vma)
> * signal first. We do not need to release the mmap_sem because
> * it would already be released in __lock_page_or_retry in
> * mm/filemap.c. */
> - if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
> + if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current)) {
> + if (!user_mode(regs))
> + goto no_context;
> return 0;
> + }
>
> /*
> * Major/minor page fault accounting is only done on the
> --
> 1.9.1
>

2017-11-14 06:51:23

by Rabin Vincent

Subject: Re: [PATCH 1/2] arm64: mm: abort uaccess retries upon fatal signal

On Tue, Aug 22, 2017 at 10:45:24AM +0100, Will Deacon wrote:
> On Mon, Aug 21, 2017 at 02:42:03PM +0100, Mark Rutland wrote:
> > On Tue, Jul 11, 2017 at 03:58:49PM +0100, Will Deacon wrote:
> > > On Tue, Jul 11, 2017 at 03:19:22PM +0100, Mark Rutland wrote:
> > > > When there's a fatal signal pending, arm64's do_page_fault()
> > > > implementation returns 0. The intent is that we'll return to the
> > > > faulting userspace instruction, delivering the signal on the way.
> > > >
> > > > However, if we take a fatal signal during fixing up a uaccess, this
> > > > results in a return to the faulting kernel instruction, which will be
> > > > instantly retried, resulting in the same fault being taken forever. As
> > > > the task never reaches userspace, the signal is not delivered, and the
> > > > task is left unkillable. While the task is stuck in this state, it can
> > > > inhibit the forward progress of the system.
> > > >
> > > > To avoid this, we must ensure that when a fatal signal is pending, we
> > > > apply any necessary fixup for a faulting kernel instruction. Thus we
> > > > will return to an error path, and it is up to that code to make forward
> > > > progress towards delivering the fatal signal.
> > > >
> > > > Signed-off-by: Mark Rutland <[email protected]>
> > > > Reviewed-by: Steve Capper <[email protected]>
> > > > Tested-by: Steve Capper <[email protected]>
> > > > Cc: Catalin Marinas <[email protected]>
> > > > Cc: James Morse <[email protected]>
> > > > Cc: Laura Abbott <[email protected]>
> > > > Cc: Will Deacon <[email protected]>
> > > > Cc: [email protected]
> > > > ---
> > > > arch/arm64/mm/fault.c | 5 ++++-
> > > > 1 file changed, 4 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> > > > index 37b95df..3952d5e 100644
> > > > --- a/arch/arm64/mm/fault.c
> > > > +++ b/arch/arm64/mm/fault.c
> > > > @@ -397,8 +397,11 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
> > > > * signal first. We do not need to release the mmap_sem because it
> > > > * would already be released in __lock_page_or_retry in mm/filemap.c.
> > > > */
> > > > - if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
> > > > + if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current)) {
> > > > + if (!user_mode(regs))
> > > > + goto no_context;
> > > > return 0;
> > > > + }
> > >
> > > This will need rebasing at -rc1 (take a look at current HEAD).
> > >
> > > Also, I think it introduces a weird corner case where we take a page fault
> > > when writing the signal frame to the user stack to deliver a SIGSEGV. If
> > > we end up with VM_FAULT_RETRY and somebody has sent a SIGKILL to the task,
> > > then we'll fail setup_sigframe and force an un-handleable SIGSEGV instead
> > > of SIGKILL.
> > >
> > > The end result (task is killed) is the same, but the fatal signal is wrong.
> >
> > That doesn't seem to be the case, testing on v4.13-rc5.
> >
> > I used sigaltstack() to use the userfaultfd region as the signal stack,
> > registered a SIGSEGV handler, and dereferenced NULL. The task locks up,
> > but when killed with a SIGINT or SIGKILL, the exit status reflects that
> > signal, rather than the SIGSEGV.
> >
> > If I move the SIGINT handler onto the userfaultfd-monitored stack, then
> > delivering SIGINT hangs, but can be killed with SIGKILL, and the exit
> > status reflects that SIGKILL.
> >
> > As you say, it does look like we'd try to set up a deferred SIGSEGV for
> > the failed signal delivery.
> >
> > I haven't yet figured out exactly how that works; I'll keep digging.
>
> The SEGV makes it all the way into do_group_exit, but then signal_group_exit
> is set and the exit_code is overridden with SIGKILL at the last minute (see
> complete_signal).

Unfortunately, this last minute is too late for print-fatal-signals.
With print-fatal-signals enabled, this patch leads to misleading
"potentially unexpected fatal signal 11" warnings if a process is
SIGKILL'd at the right time.

I've seen it without userfaultfd, but it's most easily reproduced by
applying the following patch to Mark's original test code [1] and then
simply running "pkill -WINCH foo; pkill -KILL foo". This results in:

foo: potentially unexpected fatal signal 11.
CPU: 1 PID: 1793 Comm: foo Not tainted 4.9.58-devel #3
task: b3534780 task.stack: b4b7c000
PC is at 0x76effb60
LR is at 0x4227f4
pc : [<76effb60>] lr : [<004227f4>] psr: 600b0010
sp : 7eaf7bb4 ip : 00000000 fp : 00000000
r10: 00000001 r9 : 00000003 r8 : 76fcd000
r7 : 0000001d r6 : 76fd0cf0 r5 : 7eaf7c08 r4 : 00000000
r3 : 00000000 r2 : 00000000 r1 : 7eaf7a88 r0 : fffffffc
Flags: nZCv IRQs on FIQs on Mode USER_32 ISA ARM Segment user
Control: 10c5387d Table: 3357404a DAC: 00000055
CPU: 1 PID: 1793 Comm: foo Not tainted 4.9.58-devel #3
[<801113f0>] (unwind_backtrace) from [<8010cfb0>] (show_stack+0x18/0x1c)
[<8010cfb0>] (show_stack) from [<8039725c>] (dump_stack+0x84/0x98)
[<8039725c>] (dump_stack) from [<8012f448>] (get_signal+0x384/0x684)
[<8012f448>] (get_signal) from [<8010c2ec>] (do_signal+0xcc/0x470)
[<8010c2ec>] (do_signal) from [<8010c868>] (do_work_pending+0xb8/0xc8)
[<8010c868>] (do_work_pending) from [<801086d4>] (slow_work_pending+0xc/0x20)

This is ARM and I haven't tested ARM64, but the same problem even exists
on x86.

--- foo.c.orig 2017-11-13 23:45:47.802167284 +0100
+++ foo.c 2017-11-14 07:16:13.906363466 +0100
@@ -6,6 +6,11 @@
 #include <sys/syscall.h>
 #include <sys/vfs.h>
 #include <unistd.h>
+#include <signal.h>
+
+static void handler(int sig)
+{
+}

 int main(int argc, char *argv[])
 {
@@ -47,13 +52,17 @@
 	if (ret < 0)
 		return errno;

+	sigaltstack(&(stack_t){.ss_sp = mem, .ss_size = pagesz}, NULL);
+	sigaction(SIGWINCH, &(struct sigaction){ .sa_handler = handler, .sa_flags = SA_ONSTACK, }, NULL);
+
 	/*
 	 * Force an arbitrary uaccess to memory monitored by the userfaultfd.
 	 * This will block, but when a SIGKILL is sent, will consume all
 	 * available CPU time without being killed, and may inhibit forward
 	 * progress of the system.
 	 */
-	ret = fstatfs(0, (struct statfs *)mem);
+	// ret = fstatfs(0, (struct statfs *)mem);
+	pause();

 	return 0;
 }

[1] https://lkml.kernel.org/r/[email protected]
