2021-07-25 17:35:23

by Al Viro

[permalink] [raw]
Subject: [RFC][CFT] signal handling fixes

Back in 2012 or so I'd found a bunch of fun issues with multiple
pending signals on a lot of architectures. m68k looked scarier than
usual (due to the combination of variable-sized exception frames with the
way kernel stack pointer is handled by the hardware), but I'd convinced
myself that it had been correct.

Unfortunately, I was wrong - handling of multiple pending signals
does *not* work correctly there.

Some background: wrt exception stack frames m68k variants fall
into 3 groups -
1) 68000 and near relatives (all non-MMU): push 32bit PC, then
push 16bit SR.
2) everything later than that, except for coldfire: push a
variable amount of auxillary data (used for insn restart, among other
things), then 16bit value encoding the format (upper 4 bits) and vector
(lower 12), then same as for (1) - 32bit PC and 16bit SR. Size of
variable part depends upon the frame type (upper 4 bits of frame/vector
word). Note that CPU32 falls into that group, even though it's non-MMU.
3) coldfire (both MMU and non-MMU): push 32bit PC, then 16bit SR,
then 16bit word superficially similar to format/vector combination on (2).

Handling of (2) is complicated, since we need the right frame
type when we go from kernel to userland. In particular, we want format 0
(8-byte) for entering a signal handler, no matter how did we enter the
kernel when we caught the signal. Conversely, when we return from signal
handler, we have format 0 on kernel entry (sigreturn(2) is a syscall) and
we need whatever frame we used to have back when we'd caught the signal.

The monumentally unpleasant part is that we *must* leave the
kernel mode with the same value of kernel stack pointer we had on entry.
Crossing from user to kernel mode does not set the kernel stack pointer
to known value - its value is kept since the last time we'd left the
kernel mode.

The sigreturn part is ugly as hell. Signal delivery avoids
quite that level of nastiness by the following trick: there's an int
(stkadj, initially 0) between the exception frame and the rest of pt_regs.
On the way back from exception we pop the registers, then add stkadj +
4 to stack pointer before doing RTE. Normally stkadj remains zero;
if we need to shrink the exception stack frame, we put the minimal one
over the aux data and store the offset into stkadj. When on the way out
we pop our way through the registers, we'll end up with stack pointer
pointing to stkadj (4 bytes below the original exception stack frame)
and once we add 4 + stkadj to stack pointer, we have the minimal exception
stack frame on top of stack and RTE does the right thing.

The problem with that trick is that exception stack frame in
pt_regs is no longer valid; in the best case regs->format will match
the original exception stack frame format and in the worst case it'll
be overwritten by bits 31..28 of signal handler address (if aux data
used to be 4 bytes long).

ptrace get_regs()/set_regs() takes that effect into account when
accessing PC and SR; unfortunately, setup_frame() and setup_rt_frame()
do not. That's not a problem for the first signal - ->stkadj is still
0 at that point. However, if we end up building multiple sigframes,
we might get screwed. Not hard to fix, thankfully...

Another bug is on the sigreturn side of things, and that one is
my fault - in bd6f56a75bb2 ("m68k: Missing syscall_trace() on sigreturn")
I'd missed the fact that we'd just relocated pt_regs, without having
updated ->thread.esp0.

These two are, IMO, -stable fodder. The third one isn't -
it cleans sigreturn, hopefully making it less convoluted. Instead of
doing unnatural things to C stack frames, call chain, etc. we let the
asm wrapper of {rt_,}sigreturn(2) do the following:
reserve a gap on stack, sufficiently large to hold any aux data
then call the C side of things, passing pt_regs and switch_stack
pointers to it, same as we do now if C part decides to insert aux data,
it can simply use memmove
to shift switch_stack + pt_regs and memcpy whatever's needed into the
created gap. Then we can return without any kind of magic - the C part
of stack is intact. Just return the new location of switch_stack +
pt_regs to the (asm) caller, so it could set stack pointer to it.

The series is on top of 5.14-rc1; it lives in
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git #untested.m68k
Individual patches in followups...

_Very_ lightly tested on aranym; no real hardware to test it on.
Any help with review and testing would be very welcome.

PS: FWIW, ifdefs in arch/m68k/kernel/signal.c are wrong - it's not !MMU
vs. coldfire/MMU vs. classic/MMU. It's actually 68000 vs. coldfire vs.
everything else. These days it's nearly correct, but only because on MMU
variants of coldfire we never see exception stack frames with type other
than 4 - it's controlled by alignment of kernel stack pointer on those,
and it's under the kernel control, so it's always 32bit-aligned. It used
to be more serious back when we had 68360 support - that's !MMU and exception
stack frames are like those on 68020, unless I'm misreading their manual...


2021-07-25 17:35:54

by Al Viro

[permalink] [raw]
Subject: [PATCH 1/3] m68k: handle arrivals of multiple signals correctly

When we have several pending signals, have entered with the kernel
with large exception frame *and* have already built at least one
sigframe, regs->stkadj is going to be non-zero and regs->format/sr/pc
are going to be junk - the real values are in shifted exception stack
frame we'd built when putting together the first sigframe.

If that happens, subsequent sigframes are going to be garbage.
Not hard to fix - just need to find the "adjusted" frame first
and look for format/vector/sr/pc in it.

Signed-off-by: Al Viro <[email protected]>
---
arch/m68k/kernel/signal.c | 88 ++++++++++++++++++++++-------------------------
1 file changed, 42 insertions(+), 46 deletions(-)

diff --git a/arch/m68k/kernel/signal.c b/arch/m68k/kernel/signal.c
index 8f215e79e70e..cd11eb101eac 100644
--- a/arch/m68k/kernel/signal.c
+++ b/arch/m68k/kernel/signal.c
@@ -447,7 +447,7 @@ static inline void save_fpu_state(struct sigcontext *sc, struct pt_regs *regs)

if (CPU_IS_060 ? sc->sc_fpstate[2] : sc->sc_fpstate[0]) {
fpu_version = sc->sc_fpstate[0];
- if (CPU_IS_020_OR_030 &&
+ if (CPU_IS_020_OR_030 && !regs->stkadj &&
regs->vector >= (VEC_FPBRUC * 4) &&
regs->vector <= (VEC_FPNAN * 4)) {
/* Clear pending exception in 68882 idle frame */
@@ -510,7 +510,7 @@ static inline int rt_save_fpu_state(struct ucontext __user *uc, struct pt_regs *
if (!(CPU_IS_060 || CPU_IS_COLDFIRE))
context_size = fpstate[1];
fpu_version = fpstate[0];
- if (CPU_IS_020_OR_030 &&
+ if (CPU_IS_020_OR_030 && !regs->stkadj &&
regs->vector >= (VEC_FPBRUC * 4) &&
regs->vector <= (VEC_FPNAN * 4)) {
/* Clear pending exception in 68882 idle frame */
@@ -832,18 +832,24 @@ asmlinkage int do_rt_sigreturn(struct pt_regs *regs, struct switch_stack *sw)
return 0;
}

+static inline struct pt_regs *rte_regs(struct pt_regs *regs)
+{
+ return (void *)regs + regs->stkadj;
+}
+
static void setup_sigcontext(struct sigcontext *sc, struct pt_regs *regs,
unsigned long mask)
{
+ struct pt_regs *tregs = rte_regs(regs);
sc->sc_mask = mask;
sc->sc_usp = rdusp();
sc->sc_d0 = regs->d0;
sc->sc_d1 = regs->d1;
sc->sc_a0 = regs->a0;
sc->sc_a1 = regs->a1;
- sc->sc_sr = regs->sr;
- sc->sc_pc = regs->pc;
- sc->sc_formatvec = regs->format << 12 | regs->vector;
+ sc->sc_sr = tregs->sr;
+ sc->sc_pc = tregs->pc;
+ sc->sc_formatvec = tregs->format << 12 | tregs->vector;
save_a5_state(sc, regs);
save_fpu_state(sc, regs);
}
@@ -851,6 +857,7 @@ static void setup_sigcontext(struct sigcontext *sc, struct pt_regs *regs,
static inline int rt_setup_ucontext(struct ucontext __user *uc, struct pt_regs *regs)
{
struct switch_stack *sw = (struct switch_stack *)regs - 1;
+ struct pt_regs *tregs = rte_regs(regs);
greg_t __user *gregs = uc->uc_mcontext.gregs;
int err = 0;

@@ -871,9 +878,9 @@ static inline int rt_setup_ucontext(struct ucontext __user *uc, struct pt_regs *
err |= __put_user(sw->a5, &gregs[13]);
err |= __put_user(sw->a6, &gregs[14]);
err |= __put_user(rdusp(), &gregs[15]);
- err |= __put_user(regs->pc, &gregs[16]);
- err |= __put_user(regs->sr, &gregs[17]);
- err |= __put_user((regs->format << 12) | regs->vector, &uc->uc_formatvec);
+ err |= __put_user(tregs->pc, &gregs[16]);
+ err |= __put_user(tregs->sr, &gregs[17]);
+ err |= __put_user((tregs->format << 12) | tregs->vector, &uc->uc_formatvec);
err |= rt_save_fpu_state(uc, regs);
return err;
}
@@ -890,13 +897,14 @@ static int setup_frame(struct ksignal *ksig, sigset_t *set,
struct pt_regs *regs)
{
struct sigframe __user *frame;
- int fsize = frame_extra_sizes(regs->format);
+ struct pt_regs *tregs = rte_regs(regs);
+ int fsize = frame_extra_sizes(tregs->format);
struct sigcontext context;
int err = 0, sig = ksig->sig;

if (fsize < 0) {
pr_debug("setup_frame: Unknown frame format %#x\n",
- regs->format);
+ tregs->format);
return -EFAULT;
}

@@ -907,7 +915,7 @@ static int setup_frame(struct ksignal *ksig, sigset_t *set,

err |= __put_user(sig, &frame->sig);

- err |= __put_user(regs->vector, &frame->code);
+ err |= __put_user(tregs->vector, &frame->code);
err |= __put_user(&frame->sc, &frame->psc);

if (_NSIG_WORDS > 1)
@@ -934,33 +942,27 @@ static int setup_frame(struct ksignal *ksig, sigset_t *set,
push_cache ((unsigned long) &frame->retcode);

/*
- * Set up registers for signal handler. All the state we are about
- * to destroy is successfully copied to sigframe.
- */
- wrusp ((unsigned long) frame);
- regs->pc = (unsigned long) ksig->ka.sa.sa_handler;
- adjustformat(regs);
-
- /*
* This is subtle; if we build more than one sigframe, all but the
* first one will see frame format 0 and have fsize == 0, so we won't
* screw stkadj.
*/
- if (fsize)
+ if (fsize) {
regs->stkadj = fsize;
-
- /* Prepare to skip over the extra stuff in the exception frame. */
- if (regs->stkadj) {
- struct pt_regs *tregs =
- (struct pt_regs *)((ulong)regs + regs->stkadj);
+ tregs = rte_regs(regs);
pr_debug("Performing stackadjust=%04lx\n", regs->stkadj);
- /* This must be copied with decreasing addresses to
- handle overlaps. */
tregs->vector = 0;
tregs->format = 0;
- tregs->pc = regs->pc;
tregs->sr = regs->sr;
}
+
+ /*
+ * Set up registers for signal handler. All the state we are about
+ * to destroy is successfully copied to sigframe.
+ */
+ wrusp ((unsigned long) frame);
+ tregs->pc = (unsigned long) ksig->ka.sa.sa_handler;
+ adjustformat(regs);
+
return 0;
}

@@ -968,7 +970,8 @@ static int setup_rt_frame(struct ksignal *ksig, sigset_t *set,
struct pt_regs *regs)
{
struct rt_sigframe __user *frame;
- int fsize = frame_extra_sizes(regs->format);
+ struct pt_regs *tregs = rte_regs(regs);
+ int fsize = frame_extra_sizes(tregs->format);
int err = 0, sig = ksig->sig;

if (fsize < 0) {
@@ -1019,33 +1022,26 @@ static int setup_rt_frame(struct ksignal *ksig, sigset_t *set,
push_cache ((unsigned long) &frame->retcode);

/*
- * Set up registers for signal handler. All the state we are about
- * to destroy is successfully copied to sigframe.
- */
- wrusp ((unsigned long) frame);
- regs->pc = (unsigned long) ksig->ka.sa.sa_handler;
- adjustformat(regs);
-
- /*
* This is subtle; if we build more than one sigframe, all but the
* first one will see frame format 0 and have fsize == 0, so we won't
* screw stkadj.
*/
- if (fsize)
+ if (fsize) {
regs->stkadj = fsize;
-
- /* Prepare to skip over the extra stuff in the exception frame. */
- if (regs->stkadj) {
- struct pt_regs *tregs =
- (struct pt_regs *)((ulong)regs + regs->stkadj);
+ tregs = rte_regs(regs);
pr_debug("Performing stackadjust=%04lx\n", regs->stkadj);
- /* This must be copied with decreasing addresses to
- handle overlaps. */
tregs->vector = 0;
tregs->format = 0;
- tregs->pc = regs->pc;
tregs->sr = regs->sr;
}
+
+ /*
+ * Set up registers for signal handler. All the state we are about
+ * to destroy is successfully copied to sigframe.
+ */
+ wrusp ((unsigned long) frame);
+ tregs->pc = (unsigned long) ksig->ka.sa.sa_handler;
+ adjustformat(regs);
return 0;
}

--
2.11.0

2021-07-25 17:38:14

by Al Viro

[permalink] [raw]
Subject: [PATCH 2/3] m68k: update ->thread.esp0 before calling syscall_trace() in ret_from_signal

We get there when sigreturn has performed obscene acts on kernel stack;
in particular, the location of pt_regs has shifted. We are about to call
syscall_trace(), which might stop for tracer. If that happens, we'd better
have task_pt_regs() returning correct result...

Fucked-up-by: Al Viro <[email protected]>
Fixes: bd6f56a75bb2 ("m68k: Missing syscall_trace() on sigreturn")
Signed-off-by: Al Viro <[email protected]>
---
arch/m68k/kernel/entry.S | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/m68k/kernel/entry.S b/arch/m68k/kernel/entry.S
index 9dd76fbb7c6b..ff9e842cec0f 100644
--- a/arch/m68k/kernel/entry.S
+++ b/arch/m68k/kernel/entry.S
@@ -186,6 +186,8 @@ ENTRY(ret_from_signal)
movel %curptr@(TASK_STACK),%a1
tstb %a1@(TINFO_FLAGS+2)
jge 1f
+ lea %sp@(SWITCH_STACK_SIZE),%a1
+ movel %a1,%curptr@(TASK_THREAD+THREAD_ESP0)
jbsr syscall_trace
1: RESTORE_SWITCH_STACK
addql #4,%sp
--
2.11.0

2021-07-27 10:25:12

by Finn Thain

[permalink] [raw]
Subject: Re: [RFC][CFT] signal handling fixes

On Sun, 25 Jul 2021, Al Viro wrote:

>
> The series is on top of 5.14-rc1; it lives in
> git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git #untested.m68k
> Individual patches in followups...
>
> _Very_ lightly tested on aranym; no real hardware to test it on.
> Any help with review and testing would be very welcome.
>

I can test this branch on a Motorola 68040 machine I have here. Can you
advise how to get decent code coverage? Maybe there's a package out there
with a signal-heavy test suite? Maybe I need a break point in a signal
handler? Or perhaps just send ^C to a process running under strace?

2021-07-27 14:49:04

by Al Viro

[permalink] [raw]
Subject: Re: [RFC][CFT] signal handling fixes

On Tue, Jul 27, 2021 at 08:21:52PM +1000, Finn Thain wrote:
> On Sun, 25 Jul 2021, Al Viro wrote:
>
> >
> > The series is on top of 5.14-rc1; it lives in
> > git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git #untested.m68k
> > Individual patches in followups...
> >
> > _Very_ lightly tested on aranym; no real hardware to test it on.
> > Any help with review and testing would be very welcome.
> >
>
> I can test this branch on a Motorola 68040 machine I have here. Can you
> advise how to get decent code coverage? Maybe there's a package out there
> with a signal-heavy test suite? Maybe I need a break point in a signal
> handler? Or perhaps just send ^C to a process running under strace?

Generally, SIGINT is not the best insertion vector...

Set a handler of e.g. SIGALRM with sigaction(), with a couple of other signals
in sa_mask (e.g. SIGUSR1 and SIGUSR2). With raise() on those inside the
SIGALRM handler - then they will become deliverable on return from handler.
And have SIGUSR1 and SIGUSR2 handlers print siginfo and ucontext contents
(have them set with SA_SIGINFO in sa_flags, look at the second and third
arguments of sighandler).

Use alarm(2) to arrange for SIGALRM and sit in a tight loop - that'll give you
delivery on return from interrupt. Alternatively, raise(SIGALRM) will give
you delivery on return from trap. And making that a SIGBUS handler instead,
mmapping a file, truncating it to 0 and dereferencing something in mmapped
area will give you delivery on return from access error trap. Division by
zero (and insertion handler on SIGFPE) ought to give you a type 2 exception
stack frame (4 bytes of aux data, that makes shifted exception frame bugger
format and vector fields of the original).

FWIW, the third argument of handler points to
struct ucontext {
unsigned long uc_flags;
struct ucontext *uc_link;
stack_t uc_stack;
struct mcontext uc_mcontext;
unsigned long uc_filler[80];
sigset_t uc_sigmask; /* mask last for extensibility */
};
and type/vector is stored in uc_filler[54] (216 bytes into the array), with
aux data from exception stack frame starting from uc_filler[55].

2021-07-28 01:25:13

by Finn Thain

[permalink] [raw]
Subject: Re: [RFC][CFT] signal handling fixes

On Tue, 27 Jul 2021, Al Viro wrote:

> On Tue, Jul 27, 2021 at 08:21:52PM +1000, Finn Thain wrote:
> > On Sun, 25 Jul 2021, Al Viro wrote:
> >
> > >
> > > The series is on top of 5.14-rc1; it lives in
> > > git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git #untested.m68k
> > > Individual patches in followups...
> > >
> > > _Very_ lightly tested on aranym; no real hardware to test it on.
> > > Any help with review and testing would be very welcome.
> > >
> >
> > I can test this branch on a Motorola 68040 machine I have here. Can you
> > advise how to get decent code coverage? Maybe there's a package out there
> > with a signal-heavy test suite? Maybe I need a break point in a signal
> > handler? Or perhaps just send ^C to a process running under strace?
>
> Generally, SIGINT is not the best insertion vector...
>

True. I see that 'man 7 signal' says that SIGQUIT will produce a coredump.
Would that contain anything of interest?

> Set a handler of e.g. SIGALRM with sigaction(), with a couple of other signals
> in sa_mask (e.g. SIGUSR1 and SIGUSR2). With raise() on those inside the
> SIGALRM handler - then they will become deliverable on return from handler.
> And have SIGUSR1 and SIGUSR2 handlers print siginfo and ucontext contents
> (have them set with SA_SIGINFO in sa_flags, look at the second and third
> arguments of sighandler).
>
> Use alarm(2) to arrange for SIGALRM and sit in a tight loop - that'll give you
> delivery on return from interrupt. Alternatively, raise(SIGALRM) will give
> you delivery on return from trap. And making that a SIGBUS handler instead,
> mmapping a file, truncating it to 0 and dereferencing something in mmapped
> area will give you delivery on return from access error trap. Division by
> zero (and insertion handler on SIGFPE) ought to give you a type 2 exception
> stack frame (4 bytes of aux data, that makes shifted exception frame bugger
> format and vector fields of the original).
>
> FWIW, the third argument of handler points to
> struct ucontext {
> unsigned long uc_flags;
> struct ucontext *uc_link;
> stack_t uc_stack;
> struct mcontext uc_mcontext;
> unsigned long uc_filler[80];
> sigset_t uc_sigmask; /* mask last for extensibility */
> };
> and type/vector is stored in uc_filler[54] (216 bytes into the array), with
> aux data from exception stack frame starting from uc_filler[55].
>

OK, give me a week or so and I'll see what I can come up with.

2021-08-11 01:43:25

by Finn Thain

[permalink] [raw]
Subject: Re: [RFC][CFT] signal handling fixes

Hi Al,

On Tue, 27 Jul 2021, Al Viro wrote:

> Set a handler of e.g. SIGALRM with sigaction(), with a couple of other
> signals in sa_mask (e.g. SIGUSR1 and SIGUSR2). With raise() on those
> inside the SIGALRM handler - then they will become deliverable on return
> from handler. And have SIGUSR1 and SIGUSR2 handlers print siginfo and
> ucontext contents (have them set with SA_SIGINFO in sa_flags, look at
> the second and third arguments of sighandler).
>
> Use alarm(2) to arrange for SIGALRM and sit in a tight loop - that'll
> give you delivery on return from interrupt. Alternatively,
> raise(SIGALRM) will give you delivery on return from trap. And making
> that a SIGBUS handler instead, mmapping a file, truncating it to 0 and
> dereferencing something in mmapped area will give you delivery on return
> from access error trap. Division by zero (and insertion handler on
> SIGFPE) ought to give you a type 2 exception stack frame (4 bytes of aux
> data, that makes shifted exception frame bugger format and vector fields
> of the original).
>
> FWIW, the third argument of handler points to
> struct ucontext {
> unsigned long uc_flags;
> struct ucontext *uc_link;
> stack_t uc_stack;
> struct mcontext uc_mcontext;
> unsigned long uc_filler[80];
> sigset_t uc_sigmask; /* mask last for extensibility */
> };
> and type/vector is stored in uc_filler[54] (216 bytes into the array),
> with aux data from exception stack frame starting from uc_filler[55].
>

I wrote the attached program to implement those tests. I linked it
statically and ran it under "setarch m68k --addr-no-randomize" on three
systems: Aranym, Qemu and Quadra 630. On each system I tested two builds,
1) stock 5.14.0-rc4 and 2) your "untested.m68k" branch rebased onto same.

Everything appears to work normally. I didn't see differences in ucontext
data between mainline build and patched builds, that is, after omitting
"random" differences that always occur from one test run to the next.

(Despite my attempt to avoid random addresses, repeating any test produced
some "random" values in uc_filler. I didn't try to find out what these
values represent. They appear in both builds.)

Do I need to run the program under gdb or strace to see the effect of your
changes?

BTW, I did see some differences between the Motorola 68040 and the
emulated 68040 CPUs.

On the Motorola CPU, uc_filler[54] is 0x00000078 for the signals delivered
on return from interrupt, but Qemu has 0x00000064 and Aranym has either
0x00000070 or 0x00000114.

Another discrepancy is uc_filler[55..67] for the SIGBUS case:

Motorola:
000000d0 ffffffff ffffffff 00007008 effffc90 ..........p.....
000000e0 05210001 00210001 c0000000 00985fec .!...!........_.
000000f0 0000001e 8006aeae 80000e94 0000001e ................
00000100 00000004 00000000 00008001 574c0080 ............WL..

Aranym:
000000d0 ffffffff ffffffff 00007008 c0000000 ..........p.....
000000e0 05210000 00000000 c0000000 c0000000 .!..............
000000f0 8000817c 00000000 00000000 00000000 ...|............
00000100 00000000 00000000 00000000 00000000 ................

Qemu:
000000d0 ffffffff ffffffff 00007008 c0000000 ..........p.....
000000e0 05210000 00000000 c0000000 c0000000 .!..............
000000f0 00000000 00000000 00000000 00000000 ................
00000100 00000000 00000000 00000000 00000000 ................

The other signals don't show discrepancies in uc_filler across CPU types
(that is, after omitting "random" values).

I wonder whether the deviation in emulator behaviour could have
consequences. E.g. I have heard a bug report relating to gdb under
qemu-system-m68k. Perhaps there's a connection.


Attachments:
nohup.out-5.14.0-rc4-multi-00003-g420aec9e726e-aranym (16.34 kB)
nohup.out-5.14.0-rc4-multi-00003-g420aec9e726e-q630 (17.29 kB)
nohup.out-5.14.0-rc4-multi-00003-g420aec9e726e-qemu (17.34 kB)
nohup.out-5.14.0-rc4-multi-aranym (16.32 kB)
nohup.out-5.14.0-rc4-multi-q630 (17.27 kB)
nohup.out-5.14.0-rc4-multi-qemu (17.32 kB)
sig.c (2.62 kB)
Download all attachments

2021-09-15 22:10:05

by Michael Schmitz

[permalink] [raw]
Subject: Re: [PATCH 1/3] m68k: handle arrivals of multiple signals correctly

Hi Al,

On 26/07/21 5:19 am, Al Viro wrote:
> When we have several pending signals, have entered with the kernel
> with large exception frame *and* have already built at least one
> sigframe, regs->stkadj is going to be non-zero and regs->format/sr/pc
> are going to be junk - the real values are in shifted exception stack
> frame we'd built when putting together the first sigframe.
>
> If that happens, subsequent sigframes are going to be garbage.
> Not hard to fix - just need to find the "adjusted" frame first
> and look for format/vector/sr/pc in it.
>
> Signed-off-by: Al Viro <[email protected]>

Looks good to me. What's more, it fixes a number of long standing issues
dating back to the 4.10 ages - see discussions at:

https://lore.kernel.org/r/[email protected]

https://lore.kernel.org/r/[email protected]

- so should be applied to -stable IMO.

Tested by me on 68030 - Finn Thain did some testing on 68040 and might
add his own tag.

Tested-by: Michael Schmitz <[email protected]>

Reviewed-by: Michael Schmitz <[email protected]>

> ---
> arch/m68k/kernel/signal.c | 88 ++++++++++++++++++++++-------------------------
> 1 file changed, 42 insertions(+), 46 deletions(-)
>
> diff --git a/arch/m68k/kernel/signal.c b/arch/m68k/kernel/signal.c
> index 8f215e79e70e..cd11eb101eac 100644
> --- a/arch/m68k/kernel/signal.c
> +++ b/arch/m68k/kernel/signal.c
> @@ -447,7 +447,7 @@ static inline void save_fpu_state(struct sigcontext *sc, struct pt_regs *regs)
>
> if (CPU_IS_060 ? sc->sc_fpstate[2] : sc->sc_fpstate[0]) {
> fpu_version = sc->sc_fpstate[0];
> - if (CPU_IS_020_OR_030 &&
> + if (CPU_IS_020_OR_030 && !regs->stkadj &&
> regs->vector >= (VEC_FPBRUC * 4) &&
> regs->vector <= (VEC_FPNAN * 4)) {
> /* Clear pending exception in 68882 idle frame */
> @@ -510,7 +510,7 @@ static inline int rt_save_fpu_state(struct ucontext __user *uc, struct pt_regs *
> if (!(CPU_IS_060 || CPU_IS_COLDFIRE))
> context_size = fpstate[1];
> fpu_version = fpstate[0];
> - if (CPU_IS_020_OR_030 &&
> + if (CPU_IS_020_OR_030 && !regs->stkadj &&
> regs->vector >= (VEC_FPBRUC * 4) &&
> regs->vector <= (VEC_FPNAN * 4)) {
> /* Clear pending exception in 68882 idle frame */
> @@ -832,18 +832,24 @@ asmlinkage int do_rt_sigreturn(struct pt_regs *regs, struct switch_stack *sw)
> return 0;
> }
>
> +static inline struct pt_regs *rte_regs(struct pt_regs *regs)
> +{
> + return (void *)regs + regs->stkadj;
> +}
> +
> static void setup_sigcontext(struct sigcontext *sc, struct pt_regs *regs,
> unsigned long mask)
> {
> + struct pt_regs *tregs = rte_regs(regs);
> sc->sc_mask = mask;
> sc->sc_usp = rdusp();
> sc->sc_d0 = regs->d0;
> sc->sc_d1 = regs->d1;
> sc->sc_a0 = regs->a0;
> sc->sc_a1 = regs->a1;
> - sc->sc_sr = regs->sr;
> - sc->sc_pc = regs->pc;
> - sc->sc_formatvec = regs->format << 12 | regs->vector;
> + sc->sc_sr = tregs->sr;
> + sc->sc_pc = tregs->pc;
> + sc->sc_formatvec = tregs->format << 12 | tregs->vector;
> save_a5_state(sc, regs);
> save_fpu_state(sc, regs);
> }
> @@ -851,6 +857,7 @@ static void setup_sigcontext(struct sigcontext *sc, struct pt_regs *regs,
> static inline int rt_setup_ucontext(struct ucontext __user *uc, struct pt_regs *regs)
> {
> struct switch_stack *sw = (struct switch_stack *)regs - 1;
> + struct pt_regs *tregs = rte_regs(regs);
> greg_t __user *gregs = uc->uc_mcontext.gregs;
> int err = 0;
>
> @@ -871,9 +878,9 @@ static inline int rt_setup_ucontext(struct ucontext __user *uc, struct pt_regs *
> err |= __put_user(sw->a5, &gregs[13]);
> err |= __put_user(sw->a6, &gregs[14]);
> err |= __put_user(rdusp(), &gregs[15]);
> - err |= __put_user(regs->pc, &gregs[16]);
> - err |= __put_user(regs->sr, &gregs[17]);
> - err |= __put_user((regs->format << 12) | regs->vector, &uc->uc_formatvec);
> + err |= __put_user(tregs->pc, &gregs[16]);
> + err |= __put_user(tregs->sr, &gregs[17]);
> + err |= __put_user((tregs->format << 12) | tregs->vector, &uc->uc_formatvec);
> err |= rt_save_fpu_state(uc, regs);
> return err;
> }
> @@ -890,13 +897,14 @@ static int setup_frame(struct ksignal *ksig, sigset_t *set,
> struct pt_regs *regs)
> {
> struct sigframe __user *frame;
> - int fsize = frame_extra_sizes(regs->format);
> + struct pt_regs *tregs = rte_regs(regs);
> + int fsize = frame_extra_sizes(tregs->format);
> struct sigcontext context;
> int err = 0, sig = ksig->sig;
>
> if (fsize < 0) {
> pr_debug("setup_frame: Unknown frame format %#x\n",
> - regs->format);
> + tregs->format);
> return -EFAULT;
> }
>
> @@ -907,7 +915,7 @@ static int setup_frame(struct ksignal *ksig, sigset_t *set,
>
> err |= __put_user(sig, &frame->sig);
>
> - err |= __put_user(regs->vector, &frame->code);
> + err |= __put_user(tregs->vector, &frame->code);
> err |= __put_user(&frame->sc, &frame->psc);
>
> if (_NSIG_WORDS > 1)
> @@ -934,33 +942,27 @@ static int setup_frame(struct ksignal *ksig, sigset_t *set,
> push_cache ((unsigned long) &frame->retcode);
>
> /*
> - * Set up registers for signal handler. All the state we are about
> - * to destroy is successfully copied to sigframe.
> - */
> - wrusp ((unsigned long) frame);
> - regs->pc = (unsigned long) ksig->ka.sa.sa_handler;
> - adjustformat(regs);
> -
> - /*
> * This is subtle; if we build more than one sigframe, all but the
> * first one will see frame format 0 and have fsize == 0, so we won't
> * screw stkadj.
> */
> - if (fsize)
> + if (fsize) {
> regs->stkadj = fsize;
> -
> - /* Prepare to skip over the extra stuff in the exception frame. */
> - if (regs->stkadj) {
> - struct pt_regs *tregs =
> - (struct pt_regs *)((ulong)regs + regs->stkadj);
> + tregs = rte_regs(regs);
> pr_debug("Performing stackadjust=%04lx\n", regs->stkadj);
> - /* This must be copied with decreasing addresses to
> - handle overlaps. */
> tregs->vector = 0;
> tregs->format = 0;
> - tregs->pc = regs->pc;
> tregs->sr = regs->sr;
> }
> +
> + /*
> + * Set up registers for signal handler. All the state we are about
> + * to destroy is successfully copied to sigframe.
> + */
> + wrusp ((unsigned long) frame);
> + tregs->pc = (unsigned long) ksig->ka.sa.sa_handler;
> + adjustformat(regs);
> +
> return 0;
> }
>
> @@ -968,7 +970,8 @@ static int setup_rt_frame(struct ksignal *ksig, sigset_t *set,
> struct pt_regs *regs)
> {
> struct rt_sigframe __user *frame;
> - int fsize = frame_extra_sizes(regs->format);
> + struct pt_regs *tregs = rte_regs(regs);
> + int fsize = frame_extra_sizes(tregs->format);
> int err = 0, sig = ksig->sig;
>
> if (fsize < 0) {
> @@ -1019,33 +1022,26 @@ static int setup_rt_frame(struct ksignal *ksig, sigset_t *set,
> push_cache ((unsigned long) &frame->retcode);
>
> /*
> - * Set up registers for signal handler. All the state we are about
> - * to destroy is successfully copied to sigframe.
> - */
> - wrusp ((unsigned long) frame);
> - regs->pc = (unsigned long) ksig->ka.sa.sa_handler;
> - adjustformat(regs);
> -
> - /*
> * This is subtle; if we build more than one sigframe, all but the
> * first one will see frame format 0 and have fsize == 0, so we won't
> * screw stkadj.
> */
> - if (fsize)
> + if (fsize) {
> regs->stkadj = fsize;
> -
> - /* Prepare to skip over the extra stuff in the exception frame. */
> - if (regs->stkadj) {
> - struct pt_regs *tregs =
> - (struct pt_regs *)((ulong)regs + regs->stkadj);
> + tregs = rte_regs(regs);
> pr_debug("Performing stackadjust=%04lx\n", regs->stkadj);
> - /* This must be copied with decreasing addresses to
> - handle overlaps. */
> tregs->vector = 0;
> tregs->format = 0;
> - tregs->pc = regs->pc;
> tregs->sr = regs->sr;
> }
> +
> + /*
> + * Set up registers for signal handler. All the state we are about
> + * to destroy is successfully copied to sigframe.
> + */
> + wrusp ((unsigned long) frame);
> + tregs->pc = (unsigned long) ksig->ka.sa.sa_handler;
> + adjustformat(regs);
> return 0;
> }
>

2021-09-15 22:22:01

by Michael Schmitz

[permalink] [raw]
Subject: Re: [PATCH 2/3] m68k: update ->thread.esp0 before calling syscall_trace() in ret_from_signal

Hi Al,

On 26/07/21 5:19 am, Al Viro wrote:
> We get there when sigreturn has performed obscene acts on kernel stack;
> in particular, the location of pt_regs has shifted. We are about to call
> syscall_trace(), which might stop for tracer. If that happens, we'd better
> have task_pt_regs() returning correct result...
>
> Fucked-up-by: Al Viro <[email protected]>
> Fixes: bd6f56a75bb2 ("m68k: Missing syscall_trace() on sigreturn")
> Signed-off-by: Al Viro <[email protected]>

Looking good also, and should go to -stable.

Tested-by: Michael Schmitz <[email protected]>

Reviewed-by: Michael Schmitz <[email protected]>

> ---
> arch/m68k/kernel/entry.S | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/arch/m68k/kernel/entry.S b/arch/m68k/kernel/entry.S
> index 9dd76fbb7c6b..ff9e842cec0f 100644
> --- a/arch/m68k/kernel/entry.S
> +++ b/arch/m68k/kernel/entry.S
> @@ -186,6 +186,8 @@ ENTRY(ret_from_signal)
> movel %curptr@(TASK_STACK),%a1
> tstb %a1@(TINFO_FLAGS+2)
> jge 1f
> + lea %sp@(SWITCH_STACK_SIZE),%a1
> + movel %a1,%curptr@(TASK_THREAD+THREAD_ESP0)
> jbsr syscall_trace
> 1: RESTORE_SWITCH_STACK
> addql #4,%sp

2021-09-16 09:05:40

by Finn Thain

[permalink] [raw]
Subject: Re: [RFC][CFT] signal handling fixes

On Sun, 25 Jul 2021, Al Viro wrote:

> ...
>
> PS: FWIW, ifdefs in arch/m68k/kernel/signal.c are wrong - it's not !MMU
> vs. coldfire/MMU vs. classic/MMU. It's actually 68000 vs. coldfire vs.
> everything else. These days it's nearly correct, but only because on
> MMU variants of coldfire we never see exception stack frames with type
> other than 4 - it's controlled by alignment of kernel stack pointer on
> those, and it's under the kernel control, so it's always 32bit-aligned.
> It used to be more serious back when we had 68360 support - that's !MMU
> and exception stack frames are like those on 68020, unless I'm
> misreading their manual...
>

I don't claim to understand this code but CPU32 cores appear to be
unsupported on either #ifdef branch: the MMU branch due to CACR and CAAR
used in push_cache(), and the !MMU branch due to frame format $4 used in
adjust_format().

The CPU32 Reference Manual appendix says these chips only supports control
registers SFC, DFC, VBR and stack frame formats $0, $2, $C.
https://www.nxp.com/files-static/microcontrollers/doc/ref_manual/CPU32RM.pdf

2021-09-23 14:44:51

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [RFC][CFT] signal handling fixes

Hi Finn,

On Thu, Sep 16, 2021 at 11:03 AM Finn Thain <[email protected]> wrote:
> On Sun, 25 Jul 2021, Al Viro wrote:
> > ...
> >
> > PS: FWIW, ifdefs in arch/m68k/kernel/signal.c are wrong - it's not !MMU
> > vs. coldfire/MMU vs. classic/MMU. It's actually 68000 vs. coldfire vs.
> > everything else. These days it's nearly correct, but only because on
> > MMU variants of coldfire we never see exception stack frames with type
> > other than 4 - it's controlled by alignment of kernel stack pointer on
> > those, and it's under the kernel control, so it's always 32bit-aligned.
> > It used to be more serious back when we had 68360 support - that's !MMU
> > and exception stack frames are like those on 68020, unless I'm
> > misreading their manual...
> >
>
> I don't claim to understand this code but CPU32 cores appear to be
> unsupported on either #ifdef branch: the MMU branch due to CACR and CAAR
> used in push_cache(), and the !MMU branch due to frame format $4 used in
> adjust_format().
>
> The CPU32 Reference Manual appendix says these chips only supports control
> registers SFC, DFC, VBR and stack frame formats $0, $2, $C.
> https://www.nxp.com/files-static/microcontrollers/doc/ref_manual/CPU32RM.pdf

As of commit a3595962d82495f5 ("m68knommu: remove obsolete 68360
support"), nothing selects MCPU32 anymore.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2021-09-23 14:47:17

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [RFC][CFT] signal handling fixes

On Sun, Jul 25, 2021 at 7:18 PM Al Viro <[email protected]> wrote:
> Back in 2012 or so I'd found a bunch of fun issues with multiple
> pending signals on a lot of architectures. m68k looked scarier than
> usual (due to the combination of variable-sized exception frames with the
> way kernel stack pointer is handled by the hardware), but I'd convinced
> myself that it had been correct.
>
> Unfortunately, I was wrong - handling of multiple pending signals
> does *not* work correctly there.

[...]

Thank you, queuing in the m68k branch as fixes.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds