2018-06-26 21:18:39

by Mathieu Desnoyers

[permalink] [raw]
Subject: [RFC PATCH for 4.18 1/2] compat: Introduce is_compat_frame

x86 is moving from is_compat_task() to in_compat_syscall(). However,
in_compat_syscall cannot be used to check whether a signal is being
delivered on a compat task.

Introduce is_compat_frame to allow performing this check from
architecture agnostic code. On all architectures except x86, it
invokes is_compat_task(). On x86, it uses is_ia32_frame() and
is_x32_frame() to check whether the signal frame is 32-bit.

This is needed by restartable sequences to detect whether it needs
to clear the top bits of the start_ip, abort_ip, and post_commit_offset
rseq_cs fields on signal delivery, thus ensuring identical behavior
for a 32-bit binary executed on 32-bit and 64-bit kernels.

Signed-off-by: Mathieu Desnoyers <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Dave Watson <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: "H . Peter Anvin" <[email protected]>
Cc: Chris Lameter <[email protected]>
Cc: Russell King <[email protected]>
Cc: Andrew Hunter <[email protected]>
Cc: Michael Kerrisk <[email protected]>
Cc: "Paul E . McKenney" <[email protected]>
Cc: Paul Turner <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Ben Maurer <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Andy Lutomirski <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Linus Torvalds <[email protected]>
---
arch/x86/include/asm/compat.h | 24 ++++++++++++++++++++++++
arch/x86/kernel/signal.c | 17 -----------------
include/linux/compat.h | 17 +++++++++++++++++
3 files changed, 41 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/compat.h b/arch/x86/include/asm/compat.h
index fb97cf7c4137..1405a8df5215 100644
--- a/arch/x86/include/asm/compat.h
+++ b/arch/x86/include/asm/compat.h
@@ -8,6 +8,7 @@
#include <linux/types.h>
#include <linux/sched.h>
#include <linux/sched/task_stack.h>
+#include <linux/signal.h>
#include <asm/processor.h>
#include <asm/user32.h>
#include <asm/unistd.h>
@@ -242,4 +243,27 @@ struct compat_siginfo;
int __copy_siginfo_to_user32(struct compat_siginfo __user *to,
const siginfo_t *from, bool x32_ABI);

+static inline int is_ia32_compat_frame(struct ksignal *ksig)
+{
+ return IS_ENABLED(CONFIG_IA32_EMULATION) &&
+ ksig->ka.sa.sa_flags & SA_IA32_ABI;
+}
+
+static inline int is_ia32_frame(struct ksignal *ksig)
+{
+ return IS_ENABLED(CONFIG_X86_32) || is_ia32_compat_frame(ksig);
+}
+
+static inline int is_x32_frame(struct ksignal *ksig)
+{
+ return IS_ENABLED(CONFIG_X86_X32_ABI) &&
+ ksig->ka.sa.sa_flags & SA_X32_ABI;
+}
+
+static inline bool is_compat_frame(struct ksignal *ksig)
+{
+ return is_ia32_frame(ksig) || is_x32_frame(ksig);
+}
+#define is_compat_frame is_compat_frame /* override the generic impl */
+
#endif /* _ASM_X86_COMPAT_H */
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 92a3b312a53c..cb488e3e952d 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -664,23 +664,6 @@ SYSCALL_DEFINE0(rt_sigreturn)
return 0;
}

-static inline int is_ia32_compat_frame(struct ksignal *ksig)
-{
- return IS_ENABLED(CONFIG_IA32_EMULATION) &&
- ksig->ka.sa.sa_flags & SA_IA32_ABI;
-}
-
-static inline int is_ia32_frame(struct ksignal *ksig)
-{
- return IS_ENABLED(CONFIG_X86_32) || is_ia32_compat_frame(ksig);
-}
-
-static inline int is_x32_frame(struct ksignal *ksig)
-{
- return IS_ENABLED(CONFIG_X86_X32_ABI) &&
- ksig->ka.sa.sa_flags & SA_X32_ABI;
-}
-
static int
setup_rt_frame(struct ksignal *ksig, struct pt_regs *regs)
{
diff --git a/include/linux/compat.h b/include/linux/compat.h
index b1a5562b3215..2e1ffba65117 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -1022,6 +1022,19 @@ static inline struct compat_timeval ns_to_compat_timeval(s64 nsec)
return ctv;
}

+/*
+ * For most but not all architectures, "is this a compat sigframe?" and
+ * "am I a compat task?" are the same question. For architectures on which
+ * they aren't the same question, arch code can override is_compat_frame.
+ */
+
+#ifndef is_compat_frame
+static inline bool is_compat_frame(struct ksignal *ksig)
+{
+ return is_compat_task();
+}
+#endif
+
#else /* !CONFIG_COMPAT */

#define is_compat_task() (0)
@@ -1029,6 +1042,10 @@ static inline struct compat_timeval ns_to_compat_timeval(s64 nsec)
static inline bool in_compat_syscall(void) { return false; }
#endif

+#ifndef is_compat_frame
+static inline bool is_compat_frame(struct ksignal *ksig) { return false; }
+#endif
+
#endif /* CONFIG_COMPAT */

#endif /* _LINUX_COMPAT_H */
--
2.11.0



2018-06-26 21:17:53

by Mathieu Desnoyers

[permalink] [raw]
Subject: [RFC PATCH for 4.18 2/2] rseq: compat: clear high bits of rseq_cs fields

Make the behavior rseq on compat tasks more robust by ensuring that
kernel/rseq.c:rseq_get_rseq_cs() clears the high bits of
rseq_cs->abort_ip, rseq_cs->start_ip and rseq_cs->post_commit_offset
when a 32-bit binary is run on a 64-bit kernel.

The intent here is that if user-space has garbage rather than zeroes
in its struct rseq_cs fields padding, the behavior will be the same
whether the binary is run on 32-bit or 64-bit kernels.

Use in_compat_syscall() when rseq_get_rseq_cs() is invoked from
system call context, and use is_compat_frame() when invoked from
signal delivery.

Signed-off-by: Mathieu Desnoyers <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Dave Watson <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: "H . Peter Anvin" <[email protected]>
Cc: Chris Lameter <[email protected]>
Cc: Russell King <[email protected]>
Cc: Andrew Hunter <[email protected]>
Cc: Michael Kerrisk <[email protected]>
Cc: "Paul E . McKenney" <[email protected]>
Cc: Paul Turner <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Ben Maurer <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Andy Lutomirski <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Linus Torvalds <[email protected]>
---
kernel/rseq.c | 28 +++++++++++++++++++++++-----
1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/kernel/rseq.c b/kernel/rseq.c
index 22b6acf1ad63..7b1d51b965fc 100644
--- a/kernel/rseq.c
+++ b/kernel/rseq.c
@@ -13,6 +13,7 @@
#include <linux/syscalls.h>
#include <linux/rseq.h>
#include <linux/types.h>
+#include <linux/compat.h>
#include <asm/ptrace.h>

#define CREATE_TRACE_POINTS
@@ -112,7 +113,23 @@ static int rseq_reset_rseq_cpu_id(struct task_struct *t)
return 0;
}

-static int rseq_get_rseq_cs(struct task_struct *t, struct rseq_cs *rseq_cs)
+#ifdef CONFIG_COMPAT
+static void rseq_cs_compat(struct ksignal *ksig, struct rseq_cs *rseq_cs)
+{
+ if (!(ksig ? is_compat_frame(ksig) : in_compat_syscall()))
+ return;
+
+ rseq_cs->abort_ip = (compat_uptr_t) rseq_cs->abort_ip;
+ rseq_cs->start_ip = (compat_uptr_t) rseq_cs->start_ip;
+ rseq_cs->post_commit_offset =
+ (compat_uptr_t) rseq_cs->post_commit_offset;
+}
+#else
+static void rseq_cs_compat(struct ksignal *ksig, struct rseq_cs *rseq_cs) { }
+#endif
+
+static int rseq_get_rseq_cs(struct ksignal *ksig, struct task_struct *t,
+ struct rseq_cs *rseq_cs)
{
struct rseq_cs __user *urseq_cs;
unsigned long ptr;
@@ -132,6 +149,7 @@ static int rseq_get_rseq_cs(struct task_struct *t, struct rseq_cs *rseq_cs)
return -EFAULT;
if (rseq_cs->version > 0)
return -EINVAL;
+ rseq_cs_compat(ksig, rseq_cs);

/* Ensure that abort_ip is not in the critical section. */
if (rseq_cs->abort_ip - rseq_cs->start_ip < rseq_cs->post_commit_offset)
@@ -209,14 +227,14 @@ static bool in_rseq_cs(unsigned long ip, struct rseq_cs *rseq_cs)
return ip - rseq_cs->start_ip < rseq_cs->post_commit_offset;
}

-static int rseq_ip_fixup(struct pt_regs *regs)
+static int rseq_ip_fixup(struct ksignal *ksig, struct pt_regs *regs)
{
unsigned long ip = instruction_pointer(regs);
struct task_struct *t = current;
struct rseq_cs rseq_cs;
int ret;

- ret = rseq_get_rseq_cs(t, &rseq_cs);
+ ret = rseq_get_rseq_cs(ksig, t, &rseq_cs);
if (ret)
return ret;

@@ -260,7 +278,7 @@ void __rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs)
return;
if (unlikely(!access_ok(VERIFY_WRITE, t->rseq, sizeof(*t->rseq))))
goto error;
- ret = rseq_ip_fixup(regs);
+ ret = rseq_ip_fixup(ksig, regs);
if (unlikely(ret < 0))
goto error;
if (unlikely(rseq_update_cpu_id(t)))
@@ -287,7 +305,7 @@ void rseq_syscall(struct pt_regs *regs)
if (!t->rseq)
return;
if (!access_ok(VERIFY_READ, t->rseq, sizeof(*t->rseq)) ||
- rseq_get_rseq_cs(t, &rseq_cs) || in_rseq_cs(ip, &rseq_cs))
+ rseq_get_rseq_cs(NULL, t, &rseq_cs) || in_rseq_cs(ip, &rseq_cs))
force_sig(SIGSEGV, t);
}

--
2.11.0


2018-06-27 01:35:25

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [RFC PATCH for 4.18 2/2] rseq: compat: clear high bits of rseq_cs fields



> On Jun 26, 2018, at 2:16 PM, Mathieu Desnoyers <[email protected]> wrote:
>
> Make the behavior rseq on compat tasks more robust by ensuring that
> kernel/rseq.c:rseq_get_rseq_cs() clears the high bits of
> rseq_cs->abort_ip, rseq_cs->start_ip and rseq_cs->post_commit_offset
> when a 32-bit binary is run on a 64-bit kernel.
>
> The intent here is that if user-space has garbage rather than zeroes
> in its struct rseq_cs fields padding, the behavior will be the same
> whether the binary is run on 32-bit or 64-bit kernels.
>
> Use in_compat_syscall() when rseq_get_rseq_cs() is invoked from
> system call context, and use is_compat_frame() when invoked from
> signal delivery.
>

And when it’s invoked due to preemption unrelated to a syscall or signal, you malfunction?

I think the only sane solution is to make these fields be u64, delete the LINUX_FIELD_ macros, and possibly teach the x86 slowpath return to inject a signal if it’s trying to return to a 32-bit context with garbage in the high bits of regs->ip so that we determistically fail if the user screws up.

Rseq is brand new. It should not need compat code at all.

2018-06-27 01:41:40

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [RFC PATCH for 4.18 2/2] rseq: compat: clear high bits of rseq_cs fields

----- On Jun 26, 2018, at 5:58 PM, Andy Lutomirski [email protected] wrote:

>> On Jun 26, 2018, at 2:16 PM, Mathieu Desnoyers <[email protected]>
>> wrote:
>>
>> Make the behavior rseq on compat tasks more robust by ensuring that
>> kernel/rseq.c:rseq_get_rseq_cs() clears the high bits of
>> rseq_cs->abort_ip, rseq_cs->start_ip and rseq_cs->post_commit_offset
>> when a 32-bit binary is run on a 64-bit kernel.
>>
>> The intent here is that if user-space has garbage rather than zeroes
>> in its struct rseq_cs fields padding, the behavior will be the same
>> whether the binary is run on 32-bit or 64-bit kernels.
>>
>> Use in_compat_syscall() when rseq_get_rseq_cs() is invoked from
>> system call context, and use is_compat_frame() when invoked from
>> signal delivery.
>>
>
> And when it’s invoked due to preemption unrelated to a syscall or signal, you
> malfunction?

Fair point! Hence the "RFC". ;)

So I understand better your intent to use the pt_regs to figure out whether it
is compat or not. My is_compat_frame()+in_compat_syscall() approach does not
handle this correctly.

>
> I think the only sane solution is to make these fields be u64,

I'm OK with turning the rseq_cs start_ip, post_commit_offset, and abort_ip
fields into normal u64.

> delete the
> LINUX_FIELD_ macros,

The LINUX_FIELD_ macros are still needed to ensure single-copy updates of
the (struct rseq *__tls_abi)->rseq_cs pointer by 32-bit user-space.

> and possibly teach the x86 slowpath return to inject a
> signal if it’s trying to return to a 32-bit context with garbage in the high
> bits of regs->ip so that we determistically fail if the user screws up.

I like the approach of dealing with the rseq_cs fields as u64 even on 32-bit
architectures. As a downside, it will require 32-bit architectures to do
arithmetic on 64-bit values, but it's not a fast-path. As you point out, the
tricky bit is to decide what happens when architecture code returns to
userspace with regs->ip containing garbage in the high bits.

An alternative approach is to ensure the high bits are cleared when returning
to an IP with garbage in the high bits.

> Rseq is brand new. It should not need compat code at all.

Dealing with u64 for start_ip, post_commit_offset, and abort_ip at the kernel
level would indeed provide this characteristic. However, I'm uneasy adding
64-bit arithmetic on operations really caring about 32 bits on 32-bit archs,
even though those are not fast paths.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2018-06-28 08:17:56

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC PATCH for 4.18 2/2] rseq: compat: clear high bits of rseq_cs fields

On Tue, 26 Jun 2018, Andy Lutomirski wrote:
> > On Jun 26, 2018, at 2:16 PM, Mathieu Desnoyers <[email protected]> wrote:
> >
> > Make the behavior rseq on compat tasks more robust by ensuring that
> > kernel/rseq.c:rseq_get_rseq_cs() clears the high bits of
> > rseq_cs->abort_ip, rseq_cs->start_ip and rseq_cs->post_commit_offset
> > when a 32-bit binary is run on a 64-bit kernel.
> >
> > The intent here is that if user-space has garbage rather than zeroes
> > in its struct rseq_cs fields padding, the behavior will be the same
> > whether the binary is run on 32-bit or 64-bit kernels.
> >
> > Use in_compat_syscall() when rseq_get_rseq_cs() is invoked from
> > system call context, and use is_compat_frame() when invoked from
> > signal delivery.
> >
>
> And when it’s invoked due to preemption unrelated to a syscall or signal,
> you malfunction?
>
> I think the only sane solution is to make these fields be u64, delete the
> LINUX_FIELD_ macros, and possibly teach the x86 slowpath return to inject
> a signal if it’s trying to return to a 32-bit context with garbage in the
> high bits of regs->ip so that we determistically fail if the user screws
> up.

Right. That's the only sane solution. Trying to play games with 32/64bit
for a dubious value is going to bite us within no time and just create ugly
workarounds left and right. Forcing a clear handling upfront avoids all of
that.

Thanks,

tglx