The comment in get_nr_restart_syscall() says:
* The problem is that we can get here when ptrace pokes
* syscall-like values into regs even if we're not in a syscall
* at all.
Yes. but if we are not in syscall then the
status & (TS_COMPAT|TS_I386_REGS_POKED)
check below can't really help:
- TS_COMPAT can't be set
- TS_I386_REGS_POKED is only set if regs->orig_ax was changed by
32bit debugger; and even in this case get_nr_restart_syscall()
is only correct if the tracee is 32bit too.
Suppose that 64bit debugger plays with 32bit tracee and
* Tracee calls sleep(2) // TS_COMPAT is set
* User interrupts the tracee by CTRL-C after 1 sec and does
"(gdb) call func()"
* gdb saves the regs by PTRACE_GETREGS
* does PTRACE_SETREGS to set %rip='func' and %orig_rax=-1
* PTRACE_CONT // TS_COMPAT is cleared
* func() hits int3.
* Debugger catches SIGTRAP.
* Restore original regs by PTRACE_SETREGS.
* PTRACE_CONT
get_nr_restart_syscall() wrongly returns __NR_restart_syscall==219, the
tracee calls ia32_sys_call_table[219] == sys_madvise.
This patch adds the sticky TS_COMPAT_RESTART flag which survives after
return to user mode, hopefully it allows us to kill TS_I386_REGS_POKED
but this needs another patch.
Test-case:
$ cvs -d :pserver:anoncvs:[email protected]:/cvs/systemtap co ptrace-tests
$ gcc -o erestartsys-trap-debuggee ptrace-tests/tests/erestartsys-trap-debuggee.c --m32
$ gcc -o erestartsys-trap-debugger ptrace-tests/tests/erestartsys-trap-debugger.c -lutil
$ ./erestartsys-trap-debugger
Unexpected: retval 1, errno 22
erestartsys-trap-debugger: ptrace-tests/tests/erestartsys-trap-debugger.c:421
Reported-by: Jan Kratochvil <[email protected]>
Signed-off-by: Oleg Nesterov <[email protected]>
---
arch/x86/entry/common.c | 9 +++++++++
arch/x86/include/asm/thread_info.h | 1 +
arch/x86/kernel/signal.c | 6 ++----
3 files changed, 12 insertions(+), 4 deletions(-)
diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 3f8e226..1e2c2ca 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -263,6 +263,15 @@ __visible inline void syscall_return_slowpath(struct pt_regs *regs)
rseq_syscall(regs);
+#ifdef CONFIG_IA32_EMULATION
+ if (regs->ax == -ERESTART_RESTARTBLOCK) {
+ struct thread_info *ti = current_thread_info();
+ if (ti->status & TS_COMPAT)
+ ti->status |= TS_COMPAT_RESTART;
+ else
+ ti->status &= ~TS_COMPAT_RESTART;
+ }
+#endif
/*
* First do one-time work. If these work items are enabled, we
* want to run them exactly once per syscall exit with IRQs on.
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index f945353..0784591 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -223,6 +223,7 @@ static inline int arch_within_stack_frames(const void * const stack,
#ifdef CONFIG_COMPAT
#define TS_I386_REGS_POKED 0x0004 /* regs poked by 32-bit ptracer */
+#define TS_COMPAT_RESTART 0x0008
#endif
#ifndef __ASSEMBLY__
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 8eb7193..1054f1c 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -788,12 +788,10 @@ static inline unsigned long get_nr_restart_syscall(const struct pt_regs *regs)
* For now, we maintain historical behavior and guess based on
* stored state. We could do better by saving the actual
* syscall arch in restart_block or (with caveats on x32) by
- * checking if regs->ip points to 'int $0x80'. The current
- * behavior is incorrect if a tracer has a different bitness
- * than the tracee.
+ * checking if regs->ip points to 'int $0x80'.
*/
#ifdef CONFIG_IA32_EMULATION
- if (current_thread_info()->status & (TS_COMPAT|TS_I386_REGS_POKED))
+ if (current_thread_info()->status & TS_COMPAT_RESTART)
return __NR_ia32_restart_syscall;
#endif
#ifdef CONFIG_X86_X32_ABI
--
2.5.0
On 11/26, Oleg Nesterov wrote:
>
> This patch adds the sticky TS_COMPAT_RESTART flag which survives after
> return to user mode, hopefully it allows us to kill TS_I386_REGS_POKED
> but this needs another patch.
Alternatively we could add ->compat_restart into struct restart_block,
logically this is the same thing.
Rather than changing syscall_return_slowpath() we could introduce a new
helper, say,
void set_restart_block_fn(struct restart_block *restart, fn)
{
restart->fn = fn;
set/clear TS_COMPAT_RESTART or restart->compat_restart
depending on TS_COMPAT
}
but this patch looks simpler to me.
Oleg.
On Tue, Nov 26, 2019 at 3:08 AM Oleg Nesterov <[email protected]> wrote:
>
> Alternatively we could add ->compat_restart into struct restart_block,
> logically this is the same thing.
That sounds like the better model to me. That's what the restart_block
is about: it's supposed to contain the restart information.
I'd much rather see the system call number added into the restart
block (or just the "compat bit" - but we have that X32 case too, so
why not put it all there). And then the get_nr_restart_syscall() hack
goes away and is just "set state from the restart block".
How painful would that be? I guess right now we always just set all
the restart_block info manually in all the restart cases, and that
could make it a bit painful to add this kind of architecture-specific
flag, but it _sounds_ conceptually like the right thing to do.
I definitely don't love the "magic sticky bit in thread status" field model.
Linus
On Tue, Nov 26, 2019 at 8:04 PM Linus Torvalds
<[email protected]> wrote:
>
> On Tue, Nov 26, 2019 at 3:08 AM Oleg Nesterov <[email protected]> wrote:
> >
> > Alternatively we could add ->compat_restart into struct restart_block,
> > logically this is the same thing.
>
> That sounds like the better model to me. That's what the restart_block
> is about: it's supposed to contain the restart information.
>
> I'd much rather see the system call number added into the restart
> block (or just the "compat bit" - but we have that X32 case too, so
> why not put it all there). And then the get_nr_restart_syscall() hack
> goes away and is just "set state from the restart block".
>
> How painful would that be? I guess right now we always just set all
> the restart_block info manually in all the restart cases, and that
> could make it a bit painful to add this kind of architecture-specific
> flag, but it _sounds_ conceptually like the right thing to do.
How about we rename restart_block::fn to __fn, add fields
restart_syscall_nr and restart_syscall_arch, and do:
long restart_block_activate(long (*fn)(struct restart_block *))
{
current->restart_block.__fn = fn;
arch_restart_block_activate();
return -ERESTART_RESTARTBLOCK;
current->restart_block.syscall_nr
}
IMO the ideal solution would be to add a new syscall nr to restart a
syscall and make it the same on all architectures. This has
unfortunate interactions with seccomp, though.
--Andy
On 11/26, Linus Torvalds wrote:
>
> On Tue, Nov 26, 2019 at 3:08 AM Oleg Nesterov <[email protected]> wrote:
> >
> > Alternatively we could add ->compat_restart into struct restart_block,
> > logically this is the same thing.
>
> That sounds like the better model to me. That's what the restart_block
> is about: it's supposed to contain the restart information.
I knew ;) OK, I won't argue, I'll send V2.
> I'd much rather see the system call number added into the restart
> block (or just the "compat bit" - but we have that X32 case too, so
> why not put it all there).
apart from x86, who else can use it? after the quick grep I think nobody,
even arm64 and mips which have compat nr_restart's do not need it.
restart_block.arch_restart_block_infp makes more sense, but that would be
even more painful, I do not want to add asm/restart_block.h or
HAVE_ARCH_RESTART_INFO, or use CONFIG_IA32_EMULATION.
OK, lets add the new restart_block.nr_restart_syscall field, then we need
void set_restart_block_fn(restart, fn)
{
restart->nr_restart_syscall = arch_get_nr_restart_syscall()
restart->fn = fn;
}
but somehow I do not see a good place for
#ifndef arch_get_nr_restart_syscall()
#define arch_get_nr_restart_syscall() 0
#endif
Can you suggest a simple solution?
Hmm. Or may be HAVE_ARCH_RESTART is better after all? Say, just
--- a/include/linux/restart_block.h
+++ b/include/linux/restart_block.h
@@ -24,6 +24,9 @@ enum timespec_type {
*/
struct restart_block {
long (*fn)(struct restart_block *);
+#ifdef CONFIG_HAVE_ARCH_RESTART_XXX
+ int nr_restart_syscall;
+#endif
union {
/* For futex_wait and futex_wait_requeue_pi */
struct {
@@ -55,6 +58,15 @@ struct restart_block {
};
};
+static inline void set_restart_block_fn(restart, fn)
+{
+#ifdef CONFIG_HAVE_ARCH_RESTART_XXX
+ extern int arch_get_nr_restart_syscall();
+ restart->nr_restart_syscall = arch_get_nr_restart_syscall();
+#endif
+ restart->fn = fn;
+}
+
extern long do_no_restart_syscall(struct restart_block *parm);
#endif /* __LINUX_RESTART_BLOCK_H */
?
Oleg.
On 11/26, Andy Lutomirski wrote:
>
> How about we rename restart_block::fn to __fn,
Oh, please no. We can do this later, this needs to update every arch/
> add fields
> restart_syscall_nr and restart_syscall_arch,
Hmm, why do we nedd 2 fields ?
and do:
> IMO the ideal solution would be to add a new syscall nr to restart a
> syscall and make it the same on all architectures.
Damn yes! And I tried to suggest this 2 years ago. But
> This has
> unfortunate interactions with seccomp, though.
Yes. Do you think we can do something with seccomp?
Oleg.
On Wed, Nov 27, 2019 at 9:02 AM Oleg Nesterov <[email protected]> wrote:
>
> OK, lets add the new restart_block.nr_restart_syscall field, then we need
>
> void set_restart_block_fn(restart, fn)
> {
> restart->nr_restart_syscall = arch_get_nr_restart_syscall()
> restart->fn = fn;
> }
No, I'd suggest just adding an arch-specific "unsigned long" to the
restart data (and not force the naming to something like the system
call number - that's just an x86 detail), and then something like this
on x86:
void arch_set_restart_data(restart)
{
restart->arch_data = x86_get_restart_syscall();
}
#define arch_set_restart_data arch_set_restart_data
and then we'd have in generic code something like
#ifndef arch_set_restart_data
#define arch_set_restart_data(block) do { } while (0)
#endif
int set_restart_fn(fn)
{
struct restart_block *restart = ¤t->restart_blockl
arch_set_restart_data(restart);
restart->fn = fn;
return -ERESTART_RESTARTBLOCK;
}
or something like that, and we'd just convert the existing (there
aren't that many)
restart->fn = xyz
return -ERESTART_RESTARTBLOCK;
cases into
return set_restart_fn(fn);
and for bonus points, we probably should rename the "fn" field, but
that might be too much work.
It doesn't look *too* painful, because we just don't have all that
many restarting system calls
But the above is handwaving.
And yeah, I never understood why the compat and x32 cases should have
different system call numbers in the first place. The seccomp argument
is garbage, but probably historical stuff that we can no longer
change.
Linus
On 11/27, Linus Torvalds wrote:
>
> On Wed, Nov 27, 2019 at 9:02 AM Oleg Nesterov <[email protected]> wrote:
> >
> > OK, lets add the new restart_block.nr_restart_syscall field, then we need
> >
> > void set_restart_block_fn(restart, fn)
> > {
> > restart->nr_restart_syscall = arch_get_nr_restart_syscall()
> > restart->fn = fn;
> > }
>
> No, I'd suggest just adding an arch-specific "unsigned long" to the
> restart data (and not force the naming to something like the system
> call number - that's just an x86 detail), and then something like this
> on x86:
>
> void arch_set_restart_data(restart)
> {
> restart->arch_data = x86_get_restart_syscall();
> }
> #define arch_set_restart_data arch_set_restart_data
>
> and then we'd have in generic code something like
>
> #ifndef arch_set_restart_data
> #define arch_set_restart_data(block) do { } while (0)
> #endif
OK, let it be arch_data/arch_set_restart_data, the same thing.
You misunderstood my question, I do not see a good place for the code
above. So I am going to uglify */signal.[ch] files.
See the incomplete patch below, everything else is trivial.
Please tell me if you think I should move this code somewhere else.
Oleg.
diff --git a/arch/x86/include/asm/signal.h b/arch/x86/include/asm/signal.h
index 33d3c88..f536bcb 100644
--- a/arch/x86/include/asm/signal.h
+++ b/arch/x86/include/asm/signal.h
@@ -5,6 +5,10 @@
#ifndef __ASSEMBLY__
#include <linux/linkage.h>
+struct restart_block;
+extern void arch_set_restart_data(struct restart_block *);
+#define arch_set_restart_data arch_set_restart_data
+
/* Most things should be clean enough to redefine this at will, if care
is taken to make libc match. */
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 8eb7193..ede5443 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -768,6 +768,11 @@ handle_signal(struct ksignal *ksig, struct pt_regs *regs)
signal_setup_done(failed, ksig, stepping);
}
+void arch_set_restart_data(struct restart_block *restart)
+{
+ // TODO
+}
+
static inline unsigned long get_nr_restart_syscall(const struct pt_regs *regs)
{
/*
diff --git a/include/linux/restart_block.h b/include/linux/restart_block.h
index bba2920..d39f836 100644
--- a/include/linux/restart_block.h
+++ b/include/linux/restart_block.h
@@ -55,6 +55,8 @@ struct restart_block {
};
};
+extern long set_restart_fn(struct restart_block *restart,
+ long (*fn)(struct restart_block *));
extern long do_no_restart_syscall(struct restart_block *parm);
#endif /* __LINUX_RESTART_BLOCK_H */
diff --git a/include/linux/signal.h b/include/linux/signal.h
index 1a5f883..542499f 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -5,12 +5,17 @@
#include <linux/bug.h>
#include <linux/signal_types.h>
#include <linux/string.h>
+#include <linux/restart_block.h>
struct task_struct;
/* for sysctl */
extern int print_fatal_signals;
+#ifndef arch_set_restart_data
+#define arch_set_restart_data(restart) do { } while (0)
+#endif
+
static inline void copy_siginfo(kernel_siginfo_t *to,
const kernel_siginfo_t *from)
{
diff --git a/kernel/signal.c b/kernel/signal.c
index bcd46f5..d6402e6 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -4493,6 +4493,14 @@ SYSCALL_DEFINE3(sigsuspend, int, unused1, int, unused2, old_sigset_t, mask)
}
#endif
+long set_restart_fn(struct restart_block *restart,
+ long (*fn)(struct restart_block *))
+{
+ restart->fn = fn;
+ arch_set_restart_data(restart);
+ return -ERESTART_RESTARTBLOCK;
+}
+
__weak const char *arch_vma_name(struct vm_area_struct *vma)
{
return NULL;
On Thu, Nov 28, 2019 at 7:36 AM Oleg Nesterov <[email protected]> wrote:
>
> You misunderstood my question, I do not see a good place for the code
> above. So I am going to uglify */signal.[ch] files.
Ahh, ok, I thought that was kind of understood.
> --- a/arch/x86/include/asm/signal.h
> +++ b/arch/x86/include/asm/signal.h
> @@ -5,6 +5,10 @@
> #ifndef __ASSEMBLY__
> #include <linux/linkage.h>
>
> +struct restart_block;
> +extern void arch_set_restart_data(struct restart_block *);
> +#define arch_set_restart_data arch_set_restart_data
I'd just replace this with
/* We need to save TS_COMPAT at the time of the call */
#define arch_set_restart_data(blk) (blk)->arch_data =
current_thread_info()->status
or something like that.
Linus
>> On Nov 28, 2019, at 11:25 AM, Linus Torvalds <[email protected]> wrote:
>>
>> On Thu, Nov 28, 2019 at 7:36 AM Oleg Nesterov <[email protected]> wrote:
>> You misunderstood my question, I do not see a good place for the code
>> above. So I am going to uglify */signal.[ch] files.
>
> Ahh, ok, I thought that was kind of understood.
>
>> --- a/arch/x86/include/asm/signal.h
>> +++ b/arch/x86/include/asm/signal.h
>> @@ -5,6 +5,10 @@
>> #ifndef __ASSEMBLY__
>> #include <linux/linkage.h>
>> +struct restart_block;
>> +extern void arch_set_restart_data(struct restart_block *);
>> +#define arch_set_restart_data arch_set_restart_data
>
> I'd just replace this with
>
> /* We need to save TS_COMPAT at the time of the call */
> #define arch_set_restart_data(blk) (blk)->arch_data =
> current_thread_info()->status
>
> or something like that.
I think this doesn’t work for x32. Let’s either save the result of syscall_get_arch() or just actually calculate and save the restart_syscall nr here.
Before we commit to this, Kees, do you think we can manage to just renumber restart_syscall()? That’s a much better solution if we can pull it off.
>
> Linus
On 11/28, Linus Torvalds wrote:
>
> > --- a/arch/x86/include/asm/signal.h
> > +++ b/arch/x86/include/asm/signal.h
> > @@ -5,6 +5,10 @@
> > #ifndef __ASSEMBLY__
> > #include <linux/linkage.h>
> >
> > +struct restart_block;
> > +extern void arch_set_restart_data(struct restart_block *);
> > +#define arch_set_restart_data arch_set_restart_data
>
> I'd just replace this with
>
> /* We need to save TS_COMPAT at the time of the call */
> #define arch_set_restart_data(blk) (blk)->arch_data =
> current_thread_info()->status
OK, but then this code should live in {linux,asm}/thread_info.h. And this
is probably better, linux/thread_info.h already includes restart_block.h.
What do you think about 1-4 below?
Please note that 3/4 adds TS_COMPAT_RESTART you do not like, 4/4 kills it
and adds restart_block->arch_data.
This is because I will need to backport this fix, and 4/4 is not trivially
backportable due to KABI issues.
Oleg.
From d1000d6f52ede3875c633067a5ec34e7ba138bc4 Mon Sep 17 00:00:00 2001
From: Oleg Nesterov <[email protected]>
Date: Fri, 29 Nov 2019 16:51:12 +0100
Subject: [PATCH 1/4] introduce set_restart_fn()
---
fs/select.c | 10 ++++------
include/linux/thread_info.h | 12 ++++++++++++
kernel/futex.c | 3 +--
kernel/time/alarmtimer.c | 2 +-
kernel/time/hrtimer.c | 2 +-
kernel/time/posix-cpu-timers.c | 2 +-
6 files changed, 20 insertions(+), 11 deletions(-)
diff --git a/fs/select.c b/fs/select.c
index 53a0c14..e517960 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -1037,10 +1037,9 @@ static long do_restart_poll(struct restart_block *restart_block)
ret = do_sys_poll(ufds, nfds, to);
- if (ret == -ERESTARTNOHAND) {
- restart_block->fn = do_restart_poll;
- ret = -ERESTART_RESTARTBLOCK;
- }
+ if (ret == -ERESTARTNOHAND)
+ ret = set_restart_fn(restart_block, do_restart_poll);
+
return ret;
}
@@ -1062,7 +1061,6 @@ SYSCALL_DEFINE3(poll, struct pollfd __user *, ufds, unsigned int, nfds,
struct restart_block *restart_block;
restart_block = ¤t->restart_block;
- restart_block->fn = do_restart_poll;
restart_block->poll.ufds = ufds;
restart_block->poll.nfds = nfds;
@@ -1073,7 +1071,7 @@ SYSCALL_DEFINE3(poll, struct pollfd __user *, ufds, unsigned int, nfds,
} else
restart_block->poll.has_timeout = 0;
- ret = -ERESTART_RESTARTBLOCK;
+ ret = set_restart_fn(restart_block, do_restart_poll);
}
return ret;
}
diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
index 659a440..df8e3fb 100644
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -39,6 +39,18 @@ enum {
#ifdef __KERNEL__
+#ifndef arch_set_restart_data
+#define arch_set_restart_data(restart) do { } while (0)
+#endif
+
+static inline long set_restart_fn(struct restart_block *restart,
+ long (*fn)(struct restart_block *))
+{
+ restart->fn = fn;
+ arch_set_restart_data(restart);
+ return -ERESTART_RESTARTBLOCK;
+}
+
#ifndef THREAD_ALIGN
#define THREAD_ALIGN THREAD_SIZE
#endif
diff --git a/kernel/futex.c b/kernel/futex.c
index bd18f60..f4f20e9 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2753,14 +2753,13 @@ static int futex_wait(u32 __user *uaddr, unsigned int flags, u32 val,
goto out;
restart = ¤t->restart_block;
- restart->fn = futex_wait_restart;
restart->futex.uaddr = uaddr;
restart->futex.val = val;
restart->futex.time = *abs_time;
restart->futex.bitset = bitset;
restart->futex.flags = flags | FLAGS_HAS_TIMEOUT;
- ret = -ERESTART_RESTARTBLOCK;
+ ret = set_restart_fn(restart, futex_wait_restart);
out:
if (to) {
diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c
index 451f9d0..a2ddf56 100644
--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -829,9 +829,9 @@ static int alarm_timer_nsleep(const clockid_t which_clock, int flags,
if (flags == TIMER_ABSTIME)
return -ERESTARTNOHAND;
- restart->fn = alarm_timer_nsleep_restart;
restart->nanosleep.clockid = type;
restart->nanosleep.expires = exp;
+ set_restart_fn(restart, alarm_timer_nsleep_restart);
return ret;
}
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 6560553..55f71c4 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1932,9 +1932,9 @@ long hrtimer_nanosleep(const struct timespec64 *rqtp,
}
restart = ¤t->restart_block;
- restart->fn = hrtimer_nanosleep_restart;
restart->nanosleep.clockid = t.timer.base->clockid;
restart->nanosleep.expires = hrtimer_get_expires_tv64(&t.timer);
+ set_restart_fn(restart, hrtimer_nanosleep_restart);
out:
destroy_hrtimer_on_stack(&t.timer);
return ret;
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 42d512f..eacb0ca 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -1335,8 +1335,8 @@ static int posix_cpu_nsleep(const clockid_t which_clock, int flags,
if (flags & TIMER_ABSTIME)
return -ERESTARTNOHAND;
- restart_block->fn = posix_cpu_nsleep_restart;
restart_block->nanosleep.clockid = which_clock;
+ set_restart_fn(restart_block, posix_cpu_nsleep_restart);
}
return error;
}
--
2.5.0
From 54301a07a645a27c982c99e4e975321cf73c5456 Mon Sep 17 00:00:00 2001
From: Oleg Nesterov <[email protected]>
Date: Fri, 29 Nov 2019 17:45:16 +0100
Subject: [PATCH 2/4] x86: mv TS_COMPAT from asm/processor.h to
asm/thread_info.h
---
arch/x86/include/asm/processor.h | 9 ---------
arch/x86/include/asm/thread_info.h | 9 +++++++++
2 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 54f5d54..d247a24 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -507,15 +507,6 @@ static inline void arch_thread_struct_whitelist(unsigned long *offset,
}
/*
- * Thread-synchronous status.
- *
- * This is different from the flags in that nobody else
- * ever touches our thread-synchronous status, so we don't
- * have to worry about atomic accesses.
- */
-#define TS_COMPAT 0x0002 /* 32bit syscall active (64BIT)*/
-
-/*
* Set IOPL bits in EFLAGS from given mask
*/
static inline void native_set_iopl_mask(unsigned mask)
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index f945353..b2125c4 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -221,6 +221,15 @@ static inline int arch_within_stack_frames(const void * const stack,
#endif
+/*
+ * Thread-synchronous status.
+ *
+ * This is different from the flags in that nobody else
+ * ever touches our thread-synchronous status, so we don't
+ * have to worry about atomic accesses.
+ */
+#define TS_COMPAT 0x0002 /* 32bit syscall active (64BIT)*/
+
#ifdef CONFIG_COMPAT
#define TS_I386_REGS_POKED 0x0004 /* regs poked by 32-bit ptracer */
#endif
--
2.5.0
From db3784d4100cf3bc3097738da9a80fcfb6933fc6 Mon Sep 17 00:00:00 2001
From: Oleg Nesterov <[email protected]>
Date: Fri, 29 Nov 2019 17:52:42 +0100
Subject: [PATCH 3/4] ptrace/x86: introduce TS_COMPAT_RESTART to fix
get_nr_restart_syscall()
---
arch/x86/include/asm/thread_info.h | 14 +++++++++++++-
arch/x86/kernel/signal.c | 24 +-----------------------
2 files changed, 14 insertions(+), 24 deletions(-)
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index b2125c4..a4de7aa 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -230,10 +230,22 @@ static inline int arch_within_stack_frames(const void * const stack,
*/
#define TS_COMPAT 0x0002 /* 32bit syscall active (64BIT)*/
+#ifndef __ASSEMBLY__
#ifdef CONFIG_COMPAT
#define TS_I386_REGS_POKED 0x0004 /* regs poked by 32-bit ptracer */
+#define TS_COMPAT_RESTART 0x0008
+
+#define arch_set_restart_data arch_set_restart_data
+
+static inline void arch_set_restart_data(struct restart_block *restart)
+{
+ struct thread_info *ti = current_thread_info();
+ if (ti->status & TS_COMPAT)
+ ti->status |= TS_COMPAT_RESTART;
+ else
+ ti->status &= ~TS_COMPAT_RESTART;
+}
#endif
-#ifndef __ASSEMBLY__
#ifdef CONFIG_X86_32
#define in_ia32_syscall() true
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 8eb7193..2fdbf5e 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -770,30 +770,8 @@ handle_signal(struct ksignal *ksig, struct pt_regs *regs)
static inline unsigned long get_nr_restart_syscall(const struct pt_regs *regs)
{
- /*
- * This function is fundamentally broken as currently
- * implemented.
- *
- * The idea is that we want to trigger a call to the
- * restart_block() syscall and that we want in_ia32_syscall(),
- * in_x32_syscall(), etc. to match whatever they were in the
- * syscall being restarted. We assume that the syscall
- * instruction at (regs->ip - 2) matches whatever syscall
- * instruction we used to enter in the first place.
- *
- * The problem is that we can get here when ptrace pokes
- * syscall-like values into regs even if we're not in a syscall
- * at all.
- *
- * For now, we maintain historical behavior and guess based on
- * stored state. We could do better by saving the actual
- * syscall arch in restart_block or (with caveats on x32) by
- * checking if regs->ip points to 'int $0x80'. The current
- * behavior is incorrect if a tracer has a different bitness
- * than the tracee.
- */
#ifdef CONFIG_IA32_EMULATION
- if (current_thread_info()->status & (TS_COMPAT|TS_I386_REGS_POKED))
+ if (current_thread_info()->status & TS_COMPAT_RESTART)
return __NR_ia32_restart_syscall;
#endif
#ifdef CONFIG_X86_X32_ABI
--
2.5.0
From f3b1ab409152d2c8ea9c75f7688b54aa6a3cf518 Mon Sep 17 00:00:00 2001
From: Oleg Nesterov <[email protected]>
Date: Fri, 29 Nov 2019 18:05:04 +0100
Subject: [PATCH 4/4] x86: turn TS_COMPAT_RESTART into restart_block->arch_data
---
arch/x86/include/asm/thread_info.h | 12 ++----------
arch/x86/kernel/signal.c | 2 +-
include/linux/restart_block.h | 1 +
3 files changed, 4 insertions(+), 11 deletions(-)
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index a4de7aa..75c4a0a 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -233,18 +233,10 @@ static inline int arch_within_stack_frames(const void * const stack,
#ifndef __ASSEMBLY__
#ifdef CONFIG_COMPAT
#define TS_I386_REGS_POKED 0x0004 /* regs poked by 32-bit ptracer */
-#define TS_COMPAT_RESTART 0x0008
-#define arch_set_restart_data arch_set_restart_data
+#define arch_set_restart_data(restart) \
+ do { restart->arch_data = current_thread_info()->status; } while (0)
-static inline void arch_set_restart_data(struct restart_block *restart)
-{
- struct thread_info *ti = current_thread_info();
- if (ti->status & TS_COMPAT)
- ti->status |= TS_COMPAT_RESTART;
- else
- ti->status &= ~TS_COMPAT_RESTART;
-}
#endif
#ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 2fdbf5e..2e52767 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -771,7 +771,7 @@ handle_signal(struct ksignal *ksig, struct pt_regs *regs)
static inline unsigned long get_nr_restart_syscall(const struct pt_regs *regs)
{
#ifdef CONFIG_IA32_EMULATION
- if (current_thread_info()->status & TS_COMPAT_RESTART)
+ if (current->restart_block.arch_data & TS_COMPAT)
return __NR_ia32_restart_syscall;
#endif
#ifdef CONFIG_X86_X32_ABI
diff --git a/include/linux/restart_block.h b/include/linux/restart_block.h
index bba2920..980a655 100644
--- a/include/linux/restart_block.h
+++ b/include/linux/restart_block.h
@@ -23,6 +23,7 @@ enum timespec_type {
* System call restart block.
*/
struct restart_block {
+ unsigned long arch_data;
long (*fn)(struct restart_block *);
union {
/* For futex_wait and futex_wait_requeue_pi */
--
2.5.0
On 11/28, Andy Lutomirski wrote:
>
> I think this doesn’t work for x32.
Why? get_nr_restart_syscall() can still rely on the "orig_ax & __X32_SYSCALL_BIT"
check, debugger should restore regs->orig_ax correctly.
> Let’s either save the result of syscall_get_arch()
We can save the result of syscall_get_arch(), but it doesn't distinguish
x32/x86_64 tasks, so it doesn't really differ from TS_COMPAT.
> or just actually calculate and save the restart_syscall nr here.
sure, we we can do this.
> Before we commit to this, Kees, do you think we can manage to just renumber
> restart_syscall()? That’s a much better solution if we can pull it off.
Agreed.
Oleg.
> On Nov 29, 2019, at 9:32 AM, Oleg Nesterov <[email protected]> wrote:
>
> On 11/28, Andy Lutomirski wrote:
>>
>> I think this doesn’t work for x32.
>
> Why? get_nr_restart_syscall() can still rely on the "orig_ax & __X32_SYSCALL_BIT"
> check, debugger should restore regs->orig_ax correctly.
Right. Although relying on this is IMO a ridiculous bit of ABI.
>
>> Let’s either save the result of syscall_get_arch()
>
> We can save the result of syscall_get_arch(), but it doesn't distinguish
> x32/x86_64 tasks, so it doesn't really differ from TS_COMPAT.
Duh. Never mind.
>
>> or just actually calculate and save the restart_syscall nr here.
>
> sure, we we can do this.
I like this the best unless we can renumber the syscalls.
>
>> Before we commit to this, Kees, do you think we can manage to just renumber
>> restart_syscall()? That’s a much better solution if we can pull it off.
>
> Agreed.
>
> Oleg.
>
This version follows the latest recommendation from Linus,
arch_set_restart_data() just saves ti->status in restart->arch_data.
Andy, I can add another patch or change 4/4 to save the syscall number
instead, I am fine either way.
However, personally I dislike restart->arch_data, imo 3/4 is all we need.
I agree, set_restart_fn() is better than the ugly ERESTART_RESTARTBLOCK
check in syscall_return_slowpath() added by v1. But to me the x86-only
arch_data field in restart_block is much worse than the sticky TS_ flag.
To remind, there is another reason for the "transient" 3/4, 4/4 is not
easily backportable.
Oleg.
---
arch/x86/include/asm/processor.h | 9 ---------
arch/x86/include/asm/thread_info.h | 15 ++++++++++++++-
arch/x86/kernel/signal.c | 24 +-----------------------
fs/select.c | 10 ++++------
include/linux/restart_block.h | 1 +
include/linux/thread_info.h | 12 ++++++++++++
kernel/futex.c | 3 +--
kernel/time/alarmtimer.c | 2 +-
kernel/time/hrtimer.c | 2 +-
kernel/time/posix-cpu-timers.c | 2 +-
10 files changed, 36 insertions(+), 44 deletions(-)
Preparation. Add the new helper which sets restart_block->fn and calls
the dummy arch_set_restart_data() helper.
Signed-off-by: Oleg Nesterov <[email protected]>
---
fs/select.c | 10 ++++------
include/linux/thread_info.h | 12 ++++++++++++
kernel/futex.c | 3 +--
kernel/time/alarmtimer.c | 2 +-
kernel/time/hrtimer.c | 2 +-
kernel/time/posix-cpu-timers.c | 2 +-
6 files changed, 20 insertions(+), 11 deletions(-)
diff --git a/fs/select.c b/fs/select.c
index 53a0c14..e517960 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -1037,10 +1037,9 @@ static long do_restart_poll(struct restart_block *restart_block)
ret = do_sys_poll(ufds, nfds, to);
- if (ret == -ERESTARTNOHAND) {
- restart_block->fn = do_restart_poll;
- ret = -ERESTART_RESTARTBLOCK;
- }
+ if (ret == -ERESTARTNOHAND)
+ ret = set_restart_fn(restart_block, do_restart_poll);
+
return ret;
}
@@ -1062,7 +1061,6 @@ SYSCALL_DEFINE3(poll, struct pollfd __user *, ufds, unsigned int, nfds,
struct restart_block *restart_block;
restart_block = ¤t->restart_block;
- restart_block->fn = do_restart_poll;
restart_block->poll.ufds = ufds;
restart_block->poll.nfds = nfds;
@@ -1073,7 +1071,7 @@ SYSCALL_DEFINE3(poll, struct pollfd __user *, ufds, unsigned int, nfds,
} else
restart_block->poll.has_timeout = 0;
- ret = -ERESTART_RESTARTBLOCK;
+ ret = set_restart_fn(restart_block, do_restart_poll);
}
return ret;
}
diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
index 659a440..df8e3fb 100644
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -39,6 +39,18 @@ enum {
#ifdef __KERNEL__
+#ifndef arch_set_restart_data
+#define arch_set_restart_data(restart) do { } while (0)
+#endif
+
+static inline long set_restart_fn(struct restart_block *restart,
+ long (*fn)(struct restart_block *))
+{
+ restart->fn = fn;
+ arch_set_restart_data(restart);
+ return -ERESTART_RESTARTBLOCK;
+}
+
#ifndef THREAD_ALIGN
#define THREAD_ALIGN THREAD_SIZE
#endif
diff --git a/kernel/futex.c b/kernel/futex.c
index bd18f60..f4f20e9 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2753,14 +2753,13 @@ static int futex_wait(u32 __user *uaddr, unsigned int flags, u32 val,
goto out;
restart = ¤t->restart_block;
- restart->fn = futex_wait_restart;
restart->futex.uaddr = uaddr;
restart->futex.val = val;
restart->futex.time = *abs_time;
restart->futex.bitset = bitset;
restart->futex.flags = flags | FLAGS_HAS_TIMEOUT;
- ret = -ERESTART_RESTARTBLOCK;
+ ret = set_restart_fn(restart, futex_wait_restart);
out:
if (to) {
diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c
index 451f9d0..a2ddf56 100644
--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -829,9 +829,9 @@ static int alarm_timer_nsleep(const clockid_t which_clock, int flags,
if (flags == TIMER_ABSTIME)
return -ERESTARTNOHAND;
- restart->fn = alarm_timer_nsleep_restart;
restart->nanosleep.clockid = type;
restart->nanosleep.expires = exp;
+ set_restart_fn(restart, alarm_timer_nsleep_restart);
return ret;
}
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 6560553..55f71c4 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1932,9 +1932,9 @@ long hrtimer_nanosleep(const struct timespec64 *rqtp,
}
restart = ¤t->restart_block;
- restart->fn = hrtimer_nanosleep_restart;
restart->nanosleep.clockid = t.timer.base->clockid;
restart->nanosleep.expires = hrtimer_get_expires_tv64(&t.timer);
+ set_restart_fn(restart, hrtimer_nanosleep_restart);
out:
destroy_hrtimer_on_stack(&t.timer);
return ret;
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 42d512f..eacb0ca 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -1335,8 +1335,8 @@ static int posix_cpu_nsleep(const clockid_t which_clock, int flags,
if (flags & TIMER_ABSTIME)
return -ERESTARTNOHAND;
- restart_block->fn = posix_cpu_nsleep_restart;
restart_block->nanosleep.clockid = which_clock;
+ set_restart_fn(restart_block, posix_cpu_nsleep_restart);
}
return error;
}
--
2.5.0
Move TS_COMPAT back to asm/thread_info.h, close to TS_I386_REGS_POKED.
It was moved to asm/processor.h by b9d989c7218a ("x86/asm: Move the
thread_info::status field to thread_struct"), then later 37a8f7c38339
("x86/asm: Move 'status' from thread_struct to thread_info") moved the
'status' field back but TS_COMPAT was forgotten.
Signed-off-by: Oleg Nesterov <[email protected]>
---
arch/x86/include/asm/processor.h | 9 ---------
arch/x86/include/asm/thread_info.h | 9 +++++++++
2 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 54f5d54..d247a24 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -507,15 +507,6 @@ static inline void arch_thread_struct_whitelist(unsigned long *offset,
}
/*
- * Thread-synchronous status.
- *
- * This is different from the flags in that nobody else
- * ever touches our thread-synchronous status, so we don't
- * have to worry about atomic accesses.
- */
-#define TS_COMPAT 0x0002 /* 32bit syscall active (64BIT)*/
-
-/*
* Set IOPL bits in EFLAGS from given mask
*/
static inline void native_set_iopl_mask(unsigned mask)
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index f945353..b2125c4 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -221,6 +221,15 @@ static inline int arch_within_stack_frames(const void * const stack,
#endif
+/*
+ * Thread-synchronous status.
+ *
+ * This is different from the flags in that nobody else
+ * ever touches our thread-synchronous status, so we don't
+ * have to worry about atomic accesses.
+ */
+#define TS_COMPAT 0x0002 /* 32bit syscall active (64BIT)*/
+
#ifdef CONFIG_COMPAT
#define TS_I386_REGS_POKED 0x0004 /* regs poked by 32-bit ptracer */
#endif
--
2.5.0
With this patch x86 just saves current_thread_info()->status in the
new restart_block->arch_data field, TS_COMPAT_RESTART can be removed.
Rather than saving "status" we could shift the code from
get_nr_restart_syscall() to arch_set_restart_data() and save the syscall
number in ->arch_data.
Signed-off-by: Oleg Nesterov <[email protected]>
---
arch/x86/include/asm/thread_info.h | 12 ++----------
arch/x86/kernel/signal.c | 2 +-
include/linux/restart_block.h | 1 +
3 files changed, 4 insertions(+), 11 deletions(-)
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index a4de7aa..75c4a0a 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -233,18 +233,10 @@ static inline int arch_within_stack_frames(const void * const stack,
#ifndef __ASSEMBLY__
#ifdef CONFIG_COMPAT
#define TS_I386_REGS_POKED 0x0004 /* regs poked by 32-bit ptracer */
-#define TS_COMPAT_RESTART 0x0008
-#define arch_set_restart_data arch_set_restart_data
+#define arch_set_restart_data(restart) \
+ do { restart->arch_data = current_thread_info()->status; } while (0)
-static inline void arch_set_restart_data(struct restart_block *restart)
-{
- struct thread_info *ti = current_thread_info();
- if (ti->status & TS_COMPAT)
- ti->status |= TS_COMPAT_RESTART;
- else
- ti->status &= ~TS_COMPAT_RESTART;
-}
#endif
#ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 2fdbf5e..2e52767 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -771,7 +771,7 @@ handle_signal(struct ksignal *ksig, struct pt_regs *regs)
static inline unsigned long get_nr_restart_syscall(const struct pt_regs *regs)
{
#ifdef CONFIG_IA32_EMULATION
- if (current_thread_info()->status & TS_COMPAT_RESTART)
+ if (current->restart_block.arch_data & TS_COMPAT)
return __NR_ia32_restart_syscall;
#endif
#ifdef CONFIG_X86_X32_ABI
diff --git a/include/linux/restart_block.h b/include/linux/restart_block.h
index bba2920..980a655 100644
--- a/include/linux/restart_block.h
+++ b/include/linux/restart_block.h
@@ -23,6 +23,7 @@ enum timespec_type {
* System call restart block.
*/
struct restart_block {
+ unsigned long arch_data;
long (*fn)(struct restart_block *);
union {
/* For futex_wait and futex_wait_requeue_pi */
--
2.5.0
The comment in get_nr_restart_syscall() says:
* The problem is that we can get here when ptrace pokes
* syscall-like values into regs even if we're not in a syscall
* at all.
Yes. but if we are not in syscall then the
status & (TS_COMPAT|TS_I386_REGS_POKED)
check below can't really help:
- TS_COMPAT can't be set
- TS_I386_REGS_POKED is only set if regs->orig_ax was changed by
32bit debugger; and even in this case get_nr_restart_syscall()
is only correct if the tracee is 32bit too.
Suppose that 64bit debugger plays with 32bit tracee and
* Tracee calls sleep(2) // TS_COMPAT is set
* User interrupts the tracee by CTRL-C after 1 sec and does
"(gdb) call func()"
* gdb saves the regs by PTRACE_GETREGS
* does PTRACE_SETREGS to set %rip='func' and %orig_rax=-1
* PTRACE_CONT // TS_COMPAT is cleared
* func() hits int3.
* Debugger catches SIGTRAP.
* Restore original regs by PTRACE_SETREGS.
* PTRACE_CONT
get_nr_restart_syscall() wrongly returns __NR_restart_syscall==219, the
tracee calls ia32_sys_call_table[219] == sys_madvise.
This patch adds the sticky TS_COMPAT_RESTART flag which survives after
return to user mode, hopefully it allows us to kill TS_I386_REGS_POKED
but this needs another patch.
Test-case:
$ cvs -d :pserver:anoncvs:[email protected]:/cvs/systemtap co ptrace-tests
$ gcc -o erestartsys-trap-debuggee ptrace-tests/tests/erestartsys-trap-debuggee.c --m32
$ gcc -o erestartsys-trap-debugger ptrace-tests/tests/erestartsys-trap-debugger.c -lutil
$ ./erestartsys-trap-debugger
Unexpected: retval 1, errno 22
erestartsys-trap-debugger: ptrace-tests/tests/erestartsys-trap-debugger.c:421
Reported-by: Jan Kratochvil <[email protected]>
Signed-off-by: Oleg Nesterov <[email protected]>
---
arch/x86/include/asm/thread_info.h | 14 +++++++++++++-
arch/x86/kernel/signal.c | 24 +-----------------------
2 files changed, 14 insertions(+), 24 deletions(-)
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index b2125c4..a4de7aa 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -230,10 +230,22 @@ static inline int arch_within_stack_frames(const void * const stack,
*/
#define TS_COMPAT 0x0002 /* 32bit syscall active (64BIT)*/
+#ifndef __ASSEMBLY__
#ifdef CONFIG_COMPAT
#define TS_I386_REGS_POKED 0x0004 /* regs poked by 32-bit ptracer */
+#define TS_COMPAT_RESTART 0x0008
+
+#define arch_set_restart_data arch_set_restart_data
+
+static inline void arch_set_restart_data(struct restart_block *restart)
+{
+ struct thread_info *ti = current_thread_info();
+ if (ti->status & TS_COMPAT)
+ ti->status |= TS_COMPAT_RESTART;
+ else
+ ti->status &= ~TS_COMPAT_RESTART;
+}
#endif
-#ifndef __ASSEMBLY__
#ifdef CONFIG_X86_32
#define in_ia32_syscall() true
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 8eb7193..2fdbf5e 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -770,30 +770,8 @@ handle_signal(struct ksignal *ksig, struct pt_regs *regs)
static inline unsigned long get_nr_restart_syscall(const struct pt_regs *regs)
{
- /*
- * This function is fundamentally broken as currently
- * implemented.
- *
- * The idea is that we want to trigger a call to the
- * restart_block() syscall and that we want in_ia32_syscall(),
- * in_x32_syscall(), etc. to match whatever they were in the
- * syscall being restarted. We assume that the syscall
- * instruction at (regs->ip - 2) matches whatever syscall
- * instruction we used to enter in the first place.
- *
- * The problem is that we can get here when ptrace pokes
- * syscall-like values into regs even if we're not in a syscall
- * at all.
- *
- * For now, we maintain historical behavior and guess based on
- * stored state. We could do better by saving the actual
- * syscall arch in restart_block or (with caveats on x32) by
- * checking if regs->ip points to 'int $0x80'. The current
- * behavior is incorrect if a tracer has a different bitness
- * than the tracee.
- */
#ifdef CONFIG_IA32_EMULATION
- if (current_thread_info()->status & (TS_COMPAT|TS_I386_REGS_POKED))
+ if (current_thread_info()->status & TS_COMPAT_RESTART)
return __NR_ia32_restart_syscall;
#endif
#ifdef CONFIG_X86_X32_ABI
--
2.5.0
Preparation. Add the new helper which sets restart_block->fn and calls
the dummy arch_set_restart_data() helper.
Signed-off-by: Oleg Nesterov <[email protected]>
---
fs/select.c | 10 ++++------
include/linux/thread_info.h | 13 +++++++++++++
kernel/futex.c | 3 +--
kernel/time/alarmtimer.c | 2 +-
kernel/time/hrtimer.c | 2 +-
kernel/time/posix-cpu-timers.c | 2 +-
6 files changed, 21 insertions(+), 11 deletions(-)
diff --git a/fs/select.c b/fs/select.c
index 53a0c14..e517960 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -1037,10 +1037,9 @@ static long do_restart_poll(struct restart_block *restart_block)
ret = do_sys_poll(ufds, nfds, to);
- if (ret == -ERESTARTNOHAND) {
- restart_block->fn = do_restart_poll;
- ret = -ERESTART_RESTARTBLOCK;
- }
+ if (ret == -ERESTARTNOHAND)
+ ret = set_restart_fn(restart_block, do_restart_poll);
+
return ret;
}
@@ -1062,7 +1061,6 @@ SYSCALL_DEFINE3(poll, struct pollfd __user *, ufds, unsigned int, nfds,
struct restart_block *restart_block;
restart_block = ¤t->restart_block;
- restart_block->fn = do_restart_poll;
restart_block->poll.ufds = ufds;
restart_block->poll.nfds = nfds;
@@ -1073,7 +1071,7 @@ SYSCALL_DEFINE3(poll, struct pollfd __user *, ufds, unsigned int, nfds,
} else
restart_block->poll.has_timeout = 0;
- ret = -ERESTART_RESTARTBLOCK;
+ ret = set_restart_fn(restart_block, do_restart_poll);
}
return ret;
}
diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
index 659a440..be124f0 100644
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -11,6 +11,7 @@
#include <linux/types.h>
#include <linux/bug.h>
#include <linux/restart_block.h>
+#include <linux/errno.h>
#ifdef CONFIG_THREAD_INFO_IN_TASK
/*
@@ -39,6 +40,18 @@ enum {
#ifdef __KERNEL__
+#ifndef arch_set_restart_data
+#define arch_set_restart_data(restart) do { } while (0)
+#endif
+
+static inline long set_restart_fn(struct restart_block *restart,
+ long (*fn)(struct restart_block *))
+{
+ restart->fn = fn;
+ arch_set_restart_data(restart);
+ return -ERESTART_RESTARTBLOCK;
+}
+
#ifndef THREAD_ALIGN
#define THREAD_ALIGN THREAD_SIZE
#endif
diff --git a/kernel/futex.c b/kernel/futex.c
index bd18f60..f4f20e9 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2753,14 +2753,13 @@ static int futex_wait(u32 __user *uaddr, unsigned int flags, u32 val,
goto out;
restart = ¤t->restart_block;
- restart->fn = futex_wait_restart;
restart->futex.uaddr = uaddr;
restart->futex.val = val;
restart->futex.time = *abs_time;
restart->futex.bitset = bitset;
restart->futex.flags = flags | FLAGS_HAS_TIMEOUT;
- ret = -ERESTART_RESTARTBLOCK;
+ ret = set_restart_fn(restart, futex_wait_restart);
out:
if (to) {
diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c
index 451f9d0..a2ddf56 100644
--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -829,9 +829,9 @@ static int alarm_timer_nsleep(const clockid_t which_clock, int flags,
if (flags == TIMER_ABSTIME)
return -ERESTARTNOHAND;
- restart->fn = alarm_timer_nsleep_restart;
restart->nanosleep.clockid = type;
restart->nanosleep.expires = exp;
+ set_restart_fn(restart, alarm_timer_nsleep_restart);
return ret;
}
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 6560553..55f71c4 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1932,9 +1932,9 @@ long hrtimer_nanosleep(const struct timespec64 *rqtp,
}
restart = ¤t->restart_block;
- restart->fn = hrtimer_nanosleep_restart;
restart->nanosleep.clockid = t.timer.base->clockid;
restart->nanosleep.expires = hrtimer_get_expires_tv64(&t.timer);
+ set_restart_fn(restart, hrtimer_nanosleep_restart);
out:
destroy_hrtimer_on_stack(&t.timer);
return ret;
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 42d512f..eacb0ca 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -1335,8 +1335,8 @@ static int posix_cpu_nsleep(const clockid_t which_clock, int flags,
if (flags & TIMER_ABSTIME)
return -ERESTARTNOHAND;
- restart_block->fn = posix_cpu_nsleep_restart;
restart_block->nanosleep.clockid = which_clock;
+ set_restart_fn(restart_block, posix_cpu_nsleep_restart);
}
return error;
}
--
2.5.0
On 12/03, Oleg Nesterov wrote:
>
> +static inline long set_restart_fn(struct restart_block *restart,
> + long (*fn)(struct restart_block *))
> +{
> + restart->fn = fn;
> + arch_set_restart_data(restart);
> + return -ERESTART_RESTARTBLOCK;
this needs include/errno.h (thank you kbuild test robot), I am sending v3.
Oleg.
On Wed, Dec 18, 2019 at 12:02 PM Linus Torvalds
<[email protected]> wrote:
>
> On Wed, Dec 18, 2019 at 7:19 AM Oleg Nesterov <[email protected]> wrote:
> >
> > Andy, Linus, do you have any objections?
>
> It's ok by me, no objections. I still don't love your "hide the bit in
> thread flags over return to user space", and would still prefer it in
> the restart block, but I don't care _that_ deeply.
>
I'd rather stick it in restart_block. I'd also like to see the kernel
*verify* that the variant of restart_syscall() that's invoked is the
same as the variant that should be invoked. In my mind, very few
syscalls say "I can't believe there are no major bugs in here" like
restart_syscall(), and being conservative is nice.
Andy Lutomirski <[email protected]> writes:
> On Wed, Dec 18, 2019 at 12:02 PM Linus Torvalds
> <[email protected]> wrote:
>>
>> On Wed, Dec 18, 2019 at 7:19 AM Oleg Nesterov <[email protected]> wrote:
>> >
>> > Andy, Linus, do you have any objections?
>>
>> It's ok by me, no objections. I still don't love your "hide the bit in
>> thread flags over return to user space", and would still prefer it in
>> the restart block, but I don't care _that_ deeply.
>>
>
> I'd rather stick it in restart_block. I'd also like to see the kernel
> *verify* that the variant of restart_syscall() that's invoked is the
> same as the variant that should be invoked. In my mind, very few
> syscalls say "I can't believe there are no major bugs in here" like
> restart_syscall(), and being conservative is nice.
Just mopping up my backlog. What happened to this?