2021-02-23 05:53:31

by Andy Lutomirski

[permalink] [raw]
Subject: [PATCH 0/3] x86/entry: A compat syscall bugfix and some test stuff

The compat syscall argument fixup error path is wrong. Fix it.

This also adds some sanity checks to the kernel that catch the bug
when running selftests.

Andy Lutomirski (3):
entry: Check that syscall entries and syscall exits match
x86/entry: Fix entry/exit mismatch on failed fast 32-bit syscalls
selftests/x86: Add a missing .note.GNU-stack section to thunks_32.S

arch/x86/entry/common.c | 3 ++-
include/linux/entry-common.h | 11 +++++++++++
include/linux/sched.h | 1 +
init/init_task.c | 9 +++++++++
kernel/entry/common.c | 25 ++++++++++++++++++++++++-
tools/testing/selftests/x86/thunks_32.S | 2 ++
6 files changed, 49 insertions(+), 2 deletions(-)

--
2.29.2


2021-02-23 06:23:41

by Andy Lutomirski

[permalink] [raw]
Subject: [PATCH 2/3] x86/entry: Fix entry/exit mismatch on failed fast 32-bit syscalls

On a 32-bit fast syscall that fails to read its arguments from user
memory, the kernel currently does syscall exit work but not
syscall exit work. This would confuse audit and ptrace.

This is a minimal fix intended for ease of backporting. A more
complete cleanup is coming.

Cc: [email protected]
Fixes: 0b085e68f407 ("x86/entry: Consolidate 32/64 bit syscall entry")
Signed-off-by: Andy Lutomirski <[email protected]>
---
arch/x86/entry/common.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 0904f5676e4d..cf4dcf346ca8 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -128,7 +128,8 @@ static noinstr bool __do_fast_syscall_32(struct pt_regs *regs)
regs->ax = -EFAULT;

instrumentation_end();
- syscall_exit_to_user_mode(regs);
+ local_irq_disable();
+ exit_to_user_mode();
return false;
}

--
2.29.2

2021-02-23 06:23:48

by Andy Lutomirski

[permalink] [raw]
Subject: [PATCH 1/3] entry: Check that syscall entries and syscall exits match

If arch code calls the wrong kernel entry helpers, syscall entries and
exits can get out of sync. Add a new field to task_struct to track the
syscall state and validate that it transitions correctly.

Signed-off-by: Andy Lutomirski <[email protected]>
---

I'm not in love with this patch. I'm imagining moving TS_COMPAT and such
into the new kentry_syscall_state field. What do you all think?

include/linux/entry-common.h | 11 +++++++++++
include/linux/sched.h | 1 +
init/init_task.c | 9 +++++++++
kernel/entry/common.c | 25 ++++++++++++++++++++++++-
4 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
index ca86a00abe86..c2463b6b71fe 100644
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -60,6 +60,17 @@
_TIF_NEED_RESCHED | _TIF_PATCH_PENDING | _TIF_NOTIFY_SIGNAL | \
ARCH_EXIT_TO_USER_MODE_WORK)

+/*
+ * task_struct::kentry_syscall_state
+ *
+ * kentry_syscall_state is reset to zero on syscall exit. For efficiency
+ * reasons, if CONFIG_PROVE_LOCKING=n, kentry_syscall_state is permitted
+ * to remain 0 even inside a syscall.
+ */
+#ifdef CONFIG_PROVE_LOCKING
+# define KENTRY_SYSCALL_STATE_IN_SYSCALL 1
+#endif
+
/**
* arch_check_user_regs - Architecture specific sanity check for user mode regs
* @regs: Pointer to currents pt_regs
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6e3a5eeec509..691057a3b48d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1002,6 +1002,7 @@ struct task_struct {
#endif
struct seccomp seccomp;
struct syscall_user_dispatch syscall_dispatch;
+ u32 kentry_syscall_state;

/* Thread group tracking: */
u64 parent_exec_id;
diff --git a/init/init_task.c b/init/init_task.c
index 8a992d73e6fb..91050c4f0bb3 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -12,6 +12,7 @@
#include <linux/audit.h>
#include <linux/numa.h>
#include <linux/scs.h>
+#include <linux/entry-common.h>

#include <linux/uaccess.h>

@@ -212,6 +213,14 @@ struct task_struct init_task
#ifdef CONFIG_SECCOMP
.seccomp = { .filter_count = ATOMIC_INIT(0) },
#endif
+#ifdef CONFIG_PROVE_LOCKING
+ /*
+ * The init task, and kernel threads in general, are considered
+ * to be "in a syscall". This way they can execve() and then exit
+ * the supposed syscall that they were in to go to user mode.
+ */
+ .kentry_syscall_state = KENTRY_SYSCALL_STATE_IN_SYSCALL,
+#endif
};
EXPORT_SYMBOL(init_task);

diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index 378341642f94..7e971b866ad2 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -83,7 +83,16 @@ static long syscall_trace_enter(struct pt_regs *regs, long syscall,
static __always_inline long
__syscall_enter_from_user_work(struct pt_regs *regs, long syscall)
{
- unsigned long work = READ_ONCE(current_thread_info()->syscall_work);
+ unsigned long work;
+
+ if (IS_ENABLED(CONFIG_PROVE_LOCKING)) {
+ WARN_ONCE(current->kentry_syscall_state,
+ "entering syscall %ld while already in a syscall",
+ syscall);
+ current->kentry_syscall_state = KENTRY_SYSCALL_STATE_IN_SYSCALL;
+ }
+
+ work = READ_ONCE(current_thread_info()->syscall_work);

if (work & SYSCALL_WORK_ENTER)
syscall = syscall_trace_enter(regs, syscall, work);
@@ -195,6 +204,12 @@ static void exit_to_user_mode_prepare(struct pt_regs *regs)
{
unsigned long ti_work = READ_ONCE(current_thread_info()->flags);

+ if (IS_ENABLED(CONFIG_PROVE_LOCKING)) {
+ WARN_ONCE(current->kentry_syscall_state &
+ KENTRY_SYSCALL_STATE_IN_SYSCALL,
+ "exiting to user mode while in syscall context");
+ }
+
lockdep_assert_irqs_disabled();

if (unlikely(ti_work & EXIT_TO_USER_MODE_WORK))
@@ -282,6 +297,14 @@ static void syscall_exit_to_user_mode_prepare(struct pt_regs *regs)
*/
if (unlikely(work & SYSCALL_WORK_EXIT))
syscall_exit_work(regs, work);
+
+ if (IS_ENABLED(CONFIG_PROVE_LOCKING)) {
+ WARN_ONCE(!(current->kentry_syscall_state &
+ KENTRY_SYSCALL_STATE_IN_SYSCALL),
+ "exiting syscall %lu without entering first", nr);
+ }
+
+ current->kentry_syscall_state = 0;
}

static __always_inline void __syscall_exit_to_user_mode_work(struct pt_regs *regs)
--
2.29.2

2021-02-23 06:24:29

by Andy Lutomirski

[permalink] [raw]
Subject: [PATCH 3/3] selftests/x86: Add a missing .note.GNU-stack section to thunks_32.S

test_syscall_vdso_32 ended up with an executable stacks because the asm was
missing the annotation that says that it is modern and doesn't need an
executable stack. Add the annotation.

This was missed in commit aeaaf005da1d ("selftests/x86: Add missing
.note.GNU-stack sections").

Signed-off-by: Andy Lutomirski <[email protected]>
---
tools/testing/selftests/x86/thunks_32.S | 2 ++
1 file changed, 2 insertions(+)

diff --git a/tools/testing/selftests/x86/thunks_32.S b/tools/testing/selftests/x86/thunks_32.S
index a71d92da8f46..f3f56e681e9f 100644
--- a/tools/testing/selftests/x86/thunks_32.S
+++ b/tools/testing/selftests/x86/thunks_32.S
@@ -45,3 +45,5 @@ call64_from_32:
ret

.size call64_from_32, .-call64_from_32
+
+.section .note.GNU-stack,"",%progbits
--
2.29.2

2021-02-23 14:05:01

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 2/3] x86/entry: Fix entry/exit mismatch on failed fast 32-bit syscalls

On Mon, Feb 22 2021 at 21:50, Andy Lutomirski wrote:
> On a 32-bit fast syscall that fails to read its arguments from user
> memory, the kernel currently does syscall exit work but not
> syscall exit work.

and this sentence does not make any sense.

> This would confuse audit and ptrace.

Would? It does confuse it, right?

> This is a minimal fix intended for ease of backporting. A more
> complete cleanup is coming.
>
> Cc: [email protected]
> Fixes: 0b085e68f407 ("x86/entry: Consolidate 32/64 bit syscall entry")
> Signed-off-by: Andy Lutomirski <[email protected]>
> ---
> arch/x86/entry/common.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> index 0904f5676e4d..cf4dcf346ca8 100644
> --- a/arch/x86/entry/common.c
> +++ b/arch/x86/entry/common.c
> @@ -128,7 +128,8 @@ static noinstr bool __do_fast_syscall_32(struct pt_regs *regs)
> regs->ax = -EFAULT;
>
> instrumentation_end();
> - syscall_exit_to_user_mode(regs);
> + local_irq_disable();
> + exit_to_user_mode();

While I suggested this with tired brain yesterday night, it's not really
correct as it does not go through exit_to_user_mode_prepare() which it
really has to.

Thanks,

tglx

2021-02-23 14:18:29

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 2/3] x86/entry: Fix entry/exit mismatch on failed fast 32-bit syscalls

On Mon, Feb 22, 2021 at 09:50:28PM -0800, Andy Lutomirski wrote:
> On a 32-bit fast syscall that fails to read its arguments from user
> memory, the kernel currently does syscall exit work but not
> syscall exit work. This would confuse audit and ptrace.
>
> This is a minimal fix intended for ease of backporting. A more
> complete cleanup is coming.
>
> Cc: [email protected]
> Fixes: 0b085e68f407 ("x86/entry: Consolidate 32/64 bit syscall entry")
> Signed-off-by: Andy Lutomirski <[email protected]>
> ---
> arch/x86/entry/common.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> index 0904f5676e4d..cf4dcf346ca8 100644
> --- a/arch/x86/entry/common.c
> +++ b/arch/x86/entry/common.c
> @@ -128,7 +128,8 @@ static noinstr bool __do_fast_syscall_32(struct pt_regs *regs)
> regs->ax = -EFAULT;
>
> instrumentation_end();
> - syscall_exit_to_user_mode(regs);
> + local_irq_disable();
> + exit_to_user_mode();
> return false;
> }

I'm confused, twice. Once by your Changelog, and second by the actual
patch. Shouldn't every return to userspace pass through
exit_to_user_mode_prepare() ? We shouldn't ignore NEED_RESCHED or
NOTIFY_RESUME, both of which can be set I think, even if the SYSCALL
didn't actually do anything.

2021-02-23 15:40:02

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH 2/3] x86/entry: Fix entry/exit mismatch on failed fast 32-bit syscalls



> On Feb 23, 2021, at 3:29 AM, Peter Zijlstra <[email protected]> wrote:
>
> On Mon, Feb 22, 2021 at 09:50:28PM -0800, Andy Lutomirski wrote:
>> On a 32-bit fast syscall that fails to read its arguments from user
>> memory, the kernel currently does syscall exit work but not
>> syscall exit work. This would confuse audit and ptrace.
>>
>> This is a minimal fix intended for ease of backporting. A more
>> complete cleanup is coming.
>>
>> Cc: [email protected]
>> Fixes: 0b085e68f407 ("x86/entry: Consolidate 32/64 bit syscall entry")
>> Signed-off-by: Andy Lutomirski <[email protected]>
>> ---
>> arch/x86/entry/common.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
>> index 0904f5676e4d..cf4dcf346ca8 100644
>> --- a/arch/x86/entry/common.c
>> +++ b/arch/x86/entry/common.c
>> @@ -128,7 +128,8 @@ static noinstr bool __do_fast_syscall_32(struct pt_regs *regs)
>> regs->ax = -EFAULT;
>>
>> instrumentation_end();
>> - syscall_exit_to_user_mode(regs);
>> + local_irq_disable();
>> + exit_to_user_mode();
>> return false;
>> }
>
> I'm confused, twice. Once by your Changelog, and second by the actual
> patch. Shouldn't every return to userspace pass through
> exit_to_user_mode_prepare() ? We shouldn't ignore NEED_RESCHED or
> NOTIFY_RESUME, both of which can be set I think, even if the SYSCALL
> didn't actually do anything.


Aaaaahhhhhh! There are too many of these functions. I’ll poke around. I’ll also try to figure out why I didn’t catch this in testing.