2020-11-14 03:31:27

by Gabriel Krisman Bertazi

Subject: [PATCH 00/10] Migrate syscall entry/exit work to SYSCALL_WORK flagset

Thomas,

This is a refactoring that moves the work done by features like seccomp,
ptrace, audit and tracepoints out of the TI flags. The reasons are:

1) Scarcity of TI flags on 32-bit x86.

2) TI flags are defined by the architecture, while these features are
arch-independent.

3) Community resistance to merging new architecture-independent
features as TI flags.

The design exposes a new field in struct thread_info that is read at
syscall_trace_enter and syscall_exit_work in place of the TI flags.
No functional change is expected from this patchset. The design and
organization of this patchset achieve the following goals:

1) SYSCALL_WORK flags are architecture-independent

2) Architectures that are not using the generic entry code can
continue to use TI flags transparently and be converted later.

3) Architectures that migrate to the generic entry code are forced to
use the new design.

4) x86, since it supports the generic code, is migrated in this
patchset.

The transparent usage of TIF or SYSCALL_WORK flags is achieved through
some macros. Any code outside of the generic entry code is converted to
use the flags only through the accessors.
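
As a rough illustration of the accessor idea, here is a minimal user-space
sketch. It is not the code from this series: the cut-down struct
thread_info, the MODEL_GENERIC_ENTRY switch, the WORK_WORD/WORK_BIT helpers
and the TIF bit numbers are all assumptions made for the example, and the
real accessors take a task and use atomic bit operations. It only shows how
one short name such as SECCOMP can resolve to either field by token pasting.

```c
#include <assert.h>

/* Hypothetical user-space stand-in for the kernel's struct thread_info. */
struct thread_info {
	unsigned long flags;        /* architecture-defined TIF_* bits */
	unsigned long syscall_work; /* architecture-independent bits */
};

/* Bit numbers in the style this series introduces. */
enum syscall_work_bit {
	SYSCALL_WORK_SECCOMP            = 0,
	SYSCALL_WORK_SYSCALL_TRACEPOINT = 1,
};

/* Made-up TIF bit numbers standing in for an architecture's own. */
#define TIF_SECCOMP             8
#define TIF_SYSCALL_TRACEPOINT 28

/* Set to 0 to model an architecture not yet on the generic entry code. */
#ifndef MODEL_GENERIC_ENTRY
#define MODEL_GENERIC_ENTRY 1
#endif

#if MODEL_GENERIC_ENTRY
#define WORK_WORD(ti)  (&(ti)->syscall_work)
#define WORK_BIT(name) SYSCALL_WORK_##name
#else
#define WORK_WORD(ti)  (&(ti)->flags)
#define WORK_BIT(name) TIF_##name
#endif

/* Non-atomic on purpose: this is a model, not kernel code. */
#define set_task_syscall_work(ti, name) \
	(*WORK_WORD(ti) |= 1UL << WORK_BIT(name))
#define clear_task_syscall_work(ti, name) \
	(*WORK_WORD(ti) &= ~(1UL << WORK_BIT(name)))
#define test_task_syscall_work(ti, name) \
	((*WORK_WORD(ti) >> WORK_BIT(name)) & 1UL)

/* Small self-check of the model; callers never name a field directly. */
static inline void syscall_work_model_selfcheck(void)
{
	struct thread_info ti = { 0, 0 };

	set_task_syscall_work(&ti, SYSCALL_TRACEPOINT);
	assert(test_task_syscall_work(&ti, SYSCALL_TRACEPOINT));
	clear_task_syscall_work(&ti, SYSCALL_TRACEPOINT);
	assert(!test_task_syscall_work(&ti, SYSCALL_TRACEPOINT));
}
```

With the switch left at 1 the bit lands in syscall_work and flags stays
untouched; building with -DMODEL_GENERIC_ENTRY=0 sends the same call sites
to the TIF word instead.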

The patchset has some transition helpers, in an attempt to simplify the
patches converting each of the subsystems separately. I believe this
simplifies the review while making the tree bisectable.

I tested this by running each of the features on x86. Other
architectures were compile-tested only.

This is based on top of tip/master.

A tree with the patches applied can be pulled from

https://gitlab.collabora.com/krisman/linux.git -b x86/tif-cleanup-v1

Please, if possible, consider queueing this for the 5.11 merge window,
as this is blocking the Syscall User Dispatch work that has been on the
list for a while.

Gabriel Krisman Bertazi (10):
x86: Expose syscall_work field in thread_info
kernel: entry: Expose helpers to migrate TIF to SYSCALL_WORK flags
kernel: entry: Wire up syscall_work in common entry code
seccomp: Migrate to use SYSCALL_WORK flag
tracepoints: Migrate to use SYSCALL_WORK flag
ptrace: Migrate to use SYSCALL_TRACE flag
ptrace: Migrate TIF_SYSCALL_EMU to use SYSCALL_WORK flag
audit: Migrate to use SYSCALL_WORK flag
kernel: entry: Drop usage of TIF flags in the generic syscall code
x86: Reclaim unused x86 TI flags

arch/x86/include/asm/thread_info.h | 11 +-----
include/asm-generic/syscall.h | 14 ++++----
include/linux/entry-common.h | 44 ++++++++---------------
include/linux/seccomp.h | 2 +-
include/linux/thread_info.h | 57 ++++++++++++++++++++++++++++++
include/linux/tracehook.h | 6 ++--
include/trace/syscall.h | 6 ++--
kernel/auditsc.c | 4 +--
kernel/entry/common.c | 45 +++++++++++------------
kernel/fork.c | 8 ++---
kernel/ptrace.c | 16 ++++-----
kernel/seccomp.c | 6 ++--
kernel/trace/trace_events.c | 2 +-
kernel/tracepoint.c | 4 +--
14 files changed, 130 insertions(+), 95 deletions(-)

--
2.29.2


2020-11-14 03:32:17

by Gabriel Krisman Bertazi

Subject: [PATCH 06/10] ptrace: Migrate to use SYSCALL_TRACE flag

For architectures that rely on the generic syscall entry code, use the
syscall_work field in struct thread_info and the specific SYSCALL_WORK
flag. This set of flags has the advantage of being architecture
independent.

Users of the flag outside of the generic entry code should rely on the
accessor macros, such that the flag is still correctly resolved for
architectures that don't use the generic entry code and still rely on
TIF flags for system call work.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
include/asm-generic/syscall.h | 14 +++++++-------
include/linux/entry-common.h | 10 ++++++----
include/linux/thread_info.h | 2 ++
include/linux/tracehook.h | 6 +++---
kernel/entry/common.c | 4 ++--
kernel/fork.c | 2 +-
kernel/ptrace.c | 6 +++---
7 files changed, 24 insertions(+), 20 deletions(-)

diff --git a/include/asm-generic/syscall.h b/include/asm-generic/syscall.h
index f3135e734387..5042d1ba4bc5 100644
--- a/include/asm-generic/syscall.h
+++ b/include/asm-generic/syscall.h
@@ -43,7 +43,7 @@ int syscall_get_nr(struct task_struct *task, struct pt_regs *regs);
* @regs: task_pt_regs() of @task
*
* It's only valid to call this when @task is stopped for system
- * call exit tracing (due to TIF_SYSCALL_TRACE or TIF_SYSCALL_AUDIT),
+ * call exit tracing (due to SYSCALL_TRACE or TIF_SYSCALL_AUDIT),
* after tracehook_report_syscall_entry() returned nonzero to prevent
* the system call from taking place.
*
@@ -63,7 +63,7 @@ void syscall_rollback(struct task_struct *task, struct pt_regs *regs);
* Returns 0 if the system call succeeded, or -ERRORCODE if it failed.
*
* It's only valid to call this when @task is stopped for tracing on exit
- * from a system call, due to %TIF_SYSCALL_TRACE or %TIF_SYSCALL_AUDIT.
+ * from a system call, due to %SYSCALL_TRACE or %TIF_SYSCALL_AUDIT.
*/
long syscall_get_error(struct task_struct *task, struct pt_regs *regs);

@@ -76,7 +76,7 @@ long syscall_get_error(struct task_struct *task, struct pt_regs *regs);
* This value is meaningless if syscall_get_error() returned nonzero.
*
* It's only valid to call this when @task is stopped for tracing on exit
- * from a system call, due to %TIF_SYSCALL_TRACE or %TIF_SYSCALL_AUDIT.
+ * from a system call, due to %SYSCALL_TRACE or %TIF_SYSCALL_AUDIT.
*/
long syscall_get_return_value(struct task_struct *task, struct pt_regs *regs);

@@ -93,7 +93,7 @@ long syscall_get_return_value(struct task_struct *task, struct pt_regs *regs);
* code; the user sees a failed system call with this errno code.
*
* It's only valid to call this when @task is stopped for tracing on exit
- * from a system call, due to %TIF_SYSCALL_TRACE or %TIF_SYSCALL_AUDIT.
+ * from a system call, due to %SYSCALL_TRACE or %TIF_SYSCALL_AUDIT.
*/
void syscall_set_return_value(struct task_struct *task, struct pt_regs *regs,
int error, long val);
@@ -108,7 +108,7 @@ void syscall_set_return_value(struct task_struct *task, struct pt_regs *regs,
* @args[0], and so on.
*
* It's only valid to call this when @task is stopped for tracing on
- * entry to a system call, due to %TIF_SYSCALL_TRACE or %TIF_SYSCALL_AUDIT.
+ * entry to a system call, due to %SYSCALL_TRACE or %TIF_SYSCALL_AUDIT.
*/
void syscall_get_arguments(struct task_struct *task, struct pt_regs *regs,
unsigned long *args);
@@ -123,7 +123,7 @@ void syscall_get_arguments(struct task_struct *task, struct pt_regs *regs,
* The first argument gets value @args[0], and so on.
*
* It's only valid to call this when @task is stopped for tracing on
- * entry to a system call, due to %TIF_SYSCALL_TRACE or %TIF_SYSCALL_AUDIT.
+ * entry to a system call, due to %SYSCALL_TRACE or %TIF_SYSCALL_AUDIT.
*/
void syscall_set_arguments(struct task_struct *task, struct pt_regs *regs,
const unsigned long *args);
@@ -135,7 +135,7 @@ void syscall_set_arguments(struct task_struct *task, struct pt_regs *regs,
* Returns the AUDIT_ARCH_* based on the system call convention in use.
*
* It's only valid to call this when @task is stopped on entry to a system
- * call, due to %TIF_SYSCALL_TRACE, %TIF_SYSCALL_AUDIT, or %TIF_SECCOMP.
+ * call, due to %SYSCALL_TRACE, %TIF_SYSCALL_AUDIT, or %TIF_SECCOMP.
*
* Architectures which permit CONFIG_HAVE_ARCH_SECCOMP_FILTER must
* provide an implementation of this.
diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
index 8aba367e5c79..dc864edb7950 100644
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -41,7 +41,7 @@
#endif

#define SYSCALL_ENTER_WORK \
- (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
+ (_TIF_SYSCALL_AUDIT | \
_TIF_SYSCALL_EMU | \
ARCH_SYSCALL_ENTER_WORK)

@@ -53,12 +53,14 @@
#endif

#define SYSCALL_EXIT_WORK \
- (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
+ (_TIF_SYSCALL_AUDIT | \
ARCH_SYSCALL_EXIT_WORK)

#define SYSCALL_WORK_ENTER (SYSCALL_WORK_SECCOMP | \
- SYSCALL_WORK_SYSCALL_TRACEPOINT)
-#define SYSCALL_WORK_EXIT (SYSCALL_WORK_SYSCALL_TRACEPOINT)
+ SYSCALL_WORK_SYSCALL_TRACEPOINT | \
+ SYSCALL_WORK_SYSCALL_TRACE)
+#define SYSCALL_WORK_EXIT (SYSCALL_WORK_SYSCALL_TRACEPOINT | \
+ SYSCALL_WORK_SYSCALL_TRACE)

/*
* TIF flags handled in exit_to_user_mode_loop()
diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
index f764314b00b9..b01f05282158 100644
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -39,10 +39,12 @@ enum syscall_work_bit {

SYSCALL_WORK_SECCOMP = 0,
SYSCALL_WORK_SYSCALL_TRACEPOINT = 1,
+ SYSCALL_WORK_SYSCALL_TRACE = 2,
};

#define _SYSCALL_WORK_SECCOMP BIT(SYSCALL_WORK_SECCOMP)
#define _SYSCALL_WORK_SYSCALL_TRACEPOINT BIT(SYSCALL_WORK_SYSCALL_TRACEPOINT)
+#define _SYSCALL_WORK_SYSCALL_TRACE BIT(SYSCALL_WORK_SYSCALL_TRACE)

#include <asm/thread_info.h>

diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index f7d82e4fafd6..0aa3771d1df5 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -83,7 +83,7 @@ static inline int ptrace_report_syscall(struct pt_regs *regs,
* tracehook_report_syscall_entry - task is about to attempt a system call
* @regs: user register state of current task
*
- * This will be called if %TIF_SYSCALL_TRACE or %TIF_SYSCALL_EMU have been set,
+ * This will be called if %SYSCALL_TRACE or %TIF_SYSCALL_EMU have been set,
* when the current task has just entered the kernel for a system call.
* Full user register state is available here. Changing the values
* in @regs can affect the system call number and arguments to be tried.
@@ -109,7 +109,7 @@ static inline __must_check int tracehook_report_syscall_entry(
* @regs: user register state of current task
* @step: nonzero if simulating single-step or block-step
*
- * This will be called if %TIF_SYSCALL_TRACE has been set, when the
+ * This will be called if %SYSCALL_TRACE has been set, when the
* current task has just finished an attempted system call. Full
* user register state is available here. It is safe to block here,
* preventing signals from being processed.
@@ -117,7 +117,7 @@ static inline __must_check int tracehook_report_syscall_entry(
* If @step is nonzero, this report is also in lieu of the normal
* trap that would follow the system call instruction because
* user_enable_block_step() or user_enable_single_step() was used.
- * In this case, %TIF_SYSCALL_TRACE might not be set.
+ * In this case, %SYSCALL_TRACE might not be set.
*
* Called without locks, just before checking for pending signals.
*/
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index 745b847f4ed4..55ede5fed650 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -47,7 +47,7 @@ static long syscall_trace_enter(struct pt_regs *regs, long syscall,
long ret = 0;

/* Handle ptrace */
- if (ti_work & (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_EMU)) {
+ if (work & _SYSCALL_WORK_SYSCALL_TRACE || ti_work & _TIF_SYSCALL_EMU) {
ret = arch_syscall_enter_tracehook(regs);
if (ret || (ti_work & _TIF_SYSCALL_EMU))
return -1L;
@@ -237,7 +237,7 @@ static void syscall_exit_work(struct pt_regs *regs, unsigned long ti_work,
trace_sys_exit(regs, syscall_get_return_value(current, regs));

step = report_single_step(ti_work);
- if (step || ti_work & _TIF_SYSCALL_TRACE)
+ if (step || work & _SYSCALL_WORK_SYSCALL_TRACE)
arch_syscall_exit_tracehook(regs, step);
}

diff --git a/kernel/fork.c b/kernel/fork.c
index 4433c9c60100..6f934a930015 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2158,7 +2158,7 @@ static __latent_entropy struct task_struct *copy_process(
* child regardless of CLONE_PTRACE.
*/
user_disable_single_step(p);
- clear_tsk_thread_flag(p, TIF_SYSCALL_TRACE);
+ clear_task_syscall_work(p, SYSCALL_TRACE);
#ifdef TIF_SYSCALL_EMU
clear_tsk_thread_flag(p, TIF_SYSCALL_EMU);
#endif
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 43d6179508d6..55a2bc3186a7 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -117,7 +117,7 @@ void __ptrace_unlink(struct task_struct *child)
const struct cred *old_cred;
BUG_ON(!child->ptrace);

- clear_tsk_thread_flag(child, TIF_SYSCALL_TRACE);
+ clear_task_syscall_work(child, SYSCALL_TRACE);
#ifdef TIF_SYSCALL_EMU
clear_tsk_thread_flag(child, TIF_SYSCALL_EMU);
#endif
@@ -812,9 +812,9 @@ static int ptrace_resume(struct task_struct *child, long request,
return -EIO;

if (request == PTRACE_SYSCALL)
- set_tsk_thread_flag(child, TIF_SYSCALL_TRACE);
+ set_task_syscall_work(child, SYSCALL_TRACE);
else
- clear_tsk_thread_flag(child, TIF_SYSCALL_TRACE);
+ clear_task_syscall_work(child, SYSCALL_TRACE);

#ifdef TIF_SYSCALL_EMU
if (request == PTRACE_SYSEMU || request == PTRACE_SYSEMU_SINGLESTEP)
--
2.29.2

2020-11-14 03:32:29

by Gabriel Krisman Bertazi

Subject: [PATCH 07/10] ptrace: Migrate TIF_SYSCALL_EMU to use SYSCALL_WORK flag

For architectures that rely on the generic syscall entry code, use the
syscall_work field in struct thread_info and the specific SYSCALL_WORK
flag. This set of flags has the advantage of being architecture
independent.

Users of the flag outside of the generic entry code should rely on the
accessor macros, such that the flag is still correctly resolved for
architectures that don't use the generic entry code and still rely on
TIF flags for system call work.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
include/linux/entry-common.h | 8 ++------
include/linux/thread_info.h | 2 ++
include/linux/tracehook.h | 2 +-
kernel/entry/common.c | 19 ++++++++++---------
kernel/fork.c | 4 ++--
kernel/ptrace.c | 10 +++++-----
6 files changed, 22 insertions(+), 23 deletions(-)

diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
index dc864edb7950..39d56558818d 100644
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -13,10 +13,6 @@
* Define dummy _TIF work flags if not defined by the architecture or for
* disabled functionality.
*/
-#ifndef _TIF_SYSCALL_EMU
-# define _TIF_SYSCALL_EMU (0)
-#endif
-
#ifndef _TIF_SYSCALL_AUDIT
# define _TIF_SYSCALL_AUDIT (0)
#endif
@@ -42,7 +38,6 @@

#define SYSCALL_ENTER_WORK \
(_TIF_SYSCALL_AUDIT | \
- _TIF_SYSCALL_EMU | \
ARCH_SYSCALL_ENTER_WORK)

/*
@@ -58,7 +53,8 @@

#define SYSCALL_WORK_ENTER (SYSCALL_WORK_SECCOMP | \
SYSCALL_WORK_SYSCALL_TRACEPOINT | \
- SYSCALL_WORK_SYSCALL_TRACE)
+ SYSCALL_WORK_SYSCALL_TRACE | \
+ SYSCALL_WORK_SYSCALL_EMU)
#define SYSCALL_WORK_EXIT (SYSCALL_WORK_SYSCALL_TRACEPOINT | \
SYSCALL_WORK_SYSCALL_TRACE)

diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
index b01f05282158..3c7dedadf94d 100644
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -40,11 +40,13 @@ enum syscall_work_bit {
SYSCALL_WORK_SECCOMP = 0,
SYSCALL_WORK_SYSCALL_TRACEPOINT = 1,
SYSCALL_WORK_SYSCALL_TRACE = 2,
+ SYSCALL_WORK_SYSCALL_EMU = 3,
};

#define _SYSCALL_WORK_SECCOMP BIT(SYSCALL_WORK_SECCOMP)
#define _SYSCALL_WORK_SYSCALL_TRACEPOINT BIT(SYSCALL_WORK_SYSCALL_TRACEPOINT)
#define _SYSCALL_WORK_SYSCALL_TRACE BIT(SYSCALL_WORK_SYSCALL_TRACE)
+#define _SYSCALL_WORK_SYSCALL_EMU BIT(SYSCALL_WORK_SYSCALL_EMU)

#include <asm/thread_info.h>

diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 0aa3771d1df5..24424da49abc 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -83,7 +83,7 @@ static inline int ptrace_report_syscall(struct pt_regs *regs,
* tracehook_report_syscall_entry - task is about to attempt a system call
* @regs: user register state of current task
*
- * This will be called if %SYSCALL_TRACE or %TIF_SYSCALL_EMU have been set,
+ * This will be called if %SYSCALL_TRACE or %SYSCALL_EMU have been set,
* when the current task has just entered the kernel for a system call.
* Full user register state is available here. Changing the values
* in @regs can affect the system call number and arguments to be tried.
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index 55ede5fed650..0170a4ae58f8 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -47,9 +47,9 @@ static long syscall_trace_enter(struct pt_regs *regs, long syscall,
long ret = 0;

/* Handle ptrace */
- if (work & _SYSCALL_WORK_SYSCALL_TRACE || ti_work & _TIF_SYSCALL_EMU) {
+ if (work & (_SYSCALL_WORK_SYSCALL_TRACE | _SYSCALL_WORK_SYSCALL_EMU)) {
ret = arch_syscall_enter_tracehook(regs);
- if (ret || (ti_work & _TIF_SYSCALL_EMU))
+ if (ret || (work & _SYSCALL_WORK_SYSCALL_EMU))
return -1L;
}

@@ -208,21 +208,22 @@ static void exit_to_user_mode_prepare(struct pt_regs *regs)
}

#ifndef _TIF_SINGLESTEP
-static inline bool report_single_step(unsigned long ti_work)
+static inline bool report_single_step(unsigned long work)
{
return false;
}
#else
/*
- * If TIF_SYSCALL_EMU is set, then the only reason to report is when
+ * If SYSCALL_EMU is set, then the only reason to report is when
* TIF_SINGLESTEP is set (i.e. PTRACE_SYSEMU_SINGLESTEP). This syscall
* instruction has been already reported in syscall_enter_from_user_mode().
*/
-#define SYSEMU_STEP (_TIF_SINGLESTEP | _TIF_SYSCALL_EMU)
-
-static inline bool report_single_step(unsigned long ti_work)
+static inline bool report_single_step(unsigned long work)
{
- return (ti_work & SYSEMU_STEP) == _TIF_SINGLESTEP;
+ if (work & _SYSCALL_WORK_SYSCALL_EMU)
+ return false;
+
+ return !!(current_thread_info()->flags & _TIF_SINGLESTEP);
}
#endif

@@ -236,7 +237,7 @@ static void syscall_exit_work(struct pt_regs *regs, unsigned long ti_work,
if (work & _SYSCALL_WORK_SYSCALL_TRACEPOINT)
trace_sys_exit(regs, syscall_get_return_value(current, regs));

- step = report_single_step(ti_work);
+ step = report_single_step(work);
if (step || work & _SYSCALL_WORK_SYSCALL_TRACE)
arch_syscall_exit_tracehook(regs, step);
}
diff --git a/kernel/fork.c b/kernel/fork.c
index 6f934a930015..4f131cb0192a 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2159,8 +2159,8 @@ static __latent_entropy struct task_struct *copy_process(
*/
user_disable_single_step(p);
clear_task_syscall_work(p, SYSCALL_TRACE);
-#ifdef TIF_SYSCALL_EMU
- clear_tsk_thread_flag(p, TIF_SYSCALL_EMU);
+#if defined(CONFIG_GENERIC_ENTRY) || defined(TIF_SYSCALL_EMU)
+ clear_task_syscall_work(p, SYSCALL_EMU);
#endif
clear_tsk_latency_tracing(p);

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 55a2bc3186a7..237bcd6d255c 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -118,8 +118,8 @@ void __ptrace_unlink(struct task_struct *child)
BUG_ON(!child->ptrace);

clear_task_syscall_work(child, SYSCALL_TRACE);
-#ifdef TIF_SYSCALL_EMU
- clear_tsk_thread_flag(child, TIF_SYSCALL_EMU);
+#if defined(CONFIG_GENERIC_ENTRY) || defined(TIF_SYSCALL_EMU)
+ clear_task_syscall_work(child, SYSCALL_EMU);
#endif

child->parent = child->real_parent;
@@ -816,11 +816,11 @@ static int ptrace_resume(struct task_struct *child, long request,
else
clear_task_syscall_work(child, SYSCALL_TRACE);

-#ifdef TIF_SYSCALL_EMU
+#if defined(CONFIG_GENERIC_ENTRY) || defined(TIF_SYSCALL_EMU)
if (request == PTRACE_SYSEMU || request == PTRACE_SYSEMU_SINGLESTEP)
- set_tsk_thread_flag(child, TIF_SYSCALL_EMU);
+ set_task_syscall_work(child, SYSCALL_EMU);
else
- clear_tsk_thread_flag(child, TIF_SYSCALL_EMU);
+ clear_task_syscall_work(child, SYSCALL_EMU);
#endif

if (is_singleblock(request)) {
--
2.29.2
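
For reference, the pre-patch predicate in report_single_step() is
(ti_work & SYSEMU_STEP) == _TIF_SINGLESTEP, i.e. report only when
single-stepping is on and syscall emulation is off. The sketch below is a
user-space model of one way to keep that truth table once the two bits live
in different words. The OLD_/NEW_ names and bit positions are illustrative
assumptions, not definitions taken from the tree.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit positions only. */
#define OLD_TIF_SINGLESTEP   (1UL << 4)
#define OLD_TIF_SYSCALL_EMU  (1UL << 6)
#define OLD_SYSEMU_STEP      (OLD_TIF_SINGLESTEP | OLD_TIF_SYSCALL_EMU)
#define NEW_WORK_SYSCALL_EMU (1UL << 3)

/* Pre-patch form: both bits are tested in the same TI flags word. */
static inline bool report_single_step_old(unsigned long ti_work)
{
	return (ti_work & OLD_SYSEMU_STEP) == OLD_TIF_SINGLESTEP;
}

/* Split form: the EMU bit comes from syscall_work, SINGLESTEP from flags. */
static inline bool report_single_step_split(unsigned long ti_flags,
					    unsigned long work)
{
	if (work & NEW_WORK_SYSCALL_EMU)
		return false;	/* already reported at syscall entry */

	return (ti_flags & OLD_TIF_SINGLESTEP) != 0;
}

/* Walk all four combinations and check that both forms agree. */
static inline void report_single_step_selfcheck(void)
{
	for (int step = 0; step <= 1; step++) {
		for (int emu = 0; emu <= 1; emu++) {
			unsigned long old_word =
				(step ? OLD_TIF_SINGLESTEP : 0UL) |
				(emu ? OLD_TIF_SYSCALL_EMU : 0UL);
			unsigned long flags = step ? OLD_TIF_SINGLESTEP : 0UL;
			unsigned long work = emu ? NEW_WORK_SYSCALL_EMU : 0UL;

			assert(report_single_step_old(old_word) ==
			       report_single_step_split(flags, work));
		}
	}
}
```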

2020-11-14 03:32:42

by Gabriel Krisman Bertazi

Subject: [PATCH 03/10] kernel: entry: Wire up syscall_work in common entry code

Prepare the common entry code to use the SYSCALL_WORK flags. They will
be defined in subsequent patches for each type of syscall
work. SYSCALL_WORK_ENTER/EXIT are defined for the transition, as they
will replace the TIF_ equivalent defines.
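
The transitional check can be pictured with the small user-space model
below. It is a sketch under assumptions: struct entry_masks and
needs_entry_work() are hypothetical names, and the mask values are
placeholders. It shows why an empty SYSCALL_WORK_ENTER mask leaves
behaviour unchanged at this point in the series, and how a feature moves
over once its bit is added to the new mask and dropped from the TIF one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical pair of masks consulted on syscall entry. */
struct entry_masks {
	unsigned long tif_enter;  /* stands in for SYSCALL_ENTER_WORK */
	unsigned long work_enter; /* stands in for SYSCALL_WORK_ENTER */
};

/* Slow-path entry work is needed if either word has a relevant bit set. */
static inline bool needs_entry_work(const struct entry_masks *m,
				    unsigned long ti_work,
				    unsigned long work)
{
	return (work & m->work_enter) || (ti_work & m->tif_enter);
}

/* Model of this patch: the new mask is still empty. */
static inline void entry_masks_selfcheck(void)
{
	struct entry_masks at_this_patch = { 0x0fUL, 0UL };

	assert(needs_entry_work(&at_this_patch, 0x01UL, 0UL));
	assert(!needs_entry_work(&at_this_patch, 0UL, 0x01UL));
}
```

Because each later patch moves exactly one feature from the first mask to
the second, every intermediate tree keeps a working entry path.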

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
include/linux/entry-common.h | 3 +++
kernel/entry/common.c | 15 +++++++++------
2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
index 1a128baf3628..cbc5c702ee4d 100644
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -64,6 +64,9 @@
(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
_TIF_SYSCALL_TRACEPOINT | ARCH_SYSCALL_EXIT_WORK)

+#define SYSCALL_WORK_ENTER (0)
+#define SYSCALL_WORK_EXIT (0)
+
/*
* TIF flags handled in exit_to_user_mode_loop()
*/
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index bc75c114c1b3..5a4bb72ff28e 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -42,7 +42,7 @@ static inline void syscall_enter_audit(struct pt_regs *regs, long syscall)
}

static long syscall_trace_enter(struct pt_regs *regs, long syscall,
- unsigned long ti_work)
+ unsigned long ti_work, unsigned long work)
{
long ret = 0;

@@ -75,10 +75,11 @@ static __always_inline long
__syscall_enter_from_user_work(struct pt_regs *regs, long syscall)
{
unsigned long ti_work;
+ unsigned long work = READ_ONCE(current_thread_info()->syscall_work);

ti_work = READ_ONCE(current_thread_info()->flags);
- if (ti_work & SYSCALL_ENTER_WORK)
- syscall = syscall_trace_enter(regs, syscall, ti_work);
+ if (work & SYSCALL_WORK_ENTER || ti_work & SYSCALL_ENTER_WORK)
+ syscall = syscall_trace_enter(regs, syscall, ti_work, work);

return syscall;
}
@@ -225,7 +226,8 @@ static inline bool report_single_step(unsigned long ti_work)
}
#endif

-static void syscall_exit_work(struct pt_regs *regs, unsigned long ti_work)
+static void syscall_exit_work(struct pt_regs *regs, unsigned long ti_work,
+ unsigned long work)
{
bool step;

@@ -246,6 +248,7 @@ static void syscall_exit_work(struct pt_regs *regs, unsigned long ti_work)
static void syscall_exit_to_user_mode_prepare(struct pt_regs *regs)
{
u32 cached_flags = READ_ONCE(current_thread_info()->flags);
+ unsigned long work = READ_ONCE(current_thread_info()->syscall_work);
unsigned long nr = syscall_get_nr(current, regs);

CT_WARN_ON(ct_state() != CONTEXT_KERNEL);
@@ -262,8 +265,8 @@ static void syscall_exit_to_user_mode_prepare(struct pt_regs *regs)
* enabled, we want to run them exactly once per syscall exit with
* interrupts enabled.
*/
- if (unlikely(cached_flags & SYSCALL_EXIT_WORK))
- syscall_exit_work(regs, cached_flags);
+ if (unlikely(work & SYSCALL_WORK_EXIT || cached_flags & SYSCALL_EXIT_WORK))
+ syscall_exit_work(regs, cached_flags, work);
}

__visible noinstr void syscall_exit_to_user_mode(struct pt_regs *regs)
--
2.29.2

2020-11-14 03:32:49

by Gabriel Krisman Bertazi

Subject: [PATCH 04/10] seccomp: Migrate to use SYSCALL_WORK flag

For architectures that rely on the generic syscall entry code, use the
syscall_work field in struct thread_info and the specific SYSCALL_WORK
flag to set up this syscall work. This flag has the advantage of being
architecture independent.

Users of the flag outside of the generic entry code should rely on the
accessor macros, such that the flag is still correctly resolved for
architectures that don't use the generic entry code and still rely on
TIF flags for system call work.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
include/linux/entry-common.h | 8 ++------
include/linux/seccomp.h | 2 +-
include/linux/thread_info.h | 6 ++++++
kernel/entry/common.c | 2 +-
kernel/fork.c | 2 +-
kernel/seccomp.c | 6 +++---
6 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
index cbc5c702ee4d..f3fc4457f63f 100644
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -21,10 +21,6 @@
# define _TIF_SYSCALL_TRACEPOINT (0)
#endif

-#ifndef _TIF_SECCOMP
-# define _TIF_SECCOMP (0)
-#endif
-
#ifndef _TIF_SYSCALL_AUDIT
# define _TIF_SYSCALL_AUDIT (0)
#endif
@@ -49,7 +45,7 @@
#endif

#define SYSCALL_ENTER_WORK \
- (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | _TIF_SECCOMP | \
+ (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
_TIF_SYSCALL_TRACEPOINT | _TIF_SYSCALL_EMU | \
ARCH_SYSCALL_ENTER_WORK)

@@ -64,7 +60,7 @@
(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
_TIF_SYSCALL_TRACEPOINT | ARCH_SYSCALL_EXIT_WORK)

-#define SYSCALL_WORK_ENTER (0)
+#define SYSCALL_WORK_ENTER (SYSCALL_WORK_SECCOMP)
#define SYSCALL_WORK_EXIT (0)

/*
diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index 02aef2844c38..47763f3999f7 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -42,7 +42,7 @@ struct seccomp {
extern int __secure_computing(const struct seccomp_data *sd);
static inline int secure_computing(void)
{
- if (unlikely(test_thread_flag(TIF_SECCOMP)))
+ if (unlikely(test_syscall_work(SECCOMP)))
return __secure_computing(NULL);
return 0;
}
diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
index 18755373dc4d..fb53c24fc8a6 100644
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -35,6 +35,12 @@ enum {
GOOD_STACK,
};

+enum syscall_work_bit {
+ SYSCALL_WORK_SECCOMP = 0,
+};
+
+#define _SYSCALL_WORK_SECCOMP BIT(SYSCALL_WORK_SECCOMP)
+
#include <asm/thread_info.h>

#ifdef __KERNEL__
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index 5a4bb72ff28e..ef49786e5c5b 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -54,7 +54,7 @@ static long syscall_trace_enter(struct pt_regs *regs, long syscall,
}

/* Do seccomp after ptrace, to catch any tracer changes. */
- if (ti_work & _TIF_SECCOMP) {
+ if (work & _SYSCALL_WORK_SECCOMP) {
ret = __secure_computing(NULL);
if (ret == -1L)
return ret;
diff --git a/kernel/fork.c b/kernel/fork.c
index 7199d359690c..4433c9c60100 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1625,7 +1625,7 @@ static void copy_seccomp(struct task_struct *p)
* to manually enable the seccomp thread flag here.
*/
if (p->seccomp.mode != SECCOMP_MODE_DISABLED)
- set_tsk_thread_flag(p, TIF_SECCOMP);
+ set_task_syscall_work(p, SECCOMP);
#endif
}

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 8ad7a293255a..f67e92d11ad7 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -356,14 +356,14 @@ static inline void seccomp_assign_mode(struct task_struct *task,

task->seccomp.mode = seccomp_mode;
/*
- * Make sure TIF_SECCOMP cannot be set before the mode (and
+ * Make sure SYSCALL_WORK_SECCOMP cannot be set before the mode (and
* filter) is set.
*/
smp_mb__before_atomic();
/* Assume default seccomp processes want spec flaw mitigation. */
if ((flags & SECCOMP_FILTER_FLAG_SPEC_ALLOW) == 0)
arch_seccomp_spec_mitigate(task);
- set_tsk_thread_flag(task, TIF_SECCOMP);
+ set_task_syscall_work(task, SECCOMP);
}

#ifdef CONFIG_SECCOMP_FILTER
@@ -929,7 +929,7 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,

/*
* Make sure that any changes to mode from another thread have
- * been seen after TIF_SECCOMP was seen.
+ * been seen after SYSCALL_WORK_SECCOMP was seen.
*/
rmb();

--
2.29.2

2020-11-14 03:32:55

by Gabriel Krisman Bertazi

Subject: [PATCH 05/10] tracepoints: Migrate to use SYSCALL_WORK flag

For architectures that rely on the generic syscall entry code, use the
syscall_work field in struct thread_info and the specific SYSCALL_WORK
flag. This set of flags has the advantage of being architecture
independent.

Users of the flag outside of the generic entry code should rely on the
accessor macros, such that the flag is still correctly resolved for
architectures that don't use the generic entry code and still rely on
TIF flags for system call work.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
include/linux/entry-common.h | 13 +++++--------
include/linux/thread_info.h | 3 +++
include/trace/syscall.h | 6 +++---
kernel/entry/common.c | 4 ++--
kernel/trace/trace_events.c | 2 +-
kernel/tracepoint.c | 4 ++--
6 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
index f3fc4457f63f..8aba367e5c79 100644
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -17,10 +17,6 @@
# define _TIF_SYSCALL_EMU (0)
#endif

-#ifndef _TIF_SYSCALL_TRACEPOINT
-# define _TIF_SYSCALL_TRACEPOINT (0)
-#endif
-
#ifndef _TIF_SYSCALL_AUDIT
# define _TIF_SYSCALL_AUDIT (0)
#endif
@@ -46,7 +42,7 @@

#define SYSCALL_ENTER_WORK \
(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
- _TIF_SYSCALL_TRACEPOINT | _TIF_SYSCALL_EMU | \
+ _TIF_SYSCALL_EMU | \
ARCH_SYSCALL_ENTER_WORK)

/*
@@ -58,10 +54,11 @@

#define SYSCALL_EXIT_WORK \
(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
- _TIF_SYSCALL_TRACEPOINT | ARCH_SYSCALL_EXIT_WORK)
+ ARCH_SYSCALL_EXIT_WORK)

-#define SYSCALL_WORK_ENTER (SYSCALL_WORK_SECCOMP)
-#define SYSCALL_WORK_EXIT (0)
+#define SYSCALL_WORK_ENTER (SYSCALL_WORK_SECCOMP | \
+ SYSCALL_WORK_SYSCALL_TRACEPOINT)
+#define SYSCALL_WORK_EXIT (SYSCALL_WORK_SYSCALL_TRACEPOINT)

/*
* TIF flags handled in exit_to_user_mode_loop()
diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
index fb53c24fc8a6..f764314b00b9 100644
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -36,10 +36,13 @@ enum {
};

enum syscall_work_bit {
+
SYSCALL_WORK_SECCOMP = 0,
+ SYSCALL_WORK_SYSCALL_TRACEPOINT = 1,
};

#define _SYSCALL_WORK_SECCOMP BIT(SYSCALL_WORK_SECCOMP)
+#define _SYSCALL_WORK_SYSCALL_TRACEPOINT BIT(SYSCALL_WORK_SYSCALL_TRACEPOINT)

#include <asm/thread_info.h>

diff --git a/include/trace/syscall.h b/include/trace/syscall.h
index dc8ac27d27c1..8e193f3a33b3 100644
--- a/include/trace/syscall.h
+++ b/include/trace/syscall.h
@@ -37,10 +37,10 @@ struct syscall_metadata {
#if defined(CONFIG_TRACEPOINTS) && defined(CONFIG_HAVE_SYSCALL_TRACEPOINTS)
static inline void syscall_tracepoint_update(struct task_struct *p)
{
- if (test_thread_flag(TIF_SYSCALL_TRACEPOINT))
- set_tsk_thread_flag(p, TIF_SYSCALL_TRACEPOINT);
+ if (test_syscall_work(SYSCALL_TRACEPOINT))
+ set_task_syscall_work(p, SYSCALL_TRACEPOINT);
else
- clear_tsk_thread_flag(p, TIF_SYSCALL_TRACEPOINT);
+ clear_task_syscall_work(p, SYSCALL_TRACEPOINT);
}
#else
static inline void syscall_tracepoint_update(struct task_struct *p)
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index ef49786e5c5b..745b847f4ed4 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -63,7 +63,7 @@ static long syscall_trace_enter(struct pt_regs *regs, long syscall,
/* Either of the above might have changed the syscall number */
syscall = syscall_get_nr(current, regs);

- if (unlikely(ti_work & _TIF_SYSCALL_TRACEPOINT))
+ if (unlikely(work & _SYSCALL_WORK_SYSCALL_TRACEPOINT))
trace_sys_enter(regs, syscall);

syscall_enter_audit(regs, syscall);
@@ -233,7 +233,7 @@ static void syscall_exit_work(struct pt_regs *regs, unsigned long ti_work,

audit_syscall_exit(regs);

- if (ti_work & _TIF_SYSCALL_TRACEPOINT)
+ if (work & _SYSCALL_WORK_SYSCALL_TRACEPOINT)
trace_sys_exit(regs, syscall_get_return_value(current, regs));

step = report_single_step(ti_work);
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 47a71f96e5bc..950764dd226f 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -3428,7 +3428,7 @@ static __init int event_trace_enable(void)
* initialize events and perhaps start any events that are on the
* command line. Unfortunately, there are some events that will not
* start this early, like the system call tracepoints that need
- * to set the TIF_SYSCALL_TRACEPOINT flag of pid 1. But event_trace_enable()
+ * to set the SYSCALL_TRACEPOINT flag of pid 1. But event_trace_enable()
* is called before pid 1 starts, and this flag is never set, making
* the syscall tracepoint never get reached, but the event is enabled
* regardless (and not doing anything).
diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index 3f659f855074..7261fa0f5e3c 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -594,7 +594,7 @@ int syscall_regfunc(void)
if (!sys_tracepoint_refcount) {
read_lock(&tasklist_lock);
for_each_process_thread(p, t) {
- set_tsk_thread_flag(t, TIF_SYSCALL_TRACEPOINT);
+ set_task_syscall_work(t, SYSCALL_TRACEPOINT);
}
read_unlock(&tasklist_lock);
}
@@ -611,7 +611,7 @@ void syscall_unregfunc(void)
if (!sys_tracepoint_refcount) {
read_lock(&tasklist_lock);
for_each_process_thread(p, t) {
- clear_tsk_thread_flag(t, TIF_SYSCALL_TRACEPOINT);
+ clear_task_syscall_work(t, SYSCALL_TRACEPOINT);
}
read_unlock(&tasklist_lock);
}
--
2.29.2

2020-11-14 03:33:23

by Gabriel Krisman Bertazi

Subject: [PATCH 09/10] kernel: entry: Drop usage of TIF flags in the generic syscall code

Now that the flags migration in the common syscall entry is complete and
the code relies exclusively on syscall_work, clean up the
accesses to TI flags in that path.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
include/linux/entry-common.h | 20 +++++++++-----------
kernel/entry/common.c | 17 +++++++----------
2 files changed, 16 insertions(+), 21 deletions(-)

diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
index afeb927e8545..cffd8bf1e085 100644
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -26,31 +26,29 @@
#endif

/*
- * TIF flags handled in syscall_enter_from_user_mode()
+ * SYSCALL_WORK flags handled in syscall_enter_from_user_mode()
*/
-#ifndef ARCH_SYSCALL_ENTER_WORK
-# define ARCH_SYSCALL_ENTER_WORK (0)
+#ifndef ARCH_SYSCALL_WORK_ENTER
+# define ARCH_SYSCALL_WORK_ENTER (0)
#endif

-#define SYSCALL_ENTER_WORK ARCH_SYSCALL_ENTER_WORK
-
/*
* TIF flags handled in syscall_exit_to_user_mode()
*/
-#ifndef ARCH_SYSCALL_EXIT_WORK
-# define ARCH_SYSCALL_EXIT_WORK (0)
+#ifndef ARCH_SYSCALL_WORK_EXIT
+# define ARCH_SYSCALL_WORK_EXIT (0)
#endif

-#define SYSCALL_EXIT_WORK ARCH_SYSCALL_EXIT_WORK
-
#define SYSCALL_WORK_ENTER (SYSCALL_WORK_SECCOMP | \
SYSCALL_WORK_SYSCALL_TRACEPOINT | \
SYSCALL_WORK_SYSCALL_TRACE | \
SYSCALL_WORK_SYSCALL_EMU | \
- SYSCALL_WORK_SYSCALL_AUDIT)
+ SYSCALL_WORK_SYSCALL_AUDIT | \
+ ARCH_SYSCALL_WORK_ENTER)
#define SYSCALL_WORK_EXIT (SYSCALL_WORK_SYSCALL_TRACEPOINT | \
SYSCALL_WORK_SYSCALL_TRACE | \
- SYSCALL_WORK_SYSCALL_AUDIT)
+ SYSCALL_WORK_SYSCALL_AUDIT | \
+ ARCH_SYSCALL_WORK_EXIT)

/*
* TIF flags handled in exit_to_user_mode_loop()
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index 0170a4ae58f8..0ddc590bfe73 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -42,7 +42,7 @@ static inline void syscall_enter_audit(struct pt_regs *regs, long syscall)
}

static long syscall_trace_enter(struct pt_regs *regs, long syscall,
- unsigned long ti_work, unsigned long work)
+ unsigned long work)
{
long ret = 0;

@@ -74,12 +74,10 @@ static long syscall_trace_enter(struct pt_regs *regs, long syscall,
static __always_inline long
__syscall_enter_from_user_work(struct pt_regs *regs, long syscall)
{
- unsigned long ti_work;
unsigned long work = READ_ONCE(current_thread_info()->syscall_work);

- ti_work = READ_ONCE(current_thread_info()->flags);
- if (work & SYSCALL_WORK_ENTER || ti_work & SYSCALL_ENTER_WORK)
- syscall = syscall_trace_enter(regs, syscall, ti_work, work);
+ if (work & SYSCALL_WORK_ENTER)
+ syscall = syscall_trace_enter(regs, syscall, work);

return syscall;
}
@@ -227,8 +225,8 @@ static inline bool report_single_step(unsigned long work)
}
#endif

-static void syscall_exit_work(struct pt_regs *regs, unsigned long ti_work,
- unsigned long work)
+
+static void syscall_exit_work(struct pt_regs *regs, unsigned long work)
{
bool step;

@@ -248,7 +246,6 @@ static void syscall_exit_work(struct pt_regs *regs, unsigned long ti_work,
*/
static void syscall_exit_to_user_mode_prepare(struct pt_regs *regs)
{
- u32 cached_flags = READ_ONCE(current_thread_info()->flags);
unsigned long work = READ_ONCE(current_thread_info()->syscall_work);
unsigned long nr = syscall_get_nr(current, regs);

@@ -266,8 +263,8 @@ static void syscall_exit_to_user_mode_prepare(struct pt_regs *regs)
* enabled, we want to run them exactly once per syscall exit with
* interrupts enabled.
*/
- if (unlikely(work & SYSCALL_WORK_EXIT || cached_flags & SYSCALL_EXIT_WORK))
- syscall_exit_work(regs, cached_flags, work);
+ if (unlikely(work & SYSCALL_WORK_EXIT))
+ syscall_exit_work(regs, work);
}

__visible noinstr void syscall_exit_to_user_mode(struct pt_regs *regs)
--
2.29.2
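
A minimal userspace sketch, not part of the posted series, of what the
entry path reduces to once the architecture hook is folded into the
generic mask: one word is read, one mask is tested, and the slow path
runs only if something is pending. All DEMO_/demo_ names and the bit
values are hypothetical stand-ins; the real masks derive from enum
syscall_work_bit.

```c
#include <assert.h>

/* Hypothetical mask values, chosen only for this demo. */
#define DEMO_WORK_SECCOMP	(1UL << 0)
#define DEMO_WORK_TRACEPOINT	(1UL << 1)
#define DEMO_WORK_AUDIT		(1UL << 4)

/* An architecture may predefine this; by default it adds no extra work bits. */
#ifndef DEMO_ARCH_WORK_ENTER
# define DEMO_ARCH_WORK_ENTER	(0)
#endif

#define DEMO_WORK_ENTER	(DEMO_WORK_SECCOMP | DEMO_WORK_TRACEPOINT | \
			 DEMO_WORK_AUDIT | DEMO_ARCH_WORK_ENTER)

/* Counts how often the slow path ran. */
static int demo_slow_path_calls;

/* Stand-in for syscall_trace_enter(): only records that it was called. */
static long demo_trace_enter(long syscall, unsigned long work)
{
	assert(work & DEMO_WORK_ENTER);	/* reached only with pending work */
	demo_slow_path_calls++;
	return syscall;
}

/* Fast path: a single mask test decides whether the slow path runs. */
static long demo_enter_from_user(long syscall, unsigned long work)
{
	if (work & DEMO_WORK_ENTER)
		syscall = demo_trace_enter(syscall, work);
	return syscall;
}
```

Bits outside DEMO_WORK_ENTER never reach the slow path, which is why
the second TI-flags test could be dropped in this patch.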

2020-11-14 03:33:29

by Gabriel Krisman Bertazi

Subject: [PATCH 10/10] x86: Reclaim unused x86 TI flags

Reclaim TI flags that were migrated to syscall_work flags.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
arch/x86/include/asm/thread_info.h | 10 ----------
1 file changed, 10 deletions(-)

diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index b217f63e73b7..33b637442b9e 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -75,15 +75,11 @@ struct thread_info {
* - these are process state flags that various assembly files
* may need to access
*/
-#define TIF_SYSCALL_TRACE 0 /* syscall trace active */
#define TIF_NOTIFY_RESUME 1 /* callback before returning to user */
#define TIF_SIGPENDING 2 /* signal pending */
#define TIF_NEED_RESCHED 3 /* rescheduling necessary */
#define TIF_SINGLESTEP 4 /* reenable singlestep on user return*/
#define TIF_SSBD 5 /* Speculative store bypass disable */
-#define TIF_SYSCALL_EMU 6 /* syscall emulation active */
-#define TIF_SYSCALL_AUDIT 7 /* syscall auditing active */
-#define TIF_SECCOMP 8 /* secure computing */
#define TIF_SPEC_IB 9 /* Indirect branch speculation mitigation */
#define TIF_SPEC_L1D_FLUSH 10 /* Flush L1D on mm switches (processes) */
#define TIF_USER_RETURN_NOTIFY 11 /* notify kernel of userspace return */
@@ -101,18 +97,13 @@ struct thread_info {
#define TIF_FORCED_TF 24 /* true if TF in eflags artificially */
#define TIF_BLOCKSTEP 25 /* set when we want DEBUGCTLMSR_BTF */
#define TIF_LAZY_MMU_UPDATES 27 /* task is updating the mmu lazily */
-#define TIF_SYSCALL_TRACEPOINT 28 /* syscall tracepoint instrumentation */
#define TIF_ADDR32 29 /* 32-bit address space on 64 bits */

-#define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE)
#define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME)
#define _TIF_SIGPENDING (1 << TIF_SIGPENDING)
#define _TIF_NEED_RESCHED (1 << TIF_NEED_RESCHED)
#define _TIF_SINGLESTEP (1 << TIF_SINGLESTEP)
#define _TIF_SSBD (1 << TIF_SSBD)
-#define _TIF_SYSCALL_EMU (1 << TIF_SYSCALL_EMU)
-#define _TIF_SYSCALL_AUDIT (1 << TIF_SYSCALL_AUDIT)
-#define _TIF_SECCOMP (1 << TIF_SECCOMP)
#define _TIF_SPEC_IB (1 << TIF_SPEC_IB)
#define _TIF_SPEC_L1D_FLUSH (1 << TIF_SPEC_L1D_FLUSH)
#define _TIF_USER_RETURN_NOTIFY (1 << TIF_USER_RETURN_NOTIFY)
@@ -129,7 +120,6 @@ struct thread_info {
#define _TIF_FORCED_TF (1 << TIF_FORCED_TF)
#define _TIF_BLOCKSTEP (1 << TIF_BLOCKSTEP)
#define _TIF_LAZY_MMU_UPDATES (1 << TIF_LAZY_MMU_UPDATES)
-#define _TIF_SYSCALL_TRACEPOINT (1 << TIF_SYSCALL_TRACEPOINT)
#define _TIF_ADDR32 (1 << TIF_ADDR32)

/* flags to check in __switch_to() */
--
2.29.2

2020-11-14 03:33:40

by Gabriel Krisman Bertazi

Subject: [PATCH 02/10] kernel: entry: Expose helpers to migrate TIF to SYSCALL_WORK flags

To split the syscall-work flags into a separate, architecture-independent
field, expose transitional helpers that resolve to either the TIF flags
or the corresponding SYSCALL_WORK flags. This lets architectures migrate
only when they are ported to the generic syscall entry code.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
include/linux/thread_info.h | 42 +++++++++++++++++++++++++++++++++++++
1 file changed, 42 insertions(+)

diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
index e93e249a4e9b..18755373dc4d 100644
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -97,6 +97,48 @@ static inline int test_ti_thread_flag(struct thread_info *ti, int flag)
#define test_thread_flag(flag) \
test_ti_thread_flag(current_thread_info(), flag)

+#ifdef CONFIG_GENERIC_ENTRY
+static inline void __set_task_syscall_work(struct thread_info *ti, int flag)
+{
+ set_bit(flag, (unsigned long *)&ti->syscall_work);
+}
+static inline int __test_task_syscall_work(struct thread_info *ti, int flag)
+{
+ return test_bit(flag, (unsigned long *)&ti->syscall_work);
+}
+static inline void __clear_task_syscall_work(struct thread_info *ti, int flag)
+{
+ return clear_bit(flag, (unsigned long *)&ti->syscall_work);
+}
+#define set_syscall_work(fl) \
+ __set_task_syscall_work(current_thread_info(), SYSCALL_WORK_##fl)
+#define test_syscall_work(fl) \
+ __test_task_syscall_work(current_thread_info(), SYSCALL_WORK_##fl)
+#define clear_syscall_work(fl) \
+ __clear_task_syscall_work(current_thread_info(), SYSCALL_WORK_##fl)
+
+#define set_task_syscall_work(t, fl) \
+ __set_task_syscall_work(task_thread_info(t), SYSCALL_WORK_##fl)
+#define test_task_syscall_work(t, fl) \
+ __test_task_syscall_work(task_thread_info(t), SYSCALL_WORK_##fl)
+#define clear_task_syscall_work(t, fl) \
+ __clear_task_syscall_work(task_thread_info(t), SYSCALL_WORK_##fl)
+#else
+#define set_syscall_work(fl) \
+ set_ti_thread_flag(current_thread_info(), TIF_##fl)
+#define test_syscall_work(fl) \
+ test_ti_thread_flag(current_thread_info(), TIF_##fl)
+#define clear_syscall_work(fl) \
+ clear_ti_thread_flag(current_thread_info(), TIF_##fl)
+
+#define set_task_syscall_work(t, fl) \
+ set_ti_thread_flag(task_thread_info(t), TIF_##fl)
+#define test_task_syscall_work(t, fl) \
+ test_ti_thread_flag(task_thread_info(t), TIF_##fl)
+#define clear_task_syscall_work(t, fl) \
+ clear_ti_thread_flag(task_thread_info(t), TIF_##fl)
+#endif /* CONFIG_GENERIC_ENTRY */
+
#define tif_need_resched() test_thread_flag(TIF_NEED_RESCHED)

#ifndef CONFIG_HAVE_ARCH_WITHIN_STACK_FRAMES
--
2.29.2
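
A minimal userspace sketch, not part of the posted patch, of how
accessors like these can resolve to one flagset or the other through
token pasting. Callers name the feature once (SYSCALL_AUDIT) and the
configuration picks the prefix and the backing word. The demo_ names,
DEMO_GENERIC_ENTRY and the bit numbers are hypothetical, and the
operations are plain read-modify-write rather than the atomic
set_bit()/clear_bit() the kernel uses.

```c
#include <assert.h>
#include <stddef.h>

/* Comment this out to model an architecture not yet on the generic entry code. */
#define DEMO_GENERIC_ENTRY 1

/* Hypothetical bit numbers, chosen only for this demo. */
enum { SYSCALL_WORK_SYSCALL_AUDIT = 4 };	/* bit in syscall_work */
enum { TIF_SYSCALL_AUDIT = 7 };			/* bit in the arch TI flags */

struct demo_thread_info {
	unsigned long flags;		/* TIF_* bits */
	unsigned long syscall_work;	/* SYSCALL_WORK_* bits */
};

#ifdef DEMO_GENERIC_ENTRY
# define DEMO_WORD(ti)		((ti)->syscall_work)
# define DEMO_BITNR(fl)		SYSCALL_WORK_##fl
#else
# define DEMO_WORD(ti)		((ti)->flags)
# define DEMO_BITNR(fl)		TIF_##fl
#endif

/* The prefix is pasted onto the feature name per configuration. */
#define demo_set_work(ti, fl)	((void)(DEMO_WORD(ti) |= 1UL << DEMO_BITNR(fl)))
#define demo_clear_work(ti, fl)	((void)(DEMO_WORD(ti) &= ~(1UL << DEMO_BITNR(fl))))
#define demo_test_work(ti, fl)	((int)((DEMO_WORD(ti) >> DEMO_BITNR(fl)) & 1UL))

/* Returns the word the accessors operate on in this configuration. */
static inline unsigned long demo_backing_word(const struct demo_thread_info *ti)
{
	assert(ti != NULL);
	return DEMO_WORD(ti);
}
```

With DEMO_GENERIC_ENTRY defined, demo_set_work(&ti, SYSCALL_AUDIT)
touches only ti.syscall_work; without it, the same call site sets the
TIF bit in ti.flags instead.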

2020-11-14 03:34:08

by Gabriel Krisman Bertazi

Subject: [PATCH 08/10] audit: Migrate to use SYSCALL_WORK flag

For architectures that rely on the generic syscall entry code, use the
syscall_work field in struct thread_info and the specific SYSCALL_WORK
flag. This set of flags has the advantage of being architecture
independent.

Users of the flag outside of the generic entry code should rely on the
accessor macros, such that the flag is still correctly resolved for
architectures that don't use the generic entry code and still rely on
TIF flags for system call work.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
include/asm-generic/syscall.h | 14 +++++++-------
include/linux/entry-common.h | 18 ++++++------------
include/linux/thread_info.h | 2 ++
kernel/auditsc.c | 4 ++--
4 files changed, 17 insertions(+), 21 deletions(-)

diff --git a/include/asm-generic/syscall.h b/include/asm-generic/syscall.h
index 5042d1ba4bc5..66ada3b099eb 100644
--- a/include/asm-generic/syscall.h
+++ b/include/asm-generic/syscall.h
@@ -43,7 +43,7 @@ int syscall_get_nr(struct task_struct *task, struct pt_regs *regs);
* @regs: task_pt_regs() of @task
*
* It's only valid to call this when @task is stopped for system
- * call exit tracing (due to SYSCALL_TRACE or TIF_SYSCALL_AUDIT),
+ * call exit tracing (due to SYSCALL_TRACE or SYSCALL_AUDIT),
* after tracehook_report_syscall_entry() returned nonzero to prevent
* the system call from taking place.
*
@@ -63,7 +63,7 @@ void syscall_rollback(struct task_struct *task, struct pt_regs *regs);
* Returns 0 if the system call succeeded, or -ERRORCODE if it failed.
*
* It's only valid to call this when @task is stopped for tracing on exit
- * from a system call, due to %SYSCALL_TRACE or %TIF_SYSCALL_AUDIT.
+ * from a system call, due to %SYSCALL_TRACE or %SYSCALL_AUDIT.
*/
long syscall_get_error(struct task_struct *task, struct pt_regs *regs);

@@ -76,7 +76,7 @@ long syscall_get_error(struct task_struct *task, struct pt_regs *regs);
* This value is meaningless if syscall_get_error() returned nonzero.
*
* It's only valid to call this when @task is stopped for tracing on exit
- * from a system call, due to %SYSCALL_TRACE or %TIF_SYSCALL_AUDIT.
+ * from a system call, due to %SYSCALL_TRACE or %SYSCALL_AUDIT.
*/
long syscall_get_return_value(struct task_struct *task, struct pt_regs *regs);

@@ -93,7 +93,7 @@ long syscall_get_return_value(struct task_struct *task, struct pt_regs *regs);
* code; the user sees a failed system call with this errno code.
*
* It's only valid to call this when @task is stopped for tracing on exit
- * from a system call, due to %SYSCALL_TRACE or %TIF_SYSCALL_AUDIT.
+ * from a system call, due to %SYSCALL_TRACE or %SYSCALL_AUDIT.
*/
void syscall_set_return_value(struct task_struct *task, struct pt_regs *regs,
int error, long val);
@@ -108,7 +108,7 @@ void syscall_set_return_value(struct task_struct *task, struct pt_regs *regs,
* @args[0], and so on.
*
* It's only valid to call this when @task is stopped for tracing on
- * entry to a system call, due to %SYSCALL_TRACE or %TIF_SYSCALL_AUDIT.
+ * entry to a system call, due to %SYSCALL_TRACE or %SYSCALL_AUDIT.
*/
void syscall_get_arguments(struct task_struct *task, struct pt_regs *regs,
unsigned long *args);
@@ -123,7 +123,7 @@ void syscall_get_arguments(struct task_struct *task, struct pt_regs *regs,
* The first argument gets value @args[0], and so on.
*
* It's only valid to call this when @task is stopped for tracing on
- * entry to a system call, due to %SYSCALL_TRACE or %TIF_SYSCALL_AUDIT.
+ * entry to a system call, due to %SYSCALL_TRACE or %SYSCALL_AUDIT.
*/
void syscall_set_arguments(struct task_struct *task, struct pt_regs *regs,
const unsigned long *args);
@@ -135,7 +135,7 @@ void syscall_set_arguments(struct task_struct *task, struct pt_regs *regs,
* Returns the AUDIT_ARCH_* based on the system call convention in use.
*
* It's only valid to call this when @task is stopped on entry to a system
- * call, due to %SYSCALL_TRACE, %TIF_SYSCALL_AUDIT, or %TIF_SECCOMP.
+ * call, due to %SYSCALL_TRACE, %SYSCALL_AUDIT, or %TIF_SECCOMP.
*
* Architectures which permit CONFIG_HAVE_ARCH_SECCOMP_FILTER must
* provide an implementation of this.
diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
index 39d56558818d..afeb927e8545 100644
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -13,10 +13,6 @@
* Define dummy _TIF work flags if not defined by the architecture or for
* disabled functionality.
*/
-#ifndef _TIF_SYSCALL_AUDIT
-# define _TIF_SYSCALL_AUDIT (0)
-#endif
-
#ifndef _TIF_PATCH_PENDING
# define _TIF_PATCH_PENDING (0)
#endif
@@ -36,9 +32,7 @@
# define ARCH_SYSCALL_ENTER_WORK (0)
#endif

-#define SYSCALL_ENTER_WORK \
- (_TIF_SYSCALL_AUDIT | \
- ARCH_SYSCALL_ENTER_WORK)
+#define SYSCALL_ENTER_WORK ARCH_SYSCALL_ENTER_WORK

/*
* TIF flags handled in syscall_exit_to_user_mode()
@@ -47,16 +41,16 @@
# define ARCH_SYSCALL_EXIT_WORK (0)
#endif

-#define SYSCALL_EXIT_WORK \
- (_TIF_SYSCALL_AUDIT | \
- ARCH_SYSCALL_EXIT_WORK)
+#define SYSCALL_EXIT_WORK ARCH_SYSCALL_EXIT_WORK

#define SYSCALL_WORK_ENTER (SYSCALL_WORK_SECCOMP | \
SYSCALL_WORK_SYSCALL_TRACEPOINT | \
SYSCALL_WORK_SYSCALL_TRACE | \
- SYSCALL_WORK_SYSCALL_EMU)
+ SYSCALL_WORK_SYSCALL_EMU | \
+ SYSCALL_WORK_SYSCALL_AUDIT)
#define SYSCALL_WORK_EXIT (SYSCALL_WORK_SYSCALL_TRACEPOINT | \
- SYSCALL_WORK_SYSCALL_TRACE)
+ SYSCALL_WORK_SYSCALL_TRACE | \
+ SYSCALL_WORK_SYSCALL_AUDIT)

/*
* TIF flags handled in exit_to_user_mode_loop()
diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
index 3c7dedadf94d..3fb475583af0 100644
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -41,12 +41,14 @@ enum syscall_work_bit {
SYSCALL_WORK_SYSCALL_TRACEPOINT = 1,
SYSCALL_WORK_SYSCALL_TRACE = 2,
SYSCALL_WORK_SYSCALL_EMU = 3,
+ SYSCALL_WORK_SYSCALL_AUDIT = 4,
};

#define _SYSCALL_WORK_SECCOMP BIT(SYSCALL_WORK_SECCOMP)
#define _SYSCALL_WORK_SYSCALL_TRACEPOINT BIT(SYSCALL_WORK_SYSCALL_TRACEPOINT)
#define _SYSCALL_WORK_SYSCALL_TRACE BIT(SYSCALL_WORK_SYSCALL_TRACE)
#define _SYSCALL_WORK_SYSCALL_EMU BIT(SYSCALL_WORK_SYSCALL_EMU)
+#define _SYSCALL_WORK_SYSCALL_AUDIT BIT(SYSCALL_WORK_SYSCALL_AUDIT)

#include <asm/thread_info.h>

diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 8dba8f0983b5..c00aa5837965 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -952,7 +952,7 @@ int audit_alloc(struct task_struct *tsk)

state = audit_filter_task(tsk, &key);
if (state == AUDIT_DISABLED) {
- clear_tsk_thread_flag(tsk, TIF_SYSCALL_AUDIT);
+ clear_task_syscall_work(tsk, SYSCALL_AUDIT);
return 0;
}

@@ -964,7 +964,7 @@ int audit_alloc(struct task_struct *tsk)
context->filterkey = key;

audit_set_context(tsk, context);
- set_tsk_thread_flag(tsk, TIF_SYSCALL_AUDIT);
+ set_task_syscall_work(tsk, SYSCALL_AUDIT);
return 0;
}

--
2.29.2

2020-11-14 11:26:22

by Christoph Hellwig

Subject: Re: [PATCH 02/10] kernel: entry: Expose helpers to migrate TIF to SYSCALL_WORK flags

> +#ifdef CONFIG_GENERIC_ENTRY
> +static inline void __set_task_syscall_work(struct thread_info *ti, int flag)
> +{
> + set_bit(flag, (unsigned long *)&ti->syscall_work);
> +}
> +static inline int __test_task_syscall_work(struct thread_info *ti, int flag)
> +{
> + return test_bit(flag, (unsigned long *)&ti->syscall_work);
> +}
> +static inline void __clear_task_syscall_work(struct thread_info *ti, int flag)
> +{
> + return clear_bit(flag, (unsigned long *)&ti->syscall_work);

The casts here look bogus.

2020-11-15 18:39:35

by Thomas Gleixner

Subject: Re: [PATCH 04/10] seccomp: Migrate to use SYSCALL_WORK flag

On Fri, Nov 13 2020 at 22:29, Gabriel Krisman Bertazi wrote:
>
> +enum syscall_work_bit {
> + SYSCALL_WORK_SECCOMP = 0,

enums start at 0, so why do you need an explicit assignment?

> +};
> +
> +#define _SYSCALL_WORK_SECCOMP BIT(SYSCALL_WORK_SECCOMP)

Do we really have to repeat the nonsense from TIF/_TIF in the naming
here? Can we please name this in a way which makes it obvious what is
what?

Thanks,

tglx
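
A minimal sketch of both review points, assuming one possible naming
scheme: the enumerators carry no explicit values because C numbers
them from 0, and the bit numbers are marked with _BIT_ so the plain
name can be the mask. The DEMO_/demo_ names are hypothetical and are
not the names used in the series.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit numbers: no explicit values, the compiler counts from 0. */
enum demo_work_bit {
	DEMO_WORK_BIT_SECCOMP,
	DEMO_WORK_BIT_TRACEPOINT,
	DEMO_WORK_BIT_TRACE,
	DEMO_WORK_BIT_COUNT,	/* number of bits defined above */
};

/* Masks: the plain name is the mask, the _BIT_ name is the bit number. */
#define DEMO_BIT(nr)		(1UL << (nr))
#define DEMO_WORK_SECCOMP	DEMO_BIT(DEMO_WORK_BIT_SECCOMP)
#define DEMO_WORK_TRACEPOINT	DEMO_BIT(DEMO_WORK_BIT_TRACEPOINT)
#define DEMO_WORK_TRACE		DEMO_BIT(DEMO_WORK_BIT_TRACE)

/* True if the given bit number is set in the work word. */
static inline bool demo_work_pending(unsigned long work, enum demo_work_bit bit)
{
	assert(bit < DEMO_WORK_BIT_COUNT);
	return work & DEMO_BIT(bit);
}
```

Adding a new kind of work is then one new enumerator plus one mask
define, with no bit numbers to keep in sync by hand.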

2020-11-15 18:40:24

by Thomas Gleixner

Subject: Re: [PATCH 02/10] kernel: entry: Expose helpers to migrate TIF to SYSCALL_WORK flags

On Sat, Nov 14 2020 at 11:22, Christoph Hellwig wrote:
>> +#ifdef CONFIG_GENERIC_ENTRY
>> +static inline void __set_task_syscall_work(struct thread_info *ti, int flag)
>> +{
>> + set_bit(flag, (unsigned long *)&ti->syscall_work);
>> +}
>> +static inline int __test_task_syscall_work(struct thread_info *ti, int flag)
>> +{
>> + return test_bit(flag, (unsigned long *)&ti->syscall_work);
>> +}
>> +static inline void __clear_task_syscall_work(struct thread_info *ti, int flag)
>> +{
>> + return clear_bit(flag, (unsigned long *)&ti->syscall_work);
>
> The casts here look bogus.

Making sure that &(unsigned long) results in a pointer to unsigned long
is indeed silly.

Thanks,

tglx
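
A minimal userspace sketch of the helpers without the casts, assuming
(as the review comment implies) that syscall_work is declared as
unsigned long. The demo_ functions are hypothetical, non-atomic
stand-ins for set_bit(), clear_bit() and test_bit(), limited to a
single word.

```c
#include <assert.h>
#include <limits.h>

#define DEMO_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)

struct demo_thread_info {
	unsigned long syscall_work;
};

/* Non-atomic single-word stand-ins for the kernel bitops. */
static inline void demo_set_bit(unsigned int nr, unsigned long *addr)
{
	assert(nr < DEMO_BITS_PER_LONG);
	*addr |= 1UL << nr;
}

static inline void demo_clear_bit(unsigned int nr, unsigned long *addr)
{
	assert(nr < DEMO_BITS_PER_LONG);
	*addr &= ~(1UL << nr);
}

static inline int demo_test_bit(unsigned int nr, const unsigned long *addr)
{
	assert(nr < DEMO_BITS_PER_LONG);
	return (*addr >> nr) & 1UL;
}

/* &ti->syscall_work already has type unsigned long *, so no cast is needed. */
static inline void demo_set_work(struct demo_thread_info *ti, unsigned int flag)
{
	demo_set_bit(flag, &ti->syscall_work);
}

static inline void demo_clear_work(struct demo_thread_info *ti, unsigned int flag)
{
	demo_clear_bit(flag, &ti->syscall_work);	/* void: plain call, no return */
}

static inline int demo_test_work(const struct demo_thread_info *ti, unsigned int flag)
{
	return demo_test_bit(flag, &ti->syscall_work);
}
```

Dropping the cast also lets the compiler complain if the field type
ever changes, which the cast would have hidden.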

2020-11-15 18:55:23

by Thomas Gleixner

Subject: Re: [PATCH 03/10] kernel: entry: Wire up syscall_work in common entry code

On Fri, Nov 13 2020 at 22:29, Gabriel Krisman Bertazi wrote:

"kernel: entry:" is not the right subsystem prefix.

git log kernel/entry/ might give you a hint.

> Prepares the common entry code to use the SYSCALL_WORK flags. They
> will

s/Prepares/Prepare/

> be defined in subsequent patches for each type of syscall
> work. SYSCALL_WORK_ENTRY/EXIT are defined for the transition, as they
> will replace the TIF_ equivalent defines.
>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
> ---
> include/linux/entry-common.h | 3 +++
> kernel/entry/common.c | 15 +++++++++------
> 2 files changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
> index 1a128baf3628..cbc5c702ee4d 100644
> --- a/include/linux/entry-common.h
> +++ b/include/linux/entry-common.h
> @@ -64,6 +64,9 @@
> (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
> _TIF_SYSCALL_TRACEPOINT | ARCH_SYSCALL_EXIT_WORK)
>
> +#define SYSCALL_WORK_ENTER (0)
> +#define SYSCALL_WORK_EXIT (0)
> +
> /*
> * TIF flags handled in exit_to_user_mode_loop()
> */
> diff --git a/kernel/entry/common.c b/kernel/entry/common.c
> index bc75c114c1b3..5a4bb72ff28e 100644
> --- a/kernel/entry/common.c
> +++ b/kernel/entry/common.c
> @@ -42,7 +42,7 @@ static inline void syscall_enter_audit(struct pt_regs *regs, long syscall)
> }
>
> static long syscall_trace_enter(struct pt_regs *regs, long syscall,
> - unsigned long ti_work)
> + unsigned long ti_work, unsigned long work)
> {
> long ret = 0;
>
> @@ -75,10 +75,11 @@ static __always_inline long
> __syscall_enter_from_user_work(struct pt_regs *regs, long syscall)
> {
> unsigned long ti_work;
> + unsigned long work = READ_ONCE(current_thread_info()->syscall_work);

Even if this is temporary, this code uses reverse fir tree ordering of
variable declarations:

unsigned long work = READ_ONCE(current_thread_info()->syscall_work);
unsigned long ti_work;

Thanks,

tglx

2020-11-15 18:55:52

by Thomas Gleixner

Subject: Re: [PATCH 05/10] tracepoints: Migrate to use SYSCALL_WORK flag

On Fri, Nov 13 2020 at 22:29, Gabriel Krisman Bertazi wrote:
>
> enum syscall_work_bit {
> +
> SYSCALL_WORK_SECCOMP = 0,
> + SYSCALL_WORK_SYSCALL_TRACEPOINT = 1,

No assignment required. Enums just do that for you.

Thanks,

tglx

2020-11-15 18:55:52

by Thomas Gleixner

Subject: Re: [PATCH 00/10] Migrate syscall entry/exit work to SYSCALL_WORK flagset

On Fri, Nov 13 2020 at 22:29, Gabriel Krisman Bertazi wrote:
> This a refactor work moving the work done by features like seccomp,
> ptrace, audit and tracepoints out of the TI flags. The reasons are:
>
> 1) Scarcity of TI flags in x86 32-bit.
>
> 2) TI flags are defined by the architecture, while these features are
> arch-independent.
>
> 3) Community resistance in merging new architecture-independent
> features as TI flags.
>
> The design exposes a new field in struct thread_info that is read at
> syscall_trace_enter and syscall_work_exit in place of the ti flags.
> No functional changes is expected from this patchset. The design and
> organization of this patchset achieves the following goals:

Aside from the few nitpicks, this looks good. Thanks for doing this!

tglx