On Tue, 26 Jan 2021 19:54:12 +0100 Piotr Figiel <[email protected]> wrote:
> For userspace checkpoint and restore (C/R) some way of getting process
> state containing RSEQ configuration is needed.
>
> There are two ways this information is going to be used:
> - to re-enable RSEQ for threads which had it enabled before C/R
> - to detect if a thread was in a critical section during C/R
>
> Since C/R preserves TLS memory and addresses RSEQ ABI will be restored
> using the address registered before C/R.
>
> Detection whether the thread is in a critical section during C/R is
> needed to enforce behavior of RSEQ abort during C/R. Attaching with
> ptrace() before registers are dumped itself doesn't cause RSEQ abort.
> Restoring the instruction pointer within the critical section is
> problematic because rseq_cs may get cleared before the control is
> passed to the migrated application code leading to RSEQ invariants not
> being preserved.
>
> To achieve above goals expose the RSEQ structure address and the
> signature value with the new per-thread procfs file "rseq".
Using "/proc/<pid>/rseq" would be more informative.
> fs/exec.c | 2 ++
> fs/proc/base.c | 22 ++++++++++++++++++++++
> kernel/rseq.c | 4 ++++
A Documentation/ update would be appropriate.
> 3 files changed, 28 insertions(+)
>
> diff --git a/fs/exec.c b/fs/exec.c
> index 5d4d52039105..5d84f98847f1 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1830,7 +1830,9 @@ static int bprm_execve(struct linux_binprm *bprm,
> /* execve succeeded */
> current->fs->in_exec = 0;
> current->in_execve = 0;
> + task_lock(current);
> rseq_execve(current);
> + task_unlock(current);
There's a comment over the task_lock() implementation which explains
what things it locks. An update to that would be helpful.
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -662,6 +662,22 @@ static int proc_pid_syscall(struct seq_file *m, struct pid_namespace *ns,
>
> return 0;
> }
> +
> +#ifdef CONFIG_RSEQ
> +static int proc_pid_rseq(struct seq_file *m, struct pid_namespace *ns,
> + struct pid *pid, struct task_struct *task)
> +{
> + int res = lock_trace(task);
> +
> + if (res)
> + return res;
> + task_lock(task);
> + seq_printf(m, "%px %08x\n", task->rseq, task->rseq_sig);
> + task_unlock(task);
> + unlock_trace(task);
> + return 0;
> +}
Do we actually need task_lock() for this purpose? Would
exec_update_lock() alone be adequate and appropriate?
On Tue, Jan 26, 2021 at 11:25:47AM -0800, Andrew Morton wrote:
> On Tue, 26 Jan 2021 19:54:12 +0100 Piotr Figiel <[email protected]> wrote:
> > To achieve above goals expose the RSEQ structure address and the
> > signature value with the new per-thread procfs file "rseq".
> Using "/proc/<pid>/rseq" would be more informative.
>
> > fs/exec.c | 2 ++
> > fs/proc/base.c | 22 ++++++++++++++++++++++
> > kernel/rseq.c | 4 ++++
>
> A Documentation/ update would be appropriate.
>
> > + task_lock(current);
> > rseq_execve(current);
> > + task_unlock(current);
>
> There's a comment over the task_lock() implementation which explains
> what things it locks. An update to that would be helpful.
Agreed, I'll include fixes for the above comments in v4.
> > --- a/fs/proc/base.c
> > +++ b/fs/proc/base.c
> > @@ -662,6 +662,22 @@ static int proc_pid_syscall(struct seq_file *m, struct pid_namespace *ns,
> >
> > return 0;
> > }
> > +
> > +#ifdef CONFIG_RSEQ
> > +static int proc_pid_rseq(struct seq_file *m, struct pid_namespace *ns,
> > + struct pid *pid, struct task_struct *task)
> > +{
> > + int res = lock_trace(task);
> > +
> > + if (res)
> > + return res;
> > + task_lock(task);
> > + seq_printf(m, "%px %08x\n", task->rseq, task->rseq_sig);
> > + task_unlock(task);
> > + unlock_trace(task);
> > + return 0;
> > +}
>
> Do we actually need task_lock() for this purpose? Would
> exec_update_lock() alone be adequate and appropriate?
Currently the rseq syscall, which modifies those fields, isn't
synchronised with exec_update_lock. So either a new lock or task_lock()
could be used, or exec_update_lock could be reused in the syscall. I
decided against reusing exec_update_lock in the syscall because it's
normally used to guard access checks against concurrent setuid exec.
Reusing it could be confusing as it's not relevant to the rseq syscall
code.
I think the task_lock() usage here is also consistent with how it's used
across the kernel.
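For illustration only (this is not part of the patch): a userspace
analogy of the locking argument above, in which a pthread mutex stands
in for task_lock(). The struct and function names are hypothetical. The
writer updates both fields under the lock and the reader copies both
under the same lock, so a reader never sees a new address paired with an
old signature.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for the two per-task fields discussed above. */
struct rseq_reg {
	pthread_mutex_t lock;	/* plays the role of task_lock() */
	uintptr_t addr;		/* stand-in for task->rseq */
	uint32_t sig;		/* stand-in for task->rseq_sig */
};

static void rseq_reg_init(struct rseq_reg *r)
{
	pthread_mutex_init(&r->lock, NULL);
	r->addr = 0;
	r->sig = 0;
}

/* Writer side: both fields change inside one critical section. */
static void rseq_reg_set(struct rseq_reg *r, uintptr_t addr, uint32_t sig)
{
	pthread_mutex_lock(&r->lock);
	r->addr = addr;
	r->sig = sig;
	pthread_mutex_unlock(&r->lock);
}

/* Reader side: copy both fields under the same lock as the writer. */
static void rseq_reg_get(struct rseq_reg *r, uintptr_t *addr, uint32_t *sig)
{
	pthread_mutex_lock(&r->lock);
	*addr = r->addr;
	*sig = r->sig;
	pthread_mutex_unlock(&r->lock);
}
```

Holding a lock only on the reader side would not be enough; the pair is
consistent only because the writer takes the same lock.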
As for whether we need consistent rseq and rseq_sig pairs in the proc
output, I think there's some argument for it (also discussed in a
parallel thread with Mathieu Desnoyers).
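
For illustration only: a minimal userspace sketch of how a C/R tool
might parse one line of the proposed file, assuming the "%px %08x\n"
format from this version of the patch (a hex address without a "0x"
prefix, a space, then the hex signature). The helper name
parse_rseq_line() is hypothetical.

```c
#include <stdio.h>

/*
 * Parse "<hex address> <hex signature>" as produced by the seq_printf()
 * format in this patch version. Returns 0 on success, -1 if the line
 * does not contain both fields.
 */
static int parse_rseq_line(const char *line, unsigned long long *addr,
			   unsigned int *sig)
{
	return sscanf(line, "%llx %x", addr, sig) == 2 ? 0 : -1;
}
```

A caller would read the line from the per-thread file first and then
pass it to this helper. An all-zero address would presumably mean no
RSEQ area is registered, but that is an assumption, not something the
patch states.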
Best regards,
Piotr.