Date: Tue, 2 Feb 2021 18:37:09 +0100
Message-Id: <20210202173709.4104221-1-figiel@google.com>
X-Mailer: git-send-email 2.30.0.478.g8a0d178c01-goog
Subject: [PATCH v4] fs/proc: Expose RSEQ configuration
From: Piotr Figiel
To: Alexey Dobriyan, "Eric W. Biederman", Andrew Morton, Kees Cook,
 Alexey Gladkov, Michel Lespinasse, Bernd Edlinger, Andrei Vagin,
 mathieu.desnoyers@efficios.com, viro@zeniv.linux.org.uk,
 peterz@infradead.org, paulmck@kernel.org, boqun.feng@gmail.com
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 posk@google.com, kyurtsever@google.com, ckennelly@google.com,
 pjt@google.com, Piotr Figiel
X-Mailing-List: linux-kernel@vger.kernel.org

For userspace checkpoint and restore (C/R), some way of getting process
state containing RSEQ configuration is needed.

There are two ways this information is going to be used:
 - to re-enable RSEQ for threads which had it enabled before C/R
 - to detect if a thread was in a critical section during C/R

Since C/R preserves TLS memory and addresses, the RSEQ ABI will be restored
using the address registered before C/R.

Detecting whether the thread is in a critical section during C/R is needed
to enforce the behavior of RSEQ abort during C/R. Attaching with ptrace()
before registers are dumped doesn't by itself cause an RSEQ abort.
Restoring the instruction pointer within the critical section is
problematic because rseq_cs may get cleared before control is passed to
the migrated application code, leading to RSEQ invariants not being
preserved.

To achieve the above goals, expose the RSEQ ABI address and the signature
value with the new procfs file "/proc/<pid>/rseq".

Signed-off-by: Piotr Figiel
---

v4:
 - added documentation and extended comment before task_lock()
v3:
 - added locking so that the proc file always shows consistent pair of
   RSEQ ABI address and the signature
 - changed string formatting to use %px for the RSEQ ABI address
v2:
 - fixed string formatting for 32-bit architectures
v1:
 - https://lkml.kernel.org/r/20210113174127.2500051-1-figiel@google.com

---
 Documentation/filesystems/proc.rst | 16 ++++++++++++++++
 fs/exec.c                          |  2 ++
 fs/proc/base.c                     | 22 ++++++++++++++++++++++
 include/linux/sched/task.h         |  3 ++-
 kernel/rseq.c                      |  4 ++++
 5 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 2fa69f710e2a..d887666dc849 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -47,6 +47,7 @@ fixes/update part 1.1 Stefani Seibold June 9 2009
   3.10 /proc/<pid>/timerslack_ns - Task timerslack value
   3.11 /proc/<pid>/patch_state - Livepatch patch operation state
   3.12 /proc/<pid>/arch_status - Task architecture specific information
+  3.13 /proc/<pid>/rseq - RSEQ configuration state
 
   4 Configuring procfs
   4.1 Mount options
@@ -2131,6 +2132,21 @@ AVX512_elapsed_ms
   the task is unlikely an AVX512 user, but depends on the workload and the
   scheduling scenario, it also could be a false negative mentioned above.
 
+3.13 /proc/<pid>/rseq - RSEQ configuration state
+---------------------------------------------------
+This file provides RSEQ configuration of a thread. Available fields correspond
+to the rseq() syscall parameters and are:
+
+ - RSEQ ABI structure address shared between the kernel and user-space
+ - signature value expected before the abort handler code
+
+Both values are in hexadecimal format, for example::
+
+    # cat /proc/12345/rseq
+    0000abcdef12340 aabb0011
+
+This file is only present if CONFIG_RSEQ is enabled.
+
 Chapter 4: Configuring procfs
 =============================
 
diff --git a/fs/exec.c b/fs/exec.c
index 5d4d52039105..5d84f98847f1 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1830,7 +1830,9 @@ static int bprm_execve(struct linux_binprm *bprm,
 	/* execve succeeded */
 	current->fs->in_exec = 0;
 	current->in_execve = 0;
+	task_lock(current);
 	rseq_execve(current);
+	task_unlock(current);
 	acct_update_integrals(current);
 	task_numa_free(current, false);
 	return retval;
diff --git a/fs/proc/base.c b/fs/proc/base.c
index b3422cda2a91..89232329d966 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -662,6 +662,22 @@ static int proc_pid_syscall(struct seq_file *m, struct pid_namespace *ns,
 
 	return 0;
 }
+
+#ifdef CONFIG_RSEQ
+static int proc_pid_rseq(struct seq_file *m, struct pid_namespace *ns,
+				struct pid *pid, struct task_struct *task)
+{
+	int res = lock_trace(task);
+
+	if (res)
+		return res;
+	task_lock(task);
+	seq_printf(m, "%px %08x\n", task->rseq, task->rseq_sig);
+	task_unlock(task);
+	unlock_trace(task);
+	return 0;
+}
+#endif /* CONFIG_RSEQ */
 #endif /* CONFIG_HAVE_ARCH_TRACEHOOK */
 
 /************************************************************************/
@@ -3182,6 +3198,9 @@ static const struct pid_entry tgid_base_stuff[] = {
 	REG("comm", S_IRUGO|S_IWUSR, proc_pid_set_comm_operations),
 #ifdef CONFIG_HAVE_ARCH_TRACEHOOK
 	ONE("syscall", S_IRUSR, proc_pid_syscall),
+#ifdef CONFIG_RSEQ
+	ONE("rseq", S_IRUSR, proc_pid_rseq),
+#endif
 #endif
 	REG("cmdline", S_IRUGO, proc_pid_cmdline_ops),
 	ONE("stat", S_IRUGO, proc_tgid_stat),
@@ -3522,6 +3541,9 @@ static const struct pid_entry tid_base_stuff[] = {
 			 &proc_pid_set_comm_operations, {}),
 #ifdef CONFIG_HAVE_ARCH_TRACEHOOK
 	ONE("syscall", S_IRUSR, proc_pid_syscall),
+#ifdef CONFIG_RSEQ
+	ONE("rseq", S_IRUSR, proc_pid_rseq),
+#endif
 #endif
 	REG("cmdline", S_IRUGO, proc_pid_cmdline_ops),
 	ONE("stat", S_IRUGO, proc_tid_stat),
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index c0f71f2e7160..b6d085ac571b 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -155,7 +155,8 @@ static inline struct vm_struct *task_stack_vm_area(const struct task_struct *t)
  * Protects ->fs, ->files, ->mm, ->group_info, ->comm, keyring
  * subscriptions and synchronises with wait4(). Also used in procfs. Also
  * pins the final release of task.io_context. Also protects ->cpuset and
- * ->cgroup.subsys[]. And ->vfork_done.
+ * ->cgroup.subsys[]. And ->vfork_done. And ->rseq and ->rseq_sig to
+ * synchronize changes with procfs reader.
  *
  * Nests both inside and outside of read_lock(&tasklist_lock).
  * It must not be nested with write_lock_irq(&tasklist_lock),
diff --git a/kernel/rseq.c b/kernel/rseq.c
index a4f86a9d6937..6aea67878065 100644
--- a/kernel/rseq.c
+++ b/kernel/rseq.c
@@ -322,8 +322,10 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len,
 		ret = rseq_reset_rseq_cpu_id(current);
 		if (ret)
 			return ret;
+		task_lock(current);
 		current->rseq = NULL;
 		current->rseq_sig = 0;
+		task_unlock(current);
 		return 0;
 	}
 
@@ -353,8 +355,10 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len,
 		return -EINVAL;
 	if (!access_ok(rseq, rseq_len))
 		return -EFAULT;
+	task_lock(current);
 	current->rseq = rseq;
 	current->rseq_sig = sig;
+	task_unlock(current);
 	/*
 	 * If rseq was previously inactive, and has just been
 	 * registered, ensure the cpu_id_start and cpu_id fields
-- 
2.30.0.478.g8a0d178c01-goog
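
For illustration only, here is a minimal userspace sketch of how a C/R tool
might consume the new file. It is not part of the patch. The helper names
parse_rseq_line() and read_thread_rseq() are hypothetical; the only things
taken from the patch are the "%px %08x" output format and the fact that the
entry is added to both the tgid and tid tables, so it should also appear
under /proc/<pid>/task/<tid>/.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not in the patch): parse one line in the
 * "<address> <signature>" format printed by proc_pid_rseq().
 * Both fields are bare hexadecimal without a "0x" prefix.
 * Returns 0 on success, -1 if the line does not contain two hex fields.
 */
int parse_rseq_line(const char *line, uint64_t *abi_addr, uint32_t *sig)
{
	assert(line && abi_addr && sig);
	if (sscanf(line, "%" SCNx64 " %" SCNx32, abi_addr, sig) != 2)
		return -1;
	return 0;
}

/*
 * Hypothetical helper (not in the patch): read the rseq entry of one
 * thread. Returns -1 if the file cannot be opened (for example when
 * CONFIG_RSEQ is disabled, the thread is gone, or access is denied)
 * or if its content cannot be parsed.
 */
int read_thread_rseq(int pid, int tid, uint64_t *abi_addr, uint32_t *sig)
{
	char path[64], line[64];
	FILE *f;
	int ret = -1;

	snprintf(path, sizeof(path), "/proc/%d/task/%d/rseq", pid, tid);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fgets(line, sizeof(line), f))
		ret = parse_rseq_line(line, abi_addr, sig);
	fclose(f);
	return ret;
}
```

Since the rseq() unregister path sets current->rseq to NULL, a parsed
address of 0 can presumably be treated as "RSEQ not registered for this
thread"; a restorer would then skip re-registration for that thread.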