From: Mathieu Desnoyers
To: Peter Zijlstra
Cc: linux-kernel@vger.kernel.org, Thomas Gleixner, "Paul E . McKenney", Boqun Feng, "H . Peter Anvin", Paul Turner, linux-api@vger.kernel.org, Christian Brauner, Florian Weimer, David.Laight@ACULAB.COM, carlos@redhat.com, Peter Oskolkov, Mathieu Desnoyers
Subject: [PATCH v3 02/23] rseq: Introduce extensible rseq ABI
Date: Fri, 29 Jul 2022 15:02:04 -0400
Message-Id: <20220729190225.12726-3-mathieu.desnoyers@efficios.com>
In-Reply-To: <20220729190225.12726-1-mathieu.desnoyers@efficios.com>
References: <20220729190225.12726-1-mathieu.desnoyers@efficios.com>

Introduce the extensible rseq ABI, where the feature size supported by the kernel and the required alignment are communicated to user-space through ELF auxiliary vectors.
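As a user-space illustration of this handshake, the sketch below reads the two auxiliary vector entries with glibc's getauxval() and derives the rseq_len to register. It is hypothetical: choose_rseq_len(), rseq_alloc_size() and report_rseq_params() are illustrative names, and the numeric values 27 and 28 for AT_RSEQ_FEATURE_SIZE and AT_RSEQ_ALIGN are an assumption (this patch only names the vectors in comments). getauxval() returns 0 for an entry the kernel does not provide, so the sketch falls back to the original 32-byte size and alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/auxv.h>

/* Assumed values; this patch does not define them. */
#ifndef AT_RSEQ_FEATURE_SIZE
#define AT_RSEQ_FEATURE_SIZE	27
#endif
#ifndef AT_RSEQ_ALIGN
#define AT_RSEQ_ALIGN		28
#endif

/* Original struct rseq size (including padding) and alignment. */
#define ORIG_RSEQ_SIZE		32
#define ORIG_RSEQ_ALIGN		32

/*
 * Hypothetical helper: choose the rseq_len to register. A feature size
 * of 0 means the kernel did not advertise one. If every supported field
 * fits within the original 32 bytes, register the original size;
 * otherwise register a length covering all supported fields.
 */
static size_t choose_rseq_len(unsigned long feature_size)
{
	if (feature_size <= ORIG_RSEQ_SIZE)
		return ORIG_RSEQ_SIZE;
	return feature_size;
}

/* Hypothetical helper: round len up to a power-of-two alignment. */
static size_t rseq_alloc_size(size_t len, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (len + align - 1) & ~(align - 1);
}

/* Query the auxiliary vector and print the derived parameters. */
static void report_rseq_params(void)
{
	unsigned long feature_size = getauxval(AT_RSEQ_FEATURE_SIZE);
	unsigned long align = getauxval(AT_RSEQ_ALIGN);
	size_t len;

	/* Never use less than the original struct rseq alignment. */
	if (align < ORIG_RSEQ_ALIGN)
		align = ORIG_RSEQ_ALIGN;
	len = choose_rseq_len(feature_size);
	printf("rseq_len=%zu align=%lu alloc=%zu\n",
	       len, align, rseq_alloc_size(len, align));
}
```

Registering exactly 32 bytes remains valid on any kernel that implements this check, so a library that does not need the newer fields can keep using the original size unconditionally.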
This allows user-space to register rseq with an rseq_len of either 32 bytes, the original struct rseq size (including padding), or a larger value. If rseq_len is larger than 32 bytes, it must be at least the feature size communicated to user-space through the ELF auxiliary vectors.

Signed-off-by: Mathieu Desnoyers
---
 include/linux/sched.h |  4 ++++
 kernel/ptrace.c       |  2 +-
 kernel/rseq.c         | 33 +++++++++++++++++++++++++++------
 3 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index a8911b1f35aa..68b23937b4a5 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1290,6 +1290,7 @@ struct task_struct {
 
 #ifdef CONFIG_RSEQ
 	struct rseq __user *rseq;
+	u32 rseq_len;
 	u32 rseq_sig;
 	/*
 	 * RmW on rseq_event_mask must be performed atomically
@@ -2282,10 +2283,12 @@ static inline void rseq_fork(struct task_struct *t, unsigned long clone_flags)
 {
 	if (clone_flags & CLONE_VM) {
 		t->rseq = NULL;
+		t->rseq_len = 0;
 		t->rseq_sig = 0;
 		t->rseq_event_mask = 0;
 	} else {
 		t->rseq = current->rseq;
+		t->rseq_len = current->rseq_len;
 		t->rseq_sig = current->rseq_sig;
 		t->rseq_event_mask = current->rseq_event_mask;
 	}
@@ -2294,6 +2297,7 @@ static inline void rseq_fork(struct task_struct *t, unsigned long clone_flags)
 static inline void rseq_execve(struct task_struct *t)
 {
 	t->rseq = NULL;
+	t->rseq_len = 0;
 	t->rseq_sig = 0;
 	t->rseq_event_mask = 0;
 }
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 6149ca5e0e14..390c71e9e573 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -817,7 +817,7 @@ static long ptrace_get_rseq_configuration(struct task_struct *task,
 {
 	struct ptrace_rseq_configuration conf = {
 		.rseq_abi_pointer = (u64)(uintptr_t)task->rseq,
-		.rseq_abi_size = sizeof(*task->rseq),
+		.rseq_abi_size = task->rseq_len,
 		.signature = task->rseq_sig,
 		.flags = 0,
 	};
diff --git a/kernel/rseq.c b/kernel/rseq.c
index 97ac20b4f738..46dc5c2ce2b7 100644
--- a/kernel/rseq.c
+++ b/kernel/rseq.c
@@ -18,6 +18,9 @@
 #define CREATE_TRACE_POINTS
 #include
 
+/* The original rseq structure size (including padding) is 32 bytes. */
+#define ORIG_RSEQ_SIZE 32
+
 #define RSEQ_CS_PREEMPT_MIGRATE_FLAGS (RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE | \
 				  RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT)
 
@@ -86,10 +89,15 @@ static int rseq_update_cpu_id(struct task_struct *t)
 	u32 cpu_id = raw_smp_processor_id();
 	struct rseq __user *rseq = t->rseq;
 
-	if (!user_write_access_begin(rseq, sizeof(*rseq)))
+	if (!user_write_access_begin(rseq, t->rseq_len))
 		goto efault;
 	unsafe_put_user(cpu_id, &rseq->cpu_id_start, efault_end);
 	unsafe_put_user(cpu_id, &rseq->cpu_id, efault_end);
+	/*
+	 * Additional feature fields added after ORIG_RSEQ_SIZE
+	 * need to be conditionally updated only if
+	 * t->rseq_len != ORIG_RSEQ_SIZE.
+	 */
 	user_write_access_end();
 	trace_rseq_update(t);
 	return 0;
@@ -116,6 +124,11 @@ static int rseq_reset_rseq_cpu_id(struct task_struct *t)
 	 */
 	if (put_user(cpu_id, &t->rseq->cpu_id))
 		return -EFAULT;
+	/*
+	 * Additional feature fields added after ORIG_RSEQ_SIZE
+	 * need to be conditionally reset only if
+	 * t->rseq_len != ORIG_RSEQ_SIZE.
+	 */
 	return 0;
 }
 
@@ -336,7 +349,7 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len,
 		/* Unregister rseq for current thread. */
 		if (current->rseq != rseq || !current->rseq)
 			return -EINVAL;
-		if (rseq_len != sizeof(*rseq))
+		if (rseq_len != current->rseq_len)
 			return -EINVAL;
 		if (current->rseq_sig != sig)
 			return -EPERM;
@@ -345,6 +358,7 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len,
 			return ret;
 		current->rseq = NULL;
 		current->rseq_sig = 0;
+		current->rseq_len = 0;
 		return 0;
 	}
 
@@ -357,7 +371,7 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len,
 		 * the provided address differs from the prior
 		 * one.
 		 */
-		if (current->rseq != rseq || rseq_len != sizeof(*rseq))
+		if (current->rseq != rseq || rseq_len != current->rseq_len)
 			return -EINVAL;
 		if (current->rseq_sig != sig)
 			return -EPERM;
@@ -366,15 +380,22 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len,
 	}
 
 	/*
-	 * If there was no rseq previously registered,
-	 * ensure the provided rseq is properly aligned and valid.
+	 * If there was no rseq previously registered, ensure the provided rseq
+	 * is properly aligned, as communicated to user-space through the ELF
+	 * auxiliary vector AT_RSEQ_ALIGN.
+	 *
+	 * In order to be valid, rseq_len is either the original rseq size, or
+	 * large enough to contain all supported fields, as communicated to
+	 * user-space through the ELF auxiliary vector AT_RSEQ_FEATURE_SIZE.
 	 */
 	if (!IS_ALIGNED((unsigned long)rseq, __alignof__(*rseq)) ||
-	    rseq_len != sizeof(*rseq))
+	    rseq_len < ORIG_RSEQ_SIZE ||
+	    (rseq_len != ORIG_RSEQ_SIZE && rseq_len < offsetof(struct rseq, end)))
 		return -EINVAL;
 	if (!access_ok(rseq, rseq_len))
 		return -EFAULT;
 	current->rseq = rseq;
+	current->rseq_len = rseq_len;
 	current->rseq_sig = sig;
 	/*
 	 * If rseq was previously inactive, and has just been
-- 
2.17.1
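The length rules added to sys_rseq() above can be exercised outside the kernel with a small user-space model. This is a hypothetical sketch, not kernel code: struct rseq_reg, model_register() and model_unregister() are illustrative names, feature_size stands in for offsetof(struct rseq, end), and align stands in for __alignof__(*rseq). It models only the first-registration and unregistration paths visible in the diff and omits the user-memory accesses (access_ok(), rseq_reset_rseq_cpu_id()).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Original struct rseq size (including padding). */
#define ORIG_RSEQ_SIZE	32

/* Hypothetical per-thread record mirroring the task_struct fields. */
struct rseq_reg {
	uintptr_t addr;		/* models t->rseq; 0 when unregistered */
	uint32_t len;		/* models t->rseq_len */
	uint32_t sig;		/* models t->rseq_sig */
};

/*
 * Model of the path taken when no rseq area is registered yet:
 * alignment check, then the extensible-length rule.
 */
static int model_register(struct rseq_reg *reg, uintptr_t addr, uint32_t len,
			  uint32_t sig, size_t feature_size, size_t align)
{
	assert(reg->addr == 0);
	if (addr % align != 0 || len < ORIG_RSEQ_SIZE ||
	    (len != ORIG_RSEQ_SIZE && len < feature_size))
		return -EINVAL;
	reg->addr = addr;
	reg->len = len;		/* remembered for unregistration and ptrace */
	reg->sig = sig;
	return 0;
}

/* Model of unregistration: address, length and signature must match. */
static int model_unregister(struct rseq_reg *reg, uintptr_t addr,
			    uint32_t len, uint32_t sig)
{
	if (!reg->addr || reg->addr != addr)
		return -EINVAL;
	if (len != reg->len)
		return -EINVAL;
	if (reg->sig != sig)
		return -EPERM;
	reg->addr = 0;
	reg->len = 0;
	reg->sig = 0;
	return 0;
}
```

Storing the registered length, instead of comparing against sizeof(struct rseq), is what lets unregistration and the ptrace configuration query work with whatever size user-space actually registered.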