From: Mathieu Desnoyers
To: Peter Zijlstra
Cc:
 linux-kernel@vger.kernel.org, Thomas Gleixner, "Paul E . McKenney",
 Boqun Feng, "H . Peter Anvin", Paul Turner, linux-api@vger.kernel.org,
 Christian Brauner, Florian Weimer, David.Laight@ACULAB.COM,
 carlos@redhat.com, Peter Oskolkov, Alexander Mikhalitsyn,
 Chris Kennelly, Mathieu Desnoyers
Subject: [PATCH 03/30] rseq: Introduce extensible rseq ABI
Date: Tue, 22 Nov 2022 15:39:05 -0500
Message-Id: <20221122203932.231377-4-mathieu.desnoyers@efficios.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20221122203932.231377-1-mathieu.desnoyers@efficios.com>
References: <20221122203932.231377-1-mathieu.desnoyers@efficios.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Introduce the extensible rseq ABI, where the feature size supported by
the kernel and the required alignment are communicated to user-space
through ELF auxiliary vectors.

This allows user-space to call rseq registration with a rseq_len of
either 32 bytes for the original struct rseq size (which includes
padding), or larger.

If rseq_len is larger than 32 bytes, then it must be large enough to
contain the feature size communicated to user-space through ELF
auxiliary vectors.

Signed-off-by: Mathieu Desnoyers
---
Changes since v4:
- Accept original rseq alignment for original rseq size.
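Illustration only, not part of the patch: the registration rule described
above can be sketched as a user-space predicate. This is a minimal sketch
under stated assumptions. The names is_aligned() and
rseq_registration_args_ok() are hypothetical, and the feature_size and
align parameters stand in for the values user-space would read from the
AT_RSEQ_FEATURE_SIZE and AT_RSEQ_ALIGN auxiliary vector entries. The
kernel itself compares against offsetof(struct rseq, end) and
__alignof__(*rseq) rather than taking these as arguments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Original struct rseq size, including padding (value taken from the patch). */
#define ORIG_RSEQ_SIZE 32u

/* Hypothetical helper: true if addr is a multiple of align (a power of two). */
static bool is_aligned(uintptr_t addr, uint32_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr & ((uintptr_t)align - 1)) == 0;
}

/*
 * Hypothetical user-space mirror of the registration check.
 * Assumption: feature_size and align hold the AT_RSEQ_FEATURE_SIZE and
 * AT_RSEQ_ALIGN values; they are plain parameters here so the rule can
 * be exercised without a kernel that exports them.
 */
static bool rseq_registration_args_ok(uintptr_t addr, uint32_t rseq_len,
				      uint32_t feature_size, uint32_t align)
{
	/* Anything smaller than the original structure is rejected. */
	if (rseq_len < ORIG_RSEQ_SIZE)
		return false;
	/* Original size: the original alignment is sufficient. */
	if (rseq_len == ORIG_RSEQ_SIZE)
		return is_aligned(addr, ORIG_RSEQ_SIZE);
	/* Extended size: advertised alignment, and room for every feature. */
	return is_aligned(addr, align) && rseq_len >= feature_size;
}
```

A caller would typically run such a check before invoking the rseq system
call, falling back to a 32-byte registration when the auxiliary vector
entries are absent.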
---
 include/linux/sched.h |  4 ++++
 kernel/ptrace.c       |  2 +-
 kernel/rseq.c         | 37 ++++++++++++++++++++++++++++++-------
 3 files changed, 35 insertions(+), 8 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 23de7fe86cc4..2a9e14e3e668 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1305,6 +1305,7 @@ struct task_struct {
 
 #ifdef CONFIG_RSEQ
 	struct rseq __user *rseq;
+	u32 rseq_len;
 	u32 rseq_sig;
 	/*
 	 * RmW on rseq_event_mask must be performed atomically
@@ -2355,10 +2356,12 @@ static inline void rseq_fork(struct task_struct *t, unsigned long clone_flags)
 {
 	if (clone_flags & CLONE_VM) {
 		t->rseq = NULL;
+		t->rseq_len = 0;
 		t->rseq_sig = 0;
 		t->rseq_event_mask = 0;
 	} else {
 		t->rseq = current->rseq;
+		t->rseq_len = current->rseq_len;
 		t->rseq_sig = current->rseq_sig;
 		t->rseq_event_mask = current->rseq_event_mask;
 	}
@@ -2367,6 +2370,7 @@ static inline void rseq_fork(struct task_struct *t, unsigned long clone_flags)
 static inline void rseq_execve(struct task_struct *t)
 {
 	t->rseq = NULL;
+	t->rseq_len = 0;
 	t->rseq_sig = 0;
 	t->rseq_event_mask = 0;
 }
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 54482193e1ed..0786450074c1 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -813,7 +813,7 @@ static long ptrace_get_rseq_configuration(struct task_struct *task,
 {
 	struct ptrace_rseq_configuration conf = {
 		.rseq_abi_pointer = (u64)(uintptr_t)task->rseq,
-		.rseq_abi_size = sizeof(*task->rseq),
+		.rseq_abi_size = task->rseq_len,
 		.signature = task->rseq_sig,
 		.flags = 0,
 	};
diff --git a/kernel/rseq.c b/kernel/rseq.c
index bda8175f8f99..c1058b3f10ac 100644
--- a/kernel/rseq.c
+++ b/kernel/rseq.c
@@ -18,6 +18,9 @@
 #define CREATE_TRACE_POINTS
 #include
 
+/* The original rseq structure size (including padding) is 32 bytes. */
+#define ORIG_RSEQ_SIZE		32
+
 #define RSEQ_CS_NO_RESTART_FLAGS (RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT | \
 				  RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL | \
 				  RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE)
@@ -87,10 +90,15 @@ static int rseq_update_cpu_id(struct task_struct *t)
 	u32 cpu_id = raw_smp_processor_id();
 	struct rseq __user *rseq = t->rseq;
 
-	if (!user_write_access_begin(rseq, sizeof(*rseq)))
+	if (!user_write_access_begin(rseq, t->rseq_len))
 		goto efault;
 	unsafe_put_user(cpu_id, &rseq->cpu_id_start, efault_end);
 	unsafe_put_user(cpu_id, &rseq->cpu_id, efault_end);
+	/*
+	 * Additional feature fields added after ORIG_RSEQ_SIZE
+	 * need to be conditionally updated only if
+	 * t->rseq_len != ORIG_RSEQ_SIZE.
+	 */
 	user_write_access_end();
 	trace_rseq_update(t);
 	return 0;
@@ -117,6 +125,11 @@ static int rseq_reset_rseq_cpu_id(struct task_struct *t)
 	 */
 	if (put_user(cpu_id, &t->rseq->cpu_id))
 		return -EFAULT;
+	/*
+	 * Additional feature fields added after ORIG_RSEQ_SIZE
+	 * need to be conditionally reset only if
+	 * t->rseq_len != ORIG_RSEQ_SIZE.
+	 */
 	return 0;
 }
 
@@ -329,7 +342,7 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len,
 		/* Unregister rseq for current thread. */
 		if (current->rseq != rseq || !current->rseq)
 			return -EINVAL;
-		if (rseq_len != sizeof(*rseq))
+		if (rseq_len != current->rseq_len)
 			return -EINVAL;
 		if (current->rseq_sig != sig)
 			return -EPERM;
@@ -338,6 +351,7 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len,
 			return ret;
 		current->rseq = NULL;
 		current->rseq_sig = 0;
+		current->rseq_len = 0;
 		return 0;
 	}
 
@@ -350,7 +364,7 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len,
 		 * the provided address differs from the prior
 		 * one.
 		 */
-		if (current->rseq != rseq || rseq_len != sizeof(*rseq))
+		if (current->rseq != rseq || rseq_len != current->rseq_len)
 			return -EINVAL;
 		if (current->rseq_sig != sig)
 			return -EPERM;
@@ -359,15 +373,24 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len,
 	}
 
 	/*
-	 * If there was no rseq previously registered,
-	 * ensure the provided rseq is properly aligned and valid.
+	 * If there was no rseq previously registered, ensure the provided rseq
+	 * is properly aligned, as communicated to user-space through the ELF
+	 * auxiliary vector AT_RSEQ_ALIGN. If rseq_len is the original rseq
+	 * size, the required alignment is the original struct rseq alignment.
+	 *
+	 * In order to be valid, rseq_len is either the original rseq size, or
+	 * large enough to contain all supported fields, as communicated to
+	 * user-space through the ELF auxiliary vector AT_RSEQ_FEATURE_SIZE.
 	 */
-	if (!IS_ALIGNED((unsigned long)rseq, __alignof__(*rseq)) ||
-	    rseq_len != sizeof(*rseq))
+	if (rseq_len < ORIG_RSEQ_SIZE ||
+	    (rseq_len == ORIG_RSEQ_SIZE && !IS_ALIGNED((unsigned long)rseq, ORIG_RSEQ_SIZE)) ||
+	    (rseq_len != ORIG_RSEQ_SIZE && (!IS_ALIGNED((unsigned long)rseq, __alignof__(*rseq)) ||
+					    rseq_len < offsetof(struct rseq, end))))
 		return -EINVAL;
 	if (!access_ok(rseq, rseq_len))
 		return -EFAULT;
 	current->rseq = rseq;
+	current->rseq_len = rseq_len;
 	current->rseq_sig = sig;
 	/*
 	 * If rseq was previously inactive, and has just been
-- 
2.25.1