From: Mathieu Desnoyers
To: Peter Zijlstra
Cc: linux-kernel@vger.kernel.org, Thomas Gleixner, "Paul E . McKenney",
    Boqun Feng, "H . Peter Anvin", Paul Turner, linux-api@vger.kernel.org,
    Christian Brauner, Florian Weimer, David.Laight@ACULAB.COM,
    carlos@redhat.com, Peter Oskolkov, Alexander Mikhalitsyn,
    Chris Kennelly, Mathieu Desnoyers
Subject: [PATCH v5 03/24] rseq: Extend struct rseq with numa node id
Date: Thu, 3 Nov 2022 16:03:38 -0400
Message-Id: <20221103200359.328736-4-mathieu.desnoyers@efficios.com>
In-Reply-To: <20221103200359.328736-1-mathieu.desnoyers@efficios.com>
References: <20221103200359.328736-1-mathieu.desnoyers@efficios.com>

Adding the NUMA node id to struct rseq is a straightforward thing to do,
and a good way to figure out if anything in the user-space ecosystem
prevents extending struct rseq.

This NUMA node id field allows memory allocators such as tcmalloc to
take advantage of fast access to the current NUMA node id to perform
NUMA-aware memory allocation.

It can also be useful for implementing fast-paths for NUMA-aware
user-space mutexes.

It also allows implementing getcpu(2) purely in user-space.

Signed-off-by: Mathieu Desnoyers
---
Changes since v4:
- Use __entry->cpu_id as argument for cpu_to_node() in the rseq_update
  tracepoint.
---
 include/trace/events/rseq.h |  4 +++-
 include/uapi/linux/rseq.h   |  8 ++++++++
 kernel/rseq.c               | 19 +++++++++++++------
 3 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/include/trace/events/rseq.h b/include/trace/events/rseq.h
index a04a64bc1a00..dde7a359b4ef 100644
--- a/include/trace/events/rseq.h
+++ b/include/trace/events/rseq.h
@@ -16,13 +16,15 @@ TRACE_EVENT(rseq_update,
 
 	TP_STRUCT__entry(
 		__field(s32, cpu_id)
+		__field(s32, node_id)
 	),
 
 	TP_fast_assign(
 		__entry->cpu_id = raw_smp_processor_id();
+		__entry->node_id = cpu_to_node(__entry->cpu_id);
 	),
 
-	TP_printk("cpu_id=%d", __entry->cpu_id)
+	TP_printk("cpu_id=%d node_id=%d", __entry->cpu_id, __entry->node_id)
 );
 
 TRACE_EVENT(rseq_ip_fixup,
diff --git a/include/uapi/linux/rseq.h b/include/uapi/linux/rseq.h
index 05d3c4cdeb40..1cb90a435c5c 100644
--- a/include/uapi/linux/rseq.h
+++ b/include/uapi/linux/rseq.h
@@ -131,6 +131,14 @@ struct rseq {
 	 */
 	__u32 flags;
 
+	/*
+	 * Restartable sequences node_id field. Updated by the kernel. Read by
+	 * user-space with single-copy atomicity semantics. This field should
+	 * only be read by the thread which registered this data structure.
+	 * Aligned on 32-bit. Contains the current NUMA node ID.
+	 */
+	__u32 node_id;
+
 	/*
 	 * Flexible array member at end of structure, after last feature field.
 	 */
diff --git a/kernel/rseq.c b/kernel/rseq.c
index c1058b3f10ac..e21ad8929958 100644
--- a/kernel/rseq.c
+++ b/kernel/rseq.c
@@ -85,15 +85,17 @@
  * F1.
  */
 
-static int rseq_update_cpu_id(struct task_struct *t)
+static int rseq_update_cpu_node_id(struct task_struct *t)
 {
-	u32 cpu_id = raw_smp_processor_id();
 	struct rseq __user *rseq = t->rseq;
+	u32 cpu_id = raw_smp_processor_id();
+	u32 node_id = cpu_to_node(cpu_id);
 
 	if (!user_write_access_begin(rseq, t->rseq_len))
 		goto efault;
 	unsafe_put_user(cpu_id, &rseq->cpu_id_start, efault_end);
 	unsafe_put_user(cpu_id, &rseq->cpu_id, efault_end);
+	unsafe_put_user(node_id, &rseq->node_id, efault_end);
 	/*
 	 * Additional feature fields added after ORIG_RSEQ_SIZE
 	 * need to be conditionally updated only if
@@ -109,9 +111,9 @@ static int rseq_update_cpu_id(struct task_struct *t)
 	return -EFAULT;
 }
 
-static int rseq_reset_rseq_cpu_id(struct task_struct *t)
+static int rseq_reset_rseq_cpu_node_id(struct task_struct *t)
 {
-	u32 cpu_id_start = 0, cpu_id = RSEQ_CPU_ID_UNINITIALIZED;
+	u32 cpu_id_start = 0, cpu_id = RSEQ_CPU_ID_UNINITIALIZED, node_id = 0;
 
 	/*
 	 * Reset cpu_id_start to its initial state (0).
@@ -125,6 +127,11 @@ static int rseq_reset_rseq_cpu_id(struct task_struct *t)
 	 */
 	if (put_user(cpu_id, &t->rseq->cpu_id))
 		return -EFAULT;
+	/*
+	 * Reset node_id to its initial state (0).
+	 */
+	if (put_user(node_id, &t->rseq->node_id))
+		return -EFAULT;
 	/*
 	 * Additional feature fields added after ORIG_RSEQ_SIZE
 	 * need to be conditionally reset only if
@@ -299,7 +306,7 @@ void __rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs)
 		if (unlikely(ret < 0))
 			goto error;
 	}
-	if (unlikely(rseq_update_cpu_id(t)))
+	if (unlikely(rseq_update_cpu_node_id(t)))
 		goto error;
 	return;
 
@@ -346,7 +353,7 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len,
 			return -EINVAL;
 		if (current->rseq_sig != sig)
 			return -EPERM;
-		ret = rseq_reset_rseq_cpu_id(current);
+		ret = rseq_reset_rseq_cpu_node_id(current);
 		if (ret)
 			return ret;
 		current->rseq = NULL;
-- 
2.25.1
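
The commit message says the new field allows implementing getcpu(2)
purely in user-space. A minimal sketch of what such a reader might look
like follows. It is not part of the patch: struct rseq_sketch and
getcpu_from_rseq() are hypothetical names, and the struct is a hand-written
mirror of the leading struct rseq fields as laid out after this patch.
Real code would include <linux/rseq.h> and locate the thread's
registered rseq area instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical mirror of the leading fields of struct rseq after this
 * patch. Assumption: node_id directly follows flags, as in the uapi hunk.
 */
struct rseq_sketch {
	uint32_t cpu_id_start;
	uint32_t cpu_id;
	uint64_t rseq_cs;
	uint32_t flags;
	uint32_t node_id;	/* field added by this patch */
} __attribute__((aligned(32)));

/* node_id lands in what used to be padding of the 32-byte structure. */
static_assert(offsetof(struct rseq_sketch, node_id) == 20,
	      "node_id follows flags");
static_assert(sizeof(struct rseq_sketch) == 32,
	      "structure size is unchanged");

/*
 * Hypothetical helper: snapshot the CPU and NUMA node from the rseq area
 * of the calling thread. Returns 0 on success, -1 if the area is not
 * registered (cpu_id holds a negative sentinel such as
 * RSEQ_CPU_ID_UNINITIALIZED).
 *
 * The two loads are independent, so a migration between them can yield a
 * cpu/node pair that does not match. A consistent pair would need the
 * loads to run inside a restartable sequence.
 */
static inline int getcpu_from_rseq(const volatile struct rseq_sketch *rs,
				   unsigned int *cpu, unsigned int *node)
{
	uint32_t c = rs->cpu_id;

	if ((int32_t)c < 0)
		return -1;
	if (cpu)
		*cpu = c;
	if (node)
		*node = rs->node_id;
	return 0;
}
```

A caller would fall back to the getcpu(2) system call when the helper
returns -1, since the kernel resets cpu_id to RSEQ_CPU_ID_UNINITIALIZED
and node_id to 0 on unregistration.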