From: Uros Bizjak
To: x86@kernel.org, linux-kernel@vger.kernel.org
Cc: Uros Bizjak, Linus Torvalds, Nadav Amit, Ingo Molnar, Andy Lutomirski, Brian Gerst, Denys Vlasenko, "H. Peter Anvin", Peter Zijlstra, Thomas Gleixner, Josh Poimboeuf
Subject: [PATCH tip] x86/percpu: Rewrite arch_raw_cpu_ptr()
Date: Wed, 11 Oct 2023 22:40:36 +0200
Message-ID: <20231011204150.51166-1-ubizjak@gmail.com>
X-Mailer: git-send-email 2.41.0
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Implement arch_raw_cpu_ptr() as a load from this_cpu_off and then
add the ptr value to the base. This way, the compiler can propagate
the addend to the following instruction and simplify address calculation.

E.g.: address calculation in amd_pmu_enable_virt() improves from:

    48 c7 c0 00 00 00 00    mov    $0x0,%rax
                    87b7: R_X86_64_32S      cpu_hw_events
    65 48 03 05 00 00 00    add    %gs:0x0(%rip),%rax
    00
                    87bf: R_X86_64_PC32     this_cpu_off-0x4
    48 c7 80 28 13 00 00    movq   $0x0,0x1328(%rax)
    00 00 00 00

to:

    65 48 8b 05 00 00 00    mov    %gs:0x0(%rip),%rax
    00
                    8798: R_X86_64_PC32     this_cpu_off-0x4
    48 c7 80 00 00 00 00    movq   $0x0,0x0(%rax)
    00 00 00 00
                    87a6: R_X86_64_32S      cpu_hw_events+0x1328

The compiler can also eliminate redundant loads from this_cpu_off,
reducing the number of percpu offset reads (either from this_cpu_off
or with rdgsbase) from 1663 to 1571.

Additionally, the patch introduces a 'rdgsbase' alternative for CPUs with
X86_FEATURE_FSGSBASE. The rdgsbase instruction *probably* will end up
only decoding in the first decoder etc.
But we're talking single-cycle kind of
effects, and the rdgsbase case should be much better from a cache
perspective and might use fewer memory pipeline resources to
offset the fact that it uses an unusual front end decoder resource...

The only drawback of the patch is a larger binary size:

       text    data     bss      dec     hex filename
   25546594 4387686  808452 30742732 1d518cc vmlinux-new.o
   25515256 4387814  808452 30711522 1d49ee2 vmlinux-old.o

The text size increases by 31k (0.123%), due to 1578 rdgsbase
altinstructions that are placed in the text section. The increase in
text size is not "real" - the 'rdgsbase' instruction should be smaller
than a 'mov %gs'; binary size increases because we obviously have two
instructions, but the actual *executable* part likely stays the same,
and it's just that we grow the altinstruction metadata.

Suggested-by: Linus Torvalds
Signed-off-by: Uros Bizjak
Cc: Nadav Amit
Cc: Ingo Molnar
Cc: Andy Lutomirski
Cc: Brian Gerst
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Josh Poimboeuf
---
 arch/x86/include/asm/percpu.h | 26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 60ea7755c0fe..e047a0bc5554 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -49,18 +49,32 @@
 #define __force_percpu_prefix	"%%"__stringify(__percpu_seg)":"
 #define __my_cpu_offset		this_cpu_read(this_cpu_off)
 
-/*
- * Compared to the generic __my_cpu_offset version, the following
- * saves one instruction and avoids clobbering a temp register.
- */
+#ifdef CONFIG_X86_64
+#define arch_raw_cpu_ptr(ptr)					\
+({								\
+	unsigned long tcp_ptr__;				\
+	asm (ALTERNATIVE("movq " __percpu_arg(1) ", %0",	\
+			 "rdgsbase %0",				\
+			 X86_FEATURE_FSGSBASE)			\
+	     : "=r" (tcp_ptr__)					\
+	     : "m" (__my_cpu_var(this_cpu_off)));		\
+								\
+	tcp_ptr__ += (unsigned long)(ptr);			\
+	(typeof(*(ptr)) __kernel __force *)tcp_ptr__;		\
+})
+#else /* CONFIG_X86_64 */
 #define arch_raw_cpu_ptr(ptr)					\
 ({								\
 	unsigned long tcp_ptr__;				\
-	asm ("add " __percpu_arg(1) ", %0"			\
+	asm ("movl " __percpu_arg(1) ", %0"			\
 	     : "=r" (tcp_ptr__)					\
-	     : "m" (__my_cpu_var(this_cpu_off)), "0" (ptr));	\
+	     : "m" (__my_cpu_var(this_cpu_off)));		\
+								\
+	tcp_ptr__ += (unsigned long)(ptr);			\
 	(typeof(*(ptr)) __kernel __force *)tcp_ptr__;		\
 })
+#endif /* CONFIG_X86_64 */
+
 #else /* CONFIG_SMP */
 #define __percpu_seg_override
 #define __percpu_prefix		""
-- 
2.41.0
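
Illustration (not part of the patch): a minimal userspace model, in
portable C, of why moving the addition out of the asm statement helps.
All names below (model_events, old_shape, new_shape) are hypothetical
and exist only for this sketch; the percpu offset is passed in as a
plain integer instead of being read through %gs. In the old shape the
symbol address is added inside the opaque asm, so the field
displacement has to stay on the following store. In the new shape the
asm only loads the offset, so symbol + displacement can be combined
into one constant, which matches the cpu_hw_events+0x1328 relocation
in the listing above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU structure; the field is placed at 0x1328 to
 * mirror the displacement seen in the disassembly above. */
struct model_events {
	unsigned char pad[0x1328];
	uint64_t field;
};

/* Old shape: symbol and percpu offset are summed first (the
 * "add %gs:this_cpu_off, %rax" step); the displacement is applied
 * separately at the access. */
static uintptr_t old_shape(uintptr_t sym, uintptr_t cpu_off, size_t disp)
{
	uintptr_t base = sym + cpu_off;

	return base + disp;
}

/* New shape: only the percpu offset is loaded (the
 * "mov %gs:this_cpu_off, %rax" step); sym + disp is a single constant
 * that can be folded into the addressing mode of the access. */
static uintptr_t new_shape(uintptr_t sym, uintptr_t cpu_off, size_t disp)
{
	uintptr_t folded = sym + disp;

	return cpu_off + folded;
}
```

Both functions compute the same address; the difference is only in
which part of the sum the compiler is allowed to see and fold. This
model does not capture the ALTERNATIVE patching or the %gs segment
access itself.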