From: Uros Bizjak
Date: Wed, 11 Oct 2023 20:42:15 +0200
Subject: Re: [PATCH v2 -tip] x86/percpu: Use C for arch_raw_cpu_ptr()
To: Linus Torvalds
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Nadav Amit, Andy Lutomirski, Brian Gerst, Denys Vlasenko, "H. Peter Anvin", Peter Zijlstra, Thomas Gleixner, Josh Poimboeuf
References: <20231010164234.140750-1-ubizjak@gmail.com>

On Tue, Oct 10, 2023 at 8:52 PM Linus Torvalds wrote:
>
> On Tue, 10 Oct 2023 at 11:41, Uros Bizjak wrote:
> >
> > Yes, but does it CSE the load from multiple addresses?
>
> Yes, it should do that just right, because the *asm* itself is
> identical, just the offsets (that gcc then adds separately) would be
> different.
>
> This is not unlike how we depend on gcc CSE'ing the "current" part
> when doing multiple accesses of different members off that:
>
>     static __always_inline struct task_struct *get_current(void)
>     {
>             return this_cpu_read_stable(pcpu_hot.current_task);
>     }
>
> with this_cpu_read_stable() being an inline asm that lacks the memory
> component (the same way the fallback hides it by just using
> "%%gs:this_cpu_off" directly inside the asm, instead of exposing it as
> a memory access to gcc).
>
> Of course, I think that with the "__seg_gs" patches, we *could* expose
> the "%%gs:this_cpu_off" part to gcc, since gcc hopefully then can do
> the alias analysis on that side and see that it can CSE the thing
> anyway.
>
> That might be a better choice than __FORCE_ORDER, in fact.
>
> IOW, something like
>
>     static __always_inline unsigned long new_cpu_offset(void)
>     {
>             unsigned long res;
>             asm(ALTERNATIVE(
>                     "movq " __percpu_arg(1) ",%0",
>                     "rdgsbase %0",
>                     X86_FEATURE_FSGSBASE)
>                     : "=r" (res)
>                     : "m" (this_cpu_off));
>             return res;
>     }
>
> would presumably work together with your __seg_gs stuff.
>
> UNTESTED!!

The attached patch was tested on a target with fsgsbase CPUID and
without it. It works!

The patch improves amd_pmu_enable_virt() in the same way as reported
in the original patch submission and also reduces the number of percpu
offset reads (either from this_cpu_off or with rdgsbase) from 1663 to
1571.

The only drawback is a larger binary size:

    text    data     bss      dec     hex filename
25546594 4387686  808452 30742732 1d518cc vmlinux-new.o
25515256 4387814  808452 30711522 1d49ee2 vmlinux-old.o

that increases by 31k (0.123%), probably due to 1578 rdgsbase alternatives.

I'll prepare and submit a patch for tip/percpu branch.

Uros.

>
>             Linus

Attachment: cpu_ptr-mainline.diff.txt

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 34734d730463..8450fe4a2753 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -31,18 +31,32 @@
 #define __percpu_prefix		"%%"__stringify(__percpu_seg)":"
 #define __my_cpu_offset		this_cpu_read(this_cpu_off)
 
-/*
- * Compared to the generic __my_cpu_offset version, the following
- * saves one instruction and avoids clobbering a temp register.
- */
-#define arch_raw_cpu_ptr(ptr)				\
-({							\
-	unsigned long tcp_ptr__;			\
-	asm ("add " __percpu_arg(1) ", %0"		\
-	     : "=r" (tcp_ptr__)				\
-	     : "m" (this_cpu_off), "0" (ptr));		\
-	(typeof(*(ptr)) __kernel __force *)tcp_ptr__;	\
+#ifdef CONFIG_X86_64
+#define arch_raw_cpu_ptr(ptr)					\
+({								\
+	unsigned long tcp_ptr__;				\
+	asm (ALTERNATIVE("movq " __percpu_arg(1) ", %0",	\
+			 "rdgsbase %0",				\
+			 X86_FEATURE_FSGSBASE)			\
+	     : "=r" (tcp_ptr__)					\
+	     : "m" (this_cpu_off));				\
+								\
+	tcp_ptr__ += (unsigned long)(ptr);			\
+	(typeof(*(ptr)) __kernel __force *)tcp_ptr__;		\
 })
+#else /* CONFIG_X86_64 */
+#define arch_raw_cpu_ptr(ptr)					\
+({								\
+	unsigned long tcp_ptr__;				\
+	asm ("movl " __percpu_arg(1) ", %0"			\
+	     : "=r" (tcp_ptr__)					\
+	     : "m" (this_cpu_off));				\
+								\
+	tcp_ptr__ += (unsigned long)(ptr);			\
+	(typeof(*(ptr)) __kernel __force *)tcp_ptr__;		\
+})
+#endif /* CONFIG_X86_64 */
+
 #else
 #define __percpu_prefix		""
 #endif
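
Illustration only, not part of the patch: a minimal user-space sketch of the idea discussed above, namely reading the per-CPU base once and then adding the variable's offset in plain C so the compiler can see (and reuse) the base read. Everything here is hypothetical scaffolding: `NR_CPUS`, `struct percpu_area`, `areas`, `current_cpu`, `set_cpu()` and `bump_both()` are made-up names, and `my_cpu_offset()` merely stands in for the kernel's inline asm (`movq %gs:this_cpu_off` or `rdgsbase`).

```c
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS 4	/* hypothetical CPU count for this demo */

/* Layout of one CPU's private area; the field names are made up. */
struct percpu_area {
	long counter;
	long hits;
};

/* One private copy of the area per simulated CPU. */
static struct percpu_area areas[NR_CPUS];

/* Stands in for the %gs base: the CPU we pretend to run on. */
static int current_cpu;

static void set_cpu(int cpu)
{
	current_cpu = cpu;
}

/*
 * Stand-in for the asm that loads the per-CPU base.  It takes no
 * per-variable argument, so every use is the same expression.
 */
static inline uintptr_t my_cpu_offset(void)
{
	return (uintptr_t)&areas[current_cpu];
}

/*
 * The variable's offset is added in C rather than inside the asm,
 * so the compiler is free to combine it with later address arithmetic.
 */
#define raw_cpu_ptr(member) \
	((long *)(my_cpu_offset() + offsetof(struct percpu_area, member)))

/* Two accesses off the same base; only the offsets differ. */
static long bump_both(void)
{
	long *c = raw_cpu_ptr(counter);
	long *h = raw_cpu_ptr(hits);

	*c += 1;
	*h += 2;
	return *c + *h;
}
```

After `set_cpu(1)`, a call to `bump_both()` touches only `areas[1]`; the other CPUs' copies stay untouched. Whether a real compiler merges the two base reads in the kernel depends on how the asm is declared, which is exactly the point under discussion in this thread.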