From: Linus Torvalds
Date: Mon, 16 Oct 2023 12:54:35 -0700
Subject: Re: [PATCH -tip 3/3] x86/percpu: *NOT FOR MERGE* Implement arch_raw_cpu_ptr() with RDGSBASE
To: Sean Christopherson
Cc: Ingo Molnar, Uros Bizjak, x86@kernel.org, linux-kernel@vger.kernel.org, Nadav Amit, Andy Lutomirski, Brian Gerst, Denys Vlasenko, "H . Peter Anvin", Peter Zijlstra, Thomas Gleixner, Josh Poimboeuf, Borislav Petkov
References: <20231015202523.189168-1-ubizjak@gmail.com> <20231015202523.189168-3-ubizjak@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 16 Oct 2023 at 12:29, Sean Christopherson wrote:
>
> > Are we certain that ucode on modern x86 CPUs check CR4 for every affected
> > instruction?
>
> Not certain at all.
> I agree the CR4.FSGSBASE thing could be a complete non-issue
> and was just me speculating.

Note that my timings on two fairly different arches do put the cost of
'rdgsbase' at 2 cycles, so it's not microcoded in the sense of jumping
off to some microcode sequence that has a noticeable overhead.

So it's almost certainly what Intel calls a "complex decoder" case that
generates up to 4 uops inline and only decodes in the first decode
slot. One of the uops could easily be a cr4 check, that's not an
uncommon thing for those kinds of instructions.

If somebody wants to try my truly atrocious test program on other
machines, go right ahead. It's attached. I'm not proud of it. It's a
hack.

Do something like this:

    $ gcc -O2 t.c
    $ ./a.out
      "nop"=0l: 0.380925
      "nop"=0l: 0.380640
      "nop"=0l: 0.380373
      "mov %1,%0":"=r"(base):"m"(zero)=0l: 0.787984
      "rdgsbase %0":"=r"(base)=0l: 2.626625

and you'll see that a no-op takes about a third of a cycle on my Zen 2
core (according to this truly stupid benchmark). With some small
overhead.

And a "mov memory to register" shows up as ~3/4 cycle, but it's really
probably that the core can do two of them per cycle, and then the
chain of adds (see how that benchmark makes sure the result is "used")
adds some more overhead etc.

And the 'rdgsbase' is about two cycles, and presumably is fully
serialized, so all the loop overhead and adding results then shows up
as that extra .6 of a cycle on average.

But doing cycle estimations on OoO machines is "guess rough patterns",
so take all the above with a big pinch of salt. And feel free to test
it on other cores than the ones I did (Intel Skylake and AMD Zen 2).
You might want to put your machine into "performance" mode or other
things to actually make it run at the highest frequency to get more
repeatable numbers.

The Skylake core does better on the nops (I think Intel gets rid of
them earlier in the decode stages and they basically disappear in the
uop cache), and can do three loads per cycle.
So rdgsbase looks relatively slower on my Skylake at about 3 cycles
per op, but when you look at an individual instruction, that's a
fairly artificial thing. You don't run these things in the uop cache
in reality.

Linus

[attachment: t.c]

#include <stdio.h>

/* number of timed instructions per test (16 per loop iteration) */
#define NR 100000000

/*
 * Execute the asm statement 'x' NR times, unrolled 16x, adding 'base'
 * into 'sum' after each one so the asm output is actually "used".
 */
#define LOOP(x) for(int i = 0; i < NR/16; i++) do { \
	asm volatile(x); sum += base; \
	asm volatile(x); sum += base; \
	asm volatile(x); sum += base; \
	asm volatile(x); sum += base; \
	asm volatile(x); sum += base; \
	asm volatile(x); sum += base; \
	asm volatile(x); sum += base; \
	asm volatile(x); sum += base; \
	asm volatile(x); sum += base; \
	asm volatile(x); sum += base; \
	asm volatile(x); sum += base; \
	asm volatile(x); sum += base; \
	asm volatile(x); sum += base; \
	asm volatile(x); sum += base; \
	asm volatile(x); sum += base; \
	asm volatile(x); sum += base; \
} while (0)

/* low 32 bits of the TSC; unsigned subtraction copes with one wrap */
static inline unsigned int rdtsc(void)
{
	unsigned int a,d;
	asm volatile("rdtsc":"=a"(a),"=d"(d)::"memory");
	return a;
}

/*
 * Time one asm statement and print the average TSC ticks per instance.
 * #x is printed through "%s" because the asm templates contain '%'.
 * The 'l' after %lu is literal text, as in the sample output in the mail.
 */
#define TEST(x) do { \
	unsigned int s = rdtsc(); \
	LOOP(x); \
	s = rdtsc()-s; \
	fprintf(stderr, "  %s=%lul: %f\n", #x, sum, s / (double)NR); \
} while (0)

int main(int argc, char **argv)
{
	unsigned long base = 0, sum = 0;
	unsigned long zero = 0;

	TEST("nop");
	TEST("nop");
	TEST("nop");
	TEST("mov %1,%0":"=r"(base):"m"(zero));
	/* raises #UD (SIGILL) unless the kernel has enabled CR4.FSGSBASE */
	TEST("rdgsbase %0":"=r"(base));
	return 0;
}