MIME-Version: 1.0
References: <20230906185941.53527-1-ubizjak@gmail.com> <169477710252.27769.14094735545135203449.tip-bot2@tip-bot2>
From: Uros Bizjak
Date: Sun, 17 Sep 2023 20:33:24 +0200
Subject: Re: [tip: x86/asm] x86/percpu: Define {raw,this}_cpu_try_cmpxchg{64,128}
To: Linus Torvalds
Cc: linux-kernel@vger.kernel.org, linux-tip-commits@vger.kernel.org, Ingo Molnar, Peter Zijlstra, x86@kernel.org
Content-Type: multipart/mixed; boundary="0000000000007a68fa06059243b2"
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

--0000000000007a68fa06059243b2
Content-Type: text/plain; charset="UTF-8"

Now also with the patch attached.

Uros.

On Sun, Sep 17, 2023 at 8:31 PM Uros Bizjak wrote:
>
> On Fri, Sep 15, 2023 at 6:45 PM Linus Torvalds wrote:
> >
> > On Fri, 15 Sept 2023 at 04:25, tip-bot2 for Uros Bizjak wrote:
> > >
> > > Several places in mm/slub.o improve from e.g.:
> > >
> > [...]
> > >
> > > to:
> > >
> > >     53bc:  48 8d 4a 40       lea    0x40(%rdx),%rcx
> > >     53c0:  49 8b 1c 07       mov    (%r15,%rax,1),%rbx
> > >     53c4:  4c 89 f8          mov    %r15,%rax
> > >     53c7:  48 8d 37          lea    (%rdi),%rsi
> > >     53ca:  e8 00 00 00 00    call   53cf <...>
> > >                 53cb: R_X86_64_PLT32 this_cpu_cmpxchg16b_emu-0x4
> > >     53cf:  75 bb             jne    538c <...>
> >
> > Honestly, if you care deeply about this code sequence, I think you
> > should also move the "lea" out of the inline asm.
>
> I have to say that the above asm code was shown mostly as an example
> of the improvement, to illustrate how the compare sequence at the end
> of the cmpxchg loop gets eliminated. Being a fairly mechanical change,
> I didn't put much thought into the surrounding code.
>
> > Both
> >
> >     call this_cpu_cmpxchg16b_emu
> >
> > and
> >
> >     cmpxchg16b %gs:(%rsi)
> >
> > are 5 bytes, and I suspect it's easiest to just always put the address
> > in %rsi - whether you call the function or not.
> >
> > It doesn't really make the code generation for the non-call sequence
> > worse, and it gives the compiler more information (ie instead of
> > clobbering %rsi, the compiler knows what %rsi contains).
> >
> > IOW, something like this:
> >
> > -       asm qual (ALTERNATIVE("leaq %P[var], %%rsi; call this_cpu_cmpxchg16b_emu", \
> > +       asm qual (ALTERNATIVE("call this_cpu_cmpxchg16b_emu",           \
> >     ...
> > -                   "c" (new__.high)                                    \
> > -                 : "memory", "rsi");                                   \
> > +                   "c" (new__.high),                                   \
> > +                   "S" (&_var)                                         \
> > +                 : "memory");                                          \
> >
> > should do it.
>
> Yes, and the above change improves slub.o assembly from (current tip
> tree with try_cmpxchg patch applied):
>
>     53b3:  41 8b 44 24 28    mov    0x28(%r12),%eax
>     53b8:  49 8b 3c 24       mov    (%r12),%rdi
>     53bc:  48 8d 4a 40       lea    0x40(%rdx),%rcx
>     53c0:  49 8b 1c 07       mov    (%r15,%rax,1),%rbx
>     53c4:  4c 89 f8          mov    %r15,%rax
>     53c7:  48 8d 37          lea    (%rdi),%rsi
>     53ca:  e8 00 00 00 00    call   53cf
>                 53cb: R_X86_64_PLT32 this_cpu_cmpxchg16b_emu-0x4
>     53cf:  75 bb             jne    538c
>
> to:
>
>     53b3:  41 8b 44 24 28    mov    0x28(%r12),%eax
>     53b8:  49 8b 34 24       mov    (%r12),%rsi
>     53bc:  48 8d 4a 40       lea    0x40(%rdx),%rcx
>     53c0:  49 8b 1c 07       mov    (%r15,%rax,1),%rbx
>     53c4:  4c 89 f8          mov    %r15,%rax
>     53c7:  e8 00 00 00 00    call   53cc
>                 53c8: R_X86_64_PLT32 this_cpu_cmpxchg16b_emu-0x4
>     53cc:  75 be             jne    538c
>
> where an effective reg-reg move "lea (%rdi), %rsi" at 53c7 gets
> removed. And indeed, GCC figures out that %rsi holds the address of
> the variable and emits:
>
>        5:  65 48 0f c7 0e    cmpxchg16b %gs:(%rsi)
>
> alternative replacement.
>
> Now, here comes the best part: We can get rid of the %P modifier. With
> named address spaces (__seg_gs), older GCCs had some problems with %P
> and emitted "%gs:foo" instead of foo, resulting in "Warning: segment
> override on `lea' is ineffectual" assembly warning. With the proposed
> change, we use:
>
> --cut here--
> int __seg_gs g;
>
> void foo (void)
> {
>   asm ("%0 %1" :: "m"(g), "S"(&g));
> }
> --cut here--
>
> and get the desired assembly:
>
>         movl    $g, %esi
>         %gs:g(%rip) %rsi
>
> The above is also in line with [1], where it is said that
> "[__seg_gs/__seg_fs] address spaces are not considered to be subspaces
> of the generic (flat) address space." So, cmpxchg16b_emu.S must use
> %gs to apply segment base offset, which it does.
>
> > Note that I think this is particularly true of the slub code, because
> > afaik, the slub code will *only* use the slow call-out.
> >
> > Why? Because if the CPU actually supports the cmpxchg16b instruction,
> > then the slub code won't even take this path at all - it will do the
> > __CMPXCHG_DOUBLE path, which does an unconditional locked cmpxchg16b.
> >
> > Maybe I'm misreading it. And no, none of this matters. But since I saw
> > the patch fly by, and slub.o mentioned, I thought I'd point out how
> > silly this all is. It's optimizing a code-path that is basically never
> > taken, and when it *is* taken, it can be improved further, I think.
>
> True, but as mentioned above, the slub.o code was used to illustrate
> the effect of the patch. The new locking primitive should be usable in
> a general way and could also be used in other places.
>
> [1] https://gcc.gnu.org/onlinedocs/gcc/Named-Address-Spaces.html#x86-Named-Address-Spaces
>
> Uros.
--0000000000007a68fa06059243b2 Content-Type: text/plain; charset="US-ASCII"; name="p.diff.txt" Content-Disposition: attachment; filename="p.diff.txt" Content-Transfer-Encoding: base64 Content-ID: X-Attachment-Id: f_lmnsp9lt0 ZGlmZiAtLWdpdCBhL2FyY2gveDg2L2luY2x1ZGUvYXNtL3BlcmNwdS5oIGIvYXJjaC94ODYvaW5j bHVkZS9hc20vcGVyY3B1LmgKaW5kZXggYTg3ZGI2MTQwZmUyLi4zMzFhOWQ0ZGNlODIgMTAwNjQ0 Ci0tLSBhL2FyY2gveDg2L2luY2x1ZGUvYXNtL3BlcmNwdS5oCisrKyBiL2FyY2gveDg2L2luY2x1 ZGUvYXNtL3BlcmNwdS5oCkBAIC0yNDIsMTQgKzI0MiwxNSBAQCBkbyB7CQkJCQkJCQkJXAogCW9s ZF9fLnZhciA9IF9vdmFsOwkJCQkJCVwKIAluZXdfXy52YXIgPSBfbnZhbDsJCQkJCQlcCiAJCQkJ CQkJCQlcCi0JYXNtIHF1YWwgKEFMVEVSTkFUSVZFKCJsZWFsICVQW3Zhcl0sICUlZXNpOyBjYWxs IHRoaXNfY3B1X2NtcHhjaGc4Yl9lbXUiLCBcCisJYXNtIHF1YWwgKEFMVEVSTkFUSVZFKCJjYWxs IHRoaXNfY3B1X2NtcHhjaGc4Yl9lbXUiLAkJXAogCQkJICAgICAgImNtcHhjaGc4YiAiIF9fcGVy Y3B1X2FyZyhbdmFyXSksIFg4Nl9GRUFUVVJFX0NYOCkgXAogCQkgIDogW3Zhcl0gIittIiAoX3Zh ciksCQkJCQlcCiAJCSAgICAiK2EiIChvbGRfXy5sb3cpLAkJCQkJXAogCQkgICAgIitkIiAob2xk X18uaGlnaCkJCQkJCVwKIAkJICA6ICJiIiAobmV3X18ubG93KSwJCQkJCVwKLQkJICAgICJjIiAo bmV3X18uaGlnaCkJCQkJCVwKLQkJICA6ICJtZW1vcnkiLCAiZXNpIik7CQkJCQlcCisJCSAgICAi YyIgKG5ld19fLmhpZ2gpLAkJCQkJXAorCQkgICAgIlMiICgmX3ZhcikJCQkJCQlcCisJCSAgOiAi bWVtb3J5Iik7CQkJCQkJXAogCQkJCQkJCQkJXAogCW9sZF9fLnZhcjsJCQkJCQkJXAogfSkKQEAg LTI3MSw3ICsyNzIsNyBAQCBkbyB7CQkJCQkJCQkJXAogCW9sZF9fLnZhciA9ICpfb3ZhbDsJCQkJ CQlcCiAJbmV3X18udmFyID0gX252YWw7CQkJCQkJXAogCQkJCQkJCQkJXAotCWFzbSBxdWFsIChB TFRFUk5BVElWRSgibGVhbCAlUFt2YXJdLCAlJWVzaTsgY2FsbCB0aGlzX2NwdV9jbXB4Y2hnOGJf ZW11IiwgXAorCWFzbSBxdWFsIChBTFRFUk5BVElWRSgiY2FsbCB0aGlzX2NwdV9jbXB4Y2hnOGJf ZW11IiwJCVwKIAkJCSAgICAgICJjbXB4Y2hnOGIgIiBfX3BlcmNwdV9hcmcoW3Zhcl0pLCBYODZf RkVBVFVSRV9DWDgpIFwKIAkJICBDQ19TRVQoeikJCQkJCQlcCiAJCSAgOiBDQ19PVVQoeikgKHN1 Y2Nlc3MpLAkJCQlcCkBAIC0yNzksOCArMjgwLDkgQEAgZG8gewkJCQkJCQkJCVwKIAkJICAgICIr YSIgKG9sZF9fLmxvdyksCQkJCQlcCiAJCSAgICAiK2QiIChvbGRfXy5oaWdoKQkJCQkJXAogCQkg IDogImIiIChuZXdfXy5sb3cpLAkJCQkJXAotCQkgICAgImMiIChuZXdfXy5oaWdoKQkJCQkJXAot 
CQkgIDogIm1lbW9yeSIsICJlc2kiKTsJCQkJCVwKKwkJICAgICJjIiAobmV3X18uaGlnaCksCQkJ CQlcCisJCSAgICAiUyIgKCZfdmFyKQkJCQkJCVwKKwkJICA6ICJtZW1vcnkiKTsJCQkJCQlcCiAJ aWYgKHVubGlrZWx5KCFzdWNjZXNzKSkJCQkJCQlcCiAJCSpfb3ZhbCA9IG9sZF9fLnZhcjsJCQkJ CVwKIAlsaWtlbHkoc3VjY2Vzcyk7CQkJCQkJXApAQCAtMzA5LDE0ICszMTEsMTUgQEAgZG8gewkJ CQkJCQkJCVwKIAlvbGRfXy52YXIgPSBfb3ZhbDsJCQkJCQlcCiAJbmV3X18udmFyID0gX252YWw7 CQkJCQkJXAogCQkJCQkJCQkJXAotCWFzbSBxdWFsIChBTFRFUk5BVElWRSgibGVhcSAlUFt2YXJd LCAlJXJzaTsgY2FsbCB0aGlzX2NwdV9jbXB4Y2hnMTZiX2VtdSIsIFwKKwlhc20gcXVhbCAoQUxU RVJOQVRJVkUoImNhbGwgdGhpc19jcHVfY21weGNoZzE2Yl9lbXUiLAkJXAogCQkJICAgICAgImNt cHhjaGcxNmIgIiBfX3BlcmNwdV9hcmcoW3Zhcl0pLCBYODZfRkVBVFVSRV9DWDE2KSBcCiAJCSAg OiBbdmFyXSAiK20iIChfdmFyKSwJCQkJCVwKIAkJICAgICIrYSIgKG9sZF9fLmxvdyksCQkJCQlc CiAJCSAgICAiK2QiIChvbGRfXy5oaWdoKQkJCQkJXAogCQkgIDogImIiIChuZXdfXy5sb3cpLAkJ CQkJXAotCQkgICAgImMiIChuZXdfXy5oaWdoKQkJCQkJXAotCQkgIDogIm1lbW9yeSIsICJyc2ki KTsJCQkJCVwKKwkJICAgICJjIiAobmV3X18uaGlnaCksCQkJCQlcCisJCSAgICAiUyIgKCZfdmFy KQkJCQkJCVwKKwkJICA6ICJtZW1vcnkiKTsJCQkJCQlcCiAJCQkJCQkJCQlcCiAJb2xkX18udmFy OwkJCQkJCQlcCiB9KQpAQCAtMzM4LDcgKzM0MSw3IEBAIGRvIHsJCQkJCQkJCQlcCiAJb2xkX18u dmFyID0gKl9vdmFsOwkJCQkJCVwKIAluZXdfXy52YXIgPSBfbnZhbDsJCQkJCQlcCiAJCQkJCQkJ CQlcCi0JYXNtIHF1YWwgKEFMVEVSTkFUSVZFKCJsZWFxICVQW3Zhcl0sICUlcnNpOyBjYWxsIHRo aXNfY3B1X2NtcHhjaGcxNmJfZW11IiwgXAorCWFzbSBxdWFsIChBTFRFUk5BVElWRSgiY2FsbCB0 aGlzX2NwdV9jbXB4Y2hnMTZiX2VtdSIsCQlcCiAJCQkgICAgICAiY21weGNoZzE2YiAiIF9fcGVy Y3B1X2FyZyhbdmFyXSksIFg4Nl9GRUFUVVJFX0NYMTYpIFwKIAkJICBDQ19TRVQoeikJCQkJCQlc CiAJCSAgOiBDQ19PVVQoeikgKHN1Y2Nlc3MpLAkJCQlcCkBAIC0zNDYsOCArMzQ5LDkgQEAgZG8g ewkJCQkJCQkJCVwKIAkJICAgICIrYSIgKG9sZF9fLmxvdyksCQkJCQlcCiAJCSAgICAiK2QiIChv bGRfXy5oaWdoKQkJCQkJXAogCQkgIDogImIiIChuZXdfXy5sb3cpLAkJCQkJXAotCQkgICAgImMi IChuZXdfXy5oaWdoKQkJCQkJXAotCQkgIDogIm1lbW9yeSIsICJyc2kiKTsJCQkJCVwKKwkJICAg ICJjIiAobmV3X18uaGlnaCksCQkJCQlcCisJCSAgICAiUyIgKCZfdmFyKQkJCQkJCVwKKwkJICA6 
ICJtZW1vcnkiKTsJCQkJCQlcCiAJaWYgKHVubGlrZWx5KCFzdWNjZXNzKSkJCQkJCQlcCiAJCSpf b3ZhbCA9IG9sZF9fLnZhcjsJCQkJCVwKIAlsaWtlbHkoc3VjY2Vzcyk7CQkJCQkJXAo= --0000000000007a68fa06059243b2--