From: Uros Bizjak
Date: Sun, 17 Sep 2023 20:31:25 +0200
Subject: Re: [tip: x86/asm] x86/percpu: Define {raw,this}_cpu_try_cmpxchg{64,128}
To: Linus Torvalds
Cc: linux-kernel@vger.kernel.org, linux-tip-commits@vger.kernel.org, Ingo Molnar, Peter Zijlstra, x86@kernel.org
References: <20230906185941.53527-1-ubizjak@gmail.com> <169477710252.27769.14094735545135203449.tip-bot2@tip-bot2>

On Fri, Sep 15, 2023 at 6:45 PM Linus Torvalds wrote:
>
> On Fri, 15 Sept 2023 at 04:25, tip-bot2 for Uros Bizjak
> wrote:
> >
> > Several places in mm/slub.o improve from e.g.:
> >
> [...]
> >
> > to:
> >
> >     53bc:       48 8d 4a 40             lea    0x40(%rdx),%rcx
> >     53c0:       49 8b 1c 07             mov    (%r15,%rax,1),%rbx
> >     53c4:       4c 89 f8                mov    %r15,%rax
> >     53c7:       48 8d 37                lea    (%rdi),%rsi
> >     53ca:       e8 00 00 00 00          call   53cf <...>
> >                         53cb: R_X86_64_PLT32    this_cpu_cmpxchg16b_emu-0x4
> >     53cf:       75 bb                   jne    538c <...>
>
> Honestly, if you care deeply about this code sequence, I think you
> should also move the "lea" out of the inline asm.

I have to say that the above asm code was shown mostly as an example
of the improvement, to illustrate how the compare sequence at the end
of the cmpxchg loop gets eliminated. Being a fairly mechanical change,
I didn't put much thought in the surrounding code.

> Both
>
>         call this_cpu_cmpxchg16b_emu
>
> and
>
>         cmpxchg16b %gs:(%rsi)
>
> are 5 bytes, and I suspect it's easiest to just always put the address
> in %rsi - whether you call the function or not.
>
> It doesn't really make the code generation for the non-call sequence
> worse, and it gives the compiler more information (ie instead of
> clobbering %rsi, the compiler knows what %rsi contains).
>
> IOW, something like this:
>
> -       asm qual (ALTERNATIVE("leaq %P[var], %%rsi; call this_cpu_cmpxchg16b_emu", \
> +       asm qual (ALTERNATIVE("call this_cpu_cmpxchg16b_emu", \
> ...
> -                   "c" (new__.high)                               \
> -                 : "memory", "rsi");                              \
> +                   "c" (new__.high),                              \
> +                   "S" (&_var)                                    \
> +                 : "memory");                                     \
>
> should do it.

Yes, and the above change improves slub.o assembly from (current tip
tree with try_cmpxchg patch applied):

    53b3:       41 8b 44 24 28          mov    0x28(%r12),%eax
    53b8:       49 8b 3c 24             mov    (%r12),%rdi
    53bc:       48 8d 4a 40             lea    0x40(%rdx),%rcx
    53c0:       49 8b 1c 07             mov    (%r15,%rax,1),%rbx
    53c4:       4c 89 f8                mov    %r15,%rax
    53c7:       48 8d 37                lea    (%rdi),%rsi
    53ca:       e8 00 00 00 00          call   53cf
                        53cb: R_X86_64_PLT32    this_cpu_cmpxchg16b_emu-0x4
    53cf:       75 bb                   jne    538c

to:

    53b3:       41 8b 44 24 28          mov    0x28(%r12),%eax
    53b8:       49 8b 34 24             mov    (%r12),%rsi
    53bc:       48 8d 4a 40             lea    0x40(%rdx),%rcx
    53c0:       49 8b 1c 07             mov    (%r15,%rax,1),%rbx
    53c4:       4c 89 f8                mov    %r15,%rax
    53c7:       e8 00 00 00 00          call   53cc
                        53c8: R_X86_64_PLT32    this_cpu_cmpxchg16b_emu-0x4
    53cc:       75 be                   jne    538c

where an effective reg-reg move "lea (%rdi), %rsi" at 53c7 gets
removed: the address is now loaded directly into %rsi at 53b8.

And indeed, GCC figures out that %rsi holds the address of the
variable and emits:

       5:       65 48 0f c7 0e          cmpxchg16b %gs:(%rsi)

as the alternative replacement.

Now, here comes the best part: We can get rid of the %P modifier. With
named address spaces (__seg_gs), older GCCs had some problems with %P
and emitted "%gs:foo" instead of foo, resulting in a "Warning: segment
override on `lea' is ineffectual" assembler warning. With the proposed
change, we use:

--cut here--
int __seg_gs g;

void foo (void)
{
  asm ("%0 %1" :: "m"(g), "S"(&g));
}
--cut here--

and get the desired assembly:

        movl    $g, %esi
        %gs:g(%rip) %rsi

The above is also in line with [1], where it is said that
"[__seg_gs/__seg_fs] address spaces are not considered to be subspaces
of the generic (flat) address space." So, cmpxchg16b_emu.S must use
%gs to apply the segment base offset, which it does.

> Note that I think this is particularly true of the slub code, because
> afaik, the slub code will *only* use the slow call-out.
>
> Why?
> Because if the CPU actually supports the cmpxchgb16 instruction,
> then the slub code won't even take this path at all - it will do the
> __CMPXCHG_DOUBLE path, which does an unconditional locked cmpxchg16b.
>
> Maybe I'm misreading it. And no, none of this matters. But since I saw
> the patch fly by, and slub.o mentioned, I thought I'd point out how
> silly this all is. It's optimizing a code-path that is basically never
> taken, and when it *is* taken, it can be improved further, I think.

True, but as mentioned above, the slub.o code was used only to
illustrate the effect of the patch. The new locking primitive should
be usable in a general way and could also be used in other places.

[1] https://gcc.gnu.org/onlinedocs/gcc/Named-Address-Spaces.html#x86-Named-Address-Spaces

Uros.
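
For readers who have not seen the pattern, the effect mentioned above
(the compare at the end of the cmpxchg loop going away) can be
sketched in plain userspace C. This is only an illustration built on
C11 atomics, not the kernel's per-CPU macros: demo_cmpxchg,
demo_try_cmpxchg and the two add_with_* helpers are hypothetical names
made up for this sketch, and no segment prefix or cmpxchg16b is
involved.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for a cmpxchg()-style primitive: it returns the
 * value that was found in *ptr, so the caller has to compare that value
 * against 'old' to learn whether the exchange happened.
 */
static inline unsigned long demo_cmpxchg(atomic_ulong *ptr,
                                         unsigned long old,
                                         unsigned long new_val)
{
    /* On failure the library call stores the current value into 'old'. */
    atomic_compare_exchange_strong(ptr, &old, new_val);
    return old;
}

/*
 * Hypothetical stand-in for a try_cmpxchg()-style primitive: it returns
 * success as a bool and, on failure, refreshes *old with the current
 * value, so the caller needs no compare of its own.
 */
static inline bool demo_try_cmpxchg(atomic_ulong *ptr,
                                    unsigned long *old,
                                    unsigned long new_val)
{
    return atomic_compare_exchange_strong(ptr, old, new_val);
}

/* cmpxchg-style loop: note the explicit "prev == old" compare. */
static unsigned long add_with_cmpxchg(atomic_ulong *ptr, unsigned long delta)
{
    unsigned long old = atomic_load(ptr);

    for (;;) {
        unsigned long prev = demo_cmpxchg(ptr, old, old + delta);

        if (prev == old)
            break;      /* exchange succeeded */
        old = prev;     /* retry with the freshly observed value */
    }
    return old + delta;
}

/* try_cmpxchg-style loop: the primitive's result drives the loop. */
static unsigned long add_with_try_cmpxchg(atomic_ulong *ptr,
                                          unsigned long delta)
{
    unsigned long old = atomic_load(ptr);

    while (!demo_try_cmpxchg(ptr, &old, old + delta))
        ;               /* 'old' was refreshed by the failed attempt */
    return old + delta;
}
```

Both helpers return the updated value. In the second loop the compiler
can branch directly on the flag produced by the compare-and-exchange
instead of emitting a separate compare, which is the kind of saving
the slub.o listings above were meant to show.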