From: Uros Bizjak
Date: Fri, 17 Nov 2023 17:37:43 +0100
Subject: Re: [PATCH] x86/smp: Use atomic_try_cmpxchg() to micro-optimize native_stop_other_cpus()
To: Dave Hansen
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, "H. Peter Anvin", "Peter Zijlstra (Intel)"
References: <20231114164416.208285-1-ubizjak@gmail.com> <367bc727-3f26-4e78-8e58-af959760b3fc@intel.com>
In-Reply-To: <367bc727-3f26-4e78-8e58-af959760b3fc@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Nov 17, 2023 at 5:18 PM Dave Hansen wrote:
>
> On 11/14/23 08:43, Uros Bizjak wrote:
> > Use atomic_try_cmpxchg() instead of atomic_cmpxchg(*ptr, old, new) == old
> > in native_stop_other_cpus(). On x86 the CMPXCHG instruction returns success
> > in the ZF flag, so this change saves a compare after CMPXCHG. Together
> > with a small code reorder, the generated asm code improves from:
> >
> >   74: 8b 05 00 00 00 00       mov    0x0(%rip),%eax
> >   7a: 41 54                   push   %r12
> >   7c: 55                      push   %rbp
> >   7d: 65 8b 2d 00 00 00 00    mov    %gs:0x0(%rip),%ebp
> >   84: 53                      push   %rbx
> >   85: 85 c0                   test   %eax,%eax
> >   87: 75 71                   jne    fa
> >   89: b8 ff ff ff ff          mov    $0xffffffff,%eax
> >   8e: f0 0f b1 2d 00 00 00    lock cmpxchg %ebp,0x0(%rip)
> >   95: 00
> >   96: 83 f8 ff                cmp    $0xffffffff,%eax
> >   99: 75 5f                   jne    fa
> >
> > to:
> >
> >   74: 8b 05 00 00 00 00       mov    0x0(%rip),%eax
> >   7a: 85 c0                   test   %eax,%eax
> >   7c: 0f 85 84 00 00 00       jne    106
> >   82: 41 54                   push   %r12
> >   84: b8 ff ff ff ff          mov    $0xffffffff,%eax
> >   89: 55                      push   %rbp
> >   8a: 53                      push   %rbx
> >   8b: 65 8b 1d 00 00 00 00    mov    %gs:0x0(%rip),%ebx
> >   92: f0 0f b1 1d 00 00 00    lock cmpxchg %ebx,0x0(%rip)
> >   99: 00
> >   9a: 75 5e                   jne    fa
> >
> > Please note early exit and lack of CMP after CMPXCHG.
>
> Uros, I really do appreciate that you are trying to optimize these
> paths. But the thing we have to balance is the _need_ for optimization
> with the chance that this will break something.
>
> This is about as much of a slow path as we have in the kernel. It's
> only used at shutdown, right? That means this is one of the places in
> the kernel that least needs optimization. It can only possibly get run
> once per boot.
>
> So, the benefit is that it might make this code a few cycles faster. In
> practice, it might not even be measurably faster.
>
> On the other hand, this is relatively untested and also makes the C code
> more complicated.
>
> Is there some other side benefit that I'm missing here? Applying this
> patch doesn't seem to have a great risk/reward ratio.

Yes. In addition to the better asm code, I think that the use of the
magic constant (-1) is not descriptive at all. I tried to make this code
look like nmi_panic() from kernel/panic.c, which has similar
functionality, and to make it clear that this constant is the expected
old_cpu value (same as in nmi_panic()).

Also, from converting many cmpxchg users to try_cmpxchg, it has become
evident that in cases like this (usage in "if" clauses) the correct
locking primitive is try_cmpxchg.

Additionally, in this particular case, the gain is not speed but a small
code-size saving, achieved with the same functionality.

Thanks,
Uros.
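
For readers following the thread, the two idioms being compared can be sketched in userspace C11. This is only an analogy under stated assumptions, not the kernel code or the patch: cmpxchg_val() is a hypothetical stand-in for the value-returning atomic_cmpxchg(), C11's atomic_compare_exchange_strong() plays the role of atomic_try_cmpxchg() (both return a success flag and update the expected value on failure), and NO_CPU, claim_old_style(), claim_try_style() and check_claims() are illustrative names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NO_CPU (-1)	/* hypothetical name for the "unclaimed" value */

/* Hypothetical stand-in for a value-returning cmpxchg: returns the value
 * that was in *v before the operation, whether or not the swap happened. */
static int cmpxchg_val(atomic_int *v, int old_val, int new_val)
{
	/* On failure, old_val is overwritten with the current value of *v;
	 * on success it still holds the previous value. */
	atomic_compare_exchange_strong(v, &old_val, new_val);
	return old_val;
}

/* Old idiom: swap, then compare the returned value against the constant. */
static bool claim_old_style(atomic_int *owner, int this_cpu)
{
	return cmpxchg_val(owner, NO_CPU, this_cpu) == NO_CPU;
}

/* try-style idiom: the expected value lives in a named variable and the
 * primitive itself reports success, so no separate compare is written. */
static bool claim_try_style(atomic_int *owner, int this_cpu)
{
	int old_cpu = NO_CPU;

	return atomic_compare_exchange_strong(owner, &old_cpu, this_cpu);
}

/* Both styles must agree: the first claimant wins, later ones lose. */
static void check_claims(void)
{
	atomic_int owner = NO_CPU;

	assert(claim_try_style(&owner, 3));
	assert(!claim_old_style(&owner, 5));
	assert(atomic_load(&owner) == 3);
}
```

Calling check_claims() exercises both helpers against the same variable. Whether a compiler emits the extra CMP for the first form depends on the compiler and target; the sketch only shows that the two forms are functionally equivalent.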