From: Uros Bizjak
To: linux-kernel@vger.kernel.org
Cc: Uros Bizjak, x86@kernel.org
Subject: [PATCH] x86/asm: Use CC_SET/CC_OUT in percpu_cmpxchg16b_double
Date: Fri, 28 Dec 2018 16:37:59 +0100
Message-Id: <20181228153759.3132-1-ubizjak@gmail.com>

Use CC_SET(z)/CC_OUT(z) instead of an explicit setz instruction.
With these two defines, a compiler that supports condition code (flag)
outputs from inline assembly generates a single conditional jump
instruction, e.g.:

  [lea    (%rdi),%rsi
   callq  this_cpu_cmpxchg16b_emu]
   -- or --
  [cmpxchg16b %gs:(%rdi)]
   jne    199764

instead of

  [lea    (%rdi),%rsi
   callq  this_cpu_cmpxchg16b_emu]
   -- or --
  [cmpxchg16b %gs:(%rdi)]
   sete   %cl
   test   %cl,%cl
   je     19ae04

The complication with percpu_cmpxchg16b_double is that the definition
defaults to a call to the this_cpu_cmpxchg16b_emu library function,
which (depending on the X86_FEATURE_CX16 flag) is later patched with
the real cmpxchg16b instruction.
To solve this complication, the patch changes
this_cpu_cmpxchg16b_emu library function to return the result in
ZF flag of %rflags register, instead of %al register. Please also note
that instead of popf instruction (which restores flags register to
a previously saved state), the patched function uses sti, but followed
by a nop, which ends the inhibition of interrupts early.

The patch also introduces alternative_io_tail definition. This
definition can take a tail instruction, common to all alternatives.
By using this definition, it is possible to remove setz from cmpxchg16b
alternatives, saved in .altinstr_replacement section, thus saving a few
bytes from the binary.

Signed-off-by: Uros Bizjak
Cc: x86@kernel.org
---
 arch/x86/include/asm/alternative.h |  6 ++++++
 arch/x86/include/asm/percpu.h      | 14 ++++++++------
 arch/x86/lib/cmpxchg16b_emu.S      | 16 ++++++----------
 3 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index 0660e14690c8..49b29990950a 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -205,6 +205,12 @@ static inline int alternatives_text_reserved(void *start, void *end)
 	asm volatile (ALTERNATIVE(oldinstr, newinstr, feature)		\
 		: output : "i" (0), ## input)
 
+/* Like alternative_io, but with a common tail instruction. */
+#define alternative_io_tail(oldinstr, newinstr, feature, tail,	\
+			    output, input...)				\
+	asm volatile (ALTERNATIVE(oldinstr, newinstr, feature) tail	\
+		: output : "i" (0), ## input)
+
 /* Like alternative_io, but for replacing a direct call with another one. */
 #define alternative_call(oldfunc, newfunc, feature, output, input...)	\
 	asm volatile (ALTERNATIVE("call %P[old]", "call %P[new]", feature) \
diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 1a19d11cfbbd..9cf2e78eb4b3 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -495,12 +495,14 @@ do {									\
 	bool __ret;							\
 	typeof(pcp1) __o1 = (o1), __n1 = (n1);				\
 	typeof(pcp2) __o2 = (o2), __n2 = (n2);				\
-	alternative_io("leaq %P1,%%rsi\n\tcall this_cpu_cmpxchg16b_emu\n\t", \
-		       "cmpxchg16b " __percpu_arg(1) "\n\tsetz %0\n\t",	\
-		       X86_FEATURE_CX16,				\
-		       ASM_OUTPUT2("=a" (__ret), "+m" (pcp1),		\
-				   "+m" (pcp2), "+d" (__o2)),		\
-		       "b" (__n1), "c" (__n2), "a" (__o1) : "rsi");	\
+	alternative_io_tail("leaq %P1,%%rsi\n\tcall this_cpu_cmpxchg16b_emu", \
+			    "cmpxchg16b "__percpu_arg(1),		\
+			    X86_FEATURE_CX16,				\
+			    CC_SET(z),					\
+			    ASM_OUTPUT2(CC_OUT(z) (__ret),		\
+					"+m" (pcp1), "+m" (pcp2),	\
+					"+a" (__o1), "+d" (__o2)),	\
+			    "b" (__n1), "c" (__n2) : "rsi");		\
 	__ret;								\
 })
 
diff --git a/arch/x86/lib/cmpxchg16b_emu.S b/arch/x86/lib/cmpxchg16b_emu.S
index 9b330242e740..d8c1ae48e0d9 100644
--- a/arch/x86/lib/cmpxchg16b_emu.S
+++ b/arch/x86/lib/cmpxchg16b_emu.S
@@ -17,20 +17,20 @@
  * %rdx : high 64 bits of old value
  * %rbx : low 64 bits of new value
  * %rcx : high 64 bits of new value
- * %al  : Operation successful
+ *
+ * Outputs:
+ * %rflags.zf: set if the destination operand and %rdx:%rax are equal
  */
 ENTRY(this_cpu_cmpxchg16b_emu)
 
 #
-# Emulate 'cmpxchg16b %gs:(%rsi)' except we return the result in %al not
-# via the ZF. Caller will access %al to get result.
+# Emulate 'cmpxchg16b %gs:(%rsi)'
 #
 # Note that this is only useful for a cpuops operation. Meaning that we
 # do *not* have a fully atomic operation but just an operation that is
 # *atomic* on a single cpu (as provided by the this_cpu_xx class of
 # macros).
 #
-	pushfq
 	cli
 
 	cmpq PER_CPU_VAR((%rsi)), %rax
@@ -41,13 +41,9 @@ ENTRY(this_cpu_cmpxchg16b_emu)
 	movq %rbx, PER_CPU_VAR((%rsi))
 	movq %rcx, PER_CPU_VAR(8(%rsi))
 
-	popfq
-	mov $1, %al
-	ret
-
 .Lnot_same:
-	popfq
-	xor %al,%al
+	sti
+	nop
 	ret
 
 ENDPROC(this_cpu_cmpxchg16b_emu)
-- 
2.20.1
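
For readers unfamiliar with the mechanism behind CC_SET(z)/CC_OUT(z), here
is a minimal user-space sketch of the underlying compiler feature. It is
not part of the patch. It assumes a compiler that defines
__GCC_ASM_FLAG_OUTPUTS__ on x86-64 (GCC 6 or later, for example); the
helper name cmpxchg_u64 is hypothetical, and it uses a plain 64-bit
"lock cmpxchgq" rather than the per-cpu cmpxchg16b from the patch. On
other compilers or architectures the sketch falls back to the generic
__atomic builtin so that it still builds.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (illustration only): compare *ptr with old and,
 * if they are equal, store new_val. Returns true on success.
 */
static bool cmpxchg_u64(uint64_t *ptr, uint64_t old, uint64_t new_val)
{
#if defined(__x86_64__) && defined(__GCC_ASM_FLAG_OUTPUTS__)
	bool ok;

	/*
	 * "=@ccz" makes ZF itself the output operand, so no setz is
	 * needed in the asm body and the compiler may branch on the
	 * flag directly. No "cc" clobber is listed alongside it.
	 */
	__asm__ volatile("lock cmpxchgq %[new_val], %[mem]"
			 : "=@ccz" (ok), [mem] "+m" (*ptr), "+a" (old)
			 : [new_val] "r" (new_val)
			 : "memory");
	return ok;
#else
	/* Fallback for compilers without flag outputs: generic builtin. */
	return __atomic_compare_exchange_n(ptr, &old, new_val, false,
					   __ATOMIC_SEQ_CST,
					   __ATOMIC_SEQ_CST);
#endif
}
```

When the caller writes "if (cmpxchg_u64(&v, old, new))", a compiler with
flag outputs can emit the cmpxchg followed by a single jne/je, which is
the same effect the commit message shows for percpu_cmpxchg16b_double.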