From: Bill Wendling
Date: Fri, 11 Feb 2022 15:33:53 -0800
Subject: Re: [PATCH v4] x86: use builtins to read eflags
To: David Laight
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
    x86@kernel.org, H. Peter Anvin, Nathan Chancellor, Nick Desaulniers,
    Juergen Gross, Peter Zijlstra, Andy Lutomirski, llvm@lists.linux.dev,
    linux-kernel@vger.kernel.org
In-Reply-To: <6b83fa302b974f749c60fc6c456e055f@AcuMS.aculab.com>
References: <20220204005742.1222997-1-morbo@google.com>
    <20220210223134.233757-1-morbo@google.com>
    <6b83fa302b974f749c60fc6c456e055f@AcuMS.aculab.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Feb 11, 2022 at 2:10 PM David Laight wrote:
>
> From: Bill Wendling
> Sent: 11 February 2022 19:26
> >
> > On Fri, Feb 11, 2022 at 8:40 AM David Laight wrote:
> > > From: Bill Wendling
> > > > Sent: 10 February 2022 22:32
> > > >
> > > > GCC and Clang both have builtins to read and write the EFLAGS register.
> > > > This allows the compiler to determine the best way to generate this
> > > > code, which can improve code generation.
> > > >
> > > > This issue arose due to Clang's issue with the "=rm" constraint. Clang
> > > > chooses to be conservative in these situations, and so uses memory
> > > > instead of registers. This is a known issue, which is currently being
> > > > addressed.
> > > >
> > > > However, using builtins is beneficial in general, because it removes the
> > > > burden of determining what's the way to read the flags register from the
> > > > programmer and places it on to the compiler, which has the information
> > > > needed to make that decision.
> > >
> > > Except that neither gcc nor clang attempt to make that decision.
> > > They always do pushf; pop ax;
> > >
> > It looks like both GCC and Clang pop into virtual registers. The
> > register allocator is then able to determine if it can allocate a
> > physical register or if a stack slot is required.
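To make the builtin approach concrete, here is a minimal user-space sketch of reading the flags through the compiler builtins on x86. The names read_eflags() and EFLAGS_RESERVED_BIT1 are made up for illustration; this is not the code in the patch. The point is that the builtin is an ordinary expression, so its result is treated like any other value and the register allocator decides where it lives.

```c
/*
 * Sketch only: read EFLAGS through the GCC/Clang builtins instead of
 * inline asm. read_eflags() and EFLAGS_RESERVED_BIT1 are illustrative
 * names, not kernel APIs.
 */

/* Bit 1 of EFLAGS is reserved and always reads as 1. */
#define EFLAGS_RESERVED_BIT1 0x2UL

#if defined(__x86_64__)
static inline unsigned long read_eflags(void)
{
	/* 64-bit variant; the compiler emits pushf followed by a pop. */
	return __builtin_ia32_readeflags_u64();
}
#elif defined(__i386__)
static inline unsigned long read_eflags(void)
{
	/* 32-bit variant of the same builtin. */
	return __builtin_ia32_readeflags_u32();
}
#endif
```

A caller just writes `unsigned long flags = read_eflags();`; whether flags ends up in a register or in a stack slot is left to the register allocator rather than to an asm constraint.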
>
> Doing:
> int fl;
> void f(void) { fl = __builtin_ia32_readeflags_u64(); }
> Seems to use register.
> If it pops to a virtual register it will probably never pop
> into a real target location.
>
> See https://godbolt.org/z/8aY8o8rhe
>
Yes, it does produce the appropriate code. What I meant was that,
internal to the compiler, the code generated before register allocation
contains "virtual" registers, i.e., registers that would exist if the
machine had an infinite number of registers to use. It's the job of the
register allocator to replace those virtual registers with physical
ones, or with spills to memory if no registers are available. My
(completely made up) example was meant to show that in an extreme case,
where no registers are available and no amount of code motion can
relieve the register pressure, both clang and gcc produce the
appropriate spills to memory.

> But performance wise the pop+mov is just one byte longer.
> Instruction decode time might be higher for two instruction, but since
> 'pop mem' generates 2 uops (intel) it may be constrained to the first
> decoder (I can't remember the exact details), but the separate pop+mov
> can be decoded in parallel - so could end up faster.
>
It's the spill to memory that I'm trying to avoid here. I'm not
concerned about a "pop ; mov" combination (though even that is one
instruction too many except in the most extreme cases).

> Actual execution time (if that makes any sense) is really the same.
> Two operations, one pop and one memory write.
>
> I bet you'd be hard pressed to find a piece of code where it even made
> a consistent difference.
>
> > > > ...
> > > > v4: - Clang now no longer generates stack frames when using these builtins.
> > > >     - Corrected misspellings.
> > >
> > > While clang 'head' has been fixed, it seems a bit premature to say
> > > it is 'fixed' enough for all clang builds to use the builtin.
> > >
> > True, but it's been cherry-picked into the clang 14.0.0 branch, which
> > is scheduled for release in March.
> >
> > > Seems better to change it (back) to "=r" and comment that this
> > > is currently as good as __builtin_ia32_readeflags_u64() and that
> > > clang makes a 'pigs breakfast' of "=rm" - which has only marginal
> > > benefit.
> > >
> > That would be okay as far as code generation is concerned, but it does
> > place the burden of correctness back on the programmer. Also, it was
> > that at some point, but was changed to "=rm" here. :-)
>
> As I said, a comment should stop the bounce.
>
The constraints on this piece of code went from "=g" to "=r" to "=rm".
Which is the correct one, and why? The last one apparently works in all
situations, because the compiler is left to choose the output
destination: register or memory. The "=r" constraint may fail if a
register cannot be allocated for some reason.

> ...
> > I was able to come up with an example where GCC generates "pushf ; pop mem":
> >
> > https://godbolt.org/z/9rocjdoaK
> >
> > (Clang generates a variation of "pop mem," and is horrible code, but
> > it's meant for demonstration purposes only.) One interesting thing
> > about the use of the builtins is that if at all possible, the "pop"
> > instruction may be moved away from the "pushf" if it's safe and would
> > reduce register pressure.
>
> I wouldn't trust the compiler to get stack pointer relative accesses
> right if it does move them apart.
> Definitely scope for horrid bugs ;-)
>
Compilers are pretty good at knowing such things and doing them correctly.

-bw