From: David Laight
To: 'Segher Boessenkool', Linus Torvalds
CC: Andrew Cooper, Nick Desaulniers, "H. Peter Anvin", Bill Wendling, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, "maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)", Nathan Chancellor, "Juergen Gross", Peter Zijlstra, "Andy Lutomirski", "llvm@lists.linux.dev", LKML, linux-toolchains
Subject: RE: [PATCH v5] x86: use builtins to read eflags
Date: Fri, 18 Mar 2022 23:52:01 +0000
Message-ID: <04f65d1a90f640d4943c810f37016b01@AcuMS.aculab.com>
In-Reply-To: <20220318230425.GT614@gate.crashing.org>

From: Segher Boessenkool
> Sent: 18 March 2022 23:04
...
> The vast majority of compiler builtins are for simple transformations
> that the machine can do, for example with vector instructions.  Using
> such builtins does *not* instruct the compiler to use those machine
> insns, even if the builtin name would suggest that; instead, it asks to
> have code generated that has such semantics.
> So it can be optimised by
> the compiler, much more than what can be done with inline asm.

Bah.
I wrote some small functions to convert blocks of 80 audio samples
between C 'float' and the 8-bit u-law and A-law floating point formats
- one set uses the F16C conversions for denormalised values.
I really want the instructions I've asked for, in the order I've asked for them.
I don't want the compiler doing stupid things.
(Like deciding to try to vectorise the bit of code at the end that
handled non-80-byte blocks.)

> It also can be optimised better by the compiler than if you would
> open-code the transforms (if you ask to frobnicate something, the
> compiler will know you want to frobnicate that thing, and it will not
> always recognise that is what you want if you just write it out in more
> general code).

Yep.
If I write
	for (i = 0; i < n; i++) foo[i] = bar[i];
I want a loop - not a call to memcpy().
If I want a memcpy() I'll call memcpy().

And if I write:
	do {
		sum64a += buff32[0];
		sum64b += buff32[1];
		sum64a += buff32[2];
		sum64b += buff32[3];
		buff32 += 4;
	} while (buff32 != lim);
I don't want to see 'buff32[1] + buff32[2]' anywhere!
That loop has half a chance of running at 8 bytes/clock.
But not how gcc compiles it.

> Well-chosen builtin names are also much more readable than the best
> inline asm can ever be, and it can express much more in a much smaller
> space, without so much opportunity to make mistakes, either.

Hmmm...
Trying to write that SSE2/AVX code was a nightmare.
Chase through the cpu instruction set trying to sort out the name
of the required instruction.
Then search through the 'intrinsic' header to find the name of the builtin.
Then disassemble the code to check that I'd got the right one.
I'm pretty sure the asm would have been shorter and needed just
as many comments.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
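
For anyone wanting to try the summing loop from the message above, it can be wrapped in a complete function, roughly as below. This is only a sketch: the function name sum_words, the n_words parameter and the final fold of the two accumulators into one value are assumptions added for illustration, and are not part of the original code.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical wrapper around the loop from the message.
 * Sums 32-bit words into two independent 64-bit accumulators so that
 * the two add chains do not depend on each other.
 * n_words must be a non-zero multiple of 4.
 */
uint64_t sum_words(const uint32_t *buff32, size_t n_words)
{
	const uint32_t *lim = buff32 + n_words;
	uint64_t sum64a = 0, sum64b = 0;

	do {
		sum64a += buff32[0];
		sum64b += buff32[1];
		sum64a += buff32[2];
		sum64b += buff32[3];
		buff32 += 4;
	} while (buff32 != lim);

	/* Assumed final step: combine the two partial sums. */
	return sum64a + sum64b;
}
```

Compiling this with optimisation and reading the disassembly shows whether the compiler kept the two separate accumulator chains or re-associated the adds.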