From: David Laight
To: 'Segher Boessenkool'
CC: 'Rasmus Villemoes', Christophe Leroy, "linuxppc-dev@lists.ozlabs.org", Paul Mackerras, "linux-kernel@vger.kernel.org"
Subject: RE: [PATCH] powerpc/vdso32: Add missing _restgpr_31_x to fix build failure
Date: Tue, 16 Mar 2021 09:35:26 +0000
In-Reply-To: <20210315235947.GD16691@gate.crashing.org>

From: Segher Boessenkool
> Sent: 16 March 2021 00:00
...
> > Although you may need to disable loop unrolling (often dubious at best)
> > and either force or disable some function inlining.
>
> The cases where GCC does loop unrolling at -O2 always help quite a lot.
> Or, do you have a counter-example?  We'd love to see one.

The real problem with loop unrolling is that quite often a modern
out-of-order superscalar processor actually has 'spare' execution
cycles where the loop control can be done 'for free'.

Sometimes you do need to unroll (or interleave) a couple of times
to get enough spare execution cycles.

But the unrolled loop has to read a lot more code into cache - so
unless the code is 'hot cache' (which is usually arranged for
benchmarking) those delays apply as well.
The larger code footprint also displaces other code.

My real annoyance with gcc is unrolling (and vectorizing) loops
that I know are never executed as many times as even one copy
of the unrolled loop.

As an example, Intel CPUs (Ivy Bridge onwards) execute the following
code (the middle of the IP checksum) at 8 bytes/clock.
(Limited by the carry flag.)
It just doesn't need any further unrolling.

+		"10:	jecxz 20f\n"
+		"	adc   (%[buff], %[len]), %[sum_0]\n"
+		"	adc   8(%[buff], %[len]), %[sum_1]\n"
+		"	lea   32(%[len]), %[len_tmp]\n"
+		"	adc   16(%[buff], %[len]), %[sum_0]\n"
+		"	adc   24(%[buff], %[len]), %[sum_1]\n"
+		"	mov   %[len_tmp], %[len]\n"
+		"	jmp   10b\n"

Annoyingly that loop is slow on my 8-core Atom.

The existing code only does 4 bytes/clock on Intel CPUs prior to
either Broadwell or Haswell (I've forgotten which) in spite of
much more unrolling.

> And yup, inlining is hard.  GCC's heuristics there are very good
> nowadays, but any single decision has big effects.  Doing the important
> spots manually (always_inline or noinline) has good payoff.

Latest inline gripe was a function replicated about 20 times when
the non-inline version was a register load and 'tail call'.
The inlining is just bloat.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
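
For anyone not fluent in x86 asm, here is a rough C sketch of the arithmetic the adc loop above performs: two independent 64-bit accumulators, each added with end-around carry, 32 bytes per iteration. It is only an illustration under assumptions. The names add_carry64(), csum_middle() and csum_fold() are made up for this sketch and are not kernel functions, the length is assumed to be a multiple of 32 (head and tail handling is omitted), and nothing here implies a compiler will emit the adc chain or reach 8 bytes/clock from this source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Add with end-around carry: a carry out of bit 63 is added back into bit 0.
 * For a ones-complement sum this gives the same result as chaining the
 * additions through the carry flag with adc. */
static uint64_t add_carry64(uint64_t sum, uint64_t val)
{
	sum += val;
	return sum + (sum < val);
}

/* Hypothetical helper: sum the 'middle' of a buffer, 32 bytes per pass,
 * alternating between two accumulators as the asm does with sum_0/sum_1. */
static uint64_t csum_middle(const unsigned char *buff, size_t len)
{
	uint64_t sum_0 = 0, sum_1 = 0;

	assert(len % 32 == 0);	/* sketch only: no head/tail handling */

	for (size_t i = 0; i < len; i += 32) {
		uint64_t w[4];

		memcpy(w, buff + i, sizeof(w));	/* avoids alignment/aliasing issues */
		sum_0 = add_carry64(sum_0, w[0]);
		sum_1 = add_carry64(sum_1, w[1]);
		sum_0 = add_carry64(sum_0, w[2]);
		sum_1 = add_carry64(sum_1, w[3]);
	}
	return add_carry64(sum_0, sum_1);
}

/* Fold the 64-bit ones-complement sum down to 16 bits. */
static uint16_t csum_fold(uint64_t sum)
{
	sum = (sum & 0xffffffff) + (sum >> 32);
	sum = (sum & 0xffffffff) + (sum >> 32);
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}
```

The two accumulators are what let consecutive additions overlap: each chain only depends on its own previous result, so the loop control can hide in the spare execution slots.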