Date: Fri, 18 May 2018 10:20:27 -0500
From: Segher Boessenkool
To: Christophe Leroy
Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
    linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 5/5] powerpc/lib: inline memcmp() for small constant sizes
Message-ID: <20180518152027.GD17342@gate.crashing.org>
References: <8a6f90d882c8b60e5fa0826cd23dd70a92075659.1526553552.git.christophe.leroy@c-s.fr>
    <20180517135551.GT17342@gate.crashing.org>
    <7a2c3de9-4223-ec47-b3c0-1336c9cdbeee@c-s.fr>
In-Reply-To: <7a2c3de9-4223-ec47-b3c0-1336c9cdbeee@c-s.fr>

On Fri, May 18,
2018 at 12:35:48PM +0200, Christophe Leroy wrote:
> On 05/17/2018 03:55 PM, Segher Boessenkool wrote:
> >On Thu, May 17, 2018 at 12:49:58PM +0200, Christophe Leroy wrote:
> >>In my 8xx configuration, I get 208 calls to memcmp()
> >Could you show results with a more recent GCC?  What version was this?
>
> It was with the latest GCC version I have available in my environment,
> that is GCC 5.4. Is that too old ?

Since GCC 7 the compiler knows how to do this, for powerpc; in GCC 8 it
has improved still.

> It seems that version inlines memcmp() when length is 1. All other
> lengths call memcmp()

Yup.

> c000d018 :
> c000d018:  80 64 00 00   lwz     r3,0(r4)
> c000d01c:  81 25 00 00   lwz     r9,0(r5)
> c000d020:  7c 69 18 50   subf    r3,r9,r3
> c000d024:  4e 80 00 20   blr

This is incorrect, it does not get the sign of the result correct.
Say when comparing 0xff 0xff 0xff 0xff to 0 0 0 0.  This should return
positive, but it returns negative.

For Power9 GCC does

        lwz 3,0(3)
        lwz 9,0(4)
        cmpld 7,3,9
        setb 3,7

and for Power7/Power8,

        lwz 9,0(3)
        lwz 3,0(4)
        subfc 3,3,9
        popcntd 3,3
        subfe 9,9,9
        or 3,3,9

(and it gives up for earlier CPUs, there is no nice simple code sequence
as far as we know.  Code size matters when generating inline code).

(Generating code for -m32 it is the same, just w instead of d in a few
places).

> c000d09c :
> c000d09c:  81 25 00 04   lwz     r9,4(r5)
> c000d0a0:  80 64 00 04   lwz     r3,4(r4)
> c000d0a4:  81 04 00 00   lwz     r8,0(r4)
> c000d0a8:  81 45 00 00   lwz     r10,0(r5)
> c000d0ac:  7c 69 18 10   subfc   r3,r9,r3
> c000d0b0:  7d 2a 41 10   subfe   r9,r10,r8
> c000d0b4:  7d 2a fe 70   srawi   r10,r9,31
> c000d0b8:  7d 48 4b 79   or.     r8,r10,r9
> c000d0bc:  4d a2 00 20   bclr+   12,eq
> c000d0c0:  7d 23 4b 78   mr      r3,r9
> c000d0c4:  4e 80 00 20   blr
> This shows that on PPC32, the 8 bytes comparison is not optimal, I will
> improve it.

It's not correct either (same problem as with length 4).


Segher
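
For illustration only, here is a rough C rendering of the two 4-byte
cases discussed above: the plain subtraction (which gets the sign wrong)
and the Power7/Power8 sequence.  All function names here (load_be32,
popcount64, memcmp4_subf, memcmp4_branchless, check_examples) are made up
for this sketch; they come from neither the patch nor GCC.  It assumes
the usual two's-complement wraparound when converting an out-of-range
unsigned value to int, which GCC provides.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read 4 bytes as a big-endian word, like lwz on big-endian powerpc. */
static uint32_t load_be32(const void *p)
{
	const unsigned char *b = p;

	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Portable population count, standing in for popcntd. */
static int popcount64(uint64_t x)
{
	int n = 0;

	while (x) {
		x &= x - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Hypothetical name: what lwz/lwz/subf/blr computes.  Wrong sign. */
static int memcmp4_subf(const void *p, const void *q)
{
	/* Assumption: unsigned-to-int conversion wraps (true for GCC). */
	return (int)(load_be32(p) - load_be32(q));
}

/* Hypothetical name: C rendering of the Power7/Power8 sequence. */
static int memcmp4_branchless(const void *p, const void *q)
{
	uint64_t a = load_be32(p);	/* lwz zero-extends to 64 bits */
	uint64_t b = load_be32(q);
	uint64_t diff = a - b;		/* subfc: CA is set iff a >= b */
	int bits = popcount64(diff);	/* popcntd: 0 iff a == b */
	int mask = -(int)(a < b);	/* subfe 9,9,9: CA - 1, so 0 or -1 */

	return bits | mask;		/* or: -1, 0, or a positive count */
}

/* The example from the mail: ff ff ff ff compared with 00 00 00 00. */
static void check_examples(void)
{
	const unsigned char hi[4] = { 0xff, 0xff, 0xff, 0xff };
	const unsigned char lo[4] = { 0, 0, 0, 0 };

	assert(memcmp(hi, lo, 4) > 0);		/* reference behaviour */
	assert(memcmp4_subf(hi, lo) < 0);	/* the bug: negative */
	assert(memcmp4_branchless(hi, lo) > 0);
	assert(memcmp4_branchless(lo, hi) < 0);
	assert(memcmp4_branchless(hi, hi) == 0);
}
```

Calling check_examples() from a test main() should pass silently.  The
point of the popcount step is only to turn any nonzero difference into a
small positive number, so that the or with the 0/-1 mask yields a value
with the right sign without a branch.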