In-Reply-To: <88c8cc033b4ea364d62d1d8ba811a8f8d56c297d.1527503958.git.christophe.leroy@c-s.fr>
From: Mathieu Malaterre
Date: Tue, 29 May 2018 22:03:01 +0200
Subject: Re: [PATCH v5 3/3] powerpc/lib: optimise PPC32 memcmp
To: Christophe Leroy
Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, Segher Boessenkool, linuxppc-dev, LKML
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, May 28, 2018 at 12:49 PM, Christophe Leroy wrote:
> At the time being, memcmp() compares two chunks of memory
> byte per byte.
>
> This patch optimises the comparison by comparing word by word.
>
> A small benchmark performed on an 8xx comparing two chunks
> of 512 bytes performed 100000 times gives:
>
> Before : 5852274 TB ticks
> After:   1488638 TB ticks
>
> This is almost 4 times faster
>
> Signed-off-by: Christophe Leroy
> ---
>  arch/powerpc/lib/string_32.S | 37 +++++++++++++++++++++++++++----------
>  1 file changed, 27 insertions(+), 10 deletions(-)

Would it be possible for you to move the actual code instead to:

./arch/powerpc/lib/memcmp_32.S

This will sit right next to the memcmp_64.S implementation.

> diff --git a/arch/powerpc/lib/string_32.S b/arch/powerpc/lib/string_32.S
> index 40a576d56ac7..4fbaa046aa84 100644
> --- a/arch/powerpc/lib/string_32.S
> +++ b/arch/powerpc/lib/string_32.S
> @@ -16,17 +16,34 @@
>         .text
>
>  _GLOBAL(memcmp)
> -       cmpwi   cr0, r5, 0
> -       beq-    2f
> -       mtctr   r5
> -       addi    r6,r3,-1
> -       addi    r4,r4,-1
> -1:     lbzu    r3,1(r6)
> -       lbzu    r0,1(r4)
> -       subf.   r3,r0,r3
> -       bdnzt   2,1b
> +       srawi.  r7, r5, 2               /* Divide len by 4 */
> +       mr      r6, r3
> +       beq-    3f
> +       mtctr   r7
> +       li      r7, 0
> +1:     lwzx    r3, r6, r7
> +       lwzx    r0, r4, r7
> +       addi    r7, r7, 4
> +       cmplw   cr0, r3, r0
> +       bdnzt   eq, 1b
> +       bne     5f
> +3:     andi.   r3, r5, 3
> +       beqlr
> +       cmplwi  cr1, r3, 2
> +       blt-    cr1, 4f
> +       lhzx    r3, r6, r7
> +       lhzx    r0, r4, r7
> +       addi    r7, r7, 2
> +       subf.   r3, r0, r3
> +       beqlr   cr1
> +       bnelr
> +4:     lbzx    r3, r6, r7
> +       lbzx    r0, r4, r7
> +       subf.   r3, r0, r3
>         blr
> -2:     li      r3,0
> +5:     li      r3, 1
> +       bgtlr
> +       li      r3, -1
>         blr
>  EXPORT_SYMBOL(memcmp)
>
> --
> 2.13.3
>
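
For anyone following the thread without a PPC32 manual at hand, the control flow of the new routine can be modelled roughly in C as below. This is only a sketch, not the kernel code: the names memcmp_wordwise, load_be32 and load_be16 are made up for illustration, and the explicit big-endian loads are an assumption standing in for lwzx/lhzx on a big-endian PPC32, where an unsigned word compare orders the same way as a byte-wise compare.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read 4 bytes as one big-endian word (models lwzx). */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical helper: read 2 bytes as one big-endian halfword (models lhzx). */
static uint32_t load_be16(const unsigned char *p)
{
    return ((uint32_t)p[0] << 8) | (uint32_t)p[1];
}

/* Illustrative model of the patch's flow: words, then a halfword, then a byte. */
int memcmp_wordwise(const void *s1, const void *s2, size_t len)
{
    const unsigned char *a = s1;
    const unsigned char *b = s2;
    size_t off = 0;
    size_t words = len >> 2;            /* len / 4, as in the srawi. step */

    while (words--) {
        uint32_t x = load_be32(a + off);
        uint32_t y = load_be32(b + off);

        if (x != y)                     /* unsigned compare, like cmplw */
            return x > y ? 1 : -1;
        off += 4;
    }

    if (len & 2) {                      /* remainder of 2 or 3 bytes */
        int d = (int)load_be16(a + off) - (int)load_be16(b + off);

        if (d)
            return d;
        off += 2;
    }

    if (len & 1)                        /* remainder of 1 or 3 bytes */
        return (int)a[off] - (int)b[off];

    return 0;
}
```

As with memcmp() itself, only the sign of the result is meaningful: the word loop returns 1 or -1, while the tail returns a raw difference.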