Subject: Re: [PATCH v5 3/3]
 powerpc/lib: optimise PPC32 memcmp
To: Mathieu Malaterre
Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, Segher Boessenkool, linuxppc-dev, LKML
References: <88c8cc033b4ea364d62d1d8ba811a8f8d56c297d.1527503958.git.christophe.leroy@c-s.fr>
From: Christophe LEROY
Date: Wed, 30 May 2018 10:47:48 +0200
X-Mailing-List: linux-kernel@vger.kernel.org

On 29/05/2018 at 22:03, Mathieu Malaterre wrote:
> On Mon, May 28, 2018 at 12:49 PM, Christophe Leroy wrote:
>> At the moment, memcmp() compares two chunks of memory
>> byte by byte.
>>
>> This patch optimises the comparison by comparing word by word.
>>
>> A small benchmark on an 8xx, comparing two chunks
>> of 512 bytes 100000 times, gives:
>>
>> Before: 5852274 TB ticks
>> After:  1488638 TB ticks
>>
>> This is almost 4 times faster.
>>
>> Signed-off-by: Christophe Leroy
>> ---
>>  arch/powerpc/lib/string_32.S | 37 +++++++++++++++++++++++++++----------
>>  1 file changed, 27 insertions(+), 10 deletions(-)
>
> Would it be possible for you to move the actual code instead to:
>
> ./arch/powerpc/lib/memcmp_32.S
>
> This will sit right next to the memcmp_64.S implementation.

Good idea, thanks. Done in v6.

Christophe

>
>> diff --git a/arch/powerpc/lib/string_32.S b/arch/powerpc/lib/string_32.S
>> index 40a576d56ac7..4fbaa046aa84 100644
>> --- a/arch/powerpc/lib/string_32.S
>> +++ b/arch/powerpc/lib/string_32.S
>> @@ -16,17 +16,34 @@
>>  	.text
>>
>>  _GLOBAL(memcmp)
>> -	cmpwi	cr0, r5, 0
>> -	beq-	2f
>> -	mtctr	r5
>> -	addi	r6,r3,-1
>> -	addi	r4,r4,-1
>> -1:	lbzu	r3,1(r6)
>> -	lbzu	r0,1(r4)
>> -	subf.	r3,r0,r3
>> -	bdnzt	2,1b
>> +	srawi.	r7, r5, 2		/* Divide len by 4 */
>> +	mr	r6, r3
>> +	beq-	3f
>> +	mtctr	r7
>> +	li	r7, 0
>> +1:	lwzx	r3, r6, r7
>> +	lwzx	r0, r4, r7
>> +	addi	r7, r7, 4
>> +	cmplw	cr0, r3, r0
>> +	bdnzt	eq, 1b
>> +	bne	5f
>> +3:	andi.	r3, r5, 3
>> +	beqlr
>> +	cmplwi	cr1, r3, 2
>> +	blt-	cr1, 4f
>> +	lhzx	r3, r6, r7
>> +	lhzx	r0, r4, r7
>> +	addi	r7, r7, 2
>> +	subf.	r3, r0, r3
>> +	beqlr	cr1
>> +	bnelr
>> +4:	lbzx	r3, r6, r7
>> +	lbzx	r0, r4, r7
>> +	subf.	r3, r0, r3
>>  	blr
>> -2:	li	r3,0
>> +5:	li	r3, 1
>> +	bgtlr
>> +	li	r3, -1
>>  	blr
>>  EXPORT_SYMBOL(memcmp)
>>
>> --
>> 2.13.3
>>
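
For readers who do not read PowerPC assembly, the word-by-word strategy quoted above can be sketched roughly in portable C. This is only an illustration under stated assumptions, not a translation of the patch: the names memcmp_wordwise, load_be32 and load_be16 are hypothetical, the loads are assembled byte by byte as big-endian values to mimic what lwzx/lhzx return on big-endian PPC32, and the result is normalised to -1, 0 or 1.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read 4 bytes as one big-endian word, so that an
 * unsigned word compare orders buffers the same way a byte compare would. */
static uint32_t load_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical helper: read 2 bytes as one big-endian halfword. */
static uint32_t load_be16(const unsigned char *p)
{
	return ((uint32_t)p[0] << 8) | (uint32_t)p[1];
}

/* Sketch of a word-wise memcmp: whole words first, then the 0..3 byte tail. */
int memcmp_wordwise(const void *s1, const void *s2, size_t len)
{
	const unsigned char *a = s1;
	const unsigned char *b = s2;
	size_t off = 0;
	size_t words;

	/* Main loop: len / 4 word comparisons, stopping at the first mismatch. */
	for (words = len / 4; words != 0; words--, off += 4) {
		uint32_t x = load_be32(a + off);
		uint32_t y = load_be32(b + off);

		if (x != y)
			return x > y ? 1 : -1;
	}

	/* Tail of 2 or 3 bytes: compare one halfword. */
	if (len & 2) {
		uint32_t x = load_be16(a + off);
		uint32_t y = load_be16(b + off);

		if (x != y)
			return x > y ? 1 : -1;
		off += 2;
	}

	/* Tail of 1 or 3 bytes: compare the last byte. */
	if (len & 1) {
		if (a[off] != b[off])
			return a[off] > b[off] ? 1 : -1;
	}

	return 0;
}
```

A call such as memcmp_wordwise(buf1, buf2, 512) performs 128 word comparisons where a byte loop performs 512, which is in line with the roughly fourfold speed-up reported in the quoted benchmark.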