Received: by
po14934vm.idsi0.si.c-s.fr (Postfix, from userid 0) id A1A3B6F937; Fri, 18 May 2018 15:01:16 +0200 (CEST)
From: Christophe Leroy
Subject: [PATCH v2] powerpc/lib: Adjust .balign inside string functions for PPC32
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org
Message-Id: <20180518130116.A1A3B6F937@po14934vm.idsi0.si.c-s.fr>
Date: Fri, 18 May 2018 15:01:16 +0200 (CEST)
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

commit 87a156fb18fe1 ("Align hot loops of some string functions")
degraded the performance of string functions by adding useless nops.

A simple benchmark on an 8xx, calling memchr() 100000 times with a
match on the first byte, runs in 41668 TB ticks before this patch and
in 35986 TB ticks after this patch. This is an improvement of 5682 TB
ticks, or roughly 14%.

Another benchmark doing the same with a memchr() matching the 128th
byte runs in 1011365 TB ticks before this patch and in 1005682 TB ticks
after this patch. So regardless of the number of loop iterations,
removing those useless nops improves the test by about 5683 TB ticks.

Fixes: 87a156fb18fe1 ("Align hot loops of some string functions")
Signed-off-by: Christophe Leroy
---
 v2: Define IFETCH_ALIGN_SHIFT for PPC32 and use IFETCH_ALIGN_BYTES for
     the alignment

 arch/powerpc/include/asm/cache.h | 3 +++
 arch/powerpc/lib/string.S        | 7 ++++---
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/cache.h b/arch/powerpc/include/asm/cache.h
index c1d257aa4c2d..66298461b640 100644
--- a/arch/powerpc/include/asm/cache.h
+++ b/arch/powerpc/include/asm/cache.h
@@ -9,11 +9,14 @@
 #if defined(CONFIG_PPC_8xx) || defined(CONFIG_403GCX)
 #define L1_CACHE_SHIFT		4
 #define MAX_COPY_PREFETCH	1
+#define IFETCH_ALIGN_SHIFT	2
 #elif defined(CONFIG_PPC_E500MC)
 #define L1_CACHE_SHIFT		6
 #define MAX_COPY_PREFETCH	4
+#define IFETCH_ALIGN_SHIFT	3
 #elif defined(CONFIG_PPC32)
 #define MAX_COPY_PREFETCH	4
+#define IFETCH_ALIGN_SHIFT	3	/* 603 fetches 2 insn at a time */
 #if defined(CONFIG_PPC_47x)
 #define L1_CACHE_SHIFT		7
 #else
diff --git a/arch/powerpc/lib/string.S b/arch/powerpc/lib/string.S
index a787776822d8..0378def28d41 100644
--- a/arch/powerpc/lib/string.S
+++ b/arch/powerpc/lib/string.S
@@ -12,6 +12,7 @@
 #include
 #include
 #include
+#include
 
 	.text
 
@@ -23,7 +24,7 @@ _GLOBAL(strncpy)
 	mtctr	r5
 	addi	r6,r3,-1
 	addi	r4,r4,-1
-	.balign 16
+	.balign IFETCH_ALIGN_BYTES
 1:	lbzu	r0,1(r4)
 	cmpwi	0,r0,0
 	stbu	r0,1(r6)
@@ -43,7 +44,7 @@ _GLOBAL(strncmp)
 	mtctr	r5
 	addi	r5,r3,-1
 	addi	r4,r4,-1
-	.balign 16
+	.balign IFETCH_ALIGN_BYTES
 1:	lbzu	r3,1(r5)
 	cmpwi	1,r3,0
 	lbzu	r0,1(r4)
@@ -77,7 +78,7 @@ _GLOBAL(memchr)
 	beq-	2f
 	mtctr	r5
 	addi	r3,r3,-1
-	.balign 16
+	.balign IFETCH_ALIGN_BYTES
 1:	lbzu	r0,1(r3)
 	cmpw	0,r0,r4
 	bdnzf	2,1b
-- 
2.13.3
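
As an illustration of why the alignment value matters, the following
stand-alone C sketch counts how many nops an assembler may have to emit
in front of a loop label for a given IFETCH_ALIGN_SHIFT. It is not part
of the patch. The helper names (align_bytes, pad_bytes, max_pad_nops,
print_padding_table) are hypothetical, and it assumes fixed 4-byte
PowerPC instructions and that the padding is executed as nops on the
way into the loop.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: every PowerPC instruction is 4 bytes long. */
#define PPC_INSN_BYTES 4u

/* Alignment in bytes requested by ".balign (1 << shift)". */
unsigned int align_bytes(unsigned int shift)
{
	return 1u << shift;
}

/* Padding bytes needed at 'offset' to reach the next 'align' boundary. */
unsigned int pad_bytes(unsigned int offset, unsigned int align)
{
	/* 'align' must be a non-zero power of two. */
	assert(align != 0 && (align & (align - 1u)) == 0);
	return (align - (offset & (align - 1u))) & (align - 1u);
}

/* Worst-case number of padding nops over all instruction-aligned offsets. */
unsigned int max_pad_nops(unsigned int shift)
{
	unsigned int align = align_bytes(shift);
	unsigned int worst = 0;
	unsigned int off;

	for (off = 0; off < align; off += PPC_INSN_BYTES) {
		unsigned int pad = pad_bytes(off, align);

		if (pad > worst)
			worst = pad;
	}
	return worst / PPC_INSN_BYTES;
}

/* Print the worst case for the old ".balign 16" and the new PPC32 values. */
void print_padding_table(void)
{
	/* Shift values taken from the patch; labels are descriptive only. */
	static const struct {
		const char *name;
		unsigned int shift;
	} cases[] = {
		{ "old .balign 16",  4 },
		{ "8xx / 403GCX",    2 },
		{ "E500MC",          3 },
		{ "other PPC32",     3 },
	};
	unsigned int i;

	for (i = 0; i < sizeof(cases) / sizeof(cases[0]); i++)
		printf("%-16s align=%2u bytes, up to %u nop(s)\n",
		       cases[i].name, align_bytes(cases[i].shift),
		       max_pad_nops(cases[i].shift));
}
```

Calling print_padding_table() from a main() shows that a 16-byte
alignment can cost up to 3 nops before the loop, an 8-byte alignment at
most 1, and a 4-byte alignment none.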