Message-Id: <404fbea1966e65b3d6d8f33856f6ff4c6486cce6.1526553552.git.christophe.leroy@c-s.fr>
From: Christophe Leroy
Subject: [PATCH v2 2/5] powerpc/lib: optimise 32 bits __clear_user()
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org
Date: Thu, 17 May 2018 12:49:52 +0200 (CEST)
X-Mailing-List: linux-kernel@vger.kernel.org

Rewrite clear_user() on the same principle as memset(0), making use
of dcbz to clear complete cache lines.

This code is a copy/paste of memset(), with some modifications to
retrieve the remaining number of bytes to be cleared, as it needs to
be returned in case of error.

On an MPC885, throughput is almost doubled:

Before:
~# dd if=/dev/zero of=/dev/null bs=1M count=1000
1048576000 bytes (1000.0MB) copied, 18.990779 seconds, 52.7MB/s

After:
~# dd if=/dev/zero of=/dev/null bs=1M count=1000
1048576000 bytes (1000.0MB) copied, 9.611468 seconds, 104.0MB/s

On an MPC8321, throughput is multiplied by 2.12:

Before:
root@vgoippro:~# dd if=/dev/zero of=/dev/null bs=1M count=1000
1048576000 bytes (1000.0MB) copied, 6.844352 seconds, 146.1MB/s

After:
root@vgoippro:~# dd if=/dev/zero of=/dev/null bs=1M count=1000
1048576000 bytes (1000.0MB) copied, 3.218854 seconds, 310.7MB/s

Signed-off-by: Christophe Leroy
---
 arch/powerpc/lib/string_32.S | 85 +++++++++++++++++++++++++++++++-------------
 1 file changed, 60 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/lib/string_32.S b/arch/powerpc/lib/string_32.S
index ab8c4f5f31b6..2c11c2019b69 100644
--- a/arch/powerpc/lib/string_32.S
+++ b/arch/powerpc/lib/string_32.S
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include
 
 	.text
 
@@ -31,44 +32,78 @@ _GLOBAL(memcmp)
 	blr
 EXPORT_SYMBOL(memcmp)
 
+CACHELINE_BYTES = L1_CACHE_BYTES
+LG_CACHELINE_BYTES = L1_CACHE_SHIFT
+CACHELINE_MASK = (L1_CACHE_BYTES-1)
+
 _GLOBAL(__clear_user)
-	addi	r6,r3,-4
-	li	r3,0
-	li	r5,0
-	cmplwi	0,r4,4
+/*
+ * Use dcbz on the complete cache lines in the destination
+ * to set them to zero.  This requires that the destination
+ * area is cacheable.
+ */
+	cmplwi	cr0, r4, 4
+	mr	r10, r3
+	li	r3, 0
 	blt	7f
-	/* clear a single word */
-11:	stwu	r5,4(r6)
+
+11:	stw	r3, 0(r10)
 	beqlr
-	/* clear word sized chunks */
-	andi.	r0,r6,3
-	add	r4,r0,r4
-	subf	r6,r0,r6
-	srwi	r0,r4,2
-	andi.	r4,r4,3
+	andi.	r0, r10, 3
+	add	r11, r0, r4
+	subf	r6, r0, r10
+
+	clrlwi	r7, r6, 32 - LG_CACHELINE_BYTES
+	add	r8, r7, r11
+	srwi	r9, r8, LG_CACHELINE_BYTES
+	addic.	r9, r9, -1	/* total number of complete cachelines */
+	ble	2f
+	xori	r0, r7, CACHELINE_MASK & ~3
+	srwi.	r0, r0, 2
+	beq	3f
+	mtctr	r0
+4:	stwu	r3, 4(r6)
+	bdnz	4b
+3:	mtctr	r9
+	li	r7, 4
+10:	dcbz	r7, r6
+	addi	r6, r6, CACHELINE_BYTES
+	bdnz	10b
+	clrlwi	r11, r8, 32 - LG_CACHELINE_BYTES
+	addi	r11, r11, 4
+
+2:	srwi	r0 ,r11 ,2
 	mtctr	r0
-	bdz	7f
-1:	stwu	r5,4(r6)
+	bdz	6f
+1:	stwu	r3, 4(r6)
 	bdnz	1b
-	/* clear byte sized chunks */
-7:	cmpwi	0,r4,0
+6:	andi.	r11, r11, 3
 	beqlr
-	mtctr	r4
-	addi	r6,r6,3
-8:	stbu	r5,1(r6)
+	mtctr	r11
+	addi	r6, r6, 3
+8:	stbu	r3, 1(r6)
 	bdnz	8b
 	blr
-90:	mr	r3,r4
+
+7:	cmpwi	cr0, r4, 0
+	beqlr
+	mtctr	r4
+	addi	r6, r10, -1
+9:	stbu	r3, 1(r6)
+	bdnz	9b
 	blr
-91:	mfctr	r3
-	slwi	r3,r3,2
-	add	r3,r3,r4
+
+90:	mr	r3, r4
 	blr
-92:	mfctr	r3
+91:	add	r3, r10, r4
+	subf	r3, r6, r3
 	blr
 
 	EX_TABLE(11b, 90b)
+	EX_TABLE(4b, 91b)
+	EX_TABLE(10b, 91b)
 	EX_TABLE(1b, 91b)
-	EX_TABLE(8b, 92b)
+	EX_TABLE(8b, 91b)
+	EX_TABLE(9b, 91b)
 
 EXPORT_SYMBOL(__clear_user)
-- 
2.13.3
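
For readers following the assembly, the split described in the changelog (a partial head, whole cache lines cleared with dcbz, a partial tail, and the count of uncleared bytes returned on a fault) can be modelled in C. This is only a sketch under stated assumptions, not the patch's implementation: model_clear_user, MODEL_CACHELINE_BYTES and the limit parameter are hypothetical names, a plain memset stands in for dcbz, a 32-byte line is assumed where the patch uses L1_CACHE_BYTES, byte stores replace the word stores of the real routine, and the exact value computed by the patch's fixup code is not reproduced.

```c
#include <stddef.h>
#include <string.h>

/* Assumed line size for this sketch; the patch takes it from L1_CACHE_BYTES. */
#define MODEL_CACHELINE_BYTES 32u

/*
 * Hypothetical model: zero n bytes at offset dst inside mem.
 * Offsets >= limit are treated as inaccessible (a simulated fault).
 * Returns the number of bytes NOT cleared, 0 on full success.
 */
size_t model_clear_user(unsigned char *mem, size_t dst, size_t n, size_t limit)
{
	size_t pos = dst;
	const size_t end = dst + n;

	/* Head: individual stores up to the first cache line boundary. */
	while (pos < end && pos % MODEL_CACHELINE_BYTES != 0) {
		if (pos >= limit)
			return end - pos;
		mem[pos++] = 0;
	}

	/* Body: one complete line per step (dcbz in the patch). */
	while (end - pos >= MODEL_CACHELINE_BYTES) {
		if (pos + MODEL_CACHELINE_BYTES > limit)
			return end - pos;	/* the whole line is left untouched */
		memset(mem + pos, 0, MODEL_CACHELINE_BYTES);
		pos += MODEL_CACHELINE_BYTES;
	}

	/* Tail: whatever is left after the last complete line. */
	while (pos < end) {
		if (pos >= limit)
			return end - pos;
		mem[pos++] = 0;
	}

	return 0;
}
```

In this model, a 100-byte clear starting at offset 5 with the fault boundary at offset 64 clears offsets 5 to 63 and reports 41 bytes left, which is the kind of remaining count the changelog says must be returned in case of error.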