From: Christophe Leroy
Subject: [PATCH v5 2/3] powerpc/lib: optimise 32 bits __clear_user()
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, segher@kernel.crashing.org
Cc: linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org
Date: Mon, 28 May 2018 10:49:56 +0000 (UTC)

Rewrite clear_user() on the same principle as memset(0), making use
of dcbz to clear complete cache lines.

This code is a copy/paste of memset(), with some modifications in order
to retrieve the remaining number of bytes to be cleared, as it needs to
be returned in case of error.

On an MPC885, throughput is almost doubled:

Before:
~# dd if=/dev/zero of=/dev/null bs=1M count=1000
1048576000 bytes (1000.0MB) copied, 18.990779 seconds, 52.7MB/s

After:
~# dd if=/dev/zero of=/dev/null bs=1M count=1000
1048576000 bytes (1000.0MB) copied, 9.611468 seconds, 104.0MB/s

On an MPC8321, throughput is multiplied by 2.12:

Before:
root@vgoippro:~# dd if=/dev/zero of=/dev/null bs=1M count=1000
1048576000 bytes (1000.0MB) copied, 6.844352 seconds, 146.1MB/s

After:
root@vgoippro:~# dd if=/dev/zero of=/dev/null bs=1M count=1000
1048576000 bytes (1000.0MB) copied, 3.218854 seconds, 310.7MB/s

Signed-off-by: Christophe Leroy
---
 arch/powerpc/lib/string_32.S | 85 +++++++++++++++++++++++++++++++-------------
 1 file changed, 60 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/lib/string_32.S b/arch/powerpc/lib/string_32.S
index 204db8a834fd..40a576d56ac7 100644
--- a/arch/powerpc/lib/string_32.S
+++ b/arch/powerpc/lib/string_32.S
@@ -11,6 +11,7 @@
 #include
 #include
 #include
+#include
 
 	.text
 
@@ -29,44 +30,78 @@ _GLOBAL(memcmp)
 	blr
 EXPORT_SYMBOL(memcmp)
 
+CACHELINE_BYTES = L1_CACHE_BYTES
+LG_CACHELINE_BYTES = L1_CACHE_SHIFT
+CACHELINE_MASK = (L1_CACHE_BYTES-1)
+
 _GLOBAL(__clear_user)
-	addi	r6,r3,-4
-	li	r3,0
-	li	r5,0
-	cmplwi	0,r4,4
+/*
+ * Use dcbz on the complete cache lines in the destination
+ * to set them to zero. This requires that the destination
+ * area is cacheable.
+ */
+	cmplwi	cr0, r4, 4
+	mr	r10, r3
+	li	r3, 0
 	blt	7f
-	/* clear a single word */
-11:	stwu	r5,4(r6)
+
+11:	stw	r3, 0(r10)
 	beqlr
-	/* clear word sized chunks */
-	andi.	r0,r6,3
-	add	r4,r0,r4
-	subf	r6,r0,r6
-	srwi	r0,r4,2
-	andi.	r4,r4,3
+	andi.	r0, r10, 3
+	add	r11, r0, r4
+	subf	r6, r0, r10
+
+	clrlwi	r7, r6, 32 - LG_CACHELINE_BYTES
+	add	r8, r7, r11
+	srwi	r9, r8, LG_CACHELINE_BYTES
+	addic.	r9, r9, -1	/* total number of complete cachelines */
+	ble	2f
+	xori	r0, r7, CACHELINE_MASK & ~3
+	srwi.	r0, r0, 2
+	beq	3f
+	mtctr	r0
+4:	stwu	r3, 4(r6)
+	bdnz	4b
+3:	mtctr	r9
+	li	r7, 4
+10:	dcbz	r7, r6
+	addi	r6, r6, CACHELINE_BYTES
+	bdnz	10b
+	clrlwi	r11, r8, 32 - LG_CACHELINE_BYTES
+	addi	r11, r11, 4
+
+2:	srwi	r0 ,r11 ,2
 	mtctr	r0
-	bdz	7f
-1:	stwu	r5,4(r6)
+	bdz	6f
+1:	stwu	r3, 4(r6)
 	bdnz	1b
-	/* clear byte sized chunks */
-7:	cmpwi	0,r4,0
+6:	andi.	r11, r11, 3
 	beqlr
-	mtctr	r4
-	addi	r6,r6,3
-8:	stbu	r5,1(r6)
+	mtctr	r11
+	addi	r6, r6, 3
+8:	stbu	r3, 1(r6)
 	bdnz	8b
 	blr
-90:	mr	r3,r4
+
+7:	cmpwi	cr0, r4, 0
+	beqlr
+	mtctr	r4
+	addi	r6, r10, -1
+9:	stbu	r3, 1(r6)
+	bdnz	9b
 	blr
-91:	mfctr	r3
-	slwi	r3,r3,2
-	add	r3,r3,r4
+
+90:	mr	r3, r4
 	blr
-92:	mfctr	r3
+91:	add	r3, r10, r4
+	subf	r3, r6, r3
 	blr
 
 	EX_TABLE(11b, 90b)
+	EX_TABLE(4b, 91b)
+	EX_TABLE(10b, 91b)
 	EX_TABLE(1b, 91b)
-	EX_TABLE(8b, 92b)
+	EX_TABLE(8b, 91b)
+	EX_TABLE(9b, 91b)
 
 EXPORT_SYMBOL(__clear_user)
-- 
2.13.3
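The register flow of the new __clear_user() is easier to check against a C model. What follows is a rough, non-authoritative translation written for illustration only, not kernel code. It assumes 16-byte cache lines (LG_CACHELINE_BYTES = 4, as on the MPC8xx; the kernel derives the real value from L1_CACHE_SHIFT), replaces user memory with a small array mem[], models dcbz as a zeroing store of one whole line, and uses a hypothetical fault_at address as a crude stand-in for an unmapped page. The names model_clear_user(), store(), mem[], fault_at and dcbz_count are invented for this sketch; the local variables reuse the register names so each step can be matched to a label in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 16-byte cache lines (MPC8xx); the kernel uses L1_CACHE_SHIFT. */
#define LG_CACHELINE_BYTES	4u
#define CACHELINE_BYTES		(1u << LG_CACHELINE_BYTES)
#define CACHELINE_MASK		(CACHELINE_BYTES - 1u)
#define MEM_SIZE		256u

static uint8_t mem[MEM_SIZE];		/* stand-in for user memory */
static uint32_t fault_at = MEM_SIZE;	/* hypothetical: first faulting address */
static unsigned int dcbz_count;		/* number of whole lines zeroed */

/* One store of len bytes at addr; "faults" (writes nothing) if it reaches fault_at. */
static int store(uint32_t addr, uint32_t len)
{
	if (addr + len > fault_at)
		return -1;
	assert(addr + len <= MEM_SIZE);
	memset(&mem[addr], 0, len);
	return 0;
}

/*
 * C model of the patched __clear_user(): r10 = destination, r4 = length.
 * Returns 0 on success, otherwise the value computed by fixup 90 or 91.
 */
static uint32_t model_clear_user(uint32_t r10, uint32_t r4)
{
	uint32_t end = r10 + r4, r6, r7, r8, r11, n;
	int32_t r9;

	if (r4 < 4) {				/* label 7: byte loop at 9 */
		for (r6 = r10 - 1, n = r4; n; n--, r6++)
			if (store(r6 + 1, 1))
				return end - r6;	/* fixup 91 */
		return 0;
	}
	if (store(r10, 4))			/* label 11: one unaligned word */
		return r4;			/* fixup 90 */
	if (r4 == 4)
		return 0;

	r6 = r10 & ~3u;				/* aligned base; stores go to r6 + 4 */
	r11 = end - r6;				/* bytes from r6 to the end */
	r7 = r6 & CACHELINE_MASK;		/* offset of r6 in its cache line */
	r8 = r7 + r11;
	r9 = (int32_t)(r8 >> LG_CACHELINE_BYTES) - 1;	/* complete lines */
	if (r9 > 0) {
		/* label 4: words up to the next cache line boundary */
		for (n = (r7 ^ (CACHELINE_MASK & ~3u)) >> 2; n; n--, r6 += 4)
			if (store(r6 + 4, 4))
				return end - r6;
		/* label 10: dcbz on each complete line, at r6 + 4 */
		for (; r9 > 0; r9--, r6 += CACHELINE_BYTES) {
			if (store(r6 + 4, CACHELINE_BYTES))
				return end - r6;
			dcbz_count++;
		}
		r11 = (r8 & CACHELINE_MASK) + 4;
	}
	/* label 1: (r11 >> 2) - 1 words, mirroring mtctr followed by bdz */
	for (n = (r11 >> 2) - 1; n; n--, r6 += 4)
		if (store(r6 + 4, 4))
			return end - r6;
	/* label 8: trailing bytes */
	for (r6 += 3, n = r11 & 3; n; n--, r6++)
		if (store(r6 + 1, 1))
			return end - r6;
	return 0;
}
```

For example, with fault_at left at MEM_SIZE, model_clear_user(1, 100) clears bytes 1 to 100 using the unaligned word store at 11, three head words at 4, five dcbz lines at 10, one word at 1 and one byte at 8, and returns 0. When a store faults, the model returns end - r6 as fixup 91 does; because r6 lags the faulting store address by 4 (word and dcbz loops) or 1 (byte loops), the returned count in this model is never smaller than the number of bytes actually left uncleared.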