Received: by 2002:ac0:a5b6:0:0:0:0:0 with SMTP id m51-v6csp4176808imm; Wed, 30 May 2018 00:07:12 -0700 (PDT) X-Google-Smtp-Source: ADUXVKJD/8DVQNu1K6N9Mq5xUqcbJUh+Wd/laDU0+jtSOcwq4rAVOOq7mrtCofo9oIjPSzDASAyy X-Received: by 2002:a17:902:a711:: with SMTP id w17-v6mr1691651plq.292.1527664032565; Wed, 30 May 2018 00:07:12 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1527664032; cv=none; d=google.com; s=arc-20160816; b=G9B6qOWq9i5q6e6mgrzxF9yPog05mxA3JRZSMaCdEFFAP1Hqdj67p+9jUKf+P8ulW/ 0KTafiGDusbhe5NWgjegedX6Nzahwh9dJubtU3kGs8Rsm242uSqp79J9abc5LJw0i347 vUA6OTiqe2Ux//BWKwKbbz+wht16oMbXbYurmrLHA7E+GmLsqWpyeGBJJj3F0bO66BFG D4UayAcQ575ALqhZ57mZHgo5P5WTat4dvRE5is0ltB1k6h3O0I88NKEQXZ3nApkTVJ0Q AH5cu46y7/o7X/D+jwkofQGkbXelMc9hXjwlDuoeeMxVoNXn1Hm37Lm1N88pQt/Kt0iR GxvQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:date:cc:to:subject:from:references :in-reply-to:message-id:arc-authentication-results; bh=B30ANl+1wzHPxFNt9ITCQkySG+AAzYBQib2m/lpqRzk=; b=ZbkcFelnIrBR32sfjd555GzIal1Q/+5lvtOZHcWv7g4g+jlu9VJi3OdrjTK2Hst37+ Kq3GxuhSJ21xiZUfJ3Fh6t/IfW/nGM+ccQfpigDS8IbD66yxWOQkpGsqI3l7sE0ssgX3 0qSLh9AsixYqUvRZ/LmVY2NKRK1R81yJ2YsTolSERN5xjVyFkvtT2skPTNlqBz9zbaHo cnOYJ8i/g4HxMUu6OzAboDrRzMzY/bJCkR6XQkSdCJcwkq90dGT6zEEePr8p5KDZj0y3 9nZ8vv/MeVZeBQxeXHa2BGFcFnkoMOqyk+4uSakfFas0zCUKvCJj0Wp8jnVAH/DolVzB zowA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id o189-v6si33654404pfo.20.2018.05.30.00.06.59; Wed, 30 May 2018 00:07:12 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S968649AbeE3HGY (ORCPT + 99 others); Wed, 30 May 2018 03:06:24 -0400 Received: from pegase1.c-s.fr ([93.17.236.30]:21489 "EHLO pegase1.c-s.fr" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S968467AbeE3HGP (ORCPT ); Wed, 30 May 2018 03:06:15 -0400 Received: from localhost (mailhub1-int [192.168.12.234]) by localhost (Postfix) with ESMTP id 40whTK5qnCz9tvNm; Wed, 30 May 2018 09:06:13 +0200 (CEST) X-Virus-Scanned: Debian amavisd-new at c-s.fr Received: from pegase1.c-s.fr ([192.168.12.234]) by localhost (pegase1.c-s.fr [192.168.12.234]) (amavisd-new, port 10024) with ESMTP id hq355bqLSSxu; Wed, 30 May 2018 09:06:13 +0200 (CEST) Received: from messagerie.si.c-s.fr (messagerie.si.c-s.fr [192.168.25.192]) by pegase1.c-s.fr (Postfix) with ESMTP id 40whTK5DpWz9tvMf; Wed, 30 May 2018 09:06:13 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by messagerie.si.c-s.fr (Postfix) with ESMTP id 1A05D8B7BB; Wed, 30 May 2018 09:06:14 +0200 (CEST) X-Virus-Scanned: amavisd-new at c-s.fr Received: from messagerie.si.c-s.fr ([127.0.0.1]) by localhost (messagerie.si.c-s.fr [127.0.0.1]) (amavisd-new, port 10023) with ESMTP id lbSFQtC-CA7Q; Wed, 30 May 2018 09:06:14 +0200 (CEST) Received: from po14934vm.idsi0.si.c-s.fr (po15451.idsi0.si.c-s.fr [172.25.231.2]) by messagerie.si.c-s.fr (Postfix) with ESMTP id DF8468B752; Wed, 30 May 2018 09:06:13 +0200 (CEST) Received: by po14934vm.idsi0.si.c-s.fr (Postfix, from userid 0) id D7F7D6CCAC; Wed, 30 May 2018 07:06:13 +0000 (UTC) Message-Id: <23d156759ff411f5fd932e167b8b5f5ecd6aa88b.1527663626.git.christophe.leroy@c-s.fr> In-Reply-To: References: From: Christophe Leroy Subject: [PATCH v6 1/2] powerpc/lib: optimise 32 bits __clear_user() To: Benjamin Herrenschmidt , Paul Mackerras , Michael Ellerman , segher@kernel.crashing.org Cc: linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org Date: Wed, 30 May 2018 07:06:13 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Rewrite clear_user() on the same principle as memset(0), making use of dcbz to clear complete cache lines. This code is a copy/paste of memset(), with some modifications in order to retrieve remaining number of bytes to be cleared, as it needs to be returned in case of error. On the same way as done on PPC64 in commit 17968fbbd19f1 ("powerpc: 64bit optimised __clear_user"), the patch moves __clear_user() into a dedicated file string_32.S On a MPC885, throughput is almost doubled: Before: ~# dd if=/dev/zero of=/dev/null bs=1M count=1000 1048576000 bytes (1000.0MB) copied, 18.990779 seconds, 52.7MB/s After: ~# dd if=/dev/zero of=/dev/null bs=1M count=1000 1048576000 bytes (1000.0MB) copied, 9.611468 seconds, 104.0MB/s On a MPC8321, throughput is multiplied by 2.12: Before: root@vgoippro:~# dd if=/dev/zero of=/dev/null bs=1M count=1000 1048576000 bytes (1000.0MB) copied, 6.844352 seconds, 146.1MB/s After: root@vgoippro:~# dd if=/dev/zero of=/dev/null bs=1M count=1000 1048576000 bytes (1000.0MB) copied, 3.218854 seconds, 310.7MB/s Signed-off-by: Christophe Leroy --- arch/powerpc/lib/Makefile | 5 ++- arch/powerpc/lib/string.S | 46 ---------------------- arch/powerpc/lib/string_32.S | 90 ++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 93 insertions(+), 48 deletions(-) create mode 100644 arch/powerpc/lib/string_32.S diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile index 653901042ad7..2c9b8c0adf22 100644 --- a/arch/powerpc/lib/Makefile +++ b/arch/powerpc/lib/Makefile @@ -26,13 +26,14 @@ obj-$(CONFIG_PPC_BOOK3S_64) += copyuser_power7.o copypage_power7.o \ memcpy_power7.o obj64-y += copypage_64.o copyuser_64.o mem_64.o hweight_64.o \ - string_64.o memcpy_64.o memcmp_64.o pmem.o + memcpy_64.o memcmp_64.o pmem.o obj64-$(CONFIG_SMP) += locks.o obj64-$(CONFIG_ALTIVEC) += vmx-helper.o obj64-$(CONFIG_KPROBES_SANITY_TEST) += test_emulate_step.o -obj-y += checksum_$(BITS).o checksum_wrappers.o +obj-y += checksum_$(BITS).o checksum_wrappers.o \ + string_$(BITS).o obj-y += sstep.o ldstfp.o quad.o obj64-y += quad.o diff --git a/arch/powerpc/lib/string.S b/arch/powerpc/lib/string.S index 0378def28d41..5343a88e619e 100644 --- a/arch/powerpc/lib/string.S +++ b/arch/powerpc/lib/string.S @@ -8,8 +8,6 @@ * as published by the Free Software Foundation; either version * 2 of the License, or (at your option) any later version. */ -#include -#include #include #include #include @@ -86,47 +84,3 @@ _GLOBAL(memchr) 2: li r3,0 blr EXPORT_SYMBOL(memchr) - -#ifdef CONFIG_PPC32 -_GLOBAL(__clear_user) - addi r6,r3,-4 - li r3,0 - li r5,0 - cmplwi 0,r4,4 - blt 7f - /* clear a single word */ -11: stwu r5,4(r6) - beqlr - /* clear word sized chunks */ - andi. r0,r6,3 - add r4,r0,r4 - subf r6,r0,r6 - srwi r0,r4,2 - andi. r4,r4,3 - mtctr r0 - bdz 7f -1: stwu r5,4(r6) - bdnz 1b - /* clear byte sized chunks */ -7: cmpwi 0,r4,0 - beqlr - mtctr r4 - addi r6,r6,3 -8: stbu r5,1(r6) - bdnz 8b - blr -90: mr r3,r4 - blr -91: mfctr r3 - slwi r3,r3,2 - add r3,r3,r4 - blr -92: mfctr r3 - blr - - EX_TABLE(11b, 90b) - EX_TABLE(1b, 91b) - EX_TABLE(8b, 92b) - -EXPORT_SYMBOL(__clear_user) -#endif diff --git a/arch/powerpc/lib/string_32.S b/arch/powerpc/lib/string_32.S new file mode 100644 index 000000000000..6de67d2dbad6 --- /dev/null +++ b/arch/powerpc/lib/string_32.S @@ -0,0 +1,90 @@ +// SPDX-License-Identifier: GPL-2.0 + +/* + * String handling functions for PowerPC32 + * + * Copyright (C) 1996 Paul Mackerras. + * + */ + +#include +#include +#include + + .text + +CACHELINE_BYTES = L1_CACHE_BYTES +LG_CACHELINE_BYTES = L1_CACHE_SHIFT +CACHELINE_MASK = (L1_CACHE_BYTES-1) + +_GLOBAL(__clear_user) +/* + * Use dcbz on the complete cache lines in the destination + * to set them to zero. This requires that the destination + * area is cacheable. + */ + cmplwi cr0, r4, 4 + mr r10, r3 + li r3, 0 + blt 7f + +11: stw r3, 0(r10) + beqlr + andi. r0, r10, 3 + add r11, r0, r4 + subf r6, r0, r10 + + clrlwi r7, r6, 32 - LG_CACHELINE_BYTES + add r8, r7, r11 + srwi r9, r8, LG_CACHELINE_BYTES + addic. r9, r9, -1 /* total number of complete cachelines */ + ble 2f + xori r0, r7, CACHELINE_MASK & ~3 + srwi. r0, r0, 2 + beq 3f + mtctr r0 +4: stwu r3, 4(r6) + bdnz 4b +3: mtctr r9 + li r7, 4 +10: dcbz r7, r6 + addi r6, r6, CACHELINE_BYTES + bdnz 10b + clrlwi r11, r8, 32 - LG_CACHELINE_BYTES + addi r11, r11, 4 + +2: srwi r0 ,r11 ,2 + mtctr r0 + bdz 6f +1: stwu r3, 4(r6) + bdnz 1b +6: andi. r11, r11, 3 + beqlr + mtctr r11 + addi r6, r6, 3 +8: stbu r3, 1(r6) + bdnz 8b + blr + +7: cmpwi cr0, r4, 0 + beqlr + mtctr r4 + addi r6, r10, -1 +9: stbu r3, 1(r6) + bdnz 9b + blr + +90: mr r3, r4 + blr +91: add r3, r10, r4 + subf r3, r6, r3 + blr + + EX_TABLE(11b, 90b) + EX_TABLE(4b, 91b) + EX_TABLE(10b, 91b) + EX_TABLE(1b, 91b) + EX_TABLE(8b, 91b) + EX_TABLE(9b, 91b) + +EXPORT_SYMBOL(__clear_user) -- 2.13.3