Message-ID: <1431634680.3868.200.camel@freescale.com>
Subject: Re: [PATCH 3/4] powerpc32: memset(0): use cacheable_memzero
From: Scott Wood
To: christophe leroy
CC: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, Joakim Tjernlund, Kyle Moffett
Date: Thu, 14 May 2015 15:18:00 -0500
In-Reply-To: <555461BF.5020105@c-s.fr>
References: <9010ef9da0b2730af564a138b8d316d48eaf6d43.1431436210.git.christophe.leroy@c-s.fr> <1431564909.3868.162.camel@freescale.com> <555461BF.5020105@c-s.fr>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 2015-05-14 at 10:50 +0200, christophe leroy wrote:
>
> On 14/05/2015 02:55, Scott Wood wrote:
> > On Tue, 2015-05-12 at 15:32 +0200, Christophe Leroy wrote:
> >> cacheable_memzero uses dcbz instruction and is more efficient than
> >> memset(0) when the destination is in RAM
> >>
> >> This patch renames memset as generic_memset, and defines memset
> >> as a prolog to cacheable_memzero. This prolog checks if the byte
> >> to set is 0 and if the buffer is in RAM. If not, it falls back to
> >> generic_memset()
> >>
> >> Signed-off-by: Christophe Leroy
> >> ---
> >>  arch/powerpc/lib/copy_32.S | 15 ++++++++++++++-
> >>  1 file changed, 14 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/arch/powerpc/lib/copy_32.S b/arch/powerpc/lib/copy_32.S
> >> index cbca76c..d8a9a86 100644
> >> --- a/arch/powerpc/lib/copy_32.S
> >> +++ b/arch/powerpc/lib/copy_32.S
> >> @@ -12,6 +12,7 @@
> >>  #include
> >>  #include
> >>  #include
> >> +#include
> >>
> >>  #define COPY_16_BYTES		\
> >>  	lwz	r7,4(r4);	\
> >> @@ -74,6 +75,18 @@ CACHELINE_MASK = (L1_CACHE_BYTES-1)
> >>   * to set them to zero.  This requires that the destination
> >>   * area is cacheable.  -- paulus
> >>   */
> >> +_GLOBAL(memset)
> >> +	cmplwi	r4,0
> >> +	bne-	generic_memset
> >> +	cmplwi	r5,L1_CACHE_BYTES
> >> +	blt-	generic_memset
> >> +	lis	r8,max_pfn@ha
> >> +	lwz	r8,max_pfn@l(r8)
> >> +	tophys	(r9,r3)
> >> +	srwi	r9,r9,PAGE_SHIFT
> >> +	cmplw	r9,r8
> >> +	bge-	generic_memset
> >> +	mr	r4,r5
> > max_pfn includes highmem, and tophys only works on normal kernel
> > addresses.
> Is there any other simple way to determine whether an address is in RAM
> or not?

If you want to do it based on the virtual address, rather than doing a
tablewalk or TLB search, you need to limit it to lowmem.

> I did that because of the below function from mm/mem.c
>
> int page_is_ram(unsigned long pfn)
> {
> #ifndef CONFIG_PPC64	/* XXX for now */
> 	return pfn < max_pfn;
> #else
> 	unsigned long paddr = (pfn << PAGE_SHIFT);
> 	struct memblock_region *reg;
>
> 	for_each_memblock(memory, reg)
> 		if (paddr >= reg->base && paddr < (reg->base + reg->size))
> 			return 1;
> 	return 0;
> #endif
> }

Right, the problem is figuring out the pfn in the first place.

> > If we were to point memset_io, memcpy_toio, etc. at noncacheable
> > versions, are there any other callers left that can reasonably point at
> > uncacheable memory?
> Do you mean we could just consider that memcpy() and memset() are called
> only with destination on RAM and thus we could avoid the check?

Maybe.  If that's not a safe assumption I hope someone will point it
out.

> copy_tofrom_user() already makes this assumption (although a user app
> could possibly provide a buffer located in an ALSA mapped IO area)

The user could also pass in NULL.  That's what the fixups are for. :-)

-Scott
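
For illustration only, here is a minimal C sketch of the lowmem restriction described above. It is not code from the patch or from the kernel. Every name and constant in it is hypothetical: SKETCH_PAGE_OFFSET, SKETCH_LOWMEM_BYTES and SKETCH_CACHE_BYTES stand in for values that a real kernel takes from its configuration and boot-time memory setup. The idea is that a tophys-style subtraction is only meaningful for addresses inside the linear map, so the range check has to happen on the virtual address before any pfn is computed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout, chosen only for this sketch. */
#define SKETCH_PAGE_OFFSET  0xC0000000u /* assumed start of the linear map */
#define SKETCH_LOWMEM_BYTES 0x30000000u /* assumed 768 MiB of lowmem */
#define SKETCH_CACHE_BYTES  32u         /* stand-in for L1_CACHE_BYTES */

/*
 * True only if [addr, addr + len) lies entirely inside the assumed
 * linear map. Highmem and ioremap-style addresses fall outside it.
 */
static inline bool sketch_in_lowmem(uint32_t addr, uint32_t len)
{
	uint32_t off;

	if (addr < SKETCH_PAGE_OFFSET)
		return false;
	off = addr - SKETCH_PAGE_OFFSET;
	if (off >= SKETCH_LOWMEM_BYTES)
		return false;
	/* Compare against the remaining room to avoid 32-bit overflow. */
	return len <= SKETCH_LOWMEM_BYTES - off;
}

/*
 * Same decision order as the quoted prolog (fill value, then length,
 * then destination), with the max_pfn test replaced by a lowmem test.
 */
static inline bool sketch_use_cacheable_memzero(uint32_t dest, int c,
						uint32_t len)
{
	if (c != 0)
		return false;
	if (len < SKETCH_CACHE_BYTES)
		return false;
	return sketch_in_lowmem(dest, len);
}

/* Small self-check showing the intended behaviour of the helpers. */
static inline void sketch_self_check(void)
{
	assert(sketch_use_cacheable_memzero(0xC0001000u, 0, 64u));
	assert(!sketch_use_cacheable_memzero(0xF1000000u, 0, 64u));
}
```

Anything that fails the check would take the generic byte-store path, which is safe for uncacheable destinations. Whether such a check is needed at all depends on the open question above about which callers can pass uncacheable memory.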