From: Arnd Bergmann
To: linux-arm-kernel@lists.infradead.org
Cc: Will Deacon, Andrew Pinski, pinsia@gmail.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] ARM64: Improve copy_page for 128 cache line sizes.
Date: Mon, 21 Dec 2015 14:42:58 +0100
References: <1450570278-19404-1-git-send-email-apinski@cavium.com> <20151221124637.GN23092@arm.com>
In-Reply-To: <20151221124637.GN23092@arm.com>
Message-Id: <201512211442.58803.arnd@arndb.de>

On Monday 21 December 2015, Will Deacon wrote:
> On Sat, Dec 19, 2015 at 04:11:18PM -0800, Andrew Pinski wrote:
> > Adding a check for the cache line size is not much overhead.
> > Special case 128 byte cache line size.
> > This improves copy_page by 85% on ThunderX compared to the
> > original implementation.
>
> So this patch seems to:
>
>   - Align the loop
>   - Increase the prefetch size
>   - Unroll the loop once
>
> Do you know where your 85% boost comes from between these? I'd really
> like to avoid having multiple versions of copy_page, if possible, but
> maybe we could end up with something that works well enough regardless
> of cacheline size. Understanding what your bottleneck is would help to
> lead us in the right direction.
>
> Also, how are you measuring the improvement? If you can share your
> test somewhere, I can see how it affects the other systems I have access
> to.

A related question would be how other CPU cores are affected by the change.
The test for the cache line size is going to take a few cycles, possibly
a lot on certain implementations, e.g. if we ever get one where 'mrs' is
microcoded or trapped by a hypervisor.

Are there any possible downsides to using the ThunderX version on other
microarchitectures too and skip the check?

	Arnd
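
For readers unfamiliar with the check under discussion: on arm64 the data cache line size is normally derived from the CTR_EL0 system register, which is read with an 'mrs' instruction. The sketch below is illustrative only and is not taken from the patch. The function names are hypothetical, and the user-space 'mrs' read assumes a Linux arm64 system with a GCC-compatible compiler, where the access is either permitted or emulated.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Decode the smallest data cache line size, in bytes, from a CTR_EL0 value.
 * CTR_EL0.DminLine (bits [19:16]) holds log2 of the line size in 4-byte
 * words, so the size in bytes is 4 << DminLine.
 */
static inline unsigned int dcache_line_size_from_ctr(uint64_t ctr)
{
    unsigned int dminline = (unsigned int)((ctr >> 16) & 0xf);

    return 4u << dminline;
}

#if defined(__aarch64__) && defined(__linux__) && defined(__GNUC__)
/*
 * Read CTR_EL0 with 'mrs'. This is the access whose cost is in question:
 * it may be cheap, or it may trap to a higher exception level and be
 * emulated there.
 */
static inline uint64_t read_ctr_el0(void)
{
    uint64_t ctr;

    __asm__ volatile("mrs %0, ctr_el0" : "=r"(ctr));
    return ctr;
}
#endif

/* Sanity checks on hand-built register values (only DminLine is set). */
void check_examples(void)
{
    assert(dcache_line_size_from_ctr(UINT64_C(4) << 16) == 64);  /* 64-byte lines */
    assert(dcache_line_size_from_ctr(UINT64_C(5) << 16) == 128); /* 128-byte lines */
}
```

A copy_page variant that special-cases 128-byte lines would perform a read and decode like this on each call unless the result is cached or the choice is made once at boot, which is where the extra cycles come from.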