Date: Fri, 30 Jun 2017 06:10:59 -0700
From: tip-bot for Paolo Abeni
To: linux-tip-commits@vger.kernel.org
Cc: bp@alien8.de, hpa@zytor.com, pabeni@redhat.com, mingo@kernel.org, luto@kernel.org, brgerst@gmail.com, dvlasenk@redhat.com, keescook@chromium.org, tglx@linutronix.de, hannes@stressinduktion.org, peterz@infradead.org, jpoimboe@redhat.com, linux-kernel@vger.kernel.org, gnomes@lxorguk.ukuu.org.uk, torvalds@linux-foundation.org
In-Reply-To: <4533a1d101fd460f80e21329a34928fad521c1d4.1498744345.git.pabeni@redhat.com>
References: <4533a1d101fd460f80e21329a34928fad521c1d4.1498744345.git.pabeni@redhat.com>
Subject: [tip:x86/asm] x86/uaccess: Optimize copy_user_enhanced_fast_string() for short strings

Commit-ID:  236222d39347e0e486010f10c1493e83dbbdfba8
Gitweb:     http://git.kernel.org/tip/236222d39347e0e486010f10c1493e83dbbdfba8
Author:     Paolo Abeni <pabeni@redhat.com>
AuthorDate: Thu, 29 Jun 2017 15:55:58 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 30 Jun 2017 09:52:51 +0200

x86/uaccess: Optimize copy_user_enhanced_fast_string() for short strings

According to the Intel datasheet, the REP MOVSB instruction has a
fairly heavy setup cost (50 ticks), which hurts short string copy
operations.

This change avoids that cost by jumping to the explicit copy loop
already available in the unrolled code for strings shorter than
64 bytes.

The 64-byte cutoff is arbitrary from the code-logic point of view: it
was selected by measurement, as the largest value that still ensures a
measurable gain.

Micro-benchmarks of the __copy_from_user() function with lengths in
the [0-63] range show the following performance gains (the shorter
the string, the larger the gain):

 - in the [55%-4%] range on an Intel Xeon(R) CPU E5-2690 v4
 - in the [72%-9%] range on an Intel Core i7-4810MQ

Other tested CPUs - namely Intel Atom S1260 and AMD Opteron 8216 -
show no difference, because they do not expose the ERMS feature bit.
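For illustration, the dispatch this patch adds boils down to the
following user-space C sketch; every name in it is made up for this
example, and the real implementation is the x86-64 assembly in the
diff below:

#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHORT_COPY_CUTOFF 64	/* same cutoff as the patch */

/*
 * Plain copy loop, mirroring the unrolled code's short-string path:
 * 8-byte chunks first, then single bytes. Like the kernel code, this
 * relies on x86 tolerating unaligned 64-bit accesses.
 */
static void copy_short(void *dst, const void *src, size_t len)
{
	uint64_t *d8 = dst;
	const uint64_t *s8 = src;

	while (len >= 8) {
		*d8++ = *s8++;
		len -= 8;
	}

	unsigned char *d1 = (unsigned char *)d8;
	const unsigned char *s1 = (const unsigned char *)s8;

	while (len--)
		*d1++ = *s1++;
}

void copy_dispatch(void *dst, const void *src, size_t len)
{
	if (len < SHORT_COPY_CUTOFF)
		copy_short(dst, src, len);	/* skip the REP MOVSB setup cost */
	else
		memcpy(dst, src, len);		/* stand-in for REP MOVSB */
}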
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/4533a1d101fd460f80e21329a34928fad521c1d4.1498744345.git.pabeni@redhat.com
[ Clarified the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/lib/copy_user_64.S | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/lib/copy_user_64.S b/arch/x86/lib/copy_user_64.S
index c595957..020f75c 100644
--- a/arch/x86/lib/copy_user_64.S
+++ b/arch/x86/lib/copy_user_64.S
@@ -37,7 +37,7 @@ ENTRY(copy_user_generic_unrolled)
 	movl %edx,%ecx
 	andl $63,%edx
 	shrl $6,%ecx
-	jz 17f
+	jz .L_copy_short_string
 1:	movq (%rsi),%r8
 2:	movq 1*8(%rsi),%r9
 3:	movq 2*8(%rsi),%r10
@@ -58,7 +58,8 @@ ENTRY(copy_user_generic_unrolled)
 	leaq 64(%rdi),%rdi
 	decl %ecx
 	jnz 1b
-17:	movl %edx,%ecx
+.L_copy_short_string:
+	movl %edx,%ecx
 	andl $7,%edx
 	shrl $3,%ecx
 	jz 20f
@@ -174,6 +175,8 @@ EXPORT_SYMBOL(copy_user_generic_string)
  */
 ENTRY(copy_user_enhanced_fast_string)
 	ASM_STAC
+	cmpl $64,%edx
+	jb .L_copy_short_string	/* less than 64 bytes, avoid the costly 'rep' */
 	movl %edx,%ecx
 1:	rep
 	movsb
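The posting does not include the benchmark harness itself; a minimal
user-space sketch in the same spirit could look like the code below,
with memcpy() standing in for __copy_from_user(), so absolute numbers
will differ from in-kernel measurements:

#include <stdio.h>
#include <string.h>
#include <time.h>

#define ITERS 10000000UL

int main(void)
{
	static unsigned char src[64], dst[64];
	struct timespec t0, t1;

	/* Time many back-to-back copies of every length in [0-63]. */
	for (size_t len = 0; len < 64; len++) {
		clock_gettime(CLOCK_MONOTONIC, &t0);
		for (unsigned long i = 0; i < ITERS; i++) {
			memcpy(dst, src, len);
			/* keep the compiler from eliding the copy */
			asm volatile("" : : "r"(dst) : "memory");
		}
		clock_gettime(CLOCK_MONOTONIC, &t1);

		double ns = (t1.tv_sec - t0.tv_sec) * 1e9
			    + (t1.tv_nsec - t0.tv_nsec);
		printf("len %2zu: %.2f ns/copy\n", len, ns / ITERS);
	}
	return 0;
}

Build with something like 'gcc -O2 bench.c' and compare runs on ERMS
and non-ERMS machines; only the relative per-length trend is
meaningful, not the absolute per-copy cost.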