From: Paolo Abeni
To: x86@kernel.org
Cc: Thomas Gleixner, Ingo Molnar, "H. Peter Anvin", Kees Cook,
    Hannes Frederic Sowa, Linus Torvalds, Alan Cox,
    linux-kernel@vger.kernel.org
Subject: [PATCH] x86/uaccess: optimize copy_user_enhanced_fast_string for short string
Date: Thu, 29 Jun 2017 15:55:58 +0200
Message-Id: <4533a1d101fd460f80e21329a34928fad521c1d4.1498744345.git.pabeni@redhat.com>

According to the Intel datasheet, the rep movsb instruction has a
significant setup cost - 50 ticks - which badly affects short string
copy operations.

This change tries to avoid that cost by calling the explicit loop
available in the unrolled code for strings shorter than 64 bytes. This
threshold was selected empirically as the largest value that still
ensures a measurable gain.

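As an illustration only, the dispatch described above can be modelled
in plain C. This is a sketch, not the kernel code: copy_model(),
copy_short() and SHORT_COPY_THRESHOLD are hypothetical names, memcpy()
stands in for "rep movsb", and user-access fault handling is left out
entirely.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name; the patch hard-codes the value in "cmpl $64,%edx". */
#define SHORT_COPY_THRESHOLD 64

/* Model of the .L_copy_short_string tail: 8-byte words, then single bytes. */
static void copy_short(unsigned char *dst, const unsigned char *src, size_t len)
{
	size_t words = len >> 3;	/* shrl $3,%ecx */
	size_t bytes = len & 7;		/* andl $7,%edx */
	size_t i;

	for (i = 0; i < words; i++) {
		memcpy(dst, src, 8);
		dst += 8;
		src += 8;
	}
	for (i = 0; i < bytes; i++)
		dst[i] = src[i];
}

/* Returns 1 when the short path is taken, 0 otherwise (illustration only). */
static int copy_model(void *dst, const void *src, size_t len)
{
	if (len < SHORT_COPY_THRESHOLD) {	/* jb .L_copy_short_string */
		copy_short(dst, src, len);
		return 1;
	}
	memcpy(dst, src, len);			/* stands in for "rep movsb" */
	return 0;
}
```

In this model a 63-byte copy takes the explicit loop, while a 64-byte
copy still goes through the bulk path, matching the unsigned "jb"
comparison against 64.
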
Micro benchmarks of the __copy_from_user() function with lengths in
the [0-63] range show the following performance gain (the shorter the
string, the larger the gain):

 - in the [55%-4%] range on Intel Xeon(R) CPU E5-2690 v4
 - in the [72%-9%] range on Intel Core i7-4810MQ

Other tested CPUs - namely Intel Atom S1260 and AMD Opteron 8216 - show
no difference, because they do not expose the ERMS feature bit.

Signed-off-by: Paolo Abeni
---
 arch/x86/lib/copy_user_64.S | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/lib/copy_user_64.S b/arch/x86/lib/copy_user_64.S
index c595957..020f75c 100644
--- a/arch/x86/lib/copy_user_64.S
+++ b/arch/x86/lib/copy_user_64.S
@@ -37,7 +37,7 @@ ENTRY(copy_user_generic_unrolled)
 	movl %edx,%ecx
 	andl $63,%edx
 	shrl $6,%ecx
-	jz 17f
+	jz .L_copy_short_string
 1:	movq (%rsi),%r8
 2:	movq 1*8(%rsi),%r9
 3:	movq 2*8(%rsi),%r10
@@ -58,7 +58,8 @@ ENTRY(copy_user_generic_unrolled)
 	leaq 64(%rdi),%rdi
 	decl %ecx
 	jnz 1b
-17:	movl %edx,%ecx
+.L_copy_short_string:
+	movl %edx,%ecx
 	andl $7,%edx
 	shrl $3,%ecx
 	jz 20f
@@ -174,6 +175,8 @@ EXPORT_SYMBOL(copy_user_generic_string)
  */
 ENTRY(copy_user_enhanced_fast_string)
 	ASM_STAC
+	cmpl $64,%edx
+	jb .L_copy_short_string	/* less than 64 bytes, avoid the costly 'rep' */
 	movl %edx,%ecx
 1:	rep
 	movsb
-- 
2.9.4