From: Kees Cook
Date: Wed, 21 Jun 2017 10:38:46 -0700
Subject: Re: [PATCH] x86/uaccess: use unrolled string copy for short strings
To: Paolo Abeni
Cc: x86@kernel.org, Thomas Gleixner, Ingo Molnar, "H. Peter Anvin",
    Al Viro, Hannes Frederic Sowa, LKML
In-Reply-To: <63d913f28bc64bd4ea66a39a532f0b59ee015382.1498039056.git.pabeni@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jun 21, 2017 at 4:09 AM, Paolo Abeni wrote:
> The 'rep' prefix suffers from a relevant "setup cost"; as a result,
> for short strings, copies using unrolled loops are faster than even
> the optimized 'rep'-based string copy.
>
> This change updates __copy_user_generic() to use the unrolled
> version for small copy lengths. The threshold for a short string -
> 64 bytes - has been selected empirically as the largest value that
> still ensures a measurable gain.
>
> A micro-benchmark of __copy_from_user() with different lengths shows
> the following:
>
> string len   vanilla   patched      delta
>      bytes     ticks     ticks    tick(%)
>
>          0        58        26    32(55%)
>          1        49        29    20(40%)
>          2        49        31    18(36%)
>          3        49        32    17(34%)
>          4        50        34    16(32%)
>          5        49        35    14(28%)
>          6        49        36    13(26%)
>          7        49        38    11(22%)
>          8        50        31    19(38%)
>          9        51        33    18(35%)
>         10        52        36    16(30%)
>         11        52        37    15(28%)
>         12        52        38    14(26%)
>         13        52        40    12(23%)
>         14        52        41    11(21%)
>         15        52        42    10(19%)
>         16        51        34    17(33%)
>         17        51        35    16(31%)
>         18        52        37    15(28%)
>         19        51        38    13(25%)
>         20        52        39    13(25%)
>         21        52        40    12(23%)
>         22        51        42     9(17%)
>         23        51        46      5(9%)
>         24        52        35    17(32%)
>         25        52        37    15(28%)
>         26        52        38    14(26%)
>         27        52        39    13(25%)
>         28        52        40    12(23%)
>         29        53        42    11(20%)
>         30        52        43     9(17%)
>         31        52        44     8(15%)
>         32        51        36    15(29%)
>         33        51        38    13(25%)
>         34        51        39    12(23%)
>         35        51        41    10(19%)
>         36        52        41    11(21%)
>         37        52        43     9(17%)
>         38        51        44     7(13%)
>         39        52        46     6(11%)
>         40        51        37    14(27%)
>         41        50        38    12(24%)
>         42        50        39    11(22%)
>         43        50        40    10(20%)
>         44        50        42     8(16%)
>         45        50        43     7(14%)
>         46        50        43     7(14%)
>         47        50        45     5(10%)
>         48        50        37    13(26%)
>         49        49        38    11(22%)
>         50        50        40    10(20%)
>         51        50        42     8(16%)
>         52        50        42     8(16%)
>         53        49        46      3(6%)
>         54        50        46      4(8%)
>         55        49        48      1(2%)
>         56        50        39    11(22%)
>         57        50        40    10(20%)
>         58        49        42     7(14%)
>         59        50        42     8(16%)
>         60        50        46      4(8%)
>         61        50        47      3(6%)
>         62        50        48      2(4%)
>         63        50        48      2(4%)
>         64        51        38    13(25%)
>
> Above 64 bytes the gain fades away.
>
> Very similar values were collected for __copy_to_user().
> UDP receive performance under flood with small packets using recvfrom()
> increases by ~5%.
>
> Signed-off-by: Paolo Abeni
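(As a rough, user-space illustration of the "ticks per copy length"
methodology above - not Paolo's actual harness, which times
__copy_from_user() inside the kernel - something along these lines can be
used. memcpy() stands in for the copy routine and the absolute numbers
will not match the table:)

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <x86intrin.h>          /* __rdtsc() */

    #define ITERS 100000

    /* Average TSC ticks for one copy of 'len' bytes over ITERS runs. */
    static uint64_t time_copy(char *dst, const char *src, size_t len)
    {
            uint64_t start = __rdtsc();
            int i;

            for (i = 0; i < ITERS; i++) {
                    memcpy(dst, src, len);
                    /* Keep the compiler from dropping the copy. */
                    asm volatile("" : : "r"(dst) : "memory");
            }
            return (__rdtsc() - start) / ITERS;
    }

    int main(void)
    {
            static char src[128], dst[128];
            size_t len;

            printf("len\tticks\n");
            for (len = 0; len <= 64; len++)
                    printf("%zu\t%llu\n", len,
                           (unsigned long long)time_copy(dst, src, len));
            return 0;
    }

Built with gcc -O2 on x86-64, this prints an average tick count for each
copy length from 0 to 64 bytes.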
Since there are no regressions here, this seems sensible to me. :)

Reviewed-by: Kees Cook

-Kees

> ---
>  arch/x86/include/asm/uaccess_64.h | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
> index c5504b9..16a8871 100644
> --- a/arch/x86/include/asm/uaccess_64.h
> +++ b/arch/x86/include/asm/uaccess_64.h
> @@ -28,6 +28,9 @@ copy_user_generic(void *to, const void *from, unsigned len)
>  {
>          unsigned ret;
>
> +        if (len <= 64)
> +                return copy_user_generic_unrolled(to, from, len);
> +
>          /*
>           * If CPU has ERMS feature, use copy_user_enhanced_fast_string.
>           * Otherwise, if CPU has rep_good feature, use copy_user_generic_string.
> --
> 2.9.4

--
Kees Cook
Pixel Security
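(For readers outside the x86 tree: the hunk above simply short-circuits
small copies ahead of the CPU-feature-based dispatch. A standalone,
compilable model of that ordering is sketched below; it is only an
illustration - the real copy_user_generic() selects the long-copy routine
via alternatives patching at boot rather than runtime branches, and the
stub routines here just wrap memcpy():)

    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    /* Stand-ins for the kernel's real copy routines. */
    static unsigned copy_unrolled(void *to, const void *from, unsigned len)
    { memcpy(to, from, len); return 0; }
    static unsigned copy_rep_movsb(void *to, const void *from, unsigned len)
    { memcpy(to, from, len); return 0; }
    static unsigned copy_rep_movsq(void *to, const void *from, unsigned len)
    { memcpy(to, from, len); return 0; }

    /* Pretend feature flags; the kernel resolves this choice at boot. */
    static bool cpu_has_erms = true;
    static bool cpu_has_rep_good = true;

    /*
     * Models the ordering after the patch: copies of 64 bytes or less
     * always take the unrolled path, longer ones use the best available
     * 'rep' variant.
     */
    static unsigned model_copy_user_generic(void *to, const void *from,
                                            unsigned len)
    {
            if (len <= 64)
                    return copy_unrolled(to, from, len);
            if (cpu_has_erms)
                    return copy_rep_movsb(to, from, len);
            if (cpu_has_rep_good)
                    return copy_rep_movsq(to, from, len);
            return copy_unrolled(to, from, len);
    }

    int main(void)
    {
            char src[128] = "short copy", dst[128];

            model_copy_user_generic(dst, src, 11);   /* <= 64: unrolled path */
            model_copy_user_generic(dst, src, 128);  /*  > 64: 'rep' path    */
            printf("%s\n", dst);
            return 0;
    }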