Subject: Re: [PATCH] x86: only use ERMS for user copies for larger sizes
To: Ingo Molnar
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, "H. Peter Anvin",
    the arch/x86 maintainers, Linus Torvalds, Andrew Morton,
    Andy Lutomirski, Peter Zijlstra, Denys Vlasenko, Brian Gerst,
    linux-kernel@vger.kernel.org, pabeni@redhat.com
References: <02bfc577-32a5-66be-64bf-d476b7d447d2@kernel.dk>
 <20181121063609.GA109082@gmail.com>
From: Jens Axboe
Message-ID: <48e27a3a-2bb2-ff41-3512-8aeb3fd59e57@kernel.dk>
Date: Wed, 21 Nov 2018 06:32:26 -0700
In-Reply-To: <20181121063609.GA109082@gmail.com>

On 11/20/18 11:36 PM, Ingo Molnar wrote:
>
> [ Cc:-ed a few other gents and lkml. ]
>
> * Jens Axboe wrote:
>
>> Hi,
>>
>> So this is a fun one...
>> While I was doing the aio polled work, I noticed
>> that the submitting process spent a substantial amount of time copying
>> data to/from userspace. For aio, that's iocb and io_event, which are 64
>> and 32 bytes respectively. Looking closer at this, it seems that
>> ERMS rep movsb is SLOWER for smaller copies, due to a higher startup
>> cost.
>>
>> I came up with this hack to test it out, and lo and behold, we now cut
>> the time spent in copying in half. 50% less.
>>
>> Since these kinds of patches tend to lend themselves to bike shedding, I
>> also ran a string of kernel compilations out of RAM. Results are as
>> follows:
>>
>> Patched : 62.86s avg, stddev 0.65s
>> Stock   : 63.73s avg, stddev 0.67s
>>
>> which would also seem to indicate that we're faster punting smaller
>> (< 128 byte) copies.
>>
>> CPU: Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
>>
>> Interestingly, text size is smaller with the patch as well?!
>>
>> I'm sure there are smarter ways to do this, but results look fairly
>> conclusive. FWIW, the behavioral change was introduced by:
>>
>> commit 954e482bde20b0e208fd4d34ef26e10afd194600
>> Author: Fenghua Yu
>> Date:   Thu May 24 18:19:45 2012 -0700
>>
>>     x86/copy_user_generic: Optimize copy_user_generic with CPU erms feature
>>
>> which contains nothing in terms of benchmarking or results, just claims
>> that the new hotness is better.
>>
>> Signed-off-by: Jens Axboe
>> ---
>>
>> diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
>> index a9d637bc301d..7dbb78827e64 100644
>> --- a/arch/x86/include/asm/uaccess_64.h
>> +++ b/arch/x86/include/asm/uaccess_64.h
>> @@ -29,16 +29,27 @@ copy_user_generic(void *to, const void *from, unsigned len)
>>  {
>>  	unsigned ret;
>>
>> +	/*
>> +	 * For smaller copies, don't use ERMS as it's slower.
>> +	 */
>> +	if (len < 128) {
>> +		alternative_call(copy_user_generic_unrolled,
>> +			copy_user_generic_string, X86_FEATURE_REP_GOOD,
>> +			ASM_OUTPUT2("=a" (ret), "=D" (to), "=S" (from),
>> +				    "=d" (len)),
>> +			"1" (to), "2" (from), "3" (len)
>> +			: "memory", "rcx", "r8", "r9", "r10", "r11");
>> +		return ret;
>> +	}
>> +
>>  	/*
>>  	 * If CPU has ERMS feature, use copy_user_enhanced_fast_string.
>>  	 * Otherwise, if CPU has rep_good feature, use copy_user_generic_string.
>>  	 * Otherwise, use copy_user_generic_unrolled.
>>  	 */
>>  	alternative_call_2(copy_user_generic_unrolled,
>> -			 copy_user_generic_string,
>> -			 X86_FEATURE_REP_GOOD,
>> -			 copy_user_enhanced_fast_string,
>> -			 X86_FEATURE_ERMS,
>> +			 copy_user_generic_string, X86_FEATURE_REP_GOOD,
>> +			 copy_user_enhanced_fast_string, X86_FEATURE_ERMS,
>>  			 ASM_OUTPUT2("=a" (ret), "=D" (to), "=S" (from),
>>  				     "=d" (len)),
>>  			 "1" (to), "2" (from), "3" (len)

> So I'm inclined to do something like yours, because clearly the changelog
> of 954e482bde20 was at least partly false: Intel can say whatever they
> want, it's a fact that ERMS has high setup costs for low buffer sizes -
> ERMS is optimized for large size, cache-aligned copies mainly.

I'm actually surprised that something like that was accepted, I guess
2012 was a simpler time :-)

> But the result is counter-intuitive in terms of kernel text footprint,
> plus the '128' is pretty arbitrary - we should at least try to come up
> with a break-even point where manual copy is about as fast as ERMS - on
> at least a single CPU ...

I did some more investigation yesterday, and found this:

commit 236222d39347e0e486010f10c1493e83dbbdfba8
Author: Paolo Abeni
Date:   Thu Jun 29 15:55:58 2017 +0200

    x86/uaccess: Optimize copy_user_enhanced_fast_string() for short strings

which does attempt to rectify it, but only uses ERMS for >= 64 byte
copies. At least for me, it looks like the break-even point is higher
than that, which would mean that something like the below would be more
appropriate.
Adding Paolo, in case he actually wrote a test app for this. In terms of
a test app, I'm always wary of relying only on microbenchmarking for this
type of thing; real-world testing (if stable enough) is much more useful.

diff --git a/arch/x86/lib/copy_user_64.S b/arch/x86/lib/copy_user_64.S
index db4e5aa0858b..21c4d68c5fac 100644
--- a/arch/x86/lib/copy_user_64.S
+++ b/arch/x86/lib/copy_user_64.S
@@ -175,8 +175,8 @@ EXPORT_SYMBOL(copy_user_generic_string)
  */
 ENTRY(copy_user_enhanced_fast_string)
 	ASM_STAC
-	cmpl $64,%edx
-	jb .L_copy_short_string	/* less then 64 bytes, avoid the costly 'rep' */
+	cmpl $128,%edx
+	jb .L_copy_short_string	/* less then 128 bytes, avoid costly 'rep' */
 	movl %edx,%ecx
 1:	rep movsb

--
Jens Axboe