Date: Fri, 31 Mar 2017 00:21:47 +0100
From: Russell King - ARM Linux
To: Linus Torvalds
Cc: Vineet Gupta, Al Viro, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org, Richard Henderson, Will Deacon, Haavard Skinnemoen, Steven Miao, Jesper Nilsson, Mark Salter, Yoshinori Sato, Richard Kuo, Tony Luck, Geert Uytterhoeven, James Hogan, Michal Simek, David Howells, Ley Foon Tan, Jonas Bonn
Subject: Re: [RFC][CFT][PATCHSET v1] uaccess unification
Message-ID: <20170330232147.GL7909@n2100.armlinux.org.uk>
References: <20170329055706.GH29622@ZenIV.linux.org.uk> <3399faa9-795e-39db-42f5-7d1e10bbff9c@synopsys.com> <20170329202939.GI29622@ZenIV.linux.org.uk> <32129bc4-0e0a-c21d-0e94-67f73a09ac6e@synopsys.com> <20170329234246.GL29622@ZenIV.linux.org.uk> <09ead054-f62a-76e2-88e0-8d18592d2604@synopsys.com>

On Thu, Mar 30, 2017 at 01:59:58PM -0700, Linus Torvalds wrote:
> On Thu, Mar 30, 2017 at 1:40 PM, Vineet Gupta wrote:
> >
> > So it's a mixed bag really. Maybe we need some better directed test to really drill
> > it down.
>
> As mentioned in the discussion about ARM, I seriously doubt that the
> inlining will even be noticeable compared to other effects here.

(Sorry to switch sub-threads.)

I'm running tests on that point, concentrating on hdparm -T and perfing
that.
You're right insofar as perf identifies the hotspot as the copy_to_user()
function for that workload, rather than the inlined bits. The top hits in
perf for hdparm -T are:

+   66.52%  hdparm  [k] __copy_to_user_std
+    8.49%  hdparm  [k] generic_file_read_iter
+    3.82%  hdparm  [k] lock_acquire
+    2.80%  hdparm  [k] copy_page_to_iter
+    2.49%  hdparm  [k] find_get_entry
+    1.19%  hdparm  [k] lock_release

Note: perf on ARM is affected by IRQ-disabled regions, so hotspots can be
off. The generic_file_read_iter() one is definitely affected by an
IRQ-disabled region in there.

Here are the average hdparm -T transfer rates and standard deviations over
20 samples:

Unpatched:        Average=320.42 MB/s  sigma=0.878657
Uaccess+inline:   Average=318.77 MB/s  sigma=1.003332
Uaccess+noinline: Average=319.40 MB/s  sigma=1.088354

This pattern, where the noinline version sits between the inlined version
and the unpatched version, shows up in all the measurements I've done so
far, and it points to inlining that code having a slight detrimental
effect. What we don't know is whether uninlining the code without Al's
patch would give a slight boost, but I'm not about to go there.

So this all points towards a very slight advantage to dropping
INLINE_COPY_TO_USER and INLINE_COPY_FROM_USER for ARM, but I'd say it's
really down in the noise, so I'm not concerned.

> (On ARM, hopefully the UAO bit is faster to set, but it's still
> "another instruction before and after", so even if it's not as
> expensive as clac/stac are on current x86 chips, it's an argument
> against inlining)

The UAO set/clear does show up as a hotspot within copy_page_to_iter(),
but as we can see, copy_page_to_iter() is only about 3% of the workload
overall. Within copy_page_to_iter(), it's the __put_user()-based loop
inside fault_in_pages_writeable() that has the hotspot, due to the
repeated enable+disable sequence (or rather, the instruction barriers
that we need).
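As a rough way to put the hdparm -T averages in proportion, here is a small illustrative sketch in Python. It uses only the averages and sigmas quoted above (the individual 20 samples aren't included in this mail), and the `results` layout and `pct_change()` helper are hypothetical names made up for the example. It computes each patched kernel's change relative to the unpatched one.

```python
# Averages and standard deviations (MB/s) quoted above for hdparm -T,
# 20 samples each. Only these summary figures are available here.
results = {
    "Unpatched": (320.42, 0.878657),
    "Uaccess+inline": (318.77, 1.003332),
    "Uaccess+noinline": (319.40, 1.088354),
}


def pct_change(rate: float, baseline: float) -> float:
    """Percentage change of `rate` relative to `baseline` (negative means slower)."""
    return (rate - baseline) / baseline * 100.0


baseline, _ = results["Unpatched"]
for name in ("Uaccess+inline", "Uaccess+noinline"):
    rate, sigma = results[name]
    delta = rate - baseline  # absolute difference in MB/s
    print(f"{name:<16} {delta:+.2f} MB/s ({pct_change(rate, baseline):+.2f}%), "
          f"sigma {sigma:.2f} MB/s")
```

Both patched variants come out within about half a percent of the unpatched kernel (roughly -0.51% inline, -0.32% noinline), and each delta is of the same order as the sigma of a single configuration.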
Perf reports that the barriers account for 8.33% and 17.59% of the time
spent within that function, so, given copy_page_to_iter()'s 2.80% share,
we're actually talking about roughly 0.25% and 0.5% of this workload spent
doing the UAO thing.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.