Subject: Re: [PATCH net-next 1/8] perf: optimize perf_fetch_caller_regs
From: Alexei Starovoitov
To: Peter Zijlstra
CC: Steven Rostedt, "David S . Miller", Ingo Molnar, Daniel Borkmann, Arnaldo Carvalho de Melo, Wang Nan, Josef Bacik, Brendan Gregg
Date: Tue, 5 Apr 2016 10:41:03 -0700
Message-ID: <5703F8AF.7040503@fb.com>
In-Reply-To: <20160405120626.GM3448@twins.programming.kicks-ass.net>
References: <1459831974-2891931-1-git-send-email-ast@fb.com> <1459831974-2891931-2-git-send-email-ast@fb.com> <20160405120626.GM3448@twins.programming.kicks-ass.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On 4/5/16 5:06 AM, Peter Zijlstra wrote:
> On Mon, Apr 04, 2016 at 09:52:47PM -0700, Alexei Starovoitov wrote:
>> avoid memset in perf_fetch_caller_regs, since it's the critical path of all tracepoints.
>> It's called from perf_sw_event_sched, perf_event_task_sched_in and all of perf_trace_##call
>> with this_cpu_ptr(&__perf_regs[..]) which are zero initialized by perpcu_alloc
>
> Its not actually allocated; but because its a static uninitialized
> variable we get .bss like behaviour and the initial value is copied to
> all CPUs when the per-cpu allocator thingy bootstraps SMP IIRC.

Yes, it's .bss-like, in a special section. I think static percpu still
goes through some fancy boot-time init, similar to dynamic percpu. What I
tried to emphasize is that both static and dynamic percpu areas are
guaranteed to be zero-initialized.

>> and
>> subsequent call to perf_arch_fetch_caller_regs initializes the same fields on all archs,
>> so we can safely drop memset from all of the above cases and
>
> Indeed.
>
>> move it into
>> perf_ftrace_function_call that calls it with stack allocated pt_regs.
>
> Hmm, is there a reason that's still on-stack instead of using the
> per-cpu thing, Steve?
>
>> Signed-off-by: Alexei Starovoitov
>
> In any case,
>
> Acked-by: Peter Zijlstra (Intel)

Thanks for the quick review.
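
For anyone following the thread, the argument can be sketched in plain userspace C. This is only an illustration, not the kernel code: struct fake_regs, fake_fetch_caller_regs, fetch_into_static and fetch_into_stack are hypothetical stand-ins for pt_regs, perf_arch_fetch_caller_regs and the two kinds of callers, and an ordinary static variable plays the role of the zero-initialized per-cpu __perf_regs slot.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NR_OTHER 4

/* Hypothetical stand-in for struct pt_regs; the real layout is arch-specific. */
struct fake_regs {
    unsigned long ip;
    unsigned long sp;
    unsigned long bp;
    unsigned long other[NR_OTHER]; /* fields the fetch helper never writes */
};

/* Static storage starts out zeroed, like the .bss-style per-cpu buffer. */
static struct fake_regs static_regs;

/* Stand-in for the arch helper: it writes the same fields on every call. */
static void fake_fetch_caller_regs(struct fake_regs *regs, unsigned long ip,
                                   unsigned long sp, unsigned long bp)
{
    assert(regs != NULL);
    regs->ip = ip;
    regs->sp = sp;
    regs->bp = bp;
}

/*
 * Static buffer: no memset. The fields the helper skips were zero at start
 * and nothing else writes them, so they are still zero on every later call.
 */
static const struct fake_regs *fetch_into_static(unsigned long ip,
                                                 unsigned long sp,
                                                 unsigned long bp)
{
    fake_fetch_caller_regs(&static_regs, ip, sp, bp);
    return &static_regs;
}

/*
 * Caller-provided (for example on-stack) buffer: its contents are
 * indeterminate, so this path keeps the memset.
 */
static void fetch_into_stack(struct fake_regs *regs, unsigned long ip,
                             unsigned long sp, unsigned long bp)
{
    assert(regs != NULL);
    memset(regs, 0, sizeof(*regs));
    fake_fetch_caller_regs(regs, ip, sp, bp);
}

/* Returns 1 if every field the helper does not write is zero, else 0. */
static int untouched_fields_are_zero(const struct fake_regs *regs)
{
    for (size_t i = 0; i < NR_OTHER; i++) {
        if (regs->other[i] != 0)
            return 0;
    }
    return 1;
}
```

Calling fetch_into_static() repeatedly and checking untouched_fields_are_zero() on the result should hold every time, which is the property that lets the memset move out of the hot path and into the one caller that passes a stack buffer.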