Message-ID: <546DD445.5080108@hitachi.com>
Date: Thu, 20 Nov 2014 20:45:09 +0900
From: Masami Hiramatsu
Organization: Hitachi, Ltd., Japan
To: Steven Rostedt
Cc: LKML, Ingo Molnar, Namhyung Kim, "Paul E. McKenney"
Subject: Re: [PATCH] tracing: Fix race of function probes counting
References: <20141118134643.4b550ee4@gandalf.local.home>
In-Reply-To: <20141118134643.4b550ee4@gandalf.local.home>

(2014/11/19 3:46), Steven Rostedt wrote:
> 
> The function probe counting for traceon and traceoff suffered a race
> condition where if the probe was executing on two or more CPUs at the
> same time, it could decrement the counter by more than one when
> disabling (or enabling) the tracer only once.
> 
> The way the traceon and traceoff probes are suppose to work is that
> they disable (or enable) tracing once per count. If a user were to
> echo 'schedule:traceoff:3' into set_ftrace_filter, then when the
> schedule function was called, it would disable tracing. But the count
> should only be decremented once (to 2). Then if the user enabled tracing
> again (via tracing_on file), the next call to schedule would disable
> tracing again and the count would be decremented to 1.
> 
> But if multiple CPUS called schedule at the same time, it is possible
> that the count would be decremented more than once because of the
> simple "count--" used.
> 
> By reading the count into a local variable and using memory barriers
> we can guarantee that the count would only be decremented once per
> disable (or enable).
> 
> The stack trace probe had a similar race, but here the stack trace will
> decrement for each time it is called. But this had the read-modify-
> write race, where it could stack trace more than the number of times
> that was specified. This case we use a cmpxchg to stack trace only the
> number of times specified.
> 
> The dump probes can still use the old "update_count()" function as
> they only run once, and that is controlled by the dump logic
> itself.
> 
> Signed-off-by: Steven Rostedt

I have found a couple of typos, but basically, it looks OK.

Reviewed-by: Masami Hiramatsu

> ---
>  kernel/trace/trace_functions.c | 117 +++++++++++++++++++++++++++++++++--------
>  1 file changed, 96 insertions(+), 21 deletions(-)
> 
> diff --git a/kernel/trace/trace_functions.c b/kernel/trace/trace_functions.c
> index a8e0c7666164..973db52eb070 100644
> --- a/kernel/trace/trace_functions.c
> +++ b/kernel/trace/trace_functions.c
> @@ -261,37 +261,74 @@ static struct tracer function_trace __tracer_data =
>  };
> 
>  #ifdef CONFIG_DYNAMIC_FTRACE
> -static int update_count(void **data)
> +static void update_traceon_count(void **data, int on)

btw, why don't you use bool instead of int?

>  {
> -	unsigned long *count = (long *)data;
> +	long *count = (long *)data;
> +	long old_count = *count;
> 
> -	if (!*count)
> -		return 0;
> +	/*
> +	 * Tracing gets disabled (or enabled) once per count.
> +	 * This function can be called at the same time on mulitple CPUs.
                                                            ^^^^^^^^ multiple
> +	 * It is fine if both disable (or enable) tracing, as disabling
> +	 * (or enabling) the second time doesn't do anything as the
> +	 * state of the tracer is already disabled (or enabled).
> +	 * What needs to be synchronized in this case is that the count
> +	 * only gets decremented once, even if the tracer is disabled
> +	 * (or enabled) twice, as the second one is really a nop.
> +	 *
> +	 * The memory barriers guarantee that we only decrement the
> +	 * counter once. First the count is read to a local variable
> +	 * and a read barrier is used to make sure that it is loaded
> +	 * before checking if the tracer is in the state we want.
> +	 * If the tracer is not in the state we want, then the count
> +	 * is guaranteed to be the old count.
> +	 *
> +	 * Next the tracer is set to the state we want (disabled or enabled)
> +	 * then a write memory barrier is used to make sure that
> +	 * the new state is visible before changing the counter by
> +	 * one minus the old counter. This guarantees that another CPU
> +	 * executing this code will see the new state before seeing
> +	 * the new counter value, and would not do anthing if the new
                                                    ^^^^^^^anything?
> +	 * counter is seen.
> +	 *
> +	 * Note, there is no synchronization between this and a user
> +	 * setting the tracing_on file. But we currently don't care
> +	 * about that.
> +	 */
> +	if (!old_count)
> +		return;
> 
> -	if (*count != -1)
> -		(*count)--;
> +	/* Make sure we see count before checking tracing state */
> +	smp_rmb();
> 
> -	return 1;
> +	if (on == !!tracing_is_on())
> +		return;
> +
> +	if (on)
> +		tracing_on();
> +	else
> +		tracing_off();
> +
> +	/* unlimited? */
> +	if (old_count == -1)
> +		return;
> +
> +	/* Make sure tracing state is visible before updating count */
> +	smp_wmb();
> +
> +	*count = old_count - 1;
>  }
> 
>  static void
>  ftrace_traceon_count(unsigned long ip, unsigned long parent_ip, void **data)
>  {
> -	if (tracing_is_on())
> -		return;
> -
> -	if (update_count(data))
> -		tracing_on();
> +	update_traceon_count(data, 1);
>  }
> 
>  static void
>  ftrace_traceoff_count(unsigned long ip, unsigned long parent_ip, void **data)
>  {
> -	if (!tracing_is_on())
> -		return;
> -
> -	if (update_count(data))
> -		tracing_off();
> +	update_traceon_count(data, 0);
>  }
> 
>  static void
> @@ -330,11 +367,49 @@ ftrace_stacktrace(unsigned long ip, unsigned long parent_ip, void **data)
>  static void
>  ftrace_stacktrace_count(unsigned long ip, unsigned long parent_ip, void **data)
>  {
> -	if (!tracing_is_on())
> -		return;
> +	long *count = (long *)data;
> +	long old_count;
> +	long new_count;
> 
> -	if (update_count(data))
> -		trace_dump_stack(STACK_SKIP);
> +	/*
> +	 * Stack traces should only execute the number of times the
> +	 * user specified in the counter.
> +	 */
> +	do {
> +
> +		if (!tracing_is_on())
> +			return;
> +
> +		old_count = *count;
> +
> +		if (!old_count)
> +			return;
> +
> +		/* unlimited? */
> +		if (old_count == -1) {
> +			trace_dump_stack(STACK_SKIP);
> +			return;
> +		}
> +
> +		new_count = old_count - 1;
> +		new_count = cmpxchg(count, old_count, new_count);
> +		if (new_count == old_count)
> +			trace_dump_stack(STACK_SKIP);
> +
> +	} while (new_count != old_count);
> +}
> +
> +static int update_count(void **data)
> +{
> +	unsigned long *count = (long *)data;
> +
> +	if (!*count)
> +		return 0;
> +
> +	if (*count != -1)
> +		(*count)--;
> +
> +	return 1;
>  }
> 
>  static void
> 
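For anyone following along, this is how I read the ordering in
update_traceon_count(). Below is only a user-space sketch of the same
idea, with C11 fences standing in for smp_rmb()/smp_wmb(); the names
tracing_state, probe_count and toggle_once() are invented for the
sketch and are not part of the patch.

/* User-space analogy of update_traceon_count(), for illustration only.
 * C11 fences stand in for the kernel's smp_rmb()/smp_wmb(), and the
 * variable names below are invented for this sketch.
 */
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool tracing_state;       /* stands in for tracing_is_on() */
static atomic_long probe_count;         /* the probe's remaining count   */

static void toggle_once(bool on)
{
	long old_count = atomic_load_explicit(&probe_count,
					      memory_order_relaxed);

	if (!old_count)
		return;

	/* Load the count before looking at the tracer state (smp_rmb()). */
	atomic_thread_fence(memory_order_acquire);

	/* Another caller already switched the state; that caller is the
	 * one that gets to consume a count, so do not decrement again. */
	if (on == atomic_load(&tracing_state))
		return;

	atomic_store(&tracing_state, on);

	if (old_count == -1)            /* -1 means unlimited */
		return;

	/* Publish the new state before the new count (smp_wmb()), so a
	 * caller that still sees the old count also sees the new state
	 * and returns above instead of decrementing a second time. */
	atomic_thread_fence(memory_order_release);

	atomic_store_explicit(&probe_count, old_count - 1,
			      memory_order_relaxed);
}

(This is only meant to illustrate the load/fence/store ordering, not to
be a drop-in for the kernel code.)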
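And a similar user-space sketch of the bounded stack-trace counting,
where the cmpxchg() retry loop is what keeps the number of dumps from
exceeding the requested count even with concurrent callers. It is
simplified (the tracing_is_on() check is dropped), and do_stacktrace(),
trace_limit and stacktrace_probe() are invented names for this example:

/* User-space sketch of the cmpxchg() loop in ftrace_stacktrace_count().
 * Only a caller whose compare-and-swap wins may dump a stack trace, so
 * at most 'trace_limit' dumps happen no matter how many callers race.
 */
#include <stdatomic.h>
#include <stdio.h>

static atomic_long trace_limit = 3;     /* like 'schedule:stacktrace:3' */

static void do_stacktrace(void)
{
	printf("stack trace\n");        /* stands in for trace_dump_stack() */
}

static void stacktrace_probe(void)
{
	long old_count;

	do {
		old_count = atomic_load(&trace_limit);

		if (!old_count)         /* budget is used up */
			return;

		if (old_count == -1) {  /* -1 means unlimited */
			do_stacktrace();
			return;
		}

		/* A loser of the compare-and-swap re-reads the count and
		 * retries, so the budget is never over-consumed. */
	} while (!atomic_compare_exchange_weak(&trace_limit, &old_count,
					       old_count - 1));

	do_stacktrace();
}

int main(void)
{
	for (int i = 0; i < 10; i++)
		stacktrace_probe();     /* dumps only 3 times */
	return 0;
}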
-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com