Received: by 2002:a25:c593:0:0:0:0:0 with SMTP id v141csp203728ybe; Tue, 3 Sep 2019 20:58:10 -0700 (PDT) X-Google-Smtp-Source: APXvYqxHSX5msqdSdaU8JXD85IYmtvbExsyftMajZkqoF0Fd8W7C9KpvvkkysU0hryDc0tDpKQM8 X-Received: by 2002:aa7:980c:: with SMTP id e12mr16798697pfl.79.1567569490419; Tue, 03 Sep 2019 20:58:10 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1567569490; cv=none; d=google.com; s=arc-20160816; b=aiCtlTkYwKnnFCzjcQTTUzkmC085NmUaR1hcu3aq6+S+w2Pex6nm8xtSLESWaLRVKX GKEFVPzOV2F/9farB5HDiFHikZnq0qnsZwLdTNinfuxcl6ZP4pQkJjMsGR+/orPvn7aV KrKzH0sl2xVkxHjiEtnFQZ4BBsmcOvFLZFIw0IsdZ12uT+pMbQM5cFa/19C7+t4SApUV YeSGqlRT4bPPg95227IAKBiWlNYvjIut2g+Mmk8pSAs+Rqni8OphgM0Hu/JmI3ZKxgJ/ NQ6awgxDUwqQsEQueSnc3AoIo864bfv1lS5pah6G+3+iSnU1Te3D8b/3VcZXWfiLt+29 jkVQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:content-disposition :mime-version:message-id:cc:to:from:date:dkim-signature; bh=ZmOY7m3IyWINx7xPEGcvNHI4tV2iIeXUtDevd4vyPOA=; b=now7s9zKgz5W/sNvHAok4omc1N/QMTT3TelKza7oC087yzWBwqnYj7YsoNBpdtAGZy 5unHqc2oKAFzh2NzZj4W2p5+wnxcbPsrK8+RGqqTo+EB8NjyYtc5EwsxllIFrUTOdO63 vYQEPwq6p2f06Bu8XcL3NQye93WDcC4pWOj141kqK3hJAaf0dujqC+SsCkXxj1pqXPWP kkKJ2rVdKq1Fs+/lAkUP6tESdAf7e4foWMEaH9nViRkQ4QkFxzyRb/6t3oKsvrDLyNUn Xcgeo1uwMgLJ/EFv4ZD6yJhqXORXKNkovJ13QpI4WOjsXdQNk56VGy5Gvcf4v+9JFARQ vo2g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@joelfernandes.org header.s=google header.b=iQjibxV1; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id x1si16729869pgt.258.2019.09.03.20.57.52; Tue, 03 Sep 2019 20:58:10 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@joelfernandes.org header.s=google header.b=iQjibxV1; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727813AbfIDD5C (ORCPT + 99 others); Tue, 3 Sep 2019 23:57:02 -0400 Received: from mail-pl1-f196.google.com ([209.85.214.196]:33663 "EHLO mail-pl1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727065AbfIDD5B (ORCPT ); Tue, 3 Sep 2019 23:57:01 -0400 Received: by mail-pl1-f196.google.com with SMTP id t11so2852739plo.0 for ; Tue, 03 Sep 2019 20:57:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=joelfernandes.org; s=google; h=date:from:to:cc:message-id:mime-version:content-disposition :user-agent; bh=ZmOY7m3IyWINx7xPEGcvNHI4tV2iIeXUtDevd4vyPOA=; b=iQjibxV1UZeSOO2NBCzP8UUQtSVBWvvzucdk2Cj5fQNJdMPULZH3K7MpQlnoz2Rt0w qcKQjSen6OKXPBnJEd7scUDqURziwVr1MrvDR94zTjyXB8iF7ZRjMNVly3FwgM6vUSFb zhMJJw6fQ4Q4A6XBKgzik3c0WyiGNSFMJ/kdM= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:message-id:mime-version :content-disposition:user-agent; bh=ZmOY7m3IyWINx7xPEGcvNHI4tV2iIeXUtDevd4vyPOA=; b=rqsYXorUkOE5Bp2ZZbnkCyoKS4wd0EWcWe3VQatLeEZF1qg8pHSpDWVrnseMyMTMMP DbUf/7sTQesPWUZ/3XC3bpHKWAvZpcnjOqGepIOwJc1RHs97M5V1Aoq6P4xmGMd2PRQ3 smQWTwsA3XjDv04n2g/9LB49gz2AongXp8mmB9Kq+I/zYc/Dv5exrPfgWrTZjFMG1J0m L8uVt1mNtbht92+/aP+rgnHi3dBZYuxRQ6U2W438lVxy+Ihk7hg5kFN1ckciqRcr1gpt GNN6J67Nvbcuf6VRuS/cqfLCvkyaghbD/c6yK0zuyNCcCKEjeLhgsOvqNyGLH1Hdp85X nQUQ== X-Gm-Message-State: APjAAAVDmWfTbTiq4Q3gZ1QFqnWbQq9URMFXhyPthFYUvD6vC0w6RG2V +/JZKcQlYaB51h8eTk3Mba1fKA== X-Received: by 2002:a17:902:a506:: with SMTP id s6mr11160679plq.232.1567569420632; Tue, 03 Sep 2019 20:57:00 -0700 (PDT) Received: from localhost ([2620:15c:6:12:9c46:e0da:efbf:69cc]) by smtp.gmail.com with ESMTPSA id v67sm31854157pfb.45.2019.09.03.20.56.59 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 03 Sep 2019 20:56:59 -0700 (PDT) Date: Tue, 3 Sep 2019 23:56:58 -0400 From: Joel Fernandes To: Viktor Rosendahl , paulmck@kernel.org Cc: Steven Rostedt , Ingo Molnar , linux-kernel@vger.kernel.org, Peter Zijlstra Message-ID: <20190904035658.GA150430@google.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org linux-rt-users@vger.kernel.org Bcc: Subject: Re: [PATCH v5 1/4] ftrace: Implement fs notification for tracing_max_latency Reply-To: In-Reply-To: <20190903132602.3440-2-viktor.rosendahl@gmail.com> On Tue, Sep 03, 2019 at 03:25:59PM +0200, Viktor Rosendahl wrote: > This patch implements the feature that the tracing_max_latency file, > e.g. /sys/kernel/debug/tracing/tracing_max_latency will receive > notifications through the fsnotify framework when a new latency is > available. > > One particularly interesting use of this facility is when enabling > threshold tracing, through /sys/kernel/debug/tracing/tracing_thresh, > together with the preempt/irqsoff tracers. This makes it possible to > implement a user space program that can, with equal probability, > obtain traces of latencies that occur immediately after each other in > spite of the fact that the preempt/irqsoff tracers operate in overwrite > mode. > > This facility works with the hwlat, preempt/irqsoff, and wakeup > tracers. > > This patch also adds some unfortunate calls from __schedule() and > do_idle(). Those calls to the latency_fsnotify_disable/enable() are > needed because we cannot wake up the workqueue from these critical > sections without risking a deadlock. Similar problems would also arise > if we try to schedule a tasklet, raise a softirq, or wake up a kernel Adding Paul since RCU faces similar situations, i.e. raising softirq risks scheduler deadlock in rcu_read_unlock_special() -- but RCU's solution is to avoid raising the softirq and instead use irq_work. I was wondering, if we can rename __raise_softirq_irqoff() to raise_softirq_irqoff_no_wake() and call that from places where there is risk of scheduler related deadlocks. Then I think this can be used from Viktor's code. What would happen if the softirq is raised, but ksoftirqd is not awakened for some paths? Is this really an issue considering the softirq will execute during the next interrupt exit? thanks, - Joel > thread. If a notification event would happen in the forbidden sections, > we schedule the fsnotify work as soon as we have exited them. > > There was a suggestion to remove this latency_fsnotify_enable/disable() > gunk, or at least to combine it with the start_critical_timings() and > stop_critical_timings(). I have however not been able to come up with > a way to do it. > > It seems like it would be possible to simply replace the calls to > latency_fsnotify_enable/disable() with calls to > start/stop_critical_timings(). However, the main problem is that it > would not work for the wakup tracer. The wakeup tracer needs a > facility that postpones the notifications, not one that prevents the > measurements because all its measurements takes place in the middle > of __schedule(). On the other hand, in some places, like in idle and > the console we need start stop functions that prevents the > measurements from being make. > > Signed-off-by: Viktor Rosendahl > --- > include/linux/ftrace.h | 31 +++++++++ > kernel/sched/core.c | 3 + > kernel/sched/idle.c | 3 + > kernel/sched/sched.h | 1 + > kernel/trace/trace.c | 112 +++++++++++++++++++++++++++++- > kernel/trace/trace.h | 22 ++++++ > kernel/trace/trace_hwlat.c | 4 +- > kernel/trace/trace_irqsoff.c | 4 ++ > kernel/trace/trace_sched_wakeup.c | 4 ++ > 9 files changed, 181 insertions(+), 3 deletions(-) > > diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h > index 8a8cb3c401b2..b4d9700ef917 100644 > --- a/include/linux/ftrace.h > +++ b/include/linux/ftrace.h > @@ -907,4 +907,35 @@ unsigned long arch_syscall_addr(int nr); > > #endif /* CONFIG_FTRACE_SYSCALLS */ > > +#if (defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER)) && \ > + defined(CONFIG_FSNOTIFY) > + > +DECLARE_PER_CPU(int, latency_notify_disable); > +DECLARE_STATIC_KEY_FALSE(latency_notify_key); > + > +void latency_fsnotify_process(void); > + > +/* > + * Disable/enable fsnotify while in scheduler and idle code. Trying to wake > + * anything up from there, such as calling queue_work() is prone to deadlock. > + */ > +static inline void latency_fsnotify_disable(void) > +{ > + this_cpu_inc(latency_notify_disable); > +} > + > +static inline void latency_fsnotify_enable(void) > +{ > + this_cpu_dec(latency_notify_disable); > + if (static_branch_unlikely(&latency_notify_key)) > + latency_fsnotify_process(); > +} > + > +#else > + > +#define latency_fsnotify_disable() do { } while (0) > +#define latency_fsnotify_enable() do { } while (0) > + > +#endif > + > #endif /* _LINUX_FTRACE_H */ > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index 010d578118d6..e3c1dc801073 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -3198,6 +3198,7 @@ asmlinkage __visible void schedule_tail(struct task_struct *prev) > */ > > rq = finish_task_switch(prev); > + latency_fsnotify_enable(); > balance_callback(rq); > preempt_enable(); > > @@ -3820,6 +3821,7 @@ static void __sched notrace __schedule(bool preempt) > > local_irq_disable(); > rcu_note_context_switch(preempt); > + latency_fsnotify_disable(); > > /* > * Make sure that signal_pending_state()->signal_pending() below > @@ -3883,6 +3885,7 @@ static void __sched notrace __schedule(bool preempt) > rq_unlock_irq(rq, &rf); > } > > + latency_fsnotify_enable(); > balance_callback(rq); > } > > diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c > index 80940939b733..5fc87d99a407 100644 > --- a/kernel/sched/idle.c > +++ b/kernel/sched/idle.c > @@ -236,6 +236,7 @@ static void do_idle(void) > > __current_set_polling(); > tick_nohz_idle_enter(); > + latency_fsnotify_disable(); > > while (!need_resched()) { > check_pgt_cache(); > @@ -265,6 +266,8 @@ static void do_idle(void) > arch_cpu_idle_exit(); > } > > + latency_fsnotify_enable(); > + > /* > * Since we fell out of the loop above, we know TIF_NEED_RESCHED must > * be set, propagate it into PREEMPT_NEED_RESCHED. > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h > index 802b1f3405f2..467d6ad03f16 100644 > --- a/kernel/sched/sched.h > +++ b/kernel/sched/sched.h > @@ -46,6 +46,7 @@ > #include > #include > #include > +#include > #include > #include > #include > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c > index 563e80f9006a..a622263a69e4 100644 > --- a/kernel/trace/trace.c > +++ b/kernel/trace/trace.c > @@ -44,6 +44,10 @@ > #include > #include > #include > +#include > +#include > +#include > +#include > > #include "trace.h" > #include "trace_output.h" > @@ -1480,6 +1484,110 @@ static ssize_t trace_seq_to_buffer(struct trace_seq *s, void *buf, size_t cnt) > > unsigned long __read_mostly tracing_thresh; > > +#if (defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER)) && \ > + defined(CONFIG_FSNOTIFY) > + > +static const struct file_operations tracing_max_lat_fops; > +static struct workqueue_struct *fsnotify_wq; > +static DEFINE_PER_CPU(struct llist_head, notify_list); > + > +DEFINE_PER_CPU(int, latency_notify_disable); > +DEFINE_STATIC_KEY_FALSE(latency_notify_key); > + > +static void latency_fsnotify_workfn(struct work_struct *work) > +{ > + struct trace_array *tr = container_of(work, struct trace_array, > + fsnotify_work); > + fsnotify(tr->d_max_latency->d_inode, FS_MODIFY, > + tr->d_max_latency->d_inode, FSNOTIFY_EVENT_INODE, NULL, 0); > +} > + > +static void trace_create_maxlat_file(struct trace_array *tr, > + struct dentry *d_tracer) > +{ > + INIT_WORK(&tr->fsnotify_work, latency_fsnotify_workfn); > + atomic_set(&tr->notify_pending, 0); > + tr->d_max_latency = trace_create_file("tracing_max_latency", 0644, > + d_tracer, &tr->max_latency, > + &tracing_max_lat_fops); > +} > + > +void latency_fsnotify_stop(void) > +{ > + /* Make sure all CPUs see caller's previous actions to stop tracer */ > + smp_wmb(); > + static_branch_disable(&latency_notify_key); > + latency_fsnotify_process(); > +} > + > +void latency_fsnotify_start(void) > +{ > + static_branch_enable(&latency_notify_key); > + /* Make sure all CPUs see key value before caller continue */ > + smp_wmb(); > +} > + > +void latency_fsnotify_process(void) > +{ > + struct trace_array *tr; > + struct llist_head *list; > + struct llist_node *node; > + > + if (this_cpu_read(latency_notify_disable)) > + return; > + > + list = this_cpu_ptr(¬ify_list); > + for (node = llist_del_first(list); node != NULL; > + node = llist_del_first(list)) { > + tr = llist_entry(node, struct trace_array, notify_ll); > + atomic_set(&tr->notify_pending, 0); > + queue_work(fsnotify_wq, &tr->fsnotify_work); > + } > +} > + > +__init static int latency_fsnotify_init(void) > +{ > + fsnotify_wq = alloc_workqueue("tr_max_lat_wq", > + WQ_UNBOUND | WQ_HIGHPRI, 0); > + if (!fsnotify_wq) { > + pr_err("Unable to allocate tr_max_lat_wq\n"); > + return -ENOMEM; > + } > + return 0; > +} > + > +late_initcall_sync(latency_fsnotify_init); > + > +void latency_fsnotify(struct trace_array *tr) > +{ > + if (!fsnotify_wq) > + return; > + > + if (!this_cpu_read(latency_notify_disable)) > + queue_work(fsnotify_wq, &tr->fsnotify_work); > + else { > + /* > + * notify_pending prevents us from adding the same entry to > + * more than one notify_list. It will get queued in > + * latency_enable_fsnotify() > + */ > + if (!atomic_xchg(&tr->notify_pending, 1)) > + llist_add(&tr->notify_ll, this_cpu_ptr(¬ify_list)); > + } > +} > + > +/* > + * (defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER)) && \ > + * defined(CONFIG_FSNOTIFY) > + */ > +#else > + > +#define trace_create_maxlat_file(tr, d_tracer) \ > + trace_create_file("tracing_max_latency", 0644, d_tracer, \ > + &tr->max_latency, &tracing_max_lat_fops) > + > +#endif > + > #ifdef CONFIG_TRACER_MAX_TRACE > /* > * Copy the new maximum trace into the separate maximum-trace > @@ -1518,6 +1626,7 @@ __update_max_tr(struct trace_array *tr, struct task_struct *tsk, int cpu) > > /* record this tasks comm */ > tracing_record_cmdline(tsk); > + latency_fsnotify(tr); > } > > /** > @@ -8550,8 +8659,7 @@ init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer) > create_trace_options_dir(tr); > > #if defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER) > - trace_create_file("tracing_max_latency", 0644, d_tracer, > - &tr->max_latency, &tracing_max_lat_fops); > + trace_create_maxlat_file(tr, d_tracer); > #endif > > if (ftrace_create_function_files(tr, d_tracer)) > diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h > index 005f08629b8b..d9f83b2aaa71 100644 > --- a/kernel/trace/trace.h > +++ b/kernel/trace/trace.h > @@ -16,6 +16,7 @@ > #include > #include > #include > +#include > > #ifdef CONFIG_FTRACE_SYSCALLS > #include /* For NR_SYSCALLS */ > @@ -264,6 +265,12 @@ struct trace_array { > #endif > #if defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER) > unsigned long max_latency; > +#ifdef CONFIG_FSNOTIFY > + struct dentry *d_max_latency; > + struct work_struct fsnotify_work; > + atomic_t notify_pending; > + struct llist_node notify_ll; > +#endif > #endif > struct trace_pid_list __rcu *filtered_pids; > /* > @@ -785,6 +792,21 @@ void update_max_tr_single(struct trace_array *tr, > struct task_struct *tsk, int cpu); > #endif /* CONFIG_TRACER_MAX_TRACE */ > > +#if (defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER)) && \ > + defined(CONFIG_FSNOTIFY) > + > +void latency_fsnotify(struct trace_array *tr); > +void latency_fsnotify_start(void); > +void latency_fsnotify_stop(void); > + > +#else > + > +#define latency_fsnotify(tr) do { } while (0) > +#define latency_fsnotify_start() do { } while (0) > +#define latency_fsnotify_stop() do { } while (0) > + > +#endif > + > #ifdef CONFIG_STACKTRACE > void __trace_stack(struct trace_array *tr, unsigned long flags, int skip, > int pc); > diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c > index fa95139445b2..9c379261ee89 100644 > --- a/kernel/trace/trace_hwlat.c > +++ b/kernel/trace/trace_hwlat.c > @@ -254,8 +254,10 @@ static int get_sample(void) > trace_hwlat_sample(&s); > > /* Keep a running maximum ever recorded hardware latency */ > - if (sample > tr->max_latency) > + if (sample > tr->max_latency) { > tr->max_latency = sample; > + latency_fsnotify(tr); > + } > } > > out: > diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c > index a745b0cee5d3..29403a83a5f0 100644 > --- a/kernel/trace/trace_irqsoff.c > +++ b/kernel/trace/trace_irqsoff.c > @@ -557,6 +557,7 @@ static int __irqsoff_tracer_init(struct trace_array *tr) > if (irqsoff_busy) > return -EBUSY; > > + latency_fsnotify_start(); > save_flags = tr->trace_flags; > > /* non overwrite screws up the latency tracers */ > @@ -591,16 +592,19 @@ static void __irqsoff_tracer_reset(struct trace_array *tr) > ftrace_reset_array_ops(tr); > > irqsoff_busy = false; > + latency_fsnotify_stop(); > } > > static void irqsoff_tracer_start(struct trace_array *tr) > { > + latency_fsnotify_start(); > tracer_enabled = 1; > } > > static void irqsoff_tracer_stop(struct trace_array *tr) > { > tracer_enabled = 0; > + latency_fsnotify_stop(); > } > > #ifdef CONFIG_IRQSOFF_TRACER > diff --git a/kernel/trace/trace_sched_wakeup.c b/kernel/trace/trace_sched_wakeup.c > index 743b2b520d34..3dc90d9f605b 100644 > --- a/kernel/trace/trace_sched_wakeup.c > +++ b/kernel/trace/trace_sched_wakeup.c > @@ -669,6 +669,7 @@ static bool wakeup_busy; > > static int __wakeup_tracer_init(struct trace_array *tr) > { > + latency_fsnotify_start(); > save_flags = tr->trace_flags; > > /* non overwrite screws up the latency tracers */ > @@ -727,10 +728,12 @@ static void wakeup_tracer_reset(struct trace_array *tr) > set_tracer_flag(tr, TRACE_ITER_OVERWRITE, overwrite_flag); > ftrace_reset_array_ops(tr); > wakeup_busy = false; > + latency_fsnotify_stop(); > } > > static void wakeup_tracer_start(struct trace_array *tr) > { > + latency_fsnotify_start(); > wakeup_reset(tr); > tracer_enabled = 1; > } > @@ -738,6 +741,7 @@ static void wakeup_tracer_start(struct trace_array *tr) > static void wakeup_tracer_stop(struct trace_array *tr) > { > tracer_enabled = 0; > + latency_fsnotify_stop(); > } > > static struct tracer wakeup_tracer __read_mostly = > -- > 2.17.1 >