Date: Wed, 2 Aug 2017 09:51:13 -0700
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Steven Rostedt <rostedt@goodmis.org>, john.stultz@linaro.org,
        linux-kernel@vger.kernel.org
Subject: Re: RCU stall when using function_graph
Reply-To: paulmck@linux.vnet.ibm.com
References: <20170801220405.GL3730@linux.vnet.ibm.com>
 <f4827a94-ee63-4f19-4edb-92739d8cdc61@linaro.org>
 <20170801201214.1e9c7d8e@gandalf.local.home>
 <20170802124239.GD1919@mai>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <20170802124239.GD1919@mai>
User-Agent: Mutt/1.5.21 (2010-09-15)
Message-Id: <20170802165113.GZ3730@linux.vnet.ibm.com>
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2673
Lines: 73

On Wed, Aug 02, 2017 at 02:42:39PM +0200, Daniel Lezcano wrote:
> On Tue, Aug 01, 2017 at 08:12:14PM -0400, Steven Rostedt wrote:
> > On Wed, 2 Aug 2017 00:15:44 +0200
> > Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
> > 
> > > On 02/08/2017 00:04, Paul E. McKenney wrote:
> > > >> Hi Paul,
> > > >>
> > > >> I have been trying to set the function_graph tracer for ftrace and each time I
> > > >> get a CPU stall.
> > > >>
> > > >> How to reproduce:
> > > >> -----------------
> > > >>
> > > >> 		 echo function_graph > /sys/kernel/debug/tracing/current_tracer
> > > >>
> > > >> This error appears with v4.13-rc3 and v4.12-rc6.
> > 
> > Can you bisect this? It may be due to this commit:
> > 
> > 0598e4f08 ("ftrace: Add use of synchronize_rcu_tasks() with dynamic trampolines")
> 
> Hi Steve,
> 
> I ran git bisect, but the issue occurred at every step. I went through the
> different versions down to v4.4, where the board was not fully supported, and
> it still had the same issue.
> 
> Finally, I had the intuition it could be related to the wall time (the board
> has no battery-backed RTC, so the wall time is Jan 1st, 1970).
> 
> Setting the time with ntpdate solved the problem.
> 
> Even though it is rare for the time not to be set, is it normal to get an
> RCU CPU stall?
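On a board with no battery-backed RTC, a quick sanity check of the wall
clock before enabling the tracer may save some head-scratching. This is
only a sketch under stated assumptions: the 2000-01-01 cutoff (946684800
seconds since the epoch) is an arbitrary threshold chosen here, the helper
name clock_looks_unset is hypothetical, `date +%s` is assumed to be
available (GNU coreutils and BusyBox provide it), and pool.ntp.org is just
an example server to pass to ntpdate.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if the given epoch time is
# before 2000-01-01T00:00:00Z, i.e. the clock was probably never set.
# The cutoff is an arbitrary assumption, not a kernel rule.
clock_looks_unset() {
    [ "$1" -lt 946684800 ]
}

now=$(date +%s)   # assumes date supports %s (GNU coreutils, BusyBox)
if clock_looks_unset "$now"; then
    echo "wall clock looks unset: $(date -u)"
    # Example only; pick an NTP server appropriate for your network.
    echo "set it first, e.g.: ntpdate pool.ntp.org"
else
    echo "wall clock looks plausible: $(date -u)"
fi
```

The script only prints advice; it deliberately does not write to
current_tracer, since that needs root and a mounted debugfs.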

If the system is sufficiently confused about the time, you can indeed
get RCU CPU stall warnings.

In one memorable case, a pair of CPUs had a multi-minute disagreement
as to the current time, which meant that when one of them started an RCU
grace period, the other would immediately issue an RCU CPU stall warning.
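To make that failure mode concrete, here is a toy model. It is not the
kernel's actual stall-detection code. It assumes one CPU records the
grace-period start using its own clock and another CPU later compares that
value against its own clock, with a hypothetical 21-second STALL_TIMEOUT.
The function name stall_check is made up for illustration.

```shell
#!/bin/sh
# Toy model (not kernel code): CPU A records the grace-period start using
# its own clock; CPU B later checks for a stall using *its* clock.
# STALL_TIMEOUT is a hypothetical stand-in for the ~21-second timeout.
STALL_TIMEOUT=21

# stall_check GP_START_ON_A NOW_ON_B
# Prints "stall" if B thinks the grace period has run too long, else "ok".
stall_check() {
    gp_start=$1
    now=$2
    elapsed=$((now - gp_start))
    if [ "$elapsed" -gt "$STALL_TIMEOUT" ]; then
        echo "stall"
    else
        echo "ok"
    fi
}

# Clocks agree: the grace period has only been running 2 seconds.
stall_check 1000 1002
# CPU B's clock is 3 minutes ahead of CPU A's: B "sees" a 182-second
# grace period right away and warns.
stall_check 1000 $((1000 + 180 + 2))
```

The first call prints "ok" and the second prints "stall", even though only
2 seconds really elapsed in both cases; the skew alone trips the check.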

							Thanx, Paul

> > > >>
> > > >> Has this already been reported?
> > > > 
> > > > I have seen this sort of thing, but only when actually dumping the trace
> > > > out, and I thought those got fixed.  Are you seeing this just from
> > > > accumulating the trace?
> > > 
> > > No, just by changing the tracer. It is the first operation I do after
> > > rebooting and it is reproducible each time. That happens on an ARM64
> > > platform.
> > > 
> > > > These RCU CPU stall warnings usually occur when something grabs hold of
> > > > a CPU for too long, as in 21 seconds or so.  One way that they can happen
> > > > is excessive lock contention, another is having the kernel run through
> > > > too much data at one shot.
> > > > 
> > > > Adding Steven Rostedt on CC for his thoughts.
> > > > 
> > >
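The roughly 21-second stall timeout mentioned above is, on kernels of this
era, the rcupdate module parameter rcu_cpu_stall_timeout, so it can be read
from sysfs (and set at boot with rcupdate.rcu_cpu_stall_timeout=). A minimal
sketch, assuming the path below; it may be missing on some configurations,
so the hypothetical read_stall_timeout helper falls back to "unknown".

```shell
#!/bin/sh
# Path of the rcupdate module parameter holding the RCU CPU stall timeout
# (seconds). It may be absent on some kernel configurations.
STALL_PARAM=/sys/module/rcupdate/parameters/rcu_cpu_stall_timeout

# Hypothetical helper: print the value stored in file $1, or "unknown"
# if the file is missing or unreadable.
read_stall_timeout() {
    if [ -r "$1" ]; then
        cat "$1"
    else
        echo "unknown"
    fi
}

echo "RCU CPU stall timeout (seconds): $(read_stall_timeout "$STALL_PARAM")"
```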