Date: Wed, 2 Aug 2017 19:40:09 -0700
From: "Paul E. McKenney"
To: Steven Rostedt
Cc: Daniel Lezcano, john.stultz@linaro.org, linux-kernel@vger.kernel.org, Pratyush Anand
Subject: Re: RCU stall when using function_graph
Reply-To: paulmck@linux.vnet.ibm.com
Message-Id: <20170803024009.GM3730@linux.vnet.ibm.com>
In-Reply-To: <20170802090744.6922e9e9@gandalf.local.home>
References: <20170801220405.GL3730@linux.vnet.ibm.com> <20170801201214.1e9c7d8e@gandalf.local.home> <20170802124239.GD1919@mai> <20170802090744.6922e9e9@gandalf.local.home>

On Wed, Aug 02, 2017 at 09:07:44AM -0400, Steven Rostedt wrote:
> On Wed, 2 Aug 2017 14:42:39 +0200
> Daniel Lezcano wrote:
>
> > On Tue, Aug 01, 2017 at 08:12:14PM -0400, Steven Rostedt wrote:
> > > On Wed, 2 Aug 2017 00:15:44 +0200
> > > Daniel Lezcano wrote:
> > >
> > > > On 02/08/2017 00:04, Paul E. McKenney wrote:
> > > > >> Hi Paul,
> > > > >>
> > > > >> I have been trying to set the function_graph tracer for ftrace and each time I
> > > > >> get a CPU stall.
> > > > >>
> > > > >> How to reproduce:
> > > > >> -----------------
> > > > >>
> > > > >> echo function_graph > /sys/kernel/debug/tracing/current_tracer
> > > > >>
> > > > >> This error appears with v4.13-rc3 and v4.12-rc6.
> > >
> > > Can you bisect this? It may be due to this commit:
> > >
> > > 0598e4f08 ("ftrace: Add use of synchronize_rcu_tasks() with dynamic trampolines")
> >
> > Hi Steve,
> >
> > I git bisected, but the issue occurred each time. I went through the different
> > versions down to v4.4, where the board was not fully supported, and it ended up
> > having the same issue.
> >
> > Finally, I had the intuition it could be related to the wall time (there is no
> > RTC clock with battery on the board and the wall time is Jan 1st, 1970).
> >
> > Setting the time with ntpdate solved the problem.
> >
> > Even if it is rarely the case to have the time not set, is it normal to have an
> > RCU CPU stall?
> >
>
> BTW, function_graph tracer is the most invasive of the tracers. It's 4x
> slower than function tracer. I'm wondering if the tracer isn't the
> cause, but just slows things down enough to cause some other race
> condition that triggers the bug.

Easy to check!  Use the rcupdate.rcu_cpu_stall_timeout kernel boot
parameter to increase this timeout by a factor of four.  Mainline default
is 21 seconds, but many distros set it to 60 seconds.
You can always check sysfs to find the value for your system, or
CONFIG_RCU_CPU_STALL_TIMEOUT in your .config file.

							Thanx, Paul
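
A minimal sketch of the check suggested above, for anyone following along. It reads the current stall timeout and prints the boot parameter scaled by four. The sysfs path is an assumption (the message only says "check sysfs"), and scaled_stall_param is a hypothetical helper name used for illustration only.

```sh
#!/bin/sh
# Sketch: print a boot parameter that raises the RCU CPU stall timeout 4x.

# Assumed sysfs location of the current timeout, in seconds. The message
# above does not spell out the path.
STALL_PARAM=/sys/module/rcupdate/parameters/rcu_cpu_stall_timeout

# Hypothetical helper: print the boot parameter for a base timeout scaled
# by a factor (default 4, matching the "4x slower" estimate in the thread).
scaled_stall_param() {
	base=$1
	factor=${2:-4}
	echo "rcupdate.rcu_cpu_stall_timeout=$((base * factor))"
}

# Use the running kernel's value when it is readable; otherwise fall back
# to the mainline default of 21 seconds mentioned above.
if [ -r "$STALL_PARAM" ]; then
	timeout_now=$(cat "$STALL_PARAM")
else
	timeout_now=21
fi

scaled_stall_param "$timeout_now"
```

Append the printed string to the kernel command line and reboot. If the stall warning goes away with the longer timeout, that points at the tracer overhead rather than a real hang.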