Date: Sun, 6 Aug 2017 10:02:20 -0700
From: "Paul E. McKenney"
To: 김동현
Cc: Daniel Lezcano, john.stultz@linaro.org, Steven Rostedt,
	linux-kernel@vger.kernel.org, Pratyush Anand
Subject: Re: RCU stall when using function_graph
Reply-To: paulmck@linux.vnet.ibm.com
References: <20170801201214.1e9c7d8e@gandalf.local.home>
	<20170802124239.GD1919@mai>
	<20170802090744.6922e9e9@gandalf.local.home>
	<11d179df-d8a9-5d3e-3bc4-080df464e85d@linaro.org>
	<20170803124421.GP3730@linux.vnet.ibm.com>
	<20170803143801.GE1919@mai>
Message-Id: <20170806170220.GQ3730@linux.vnet.ibm.com>

On Sat, Aug 05, 2017 at 02:24:21PM +0900, 김동현 wrote:
> Dear All
>
> As for me, after configuring function_graph as below, crash disappears.
> "echo 0 > d/tracing/tracing_on"
> "sleep 1"
>
> "echo function_graph > d/tracing/current_tracer"
> "sleep 1"
>
> "echo smp_call_function_single > d/tracing/set_ftrace_filter"
> adb shell "sleep 1"
>
> "echo 1 > d/tracing/tracing_on"
> adb shell "sleep 1"
>
> Right after function_graph is enabled, too many logs are traced upon IRQ
> transaction which many times eventually causes stall.

That would do it!

Hmmm...

Steven, would it be helpful if RCU were to inform tracing (say) halfway
through the RCU CPU stall interval, allowing the tracer to do something
like cond_resched_rcu_qs()?  I can imagine all sorts of reasons why this
wouldn't work, for example, if all the tracing was with irqs disabled
or some such, but figured I should ask.

Does Guillermo's approach work for others?

							Thanx, Paul

> BR,
> Guillermo Austin Kim
>
> On Aug 3, 2017 at 11:38 PM, "Daniel Lezcano" wrote:
>
> On Thu, Aug 03, 2017 at 05:44:21AM -0700, Paul E. McKenney wrote:
>
> [ ... ]
>
> > > > BTW, function_graph tracer is the most invasive of the tracers. It's 4x
> > > > slower than function tracer. I'm wondering if the tracer isn't the
> > > > cause, but just slows things down enough to cause some other race
> > > > condition that triggers the bug.
> > >
> > > Yes, that could be true.
> > >
> > > I tried the following scenario:
> > >
> > > - cpufreq governor => userspace + max_freq (1.2GHz)
> > > - function_graph set ==> OK
> > >
> > > - cpufreq governor => userspace + min_freq (200MHz)
> > > - function_graph set ==> RCU stall
> > >
> > > Beside that, I realize the board is constantly processing SOF interrupts
> > > every 124us, so that adds more overhead.
> > >
> > > Removing the USB support, thus the associated processing for the SOF
> > > interrupts, I don't see anymore the RCU stall.
> >
> > Looks like Steve called this one!  ;-)
>
> Yep :)
>
> > > Is it the expected behavior to have the system hang after a RCU stall
> > > raises ?
> >
> > No, but if NMI stack traces are enabled and there are any NMI problems,
> > bad things can happen.  In addition, the bulk of output can cause problems
> > if you have a slow console connection.
>
> Ok, thanks.
>
>   -- Daniel
>
> --
>
>  Linaro.org │ Open source software for ARM SoCs
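
For anyone who wants to try the sequence quoted at the top of this message (stop tracing, select function_graph, restrict it to smp_call_function_single, restart tracing), it could be sketched as a small script along these lines. This is only a sketch under assumptions: the helper name setup_filtered_graph, the TRACING_DIR and RUN_SETUP variables, and the default path /sys/kernel/debug/tracing are illustrative and not part of the original report, which wrote to d/tracing/... through adb shell.

```shell
#!/bin/sh
# Sketch only: restrict function_graph to one function before enabling it.
# Assumption: the tracing directory is passed in by the caller; the quoted
# message used d/tracing/..., and /sys/kernel/debug/tracing is a guess at
# the equivalent path on other systems.

# setup_filtered_graph DIR FUNC [DELAY]   (hypothetical helper name)
setup_filtered_graph() {
    dir=$1
    func=$2
    delay=${3:-1}   # seconds to wait between steps, 1 as in the report

    # 1. Stop tracing so nothing is recorded while reconfiguring.
    echo 0 > "$dir/tracing_on" || return 1
    sleep "$delay"

    # 2. Select the function_graph tracer.
    echo function_graph > "$dir/current_tracer" || return 1
    sleep "$delay"

    # 3. Limit tracing to a single function to keep the trace volume low.
    echo "$func" > "$dir/set_ftrace_filter" || return 1
    sleep "$delay"

    # 4. Turn tracing back on with the filter already in place.
    echo 1 > "$dir/tracing_on" || return 1
    sleep "$delay"
}

# Only touch the real tracing files when explicitly asked to.
if [ "${RUN_SETUP:-0}" = 1 ]; then
    setup_filtered_graph "${TRACING_DIR:-/sys/kernel/debug/tracing}" \
        smp_call_function_single
fi
```

Running it with RUN_SETUP=1 (as root, with TRACING_DIR pointing at the tracing directory) performs the four writes in the same order as the report. The filter is written while tracing_on is still 0, so the unfiltered tracer never gets a chance to run.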