Date: Wed, 9 Aug 2017 07:40:33 -0700
From: "Paul E. McKenney"
To: Daniel Lezcano
Cc: Pratyush Anand, 김동현, john.stultz@linaro.org, Steven Rostedt,
	linux-kernel@vger.kernel.org
Subject: Re: RCU stall when using function_graph
Reply-To: paulmck@linux.vnet.ibm.com
Message-Id: <20170809144033.GU3730@linux.vnet.ibm.com>
References: <11d179df-d8a9-5d3e-3bc4-080df464e85d@linaro.org>
	<20170803124421.GP3730@linux.vnet.ibm.com>
	<20170803143801.GE1919@mai>
	<20170806170220.GQ3730@linux.vnet.ibm.com>
	<20170809125804.GT3730@linux.vnet.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Aug 09, 2017 at 03:28:05PM +0200, Daniel Lezcano wrote:
> On 09/08/2017 14:58, Paul E. McKenney wrote:
> > On Wed, Aug 09, 2017 at 02:43:49PM +0530, Pratyush Anand wrote:
> >>
> >>
> >> On Sunday 06 August 2017 10:32 PM, Paul E. McKenney wrote:
> >>> On Sat, Aug 05, 2017 at 02:24:21PM +0900, 김동현 wrote:
> >>>> Dear All
> >>>>
> >>>> As for me, after configuring function_graph as below, crash disappears.
> >>>> "echo 0 > d/tracing/tracing_on"
> >>>> "sleep 1"
> >>>>
> >>>> "echo function_graph > d/tracing/current_tracer"
> >>>> "sleep 1"
> >>>>
> >>>> "echo smp_call_function_single > d/tracing/set_ftrace_filter"
> >>
> >> It will limit trace output to only for the filtered function
> >> (smp_call_function_single).
> >>
> >>>> adb shell "sleep 1"
> >>>>
> >>>> "echo 1 > d/tracing/tracing_on"
> >>>> adb shell "sleep 1"
> >>>>
> >>>> Right after function_graph is enabled, too many logs are traced upon IRQ
> >>>> transaction which many times eventually causes stall.
> >>>
> >>> That would do it!
> >>>
> >>> Hmmm...
> >>>
> >>> Steven, would it be helpful if RCU were to inform tracing (say) halfway
> >>> through the RCU CPU stall interval, allowing the tracer to do something
> >>> like cond_resched_rcu_qs()?  I can imagine all sorts of reasons why this
> >>> wouldn't work, for example, if all the tracing was with irqs disabled
> >>> or some such, but figured I should ask.
> >>>
> >>> Does Guillermo's approach work for others?
> >>
> >> Limited output with a couple of filtered function will definitely
> >> not cause RCU schedule stall. But the question is whether we should
> >> expect a full function graph trace working on every platform or not
> >> (specially the one which generates high interrupts)?
> >
> > It might well be that the user must disable RCU CPU stall warnings via
> > the rcu_cpu_stall_suppress sysfs entry (or increase their timeout via the
> > rcu_cpu_stall_timeout sysfs entry) before doing something that greatly
> > increases overhead.  Like enabling large quantities of tracing.  ;-)
> >
> > It -might- be possible to do this automatically, but reliable
> > automation would require that tracing understand how often each
> > function was called, which sounds to me to be a bit of a stretch.
> >
> > Thoughts?
>
> A random thought:
>
> Is it possible to have a mid-timeout happening and store some
> information like the instruction pointer, so when the timeout happens we
> can compare if there was some progress, if yes, very likely, system
> performance collapsed and we are not fast enough.

RCU already does take various actions for an impending stall, so something
could be done.  But in most slowdowns, the instruction pointer will be
changing rapidly, just not as rapidly as it would normally.  So exactly
how would the forward-progress comparison be carried out?

It would be easy to set up a notifier, so that if any notifier in the
chain returned an error, stall warnings would be suppressed.  It would
be harder to figure out when to re-enable them, though I suppose that
they could be suppressed only for the duration of the current grace
period or some such.  But what exactly would you use such a notifier for?

Or am I misunderstanding your suggestion?

							Thanx, Paul
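
For concreteness, the notifier idea above might look roughly like the
toy userspace model below.  This is only a sketch of the control flow,
not kernel code and not an existing RCU interface: the class
StallWarningChain, the function tracing_notifier, the heavy_tracing
flag, and the grace-period numbers are all hypothetical names made up
for illustration.  It assumes that a nonzero return from any notifier
counts as the "error" that suppresses warnings, and that suppression is
tied to a single grace-period number.

```python
class StallWarningChain:
    """Toy model: notifiers are polled before a stall warning would fire."""

    def __init__(self):
        self.notifiers = []        # callables taking a grace-period number
        self.suppressed_gp = None  # grace period whose warnings are suppressed

    def register(self, notifier):
        self.notifiers.append(notifier)

    def poll(self, gp):
        # Any notifier returning nonzero (an "error") suppresses warnings,
        # but only for the grace period that is current right now.
        if any(n(gp) != 0 for n in self.notifiers):
            self.suppressed_gp = gp

    def should_warn(self, gp):
        # A later grace period carries a different number, so warnings
        # come back without anyone having to re-enable them explicitly.
        return gp != self.suppressed_gp


# Hypothetical stand-in for "the tracer knows it is adding heavy overhead".
heavy_tracing = {"enabled": True}


def tracing_notifier(gp):
    return 1 if heavy_tracing["enabled"] else 0


chain = StallWarningChain()
chain.register(tracing_notifier)
chain.poll(gp=42)
print(chain.should_warn(42))  # False: suppressed for the polled grace period
print(chain.should_warn(43))  # True: the next grace period warns as usual
```

Keying the suppression to the grace-period number is what sidesteps the
"when to re-enable" question in this model.  Whether a tracer could
reliably know when to return an error is the part the sketch does not
answer.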