Subject: Re: RCU stall when using function_graph
To: Steven Rostedt, paulmck@linux.vnet.ibm.com
Cc: john.stultz@linaro.org, linux-kernel@vger.kernel.org, Pratyush Anand
References: <20170801220405.GL3730@linux.vnet.ibm.com> <20170801201214.1e9c7d8e@gandalf.local.home> <20170802124239.GD1919@mai> <20170802090744.6922e9e9@gandalf.local.home>
From: Daniel Lezcano
Message-ID: <11d179df-d8a9-5d3e-3bc4-080df464e85d@linaro.org>
Date: Thu, 3 Aug 2017 13:41:11 +0200
In-Reply-To: <20170802090744.6922e9e9@gandalf.local.home>
X-Mailing-List: linux-kernel@vger.kernel.org

On 02/08/2017 15:07, Steven Rostedt wrote:
> On Wed, 2 Aug 2017 14:42:39 +0200
> Daniel Lezcano wrote:
>
>> On Tue, Aug 01, 2017 at 08:12:14PM -0400, Steven Rostedt wrote:
>>> On Wed, 2 Aug 2017 00:15:44 +0200
>>> Daniel Lezcano wrote:
>>>
>>>> On 02/08/2017 00:04, Paul E. McKenney wrote:
>>>>>> Hi Paul,
>>>>>>
>>>>>> I have been trying to set the function_graph tracer for ftrace and each time I
>>>>>> get a CPU stall.
>>>>>>
>>>>>> How to reproduce:
>>>>>> -----------------
>>>>>>
>>>>>> echo function_graph > /sys/kernel/debug/tracing/current_tracer
>>>>>>
>>>>>> This error appears with v4.13-rc3 and v4.12-rc6.
>>>
>>> Can you bisect this?
It may be due to this commit:
>>>
>>> 0598e4f08 ("ftrace: Add use of synchronize_rcu_tasks() with dynamic trampolines")
>>
>> Hi Steve,
>>
>> I git bisected but each time the issue occured. I went through the different
>> version down to v4.4 where the board was not fully supported and it ended up to
>> have the same issue.
>>
>> Finally, I had the intuition it could be related to the wall time (there is no
>> RTC clock with battery on the board and the wall time is Jan 1st, 1970).
>>
>> Setting up the with ntpdate solved the problem.

Actually, it did not solve the problem. The function_graph tracer is set and
I can use the system, but after a while (with no tracing enabled at any
time) the stall appears.

>> Even if it is rarely the case to have the time not set, is it normal to have a
>> RCU cpu stall ?
>>
>>
>
> BTW, function_graph tracer is the most invasive of the tracers. It's 4x
> slower than function tracer. I'm wondering if the tracer isn't the
> cause, but just slows things down enough to cause a some other race
> condition that triggers the bug.

Yes, that could be true.

I tried the following scenario:

 - cpufreq governor => userspace + max_freq (1.2GHz)
 - function_graph set ==> OK

 - cpufreq governor => userspace + min_freq (200MHz)
 - function_graph set ==> RCU stall

Besides that, I realized the board is constantly processing SOF interrupts
every 124us, which adds more overhead.

After removing USB support, and with it the processing associated with the
SOF interrupts, I no longer see the RCU stall.

Is the system expected to hang after an RCU stall is raised?

-- 
Linaro.org │ Open source software for ARM SoCs
Follow Linaro: Facebook | Twitter | Blog
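
For a rough sense of scale, here is a back-of-the-envelope sketch using only the figures from the message above (a 124us SOF interrupt period, and CPU frequencies of 1.2GHz and 200MHz). The helper name cycles_per_period is made up for this illustration, and the per-interrupt handling cost is not known from the thread, so this only shows how much smaller the cycle budget between two interrupts becomes at the lower frequency.

```shell
#!/bin/sh
# Illustrative arithmetic only; the figures are taken from the message,
# the helper name is hypothetical.
period_us=124   # SOF interrupt period reported in the message

# Cycles available between two SOF interrupts.
# $1 = CPU frequency in MHz; MHz * microseconds = cycles.
cycles_per_period() {
    echo $(( $1 * period_us ))
}

# Integer division: interrupts handled per second.
irqs_per_sec=$(( 1000000 / period_us ))

echo "SOF interrupts per second:        $irqs_per_sec"
echo "cycles between IRQs at 1200 MHz:  $(cycles_per_period 1200)"
echo "cycles between IRQs at 200 MHz:   $(cycles_per_period 200)"
```

At 200MHz the budget between interrupts is one sixth of what it is at 1.2GHz, so any fixed per-interrupt cost (made larger by function_graph instrumentation) takes a correspondingly bigger share of CPU time.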