Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752349AbdHAWEL (ORCPT ); Tue, 1 Aug 2017 18:04:11 -0400 Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5]:51492 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751938AbdHAWEK (ORCPT ); Tue, 1 Aug 2017 18:04:10 -0400 Date: Tue, 1 Aug 2017 15:04:05 -0700 From: "Paul E. McKenney" To: Daniel Lezcano , john.stultz@linaro.org Cc: linux-kernel@vger.kernel.org, rostedt@goodmis.org Subject: Re: RCU stall when using function_graph Reply-To: paulmck@linux.vnet.ibm.com MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit User-Agent: Mutt/1.5.21 (2010-09-15) X-TM-AS-GCONF: 00 x-cbid: 17080122-0044-0000-0000-000003769857 X-IBM-SpamModules-Scores: X-IBM-SpamModules-Versions: BY=3.00007467; HX=3.00000241; KW=3.00000007; PH=3.00000004; SC=3.00000215; SDB=6.00896184; UDB=6.00448294; IPR=6.00676372; BA=6.00005506; NDR=6.00000001; ZLA=6.00000005; ZF=6.00000009; ZB=6.00000000; ZP=6.00000000; ZH=6.00000000; ZU=6.00000002; MB=3.00016486; XFM=3.00000015; UTC=2017-08-01 22:04:07 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 17080122-0045-0000-0000-000007A4A9D0 Message-Id: <20170801220405.GL3730@linux.vnet.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2017-08-01_13:,, signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 suspectscore=0 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1706020000 definitions=main-1708010354 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5715 Lines: 109 > Hi Paul, > > I have been trying to set the function_graph tracer for ftrace and each time I > get a CPU stall. > > How to reproduce: > ----------------- > > echo function_graph > /sys/kernel/debug/tracing/current_tracer > > This error appears with v4.13-rc3 and v4.12-rc6. > > Is it something already reported ? I have seen this sort of thing, but only when actually dumping the trace out, and I though those got fixed. You are seeing this just accumulating the trace? These RCU CPU stall warnings usually occur when something grabs hold of a CPU for too long, as in 21 seconds or so. One way that they can happen is excessive lock contention, another is having the kernel run through too much data at one shot. Adding Steven Rostedt on CC for his thoughts. Thanx, Paul > [ ... ] > > [ 472.934986] INFO: rcu_preempt detected stalls on CPUs/tasks: > [ 472.941302] 0-...: (3 GPs behind) idle=d1a/2/0 softirq=2556/2557 fqs=3195 > [ 472.948982] (detected by 3, t=6502 jiffies, g=549, c=548, q=10) > [ 472.955180] Task dump for CPU 0: > [ 472.958513] swapper/0 R running task 0 0 0 0x00000002 > [ 550.954786] INFO: rcu_preempt detected stalls on CPUs/tasks: > [ 550.960657] 0-...: (3 GPs behind) idle=d1a/2/0 softirq=2556/2557 fqs=12926 > [ 550.967897] (detected by 3, t=26007 jiffies, g=549, c=548, q=27) > [ 550.974181] Task dump for CPU 0: > [ 550.977513] swapper/0 R running task 0 0 0 0x00000002 > [ 628.974788] INFO: rcu_preempt detected stalls on CPUs/tasks: > [ 628.980659] 0-...: (3 GPs behind) idle=d1a/2/0 softirq=2556/2557 fqs=22666 > [ 628.987900] (detected by 3, t=45512 jiffies, g=549, c=548, q=43) > [ 628.994184] Task dump for CPU 0: > [ 628.997518] swapper/0 R running task 0 0 0 0x00000002 > [ 706.994787] INFO: rcu_preempt detected stalls on CPUs/tasks: > [ 707.000658] 0-...: (3 GPs behind) idle=d1a/2/0 softirq=2556/2557 fqs=31785 > [ 707.007899] (detected by 3, t=65017 jiffies, g=549, c=548, q=74) > [ 707.014183] Task dump for CPU 0: > [ 707.017516] swapper/0 R running task 0 0 0 0x00000002 > [ 707.024902] rcu_preempt kthread starved for 1261 jiffies! g549 c548 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1 > [ 707.034548] rcu_preempt S 0 8 2 0x00000000 > [ 726.049583] INFO: task jbd2/mmcblk1p2-:1120 blocked for more than 120 seconds. > [ 726.061772] Not tainted 4.13.0-rc3-ktest-linaro-hikey #202 > [ 726.074294] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [ 726.089027] jbd2/mmcblk1p2- D 0 1120 2 0x00000000 > [ 726.126737] INFO: task rs:main Q:Reg:2272 blocked for more than 120 seconds. > [ 726.140484] Not tainted 4.13.0-rc3-ktest-linaro-hikey #202 > [ 726.154515] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [ 726.170140] rs:main Q:Reg D 0 2272 1 0x00000000 > [ 726.205626] > [ 726.205626] Showing all locks held in the system: > [ 726.272952] > [ 726.278194] ============================================= > [ 726.278194] > [ 785.014779] INFO: rcu_preempt detected stalls on CPUs/tasks: > [ 785.020658] 0-...: (3 GPs behind) idle=d1a/2/0 softirq=2556/2557 fqs=41387 > [ 785.027905] (detected by 3, t=84524 jiffies, g=549, c=548, q=94) > [ 785.034194] Task dump for CPU 0: > [ 785.037532] swapper/0 R running task 0 0 0 0x00000002 > [ 846.879753] INFO: task jbd2/mmcblk1p2-:1120 blocked for more than 120 seconds. > [ 846.892966] Not tainted 4.13.0-rc3-ktest-linaro-hikey #202 > [ 846.909597] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [ 846.926726] jbd2/mmcblk1p2- D 0 1120 2 0x00000000 > [ 846.981356] INFO: task rs:main Q:Reg:2272 blocked for more than 120 seconds. > [ 847.001593] Not tainted 4.13.0-rc3-ktest-linaro-hikey #202 > [ 847.017642] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [ 847.034135] rs:main Q:Reg D 0 2272 1 0x00000000 > [ 847.077209] > [ 847.077209] Showing all locks held in the system: > [ 847.180736] > [ 847.186753] ============================================= > [ 847.186753] > [ 863.034778] INFO: rcu_preempt detected stalls on CPUs/tasks: > [ 863.040660] 0-...: (3 GPs behind) idle=d1a/2/0 softirq=2556/2557 fqs=50881 > [ 863.047907] (detected by 4, t=104030 jiffies, g=549, c=548, q=114) > [ 863.054373] Task dump for CPU 0: > [ 863.057711] swapper/0 R running task 0 0 0 0x00000002 > [ 941.054780] INFO: rcu_preempt detected stalls on CPUs/tasks: > [ 941.060659] 0-...: (3 GPs behind) idle=d1a/2/0 softirq=2556/2557 fqs=59726 > [ 941.067903] (detected by 7, t=123535 jiffies, g=549, c=548, q=303) > [ 941.074370] Task dump for CPU 0: > [ 941.077708] swapper/0 R running task 0 0 0 0x00000002 > [ 967.711682] INFO: task jbd2/mmcblk1p2-:1120 blocked for more than 120 seconds. > [ 967.725731] Not tainted 4.13.0-rc3-ktest-linaro-hikey #202 > [ 967.743347] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [ 967.765658] jbd2/mmcblk1p2- D 0 1120 2 0x00000000 > [ 967.810595] INFO: task rs:main Q:Reg:2272 blocked for more than 120 seconds. > [ 967.825360] Not tainted 4.13.0-rc3-ktest-linaro-hikey #202 > [ 967.838595] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [ 967.854460] rs:main Q:Reg D 0 2272 1 0x00000000 > [ 967.897804] > [ 967.897804] Showing all locks held in the system: > [ 967.997067] > [ 968.002570] ============================================= > > > [ ...?] > > etc ...