Date: Thu, 12 Apr 2018 13:15:19 +0200
From: Niklas Söderlund
To: Vincent Guittot
Cc: Peter Zijlstra, "Paul E. McKenney", Ingo Molnar, linux-kernel,
    linux-renesas-soc@vger.kernel.org, Heiner Kallweit
Subject: Re: Potential problem with 31e77c93e432dec7 ("sched/fair: Update
    blocked load when newly idle")
Message-ID: <20180412111519.GH12256@bigcity.dyn.berto.se>
References: <20180412091822.GG12256@bigcity.dyn.berto.se>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Vincent,

Thanks for your feedback.

On 2018-04-12 12:33:27 +0200, Vincent Guittot wrote:
> Hi Niklas,
>
> On 12 April 2018 at 11:18, Niklas Söderlund wrote:
> > Hi Vincent,
> >
> > I have observed issues running on linus/master from a few days back [1].
> > I'm running on a Renesas Koelsch board (arm32) and I can trigger an issue
> > by X forwarding the v4l2 test application qv4l2 over ssh and moving the
> > cursor around in the GUI (best test case description award...). I'm
> > sorry about the really bad way I trigger this but I can't do it in any
> > other way, I'm happy to try other methods if you got some ideas. The
> > symptom of the issue is a complete hang of the system for more than 30
> > seconds and then this information is printed in the console:
>
> Heiner (added cc) also reported a similar problem with his platform: a
> dual core celeron
>
> Can you confirm that your platform is a dual cortex-A15 ? At least that's
> what I have seen on the web
> This would confirm that a dual-core system is a key point.

I can confirm that my platform is a dual core.

>
> The ssh connection is also common with Heiner's setup

Interesting, I found Heiner's mail and I can confirm that I too
experience ssh session lockups. I ssh into the system and by repeatedly
hitting the return key I can lock up the board; while it is locked up,
starting another ssh session unblocks the first. If I don't start another
ssh session but keep hitting the return key sporadically in the first one
I can get the trace I reported in my first mail to be printed on the
serial console.

When locked up, the symptoms are that both the single ssh session and the
serial console are dead. But attempting another ssh connection
immediately unblocks both ssh and serial console. And if I allow enough
time before starting the second ssh connection I can trigger a trace to
be printed on the serial console; it's similar to but different from the
first one I reported.

[ 207.548610] 1-...!: (0 ticks this GP) idle=79a/1/1073741824 softirq=2146/2146 fqs=0
[ 207.556442] (detected by 0, t=12645 jiffies, g=333, c=332, q=20)
[ 207.562546] rcu_sched kthread starved for 12645 jiffies! g333 c332 f0x2 RCU_GP_WAIT_FQS(3) ->state=0x0 ->cpu=0
[ 207.572548] RCU grace-period kthread stack dump:
[ 207.577166] rcu_sched R running task 0 9 2 0x00000000
[ 207.584389] Backtrace:
[ 207.586849] [] (__schedule) from [] (schedule+0x94/0xb8)
[ 207.593901] r10:e77813c0 r9:e77813c0 r8:ffffffff r7:e709bed4 r6:ffffaa80 r5:00000000
[ 207.601732] r4:ffffe000
[ 207.604269] [] (schedule) from [] (schedule_timeout+0x380/0x3dc)
[ 207.612013] r5:00000000 r4:00000000
[ 207.615596] [] (schedule_timeout) from [] (rcu_gp_kthread+0x668/0xe2c)
[ 207.623863] r10:c0b79018 r9:0000014d r8:0000014c r7:00000001 r6:00000000 r5:c0b10ad0
[ 207.631693] r4:c0b10980
[ 207.634230] [] (rcu_gp_kthread) from [] (kthread+0x148/0x160)
[ 207.641712] r7:c0b10980
[ 207.644249] [] (kthread) from [] (ret_from_fork+0x14/0x2c)
[ 207.651472] Exception stack(0xe709bfb0 to 0xe709bff8)
[ 207.656527] bfa0: 00000000 00000000 00000000 00000000
[ 207.664709] bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 207.672890] bfe0: 00000000 00000000 00000000 00000000 00000013 00000000
[ 207.679508] r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c013dc90
[ 207.687340] r4:e7026f4

Continuing the anecdotal testing, I can't seem to be able to trigger the
lockup if I have ever had two ssh sessions open to the system. And about
half the time I can't trigger it at all, but after a reset of the system
it triggers after opening an ssh session and just hitting the return key
2-5 times. But please take this part with a grain of salt as it's done by
the monkey testing method :-)

All tests above have been run based on c18bb396d3d261eb ("Merge
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net").

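
As an aside for anyone comparing the two reports: a small helper along
these lines could pull the numeric fields out of the "(detected by ...)"
summary line of a stall report. This is only a sketch in Python. The names
STALL_RE and parse_stall are made up for illustration, and the optional hz
argument is an assumption, since the kernel config (and so CONFIG_HZ) is
not stated anywhere in this thread.

```python
import re

# Matches the summary line of an RCU stall report as it appears in the
# traces in this mail, e.g.
#   (detected by 0, t=12645 jiffies, g=333, c=332, q=20)
STALL_RE = re.compile(
    r"\(detected by (?P<cpu>\d+), t=(?P<jiffies>\d+) jiffies, "
    r"g=(?P<g>-?\d+), c=(?P<c>-?\d+), q=(?P<q>\d+)\)"
)


def parse_stall(line, hz=None):
    """Return the fields of a stall summary line as a dict of ints.

    Returns None if the line does not contain a summary. If hz is given
    (an assumed CONFIG_HZ value), a "seconds" entry is added as well.
    """
    match = STALL_RE.search(line)
    if match is None:
        return None
    fields = {name: int(value) for name, value in match.groupdict().items()}
    if hz:
        # Only meaningful if hz really is the CONFIG_HZ of the tested kernel.
        fields["seconds"] = fields["jiffies"] / hz
    return fields


line = "[ 207.556442] (detected by 0, t=12645 jiffies, g=333, c=332, q=20)"
print(parse_stall(line))
```

Lines that are not a stall summary simply give None, so the helper can be
run over a whole console log. Turning the jiffies count into seconds needs
the real CONFIG_HZ of the kernel under test, which is why hz is left unset
by default.
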
>
> >
> > [ 142.849390] INFO: rcu_sched detected stalls on CPUs/tasks:
> > [ 142.854972] 1-...!: (1 GPs behind) idle=7a4/0/0 softirq=3214/3217 fqs=0
> > [ 142.861976] (detected by 0, t=8232 jiffies, g=930, c=929, q=11)
> > [ 142.868042] Sending NMI from CPU 0 to CPUs 1:
> > [ 142.872443] NMI backtrace for cpu 1
> > [ 142.872452] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.16.0-05506-g28aba11c1393691a #14
> > [ 142.872455] Hardware name: Generic R8A7791 (Flattened Device Tree)
> > [ 142.872473] PC is at arch_cpu_idle+0x28/0x44
> > [ 142.872484] LR is at trace_hardirqs_on_caller+0x1a4/0x1d4
> > [ 142.872488] pc : [] lr : [] psr: 20070013
> > [ 142.872491] sp : eb0b9f90 ip : eb0b9f60 fp : eb0b9f9c
> > [ 142.872495] r10: 00000000 r9 : 413fc0f2 r8 : 4000406a
> > [ 142.872498] r7 : c0c08478 r6 : c0c0842c r5 : ffffe000 r4 : 00000002
> > [ 142.872502] r3 : eb0b6ec0 r2 : 00000000 r1 : 00000004 r0 : 00000001
> > [ 142.872507] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
> > [ 142.872511] Control: 10c5387d Table: 6a61406a DAC: 00000051
> > [ 142.872516] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.16.0-05506-g28aba11c1393691a #14
> > [ 142.872519] Hardware name: Generic R8A7791 (Flattened Device Tree)
> > [ 142.872522] Backtrace:
> > [ 142.872534] [] (dump_backtrace) from [] (show_stack+0x18/0x1c)
> > [ 142.872540] r7:c0c81388 r6:00000000 r5:60070193 r4:c0c81388
> > [ 142.872550] [] (show_stack) from [] (dump_stack+0xa4/0xd8)
> > [ 142.872557] [] (dump_stack) from [] (show_regs+0x14/0x18)
> > [ 142.872563] r9:00000001 r8:00000000 r7:c0c4f678 r6:eb0b9f40 r5:00000001 r4:c13e1130
> > [ 142.872571] [] (show_regs) from [] (nmi_cpu_backtrace+0xfc/0x118)
> > [ 142.872578] [] (nmi_cpu_backtrace) from [] (handle_IPI+0x2a8/0x320)
> > [ 142.872583] r7:c0c4f678 r6:eb0b9f40 r5:00000007 r4:c0b75b68
> > [ 142.872594] [] (handle_IPI) from [] (gic_handle_irq+0x8c/0x98)
> > [ 142.872599] r10:00000000 r9:eb0b8000 r8:f0803000 r7:c0c4f678 r6:eb0b9f40 r5:c0c08a90
> > [ 142.872602] r4:f0802000
> > [ 142.872608] [] (gic_handle_irq) from [] (__irq_svc+0x70/0x98)
> > [ 142.872612] Exception stack(0xeb0b9f40 to 0xeb0b9f88)
> > [ 142.872618] 9f40: 00000001 00000004 00000000 eb0b6ec0 00000002 ffffe000 c0c0842c c0c08478
> > [ 142.872624] 9f60: 4000406a 413fc0f2 00000000 eb0b9f9c eb0b9f60 eb0b9f90 c01747a8 c01088a4
> > [ 142.872627] 9f80: 20070013 ffffffff
> > [ 142.872632] r9:eb0b8000 r8:4000406a r7:eb0b9f74 r6:ffffffff r5:20070013 r4:c01088a4
> > [ 142.872642] [] (arch_cpu_idle) from [] (default_idle_call+0x34/0x38)
> > [ 142.872650] [] (default_idle_call) from [] (do_idle+0xe0/0x134)
> > [ 142.872656] [] (do_idle) from [] (cpu_startup_entry+0x20/0x24)
> > [ 142.872660] r7:c0c8e9d0 r6:10c0387d r5:00000051 r4:00000085
> > [ 142.872667] [] (cpu_startup_entry) from [] (secondary_start_kernel+0x114/0x134)
> > [ 142.872673] [] (secondary_start_kernel) from [<401026ec>] (0x401026ec)
> > [ 142.872676] r5:00000051 r4:6b0a406a
> > [ 142.873456] rcu_sched kthread starved for 8235 jiffies! g930 c929 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=0
> > [ 143.135040] RCU grace-period kthread stack dump:
> > [ 143.139695] rcu_sched I 0 9 2 0x00000000
> > [ 143.145234] Backtrace:
> > [ 143.147719] [] (__schedule) from [] (schedule+0x94/0xb8)
> > [ 143.154823] r10:c0b714c0 r9:c0c85f8a r8:ffffffff r7:eb0abec4 r6:ffffa274 r5:00000000
> > [ 143.162712] r4:ffffe000
> > [ 143.165273] [] (schedule) from [] (schedule_timeout+0x440/0x4b0)
> > [ 143.173076] r5:00000000 r4:eb79c4c0
> > [ 143.176692] [] (schedule_timeout) from [] (rcu_gp_kthread+0x958/0x150c)
> > [ 143.185108] r10:c0c87274 r9:00000000 r8:c0c165b8 r7:00000001 r6:00000000 r5:c0c16590
> > [ 143.192997] r4:c0c16300
> > [ 143.195560] [] (rcu_gp_kthread) from [] (kthread+0x148/0x160)
> > [ 143.203099] r7:c0c16300
> > [ 143.205660] [] (kthread) from [] (ret_from_fork+0x14/0x20)
> > [ 143.212938] Exception stack(0xeb0abfb0 to 0xeb0abff8)
> > [ 143.218030] bfa0: 00000000 00000000 00000000 00000000
> > [ 143.226271] bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> > [ 143.234511] bfe0: 00000000 00000000 00000000 00000000 00000013 00000000
> > [ 143.241177] r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c0145d70
> > [ 143.249065] r4:eb037b00
> >
> > After the freeze the system becomes responsive again and I can sometimes
> > trigger the hang multiple times. I tried to bisect the problem and I
> > found that by reverting [2] I can no longer reproduce the issue. I can
> > also not reproduce the issue on v4.16. I can't figure out if reverting
> > [2] is just treating a symptom or the root cause of my troubles and
> > would appreciate your input. Also my "test-case" does not trigger every
> > time but I have tested this scenario quite a lot and the result seems to
> > be constant.
> >
> > My test setup involves an NFS root filesystem, I ssh in and launch the
> > GUI application over X forwarding.
> > From what I know the application is
> > not doing any ioctl calls to the v4l2 framework, it's just sitting there
> > idle as I move the cursor around showing tool tips and such as I hover
> > over elements and then it freezes up. I have not observed this issue by
> > just booting the system and leaving it idle, movement in the GUI seems
> > to be the key to trigger this.
> >
> > I'm a bit lost on how to progress with this issue and would appreciate
> > any help you can provide to help me figure this out.
>
> Can you send me your config ?
>
> I'm going to prepare a debug patch to spy what's happening when entering idle
>
> Regards,
> Vincent
> >
> > 1. c18bb396d3d261eb ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")
> > 2. 31e77c93e432dec7 ("sched/fair: Update blocked load when newly idle")
> >
> > --
> > Regards,
> > Niklas Söderlund

--
Regards,
Niklas Söderlund