From: Vincent Guittot
Date: Thu, 12 Apr 2018 12:33:27 +0200
Subject: Re: Potential problem with 31e77c93e432dec7 ("sched/fair: Update blocked load when newly idle")
To: Niklas Söderlund
Cc: Peter Zijlstra, "Paul E. McKenney", Ingo Molnar, linux-kernel, linux-renesas-soc@vger.kernel.org, Heiner Kallweit
In-Reply-To: <20180412091822.GG12256@bigcity.dyn.berto.se>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Niklas,

On 12 April 2018 at 11:18, Niklas Söderlund wrote:
> Hi Vincent,
>
> I have observed issues running on linus/master from a few days back [1].
> I'm running on a Renesas Koelsch board (arm32) and I can trigger an issue
> by X forwarding the v4l2 test application qv4l2 over ssh and moving the
> cursor around in the GUI (best test case description award...). I'm
> sorry about the really bad way I trigger this but I can't do it in any
> other way, I'm happy to try other methods if you have some ideas. The
> symptom of the issue is a complete hang of the system for more than 30
> seconds and then this information is printed in the console:

Heiner (added in Cc) also reported a similar problem with his platform: a
dual-core Celeron.

Do you confirm that your platform is a dual Cortex-A15?
At least that is what I have seen on the web.
This would confirm that a dual-core system is a key point.

The ssh connection is also common with Heiner's setup.

>
> [ 142.849390] INFO: rcu_sched detected stalls on CPUs/tasks:
> [ 142.854972] 1-...!: (1 GPs behind) idle=7a4/0/0 softirq=3214/3217 fqs=0
> [ 142.861976] (detected by 0, t=8232 jiffies, g=930, c=929, q=11)
> [ 142.868042] Sending NMI from CPU 0 to CPUs 1:
> [ 142.872443] NMI backtrace for cpu 1
> [ 142.872452] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.16.0-05506-g28aba11c1393691a #14
> [ 142.872455] Hardware name: Generic R8A7791 (Flattened Device Tree)
> [ 142.872473] PC is at arch_cpu_idle+0x28/0x44
> [ 142.872484] LR is at trace_hardirqs_on_caller+0x1a4/0x1d4
> [ 142.872488] pc : [] lr : [] psr: 20070013
> [ 142.872491] sp : eb0b9f90 ip : eb0b9f60 fp : eb0b9f9c
> [ 142.872495] r10: 00000000 r9 : 413fc0f2 r8 : 4000406a
> [ 142.872498] r7 : c0c08478 r6 : c0c0842c r5 : ffffe000 r4 : 00000002
> [ 142.872502] r3 : eb0b6ec0 r2 : 00000000 r1 : 00000004 r0 : 00000001
> [ 142.872507] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
> [ 142.872511] Control: 10c5387d Table: 6a61406a DAC: 00000051
> [ 142.872516] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.16.0-05506-g28aba11c1393691a #14
> [ 142.872519] Hardware name: Generic R8A7791 (Flattened Device Tree)
> [ 142.872522] Backtrace:
> [ 142.872534] [] (dump_backtrace) from [] (show_stack+0x18/0x1c)
> [ 142.872540] r7:c0c81388 r6:00000000 r5:60070193 r4:c0c81388
> [ 142.872550] [] (show_stack) from [] (dump_stack+0xa4/0xd8)
> [ 142.872557] [] (dump_stack) from [] (show_regs+0x14/0x18)
> [ 142.872563] r9:00000001 r8:00000000 r7:c0c4f678 r6:eb0b9f40 r5:00000001 r4:c13e1130
> [ 142.872571] [] (show_regs) from [] (nmi_cpu_backtrace+0xfc/0x118)
> [ 142.872578] [] (nmi_cpu_backtrace) from [] (handle_IPI+0x2a8/0x320)
> [ 142.872583] r7:c0c4f678 r6:eb0b9f40 r5:00000007 r4:c0b75b68
> [ 142.872594] [] (handle_IPI) from [] (gic_handle_irq+0x8c/0x98)
> [ 142.872599] r10:00000000 r9:eb0b8000 r8:f0803000 r7:c0c4f678 r6:eb0b9f40 r5:c0c08a90
> [ 142.872602] r4:f0802000
> [ 142.872608] [] (gic_handle_irq) from [] (__irq_svc+0x70/0x98)
> [ 142.872612] Exception stack(0xeb0b9f40 to 0xeb0b9f88)
> [ 142.872618] 9f40: 00000001 00000004 00000000 eb0b6ec0 00000002 ffffe000 c0c0842c c0c08478
> [ 142.872624] 9f60: 4000406a 413fc0f2 00000000 eb0b9f9c eb0b9f60 eb0b9f90 c01747a8 c01088a4
> [ 142.872627] 9f80: 20070013 ffffffff
> [ 142.872632] r9:eb0b8000 r8:4000406a r7:eb0b9f74 r6:ffffffff r5:20070013 r4:c01088a4
> [ 142.872642] [] (arch_cpu_idle) from [] (default_idle_call+0x34/0x38)
> [ 142.872650] [] (default_idle_call) from [] (do_idle+0xe0/0x134)
> [ 142.872656] [] (do_idle) from [] (cpu_startup_entry+0x20/0x24)
> [ 142.872660] r7:c0c8e9d0 r6:10c0387d r5:00000051 r4:00000085
> [ 142.872667] [] (cpu_startup_entry) from [] (secondary_start_kernel+0x114/0x134)
> [ 142.872673] [] (secondary_start_kernel) from [<401026ec>] (0x401026ec)
> [ 142.872676] r5:00000051 r4:6b0a406a
> [ 142.873456] rcu_sched kthread starved for 8235 jiffies! g930 c929 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=0
> [ 143.135040] RCU grace-period kthread stack dump:
> [ 143.139695] rcu_sched I 0 9 2 0x00000000
> [ 143.145234] Backtrace:
> [ 143.147719] [] (__schedule) from [] (schedule+0x94/0xb8)
> [ 143.154823] r10:c0b714c0 r9:c0c85f8a r8:ffffffff r7:eb0abec4 r6:ffffa274 r5:00000000
> [ 143.162712] r4:ffffe000
> [ 143.165273] [] (schedule) from [] (schedule_timeout+0x440/0x4b0)
> [ 143.173076] r5:00000000 r4:eb79c4c0
> [ 143.176692] [] (schedule_timeout) from [] (rcu_gp_kthread+0x958/0x150c)
> [ 143.185108] r10:c0c87274 r9:00000000 r8:c0c165b8 r7:00000001 r6:00000000 r5:c0c16590
> [ 143.192997] r4:c0c16300
> [ 143.195560] [] (rcu_gp_kthread) from [] (kthread+0x148/0x160)
> [ 143.203099] r7:c0c16300
> [ 143.205660] [] (kthread) from [] (ret_from_fork+0x14/0x20)
> [ 143.212938] Exception stack(0xeb0abfb0 to 0xeb0abff8)
> [ 143.218030] bfa0: 00000000 00000000 00000000 00000000
> [ 143.226271] bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> [ 143.234511] bfe0: 00000000 00000000 00000000 00000000 00000013 00000000
> [ 143.241177] r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c0145d70
> [ 143.249065] r4:eb037b00
>
> After the freeze the system becomes responsive again and I can sometimes
> trigger the hang multiple times. I tried to bisect the problem and I
> found that by reverting [2] I can no longer reproduce the issue. I can
> also not reproduce the issue on v4.16. I can't figure out if reverting
> [2] is just treating a symptom or the root cause of my troubles and
> would appreciate your input. Also my "test-case" does not trigger every
> time but I have tested this scenario quite a lot and the result seems to
> be consistent.
>
> My test setup involves an NFS root filesystem, I ssh in and launch the
> GUI application over X forwarding.
> From what I know the application is
> not doing any ioctl calls to the v4l2 framework, it's just sitting there
> idle as I move the cursor around showing tool tips and such as I hover
> over elements and then it freezes up. I have not observed this issue by
> just booting the system and leaving it idle, movement in the GUI seems
> to be the key to trigger this.
>
> I'm a bit lost on how to progress with this issue and would appreciate
> any help you can provide to help me figure this out.

Can you send me your config?

I'm going to prepare a debug patch to spy on what's happening when entering
idle.

Regards,
Vincent

>
> 1. c18bb396d3d261eb ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")
> 2. 31e77c93e432dec7 ("sched/fair: Update blocked load when newly idle")
>
> --
> Regards,
> Niklas Söderlund
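
The `t=8232 jiffies` figure in the stall report can only be turned into
wall-clock time once CONFIG_HZ is known, and the thread does not state it (the
config is still being requested). The sketch below is only an illustration
under that assumption: it parses the "detected by" summary line and prints the
stall duration for a few commonly used HZ values. `parse_stall_line` and
`jiffies_to_seconds` are hypothetical helper names, not part of any kernel
tooling.

```python
import re

# Stall summary line copied from the quoted report (quoted-printable decoded).
STALL_LINE = "(detected by 0, t=8232 jiffies, g=930, c=929, q=11)"


def parse_stall_line(line):
    """Return the detecting CPU and the t/g/c/q counters as integers.

    Hypothetical helper for illustration only.
    """
    match = re.search(
        r"detected by (?P<cpu>\d+), t=(?P<t>\d+) jiffies, "
        r"g=(?P<g>\d+), c=(?P<c>\d+), q=(?P<q>\d+)",
        line,
    )
    if match is None:
        raise ValueError("not an rcu_sched stall summary line")
    return {name: int(value) for name, value in match.groupdict().items()}


def jiffies_to_seconds(jiffies, hz):
    """Convert a jiffies count to seconds for a given tick rate (HZ)."""
    return jiffies / hz


fields = parse_stall_line(STALL_LINE)

# ASSUMPTION: the real CONFIG_HZ is unknown here, so several candidates are tried.
for hz in (100, 250, 1000):
    print(f"HZ={hz}: stall of {jiffies_to_seconds(fields['t'], hz):.1f} s")
```

Which of these durations applies depends entirely on the HZ value in the
requested config.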